Yes, We Can Provide Single Sign-On and Protect Privacy at the Same Time - RA21 Moves to a Beta Phase

Earlier this spring at the ER&L Conference, Siva Vaidhyanathan gave a talk about his concerns with the culture of online conversations and social media. Part of his talk centered on how technologies that are helpful to individuals can create significant problems when those same technologies operate at scale. We often don’t see the problems that a technology can create until it becomes ubiquitous. This concept was outlined by Dr. Melvin Kranzberg, a professor of the history of technology at the Georgia Institute of Technology and the founding editor of Technology and Culture. In 1985, Dr. Kranzberg delivered his Presidential Address to the Society for the History of Technology, in which he put forward his Six Laws of Technology. The first of these laws was that “Technology is neither good nor bad; nor is it neutral.” He described this as:

“technology’s interaction with the social ecology is such that technical developments frequently have environmental, social, and human consequences that go far beyond the immediate purposes of the technical devices and practices themselves, and the same technology can have quite different results when introduced into different contexts or under different circumstances.”

Simply put, technology in and of itself is generally neither a good nor a bad thing. How it is applied in a specific context or environment determines whether we perceive it as good or bad. In his paper, Kranzberg describes how perceptions of technology change as unintended consequences come to light. Technologies that were once thought to be helpful or positive can eventually be seen as problematic when applied at scale. In other contexts, the risks and problems posed by a technology might seem less important than the benefits of the solution. One particularly troubling example described in the article is DDT, which was widely used in the 1950s and 1960s and was later banned in most Western countries because of its environmental and health consequences. Yet India continues to use it and remains the only country that still manufactures the chemical. A main driver behind India’s support is that DDT use reduced the number of malaria cases by more than 74.9 million per year and saved some 748,000 lives each year. It is the scale of the problem DDT addressed that, in the view of the Indian leadership, justifies its continued use despite the environmental and health risks. There are other, potentially better ways to control malaria than a toxic pesticide, but the Indian government’s intransigence on this matter remains illustrative some 35 years after Kranzberg described it.

Federated identity might not leap to mind as a technology that could stir up significant controversy or cause societal problems, certainly nothing on the order of DDT. Yet over the past year it has stirred up concern and controversy in the publishing and library community. The systems that support federated identity have been in use since the 1990s and grew in adoption in the early 2000s. At that time, institutions increasingly faced the challenge of providing a centralized, yet granular, login system that encompassed the entire user base of an institution. Federated identity allows the institution to manage user logins centrally, while also giving service providers a single point of contact for institutional access control. This approach also allowed institutions to separate the management of who a person is (identity management) from what a person is allowed to do (access control). Today, eduGAIN, which interconnects institutional networks of Single Sign-On (SSO) services, supports nearly 90 million post-secondary users from 75 countries.

Because the identity federation system was designed with the entire institution in mind, it had to be flexible. It had to accommodate everything from HR management systems to student courseware systems, and from research tools to anonymous access for library users. These varied use cases were made possible by splitting authentication into three core components: the identity of the user, the attributes about the user and how they relate to access rights, and finally what the authenticated system needs to know about the user. First, the system needs a way to recognize and authenticate that this user is a specific registered person in the system. In essence, this is the question of validating that “I am who I say I am”. This is a security process and can involve passwords, tokens, key cards, multi-factor authentication, or, in its simplest form, I see Sara at the desk, I recognize Sara, and so I let her pass. The second element of this system is the detailed information that the system knows about a user who has been authenticated. Sara might be a chemist; she may be a faculty member; she may be working on a particular research project. She may have inter-institutional credentials to work on a joint project with another institution. And she may be chair of the department, so she may have access to certain management systems at the institution that are otherwise restricted. These are all attributes (call them metadata) about the person, and these attributes give her access to a variety of systems and services. For example, as chair, Sara may have access to university HR or budget systems that most people on campus cannot use. She may be allowed to use research services that are available only to the Chemistry department, or only to members of her cross-institutional research project.

These attributes are then used to facilitate access to specific resources, which is the final element of this chain. Access control begins with ascertaining that this person has the appropriate credentials to “walk in the door”, so to speak. The system then checks whether this person has the appropriate attributes to access the service in question and determines whether the person should be allowed to pass. Finally, once access is granted, the authentication system MAY (though it does not have to) pass along information about the user together with the access rights. There are many potential purposes for this. In the case of a course management system, in addition to indicating that this user is a student, the access control system also needs to provide details such as that this is Susan, that she is a student, and that she is enrolled in these courses this semester, in order to provide Susan with her grades, her homework, and other materials relevant to her.
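The chain described above (authenticate the person, check their attributes against what a service requires, and optionally release some of those attributes) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular identity provider or federation software works: the Person record, the in-memory directory, and the authenticate, authorize, and release functions are hypothetical names invented for this example. A real deployment would delegate authentication to the institution’s identity provider and exchange signed assertions (for example, SAML) rather than call functions directly.

```python
from __future__ import annotations

import hashlib
import hmac
from dataclasses import dataclass, field


@dataclass
class Person:
    """Hypothetical identity record held by the institution."""
    username: str
    salt: bytes
    password_hash: bytes
    attributes: set[str] = field(default_factory=set)


def hash_password(password: str, salt: bytes) -> bytes:
    # Store only a salted, deliberately slow hash, never the password itself.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def authenticate(directory: dict[str, Person], username: str, password: str) -> Person | None:
    """Element 1: validate that "I am who I say I am"."""
    person = directory.get(username)
    if person is None:
        return None
    candidate = hash_password(password, person.salt)
    # Constant-time comparison avoids leaking information through timing.
    return person if hmac.compare_digest(candidate, person.password_hash) else None


def authorize(person: Person, required: set[str]) -> bool:
    """Element 2: admit the person only if they hold every attribute the service requires."""
    return required <= person.attributes


def release(person: Person, needed: set[str]) -> set[str]:
    """Element 3 (optional): pass along only the attributes this service actually needs."""
    return person.attributes & needed


# Usage with the Sara example from the text (hypothetical credentials).
salt = b"example-salt"
sara = Person("sara", salt, hash_password("s3cret", salt),
              {"faculty", "chemistry", "department-chair"})
directory = {"sara": sara}

user = authenticate(directory, "sara", "s3cret")
print(user is not None)                                  # True
print(authorize(user, {"department-chair"}))             # True
print(authorize(user, {"student"}))                      # False
print(sorted(release(user, {"chemistry", "email"})))     # ['chemistry']
print(authenticate(directory, "sara", "wrong") is None)  # True
```

Keeping the three steps as separate functions mirrors the separation described above: the service that checks attributes never sees the password, and the release step hands over only the intersection of what the person has and what the service needs.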

It is this final question that has been the focus of much of the conversation about the joint NISO/STM RA21 project, particularly regarding privacy. How can a system that is designed to share information about the user protect that user’s privacy? Let’s first consider the concern that a user who is individually logged into an online system can therefore be tracked. Whenever a user logs into any network, information about their activity is always logged somewhere, on some system. That could be your internet service provider, the proxy server you have logged into from off campus, or the publisher’s website through the use of cookies or other forms of tracking. Some of this tracking can be masked or avoided, for example by blocking cookies or by using a VPN or the Tor network. The reality is that most users do not go that far to protect their privacy. However, in the context of the RA21 project and of library services, where protecting patron data is an ethical and sometimes a legal responsibility even if the user doesn’t care, let’s presume that users do care and that systems should be designed to be as privacy-protecting as possible.

The RA21 Recommended Practice, which was published late last month, takes an approach to attribute release that is both broad and stringent. Although the RA21 system was originally conceived as a service that could provide access to library-subscribed resources, the project has grown in scope, if not in complexity. As the project was discussed further and more people in the identity federation community became engaged, it made sense to provide a more general solution, one that could be used for any institutional login purpose, from access to library resources, to shared research infrastructure, to access to educational discounts arranged by one’s institution. Each of these use cases places different demands on the metadata about the user. Again, this is a feature of the identity-federation-based access control structure. However, rules need to be established to govern the sharing and use of these data, particularly for library services.

The RA21 recommendation addresses this in a couple of different ways. First, the recommendation explicitly states that for the use case where access is provided to an information resource and no personalization is needed, only an anonymous entitlement attribute (e.g., eduPersonEntitlement) should be used. This preserves the privacy and anonymity of the user, as no personal or trackable information is shared by the institution. If additional functionality is required, and if there is a specific contractual agreement between the service provider and the institution, then pseudonymous identifiers can be used; these allow tracking but mask the user’s identity. Institutions may also share opaque reporting codes, which identify user groups such as faculty or departments, so that more granular data can be gathered for analysis of institutional resource usage. More explicit attributes can be released for services that require them, if necessary, but this data release must be limited, specific, agreed upon, and controlled by the institution.
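The tiered release described above can be illustrated with a small Python sketch of an institution-side release policy. Everything here is an assumption made for illustration, not text from the Recommended Practice: the function names, the dictionary keys pseudonymous_id and reporting_code, and the choice to gate both the pseudonymous identifier and the reporting code on a contractual agreement. The entitlement value urn:mace:dir:entitlement:common-lib-terms is commonly used with eduPersonEntitlement for licensed library content. The pseudonymous identifier is derived here with an HMAC keyed by an institution-held secret, one way to produce an identifier that is stable for a given service provider but cannot be linked across providers or traced back to the user without that secret.

```python
from __future__ import annotations

import hashlib
import hmac

# Hypothetical secret held only by the institution; never shared with providers.
INSTITUTION_SECRET = b"replace-with-a-securely-stored-secret"

# Entitlement value widely used for licensed library resources.
LIBRARY_ENTITLEMENT = "urn:mace:dir:entitlement:common-lib-terms"


def pseudonymous_id(user_id: str, sp_entity_id: str, secret: bytes = INSTITUTION_SECRET) -> str:
    """Stable for one (user, provider) pair; unlinkable across providers without the secret."""
    message = f"{user_id}|{sp_entity_id}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def release_attributes(
    user: dict,
    sp_entity_id: str,
    *,
    has_agreement: bool = False,
    needs_personalization: bool = False,
    reporting_code: str | None = None,
) -> dict[str, str]:
    """Return only the attributes this provider may receive (hypothetical policy)."""
    if not user.get("entitled"):
        return {}  # Not covered by the license: release nothing.

    # Default tier: an anonymous entitlement, no personal or trackable data.
    released = {"eduPersonEntitlement": LIBRARY_ENTITLEMENT}

    # Anything beyond the entitlement requires a contractual agreement.
    if has_agreement:
        if needs_personalization:
            released["pseudonymous_id"] = pseudonymous_id(user["id"], sp_entity_id)
        if reporting_code is not None:
            # Opaque group code (e.g. for a department) used only for usage reporting.
            released["reporting_code"] = reporting_code
    return released


sara = {"id": "sara@example.edu", "entitled": True}
print(release_attributes(sara, "https://publisher-a.example"))
# {'eduPersonEntitlement': 'urn:mace:dir:entitlement:common-lib-terms'}

with_contract = release_attributes(
    sara, "https://publisher-a.example",
    has_agreement=True, needs_personalization=True, reporting_code="dept-7f3a",
)
print(sorted(with_contract))
# ['eduPersonEntitlement', 'pseudonymous_id', 'reporting_code']
```

The design choice worth noting is that the anonymous entitlement is the default and every additional attribute has to be switched on explicitly. Because the identifier is keyed by the provider’s entity ID, the same user receives different values at different publishers, so two providers cannot match their logs on it, while a single provider can still recognize a returning user for personalization.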

A second element of the approach to protecting privacy outlined in the Recommended Practice is the adoption of the GEANT Data Protection Code of Conduct. This Code outlines principles such as data minimization, limitations on data use and reuse, prohibitions on third-party data sharing, data security assurances, and approaches to addressing non-compliance. The policy is further supported by the EU General Data Protection Regulation (GDPR), which adds legal weight and significant potential penalties for non-compliance. Adherence to this code of conduct is expected of all participants in RA21-based services.

There is certainly much more work to be done to build on this infrastructure. The identity federation community has tools and approaches to enforce policies, including those regarding privacy. This is done through federation membership agreements, which all participants must sign, and through codes of conduct to which service providers must agree. In addition, norms have been established for certain classes of resources through the definition of entity categories and attribute bundles. One of the follow-up projects being organized will seek to define and specify attribute release policies for access to library resources that can be enforced systematically within the identity federation infrastructure. Another project stream that needs to be specified is the process by which users can consent to additional attribute release. The team at Duke University has done significant work on customizable attribute release and on the experience of exposing those options to users. More user testing and development of this approach are needed.

Implementation of the Recommended Practice is being advanced by the newly organized Coalition for Seamless Access, a partnership among service providers, identity providers, the publishing community, and the library community. The Coalition is now setting up a beta phase of services based on the RA21 recommendation. The service is expected to be available in the fall and to run for 9-12 months. During that time, the service will be tested for its overall functionality, for any implementation challenges it presents, for how well it improves the user experience, and for its security and stability. As this testing phase progresses, related work will also be advanced on consent, attribute release formalization, and overall outreach and education around single sign-on services. Following the beta phase, there will be a review process and a public report, and the Recommended Practice may need to be updated or amended to address any problems identified.

While much has been made of the potential problems with moving toward single sign-on authentication for library services, this is no reason to abandon the approach. There are challenges, and there is potential for misconfiguration or even abuse, but this is true of every technology system. Quoting again from Kranzberg’s Sixth Law:

“Behind every machine, I see a face – indeed, many faces: the engineer, the worker, the businessman or businesswoman, and, sometimes, the general and admiral. Furthermore, the function of the technology is its use by human beings – and sometimes, alas, its abuse and misuse.”

In the face of this, we normally don’t just throw up our hands and stop using technology. Technology needs to work in tandem with the humans who engage with it. We should endeavor to design and use it more appropriately and more wisely. We should do what we can to implement systems, both technical and social, that prevent its misuse. A big part of this is education and transparency: education about how the systems actually work and how they are designed, and transparency so that everyone knows what is being done with the technology and its data, and for what purpose. Without them, we are all just clicking the “I Agree” button and leaving it to others to decide what can be done with our data.

Todd A Carpenter

@TAC_NISO

Todd Carpenter is Executive Director of the National Information Standards Organization (NISO). He also serves in leadership roles at a variety of organizations, including as Chair of the ISO Technical Subcommittee on Identification & Description (ISO TC46/SC9), founding partner of the Coalition for Seamless Access, Past President of FORCE11, Treasurer of the Book Industry Study Group (BISG), and a Director of the Foundation of the Baltimore County Public Library. He previously served as Treasurer of SSP.