Disclosure: This post was jointly written by Todd Carpenter, Heather Flanagan and Chris Shillum, three members of the Resource Access for the 21st Century (RA21) leadership team.
Recent posts on Scholarly Kitchen from Lisa Janicke Hinchliffe and Roger Schonfeld have expressed concerns about the direction of, and motivation behind, the RA21 project. Both posts risk promulgating misunderstandings about the admittedly complex technologies involved. We wonder whether they are also projecting broader fears and concerns about changes in technology, and their implications for scholarly communication, onto this initiative.
Anyone who has used the internet knows that access control and identity management are fraught with problems, not only for users of online services, but for providers as well. How many passwords do you have for how many different sites? How often do you have to reset passwords you have forgotten, or credentials that have expired? System developers have to tread a fine line between security, usability, privacy, control and access, and often have to make trade-offs between security and ease-of-use. These challenges have been discussed, debated, and argued over for a very long time in the online community and huge amounts of effort have been expended to develop technologies to solve some of the problems.
Access control for scholarly information resources sidestepped these issues for years. After initially trying to use usernames and passwords for access to online systems, and realizing they were unwieldy for everyone involved to administer and use, resource providers adopted IP addresses as stand-in credentials for access to networked resources. Essentially, it was presumed that if you were on a campus or business network, you should be authorized to gain access to resources to which the campus subscribed. This made a certain amount of sense when users had to plug wires into the wall to get access to the Internet and when they did most of their work on campus. However, methods of accessing internet and digital resources have evolved. With the growth in mobile devices, remote working, and the expectation that information resources can be accessed from anywhere, at any time, from any device, these assumptions have become more and more problematic.
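To make the IP-based model concrete, here is a minimal sketch in Python of the check a resource provider effectively performs. The network ranges are hypothetical placeholders drawn from the reserved documentation address blocks, and the function name is our own; real providers maintain per-customer lists of registered ranges.

```python
import ipaddress

# Hypothetical campus ranges (reserved documentation blocks, not real registrations).
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_authorized_by_ip(client_ip: str) -> bool:
    """Return True if the request originates from a registered campus network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in CAMPUS_NETWORKS)


# An on-campus address passes; the same user at home or in a coffee shop does not.
on_campus = is_authorized_by_ip("192.0.2.17")
off_campus = is_authorized_by_ip("203.0.113.5")
```

Note that the check says nothing about who the user is, only where the request came from, which is why it fails as soon as the user leaves the campus network.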
RA21 aims to solve these problems once and for all, by promoting a modern, standards-based access management system, which will meet the needs and expectations of users familiar with the seamless interactions of the consumer web, while preserving the privacy and user control that is rightly expected in a scholarly setting. It is important to dispel some myths so that we can move on from the outdated and anachronistic world of IP-authentication.
Myth 1: IP authentication is inherently privacy preserving while federated authentication technologies are not – Busted
RA21 is proposing the adoption of a federated authentication system based on a technology called SAML to authorize users’ access to institutionally provided resources. We are building on this technology specifically because it has inherent mechanisms, both technical and legal, to protect privacy and put the user and their institution in control of what personal information, if any, is released to the service provider.
On the technical front, SAML can provide the exact same degree of anonymity as IP-based authentication. Most service providers, for example platforms such as ScienceDirect, Wiley Online Library and ACS Publications, have supported anonymous authentication via Shibboleth for years. In this model, the SAML protocol allows the user’s institution to make a secure, trusted assertion that the user is a member of their authorized user community without disclosing any specific information about that individual. In fact, this mechanism provides more privacy protection than IP access control, as in some circumstances IP addresses can be traced back to individuals, as evidenced by a ruling from the Court of Justice of the European Union.
There is a critical point to be understood here: while the user may have signed in individually to their campus or corporate ID management system, knowledge of that user’s individual identity can remain within the institution and doesn’t have to be shared with the service provider. This is exactly analogous to what happens when a user signs in as an individual to a proxy server.
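The separation between what the institution knows and what it releases can be sketched as a simple attribute filter. This is an illustration only, not how any particular identity provider is implemented: the user record, the policy set and the function name are hypothetical, while the attribute names follow the eduPerson schema commonly used in academic federations.

```python
# Entitlement value commonly used to signal "covered by the library's licence terms".
LIBRARY_ENTITLEMENT = "urn:mace:dir:entitlement:common-lib-terms"


def release_attributes(user_record: dict, release_policy: set) -> dict:
    """Return only the attributes the institution's policy allows for this provider."""
    return {
        name: value
        for name, value in user_record.items()
        if name in release_policy
    }


# What the institution knows after the user signs in (hypothetical record).
user_record = {
    "displayName": "A. Researcher",
    "mail": "researcher@example.edu",
    "eduPersonScopedAffiliation": "member@example.edu",
    "eduPersonEntitlement": LIBRARY_ENTITLEMENT,
}

# Anonymous access: only membership and entitlement leave the institution.
ANONYMOUS_POLICY = {"eduPersonScopedAffiliation", "eduPersonEntitlement"}
released = release_attributes(user_record, ANONYMOUS_POLICY)
```

Under this policy the service provider learns that the user is an authorized member of example.edu and nothing else; the name and email address never leave the institution.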
Hinchliffe is right to say that the institution can also choose to pass attributes to the service provider that are specific to the individual. These can then be used to provide additional convenience to users by allowing personalized features on service provider sites to be linked to a single institutional login. Conceptually, this is exactly the same as a service provider allowing the user to create a local account on their system, as most have done for years, but without the need for separate usernames and passwords. RA21 believes that this should always be done in an open and transparent manner with the full consent of the user via a registration process. We agree that work needs to be done to set norms and establish best practices in this area, building on efforts under way at Internet2, Duke University, and elsewhere on the Scalable Consent framework. We hope to advance these best practices as RA21 moves forward towards implementation.
As a community, we need to raise awareness of misconfigurations of the type observed by Lisa, whereby campus ID systems incorrectly state that service providers require specific personal information to provide access, when in fact they do not, so that these can be resolved and eliminated. We anticipate a future phase of RA21 supporting the roll out of the new standard that includes a focus on educating institutions about best practices and breaking down the often rigid silos between libraries and campus IT departments that lead to these misunderstandings.
The concept of federated identity management was invented in the research and academic community close to 20 years ago. Alongside the technology, a model for building a fabric of trust has been established, based around the idea of identity management federations. Federations are typically organized geographically, and to join one, identity providers and service providers must agree to a set of practices and policies such as those embodied in the US-based InCommon federation’s Participant Operating Practices. These understandings are backed by legal agreements which the participants must sign with the federation operator.
The combination of the technical and legal protections already in place in the research and academic identity management community means that the starting point for RA21 is dramatically different from that of the consumer web where, with services such as Google and Facebook, the user is generally the product, and all information is considered ‘fair game’. When the purpose is purely to support authorization to a service, privacy has a far better chance of being preserved.
When it comes to individual identity, issues around consent are actually very clear. The user should be able to consent to any sharing of personal information such as their name or email address – those items that are useful for personalization, but not fundamentally necessary to the authorization transaction. Not only is informed consent required by the GDPR, the forthcoming EU legislation, it is the right thing to do and RA21 is committed to this. Considerable work has already been done in analyzing the impact of GDPR on current practices in academic identity federations around the world.
Myth 2: Proxy servers work just fine as a solution for off-campus access – Busted
Many libraries have turned to proxy services, such as EZproxy, to solve the problem of off-campus access. These services have a huge installed base and in particular have been very good at integrating with a wide variety of campus ID systems and patron authentication services to ensure that the correct set of users can gain access to external resources. However, they also present significant problems in configuration and management, and fail to address changing patterns of resource access effectively.
One of the major difficulties faced by users when navigating the world of scholarly information resources is the need for authorization at the point-of-access. Users typically reach content provider sites from points such as Google, PubMed, references in other articles, and links sent by colleagues. From these starting points, users move from system to system among an array of resource providers and research workflow tools. They are essentially starting from anywhere and going to anywhere in their journey to access the most relevant and useful information.
Proxy servers just don’t work in this scenario; the fundamental assumption behind URL rewriting proxy servers such as EZproxy is that the user starts their research journey on the institutional portal, and can therefore follow a “proxied link” to the relevant information resources. If the user arrives at a content provider without starting in the right place, the content provider has no way to know where the user is from and therefore whether they should be granted access. Federated authentication solves this problem by allowing the user to tell the service provider where they are from, so that the service provider can point the user back to their institution to sign in. This is only possible because of the centrally managed metadata distribution services that identity federations provide. However, the “Where are you from” user experience today is inconsistent and difficult to use. This is the core problem that RA21 is trying to solve.
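The “Where are you from” step described above can be sketched as a lookup against federation metadata followed by a redirect to the user’s home institution. Everything in this example is an assumption made for illustration: the metadata entries, URLs and function name are hypothetical, and a real SAML flow would also send an encoded authentication request rather than only the RelayState parameter shown here.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical extract of federation metadata: institution name -> identity provider.
FEDERATION_METADATA = {
    "Example University": {
        "entity_id": "https://idp.example.edu/idp/shibboleth",
        "sso_url": "https://idp.example.edu/idp/profile/SAML2/Redirect/SSO",
    },
}


def build_sign_in_redirect(institution: str, return_url: str) -> Optional[str]:
    """Return the URL to send the user to for sign-in, or None if the
    institution is not found in the federation metadata."""
    idp = FEDERATION_METADATA.get(institution)
    if idp is None:
        return None
    # RelayState records where the user was trying to go, so they can be
    # returned to the same article after signing in.
    return idp["sso_url"] + "?" + urlencode({"RelayState": return_url})


redirect = build_sign_in_redirect(
    "Example University", "https://publisher.example.com/article/123"
)
```

The important property is that the user can start anywhere, such as a search result or a link from a colleague, and the service provider can still route them to the right sign-in page without a proxied link.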
Proxy servers are also increasingly problematic given the drive for all websites to move to https in order to protect user privacy from snooping by governments, ISPs, and malicious actors. To work in an https environment, a proxy server has to decrypt the stream of information from a resource provider’s site, modify the contents to add proxied links, and then re-encrypt the information using its own SSL certificate before sending back to the user. The very same process is applied in reverse to requests sent from the user’s browser to the resource provider’s site which potentially contain the user’s personal data such as email addresses and passwords. Not only does this expose a weak point of vulnerability at the proxy where the user’s personal information is present in clear text, it also acclimatizes the user to the very patterns a hacker would use to stage a man-in-the-middle attack, and causes complex configuration challenges for those managing and supporting proxy servers.
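A simplified sketch of the link-rewriting step shows why the proxy must see the page in clear text. This is not the algorithm of EZproxy or any other product; the proxy address, the regular expression and the function name are hypothetical, and real rewriting proxies handle many more cases (scripts, forms, cookies, relative URLs).

```python
import re
from urllib.parse import quote

# Hypothetical proxy login prefix; real deployments configure their own host.
PROXY_PREFIX = "https://proxy.example.edu/login?url="

# Matches absolute http(s) links in href attributes only (a deliberate simplification).
HREF_PATTERN = re.compile(r'href="(https?://[^"]+)"')


def rewrite_links(html: str) -> str:
    """Rewrite absolute links so that following them goes through the proxy."""

    def to_proxied(match):
        target = match.group(1)
        return 'href="' + PROXY_PREFIX + quote(target, safe="") + '"'

    return HREF_PATTERN.sub(to_proxied, html)


page = '<a href="https://publisher.example.com/a?b=1">Article</a>'
proxied_page = rewrite_links(page)
```

Because the function operates on the HTML itself, an https response has to be decrypted at the proxy before it can be rewritten and then re-encrypted for the user, which is the weak point described above.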
We are encouraged by work that has been done to allow EZproxy to act as a gateway between campus authentication systems and service providers using CAS or SAML, and see this as a promising path to support an incremental transition to federated authentication.
Myth 3: RA21 just wants to enable publishers to track users across each other’s platforms – Busted
In her article, Hinchliffe states that with Federated Identity:
…you could leave a data trail of both who you are and what resources (content and tools) you are using. Yes, that means your data could be potentially aggregated across platforms and combined with other datasets to create a more complete profile of you as a user. It is likely that you are already leaving trails of use data connected to the IP addresses of the devices that you use. With federated identity, the trail is connected to you and to the devices
This is wide of the mark on several fronts: First, federated authentication is not necessary to set up this kind of cross-site tracking, as any of us know who have experienced those annoying ads that follow us across the web once we have expressed an interest in buying a particular kind of product from a particular site. If they had wished to, scholarly resource providers could have set up exactly the same kind of tracking mechanism as used by the giant internet advertising networks. The fact that they have chosen not to do so, in the nearly two decades since Doubleclick promulgated this technology, demonstrates that there is limited if any commercial motivation for them to share this information, while the impacts to user privacy are likely unacceptable to users and the institutions that buy these resources.
Secondly, the SAML technology proposed for RA21 is different than the technology used by the major social network providers and it includes specific technical mechanisms to protect the user from cross-site correlation of their user data. As outlined earlier, federated authentication supports anonymous access should the identity provider and user so choose. And even when personalized access is desired, SAML provides a mechanism whereby a different opaque pseudonym is assigned for the same user to each service provider, specifically preventing data sharing and cross-correlation among service providers.
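One way an identity provider can produce a different opaque pseudonym for each service provider is to derive it from a keyed hash of the user’s internal identifier and the service provider’s identifier. The sketch below is an assumption-laden illustration of that idea, not the method mandated by SAML or used by any specific product; the secret, identifiers and function name are all hypothetical.

```python
import hashlib
import hmac


def pairwise_pseudonym(secret_salt: bytes, user_id: str, sp_entity_id: str) -> str:
    """Derive a stable identifier that is opaque and unique per (user, service provider)."""
    # A separator keeps the two fields distinct in this simplified example.
    message = (user_id + "|" + sp_entity_id).encode("utf-8")
    return hmac.new(secret_salt, message, hashlib.sha256).hexdigest()


# Hypothetical secret known only to the institution's identity provider.
SALT = b"institution-secret-kept-by-the-idp"

id_at_publisher_a = pairwise_pseudonym(SALT, "jsmith", "https://sp.publisher-a.example/shibboleth")
id_at_publisher_b = pairwise_pseudonym(SALT, "jsmith", "https://sp.publisher-b.example/shibboleth")
```

The same user always receives the same pseudonym at a given service provider, so personalization works, but the two providers hold unrelated values and cannot join their records on them without the institution’s secret.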
Myth 4: RA21 creates yet another username and password – Busted
Through the SAML protocol as described earlier, RA21 leverages a user’s existing institutional credentials and does not require the creation of publisher-specific usernames and passwords. The vast majority of users accessing scholarly resources from a campus or corporate network have very likely already signed into those networks using their institutionally provided credentials. RA21 seeks to enable a seamless and convenient experience where users who are already signed into their home institution are not prompted to re-enter their usernames and passwords.
In this context, RA21 is following the pattern all of us are now experiencing on the consumer Web, as websites increasingly offer the ability to log in using existing credentials (e.g. Google, Facebook, LinkedIn, etc.) in lieu of creating site-specific usernames and passwords. However, unlike the OpenID technology used in these cases, the SAML technology proposed by RA21 has built-in mechanisms for protecting user privacy, as we have already described.
Myth 5: RA21 is placing control of users’ identity in the hands of institutions and not the individuals themselves – Plausible
Roger Schonfeld makes the argument that:
The underlying question for modern authorization is about authentication of individual users and so authentication is increasingly about identity. As a result, RA21 is necessarily mucking around with issues of identity
However, this conflates the action of the individual proving their identity to their institution with the action of the individual disclosing their identity to the service provider. It is important to understand that there are two dimensions to the authentication problem for most scholarly resources. First, the resource provider needs to know if the user is a member of the institution’s authorized user community in order to determine if they should be able to gain access to institution-provided resources. Second, the resource provider (often) would like to offer features which rely on individual identity, such as personalization, recommendations and portability of settings and content when the user moves from institution to institution. These two dimensions of identity are in fact entirely orthogonal – RA21 is seeking to solve the institutional association problem, whilst Schonfeld’s proposal argues for a portable, decentralized, user-controlled solution to the second problem.
Schonfeld’s argument is a compelling one. However, his proposal is neither precluded by RA21, nor necessary to solve the problem that RA21 is trying to tackle. Only the institution can assert the individual’s membership of their community in a reliable, trustworthy manner, and therefore the solution for resource access authorization must necessarily involve the institution. However, the institution needn’t (and maybe shouldn’t) be involved in mediating the user’s disclosure of their individual identity to the service provider should the user so choose.
Those in Library leadership should use their positions to communicate their opinions on this topic to institutional IT managers, as Lisa Hinchliffe recommended, to ensure that library values are reflected in the way that identity management systems are set up and managed. Beyond defining a framework capable of supporting both anonymity and personalization, RA21 leaves how to manage those preferences to identity providers and to individuals themselves.
Myth 6: RA21 seeks to eliminate IP-based access – Confirmed
As many others have noted, IP addresses work extremely well when a user is on campus, but as soon as the user has shifted to a local coffee shop, their home, or an airport, they have to jump through multiple clicks to gain access to resources which they are entitled to, most of which require them to have already taken some action to install software or pre-register with a system.
The obvious starting point for RA21 then is to improve these so-called remote access scenarios, and that is where we expect initial adoption will take place. If this is as far as we get, then we will have made things better. However, we also have a hypothesis that it is the very inconsistency between the seamlessness of on-campus IP authentication and the complex steps that have to be taken to access resources off campus that makes resource access difficult today; in effect, we have trained users not to care about authentication for resources the institution has provided for them. If we are successful in solving the current usability problems with federated authentication, we believe that it will become second nature for users to use their institutional credentials to gain access to scholarly information resources, just as they use those credentials to gain access to administrative systems, collaboration tools and eLearning resources today.
So, yes, in the long run it is a goal to eliminate our community’s reliance on IP-based access. Not because of some sinister plot to compromise users’ privacy, nor because we don’t believe that users should have control of their personal information, but because the time has come to move away from the constraints of physical infrastructure. We need to separate the concepts of network addressing from user authentication, and to adopt technologies that have been designed, over many years, to specifically address the problems that we are trying to solve. We don’t expect this to happen quickly. We also realize that we will have to work openly and collaboratively with all sectors of the community for it to happen at all. However, we do believe that this is the right thing to aim for, and that it is time to let go of the outdated and increasingly anachronistic reliance on IP addresses.
The user, whether they are a student, corporate researcher, or faculty member, is at the center of RA21. The ultimate goal of the project is to produce a set of best practices for institutional authentication so that the user can access resources they have rights to access, regardless of their location, device or starting point on the web. The current pilots serve as testing grounds to determine the technical options which best preserve privacy while offering a simpler, more consistent user experience. We look forward to working with a broad set of stakeholders to test, learn and gain feedback on these proposals as we step forwards towards implementation.