Disclosure: This post was jointly written by Todd Carpenter, Heather Flanagan and Chris Shillum, three members of the Resource Access for the 21st Century (RA21) leadership team.

Recent posts on Scholarly Kitchen from Lisa Janicke Hinchliffe and Roger Schonfeld have expressed concerns about the direction of, and motivation behind, the RA21 project. Both of these posts risk promulgating misunderstandings about the admittedly complex technologies involved. We wonder whether they are also projecting broader fears and concerns about changes in technology, and their implications for scholarly communication, onto this initiative.

Anyone who has used the internet knows that access control and identity management are fraught with problems, not only for users of online services, but for providers as well. How many passwords do you have for how many different sites? How often do you have to reset passwords you have forgotten, or credentials that have expired? System developers have to tread a fine line between security, usability, privacy, control and access, and often have to make trade-offs between security and ease-of-use. These challenges have been discussed, debated, and argued over for a very long time in the online community and huge amounts of effort have been expended to develop technologies to solve some of the problems.

Access control for scholarly information resources sidestepped these issues for years. After initially trying to use usernames and passwords for access to online systems, and realizing they were unwieldy for everyone involved to administer and use, resource providers adopted IP addresses as stand-in credentials for access to networked resources. Essentially, it was presumed that if you were on a campus or business network, you should be authorized to gain access to resources to which the campus subscribed. This made a certain amount of sense when users had to plug wires into the wall to get access to the Internet and when they did most of their work on campus. However, methods of accessing internet and digital resources have evolved. With the growth in mobile devices, remote working, and the expectation that information resources can be accessed from anywhere, at any time, from any device, these assumptions have become more and more problematic.
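
The IP-based check described above can be illustrated with a minimal sketch. The registry, the institution names, and the function name are assumptions made for illustration only, and the address ranges are reserved documentation ranges rather than real campus networks.

```python
import ipaddress
from typing import Optional

# Hypothetical registry: subscribing institution -> registered network ranges.
# These are reserved documentation ranges, not real campus networks.
SUBSCRIBER_RANGES = {
    "Example University": [ipaddress.ip_network("192.0.2.0/24")],
    "Example Corp": [ipaddress.ip_network("198.51.100.0/25")],
}


def institution_for_ip(client_ip: str) -> Optional[str]:
    """Return the institution whose registered range contains client_ip, if any."""
    address = ipaddress.ip_address(client_ip)
    for institution, networks in SUBSCRIBER_RANGES.items():
        if any(address in network for network in networks):
            return institution
    # The request came from outside every registered range (for example a
    # coffee shop or home network), so the address says nothing about
    # whether the user is entitled to access.
    return None
```

The only input to the decision is the network address, which is why this approach stops working as soon as an entitled user leaves the campus network.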

RA21 aims to solve these problems once and for all, by promoting a modern, standards-based access management system, which will meet the needs and expectations of users familiar with the seamless interactions of the consumer web, while preserving the privacy and user control that is rightly expected in a scholarly setting. It is important to dispel some myths so that we can move on from the outdated and anachronistic world of IP-authentication.

Myth 1: IP authentication is inherently privacy preserving while federated authentication technologies are not – Busted

RA21 is proposing the adoption of a federated authentication system based on a technology called SAML to authorize users’ access to institutionally provided resources. We are building on this technology specifically because it has inherent mechanisms, both technical and legal, to protect privacy and put the user and their institution in control of what personal information, if any, is released to the service provider.

On the technical front, SAML can provide the exact same degree of anonymity as IP-based authentication. Most service providers, including platforms such as ScienceDirect, Wiley Online Library and ACS Publications, have supported anonymous authentication via Shibboleth for years. In this model, the SAML protocol allows the user’s institution to make a secure, trusted assertion that the user is a member of its authorized user community without disclosing any specific information about that individual. In fact, this mechanism provides more privacy protection than IP access control, as in some circumstances IP addresses can be traced back to individuals, as evidenced by a ruling from the Court of Justice of the European Union.

There is a critical point to be understood here: while the user may have signed in individually to their campus or corporate ID management system, knowledge of that user’s individual identity can remain within the institution and doesn’t have to be shared with the service provider. This is exactly analogous to what happens when a user signs in as an individual to a proxy server.
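
A minimal sketch of this idea follows, assuming a hypothetical release policy held by the institution. The record type, the policy table, the service provider identifier, and the function name are all illustrative and are not part of SAML or Shibboleth. The entitlement string is the value commonly used in academic federations to signal library-licensed access, but treat its use here as an assumption too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectoryRecord:
    """What the institution knows about a signed-in user. It stays at the institution."""
    username: str
    email: str
    affiliation: str


# Entitlement value commonly used to signal "covered by library license terms".
LIBRARY_ENTITLEMENT = "urn:mace:dir:entitlement:common-lib-terms"

# Hypothetical release policy: which attribute names each service provider receives.
RELEASE_POLICY = {
    "https://sp.example-publisher.org": {"entitlement"},
}


def assertion_attributes(user: DirectoryRecord, sp_entity_id: str) -> dict:
    """Return only the attributes the policy allows for this service provider."""
    allowed = RELEASE_POLICY.get(sp_entity_id, set())
    candidates = {
        "entitlement": LIBRARY_ENTITLEMENT,
        "affiliation": user.affiliation,
        "email": user.email,
    }
    # Anything not explicitly allowed is withheld, so an unknown service
    # provider receives nothing at all.
    return {name: value for name, value in candidates.items() if name in allowed}
```

In this sketch the user has signed in individually, yet the service provider learns only that the user is entitled to access, which mirrors the anonymous case described above.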

Hinchliffe is right to say that the institution can also choose to pass attributes to the service provider that are specific to the individual. These can then be used to provide additional convenience to users by allowing personalized features on service provider sites to be linked to a single institutional login. Conceptually, this is exactly the same as a service provider allowing the user to create a local account on their system, as most have done for years, but without the need for separate usernames and passwords. RA21 believes that this should always be done in an open and transparent manner with the full consent of the user via a registration process. We agree that work needs to be done to set norms and establish best practices in this area, building on efforts under way at Internet2, Duke University, and elsewhere on the Scalable Consent framework. We hope to advance these best practices as RA21 moves forward towards implementation.

As a community, we need to raise awareness of misconfigurations of the type observed by Lisa, whereby campus ID systems incorrectly state that service providers require specific personal information to provide access, when in fact they do not, so that these can be resolved and eliminated. We anticipate a future phase of RA21 supporting the roll out of the new standard that includes a focus on educating institutions about best practices and breaking down the often rigid silos between libraries and campus IT departments that lead to these misunderstandings.

The concept of federated identity management was invented in the research and academic community close to 20 years ago. Alongside the technology, a model for building a fabric of trust has been established, based around the idea of identity management federations. Federations are typically organized geographically. To join one, identity providers and service providers must agree to a set of practices and policies such as those embodied in the US-based InCommon federation’s Participant Operating Practices. These understandings are backed by legal agreements which the participants must sign with the federation operator.

The combination of the technical and legal protections already in place in the research and academic identity management community means that the starting point for RA21 is dramatically different from that on the consumer web where, with services such as Google and Facebook, the user is generally the product, and all information is considered ‘fair game’. When the purpose is purely to support authorization to a service, privacy has a far better chance of being preserved.

When it comes to individual identity, issues around consent are actually very clear. The user should be able to consent to any sharing of personal information such as their name or email address – those items that are useful for personalization, but not fundamentally necessary to the authorization transaction. Not only is informed consent required by the GDPR, the forthcoming EU legislation, it is the right thing to do and RA21 is committed to this. Considerable work has already been done in analyzing the impact of GDPR on current practices in academic identity federations around the world.

Myth 2: Proxy servers work just fine as a solution for off-campus access – Busted

Many libraries have turned to proxy services, such as EZproxy, to solve the problem of off-campus access. These services have a huge installed base and in particular have been very good at integrating with a wide variety of campus ID systems and patron authentication services to ensure that the correct set of users can gain access to external resources. However, they also present significant problems in configuration and management, and fail to address changing patterns of resource access effectively.

One of the major difficulties faced by users when navigating the world of scholarly information resources is the need for authorization at the point-of-access. Users typically reach content provider sites from points such as Google, PubMed, references in other articles, and links sent by colleagues. From these starting points, users move from system to system among an array of resource providers and research workflow tools. They are essentially starting from anywhere and going to anywhere in their journey to access the most relevant and useful information.

Proxy servers just don’t work in this scenario; the fundamental assumption behind URL rewriting proxy servers such as EZproxy is that the user starts their research journey on the institutional portal, and can therefore follow a “proxied link” to the relevant information resources. If the user arrives at a content provider without starting in the right place, the content provider has no way to know where the user is from and therefore whether they should be granted access. Federated authentication solves this problem by allowing the user to tell the service provider where they are from, so that the service provider can point the user back to their institution to sign in. This is only possible because of the centrally managed metadata distribution services that identity federations provide. However, the “Where are you from” user experience today is inconsistent and difficult to use. This is the core problem that RA21 is trying to solve.
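
The “Where are you from” step can be sketched roughly as follows. This is a simplification under stated assumptions: the metadata table, the endpoint URL, and the `return` query parameter are hypothetical, and a real SAML exchange sends a signed authentication request rather than a bare redirect.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical slice of federation metadata: institution -> sign-in endpoint.
# In practice this information is distributed by the identity federation.
FEDERATION_METADATA = {
    "Example University": "https://idp.example.edu/sso",
}


def sign_in_redirect(institution: str, return_url: str) -> Optional[str]:
    """Build the URL that sends the user back to their institution to sign in."""
    endpoint = FEDERATION_METADATA.get(institution)
    if endpoint is None:
        # The institution is not in the federation metadata, so the service
        # provider has nowhere trustworthy to send the user.
        return None
    # Remember where the user was headed so they land on it after sign-in.
    return endpoint + "?" + urlencode({"return": return_url})
```

Because the user names their institution at the point of access, the flow works regardless of whether they arrived from Google, PubMed, or a link sent by a colleague.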

Proxy servers are also increasingly problematic given the drive for all websites to move to https in order to protect user privacy from snooping by governments, ISPs, and malicious actors. To work in an https environment, a proxy server has to decrypt the stream of information from a resource provider’s site, modify the contents to add proxied links, and then re-encrypt the information using its own SSL certificate before sending it back to the user. The very same process is applied in reverse to requests sent from the user’s browser to the resource provider’s site, which potentially contain the user’s personal data such as email addresses and passwords. Not only does this expose a point of vulnerability at the proxy where the user’s personal information is present in clear text, it also acclimatizes the user to the very patterns a hacker would use to stage a man-in-the-middle attack, and causes complex configuration challenges for those managing and supporting proxy servers.
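
The link-modification step can be sketched as below. The proxy hostname, the list of proxied hosts, and the dash-based naming scheme are assumptions chosen for illustration and are not a description of how any particular proxy product rewrites links.

```python
import re

PROXY_SUFFIX = "proxy.example.edu"  # hypothetical proxy host
PROXIED_HOSTS = {"www.publisher.example.org"}  # hypothetical licensed resource


def proxied_host(host: str) -> str:
    """Map a resource host onto a name under the proxy's own domain."""
    return host.replace(".", "-") + "." + PROXY_SUFFIX


def rewrite_links(html: str) -> str:
    """Point links to licensed resources at the proxy; leave other links alone."""
    def swap(match: "re.Match[str]") -> str:
        host = match.group(1)
        if host in PROXIED_HOSTS:
            return "https://" + proxied_host(host)
        return match.group(0)

    # The proxy must see the page in clear text to do this, which is why it
    # has to decrypt and then re-encrypt https traffic.
    return re.sub(r"https://([A-Za-z0-9.-]+)", swap, html)
```

The rewrite only works on content the proxy can read, so every page and every form submission passes through it unencrypted for a moment.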

We are encouraged by work that has been done to allow EZproxy to act as a gateway between campus authentication systems and service providers using CAS or SAML, and see this as a promising path to support an incremental transition to federated authentication.

Myth 3: RA21 just wants to enable publishers to track users across each other’s platforms – Busted

In her article, Hinchliffe states that with federated identity:

…you could leave a data trail of both who you are and what resources (content and tools) you are using. Yes, that means your data could be potentially aggregated across platforms and combined with other datasets to create a more complete profile of you as a user. It is likely that you are already leaving trails of use data connected to the IP addresses of the devices that you use. With federated identity, the trail is connected to you and to the devices

This is wide of the mark on several fronts. First, federated authentication is not necessary to set up this kind of cross-site tracking, as anyone knows who has experienced those annoying ads that follow us across the web once we have expressed an interest in buying a particular kind of product from a particular site. If they had wished to, scholarly resource providers could have set up exactly the same kind of tracking mechanism as used by the giant internet advertising networks. The fact that they have chosen not to do so, in the nearly two decades since DoubleClick promulgated this technology, demonstrates that there is limited if any commercial motivation for them to share this information, while the impacts on user privacy are likely unacceptable to users and the institutions that buy these resources.

Secondly, the SAML technology proposed for RA21 is different than the technology used by the major social network providers and it includes specific technical mechanisms to protect the user from cross-site correlation of their user data. As outlined earlier, federated authentication supports anonymous access should the identity provider and user so choose. And even when personalized access is desired, SAML provides a mechanism whereby a different opaque pseudonym is assigned for the same user to each service provider, specifically preventing data sharing and cross-correlation among service providers.
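
One way an identity provider could derive such per-provider pseudonyms is sketched below. The keyed-hash construction, the secret, and the function name are assumptions for illustration, not the algorithm of any specific SAML implementation.

```python
import hashlib
import hmac


def pairwise_id(secret: bytes, user_id: str, sp_entity_id: str) -> str:
    """Derive an opaque identifier that is stable per user and per service provider."""
    # Mixing the service provider's identifier into the keyed hash means the
    # same user gets an unrelated value at each provider, and the value
    # cannot be reversed to the user's id without the institution's secret.
    message = f"{user_id}|{sp_entity_id}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()
```

Two service providers comparing notes would hold different values for the same person and have no shared key to join on, while each provider still sees a stable identifier it can use for personalization.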

Myth 4: RA21 creates yet another username and password – Busted

Through the SAML protocol as described earlier, RA21 leverages a user’s existing institutional credentials and does not require the creation of publisher-specific usernames and passwords. The vast majority of users accessing scholarly resources from a campus or corporate network have very likely already signed into those networks using their institutionally provided credentials. RA21 seeks to enable a seamless and convenient experience where users who are already signed into their home institution are not prompted to re-enter their usernames and passwords.

In this context, RA21 is following the pattern all of us are now experiencing on the consumer Web, as websites increasingly offer the ability to log in using existing credentials (e.g. Google, Facebook, LinkedIn, etc.) in lieu of creating site-specific usernames and passwords. However, unlike the OpenID technology used in these cases, the SAML technology proposed by RA21 has inbuilt mechanisms for protecting user privacy, as we have already described.

Myth 5: RA21 is placing control of users’ identity in the hands of institutions and not the individuals themselves – Plausible

Roger Schonfeld makes the argument that:

The underlying question for modern authorization is about authentication of individual users and so authentication is increasingly about identity. As a result, RA21 is necessarily mucking around with issues of identity

However, this conflates the action of the individual proving their identity to their institution with the action of the individual disclosing their identity to the service provider. It is important to understand that there are two dimensions to the authentication problem for most scholarly resources. First, the resource provider needs to know if the user is a member of the institution’s authorized user community in order to determine if they should be able to gain access to institution-provided resources. Second, the resource provider (often) would like to offer features which rely on individual identity, such as personalization, recommendations and portability of settings and content when the user moves from institution to institution. These two dimensions of identity are in fact entirely orthogonal – RA21 is seeking to solve the institutional association problem, whilst Schonfeld’s proposal argues for a portable, decentralized, user-controlled solution to the second problem.

Schonfeld’s argument is a compelling one. However, this is neither precluded by RA21, nor necessary to solve the problem that RA21 is trying to tackle. Only the institution can assert the individual’s membership of their community in a reliable, trustworthy manner, and therefore the solution for resource access authorization must necessarily involve the institution. However, the institution needn’t, and maybe shouldn’t, be involved in mediating the user’s disclosure of their individual identity to the service provider should the user so choose.

Those in library leadership should use their positions to communicate their opinions on this topic to institutional IT managers, as Lisa Hinchliffe recommended, to ensure that library values are reflected in the way that identity management systems are set up and managed. Beyond defining a framework capable of supporting both anonymity and personalization, RA21 leaves how to manage those preferences to identity providers and to individuals themselves.

Myth 6: RA21 seeks to eliminate IP-based access – Confirmed

As many others have noted, IP addresses work extremely well when a user is on campus, but as soon as the user has shifted to a local coffee shop, their home, or an airport, they have to jump through multiple clicks to gain access to resources which they are entitled to, most of which require them to have already taken some action to install software or pre-register with a system.

The obvious starting point for RA21 then is to improve these so-called remote access scenarios, and that is where we expect initial adoption will take place. If this is as far as we get, then we will have made things better. However, we also have a hypothesis that it is the very inconsistency between the seamlessness of on-campus IP authentication and the complex steps that have to be taken to access resources off campus that makes resource access difficult today; in effect, we have trained users not to care about authentication for resources the institution has provided for them. If we are successful in solving the current usability problems with federated authentication, we believe that it will become second nature for users to use their institutional credentials to gain access to scholarly information resources, just as they use those credentials to gain access to administrative systems, collaboration tools and eLearning resources today.

So, yes, in the long run it is a goal to eliminate our community’s reliance on IP-based access. Not because of some sinister plot to compromise users’ privacy, nor because we don’t believe that users should have control of their personal information, but because the time has come to move away from the constraints of physical infrastructure. We need to separate the concepts of network addressing from user authentication, and to adopt technologies that have been designed, over many years, to specifically address the problems that we are trying to solve. We don’t expect this to happen quickly. We also realize that we will have to work openly and collaboratively with all sectors of the community for it to happen at all. However, we do believe that this is the right thing to aim for, and that it is time to let go of the outdated and increasingly anachronistic reliance on IP addresses.

The user, whether they are a student, corporate researcher, or faculty member, is at the center of RA21. The ultimate goal of the project is to produce a set of best practices for institutional authentication so that the user can access resources they have rights to access, regardless of their location, device or starting point on the web. The current pilots serve as testing grounds to determine the technical options which best preserve privacy while offering a simpler, more consistent user experience. We look forward to working with a broad set of stakeholders to test, learn and gain feedback on these proposals as we step forwards towards implementation.

Todd A Carpenter

Todd Carpenter is Executive Director of the National Information Standards Organization (NISO). He additionally serves in a number of leadership roles of a variety of organizations, including as Chair of the ISO Technical Subcommittee on Identification & Description (ISO TC46/SC9), founding partner of the Coalition for Seamless Access, Past President of FORCE11, Treasurer of the Book Industry Study Group (BISG), and a Director of the Foundation of the Baltimore County Public Library. He also previously served as Treasurer of SSP.


42 Thoughts on "Myth Busting: Five Commonly Held Misconceptions About RA21 (and One Rumor Confirmed)"

Thank you, Todd, Chris, and Heather, for this post. I’m really pleased that RA21 is taking on clearer public communication about its work and its vision, and I’m so pleased to have helped provoke this important discussion.

I want to focus on what you identify as the “plausible” myth 5, and I’d like to cut to the heart of my concern. We all agree that RA21 focuses on access control and authorization. But my concerns about the broader use of individual identity (for personalization, analytics, and other purposes) are not, as you’ve suggested, “orthogonal.”

The RA21 model assumes that activity data generated across, for example, Elsevier properties stays with Elsevier. Given scale effects, the current majors have the most activity data today and will continue under the RA21 model to do so. My concern is only orthogonal because you have defined the scope of your efforts in this way.

It was a serious mistake to scope RA21 as you have, in a way that benefits the interests of incumbents like Elsevier, rather than the broader needs of the community. I would have expected entities that are intended to work on behalf of the broad community, such as NISO and STM, to take this opportunity to level the playing field, rather than reinforce the interests of incumbent majors.

I am still waiting for RA21 and its sponsors to make the commitment, as I called for in my piece, to develop a user-centric level playing field empowering users to manage their own identity and data. Are RA21 and its sponsors prepared to make that commitment?

Individual management of identity, which is the premise of your argument, is certainly a laudable goal overall on the Internet. The challenge with your suggested approach is multi-fold. Core to this problem, which is also core to what RA21 is trying to address, is that the subscription relationship is between the content provider and the library, not the content provider and the individual patron. The process of authentication therefore must include a query and response of some type to validate the individual’s relationship with the subscribing institution. The content provider needs to validate the user is authorized to access the site and that authorization can only come from the institution.

Now, there are a variety of initiatives that are exploring ways in which individuals can manage their own identity. These efforts are presently very modest in scale and face several challenges, both technological and social. These systems work best in domains where there is a direct relationship between the supplier (content provider in this case) and the user. These solutions haven’t worked out how an intermediary validation of identity assertions could work. In this context, the identity federation model, upon which RA21 is built, appears to be the best current solution to this authorization challenge. A core principle of the initiative was that we didn’t want to build out entirely new technological solutions to solve these issues. If we were to build out a new system based on untested individual identity management principles, we would have had to overcome both the systems adoption issues in thousands of institutions and the social barriers of getting patrons to adopt an entirely new model of access control. To do both simultaneously would have ensured the project would have taken even longer in its roll-out, if it would have been successfully adopted at all.

The model of SAML attribute exchange using federated identity management services run by trusted third parties is something that has worked for nearly two decades. It has been adopted by thousands of institutions and is in production in the overwhelming majority of institutions. It is technology that most institutions’ IT departments are familiar with and can adapt to quickly with a minimum of systems implementation issues. Many publishers and publisher service providers have already implemented it, but it hasn’t seen as broad adoption and use for subscription access, particularly in US-based institutions.

Now, let’s presume that there was an individual identity model, whereby each individual would have persistent identity credentials that are shared with content providers. This approach would add to the tracking capabilities of content providers, not diminish them, and therefore diminish the potential for privacy protections. Having a trusted third party identity provider allows for the selective attribute release and for the resetting of identity tokens, which further masks user identity to the content provider.

There was a conversation in the early stage of the project about other technologies that could be used to solve some of these issues, such as OAuth or OpenID Connect, but it was quickly determined that an appropriate technology (i.e., SAML) existed and that we could build on it. If a more robust technology for identity management becomes widely accepted on the wider internet, it is likely that publishers will adopt it. Realistically, a new user-centered identity management, or a verifiable online identity management world, is not in the near term technological future. It would be a huge mistake for the publishing and library communities to try to push that forward on their own. Beyond that, as Lisa noted, the library often doesn’t control the institutional IT infrastructure that supports online identity and therefore couldn’t push out the new model even if it desired to do so.

Todd, thank you for this. RA21 seems destined to reinforce the existing system rather than to provide a set of solutions that allow experimentation, flexibility, and competition. I hope this is neither your intention nor that of NISO, but let me provide a few examples where I read your comment this way.

“the subscription relationship is between the content provider and the library” — yes, it’s true, but there are many efforts to build alternatives, and RA21 doesn’t give them a level playing field.

“an individual identity model, whereby each individual would have persistent identity credentials that are shared with content providers… would add to the tracking capabilities of content providers, not diminish them” This might well be true if the initiative were led by commercial publishers, but there is nothing technically intrinsic that demands this be the case.

“There was a conversation in the early stage of the project about other technologies that could be used to solve some of these issues…If a more robust technology for identity management becomes widely accepted on the wider internet, it is likely that publishers will adopt it.” If the scholarly community is faced with the choice between SAML and “wider internet” solutions, count me in as a supporter of SAML. The identity solutions on the “broader internet” seem determined by the tracking models which support Google’s and Facebook’s advertising and AI empires. But that doesn’t mean that we should support a solution likely to have much the same outcome for industry majors.

The “core” of what I and many other community members are most interested in is not authorization of subscription resources, important though this is, but rather in building up a set of standards to encourage experimentation, flexibility, and competition. I note there still is no commitment to providing a level playing field. I am disappointed but not surprised.

Roger – could you please explain specifically how RA21 is not seeking to create a level playing field in resource access management? I understand your comments are mainly about activity tracking; however, that is not the problem we are seeking to address. I understand that you also think we have scoped our problem space incorrectly; however, we specifically scoped the problem the way we did because we believe we have a reasonable chance of success that way. We are aiming to solve a problem that all segments of the community – resource providers, libraries, identity management folks and end users – have identified for many years as a common problem, and we are trying to build an effort in which all of those segments can participate to solve the problem. There are many other groups that have been, and can be, formed to solve other problems of mutual interest; it is not incumbent on RA21 to solve all problems in the scholarly communication space, nor is it realistic or feasible to expect this.

Chris S, I appreciate the question. The scoping of the problem was made by who exactly and how exactly? Are there minutes from those meetings and lists of participants?

I argued plainly last month that the community should understand the issue as identity management broadly, not access management narrowly. Of course any effort that is led by the commercial majors will find that a scope with a “reasonable chance of success” leaves broader identity management issues alone, thereby avoiding the creation of a level playing field for the gathering and exploitation of activity data. It may be that this issue never even crossed anyone’s mind consciously, but the outcome is the same. The scope of the effort and the decision about what has a reasonable chance of success are in the interest of some more than others.

I would emphasize that I recognize the difficulty in these efforts. It’s not easy to balance the interests even within a commercial publisher trying to address access management for licensed content without doing collateral damage to its emerging data and workflow strategy — let alone if we are to balance this against the interests of other kinds of publishers, scholars who value academic freedom, and libraries that value privacy. The simple solution may be at hand, to be sure, but I don’t find it to be a balanced one. And I really don’t understand who set the scope and why anyone outside of the commercial majors would find it to be reasonable.

Thank you Todd, Heather, and Chris for continuing this conversation.

Let’s start with my appreciation that you are confirming my suspicion that it isn’t just the proxy server that RA21 is seeking to eliminate but IP authentication writ large.

I hope every librarian, library administration, small platform, publisher, etc. sees this sentence “RA21 Seeks to Eliminate IP-based Access – Confirmed” and has a serious conversation about how this will impact their strategic directions, the implications for staffing and workflows, etc.

I’ll admit that I’m a bit puzzled why Todd called my title “sensationalized” when it turns out that it is indeed a statement of fact (https://scholarlykitchen.sspnet.org/2018/01/16/what-will-you-do-when-they-come-for-your-proxy-server-ra21/).

As to all the details you provide re privacy – yes, SAML *can* be used in ways that obscure identity, etc. But there is nothing inherent in the SAML approach that requires it to be used to do so. That will only occur through policy, regulation, etc. These things require constant vigilance and, as we in the USA have recently experienced with the elimination of net neutrality, can be altered fairly rapidly. We know that there are already places that are passing private information along that the platforms do not want, as I as well as others have documented. This means that RA21 has selected an approach that contains, inherently within it, the mechanisms for disclosure of personal information. That it seeks to do so for not just off-campus but even on-campus access should cause us all great concern. We can all hope that platforms and institutions will act in privacy-protecting ways (and empower user control) but SAML opens the door to privacy violations that are not possible with IP authentication. This is, of course, why I immediately accepted the opportunity to serve on the RA21 privacy working group when invited to do so. If policy is going to be all that protects users against privacy-violating technical possibilities, we need strong policy that becomes a community norm and expectation, not merely a set of suggestions.

I look forward to our continued discussions and work together on this topic.

Regarding the ‘confirmed’ myth (you may be thinking ‘oxymoron’, I couldn’t possibly comment) 6 – RA21 SEEKS TO ELIMINATE IP-BASED ACCESS.

IP authentication (and proxies/VPNs) is not perfect, but it is used across the web; many an organisation will only allow access to internal systems if you are either on the corporate network or using a VPN.

When the University I then worked for implemented Shibboleth/SAML in the mid-noughties we chose, like many, to implement it alongside the existing IP authentication/EZproxy option. Shibboleth was a great innovation.
If a user tried to access content (e.g. an article) from on campus, or from a University-controlled URL, they were utilising the IP route; if they accessed a resource remotely from, say, a Google search, the Shibboleth route could be used. Win-win.

I would say today, most people accept they will often need to sign in to many systems (though they will reasonably expect systems provided by an organisation to use a consistent set of credentials). For one project I am on, the team uses Basecamp; when I see an email telling me there is a new document to look at, I don’t blink as I’m prompted to authenticate. I don’t question this.

Follow link -> authenticate -> access; is a well established model. I think few would complain at this.

However, Shibboleth/SAML (and I appreciate we are talking RA21, which I understand tries to improve on it) has struggled to achieve this.

When a typical undergrad would google (naturally) for an article from their room, they would face a page telling them in large letters that they do not have access. It would typically then tell them they just need to pay $30, probably in a box taking up most of the screen. Perhaps it would offer username/password fields underneath, which only worked if you had a login for that particular site. Surely most encountering this would either give up, or try to enter their institutional credentials into those fields and then get no further.

They did not know that they needed to look at the top left, or top right, in small font, for the term, signin, or ‘shibboleth signin’ or ‘federated signin’ or ‘institutional signin’. There they might have to select the type of ‘signin’ they wanted to use and click on something like ‘federated’; next they might have to select their federation (or better, country), and finally their institution. One publisher, for years, would then, post authentication, take the user to the publisher’s platform homepage instead of the article.

‘WAYF-less URLs’ (which would take the user straight to their own authentication page) were technically possible, though complex: they required specialist knowledge that was different for each publisher, plus technical skill and time to create just one such URL, and they would often break without notice. How many articles are in a typical discovery system?

UX, then, has been an issue, and I understand RA21 takes steps to solve some of this.

A quick aside about privacy: I agree with a lot of what this article says, though I do worry there is a slope we may end up going down. Again, using Shibboleth as an example, one publisher required an attribute others did not, no exceptions; do we simply agree, or potentially block access to something we have paid for? As noted, making it clear to the end user what is being shared is vital, though not always easy when the only screen they may see is a generic institutional authentication screen.

I will also add that in the future we can expect the reader not always to be human, and IP authentication makes it very easy for non-humans to access the content on a site. While some platforms have their own API, these are inconsistent. And to my (limited) knowledge, access by machines via SAML authentication is non-trivial.

Back to the point: IP/proxy provides a remarkably simple user experience. It does require URLs to be manipulated, but in an easy way; simply prefix a set string in front of the URL.
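
To make the prefixing concrete, here is a minimal Python sketch. It is an illustration only: the proxy hostname `ezproxy.example.edu` is a made-up placeholder, the helper name `proxied` is hypothetical, and the `login?url=` form is the commonly seen EZproxy starting-point pattern, which a given institution's setup may vary.

```python
# Hypothetical proxy prefix (assumption); every institution has its own hostname.
PROXY_PREFIX = "https://ezproxy.example.edu/login?url="


def proxied(url: str, prefix: str = PROXY_PREFIX) -> str:
    """Return the URL routed through the institutional proxy."""
    # Avoid double-prefixing a link that is already in proxy format.
    if url.startswith(prefix):
        return url
    return prefix + url


# Placeholder publisher URL, used only for illustration.
print(proxied("https://www.example-publisher.com/article/123"))
```

The point is simply that the transformation is a string concatenation; the hard part in practice is getting the prefix in front of every link a user might follow.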

It probably also makes access to content easier for those who either shouldn’t have it or are breaking the terms of the licence (though certainly in the UK, most licences allow anyone on campus to access content). And it’s this broad access which publishers are uncomfortable with.

When I’ve discussed RA21 with colleagues in Higher Education, the presumed driver is not one of the myths listed above, but that large (STM) publishers wish to rein in perceived liberal access (especially in a world of Sci-Hub and similar) to their content. It would be interesting to explore this ‘myth’ (?).

Of course, RA21 on its own would not achieve this greater control of access; a second step needs to happen: the demise of IP access.

This can happen in two ways: RA21 is so compelling, beneficial, and problem solving that customers naturally move to exclusively use RA21. Or, publishers can insist/force/incentivise the switching off of IP access.

One of these is desirable; the other seems unacceptable. While I see benefits in RA21, my personal view is that the decision to stop providing access via IP authentication should be in the hands of the institution, not the supplier.

Let’s not forget as well that even publishers that already currently provide a SAML-based login option can take steps (hopefully inadvertently?) to set things up to deny the pathway to library-based access. For example, this issue with ScienceDirect – https://lisahinchliffe.com/2017/12/21/logout-of-your-elsevier-account-to-get-library-access/ (which I just confirmed is still failing to provide a pathway to library subscription based access, more than six weeks after I made this problem known publicly). This problem does not manifest itself if I am on-campus or using the proxy server/VPN using ScienceDirect via IP authentication.

Lisa – thanks for pointing out the current issue in ScienceDirect. I raised this with the ScienceDirect team when you first mentioned it to me, and they are working on a short-term solution. Interestingly (for identity management geeks like me), this is a good illustration of a platform not handling the orthogonality between individual identity and institutional association correctly, and is a good example of what makes this seemingly simple problem space so complex.

Chris, I can’t imagine that you are suggesting that appending proxy information to the beginning of a URL is a user friendly solution! It’s fine when writing a subject guide, but completely unrealistic for an average user in their day to day work.

Regarding the “myth” you ask about, I don’t think anyone denies that concerns about piracy are among the top reasons that publishers are so invested in RA21. I wrote about this angle a little bit in my piece linked at the beginning of today’s post, this one: https://scholarlykitchen.sspnet.org/2018/01/22/identity-everything/

We know that a lot of the users of Sci-Hub have legal access to the content they’re instead getting via piracy. But for many, it’s easier to use Sci-Hub than to sign in to one’s library subscription (particularly from off campus). So one of the big goals of RA21 is to improve the user experience to the point where it’s easier to use the legal channels than the illegal ones. As Steve Jobs famously said, you beat piracy by outcompeting it.

I think there should be serious thought about whether RA21/SAML is up to this task. Many platforms already support Shibboleth and there are many OpenAthens clients. Are they finding this SAML access that much easier than Sci-Hub? Sure doesn’t seem like it …

Chris – thanks for your comment. I think your analysis is very astute, and you are correct, we are primarily trying to address the UI challenge of making access as seamless as it is with SaaS resources such as Basecamp, while addressing privacy concerns that these kinds of resources don’t necessarily worry about.

It is absolutely our aim to make RA21 so compelling, beneficial, and problem solving that customers naturally move to exclusively use RA21 and it seems very unlikely indeed that service providers will insist/force/incentivise the switching off of IP access if we don’t succeed.

Sorry if I missed something, but how does RA21 address the walk-in user access issue?
Thanks!

Seems to me, in light of the confirmed goal that RA21 has of eliminating IP authentication, walk-in user access will likely also be eliminated?

This is addressed in the RA21 FAQ: This is definitely an area which will need special attention. Using federated access does not imply using only login credentials. Institutions will be free to use different authentication methods for different classes of users. They could, for example, use smart cards, one-time access codes, certificates installed on library workstations, or even continue to use the IP address authorization for on-site usage, or may transition to a guest account service. It is up to the library itself to make these decisions. https://ra21.org/index.php/what-is-ra21/faq/#What_does_this_mean_for_the_walk-in_user_at_a_library

Thanks Chris. It is helpful to have you think aloud and see that indeed walk-in use will likely go away if RA21 achieves its vision – if not because of direct assault on that function per se but because it will be a casualty of the new environment not architecting it in. And, it sounds like publishers won’t be disappointed if they can finally exclude walk-in users.

I’ve not encountered complaints from publishers about walk-in user access. Is this something you’ve frequently run into?

I think it is more that they have been resigned to it than complaining about it. As Chris said, “I’ve often felt that many library vendors only allow walk-in use because it would be difficult for most libraries to fully exclude it in an IP authentication scheme.” Chris goes on to say, “I suspect most vendors would not allow IP authentication or other backup methods for this use case.” So, I’m feeling confident in my conclusion that they won’t be disappointed to stop walk-in use (and maybe I’ll go further and say that I suspect they may even be pleased?).

Honestly, I don’t think it’s on the radar of most publishers, and there’s not great concern nor a hidden agenda here. Many publishers went out of their way to help support a system for providing public access to research papers in public libraries: http://www.accesstoresearch.org.uk/

Lisa, that’s not what I mean at all. I am saying that we will need to carefully think about how to preserve support for the walk-in use case as we take RA21 forward. As I outlined, there are several ways I think this could be done.

Oops … there are two people named Chris commenting here AND my reply got threaded under the wrong one. My response is to Chris B. not Chris S. Re: Chris S. – while the FAQ may say that, today’s blog post reveals that the end game is no IP based authentication and though there may be work-arounds Chris B.’s comment shows why it is unlikely to occur and hence my comment that walk-in access will be a casualty of RA21 rather than directly assaulted.

Lisa – no worries; threading on the SK platform doesn’t appear to work very well! There is nothing in the RA21 proposal that prevents an institution from using IP access locally to authenticate walk-in users, and then passing that authentication information to the service provider platform as an anonymous SAML assertion. This is one of the ways in which I think we can continue to support walk-ins. Further discussion is required to figure out the best approach, which will likely vary by institution.

I think one of the issues here is that once a dominant authentication scheme is in place, it will extend far beyond the major scholarly publishers. For user experience’s sake, we don’t want to have one login process for Elsevier content, another for EBSCOhost, a third for LexisNexis, and a fourth for a small trade publisher. So while David is right to point out that many of the STM publishers currently involved in this effort would be willing to accommodate alternative authentication schemes for guest access, I know others would not. In the ebook space, EBL pushed a unique EZproxy configuration that required login on campus and Safari tech books is moving toward an approach that disallows walk-ins. LexisNexis has often been unfriendly to walk-ins in their standard licenses, and their new Nexis Uni product emphasizes individual accounts. Products that focus on legal or medical users have often pushed for individual account creation, requiring a layer on top of or in place of proxy authentication. If RA21 becomes the dominant authentication scheme for academic libraries, it will be more difficult to push back against these attempts to eliminate walk-in use.

I want to take a moment to consider what this will be like for walk-in users. I’ve often felt that many library vendors only allow walk-in use because it would be difficult for most libraries to fully exclude it in an IP authentication scheme. While the RA21 FAQ (https://ra21.org/index.php/what-is-ra21/faq/#What_does_this_mean_for_the_walk-in_user_at_a_library) states that institutions could just use a different login scheme, this is easier said than done. I suspect most vendors would not allow IP authentication or other backup methods for this use case. I would also wager that developing an entirely separate authentication method for guest users is not high on the to-do list for overworked and resource poor IT departments at a lot of libraries and universities. We’ve worked hard to keep walk-in users in our license agreements, but I wonder if we’ll be able to continue to make that work with RA21.

I note with interest the suggestion that RA21 is following in the pattern of Google, Facebook, and LinkedIn. Given the significant security issues vexing Facebook at the moment – despite the extraordinary financial and technical assets they possess to combat same – that is a concern.

I say that because, despite the legal protections unique to the publishing industry, I think it is clear that one of the largest pirates to hit the publishing industry clearly feels that all information – legally protected or not – is very much “fair game”.

That last point is worth considering, given that much of the content “hosted” on major pirate sites has been obtained by hacking the personal credentials of Shibboleth and related username and password services. Technical and legal protections may abound, but they are expensive and recent history suggests that they are a poor defence in and of themselves.

It’s also important to note that the implications of such theft don’t stop with the publishers. Libraries on a global basis are heavily impacted given that these hacks are affecting their computer systems and the theft of other content like personal research and records. We have also seen viruses placed on computers and personal information taken. This is virtually all via credentials and proxy servers and not via on campus access.

The point is not to say that RA21 is not a worthwhile endeavour to explore – it is. Rather it is a suggestion that an attempt to totally replace an existing system that is used globally by some 100,000 plus institutions is a big ask – particularly if the real issue is poor and insecure access while you are “at a local coffee shop, their home, or an airport”. Perhaps a better approach would be to work towards a parallel transition and not a declaration to “eliminate” one system versus another. Making these decisions on behalf of all publishers and all libraries around the world seems dictatorial and could face backlash.

When one looks at the fact that this is an issue impacting libraries and researchers on a global basis – not just in North America and Northern Europe – it’s easy to see how tight budgets and issues like Open Access may well dictate that the tail of IP Address data is long indeed. Likely longer than print.

re: integrating walk-in users. Walk-in users are a good example of library patrons that sit outside standard organizational identity systems that issue credentials and integrate into federated authentication solutions such as Shibboleth. In our experience it’s pretty usual for libraries to have at least 2 identity sources in order to control access to subscribed resources (e.g. the local directory plus a more flexible, library-controlled system used for walk-ins, affiliated users etc). In one recent case, we integrated 3 sources, as students were managed separately from staff/faculty. The messy reality for most libraries is a need to continue to support multiple authentication systems – both those used internally to verify patron credentials (incl. walk-ins) and those used externally to authenticate with resource providers that can require a mixture of IP authentication, federated SSO, and legacy methods such as referral URLs and custom user logins. Identity and access is a very technical area, but there are solutions like ours out there that manage this complexity, just as there are for many other technically-demanding areas of library management.

One of the major goals for RA21 is streamlining the process of identifying a user’s institution, to avoid having to ask the same question repeatedly. The issues above inform how individual institutions cope with multiple identity sources in order to simplify the login process when you arrive.

This issue of multiple patron identity sources doesn’t get a lot of attention, but is a very real one as we try to move towards a federated authentication world. If anyone’s interested, we recently posted on the topic: http://www.liblynx.com/helping-libraries-suffering-from-multiple-identity-disorder/

It strikes me that implementing an RA21 system will make it easier for Sci-Hub to pirate articles and populate its own web site since only one identifier, provided by a faculty member, will be needed to access multiple resources. Or am I missing something?

Sandy – only one identifier is required now for bulk downloading via EZproxy. The problem for the content providers is that it can be difficult to detect a single user bulk downloading materials via a proxy. A proxy IP is expected to be the source of a lot of traffic, since the proxy is handling sessions for multiple users. If traffic through your proxy to Resource Provider X is typically 2000 PDFs per hour, it may not be apparent that 200 of those are bulk downloads from a single user. But if sessions can be tied to individual users (identified or anonymized, it doesn’t matter) then it becomes a lot easier for content providers to detect and cap bulk downloaders. There is also a benefit to the authenticating institution in this scenario, since when blocks are put in place for a proxy they affect everyone using it, not just the single bad apple.
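
To illustrate the difference, here is a small Python sketch using the numbers above. Everything in it is an assumption made for illustration: the event format, the session labels, the IP address (taken from a documentation range) and the threshold of 50 are invented, and real platforms will use their own logging and detection rules.

```python
from collections import Counter


def flag_bulk_downloaders(events, threshold):
    """Return session IDs with more than `threshold` downloads.

    `events` is an iterable of (proxy_ip, session_id) pairs,
    one pair per PDF download in the hour being examined.
    """
    per_session = Counter(session for _ip, session in events)
    return sorted(s for s, n in per_session.items() if n > threshold)


# Invented hour of traffic through one proxy IP: 1800 downloads spread
# over 600 ordinary sessions (3 each), plus 200 from a single session.
events = [("198.51.100.7", f"s{i % 600}") for i in range(1800)]
events += [("198.51.100.7", "s-bulk")] * 200

# Seen by IP alone, this is just the usual 2000 PDFs from the proxy.
per_ip = Counter(ip for ip, _session in events)
print(per_ip["198.51.100.7"])  # 2000

# Seen by session, the outlier is obvious.
print(flag_bulk_downloaders(events, threshold=50))  # ['s-bulk']
```

Note that the session label does not need to identify a person; an anonymized but stable identifier is enough for the per-session count to work.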

None of which should be taken to imply that I wholeheartedly support the switch away from IP authentication. I share a number of the concerns that have been expressed here. IP authentication has its problems but SAML as it currently stands will be far more of a burden to manage than a proxy server (and yes, I speak from many years of experience working with EZproxy).

Actually, federated authentication should make life harder for pirates for a couple of reasons.

Firstly, it raises the barrier for the credentials required to get fraudulent access. Under IP authentication, you simply need to find a way to get onto a campus IP range in order to masquerade as a legitimate user. Because most libraries either don’t require an individual login, or don’t correlate that login with activity, it’s very hard (or impossible) to trace unauthorized usage back to an individual. As there are many ways to get into a campus computer – ranging from walking into campus to hacking in remotely – there are lots of anonymous opportunities to get unauthorized access. With federated authentication, the only way in is for the fraudulent user to steal/borrow a valid set of personal network credentials (or abuse their own). As these credentials likely provide access to a wide range of other personal information (HR records? Employment details? Email?), users are likely to be more careful in sharing/protecting them. While no set of credentials is fool (or hack) proof, federated authentication at least restricts pirates to a much more limited set of options than those available under IP authentication.

Secondly, federated authentication makes it much easier to identify and shut down the source of fraudulent activity. Under IP authentication, the publisher is flying pretty blind. They see a lot of suspicious activity but may only be able to report the IP address associated with it to the institution. If the institution is using a proxy server, then detective work is required to work back from that to the real campus IP, the device associated with that IP address at that time, and the users who were logged into that device and so could potentially be the culprits (or not …). As this sort of investigation can be very time-consuming and require niche technical skills, the only way to stop the leakage in the short term is to shut down access from those IPs – which is why some high-profile institutions lose campus-wide access to various resources every now and then. With federated authentication, the publisher has (at a minimum) a unique, anonymous identifier associated with the suspicious usage that can be reported back to the institution, so the institution can then use it to identify the underlying credentials used to gain access. These credentials can then be shut off immediately while the institution investigates what happened – and without cutting off access to everyone else.
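
As a rough sketch of how such an anonymous identifier could work, consider the Python below. This is an assumption for illustration, not a description of what RA21 or any particular identity provider does: the keyed-hash derivation, the function names `opaque_id` and `trace`, and the usernames and provider labels are all hypothetical.

```python
import hashlib
import hmac


def opaque_id(secret: bytes, username: str, provider: str) -> str:
    """Derive a stable identifier that means nothing outside the institution.

    The identifier differs per provider, so two providers cannot
    compare notes, and it cannot be reversed without the secret.
    """
    msg = f"{username}|{provider}".encode("utf-8")
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def trace(secret: bytes, usernames, provider: str, reported_id: str):
    """Institution-side lookup: which account produced the reported ID?"""
    for user in usernames:
        if hmac.compare_digest(opaque_id(secret, user, provider), reported_id):
            return user
    return None


# Hypothetical institution data; the secret never leaves the institution.
secret = b"institution-only-secret"
accounts = ["alice", "bob"]

# What the publisher sees and can report back after suspicious usage.
reported = opaque_id(secret, "alice", "publisher-a")

print(trace(secret, accounts, "publisher-a", reported))
```

In this sketch the publisher only ever holds the opaque string, while the institution, holding the secret, can map a reported identifier back to one account and suspend just that account.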

Tim writes:

“Actually, federated authentication should make life harder for pirates for a couple of reasons… With federated authentication, the only way in is for the fraudulent user to steal/borrow a valid set of personal network credentials (or abuse their own)…”

This is also the case with authentication via proxy servers; pirates have already solved this problem. Phishing for credentials is easy.

“Under IP authentication, the publisher is flying pretty blind. They see a lot of suspicious activity but may only be able to report the IP address associated with it to the institution. If the institution is using a proxy server, then detective work is required to work back from that to the real…”

I would argue that this is intentional. We’re happy to be the ones doing the detective work on a handful of offenders as a trade-off for shielding the privacy of all our patrons. That isn’t broken, and we don’t want it “fixed”. Increasing third party access to identifying information is antithetical to library principles.

Limiting my comments at this point to just one aspect of RA21, I find the assertions of RA21’s ability to improve users’ discovery to access workflow to be shockingly disingenuous. The problems users have in accessing licensed resources outside of their library’s IP range and/or through third-party search tools are in no small part created by the same publishers who are proposing RA21 as a solution. It would be just as easy, if not easier, to use a WAYF-type process to connect users to their institutional proxy server as it would be to connect them to their institutional identity provider. That so few publishers even attempt to ease the use of this path speaks volumes about their true priorities.

Quite true Cody. I have noted in other conversations that JSTOR has implemented the WAYF-type process you describe, routing users through our proxy server. It would be great to see the commitment to ensuring user access demonstrated by publishers/platforms by their tools making use of the existing technology authentication infrastructure we have in place.

Hi there,
About Myth n°3, of course I doubt that Shibboleth will enable publishers to track users across each other’s platforms. But anyway third parties (including Google) can already track users due to publishers’ poor standards regarding advertising methods (how relevant/ethical is advertising in academic publications anyway?). The first five publishers appearing in the steering committee list – Elsevier, Wiley, ACS, Springer, IEEE (https://ra21.org/index.php/about/) – have websites that use the DoubleClick tracker, along with several other trackers. Eric Hellman wrote about it: https://go-to-hellman.blogspot.fr/2017/03/reader-privacy-for-research-journals-is.html

To complicate things further, for institutions and their users in countries outside of the US, stricter privacy laws need to be taken into account. For example, any personal information collected to authenticate at a Canadian institution would need to be stored on a server residing in Canada. Or alternatively, privacy assessments would need to be done to get permission granted for data to be stored internationally, and the result of these assessments would vary by institution. The definition of what is considered personal information is not always clear and in some cases could even include a single email address. At this time many of the big players in the market, EBSCO for example, do not even have a single server residing in Canada.

Hi there,

OpenAthens carries out end user experience (UX) research to ensure UX is a priority through the authentication process. The lessons are always valuable and often, quite sobering. Quotes from our most recent research in the UK include:

“Recommend reading lists include links to articles I can’t access”
“If I’m asked to log-in I don’t read the article”
“You need research skills to find the good articles”

I don’t think these comments will surprise many, if any.

As a SAML identity and access management service we know we have a part to play in solving these problems. We are using SAML to ensure our service is compliant with GDPR and is flexible enough to respond to the requirements of the end user, the library and the publisher – across many different transactional and policy models. The needs of a corporate research team are different from those of undergraduates – but many of the resources and platforms are the same.

No single technology can solve the problem. Nor a single actor or agency.

This is why RA21 is such an important, collaborative initiative. It has brought together many – if not all – of the important contributors in this space and started a discussion that will not end when this chapter of the RA21 story closes. Change is constant and the industry must continually adapt and evolve to meet the needs of the communities it serves – especially the end users.

While I agree IP authentication for remote access is overdue for replacement, and that we need an “access from anywhere” type of solution in its place, I’ve looked at the RA21 solutions and forgive me but they seem still quite a ways off. It’s also not clear to me that any solution continuing to require WAYF “work” by the user, even if it’s somehow easier than current WAYF, is the right way to go.

Proxy servers don’t have to use IP, they can be made to work seamlessly with federation, and they do what they do quite well. Perhaps throwing away the proxy server is throwing away the baby with the bathwater. At the same time I agree with the authors that federation solutions if configured correctly can do a good job at protecting user privacy.

While the library browser toolbar idea of the 2000s may have died out, other types of browser add-ons like password management services have been tremendously successful. Zotero’s browser add-on has the interesting and very useful mechanism of noticing and “remembering” licensed resource URLs, so that if the end user so chooses, Zotero can automatically put those links into proxy format whenever encountered by the end user. What if we could put EZproxy’s configuration (but not authentication) into the browser itself, identified with the user’s institution? Truly there is nothing “secret” about EZproxy stanzas. This might take Zotero’s idea a step further. If you kept IP involved, it might only be telling publishers a single IP – that of the institutional proxy server.

Before this thread turns into yesterday’s news, I wanted to make the following points:

1. I agree with one of the commenters that makes the point that in a fully open-access world, the particular problem of how do you provide access changes considerably. Publishers (and authors) will want to know who is reading their stuff, but if it is truly open, then many of the hard problems evaporate.

2. I totally agree that the IP based and proxy based models are out of touch with how people read these days. Again, as more things move to open access, these challenges melt away.

3. As to Roger Schonfeld’s challenge to provide users with more control about how they choose to share their identity with publishers, which is complicated by the fact that in a non-open system, the contractual relationship is with the institution not the individual, I wonder if the clever folks at RA21 have considered borrowing from some of the architecture of anonymizing technologies like TOR (https://en.wikipedia.org/wiki/Tor_(anonymity_network)), which, by creating a series of hops that have only limited information about the requestor, masks the identity of the requestor from the provider. Such an architecture would make it impossible for the institution to share individual information with publishers while still certifying that the individuals are part of the institution.

4. Not to be overly cynical, but it does not give me great confidence to see that the chairs of this effort come from Elsevier, Wiley, and the American Chemical Society. To the extent that this particular effort is intended to strike a balance between the competing interests and needs of the various stakeholders, I suspect that including other stakeholders in the process to speak to needs surrounding privacy and data protections (the EFF perhaps?) might provide us with greater confidence in the outcomes.

No matter how much publishers might fancy RA21 and the end of IP-based access to journals, books and databases – IP authentication is used for many other purposes as well. In my institution, it is the first step to prove somebody is eligible to access certain information. In a few cases (e.g. the menu of the canteen) it is the only step; in most cases more steps follow.
This means that IP authentication will still be there, even if libraries and publishers should consent to abandon it or if publishers should decide to do so unilaterally. From that follows that all costs associated with any new procedure come on top. My question is: Who will pay for that? A German quote says “If you order the music, you have to pay for it.”

“they also present significant problems in configuration and management…”

To this point, as an EZproxy admin who works with several other systems, I have to say EZproxy is one of the easiest products to manage in IT. Many of the troubles in the past with configuration and management are due to vendors updating their platform with little or poor notice to customers and OCLC. This has improved some. This is not to say the EZproxy documentation itself is amazing, but it too has been improving.

As Cody in the comments suggested, I have often wondered why a WAYF-type process on the vendor’s site couldn’t point back to our EZproxy server. Many vendors already have our proxy information in their client center, which could easily be used to construct the needed URL, send the user to the authentication system, and route the user back to the vendor’s website.
