What Will You Do When They Come for Your Proxy Server?

I’m starting to see the end-game of the STM/NISO RA21: Resource Access for the 21st Century project. And, dear reader, I’m a little unsettled by it.

I mean, it sounds innocent enough – even laudable:

RA21’s mission is to align and simplify pathways to subscribed content across participating scientific platforms. RA21 will address the common problems users face when interacting with multiple and varied information protocols.

But, have we really thought through the implications of what is being pursued?

18th Century Painting — *The Man Who Pretends to be Asleep While the Thief Enters His House Becomes Drowsy and Really Falls Asleep, Folio from a Kalila wa Dimna, 18th century, image via The Metropolitan Museum of Art*

Most librarians are very familiar with how challenging it can be to get to the PDF from a citation. Roger Schonfeld has done great work raising these issues. When I watch this video of his attempt to gain access to an article, I sometimes have the Ride of the Valkyries running along as a soundtrack in my head (start at 6:45 for the specific demonstration).

As a PhD student, I experience my own frustrations with this regularly. Even though I am a librarian who knows how our systems are configured, I can find myself clicking through to dead-ends and frustrated with embargoes, content that is published but not yet processed for the platform via which my library subscribes to it, DOIs that aren’t yet registered, etc. It is particularly egregious when a platform, such as Elsevier’s ScienceDirect in this case, prevents users from using established pathways to library provided access. I find myself spending hours of time to retrieve a minimal number of sources.

Libraries put a great deal of effort into supporting and managing links, openURL resolvers, proxy servers, etc. I am responsible for managing the A-Z and subject database listings on our library website. I know how challenging and time-consuming even that aspect alone can be. So, of course librarians are interested in publisher efforts to “align and simplify pathways to subscribed content across participating scientific platforms.”

But, is the approach chosen by RA21 what we meant?

Publishers, libraries, and consumers have all come to the understanding that authorizing access to content based on IP address no longer works in today’s distributed world. The RA21 project hopes to resolve some of the fundamental issues that create barriers to moving to federated identity in place of IP address authentication.

Authorizing access based on IP address no longer works? Honestly, it seems to work rather well much of the time. Seamless. So seamless our users often don’t realize it is happening! Okay, I’ll grant that off-campus access is where the IP authentication system does often break down if users do not go through the library website and its proxied links. But, is “federated identity” across platforms the solution we want?
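To make concrete what IP authentication does in the seamless on-campus case, here is a minimal sketch. It is not any vendor's actual implementation. The address ranges are reserved documentation addresses standing in for a hypothetical campus, and the function name is my own.

```python
import ipaddress

# Hypothetical campus ranges (reserved documentation addresses, not a real campus).
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_authorized_by_ip(client_ip: str) -> bool:
    """Return True if the request comes from a registered campus address."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in CAMPUS_NETWORKS)


# On campus: recognized with no login and no personal identifier.
print(is_authorized_by_ip("192.0.2.57"))
# Off campus: not recognized unless the request is routed through the proxy.
print(is_authorized_by_ip("203.0.113.10"))
```

The platform in this sketch learns only that a request came from the institution. That is the source of both the seamlessness and the anonymity, and it is also why access fails off campus without the proxy.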

I have been encouraging librarians to take a more active interest in RA21 for more than a year. At this point in the conversation many have said to me, “what does that even mean?” Great question!

Federated Identity (and Privacy)

Here’s my understanding. When fully realized, it means that by logging in once, you would be recognized on all participating platforms, which means you could leave a data trail of both who you are and what resources (content and tools) you are using. Yes, that means your data could be potentially aggregated across platforms and combined with other datasets to create a more complete profile of you as a user. It is likely that you are already leaving trails of use data connected to the IP addresses of the devices that you use. With federated identity, the trail is connected to you and to the devices. An analogy is how one can use a Google login to access not only your Gmail but also Dropbox, Asana, etc., and then Google is able to build a profile of you as a user by integrating the data from your activities across platforms and tools.

Such federated tracking is unlikely to be fully developed in the initial RA21 projects and the most pernicious form would require publishers to collaborate in data sharing in ways that they currently are not inclined to do. But, I think there is every reason to anticipate such technologies could be created in a fairly short period of time should those sentiments shift. (For those desiring a more detailed technical explanation of the RA21 projects, I recommend Aaron Tay’s Understanding Federated Identity, RA21 and Other Authentication Methods. The potential for an aggregated data trail is seen most easily in the WAYF Cloud project.)
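The aggregation risk described above turns on whether every platform receives the same identifier for a user. The following is a rough sketch under stated assumptions, not a description of what RA21 or any campus actually sends: the secret, the platform names, and all function names are hypothetical. It contrasts a shared identifier with an opaque identifier derived separately for each platform.

```python
import hashlib
import hmac

# Hypothetical secret held only by the institution's identity provider.
IDP_SECRET = b"example-secret-known-only-to-the-campus"


def shared_identifier(user: str) -> str:
    """The same identifier is sent to every platform (e.g., an email address)."""
    return user


def pairwise_identifier(user: str, platform: str) -> str:
    """An opaque identifier that is stable per platform but differs across platforms."""
    message = f"{user}|{platform}".encode("utf-8")
    return hmac.new(IDP_SECRET, message, hashlib.sha256).hexdigest()


def can_link(id_at_a: str, id_at_b: str) -> bool:
    """Two platforms can join their usage logs only if they hold the same identifier."""
    return id_at_a == id_at_b


user = "reader@example.edu"
print(can_link(shared_identifier(user), shared_identifier(user)))
print(can_link(pairwise_identifier(user, "platform-a"),
               pairwise_identifier(user, "platform-b")))
```

With the shared identifier, two platforms could combine their records about the same person. With per-platform identifiers they could not do so from the identifier alone. Which of these a user gets is a local configuration choice made at the institution.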

Reading a little further into the RA21 website, there is a set of guiding principles for the initiative and it is good to see privacy mentioned:

The solution will be consistent with emerging privacy regulations, will avoid requiring researchers to create yet another ID, and will achieve an optimal balance between security and usability.

This is essentially a statement that says the federated identity solutions will follow the privacy laws that they are required to follow. Not too surprising and relatively good news for users in the European Union (EU) and less good news if you are in the United States, though it is likely that platforms would implement the stronger EU requirements uniformly for ease of management and scalability. Nonetheless, these regulations only go so far for user control because RA21 considers the institution, not the individual, to be the owner of the data. It’s not entirely clear to me that the EU regulations will support that interpretation.

I hope that the discussions may shift to the importance of building in mechanisms for user control. According to Todd Carpenter, fellow Scholarly Kitchen Chef and NISO Executive Director, RA21 is exploring adopting the NISO Consensus Principles on Users’ Digital Privacy in Library, Publisher, and Software-Provider Systems. This would be a welcome addition to the RA21 privacy framework.

A side note here: I acknowledge that the SAML approach embraced by RA21 is more privacy-protecting than, for example, adopting a Google or Facebook OpenID option. It is not, however, more privacy-protecting than IP authentication.

Eliminating IP Address Authentication

Many librarians who are concerned about user privacy have said to me that their solution will be to refuse to implement federated identity and insist on staying with IP address authentication. I’m not confident that is going to be an option.

First, campus technology units are likely to prefer federated identity solutions to IP authentication as identity-based solutions offer greater account and network security. These units do not seem to share the library’s commitment to user anonymity and minimal data sharing. In fact, more than one publisher/platform has privately confirmed to me that campus identity systems pass along more user information than they need or would like to receive. I recently watched as a campus technology SAML/Shibboleth system passed a user’s email address, full name, and staff/staff status to a vendor in order to allow access to a PDF from off-campus when on-campus access would have been possible based on IP address alone.

Second, publishers and platforms will likely prefer identity-based authentication mechanisms. Again, identity-based systems offer greater account and network security than IP authentication, which is widely seen as a weakness that Sci-Hub exploits in pirating content. Also, users are already regularly tracked by vendors via tools like Google Analytics, regardless of the authentication process. Given the voracious appetite for analytics and metrics in higher education and publishing, combined with concerns about security and license compliance, platforms have every reason to want to move not only off-campus access to federated identity but on-campus access as well. I anticipate that publishers will eventually begin to craft licensing agreements that require identity-based authentication, making explicit that they no longer offer IP authentication.

Ultimately, is the long-term goal of RA21 to eliminate IP authentication altogether? When asked this question directly, the response is consistently about how long it will take and not a denial of the long-term goal. It should probably not surprise us if an initiative that is working to “resolve some of the fundamental issues that create barriers to moving to federated identity in place of IP address authentication” is aligned with a long-term goal to implement federated identity globally.

Impact on Libraries and Publishers

How would the elimination of IP authentication change the marketplace? Smaller publishers may be unable to support identity-based login themselves, potentially driving them to contract their content to a larger platform or to purchase authentication services in some way. This will raise expenses for smaller publishers without returned value and likely drive greater consolidation of scholarly publishing platforms.

Libraries will find themselves no longer able to offer the seamless user experience that IP address authentication provides much of the time. Given the importance of user-centered discovery and delivery, increasing friction in access to resources will be a disappointing step backwards. Libraries will be forced to devote increasing amounts of staff time to training and troubleshooting identity-based accounts and this will be particularly acute if on-campus IP authentication is eliminated.

Strategies and Next Steps for Libraries and Publishers

One might wonder if we are past the tipping point with RA21 and the march toward federated identity through commercial platforms. Is it too late to create a user-centric alternative managed by a trusted third party and developed by libraries? I’d like to think not but I’m worried it might be. Perhaps there are alternative efforts underway of which I’m just not aware?

Nonetheless, it would be a bit unsatisfying to close this blog post without some thoughts about next steps. So let me suggest some strategies that every library can implement locally and immediately:

Reach out to the campus technology unit that manages identity-based authentication systems (e.g., InCommon or OpenAthens) and engage in an ongoing discussion about privacy, user control, minimal sharing of identifiable data, etc., with the goal of developing local principles to guide data release.
Watch carefully for licensing terms that dictate user data sharing requirements for access to content and be prepared with responses. If IP authentication is no longer an option, seek to minimize the user data that is demanded in exchange for user access.
Review library privacy policies to make certain that the library is transparent about what data is being passed to third-party systems and what alternatives users have if they want to try to opt-out of data sharing and tracking.
Regularly use library resources without using IP address authentication to monitor the user experience of identity-based authentication and the messaging from platforms to users. Some librarians who have told me they will refuse to implement federated identity actually work at institutions that have already implemented SAML-based InCommon or OpenAthens for access from vendor sites. In such cases, the librarians had not realized this because they themselves only access library resources on-campus, over VPN, or through the proxy server.
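As an illustration of what “local principles to guide data release” might look like once written down, here is a hedged sketch of a default-minimal release rule. The attribute names come from the eduPerson schema commonly used in academic federations. The policy structure, the vendor name, and the function are hypothetical and are not the configuration syntax of Shibboleth, OpenAthens, or any other product.

```python
from typing import Dict, Set

# Hypothetical directory record held by the campus identity provider.
USER_RECORD: Dict[str, str] = {
    "mail": "reader@example.edu",
    "displayName": "Example Reader",
    "eduPersonScopedAffiliation": "staff@example.edu",
    "eduPersonEntitlement": "urn:mace:dir:entitlement:common-lib-terms",
}

# Hypothetical local principle: release only what is needed to authorize access.
DEFAULT_RELEASE: Set[str] = {"eduPersonEntitlement"}

# Hypothetical per-vendor additions agreed between the library and campus IT.
VENDOR_EXCEPTIONS: Dict[str, Set[str]] = {
    "vendor-with-personalization.example": {"eduPersonScopedAffiliation"},
}


def attributes_to_release(vendor: str, record: Dict[str, str]) -> Dict[str, str]:
    """Return only the attributes the local policy allows for this vendor."""
    allowed = DEFAULT_RELEASE | VENDOR_EXCEPTIONS.get(vendor, set())
    return {name: value for name, value in record.items() if name in allowed}


print(attributes_to_release("any-publisher.example", USER_RECORD))
print(attributes_to_release("vendor-with-personalization.example", USER_RECORD))
```

In this sketch a vendor receives an entitlement saying the user is covered by the license and nothing else, unless an exception has been deliberately recorded. Email address and name are never released because no rule names them.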

As for smaller publishers that may not be following these issues as carefully as some of the industry heavyweights that are staffing the RA21 workgroups and setting policy, they should:

Investigate how authentication affects use of their content and platform at a tactical level and impacts their business model at a strategic level.
Recognize that authentication will affect not only discovery and access to content resources but also other parts of the research workflow, where just a few companies are consolidating major new businesses.
Monitor the overall landscape of available and in-development options from technology platform providers both in the regular course of business and through strategic inquiries during any upcoming RFP processes.

As for me, I’ll be honest. I believe that IP authentication is going away, maybe not immediately but relatively soon. Pragmatically, I’m focusing my efforts on influencing what comes next. I’ve joined the RA21 project and am serving on the privacy and security work group with a view to advocating for user control. To give credit where credit is due, I have found the members of the work group to be very welcoming of my perspective and participation.

Lisa Janicke Hinchliffe

Lisa Janicke Hinchliffe is the Founder and Principal of Librarian in the Loop LLC, providing independent advisory services helping scholarly publishers and platforms access trusted library perspectives to inform their decisions. She retired as the Professor/Coordinator for Research Professional Development in the University Library and affiliate faculty in the School of Information Sciences, European Union Center, Center for Global Studies, and Center for Social and Behavioral Science at the University of Illinois at Urbana-Champaign.

Discussion

39 Thoughts on "What Will You Do When They Come for Your Proxy Server?"

Great piece, Lisa. I’m at a conference right now where, just yesterday, I heard a presentation about blockchain that elicited similar thoughts. I definitely see the benefits, but I wonder whether we’re considering all the potential downsides.

By Rick Anderson
Jan 16, 2018, 7:54 AM

Great post, Lisa! Provocative on a subject that needs more attention. There is a working example of SAML in the UK where JISC manages Shibboleth for all academic libraries in an environment where the EU has far more stringent regulations on general privacy matters than the US. https://www.shibboleth.net/index/basic/

As a result I’m more concerned about issues in the user’s workflow that are seldom reported to librarians. The anecdotes we hear are merely the tip of the iceberg. If we consider the volume of students who don’t do their research in the library and those who live off campus, we have a failed connection. Simply clicking on an emailed link to an article is frustrating. In an era of seamless access, these problems need to be addressed. I’m glad you’re involved in RA21 as am I.

By judy luther
Jan 16, 2018, 9:10 AM

One more reason why open access is the better answer.

By David Lewis
Jan 16, 2018, 10:00 AM

I’m not sure that makes a difference here. We get everything for free from Google, Facebook and Amazon, yet they all track every move we make on the internet. Privacy is the concern here, not payment for access.

By David Crotty
Jan 16, 2018, 11:00 AM

David’s comment is on the mark.

While I appreciate Lisa’s thoughtful analysis, the academic and scientific community gave a very clear definition of what “Research Access in the 21st Century” was to look like, some 15 years ago with the declarations of Budapest (2002), Bethesda (2003) and Berlin (2003).

They called it open access.

Given the power of SciHub, it is easy to understand why the STM publishers would create such a defense mechanism to fortify their paywalls and protect the monopoly of their content platforms, but researchers write scientific articles for impact, and limited access means limited impact.

If our collective goal is to support researchers in advancing science, our time and energy is better spent on developing open access publishing models that ensure articles are open and re-usable and that the costs associated with their dissemination are transparent and economically sustainable.

By Colleen Campbell
Jan 16, 2018, 12:05 PM

Again, I’m a bit confused here. Did you read the post beyond the phrase “Research Access in the 21st Century”? The post is about user privacy, and that is a problem, even on platforms where content is freely available and reusable. Even open access journals need a hosting platform. As I noted in the comment above, we have incredible invasions of our privacy happening on platforms like Facebook and through companies like Google. Both of these offer free content to users, yet have few restrictions on surveillance.

Open access is NOT the answer here — it’s irrelevant to the question being asked.

By David Crotty
Jan 16, 2018, 12:13 PM

While it is true that even open access publications need a platform, they do not inherently need authentication. While I would assume that they would do IP-based tracking, that is not the kind of identity-based tracking that is a concern with RA21 models.

By Lisa Janicke Hinchliffe
Jan 16, 2018, 12:21 PM

You’d think, but given the move that publishers are making into becoming “data analytics companies”, and given the increasingly fierce competition for paying authors, I’d be willing to bet that even OA publishers are interested in collecting user demographics and doing targeted marketing. In an all OA world, do you really think that Elsevier, Springer Nature and Wiley would willingly let go of all that valuable data?

By David Crotty
Jan 16, 2018, 12:28 PM

In a world of open access, I’m sure they will do everything they can to create services that will offer value sufficient to entice users to make accounts. Nonetheless, in such a world, there is no requirement to authenticate in order to read the content.

By Lisa Janicke Hinchliffe
Jan 16, 2018, 12:57 PM

Since open content does not require authentication, this issue can go away. The open content provider, especially if it is a library or similar institution, will have very different incentives than a commercial publisher.

The post points out an important problem that many of us have not focused on. Thanks.

By David Lewis
Jan 16, 2018, 1:27 PM

Yet as open access grows, the “content provider” is increasingly a commercial company, as the big commercial publishers are now dominating the open access market. You may not be required to authenticate your credentials to receive access any more, but this does not mean that the host platforms will be unable to track you effectively. I didn’t need to authenticate myself to look at a pair of shoes on a shoe store website, and now I see advertisements for those shoes when I look at a newspaper website. Is this sort of surveillance acceptable for academic researchers based on what they’re reading?

By David Crotty
Jan 16, 2018, 1:39 PM

There may be no legal requirement, but I suspect there will be de facto requirements. You can contact all of your friends and find out what they’re up to without creating a Facebook account. Billions have chosen to sacrifice privacy for convenience instead. Many (most?) journals offer up article metrics these days (downloads, altmetrics, citations) on the abstract version of the paper, no authentication required. Yet millions have signed up to ResearchGate to track these very things. Tying the question of privacy and scholarly anonymity to open access does not solve the problems that are widespread everywhere else in the digital world (see this for example: https://www.washingtonpost.com/news/innovations/wp/2018/01/15/big-brother-on-wheels-why-your-car-company-may-know-more-about-you-than-your-spouse/). Even in an open access world, it is important to have standards and requirements for the platforms that will host those open access articles, otherwise we can expect a great deal of abuse.

By David Crotty
Jan 16, 2018, 1:36 PM

You are raising important points David C. Let me bring it into even sharper relief …. there is the danger that, in an OA world without institution based authentication, publishers might shift to use e.g., your Google or Facebook account or the like as the basis for account creation on their site. I am hopeful, based on some evidence we can already see though, that they will use ORCID, which would keep us away from “you looked at an article about libraries, would you like to buy this “I love libraries” totebag from Amazon?”

By Lisa Janicke Hinchliffe
Jan 16, 2018, 1:45 PM

Thanks! I think what I’m trying to say is that just making everything “free” does not solve the surveillance issues.

And I’m really glad that you (and other librarians) are involved in RA21 because I really want to solve the struggles that our readers often face, but don’t want to do so at the expense of the important principles of the library community. I’m hoping a fair balance between convenience and privacy can be found.

By David Crotty
Jan 16, 2018, 2:13 PM

Lisa,
Thank you for the thoughtful consideration of the RA21 effort, in which, in full disclosure, I am an active participant and a member of the leadership. I’d like to offer several clarifications about the effort and the SAML system upon which it is being based.

It seems that there are a number of misconceptions about SAML and the administration of SAML systems, which lead to some confusion amongst the library community. First, the amount of information that is exposed to a content provider using a SAML authentication system can range from nearly nothing to very robust metadata about the patron. These settings are managed by the identity provider, i.e., the institution, NOT the publisher who receives the access token. The authentication token an identity provider issues can expire at any time that the identity provider wishes to set; therefore, the tracking of individuals using that token might not last beyond the time of a single session. Actually, some systems have a token duration of as short as an hour by default, which is less time than most research sessions could last. If the token disappears and is reissued, then from the content provider’s perspective there is a new user, so the ability to track an individual user’s behavior is disjointed from an analytics perspective. This, again, is a setting controlled by the institution. Within the entire system here, there is a balance between making the system secure and functional and making it as user-friendly as possible. It is also a balance between the interests of the library and the IT system administrators. The ability of the publishers to manage this is actually rather small.

In large part, the RA21 effort is aimed to bring access to subscribed resources in line with the vast majority of restricted-access content on the web. People have been aware of the problems with IP-based authentication and have discussed moving to new forms of authentication for years. Publishing and libraries are among the last major hold-outs in this regard. Anyone familiar with network security would say (and I’m simplifying here) “you’d no more want to use IP based authentication in a modern environment than you would to bank using a computer running windows 95.”

The privacy and control issues you describe are critically important principles that the RA21 group agreed on early on. This is one of the main reasons why RA21 is NOT using OpenID for this purpose. Sure, the project could have suggested using OpenID (i.e., login with your FB or Google or twitter account to verify your identity), but this a) creates the very tracking you object to and b) allows those commercial identity providers to track the user as the OpenID system is constantly pinged to validate their ID, which leaves a trail of login requests across platforms. This seems even less secure or likely to protect patron privacy.

I have made the point in a variety of publications about RA21, which you seem to agree with, that there needs to be an active and robust conversation about privacy and identity provision by the institution between the library and the campus IT, not a discussion between the content providers and the library. This is a far more challenging conversation for the library, I think. The RA21 and federated ID systems can be as privacy protecting as the identity providers and systems admins want them to be. Federated ID is, however, significantly more secure and there are a variety of reasons to head in this direction.

Another concern you mention, that there could be a centralized store of patron usage behavior from which publishers could generate analytics, presumes that publishers would be willing to share usage data from publisher to publisher or platform to platform. I guarantee that this is NOT GOING TO HAPPEN. There are few data that publishers cherish more and are more protective of than user behavior data, to the extent that they have any of it.

Finally, the trend for consolidation because of the technology needs to serve web content has impacted smaller publishers for years. Authentication systems are one cost of running your own web delivery system, which is a costly proposition. IP auth is not without its own costs. Running a web-based publishing operation is already expensive and most publishers are contracting with third parties for the publishing stack in one form or another, be it a large publisher (commercial or not), a vendor like Highwire or Silverchair, or an aggregation like BioOne or Project Muse. I don’t doubt this is a driver of consolidation, but SAML-based authentication won’t change that calculation. The cost is already too high for most small publishers. RA21 will not accelerate this process.

I agree with your bullet list of ideas and discussions that the library community should advance. These suggestions apply regardless of the success of RA21 and the movement away from IP-based authentication.

By Todd Carpenter
Jan 16, 2018, 10:51 AM

Thank you for your comments, Todd. I appreciate your contributions to this conversation as well as the topics more generally.

Much of your commentary around privacy hangs off this: “These settings are managed by the identity provider, i.e., the institution, NOT the publisher who receives the access token.” That is indeed the case. That does not mean, however, that the RA21 initiative is unable to engage in setting expectations around the decisions of institutions and how they should align with the RA21 commitments to privacy and usability. After all, institutions/libraries are as much stakeholders in NISO processes as are publishers I believe?

While it appears that there was a decision at some point for RA21 to only address practices of publishers/platforms (I suspect perhaps because the project was originally set out by STM alone – which does not have the broad stakeholder community that NISO does), RA21 could also set expectations about what it means for an institution/library to be “RA21 compliant” – for example, just as the NISO Privacy Principles set out expectations for libraries/institutions and not just the platforms.

As a simple example of why this might even be in the interests of publishers, I know of at least one SAML implementation that currently tells users that the publisher is requiring personal information as part of accessing the publisher’s content when in fact the publisher is not requiring that information. This means users are mis-informed about the publisher’s policies (and it can also create confusion about why a user is not receiving personalized services from the publisher in exchange for their personal information). If I were the publisher, I’d want to be able to influence how my system’s requirements are presented to users even if I couldn’t absolutely control the identity provider settings directly. RA21 is positioned to address this by conceptualizing the identity provider as inside the sphere of concerns for RA21 rather than deflecting concerns about privacy by saying that they are outside of the control of publishers/platforms.

By Lisa Janicke Hinchliffe
Jan 16, 2018, 12:48 PM

To the extent that RA21 is a joint initiative of stakeholders, I completely agree with your comments about the need to set industry norms here, which is why I agree with your overall suggestions about what to do next. Unfortunately, libraries weren’t early participants in the RA21 initiative, which we’ve sought to remedy as the project progressed. This has led to some unfortunate narrowing of scope, which it is possible to extend (and even to revisit some baseline assumptions, if there is interest) to include a more library-focused view of some elements. For example, the movement in the direction of adopting the Privacy Principles is welcome. Additional norm setting around authentication systems would be another valuable direction for further work.

However, the post starts off in a direction to decry publisher efforts to push changes in the authentication system, but ends by calling on libraries to exert greater control over that which they (or perhaps more correctly, the campus IT and networks of identity federations) already control. The example in your response of misleading behavior by a publisher is something that should be governed by norms, but it is not therefore a problem with RA21. There are a variety of let’s call them “shady” practices, which most publishers do not engage in, nor condone, and often publicly criticize. In this case, if there were community norms about what data should be conveyed (say, the minimal amount to provide a service, with optional personalization data if the user wants to share it), then we could point to violations of norms. Ideally, this is where RA21 will end up, with sensible privacy-protecting services and guidance on how to best implement them. This seems a far cry from the somewhat sensationalized “publishers are coming for your proxy servers” tone of the article’s title and the statements about tracking all users’ behavior and aggregating that into a super-user-tracker. Neither is an aim of RA21 or its participants and the group is doing a variety of things to prevent those concerns from coming to pass.

By Todd Carpenter
Jan 16, 2018, 2:23 PM

Librarians were of course not early participants in RA21 because RA21 was originally a project of International Association of Scientific, Technical and Medical Publishers (STM) alone. (See the archived original website – no mention of NISO and hosted on the STM website: https://web.archive.org/web/20161211194357/http://www.stm-assoc.org:80/standards-technology/ra21-resource-access-21st-century/). I have no doubt that if NISO had been involved originally, the origin story re librarian involvement would be different.

As to the “coming for your proxy server” … I’d welcome a statement from RA21 that it does not seek to displace IP authentication through proxy servers or on campus IP authentication. I’ve been in two webinars/online meetings where the speaker representing RA21 has not taken the opportunity to make such a statement when asked directly and, as I mentioned in my post, has instead commented on how long it will take to get to that point. I think librarians are rightly questioning what the goals are when the RA21 website states things like “key stakeholders will explore pathways to move beyond IP-recognition as the primary authentication system” and there are statements such as yours above that “you’d no more want to use IP based authentication in a modern environment than you would to bank using a computer running windows 95.”

By Lisa Janicke Hinchliffe
Jan 16, 2018, 3:04 PM

P.S. The example in my reply is not “misleading behavior by a publisher” … it is a misleading statement by an identity provider about a publisher. The publisher doesn’t want the data. The identity provider is sending it anyway and telling users that the publisher is demanding it.

By Lisa Janicke Hinchliffe
Jan 16, 2018, 3:10 PM

Thank you Lisa for raising this issue on the Kitchen and calling attention to the vital dynamics associated with which stakeholders lead standard-setting efforts like this one. My concern about RA21 is that it does no more than address some of today’s problems as understood by one set of stakeholders. As a result, it reifies existing industry norms rather than serving as a platform for real innovation.

By Roger C. Schonfeld
Jan 16, 2018, 8:39 PM

Roger, I agree that RA21 currently seeks to solve the “today” problem, but mostly because the challenge of remote access to content has been a lingering issue for so long and remains unaddressed in implementation. This first step will be the basis for adoption of future innovation and will set the foundation to adapt to advances in identity and authentication as they arise from the security and privacy realms. Considering the strong industry feelings (on both ends of the spectrum) about this type of a shift in access approach, I believe that taking measured steps forward will have the greatest chance of acceptance across the wide range of use cases and audiences. Perhaps once a level of trust and confidence is established in the intentions and outcomes of this first step, the next can be greater strides with further reach and impact.

By Daniel Ayala
Jan 17, 2018, 12:16 AM

Danny, I am grateful for your contributions on these matters. I also hope you are right about measured steps leading to greater strides. But I am doubtful.

So many of the leading participants in this effort are, in their own organizations, rapidly pivoting beyond content licensing. They are already using the advantages of their positions today (just as they should be!) to build impressive new offerings in discovery, library systems, research workflow, and analytics. It is therefore unsurprising that RA21 seems to be taking a set of approaches that reifies the interests of these companies.

We are watching before our eyes just how valuable user identity data are, and as you probably realize I have been chronicling the impressive and valuable businesses being built in our sector that rely on identity as their foundation. RA21 serves to limit opportunities for market competition in these areas of identity-enabled academic services. It is not centered around individuals and their own control of their identity and data. Instead, it gives major advantages to those market incumbents that already have access to large amounts of usage and user data.

So, I understand in one way of thinking the importance of developing trust across content providers. But, this is not an agile development project where RA21 is putting items into a backlog and committing to a colleague that it will get to other priorities next year. Once the limited shared security interests of the current group of industry leaders are addressed, why should observers trust that RA21 will support efforts that will put into place user-centric identity and personalization choices that could reduce the strength of incumbent advantages in developing new businesses? Such trust can of course be earned, but right now it is deservedly absent.

By Roger C. Schonfeld
Jan 17, 2018, 6:08 AM

I think we should be cautious about conflating a “today” problem with a specific “today” solution. The problem commonly agreed is stumbling blocks in accessing content from outside of one’s campus IP range. One solution is SAML-based approaches. Another would be OpenID. Another would be for publishers/platforms to just point back to the proxy servers at institutions that enable IP based authentication!

Who was involved in picking the solution that RA21 pursued? Again, let’s remember that the strategy was chosen by STM – not the full stakeholder community. That’s where the trust was lost.

My post isn’t just about the privacy concerns in identity based authentication. It is also about the efforts to “move beyond” IP authentication. I don’t see broad support for that among any stakeholder group except the publishers/platforms. I don’t think “a level of trust and confidence … in the intentions and outcomes of this first step” can be established if the concerns that librarians raise about moving away from IP authentication are pushed aside to be considered sometime in the future after the publishers get the identity based system they want.

By Lisa Janicke Hinchliffe
Jan 17, 2018, 10:28 AM

Todd writes: “These settings are managed by the identity provider, i.e., the institution, NOT the publisher who receives the access token.”
That sure is true, but at this moment some service providers demand many more attributes to be exposed than are needed to be able to make use of their services.

For libraries it is quite difficult to refuse access to these services that our users would like to be given access to.

By Peter van Boheemen, Wageningen University and Research
Jan 17, 2018, 5:31 AM

With IP authentication, it is easy for someone not affiliated with my university to walk into the library and access our licensed content. Would RA-21 mean the end of this kind of access?

And I agree that open access would make the authentication problem RA21 is trying to solve irrelevant.

By Curtis Brundy
Jan 17, 2018, 3:28 PM

One perhaps trivial observation I will make about how Shibboleth and OpenAthens are currently configured on most publisher platforms: the Shib login buttons are never consistently placed from publisher to publisher, so users have to go on a scavenger hunt to find the right login page. Right now, thanks to EZproxy, patron authentication is the only consistent cross-platform user experience. Initiatives to effectively supersede EZproxy should look at developing industry standards around location and presentation of login functionalities. Otherwise we’ll end up in the same boat as we are now, with patrons thinking their library does not subscribe to a particular article because they aren’t able to locate the PDF button, since that button is placed differently on each and every vendor platform. Inconsistent placement wouldn’t be a problem if users were accessing Shib-configured URLs from the library’s website, but if part of the justification of RA21 is to enable users “in the wild” to access content seamlessly, it would be helpful to make the login experience consistent.

By Michael Rodriguez
Jan 18, 2018, 8:05 AM

Slightly at a tangent and yes I have heard Roger on this topic a number of times and he is always very convincing BUT I am an old person not very good with computers and when I want to access a publication I never have any problem using my library (UCL). I only have to log in with the library. More relevant none of the 40 plus ECRs (US and UK) I have now interviewed for two years have mentioned problems relating to access. We know that Google Scholar which they mainly use leads them through to the library and if not there is always Research Gate. That is what they mostly say. Librarians (even Lisa) tend to think that the complaints that reach them are the tip of the iceberg especially if they are complaints about publishers but how do we know they are? Research?
Anthony

By Anthony Watkinson
Jan 19, 2018, 11:55 AM

Anthony, no one is denying that it is possible to manually take your citation back to the website of the library/ies with which you are affiliated and then navigate to the content. (Though, I just tweeted out a multi-tweet story of trying to do just that this morning that wasn’t nearly as easy as one might think.) And, yes, one can configure Google Scholar and even access (possibly infringing) copies on ResearchGate. It isn’t that people can’t eventually get to content – it is that they regularly have to work around systems to do so.

How do we know that the complaints we get are the tip of the iceberg and not the whole iceberg? I’ll answer that in a second but let me start by saying that even if they are the whole iceberg that only means it’s a smaller iceberg, not that it isn’t an iceberg. It isn’t just lost time of the user but also the negative impression they form of the library that means this isn’t just an ice cube but an iceberg. (Can we stretch this metaphor to its breaking point? Yes, I think we can.)

But, why do I think it’s the tip? Off the top of my head, I’m drawing on: interviews and focus groups with researchers on my campus, requests by faculty to teach their online students how to navigate to content, observational studies of undergraduate and graduate students, user survey results over a decade, and search log analysis on our website and discovery layer(s) over about a decade as well. I’m not relying on Roger’s video or my own experience as a PhD student (though let me tell you the latter gives me a lot of opportunity for personal experience with this issue!) when I say there is an iceberg here.

By Lisa Janicke Hinchliffe
Jan 19, 2018, 12:31 PM

In iceberg classification this looks big. I must admit to not being up on the literature on the topic: I know it is extensive. Could you give me a reference? We did not ask our interviewees about access problems – just about discovery tools. They did not tell us about these sort of problems. They complained about other things. They did not complain about online submission systems either. The difficulty of using these systems is often mentioned and when I was a publisher I did try them and was not impressed BUT now I use them for real – and I refer to those powered by Scholar One and Aries – I find them user friendly

By Anthony Watkinson
Jan 20, 2018, 8:15 AM

I appreciate you clarifying that you didn’t ask about accessing full-text. Also, I wonder if you aren’t hearing about the problems in accessing subscribed content because scholars typically do manage to get access if they really need/want it. Over time, those work-arounds become routine habits that aren’t even thought of as work-arounds. In particular, there is a fairly often used option – but one that some of us are unwilling to resort to – and that being SciHub. And, the more licit though sometimes questionable #icanhazPDF. And, in fact, I just looked at the search results for that tag on Twitter and the first Tweet received was a complaint that since SciHub was down, researchers needed to revert to #icanhazPDF! Of course, the first reply was the URL for where SciHub can currently be found.

I’m trying to think of the best approach to give you a way into the literature on researcher behavior around retrieval/access, which is, as you say, extensive. Most of the work I have done is very specific to UIUC and used for service and tool development/evaluation rather than research that is published (though it was synthesized into the principles that I reported on here: https://scholarlykitchen.sspnet.org/2018/01/08/discovery-delivery-user-centric-principles-discovery-service/). My colleagues and I have presented at a number of conferences though – this bibliography gives a sampling through 2016 though most are focused more on search than retrieval (apologies for the IA link – our library website is in transition from one CMS to another and this content isn’t up yet – https://web.archive.org/web/20160402183924/http://www.library.illinois.edu/committee/ddst/discoveryresearch.html). We do have all of our user survey results posted online (https://www.library.illinois.edu/staff/assessment/libsurv/).

For more “comprehensive” studies, I would usually recommend the Ithaka S+R surveys, which you already know about. I also really like the more in-depth looks at the challenges scholars face that Ithaka has done focused on different disciplines (see “Available Reports” at the bottom of this page: http://www.sr.ithaka.org/services/research-support/) – though a limitation of those studies for this topic of access is that participating libraries typically have to pay to be a partner in the study and so the scholars who are interviewed are often affiliated with more well-supported libraries. I found interesting the observation in the public health report that international collaborations can be challenging because, though US scholars have access to publications, their international colleagues do not. I imagine this is overcome by the US scholars sharing copies with the international colleagues – pointing again to the fact that it isn’t that barriers can’t be overcome but rather that there are stumbling blocks along the way that frustrate and/or drive users to illicit access options.

The other study that is useful for an overview is “How Readers Discover Content in Scholarly Publications” by Tracey Gardner and Simon Inger (summary with links of the 2016 study here: https://scholarlykitchen.sspnet.org/2016/03/30/how-readers-discover-content-in-scholarly-publications/). I noticed an announcement on the LibLicense-L that they have secured publisher et al. support for a 2018 follow-up so I look forward to seeing how things look two years later.

Finally, I think the must reads on this topic (saving the best for last?) are “Shadow Libraries and You: Sci-Hub Usage and the Future of ILL” by Gabriel J. Gardner, Stephen R. McLaughlin, and Andrew D. Asher (http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2017/ShadowLibrariesandYou.pdf) and “Fast and Furious (at Publishers): The Motivations behind Crowdsourced Research Sharing” by Carolyn Caffrey Gardner and Gabriel J. Gardner (http://crl.acrl.org/index.php/crl/article/view/16578). The former, not for its look at ILL (I don’t think the hypotheses being tested are well-formed), but for an easy-to-skim look at how SciHub use clusters in the USA. The latter for findings not only on why scholars use crowdsourced alternatives to libraries to retrieve items but also for insights into why they do the work of being a provider. And of course let’s not forget the finding that 37% of scholars admitted they used SciHub to get a paper that they had legal access to at a library (http://www.sciencemag.org/news/2016/05/survey-most-give-thumbs-pirated-papers).

I’ll stop now at the risk of replicating my Mendeley library in comment form. But, I want to underscore something that I believe sometimes gets obscured in these discussions. There are two aspects we have to keep in mind – does a scholar have access *and* how well is that access integrated into the researcher workflow? Even OA materials – to which we all have access – can be frustrating to access due to problems with discovery and stumbling blocks in retrieval. Scholars do indeed have a lot of access to subscribed content – but as Roger, others, and I have documented, there are many, many stumbling blocks and barriers for those same scholars. This is one place where the melting of “icebergs” would be worth celebrating and not a cause for global concern.

By Lisa Janicke Hinchliffe
Jan 20, 2018, 12:04 PM

I clicked on reply in order to reply to Lisa’s post starting “I appreciate you clarifying…” but I seem to be in a different place. I am grateful to Lisa for her list of sources, not all of which I knew, but how few of them are peer reviewed. That is a pity.
There is clearly scope for more research and more publications. Here is our own publication on our first year of work: http://ciber-research.eu/download/20170103-Where_and_How_ECRs_Find_Scholarly_Information-LEAP1087.pdf. It covers some of the questions Roger has asked too.
Anthony

By Anthony Watkinson
Jan 21, 2018, 12:07 PM

I’m not quite sure what you mean re peer reviewed – if you mean they weren’t all published in scholarly journals, that is true. But, for example, I think that all Ithaka reports are reviewed by experts before they are published (disclosure – having been such a reviewer at times).

Anyway … thanks for sharing your research Anthony – a very interesting journal article indeed. But, I’m now quite confused as to how is it that “They did not tell us about these sort of [access] problems.” Your article reports ECRs telling you about many access problems:

“Searching styles tend to boil down to tracking down the full text as fast as possible. Thus, French ECRs use Hubs (GS, Google and WoS) and try to find, as one interviewee said, their ‘PDF’ as quickly as possible. If the PDF is not found via the first links to publisher platforms, OA repositories, or social media, they go to Sci-Hub. If that is not successful, some send an e-mail to the author, and others give up. All French ECRs complain about information overload, and this information-seeking path is intended to deal with it. Chinese and Malaysian ECRs are also driven by the same need to obtain the full text as quickly as possible from the information mountain but are, perhaps, more tenacious. So, if they cannot get access to the full text, they will change tools, and if there is no full text at all, they will check the references and try to find the full text of similar papers.”

By Lisa Hinchliffe
Jan 21, 2018, 12:54 PM

I was not clear Lisa.

In the first place none of the UK and US ECRs (those I interviewed) mentioned the technical problems that for example Roger has had. This (as you may recall) was my starting point. They might have had them but they did not mention them. Nor did they mention information overload. None of them mentioned SciHub. One did in 2017 and he was in industry.

Across a spectrum of issues the Chinese interviewees and the French interviewees were rather different in their behaviours.

I did not mean to suggest that they always found what they wanted in their libraries or that they did not sometimes go elsewhere (Research Gate for example) first but in 2016 we learnt less about where they found it. We revised the question slightly in 2017 to make sure that they also answered this part of the question.

The generalisation made at the start of the paragraph is one I buy into. Also Google Scholar and Google were where most of them started their research but Pub Med, WOS and Society resources were also in the frame. They rarely started with the Library. My memory is that in 2017 the big change was that there was more emphasis on the library as where they found content (we asked directly) and in searching Google Scholar was more important than it had been. That being said we have not yet published these 2017 findings and again I tend only to remember those that I interviewed.

By Anthony Watkinson
Jan 21, 2018, 2:56 PM

Since your paper is not yet available, let’s return to this discussion when it is and we have the benefit of the data and analysis from the 2017 interviews. If what you have uncovered is a significant difference between the experiences of US/UK ECRs and those elsewhere around the world, that would be worth a focused follow-up study. Perhaps there are differences in workflows that a detailed comparative analysis would reveal.

By Lisa Janicke Hinchliffe
Jan 21, 2018, 3:32 PM

“We know that Google Scholar which they mainly use leads them through to the library and if not there is always Research Gate.” Anthony, this is surprising to me. In both the US and the UK, just over two years ago, we found much more diversity in perceived starting points, among scholars broadly, not just those in the earlier stages of their careers. Perhaps either our sample or coverage, or some other difference in our methods, leads us to differing views about the degree of complexity in the discovery to access environment. Certainly, we continue to find a complex environment indeed, with stumbling blocks therefore of deep concern.

By Roger C. Schonfeld
Jan 19, 2018, 1:58 PM

For the US, see Figures 1-5 here: http://www.sr.ithaka.org/publications/ithaka-sr-us-faculty-survey-2015/

For the UK, see http://www.sr.ithaka.org/publications/uk-survey-of-academics-2015/

By Roger C. Schonfeld
Jan 19, 2018, 2:07 PM

I was making a generalisation but I am always wary of generalisations residing in my mind so I shall go back to our results and some of the complexities and post again. The discovery point is based on two years interviewing of the same people (with a few dropping out) but for the first year (2016) we found the wording of the question had not elicited usually where they found the content they were looking for so we changed the wording slightly to make sure this was covered

By Anthony Watkinson
Jan 20, 2018, 8:04 AM

Lisa,
Thanks for the post. It has certainly deepened my understanding of the issues – which is why I read The Scholarly Kitchen!
For full disclosure before I start, I work for OpenAthens – a SAML based identity and access management software that manages single sign on for libraries and publishers. We are involved in the RA21 initiative and we are one of many organisations that support federated identity management.
We’re part of Eduserv, a not-for-profit organisation whose mission is built around enabling technology for public good organisations.
In the past I have also worked in commercial publishing where user data has long been a source of revenue that supports journalism and associated media across diverse sectors. I go back as far as ‘controlled circulation magazines’ where readers receive information for free on the understanding they engage in a partnership with relevant suppliers. Google, Facebook and Twitter operate the same business model in the digital age. (If I ever want to dowse myself in the cold waters of reality I log into Google and look at the information they have been collecting on me over many, many years).
My perspective on RA21 was that the original purpose was to address many of the issues Michael Rodriguez raises in his comment on this blog. It was looking to build a consensus around the user experience and deal with the lack of consistency and coherence around the user log-in. The output would be a set of standards that supports the following: a seamless user journey designed to facilitate greater engagement with digital resources – a positive outcome for the library and the publishers and the end user. In many ways it is a follow up to the ESPRESSO work in 2011 http://www.niso.org/publications/niso-rp-11-2011-espresso-establishing-suggested-practices-regarding-single-sign
As the RA21 conversations continued it was clear that a consistent digital ID is inherent in managing a consistent user experience on each platform. Contextual based identity governance allows that digital ID to be used with other factors to enable authentication and authorisation. Strong identity management also works as a ‘firewall’ by allowing cloud based services to be accessible from many locations but with confidence in security and compliance.
And I think this is the point where RA21 entered the debate around privacy and anonymity vs utility and compliance. By this I mean the tension between the data an individual needs to share to receive valuable services (Library Card/Netflix credentials/Oyster Card (a London based travel card)) vs a right to privacy (what am I reading/what am I watching/where am I traveling) vs the commercial relationship (are you a patron/have you paid/are you fare dodging).
There is one tenet within the new GDPR regulations we hold strongly to: privacy by design. In many ways, SAML itself was created to meet the needs of privacy by design. Within OpenAthens we strive to develop tools that enable the individual, the library and the publisher to set privacy controls depending on the specific context. The challenge is that there is no single solution that covers all scenarios. An individual’s comfort with privacy changes from person to person. An organisation can ‘request’ different levels of data compliance from their students or employees. A publisher may or may not request full disclosure to validate authorisation to resources.
Within these changing and competing priorities the rights of the end user can be lost.
SAML based access systems allows ID data to be passed opaquely to protect an individual’s identity.
It also allows publishers and libraries to control attribute release within carefully defined terms and conditions.
But the RA21 initiative has evolved to cover more than the technology used. For example, proxy logs can capture very granular usage data that allows institutions to see the pages an individual is viewing and the research they are undertaking and link it to outcomes. They are just very difficult to parse.
So, your blog reinforces the principle that technology is important but it cannot be separated from the debate about data compliance, policy and a robust ethical framework.
Jon

By Jon Bentley
Jan 22, 2018, 7:00 AM

I appreciate the compliment, Jon, and also that you are upfront about working for OpenAthens, which certainly stands to gain market share as publishers/platforms begin to insist on SAML-based solutions rather than IP and proxy servers. You are correct that “SAML based access systems allows ID data to be passed opaquely to protect an individual’s identity.” Unfortunately, SAML also allows ID data to be passed transparently to the platform rather than opaquely – which is why this isn’t really about how the technologies can work but the policy frameworks in which they are deployed. RA21 has scoped the policy considerations too tightly around publisher platforms to ensure the level of privacy protections the principles might lead one to believe are of interest. This could be changed … I’ve been heartened to learn that my SK post has led some who are involved in RA21 leadership to think that it needs to be changed. In the meantime, I do hope you’ll read Roger Schonfeld’s post today for a different view of the original purpose and driving factors behind RA21 than what you suggest. Best, Lisa

By Lisa Janicke Hinchliffe
Jan 22, 2018, 8:29 AM
