Last week, our newest Kitchen contributor Lisa Janicke Hinchliffe raised questions about the RA21 initiative, analyzing how its effort to improve the security of content providers in the face of rampant piracy may have consequences for, among other things, privacy and the future of IP authentication. I strongly agree with her perspective about RA21, and I want to take some column inches today to explain why. It is not so much that I am specifically concerned about the privacy issues of RA21, although I think they deserve scrutiny. My concern is this: In establishing a mission “to align and simplify pathways to subscribed content across participating scientific platforms,” RA21 has scoped its problem the wrong way. Simply put: It’s not about security. It’s about identity. Every individual should be in control of their own identity.
No one can deny that there are major problems with our current mechanisms for authorizing access to licensed collections of scholarly content. While on-campus access can be made entirely seamless via IP-based authentication (for the declining case of a user who stays entirely within the campus network), off-campus access is a mess. I first called attention to these issues in an issue brief and a talk at STM about my hapless efforts to access an article citing my work. As I said emphatically, the proxy server, which is in wide use across US higher education and beyond, is not the answer. Proxies fail to support the actual patterns of discovery and access that are natural to researchers, who aim to move seamlessly among resources, regardless of starting point, without having to get authorized for every platform every time. Proxy servers for off-site access to licensed e-resources are a sorely outdated technology that sets out a stumbling block in the researcher’s path, driving users away from publishers and libraries alike towards the open web.
But something else happened on the way to improving the user experience. Sci-Hub exploded onto the scene as a massive driver for piracy. And, when the publishing community started to look closely at Sci-Hub and how it works, it became apparent that proxy servers were a fundamentally weak link in the security chain, for at least two reasons. First, they aren’t uniformly well maintained by libraries, allowing vulnerabilities to fester. And, second, when unauthorized use is detected, they often force content providers to choose between permitting downloads anonymously or taking the “nuclear option” of turning off access for an entire university. And so it was no longer the case that the priority was improving the options that could coexist alongside proxy servers to improve the user experience. Now, from a content provider perspective, proxy servers have got to go.
As I have been reflecting on RA21, the umbrella initiative for alternative authorization mechanisms beyond the proxy server, I fear that the framings around security and user stumbling blocks are both wrong. The underlying question for modern authorization is about authentication of individual users, and so authentication is increasingly about identity. As a result, RA21 is necessarily mucking around with issues of identity. And if poor choices are made about managing identity, the academic community risks making a major mistake with wide-ranging implications.
Identity online is a hot topic today. There is widespread recognition of the substantial value of user data. Many readers will be familiar with my work to chronicle the impressive and valuable businesses being built in our sector that rely on identity as their foundation. These include the array of bibliometric and research evaluation systems that were disrupted last week with the launch of Digital Science’s new Dimensions product. Elsevier’s and Digital Science’s integrated research workflow suites reaching back into the science laboratory are fundamentally about data and analytics, as Elsevier itself shouts loudly to anyone who will listen. Some might conclude that identity is everything.
Major publishers, discovery services, and research providers are pivoting to take on more of the researcher workflow, including most of the leading organizational participants in the RA21 effort. These publishers and vendors are already using the advantages of their positions today (just as they should be!) to build impressive new offerings in discovery, library systems, research workflow, and analytics. The way this game is played, the more data a corporation has, tied to an individual identity, the more value it can generate. The most important consideration for managing identity is that the user data must be controlled by, or at least fully accessible to, the platform provider. It has been several years since David Smith on these pages called for “a set of principles about how such data is to be used.”
There is no observer more enthusiastic than me about many of the opportunities presented by data-enabled services, from personalized scholarly discovery to researcher workflow tools, but data empires are not the only way to build most of these services. Indeed, there are alternative frameworks that would leave the user in control of their own identity and their own data. Brigham Young University developed the notion of a “personal API” that would empower the individual to control their own data and choose where it was used. I proposed, nearly three years ago, an approach that would put individuals at the center, rather than universities, publishers, or other vendors, in terms of both authentication/authorization and their user data. Just last month, Herbert Van de Sompel gave a compelling overview of how a decentralized model along these lines could be conceived. On a more centralized basis, it is possible to imagine ORCID growing into such a service, or others doing so, even if there are few indications of such development. Such alternative approaches operationalize privacy by giving the user control of their identity and data. And, as a happy byproduct, they force providers to compete based solely on the services that they can provide on top of the data — rather than competing based on control of the data, with all the disadvantages that we have watched take hold in the consumer sector.
But, RA21 is not pursuing broader solutions. It is not centered around individuals and their own control of their identity and data. Users are essentially pawns. Instead, RA21 is scoped narrowly, which just happens to avoid disrupting providers or interfering with the dominance of leading players. It gives major advantages to those market incumbents that already have access to large amounts of usage and user data. Solutions that would create a level playing field around user data are certainly not in the interests of market incumbents. It is unsurprising that RA21 seems to be taking a set of approaches that reifies the interests of the companies that led its initial development. While there are efforts being made to add a “library perspective” to the RA21 table today, it is around policy considerations such as privacy rather than fundamental architecture.
Indeed, we are starting to hear rumors, as Lisa reported last week, that leading RA21 players are ultimately interested in killing off IP authentication, not just the proxy server. IP authentication has offered a truly seamless user experience, enabling researchers to choose their own pathways from discovery to access, to the extent that they are working entirely on the campus network. And, it has been seen to offer more anonymity for users than individually authenticated alternatives. But, given that users on handheld devices are likely to be moving regularly across networks, IP authentication is not the panacea of seamlessness it once was. And, to be sure, any network-based authentication does permit some content security vulnerabilities. But, notably, requiring that researchers individually authenticate for every session strengthens the data and analytics businesses discussed above. All in all, it makes perfect sense that leading providers would see removing IP authentication as a strategic objective, positioning them to build further on their data empires.
We are being asked to trust that in a second potential stage of work RA21 will take on broader questions yet to be determined. This may yet happen, and I hope that it does. But, once the limited shared security and off-site access interests of the current group of industry leaders are addressed, will the sector representatives participating in RA21 take this next step? Why should observers trust that RA21 will support efforts that will put into place user-centric identity and personalization choices that could reduce the strength of incumbent advantages in developing new businesses? All indications point in the opposite direction.
I challenge RA21 to stop asking for trust and rather to earn it. Doing so would involve a commitment to develop a user-centric level playing field empowering users to manage their own identity and data. I hope that in doing so RA21 can realize its potential to serve the broader interests of scientists and academia, not just the understandable objectives of publishers and platforms.
I had the great benefit of comments about drafts of this post, on an extremely short timeline, from: Cody Hanson, Bruce Heterick, Lisa Janicke Hinchliffe, David Smith, Aaron Tay, and one anonymous contributor. They helped to improve this post tremendously but do not uniformly share my perspective nor should they be held responsible for its shortcomings.