Rethinking Authentication, Revamping the Business

There is renewed interest in mechanisms to protect and unlock access to scholarly content.

IP authentication is the most important mechanism for authorizing access to licensed e-resources. Substantial business and policy issues for libraries and publishers alike are tied up with IP authentication. Today, there is growing interest in eliminating it, so it is timely to examine the implications should we soon see its end.

Earlier this month, I had the opportunity to attend and participate in the Universal Resource Access Forum, hosted by the Copyright Clearance Center (CCC). The day-long meeting brought together publishers, corporate librarians and other information professionals, publishing technologists, and several academic librarians. It focused primarily on barriers to information usage in the corporate sector, with some emphasis on multinational pharmaceutical companies. I spoke about the findings of a project on barriers to discovery and access in the corporate sector. Many interlinked issues were discussed over the course of the day. IP authentication, and the opportunities to move beyond it for licensed e-resources, was perhaps the forum’s most important theme.

The Bedrock Of the Site License

First, some basic background. IP authentication is the bedrock of access control for licensed e-resources. A content provider checks whether the internet address of a user falls within a subscribing institution’s range of IP addresses. If yes, access to content is provided. If no, access is denied.
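In code, the check is little more than a range-membership test. The sketch below uses Python’s standard ipaddress module. The institution names and address ranges are hypothetical and are drawn from reserved documentation blocks. A real platform would load them from its license records.

```python
import ipaddress
from typing import Optional

# Hypothetical subscriber ranges; real ones come from each institution's license record.
SUBSCRIBER_RANGES = {
    "Example University": [
        ipaddress.ip_network("192.0.2.0/24"),
        ipaddress.ip_network("198.51.100.0/25"),  # covers .0 through .127 only
    ],
    "Example Pharma Corp": [ipaddress.ip_network("203.0.113.0/24")],
}

def authorize_by_ip(client_ip: str) -> Optional[str]:
    """Return the subscribing institution whose range contains client_ip, or None."""
    addr = ipaddress.ip_address(client_ip)
    for institution, networks in SUBSCRIBER_RANGES.items():
        if any(addr in net for net in networks):
            return institution  # inside a subscriber's range: grant access
    return None                 # no match: deny access

print(authorize_by_ip("192.0.2.44"))      # Example University
print(authorize_by_ip("198.51.100.200"))  # None (outside the /25)
```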

Off-site access is only growing in importance. When a user is working away from the campus or corporate network, one of several mechanisms is made available to provide access. In US academic libraries, the most common solution is a proxy. After the user logs in to this separate service with institutional credentials, the proxy makes the user appear to be on the institutional network. In many other countries, a SAML-based solution is more common; these allow a user to log in directly through an institution’s single sign-on infrastructure. As currently implemented, SAML-based systems are more likely to appear at the right moment in a research workflow, although there is no inherent reason why proxies cannot be used in this way. A more substantive difference is that SAML-based solutions make it easier to associate a user with their usage activity, which brings advantages for both security and personalization. Proxies, by contrast, provide greater anonymity and privacy.
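The following simplified model shows why a proxy makes a remote user look like an on-site one. The proxy hostname, its address, and the campus range are hypothetical. The `login?url=` link prefix follows a convention common among library proxies and does not reproduce any particular product’s exact syntax. The point is that the publisher only ever sees the proxy’s address.

```python
import ipaddress

# Hypothetical values for illustration only.
CAMPUS_NETWORK = ipaddress.ip_network("192.0.2.0/24")
PROXY_HOST = "https://proxy.library.example.edu"
PROXY_EGRESS_IP = "192.0.2.10"  # the proxy's own address, inside the campus range

def proxied_link(target_url: str) -> str:
    """Prefix a publisher URL so the request is routed through the library proxy."""
    return f"{PROXY_HOST}/login?url={target_url}"

def ip_seen_by_publisher(user_ip: str, via_proxy: bool) -> str:
    """Once the user has logged in to the proxy, it fetches content on their
    behalf, so the publisher sees the proxy's address, not the user's."""
    return PROXY_EGRESS_IP if via_proxy else user_ip

home_ip = "203.0.113.77"  # an off-campus user
for via_proxy in (False, True):
    seen = ip_seen_by_publisher(home_ip, via_proxy)
    on_campus = ipaddress.ip_address(seen) in CAMPUS_NETWORK
    print(f"via_proxy={via_proxy}: publisher sees {seen}, access granted={on_campus}")
```

Every proxied user arrives from the same address, so the publisher cannot tell them apart. That is the source of the privacy benefit, and it is also what concerns publishers.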

IP authentication in combination with proxy servers for remote access is the basis for the bundled site license model. The bundled aspect receives the greatest attention in discussions about what is wrong with the model for libraries. Two other characteristics are key: first, that it provides access to an entire site as defined by IP address; and second, that it consequently affords unlimited usage within that site.

Not all e-resources use IP authentication in combination with proxy-based off-site access. For example, the library collaboration HathiTrust does not permit IP-based or proxy-driven access, explaining that “HathiTrust uses rate-limiting to ensure compliance with third-party agreements…Our rate-limiting mechanisms treat all users accessing through a proxy server as a single user…” Nevertheless, IP-based site licenses, with proxy-based off-site access in US academia, remain the prevailing solution.
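HathiTrust’s point about proxies follows from the way per-client rate limits are usually keyed. What follows is a generic sliding-window limiter keyed by IP address. It is not HathiTrust’s actual mechanism. The limit, the window, and the addresses are assumptions chosen for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        recent = self.hits[key]
        while recent and now - recent[0] >= self.window:
            recent.popleft()      # forget requests older than the window
        if len(recent) >= self.limit:
            return False          # over budget: reject without recording
        recent.append(now)
        return True

# Keyed by IP address: every user behind the proxy shares a single key.
limiter = SlidingWindowLimiter(limit=5, window=60.0)
PROXY_IP = "192.0.2.10"
requests = [("alice", 0.0), ("bob", 1.0), ("carol", 2.0),
            ("alice", 3.0), ("bob", 4.0), ("carol", 5.0)]
# The limiter never sees the user names, only the proxy's IP.
results = [(user, limiter.allow(PROXY_IP, now=t)) for user, t in requests]
print(results)
```

In this example, three researchers make two requests each through the same proxy. Together they exhaust a five-request budget, so Carol’s second request is refused even though none of them individually came close to the limit.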

Recently, however, publishers have had a substantial change of heart. Among the factors at play is the experience of users who require greater seamlessness and personalization. But let there be no doubt that the growing scope and prominence of Sci-Hub has concentrated the mind of many a publishing executive. In their view, the anonymity of IP authentication has facilitated piracy and continues to complicate their efforts to shut down suspicious use.

IP Alternatives

While I have heard these arguments on and off this year, the meeting hosted by CCC made abundantly clear that there is great dissatisfaction with IP-based authentication across the community. Publishers want to move away from it because of their piracy concerns, their desire to improve seamlessness for researchers, and their expectations about the value they can offer through greater personalization. Corporate librarians want to move away from it because of the administrative headaches and workflow deficiencies it imposes in their environment. And at least some academic librarians want to move away from it because of the poor user experience, especially with off-site access. Taking aim at IP authentication and proxy servers has become all the rage. But what might supplant them?

The most radical option, and I would argue the most user-centric, is to decouple identity from institutional affiliation. Right now, for authentication purposes, one’s identity is provided by a single institution. But any person is more complicated than a single institutional role. We may be employees of one organization, students or faculty members of another, alumni of a third, and residents with privileges at a local public library. We have access rights from each of these affiliations, each of which may be in flux, and we want to be able to work across them seamlessly. Unfortunately, developing this approach would require substantial platform re-engineering. And let’s not forget that, at least in the basic analysis, empowering users would commensurately weaken the role of publishers and institutions in the various data strategies that all are pursuing.
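One way to picture a user-centric model is an identity that carries several affiliations, each with its own entitlements, rather than a single institutional role. This is a hypothetical data model and does not describe any existing platform. The organization names, roles, and resource identifiers are invented.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List

@dataclass(frozen=True)
class Affiliation:
    organization: str
    role: str                     # e.g. "employee", "student", "alumnus", "resident"
    entitlements: FrozenSet[str]  # hypothetical resource IDs this role unlocks

@dataclass
class Person:
    name: str
    affiliations: List[Affiliation] = field(default_factory=list)

    def grants_for(self, resource: str) -> List[Affiliation]:
        """Every affiliation that unlocks the resource (empty if none does)."""
        return [a for a in self.affiliations if resource in a.entitlements]

    def drop(self, organization: str) -> None:
        """Affiliations change over time, e.g. when a degree is completed."""
        self.affiliations = [a for a in self.affiliations if a.organization != organization]

alex = Person("Alex", [
    Affiliation("Example Pharma Corp", "employee", frozenset({"journal-a"})),
    Affiliation("Example University", "student", frozenset({"journal-a", "journal-b"})),
    Affiliation("Example Public Library", "resident", frozenset({"database-c"})),
])

print([a.organization for a in alex.grants_for("journal-b")])  # ['Example University']
alex.drop("Example University")
print(alex.grants_for("journal-b"))  # []
```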

But more pragmatic options are advancing in the near term. One option that has not received the attention it deserves is authentication through Google Apps for Education, which is in widespread use across US higher education and beyond. The outsourcing of email, calendar, and other basic applications to Google and Microsoft has opened up the possibility of using these services for authentication, a modified version of “social” login. The benefit is that these existing consumer-grade authentication providers can be used at little to no cost. There are presumably risks to outsourcing this important function in a way that further consolidates Google’s position in publishing and discovery. Perhaps this is one reason it has not been rolled out broadly, although Google Apps authentication is available for a number of Gale products.
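With a Google-based login, the platform would receive an OpenID Connect ID token and would need to map it to a subscribing institution. The sketch assumes the token’s signature has already been verified, for example with a library such as google-auth, and it only inspects the decoded claims. The client ID and the domain-to-institution table are hypothetical. Google includes an `hd` (hosted domain) claim for Google Workspace accounts, and that claim is what makes the institutional mapping possible.

```python
import time
from typing import Optional

GOOGLE_ISSUERS = {"accounts.google.com", "https://accounts.google.com"}
CLIENT_ID = "hypothetical-client-id.apps.googleusercontent.com"
# Hypothetical mapping from Google Workspace domains to subscribing institutions.
DOMAIN_TO_INSTITUTION = {"example.edu": "Example University"}

def institution_from_claims(claims: dict, now: Optional[float] = None) -> Optional[str]:
    """Map the claims of an already signature-verified ID token to an institution."""
    now = time.time() if now is None else now
    if claims.get("iss") not in GOOGLE_ISSUERS:
        return None                  # not issued by Google
    if claims.get("aud") != CLIENT_ID:
        return None                  # issued for some other application
    if claims.get("exp", 0) <= now:
        return None                  # token has expired
    if not claims.get("email_verified", False):
        return None
    # "hd" appears only for Google Workspace accounts, so a personal
    # Gmail login maps to no institution.
    return DOMAIN_TO_INSTITUTION.get(claims.get("hd"))
```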

Another development of some importance is that publishing platform providers will offer more seamless authentication across the platforms they power. This direction, being pioneered by Semantico, will reduce the number of small authentication silos tied to individual content platforms. It does not, however, eliminate the underlying authentication issues that are motivating publishers and libraries to move beyond IP authentication.

One direction that seems most likely to gain traction is the further rollout of SAML-based solutions. For licensed e-resources, SAML has been implemented most commonly through Shibboleth federations or OpenAthens. Users attempting to access licensed content are typically confronted with a list of institutions that have such implementations, sometimes grouped by the federations of which they are members. Because the content provider cannot automatically send a user to the appropriate institutional authentication service, the user faces a confusing and complicated extra step (the so-called Where Are You From, or WAYF, problem). But WAYF is a problem that has ready technical solutions. SAML-based solutions are in more widespread use for off-site access elsewhere, but they have not gained traction as the preferred solution for US academic libraries. While SAML is sometimes used for single sign-on to course management systems, grades, and other student information, I am not aware of it being used for on-site e-resource authentication.
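One simple way to shorten the WAYF step is to infer the user’s identity provider from their email address, or to reuse a choice remembered from a previous visit. The lookup table and its entityIDs below are hypothetical. Production discovery services typically draw on the richer metadata that the federations themselves publish.

```python
from typing import Optional

# Hypothetical discovery table: email domain -> identity provider (IdP) entityID.
IDP_BY_DOMAIN = {
    "example.edu": "https://idp.example.edu/idp/shibboleth",
    "pharma.example.com": "https://sso.pharma.example.com/saml",
}

def discover_idp(user_input: str, remembered: Optional[str] = None) -> Optional[str]:
    """Pick an IdP without showing the user a long list of institutions."""
    if remembered:                   # a choice saved from a previous visit
        return remembered
    domain = user_input.rsplit("@", 1)[-1].strip().lower()
    parts = domain.split(".")
    # Walk up subdomains so "med.example.edu" still finds "example.edu",
    # stopping before the bare top-level domain.
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in IDP_BY_DOMAIN:
            return IDP_BY_DOMAIN[candidate]
    return None                      # fall back to the traditional WAYF list
```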

At the Universal Resource Access Forum, presenters proposed a number of approaches to address the issues in question. There were suggestions that corporations could organize their own Shibboleth federation or that SAML solutions could be implemented without Shibboleth. A number of potential pilots are under consideration.

While a SAML-driven solution may not take hold in the long run, we should expect to see much greater energy in alternatives to IP authentication and proxies. Libraries, publishers, and intermediaries should be planning on policy, business, and technical levels for the future they wish to see. Here are several areas for consideration:

Privacy and Personalization

Libraries have stood up for the privacy of their user communities in many ways, and in recent years have expressed concern that data collection and personalization efforts by vendors not betray these principles. The site license model, built on IP authentication, has enabled some important efforts to ensure user privacy. While few libraries have in fact attempted to route all users through a single anonymous IP address, even this type of effort, common in large corporations, has been possible.

Any new authentication model will, in all likelihood, connect directly with vendor platform user accounts. Such a connection will be a real boon to personalization, allowing for the tracking of user-level usage patterns and the delivery of personalization with every interaction. It will commensurately interfere with privacy in a variety of ways.

Key factors to consider:

Will it be possible for users or libraries to opt in or opt out of the new tracking techniques?
Will it be functionally possible for users to be anonymous? Or will the data gathering apply to everyone, whether or not personalization is delivered to all?
Will it be possible for users to merge more aspects of their identity and activity across services together? In other words, will those who desire a more personalized experience be able to have their data shared across platforms?
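The opt-in question can be made concrete. In the hypothetical sketch below, a user who opts out produces only institution-level records, much as under IP authentication. A user who opts in is given a stable pseudonym, computed with an HMAC under a key that the library (not the vendor) is assumed to hold. That pseudonym lets activity be linked for personalization without revealing the user’s name. The key, the field names, and the policy are all illustrative assumptions.

```python
import hashlib
import hmac

# Hypothetical secret held by the library, not the vendor; rotating it
# breaks the link between old and new pseudonyms.
LIBRARY_KEY = b"example-secret-held-by-the-library"

def usage_record(user_id: str, institution: str, item: str, opted_in: bool) -> dict:
    """Build a usage record at the privacy level the user has chosen."""
    record = {"institution": institution, "item": item}
    if opted_in:
        digest = hmac.new(LIBRARY_KEY, user_id.encode("utf-8"), hashlib.sha256)
        record["pseudonym"] = digest.hexdigest()[:16]  # stable, but not the user's name
    return record
```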

Pricing Models

Content providers have been interested for some time in moving away from the unlimited-access site-license model. Many academic libraries, at least, are today paying for journal bundles based on their historical print journal spend plus a variety of inflationary factors and the effects of negotiations over time. Most content providers have at least examined alternatives and some have attempted to establish versions of “value-based pricing” in the marketplace. Such models can emphasize FTE or research expenditures, but perhaps even more promising, in one sense, are models that utilize article downloads or other usage metrics.

In a new authentication model, content providers will gain access to more granular user data than has previously been available to them. For example, they will likely gain access to information about the total number of active users from each institution and the pattern of usage across those users. This may increase the opportunities to introduce pricing models that distinguish more effectively based on the value that customer libraries receive.
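The following hypothetical summary of one institution’s download log illustrates what user-level data would add. Under IP authentication, a provider sees roughly the total. With per-user events, it can also count active users and see how concentrated usage is. The metric names and the choice of a top-10% share are assumptions for illustration and do not represent any provider’s actual method.

```python
from collections import Counter
from typing import Iterable, Tuple

def institution_metrics(events: Iterable[Tuple[str, str]]) -> dict:
    """Summarize (user_pseudonym, item) download events for one institution."""
    per_user = Counter(user for user, _ in events)
    total = sum(per_user.values())
    ranked = sorted(per_user.values(), reverse=True)
    top_n = max(1, len(ranked) // 10)  # the heaviest 10% of users (at least one)
    return {
        "active_users": len(per_user),
        "downloads": total,
        "downloads_per_user": total / len(per_user) if per_user else 0.0,
        "top_decile_share": sum(ranked[:top_n]) / total if total else 0.0,
    }

events = [("u1", "a")] * 6 + [("u2", "b")] * 2 + [("u3", "c"), ("u4", "d")]
print(institution_metrics(events))
# {'active_users': 4, 'downloads': 10, 'downloads_per_user': 2.5, 'top_decile_share': 0.6}
```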

While libraries have substantial concerns with the pricing models in place today, they generally prefer not to move to a different model for fee calculation. One of the benefits of unlimited access models has been that the price is known in advance rather than introducing variability into library budgets. While from a publisher perspective this pricing and authentication model has a variety of disadvantages, it has in effect been grandfathered in.

Key factors to consider:

What additional data will publishers and vendors derive from various authentication alternatives that may influence the pricing models available to them?
Are there alternative pricing models that would be of benefit to publishers and libraries alike?

Library as Gateway

My colleagues and I at Ithaka S+R have been tracking the evolving position of the library as research starting point, or gateway, for more than 15 years. Our most recent surveys of academics in the US and UK have found a recovery in the share perceiving the library as a research starting point: evidence to some that the index-based discovery services are influencing discovery behaviors.

Authentication also plays a vital role in influencing discovery. As I showed in a presentation last year, proxy-based authentication forces users to navigate through the library infrastructure, and to use library discovery tools, in order to gain access. Libraries and library vendors should anticipate that changes to authentication models could negatively impact the library’s role as a gateway.

Key factors to consider:

Is it strategically important that the library be seen by researchers as their starting point?
Will the elimination of an authentication workflow that routes researchers through the library website weaken the library’s gateway role?
Is the mechanism of authentication an appropriate way for the library to defend an intermediary role?

Unaffiliated Users

IP authentication made it possible for libraries to serve anonymous and walk-in users. This includes the unaffiliated general public, which is a key priority for public libraries and academic libraries in public universities. A system can incorporate stronger authentication of individual users while still maintaining options for anonymous and unaffiliated walk-in use, although libraries might need to route that unaffiliated use through a specific account, which presumably could then have different permissions, usage throttling policies, and privacy considerations.
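A shared walk-in account could simply carry a different policy from affiliated accounts. The policy fields and numbers below are hypothetical. They show only how throttling and privacy settings might attach to an account type rather than to an individual.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessPolicy:
    max_downloads_per_hour: int
    personalization: bool   # may the platform store history or preferences?
    retain_user_logs: bool  # are user-level logs kept at all?

# Hypothetical policies a library might negotiate per account type.
POLICIES = {
    "affiliated": AccessPolicy(max_downloads_per_hour=100, personalization=True,
                               retain_user_logs=True),
    # One shared account for all walk-in users: no individual identity is sent.
    "walk_in": AccessPolicy(max_downloads_per_hour=20, personalization=False,
                            retain_user_logs=False),
}

def may_download(account_type: str, downloads_this_hour: int) -> bool:
    """Unknown account types are refused; known ones are throttled by policy."""
    policy = POLICIES.get(account_type)
    return policy is not None and downloads_this_hour < policy.max_downloads_per_hour
```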

Key questions to consider:

Will there be a way to authorize unaffiliated users?
Will any restrictions be imposed as compared with the fairly anonymous approach in an IP-authenticated environment?
Will libraries be able to accept any possible restrictions on walk-in use in order to maintain their ability to serve these users?

Stepping Back

Most academic libraries and scholarly publishers have accepted IP authentication as a given for licensed e-resources. Few have thought about the strategic, policy, and business implications of a fundamental shift in how users are authorized. The time to begin doing so is now. While simple existing solutions like SAML-based options may have been perfectly acceptable as one in an array of alternatives, they have different affordances when examined as the potential sole means of authentication. In that light, the key issue is whether a comparatively easy fix, using existing technologies, can be acceptable enough to all parties — or whether a more extensive but technically complex solution would allow for a better negotiated transition.

Roger C. Schonfeld

@rschon

Roger C. Schonfeld is the vice president of organizational strategy for ITHAKA and of Ithaka S+R’s libraries, scholarly communication, and museums program. Roger leads a team of subject matter and methodological experts and analysts who conduct research and provide advisory services to drive evidence-based innovation and leadership among libraries, publishers, and museums to foster research, learning, and preservation. He serves as a Board Member for the Center for Research Libraries. Previously, Roger was a research associate at The Andrew W. Mellon Foundation.

Discussion

18 Thoughts on "Rethinking Authentication, Revamping the Business"

I think the economic/financial aspects of these potential changes aren’t trivial. Users have become accustomed to not paying for content, preferring for their employers or institutions to pay. Moving away from this won’t be simple, and may not even be likely. Users have come to expect seamless access as a benefit of employment or tuition.

The IP infrastructure also won’t go away, even with SAML or other schemes layered on top. For those of us old enough to remember the pre-IP-site-license days, piracy was rampant with usernames and passwords being shared (probably still is — well, in fact, that’s how Sci-Hub got its content, so definitely it still is). As the Sci-Hub incidents have shown, we still turn to IPs to determine many aspects of what has gone on. It is a very basic and stable part of the Internet infrastructure, and one that was not compromised itself by Sci-Hub. Humans were the point of vulnerability, sharing (purposely or inadvertently) their login credentials.

I think one of the most important points raised around these issues is the need to train staff (and audit systems) regarding social engineering hacks. Getting an email from “your bank” or “PayPal” or “your provost” asking you to verify your login credentials works often enough still, sad to say.

By Kent Anderson
Jun 22, 2016, 8:44 AM

This is a complicated and significant issue, and I completely agree: the least trustworthy and least controllable components of the system as a whole are the individual students, faculty, and researchers who use the content. They are also the ones most inconvenienced by the poor handoffs between our systems. Any time we inconvenience our users, we’re giving them an incentive to find a way around us. It always comes down to the same question: which is easier, changing human behavior or writing code? As long as the answer isn’t to write code that supports outdated publishing/distribution models, my vote is for providing a truly seamless experience for the user.

By Collette Mak
Jun 22, 2016, 11:15 AM

Thanks for the detailed analysis Roger. What a minefield!!

Is this not the strongest argument to change our business model to Open Access completely and be done with IP tracking, SAML, Shibboleth, OpenAthens, CCC, etc? And then we don’t need to worry about Sci-Hub either. 😉

By Kaveh Bazargan (@kaveh1000)
Jun 22, 2016, 1:18 PM

Open access certainly eliminates the authorization problem for content access.

By Roger C. Schonfeld
Jun 23, 2016, 5:58 AM

Open access certainly eliminates the challenges associated with authorizing access to content resources. But even if publishers are moving to other mechanisms for cost recovery (Joe Esposito recently suggested that “at some future point we may give away the content to reap the benefits of the control of metadata” https://scholarlykitchen.sspnet.org/2016/06/14/the-publishing-industry-is-mature-but-publishing-companies-are-not/), there will still be a need to authenticate users for access to their own accounts for personalized services. I won’t be surprised if there remains an ongoing need to connect these to institutional identities and privileges. So even in a different business model, I’m not sure the “minefield” just disappears.

By Roger C. Schonfeld
Jun 23, 2016, 6:05 AM

Surprised that this whole discussion can take place without a single mention of ORCID.

By Richard Wynne
Jun 22, 2016, 1:55 PM

I see this comment seemingly every time I write about authentication. Does ORCID have a product offering or something in the pipeline?

By Roger C. Schonfeld
Jun 22, 2016, 2:03 PM

Because authentication is related to identity, and ORCID “offers” a commons identity solution.

Take a look at the ORCID sign-in page https://orcid.org/oauth/signin and you’ll see an institutional option. I believe it uses SAML to integrate with the dozens/hundreds of listed universities, including your neighbor Cornell (but somebody from ORCID would need to confirm technical details).

In this context, it seems like ORCID would at least be worth a mention.

By Richard Wynne
Jun 22, 2016, 5:15 PM

ORCID has focused on researchers and other contributors, and the need for authentication is (from a numbers perspective) mostly about students. Privileges right now are mostly derived from institutional affiliation, which as I understand it ORCID does not police. There is no bigger fan of ORCID’s work, and I would love to see it connect up into the solution, but, unless they are expressing that this type of work is on their own roadmap, I am reluctant to assume or propose a major role for them here.

By Roger C. Schonfeld
Jun 23, 2016, 6:11 AM

In our experience of working with 1000s of libraries and a multitude of publishers, SAML- and Shibboleth-based solutions in libraries are not growing at a rate that would see them dominate the market for many years to come. In many respects, in the UK (which is a special case because of its government’s very early and public recommendation to its higher education institutes to adopt Shibboleth based solutions) we are seeing a reverse trend away from Shibboleth dependence towards a scenario of proxy access with Shibboleth as a back up. To a great extent this is due to limitations in link server compatibility with multiple concurrent authentication scenarios. The combination of SSO with a professionally-run proxy service seems to be the preferred, lower-cost and lower-support-cost solution.

Using SSO as the sign-on method for the proxy affords it the same level of security as Shibboleth in the first place. Many publishers’ fear of proxies is based on their experience of shared user names and passwords as a means of proxy authentication. A method which is clearly flawed.

It’s also worth noting that all significant change takes time as history has shown in scholarly publishing – print to web, books to eBooks and primary to OA publishing, etc. In our view the same will apply to authentication systems. This is a global issue which presently serves more than 70,000 site license customers from about 120 countries. How long would it take for all publishers and their customers to migrate to and find budget to pay for integration with new and complicated authentication systems?

Exploration of new authentication models for the future is a valid and important exercise. In the meantime, it’s important to remember how reliant all facets of the industry are on IP address data. Important examples include enabling OA publishers to track reliable usage metrics and identify authors, and enabling the industry to identify and address compromised usernames and passwords (another authentication method).

Having worked with publishers and libraries alike on the creation of a registry that ensures valid assignment of institutions to vetted IP addresses and the ability to report reliable usage data we can say with certainty that regardless of what evolves over time in terms of future authentication possibilities, the fact remains that IP authentication is here for some time and therefore the need for effective policing of the allocation of IP addresses to institutional accounts is immediate and crucial.

By Andrew Pitts
Jun 22, 2016, 3:01 PM

Is SLO under SAML still a mess?

By Boris Ogon
Jun 22, 2016, 8:57 PM

Interesting that there seems to be no mention of ‘federated Active Directory’. I think that is also worth considering in this context. I suspect most (all?) institutions are using some form of Active Directory, on which we could build services that are enabled through it.

By Jeremy Macdonald (@jermcd)
Jun 24, 2016, 10:28 AM

It is easy to understand the attraction of IP authentication and its success over the years.
IP authentication removes barriers for end-users who can access the information they need without stopping to think. It is easy to set up for libraries and publishers alike. Visits via the IP range can be counted and shared between the library and publisher.
But there are challenges. One that I have encountered is the increasing importance of engagement data being used to evaluate the relationship between the customer and the publisher. A better understanding of which individuals are reading research and publications – as well as where, when and how often – recalibrates the importance and value of content to an organisation and enables publishers to develop their services to meet the needs of key users.
This year, I was fortunate enough to attend the SLA conference in Philadelphia in June. Many corporate librarians there were looking for alternatives to pure IP authentication.
“We need the IP authentication user-experience, without IP authentication”.
A SAML-based, federated identity solution nearly solves the problem. But it currently falls short because the end-user experience does not remove barriers to access in every scenario. The critical variable is the starting point and research habits of the end-user. If they are comfortable with Google, they are most likely to use Google.
The conversation within the OpenAthens team is how we can be part of a solution that offers a secure, seamless, user-friendly authentication process while offering personalisation and security as part of the discovery journey.
What is great about this article and the Universal Resource Access Forum is the recognition that this problem cannot be solved by one party. It needs collaboration across a range of agents and a real focus on understanding the end user requirements so we can balance ease of access against other competing priorities.

By Jon Bentley
Jun 27, 2016, 10:34 AM

One of the challenges of an industry wide conversation around authentication is the temptation to look for the one answer, whether it’s an enhanced version of IP recognition + SSO, or social logins, or ORCiD etc…

The reality, of course, is that both publishers and their users vary in their needs. The trade-off between simplicity vs security can look very different to a publisher selling access to unique STM content that is essential to research vs a publisher selling access to humanities content where the value add can critically depend on personalizing the user experience. Similarly, some users just want to go and grab content quickly and get out, while others are comfortable with providing credentials to establish their identity in return for more functionality.

The answer has to lie in authentication systems that offer much greater flexibility. Instead of a one-size-fits-all-products approach (IP + the big deal), publishers can support a variety of access scenarios that may vary by product, by pricing model, by access right, and by user profile. For example, a product with highly valuable, unique content could require 2-factor authentication in order to access downloads above a certain volume, but accept anonymous IP authentication for low volume browsing. Or users could be offered the opportunity to associate a social login with their institutional account in order to personalize their experience, without having to learn another set of credentials. We’re investigating a variety of techniques to flag accounts with anomalous usage that could then be required to provide enhanced authentication. And, in some cases, the status quo may be fine.
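The step-up scenario described in this comment can be expressed as a small policy function. This is a hypothetical sketch and not LibLynx’s implementation. The product names, the thresholds, and the anomaly flag are invented for illustration. The flag is assumed to be set by some separate monitoring process.

```python
from dataclasses import dataclass

@dataclass
class Session:
    via_ip: bool                     # recognized by institutional IP range
    mfa_verified: bool               # completed a second factor this session
    downloads_today: int
    flagged_anomalous: bool = False  # set by a separate usage-anomaly check

# Hypothetical per-product download volumes allowed before step-up is required.
FREE_DOWNLOAD_THRESHOLD = {"unique-stm-db": 25, "humanities-collection": 500}

def required_step(product: str, action: str, s: Session) -> str:
    """Return 'allow', 'step_up' (ask for a second factor), or 'deny'."""
    if not s.via_ip and not s.mfa_verified:
        return "deny"        # no institutional recognition at all
    if action == "browse" and not s.flagged_anomalous:
        return "allow"       # low risk: anonymous IP access is enough
    threshold = FREE_DOWNLOAD_THRESHOLD.get(product, 0)
    if s.mfa_verified or (s.downloads_today < threshold and not s.flagged_anomalous):
        return "allow"
    return "step_up"         # high volume or anomalous: require a second factor
```

In this sketch, a single session can browse freely, download modestly from a high-value product, and be asked for a second factor only when volume or anomaly signals warrant it. This is the kind of per-product flexibility the comment argues for.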

Most of the component technologies are already in place – the biggest barrier is simply the perceived cost of more flexible solutions. Most of the publishers we speak to at LibLynx are struggling with legacy access management systems that were designed to support a limited set of access scenarios (mainly subscription products authenticated by IP and username/password). These systems were typically built a decade or more ago and have been organically upgraded since. They’re expensive to re-engineer to support new authentication methods and authorization logic. They were never designed to support different access scenarios within and across products.

While I don’t doubt new technologies could provide exciting new answers, and collaboration across our industry is clearly needed to address the friction created when users cross platforms, I think publishers can do a lot more now to offer users access experiences that are far better tailored to both their needs and those of their users.

To paraphrase SSP’s keynote speaker, David Kidder, we need to start firing lead bullets, not silver ones, if we’re to lower the cost of experimentation and learn what works (and what doesn’t).

By Tim Lloyd
Jun 29, 2016, 6:41 AM

Tim,

I was involved in the creation of a legacy access management system, similar to the ones you refer to, less than 10 years ago. Your analysis on the lack of flexibility is spot on.

One goal of our project was to capture and process user data in order to manage authentication on the publisher side.

There are a lot of overheads in the management of customer data. These include the admin resource required to collect the data accurately; the effort required to validate and then activate the user account; the need to manage change and churn within the records.

If a publisher is able to pivot their perception of data management and realise all that effort can be managed on the customer side there could be more focus on optimising the user journey for the institutional subscriber. It is the role of SAML and the federation to manage the authorisation within the terms of the licence – and agreed data can still be passed to the publisher.

The commercial terms remain very similar to those of a contract based on IP authentication, although the added granularity means a licence can be based on attributes that exist within the customer’s own user directory. Department is one attribute that OpenAthens often sees used within contracts.
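Attribute-based licensing of this kind can be sketched as a check on the attributes that an identity provider releases after a SAML login. `eduPersonScopedAffiliation` is a standard eduPerson attribute whose values take the form role@scope. The `department` attribute name and the licence terms are hypothetical, because institutions release departmental information in different ways.

```python
from typing import Dict, List

def licence_permits(attributes: Dict[str, List[str]], licence: dict) -> bool:
    """Check the attributes released at SAML login against hypothetical licence terms."""
    released = set(attributes.get("eduPersonScopedAffiliation", []))
    eligible = {f"{role}@{licence['scope']}" for role in licence["roles"]}
    if not released & eligible:
        return False         # wrong institution or ineligible role
    departments = licence.get("departments")
    if departments is None:
        return True          # licence covers the whole institution
    return bool(set(attributes.get("department", [])) & departments)

chemistry_licence = {
    "scope": "example.edu",
    "roles": {"faculty", "staff", "student"},
    "departments": {"Chemistry", "Pharmacology"},
}
student = {"eduPersonScopedAffiliation": ["student@example.edu"], "department": ["Chemistry"]}
historian = {"eduPersonScopedAffiliation": ["faculty@example.edu"], "department": ["History"]}
print(licence_permits(student, chemistry_licence), licence_permits(historian, chemistry_licence))
# True False
```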

The change in perception is not a silver bullet. But it could empower the publisher to realise they can increase their engagement with individuals within organisations and embrace flexibility by releasing themselves of a large administrative burden.

By bentoswp
Jun 29, 2016, 8:56 AM

The English NHS has been using Athens and OpenAthens (http://www.openathens.net/nhs_users.php) for 15 years to manage paywalled access on behalf of about 1 million staff, without any significant issues. NHS libraries manage registration. I should add that the NHS was an early signatory to Berlin Open Access, and uses geolocation to provide England wide access to the Cochrane Library, but where identity management is required Athens has met its needs admirably.