Editor’s Note: Today’s post is by Ralph Youngen. Ralph is Senior Director of Technology Strategy & Partnerships at the American Chemical Society (ACS). He is past Co-Chair of RA21, a member of the SeamlessAccess governance committee, and one of the founders of GetFTR.
Last month I attended my first Electronic Resources and Libraries (ER&L) conference. My decision to attend was prompted in part by Lettie Conrad’s report “Of Paywalls and Proxies: The Buzz about Access at ER&L 2019”. Given my involvement with RA21 and its production implementation of SeamlessAccess over the past several years, I looked forward to the interaction with representatives from the library community most directly involved in matters of access and authentication. (N.B. For those who may be unfamiliar with them, RA21 was a community-driven initiative that culminated in the creation of a NISO Recommended Practice for improved access to institutionally-provided information resources. SeamlessAccess is a coalition creating an operational service based upon those recommended practices, promoting digital authentication leveraging an existing single sign-on infrastructure through one’s home institution.)
While Lettie’s ER&L 2019 report noted a collective need for change, the sentiment throughout ER&L 2020 acknowledged that said change is finally dawning. Campuses shared their successes and challenges with enabling federated authentication through services like OpenAthens. Other sessions focused on patron privacy. I particularly enjoyed a session presented by campus librarians from Stanford, Duke, and Yale, in which they indicated that the RA21 initiative raised their awareness of the need to focus on privacy issues and sparked long-overdue conversations with their campus IT departments that oversee authentication services.
Protection of user privacy is indeed a critical concern. Recent privacy regulations such as the EU’s General Data Protection Regulation (GDPR) or the California Consumer Privacy Act provide useful foundations to build upon, but more specifically, this concern highlights the need for community-driven best practices for the scholarly information industry. As an example, SeamlessAccess is sponsoring an open working group whose goal is to codify sets of attributes that campuses would release about their users; of particular note is the distinction as to when it would or would not be appropriate to release some personally identifiable attributes. A separate working group tasked with developing model contract language around authentication issues is in the planning stage. See the SeamlessAccess website to get involved.
Then, the World Changed
For the hundreds of us who attended ER&L in person, I suspect no one could have predicted how the world would change when we returned home. The conference signified the last time I was in contact with more than a handful of people, my last plane flight, my last hotel stay, my last dinner in a restaurant…
The day after I returned home from ER&L, I watched a news report about how several U.S. campuses had made the decision to move to online-only classes for the remainder of the term. I rewound my DVR and took a picture of the TV screen.
Motivated by a need to ensure these students and faculty continued to have access to our publications, my ACS colleagues and I mounted a small campaign to reach out to the campus librarians at these institutions. The effort was similarly spurred on by reports received from China in February from the CARSI Federation, representing more than 300 Chinese academic institutions, who reported that their campus networks were becoming overloaded due to the sharp increase in remote access. ACS Publications responded to their request in February, rapidly enabling federated authentication for these Chinese institutions. With a strong suspicion that the United States was not far behind, we asked if the aforementioned U.S. institutions would enable federated authentication so that their patrons could access ACS Publications content using their campus login credentials while off-campus. In our communication, we explained how federated authentication would work alongside existing remote access methods, such as proxy or VPN. A small number of campuses responded positively, a smaller number responded negatively, but the majority did not respond at all.
Not long after, nearly every U.S. campus closed its physical doors. It was at this point that we rolled out a broader outreach initiative to all U.S. campuses that are members of the InCommon Federation.
A Quick Primer on Federated Authentication
Federated authentication (sometimes known by its implementations or consortia, such as Shibboleth or OpenAthens) is a method for allowing members of one organization to use their authentication credentials to access a web application of another organization. There are three parties involved: the end user, the organization hosting the web application (called the Service Provider), and the organization that can validate the user’s authentication credentials (called the Identity Provider).
In our industry, Service Providers (SPs) tend to be publishers, and Identity Providers (IdPs) tend to be academic institutions. The whole system relies upon a two-way trust between IdPs and SPs that, among other things, governs the information (“attributes”) that the IdP is willing to share about their users to the SP. While this may seem similar to logging into a website using your personal Google or Facebook account, the fact that an IdP (i.e., the academic institution) has complete control over the information it is willing to share about its multiple users is a fundamental differentiation — this is the basis for ensuring patron privacy.
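To make the IdP's control over attribute release concrete, here is a minimal sketch in Python. It is not how Shibboleth or any real IdP is configured; the names `RELEASE_POLICY` and `build_assertion`, the entity IDs, and the attribute names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """A campus user record as the IdP might hold it (fields are illustrative)."""
    username: str
    full_name: str
    email: str
    affiliation: str  # e.g. "student", "faculty", "staff"


# Hypothetical per-SP release policy, owned and edited by the IdP (the campus).
RELEASE_POLICY = {
    "https://publisher.example.org/sp": ("affiliation",),
    "https://grades.example.edu/sp": ("full_name", "email", "affiliation"),
}


def build_assertion(user: User, sp_entity_id: str) -> dict:
    """Return only the attributes the IdP has agreed to release to this SP."""
    allowed = RELEASE_POLICY.get(sp_entity_id, ())  # unknown SP: release nothing
    return {name: getattr(user, name) for name in allowed}


user = User("jdoe", "Jane Doe", "jdoe@campus.example.edu", "student")
publisher_view = build_assertion(user, "https://publisher.example.org/sp")
print(publisher_view)
```

In this sketch the publisher learns only that the user is a student at the campus; the name and email never leave the IdP unless the campus edits its own policy.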
Federated authentication is also a more efficient way to deliver content to off-campus patrons than other commonly known remote access options. With VPN and many proxy configurations, content from a publisher’s site must flow through the campus network in order to get to the end user. In contrast, federated authentication relies solely on the campus network to authenticate the user’s ID and password. This was the reason behind the CARSI Federation’s request to enable federated authentication – to help remove bandwidth from overloaded campus networks. As expected, in March, ACS began to hear similar reports of network capacity concerns from some U.S. campuses.
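The bandwidth argument can be illustrated with a toy model. The sketch below assumes a simplified picture in which VPN and proxy sessions carry both the login exchange and the content across the campus network, while a federated session carries only the login exchange. The function name and the byte counts are hypothetical, not measurements from ACS or any campus.

```python
def campus_network_bytes(method: str, content_bytes: int, login_bytes: int) -> int:
    """Bytes that cross the campus network for one off-campus session (toy model)."""
    if method in ("vpn", "proxy"):
        # Content is relayed through campus on its way to the user.
        return login_bytes + content_bytes
    if method == "federated":
        # Campus only validates credentials; content flows publisher -> user directly.
        return login_bytes
    raise ValueError(f"unknown access method: {method}")


# Hypothetical sizes: a 5 MB article PDF and a 50 KB login exchange.
PDF_BYTES = 5_000_000
LOGIN_BYTES = 50_000

via_proxy = campus_network_bytes("proxy", PDF_BYTES, LOGIN_BYTES)
via_federated = campus_network_bytes("federated", PDF_BYTES, LOGIN_BYTES)
print(via_proxy, via_federated)
```

Under these assumed sizes the campus network carries about a hundred times less traffic per session with federated authentication; the real ratio depends on what users download.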
Expanding Federated Authentication
ACS Publications is now enabling federated authentication for academic institutions worldwide. We are actively joining other national federations (beyond InCommon and CARSI) to provide members with federated authentication as a remote access option.
In the United States, while ACS has been a member of the InCommon federation for many years, only about a dozen of InCommon’s 550 IdP members had been enabled for federated access to ACS Publications content. For years the practice that ACS — and many other publishers — adopted was to only enable U.S. campuses that explicitly requested federated access. At the same time, most InCommon members had enabled their side of this two-way trust years ago. In March, ACS made strides to finalize our side of this configuration to easily allow the rest of these campuses to use their institutional federated identity credentials to access ACS Publications content.
About 350 of the InCommon institutions began using federated access during the last week of March. This change led to an incredible increase of more than 2,600% in the use of federated access in March. We are on pace to nearly double that rate of increase in April, as shown on this chart:
For years, the uptake of federated access has been modest and consistent. The bump of usage in February was an increase from institutions in China as a result of activating the CARSI Federation. The explosion of traffic in March was a combination of activating the InCommon U.S. campuses, along with ACS Publications’ implementation of the SeamlessAccess user experience, which makes federated authentication much easier to use.
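A note on reading the 2,600% figure cited above: a percentage increase is measured relative to the baseline, so it converts to a multiplier by adding one. The helper below is a small illustrative sketch (the function name is an assumption), not part of any ACS reporting tool.

```python
def growth_factor(percent_increase: float) -> float:
    """Convert a percentage increase into a multiplier on the baseline."""
    return 1 + percent_increase / 100


march_factor = growth_factor(2600)
print(march_factor)  # usage as a multiple of the earlier baseline
```

In other words, an increase of 2,600% means usage reached roughly 27 times its earlier level, not 26 times.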
Responses have been predominantly positive, and from a usage perspective, it is clear that patrons are finding the service useful. We have received some criticism, which has generally fallen along two lines: (1) some institutions prefer that all patrons continue to access content through library systems regardless of the user’s location (reasons range from control over data release, analytics, and usage data derived from centralized management to license compliance and systems integration concerns); and (2) some institutions voiced concerns about patron privacy and patron attribute release.
The privacy concerns that have been raised relate to how each institution implements its identity federation services. As noted above, institutions have full control over what information about a user is provided to a service provider when authentication is approved. However, most institutional identity management is controlled by the campus IT department, not by the library. Many institutions use a bundle of attributes about its users, the Research-and-scholarship (R&S) entity category, as a default bundle of metadata about a user for most identity federation services. This entity category is generally acknowledged as not appropriate for most library services as it provides far too much information about the user, and as such, is not privacy protecting. The RA21 recommendation highlighted the fact that while identity federations can be used to provide personally identifiable information about the user when necessary, this is not necessary for library services. The purpose of an ongoing SeamlessAccess working group is to define a new entity category for library services. This would provide a “default setting” for institutional IT departments to set up, and thereby ensure only minimally required user attributes are released to service providers.
The timing of the RA21 project and the resulting SeamlessAccess service is fortunate. Just at the time when most, if not all, institutions need to provide a reliable, secure, and simple method of remote access, SeamlessAccess provides precisely that service. It is clear from the data that ACS collected that users remain exceptionally keen to access scientific literature and will rapidly adopt federated authentication for remote access to publications. In part, this is because they are familiar with how to navigate signing on through institutional logins. But in greater part, it is because nearly everyone is having to move rapidly to reliable remote access solutions. While network-level authentication works reasonably well in a world where most people are physically together on campus, it falters in this new pandemic world where we are all working from home.
(The author gratefully acknowledges the contributions from ACS colleague Erin Wiringi and Scholarly Kitchen Chef Todd Carpenter.)
12 Thoughts on "Guest Post – Seamless Remote Access During a Global Pandemic: An Indispensable Necessity"
Thank you for sharing your experiences Ralph. You’ve pointed out the exact issue that librarians have been raising for quite some time about privacy. The problem is this sentence: “the fact that an IdP (i.e., the academic institution) has complete control over the information it is willing to share about its multiple users is a fundamental differentiation — this is the basis for ensuring patron privacy.”
Perhaps it SHOULD be the basis but in reality it is NOT.
As you yourself explain: “However, most institutional identity management is controlled by the campus IT department, not by the library. Many institutions use a bundle of attributes about its users, the Research-and-scholarship (R&S) entity category, as a default bundle of metadata about a user for most identity federation services. This entity category is generally acknowledged as not appropriate for most library services as it provides far too much information about the user, and as such, is not privacy protecting.”
Given this reality, we should not be surprised that librarians reject the notion that SeamlessAccess/RA21/etc. is inherently privacy-protecting. As is clear from the description here, the technology is not inherently privacy-protecting per se. And, though the technology can be set up to be privacy-protecting, the policy/entity categories being used in too many cases are not privacy-protecting. Until a library-resource appropriate entity category with absolutely minimal data sharing is established, publishers will continue to experience libraries rejecting SAML and objecting to having it imposed.
It’s interesting to see the issue framed this way, as really an internal battle between librarians and their own institutions, rather than the typical librarians versus publishers notion that usually comes into play here (with everyone assuming everything is a secret plot by Elsevier).
That is one aspect of this issue. I did raise this very point in early 2018. It is disappointing that about two years passed since then before RA21/SeamlessAccess started to address this.
FYI in 2018 here on SK: “First, campus technology units are likely to prefer federated identity solutions to IP authentication as identity-based solutions offer greater account and network security. These units do not seem to share the library’s commitment to user anonymity and minimal data sharing. In fact, more than one publisher/platform has privately confirmed to me that campus identity systems pass along more user information than they need or would like to receive. I recently watched as a campus technology SAML/Shibboleth system passed a user’s email address, full name, and staff/staff status to a vendor in order to allow access to a PDF from off-campus when on-campus access would have been possible based on IP address alone.” https://scholarlykitchen.sspnet.org/2018/01/16/what-will-you-do-when-they-come-for-your-proxy-server-ra21/
Adoption of “a library-resource appropriate entity category with absolutely minimal data sharing” will be a welcome convention to facilitate ease of appropriate access to resources with enhanced individual privacy.
Some additional nuance seems needed in this discussion if it is to go beyond a false dichotomy between privacy=anonymity=IP-based access versus no-privacy=identity-based authentication=federated/SAML-based access.
Individual IP addresses obscure but do not necessarily assure anonymity. Institutions generally need the ability to associate the use of a specific institutional IP address with a user (for example, to comply with DMCA). We trust or hope that ability will not be abused, but should not imagine IP-based access guarantees that access to a given service or resource cannot be reliably traced to a specific individual.
SAML IdPs (and Shibboleth specifically) enable privacy-preserving access by supporting identifiers that do not indicate the actual identity or record of individual users, and use of assertions of membership or affiliation (e.g., “faculty” or “engineering student”) to indicate categories for different levels of access instead of individual identities. Services (including library services) may and do request additional information that may or may not be honored. I have seen and rejected overbroad requests to provide full name, date of birth, and SSN!
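One common way to build an identifier that does not reveal who the user is can be sketched as a keyed hash over the user and the requesting service. This is an illustration of the general idea only, not the algorithm Shibboleth or any particular IdP uses; the function name, the salt, and the entity IDs are hypothetical.

```python
import hashlib
import hmac


def pairwise_id(secret_salt: bytes, local_user_id: str, sp_entity_id: str) -> str:
    """Derive a stable, opaque identifier that differs for every service provider."""
    message = f"{sp_entity_id}!{local_user_id}".encode("utf-8")
    return hmac.new(secret_salt, message, hashlib.sha256).hexdigest()


SALT = b"campus-secret-known-only-to-the-idp"  # hypothetical secret
id_at_publisher_a = pairwise_id(SALT, "jdoe", "https://publisher-a.example.org/sp")
id_at_publisher_b = pairwise_id(SALT, "jdoe", "https://publisher-b.example.org/sp")
print(id_at_publisher_a != id_at_publisher_b)
```

Because the salt stays with the campus, a service sees the same opaque value each time the same user returns, yet it cannot recover the username, and two services cannot match their identifiers against each other.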
Sometimes privacy requires reliably identifying the user. Access to and privacy of individuals’ own health records and benefits seems to require verifying the users’ identities – with a high degree of assurance! – for access.
Hi Ralph – a really great article that I very much enjoyed reading. A few small corrections – “R&S” is not a default release policy for services – it’s a release policy that can ONLY be used for very specific types of services that require specific personal data and should never be used as a default. Services that have been vetted and approved for being able to receive this bundle are a small and elite group – currently only 8% of eduGAIN Service Providers qualify and we audit this annually. The wording of R&S explicitly denies the use of R&S for licensed (library) content so it’s much stronger than a general feeling – a library service would likely be rejected if they apply. In this sense, R&S is extremely focused on being privacy-preserving: it does not provide too much information about the user, it provides exactly the right amount to services that have proven that they need the information. I hope that helps clarify but REFEDS welcomes discussion with any provider or institution on R&S.
Hi Nicole. Thanks for the additional details about R&S. The point I was trying to make in the article is that I believe many IdPs are incorrectly releasing the R&S attribute set to SPs that, as you point out, do not qualify for inclusion in R&S. I could be wrong about that, but what I can definitely say is that many IdPs were sending (and likely still are sending) personally identifiable user attributes to us, even though we don’t request them. This was revealed to end users by a bug on our site that would show a “Hello Nicole” message at the top of the screen when a user authenticated with a SAML connection that contained personally identifiable attributes. We have since fixed that bug, and are now properly throwing away any unneeded attributes.
Ralph, Sorry to be late coming to your informative piece. Thanks for putting it together and sharing your perspective and experience. One question: I see that ACS recently “made strides to finalize our side of this configuration” and am wondering if you can share a little more about what that required on the ACS side? Also, if I may ask, why now? Or, perhaps asked another way, what took you so long? And, are there any tradeoffs to this approach or is it one you would recommend unreservedly to other publishers?
Hi Roger. One of the primary roles of research and education federations, like InCommon, is to facilitate the exchange of SAML metadata to enable the trust fabric between IdPs and SPs. When ACS joined InCommon (and many other federations) years ago, we exchanged SAML metadata with the participants in those federations. The final step that needs to take place on publisher platforms is to associate an institution’s EntityID (which can be thought of as a unique URL to an institution’s login page) with a contract, so that appropriate access is granted to users who log in through that institution’s login page, and so that usage is appropriately allocated to that institution. On publisher platforms, associating an EntityID with a contract also causes the institution to appear on the publisher’s “Find Your Institution” page.
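As a rough sketch of that final configuration step, the Python below models the association between an EntityID and a contract, and shows how that association could drive both access and discoverability. The table names, EntityIDs, and contract identifiers are hypothetical; this is not the ACS platform's actual data model.

```python
from typing import Dict, List, Optional

# Hypothetical IdPs known from exchanged federation metadata: EntityID -> display name.
FEDERATION_METADATA: Dict[str, str] = {
    "https://idp.campus-a.example.edu/idp/shibboleth": "Campus A",
    "https://idp.campus-b.example.edu/idp/shibboleth": "Campus B",
}

# Hypothetical publisher-side step: only Campus A has been linked to a contract.
CONTRACT_BY_ENTITY_ID: Dict[str, str] = {
    "https://idp.campus-a.example.edu/idp/shibboleth": "contract-001",
}


def contract_for(entity_id: str) -> Optional[str]:
    """Contract that governs access and usage allocation for logins via this IdP."""
    return CONTRACT_BY_ENTITY_ID.get(entity_id)


def discoverable_institutions() -> List[str]:
    """Institutions that would appear on a 'Find Your Institution' page."""
    return sorted(
        name
        for entity_id, name in FEDERATION_METADATA.items()
        if entity_id in CONTRACT_BY_ENTITY_ID
    )


print(discoverable_institutions())
```

In this model, Campus B has exchanged metadata through the federation but stays invisible and without access until the publisher adds its EntityID to the contract table.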
As noted in the article, the common practice among ACS and other publishers has been to complete that final configuration step and make institutions discoverable at the request of the institution. While that has been a common practice, it has not been universally applied. When joining the UK federation, for example, ACS and many other publishers enabled all UK institutions for federated access.
The tradeoffs are discussed in the article. Some campus libraries want all patrons to use centralized library systems to access scholarly content, and federated access provides patrons with direct access to scholarly content outside of those centralized campus library systems.
The other concern is over patron privacy. As I noted in my response to Nicole, many campuses are indeed revealing more information about their patrons than required. This is a problem that campuses need to address, and there is an active working group sponsored by SeamlessAccess to define appropriate attribute bundles and make this easier for campuses to configure. Publishers can also commit to discarding any attributes received that are not required for accessing scholarly content. I suspect this “belt and suspenders” approach, where campuses work to configure attributes appropriately and publishers commit to discarding unneeded attributes, will be with us for some time.
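The publisher half of that approach can be sketched as an allowlist filter applied before anything is stored or displayed. The set of required attributes below is an assumption made for illustration (the names follow the eduPerson schema); it is not a statement of what ACS or any other publisher actually requires.

```python
# Assumed minimal set a content platform might need; hypothetical choice.
REQUIRED_ATTRIBUTES = frozenset({"eduPersonScopedAffiliation", "eduPersonEntitlement"})


def discard_unneeded(received: dict) -> dict:
    """Keep only allowlisted attributes; everything else is dropped immediately."""
    return {name: value for name, value in received.items() if name in REQUIRED_ATTRIBUTES}


# Example of an IdP that releases more than the service asked for.
received_attributes = {
    "eduPersonScopedAffiliation": "member@campus.example.edu",
    "givenName": "Nicole",
    "mail": "nicole@campus.example.edu",
}
kept_attributes = discard_unneeded(received_attributes)
print(kept_attributes)
```

An allowlist is used rather than a blocklist so that any attribute the publisher has not explicitly decided it needs is discarded by default, including ones a campus starts releasing later.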
I think it is important that publishers realize that if they proactively activate rather than wait for the institution to request (1) they are imposing a support burden on the library that the library may not be prepared for — it is a nice theory that people don’t need support but that is a theory not borne out by any empirical evidence at all and (2) the user population of an institution’s SAML may be much greater than the user population contracted for — i.e. the publisher may be enabling an immense amount of leakage of their content to non-entitled users and the publisher should expect the library to object to such use being reported in, for example, COUNTER reports.
On the privacy point, it is really a relief that there is no longer a pretending that there isn’t a privacy violation happening!
Hi Lisa. On (1), yes that could happen. However, I think it’s also fair to say that there is no empirical evidence to support the assertion that our implementation of federated access has increased the support burden on the library. The data we do have reveal a massive increase in users adopting federated access. I’m unaware of any data indicating a corresponding increase in library support burden. On (2), yes this could happen as well. We have considered this and do not believe an “immense amount of leakage” is likely.
Finally on your point about a privacy violation happening, while I am not an attorney, I doubt that anything described here would meet that legal threshold. That said, I am fully supportive of, and actively involved in, efforts to ensure patron privacy.
(1) My comments on support burden were not specific to ACS but to the general concept of imposing federated access. On (2) similarly not judging ACS’s assessment of this but want to make sure other publishers possibly following your lead are aware of and can make their own assessment of this risk.
As to privacy, yeah – what’s legal is just not the only way that librarians think about this re where the lines are for what’s a privacy violation. There are a lot of things that are perfectly legal that are absolutely anathema to the values of privacy, free inquiry, intellectual freedom, and the many other important values and practices.
We are happy to see Seamless Access is addressing the privacy issues and building solutions for SAML connections for libraries. A year ago several librarians and technical specialists from the EU started the initiative Federated Identity Management for Libraries (FIM4L) because of privacy concerns of libraries when using federated SSO. FIM4L has drafted a Guidelines and Recommendations document for libraries, which is now open for public comments. It is looking for a consensus on library policy for federated authentication that protects users’ identities. It aligns with SeamlessAccess. https://libereurope.eu/blog/2020/03/02/fim4l-recommendations/