Editor’s Note: We first quoted Caltech Artist-in-Residence David Kremers’ maxim, “Privacy is the new luxury,” back in 2009. In the decade since then, we’ve seen a continuous assault on personal privacy, as surveillance has become the dominant business model on the internet. We’ve approached questions of privacy from many different angles over the years here at The Scholarly Kitchen, including technological and architectural approaches to privacy (particularly relevant in light of new authentication systems like RA21), as well as the search for the right balance between personalization and privacy.
Librarians have long been stalwart guardians of patron privacy — an increasingly difficult task in the era of Google and Facebook. Today’s post is by Mimi Calter, Deputy University Librarian for Stanford University, who brings a useful framework for libraries as they consider patron privacy.
Patron privacy has been a long-standing concern of libraries, and in the era of Facebook data-sharing scandals and of GDPR, the privacy of users of digital content is an increasing concern. In response to that general issue, and to several specific difficulties with data providers, Stanford Libraries, with support from a number of our peer institutions, have put forward a Statement on Patron Privacy and Database Access.
Our goal in putting forward this public statement is, first and foremost, to clarify our commitment to our patrons: our campus communities. We want our patrons to know our position and priorities, but, more importantly, we want to be clear with our vendors and data providers that we take this position and our responsibility very seriously. In some cases that responsibility is statutory, but it is always ethical, as noted in the ALA Code of Ethics and the ALA Library Bill of Rights. We value our role as trusted providers, we have a responsibility for safeguarding patron data, and we will be attentive to that responsibility when we are the customers of data service providers. We will focus on this issue when drafting and reviewing contracts, and we will be firm on these principles in negotiating terms.
Our particular concern is transparency and disclosure. We recognize that some users may choose to share their personal data to establish accounts, for example for the sake of a customized or enhanced user experience. What concerns us is the growing trend toward the silent, or unknowing, sharing of patron data. We are seeing a growing number of demands by providers for data from the library, made “on behalf of the patron” but without the patron’s knowledge or control. We’ve seen this trend in consumer industries, and it is bleeding into information and content services licensed by libraries. This is unacceptable.
Silent sharing creeps in through several routes.
- We have seen examples of data use or privacy clauses that allow for change without notice. This is unacceptable because libraries have a responsibility to ensure that these clauses comply with laws and local policies, which necessitates advance review and approval of any change.
- We see clauses that allow patron data to be shared broadly with third parties. This is unacceptable because we cannot allow our patron data to be shared in ways that are not directly related to the provision of service or in ways that are not secure.
- We see proposed terms that allow for broad capture and open-ended use of patron data and patron activity. This is unacceptable because it is at odds with our long-standing practice of capturing and retaining the minimum data required to provide the desired service.
- We see data and user security terms that demonstrate a lack of understanding by vendors and content providers about GDPR and data privacy. This is unacceptable because, though we are a US-based institution, we have many connections with Europe that require us to comply with these standards.
- We see examples of existing accounts that were created under acceptable data use policies, or under no data use policy at all, being migrated without notice to new platforms with different data reuse terms. This is unacceptable because it does not allow users to make an informed choice about the use of their data, and again does not allow libraries to ensure compliance.
- And we see the growth of potentially high-value initiatives such as RA21, which may bring increased pressure to expose more patron data as a “standard” part of access to digital resources. These must be carefully structured to minimize exposure of patron data as much as possible, and always to ensure disclosure of any PII that may be transmitted.
Our recent statement is meant to make clear that patron privacy is a matter of fundamental principle for major research libraries, to demonstrate to our vendors that we are watching, and to show that we are coordinated in our efforts.
As the Identity Providers for our patrons, we must be proactive in protecting them, and we will insist that any data sharing is done only with fully informed and express consent. We will hold content providers and vendors accountable for their intent, actions, and security practices. Of course, privacy and data management standards will be an ongoing discussion. We expect change, and we welcome debate. But our dedication to protecting patron privacy will be clear and unwavering.
12 Thoughts on "Guest Post — Protecting Patron Privacy in Digital Resources"
Thanks for this piece. That’s quite the list of examples! Have you been able to effect any change in contracts with respect to these issues? I look at the NISO Privacy Principles, for example, and it doesn’t seem like they are implemented, even by publishers who helped create them.
We have seen examples of data use or privacy clauses that allow for change without notice.
Wouldn’t a contract that allows its terms to be changed without notice be an invalid contract, pretty much by definition? (Not that that would necessarily prevent someone from putting such a clause in a contract anyway — heaven knows I’ve had to negotiate such terms out more than once.)
The issue, of course, is that no one ever reads those terms or pays attention to when they change. Libraries, acting on behalf of our users, want to at least make terms as transparent as we can to our users, but it’s hard to do if those terms shift without any meaningful notice. What I’ve seen is that we can successfully get those “change without notice” terms out of contracts directly between the library and the vendor, but then vendors turn around and incorporate a whole separate, standard use agreement that our users are expected to agree to when accessing their database or website. It’s those website terms-of-use and privacy policies that are the problem.
In the context of this post, I would like to make clear that the RA21 initiative is NOT aimed at seeking greater exposure of individual library patron data; in fact, quite the opposite. Overall, the participants in the RA21 initiative share the concerns about patron privacy expressed in this joint statement. Patron privacy is a core distinguishing feature of the library community, and most suppliers who provide content or services to the library community recognize library patrons’ right to intellectual freedom. RA21 is not an attempt to break trust with this fundamental principle of library service.
While the recommendations are still being drafted (the draft should be available for public comment within the next couple of weeks), the RA21 Committee has approved the adoption of the GEANT Data Protection Code of Conduct (See: https://ra21.org/index.php/2019/02/28/ra21-adopts-refeds-data-protection-code-of-conduct/ ) in support of privacy, limitations on data uses, and limited attribute release in SAML exchanges for the forthcoming RA21 service. This Code of Conduct outlines principles such as data minimization, limiting data use to prescribed and identified services, strictures on reuse for unrelated services, and other important privacy-protecting features. Furthermore, the forthcoming RA21 recommendation will provide guidance that, for library services, only a pseudonymous token and the entitlement attribute be shared without the explicit consent of the user. Additional user attribute release is, and should be, controlled and governed by the institution, not the publisher, again with the consent of the user. Additionally, RA21 has been based on the legal requirements of GDPR and is being crafted and built accordingly, as many of the core participants in the project are EU-based or have significant footprints in the EU market.
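To make the attribute-release guidance above concrete, here is a minimal Python sketch of an identity provider that, by default, releases only a per-publisher pseudonymous identifier and the entitlement attribute, and releases anything else only with the user's consent. The attribute names, the function names, and the keyed-hash (HMAC-SHA256) derivation of the pairwise identifier are illustrative assumptions, not RA21's specification or any IdP product's actual implementation.

```python
import hashlib
import hmac

# Hypothetical attribute names, loosely modeled on eduPerson / SAML naming.
ENTITLEMENT = "eduPersonEntitlement"
PAIRWISE_ID = "pairwise-id"


def pairwise_id(user_id: str, sp_entity_id: str, secret: bytes) -> str:
    """Derive an opaque per-publisher identifier with a keyed hash.

    The same patron gets a stable value at one publisher, but unrelated
    values at different publishers, so they cannot link sessions.
    """
    message = f"{user_id}|{sp_entity_id}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def release_attributes(user: dict, sp_entity_id: str, secret: bytes,
                       consented: frozenset = frozenset()) -> dict:
    """Release only the pseudonymous ID and entitlement by default.

    Any other attribute is added only if the user has consented to it
    (its name appears in `consented`) and the IdP actually holds it.
    """
    released = {
        PAIRWISE_ID: pairwise_id(user["uid"], sp_entity_id, secret),
        ENTITLEMENT: user.get(ENTITLEMENT, []),
    }
    for name in consented:
        if name in user:
            released[name] = user[name]
    return released
```

Because the identifier is keyed on the publisher's entity ID and an institutional secret, two publishers receiving tokens for the same patron cannot correlate them on their own.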
I will believe that RA21 is committed to patron privacy when it also incorporates provisions that service providers for library resources will, reciprocally, *not accept* more than the pseudonymous token and entitlement attribute without explicit consent from *the library* – not the institution at large.
RA21 knows that there is often a divide between libraries and the IT departments that control the authentication process. At your ER&L presentation this year, two conflicting views were presented on how to handle this: from Sari Frances at IEEE, that RA21 will “force” libraries and their IT departments to work together; from Dan Ayala at ProQuest, that RA21 will enable vendors to assist libraries who have difficulties coordinating with their IT departments.
Planning to force two conflicting entities within an institution to get along, on their own and without cooperation from the service provider, is not a recipe for success. Humans don’t work like that. There will inevitably be SPs who game the system by storing and using extra data sent to them by an IT department without consent of the library. Not their problem it got sent, right?
A troubling foreshadowing of this potential scenario: IEEE’s own new proxy stanza, publicly available at https://help.oclc.org/Library_Management/EZproxy/Database_stanzas/IEEE_Xplore , includes “Option X-Forwarded-For”. According to OCLC support at https://help.oclc.org/Library_Management/EZproxy/Configure_resources/Option_X_Forwarded_For , this line exposes the patron’s IP address of origin. This rather says something about IEEE’s true level of interest in patron privacy, and puts IEEE’s position on who’s responsible for keeping things private in, I think, a very different light.
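The mechanism behind that concern can be modeled generically. The following is not EZproxy's code (EZproxy is configured through stanza directives, not scripts); it is a hypothetical Python model of a reverse proxy that, when a forwarding option is enabled, appends the patron's address to the X-Forwarded-For header using the common comma-separated convention, so the upstream site can log the patron's IP rather than only the proxy's. Function names and addresses are placeholders.

```python
def forward_headers(incoming: dict, client_ip: str, send_xff: bool) -> dict:
    """Build the headers a reverse proxy sends to the upstream site.

    Any incoming X-Forwarded-For is dropped unless forwarding is enabled;
    when enabled, the patron's IP is appended to the comma-separated list.
    """
    prior = next((v for k, v in incoming.items()
                  if k.lower() == "x-forwarded-for"), None)
    headers = {k: v for k, v in incoming.items()
               if k.lower() != "x-forwarded-for"}
    if send_xff:
        headers["X-Forwarded-For"] = f"{prior}, {client_ip}" if prior else client_ip
    return headers


def address_publisher_can_log(headers: dict, proxy_ip: str) -> str:
    """Without X-Forwarded-For the publisher sees only the proxy's address;
    with it, the left-most entry is conventionally the originating client."""
    xff = headers.get("X-Forwarded-For")
    return xff.split(",")[0].strip() if xff else proxy_ip
```

In this model, turning the option on is the only difference between the publisher seeing the library's proxy and seeing the individual patron.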
Trust has to go both ways. If libraries and their service providers are truly to be partners, and if service providers are serious about not actually wanting that extra data, then RA21 must incorporate Ayala’s proposal – vendors assisting libraries, not abandoning them. It is neither impossible nor difficult for a vendor to contact those responsible for an institution’s authentication system to tell them, “You’re sending us the wrong data. Without explicit direction otherwise from the library, we require X configuration.”
Contacting IT departments along such lines is already a common practice among vendors who desire specific attributes to be sent. Why omit the practice from RA21, if RA21 prefers data minimization, as professed – except in the interests of those SPs who *don’t* actually share a commitment to data minimization, and *would* actually like to play with any extra, unintentionally provided data?
It is clear that there needs to be better communication between campus IT and the library. One of the outreach goals in the coming year of RA21 is to seek ways to improve that communication. The guidance on attribute release, particularly using and pointing to the policies developed by the campus IT community (rather than developing our own), is a component of this outreach effort.
Publishers are not in a position to control the data that is sent to them. The publishers I have spoken to about this, rather than happily using extra data they didn’t ask for, would prefer not to receive it, and they delete it when they do. Publishers based in the EU are particularly concerned about the potential GDPR liability of receiving PII they weren’t meant to receive. That said, your suggestion about including guidance about what a publisher should do if too much data is received in error is a good one, and I’ll pass it along to the group.
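The “discard and report” behavior discussed in this exchange could look roughly like the following hypothetical Python sketch of a service provider's intake step: attributes outside an agreed allowlist are dropped before anything is stored, and only their names (never their values) are kept so the institution can be told what it sent in error. The allowlist contents and all names here are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical allowlist: what the SP has agreed to accept without explicit consent.
DEFAULT_ALLOWED = frozenset({"pairwise-id", "eduPersonEntitlement"})


@dataclass
class IntakeResult:
    kept: dict                                      # attributes the SP may use
    discarded: list = field(default_factory=list)   # names only; values dropped


def accept_attributes(received: dict,
                      allowed: frozenset = DEFAULT_ALLOWED) -> IntakeResult:
    """Keep allowlisted attributes; record only the names of everything else."""
    kept = {name: value for name, value in received.items() if name in allowed}
    discarded = sorted(name for name in received if name not in allowed)
    return IntakeResult(kept=kept, discarded=discarded)


def notice_for_institution(institution: str,
                           result: IntakeResult) -> Optional[str]:
    """Draft a notice listing unexpected attribute names, or None if none arrived."""
    if not result.discarded:
        return None
    names = ", ".join(result.discarded)
    return f"{institution}: received and discarded unexpected attributes: {names}"
```

Keeping the names but not the values lets the provider ask the institution to fix its release policy without itself retaining the excess data.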
“Publishers are not in a position to control the data that is sent to them.” Technically true.
But, here you go – publisher to institution: “if you send us these user attributes again, we’re shutting off your institution’s access to our content for failure to comply with the data agreement in the contract we have with your institution in which we said that we would not accept such data from you.”
Access shutoff is an interesting idea, Lisa. At first gut reaction, I’m appalled at the thought, but… so, too, would faculty and admin be, and all the more so to discover that it’s because of the distribution of excess personal data. Outrage from the Powers That Be would be directed precisely where it needs to be, and that has the potential to fix things pretty fast. Assuming the allied publisher adjusts relevant subscription periods to make sure any significant shutoff time isn’t included in the access time the library has paid for, I think that could work quite nicely as a united front.
Thanks for taking this feedback into account for RA21. I’m sure there are many publishers and vendors acting in good faith, as you say, Todd – but any bad actors would say the exact same things, while quietly doing whatever they want in the background. There’s a reason that the corporate line “we take your privacy seriously” has become a meme-worthy joke. Transparency is, as Deputy University Librarian Calter observes, a serious problem in today’s environment.
(It should be noted, in part for disclosure’s sake, that I work at Cornell University Library, one of the endorsing institutions on Stanford’s Statement on Patron Privacy and Database Access. But also to demonstrate the very literal truth of DUL Calter’s statement: Hi; we’re watching.)
I’m going to disagree on that. If the institution has agreed to not send data points per the terms of the contract and then does, resulting in the publisher shutting off the platform for a while, there is no right for the institution to an extension of the access. If the institution wants to keep its access, it need only comply with the terms of the contract and send no data points. In this way publishers can exert control over what data they receive.
The way in which that thought is making me squirm extremely uncomfortably suggests that you have a strong point about it being a way in which publishers can exert control.
In addition to the concerns listed here, I also worry about hackers accessing user data.
Todd, thank you for representing RA21 in this important dialogue. What happens after the patron logs into the service provider? Let’s say that the Identity Provider (university IT, library) includes the pseudonymous pairwise user identifier (e.g., eduPersonTargetedID). What happens to the server logs with all the pseudonymous sessions? Do these logs get scrubbed after the user logs out, similar to what libraries do with checkout records in the catalog? My guess is no; otherwise, what is the point of tracking the pseudonymous patron from session to session? What does RA21 promise in the way of retention policies for those server logs? Also, what about third party trackers and data brokers? What prevents the RA21 Service Provider from contracting with a data broker to match the pseudonymous sessions with other PII tracking data? (See: https://www.fastcompany.com/90310803/here-are-the-data-brokers-quietly-buying-and-selling-your-personal-information ). Thank you.
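As a hypothetical illustration of the kind of retention rule this question asks about (not anything RA21 is known to specify), a service provider could strip the pseudonymous identifier from session log records once they pass a fixed age, keeping only non-identifying fields. The 30-day window, field names, and function name are assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical retention window; a real policy would set its own.
RETENTION = timedelta(days=30)


def scrub_logs(records: list, now: datetime,
               retention: timedelta = RETENTION) -> list:
    """Return copies of session log records, dropping the pseudonymous ID
    from any record older than the retention window.

    Non-identifying fields (e.g. which resource was accessed) are kept,
    so aggregate usage counts remain possible.
    """
    scrubbed = []
    for record in records:
        copy = dict(record)  # leave the caller's records untouched
        if now - copy["timestamp"] > retention:
            copy.pop("pairwise_id", None)
        scrubbed.append(copy)
    return scrubbed
```

A stricter variant would drop the identifier at session end, closer to the checkout-record practice the question mentions; either way, the window is a policy decision rather than something the code settles.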