We may live in the age of privacy nihilism, but recognizing that reality does not have to mean agreeing to conduct our own work by its terms. This post is for publishers, academic and research librarians, and others who conduct research on user behavior in library information systems and who — whether for personal or professional ethical reasons, or because of policies — want to do so in ways that prioritize privacy.
Situating Myself and Academic Librarianship
A bit of my own background is probably useful to contextualize this discussion. My own attention to this topic of privacy and user data came into focus when I led the launch of the Value of Academic Libraries Initiative as President of the Association of College and Research Libraries (ACRL) in 2010-2011. Grounded in The Value of Academic Libraries: A Comprehensive Research Review and Report, my work that year and since then has been heavily focused on advocating for the profession to move to evidence-based claims for library value and for the collection and analysis of individual user data in order to do so. This work has been heavily criticized for its focus on collecting user data and, at times, for facilitating the neoliberal transformation of higher education.
Given that, I have also had to confront hard questions about how gathering and analyzing user data aligns with the values of my profession, specifically the value of privacy as expressed in the ALA Code of Ethics statement that: “We protect each library user’s right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.” These questions have not had easy or straightforward answers, particularly as the value of privacy can be in tension with another principle in the ALA Code of Ethics: “We provide the highest level of service to all library users.” I’m grateful to Andrew Asher, who joined me in a series of public presentations exploring these issues (e.g., CNI Fall 2014).
Serving as a member of the NISO Privacy Principles working group and training librarians in the ACRL Assessment in Action program and the ACRL Standards for Libraries in Higher Education roadshow have provided continued opportunities to reflect deeply about the challenges librarians are facing in this realm. Recently, I was part of the national convening for Library Values & Privacy in Our National Digital Strategies.
All of this is to say that I have spent enormous amounts of time and energy engaging with the library and publishing communities around these topics. Librarians — myself included — have been and continue to be deeply engaged with a struggle to reconcile theory and practice, particularly as their values often put them at odds with both their own institutions as well as dominant commercial players upon which they rely to provide information services to library users.
So, can you prioritize privacy in user research? Simply put — yes. Will it be cost-free to your project or your organization to do so? Simply put — no. We have to accept some limitations on our user research and its potential applications in order to prioritize privacy. We also have to accept some limitations on privacy in order to conduct user research.
The questions at hand are how the two will be negotiated against each other, what promises are made to research participants, and how we can ensure those promises are kept throughout the stages of data collection, analysis, reporting, preservation, etc.
Before we go further, I want to untangle three terms that I find are often confused and conflated in discussions of privacy and user data: privacy, confidentiality, and anonymity. I think it is noteworthy that the ALA Code of Ethics uses both of the terms — privacy and confidentiality — in its statement. This is already a tip-off that the two are not interchangeable. ALA provides an explainer that defines the two terms: “In a library, the right to privacy is the right to open inquiry without having the subject of one’s interest examined or scrutinized by others. Confidentiality exists when a library is in possession of personally identifiable information about users and keeps that information private on their behalf. Confidentiality is a library’s responsibility.”
As one way of paraphrasing this, confidentiality is a mechanism for a library to have and use data while protecting privacy.
One might wonder why the mechanism is not anonymity. After all, wouldn’t that be the most privacy-protecting approach of all, if a user is not and cannot be identified? Indeed, it would be. To collect no data about users at any time would be the most privacy-protecting approach. But it is not possible to manage a library effectively — a community good — without collecting any user data. For example, tracking who currently has which book checked out is fundamental to stewarding the collection. Monitoring how many hours a user has reserved media equipment in a given week is fundamental to stewarding access to limited resources. For rare books/special collections, best practice is to create a permanent record of who uses what and for what purpose. Given all this, ALA advises that “librarians should limit the degree to which personally identifiable information is monitored, collected, disclosed, and distributed.”
From this, we can derive a first principle of Limitation — collect only what is necessary. Without a doubt we can debate what is necessary, but we have at least moved out of the realm of assuming that one should simply collect data regardless. I’ll make an additional note that this is an affirmative statement of contemporary judgment — it isn’t sufficient that such data could be judged necessary in the future; rather, it must be judged to be so now.
A second principle that we can derive from the above discussion is Protection — prevent examination or scrutiny by others. A library (or other organization) that has data that it has determined necessary to collect and use should not share that data with others who would examine or scrutinize it. This means that the data is secured and managed by the library and not transferred to others. As a side note, it may be tempting to think that the data can be anonymized before sharing and thus not create issues for privacy or confidentiality but one should be very careful about assuming this. Re-identification (or de-anonymization) of data has turned out to be easier and more successful than many might think it could be.
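To make the re-identification risk concrete, here is a minimal sketch of linkage re-identification. Every record, name, and field here is invented; `zip`, `birth_year`, and `gender` stand in for the kinds of quasi-identifiers that, in combination, can uniquely match a supposedly anonymous record to an outside, identified dataset.

```python
# Hypothetical illustration: all data below is invented.

# "Anonymized" usage data: direct identifiers removed, quasi-identifiers kept.
usage = [
    {"zip": "61801", "birth_year": 1990, "gender": "F", "resource": "HathiTrust"},
    {"zip": "61801", "birth_year": 1975, "gender": "M", "resource": "PubMed"},
]

# A separate, identified dataset (e.g., a public directory).
directory = [
    {"name": "A. Jones", "zip": "61801", "birth_year": 1990, "gender": "F"},
    {"name": "B. Smith", "zip": "60601", "birth_year": 1975, "gender": "M"},
]

QUASI = ("zip", "birth_year", "gender")

def reidentify(usage_rows, directory_rows):
    """Join the two datasets on quasi-identifiers; a unique match
    re-attaches a name to a supposedly anonymous record."""
    matches = []
    for u in usage_rows:
        candidates = [d for d in directory_rows
                      if all(d[k] == u[k] for k in QUASI)]
        if len(candidates) == 1:  # a unique combination means re-identification
            matches.append((candidates[0]["name"], u["resource"]))
    return matches

print(reidentify(usage, directory))
# One record is re-identified: ('A. Jones', 'HathiTrust')
```

The point of the sketch is that removing names is not the same as anonymizing: any attribute combination that is unique in an outside dataset can serve as a key.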
A corollary to the principle of Protection is the principle of User Control — an individual may choose to share or not to share data about themselves. This decision belongs to the user, not to the library, and for the user to exercise full agency in this decision requires fully informed consent with regard to what is collected and how it will be used and managed. A library may ask permission to collect or share data but must be careful to ask in a way that does not pressure the individual to agree. In spite of this principle, we must recognize that data is commonly collected as a condition of using a third-party tool or service in libraries; in such cases, transparent disclosure about data practices is critical, though we should recognize that the individual’s choice is constrained.
Applying Principles to User Research
I’ve situated this discussion in the ethics of librarianship because, regardless of who is conducting the research per se, research in library information systems is research in libraries, and the ALA Code of Ethics applies not just to librarians (individuals) but to libraries (institutions). However, as we move to focus on the application of these principles to user research, I think we can also benefit from drawing upon the principles of Respect for Persons, Beneficence, and Justice — the principles underlying human subjects research review in the United States as codified in the Common Rule. These are most commonly encountered in Institutional Review Board (IRB) processes, which require their application to informed consent, assessment of risks and benefits, and selection of subjects.
Each of these principles is explained in the Belmont Report. Respect for persons means that individuals should be treated as autonomous agents and that persons with diminished autonomy are entitled to protection. Beneficence means treating persons in an ethical manner by respecting their decisions and protecting them from harm and by making efforts to secure their well-being, including an evaluation of risk against benefit. Justice means that benefits and burdens of research are distributed fairly.
In addition to applying these two sets of principles generally, I highlight the following when coaching others on privacy and data in the context of user research:
- Presume you need permission from users in order to collect and use data about their information behaviors and to share it with others. Insist on disclosure of data practices in cases in which data is collected as a condition of using a service or tool.
- Carefully specify the user data that will be collected and how it will be managed securely throughout the processes of collection, storage, analysis, reporting, and preservation so that the practices are sufficiently detailed as to be followed by anyone who might have access to the data. It is not enough that you as the creator of the dataset can understand the procedures. They must be documented for others. (Note: Most IRB processes place heavy emphasis on how user participation is solicited and how data is collected and stored. Other stages in the process typically receive little to no examination. User research is particularly vulnerable to privacy breaches because reporting detailed analysis can result in sharing findings for small n groups such that identification of individuals is possible from the analysis output.)
- Don’t confuse anonymity, confidentiality, and privacy. In particular, be very careful that you do not promise research subjects anonymity when your data practices only support confidentiality and be certain to communicate any limitations on the promise (e.g., a court order to disclose). Confidentiality is a mechanism for prioritizing privacy. But, confidentiality is not anonymity.
- If you ever find yourself saying “but, if I tell them that, then users won’t agree to participate in my research,” take that as your conscience waving a red flag at you. It is probably a sign that you need to tell your users exactly whatever you are thinking you would rather not.
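The small-n reporting risk noted above can be mitigated by suppressing cells below a minimum size before any findings are shared. This is a minimal sketch under invented data and a hypothetical threshold of 5; the actual threshold is a policy decision, not a technical one.

```python
# Hypothetical illustration: group labels, counts, and threshold are invented.

MIN_CELL = 5  # report nothing for groups smaller than this

def suppress_small_cells(counts, threshold=MIN_CELL):
    """Replace counts below the threshold with a suppression marker so
    that small-n groups cannot be singled out in the reported output."""
    return {group: (n if n >= threshold else "<%d" % threshold)
            for group, n in counts.items()}

# Resource-usage counts by department: the n=2 cell could identify individuals.
by_department = {"History": 48, "Physics": 17, "Classics": 2}
print(suppress_small_cells(by_department))
# {'History': 48, 'Physics': 17, 'Classics': '<5'}
```

Note that simple cell suppression is only a first step: if row or column totals are also reported, a suppressed cell can sometimes be recovered by subtraction, which is why disclosure-control practice pairs suppression with complementary checks.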
A Final Note
I’m not naïve. In reality, these are strategies for prioritizing privacy; they do not guarantee privacy. The privacy nihilism article cited in the first paragraph felt very real to me when I read it. Many library users are simultaneously logged into Google and Facebook services when they are using library resources. Data breaches are all too common and even the most careful individual can make a mistake. But, for all that, I’m not willing to give up on my ethics just yet.
I also don’t want to suggest that I’m offering the final and definitive take on these issues. These discussions are complex and evolving, just as the technologies and environments in which user data tracking occurs are. I’m looking forward to diving more deeply into questions about privacy and web analytics at the upcoming National Web Privacy Forum, and I also invite anyone who is interested to join me as a participant in the Digital Library Federation’s Technologies of Surveillance Working Group.