At the encouragement of Gary Price (@infodocket), I recently installed Ghostery on all my computers. Ghostery is a browser plug-in that tells you whether a Web site you are visiting is installing any tracking software on your computer. I immediately went to the home page of a major university press: one piece of tracking software showed up. As I drilled into the site, the number of “trackers” went up and down, peaking at seven. On to another university press: eight trackers. Next stop: three trackers. Another press: one. The number varied from press to press and within the sites of individual presses. The home page of the New York Public Library shows eight trackers. To gauge the outer margin of what I would call “respectable” tracking devices, I proceeded to the Web site of the New York Times, figuring that this reputable publication, which sells advertising, would have quite a few trackers, and it did: twenty. A less distinguished publication, Business Insider, in which Amazon’s founder Jeff Bezos is an investor, literally had more trackers than I could count: I got up to twenty-five before Ghostery’s dialog box timed out.

I don’t want to suggest that there is anything wrong with the presence of these trackers, nor is it fair to lump all of them together, as they have different functions and the people who deploy them do so for a variety of reasons. We should be aware of the tracking phenomenon, however, and articulate our practices clearly, comprehensively, and publicly.

Let’s step back and think about this issue in the abstract.

Privacy is everybody’s business, and no one’s. It’s everybody’s business because even the most exhibitionistic among us have some qualms about full and unmediated disclosure, some discomfort at living in a state of constant surveillance. It’s no one’s because nobody speaks for everybody; there is no Everyman or Every Subject whose interests align with everybody else’s. In the academic environment librarians have been the most forthright and far-seeing about privacy issues, but the library is only one aspect of the business of scholarship. (Ann Okerson very helpfully pointed me to the privacy policy of the Duke University Library, which strikes me as a model of its kind.) The other aspects have a perspective on privacy as well, though in my observation it is rarely as clearly understood or expressed. Articulating a thoughtful and comprehensive privacy policy for the scholarly community is thus a desideratum, one we would do well to work on now, before the discussion is overtaken by events. There is little point in locking the barn door after the horse has been stolen.

I first became aware of the implications of the Internet for privacy in the 1990s, when I was responsible for an early online service. And the user logs: I had never seen anything like them! It was immediately apparent to me that we would be leaving our digital fingerprints everywhere, the famous Peter Steiner cartoon captioned “On the Internet, nobody knows you’re a dog” notwithstanding. Of course we know whether or not you are a dog, and also where you live, what you read, and what you have purchased. In contrast to Steiner’s anonymous dog we have the lyrics of The Police’s creepy song “Every Breath You Take”:

Every breath you take

Every move you make

Every bond you break

Every step you take

I’ll be watching you

Edward Snowden may have made the fact of surveillance impossible to ignore, but he only forced us to wake up to what we already knew or should have known (and by “we” I mean the privileged First Worlders who go to universities and want to send their kids there). The question is what we do about this state of affairs. Or do we fail to look back through the looking glass? Should not the observer also be observed?

Which brings me back to university presses. When I first began to study how university presses can sell books on a direct-to-consumer (D2C) basis, I was not thinking about the more serious questions of privacy. Indeed, my interests were essentially the practical concerns of a businessman: what works, what doesn’t, and how can we measure it? I hypothesized that university presses face an inherent limitation on D2C marketing compared with commercial publishers. Commercial firms can and do push their marketing right up to the limits imposed by law, but a university press may be subject to constraints on data collection and use imposed by its parent institution. As I researched the question, however, I came to see that the situation is murkier than I had assumed and that the policy issues won’t sit quietly in a drawer while the tactical marketers do their job.

D2C marketing is more than simply putting a list of books on a Web page and inviting people to purchase them. Marketers want to know who comes to the Web site, how long they stick around, and what they do there. If someone makes a purchase, a marketer will record that purchase and the name of the customer. Marketers will also try to enhance the information about that customer in any legitimate way they can. For example, they may purchase fields of data from direct-marketing service companies to fill out a customer’s profile, and they will try to consolidate all the information they hold in house on a particular customer into a single record.

Thus Amazon, a direct marketer of the first order, knows that I recently purchased both The Luminaries and The Second Machine Age. (I will not tell you what Google knows about me.) The record of those purchases influences the recommendations Amazon makes to me, and it is in turn combined with the records of other users in a process called “collaborative filtering” (people who bought this also bought that). I just went back to the Amazon page for The Luminaries to look at its recommendations. Among other titles, Amazon recommends that I look into The Goldfinch. This is interesting because I did in fact purchase a print copy of The Goldfinch (though not from Amazon), which I gave to my sister for Christmas. She loved it and plans to lend it to me the next time I see her.
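To make “people who bought this also bought that” a little more concrete, here is a minimal Python sketch of the simplest form of item-to-item collaborative filtering: count how many customers bought each pair of titles together, then recommend a title’s most frequent companions. This is an illustrative assumption, not a description of how Amazon’s recommender actually works, which is far more elaborate; the function names and the purchase histories are invented for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_purchase_counts(orders):
    """For each title, count how many customers also bought each other title.

    `orders` maps a hypothetical customer ID to that customer's purchased titles.
    """
    co_counts = defaultdict(Counter)
    for titles in orders.values():
        # Each distinct pair of titles in one customer's history counts once.
        for a, b in combinations(sorted(set(titles)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts


def also_bought(co_counts, title, k=3):
    """Return up to k titles most often bought alongside `title`."""
    # .get avoids inserting an empty entry for titles nobody has bought.
    return [other for other, _ in co_counts.get(title, Counter()).most_common(k)]


# Invented purchase histories, for illustration only.
orders = {
    "customer_1": ["The Luminaries", "The Goldfinch"],
    "customer_2": ["The Luminaries", "The Goldfinch", "The Second Machine Age"],
    "customer_3": ["The Luminaries", "The Goldfinch"],
    "customer_4": ["The Luminaries", "The Second Machine Age"],
    "customer_5": ["The Second Machine Age", "Moby-Dick"],
}

co_counts = build_co_purchase_counts(orders)
print(also_bought(co_counts, "The Luminaries"))
# ['The Goldfinch', 'The Second Machine Age']
```

Note that the only input this recommender needs is a per-customer purchase history, which is precisely the kind of record whose collection, storage, and use is at issue here.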

How much of this data collection is legitimate, how much an intrusion, how much useful, and if useful, useful to whom?

As I have been surveying presses as part of the D2C project, the answers to questions concerning privacy have been all over the place. Yes, we have a privacy policy and here is the link where it is articulated. Yes, we have a privacy policy, at least I think so. I don’t know if we have a privacy policy or not. We comply with the university’s privacy policy, but I don’t know what that is. The university has strict rules on credit-card security, with which we comply. The university doesn’t even know we are alive (expressed in different ways, this is the most common response). In several instances when I asked if a press tracked its users, I was told no. I then got off the phone, powered up Ghostery, and found a number of trackers on those very presses’ Web sites.

It’s not my ambition to be the Winston Smith of scholarly communications, but I do think that privacy has to become a more prominent topic of discussion, which is to say that the conversation has to get beyond the library and include the voices of all members of the community, including university press staff, who have very real commercial considerations.

The first step is disclosure: what exactly are we doing? Here the statement by the Duke Library cited above is instructive. For example, Duke notes that it sometimes uses third-party software and that it cannot vouch for the privacy policies of the vendors of that software; users are encouraged to review those vendors’ own policy statements.

Beyond disclosure we have to come to terms with three areas:

  • What information is collected. A library may have a legitimate need to know who checked out a book. A university press may have a legitimate need to know who came to its Web site and from where, as well as that individual’s browsing and purchase history on the press’s site. Other departments may have other legitimate needs, and all of them have to be taken into account in crafting an overall policy.
  • How information is stored. It’s one thing for a university press to know who I am, where I live, and what I purchased, but quite another for that information to be accessed or stolen by third parties.
  • How information is used. If a press knows that I am interested in the social impact of technology, should it be able to send me emails to prompt me to buy more books on that subject? Should the press be able to use my information in an aggregate recommendation engine? Can a press swap my data with another university press, the better to build the marketing databases of both concerns?

What troubles me is that in some instances practices are moving forward without much regard for an overarching policy. Every employee of a press should know what the organization’s privacy policy is, just as everyone knows the rules on sexual harassment and expense reimbursement. The time to get these protocols in place is before a disaster happens, the better to prevent disasters and to manage them when they do occur. As the university press world looks more and more to reader engagement, which is D2C marketing by another name, the issue of privacy will sit front and center.

3 Thoughts on "Privacy and the University Press"

“Installing tracking software” is a little scare-mongering. For example, when I view the NYPL home page I see four “trackers”:

  • AddThis: a set of JavaScript widgets that enables users to quickly share content. Many websites use such widgets to allow users to share via social media.
  • Google Analytics: a set of JavaScript and cookies that allows the NYPL to understand how visitors find and use its site. I use the same software on my own website to see which pages people like most, etc.
  • New Relic: like Google Analytics, an analytics platform.
  • Optimizely: a tool to help webmasters optimise a website. Essentially it is a testing suite that allows a website to run experiments on content, placement of content, etc., to see what best improves site usage.

None of these “installs” any software on a computer. They do cause the browser to download JavaScript and execute it in the browser. The downloaded JavaScript will be stored in the browser’s cache, but it is only executed when called by a web page. However, most sites with any dynamic content (dropdown menus, image lightboxes, slideshows, etc.) do this.
These scripts may (and do) store a cookie in your browser, which allows you to be tracked “anonymously”, but anyone can turn cookies off entirely in their browser’s settings. I say “anonymously” because, given the amount of time we now spend conducting our lives on the internet, these people know an awful lot more about you than you might care to let them.

For some of those tools, the “tracking” is secondary to understanding behaviour on a website. University Press X may not be tracking you but may be using a third-party tool to understand how users interact with its website, and that tool may allow the third party to track users.

Given our usage of the web, I believe it is important that people really understand how these things work and how to block them. Language like “Installing tracking software” is very scary for less tech-savvy people, who may well presume that the NYPL or university presses are downloading trojans, worms, or keyloggers, or trying to get their bank account details and passwords. That said, with so many people using social media these days, you’ve already passed on to faceless megacorps far more private information than your web history…

  • ucfagls
  • Mar 11, 2014, 12:46 PM

There is an interesting article in the Feb 2014 issue of InternetRetailer titled “What Gasoline Can Teach E-Retailers about Big Data” (http://www.internetretailer.com/2014/02/03/what-gasoline-can-teach-e-retailers-about-big-data), which discusses how Netflix has collected and used “customer interaction data” to develop and produce award-winning content, namely House of Cards.

I understand that data such as this will eventually be used to target me (or at least my living room) through my television set, which will select commercials for products that fit my demographic (more precisely than broadcasters can currently manage using zip code and viewing area). TV sets, while they show a great picture, are quite dumb (they only send data one way), and there seems to be growing interest in making data collection more prevalent in this medium. In fact, I can’t believe that the television and broadcasting industry has lagged so far behind. However, CBS recently announced that it may (stress may) move to an internet direct-subscription model. This will surely send it light years ahead of others (except Netflix). Of course, while all this data collection and targeting of commercials may spare me from sitting through countless drug commercials aimed at the upper-middle-aged, I fear my commercial breaks will then be filled with Coke Zero and Doritos ads (oh, the shame).

  • Paul Yeager
  • Mar 13, 2014, 6:55 PM
