At the encouragement of Gary Price (@infodocket), I recently installed Ghostery onto all my computers. Ghostery is a browser plug-in that tells you if a Web site you are visiting is installing any tracking software on your computer. I immediately went to the home page of a major university press: one piece of tracking software showed up. As I drilled into the site, the number of “trackers” went up and down, with seven trackers being the high point. On to another university press: eight trackers. Next stop: three trackers. Another press: one. The number varied press by press and within the sites of individual presses. The home page of the New York Public Library shows eight trackers. As an attempt to gauge the outer margin of what I would call “respectable” tracking devices, I proceeded to the Web site of the New York Times, figuring that this reputable publication, which sells advertising, would have quite a few trackers, and it did: twenty. A less distinguished publication, Business Insider, in which Amazon’s founder Jeff Bezos is an investor, literally had more trackers than I could count: I got up to twenty-five when Ghostery’s dialog box timed out.
I don’t want to suggest that there is anything wrong with the presence of these trackers, nor is it fair to lump all of them together, as they have different functions and the people who deploy them do so for a variety of reasons. We should be aware of the tracking phenomenon, however, and articulate our practices clearly, comprehensively, and publicly.
Let’s step back and think about this issue in the abstract.
I first became aware of the implications for privacy on the Internet in the 1990s, when I was responsible for an early online service. The user logs–I had never seen anything like them! It was immediately apparent to me that we would be leaving our digital fingerprints everywhere. This is despite the famous Peter Steiner cartoon that bore the caption, “On the Internet, nobody knows you’re a dog.” Of course we know if you are a dog or not–and where you live, what you read, and what you have purchased. Contrasted with Steiner’s anonymous dog we have the lyrics of The Police’s creepy song:
Every breath you take
Every move you make
Every bond you break
Every step you take
I’ll be watching you
Edward Snowden may have made the fact of surveillance something no one could ignore any longer, but he only forced us to wake up to what we already knew or should have known–and by “we” I mean the privileged First Worlders who go to universities and want to send their kids there. The question is what we do about this state of affairs. Do we fail to look back through the looking glass, or should the observer also be observed?
Which brings me back to university presses. When I first began to study how university presses can sell books on a direct-to-consumer (D2C) basis, I was not thinking about some of the more serious questions about privacy. Indeed, my interests were essentially the practical concerns of a businessman: what works, what doesn’t work, and how can we measure this? I hypothesized that university presses have an inherent limitation on D2C marketing activities in comparison with commercial publishers. Commercial firms can and do push their marketing right up to the limitations imposed by law, but a university press may have certain constraints concerning data collection and use imposed on it by its parent institution. As I began to research this, however, I came to see that the situation is murkier than I had assumed and that the policy issues won’t sit quietly in a drawer while the tactical marketers do their job.
D2C marketing is more than simply putting a list of books on a Web page and then inviting people to purchase them. Marketers want to know who comes to the Web site, how long they stick around, and what they do there. If someone makes a purchase, a marketer will record that purchase and the name of the customer. Marketers will also try to enhance the information about that customer in any legitimate way they can. For example, they may purchase fields of data from direct marketing service companies to fill out a profile for a user, and they will also try to collect all information on a particular user that they have in house into a single record.
Thus Amazon, a direct marketer of the first order, knows that I recently purchased both The Luminaries and The Second Machine Age. (I will not tell you what Google knows about me.) The record of those purchases then influences Amazon as it makes recommendations to me, and those records are in turn combined with the records of other users in a process called “collaborative filtering” (people who bought this also bought that). I just went back to the Amazon page for The Luminaries to look at its recommendations. Among other titles, Amazon is recommending that I look into The Goldfinch. This is interesting in that I did in fact purchase a print copy of The Goldfinch (though not from Amazon), which I gave to my sister at Christmas. She loved it and is planning to lend it back to me the next time I see her.
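To make the phrase “collaborative filtering” concrete, here is a minimal sketch in Python of the simplest version of the idea: count how often two titles turn up in the same customer’s purchase history, then suggest the titles most often bought alongside the one in question. The reader IDs, the purchase histories, and the function names `co_purchase_counts` and `recommend` are all invented for illustration. This is not Amazon’s actual method, which is surely far more elaborate.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase histories, invented for illustration only.
purchases = {
    "reader_a": {"The Luminaries", "The Second Machine Age"},
    "reader_b": {"The Luminaries", "The Goldfinch"},
    "reader_c": {"The Luminaries", "The Goldfinch", "The Second Machine Age"},
    "reader_d": {"The Goldfinch"},
}


def co_purchase_counts(histories):
    """Count, for every pair of titles, how many customers bought both."""
    counts = defaultdict(lambda: defaultdict(int))
    for basket in histories.values():
        for first, second in combinations(sorted(basket), 2):
            counts[first][second] += 1
            counts[second][first] += 1
    return counts


def recommend(title, histories, owned=frozenset(), top_n=3):
    """Return titles most often bought with `title`, skipping ones already owned."""
    counts = co_purchase_counts(histories)
    candidates = [
        (other, shared)
        for other, shared in counts[title].items()
        if other not in owned
    ]
    # Most frequently co-purchased first; ties broken alphabetically.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in candidates[:top_n]]


# "People who bought this also bought that" for a customer who already
# owns The Luminaries and The Second Machine Age.
suggestions = recommend(
    "The Luminaries",
    purchases,
    owned={"The Luminaries", "The Second Machine Age"},
)
print(suggestions)
```

With this made-up data, the only suggestion left for such a customer is The Goldfinch. Note what the sketch cannot know: a purchase made elsewhere never enters the record, so the recommendation is only as good as the data the marketer has managed to collect.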
How much of this data collection is legitimate, how much an intrusion, how much useful, and if useful, useful to whom?
It’s not my ambition to be the Winston Smith of scholarly communications, but I do think that privacy has to be elevated to a more important topic of discussion–which is to say that it has to get beyond the library and include the voices of all members of the community, including the voices of university press staff, who have very real commercial considerations.
The first step is disclosure: what exactly are we doing? Here the statement by the Duke Library cited above is instructive. For example, Duke notes that it sometimes uses third-party software and that it cannot vouch for the privacy policies of the vendors of that software; users are encouraged to review those vendors’ own policy statements.
Beyond disclosure we have to come to terms with three areas:
- What information is collected. A library may have a legitimate need to know who checked out a book. A university press may have a legitimate need to know who came to the Web site and from where, and also to know the history of that individual’s browsing on the press’s site and purchases. Other departments may have other legitimate needs, and they all have to be taken into account in crafting an overall policy.
- How information is stored. It’s one thing for a university press to know who I am, where I live, and what I purchased, but quite another for that information to be accessed or stolen by third parties.
- How information is used. If a press knows that I am interested in the social impact of technology, should it be able to send me emails to prompt me to buy more books on that subject? Should the press be able to use my information in an aggregate recommendation engine? Can a press swap my data with another university press, the better to build the marketing databases of both concerns?