
[Editor’s Note: This is an edited transcript of a presentation Joe delivered at the recent SSP conference in Arlington. The slides from the presentation appear below the text.]

Let me begin with some basic assumptions. First, many segments of scholarly communications are now mature as businesses. This does not mean that there is no growth, but that growth comes at a high cost. For example, while there may be growth in Asian markets, it is necessary to develop infrastructure to serve a global market, which includes territories that are not always welcoming to foreign entities; and in markets in the developed world the fight for new dollars usually involves pitched battles for market share.

My second assumption is that we have barely scratched the surface of what digital technology will mean for scholarly communications. I don’t pretend to have a road map for all the developments we will be seeing in the coming years, but I am confident that new initiatives are on the way, some of which will bring growth to this industry. The problem is in figuring out which initiatives to back and which to run away from.

What I want to explore is one property of digital media and to consider what place it could have for scholarly publishing. That property is the ability for a publisher to develop a direct relationship with end-users. Sometimes that relationship remains anonymous, as when we log user visits to a Web site, but don’t know who that user is. (Of course, we may know how often that particular unnamed user came to the site, how long he or she stuck around, and what other sites may have been visited. But a name? No, we may not have that.) Sometimes we may have a complete profile, as when we ship physical goods to a user, which, among other things, permits us to capture that very important Zip Code, an essential property of print publishing. In just about all cases, though, digital media enables us to know more about users than we ever did in the print era, so we should expect that learning to manage end-user information is potentially a transformative aspect of digital publishing.

Before I go on I want to anticipate the objection that in the print era, journals publishers knew quite a bit about their individual subscribers. This is true, and it is something that we have lost as the business moved from individual to institutional subscriptions. But while we knew the names of those subscribers and had other important information about them (for example, we knew if they belonged to a professional society and we had the Zip Code), there is much that we did not and could not know. (By the way, how many professional publishers took advantage of the Zip Code data?) How many articles did they read? How long did they take? What did they re-read? And because the publication stood by itself, we had no information about the path a scholar took as he or she went from one publication to another. One of the remarkable aspects of digital media is that it places all participants and all publications into a network, where each node can refer to every other. Thus in terms of end-user information, we have gone from having some specific kinds of information to the possibility of a broader understanding of what any single user and groups of users do with our published material.

So what can we use end-user data for? Here is a partial list.  I would be grateful if others helped me expand this.

  • Direct sales.  This is sometimes called “D2C marketing.” The idea here is that a publisher can sell things directly to end-users, cutting out middlemen, even bypassing libraries.
  • Collection of data to support other initiatives, both for marketing and editorial.
  • Packaging of end-user data (fully anonymized) for sale to third parties.

Let’s explore these one by one.

Direct Sales to End-users

Just about every publisher is doing some of this today. Ecommerce engines sit on Web sites everywhere. Want to buy a book? Click here. Subscribe to a journal or purchase a single article? Just provide your credit card information. There are some built-in problems with this, though, which range from a tiny inventory of products to the very difficult task of getting people to come to your Web site in the first place. For example:

  • Having an insufficient inventory of products for sale
  • Building Web traffic
  • Technical issues (e.g., ecommerce capability)
  • Customer support
  • Pricing conundrums

To this list we should add the fact that when it comes to sales, books and journals are very different commodities. Books present a better opportunity for D2C sales than journals because they are mostly sold on a stand-alone basis. Individual subscribers to journals are hard to come by (since users get access through library subscriptions) and as for selling individual articles, the problem is in finding a price that attracts individuals without cannibalizing sales of institutional subscriptions.

So when it comes to D2C sales, it seems reasonable to say that there is an opportunity, but that opportunity is modest. The industry is not likely to embark on a growth trajectory in this way.

Using Data to Support Marketing and Editorial Programs

Collecting end-user data to support marketing programs and even editorial strategies may be a different matter, however. The notion here is that end-user data can refine marketing programs and provide better information for advertisers. Editors, this argument goes, will use the user data to determine what materials to publish.

I am more skeptical about this than many. To begin with, where is the robust online advertising market that augmented end-user data will serve? The sad fact is that advertising for scholarly materials is in secular decline, and no amount of end-user data is going to turn that around. Data in this context may slow the pace of decline, but it will not reverse it.

As for using the information to alter editorial programs, this is a tricky situation. What happens when the editors are not cooperative? That is not an uncommon phenomenon in any branch of publishing and business people approach the editorial department with deference. This information can be valuable in shaping new products; the question is organizational: how to implement this and still observe the church-and-state divide. Having an innovative editor could in fact be a critical factor in achieving growth. This leads to another question: What processes must be in place for recruiting editorial staff to find people who have an interest in growing the business?

Overall it seems fair to say that using end-user information to sharpen marketing plans and to shape editorial programs is likely to lead to highly specific results, depending on the individual organization. This could be a growth factor for a small number of fortunate companies.

Packaging and Selling User Data

Another use of end-user information is to gather a great deal of it and package it for sale to third-parties such as pharmaceutical companies and funding agencies. What does end-user activity tell us about trends in research? Does it help us to identify new areas that require support?

This is indeed a promising area, but it comes with certain restrictions. First and most importantly, the privacy issues here are huge; any organization that does not have a carefully thought-out policy is going to run into serious trouble. Another issue is that of scale: the bigger the organization, the better the chance of being able to derive value from the data. This naturally means that the biggest companies and aggregators are in the strongest position to take advantage of this opportunity. Here again we can envision growth, but it is highly specific.

This brings us to the core of the problem: Working with large amounts of user data is a new situation for publishers, and we don’t yet know what the benefits are likely to be. On one hand we might like to adopt a wait-and-see attitude (“Let’s not make any investments in this area until the business model is clear”), but on the other hand we don’t want to run the risk of a competitor figuring out how to take advantage of this data and thereby establish a “first mover” advantage. It is probably best to think of end-user data as an emergent and promising property of digital media and networks, one that we have to invest in now, well before we can neatly articulate business plans and ROI projections. Now, take that one to your Board!

Stepping back from this analysis, it seems to me that what we have here is an ecosystem problem. The products and services that publishers have in their portfolios today were largely created in the absence of the kind and amount of end-user data we can now gather as an aspect of the properties of digital media. Introducing an analysis of end-user data in this environment can bring some benefits, but they are likely to be small, not enough to trigger a big spurt in growth. If we want to tap end-user data, we need to move into an adjacent ecosystem, one that thrives on the use of such data. This means developing an entirely new class of products, something other than the journal or the book. Easier said than done, one might say, and I would agree. But whoever thought that bringing growth to publishing would be easy?

Joseph Esposito

Joe Esposito is a management consultant for the publishing and digital services industries. Joe focuses on organizational strategy and new business development. He is active in both the for-profit and not-for-profit areas.


7 Thoughts on "Connecting to End-Users"

Great piece, Joe, and an excellent SSP presentation.

When I was tweeting from the conference, I expressed my support for the gathering and reuse of usage data, as long as it’s “robustly anonymized.” A couple of people responded to the effect that really robust anonymization is no longer possible because technologies are now available to reverse-engineer anonymized data and reattach it to individuals’ identities. How big a problem is this, I wonder, and is it solvable?

Is it solvable? Not sure. But here is a compelling study from late last year showing how easy it can be to track someone down with just a few pieces of data:

This is not a new issue. In the 1990s, researchers were able to find out the medical conditions of the governor of Massachusetts using just two small data sets — anonymized medical data and the voter list. Because three elements intersected — ZIP code (postal code), birth date, and sex — they were able to know which anonymized medical record was the governor’s.
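
To make the mechanics concrete, here is a minimal sketch of that kind of linkage, not the researchers' actual method. All records, names, and field names below are invented for illustration; the only thing taken from the example above is the idea of joining two data sets on ZIP code, birth date, and sex.

```python
from collections import defaultdict

# The three overlapping fields described above (field names are assumptions).
QUASI_IDENTIFIERS = ("zip_code", "birth_date", "sex")


def reidentify(anonymized_records, public_records, keys=QUASI_IDENTIFIERS):
    """Link named public records to "anonymized" ones on shared fields.

    Returns {name: anonymized_record}, but only where the combination of
    key fields is unique in BOTH data sets, so the link is unambiguous.
    """
    anon_index = defaultdict(list)
    for record in anonymized_records:
        anon_index[tuple(record[k] for k in keys)].append(record)

    public_index = defaultdict(list)
    for person in public_records:
        public_index[tuple(person[k] for k in keys)].append(person)

    matches = {}
    for key, people in public_index.items():
        candidates = anon_index.get(key, [])
        if len(people) == 1 and len(candidates) == 1:
            matches[people[0]["name"]] = candidates[0]
    return matches


# Entirely made-up sample data: names removed from the "medical" set,
# names present in the "voter" set.
medical = [
    {"zip_code": "00001", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "condition A"},
    {"zip_code": "00001", "birth_date": "1972-06-15", "sex": "F", "diagnosis": "condition B"},
    {"zip_code": "00002", "birth_date": "1972-06-15", "sex": "F", "diagnosis": "condition C"},
    {"zip_code": "00002", "birth_date": "1972-06-15", "sex": "F", "diagnosis": "condition D"},
]
voters = [
    {"name": "Voter One", "zip_code": "00001", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Voter Two", "zip_code": "00002", "birth_date": "1972-06-15", "sex": "F"},
    {"name": "Voter Three", "zip_code": "00003", "birth_date": "1980-03-03", "sex": "M"},
]

matches = reidentify(medical, voters)
for name, record in matches.items():
    print(name, "->", record["diagnosis"])
```

In this toy data only "Voter One" is linked to a record: "Voter Two" shares the three fields with two medical records, so the match is ambiguous, and "Voter Three" has no counterpart. The practical point is that stripping names does little when a handful of ordinary fields is unique to one person.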

But people think we’re related based on our last names, which amounts to using public information to reach the wrong conclusion. So, which is more risky? Using public information without rich data to validate assumptions? Or diving in to some level to confirm a hypothesis before making a claim about someone?

There was another study, about eight years ago, that could predict how you would vote in an election based on ZIP code, approximate salary, and one other piece of information. Adding more data on top of that improved the accuracy to 97%.

One of the most valuable uses for usage data is pricing. I often find that publishers use the top-level usage data and ignore the depth of data that is being collected. Usage data is a powerful tool in the digital world and many publishers are really not making use of it. Google Analytics is like taking baby steps, and when I ask a new client what they are using I often get the Google story. There are a number of companies out there that can work with a publisher to gain far more information from usage data.

If you want to get an idea of the power of usage data take a look at CNN or other news programs that have mastered the use of usage data in real time. Most news companies have a team of people using sophisticated tools to harvest and analyze real time data.

I recently worked with a publisher that had a young guy working for them who had reams of usage data that he had analyzed in great depth but nobody had ever looked at the charts, graphs, or asked for his input on any aspects of the business. A complete waste of two important assets.

Usage data solves so many mysteries, it is hard to imagine why more publishers do not value this critical data.

I wonder about usage data. It seems to be like stock market data in that it tells one what has happened but not what is going to happen. Say an article gets lots of hits: will a similar article published, say, six months later also get lots of hits?

I think one of the uses for user data is in deciding which information to repurpose for placing into a database or other product that is devoted to a given topic or set of topics – that is what handbooks or other reference works do.

In short, the user data can help create products that make money.

I think there’s a significant piece missing from your partial list – the collection of data that drives the improved user experience of the delivery platform itself. This particular type of ‘end-user data’ tells you how content is discovered and consumed and can then of course be used to make continuous improvements to the delivery mechanism, improvements that may drive up relevance, usage, loyalty and – ultimately – revenue.

This should all be viewed in the context of a world where there are antithetical positions such as those expressed recently by Apple’s Tim Cook. See:
Digital also provides the means with which to deny publishers any and all of this data. Just because someone wants the data and is willing to provide free services to get it doesn’t mean that consumers will, in the long run, buy that proposition.
There is a massive struggle today between the Google, Twitter, et al. philosophy and the Apple philosophy. Right now, the Google philosophy is ahead but whether that will continue remains to be seen.
Revisiting Charlton Heston in “Soylent Green” might be revealing in new ways today.
