Pile of books
Image via Jorge Royan.

It is one of the cruel truisms of the book business that publishers rarely have much insight into how their products are actually used. This is not for lack of curiosity on a publisher’s part but because of the structure of the industry: books are almost never sold directly to end-users. They are sold to libraries and to the wholesalers that service libraries; they are sold to your local bookshop; and they are sold to online vendors; but rarely is a book sold directly by a publisher to the person who reads it. (Those exceptional direct-to-consumer, or D2C, situations are the subject of a separate report.) Book publishing, in other words, is a game of intermediaries. Sitting upstream, publishers have little insight into what is happening downstream. This is an invitation to make bad business decisions based on unproven assumptions about how books are actually used, and over the past few years book publishers as an industry have accepted that invitation and made a series of big mistakes. It may be hard to roll back these decisions, but if we don’t know what they are and how they came to be, we are likely to keep making more of the same.

This topic came to light, with amusing effect, a few months ago when Kobo, an international ebook retailer, began to release some information about how its readers actually read books. This caught the attention of the popular press (for a representative piece, see this one from the Guardian), which, in taunts reminiscent of high school, gleefully noted that only 46% of readers finished Donna Tartt’s bestselling The Goldfinch, a novel about an art heist that has been compared to the work of Dickens (though, I suspect, not by anyone who has read Dickens as an adult). But wait! It gets better! The book that was most often completed (by UK readers) was, of all things, a self-published novel, One Cold Night, by Katia Lief! Are we a pretentious species or what?

So what’s going on here? To begin with, we are working with ebooks, not print, which, in the age of Edward Snowden, means that they are endlessly trackable. Kobo itself has a huge readership around the world, second only to Amazon, which makes it possible to discern patterns in all the data it collects. But why would Kobo release this information? Amazon is notoriously tight-lipped, but Kobo is heading the other way. Perhaps this is simple marketing (get out a good story about books and feature the Kobo name in it), but I suspect that Kobo is preparing to sell reports on that usage to publishers. Indeed, Kobo makes a good, if limited, case for how information on reader engagement could help a publisher acquire titles more effectively and market them better. The sale of the data surrounding books could thus represent a new revenue stream, opening up the possibility that books could be sold at breakeven or even at a loss, with data sales accounting for all of an online retailer’s profit. And here we have one of the fundamental truths of publishing in the twenty-first century: economic value is migrating from content to the metadata that surrounds that content.

What Kobo has zeroed in on is a new alternative metric: reader engagement. While we talk about altmetrics endlessly (immeasurably) in the journals world, for books we mostly talk about sales, measured in units: How many copies did you sell? Engagement is a different beast. I am an outlier for The Goldfinch, having completed a book that 54% of readers did not. I give it a “B.” But the number of books that I have actually finished in my lifetime is very small. I would be surprised if I have finished 10% of the books I’ve read. For nonfiction that percentage is barely discernible. My science fiction addiction fares little better: 50 pages and I throw the book across the room. Not finishing a book is the norm for me. This is because there are so many good ones to choose from; why waste time with one that is not up to snuff, or one whose argument becomes apparent well before the final page?

Is this business of not finishing books simply a comment about my own habits, or does it say something about readers in general? This is where the Kobo data is so useful, as we now have unequivocal evidence that while some books are read, many are sampled. Obviously this is going to vary by reader, the specific book, subject area, and probably such things as the time of year (what if a bunch of great books all come out just as you started a different one?). We can only wonder what Amazon knows about reader engagement for books of scholarly merit. Sometimes we can guess: Thomas Piketty made it to the top of the New York Times bestseller list, but a book for specialists is a book for specialists. It looks great on the coffee table, though, and attests to one’s political bona fides.

This is where we have gotten into trouble. The apparent fixity of a book, the tendency to think of a book as something stuck inside an inflexible container, has led us to imagine that books are used the way they are written, or how we assume they are written–that is, from beginning to end. The prominence of the novel as a literary form over the past two centuries reinforces this. Who would want to break off in the middle of Tom Jones? The traditional novel is linear, which has created an expectation that all books are linear. That expectation is simply wrong, as Kobo and our own reading experience tell us.

If books are often read only in part, then it makes little economic sense to say that you don’t have to pay for a book unless you read the whole thing. I have been disappointed to see that some people, including judges, believe that using up to 10% of a book, in some instances 20%, or a complete chapter is not enough to undermine the commercial interests of a publisher. This is pushing fair use too far. If the expectation were that a book had to be read all the way through to be worth anything, then the book industry would simply collapse. Some people would welcome that, I know, but I am not among them. But if the expectation is that books are tasted and not always swallowed whole, we would need to come up with a new set of guidelines for how books can be used before payment is required.

We would also assess the various demand-driven acquisitions (DDA) policies differently. Books put into DDA programs simply must be priced higher than books purchased in advance, and the fees for short-term rentals for parts of a book should be high as well. If Kobo had released their data even a few years earlier, it’s possible that the now ubiquitous DDA programs would never have gotten off the ground.

It is interesting to consider what other aspects of the book industry will change now that more and more end-user data is coming to light. How will this affect editors’ decisions? Will this provide ammunition for the idea of book “shorts” (texts of shorter length), or will the marketing challenges for such experiments prove to be insurmountable? Will we be able to measure the practical effect of promotional campaigns? Is data the new oil, as many contend? Whatever we learn from data, any actions we take have to be anchored in good economic sense. But there can be no doubt that the place to start is getting the data in the first place.

Joseph Esposito

Joe Esposito is a management consultant for the publishing and digital services industries. Joe focuses on organizational strategy and new business development. He is active in both the for-profit and not-for-profit areas.


33 Thoughts on "What We Got Wrong About Books"

Joe is quite right: we know so little about how our books are actually used. But this is changing. At OECD, we offer up our books by chapter and even by table within chapters. We can therefore monitor downloads at the level of a complete book, an individual chapter, and even an individual table, chart, or graph within a chapter (everything is tagged with a DOI and has its own metadata). Last year, we recorded 2.2M downloads of complete books, 4.4M downloads of individual chapters, and 1M downloads of individual tables, charts, or graphs via our main online channel, OECD iLibrary. We’re also seeing a shift in behaviour. Growth in downloads of individual chapters is far outstripping growth in downloads of complete books, so it seems that users are getting used to the idea of downloading just what they need rather than what they might need. Of course, we have no idea how much of each of these entities is actually being read or re-used, but tools that track usage page by page are emerging. We tried one a few years ago and, for those readers who agreed to have their usage monitored (yes, tracking is going to raise all sorts of privacy issues!), we found actual usage was pretty modest: roughly 7 pages in any one session. Sadly, the tool couldn’t tell whether the same user came back and read some more later. Where it gets interesting, of course, is when we start feeding the usage data back to the authors. Some aren’t happy that we offer their books by chapter, showing Pink Floyd-like behaviour about the integrity of their work, but they get happier once we show them the downloads, especially the downloads by chapter, as these help them learn what is popular and what is not. Since we offer our books on subscription, we can show downloads by book per institution, and that is where things get very interesting for our authors, their managers and, increasingly, their funders.
We use data to inform our publishing decisions (we used data to discover the popularity of a particular type of book we publish and to convince other authors that this type of book would be worth producing for their area of work, and we now have a successful series). While we haven’t yet seen changes in author behaviour, I think it is only a matter of time before authors start acting on the data too and we start making ‘evidence-based’ authoring and publishing decisions in place of today’s efforts driven by collective gut instinct and experience.
Toby Green

I have to question whether academic authors are really motivated by knowing what their readers are most interested in, because it seems to me that scholars decide what to write about based on their own research interests, not on what the “market” may want. And those research interests are usually formed by responding to prior scholarship or building on it in some way, whether or not the subject is “popular.”

You’re right about how authors choose what to write, but then that’s why we have publishers. Publishers are deeply concerned by who wants to read what and how.

I suspect that the publishers’ interest in ebook lending organizations is related to hoping that someone will rise up to compete with Amazon, rather than lack of consideration of who reads how much of what.

The ability of an eBook or eJournal article to scrape up all sorts of data and then phone home with reports to publishers, or even directly to authors, is just beginning. To get an early glimpse, take a look at what can be done with Apple’s free iBooks Author app. Its HTML widget function can embed anything that can be executed with HTML5, CSS3, and JavaScript. There are even third parties providing widgets for authors to insert into their “multi-touch” books, with no coding necessary.
For example, see Bookry: https://www.bookry.com
These widgets can even be used in other venues such as a Learning Management System (LMS).

I think we know more about books than we think we do. For instance, we know that people read authors until they become bored with them and then move on to someone else. We know that books outside of STM are a means of entertainment and that audiences are fickle. We know that many a best seller is just a fad and that the second book by a bestselling author could be a bust.

Lastly, we know that metrics, or data, are matters of the past and more often than not have little to do with the future. I say this because if metrics were very reliable, we would all be at the track.

I don’t believe that scholarly books in the arts, humanities, and social sciences, which are by definition “outside of STM,” are “a means of entertainment.”

Harvey: the books that are a means of entertainment are within the subset called trade publishing. (That’s jargon for the type of book, novel or non-fiction, that might be found in a normal bookstore.)

Trade publishing is half of the total in book publishing. The iceberg is mostly underwater.

An interesting piece. (I’d love to see a version of the Kobo report with names added to the outliers – particularly the incredibly high-selling 5% completed book… presumably the contemporary equivalent of A Brief History of Time).

One thing I do wonder about is whether readers handle ebooks and print books in noticeably different ways. If so, would this limit the conclusions we can draw?

Some possible complicating factors that spring to mind, particularly for fiction:

* The different nature of e-reading. Readers may (do?) interact differently with print, ereaders, and onscreen texts – is it easier to set aside an intangible electronic copy? Or harder? (If we had perfect data, we might even find that completion rates differ from hardback to paperback print…)

* Completion lag times. I often find with both fiction and nonfiction that I buy it, read a chapter or two, set it aside and forget about it, find it again in two years when looking for something to take on holiday, and then read it so fast it’s done before I get on the plane. Given the relatively short period for which we have e-reader data available – and the ease with which you can keep something forever – how many readers will later go on to complete a book? How long do you need to stop reading to count as “not completed”? How many will tick over to “completed” come 2018?

* Subjective value – ebooks tend to be purchased for a lower price than comparable print books, with some readers (including, I admit, myself) limiting purchases to deeply discounted titles. This might lead to ‘valuing’ ebooks lower than print, and a greater willingness to abandon something rather than stick it out to “get your money’s worth”.

* The “classics problem” – a lot of ereader devices are filled with free copies of classic works; lots of Dickens, Austen, etc. It’s trivially easy to acquire these, and equally easy to discover after a couple of pages that, well, you didn’t want to read through War and Peace after all! If these are included in the overall stats, it might well skew the number of “rarely completed” titles.
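The completion-lag point above can be made concrete: whether a book counts as “not completed” depends on thresholds that someone has to choose. The Python sketch below is illustrative only; the ReadingRecord structure, the 98% “finished” mark, and the 180-day inactivity cutoff are assumptions, not Kobo’s actual definition, which the post does not describe.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ReadingRecord:
    # Hypothetical per-reader record for one book.
    progress: float  # furthest point reached, 0.0 to 1.0
    last_read: date  # date of the most recent reading session


def classify(rec, today, finished_at=0.98, abandon_after=timedelta(days=180)):
    """Label one reader as finished, abandoned, or still in progress.

    Both thresholds are assumptions chosen for illustration.
    """
    if rec.progress >= finished_at:
        return "finished"
    if today - rec.last_read > abandon_after:
        return "abandoned"
    return "in progress"


def completion_rate(records, today, **thresholds):
    """Share of finished readers among those whose status is settled."""
    labels = [classify(r, today, **thresholds) for r in records]
    settled = [label for label in labels if label != "in progress"]
    return labels.count("finished") / len(settled) if settled else 0.0


records = [
    ReadingRecord(1.0, date(2015, 1, 1)),  # read to the end
    ReadingRecord(0.3, date(2014, 6, 1)),  # stalled long ago
    ReadingRecord(0.5, date(2015, 1, 1)),  # paused 59 days before "today"
]
today = date(2015, 3, 1)
print(completion_rate(records, today))
print(completion_rate(records, today, abandon_after=timedelta(days=30)))
```

With the default cutoff, the third reader is still “in progress” and the rate is 50%; tightening the cutoff to 30 days reclassifies that reader as having abandoned the book and drops the rate to about 33%, even though nobody’s reading changed.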

As it happens, the first book I downloaded to my iPad (you guessed it!) was War and Peace. I figured that if I were ever stranded in an airport for days, I’d at least have this to entertain me. But I have never been stranded, and consequently I have so far read only about 5% of the book. P.S. The second book I downloaded was Adam Smith’s The Wealth of Nations, and I have gotten through only about 5% of that also.

I keep a copy of A Suitable Boy on my phone for the same reason!

(Apropos of which, it occurs to me that the Kobo or similar does not know what we’ve read *before* – it can’t identify the book we leave unread, or rarely dip into, because it’s the copy of something we love, saved for a long, long day…)

Yes and yes again to the importance of “completion lag times.” I often pick up a book one year and think, Nah, not for me. A year later, and usually in a time of less stress and more sleep, I often find that I am really into the book I once thought too dreary or too dumb to complete. I sometimes wonder whether the people who think they completely understand readers and reading by tracking reading behavior for, say, one year (which I think is longer than the norm) actually read themselves, because then they would know, among other things, about “completion lag times.”

I might be being naive here, but I’m not sure that the data gathered by measuring engagement tells us very much without knowing a lot more about the people buying the books.

The Donna Tartt example illustrates what I mean because “The Goldfinch” was widely praised as being Dickensian in style and invention, and that was one of the reasons I bought it. The same was true of my friends. We all studied literature and loved Dickens and were excited about reading Tartt’s novel. We read it and were pretty bored, so any tracking of our reading would show a lot of skimming, even if we did finish it.

But I don’t think that would be true for someone who had never read Dickens and had no such high expectations. In fact, I had lots of people, who only know the name Charles Dickens but never read anything he wrote, tell me how wonderful the book was, so wonderful, they couldn’t put it down, and unless they were pretending an interest they didn’t really have, their level of engagement, if tracked, would prove to be much higher than mine.

The point I am trying to make is that I’m not completely sure that, at this point, data gathered on reading habits tells you much unless you know a good deal more about the readers: their class, education level, income, background knowledge, reading history, etc. Or again, maybe I’m being naive, and I should assume the people doing the data gathering have access to this information? That’s a creepy thought.

I think gathering data about readers could prove as valuable as this article suggests, but to be truly valuable (i.e., capable of useful interpretation and application), a lot of other data would have to be gathered as well, and that’s the part I find most worrisome.

Drat, I wasn’t witty enough to use “great” instead of “high,” but I very much wish that I had. 🙂

Would you agree that Bleak House would be a good name for a publishing company?

Oh god, another witty comment I wish I had made. Yes, indeed, Bleak House is extremely apt these days, which may be why, in my world, no one claims to be a publishing company anymore. They are all technology companies.

It’s not just Kobo and Amazon that can gather this type of data, libraries can also do similar analysis without relying on publishers or compromising patron privacy.

One powerful option is for libraries to optimize their use of EZproxy access and authentication, a widely used service for providing off-campus access to electronic library content. The EZproxy server can capture a unique student identifier for each resource use, including ebooks and journal content. At Nevada State, we parse this data and then work with our Institutional Research Office to aggregate it with student demographic data. Although we collect the unique identifier, there is no way for us to connect this number to the student’s identity. A working version of our dashboard is below:


This type of data collection enables libraries to obtain highly granular data for all ebooks that are proxied for off-campus access.
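To make the parsing step concrete, here is a minimal Python sketch of how proxy log lines might be turned into per-user, per-resource usage counts keyed by a pseudonymous ID. It assumes the proxy writes Apache-style common log lines with the login name in the third field (EZproxy’s LogFormat directive is configurable, so the actual layout depends on local setup); the secret key, the function names, and the sample hosts are hypothetical, not Nevada State’s actual pipeline.

```python
import hashlib
import hmac
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed layout: Apache-style common log lines, for example
# 10.0.0.1 - s1234567 [12/Mar/2015:10:00:00 -0700] "GET http://ebooks.example.com/book/42 HTTP/1.1" 200 5120
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical secret held by the library, never shared with analysts.
SECRET_KEY = b"replace-with-a-long-random-secret"


def pseudonym(user_id: str) -> str:
    """Keyed hash of the campus login, so raw IDs never leave the proxy."""
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:16]


def usage_by_user_and_resource(lines):
    """Count successful requests per (pseudonymous user, content host)."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m["user"] == "-" or not m["status"].startswith("2"):
            continue  # skip malformed, unauthenticated, or failed requests
        resource = urlsplit(m["url"]).netloc
        counts[(pseudonym(m["user"]), resource)] += 1
    return counts
```

The resulting counts could then be joined to demographic data on the pseudonym. A keyed hash only hides the raw login, though: because the same pseudonym recurs across every request, it stays linkable over time.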

I think it’s important to note that we are only discussing eBooks. It is very easy to lump all ‘books’ into one category or one system of usage, but that is just another form of assumption. It is very important to be clear in defining the data you are using to draw conclusions.

I have a recommendation for Joe about how to overcome his bad habit of reading only parts of books: confine your reading just to books that you have volunteered to review. In that way not only do you get a copy of the book for free but you have taken on motivation to read the whole book. Most of the books I read today fall into this category, and I review them for the Journal of Scholarly Publishing, Learned Publishing, and other such places.

Excellent point about DDA. The triggers are simply not sustainable because the assumptions they’re based on about how scholarly material in libraries is used, and who is using it, are just plain wrong. The triggers are in need of serious reform if the model is going to survive.

But to change the topic a bit, I was a bookseller for a couple of decades before going into publishing, and I quickly learned that the key to being a successful bookseller wasn’t reading everything in its entirety, but instead reading a little bit of a lot, and reading a heck of a lot about new books (like smart book reviews and publisher catalogs) to figure out what was actually worth sampling. My job required having a broad knowledge of lots of different kinds of books, not a deep knowledge of any particular book or even genre. No one ever made a purchase contingent on what I thought of the ending. The books don’t sell if they’re not where they belong on the shelf, and they’d never get there if all the booksellers ever did was read all day. Old habits die hard, and I still do quite a bit of sampling. My wife insists that I don’t actually read books, as she seldom sees me return to the same one. I also spend a good portion of my workday reading what are often difficult manuscripts, so coming home and reading for fun can sometimes be more of a challenge than it should be, but every now and then I get seduced by a good book and, like any reader in love, I just want to spend all of my time with it, and I miss it when it’s over.

“Although we collect the unique identifier, there is no way for us to connect this number to the student’s identity.”

If that unique identifier persists for any substantial length of time, I can think of all kinds of ways to connect it to a student’s identity.

For instance, when I was in college, I often knew what courses my roommates, and many of my friends, were taking. And often I was the only person who was taking a particular pair of courses in a given semester, and I’m sure that was often the case for my friends and roommates as well.

So, say it’s common for people to do their course readings online, monitored by your system, and you happen to know two courses a particular student is taking. Look in your access data for unique IDs that are connected to readings assigned to both of those two courses. (Reading assignments are often not hard to find in posted syllabi.) If there’s only one such ID, chances are very good that the unique ID is for that student you know. And you can then find out a lot more about what that student’s reading by seeing what else is connected with that unique ID.
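The intersection attack described above takes only a few lines to express. Below is a hypothetical Python sketch: the access-log tuples, the course-to-readings mapping (standing in for posted syllabi), and the function names are invented for illustration, and a real log would be messier.

```python
from collections import defaultdict


def candidate_ids(access_log, course_readings, known_courses):
    """IDs that accessed at least one assigned reading from every known course."""
    readings_by_id = defaultdict(set)
    for uid, item in access_log:
        readings_by_id[uid].add(item)
    candidates = set(readings_by_id)
    for course in known_courses:
        assigned = course_readings[course]  # e.g. taken from a posted syllabus
        candidates &= {uid for uid, items in readings_by_id.items() if items & assigned}
    return candidates


def reidentify(access_log, course_readings, known_courses):
    """If exactly one ID matches, return it with everything else logged under it."""
    matches = candidate_ids(access_log, course_readings, known_courses)
    if len(matches) != 1:
        return None  # ambiguous or no match: more side information is needed
    (uid,) = matches
    return uid, {item for u, item in access_log if u == uid}


# Hypothetical data: "a1" is the only ID touching readings from both courses.
log = [
    ("a1", "hist101-ch1"), ("a1", "chem210-lab2"), ("a1", "novel-x"),
    ("b2", "hist101-ch1"), ("c3", "chem210-lab2"),
]
courses = {"HIST101": {"hist101-ch1", "hist101-ch2"}, "CHEM210": {"chem210-lab2"}}
print(reidentify(log, courses, ["HIST101", "CHEM210"]))
```

Each additional known course can only shrink the candidate set; once a single ID remains, everything logged under it is exposed, including readings (here “novel-x”) unrelated to either course.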

This is only one possible way to reidentify a student in cases like this; lots of other ways become possible the more information you have available to correlate. (And in the online world, there’s potentially a *lot* of correlatable data out there on each of us.) Basically, any identifier that stays constant across multiple transactions represents a reidentification risk, and libraries have ethical obligations to minimize these risks and give their patrons information and opportunities to forgo those risks.

Have they really hit on reader engagement, or just claimed to have done so? You can finish a book and not really remember what happened or how the characters made you feel because, well, you read it to fill time in a busy place, or you were bored with nothing else to do, and so on.

There have been books that engaged me (i.e., immersed me) for 75% of the journey and then took silly twists that rendered the rest pointless. I didn’t finish those books. By Kobo’s definition I wasn’t engaged. Yet I’d argue that emotionally I was, and the disappointment was a key part of the engagement and the subsequent disengagement.

(Sometimes of course the silliness is part of the engagement, just how daft can it get. Hello ‘Gone Girl’.)

That we can measure it doesn’t mean it has meaning, or the meaning ascribed to it.

I think “engagement” is a poor term and I fully agree with your last sentence.

Since I work in educational publishing and hear the word “engagement” so often that I have become almost allergic to it, I also vehemently agree with your last sentence, Martin.

Being able to track how and when a reader uses or buys a book has value, but when I hear data-happy CEOs talk about it, I seriously wonder about their notion of the human mind, which appears to have all the complexity of a cartoon character: not one created by Art Spiegelman, but one more like those promoted by Disney Studios.

Agree with your comment, but what word would you use? The word “engagement” is of course corporate doublespeak, but Kobo is measuring something. What is it and what value does it have? Let’s not get stuck on the terms.

Excellent article. Especially important is the discussion of partial reading, of the fees paid out by subscription sites, and of the amount of sampling (because some book sites allow far too much of a book to be provided as a sample… 80 pages? Really?). The trigger presets for payments on subscription sites are not working for authors or for publishers in economic terms. If the payments are lower because of how readers read, not because the work is flawed, that’s a threat to the entire industry.
In the end, this means that readers will not be served, either. If the data is taken to mean that readers want shorter works when in actuality they are happy to engage for a short time with intense depth, then publishers will push for shorter books, leaving readers dissatisfied. And that too is a threat to the entire industry as well as our culture of books.

As Elmore Leonard said, try to leave out the part that readers tend to skip.
