Shana Kimball, who is now with the New York Public Library, recently tweeted that she yearns for a Web-scale university press.  She is not alone.  Shana employed the hashtag #want, which is capacious, as it could include such things as Ben & Jerry’s ice cream and the discovery of original footage of the Beatles performing in Hamburg.  But in the tiny world of scholarly communications, where we ask for little and get less, a Web-scale publishing service would be a very good thing indeed.  For journals we have PLoS, our one significant exemplar, but for scholarly books we have nothing.

It is not insignificant that Shana’s comment came to me via Twitter (as do most things), which is not, strictly speaking, a Web application.  To operate at Web scale now means to embrace other platforms as well, whether they be mobile apps or clever exploitations of SMS, as is Twitter, or anything else–and here it is useful to remember that print is no longer an alternative to digital media but simply one other expression of it.  Academic publishers have many of the pieces in place (books on the Kindle, social media marketing plans, the sale of PDFs from their Web sites), but what’s missing is the glue to hold it all together and the ability to operate on a planetary scale.

Let’s poke around a bit and try to imagine what a Web-scale operation would look like.

For Exhibit A we turn to the New York Times, which does indeed operate at Web scale.  Although the Times continues to publish in print, it now has an array of apps and has worked assiduously to optimize discovery.  Take a look at a typical article.

That particular article was not chosen at random but because the Times had the good sense to quote me in it.  After it appeared I naturally went to the menu of services that appears next to every article, a menu that includes Facebook, Twitter, Google Plus, and more.  I tweeted the link, an act of self-promotion and also a promotion of the Times itself.  So here we see a rule for the Web-scale enterprise:  put your users to work for you.  A Web-scale university press is going to want to have some version of the Times’s menu of social media services.

But social media in itself is not enough.  Here is the link to a very good article with an odd title, “The Zen of Web Discovery”.

This article, by John Hubbard, is highly recommended.  It was not chosen at random.  An article on Web discovery, one would think, would be better enabled for discovery on the Web.  The Webmaster at this site (hosted by the library at the University of Wisconsin at Milwaukee) knows his or her way around social media–the menu of options for sharing is almost as extensive as the Times’s–but the article itself is a PDF.  This can get in the way of full-text search and it makes a user work harder to get access to the text.  With a slap of the forehead we realize that an essential step for a Web-scale publisher is to get its texts on the Web in the first place–that is, everything should be in HTML.  Few books are published this way; where books are in electronic form, they usually appear as PDFs or in a form that only enables them to be viewed within the environment of a major retailer (e.g., Amazon, Apple).

On the assumption that librarians would know a thing or two about how to find things on the Web, I looked around for a standard reference and came across this.

The title is Communicating Professionally, Third Edition:  A How-to-do-it Manual for Librarians.  The fact that this book is now in its third edition tells us something about its success, but curiously there does not appear to be an electronic edition, not even for Kindle.  The publisher is the ALA, which is operating here in the mode of Web 0.5:  the Web is used for transactions, but the content itself is intended to be read in print.  This is not publishing at Web scale, but the use of the Web with a print mindset.  The Web-scale university press will have to look for other models.

Many academic publishers are now placing their books into digital aggregations (see, for example, the service from Project Muse)–a genuine move to embrace the Web–but this seems to me to be only a half-step–a welcome half-step, but a half-step nonetheless.  These aggregations are not open to the public Web but are targeted to institutional purchasers and users, which is entirely logical but not what is meant by Web scale with its implications of the public or consumer Internet.

In order to enhance discovery, publishers are now tagging these books on the chapter level.  And here we have to ask:  Is the chapter level sufficient?  The chapter level is good if you are selling individual chapters in coursepacks, but for online discovery what’s needed is to get to the individual paragraph, the essential unit of thought in all writing.  It would not be feasible for a publisher to tag a book manually at the granularity of the paragraph, but one aspect of Web-scale companies is that they often get a large group of users to pool their efforts, aka crowdsourcing.

Crowdsourcing brings us back to the Times and its menu of social media options.  Let’s imagine the full text of a university press book rendered in HTML and viewable by anyone with a Web browser (a note on business models in a moment).  The platform automatically breaks the linear text into paragraphs; each paragraph has its own URL, is its own Web page.  Alongside each paragraph is a menu similar to that of the Times, but presumably with the addition of other social media services that have been developed with the scholarly community in mind.  This would allow readers to use their own Twitter and Facebook accounts (and whatever service they choose) to tag the text at the paragraph level and share those tags with their personal networks.  This would be something of an interim solution to the Web annotation problem:  using existing commercial services as tools for annotating content and then sharing those annotations.
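
To make the paragraph-per-URL idea concrete, here is a minimal sketch in Python of how a platform might break a linear text into paragraphs and give each one its own address. Everything in it is an assumption for illustration only: the base address `press.example.edu`, the `/p/<number>` path scheme, and the function names are hypothetical, and a real platform would presumably work from structured HTML rather than plain text.

```python
import re

# Hypothetical base address; no real press uses this scheme as far as I know.
BASE_URL = "https://press.example.edu/books"


def split_paragraphs(text: str) -> list[str]:
    """Split plain text into paragraphs wherever a blank line occurs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def paragraph_urls(book_slug: str, text: str) -> list[tuple[str, str]]:
    """Pair each paragraph with its own URL, numbering paragraphs from 1."""
    return [
        (f"{BASE_URL}/{book_slug}/p/{number}", paragraph)
        for number, paragraph in enumerate(split_paragraphs(text), start=1)
    ]


sample = (
    "First paragraph of the monograph.\n\n"
    "Second paragraph,\nwrapped over two lines.\n\n\n"
    "Third."
)

for url, paragraph in paragraph_urls("sample-monograph", sample):
    # Each address could carry its own sharing menu and reader-supplied tags.
    print(url, "->", paragraph[:30])
```

Numbering paragraphs by position is the simplest choice, but the addresses would shift if a later edition added or removed a paragraph; a platform that cared about stable links would likely need persistent identifiers instead.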

Now we are operating at Web scale:  the full text of a book on the Web, wrapped in social media, and made more valuable by the interactions of its readers.

But what is the business model?  If the text of the book is openly available on the Internet, how will the publisher make money?  That’s a fair question, for which I do not have a totally satisfactory answer.  A partial answer would include some monetization strategies and the support of sponsoring institutions.  By partial monetization I mean such things as charging fees for print on demand and for downloads (but not for viewing the HTML version online); the downloaded files could then be read on mobile devices and laptops, where the text could be annotated for the scholar’s personal library.

The support of sponsoring institutions is nothing new in the university press world.  While some presses are profitable (some impressively so), many are subsidized by their parent institutions. Many presses also subsidize some of their programs internally.  For example, a line of regional titles may be published at a profit, which helps to offset losses on the core monograph program.  Nor should we be surprised that many scholarly books cannot make their own way in the marketplace.  University presses were originally established to publish books that commercial publishers would not take on; if those books could be published with a healthy return on capital, the commercial sector would be all over them.  But so much of academic life resides outside the workings of the marketplace, and rightly so.  Marketplace solutions are great when they work, but often inadequate.  A scholarly publisher has to know when to turn to the marketplace and when to turn away from it.

Although this idea of a Web-scale press has an open access dimension, it is not OA for the usual reasons we hear about in the journals world–that is, because there is a great deal of demand for texts, but no budget to pay for them.  University press books have the opposite problem in that the demand for many monographs is simply insufficient to underwrite the publishing process.  Many university press titles sit for years on library shelves without circulating.  For some titles the value of the work is not only in market demand but in filling holes in the scholarly record and in the training and certification of scholars. Perhaps there are other and better ways to train and certify scholars, but at this time for some fields the publication of a book is the best solution we have.

Publishing at Web scale, though, may modify the demand for scholarly monographs. While I do not believe that a business model based on using free content to sell paid content is by itself a sustainable strategy (and have written about this before), the free content and the affordances of digital text potentially open up new possibilities for discovery.  Are some books not circulating in libraries for the simple reason that they do not enable full-text search and thus scholars do not know what is inside of them?  Friends in the industry believe that this is so.  I am skeptical, but the fact is that we have to test this to find out.  So let the experiment begin.

So the Web-scale business strategy looks like this:  publish as open access texts (in HTML) to maximize accessibility and discovery; tag books down to the paragraph level through crowdsourcing; create a wrapper of social media options at the paragraph level (and the chapter and full book levels as well); charge for special versions of the text (e.g., POD, ePub); and supplement the revenue stream with institutional support, support that will rise or fall depending on how successful the new discovery vehicles prove to be.

This is a strategy for one segment and only one segment of university press publishing, the specialized monograph, which may have traditional sales in the range of 300-600 copies.  If you assume an average price received of $50 per copy, the publisher stands to earn around $15,000 to $30,000 in revenue before expenses, which may not be enough to cover the allocable overhead.  Thus experimenting with an OA offering is not putting much money at risk.  But without ongoing institutional support, it is difficult to see how this kind of publishing can continue.
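
The revenue range above is simple multiplication, and it is easy to rerun with other assumptions. The sketch below uses only the figures already given (300 to 600 copies, $50 received per copy); the function name is hypothetical, and nothing here estimates overhead, since I have not supplied a figure for it.

```python
def revenue_range(min_copies: int, max_copies: int, price_received: int) -> tuple[int, int]:
    """Return (low, high) gross revenue for a range of copies sold."""
    return min_copies * price_received, max_copies * price_received


# Figures from the paragraph above: 300-600 copies at $50 received per copy.
low, high = revenue_range(300, 600, 50)
print(f"${low:,} to ${high:,}")  # prints "$15,000 to $30,000"
```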

PDF doesn’t seem to be any barrier to Google searching the text.

I believe Springer is providing numerous books via SpringerLink, and CRC Press has CRCnetBASE. However, you have to buy a collection of books. I am not sure the economics are there for single-copy sales. If memory serves, it costs about $15-30K to publish an STM title with a press run of some 300 copies, with PPB being one of the lesser costs.

Joe, a story that may be of interest to you in light of this observation:

The title is Communicating Professionally, Third Edition: A How-to-do-it Manual for Librarians. The fact that this book is now in its third edition tells us something about its success, but curiously there does not appear to be an electronic edition, not even for Kindle.

The book you cite is indeed currently published by ALA, but that’s because its original publisher (Neal-Schuman) was recently acquired by ALA Editions. I wrote a book for Neal-Schuman in 2004, and several years later I was talking to the publisher on the phone and discussing the possibility of a second edition. I mentioned in passing that, of course, I would want the second edition to be available as an ebook. His response: “We don’t do ebooks. I’m convinced that there is no market for ebooks in libraries.”

I was in an airport departure lounge at the time I was having this conversation, so I tried hard not to yell. But I’m not positive I succeeded.

I was astounded to see ePub mentioned only once and even then only as a “special version” of an academic book. The structural similarity between ePub and HTML is close in ePub version two and even closer (to HTML 5) in ePub version three. This means that most web rendering engines can also render ePub. In Firefox there is an add-on called EPUBReader. In Chrome, there is the Readium plug-in. More generally, there was Ibis Reader which was acquired along with Threepress Consulting by O’Reilly Press. Ibis Reader was able to render ePub via any HTML 5-compliant web browser, even on mobile devices. As a proof of concept, Ibis Reader was very compelling. The IDPF (maintainer of the open ePub standard) is currently in the process of enabling a similar web facility with its Readium.js project.
An unencumbered ePub is fully indexable and has a reach that includes but is not limited to the web or to having an internet connection. There are free ePub-capable eReaders on all major desktop, laptop and mobile platforms. The ePub TOC can be as granular as needed.
An academic writer also has a growing number of tools with which to create ePub documents of any length from monographs and journal articles to a digital opus on the scale of “The Decline and Fall of the Roman Empire.”
I really don’t think that ePub can be ignored.

As usual–great background and insights–gotta go share . . .

Random thought by the end of this article: monetization through crowdfunding of books. This will help ensure the right books get written. Not a “fix all” solution, but a useful band-aid perhaps?

The “Kickstarter” crowdfunding approach is something of a double-edged sword. It does provide a mechanism whereby books can be funded in advance, their costs covered, allowing for releasing them freely to the world.

The issue, though, is that such an approach will also stop some books from ever being written. Book publishing is something of a speculative business. You publish a lot of books, assuming only a few of them (at best) will be hits, and the success of those hits pays for all the other books that didn’t catch on quite so well. If you’re only going to complete those books that have achieved hit status in advance, then the others don’t get written.

And there are some books that are massively important that may not appeal to a broad audience, but that changed the world for a small group and led to important things. I always like the music analogy, with Brian Eno’s quote that only a few thousand people bought the Velvet Underground’s first album, but every single one of them was inspired to start a band. If Kickstarter had existed back in the mid-60s, “The Velvet Underground & Nico” would never have been recorded.

Thanks, David, that is good input. You have put into words what I meant by it not being a “fix all” solution. I don’t see this idea as replacing traditional book funding models, but it might be a nice adjunct.

I feel it would be a good arrow in the quiver of the imaginary Web-scale University Press, especially in the context of helping to satisfy the often hard to identify “long tail” or niche demands.

Who is going to want to click and load a new web page for every paragraph of a book? That sounds tedious and time-wasting!

Regarding annotations: you seem to suggest that publishers need to do quite a lot of work to segment their books to facilitate fine-grained annotation. How about http://hypothes.is ? Could that be an enabler for such tagging and annotation?

Hypothes.is is definitely an annotation tool to explore, one of many. The key thing is to bring discovery to the level of the paragraph or of some quantity of selected text.

The experiment has already started. Looking at the items you’ve listed, I reckon we (OECD Publishing) do them all, except paragraph-level tagging. HTML – check (we do HTML 5 so we fit all devices); chapter-level publishing – check (in HTML as well); social toolkit – check (and this includes the ability to embed our publications and individual pages in a website); digital aggregation – check (our own and via most of the industry e-channels); sustainable business model – check (HTML is free, everything else is charged for). You missed out one item: we publish charts, graphs and tables individually in HTML and Excel.
What’s the impact? Well, we publish around 400 new books annually (about 250 in English, the others mainly in French but also in Japanese, German and Spanish) and we’re getting more than a million downloads and online readings per month, of which around 20% earn revenue. ±85% of our revenue is e, the remainder print (and, yes, we do POD too). Dissemination is global (except for North Korea, we’ve been read in every country of the globe this year) and will be ±40% higher this year than last – both free and paid dissemination are growing.
Are we covering our 13 million euro cost? Just about. We have a 10% subsidy from our institution but this will be reviewed in 2015 (and likely disappear) so we are looking yet again at our costs and new ways to generate revenue, including author-side contributions when funders have OA mandates (note: a contribution, not 100% of the cost because our funders don’t have that sort of money).
I would welcome other scholarly book publishers to join our experiment.
By the way, we call it a Freemium Access publishing model because the content is free to read, the premium service is needed if you want to download, print etc.
Toby Green

This article couldn’t have come at a better time. The new director of the Amherst College Press has just been hired (I was on the search committee), and Joe’s article, plus the astute comments by Frank Lowney and Toby Green, provides a good roadmap for this new press to follow. As you know, this press is dedicated to publishing monographs freely online in the humanities (which makes it different from OECD’s concentration), and its business model is sustainable because it is based on endowment money being used to pay the salaries of the press’s employees. I shall share this post with the new director–and encourage him to sign up for TSK immediately!

Joe, I completely agree with your perspective… but will point out that ePub is not just a packaging format but more fundamentally is a way to structure HTML and related assets in an interoperable, accessible manner – this will help facilitate the “semantic” use cases you envision (fine-grained access, remixing, …) more so than custom HTML websites with randomly different organizational conventions per publisher or per title. And conversely if you’re a web-native publisher, making a quality packaged .epub file available will be a snap. So I don’t see this as HTML websites on the one hand vs. PDF and ePub on the other but rather ePub & HTML as the new platform for publishers to do both next-generation portable documents and online websites at scale. And from which high-quality PDFs can be generated for print-oriented renditions.

Also, it’s a detail, but the one part of things I don’t get is the thing about a web page per paragraph. Doable certainly, but useful/usable? The unit of SEO targeting might better be a section or sub-section (“cards” in Inkling parlance) rather than a paragraph. Annotations need to be able to link to arbitrary sentences or phrases. So I’m not sure where paragraphs per se are useful units.

