The EPUB 3.0 specification has reached the ripe old age of 16 months and has been making waves in the book publishing community since long before it was officially published in October 2011. Those waves have been felt primarily in the trade publishing world, where e-book distribution and sales have skyrocketed over the past five years. Publishers have needed a single-format distribution mechanism for e-books that can streamline their content creation and distribution systems. According to Wikipedia, there are more than 25 types of e-book formats, which is almost certainly unsustainable. EPUB 3 has shown promise as the common format that might fit the bill, especially as it gains adoption.
EPUB is an open specification published by the International Digital Publishing Forum (IDPF). The EPUB specification is “a distribution and interchange format standard for digital publications and documents.” It defines a structure for representing, packaging, and encoding structured content for distribution in a single file. Importantly, the standard is not solely focused on electronic books, which is why it is called “EPUB” rather than “EBOOK”. The new 3.0 version of the specification includes very robust ways to incorporate fixed layout multimedia, support for MathML, and important metadata and enhanced accessibility features. A complete list of the new features and functionality and a comparison between EPUB 3 and the previous version is available on the IDPF website. Despite praise IDPF received for taking EPUB 3 in a promising direction and the functionality and structural improvements of the new version, adoption was initially tentative. In part this was due to the modest number of platforms that could appropriately render EPUB 3. This has changed significantly in the past year, and now there are a variety of platforms that support a range of EPUB 3 features. The Book Industry Study Group just released an updated version of its EPUB 3 Support Grid describing more than 40 reading systems and the level of their technical support of EPUB 3.
We are in the midst of a community transition, and the extent and pace of this transition are worthy of consideration. Outside of the core e-book community, adoption of EPUB has been slow. In the scholarly journals community in particular, use has been lagging. The added functionality and many new features of EPUB 3 make the standard a viable option for distributing scholarly content of all types, including journal articles. But why has adoption been slower for scholarly journal publishers than it has been for other types of publishers?
In part, it’s because demand within the journal publishing community for a move toward EPUB is limited. E-book reading devices are not how most consumers of scholarly content are reading journal content. As in every industry, “you go where the customers are” and, at the moment, journal readers are not typically reading articles on reading devices like the Kindle, Nook, or Kobo eReader, although the growing use of tablet devices could change this trend, as might improved reading apps for smartphones. However, much of journal article discovery and reading on those platforms is still done in device-specific applications or using mobile-friendly web interfaces. This would likely point to HTML5 as the future for online and mobile content formats. EPUB should certainly be counted in that future, since the specification was drawn from existing and emerging web standards, in particular HTML5.
Scholarly journal publishers were among the first to move toward digital content delivery. This first-mover status has placed journal publishers in the unique position of having created a reader community that is comfortable reading content on web platforms or, alternatively, downloading the file format that was the prevailing distribution method at the time, PDF. For the vast majority of scholarly content consumers, reading of electronic journals predominantly takes place on computer screens or on printed versions of page facsimiles, not on reading devices. The growth of mobile distribution forms is not something that should be overlooked. Many large publishers have moved quickly toward the application development route for distributing their content. However, these apps or mobile sites are generally wrappers for either HTML or XML content, or provide download functionality for PDFs to the mobile device. The distinction between device-specific apps, web-interface apps, and HTML5 apps doesn’t necessarily demand the implementation of EPUB 3. I have written about the foolhardiness of creating device-specific apps, but is EPUB 3 the answer to these concerns for most journal content providers?
A few publishers have been using earlier versions of EPUB to distribute scholarly journal content for some time, such as Lippincott Williams & Wilkins, and Hindawi. Elsevier has provided an application that allows users to convert ScienceDirect articles into eReader formats such as Mobi and EPUB 2. The main purpose of these services has been to allow readers to access journal content on their e-book reading devices (such as Kindle, Nook, etc.), but again adoption in the space was slow.
BioMed Central announced last December that it would begin using the EPUB 3 format for some of its titles and was among the first scholarly publishers to do so. The first title whose content it began distributing in EPUB format is the Journal of Neuroinflammation. In one of the comments on the blog post that describes the move, Nicola Collingwood said:
BioMed Central has extended its existing suite of publication formats (PDF, XML, HTML) to include ePUB, offering a range of formats to suit our readers varying needs. PDF remains one of our core publication formats, but we wanted to address the growing need for more mobile access to content.
BMC received praise from IDPF, the developer of the EPUB 3 standard, for its leadership in movement toward EPUB. But it’s interesting to note BMC’s tentativeness with this move, as noted in Collingwood’s comment. The reason is fairly simple — most readers of scholarly journal content prefer PDFs and have for years.
The reading experience of PDFs is not ideal for a variety of reasons. The most basic reason is that almost no device has a standard page-size screen that matches the page-image-replication functionality of PDF. Therefore a PDF image file will never fit perfectly onto a device screen, and users are forced to scroll left to right, read impossibly small text, or switch to a horizontal screen presentation that displays only a few lines of the text at a time and thus requires a lot of virtual scrolling. A second problem with PDF is that the types of embedded images and their functionality are limited. PDF is a poor vehicle for the more interactive, data-intensive, and interlinked forms where content is moving. While there are methods for embedded linking and metadata within a PDF file, few publishers take advantage of these functionalities. A third issue with PDF is accessibility for the print disabled. While the capability for such accessibility is available if the file is properly formatted, it is not trivial to enable and is so rarely done that in most cases PDF files are nearly useless for the print-disabled community.
There are other, more advanced rendering features that exist with PDF, but they are also not generally implemented. Since content is originally created in some other software and then converted to PDF, incorporating these functions can be difficult, and not all PDF readers or platforms support all functions. In 2009, Alyssa Goodman and several physics colleagues published a paper in Nature that made use of advanced 3D viewer capabilities within the PDF format. Brian Hayes, in his post PDF vs. HTML, noted that this was “apparently the first scientific publication to include a 3D PDF,” and that he did not “know of another example since. And I have never seen any other kind of interactive graphics embedded in a PDF.” Four years later, few PDF articles have followed this lead in using 3D visualizations or more interactive multimedia content. Could EPUB improve the reading experience for those who are currently downloading PDF as well as for those wishing to include interactive content? Almost certainly.
This leads to another important question: Are journal publishers, or more importantly authors, going to embrace the multimedia content forms that would take advantage of the functionality that EPUB offers? Discussions around non-textual content and the impending flood of interactive data or multimedia content have been ongoing for years, but such rich content has yet to really take hold among researchers and authors. There has been significant movement toward data distribution, but often outside of traditional publisher distribution channels. Of note, OSA (The Optical Society) developed a prototype effort called Interactive Scientific Publishing (ISP) with support from the NIH National Library of Medicine. Since its launch, though, OSA has pulled back on the project, mainly due to the challenges of creating and curating the content as well as the lack of enthusiasm among authors, although some of this functionality has been retained in OSA’s Optics InfoBase repository. Creating multimedia is a challenge for researchers who are not experts in video, visualization tools, or audio mixing. Those publications that do include this content also invest heavily in its creation and in supporting authors to produce it.
For the most part, academic journals do not require the design-laden approach to content that magazines use, and therefore some of the more advanced style sheet functionality provided by CSS3 and present in EPUB 3, such as adaptive layout, won’t really be needed. Support for advanced markup like MathML, however, is critically important in some fields and is accommodated in the EPUB 3 specification. Unfortunately, only 14 of the 40 reading systems identified in the BISG EPUB 3 Support Grid fully support MathML, even though basic support for MathML’s presentation markup is available as open source in WebKit.
E-book content distribution is usually something that is untethered from the network. Readers will generally download an entire book file and read it at their leisure. To some extent the same is true for many journal article readers. Although the data in Phil Davis’ article analyzing chemical journal usage are from 2003, subsequent publications (such as Koon-Kiu Yan and Mark Gerstein’s paper) support the conclusion that HTML and PDF download rates show strong demand for downloaded copies of articles, whether for personal collections, printing, or offline reading. It has often been said that HTML is for skimming and scrolling through text, while PDFs are for serious reading. I’m not sure that is entirely true, but for a large segment of the reading community, their actions seem to point strongly in that direction.
With the adoption of any standard, the critical question is whether the benefits from the new specification are outweighed by the transition and ongoing costs of implementing the new system. The cost to convert from existing XML workflows to EPUB should not be a major factor. The transformation from XML formats such as JATS (Journal Article Tag Suite) was described by Laura Kelly during the 2010 JATS-Con conference in her presentation JATS to EPUB: Unraveling the Mystery. Subsequently, a variety of vendors began making this service available. Even if a publisher is not currently working in a strictly XML environment, software such as Adobe’s Creative Suite, Apple Pages, and Microsoft Word can all export to EPUB, or their output can easily be transformed to EPUB through plug-ins, provided the author’s document is well structured. And if a publisher is already producing content in HTML or HTML5, the effort to transform that content to EPUB 3 should also be minimal, since at its heart EPUB 3 is HTML5 wrapped in a zip file with some required metadata (spine, etc.). In short, the transformation into EPUB 3 is not very costly or difficult technologically. However, we should acknowledge that a file transformation process that is simple as a one-off activity is not necessarily simple at scale, particularly at the scale of many large publishers. These transitions generally don’t come easily or cost-free.
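To make the “HTML5 wrapped in a zip file with some required metadata” description concrete, here is a minimal sketch in Python of what such packaging might look like. It is an illustration under stated assumptions, not a production converter: the function name build_epub, the file names (EPUB/package.opf, nav.xhtml, article.xhtml), and the sample identifier are all my own placeholders, and the output has not been validated against a tool such as EPUBCheck. What it does reflect is the container layout: an uncompressed mimetype entry placed first in the archive, a META-INF/container.xml pointing to the package document, and a package document containing metadata, a manifest, and a spine.

```python
import io
import zipfile
from datetime import datetime, timezone
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="EPUB/package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Package document: metadata, manifest (every file), spine (reading order).
PACKAGE_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">{identifier}</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="article" href="article.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="article"/>
  </spine>
</package>
"""

NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>Contents</title></head>
  <body>
    <nav epub:type="toc">
      <ol><li><a href="article.xhtml">{title}</a></li></ol>
    </nav>
  </body>
</html>
"""


def build_epub(article_xhtml: str, title: str, identifier: str) -> bytes:
    """Wrap one XHTML article in a minimal EPUB 3 container (hypothetical helper)."""
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    safe_title = escape(title)  # escape &, <, > before inserting into XML
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry is written first and stored without compression.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        deflate = zipfile.ZIP_DEFLATED
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=deflate)
        zf.writestr("EPUB/package.opf",
                    PACKAGE_OPF.format(identifier=escape(identifier),
                                       title=safe_title, modified=modified),
                    compress_type=deflate)
        zf.writestr("EPUB/nav.xhtml", NAV_XHTML.format(title=safe_title),
                    compress_type=deflate)
        zf.writestr("EPUB/article.xhtml", article_xhtml, compress_type=deflate)
    return buf.getvalue()


if __name__ == "__main__":
    sample = ('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Sample</title>'
              '</head><body><p>Article text goes here.</p></body></html>')
    epub_bytes = build_epub(sample, "Sample Article", "urn:uuid:example-placeholder")
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        print(zf.namelist())
```

A real journal workflow would add stylesheets, images, fonts, and richer metadata to the manifest, and that is where the at-scale costs tend to appear. The sketch covers only the packaging step, which is the part that really is simple.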
Whether EPUB will become the dominant distribution file format for the next 20 years, the way that PDF has been since it was first released by Adobe in the 1990s, I will leave history to tell. What is obvious is that web distribution for scholarly journal content is here to stay. In all likelihood, PDF has outlived its usefulness in our world of ever-expanding device types, screen sizes, and functionalities. In addition, as content forms grow further away from a print-based paradigm toward a more interactive and multimedia environment, the PDF-as-paper-substitute will be insufficient. Jettisoning PDF in favor of EPUB is a first step that more publishers should consider taking, and sooner rather than later. The fact that producing EPUB is becoming nearly as easy as producing PDF should hasten the creation of more EPUBs in the community.
Nonetheless, PDF will likely remain a strong mode for offline reading, due to inertia if for no other reason, despite its shortcomings. A publisher’s move to EPUB 3 will likely be determined, ultimately, by the need to implement one of the advanced features supported in EPUB 3. It is unlikely that user demand will be the primary driver of publisher adoption. It’s also possible that publishers will choose to bypass EPUB and move directly to HTML5 distribution via mobile sites or apps. The fact that publishers are as advanced as they are in web-based delivery has made the mobile transition less complex for journal publishers than it has been for other content providers. Publishers could easily package that HTML5 for offline viewing by adopting EPUB, but at the moment, these apps generally provide for PDF offline viewing or printing. Again, making the switch might be more complex and costly than one might presume.
While replacing PDF will be no small feat, if any specification is well positioned to do so, EPUB 3 is it. Journal publishers should seriously consider moving in that direction, since the costs and risks are relatively low and the opportunities opened up are substantial. When and how readers will move with them, we will have to see, but as the eager adoption of apps has shown, readers will move to new technologies when those technologies offer them sufficient functionality and ease of use. If publishers don’t begin providing the new and improved functionality that EPUB 3 offers, we can be assured that the move away from PDF for journal content will remain in its present state: emerging.