This week’s video takes a look at the evolution of the online article, from the ASCII format of early internet days, through PDF, to HTML and beyond, and asks why readers continue to prefer PDF, despite the enhanced functionality offered by HTML. Full disclosure – it was created and narrated by my colleague Gary Spencer.

Alice Meadows

I am a Co-Founder of the MoreBrains Cooperative, a scholarly communications consultancy with a focus on open research and research infrastructure. I have many years’ experience of both scholarly publishing (including at Blackwell Publishing and Wiley) and research infrastructure (at ORCID and, most recently, NISO, where I was Director of Community Engagement). I’m actively involved in the information community, and served as SSP President in 2021-22. I was honored to receive the SSP Distinguished Service Award in 2018, the ALPSP Award for Contribution to Scholarly Publishing in 2016, and the ISMTE Recognition Award in 2013. I’m passionate about improving trust in scholarly communications, and about addressing inequities in our community (and beyond!). Note: The opinions expressed here are my own.


17 Thoughts on "The Evolution of Digital Publishing and its Formats"

PDF has attractions for users, and in spite of common – but inaccurate – perceptions, can offer functionality as rich and full as HTML and in many cases even more so. The only thing a reader needs is to use a scientific PDF-reader. Utopia Documents, for instance. It is publisher-independent, freely available from, and adds offline as well as interactive online functionality to all PDFs that are not bitmaps. It can be used in any discipline, but the link-out and knowledge navigation features are currently optimised for the biomedical/biochemical sciences.

I’m not sure that I understand the question. I think of HTML as the online format and PDF as the thing I can save. Saving HTML seems like a problem. Plus many of the documents only come as PDF. So it is not a question of preference, just what one does.

Saving HTML is only a problem if that HTML relies upon associated files that are not downloaded at the same time, e.g. CSS for the layout/design, JavaScript for dynamic elements. The Wiley app referred to in the video will most likely either have local copies of these files or grab the appropriate files to enable offline reading.

At its heart, HTML just contains a document and indicates structural elements, and CSS defines rules that a browser can use to lay the document out on a device. There is nothing inherently online about this; it is just that, because we consume HTML online, certain efficiencies have developed that need to be worked around for offline use.
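To make the point about associated files concrete, here is a minimal Python sketch of one such workaround: folding a linked stylesheet into the page itself so that the saved HTML no longer depends on a separate download. The function name inline_stylesheets, the file name article.css, and the sample page are all hypothetical, and the pattern assumes simple, well-formed link tags in which rel comes before href. It is not a description of how the Wiley app or any browser actually does this.

```python
import re

# Matches <link ... rel="stylesheet" ... href="..."> in a simple, well-formed page.
# Assumption: rel appears before href. A real tool would use a proper HTML parser.
LINK_RE = re.compile(
    r'<link\b[^>]*\brel="stylesheet"[^>]*\bhref="([^"]+)"[^>]*>',
    re.IGNORECASE,
)


def inline_stylesheets(html: str, stylesheets: dict[str, str]) -> str:
    """Replace each linked stylesheet with an inline <style> block.

    `stylesheets` maps an href to CSS text that has already been fetched
    (hypothetical input; fetching is left out of this sketch).
    Links with no matching entry are left untouched.
    """
    def replace(match: re.Match) -> str:
        css = stylesheets.get(match.group(1))
        if css is None:
            return match.group(0)  # nothing local to inline, keep the link
        return f"<style>\n{css}\n</style>"

    return LINK_RE.sub(replace, html)


# Hypothetical page and stylesheet, for illustration only.
page = (
    '<html><head><link rel="stylesheet" href="article.css"></head>'
    "<body><h1>Title</h1></body></html>"
)
offline_page = inline_stylesheets(page, {"article.css": "h1 { font-family: serif; }"})
```

The resulting offline_page carries its layout rules with it, which is roughly what a "save as single file" feature has to achieve; scripts and images would need the same treatment.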

I believe that many, if not all, of the objections to HTML over PDF (offline reading, share-able, etc.) are effectively resolved with ePub version three, especially with born-digital publications where the workflow is not a mix of paper and digital. Eventually, digital documents will have to fill the publication of record role. An ePub is more like a “web site in a box” that is fixed more firmly than a remotely hosted web site that can change at any moment without leaving a trail.

This is well and good but technological superiority may not be sufficient to cause millions, or billions, of people to change what they do. Change always has a short term cost, whatever its long term benefits.

Until recently, I have agreed with this view. For the last 10 years I have both been in publishing and also teaching information management to physicians in training. For those 10 years watching trainees search, I strongly encouraged them to use the HTML. However, Adobe Reader X gives remarkable ability to annotate articles that are in PDF format.

I now find myself in agreement with David’s comment and with my trainees that HTML and PDF have unique strengths and neither solves all my needs. HTML allows a quick read, but PDF supports detailed reading when I want to preserve my thoughts for myself and co-authors by making annotations and highlights on the PDF of complicated research reports. In my view, PDFs are greatly underused for this role. I have not found an easy way to make my own annotations on ePub documents. Is ePub planning to support annotations?

I smell a conspiracy here. Publishers can lock down the content with html. Content slips through their fingers with pdf. Smart content in html is alluring, but it allows publishers to monitor what users are doing. This could potentially help generate much-needed revenue, but it also brings up privacy issues. It also overlooks the fact that some smart content functionality is available in pdf. Why not leverage smart content functionality in pdf, instead of pushing html? Thanks.

I am in my 60s and while I am finally returning to my first love, writing, both scholarly and fiction, I am stumped by HTML. The barrier to my switching from PDF, which I have conquered and which is simpler for me because of the workflow, is simply the lack of good training in using HTML.
At APUS, I am teaching using Sakai, and am limited in posting videos etc. due to my ignorance. So I do want to improve.

Nice video and discussion, but everything here is from the reader’s viewpoint. Somebody still has to write the articles that stretch beyond the PDF! The reluctance to adopt new technology may lie as much with the authors as with the readers.

My journal’s authors are pretty savvy with technology, but they are NOT beating down my door for us to get the Wiley app! (We are a Wiley journal.) I would love for an author to demonstrate their mathematical model of GIS application with a video. I’ve made it clear we would be happy to experiment. But… no takers, so far! I suspect the problem is that the PDF is good enough for prestige and tenure. Why stick your neck out and risk a publishing delay?

We have a GIS specialty conference coming up in May. I’ve arranged a special “authors and editor” session to learn more about how we can make this big switch.

Why are some documents created and digested through PDFs rather than HTML? For reasons analogous to those that motivated Meadows/Spencer to put (and us to engage) their message in video instead of HTML (which could have done the same job, and with greater advantages of interoperability, etc.).

The choice of format emerges from an entire calculus, on the author/publisher’s side of time, energy, effort, intended audience, etc.; and on the reader’s side of time, energy, intended use, available device, etc. That calculus is complicated, and no single medium can, or should be expected to, satisfy all, everywhere, every time. I think the proper attitude is not, “C’mon pdf users, get your lazy butts into the HTML world” (the thinly disguised attitude in the video, esp. in the final questions) but rather “What are the strengths and limitations of HTML, PDF, and [fill in the blank]?” That is, we should consider which formats and media are ideal for each situation.

As authors, we should write in the format that least hinders our writing. As readers, we should read using the medium, format, and device that least hinders our reading and our use of that reading. And as publishers, recognizing a great many variables on either side, we should be medium and format pluralists.

Aerbook/Pelican turns a PDF into a retina-quality graphical EPUB3 that can be further edited to include links, audio and video, and animated and/or interactive widgets that can easily be made using tools like Hype or Adobe Edge Animate. In iBooks, for example, the books are searchable, as text is extracted and stored alongside each image-based page. Conversion is simple, inexpensive, and automated, and moreover books made with Pelican can easily be shared, previewed, and purchased directly online, with selected pages made publicly previewable even within social apps in Aerbook’s Cloud Reader.

We believe Pelican is also the best way to add structured social metadata to a publication, readable and expressive by any social network — canonical links to representative images, sample content, tags, title and author, and so on in Open Graph format.

While we are not specifically targeting scholarly publishing, in any event this article focuses directly on functionality and formats we do address, so offering this information to the community.

Ron Martinez

I hope this does not sound rude, but I believe it is time publishers stopped debating what format readers want, and let them decide for themselves. There are good reasons for choosing PDF (e.g. you remember where you were on the page), and reflowing formats (more interactivity).

For years we publishers have claimed that we have “XML-first” production of our publications. So why not create a rich, structured XML file, and create whatever format is required, on the fly, as requested by the reader? Even better, publish the XML (free OA, or by subscription) and let third parties “render” it to the readers’ preferences.
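As a rough illustration of that idea, the Python sketch below keeps one structured XML source and renders it to whichever format the reader asks for. The article, title and para element names, the sample text, and the two renderers are hypothetical choices made for this example; a real production workflow would use a publishing schema and far richer output.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, deliberately tiny article; the element names are assumptions.
ARTICLE_XML = """<article>
  <title>Formats and Readers</title>
  <para>Readers choose the format.</para>
  <para>Publishers supply the structure.</para>
</article>"""


def parse_article(xml_text):
    """Extract the title and paragraph texts from the single XML source."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title", default="")
    paragraphs = [p.text or "" for p in root.findall("para")]
    return title, paragraphs


def render_html(title, paragraphs):
    """Render as an HTML fragment, escaping text for safety."""
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return f"<h1>{escape(title)}</h1>\n{body}"


def render_text(title, paragraphs):
    """Render as plain text with an underlined title."""
    underline = "=" * len(title)
    return "\n\n".join([f"{title}\n{underline}", *paragraphs])


# One source, many outputs: new formats are added by registering a renderer.
RENDERERS = {"html": render_html, "text": render_text}


def render(xml_text, fmt):
    """Render the XML source in the format requested by the reader."""
    title, paragraphs = parse_article(xml_text)
    return RENDERERS[fmt](title, paragraphs)
```

Calling render(ARTICLE_XML, "html") or render(ARTICLE_XML, "text") produces the two versions from the same file, and a third party could add its own entry to RENDERERS without touching the source.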

I never understood why publishers do the opposite: jealously guard the XML, spend huge amounts of money on what they think the readers want and spend more IT time and money debating what formats might be around the corner.

I think this is a great video; nice to see the chronology. We forget so quickly where we were a few years ago.

I work with government agencies to produce their signature reports – like annual reports – in alternative digital formats. We have evidence that an agency increased their user access fourfold by moving their print-ready PDFs to beautiful WCAG2.0 compliant HTML. Web pages are so much more findable by search engines.

We convert the same publications to ePUB for iPad and so that agencies can put content into iTunes. We also convert the same titles to Mobi format for Kindle.

You can see our work at the Global CCS institute at

And I write about our work and experiences at
