Last week, as an activity during the SSP conference, I participated in the Arizona State University Book Sprint to develop a text on the future of scholarly publishing. My contribution to the book sprint was fine. It wasn’t great. It was probably as much as one can expect from 35 minutes of trying to get thoughts down on “paper.” Even though my piece was edited a bit and compiled into a larger whole, it didn’t meet my hopes for contributing to a book — but perhaps that isn’t the point of a book sprint. When I talked about the project with the organizers, they said the book would be published online, but it wasn’t likely to find a publisher and move to print. I asked, why not? They could post it on one of any number of print-on-demand sites and make it available. You don’t need a publisher these days to produce a book. Isn’t that the point of a book sprint?
This led me to consider what exactly constitutes “publishing.” It is certainly more, we can agree, than posting something on a blog, or posting pictures on Instagram, or posting a recipe on Facebook. If you consider publishing to be simply the distribution of ideas, then, sure, these other things count as publishing. If you attend technology meetings, you’ll find that the word “publisher” applies to nearly anyone who posts things on the Internet. Those in web advertising particularly prefer using the term “publisher” to describe those who host web content. The distinction between publishing and simply posting content is important with regard to the ongoing discussion on the future of electronic formats and standards development in the publishing space.
The book-sprint exercise was quite fun and an interesting activity. I’m glad I participated. It generated ideas that drove a subsequent conversation I had with Bill Kasdorf about the recently proposed merger of IDPF and the W3C. The International Digital Publishing Forum, or IDPF, is a standards organization that is known primarily for its development of the EPUB format. W3C, or the World Wide Web Consortium, is a much larger organization focused on web technology standards development. Last month, the two organizations openly announced they were in discussions to merge.
I have no philosophical objection to using the open web platform for publishing or for HTML-based production of texts, when it is appropriate. It is fairly clear that a significant portion of publishing, content discovery, delivery, and preservation will take place digitally. Consolidating production practices around common formats, distribution tools, and metadata adds to efficiency, reduces costs, aids in accessibility, and ultimately increases usage of content. I suspect you’ll find few others who have “drunk the Kool-Aid” as much as I have when it comes to standards and digital content distribution. But I am very reluctant to believe that this proposed merger will be good for publishers.
A core principle of open standards development is that it must involve members of the community that will use those standards. Standards development is best achieved through applying due process that, according to the American National Standards Institute (ANSI), “is the key to ensuring that [American National Standards] are developed in an environment that is equitable, accessible and responsive to the requirements of various stakeholders.” The core elements of this process are collaboration, balance, and basing the work on consensus. If standards are developed without the contributions of the participants to whom they apply, they have little hope of being considered true “consensus” documents. Also central to this notion is that the work should involve a balance of interests, which includes a diversity of market participants: manufacturers, distributors, and consumers. IDPF is a member association composed of publishers and those that serve the publishing industry. The W3C is not so composed, despite its best efforts over the past five years to recruit more publishing-industry members and to engage in work in the publishing space. To be fair, neither IDPF nor W3C is developing ANSI-accredited standards, so, strictly speaking, these definitions and rules don’t apply to them. But the principles behind open standards development are sound and worth considering in the light of this merger. In addition, W3C was an inaugural signatory to broad principles on open standards development that also support such notions of balance and consensus among industry players. Is its work on development of standards for digital publishing in line with those principles? In aspiring to bring on more publishers, yes, but in practice, less so.
It certainly is true that standards are often imposed on the community from outside processes. We all know a variety of de facto standards that have been imposed on consumers, distributors, or even manufacturers. One example widely adopted in the publishing community is the PDF file format, which was originally developed in the early 1990s by Adobe to support its Acrobat product. It wasn’t until 1993 that the format was made freely available, and it took another 15 years before it was officially released as an open standard, published by the International Organization for Standardization as ISO 32000-1:2008. One might say that this is an unfair comparison: corporate-controlled standards development is significantly different from open development processes. However, this is true only up to the point that the standards-development organization actually represents the community that is the target of the standard.
The proposed structure for the IDPF-W3C merger, insofar as it has been publicly described (and, importantly, it is still being negotiated), is that a special subclass of W3C membership will exist for a few years. During this period IDPF members will form the basis of a new Publishing Business Group, which will provide input on the range of publishing-related W3C activities. The members will be grandfathered in at IDPF dues levels for some period and allowed to participate in the work of the Digital Publishing Interest Group (DPIG) and its associated working groups, but not all of the wider W3C activities. Which groups are directly tied to the work of the DPIG hasn’t been settled.
Presumably, at some point in the future, this temporary membership arrangement will cease and those former IDPF members will be given a choice of full W3C membership, continuation as non-participating but dues-paying members of the Publishing Business Group, or being booted from the club, so to speak. The pattern of the past five years gives no indication that, if presented with the option a few years from now, a large number of IDPF members will become full members of the W3C, primarily because of the relatively high cost of W3C dues. More likely than not, based on the past five years of evidence, they will withdraw and leave the maintenance of the standard to W3C and its active members, be they large corporations, umbrella societies, or technology companies interested in the broader range of W3C activities. Regardless, the representation is unlikely to include the broad set of publishers that IDPF represents. This would be a tragedy, since IDPF has done such a tremendous job engaging a wide range of publishers in the process of developing EPUB and advancing publishers’ technology interests in the diverse e-book publishing world.
As I understand it, the scope of participation is also unresolved: it is not yet clear in which groups the members of the DPIG will be able to participate as active working group members. This gets to the question of what constitutes a publisher. For example, CSS and accessibility are firmly within the purview of the DPIG, but what about authentication and security, multimedia content, or data management? Increasingly, these issues are important for some publishers in our community and fit within the modern perception of what constitutes “publishing,” even in a traditional context.
An important element to consider when thinking about whether this merger is good for publishers or not is control. The IDPF is an organization of and for publishers. More than two-thirds of the IDPF Board of Directors are employees of publishers or non-profits — EDItEUR, Independent Book Publishers Association (IBPA), or the Italian Publishers Association, for example — that are closely tied to the publishing industry. However, the W3C has a paltry number of publishers as members. Of its 421 current members, only five are publishers: Wiley, Hachette Livre, Pearson, Hindawi, and Thomson Reuters. Creatively thinking about who contributes to and participates in the publishing industry (companies and organizations such as Bloomberg, Walt Disney, OCLC, MarkLogic, and others, most of whom are not scholarly or even publishers) adds another ten members. So of the 421, only 15 (counting generously) represent the publishing world and traditional publishing interests. That’s 3.6% of the overall membership, despite a very aggressive campaign over the past five years to recruit new publisher members to W3C. Even those 15 are hardly representative of the breadth and depth of the publishing industry. Compare this to the more than 300 members of IDPF who are generally publishers or publishing-related companies.
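For readers who want to check the arithmetic behind that 3.6% figure, here is a minimal sketch. The counts are the ones quoted above; the variable names are mine and purely illustrative.

```python
# Membership counts as quoted in the paragraph above.
w3c_members = 421
publishers = 5            # Wiley, Hachette Livre, Pearson, Hindawi, Thomson Reuters
publishing_adjacent = 10  # the "counting generously" additions

publishing_total = publishers + publishing_adjacent
share = publishing_total / w3c_members

# Format the share as a percentage rounded to one decimal place.
print(f"{publishing_total} of {w3c_members} members = {share:.1%}")
```

Even with the generous count, the publishing share stays below 4% of the W3C membership.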
As I’ve said, standards development bodies should create standards only for the communities that they represent. How can an organization develop standards for an industry it does not represent via its membership? Imagine if the tables were turned and the publishing industry began insisting that the formats in which it develops and distributes content must be adopted by technology companies! If there were ever a way for the publishing community to continue to marginalize itself, it would be to hand control, insofar as it still has any, of its production and distribution technology to the technology companies that have much broader business interests.
At the risk of getting too techie and losing some readers, I’ll touch on two important areas of focus where the participation of publishers is critical and show how proceeding with this merger might sideline the interests of publishers in favor of the W3C vendor community. There was recently a proposal to add HTML5 serialization of the data within the EPUB format, which would create significant issues with the existing XHTML5 model that EPUB is based upon. The rationale for this is that browsers have begun to deprecate XHTML5 in favor of HTML5 and this is causing some rendering issues in browsers. If adopted, this move would have avoided some of the XML namespace issues that create rendering problems and would have made the web transition more seamless. Although I wasn’t involved in the conversations around the proposal, I understand that there was pushback from several publisher-related participants, given the importance of XML workflows for a variety of publishers, and that this proposal was rejected in the latest draft of EPUB 3.1. While this proposal made sense from the distributor side (that is, for a browser developer or a device manufacturer with a built-in browser for rendering), it does not serve the publisher that is creating the content in its existing XML workflow. The rationale for using XML in a production workflow is that it is optimized for a variety of output forms, not only web-based browser display. Few publishers have an entirely HTML5-based production workflow, and the tools to support this are inadequate for a variety of production processes. (N.B.: Yes, transformations are possible and this could be managed; we don’t need to cover this level of technical detail in a Scholarly Kitchen post!) The key point is that, were publishers not involved in the development process of the next version of the standard and in a position to push back with their business realities and needs, considerable problems (read: COST) would be added to the production process.
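For the technically curious, the difference between the two serializations can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not anything taken from the EPUB specification: the two sample documents and the helper names `is_well_formed_xml` and `extract_title` are hypothetical. The standard library’s strict XML parser stands in for a publisher’s XML toolchain.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical chapter in the XML (XHTML) serialization: namespaced, every tag closed.
xhtml_doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Chapter 1</title></head>"
    "<body><p>First line<br/>Second line</p></body>"
    "</html>"
)

# The same content in the HTML serialization: no namespace, <br> left unclosed.
html_doc = (
    "<html>"
    "<head><title>Chapter 1</title></head>"
    "<body><p>First line<br>Second line</p></body>"
    "</html>"
)

def is_well_formed_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

def extract_title(markup: str):
    """Pull the title out of an XHTML document using namespace-qualified names."""
    root = ET.fromstring(markup)
    title = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")
    return title.text if title is not None else None

print(is_well_formed_xml(xhtml_doc))  # the XHTML version parses as XML
print(is_well_formed_xml(html_doc))   # the HTML version is rejected by the XML parser
print(extract_title(xhtml_doc))
```

The unclosed `<br>` is perfectly legal HTML5, yet a strict XML parser rejects the whole document. That is the kind of breakage an XML-based production workflow would have to absorb, or pay to transform away, if content arrived in the HTML serialization.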
Another great example of this is support for math in display technologies and browsers. The publishing community has had a robust structure for rendering mathematical equations in print and browser-based devices for years: MathML. There even exists now a W3C community group (numbering more than 50 members) that seeks to address math requirements in web pages. The group’s website summarizes this issue succinctly:
“There are many technical issues in presenting mathematics in today’s Open Web Platform, which has led to the poor access to Mathematics in Web Pages. This is in spite of the existing de jure or de facto standards for authoring mathematics, like MathML, LaTeX, or asciimath, which have been around for a very long time and are widely used by the mathematical and technical communities. While MathML was supposed to solve the problem of rendering mathematics on the web it lacks in both implementations and general interest from browser vendors.”
Further, according to the description of the group, its experts “should identify how the core [Open Web Platform] (OWP) layout engines, centered around HTML, SVG, and CSS, can be re-used for the purpose of mathematical layout by mapping mathematical entities on top of these, thereby ensuring a much more efficient result, and making use of current and future OWP optimization possibilities.”
Essentially, this charter implies that while MathML exists and would be an ideal built-in solution if only browsers would implement support for it, the group should consider alternative ways to address the problem because…well, W3C is predisposed to support an HTML5/SVG/CSS technology stack. Here we have another example of decision-making driven not by what works for the publishing industry, but rather by the technology vendors and distributors that have objectives other than supporting publishing. Notably, the majority of the members of that MathML community group are not publishers.
In the words of one specialist, whom I deeply respect on these issues: “If you aren’t serious about displaying math, you’re not a serious publisher.” It’s true that not everyone cares about the display of mathematics, but for those who focus on publishing scientific information, the issue of proper display of math is critical. Now, W3C might say that there’s a community group focused on addressing the rendering of math symbols. But it seems that that group seeks to find the solution that supports everyone; everyone, that is, except publishers that already have and use a workable solution and the users who value the proper rendition of mathematical symbols.
It looks very likely that this merger will take place, unless there is a groundswell of opposition within W3C’s or IDPF’s respective memberships, which I expect is unlikely. Many of those who have commented on the merger support it, for example here and here and here. My own feeling is that deeper respect for publishing-industry experience among the technology community is warranted, and greater involvement in standards for web-based information distribution is urgently needed. If this merger helps with either one of those two things, it will be useful. Greater engagement in the technology issues that underpin much of our industry is also desperately needed from publishers and their suppliers. Having a seat at the table for web standards is also valuable to support engagement. The entire world of web users will benefit from the publishing industry’s expertise in design, form, and functionality, as it has from the earliest days of publishing in other mediums. However, providing input is not the same as being able to control the final output, which is critical in standards development. It is far too easy to say, “We’ve heard your concerns and we’re rejecting them.”
I am concerned that the merger will be another way in which the publishing industry passes yet another element of control over its products and its content distribution to other industries. Does it make sense to cede ever more of our activities to organizations outside our industry?