Don Linn has a very good two-part post on what people are talking about with regard to the current state of publishing. Part One is a list of distractors — topics of marginal importance that occupy so much time and attention (e.g., enhanced ebooks); Part Two lists the items he thinks should be at the top of the list (e.g., stem-to-stern workflow reorganization). It’s a good list (both parts), which I recommend to everyone.
I know Linn professionally, and was struck by the note of exasperation that creeps into the posts. He is not alone! Linn’s is an attempt by an experienced executive to get everybody at the meeting to stop talking among themselves and stick to the agenda. The problem, of course (as Linn knows), is that there is no meeting — no one is in charge. We should all be thinking about setting the agenda.
The orientation of Linn’s list is toward trade books — trade ebooks especially — which caused me to wonder how it would be modified or expanded for scholarly publishing — though here it must be said that most of the items on Linn’s list travel well to most segments of publishing. But, yes, there are exceptions: enhanced e-books, for example, are a costly non-scalable activity for the trade book publisher, but the publisher of, say, medical journals will soon be incorporating videos of actual procedures and interactive animations for clinical training. One segment’s distractor is another segment’s desideratum.
So here is an annex to Linn’s post for scholarly publishers. This is an incomplete list; it consists of the reflections of the moment. I will be very interested in comments that expand (or shrink) this list. Someone has to set the agenda, but no one is in charge.
I will forgo part one (the distractors), though, as it only calls more attention to them. Besides, no matter how many times you point out that open access is a marginal issue, that the clamor for collaboration between university presses and academic libraries comes up empty, or that mass digitization projects of little-read books and documents will yield little-read mass digitized databases, the party continues unabated. It will continue until programs of greater magnitude become central, such as the development of direct-to-consumer subscription services and the creation of new content types expressly for machine consumption.
1. Metadata. Linn gets at this one, too, under the “discovery” heading. The “metadata movement” is beginning to get some momentum, but the importance of this topic is still not fully appreciated. In an analog world there are so many cues about the nature and content of a publication (we can hold a journal in our hands) that we tend not to focus on how carefully crafted all the marketing messages are: the heft, the brand, the abstracts and table of contents, etc. In the online world there is only metadata; Amazon is the world’s biggest metadata machine. There is a mistaken notion floating about that full-text search (a great development) solves all discovery problems, but that ignores the important steps that take place before discovery, the creation of demand by the publisher. Markets are created, not served, and metadata is the tool of the creator.
2. The “author-pays” business model. This is distinct from open access, with which it is usually paired. The importance of this model, pioneered by BioMed Central and now widely imitated, is that it taps a new revenue source, authors, and thus is a positive response to the challenge of library bypass. At this time no one knows how far author-pays can go. Can it handle long-form texts, aka books, as well as journal articles? Will there be a great number of such services, or will the author-pays model lead to greater industry consolidation, as the broad-based services of PLoS ONE seem to imply? Also of interest is how new revenue opportunities can be layered on top of the base texts.
3. Mobile computing. While just about every knowledge worker now carries or will soon be carrying a smartphone, downloading apps and engaging on the run with email, photo-sharing, and texting, most scholarly communication is rooted in Web 0.5, where a fixed text is created in a PDF, mounted on a Web server, downloaded to a personal computer, and then printed out to be read. Recently I sat through a presentation on a survey of librarians about their preferences for ebooks, which included a desire for digital texts to be downloadable, not simply streamed. This reveals an urge to hold onto the early PC (Web 0.5) model even as Cloud computing is emerging as the dominant paradigm. This is the problem with surveys: they have difficulty looking forward. Publishers have to begin to assume that the primary mode of information consumption will be through a mobile device.
4. Sensor publishing. With the growth of the mobile phone industry, more and more sensors are being built into mobile devices, tracking movement, geography, and, soon, much, much more: heat, light, ambient sound, radiation, etc. Data is then aggregated across a wide user base (automated crowdsourcing), analyzed for patterns, and then published as reports. Thus a high-schooler in Peoria becomes a host for a data-collection device that feeds research activity in Cambridge, which then is packaged and published in Santa Clara. How many publishers are in discussions with AT&T, Verizon, and Sprint to develop passive data-collection services for sensor publishing?
5. Text mining. While the development of massive databases is unlikely to lead to more human consumption of text, machines are a different matter; their appetites are unlimited. Text mining will evolve in stages, beginning with seeking patterns in preexisting text, but later creating new publishing opportunities as texts are developed precisely to allow machines to manipulate them. If I were a young fellow with an entrepreneurial itch, this is where I would place my bet. There is an inexorable progression: from regional markets to national, from national to global, and from the global human market to the market among machines.
6. The face-down publishing paradigm. This is the connector between mobile computing (viewed face down, as you look at a screen) and back-end data-analytic services. Currently publishers are looking at new devices as a way to display the same texts that were previously distributed for print. But with mobile devices, ubiquitous broadband, and Cloud services, texts can become dynamic and be linked by geolocation. All publishing will move in this direction eventually, but the infrastructure is only now being put in place. Start now with a small, focused project to learn the implications of what could be mainstream within the working lifetimes of most industry managers today.
7. Direct-to-consumer marketing of subscription services. As noted above, publishers increasingly will be developing means to sell directly to end-users. Part of the reason for this is the need to work around the problems with library budgets, but another aspect is the growing need to get end-user usage data in real time, which working through intermediaries (libraries, ProQuest, Project Muse, Amazon) does not permit. Such data can be used to develop new marketing programs and to influence editorial programs. A corollary to this is that publishers will have to develop their own Cloud infrastructure to manage these services or find reliable third parties to provide them: the next generation of Highwires, Atypons, and Silverchairs.
8. Patron-driven acquisition (PDA). As libraries begin to turn over some part of their materials budgets to PDA programs, there will be a gradual restructuring of the overall supply chain. The implications for publishers are not entirely clear at this time (and are not likely to be entirely negative, as many suppose), but PDA is no longer a theoretical topic. Today’s partners in the value chain may become increasingly irrelevant.
9. Artisan publishing. The key issues for the publishing industry are not only about technology but also about enduring themes of editorial selection, craftsmanship, and exquisite communications among small groups of fellow travellers. In the campy sci-fi TV series “Battlestar Galactica,” crew members on a spaceship express their feelings for one another by giving gifts of printed books. Yes, time travel, human-like robots, and the world of Gutenberg and Maxwell Perkins intermixed. For every new and vaunted technological service, there will be a dozen small projects, led by one or two individuals, that will work in the publishing equivalent of French-pressed coffee and flowers in a cafe arranged just so. This is a growth opportunity for all of us.
As I wrote this, other “key” issues sprang to mind, but how central could they be if their sheer number militated against effective action? (I stopped at nine because I did not want the rhetorical effect of “The Ten Key Issues for Scholarly Publishing.”) We need both more ideas and fewer. I will be reading the comments to this post carefully to find out who really is in charge.
Discussion
29 Thoughts on "Setting the Agenda: Key Issues for Scholarly Publishing"
Most of this list seems to be gizmo related, not business related. I would put author pays at the top, as the biggest business issue listed. But selling articles per se (not just PDA) may be the biggest issue of all.
On the gizmo side I think full text analysis (not just search) is far more important than metadata, especially with question answering and machine reading coming on. People read scholarly works mostly either to understand something or to answer questions. Both come from the text, not the metadata, and computers are sneaking up on providing both.
Mobile is important, but it is occupying a new place (literally), not replacing the desktop. Mobile has its own uses, most of which may not be scholarly. The scholarly uses are likely to be in the question-answering vein.
Hope this helps.
University presses face a special challenge to the extent that they publish in different sectors: hard-core academic, regional trade, fiction and poetry, etc. Their strategies will need to be diverse if they wish to keep playing in all these sectors. But should they, given the added costs of using a variety of business models? (Even within one sector, different business models sometimes apply, as with art history within the academic sector, which has its own unique problems.) Presses diversified partly as they reinterpreted their missions more broadly but also partly because economic pressures from declining library sales compelled them to seek out different markets (like trade, through retail bookstores). But the changes in the retail marketplace might compel presses to reexamine their strategies again as general trade publishing becomes even more of a challenge for smaller publishers.

On the other hand, for some sectors like regional trade, the new technologies might provide greater opportunities for sales. I think particularly of how Foursquare might help presses with regional lists take advantage of location-related marketing to enhance direct-to-consumer sales. Better direct-to-consumer marketing may also help turn PDA from a potential loser into a potential gainer, as early alerts to scholars on campus could turn into PDA-driven sales. As for mobiles, I think screen size is a real constraint for some fields where, e.g., complicated tables are common. Thus I believe tablet devices have a better chance of succeeding for many scholarly books and journals in, say, the social sciences than do smartphones. Finally, I agree with Don and Joe that quality will remain key. There is some reason to worry about that even for scholarly publishing, as more presses seem to be lowering standards for copyediting and proofreading, as I have discovered in reviewing several books recently.
Fantastic post, Joe; thanks too for pointing out Don Linn’s.
I’d like to point out an issue that I think underlies most, if not all, of these issues: granularity. Chunks. The smaller the better in most (but not all) cases. My normal mantra is “appropriate chunking.” Here’s why I think this is relevant to your nine issues:
1. Metadata. Publishers currently tend to think about metadata at the book or article level. In many cases, that’s all that’s needed. But especially in books, and especially when you want to drive into content to get to the _chunk_ that is relevant (for a user, for repurposing, etc.), it becomes helpful to have _some_ kinds of metadata at a deeper level. One of the best examples of this that I’m aware of is the New England Journal of Medicine. They have metadata all the way down to the paragraph level. Most publishers can’t afford that and don’t yet need to do that. But when appropriate, metadata at a granular level is VERY powerful.
2. Author pays. This is a bit more of a stretch, but I will point out that one of the issues for book publishers, especially publishers of textbooks, contributed books, and reference books, is the issue of rights and royalties associated with disaggregating that content. (See Linn’s “Rights, rights, rights” issue.) The same issues are an obstacle to making author-pays work in those environments. Like I said, a bit of a stretch.
3. Mobile computing. The value of granular chunking is obvious.
4. Sensor publishing. Ditto. The most obvious and currently often used example is GPS/geospatial tagging. This works best at a granular level in the content — in fact even at the “phrase level,” or “entity level.”
5. Text mining. Particularly in book and reference content (not as much in journals), the user doesn’t want the whole work; text mining often needs to find the _portions_ of the work that are relevant (even if that is just a chapter of a book or an entry in an encyclopedia).
6. Face-down publishing. Again, what the user usually needs is a chunk — the _right_ chunk. Not the whole enchilada.
7. Direct-to-consumer marketing of subscription services. I admit this one may be a bit of a stretch too. Perhaps not as relevant to journals, but perhaps to books and reference content. Especially if what the consumer is subscribing to is _information_, not books or articles.
8. Patron-driven acquisition. Here I’d argue that the “appropriate chunks” are the articles and books, as opposed to journals or bundles.
9. Artisan publishing. Chunking is not as relevant, but what _is_ relevant is the liberation from the size/quantity expectations/limitations of typical publishing modes, especially books. This is going to be done a lot with content that is not traditionally book-length, for example.
Okay, I may have stretched a bit to include all nine. But I think granularity is really key — along with its corollary, that publishers need to think first about the content (sorry, Mr. Linn) and then the products, rather than starting with a product focus that often misses the value of the chunks.
Agreeing with Sandy above, one of our many strengths that contributes without discrimination to our challenges is the diversity of our offerings. B-school strategy-types might say we have “poor internal fit.” Various offerings’ working solutions (not to mention experimental ones) can drain resources in different, often non-complementary, and sometimes downright conflicting directions, killing five birds with ten stones. Caution is due; but, I presume, taken as read.
Direct-to-consumer marketing [of all] combined with collaboration is the hands-down favorite (IMHO); here economies of scale will allow us to slay flocks of birds with a pebble.
Also, WOW, what a brilliant post–thank you!
Chunking, granularity, yes, absolutely right – and it’s all about the metadata. We chunk many of our books down to the chapter and then chunk out the tables and charts inside the chapters too. Then we add translations of the summary chapter (sometimes as many as 20) so we can reach those who are searching in languages other than English. Result: downloads are rocketing. We have one book that has been chunked into 120 pieces – each with detailed metadata. (With all this chunking going on, we’re beginning to refer to our books as portals.) The results are interesting: downloads of chunks are running ahead of complete e-books. Now we’re beginning to analyse why some chunks get downloaded more than others, which will give our editors useful evidence to shape how our authors prepare their next books. Do I see a virtuous circle taking shape . . . ?
Thanks for these terrific examples! I love the concept of “books as portals.” I promise to steal that. 😉 Seriously, I’m really glad to know of these real-world examples of how this is working for you. That’s exactly how it _should_ work. “Downloads are rocketing.” Pay attention, folks! And I also commend your multilingual strategy, and your realization that one often ignored benefit of this is the _insight_ it gives you as to what your market is interested in. It’s amazing how few people get that. Keep up the good work, and welcome to the Metadata Millennium!
I’d like to add one if I may. Personalisation/Profiling/Privacy. I’ve bundled them together as I think they are reflections of each other in many ways. I realise that there are a multitude of issues surrounding the ‘database of intentions’ as it pertains to mining users’ clickstreams and indeed other data trails (1, 2, 3, 4, 6, and 7 above all interface with this, especially 3 and 4). However, businesses out there have monetized this information and done so very successfully. Google’s personalisation of search results and Amazon’s reconfiguring of their retail site are just two examples. Collectively, our information workflow is missing a big trick here and there needs to be a debate about this.
Okay, so “metadata” or “chunks” or “grains” allows better searching, so (in theory) more people find your stuff. Are they citing it? Or just going “oops, no thanks” and moving on? I’m at a fairly specialized journal, so perhaps I just don’t get the value of this; certainly I see people wanting to take yet another chunk of my revenue stream, which is already fragile, eh?
Rachel,
Chunking is about cutting through the clutter and delivering exactly what a researcher is looking for – relevant and citeable paragraphs. This strategy can provide a significant boost to article citations.
Think about the citation decision process. When does a researcher decide that they should cite an article? Probably after reading a paragraph within the article that jumps out at them as being highly relevant to their work.
Now, consider all the researchers who open an article to quickly scan the text for relevance, but then give up and click the back button before getting to that key paragraph.
“Chunking” of content – i.e., the addition of concept-level metadata – will allow article discovery systems to pick out exactly the paragraph(s) that will demonstrate to a researcher an article’s relevance.
Some criticisms of this strategy include the threat of introducing yet another “filter bubble” (http://on.ted.com/9NFt) and the extreme technical challenge of predicting which paragraphs will be of most interest to an article’s readers.
I believe that there will be a gradual shift in the way that scientists prepare their articles resulting in a substantial increase in presence of concept-level metadata. In the world of online research, more metadata means better targeting means more citations – authors (and publishers) that decide to put in the effort will benefit from a new research and discovery paradigm.
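To make the idea concrete, here is a minimal sketch of what concept-level, paragraph-scoped metadata could look like in an article’s XML. Every element and attribute name below is a hypothetical placeholder, not drawn from NEJM’s schema or any existing standard:

    <!-- Hypothetical markup: one paragraph carrying its own concept-level
         metadata so a discovery system can surface just this chunk.
         All names are illustrative placeholders. -->
    <para id="p-0042">
      <para-metadata>
        <concept scheme="example-subject-taxonomy">troponin assays</concept>
        <concept scheme="example-subject-taxonomy">acute myocardial infarction</concept>
      </para-metadata>
      <text>The paragraph text itself is unchanged; only the metadata
      wrapper around it is new.</text>
    </para>

A discovery layer could then match a query against the paragraph-level concepts and return that chunk, with a pointer back to the full article, rather than asking the reader to scan the whole text.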
This is interesting and brain-stretching…and could be such a new idea (for me anyway) that I just don’t “see” it yet… yet, isn’t helping-with-research an expert librarian’s job, not the publisher’s? I’m just saying, who will pay for all the nifty links (or whatevers)? Maybe the authors can “embed” or “code” or whatever some section (sections?) of their paper and it moves with no effort downstream into layout and so on. We could all save time. , …. {{grins}}. Thanks for the Ted link, I’ll click it!
oh my joke didn’t translate just , … ! I’ll try this {{blather part of paper}}{{end blather}} {{nonsense stats}}{{end nonsense stats}}….
You meant this as a joke, but this is exactly what I expect authors will begin to do (and already do to some degree).
We see mathematicians add theorem, equation, lemma, example, etc. “tags” to their text, but what about tags that only appear in the metadata that describe the rhetorical significance of a particular piece of text?
Consider the addition of … tags or … or even more valuable …, and yes even …
Someone should really develop an XML standard for this type of scientific markup.
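Since the specific tag names in the comment above did not survive formatting, here is a purely hypothetical sketch of what such rhetorical markup might look like; none of these element or attribute names come from an existing standard:

    <!-- Hypothetical rhetorical-role markup, carried as metadata rather
         than as rendered text; every name here is an invented placeholder. -->
    <sec id="results">
      <p rhetorical-role="novel-result">...</p>
      <p rhetorical-role="supporting-evidence" cites="ref-12">...</p>
      <p rhetorical-role="limitation">...</p>
      <p rhetorical-role="background">...</p>
    </sec>

A discovery or citation tool could then filter or rank paragraphs by role, for example surfacing only the passages tagged as novel results.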
Rachel, I’ll take the liberty of replying since I initially brought up the chunking issue that Nathan was fleshing out. First of all, you need to separate the concept of “chunking” from “semantic tagging.” They’re two separate but related things. There are lots of good reasons to clearly delineate meaningful subsections of content; ONE of those reasons is so that you can associate semantic information at a granular level. The latter can be done in all sorts of ways. I mentioned the New England Journal of Medicine; they use a very rich, well managed semantic technology and do that semantic tagging in a very sophisticated way. At the other end of the scale is simply attaching keywords to chunks as metadata. It all depends on what you’re publishing and what makes sense (and adds value) for you. You are coming at this from a journal perspective; it’s true that for most journal content, people just want to get the whole article and there is very little slicing-and-dicing and recombining of chunks. So while the benefits Nathan cited are still there — getting the user to exactly the portion of the content she’s looking for — in other contexts, like a lot of books, there are many additional reasons for granular chunking and semantic tagging. (The problem is especially apparent in books that only have metadata at the title level — not even chapters.) That’s why I stressed “appropriate chunking,” or what in another context I referred to as “thoughtful chunking.” The same goes for appropriate semantics, thoughtful semantics. Sure, the more granular the better and the richer the semantics the better; but each publisher has to make a judgment about what level is appropriate and what is the most practical way to get that accomplished. If you engineer your workflow well, you can get a lot of this as a byproduct (especially the granular chunking, information that publishers already have upstream in their workflows but mostly throw away when the content is published), but that’s another discussion.
Let’s not let the point drop that people in this exchange are quietly talking about metadata assigned granularly, down at least to the paragraph level. I heard a presentation at AAUP this year in which publishers boasted about tagging at the chapter level. That’s not deep enough. It’s not just a matter of what unit you can sell but also of the discovery process, where the more granular, the better.
Isn’t a good index to a book a form of semantic tagging? Why cannot indexes perform this function in digital texts as well?
An index to a book lacks a few things that semantics require to work — standardization (indexes can be idiosyncratic and variable, so they don’t match up one to the other reliably) and the network effect (my book’s index doesn’t connect to your book’s index in real-time to make something greater than the sum of its parts). Indexes (or indices, if you prefer) definitely employ some of the same approaches, and some of the founders of semantic offerings got their inspiration from editing indices together back in the day, but I’d not say an index is a form of semantic tagging, for the reasons above.
An index is only suitable as a source for semantic tags if the granularity of your “thoughtful chunks” is at the page level.
Sandy’s point is a really good one, though. It drives me crazy that people throw out indexes (that is, back-of-the-book indexes, ideally done by a professional indexer). Those are one kind of “intellectual guide” to the content and they’re extremely valuable. People who think they’re obsolete because of search don’t understand how an indexer (and an index) works. While this doesn’t actually get at the ideal granular metadata that Nathan is talking about (and for which I used NEJM as an example), it’s still a valuable semantic resource that is a shame to waste. The problem, as Nathan points out, is that the explicit references are to page numbers. (Note: also, don’t throw away page numbers! Another different and big discussion for another day, for which I will drop a few hints: citations! cross references! indexes!) But that’s just an artifact of print; the index entries are _actually_ pointing to content. It is actually possible to make this work; we did it when we produced my Columbia Guide to Digital Publishing way back in 2002: the underlying XML of the index actually does point back to the _point_ or _range_ in the XML of the text that is being referred to. This is virtually never done, I admit; but that doesn’t mean it can’t be. Also, for publications that are highly structured, like reference books or books that have numbered sections and subsections, indexes often point to the actual structure, not the page numbers. Sorry to go on a bit but this is a hot button for me. I love indexes, I respect indexers, and I think as workflows get more and more sophisticated we’ll start to see, once again, how valuable they are. This is not to contradict Nathan’s comment, though; semantic tagging of granular chunks gives you a different way of semantically navigating the text than the index does. I think both are valuable.
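For readers who have not seen this done, here is a rough sketch of the general idea; the element names are hypothetical and are not the actual markup of the Columbia Guide:

    <!-- Hypothetical index markup: each locator points to an element ID,
         or a range of IDs, in the body XML rather than to a page number. -->
    <index-entry>
      <term>metadata, granular</term>
      <locator ref="ch04-p012"/>
      <locator ref-start="ch07-p003" ref-end="ch07-p009"/>
      <see-also term="semantic tagging"/>
    </index-entry>

In a digital rendering the locators resolve to live links into the text; for print they can still be flattened into page numbers.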
Also, to build on Kent’s point, an index doesn’t solve the issue of disambiguation. Consider tagging a page with the phrase “normal form” based on its presence in the index. This phrase has a different meaning in at least four distinct disciplines. From a relevant-results perspective, this is a problem.
One more clarification on this: semantic tagging is best when it uses a very consistent vocabulary (ideally governed by a taxonomy or ontology, with a thesaurus) applied in a granular and consistent fashion to the content, and in the same way across lots of related content. An index is a different, but also very useful, thing: it is an INTELLECTUAL GUIDE to the content of a particular work, where the indexer has thought, in a different way than “tagging,” what the content is about, what things relate to what in what way, with a sense of proportion and appropriateness, and is actually _selective_, that is, thinking about, if you were a reader, _which_ parts are you actually looking for that mention, e.g., XML (to pull a topic out of the blue), rather than _any or all_ parts that mention XML. The latter is a good thing too; that’s what semantic tagging is for. But I want to take a stand for indexers. (Separate concern: yes, what they do is usually confined to a single title; it’s a guide to _that book_, specifically.) I’m a huge advocate of semantic tagging; I just hate to see indexes dismissed, as they often are. It’s not an either/or. These are both important but distinct types of guides to the content. (A third one is a good, detailed, annotated TOC. Also not the same as the other two. Also very useful, for certain content.)
A comment on the “Author-Pays” business model: In the sphere of scholarly publishing this model differs significantly from its trade counterpart because of the way that scientific research is funded.
Unlike an upstart novelist, a research scientist (in many fields of study) will receive funding from a government organization or research institution to carry out their research. A typical requirement for receiving such funding is to have the resulting research published in a peer reviewed journal.
Government funding organizations, in particular, are keen on having the research that they support become publicly available. This funding model is incredibly conducive to the success of the open access model because the cost of article processing can be included in a research grant proposal. In the funded disciplines (medicine, mathematics, and the physical sciences), it is only in very special cases that the author actually pays.
Even in disciplines where funding opportunities are absent there are ways of removing the burden of payment from the author.
By establishing prepay deals with research organizations, publishers can reduce or eliminate the requirement for author contributions from the universities that have prepaid.
Under open access, the money that keeps the publisher in business comes from the same sources as in the traditional publishing model; it is just delivered before an article is published instead of being paid afterward to provide researchers with access.
As library acquisition budgets shrink and the number of articles and chapters that the academic publishers are offering subscriptions to increases, the open access publishing model offers a way for publishers to grow by pulling revenue from research budgets instead of from libraries.
(Disclosure: I am a Product Development Manager at Springer Science+Business Media – BioMed Central is a subsidiary of Springer)
I have argued elsewhere that open access may work for scholarly publishing in ways it may or may not work for trade publishing. See my “Back to the Future” essay here (at the bottom): http://www.psupress.org/news/SandyThatchersWritings.html
1. I hope you reconsider and go ahead and complete your list of distractors. Doing so would have the positive value of making people stop and re-assess.
2. I like the post, but really it should be called ‘Key new issues’, since there are an awful lot of long-established issues (say, how do you get expert authors to write accessibly?) that persist – it’s just that they’re not novel and don’t get airtime.
Not to be churlish, but I regard semantic tagging as another utopian vision that requires a lot of work, standardization, and expense with no corresponding payoff. In addition, information is not that simple.
On the other hand, for people into tagging, my major client has a free taxonomy/thesaurus of the physical sciences that is pretty powerful:
http://www.osti.gov/taxonomy/.
It combines two semantic structures. The core is a thesaurus with 20,000 terms and 200,000 term-term (RT) relations, plus 3500 mini taxonomies (BT-NT sets) among the terms. Rising above this hummer is a unique taxonomy of science that provides multiple paths down to each of the thesaurus terms. I think it is one of the most complex semantic structures in the world and I call it the Word Web of Science.
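As a purely illustrative rendering (not OSTI’s actual file format), a single record in a thesaurus of this kind might carry its relations roughly like this:

    <!-- Illustrative thesaurus record: one preferred term with its broader
         (BT), narrower (NT), and related (RT) term links. Not the actual
         OSTI format; names and IDs are placeholders. -->
    <term id="t-001234">
      <preferred-label>Neutron scattering</preferred-label>
      <broader ref="t-000560"/>   <!-- BT: parent concept in a mini taxonomy -->
      <narrower ref="t-002345"/>  <!-- NT: a more specific technique -->
      <related ref="t-003456"/>   <!-- RT: one of the term-term relations -->
    </term>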
OSTI just uses it to help people manually pick search terms. Anyone interested in it can contact me at dwojick@hughes.net. It is in the public domain.