
Given all the attention from mainstream media and the blogosphere, one would think that the publishing world revolved around trade books and that ebook readers, such as Amazon’s Kindle, are as ubiquitous as teenage girls at the latest “Twilight” movie. As the attendees at the recent SSP Digital Opportunities and Challenges Seminar learned, however, the trade book industry’s foray into the ebook market trails the professional and scholarly publishing (PSP) ebook market by a wide margin—and there is no evidence that will change in the foreseeable future.

Al Greco, Professor of Marketing at Fordham University’s Graduate School of Business Administration, presented data he has compiled on the US ebook market to seminar attendees. These data are drawn from a variety of sources, including the US Department of Commerce, the US Department of Education, the Bureau of Labor Statistics, the US Department of Treasury, the Federal Reserve Bank, the Congressional Budget Office, and the National Bureau of Economic Research. Professor Greco’s data are must-read information for anyone interested in the ebook market, and are the only data I have seen that break out the US market by industry segment.

According to Greco, book publishing (print and electronic) in the US is a $35 billion industry. This year, he forecasts that ebooks will account for 5% of that revenue, or $1.76 billion. Of that $1.76 billion, trade books account for 8.6%, or $151 million; K-12 accounts for 8.1% ($143 million); higher education accounts for 6.9% ($122 million); and university presses account for 0.4% ($7.7 million). Professional and scholarly publishing titles represent 75.9% of the US ebook market, or $1.33 billion.

In other words, professional and scholarly ebooks account for more than three times the rest of the US ebook market combined.
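For readers who want to check the arithmetic, here is a minimal sketch in Python. The percentages and the $1.76 billion total are the figures quoted above; the dictionary layout and function name are my own illustration, not anything from Greco's model. Because the quoted shares are rounded, the smallest segments will not reproduce the quoted dollar amounts exactly.

```python
# Shares and total are the figures quoted in this post; names are illustrative only.
TOTAL_EBOOK_REVENUE_M = 1760  # forecast US ebook revenue, in millions of dollars

SEGMENT_SHARE_PCT = {
    "professional_scholarly": 75.9,
    "trade": 8.6,
    "k12": 8.1,
    "higher_ed": 6.9,
    "university_press": 0.4,
}


def segment_revenue_m(share_pct, total_m=TOTAL_EBOOK_REVENUE_M):
    """Revenue in millions of dollars for a segment holding share_pct of the market."""
    return total_m * share_pct / 100


# Compare the professional and scholarly share with every other segment combined.
psp_share = SEGMENT_SHARE_PCT["professional_scholarly"]
rest_share = sum(
    share for name, share in SEGMENT_SHARE_PCT.items()
    if name != "professional_scholarly"
)
psp_to_rest = psp_share / rest_share

for name, share in SEGMENT_SHARE_PCT.items():
    print(f"{name}: ${segment_revenue_m(share):,.0f}M")
print(f"PSP vs. everything else: {psp_to_rest:.2f}x")
```

The ratio comes out a little above three, which is where the "more than three times" claim comes from.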

With the caveat that predictions are very difficult in today’s economic climate, with forecasts becoming increasingly unreliable the further out they go, Greco provided forecasts for the ebook market through 2013. Despite this caveat, these forecasts are worth noting as he uses different drivers for each market segment. In other words, his financial model does not simply apply the same assumptions to the entire market, but considers the different factors that affect each industry segment. Some of the drivers Greco cited include:

  • The overall steady growth in economic resources in corporate and university sectors in the last 20 years to “buy/rent” scientific, technical, and medical (STM) as well as legal, tax, and regulatory (LTR) information.
  • The historical emphasis on “publish or perish” in universities and the concomitant need to publish more research in peer reviewed publications.
  • The demand for Internet-based information services.
  • The “branding” of professional and scholarly publishers and publications as the “coin of the realm” in professional and scholarly publishing.
  • Increases in the number of professionals who need digital access to STM-LTR content, including lawyers, hedge fund managers, private equity firms, investment bankers, and government employees.
  • The emergence of “inexpensive” computers (including desktops, laptops, and netbooks).
  • Growth in the number of faculty members and graduate students who want and need STM-LTR content.

Over the next four years, Greco predicts these drivers, among others, will result in the US market for professional and scholarly ebooks growing by 94% to $2.60 billion. During the same period, he forecasts that the trade book sector will undergo growth of 119% to $330 million. This would mean that scholarly and professional ebooks will continue to dominate the US market, accounting for 74.7% of ebook revenue through 2013. Even with growth of over 100%, trade books are only forecast to grow to 9.5% of total US ebook revenue.

Of course, as Niels Bohr said, “prediction is very difficult, especially about the future.” But even if Greco’s forecasts for the trade sector are dramatically short and trade ebook revenues grow by as much as 200% through 2013, trade books will still account for less than 20% of the ebook market.
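The "what if trade grows much faster" scenario can be sketched the same way. This is a rough illustration, not Greco's model: it assumes the non-trade segments land exactly where his 2013 forecast puts them, and it backs the 2013 market total out of the quoted trade figures ($330 million at 9.5% of revenue). The function name and constants are hypothetical.

```python
# Figures from the post, in millions of dollars.
# Holding the non-trade forecast fixed is an assumption made for this sketch.
TRADE_NOW_M = 151
TRADE_2013_FORECAST_M = 330
TRADE_2013_FORECAST_SHARE = 0.095


def trade_share(trade_growth_pct):
    """Trade share of 2013 ebook revenue if trade grows by trade_growth_pct
    while every other segment stays at its forecast level."""
    total_forecast_m = TRADE_2013_FORECAST_M / TRADE_2013_FORECAST_SHARE
    non_trade_m = total_forecast_m - TRADE_2013_FORECAST_M
    trade_m = TRADE_NOW_M * (1 + trade_growth_pct / 100)
    return trade_m / (trade_m + non_trade_m)


print(f"Forecast growth (119%): {trade_share(119):.1%}")
print(f"Optimistic growth (200%): {trade_share(200):.1%}")
```

Under these assumptions, even tripling trade ebook revenue leaves the trade share well under 20%.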

Which makes me wonder why the media is so fixated on the trade sector of the ebook market.



Last Thursday, Jonathan Eisen, an evolutionary biologist, Open Access advocate, and the first Academic Editor-in-Chief of PLoS Biology, lost his cool.

In a blog post aptly titled “For $&%# sake, Bentham Open Journals, leave me alone,” Eisen unleashed his fury on a publisher that has not let up on its “crappy spammy” email campaign to have him contribute to their journals.

Not only is Eisen perturbed by Bentham’s incessant requests, he takes issue with the publisher’s claims that publishing in open access journals necessarily leads to more readership and citations.

Yes, that is right, the crappiest, most boring, most idiotic article in an OA journal will receive “massive international exposure” and “high citations.”

Eisen doesn’t seem to be alone in his feelings about Bentham. On Thursday, a professor of mine received a solicitation to serve as Editor-in-Chief of The Open Communication Journal. For a professor in a department of communication, the first sentence should have been a clue that this publisher should hire a copy-editor. But if you read on, the financial ties between the new post and the publisher should raise some serious concerns about Bentham’s ability to separate editorial decision-making from its business model:

In recognition of your outstanding reputation and contribution in the field of Biology. We are pleased to propose your name as the Editor-in-Chief of “The Open Communication Journal”. After the selection your role as the journal’s Editor-in-Chief will be to solicit and submit a minimum number of ten manuscripts to the journal each year [...] For all the manuscripts that you submit to the journal, for the first ten that are published, we will pay you an annual royalty of 5% of all fees received on these manuscripts.

The editorial board boasts an astounding 169 names, with the expectation that board members will publish regularly in the journal. And to provide an incentive for their contributions, Bentham promises to waive their article processing fees:

We expect that Editor-in-Chief, Associate Editors, Co-Editors in an Open Access Journal will submit at least one article per year which will be published ABSOLUTELY FREE OF CHARGE. Beside, each and every submission from Editor-in-Chief will be published free of cost.

Earlier this year, the publisher’s acceptance of a completely nonsensical, computer-generated manuscript, along with its insistence that the manuscript had been peer-reviewed, led to the resignation of an Editor-in-Chief and members of several editorial boards. The publisher’s story went through several revisions, first denying that the journal had accepted the paper, then claiming it had only pretended to accept it in order to track down the perpetrator.

Author-pays open access publishing is still relatively new and uncertain in the minds of many scientists.  With certain publishers possibly giving OA a bad name, it is understandable why advocates like Jonathan Eisen would be prone to lose their cool.



Last week, the arXiv received a three-year, $883,000 grant from the National Science Foundation, thanks to federal stimulus money from the American Recovery and Reinvestment Act (ARRA).

According to the grant description, the project “proposes to investigate and implement a variety of tools for enhancing the very widely used and popular Arxiv.org infrastructure, based on information filters for assisted service discovery and selection, text-mining, information genealogy, automated classification and identification of composite resources, data-mining, usage analyses, matching and ranking heuristics, support for next-generation document formats, and semantic markup.”

In 2001, Paul Ginsparg, the creator of the e-print repository and principal investigator of the grant, brought the arXiv with him to Cornell University.  Since then, the arXiv has been managed and maintained by the Cornell University Library.

The grant will generate jobs for two graduate students and one half-time programmer.  Interviewed for the Cornell Chronicle, Ginsparg outlined why such improvements for the arXiv were necessary:

Academic publishing has lagged behind the commercial Internet in providing interactive enhancements that today’s students take for granted. Configuring research communications infrastructure for the next generation of researchers requires getting into the heads of near-term future researchers — undergrads and grad students — coming of age in the Google/Facebook/Twitter era.


Recently, I came across an interesting article thanks to Jill O’Neill’s Twitter stream. The author of the post, Lukas Koster, tried to assess whether an e-book is really a book. Now, while this might seem an academic question of little practical consequence, the fact of the matter is that for libraries, publishers, and authors, the questions raised by Koster are fundamental.

The essence of Koster’s essay deals with the “Functional Requirements for Bibliographic Records” (FRBR), limited to the Group 1 entities, which include books. He talks about the logical, phenomenological chain the Library of Congress uses to think through these issues:

  • Work - a distinct intellectual or artistic creation
  • Expression - the intellectual or artistic realization of a work
  • Manifestation - the physical embodiment of an expression of a work
  • Item - a single exemplar (or copy) of a manifestation

In this model, a “work” is the intellectual output that is revealed through expression and captured in some manifestation that is possessed as a single item — a copy of a book, let’s say.
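The hierarchy is easier to see as a nested data structure. What follows is a minimal sketch in Python, assuming a simple one-to-many relationship at each level; the class layout, field names, and sample values are hypothetical and are not taken from the FRBR specification or from Koster's post.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A single exemplar (copy) of a manifestation."""
    identifier: str


@dataclass
class Manifestation:
    """The physical embodiment of an expression."""
    form_of_carrier: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """The intellectual or artistic realization of a work."""
    form: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """A distinct intellectual or artistic creation."""
    title: str
    expressions: List[Expression] = field(default_factory=list)


def count_items(work: Work) -> int:
    """Count every individual copy reachable from a work."""
    return sum(
        len(manifestation.items)
        for expression in work.expressions
        for manifestation in expression.manifestations
    )


# Hypothetical example: one work, one expression, one print manifestation, two copies.
paperback = Manifestation("printed paperback", [Item("copy-001"), Item("copy-002")])
text = Expression("written text", [paperback])
novel = Work("Tom Sawyer", [text])
print(count_items(novel))
```

In print, the chain bottoms out neatly: each copy on a shelf is one `Item`. The trouble described next is that an e-book file has no equally obvious place to stop.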

But when contemplating how a manifestation becomes an item in the e-world, Koster runs aground, asking if perhaps the FRBR needs another level to accommodate the realities of e-publishing — that is, articles not captured in an issue, e-manifestations that are manipulable on the device level, and the proliferation of output standards and media.

For cataloging, selling, archiving, and tracking, the proliferation of manifestations into items that recurse into manifestations becomes problematic. My personal experience as an author with books available in print, PDF, EPUB, mobi, and many other formats has brought home how tough it is to manage all these different options, and also how different the experience can be for different readers. People enjoying “an absorbing mystery that is chock full of twisting plots and tantalizing clues” (hey, it’s an actual review, and I’m sick of Dan Brown outselling me!) are reading the same work realized through the same mode of expression (writing), and even the same manifestation (a novel), but they aren’t experiencing the same item. Did they read the book on the Kindle? The Kindle iPhone app? The Kindle desktop app? PDF on-screen? PDF printed two-up? PDF on a Sony Reader?

Ultimately, perhaps the answer isn’t to add bifurcations to the “item” end of the model, but to truncate the model after “manifestation.” But there are problems with this, as “manifestations” are described more precisely by:

  • form of carrier
  • extent of the carrier
  • physical medium
  • system requirements (electronic resource)
  • file characteristics (electronic resource)
  • mode of access (remote access electronic resource)
  • access address (remote access electronic resource)

As noted above, a Kindle book downloaded may be read on a Kindle device, an iPhone, or a computer — or any combination of the three over the course of reading the entire work.

Should we trim up the tree further? Simply stop at “expression”? In that case, you would have the expression of the work “Tom Sawyer,” with the FRBR silent from that point on. And that may be where we’re headed — toward a world that can’t presume items or manifestations, but only list expressions of works. Or perhaps we should evacuate some of the detail from “manifestation” in order to provide an appropriate silence on the issues involved.

Attempts to identify each item potentiality or each manifestation are becoming akin to debates about how many angels can dance on the head of a pin. In this case, however, it’s driven by a proliferation of items that can manipulate manifestations unpredictably.

In this realm, the angels certainly aren’t making it any easier.

Economists have compared the gross domestic product (GDP) to heat — it doesn’t measure the amount of money in a system, but rather the rate of spending and exchange. The higher the rate of exchange, the higher the GDP. Goods don’t have to be manufactured for the GDP to be positively affected.

As an example, consider a patient dying of a common disease. The physicians, nurses, and others caring for them; the equipment depleted and reordered; and the insurance, social, and financial services leveraged — all contribute positively to the GDP. They trigger spending.

Financial gurus like Alan Greenspan sought to avoid volatility in the economy by moderating spending and inflation — basically, keeping the room warm by adjusting the thermostat. A steady heat was the goal.

David Beckworth has an interesting post and graphic showing how this policy resulted in a fairly tight set of fluctuations (modest inflation and modest recessions), except for the 2008-09 downturn, which has put us in the GDP freezer.

It’s a reminder of how dramatic the economic freeze has been, in both severity and speed. Companies, individuals, and states froze spending, driving us further down as money was treated not as temperature but as materials.

Stimulus spending may heat the room again. Let’s hope so.


We’ve all heard the pronouncements — the e-book revolution is here, boring old books made up of words will soon be replaced with exciting, radical digital artforms. In an article about the rumored Apple Tablet, Gizmodo’s Brian Lam says we need to “create hybridized content that draws from audio, video, and interactive graphics.” The always entertaining Fake Steve Jobs wrote a piece that was insightful in many ways (covered by Kent here), but which called for content that “incorporates dynamic elements (audio, video) with static elements (text, photos) plus the ability for the audience to become content creators, not just content consumers.”

Is it just me, or does this all sound kind of familiar?

Flash back to the early 1990s, when an exciting new technology called “CD-ROM” or “multimedia” arrived on the scene, with the promise of tearing down the publishing industry. CD-ROMs could contain hybridized content that drew from audio, video, and interactive graphics. They could incorporate dynamic elements with static elements as well.

How’d that work out?

The CD-ROM had a brief, shining moment in the sun, but was rapidly replaced by the Internet, which proved to be a much more flexible way of distributing information and melding different types of content.

Take a look at the exciting new Vook offerings. This is innovation? Are we just chasing the same multimedia dead ends? If the CD-ROM was beaten out by the early, primitive Internet, why would recreating a CD-ROM in a fancy new package work now? If a return to the CD-ROM is indeed the future, then what happens to e-ink based e-readers like the Kindle or the Sony Reader? These can barely show black and white pictures, so video is a non-starter. Barnes & Noble attempts to get around some of these limitations with a clumsy kludge of having segregated screens on their device, not quite the imagined integration that creates a new form.

Just as the smarter content companies of the 1990s focused on the web instead of the CD-ROM, many of the top digital players, such as Google, Apple, and O’Reilly, seem to be ignoring e-ink and its various formats and instead focusing on XML and HTML.

Meet the hot new e-book device: it’s called “Internet Explorer,” and the revolution will result in an exciting new product called a “web page.”

The problem with declaring the book “dead” is that it assumes a zero-sum game — one art form must disappear if another is to appear. I’m not sure this is the case. The birth of MTV and the music video hasn’t made the CD disappear; both seem to coexist and even synergize. While the web has had immeasurable impact on so many aspects of the way we live, one thing it hasn’t conquered is long-form reading, particularly for fiction. The web can be great for non-fiction, for textbooks, manuals, and the like, and it’s very easy to see these sorts of books going exclusively digital. Most scholarly publishers have been working on this transition for the last 10-15 years, moving journals online, and building websites around textbooks. We’re already generating video and audio content, spectacular visualizations of data and interactive animations.

But the transition is not as simple for other types of books. I’ve tried to read novels online, but barely made it through a few chapters.

Most readers have trouble finishing my longwinded blog entries — imagine if I went to novel length!

For many authors, the whole point of putting their novels up on web pages is that it encourages readers to sample them, then buy the print version for a full reading. The novel is a highly evolved form, one that’s extremely good at delivering content in an effective manner. That’s why, despite the limitations, e-ink devices like the Kindle work so well. They’re a digital recreation of an efficient analog form, and perhaps nothing further is needed.

I’m not convinced that Moby Dick is going to be improved by being interrupted by videos of whales, or a background soundtrack of sea shanties.

Yes, the market for books is going to change, although it’s more of a continuation of a change that’s been happening for a long time. Fewer people read books, and that’s likely a trend that will continue. Eoin Purcell thinks print will continue to thrive.

My colleague Richard Sever likes the analogy of live theater, which is certainly less at the center of the cultural world than it once was, but has its enthusiasts. The average person may go to the theater on a special occasion, maybe once every year or two. Now, instead of taking the Mrs. to see a play for your anniversary, perhaps you’ll buy her the new Harry Potter novel (or whatever is popular at the time).

That said, I do expect to see new forms, new ways of telling stories, emerge from new technologies. I don’t mean CD-ROM-like content. Too many pundits are so limited in their ideas that they are merely insisting on turning books into other already existing forms rather than creating something new. Add enough video to a book and it becomes a movie. Add enough interaction to a book and it becomes a videogame (scratch that bit about Internet Explorer above; meet the new e-book reader, “the Gameboy”).

Mike Cane hits on some interesting directions here, where he discusses the graphic novel Watchmen and the television series The Singing Detective. Both are groundbreaking, tour-de-force works where the medium being used is an important part of the storytelling. It’s definitely food for thought, though creators like Alan Moore and Dennis Potter are singular visionaries. It’s hard to imagine an entire industry of creators living up to their creative standards and incorporating form in such an imaginative way.

Another valuable suggestion comes from Robin Sloan, that a new format will arise based around events. This is intriguing on many levels—the end product can be enjoyed simply as the final work, or the entire experience can be participatory on many levels. You can readily imagine how a scientific meeting or course could be presented in this manner, and evolve into a new type of textbook or lab manual.

One further thought—as “mobile” becomes the current buzzword, expect to see a return of a form that has languished in recent years, the short story. If long-form reading is hard to do on a web-based browser, shorter fiction seems perfect for filling in those train rides or waiting room delays. We’re already seeing some entertaining attempts at this, like Electric Literature.

Take all that with a grain of salt. If history is any measure, it’s not yet possible to predict what new forms will arise. Twitter started off as a means of sending out status reports, but its users found better and more interesting things to do with it. Google has released Wave, and judging from the comments on this posting, users have already discarded Google’s intent for the tool and are instead finding completely different ways to use it.

With such uncertainty, the only sure path for a publisher is to remain open, flexible, and aware. Books will be around for a while yet, and abandoning a currently profitable medium is premature. Rather than seeing this as the death of something, publishers should view it as the potential birth of something new, and the flexible publishing house will incorporate these new media alongside traditional forms.


Informal peer-to-peer sharing of scientific articles is common for researchers in developing countries, a new study suggests.

The article, “Access to scientific literature in India,” appears in the December issue of the Journal of the American Society for Information Science & Technology. Its author, Patrick Gaulé, is a post-doctoral student at the MIT Sloan School of Management. An earlier draft of his manuscript is freely available.

Reporting on several related research studies, Gaulé combines a massive bibliometric citation analysis with a survey of Indian researchers on their article sharing behaviors.

Analyzing 1.27 million citations to over 45,000 articles published in 2007, Gaulé compared the citation behavior of Indian researchers with that of Swiss researchers. On average, the reference lists of Indian papers were 6% shorter than those of their Swiss counterparts when published in the same journal. This translates to about two fewer citations. The reference length effect was more pronounced in the life sciences (9% decrease for biology, 11% for medicine) than in physics, engineering, or chemistry.

In addition, Indian researchers included about 50% more citations to open access journals than Swiss researchers, although this translated to just a small fraction (0.16) of one citation.
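To put these percentages in perspective, it helps to back out the baselines they imply. The sketch below is back-of-the-envelope arithmetic on the rounded figures quoted above; the resulting baselines are my inference, not numbers reported by Gaulé, and the function and variable names are hypothetical.

```python
# Rounded figures quoted in this post; the derived baselines are inferred, not reported.
SHORTFALL_FRACTION = 0.06      # Indian reference lists were 6% shorter
SHORTFALL_CITATIONS = 2        # "about two fewer citations"
OA_INCREASE_FRACTION = 0.50    # about 50% more citations to open access journals
OA_INCREASE_CITATIONS = 0.16   # the same difference in absolute terms


def implied_baseline(absolute_difference, relative_difference):
    """Baseline value for which relative_difference equals absolute_difference."""
    return absolute_difference / relative_difference


# Implied length of a typical Swiss reference list.
swiss_reference_list_length = implied_baseline(SHORTFALL_CITATIONS, SHORTFALL_FRACTION)

# Implied number of open access citations in a typical Swiss paper.
swiss_oa_citations_per_paper = implied_baseline(OA_INCREASE_CITATIONS, OA_INCREASE_FRACTION)

print(f"Implied Swiss reference list: about {swiss_reference_list_length:.0f} references")
print(f"Implied Swiss OA citations per paper: about {swiss_oa_citations_per_paper:.2f}")
```

On these rough numbers, a large relative difference in open access citations sits on a very small base, which is why it amounts to only a fraction of one citation.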

In a similar study published earlier this year, Tove Frandsen reported that authors in developing countries were no more likely to cite open access journals, although her limited sample size (150 biology journals) did not permit her to detect small differences in her data.  Evans and Reimer’s 2009 paper in Science analyzed 26 million articles appearing in over 8,000 journals and thus was able to detect small but statistically significant differences.

Gaulé is cautious when attempting to interpret the differences in reference length, as many citations listed in a paper are perfunctory; that is, they are not necessary to understand the meaning of the paper but serve to acknowledge that other general work has been done in the field. Many authors understand this form of citation as “hand-waving.”

Assessing whether differences in citing behavior reflect a severe problem is difficult. Do missing references really imply missing knowledge?

Even the best Indian research library has sub-optimal access to the scientific journal literature.  The Indian Institute of Science, for example, lacks access to one-third of the top biology journals.  Gaulé was interested in how Indian researchers cope with their situation.

The answer? File-sharing.

Indian researchers routinely send requests to corresponding authors and peers for copies of articles, Gaulé reports.  Some Indian researchers responded that they obtained articles from former students now doing research in the United States and Europe.  Most requests for copies were honored, and the strong sharing ethos in science may help attenuate the effects of subscription access barriers.  Gaulé writes:

Thus, in practice, the importance of openness as a norm of science lessens the effect of restrictions imposed by publishers on access to the literature. It could be that the prevalence of informal information sharing is increasing over time, thanks to the generalization of new technologies facilitating information exchanges

Still, having to rely on authors and peers to supply one’s information needs may not be an optimal way to conduct science.  Programs like HINARI, AGORA, and OARE may help alleviate access problems in some of the world’s poorest nations, although countries like India fall above the cut-off for eligibility. Gaulé concludes:

In the long run, having all scientific publications freely available to the world from the day of publication may be a desirable goal. In the short run, however, it is more important to make scientific publications freely available for developing countries because this is where the problem really lies.


The widespread availability of digital video recorders (DVRs) in the home has changed television viewing habits for millions. In the US, more than 33% of homes have one. Now, instead of using VCRs and tapes to get grainy quality copies of shows for later viewing, DVR owners can get digitally pristine copies of their favorite shows, set the machine to record every episode of a favorite show, and delete shows immediately after watching them.

But, of course, the most interesting part of the DVR is the ability to easily fast-forward through ads.

For me, this has been a godsend, shaving easily 1-2 hours per week from my television time. In addition, I can skip some really stupid ads.

Initially, television advertisers dreaded the coming of the DVR. But, time after time, they’ve found that DVR use actually helps ratings and, oddly enough, might actually make people more attentive to advertising, according to the New York Times.

How can this be? Viewers now have in their homes and hands a tool that allows them to skip advertising with ease and get only the shows they want. How could those factors lead to more viewers and more exposure to advertising?

“The DVR was going to kill television,” said Andy Donchin, director of media investment for the ad agency Carat. “It hasn’t.”

The television experts were wrong at the outset, and now that things have turned out differently than expected, their interpretation continues to be wrong. Brad Adgate (really? Adgate? Yes, really.) of Horizon Media believes the basic couch potato hasn’t changed. “It’s still a passive activity,” he’s quoted as saying in the Times article.

But I think he’s wrong, and so does Rex Hammock, head of a marketing and media firm in Tennessee. Here are a few reasons why DVR use may benefit both viewership and advertising awareness:

  1. The viewership question is easiest — people now have a machine that timeshifts their favorite shows, allowing them to never miss an episode while not consuming expensive media like tapes or DVDs. If I never miss an episode and can watch shows broadcast while I’m asleep, with no extra investment or mess, I will watch more television.
  2. To escape ads before, people would use the time to use the bathroom, make a sandwich, make a phone call, or (more recently) check email. In fact, there was once an urban legend about plumbing problems being caused by commercial breaks during the Super Bowl. Running at normal speed, commercials allowed for 2-3 minutes of activity away from the television. With commercials sped up using a DVR, people don’t leave the room.
  3. Viewers have to watch the ads to use a DVR effectively. You have to know when the ads stop and the show resumes, which means watching the ads as they go by. Even at a high speed, the ads register. And people do watch carefully. As Hammock puts it so nicely, “Often, the person with the control is being judged by a second party for their finesse in stopping the fast-forwarding at the precise time it needs to stop, so, therefore a second party is also engaged in looking at the sped up commercials.”
  4. Television viewers like television, and they even like some television ads. In our house, the Apple ads are particular favorites, as is the Travelers Insurance ad featuring the dog worried about hiding its bone. There are others. We’ll stop for those ads and enjoy them every time. Again, we all stay in the room for these accelerated commercial breaks now, and stop only for those ads we enjoy.
  5. Programs that “get it” are going to do better. SportsCenter is a prime example. They’ve recently added a sidebar menu of upcoming segments so that viewers can reliably scroll through a saved episode and see the coverage they want approaching as it progresses down the sequence of items. Very smart, and my loyalty to SportsCenter has only increased since this innovation.
  6. There’s one other explanation, related to the behavior pre-DVR and showing how badly Adgate and his ilk are missing the story here — viewers do other things while watching television. When it was a passive activity, there was no way to control the experience with the device, so people would control it by tuning it out — leaving the room during commercials, doing other things while waiting for their shows to resume, or missing shows altogether. Now that there is device-level control, viewers are more engaged, less passive, and more attentive.

Giving viewers control over a once-passive experience has increased engagement, created a more involved audience at the other side of the “boob tube,” and made watching television a more reliable and rewarding experience.

The analogy with publishers I see is that print publishers used to enjoy the illusion that entire issues of their product were consumed, even though what they delivered was a relatively chaotic mix of temporally related material (things that came into the office at roughly the same time, but related mostly in that way alone). While the true innocents might have assumed that 100% of their hard work was consumed by readers, reader data regularly revealed the actual engagement to be much lower, from 80% on a sporadic basis to as low as 5-10% in some studies I’ve seen.

Now, search engines and other tools make it possible for users to find only those articles they’re interested in. Suddenly, the mass media of journals became controllable. But are we like the unbelievably named Mr. Adgate? Are we appreciating the paradox of the times, and the power in the paradox?

Users can now find particular articles or books at any time, anywhere. They can share them, contextualize them, and embellish them with local knowledge and insights. This is engagement: a more careful, attentive interaction with the content. But some publishers worry about what content is online, or how it’s shared. Yet they may have no choice but to accept that users will control their information. And when users’ tools for control become as robust as the publishers’ tools for control, what then?

There is no way to uninvent the DVRs of publishing — Google Books, Mendeley, Google search, and others yet to come.

If we play our cards right, we might actually achieve more engagement from an audience with more control over their information consumption and sharing habits.

The challenge will be appreciating the DVRs of our industry and realizing that the paradox might actually work in our favor.

It’s going to continue to be a wild ride.


The impact factor has long been recognized as a problematic method for interpreting the quality of an author’s output.  Any metric that is neither transparent nor reproducible is fatally flawed.  The Public Library of Science (PLoS) is trying to drive the creation of new, better measurements by releasing a variety of data through their article level metrics program.  PLoS is taking something of an “everything but the kitchen sink” approach here, compiling all sorts of data through a variety of methods and hoping some of it will translate into a meaningful measurement.

There are, however, a lot of issues with the things PLoS has chosen to measure (and to their credit, PLoS openly admits the data are ripe for misinterpretation; see “Interpreting The Data”). Aside from the obvious worries about gaming the system, my primary concern is that popularity is a poor measure of quality.  Take a look at the most popular items on YouTube on any given day and try to convince yourself that this is the best the medium has to offer. Ratings based strictly on downloads will skew towards fields that have more participants.

While PLoS does break down their numbers into subject categories, these are often too broad to really analyze the impact an article has on a specific field.  A groundbreaking Xenopus development paper that redefines the field for the next decade might see fewer downloads than an average mouse paper because there are fewer labs that work on frogs.  Should an author be penalized for not working in a crowded field?

Probably the worst metric offered by PLoS is the article rating, a 5-star system similar to that employed by Amazon.

These rating systems are inherently flawed for a variety of reasons.  The first is that these systems reduce diversity and lead to what’s called “monopoly populism”:

The recommender “system” could be anything that tends to build on its own popularity, including word of mouth. . . . Our online experiences are heavily correlated, and we end up with monopoly populism. . . . A “niche,” remember, is a protected and hidden recess or cranny, not just another row in a big database. Ecological niches need protection from the surrounding harsh environment if they are to thrive.

Joshua-Michele Ross at O’Reilly puts it this way:

The network effects that so characterize Internet services are a positive feedback loop where the winners take all (or most). The issue isn’t what they bring to the table, it is what they are leaving behind.

Many people assume that readers/customers are more likely to leave a negative review than a positive one.  It only seems logical: if you had an adequate experience, why bother going online to write about it?  But if you’re angry and feel ripped off, writing a review is a form of revenge.  It turns out that in reality, this is not how things work.  According to the Wall Street Journal (article behind a paywall, but you can read it by following the top link from Google here), the average rating a 5-star system generates is 4.3, no matter the object being rated:

One of the Web’s little secrets is that when consumers write online reviews, they tend to leave positive ratings: The average grade for things online is about 4.3 stars out of five. . . . “There is an urban myth that people are far more likely to express negatives than positives,” says Ed Keller. . . . But on average, he finds that 65% of the word-of-mouth reviews are positive and only 8% are negative.

The WSJ article’s author gives some insight into the psychology behind such positivism here:

The more you see yourself as an expert in something, the more likely you are to give a positive review because that proves that you make smart choices, that you know how to pick the best restaurants or you know how to select the best dog food. And that’s what some research from the University of Toronto found. Specifically in that study they found that people generally gave negative reviews at the same rate, but people who thought of themselves as experts on topics were way more inclined to give positive reviews.

Amazon’s ratings average around that 4.3 point, and YouTube’s are even more slanted, with the vast majority of reviews giving 5 stars to videos.  A quick look at PLoS’ downloadable data (the most recent available runs through July 2009, so caveats apply because of the small sample size) shows the following:

  • 13,829 articles published
  • 708 articles rated
  • 209 = 5 stars
  • 324 = 4 to 5 stars
  • 122 = 3 to 4 stars
  • 33 = 2 to 3 stars
  • 19 = 1 to 2 stars
  • 1 = 0 to 1 star

Add up the individual ratings and the average comes out to 4.16.
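As a rough sanity check on that figure, here is a minimal Python sketch. It assumes the six bins above run in descending order, from exactly 5 stars down to 0 to 1 star. The binned counts cannot reproduce the exact average, which needs the individual ratings, but they do bound it. The names `BINS`, `PUBLISHED`, and `summarize` are mine, for illustration only.

```python
# Binned rating counts: (lowest possible stars, highest possible stars, articles).
# Assumption: the bins descend from exactly 5 stars down to 0-to-1 star.
BINS = [
    (5.0, 5.0, 209),
    (4.0, 5.0, 324),
    (3.0, 4.0, 122),
    (2.0, 3.0, 33),
    (1.0, 2.0, 19),
    (0.0, 1.0, 1),
]
PUBLISHED = 13_829  # articles published, from the list above


def summarize(bins, published):
    """Return the rated count, the share rated, and bounds on the mean rating."""
    rated = sum(count for _, _, count in bins)
    # Lowest possible mean: every article sits at the bottom of its bin.
    lowest_mean = sum(low * count for low, _, count in bins) / rated
    # Highest possible mean: every article sits at the top of its bin.
    highest_mean = sum(high * count for _, high, count in bins) / rated
    return {
        "rated": rated,
        "share_rated": rated / published,
        "lowest_mean": lowest_mean,
        "highest_mean": highest_mean,
    }


if __name__ == "__main__":
    summary = summarize(BINS, PUBLISHED)
    print(f"rated articles: {summary['rated']}")
    print(f"share of published articles rated: {summary['share_rated']:.1%}")
    print(f"mean rating lies between {summary['lowest_mean']:.2f} "
          f"and {summary['highest_mean']:.2f}")
```

Under these assumptions the bins sum to the 708 rated articles, which is only about 5% of everything published, and the average must fall somewhere between roughly 3.94 and 4.65 stars. The 4.16 figure sits inside that range.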

Is PLoS really the publishing equivalent of Lake Wobegon, “where all the children are above average”? Or is this just another example where a rating system gives overinflated grades?

Unless detailed instructions are given, it’s difficult for a reviewer to know exactly what they’re ranking.  PLoS does give a set of guidelines, asking the reader to rank according to insight, reliability, and style.  But it’s unclear what’s being compared here.  Is one supposed to give a ranking based on a comparison to every other paper published?  To just papers within the same field?  To papers within the same journal?

The five stars available also do not allow for much nuance in a review.  While still not perfect, the recent redesign at Steepster, a site for tea drinkers, shows a better method (an example found by the authors of an upcoming O’Reilly book on online reputation).  Not only does the Steepster system include a 1-100 scale for ranking, it also allows the user to put their review in context with the other reviews they’ve written.  This would be helpful if a reviewer is supposed to be comparing the relative merit of different papers.

Though PLoS should be applauded for this experiment, it’s clear that some of the methods offered are not going to prove useful in getting a clear picture of article impact.

Five-star rating systems are proving unreliable in other venues, and the same is likely to occur here. If PLoS’ original complaint about the impact factor stands — that it is determined by “rules that are unclear” — then the solution surely can’t be creating a new system that is even more unclear.


A recent trip to Europe brought home to me the fact that the American and European publishing worlds are divided by far more than an ocean. In Paris as in New York, San Francisco, and Chicago, all publishing talk sooner or later turns to the matter of Google. But in Europe, the talk often turns ugly.

Whatever one thinks of Google (and all publishers think about Google), there is little doubt that in just a few years, Google cofounders Larry Page and Sergey Brin have become the most influential people in the publishing industry, at least in the U.S., taking that distinction away from Jeff Bezos.

(Nostalgists recall when the Riggios (who control Barnes & Noble) ruled the roost, or, going back even further, when Harry Hoffman was the big dog.  I doubt that anyone currently active in the industry can remember a time when the most influential person in publishing was a publisher.  Bennett Cerf, perhaps?)

The European animus toward Google stems from many things, not least of which is that Google is viewed as very much an American company.  But beyond nationalist passions is a misunderstanding of the scope of the Google enterprise.  As fans of Star Trek know, resistance to the Borg is futile.  Google is now the defining entity in the information landscape.  To flourish, as best as publishers can hope to flourish, it’s necessary to find a place within the Google ecosystem.  There is no world elsewhere, no little pocket of commerce beyond the reach of Google’s audience aggregation, no opportunity to erect protectionist barriers or to appeal to the legacy of one’s own institutions.  To those who resent Google’s huge bulk and ambition, it has to be said:  Get over it.

Part of the resistance to Google derives from the company’s view of copyright, which, at least to European ears, sounds entirely wrong. Even those, including myself, who hold a traditional view of copyright (that is, during the term of copyright, copyright serves the interests of the producer) might pick nits.

However, it has to be said that whether Google’s view ultimately prevails or not (I think it will), obsessing about this one aspect of the Google program obscures what is happening in the marketplace and all the new publishing opportunities Google is creating.

For example, even as publishers take umbrage over the unauthorized digitization of copyrighted material in the Google Book Search project, it has to be recognized that the core Google search function, located at http://google.com, is a leading, and for many sites the leading, source of Web traffic.  Not all publishers value Web traffic as they should, which leads to an underestimation of Google’s significance.  If a publisher has traditionally worked through channels (principally bookstores, whether online or bricks-and-mortar), the implications of a direct relationship with end-users or customers may not be fully understood.

With the various Google Book Search features, publishers also have a number of intriguing ways to engage readers.  Google enables readers to search inside the book, which should yield more traffic, some portion of which can be converted to sales. Google also publishes its API (application programming interface), which allows some of the features on its site, including the ability to search inside a book, to appear on the publisher’s Web site.

Let’s ponder this for a minute.  How many publishers could have created the “search inside” feature on their own?  How large does a publisher have to be to make this kind of IT investment?  And here Google is essentially giving it away to publishers.  I have heard Google referred to as “a taker, not a maker,” but if it is a taker, it’s one with an unexpected and apparently magnanimous ethical calculus.

The useful services Google provides to publishers keep growing.  Google Editions, which is expected to be introduced shortly, will enable the sale of ebooks.  Just what a publisher will be able to do with those ebooks is still something of a mystery, or at least partly a mystery, since it is hard to distinguish what Google plans to do from the endless speculation about Google’s strategy.  (This underscores the fact that Google is a premium brand:  the aura surrounding the company is many times larger than the company’s services themselves.)  From Google’s declared aim to allow online retailers to resell Google Editions, it appears probable that once again there will be a published API so that publishers as well as retailers can put the Google Editions on their own sites.  Google will charge for sales of Google Editions, but Google’s share is less than what publishers traditionally give to booksellers.  Google Editions will be viewable through any Web browser, which opens up intriguing questions as to whether this will serve as a form of DRM.  What happens when you close the browser?  If you can cache the ebook offline, can a browser-based solution allow copies to be shared?  We shall see, but in the meantime it appears that Google is trying hard to make its services palatable to publishers.  It’s almost as though Google is saying, “Look: you give us mass digitization, and we will give you everything else.”

With the invention of the motion picture by Thomas Edison, the book lost its place as the center of the media universe.  All other innovations, from radio to television to the Internet, helped to push the book out further.  Now we live within a media landscape that has no center, but which does have a dominant issue, and that is the matter of online discovery, for which search engines, and Google in particular, are the dominant modes.

Google does not always behave the way publishers would like it to, but that’s true of any large company.  Nor is it always respectful of the media types that preceded it — the prerogative of the young, brash, and successful.  The question for publishers, however, isn’t whether Google sits up straight in class with its hands folded on the desk, but whether any publisher can afford to ignore this upstart.

For publishers, this is the Google century, or maybe just the Google decades, but either way, not to engage this extraordinary organization is likely to lead to obscurity.

