It is not so long ago that we routinely talked of Old vs. New Media. The “Old” was characterized by the investment in and creation of content, which in turn gave rise to a common set of properties — definitive and authoritative journalism and scientific reports, the fixed text, the pursuit of the finest authors and (in media beyond scholarly communications) the top creative talent. Old Media was analogue. New Media, on the other hand, was digital and had its own set of properties — the dynamic text, interactivity, user-generated content, and the emergence of the platform metaphor, upon which content created by others would reside. (There is an overlap in this summary and the thesis of Tim O’Reilly’s classic essay “What is Web 2.0?”)
If the iconic brands associated with Old Media were Encyclopaedia Britannica (where I once worked) and the Oxford English Dictionary, New Media had its own heroes, Facebook chief among them. Debates about Old and New often rose, or descended, to arguments about maintaining standards on one hand and the democratization of culture on the other. Even though I have strong sympathies for traditional media forms (is there anything anywhere finer than the 19th-century British novel or the films of Stanley Kubrick?), I always feel a bit puckish when I hear an executive of an established Old Media company, extracting profits like a shower of gold, talk about how the world will inevitably fall into the abyss if they are in any way inhibited from doing what they do, as though the health of the culture were in direct proportion to the profits at Elsevier, John Wiley, The New York Times, and The Disney Corporation. To be a Keeper of Civilization is a stout and necessary responsibility, and all these reckless activities at Mendeley, Sci-Hub, and ResearchGate must be brought to an end.
Push hard at the categories of Old and New and the containers seem a bit porous. The New York Times, that stalwart of the Old, is racing to become new, with digital editions, user comments, community-building, podcasts, and whatever else the management can find to experiment with; anything goes, provided that the Times — authoritative, august — remains the Times. A core innovation of PLOS ONE was to give the commentary on an article after it was published a status similar to that of the review structure that preceded publication, a notion that we now see being literalized at PLOS’s imitator, eLife, which has just announced an arrangement with the online commenting service Hypothes.is (full disclosure: I am an informal advisor to Hypothes.is). No Old Media outfit has failed to attempt to become New, even if sometimes reluctantly. We should not forget that Elsevier acquired Mendeley. We once looked to Britannica, now we turn to Wikipedia.
Into this debate about Old and New steps a new candidate, the model of the data science company, which neither creates content (the prerogative of the Old) nor aims to serve as a platform for third-party content (the aim of the New). The Data Media model is something of a meta-model: it is about other content and the software that drives it; it sits above both Old and New, observing them with a cold eye, and deriving new inferences from them. An Old Media company wants to create a movie; a New Media company wants to create a hosting platform for individuals to upload their own movies (the realization of which is YouTube); a Data Media company wants to study the movies and videos, especially large collections of movies and videos, and identify the emergent properties of these collections, not to mention the activities and identities of the people who watch them.
Just as Old Media companies seek to grow into New Media companies, New Media companies begin to look a lot like Data Media companies. We are all uncomfortably familiar with the kind of analysis that Web giants do with our own personal data, for example, and some large publishing companies are now attempting to harvest new insights and even products from their collections of scientific articles. The Data Media paradigm, in other words, is both a new way to think about business and an extension of prior business models.
I would like to enlist some working data scientists for comment, but this observer sees three categories of data, each of which has different characteristics:
- Data that is developed as part of a research program;
- Data that is generated by and about users — such things as the identities of users and their proclivities (accessing one article instead of another one, etc.);
- And data that resides in large collections of content, upon which computer processes are directed with the aim of identifying emergent properties.
For publishers I think there is little to be done with the first category — that, after all, is the role of the scientific researchers themselves. The second category is particularly useful for organizations that seek to sell advertising, which remains an important revenue stream for a select number of publishers. The third category has real promise, but the catch is that you need a large collection of articles in order for data analysis to yield meaningful results.
As one begins to ponder the implications of bringing data analytics to scientific publishing, it becomes apparent that so much of today’s conversation is oh-so-yesterday. Open access vs. toll access? A New Media controversy that is getting long in the tooth. ORCID? A great idea that may be superseded by the inference of authors’ identities through computational analyses of large databases of published articles. The development of a research agenda or the crafting of an editorial strategy? Let’s look at the patterns in the collection of articles, where researchers are implicitly telling us where a field is headed. No doubt the Data Media model is going to be responsible for some boneheaded mistakes, but we always make boneheaded mistakes, though we take particular relish when the boneheaded comes about because of something new.
For publishers, or any media company for that matter, the opportunity is in looking beyond the merely digital to the increasingly data-centric. This is a game of leapfrog, where whoever jumps first, wins. The challenge is to land on your feet.