"I feel happy."

Yesterday morning at the UKSG in Glasgow, Cameron Neylon and Michael Mabe debated the topic: The Future of Scholarly Journals: Slow Evolution, Rapid Transformation – or Redundancy?

Cameron, the newly named Director of Advocacy for PLoS (beginning this July), framed his arguments in terms of technology and design. His premise was that the transformation is already here and that there are enough clues in the technology ecosystem to give us an idea of where things are heading. “The fundamental things we think of as making up a journal will change.” He summarized those things as follows:

    1. A journal contains articles.
    2. Articles are selected as the result of a process.
    3. The process is generally managed by a publisher.
    4. A journal belongs to one publisher.
    5. An article belongs to one journal.
    6. An article contains some narrative text and has a single version of record.

Neylon’s premise is not that one or two of these aspects might change, but that the entire structure (all of the premises together) represents a house of cards that will fall. He then showed the audience two websites built on WordPress (PLoS Currents Disasters and the Journal of Conservation and Museum Studies). As Neylon put it, referring to WordPress and the journals built on it, “This is the worst that your free competition will ever be. I can put together a journal on this platform in 10 minutes with a free domain name.” Publishing has become easy.

He showed figshare, which started with a question: “What is the smallest meaningful unit of research to distribute?” The founders decided it was a figure, with enough information to make the figure understandable. It is not peer-reviewed and accommodates a wide range of formats. While the existence of WordPress does not unseat the six fundamental aspects of a journal mentioned above, Neylon posited that something like figshare certainly might — by challenging how people think about scholarly publication.

Neylon went on to make the point that when we can publish in smaller pieces, sometimes built into a narrative and sometimes not, we can build many things from our results. We need to make the interface to research compelling and useful. It needs to deliver what the researcher wants now. The problem is a design issue. The article is not the best interface for answering questions. He went on to say that someone will ultimately figure out the best way to make that interface work.  “Once we realize [articles] don’t work, we’ll stop writing them.”

Michael Mabe started his retort with a question: Why hasn’t the journal changed more as a result of the Internet? Whereas Cameron Neylon framed his position around technology and design, Mabe’s argument introduced social aspects and human behavior. He made three points:

    1. The structure of a book/pages is deeply embedded in the culture of reading and is reader friendly.
    2. The fundamental needs of researchers have remained static.
    3. There are only so many information niches, and they’re all filled.

Mabe started by saying it was the codex — the modern bound book form — that was the real revolutionary change, not the printing press. The codex is what started the familiarity with bound volumes and pages:  “Two millennia of habit and utility take some undoing.”

Building on that point, he quoted a 2009 Elsevier study of 64,000 authors and stated that researchers still prize speed, referee quality, and reputation above all else in their publication expectations. Researcher needs have not changed significantly over time. In his opinion, scholarly publishing has evolved to meet the human and philosophical needs of researchers: “There are new tools but they serve old purposes.”

Finally, there are only so many informational niches to be served. Human senses still accommodate reading, writing, speaking, and listening. Mabe felt that the communication instances that addressed these areas were all filled. Technology might enhance them but it does not fundamentally change them.

The Twitter stream was very lively during both presentations. As is often the case in debates, each argument applied a different lens to the topic. The result was an interesting conversation and perspectives with which attendees could both agree and disagree.

Ann Michael


Ann Michael is Chief Transformation Officer at AIP Publishing, leading the Data & Analytics, Product Innovation, Strategic Alignment Office, and Product Development and Operations teams. She also serves as Board Chair of Delta Think, a consultancy focused on strategy and innovation in scholarly communications. Throughout her career she has gained broad exposure to society and commercial scholarly publishers, librarians and library consortia, funders, and researchers. As an ardent believer in data informed decision-making, Ann was instrumental in the 2017 launch of the Delta Think Open Access Data & Analytics Tool, which tracks and assesses the impact of open access uptake and policies on the scholarly communications ecosystem. Additionally, Ann has served as Chief Digital Officer at PLOS, charged with driving execution and operations as well as their overall digital and supporting data strategy.


41 Thoughts on "The Article — Not Quite Dead Yet"

Neylon’s view, at least as reported (and represented on Twitter), seems superficial and naive to me. Technologically, publishing has become easier with each passing decade; culturally, publishing has become increasingly demanding.

Using WordPress to publish is nothing new (as we at the SK know). Last I heard, the New York Times owns a share of Automattic, the company behind WordPress, and the Times itself is running on a heavily tweaked WordPress install — and has been for years. But if I start another WordPress newspaper, I won’t have the social, cultural, and economic power of the Times. There’s much more to publishing than technology.

Scientists publish to transform information into prestige and priority, the two things that really matter to them. A weakly branded WordPress install isn’t going to address prestige, and a publishing technology that doesn’t count in Medline or other priority systems won’t work for scientists either.

I also worry about this race to the lowest “minimal publishable unit.” We need the entire story of an experiment to evaluate its merits and replicate it. We already have too much salami-slicing and fragmented story-telling in the sciences — just look at how many papers are retracted when mistakes occur or fraud is detected. Ideally, we’d retract one paper per study, not 9-12 as seems to be occurring now. Fragmenting the scientific record because we can hardly seems wise. Already, we’re seeing companies and individuals relying less on scientific reports because they’re finding the reports are incomplete, the studies unreplicable.

Mabe is much closer to the mark — culture trumps technology. That’s the bottom line.

“Neylon’s view, at least as reported (and represented on Twitter), seems superficial and naive to me.”

That’s interesting — I was about to say exactly the same thing about Mabe’s view.

We should meet up in ten years and see which one of us has proven to be right.

Let’s put it this way — more than 10 years ago, Harold Varmus and Pat Brown were making equally naive and superficial statements. Since then, the journal format has become only more entrenched and valuable (no matter how it’s paid for), and the article rules the roost for both authors and readers.

WordPress is almost 10 years old itself.

I think those “10 years” have passed.

Those ten years have passed. The next ten haven’t.

Just off the top of my head, in the last ten years we’ve seen:

* The invention of “open access” in Budapest (actually 10 years and a month ago)
* PLoS arising from nothing at all to become a serious player under an all-CC BY regime
* PLoS ONE arising from nothing at all to become the world’s biggest journal
* The continuing success and universal mainstream acceptance of arXiv
* The establishment of preprint servers other than arXiv
* Big Players like NPG getting on board with non-reviewed preprint publication
* Numerous open-access mandates from federal (NIH) and other (Wellcome) funding bodies
* Numerous institutional repositories
* The rise of Twitter as the dominant medium for discovering new papers
* The rise of blogs as the dominant medium for discovering new papers
* The establishment of blogs with serious scientific content of their own
* Establishment of video, 3d scan-sets and other such non-paper supplementary information
* The growth of data-sharing in initiatives such as Dryad
* The beginnings of “micropublication” in initiatives such as FigShare
* PLoS ONE totally redefining what people expect from peer-review
* Increasing recognition that traditional peer-review doesn’t do what it claims to do
* Recognition that high IF correlates more strongly with retraction rate than with citation rate
* Various experiments in non-traditional peer-review, including F1000 and one by Nature
* The kick-off of F1000 Research, which proposes another new peer-review model
* The rise of collaborative reference management via Mendeley, Zotero, etc.
* Many new ways to obtain copies of papers (#ICanHazPDF, torrents, dropbox)
* The rise of author activism such as the movement that defeated the RWA
* Increasing awareness of access to research as an issue in the general public
* Disintermediation everywhere, facilitated by publishers’ withdrawal of services

And I am sure many others that I’m not thinking of right now. Some of these bear directly on what the “article” is and will become, some do not — but all are disruptive, all of them are changing the game, and the net result will be to bury publishers that stand still just as surely as cars buried buggy-whips.

Yawn. All of those things only make the article more potent and central, and most reflect cultural demands, despite their technological veneer. I think you just proved my point.

Also, let’s be clear on what “disruption” is — the replacement of one supply and value chain with another. It leaves the end-product largely the same. Steel was disrupted by mini-mills, but the end-products (rebar, steel beams, sheet steel) remained the same. Motorcycles were disrupted by Japanese manufacturing practices, but motorbikes remained the same. Disruption has occurred and is occurring in publishing, and scholarly publishing is probably the farthest along (with majority-digital distribution, printers replaced by platform vendors, XML replacing film, and so forth). But the end-product remains the same — the article.

It will take something other than “disruption” to change us from the article. By definition, disruption doesn’t change the end-product in any substantial way, just creates new ways to generate it. To move from the article, we’ll need “replacement” with alternatives, and that’s not the path we seem to be on.

Cars were not a disruptive technology for the horse supply chain — cars replaced horses, and by extension replaced buggy whips.

Where do you come off with those comments on peer review? Where is your data?

The word is the basic unit of meaning. Sentences are linear strings of words, hence the simplest possible structure. Pages are strings of sentences, articles strings of pages, etc. All are extremely simple and easy to create, hence efficient. More complex arrays are now possible and we are exploiting them, especially via linking. A page with links is more than a page. The entire Web is a single structure. Visualization is everywhere we look, etc. I have developed a few new forms myself.

Much more is to come no doubt, but none of it entails the end of scholarly publishing. That is the fallacy.

“The structure of a book/pages is deeply embedded in the culture of reading and is reader friendly.”

That’s true. And before that the scroll was deeply embedded in the culture of reading and was reader friendly. And before that it was cuneiform tablets. And before that people wrote with a stick in the dirt. Things change.

The idea that people are so tied to the notion of the current system that they will be unable to cope with a new system is ignoring what is occurring in the real world. People are reading books on their Kindles and enjoy doing so, despite the ‘location-based’ system it uses in lieu of pages. Newspapers lose revenue because people experience the content online – out of the context of ‘continue to C5 to read this article.’ They do that because the partnership between content and ease of use matters. People can read on their Kindles because it has the content they want and they feel like it’s easier to deal with in comparison to carrying the codex-bound volume. Even those who are annoyed by the location-based nature of the content aren’t annoyed enough to throw the baby out with the bathwater. In that scholars do more and more research online, we know that the content is indeed important. Now is the codex/pages metaphor going to keep them from doing research that’s removed from it? We already know that ebooks are more popular than ever and print revenues are decreasing. Am I supposed to believe that their behavior is going to be different than the trend I see if I merely open my window and look outside?

If an electronic journal published through WordPress truly had content that people wanted, they would experience it that way. Even if it were as simple as one long page where one had to demarcate ‘pages’ by ‘paragraphs.’

Am I really supposed to believe that if John Rawls had published ‘A Theory of Justice’ online as a webpage, the content would be less valuable because it didn’t use the ‘pages’ metaphor? Give me a break. Also, with the rise of ‘continuous scrolling’ being used across the web, people are becoming more and more comfortable using interfaces that don’t provide affordances for ‘pagination.’

I, and I think Cameron, are looking at the issue of getting beyond articles a little more deeply. All of the formats you describe are still basically pages and articles in the sense of fixed strings of sentences. Ironically, scrolling a long web page is a return to the ancient scroll, hence the name.

Beyond articles means a system where researchers put in their individual work, but the output is new mixes of this information, responding to user queries. In such a system the research results are not individually published, so there are no articles and hence no publishers of articles.

I think that line about the “structure” of the book is a bit misleading here. It strikes me that these arguments are less about the physical form of the written material than they are about the organization of the material. Scientific papers long ago moved out of the realm of the physical printed journal and are read (for the most part) in electronic form via html and pdf. That change (from scrolls to books to web pages) has already happened. The questions here are more about what we’re reading on those scrolls, in those books and on those web pages.

The form of the scientific paper evolved to serve a particular purpose. Does that purpose still exist? If not, then what has changed? If so, then do the proposed alternatives (making raw data available, publishing figures piecemeal) serve the same purpose in a superior way?

Sentences are easy to create.

Truly meaningful sentences are less easy to create.

Sentences that convey intended meaning without ambiguity are bloody hard to create.

That’s perhaps why there is so much bad writing in science. It takes effort, real effort, and so time, which people don’t have.

And perhaps ultimately that’s why there is a need for publishers (or people who take a publisher’s role).

And also why, in the name of cost-cutting efficiency, so many publishers (note: not all) make themselves seem irrelevant by neglecting this, their most publicly visible, part of their role.

This may be true but I don’t see what it has to do with the issue, which is how finding an alternative information structure to the article will somehow make publishers obsolete.

You seem to be suggesting that good writing is the job of publishers. That can’t be what you mean, can it?

Presumably all those rewrites required by reviewers are aimed at good writing, or what else, if not? This is part of the ambiguity of the purpose of peer review.

I wasn’t clear (so inadvertently proving my point I guess). I was referring to copy editing, not original writing. It is something publishers once prided themselves on, but it is now regarded in many, though not all, places as a cost in the process, not an investment in the output.

(Though it has to be said authors’ arrogance also has a part to play here: some do believe they are Elliott, Shakespeare and Whitman rolled into one such that their hallowed prose should not be altered for the mere sake of clarity.)

As a copy editor (of articles on mathematics and theoretical physics), I have not encountered an author that made a self-comparison to “Elliott, Shakespeare and Whitman.” But I have had one author compare himself to Feynman, Hemingway and Vonnegut.

The debate is interesting in elucidating one extreme view. Finding myself in a role somewhat analogous to a buggy-whip maker at the beginning of the 20th century (copy editor for a journal translated from Russian, attempting to transform sentences into easily comprehended sentences “that convey intended meaning without ambiguity” [Martin above]), I have been personally interested in this issue.

The issue can also be viewed in the larger context of reports of original research => review articles => monographs => textbooks. Technological advances can affect all stages of this chain.

In addition to the general tendencies, it may be useful to consider extreme cases. On one hand, I find four papers by G. Perelman deposited in arXiv.org, and all four papers currently exist in their first version. Perelman refused to submit his work to a refereed journal. Nevertheless, he was awarded the Fields Medal and was also awarded a one-million-dollar Millennium Prize by the Clay Mathematics Institute for his proof of the Poincaré Conjecture (he refused to accept the awards). On the other hand, we have a short paper in arXiv.org currently in its 19th revision (arXiv:math/0312309v19).

Bill describes himself as copy editor for a journal translated from Russian, attempting to transform sentences into easily comprehended sentences “that convey intended meaning without ambiguity”.

Bill, I don’t think that is at all analogous to a buggy-whip manufacturer. You’re providing a specific service that people need irrespective of the technology that delivers the final version to readers.

Personally, I don’t think the point is to say that publishers will or will not survive. Or that the article will or will not survive. We still “dial” a phone, yet I haven’t seen an actual dial since I was about 20 (and even then it was out of place). So in essence we often find ourselves debating the future meaning of current terms – with one group of the debaters using the current definition and another group using a potential future definition.

I think the point is evolution. Where I feel Cameron gets it right is that there is evidence of evolution all around us. Where I think he might get it wrong is in the implication that evolution will completely replace all aspects of the current environment and in a relatively short time frame. Where I think Michael gets it right is that people are slow to change – especially those that are overburdened and require high levels of productivity – and that, with some exceptions, they mostly change because something comes along that they quickly identify as making it faster, easier, cheaper, etc for them to do the thing they need to do. They are, in fact, doing the same job with new or enhanced tools.

However, that is not to say that they won’t start to question the fundamental tools and norms of achieving the objective of that job (or even the objective itself). That will not happen in a wholesale manner until significant baked-in incentives change. As someone that works with organizations to help them change, I can tell you that changing incentives takes a long time, discipline, and an incredible commitment to rooting them all out (many are not quite as obvious as others). This not only takes time but it also requires respected, influential champions – many of them.

Another point I find fascinating is that I don’t believe this is all or nothing. I believe in story. I believe in the big picture. Yet, I don’t find Cameron’s views of “bite sized” science to be inconsistent with that. There should be a place for both. I would think that, whether intentional or not, the use of the lego metaphor implies the pieces can exist both independently and as part of the story. What good is a bunch of unconnected legos – their entire purpose is to be combined, busted apart when needed, and recombined. What’s missing if you focus only on the legos and the structures created with them is the overarching context of why and how the recombinations occurred – there is still a need for that story. Of course they will both exist – they need each other to make sense. What’s more, scientists need them both in order to keep the legos in motion and avoid wasting their time.

The beauty of all of this IMO is that we’re having the conversation!

The problem with the Lego analogy is that it is too vague to carry the argument. Getting beyond the article has been an active research topic for decades, especially among the hypertext community. It involves deep issues in the nature of thought and knowledge, as well as language. In many cases the computer people are simply ahead of the cognitive science, much as with the artificial intelligence hype. Cameron seems to fit this model.

Just to pursue the Lego analogy, an article is a specific structure. A lot of the words go to expressing that structure, not expressing the units of thought. So you can’t just take the sentences and use some of them, as is, to build another structure. Pronouns are probably the worst case. Their meaning (technically their referent) depends on what has come before. They are also one of the biggest reasons why computers can’t read, which is actually a similar problem.

I just happen to have a general theory of the structure of information content, if anyone is interested. See my “Untangling the Web of Technical Knowledge: A Model of Information Content and Structure” at http://www.osti.gov/bridge/product.biblio.jsp?query_id=0&page=0&osti_id=991543&Row=0&formname=basicsearch.jsp

I also have an alternative to the article, namely the issue tree. See http://www.stemed.info/Repo_Tree.pdf and http://www.stemed.info/reports/Wojick_Issue_Analysis_txt.pdf

Other people have other approaches. But in any case if people think the computer is just going to magically restructure what people say, to fit it to someone’s special needs, they are dreaming. Yet many do. The machine reading community is hoping to get computers just to answer questions about what an article says, not to rewrite it, and certainly not to rewrite entire technical literatures at will.

In content “reuse” vs. “repurpose” discussions with publishers, we generally run into the issue that you can’t simply recombine what has already been created. Restated, the issue is that 1) the content has to be developed with reuse/repurpose goals in mind and 2) most legacy content is not. Usually in a publishing environment reuse/repurpose is a going forward approach – where only the content with the highest reuse/repurpose potential is retrofitted for this (even then it is often only on an as needed basis). My point is that you are correct in that the content we’re discussing has not been created with granular reuse/repurposing in mind. There are editorial issues here and there always will be.

However, if a publishable unit based on standards and norms were defined, there is no reason to believe it couldn’t be reused (as is) OR more likely repurposed (interpreted for use not exactly as it was created) – by people, for sure. That is, I’m not completely familiar with Cameron’s body of work, but nowhere in the debate did he say this was going to be done by machines.

So I do agree with you that the lego metaphor is abstract and not literal. I do believe that the ideas and concepts embodied in those legos could be made more accessible and usable by others who might build new structures with them.

Ann, it sounds like we are talking about two different visions. I am referring to the one of dynamic documentation that has been kicking around the AI and hypertext communities for a long time. Cameron’s argument sounds like that case, especially his reference to needing new discoveries to pull it off.

The idea is to have a generic body of information that gets restructured and presented according to user needs. Each version might read like an article. Let’s say it is about malaria. One user might ask about the spread in a region, while another might ask about interventions there, and so on. The idea is that people would put units of information in, including text, and the system would select and configure it for individual users.

Can you give me an example of the re-purposing you are talking about? When you re-purpose something what is the unit of meaning that gets re-purposed?

Cameron mentions figures, but sharing figures is trivial, and no threat to publishing. People have used each other’s figures ever since there were figures. An article is a body of expressed thought, expressed primarily in text, so getting beyond that ultimately means being able to reconfigure text in meaningful ways. This is a deep problem, one I have worked on for 40 years.

Ugh – I just had a response all written and WordPress ate it 🙁

I’ll try again.

First I had a disclaimer – I will provide examples, but those examples are more examples of how people rethought what they were doing – not examples that I think we can slap into a sub-article type of environment tomorrow and have them work.

One example from legal and accounting realms would be that statutes, cases, or guidelines are constantly reused as is and the expert opinions written about them are often repurposed into multiple presentations for different audiences with different objectives. This is also the case regarding drug information publications. There would be basic drug information that is reused and expert, audience dependent interpretation that is repurposed (i.e., similar but different for a nursing student, LPN, RN, NP, etc.).

Why couldn’t the article as “a body of expressed thought” be supplemented by a more concise representation of the concepts within that thought – a bill of lading of sorts? The smaller concepts would aid in search discovery and potentially repurposing of those concepts within the context of scientific discovery – creating new/different articles. Are they created that way now? No. Why couldn’t they be?

Also, it occurs to me that one flaw in many debates is that they play to the extreme – X will make Y obsolete or Y will always dominate X. I don’t think this is an all or nothing game. What is useful and productive for one subject area or one type of user might not be the same for another. I tend to think that in the disciplines where smaller nuggets of information are meaningful they will evolve and in the ones in which they are not, they won’t. Also, I do believe the narrative will always exist, but that doesn’t mean the sub-concepts within that narrative are not important and valuable in their own right.

What I’m suggesting is that reworking the configuration of an article package, not replacing the article as it is, may be an option. Like most new things, attempts to do this will fail more often than they succeed…it’s an emerging idea.

I don’t pretend to know the future – but I do value the perspectives of those that challenge the status quo. I also value the perspectives of those that defend it – reality usually ends up somewhere in the middle and, if things are working properly, both sides tend to enhance the end result.

There are ways that Cameron is right, but I think the central thesis of his argument (at least how I understand it, and I wasn’t present at this particular debate) has a problem in that it ignores the actual functional purpose of the research article.

Cameron makes the point that there are vastly superior and more efficient ways to share data between research groups. But I’m not convinced that this is why we read or write papers. Papers are a highly evolved form, one meant to convey essentially a quick summary of a completed research project. There’s an inherent efficiency involved. Of all the papers I’ve read in my career, there are very very few where I’d have actually wanted to pore through the researcher’s notebooks and view every single piece of data.

I don’t think this is unusual. It’s hard enough for a PI to find time to go through their own students’ notebooks. How much time are they going to spend going through everyone else’s raw data? With very rare exceptions (work directly related to projects ongoing in the lab), all they really need is a summary with representative examples. Tell me what you learned, don’t waste my valuable time. If I really want to see your data, I’ll contact you directly.

There are places where I want that one small piece of data, or small pieces of data that are enormously significant. New forms can and are arising to provide this sort of exchange. But they don’t necessarily serve the same purpose. Which is why they seem to be additive, rather than replacements.

I guess that was my point – which you summed up far better than I did! – “they seem to be additive, rather than replacements.” It seems to me that for different “jobs” different optimal views and packages will evolve. I don’t think this is an either/or argument.

What you said, Ann and David. Debates like this always seem to push people into extreme positions which mask the real-world overlap between what they are communicating.

Actually I feel this is the opposite of what really happened. We looked at the issue from different perspectives but actually I thought there was quite a lot of agreement. It certainly wasn’t an artificially polarised discussion – I think Michael and I hold quite trenchant views that are further apart than might have been obvious.

On the other hand – maybe better to wait until the video is live – not sure how many of the people in this thread were actually there vs responding to other comments 🙂

David gets my argument more or less right. And my core point was exactly the one about efficiency. For me as a *reader*, the paper is a very inefficient way of getting to the information that I mostly need as a researcher. Michael’s argument, which was well made, was that the paper is a highly evolved form for the *author*. What I think was most interesting, and where I think we agreed, was that there is a divergence in the needs of consumers and producers, authors and readers. Even though they are the same person in many cases, the current situation drives us to very different behaviours when in each mode – and that is what I think is unstable. As these forces drive us in different directions there will have to be a breaking point. What that will look like is an open question.

Obviously part of what I said was for rhetorical effect. I don’t think the article will disappear completely, any more than I think we won’t still have something in 50 years’ time that we call a journal. But the point I wanted to make was that when I took a look at my behaviour in consuming information I wasn’t being served at all well by either of these in their current form. I *do* believe however that most researchers (at least in the STM space) are deluding themselves about how much they actually read and use articles per se – and that as this becomes more obvious the level of motivation to write will go down. How this will actually surface in practice I think is a really interesting question.

Mostly what I want as a researcher is a fragment of a paper – the paper usually makes it harder than it needs to be for me to find this fragment. There is an argument that the structure of papers is the best possible to help us index and locate those fragments. I don’t believe that is technically true but I’m prepared to admit the possibility that it might be socially or culturally true – at least until we get automated indexing and aggregation properly sorted out. But even then it might be the case that we computationally generate things that look like papers for some kinds of consumption activity. But I would hope that we could also generate other things for other kinds of consumption activity.

Cameron, the issue is the “needs to be” in your “….the paper usually makes it harder than it needs to be for me to find this fragment.” That is a claim that there is in fact an alternative to the paper, which does the job you describe, which at this point there is not. Given the way language works and the cost of producing coherent thought it is not clear that there can be such an alternative. (I have worked on this problem for many years.)

For example, I read one paper for the results, another for the method, and a third to see what my competition is up to (also known as knowing the field). I read other papers for the same reasons, and integrate these efforts in my overall knowledge. In most cases, as you correctly point out, I only read the fragments related to my purpose.

Of course it would be wonderful if there were some way that I were only presented with the information I need, with no additional effort on the authors’ part, in readable form, but that may well not be possible.

“That is a claim that there is in fact an alternative to the paper, which does the job you describe, which at this point there is not.”

I’ve been looking at this for a day now and I still can’t quite figure out how to respond. It doesn’t help that the whole core of my talk was precisely to demonstrate that for many of the things that I do on a regular basis there are many better ways to discover and consume information that I need as a researcher. We are clearly at cross purposes but I cannot construct a mental state where this statement makes sense, unless you are restricting your scope to some very limited notion of what “science” is.

I am taking a purely pragmatic approach. I do science. This involves needing all sorts of different information for different purposes. Amongst those things that are better than a paper for specific research tasks:

* Google (with live search update) – I can often find physical constants or parameters simply by doing a search.

* Wikipedia – where articles exist on a topic they are usually a better route into the topic. When looking for specific but “text-book” level facts Wikipedia wins every time.

* Methods forums – You get much more useful information from methods sites and forums where people talk about the problems they encounter than you generally do from a research paper. In computational science there are an increasing number of very nice examples of people actually providing the methods in executable form with the data online – much better than would be found in the equivalent paper.

* Research databases – where they exist these are invariably a better way to get data of interest and to bring data together

* Q&A sites – these vary a lot but the best are extremely good and in some cases tackle serious research level questions

* Twitter – keeping up to date in some research areas is much easier via Twitter and other social media. I get better coverage of web as well as journal literature and it’s much easier to consume and search. It is however limited to specific areas where there is a strong Twitter community.

…and on and on and on…

It’s worth noting that many of these also have more effective and transparent means of validation than the peer reviewed research literature. I literally trust StackOverflow answers more than I would the methods section of a Nature paper. Good SO answers give me runnable code and are tested in anger by people. Methods sections do not for the most part – even in computational studies.

That a research article might be the best approach for some specific task is conceivable to me but even that requires a certain degree of magical thinking and special pleading in my view. For instance – I could argue that a one-to-one conversation is a better means of transferring that information. You could counter that this doesn’t scale, but if scaling is the target then the web beats peer reviewed literature every time…we would just end up going around in circles trying to pin down what it is that an article “does better” and how to measure it.

My sense is that you’re constructing a circular argument. There is a thing that you view as “doing science”, this is supported best by consuming research papers, therefore this other stuff that I do in my day to day work is somehow “not science”. Science is that which is communicated in journals through articles, therefore there is no better way to do it. This just feels like an impoverished view and ultimately a sterile argument to me when what we could be talking about is all the different ways we do, and perhaps could, communicate and how best to enable them.

I’m not sure why any of this means an end to the article though. There are new resources that serve specific needs for you. Why does that de-value the needs served through the research paper so much that it will cease to exist?

I’d also suggest that computer science may not accurately represent all other types of scholarly research. You’re dealing with abstract data, essentially computer code or electronic data that is easily translated into computational form. For you, Stack Overflow can offer the complete package, downloadable code and data for testing. For a developmental biologist, downloading a transgenic mouse is not quite as achievable.

Computational science has also created a thriving online community, which is lacking from most other fields of research. I suspect this has much to do with both the electronic nature of the data/code and in the isolated nature of performing the research itself. With data/code that is completely transferable online, a collaborative network is easier to work with than when the research involves real physical objects, specimens or patients. Also, doing work on a screen in a room by oneself has led to a need for creating collaborative networks. There are different social pressures and needs than for a field where work is done in a crowded social laboratory surrounded by one’s peers, or in a hospital interacting with other doctors and patients all day.

One size does not fit all. Many of the tools you list above are of low value for other types of research (both scientific and humanities). None serve the same need as the research paper.

Ack, comment nesting limits reached. In reply to DavidC:

“I’m not sure why any of this means an end to the article though. There are new resources that serve specific needs for you. Why does that de-value the needs served through the research paper so much that it will cease to exist?”

It doesn’t, I never actually intended to say it did (and to the extent that I said it did it was in response to the rhetorical needs of the original debate). I do believe that articles will become less important (that’s almost a non-statement though, they are *everything* now from the author perspective, so any new communication forms will necessarily reduce their importance) and diversify in form and structure but I don’t think they will go away or become completely irrelevant.

In fact I think we’re actually agreeing. I made the case that the article was my fallback position of last resort. And that in a sense is what you’re saying as well: you’re making the case that it is the general form that can cover the bases in a way that other forms, which might be more efficient in certain specific contexts, cannot. I agree with that – I would still think that we can do better by exploring some new options and seeing how general we can make them, and I will accept we disagree on how much of an issue this really is in practice – but in principle I’m happy with the notion that the article serves a good purpose as a generally accessible route into scientific information for a human reader. And I think it makes a good challenge with specific questions. How can we do that better in specific contexts and, if we can achieve that, how general can we make those systems?

As an aside I’d also counter that one of the interesting things about the way that articles work is that they seem to not be unique – they are simply an example of storytelling with highly defined structures, but enough internal heterogeneity to make real automated parsing a nightmare. I think there is some very interesting work to be done on whether artificial papers can be constructed from their constituent parts and whether people would be able to tell the difference. A kind of inverse of Anita de Waard’s work on narrative analysis.

But I also think your counter examples are interesting challenges. So how could we make it just as easy to download a transgenic mouse as to get some code? What would be required? If we could “publish” the mouse what infrastructure would we need? How would you cover the costs, and how would you make it worth the “authors’” while to put in significant work? Yes, computational methods are easier to transfer and store, but what can we learn from the notion of a recording of a method being executable? Would that change the way we thought about writing up/recording experimental methods? Where do robotics and instrumentation come in? What does it mean to do “the same experiment” on an instrument here at ISIS compared to another facility? How much of our data processing in experimental science is in principle capturable in the same way as the best of computational science?

Cameron, perhaps we are in fact simply talking past each other. I certainly agree that there are many sources of information besides articles, more all the time in fact, and a wonderful thing that is. As senior consultant for innovation at DOE OSTI I am actively involved in developing several of them. But the proliferation of these tools does not mean that articles make what they do harder than it needs to be, which is what you said that I objected to. (As a logician I am very literal-minded.)

The role of journal articles is central to science and there is no equally efficient alternative at this time. Seems simple to me. None of the new tools do what journal articles do, which is report results in a filtered fashion.

It’s often true that only a small fragment of an entire paper will turn out to be important to your research. However, you can’t know what that fragment is until you’ve read and understood the entire thing. Often you realize this months after reading the paper, as your own experiments have changed or other relevant experiments have been published. It’s not something that can be predicted in advance. I can’t conceive of a system that would predict your needs two years from now and only display the important section of the important paper to you now.

I also think there’s great value in “keeping up with science”, knowing what’s going on not only in your own field, but in other closely- and not-so-closely-related fields. The research paper does seem an efficient method for doing this. If I’m studying Drosophila behavior, I don’t have the spare hours to dig through your time lapse movies showing olfactory nerve migration in mice. But your studies may uncover some governing principle that I can apply to my own work. I can’t get that principle from a table or a single image. I can get it from your interpretation of the meaning of your work.

And to me that’s something lost in the conversation here. Technicians collect data. Scientists understand data. There has to be room somewhere to express that understanding, to take all the bits of data that have been collected (and sure, made available if you want) and to synthesize a new concept, a new understanding from them. How can that sort of knowledge be shared if we’re all just publishing the salami-sliced least publishable units?

I think you’re creating a false dichotomy here. In my view what we need is systems that help us capture and publish these fragments, precisely so that they are structured in such a way that they are easier to use as the building blocks of larger narratives. These larger narratives in turn can help us discover specific building blocks if that is what we need. And can also be used as building blocks for other, more general narratives about how the world works.

We need different types of information at different levels at different times. Sometimes I want just data, or just methods, sometimes I want to see how an argument is constructed. Sometimes I want to aggregate data across a wide range of sources to compare and contrast. Sometimes I just need to hear a specific idea. Ideally we’d have a modular system that allows us to search and discover and to build at all of these different levels and to do so in a prospective fashion as well – agents looking for things that might be useful to us at some point in the future. Clearly this is non-trivial but I don’t think impossible. The challenge lies in finding a balance between the cost of standardisation and the return on investment in terms of usefully surfaced information.

Really efficient salami slicing could at least mean that things would get out in some form, even if they’re not worth crafting into papers, or reviews, or database entries. They will clearly be less immediately useable and possibly less important, but as long as the costs of sharing them are low enough they can still bring some value. Fragments of many things have been important to me, some of them have had context, some of them haven’t. I’ve wasted a lot of money on trying things that other people knew didn’t work – and probably quite a bit more regenerating data that almost certainly exists somewhere. If we can make it easy enough to push these things out *and* have effective enough discovery tools then we can make at least some gain with very little loss. There is a ROI question obviously – and that needs exploring – but my sense is that even with the limited discovery tools we have today we could extract a lot more value.

I do agree–the false dichotomy is article/no article, and what we’re both really talking about is a combination of creating new information sources and evolving/enhancing the article.

I think there are enormous problems though, with the concept of efficient salami slicing:

First, there are the ethical issues/privacy issues/public danger that can come about if we’re releasing medical trial data prematurely. If I feed 4 rats grapefruit for a week and they don’t get cancer, a really efficient salami slice is my dataset showing that grapefruit prevents cancer. But if, after 6 weeks, the rats all die from acute grapefruit poisoning, I may have done a lot of damage to public health by publishing my results prematurely.

Second is the effect that has on the public’s trust of science. Kent wrote a nice piece last week on how the public trusts science less and less (http://scholarlykitchen.sspnet.org/2012/03/30/why-is-science-both-more-important-and-less-trusted/). If we move from a system based on completed, understood and vetted studies to one based on preliminary results and suspicions, then we completely lose the public trust.

And finally, from a pragmatic point of view, it simply doesn’t work in an environment of limited funding and limited jobs. There’s an inherent advantage in keeping your research secret. It means that you’re the only one who can exploit the fruits of your own labor. If you’re giving away the data you’ve collected before you’ve fully exploited it, then you’re helping those who are competing with you for funds and jobs. I have had several scientists literally laugh in my face when explaining the concept of open notebook science.

My graduate lab had made a transgenic mouse line with an unexpected phenotype. The mice died about 6 weeks after birth. It took us about 2 years to really understand what was happening. Had we released the data, another lab that specialized in the phenomenon (enteric nervous system migration) might have recognized it, recreated our mice and beaten us to the punch. You can argue that this is more efficient, and a better way to do science. But that final result two years later got the lab an article on the cover of Nature, which led to a large grant (paying many salaries in the lab) and it led to a faculty job for the lead author.

I don’t think it’s realistic to ask researchers to be completely altruistic and expect them to give up their own career advancement opportunities. Science is already a hard enough job. There’s an enormously long training period with very little reward offered at the end. We are driving many of our best and brightest minds away from research. If we require that they become self-sacrificing monks, we’ll lose even more.

One thing that might affect the future of the article is simply that it cannot encompass enough – that increasingly in the sciences at least, an article is just a teaser with the real science residing in the underlying data. It is possible that, increasingly, scientists and researchers will be interested in data sets rather than the articles that describe them, and new data-only publications will spring up – this has already started.

Publishing of data sets has already begun–Giga Science is a particularly interesting experiment of this nature (http://www.gigasciencejournal.com/). But again, I think the notion that the article cannot encompass enough is a misreading of the purpose of the article. The article is a highly evolved form meant to document a completed set of experiments. It’s meant to give the reader an easily digested summary of what was done and what was learned. That’s an entirely different purpose than opening up your data sets for re-use or reinterpretation.

If I’m reading about a discovery that’s not directly related to the experiments I’m doing, then no, I don’t want to dig through your data. I just want the quick summary. And for many experiments, the data has very little value for re-use. Many experiments are done under very specific conditions to answer very specific questions.

And of course, there’s the self-reflexive point–if we just publish data sets, what happens when I re-use your data set for my new experiment? How can I publish that new interpretation, that new use of your data set? Do I just republish your old data set and claim it as my own?

Indeed, I regard journal articles as merely large abstracts. For example, DOE estimates that their research reports average about 60 pages long, while the resulting articles might average just 6 pages. Thus the reports have maybe 10 times as much information. This is one of the reasons I promote public access to research reports. Beyond that the normal practice is to contact the author.

But this is still no ground for abandoning the article format. The article summarizes the basic information: here is the problem, here is what we did, here is what we found, and here is what it means. Each article is a coherent system of thought. The role seems inescapably necessary.
