
In October 2012, F1000 Research applied for inclusion in PubMed Central. In January 2013, F1000 Research trumpeted that its content would be indexed in PubMed based on its approval for inclusion in PubMed Central. Yet, as of this writing, no F1000 Research content has appeared in either PubMed Central or PubMed. As Rebecca Lawrence, Publisher for F1000 Research, explained in an email in response to some questions (response dated April 21):

. . . there’s been a delay.  I think it’s pretty standard for there to be a hiatus between a journal’s being approved for indexing and its actually going live on PMC, but in our case it’s a slightly longer story.

This is what’s happening: We’ve been working with PMC to finalise the xml tagging, which is more complex than normal for us because we have versioning and also because PMC is going to index all the referee reports and comments plus all the underlying data.  We are very nearly ready to go now and PMC has prepared the first batch of articles so I am hopeful that those first articles will show up in a week or so.

So far, there are still no F1000 Research articles appearing in PubMed Central.

A review of email correspondence surrounding the tagging and ingestion of F1000 Research content reveals the fundamental flaws of PubMed Central – its activities duplicate work publishers have already done; it is overly complex and bureaucratic; and it costs US taxpayers millions of dollars per year for core functionality that could be realized at a fraction of the cost, if not for free.

Because F1000 Research has a bewildering peer review model, one that allows multiple versions of the same article to exist simultaneously, supports iterative open peer review, and leverages the standards of indexing services as its editorial approval standards, the tagging model posed some obvious initial challenges, especially with versioning, as this email from Ed Sequeira from January 10, 2013, shows:

It looks like they’re now using our acceptance criteria to classify articles on their site. . . . V2 has a prominent note about what was updated and a brief reiteration of a v1 rejection. V1 has a lengthy and convincing argument dismissing the entire paper as seriously flawed. That rejection was written a day after the paper got two unconditional acceptances. So if we’re not in sync on reports we present a misleading picture. . . . My proposed solution: our acceptance criteria remain the same – 2 yeses or 1 yes and two maybes. But when an article only makes the PMC grade at V2 or later, their first submission should include submissions for all earlier versions.

This email reveals a few problems PMC had to spend many hours wrestling with – synchronizing feeds, handling multiple version files for a single article, and the perplexing case of an article that is accepted and later rejected, a problem I have not seen solved in the emails received so far.
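The acceptance rule Sequeira describes ("2 yeses or 1 yes and two maybes") is, at least on its own, simple to express. The sketch below is a hypothetical illustration of that rule as stated in the email – the function name and input format are my own assumptions, not PMC's actual code:

```python
def meets_pmc_criteria(decisions):
    """Check whether a version's referee decisions satisfy the
    acceptance rule described in the email: two 'yes' votes, or
    one 'yes' plus two 'maybe' votes."""
    yes = decisions.count("yes")
    maybe = decisions.count("maybe")
    return yes >= 2 or (yes >= 1 and maybe >= 2)

print(meets_pmc_criteria(["yes", "yes"]))             # True
print(meets_pmc_criteria(["yes", "maybe", "maybe"]))  # True
print(meets_pmc_criteria(["yes", "maybe"]))           # False
```

The rule itself is trivial; the hard part, as the email makes clear, is that a later "reject" can arrive after the rule has already been satisfied, leaving the indexed record out of sync with the review record.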

Remember, this email was sent in January, but discussions started much earlier, as emails from October 2012 show. The following comes from an email dated October 23, 2012:

Jeff and I started looking at options for capturing the Reviewer Responses a few weeks ago, so we’ll need to pick that up again.

Later emails show that capturing the various reviewer responses led Sequeira to propose a color-coded tagging scheme that proved frustratingly complex to some of his colleagues.

Returning to January 2013, one of the main engineers seems to be giving up hope that the team at PubMed Central could ever adequately tag to the F1000 Research model:

We’ll need to figure out what happens if we receive a version greater than what we are expecting next. The safest thing to do is to die and let the JM sort it out with F1K.
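The "die" strategy the engineer describes amounts to halting ingestion whenever a version arrives out of sequence. A minimal sketch of that logic, with hypothetical names – this is an illustration of the idea, not PMC's actual pipeline code:

```python
def validate_incoming_version(expected, received):
    """Decide what to do with an incoming article version, given the
    version number we expect next. Halts (raises) on a version gap,
    mirroring the 'die and let the JM sort it out' strategy."""
    if received == expected:
        return "ingest"
    if received < expected:
        # We have already seen this version; nothing new to ingest.
        return "duplicate-or-stale"
    # received > expected: the feed skipped ahead, so stop and
    # escalate for manual resolution with the publisher.
    raise ValueError(
        f"expected v{expected} next but received v{received}; "
        "halting for manual review"
    )

print(validate_incoming_version(2, 2))  # ingest
print(validate_incoming_version(2, 1))  # duplicate-or-stale
```

Halting on a gap is the conservative choice: ingesting v4 while v3 is missing would silently corrupt the version history that PMC is trying to preserve.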

Where does the redundancy and expense come in? Quite simply, government contractors, staff, and management at PubMed Central have spent months – and many, many meetings and test flights of files – solving a tagging protocol that F1000 Research had already solved. This is the epitome of redundant activity. If PubMed Central were merely an indexing service with signposts to the final versions, à la PubMed, it would cost far less to run, and problems like this would not arise.

F1000 Research is not the only publisher having to go through this duplicative and expensive process – the head-spinning complexity of its model merely makes the redundancy and expense vividly clear. Setting aside PubMed Central's perceived favoritism toward eLife, the largest complaint I've heard from OA publishers over the past year is that PubMed Central is slow, difficult, and erratic in how it handles technical standards and content ingestion.

Posting and hosting duplicative copies of published articles not only costs US taxpayers money, it costs publishers money, as well. That’s inefficient, which makes it irresponsible.

By now, you may suspect that I think PubMed Central needs a serious reboot – at the management level, at the vision level, and at the technical level.

If so, you are quite perceptive.

Kent Anderson


Kent Anderson is the CEO of RedLink and RedLink Network, a past-President of SSP, and the founder of the Scholarly Kitchen. He has worked as Publisher at AAAS/Science, CEO/Publisher of JBJS, Inc., a publishing executive at the Massachusetts Medical Society, Publishing Director of the New England Journal of Medicine, and Director of Medical Journals at the American Academy of Pediatrics. Opinions on social media or blogs are his own.


3 Thoughts on "Redundant and Expensive – How F1000 Research’s Model Reveals the Root Problems of PubMed Central"

Politics aside, the practice of indexing F1000 Research reveals fundamental problems encountered when attempting to index a journal that publishes articles that may go through iterations of review, revision and certification.

Versioning itself is not a problem–the arXiv has been dealing with this issue for years. Multiple versions of an article can coexist if they are labeled sequentially and readers understand that new documents are updates of the SAME paper. However, certification is something that only has real meaning when tied to the VERSION of the paper. An early submission may not be acceptable for publication and be denied certification while a later revision of that paper may pass the bar of certification. In this case, certification of the latest version of the article does not imply certification of ALL VERSIONS of the article.
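The commenter's distinction – certification belongs to a version, not to the article – can be made concrete with a small data structure. This is a hypothetical sketch of the idea, not anything F1000 or PMC actually runs:

```python
from dataclasses import dataclass, field

@dataclass
class ArticleRecord:
    """Tracks certification per version: certifying v2 says
    nothing about v1, which may remain uncertified."""
    versions: dict = field(default_factory=dict)  # version -> certified?

    def certify(self, version: int) -> None:
        self.versions[version] = True

    def is_certified(self, version: int) -> bool:
        # Absent versions default to uncertified.
        return self.versions.get(version, False)

rec = ArticleRecord()
rec.certify(2)
print(rec.is_certified(2))  # True
print(rec.is_certified(1))  # False
```

An index that stores only one certification flag per article, rather than per version, cannot represent the case the commenter describes: a rejected v1 coexisting with a certified v2.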

Yet the model used to index F1000 Research runs into some apparent contradictions when it comes to certification. PMC considers F1000 articles to be certified when they receive the requisite number of peer decisions (2 yeses or 1 yes and two maybes). If an article then receives a “reject” vote, is the paper then uncertified, i.e. unpublished and therefore unindexed? If F1000 is going to tie certification with versioning, then a revised article should be required to go through another full round of review before being certified again. In the meantime, is the journal required to publish a “temporarily retracted, pending further review” notice?

The publish-first-review-later model may speed up the dissemination of an article, but it clearly opens up a huge number of certification issues. To me, the PeerJ model of repository publishing makes much more sense – papers can be public and go through multiple versions until the author decides it is time for submission, after which the paper is treated like any other manuscript in the journal system. F1000 seems to want to have its cake and eat it too.
