When you read a piece of journalism from a well-known publication, why do you trust it? In all likelihood, you believe that events took place during the article’s production to make the reporting accurate and trustworthy. For example, you likely assume that the journalists involved interviewed and checked sources, and that editors revised and edited the piece. But this process is opaque and, for all intents and purposes, appears identical regardless of the news source. In the current age of disinformation and “fake news”, nearly anyone can create a website that looks and feels like a high-quality brand, even if the content they produce is completely made up. What if, instead of trusting the publication’s brand alone, we could peek behind the curtain and glimpse the events taking place in the creation of articles?


Trust in academic journal articles is based on similar expectations. Journals carry out editorial processes from peer review to plagiarism checks. But these processes are highly heterogeneous in how, when, and by whom they are undertaken, and it is often not apparent to the outside observer that they take place at all. As innovations in peer review and the open research movement lead to new experiments in how we produce and distribute research products, understanding what events take place is an increasingly important issue for publishers, authors, and readers alike.

With this in mind, the DocMaps project (a joint effort of the Knowledge Futures Group, ASAPbio, and TU Graz, supported by the Howard Hughes Medical Institute) has been working with a Technical Committee to develop a machine-readable, interoperable, and extensible framework for capturing valuable context about the processes used to create research products such as journal articles. This framework is being designed to capture as much (or as little) contextual data about a document as the publisher desires: from a minimum assertion that an event took place, to a detailed history of every edit to a document.
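To make the two ends of that spectrum concrete, here is a minimal sketch of what such event records might look like. This is not the actual DocMaps schema; all field names (`assertions`, `steps`, `happened`, and so on) are illustrative assumptions, shown only to convey the idea of variable granularity.

```python
# Hypothetical sketch only: field names are illustrative assumptions,
# not the real DocMaps schema.

# Minimum assertion: the publisher states only that a review happened.
minimal_docmap = {
    "type": "docmap",
    "publisher": {"name": "Example Journal"},
    "assertions": [{"event": "peer-review", "happened": True}],
}

# Richer history: who did what, when, and with what outputs.
detailed_docmap = {
    "type": "docmap",
    "publisher": {"name": "Example Journal"},
    "steps": [
        {
            "event": "peer-review",
            "actors": [{"role": "reviewer", "anonymized": True}],
            "date": "2021-01-15",
            "outputs": [{"type": "review-report"}],
        },
        {
            "event": "revision",
            "actors": [{"role": "author"}],
            "date": "2021-02-03",
        },
    ],
}
```

Both records describe the same article; they simply expose different amounts of context, which is the flexibility the framework aims for.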

Numerous recent initiatives (Transpose, Peer Review Transparency, Review Maps, and STM’s “A Standard Taxonomy for Peer Review”) have been positive developments in the space of peer review experimentation and transparency. Such initiatives understandably tend to focus on the needs of the creators, and can be limited to traditional, even field-specific, peer review processes. They may not focus on representing editorial practices in ways that can be reliably aggregated, surfaced, and queried, or on capturing the full range of editorial events needed to accommodate new publishing workflows in which reviews may be conducted by multiple parties, in different ways, at multiple points in time. A framework like DocMaps, however, can incorporate these taxonomies, allowing publishers who use (or wish to use) them to frame their contextual metadata around them.

DocMaps aims to be a framework with a common way of describing editorial events, on which publishers of documents can place the components they can, or wish to, share. We have developed two example use-cases for DocMaps with the help of our Technical Committee. In one, a publisher wishes to capture context about a double-anonymized review of an article published in their journal, with two rounds of revisions. In the other, an independent review service notifies a preprint server about a review of an article on their platform, describing a fully transparent review of a preprint with links to the review report and author response. These are early attempts to describe conventions based on the ideas we have prioritized, and more detailed information is available.

In the course of our work, we have identified and clarified two key issues. The first is that we focus on the fact that an event takes place, but do not seek to pass judgement on that event. How an event is viewed and judged is up to the consumer of the DocMap. For example, a third-party review service may use a DocMap to assert that it positively reviewed a preprint, but it should be up to the preprint server to decide whether that review is posted alongside the preprint.
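The separation between asserting an event and judging it can be sketched as a consumer-side policy. In this hypothetical example (service names and field names are illustrative assumptions, not part of any real specification), the preprint server applies its own allowlist to decide whether an asserted review is displayed:

```python
# Hypothetical consumer-side policy sketch. The DocMap only asserts
# that a review event occurred; the preprint server decides what to
# do with that assertion. All names here are illustrative assumptions.

TRUSTED_REVIEW_SERVICES = {"Example Review Service A", "Example Review Service B"}

def should_display_review(docmap: dict) -> bool:
    """Apply the preprint server's own policy to an asserted review event."""
    service = docmap.get("publisher", {}).get("name", "")
    events = {step.get("event") for step in docmap.get("steps", [])}
    return "peer-review" in events and service in TRUSTED_REVIEW_SERVICES

# A review service asserts it reviewed a preprint...
review_docmap = {
    "publisher": {"name": "Example Review Service A"},
    "steps": [{"event": "peer-review"}],
}
# ...and the server, not the DocMap, decides whether to show it.
display = should_display_review(review_docmap)
```

The same DocMap could be accepted by one server and ignored by another; the framework records the event, and each consumer supplies the judgement.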

This could in turn increase trust in scholarly publishing, particularly by addressing concerns about misinformation with respect to preprints and by clarifying the status of retracted works, issues that have come to the fore during the COVID-19 pandemic. For example, an article funded by an organization with a particular agenda was recently posted on Zenodo, claiming that COVID-19 had been developed in a lab. The preprint was subsequently promoted in a disinformation campaign, reaching the attention of the White House. To most observers, including journalists and the general public, the article on Zenodo gives no indication that no quality control or screening was undertaken; it has all the appearance of a legitimate article. If scholarly repositories surfaced information about editorial processes, the difference between an article (and its vetting process) on Zenodo and one on, e.g., medRxiv would be readily apparent. Similarly, clearer, machine-readable ways of flagging retractions would make discredited works more readily identifiable as such, and could even enable publishers to demonstrate that retractions of published papers lead to improvements in the review process.

Of course, no open system is ever completely immune to manipulation. In the Zenodo example above, the authors of the COVID-19 preprint could potentially publish a fraudulent DocMap asserting that a review process took place on their preprint when it never actually occurred. But this happens regularly today with predatory journals that falsely claim to thoroughly review submissions. In a world with DocMaps, such fraud would be harder to perpetrate from a technical perspective and much more transparent, putting more of the burden of proof of process on the publisher and allowing readers to make more informed judgements about the likelihood of fraud.

The benefits compound at an ecosystem level. By making more data about editorial processes more accessible, DocMaps will allow third parties, including repositories like Zenodo, to more effectively screen for fraud, and researchers to more systematically study which processes work and which don’t. Crucially, because the data will be public and interoperable, researchers would be able to do that independently.

The second issue we have found is that there can be a temptation to be prescriptive, and attempt to define specific taxonomies or workflows that can be represented. DocMaps aims to be agnostic to these definitions, and to focus instead on recording the existence of the underlying editorial events that shape taxonomies and workflows. Similar to the preprint server example above, we expect that publishers and consumers of DocMaps will describe and interpret the events provided according to their needs. In this way, we aim to ensure flexibility, and machine-readability, of contextual information for all types of uses, including non-traditional and even non-academic ones. We expect that conventions for publishing and interpreting DocMaps that can be used with specific taxonomies and workflows will arise, and intend to provide venues to support the development of these conventions.

The DocMaps project aims to create a community-endorsed framework for representing review and editorial processes, and to ensure that the object-level editorial metadata models in development are compatible with a broad range of possible futures for scholarly publishing, rather than locking in the current system. To do this, we are now looking for broader input from the community at large, through our Co-Creation Community. We want to identify different potential models and providers of object-level editorial events relevant to the biology community (including multiple models currently in development); develop the DocMaps specification to meet the needs of those efforts; work with key stakeholders to produce technical guides for implementing DocMaps in their current or planned systems/frameworks; and lay out a technical roadmap for a future aggregation service and browser extension. To get in touch and join the effort, or simply for more information, please contact DocMaps project manager Gary McDowell.

Jessica Polka

Jessica Polka is the Executive Director of ASAPbio, a researcher-driven nonprofit working to promote innovation and transparency in life sciences publishing. Jessica leads initiatives related to peer review and oversees the organization’s general administrative and strategic needs.

Gary McDowell

Gary McDowell is an early-career researcher with a background in biomedical sciences working at Future of Research.


2 Thoughts on "Guest Post — Putting Publications into Context with the DocMaps Framework for Editorial Metadata"

So is DocMaps envisioned as a manuscript management system, or as a secondary system to run in parallel with the publication workflow?

Great question, and one that we plan to address in the next implementation phase. Ideally, DocMaps could be provided by, or at least extracted from, existing manuscript management systems as part of the publication workflow. In practice, at least initially and to make implementation easy, it seems likely that it will need to be extracted from websites that provide review information post-publication. There has been some interest in an initial implementation of a badge-like service: a publisher puts a single piece of code on the page, which triggers a crawl of that page, creates a DocMap in a central repository from what it finds, and returns a user-facing badge or statement. When publishing systems are updated to output DocMaps, the service would be able to find the DocMap during the crawl and simply archive it. These are all open questions and ideas that we’re planning for the next phase of the project as we begin to pilot with some initial partners. We welcome folks to reach out if they have thoughts or are interested in participating!
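The badge-service idea described above can be sketched roughly as follows. This is a speculative illustration of the flow (crawl the page, archive a DocMap, return a badge), not a real implementation; every function and field name here is a hypothetical assumption.

```python
# Speculative sketch of the badge-like service flow described above.
# All field names and the badge text are illustrative assumptions.

def build_badge(page: dict, repository: list) -> str:
    """Archive a DocMap for a crawled page and return a badge label."""
    if "docmap" in page:
        # The publishing system already outputs a DocMap: archive as-is.
        docmap = page["docmap"]
    else:
        # Otherwise, build a minimal DocMap from crawled review info.
        docmap = {
            "type": "docmap",
            "assertions": [
                {"event": event, "happened": True}
                for event in page.get("review_events", [])
            ],
        }
    repository.append(docmap)  # stand-in for the central repository
    n = len(docmap.get("assertions", docmap.get("steps", [])))
    if n == 0:
        return "No editorial events found"
    return f"{n} editorial event(s) recorded"
```

The key design point is the fallback: the service works from crawled review information today, but prefers a publisher-supplied DocMap whenever one exists, so publishers can upgrade without the badge changing.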
