Editor’s Note: Today’s post is by Rebecca Grant, Iain Hrynaszkiewicz and Amy Bourke-Waite, and is based on a preprint of trial results and the subsequent peer-reviewed article published in the International Journal of Digital Curation. Rebecca is Research Data Manager at Springer Nature; Iain is Head of Data Publishing at Springer Nature; Amy is Director of Communications, Open Research at Springer Nature.
Data sharing is like maths at school.*
Bear with us.
It might seem harder than the other subjects. You might feel your teachers are not very good at explaining it. But if you do not pay attention, you will soon find that many real-world skills rely on maths, and that you would have benefited from learning the basics, because they provide a solid foundation for the rest of your adult life (whether your ambition is to become an astronaut, a chess Grandmaster, or simply to balance your personal budget).
Likewise, data sharing and data management form the foundation of global academic collaboration, discovery and scientific advancement. Sadly, surveys show that academics rarely get formal training in good data management (let alone best practice), and data management is rarely incentivized by institutions. All too often even the basics are ignored, and data are left languishing on a USB stick or in a paper notebook.
If we want research to be discovered, shared and reused, then the same must be true of the underlying data.
This is increasingly relevant as publishers, institutions and funding agencies pay growing attention to mandatory Data Availability Statements (DASs), particularly in the UK, where DASs are a requirement of UK Research and Innovation’s (UKRI) Common Principles on Data Policy. While a DAS does not always mean that data are shared, these statements are a means to determine whether and how research data are available, which can also help funding agencies and research communities assess compliance with their policies. The increased prevalence of DASs will also enable further research across multiple journals and publishers, using machine-driven approaches such as natural language processing and text and data mining, to analyze the types of DASs provided, and the types of data sharing practiced, by researchers in different disciplines and journals.
Researchers report that they do not have enough time to share data, and additional checks on submitted papers that push for data sharing can be costly for editors and publishers.
Springer Nature is nonetheless keen to encourage data sharing. More than two years ago we announced that all original research papers accepted for publication in Nature and the other Nature titles would be required to include a DAS stating whether and how others could access the underlying data. While there is some evidence of the benefits of data sharing in research, we also wanted to understand the costs of introducing DASs. We therefore examined the impact of introducing DASs on authors and editors, and how the availability of datasets was reported.
Introducing Data Availability Statements across the Nature journals
Our specific aims were to 1) assess the ways in which researchers chose to make their data available, and 2) measure the additional time required by editors and production staff to ensure that a DAS is a) included in the manuscript, b) accurate, and c) correctly copy-edited. (All of the staff involved in the study were in-house, which gives a reasonable basis for estimating the potential cost to the publisher.)
DASs in the Nature journals require authors to state where the data supporting the results reported in their article can be found and whether and how they can be obtained. An example might be:
- The datasets generated during and/or analyzed during the current study are available in the [NAME] repository, [PERSISTENT WEB LINK TO DATASETS].
As a pilot, editors of five participating Nature journals were asked, for two months, to self-report the number of additional minutes it took to ensure that an appropriate DAS was provided for each manuscript they processed. Copyeditors and production staff were also asked to estimate the average additional processing time per manuscript for all papers they handled in this initial period. We also invited comments on the process of incorporating the DAS into the journal workflow.
To ensure that all published papers included a DAS after the policy was implemented, journal editors or an editorial assistant requested a DAS for every paper at the “accept in principle” stage, in the decision letter. Editors had to update their correspondence templates and checklists, including adding a link to new author guidance. Production staff needed to update copyediting and style guides and familiarize themselves with the new section, and they also had additional content (the DAS) to check and process in each manuscript. Now that the policy is established, the requirement to provide a DAS has moved to earlier stages of the peer-review process.
Once the papers were accepted for publication, the text of each DAS was read and categorized into one of four different types:
- Type 1 states that the data are available from the author on request.
- Type 2 states that the data are included in the manuscript or its supplementary material.
- Type 3 states that some or all of the data are publicly available, for example in a repository.
- Type 4 states that figure source data are included with the manuscript (a method of data sharing used by some authors in a subset of Nature journals that publish life sciences research).
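In the study, each statement was read and categorized by hand. Purely as an illustration of the machine-driven approaches mentioned earlier, a minimal keyword-based sketch in Python could flag which of the four types a statement resembles. The `DAS_PATTERNS` regular expressions and the `classify_das` function are hypothetical assumptions made for this example, not the coding criteria used in the study, and any real classifier would need to be validated against manually coded statements.

```python
import re

# Hypothetical keyword patterns for the four DAS types described above.
# These are illustrative assumptions, not the criteria used in the study.
DAS_PATTERNS = {
    # Type 1: data available from the author on request.
    1: re.compile(r"\b(?:on|upon) (?:reasonable )?request\b", re.IGNORECASE),
    # Type 2: data included in the manuscript or its supplementary material.
    2: re.compile(
        r"\bincluded in (?:this|the) (?:published )?(?:article|paper|manuscript)"
        r"|\bsupplementary (?:information|material|data)\b",
        re.IGNORECASE,
    ),
    # Type 3: some or all data publicly available, e.g. in a repository.
    3: re.compile(
        r"\brepositor(?:y|ies)\b|\bpublicly available\b|\baccession (?:number|code)s?\b",
        re.IGNORECASE,
    ),
    # Type 4: figure source data included with the manuscript.
    4: re.compile(r"\bsource data\b", re.IGNORECASE),
}


def classify_das(statement: str) -> list[int]:
    """Return every DAS type whose pattern matches; an empty list means 'review manually'."""
    return sorted(
        das_type for das_type, pattern in DAS_PATTERNS.items() if pattern.search(statement)
    )
```

Because one statement can, for example, point to a repository and also offer further data on request, the sketch returns every matching type rather than forcing a single label; statements that match nothing come back as an empty list so they can be checked by a person.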
The second phase of the project used the same process to gather data from an additional 20 journals in the biological and physical sciences, which introduced the same policy and provided the same information as the first five journals. Each journal gathered data for two months after implementing the policy, and the data were then analyzed.
In total, across the first and second phases of the project, we analyzed 557 manuscripts. All of the contributing journals operate a Type 3 research data policy, which requires the inclusion of a DAS, so every manuscript submitted to these journals was subject to the checks, self-reporting and coding. (Springer Nature's research data policy types are numbered independently of the four DAS categories above.)
Reporting data sharing takes time
We found that adding mandatory DASs to all accepted articles in journals operated by professional editors did increase manuscript processing time. In the first phase of the pilot, the DAS added approximately 15-20 minutes of editorial and production (copyediting) time per accepted paper across all five journals.
- Once the authors had responded to our initial request for a DAS, adding it to the manuscript took ten minutes of extra editorial time per paper on average (median eight minutes).
- An extra five minutes of copyediting time was required to ensure that the DAS matched journal style guidelines.
- For Nature Communications, which used a slightly different methodology, 90% of editors reported needing 15 minutes or less to ensure that the DAS was present for most manuscripts.
- The Type 1 statement, where data are available on request, took the least time to add to a paper (5.9 minutes on average), likely because it is a single formulaic sentence with no links to check.
- The Type 3 statement, where some or all data are publicly available, took the longest (18.2 minutes) as the editor needed to undertake additional checks.
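The figures above report both a mean (ten minutes) and a median (eight minutes) of extra editorial time. A minimal Python sketch of that kind of per-type summary might look like this, assuming each self-reported entry is recorded as a hypothetical `(das_type, minutes)` pair; it is not the study's analysis code, and the record format and the `summarize_minutes` name are assumptions.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_minutes(records):
    """Summarize self-reported extra minutes per DAS type.

    `records` is an iterable of hypothetical (das_type, minutes) pairs,
    e.g. one pair per manuscript. Returns {das_type: {"n", "mean", "median"}}.
    """
    by_type = defaultdict(list)
    for das_type, minutes in records:
        by_type[das_type].append(minutes)
    # Sort by type so the summary is reported in a stable order (1, 2, 3, 4).
    return {
        das_type: {"n": len(values), "mean": mean(values), "median": median(values)}
        for das_type, values in sorted(by_type.items())
    }
```

Reporting the median alongside the mean is useful for self-reported timings like these, because a handful of manuscripts needing unusually long checks can pull the mean upward while leaving the median largely unchanged.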
The second, larger group of journals that introduced mandatory DASs reported that fewer additional minutes were needed to incorporate a DAS into a manuscript. Possible reasons include greater editor and author awareness of the policy and supporting documents; improved internal communication and editor training after the first phase of the pilot; and/or the greater attention required by the pilot journals, which informed and accelerated editor training on handling DASs for manuscripts in their disciplines. (Further analysis, for example of differences between disciplines, is available in the original article, and the dataset supporting our analyses is on figshare.)
Investing in data sharing for the future of research
Submission-to-publication time is an important metric, and anything that slows down publication could be seen as a negative for authors, readers and the publisher. However, given the importance of data sharing and the value added by DASs, we believe the extra editorial time is well invested, even (or especially) in the more complex case where the data are already publicly available. We also anticipate that the efficiency of incorporating DASs will improve as they become a more common editorial requirement. As editors and authors become more familiar with including them, and publishers continue to improve their guidance and procedures, we should benefit from increased experience and economies of scale.
We have already used information from this pilot to inform the implementation of data policies by other Springer Nature journals. For example, we have developed in-house administrative support for academic editors, so that journals without professional editors can also introduce DASs consistently. Simple, practical information, such as the additional time needed to process manuscripts, is valuable for editors and support staff in understanding the impacts of editorial policy changes.
In the two years since we started this work, the landscape has changed, and no doubt it will continue to evolve. Since Springer Nature began introducing standardized data policies, other large publishers such as Elsevier, Wiley, Taylor & Francis, Hindawi and BMJ have introduced similar initiatives, and standardization of research data policies across the industry is underway. As well as understanding how making research data more accessible can advance discovery, it will be increasingly important to understand the costs, particularly for publishers, funding agencies and policy makers.
We strongly recommend that other journal publishers looking to introduce DASs prepare by ensuring that the necessary support and training are available for researchers, editors and production staff, by building in extra time or tools, and by enabling them to share and cite data wherever possible. We encourage other publishers to be similarly data-driven and transparent in how they implement research data policies, and to collaborate across our industry via groups such as the Data policy standardization and implementation Interest Group of the Research Data Alliance (RDA). We also welcome further research in this area, particularly on associations, if any, between the provision of particular types of data availability statement and research visibility and impact, because studies to date have tended to be limited to specific disciplines and journals.
* ‘Math’ for American readers.