FORCE11 was formed out of a community of researchers, publishers, librarians and software developers who found common cause in attempting to rethink the ecosystem of scholarly communications and get the community to leverage the benefits of electronic scholarship. Over the years, the group has been at the forefront of initiatives such as data citation, the FAIR initiative, software citation and its use in scholarship, as well as researcher rights among others. Partnerships have also been core to FORCE11’s mission with many of these initiatives being joint efforts across the community including with the Research Data Alliance (RDA), Research Software Alliance (ReSA), UCLA, and the Committee on Publication Ethics (COPE).
Last fall, the joint FORCE11 & COPE Research Data Publication Ethics Working Group published recommendations for the ethical handling of research data publication. The group has built on the recommendations and released policy templates for journals and publishers, as well as data repositories. Both are freely available via the FORCE11 website. The leaders of this group, Daniella Lowenberg and Iratxe Puebla, along with group member Matthew Cannon, shared their reflections on the project, its goals, and what the group has accomplished.
To start, please describe the problem that the group set out to solve.
Discussions around research data sharing have been developing in recent years. As publishers, funders, institutions and other groups set policies for how researchers should treat their research data, there has been increasing focus on the practicalities of researchers sharing data and depositing it in repositories. Yet there has been a gap in guidance and best practices for handling integrity and publishing ethics issues. The importance of this work by the FORCE11 working group has come into sharper focus in recent weeks with the policy announcements of NIH and OSTP, which will require grant awardees to also make all data they collect available at the point of article publication.
In recent years as more data has been shared, there have been increasingly high profile stories of ethical issues involving the data underlying publications. A few recent examples of this include:
- The New England Journal of Medicine retracted an article covering the impact of certain heart medications on people with COVID-19, because they were unable to independently verify their data set (retracted article link). Three of the same four authors also retracted an article in The Lancet which reported that the antimalarial drug hydroxychloroquine might be dangerous to patients with COVID-19 for the same reason (retracted article link).
- In Proceedings of the Royal Society B, a paper was retracted due to “duplications…[in the data which]…cannot be adequately explained nor corrected,” which ultimately changed the findings of the paper (retracted article link).
The Committee on Publication Ethics (COPE) has provided guidance and workflows to assist all publishing stakeholders with research integrity issues around journal publications for many years. However, the resources around research data were limited, and data repositories had not been involved in their creation. This led to the formation of a joint working group between FORCE11 and COPE to create recommendations, workflows and resources specifically around the handling of research data from the viewpoints of repositories and publishers. The group members represented a range of stakeholders, including institutions and learned societies, who all contributed to the creation of the project outputs, even though the outputs eventually focused on actions for data repositories and journal publishers.
Without going through all the recommendations, can you summarize some of the key elements?
As the range of ethical issues that could arise is very broad, the working group agreed upon some thematic areas on which to focus: Authorship and contributorship; Legal and regulatory concerns; Rigor; and Risk. The recommendations provide guidance on how both unpublished and published datasets should be updated or corrected when concerns are raised. A second key element is how hosts of related research objects (for example, the publisher of a journal article and the repository that hosts the related data set) should communicate when ethical cases arise.
One of the main outputs of the working group has been the creation of a series of flowchart diagrams. For those who are used to working with COPE guidance these should be encouragingly familiar. There are workflow documents for each of the four areas and these will be available in the coming weeks.
Who are these recommendations aimed at? Who is best in a position to adopt them?
The guidelines contain guidance and best practice actions to be taken by data repositories and publishers. Many of the actions focus on lines of communication between these two stakeholders when a concern is raised with either party and either the data repository or the publisher needs to take some kind of action. There are also points where the guidelines suggest that other stakeholders be involved, for example, academic institutions.
The guidelines have consciously been written so they can provide useful guidance to publishers and repositories of all sizes, regardless of the level of support or resource available. While some publishers may have large teams handling issues around research integrity, smaller publishers may only be able to allocate part of a person’s time. The same is true of repositories, many of which have resources that are much more stretched. It is hoped the recommendations and the workflow documentation will make it easier for all groups to take action when ethical issues arise.
The cultural shifts within various disciplines and institutions seem the biggest challenge to some of the group’s recommendations — did they consider addressing the foundational changes needed to shift researchers to revise and/or take up new data management practices?
While the group that came together to work on this project would all support increased uptake of data management practices by researchers, shifting author behavior around sharing of data was not the primary aim. When considering the data lifecycle (such as the JISC RDM model), the discussions of the group were primarily focused on the publication and sharing of data, that is, the point at which data have been submitted to a repository. While it is accepted that integrity issues are often highlighted and resolved as part of the peer review process, data are not commonly reviewed. Therefore the aims of the group focused on increasing communication and developing best practices between the various stakeholders in the publication process.
Since the recommendations have been published, what has been the reaction by the community?
So far the recommendations of the group have been shared via the FORCE11 website and the COPE community, promoted by RDA and ISMTE, presented by working group members at the World Congress on Research Integrity and at CNI’s Fall Member meeting, and described in a publication in PLOS Biology. While most of the feedback received has been anecdotal, it’s been very supportive. With the publication of the supporting flowcharts, the group encourages the community to further endorse, adopt and implement the recommendations. We appreciate that comments will mainly crystallize as the recommendations are put into practice, so we want to hear from stakeholders as they use them to guide their responses to ethical cases. We welcome feedback or commentary on these outputs via working group members.
What are your next steps regarding this effort?
The publication of the recommendations and the forthcoming publication of this set of flowcharts mark the end of the main active phase of the working group. The biggest step now is mass adoption by journals, repositories, and institutions. As we see broader uptake, we will continue to promote the opportunity for the community to send feedback on these outputs. As feedback is received, the group can reconvene to update these resources as data practices develop in the future.
Finally, it is worth pointing out that not all of the points raised in the discussions on the topic areas are included in the outputs. A range of questions and topics were raised that need further discussion among the community, such as terminology for when datasets are removed or retracted. More ideas were shared by the group chairs in this paper published in PLOS Biology.
As joint leaders of the project, we want to thank the members of the group for their input and for their time in attending meetings during the project. We encourage the broader community to engage with the outputs and let us know how useful they are in guiding ethical discussions of data sharing.