Some journals have strong data policies. Others do not.
For the first group, authors are required to deposit a copy of their dataset (or supporting evidence, like images) in a third-party repository committed to maintaining the integrity of the original files. Other journals merely require authors to include a data statement in their paper describing what they promise to do. Journals with even weaker policies state that authors must maintain their own data and make it available to others on request, but take no active role in mediating that transaction or in holding authors accountable when they shirk their responsibilities.
This is a tale of a journal with a very weak data policy and what happens when a critical reader tests the limits of that policy.
The journal in question is JASIST, the official organ of the Association for Information Science and Technology. This is not some new title with an august-sounding name, but a staid journal for researchers and librarians working in information science.
If any journal were going to take data integrity seriously, it would be JASIST. Only, it doesn't. From their author guidelines, under "Data Sets, Source Code, and Version of Record":
Authors are expected to maintain a version of the data set or software code used to support any published work (the data set of record) in case there are subsequent concerns raised regarding analysis or data, following the guidance of the Committee on Publishing Ethics (COPE). Authors who chose to make their data sets or code public take on this valuable responsibility: The Journal provides no hosting or curation support for data or code.
The policy states that authors are fully responsible for maintaining their data. But it isn't an enforceable policy, because it uses the term "expected" when it could have easily used "required," and it gives the editor no tools to take action when expectations are not met. It may have been easier to just write: "We take no responsibility for the underlying evidence supporting claims printed in our journal." Let's not deceive the readers.
So I shouldn't be surprised at how a story about a problematic paper published in the pages of JASIST played out.
In my Scholarly Kitchen post, “bioRxiv and citations,” I took issue with the data analysis, reporting, and interpretation of a paper claiming that bioRxiv increases citations in published papers. More importantly, I questioned the validity of the results because the authors’ dataset included a column of zeros where non-zero numbers should have been.
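An all-zero column is the kind of defect a trivial sanity check catches before publication. Here is a minimal sketch of such a check, assuming the dataset is a CSV readable by pandas; the filename is hypothetical:

```python
import pandas as pd

# Hypothetical filename for the supporting dataset.
df = pd.read_csv("supporting_dataset.csv")

# Flag any numeric column that is entirely zero -- a strong hint
# that a variable was never actually populated.
all_zero = [c for c in df.select_dtypes("number").columns if (df[c] == 0).all()]
print("suspect all-zero columns:", all_zero)
```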
After several failed attempts to contact the authors of the paper for answers, I contacted the Editor-in-Chief (EiC), a professor at the Syracuse University iSchool, who advised me to submit a Letter to the Editor. That was back in June 2024.
My letter went out for peer review, and when it returned, I was asked to make some minor revisions. Before resubmitting, I went back to the supporting dataset and noticed that it had been changed: the column of zeros had been replaced by a column of plausible numbers. The authors were able to alter their dataset because it was sitting in a personal Microsoft OneDrive account and not in a third-party data archive.
I alerted the EiC, and that’s where the lawyers were called in. Okay, I don’t know if they were lawyers, but the case was bumped up to Wiley’s Integrity Assurance & Case Resolution Team. They seemed to act like lawyers, because my letter (now on version 2) went nowhere.
Wiley wanted the EiC to issue an Expression of Concern. Later, they changed their minds. In the meantime, the principal investigator of the paper wrote me directly and asked me to withdraw my letter. In a long appeal, he wrote:
…this paper represents Ms. Hongxu Liu’s first publication, and she is nearing the completion of her PhD. A published letter expressing concerns on her research will have tremendous impact on Ms. Liu’s career prospects. I agree the paper is flawed, but Ms. Liu had done honest research and there was no misconduct involved. You would understand the importance of reputation in academia, especially for young researchers. I am deeply worried that this will become a huge setback for Ms. Liu to pursue a research career and years of training will be wasted.
Forget about the problems in the paper and the post-publication alteration of the dataset. Forget that I was ignored repeatedly by the authors and that writing a letter was my only course of action. Forget that the PI agrees the paper is flawed, and that these flaws were missed in several rounds of editorial and peer review. Forget that the editor asked me to submit a letter. I was the bad guy here. I should show some compassion. Shame on me.
After prodding the EiC again, I got another response. The editor was finally willing to publish my letter. I just had to do one small thing — remove one sentence from my Letter. This is the sentence:
Following my correspondence with the JASIST Editor-in-Chief, the authors modified their supporting online dataset, distorting the published Version of Record.
To me, this is the statement that carries the most weight. While authors and readers can debate all they want about how to build and interpret regression models, no one can debate whether the authors modified their dataset after publication. And if they can alter their supporting evidence post-publication, then they have distorted the Version of Record. Ponder the implications for a minute: if authors can simply alter their data after someone questions their work, it opens a Pandora's box of integrity issues. There is a reason why good journals have strong data policies.
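This is also part of why archival deposit matters: a repository that records a cryptographic checksum pins the dataset of record, making any later alteration detectable by anyone. A minimal sketch of the idea, with a hypothetical filename:

```python
import hashlib

def sha256_of_file(path: str) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical usage: publish this digest alongside the paper. If the
# hosted file is later edited, its digest will no longer match.
print(sha256_of_file("supporting_dataset.csv"))
```

A file on a personal OneDrive account offers no such anchor, which is exactly what made the silent swap possible here.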
By asking for the above sentence to be removed, the EiC acknowledged that the authors violated the journal's policy by altering "the data set of record." And if they did alter the published record, isn't that grounds for retraction? I've been wondering why the EiC would want me to remove this sentence if he is only publishing a Letter and the Authors' Response. The only explanation that makes any sense is that, by deleting the Version of Record statement, the question of retraction vanishes altogether. The entire issue gets reduced to a simple misunderstanding between the authors and a reader.
Judging from the stories reported in Retraction Watch, long, drawn-out tales like mine, involving editors, corporate lawyers, and ultimatums that water real integrity concerns down to milquetoast, may be a common experience for those who spend their time trying to do the right thing. It makes me wonder why I spent all this time in the first place. I'm sure others feel the same way.
In the end, I’m not willing to compromise my own integrity because Wiley’s Integrity Assurance and Case Resolution Team asked me to. So, I told the editor that the sentence needed to stay in.
I'm still awaiting a response on my letter, now on version 3.
Discussion
As I wrote back in 2016, “Without effective monitoring and enforcement, the policy becomes an empty promise.”
https://scholarlykitchen.sspnet.org/2016/01/13/what-price-progress-the-costs-of-an-effective-data-publishing-policy/
Good for you. Hold strong!
If there was a "clerical" error in the data set as posted that did not influence the published paper, they can establish that and annotate the documents accordingly. But if the data omissions impacted the paper's results, all should be retracted. And shame on all concerned for trying to cover this up.
Thanks Phil. If it helps, COPE and Force11 have published guidelines on handling data publication ethics, including corrections, so there is "official" guidance for editors to refer to or be reminded of. See https://scholarlykitchen.sspnet.org/2022/10/20/force11-cope-release-recommendations-on-data-publishing-ethics/ Basically, the correction, as with article corrections, should list what was corrected and why. But the larger point is that journals should require deposits, ideally in the most relevant repository.
Thanks Phil, this is an interesting case you flag up. Have you discussed or heard more on this from COPE?
A couple of tangential comments:
figshare does allow versioning of deposited data; if updates are carried out, there is a clear audit trail (I am not saying it's right to change data once published, just noting that the original dataset would still be visible alongside the new one). Is versioning of datasets OK?
I do know a number of funders don't approve GitHub as a valid repository for data/code deposits on its own, because of the lack of persistent identifiers and because the data or code can be changed. DataSeer.ai helps do these compliance checks and recommends authors place a copy of the data/code in a pre-approved repository, depending on the topical nature of the research (GitHub can be linked to Zenodo, for example).
It does seem like you had to spend a lot of time on this, and the system isn’t working optimally – hope there are lessons learned here, appreciate you sharing the tortuous experience.
“Have you tried COPE?” is a question I have asked before but one I never ask seriously now.
COPE’s team can help you get a reply from an editor or publisher but that’s about it. To my knowledge they have never actually done anything. I would still do it because I have a hard head and like brick walls, but I don’t expect much.
Altering a dataset should be grounds for retraction. In examining a recent article (Chen et al, NEJM, 2023, Issue 3), I noticed "The data were Winsorized at the 95% level, to limit the influence of outliers". This kind of data involved no outliers (bounded responses), so that made no sense. I did get a letter published, and they had to re-analyze the data, with no important changes. That's not really the point. The point is that they were so sloppy, careless, and inexperienced with professional data examination that they felt this Winsorization (rolling high values down to the 95th-percentile value) showed a sophisticated and well-educated data analysis view, when it actually showed that they were complete ignoramuses in data analysis.
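For readers unfamiliar with the term, here is a minimal sketch of what 95% Winsorization does; the data and numbers are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unbounded measurements with a heavy right tail,
# where Winsorization is at least plausible.
x = np.concatenate([rng.normal(100, 10, 95), [250, 300, 410, 520, 990]])

# Winsorizing at the 95% level: roll every value above the
# 95th-percentile value down to that value.
cap = np.percentile(x, 95)
x_w = np.minimum(x, cap)

print(f"cap = {cap:.1f}, values altered: {(x != x_w).sum()}")

# On bounded responses (say a 1-7 scale), the 95th percentile sits at
# or near the top of the scale, so there is no tail to limit -- the
# commenter's point that the procedure made no sense for such data.
```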