Guest Post - How Identifiers Can Help Publishers Do a Better Job of Curating the Scholarly Record

Editor’s Note: Today’s post is by Richard Wynne. Richard is the founder of Rescognito, an open platform for research contribution recognition. Richard also serves as a Strategic Advisor to deepPath.aI and Cactus Communications.

If you attend a scholarly publishing conference, you’re likely to hear comments along the lines of: “the problem with ORCID is that it’s full of junk. Too many fake Albert Einsteins and Mickey Mice”. This quip makes it psychologically safe for publishing executives to either dismiss ORCID as too contaminated to be useful, or to lament ORCID’s inability to authenticate identity.

But expecting ORCID to proactively police identity is completely unrealistic and (arguably) outside its mission. The real value of ORCID is uniqueness and persistence. From a publisher’s point of view, the fact that ORCID does not guarantee authenticity is not a problem but an opportunity to add value. Layering metadata on top of a unique and persistent foundation is a fantastic way to improve their offerings and to build new, useful solutions.

Let’s consider a use case related to scholarly retractions.

Background

Scholarly journal editorial practices are the subject of growing scrutiny. Severe reputational damage occurs when publishers are perceived as overlooking best practices in peer review and editorial integrity. Any doubt about this disappeared when hundreds of millions of dollars were wiped off the value of Wiley following their announcement of bulk retractions of Hindawi articles.

Historically, best practices in peer review revolved around the evaluation of publication content. This approach remains an important aspect of journal practice. However, the emergence of generative AI, ever more complex content, and limited editor time mean that deriving “quality signals” from content alone is increasingly sub-optimal.

For this reason, publishers should explore “quality signals” systemically derived from researcher identity and metadata associated with identity.

Example:

The fact that an author’s work was previously retracted for alleged research malpractice (such as image manipulation) should provide an informative “quality signal” to a would-be publisher suggesting that the author’s work deserves a higher level of scrutiny. In other words, while a previously retracted author should never be outright barred from publication, publishers should never make editorial decisions without being aware of the author’s history.

Where ORCIDs are in use, this association can be made with a high degree of certainty. In the example below ORCID ID 0000-0001-6205-3317 was associated with the MDPI manuscript published on November 3, 2023 and with a manuscript previously retracted by RSC Advances on August 24, 2022:

0000-0001-6205-3317 published on November 3, 2023 by MDPI AG in ‘Rapid Photocatalytic Activity of Crystalline CeO2-CuO-Cu(OH)2 Ternary Nanocomposite’ may be subject to a retraction or expression of concern as:

● 0000-0001-6205-3317 published in 10.1039/c7ra11763a “Retracted Article: Anti-cancer activity of hierarchical ZSM-5 zeolites synthesized from rice-based waste materials” which was subject to a Retraction notice on August 24, 2022 at: 10.1039/d2ra90079c

On the other hand, where ORCIDs are not in use, a similar connection can only be speculation based on name and institutional string matching:

FA Essa possibly from Kafrelsheikh University, EG published on November 6, 2023 by Frontiers Media SA in ‘Thermal and entropy behavior of sustainable solar energy in water solar collectors due to non Newtonian power law hybrid nanofluids’ may be the subject of a retraction or expression of concern as:

● FA Essa possibly from Kafrelsheikh University, EG possibly published in ‘RETRACTED Solar still with condenser A detailed review’ with Retraction notice June 2016: 10.1016/j.rser.2016.01.020

● FA Essa possibly from Kafrelsheikh University, EG possibly published in ‘RETRACTED Thermal analysis of an annular fin under multi boiling heat transfer coefficient using differential transform method with Pade approximant DTM Pade’ with Retraction notice in July 2023: 10.1177/09544089231188713
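The difference between the two kinds of association above can be sketched in a few lines of Python. This is only an illustration, not the method used to produce the examples: the `Author` record, the `normalize` helper, and the three confidence labels are all assumptions made for the sketch, and the author and institution names are placeholders.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Author:
    """Hypothetical author record; field names are assumptions for this sketch."""
    name: str
    institution: Optional[str] = None
    orcid: Optional[str] = None


def normalize(text: str) -> str:
    """Lowercase, drop accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    chars = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # skip accent marks split off by NFKD
        chars.append(ch.lower() if ch.isalnum() else " ")
    return " ".join("".join(chars).split())


def match_confidence(new: Author, retracted: Author) -> str:
    """Return 'definite', 'possible', or 'none' for a new author vs. a retracted one."""
    if new.orcid and retracted.orcid:
        # With identifiers on both sides the comparison is exact.
        # Simplification: differing ORCIDs are treated as different people.
        return "definite" if new.orcid == retracted.orcid else "none"
    same_name = normalize(new.name) == normalize(retracted.name)
    same_institution = (
        new.institution is not None
        and retracted.institution is not None
        and normalize(new.institution) == normalize(retracted.institution)
    )
    # Without an identifier, a string match is only ever a guess.
    return "possible" if same_name and same_institution else "none"


# Placeholder names; the ORCID is the one quoted in the example above.
with_orcid = Author("Example Author", "Example University", "0000-0001-6205-3317")
retracted_with_orcid = Author("E. Author", "Another Institute", "0000-0001-6205-3317")
by_name_only = Author("Example Author", "Example University")
retracted_by_name = Author("EXAMPLE  AUTHOR", "Example University.")

print(match_confidence(with_orcid, retracted_with_orcid))
print(match_confidence(by_name_only, retracted_by_name))
```

The identifier comparison survives changes in name spelling and affiliation, while the string comparison depends entirely on how both records happen to be written.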

Publication of previously retracted authors is not rare. Every day, hundreds of previously retracted authors are indexed in Crossref with new publications, but unfortunately only a small proportion can be definitively identified using ORCID IDs. Name and institutional string matching, however, suggests a much higher level of retracted author publication.

Subject to the limitations of methodology, the results below show the approximate number of instances where a previously retracted author has been indexed in Crossref during a sample period (the first week of November 2023):

	Total Articles	Instances of retracted author publication based on ORCID ID	Instances of retracted author publication based on name and institution string match
Informa (T&F)	6,399	4	57
Springer Nature	10,701	22	407
Wiley	6,450	23	175
Elsevier	77,409	172	3,142
Wolters Kluwer	1,970	1	66
MDPI	4,333	25	121
Frontiers	1,408	0	70
Sage	1,440	1	18
Cold Spring Harbor Laboratory*	1,320	15	46

*publishers of bioRxiv and medRxiv

(supporting data available from Richard Wynne)
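Because the publishers differ so much in volume, the raw counts are easier to compare as rates. The short Python sketch below simply re-expresses the table's own numbers per 1,000 articles; the variable and function names are illustrative, and the string-match column may of course include false positives.

```python
# (publisher, total articles, instances by ORCID ID, instances by name and institution match)
ROWS = [
    ("Informa (T&F)", 6399, 4, 57),
    ("Springer Nature", 10701, 22, 407),
    ("Wiley", 6450, 23, 175),
    ("Elsevier", 77409, 172, 3142),
    ("Wolters Kluwer", 1970, 1, 66),
    ("MDPI", 4333, 25, 121),
    ("Frontiers", 1408, 0, 70),
    ("Sage", 1440, 1, 18),
    ("Cold Spring Harbor Laboratory", 1320, 15, 46),
]


def per_thousand(instances: int, total: int) -> float:
    """Instances per 1,000 indexed articles."""
    return 1000 * instances / total


print(f"{'Publisher':<30} {'ORCID':>7} {'Name':>7}")
for publisher, total, by_orcid, by_name in ROWS:
    print(f"{publisher:<30} {per_thousand(by_orcid, total):7.1f} {per_thousand(by_name, total):7.1f}")

# Column totals across the nine publishers in the sample.
total_articles = sum(row[1] for row in ROWS)
total_by_orcid = sum(row[2] for row in ROWS)
total_by_name = sum(row[3] for row in ROWS)
print(f"{'All nine publishers':<30} "
      f"{per_thousand(total_by_orcid, total_articles):7.1f} "
      f"{per_thousand(total_by_name, total_articles):7.1f}")
```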

In other words, failure to comprehensively adopt ORCID iDs makes it cumbersome for publishers to know when they are re-publishing authors who were the subject of prior retractions, meaning that authors with a track record of research malpractice are continuing to contribute to the scholarly record. Bad apples are being tossed back into the barrel at an alarming rate.

Conclusion

Curating identity does not come naturally in a trust-based publishing culture where editors are expected to vouch for their authors, reviewers, and the integrity of their work. But historically valid editorial practices do not scale in a modern, global, open access, AI publishing context.

More than 10 years since the foundation of ORCID, most scholarly authors of newly published manuscripts are still only identified by a text string rather than by a unique and persistent identifier (e.g., based on Crossref data, on November 2, 2023, only 10,542 of the 40,883 authors in 8,426 research articles had ORCID IDs). Understandably, publishers are reluctant to assume the cost of collecting additional metadata, especially for co-authors, but as I outlined in a prior Scholarly Kitchen post, such costs arise more from antiquated workflow practices than from author reluctance.
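For readers who want the November 2 figures as proportions, the arithmetic is shown below in Python. It uses only the three numbers quoted above; the variable names are illustrative.

```python
authors_with_orcid = 10_542
total_authors = 40_883
research_articles = 8_426

# Share of authors identified by an ORCID ID rather than a text string.
orcid_share = authors_with_orcid / total_authors

# Average number of authors per research article in the sample.
authors_per_article = total_authors / research_articles

print(f"Authors with an ORCID ID: {orcid_share:.1%}")
print(f"Authors per article: {authors_per_article:.2f}")
```

In other words, roughly one author in four carried an identifier, in a sample averaging just under five authors per article.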

Publishers have had such durable and valuable brands that until recently, ineffective curation of researcher identity has not mattered much in economic terms. But now, the community seems much more sensitive to how well publishers perform this core function; and new markets could emerge for publishers who view this as an opportunity.

Discussion

12 Thoughts on "Guest Post — How Identifiers Can Help Publishers Do a Better Job of Curating the Scholarly Record "

It’s a nice idea, but if it were actually adopted, offenders would respond by creating new Orcid IDs every time an article associated with their old one was retracted. Then you’d be back to matching up names, except now you’ve also ruined the entire purpose of Orcid IDs along the way.

By Melissa Belvadi
Feb 13, 2024, 8:28 AM

Thank you for the comment. Of course, no solution is perfect. An author who repeatedly creates new ORCID IDs not only disjoints their scholarly record but also leaves “breadcrumbs” that signal greater integrity risk. For example, in several of the Hindawi manuscripts retracted by Wiley (e.g. https://doi.org/10.1155/2022/7100238), all the authors had newly minted ORCIDs that were created within a few minutes of each other 16 days before acceptance. This is a pattern that can be mechanically detected and should lead to heightened scrutiny in editorial workflow.

By Richard Wynne
Feb 13, 2024, 9:26 AM
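The pattern described in the reply above (every author's ORCID created within minutes of the others, shortly before acceptance) could be checked mechanically along the following lines. This is a rough Python sketch under stated assumptions: it presumes the record creation timestamps and the acceptance date have already been obtained, the function name and both thresholds are hypothetical, and the dates in the usage example are invented for illustration rather than taken from any real manuscript.

```python
from datetime import datetime, timedelta
from typing import Sequence


def looks_batch_created(
    created: Sequence[datetime],
    accepted: datetime,
    window: timedelta = timedelta(minutes=10),   # assumed threshold
    recent: timedelta = timedelta(days=30),      # assumed threshold
) -> bool:
    """True if all records were created close together and shortly before acceptance."""
    if len(created) < 2:
        return False  # a single record cannot form a cluster
    earliest, latest = min(created), max(created)
    clustered = latest - earliest <= window
    newly_minted = timedelta(0) <= accepted - latest <= recent
    return clustered and newly_minted


# Invented timestamps: three records created five minutes apart, 16 days before acceptance.
suspicious = [
    datetime(2022, 3, 1, 9, 0),
    datetime(2022, 3, 1, 9, 3),
    datetime(2022, 3, 1, 9, 5),
]
# Invented timestamps: records created years apart.
ordinary = [
    datetime(2015, 6, 2, 14, 0),
    datetime(2018, 11, 20, 8, 30),
    datetime(2021, 1, 5, 17, 45),
]
acceptance = datetime(2022, 3, 17)

print(looks_batch_created(suspicious, acceptance))
print(looks_batch_created(ordinary, acceptance))
```

A positive result would be a prompt for heightened scrutiny in the editorial workflow, not evidence of misconduct on its own.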

Thank you for this insightful post, which eloquently describes why identity verification is about to shoot up everyone’s priority list. I completely agree we should all be insisting on ORCID identifiers for all our authors, co-authors, reviewers and board members, and urge everyone in publishing to move towards this as fast as they can. But it’s not a panacea – far from it.

I think ORCID is the best placed in our industry to respond to the need for some kind of identity verification. We’ve all seen and heard the stories about completely fake profiles and identity being misappropriated. As soon as papermills know we require an ORCID for every author, they’ll start creating them on authors’ behalf (in fact they’re almost certainly doing this already). If FA Essa has been retracted with their 0000-0001-6205-3317 ORCID, there is nothing stopping them setting up another completely new ORCID and using that on future submissions. ORCIDs are helpful, but far from foolproof. It’s not enough just to know the ORCID history – we still need to look at the name history too, so this isn’t really helping much.

I’m firmly behind ORCID as the most useful disambiguation service out there, but it could be so much better. The Trust Markers project is a really good start, I just think there’s more to be done, and ORCID has the best chance of doing it.

In short, data out is only as good as the data in.

By Kim Eggleton
Feb 13, 2024, 8:56 AM

+1 on the trustmarker project.

ORCID on its own can’t solve identity verification, because it’s too easy to create a new one if no contextual value is assigned to it. To be effective, there needs to be a network of entities that are connected with assertions and, importantly, who makes those assertions matters to how trustworthy they are. For example, if a funder says a researcher was awarded a grant, that’s a more trustworthy assertion than the researcher saying so. Likewise, if a publisher says that somebody published a peer-reviewed article with them that’s been cited many times, that’s better than the author claiming to have written something really important.

The lesson of the reproducibility crisis and the recent retraction crisis is that the solution to the problem of research integrity needs to be more sophisticated than a single point of enforcement.

By Phill Jones
Feb 13, 2024, 1:00 PM

Or, if getting an ORCID was as formal/regulated as getting a bank account (which I see no reason why it shouldn’t be in principle – accepting all the cost and complexities of doing so), it could go such a long way to solving identity verification, and cement its place in the scholarly infrastructure.

I know this isn’t an easy fix, but we have a big complex problem here – I suspect the solution(s) will need to be the same.

By Kim Eggleton
Feb 13, 2024, 1:58 PM

Or consider what would happen if your list of papers on ORCID could only be populated by the publishing journal, using the ORCID iD supplied at the time of publication (basically the assertion Phill mentions above). If you had a paper retracted and you switched to a new ORCID iD, then all of your past papers would not be associated with that iD, and you’d essentially be starting your career from scratch. Perhaps useful for someone with lots and lots of shady behavior (and subsequent retractions) but maybe not such a good look for someone far along in their career to be publishing their first paper at such a late stage.

By David Crotty
Feb 13, 2024, 2:07 PM

Good point, Phill. There are readily available tools to improve reproducibility. For example, scite can make retraction data available to publishers (and the public) at the article level. They also have a manuscript reference check, which can be used by reviewers to screen before accepting a paper, thus avoiding citing a retracted paper in a newly published article. Working grant information into this would strengthen the veracity of the tool. Integrity is the last bastion of Science. It must be defended, even if it means admitting to having let down our guard in the past.

By Sharon Mattern Büttiker
Feb 15, 2024, 1:06 PM

This is an interesting idea that deserves much more consideration. Your example of a retraction is a limited example of a “quality signal,” but let’s continue with this. Suppose this retracted paper had 10 authors, for three of whom it was their first publication. Such an association with a retraction could essentially put a stop to their academic careers if future editors see a flag associated with their name. You may counter that editors should use such a flag to do more research about these authors, but in the absence of a detailed narrative about why they were associated with a retracted paper, an editor of a selective journal may simply discount ever publishing papers from flagged authors, especially those without long publication histories.

In the US, employers are able to ask whether potential employees were ever convicted of a crime, however minor. Such power of that simple check-box to deny employment has led to a counter-movement to “ban the box” on employment application forms.

Considering this real-life analogy, is there an argument to make that protecting early-stage academics outweighs the potential of catching serial fraudsters?

By Phil Davis
Feb 13, 2024, 9:21 AM

This is an important comment Phil, thanks for raising this. Research integrity is vital and these ideas will have a place in a high-volume publishing environment. At the same time, we need to mitigate against automated checks based on assumptions that are inaccurate or discriminatory, which will result in a narrowing of the field of people who can publish, when in fact we need the opposite.

By Vicky Bache
Feb 16, 2024, 7:14 AM

Thank you for this insightful post!
From the RRID Initiative (ORCIDs for key biological resources) we whole-heartedly agree with this sentiment.

We also see far fewer problems with reagents and resources in cases where RRIDs are used.
Example:
Babic et al 2019; Meta-Research: Incidences of problematic cell lines are lower in papers that use RRIDs to identify cell lines; https://elifesciences.org/articles/41676

By Anita E Bandrowski
Feb 13, 2024, 12:22 PM

Thanks for sharing your perspectives on this, Richard. At ORCID, we agree that the ubiquitous use of PIDs such as ORCID leads to important “quality signals” you mention and can make the job of “knowing your author” much easier for editors and reviewers. We also agree that it is not ORCID’s job to validate researcher identities or act as an authority on who is or is not considered a legitimate researcher. As an infrastructure organization, this isn’t our role, and indeed we believe it would be dangerous for any single organization to take on such broad gatekeeping responsibility.

ORCID records are designed to offer something far more robust than validating identity: the ability for record holders to demonstrate their identity. ORCID records accumulate trustworthiness over time. The simple fact that a record has been used consistently over a long period provides one source of trustworthiness. Additional trustworthiness comes from our organizational members adding validated assertions to ORCID records, such as universities adding validated affiliation and education information, publishers adding validated publications, and funders adding validated funding awards. When organizations add information to ORCID records, that provenance is captured as part of the record itself, allowing organizationally validated information to be clearly differentiated from self-asserted information. We call these organizationally validated assertions “trust markers”.

This year we piloted a project to create a summary of a record’s trust markers in a way that is visually easy for editors, reviewers, or other interested parties to see, at a glance, the trust markers in that record. And like all of our data, trust markers are included in our machine-readable metadata and API responses, subject to the privacy controls of the record holder.

In this way, ORCID acts not as a single gatekeeper of researcher legitimacy, but as a clearing house of trusted information about researcher bona-fides, which is exchanged openly and transparently, and can be used as determined by the needs of any specific use case to make decisions on the trustworthiness of the record holder. This metadata is already being used in heuristics and algorithms that help weed out paper mills and other forms of academic misconduct, and we are cooperating closely with initiatives in this space such as United2Act and the STM Integrity Hub.

For more information, see our blog post at https://info.orcid.org/summarizing-orcid-record-data-to-help-maintain-integrity-in-scholarly-publishing/

By Chris Shillum
Feb 13, 2024, 2:27 PM

I’m in humanities and social science, where an ORCID identifier serves mostly for simple author identification, even if one lists works in the ORCID database. It seems that STM fields may require a system beyond that.

By Lou Mendola
Feb 17, 2024, 5:41 AM

The Scholarly Kitchen
