For many years, consumers and users of scholarly metadata have been spectators – reduced to berating their screens or the universe at large when the recorded metadata for an article clearly does not match its contents.
A new initiative – COMET, or the Collaborative Metadata Enrichment Taskforce – seeks to change that dynamic: what if users could improve metadata so that it’s less wrong? I sent some interview questions over to two of the organizers (Adam Buttrick and Juan-Pablo Alperin) to find out more.
How did the COMET initiative come about? What problem are you trying to solve?
The Collaborative Metadata Enrichment Taskforce (COMET) emerged from a series of community discussions, beginning in earnest at the FORCE11 conference in Los Angeles and the Paris Conference on Open Research Information. These conversations focused on resolving a key tension: while all of our work demands complete, high-quality PID metadata, our models and methods for producing this metadata are themselves incomplete.
To give an example: DOIs currently operate on what we could call a “push” model, where the original creator deposits metadata with a DOI registration agency, such as Crossref or DataCite, and is then solely responsible for its upkeep. This means that, even if others in the community discover errors or improvements, only the original depositor can make those changes. If the depositor lacks the ability to make these improvements, the metadata remains incomplete or incorrect. As a result, the community often works outside the DOI ecosystem to correct, complete, and enrich DOI metadata, but these downstream improvements end up scattered across different services and platforms, making it hard for everyone to see and benefit from this work.
COMET is trying to close these gaps through a new model where, instead of siloed and fragmentary enrichment work, improvements to metadata can be made openly available, shared across the community, and used to create more robust and complete PID metadata records. This collaborative approach would eliminate redundant work across organizations and ensure the sustainability of enrichment efforts beyond any single service’s lifespan, while also maintaining the integrity and consistency of PIDs through coordinated, transparent improvements.
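To make the model a little more concrete, below is a minimal, purely illustrative sketch of what a shared enrichment assertion might carry; the structure, field names, and values are assumptions made for illustration, not anything COMET has specified.

```python
# A purely illustrative sketch of a shared metadata enrichment record;
# the fields and values below are assumptions, not a COMET schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class EnrichmentAssertion:
    pid: str               # the identifier being enriched, e.g. a DOI
    metadata_field: str    # the part of the record the assertion addresses
    proposed_value: str    # the corrected or completed value
    asserted_by: str       # the organization or service making the assertion
    evidence: str          # how the proposed value was derived
    asserted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


# Hypothetical example: filling in a missing license URL for a DOI record.
assertion = EnrichmentAssertion(
    pid="10.1234/example-doi",               # placeholder DOI
    metadata_field="license",
    proposed_value="https://creativecommons.org/licenses/by/4.0/",
    asserted_by="example-curation-service",  # placeholder contributor
    evidence="license statement on the article landing page",
)
print(assertion)
```

Because such assertions sit alongside, rather than replace, the original deposit, anyone consuming the record could see both the depositor’s metadata and the community’s proposed improvements.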
As for whose problems these are: creators and consumers of PID (persistent identifier) metadata are already dealing with these challenges and working to resolve them! As such, we like to think of this more as an opportunity to acknowledge that reality, build upon past efforts, and rethink our current models so they better align with the community’s needs.
Why now? Is the metadata situation more urgent than before, or is the time right for this initiative?
Initiatives like the Barcelona Declaration and the community curation work being done in services like ROR and OpenAlex have really galvanized the community, encouraging it to rethink the status quo of how PID metadata is both produced and maintained. This urgency is also articulated in the Barcelona Declaration’s roadmap itself, where a key objective is community control over metadata. That objective obviously has many dimensions, but from our perspective it emphasizes that the creation of the scholarly record is a collective endeavor requiring collective stewardship, rather than a burden to be managed by any one individual or organization. What we think this also entails is doing our best to make sure PID metadata reflects the sum total of the community’s investment in it, rather than being enclosed or unduly constrained.
More generally, as the total volume of scholarly outputs and their associated metadata grows in size, so too do the challenges posed by gaps and errors in PID metadata. There is thus no better time than now to tackle them! We are fortunate here that the adoption and use of PIDs has coincided with many organizations independently developing sophisticated workflows to improve this metadata. The moment we are in now is thus one where we can harness these parallel efforts into that collective stewardship, leveraging all of this work to address metadata challenges systematically, rather than in isolation.
How will the initiative work? What’s step 1?
Bringing the community together, moving from those initial discussions to the formation of COMET, was the first step. We convened a series of structured listening sessions, where taskforce members from across the global scholarly communications ecosystem came together to define what a collaborative metadata enrichment infrastructure would entail — from use cases, to product considerations, to governance and technical requirements. The depth of insight and level of engagement people brought to these sessions was truly remarkable. We did our best to capture all of this in detail in the group’s outputs, which we would encourage everyone to review.
The next step is to make the vision developed within COMET a reality. This is why we have now published the Community Call to Action, where we are asking those willing to contribute resources towards building this infrastructure to come forward and express their interest.
Is there a risk that engaging with many diverse stakeholder groups will lead to the initiative getting bogged down resolving multiple conflicting priorities?
These are challenges faced by every initiative in our space, and while we shouldn’t underestimate them, there are ways to manage them: organize and scope the work around creating real and immediate value for as many people as possible, identify what works and what doesn’t based on those initial attempts, and then iterate on that basis to address the full extent of needs. This is the approach we have taken in our other, successful projects, including our work on ROR and PKP.
COMET is also unique in that rather than viewing diverse stakeholder needs as a risk factor, it treats this diversity as essential to its success. The problem that DOI metadata and similar systems currently face is the inverse: they’re designed around a single-source-of-truth model that fails to reflect how this metadata is actually consumed and improved by others to create a comprehensive and accurate scholarly record. In more practical terms, a community enrichment model also benefits from these multiple perspectives, as independent validation from different sources helps fill gaps, correct errors, and establish confidence in the quality of any improvements. For example, if multiple sources agree on how a given gap or error in a record should be addressed, that agreement is a good signal that the fix is correct.
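As a rough illustration of that agreement signal (our own sketch, not anything COMET has implemented), one could treat a proposed fix as well supported once enough independent sources converge on the same value:

```python
# A minimal sketch of using agreement between independent sources as a
# confidence signal for a proposed metadata fix; the threshold and inputs
# are illustrative assumptions, not COMET's actual method.
from collections import Counter


def agreement_signal(proposed_values, min_sources=2):
    """Return the most commonly proposed value and whether enough
    independent sources agree on it to treat the fix as well supported."""
    counts = Counter(proposed_values)
    value, votes = counts.most_common(1)[0]
    return value, votes >= min_sources


# Hypothetical example: three services propose a license URL for the same DOI.
proposals = [
    "https://creativecommons.org/licenses/by/4.0/",
    "https://creativecommons.org/licenses/by/4.0/",
    "https://creativecommons.org/licenses/by-nc/4.0/",
]
best, supported = agreement_signal(proposals)
print(best, supported)  # prints the CC BY URL and True
```

In practice a real system would weigh sources by provenance and track dissenting assertions rather than simply counting votes, but the basic intuition is the same: independent agreement raises confidence.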
Is there a financial model for the initiative?
Through our Community Call to Action, we will identify which organizations are interested in contributing and in what ways. We expect to blend direct funding, in-kind support, and other resource contributions, with the final model shaped by the input we receive from our community.
Who will ‘own’ the improved metadata?
Our backgrounds are in building open source software and organizing ourselves around things like the Principles of Open Scholarly Infrastructure, so the focus of this work has been on creating community and opportunities for collaboration, rather than ownership in any traditional sense. It was strongly conveyed to us in the context of the taskforce that the enriched metadata should be freely available to the entire community, which only reaffirmed these prior commitments.
Discussion
I’m not sure I’d want a separate organization tinkering with (enriching) the metadata of the articles we (Canadian Science Publishing) publish without the publisher’s and maybe the author’s consent. I think ensuring an article is tagged with the correct metadata is the publisher’s responsibility. For example, metadata that indicates copyright information.
Maybe I’m missing something or haven’t understood the aims of COMET properly?
COMET isn’t aiming to overwrite or modify publisher metadata. Instead, it’s designed as an assertion store: a place where the community can contribute known, improved versions of PID records. This allows publishers, platforms, and other stakeholders to leverage higher-quality metadata if they choose to.
I was one of the conveners of the taskforce and we fully recognized that some publishers, like Canadian Science Publishing, already do an excellent job managing their metadata, but not every publisher has the same level of resources. Some face constraints due to bandwidth limitations or reliance on third-party vendors. Our goal is to create pathways for better metadata to reach the broader community, not to impose a single model on everyone. Participation is entirely optional, and if a publisher prefers not to engage, they don’t have to.
I’d love to continue the conversation and clarify any concerns—happy to chat further!