Much Ado About Metadata 2020!

Metadata – digital “data about data” – is arguably one of the most powerful tools available in scholarly communications. Good metadata enables discoverability and access, and (potentially) eliminates errors. But all too often we are stuck with bad metadata – incomplete, inaccurate, and out-of-date. Metadata 2020, a new initiative being launched today, aims to change all that.

Metadata 2020 is a community-led initiative, organized by Crossref in collaboration with associations, publishers, universities, and other scholarly communications organizations* globally. At its core is the belief that investing in richer metadata should be the scholarly community’s top priority, because:

Richer metadata fuels discoverability and innovation
Connected metadata bridges the gaps between systems and communities
Reusable metadata eliminates duplication of effort

The aim is to create awareness and resources for everyone involved in creating and using scholarly metadata — researchers and research organizations alike — through a community effort. To quote Crossref’s Executive Director, Ed Pentz: “Everybody in scholarly communications has a responsibility to improve metadata.”

It’s an exciting but challenging goal! Among the issues that the Metadata 2020 advisory group (of which I’m a member) have been grappling with are the scope and goals of the project; the opportunities for innovation, including automation; how to successfully engage the community; and how (and when) we will know whether we’ve been successful.

One important point to keep in mind is that, despite the name, metadata is the means not the end goal. The real prize is what we will be able to do with good/better/best metadata once it’s readily available. It’s not an exaggeration to say that it could — will! — be life-changing. We’ve already made great leaps forward in terms of discoverability and accessibility through our improved ability to connect the dots between research and researchers. It’s now possible, albeit in a somewhat limited way at present, for information about researchers, their affiliations, grants, and research outputs to flow seamlessly between systems that use persistent identifiers for people, places, and things. Imagine how much more powerful this information would be if supplemented by comprehensive, accurate, up-to-date metadata.

One major task will be defining what constitutes good metadata, or perhaps more accurately, what is good enough metadata, since (librarians cover your ears!) we may have to sacrifice completeness for detail. As one researcher interviewed by Metadata 2020 put it: “I’d rather have empty fields if the information is not available than a field that combines different kinds of examples. For me it’s really detailed information with exact definitions. Remove the ambiguity. A blank field helps us more and is more transparent.”

Of course, good (enough) metadata will never be a reality if we don’t make it easy to create. This is where we need some serious innovation. Rather than reinventing the wheel, though, perhaps we can adapt a system that researchers are already using. Manuscript submission systems seem a likely candidate; however, some authors already view them as too complex and time-consuming, so how can we ask them for even more information? Synchronization and automation will be key to solving this challenge, as will taking a community approach: collaborating around common processes and standards in order to build (sometimes competing) platforms and systems.

Last but not least, to be successful we will need to effectively engage the community in Metadata 2020. To do so, we’ve identified the following communication goals:

Raising awareness of the importance of sharing richer metadata
Providing information for the community on the role of metadata in making scholarly content discoverable
Encouraging publishers, aggregators, funders, research institutions, and service providers to make a public commitment to increase the quality of their metadata
Facilitating communication between the stakeholders to encourage collaboration
Equipping all stakeholders with tools and information

In addition, we want to use real-life stories from the community to help increase understanding of why metadata is important and gain buy-in for improving it. A great example from the initial Metadata 2020 workshop came from a Spanish researcher, who told us: “The Spanish law has an article about the open access repository. When they tried to evaluate the degree of accomplishments of the papers under Spanish funded research, they came up with a disappointing approach because the metadata wasn’t sufficient. Most of the repositories didn’t include the project number. There wasn’t a metadata attribute in the records. They couldn’t even measure properly the accomplishment of the open access deposit because we didn’t have good enough or standard metadata. That’s why I like the metaphor of agile development, you need to demonstrate the cost of low quality metadata, that is the key.”

So who is “the community”? Although the Metadata 2020 initiative is being led by Crossref, it’s not just about journals, or about books and journals; and it’s not just a publisher initiative – it’s a community initiative. Input and feedback from all parts of the research ecosystem will be critical to Metadata 2020’s success, and participation is open to all.

As it says on the website: “Who is Metadata 2020? You are!”

*My own organization, ORCID, is one of Metadata 2020’s advisors, and I am an advocate for the initiative.

Alice Meadows

I am a scholarly communications consultant with many years’ experience of both academic publishing (including at Blackwell Publishing and Wiley) and research infrastructure (at ORCID and NISO). As well as consulting independently, I also act as a consultant-at-large for Open Research Ecosystem (ORE) Consulting. I’m actively involved in the information community, and served as SSP President in 2021-22. I was honored to receive the SSP Distinguished Service Award in 2018, the ALPSP Award for Contribution to Scholarly Publishing in 2016, and the ISMTE Recognition Award in 2013. I’m passionate about improving trust in scholarly communications, and about addressing inequities in our community (and beyond). Note: The opinions expressed here are my own.

Discussion

6 Thoughts on "Much Ado About Metadata 2020!"

Good article and wonderful and ambitious initiative. At first I thought that we could improve metadata at the manuscript submission level, but I do appreciate the level of detail already requested of the author at this stage. Perhaps publishers can collect more information at the acceptance stage?

I think the question of what counts as “good enough” metadata is appropriate. We need to ask what we plan on doing with this data and start from there. There’s no use in asking for 20 pieces of info if, for example, we will only need/use 10.

I wonder if we can also include data on conflicts of interest. This data collection is weak. Keeping track of COIs and updating them continues to be an administrative nightmare for many publishers.

Looking forward to hearing more about Metadata 2020!
And yes…it is about community.

By Josephine E. Sciortino
Sep 6, 2017, 8:11 AM

Thank you for raising awareness about this new initiative. I’m especially interested in the open and reusable aspects of the initiative. Interesting to reflect on the kinds of business models that these principles may foster – and foreclose.

By Roger C. Schonfeld
Sep 6, 2017, 9:05 AM

Thanks, Alice, for this great summary and for helping to spread the word! Establishing this ethos of shared responsibility is really excellent. This is an area I think many standards bodies have struggled to articulate; it can be a challenge to generate a sense of calling or urgency in this space. But Metadata 2020’s tagline is golden and their call-to-action is most inspirational!

By Lettie Y. Conrad
Sep 6, 2017, 5:02 PM

Fantastic project!
Cataloguers will back you to the hilt. We advocate for high-quality metadata at the institutional level. Would you be interested in presenting at next year’s @CILIPCIG conference?

By Jane Daniels
Sep 7, 2017, 2:29 AM

Thanks everyone for the positive feedback and support! The suggestion re including data on COIs is a good one. Jane, please can we follow up offline about the CILIPCIG conference via info@metadata2020?

By Alice Meadows
Sep 7, 2017, 12:10 PM

“At its core is the belief that investing in richer metadata should be the scholarly community’s top priority, because:”
Point 4: good metadata reduces miscommunication and misinterpretation.

This extends into the clinical world of electronic health records (EHRs). Everyone’s EHR contains a blood pressure measurement, and this is a quantitative number, but there are more than 20 different ways to measure blood pressure, and variances of as much as 15 points can be common, not to mention when the measurement was made in relation to time of day, meals, etc.

By Michael Liebman
Sep 8, 2017, 1:43 PM

The Scholarly Kitchen
