Metadata – digital “data about data” – is arguably one of the most powerful tools available in scholarly communications. Good metadata enables discoverability and access, and (potentially) eliminates errors. But all too often we are stuck with bad metadata – incomplete, inaccurate, and out-of-date. Metadata 2020, a new initiative being launched today, aims to change all that.
Metadata 2020 is a community-led initiative, organized by Crossref in collaboration with associations, publishers, universities, and other scholarly communications organizations* globally. At its core is the belief that investing in richer metadata should be the scholarly community’s top priority, because:
- Richer metadata fuels discoverability and innovation
- Connected metadata bridges the gaps between systems and communities
- Reusable metadata eliminates duplication of effort
The aim is to create awareness and resources for everyone involved in creating and using scholarly metadata — researchers and research organizations alike — through a community effort. To quote Crossref’s Executive Director, Ed Pentz: “Everybody in scholarly communications has a responsibility to improve metadata.”
It’s an exciting but challenging goal! Among the issues that the Metadata 2020 advisory group (of which I’m a member) has been grappling with are the scope and goals of the project; the opportunities for innovation, including automation; how to successfully engage the community; and how (and when) we will know whether we’ve been successful.
One important point to keep in mind is that, despite the name, metadata is the means, not the end goal. The real prize is what we will be able to do with good/better/best metadata once it’s readily available. It’s not an exaggeration to say that it could (will!) be life-changing. We’ve already made great leaps forward in terms of discoverability and accessibility through our improved ability to connect the dots between research and researchers. It’s now possible, albeit in a somewhat limited way at present, for information about researchers, their affiliations, grants, and research outputs to flow seamlessly between systems that use persistent identifiers for people, places, and things. Imagine how much more powerful this information would be if supplemented by comprehensive, accurate, up-to-date metadata.
One major task will be defining what constitutes good metadata, or perhaps more accurately, what is good enough metadata, since (librarians cover your ears!) we may have to sacrifice completeness for detail. As one researcher interviewed by Metadata 2020 put it: “I’d rather have empty fields if the information is not available than a field that combines different kinds of examples. For me it’s really detailed information with exact definitions. Remove the ambiguity. A blank field helps us more and is more transparent.”
Of course, good (enough) metadata will never be a reality if we don’t make it easy to create. This is where we need some serious innovation. Rather than reinventing the wheel, though, perhaps we can adapt a system that researchers are already using. Manuscript submission systems seem a likely candidate; however, some authors already view them as too complex and time-consuming, so how can we ask them for even more information? Synchronization and automation will be key to solving this challenge, as will taking a community approach: collaborating around common processes and standards in order to build (sometimes competing) platforms and systems.
Last but not least, to be successful we will need to effectively engage the community in Metadata 2020. To do so, we’ve identified the following communication goals:
- Raising awareness of the importance of sharing richer metadata
- Providing information for the community on the role of metadata in making scholarly content discoverable
- Encouraging publishers, aggregators, funders, research institutions, and service providers to make a public commitment to increase the quality of their metadata
- Facilitating communication between the stakeholders to encourage collaboration
- Equipping all stakeholders with tools and information
In addition, we want to use real-life stories from the community to help increase understanding of why metadata is important and gain buy-in for improving it. A great example from the initial Metadata 2020 workshop came from a Spanish researcher, who told us: “The Spanish law has an article about the open access repository. When they tried to evaluate the degree of accomplishments of the papers under Spanish funded research, they came up with a disappointing approach because the metadata wasn’t sufficient. Most of the repositories didn’t include the project number. There wasn’t a metadata attribute in the records. They couldn’t even measure properly the accomplishment of the open access deposit because we didn’t have good enough or standard metadata. That’s why I like the metaphor of agile development, you need to demonstrate the cost of low quality metadata, that is the key.”
So who is “the community”? Although the Metadata 2020 initiative is being led by Crossref, it’s not just about journals, or about books and journals; and it’s not just a publisher initiative – it’s a community initiative. Input and feedback from all parts of the research ecosystem will be critical to Metadata 2020’s success, and participation is open to all.
As it says on the website: “Who is Metadata 2020? You are!”
*My own organization, ORCID, is one of Metadata 2020’s advisors, and I am an advocate for the initiative.
Discussion
6 Thoughts on "Much Ado About Metadata 2020!"
Good article and a wonderful, ambitious initiative. At first I thought that we could improve metadata at the manuscript submission stage, but I do appreciate the level of detail already requested of the author at that point. Perhaps publishers could collect more information at the acceptance stage?
I think the question of what counts as “good enough” metadata is the right one. We need to ask what we plan to do with this data and start from there. There’s no use asking for 20 pieces of information if, for example, we will only need or use 10.
I wonder if we can also include data on conflicts of interest. This data collection is weak. Keeping track of COIs and updating them continues to be an administrative nightmare for many publishers.
Looking forward to hearing more about Metadata 2020!
And yes…it is about community.
Thank you for raising awareness about this new initiative. I’m especially interested in the open and reusable aspects of the initiative. Interesting to reflect on the kinds of business models that these principles may foster – and foreclose.
Thanks, Alice, for this great summary and for helping to spread the word! Establishing this ethos of shared responsibility is really excellent. This is an area I think many standards bodies have struggled to articulate; it can be a challenge to generate a sense of calling or urgency in this space. But Metadata 2020’s tagline is golden, and its call to action is most inspirational!
Fantastic project!
Cataloguers will back you to the hilt. We advocate for high-quality metadata at the institutional level. Would you be interested in presenting at next year’s @CILIPCIG conference?
Thanks everyone for the positive feedback and support! The suggestion re including data on COIs is a good one. Jane, please can we follow up offline about the CILIPCIG conference via info@metadata2020?
“At its core is the belief that investing in richer metadata should be the scholarly community’s top priority, because:”
Point 4: good metadata reduces miscommunication and misinterpretation.
This extends into the clinical world of EHRs. Everyone’s EHR contains a blood pressure measurement, and this is a quantitative number, but there are more than 20 different ways to measure blood pressure, and variances of as much as 15 points can be common, not to mention when the measurement was made relative to time of day, meals, etc.