Metadata – digital “data about data” – is arguably one of the most powerful tools available in scholarly communications. Good metadata enables discoverability and access, and (potentially) eliminates errors. But all too often we are stuck with bad metadata – incomplete, inaccurate, and out-of-date. Metadata 2020, a new initiative being launched today, aims to change all that.
Metadata 2020 is a community-led initiative, organized by Crossref in collaboration with associations, publishers, universities, and other scholarly communications organizations* globally. At its core is the belief that investing in richer metadata should be the scholarly community’s top priority, because:
- Richer metadata fuels discoverability and innovation
- Connected metadata bridges the gaps between systems and communities
- Reusable metadata eliminates duplication of effort
The aim is to create awareness and resources for everyone involved in creating and using scholarly metadata — researchers and research organizations alike — through a community effort. To quote Crossref’s Executive Director, Ed Pentz: “Everybody in scholarly communications has a responsibility to improve metadata.”
It’s an exciting but challenging goal! Among the issues that the Metadata 2020 advisory group (of which I’m a member) has been grappling with are the scope and goals of the project; the opportunities for innovation, including automation; how to successfully engage the community; and how, and when, we will know if we’ve been successful.
One important point to keep in mind is that, despite the name, metadata is the means not the end goal. The real prize is what we will be able to do with good/better/best metadata once it’s readily available. It’s not an exaggeration to say that it could — will! — be life-changing. We’ve already made great leaps forward in terms of discoverability and accessibility through our improved ability to connect the dots between research and researchers. It’s now possible, albeit in a somewhat limited way at present, for information about researchers, their affiliations, grants, and research outputs to flow seamlessly between systems that use persistent identifiers for people, places, and things. Imagine how much more powerful this information would be if supplemented by comprehensive, accurate, up-to-date metadata.
One major task will be defining what constitutes good metadata, or perhaps more accurately, what is good enough metadata, since (librarians cover your ears!) we may have to sacrifice completeness for detail. As one researcher interviewed by Metadata 2020 put it: “I’d rather have empty fields if the information is not available than a field that combines different kinds of examples. For me it’s really detailed information with exact definitions. Remove the ambiguity. A blank field helps us more and is more transparent.”
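For readers who think in code, the researcher's point about blank versus ambiguous fields can be sketched in a few lines of Python. This is only an illustration: the record type, its field names, and the sample values are all hypothetical, not part of any Metadata 2020 or Crossref schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArticleRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    title: str
    funder_name: Optional[str] = None      # None means "not available"
    project_number: Optional[str] = None   # None means "not available"


def missing_fields(record: ArticleRecord) -> list[str]:
    """Return the names of fields that were explicitly left blank."""
    return [name for name, value in vars(record).items() if value is None]


# Ambiguous: funder and grant number combined in one free-text field,
# so nobody can tell which part is which, or whether either is complete.
ambiguous = {"title": "Example article", "funding": "Example Funder / 12345?"}

# Explicit: each field has exactly one meaning, and gaps are visible.
explicit = ArticleRecord(title="Example article", funder_name="Example Funder")

print(missing_fields(explicit))
```

With the explicit record, a downstream system can see exactly which information is absent and ask for it later; with the combined field, the gap is hidden.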
Of course, good (enough) metadata will never be a reality if we don’t make it easy to create. This is where we need some serious innovation. Rather than reinventing the wheel, though, perhaps we can adapt a system that researchers are already using. Manuscript submission systems seem a likely candidate; however, some authors already view them as too complex and time-consuming, so how can we ask them for even more information? Synchronization and automation will be key to solving this challenge, as will taking a community approach: collaborating around common processes and standards in order to build (sometimes competing) platforms and systems.
Last but not least, to be successful we will need to effectively engage the community in Metadata 2020. To do so, we’ve identified the following communication goals:
- Raising awareness of the importance of sharing richer metadata
- Providing information for the community on the role of metadata in making scholarly content discoverable
- Encouraging publishers, aggregators, funders, research institutions, and service providers to make a public commitment to increase the quality of their metadata
- Facilitating communication between the stakeholders to encourage collaboration
- Equipping all stakeholders with tools and information
In addition, we want to use real-life stories from the community to help increase understanding of why metadata is important and gain buy-in for improving it. A great example from the initial Metadata 2020 workshop came from a Spanish researcher, who told us: “The Spanish law has an article about the open access repository. When they tried to evaluate the degree of accomplishments of the papers under Spanish funded research, they came up with a disappointing approach because the metadata wasn’t sufficient. Most of the repositories didn’t include the project number. There wasn’t a metadata attribute in the records. They couldn’t even measure properly the accomplishment of the open access deposit because we didn’t have good enough or standard metadata. That’s why I like the metaphor of agile development, you need to demonstrate the cost of low quality metadata, that is the key.”
So who is “the community”? Although the Metadata 2020 initiative is being led by Crossref, it’s not just about journals, or about books and journals; and it’s not just a publisher initiative – it’s a community initiative. Having input and feedback from all parts of the research ecosystem will be critical to Metadata 2020’s success, and participation is open to all.
As it says on the website: “Who is Metadata 2020? You are!”
*My own organization, ORCID, is one of Metadata 2020’s advisors, and I am an advocate for the initiative.