As Hybrid Open Access Grows, the Scholarly Community Needs Article-level OA Metadata

Wordle on a German Open Access Text (Photo credit: Wikipedia)

Over the past decade, open access (OA) delivery of content has seen rapid growth and has proven itself as a viable business model for distributing scholarly content. Increasingly, publishers are providing OA options to authors, and a growing number of funding agencies are mandating that researchers release their results to the community via some form of OA distribution model. According to the Registry of Open Access Repositories Mandatory Archiving Policies (ROARMAP), there are now 54 funding organizations mandating some form of OA publication as a condition of a grant award, with an additional 10 funders considering some form of mandate. Beyond funding mandates, some 163 institutional mandates are in place worldwide, with several dozen more sub-institutional mandates. Many “traditional” (i.e., subscription-based) publishers are moving beyond limited experimentation with different OA options to full implementation of hybrid models. The SHERPA/RoMEO site provides a list of more than 100 publishers who offer hybrid OA services along with details of those plans. These hybrid models accommodate papers from authors who are required to publish OA as a condition of their funding, who must adhere to an institutional mandate, or who simply choose to pursue OA publication out of personal preference.

While there are an increasing number of OA articles in these hybrid titles, there is no standard way to represent to readers, or to discovery systems, that an article is freely available outside of a subscription wall. In most cases, discovery and link-resolution systems describe access terms only at the journal level, so OA papers that are published in hybrid journals might not be made visible to patrons because of the systems’ presumption about access. If a searcher does not specify they would like to include all indexed articles, all of the content contained within a subscription-based journal title that the institution does not have access to will likely be excluded from search results. If a user does specify they want a broader “all-content” search and is provided with an OA article as a result option, often the only way to determine if an article is freely accessible is to attempt to gain access and await the appearance or absence of a paywall.
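To make the visibility problem concrete, here is a minimal sketch in Python. It assumes a hypothetical `is_open_access` flag on each article record; no such standard field exists today, which is precisely the gap. The class, function, and journal names are invented for illustration and do not describe any real discovery system.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    journal: str
    is_open_access: bool = False  # hypothetical article-level flag


def visible_journal_level(articles, subscribed_journals):
    """Filter the way most systems do now: by journal subscription only."""
    return [a for a in articles if a.journal in subscribed_journals]


def visible_article_level(articles, subscribed_journals):
    """Also keep OA articles that sit inside unsubscribed hybrid journals."""
    return [
        a for a in articles
        if a.journal in subscribed_journals or a.is_open_access
    ]


catalog = [
    Article("Subscribed paper", "Journal A"),
    Article("Paywalled paper", "Journal B"),
    Article("Author-paid OA paper", "Journal B", is_open_access=True),
]
subscriptions = {"Journal A"}

journal_view = visible_journal_level(catalog, subscriptions)
article_view = visible_article_level(catalog, subscriptions)
```

With journal-level filtering, the author-paid OA paper in Journal B never reaches the patron; with the article-level flag, it does.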

Similarly, there is no standard logo or icon that is used to represent OA. The Public Library of Science and SPARC released for public use the “open lock” logo. The German project made another OA logo available, but neither logo has been broadly adopted as the standard indicator of OA content by publishers. For example, five publishers that provide some form of free access to journal content (Nature, Springer, Elsevier, Emerald, and the American Chemical Society) each display that fact quite differently on their internal tables of contents. Nature notes the article with orange “open” text beside the title of the article. Springer uses an orange “Open Access” tag in hybrid journal tables of contents. Springer also uses a separate broken circle on its SpringerOpen brand, with a blue “OpenAccess” tag next to open article titles. Elsevier uses a green “key” logo on its journal lists to indicate a title has OA, but the key can mean all OA or hybrid. On the individual issue tables of contents, the full text “page” icon is green where an article is OA and white where it is not; a large button with the price on it makes it obvious where payment is required. Emerald notes with a check box what content one has available and at what level, i.e., abstract, backfile, etc. The American Chemical Society notes OA articles with a “Sponsored Access” badge. These are only a few examples of the wide diversity of approaches to this issue. Indications of open access status are as varied as opinions about OA’s value in the community.

More important than the visible OA indicator, however, is the back-end metadata, which will be shared with search engine providers, indexed search services, and aggregator services. There is currently no common article-level bibliographic metadata that provides information on whether a specific article is openly accessible. A few services do store some of this information, notably CCC and SIPX, but that information is not broadly distributed. Another challenge for indexing systems is that some content might be embargoed, where access is restricted for a period of time before becoming open access. Finally, while some content might be open for reading, it might not be sharable or reuse-able — and where some re-purposing is available, wide variations exist in what is and isn’t allowed. Currently, there is nothing in place that would allow machine understanding of these characteristics so that any automation can be applied. The use of a metadata structure could be a mechanism for addressing this and other discovery-related issues surrounding this hybrid mix of open- and subscribed-access systems.
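As a rough illustration of what such a metadata structure might carry, the Python sketch below models the three characteristics described above: whether an article is free to read, when an embargo lifts, and what reuse terms apply. Every field name, the placeholder DOI, and the license string are assumptions made for this example; none of them comes from an existing standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AccessMetadata:
    doi: str
    free_to_read: bool                    # hypothetical field
    embargo_ends: Optional[date] = None   # hypothetical: free only from this date
    reuse_license: Optional[str] = None   # hypothetical: None means no reuse terms stated


def is_readable(meta: AccessMetadata, on: date) -> bool:
    """True if the article can be read without a subscription on the given date."""
    if not meta.free_to_read:
        return False
    return meta.embargo_ends is None or on >= meta.embargo_ends


def is_reusable(meta: AccessMetadata, on: date) -> bool:
    """Free to read is not the same as free to reuse: a license must also be stated."""
    return is_readable(meta, on) and meta.reuse_license is not None


record = AccessMetadata(
    doi="10.1234/example",  # placeholder DOI, not a real article
    free_to_read=True,
    embargo_ends=date(2013, 6, 1),
)
```

An indexer holding this record could hide the free-to-read indicator until June 2013 and, because no reuse license is stated, would never advertise the article as reusable.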

Beyond simply accessing content, there are several additional use cases for OA metadata at the article level. With an appropriate metadata system, discovery services could use such content-level information for improving their services. The NISO/UKSG KBART project has highlighted this situation in its recommendations for improving link resolver knowledge bases. To provide users with information about re-use rights, and to be of computational use, this information is required to be in a machine-readable form. Funding organizations could potentially automate the assessment of compliance with their mandates. Institutions might also want to assess their own investments and resource allocations, and researchers might like to determine which journals are compliant with a given funder policy. Libraries are also interested in assessing the amount of OA content that is included within subscription titles, since many hybrid-OA publishers are willing to reduce subscription costs if take-up of author-pays models advances.
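The library use case, for instance, reduces to simple counting once an article-level flag exists. The Python sketch below assumes each record is a hypothetical `(journal, is_open_access)` pair; the function name and the sample data are invented for illustration.

```python
from collections import Counter


def oa_share_by_journal(records):
    """Return the fraction of OA articles per journal title.

    `records` is an iterable of (journal, is_open_access) pairs,
    a hypothetical shape for article-level metadata.
    """
    totals = Counter()
    open_counts = Counter()
    for journal, is_open_access in records:
        totals[journal] += 1
        if is_open_access:
            open_counts[journal] += 1
    return {journal: open_counts[journal] / totals[journal] for journal in totals}


sample = [
    ("Hybrid Journal", True),
    ("Hybrid Journal", False),
    ("Subscription Journal", False),
]
shares = oa_share_by_journal(sample)
```

A library could compare shares like these against subscription prices from year to year, and a funder could run a similar tally over the articles arising from its grants.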

Developing such article-level metadata and public indicators is not without its challenges. First among them is the sheer scale of metadata management at the article level. Even at the title level, managing access information is no small feat, and despite concerted efforts by several large discovery service providers and library systems suppliers, the title-level problem is only marginally solved. Publishers and libraries are still struggling with the expression of rights at the title or even the collection level. Expressing rights information is a complex problem, not least because rights documents are verbosely written in legalese and are very dependent on specific licenses, so that even a particular title (let alone an article) could have numerous variations in rights from one licensee to another. This is compounded by some publishers’ value propositions for obtaining revenue on re-use rights of content that is available for free reading. Finally, there are many competing marketing and branding interests tied to the association with certain agreed logos. Overcoming this final obstacle might be the simplest of the three problems, but consensus on many OA issues has often been hard fought and hard to reach across all members of the scholarly community.

Regardless of a publisher’s particular stance regarding OA, communicating article-level access information is becoming an important factor in our community and one that publishers are going to have to face. Just providing OA options may not be sufficient for authors, mandating institutions, libraries, or end users unless the OA indications are made clear not just to people, but also to machines that will index, describe, or assess the OA content. Publishers should aim to meet those authors’, funders’, and users’ needs by providing the fullest service possible and by taking advantage of the many benefits that OA provides if made visible to the community. Certainly some are making efforts, but community consensus will allow tools and systems to take full advantage of the investments in OA. By improving the metadata and indicators of OA availability, along with information on re-use options, the improved discovery and end-user experience should yield higher usage and impact for these articles. And isn’t that the end result that everyone wants?

About Todd A Carpenter

Todd is the Executive Director of the National Information Standards Organization (NISO). He is focused on facilitating information exchange via standards, technology and business best practices within the US and internationally.


28 thoughts on “As Hybrid Open Access Grows, the Scholarly Community Needs Article-level OA Metadata”

  1. Amen! Excellent summary, Todd! I’ve been advocating for article-level indicators of OA status for several years, unsuccessfully. Hopefully this insightful summary will move the whole OA movement to seriously address the challenge of article-level metadata to identify articles within (most prominently) hybrid journals that are OA. But it is, as you note, a systemic problem. There’s no reason that article-level indicators couldn’t prominently be displayed in Google Scholar, for instance, and certainly the many A&I databases need the same information. OpenURL resolver systems capable of article-level identification would be a truly revolutionary support for OA.

    Posted by chuck hamaker | Dec 5, 2012, 6:34 am
  2. This does not detract at all from the point you are making, but it appears the uptake of the hybrid option by authors is quite low, 1-2%, despite the growing number of publishers offering the option.

    The green accepted version of the same article can be found here.

    Posted by David Solomon | Dec 5, 2012, 7:14 am
    • I wonder if take-up is low because few researchers understand it is an option and simply gravitate toward completely Open Access titles, presuming that is the best or only option. I expect that as the number of mandates grows, subscription-based publishers may find they are losing ground to the OA titles and will seek to more broadly promote their OA options. Better to adopt a solution to this problem now than to wait until the problem is 5%, 10% or 20% of articles. Also, a standardized approach might enhance the visibility of the option, which would be better for “traditional” publishers than continuing to lose authors to newer OA-only publications.

      Posted by Todd A Carpenter | Dec 5, 2012, 10:07 am
      • You may well be right but I also suspect it is partly due to the price. I believe the going rate for most hybrid journals is ~ $3,000 which is at the upper end of full OA journals.

        Posted by David Solomon | Dec 5, 2012, 2:51 pm
  3. There is a good post about this issue over here:, as ever Rod Page has been ahead of the curve in thinking about this.

    My own personal feeling on this is that PDFs should carry appropriate licensing info in the XMP, that we should extend the NLM DTD to account for some form of OA description, but beyond those two steps, there is obviously a long way to go. I heard rumours of an idea to set up a CrossRef-like service that would respond with rights and reuse info based on DOI, and I think that is also a strong contender for a way to approach this issue.

    Todd, this is certainly an area I’m interested in, perhaps a topic for an STM working group?

    Posted by Ian Mulvany (@IanMulvany) | Dec 5, 2012, 7:28 am
    • A number of industry organizations are exploring the possibility of a joint working group to address the problem, but we’re not quite there yet in terms of scope and approach. We’re working on it though and hopefully, many of the leading organizations in our space will sign on to participate.

      Posted by Todd A Carpenter | Dec 5, 2012, 10:11 am
      • Hi Todd,

        I’ve been very interested in this idea for a while.

        I’d love to see a centralized meta-data source that covers not only the open access status of individual articles, cross-referenced to the DOI of the publisher version, but also the locations of all openly available instances.

        To my mind, we have one good, large green repository (PMC), a couple of pre-print servers like arXiv and a bunch of fragmented institutional repositories. We’re currently putting the onus on the researcher to check multiple potential locations manually. This situation either leads to people paying for content that they already have a legal right to access, or they get frustrated trying to figure out how to access content and just back-channel it.

        I’d like to learn more about the working group, could you point me in the right direction?

        Posted by Phill Jones | Dec 6, 2012, 10:31 am
  4. It sounds as though the legal issues are the biggest obstacle, with hybrid OA articles caught up in the legal systems of the subscription journals they appear in. This is an organizational problem not a metadata one and if the legal systems have to change then the problem is serious. As an aside I would point out that legal language is complex for the same reason that scientific language is, namely people are talking about highly specialized stuff.

    Posted by David Wojick | Dec 5, 2012, 8:10 am
    • Encoding legal language unambiguously in a way computers can understand it is no small feat. As you correctly state, the reason for this is that legal agreements are inherently complex things. You can’t simply encode something if the thing you are encoding is complex without losing the nuance. NISO and EDItEUR, along with the Publishers Licensing Society and the Digital Library Federation, have been working on license encoding for coming on eight years now. While progress has been made, getting that work (ONIX for Publication Licenses) implemented has been a challenge for a variety of reasons.

      However, the simpler problem of describing to systems that this particular article is openly available–within the context of a subscription-based journal–and exposing it needn’t get held up by the complexities of legal wrangling over terms and describing those terms. Perhaps the easiest approach will be to build out a system that addresses the simpler problem and then addresses the more complex issues later.

      Posted by Todd A Carpenter | Dec 5, 2012, 8:43 am
      • How about drafting standard language that gets the hybrid OA article out of the subscription legal system, a release or some such? I know very little about this particular briar patch but it sounds interesting. My legal background is regulation not copyright but rules is rules and it sounds like we need some new rules to fit the revolution.

        Posted by David Wojick | Dec 5, 2012, 9:28 am
  5. Why would a researcher limit a search to OA articles? It just seems counterintuitive to me.

    Posted by Laurent Gagnier | Dec 5, 2012, 9:38 am
    • A researcher might not intentionally limit their search, but the default on many library systems is to limit the search to only those things that the library has access to. That way, it cuts down on confusion about why patrons are provided with search results of materials that they (or more specifically, their institutions) don’t have access to. In this way, journals that aren’t subscribed to are excluded, even if some of the content within that title is OA.

      Posted by Todd A Carpenter | Dec 5, 2012, 10:01 am
      • No working scientist in my field limits him/herself to some library-specific search system. My default search is via PubMed; my students’ default is usually Google. And frankly, we don’t notice or care how a relevant article is tagged once we find it. The only way we find out if it is not available to us is by clicking a full-text link and hitting a paywall.

        Posted by Mike_F | Dec 5, 2012, 11:48 am
        • I know exactly what you mean. As an ex-scientist myself, I’m very familiar with the workflow that you’re talking about.

          The problem is that the link-outs in PubMed don’t include institutional repositories or pre-print servers and the ones in google are highly incomplete. As a result, when you search, you’re often hitting a paywall when you might have perfectly legitimate access to that article elsewhere on the web.

          Posted by Phill Jones | Dec 6, 2012, 11:17 am
    • They might limit the search so they can read the articles. Sounds sensible enough.

      Posted by David Wojick | Dec 5, 2012, 10:05 am
      • Exactly, David. Certainly, one could do a comprehensive search of all of the potential content in the world, but unless the researcher is at one of the top 50-75 institutions that have access to the vast majority of content, one might search for everything across all subscribed and non-subscribed content, but only have access to a fraction of that content. This would likely be quite frustrating.

        Also, while PMC works well enough for the biomedical sciences, there is a great deal of other scientific literature that is not included in that service. Granted this is also where the majority of OA content is concentrated, it seems short-sighted to presume that it won’t grow to include other domains as well, especially ones that don’t have as robust an openly-available discovery service as PMC.

        Posted by Todd A Carpenter | Dec 6, 2012, 1:49 pm
  6. Seems to me there is quite a broad range of use cases to which work in this area might be an answer, including those related to search / discovery, library systems and management, institutional and funder research management, and more. There’s a small JISC project called “Vocabularies for OA” – V4OA – that is collecting these and trying to get some consensus on the key semantics, including access levels. It reports in July.

    Posted by Neil Jacobs | Dec 5, 2012, 11:01 am
    • We’ve been talking with JISC as well about this initiative and it likely will lead to other interesting potential use cases and applications for this type of metadata.

      Posted by Todd A Carpenter | Dec 6, 2012, 1:50 pm
  7. This is certainly an important issue, and one that I think will not be solved with link resolvers. Traditional link resolvers have enough complexity at the journal level, requiring constant revision globally and locally. Including books in those systems is often a stretch, but I think that extending them down to the article level would simply be asking too much.

    Perhaps there could be some combination of a journal-level link resolver with article-level data for OA and individually purchased articles. This would likely require cooperation between publishers, aggregators and systems vendors well beyond today’s levels though.

    Posted by Chris Bulock (@chrisbulock) | Dec 5, 2012, 11:19 am
  8. What is the evidence that “there are an increasing number of OA articles in these hybrid titles”? Many journals that implemented hybrid OA years ago have noted little uptake in most (but not all) titles – typically 5-10% – and in many cases this fraction has remained unchanged over the years or even declined. Indeed this has led some early advocates of the hybrid model as a route to open access to concede that it has failed and new wholly open access journals like PLoSOne represent a better strategy to accomplish this.

    The new funder mandates may of course increase the numbers of OA articles in hybrid journals but I’d be interested to see any numbers that indicate this is already happening.

    Posted by Richard Sever | Dec 5, 2012, 12:33 pm
  9. Might it make sense to extend CrossMark to include the fact that an OA version exists, complete with a link to an OA version of an article (or, indeed, book chapter – all our books and their individual chapters have free to read versions)? Might make more sense to build on something that already exists than to invent yet another standard ‘mark’. I’d be happy to contribute to any group that might work to define an OA standard ‘mark’.

    Posted by Toby Green | Dec 5, 2012, 3:28 pm
  10. Might it not be an idea to extend CrossMark to include a tag to show that an OA version exists, complete with a link to that version? This could work for books and book chapters as well as journal articles (we, for example, have a free to read version of all our books and chapters available). Might be simpler to build on something that exists than create yet another system.
    BTW – if a working group is set up to look at this, I’d be happy to lend a hand.

    Posted by tobygreen | Dec 5, 2012, 3:36 pm
    • Toby, a number of people have suggested CrossMark as a delivery mechanism for this type of OA metadata. Ed Pentz and others at CrossRef have been deeply engaged in the early conversations. You’ll hear more in the new year about a potential project getting off the ground.

      Posted by Todd A Carpenter | Dec 6, 2012, 1:52 pm
  11. SPIE has introduced CC-BY open access for individual journal articles for which authors pay modest page charges. The issue of how the research community determines which articles are open access is, thus, important to us. Thank you for bringing it up, Todd. We have had discussions with various indexers and search services but a unified approach is certainly preferable.
    By the way, PLoS staff told us that their open lock symbol may only be used in cases in which an item is covered by the CC-BY license.

    Posted by Mary Summerfield | Dec 5, 2012, 5:02 pm
  12. I very much agree with the points raised by Todd; thanks for raising awareness of this important topic. We are indexing OA indicators at the article level in Primo Central. The purpose is that institutions can show as available not just articles from journals that they subscribe to or that are entirely Open Access, but also OA articles from hybrid journals which they do not subscribe to. Without this indication, OA articles from hybrid journals often remain hidden and don’t get the usage they deserve. We believe that with time more and more publishers will provide indicators at the article level – the more awareness we raise of this, the better. I discussed this topic also on our initiatives blog a few months ago.

    Posted by Christine Stohn | Dec 6, 2012, 5:20 am
  13. The subject of your post is interesting and timely. Thank you. I think RSS is a potential standard way to represent to readers and aggregators that an article is freely available outside of a subscription wall. This can be accomplished using the cc-by and dc:rights elements in the TOC RSS feeds that over 22,000 scholarly journals are regularly publishing.
    In fact some publishers are already using dc:rights in their RSS feeds. In July 2011 JournalTOCs highlighted to JISC the possibility of using TOC RSS feeds as a standard way to discover OA articles in hybrid journals. A paper discussing this and other related issues is going to be published in the January issue of Learned Publishing.

    Posted by Santy Chumbe | Dec 6, 2012, 9:33 am


The mission of the Society for Scholarly Publishing (SSP) is "[t]o advance scholarly publishing and communication, and the professional development of its members through education, collaboration, and networking." SSP established The Scholarly Kitchen blog in February 2008 to keep SSP members and interested parties aware of new developments in publishing.
The Scholarly Kitchen is a moderated and independent blog. Opinions on The Scholarly Kitchen are those of the authors. They are not necessarily those held by the Society for Scholarly Publishing nor by their respective employers.