Today, CHORUS and the National Science Foundation (NSF) announced an agreement to use CHORUS for facilitating the discovery of NSF-funded works. The news from NSF is important for CHORUS as it represents the second large funding agency to use the service.

The partnership is not a huge surprise given that the NSF had signed a Memorandum of Understanding with the Department of Energy (DOE) to use its PAGES database to collect manuscripts that result from federal funding, and the DOE has already built the integration with CHORUS. In fact, the DOE was the first agency to sign an agreement with CHORUS. That said, federal agencies have not been particularly transparent about their intentions, and information is coming out in dribs and drabs.

According to the press release today, the NSF plans to collect accepted manuscripts from grantees for deposit in the NSF Public Access Repository (NSF-PAR), hosted by DOE. Just like the DOE-CHORUS workflow, people using the NSF-PAR will either be able to access the accepted manuscript in the NSF-PAR database after the 12-month embargo has expired, or will be presented with a link to the publicly available published version on the publisher site. The idea is that access will be provided to the best available version, using CHORUS as the infrastructure.
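
The access rule described above can be sketched in a few lines of Python. This is a minimal, hypothetical model, not NSF, DOE, or CHORUS code: it assumes a fixed 12-month embargo counted from the publication date, and it prefers a public copy on the publisher's site over the repository manuscript, in keeping with the "best available version" idea. The `Article` fields, the DOI, and the URLs are invented for illustration.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


@dataclass
class Article:
    doi: str
    published: date
    publisher_public_url: Optional[str]  # public AM or VOR on the publisher site, if any
    repository_am_url: Optional[str]     # accepted manuscript held in the agency repository


def best_available_link(article: Article, today: date, embargo_months: int = 12) -> Optional[str]:
    """Return the link a repository user would be shown, or None while embargoed."""
    if today < add_months(article.published, embargo_months):
        return None
    # Prefer the publisher-hosted copy; fall back to the repository manuscript.
    return article.publisher_public_url or article.repository_am_url


paper = Article(
    doi="10.1234/hypothetical.001",
    published=date(2016, 1, 15),
    publisher_public_url="https://journal.example.org/article/001",
    repository_am_url="https://repository.example.gov/am/001",
)
print(best_available_link(paper, date(2016, 6, 1)))   # None: still under embargo
print(best_available_link(paper, date(2017, 1, 15)))  # the publisher URL
```

If a publisher exposes nothing on its own site, the same function falls back to the manuscript in the agency repository once the embargo has passed.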

This announcement comes on the heels of another partnership. CHORUS announced two weeks ago that the U.S. Geological Survey (USGS) has signed on. The USGS has policies on making data available and on collecting and approving work, but nothing specific to making journal-accepted manuscripts publicly accessible, so we have little information about how this will work. It is not clear whether the USGS will collect accepted manuscripts from authors for hosting in its USGS Publications Warehouse database.

The National Institute of Standards and Technology (NIST) is in pilot phase with CHORUS, and the Smithsonian Institution is using CHORUS for linking to content.

Two major agencies we have not heard from yet are the Department of Defense and the Environmental Protection Agency. The Department of Transportation announced its plan late last week to use its own National Transportation Library and the USDOT Research Hub for making funded works accessible after a 12-month embargo.

With the NSF on board, CHORUS has been given a big boost. However, it seems many publishers, whose membership dues are the only source of financial support for CHORUS, have been hanging back to see which agencies will participate. CHORUS is not easy or inexpensive for publishers to implement, and it makes little sense to move forward if the solution is of little benefit to a publisher or their authors.

As a refresher, CHORUS is an overlay built on existing infrastructure that provides access to published articles resulting from federal funding. In order for CHORUS to really flourish, it needs many agencies and publishers to participate. This initiative was developed cooperatively by several publishers in response to the OSTP memorandum of 2013 calling for plans to make federal research results publicly available.

On the face of it, CHORUS provides value to three major entities:

Funders: CHORUS provides an answer to the OSTP question of, “How are you going to provide access to federally funded work?” Historically, funders have not been able to answer questions such as, “How many papers were published as a result of grant X?” They have had no real way of tracking that, and most had no place to put reports or accepted papers. CHORUS is a free service to agencies, and they get new levels of accountability and access to content. They also receive dashboards that show them what has been published and is currently in CHORUS. Again, this comes at no cost to the agencies.

Publishers: The main value here is twofold. First, participating publishers can use CHORUS as an author service for making papers available. That said, most agencies still seem to require authors to deposit accepted manuscripts, so I am not sure how much authors will value this service. So far the agencies don’t seem to have provided any mechanism for third parties to deposit on behalf of authors. Second, CHORUS argues that participating publishers will retain traffic (aka readers) on their sites if the “best version available” is on the journal’s site. This assumes that large numbers of people go to individual agency repositories to search for federally funded works, which at least seems to be the case for PubMed Central. That said, PubMed is THE online database for biomedical literature. The same cannot be said for the other agency repositories, yet. Traffic to third-party repositories doesn’t count toward publisher COUNTER statistics or altmetrics, and it diminishes potential ad revenue. Publishers will also get a dashboard showing papers they have published and who funds them, information which presumably most of us already have access to. Paying for a system to facilitate access to federally funded works is also a good public relations move, and broadening access serves the mission of many not-for-profit academic publishers.

Users: For the user, getting free access to research papers in context, in the journals themselves, is a better user experience than finding them in a third-party repository. Journals very often provide links to related articles or editorials, as well as corrections and retractions, which often don’t make it into repositories. CHORUS has an open API that could potentially be used to enhance existing or new discovery tools. In the meantime, users can go to the CHORUS website to search for content funded by participating funders. If the embargo period has expired, the links provided will lead to the version of the paper the publisher has chosen to expose, either the accepted manuscript or the final version of record.

What Does Implementation Look Like for a Publisher?

CHORUS takes advantage of existing services such as Crossref, but there are still loads of requirements that publishers and their technology partners need to discuss and implement. I wrote about some of these last year, so I will focus today on some new requirements contained in Version 2 of the CHORUS implementation guide, released three months ago.

Back-door Access to Embargoed Content

The “biggest” new requirement is stated as follows:

CHORUS Policy: Members must permit the publicly accessible AM or VOR, or a VOR behind a paywall, to be available for indexing from the date of publication by Participating Funders.

And,

CHORUS Policy: Participating funders agree to harvest only articles they funded and for which FundRef metadata has been deposited, not to try to crawl a publisher’s entire site unless the funder has a separate agreement with the publisher.

Because CHORUS participating funders will want to harvest articles for indexing soon after publication which often will be prior to public access mandate start date, there is an access control implementation requirement for CHORUS publishers. Publishers are only required to grant funder access to the articles based on research that they funded, not the entire journal. As discussed in the previous section, the harvesting can be of the AM or VOR (publisher decision). CHORUS is supporting two different solutions for funder access: IP based access and token based authentication.

The agencies want to access the content before the embargo expires so they can harvest the full text, and CHORUS is allowing them to do so. The guide does not define what it means to “harvest” the content. In some places it says “harvest and index,” which may explain the spirit of the intent. I asked CHORUS about this, and apparently the agreements signed by funders state that they can index the articles in their own databases and link to them via the DOI. The problem here is that unlike a Google or Bing crawl that indexes an entire site or whole journals, the agencies are only indexing individual articles based on the funder IDs.

One way to implement access for agencies per the CHORUS guide is to allow the funders IP access to your content and trust that they will only access papers they funded. This seems a risky proposition without more information on what agencies are permitted to do with the content.
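
To make the concern concrete, here is a hypothetical Python sketch contrasting plain IP-based access with a check that also consults the article's funder IDs. The IP ranges are reserved documentation addresses, not real agency networks, and the funder-ID mapping is illustrative. As described here, the IP option simply grants funder addresses access, so the narrower check is an assumption about what a cautious platform might add, not a CHORUS requirement.

```python
import ipaddress
from typing import Dict, List, Optional, Set

# Hypothetical funder IP ranges (RFC 5737 documentation ranges, not real agency networks).
FUNDER_IP_RANGES: Dict[str, List[ipaddress.IPv4Network]] = {
    "10.13039/100000001": [ipaddress.ip_network("192.0.2.0/24")],     # illustrative: NSF
    "10.13039/100000015": [ipaddress.ip_network("198.51.100.0/24")],  # illustrative: DOE
}


def funder_for_ip(ip: str) -> Optional[str]:
    """Return the funder ID whose registered range contains this address, if any."""
    addr = ipaddress.ip_address(ip)
    for funder_id, networks in FUNDER_IP_RANGES.items():
        if any(addr in net for net in networks):
            return funder_id
    return None


def ip_only_access(ip: str) -> bool:
    # Plain IP access: any recognized funder address unlocks every article.
    return funder_for_ip(ip) is not None


def funder_scoped_access(ip: str, article_funder_ids: Set[str]) -> bool:
    # Narrower check: the requesting funder must appear in the article's funder metadata.
    funder_id = funder_for_ip(ip)
    return funder_id is not None and funder_id in article_funder_ids
```

An address in the NSF range reading a DOE-only paper passes `ip_only_access` but fails `funder_scoped_access`; that difference is exactly what the IP option leaves to trust.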

The other option presented is to use a token authentication system built by CHORUS that is heavily based on the Crossref Text and Data Mining (TDM) system. The token system provides an access token to the funder for each paper tagged with their funder ID. Once the funder has a token for the paper they would like to access, the publisher’s platform must then be able to grant access by verifying the token via a CHORUS API.
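
The token flow can be sketched as follows. This is an in-memory stand-in written purely for illustration: `HypotheticalTokenService`, `issue`, `verify`, and `platform_grants_access` are invented names, not the CHORUS or Crossref TDM API. In a real deployment the `verify` step would be a call from the publisher platform to the CHORUS API.

```python
import secrets
from typing import Dict, Optional, Set, Tuple


class HypotheticalTokenService:
    """In-memory stand-in for a token service; NOT the real CHORUS API."""

    def __init__(self) -> None:
        # Maps each issued token to the (funder_id, doi) pair it was issued for.
        self._issued: Dict[str, Tuple[str, str]] = {}

    def issue(self, funder_id: str, doi: str, article_funder_ids: Set[str]) -> Optional[str]:
        """Issue a per-paper token, but only if the paper is tagged with this funder's ID."""
        if funder_id not in article_funder_ids:
            return None
        token = secrets.token_urlsafe(32)
        self._issued[token] = (funder_id, doi)
        return token

    def verify(self, token: str) -> Optional[Tuple[str, str]]:
        """Look up a token; a real platform would make a remote API call here."""
        return self._issued.get(token)


def platform_grants_access(service: HypotheticalTokenService,
                           token: Optional[str], requested_doi: str) -> bool:
    """Grant access only if the token is valid and was issued for this exact paper."""
    if token is None:
        return False
    record = service.verify(token)
    return record is not None and record[1] == requested_doi


service = HypotheticalTokenService()
token = service.issue("10.13039/100000001", "10.1234/hypothetical.001", {"10.13039/100000001"})
print(platform_grants_access(service, token, "10.1234/hypothetical.001"))  # True
print(platform_grants_access(service, token, "10.1234/hypothetical.002"))  # False
```

Unlike the IP option, a token is scoped to one paper, so a valid token for one article does not open any other.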

Again, all of this work needs to be done in order to provide access to the full text of embargoed content. Note that the agencies have set the embargoes. Thus far, agencies that intend to use CHORUS (with the possible exception of the USGS) require that accepted manuscripts be deposited in a repository, so they already have article information even before it gets published.

Note that it is the authors’ obligation to submit the accepted manuscript to the agency. If the publisher chooses to make only the accepted manuscript version available on its website as an author service, and if the publisher has a workflow that allows posting that accepted manuscript immediately so that funders can index it, then the funder is getting the exact same paper it received from the authors. This seems a waste of resources, and certainly of the publisher time spent setting the whole thing up.

On the other hand, CHORUS publishers making the version of record open after embargo — or not posting the accepted manuscript until the embargo expires — will be required to allow federal agencies to access the version of record for free.

Correcting Errors in Funder IDs

The new guide also explains requirements for correcting funder ID errors. CHORUS depends on publishers tagging the funder name and an ID number and including this in the Crossref metadata.
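
Attaching an ID to a free-text funder name is the step most likely to go wrong. A minimal sketch of the idea, assuming a tiny hand-built lookup table in place of the real Open Funder Registry: names are normalized (case, punctuation, a leading "the") before lookup, and anything unmatched is set aside for manual review rather than guessed. The table entries and aliases are illustrative only.

```python
import re
from typing import Dict, List, Optional, Tuple

# Illustrative excerpt only; a real workflow would consult the Open Funder Registry.
FUNDER_LOOKUP: Dict[str, str] = {
    "national science foundation": "10.13039/100000001",
    "nsf": "10.13039/100000001",
    "us department of energy": "10.13039/100000015",
    "department of energy": "10.13039/100000015",
}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace, and strip a leading 'the'."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return re.sub(r"^the ", "", cleaned)


def tag_funders(names: List[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split author-supplied funder names into (name, ID) matches and names needing review."""
    matched: List[Tuple[str, str]] = []
    unmatched: List[str] = []
    for name in names:
        funder_id: Optional[str] = FUNDER_LOOKUP.get(normalize(name))
        if funder_id:
            matched.append((name, funder_id))
        else:
            unmatched.append(name)
    return matched, unmatched


matched, needs_review = tag_funders(
    ["The National Science Foundation", "U.S. Department of Energy", "Natl. Sci. Found."]
)
print(matched)
print(needs_review)
```

Here the abbreviated "Natl. Sci. Found." lands in the review pile, which is how a funder name ends up deposited without an ID.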

CHORUS has told funders to alert the author of any errors; the author then alerts the publisher, and the publisher provides the corrected data to Crossref. Additionally, Crossref will fix certain errors (like adding funder IDs that did not exist when the information was first deposited, or correcting agency names at the agency’s request). Publishers will not be informed of these changes. If the funder names and IDs you publish turn out to be wrong, there will be a discrepancy between your site and Crossref.
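
A publisher that displays funder data on its own site could look for such discrepancies itself by periodically comparing its records with the current Crossref metadata. The sketch below works on a dictionary shaped like a work record from the Crossref REST API; the `funder`, `DOI`, and `doi-asserted-by` field names are assumptions based on that API's JSON output and should be checked against its current documentation. No network call is made, and the record is invented.

```python
from typing import Dict, Set


def crossref_funders(work_message: dict) -> Dict[str, str]:
    """Map each funder DOI in a Crossref-style work record to who asserted it."""
    funders: Dict[str, str] = {}
    for funder in work_message.get("funder", []):
        doi = funder.get("DOI")
        if doi:
            funders[doi.lower()] = funder.get("doi-asserted-by", "unknown")
    return funders


def funder_discrepancies(local_ids: Set[str], work_message: dict) -> Dict[str, Set[str]]:
    """Compare the funder IDs shown on the publisher site with those in Crossref."""
    remote = crossref_funders(work_message)
    local = {funder_id.lower() for funder_id in local_ids}
    return {
        "missing_locally": set(remote) - local,
        "missing_in_crossref": local - set(remote),
        "added_by_crossref": {doi for doi, who in remote.items() if who == "crossref"},
    }


# Hypothetical record: Crossref has inserted an NSF ID the publisher never stored.
record = {
    "DOI": "10.1234/hypothetical.001",
    "funder": [
        {"name": "National Science Foundation", "DOI": "10.13039/100000001",
         "doi-asserted-by": "crossref"},
    ],
}
print(funder_discrepancies(set(), record))
```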

CHORUS notes that CrossMark, another service provided by Crossref, will automatically update this information if you deposit it to CrossMark in addition to, or in place of, Crossref. Note that CrossMark is not a free service, nor is it endorsed by CHORUS. That said, the CHORUS guide states, “CrossMark is a good mechanism to document versions of articles (updates, corrections, retractions, etc.) and direct readers to the latest version. A side benefit is that the CrossMark user interface widget can then display the FundRef metadata without any additional implementation requirements for the publisher platform.”

Funding Agency Implementation

There are still a lot of questions about how some of this will work from the funder perspective. With a few exceptions, agencies seem to be doing a lot of this behind closed doors. This is a bad way to make policy, particularly one designed to increase transparency for the public. Here are some outstanding issues:

  • What exactly are the author requirements? Are they simply required to deposit their papers into the repository of their funder? What’s the timeline? What kind of information do they need to include? What if they don’t have a publication date or a DOI at acceptance? Will they then need to go back into the repositories to add this information? What if they withdraw the paper after acceptance or the accept decision is rescinded?
  • What happens if a paper has two federal funders? If a paper is funded by the DOE and the NSF, is it essentially deposited twice into the same overall database? Doesn’t this defeat the provision in the OSTP memo to mitigate having duplicates of works online?
  • Is there a reason why the agencies are hesitant to allow the publishers to deposit papers in required repositories, as is done with NIH-funded papers going into PubMed Central?
  • Why can’t there be a way for publishers to validate grant/funder information? If publishers are being asked to allow public access to published works that result from federal funding, isn’t it reasonable to ask for a way to validate which papers qualify?

Being Compliant

I want to state a couple of things that tend to get lost in the discussions.

  • Public access mandates are requirements set on researchers accepting federal grant money. Publishers are not being mandated to give away content.
  • Authors are required to ensure that the accepted manuscripts be publicly available after an embargo period set by the funder. Publishers that do not allow posting will likely see a dramatic drop in submissions from federally funded researchers.
  • No one has defined how much federal funding must be contributed to a study in order to qualify it as federally funded and therefore bound to the rules of public accessibility. I wrote about that here.
  • Agencies have been and plan to continue to collect accepted manuscripts and post them in their own repositories. If the publisher chooses not to open the content on the publisher site, the agency will allow public access to their version after the embargo period expires.

As negotiations continue between CHORUS and other agencies, we will see new requirements emerge. Likewise, as publishers move further along with implementing CHORUS, there will be new, maybe even easier, technical requirements. I continue to have concerns that these implementation requirements are a whole lot easier for very large societies and commercial publishers than they are for the rest of us. This is not entirely surprising given that the CHORUS board consists mostly of people from very large organizations.

When I am at publishing conferences, I get the feeling that a lot of society publishers are sort of waiting until their platform vendor says that CHORUS is “turned on” and then some sort of magic will happen. It’s really not as simple as just “joining” CHORUS. As I have said before, there are a lot of decisions a publisher needs to make.

Known agency mandates have start dates right around the corner. Authors are going to start asking questions and now is a good time to explore what is right for your organization and your authors. Nothing is hidden, you just need to do your research and know what you are expected to do.

Discussion

20 Thoughts on "CHORUS Gets a Boost from Federal Agencies – But Will New Approaches Make It Harder to Implement?"

A very insightful analysis, Angela. Back in the early days I thought that CHORUS would be simple (see my http://scholarlykitchen.sspnet.org/2013/06/17/chorus-confusions/). But since then I have tracked it through my weekly newsletter “Inside Public Access” and the process has become progressively more complex. Here is what I wrote just a few weeks ago, when reviewing the version 2 Publisher Implementation Guide that you reference: “Many of these issues are ones we have been writing about for a long time. Some of these issues are quite complex, so new prospective publishers may find them a bit bewildering. Deciding among the alternatives will not be a simple task. Some may need to be decided at the Board level.”

Regarding the prospect of even more requirements emerging, the NIST pilot is especially interesting, because NIST is using PubMed Central. Meshing CHORUS with PMC will be especially challenging. But in the long run we may wind up with a very useful system for scientific communication. As I like to say, confusion is the price of progress.

For a small society publisher, if they don’t utilize CrossMark, who will do the work of differentiating the AM and VoR? Or would it be that the author submits the AM and then the publisher’s version is tagged as the VoR? Or is it on CHORUS to do given the ambiguity of their terminology “harvesting and indexing”?

All of this is done through tagging. If you are depositing information into CrossRef, you will need to include license information and description information. You may have multiple versions of content (like an AM and a VOR) for which license information is required. Here is what the guide says on that:

“CrossRef License Metadata. The CrossRef license metadata (called “access indicators”) supports multiple licenses for the same article. Here are some scenarios:
● One license for during the embargo (subscription access) period, another starting afterwards for public access
● A different license for each version (AM or VOR)
● A separate license for TDM privileges (Not related to CHORUS)
● Or any combination of the above”
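
The first two scenarios amount to a lookup by version and date. A minimal sketch, assuming each license record carries a URL, a start date, and the version it applies to; this is modeled loosely on Crossref's `license_ref` entries with their `start_date` and `applies_to` attributes, and the exact form should be checked in the deposit schema. The license URLs are placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class LicenseRef:
    url: str
    start_date: date
    applies_to: str  # "am", "vor", or "tdm" in this sketch


def license_in_effect(licenses: List[LicenseRef], version: str, on: date) -> Optional[str]:
    """Return the URL of the most recently started license for `version` as of `on`."""
    started = [lic for lic in licenses if lic.applies_to == version and lic.start_date <= on]
    if not started:
        return None
    return max(started, key=lambda lic: lic.start_date).url


licenses = [
    LicenseRef("https://publisher.example.org/licenses/subscription", date(2016, 3, 1), "vor"),
    LicenseRef("https://publisher.example.org/licenses/public-access", date(2017, 3, 1), "vor"),
    LicenseRef("https://publisher.example.org/licenses/am-public", date(2017, 3, 1), "am"),
]
print(license_in_effect(licenses, "vor", date(2016, 6, 1)))  # subscription license
print(license_in_effect(licenses, "vor", date(2017, 6, 1)))  # public-access license
print(license_in_effect(licenses, "am", date(2016, 6, 1)))   # None: no AM license yet
```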

Here is what the guide says about the agency access with tagging:

“To support funder harvesting, a URL for the full text must be provided using the collection metadata element in the “syndication” collection. If a different version is intended for funder harvesting and for public access, the harvesting version should be in the “syndication” collection and the public access version in “unspecified” collection.”
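
As a rough illustration of that rule, the Python sketch below builds a simplified `doi_data` fragment with the standard library's ElementTree. The harvesting URL always goes in a `syndication` collection, and a separate public URL, when there is one, goes in an `unspecified` collection. The element layout (`collection`, `item`, `resource`) is modeled on Crossref's deposit schema but omits namespaces and other required parts, so treat it as a sketch rather than a valid deposit; the DOI and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_doi_data(doi: str, landing_url: str, harvest_url: str,
                   public_url: Optional[str] = None) -> ET.Element:
    """Build a simplified doi_data element with funder-harvesting collections."""
    doi_data = ET.Element("doi_data")
    ET.SubElement(doi_data, "doi").text = doi
    ET.SubElement(doi_data, "resource").text = landing_url

    def add_collection(prop: str, url: str) -> None:
        collection = ET.SubElement(doi_data, "collection", {"property": prop})
        item = ET.SubElement(collection, "item")
        ET.SubElement(item, "resource").text = url

    # The version intended for funder harvesting goes in "syndication".
    add_collection("syndication", harvest_url)
    # A different public-access version goes in "unspecified".
    if public_url and public_url != harvest_url:
        add_collection("unspecified", public_url)
    return doi_data


fragment = build_doi_data(
    "10.1234/hypothetical.001",
    "https://journal.example.org/article/001",
    harvest_url="https://journal.example.org/article/001/vor.pdf",
    public_url="https://journal.example.org/article/001/am.pdf",
)
print(ET.tostring(fragment, encoding="unicode"))
```

When the same file serves both purposes, only the `syndication` collection is emitted, which matches the guide's wording that the split applies only if the versions differ.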

A great overview, Angela. There are challenges to publishers collecting funder names and getting the funder ID from the Crossref Open Funder Registry (as part of our rebranding “FundRef” is being phased out as a name – we’ll be putting out a blog post about this soon) but things are improving all the time and Crossref is looking at tools for smaller publishers to make this easier.

I wanted to correct one point. When Crossref fixes errors you say “Publishers will not be informed of these changes.” This is incorrect. If Crossref fixes an error by adding a funder ID to a publisher’s Crossref deposit the publisher is notified and can update their system. In addition, in the Crossref metadata the funder ID is tagged as being inserted by Crossref so the provenance of the data is clear.

Crossref is inserting funder IDs into about 25% of deposits from publishers. For example, in October Crossref received 42,000 deposits with funder IDs (good!) but also received 68,000 deposits with funder names but no funder ID (not so good! some of them could be funders we don’t have in the registry). Crossref was able to add funder IDs to just over 18,000 of the 68,000 deposits.

This is all a work in progress but Crossref is very persistent.
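
The October figures quoted above hang together, as a quick calculation shows. The inputs are the approximate numbers given in the comment ("just over 18,000" is rounded down to 18,000), and the "about 25%" appears to refer to the deposits that arrived with funder names but no IDs.

```python
with_ids = 42_000    # deposits that arrived with funder IDs
names_only = 68_000  # deposits with funder names but no IDs
ids_added = 18_000   # names Crossref matched ("just over 18,000")

unmatched = names_only - ids_added
print(unmatched)                                                  # 50000
print(f"{ids_added / names_only:.1%}")                            # 26.5%
print(f"{(with_ids + ids_added) / (with_ids + names_only):.1%}")  # 54.5%
```

The 50,000 remainder is the figure asked about in the reply below, and after Crossref's insertions roughly 55% of the deposits naming a funder carried an ID.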

That is a lot of work on Crossref’s part! As you know Ed, I have been concerned about the magnitude of the challenge of funder identification for a long time. Of the 50,000 deposits where Crossref could not match an ID, do you have any feeling for the ratio of unregistered funders versus unrecognizable names?

By the way, if you are cataloging all the world’s funders and matching them with all the articles, that might be valuable data indeed.

Thank you for the clarification. This was not clear in the CHORUS implementation guide.

Most of the articles appearing in DOE’s PAGES portal are still coming from just two CHORUS members. These are the APS and the ACS, plus a few from AIP and OSA. This suggests that implementation is proceeding very slowly, especially at the funding agency end. See http://www.osti.gov/pages/. The CHORUS dashboard shows a lot more DOE articles but they are not showing up in PAGES.

Yes, I know, but these Elsevier papers are not showing up in DOE PAGES, which is presumably the point of CHORUS, to feed the agency portals, not to compete with them. The problem is that CHORUS has these articles but DOE does not. When I last asked DOE about this it sounded like they had been overwhelmed by the new publisher imposed complexities.

For example all the licensing stuff you refer to above. The agency Public Access actions are based on the claim that they have a “Federal use license” to articles that flow from their funding (not that they fund the papers, which generally they do not). The Federal use license should govern the use of what the agencies publish, but the publishers are trying to craft their own use licenses, to somehow flow through the agency portals along with the articles. The result seems to be a big bottleneck.

The emerging complexity of public access is almost overwhelming. The original simplicity has been lost.

The angle of posting accepted manuscripts in either a funding agency repository or via the publishers’ sites seems to go against CHORUS’s core notion of pointing users to the best available version of articles on the publishers’ sites. Presumably Elsevier decided that even after a 12-month embargo, letting the federally funded fraction of articles go for free would reduce subscription revenues.

I’ve thought the whole debate over embargo periods was a red herring. I worked for many years without library support and seldom if ever had an author of a recent article ignore my request for a copy. Not instant access with a click, but getting an article in a few days wasn’t a major handicap. Authors are delighted when someone actually wants to read their new publication. Backfiles were another matter. So if, for example, a publisher like the American Geophysical Union (AGU) makes all final published articles freely available after 24 months, how important is it to carve out the federally funded 12-month accepted version exception?

Data availability gets little voice in a publisher-oriented forum, but I argue that the efficient availability of publicly funded data is a much bigger issue than clickable availability of articles. Researchers have incentives not to share data, as data sharing can take a lot of work and expense, and your rival can scoop you or use your data against you. Authors may be happy to share their glorious articles, but giving away “their” data for free is another matter, even though, if the work was publicly funded, the data are no more the personal property of the researcher than is “their” satellite, ship, or lab. Even journals with data accessibility policies may not enforce them. Real data sharing will require mandates with teeth, or some way to incentivize it. To me, making the publicly funded data that underlie the research article accessible and durable is far more important in the long run than instant access to the article itself.

But then again I work amongst data geeks, who pay for (and wear) shirts that say things like “Where will your data go when you retire?” with a drawing of a dumpster, or “Without data, you’re just someone with an opinion.”

(Disclaimer – while I enjoy excellent library access through my position at USGS, these after-hours viewpoints are solely my own.)

Hi Chris, lots to unpack in that comment.

My understanding of why some publishers (not all) have chosen to use the author’s manuscript (AM) is because that is what the funding agencies have specifically called for. Some of the agencies absolutely will not accept the version of record (VOR) for deposit in their repository. By “best available version”, what is meant is the best version that is available to that particular user: if they are a subscriber to the journal in question (or if it’s a Gold OA article), then they have access to the VOR. If not, then the best version available to them is the AM. Please note that some publishers are making the VOR available publicly on their websites to fulfill CHORUS obligations.

I suspect (and this is all speculation) that some of the rationale for this from the side of the agencies comes from their demand for a federal use license, and concerns that they may be extending their claims too far by doing this. In order to hedge, they ask for the AM, assuming the publisher has done more work on the VOR, so they can claim they’re only taking away rights from the fundee, not the publisher (though in reality, the publisher has already done a lot of work on the AM by the time it’s accepted). For the publisher, using the AM allows them to experiment with ways to retain value in the VOR, to invest in adding new features and technologies without having to give them away.

The AM will also likely create some chaos as you will now have multiple different versions of papers floating around, and what’s cited in one version may not even exist in another version.

I’m not as confident as you that the embargo argument is a red herring. I know of at least one society that made all journal articles freely available after 12 months and saw a 20% drop in subscriptions within a year (albeit, it was a Humanities journal). We have had librarians state that free availability will factor into their decisions whether to retain subscriptions. These periods must be carefully considered if they are to preserve the journals upon which the policies rely.

And I agree with you that the articles are small potatoes as compared to the data, which is the real meat of this policy. It’s interesting how vehemently many researchers are against releasing their own data. Seems it’s a lot easier to ask someone else to be loose and free with the thing they use to make their living than it is to do so yourself.

Oh I don’t question that embargo periods are very real issues for publishers and societies. My argument is that at least within my field (environmental science), I question whether access to recent publications, such as those published within the 6 to 24 month embargo periods commonly debated, is really a hardship to most researchers without subscriptions. In my experience, recent authors were easy to find and eager to share. Yet older publications were often very hard to acquire for one without research library access. Student authors graduate and move on, post-docs move on, emails change, and that senior faculty advisor on the tail of the author list might be less helpful than the eager early-career first author. Easier public access to older literature could be a real public benefit of the public access movement.

Appreciate the clarification that the AM vs VOR chaos has deeper roots than embargo periods. In that case, the Elsevier example that Angela Cochran linked to above looks workable enough. My opposition to the notion of agencies such as mine having to stand up a PubFedish style repository is self-serving. It would be laborious to implement, the costs would likely come out of the library’s hide, and they would need to be offset elsewhere. Potential offsets that come to mind to pay for such a PubFedish invention of debatable benefit might include a contract research librarian let go, subscriptions not renewed, acquisitions not acquired, …

First I have heard of Federal agencies refusing to take VORs. If an agency did that they could not use CHORUS, so it cannot be DOE, NSF, SI, NIST or USDA. My understanding is that PMC takes whole journals’ worth of VORs, independent of funder, so the other HHS agencies should too, as well as NASA. Or am I wrong? Care to give us a hint?

Agencies that use CHORUS will link to the version provided by the journal on the journal’s website. This is different from any version required by the agency to be deposited by the author into the agency’s repository. Some journals will make the VOR available on their website, but not allow authors to deposit the VOR for display in the repository. PMC does indeed accept VORs directly from journals, but as far as I know, no other agency has requested journals deposit on behalf of authors, and some have expressly forbidden it.

But if you want an example, the DOT has specifically defined “publications” as:
“‘Publications,’ for the purpose of this plan, will be defined as any final peer-reviewed manuscript accepted for publication, any intramural technical or final reports, and any Scientific Research project written deliverable (e.g., technical/final reports) that arises from extramural research funded, either fully or partially, by federal funds awarded through a DOT-managed contract, grant, or other agreement.”

No mention of the Version of Record whatsoever.

Yes, but DOT is explicitly not using CHORUS. My confusion may be that when you said an agency would not accept the VOR, I thought you meant not even the link to the VOR on the publisher’s website, which would preclude CHORUS. If you are just referring to the AM collected for the agency’s dark archive, then yes that must be the AM because the agency cannot post the VOR.

As a retired USGS scientist, I think I can shed some light on USGS publication practices mentioned here. My former colleagues are free to correct me if things have changed.
The USGS Publications Warehouse originated in the late 1990s to serve all USGS report series online. These range from straightforward data reports to monumental professional papers. Scanning the archives was a massive effort. The Warehouse did NOT originally include journal articles, draft or otherwise; these were archived with the journals themselves and libraries.
Journal articles by USGS scientists used to be announced in a monthly publication, “New Publications of the U.S. Geological Survey.” Unfortunately, the process for collecting titles functioned poorly, so the completeness of this list is dismal.
Many USGS authors now seem to be choosing gold open access, which means the Publications Warehouse can, using CHORUS, bring users directly to the full official copy. However, I don’t know how complete the Warehouse is regarding journal articles. Maybe it’s ok for recent works, but I have great doubts about older articles.
Bottom line:
1. If you want a USGS numbered series report, go to pubs.usgs.gov. Quick, simple, free.
2. If you want a recent journal article by a USGS scientist, first try the journal.
3. For older journal articles by USGS scientists, you might still need a broader search, like Google Scholar.
It’s easy to think of implementation being simple. But, when you’re talking of over a century of publications of all kinds, things get complicated and expensive very fast. Budgets are tight, and I congratulate USGS for their efforts.

There is an interesting back story to USGS being so late to the table, which I wrote about recently in Inside Public Access. It is a case of regulatory confusion. Here is the relevant excerpt:

“To begin with, the original OSTP mandate creating the Public Access program specified that it included any Federal agency with a research budget of $100 million or more. The question is, what is an agency?

The USGS exceeds the mandate dollar threshold so it sent a draft Public Access plan to OSTP for review. In fact they say they were the first agency to do so. But OSTP said that the USGS research spending meant that the Interior Department had to implement Public Access. Such a plan would include any research funded by any Interior component, of which there are many besides the USGS. For example, Interior includes the Bureau of Reclamation, the Bureau of Land Management, the National Park Service, etc., all of whom fund some research.

Apparently one or more of these components said they did not want to do Public Access and Interior HQ either did not or could not change that. The result was a standoff between Interior and OSTP. Interior would not produce an implementation plan and OSTP would not accept the USGS plan. One can only imagine the discussions that took place. On the other hand there is some precedent for Interior’s position. Four of HHS’s component agencies have separate Public Access plans, while HHS has none.

This standoff then went into limbo when the Public Access chair at OSTP went vacant. Happily the new OSTP Public Access guru has reportedly agreed to accept the USGS plan, ending the stalemate. USGS says they hope to announce an implementation plan before the end of the year, which is nigh, hence the agreement with CHORUS.”
