Today, CHORUS and the National Science Foundation (NSF) announced an agreement to use CHORUS for facilitating the discovery of NSF-funded works. The news is important for CHORUS because the NSF is the second large funding agency to use the service.
The partnership is not a huge surprise given that the NSF had signed a Memorandum of Understanding with the Department of Energy (DOE) to use the DOE's PAGES database to collect manuscripts that result from federal funding, and the DOE has already built its integration with CHORUS. In fact, the DOE was the first agency to sign an agreement with CHORUS. That said, federal agencies have not been particularly transparent about their intentions, and information is coming out in dribs and drabs.
According to the press release today, the NSF plans to collect accepted manuscripts from grantees for deposit in the NSF Public Access Repository (NSF-PAR), hosted by DOE. Just like the DOE-CHORUS workflow, people using the NSF-PAR will either be able to access the accepted manuscript in the NSF-PAR database after the 12-month embargo has expired, or will be presented with a link to the publicly available published version on the publisher site. The idea is that access will be provided to the best available version, using CHORUS as the infrastructure.
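The "best available version" idea can be sketched in a few lines. This is only my reading of the workflow described above, not anything published by the NSF, the DOE, or CHORUS. The `Article` fields, the `best_available` function, the `repository:` placeholder, and the rule that a free publisher copy takes priority are all assumptions made for illustration, and the DOI is made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

EMBARGO_MONTHS = 12  # embargo length stated in the announcement


def add_months(start: date, months: int) -> date:
    """Return the date `months` after `start`, clamped to the month's last day."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    # Days per month, with a leap-year check for February.
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    days_in_month = [31, 29 if leap else 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    return date(year, month, min(start.day, days_in_month[month - 1]))


@dataclass
class Article:
    doi: str
    published: date
    publisher_version_public: bool  # publisher exposes a free AM or VOR on its own site
    repository_has_am: bool         # author deposited the accepted manuscript


def best_available(article: Article, today: date) -> Optional[str]:
    """Pick where a reader is sent, or None if nothing is public yet."""
    if article.publisher_version_public:
        # Assumed priority: a free copy on the publisher site wins.
        return f"https://doi.org/{article.doi}"
    embargo_over = today >= add_months(article.published, EMBARGO_MONTHS)
    if embargo_over and article.repository_has_am:
        return f"repository:{article.doi}"  # placeholder for the repository copy
    return None


# Made-up example article.
paper = Article("10.1234/example", date(2015, 3, 1), False, True)
print(best_available(paper, date(2015, 12, 1)))  # still under embargo
print(best_available(paper, date(2016, 3, 1)))   # embargo has expired
```

The point of the sketch is that the repository copy is the fallback. Whether the reader lands on the publisher site depends entirely on what the publisher chooses to expose.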
This announcement comes on the heels of another partnership. CHORUS announced two weeks ago that the U.S. Geological Survey (USGS) has signed on. The USGS has a policy on making data available and on collecting work and getting it approved, but nothing specific to making accepted manuscripts that journals have reviewed accessible to the public, so we have little information about how this will work. It is not clear whether the USGS will collect accepted manuscripts from authors for hosting in its USGS Publications Warehouse database.
The National Institute of Standards and Technology (NIST) is in pilot phase with CHORUS, and the Smithsonian Institution is using CHORUS for linking to content.
Two major agencies we have not heard from yet are the Department of Defense and the Environmental Protection Agency. The Department of Transportation announced its plan late last week to use its own National Transportation Library and the USDOT Research Hub for making funded works accessible after a 12-month embargo.
With the NSF on board, CHORUS has been given a big boost. However, it seems many publishers, whose membership dues are the only source of financial support for CHORUS, have been hanging back to see which agencies will participate. CHORUS is not easy or inexpensive for publishers to implement, and it makes little sense to move forward if the solution is of little benefit to a publisher or their authors.
As a refresher, CHORUS is an overlay built on existing infrastructure that provides access to published articles resulting from federal funding. In order for CHORUS to really flourish, it needs many agencies and publishers to participate. This initiative was developed cooperatively by several publishers in response to the OSTP memorandum of 2013 calling for plans to make federal research results publicly available.
On the face of it, CHORUS provides value to three major entities:
Funders: CHORUS provides an answer to the OSTP question of, “How are you going to provide access to federally funded work?” Historically, funders have not been able to answer questions such as, “How many papers were published as a result of grant X?” They have had no real way of tracking that, and most had no place to put reports or accepted papers. CHORUS is a free service to agencies, and they get new levels of accountability and access to content. They also receive dashboards that show them what has been published and is currently in CHORUS. Again, this comes at no cost to the agencies.
Publishers: The main value here is two-fold. First, publishers participating can use CHORUS as an author service for making papers available. That said, authors still seem to be required by most agencies to deposit accepted manuscripts, so I am not sure how much authors will value this service. So far the agencies don’t seem to have provided any mechanism for third parties to deposit on behalf of authors. Second, CHORUS argues that participating publishers will retain traffic (aka readers) on their sites if the “best version available” is on the journal’s site. This assumes that there are large numbers of people going to individual agency repositories to search for federally funded works, which at least seems to be the case for PubMed Central. That said, PubMed is THE online database for biomedical literature. The same cannot be said for the other agency repositories, yet. Traffic to third-party repositories doesn’t count toward publisher COUNTER statistics or altmetrics, and it diminishes potential ad revenue. Publishers will also get a dashboard showing papers they have published and who funds them, information which presumably most of us already have access to. Paying for a system to facilitate access to federally funded works is also a good public relations move, and broadening access serves the mission of many not-for-profit academic publishers.
Users: For the user, getting free access to research papers in context, in the journals themselves, is a better experience than finding them in a third-party repository. Journals very often provide links to related articles or editorials, as well as offering corrections and retractions, which often don’t make it into repositories. CHORUS has an open API which can potentially be used to enhance existing or new discovery tools. In the meantime, users can go to the CHORUS website to search for content that was funded by participating federal funders. If the embargo period has expired, the links provided will lead to the version of the paper the publisher has chosen to expose, either the accepted manuscript or the final version of record.
What Does Implementation Look Like for a Publisher?
CHORUS takes advantage of existing services such as Crossref but there are still loads of requirements that publishers and their technology partners need to discuss and implement. I wrote about some of these last year so I will focus today on some new requirements contained in Version 2 of the CHORUS implementation guide released three months ago.
Back-door Access to Embargoed Content
The “biggest” new requirement is stated as follows:
CHORUS Policy: Members must permit the publicly accessible AM or VOR, or a VOR behind a paywall, to be available for indexing from the date of publication by Participating Funders.
CHORUS Policy: Participating funders agree to harvest only articles they funded and for which FundRef metadata has been deposited, not to try to crawl a publisher’s entire site unless the funder has a separate agreement with the publisher.
Because CHORUS participating funders will want to harvest articles for indexing soon after publication which often will be prior to public access mandate start date, there is an access control implementation requirement for CHORUS publishers. Publishers are only required to grant funder access to the articles based on research that they funded, not the entire journal. As discussed in the previous section, the harvesting can be of the AM or VOR (publisher decision). CHORUS is supporting two different solutions for funder access: IP based access and token based authentication.
The agencies want to access the content before the embargo expires so they can harvest the full text, and CHORUS is allowing them to do so. There is no definition provided of what it means to “harvest” the content. In some places the guide says “harvest and index,” which may explain the spirit of the intent. I asked CHORUS about this and apparently the agreements signed by funders state that they can index the articles in their own databases and link to them via the DOI. The problem here is that unlike a Google or Bing crawl that indexes an entire site or whole journals, the agencies are only indexing individual articles based on the funder IDs.
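To make the scope of that harvesting concrete, here is a rough sketch of the selection rule in the policy quoted above: only articles whose deposited funding metadata names the funder are in scope. The record layout, the `dois_to_harvest` function, and the DOIs and funder IDs are placeholders I made up; real FundRef metadata in Crossref is structured differently.

```python
from typing import Dict, Iterable, List


def dois_to_harvest(records: Iterable[Dict], funder_id: str) -> List[str]:
    """Return DOIs of records whose deposited funding metadata names `funder_id`."""
    selected = []
    for record in records:
        # Empty list when no funding metadata was deposited for the article.
        funders = record.get("funders", [])
        if any(f.get("id") == funder_id for f in funders):
            selected.append(record["doi"])
    return selected


# Made-up sample metadata; the DOIs and funder IDs are placeholders.
records = [
    {"doi": "10.1234/a", "funders": [{"id": "funder-A", "name": "Agency A"}]},
    {"doi": "10.1234/b", "funders": [{"id": "funder-B", "name": "Agency B"}]},
    {"doi": "10.1234/c", "funders": [{"id": "funder-A"}, {"id": "funder-B"}]},
    {"doi": "10.1234/d"},  # no funder metadata, so never in scope
]

print(dois_to_harvest(records, "funder-A"))
```

Note that an article with no funder metadata, or with the wrong funder ID, simply never shows up in the list, which is why the accuracy of publisher tagging matters so much.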
One way to implement access for agencies per the CHORUS guide is to allow the funders IP access to your content and trust that they will only access papers they funded. This seems a risky proposition without more information on what agencies are permitted to do with the content.
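For what it is worth, a platform would not have to rely on trust alone. The sketch below shows one way a publisher might pair the IP check with a per-article funder check, so that a funder's address only unlocks papers tagged with that funder's ID. This is my own illustration, not something the CHORUS guide specifies. The funder IDs and function names are hypothetical, and the IP ranges are reserved documentation networks, not real agency addresses.

```python
import ipaddress
from typing import Dict, List, Optional, Set

# Hypothetical funder IP ranges (reserved documentation networks).
FUNDER_NETWORKS: Dict[str, List[str]] = {
    "funder-A": ["192.0.2.0/24"],
    "funder-B": ["198.51.100.0/24"],
}


def funder_for_ip(remote_ip: str) -> Optional[str]:
    """Return the funder whose registered range contains this address, if any."""
    address = ipaddress.ip_address(remote_ip)
    for funder_id, networks in FUNDER_NETWORKS.items():
        if any(address in ipaddress.ip_network(net) for net in networks):
            return funder_id
    return None


def may_read_embargoed(remote_ip: str, article_funders: Set[str]) -> bool:
    """Allow access only if the caller is a known funder AND funded this article."""
    funder_id = funder_for_ip(remote_ip)
    return funder_id is not None and funder_id in article_funders


print(may_read_embargoed("192.0.2.10", {"funder-A"}))   # funder A, its own paper
print(may_read_embargoed("192.0.2.10", {"funder-B"}))   # funder A, someone else's paper
print(may_read_embargoed("203.0.113.5", {"funder-A"}))  # unknown address
```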
The other option presented is to use a token authentication system built by CHORUS that is heavily based on the Crossref Text and Data Mining (TDM) system. The token system provides an access token to the funder for each paper tagged with their funder ID. Once the funder has a token for the paper they would like to access, the publisher’s platform must then be able to grant access by verifying the token via a CHORUS API.
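The shape of that token exchange looks roughly like the sketch below. To be clear, this is an in-memory stand-in I wrote to show the flow, not the real CHORUS or Crossref API. The class name, method names, funder IDs, and DOIs are all hypothetical, and in practice the publisher's verification step would be a network call to CHORUS.

```python
import secrets
from typing import Dict, Optional, Set, Tuple


class TokenService:
    """In-memory stand-in for the token issuer/verifier (not the real CHORUS API)."""

    def __init__(self, funders_by_doi: Dict[str, Set[str]]):
        self._funders_by_doi = funders_by_doi
        self._tokens: Dict[str, Tuple[str, str]] = {}  # token -> (funder_id, doi)

    def issue(self, funder_id: str, doi: str) -> Optional[str]:
        """Issue a token only if the paper is tagged with this funder's ID."""
        if funder_id not in self._funders_by_doi.get(doi, set()):
            return None
        token = secrets.token_urlsafe(16)
        self._tokens[token] = (funder_id, doi)
        return token

    def verify(self, token: str, doi: str) -> bool:
        """True if the token is known and was issued for exactly this DOI."""
        issued = self._tokens.get(token)
        return issued is not None and issued[1] == doi


def publisher_grants_access(service: TokenService, token: str, doi: str) -> bool:
    """Publisher-side check: ask the verifier before serving embargoed full text."""
    return service.verify(token, doi)


# Made-up tagging data: which funder IDs are attached to which DOI.
service = TokenService({"10.1234/a": {"funder-A"}, "10.1234/b": {"funder-B"}})
token = service.issue("funder-A", "10.1234/a")
print(publisher_grants_access(service, token, "10.1234/a"))  # token matches this paper
print(publisher_grants_access(service, token, "10.1234/b"))  # token is for another paper
```

Compared with IP access, the per-paper token means the publisher platform never has to decide on its own which funder may see which article. The cost is that the platform has to support the verification step.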
Again, all of this work needs to be done in order to provide access to the full text of embargoed content. Note that the agencies have set the embargoes. Thus far, agencies that intend to use CHORUS (with the possible exception of the USGS) have a requirement that accepted manuscripts be deposited in a repository so they already have article information even before it gets published.
Note that it is the authors’ obligation to submit the accepted manuscript to the agency. If the publisher chooses to only make the accepted manuscript version available on its website as an author service, and if the publisher has a workflow that allows the posting of that accepted manuscript immediately in order to allow funder indexing, then the funder is getting the exact same paper it received from the authors. This seems to be a waste of resources, and certainly of the publisher’s time spent setting this whole thing up.
On the other hand, CHORUS publishers making the version of record open after embargo — or not posting the accepted manuscript until the embargo expires — will be required to allow federal agencies to access the version of record for free.
Correcting Errors in Funder IDs
The new guide also explains requirements for correcting funder ID errors. CHORUS depends on publishers tagging the funder name and an ID number and including this in the Crossref metadata.
CHORUS has told funders that they need to alert the author of any errors; the author then alerts the publisher, and the publisher provides the new data to Crossref. Additionally, Crossref will fix certain errors (like adding funder IDs that did not exist when the information was first deposited or correcting agency names at the agency’s request). Publishers will not be informed of these changes. If you are publishing funder names and IDs that turn out to be wrong, there will be a discrepancy between what you display and what Crossref holds.
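Since publishers are not told about these fixes, one practical response is to periodically compare what the platform displays against what the registry holds. The sketch below assumes you already have both sets of funder IDs keyed by DOI; how you export them from your platform or fetch them from Crossref is left out, and every name and ID here is made up.

```python
from typing import Dict, Set


def funder_discrepancies(
    displayed: Dict[str, Set[str]],
    registry: Dict[str, Set[str]],
) -> Dict[str, Dict[str, Set[str]]]:
    """Map each DOI whose funder IDs differ to what is missing or stale locally."""
    report = {}
    for doi, local_ids in displayed.items():
        registry_ids = registry.get(doi, set())
        missing = registry_ids - local_ids  # in the registry, not shown by the publisher
        stale = local_ids - registry_ids    # shown by the publisher, not in the registry
        if missing or stale:
            report[doi] = {"missing_locally": missing, "stale_locally": stale}
    return report


# Made-up example: the registry corrected the funder ID on the second paper.
displayed = {"10.1234/a": {"funder-A"}, "10.1234/b": {"funder-X"}}
registry = {"10.1234/a": {"funder-A"}, "10.1234/b": {"funder-B"}}

print(funder_discrepancies(displayed, registry))
```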
CHORUS notes that CrossMark, another service provided by Crossref, will automatically update this information if you deposit the info to CrossMark in addition to or in place of Crossref. Note that CrossMark is not a free service, nor is it endorsed by CHORUS. That said, the CHORUS guide states, “CrossMark is a good mechanism to document versions of articles (updates, corrections, retractions, etc.) and direct readers to the latest version. A side benefit is that the CrossMark user interface widget can then display the FundRef metadata without any additional implementation requirements for the publisher platform.”
Funding Agency Implementation
There are still a lot of questions about how some of this will work from the funder perspective. With a few exceptions, agencies seem to be doing a lot of this behind closed doors. This is a bad way to make policy, particularly one designed to increase transparency for the public. Here are some outstanding issues:
- What exactly are the author requirements? Are they simply required to deposit their papers into the repository of their funder? What’s the timeline? What kind of information do they need to include? What if they don’t have a publication date or a DOI at acceptance? Will they then need to go back into the repositories to add this information? What if they withdraw the paper after acceptance or the accept decision is rescinded?
- What happens if a paper has two federal funders? If a paper is funded by the DOE and the NSF, is it essentially deposited twice into the same overall database? Doesn’t this defeat the provision in the OSTP memo to mitigate having duplicates of works online?
- Is there a reason why the agencies are hesitant to allow publishers to deposit papers in required repositories, as is done with NIH-funded papers going into PubMed Central?
- Why can’t there be a way for publishers to validate grant/funder information? If publishers are being asked to allow public access to published works that result from federal funding, isn’t it reasonable to ask for a way to validate which papers qualify?
I want to state a couple of things that tend to get lost in the discussions.
- Public access mandates are requirements set on researchers accepting federal grant money. Publishers are not being mandated to give away content.
- Authors are required to ensure that the accepted manuscripts be publicly available after an embargo period set by the funder. Publishers that do not allow posting will likely see a dramatic drop in submissions from federally funded researchers.
- No one has defined how much federal funding needs to be contributed to a study in order to qualify it as federally funded and therefore bound to the rules of public accessibility. I wrote about that here.
- Agencies have been and plan to continue to collect accepted manuscripts and post them in their own repositories. If the publisher chooses not to open the content on the publisher site, the agency will allow public access to their version after the embargo period expires.
As negotiations continue between CHORUS and other agencies, we will see new requirements emerge. Likewise, as publishers move further along with implementing CHORUS, there will be new, maybe even easier, technical requirements. I continue to have concerns that these implementation requirements are a whole lot easier for very large societies and commercial publishers than they are for the rest of us. This is not entirely surprising given that the CHORUS board consists mostly of people from very large organizations.
When I am at publishing conferences, I get the feeling that a lot of society publishers are sort of waiting until their platform vendor says that CHORUS is “turned on” and then some sort of magic will happen. It’s really not as simple as just “joining” CHORUS. As I have said before, there are a lot of decisions a publisher needs to make.
Known agency mandates have start dates right around the corner. Authors are going to start asking questions, and now is a good time to explore what is right for your organization and your authors. Nothing is hidden; you just need to do your research and know what you are expected to do.