The ability of sites to capture, index and republish digital content has created a plethora of useful tools and services on the internet. Who hasn’t found it useful to search Google or another search platform and get back not simply a web page, but the answer to the query, drawn from that page in snippet form? For those conducting research, it is often helpful to store not simply a link to a paper or item, but the item itself, within one’s information management tool.

Scholarly Collaboration Networks (“SCNs”) in the academic community, such as ResearchGate, Mendeley, ReadCube Papers and others, provide this storage capability. These tools are also popular among researchers because they help them organize, cite, discover and share articles, showcase their work and foster collaboration, and with that, advance scholarly discourse.

Sharing documents via the web
Original by WMF, character extracted by User:Yuriy Bulka, CC BY-SA 3.0 via Wikimedia Commons

All of this content sharing and republishing often involves copyrighted or otherwise rights-protected works, and therefore inhabits a legal grey area when such content is copied and ingested into SCNs. While some legal precedents, notably in the US, shelter digital duplication from copyright infringement claims when it qualifies as transformative use, as with Google Book Search (see also this, and this), outside the USA there has been considerable ambiguity.

In the European Union, an initiative to address this issue was finalized in 2019 through a change to EU copyright law, the Directive on Copyright in the Digital Single Market (official text here). The Directive took effect in June 2019, and EU Member States must transpose it into national law by June 2021. SCNs, some of which qualify as Online Content Sharing Service Providers (OCSSPs, as the Directive calls them), fall within the scope of the new rules and are therefore required to follow certain steps and obligations if they want to preserve the possibility of avoiding liability for copyright infringement under the Directive. In particular, OCSSPs must make “best efforts to ensure the unavailability” of protected works for which rightsholders have provided “relevant and necessary information”. In other words, for platforms to meet their obligation, publishers themselves must supply information about the rights and permissions for sharing their content, in a form that SCNs can feasibly use at scale.

To address this challenge, a team from the STM Association’s STEC Committee developed the Article Sharing Framework. The Framework gives scholarly publishers a mechanism to provide SCNs, in machine-actionable form, with information about the identity of an article PDF and the publisher’s sharing policies for it. SCNs can then use this information to determine automatically, and in real time, whether the publisher’s content may be shared.

The Article Sharing Framework consists of slight adaptations to a number of existing structures in our technical infrastructure, so that publishers’ sharing policies can be communicated along with the content. The Framework combines the NISO Journal Article Versions (JAV) and Access and License Indicators (ALI) metadata structures with Crossref DOI metadata and a new registry of sharing policies to be maintained by the STM Association. An excellent video description of the system is available on the STM website.

To comply with the posting requirements using the Framework, SCNs need to do two things: first, determine the unique identity of the published content that a user intends to share; then, determine whether the publisher has asserted a sharing policy for that content.
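The two steps can be sketched in Python as follows. This is a minimal illustration, not part of the Framework specification: the function name and type aliases are hypothetical, and the identification and lookup steps are passed in as plain callables so that they could be backed by whatever PDF parsing and Crossref query code a platform actually uses.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical type aliases for the two steps.
Identity = Tuple[str, str]                   # (DOI, journal article version)
IdentifyFn = Callable[[bytes], Optional[Identity]]
LookupFn = Callable[[str, str], List[str]]   # returns sharing-policy identifiers


def assess_upload(data: bytes, identify: IdentifyFn, lookup: LookupFn) -> str:
    """Step 1: establish the file's identity; step 2: look for an asserted policy."""
    identity = identify(data)
    if identity is None:
        return "unidentified"          # no DOI/JAV pair could be read from the file
    doi, jav = identity
    policies = lookup(doi, jav)
    if not policies:
        return "no-policy-asserted"    # identified, but the publisher asserts no policy
    return "policy-found: " + ", ".join(policies)
```

Keeping the two steps separate mirrors the distinction drawn above: a file may fail to identify itself at all, or it may be identified but carry no asserted policy, and a platform will probably want to treat those cases differently.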

For scholarly journals, an article’s identity is a composite of two parts: the article DOI and the specific version of the article embedded in the PDF (its Journal Article Version, or JAV). The DOI alone is insufficient because publishers may assert different license restrictions for different versions of a work, and sometimes use the same article DOI for multiple versions. For example, an accepted manuscript might be sharable, whereas the version of record may be more restricted in its sharing options. The JAV metadata facilitates this distinction.
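A minimal Python sketch of this composite identity follows, using the version labels defined in the NISO JAV recommended practice (AO, SMUR, AM, P, VoR, CVoR, EVoR). The class name, the lowercase DOI normalization (DOIs are case-insensitive) and the example DOI and policy labels are illustrative assumptions, not part of the Framework.

```python
from dataclasses import dataclass

# Version labels from the NISO JAV recommended practice (NISO RP-8-2008).
JAV_LABELS = {"AO", "SMUR", "AM", "P", "VoR", "CVoR", "EVoR"}


@dataclass(frozen=True)
class ArticleIdentity:
    """Composite identity: the DOI plus the journal article version."""
    doi: str
    jav: str

    def __post_init__(self) -> None:
        if self.jav not in JAV_LABELS:
            raise ValueError(f"unknown JAV label: {self.jav!r}")
        # DOIs are case-insensitive, so normalize before using them as keys.
        # (object.__setattr__ is needed because the dataclass is frozen.)
        object.__setattr__(self, "doi", self.doi.strip().lower())


# Hypothetical example: one DOI, two versions, two different policies.
policies = {
    ArticleIdentity("10.1234/Example.5678", "AM"): "sharable",
    ArticleIdentity("10.1234/example.5678", "VoR"): "restricted",
}
print(policies[ArticleIdentity("10.1234/EXAMPLE.5678", "AM")])  # sharable
```

Because the version is part of the key, looking up the same DOI with "VoR" instead of "AM" returns a different policy, which is exactly the case the DOI alone cannot distinguish.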

Determining the publisher sharing policy that applies to a specific journal article version relies on the NISO Access and License Indicators metadata structure within Crossref. This simple structure can communicate whether the content is free to read (important, but not relevant to the Framework), and which reuse license or sharing policy applies to the content, along with the applicable effective dates. NISO is finalizing a small update to ALI this spring that adds information about the Article Sharing Framework through a new “applies-to=” attribute on the existing <license_ref> metadata tag. This additional metadata will identify the registry in which sharing policies are defined. The <license_ref> field itself will contain a “policy DOI” that uniquely identifies the sharing policy applied to the article and points to its interpretation, maintained in an STM registry.
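The sketch below shows how a platform might pull sharing-policy references out of ALI metadata. It rests on assumptions that should be checked against the Crossref schema and the STM integration guide: the namespace URI is the one Crossref uses for Access Indicators, the attribute is spelled applies_to (as in Crossref deposits) with the value stm-asf marking Framework policies, and the policy DOI in the sample is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Namespace Crossref uses for Access and License Indicators (ALI) metadata.
AI_NS = "http://www.crossref.org/AccessIndicators.xsd"


def sharing_policy_refs(ali_xml: str, applies_to: str = "stm-asf") -> List[Tuple[str, Optional[str]]]:
    """Return (policy reference, start date) pairs for license_ref elements whose
    applies_to matches; "stm-asf" is an assumed marker for Framework policies."""
    root = ET.fromstring(ali_xml)
    refs = []
    for el in root.iter(f"{{{AI_NS}}}license_ref"):
        if el.get("applies_to") == applies_to:
            refs.append(((el.text or "").strip(), el.get("start_date")))
    return refs


# Illustrative ALI block; the policy DOI is a hypothetical placeholder.
SAMPLE = """<ai:program xmlns:ai="http://www.crossref.org/AccessIndicators.xsd" name="AccessIndicators">
  <ai:license_ref applies_to="stm-asf" start_date="2020-03-01">https://doi.org/10.99999/policy-017</ai:license_ref>
  <ai:license_ref applies_to="vor">https://creativecommons.org/licenses/by/4.0/</ai:license_ref>
</ai:program>"""

print(sharing_policy_refs(SAMPLE))
```

Run on the sample, this returns only the Framework entry, [('https://doi.org/10.99999/policy-017', '2020-03-01')]; the Creative Commons license_ref is left for whatever code handles open licenses.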

The STEC working group has identified 48 variations of sharing policies that publishers use in their licenses, based on the journal article version, the amount of content being shared, the intended audience, and whether the service has agreed to abide by the voluntary STM guidelines for article sharing on scholarly collaboration networks. Each entry in the registry represents one permutation of these elements, and a publisher may assert multiple policies from the registry for a given journal article.
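Conceptually, each registry entry is a combination of the dimensions listed above, and a sharing request is covered if any one of the publisher's asserted policies matches it. The Python sketch below models that idea; the field names, the example values and the policy DOIs are hypothetical and do not reproduce the STM registry's actual 48 entries or its vocabulary.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PolicyTerms:
    """One hypothetical registry entry: a permutation of the sharing dimensions."""
    jav: str                   # journal article version, e.g. "AM" or "VoR"
    content: str               # amount of content, e.g. "full-text" or "abstract"
    audience: str              # intended audience, e.g. "public" or "private-group"
    requires_guidelines: bool  # SCN must have agreed to the voluntary STM guidelines


def permits(policies: Iterable[PolicyTerms], jav: str, content: str,
            audience: str, scn_signed: bool) -> bool:
    """True if any one of the publisher's asserted policies covers the request."""
    return any(
        p.jav == jav and p.content == content and p.audience == audience
        and (scn_signed or not p.requires_guidelines)
        for p in policies
    )


# Hypothetical registry entries keyed by made-up policy DOIs.
REGISTRY = {
    "10.99999/policy-a": PolicyTerms("AM", "full-text", "public", False),
    "10.99999/policy-b": PolicyTerms("VoR", "full-text", "private-group", True),
}
# The policies a publisher has asserted for one article.
asserted = [REGISTRY[doi] for doi in ("10.99999/policy-a", "10.99999/policy-b")]
```

With these entries, sharing the full-text VoR with a private group is permitted only when the SCN has agreed to the guidelines (scn_signed=True), while the accepted manuscript may be shared publicly either way.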

Publishers need only adapt this metadata, much of which they already collect and share via Crossref, and embed it in the files they serve to users. Publishers generally add this type of information to files during production; for backfiles, it can instead be added as the files are served to patrons from the journal platform. When an SCN is presented with a file containing this metadata, it can extract the DOI and JAV and then query Crossref for the corresponding sharing policy identifiers. Through the Article Sharing Framework, the SCN can review this data in a machine-actionable way and thereby allow or prevent the content from being posted. Platforms that seek to act responsibly will now have the tools to do so.
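The extraction step might look like the following sketch, which searches a PDF for an uncompressed XMP packet and reads the DOI and JAV from it. It assumes the properties have the local names doi (as in PRISM's prism:doi) and journal_article_version, which should be checked against the STM integration guide, and it matches them by local name rather than hard-coding namespace URIs. PDFs that store XMP in a compressed stream would need a PDF library to decompress it first, and the Crossref query that follows is not shown.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# An uncompressed XMP packet is plain XML embedded in the PDF bytes.
XMP_PACKET = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)
# Assumed local names for the DOI and journal article version properties.
WANTED = {"doi", "journal_article_version"}


def _local(name: str) -> str:
    """Strip an ElementTree '{namespace}' prefix, leaving the local name."""
    return name.rsplit("}", 1)[-1]


def read_doi_and_jav(pdf_bytes: bytes) -> Optional[Tuple[str, str]]:
    match = XMP_PACKET.search(pdf_bytes)
    if match is None:
        return None  # no uncompressed XMP packet found
    found = {}
    for el in ET.fromstring(match.group(0)).iter():
        # XMP simple properties may appear as child elements...
        name = _local(el.tag)
        if name in WANTED and el.text and el.text.strip():
            found.setdefault(name, el.text.strip())
        # ...or as attributes on rdf:Description.
        for attr, value in el.attrib.items():
            if _local(attr) in WANTED:
                found.setdefault(_local(attr), value.strip())
    if not WANTED.issubset(found):
        return None  # one of the two identity parts is missing
    return found["doi"], found["journal_article_version"]
```

The returned pair is what the SCN would then use to query Crossref for the sharing policy identifiers.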

The Framework provides a comparatively simple way for publishers to help SCN platforms comply with their duties under the new Article 17 of the EU Copyright Directive, giving SCNs an easy way to assess whether they may repost a content object on their networks for wider distribution. The STEC committee sought to address this issue by adapting current systems and the existing metadata supply chain through some minor adjustments. Rather than a heavyweight technological solution requiring significant additional development by publishers, by Crossref, or by the collaboration networks themselves, the Framework offers an elegant way to facilitate access to content via SCNs, should users wish to share and should publishers allow it. It also offers an easy path for platforms that must abide by the new obligations defined by the EU Directive on Copyright in the DSM.

In addition, while the Framework primarily targets copyrighted subscription content, it is designed to complement existing frameworks for expressing public use licenses for open access content, which are also expressed using ALI. Publishers who participate in CHORUS already supply reuse information to Crossref using ALI. With the introduction of the Framework, most models of journal publishing, including closed, hybrid, and fully open access, are now covered by a reuse or sharing policy framework.

While SCNs can use the information obtained through the Framework to enable legitimate article sharing, it is up to each SCN to decide whether ultimately to enable sharing. The Article Sharing Framework is therefore not itself a blocking technology; rather, it supports the platform in deciding whether content is shareable. Nothing in the operation of the Framework can technically prevent an upload from happening, so there is no automated blocking.
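To make this division of responsibility concrete, here is a hypothetical Python sketch in which the outcome of a Framework lookup is only advice and a platform-specific rule makes the final call. The enum values and the platform_allows_unverified setting are invented for illustration; the point is that nothing in the lookup itself stops an upload.

```python
from enum import Enum


class Advice(Enum):
    """Possible outcomes of a Framework lookup (hypothetical labels)."""
    SHARING_PERMITTED = "sharing-permitted"
    SHARING_NOT_PERMITTED = "sharing-not-permitted"
    NO_POLICY_ASSERTED = "no-policy-asserted"


def handle_upload(advice: Advice, platform_allows_unverified: bool) -> str:
    """A platform rule, not the Framework, turns advice into an action."""
    if advice is Advice.SHARING_PERMITTED:
        return "publish"
    if advice is Advice.NO_POLICY_ASSERTED and platform_allows_unverified:
        return "publish"
    # In every other case this hypothetical platform chooses to keep the
    # file private to the uploader; another platform could choose differently.
    return "keep-private"
```

With platform_allows_unverified=False, a file for which no policy is asserted is kept private; flipping the setting publishes it. Either way, the choice sits with the platform.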

More information about the Article Sharing Framework is available in a dedicated area of STM’s site, including some helpful FAQs that provide additional context. An integration guide to support adoption of the Framework by publishers and software vendors is also available. To promote successful implementation across the industry, informative webinars and hands-on workshops are being organized, and anyone is welcome to join these sessions. Registration and agenda information is available on STM’s Article Sharing Framework information page.

Thank you to the numerous members of the STM Article Sharing Framework working group who contributed to this article.

Todd A Carpenter

Todd Carpenter is Executive Director of the National Information Standards Organization (NISO). He also serves in leadership roles in a variety of organizations, including as Chair of the ISO Technical Subcommittee on Identification & Description (ISO TC46/SC9), founding partner of the Coalition for Seamless Access, Past President of FORCE11, Treasurer of the Book Industry Study Group (BISG), and a Director of the Foundation of the Baltimore County Public Library. He previously served as Treasurer of SSP.


8 Thoughts on "Article Sharing Framework: Facilitating Scholarly Sharing Through Metadata"

We are all familiar with instances where, let’s say, a large corporation, let’s say, accidentally … asserts copyright ownership and thus the right to determine use … in ways that are not correct. This can result in takedown notices, or in cases where an APC was paid but a publisher asserts copyright and the article is paywalled. In the case of an accident of this sort, one anticipates that the sharing metadata is likely also inaccurate. Is there a process within this framework for the sharing metadata to be challenged or appealed, and corrected? Thanks for any insight you can provide!

There are certainly problems with metadata in this community, no doubt.

The details of the protocol, which are posted on the Framework page, do address incorrect metadata. Although there is no formal process for reporting incorrect metadata (i.e., the platform would simply have to identify and contact the publisher), a protocol has been set up whereby, in the case of incorrect metadata, the publisher would not hold the platform liable for any incorrect posting but would ask for a rectification. Publishers should also ensure that the metadata is updated in the PDF and that the corrected metadata is registered with Crossref.

Beyond this, you’re right that there have been cases where publishers have asserted more rights than they in fact hold, say by attaching an “All rights reserved” claim to works published under certain Creative Commons licenses. In those cases, the SCN would have the right to post but may be more cautious in doing so. Here again, the publisher should be receptive to notices of incorrect metadata, act on them, and fix those errors. I’m sure that the community will use its eagle eyes to notice and call out inappropriate behavior.

Just to make sure I understand … The author who owns their copyright has to get the SCN to act on their behalf under this framework if the publisher is erroneously asserting ownership? Leaving the actual owner of the content unable to assert their own rights?

No, the author certainly can also reach out to the publisher to correct a metadata error. Indeed, I don’t see why anyone couldn’t inform the publisher of the errors in their metadata, although I don’t know necessarily how an average user would know definitively what the rights status is, as to what rights were transferred or reserved and under what type of license.

Thanks. I was hoping that maybe this framework would facilitate that author communication since it is in this context/workflow they will perhaps learn that something is amiss when the messaging advises against upload. I was thinking something like “do you think this is wrong? contact……”

How does this work for an author sharing an AAM?

Publishers can embed XMP into their PDF workflow or add extra query strings to their DOI hyperlink (as outlined in the implementation presentation on the STM website). But authors don’t do these things (except maybe in fields that use LaTeX like mathematics, physics or computer science). And SCNs aren’t aggregating from institutional repositories that might have the ability to inject that metadata as far as I know. That leaves me thinking that the only real use case for this is for toll access publishers to better flag VoRs that should not be shared via SCNs – i.e. to restrict sharing.

I don’t understand the reliance on a PDF either. There are some publications which are electronic-only online. Would it be expected that there would be a JAV PDF generated from the electronic article? Cannot there be a different file format, such as EPUB or HTML which is definitively the JAV? Dependency on PDF XMP seems too limiting in an era of multiple digital representation file formats of article content. It seems like a conflation of the “printable page” with the article content.

This project is focused on PDFs because these are generally the files that are shared within the ecosystem. The Framework would work equally well with EPUB or HTML, since those file types can contain the same metadata tags. However, most publishers are not distributing EPUB versions of journal articles (though they certainly have the capacity, since most publishers generate the PDF, HTML and EPUB from the same production XML), and HTML versions aren’t easily stored or saved. The files that are saved, stored and shared using SCNs are predominantly PDFs. The Framework isn’t so much publishers focusing on the printable page as a response to user demand for portable documents, a preference among readers and users of content that probably owes more to legacy use and systems than to features and functionality. As noted, PDF is a preferred consumption medium rather than a preferred production and distribution format.
