Editor's Note: Today's post is by Alexander Naydenov, Co-founder and Head of Marketing at PaperHive; and Heather Staines, Director of Business Development at Hypothes.is. (Full disclosure: both authors work for organizations that provide annotation services for scholarly communications.)

Annotation is coming to scholarly content, but there are key choices to be made that will dramatically affect the collective outcome we achieve.

Digital annotation is not a new idea. The earliest conceptions of what would become the internet imagined it enabling a much more interactive experience over scholarly content, known as the "read-write" web, which many have lamented we never achieved. Since 1993, when Mosaic first briefly experimented with native annotation, dozens of projects have tried to deliver it without success. There are many reasons for this failure: a lack of standards, an unwillingness to adopt proprietary systems and centralized implementations, poor user experience, and slow browsers, among others.

Aristotle, Beginning of Physics. Medieval Latin manuscript, with original Greek text added in the margins.


Approval of open annotation by the W3C as a web standard in February 2017 changed everything by establishing a foundation upon which interoperable systems could be built. (Nearly all widely adopted technologies rely to some degree on standards: browsers, email, cellular networks, and so on.) The existence of an annotation standard, together with the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, and Interoperability in particular, will finally make widespread uptake of this technology possible in a scholarly context in a way that protects against vendor lock-in.

Some skeptics are quick to point out that publishers and media sites have tried to implement comments on their sites, with results ranging from low use, to incivility and trolls, to spam. In other words: "Haven't we tried this already?"

Advantages of annotations over comments

There are some essential differences between comments and what is possible with the architecture of web annotations:

  • Annotations can be public, private, or shared in various types of collaboration groups, while comments have only one option: public. Because we've only known public comments, we tend to assume that the most public form of engagement is the norm for annotations too. However, public annotations represent only about 25% of annotator activity. Annotations made privately are also indications of engagement, providing metrics on activity on parts of documents and timelines for such attention. The ability to form ad-hoc groups opens annotation up to diverse use cases, from personal organization, to use in the classroom, among research teams, and many other applications.
  • Comments have only one motivation: discussion. The W3C model allows users to add a motivation to their annotation, such as "commenting", "correcting", "questioning", "classifying", or "tagging", among others. While these have not yet been implemented in any systems we're aware of, they hint at a much broader potential for use cases.
  • Comments are stranded upon the individual pages where they are created. There is no way for readers to get access to all of their comments without returning to the pages in question — comments in a sense belong to the publisher that implements them. Annotations by contrast are owned by the author, and they can be browsed, searched and shared with others, giving readers and researchers the ability to organize notes and discover feedback generated by others from across the web.
  • Annotations can syndicate across formats (HTML, PDF, EPUB) and even across platforms, so readers need not worry that a key conversation is taking place on another version.
  • Comments are in a jumble at the bottom of the page. Annotations are in-line associated directly with text, connecting the reflections to the sentence or phrase in context. This means you can annotate precisely, and for more kinds of reasons as you move through a text.
  • Annotations can thus serve as direct links that take a visitor right to a passage, automatically scrolling to it no matter how far down the document it sits.
  • Annotations can be created or retrieved through an API. This means that annotations can be made by machines, in specialized group layers, for all kinds of purposes, including informative tag sets, biocuration, translation, correction and retraction alerts, and more.
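To make the architecture behind these capabilities concrete, here is a minimal sketch of an annotation following the W3C Web Annotation Data Model, built and serialized in Python. The target URL, quoted text, and body value are invented for illustration; the `motivation` and `TextQuoteSelector` fields come from the standard itself:

```python
import json

# A minimal annotation per the W3C Web Annotation Data Model.
# The target URL and quoted text below are hypothetical examples.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "motivation": "commenting",  # one of the model's standard motivations
    "body": {
        "type": "TextualBody",
        "value": "This claim needs a citation.",
        "format": "text/plain",
    },
    "target": {
        "source": "https://example.org/articles/123",
        "selector": {
            # Anchors the annotation to an exact quote, with surrounding
            # context so it can be re-anchored if the page changes slightly.
            "type": "TextQuoteSelector",
            "exact": "annotation is coming to scholarly content",
            "prefix": "Digital ",
            "suffix": ", but",
        },
    },
}

print(json.dumps(annotation, indent=2))
```

Because the annotation is a self-describing JSON-LD document rather than a row in a publisher's comment table, any standards-aware client or machine agent can create, retrieve, and re-anchor it.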

The promise of open annotation

The W3C, the standards body for the web, published annotation as a web standard on February 23, 2017. This move paves the way for users to indicate in future browsers which annotation service they prefer. Until then, users can take open annotation with them via plug-ins, bookmarklets, apps, or as embedded natively in platforms. Publishers and other sites can include a simple line of JavaScript to enable annotation by default across their content. Interoperable, standards-based annotation will allow readers, researchers, and students to interact with each other's annotations even if they are using different clients, in much the same way that email works today. Those who wish to follow developments around the W3C standard can follow the activities of the Web Annotation Working Group.

What about bad behavior?

If publishers or other sites are deploying annotation across their content, they want to rest assured that the annotations created there do not detract from the quality of their content. They also want to make sure that the workload of moderation doesn’t overwhelm already burdened editorial staff.

While tool creators typically monitor the public layer, following up on any moderation flags that users click, publishers can also create groups for which they manage the moderation tools. Community and publisher guidelines detail expectations around user behavior. Improper annotations can be hidden from public view and repeat offenders may have their accounts suspended. Tool creators are also exploring the future utilization of sentiment analysis to identify toxic annotations or monitoring user behavior to identify quality contributors.

Use cases

There is a wide range of use cases for annotations in academia and e-learning.

Personal uses include note-taking on documents to organize thoughts and ideas, emphasize important information for later review, and add related information such as images and links.

Private group annotations streamline the communication and collaboration efforts of research groups. Teams use annotations to get a better understanding of a text by asking questions about it and discussing its strengths and weaknesses. They can organize their takeaways from the literature by sharing new insights in context, pointing at relevant paragraphs, or contributing related artifacts. New manuscripts can be improved through proofreading and review.

Public annotations benefit the research community by improving the exchange between readers and authors and by keeping research information relevant and up to date. Readers can ask the community or the author of a text for clarifications, research data, or experiment protocols. They can underline the merits or limitations of research findings and add their own contributions and findings, creating a network of connected knowledge. Authors and editors can enrich literature by adding corrections, updates, and recommendations precisely where they belong.

E-learning is another field benefiting from in-document discussions. Annotations on lecture slides and textbooks are used to make university lectures more interactive, to power distance learning, to encourage students to help their peers, and to themselves contribute to the content. Integration of open annotation tools with Learning Management Systems can simplify instructor workflow, course interaction, and assessment.

Lastly, peer review involving annotations is more granular and detailed, which often results in improved quality. Annotation technology can be useful in both traditional pre-publication peer review, as well as post-publication open or community peer review.

Annotations and FAIR

Scholars and research organizations are increasingly interested in the FAIR guiding principles for scientific data management and stewardship: the requirement that these should be Findable, Accessible, Interoperable, and Reusable. Annotations themselves are data and should be FAIR. Annotations make scholarly content FAIR by adding searchable metadata and links.

Annotations should meet each of the four criteria:

  • To be Findable, annotations should have persistent unique identifiers which are included in their metadata and registered or indexed in a searchable resource.
  • To be Accessible, annotation metadata should be retrievable with a standardized communications protocol that is open, free, and universally implementable. That metadata should remain available even if the data itself is no longer available.
  • Interoperability requires a formal, accessible, shared, and broadly applicable language; vocabularies that themselves follow FAIR principles; and qualified references to other metadata.
  • Finally, to be Reusable, metadata should include a plurality of accurate and relevant attributes, a clear and accessible data usage license, and detailed provenance.

Even annotations that are not made for public consumption benefit from being FAIR. Robust machine-readable metadata makes annotations easier to find, access, and reuse by their creator or their collaboration group. Annotations originally made for private purposes, such as in peer review or in journal clubs, may well be made visible later. Further, interoperability is a key aim of the W3C Web Annotation group, so that annotations made with one tool can be interacted with by those using other tools, irrespective of their level of visibility.
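The Web Annotation Data Model already carries properties that map onto each FAIR criterion. A sketch, where every identifier, name, date, and license value is hypothetical and chosen purely for illustration:

```python
import json

# FAIR-relevant metadata on an annotation, using properties from the
# W3C Web Annotation Data Model. All values here are hypothetical.
fair_annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "https://example.org/annotations/9f1c2e",   # Findable: persistent URI
    "type": "Annotation",
    "creator": {                                       # Reusable: provenance
        "type": "Person",
        "id": "https://orcid.org/0000-0000-0000-0000",
        "name": "Jane Researcher",
    },
    "created": "2018-01-15T10:30:00Z",
    "rights": "http://creativecommons.org/licenses/by/4.0/",  # usage license
    "body": {"type": "TextualBody", "value": "See also the replication study."},
    "target": "https://doi.org/10.5555/12345678",      # Interoperable: DOI link
}

# Accessible: the whole record serializes to standard JSON-LD,
# retrievable over plain HTTP by any client.
print(json.dumps(fair_annotation, indent=2))
```

An annotation carrying its own identifier, creator, license, and DOI-based target remains findable and reusable even when it moves between services or changes visibility.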

Steady progress is being made to make annotations FAIR. “Crossref’s newest content type in our metadata store ensures that scholarly discussions such as annotations are easy to find, cite, link, and assess — all core characteristics of the FAIR principles. For those not registered with Crossref, our Event Data service ensures that annotations which are made on content with a DOI are also included in the Crossref scholarly research map,” notes Jennifer Lin, Director of Product Management at Crossref.

Interoperability is the key

Of all aspects of FAIR, Interoperability might be the most important. Interoperable, standards-based annotation will allow researchers, students, and readers to read and respond to each other's annotations even if they are using different platforms and clients, in much the same way that email works today. Interoperability of annotation tools should also allow users to port their data from one tool to another, or to archive their annotations securely for use later in another context. Most importantly, interoperability is a safeguard against providers who would try to lock users in to a specific implementation, or worse, to a monolithic service.

The Annotating All Knowledge Coalition, free to join, was formed in 2015 to bring together interested publishers, universities, and technology organizations to realize an open, interoperable annotation capability within the scholarly world. Today, members are exploring the use of multiple tools from a user experience perspective, with the goal of someday achieving true interoperability.

What will it take to get there?

A healthy, robust, interoperable annotation ecosystem, delivering on all of the promise detailed in the use cases above, will require the participation, effort, and resources of many players. Tool creators should build in accordance with the standard to enable interoperability for users and partners, put control of annotation data in the hands of its creators through APIs, and avoid creating new proprietary silos. Through this, publishers can avoid vendor lock-in and be free to migrate from one system to another. Annotation enthusiasts should focus on the new standard and insist on FAIR annotations, in keeping with the increasing focus on openness and transparency across scholarship.

Interest in annotation continues to grow. Recently celebrating its sixth year, the I Annotate meeting gathers those interested in open annotation to explore use cases, assess industry developments, and demo new integrations. Videos from the event are available on YouTube. More panels at industry events are focusing on the possibilities around annotation in researcher or publisher workflow. It’s one of the featured topics at the upcoming Altmetric 5:AM Conference.

Annotation, whether it is public or private, can serve as a valuable metric for engagement with documents and parts of documents. If you’d like to join in the discussion about open interoperable annotation, we welcome your feedback.

Heather Staines

Heather Staines is Senior Consultant at Delta Think and Director of Community Engagement for the OA Data Analytics Tool. Her prior roles include Head of Partnerships for the MIT Knowledge Futures Group, Director of Business Development at Hypothesis, as well as positions at Proquest, SIPX (formerly the Stanford Intellectual Property Exchange), Springer SBM, and Greenwood Publishing Group/Praeger Publishers. She is a frequent speaker and participant at industry events including the ALPSP DEIA Working Group (co-chair), the Charleston Library Conference, the STM Futurelab and Standards and Technology Executive Committee (STEC). She is President-Elect for the Society for Scholarly Publishing and a Board Member for NASIG. She has a Ph.D. in Military and Diplomatic History from Yale University.


9 Thoughts on "Guest Post: The Time for Open and Interoperable Annotation is Now"

Our company has an engagement service (Remarq) that uses annotation, commenting, post-publication review, surveys, article-sharing, polls, user profiles, and other tools to create engagement. The authors here make some statements that I think bear expansion based on our experience on more than 100 journals across multiple disciplines, with more being added all the time.

“Comments have only one motivation, discussion.” In the comments we’ve seen in Remarq, this is an incomplete representation of the myriad motivations people can have to comment. Sometimes, it’s to elevate an article’s historic nature for people new to the field (“this is a key article”). Sometimes, it’s to point to outside resources in a helpful way. Sometimes, a comment is made to indicate a small error. In none of these cases is a discussion the motivation. Because Remarq lets editors and publishers see moderated comments before they go up (a strong, consistent preference), this may make these and other motivations clearer to us.

“Comments are stranded upon the individual pages where they are created.” There is no reason this needs to be the case. For example, Remarq gathers comments together for users, allows users to follow articles and journals (to be notified when comments or replies are made, in addition to when articles are cited), and makes recommendations for users, with commenting a factor in the weighting of recommendations. In addition, user profiles integrate with users’ commenting histories.

"Comments are in a jumble at the bottom of the page." Again, this doesn't need to be the case. Remarq's commenting features (including editor and author updates) combine the contextual feel of annotation with the ability to comment. This way, a comment is either article-level (no internal highlighted text, but the comment is placed in a sidebar, so not at the bottom) or contextual (with highlighted text indicating what the comment refers to). So this is not an inherent limitation of comments, but an inelegant solution that we think Remarq has overcome.

The authors also discuss moderation as if it always needs to be after-the-fact or post hoc. Remarq and other tools place moderation policies squarely in the hands of publishers, so that editors or moderators can review comments before they post, if they wish. With Remarq, users have to be qualified in the relevant fields before they can comment, as well. Remarq also distinguishes between user comments, author updates, and editor updates via roles, and supports co-authored comments, figures, images, and references in comments.

It’s also interesting to note that in the case of at least one of these tools, which is installed on eLife, while there is talk of “interoperability,” the actual user experience appears to be an isolated one, where the profile on eLife is unique to eLife, making the annotations not interoperable in the user’s experience — that is, they can’t integrate them into their overall profile for this tool, and they are stuck on eLife. Remarq’s annotations, comments, and other features are always accessible to the user, and there are no islands like this. Even if users extend their Remarq experience via our free plugin (Remarq Lite), their full profile and history comes with them to any site they visit, be it PubMed or a cooking site (we’ve seen everything in between, as well).

The W3C standards were a great step forward for annotation, and Remarq supports the standard, but also goes well beyond it in order to realize scholarly use-cases and user-centric design in an elegant and useful way. Standards are a baseline and starting point, but users, publishers, authors, and editors seem to be seeking more refinement, capability, and utility, based on our experience. Annotation is a nice feature, and definitely valuable, but tools that bake this into a broader and more holistic publication solution will take annotation farther into actual workflows. Providing useful analytics and clever administrative tools is also proving important.

Thanks Kent for your notes.

Regarding the various notes on "comments": these are meant to distinguish a standards-based annotation paradigm, which Remarq and Hypothesis have committed to, from a traditional commenting platform that suffers from these various drawbacks. Regarding the "jumble at the bottom of the page", certainly a blended approach avoids this. The post meant to distinguish from systems which exclusively use a "below the fold" approach to commenting.

The most important point is this: In order to move annotation to scale as a fundamental capability of the web, and in order to avoid yet another set of proprietary systems which don’t work together (the trap previous efforts have fallen into), at Hypothesis we believe strongly that a federated annotation approach is the only path forward. It’s why the Web beat AOL.

You are correct to observe that the current eLife implementation does not implement this (the article made no claims about Hypothesis, only claims about what the authors think is important for the future of this space). That is a consequence of eLife wanting their own third-party authentication scheme, which we support. However, the development of a multi-service client that would enable annotations from various compatible services or namespaces (including Remarq, if that is in your future) to be browsed simultaneously features prominently in our future, as you may have heard us discuss in any number of public fora over the last year.

It's a complex undertaking, but one we look forward to, and presumably, as a member of the Annotating All Knowledge coalition committed to interoperability, one Redlink does too.

Thanks, Dan. Interoperability is a complex concept, and can be a euphemism that subordinates user needs and desires to systems thinking and technologists' preferences. Users want something that helps them get things done, editors want something that extends their capabilities, and authors want something that empowers them. Do you mean unbounded exchange of information? That makes users uncomfortable. Do you mean a way for anything to be annotated? Sure, we do that. There has to be more precision than buzzwords often allow, and what we're finding is that publishers, authors, editors, and users have pretty sophisticated and distinct needs, which demands going beyond a standard. Also, being user-centric rather than technology-centric means "interoperability" may take a back seat at times, or even overall, if the goals are engagement, utility, value, and usability.

Buzzwords in technology abound, we all agree. But that doesn't mean that there aren't fairly obvious definitions for interoperability we can draw from. With web browsers, it's clear that they should be able to render pages the same. With email clients, it's clear that they should be able to send and receive email from any POP- or IMAP-compatible service. With IRC, it's clear that you should be able to connect to and participate in forums on any IRC service. But there are degrees: Gmail offers functionality in its web interface that is not supported when you use a third-party app.

The W3C undertook to standardize Web Annotation specifically to foster an ecosystem of interoperable clients and services. Specifically, it was “chartered to develop a set of specifications for an interoperable, sharable, distributed Web Annotation architecture.” https://www.w3.org/annotation/

In the Annotating All Knowledge Coalition calls last year we began to discuss a hierarchy of interoperability, going from zero where systems are not only interoperable but interfere, to full blown read-write compatibility and feature parity. A draft of that is here: https://docdrop.org/pdf/Levels-of-Interoperability-between-Web-Annotation-Systems-1–b95ru.pdf/

Essentially, to be truly interoperable, it seems clear that one annotation client should be able to create, fetch and reanchor annotations to and from a compatible service, and support all CRUD (Create Read Update Delete) operations.
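The CRUD operations described here map directly onto standard HTTP methods in the W3C Web Annotation Protocol, which layers annotation containers over HTTP. A minimal sketch in Python, assuming a hypothetical service URL; the requests are constructed but deliberately not sent:

```python
import json
import urllib.request

# The W3C Web Annotation Protocol maps CRUD onto HTTP methods against an
# annotation container. The container URL below is hypothetical.
CONTAINER = "https://annotations.example.org/container/"
ANNO_TYPE = 'application/ld+json; profile="http://www.w3.org/ns/anno.jsonld"'

def build_request(method, url, body=None):
    """Construct (but do not send) a Web Annotation Protocol request."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Accept", ANNO_TYPE)
    if data is not None:
        req.add_header("Content-Type", ANNO_TYPE)
    return req

anno = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "body": {"type": "TextualBody", "value": "A note."},
    "target": "https://example.org/article",
}

create = build_request("POST", CONTAINER, anno)           # Create
read   = build_request("GET", CONTAINER + "anno1")        # Read
update = build_request("PUT", CONTAINER + "anno1", anno)  # Update
delete = build_request("DELETE", CONTAINER + "anno1")     # Delete

print([r.get_method() for r in (create, read, update, delete)])
```

Any client that can issue these four requests against any compliant service, and re-anchor the annotations it fetches, would meet the full read-write end of the interoperability hierarchy.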

If you’d like to offer a different definition, I welcome the discussion.

“not only interoperable”
Sorry, I meant to say “not only *not* interoperable”.

I respectfully disagree with Kent’s assertion that a great UX may be at the cost of interoperability. At PaperHive we have successfully brought together papers and people to facilitate communication and scholarly collaboration within the site, all the while maintaining a relentless focus on smooth cross-platform UX. At the same time, we’re also strongly committed to interoperability – from day one there was never a time where this interfered with our UX, or advancing the platform to serve users’ and publishers’ needs. Quite the contrary, the discussion capabilities of the PaperHive platform greatly benefited from interoperability discussions with annotation services like Hypothesis and the W3C web annotation group in general.

As highlighted in the article, a common model for exchanging data (annotations in this case) does not prevent anyone from building sophisticated features on top of that. For example, users and publishers will always be able to retrieve the data in an interoperable format from the PaperHive API, even though the PaperHive platform itself uses richer data internally and takes a more holistic UX approach where annotation is only one ingredient.

That being said, we are sure that all initiatives pursuing interoperability provide a huge benefit to users and publishers and we really appreciate the discussions around this topic.

I maybe wasn’t as clear as intended in what I said. I wanted to note that as far as user priorities and preferences, a strong UX is far and away more important than technical feats like interoperability, which users don’t really seem to care about.

“a strong UX is far and away more important than technical feats like interoperability, which users don’t really seem to care about.”

Users don’t care that email (for example) is interoperable? In fact, isn’t the interoperability of email way more important than its UX? Personally I’d take an old copy of UNIX Pine any day over modern Gmail if it meant my message would actually arrive in the recipient’s inbox.

In applications in which that capability is critical, interoperability trumps pretty much everything: internet devices, web browsers, email clients, and (let's hope) annotation systems. Sure, if you polled a bunch of AOL users in 1993 about whether interoperability was important, I doubt it would rank in their top ten, if it ranked at all. But who uses AOL now? Just because "sufficient oxygen to breathe" doesn't rank at the top of most people's daily checklists doesn't mean it's not important to them.

Thankfully we don’t have to choose, as UX and interoperability are most definitely not mutually exclusive.

(Why join a coalition, “of some of the world’s key scholarly publishers, platforms, libraries, educational institutions, and technology organizations … to create an open, interoperable annotation layer over their content” if the “interoperable” bit is something you don’t think people care about? https://hypothes.is/annotating-all-knowledge/)

Thanks to everyone who reached out with feedback and suggestions, unsurprisingly not through this commenting tool, which is in line with our commenting vs. annotation section above. As someone who annotates all day every day, just typing in this comment feels somewhat like trying to go back and remember how to text on a flip phone.

There is a publicly annotated layer across this article where I continue the conversation with updates and additional resources. Feel free to join me there!

Comments are closed.