The largest scholarly publishers today are driven by one major near-term strategic concern: to reduce leakage and thereby bolster the value of the subscription bundle. But while they work belatedly to address this priority through Seamless Access (RA21), GetFTR, and even partnerships with ResearchGate, the savviest of them are keeping their eyes on the true structural transformation that the internet has wrought. We are witnessing the transformation away from a journal-centric model of scholarly publishing towards a researcher-centric model of scholarly communication. Success in this new environment requires engagement with researcher identity, which is a struggle even for most of the largest publishing houses. Who is competing to own researcher identity, and how can other publishers engage with this vital role?
The researcher-centric model of scholarly communication has been emerging gradually and growing steadily. Researcher-centric scholarly communication enables collaboration, supports workflow, and provides personalization.
The evidence of the shift to a researcher-centric approach is everywhere. Discovery of the scholarly literature has moved away from browsing individual journal titles and towards searches, feeds, and alerts, among other mechanisms that are increasingly personalized and that look across the literature as a whole. Open access is driving publishers to focus on author relations when in the past library relations took greater primacy. Scholars are adopting an array of tools for managing the research enterprise, seeking tools that enable laboratory-based team science and cross-institutional collaborations, including lab notebooks and other types of sharing and collaboration spaces. Universities are looking to showcase scholars as universities compete with one another for research funding and assess scholars for their potential to contribute to this competition.
Each of these dynamics, and others like them, may seem to the publisher as requiring a slight adaptation, but collectively they suggest the emergence of a new type of model. This new model provides services based on extensive information about the researcher and the connections among individual researchers.
To be sure, the impact factor and the journal title remain important, more important than some would like. But, let’s acknowledge the reality that the actual version of record has declined in comparative importance. In its place we are seeing a growth in the importance of other kinds of research artifacts, not only preprints but also everything from datasets to protocols. It is possible to imagine these artifacts grouped tidily by existing publishers, titles, and articles — but the reality of the scientific process is that these artifacts are more likely project-based, laboratory-based, or researcher-based, not publisher-based.
At this point, we have not seen a complete transition away from journal-centric publishing. Instead, this is a hybrid period, perhaps one that will ultimately have been transitional, that includes traditional journal-centric publishing alongside these newer approaches that increasingly center on the researcher.
To compete in this emerging arena of researcher-centric tools, platforms, and analytics requires the ability to embrace researchers’ natural workflows and collaborations. And to do that requires that publishers and other providers find ways to center their work, in a technical sense, on the researchers themselves.
The key enabling factor in this transformation is researcher identity. Researcher identity, as I use the term, has two common elements.
First, there is a mechanism for individual researchers to express and ideally control aspects of their identity. This includes possessing some kind of user account, enabling data about the researcher’s interests and practices to be associated with them. The user account can ultimately be associated with a variety of tools and dashboards that are customized to them.
Second, there is the potential to link individual identities together in ways that express a network of one’s professional and scientific connections. This can include co-authors, laboratory members, collaborators elsewhere, others interested in the same research topics, and so forth. This social graph will typically be both inter-institutional and intra-institutional — put another way, it is largely non-institutional and certainly it is not publisher-specific.
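The two elements described above can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions: the class names, fields, and relation labels are hypothetical illustrations and do not describe how any actual provider stores researcher identity.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ResearcherAccount:
    """Hypothetical user account: the first element of researcher identity."""
    researcher_id: str
    name: str
    institution: str
    interests: set[str] = field(default_factory=set)


class SocialGraph:
    """Hypothetical social graph: the second element, linking accounts."""

    def __init__(self) -> None:
        self.accounts: dict[str, ResearcherAccount] = {}
        # researcher_id -> {other researcher_id: relation label}
        self.links: dict[str, dict[str, str]] = defaultdict(dict)

    def add_account(self, account: ResearcherAccount) -> None:
        self.accounts[account.researcher_id] = account

    def connect(self, a: str, b: str, relation: str) -> None:
        """Record an undirected professional link, such as 'co-author'."""
        if a not in self.accounts or b not in self.accounts:
            raise KeyError("both researchers need an account first")
        self.links[a][b] = relation
        self.links[b][a] = relation

    def connections(self, researcher_id: str) -> list[str]:
        """All linked researchers, regardless of institution."""
        return sorted(self.links.get(researcher_id, {}))

    def cross_institutional(self, researcher_id: str) -> list[str]:
        """Linked researchers whose institution differs from one's own."""
        home = self.accounts[researcher_id].institution
        return [
            other
            for other in self.connections(researcher_id)
            if self.accounts[other].institution != home
        ]


# Illustrative data only.
graph = SocialGraph()
graph.add_account(ResearcherAccount("r1", "Ada", "University A", {"genomics"}))
graph.add_account(ResearcherAccount("r2", "Ben", "University A"))
graph.add_account(ResearcherAccount("r3", "Chen", "University B"))
graph.connect("r1", "r2", "lab member")
graph.connect("r1", "r3", "co-author")
```

Note that nothing in the graph is keyed to a publisher or a single institution, which is the sense in which the social graph is largely non-institutional.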
Together, the user accounts and social graphs can follow a variety of formats and be subject to a variety of different controls. In practice there are many separate instances, each with its own format and controls. For example, Google controls an enormous identity instance for all of its consumer Google accounts (enabling the use of Gmail, Google Docs, etc.). At the same time, it also allows companies and schools to utilize their own identity instances for their employees and students through G Suite (enabling the use of Gmail, Docs, etc., in an institutionally controlled environment).
Identity instances enable the kind of researcher-centricity discussed above, and they can contain and manage an enormous amount of user data. As a result, the control of identity instances, and how, if at all, they interoperate, is a strategic issue. It is possible for identity instances to follow open models and for multiple identity instances to be interoperable. Yet it is often the case that the organization that controls one identity instance feels responsibility for, or sees value in, maintaining a stronger degree of control.
As we now turn to reviewing some of the identity instances for researcher identity, it will become clear that what might be best for researchers is not the same as what is emerging in the marketplace.
Decentralization Serves Researchers
A decentralized approach to managing researcher identity would be in the best interest of researchers themselves. At its essence, a decentralized approach would provide user accounts controlled by the researchers themselves and made available to a platform provider on an opt-in and as-needed basis. I wrote in 2015 about the benefits of a single user account, providing a researcher identity instance that would be cross-university and cross-publisher, combining authorization as well as personalization. The technical architecture exists to develop this kind of researcher identity instance, as scholarly communication visionary Herbert Van de Sompel has recognized, but decentralization adds a challenging sociopolitical impediment.
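The opt-in, as-needed principle can be sketched in a few lines. This is a hypothetical illustration, not an existing standard or product: the class name, the field names, and the platform labels are all assumptions made for the example.

```python
from __future__ import annotations


class ResearcherControlledIdentity:
    """Hypothetical researcher-held profile with per-platform opt-in grants."""

    def __init__(self, profile: dict[str, object]) -> None:
        self._profile = dict(profile)
        # platform name -> set of profile fields the researcher has shared
        self._grants: dict[str, set[str]] = {}

    def grant(self, platform: str, fields: list[str]) -> None:
        """Opt in: share only the named fields with one platform."""
        unknown = set(fields) - set(self._profile)
        if unknown:
            raise KeyError(f"unknown profile fields: {sorted(unknown)}")
        self._grants.setdefault(platform, set()).update(fields)

    def revoke(self, platform: str) -> None:
        """Withdraw everything previously shared with a platform."""
        self._grants.pop(platform, None)

    def view_for(self, platform: str) -> dict[str, object]:
        """What a platform can see: nothing unless the researcher opted in."""
        allowed = self._grants.get(platform, set())
        return {k: v for k, v in self._profile.items() if k in allowed}


# Illustrative data only.
identity = ResearcherControlledIdentity(
    {"name": "Ada", "email": "ada@example.org", "interests": ["genomics"]}
)
identity.grant("discovery-service", ["name", "interests"])
shared = identity.view_for("discovery-service")   # name and interests only
hidden = identity.view_for("another-platform")    # empty: no grant was made
```

The design choice worth noticing is that the default is to share nothing; a platform sees only what the researcher has explicitly granted, and a grant can be withdrawn as easily as it was made.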
BYU has led some efforts to bring a version of this approach into being for its students through what it has called a personal API. Unfortunately, we have yet to establish the standards and capabilities needed to enable decentralized researcher identity. The parties that would have been most likely to have seen this approach as aligned with their own values and interests — universities and their libraries — do not seem to see a strategic importance in taking leadership for researcher identity.
The Seamless Access (RA21) initiative builds on the institutional nature of identity management, through existing university-controlled platforms. It ensures that, even though identity is everything, a truly researcher-centric alternative is not developed as part of the current efforts to address piracy.
Instead, identity instances for researchers are being developed in the marketplace. While no single provider covers 100% of scientists and other scholars, and while the potential of these identity instances remains unclear, control over them seems increasingly likely to rest with a profit-seeking corporation such as ResearchGate, Elsevier, or Clarivate.
To date, ResearchGate appears to be winning the battle to build a sector-wide identity instance for researchers, regardless of university or publisher, with Academia.edu as its primary competition. Though many have predicted the demise of these academic social networks, their continued growth cannot be dismissed.
ResearchGate reports having 15 million members worldwide. Some portion of these “members” are presumably inactive, or active only in limited ways. Even so, there is reason to believe that ResearchGate represents a substantial share of the global scientific community. While it is difficult to know exactly what its members are doing on the platform (anything from reading articles to engaging with collaborators to searching for jobs), the amount of traffic they generate is enormous. According to data from SimilarWeb, in a recent three-month period, ResearchGate’s traffic was nearly equal to that of ScienceDirect, SpringerLink, and Nature.com combined. Or, to provide another comparison, ResearchGate’s usage was almost equivalent to that of a basket of Elsevier’s major STM properties, including ScienceDirect, Mendeley, bepress, SSRN, and Pure.
There is power to this scale. ResearchGate has been able to associate much of the scientific literature with its authors, enabling a variety of analytics that it is able to turn into services and in some cases to monetize. Even though ResearchGate is one of the largest sources of leakage and is therefore being sued by an array of the major publishing houses, the power of ResearchGate’s data has been sufficient to enable it to develop a partnership with Springer Nature, at least on a pilot basis, in which Springer Nature content is freely distributed on ResearchGate.
Of the primary publishers, Elsevier is the only one that has adopted a strategy that requires it to take on the role of managing researcher identity. Elsevier has acquired and developed an array of collaboration, discovery, analytics, showcasing, and assessment tools, including Pure, Mendeley, Hivebench, and bepress, among others. All of these are, to a greater or lesser degree, centered around the researchers themselves. As they are woven together into a workflow, with data and dashboards building connections across them, they require a single researcher identity instance. And for this reason, Elsevier has been steadily integrating these properties by combining user accounts at the identity and data layer even while maintaining distinctive brands.
There are some reasons to suspect that implementation is lagging on the integration side — specifically, Mendeley as the individual dashboard should be more interoperable with Pure as the institutional dashboard. But researcher identity is at the heart of Elsevier’s pivot beyond primary publishing and towards its future as what it calls an information analytics business. And, in that respect, no primary publisher has made greater inroads in researcher identity than Elsevier, and none has developed an identity instance as robust as Elsevier’s. This instance is a major asset that Elsevier is working to develop and prepared to defend. If other publishers could see fit to trust Elsevier to act neutrally with respect to its primary publishing business, it is possible that Elsevier’s identity instance could be offered as the basis for cross-publisher services offered by Elsevier (for example, publication services) or ones that could be offered by others.
Clarivate has taken a very different approach to researcher identity. On the one hand, Clarivate does not have the legacy of a primary publishing business. Yet its legacy is equally tied to the journal-centricity of those businesses, through properties like its flagship Journal Impact Factor as well as ScholarOne. But the strategy that Clarivate pursued in recent years (though it is unclear how it will evolve following Annette Thomas’s departure) has also relied notably on researcher identity.
When Web of Science was still owned by Thomson Reuters, it created ResearcherID, a researcher identifier. After becoming a component of Clarivate, it purchased Publons, a service to provide credit to peer reviewers. More recently, it has merged the two together, creating a single dashboard for tracking one’s work as an author and reviewer across many publishers. Today, the Clarivate Web of Science group maintains a single identity instance enabling the use of its Publons, EndNote, and Web of Science properties. Over time, we may expect to see it combine Kopernio and other properties into this identity instance, enabling increased seamlessness on the user side. Given Clarivate’s positioning as “publisher neutral,” its identity instance could serve as the underpinning for a variety of cross-publisher initiatives that could over time challenge Elsevier’s efforts at analytics dominance.
While the corporate players continue to parry, ORCID represents a community-based alternative that could grow from what it is now, principally a researcher identifier, into more of an identity instance for researchers. Indeed, it is already showing some evidence of this, providing social login support for other services.
It is possible to imagine the ORCID functionality being enhanced to become a more robust identity instance, covering not only authors and contributors but potentially a wider array of researchers and users and developing the elements of a social graph. Such an ORCID identity instance might offer centralized versions of certain core features. But it might also adopt some of the core information ownership/control principles of the decentralized model discussed above. In such a scenario, it would allow its users to port their identity, on an opt-in basis, into a variety of services across the community, and to remove their information from those services with equal ease. But, bearing in mind the current debates about CrossRef and its future directions, it is very difficult to imagine community members like Elsevier and Clarivate supporting ORCID expanding its role to become a full-fledged researcher identity instance.
While ResearchGate may have developed a strong position as an identity instance for researchers, from a consumer perspective ResearchGate is a “niche” social network. That is to say, if one of the major consumer identity instances were to decide to develop its position in academic research, that could really change everything.
Google is an obvious candidate. An enormous number of researchers have accounts with Google’s consumer identity instance, and Google offers an array of services through its G Suite to many universities. Google Scholar is a very important discovery service for scientific research, while its Classroom has suggested a more recent willingness to develop educational tools and platforms. Its offerings may be too fractured, ultimately, to enable it to compete in what must be, from its perspective, a very small market. If it were to develop links between its G Suite for Education services and Google Scholar, that might be a sign of something afoot.
Facebook, like its philanthropic sibling CZI, is directly controlled by the Zuckerberg family. For this reason, it may be important to consider Facebook’s strengths as a social graph in combination with CZI’s acquisition of Meta and support of bioRxiv, each of which takes scholarly communication in the direction of researcher centricity. If Facebook’s enterprise product Workplace (formerly Facebook at Work), or something similar to it, begins to develop towards higher education and scientific research, that might be a sign of something afoot.
Other consumer players, including LinkedIn and Microsoft, have had less interest or less success in the scholarly communication space but could develop in these directions.
The control of researcher identity, and the management of the identity instances, should properly be seen as a major strategic dilemma for publishers, universities, and others. It is clear that the development of researcher-centric services has been hamstrung by too many publishers and other providers offering their own user accounts. Because these have not scaled, the nature of the services that can be offered remains limited. Network effects suggest we will over time draw down to a smaller number of stronger offerings for identity management. But whose interests will win out?
Perhaps the most important point of competition is between Elsevier and ResearchGate. Many publishers are in a battle royale against ResearchGate, angered by the leakage they see ResearchGate fostering. But Elsevier, which has a competing identity instance and researcher-centric set of services, has a unique rationale for leading the battle against ResearchGate: to promote its investment in analytics and defend against its most significant competition. In contrast, Springer Nature publicly, and others more privately, have examined opportunities to collaborate with ResearchGate. While no major publishing house would likely wish to rely on ResearchGate as the exclusive intermediary for its interactions with researchers, Elsevier has had a particular competitive rationale for pursuing the copyright litigation.
In this competition, however, it may be that the interests of the other major publishers, let alone the longer tail, are being ignored. None of them individually has the scale to create an alternative.
If they were to choose to do so, other publishers might be able to negotiate terms to use Elsevier’s identity instance. There are recurring rumors about ways that Elsevier has offered competing publishers opportunities to “plug into” its platform and analytics businesses. Is there a set of terms that could meet the business needs both of Elsevier and its publisher competitors?
On the other hand, it is not clear exactly what Clarivate intends in building a “publisher neutral” identity instance. In one way of thinking, Clarivate is building a portfolio of platforms, workflow, and analytics services that compete directly with Elsevier’s: Web of Science vs. Scopus; EndNote and Publons vs. Mendeley; Converis vs. Pure; ScholarOne vs. Aries. Is there a model in which Clarivate in essence becomes allied with all the major publishing houses other than Elsevier and its identity instance is shared with them?
ORCID faces the dilemmas of a poorly capitalized membership organization. As with CrossRef, many of the most exciting features that these community entities can build next might compete with one or more of their members. Can ORCID develop beyond a valuable identifier and towards more of an identity instance in ways that do not lead to unsustainable clashes with its members?
At the same time, the higher education sector remains absent from this landscape. The most prominent engagements from librarians about identity management have focused on opposing publisher efforts out of understandable concern for the protection of privacy and data security. But we have seen no groundswell of effort to develop decentralized and/or community-controlled infrastructures to enable researcher-centric solutions.
If in the long run there is to be only one researcher identity instance, which will it be? And whose interests will it advance? Researchers strangely seem to have the least voice in the matter.