Elsevier Has Deployed an End-user Tracking Tool for Security. Should Users Be Concerned About Their Privacy?

Earlier this year, Elsevier quietly began using a tracking system to detect potentially fraudulent behavior on its sites. This should come as a surprise to exactly no one. Elsevier and other publishers have been concerned about malicious behavior on their sites for a very long time. Elsevier is not the only publisher to use this class of online fraudulent behavior security service. There are other publishers using the same service that Elsevier is using. The question is, what is this service doing and is it problematic? As with all things in technology and, in particular, online security, the answer isn’t so simple.

[Image: Folder labeled Fraud over a computer keyboard. Fraud by Nick Youngson, CC BY-SA 3.0, Alpha Stock Images]

The service being used by Elsevier is called ThreatMetrix and is owned and provided by the LexisNexis arm of the RELX holding company, which also owns Elsevier. At its most basic level, it is an anti-fraud service that assesses a user’s visit to a website: it takes in available information about the state of the computer, combines it with what it knows about the user, and determines what the site should allow the user to do. Slides 44 and 48 in this RELX corporate presentation provide a simple flow of the fraud services and how they work. Essentially, it is based on the premise that we can trust potential site visitors based on how much we know about them, their behavior, and the state of their systems.

This service and others like it check a variety of technical security signals that might indicate fraudulent behavior: whether a device is logging in from a significantly different network location (i.e., IP range) than usual, whether a single user is signing in with many devices, whether ports that might be used to remotely control a device are open and active, or whether other unusual web services are active in the browser. This particular segment of the security market is focused on network or device assessment, endpoint malware detection, behavior analytics, and behavioral biometrics. The implementer of such a service can then adjust the site’s behavior based on a mix of these criteria, which together establish a rough estimate of the likelihood that a given authentication attempt is legitimate.
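To make the idea of combining signals into a rough estimate concrete, here is a minimal Python sketch. The signal names, the point weights, the device threshold, and the `risk_score` function are all hypothetical and chosen only for illustration; nothing here reflects how ThreatMetrix actually computes its assessments.

```python
from dataclasses import dataclass


@dataclass
class VisitSignals:
    """Hypothetical signals about one sign-in attempt (illustrative names)."""
    unusual_network: bool           # IP range differs significantly from the usual one
    device_count: int               # distinct devices recently seen for this account
    remote_control_port_open: bool  # a port usable for remote control is open and active
    unusual_browser_services: bool  # unexpected web services are active in the browser


# Assumed point weights (they sum to 100); a real service would tune these.
WEIGHTS = {
    "unusual_network": 35,
    "many_devices": 25,
    "remote_control_port_open": 25,
    "unusual_browser_services": 15,
}
MANY_DEVICES_THRESHOLD = 5  # assumption: five or more devices counts as "many"


def risk_score(signals: VisitSignals) -> int:
    """Return a 0-100 score; higher means the attempt looks less legitimate."""
    score = 0
    if signals.unusual_network:
        score += WEIGHTS["unusual_network"]
    if signals.device_count >= MANY_DEVICES_THRESHOLD:
        score += WEIGHTS["many_devices"]
    if signals.remote_control_port_open:
        score += WEIGHTS["remote_control_port_open"]
    if signals.unusual_browser_services:
        score += WEIGHTS["unusual_browser_services"]
    return score
```

A simple additive score like this is only one possible design; commercial services presumably use far richer models, but the basic shape (many weak signals, one estimate) is the same.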

This near-instantaneous determination then affects things like whether supplemental authentication might be warranted, or which services might be enabled or disabled, such as the ability to download a file. For example, if you have ever logged into a website you normally use, but from a hotel on a trip or from a coffee shop, the site might have asked you for a secondary level of authentication, such as two-factor authentication via your cell phone or email. Another example is travelers overseas who find they cannot stream content they have access to at home, perhaps even when using a VPN to spoof their location. ThreatMetrix is one such technology that sits in the middle of that authentication process and could divert you to a secondary or tertiary process to validate that you are who you claim to be, or cut off your ability to do certain things. These types of cloud middleware security are widely deployed across banking, retail-transaction sites, streaming platforms, and government services, for obvious reasons.
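The step from an assessment to a site's behavior can be sketched in a few lines of Python. This is an illustration under stated assumptions: the 0-100 score, the three thresholds, and the `SitePolicy` and `policy_for` names are hypothetical, not part of any real product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SitePolicy:
    """What the site permits for this visit (hypothetical structure)."""
    allow_sign_in: bool
    require_second_factor: bool
    allow_download: bool


# Assumed thresholds on a 0-100 risk score, for illustration only.
STEP_UP_THRESHOLD = 40         # at or above: ask for two-factor authentication
BLOCK_DOWNLOAD_THRESHOLD = 70  # at or above: disable file downloads
DENY_THRESHOLD = 90            # at or above: refuse the sign-in entirely


def policy_for(score: int) -> SitePolicy:
    """Map a risk score to the actions the site allows."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= DENY_THRESHOLD:
        return SitePolicy(False, False, False)
    return SitePolicy(
        allow_sign_in=True,
        require_second_factor=score >= STEP_UP_THRESHOLD,
        allow_download=score < BLOCK_DOWNLOAD_THRESHOLD,
    )
```

Under these assumed thresholds, a familiar device on a familiar network passes straight through, a login from a hotel triggers a second factor, and a highly suspicious session can sign in but not download.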

At a level below the basics, it is unlikely that the average end-user would know what is going on. Indeed, you would have to know exactly what to look for in order to identify the trackers being used on your device. For example, the information that ScienceDirect stores in your browser is contained in a nav.sciencedirect.com local storage file. The file contains only a THX_GUID data element, a globally unique ThreatMetrix identifier that pseudonymously identifies the user. This information is then passed as an element of data exchanged with an indecipherable sub-sub-subdomain of online-metrix.net. Additional information is exchanged with h.online-metrix.net. Likely, this hashed information contains the user’s global ThreatMetrix ID and various information about their system, and the subsequent exchange returns information about which functionalities the site should then allow or disallow. It would be difficult for the average user to block this functionality and still have the site function properly.

Hardly any end-users would be looking through the information being deposited on their computer by every site they visit or trawling through the server calls that are made when a web page loads. Even if they were looking, would it be clear what an h.online-metrix.net URL does or what the 210-character string after the domain in the URL means? Almost certainly not. The nav.sciencedirect.com information that Elsevier stores on your system (which is not a third-party cookie, by the way, so it’s harder to block using your basic browser settings) is opaque in the extreme. Even after researching this post, it isn’t clear to me what information could be encoded in the hash strings being exchanged.
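For readers who do want to look, a small Python sketch can at least pick the relevant calls out of a list of request URLs, such as one copied from a browser's network inspector. The function names are hypothetical, and the example URLs in the usage note are invented placeholders rather than real ThreatMetrix endpoints; only the online-metrix.net domain comes from what I observed.

```python
from urllib.parse import urlsplit

TRACKER_DOMAIN = "online-metrix.net"


def is_tracker_host(host: str, domain: str = TRACKER_DOMAIN) -> bool:
    """True if host is the domain itself or any subdomain of it."""
    host = host.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


def tracker_requests(urls):
    """Return the URLs whose hostname falls under the tracker domain."""
    hits = []
    for url in urls:
        host = urlsplit(url).hostname or ""
        if is_tracker_host(host):
            hits.append(url)
    return hits
```

For example, given `"https://h.online-metrix.net/placeholder-path"`, `"https://a.b.c.online-metrix.net/placeholder"`, and `"https://www.sciencedirect.com/"`, `tracker_requests` keeps the first two and drops the third. Matching on the dot-prefixed suffix avoids false positives from lookalike hosts such as `notonline-metrix.net`.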

Elsevier maintains in an official statement, “As part of [a comprehensive information security program], we use LexisNexis® ThreatMetrix®, along with some of the world’s leading companies, to identify potential threats and to prevent unauthorized access to our products and services. The limited data collected in this process is only ever made available to Elsevier, and is used to protect our users, their institutions and our services from security threats.”

For an extended period, the publishing world has faced issues with access control and with controlling leakage of subscribed content. Publishers are working to streamline the access control process for users to make it both secure and unobtrusive. This has been a driving force behind initiatives like Seamless Access and GetFTR. I want to make it very clear that monitoring access, or the kind of tracking that is done by ThreatMetrix, is radically different from the access control and usability improvements of the Seamless Access initiative. Although some saw a connection between the two when ThreatMetrix use was first noted, the two technologies are completely separate and do not interact in any way. While, yes, ThreatMetrix is related to authentication, it is a security protocol built around the authentication process. As with taking pain pills to mask the pain of running, you can mask the problem temporarily or you can address the real problem. In this case, there is a need to improve the user experience of gaining access to content. Some argue that the barriers to content are themselves the basic problem and that we should live without them, but in a world where there is subscribed content, securing that content from fraud is a concern.

Fraud is a significant problem online, and detecting it is something companies are only beginning to build into their security practices. According to a Gartner report entitled Market Guide for Online Fraud Detection, fewer than 5% of companies have dedicated trust and safety teams to protect the integrity of all online brand/customer interactions. This is expected to grow to 30% by 2023. Fraud detection was initially focused on banking and retail, but it is now extending to healthcare, government services, and media. It is not surprising that Elsevier is an early mover on this issue, albeit not the only one within our community, having both the technical capacity and the resources to do so. Others will certainly follow.

The key question is whether tracking library patrons using fraud detection services like ThreatMetrix is acceptable. Although most library contracts provide for the protection of users’ data from third-party data sharing and for protections related to their privacy, services like ThreatMetrix would likely fall under security protocols and systems maintenance provisions and therefore would be legally acceptable even under the most rigorous reading of privacy rules.

Even GDPR’s privacy protections have a carve-out for security and fraud protection. Recital 47 of GDPR explicitly cites fraud prevention as a ‘legitimate interest’ for the processing of personal data, so such processing is not prohibited: “The processing of personal data strictly necessary for the purposes of preventing fraud also constitutes a legitimate interest of the data controller concerned.” (emphasis added). I am no lawyer, but certainly the corporate lawyers within both Elsevier and LexisNexis have cleared this exchange from a legal perspective.

Elsevier stressed this further in an official statement provided to me, stating: “We are firmly committed to maintaining our users’ confidence and trust with respect to their privacy. Our privacy policy explains how we collect, use, retain and share any personal information. For further information please refer to https://www.elsevier.com/legal/privacy-policy.”

While these measures are likely legal and in accordance with Elsevier’s privacy policy, there are elements of this data sharing that are very troubling, particularly for anyone who is concerned about privacy. But it is important to be clear: when I say it is very troubling, I am not talking here about what Elsevier is doing, or any of the other publishers using ThreatMetrix for that matter. (Though I should note, I haven’t explored in any detail what the other publishers who are using ThreatMetrix are doing.) Both ethically and practically, these publishers are taking reasonable actions to protect their content from fraudulent use and abuse. The tracking is being done in the name of security and, at least from the data flows that have been described to me by those knowledgeable about the systems, seems entirely legitimate and poses no threat to users’ privacy within Elsevier.

It is important context to understand that while both Elsevier and LexisNexis are owned by the same holding company, that does not mean that each company is necessarily sharing data with the other, or that one company gets preferential treatment when it comes to services over others outside the corporate structure. We don’t have to look far outside of our own community to understand this. If data sharing were going on between the vendor and the corporate parent, what external company would still be working with Atypon for hosting their journal content if their data were being supplied to the parent company, Wiley? Similarly, what publisher would license their eBooks through ProQuest Ebook Central or OverDrive if the sales or usage data were provided to other publishers using the platform? Obviously, based on the trust in the community, there can be reasonable and accepted barriers to data sharing across related corporate entities. It is quite unlikely that Elsevier is receiving preferential access to user information or data from ThreatMetrix that Elsevier could be using to enhance its own profiling data. The reason is that in order to provide this kind of tracking across numerous major corporations, including many competitors across multiple business sectors, ThreatMetrix couldn’t be sharing user profile data, or data gleaned from other clients, as this would imperil ThreatMetrix’s business and relationships with its clients.

The decision to partner with ThreatMetrix likely had less to do with an internal strategy around data sharing coordinated among corporate units than with simple business logic. Again from the Gartner report, the selection of a vendor by a customer of fraud detection systems is governed by three rather basic criteria: price, ease of implementation, and recommendations within the same industry. Based on these criteria, the selection of ThreatMetrix by Elsevier makes obvious sense. It is likely the price paid was below market, if payment was even an issue between corporate divisions held by the same holding company. ThreatMetrix is a strong partner among those providing these services more broadly and is a recognized player in this space. Finally, it is likely that the positive recommendation of other large content providers also supported this decision. If you’re looking for a partner, one that is inside your own corporate tent makes sense.

It is pretty clear from conversations I have had with people inside RELX familiar with the operation of these systems that there is very limited patron data being shared between Elsevier and ThreatMetrix. What data is shared is apparently anonymized. Furthermore, what one arm of RELX is doing shouldn’t be inferred to be the corporate goal or activity of the other arm. In this particular situation, what Elsevier appears to be doing is understandable, apparently limited, and prudent.

HOWEVER, and this is a giant however, there are a number of privacy problems or potential problems with ThreatMetrix and even more significant problems with the larger business services that LexisNexis provides. The likely connection between LexisNexis data services and the data being captured by ThreatMetrix tracking should be quite concerning to librarians concerned about patron privacy. A different part of the LexisNexis business model apparently relies on the data network and data collection around users’ online activity that ThreatMetrix is tracking. One should be careful not to conflate what Elsevier is doing with what LexisNexis is doing.

Within this broader context is an even bigger issue and it resides not with Elsevier or other publishers using ThreatMetrix. Several in the world of privacy, notably Wolfie Christl in his report Corporate Surveillance in Everyday Life, have detailed how information is being collected by data brokers like ThreatMetrix and how they are monetizing it. Notionally the data provided by Elsevier and other content providers to ThreatMetrix is anonymized and very limited in scope. However, there is a great deal of research that points to the ability to de-anonymize data, if you have enough of it. Through browser fingerprinting, IP-address tracking, and cross-site behavior tracking one can re-identify someone from anonymized data in most cases. It is clear that ThreatMetrix is not only capable of doing so, but that they are doing it and then turning around to sell it as a service to others.

No one from the outside would ever know what internal data processing is done with the anonymized data collected from the hundreds of ThreatMetrix customers, or the hundreds of millions of individual users those companies serve, such as the millions of library patrons using content by publishers implementing ThreatMetrix services. While we can’t know for certain, one can draw some inferences from the other products that LexisNexis sells to government law enforcement agencies. LexisNexis’s own marketing material describes their LexID digital data and what can be done with it. The basis for this is the combination of “online and offline behavior” through various data collection tools. Back in 2018, ThreatMetrix claimed to have identified 4.5 billion devices and 1.5 billion mobile devices (see Slide 40), and to be able to match up that information with physical addresses, consumer records, IP addresses, consumer identities, and a variety of other public and private data. The linchpin of a large chunk of this digital ecosystem is very likely the THX_GUID that is stored in the user’s browser, which uniquely identifies the user across all the sites that use ThreatMetrix. Tracking what sites a user visits, from what location, and when, can provide a valuable and detailed map of a person’s life. This could be very valuable data for law enforcement, and it is clear that they are taking advantage of it, as evidenced by a LexisNexis press release earlier this year. Even anonymized information from each corporate partner of ThreatMetrix, while individually not problematic, creates serious issues when aggregated.

It is easy to de-anonymize data if you have enough other information and can connect various anonymized information across datasets. Information collected for security purposes, such as browser settings, can be used to uniquely identify a web browser. In this way, when you visit a site, even one that is new to you, you can be linked to your unique profile in a data tracking system. These practices are well known and used most notably by advertising companies. Where LexisNexis takes this practice a step further is in not simply tracking people to sell them sneakers, or their next holiday, or their potential next elected representative, but in selling this tracking service to law enforcement and governments, specifically for the purposes of tracking and apprehending suspects. Whatever you may think about the appropriateness of online behavioral data being used by law enforcement, protecting patrons’ intellectual freedom from government monitoring is exactly the rationale that drove the library community toward advocating for privacy in the early 20th century.
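The linking step described above can be illustrated with a minimal Python sketch. It hashes a set of browser attributes into a fingerprint and groups visits to different sites by that fingerprint. The attribute names, the site names, and the `fingerprint` and `link_visits` functions are hypothetical; real fingerprinting systems use many more signals and tolerate small changes between visits, which this exact-match sketch does not.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash browser attributes into a stable identifier.

    Keys are sorted so the same settings always give the same hash,
    regardless of the order in which a site happened to collect them.
    """
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def link_visits(visits):
    """Group (site, attributes) pairs by fingerprint.

    No name or login is needed: visits to unrelated sites end up in the
    same profile simply because the browser looks the same each time.
    """
    profiles = {}
    for site, attributes in visits:
        profiles.setdefault(fingerprint(attributes), []).append(site)
    return profiles
```

For instance, if a browser reporting the same time zone, screen size, and font list shows up at `publisher.example` and later at `bank.example` (both invented names), `link_visits` places both visits under one fingerprint, even though neither site knows who the person is.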

Elsevier and other publishers are right to seek to prevent theft of their content, and it is entirely reasonable that they partner with service providers that can provide robust protection against the real fraud that is taking place. But if your business is selling to the library market, turning to LexisNexis would probably violate the spirit, if not the letter, of a company’s commitment not to track library patrons, because of LexisNexis’s other business line of capturing and selling analytics services built on user behavior tracking. It is not that LexisNexis is specifically tracking the download of this paper or that one on the Elsevier system, but that it is using these behavioral data, in particular its identification of people and their devices, to build a profile of an ever larger segment of the population in order to track those citizens.

One of the problems we all have in choosing online services, be it online security, email provision, or where we choose to watch our next movie, is understanding the full scope of the business that the service provider is engaged in. In this digital environment, it oftentimes isn’t simply getting you to watch the movie or providing you email. The real reason for providing email service could be to train machine learning, or to craft advertising, or to sell your behavioral data to someone else. We all need to better understand what the core business of our partners is. You might not be the one getting the service; rather, you might be the service for someone else. Publishers using ThreatMetrix need to understand that motivation and reconsider whether it is the right partner in this community.

Todd A Carpenter

@TAC_NISO

Todd Carpenter is Executive Director of the National Information Standards Organization (NISO). He additionally serves in a number of leadership roles of a variety of organizations, including as Chair of the ISO Technical Subcommittee on Identification & Description (ISO TC46/SC9), founding partner of the Coalition for Seamless Access, Past President of FORCE11, Treasurer of the Book Industry Study Group (BISG), and a Director of the Foundation of the Baltimore County Public Library. He also previously served as Treasurer of SSP.

Discussion

7 Thoughts on "Elsevier Has Deployed an End-user Tracking Tool for Security. Should Users Be Concerned About Their Privacy?"

Yet another reason to avoid Elsevier products. (And that sadly includes the Aries Systems’ excellent Editorial Manager platform as well.)

By atw
Oct 13, 2020, 11:08 AM

Aries Systems has not deployed or implemented ThreatMetrix to any Editorial Manager journal sites, and currently have no plans to do so in the future. We encourage publishers and editors to contact their dedicated Aries Account Coordinator for additional questions or concerns. Thank you!

By Aimee DesRoches
Oct 14, 2020, 8:10 AM

“Elsevier is not the only publisher to use this class of online fraudulent behavior security service. There are other publishers using the same service that Elsevier is using.”

I am curious to hear who the other publishers are and the services they are using.

By Emily McElroy
Oct 13, 2020, 12:26 PM

Great write-up, but none of this is particularly surprising or new, most other industries in addition to the financial services have used this sort of technology for years. Privacy concerns around reverse engineering anonymised data is a question of utility Vs cost, it’s highly unlikely a nefarious actor would go to such lengths to gain these insights, especially when there are far simpler ways to gain those data points.