At a press conference on Friday last week, the U.S. Federal Bureau of Investigation (FBI) unsealed indictments of nine Iranian citizens. This sentence is an odd way to start a Scholarly Kitchen post, admittedly. What makes this case interesting to the scholarly community is what these men were indicted for: the bulk theft of intellectual property from academic institutions in a brazen scheme to gather and redistribute scholarly content. The indictments outline a multi-year effort, launched in approximately 2013 by the Mabna Institute, a company based in Tehran, to assist Iranian universities and scientific and research organizations in stealing access to non-Iranian scientific resources. The indictment press release describes the alleged efforts whereby:
“…the Mabna Institute, through the activities of the defendants, targeted more than 100,000 accounts of professors around the world. They successfully compromised approximately 8,000 professor email accounts across 144 U.S.-based universities, and 176 universities located in 21 foreign countries.”
In addition, the scheme sought to capture credentials and materials from 47 U.S.-based and foreign private sector companies, the U.S. Department of Labor, the Federal Energy Regulatory Commission, the State of Hawaii, the State of Indiana, the United Nations, and the United Nations Children’s Fund. The indictment alleges a complex, carefully organized effort targeting all domains of research, including science and technology, engineering, social sciences, medicine, and other professional fields. The defendants allegedly conducted reconnaissance of targets to determine individuals’ research interests and where they had published articles. Based on that background information, and posing as colleagues from other institutions, the team sent phishing e-mails to their targets. Once compromised credentials were collected, they were used to access and copy materials, including scholarly journals, theses and dissertations, and electronic books, for further distribution. Credentials were allegedly also resold so that others could access the compromised institutions’ systems.
The scale of this effort was tremendous, as outlined in a statement Friday by U.S. Deputy Attorney General Rod Rosenstein: “These nine Iranian nationals allegedly stole more than 31 terabytes of documents and data.” This amount of stolen data is roughly equivalent to the disk space necessary to hold a digitized version of the print collection of the Library of Congress (if LC were to digitize it). In addition, the FBI alleged that the scheme had ties to the Iranian government and Iranian universities. Rosenstein continued, “For many of these intrusions, the defendants acted at the behest of the Iranian government and, specifically, the Iranian Revolutionary Guard Corps.” This allegation is quite significant and, if true, would considerably raise the stakes in the cat-and-mouse game of using credentials to illicitly capture and republish scholarly content.
Although Sci-Hub is not named in this case, there have been claims, and questions, about whether it has engaged in similar phishing and credential theft activities. Alexandra Elbakyan, founder of Sci-Hub, denies being involved in these sorts of fraudulent activities. The case brought by the FBI is the clearest legal indication that these types of attacks on the academy for the purpose of acquiring intellectual property are happening regularly and at a massive scale, some visible and some less so.
As it happened, just hours before this announcement was made, I engaged in a Twitter discussion with some open access advocates who continue to believe that Sci-Hub and related services are functioning on the basis of “donated credentials.” In an email interview published by Mike Taylor, a paleontologist at the University of Bristol, Elbakyan is quoted as saying that the use of phished credentials to add content to the Sci-Hub system “is possible, because Sci-Hub acquires passwords from many different sources,” not just “donations.” I contended via Twitter that the scale and regularity at which compromised credentials are identified by publishers and libraries indicate that this activity is far more systemic and pervasive than a few hundred credential “donations.” And now, it is formally alleged that there is an active, aggressive, government-sponsored campaign to scoop up login information.
Could there be a link between these Iranians and Sci-Hub? Again, such a formalized link isn’t clear, but the indictment states that one of the sites run by the defendants, Gigapaper.ir, “sold a service to customers within Iran whereby purchasing customers could use compromised university professor accounts to directly access the online library systems of particular U.S.-based and foreign universities.” From my perspective, to presume a connection between Sci-Hub and the Mabna Institute would be pure speculation, but one only needs to glance at the available usage levels of Sci-Hub to note that Iran-based access is a significant source of the service’s usage. (These usage levels are based on the data that were provided to John Bohannon for his 2016 article in Science.) There is no reason to imagine that the Iranians would be the sole source of compromised credentials used by Sci-Hub, but if one considers the multi-year scale of the Iranian effort described in the indictment, with allegedly significant financial support from the Iranian government, it seems odd that a “poor Ph.D. student” like Elbakyan would be able to replicate a similar enterprise without similar resources.
Given Deputy A.G. Rosenstein’s statement that “this case is important because it will disrupt the defendants’ hacking operations and deter similar crimes,” it would seem odd to target two relatively small redistribution sites in Iran and pass over the prime offender in the marketplace. On the other hand, it might have been easier to prove the direct connections outlined in this indictment than it would be with others. Of course, we don’t know what else the FBI might be working on, or whether further sealed indictments might exist or be forthcoming. Perhaps the FBI is more aggressively pursuing the source of the compromised credentials and not the distribution of the content once it has been acquired. As Elbakyan stated in the Science article, “I cannot confirm the exact source of the credentials, but can confirm that I did not send any phishing emails myself.” If the Iranians had developed the infrastructure to gather these credentials, why wouldn’t Sci-Hub just use what was available, rather than build it themselves?
While I am certain that some community members are foolish enough to “donate” credentials, it is now clear that the scale and scope of the breaches of academic systems to illegally aggregate intellectual property go well beyond a few hundred zealots or simpletons willing to blithely throw their campus’ online security to the winds. The intellectual property being sought is worth hundreds of millions, if not billions, of dollars. And the intellectual property stolen is not just papers and books; it might also be someone’s next paper, scientific discovery, or corporate research. Just as importantly, the theft of these credentials is significant because they provide access not only to library resources, but also to administrative systems, email systems, and other valuable research resources containing private information.
One of the earliest developers of the Internet, Vint Cerf, has commented that he considers one of the biggest missteps in the formation of the Internet to be the failure to design or build security more deeply into the infrastructure at the time. Security would have been a non-trivial addition to an already complicated endeavor, so it’s not particularly surprising that it wasn’t addressed. In a 2015 Washington Post article, David Clark, an MIT computer scientist and another internet pioneer, was quoted as saying, “It’s not that we didn’t think about security. We knew that there were untrustworthy people out there, and we thought we could exclude them.” This point is nothing new, especially for Clark and those who have been deeply involved in web technologies since the development of networked computer systems. Clark was interviewed about the problems of internet security for an article some nine years earlier, in 2006, in which he reflected even further back on a 1992 presentation about the lack of embedded security in Internet protocols. What is interesting and troubling is how some of those “untrustworthy people” now actually have government support for their nefarious activities.
The relatively lax information security surrounding access to subscribed resources is one of the reasons behind the push toward the Resource Access in the 21st Century (RA21) project, led by NISO and the STM Association. The publishing community has done woefully little over the past 25 years to innovate and improve the access control systems in place to provide users with easy access to subscribed content, particularly as users have become more mobile. Would RA21 help prevent the types of phishing schemes that are at the core of this case? Possibly not, but its solutions, when adopted, would certainly limit the potential damage from a single compromised credential by making it possible to target the source of that compromised credential more quickly and more precisely. Also, tying content access to the credentials that patrons already use regularly for a variety of other systems makes a lot of sense, both to raise awareness of the need for security and to make it more routine for users to authenticate to get access to materials. This process need not be cumbersome, as anyone who authenticates daily for access to Facebook, Twitter, Google, or Amazon can attest. What RA21 seeks to achieve is to make the individual login experience for subscribed content similar to the ones we all use daily without thinking twice, ideally in a more privacy-protecting environment than those other services.