On the high seas, you can see a pirate approaching for miles. This gives the captain and crew time to prepare themselves against the onslaught, warn other ships, and call the naval authorities for help. On the Internet, pirates act with stealth. Often, you don’t even know that you’ve been boarded until the pirates have left with the booty. This is the story of one such attack, carried out by Sci-Hub pirates last Christmas Eve.
When you see the spike in article downloads at this one publisher, the first and most obvious question is whether the platform provider detected the attack and whether the publisher was notified. Nope and nope.
This naturally leads to a second question: why wasn’t such an obvious surge in downloads detected? Did the sailors, who were supposed to be on watch, get into the captain’s private stash of rum and fall asleep in the crow’s nest?
If you take a more granular look at the pattern of downloads that day, however, the spike is much less obvious. The software robot behind the downloads never exceeded 27 papers per minute. Instead of moving systematically from one journal to the next and requesting papers sequentially, it followed a much more random pattern. It was as if the pirate robot had been trained specifically to behave like a human.
I contacted this publisher’s platform provider, who told me that they currently employ two blocking techniques. The first is automated and applied journal by journal: it temporarily blocks an IP address when its download rate exceeds a set limit, then clears the address within a few minutes once traffic returns to normal levels. The second is manual and blocks the offending IP address(es) from all content the platform provider hosts.
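To make the first, automated technique concrete, here is a minimal Python sketch of a per-journal rate limiter. The platform provider did not share its actual limits or implementation, so the class name `PerJournalLimiter`, the 30-per-minute limit, and the five-minute block are all assumptions chosen for illustration.

```python
from collections import defaultdict, deque


class PerJournalLimiter:
    """Hypothetical per-journal limiter: temporary block, automatic clearing."""

    def __init__(self, limit_per_minute=30, window=60.0, block_seconds=300.0):
        # All three values are assumed; the real thresholds are not public.
        self.limit = limit_per_minute
        self.window = window
        self.block_seconds = block_seconds
        self._hits = defaultdict(deque)  # (ip, journal) -> recent request times
        self._blocked_until = {}         # (ip, journal) -> time the block lifts

    def allow(self, ip, journal, now):
        """Return True if this download request should be served."""
        key = (ip, journal)

        # Still inside a temporary block for this journal: refuse.
        if self._blocked_until.get(key, 0.0) > now:
            return False

        # Forget requests that have aged out of the sliding window.
        hits = self._hits[key]
        while hits and now - hits[0] >= self.window:
            hits.popleft()

        hits.append(now)
        if len(hits) > self.limit:
            # Over the limit: block this IP on this journal only.
            self._blocked_until[key] = now + self.block_seconds
            hits.clear()
            return False
        return True
```

Under these assumed settings, a robot pacing itself at 27 papers per minute and rotating across journals is never blocked, while a burst against a single journal trips the limit on the 31st request inside one minute. Because each counter is keyed by journal, spreading requests across titles keeps every individual count even lower.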
Almost all of the downloads that took place on this Christmas Eve were attributed to a single IP address registered in Iran. In his investigative work on Sci-Hub, journalist John Bohannon reported that Iran hosts several local Sci-Hub mirror sites. According to the full dataset, this one Iranian IP address was responsible for a total of 88,202 article downloads across hundreds of scientific publishers. There were many other IP addresses with article counts far exceeding what a human being is capable of downloading. In other words, the sea is full of pirates.
Part of the problem in detecting robot pirates is that each publisher has a myopic view of what is taking place on the high seas. By keeping its download rate low, cycling through journals, and jumping from publisher to publisher, this pirate robot was able to escape detection.
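One way to picture the wider view that no single publisher has is a tally of downloads per IP address across every publisher at once. The sketch below is hypothetical: no pooled log of this kind is described here, and the function name `flag_suspect_ips`, the layout of each event, and the 500-downloads-per-day threshold are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def flag_suspect_ips(events, max_daily_downloads=500):
    """Flag (ip, day) pairs whose combined downloads exceed an assumed human ceiling.

    `events` is an iterable of (ip, publisher, journal, day) tuples drawn from
    a hypothetical log pooled across publishers.
    """
    totals = Counter()
    publishers_seen = defaultdict(set)

    for ip, publisher, journal, day in events:
        key = (ip, day)
        totals[key] += 1
        publishers_seen[key].add(publisher)

    # Map each flagged (ip, day) to (total downloads, distinct publishers hit).
    return {
        key: (count, len(publishers_seen[key]))
        for key, count in totals.items()
        if count > max_daily_downloads
    }


# Made-up traffic: one robot spreading 2,000 downloads evenly over
# 20 publishers, and one reader fetching a dozen papers from one publisher.
events = [("192.0.2.1", "pub%d" % (i % 20), "journal%d" % (i % 60), "day1")
          for i in range(2000)]
events += [("192.0.2.2", "pub0", "journal0", "day1")] * 12

flagged = flag_suspect_ips(events)
```

In this made-up traffic, each publisher sees only 100 downloads from the robot, well under the assumed threshold, yet the pooled tally flags it and leaves the human reader alone.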
Put into nautical terms, while this platform provider outfits each individual ship (journal) with a rudimentary pirate detector, it invests little in overall security of the seas. Consider, now, that there are multiple shipping companies (online platform providers), each with their own security methods but no one responsible for policing the high seas, and you have a situation where online piracy can thrive.