Yesterday, Aaron Swartz — who designed the RSS 1.0 specification and is a well-known Internet academic — was indicted in Boston on charges of wire fraud, computer fraud, and a number of other infractions for allegedly breaking into a computer closet at MIT and illegally downloading millions of documents from JSTOR.

Swartz, now 24, is an Internet and open-government activist who was a fellow at Harvard’s Center for Ethics. He has done things like this before. In 2009, he downloaded millions of documents from a government site called Pacer (Public Access to Court Electronic Records), primarily because he and others thought it was wrong to charge citizens $0.08 per page to access public documents on a system that was archaic.

The months-long effort to break into MIT and JSTOR likely was also designed to draw attention, as the indictment suggests:

Although Harvard provided Swartz access to JSTOR’s services and archive as needed for his research, Swartz used MIT’s computer networks to steal well over 4,000,000 articles from JSTOR. Swartz was not affiliated with MIT as a student, faculty member, or employee or in any other manner.

It’s tempting to think that he was using MIT access to prove a point as an activist — a way of demonstrating that anybody, no matter their affiliation with an academic institution, should have access to this information. However, according to the indictment, he repeatedly tried to conceal the fact that he was gaining unauthorized access to the MIT network, and persisted despite his computer being barred again and again. He allegedly changed to a different computer when his first became too well-known to network security. The indictment accuses him of using a program called “keepgrabbing.py” and of fleeing from police when spotted. He allegedly used fake email accounts and a Mailinator throwaway email address to cover his tracks.

That doesn’t sound to me like someone trying to make an ethical point.

Reactions from those involved smack of preparedness, as clearly this has been a simmering story that the indictment brought to a full boil. JSTOR issued a statement reading, in part:

Last fall and winter, JSTOR experienced a significant misuse of our database. . . . The content taken was systematically downloaded using an approach designed to avoid detection by our monitoring systems. The downloaded content included over 4 million articles, book reviews, and other content from our publisher partner’s academic journals and other publications. . . . We secured from Mr. Swartz the content that was taken, and received confirmation that the content was not and would not be used, copied, transferred, or distributed.

A statement from DemandProgress.org, Swartz’s organization, reads with forced naïveté, as if breaking into MIT and hacking into a computer service are equivalent to your everyday “downloading”:

As best as we can tell, he is being charged with allegedly downloading too many scholarly journal articles from the Web. The government contends that downloading said articles is actually felony computer hacking and should be punished with time in prison.

“This makes no sense,” said Demand Progress Executive Director David Segal; “it’s like trying to put someone in jail for allegedly checking too many books out of the library.”

Apologists include John Wilbanks, a thinker I often respect but one who, in this instance, seems to be willing to overlook all the frankly bizarre allegations about going to MIT, hacking their network, and downloading documents when Harvard provided them all to Swartz as part of his employment there. His tweets on the topic miss some of these main points in the indictment:

Wilbanks tweet — JSTOR’s mission: “to help the scholarly community take advantage of advances in technology”. except for downloading and analyzing in bulk.

Yet JSTOR has a paid service for large-scale analysis, one that doesn’t require a user like Swartz at Harvard to go to another institution, hack into its network multiple times while evading security (both network and physical), and download millions of articles without any clear indication of research intent. Swartz could have done this from his desk at Harvard.

Wilbanks tweet — Holy smokes. JSTOR should be open access, of course, but i hope this turns out to be a misunderstanding.

Why should JSTOR be open access? Open access isn’t free access; it’s access paid for upfront by authors. The JSTOR aggregation is a mixed bag from a business model perspective, and because the archive is so large, most of it is subscription-based content that publishers and/or JSTOR had to pay to digitize, and have to pay to sustain and service, and so forth.

Swartz was released on $100,000 bond yesterday; his next court date isn’t until early September. We have to assume Swartz is innocent until proven guilty. Yet, this little brouhaha seems unlikely to prove any major ethical point, no matter the outcome. Is the point that the scholarly literature as hosted at JSTOR should be free to anyone? Is the point that MIT’s computer network should be available to anyone? Both are equally illogical. Was MIT chosen because of DSpace, the open access repository? Why hide your identity and run from police if the point was an ethical stand?

But that doesn’t mean that those impassioned advocates for free information and access won’t latch onto this as some sort of rough cause célèbre — a real reach, from what I can tell at this point. Who needs to hack MIT to hack JSTOR? Couldn’t Swartz just have gone to a Starbucks?

The storyline eludes me here. And that’s what’s so perplexing about Swartz’s alleged crimes.

We may yet hear more about the specifics. We’re likely to hear even more about the generalities.

Kent Anderson

Kent Anderson is the CEO of RedLink and RedLink Network, a past-President of SSP, and the founder of the Scholarly Kitchen. He has worked as Publisher at AAAS/Science, CEO/Publisher of JBJS, Inc., a publishing executive at the Massachusetts Medical Society, Publishing Director of the New England Journal of Medicine, and Director of Medical Journals at the American Academy of Pediatrics. Opinions on social media or blogs are his own.

15 Thoughts on "A Bizarre Approach to Accessing JSTOR Earns Federal Charges for an Internet Activist"

So I’m reading this from afar with limited understanding of what constitutes a particular criminal offence under the US legal system. To me, the JSTOR statement and any comments from it are a red herring. I think that’s a breach of contract issue (which he may have been covered by, depending on his place of employment at the time) and potentially a civil case which JSTOR chose not to pursue (I hope I’m understanding that correctly).

The federal charges seem to revolve around some basic concepts of “using a computer in a nefarious manner” and theft. To me the theft charge is the one that really needs to be paid attention to. I’m not sure how one can ‘steal’ a digital object that has been defined as being available to all without precondition (a definition of OA there). Note, the route by which those items were acquired is irrelevant: it doesn’t matter how you stole it, just that you managed to achieve the theft. The non-OA articles are interesting here because the theft accusation implies that mass access and duplication = theft. To which I would add, “under what circumstances does mass access and duplication = theft?” An overly restrictive/broad definition here opens up all sorts of chilling effects, I suspect.

As has been pointed out, one M Zuckerberg kickstarted his career by doing what would appear to be largely the same exercise when he mass copied images off Harvard’s servers in order to create “The Facebook”. This was in breach of Harvard’s IT policy. Was that theft? Given that he most certainly has made money out of the activity, does that also qualify him for a fraud investigation? Leaving facebook out of it, at what point does screenscraping become theft? One could argue that Google engages in a very similar practice (an argument that has been made by one Mr R Murdoch).

Any legal types care to comment on this?

The other interesting legal question to me is that much of the material involved is in the public domain, and how the IP protection laws apply to those files.

I’m assuming he cannot be charged with stealing public domain works. I’m further assuming that to be charged with theft, the owner of the IP has to assert that their material has been stolen. Over here (the UK), for example, you cannot assert that your car was stolen in order to beat a ticket from a speed camera unless you reported it stolen prior to the ticket, or can otherwise show it wasn’t you in the car. As near as I can tell here, the actual owners of the IP haven’t made any claims of theft. Can the Feds press these charges without listing the victims of the alleged theft? Re the public domain point, I’d guess that the only way you could steal from the public domain would be to destroy all copies in the public domain whilst keeping one for yourself.

Remember though, that JSTOR is not the copyright holder on the non-public domain articles. We haven’t heard anything from the copyright holders here, just the company that licenses them and makes them available.

Also note what he’s charged with:
Wire Fraud, Computer Fraud, Unlawfully Obtaining Information from a Protected Computer (information whose “value” exceeds $5000), and Recklessly Damaging a Protected Computer. Only the third charge seems to have any relevance to the actual value of the material involved.

Keep in mind that he was *not* an MIT employee. He had no connection to MIT whatsoever. He physically trespassed onto MIT property, not just by walking onto campus, which is certainly allowed, but by breaking into a locked computer equipment room and physically altering an MIT-owned router to grant himself unauthorized access to the MIT computer network. That activity alone is criminal under U.S. law, whether or not he stole anything with that access.

How does Swartz’s alleged attempts to avoid discovery make it any less likely that he wanted to prove a point as an activist? Assuming the goal was to make the content freely available, he wouldn’t have succeeded until it was posted on some file-sharing service, right? So it seems plausible that he’d want to avoid getting caught until the articles were out from behind the JStor paywall.

Not to say I agree with him, but that it’s not only “tempting to believe” he was making a point — it’s quite likely, based on his “Guerrilla Open Access Manifesto” (http://www.earlham.edu/~peters/fos/2008/09/guerilla-oa.html): “We need to take information, wherever it is stored, make our copies and share them with the world. We need to take stuff that’s out of copyright and add it to the archive. We need to buy secret databases and put them on the Web. We need to download scientific journals and upload them to file sharing networks….”

To me, getting a few hundred files and posting them and having the integrity to say who you are and what you’re doing when caught or accused is more of a stand-up, point-making approach than repeated attempts to disguise, evade, and elude detection and capture, even after thousands of files had been downloaded. Why not get a few thousand, post them, write a statement, and stand behind it? Seems a little bit of a different pathology to me than a straightforward ethical stand requires.

I agree that there were many other ways Swartz could have made his point — I just wanted to suggest that it seems clear he was intent on making one. However much I disagree with his methods (and I do), I don’t want to assume any pathology on his part.

Now that he’s been arrested, we may never know exactly what point he hoped to make. I keep wondering, why JStor? Surely there are other possible targets who are making enormous profits, locking content behind far more expensive paywalls, or putting stricter requirements on their authors’ re-use of their own articles ( e.g., http://bit.ly/mMCEGH ).

For the record, that was me pestering John Wilbanks on Twitter yesterday. I think he has a valid point, that in a fully OA world, these sorts of issues wouldn’t exist. But I’m not sure it’s a terribly useful point without specific plans for how one would create and maintain a service like JSTOR in a fully OA world. Without an actual strategy for making things work, it follows many of the theoretical attempts to redesign scholarly publishing that require some undetermined and unexplained magical step in the middle. It’s a lovely vision, but then again, if wishes were horses, then beggars would ride.

As Wilbanks notes, scanning, storage and bandwidth costs have dropped, but they’re still not free. The “free ponies” approach mentioned above takes it for granted that doing things online is somehow without cost. Creating, organizing and maintaining a huge archive requires work and that work must be paid for somehow. Not-for-profits must make a profit if they expect to stay around very long. Funds will always be needed for maintenance (what happens if you need to buy a new server?), for updating (imagine if you built an archive based on Flash and the whole world moves to the iPad) and for experimenting with new ventures.

The “author pays” model is proving surprisingly robust and a great model for some situations, but I don’t think that’s applicable here. Wilbanks suggests finding a library to cover the costs, which I find unlikely given the current state of library budgets. Cornell’s issues in finding funds to support arXiv are particularly telling. His other suggestion is to find a company that can use freely available content to profit in other ways. This is perhaps more realistic, though I worry about entrusting our scholarly heritage to a private company, given the financial pressures involved and the relatively short half-lives of companies (also his suggestion of Mendeley is worrisome as they’re still somewhat unproven as far as having a sustainable business model). Companies tend to do what’s best for their shareholders, not their users, so I’m not sure this is a better option than having a dedicated not-for-profit company behind the effort.

Grant funding would be another option, but it’s not reliable or sustainable, and an archive like this is too important to leave in such an untenable position. PLoS understands this and has worked hard to get away from a model where it’s reliant on donations for its survival. Even the new Wellcome/HHMI/Planck journal has immediately pledged self-sufficiency as a goal.

The other immediate problem is that JSTOR is not the copyright holder for much of the material they make available. So even with the best intentions, they simply may not be able to make the sorts of moves requested here which would instead be up to the individual publishers. They can’t just follow in the footsteps of archive.org because they’d be violating copyright law.

Overall though, I think that using this event as a reason to bash JSTOR is uncalled-for and saying something like “if JSTOR were OA this wouldn’t have happened” isn’t that far from “she was asking for it, did you see the way she was dressed”. JSTOR is the victim here, they had their TOS violated and their servers knocked out. They apparently went out of their way to settle things amicably, and reportedly did not ask for this prosecution. Why paint them as the bad guy here? Bash the Feds if you don’t like the enforcement of the laws, bash the individual copyright holders if you’re unhappy with their access policies, bash Congress for not making copyright law modern and fair, but why bash JSTOR, a not-for-profit that does great work making a huge amount of scholarly material available at least in some fashion?

This may seem an opportune moment to talk about IP, OA, etc., but those are not the issues with which the indictment is concerned. Mr. Swartz breached his contracts, through subterfuge and pilfered access, with both MITnet and JSTOR. And whether or not you agree with MITnet’s usage agreement or JSTOR’s business model, you have the option, nay, the obligation to refuse its terms of use if you do not agree to abide by them. And that is where Mr. Swartz’s defense will fail.

We can go off into the weeds and talk about copyright, IP, intellectual freedom, blah, blah, blah all day, but none of that will ever make what Mr. Swartz has done right or ethical. This was simply a bizarre and stupid attempt on his part to exploit a resource that is used by an entire academic community, and which has had serious repercussions for the legitimate users of that service. Mr. Swartz is not a hero here, and no one should be fooled by the demagogic treatment that is sure to continue about how Mr. Swartz was striking a blow for intellectual freedom.

I agree and would point out that civil disobedience, if that is what Mr. Swartz was attempting to engage in, necessarily involves public disclosure of one’s actions to make the moral point one wants to convey. It is disappointing to see advocates of OA treat this person as some kind of hero.

So are you saying that it’s reasonable to punish him with up to 35 years in jail and a $1 million fine? Because the US Attorney’s press release is certainly emphasizing that. You don’t have to think what he did was wise or proper in order to be appalled at the disproportionate response brought by the government. And this was no hasty decision–he was the target of an investigation by the Secret Service, the MIT police, and the Cambridge police, and I’d be shocked if the fact that the FBI had harassed and surveilled him in the past due to his entirely legal and wholly laudable PACER work played no part in all of this.

Talking about copyright and IP and intellectual freedom isn’t tangential to the case; it’s central to it. This was an exercise of prosecutorial discretion, one that reflects a policy throughout our government (the PROTECT-IP act in Congress, the ICE domain seizures, our new Register of Copyrights saying in interviews that enforcement is her fundamental priority) to normalize the use of criminal sanctions in order to protect incumbent content firms’ business models. If you let your distaste for Swartz’s methods blind you to the larger policy shift and its troubling implications (again, we’re talking about potentially locking someone away for decades), you’re making a serious mistake.

“Up to” sounds very reasonable to me. If he is found guilty, an appropriate penalty can certainly be found somewhere between zero jail time and 35 years, and a fine between $0 and $1 million.

I do not expect Mr. Swartz to spend any time in jail, despite what is laid out in the press. To be honest, when I first read of this (after hearing about it on NPR), I was a bit surprised that it had been made a federal case, given that Mr. Swartz had settled with JSTOR with regard to their claims, and MIT had not taken any separate action. Though I admit to ignorance here with regard to the dynamic between the District of Massachusetts and MIT, and whether there was any interaction between the two on the charges, besides the evidentiary aspects of the case.

Regardless of whether this reflects government policy re: PROTECT-IP, Mr. Swartz’s non-IP-related actions alone still warrant careful examination and some penalties. Whether those should be federal penalties is questionable. JSTOR seems to have settled its claims with him, and MIT, as I stated above, doesn’t seem bent on making a larger issue of this.

Mr. Swartz is well-known in the technical community at MIT and beyond. His transgressions in this case run more toward the stupid end of the scale than the criminal. Keep in mind, though, that he broke several agreements with both MIT and JSTOR to do what he did. And his actions caused, at times, a virtual blackout of access to a valuable research resource. It should not be OK for him to walk away with no penalties. Perhaps his going through this process, up to the point of settling with the government, without the need for a trial, will be penalty enough. It will certainly be expensive.

