Until recently, I’ve considered Bitcoin to be a shady digital currency that facilitates the activities of drug lords, arms dealers, smugglers, prostitution rings, and other nefarious actors that hide in the shadows of an open market. In recent years, however, Bitcoin has been moving into more pedestrian and lawful activities. Many online stores, pubs, and coffee shops now accept Bitcoin. On April 1st, PeerJ announced that it would start accepting Bitcoin, leading some to wonder whether it was a clever joke. Nope. No joke.
This post is not about Bitcoin as a financial tool (publishers can decide how they want to structure their own financial transactions). Instead, it explores how the technology behind Bitcoin, the blockchain, can be used to solve the intractable problems of authentication and usage accounting.
For more than a decade, publishers and institutions have settled, in most cases, on an IP-model for authentication to paywalled material. If you’re a researcher or graduate student or librarian sitting in front of a computer that is physically located within an authorized IP range, you have access. If you aren’t, you need to find a work-around, like a proxy-server (a computer that sits within an authenticated IP range), or a virtual private network (VPN), which creates a secure connection that mimics a physical one. There are other models for remote authentication, but none of them, in my opinion, works exceptionally well, especially if you are an infrequent user. We should not be surprised to find that people who should have access to this paywalled material ultimately decide to turn to the dark web, like Sci-Hub, for their access. If publishers wish to keep a traditional paywall model, they will need to develop a simpler authentication model that identifies individuals and not the network location of their devices.
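To make the limitation concrete, here is a minimal sketch of what an IP-range check amounts to. The ranges and the has_access function are hypothetical (the addresses come from blocks reserved for documentation), and real publisher platforms are certainly more elaborate.

```python
import ipaddress

# Hypothetical licensed ranges for one institution (documentation-only addresses).
LICENSED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]

def has_access(client_ip: str) -> bool:
    """Return True if the request originates inside a licensed range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in LICENSED_RANGES)

print(has_access("192.0.2.17"))   # a machine on campus: True
print(has_access("203.0.113.5"))  # the same researcher at home: False
```

Note that nothing in the check refers to a person. The decision depends entirely on where the device sits on the network, which is exactly why proxies and VPNs become necessary.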
When users turn to the dark web, it creates a second problem — a problem of accounting. More than ten years ago, as a publisher-librarian committee developed Project COUNTER, it was assumed, a priori, that most usage could be counted and reported from individual content providers. COUNTER’s role at the time was standardizing how downloads were counted and how to report them to their customers with some degree of trust.
Today, it is not clear how many downloads are taking place that go unrecorded, but one thing we can be certain about is that undocumented usage is growing and may eclipse the traditional model of distribution from publisher to reader. Many years ago, free accessibility from PubMed Central was responsible for a significant diversion of traffic from the publishers’ websites. Today, we may add institutional and subject repositories, peer-to-peer sharing, commercial archives (ResearchGate, Academia.edu), and now the likes of Sci-Hub, which is built upon abusing the authentication system built to facilitate access to online journals. Publisher-provided usage statistics may be reporting just a small and declining slice of total usage: our bigger problem is not knowing what the other slices look like.
Strangely, at a time when we can capture and report tweets, blog posts, news, Facebook, Google+, and reference manager use of scholarly content, the metric that may best indicate reading — the article download — is becoming more elusive.
The savvy librarian or consortial negotiator will use this information — or rather, the lack thereof — to her advantage. “Look,” she says. “Article downloads have dropped this year by 5%. Let’s begin our price negotiation at 5% lower than last year.” While both the librarian and the publisher know full well that their publisher-derived downloads reflect just a portion of overall use by its institutional members, no one at that table can provide even an estimation of overall use.
This is not just a problem for publishers. Authors may also be misled by the underreporting of article downloads when viewing their article-performance dashboard, as are their funders, who are interested in the impact of the research they sponsor. A lack of reliable usage data also means that editors are incapable of learning from their decisions on what to accept for publication. Put simply, a dark web occludes what can be known about article impact.
Understanding how the dark web affects their business, publishers have come together recently to discuss how their articles can be shared, resulting in a website that attempts to educate and document publisher policies, although I’m not convinced that this will change reader behavior. Moreover, the best publishers have achieved through these discussions is a draft set of voluntary principles. This is like agreeing that the sea is rising, but offering no concrete solution besides crying “every man for himself!”
This is the point in the post where readers anticipate a solution, and I first need to state outright that I don’t know if this solution will work technologically, politically, legally, and socially. But sitting back and complaining is not enough. Those readers who take a pessimistic view of scholarly publishing are welcome to remain critical: I am willing to try to work towards a solution. This solution may not be an ultimate solution, but at least it may be better than what we have right now, which is a model where content providers are finding it more and more difficult to account for what they do. I’m going to propose that a possible solution to the growing problem of authentication and accounting in scholarly publishing may be found in the technology behind Bitcoin: the blockchain.
One of the first principles of Bitcoin is that each and every transaction is a public and transparent transaction. When Joe sends Alice some bitcoins, this transaction is broadcast publicly and recorded in a public ledger. The ledger does not record the names “Joe” and “Alice” but includes each of their digital signatures, which are private, unique, and anonymous. Once recorded, this transaction is verified and validated by other public ledgers. The accuracy of these public transaction ledgers is maintained by other computers (called “miners”) that work out a computationally difficult problem in order to validate whether the transaction was real. In the Bitcoin system, miners are rewarded with new bitcoins, so there is a financial incentive to devote computational bandwidth to maintaining the integrity of the accounting system.
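For readers who think better in code, here is a toy sketch of the two ideas in that paragraph: blocks that are chained together by their hashes, and a “computationally difficult problem” that a miner must solve before a block counts. This is not Bitcoin’s actual data format or protocol. The DIFFICULTY value, the field names, and the keys like “key-joe” are all assumptions made for illustration.

```python
import hashlib
import json

DIFFICULTY = 3  # assumed: a valid block hash must start with this many zero hex digits

def block_hash(block: dict) -> str:
    """Hash a block's full contents, including the previous block's hash."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def mine(prev_hash: str, transactions: list) -> dict:
    """Try nonces until the block's hash meets the difficulty target."""
    nonce = 0
    while True:
        block = {"prev": prev_hash, "tx": transactions, "nonce": nonce}
        if block_hash(block).startswith("0" * DIFFICULTY):
            return block
        nonce += 1

def verify(chain: list) -> bool:
    """Anyone can re-check the links and the proof of work for every block."""
    prev = "0" * 64
    for block in chain:
        current = block_hash(block)
        if block["prev"] != prev or not current.startswith("0" * DIFFICULTY):
            return False
        prev = current
    return True

genesis = mine("0" * 64, [{"from": "key-joe", "to": "key-alice", "amount": 2}])
second = mine(block_hash(genesis), [{"from": "key-alice", "to": "key-carol", "amount": 1}])
chain = [genesis, second]
print(verify(chain))               # True: the untouched chain checks out

genesis["tx"][0]["amount"] = 200   # someone quietly edits an old record
print(verify(chain))               # False: the edit breaks the hashes
```

Changing an old record alters that block’s hash, which breaks the link to every later block. To hide the edit, the tamperer would have to redo the mining work for the whole chain from that point on.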
If we apply this system to publishing, it is not hard to substitute a published document (a journal article, book chapter, or dataset) as the currency of transaction. A transaction from Journal A to User B is recorded in the same way as a peer-to-peer transaction from User C to User D, or a transaction from Repository D to User E. In this model, every document transaction is recorded and public. There is no longer a dark web. We see the entire usage pie, not just one small slice of it.
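As a rough illustration of what “seeing the entire usage pie” could look like, here is a sketch that tallies usage from a shared list of document transactions. The ledger entries, the sender and reader labels, and the DOIs (with a made-up 10.9999 prefix) are all hypothetical.

```python
from collections import Counter

# Hypothetical ledger entries: (sender, recipient key, document identifier).
ledger = [
    ("journal-a", "reader-1", "doi:10.9999/example.1"),
    ("repository-d", "reader-2", "doi:10.9999/example.1"),
    ("reader-1", "reader-3", "doi:10.9999/example.1"),  # peer-to-peer sharing
    ("journal-a", "reader-2", "doi:10.9999/example.2"),
]

def usage_by_document(entries):
    """Total transactions per document, regardless of who supplied it."""
    return Counter(doc for _, _, doc in entries)

def usage_by_source(entries, document_id):
    """The slices of the pie for one document: who supplied each copy."""
    return Counter(sender for sender, _, doc in entries if doc == document_id)

print(usage_by_document(ledger)["doi:10.9999/example.1"])    # 3
print(usage_by_source(ledger, "doi:10.9999/example.1"))
```

In this sketch the publisher’s own platform accounts for only one of the three transactions for the first article. The other two, from a repository and from a colleague, are the slices that publisher-side reports never see today.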
While the system of distributed ledgers is public, it is built around privacy. Digital signatures do not necessarily need to disclose the identity of an individual, only that a transactor is a unique individual — not a computer or a proxy server — but an individual. Similarly, the blockchain does not need to disclose the full details of what was sent, only that the document was tied to a unique content creator. I think these two details are essential for such a system to be adopted: users will want to maintain their privacy and publishers will want to keep detailed information away from their competitors.
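One way this kind of selective disclosure might work, and I offer it only as a sketch, is for the ledger to hold an opaque tag in place of the document identifier. Below, the tag is a keyed hash (HMAC) that only the holder of a secret can recompute. The secret, the field names, and the helper functions are my own assumptions, not part of any existing system.

```python
import hashlib
import hmac

def opaque_document_tag(publisher_secret: bytes, document_id: str) -> str:
    """Keyed hash of the document ID; unreadable without the secret."""
    return hmac.new(publisher_secret, document_id.encode(), hashlib.sha256).hexdigest()

def count_downloads(ledger, publisher_secret: bytes, document_id: str) -> int:
    """The secret holder recomputes the tag and counts matching entries."""
    tag = opaque_document_tag(publisher_secret, document_id)
    return sum(1 for entry in ledger if hmac.compare_digest(entry["document"], tag))

secret = b"hypothetical-publisher-secret"
ledger = [
    {"creator": "publisher-a", "document": opaque_document_tag(secret, "doi:10.9999/example.1")},
    {"creator": "publisher-a", "document": opaque_document_tag(secret, "doi:10.9999/example.2")},
    {"creator": "publisher-a", "document": opaque_document_tag(secret, "doi:10.9999/example.1")},
]

print(count_downloads(ledger, secret, "doi:10.9999/example.1"))                 # 2
print(count_downloads(ledger, b"a-competitor-guess", "doi:10.9999/example.1"))  # 0
```

This keeps the public record tied to a content creator while hiding which article moved. It is not a complete privacy design: identical tags still reveal that the same document was transacted more than once.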
As for who will devote computational power to maintain the public ledgers, I see several large groups who are incentivized to take this role: publishers themselves, libraries and their consortia, and funders.
One of the strengths of a public and distributed accounting system is its decentralized nature. There is no need for librarians to trust the numbers they receive from their publishers. There is no need to trust that Project COUNTER is doing its job and auditing these publisher reports. There is no need to trust the numbers reported by third-party services and archives. Trust is built into the transaction system itself and verified by other players. More importantly, it is very, very difficult to game this system.
Obviously, there are some changes that will need to take place before such an open transactional model can be implemented. First, every reader will need a digital signature. This signature will identify the individual wherever s/he goes, whether it is back and forth from home to the office, to a conference, or to a new institution. It will be like an ORCID-ID you keep in your wallet at all times. Second, the digital signature will need to include information that will be used to authenticate that individual if the content is restricted to members of an institution. Unlike the current IP-based model that authenticates individual machines, the digital signature will authenticate individual people. The digital signature will work very much like a passport, but unlike passports, you can only have one.
The bitcoin public ledger model described above attempts to solve the problem of tracking and accounting for the distribution of scholarly documents around an increasingly dark web. There is another way that publishers and institutions can use blockchain to solve the problem of authentication.
If we move away from the IP-model of document authentication and replace it with the individual, the logical place to put it is in the document reader itself (i.e. Adobe Reader, Mendeley, Readcube, and Papers). Digital Rights Management (DRM) will need to be built into document readers. Each user will need a digital signature and allow their document reader to access it. In this way, it doesn’t matter whether the user is physically in the lab, in the library, at home, or on a business trip. No proxy server, no VPN, no Shibboleth. Moreover, every digital signature is encrypted and is not based on username and password. Under this model, even Bob1968 is unbreakable.
A DRM-based reader means that content will only display if the reader is able to authenticate the individual. I don’t see any way around this, and it does mean that someone going offline will either need to pre-authenticate in order to view the documents at a later date or receive some kind of grace period until the individual returns online. This notion of individual authentication will be a problem for some, but I should remind readers that the free reference managers listed above already track and send detailed information about reader behavior back to individual publishers. A digital signature based on blockchain would provide much more privacy than the personal registration model currently used in the reference manager model.
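The grace-period idea is simple enough to sketch. Assume the document reader stores the time of its last successful authentication and, while offline, compares it against the clock. The seven-day window and the function name are hypothetical; I am not proposing a particular length.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=7)  # assumed value, chosen only for illustration

def may_display_offline(last_authenticated: datetime, now: datetime,
                        grace: timedelta = GRACE_PERIOD) -> bool:
    """Allow display while offline only within the grace window.

    A negative age (the clock was set back) is treated as a failure.
    """
    age = now - last_authenticated
    return timedelta(0) <= age <= grace

last_ok = datetime(2016, 6, 1, 9, 0, tzinfo=timezone.utc)
print(may_display_offline(last_ok, datetime(2016, 6, 5, 9, 0, tzinfo=timezone.utc)))   # True
print(may_display_offline(last_ok, datetime(2016, 6, 10, 9, 0, tzinfo=timezone.utc)))  # False
```

A real reader could not rely on the device clock alone, since a user can change it. That weakness is part of why pre-authentication may be the sturdier of the two options.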
While I admit that I don’t have all of the details worked out in this blog post, the distributed public ledger model using encrypted blockchain may provide a working model for solving some intractable problems we currently face in authenticating users and accounting for usage and sharing in an increasingly dark web. Often, technology is used to solve problems it wasn’t initially built to solve.
If you know nothing about Bitcoin–or need a short refresher–the following video, How Bitcoin Works in 5 Minutes, provides a good overview.
16 Thoughts on "Bitcoin: A Solution to Publisher Authentication and Usage Accounting"
You may be interested in the Hyperledger project:
This Linux Foundation collaborative is attempting to create an internet standard for blockchain technology that can be applied to industry-specific problems. What you describe seems to be a perfect use case for what this group is trying to accomplish.
Good post on a topic that’s been gaining momentum in many industries — the potential utility of blockchain, which may be the ultimate advance emanating from Bitcoin. In top-down industries like finance, blockchain seems to have a better chance of being implemented, especially with systems like SWIFT being infiltrated to let thieves steal millions of dollars from banks. However, for the financial system to implement this, it requires the coordination of approximately 3,000 banks, all with a vested interest in finding a solution and a history of cooperating to implement unified systems.
In a distributed and uncoordinated economy like ours, it’s hard to see this coming to fruition. DRM in general has not worked, as you note, while purchasing of content has shifted from individuals to institutions.
So I’m trying to imagine how the shift might occur, how adoption would be implemented. Authors have reasons to adopt ORCiD, but why would readers (which go far beyond authors, outnumbering them probably by 10 to 1) create a blockchain ID? Rather than 3,000 participants familiar with coordinating top-down changes, we’re talking about millions of participants familiar with doing their own thing.
It’s an interesting technology that will find applications, but how do you see it being implemented?
Good point. While universities are slow moving organizations, I suspect that several years of massive hacks, abuse, and data-trolling have pushed them to rethink whether their username/password security model is overdue for a change. If higher-education changes to a digital signature model, then publishers won’t need to be the first mover.
This is where there needs to be some proper thought leadership. From where I’m standing, our collective (Institution/Library/User/Publisher/Supplier/etc) access and authentication infrastructure is simply no longer fit for purpose. It won’t be easy to address, but there is a clear need for a fully stakeholder invested group to grapple with these issues and get us ALL moving together.
Given that the current state of maturity of the Blockchain is often compared to the WWW’s in ca 1993/1994, it would be a very nice analogy indeed if universities were early adopters again.
A university will need to keep a record of the anonymous IDs linked to its staff who should have access to licensed material, and pass those IDs on to the publishers so that a publisher knows which participant ID on the blockchain network it can send a particular paper to. Thus, behind those anonymous IDs, readers become fully traceable (in the sense of who read what, and when) by anyone in possession of the information about which ID belongs to which person. This will certainly give rise to new privacy issues because, unlike with IP addresses, the ledger is public on the network. This just came to mind while reading this post. I have recently been writing a blog post about potential use cases for blockchains in the academic sector myself and am looking to publish it in the coming days.
Assigning one key per person for life is a really bad idea. Transactions on Bitcoin-like blockchains are pseudonymous, not anonymous, so the threat of de-anonymization looms large. Since records of all downloads would be stored in public forever, a curious party (say, law enforcement) could link an individual to his/her access history via groups of citations in published works, a leaked list of user IDs, or old-fashioned network sniffing. Given the risk, I’d guess that a blockchain-based library would drive even more users to switch to Sci-Hub.
And if a key can’t be linked to an individual, what’s to stop Sci-Hub from collecting donated keys and operating as usual?
It’s my understanding that de-anonymization is very difficult in this model, which is perhaps why it is used for illegal operations. Still, the model is much less invasive than the USA PATRIOT Act and its effects on library users (see: http://www.ala.org/advocacy/advleg/federallegislation/theusapatriotact).
As for the abuse of keys, the video at the end of the post provides a great explanation of the security against identity theft (start time: 1:08)
You stop Sci-Hub because it’s your key. You’d be giving away something rather equivalent to credit card details. Two-factor authentication stops Sci-Hub. Sci-Hub only works because our current systems are, bluntly, a joke. There’s clearly a lot to figure out here, including the longer-term ramifications and the ultimate implementation of a blockchain data approach.
A history of interactions cuts the other way as well: the user gets a personal data stream of all the things they were ever interested in, however ephemeral. That’s an interesting thought.
Sorry but this seems to be an incredibly bad idea.
Firstly, Bitcoin is pseudonymous (see http://bitcoinsimplified.org/learn-more/anonymity/). Nakamoto originally recommended using a new address for each transaction to avoid the transactions being linked to a common owner. What you’re suggesting would effectively create a public record of every paper I read, which could ultimately be traced back to me. I would not be happy with this. Furthermore, this would also create a completely public record of an entire journal’s usage. Publishers may or may not be happy with this.
Secondly, requiring every ‘document reader’ to implement this DRM scheme is just not going to happen. Without significant incentives Adobe will not modify their reader. And we live in a world that is not just PDF based – browsers are ‘document reader’ applications too. And since browser technology is inherently open, there is no way to implement a closed DRM system for articles served as regular HTML pages. You are just moving the security and identity issue from one place (the publisher website) to another (the document reader).
Lastly, the idea means we would no longer have true anonymous access to scientific research. This would be a major issue under any view of civil liberties. Asserting that this scheme would provide ‘much more privacy’ is, I’m afraid to say, a specious argument.
But it doesn’t have to, does it? Phil is airing a conceptual idea about whether the concept of a blockchain distributed database can solve some rather tricky problems. I think there are folks talking about how to have more private use.
I agree on the DRM thing, but I’m not sure anonymous access to research (or bluntly anything on the Internet) is really possible [see Bruce Schneier for more on this – Data and Goliath]. Google and Facebook know exactly what 2+ billion people are looking at. I don’t see how one can want altmetric/impact metrics for research in an anonymous world – impact means understanding which usage was valuable…
It’s worth kicking this idea around a bit. It feels like there’s something here.
So the anonymity issue might be fixed via this (propellerhead alert) https://petertodd.org/assets/2016-04-21/MIT-ChainAnchor-DRAFT.pdf
Which might do the job.
The only defences against sharing credentials with services like SciHub are
1) economic disincentives, i.e. “opening your wallet”, such as by sharing a real Bitcoin private key, which I don’t believe the original post is suggesting. If this proposal uses a separate key used only for this purpose, then it provides no real defence against sharing, and thus no improvement in usage reporting.
2) involve a physical object, i.e. 2 factor authentication, as David suggested above, where something you know has to be combined with something you have (mobile phone, digital keyfob etc)
It’s hard to imagine anything which truly uses door number 1 flying well with patrons. Door number 2 seems a more fruitful line of inquiry if anonymity can be built into it.
For door number one – If I give out my organisation access – I get the sack. Belonging to an institution, whether student or researcher or employee or whatever is a contractual matter ultimately. So if the key for door number one is attached to your continued participation in university life…the problem is that current access doesn’t meet those criteria. And it should.
Phil, thank you for starting this discussion! From what I know of blockchain technology–and I just spent over a year studying it and talking to people like Joichi Ito at the MIT Media Lab, Andreas Antonopoulos (author of MASTERING BITCOIN), and Melanie Swan (author of BLOCKCHAIN)–it has the potential to support peer-to-peer mass collaboration, disintermediating traditional publishers that extract more value than they add to the process of creating and disseminating new knowledge. We are looking at this in my academic publishing course right now.
In the short run, we need to have more conversations like this one. Unlike the Internet, which had 15 to 20 years of academic gestation, the blockchain is being commercialized swiftly by companies in finance, music, and the Internet of Things. As others have posted, there may be unintended or unforeseen consequences. My hope is that, in the long run (and with such functionality as smart contracts), scholars and authors of all kinds will have greater control over, and reap more of the benefits from, the intellectual property they create.