It is a truth universally acknowledged: any discussion of identity and identity provision will instantly send an auditorium of conference participants to sleep, make readers of an RSS feed skip to the next item, and allow email update subscribers in hot pursuit of “inbox zero” to delete the offending message without the slightest twitch of guilt.
Well, I hope you decide to stick around, because it’s time to get to grips with some of the issues of identity in a digital world.
We start, like most things digital and Internet, with Google. But don’t worry, there’s a very direct link to the scholarly information business coming up.
Not long after Google launched their new service, Google+, a furious debate around the subject of identity was ignited. Google had decided that any user of the Google+ service must use their ‘Real Name’. This was described as “the name you commonly use.”
When I signed up for Google+, it was pretty obvious to me that Google was asking me to identify myself as clearly and unambiguously as possible. However, it turns out that there were some edge cases: people whose commonly used name was either a single word or contained an unusual combination of characters. Google’s automated “dodgy identity detector” was flagging these as spam or other poor behaviour and, to cut a long story short, Google closed the accounts down. It’s fair to say that Google didn’t handle the issue very well (they’ve basically admitted as much), but in the ensuing furor some interesting debating points emerged.
Question: When is anonymous use of a service to be expected?
Answer: If you are in Germany, the answer, it seems, is always. The German Telemediengesetz (German Act for Telecommunications & Media Services) specifies anonymous/pseudonymous access in section 4, subsection 13 of the act. Other countries may have their own requirements. Then there are the cultural aspects to consider. There’s also the question of the circumstances under which a business that is interested in selecting for a particular type of user should be obligated to provide services to individuals it is not interested in serving (think about that one, it isn’t quite as obvious as it seems).
Question: Regardless of whether it’s expected, is it in fact possible to have truly pseudonymous/anonymous access to a service in an always connected world?
Answer: It’s early days here, but so far the evidence would seem to be that it simply is not possible to be truly anonymous when one is connected to the web. Say you want to search the USA for ‘subversives’. You could use the Amazon wishlist feature in order to locate the addresses of people who read ‘subversive books’. Or you could read this forensic look at how your browser can be fingerprinted to a very high degree of accuracy, and thus used to identify you (pdf). Bet it didn’t occur to you that your browser is effectively a Globally Unique Identifier, did it? Now you know why those annoying Flash ads keep following you around when you browse the web.
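If the idea of a browser acting as an identifier sounds abstract, here is a rough sketch in Python of how it could work. The attribute names and values are made up for illustration and are not taken from the study linked above; real fingerprinting draws on many more signals, and `browser_fingerprint` is a hypothetical helper, not anyone’s actual tracking code.

```python
import hashlib
import json


def browser_fingerprint(attributes: dict) -> str:
    """Hash a set of browser attributes into one stable identifier.

    Illustrative only: the attribute set is an assumption, not a
    description of any real tracker.
    """
    # Serialise with sorted keys so the same attributes always give the same hash.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two hypothetical visitors who differ only in their installed fonts.
visitor_a = {
    "user_agent": "ExampleBrowser/5.0",
    "screen": "1440x900x24",
    "timezone_offset": -60,
    "plugins": ["Flash 10.3", "QuickTime 7.6"],
    "fonts": ["Arial", "Gill Sans", "Helvetica"],
}
visitor_b = dict(visitor_a, fonts=["Arial", "Helvetica"])

print(browser_fingerprint(visitor_a))
print(browser_fingerprint(visitor_b))
```

No name, no login, no cookie, and yet the two visitors come out with different identifiers that will stay the same on every visit until something in their setup changes.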
Question: To what extent should the debate about identity be carried out on social network platforms built to leverage that very thing? If we use a service without a full and complete understanding of the consequences of the choice, are we forgoing rights we might later wish to assert?
Answer: The terms of service we all sign up to when we elect to use a new product are inimical to understanding (assuming they’ve been written to be understood), given that it’s very hard to think through the consequences of one’s possible patterns of use before one has even kicked the tires, so to speak. I’d suggest that enlightened statespersons should debate these complex issues, but this is the 21st century, and there don’t seem to be any around in the political arena. This one will be decided by those who engage most effectively with it.
I find it somewhat odd that some users of Google+ were objecting to Google’s insistence on a real name as if they were in some way able to use Google’s services whilst retaining anonymity. Google certainly has a very good idea of who you are, regardless of whether you are logged into an account or not. They just don’t have your name.
So why would they want to insist on a real name? Twitter doesn’t insist on it; I’m told Facebook does, but apparently isn’t overly zealous about following up on the rule. Other social networks of various flavours allow whatever strange and wonderful combination of letters, numbers, and special characters you care to design.
Well, for one thing, Google+ is an identity service. It just happens that one of the first things they’ve implemented with it is a method of allowing you to build a network of your contacts, with whom you can exchange information.
If you are on Google+, you may well already be aware of what Google is doing with the identity information they have collected so far. Links shared by your network are being highlighted, even if those links didn’t originate in Google+. Take a look at the image below, where two members of my Google+ network show up next to links they have commented on. I note with interest the ability of Google to connect the dots between Google+ and FriendFeed (which is owned by Facebook). Sarah Perez is linked to via her Google profile (a component of Google+). Content of a potentially higher value is being highlighted via your network.
The intention is pretty clear. Google wants to be able to leverage the database of intentions it has built up about you and your friends and colleagues for all sorts of services. One of those services looks to be a totally personalised Google information delivery experience. I’d guess that a fairly major play on transactional services won’t be far behind. Throw in a tightly integrated mobile experience and some hardware, and Google has got a rather interesting end-to-end view of their users, to put it mildly. (Update: whilst writing this, I was made aware of Google Wallet, a mobile-based payment system.) This is potentially a view that can follow the user seamlessly across their interactions with, well, actually, with the modern world. The Internet isn’t a place you visit from time to time; it’s the base fabric of the modern world. I imagine Google has hordes of data scientists chomping at the bit to extract monetisable insights out of the datastream. Allowing for the special characters and other tricks that spammers often use would be a disaster for the data being accumulated from Google+ activity.
I’ve previously written about how companies are trying to create and sell metrics attached to individuals in order to sell the “reputation” that accumulates with any user of a social network. I’m not on Facebook as I don’t want to have anything to do with a company whose privacy statement was longer than the US Constitution. I am, however, a user of both Twitter and Google+. I’ve been trying to rationalise why I’m comfortable with Google’s use of my data, why I trust Google more with data that can be used to identify me purely on my habits and my particular set of interests. I’ve also been thinking about how the more you use Google the more it learns about your interests in order to serve better results to you. Amazon does much the same, profiling its users in order to better understand what they are interested in, so that it can sell more stuff to them.
Effective personalisation and relevance are of great interest to the scholarly publisher, and there are a number of neat offerings out there. I think a really effective personalisation and recommendation system is of massive benefit to the time-poor researcher or student, as anything that can increase the chances of a serendipitous discovery in the scholarly literature brings massive benefits. Allowing users to transit across various scholarly holdings in a meaningful way would also bring massive benefits to all (there’s a reason they use Google first). But there are a couple of big problems.
Limited Data: Personalisation systems are limited to the offerings of the publisher/institution. This is a problem that cuts both ways: publishers can only profile users based on the data (journals and articles) that they are able to present, and users can only see “relevant” material from within the holdings of the publisher, or perhaps the larger holdings that their institution has access to. Given that any given field of scholarly research spreads across multiple publisher portfolios, all parties are at a disadvantage when compared to Google or Amazon (or Facebook or Twitter, for that matter) in terms of how good, and therefore useful, their offerings can be. Then of course there are the privacy issues to consider.
Spam: To date, publishers have been even worse at leveraging the social graph of their users than Google has. Now it’s true that the reasons for this are many and varied, but a couple of publishers’ experiments had everything going for them, and yet they have been discontinued. I’m talking about Connotea and 2Collab. In a nutshell, these were link storage tools, allowing scholars to store, tag and share links to research. Both had a clear value to users. Both were swamped by spam. 2Collab closed this year. Connotea seems to be still open but looks to be overrun by spam (for a perspective on this, see this article from the Kitchen archives). Two well-funded publishers struggled to deal with this. Clearly it’s a very tricky problem to solve.
Seamless access to institutional holdings: IP-based recognition has always fundamentally used the wrong tool for the job. IP ranges are there to enable machines to communicate with each other, not to be used as authentication methods. Now it’s the best solution out there, but aside from the major issues in maintaining and updating complex lists of numeric codes, IP addresses identify the machine, not the user. And if the user moves from device to device, then matters get even more complex, as IP authentication will cheerfully withhold access from a user who decides to use their iPad or other device, even when they are physically sat at a terminal which does have access. Athens and Shibboleth are not a vision of the future, are they? I know that’s being harsh, and I know a lot of hard work went into the protocols and all that, but it’s basically a suboptimal user experience when compared with Facebook. Just to be fair, OpenID isn’t exactly a barrel of laughs either. Wouldn’t it be fantastic if one could sign in to an identity service and then use that to seamlessly authenticate access to any services that could make use of that identity? I bet Google will be more than happy to allow a Google+ identity to be used for exactly that purpose. But it is a general purpose identity, and perhaps not the most suitable for the scholarly community.
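To see why IP recognition identifies the machine rather than the person, consider this minimal sketch in Python. The address ranges are reserved documentation addresses standing in for a hypothetical institution’s licensed ranges, and `has_access` is an illustrative name of my own, not any publisher’s actual implementation.

```python
from ipaddress import ip_address, ip_network

# Hypothetical licensed ranges for one institution (documentation-only addresses).
LICENSED_RANGES = [
    ip_network("192.0.2.0/24"),
    ip_network("198.51.100.0/25"),
]


def has_access(client_ip: str) -> bool:
    """Grant access purely on whether the requesting address is in a licensed range."""
    addr = ip_address(client_ip)
    return any(addr in network for network in LICENSED_RANGES)


# The same researcher, sat in the same chair, on two different devices.
desk_terminal = "192.0.2.17"          # wired into the campus network
ipad_on_mobile_data = "203.0.113.45"  # same person, different network

print(has_access(desk_terminal))
print(has_access(ipad_on_mobile_data))
```

Notice that nothing in the check knows or cares who is asking. The list of ranges is also the part somebody has to keep up to date by hand every time the institution’s network changes.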
Researchers are also showing interest in the possibilities of a well-configured identity service. The altmetrics movement is essentially predicated on being able to append various signifiers of scholarly output and reputation to an identity. In addition, work is being done on additional uses for a researcher identity. At the recent irisc2011 identity workshop in Helsinki, there was a breakout panel that debated these. They concluded that researcher IDs would greatly improve the manuscript submission process (this is a less than optimal experience, apparently). Researcher profiles, that is, IDs with researcher metadata appended to them, were also wanted (for grant applications), and of course metrics to support the breadth of a scholar’s outputs. Just to be clear on this: altmetrics is about tilting at the windmills of peer review and impact factor, two things that act as a bulwark against the disruption of the business of scholarly publishing.
ORCID was the system of choice for experimentation. ORCID is to authors what the DOI is to the articles they publish: a system for disambiguating author names and supplying them with an unambiguous identity that can be used for various things. Like the DOI before it, this is one of the most important developments occurring in scholarly publishing. It is a very good thing indeed. But part of me thinks that the current ideas for using it don’t go far enough fast enough.
People adopt things that provide obvious, clearly understood benefits to them. Things that make the pain of learning how to use them worthwhile. So take a look at what Amazon has achieved in terms of providing an identity service for users of its offerings: relevance marketing, serendipity analysis, Whispersync. Look at how Facebook and Twitter have colonised the business of sharing links, and how Google has concluded that an identity service is vital in order to capture the same signals and further improve its search algorithm. Look at the difficulties a user of our wares has if they want to move from device to device whilst consuming our content.
Now, look at ORCID.
So here’s a quick vision of a possible future:
The researcher wakes in the morning and picks up their mobile device. They’ve already configured it with their ORCID credentials so the device can either supply them upon request, or any read/note/store applications can make use of the same credentials in order to allow them to get on with the business of keeping up with the competition. Speaking of which, there’s a competitive intelligence application that keeps an eye on the outputs of competing researchers. Overnight, it has run a series of searches and sorted and categorised the results for them to scan through. It’s learned what areas they like to pay most attention to. Some important items have already had various sections of text and imagery highlighted for closer inspection. Some articles and snippets of information are queued for later consumption; others are tagged to be distributed to the researcher’s lab workers.
As the researcher moves from their house to their place of work and switches devices, the information moves with them, again via their ORCID credentials. In fact, the same credentials have not only allowed them access to all of their institution’s holdings, but various publisher apps and platforms are updating and reconfiguring information for them based not only on their activities, but also on the activities of their ORCID network. DOI resolver data, appended to the identity of the researcher, allows much better precision and recall algorithms to help them filter through the torrent of research. The network effect is in full force.
Later, when they attend a conference at another institution, their access to scholarly resources moves with them. They also have control over exactly how much of their clickstream data is to be used to enhance their information discovery activities.
The publisher has had to employ a bunch of data scientists in order to better understand what their users are doing. Usage is up, way up. Business development is plowing through the data and surfacing a multitude of product ideas and partnerships based on opportunities to derive customised products for the emerging areas of research. Other systems are predicting these emerging areas and listing the most active researchers, ranked by their various scholarly metrics: a self-assembling editorial board for a journal that doesn’t exist as yet, even though the topics for discussion are already being surfaced.
What I’ve described above is not only technically possible, it’s already happening in other areas. Identity-driven data is big business; that’s why Google just spent over $500 million on Google+. There’s a massive opportunity here to build something that offers clear benefits to publishers, scholars and libraries alike. If we don’t do it, somebody else will.
Don’t believe me? Take a look at what else Google has been up to.