I’ve been playing with a physical user interface for a cloud-based artificial intelligence (AI). The AI system in question is Alexa, and she’s quite an interesting character whose behavior gives one much to ponder. For those not familiar with such things, Amazon’s Alexa lives (like Google’s Home) in a small cylindrical device that contains a small PC, a speaker, a slew of microphones, some LEDs and, with an internet connection, a link to a cloud-based AI that aims to supply answers and carry out various functions in response to spoken commands. You can find reviews of these things on any reputable tech site (I recommend Ars Technica); this isn’t a review so much as some open thoughts on what importance these things might have in a scholarly publishing context. She’s very easy to set up. She? Well, the UK version comes with just the one voice: a very pleasant, soft Received Pronunciation accent. Now, on the other side of the pond, you have had Alexa for a while, as well as Google’s Home device, but over here Alexa is still pretty new (more on some early consequences of that later) and Google Home is just about to launch.
So what can she do? All sorts. There’s a slew of what Amazon calls ‘skills’ that you can give Alexa: things that enhance her capabilities. I have The Guardian ‘skill’, so I can ask Alexa to give a rundown of the headlines and then read out anything I’m interested in. I also have the Radioplayer ‘skill’ (I think this is UK only), which allows me to listen to any BBC or commercial radio station (including the local ones) simply by asking her to play the station name. I’ve hooked up my Spotify account to allow for more musical opportunities; again, she is remarkably good at picking out the names of bands or albums and then filling the air with the appropriate sounds.
Oh yes, I can also turn on/off the lights in my shed with my voice, from inside the house. This is a level of awesome way beyond those few words. You think nothing of it? Well you’ll change your mind when you stagger out to the shed loaded up with power tools and whatnot, and it’s dark and raining and out there somewhere between you and the shed is something awful that the dog has left on the ground…
“Alexa, turn on the shed lights (please)”
“Ok” (And then there was light, and it saved my shoes from that nasty mess the dog left).
Yes, Alexa can act as a smart hub for Internet of Things (IoT) devices; in this case, a ‘smart Wi-Fi plug’. It took 15 minutes to set this up, and easily two-thirds of the work was figuring out how to connect the plug to the home network. I’ve bought two more plugs already.
Alexa is simply brimming with facts. To check this out, I introduced a smallish boy to the test environment and watched what happened next. Turns out my boy has many questions. Questions he (correctly) doesn’t think his Mum or Dad can answer. So he was most impressed when Alexa gave him the distance between Earth and Pluto in both kilometers and miles, and even more impressed when she told him the distance, in miles and light years, from Earth to the center of the Milky Way. He then asked her to tell him a joke.
15 minutes later… She finally repeated herself.
My son pretty much immediately anthropomorphized the device. You’ll notice that I’ve done similar in writing about it. The voice conversation is so natural and remarkably fluid that it seems wrong not to think of it in terms of a personality. This is striking. This is genuinely a new metaphor for human-machine interaction. Yes, there’s been voice control before (it was in Windows 7, among other places, but it didn’t work very well). You can speak to your cell phone via Siri or whatever Google is calling their assistant these days, but the ambient usability of Alexa just works. The fact that it’s a cylinder that sits in your house somewhere is also interesting. I’m happier for Alexa to sit there with her microphone ready and waiting for me to speak than I am for Cortana, the Windows 10 assistant, to lurk in the background of my machine, observing a serious chunk of my online existence. Anthropomorphism again.
It is a decade since the last big change in user interface metaphors. With the first iPhone in 2007, Apple gave us touch that just worked. It was initially applied to skeuomorphic designs that clued the user in to the interaction modes via traditional-looking analogue references. Users got used to always-available compute capability, and as the computational power increased, first with mobile device processors and then with super-low-latency cloud interactions that offloaded tasks to essentially infinite processor cycles, the interfaces started to abstract away. Metaphors like Google Now, with its semantically powered relevancy and personalized on-demand information delivery, have moved us a long way from mouse clicks and the iconography of the ‘desktop’ (itself a skeuomorphic echo of a more distant time). Alexa feels very much like the next logical step: a 1.0 version of a complete way of interacting. She’s not perfect, she can be awkward, but the things that just work are simply brilliant. How brilliant? Read this heartfelt description of how Alexa has altered the quality of life for somebody who is dealing with the stark reality of the slide into dementia. Gives one a moment of pause, doesn’t it?
Okay, so what? How is this relevant to scholarly publishing?
Let’s take a look at an Alexa skill called ArxivML. It was written by Amine Ben Khalifa to let him scan the machine learning literature updates on arXiv whilst getting ready for work. Alexa will read out the abstracts of the ones Amine wishes to delve into further, and a more traditional title-and-abstract summary will be deposited into the Alexa app (where all your interactions with her are documented for posterity). The next few iterations of functionality aren’t exactly hard to think of, and not that hard to achieve either.
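The plumbing behind a skill like this is less exotic than it sounds: arXiv exposes a public Atom API at export.arxiv.org/api/query that anyone can poll. Here’s a minimal sketch of the sort of query a skill backend might construct; the helper function and its defaults are my own illustration, not code from ArxivML — only the endpoint and parameter names are arXiv’s.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

def build_arxiv_query(category="cs.LG", max_results=10):
    """Build a query URL for the newest submissions in an arXiv category.

    Illustrative helper only: the endpoint and parameter names come from
    arXiv's public Atom API; everything else is an assumption about how
    a skill like ArxivML might use it.
    """
    params = {
        "search_query": f"cat:{category}",   # e.g. machine learning = cs.LG
        "sortBy": "submittedDate",           # newest first
        "sortOrder": "descending",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"

url = build_arxiv_query()
```

Fetching that URL returns an Atom feed of titles and abstracts; the skill then only has to turn each abstract into speech.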
- “Alexa, send to [Mendeley/Zotero/DodgyFacebookForScholarsSites]”
- “Alexa, get me the PDF”
- “Alexa, share with …”
- “Alexa, save to my filestore”
- “Alexa, get the data from the paper”
- “Alexa, alert me when the authors are speaking at a conference”
And so on.
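Each of those commands would map onto what Amazon calls an ‘intent’, with a backend function returning a small JSON envelope for Alexa to speak. A minimal sketch of such a dispatcher, assuming made-up intent names; the response envelope (`version`, `response`, `outputSpeech`) follows Alexa’s documented format, but everything else here is hypothetical:

```python
def alexa_response(text):
    """Wrap spoken text in the JSON envelope the Alexa service expects."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }

# Hypothetical intent names -- a real skill would declare these
# in its interaction model alongside sample utterances.
HANDLERS = {
    "SendToReferenceManagerIntent": lambda slots: alexa_response(
        f"Sending the paper to {slots.get('manager', 'your library')}."),
    "GetPdfIntent": lambda slots: alexa_response("Fetching the PDF."),
}

def handle_request(intent_name, slots):
    """Dispatch a recognized intent to its handler, with a polite fallback."""
    handler = HANDLERS.get(intent_name)
    if handler is None:
        return alexa_response("Sorry, I can't do that yet.")
    return handler(slots)
```

The hard part — matching the spoken phrase to an intent and filling the slots — happens in Amazon’s cloud before your code ever runs, which is precisely why these features are “not that hard to achieve”.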
When you are trying to think about AI, here’s what you need to consider. Don’t worry too much about the algorithms and the deep learning and the neural nets and so on. These things are interesting, but they are the details of how this stuff happens. Instead, realize this: you are thinking about a robot and what it can do. But not just any robot; not the traditional sort that’s fluent in over six million forms of communication, or that wants your clothes, your boots, and your motorcycle. Nope, this is (as Bruce Schneier has brilliantly and chillingly put it) a world-sized robot. Alexa, like Google Home and Cortana, consists of audio (and video) sensors, a mind-blowingly powerful cloud brain (where the AI lives, distributed across millions of servers around the globe), and actuators in the form of code or a multiplicity of code-controllable switches. Those three things define a robot.
And what do we do with robots? We use them to replace humans.
This past week I watched in amazement as a bunch of cloud experts demonstrated how you could couple modern software development techniques to Alexa so that, with a few voice commands, you could get her to build an Amazon Web Services (AWS) environment: the full stack, the whole enchilada. Put more bluntly, over the course of a couple of sprints, they replaced a fairly respectable IT role using £150 worth of hardware and a few more quid’s worth of on-demand storage and computation. They didn’t even need to fire up a server to do this (you don’t need to for Alexa code). I reckon this future is one or two Moore’s Law cycles from demo to cold hard reality. (By the way, recall where Amazon started back in ’95: something something books mail order something.) There will be many more human replacement cycles in the next ten years. This time, it’s different: it’s not labor that’s being replaced, it’s thinking. The repetitive but skilled processes that form white collar roles are now in range. Just how many non-repetitive roles are there in the world of scholarly publishing?
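To make that concrete: the voice layer only has to translate a recognized phrase into parameters for an infrastructure API such as AWS CloudFormation’s CreateStack call, and the cloud does the rest. A deliberately toy sketch; the phrase grammar, template URL, and function name are entirely invented, and only the CreateStack parameter names (`StackName`, `TemplateURL`, `OnFailure`) are AWS’s:

```python
def provision_command_to_stack(spoken_phrase):
    """Translate a hypothetical spoken command like
    'build me a web stack called demo' into the keyword arguments a
    backend might pass to CloudFormation's create_stack call.

    Toy illustration: the parsing and the template mapping are
    assumptions, not how the demo I watched actually worked.
    """
    words = spoken_phrase.lower().split()
    # Take the word after 'called' as the stack name, if present.
    name = words[words.index("called") + 1] if "called" in words else "default"
    return {
        "StackName": name,
        "TemplateURL": "https://example.com/templates/web-stack.json",
        "OnFailure": "ROLLBACK",
    }
```

Everything downstream of that dictionary — the servers, networking, and storage it describes — is conjured by AWS on demand, which is why the demonstrators never had to touch a machine.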
Right now you can go and spin up a general-purpose image processing AI (Amazon, Google, IBM and Microsoft offer these as a service) and feed it a bunch of Northern blot data. And with that, you can spot any re-purposed or faked data. The publishing industry will need exactly one of these. It will need exactly one of each of the many other automatable functions. How about a peer review process that pre-emptively scans the input document for plagiarism and alerts not only the publisher, but also the authors’ institution and funding agencies when it detects evidence (compiling the report, looking for previous infringements by the authors, even ones that predate its existence but are discoverable in the existing literature)? What will our ecosystem look like in a world where you only need one AI to perform a given scholarly function? A series of AI libraries, all accessible via APIs…
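One simple mechanism behind spotting reused images is perceptual hashing: reduce each image to a tiny fingerprint and flag pairs whose fingerprints nearly match, even when one copy has been rescanned or lightly tweaked. A deliberately toy, standard-library-only sketch on four-pixel grayscale grids; real screening systems use trained models or the cloud services mentioned above, not anything this crude:

```python
def average_hash(pixels):
    """Fingerprint a grayscale image (a list of rows of 0-255 ints):
    each bit records whether a pixel is above the image's mean brightness."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)

def hamming(h1, h2):
    """Count differing bits between two fingerprints."""
    return sum(a != b for a, b in zip(h1, h2))

# Two 'lanes' of blot data: a and b are the same band re-scanned
# with slight noise; c is a genuinely different band.
lane_a = [[10, 200], [12, 198]]
lane_b = [[11, 201], [12, 197]]
lane_c = [[200, 10], [199, 12]]

dup_score = hamming(average_hash(lane_a), average_hash(lane_b))
diff_score = hamming(average_hash(lane_a), average_hash(lane_c))
```

Here `dup_score` comes out at zero while `diff_score` does not, so a small distance threshold separates reuse from coincidence; scale the idea up to thousands of published figures and you have the skeleton of the screening service described above.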
In 2011, Marc Andreessen wrote “Why Software Is Eating the World”. Six years on, I’m still having conversations with people who genuinely think that publishers aren’t and shouldn’t be software companies. I guess they think Netflix is a mail order DVD company and Amazon ships books. AI is going to eat the world, and this time it’s scholarly publishing that has the juicy data with which to feed the beast. On the one hand, AI is going to rip through our value and production chains like Ash’s chainsaw; on the other, there’s going to be a bunch of money to be made supplying high-value data to AIs to enable them to do ever more sophisticated things. Unpicking and understanding this is the challenge for the next decade and beyond.
“Alexa, Open the Pod Bay Doors”