Siri (Image by NobleArc via Flickr)

Apple has ushered in a third usable human:computer interface, bringing with it a vibrant new life for the semantic Web. That interface is known as Siri.

Comparisons about inventor cred aside, one trait shared by Thomas Edison and Steve Jobs was the ability to effectively commercialize inventions. Jobs’ track record in the realm of computer interfaces is unparalleled. Minor accomplishments emerging from Jobs’ projects are almost too many to list (the double-click open, typography, layered folders, rounded edges to menus, the trash can, etc.), yet they’re clearly overshadowed by two major human:computer interfaces — the mouse and touch.

Apple didn’t invent the mouse, but it perfected and popularized it, making it not only commonplace, but ubiquitous. Apple didn’t invent gestural computing, but it perfected and popularized it, making it commonplace, first with the iPhone, then the iPad. Both have been emulated by all manner of competitors.

Apple didn’t invent the conversational user interface, but it is now in the process of perfecting it and recently introduced it to more than four million devices. Having the innovation already in so many useful devices is a major step toward collecting the secret sauce the interface needs — data — to make it perhaps the greatest of the three interfaces.

Siri, Apple’s new voice recognition technology, is transformative — calling it “voice recognition” is vastly underselling its importance. It is a technology that has the potential to revolutionize computing, entertainment, home life, and work over the coming decade. And it is perhaps the biggest stake in the ground ever for the vaunted but seemingly elusive semantic Web.

I’ve had my iPhone 4S for a little over a week now, and already I’ve become habituated to using Siri. In my car, I can compose and send texts without taking my eyes off the road; I can call people without looking down; I can start an album just by asking for it. In hotels, I can set my alarm with just a quick instruction, or ask the device to locate a certain store nearby and have a map in front of me moments later. At home, I can call my wife’s cell phone while walking downstairs and avoiding oncoming stairway traffic.

On top of all this functionality, Siri is fun. We’ve all seen the galleries of funny things to say to Siri, and her responses are so varied that it’s hard to catch her in a repetition (except she uniformly locates a nearby therapist if you say you want to murder someone). For instance, when I tried, “How much wood can a woodchuck chuck if a woodchuck could chuck wood?” Siri responded at first with a droll, “Don’t you have anything better to do?” When I tried to replicate it later, she responded with, “43 pieces. Everyone knows that.”

When has an interface ever been witty?

And the voice recognition, combined with the data Siri already has to reflect upon, is awesome. The recent storm that knocked out power in many spots along the eastern seaboard had me asking Siri to call the closest Lowe’s. Now, imagine saying, “Call the nearest Lowe’s” and what might come back — “nearest” could be “near miss” or “dearest” or some other mangle of sounds, while “Lowe’s” could be “lows.” But Siri nailed it. Getting “Lowe’s” right, with the apostrophe and all, struck me as mildly amazing; mapping it accurately was by comparison a minor feat. Siri would have been forgiven as “voice recognition” if she had responded with “lows” or “Lowes” — but it was “Lowe’s.”

Siri understood.

Compare this to Amazon’s algorithms, which last month failed in a most noticeable way after I bought Michael Lewis’ book, “Boomerang.” Suddenly, Amazon’s pages were filled with suggestions that I buy boomerangs — you know, bent wooden throwing sticks that circle back to the thrower?

Siri wouldn’t have been so naïve.

The semantic Web has been written off as many times as it’s been predicted, but Siri is a powerful realization of the semantic Web — Siri has to figure out what you mean, not just take your literal words and run them against an index of files. Siri has to comprehend that you mean “Lowe’s” instead of “lows.” And that’s powerful.
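To make that concrete, here is a rough sketch of how intent can do the disambiguating. The mini knowledge base, field names, and phone number below are invented for illustration; this is not how Apple has described Siri's internals, just the general idea of resolving a homophone against entities that can actually satisfy the request.

# Toy disambiguation: the recognizer proposes several sound-alike transcriptions,
# and the intent ("call") keeps only candidates that name something with a phone number.

# Hypothetical mini knowledge base; entries, fields, and the number are invented.
ENTITIES = {
    "Lowe's": {"type": "business", "phone": "+1-555-0100"},
    # "lows" (temperature minima, cattle sounds, market nadirs) has no phone number,
    # so it never appears here as a callable entity.
}

def resolve(candidates, intent):
    """Return the first transcription candidate that can fulfill the intent."""
    for text in candidates:
        entity = ENTITIES.get(text)
        if intent == "call" and entity and "phone" in entity:
            return text, entity
    return None, None

candidates = ["lows", "Lowe's", "loathes"]   # sound-alike guesses from the recognizer
text, entity = resolve(candidates, intent="call")
print(text, entity["phone"])                 # -> Lowe's +1-555-0100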

So why is this beta version so noteworthy? Imagine a technology in beta that can elevate a device as sophisticated and useful as the iPhone. That’s a beta unlike any other. And so far, Siri’s power is only available to a few core iPhone apps, like the clock, the calendar, messaging, location services, and so forth. Third-party apps aren’t allowed to access Siri’s technologies. Yet. But soon, they will be. Imagine asking Epicurious for a recipe and then for its grocery list, or checking a sports score via the ESPN app.

Or turning off your television by saying, “I’m going to bed.” After all, there’s no reason to believe that Siri won’t be licensed to television manufacturers and others. We may not be that many years away from KITT.

Writing in Forbes, Eric Jackson notes that Siri’s role as the third interface will “go ballistic” as soon as it’s open for API development:

At the moment, Siri is in “beta” and no 3rd party app exists.  But what happens when you allow developers to write Siri-enabled scripts that tie into their websites – like Yelp, OpenTable, and others?  Siri will become even smarter.  For users, it will become even more valuable because better and better data results will come back to it.  And Apple — as happened in the iPhone and then iPad spaces — will have a huge lead in 3rd party apps tied into this powerful interface.

As for Siri’s role in the emergence of the semantic Web, Gary Goldhammer writes:

Siri is collecting a monster database of human behavior. Siri goes beyond “need” to “intent” – not what somebody wants, but why. Call it technographic, call it behavioral or call it semantic – whatever the term, Apple, not Google or Microsoft, may be ushering in the era of Web 3.0 and language-based search (and with it, capturing the ad dollars that will surely follow). “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

For semantic companies, the emergence of Siri is a moment to savor. It is ripe with meaning.

Kent Anderson

Kent Anderson is the CEO of RedLink and RedLink Network, a past-President of SSP, and the founder of the Scholarly Kitchen. He has worked as Publisher at AAAS/Science, CEO/Publisher of JBJS, Inc., a publishing executive at the Massachusetts Medical Society, Publishing Director of the New England Journal of Medicine, and Director of Medical Journals at the American Academy of Pediatrics. Opinions on social media or blogs are his own.

Discussion

20 Thoughts on "Siri and the Resurrection of the Semantic Web"

Cool! It will be fun figuring out how Siri actually works. It is an outgrowth of the DARPA CALO project, which integrated a host of AI technologies: http://en.wikipedia.org/wiki/CALO. This AI technology mashup approach is similar to the way IBM Watson was developed.

The semantic web guru on Siri is Tom Gruber. He has a 2009 keynote presentation on Siri at http://tomgruber.org/writing/semtech09.htm.

I have always been skeptical of the semantic web, vis-à-vis other semantic technologies, but maybe I will have to change my mind. In any case, it is great to see AI actually working.

Hi Kent,

I agree completely that Siri is a great example of Apple’s skill in commercialising new technologies.

However, to characterise Siri as the “Semantic Web” is incorrect; this is a neat application of speech recognition, knowledge representation, machine learning, and cloud infrastructure. All of these have been studied in AI research projects for many years. Apple have turned this into deliverable real-world software, which bears all of their usual trademark ease of use and design aesthetics.

The Semantic Web, as envisaged by Berners-Lee and the W3C, bears only a tangential relationship to this. Siri knows and cares nothing of the mystique of RDF, OWL, ontologies, or reification.

Whilst it’s tempting to label Siri with this moniker, I’d suggest it’s more accurate to characterise it as one part of the great artificial intelligence research project finally bearing some fruit.

Cheers,

I suspect you are correct, Richard, but do you have anything technical on the guts of Siri? Given that it taps into external web resources, I doubt that these are in anything more than XML, rather than, say, RDF triples (or little sentences, as I call them) or OWL. But internally there may be some ontologies, and who knows what else? DARPA funded hundreds of researchers in many leading shops to build the basis for Siri.
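To illustrate the “little sentences” point with purely toy data (nothing here comes from Siri’s internals), RDF-style triples are just subject-predicate-object statements:

# Hand-rolled triples with plain Python tuples; a real Semantic Web stack would use
# RDF serializations and a triple store, but the "little sentences" shape is the same.
triples = [
    ("Siri", "developedBy", "Apple"),
    ("Siri", "grewOutOf", "DARPA_CALO"),
    ("Lowe's", "hasPhoneNumber", "+1-555-0100"),  # phone number invented for the example
]

# A trivial "query": everything the toy store knows about Siri.
for s, p, o in triples:
    if s == "Siri":
        print(s, p, o)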

David, I don’t have anything on Siri internals. I agree that it’s likely to leverage some ontologies; however, I would be deeply surprised if it were powered by a triple store. Real-world semantic projects (Watson, Wolfram Alpha, Siri …) don’t visibly use this technology. Go figure!

Late reply, but dig into the guts of iOS, where the major elements (listed above) are spread across different functional blocks/services. Siri acts as an umbrella utilizing all of those: it wraps them up and delivers the end-user UX.

Don’t get too impressed with “understand” … using probabilities in speech recognition is old hat. Twenty years ago, IBM speech recognition did it with triplets (trigrams). “Call the nearest” is much more likely than “call the dearest” or “call the near miss” — not from semantics, just from frequency of use. There is no “understanding” going on. (That doesn’t detract from its usefulness.)
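A toy sketch of that point, with made-up counts; the only thing the model sees is how often each word triplet occurs, not what any of it means:

from collections import defaultdict

# Invented trigram (word-triplet) counts standing in for a large speech corpus.
trigram_counts = defaultdict(int, {
    ("call", "the", "nearest"): 950,
    ("call", "the", "dearest"): 3,
    ("call", "the", "near"): 12,   # as in "near miss"
})
total = sum(trigram_counts.values())

def score(words):
    """Relative likelihood of a three-word sequence under the toy model."""
    return trigram_counts[tuple(words)] / total

for phrase in (["call", "the", "nearest"], ["call", "the", "dearest"]):
    print(" ".join(phrase), score(phrase))
# "call the nearest" wins on frequency alone; no understanding required.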

True, but voice recognition is the least of the problem Siri is solving. If I ask for the nearest Irish pub there is a lot of work just building the routine it needs to solve, then solving it. It involves tapping into various web resources, but first deciding which ones and how. A truly impressive sequencing of tasks.

The differences are clear — IBM’s was a lab effort, Siri is in the market, and data will make it valuable, as you note. Voice recognition is necessary but not sufficient.

As for “understanding,” if something gives me the impression that it understands me, then what is the difference?

Wouldn’t it be nice to navigate science this way? In my basement I have a cognitive distance mapping algorithm for science, if anyone is interested.

I haven’t yet had the chance to try Siri. However, I think the amazement at understanding the sentence ‘Call the nearest Lowe’s’ slightly overstates it. The sentence starts with ‘call,’ so the object of the sentence is bound to be a noun, and not ‘lows.’ ‘Nearest’ is pretty self-explanatory. Having said that, if this works half as well as is claimed, it will be an important step forward. But I worry: we have already seen how it cannot understand certain accents and can frequently get things wrong. Google voice did much the same thing, i.e., struggled with accents; all Apple seem to have done is put it in a shiny case with a piece of partially eaten fruit on it.

Two immediate responses.

All of us have accents. There is no speech that isn’t accented. What you’re really saying is that Siri does better with some accents (mid-American, perhaps) than others (Scottish, maybe). But recognizing that it’s already dealing with accents shows that this dimension has been dealt with once or more, meaning it can be dealt with again. In the UK, Siri apparently has a male voice with a British accent, and can differentiate Mum from Mom.

As for “Lowe’s” being a noun and “lows” not being a noun, “lows” is a noun — the minimum temperatures over a set of 24-hour periods, the sounds made by cattle, the nadirs of markets, popularity, or other measures. Siri had to know that it was a noun with a phone number when the instruction was “call.” “Nearest” means Siri had to know my location.

There are plenty of other examples. It’s not perfect, it’s in beta, but it’s set up to be on a rapid improvement path with millions of units in use already.

A friend of mine hitchhiked around Europe recently. He reports they spoke intelligible English everywhere he went, except in Scotland. Just saying.

A good comparison of the differences between what Apple has done here and what Google has done in the past (beyond the, sigh, “shiny case”) can be found here:

Google failed on a few fronts with this functionality. First of all, while on paper and in staged demos Google’s technology looks great, they failed to make it compelling enough to entice everyday users to use it. They had a pre-defined set of instructions as to what you could say to get the system to work, and they were pretty rigid. By comparison, Apple placed an emphasis on natural language usage with Siri. There are a number of ways to say something to trigger a certain action. You don’t have to remember a set of commands.

Put another way, Google’s voice search and Siri may look comparable on paper. But in reality, one is something best used by a robot, the other is something best used by a human. And robots don’t buy phones — at least not yet.
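A toy contrast of the two approaches (the command phrases and matching rules below are invented for illustration, not taken from either system):

# Rigid matching: only a memorized phrase works.
RIGID_COMMANDS = {"set alarm for 7 am": "SET_ALARM"}

def rigid_match(utterance):
    return RIGID_COMMANDS.get(utterance.lower())   # exact phrase or nothing

# Looser matching: many natural phrasings map onto the same action.
def flexible_match(utterance):
    text = utterance.lower()
    if "alarm" in text or "wake me" in text:
        return "SET_ALARM"
    return None

for phrase in ("Set alarm for 7 am", "Wake me up at 7", "Could you set an alarm for 7?"):
    print(phrase, "|", rigid_match(phrase), "|", flexible_match(phrase))
# Only the first phrase survives the rigid matcher; all three survive the flexible one.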

I can think of some terrifically useful things I could build with Siri and our content. How do I go about contacting someone in Apple to talk about it? Unlike Google or Microsoft, my pretty extensive network of contacts is bare when it comes to Apple. Why is that?

Duncan, I suspect that any Siri app will require some heavy-duty AI programming. Do you have that capability?

Not at all, which is why I need to talk to potential partners including Apple I think. Suggestions welcome!

I can see Apple releasing some useful tools for Siri interaction in the iOS developer framework. It may be a year or so off, but I don’t see why APIs into the retrieval logic can’t be surfaced by Apple. This would make it a truly invaluable resource for publishers: Apple handles the heavy lifting of the AI, and publishers build innovative tools for voice search in their domains, or for “scenario recognition,” which is what IBM’s Watson team was envisioning for the practical deployment of Watson in medicine.
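Purely speculative, since no such API exists yet, but a publisher-facing hook might look something like the sketch below; every name, field, and article in it is hypothetical:

# Hypothetical handler a voice assistant could call after parsing the user's speech
# into an intent plus slots; the publisher only supplies the domain retrieval.
ARTICLES = [
    {"title": "Outcomes of Hip Replacement", "journal": "JBJS", "topic": "hip replacement"},
    {"title": "Pediatric Asthma Guidelines", "journal": "Pediatrics", "topic": "asthma"},
]

def handle_voice_query(intent, slots):
    """Return content matching a parsed voice request (invented interface)."""
    if intent == "find_article":
        topic = slots.get("topic", "").lower()
        return [a for a in ARTICLES if topic in a["topic"]]
    return []

# "Find me recent articles on asthma" would arrive already parsed, e.g.:
print(handle_voice_query("find_article", {"topic": "asthma"}))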
