The arrival of ecommerce offered customers the ability to ‘drive’ retailers’ businesses with their keyboards, accessing stock, information and logistics via their screen. Mobile retailing put that power under their thumbs, and ‘voice’ is promising to be the next behavioural change. Ian Jindal clears his throat and considers voice interfaces.
I am old enough to believe that Star Trek showed us the future. The Communicator became the StarTAC and then an iPhone; Apple’s CarPlay was in evidence driving the USS Enterprise; but the most prized aspect of the human-computer interface was voice. With a tap on one’s chest and a few words spoken as normal, the whole crew, nearby aliens and a starship were at one’s disposal. Aside from the voice recognition, it was the ability to control a whole universe that encapsulated the hopes for humanity’s future: the addressable and controllable ecosystem was vast.
Voice-recognition technology itself is now mature and remarkably resilient across noisy environments and accents, in uses ranging from train booking to banking. Cloud-based systems bring vast computing power to bear on the challenge, with machine learning offering ever-better interpretation, consistency and speed.
With the technology nicely in hand, customers rapidly tire of the simple phenomenon of a phone or computer being able to understand their words and simple instructions. In the realm of “So what?”, the battleground is clearly what you can do with voice computing.
There are three main players who impinge on our customers’ lives with voice: Apple, Google and Amazon. On iOS, Siri allows you to issue basic commands across a subset of applications and a sub-sub-set of features, mainly on your phone: texting, calling, transcription and, with difficulty, playing music. In recent months, Apple’s upgrade of iOS to version 10 and the imminent porting of Siri to Apple desktop systems herald new access capabilities outside of the Apple walled garden. This should make things more interesting, but early indications are that the scope of delivery is focused upon messaging and media consumption.
Google’s voice interface goes a step further than Apple’s because it’s device-agnostic and has Google’s reach into your current location, search history and mail. In short, it can “do more” for you than Siri: find, suggest, locate… It extends the context of operation into the ‘real world’ and your movements within it.
Alexa from Amazon, however, goes a step further by adding a ‘buy button’. While Google’s Voice can help you find products, Amazon’s Alexa lets you buy them and have them delivered to you. So the experience of voice interfaces is a function both of their inherent capability and their operating context. “Voice Mediated Interfaces” (VMIs, anyone?) will really thrive when two further factors are considered: experience and openness.
Regarding experience, it’s still a bit like talking into a Racal two-way radio from the 1980s, over. Firstly, we have to think about how the computer will interpret our voice commands. Secondly, it’s too ‘serial’: I ask, it does, I ask again… Conversation is full of interruptions, negations, amendments… That’s the next front. As things stand, voice can’t distinguish instructions from the babble of a dinner party, a raucous breakfast with teenagers or a car in which the radio is blaring and the kids are fighting… The stage after that is predictive speech. A bit like a butler’s gentle cough and discreet comments, I suppose…
On the interoperation front, there’s a need to bring more systems and data into the scope of the voice systems… home automation, car systems, health records… It’s no coincidence that Apple, Google and others are investing in the likes of HomeKit, CarPlay and HealthKit. However, we should also be eyeing POS, warehouse and workforce automation and hooks into our ecommerce platforms.
The challenge to overcome is that openness is antipathetic to the business model of voice, which is to tie you into a single, dominant ecosystem. Apple’s iTunes+HomeKit+desktop+iOS+Health+CarPlay… you get the picture. As retailers we need open systems and access upon which we can build, yet the early enablers of voice are locked in a global battle over who gets the spoils.
We’re witnessing the early stages of change. The technology is ready for roll-out, the business model is typically early-stage-monopolistic, and the social and commercial cases are embryonic. However, with home, health, travel, information, media and entertainment already within the scope of voice, retail operations and logistics will inevitably follow suit. Not tomorrow, of course, but we’re in the transition from sci-fi to social fact.