The tech world seems to be leaning heavily towards voice-activated devices: Siri, Amazon Echo, Facebook M, “OK Google”, and pretty much every vehicle in existence. It makes sense that we would want to speak to our digital assistants. After all, that’s how we communicate with each other. So why, then, do I feel like such a dork when I say “Siri, find me an Indian restaurant”?
I almost never use Siri as my interface to my iPhone. On the very rare occasions when I do, it’s when I’m driving. By myself. With no one to judge me. And even then, I feel unusually self-conscious.
I don’t think I’m alone. No one I know uses Siri, except on the same occasions and in the same way I do. This should be the most natural thing in the world. We’ve been talking to each other for several millennia. It’s so much more elegant than hammering away on a keyboard. But I keep seeing the same scenario play out over and over again. We give voice control a try. It sometimes works. When it does, it seems very cool. We try it again. And then we stop. I base this on admittedly anecdotal evidence. I’m sure there are those who chat merrily away to the nearest device all day long. But not me. And not anyone I know, either. So, given that voice activation seems to be the way devices are going, I have to ask: why are we dragging our heels on adopting it?
In trying to judge the adoption of voice-activated interfaces, we have to account for mismatches between the utility we expect and the utility we actually get. Every time we ask for something like “Play Bruno Mars” and get the response, “I’m sorry, I can’t find Brutal Cars,” some frustration is natural. This is certainly part of it. But that’s an adoption threshold that will eventually yield to sheer brute processing power. I suspect the real reason we’re reluctant to talk to an object is simply that we’re talking to an object. It doesn’t feel right. It makes us look addle-minded. We make fun of people who speak when there’s no one else in the room.
Our relationship with language is an intimately nuanced one. It’s a relatively newly acquired skill, in evolutionary terms, so it takes up a fair amount of cognitive processing. Granted, no matter what the interface, we currently have to translate our desires into language, and speaking is certainly more efficient than typing, so it should be a natural step forward in our relationship with machines. But we also have to remember that verbal communication is the most social of activities. In our minds, we have carved out a well-worn slot for speaking, and it’s reserved for sitting across from another human.
Mental associations are critical to how we make sense of things. We are natural categorizers. And if we haven’t found an appropriate category when we encounter something new, we adapt an existing one. I think voice activation may be creating cognitive dissonance in our mental categorization schema. Interacting with devices is generally a solitary endeavor. Talking is a group activity. Something here just doesn’t fit. We’re finding it hard to reconcile our use of language with our interaction with machines.
I have no idea if I’m right about this. Perhaps I’m just being a Luddite. But given that my entire family, and most of my friends, have had phones capable of voice activation for several years now, and none of them use that feature except on very rare occasions, I thought it was worth mentioning.
By the way, let’s just keep this between you and me. Don’t tell Siri.