Natural Language Processing (NLP), the technology behind voice assistants, is still evolving. We find ourselves frustrated with Siri and Alexa when they misinterpret our commands, but the future of NLP and voice assistants looks promising nonetheless.
Being misunderstood by a voice assistant is a frequent and irritating experience that we’ve all had. Siri’s purpose, to support our lives in every way possible the way Rosie does for the Jetsons, is one we champion. In reality, though, voice assistants are frustrating, and we would all love to see them improve. Right now, Siri remains clunky and functionally confined in most of our eyes.
Siri isn’t alone. Smart speaker sales may number in the tens of millions, and Apple may have declared that 500 million device holders actively use Siri, but there’s currently no human-replacing voice assistant on the market. Natural Language Processing (NLP), the technology driving these products, is still evolving. We’re often still left defeated rather than enthusiastic after our interactions with voice assistants like Siri.
The Technology Powering Siri
Siri, Alexa, and Google Home were all designed to interact with us the way another human would. So why shouldn’t we expect them to behave as a human would? After all, it’s been almost a decade since Siri first debuted.
When we unravel the steps Siri has to take to execute a simple command, however, we find that the ease with which our brain solves problems or responds to commands is not so easily replicated. Artificial Intelligence (AI) doesn’t process information the same way we humans do.
The fact that Siri currently supports 20 languages is pretty incredible, especially when compared to Alexa’s four languages and Google Assistant’s 11. From this perspective, what Siri and her digital assistant kin are capable of executing is quite extensive, notwithstanding limitations.
Dissecting Siri’s Challenges
Siri’s constraints lie in the different modules required to make a digital assistant tick. When we give Siri a command—when we say, for example, “schedule a meeting”—a few things must happen to prompt the right action.
First, the user’s speech needs to be recorded, digitized, and then transformed into a text representation, that is, a sequence of words. Second, the words need to be analyzed syntactically, and their meaning must then be transformed into a semantic representation. Lastly, the semantic representation must be interpreted as a sequence of operations to be performed.
A mistake at any step in this process will cause Siri (or any digital assistant) to misunderstand the user’s intent and prompt the wrong response. There are many triggers for such mistakes. Let’s focus on the variability of individuals’ vocabulary as an example. Each one of us uses slightly different words and expressions, with unique pronunciations and certain grammatical errors or nuances that we as humans don’t notice in day-to-day conversation. Siri must understand all of these variations.
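To make the three stages concrete, here is a minimal Python sketch of that pipeline. It is only an illustration, not how Siri is actually built: the names Intent, transcribe, parse, plan, and handle are hypothetical, and the "speech recognition" stage is a stand-in that simply decodes text bytes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Intent:
    """Hypothetical semantic representation of a spoken command."""
    action: str
    slots: Dict[str, str] = field(default_factory=dict)


def transcribe(audio: bytes) -> str:
    # Stage 1 stand-in: a real assistant would run speech recognition here.
    # In this sketch the "audio" is just UTF-8 text.
    return audio.decode("utf-8").lower().strip()


def parse(text: str) -> Intent:
    # Stage 2 stand-in: turn the words into a semantic representation.
    words = text.split()
    if "schedule" in words and "meeting" in words:
        return Intent(action="create_event", slots={"type": "meeting"})
    return Intent(action="unknown")


def plan(intent: Intent) -> List[str]:
    # Stage 3 stand-in: interpret the representation as operations.
    if intent.action == "create_event":
        return ["open_calendar", "ask_for_time", "save_event"]
    return ["ask_user_to_rephrase"]


def handle(audio: bytes) -> List[str]:
    # The stages run in sequence, so an error in one carries into the next.
    return plan(parse(transcribe(audio)))
```

Calling handle(b"Schedule a meeting") yields the calendar operations, while a single misrecognized word, as in handle(b"Shedule a meeting"), falls through to asking the user to rephrase. A slip in the first stage is enough to derail the whole command.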
There’s also ambiguity to deal with. Digital assistants interpret “commands” as a sequence of words with no initial or inherent relationship or meaning. They have to figure out what the words actually mean. Ambiguity is the inverse of the variance problem. While one semantic fact can be expressed with hundreds of simple sentences, one expression can refer to millions of semantic entities. For voice assistants to truly improve, Siri and its peers must be able to resolve this ambiguity correctly.
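The two problems mirror each other, and a toy Python sketch shows the shape of each. Everything here is made up for illustration: the phrase table, the entity list, and the helper names intent_for and candidates_for are assumptions, and a real assistant would use learned models rather than lookup tables.

```python
from typing import List, Optional

# Variance: many surface forms map to one meaning.
PHRASE_TO_INTENT = {
    "schedule a meeting": "create_event",
    "set up a meeting": "create_event",
    "book a meeting": "create_event",
    "put a meeting on my calendar": "create_event",
}

# Ambiguity: one surface form maps to many possible referents.
# These entries are illustrative placeholders, not real data.
NAME_TO_ENTITIES = {
    "jordan": [
        "country: Jordan",
        "contact: Jordan (from the address book)",
        "athlete: Michael Jordan",
    ],
}


def intent_for(phrase: str) -> Optional[str]:
    """Return the intent for a known phrasing, or None if it is unseen."""
    return PHRASE_TO_INTENT.get(phrase.lower().strip())


def candidates_for(name: str) -> List[str]:
    """Return every entity the name could refer to."""
    return NAME_TO_ENTITIES.get(name.lower().strip(), [])
```

All four phrasings resolve to the same intent, yet a phrasing missing from the table, such as "arrange a meeting", returns None. In the other direction, candidates_for("Jordan") returns three candidates, and nothing in the word itself says which one the user means.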
Context is yet another challenge. It comes so naturally to us that we often don’t recognize the need for clarification. At any moment, we’re integrating an unimaginable number of nested and independent contexts into our conversations. Sometimes this can be negative (prejudice is a form of cultural context), but most often it provides useful clues as to how to interpret a sentence.
The role of context is evident in a phrase as simple as “how are you?” Siri must be aware of the cultural context of this phrase to understand that it is a greeting, rather than a question, to properly engage in a conversation.
The idea that women are bad drivers, on the other hand, is a form of cultural context at its worst; this is not a fact but a prejudiced belief. Siri wouldn’t interpret this cultural context the way a human would. In fact, recent research shows that 71 percent of all vehicle crash deaths in 2014 were male.
How Can We Improve Voice Assistants?
Our brains, for the most part, have no trouble processing these concepts. Siri, on the other hand, does not behave like our brains. She’s unable to decipher context. She doesn’t know that “Friday” could refer to three weeks from now because the user is currently planning the week after his vacation.
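The “Friday” example can be sketched in a few lines of Python. This is a hypothetical illustration only: the function names and the planning_week_start parameter are assumptions made for the example, and it presumes the assistant has somehow already learned which week the user is planning, which is exactly the hard part.

```python
from datetime import date, timedelta
from typing import Optional

FRIDAY = 4  # date.weekday() counts Monday as 0, so Friday is 4


def next_weekday(anchor: date, weekday: int) -> date:
    """Return the first date on or after `anchor` that falls on `weekday`."""
    days_ahead = (weekday - anchor.weekday()) % 7
    return anchor + timedelta(days=days_ahead)


def resolve_friday(today: date,
                   planning_week_start: Optional[date] = None) -> date:
    """Resolve the word "Friday" to a calendar date.

    Without context, assume the nearest upcoming Friday. With context
    (the first day of the week being planned), use the Friday of that week.
    """
    anchor = planning_week_start if planning_week_start is not None else today
    return next_weekday(anchor, FRIDAY)


today = date(2024, 6, 5)               # a Wednesday
after_vacation = date(2024, 6, 24)     # hypothetical first Monday back

print(resolve_friday(today))                  # 2024-06-07
print(resolve_friday(today, after_vacation))  # 2024-06-28
```

The same word resolves to two dates three weeks apart, and only the context decides between them. The date arithmetic is trivial; knowing that the user has shifted to planning a later week is what the assistant lacks.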