“What we’re trying to do is create a new mode of interaction with Google, one more similar to the interaction you have with a person,” says Huffman. This search for simplicity draws on Mountain View’s most advanced semantic search technologies, above all the Google Knowledge Graph. For such an interaction to work, the system has to know you and be able to interpret context and the implicit references in the questions it is asked.
“Google has a wealth of contextual information,” says Huffman. “We know where you are from your smartphone and, in part, which topics you are interested in. This should help us understand what you’re referring to when you speak.” For example, resolving pronouns from one question to the next: “it” referring to a place mentioned earlier, or “he” and “she” in relation to a person. That is taken for granted in a dialogue between human beings, but far less obvious when you are talking to a computer.
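The kind of pronoun resolution described above can be illustrated with a toy sketch. This is not Google’s implementation: the entity lists, class, and method names are all invented for illustration. The idea is simply that the system remembers the last place and person mentioned and substitutes them when a pronoun appears in the next question.

```python
# Toy coreference sketch (illustrative only, not Google's system):
# remember the last place and person mentioned, then substitute them
# for "it", "he", or "she" in a follow-up question.
KNOWN_PLACES = {"paris", "rome"}
KNOWN_PEOPLE = {"obama", "einstein"}

class DialogContext:
    def __init__(self):
        self.last_place = None
        self.last_person = None

    def observe(self, query):
        """Remember entities mentioned in a query (naive keyword match)."""
        for word in query.lower().split():
            if word in KNOWN_PLACES:
                self.last_place = word
            elif word in KNOWN_PEOPLE:
                self.last_person = word

    def resolve(self, query):
        """Replace pronouns with the most recently mentioned entity."""
        out = []
        for word in query.lower().split():
            if word == "it" and self.last_place:
                out.append(self.last_place)
            elif word in ("he", "she") and self.last_person:
                out.append(self.last_person)
            else:
                out.append(word)
        return " ".join(out)

ctx = DialogContext()
ctx.observe("how far is paris")
print(ctx.resolve("how big is it"))  # -> how big is paris
```

A real system would of course use statistical coreference models over a far richer context, but the data flow is the same: each question both consumes and updates the dialogue state.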
Of course, Google’s conversational search is far from perfect and cannot always interpret words correctly (as in the pronoun case just mentioned), partly because it has a short-term memory. Huffman’s team is working to have the speech recognizer ask you for clarification, suggesting likely alternatives based on your most recent searches.
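The clarification idea can be sketched as follows. This is a hedged illustration, not the actual ranking logic: when recognition is uncertain among several candidate transcriptions, rank them by how many words they share with the user’s recent searches and offer the best matches as suggestions. All names and data here are invented.

```python
# Illustrative sketch: rank uncertain transcription candidates by word
# overlap with the user's recent searches (not Google's actual logic).
def suggest_alternatives(candidates, recent_queries, k=2):
    recent_words = set(w for q in recent_queries for w in q.lower().split())

    def overlap(candidate):
        # Count how many words the candidate shares with recent searches.
        return len(set(candidate.lower().split()) & recent_words)

    return sorted(candidates, key=overlap, reverse=True)[:k]

recent = ["restaurants in turin", "weather in turin"]
candidates = ["turin weather", "tour in whether"]
print(suggest_alternatives(candidates, recent)[0])  # -> turin weather
```

The point is that recognition errors need not be dead ends: the system can fall back on what it already knows about you to propose plausible corrections.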
For a system like this to become truly efficient on any device, some critical issues still need to be tackled. First of all, errors in reading voice commands must be minimized: as long as we can see the transcript on screen, we know what the computer has understood, but how can we be sure when the device has no visual interface at all?
Moreover, going beyond simple search, how might voice commands be implemented in software with more varied and complex functions (think of a word processor with all its settings)? Huffman imagines that a future solution could be to use short, specific commands, but this would require learning a new language in order to interact with the system.
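The “short, specific commands” idea amounts to mapping a fixed vocabulary of phrases onto application functions. A minimal sketch, with an invented editor and command set, might look like this:

```python
# Illustrative command table for a hypothetical word processor: each
# short spoken phrase maps to exactly one application action.
class Editor:
    def __init__(self):
        self.bold = False
        self.font_size = 12

COMMANDS = {
    "bold on": lambda e: setattr(e, "bold", True),
    "bold off": lambda e: setattr(e, "bold", False),
    "bigger": lambda e: setattr(e, "font_size", e.font_size + 2),
}

def run_voice_command(editor, spoken):
    """Dispatch a recognized phrase to its action, if it is known."""
    action = COMMANDS.get(spoken.lower())
    if action is None:
        return False  # unknown phrase: the user must learn the command set
    action(editor)
    return True

editor = Editor()
run_voice_command(editor, "bigger")
print(editor.font_size)  # -> 14
```

This also makes Huffman’s caveat concrete: the table is a new language the user has to learn, since only the exact phrases in it do anything.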
One of the main stumbling blocks, however, has already been addressed and overcome: computing power, which goes far beyond what a smartphone, for example, can offer. “Speech recognition requires enormous data processing,” says Huffman. “We use a giant neural network deployed across numerous servers.” In short, every voice command is sent to and processed in the cloud rather than digested by the individual device. Here Google’s infrastructure is probably unmatched anywhere in the world, along with an endless search database. The future in which we chat with a computer as if it were a real person is perhaps not so far off.
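The cloud round trip described above can be summarized in a schematic sketch. The device only records audio and ships the bytes; the heavy neural-network decoding happens server-side. The functions and the returned fields here are stand-ins, not a real API.

```python
# Schematic sketch of cloud speech recognition (all names hypothetical):
# the device sends raw audio bytes; the server runs the heavy model.

def recognize_on_server(audio_bytes):
    """Stand-in for the server-side neural-network decoder."""
    # A real system would run large acoustic and language models here
    # across many machines; we return a canned result to show the flow.
    return {"transcript": "navigate home", "confidence": 0.92}

def handle_voice_command(audio_bytes):
    """Device-side logic: ship audio to the cloud, act on the reply."""
    response = recognize_on_server(audio_bytes)  # a network call in reality
    if response["confidence"] > 0.8:
        return response["transcript"]
    return None  # low confidence: ask the user to repeat

print(handle_voice_command(b"\x00\x01"))  # -> navigate home
```

The design choice is the one the article describes: the device stays thin, and the expensive computation lives in the data center, which is precisely where Google’s infrastructure advantage matters.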