A group of researchers from Georgetown University and UC Berkeley has demonstrated how attackers can use voice commands hidden in YouTube videos to compromise smartphones.
For the attack to work, the target smartphones must have Apple Siri or Google Now enabled. Both are intelligent personal assistants that use a natural language user interface to answer questions, make recommendations, and perform actions.
If the video in question is not played on the target smartphones themselves, they have to be close enough to “hear” and interpret the commands hidden in the video playing on another nearby device.
The hidden (obfuscated) voice commands are unintelligible to human listeners but are interpreted as commands by devices. Human listeners might be able to make out the mangled voice commands if they knew they were being played, but in practice they would not know and would fail to pick them up.
Attackers using this avenue of attack could instruct devices to visit certain sites, download and install malware, and more.
There are also ways to keep users from noticing that the device has received a command (and is about to perform it): the beeps and buzzes that indicate that a command has been received could be masked by other noise that the device is instructed to make at the same time. Unfortunately, even this is often unnecessary, as users are accustomed to ignoring alerts.
There are things that manufacturers could do to prevent attacks like these: make users confirm voice commands via audio CAPTCHAs, or tune filters to permit normal audio while eliminating hidden voice commands.
Also, machine learning can be used to teach the voice-activated assistants to distinguish between user- and computer-generated voice commands, and ignore the latter.
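To make the confirmation idea more concrete, here is a minimal Python sketch of a challenge-response gate placed in front of command execution. It is a simplified stand-in for an audio CAPTCHA and not the researchers' implementation: the function names (make_challenge, confirm_and_run) and the execute callback are hypothetical, and the speech recognition and text-to-speech steps are assumed to happen elsewhere. The point is only that a pre-recorded hidden command cannot know in advance a challenge that is generated on the spot.

```python
import secrets

DIGITS = "0123456789"


def make_challenge(n_digits: int = 4) -> str:
    """Return a random digit string that the device would read aloud (hypothetical helper)."""
    return "".join(secrets.choice(DIGITS) for _ in range(n_digits))


def confirm_and_run(command, challenge, spoken_response, execute):
    """Run `command` only if the transcribed response repeats the challenge.

    `spoken_response` is assumed to be the text a speech recognizer produced
    from whatever the device heard after reading the challenge aloud.
    """
    # Keep only ASCII digits so spacing and filler words in the transcript do not matter.
    heard = "".join(ch for ch in spoken_response if ch in DIGITS)
    if secrets.compare_digest(heard, challenge):
        execute(command)
        return True
    # No matching confirmation: drop the command.
    return False


if __name__ == "__main__":
    challenge = make_challenge()
    print("Device says: please repeat", challenge)
    # A cooperating user repeats the digits, so the command runs.
    confirm_and_run("open example.com", challenge, " ".join(challenge), print)
    # A response with no digits in it fails the check, so nothing runs.
    confirm_and_run("open example.com", challenge, "unintelligible noise", print)
```

A real deployment would also have to weigh the usability cost of confirming every command, which this sketch ignores.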
More details about the attack and the defenses can be found in this paper, and more attack demos on this site.
A similar attack was demonstrated last year by a team of researchers from the French Network and Information Security Agency (ANSSI), but its success depended on the target devices having headphones with a microphone plugged in.