Voice assistants such as Amazon’s Alexa, Google’s Assistant, Apple’s Siri, and Microsoft’s Cortana are part of the norm now. We’ve all accidentally woken one up and then frantically tried to close the app; anyone who’s used one knows they have a habit of waking up when they shouldn’t.
Is your voice assistant a spy and a rat?
These systems raise important privacy questions: what exactly are they recording from their surroundings, and does that include sensitive, personal conversations that were never meant to be shared with companies or their contractors?
These aren’t just hypothetical concerns from paranoid users. There has been a slew of recent reports about devices constantly recording audio, and about cloud providers outsourcing the transcription of private and intimate recordings to contractors. Anyone who has used a voice assistant knows it sometimes wakes up and records even when the wake word or phrase wasn’t spoken. But is that a design error, or is the big bad company out to get you and your little dog?
This Time It’s Not Another English Research Team
Well… the truth is a bit less Snowden, a bit more Giuliani. A group from Northeastern University in Boston, named “Mon(IoT)r”, dedicated six months to research that goes beyond anecdotes, using repeatable, controlled experiments to shed light on what causes voice assistants to mistakenly wake up and start recording.
Hey Google, pay attention to Gilmore Girls, The Office and some other shows
The main goal of the research was to detect if, how, when, and why smart speakers unexpectedly record audio from their environment. The team was also interested in whether there are trends based on certain non-wake words, types of conversation, location, and other factors. To find out, they put a bunch of smart speakers in a room together and played them 125 hours of dialogue-heavy Netflix content: Gilmore Girls, The Office, Grey’s Anatomy, Dear White People, Narcos, The Big Bang Theory and Riverdale, to name a few. The voice assistants under test ran on the following stand-alone smart speakers:
- Google Home Mini, 1st generation (wake words: OK/Hey/Hi Google)
- Apple HomePod, 1st generation (wake word: Hey Siri)
- Harman Kardon Invoke by Microsoft (wake word: Cortana)
- 2 Amazon Echo Dot, 2nd generation (wake words: Alexa, Amazon, Echo, Computer)
- 2 Amazon Echo Dot, 3rd generation (wake words: Alexa, Amazon, Echo, Computer)
The measurements were conducted with a custom monitoring system consisting of the smart speakers, a camera to detect when they light up, a speaker to play the audio from the TV shows, a microphone to monitor what audio the smart speakers play (such as responses to commands), and a wireless access point that records all network traffic between the devices and the Internet.
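The team hasn’t published its tooling here, but the core of such a rig — turning a camera feed of the speaker’s light ring into a log of activation intervals — can be sketched in a few lines. Everything below (function name, threshold, sampling rate) is an illustrative assumption, not the Mon(IoT)r team’s actual code.

```python
def detect_activations(brightness, threshold=0.5, fps=1):
    """Turn a per-frame brightness signal from the camera into a list of
    (start_seconds, duration_seconds) intervals where the light ring is lit.

    Hypothetical sketch: threshold and fps are made-up illustrative values.
    """
    activations = []
    start = None
    for i, level in enumerate(brightness):
        if level >= threshold and start is None:
            start = i                      # light just turned on
        elif level < threshold and start is not None:
            activations.append((start / fps, (i - start) / fps))
            start = None                   # light turned off
    if start is not None:                  # still lit when the recording ends
        activations.append((start / fps, (len(brightness) - start) / fps))
    return activations


# At 1 frame per second: lit from t=2 for 3 s, then again at t=7 for 1 s.
print(detect_activations([0, 0, 1, 1, 1, 0, 0, 1]))  # [(2.0, 3.0), (7.0, 1.0)]
```

Pairing these intervals with timestamps in the show’s subtitle track is then enough to recover exactly which line of dialogue triggered each misfire.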
Your Voice Assistant Misfires Between 1.5 And 19 Times A Day
And so, the research group discovered that the average rate of activations per device was between 1.5 and 19 times per day. In short, the team answered a few common questions that most users have:
1. Are these devices constantly recording our conversations? In short, we found no evidence to support this. The devices do wake up frequently, but often for short intervals (with some exceptions).
2. How frequently do devices activate? The average rate of activations per device is between 1.5 and 19 times per day (24 hours) during our experiments. HomePod and Cortana devices activate the most, followed by Echo Dot series 2, Google Home Mini, and Echo Dot series 3.
3. How consistently do they activate during a conversation? The majority of activations do not occur consistently. We repeated our experiments 12 times (4 times for Cortana), and only 8.44% of activations occurred consistently (in at least 75% of tests). This could be due to some randomness in the way smart speakers detect wake words, or the smart speakers may learn from previous mistakes and change the way they detect wake words.
4. Are there specific TV shows that cause more overall activations than others? If so, why? Gilmore Girls and The Office were responsible for the majority of activations. These two shows have more dialogue than the others, meaning that the number of activations is at least in part related to the amount of dialogue.
5. Do specific TV shows cause more activations for a given wake word? Yes. For each wake word, a different show causes the most activations.
6. Are there any TV shows that do not cause activations? No. All shows cause at least one device to wake up at least once. Almost every TV show causes multiple devices to wake up.
7. Are activations long enough to record sensitive audio from the environment? Yes, we found several cases of long activations. Echo Dot 2nd generation and Invoke devices have the longest activations (20-43 seconds). For the HomePod and the majority of Echo devices, more than half of the activations last 6 seconds or more.
8. What kind of non-wake words consistently cause long activations? We found several patterns for non-wake words causing activations that are 5 seconds or longer.
For instance, with the Google Home Mini, these activations commonly occurred when the dialogue included words rhyming with “hey” (such as the letter “A” or “they”) followed by something that starts with a hard “G”, or that contains “ol”, such as “cold” and “told”. Examples include “A-P girl”, “Okay, and what”, “I can work”, “What kind of”, “Okay, but not”, “I can spare”, “I don’t like the cold”.
For the Apple HomePod, activations occurred with words rhyming with “hi” or “hey”, followed by something that starts with “S” plus a vowel, or when a word includes a syllable that rhymes with the “ri” in Siri. Examples include “He clearly”, “They very”, “Hey sorry”, “Okay, Yeah”, “And seriously”, “Hi Mrs”, “Faith’s funeral”, “Historians”, “I see”, “I’m sorry”, “They say”.
For Amazon devices, we found activations with words that contain “k” and sound similar to “Alexa,” such as “exclamation”, “Kevin’s car”, “congresswoman”. When using the “Echo” wake word, we saw activations from words containing a vowel plus “k” or “g” sounds. Examples include “pickle”, “that cool”, “back to”, “a ghost”. When using the “Computer” wake word, we saw activations from words containing “co” or “go” followed by a nasal sound, such as “cotton”, “got my GED”, “cash transfers”. Finally, when using the “Amazon” wake word, we saw activations from words containing combinations of “I’m” / “my” or “az”. Examples include: “I’m saying”, “my pants on”, “I was on”, “he wasn’t”.
For Invoke (powered by Cortana), we found activations with words starting with “co”, such as “Colorado”, “consider”, “coming up”.
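The consistency figure in point 3 is worth unpacking: an activation counts as “consistent” if it shows up in at least 75% of the repeated runs of the same content. A minimal sketch of that metric, with made-up activation identifiers and function names (not the team’s actual analysis code):

```python
from collections import Counter


def consistent_activations(runs, min_fraction=0.75):
    """Given one set of activation identifiers per repeated run (e.g. the
    timestamp of the line of dialogue that triggered the wake-up), return
    the activations seen in at least `min_fraction` of all runs.

    Hypothetical sketch; identifiers and threshold handling are assumptions.
    """
    counts = Counter(a for run in runs for a in run)
    needed = min_fraction * len(runs)
    return {a for a, n in counts.items() if n >= needed}


# Four repetitions: only the activation at "t1" recurs often enough (4 of 4)
# to count as consistent under the 75% cutoff.
runs = [{"t1", "t2"}, {"t1"}, {"t1", "t3"}, {"t1", "t2"}]
print(consistent_activations(runs))  # {'t1'}
```

Under this definition, the study’s 8.44% figure means that for the vast majority of misfires, replaying the exact same audio did not reliably trigger the device again.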
More to Be Done and Tested
In other words: no smart speaker was found to be constantly recording conversations. When a device does misfire, though, it typically records around six seconds of audio, and some assistants record considerably more. In further research, the team wants to investigate how much of this data is sent to the cloud, whether these accidental recordings are reported accurately in the vendors’ review dashboards, and whether other factors are at play. Keep an eye out for this research team.