The “Looking to Listen” demo presents a “deep learning audio-visual model for isolating a single speech signal from a mixture of sounds such as other voices and background noise.”
In short, Google’s AI can now pick out a single voice from a crowd, even over background noise, simply by selecting someone’s face.
In the demo, Google produced videos in which the speech of the people whose faces were selected is enhanced, while all other sounds are suppressed.
According to the company, this capability could be used in video conferencing, hearing aids, caption creation, and other scenarios where clearer sound is needed.
In the video above, you can see and hear a demo of a stand-up show where the two hosts talk at once. The video below shows how you could video-chat with someone, then later isolate the background sound to snoop on someone else’s conversation.
What do you think? Impressive or scary? Perhaps the Hushme mask will be enough to keep your conversations private.