Smart home speakers may give you a direct line to savvy digital assistants, but even they are pretty useless in a noisy environment. During a party or a heated family dinner, they struggle to distinguish between several voices and pick out the one talking to them. But researchers from Mitsubishi Electric believe they've trained an AI well enough to do just that. #machinemagic
A team from Mitsubishi Electric Research Laboratories in Cambridge, Massachusetts, has developed an AI that separates voices in real time to identify the one addressing it. The system was trained on 100 English speakers using a technique called “deep clustering”, which learns the features that make each voiceprint unique. Those features are then regrouped, and each group is used to reconstruct a separate voice.
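For readers curious about what “deep clustering” involves, here is a minimal sketch of the general idea, not Mitsubishi's implementation. It assumes a trained neural network has already mapped every time-frequency bin of the mixed recording to an embedding vector; here those embeddings are supplied by hand. Bins dominated by the same speaker should land close together, so k-means groups them, and each group becomes a binary mask applied to the mixture. The `dc_loss` function shows the training objective of the deep clustering formulation, ||VV^T - YY^T||_F^2, which pushes embeddings of same-speaker bins to agree. All function names here are hypothetical.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Matrix, k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain k-means: returns a cluster label for every point."""
    rng = random.Random(seed)
    # Use k randomly chosen points as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def separate(mixture: List[float], embeddings: Matrix, n_speakers: int) -> Matrix:
    """Split a flattened mixture spectrogram into one masked copy per speaker.

    embeddings[i] stands in for the (hypothetical) network output for bin i.
    """
    if len(mixture) != len(embeddings):
        raise ValueError("need one embedding per time-frequency bin")
    labels = kmeans(embeddings, n_speakers)
    # Binary mask: a bin keeps its energy only in the source it was assigned to.
    return [[m if lab == s else 0.0 for m, lab in zip(mixture, labels)]
            for s in range(n_speakers)]


def gram(a: Matrix, b: Matrix) -> Matrix:
    """A^T B for row-major A (N x D) and B (N x C)."""
    return [[sum(ra[i] * rb[j] for ra, rb in zip(a, b)) for j in range(len(b[0]))]
            for i in range(len(a[0]))]


def frob2(m: Matrix) -> float:
    """Squared Frobenius norm."""
    return sum(x * x for row in m for x in row)


def dc_loss(v: Matrix, y: Matrix) -> float:
    """Deep clustering objective ||V V^T - Y Y^T||_F^2, expanded as
    ||V^T V||^2 - 2 ||V^T Y||^2 + ||Y^T Y||^2 so that only small
    D x D, D x C and C x C products are formed.
    V: N x D embeddings, Y: N x C one-hot speaker labels per bin."""
    return frob2(gram(v, v)) - 2 * frob2(gram(v, y)) + frob2(gram(y, y))
```

With two well-separated groups of embeddings, `separate([1.0, 2.0, 3.0, 4.0], [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], 2)` puts the first two bins in one source and the last two in the other. Practical systems typically work on real STFT spectrograms and may use soft rather than binary masks; the hard masks here just keep the sketch short.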
Although the people the program trained on spoke English, a Mitsubishi Electric spokesperson says the process should work “even if a speaker is Japanese”. In initial tests, the system separated two voices with 90% accuracy, and even with a group of three speakers, accuracy didn't dip below 80%.
This technology could help smart assistants work better both at home and in the car. But consumers aren't the only ones who stand to benefit: the AI could also serve as a starting point for voice recognition systems used by law enforcement agencies, which are often tasked with reconstructing conversations obscured by music.
Mitsubishi's AI has proved able to separate the voices of up to five people at once, a remarkable achievement that is sure to draw close scrutiny at the Combined Exhibition of Advanced Technologies (CEATEC) in Tokyo.