While none of the virtual assistants currently in service can hold much of a coherent conversation, they presumably will one day, and when they do, they will also be able to look us in the eye as they speak to us.
At least, that’s what Doug Roble, senior director of software research and development at Digital Domain, an Oscar-winning visual effects studio in California, said during his 15-minute TED Talk.
The future of motion capture looks just like us: that was the big idea behind the talk, and Roble explained that his team had merged 3D motion capture and machine learning to generate hyper-realistic human models.
Over the course of a year, Roble fed every facial expression he could come up with into a database. He demoed the result on stage by wearing full motion-capture gear while a rendered image of himself, which he had dubbed DigiDoug, delivered the same talk in his likeness on the screen behind him.
This was the first time the technology saw the light of day (or, well, of the stage) outside the team’s lab.
“This is going to be used to give virtual assistants a body and a face, a humanity.”
The models he and his team created can mimic the subtle facial mannerisms we are not aware of but all have in one form or another; they can even map eyelash movement and flush the cheeks by replicating the appearance of blood flow.
“I already love it that when I talk to virtual assistants they answer back in a soothing, human-like voice,” Roble said. “Now they’ll have a face, and you’ll get all of those non-verbal cues that make communication so much easier.”
To bring the models to life, Roble and his team used a device called Light Stage, which captures realistic computer models of human faces. It comprises 32 sequentially fired strobe lights set in a rotating arc that illuminates a person’s face from every possible angle.
While the strobes fire, the person’s face is recorded by synchronized high-speed video cameras, which capture how it changes from one moment to the next under each lighting condition.
Once the data is recorded, the face can be recreated under any arbitrary illumination by linearly combining the images captured under the individual lights.
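The relighting step described above can be sketched in a few lines: each photo taken under a single strobe acts as a basis image, and a new lighting environment is just a weighted sum of those basis images. This is a minimal illustration of that idea, not Digital Domain's actual pipeline; the array shapes, weights, and variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_LIGHTS = 32        # one basis image per strobe on the Light Stage arc
HEIGHT, WIDTH = 4, 4   # tiny stand-in for a real photo resolution

# Stand-in for the captured data: one RGB image per strobe light,
# recording how the face looks when lit from that single direction.
basis_images = rng.random((NUM_LIGHTS, HEIGHT, WIDTH, 3))

# Desired environment: how much light arrives from each strobe's
# direction in the target scene (e.g. sampled from an environment map).
weights = rng.random(NUM_LIGHTS)

# Relit face = linear combination of the basis images.
relit = np.tensordot(weights, basis_images, axes=1)

print(relit.shape)  # (4, 4, 3)
```

Because light transport is linear, this weighted sum reproduces the face as it would appear under the combined lighting, which is why capturing one image per light direction is enough to relight the face arbitrarily afterwards.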
Of course, since Roble was the only one who sat for the Light Stage, the database contains only his own expressions and features. The hope is that, with more volunteers, the technology could be used to render a body for a virtual assistant like Alexa, letting users connect with it on a whole new level.
If the day comes when I can hold a decent conversation with a virtual assistant that never says ‘I did not understand the question’, then you can totally count me in to help it get a virtual body.