To simplify how machines learn language, MIT researchers took a less-traveled road than in their previous work: training an AI the way a child learns, by observing scenes and associating words with the objects and actions it records.
The approach requires little training data: the AI simply picks the expressions it judges most likely to describe what is happening in the video. It may begin with a wide range of candidate meanings, but it gradually eliminates the incorrect ones until only the right one remains.
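This elimination process resembles what the literature calls cross-situational learning. The sketch below is an illustration of that general idea, not MIT's actual system: each observation pairs a heard word with the set of things visible in the scene, and the learner keeps only the meanings consistent with every scene in which the word has appeared. The function name and toy data are invented for the example.

```python
# Illustrative sketch (not the MIT system): cross-situational word
# learning by elimination. A word's candidate meanings start as
# everything visible on first exposure; each later scene intersects
# the set, discarding meanings the new scene rules out.

def learn_meanings(observations):
    """observations: list of (word, set of visible things in the scene)."""
    hypotheses = {}
    for word, scene in observations:
        if word not in hypotheses:
            # First exposure: every visible thing is a candidate meaning.
            hypotheses[word] = set(scene)
        else:
            # Later exposures: keep only meanings present in all
            # scenes where the word was heard.
            hypotheses[word] &= scene
    return hypotheses

# Toy data: "ball" is heard in three scenes with different distractors.
data = [
    ("ball", {"ball", "dog", "table"}),
    ("ball", {"ball", "cat"}),
    ("ball", {"ball", "cup", "dog"}),
]

print(learn_meanings(data))  # {'ball': {'ball'}}
```

After three scenes, the distractors ("dog", "table", "cat", "cup") have each been absent from at least one scene, so only "ball" survives as the word's meaning.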
The researchers are not only interested in the AI's abilities; they also believe the system will help them understand how young children learn language.
“A child has access to redundant, complementary information from different modalities, including hearing parents and siblings talk about the world, as well as tactile information and visual information, [which help him or her] to understand the world. It’s an amazing puzzle, to process all this simultaneous sensory input. This work is part of a bigger piece to understand how this kind of learning happens in the world.”
– Boris Katz, principal research scientist and head of the InfoLab Group at CSAIL
As the AI observes its environment, it also learns how people actually speak. Through this process, MIT hopes to develop AI that can handle informal speech and, in time, learn to respond accordingly.