Amazon Alexa is a ubiquitous product, and for good reason: its natural language processing is among the best on the market. To the surprise of many, Amazon has just announced that on September 17th, 2019, it will release its Topical Chat data set to the public.
The Topical Chat data set is important because it is a compilation of “more than 210,000 utterances or over 4,100,000 words” that could be used to train natural language processing systems like Alexa. It is one of the largest conversational and knowledge data sets and, until now, has been reserved for contestants in the Alexa Prize Socialbot Grand Challenge 3.
While there have been reports of Alexa recording conversations, Alexa users do not need to worry: Dilek Hakkani-Tur, Senior Principal Scientist at Amazon, noted in a blog post that “none of these conversations are interactions with Alexa customers.”
By releasing this data set to the public, Amazon is allowing researchers and engineers to push the boundaries of natural language processing technology. “The goal of this collection is to enable the next steps of research in knowledge-grounded neural response generation systems, tackling hard challenges in natural conversation that are not addressed by other publically available datasets,” explained Hakkani-Tur.
Note that Alexa Prize contestants will be given the Extended Topical Chat data set, one that includes ongoing collections and annotations.