Amazon Set to Release Data Set of Over 4,100,000 Words

Amazon Alexa is a ubiquitous product, and for good reason: its natural language processing is among the best on the market. To the surprise of many, Amazon has just announced that on September 17th, 2019, it will release its Topical Chat data set to the public.

The Topical Chat data set is important because it is a compilation of “more than 210,000 utterances or over 4,100,000 words” that could be used to train natural language processing systems like Alexa. It is one of the largest conversational and knowledge data sets and, until now, has been reserved for contestants in the Alexa Prize Socialbot Grand Challenge 3.

While there have been reports of Alexa recording conversations, Alexa users do not need to worry: Dilek Hakkani-Tur, Senior Principal Scientist at Amazon, noted in a blog post that “none of these conversations are interactions with Alexa customers.”

By releasing this data set to the public, Amazon is allowing researchers and engineers to push the boundaries of natural language processing technology: “The goal of this collection is to enable the next steps of research in knowledge-grounded neural response generation systems, tackling hard challenges in natural conversation that are not addressed by other publicly available datasets,” explained Hakkani-Tur.

Note that Alexa Prize contestants will be given the Extended Topical Chat dataset, one that includes ongoing collections and annotations.
