Amazon Taught Alexa To Show Emotion With Whispers And Pauses

Alexa is a pretty fun virtual assistant to have around the house. It’s not just a brainiac; it can tell jokes on occasion and even sing you a song when you’re feeling blue. But its straightforward answers are what you’d expect from a robot, not a human. So Amazon felt it was time for an upgrade, one that would make you feel like you’re in the presence of a person, not a program. Jeff Bezos’ team has enabled Alexa to sound more human, with pauses and whispers #softwaremagic

At the end of the week, Amazon announced that Alexa would receive the tools to behave more naturally. The company has worked on a new set of speech features that allow the virtual assistant to whisper words, emphasize a sentence by taking an audible breath and, most of all, adjust the pitch, rate and volume of her speech. This would lend emotion to Alexa’s speech, making “her” more endearing to users.

That’s the idea, at least. The execution depends on how creatively developers use Speech Synthesis Markup Language (SSML), the standardized markup language behind these effects. Developers are Amazon’s first and foremost audience for this update, and the company encourages them to make use of the full range of options. “You can add pauses, change pronunciation, spell out a word, add short audio snippets, and insert speechcons (special words and phrases) into your skill. These SSML features provide a more natural voice experience,” writes Liz Myers on the official blog.

More specifically, there are five new features: whispers for a softer dialogue, expletive beeps to bleep out words, sub to substitute a different spoken form for a written word, emphasis for rate and volume adjustments, and prosody for volume, pitch, and rate changes. Besides these, there are also speechcons, words and phrases that are specific to one culture and that Alexa can pronounce as they were originally intended. These speechcons are available in the U.K. and Germany for now.
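To make the five features a little more concrete, here is a minimal Python sketch that assembles an SSML string using the corresponding tags. The tag and attribute names follow Alexa's SSML vocabulary; the helper functions (whisper, bleep, sub, emphasis, prosody, speechcon, speak) and the sample sentences are hypothetical names invented for this illustration, not part of any Amazon library.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical helpers for illustration only; each returns an SSML fragment.

def whisper(text):
    # Softer, whispered delivery.
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'

def bleep(text):
    # Bleeps out the enclosed words.
    return f'<say-as interpret-as="expletive">{escape(text)}</say-as>'

def sub(text, alias):
    # Alexa speaks `alias` in place of the written `text`.
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"

def emphasis(text, level="strong"):
    # Changes rate and volume together; level is strong, moderate or reduced.
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"

def prosody(text, rate=None, pitch=None, volume=None):
    # Fine-grained control; only the attributes that are passed get emitted.
    attrs = "".join(
        f" {name}={quoteattr(value)}"
        for name, value in (("rate", rate), ("pitch", pitch), ("volume", volume))
        if value is not None
    )
    return f"<prosody{attrs}>{escape(text)}</prosody>"

def speechcon(text):
    # Speechcons are marked up as interjections.
    return f'<say-as interpret-as="interjection">{escape(text)}</say-as>'

def speak(*parts):
    # Every SSML reply is wrapped in a single <speak> element.
    return "<speak>" + " ".join(parts) + "</speak>"

ssml = speak(
    whisper("I can keep a secret."),
    "That was a",
    bleep("darn"),
    "good joke.",
    sub("Mg", "magnesium"),
    emphasis("really"),
    prosody("Slow and quiet.", rate="slow", volume="soft"),
)
print(ssml)
```

Developers can of course type the tags by hand; the helpers only keep the escaping and quoting in one place.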

Amazon has placed limits on how far developers can change rate, pitch and volume. In the end, Alexa should sound more like a human than a cartoon character. Users will only hear the difference if developers actually use these tags in the skills they build with the Alexa Skills Kit.
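As a rough illustration of what using these tags in a skill can look like, the sketch below wraps an SSML string in the JSON body a custom skill sends back to Alexa. The outputSpeech object with type "SSML" follows the Alexa Skills Kit response format; the function name build_response and the sample sentence are assumptions made for this example.

```python
import json

def build_response(ssml, end_session=True):
    # Hypothetical helper: returns a skill response whose speech is SSML
    # rather than plain text.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            "shouldEndSession": end_session,
        },
    }

reply = build_response(
    '<speak>Come closer. '
    '<amazon:effect name="whispered">This is a secret.</amazon:effect>'
    '</speak>'
)
print(json.dumps(reply, indent=2))
```

If the type were left as plain text, the tags would not be interpreted as speech markup, which is why nothing changes for users until developers opt in.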

