The days of the keyboard and screen as our sole method of interacting with a computer are long gone. Now we’re surrounded by more natural user interfaces, adding touch and speech recognition to our repertoire of interactions. The same goes for how computers respond to us, using haptics and speech synthesis.

SEE: Alexa Skills: A guide for business pros (free PDF) (TechRepublic)

Speech is increasingly important, as it provides a hands-free and at-a-distance way of working with devices. It’s not necessary to touch them or look at them; all that’s needed is a handful of trigger words and a good speech recognition system. We’re perhaps most familiar with digital assistants like Cortana, Alexa, Siri, and Google Assistant, but speech technologies are appearing in assistive systems, in in-car applications, and in other environments where manual operations are difficult, distracting or downright dangerous.

The other side of the speech recognition story is, of course, speech synthesis. Computers are good at displaying text, but not very good at reading it to us. What’s needed is an easy way of taking text content and turning it into recognisable human-quality speech, not the eerie monotone of a sci-fi robot. We’re all familiar with the speech synthesis tools in automated telephony systems or in GPS apps that fail basic pronunciation tests, getting names and addresses amusingly wrong.

High-quality speech synthesis isn’t easy. If you take the standard approach, mapping text to strings of phonemes, the result is often stilted and prone to mispronunciation. What’s more disconcerting is that there’s little or no inflection. Even using SSML (Speech Synthesis Markup Language) to add emphasis and inflection doesn’t make much difference and only adds to developer workloads, requiring every utterance to be tagged in advance to add the appropriate speech constructions.

Part of the problem is the way that traditional speech synthesis works, with separate models for analyzing the text and for predicting the required audio. As they’re separate steps, the result is clearly artificial. What’s needed is an approach that takes those separate steps and brings them together, into a single speech synthesis engine.

Using neural networks for more convincing speech