
Page 1: Speech synthesis technology

SPEECH SYNTHESIS TECHNOLOGY

Name: K. VIDYA MADHURI
Roll No: 14311A1201
IT-A, 2nd year, 2nd semester

Page 2: Speech synthesis technology

CONTENTS
• Introduction
• History
• Construction
• Working
• Applications
• Challenges

Page 3: Speech synthesis technology

INTRODUCTION

• Speech synthesis is the artificial production of human speech. A synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output.

• A computer system used for this purpose is called a speech computer or speech synthesizer.

• A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations, such as phonetic transcriptions, into speech.

Page 4: Speech synthesis technology

HISTORY

• In 1779, the Danish scientist Christian Kratzenstein, working at the Russian Academy of Sciences, built models of the human vocal tract that could produce the five long vowel sounds.

• Wolfgang von Kempelen added models of the tongue and lips, enabling the machine to pronounce both vowels and consonants.

• In 1837, Charles Wheatstone produced a "speaking machine" based on von Kempelen's design, and in 1857, M. Faber built the "Euphonia". Wheatstone's design was resurrected in 1923 by Paget.

Page 5: Speech synthesis technology

• In the 1930s, Bell Labs developed the vocoder, which automatically analyzed speech into its fundamental tone and resonances.

• Homer Dudley developed a keyboard-operated voice synthesizer called the Voder (Voice Demonstrator), which was derived from the vocoder.

• The first computer-based speech synthesis systems were created in the late 1950s. The first general English text-to-speech system was developed by Noriko Umeda et al. in 1968 at the Electrotechnical Laboratory, Japan.

A speech synthesizer from the 1990s

Page 6: Speech synthesis technology

CONSTRUCTION

• A text-to-speech system (or "engine") is composed of two parts: a front-end and a back-end.
• The front-end converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words (tokenization), then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units such as phrases, clauses, and sentences (grapheme-to-phoneme conversion).

• The back-end, often referred to as the synthesizer, then converts the symbolic linguistic representation into sound.
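The two front-end steps described above (text normalization, then phonetic transcription) can be sketched as follows. The abbreviation table, digit table, and mini-lexicon are hypothetical toy examples, not taken from any real engine; real systems use large pronunciation dictionaries plus letter-to-sound rules.

```python
import re

# Toy expansion tables for text normalization (illustrative assumptions).
ABBREVIATIONS = {"dr.": "doctor", "st.": "street", "no.": "number"}
DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def normalize(text):
    """Front-end step 1: expand abbreviations and digits into written-out words."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif token.isdigit():
            words.extend(DIGITS[d] for d in token)   # "42" -> "four two"
        else:
            words.append(re.sub(r"[^a-z']", "", token))
    return words

# Toy grapheme-to-phoneme lexicon, ARPAbet-style transcriptions.
LEXICON = {"doctor": "D AA K T ER", "four": "F AO R", "two": "T UW"}

def to_phonemes(words):
    """Front-end step 2: assign a phonetic transcription to each word."""
    return [LEXICON.get(w, "<oov>") for w in words]

print(to_phonemes(normalize("Dr. 42")))
# ['D AA K T ER', 'F AO R', 'T UW']
```

The back-end would then turn these phoneme strings, plus prosodic marks, into an audio waveform.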

Page 7: Speech synthesis technology

Block diagram of a text-to-speech engine

Page 8: Speech synthesis technology

APPROACHES

There are different approaches to speech synthesis, for example text-to-speech and concept-to-speech synthesis.

• Concept-to-speech synthesis involves a generation component that generates a textual expression from semantic, pragmatic and discourse knowledge. The speech signal can then be generated from this expression.

• In text-to-speech synthesis, the text to be spoken is provided; it is not generated by the system. It must, however, be analyzed and interpreted in order to convey the proper pronunciation and emphasis.

Page 9: Speech synthesis technology

SYNTHESIZING TECHNIQUES
• Concatenative
• Formant
• Articulatory
• HMM-based
• Sinewave

Page 10: Speech synthesis technology

• Concatenative synthesis is based on the concatenation (or stringing together) of segments of recorded speech. Generally, concatenative synthesis produces the most natural-sounding synthesized speech.

• Formant synthesis does not use human speech samples at runtime. Instead, the synthesized speech output is created using additive synthesis and an acoustic model (physical modelling synthesis). Parameters such as fundamental frequency, voicing, and noise levels are varied over time to create a waveform of artificial speech.

• Articulatory synthesis refers to computational techniques for synthesizing speech based on models of the human vocal tract and the articulation processes occurring there.
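A crude additive sketch of the formant approach described above: harmonics of the fundamental are summed, with each harmonic weighted by its closeness to a formant peak. The formant frequencies, Gaussian weighting, and bandwidth are illustrative assumptions (roughly an /a/-like vowel), not values from any particular engine.

```python
import math

SAMPLE_RATE = 16000  # samples per second

def formant_vowel(f0=120.0, formants=(700.0, 1220.0, 2600.0),
                  bandwidth=120.0, duration=0.5):
    """Additive formant synthesis sketch: sum harmonics of the fundamental
    frequency f0, weighting each harmonic by Gaussian bumps centred on the
    formant frequencies."""
    n = int(SAMPLE_RATE * duration)

    def amp(freq):
        # Weight of a harmonic: how close it sits to any formant peak.
        return sum(math.exp(-((freq - f) / bandwidth) ** 2) for f in formants)

    # All harmonics below the Nyquist frequency.
    harmonics = [(k * f0, amp(k * f0))
                 for k in range(1, int(SAMPLE_RATE / 2 / f0))]
    return [sum(a * math.sin(2 * math.pi * f * t / SAMPLE_RATE)
                for f, a in harmonics)
            for t in range(n)]

wave = formant_vowel()
print(len(wave))  # 8000 samples = 0.5 s at 16 kHz
```

Varying `f0`, the formant set, and the noise/voicing mix over time, as the slide describes, is what turns such static buzzes into intelligible artificial speech.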

Page 11: Speech synthesis technology

• HMM-based synthesis is a synthesis method based on hidden Markov models, also called statistical parametric synthesis. In this system, the frequency spectrum (vocal tract), fundamental frequency (voice source), and duration (prosody) of speech are modeled simultaneously by HMMs.

• Sinewave synthesis is a technique for synthesizing speech by replacing the formants (main bands of energy) with pure tone whistles.
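The sinewave technique can be sketched directly: each formant band is replaced by a single pure tone that follows the formant's centre frequency, with no harmonics and no voice source. The three gliding tracks below are made-up illustrative trajectories, not measured formant data.

```python
import math

SAMPLE_RATE = 16000  # samples per second

def sinewave_speech(formant_tracks, duration=0.3):
    """Sinewave synthesis sketch: one pure tone per formant track, each
    gliding linearly from its start to its end frequency."""
    n = int(SAMPLE_RATE * duration)
    out = []
    phases = [0.0] * len(formant_tracks)
    for i in range(n):
        frac = i / n
        sample = 0.0
        for j, (start, end) in enumerate(formant_tracks):
            freq = start + (end - start) * frac    # linear formant glide
            phases[j] += 2 * math.pi * freq / SAMPLE_RATE
            sample += math.sin(phases[j]) / len(formant_tracks)
        out.append(sample)
    return out

# Three tones gliding roughly like rising formant transitions (illustrative).
tones = sinewave_speech([(500, 700), (1000, 1200), (2400, 2500)])
print(len(tones))  # 4800 samples = 0.3 s at 16 kHz
```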

Page 12: Speech synthesis technology

SYSTEMS PROVIDING SPEECH SYNTHESIS

• Apple uses the VoiceOver speech engine in Mac OS on laptops and in iOS on iPhones, iPads, and iPods.

• Modern Windows desktop systems can use SAPI 4 and SAPI 5 components to support speech synthesis and speech recognition. Microsoft Speech Server is a server-based package for voice synthesis and recognition. It is designed for network use with web applications and call centers.

• The Mattel Intellivision game console offered the Intellivoice Voice Synthesis module in 1982. It included the SP0256 Narrator speech synthesizer chip on a removable cartridge.

Page 13: Speech synthesis technology

SYSTEMS PROVIDING TEXT-TO-SPEECH SYNTHESIS

• From version 1.6, Android has included support for speech synthesis (TTS).
• A number of applications and web pages can be read aloud from a web browser or the Google Toolbar, for example Text-to-voice, an add-on for Firefox.

• Some specialized software can narrate RSS feeds.
• Some e-book readers, such as the Amazon Kindle, PocketBook eBook Reader Pro, and the BeBook Neo, use TTS.

• GPS Navigation units use speech synthesis for automobile navigation.

Page 14: Speech synthesis technology

Applications using text-to-speech software in iOS and Android, respectively

Page 15: Speech synthesis technology

MARKUP LANGUAGES ON SPEECH SYNTHESIS

• A number of markup languages have been established for the rendition of text as speech in an XML-compliant format. The most recent is Speech Synthesis Markup Language (SSML), which became a W3C recommendation in 2004.

• Older speech synthesis markup languages include Java Speech Markup Language (JSML) and SABLE.
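For illustration, a minimal SSML document might look like this (the spoken text and attribute values are made up; the element names, `say-as`, `break`, and `prosody`, and the namespace are from the W3C SSML 1.0 recommendation):

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  The number <say-as interpret-as="cardinal">42</say-as> is spoken
  <break time="300ms"/>
  <prosody rate="slow" pitch="low">slowly and low</prosody>.
</speak>
```

Such markup lets an author control pronunciation, pauses, and prosody independently of any particular synthesis engine.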

Page 16: Speech synthesis technology

APPLICATIONS

• The longest-standing application has been in screen readers for people with visual impairment, but text-to-speech systems are now commonly used by people with dyslexia and other reading difficulties, as well as by pre-literate children.

• Speech synthesis techniques are also used in entertainment productions such as games and animations.

• In addition, speech synthesis is a valuable computational aid for the analysis and assessment of speech disorders.

• It can also be used as an educational tool to learn different accents, as in Google Translate.

Page 17: Speech synthesis technology

CHALLENGES

• Despite large improvements, speech synthesis can still sound somewhat unnatural.

• The approaches to Speech Synthesis that yield the most natural speech need considerable resources in terms of data storage and processing power.

• The process of tokenizing text is rarely straightforward. Many English spellings are pronounced differently depending on context, which makes correct pronunciation difficult to determine automatically.
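Heteronyms illustrate this last point: a lexicon keyed only on spelling cannot choose the right pronunciation, so the front-end must first tag the context. The tags and ARPAbet-style transcriptions below are a toy illustration, not a real engine's data model.

```python
# Same spelling, different pronunciation depending on grammatical context.
LEXICON = {
    ("read", "past"):    "R EH D",   # "I read it yesterday"
    ("read", "present"): "R IY D",   # "I read every day"
    ("lead", "noun"):    "L EH D",   # the metal
    ("lead", "verb"):    "L IY D",   # to lead someone
}

def pronounce(word, tag):
    """Look up a pronunciation using both spelling and a context tag."""
    return LEXICON[(word, tag)]

print(pronounce("read", "past"))     # R EH D
print(pronounce("read", "present"))  # R IY D
```

Producing the context tag is itself a hard natural-language problem, which is why this step keeps tokenization from being straightforward.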

Page 18: Speech synthesis technology

THANK YOU