5-Speech Synthesis
Speech Synthesis Concept
Phone Units
Phone Sequence To Speech
Speech Naturalness
– Concatenative Approaches
– Rule-Based Approaches
Speech Synthesis Concept
Text → [Text to Phone Sequence] → [Phone Sequence to Speech] → Speech
Text to Phone Sequence: Natural Language Processing (NLP)
Phone Sequence to Speech: Speech Processing
Phone Units
Paragraph ( )
Sentence ( )
Word (Depends on the language. Usually more than 100,000)
Syllable
Diphone & Triphone
Phoneme (between 10 and 100)
Phone Units (Cont’d)
Diphone: we model the transition between two adjacent phonemes.
[Figure: a phoneme sequence p1 p2 p3 p4 p5 ..., marking one phoneme and one diphone, the diphone spanning the boundary between two adjacent phonemes.]
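The diphone idea can be sketched in a few lines of Python. This is only an illustration: the function name `to_diphones`, the `#` silence marker used to pad the ends, and the romanized example word are assumptions, not part of the original slides.

```python
def to_diphones(phonemes):
    """Return the diphone units that cover a phoneme sequence.

    Each diphone names the transition between two adjacent phonemes.
    The sequence is padded with "#" (an assumed silence marker) so the
    first and last phonemes also get a transition unit.
    """
    padded = ["#"] + list(phonemes) + ["#"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


# Hypothetical phoneme sequence for a five-phoneme word.
print(to_diphones(["s", "a", "l", "a", "m"]))
```

A sequence of n phonemes yields n + 1 diphones under this padding convention.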
Phone Units (Cont’d)
Farsi has 30 phonemes, so in theory there are 30 × 30 = 900 diphones.
In practice, the only diphone that does not occur in Farsi is /zho/.
In theory there are 30 × 30 × 30 = 27,000 triphones, but in practice Farsi has about 15,000.
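The inventory sizes follow directly from the phoneme count. A small check, using only the figures quoted on the slide (the variable names are illustrative):

```python
n_phonemes = 30                      # Farsi phoneme count from the slide

theoretical_diphones = n_phonemes ** 2    # every ordered pair of phonemes
theoretical_triphones = n_phonemes ** 3   # every ordered triple of phonemes

observed_triphones = 15_000          # approximate figure from the slide
coverage = observed_triphones / theoretical_triphones

print(theoretical_diphones, theoretical_triphones, round(coverage, 3))
```

So roughly 56% of the theoretical triphones would need to be stored, while almost the full diphone set is needed.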
Phone Units (Cont’d)
Syllable = Onset (Consonant) + Rhyme
A syllable is a set of phonemes that contains exactly one vowel.
Syllables in Farsi: CV, CVC, CVCC
Farsi has about 4000 syllables.
Syllables in English: V, CV, CVC, CCVC, CCVCC, CCCVC, CCCVCC, ...
The number of syllables in English is very large.
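The syllable patterns above can be checked mechanically. The sketch below is an assumption-laden illustration: the vowel set is a hypothetical romanization of the six Farsi vowels (with "A" standing for the long a), and the function names are invented for this example.

```python
# Hypothetical romanized vowel symbols; everything else counts as a consonant.
VOWELS = {"a", "e", "o", "A", "i", "u"}

# Syllable shapes listed for Farsi on the slide.
FARSI_PATTERNS = {"CV", "CVC", "CVCC"}


def cv_pattern(phonemes):
    """Map a phoneme sequence to its consonant/vowel pattern, e.g. 'CVCC'."""
    return "".join("V" if p in VOWELS else "C" for p in phonemes)


def is_farsi_syllable(phonemes):
    """True if the sequence has one of the Farsi syllable shapes."""
    return cv_pattern(phonemes) in FARSI_PATTERNS


print(cv_pattern(["d", "a", "s", "t"]), is_farsi_syllable(["d", "a", "s", "t"]))
print(cv_pattern(["s", "t", "o", "p"]), is_farsi_syllable(["s", "t", "o", "p"]))
```

Every accepted pattern contains exactly one V, which matches the one-vowel definition of a syllable; an onset cluster such as CCVC is rejected for Farsi although it is valid in English.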
Phone Sequence To Speech
Concatenative approaches: a trade-off between naturalness, memory usage, and amount of computation.
Rule-based approaches: the most important rule-based approach is the Klatt method.
Phone Sequence To Speech (Cont’d)
Text → [Text to Phone Sequence] → [Phone Sequence to Primitive Utterance] → [Primitive Utterance to Natural Speech] → Speech
Text to Phone Sequence: NLP
Phone Sequence to Primitive Utterance, Primitive Utterance to Natural Speech: Speech Processing
Speech Naturalness
Removal of undesirable noise, distortion, and discontinuities from the speech
Prosody generation
– Speech energy
– Duration
– Intonation
– Stress
Speech Naturalness (Cont’d)
Intonation and stress have a strong effect on speech naturalness.
Intonation: the variation of pitch frequency over the course of an utterance.
Stress: an increase in pitch frequency at a specific time.
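To make the two definitions concrete, here is a minimal sketch of a pitch (F0) contour: a linear fall across the utterance stands in for intonation, and a local bump stands in for stress. The function name, the frame-based representation, and all numeric defaults are assumptions chosen for illustration, not values from the slides.

```python
def f0_contour(n_frames, start_hz=130.0, end_hz=100.0,
               stress_frame=None, stress_boost_hz=25.0, stress_width=3):
    """Return one F0 value (Hz) per frame.

    Intonation: pitch moves linearly from start_hz to end_hz.
    Stress: a triangular pitch rise peaking at stress_frame.
    """
    contour = []
    for i in range(n_frames):
        t = i / (n_frames - 1) if n_frames > 1 else 0.0
        f0 = start_hz + (end_hz - start_hz) * t
        if stress_frame is not None and abs(i - stress_frame) <= stress_width:
            # Full boost at the stressed frame, tapering off on both sides.
            f0 += stress_boost_hz * (1 - abs(i - stress_frame) / (stress_width + 1))
        contour.append(f0)
    return contour


plain = f0_contour(11)
stressed = f0_contour(11, stress_frame=5)
print(plain[5], stressed[5])
```

A synthesizer would use such a contour as the target pitch when generating or modifying each frame.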
Concatenative Approaches
In these approaches we store units of natural speech and use them to reconstruct the desired speech.
We can select the appropriate phone unit for speech synthesis.
We can store compressed parameters instead of the main waveform.
Concatenative Approaches (Cont’d)
Benefits of storing compressed parameters instead of the main waveform
– Lower memory use
– A general representation instead of one specific stored utterance
– Easier prosody generation
Concatenative Approaches (Cont’d)
Phone Unit    Type of Storing
Paragraph     Main Waveform
Sentence      Main Waveform
Word          Main Waveform
Syllable      Coded/Main Waveform
Diphone       Coded Waveform
Phoneme       Coded Waveform
Concatenative Approaches (Cont’d)
Pitch-Synchronous Overlap-Add (PSOLA) is a well-known method for smoothing the transitions between phonemes.
Overlap-add is a standard DSP method.
PSOLA is a basic operation in voice conversion.
In the analysis stage of this method, we select frames that are synchronized with the pitch marks.
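The pitch-synchronous framing and overlap-add steps can be sketched in plain Python. This is a simplified illustration, not a full PSOLA implementation: it assumes the pitch marks are already known and evenly spaced by a fixed `period`, and the function names are invented for the example. Each frame is two periods long, Hann-windowed, and centered on a pitch mark; adding the frames back at the same marks reproduces the signal, while adding them at more closely spaced marks raises the pitch.

```python
import math


def hann(length):
    """Periodic Hann window; copies shifted by length/2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]


def extract_frames(signal, marks, period):
    """Analysis: cut one two-period windowed frame centered on each pitch mark."""
    win = hann(2 * period)
    frames = []
    for m in marks:
        frame = []
        for k in range(2 * period):
            idx = m - period + k
            sample = signal[idx] if 0 <= idx < len(signal) else 0.0
            frame.append(sample * win[k])
        frames.append(frame)
    return frames


def overlap_add(frames, new_marks, period, out_len):
    """Synthesis: add each frame into the output, centered on its new mark."""
    out = [0.0] * out_len
    for frame, m in zip(frames, new_marks):
        for k, value in enumerate(frame):
            idx = m - period + k
            if 0 <= idx < out_len:
                out[idx] += value
    return out


period = 20                                   # assumed pitch period in samples
signal = [math.sin(2 * math.pi * n / period) for n in range(200)]
marks = list(range(20, 181, period))          # assumed, evenly spaced pitch marks

frames = extract_frames(signal, marks, period)
rebuilt = overlap_add(frames, marks, period, len(signal))

# Placing the same frames 16 samples apart instead of 20 raises the pitch.
closer_marks = [marks[0] + 16 * i for i in range(len(marks))]
higher = overlap_add(frames, closer_marks, period, len(signal))
```

In a practical system the pitch marks come from a pitch detector, the period varies over time, and frames are repeated or dropped so that the duration is preserved when the pitch changes.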
Rule-Based Approach Stages
Determine the speech model and its parameters.
Determine the type of phone unit.
Determine the parameter values for each phone unit.
Replace the sequence of phone units with the equivalent sequence of parameters.
Feed the parameter sequence into the speech model.
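The stages above can be sketched as a table lookup followed by a substitution step. Everything here is illustrative: the formant model, the choice of phonemes as the unit, the function name, the fixed number of frames per phone, and the formant values (rough, textbook-style vowel targets) are assumptions rather than values from the slides.

```python
# Stages 1-3: the model is a formant synthesizer, the unit is the phoneme,
# and each unit gets target parameters (F1, F2, F3) in Hz. Values are
# approximate and only for illustration.
FORMANT_TABLE = {
    "a": (730, 1090, 2440),
    "i": (270, 2290, 3010),
    "u": (300, 870, 2240),
}


def phones_to_parameters(phones, frames_per_phone=5):
    """Stage 4: replace each phone unit by its parameter frames."""
    track = []
    for phone in phones:
        track.extend([FORMANT_TABLE[phone]] * frames_per_phone)
    return track


# Stage 5 would pass this track, frame by frame, to the speech model.
track = phones_to_parameters(["a", "i"])
print(len(track), track[0], track[-1])
```

A real rule-based system would also apply rules that smooth the parameter tracks across unit boundaries instead of holding each target constant.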
KLATT 80 Model
KLATT 88 Model
[Figure: The KLSYN88 cascade/parallel formant synthesizer. Blocks and control parameters recoverable from the diagram:]
Glottal sound sources: filtered impulse train, KLGLOTT88 model (default), modified LF model, spectral tilt low-pass resonator, aspiration noise generator. Parameters: F0, AV, OQ, FL, DI, SQ, SS, TL, AH, AF, CP.
Cascade vocal tract model, laryngeal sound sources: nasal pole-zero pair, tracheal pole-zero pair, first through fifth formant resonators. Parameters: FNP, FNZ, FTP, FTZ, F1, B1, BNP, BNZ, BTP, BTZ, DF1, DB1, F2, B2, F3, B3, F4, B4, F5, B5.
Parallel vocal tract model, laryngeal sound sources (normally not used): first difference preemphasis; nasal, tracheal, first, second, third, and fourth formant resonators. Parameters: ANV, A1V, A2V, A3V, A4V, ATV.
Parallel vocal tract model, frication sound sources: frication noise generator, second through sixth formant resonators, bypass path. Parameters: A2F, A3F, A4F, A5F, A6F, AB, B2F, B3F, B4F, B5F, B6F, F6.
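Each formant resonator in a Klatt-style synthesizer is a second-order digital filter set by a center frequency F and a bandwidth B. The sketch below uses the standard difference equation y[n] = A·x[n] + B·y[n−1] + C·y[n−2] and chains a few resonators in cascade, driven by an impulse train. It is a minimal illustration only: the class and function names, the sample rate, and the formant values are assumptions, and none of the KLSYN88 source models, nasal or tracheal branches, or parallel paths are included.

```python
import math


class Resonator:
    """Second-order digital resonator with unity gain at 0 Hz."""

    def __init__(self, freq_hz, bw_hz, sample_rate):
        t = 1.0 / sample_rate
        self.c = -math.exp(-2 * math.pi * bw_hz * t)
        self.b = 2 * math.exp(-math.pi * bw_hz * t) * math.cos(2 * math.pi * freq_hz * t)
        self.a = 1.0 - self.b - self.c
        self.y1 = 0.0   # previous output y[n-1]
        self.y2 = 0.0   # output before that, y[n-2]

    def step(self, x):
        y = self.a * x + self.b * self.y1 + self.c * self.y2
        self.y2, self.y1 = self.y1, y
        return y


def cascade(samples, resonators):
    """Pass each sample through the resonators in series."""
    out = []
    for x in samples:
        for r in resonators:
            x = r.step(x)
        out.append(x)
    return out


def impulse_train(f0_hz, sample_rate, n_samples):
    """Simple voicing source: one unit impulse per pitch period."""
    period = round(sample_rate / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


fs = 10000                                       # assumed sample rate in Hz
formants = [(500, 60), (1500, 90), (2500, 150)]  # hypothetical (F, B) pairs in Hz
chain = [Resonator(f, b, fs) for f, b in formants]
vowel_like = cascade(impulse_train(100, fs, 1000), chain)
```

Because each resonator has unity gain at 0 Hz, the relative heights of the formant peaks in a cascade follow from the frequencies and bandwidths alone, which is why the cascade branch in the diagram needs no per-formant amplitude controls.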
Three Voicing Source Models in KLATT 88
The old KLSYN impulsive source
The KLGLOTT88 model
The modified LF model