Module 4 Pronunciation & prosody

Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis


Module 4

Pronunciation & prosody


Roadmap

• Modules 1-2: The basics
• Modules 3-5: Speech synthesis
• Modules 6-9: Speech recognition

• Block 1 Week 4
  • Module 3: text processing
• Block 1 Week 5
  • Class trip
  • Module 4: pronunciation & prosody
• Block 1 Week 6
  • Assignment Q&A
  • Module 5: waveform generation
• Block 1 Week 7
  • Submission of first assignment


Orientation

• Text-to-speech pipeline architecture

• Normalise text

• Predict pronunciation & prosody

• Generate waveform

Coffee costs £2.

SIL K AA F IY K AA S T S T UW P AW N D Z SIL

coffee costs two pounds .


What you should already know

• From the videos & readings
  • Letter to sound (LTS)
    • morphology
    • POS
    • dictionary lookup of word + POS
    • syllables & lexical stress
    • LTS (rules or model)
    • post-lexical rules
  • A worked example of LTS using a classification tree
    • gathering and preparing training data
    • choosing the predictors
    • growing the tree (learning from data)
  • Prosody prediction
    • placement of events (classification)
    • deciding their types (classification)
    • realisation (regression)


Today’s topics - Module 4: pronunciation & prosody

[Figure: course topic map. The columns are grouped under Theory (Speech, Signal processing, Probabilistic modelling) and Application (Speech Synthesis, Automatic Speech Recognition); the rows are Concepts, Models & data structures, and Algorithms & analysis. The cells highlighted for this module are:]

Phoneme

Pronunciation

Prosody

Decision tree

Learning decision trees


Today’s topics - Module 4: pronunciation & prosody

Phoneme Pronunciation

Prosody

Decision tree

Learning decision trees


Speech synthesis - pronunciation & prosody

• Machine Learning
  • Classification And Regression Trees (CARTs)
    • classification: understanding entropy as a measure of predictability
    • regression: measuring the predictability of a continuous variable
    • stopping criteria


Step 1 - define the overall task we are going to solve

from the orthographic form: HOGWASH

predict the pronunciation: HH AA G W AA SH

Phoneme Pronunciation


Step 2 - break the task down into simple, solvable, sub-tasks

from one letter of the orthographic form: HOGWASH

predict zero, one or two phones of the pronunciation: HH AA G W AA SH


Step 3 - obtain the raw training data - for Letter-to-Sound (LTS), this is simply a pre-existing dictionary

HOGWASH    HH AA G W AA SH
CARWASH    K AA R W AA SH
WARRANT    W AO R AH N T
WARRANTY   W AO R AH N T IY
HARDWARE   HH AA R D W EH R
SOFTWARE   S AO F T W EH R
WARES      W EH R Z

Here are some words from the CMU dictionary that use the letter “A”.


Step 4 - define the predictee - which phone are we going to predict from this letter?

H O G W A S H

HH AA G W AA SH


Step 5 - choose the predictors

H O G W A S H

HH AA G W AA SH


Step 6 - get the training data ready for machine learning

predictors                               predictee
ppp  pp   p    letter  n    nn   nnn
o    g    w    a       s    h    -       aa
a    r    w    a       s    h    -       aa
-    -    w    a       r    r    a       ao
-    -    w    a       r    r    a       ao
r    d    w    a       r    e    -       eh
f    t    w    a       r    e    -       eh
-    -    w    a       r    e    s       eh
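The rows above can be produced mechanically once each letter has been aligned with its phone. The following is a minimal Python sketch of that windowing step, assuming three letters of context on each side and "-" as padding, as in the table. The function name `letter_window` and the hand-picked letter indices are illustrative assumptions; the alignment between letters and phones is taken as given here, not computed.

```python
def letter_window(word, index, width=3, pad="-"):
    """Return the predictor letters around word[index]:
    `width` letters before, the letter itself, `width` letters after."""
    padded = pad * width + word.lower() + pad * width
    centre = index + width
    return list(padded[centre - width:centre + width + 1])


# (word, index of the letter "a" being predicted, aligned phone).
# The phone labels are copied from the table above.
examples = [("hogwash", 4, "aa"), ("warrant", 1, "ao"), ("hardware", 5, "eh")]

for word, index, phone in examples:
    print(" ".join(letter_window(word, index)), "->", phone)
```

Each printed line is one training data point: seven predictor values followed by the predictee.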


Speech synthesis - pronunciation & prosody

• Machine Learning
  • Classification And Regression Trees (CARTs)
    • classification: understanding entropy as a measure of predictability
    • regression: measuring the predictability of a continuous variable
    • stopping criteria


In-class exercise

Building a decision tree for phrase-break prediction


Step 1 - define the overall task we are going to solve

For a sentence, predict where the phrase breaks should go.

I like to ride bikes.

break

Prosody


Step 2 - break the task down into simple, solvable, sub-tasks

For each word, predict whether there is a phrase break after it.

Words   I         like      to        ride      bikes.
Label   no break  no break  no break  no break  break


Step 3 - obtain the raw training data
This is going to be expensive because it will involve manually labelling spoken utterances.

Words   I         like      to        ride      bikes     .
Breaks                                          BREAK

Words   Food      and       drink     are       nice      .
Breaks  BREAK                                   BREAK

Words   Apples    but       not       pears     .
Breaks  BREAK                         BREAK

Words   He        is        but       she’s     not       !
Breaks            BREAK                         BREAK

Words   One       ,         two       ,         three     .
Breaks  BREAK               BREAK               BREAK

Words   Shaken    yet       not       stirred   .
Breaks  BREAK                         BREAK


Break !


Step 4 - define the predictee and the possible values it can take: BREAK or No break

Words   I         like      to        ride      bikes     .
Breaks  No break  No break  No break  No break  BREAK     No break

Words   Food      and       drink     are       nice      .
Breaks  BREAK     No break  No break  No break  BREAK     No break

Words   Apples    but       not       pears     .
Breaks  BREAK     No break  No break  BREAK     No break

Words   He        is        but       she’s     not       !
Breaks  No break  BREAK     No break  No break  BREAK     No break

Words   One       ,         two       ,         three     .
Breaks  BREAK     No break  BREAK     No break  BREAK     No break

Words   Shaken    yet       not       stirred   .
Breaks  BREAK     No break  No break  BREAK     No break


Step 5 - choose the predictors
They can only be things that you will also know for the test data.

Words   I         like      to        ride      bikes     .
POS     N         V         TO        V         N         PUNC
Breaks  No break  No break  No break  No break  BREAK     No break


Step 5 - choose the predictors

Words   I         like      to        ride      bikes     .
POS     N         V         TO        V         N         PUNC
Breaks  No break  No break  No break  No break  BREAK     No break

Words   Food      and       drink     are       nice      .
POS     N         CC        N         V         JJ        PUNC
Breaks  BREAK     No break  No break  No break  BREAK     No break

Words   Apples    but       not       pears     .
POS     N         CC        RB        N         PUNC
Breaks  BREAK     No break  No break  BREAK     No break

Words   He        is        but       she’s     not       !
POS     N         V         CC        N         V         PUNC
Breaks  No break  BREAK     No break  No break  BREAK     No break

Words   One       ,         two       ,         three     .
POS     CD        PUNC      CD        PUNC      CD        PUNC
Breaks  BREAK     No break  BREAK     No break  BREAK     No break

Words   Shaken    yet       not       stirred   .
POS     JJ        CC        RB        JJ        PUNC
Breaks  BREAK     No break  No break  BREAK     No break


Step 6 - get the training data ready for machine learning

Words                I         like      to        ride      bikes     .
Predictor 1 : L POS
Predictor 2 : C POS
Predictor 3 : R POS
Predictee            NO-BREAK  NO-BREAK  NO-BREAK  NO-BREAK  BREAK     NO-BREAK

Words                Food      and       drink     are       nice      .
Predictor 1 : L POS
Predictor 2 : C POS
Predictor 3 : R POS
Predictee            BREAK     NO-BREAK  NO-BREAK  NO-BREAK  BREAK     NO-BREAK

Words                Apples    but       not       pears     .
Predictor 1 : L POS
Predictor 2 : C POS
Predictor 3 : R POS
Predictee            BREAK     NO-BREAK  NO-BREAK  BREAK     NO-BREAK

Words                He        is        but       she’s     not       !
Predictor 1 : L POS
Predictor 2 : C POS
Predictor 3 : R POS
Predictee            NO-BREAK  BREAK     NO-BREAK  NO-BREAK  BREAK     NO-BREAK

Words                One       ,         two       ,         three     .
Predictor 1 : L POS
Predictor 2 : C POS
Predictor 3 : R POS
Predictee            BREAK     NO-BREAK  BREAK     NO-BREAK  BREAK     NO-BREAK

Words                Shaken    yet       not       stirred   .
Predictor 1 : L POS
Predictor 2 : C POS
Predictor 3 : R POS
Predictee            BREAK     NO-BREAK  NO-BREAK  BREAK     NO-BREAK


Step 6 - get the training data ready for machine learning

Words                I         like      to        ride      bikes     .
Predictor 1 : L POS  -         N         V         TO        V         N
Predictor 2 : C POS  N         V         TO        V         N         PUNC
Predictor 3 : R POS  V         TO        V         N         PUNC      -
Predictee            NO-BREAK  NO-BREAK  NO-BREAK  NO-BREAK  BREAK     NO-BREAK

Words                Food      and       drink     are       nice      .
Predictor 1 : L POS  -         N         CC        N         V         JJ
Predictor 2 : C POS  N         CC        N         V         JJ        PUNC
Predictor 3 : R POS  CC        N         V         JJ        PUNC      -
Predictee            BREAK     NO-BREAK  NO-BREAK  NO-BREAK  BREAK     NO-BREAK

Words                Apples    but       not       pears     .
Predictor 1 : L POS  -         N         CC        RB        N
Predictor 2 : C POS  N         CC        RB        N         PUNC
Predictor 3 : R POS  CC        RB        N         PUNC      -
Predictee            BREAK     NO-BREAK  NO-BREAK  BREAK     NO-BREAK

Words                He        is        but       she’s     not       !
Predictor 1 : L POS  -         N         V         CC        N         V
Predictor 2 : C POS  N         V         CC        N         V         PUNC
Predictor 3 : R POS  V         CC        N         V         PUNC      -
Predictee            NO-BREAK  BREAK     NO-BREAK  NO-BREAK  BREAK     NO-BREAK

Words                One       ,         two       ,         three     .
Predictor 1 : L POS  -         CD        PUNC      CD        PUNC      CD
Predictor 2 : C POS  CD        PUNC      CD        PUNC      CD        PUNC
Predictor 3 : R POS  PUNC      CD        PUNC      CD        PUNC      -
Predictee            BREAK     NO-BREAK  BREAK     NO-BREAK  BREAK     NO-BREAK

Words                Shaken    yet       not       stirred   .
Predictor 1 : L POS  -         JJ        CC        RB        JJ
Predictor 2 : C POS  JJ        CC        RB        JJ        PUNC
Predictor 3 : R POS  CC        RB        JJ        PUNC      -
Predictee            BREAK     NO-BREAK  NO-BREAK  BREAK     NO-BREAK


Step 6 - get the training data ready for machine learning

convert sequence into a set of independent data points

Words                I         like      to        ride      bikes     .
Predictor 1 : L POS  -         N         V         TO        V         N
Predictor 2 : C POS  N         V         TO        V         N         PUNC
Predictor 3 : R POS  V         TO        V         N         PUNC      -
Predictee            NO-BREAK  NO-BREAK  NO-BREAK  NO-BREAK  BREAK     NO-BREAK

L POS  C POS  R POS  Predictee
-      N      V      NO-BREAK
N      V      TO     NO-BREAK
V      TO     V      NO-BREAK
. . . etc
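The conversion from a labelled sentence to independent data points is a sliding window of three POS tags. A minimal Python sketch, assuming "-" pads the sentence edges as in the slides; the function name `to_data_points` is an illustrative choice, not something defined in the course materials.

```python
def to_data_points(pos_tags, labels, pad="-"):
    """Turn one tagged, labelled sentence into independent
    (L POS, C POS, R POS, predictee) tuples, one per token."""
    padded = [pad] + list(pos_tags) + [pad]
    return [
        (padded[i], padded[i + 1], padded[i + 2], label)
        for i, label in enumerate(labels)
    ]


# "I like to ride bikes ." with the tags and labels from the slides
pos = ["N", "V", "TO", "V", "N", "PUNC"]
labels = ["NO-BREAK", "NO-BREAK", "NO-BREAK", "NO-BREAK", "BREAK", "NO-BREAK"]

for point in to_data_points(pos, labels):
    print(*point)
```

After this step the order of the data points no longer matters: each one carries its own context in the L POS and R POS predictors.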


The training data

L POS  C POS  R POS  Predictee
-      N      V      NO-BREAK
N      V      TO     NO-BREAK
V      TO     V      NO-BREAK
TO     V      N      NO-BREAK
V      N      PUNC   BREAK
N      PUNC   -      NO-BREAK
-      N      CC     BREAK
N      CC     N      NO-BREAK
CC     N      V      NO-BREAK
N      V      JJ     NO-BREAK
V      JJ     PUNC   BREAK
JJ     PUNC   -      NO-BREAK
-      N      CC     BREAK
N      CC     RB     NO-BREAK
CC     RB     N      NO-BREAK
RB     N      PUNC   BREAK
N      PUNC   -      NO-BREAK
-      N      V      NO-BREAK
N      V      CC     BREAK
V      CC     N      NO-BREAK
CC     N      V      NO-BREAK
N      V      PUNC   BREAK
V      PUNC   -      NO-BREAK
-      CD     PUNC   BREAK
CD     PUNC   CD     NO-BREAK
PUNC   CD     PUNC   BREAK
CD     PUNC   CD     NO-BREAK
PUNC   CD     PUNC   BREAK
CD     PUNC   -      NO-BREAK
-      JJ     CC     BREAK
JJ     CC     RB     NO-BREAK
CC     RB     JJ     NO-BREAK
RB     JJ     PUNC   BREAK
JJ     PUNC   -      NO-BREAK


Decision tree


Make a list of all possible questions

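L POS = ____    C POS = ____    R POS = ____

L POS = ____    C POS = ____    R POS = ____

L POS = ____    C POS = ____    R POS = ____

L POS = ____    C POS = ____    R POS = ____

L POS = ____    C POS = ____    R POS = ____

L POS = ____    C POS = ____    R POS = ____

L POS = ____    C POS = ____    R POS = ____

L POS = ____    C POS = ____    R POS = ____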


My list of all possible questions

L POS = PUNC C POS = PUNC R POS = PUNC

L POS = N C POS = N R POS = N

L POS = V C POS = V R POS = V

L POS = TO C POS = TO R POS = TO

L POS = CC C POS = CC R POS = CC

L POS = JJ C POS = JJ R POS = JJ

L POS = RB C POS = RB R POS = RB

L POS = CD C POS = CD R POS = CD


Try question “C POS = N ?”

C POS = N ?


The answer to the question “C POS = N ?” is either “yes” or “no” for each individual training data point

[Figure: the 34 training data points listed under “The training data”, repeated so that each one can be checked against the question.]



Place all data at the root, then partition with the question “C POS = N ?”

[Figure: the 34 training data points at a root node labelled “C POS = N ?”, with a “yes” branch and a “no” branch. After partitioning, the 8 data points whose C POS is N sit under “yes” and the remaining 26 under “no”.]


Measure goodness of split for the question “C POS = N ?”

Entropy at the Y node
  count  class     p     p log2 p
  4      BREAK     0.50  -0.50
  4      NO BREAK  0.50  -0.50
  # data points 8, entropy 1.00 bits

Entropy at the N node
  count  class     p     p log2 p
  8      BREAK     0.31  -0.52
  18     NO BREAK  0.69  -0.37
  # data points 26, entropy 0.89 bits

# data points in total 34
Total entropy 0.92 bits
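The figures on this slide can be reproduced from the class counts alone. The sketch below assumes that the total entropy is the average of the two child entropies weighted by the number of data points in each child, which is consistent with the 0.92 bits shown; the function names are illustrative.

```python
from math import log2


def entropy(counts):
    """Entropy in bits of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def split_entropy(yes_counts, no_counts):
    """Entropy after a split: child entropies weighted by child size."""
    n_yes, n_no = sum(yes_counts), sum(no_counts)
    return (n_yes * entropy(yes_counts) + n_no * entropy(no_counts)) / (n_yes + n_no)


yes_node = [4, 4]    # BREAK, NO-BREAK counts where C POS = N
no_node = [8, 18]    # BREAK, NO-BREAK counts elsewhere

print(round(entropy(yes_node), 2))                  # 1.0
print(round(entropy(no_node), 2))                   # 0.89
print(round(split_entropy(yes_node, no_node), 2))   # 0.92
```

A lower total entropy means the predictee is more predictable after the split, so lower is better when comparing questions.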


Try another question: “R POS = CC ?”

Place all data at the root, then partition with the question “R POS = CC ?”

[Figure: the 34 training data points at a root node labelled “R POS = CC ?”, with a “yes” branch and a “no” branch. After partitioning, the 4 data points whose R POS is CC sit under “yes” and the remaining 30 under “no”.]


Try all possible questions, measuring goodness of split (as entropy, in bits)

L POS = PUNC          C POS = PUNC          R POS = PUNC  0.47
L POS = N             C POS = N     0.92    R POS = N
L POS = V             C POS = V             R POS = V
L POS = TO            C POS = TO            R POS = TO
L POS = CC            C POS = CC            R POS = CC    0.74
L POS = JJ            C POS = JJ            R POS = JJ
L POS = RB            C POS = RB            R POS = RB
L POS = CD            C POS = CD            R POS = CD
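The remaining cells of this table can be filled in by scoring every question against the 34 training data points. A minimal Python sketch follows, rebuilding the data from the six training sentences; the names `SENTENCES`, `data_points` and `split_entropy` are illustrative, and the size-weighted average of child entropies is assumed as the goodness measure.

```python
from math import log2

# POS tags and the token positions followed by a BREAK, copied from the slides
SENTENCES = [
    ("N V TO V N PUNC", {4}),
    ("N CC N V JJ PUNC", {0, 4}),
    ("N CC RB N PUNC", {0, 3}),
    ("N V CC N V PUNC", {1, 4}),
    ("CD PUNC CD PUNC CD PUNC", {0, 2, 4}),
    ("JJ CC RB JJ PUNC", {0, 3}),
]
TAGS = ["PUNC", "N", "V", "TO", "CC", "JJ", "RB", "CD"]


def data_points():
    """One ({L, C, R} predictors, predictee) pair per token."""
    points = []
    for tags, breaks in SENTENCES:
        padded = ["-"] + tags.split() + ["-"]
        for i in range(len(padded) - 2):
            feats = {"L": padded[i], "C": padded[i + 1], "R": padded[i + 2]}
            points.append((feats, "BREAK" if i in breaks else "NO-BREAK"))
    return points


def entropy(labels):
    n = len(labels)
    probs = [labels.count(value) / n for value in set(labels)]
    return -sum(p * log2(p) for p in probs)


def split_entropy(points, position, tag):
    """Weighted entropy after asking '<position> POS = <tag> ?'."""
    yes = [label for feats, label in points if feats[position] == tag]
    no = [label for feats, label in points if feats[position] != tag]
    if not yes or not no:
        return None  # the question does not split the data
    return (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(points)


points = data_points()
scores = {}
for position in "LCR":
    for tag in TAGS:
        score = split_entropy(points, position, tag)
        if score is not None:
            scores[(position, tag)] = score

for question in [("C", "N"), ("R", "CC"), ("R", "PUNC")]:
    print(question, round(scores[question], 2))   # 0.92, 0.74, 0.47
print("best question:", min(scores, key=scores.get))
```

The three values printed match the ones already filled in on the slide, and the question with the lowest entropy is the one inserted into the tree next.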


Insert best question into tree and permanently partition data

[Figure: a root node labelled “R POS = PUNC ?” with a “yes” branch and a “no” branch. The 8 data points whose R POS is PUNC sit under “yes” and the remaining 26 under “no”.]

Recurse

[Figure: the same partitioned tree, with the question-selection procedure applied again at each child node.]
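The whole procedure (try every question, keep the best, partition, recurse) fits in a short recursive function. This is a sketch under stated assumptions, not the course's own implementation: it stops only when a node is pure or cannot be split, it treats the padding symbol "-" as one more tag value, and the names `grow`, `SENTENCES` and `data_points` are illustrative.

```python
from math import log2

# POS tags and the token positions followed by a BREAK, copied from the slides
SENTENCES = [
    ("N V TO V N PUNC", {4}),
    ("N CC N V JJ PUNC", {0, 4}),
    ("N CC RB N PUNC", {0, 3}),
    ("N V CC N V PUNC", {1, 4}),
    ("CD PUNC CD PUNC CD PUNC", {0, 2, 4}),
    ("JJ CC RB JJ PUNC", {0, 3}),
]


def data_points():
    points = []
    for tags, breaks in SENTENCES:
        padded = ["-"] + tags.split() + ["-"]
        for i in range(len(padded) - 2):
            feats = {"L": padded[i], "C": padded[i + 1], "R": padded[i + 2]}
            points.append((feats, "BREAK" if i in breaks else "NO-BREAK"))
    return points


def entropy(points):
    labels = [label for _, label in points]
    probs = [labels.count(value) / len(labels) for value in set(labels)]
    return -sum(p * log2(p) for p in probs)


def grow(points):
    """Return a leaf label, or a dict holding a question and two subtrees."""
    labels = [label for _, label in points]
    if len(set(labels)) == 1:
        return labels[0]                      # pure node: job done
    best = None
    for position in "LCR":
        for tag in {feats[position] for feats, _ in points}:
            yes = [p for p in points if p[0][position] == tag]
            no = [p for p in points if p[0][position] != tag]
            if not yes or not no:
                continue                      # question does not split the data
            score = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(points)
            if best is None or score < best[0]:
                best = (score, position, tag, yes, no)
    if best is None:
        return max(set(labels), key=labels.count)   # unsplittable: majority label
    _, position, tag, yes, no = best
    return {"question": (position, tag), "yes": grow(yes), "no": grow(no)}


tree = grow(data_points())
print(tree)
```

On this training set the function asks “R POS = PUNC ?” at the root and “R POS = CC ?” on the “no” side, after which every leaf is pure.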


My tree

R POS = PUNC ?
  yes: BREAK
  no:  R POS = CC ?
         yes: BREAK
         no:  NO-BREAK

Decision tree


Test data - use the tree to predict where the phrase breaks should be placed

Words                He        likes     to        sail      ships     .
POS                  N         V         TO        V         N         PUNC
Predictor 1 : L POS
Predictor 2 : C POS
Predictor 3 : R POS
Predictee

Words                Horses    and       cows      are       animals   .
POS                  N         CC        N         V         N         PUNC
Predictor 1 : L POS
Predictor 2 : C POS
Predictor 3 : R POS
Predictee

Words                One       or        two       more      things    .
POS                  CD        CC        CD        JJ        N         PUNC
Predictor 1 : L POS
Predictor 2 : C POS
Predictor 3 : R POS
Predictee
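Once the tree is fixed, prediction is just a walk from the root to a leaf for each token. A minimal Python sketch of the two-question tree from the “My tree” slide (“R POS = PUNC ?” then “R POS = CC ?”), written as nested tests; the function names are illustrative. Since both questions look only at R POS, the other two predictors are not needed at test time for this particular tree.

```python
def predict_break(right_pos):
    """Walk the two-question tree for one token, given its R POS."""
    if right_pos == "PUNC":      # R POS = PUNC ?  yes
        return "BREAK"
    if right_pos == "CC":        # R POS = CC ?    yes
        return "BREAK"
    return "NO-BREAK"


def label_sentence(pos_tags, pad="-"):
    """Predict BREAK / NO-BREAK after every token of one sentence."""
    right = list(pos_tags[1:]) + [pad]
    return [predict_break(r) for r in right]


test_sentences = {
    "He likes to sail ships .": "N V TO V N PUNC",
    "Horses and cows are animals .": "N CC N V N PUNC",
    "One or two more things .": "CD CC CD JJ N PUNC",
}

for words, tags in test_sentences.items():
    predictions = label_sentence(tags.split())
    for word, label in zip(words.split(), predictions):
        print(f"{word:10s}{label}")
    print()
```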


Speech synthesis - pronunciation & prosody

• Machine Learning
  • Classification And Regression Trees (CARTs)
    • classification: understanding entropy as a measure of predictability
    • regression: measuring the predictability of a continuous variable
    • stopping criteria


All data points:

13.4  3.14  2.73  11.3  1.23  4.52  9.42  2.11  10.1  1.87      σ² = 18.7

After the split:

3.14  2.73  1.23  4.52  2.11  1.87      σ² = 1.11

13.4  11.3  9.42  10.1                  σ² = 2.29
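For a continuous predictee, variance plays the role that entropy plays in classification. The sketch below recomputes the three variances on this slide using the population variance (dividing by N), which is what reproduces the slide's figures. Weighting the child variances by child size to score the split is an assumption made by analogy with the entropy case, not something the slide states.

```python
from statistics import pvariance

parent = [13.4, 3.14, 2.73, 11.3, 1.23, 4.52, 9.42, 2.11, 10.1, 1.87]
low = [3.14, 2.73, 1.23, 4.52, 2.11, 1.87]
high = [13.4, 11.3, 9.42, 10.1]

print(round(pvariance(parent), 1))   # 18.7
print(round(pvariance(low), 2))      # 1.11
print(round(pvariance(high), 2))     # 2.29

# Assumed goodness-of-split measure: child variances weighted by child size
weighted = (len(low) * pvariance(low) + len(high) * pvariance(high)) / len(parent)
print(round(weighted, 2))            # 1.58
```

The drop from 18.7 to about 1.58 shows that, after this split, the value of the predictee is far more predictable within each child than it was at the parent.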


Speech synthesis - pronunciation & prosody

• Machine Learning
  • Classification And Regression Trees (CARTs)
    • classification: understanding entropy as a measure of predictability
    • regression: measuring the predictability of a continuous variable
    • stopping criteria


Stopping criteria (we may use several)

• Classification or Regression
  • all data points have the same value for the predictee (job done!)
  • all data points have the same values for all predictors
    • equivalently: no available question can split them
  • number of data points in parent node is below a threshold
  • number of data points in a child node would fall below a threshold
• Classification only
  • cannot reduce entropy by more than some pre-specified amount
• Regression only
  • cannot reduce variance by more than some pre-specified amount
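These criteria translate into two small checks: one made before searching for a question at a node, and one made on a candidate split. The Python sketch below is illustrative only. The function names and the threshold values (`min_parent`, `min_child`, `min_gain`) are hypothetical defaults, not values given in the slides, and "impurity" stands for entropy in classification or variance in regression.

```python
def stop_reason(points, min_parent=3):
    """Return why a node should become a leaf, or None to keep splitting.
    `points` is a list of (predictor_tuple, predictee) pairs."""
    predictees = {y for _, y in points}
    predictors = {x for x, _ in points}
    if len(predictees) == 1:
        return "all data points share one predictee value"
    if len(predictors) == 1:
        return "no available question can split the data"
    if len(points) < min_parent:
        return "too few data points in the parent node"
    return None


def split_is_worthwhile(n_yes, n_no, impurity_before, impurity_after,
                        min_child=2, min_gain=0.01):
    """Accept a candidate split only if both children are large enough
    and the impurity falls by more than the pre-specified amount."""
    if min(n_yes, n_no) < min_child:
        return False
    return (impurity_before - impurity_after) > min_gain


# 0.94 is the entropy of all 34 phrase-break points (12 BREAK, 22 NO-BREAK);
# 0.47 is the entropy after asking "R POS = PUNC ?"
print(split_is_worthwhile(8, 26, 0.94, 0.47))
```

Raising the thresholds gives a smaller tree that fits the training data less closely; lowering them does the opposite.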


Today’s topics - what we covered

Phoneme Pronunciation

Prosody

Decision tree

Learning decision trees


What next?

In Module 5

• We have
  • normalised the text
  • predicted pronunciation
  • predicted prosody
• That completes the linguistic specification
• Next, from that linguistic specification
  • it’s time to generate a waveform