42
Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions An Interval-Based Algorithm for Feature Extraction from Speech Signals SCAN 2016: 17th Intl. Symposium on Scientific Computing, Computer Arithmetics and Verified Numerics Uppsala, Sweden September 26th, 2016 Andreas Rauh 1 , Susann Tiede 2 , Cornelia Klenke 2 1 Chair of Mechatronics, University of Rostock, Germany 2 Speech Therapists, Altentreptow, Germany A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals

An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

  • Upload
    others

  • View
    12

  • Download
    0

Embed Size (px)

Citation preview

Page 1: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

An Interval-Based Algorithm for FeatureExtraction from Speech Signals

SCAN 2016: 17th Intl. Symposium on Scientific Computing,Computer Arithmetics and Verified Numerics

Uppsala, Sweden

September 26th, 2016

Andreas Rauh1, Susann Tiede2, Cornelia Klenke2

1Chair of Mechatronics, University of Rostock, Germany2Speech Therapists, Altentreptow, Germany

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals

Page 2: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Contents

Classification of language disorders and project aims

Characteristic properties of voiced and unvoiced phonemes in speechsignals

Approaches for frequency estimationI Offline short-time Fourier analysisI Observer-/ Filter-based approaches

Phoneme-based segmentation of speech signals

Interval-based feature representation

Preliminary classification resultsI Distinction between voiced and unvoiced sounds (based on thresholds

for the estimated standard deviations)I Distinction between different voiced sounds (vowels, based on the

estimated narrow-band frequencies)

Conclusions and outlook on future work

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 1/27

Page 3: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Development of a Computer-Based Assistance System inSpeech Therapy (1)

Classification of language disorders

Pronunciation

Grammar

Lexicon

SUSE: A Software assistance system for Uncovering speech disordersby Stochastic Estimation techniques

1 Automatic transcription and preprocessing of spoken text involvingerroneous pronunciation

2 Automatic classification of pronunciation disorders

3 Grammatical analysis of freely spoken language

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 2/27

Page 4: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Development of a Computer-Based Assistance System inSpeech Therapy (2)

Classification of language disorders

Pronunciation

Grammar

Lexicon

Drawback of state-of-the-art speech recognition systems

State-of-the-art speech recognition and text processing systemsreplace incorrect items by seemingly correct ones

These substitutions prevent the classification of language disorders

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 3/27

Page 5: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Development of a Computer-Based Assistance System inSpeech Therapy (3)

Classification of language disorders

Pronunciation

Grammar

Lexicon

Practical demands: Speech recognition in a therapeutic context

Requirement to develop a new estimation and classification schemefor speech signals

Tedious and time-consuming work of speech therapists if freelyspoken language is to be analyzed (in contrast to standardized testprocedures) =⇒ Need for repeated listening of recorded speech

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 4/27

Page 6: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Characteristic Properties of Speech Signals (1)

Distinction between voiced and unvoiced phonemes

Phonemes as the basic building block of syllables, from which wordsare formed

Characterized by specific featuresI Value of basis frequency is speaker dependentI Format frequencies (basis frequency + higher frequency values)I Higher formant frequencies are typically not integer multiples of the

basis frequencyI Bandwidth of included frequency ranges varies for voiced/ unvoiced

phonemes

Voiced phonemes: E.g. normal vowels, characterized by several sharpformant frequencies

Unvoiced phonemes: E.g. whispered vowels and fricatives such as ch,ss, sh, f, characterized by wide blurred formant frequency ranges

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 5/27

Page 7: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Characteristic Properties of Speech Signals (2)

Mechanism of sound production

Voiced phonemesI Produced by vibrations of the vocal foldsI Vocal folds represent a fluidic resistance against the outflow of air

expelled from the lungs

Unvoiced phonemesI Turbulent, partially irregular, air flowI Negligible vibrations of the vocal foldsI Fizzing soundsI Originating between teeth and lips as well as between tongue and

hard/ soft palate

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 6/27

Page 8: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Characteristic Properties of Speech Signals (3)

State-of-the-art speech recognition systems

Use of offline frequency analysis1 Cut the sound sequence into short temporal slices of typically

10− 50 ms length2 Perform a short-time Fourier analysis for each of these time slices

(partly with overlapping time windows)3 Determine a measure of similarity with phoneme-dependent frequency

spectra (usually by the application of cross-correlation functions in thefrequency domain)

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 7/27

Page 9: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Fourier Analysis of a Benchmark Dataset (1)

Parameterization of the DFT analysis (sampling frequencyfs = 44.1 kHz)

DFT 1 DFT 2 DFT 3

Length of time window in samples (N) 512 1024 2048

Length of time window in ms 11.6 23.2 46.4

Sample shift between two DFT evaluati-ons

64 64 64

Spectral resolution ∆f in Hz 86.5 43.2 21.6

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 8/27

Page 10: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Fourier Analysis of a Benchmark Dataset (2)

Output for the setting DFT 1

ωin

104rad/s

t in s

141210

2468

∣∣Fk(ω)∣∣ in dB

00 1 2 3 4 5

−400

−200

−100

−300

Output for the setting DFT 2

ωin

104rad/s

t in s

141210

2468

∣∣Fk(ω)∣∣ in dB

00 1 2 3 4 5

−400

−200

−100

−300

Increasing the number of sampling points enhances the detectability offormant frequencies

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 9/27

Page 11: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Fourier Analysis of a Benchmark Dataset (3)

Output for the setting DFT 1

ωin

104rad/s

t in s

141210

2468

∣∣Fk(ω)∣∣ in dB

00 1 2 3 4 5

−400

−200

−100

−300

Output for the setting DFT 3

ωin

104rad/s

t in s

141210

2468

∣∣Fk(ω)∣∣ in dB

00 1 2 3 4 5

−400

−200

−100

−300

Points of time with significant changes in the spectrum representcandidates for transition between subsequent phonemes

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 10/27

Page 12: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Filter- and Observer-Based Online Frequency Analysis (1)

Fundamental signal model

Representation of measured speech signals by a superposition of differentharmonic components

ym(t) ≈ ym,n(t) =

n∑i=1

(αi · cos (ωi · t+ φi))

with the basis frequency ω1 > 0, further harmonic signal componentsω2, . . . , ωn, ωi+1 > ωi, i ∈ N, the signal amplitudes αi, and the phaseshifts φi

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 11/27

Page 13: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Filter- and Observer-Based Online Frequency Analysis (2)

State-space representation of the i-th frequency component

Continuous-time system model

x3i−2(t) = −x3i(t) · αi · sin (x3i(t) · t+ φi) =: x3i−1(t)

x3i−1(t) = −x23i(t) · αi · cos (x3i(t) · t+ φi)

x3i(t) = 0

Compact matrix-vector notation

Continuous-time system model

xi(t) = Ai (x3i(t)) · xi(t) , xi(t) =

x3i−2(t)x3i−1(t)x3i(t)

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 12/27

Page 14: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Filter- and Observer-Based Online Frequency Analysis (2)

State-space representation of the i-th frequency component

Continuous-time system model

x3i−2(t) = −x3i(t) · αi · sin (x3i(t) · t+ φi) =: x3i−1(t)

x3i−1(t) = −x23i(t) · αi · cos (x3i(t) · t+ φi)

x3i(t) = 0

Compact matrix-vector notation

Continuous-time system model

xi(t) = Ai (x3i(t)) · xi(t) , Ai (x3i(t)) =

0 1 0−x23i(t) 0 0

0 0 0

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 12/27

Page 15: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Filter- and Observer-Based Online Frequency Analysis (2)

State-space representation of the i-th frequency component

Continuous-time system model

x3i−2(t) = −x3i(t) · αi · sin (x3i(t) · t+ φi) =: x3i−1(t)

x3i−1(t) = −x23i(t) · αi · cos (x3i(t) · t+ φi)

x3i(t) = 0

Compact matrix-vector notation

Continuous-time system model

xi(t) = Ai (x3i(t)) · xi(t) , yi(t) = cTi · xi(t) with cTi =[1 0 0

]

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 12/27

Page 16: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Filter- and Observer-Based Online Frequency Analysis (2)

State-space representation of the i-th frequency component

Continuous-time system model

x3i−2(t) = −x3i(t) · αi · sin (x3i(t) · t+ φi) =: x3i−1(t)

x3i−1(t) = −x23i(t) · αi · cos (x3i(t) · t+ φi)

x3i(t) = 0

Concatenation of all models i = 1, . . . , n

Continuous-time system model

x(t) = A (x(t)) · x(t) with x(t) ∈ R3n

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 12/27

Page 17: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Filter- and Observer-Based Online Frequency Analysis (2)

State-space representation of the i-th frequency component

Continuous-time system model

x3i−2(t) = −x3i(t) · αi · sin (x3i(t) · t+ φi) =: x3i−1(t)

x3i−1(t) = −x23i(t) · αi · cos (x3i(t) · t+ φi)

x3i(t) = 0

Concatenation of all models i = 1, . . . , n

Continuous-time system model

A (x(t)) =

A1 (x3(t)) 0 . . . 0

0 A2 (x6(t)) . . . 0...

.... . .

...0 0 . . . An (x3n(t))

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 12/27

Page 18: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Filter- and Observer-Based Online Frequency Analysis (2)

State-space representation of the i-th frequency component

Continuous-time system model

x3i−2(t) = −x3i(t) · αi · sin (x3i(t) · t+ φi) =: x3i−1(t)

x3i−1(t) = −x23i(t) · αi · cos (x3i(t) · t+ φi)

x3i(t) = 0

Concatenation of all models i = 1, . . . , n

Continuous-time system model

ym,n(t) = cT · x(t) with cT =[cT1 cT2 . . . cTn

]

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 12/27

Page 19: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Filter- and Observer-Based Online Frequency Analysis (2)

State-space representation of the i-th frequency component

Continuous-time system model

x3i−2(t) = −x3i(t) · αi · sin (x3i(t) · t+ φi) =: x3i−1(t)

x3i−1(t) = −x23i(t) · αi · cos (x3i(t) · t+ φi)

x3i(t) = 0

Discrete-time notation for an Extended Kalman Filter design

xk+1=exp (Ts ·A (xk))·xk+wk =: Adk·xk+wk, fw,k (wk)=N (µw,k,Cw,k)

yk = ym,n,k := ym,n(tk) = cT · xk + vk , fv,k (vk) = N (µv,k,Cv,k)

Sampling frequency is sufficiently large so that time discretization errorsare negligibly small

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 12/27

Page 20: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Filter- and Observer-Based Online Frequency Analysis (3)

Estimate for ω1

t in s

ωin

104rad/s

0 1 2 3 4 5

10

1

0.1

Estimate for ω2

t in sωin

104rad/s

0 1 2 3 4 5

10

1

0.1

Estimation results for n = 2: expected value µex,k,3i (solid lines) and upper

frequency bound µex,k,3i + 3

(√Cex,k,(3i,3i)

); n = 3 would be required to

estimate the next formant frequency ω3

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 13/27

Page 21: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Filter- and Observer-Based Online Frequency Analysis (3)

Estimate for ω1

t in s

ωin

104rad/s

0 1 2 3 4 5

10

1

0.1

Estimate for ω2

t in sωin

104rad/s

0 1 2 3 4 5

10

1

0.1

A. Rauh, S. Tiede, and C. Klenke. Observer and Filter Approaches for the Frequency

Analysis of Speech Signals. and Stochastic Filter Approaches for a Phoneme-Based

Segmentation of Speech Signals. In Proc. of 21st IEEE Intl. Conference on Methods and

Models in Automation and Robotics MMAR 2016, Miedzyzdroje, Poland, 2016.

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 13/27

Page 22: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Phoneme-Based Segmentation of the Speech Signal (1)

Normalization of the Extended Kalman Filter outputs (L samples)

Normalized expectation of the i-th formant frequency

µex,k,3i =µex,k,3i

1

L

L∑k=1

µex,k,3i

Normalized (co-)variance of the i-th formant frequency

Cex,k,(3i,3i) =

Cex,k,(3i,3i)

1

L

L∑k=1

Cex,k,(3i,3i)

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 14/27

Page 23: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Phoneme-Based Segmentation of the Speech Signal (2)

Variation rate classification concerning the expectations of theestimated formant frequencies

Absolute variation rate

∆µex,k,3i =∣∣µex,k+1,3i − µex,k,3i

∣∣Candidates for phoneme boundaries are characterized by

∆µex,k,3i > ∆µ

Note

Temporal distance between two subsequent boundaries needs to be largerthan ∆Tmin

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 15/27

Page 24: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Phoneme-Based Segmentation of the Speech Signal (3)

Variation rate classification concerning the (co-)variances of theestimated formant frequencies

Absolute variation rate

∆Cex,k,(3i,3i) =

∣∣∣Cex,k+1,(3i,3i) − Ce

x,k,(3i,3i)

∣∣∣Candidates for phoneme boundaries are characterized by

∆Cex,k,(3i,3i) > ∆C

Note

Temporal distance between two subsequent boundaries needs to be largerthan ∆Tmin

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 16/27

Page 25: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Phoneme-Based Segmentation of the Speech Signal (3)

Variation rate classification concerning the (co-)variances of theestimated formant frequencies

Absolute variation rate

∆Cex,k,(3i,3i) =

∣∣∣Cex,k+1,(3i,3i) − Ce

x,k,(3i,3i)

∣∣∣Candidates for phoneme boundaries are characterized by

∆Cex,k,(3i,3i) > ∆C

Note: Combinations of both classifiers are possibleA. Rauh, S. Tiede, and C. Klenke. Stochastic Filter Approaches for a Phoneme-Based

Segmentation of Speech Signals. In Proc. of 21st IEEE Intl. Conference on Methods and

Models in Automation and Robotics MMAR 2016, Miedzyzdroje, Poland, 2016.

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 16/27

Page 26: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Phoneme-Based Segmentation of the Speech Signal (4)

ωin

104rad/s

t in s

141210

2468

00.750.40

Merging of both criteria with a minimum temporal distance∆Tmin = 20 ms

Comparison with a manually performed segmentation (red)

Note

Accurate detection of phoneme boundaries

Detection of characteristic variations within a single phonemeA. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 17/27

Page 27: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Alternative Algorithm for Phoneme Boundary Detection

Branch-and-bound procedure

Perform stochastic frequency estimation by means of the ExtendedKalman Filter as before

Determine CIH of estimated mean values and standard deviations forthe analyzed speech signal

Bisect the interval ifI Length is larger than ∆T ′

min < ∆Tmin

I Maximum variation rate exceeds threshold (K: time indices forconsidered temporal slice)

maxk∈K{µe

x,k,3i} −mink∈K{µe

x,k,3i} > ∆µ

I Analogously for the estimated (co-)variancesI Subsequent interval merging if the inequality above is not fulfilled in

the union of two neighboring temporal slices

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 18/27

Page 28: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Interval-Based Phoneme Database

Relevant Features

Convex interval hull (CIH) over the expected values of each formantfrequency for the complete phoneme duration

CIH over the standard deviations of each formant frequency

CIH over the amplitudes associated with each formant frequency

Averaged intervals with respect to their corresponding duration (incase of multiple temporal subintervals per phoneme)

Speaker-dependent normalization with respect to the mean of ω1

Database of reference values

Averaging of selected phoneme features from a reference data set(recording of TV news broadcast) for each of the phonemes

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 19/27

Page 29: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Threshold Distinction between Voiced and UnvoicedSounds (1)

Unvoiced phonemes: Characterized by large estimated standarddeviations (esp. for i ≥ 2)

Interval for normalized standard deviation of the second formant [σe2]

Duration of a phoneme τ

Criterion 1:sup{[σe2]} − σ∗

σ∗ − inf{[σe2]}> 0.5

Criterion 2:(inf{[σe2]} > σ∗) & (τ < 65 ms)

Either Criterion 1 or Criterion 2 has to be fulfilled

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 20/27

Page 30: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Threshold Distinction between Voiced and UnvoicedSounds (2)

Voiced phonemes: Characterized by small estimated standarddeviations (esp. for i ≥ 2)

Interval for normalized standard deviation of the second formant [σe2]

Intervals for the estimated formant frequencies [ωe1], [ωe

2]

Intervals for the estimated formant amplitudes [αe1], [αe

2]

Duration of a phoneme τ

Criterion:

(sup{[σe2]} < σ∗) & (inf{[ωe2]} > sup{[ωe

1]}) . . .

& (τ ≥ 50 ms) &

(sup{[αe

1]}sup{[αe

2]}> 0.8

)

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 21/27

Page 31: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Threshold Distinction between Voiced and UnvoicedSounds (3)

Classification results for a test dataset: Phonemes classified as . . .

Unvoiced: ’S’ ’n’ ’n’ ’s’ ’n’ ’ch’ ’t’

Voiced: ’i’ ’i’ ’e’ ’n’

Undecided: ’b’ ’e’ ’g’ ’e’ ’i’ ’ch’ ’ei’ ’z’ ’u’ ’r’ ’i’

Note

Misclassified ’n’ is hard to detect, because it follows a short ’i’

Vowels in the class undecided are reliably detected by the followingphoneme classifier

Advantage robust pre-classification reduces the subsequent effort

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 22/27

Page 32: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Classification of Phonemes: Algorithm 1

Similarity measure

Short hand notation for an n-dimensional interval box:∏(diam{[z]}) = (z1 − z1) · . . . · (zn − zn)

Current feature box [z]

Reference feature box [z]refConvex interval hull denoted by ∪Similarity measure∏

(diam{[z]}) +∏

(diam{[z]ref})∏(diam{[z] ∪ [z]ref})

Best match for the corresponding phoneme is the maximizer of thesimilarity measure above

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 23/27

Page 33: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Classification of Phonemes: Algorithm 2

Use of interval likelihood representations

Definition of a point-valued normal distribution for the phonemefeature representation in time step k

fx,k (xk) =1√

(2π)3n |Cx,k|exp

{−1

2(xk − µx,k)T C−1x,k (xk − µx,k)

}

Definition of the interval residuum value [rx,K] := [µx,K]− [µx,K]ref(all values defined as CIO over the index set K of the phonemeduration)

Corresponding interval definition of the error (co-)variance[Sx,K] := [Cx,K] + [Cx,K]ref

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 24/27

Page 34: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Classification of Phonemes: Algorithm 2

Use of interval likelihood representations

Definition of an interval-based likelihood representation for the indexset K

fr,K ∈1√

(2π)3n |[Sx,K]|exp

{−1

2[rx,K]T [Sx,K]−1 [rx,K]

}

Definition of the interval residuum value [rx,K] := [µx,K]− [µx,K]ref(all values defined as CIO over the index set K of the phonemeduration)

Corresponding interval definition of the error (co-)variance[Sx,K] := [Cx,K] + [Cx,K]refDetect maximum supremum of all fr,K for the list of all databaseentries

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 24/27

Page 35: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Classification of Phonemes: Algorithm 2

Use of interval likelihood representations

Definition of an interval-based likelihood representation for the indexset K

fr,K ∈1√

(2π)3n |[Sx,K]|exp

{−1

2[rx,K]T [Sx,K]−1 [rx,K]

}

Definition of the interval residuum value [rx,K] := [µx,K]− [µx,K]ref(all values defined as CIO over the index set K of the phonemeduration)

Corresponding interval definition of the error (co-)variance[Sx,K] := [Cx,K] + [Cx,K]refProjection onto a subspace formed by individual components of µx,Kis possible

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 24/27

Page 36: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Classification of Phonemes: Algorithm 2

Use of interval likelihood representations

Definition of an interval-based likelihood representation for the indexset K

fr,K ∈1√

(2π)3n |[Sx,K]|exp

{−1

2[rx,K]T [Sx,K]−1 [rx,K]

}

Definition of the interval residuum value [rx,K] := [µx,K]− [µx,K]ref(all values defined as CIO over the index set K of the phonemeduration)

Corresponding interval definition of the error (co-)variance[Sx,K] := [Cx,K] + [Cx,K]refRecursive form as a generalization of a multi-hypothesis KalmanFilter approach

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 24/27

Page 37: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Classification of Phonemes: Algorithm 2

Use of interval likelihood representations

Definition of an interval-based likelihood representation for the indexset K

fr,K ∈1√

(2π)3n |[Sx,K]|exp

{−1

2[rx,K]T [Sx,K]−1 [rx,K]

}

Definition of the interval residuum value [rx,K] := [µx,K]− [µx,K]ref(all values defined as CIO over the index set K of the phonemeduration)

Corresponding interval definition of the error (co-)variance[Sx,K] := [Cx,K] + [Cx,K]refFallback solution in case of a non-invertible interval matrix [Sx,K]:Switching to Algorithm 1

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 24/27

Page 38: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Classification of Phonemes: Ongoing Work

Different definitions of the feature vectors in Algorithm 1

Estimated CIH for estimated formant frequencies (expectation values)

Feature box extended by (co-)variance information

Extension by (co-)variance information and amplitudes

Different definitions of the feature vectors in Algorithm 2

Complete estimated state vector of the EKF definition

Consideration of estimated frequencies (entries 3i, i = 1, . . . , n)

Possible source for misclassifications

Representation of a phoneme by a single feature box disregardscharacteristic frequency variations especially in bilabial stops (’b’,’p’)which are followed by a voiced sound =⇒ Use of temporal subintervalrepresentations of phonemesA. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 25/27

Page 39: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Classification of Vowels: Temporal Subintervals for theIndividual Phonemes

Use of intermediate transition points within each individual phoneme

Definition of the similarity measures as before, however, application toeach subinterval

Proportional scaling of the length of the reference templates to thelength ot the currently measured phoneme

Duration as a weighting factor of the individual similarity measures(weighted arithmetic mean)

Available dataset

Besides the first dataset (TV news broadcast), selected words (40)covering the relevant phonemes of the German language, spoken byapprox. 20 different persons, are investigated

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 26/27

Page 40: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Conclusions and Outlook on Future Work

Stochastic filtering approach for the online frequency estimation ofspeech signals

Stochastic filter as the basis for the phoneme-based segmentation ofspeech signals

Fundamental interval representation of phoneme features

Basic classifier distinguishing between unvoiced and voiced sounds

Extension of the classification procedureI Further interval-based (pre-)classification stagesI Stochastic classification by multi-hypothesis Kalman Filters (recursive

implementation)

Validation on further datasets without/ with pronunciation disorders

Markov model for transitions between phonemes (time-dependentprobabilities for the transition)

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 27/27

Page 41: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Conclusions and Outlook on Future Work

Stochastic filtering approach for the online frequency estimation ofspeech signals

Stochastic filter as the basis for the phoneme-based segmentation ofspeech signals

Fundamental interval representation of phoneme features

Basic classifier distinguishing between unvoiced and voiced sounds

Extension of the classification procedureI Further interval-based (pre-)classification stagesI Stochastic classification by multi-hypothesis Kalman Filters (recursive

implementation)

Validation on further datasets without/ with pronunciation disorders

Markov model for transitions between phonemes (time-dependentprobabilities for the transition)

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals 27/27

Page 42: An Interval-Based Algorithm for Feature Extraction from ...€¦ · spoken language is to be analyzed (in contrast to standardized test procedures) =)Need for repeated listening of

Project Overview Fundamentals Frequency Estimation Phoneme Segmentation Phoneme Database Conclusions

Thank you for your attention!

Merci beaucoup pour votre attention!

Спасибо за Ваше внимание!

Dziękuję bardzo za uwagę!

¡Muchas gracias por su atención!

Grazie mille per la vostra attenzione!

Vielen Dank für Ihre Aufmerksamkeit!

A. Rauh et al.: An Interval-Based Algorithm for Feature Extraction from Speech Signals