Duration modeling for speech recognition

Presented for BBN

Dr. Andrey Nikiforov

Department of Applied Mathematics and Statistics

State University of New York at Stony Brook

Additional topics

Computational and modeling issues in improving the performance of speech recognition algorithms

• Partial classification techniques
• Tree-dependence covariance models in HMMs
• Fast search and computations for codebooks
• Interpolation in the acoustic space

State duration in HMM
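
A conventional HMM does not model the time spent in a state explicitly. A state with self-loop probability a is occupied for exactly d frames with probability (1 - a) a^(d-1). This is a geometric distribution, and its mode is always d = 1. The minimal Python sketch below assumes a single state with a fixed self-loop; the function names are illustrative and do not come from the slides. It tabulates this implicit distribution and its mean 1/(1 - a).

```python
def geometric_duration_pmf(a: float, max_d: int) -> list[float]:
    """P(D = d) for d = 1..max_d when a state has self-loop probability a."""
    if not 0.0 <= a < 1.0:
        raise ValueError("self-loop probability must be in [0, 1)")
    # Stay d - 1 times (probability a each), then leave (probability 1 - a).
    return [(1.0 - a) * a ** (d - 1) for d in range(1, max_d + 1)]


def geometric_mean_duration(a: float) -> float:
    """Expected number of frames spent in the state: 1 / (1 - a)."""
    return 1.0 / (1.0 - a)


if __name__ == "__main__":
    a = 0.8  # hypothetical self-loop probability
    print([round(p, 3) for p in geometric_duration_pmf(a, 5)])
    print(geometric_mean_duration(a))
```

The probabilities decrease monotonically from d = 1, so no choice of a can place the peak at a typical state duration. This is the usual motivation for explicit duration modeling.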

Duration distributions

[Figure: Duration probability density functions versus time for the exponential, Rayleigh, Weibull, and normal distributions.]
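
The four families named on this slide can be written down directly. The sketch below uses common textbook parameterizations. The parameter names (rate, sigma, shape, scale, mean, std) belong to this example, because the slide does not give its parameters. The Weibull family contains two of the others as special cases: the exponential (shape 1) and the Rayleigh (shape 2, scale sigma * sqrt(2)). The normal density also assigns probability to negative durations, so in practice it needs truncation.

```python
import math


def exponential_pdf(t: float, rate: float) -> float:
    """Exponential density rate * exp(-rate * t) for t >= 0."""
    return rate * math.exp(-rate * t) if t >= 0.0 else 0.0


def rayleigh_pdf(t: float, sigma: float) -> float:
    """Rayleigh density t / sigma^2 * exp(-t^2 / (2 sigma^2)) for t >= 0."""
    if t < 0.0:
        return 0.0
    return t / sigma**2 * math.exp(-t * t / (2.0 * sigma**2))


def weibull_pdf(t: float, shape: float, scale: float) -> float:
    """Weibull density (k/lam) (t/lam)^(k-1) exp(-(t/lam)^k) for t >= 0."""
    if t < 0.0:
        return 0.0
    if t == 0.0 and shape < 1.0:
        return math.inf  # the density diverges at the origin when shape < 1
    z = t / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z**shape))


def normal_pdf(t: float, mean: float, std: float) -> float:
    """Gaussian density; note it is nonzero for negative t as well."""
    z = (t - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))
```

For example, weibull_pdf(t, 1.0, 1.0 / rate) equals exponential_pdf(t, rate) for every t >= 0.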

From …

… to

Progressive model

Time calculation

[Figure: states A and B at frames t and t+1.]

Time calculation (continued)

[Figure: states A and B at frames t and t+1.]
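
The diagram on these slides, which shows states A and B at frames t and t+1, is not recoverable. The sketch below assumes one common way to track time in a state; this rule is an illustration and is not taken from the slides. A duration counter grows by one when the path stays in the same state from frame t to t+1. It restarts at 1 when the path moves to a new state, for example from A to B. The sketch applies this rule to a frame-level state alignment whose states carry hypothetical string labels. It also collapses the alignment into (state, duration) segments, the kind of data used to estimate duration distributions.

```python
from itertools import groupby


def segment_durations(alignment: list[str]) -> list[tuple[str, int]]:
    """Collapse a frame-by-frame state alignment into (state, duration) runs."""
    return [(state, sum(1 for _ in run)) for state, run in groupby(alignment)]


def running_duration(alignment: list[str]) -> list[int]:
    """Frames spent in the current state up to and including each frame."""
    counts: list[int] = []
    for t, state in enumerate(alignment):
        if t > 0 and state == alignment[t - 1]:
            counts.append(counts[-1] + 1)  # stayed from t-1 to t: one more frame
        else:
            counts.append(1)  # entered a new state at frame t
    return counts
```

For the alignment A, A, A, B, B, A the segments are (A, 3), (B, 2), (A, 1).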

Probability calculations: from …

… to
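
The content of these two slides is not recoverable. Suppose, as their titles suggest, they move from the conventional HMM to a model with explicit state durations. The standard contrast is then in the probability of a segment in which state s emits observations o_{t+1}, ..., o_{t+d} and then leaves. The notation here is assumed: a_ss is the self-loop probability, b_s the emission density, and p_s(d) an explicit duration distribution.

```latex
P_{\mathrm{HMM}}(o_{t+1:t+d},\, D = d \mid s) = a_{ss}^{\,d-1}\,(1 - a_{ss}) \prod_{\tau = t+1}^{t+d} b_s(o_\tau)

P_{\mathrm{dur}}(o_{t+1:t+d},\, D = d \mid s) = p_s(d) \prod_{\tau = t+1}^{t+d} b_s(o_\tau)
```

The first expression is the special case of the second in which p_s(d) is geometric.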

Hazard function
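
For a discrete duration D measured in frames, the hazard function is h(d) = P(D = d | D >= d). It is the probability of leaving the state after exactly d frames, given that the state was not left earlier. The duration distribution and the hazard determine each other through P(D = d) = h(d) * prod_{k<d} (1 - h(k)). The sketch below converts in both directions, indexing the lists from d = 1; the function names are illustrative. Applied to the geometric distribution of a conventional HMM state, it returns the constant hazard 1 - a.

```python
def hazard_from_pmf(pmf: list[float]) -> list[float]:
    """h(d) = P(D = d) / P(D >= d); list index 0 corresponds to d = 1."""
    # survival[i] = P(D >= i + 1), accumulated from the tail for accuracy.
    survival = [0.0] * len(pmf)
    running = 0.0
    for i in range(len(pmf) - 1, -1, -1):
        running += pmf[i]
        survival[i] = running
    return [p / s if s > 0.0 else 1.0 for p, s in zip(pmf, survival)]


def pmf_from_hazard(hazard: list[float]) -> list[float]:
    """P(D = d) = h(d) * prod_{k < d} (1 - h(k))."""
    pmf = []
    still_in_state = 1.0  # probability the state has not been left before d
    for h in hazard:
        pmf.append(still_in_state * h)
        still_in_state *= 1.0 - h
    return pmf
```

If the pmf is truncated at a maximum duration, the last hazard value is 1, which forces an exit at that duration.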

Hazard function estimation

“Nonparametric estimate”
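
Suppose observed durations are available for a state, such as segment lengths from a forced alignment. A simple nonparametric hazard estimate divides n_d by r_d. Here n_d is the number of segments that end after exactly d frames, and r_d is the number still in the state at frame d, that is, the number of segments with duration >= d. The sketch below implements this life-table style estimate. Treating durations with no remaining data as a forced exit (hazard 1) is this example's own choice. The estimator is not necessarily the one used on the slide.

```python
from collections import Counter


def estimate_hazard(durations: list[int], max_d: int) -> list[float]:
    """Nonparametric hazard estimate h(d) = n_d / r_d for d = 1..max_d.

    n_d: number of observed segments lasting exactly d frames.
    r_d: number of observed segments lasting at least d frames.
    """
    if any(d < 1 for d in durations):
        raise ValueError("durations must be at least one frame")
    counts = Counter(durations)
    at_risk = len(durations)  # r_1: every segment lasts at least one frame
    hazard = []
    for d in range(1, max_d + 1):
        n_d = counts.get(d, 0)
        # With no segments left, assume a forced exit (example's convention).
        hazard.append(n_d / at_risk if at_risk > 0 else 1.0)
        at_risk -= n_d  # segments ending at d are no longer at risk at d + 1
    return hazard
```

When few observations exist at long durations the raw ratios are noisy, so some smoothing is usually applied before the estimate is used.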

“Trajectories”

State duration correction

(Fant et al., 1991)

Word duration

[Figure: Word duration distribution, histogram of counts versus word length (frames).]

State duration correction

[Figure: State duration distributions (count), four panels: C4, C5, C4, C5.]

State duration correction (continued)

[Figure: State duration distributions (count), four panels: C6, C7, C6, C7.]

Conclusions

• Representing the duration distribution through its hazard function is simple, effective, and convenient to program

• Speech recognition errors dropped by 20-25% across different tasks

• The raw time spent in Viterbi search or in the full probability calculation increased by 20% on average compared with the conventional HMM; this overhead was almost completely compensated by the reduction in computation that comes from more adequate modeling
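
A minimal sketch can show how a hazard function plugs into Viterbi decoding, which is one way to read the first conclusion. The sketch assumes a left-to-right model with per-state hazards h_s(d). The data layout, the function name duration_viterbi, and the rule that durations beyond the tabulated range reuse the last hazard value are all assumptions of this example. Each hypothesis carries its current duration d. It stays in state s with probability 1 - h_s(d) and moves to the next state with probability h_s(d), which replaces the constant self-loop of the conventional HMM. This is not the author's implementation, and it does not reproduce the timing figures above.

```python
import math

NEG_INF = float("-inf")


def _log(x: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(x) if x > 0.0 else NEG_INF


def duration_viterbi(log_b: list[list[float]], hazards: list[list[float]]):
    """Viterbi search over (state, duration) pairs in a left-to-right model.

    log_b[t][s] : log emission likelihood of frame t in state s.
    hazards[s]  : [h_s(1), h_s(2), ...]; longer durations reuse the last value.
    Returns (best log score ending in the last state, state index per frame).
    """
    T, S = len(log_b), len(hazards)

    def h(s: int, d: int) -> float:
        hs = hazards[s]
        return hs[min(d, len(hs)) - 1]

    # score[s][d]: best log score at the current frame in state s, duration d.
    score = [[NEG_INF] * (T + 1) for _ in range(S)]
    score[0][1] = log_b[0][0]  # assume the path starts in state 0 at frame 0
    back = []  # back[t - 1][(s, d)] = best predecessor (s', d') at frame t - 1
    for t in range(1, T):
        new = [[NEG_INF] * (T + 1) for _ in range(S)]
        ptr = {}
        for s in range(S):
            for d in range(1, t + 1):  # durations possible at frame t - 1
                prev = score[s][d]
                if prev == NEG_INF:
                    continue
                stay = prev + _log(1.0 - h(s, d))  # remain in s, d -> d + 1
                if stay > new[s][d + 1]:
                    new[s][d + 1], ptr[(s, d + 1)] = stay, (s, d)
                if s + 1 < S:
                    leave = prev + _log(h(s, d))  # exit to s + 1 with d = 1
                    if leave > new[s + 1][1]:
                        new[s + 1][1], ptr[(s + 1, 1)] = leave, (s, d)
        for s in range(S):
            for d in range(1, t + 2):
                if new[s][d] != NEG_INF:
                    new[s][d] += log_b[t][s]
        score = new
        back.append(ptr)

    last = S - 1
    best_d = max(range(1, T + 1), key=lambda d: score[last][d])
    if score[last][best_d] == NEG_INF:
        return NEG_INF, []  # too few frames to reach the last state
    cell, path = (last, best_d), [last]
    for ptr in reversed(back):
        cell = ptr[cell]
        path.append(cell[0])
    path.reverse()
    return score[last][best_d], path
```

The extra work comes from keeping one score per (state, duration) pair instead of one per state. A practical decoder would cap d at a maximum duration and prune with a beam.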

Partial classification techniques for speech recognition

• Helps to create structure in speech HMMs

• Useful in codebook(s) estimation

• Initial estimates for HMMs and codebooks

• More accurate estimates