Duration modeling for speech recognition


Presented for BBN

Dr. Andrey Nikiforov

Department of Applied Mathematics and Statistics

State University of New York at Stony Brook


Additional topics

Computational and modeling issues in improving the performance of speech recognition algorithms

• Partial classification techniques

• Tree-dependence covariance models in HMM

• Fast search and computations for codebooks

• Interpolation for acoustic space


State duration in HMM


Duration distributions

[Figure: duration probability density functions plotted against time: exponential, Rayleigh, Weibull, normal]
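For reference, the four candidate densities can be written down in a few lines. This is a minimal sketch using the textbook parameterizations; the function and parameter names (`rate`, `sigma`, `shape`, `scale`, `mean`, `std`) are illustrative assumptions and do not come from the slides.

```python
import math

def exponential_pdf(t, rate):
    """Exponential density: rate * exp(-rate * t) for t >= 0."""
    if t < 0:
        return 0.0
    return rate * math.exp(-rate * t)

def rayleigh_pdf(t, sigma):
    """Rayleigh density: (t / sigma^2) * exp(-t^2 / (2 * sigma^2)) for t >= 0."""
    if t < 0:
        return 0.0
    return (t / sigma ** 2) * math.exp(-(t ** 2) / (2 * sigma ** 2))

def weibull_pdf(t, shape, scale):
    """Weibull density: (k / lam) * (t / lam)^(k - 1) * exp(-(t / lam)^k).

    Note: for shape < 1 the density is unbounded at t = 0, so call it
    with t > 0 in that case.
    """
    if t < 0:
        return 0.0
    x = t / scale
    return (shape / scale) * x ** (shape - 1) * math.exp(-(x ** shape))

def normal_pdf(t, mean, std):
    """Normal density; unlike the others it puts mass on negative durations."""
    z = (t - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))
```

Two special cases are worth keeping in mind: a Weibull with shape 1 is the exponential, and a Weibull with shape 2 and scale sigma * sqrt(2) is the Rayleigh, so the Weibull family covers both.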


From …


… to


Progressive model


Time calculation

[Figure: states A and B at time steps t and t+1]

Time calculation (continued)

[Figure: states A and B at time steps t and t+1]

Probability calculations: from …


…to


Hazard function
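As a reminder of the standard definition (stated here as background, not taken from the slide): for a state duration D measured in frames, the discrete hazard h(d) = P(D = d | D >= d) is the probability of leaving the state after exactly d frames, given that it has already lasted d frames. A conventional HMM with self-loop probability a has constant hazard 1 - a, which is the geometric duration distribution. The sketch below converts between a duration distribution and its hazard; the function names are hypothetical.

```python
def hazard_from_pmf(pmf):
    """pmf[i] = P(D = i + 1). Returns hazard[i] = P(D = i + 1 | D >= i + 1)."""
    # Tail sums P(D >= i + 1), accumulated from the end so that the last
    # tail equals the last pmf entry exactly.
    tails = []
    tail = 0.0
    for p in reversed(pmf):
        tail += p
        tails.append(tail)
    tails.reverse()
    return [p / s if s > 0 else 0.0 for p, s in zip(pmf, tails)]

def pmf_from_hazard(hazard):
    """Inverse map: P(D = d) = h(d) * prod_{k < d} (1 - h(k))."""
    pmf = []
    survival = 1.0  # probability of still being in the state
    for h in hazard:
        pmf.append(survival * h)
        survival *= 1.0 - h
    return pmf
```

For example, `pmf_from_hazard([0.2, 0.2, 0.2])` gives the first three terms of a geometric distribution, which is the duration model implied by a conventional HMM state with self-loop probability 0.8.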


Hazard function estimation


“Nonparametric estimate”
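One common way to estimate a hazard without assuming a parametric family is the life-table ratio: the number of segments that ended at exactly d frames divided by the number that lasted at least d frames. The sketch below assumes this is the kind of estimate meant here, which the slide does not confirm; `estimate_hazard` and its arguments are hypothetical names, and the durations would come from state-level alignments of training data.

```python
from collections import Counter

def estimate_hazard(durations, max_duration=None):
    """Empirical discrete hazard from observed state durations (in frames).

    hazard[d - 1] = (# segments of length exactly d) / (# segments of length >= d)
    Assumes durations is a non-empty list of integers >= 1.
    """
    counts = Counter(durations)
    if max_duration is None:
        max_duration = max(durations)
    at_risk = len(durations)  # segments still in the state at frame d
    hazard = []
    for d in range(1, max_duration + 1):
        ended = counts.get(d, 0)
        hazard.append(ended / at_risk if at_risk > 0 else 0.0)
        at_risk -= ended
    return hazard
```

With few training segments the ratios at long durations rest on very small counts, so in practice some smoothing or a parametric fit for the tail is usually needed.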


“Trajectories”


State duration correction

(Fant et al., 1991)


Word duration

[Figure: Word duration distribution. Axes: word length (frames), count]

State duration correction

[Figure: State duration distribution. Four histograms with panels C4, C5, C4, C5; vertical axis: count]

State duration correction (continued)

[Figure: State duration distribution. Four histograms with panels C6, C7, C6, C7; vertical axis: count]

Conclusions

• Representing the duration distribution via the hazard function is simple, effective, and convenient to program

• Speech recognition errors dropped by 20-25% across different tasks

• The raw time spent in Viterbi search or full probability calculation increased on average by 20% compared with the conventional HMM (almost completely offset by the reduction in computation that results from more adequate modeling)


Partial classification techniques for speech recognition

Helps to create structure in speech HMMs

Useful in codebook(s) estimation

Initial estimates for HMMs and codebooks

More accurate estimates
