Speech Recognition, Lecture 6: Language Modeling Software Library. Mehryar Mohri, Courant Institute and Google Research. [email protected]


Page 1: language modeling software library

Speech Recognition
Lecture 6: Language Modeling Software Library

Mehryar Mohri
Courant Institute and Google Research

[email protected]

Page 2: language modeling software library

Mehryar Mohri - Speech Recognition, Courant Institute, NYU

Software Library

GRM Library: Grammar Library. General software collection for constructing and modifying weighted automata and transducers representing grammars and statistical language models (Allauzen, MM, and Roark, 2005).

http://www.research.att.com/projects/mohri/grm


Page 3: language modeling software library


Counting

Model creation, shrinking, and conversion

Class-based models

This Lecture


Page 4: language modeling software library


Overview

Generality: to support the representation and use of the various grammars in dynamic speech recognition.

Efficiency: to support competitive large-vocabulary dynamic recognition using automata of several hundred million states and transitions.

Reliability: to serve as a solid foundation for research in statistical language modeling.


Page 5: language modeling software library


Content

Statistical language models: creating, shrinking, and converting language models.

Grammar compilation:

• weighted context-dependent rules, weighted context-free grammars (CFGs).

• regular approximations of CFGs.

Text and grammar processing utilities:

• local grammars, suffix automata.

• local determinization.

• counting and merging.

Page 6: language modeling software library


Language Modeling Tools

Counts: automata (strings or lattices), merging.

Models:

• Backoff or deleted interpolation smoothing.

• Katz or absolute discounting.

• Kneser-Ney models.

Shrinking: weighted difference or relative entropy.

Class-based modeling: straightforward.


Page 7: language modeling software library


Corpus

Input:

Program:


Corpus:
hello.
bye.
hello.
bye bye.

Labels:
<s> 1
</s> 2
<unknown> 3
hello 4
bye 5

farcompilestrings -i labels corpus.txt > foo.far
or
cat lattice1.fsm ... latticeN.fsm > foo.far
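Conceptually, farcompilestrings looks each token of a corpus line up in the label table, with unseen words falling back to <unknown>. A minimal Python sketch of that lookup (the function name is ours, not part of GRM; the real tool emits weighted automata into an FSM archive):

```python
# Label table from the slide above.
LABELS = {"<s>": 1, "</s>": 2, "<unknown>": 3, "hello": 4, "bye": 5}

def encode_line(line, labels=LABELS):
    """Map each token to its label id; out-of-vocabulary tokens -> <unknown>."""
    return [labels.get(tok, labels["<unknown>"]) for tok in line.split()]

# Toy corpus (sentence-final periods from the slide dropped).
sentences = ["hello", "bye", "hello", "bye bye"]
encoded = [encode_line(s) for s in sentences]
# encode_line("bye bye") -> [5, 5]; an unseen word maps to 3 (<unknown>)
```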

Page 8: language modeling software library


Counting

Weights:

• use fsmpush to remove initial weight and create a probabilistic automaton.

• counting from far files.

• counts produced in log semiring.

Algorithm:

• applies to all probabilistic automata.

• in particular, the automaton must contain no cycle with total weight zero or less.


Page 9: language modeling software library


Counting

Program:

Graphical representation:


grmcount -n 2 -s 1 -f 2 foo.far > foo.2.counts.fsm
grmmerge foo.counts.fsm bar.counts.fsm > foobar.counts.fsm

[Figure: bigram count automaton for the toy corpus. Initial state 0; arcs labeled word/count: <s>/4, hello/2 (twice), bye/3, bye/2, bye/1, </s>/4, </s>/2 (twice).]

Page 10: language modeling software library


Counting

Model creation, shrinking, and conversion

Class-based models

This Lecture


Page 11: language modeling software library


Creating Back-off Model

Program:

Graphical representation:


grmmake foo.2.counts.fsm > foo.2.lm.fsm

[Figure: backoff bigram model built from the counts. Arcs are labeled word/weight with weights as negative log probabilities (e.g. </s>/0.410, bye/1.108, hello/1.504, bye/0.698); the backoff (failure) arcs and their weights were lost in extraction.]

Page 12: language modeling software library


Shrinking Back-off Model

Program:

Graphical representation:


grmshrink -c 4 foo.2.lm.fsm > foo.2.s4.lm.fsm

[Figure: shrunken backoff model. Several bigram arcs have been pruned; remaining arcs include </s>/0.005, </s>/0.810, hello/1.504, bye/1.098, hello/0.698, bye/0.698; backoff arcs garbled in extraction.]
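The weighted-difference criterion mentioned earlier (Seymore and Rosenfeld, 1996) can be sketched in a few lines: drop an explicit bigram when the count-weighted gap between its probability and the backed-off estimate is small. All names, the threshold, and the toy numbers below are illustrative, not the GRM implementation:

```python
import math

def shrink_bigrams(bigram_logp, unigram_logp, backoff_logw, counts, thresh):
    """Keep bigram (h, w) only if pruning it would shift the log-likelihood
    by at least thresh: count(h,w) * |log P(w|h) - (log alpha(h) + log P(w))|."""
    kept = {}
    for (h, w), lp in bigram_logp.items():
        gap = counts[(h, w)] * abs(lp - (backoff_logw[h] + unigram_logp[w]))
        if gap >= thresh:
            kept[(h, w)] = lp
    return kept

bigram_logp = {("a", "b"): math.log(0.9), ("a", "c"): math.log(0.1)}
unigram_logp = {"b": math.log(0.5), "c": math.log(0.1)}
backoff_logw = {"a": 0.0}          # log alpha(a) = 0, i.e. alpha(a) = 1
counts = {("a", "b"): 9, ("a", "c"): 1}
kept = shrink_bigrams(bigram_logp, unigram_logp, backoff_logw, counts, 1.0)
# only ("a", "b") survives: ("a", "c") is exactly what backoff would give
```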

Page 13: language modeling software library


Back-off Smoothing

Definition: for a bigram model,


Pr[w_i | w_{i-1}] =
    d_k · c(w_{i-1} w_i) / c(w_{i-1})    if k = c(w_{i-1} w_i) > 0;
    α Pr[w_i]                            otherwise;

where

    d_k = 1                              if k > 5;
    d_k ≈ (k + 1) n_{k+1} / (k n_k)      otherwise,

with n_k the number of bigrams occurring exactly k times.
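The discount d_k transcribes directly into code. The count-of-counts table n below is made up for illustration, and the full Katz estimate also renormalizes the backed-off mass via α, which this sketch omits:

```python
def katz_discount(k, n):
    """Good-Turing style discount from the formula above:
    d_k = 1 for k > 5, else approximately (k+1) n_{k+1} / (k n_k)."""
    if k > 5:
        return 1.0
    return (k + 1) * n.get(k + 1, 0) / (k * n[k])

# hypothetical count-of-counts table: n[k] = number of bigrams seen k times
n = {1: 10, 2: 5, 3: 3, 4: 2, 5: 1, 6: 1}
d2 = katz_discount(2, n)    # (3 * 3) / (2 * 5) = 0.9
d7 = katz_discount(7, n)    # counts above 5 are trusted: 1.0
```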

Page 14: language modeling software library


Conversion of Back-off Model

Interpolated model:

Failure function model - failure class:

• efficient representation of backoff models.

• requires on-the-fly composition for decoding.

Failure function model - basic class:

• state-splitting: correct in tropical semiring.

• pre-optimization: on-the-fly composition not required.


grmconvert foo.lm.fsm -m interpolated > foo.int.lm.fsm

grmconvert foo.lm.fsm -e failure_transitions > foo.flm.fsm

grmconvert foo.lm.fsm -e state_splitting > foo.elm.fsm
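The failure-class representation works as follows at decoding time: if the current state has no arc for the next symbol, follow the φ (failure) arc, accumulate its backoff weight, and retry. A toy sketch of that lookup over an adjacency-map model (all names and numbers here are hypothetical, not GRM data structures):

```python
def phi_lookup(arcs, phi, state, symbol):
    """Follow failure arcs until `symbol` has an explicit arc.
    arcs[state][symbol] -> (next_state, weight); phi[state] -> (backoff_state, weight).
    Weights are negative log probabilities, so they add along the path."""
    weight = 0.0
    while symbol not in arcs[state]:
        state, w = phi[state]      # take the failure (backoff) transition
        weight += w
    nxt, w = arcs[state][symbol]
    return nxt, weight + w

# state 2 = bigram context "hello"; state 1 = unigram (backoff) state
arcs = {2: {"hello": (2, 0.3)},
        1: {"hello": (1, 0.6), "bye": (1, 1.0)}}
phi = {2: (1, 0.4)}
# "bye" has no arc at state 2: back off (weight 0.4), then take bye/1.0
dest, w = phi_lookup(arcs, phi, 2, "bye")
```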

Page 15: language modeling software library


Counting

Model creation, shrinking, and conversion

Class-based models

This Lecture


Page 16: language modeling software library


Class-Based Models

Simple class-based models:

Methods in GRM: no special utility needed.

• create transducer mapping strings to classes.

• use fsmcompose to map from word corpus to classes.

• build and make model over classes.

• use fsmcompose to map from classes back to words.

Generality: classes defined by weighted automata.


Pr[w_i | h] = Pr[w_i | C_i] · Pr[C_i | h].
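The factorization above in miniature, with a hypothetical class map and toy probabilities (GRM realizes the same decomposition by composing transducers rather than multiplying dictionaries):

```python
# Pr[w | h] = Pr[w | C(w)] * Pr[C(w) | h], with illustrative numbers.
word_class = {"hello": "HELLO", "bye": "BYE"}
p_word_given_class = {("bye", "BYE"): 0.5, ("hello", "HELLO"): 1.0}
p_class_given_hist = {("BYE", "<s>"): 0.5, ("HELLO", "<s>"): 0.5}

def class_based_prob(word, hist):
    c = word_class[word]
    return p_word_given_class[(word, c)] * p_class_given_hist[(c, hist)]

p = class_based_prob("bye", "<s>")    # 0.5 * 0.5 = 0.25
```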

Page 17: language modeling software library


Class-Based Model - Example

Example: BYE = {bye, bye bye}.

Graphical representation: mapping from strings to classes.


[Figure: word-to-class transducer; arcs hello:hello/0 and bye:BYE/0.693, plus a bye:ε arc for the second "bye" of "bye bye" (partially garbled in extraction).]

Page 18: language modeling software library


Class-Based Model - Counts


[Figure: class-based count automaton, with arcs <s>/4, hello/2 (twice), BYE/2 (twice), </s>/4, </s>/2 (twice), shown beside the original word-level counts from Page 9.]

Page 19: language modeling software library


Models


[Figure: original word-level model (as on Page 11) beside the class-based model; class-based arcs include </s>/0.005, </s>/0.693, BYE/1.386, hello/1.386, BYE/0.698, hello/0.698; backoff arcs garbled in extraction.]

Page 20: language modeling software library


Final Class-Based Model


[Figure: final class-based model after composition back to words; arcs include hello/0.698, bye/1.391, hello/1.386, bye/2.079, </s>/0.693, </s>/0.005, and bye/0; backoff arcs garbled in extraction.]

Page 21: language modeling software library


Conclusion

GRM Library: general utilities for text and grammar processing.

• generality: e.g., counts from arbitrary automata or a class-based model.

• efficiency: practical, e.g., counting lattices in 1/50th real-time.

• testing: statistical tests for reliability.


Page 22: language modeling software library


References

• Cyril Allauzen, Mehryar Mohri, and Brian Roark. Generalized Algorithms for Constructing Statistical Language Models. In 41st Meeting of the Association for Computational Linguistics (ACL 2003), Proceedings of the Conference, Sapporo, Japan, July 2003.

• Cyril Allauzen, Mehryar Mohri, and Brian Roark. The Design Principles and Algorithms of a Weighted Grammar Library. International Journal of Foundations of Computer Science, 16(3):403-421, 2005.

• Peter F. Brown, Vincent J. Della Pietra, Peter V. deSouza, Jennifer C. Lai, and Robert L. Mercer. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467-479.

• Stanley Chen and Joshua Goodman. An empirical study of smoothing techniques for language modeling. Technical Report, TR-10-98, Harvard University. 1998.

• William Gale and Kenneth W. Church. What's wrong with adding one? In N. Oostdijk and P. de Haan, editors, Corpus-Based Research into Language. Rodopi, Amsterdam.

• Irving J. Good. The population frequencies of species and the estimation of population parameters. Biometrika, 40:237-264, 1953.


Page 23: language modeling software library


• Frederick Jelinek and Robert L. Mercer. 1980. Interpolated estimation of Markov source parameters from sparse data. In Proceedings of the Workshop on Pattern Recognition in Practice, pages 381-397.

• Slava Katz. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech and Signal Processing, 35:400-401, 1987.

• Reinhard Kneser and Hermann Ney. Improved backing-off for m-gram language modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, volume 1, pages 181-184, 1995.

• David A. McAllester, Robert E. Schapire: On the Convergence Rate of Good-Turing Estimators. Proceedings of Conference on Learning Theory (COLT) 2000: 1-6.

• Mehryar Mohri. Weighted Grammar Tools: the GRM Library. In Robustness in Language and Speech Technology. pages 165-186. Kluwer Academic Publishers, The Netherlands, 2001.

• Hermann Ney, Ute Essen, and Reinhard Kneser. 1994. On structuring probabilistic dependences in stochastic language modeling. Computer Speech and Language, 8:1-38.


Page 24: language modeling software library


• Kristie Seymore and Ronald Rosenfeld. Scalable backoff language models. In Proceedings of the International Conference on Spoken Language Processing (ICSLP), 1996.

• Andreas Stolcke. 1998. Entropy-based pruning of back-off language models. In Proceedings of DARPA Broadcast News Transcription and Understanding Workshop, pages 270-274.

• Ian H. Witten and Timothy C. Bell. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression, IEEE Transactions on Information Theory, 37(4):1085-1094, 1991.
