Speech Recognition
Lecture 6: Language Modeling Software Library

Mehryar Mohri
Courant Institute and Google Research

Mehryar Mohri - Speech Recognition, Courant Institute, NYU
Software Library
GRM Library: Grammar Library. General software collection for constructing and modifying weighted automata and transducers representing grammars and statistical language models (Allauzen, MM, and Roark, 2005).
http://www.research.att.com/projects/mohri/grm
This Lecture

Counting
Model creation, shrinking, and conversion
Class-based models
Overview
Generality: to support the representation and use of the various grammars in dynamic speech recognition.
Efficiency: to support competitive large-vocabulary dynamic recognition using automata of several hundred million states and transitions.
Reliability: to serve as a solid foundation for research in statistical language modeling.
Content
Statistical language models: creating, shrinking, and converting language models.
Grammar compilation:
• weighted context-dependent rules, weighted context-free grammars (CFGs).
• regular approximations of CFGs.
Text and grammar processing utilities:
• local grammars, suffix automata.
• local determinization.
• counting and merging.
Language Modeling Tools
Counts: automata (strings or lattices), merging.
Models:
• Backoff or deleted interpolation smoothing.
• Katz or absolute discounting.
• Kneser-Ney models.
Shrinking: weighted difference or relative entropy.
Class-based modeling: straightforward.
Corpus
Input:

Corpus (corpus.txt):
hello.
bye.
hello.
bye bye.

Labels file:
<s> 1
</s> 2
<unknown> 3
hello 4
bye 5

Program:
farcompilestrings -i labels corpus.txt > foo.far
or
cat lattice1.fsm ... latticeN.fsm > foo.far
Counting
Weights:
• use fsmpush to remove initial weight and create a probabilistic automaton.
• counting from far files.
• counts produced in log semiring.
Algorithm:
• applies to all probabilistic automata.
• in particular, no cycle may have weight zero or less in the log semiring: every cycle must have probability strictly less than one, so that the expected counts converge.
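On a plain-text corpus like the one above, the expected counts grmcount computes reduce to ordinary n-gram counts. A minimal Python sketch of that special case (illustration only; the real tool counts expected occurrences over arbitrary weighted automata in the log semiring):

```python
from collections import Counter

# Toy corpus from the earlier slide: one sentence per line.
sentences = [["hello"], ["bye"], ["hello"], ["bye", "bye"]]

unigrams, bigrams = Counter(), Counter()
for sent in sentences:
    padded = ["<s>"] + sent + ["</s>"]       # add sentence delimiters
    unigrams.update(padded)                  # 1-gram counts
    bigrams.update(zip(padded, padded[1:]))  # 2-gram counts
```

These counts match the count automaton on the next slide: <s>/4, hello/2, bye/3, </s>/4, with e.g. the bigram "bye bye" occurring once.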
Counting
Program:
Graphical representation:
grmcount -n 2 -s 1 -f 2 foo.far > foo.2.counts.fsm
grmmerge foo.counts.fsm bar.counts.fsm > foobar.counts.fsm
[Figure: bigram count automaton in the log semiring, with transitions such as <s>/4, hello/2, bye/3, </s>/4 for unigram counts and hello/2, bye/2, bye/1, </s>/2 for bigram counts.]
This Lecture

Counting
Model creation, shrinking, and conversion
Class-based models
Creating a Back-off Model

Program:
grmmake foo.2.counts.fsm > foo.2.lm.fsm

Graphical representation:
[Figure: bigram backoff model built from the counts, with weighted transitions such as </s>/0.410, bye/1.108, hello/1.504, bye/0.698; the backoff-transition labels were garbled in extraction.]
Shrinking Back-off Model

Program:
grmshrink -c 4 foo.2.lm.fsm > foo.2.s4.lm.fsm

Graphical representation:
[Figure: the shrunken backoff model, with fewer explicit bigram transitions (e.g. </s>/0.810, hello/1.504, bye/1.098, hello/0.698, bye/0.698); garbled backoff-transition labels omitted.]
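The weighted-difference shrinking criterion (Seymore and Rosenfeld, 1996) mentioned earlier can be sketched as follows: an explicit n-gram is kept only if removing it, and obtaining its probability through the backoff path instead, would lose more than a threshold's worth of count-weighted log-probability. This is an illustrative sketch, not the grmshrink implementation:

```python
import math

def weighted_difference(count, p_explicit, p_backoff):
    """Count-weighted log-probability lost if the explicit n-gram is
    dropped and its probability comes from the backoff path instead."""
    return count * (math.log(p_explicit) - math.log(p_backoff))

def shrink(ngrams, p_explicit, p_backoff, theta):
    """Keep only the n-grams whose weighted difference exceeds theta."""
    return {g for g in ngrams
            if weighted_difference(ngrams[g], p_explicit[g], p_backoff[g]) > theta}
```

The relative-entropy criterion (Stolcke, 1998) is similar in spirit but also accounts for the renormalization of the backoff weight after pruning.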
Back-off Smoothing
Definition: for a bigram model, with $k = c(w_{i-1} w_i)$,

\[
\Pr[w_i \mid w_{i-1}] =
\begin{cases}
d_k \, \dfrac{c(w_{i-1} w_i)}{c(w_{i-1})} & \text{if } k > 0;\\
\alpha \Pr[w_i] & \text{otherwise;}
\end{cases}
\]

where

\[
d_k =
\begin{cases}
1 & \text{if } k > 5;\\
\approx \dfrac{(k+1)\, n_{k+1}}{k\, n_k} & \text{otherwise,}
\end{cases}
\]

with $n_k$ the number of bigrams observed exactly $k$ times.
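The definition above transcribes directly into code. This is a sketch only: here alpha is treated as a fixed constant passed by the caller, whereas a real backoff model computes a per-history backoff weight so that each conditional distribution normalizes.

```python
def good_turing_discount(k, bigrams):
    """d_k = 1 for k > 5, else (k+1) n_{k+1} / (k n_k)."""
    if k > 5:
        return 1.0
    n_k = sum(1 for c in bigrams.values() if c == k)
    n_k1 = sum(1 for c in bigrams.values() if c == k + 1)
    return (k + 1) * n_k1 / (k * n_k)

def backoff_prob(w_prev, w, bigrams, unigrams, alpha):
    """Pr[w | w_prev] per the bigram backoff definition above."""
    k = bigrams.get((w_prev, w), 0)
    if k > 0:
        return good_turing_discount(k, bigrams) * k / unigrams[w_prev]
    total = sum(unigrams.values())       # back off to the unigram model
    return alpha * unigrams[w] / total
```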
Conversion of Back-off Model
Interpolated model:
grmconvert foo.lm.fsm -m interpolated > foo.int.lm.fsm

Failure function model - failure class:
• efficient representation of backoff models.
• requires on-the-fly composition for decoding.
grmconvert foo.lm.fsm -e failure_transitions > foo.flm.fsm

Failure function model - basic class:
• state-splitting: correct in the tropical semiring.
• pre-optimization: on-the-fly composition not required.
grmconvert foo.lm.fsm -e state_splitting > foo.elm.fsm
This Lecture

Counting
Model creation, shrinking, and conversion
Class-based models
Class-Based Models
Simple class-based models:

Pr[wi|h] = Pr[wi|Ci] Pr[Ci|h].

Methods in GRM: no special utility needed.
• create a transducer mapping strings to classes.
• use fsmcompose to map the word corpus to classes.
• build and make the model over classes.
• use fsmcompose to map from classes back to words.

Generality: classes defined by weighted automata.
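The factorization Pr[wi|h] = Pr[wi|Ci] Pr[Ci|h] (Brown et al., 1992) is easy to state in code. A toy sketch using the BYE class from the next slide; the probability tables here are illustrative numbers, not values from a trained model:

```python
def class_based_prob(w, h, word_class, p_word_given_class, p_class_given_hist):
    """Pr[w | h] = Pr[w | C(w)] * Pr[C(w) | h]."""
    c = word_class[w]
    return p_word_given_class[w, c] * p_class_given_hist[c, h]

# Toy tables: BYE = {bye, bye bye}, each member equally likely in its class.
word_class = {"bye": "BYE", "hello": "HELLO"}
p_word_given_class = {("bye", "BYE"): 0.5, ("hello", "HELLO"): 1.0}
p_class_given_hist = {("BYE", "<s>"): 0.5, ("HELLO", "<s>"): 0.5}
```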
Class-Based Model - Example
Example: BYE = {bye, bye bye}.
Graphical representation: mapping from strings to classes.
[Figure: transducer mapping strings to classes, with transitions hello:hello/0 and bye:BYE/0.693; the label of the remaining bye transition was garbled in extraction.]
Class-Based Model - Counts
[Figure: original counts vs. class-based counts. The class-based count automaton keeps <s>/4, hello/2, </s>/4 but replaces the word counts bye/3, bye/2, bye/1 of the original with class counts BYE/2.]
Models
[Figure: the original model (transitions such as </s>/0.410, bye/1.108, hello/1.504, bye/0.698) alongside the class-based model (transitions such as </s>/0.693, BYE/1.386, hello/1.386, BYE/0.698); backoff-transition labels were garbled in extraction.]
Final Class-Based Model
[Figure: the final model obtained by composing the class-based model with the class-to-word transducer; it contains word transitions such as hello/0.698, bye/1.391, hello/1.386, bye/2.079, and </s>/0.693, with garbled backoff labels omitted.]
Conclusion
GRM Library: general utilities for text and grammar processing.
• generality: e.g., counts from arbitrary automata or a class-based model.
• efficiency: practical, e.g., counting lattices in 1/50th real-time.
• testing: statistical tests for reliability.
References

• Cyril Allauzen, Mehryar Mohri, and Brian Roark. Generalized Algorithms for Constructing Statistical Language Models. In 41st Meeting of the Association for Computational Linguistics (ACL 2003), Proceedings of the Conference, Sapporo, Japan, July 2003.
• Cyril Allauzen, Mehryar Mohri, and Brian Roark. The Design Principles and Algorithms of a Weighted Grammar Library. International Journal of Foundations of Computer Science, 16(3):403-421, 2005.
• Peter F. Brown, Vincent J. Della Pietra, Peter V. deSouza, Jennifer C. Lai, and Robert L. Mercer. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467-479.
• Stanley Chen and Joshua Goodman. An empirical study of smoothing techniques for language modeling. Technical Report, TR-10-98, Harvard University. 1998.
• William Gale and Kenneth W. Church. What's wrong with adding one? In N. Oostdijk and P. de Haan, editors, Corpus-Based Research into Language. Rodopi, Amsterdam, 1994.
• Irving J. Good. The population frequencies of species and the estimation of population parameters. Biometrika, 40:237-264, 1953.
• Frederick Jelinek and Robert L. Mercer. Interpolated estimation of Markov source parameters from sparse data. In Proceedings of the Workshop on Pattern Recognition in Practice, pages 381-397, 1980.
• Slava Katz. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech and Signal Processing, 35:400-401, 1987.
• Reinhard Kneser and Hermann Ney. Improved backing-off for m-gram language modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, volume 1, pages 181-184, 1995.
• David A. McAllester and Robert E. Schapire. On the Convergence Rate of Good-Turing Estimators. In Proceedings of the Conference on Learning Theory (COLT), pages 1-6, 2000.
• Mehryar Mohri. Weighted Grammar Tools: the GRM Library. In Robustness in Language and Speech Technology. pages 165-186. Kluwer Academic Publishers, The Netherlands, 2001.
• Hermann Ney, Ute Essen, and Reinhard Kneser. 1994. On structuring probabilistic dependences in stochastic language modeling. Computer Speech and Language, 8:1-38.
• Kristie Seymore and Ronald Rosenfeld. Scalable backoff language models. In Proceedings of the International Conference on Spoken Language Processing (ICSLP), 1996.
• Andreas Stolcke. 1998. Entropy-based pruning of back-off language models. In Proceedings of DARPA Broadcast News Transcription and Understanding Workshop, pages 270-274.
• Ian H. Witten and Timothy C. Bell. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression, IEEE Transactions on Information Theory, 37(4):1085-1094, 1991.