Special Research Project, Week 3
Language Model and Decoding
Prof. Lin-Shan Lee
TAs: Hung-Tsung Lu, Cheng-Kuan Wei
Speech Recognition System
[Block diagram: Input Speech → Front-end Signal Processing → Feature Vectors → Linguistic Decoding and Search Algorithm → Output Sentence. Acoustic Models come from Acoustic Model Training on Speech Corpora; the Language Model (Grammar) comes from Language Model Construction on Text Corpora and a Lexical Knowledge-base; the Lexicon also feeds the decoder.]
We use Kaldi as the toolkit.
Language Modeling: providing linguistic constraints to help select the correct words

Prob [the computer is listening] > Prob [they come tutor is list sunny]
Prob [電腦聽聲音] > Prob [店老天呻吟]
(the two Mandarin strings sound alike; the first means "the computer listens to the sound", the second is nonsense)
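How a bigram model assigns these probabilities (standard factorization; <s> and </s> are sentence-boundary symbols):

  P(w_1 \dots w_n) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})

e.g. Prob [the computer is listening] ≈ P(the | <s>) · P(computer | the) · P(is | computer) · P(listening | is) · P(</s> | listening)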
Language Model Training

  00.train_lm.sh
  01.format.sh
Language Model: Training Text (1/2)

train_text=ASTMIC_transcription/train.text
cut -d ' ' -f 1 --complement $train_text > /exp/lm/LM_train.text

(removes the first column, i.e. the utterance ID)
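A minimal sketch of what the cut command does, using a made-up input line (real lines in train.text contain the actual utterance IDs and transcriptions):

  $ echo 'utt_001 this is a test' | cut -d ' ' -f 1 --complement
  this is a test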
Language Model: Training Text (2/2)

cut -d ' ' -f 1 --complement $train_text > /exp/lm/LM_train.text
Language Model : ngram-count (1/3)
/share/srilm/bin/i686-m64/ngram-count -order 2 (You can modify it from 1~3) -kndiscount (modified Kneser-Ney smoothing) -text /exp/lm/LM_train.text
(Your training data file name on p.7) -vocab $lexicon (Lexicon, as shown on p.10) -unk (Build open vocabulary language
model) -lm $lm_output (Your language model name)
http://www.speech.sri.com/projects/srilm/manpages/ngram-count.1.html
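Put together, a concrete call might look like this (a sketch: lm.2gram.arpa and heldout.text are hypothetical names; the second command uses SRILM's ngram tool to sanity-check the model by computing perplexity on held-out text):

  /share/srilm/bin/i686-m64/ngram-count -order 2 -kndiscount \
      -text /exp/lm/LM_train.text -vocab $lexicon -unk -lm lm.2gram.arpa
  /share/srilm/bin/i686-m64/ngram -order 2 -lm lm.2gram.arpa -ppl heldout.text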
Language Model: ngram-count (2/3)

Smoothing
  Many events never occur in the training data.
  e.g. Prob [Jason immediately stands up] = 0 because Prob [immediately | Jason] = 0
  Smoothing assigns some non-zero probability to every event, even those never seen in the training data.

https://class.coursera.org/nlp/lecture (Week 2 – Language Modeling)
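For reference, the interpolated Kneser–Ney bigram estimate (single-discount form; -kndiscount uses the modified variant with several discounts, but the idea is the same):

  P_{KN}(w_i \mid w_{i-1}) = \frac{\max(c(w_{i-1} w_i) - d,\, 0)}{c(w_{i-1})} + \lambda(w_{i-1}) \, P_{cont}(w_i)

where d is the discount, \lambda(w_{i-1}) redistributes the discounted probability mass, and P_{cont}(w_i) is proportional to the number of distinct words that precede w_i in the training data.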
Language Model: ngram-count (3/3)

Lexicon
  lexicon=material/lexicon.train.txt

01.format.sh

Try to replace it with YOUR language model!
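Under the hood, this formatting step has to turn the ARPA-format LM into a grammar FST (G.fst). In Kaldi this is typically done with arpa2fst; a sketch, where the exact flags and paths depend on the Kaldi version and the recipe's directory layout:

  arpa2fst --disambig-symbol='#0' --read-symbol-table=data/lang/words.txt \
      $lm_output data/lang/G.fst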
Decoding

WFST Decoding
  04a.01.mono.mkgraph.sh
  04a.02.mono.fst.sh
  07a.01.tri.mkgraph.sh
  07a.02.tri.fst.sh

Viterbi Decoding
  04b.mono.viterbi.sh
  07b.tri.viterbi.sh
WFST: Introduction (1/3)

FSA (or FSM): finite state automaton / finite state machine
  An FSA "accepts" a set of strings; view an FSA as a representation of a possibly infinite set of strings.
  Start states are bold; final/accepting states have an extra circle.
  This example represents the infinite set {ab, aab, aaab, ...}.
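As a hypothetical sketch, this FSA can be written in OpenFst's text format and compiled with fstcompile. Each line is "source-state dest-state label"; the last line marks the final state, and letters.syms is an assumed symbol table mapping a and b to integer IDs:

  # fsa.txt — accepts a a* b = {ab, aab, aaab, ...}
  0 1 a
  1 1 a
  1 2 b
  2
  # compile: fstcompile --acceptor --isymbols=letters.syms fsa.txt fsa.fst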
WFST: Introduction (2/3)

Weighted FSA: an FSA with weighted edges
  Like a normal FSA, but with costs on the arcs and on the final states.
  Note: the cost comes after "/"; for a final state, "2/1" means final cost 1 on state 2.
  This example maps "ab" to cost 3 (= 1 + 1 + 1).
WFST: Introduction (3/3)

WFST: weighted finite state transducer
  Like a weighted FSA, but with two tapes: input and output.
  e.g. input tape "ac", output tape "xz", cost = 0.5 + 2.5 + 3.5 = 6.5
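The same text format extends to transducers by adding an output label and a weight on each arc (again a hypothetical sketch; in.syms and out.syms are assumed symbol tables). This small transducer reproduces the example above:

  # wfst.txt — maps input "ac" to output "xz" with cost 0.5 + 2.5 + 3.5 = 6.5
  0 1 a x 0.5
  1 2 c z 2.5
  2 3.5
  # compile: fstcompile --isymbols=in.syms --osymbols=out.syms wfst.txt wfst.fst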
WFST Composition

Notation: C = A ∘ B means C is A composed with B.
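In OpenFst (and Kaldi) this operation is provided by the fstcompose tool; a sketch with placeholder file names (the output labels of a.fst must match the input labels of b.fst):

  fstcompose a.fst b.fst c.fst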
WFST Component

HCLG = H ∘ C ∘ L ∘ G
  H: HMM structure
  C: context-dependent relabeling
  L: lexicon
  G: language model acceptor
Framework for Speech Recognition
WFST Component

[Figures: the L (lexicon), H (HMM), and G (language model) FSTs. Where is C (context dependency)?]
Training WFST
04a.01.mono.mkgraph.sh 07a.01.tri.mkgraph.sh
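These scripts typically wrap Kaldi's utils/mkgraph.sh, which composes HCLG from the lang directory and the trained acoustic model; a sketch, where the directory names follow common Kaldi recipes and may differ in this lab's setup:

  utils/mkgraph.sh data/lang exp/mono exp/mono/graph   # produces exp/mono/graph/HCLG.fst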
Decoding WFST (1/3)

From HCLG we have the relationship from HMM states to words.
We need another WFST, U, built from the acoustic observations of the utterance.
Compose U with HCLG, i.e. S = U ∘ HCLG; searching for the best path(s) on S gives the recognition result.
Decoding WFST (2/3)
04a.02.mono.fst.sh 07a.02.tri.fst.sh
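In common Kaldi recipes these scripts call something like steps/decode.sh, which decodes every utterance in the test set against HCLG.fst; a sketch with assumed directory names:

  steps/decode.sh --nj 4 exp/mono/graph data/test exp/mono/decode_test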
Decoding WFST (3/3)

During decoding we need to specify the relative weights of the acoustic model and the language model.
  Split the corpus into training, dev, and test sets.
  The training set is used to train the acoustic model.
  Try all of the acoustic model weights on the dev set and use the best one.
  The test set is used to measure our performance (word error rate, WER).
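In Kaldi the AM/LM trade-off is usually expressed as a language-model weight (the acoustic scale is its inverse), and the scoring scripts sweep over a range of weights; a sketch of tuning on the dev set, where the script and directory names follow common recipes and may differ here:

  # score the dev-set decoding over a range of LM weights and inspect the resulting WERs
  local/score.sh --min-lmwt 9 --max-lmwt 15 data/dev data/lang exp/mono/decode_dev
  # keep the weight with the lowest dev-set WER, then report the test-set WER with that weight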
Viterbi Decoding
Viterbi algorithm
  Given the acoustic model and the observations, find the best state sequence.
  Best state sequence → phone sequence (AM) → word sequence (lexicon) → best word sequence (LM)
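For reference, the core Viterbi recurrence (standard notation: \delta_t(j) is the best partial-path score ending in state j at time t, a_{ij} the transition probability, b_j(o_t) the acoustic likelihood of observation o_t, and \psi the backpointer):

  \delta_t(j) = \Big[ \max_i \delta_{t-1}(i) \, a_{ij} \Big] \, b_j(o_t), \qquad
  \psi_t(j) = \arg\max_i \delta_{t-1}(i) \, a_{ij}

Backtracking through \psi from the best final state recovers the best state sequence.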
Viterbi Decoding
04b.mono.viterbi.sh 07b.tri.viterbi.sh
Language model training, WFST decoding, Viterbi decoding
  00.train_lm.sh
  01.format.sh
  04a.01.mono.mkgraph.sh
  04a.02.mono.fst.sh
  07a.01.tri.mkgraph.sh
  07a.02.tri.fst.sh
  04b.mono.viterbi.sh
  07b.tri.viterbi.sh
Homework
ToDo
Step 1. Finish the code in 00.train_lm.sh and get your LM.
Step 2. Use your LM in 01.format.sh.
Step 3.1. Run 04a.01.mono.mkgraph.sh and 04a.02.mono.fst.sh (WFST decoding for mono-phone).
Step 3.2. Run 07a.01.tri.mkgraph.sh and 07a.02.tri.fst.sh (WFST decoding for tri-phone).
Step 4.1. Run 04b.mono.viterbi.sh (Viterbi decoding for mono-phone).
Step 4.2. Run 07b.tri.viterbi.sh (Viterbi decoding for tri-phone).
ToDo (Optional)

Train LM: use YOUR training text or even YOUR lexicon.
Train LM (ngram-count): try different arguments.
  http://www.speech.sri.com/projects/srilm/manpages/ngram-count.1.html
Watch the online course on Coursera (Week 2 – Language Modeling): https://class.coursera.org/nlp/lecture
Read 數位語音處理概論 (Introduction to Digital Speech Processing): 4.0 (Viterbi), 6.0 (Language Model), 9.0 (WFST).
Try different AM/LM combinations and report the recognition results.
Questions?