Understanding Grapheme
Dong Wang
January 15, 2007
• What is Grapheme?
  – Understanding Grapheme
  – Comparison Results on WSJCAM0
• Grapheme and Spoken Term Detection
  – Spoken Term Detection Mainstream Approaches
  – Grapheme-Based Spoken Term Detection
• Future Work on Grapheme
Language: Multiple Layers of Information Expression

  MEANING     WORD    WORD FRAG    Pron. FRAG   WAVE
  (a table)   TABLE   T-A-B-L-E    t-ei-b-l     1000110

A language has multiple layers for communication, and the mappings between the layers are well known.
Lexicon: Mapping from Word to Pronunciation
Accurate Lexicon vs. Obscure Pronunciation
• Mixtures of Gaussians
• Context-dependent models
• Multiple entries in the lexicon
• Network-based pronunciations
Grapheme: Thinking The Essence
It would be great if a stochastic mapping were available; that is the grapheme approach.
• Obscure mapping between meaning and pronunciation
• A bag of pronunciations

  Mixtures    1      4      8      12     16     18
  Phoneme   57.07  70.11  75.54  78.26  77.72  78.80
  Grapheme   9.78  22.83  33.70  42.93  48.37  51.63
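The "bag of pronunciations" idea is what makes the grapheme lexicon trivial: a word's grapheme pronunciation is simply its spelling. A minimal sketch (the function name is illustrative, not from the original system):

```python
def grapheme_entry(word: str) -> list[str]:
    """A grapheme lexicon entry is just the word's letter sequence;
    non-letter characters are dropped."""
    return [ch for ch in word.upper() if ch.isalpha()]

# No pronunciation dictionary or G2P model is needed:
entries = {w: grapheme_entry(w) for w in ["table", "tide", "I'd"]}
```

Contrast this with the phoneme lexicon, which needs hand-crafted or G2P-generated pronunciations for every entry.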
Grapheme: Thinking The Essence
• A composition of linguistic and acoustic cues
• Context-dependent grapheme: a unit of phonology
             Context-independent   Tri-phoneme/grapheme
  Phoneme          79.64                 87.42
  Grapheme         52.98                 86.88

Thanks to Motoyuki
Grapheme: Thinking The Essence
Where can graphemes be used? When there is:
• High dependency between graphemes and phonemes, or
• Strict obedience to phonology rules, or
• A powerful language model (or other constraints) for discrimination
[Diagram: linguistic (Lg.) vs. acoustic (Ac.) axes, with graphemes g1, g2 and phonemes p1, p2; example: tide / t-i-d]
Grapheme: Thinking The Essence
Can the grapheme lexicon be refined?
• Graphemes and phonemes follow different sharing strategies
• The grapheme lexicon can be made more concrete in some way
• Searching becomes more difficult; how can this be resolved?
[Diagram: grapheme (Gr.) vs. phoneme (Ph.) sharing; e.g. the graphemes U, A, I, E map to the phonemes ai, ei, ih, ax]
Grapheme System on WSJCAM0

Pure grapheme/phoneme recognizer:

             Basic decoder   + 8 letter-gram   + lexicon constraint
  Phoneme        63.81              -                   -
  Grapheme       50.07            88.18               89.49

Grapheme and phoneme word decoders:

             Lexicon   + bigram lattice   + trigram rescore   HDecode
  Phoneme     60.46         87.71              91.02           90.84
  Grapheme    58.68         84.80              89.18           87.02
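The "+ 8 letter-gram" constraint above is a letter n-gram LM over grapheme sequences. A toy sketch of such a model, assuming simple add-alpha smoothing (the real system's order-8 setup and smoothing scheme are not specified here):

```python
import math
from collections import defaultdict

def train_letter_ngram(words, n=3):
    """Count letter n-grams with word-boundary padding. The talk used
    an 8-gram; n=3 keeps this sketch readable."""
    counts, context = defaultdict(int), defaultdict(int)
    for w in words:
        s = "^" * (n - 1) + w.upper() + "$"
        for i in range(len(s) - n + 1):
            counts[s[i:i + n]] += 1
            context[s[i:i + n - 1]] += 1
    return counts, context

def logprob(word, counts, context, n=3, alpha=1.0, vocab=28):
    """Add-alpha smoothed log-probability of a word's letter sequence."""
    s = "^" * (n - 1) + word.upper() + "$"
    lp = 0.0
    for i in range(len(s) - n + 1):
        lp += math.log((counts[s[i:i + n]] + alpha) /
                       (context[s[i:i + n - 1]] + alpha * vocab))
    return lp

# English-like letter strings score higher than random strings:
counts, context = train_letter_ngram(["table", "cable", "fable"])
```

Such a model constrains the grapheme decoder toward letter sequences that look like real spellings, which is why it lifts accuracy so sharply in the table above.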
Grapheme System on WSJCAM0

Graphemes from letter pairs:

  Strategy                                  WER
  Single letter                            89.18
  Single + TH/TION/SION/SH/CH/GH/PH        89.25
  Single + TH/SH/GH/PH                     89.70
  Single + TH/SH, PH→F, GH→""              89.30

All the letter-pair schemes contain mappings like AU→O.
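Splitting words into such mixed single-letter and letter-group units can be done with a greedy longest-match scan. A sketch (the group inventory is taken from the table above; the code itself is illustrative, not the original tooling):

```python
MULTI = ["TION", "SION", "TH", "SH", "CH", "GH", "PH"]  # groups from the table above

def split_graphemes(word: str, multi=MULTI) -> list[str]:
    """Greedy longest-match split into letter-group units plus single letters."""
    word, out, i = word.upper(), [], 0
    groups = sorted(multi, key=len, reverse=True)  # try longer groups first
    while i < len(word):
        for m in groups:
            if word.startswith(m, i):
                out.append(m)
                i += len(m)
                break
        else:
            out.append(word[i])
            i += 1
    return out
```

Greedy longest match is the simplest policy; it cannot distinguish, say, a TH that spans a morpheme boundary, which may explain why the gains in the table are marginal.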
Grapheme System on WSJCAM0

Question sets for tri-grapheme clustering:
• singular (*the set CMU uses)
• phoneme-grapheme mapping (*currently used)
• grapheme-phoneme mapping
• data-driven

Singular vs. phoneme-grapheme mapping question sets:

  Singular (9 mix)   Phoneme mapping (9 mix)
       88.44                 89.18

Simple vs. complex phoneme-grapheme mapping question sets:

  Simple (6 mix)   Complex (6 mix)
      87.24             87.74
Grapheme-Based Spoken Term Detection

There are several approaches to Spoken Term Detection (STD) and Spoken Document Retrieval (SDR):
• Acoustic detection
• LVCSR-based detection
• Phoneme-lattice-based detection
• Hybrid approaches

Several features make grapheme-based STD/SDR feasible:
• No special requirement on LVCSR accuracy
• In almost all cases, only words with clear meaning will be searched, which provides linguistic discrimination
• It avoids word-to-phoneme conversion, which is almost inevitable in any other system
Grapheme-Based Spoken Term Detection

Phoneme-based detection:
• P1: Word lattice generated by HVite (no higher-level LM, only a bigram word lattice)
• P2: Word lattice generated by HDecode (no pre-built word lattice, but with a higher-level LM)
• P3: Phoneme lattice generated by HVite

Grapheme-based detection:
• G1: Word lattice generated by HVite (no higher-level LM, only a bigram word lattice)
• G2: Word lattice generated by HDecode (no pre-built word lattice, but with a higher-level LM)
• G3: Grapheme lattice generated by HVite
• G4: Grapheme lattice generated by HDecode (with an 8-gram grapheme LM)
• G5: Word lattice generated by HVite (no LM at all, just the lexicon)
Grapheme-Based Spoken Term Detection

Most frequent word detection:

       HIT   False Accepts   Real Occ.   FOM
  P1  1315       2359          1324     89.24
  P2  1320        855          1324     94.14
  P3   302        289          1324     20.23
  G1  1315       2678          1324     83.33
  G2  1315        690          1324     87.73
  G3   974       4053          1324     49.82
  G4  1295        503          1324     88.31
  G5  1274       1411          1324     77.29

The 80 most frequent words are selected from the 5k dictionary according to LM unigram frequency; each must occur at least 3 times, and stop words are filtered out.
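The selection procedure described above can be sketched as follows (the function name and toy data are illustrative, not the original scripts):

```python
def pick_terms(unigram_freq, occurrences, stop_words,
               k=80, most_frequent=True, min_occ=3):
    """Pick k query terms ranked by LM unigram frequency, keeping only
    words that occur at least min_occ times in the test audio and are
    not stop words."""
    candidates = [w for w in unigram_freq
                  if w not in stop_words and occurrences.get(w, 0) >= min_occ]
    candidates.sort(key=lambda w: unigram_freq[w], reverse=most_frequent)
    return candidates[:k]

# Toy example (the real setup used the 5k dictionary and 80 terms):
freq = {"the": 100, "said": 50, "market": 30, "zloty": 1}
occ = {"the": 9, "said": 5, "market": 4, "zloty": 3}
```

Setting most_frequent=False yields the least-frequent-word list used on the next slide.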
Grapheme-Based Spoken Term Detection

Least frequent word detection:

       HIT   False Accepts   Real Occ.   FOM
  P1    85         78            88      94.37
  P2    85         34            88      95.85
  P3    17         26            88      18.95
  G1    81         88            88      90.94
  G2    81         36            88      91.31
  G3    32        905            88      31.90
  G4    69         16            88      76.93
  G5    67        252            88      73.55

The 80 least frequent words are selected from the 5k dictionary according to LM unigram frequency; each must occur at least once, and stop words are filtered out.
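From the HIT, false-accept, and real-occurrence counts, simple detection ratios can be recovered. Note these are not the NIST figure of merit reported in the FOM column, which averages detection rate over 0 to 10 false alarms per hour; they are just the raw ratios the table supports:

```python
def hit_rate(hits: int, real_occ: int) -> float:
    """Fraction of true occurrences that were detected."""
    return hits / real_occ

def precision(hits: int, false_accepts: int) -> float:
    """Fraction of putative detections that were correct."""
    return hits / (hits + false_accepts)

# Numbers from the least-frequent-word table above:
p2 = (hit_rate(85, 88), precision(85, 34))
g4 = (hit_rate(69, 88), precision(69, 16))
```

These ratios make the trade-off visible: G4 trades some hits for far fewer false accepts than G5.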
Grapheme-Based Spoken Term Detection

How to handle OOV words:
• P3, G3, and G4 can detect OOV words directly, without any change to the existing results
• If the audio may be re-searched, OOV words can be added to the lexicon on the fly, so G1, G2, and G5 can be used, again without any change to the existing results
Grapheme-Based Spoken Term Detection

How to handle words never seen (not in the LM):
• P3 and G3 are unaffected
• G4 is affected; we delete all training sentences containing the target words and test again
• If the audio may be re-searched, these words can be added to the vocabulary, but as UNKNOWN words in the LM, so G1 and G2 can be used; we tested only G2 in this case
• If the audio may be re-searched, these words can be added to the vocabulary on the fly, so G5 can be used; the result is unchanged because G5 does not use an LM

       HIT   False Accepts   Real Occ.   FOM
  P3    17         26            88      20.23
  G2    84        553            88      90.89
  G3    32        905            88      49.82
  G4    20          0            88      22.73
  G5    67        252            88      73.55
Grapheme-Based Spoken Term Detection

Performance Test Phase I: Recognition (recognizing 3 sentences)

       Time    Storage (k)
  P1   4:52        865
  P2   0:49        288
  P3  10:02      9,753
  G1   2:44      1,097
  G2   0:49        274
  G3   5:54     22,679
  G4   1:58        356
  G5   4:38      1,113

• Grapheme systems normally generate larger lattices in less time
• G4 is good at generating high-quality lattices in a short time
Grapheme-Based Spoken Term Detection

Performance Test Phase II: Indexing (indexing the 80 most frequent words)

        Time      Storage (k)
  P1     0:34       34,893
  P2     0:24       28,008
  P3    28:51      974,521
  G1     0:40      116,224
  G2     0:33       38,091
  G3  1:35:45    2,373,737
  G4     0:34       31,873
  G5     1:01       75,644

• Indexing time is basically determined by lattice size
• Grapheme lattices seem faster to index, perhaps because of the single lexicon entry per word?
Grapheme-Based Spoken Term Detection

What conclusions can we draw from these results?
• In principle, the phoneme system works well for in-vocabulary words
• Graphemes with long-span language models work well on OOV words
• If the audio may be searched again, G2 is the best way to deal with OOV words, even words never seen before
• A hybrid system is obviously a promising solution
Future Work

Vocabulary Refinement
• Two-pass decoder: recall acoustic evaluation during rescoring?
• One-pass decoder: look afterward when reaching a word boundary?

Language Migration and Adaptation (thanks to Partha):

  Pure Chinese   Pure English   English porting   After adaptation
     49.88           89.18           1.91              49.05

• Languages with different pronunciation bases are hard to migrate between
• Languages with different phonology rules are hard to migrate between
• This is intrinsic to graphemes, as they are compounds of acoustic and linguistic units
Future Work

Large File Alignment
• The strictest language constraint: the transcript itself
• Benefits from unsupervised learning

Using the WSJCAM0 grapheme system to recognize an mp3 downloaded from the internet:

  Direct application   On-line adaptation every 100 short segments
       36.02                           65.57

• A large number of OOVs appear in on-line books and conferences
• Bad word pieces, for example {I’d}, must be handled

Grapheme-based ASR may become a powerful spider that steadily updates itself by finding suitable audio segments; cooperating with a text spider that provides an ever-larger, up-to-date LM, it could make much human audio indexable, without separate components such as grapheme-to-phoneme statistics.
Future Work
Language Identification
• Graphemes by nature carry linguistic information
• Currently the most successful identifier is a phoneme decoder with a phoneme language model
• The very reason graphemes are unsuitable for language porting is exactly why they are suitable for identification
Final Page
We cannot hope Grapheme will be a good transcriber, but we really hope it will be a good information miner...

Most of the ideas come from Simon and Joe; they can answer any questions if I do not understand!