Understanding Grapheme
Dong Wang
January 15, 2007
• What is Grapheme?
  – Understanding Grapheme
  – Comparison Results on WSJCAM0
• Grapheme and Spoken Term Detection
  – Spoken Term Detection Mainstream Approaches
  – Grapheme-Based Spoken Term Detection
• Future Work on Grapheme
Language: Multiple Layers of Information Expression

  MEANING     WORD    WORD FRAG    Pron. FRAG   WAVE
  (a table)   TABLE   T-A-B-L-E    t-ei-b-l     1000110

A language has multiple layers for communication, and the mappings between the layers are well known.
Lexicon: Mapping from Word to Pronunciation
Accurate Lexicon vs. Obscure Pronunciation
• Mixtures of Gaussians
• Context-dependent models
• Multiple entries in the lexicon
• Network-based pronunciations
Grapheme: Thinking The Essence
It would be great if a stochastic mapping were available; that is the grapheme approach.
• Obscure mapping between meaning and pronunciation
• A bag of pronunciations

  Mixtures    1      4      8      12     16     18
  Phoneme   57.07  70.11  75.54  78.26  77.72  78.80
  Grapheme   9.78  22.83  33.70  42.93  48.37  51.63
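The "bag of pronunciations" idea is what makes the grapheme lexicon trivial: a word's grapheme pronunciation is simply its spelling. A minimal sketch (the function name is illustrative, not from the original system):

```python
def grapheme_entry(word: str) -> list[str]:
    """A grapheme lexicon entry is just the word's letter sequence;
    non-letter characters are dropped."""
    return [ch for ch in word.upper() if ch.isalpha()]

# No pronunciation dictionary or G2P model is needed:
entries = {w: grapheme_entry(w) for w in ["table", "tide", "I'd"]}
```

Contrast this with the phoneme lexicon, which needs hand-crafted or G2P-generated pronunciations for every entry.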
Grapheme: Thinking The Essence
• A composition of linguistic and acoustic cues
• Context-dependent grapheme: a unit of phonology
             Context-independent   Tri-phoneme/grapheme
  Phoneme          79.64                 87.42
  Grapheme         52.98                 86.88

Thanks to Motoyuki
Grapheme: Thinking The Essence
Where can graphemes be used? When there is:
• High dependency between graphemes and phonemes, or
• Strict obedience to phonology rules, or
• A powerful language model (or other constraints) for discrimination
[Diagram: linguistic (Lg.) vs. acoustic (Ac.) axes, with graphemes g1, g2 and phonemes p1, p2; example: tide / t-i-d]
Grapheme: Thinking The Essence
Can the grapheme lexicon be refined?
• Graphemes and phonemes follow different sharing strategies
• The grapheme lexicon can be made more concrete in some way
• Searching becomes more difficult; how can this be resolved?
[Diagram: grapheme (Gr.) vs. phoneme (Ph.) sharing; e.g. the graphemes U, A, I, E map to the phonemes ai, ei, ih, ax]
Grapheme System on WSJCAM0

Pure grapheme/phoneme recognizer:

             Basic decoder   + 8 letter-gram   + lexicon constraint
  Phoneme        63.81              -                   -
  Grapheme       50.07            88.18               89.49

Grapheme and phoneme word decoders:

             Lexicon   + bigram lattice   + trigram rescore   HDecode
  Phoneme     60.46         87.71              91.02           90.84
  Grapheme    58.68         84.80              89.18           87.02
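The "+ 8 letter-gram" constraint above is a letter n-gram LM over grapheme sequences. A toy sketch of such a model, assuming simple add-alpha smoothing (the real system's order-8 setup and smoothing scheme are not specified here):

```python
import math
from collections import defaultdict

def train_letter_ngram(words, n=3):
    """Count letter n-grams with word-boundary padding. The talk used
    an 8-gram; n=3 keeps this sketch readable."""
    counts, context = defaultdict(int), defaultdict(int)
    for w in words:
        s = "^" * (n - 1) + w.upper() + "$"
        for i in range(len(s) - n + 1):
            counts[s[i:i + n]] += 1
            context[s[i:i + n - 1]] += 1
    return counts, context

def logprob(word, counts, context, n=3, alpha=1.0, vocab=28):
    """Add-alpha smoothed log-probability of a word's letter sequence."""
    s = "^" * (n - 1) + word.upper() + "$"
    lp = 0.0
    for i in range(len(s) - n + 1):
        lp += math.log((counts[s[i:i + n]] + alpha) /
                       (context[s[i:i + n - 1]] + alpha * vocab))
    return lp

# English-like letter strings score higher than random strings:
counts, context = train_letter_ngram(["table", "cable", "fable"])
```

Such a model constrains the grapheme decoder toward letter sequences that look like real spellings, which is why it lifts accuracy so sharply in the table above.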
Grapheme System on WSJCAM0

Graphemes from letter pairs:

  Strategy                                  WER
  Single letter                            89.18
  Single + TH/TION/SION/SH/CH/GH/PH        89.25
  Single + TH/SH/GH/PH                     89.70
  Single + TH/SH, PH→F, GH→""              89.30

All the letter-pair schemes contain mappings like AU→O.
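Splitting words into such mixed single-letter and letter-group units can be done with a greedy longest-match scan. A sketch (the group inventory is taken from the table above; the code itself is illustrative, not the original tooling):

```python
MULTI = ["TION", "SION", "TH", "SH", "CH", "GH", "PH"]  # groups from the table above

def split_graphemes(word: str, multi=MULTI) -> list[str]:
    """Greedy longest-match split into letter-group units plus single letters."""
    word, out, i = word.upper(), [], 0
    groups = sorted(multi, key=len, reverse=True)  # try longer groups first
    while i < len(word):
        for m in groups:
            if word.startswith(m, i):
                out.append(m)
                i += len(m)
                break
        else:
            out.append(word[i])
            i += 1
    return out
```

Greedy longest match is the simplest policy; it cannot distinguish, say, a TH that spans a morpheme boundary, which may explain why the gains in the table are marginal.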
Grapheme System on WSJCAM0

Question sets for tri-grapheme clustering:
• singular (*the set CMU uses)
• phoneme-grapheme mapping (*currently used)
• grapheme-phoneme mapping
• data-driven

Singular vs. phoneme-grapheme mapping question sets:

  Singular (9 mix)   Phoneme mapping (9 mix)
       88.44                 89.18

Simple vs. complex phoneme-grapheme mapping question sets:

  Simple (6 mix)   Complex (6 mix)
      87.24             87.74
Grapheme-Based Spoken Term Detection

There are several approaches to Spoken Term Detection (STD) and Spoken Document Retrieval (SDR):
• Acoustic detection
• LVCSR-based detection
• Phoneme-lattice-based detection
• Hybrid approaches

Several features make grapheme-based STD/SDR feasible:
• No special requirement on LVCSR accuracy
• In almost all cases, only words with clear meaning will be searched, which provides linguistic discrimination
• It avoids word-to-phoneme conversion, which is almost inevitable in any other system
Grapheme-Based Spoken Term Detection

Phoneme-based detection:
• P1: Word lattice generated by HVite (no higher-level LM, only a bigram word lattice)
• P2: Word lattice generated by HDecode (no pre-built word lattice, but with a higher-level LM)
• P3: Phoneme lattice generated by HVite

Grapheme-based detection:
• G1: Word lattice generated by HVite (no higher-level LM, only a bigram word lattice)
• G2: Word lattice generated by HDecode (no pre-built word lattice, but with a higher-level LM)
• G3: Grapheme lattice generated by HVite
• G4: Grapheme lattice generated by HDecode (with an 8-gram grapheme LM)
• G5: Word lattice generated by HVite (no LM at all, just the lexicon)
Grapheme-Based Spoken Term Detection

Most frequent word detection:

       HIT   False Accepts   Real Occ.   FOM
  P1  1315       2359          1324     89.24
  P2  1320        855          1324     94.14
  P3   302        289          1324     20.23
  G1  1315       2678          1324     83.33
  G2  1315        690          1324     87.73
  G3   974       4053          1324     49.82
  G4  1295        503          1324     88.31
  G5  1274       1411          1324     77.29

The 80 most frequent words are selected from the 5k dictionary according to LM unigram frequency; each must occur at least 3 times, and stop words are filtered out.
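The selection procedure described above can be sketched as follows (the function name and toy data are illustrative, not the original scripts):

```python
def pick_terms(unigram_freq, occurrences, stop_words,
               k=80, most_frequent=True, min_occ=3):
    """Pick k query terms ranked by LM unigram frequency, keeping only
    words that occur at least min_occ times in the test audio and are
    not stop words."""
    candidates = [w for w in unigram_freq
                  if w not in stop_words and occurrences.get(w, 0) >= min_occ]
    candidates.sort(key=lambda w: unigram_freq[w], reverse=most_frequent)
    return candidates[:k]

# Toy example (the real setup used the 5k dictionary and 80 terms):
freq = {"the": 100, "said": 50, "market": 30, "zloty": 1}
occ = {"the": 9, "said": 5, "market": 4, "zloty": 3}
```

Setting most_frequent=False yields the least-frequent-word list used on the next slide.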
Grapheme-Based Spoken Term Detection

Least frequent word detection:

       HIT   False Accepts   Real Occ.   FOM
  P1    85         78            88      94.37
  P2    85         34            88      95.85
  P3    17         26            88      18.95
  G1    81         88            88      90.94
  G2    81         36            88      91.31
  G3    32        905            88      31.90
  G4    69         16            88      76.93
  G5    67        252            88      73.55

The 80 least frequent words are selected from the 5k dictionary according to LM unigram frequency; each must occur at least once, and stop words are filtered out.
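From the HIT, false-accept, and real-occurrence counts, simple detection ratios can be recovered. Note these are not the NIST figure of merit reported in the FOM column, which averages detection rate over 0 to 10 false alarms per hour; they are just the raw ratios the table supports:

```python
def hit_rate(hits: int, real_occ: int) -> float:
    """Fraction of true occurrences that were detected."""
    return hits / real_occ

def precision(hits: int, false_accepts: int) -> float:
    """Fraction of putative detections that were correct."""
    return hits / (hits + false_accepts)

# Numbers from the least-frequent-word table above:
p2 = (hit_rate(85, 88), precision(85, 34))
g4 = (hit_rate(69, 88), precision(69, 16))
```

These ratios make the trade-off visible: G4 trades some hits for far fewer false accepts than G5.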
Grapheme-Based Spoken Term Detection

How to handle OOV words:
• P3, G3, and G4 can detect OOV words directly, without any change to the existing results
• If the audio may be re-searched, OOV words can be added to the lexicon on the fly, so G1, G2, and G5 can be used, again without any change to the existing results
Grapheme-Based Spoken Term Detection

How to handle words never seen (not in the LM):
• P3 and G3 are unaffected
• G4 is affected; we delete all training sentences containing the target words and test again
• If the audio may be re-searched, these words can be added to the vocabulary, but as UNKNOWN words in the LM, so G1 and G2 can be used; we tested only G2 in this case
• If the audio may be re-searched, these words can be added to the vocabulary on the fly, so G5 can be used; the result is unchanged because G5 does not use an LM

       HIT   False Accepts   Real Occ.   FOM
  P3    17         26            88      20.23
  G2    84        553            88      90.89
  G3    32        905            88      49.82
  G4    20          0            88      22.73
  G5    67        252            88      73.55
Grapheme-Based Spoken Term Detection

Performance Test Phase I: Recognition (recognizing 3 sentences)

       Time    Storage (k)
  P1   4:52        865
  P2   0:49        288
  P3  10:02      9,753
  G1   2:44      1,097
  G2   0:49        274
  G3   5:54     22,679
  G4   1:58        356
  G5   4:38      1,113

• Grapheme systems normally generate larger lattices in less time
• G4 is good at generating high-quality lattices in a short time
Grapheme-Based Spoken Term Detection

Performance Test Phase II: Indexing (indexing the 80 most frequent words)

        Time      Storage (k)
  P1     0:34       34,893
  P2     0:24       28,008
  P3    28:51      974,521
  G1     0:40      116,224
  G2     0:33       38,091
  G3  1:35:45    2,373,737
  G4     0:34       31,873
  G5     1:01       75,644

• Indexing time is basically determined by lattice size
• Grapheme lattices seem faster to index, perhaps because of the single lexicon entry per word?
Grapheme-Based Spoken Term Detection

What conclusions can we draw from these results?
• In principle, the phoneme system works well for in-vocabulary words
• Graphemes with long-span language models work well on OOV words
• If the audio may be searched again, G2 is the best way to deal with OOV words, even words never seen before
• A hybrid system is obviously a promising solution
Future Work

Vocabulary Refinement
• Two-pass decoder: recall acoustic evaluation during rescoring?
• One-pass decoder: look afterward when reaching a word boundary?

Language Migration and Adaptation (thanks to Partha):

  Pure Chinese   Pure English   English porting   After adaptation
     49.88           89.18           1.91              49.05

• Languages with different pronunciation bases are hard to migrate between
• Languages with different phonology rules are hard to migrate between
• This is intrinsic to graphemes, as they are compounds of acoustic and linguistic units
Future Work

Large File Alignment
• The strictest language constraint: the transcript itself
• Benefits from unsupervised learning

Using the WSJCAM0 grapheme system to recognize an mp3 downloaded from the internet:

  Direct application   On-line adaptation every 100 short segments
       36.02                           65.57

• A large number of OOVs appear in on-line books and conferences
• Bad word pieces, for example {I’d}, must be handled

Grapheme-based ASR may become a powerful spider that steadily updates itself by finding suitable audio segments; cooperating with a text spider that provides an ever-larger, up-to-date LM, it could make much human audio indexable, without separate components such as grapheme-to-phoneme statistics.
Future Work
Language Identification
• Graphemes by nature carry linguistic information
• Currently the most successful identifier is a phoneme decoder with a phoneme language model
• The very reason graphemes are unsuitable for language porting is exactly why they are suitable for identification
Final Page
We cannot hope Grapheme will be a good transcriber, but we really hope it will be a good information miner...

Most of the ideas come from Simon and Joe; they can answer any questions if I do not understand!