Using Conversational Word Bursts in Spoken Term Detection Justin Chiu Language Technologies Institute Presented at University of Cambridge September 6 2013


Page 1: Using Conversational Word Bursts in Spoken Term Detection

Using Conversational Word Bursts in Spoken Term Detection

Justin Chiu
Language Technologies Institute

Presented at University of Cambridge
September 6, 2013

Page 2: Using Conversational Word Bursts in Spoken Term Detection

Introduction

• Spoken Term Detection (STD): detecting word or phrase targets in conversational speech

• Queries: Text

• Target: Audio

• The BABEL program: rapid development of keyword search for novel languages, with limited resources

• This research: using (ideally universal) properties of conversation to enhance performance

Page 3: Using Conversational Word Bursts in Spoken Term Detection

Current Approach

1. ASR produces candidates (as a confusion network)

2. Search on the ASR result identifies targets
   – Single-word query: posterior probability of the query word in the confusion network
   – Multi-word query: product of the posterior probabilities of each word in the confusion network

3. Decide the detection result by comparing the search result probability with a threshold (YES/NO decision)
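
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the confusion network is modelled as a list of slots mapping words to posteriors, a multi-word query is matched against consecutive slots, and the names `query_score` and `detect` are hypothetical rather than taken from the talk.

```python
from math import prod

def query_score(cn, query):
    """Best score of `query` (a list of words) over all start slots.

    `cn` is a list of slots; each slot is a dict word -> posterior.
    This data layout is an assumption made for the sketch.
    """
    n = len(query)
    best = 0.0
    for start in range(len(cn) - n + 1):
        # Product of the per-word posteriors; for a single-word query
        # this reduces to the posterior of that word in the slot.
        score = prod(cn[start + k].get(word, 0.0)
                     for k, word in enumerate(query))
        best = max(best, score)
    return best

def detect(cn, query, threshold):
    """YES/NO decision: compare the search score with a threshold."""
    return query_score(cn, query) >= threshold

# Toy confusion network with three slots.
cn = [{"how": 0.9, "who": 0.1}, {"much": 0.6, "match": 0.4}, {"is": 1.0}]
print(query_score(cn, ["much"]))          # single-word query
print(query_score(cn, ["how", "much"]))   # multi-word query
print(detect(cn, ["who", "match"], 0.1))  # thresholded decision
```

Because the score is a product of posteriors, a single poorly recognized word can push a multi-word query below the threshold, which is one way to see why the approach is sensitive to ASR quality.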

Page 4: Using Conversational Word Bursts in Spoken Term Detection

Challenge for Current Approach

• Sensitive to ASR performance

• High-performance ASR requires extensive training resources; limited resources yield low-accuracy ASR

• In the BABEL LimitedLP condition, only 10 hours of training data are provided

Page 5: Using Conversational Word Bursts in Spoken Term Detection

Intuition

• Can we use other sources of information to enhance STD performance even with poor ASR results?

• We specifically focus on conversational structure, since we believe it is language-independent

Page 6: Using Conversational Word Bursts in Spoken Term Detection

Word Burst

• Word burst refers to a phenomenon in conversational speech in which particular content words tend to occur in close proximity to each other, as a byproduct of some topic under discussion

• Content word: a word that does not occur with high frequency

Page 7: Using Conversational Word Bursts in Spoken Term Detection

[Figure: occurrence pattern for the Tagalog word “magkano”. Horizontal axis: seconds (0 to 600); vertical axis: conversations.]

Page 8: Using Conversational Word Bursts in Spoken Term Detection

[Figure: occurrence pattern for the Tagalog word “magkano”, with a Word Burst highlighted. Horizontal axis: seconds (0 to 600); vertical axis: conversations.]

Page 9: Using Conversational Word Bursts in Spoken Term Detection

[Figure: occurrence pattern for the Tagalog word “magkano”, with one region labelled “Word Burst” and one labelled “No Word Burst”. Horizontal axis: seconds (0 to 600); vertical axis: conversations.]

Page 10: Using Conversational Word Bursts in Spoken Term Detection

Word Burst rescoring on the Confusion Network

• Give a bonus to the posterior probability of a content word that is in a Word Burst

• Penalize the posterior probability of a content word that is not in a Word Burst

• Goal: improve the quality of the confusion network for better search performance

Page 11: Using Conversational Word Bursts in Spoken Term Detection

Word Burst Rescoring Parameters

• Window Size: time region for deciding Word Burst

• Stop word %: top X% of words in training data; assumed not to be content words

• Penalty for non-burst word: penalty for words that do not appear in a Word Burst

• Trained from language pack (development) data
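
As a rough illustration of the "Stop word %" parameter, the sketch below picks the most frequent word types in the training data as stop words. It assumes that "top X% of words" means X% of the distinct word types ranked by frequency; the slide does not say whether types or tokens are meant, and the function name `stop_words` is hypothetical.

```python
from collections import Counter

def stop_words(training_tokens, stop_percent):
    """Return the top `stop_percent` % most frequent word types.

    Assumption: the percentage is taken over distinct word types,
    not over running tokens.
    """
    counts = Counter(training_tokens)
    n_stop = int(len(counts) * stop_percent / 100)
    return {word for word, _ in counts.most_common(n_stop)}

# Toy training data with four word types.
tokens = ["a"] * 5 + ["b"] * 3 + ["c", "d"]
print(stop_words(tokens, 25))
print(stop_words(tokens, 50))
```

Every word outside the returned set would then be treated as a content word and be eligible for the bonus or the penalty.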

Page 12: Using Conversational Word Bursts in Spoken Term Detection

Word Burst Rescoring Formula

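Rescored posterior q(x_i) for a hypothesis x_i with original posterior p(x_i):

• If \exists x_j \in window(x_i), i \neq j: q(x_i) is obtained from p(x_i) and the bonus term b(x_i)

• If \nexists x_j \in window(x_i), i \neq j: q(x_i) = p(x_i) * penalty(L)

• b(x_i) is built from the terms d(x_i, x_j) and d(x_i, x_j) * p(x_j), taken over j with i \neq j

• d(x_i, x_j) = 1 - dis(x_i, x_j) / windowsize, i \neq j

(The operators joining p(x_i) with b(x_i), and those combining the terms inside b(x_i), are missing from this copy of the slide.)

[Figure: posterior probability p(x) plotted against time.]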
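
The rescoring idea can be sketched as follows. The penalty case and the proximity weight d(x_i, x_j) = 1 - dis(x_i, x_j) / windowsize follow the slide; the exact form of the bonus b(x_i) is not legible in the slide text, so the sketch substitutes an assumed multiplicative bonus, 1 plus the proximity-weighted sum of neighbouring posteriors, capped at 1. The function names and the (word, time, posterior) tuple layout are likewise hypothetical.

```python
def proximity(t_i, t_j, window):
    """d(x_i, x_j) = 1 - dis(x_i, x_j) / windowsize, as on the slide."""
    return 1.0 - abs(t_i - t_j) / window

def rescore(hyps, window, penalty, stop_words=frozenset()):
    """Rescore posteriors of word hypotheses.

    hyps: list of (word, time_sec, posterior) tuples.
    Returns the list of rescored posteriors, in the same order.
    """
    out = []
    for i, (w_i, t_i, p_i) in enumerate(hyps):
        if w_i in stop_words:
            out.append(p_i)  # stop words are not content words: unchanged
            continue
        # Other hypotheses of the same word inside the time window.
        neighbours = [(t_j, p_j)
                      for j, (w_j, t_j, p_j) in enumerate(hyps)
                      if j != i and w_j == w_i and abs(t_i - t_j) < window]
        if neighbours:
            # ASSUMED bonus form, not the formula from the talk.
            bonus = 1.0 + sum(proximity(t_i, t_j, window) * p_j
                              for t_j, p_j in neighbours)
            out.append(min(1.0, p_i * bonus))
        else:
            # Isolated content word: multiply by the penalty.
            out.append(p_i * penalty)
    return out

hyps = [("magkano", 10.0, 0.4), ("magkano", 16.0, 0.5),
        ("bahay", 50.0, 0.3), ("ang", 51.0, 0.9)]
print(rescore(hyps, window=12.0, penalty=0.1, stop_words={"ang"}))
```

In the toy input, the two "magkano" hypotheses lie 6 seconds apart and reinforce each other, the isolated "bahay" is penalized, and the stop word "ang" is left alone.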


Page 18: Using Conversational Word Bursts in Spoken Term Detection

Datasets

• 4 languages, 2 different sizes of training data (80 hours, 10 hours)

• An additional 10 hours of data in each language for 5-fold cross-validation

Language    Setup      Lexicon Size
Cantonese   FullLP     18769
            LimitedLP  5112
Pashto      FullLP     17904
            LimitedLP  6219
Tagalog     FullLP     21098
            LimitedLP  5565
Turkish     FullLP     38849
            LimitedLP  10173

Page 19: Using Conversational Word Bursts in Spoken Term Detection

ATWV

• Actual Term Weighted Value

• In BABEL, C/V is 0.1 and the prior term probability Pr_term is 10^-4

• A correct detection is worth much more than avoiding a false alarm

\[
TWV(\theta) = 1 - \mathrm{mean}\{ P_{Miss}(term, \theta) + \beta \cdot P_{FA}(term, \theta) \}
\]

\[
\beta = \frac{C}{V} \left( Pr_{term}^{-1} - 1 \right)
\]
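
To make the metric concrete, here is a small sketch of the Term Weighted Value with β = (C/V)(1/Pr_term − 1). With C/V = 0.1 and Pr_term = 10^-4 this gives β = 999.9. The function names and the per-term (P_Miss, P_FA) input format are assumptions for illustration.

```python
def beta(cost_value_ratio=0.1, pr_term=1e-4):
    """beta = (C/V) * (1 / Pr_term - 1)."""
    return cost_value_ratio * (1.0 / pr_term - 1.0)

def twv(per_term, b):
    """Term Weighted Value at one fixed threshold.

    per_term: list of (p_miss, p_fa) pairs, one pair per query term.
    """
    losses = [p_miss + b * p_fa for p_miss, p_fa in per_term]
    return 1.0 - sum(losses) / len(losses)

b = beta()
print(b)
# Two toy terms: one with misses only, one with a small false-alarm rate.
print(twv([(0.5, 0.0), (0.3, 0.0001)], b))
```

ATWV is this value evaluated at the threshold the system actually used for its YES/NO decisions.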

Page 20: Using Conversational Word Bursts in Spoken Term Detection

Comparison between 10/80 hours of training data

• In the condition with limited data, the introduction of additional information can compensate for the lack of training data

Language    Setup      Baseline  Rescore  Change
Cantonese   FullLP     0.322     0.320    -0.7%
            LimitedLP  0.103     0.109    +6%
Pashto      FullLP     0.214     0.215    +0.5%
            LimitedLP  0.095     0.114    +19%
Tagalog     FullLP     0.358     0.358    -0.3%
            LimitedLP  0.130     0.144    +11%
Turkish     FullLP     0.385     0.385    +0.1%
            LimitedLP  0.262     0.265    +1%

Page 21: Using Conversational Word Bursts in Spoken Term Detection

Tuned LimitedLP(10h) performance

• Rescoring significantly reduces false alarms in the detection results, indicating that penalizing isolated content words is beneficial to STD performance

Language    Baseline  Rescore  ∆ATWV  ∆FA
Cantonese   0.114     0.118    4%     -21%
Pashto      0.073     0.094    29%    -32%
Tagalog     0.143     0.159    11%    -21%
Turkish     0.241     0.245    2%     +4%

Page 22: Using Conversational Word Bursts in Spoken Term Detection

Best Parameters for LimitedLP(10h)

• Apart from Turkish, parameter values are similar across languages.

• Words do not repeat in the same form in Turkish, an agglutinative language.

Language    Window (sec)  Stop (%)  Penalty
Cantonese   12            4         0.1
Pashto      18            1         0.05
Tagalog     11            2         0.15
Turkish     9             8         1

Page 23: Using Conversational Word Bursts in Spoken Term Detection

Analysis/Discussion

1. The impact of Word Burst on WER?
2. How often does a Word Burst occur?
3. Issues in Turkish/Vietnamese?
4. Difference between Word Burst and cache-based/trigger-based language models?

Page 24: Using Conversational Word Bursts in Spoken Term Detection

WER

• Word Burst does not provide significant improvement on WER

• The rescoring helps the candidate’s probability to cross the detection threshold

Page 25: Using Conversational Word Bursts in Spoken Term Detection

How often do Word Bursts occur?

• Query/content word burst percentage (10-second window)

• More than 35% of content words, and of query words in every language except Turkish, occur as part of a Word Burst, indicating its generality

• Query and content word burst percentages are similar, except for Turkish

Language      Cantonese  Pashto  Tagalog  Turkish
Content (%)   43.6       35.7    40.7     35.4
Query (%)     41.2       35.3    36.8     27.7
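
A burst percentage of this kind could be computed roughly as below. The sketch assumes that an occurrence counts as part of a Word Burst when another occurrence of the same word lies within the window (10 seconds here) before or after it in the same conversation; the slide does not spell out the exact counting rule, and `burst_percentage` is a hypothetical name.

```python
def burst_percentage(occurrences, window=10.0):
    """Percentage of word occurrences that fall inside a Word Burst.

    occurrences: dict word -> list of occurrence times (seconds)
    within a single conversation.
    """
    total = 0
    in_burst = 0
    for times in occurrences.values():
        ts = sorted(times)
        for k, t in enumerate(ts):
            # In sorted order it is enough to check the adjacent occurrences.
            near_prev = k > 0 and t - ts[k - 1] <= window
            near_next = k + 1 < len(ts) and ts[k + 1] - t <= window
            total += 1
            if near_prev or near_next:
                in_burst += 1
    return 100.0 * in_burst / total if total else 0.0

# "magkano" occurs twice close together and once in isolation.
occ = {"magkano": [5.0, 9.0, 100.0], "bahay": [40.0]}
print(burst_percentage(occ))
```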

Page 26: Using Conversational Word Bursts in Spoken Term Detection

Issues in Turkish/Vietnamese

• A Turkish word re-occurs in different forms, so Word Burst cannot be relied on for the same word

• A tool for morphological normalization might make it possible to find Word Bursts for the same word

• Vietnamese performance is in the same range as Cantonese

• Some of the evaluation queries are syllables, not words, for which a Word Burst does not seem reasonable

Page 27: Using Conversational Word Bursts in Spoken Term Detection

How Word Burst differs from Cache/Trigger Language Models

1. It considers the entire confusion set, not only the top-1 hypothesis

2. It examines both past and future; cache/trigger-based language models focus on the past

3. It focuses on time-based information instead of token-based information

Page 28: Using Conversational Word Bursts in Spoken Term Detection

Conclusion

• Our experiments indicate that Word Burst is a property of conversational speech and a language-independent phenomenon

• Information from language or conversational structure (such as Word Burst) can be used to improve performance in the STD task, particularly when language resources are otherwise limited.

• It is potentially useful in other applications

Page 29: Using Conversational Word Bursts in Spoken Term Detection

Special Thanks

Alex Rudnicky Alan Black

Florian Metze Yajie Miao Yuchen Zhang

Page 30: Using Conversational Word Bursts in Spoken Term Detection

Questions?