1. A Single End-Point DTW Algorithm for Keyword Spotting
Yong-Sun Choi and Soo-Young Lee
Brain Science Research Center and
Department of Electrical Engineering and Computer Science
Korea Advanced Institute of Science and Technology
2. Contents

• Keyword Spotting
  – Meaning & necessity
  – Problems
• Dynamic Time Warping (DTW)
  – Advantages of DTW
  – Some conventional types & the proposed DTW type
• Experimental Results
  – Verification of the proposed DTW's performance
  – Standard threshold setting
  – Results under various conditions
• Conclusions
3. Keyword Spotting

• Meaning
  – Detection of pre-defined keywords in continuous speech
  – Example:
    • Keywords: 'open', 'window'
    • Input: "um…okay, uh… please open the…uh…window"
• Necessity
  – Humans may utter OOV (Out-Of-Vocabulary) words and sometimes stammer
  – But the machine only needs a few specific words for recognition
4. Problems & Goal

• Difficulties
  – of the process
    • End-point detection (EPD) of the speech segment
    • Rejection of OOVs
  – of the implementation
    • A heavy computational load
    • A complex algorithm
    • Hard to build a real hardware system
• Goal
  – A simple and fast algorithm
5. DTW for Keyword Spotting

• Hidden Markov Model (HMM)
  – A statistical model: needs a large amount of training data
  – A complex algorithm: hard to implement in hardware
  – Many parameters: can cause memory problems
• Dynamic Time Warping (DTW)
  – Advantages
    • A small amount of training data
    • A simple algorithm (addition & multiplication)
    • A small amount of stored data
  – Weak points
    • Needs an EPD process; many calculations
6. General DTW Process

• Both end points are known
• Repetition of searches
• Finding corresponding frames
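The general DTW procedure on this slide — both end points known, a repeated search for corresponding frames — can be illustrated as follows. This is a minimal sketch: the Euclidean frame distance, the symmetric step set (diagonal, vertical, horizontal), and the length normalization are illustrative assumptions, not the slide's exact configuration.

```python
import numpy as np

def dtw_distance(ref, test):
    """Classic DTW between two feature sequences with both end points known.

    ref, test: 2-D arrays (frames x features). Returns the length-normalized
    accumulated distance of the best warping path from (0, 0) to (R-1, T-1).
    """
    R, T = len(ref), len(test)
    # Local frame-to-frame distances (Euclidean, via broadcasting).
    d = np.linalg.norm(ref[:, None, :] - test[None, :, :], axis=2)
    D = np.full((R, T), np.inf)
    D[0, 0] = d[0, 0]
    for i in range(R):
        for j in range(T):
            if i == 0 and j == 0:
                continue
            # Allowed predecessors: diagonal, vertical, horizontal steps.
            best = min(
                D[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
                D[i - 1, j] if i > 0 else np.inf,
                D[i, j - 1] if j > 0 else np.inf,
            )
            D[i, j] = d[i, j] + best
    # Normalize so patterns of different lengths are comparable.
    return D[-1, -1] / (R + T)
```

Note the full R×T grid search: this is the "repetition of searches" that makes the general process expensive when end points are unknown and every segment must be tried.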
7. Advanced DTW

• Myers, Rabiner, and Rosenberg
  – No EPD process
  – A series of small-area searches
    • Global search within one area
    • The next area is set around the best match point of the local area
  – Reduces the amount of calculation, but it remains large
  – Tested on isolated word recognition
8. Proposal – Shape & Weights

• No EPD process
• Only one path
  – Select the best match point and search again from that point
  – Fewer computations
• Modified weights
  – To compensate for weight-sum differences
    • For the search
    • For distance accumulation
9. Proposal – End Point

• Small search area
  – Successive local searches
  – The search starts at a single point
• End condition
  – Reached when the path arrives at the last frame of the reference pattern
  – The end point is set automatically
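The single-end-point idea can be sketched as one greedy path of successive small local searches that stops at the last reference frame, so the test-side end point emerges automatically. The three-move local area, the purely greedy selection, and the omission of the modified weights are simplifying assumptions of this sketch.

```python
import numpy as np

def single_endpoint_dtw(ref, test, start):
    """Follow one path of successive small local searches from test frame
    `start`; stop when the last reference frame is reached. The test frame
    where the search stops is the automatically detected end point."""
    i, j = 0, start
    total = np.linalg.norm(ref[0] - test[j])
    while i < len(ref) - 1:
        # Small local search area: advance in the reference, the test, or both.
        moves = [(i + 1, j), (i + 1, j + 1), (i, j + 1)]
        moves = [(a, b) for a, b in moves if b < len(test)]
        if not moves:
            break
        # Keep only the best match point and search again from there.
        a, b = min(moves, key=lambda m: np.linalg.norm(ref[m[0]] - test[m[1]]))
        total += np.linalg.norm(ref[a] - test[b])
        i, j = a, b
    return total, j  # accumulated distance, detected end frame
```

Because only one path is extended, the work per starting frame is linear in the reference length rather than quadratic in both pattern lengths.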
10. Proposal – Distance

• Modified distance
  – Uses the difference of the pattern lengths
  – Pattern lengths of the same word are similar to each other
D' = D · exp(|T_E / R_E − 1|)

(D: accumulated DTW distance; T_E, R_E: end frames, i.e. lengths, of the test and reference patterns)
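The distance modification — scaling the accumulated distance by the relative mismatch of the test and reference pattern lengths — might be sketched as below. Since pattern lengths of the same word are similar, a large mismatch should raise the distance; the exact exponential form and the function name are assumptions of this sketch.

```python
import math

def length_penalized(D, test_len, ref_len):
    """Scale accumulated DTW distance D by a pattern-length-mismatch
    penalty (assumed form: exp of the relative length difference)."""
    return D * math.exp(abs(test_len / ref_len - 1.0))
```

With equal lengths the penalty factor is 1 and the distance is unchanged; the factor grows smoothly as the detected test segment's length drifts away from the reference's.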
11. DTW – Computation Loads

• Comparison of the computation loads of the three DTW types
  – [Load formulas, in terms of the number of keywords, the pattern lengths, and the local search-area size: equations not recoverable from the extracted text]
12. Database & Experiment Sets

• DB: RoadRally
  – Designed for keyword spotting
  – Telephone-channel speech
  – Usage
    • 11 keywords (434 occurrences in total)
    • Read speech from 40 male speakers (47 min in total) from the Stonehenge portion
• Set construction
  – 4 subsets (about 108 keyword occurrences per set)
  – 3 sets for training, 1 set for testing
  – 2 reference patterns per keyword per set
13. Verification Result

• Isolated word recognition: 3 sets for training, 1 set for testing

Test Set   General DTW (%)   Proposed DTW (%)
1          96.3              98.2
2          100.0             99.1
3          96.3              95.4
4          97.2              97.2
Avg.       97.5              97.5
14. Experimental Setup

• Assumption
  – Any frame can be the last frame of a keyword
• Threshold
  – To reject OOVs
  – One threshold per reference
  – Standard threshold: no false alarms on the training set
• Result presentation
  – ROC (Receiver Operating Characteristic) curves
    • X-axis: false alarms / hour / keyword
    • Y-axis: recognition rate
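The standard-threshold rule — one threshold per reference, tuned so that the training set produces no false alarms — can be sketched as follows. Treating a detection as firing when its DTW distance falls below the threshold, and the function name, are assumptions of this sketch.

```python
def standard_threshold(keyword_dists, oov_dists):
    """Per-reference standard threshold: the largest value that still rejects
    every OOV candidate in the training set. A detection fires when its DTW
    distance falls below the threshold."""
    thr = min(oov_dists)  # no OOV distance lies below this -> no false alarms
    rate = sum(d < thr for d in keyword_dists) / len(keyword_dists)
    return thr, rate  # threshold and training-set recognition rate
```

Sweeping the threshold above and below this standard value trades recognition rate against false alarms, which is exactly what the ROC curves plot.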
15. Threshold Setting & Recognition Rate on the Training Set

• Training set = test set (no false alarms)
Keyword      Right   Total   %
Mountain     21      40      52.5
Secondary    38      40      95.0
Middleton    27      37      73.0
Boonsboro    32      39      82.1
Conway       33      40      82.5
Thicket      30      39      77.0
Primary      34      40      85.0
Minus        25      39      64.1
Interstate   37      40      92.5
Waterloo     35      40      87.5
Retrace      36      40      90.0
Total        368     434     84.8
16. Result – DTW & HMM

• ROC curves [figure]
17. Changing Conditions

• No. of keywords / No. of references [ROC figure panels]
18. Conclusion

• Proposed DTW
  – Advantages
    • Simple structure: addition & multiplication only (good for hardware)
    • No EPD processing
    • Very small computation load
    • Small amount of stored data (only keyword information): small memory
  – Good performance
• Keyword Spotting
  – Better than HMM when the training data are scarce