1. A Single End-Point DTW Algorithm for Keyword Spotting
Yong-Sun Choi and Soo-Young Lee
Brain Science Research Center and
Department of Electrical Engineering and Computer Science
Korea Advanced Institute of Science and Technology
2. Contents

• Keyword Spotting
  – Meaning & necessity
  – Problems
• Dynamic Time Warping (DTW)
  – Advantages of DTW
  – Some conventional types & the proposed DTW type
• Experimental Results
  – Verification of the proposed DTW's performance
  – Standard threshold setting
  – Results under various conditions
• Conclusions
3. Keyword Spotting

• Meaning
  – Detection of pre-defined keywords in continuous speech
  – Example:
    • Keywords: 'open', 'window'
    • Input: "um…okay, uh… please open the…uh…window"
• Necessity
  – Humans may utter OOV (Out-Of-Vocabulary) words and sometimes stammer
  – But the machine only needs a few specific words for recognition
4. Problems & Goal

• Difficulties
  – of the process
    • End-point detection (EPD) of the speech segment
    • Rejection of OOVs
  – of the implementation
    • A heavy computational load
    • A complex algorithm
    • Hard to build a real hardware system
• Goal
  – A simple and fast algorithm
5. DTW for Keyword Spotting

• Hidden Markov Model (HMM)
  – A statistical model: needs a large amount of training data
  – A complex algorithm: hard to implement in hardware
  – Many parameters: can cause memory problems
• Dynamic Time Warping (DTW)
  – Advantages
    • A small amount of training data
    • A simple algorithm (addition & multiplication)
    • A small amount of stored data
  – Weak points
    • Needs an EPD process; many calculations
6. General DTW Process

• Both end points are known
• Repetition of searches
• Finding corresponding frames
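The general DTW procedure on this slide — both end points known, a repeated search for corresponding frames — can be illustrated as follows. This is a minimal sketch: the Euclidean frame distance, the symmetric step set (diagonal, vertical, horizontal), and the length normalization are illustrative assumptions, not the slide's exact configuration.

```python
import numpy as np

def dtw_distance(ref, test):
    """Classic DTW between two feature sequences with both end points known.

    ref, test: 2-D arrays (frames x features). Returns the length-normalized
    accumulated distance of the best warping path from (0, 0) to (R-1, T-1).
    """
    R, T = len(ref), len(test)
    # Local frame-to-frame distances (Euclidean, via broadcasting).
    d = np.linalg.norm(ref[:, None, :] - test[None, :, :], axis=2)
    D = np.full((R, T), np.inf)
    D[0, 0] = d[0, 0]
    for i in range(R):
        for j in range(T):
            if i == 0 and j == 0:
                continue
            # Allowed predecessors: diagonal, vertical, horizontal steps.
            best = min(
                D[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
                D[i - 1, j] if i > 0 else np.inf,
                D[i, j - 1] if j > 0 else np.inf,
            )
            D[i, j] = d[i, j] + best
    # Normalize so patterns of different lengths are comparable.
    return D[-1, -1] / (R + T)
```

Note the full R×T grid search: this is the "repetition of searches" that makes the general process expensive when end points are unknown and every segment must be tried.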
7. Advanced DTW

• Myers, Rabiner, and Rosenberg
  – No EPD process
  – A series of small-area searches
    • Global search within one area
    • The next area is set around the best match point of the local area
  – Reduces the amount of calculation, but it remains large
  – Tested on isolated word recognition
8. Proposal – Shape & Weights

• No EPD process
• Only one path
  – Select the best match point and search again from that point
  – Fewer computations
• Modified weights
  – To compensate for weight-sum differences
    • For the search
    • For distance accumulation
9. Proposal – End Point

• Small search area
  – Successive local searches
  – The search starts at a single point
• End condition
  – Reached when the path arrives at the last frame of the reference pattern
  – The end point is set automatically
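The single-end-point idea can be sketched as one greedy path of successive small local searches that stops at the last reference frame, so the test-side end point emerges automatically. The three-move local area, the purely greedy selection, and the omission of the modified weights are simplifying assumptions of this sketch.

```python
import numpy as np

def single_endpoint_dtw(ref, test, start):
    """Follow one path of successive small local searches from test frame
    `start`; stop when the last reference frame is reached. The test frame
    where the search stops is the automatically detected end point."""
    i, j = 0, start
    total = np.linalg.norm(ref[0] - test[j])
    while i < len(ref) - 1:
        # Small local search area: advance in the reference, the test, or both.
        moves = [(i + 1, j), (i + 1, j + 1), (i, j + 1)]
        moves = [(a, b) for a, b in moves if b < len(test)]
        if not moves:
            break
        # Keep only the best match point and search again from there.
        a, b = min(moves, key=lambda m: np.linalg.norm(ref[m[0]] - test[m[1]]))
        total += np.linalg.norm(ref[a] - test[b])
        i, j = a, b
    return total, j  # accumulated distance, detected end frame
```

Because only one path is extended, the work per starting frame is linear in the reference length rather than quadratic in both pattern lengths.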
10. Proposal – Distance

• Modified distance
  – Uses the difference of the pattern lengths
  – Pattern lengths of the same word are similar to each other
D' = D · exp(|T_E / R_E − 1|)

(D: accumulated DTW distance; T_E, R_E: end frames, i.e. lengths, of the test and reference patterns)
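The distance modification — scaling the accumulated distance by the relative mismatch of the test and reference pattern lengths — might be sketched as below. Since pattern lengths of the same word are similar, a large mismatch should raise the distance; the exact exponential form and the function name are assumptions of this sketch.

```python
import math

def length_penalized(D, test_len, ref_len):
    """Scale accumulated DTW distance D by a pattern-length-mismatch
    penalty (assumed form: exp of the relative length difference)."""
    return D * math.exp(abs(test_len / ref_len - 1.0))
```

With equal lengths the penalty factor is 1 and the distance is unchanged; the factor grows smoothly as the detected test segment's length drifts away from the reference's.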
11. DTW – Computation Loads

• Comparison of the computation loads of the three DTW types
  – [Load formulas, in terms of the number of keywords, the pattern lengths, and the local search-area size: equations not recoverable from the extracted text]
12. Database & Experiment Sets

• DB: RoadRally
  – Designed for keyword spotting
  – Telephone-channel speech
  – Usage
    • 11 keywords (434 occurrences in total)
    • Read speech from 40 male speakers (47 min in total) from the Stonehenge portion
• Set construction
  – 4 subsets (about 108 keyword occurrences per set)
  – 3 sets for training, 1 set for testing
  – 2 reference patterns per keyword per set
13. Verification Result

• Isolated word recognition: 3 sets for training, 1 set for testing

Test Set   General DTW (%)   Proposed DTW (%)
1          96.3              98.2
2          100.0             99.1
3          96.3              95.4
4          97.2              97.2
Avg.       97.5              97.5
14. Experimental Setup

• Assumption
  – Any frame can be the last frame of a keyword
• Threshold
  – To reject OOVs
  – One threshold per reference
  – Standard threshold: no false alarms on the training set
• Result presentation
  – ROC (Receiver Operating Characteristic) curves
    • X-axis: false alarms / hour / keyword
    • Y-axis: recognition rate
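The standard-threshold rule — one threshold per reference, tuned so that the training set produces no false alarms — can be sketched as follows. Treating a detection as firing when its DTW distance falls below the threshold, and the function name, are assumptions of this sketch.

```python
def standard_threshold(keyword_dists, oov_dists):
    """Per-reference standard threshold: the largest value that still rejects
    every OOV candidate in the training set. A detection fires when its DTW
    distance falls below the threshold."""
    thr = min(oov_dists)  # no OOV distance lies below this -> no false alarms
    rate = sum(d < thr for d in keyword_dists) / len(keyword_dists)
    return thr, rate  # threshold and training-set recognition rate
```

Sweeping the threshold above and below this standard value trades recognition rate against false alarms, which is exactly what the ROC curves plot.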
15. Threshold Setting & Recognition Rate on the Training Set

• Training set = test set (no false alarms)
Keyword      Right   Total   %
Mountain     21      40      52.5
Secondary    38      40      95.0
Middleton    27      37      73.0
Boonsboro    32      39      82.1
Conway       33      40      82.5
Thicket      30      39      77.0
Primary      34      40      85.0
Minus        25      39      64.1
Interstate   37      40      92.5
Waterloo     35      40      87.5
Retrace      36      40      90.0
Total        368     434     84.8
16. Result – DTW & HMM

• ROC curves [figure]
17. Changing Conditions

• No. of keywords / No. of references [ROC figure panels]
18. Conclusion

• Proposed DTW
  – Advantages
    • Simple structure: addition & multiplication only (good for hardware)
    • No EPD processing
    • Very small computation load
    • Small amount of stored data (only keyword information): small memory
  – Good performance
• Keyword Spotting
  – Better than HMM when the training data are scarce