Dual-domain Hierarchical Classification of Phonetic Time Series

Hossein Hamooni, Abdullah Mueen

University of New Mexico, Department of Computer Science



What is a Phoneme?

Phonemes are very small units of intelligible sound (usually shorter than 200 ms).

Phonetic spelling is the sequence of phonemes that a word comprises.

Examples:

Coat ([kōt] /K OW T/)

From ([frəm] /F R AH M/)

Impressive ([imˈpresiv] /IH M P R EH S IH V/)
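As a small illustration of phonetic spelling, the example words can be stored as a lookup from each word to its phoneme sequence. The table name `PHONETIC_SPELLING` and the helper `spell` are hypothetical; only the three entries come from the slide.

```python
# Phoneme sequences for the three example words on this slide.
# The table and helper names are illustrative, not part of the authors' code.
PHONETIC_SPELLING = {
    "coat": ["K", "OW", "T"],
    "from": ["F", "R", "AH", "M"],
    "impressive": ["IH", "M", "P", "R", "EH", "S", "IH", "V"],
}


def spell(word):
    """Return the phoneme sequence for a word, or None if it is not in the table."""
    return PHONETIC_SPELLING.get(word.lower())


if __name__ == "__main__":
    for word in ("Coat", "From", "impressive"):
        print(word, "->", " ".join(spell(word)))
```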


Phoneme Classification

What is phoneme classification?

Input: A short segment of audio signal.

Output: The phoneme that the segment contains.

Phoneme classification is a complex task:

More than 100 classes (based on the International Phonetic Alphabet)

Variation in speakers, dialects, accents, environmental noise, etc.

Phoneme classification can be used in:

Robust speech recognition

Accent/dialect detection

Speech quality scoring


Related Work

Different methods for phoneme classification have been used in the literature:

Hidden Markov model [Lee, 1989]

Neural network [Schwarz, 2009]

Deep belief network [Mohamed, 2012]

Support vector machine [Salomon, 2001]

Hierarchical methods [Dekel, 2005]

Boltzmann machine [Mohamed, 2010]

Although the data mining community has shown that k-NN classifiers can work well on time series data, they have not yet been tried on phonemes.


[C. Lopes, F. Perdigao, 2011]


Our Dual-domain Approach


Time Domain:

Using k-NN Dynamic Time Warping (DTW)

Expensive

Speed up by lower bounding techniques

Frequency Domain:

Using k-NN Euclidean distance between Mel-frequency cepstral coefficients (MFCC)

Fast
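Both domains rely on k-NN and differ only in the distance used. The sketch below shows one possible k-NN routine with a pluggable distance, demonstrated with the Euclidean distance. The function names and the toy 3-dimensional vectors (standing in for MFCC features) are assumptions made for illustration; MFCC extraction and the hierarchy are not shown.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, training_set, distance, k=1):
    """Label `query` by majority vote among its k nearest training examples.

    `training_set` is a list of (features, label) pairs; `distance` is any
    function of two examples, so the same routine could be reused with DTW.
    """
    ranked = sorted(training_set, key=lambda item: distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy vectors standing in for MFCC features (made-up numbers, not real data).
train = [
    ([1.0, 0.2, 0.1], "AH"),
    ([0.9, 0.3, 0.0], "AH"),
    ([0.1, 1.1, 0.8], "S"),
    ([0.0, 0.9, 1.0], "S"),
]
print(knn_classify([0.95, 0.25, 0.05], train, euclidean, k=3))
```

Passing the distance as a parameter keeps the classifier identical across the two domains; only the representation of each example changes.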


Real Example


Challenge


DTW is expensive (quadratic in time and space complexity)

We need to apply a speed-up technique.

Solution: Lower bounding techniques
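To make the cost concrete, here is a minimal sketch of DTW computed with a dynamic-programming table. It assumes squared point-wise costs and a warping band |i - j| <= w; the function name `dtw_distance` and these choices are illustrative and may differ from the authors' implementation.

```python
import math


def dtw_distance(a, b, w):
    """DTW distance (sum of squared differences along the best warping path)
    between sequences a and b, restricted to the band |i - j| <= w.

    Fills an (n+1) x (m+1) table, hence quadratic time and space when w is
    large. Returns math.inf if the band is too narrow to connect the ends.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band around the diagonal are filled.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # b[j-1] is matched again
                                 cost[i][j - 1],      # a[i-1] is matched again
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3], w=1))  # 0.0
```

The nested loops are where the quadratic cost comes from, which is why skipping whole DTW computations matters more than tuning the inner loop.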


DTW Lower bounding


Resampling to equal length doesn’t always work!
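Lower bounding helps because a cheap bound can rule out a candidate before the full distance is computed. The sketch below shows a generic 1-NN search with this kind of pruning. The names `nearest_neighbor`, `toy_distance` and `toy_lower_bound` are hypothetical, and the toy functions merely stand in for DTW and its lower bound.

```python
import math


def nearest_neighbor(query, candidates, distance, lower_bound):
    """1-NN search that skips the expensive distance whenever the cheap
    lower bound already reaches the best distance found so far.

    Returns (best_index, best_distance, number_of_full_distance_calls).
    """
    best_index, best_distance, full_calls = None, math.inf, 0
    for index, candidate in enumerate(candidates):
        if lower_bound(query, candidate) >= best_distance:
            continue  # cannot beat the current best, so prune
        full_calls += 1
        d = distance(query, candidate)
        if d < best_distance:
            best_index, best_distance = index, d
    return best_index, best_distance, full_calls


# Toy stand-ins: an "expensive" distance and a cheap valid lower bound for it.
def toy_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def toy_lower_bound(a, b):
    return abs(a[0] - b[0])  # one term of the sum, so never larger than it


candidates = [[1, 1, 1], [9, 9, 9], [0, 1, 0], [7, 0, 0]]
print(nearest_neighbor([0, 0, 0], candidates, toy_distance, toy_lower_bound))
```

In this toy run only two of the four candidates need the full distance. The pruning is safe only if the bound never exceeds the true distance.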


DTW Lower bounding


We use the prefix of the longer signal (Prefixed LB_Keogh).

We show that Prefixed LB_Keogh is a lower bound if:

w > difference between the lengths of the two signals

We set w = c * length of the longer signal.

We ignore all pairs of signals that don’t satisfy the above condition.
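The following is one possible reading of Prefixed LB_Keogh, not necessarily the authors' exact formulation. It assumes squared point-wise costs and a DTW band |i - j| <= w, builds the min/max envelope of the shorter signal, and compares that envelope with the equally long prefix of the longer signal. The function names and the choice to return `None` for ignored pairs are assumptions.

```python
def window_size(a, b, c=0.3):
    """Warping window w = c * length of the longer signal (c = 30% here)."""
    return int(c * max(len(a), len(b)))


def prefixed_lb_keogh(a, b, w):
    """Lower bound on banded DTW (squared point costs) for unequal lengths.

    Builds the min/max envelope of the shorter signal and compares it with
    the equally long prefix of the longer one. Returns None when the
    condition w > |len(a) - len(b)| fails, meaning the pair is ignored.
    """
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    if w <= len(long_) - len(short):
        return None
    n = len(short)
    total = 0.0
    for i in range(n):  # i runs over the prefix of the longer signal
        lo, hi = max(0, i - w), min(n, i + w + 1)
        upper, lower = max(short[lo:hi]), min(short[lo:hi])
        if long_[i] > upper:
            total += (long_[i] - upper) ** 2
        elif long_[i] < lower:
            total += (lower - long_[i]) ** 2
    return total


a = [0, 0, 0]
b = [5, 0, 0, 0]
print(prefixed_lb_keogh(a, b, w=2))  # 25.0
print(prefixed_lb_keogh(a, b, w=1))  # None
```

Under these assumptions the bound holds because every point in the prefix must be matched to at least one point of the shorter signal inside the band, and the envelope gives the smallest cost such a match can have.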

[Figure: two plots. Speedup vs. training set size (up to 18 × 10^4). Accuracy (%) vs. window size (c%), with c = 30% marked.]


Data Collection


370,000 phonemes are segmented from:

Data is publicly available.

[Figure: bar chart of the number of samples per phoneme (y-axis from 0 to 45,000). Labeled phonemes include AH, T, S, IH, IY, M, EH, AE, AA, F, OW, V, AO, UW, W, HH, CH, AW, OY, ZH.]


Phoneme Segmentation


The Penn Phonetics Lab Forced Aligner (p2fa):

Takes a signal and a transcript

Produces timing segmentations (word level and phoneme level)


Accuracy (All layers)


10-fold cross validation

100 random phonemes in each fold
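The evaluation protocol could be sketched as follows. This is only one reading of the slide: it assumes each fold is a disjoint random sample of held-out phonemes and that every item outside the current fold is used for training. The function names and the toy numeric data are hypothetical, and the demo uses small folds rather than 10 folds of 100.

```python
import random


def random_folds(num_items, num_folds=10, fold_size=100, seed=0):
    """Draw `num_folds` disjoint test folds of `fold_size` random indices each."""
    if num_folds * fold_size > num_items:
        raise ValueError("not enough items for disjoint folds")
    rng = random.Random(seed)
    chosen = rng.sample(range(num_items), num_folds * fold_size)
    return [chosen[k * fold_size:(k + 1) * fold_size] for k in range(num_folds)]


def cross_validate(items, labels, classify, num_folds=10, fold_size=100, seed=0):
    """Mean accuracy over the folds. `classify(query, train_items, train_labels)`
    is any classifier; each fold is tested against all items outside that fold."""
    accuracies = []
    for fold in random_folds(len(items), num_folds, fold_size, seed):
        held_out = set(fold)
        train_idx = [i for i in range(len(items)) if i not in held_out]
        train_items = [items[i] for i in train_idx]
        train_labels = [labels[i] for i in train_idx]
        correct = sum(classify(items[i], train_items, train_labels) == labels[i]
                      for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)


def nearest_label(query, train_items, train_labels):
    """Toy 1-NN on plain numbers, standing in for a phoneme classifier."""
    best = min(range(len(train_items)), key=lambda i: abs(train_items[i] - query))
    return train_labels[best]


# Two well-separated toy clusters (made-up data, not phonemes).
items = list(range(20)) + [100 + i for i in range(20)]
labels = ["low"] * 20 + ["high"] * 20
print(cross_validate(items, labels, nearest_label, num_folds=4, fold_size=5))
```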


Accented Phoneme Classification


[Figure: accuracy (0.65 to 0.95) vs. training set size (0 to 3.5 × 10^4) for MFCC and DTW.]

British vs. American accent

Using Oxford test set

2-class classification problem

No hierarchy


Conclusion

We present a dual-domain hierarchical method for phoneme classification.

We generate a novel dataset of 370,000 phonemes.

We achieve up to 73% accuracy rate for 39 classes.

Our lower bounding technique gives us up to 3X speedup.


Thank You

Data and code available at: http://cs.unm.edu/~hamooni/papers/Dual_2014