
Dual-domain Hierarchical Phoneme Classification · 2019-05-14 · Author: Hossein Hamooni · Created Date: 9/10/2014 4:42:18



Phoneme Hierarchy

Techniques

Dual-domain Hierarchical Phoneme Classification

Hossein Hamooni and Abdullah Mueen

Department of Computer Science, University of New Mexico

Abstract

Phonemes are the smallest units of human speech in any language. We use both frequency-domain and time-domain features for English phoneme classification. We build a hierarchy of phonemes based on their manner of pronunciation and apply non-parametric conditional classification at each node. The classifier is tested on three novel datasets, and the results are significantly better than those of parametric methods.

Motivation

Accurate classification of phonemes can lead to a better understanding of speech variations such as accents, dialects, and disorders. Phoneme-based speech recognizers can be robust to such variations.

[Figure: four waveform panels (horizontal axis 0 to 4 × 10^4, vertical axis within -0.8 to 1) for the British pronunciation /bɒs/ and the American pronunciation /bɑːs/, with phoneme labels b ɒ s and b ɑː s.]

Data Preparation

  • Crawling online dictionaries
      • Google Translate
      • Oxford Dictionary
      • Merriam-Webster
  • Segmentation
  • Silence removal
  • Normalization
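The silence-removal and normalization steps listed above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the frame-energy threshold, the frame length, the use of z-normalization, and the function names `frame_rms`, `trim_silence`, and `z_normalize` are all assumptions made for the example.

```python
import math

def frame_rms(signal, frame_len):
    """RMS energy of each full, non-overlapping frame (a trailing partial frame is ignored)."""
    n_frames = len(signal) // frame_len
    return [
        math.sqrt(sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len)
        for i in range(n_frames)
    ]

def trim_silence(signal, frame_len=256, rel_threshold=0.05):
    """Keep the span from the first to the last frame whose RMS exceeds
    rel_threshold times the loudest frame's RMS (assumed criterion)."""
    rms = frame_rms(signal, frame_len)
    peak = max(rms, default=0.0)
    active = [i for i, r in enumerate(rms) if r > rel_threshold * peak]
    if not active:
        return []  # nothing above the threshold, or signal shorter than one frame
    return list(signal[active[0] * frame_len:(active[-1] + 1) * frame_len])

def z_normalize(signal):
    """Shift to zero mean and scale to unit standard deviation (assumed normalization)."""
    n = len(signal)
    if n == 0:
        return []
    mean = sum(signal) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in signal) / n)
    if std == 0.0:
        return [0.0] * n
    return [(x - mean) / std for x in signal]

# Usage: a tone padded with silence on both sides.
tone = [math.sin(2 * math.pi * i / 32) for i in range(2048)]
padded = [0.0] * 1024 + tone + [0.0] * 1024
clean = z_normalize(trim_silence(padded))
```

A relative threshold is used here so the same setting works for recordings of different loudness; an absolute threshold would be an equally plausible choice.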

[Figure: example words with their phoneme transcriptions, grouped by subtree of the phoneme hierarchy. Legend: DTW label, MFCC label, Hierarchy label.]

Obstruent: Fricative (S, SH), Affricate (CH)

  • gasser /G AE S ER/
  • unattached /AH N AH T AE CH T/
  • appreciable /AH P R IY SH AH B AH L/
  • cliched /K L IY SH EY D/

Sonorants: Vowel (EY, IY), Semi-vowel (Y)

  • savagely /S AE V IH JH L IY/
  • deactivate /D IY AE K T IH V EY T/
  • valueless /V AE L Y UW L AH S/
  • philosophically /F IH L AH S AA F IH K L IY/

[Figure: time-series panels labeled "Equal length DTW" and "Original DTW", and a plot of Equal length DTW (vertical axis, 0 to 250) against Original DTW (horizontal axis, 0 to 120).]
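The comparison between "Original DTW" and "Equal length DTW" can be illustrated with a short sketch. It assumes that "Original DTW" is classic dynamic time warping on the two series as they are, and that "Equal length DTW" means resampling both series to a common length before warping; the poster does not spell out either definition, so the names `dtw_distance`, `resample`, and `equal_length_dtw`, the squared point cost, and the choice of target length are hypothetical.

```python
import math

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with squared point cost.
    Returns the square root of the minimal accumulated cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)          # row for "no elements of a consumed yet"
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = (x - y) ** 2
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[len(b)])

def resample(series, length):
    """Linearly interpolate a non-empty series onto `length` evenly spaced points."""
    if length == 1 or len(series) == 1:
        return [series[0]] * length
    scale = (len(series) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * scale
        lo = min(int(pos), len(series) - 2)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[lo + 1] * frac)
    return out

def equal_length_dtw(a, b, length=None):
    """DTW after resampling both series to a common length
    (assumed default: the length of the longer series)."""
    if length is None:
        length = max(len(a), len(b))
    return dtw_distance(resample(a, length), resample(b, length))

# Usage: the same ramp sampled at two different rates.
short_ramp = [0.0, 1.0, 2.0]
long_ramp = [0.0, 0.5, 1.0, 1.5, 2.0]
original = dtw_distance(short_ramp, long_ramp)
equalized = equal_length_dtw(short_ramp, long_ramp)
```

In this toy case the equal-length variant removes the mismatch caused purely by sampling rate, while the original variant still pays for the intermediate points that have no exact counterpart in the shorter series.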

[Figure: three plots of Accuracy (vertical axis, ranges 0.2 to 0.7, 0.1 to 0.6, and 0.2 to 0.7) against Training Set Size (horizontal axis, 0 to 4 × 10^5). Legend: MFCC, DTW, OAHPC, DTW-MFCC, MFCC-DTW.]

[Figure: plot with horizontal axis 1.5 to 4 × 10^4 and vertical axis 0 to 120. Legend: Prefixed Lower Bound, Original DTW.]

[Figure: phoneme hierarchy]

Phoneme
  • Obstruent
      • Aspirate: HH
      • Fricative: DH TH F V Z S SH ZH
      • Affricate: CH JH
      • Stop: B D G K P T
  • Sonorants
      • Nasal: M N NG
      • Vowel: AE AH UW OY OW AW EY AY AA AO ER UH IH IY EH
      • Semi-vowel: W Y
      • Liquid: L R
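Classification over the hierarchy above can be sketched as a top-down descent that runs a nearest-neighbor search at each node, restricted to the training examples under that node. This is only an illustration of the idea of conditional classification at each node: the assignment of phonemes to classes follows standard manner-of-articulation groupings, and the per-node distance selector `distance_at`, the toy two-feature examples, and the function names are hypothetical rather than taken from the poster.

```python
# Assumed grouping of phonemes under the hierarchy's internal nodes.
HIERARCHY = {
    "Phoneme": ["Obstruent", "Sonorants"],
    "Obstruent": ["Aspirate", "Fricative", "Affricate", "Stop"],
    "Sonorants": ["Nasal", "Vowel", "Semi-vowel", "Liquid"],
    "Aspirate": ["HH"],
    "Fricative": ["DH", "TH", "F", "V", "Z", "S", "SH", "ZH"],
    "Affricate": ["CH", "JH"],
    "Stop": ["B", "D", "G", "K", "P", "T"],
    "Nasal": ["M", "N", "NG"],
    "Vowel": ["AE", "AH", "UW", "OY", "OW", "AW", "EY", "AY",
              "AA", "AO", "ER", "UH", "IH", "IY", "EH"],
    "Semi-vowel": ["W", "Y"],
    "Liquid": ["L", "R"],
}

def leaves_under(node):
    """Set of phoneme labels in the subtree rooted at `node`."""
    children = HIERARCHY.get(node)
    if children is None:
        return {node}
    found = set()
    for child in children:
        found |= leaves_under(child)
    return found

def classify(query, training, distance_at):
    """Descend from the root, doing 1-NN at each node over that node's subtree.

    training:    non-empty list of (example, phoneme_label) pairs
    distance_at: function mapping a node name to a distance(query, example) function
    """
    node = "Phoneme"
    candidates = training
    while node in HIERARCHY:
        dist = distance_at(node)
        _, nearest_label = min(candidates, key=lambda pair: dist(query, pair[0]))
        # Move to the child whose subtree contains the nearest neighbor's label.
        node = next(c for c in HIERARCHY[node] if nearest_label in leaves_under(c))
        allowed = leaves_under(node)
        candidates = [pair for pair in candidates if pair[1] in allowed]
    return node

# Toy usage: one "freq" and one "time" feature per example (made-up numbers).
training = [
    ({"time": 0.00, "freq": 0.0}, "S"),
    ({"time": 1.00, "freq": 0.1}, "SH"),
    ({"time": 0.96, "freq": 5.0}, "IY"),
    ({"time": 5.00, "freq": 5.2}, "EY"),
]

def distance_at(node):
    # Assumed policy: frequency-domain feature at the root, time-domain below it.
    key = "freq" if node == "Phoneme" else "time"
    return lambda q, e: abs(q[key] - e[key])

query = {"time": 0.95, "freq": 0.2}
result = classify(query, training, distance_at)
print(result)  # prints SH
```

In the toy data, a flat nearest-neighbor search on the time feature alone would pick IY, whereas the descent first commits to the Obstruent branch using the frequency feature and then resolves the phoneme within it. Switching the distance per node is what makes the descent differ from a flat search.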

Results