Minimum Phone Error (MPE) Model and Feature Training ShihHsiang 2006





The derivation flow of the various training criteria

[Figure: derivation flow of the various training criteria]


Differences

• MPE vs. ORCE
  – ORCE focuses on word error rate and is implemented on N-best results
  – MPE focuses on phone accuracy and is implemented on a word graph; it also introduces a prior distribution over the newly estimated models (I-smoothing)

• MPE vs. MMI
  – MMI treats the correct transcription as the numerator lattice and the whole word graph (the competing sequences) as the denominator lattice
  – MPE treats all possible correct sequences on the word graph as the numerator lattice, and all possible wrong sequences as the denominator lattice
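The numerator/denominator contrast can be made concrete with a small Python sketch. It evaluates both criteria for one utterance over an explicit list of hypotheses, which stands in for the word graph. The scores, the acoustic scale `kappa`, and the per-hypothesis accuracies are made-up toy values. Real MPE works on a lattice with forward-backward and phone-level accuracies rather than enumerating whole sentences.

```python
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def joint_scores(log_ac, log_lm, kappa):
    """Scaled joint log score kappa*log p(O|u) + log P(u) for each hypothesis."""
    return [kappa * a + l for a, l in zip(log_ac, log_lm)]

def mmi_objective(log_ac, log_lm, ref_index, kappa=1.0):
    """MMI: log posterior of the reference (numerator) against all listed hypotheses (denominator)."""
    s = joint_scores(log_ac, log_lm, kappa)
    return s[ref_index] - log_sum_exp(s)

def mpe_objective(log_ac, log_lm, accuracies, kappa=1.0):
    """MPE: posterior-weighted accuracy, sum_u P(u|O) * Acc(u, s_r)."""
    s = joint_scores(log_ac, log_lm, kappa)
    z = log_sum_exp(s)
    return sum(math.exp(x - z) * acc for x, acc in zip(s, accuracies))

# Toy utterance: three competing hypotheses, the first being the reference.
log_ac = [-100.0, -101.0, -103.0]
log_lm = [-2.0, -2.0, -1.0]
phone_acc = [5.0, 4.0, 2.0]  # hypothetical Acc(u, s_r) values
print(mmi_objective(log_ac, log_lm, ref_index=0))
print(mpe_objective(log_ac, log_lm, phone_acc))
```

MMI rewards only probability mass on the reference. MPE rewards any hypothesis in proportion to its accuracy, so partially correct competitors still contribute to the objective.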


fMPE (cont.)

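• Feature-space minimum phone error (fMPE) is a discriminative training method that adds an offset to the original feature:

$$y_t = o_t + M h_t$$

  where $o_t$ is the current feature, $M$ is the transform matrix, and $h_t$ is a high-dimensional feature. $h_t$ is built from the Gaussian posterior probabilities of the current frame together with averages of these posteriors over neighboring frames.

• Each posterior vector contains 10,000 Gaussian posterior probabilities, and the Gaussian likelihoods are evaluated with no priors.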
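A minimal NumPy sketch of the transform $y_t = o_t + M h_t$ follows. It makes several simplifying assumptions:
- Diagonal Gaussians with equal priors (the "no priors" evaluation).
- Tiny toy sizes.
- $h_t$ holds only the current frame's posteriors; the context averaging is left out.

The helper names are hypothetical.

```python
import numpy as np

def gaussian_posteriors(x, means, variances):
    """Posteriors of N diagonal Gaussians for frame x, with equal priors (no mixture weights)."""
    ll = -0.5 * np.sum(np.log(2 * np.pi * variances) + (x - means) ** 2 / variances, axis=1)
    p = np.exp(ll - ll.max())  # subtract the max for numerical stability
    return p / p.sum()

def fmpe_transform(o_t, M, means, variances):
    """y_t = o_t + M h_t, with h_t the current-frame posterior vector only (no context averaging)."""
    h_t = gaussian_posteriors(o_t, means, variances)
    return o_t + M @ h_t

rng = np.random.default_rng(0)
d, N = 3, 8                                    # toy sizes; the slide uses 10,000 Gaussians
means = rng.normal(size=(N, d))
variances = rng.uniform(0.5, 2.0, size=(N, d))
o_t = rng.normal(size=d)
M = np.zeros((d, N))                           # with M = 0 the transform leaves o_t unchanged
y_t = fmpe_transform(o_t, M, means, variances)
```

Starting from $M = 0$ means the trained transform can only move the features away from the original ones as far as the discriminative gradient pushes it.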


fMPE (cont.)

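• Objective function:

$$F_{MPE}(\lambda)=\sum_{r}\sum_{u}\frac{P_\lambda(O_r\mid u)\,P(u)}{\sum_{v\in \text{lattice}_r}P_\lambda(O_r\mid v)\,P(v)}\,\mathrm{Acc}(u,s_r)$$

• The transformation matrix is updated by gradient ascent on $F_{MPE}$ (the criterion is maximized):

$$M_{ij}\leftarrow M_{ij}+\nu_{ij}\,\frac{\partial F_{MPE}}{\partial M_{ij}},\qquad \frac{\partial F_{MPE}}{\partial M_{ij}}=\sum_{t=1}^{T}\frac{\partial F_{MPE}}{\partial y_{ti}}\frac{\partial y_{ti}}{\partial M_{ij}}=\sum_{t=1}^{T}\frac{\partial F_{MPE}}{\partial y_{ti}}\,h_{tj}$$

• The differential with respect to each feature element has a direct part and an indirect part:

$$\frac{\partial F_{MPE}}{\partial y_{ti}}=\frac{\partial F_{MPE}}{\partial y_{ti}}\bigg|_{\text{direct}}+\frac{\partial F_{MPE}}{\partial y_{ti}}\bigg|_{\text{indirect}}$$

  – Direct differential, with the model held fixed. Here $l_{smt}$ is the log-likelihood of Gaussian $m$ of state $s$ at time $t$, and $\gamma^{MPE}_{sm}(t)=\partial F_{MPE}/\partial l_{smt}$:

$$\frac{\partial F_{MPE}}{\partial y_{ti}}\bigg|_{\text{direct}}=\sum_{s}\sum_{m}\frac{\partial F_{MPE}}{\partial l_{smt}}\frac{\partial l_{smt}}{\partial y_{ti}}=\sum_{s}\sum_{m}\gamma^{MPE}_{sm}(t)\,\frac{\mu_{smi}-y_{ti}}{\sigma^2_{smi}}$$

  – Indirect differential, through the ML re-estimated means and variances. Here $\gamma_{sm}(t)$ is the ML occupancy of Gaussian $m$ of state $s$ at time $t$, and $\gamma_{sm}=\sum_t\gamma_{sm}(t)$:

$$\frac{\partial F_{MPE}}{\partial y_{ti}}\bigg|_{\text{indirect}}=\sum_{s}\sum_{m}\frac{\gamma_{sm}(t)}{\gamma_{sm}}\left(\frac{\partial F_{MPE}}{\partial \mu_{smi}}+2\,\frac{\partial F_{MPE}}{\partial \sigma^2_{smi}}\,(y_{ti}-\mu_{smi})\right)$$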
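The direct differential and the matrix gradient can be sketched in NumPy as follows.
- The MPE differentials $\gamma^{MPE}_{sm}(t)$ are treated as given inputs. Computing them needs lattice forward-backward, which is not shown.
- The $(s, m)$ pairs are flattened into a single Gaussian index `g`.
- The function names and toy sizes are hypothetical.

```python
import numpy as np

def direct_differential(Y, gamma_mpe, means, variances):
    """Direct part: dF/dy_ti = sum_g gamma_mpe[t, g] * (mu_gi - y_ti) / var_gi.

    Y: (T, d) transformed features.
    gamma_mpe: (T, G) MPE differentials dF/dl_gt, assumed already computed on the lattice.
    means, variances: (G, d) diagonal Gaussians, with the (s, m) pairs flattened to one index g.
    """
    scaled_diff = (means[None, :, :] - Y[:, None, :]) / variances[None, :, :]  # (T, G, d)
    return np.einsum("tg,tgd->td", gamma_mpe, scaled_diff)

def matrix_gradient(dF_dY, H):
    """dF/dM_ij = sum_t dF/dy_ti * h_tj, i.e. dF_dY^T @ H with H of shape (T, q)."""
    return dF_dY.T @ H

def ascent_step(M, dF_dY, H, nu):
    """M_ij <- M_ij + nu_ij * dF/dM_ij; nu may be a scalar or a (d, q) array of step sizes."""
    return M + nu * matrix_gradient(dF_dY, H)

rng = np.random.default_rng(0)
T, d, G, q = 4, 3, 5, 6
Y = rng.normal(size=(T, d))
H = rng.uniform(size=(T, q))
means = rng.normal(size=(G, d))
variances = rng.uniform(0.5, 2.0, size=(G, d))
gamma_mpe = rng.normal(size=(T, G))            # toy values; real ones come from the lattice
dF_dY = direct_differential(Y, gamma_mpe, means, variances)
M_new = ascent_step(np.zeros((d, q)), dF_dY, H, nu=0.01)
```

Because $\gamma^{MPE}$ is held fixed, the direct term is exactly the gradient of $\sum_{t,g}\gamma^{MPE}_g(t)\log\mathcal{N}(y_t;\mu_g,\sigma^2_g)$. That gives an easy finite-difference check.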


fMPE (cont.)

• Using only the direct differential to update the transformation matrix gives significant improvements, but they are soon lost when the acoustic model is retrained with ML.

• The indirect differential therefore accounts for how the model parameters change when they are re-estimated by ML on the new features.
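Here is a NumPy sketch of the indirect differential for a single diagonal Gaussian.
- The mean and variance are the ML re-estimates from occupancies $\gamma(t)$.
- The model-space gradients $\partial F/\partial\mu$ and $\partial F/\partial\sigma^2$ are supplied as toy constants.
- A full implementation would sum this term over all states and mixtures. The function names are hypothetical.

```python
import numpy as np

def ml_mean_var(Y, gamma_ml):
    """ML re-estimation of one diagonal Gaussian: occupancy-weighted mean and variance."""
    occ = gamma_ml.sum()
    mu = gamma_ml @ Y / occ
    var = gamma_ml @ (Y - mu) ** 2 / occ
    return mu, var

def indirect_differential(Y, gamma_ml, dF_dmu, dF_dvar):
    """dF/dy_ti (indirect) = gamma(t)/gamma * (dF/dmu_i + 2 * dF/dvar_i * (y_ti - mu_i))."""
    occ = gamma_ml.sum()
    mu, _ = ml_mean_var(Y, gamma_ml)
    return (gamma_ml / occ)[:, None] * (dF_dmu[None, :] + 2.0 * dF_dvar[None, :] * (Y - mu[None, :]))

rng = np.random.default_rng(1)
Y = rng.normal(size=(6, 2))
gamma_ml = rng.uniform(0.1, 1.0, size=6)       # toy ML occupancies of one Gaussian
dF_dmu = np.array([0.3, -0.2])                 # toy MPE gradients w.r.t. this Gaussian's parameters
dF_dvar = np.array([0.1, 0.05])
g_ind = indirect_differential(Y, gamma_ml, dF_dmu, dF_dvar)
```

The factor $2(y_{ti}-\mu_i)$ comes from differentiating the weighted variance. The extra term through $\partial\mu/\partial y_{ti}$ vanishes because $\sum_t\gamma(t)(y_t-\mu)=0$.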


offset fMPE

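• Offset fMPE differs from the original fMPE in how the high-dimensional vector $h_t$ of posterior probabilities is defined:

$$h_t=\left[5\gamma_{t1},\ \gamma_{t1}\frac{x_{t1}-\mu_{11}}{\sigma_{11}},\ \gamma_{t1}\frac{x_{t2}-\mu_{12}}{\sigma_{12}},\ \ldots,\ 5\gamma_{tN},\ \gamma_{tN}\frac{x_{t1}-\mu_{N1}}{\sigma_{N1}},\ \gamma_{tN}\frac{x_{t2}-\mu_{N2}}{\sigma_{N2}},\ \ldots\right]^{T}$$

  where $\gamma_{tn}$ is the posterior of the $n$-th Gaussian $g_n$ at time $t$:

$$\gamma_{tn}=\frac{p(O_t\mid g_n)}{\sum_{j=1}^{N}p(O_t\mid g_j)}$$

  The offset terms $(x_{td}-\mu_{nd})/\sigma_{nd}$ are dimension dependent, while $\gamma_{tn}$ is shared by all dimensions. With $N$ Gaussians and $d$-dimensional features, $h_t$ has size $N(d+1)\times 1$.

• The number of Gaussians needed is about 1,000, far fewer than the 100,000 used by the original fMPE.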
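The following NumPy sketch builds the offset-fMPE vector $h_t$ for one frame.
- Assumes diagonal Gaussians with no priors.
- Takes $\sigma_{nd}$ as the square root of the variance.
- Uses the constant 5 read from the slide's vector, exposed as the hypothetical parameter `const`.

```python
import numpy as np

def gaussian_posteriors(x, means, variances):
    """gamma_tn = p(x|g_n) / sum_j p(x|g_j) for diagonal Gaussians, no priors."""
    ll = -0.5 * np.sum(np.log(2 * np.pi * variances) + (x - means) ** 2 / variances, axis=1)
    p = np.exp(ll - ll.max())
    return p / p.sum()

def offset_fmpe_features(x, means, variances, const=5.0):
    """h_t = [const*g_1, g_1*(x-mu_1)/sigma_1, ..., const*g_N, g_N*(x-mu_N)/sigma_N]^T."""
    g = gaussian_posteriors(x, means, variances)               # (N,)
    offsets = (x - means) / np.sqrt(variances)                 # (N, d), dimension dependent
    blocks = np.concatenate([const * g[:, None], g[:, None] * offsets], axis=1)  # (N, d+1)
    return blocks.reshape(-1)                                  # length N*(d+1)

rng = np.random.default_rng(0)
N, d = 4, 3
means = rng.normal(size=(N, d))
variances = rng.uniform(0.5, 2.0, size=(N, d))
x_t = rng.normal(size=d)
h_t = offset_fmpe_features(x_t, means, variances)
```

Each Gaussian contributes $d+1$ entries instead of one. That is why far fewer Gaussians are needed to reach a comparably rich $h_t$.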


Dimension-weighted offset fMPE

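• Offset fMPE gives the same weight to every dimension of the feature offset vector. Dimension-weighted offset fMPE instead computes a posterior probability for each dimension of the offset vector:

$$h_t=\left[5\gamma_{t1},\ \gamma_{t11}\frac{x_{t1}-\mu_{11}}{\sigma_{11}},\ \gamma_{t12}\frac{x_{t2}-\mu_{12}}{\sigma_{12}},\ \ldots,\ 5\gamma_{tN},\ \gamma_{tN1}\frac{x_{t1}-\mu_{N1}}{\sigma_{N1}},\ \gamma_{tN2}\frac{x_{t2}-\mu_{N2}}{\sigma_{N2}},\ \ldots\right]^{T}$$

$$\gamma_{tnd}=\frac{p(O_{td}\mid g_{nd})}{\sum_{j=1}^{N}p(O_{td}\mid g_{jd})}$$

  where $\gamma_{tnd}$ is the posterior of the $n$-th Gaussian computed from dimension $d$ alone. $\gamma_{tn}=p(O_t\mid g_n)/\sum_{j=1}^{N}p(O_t\mid g_j)$ is the whole-vector posterior weighting the constant entries.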
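This is a NumPy sketch of the dimension-weighted variant.
- Uses the same diagonal-Gaussian, no-prior assumptions as offset fMPE.
- Per-dimension posteriors $\gamma_{tnd}$ weight the offset entries.
- Weights the constant entries by the whole-vector posterior $\gamma_{tn}$, which is an assumption about how the slide's constant term is weighted.

All function names are hypothetical.

```python
import numpy as np

def log_gauss_per_dim(x, means, variances):
    """Per-dimension log N(x_d; mu_nd, var_nd), shape (N, d)."""
    return -0.5 * (np.log(2 * np.pi * variances) + (x - means) ** 2 / variances)

def per_dimension_posteriors(x, means, variances):
    """gamma_tnd: posterior of Gaussian n computed from dimension d alone (columns sum to 1)."""
    ll = log_gauss_per_dim(x, means, variances)
    p = np.exp(ll - ll.max(axis=0, keepdims=True))
    return p / p.sum(axis=0, keepdims=True)

def whole_vector_posteriors(x, means, variances):
    """gamma_tn from the full vector, used here only for the constant entries (assumption)."""
    ll = log_gauss_per_dim(x, means, variances).sum(axis=1)
    p = np.exp(ll - ll.max())
    return p / p.sum()

def dim_weighted_offset_features(x, means, variances, const=5.0):
    """h_t with each offset entry (x_d - mu_nd)/sigma_nd weighted by gamma_tnd."""
    g = whole_vector_posteriors(x, means, variances)           # (N,)
    gd = per_dimension_posteriors(x, means, variances)         # (N, d)
    offsets = (x - means) / np.sqrt(variances)                 # (N, d)
    blocks = np.concatenate([const * g[:, None], gd * offsets], axis=1)
    return blocks.reshape(-1)                                  # length N*(d+1)

rng = np.random.default_rng(0)
N, d = 4, 3
means = rng.normal(size=(N, d))
variances = rng.uniform(0.5, 2.0, size=(N, d))
x_t = rng.normal(size=d)
h_t = dim_weighted_offset_features(x_t, means, variances)
```

A Gaussian that explains one dimension well but the full vector poorly can still contribute to that dimension's offset, which the shared $\gamma_{tn}$ cannot do.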


Experiments (on MATBN)

• Error rates (%) for MPE and fMPE for different features, on different acoustic levels.


Experiments (cont.)

• Character error rate (CER, %) for offset fMPE and dimension-weighted offset fMPE with different features


Connect to SPLICE

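• Decomposition Scheme 1: $y_t$ and $o_t$ are $p\times 1$, $M$ is $p\times q$, and $h_t$ is $q\times 1$. Split $M$ column-wise into $n$ square blocks $M^{(i)}$ ($p\times p$) and $h_t$ into the matching blocks $h_t^{(i)}$ ($p\times 1$), so that $q=np$ and

$$y_t=o_t+Mh_t=o_t+\sum_{i=1}^{n}M^{(i)}h_t^{(i)}$$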
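Scheme 1 is only a regrouping of the matrix product, as this short NumPy sketch shows. The function name and toy sizes are hypothetical. The check confirms that summing the block products reproduces $o_t + M h_t$ exactly.

```python
import numpy as np

def blockwise_transform(o_t, M, h_t):
    """y_t = o_t + sum_i M^(i) h_t^(i), with M split into n square p x p blocks (q = n*p)."""
    p, q = M.shape
    if q % p != 0:
        raise ValueError("scheme 1 needs q to be a multiple of p")
    n = q // p
    y_t = o_t.copy()
    for i in range(n):
        cols = slice(i * p, (i + 1) * p)
        y_t += M[:, cols] @ h_t[cols]           # M^(i) h_t^(i)
    return y_t

rng = np.random.default_rng(0)
p, n = 3, 4
M = rng.normal(size=(p, n * p))
h_t = rng.uniform(size=n * p)
o_t = rng.normal(size=p)
y_block = blockwise_transform(o_t, M, h_t)
```

Each $M^{(i)}$ acts as a full-rank rotation of a small $p$-dimensional slice of posteriors, which is the view used to connect fMPE with SPLICE.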


Connect to SPLICE (cont.)

• Compensation of the original feature is carried out by adding a large number of bias vectors, each of which is computed as a full-rank rotation of a small set of posterior probabilities

• Maximum-likelihood estimation:

$$y_t=o_t+\sum_{i=1}^{n}M^{(i)}h_t^{(i)}\approx o_t+M^{(i^*)}h_t^{(i^*)}$$

  where $i^*$ denotes the term that is greater than the remaining $n-1$ terms.
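The approximation keeps only the dominant block. This NumPy sketch picks $i^*$ as the block whose term $M^{(i)}h_t^{(i)}$ has the largest Euclidean norm. That is an assumed reading of "greater than the remaining terms", since the slide does not say how the terms are compared. The function name is hypothetical.

```python
import numpy as np

def dominant_block_approximation(o_t, M, h_t):
    """Approximate y_t by o_t + M^(i*) h_t^(i*); i* maximizes ||M^(i) h_t^(i)|| (assumed criterion)."""
    p, q = M.shape
    n = q // p
    terms = [M[:, i * p:(i + 1) * p] @ h_t[i * p:(i + 1) * p] for i in range(n)]
    i_star = int(np.argmax([np.linalg.norm(term) for term in terms]))
    return o_t + terms[i_star], i_star

rng = np.random.default_rng(0)
p, n = 2, 3
M = rng.normal(size=(p, n * p))
o_t = rng.normal(size=p)
h_t = np.array([0.0, 0.0, 0.9, 0.1, 0.0, 0.0])   # only block i = 1 is active
y_approx, i_star = dominant_block_approximation(o_t, M, h_t)
```

When the posterior mass is concentrated in one block, as in this toy input, the approximation is exact. It degrades as the mass spreads across blocks.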


Connect to SPLICE (cont.)

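• Decomposition Scheme 2: write $M=[m_1, m_2, \ldots, m_q]$ column by column (each $m_k$ is $p\times 1$). Let the entries of $h_t$ be the posteriors $h_{tk}=p(k\mid x_t)$. Then

$$y_t=o_t+Mh_t=o_t+\sum_{k=1}^{q}m_k\,h_{tk}=o_t+\sum_{k=1}^{q}p(k\mid x_t)\,m_k$$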
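Scheme 2 reads the same product column by column, which is the SPLICE-style form $o_t + \sum_k p(k\mid x_t)\,m_k$. In this NumPy sketch the posteriors are random toy values drawn to sum to 1, and the function name is hypothetical.

```python
import numpy as np

def splice_style_compensation(o_t, M, posteriors):
    """y_t = o_t + sum_k p(k|x_t) m_k, where m_k is the k-th column of M."""
    y_t = o_t.copy()
    for k, p_k in enumerate(posteriors):
        y_t += p_k * M[:, k]                    # posterior-weighted correction vector m_k
    return y_t

rng = np.random.default_rng(0)
p, q = 3, 5
M = rng.normal(size=(p, q))
o_t = rng.normal(size=p)
post = rng.dirichlet(np.ones(q))                # toy posteriors p(k|x_t), summing to 1
y_t = splice_style_compensation(o_t, M, post)
```

The columns $m_k$ play the role of frame-independent correction vectors. The time dependence enters only through the posterior weights.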


Connect to SPLICE (cont.)

• The compensation vector consists of a linear weighted sum of a set of frame-independent correction vectors, where the weight is the posterior probability associated with the corresponding correction vector

• The key difference:
  – the bias vector for compensation in fMPE is specific to each time frame t
  – the bias vector in feature-space stochastic matching is common to all frames in the utterance