Sequential Data Modeling

Tomoki Toda 2, Graham Neubig 1, Sakriani Sakti 1

1 Augmented Human Communication Laboratory, Graduate School of Information Science, NAIST
2 Information Technology Center / Graduate School of Information Science, Nagoya University
Review: Evaluation / Alignment / Training

- Evaluation: given the model λ and sequential data x, the forward/backward algorithms compute the likelihood:
$$p(\mathbf{x} \mid \lambda) = \sum_{\text{all } \mathbf{z}} p(\mathbf{x}, \mathbf{z} \mid \lambda)$$
- Alignment (decoding): given the model λ and sequential data x, the Viterbi algorithm finds the most likely state sequence:
$$\hat{\mathbf{z}} = \mathop{\mathrm{argmax}}_{\mathbf{z}}\, p(\mathbf{x}, \mathbf{z} \mid \lambda)$$
- Training: given sequential data x, the Baum-Welch (i.e., EM) algorithm estimates the model parameter set:
$$\hat{\lambda} = \mathop{\mathrm{argmax}}_{\lambda}\, p(\mathbf{x} \mid \lambda)$$
Review: Example of Parameter Estimation

(起 = wake up, 寝 = sleep)

Training data samples with state sequences:
- /s/ ⇒ state 1: 起 ⇒ state 1: 起 ⇒ state 2: 寝
- /s/ ⇒ state 2: 寝 ⇒ state 2: 起 ⇒ state 1: 起 ⇒ state 1: 寝
- /s/ ⇒ state 2: 寝 ⇒ state 2: 起

Number of observed samples (states 1 and 2 can both be final states):
- Initial state: state 1: 1, state 2: 2
- Transitions: 1→1: 2, 1→2: 1, 2→1: 1, 2→2: 2
- Outputs: state 1: 起: 3, 寝: 1; state 2: 起: 2, 寝: 3

Maximum likelihood estimates:
- Initial state probabilities: $\pi_1 = 1/(1+2)$, $\pi_2 = 2/(1+2)$
- Transition probabilities: $A_{1,1} = 2/(2+1)$, $A_{1,2} = 1/(2+1)$, $A_{2,1} = 1/(2+1)$, $A_{2,2} = 2/(2+1)$
- Output probabilities: $B_1(\text{起}) = 3/(3+1)$, $B_1(\text{寝}) = 1/(3+1)$, $B_2(\text{起}) = 2/(2+3)$, $B_2(\text{寝}) = 3/(2+3)$
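The count-and-normalize estimates above are easy to verify programmatically. The following is a minimal sketch (not part of the original slides; the variable names are my own) that tallies the three labeled training sequences and reproduces the fractions shown:

```python
# ML estimation of HMM parameters by counting labeled state sequences.
import numpy as np

# (state, symbol) pairs; states are 1 or 2, symbols are "起" (wake) or "寝" (sleep)
samples = [
    [(1, "起"), (1, "起"), (2, "寝")],
    [(2, "寝"), (2, "起"), (1, "起"), (1, "寝")],
    [(2, "寝"), (2, "起")],
]

S, symbols = 2, ["起", "寝"]
init = np.zeros(S)                   # initial-state counts
trans = np.zeros((S, S))             # transition counts (final transitions excluded)
emit = np.zeros((S, len(symbols)))   # emission counts

for seq in samples:
    init[seq[0][0] - 1] += 1
    for (s_prev, _), (s, _) in zip(seq, seq[1:]):
        trans[s_prev - 1, s - 1] += 1
    for s, o in seq:
        emit[s - 1, symbols.index(o)] += 1

pi = init / init.sum()                        # [1/3, 2/3]
A = trans / trans.sum(axis=1, keepdims=True)  # [[2/3, 1/3], [1/3, 2/3]]
B = emit / emit.sum(axis=1, keepdims=True)    # [[3/4, 1/4], [2/5, 3/5]]
print(pi, A, B, sep="\n")
```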
Review: Lower Bound of HMM Likelihood

Log-scaled likelihood function for U samples of sequential data:

$$\ln p(\mathbf{x}^{(1)}, \dots, \mathbf{x}^{(U)} \mid \lambda) = \sum_{u=1}^{U} \ln \sum_{\text{all } \mathbf{z}^{(u)}} p(\mathbf{x}^{(u)}, \mathbf{z}^{(u)} \mid \lambda) \;\geq\; \sum_{u=1}^{U} \sum_{\text{all } \mathbf{z}^{(u)}} q(\mathbf{z}^{(u)}) \ln \frac{p(\mathbf{x}^{(u)}, \mathbf{z}^{(u)} \mid \lambda)}{q(\mathbf{z}^{(u)})} \quad (= \text{lower bound})$$

E-step: calculate posterior probabilities of latent variables (i.e., state sequences):

$$\hat{q}(\mathbf{z}^{(u)}) = p(\mathbf{z}^{(u)} \mid \mathbf{x}^{(u)}, \lambda_{\mathrm{old}}) = \frac{p(\mathbf{x}^{(u)}, \mathbf{z}^{(u)} \mid \lambda_{\mathrm{old}})}{\sum_{\text{all } \mathbf{z}^{(u)}} p(\mathbf{x}^{(u)}, \mathbf{z}^{(u)} \mid \lambda_{\mathrm{old}})}$$

M-step: maximize the auxiliary function with respect to the model parameters:

$$\mathcal{Q}(\lambda_{\mathrm{old}}, \lambda_{\mathrm{new}}) = \sum_{u=1}^{U} \sum_{\text{all } \mathbf{z}^{(u)}} \hat{q}(\mathbf{z}^{(u)}) \ln p(\mathbf{x}^{(u)}, \mathbf{z}^{(u)} \mid \lambda_{\mathrm{new}})$$
Review: E-Step

Calculate posterior probabilities of the latent variables.

State occupancy, the expected # of samples observed in state s at time n in sample u:

$$\gamma_s^{(u)}(n) = \sum_{\text{all } \mathbf{z}^{(u)}} \hat{q}(\mathbf{z}^{(u)})\, \delta(z_n^{(u)} = s) = p(z_n^{(u)} = s \mid \mathbf{x}^{(u)}, \lambda_{\mathrm{old}}) = \frac{p(\mathbf{x}^{(u)}, z_n^{(u)} = s \mid \lambda_{\mathrm{old}})}{p(\mathbf{x}^{(u)} \mid \lambda_{\mathrm{old}})} = \frac{\alpha_n(s)\, \beta_n(s)}{p(\mathbf{x}^{(u)} \mid \lambda_{\mathrm{old}})}$$

Transition occupancy, the expected # of samples moving from state s' at time n − 1 to state s at time n in sample u:

$$\xi_{s',s}^{(u)}(n) = \sum_{\text{all } \mathbf{z}^{(u)}} \hat{q}(\mathbf{z}^{(u)})\, \delta(z_{n-1}^{(u)} = s')\, \delta(z_n^{(u)} = s) = p(z_{n-1}^{(u)} = s', z_n^{(u)} = s \mid \mathbf{x}^{(u)}, \lambda_{\mathrm{old}}) = \frac{\alpha_{n-1}(s')\, A_{s',s}\, B_s(x_n)\, \beta_n(s)}{p(\mathbf{x}^{(u)} \mid \lambda_{\mathrm{old}})}$$

[Trellis figures over the states at each time step omitted.]
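As a concrete illustration of these formulas, here is a hedged numpy sketch of the E-step for a discrete-output HMM without an explicit final state; the function name and array layout are my own, not from the slides:

```python
# Forward-backward computation of the posteriors gamma and xi.
import numpy as np

def e_step(obs, pi, A, B):
    """obs: symbol indices; returns gamma[n, s], xi[n, s', s], and p(x)."""
    obs = np.asarray(obs)
    N, S = len(obs), len(pi)
    alpha = np.zeros((N, S))
    beta = np.zeros((N, S))
    alpha[0] = pi * B[:, obs[0]]
    for n in range(1, N):                          # forward recursion
        alpha[n] = (alpha[n - 1] @ A) * B[:, obs[n]]
    beta[-1] = 1.0
    for n in range(N - 2, -1, -1):                 # backward recursion
        beta[n] = A @ (B[:, obs[n + 1]] * beta[n + 1])
    px = alpha[-1].sum()                           # p(x | lambda_old)
    gamma = alpha * beta / px                      # state occupancies
    # xi[n, s', s]: transition from s' at time n to s at time n + 1
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :]) / px
    return gamma, xi, px
```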
Review: M-Step

Auxiliary function, written with statistics aggregated over all samples u and times n, i.e. $\gamma_s(1) = \sum_u \gamma_s^{(u)}(1)$, $\xi_{s',s} = \sum_u \sum_n \xi_{s',s}^{(u)}(n)$, and $\gamma_s(o) = \sum_u \sum_{n:\, x_n^{(u)} = o} \gamma_s^{(u)}(n)$:

$$\mathcal{Q}(\lambda_{\mathrm{old}}, \lambda) = \sum_{s=1}^{S} \gamma_s(1) \ln \pi_s + \sum_{s'=1}^{S} \sum_{s=1}^{S} \xi_{s',s} \ln A_{s',s} + \sum_{s=1}^{S} \sum_{\text{all } o} \gamma_s(o) \ln B_s(o)$$

For each state, the ML estimates follow from setting the derivatives to zero:

- Initial state probability, from $\partial \mathcal{Q}(\lambda_{\mathrm{old}}, \lambda) / \partial \pi_s = 0$:
$$\hat{\pi}_s = \frac{\gamma_s(1)}{\sum_{s'=1}^{S} \gamma_{s'}(1)}$$
- Transition probability, from $\partial \mathcal{Q}(\lambda_{\mathrm{old}}, \lambda) / \partial A_{s',s} = 0$:
$$\hat{A}_{s',s} = \frac{\xi_{s',s}}{\sum_{s=1}^{S} \xi_{s',s}}$$
- Output probability, from $\partial \mathcal{Q}(\lambda_{\mathrm{old}}, \lambda) / \partial B_s(o) = 0$:
$$\hat{B}_s(o) = \frac{\gamma_s(o)}{\sum_{\text{all } o} \gamma_s(o)}$$
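A companion sketch of the M-step, under the same assumptions and naming as the E-step sketch above: the re-estimates are just the normalized expected counts.

```python
# Re-estimate pi, A, B from posteriors accumulated over all sequences.
import numpy as np

def m_step(stats, S, V):
    """stats: list of (obs, gamma, xi) per sequence; V: vocabulary size."""
    pi_num = np.zeros(S)
    A_num = np.zeros((S, S))
    B_num = np.zeros((S, V))
    for obs, gamma, xi in stats:
        pi_num += gamma[0]             # expected initial-state counts
        A_num += xi.sum(axis=0)        # expected transition counts
        for n, o in enumerate(obs):
            B_num[:, o] += gamma[n]    # expected emission counts
    pi = pi_num / pi_num.sum()
    A = A_num / A_num.sum(axis=1, keepdims=True)
    B = B_num / B_num.sum(axis=1, keepdims=True)
    return pi, A, B
```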
Review: Example of E-Step

Forward and backward probabilities for the observation sequence 起 寝 寝 (times n = 1, 2, 3) on the two-state trellis, with an initial state and a pseudo final state.

[Trellis figure omitted: forward probabilities α_n(s) are accumulated left to right (e.g., α_1(1) = 0.32, α_1(2) = 0.06) and backward probabilities β_n(s) right to left.]

Their products α_n(s) β_n(s):

            n = 1      n = 2      n = 3
State 1:    0.05824    0.0192     0.012
State 2:    0.06176    0.1008     0.108

Each column sums to the likelihood p(x | λ_old) = 0.12.
Review: Example of Posterior Probabilities

Observation sequence 起 寝 寝 (times n = 1, 2, 3); every value is divided by p(x | λ_old) = 0.12.

State occupancies $\gamma_s^{(u)}(n)$:

γ_1(1) = 0.05824/0.12, γ_1(2) = 0.0192/0.12, γ_1(3) = 0.012/0.12
γ_2(1) = 0.06176/0.12, γ_2(2) = 0.1008/0.12, γ_2(3) = 0.108/0.12

Transition occupancies $\xi_{s',s}^{(u)}$ (first value: between times 1 and 2; second: between times 2 and 3):

ξ_{1,1}: 0.01792/0.12 and 0.0084/0.12
ξ_{1,2}: 0.04032/0.12 and 0.0108/0.12
ξ_{2,1}: 0.00128/0.12 and 0.0036/0.12
ξ_{2,2}: 0.06048/0.12 and 0.0972/0.12

Calculate these posterior probabilities (= expected # of samples) sequence by sequence.
Review: Example of Sufficient Statistics

Summing the posterior probabilities (= expected # of samples) over time gives the sufficient statistics (= expected # of samples for each parameter):

Initial state: γ_1(1) = 0.05824/0.12, γ_2(1) = 0.06176/0.12
Transitions: ξ_{1,1} = 0.02632/0.12, ξ_{1,2} = 0.05112/0.12, ξ_{2,1} = 0.00488/0.12, ξ_{2,2} = 0.15768/0.12
Outputs: γ_1(起) = 0.05824/0.12, γ_1(寝) = 0.0312/0.12, γ_2(起) = 0.06176/0.12, γ_2(寝) = 0.2088/0.12
Review: Example of ML Estimates

From the sufficient statistics above (expected # of samples for each parameter), the ML estimates are:

Initial state probabilities:

$$\hat{\pi}_1 = \frac{\gamma_1(1)}{\gamma_1(1) + \gamma_2(1)} = \frac{0.05824}{0.05824 + 0.06176} = 0.49, \qquad \hat{\pi}_2 = 0.51$$

Transition probabilities:

$$\hat{A}_{1,1} = \frac{\xi_{1,1}}{\xi_{1,1} + \xi_{1,2}} = 0.34, \quad \hat{A}_{1,2} = 0.66, \quad \hat{A}_{2,1} = \frac{\xi_{2,1}}{\xi_{2,1} + \xi_{2,2}} = 0.03, \quad \hat{A}_{2,2} = 0.97$$

Output probabilities:

$$\hat{B}_1(\text{起}) = \frac{\gamma_1(\text{起})}{\gamma_1(\text{起}) + \gamma_1(\text{寝})} = 0.65, \quad \hat{B}_1(\text{寝}) = 0.35, \quad \hat{B}_2(\text{起}) = \frac{\gamma_2(\text{起})}{\gamma_2(\text{起}) + \gamma_2(\text{寝})} = 0.23, \quad \hat{B}_2(\text{寝}) = 0.77$$
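These estimates follow from the sufficient statistics by simple normalization, which can be checked with a few lines of numpy (the values are taken from the slides; the array layout is mine):

```python
# Arithmetic check of the ML estimates above, from the slides' sufficient
# statistics (all values already divided by p(x | lambda_old) = 0.12).
import numpy as np

gamma1 = np.array([0.05824, 0.06176]) / 0.12      # initial-state stats
xi = np.array([[0.02632, 0.05112],
               [0.00488, 0.15768]]) / 0.12        # transition stats
gamma_o = np.array([[0.05824, 0.0312],
                    [0.06176, 0.2088]]) / 0.12    # emission stats (起, 寝)

print((gamma1 / gamma1.sum()).round(2))                         # [0.49 0.51]
print((xi / xi.sum(axis=1, keepdims=True)).round(2))            # [[0.34 0.66] [0.03 0.97]]
print((gamma_o / gamma_o.sum(axis=1, keepdims=True)).round(2))  # [[0.65 0.35] [0.23 0.77]]
```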
Sequential Data Modeling

5th class: "Continuous Latent Variable Model 1"

Tomoki Toda
Information Technology Center / Graduate School of Information Science, Nagoya University
Basic Techniques

                  Discrete latent variables     Continuous latent variables
Model             Mixture model (e.g., GMM)     Factor analysis (FA)
Markov model      Hidden Markov model (HMM)     Linear dynamical systems (LDS)

[Graphical models omitted: each model is drawn with latent variables z_1, z_2, z_3 above observed variables x_1, x_2, x_3.]
Continuous Latent Variables (from PCA to FA)
Example of High-Dimensional Data

- Example: hand-written digits
- Each image of 100 × 100 = 10,000 pixels, i.e., represented as a 10,000-dimensional vector
- Each image is represented as one point in the 10,000-dimensional space.
- However, the # of degrees of freedom of variability would be limited (e.g., only vertical and horizontal translations and the rotations: 3 degrees)
- Can we find a lower-dimensional subspace on which the data points live?
Extraction of Synthetic Variables

- Synthesis of new variables by linearly combining observable variables
- e.g., from 2-dimensional observation data to one-dimensional data:
$$y_n = 0.5\, x_{n,1} + x_{n,2}$$
- Represented by an inner product:
$$y_n = \mathbf{w}^\top \mathbf{x}_n, \quad \text{where } \mathbf{w} = \begin{bmatrix} 0.5 \\ 1 \end{bmatrix}, \; \mathbf{x}_n = \begin{bmatrix} x_{n,1} \\ x_{n,2} \end{bmatrix}$$

[Scatter plot of the 2-dimensional data (axes: 1st dimension x_{n,1}, 2nd dimension x_{n,2}) omitted.]
Principal Component Analysis (PCA)

- How can we extract a synthetic variable that most effectively represents the observable variables?
- Determine a unit vector $\mathbf{u}$ by maximizing the variance of the synthetic variables.
- Synthetic variable: $y_n = \mathbf{u}^\top (\mathbf{x}_n - \boldsymbol{\mu})$, where $\boldsymbol{\mu}$ is the mean vector, so the mean value of $y_n$ is 0.
- Its variance: $\frac{1}{N}\sum_{n=1}^{N} y_n^2 = \mathbf{u}^\top \mathbf{S}\, \mathbf{u}$
- Constraint: $\mathbf{u}^\top \mathbf{u} = 1$, i.e., a unit vector (length = 1)
Eigenvalue Problem

- Maximization of the variance of the synthetic variable:
$$\mathbf{u}^\top \mathbf{S}\, \mathbf{u}, \qquad \mathbf{S} = \frac{1}{N}\sum_{n=1}^{N} (\mathbf{x}_n - \boldsymbol{\mu})(\mathbf{x}_n - \boldsymbol{\mu})^\top, \qquad \boldsymbol{\mu} = \frac{1}{N}\sum_{n=1}^{N} \mathbf{x}_n$$
- Maximize the following objective function (variance plus constraint with a Lagrange multiplier $\lambda$) with respect to $\mathbf{u}$:
$$F = \mathbf{u}^\top \mathbf{S}\, \mathbf{u} + \lambda\, (1 - \mathbf{u}^\top \mathbf{u})$$
- Setting $\partial F / \partial \mathbf{u} = \mathbf{0}$ yields an eigenvalue problem, with eigenvector $\mathbf{u}$ and eigenvalue $\lambda$:
$$\mathbf{S}\,\mathbf{u} = \lambda\, \mathbf{u}$$
Eigenvector and Eigenvalue

- In the eigenvalue problem $\mathbf{S}\mathbf{u} = \lambda\mathbf{u}$: direction = eigenvector; variance of the synthetic variable with that eigenvector = eigenvalue, since $\mathbf{u}^\top\mathbf{S}\mathbf{u} = \lambda$.
- The eigenvector $\mathbf{u}_1$ for the largest eigenvalue represents the direction that maximizes the variance of a synthetic variable.
- Synthetic variable with the eigenvector (= principal component):
$$y_{n,1} = \mathbf{u}_1^\top (\mathbf{x}_n - \boldsymbol{\mu})$$
Its mean = 0; its variance = eigenvalue $\lambda_1$.
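The whole procedure (mean, covariance, eigendecomposition, projection) fits in a few lines of numpy. This is a minimal sketch on synthetic 2-D data; the data and names are my own, not from the slides:

```python
# PCA as the eigenvalue problem S u = lambda u.
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([3.0, 2.0], [[4.0, 1.5], [1.5, 1.0]], size=500)

mu = X.mean(axis=0)
S = np.cov(X.T, bias=True)        # S = (1/N) sum (x - mu)(x - mu)^T
lam, U = np.linalg.eigh(S)        # eigh returns ascending eigenvalues
lam, U = lam[::-1], U[:, ::-1]    # sort descending

u1 = U[:, 0]                      # direction of maximum variance
y1 = (X - mu) @ u1                # 1st principal component
print(np.isclose(y1.mean(), 0.0, atol=1e-9), np.isclose(y1.var(), lam[0]))
```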
Projection onto Low-Dimensional Space

- Extraction of multiple eigenvectors, e.g., $\mathbf{u}_1, \mathbf{u}_2$
- Orthonormal vectors, i.e., constraints $\mathbf{u}_1^\top\mathbf{u}_1 = \mathbf{u}_2^\top\mathbf{u}_2 = 1$ and $\mathbf{u}_1^\top\mathbf{u}_2 = 0$
- Represent high-dimensional data $\mathbf{x}_n$ with low-dimensional data $\mathbf{y}_n$ (i.e., principal components):
  - 1st principal component: $y_{n,1} = \mathbf{u}_1^\top(\mathbf{x}_n - \boldsymbol{\mu})$ (synthetic variable w/ the eigenvector for the largest eigenvalue)
  - 2nd principal component: $y_{n,2} = \mathbf{u}_2^\top(\mathbf{x}_n - \boldsymbol{\mu})$ (synthetic variable w/ the eigenvector for the 2nd largest eigenvalue)
- The resulting $\mathbf{y}_n$ has mean vector $\mathbf{0}$ and covariance matrix $\boldsymbol{\Lambda} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$.
Whitening Transformation

- Observation data $\mathbf{x}_n$: mean vector $\boldsymbol{\mu}$, covariance $\mathbf{S}$
- Projection $\mathbf{y}_n = \mathbf{U}^\top(\mathbf{x}_n - \boldsymbol{\mu})$: mean vector $\mathbf{0}$, covariance $\boldsymbol{\Lambda}$
- Whitening:
$$\mathbf{z}_n = \boldsymbol{\Lambda}^{-1/2}\,\mathbf{y}_n = \boldsymbol{\Lambda}^{-1/2}\,\mathbf{U}^\top(\mathbf{x}_n - \boldsymbol{\mu})$$
with mean vector $\mathbf{0}$ and covariance $\mathbf{I}$.
- The linear transform for whitening is $\boldsymbol{\Lambda}^{-1/2}\mathbf{U}^\top$.
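Continuing the PCA sketch above, the whitening transform is one extra matrix product; this assumes the variables X, mu, U, and lam from that sketch:

```python
# Whitening: z_n = Lambda^{-1/2} U^T (x_n - mu), giving zero mean
# and identity covariance.
Z = (X - mu) @ U @ np.diag(lam ** -0.5)
print(Z.mean(axis=0).round(6))          # ~ [0, 0]
print(np.cov(Z.T, bias=True).round(6))  # ~ identity matrix
```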
Continuous Latent Variables (from PCA to FA)
Whitening Process with PCA

The linear transformation for whitening maps the high-dimensional space to a low-dimensional space (here, one dimension):

$$z_n = \lambda_1^{-1/2}\, \mathbf{u}_1^\top (\mathbf{x}_n - \boldsymbol{\mu})$$

- Observation data $\mathbf{x}_n$: mean vector $\boldsymbol{\mu}$, covariance $\mathbf{S}$
- Low-dimensional data $z_n$: mean 0, covariance 1

1. Dimension reduction
2. Processing for the low-dimensional data, e.g., probability density modeling

The low-dimensional data are regarded as observation data, so the errors caused by the linear transformation are ignored; i.e., this approach is unable to model the probability density of the original observation data.
Basic Idea of Factor Analysis (FA)

Reverse the direction of the linear transformation, from the low-dimensional space to the high-dimensional space:

1. Low-dimensional data generation: $z_n$ (mean 0, covariance 1)
2. Projection onto the subspace:
$$\hat{\mathbf{x}}_n = \mathbf{u}_1\, \lambda_1^{1/2}\, z_n + \boldsymbol{\mu} \quad (\text{projected data } \hat{\mathbf{x}}_n)$$
3. Random noise addition:
$$\mathbf{x}_n = \hat{\mathbf{x}}_n + \mathbf{e}_n \quad (\text{observation data } \mathbf{x}_n)$$
Comparison between PCA and FA

- FA is capable of defining the p.d.f. of the observation data based on the inversion of the whitening transformation process.
- Whitening with PCA: observation data $\mathbf{x}_n$ (mean vector $\boldsymbol{\mu}$, covariance $\mathbf{S}$) → low-dimensional data $z_n = \lambda_1^{-1/2}\,\mathbf{u}_1^\top(\mathbf{x}_n - \boldsymbol{\mu})$ (mean 0, covariance 1)
- Factor analysis (FA): $\hat{\mathbf{x}}_n = \mathbf{u}_1\,\lambda_1^{1/2}\, z_n + \boldsymbol{\mu}$, where both the factor $z_n$ and the error $\mathbf{e}_n = \mathbf{x}_n - \hat{\mathbf{x}}_n$ are modeled as random variables.
Representation of Observation Data w/ FA

- Representation of observation data: $\mathbf{x}_n = \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu} + \mathbf{e}_n$, where $\mathbf{W}$ is the loading matrix
- Observation model given the factors:
$$p(\mathbf{x}_n \mid \mathbf{z}_n, \lambda) = \mathcal{N}(\mathbf{x}_n;\, \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu},\, \boldsymbol{\Sigma})$$
- Observation noise: $p(\mathbf{e}_n \mid \lambda) = \mathcal{N}(\mathbf{e}_n;\, \mathbf{0},\, \boldsymbol{\Sigma})$
- Factors (low-dimensional data): $p(\mathbf{z}_n \mid \lambda) = \mathcal{N}(\mathbf{z}_n;\, \mathbf{0},\, \mathbf{I})$
Marginalization over Latent Variables

- If one sample $z^{(1)} \sim p(z \mid \lambda)$ is generated:
$$p(\mathbf{x}_n \mid \lambda) \approx p(\mathbf{x}_n \mid z^{(1)}, \lambda) = \mathcal{N}(\mathbf{x}_n;\, \mathbf{w}\,z^{(1)} + \boldsymbol{\mu},\, \boldsymbol{\Sigma})$$
- If K samples $z^{(1)}, \dots, z^{(K)} \sim p(z \mid \lambda)$ are generated:
$$p(\mathbf{x}_n \mid \lambda) \approx \frac{1}{K}\sum_{k=1}^{K} \mathcal{N}(\mathbf{x}_n;\, \mathbf{w}\,z^{(k)} + \boldsymbol{\mu},\, \boldsymbol{\Sigma})$$
- If an infinite # of samples are generated, the average becomes an integral:
$$p(\mathbf{x}_n \mid \lambda) = \int p(\mathbf{x}_n \mid z, \lambda)\, p(z \mid \lambda)\, \mathrm{d}z$$
Derivation of p.d.f. of Observation Data

Derived by marginalizing the joint p.d.f. over the factors, which are regarded as a latent variable:

$$p(\mathbf{x}_n \mid \lambda) = \int p(\mathbf{x}_n \mid \mathbf{z}_n, \lambda)\, p(\mathbf{z}_n \mid \lambda)\, \mathrm{d}\mathbf{z}_n = \int \mathcal{N}(\mathbf{x}_n;\, \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu},\, \boldsymbol{\Sigma})\, \mathcal{N}(\mathbf{z}_n;\, \mathbf{0}, \mathbf{I})\, \mathrm{d}\mathbf{z}_n = \mathcal{N}(\mathbf{x}_n;\, \boldsymbol{\mu},\, \mathbf{W}\mathbf{W}^\top + \boldsymbol{\Sigma})$$

Expectations:

$$\mathbb{E}[\mathbf{x}_n] = \iint \mathbf{x}_n\, \mathcal{N}(\mathbf{x}_n;\, \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu}, \boldsymbol{\Sigma})\, \mathcal{N}(\mathbf{z}_n;\, \mathbf{0}, \mathbf{I})\, \mathrm{d}\mathbf{x}_n\, \mathrm{d}\mathbf{z}_n = \boldsymbol{\mu} \quad (= \text{mean vector})$$

$$\mathbb{E}[\mathbf{x}_n\mathbf{x}_n^\top] = \mathbf{W}\mathbf{W}^\top + \boldsymbol{\Sigma} + \boldsymbol{\mu}\boldsymbol{\mu}^\top \quad (= \text{covariance matrix} + \text{squared mean vector})$$
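A quick Monte Carlo sanity check of this result (my own sketch, with arbitrary example parameters): sampling z_n ~ N(0, I) and e_n ~ N(0, Σ) should give data whose empirical mean and covariance match μ and WW^T + Σ.

```python
# Empirical check of p(x | lambda) = N(x; mu, W W^T + Sigma).
import numpy as np

rng = np.random.default_rng(1)
D, M, N = 3, 2, 200_000
W = rng.normal(size=(D, M))           # loading matrix (arbitrary example)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.diag([0.3, 0.2, 0.5])      # diagonal observation-noise covariance

Z = rng.normal(size=(N, M))           # z_n ~ N(0, I)
E = rng.multivariate_normal(np.zeros(D), Sigma, size=N)
X = Z @ W.T + mu + E                  # x_n = W z_n + mu + e_n

print(np.allclose(X.mean(axis=0), mu, atol=0.02))           # True
print(np.allclose(np.cov(X.T), W @ W.T + Sigma, atol=0.05)) # True
```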
Comparison between GMM and FA

- GMM: discrete latent variables; the prior is a discrete distribution:
$$p(\mathbf{x}_n \mid \lambda) = \sum_{m=1}^{M} p(z_{n,m} = 1 \mid \lambda)\; p(\mathbf{x}_n \mid z_{n,m} = 1, \lambda)$$
- FA: continuous latent variables; the prior is a Gaussian distribution:
$$p(\mathbf{x}_n \mid \lambda) = \int p(\mathbf{z}_n \mid \lambda)\; p(\mathbf{x}_n \mid \mathbf{z}_n, \lambda)\, \mathrm{d}\mathbf{z}_n$$
Model Training (Parameter Optimization)
Maximum Likelihood (ML) Estimation

- Log-scaled likelihood function:
$$\ln p(\mathbf{X} \mid \lambda) = \sum_{n=1}^{N} \ln \int p(\mathbf{x}_n, \mathbf{z}_n \mid \lambda)\, \mathrm{d}\mathbf{z}_n = \sum_{n=1}^{N} \ln \mathcal{N}(\mathbf{x}_n;\, \boldsymbol{\mu},\, \mathbf{W}\mathbf{W}^\top + \boldsymbol{\Sigma})$$
- ML estimates of the model parameters:
  - Mean vector: $\partial \ln p(\mathbf{X} \mid \lambda)/\partial \boldsymbol{\mu} = \mathbf{0}$ gives $\hat{\boldsymbol{\mu}} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_n$ (a linear equation!)
  - Loading matrix: $\partial \ln p(\mathbf{X} \mid \lambda)/\partial \mathbf{W} = \mathbf{0}$ gives $\hat{\mathbf{W}} = \,?$ (nonlinear equations...)
  - Covariance matrix: $\partial \ln p(\mathbf{X} \mid \lambda)/\partial \boldsymbol{\Sigma} = \mathbf{0}$ gives $\hat{\boldsymbol{\Sigma}} = \,?$
- How can we determine the ML estimates of these parameters?
Lower Bound of Likelihood Function

Derivation of a lower bound of the log-scaled likelihood function, by introducing a probability density function $q(\mathbf{z}_n)$ of the latent variables:

$$\ln p(\mathbf{X} \mid \lambda) = \sum_{n=1}^{N} \ln \int p(\mathbf{x}_n, \mathbf{z}_n \mid \lambda)\, \mathrm{d}\mathbf{z}_n = \sum_{n=1}^{N} \ln \int q(\mathbf{z}_n)\, \frac{p(\mathbf{x}_n, \mathbf{z}_n \mid \lambda)}{q(\mathbf{z}_n)}\, \mathrm{d}\mathbf{z}_n \;\geq\; \sum_{n=1}^{N} \int q(\mathbf{z}_n) \ln \frac{p(\mathbf{x}_n, \mathbf{z}_n \mid \lambda)}{q(\mathbf{z}_n)}\, \mathrm{d}\mathbf{z}_n$$

by Jensen's inequality. The lower bound is

$$\mathcal{L}(q, \lambda) = \sum_{n=1}^{N} \int q(\mathbf{z}_n) \ln \frac{p(\mathbf{x}_n, \mathbf{z}_n \mid \lambda)}{q(\mathbf{z}_n)}\, \mathrm{d}\mathbf{z}_n$$
EM Algorithm

Maximization of the lower bound (a functional of q and a function of λ):

$$\mathcal{L}(q, \lambda) = \ln p(\mathbf{X} \mid \lambda) - \sum_{n=1}^{N} \mathrm{KL}\left[\, q(\mathbf{z}_n)\, \|\, p(\mathbf{z}_n \mid \mathbf{x}_n, \lambda)\, \right]$$

- Maximize the lower bound with respect to q: the log-scaled likelihood $\ln p(\mathbf{X} \mid \lambda)$ does not depend on q, so this minimizes the KL divergence $\mathrm{KL}[q \| p]$.
- Maximize the lower bound with respect to λ (= {W, Σ}): only the first term of
$$\mathcal{L}(q, \lambda) = \sum_{n=1}^{N} \int q(\mathbf{z}_n) \ln p(\mathbf{x}_n, \mathbf{z}_n \mid \lambda)\, \mathrm{d}\mathbf{z}_n - \sum_{n=1}^{N} \int q(\mathbf{z}_n) \ln q(\mathbf{z}_n)\, \mathrm{d}\mathbf{z}_n$$
depends on λ; it is the auxiliary function.
Review: Schematic Image of EM Algorithm

[Figure omitted: the log-scaled likelihood function $\ln p(\mathbf{X} \mid \lambda)$ plotted over λ, with successive lower bounds $\mathcal{L}(q^{(i)}, \lambda)$ touching it at $\lambda^{(i)}$.]

0. Start from the current model parameter set $\lambda^{(i)}$.
1. E-step: determine the lower bound $\mathcal{L}(q^{(i)}, \lambda)$ based on the current model parameters.
2. M-step: update the model parameters to $\lambda^{(i+1)}$ based on the lower bound.
3. E-step: determine the new lower bound $\mathcal{L}(q^{(i+1)}, \lambda)$, and repeat.
E-Step: Update q

- Set the KL divergence to 0 under the fixed model parameters $\lambda_{\mathrm{old}}$:
$$\hat{q}(\mathbf{z}_n) = p(\mathbf{z}_n \mid \mathbf{x}_n, \lambda_{\mathrm{old}}) \;\Rightarrow\; \sum_{n=1}^{N} \mathrm{KL}\left[\hat{q}(\mathbf{z}_n)\, \|\, p(\mathbf{z}_n \mid \mathbf{x}_n, \lambda_{\mathrm{old}})\right] = 0, \qquad \mathcal{L}(\hat{q}, \lambda_{\mathrm{old}}) = \ln p(\mathbf{X} \mid \lambda_{\mathrm{old}})$$
- Calculate the posterior probability density of the latent variables for each sample:
$$p(\mathbf{z}_n \mid \mathbf{x}_n, \lambda_{\mathrm{old}}) = \frac{p(\mathbf{x}_n \mid \mathbf{z}_n, \lambda_{\mathrm{old}})\, p(\mathbf{z}_n \mid \lambda_{\mathrm{old}})}{\int p(\mathbf{x}_n \mid \mathbf{z}_n, \lambda_{\mathrm{old}})\, p(\mathbf{z}_n \mid \lambda_{\mathrm{old}})\, \mathrm{d}\mathbf{z}_n} = \frac{1}{Z}\, \mathcal{N}(\mathbf{x}_n;\, \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu}, \boldsymbol{\Sigma})\, \mathcal{N}(\mathbf{z}_n;\, \mathbf{0}, \mathbf{I}) = \mathcal{N}(\mathbf{z}_n;\, ?,\, ?)$$
Posterior Probability Density Function

Collecting the terms in the exponent (see Appendix 1 for the derivation):

$$p(\mathbf{z}_n \mid \mathbf{x}_n, \lambda_{\mathrm{old}}) \propto \exp\left\{ -\frac{1}{2}\mathbf{z}_n^\top \boldsymbol{\Sigma}^{(z|x)-1} \mathbf{z}_n + \mathbf{z}_n^\top \boldsymbol{\Sigma}^{(z|x)-1} \boldsymbol{\mu}_n^{(z|x)} + \text{const} \right\} \propto \exp\left\{ -\frac{1}{2}\left(\mathbf{z}_n - \boldsymbol{\mu}_n^{(z|x)}\right)^\top \boldsymbol{\Sigma}^{(z|x)-1} \left(\mathbf{z}_n - \boldsymbol{\mu}_n^{(z|x)}\right) \right\}$$

Posterior probability density function of the latent variables:

$$p(\mathbf{z}_n \mid \mathbf{x}_n, \lambda_{\mathrm{old}}) = \mathcal{N}\left(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)}\right)$$

- Covariance matrix: $\boldsymbol{\Sigma}^{(z|x)} = \left(\mathbf{W}^\top\boldsymbol{\Sigma}^{-1}\mathbf{W} + \mathbf{I}\right)^{-1}$ (sample-independent; a full matrix)
- Mean vector: $\boldsymbol{\mu}_n^{(z|x)} = \boldsymbol{\Sigma}^{(z|x)}\mathbf{W}^\top\boldsymbol{\Sigma}^{-1}\hat{\mathbf{x}}_n = \mathbf{A}\hat{\mathbf{x}}_n$, where $\hat{\mathbf{x}}_n = \mathbf{x}_n - \boldsymbol{\mu}$ (sample-dependent; a linear transformation)
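In code, the E-step is two matrix equations. A hedged numpy sketch (the function and variable names are my own):

```python
# FA E-step: shared posterior covariance, per-sample posterior means.
import numpy as np

def fa_e_step(X, W, mu, Sigma):
    """X: (N, D) data; returns posterior means (one row per sample)
    and the shared posterior covariance."""
    M = W.shape[1]
    Si_W = np.linalg.solve(Sigma, W)                   # Sigma^{-1} W
    cov_post = np.linalg.inv(W.T @ Si_W + np.eye(M))   # (W^T Sigma^{-1} W + I)^{-1}
    means_post = (X - mu) @ Si_W @ cov_post            # rows: Sigma_post W^T Sigma^{-1} (x_n - mu)
    return means_post, cov_post
```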
Schematic Image of E-Step

Posterior probability density function of the latent variables:

$$p(\mathbf{z}_n \mid \mathbf{x}_n, \lambda_{\mathrm{old}}) = \mathcal{N}\left(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)}, \boldsymbol{\Sigma}^{(z|x)}\right), \qquad \boldsymbol{\Sigma}^{(z|x)} = \left(\mathbf{W}^\top\boldsymbol{\Sigma}^{-1}\mathbf{W} + \mathbf{I}\right)^{-1}, \qquad \boldsymbol{\mu}_n^{(z|x)} = \boldsymbol{\Sigma}^{(z|x)}\mathbf{W}^\top\boldsymbol{\Sigma}^{-1}(\mathbf{x}_n - \boldsymbol{\mu})$$

Each observation data sample gets its own posterior p.d.f. of the latent variables:
- 1st data sample $\mathbf{x}_1$: $p(\mathbf{z}_1 \mid \mathbf{x}_1, \lambda_{\mathrm{old}}) = \mathcal{N}(\mathbf{z}_1;\, \boldsymbol{\mu}_1^{(z|x)}, \boldsymbol{\Sigma}^{(z|x)})$
- 2nd data sample $\mathbf{x}_2$: $p(\mathbf{z}_2 \mid \mathbf{x}_2, \lambda_{\mathrm{old}}) = \mathcal{N}(\mathbf{z}_2;\, \boldsymbol{\mu}_2^{(z|x)}, \boldsymbol{\Sigma}^{(z|x)})$, and so on.

The mean vector changes per sample; the covariance matrix is shared by all samples.
M-Step: Update λ

Maximize the auxiliary function with respect to the model parameters $\lambda_{\mathrm{new}}$:

$$\mathcal{L}(\hat{q}, \lambda_{\mathrm{new}}) = \sum_{n=1}^{N} \int \hat{q}(\mathbf{z}_n) \ln p(\mathbf{x}_n, \mathbf{z}_n \mid \lambda_{\mathrm{new}})\, \mathrm{d}\mathbf{z}_n + \text{const} = \mathcal{Q}(\lambda_{\mathrm{old}}, \lambda_{\mathrm{new}}) + \text{const}$$

Auxiliary function:

$$\mathcal{Q}(\lambda_{\mathrm{old}}, \lambda_{\mathrm{new}}) = \sum_{n=1}^{N} \int p(\mathbf{z}_n \mid \mathbf{x}_n, \lambda_{\mathrm{old}}) \ln p(\mathbf{x}_n, \mathbf{z}_n \mid \lambda_{\mathrm{new}})\, \mathrm{d}\mathbf{z}_n = \sum_{n=1}^{N} \int \mathcal{N}\left(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)}, \boldsymbol{\Sigma}^{(z|x)}\right) \ln \left\{ \mathcal{N}(\mathbf{x}_n;\, \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu}, \boldsymbol{\Sigma})\, \mathcal{N}(\mathbf{z}_n;\, \mathbf{0}, \mathbf{I}) \right\} \mathrm{d}\mathbf{z}_n$$
Expansion of Auxiliary Function

Expanding the auxiliary function (see Appendix 2 for more details), with $\hat{\mathbf{x}}_n = \mathbf{x}_n - \boldsymbol{\mu}$:

$$\mathcal{Q}(\lambda_{\mathrm{old}}, \lambda_{\mathrm{new}}) = \sum_{n=1}^{N} \left[ -\frac{1}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\hat{\mathbf{x}}_n^\top\boldsymbol{\Sigma}^{-1}\hat{\mathbf{x}}_n + \mathbb{E}[\mathbf{z}_n]^\top\mathbf{W}^\top\boldsymbol{\Sigma}^{-1}\hat{\mathbf{x}}_n - \frac{1}{2}\mathrm{tr}\left(\mathbf{W}^\top\boldsymbol{\Sigma}^{-1}\mathbf{W}\,\mathbb{E}[\mathbf{z}_n\mathbf{z}_n^\top]\right) \right] + \text{const}$$

where $\mathbb{E}[\mathbf{z}_n]$ (expectation of $\mathbf{z}_n$) and $\mathbb{E}[\mathbf{z}_n\mathbf{z}_n^\top]$ (expectation of $\mathbf{z}_n\mathbf{z}_n^\top$) are taken under the posterior $\mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)}, \boldsymbol{\Sigma}^{(z|x)})$. Collecting the terms over samples:

$$\mathcal{Q} = -\frac{N}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\mathrm{tr}\left(\boldsymbol{\Sigma}^{-1}\sum_{n=1}^{N}\hat{\mathbf{x}}_n\hat{\mathbf{x}}_n^\top\right) + \mathrm{tr}\left(\boldsymbol{\Sigma}^{-1}\mathbf{W}\sum_{n=1}^{N}\mathbb{E}[\mathbf{z}_n]\hat{\mathbf{x}}_n^\top\right) - \frac{1}{2}\mathrm{tr}\left(\mathbf{W}^\top\boldsymbol{\Sigma}^{-1}\mathbf{W}\sum_{n=1}^{N}\mathbb{E}[\mathbf{z}_n\mathbf{z}_n^\top]\right) + \text{const}$$

with N the # of samples, and summations of $\hat{\mathbf{x}}_n\hat{\mathbf{x}}_n^\top$, of the cross terms $\mathbf{z}_n\hat{\mathbf{x}}_n^\top$, and of $\mathbf{z}_n\mathbf{z}_n^\top$.
Sufficient Statistics

Analytical calculation of the expectations:

$$\mathbb{E}[\mathbf{z}_n] = \int \mathbf{z}_n\, \mathcal{N}\left(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)}, \boldsymbol{\Sigma}^{(z|x)}\right) \mathrm{d}\mathbf{z}_n = \boldsymbol{\mu}_n^{(z|x)}$$

$$\mathbb{E}[\mathbf{z}_n\mathbf{z}_n^\top] = \int \mathbf{z}_n\mathbf{z}_n^\top\, \mathcal{N}\left(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)}, \boldsymbol{\Sigma}^{(z|x)}\right) \mathrm{d}\mathbf{z}_n = \boldsymbol{\Sigma}^{(z|x)} + \boldsymbol{\mu}_n^{(z|x)}\boldsymbol{\mu}_n^{(z|x)\top}$$

Sufficient statistics:
- N (# of samples)
- $\sum_{n=1}^{N} \mathrm{diag}\left(\hat{\mathbf{x}}_n\hat{\mathbf{x}}_n^\top\right)$ (sum of squared samples)
- $\sum_{n=1}^{N} \mathbb{E}[\mathbf{z}_n\mathbf{z}_n^\top]$ (sum of expectations of squared latent variables)
- $\sum_{n=1}^{N} \hat{\mathbf{x}}_n\,\mathbb{E}[\mathbf{z}_n]^\top$ (sum of cross terms)
ML Estimates

Auxiliary function:

$$\mathcal{Q}(\lambda_{\mathrm{old}}, \lambda_{\mathrm{new}}) = -\frac{N}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\mathrm{tr}\left(\boldsymbol{\Sigma}^{-1}\sum_n\hat{\mathbf{x}}_n\hat{\mathbf{x}}_n^\top\right) + \mathrm{tr}\left(\boldsymbol{\Sigma}^{-1}\mathbf{W}\sum_n\mathbb{E}[\mathbf{z}_n]\hat{\mathbf{x}}_n^\top\right) - \frac{1}{2}\mathrm{tr}\left(\mathbf{W}^\top\boldsymbol{\Sigma}^{-1}\mathbf{W}\sum_n\mathbb{E}[\mathbf{z}_n\mathbf{z}_n^\top]\right) + \text{const}$$

ML estimate of the loading matrix (a linear equation!):

$$\frac{\partial \mathcal{Q}}{\partial \mathbf{W}} = \boldsymbol{\Sigma}^{-1}\sum_n \hat{\mathbf{x}}_n\,\mathbb{E}[\mathbf{z}_n]^\top - \boldsymbol{\Sigma}^{-1}\mathbf{W}\sum_n\mathbb{E}[\mathbf{z}_n\mathbf{z}_n^\top] = \mathbf{0} \;\Rightarrow\; \hat{\mathbf{W}} = \left(\sum_n\hat{\mathbf{x}}_n\,\mathbb{E}[\mathbf{z}_n]^\top\right)\left(\sum_n\mathbb{E}[\mathbf{z}_n\mathbf{z}_n^\top]\right)^{-1}$$

ML estimate of the covariance matrix:

$$\hat{\boldsymbol{\Sigma}} = \frac{1}{N}\,\mathrm{diag}\left\{\sum_n\hat{\mathbf{x}}_n\hat{\mathbf{x}}_n^\top - \hat{\mathbf{W}}\sum_n\mathbb{E}[\mathbf{z}_n]\,\hat{\mathbf{x}}_n^\top\right\}$$
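Putting the E-step and M-step together gives one full EM iteration for FA. The following sketch is my own arrangement of the updates above: it computes the posterior moments, accumulates the sufficient statistics, and applies the two closed-form updates.

```python
# One EM iteration for factor analysis with diagonal noise covariance.
import numpy as np

def fa_em_step(X, W, Sigma):
    N, D = X.shape
    M = W.shape[1]
    mu = X.mean(axis=0)                    # ML mean (closed form, outside EM)
    Xc = X - mu
    # E-step: posterior moments of the factors
    Si_W = np.linalg.solve(Sigma, W)                   # Sigma^{-1} W
    cov_post = np.linalg.inv(W.T @ Si_W + np.eye(M))   # shared covariance
    Ez = Xc @ Si_W @ cov_post                          # E[z_n], one row per sample
    sum_zz = N * cov_post + Ez.T @ Ez                  # sum_n E[z_n z_n^T]
    sum_xz = Xc.T @ Ez                                 # sum_n (x_n - mu) E[z_n]^T
    # M-step: linear equation for W, diagonalized residual for Sigma
    W_new = sum_xz @ np.linalg.inv(sum_zz)
    Sigma_new = np.diag(np.diag(Xc.T @ Xc - W_new @ sum_xz.T)) / N
    return W_new, Sigma_new, mu
```

Iterating fa_em_step until the log-likelihood stops improving yields the ML estimates; each iteration is guaranteed not to decrease the likelihood, per the lower-bound argument above.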
App. 1: Derivation of Posterior p.d.f.

$$p(\mathbf{z}_n \mid \mathbf{x}_n, \lambda_{\mathrm{old}}) \propto \mathcal{N}(\mathbf{x}_n;\, \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu}, \boldsymbol{\Sigma})\, \mathcal{N}(\mathbf{z}_n;\, \mathbf{0}, \mathbf{I}) \propto \exp\left\{-\frac{1}{2}(\hat{\mathbf{x}}_n - \mathbf{W}\mathbf{z}_n)^\top\boldsymbol{\Sigma}^{-1}(\hat{\mathbf{x}}_n - \mathbf{W}\mathbf{z}_n)\right\} \exp\left\{-\frac{1}{2}\mathbf{z}_n^\top\mathbf{z}_n\right\}$$

Inside of exp(·):

$$-\frac{1}{2}\mathbf{z}_n^\top\mathbf{W}^\top\boldsymbol{\Sigma}^{-1}\mathbf{W}\mathbf{z}_n + \mathbf{z}_n^\top\mathbf{W}^\top\boldsymbol{\Sigma}^{-1}\hat{\mathbf{x}}_n - \frac{1}{2}\mathbf{z}_n^\top\mathbf{z}_n + \text{const}$$

$$= -\frac{1}{2}\mathbf{z}_n^\top\left(\mathbf{W}^\top\boldsymbol{\Sigma}^{-1}\mathbf{W} + \mathbf{I}\right)\mathbf{z}_n + \mathbf{z}_n^\top\mathbf{W}^\top\boldsymbol{\Sigma}^{-1}\hat{\mathbf{x}}_n + \text{const}$$

$$= -\frac{1}{2}\mathbf{z}_n^\top\boldsymbol{\Sigma}^{(z|x)-1}\mathbf{z}_n + \mathbf{z}_n^\top\boldsymbol{\Sigma}^{(z|x)-1}\boldsymbol{\mu}_n^{(z|x)} + \text{const}$$

$$= -\frac{1}{2}\left(\mathbf{z}_n - \boldsymbol{\mu}_n^{(z|x)}\right)^\top\boldsymbol{\Sigma}^{(z|x)-1}\left(\mathbf{z}_n - \boldsymbol{\mu}_n^{(z|x)}\right) + \text{const}$$

with $\boldsymbol{\Sigma}^{(z|x)-1} = \mathbf{W}^\top\boldsymbol{\Sigma}^{-1}\mathbf{W} + \mathbf{I}$ and $\boldsymbol{\Sigma}^{(z|x)-1}\boldsymbol{\mu}_n^{(z|x)} = \mathbf{W}^\top\boldsymbol{\Sigma}^{-1}\hat{\mathbf{x}}_n$.
App. 2: Expansion of Auxiliary Function (1)

Auxiliary function:

$$\mathcal{Q}(\lambda_{\mathrm{old}}, \lambda_{\mathrm{new}}) = \sum_{n=1}^{N}\int \mathcal{N}\left(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)}, \boldsymbol{\Sigma}^{(z|x)}\right) \ln\left\{\mathcal{N}(\mathbf{x}_n;\, \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu}, \boldsymbol{\Sigma})\,\mathcal{N}(\mathbf{z}_n;\, \mathbf{0}, \mathbf{I})\right\}\mathrm{d}\mathbf{z}_n$$

$$= \sum_{n=1}^{N}\int \mathcal{N}\left(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)}, \boldsymbol{\Sigma}^{(z|x)}\right)\left[-\frac{1}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\hat{\mathbf{x}}_n^\top\boldsymbol{\Sigma}^{-1}\hat{\mathbf{x}}_n + \mathbf{z}_n^\top\mathbf{W}^\top\boldsymbol{\Sigma}^{-1}\hat{\mathbf{x}}_n - \frac{1}{2}\mathrm{tr}\left(\mathbf{W}^\top\boldsymbol{\Sigma}^{-1}\mathbf{W}\,\mathbf{z}_n\mathbf{z}_n^\top\right)\right]\mathrm{d}\mathbf{z}_n + \text{const}$$

(a quadratic form in $\mathbf{z}_n$)

$$= \sum_{n=1}^{N}\left[-\frac{1}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\hat{\mathbf{x}}_n^\top\boldsymbol{\Sigma}^{-1}\hat{\mathbf{x}}_n + \mathbb{E}[\mathbf{z}_n]^\top\mathbf{W}^\top\boldsymbol{\Sigma}^{-1}\hat{\mathbf{x}}_n - \frac{1}{2}\mathrm{tr}\left(\mathbf{W}^\top\boldsymbol{\Sigma}^{-1}\mathbf{W}\,\mathbb{E}[\mathbf{z}_n\mathbf{z}_n^\top]\right)\right] + \text{const}$$

where $\mathbb{E}[\mathbf{z}_n]$ (expectation of $\mathbf{z}_n$) and $\mathbb{E}[\mathbf{z}_n\mathbf{z}_n^\top]$ (expectation of $\mathbf{z}_n\mathbf{z}_n^\top$) are taken under the posterior.
App. 2: Expansion of Auxiliary Function (2)

Analytical calculation of the expectations:

$$\mathbb{E}[\mathbf{z}_n] = \boldsymbol{\mu}_n^{(z|x)}, \qquad \mathbb{E}[\mathbf{z}_n\mathbf{z}_n^\top] = \boldsymbol{\Sigma}^{(z|x)} + \boldsymbol{\mu}_n^{(z|x)}\boldsymbol{\mu}_n^{(z|x)\top}$$

Substituting into the auxiliary function (a quadratic form in $\hat{\mathbf{x}}_n$) and summing over samples:

$$\mathcal{Q}(\lambda_{\mathrm{old}}, \lambda_{\mathrm{new}}) = -\frac{N}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\mathrm{tr}\left(\boldsymbol{\Sigma}^{-1}\sum_{n=1}^{N}\hat{\mathbf{x}}_n\hat{\mathbf{x}}_n^\top\right) + \mathrm{tr}\left(\boldsymbol{\Sigma}^{-1}\mathbf{W}\sum_{n=1}^{N}\mathbb{E}[\mathbf{z}_n]\hat{\mathbf{x}}_n^\top\right) - \frac{1}{2}\mathrm{tr}\left(\mathbf{W}^\top\boldsymbol{\Sigma}^{-1}\mathbf{W}\sum_{n=1}^{N}\mathbb{E}[\mathbf{z}_n\mathbf{z}_n^\top]\right) + \text{const}$$

with N the # of samples, and summations of $\hat{\mathbf{x}}_n\hat{\mathbf{x}}_n^\top$, of the cross terms $\mathbf{z}_n\hat{\mathbf{x}}_n^\top$, and of $\mathbf{z}_n\mathbf{z}_n^\top$.
App. 2: Expansion of Auxiliary Function (3)

Sufficient statistics:
- N (# of samples)
- $\sum_{n=1}^{N}\mathrm{diag}\left(\hat{\mathbf{x}}_n\hat{\mathbf{x}}_n^\top\right)$ (sum of squared samples)
- $\sum_{n=1}^{N}\mathbb{E}[\mathbf{z}_n\mathbf{z}_n^\top]$ (sum of expectations of squared latent variables)
- $\sum_{n=1}^{N}\hat{\mathbf{x}}_n\,\mathbb{E}[\mathbf{z}_n]^\top$ (sum of cross terms)

Auxiliary function in terms of these statistics:

$$\mathcal{Q}(\lambda_{\mathrm{old}}, \lambda_{\mathrm{new}}) = -\frac{N}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\mathrm{tr}\left(\boldsymbol{\Sigma}^{-1}\sum_n\hat{\mathbf{x}}_n\hat{\mathbf{x}}_n^\top\right) + \mathrm{tr}\left(\boldsymbol{\Sigma}^{-1}\mathbf{W}\sum_n\mathbb{E}[\mathbf{z}_n]\hat{\mathbf{x}}_n^\top\right) - \frac{1}{2}\mathrm{tr}\left(\mathbf{W}^\top\boldsymbol{\Sigma}^{-1}\mathbf{W}\sum_n\mathbb{E}[\mathbf{z}_n\mathbf{z}_n^\top]\right) + \text{const}$$