8/11/2019 Graphic Models Lecture13
A Practical Introduction to Graphical Modelsand their use in ASR
6.345
Lecture # 13
Session 2003
Graphical models for ASR (6.345 Automatic Speech Recognition)
Graphical models for ASR
HMMs (and most other common ASR models) have some drawbacks
Strong independence assumptions
Single state variable per time frame
May want to model more complex structure
Multiple processes (audio + video, speech + noise, multiple streams of acoustic features, articulatory features)
Dependencies between these processes or between acoustic observations
Graphical models provide:
General algorithms for large class of models
No need to write new code for each new model
A language with which to talk about statistical models
Outline
First half: intro to GMs
Independence & conditional independence
Bayesian networks (BNs)
* Definition
* Main problems
Graphical models in general
Second half: dynamic Bayesian networks (DBNs) for speech recognition
Dynamic Bayesian networks -- HMMs and beyond
Implementation of ASR decoding/training using DBNs
More complex DBNs for recognition
GMTK
(Statistical) independence
Definition: Given the random variables X and Y, X is independent of Y (written X ⊥ Y) iff
p(x|y) = p(x)
or, equivalently,
p(x,y) = p(x)p(y)   or   p(y|x) = p(y)
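The product test p(x,y) = p(x)p(y) can be checked numerically on a small joint table. The tables below are invented for illustration; the first joint is independent by construction, the second is a perturbed, dependent version:

```python
# Check independence of two discrete variables from their joint table.
from itertools import product

# Hypothetical marginals; joint built as p(x, y) = p(x) p(y), so independent.
px = {0: 0.3, 1: 0.7}
py = {0: 0.6, 1: 0.4}
joint = {(x, y): px[x] * py[y] for x, y in product(px, py)}

def is_independent(joint, tol=1e-12):
    """True iff p(x, y) = p(x) p(y) for all x, y."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    marg_x = {x: sum(joint[x, y] for y in ys) for x in xs}
    marg_y = {y: sum(joint[x, y] for x in xs) for y in ys}
    return all(abs(joint[x, y] - marg_x[x] * marg_y[y]) < tol
               for x, y in joint)

print(is_independent(joint))   # True

# Perturb the table (still sums to 1): now p(x|y) != p(x).
dep = dict(joint)
dep[0, 0] += 0.05; dep[1, 1] += 0.05
dep[0, 1] -= 0.05; dep[1, 0] -= 0.05
print(is_independent(dep))     # False
```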
(Statistical) conditional independence
Definition: Given the random variables X, Y, and Z, X is conditionally independent of Y given Z (written X ⊥ Y | Z) iff
p(x|y,z) = p(x|z)
or, equivalently,
p(x,y|z) = p(x|z)p(y|z)   or   p(y|x,z) = p(y|z)
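This definition can also be verified numerically. Here Z is a common cause of X and Y (the structure of several upcoming examples), so X ⊥ Y | Z holds by construction while X and Y are marginally dependent; all table values are invented:

```python
# Verify p(x,y|z) = p(x|z) p(y|z) on a common-cause model Z -> X, Z -> Y.
from itertools import product

pz = {0: 0.5, 1: 0.5}
px_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # p(x|z)
py_z = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}   # p(y|z)

joint = {(x, y, z): pz[z] * px_z[z][x] * py_z[z][y]
         for x, y, z in product((0, 1), repeat=3)}

def max_cond_dep(joint):
    """Largest violation of p(x,y|z) = p(x|z) p(y|z) over all x, y, z."""
    worst = 0.0
    for z in (0, 1):
        p_z = sum(joint[x, y, z] for x in (0, 1) for y in (0, 1))
        for x, y in product((0, 1), repeat=2):
            pxy = joint[x, y, z] / p_z
            px = sum(joint[x, yy, z] for yy in (0, 1)) / p_z
            py = sum(joint[xx, y, z] for xx in (0, 1)) / p_z
            worst = max(worst, abs(pxy - px * py))
    return worst

def max_marg_dep(joint):
    """Largest violation of p(x,y) = p(x) p(y)."""
    worst = 0.0
    for x, y in product((0, 1), repeat=2):
        pxy = sum(joint[x, y, z] for z in (0, 1))
        px = sum(joint[x, yy, z] for yy in (0, 1) for z in (0, 1))
        py = sum(joint[xx, y, z] for xx in (0, 1) for z in (0, 1))
        worst = max(worst, abs(pxy - px * py))
    return worst

print(max_cond_dep(joint))   # ~0: X and Y independent given Z
print(max_marg_dep(joint))   # clearly > 0: X and Y marginally dependent
```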
Is height independent of hair length?
[Scatter plot: height (H: 5, 6, 7) vs. hair length (L: short, mid, long); x = female, x = male]
Is height independent of hair length?
Generally, no
If gender known, yes
This is the common cause scenario
gender (G) → hair length (L), gender (G) → height (H)
p(h|l) ≠ p(h), but p(h|l,g) = p(h|g)
i.e. H ⊥ L | G, yet H and L are not marginally independent
Is the future independent of the past (in a Markov process)?
Generally, no
If present state is known, then yes
Qi-1 → Qi → Qi+1
p(qi+1|qi-1) ≠ p(qi+1), but p(qi+1|qi,qi-1) = p(qi+1|qi)
i.e. Qi+1 ⊥ Qi-1 | Qi, yet Qi+1 and Qi-1 are not marginally independent
Are burglaries independent of earthquakes?
Generally, yes
If alarm state known, no
Explaining-away effect: the earthquake explains away the burglary
earthquake (E) → alarm (A) ← burglary (B)
p(b|e) = p(b), but p(b|e,a) ≠ p(b|a)
i.e. B ⊥ E, yet B and E are not independent given A
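These statements can be computed exactly by summing the joint. The CPT values below are invented; the point is the qualitative behaviour, that B and E are independent a priori but become dependent once the alarm is observed:

```python
# Explaining away in the burglary/earthquake/alarm network (invented CPTs).
from itertools import product

p_b = 0.01                       # prior probability of a burglary
p_e = 0.02                       # prior probability of an earthquake
# p(alarm = 1 | b, e)
p_a = {(0, 0): 0.001, (0, 1): 0.3, (1, 0): 0.9, (1, 1): 0.95}

def joint(b, e, a):
    pb = p_b if b else 1 - p_b
    pe = p_e if e else 1 - p_e
    pa = p_a[b, e] if a else 1 - p_a[b, e]
    return pb * pe * pa

def prob(query, given):
    """p(query | given), with query/given as dicts over 'b', 'e', 'a'."""
    def match(assign, cond):
        return all(assign[k] == v for k, v in cond.items())
    num = den = 0.0
    for b, e, a in product((0, 1), repeat=3):
        assign = {'b': b, 'e': e, 'a': a}
        p = joint(b, e, a)
        if match(assign, given):
            den += p
            if match(assign, query):
                num += p
    return num / den

print(prob({'b': 1}, {'e': 1}))           # = 0.01: B and E independent a priori
print(prob({'b': 1}, {'a': 1}))           # alarm raises belief in burglary
print(prob({'b': 1}, {'a': 1, 'e': 1}))   # the earthquake explains it away
```

With these numbers, p(b|a) is around 0.57, but adding the earthquake evidence drops it to about 0.03.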
Are alien abductions independent of daylight savings time?
Generally, yes
If Jim doesn't show up for lecture, no
Again, explaining-away effect
DST (D) → Jim absent (J) ← alien abduction (A)
p(a|d) = p(a), but p(a|d,j) ≠ p(a|j)
i.e. A ⊥ D, yet A and D are not independent given J
Is tongue height independent of lip rounding?
Generally, yes
If F1 is known, no
Yet again, explaining-away effect...
tongue height (H) → F1 ← lip rounding (R)
p(h|r) = p(h), but p(h|r,f1) ≠ p(h|f1)
i.e. H ⊥ R, yet H and R are not independent given F1
More explaining away...
C1 (pipes), C2 (faucet), C3 (caulking), C4 (drain), C5 (upstairs) → L (leak)
p(ci|cj) = p(ci), but p(ci|cj,l) ≠ p(ci|l), for i ≠ j
i.e. Ci ⊥ Cj for i ≠ j, yet Ci and Cj are not independent given L
Bayesian networks
The preceding slides are examples of simple Bayesian networks
Definition:
Directed acyclic graph (DAG) with a one-to-one correspondence between nodes (vertices) and variables X1, X2, ..., XN
Each node Xi with parents pa(Xi) is associated with the local probability function p(Xi | pa(Xi))
The joint probability of all of the variables is given by the product of the local probabilities, i.e. p(x1, ..., xN) = ∏i p(xi | pa(xi))
Example: A → B, B → C, B → D, C → D, with local probabilities p(a), p(b|a), p(c|b), p(d|b,c)
p(a,b,c,d) = p(a) p(b|a) p(c|b) p(d|b,c)
A given BN represents a family of probability distributions
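The factored joint on this slide can be multiplied out directly. With invented CPT entries, the product of the local probabilities sums to 1 over all assignments, confirming that the factorization defines a valid distribution:

```python
# p(a,b,c,d) = p(a) p(b|a) p(c|b) p(d|b,c), with made-up CPT values.
from itertools import product

p_a = {0: 0.4, 1: 0.6}
p_b_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}       # p(b|a)
p_c_b = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}       # p(c|b)
p_d_bc = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.4, 1: 0.6},
          (1, 0): {0: 0.3, 1: 0.7}, (1, 1): {0: 0.2, 1: 0.8}}

def joint(a, b, c, d):
    return p_a[a] * p_b_a[a][b] * p_c_b[b][c] * p_d_bc[b, c][d]

total = sum(joint(a, b, c, d)
            for a, b, c, d in product((0, 1), repeat=4))
print(round(total, 10))   # 1.0: a valid distribution
```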
Bayesian networks, cont'd
Missing edges in the graph correspond to independence assumptions
Joint probability can always be factored according to the chain rule:
p(a,b,c,d) = p(a) p(b|a) p(c|a,b) p(d|a,b,c)
But by making some independence assumptions, we get a sparse factorization, i.e. one with fewer parameters:
p(a,b,c,d) = p(a) p(b|a) p(c|b) p(d|b,c)
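The "fewer parameters" claim is easy to make concrete for binary variables by counting free CPT entries:

```python
# Free-parameter counts for the two factorizations, binary variables a-d.
def cpt_params(num_parents, card=2):
    # A CPT for a variable with `num_parents` parents of cardinality `card`
    # has card**num_parents rows, each with card - 1 free entries.
    return (card - 1) * card ** num_parents

# Full chain rule: p(a) p(b|a) p(c|a,b) p(d|a,b,c)
full = sum(cpt_params(k) for k in (0, 1, 2, 3))    # 1 + 2 + 4 + 8 = 15
# Sparse BN:       p(a) p(b|a) p(c|b)  p(d|b,c)
sparse = sum(cpt_params(k) for k in (0, 1, 1, 2))  # 1 + 2 + 2 + 4 = 9
print(full, sparse)   # 15 9
```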
Medical example
Nodes: lung cancer, smoker, genes, parent smoker, profession
Things we may want to know:
What independence assumptions does this model encode?
What is p(lung cancer | profession)? p(smoker | parent smoker, genes)?
Given some of the variables, what are the most likely values of others?
How do we estimate the local probabilities from data?
Determining independencies from a graph
There are several ways...
Bayes-ball algorithm ("Bayes-Ball: The Rational Pastime", Shachter 1998)
Ball bouncing around the graph according to a set of rules
Two nodes are independent given a set of observed nodes if a ball can't get from one to the other
Bayes-ball, cont'd
Boundary conditions:
Bayes-ball in medical example
Nodes: lung cancer, smoker, genes, parent smoker, profession
According to this model:
Are a person's genes independent of whether they have a parent who smokes? What about if we know the person has lung cancer?
Is lung cancer independent of profession given that the person is a smoker?
(Do the answers make sense?)
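A minimal reachability version of Bayes-ball can answer such questions mechanically. The edge set below is an assumed structure for the medical example (the slide's exact arrows are not recoverable from this transcript), so treat the graph, and therefore the specific answers, as hypothetical:

```python
# Bayes-ball style d-separation test: BFS over (node, direction) states.
from collections import deque

def d_separated(edges, x, y, observed):
    """True iff x is independent of y given `observed` in the DAG `edges`."""
    parents, children = {}, {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)
        parents.setdefault(v, set()).add(u)
    # Nodes that are observed or have an observed descendant (collider test).
    can_activate = set(observed)
    stack = list(observed)
    while stack:
        n = stack.pop()
        for p in parents.get(n, ()):
            if p not in can_activate:
                can_activate.add(p)
                stack.append(p)
    seen = set()
    queue = deque([(x, 'up')])        # 'up' = entered from a child
    while queue:
        node, direction = queue.popleft()
        if (node, direction) in seen:
            continue
        seen.add((node, direction))
        if node == y:
            return False              # an active trail reaches y
        if direction == 'up' and node not in observed:
            for p in parents.get(node, ()):    # chain: keep going up
                queue.append((p, 'up'))
            for c in children.get(node, ()):   # fork: turn downward
                queue.append((c, 'down'))
        elif direction == 'down':
            if node not in observed:           # chain: keep going down
                for c in children.get(node, ()):
                    queue.append((c, 'down'))
            if node in can_activate:           # active collider: turn upward
                for p in parents.get(node, ()):
                    queue.append((p, 'up'))
    return True

# Hypothetical edges for the medical example:
edges = [('parent smoker', 'smoker'), ('profession', 'smoker'),
         ('smoker', 'lung cancer'), ('genes', 'lung cancer')]

print(d_separated(edges, 'genes', 'parent smoker', set()))            # True
print(d_separated(edges, 'genes', 'parent smoker', {'lung cancer'}))  # False
print(d_separated(edges, 'lung cancer', 'profession', {'smoker'}))    # True
```

Under this assumed graph, genes and parent smoking are independent until lung cancer is observed (explaining away), and observing smoker screens profession off from lung cancer.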
Inference
Definition: Computation of the probability of one subset of the variables given another subset
Inference is a subroutine of:
Viterbi decoding:
q* = argmax_q p(q | obs)
Maximum-likelihood estimation of the parameters of the local probabilities:
θ* = argmax_θ p(obs | θ)
Graphical models (GMs)
In general, GMs represent families of probability distributions via graphs:
directed, e.g. Bayesian networks
undirected, e.g. Markov random fields
combination, e.g. chain graphs
To describe a particular distribution with a GM, we need to specify:
Semantics: Bayesian network, Markov random field, ...
Structure: the graph itself
Implementation: the form of the local functions (Gaussian, table, ...)
Parameters of local functions (means, covariances, table entries...)
Not all types of GMs can represent all sets of independence properties!
Example of undirected graphical models: Markov random fields
Definition:
Undirected graph
Local function (potential) defined on each maximal clique
Joint probability given by normalized product of potentials
Independence properties can be deduced via simple graph separation
Example: nodes A, B, C, D with maximal cliques {A,B} and {B,C,D}:
p(a,b,c,d) ∝ ψ_{A,B}(a,b) ψ_{B,C,D}(b,c,d)
Dynamic Bayesian networks (DBNs)
BNs consisting of a structure that repeats an indefinite (or dynamic) number of times
Useful for modeling time series (e.g. speech)
[Diagram: the same four-variable structure (A, B, C, D) replicated in frame i-1, frame i, frame i+1]
DBN representation of n-gram language models
Bigram: ... Wi-1 → Wi → Wi+1 ... (each word depends on the previous word)
Trigram: the same chain plus edges such as Wi-1 → Wi+1 (each word depends on the previous two words)
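Under the bigram factorization, scoring a word string is just a product of table lookups, p(w1..wn) = p(w1) ∏ p(wi | wi-1). The toy tables below are invented:

```python
# Score a word sequence with a bigram DBN factorization (toy tables).
import math

p_first = {'the': 0.6, 'a': 0.4}
p_bigram = {('the', 'dog'): 0.3, ('the', 'cat'): 0.7,
            ('a', 'dog'): 0.5, ('a', 'cat'): 0.5,
            ('dog', 'runs'): 1.0, ('cat', 'runs'): 1.0}

def log_prob(words):
    """log p(w_1..w_n) = log p(w_1) + sum_i log p(w_i | w_{i-1})."""
    lp = math.log(p_first[words[0]])
    for prev, cur in zip(words, words[1:]):
        lp += math.log(p_bigram[prev, cur])
    return lp

print(math.exp(log_prob(['the', 'cat', 'runs'])))   # 0.6 * 0.7 * 1.0 = 0.42
```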
Representing an HMM as a DBN
HMM view: states 1, 2, 3, each emitting obs; allowed transitions 1→1 (.7), 1→2 (.3), 2→2 (.8), 2→3 (.2), 3→3 (1)
Transition table P(qi|qi-1):
qi-1=1:  .7  .3  0
qi-1=2:  0   .8  .2
qi-1=3:  0   0   1
Observation distributions P(obsi | qi) for q = 1, 2, 3
DBN view: variables ... Qi-1, Qi, Qi+1 ... with observations obsi-1, obsi, obsi+1 (one per frame); allowed dependencies Qi-1 → Qi and Qi → obsi
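With the slide's transition table, the observation likelihood is a sum over all state sequences, computed efficiently by the forward recursion. The start distribution and discrete emission tables below are invented (the lecture's models use Gaussian observation distributions):

```python
# Forward algorithm for the slide's 3-state left-to-right HMM.
A = [[0.7, 0.3, 0.0],
     [0.0, 0.8, 0.2],
     [0.0, 0.0, 1.0]]          # A[prev][cur] = P(q_i = cur | q_{i-1} = prev)
pi = [1.0, 0.0, 0.0]           # assume we always start in state 1
B = [{'x': 0.8, 'y': 0.2},     # B[q][obs] = P(obs | q), hypothetical
     {'x': 0.3, 'y': 0.7},
     {'x': 0.1, 'y': 0.9}]

def likelihood(obs):
    """P(obs_1..obs_T): forward recursion over alpha[q] = P(obs_1..t, q_t=q)."""
    alpha = [pi[q] * B[q][obs[0]] for q in range(3)]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * A[p][q] for p in range(3)) * B[q][o]
                 for q in range(3)]
    return sum(alpha)

print(likelihood(['x', 'y', 'y']))
```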
Casting HMM-based ASR as a GM problem
[DBN: ... Qi-1 → Qi → Qi+1 ..., with per-frame observations obsi-1, obsi, obsi+1]
Viterbi decoding: finding the most probable settings for all qi given the acoustic observations {obsi}
Baum-Welch training: finding the most likely settings for the parameters of P(qi|qi-1) and P(obsi|qi)
Both are special cases of the standard GM algorithms for Viterbi decoding and EM training
Variations
Input-output HMMs: chain ... Qi-1 → Qi → Qi+1 ... with an input stream and an output stream of variables Xi attached to each state
Factorial HMMs: parallel hidden chains Qi and Ri, with the observations Xi depending on both
Switching parents
Switching parents
Definition: A variable X is a switching parent of variable Y if the value of X determines the parents and/or implementation of Y
Example:
nodes A, B, C, D, with A a switching parent of D:
A=0: D has parent B, with a Gaussian distribution
A=1: D has parent C, with a Gaussian distribution
A=2: D has parent C, with a mixture Gaussian distribution
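The example above can be sketched directly: the value of A selects both which variable D conditions on and which distribution family is used. The Gaussian parameters here are placeholders, not from the lecture:

```python
# A switching parent: A picks D's parent and distribution family.
import math

def gauss(x, mu, sigma):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def p_d(d, a, b, c):
    """p(d | a, b, c), with A as the switching parent of D."""
    if a == 0:        # D's parent is B: single Gaussian centred on b
        return gauss(d, b, 1.0)
    if a == 1:        # D's parent is C: single Gaussian centred on c
        return gauss(d, c, 1.0)
    # a == 2: D's parent is C, two-component Gaussian mixture
    return 0.5 * gauss(d, c, 1.0) + 0.5 * gauss(d, c + 3.0, 2.0)

print(p_d(0.0, 0, 0.0, 5.0))   # depends only on b when a == 0
```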
HMM-based recognition with a DBN
[DBN with per-frame variables: word, word transition, word position, state transition, phone state, observation, end of utterance; unrolled from frame 0 through the last frame]
What language model does this GM implement?
Training and testing DBNs
Why do we need different structures for training and testing? Isn't training just the same as testing but with more of the variables observed?
Not always!
Often, during training we have only partial information about some of the variables, e.g. the word sequence but not which frame goes with which word
More complex GM models for recognition
HMM + auxiliary variables (Zweig 1998, Stephenson 2001)
Noise clustering
Speaker clustering
Dependence on pitch, speaking rate, etc.
[Diagrams: a basic Q → obs model; the same with an auxiliary variable aux; and one with auxiliary variables a1, a2, ..., aN]
Articulatory/feature-based modeling
Multi-rate modeling, audio-visual speech recognition (Nefian et al. 2002)
Modeling inter-observation dependencies: Buried Markov models (Bilmes 1999)
First note that the observation variable is actually a vector of acoustic observations (e.g. MFCCs)
[Diagram: state chain Qi-1, Qi, Qi+1 with vector-valued observations obs]
Consider adding dependencies between observations
Add only those that are discriminative with respect to classifying the current state/phone/word
Feature-based modeling
Phone-based view:
Brain: "Give me a []!"
Lips, tongue, velum, glottis: "Right on it, sir!"
(Articulatory) feature-based view:
Brain: "Give me a []!"
Lips: "Huh?"
Tongue: "Umm... yeah, OK."
Velum, glottis: "Right on it, sir!"
A feature-based DBN for ASR
[DBN over frames i and i+1: per-frame variables phone, state, articulatory features A1, A2, ..., AN, and observation O, with p(o | a1, ..., aN)]
GMTK: The Graphical Models Toolkit (J. Bilmes and G. Zweig, ICASSP 2002)
Toolkit for specifying and computing with dynamic Bayesian networks
Models are specified via:
Structure file: defines variables, dependencies, and the form of the associated conditional distributions
Parameter files: specify parameters for each distribution in the structure file
Variable distributions can be
Mixture Gaussians + variants
Multidimensional probability tables
Sparse probability tables
Deterministic (decision trees)
Provides programs for EM training, Viterbi decoding, and various utilities
Example portion of structure file
variable : phone {
type: discrete hidden cardinality NUM_PHONES;
switchingparents: nil;
conditionalparents: word(0), wordPosition(0) using
DeterministicCPT("wordWordPos2Phone");
}
variable : obs {
type: continuous observed OBSERVATION_RANGE;
switchingparents: nil;
conditionalparents: phone(0) using mixGaussian
collection(global) mapping("phone2MixtureMapping");
}
Some issues...
For some structures, exact inference may be computationally infeasible → approximate inference algorithms
Structure is not always known → structure-learning algorithms
References
J. Bilmes, "Graphical Models and Automatic Speech Recognition," in Mathematical Foundations of Speech and Language Processing, Institute of Mathematical Analysis Volumes in Mathematics Series, Springer-Verlag, 2003.
G. Zweig, "Speech Recognition with Dynamic Bayesian Networks," Ph.D. dissertation, UC Berkeley, 1998.
J. Bilmes, "What HMMs Can Do," UWEETR-2002-0003, Feb. 2002.