Maximum Likelihood Estimation for Multivariate Mixture Observations of Markov Chains (Levinson, 1986)


IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-32, NO. 2, MARCH 1986

INTRODUCTION

In a recently published paper, Liporace [6] derived a method for estimating the parameters of a broad class of elliptically symmetric probabilistic functions of a Markov chain. The corollary to that work presented here was motivated by the desire to use this general technique to model the speech signal, for which it is known [4], [8] that, unfortunately, certain of its most useful parameterizations do not possess the prescribed symmetry. Since any continuous probability density function can be approximated arbitrarily closely by a normal mixture [9], it is reasonable to use such constructs to avoid the restrictions imposed by the requirement of elliptical symmetry. In this correspondence we adapt the method and proof of [6] to two types of mixture densities.

NOMENCLATURE

Throughout this presentation we shall, where possible, adopt the notation used in [6]. Consider an unobservable n-state Markov chain with state transition matrix A = [a_ij], i, j = 1, ..., n. Associated with each state j of the hidden Markov chain is a probability density function b_j(x) of the observed d-dimensional random vector x. Here we shall consider densities of the form

    b_j(x) = \sum_{k=1}^{m} c_{jk}\, \mathcal{N}(x, \mu_{jk}, U_{jk}),    (1)

where m is known; c_jk ≥ 0 for 1 ≤ j ≤ n, 1 ≤ k ≤ m; Σ_{k=1}^{m} c_jk = 1 for 1 ≤ j ≤ n; and 𝒩(x, μ, U) denotes the d-dimensional normal density function of mean vector μ and covariance matrix U.
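As a concrete reading of (1), a state-conditional density of this kind can be evaluated as in the short sketch below; the function name and the use of scipy are our own choices for illustration, not part of the paper.

```python
from scipy.stats import multivariate_normal

def mixture_density(x, c_j, mus_j, Us_j):
    """Evaluate b_j(x) = sum_k c_jk * N(x, mu_jk, U_jk), as in (1).

    c_j   : (m,) mixture weights, nonnegative and summing to one
    mus_j : (m, d) mean vectors
    Us_j  : (m, d, d) covariance matrices (symmetric positive definite)
    """
    return sum(c * multivariate_normal.pdf(x, mean=mu, cov=U)
               for c, mu, U in zip(c_j, mus_j, Us_j))
```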

It is convenient then to think of our hidden Markov chains as being defined over a parameter manifold Λ = {𝒜 × 𝒞^m × ℛ^d × 𝒰^d}, where 𝒜 is the set of all n × n row-wise stochastic matrices; 𝒞^m is the set of all n × m row-wise stochastic matrices; ℛ^d is the usual d-dimensional Euclidean space; and 𝒰^d is the set of all d × d real symmetric positive definite matrices. Then for a given sequence of observations, O = O_1, O_2, ..., O_T, of the vector x and a particular choice of parameter values λ ∈ Λ, we can efficiently evaluate the likelihood function, L_λ(O), of the hidden Markov chain by the forward-backward method of Baum [1].

The forward and backward partial likelihoods, α_t(i) and β_t(i), are computed recursively from

    \alpha_{t+1}(j) = \left[ \sum_{i=1}^{n} \alpha_t(i)\, a_{ij} \right] b_j(O_{t+1})    (2)

and

    \beta_t(i) = \sum_{j=1}^{n} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j),    (3)

respectively. The recursion is initialized by setting α_0(1) = 1, α_0(j) = 0 for 2 ≤ j ≤ n, and β_T(i) = 1 for 1 ≤ i ≤ n, whereupon we may write

    L_\lambda(O) = \sum_{i=1}^{n} \alpha_t(i)\, \beta_t(i)    (4)

for any t between 1 and T - 1.
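Read literally, (2)-(4) with the stated initialization give the unscaled forward-backward sketch below. The precomputed matrix B of observation densities and the function name are our own conventions, and the unscaled recursions are practical only for short sequences (see the scaling remark at the end of the correspondence).

```python
import numpy as np

def forward_backward(A, B):
    """Unscaled forward-backward recursions (2)-(4) (illustrative sketch).

    A : (n, n) state transition matrix a_ij
    B : (T, n) observation likelihoods, B[t-1, j-1] = b_j(O_t)

    Returns alpha, beta of shape (T+1, n), indexed by t = 0..T,
    and the likelihood L_lambda(O).
    """
    T, n = B.shape
    alpha = np.zeros((T + 1, n))
    beta = np.ones((T + 1, n))
    alpha[0, 0] = 1.0                      # alpha_0(1) = 1, alpha_0(j) = 0 otherwise
    for t in range(T):                     # eq. (2)
        alpha[t + 1] = (alpha[t] @ A) * B[t]
    for t in range(T - 1, 0, -1):          # eq. (3), with beta_T(i) = 1
        beta[t] = A @ (B[t] * beta[t + 1])
    L = float(alpha[1] @ beta[1])          # eq. (4), evaluated at t = 1
    return alpha, beta, L
```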

THE ESTIMATION ALGORITHM

The parameter estimation problem is then one of maximizing L_λ(O) with respect to λ for a given O. One way to maximize L_λ is to use conventional methods of constrained optimization. Liporace, on the other hand, advocates a reestimation technique analogous to that of Baum et al. [1], [2]. It is essentially a mapping 𝒯: Λ → Λ with the property that

    L_{\mathcal{T}(\lambda)}(O) \geq L_\lambda(O),    (5)

with equality if and only if ∇_λ L_λ(O) = 0. Thus a recursive application of 𝒯 to some value of λ converges to a local maximum (or possibly an inflection point) of the likelihood function. Liporace's result relaxed the original requirement of Baum et al. [2] that b_j(x) be strictly log concave to the requirement that it be strictly log concave and/or elliptically symmetric. We will further extend the class of admissible pdf's to mixtures and products of mixtures of strictly log concave and/or elliptically symmetric densities.

For the present problem we will show that a suitable mapping 𝒯 is given by the following equations:

    \bar{a}_{ij} = \mathcal{T}(a_{ij}) = \frac{\sum_{t=1}^{T-1} \alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{\sum_{t=1}^{T-1} \alpha_t(i)\, \beta_t(i)}    (6)

    \bar{c}_{jk} = \mathcal{T}(c_{jk}) = \frac{\sum_{t=1}^{T} \rho_t(j, k)\, \beta_t(j)}{\sum_{t=1}^{T} \alpha_t(j)\, \beta_t(j)}    (7)

    \bar{\mu}_{jk} = \mathcal{T}(\mu_{jk}) = \frac{\sum_{t=1}^{T} \rho_t(j, k)\, \beta_t(j)\, O_t}{\sum_{t=1}^{T} \rho_t(j, k)\, \beta_t(j)}    (8)

    \bar{U}_{jk} = \mathcal{T}(U_{jk}) = \frac{\sum_{t=1}^{T} \rho_t(j, k)\, \beta_t(j)\, (O_t - \mu_{jk})(O_t - \mu_{jk})'}{\sum_{t=1}^{T} \rho_t(j, k)\, \beta_t(j)}    (9)

for 1 ≤ i, j ≤ n, 1 ≤ k ≤ m, and 1 ≤ r, s ≤ d. In (7)-(9),

    \rho_t(j, k) = \begin{cases} a_{1j}\, c_{jk}\, \mathcal{N}(O_1, \mu_{jk}, U_{jk}), & t = 1, \\ \left[ \sum_{i=1}^{n} \alpha_{t-1}(i)\, a_{ij} \right] c_{jk}\, \mathcal{N}(O_t, \mu_{jk}, U_{jk}), & 1 < t \leq T. \end{cases}    (10)

(For fixed k, ρ_t(j, k) is formally identical to ρ_t(j) as defined by Liporace.)
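To make one pass of the reestimation concrete, the sketch below implements (6)-(10) as reconstructed above, reusing the forward_backward routine and the B-matrix convention from the earlier sketch. All names, array layouts, and the unscaled arithmetic are our own choices rather than the paper's, and they are practical only for short observation sequences.

```python
import numpy as np
from scipy.stats import multivariate_normal

def reestimate(A, c, mus, Us, O):
    """One application of the mapping T of (6)-(10) (illustrative sketch).

    A : (n, n) transitions, c : (n, m) mixture weights,
    mus : (n, m, d) means, Us : (n, m, d, d) covariances, O : (T, d) observations.
    """
    T, d = O.shape
    n, m = c.shape
    # Component densities N(O_t, mu_jk, U_jk) and the state densities b_j(O_t) of (1).
    comp = np.array([[multivariate_normal.pdf(O, mean=mus[j, k], cov=Us[j, k])
                      for k in range(m)] for j in range(n)])        # (n, m, T)
    w = c[None, :, :] * comp.transpose(2, 0, 1)                     # w[t-1, j, k] = c_jk N(O_t, .)
    B = w.sum(axis=2)                                               # B[t-1, j] = b_j(O_t)
    alpha, beta, _ = forward_backward(A, B)                         # from the earlier sketch
    rho = (alpha[:T] @ A)[:, :, None] * w                           # eq. (10)
    rb = rho * beta[1:, :, None]                                    # rho_t(j, k) * beta_t(j)
    gamma = alpha[1:] * beta[1:]                                    # alpha_t(j) * beta_t(j)
    # eq. (6)
    A_new = (np.einsum('ti,ij,tj,tj->ij', alpha[1:T], A, B[1:], beta[2:])
             / gamma[:T - 1].sum(axis=0)[:, None])
    c_new = rb.sum(axis=0) / gamma.sum(axis=0)[:, None]             # eq. (7)
    mu_new = np.einsum('tjk,td->jkd', rb, O) / rb.sum(axis=0)[..., None]   # eq. (8)
    diff = O[:, None, None, :] - mus[None, :, :, :]                 # O_t - mu_jk
    U_new = (np.einsum('tjk,tjkd,tjke->jkde', rb, diff, diff)
             / rb.sum(axis=0)[..., None, None])                     # eq. (9)
    return A_new, c_new, mu_new, U_new
```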

Proof of the Formulas: Equations (6) and (7) for the reestimation of a_ij and c_jk follow directly from a theorem of Baum and Sell [3], because the likelihood function L_λ(O) given in (4) is a polynomial with nonnegative coefficients in the variables a_ij, c_jk, 1 ≤ i, j ≤ n, 1 ≤ k ≤ m.

To prove (8) and (9), our strategy, following Liporace, is to define an appropriate auxiliary function Q(λ, λ̄). This function will have the property that Q(λ, λ̄) ≥ Q(λ, λ) implies L_λ̄(O) ≥ L_λ(O). Further, as a function of λ̄ for any fixed λ, Q(λ, λ̄) will have a unique global maximum given by (6)-(9).

As a first step toward deriving such a function, we express the likelihood function as a sum over the set, 𝒮, of all state sequences S:

    L_\lambda(O) = \sum_{S \in \mathcal{S}} L_\lambda(O, S).    (11)


Let us partition the likelihood function further by choosing a particular sequence, K = (k_1, k_2, ..., k_T), of mixture densities. As in the case of state sequences, we denote the set of all mixture sequences as 𝒦 = {1, 2, ..., m}^T. Thus for some particular K ∈ 𝒦 we can write the joint likelihood of O, S, and K as

    L_\lambda(O, S, K) = \prod_{t=1}^{T} a_{s_{t-1} s_t}\, c_{s_t k_t}\, \mathcal{N}(O_t, \mu_{s_t k_t}, U_{s_t k_t}).    (12)

We have now succeeded in partitioning the likelihood function as

    L_\lambda(O) = \sum_{S \in \mathcal{S}} \sum_{K \in \mathcal{K}} L_\lambda(O, S, K).    (13)
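For very small n, m, and T, the decomposition (12)-(13) can be checked numerically against the forward-backward value (4). The brute-force enumeration below is our own illustrative sketch (exponential in T, so suitable only for sanity checks), assuming the chain starts in state 1 as in the initialization above.

```python
import itertools
from scipy.stats import multivariate_normal

def brute_force_likelihood(A, c, mus, Us, O):
    """Sum L_lambda(O, S, K) of (12) over all state and mixture sequences, cf. (13)."""
    T, _ = O.shape
    n, m = c.shape
    total = 0.0
    for S in itertools.product(range(n), repeat=T):          # state sequences
        for K in itertools.product(range(m), repeat=T):      # mixture sequences
            prob = 1.0
            prev = 0                                          # chain assumed to start in state 1
            for t in range(T):
                j, k = S[t], K[t]
                prob *= A[prev, j] * c[j, k] * multivariate_normal.pdf(
                    O[t], mean=mus[j, k], cov=Us[j, k])
                prev = j
            total += prob
    return total
```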

In view of the similarity of the representation (13) to that of L_λ in [6], we now define the auxiliary function

    Q(\lambda, \bar\lambda) = \sum_{S} \sum_{K} L_\lambda(O, S, K) \log L_{\bar\lambda}(O, S, K).    (14)

When the expressions for L_λ and L_λ̄ derived from (12) are substituted in (14), we obtain an expression, (15), whose coefficients are nonnegative. The innermost summation in (15) is formally identical to that used by Liporace in his proof; therefore, the properties which he demonstrated for his auxiliary function with respect to μ and U hold in our case as well, thus giving us (8) and (9). We may thus conclude that (5) is correct for 𝒯 defined by (6)-(9). Furthermore, the parameter separation made explicit in (12)-(15) allows us to apply the same algorithm to mixtures of strictly log concave densities and/or elliptically symmetric densities, as treated by Liporace in [6].
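For completeness, the standard argument behind the property of Q stated earlier (that Q(λ, λ̄) ≥ Q(λ, λ) implies L_λ̄(O) ≥ L_λ(O)), which the correspondence does not spell out, can be sketched as follows; it is the usual Baum-type bound obtained by applying Jensen's inequality to the distribution L_λ(O, S, K)/L_λ(O) over pairs (S, K):

    \log \frac{L_{\bar\lambda}(O)}{L_\lambda(O)}
      = \log \sum_{S, K} \frac{L_\lambda(O, S, K)}{L_\lambda(O)} \cdot \frac{L_{\bar\lambda}(O, S, K)}{L_\lambda(O, S, K)}
      \;\geq\; \sum_{S, K} \frac{L_\lambda(O, S, K)}{L_\lambda(O)} \log \frac{L_{\bar\lambda}(O, S, K)}{L_\lambda(O, S, K)}
      = \frac{Q(\lambda, \bar\lambda) - Q(\lambda, \lambda)}{L_\lambda(O)},

so that Q(λ, λ̄) ≥ Q(λ, λ) indeed forces L_λ̄(O) ≥ L_λ(O).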

DISCUSSION

In [6] Liporace notes that by setting a_ij = p_j, 1 ≤ j ≤ n, for all i, the special case of a single mixture can be treated. It is natural then to think of using a model with n clusters of m states, each with a single associated Gaussian density function, as a way of treating the Gaussian mixture problem considered here.

The transformation can be accomplished in the following way. First we expand the state space of our n-state model as shown in Fig. 1, in which we have added states j_1 through j_{m+1} for each state j in the original Markov chain. Associated with states j_1, j_2, ..., j_m are distinct Gaussian densities corresponding to the m terms of the jth Gaussian mixture in our initial formulation. The transitions exiting state j have probabilities equal to the corresponding mixture weights. State j_{m+1} is a distinguished state that is entered with probability 1 from the other new states, exits to state j with probability a_jj, and generates no observation in so doing. The transition matrix for this configuration can be written down by inspection.

Fig. 1. Equivalent m + 2 state configuration for each state with an m-term Gaussian mixture.

A large number of the entries in this matrix will be zero or unity. As these are unaltered by (6) and (7), they need not be reestimated. Using this reconfiguration of the state diagram, Liporace's formulas can be used in case b_j(x) is any mixture of elliptically symmetric densities.

A variant on the Gaussian mixture theme results from using b_j(x) of the form of a product of mixtures,

    b_j(x) = \prod_{i=1}^{D} \sum_{k=1}^{m} c_{jik}\, \mathcal{N}(x_i, \mu_{jik}, U_{jik}).    (16)

What we have considered so far is the special case of (16) for D = 1. From the structure of our derivation it is clear that for hidden Markov chains having densities of the form (16), reestimation formulas can be derived as before by solving ∇_λ̄ Q(λ, λ̄) = 0. Such solutions will yield results quite analogous to (6)-(10). Note that this case too can be represented as a reconfiguration of the state diagram.
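As an illustration of the product-of-mixtures form, here is a small sketch under our assumption that x is partitioned into D subvectors x_1, ..., x_D, each modeled by its own m-term Gaussian mixture; the function name and argument layout are ours, not the paper's.

```python
from scipy.stats import multivariate_normal

def product_of_mixtures_density(x_blocks, c_j, mus_j, Us_j):
    """Evaluate b_j(x) = prod_i sum_k c_jik * N(x_i, mu_jik, U_jik), cf. (16).

    x_blocks : length-D list of subvectors of x (the assumed partition)
    c_j      : (D, m) array of mixture weights, each row summing to one
    mus_j, Us_j : nested sequences with mus_j[i][k] the mean and
                  Us_j[i][k] the covariance of term k for subvector i
    """
    D, m = c_j.shape
    value = 1.0
    for i in range(D):
        value *= sum(c_j[i, k] * multivariate_normal.pdf(x_blocks[i],
                                                         mean=mus_j[i][k],
                                                         cov=Us_j[i][k])
                     for k in range(m))
    return value
```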

One numerical difficulty which may be manifest in the methods described is the phenomenon noted by Nadas [7], in which one or more of the mean vectors converge to a particular observation while the corresponding covariance matrix approaches a singular matrix. Under these conditions, L_λ(O) → ∞, but the value of λ is meaningless. A practical, if unedifying, remedy for this difficulty is to try a different initial λ. Alternatively, one can drop the offending term from the mixture, since it is only contributing at one point.

Finally, we call attention to two minor facets of these algorithms.

First, for flexibility in modeling, the number of terms in each mixture may vary with state, so that m in (1) could as well be m_j. A similar dependence on dimension results if m in (16) is replaced by m_ji. In either case, the constraints on the mixture weights must be satisfied.

Second, for realistic numbers of observations, for example, T ≥ 5000, the reestimation formulas will underflow on any existing computer. The basic scaling mechanism described in [5] can be used to alleviate the problem, but it must be modified to account for the fact that the ρ_t β_t product will be missing the tth scale factor. To divide out the product of scale factors, the tth summand in both numerator and denominator of (7), (9), and (10) must be multiplied by the missing coefficient.
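The scaling in question renormalizes the forward variables at every step and accumulates the scale factors, so that only log L_λ(O) is ever formed. The following is a minimal sketch of such a scaled forward pass under our own conventions (it is not reproduced from [5]); reinserting the missing tth scale factor into the summands of the reestimation formulas is exactly the modification described above and is not repeated here.

```python
import numpy as np

def scaled_forward(A, B):
    """Scaled version of recursion (2): alpha_hat_t sums to one at each t.

    Returns the scaled forward variables, the scale factors, and
    log L_lambda(O) recovered as the sum of their logarithms.
    """
    T, n = B.shape
    alpha_hat = np.zeros((T + 1, n))
    scale = np.ones(T + 1)
    alpha_hat[0, 0] = 1.0                     # alpha_0(1) = 1
    for t in range(T):
        a = (alpha_hat[t] @ A) * B[t]         # unscaled update, eq. (2)
        scale[t + 1] = a.sum()                # scale factor removed at this step
        alpha_hat[t + 1] = a / scale[t + 1]
    log_likelihood = np.log(scale[1:]).sum()
    return alpha_hat, scale, log_likelihood
```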

At this writing, numerical experiments based on Monte Carlo simulations and classification experiments using real speech signals are being conducted. We hope to report the results of these studies upon their completion.


REFERENCES

[1] L. E. Baum, "An inequality and associated maximization technique in statistical estimation for probabilistic functions of a Markov process," in Inequalities, vol. III, O. Shisha, Ed. New York: Academic, 1972, pp. 1-8.

[2] L. E. Baum, T. Petrie, G. Soules, and N. Weiss, "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains," Ann. Math. Statist., vol. 41, pp. 164-171, 1970.

[3] L. E. Baum and G. R. Sell, "Growth transformations for functions on manifolds," Pac. J. Math., vol. 27, pp. 211-227, 1968.

[4] A. H. Gray and J. D. Markel, "Quantization and bit allocation in speech processing," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 459-473, Dec. 1976.

[5] S. E. Levinson, L. R. Rabiner, and M. M. Sondhi, "An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition," Bell Syst. Tech. J., vol. 62, pp. 1035-1074, Apr. 1983.

[6] L. R. Liporace, "Maximum likelihood estimation for multivariate observations of Markov sources," IEEE Trans. Inform. Theory, vol. IT-28, pp. 729-734, Sept. 1982.

[7] A. Nadas, "Hidden Markov chains, the forward-backward algorithm, and initial statistics," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 504-506, Apr. 1983.

[8] L. R. Rabiner, J. G. Wilpon, and J. G. Ackenhusen, "On the effects of varying analysis parameters on an LPC-based isolated word recognizer," Bell Syst. Tech. J., vol. 60, pp. 893-911, 1981.

[9] H. W. Sorenson and D. L. Alspach, "Recursive Bayesian estimation using Gaussian sums," Automatica, vol. 7, pp. 465-479, 1971.

