
Consistency of maximum likelihood estimators for certain nonparametric families, in particular: mixtures


Journal of Statistical Planning and Inference 19 (1988) 137-158 North-Holland


Mathematical Institute, University of Cologne, Weyertal 86, 5000 Cologne 41, West Germany

Received 4 August 1987; revised manuscript received 1 December 1987 Recommended by M.L. Puri

Abstract: The paper gives sufficient conditions for strong consistency of maximum likelihood estimators in certain nonparametric families. The main application is to families of mixtures, with emphasis on mixtures over exponential families.

AMS Subject Classification: 62F10, 62F12.

Key words and phrases: Maximum likelihood estimator; strong consistency; mixtures of distributions; identifiability.

Consistency of m.l. (= maximum likelihood) estimators [...] certain problems, [...] there are some special non[...] out to be consistent, but such results are o[...]. If the m.l. estimator cannot be determined explicitly, one [...] point a general theorem assuring the existence of a measurable m.l. estimator. Usually, the regularity conditions require[...] are of different nature (and [...] more restrictive) than the [...] which [...] not exist. As an example, consider the estimation of a mixing distribution [...] important instances (see [...]) [...] of this mixing distribution has a finite support. [...] there is no exact m.l. estimator in the basic family of all mixing [...] with Lebesgue density, [...]

To be more technical, let $\{P_\vartheta : \vartheta \in \Theta\}$ be a family of probability measures on a measurable space $(X, \mathscr{A})$ indexed by a parameter $\vartheta$ attaining its values in some abstract set $\Theta$. Assume there exists a $\sigma$-finite measure $\mu \mid \mathscr{A}$ dominating $\{P_\vartheta : \vartheta \in \Theta\}$, and let $p(\cdot, \vartheta)$ denote a $\mu$-density of $P_\vartheta$. General theorems on the consistency of m.l. estimators (see e.g. Wald (1949), Kiefer and Wolfowitz (1956), Pfanzagl (1969), [...] Bahadur (1971, Section 9), Perlman (1972)) presume local conditions like continuity of $\vartheta \to p(x, \vartheta)$, $P_\vartheta$-integrability of $x \to \inf\{\log p(x, \delta) : \delta \in U\}$ for open sets $U$, and a global condition, compactness of $\Theta$. All these conditions involve a topology over $\Theta$ and are, therefore, inherently related.

Conditions of this kind apply mainly to parametric families. For arbitrary nonparametric families, it appears hopeless to find a suitable topology for which such conditions are fulfilled and which is, in addition, meaningful from the statistical point of view. A particular problem rests with conditions like $P_\vartheta$-integrability of $x \to \inf\{\log p(x, \delta) : \delta \in U\}$. One possibility is to enforce such a condition by restriction to [...] smooth m.l. estimators (catchword 'penalized m.l. estimators'). For certain nonparametric models, the m.l. estimators are smooth anyhow, thanks to certain inherent properties of the model, but it requires conditions of a different nature to express this in an appropriate way. One such possibility was suggested by Wang (1985).

In Section 2 we present a slightly simplified version of Wang's approach. In Sections 3 and 4 this result is specialized to densities which are concave functions of [...]. In Section 5 this is further specialized to arbitrary mixtures, in Section 6 to mixtures over exponential families.

ric families of probability easures are u

he

The hood)

J. Pfanzagl / Consistency of m.l. estimators

that these are not necessarily ic fa ly of probability =I

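[...] $\tau_n$, $n \in \mathbb{N}$, is strongly consistent for $\vartheta$ if

$\lim_{n\to\infty} \tau_n(x) = \vartheta$ for [...]

[...] estimator sequence $\tau_n$, $n \in \mathbb{N}$, is as. m.l. (= asymptotic[...]

Sufficient for (2.1) is, for instance, the condition [...]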

for some a~ (0,l

Q/\ ie i! YI -*, ?zEN. or the applications inten

the following condition. is avoi

there exists 8~ for p4.a. XE

If an estimator sequence $\tau_n : X^n \to \Theta$, $n \in \mathbb{N}$, is consistent for $\vartheta$, the pertaining estimator sequence [...], $n \in \mathbb{N}$, is consistent for [...] in the sup-metric (as a consequence of Scheffé's Lemma).

[...] Then any sequence of as. m.l. estimators is strongly consistent for $\vartheta$.

e an open neighborhoo is compact, so is Since {Us: SE UC) is an open exists a finite sub-

-null set involved in (2.6) for Urj, and let N: = U f

n

min lim inf n-l WI J=l ,..., k~~&L$, 1

, n E M, is in UC for an infinite subsequence, say , this implies by (2.7),

n

Km n-l nehJ, 1

n

inf n-l log[m(x,, a&)) 1

itions refer to a

(x, 6) is continuous on at S=z for

or -a.a. XE the (x, 6) is concave on

o be useful to plications, these conditions have to

e fixe E 19, i ies

n

lim 12-l log E-I-U n+= [ (

M&P 8)

1 IlFi(xy, w

-1 >o. >I Using (3.3) we obtain for

n

Together with (3.7) this implies (2.6) for $\delta_u := u\vartheta + (1-u)\delta$.
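The convex combination of $\vartheta$ and $\delta$ is where concavity enters. As a sketch in assumed notation (the symbols $m(x, \cdot)$ for the density, $u \in (0,1)$, and $\delta$ for a competing parameter value are read off the surrounding formulas and are not a verbatim restatement of the paper's displays), concavity of $\vartheta \to m(x, \vartheta)$ on a convex parameter set gives a pointwise lower bound on the log-likelihood ratio:

```latex
\begin{align*}
m\bigl(x, u\vartheta + (1-u)\delta\bigr)
  &\ge u\, m(x,\vartheta) + (1-u)\, m(x,\delta), \\
\log\frac{m\bigl(x, u\vartheta + (1-u)\delta\bigr)}{m(x,\delta)}
  &\ge \log\Bigl[\,1 + u\Bigl(\frac{m(x,\vartheta)}{m(x,\delta)} - 1\Bigr)\Bigr],
  \qquad m(x,\delta) > 0 .
\end{align*}
```

The right-hand side is never smaller than $\log(1-u)$, which is presumably why no integrability condition on an infimum of log-densities is needed in this setting.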

Our main application of Theorem 3.[...] will be with [...] in Sections 5 [...] of probability measures w[...]

Consistency of the m.l. estimator for this [...] Robertson (1967), using its ex[...]. Wang (1985) obtains consistency of as. m.l. estimators as an application of his general Theorem 3.1. The proof indicated below is, perhaps, somewhat simpler.

[...] measure with nonincreasing density is [...] probability measure w[...] known [...]. Consistency for as. m.l. estimators for unimodal probability measures with [...] (which requires slightly stronger regularity conditions) was [...] Wegman (1970) and Reiss (1973, 1978).
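For the family of nonincreasing densities on $(0, \infty)$, the exact m.l. estimator has a well-known explicit form (due to Grenander): the left derivative of the least concave majorant of the empirical distribution function. This is a standard fact supplied here for illustration, not a reconstruction of the text's argument. The function names `least_concave_majorant` and `grenander_estimator` are hypothetical, and the sketch assumes distinct, strictly positive observations.

```python
def _cross(o, a, b):
    """Cross product of the vectors o->a and o->b (positive: counterclockwise turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def least_concave_majorant(points):
    """Vertices of the least concave majorant of points sorted by x-coordinate."""
    hull = []
    for p in points:
        # Drop the last vertex while it lies on or below the chord hull[-2] -> p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def grenander_estimator(xs):
    """Return (knots, heights): the estimate equals heights[k] on (knots[k], knots[k+1]].

    Assumes distinct, strictly positive observations (an assumption of this sketch).
    """
    data = sorted(xs)
    n = len(data)
    # Empirical distribution function at the order statistics, started at (0, 0).
    points = [(0.0, 0.0)] + [(x, (i + 1) / n) for i, x in enumerate(data)]
    hull = least_concave_majorant(points)
    knots = [v[0] for v in hull]
    # Slopes of the majorant between consecutive vertices are the density values.
    heights = [
        (hull[k + 1][1] - hull[k][1]) / (hull[k + 1][0] - hull[k][0])
        for k in range(len(hull) - 1)
    ]
    return knots, heights
```

For the observations 1, 3, 4 the sketch returns the knots 0, 1, 4 and the heights 1/3 and 2/9, a nonincreasing step density of total mass 1.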

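[...] delicacies of m.l. estimators: Consistency [...] $\vartheta \in \Theta$ does not entail consistency of m.l. estimators [...] arbitrary subsets $\Theta_0 \subset$ [...]

[...] $n \in \mathbb{N}$, the exact m.l. estimator is defined [...]

If the densities are concave functions of [...] characterized in a different way,

$$\sup_{\vartheta \in \Theta} n^{-1} \sum_{\nu=1}^{n} m(x_\nu, \vartheta)/m(x_\nu, \tau_n(x)) = 1. \tag{3.11}$$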

This follows from Lemma 7.4, applied for $h_\nu(\vartheta) = m(x_\nu, \vartheta)/m(x_\nu, \tau_n(x))$. The results of Lindsay (1983) [...] the existence of m.l. estimators of mixing distributions use, in fact, relation (3.11) rather than (3.10), and so does the EM algorithm for approximating the mixing distribution. Hence it suggests itself to consider estimator sequences fulfilling an asymptotic version of (3.11), say

$$\lim_{n\to\infty} \sup_{\vartheta \in \Theta} n^{-1} \sum_{\nu=1}^{n} m(x_\nu, \vartheta)/m(x_\nu, \tau_n(x)) = 1. \tag{3.12}$$
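The EM iteration mentioned above, and the role of the ratio criterion, can be sketched numerically. The following is a minimal illustration under stated assumptions, not the paper's or Lindsay's procedure: the component density is taken to be normal with unit variance, the mixing distribution is restricted to a fixed finite grid of support points, and the names `normal_pdf`, `em_mixing_weights`, `ratio_criterion`, and `sample_data` are hypothetical. On the grid, `ratio_criterion` evaluates the analogue of the supremum in (3.11), taken over the grid's point masses only.

```python
import math
import random


def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) at x (the component density assumed in this sketch)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_values(dens, w):
    """Mixture density at each observation: m(x_i) = sum_j w[j] * dens[i][j]."""
    return [sum(wj * dij for wj, dij in zip(w, row)) for row in dens]


def ratio_means(dens, w):
    """For each grid point j: n^{-1} * sum_i dens[i][j] / m(x_i)."""
    mix = mixture_values(dens, w)
    n = len(dens)
    return [
        sum(row[j] / m for row, m in zip(dens, mix)) / n
        for j in range(len(w))
    ]


def em_mixing_weights(xs, grid, n_iter=200):
    """EM updates for the weights of a mixing distribution supported on grid."""
    dens = [[normal_pdf(x, t) for t in grid] for x in xs]
    w = [1.0 / len(grid)] * len(grid)
    for _ in range(n_iter):
        # EM step: multiply each weight by its ratio mean; the weights keep summing to 1.
        w = [wj * gj for wj, gj in zip(w, ratio_means(dens, w))]
    return w, dens


def log_likelihood(dens, w):
    """Log-likelihood of the mixture with weights w."""
    return sum(math.log(m) for m in mixture_values(dens, w))


def ratio_criterion(dens, w):
    """Grid analogue of the supremum in (3.11); it is never below 1."""
    return max(ratio_means(dens, w))


def sample_data(n=200, seed=0):
    """Hypothetical test data: an equal mixture of N(-2, 1) and N(2, 1)."""
    rng = random.Random(seed)
    return [
        rng.gauss(-2.0, 1.0) if rng.random() < 0.5 else rng.gauss(2.0, 1.0)
        for _ in range(n)
    ]
```

For fixed data, the criterion equals 1 exactly at a maximizer of the likelihood over the grid, which is the grid counterpart of (3.11); EM iterates only approach that value, and the sketch makes no claim about the rate.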

It is easily seen that estimator sequences fulfilling (3.[...] sequences (fulfilling [...] they are consi[...] in Theorem 3.4. [...] proof of this theorem base[...]

However, the equivalence between [...] does not carry over to the as. versions ([...]

[...] sequences: Not all as. m.l. estimator sequences fulfill ([...] results refer to a broader class of estimator se[...]

uigy co lion (3.2) in eofem 3.4 can

):=Ztt?+(l -U) se Theorem 3.4 or

plied for e

.1

.

atou’s ies

SW at

(iii) Let &C 1 be a finite subset such that

.i.g. we assume that Since iF+m(x, 6) is concave for -a.a. XEX, we hGve for

log[rn(x,, u19+ (1 - u)tQ/m(x,, S)] 1

n

2s’ r lam, l+u ( m(&9 $1 _ 1 1 c_ m(x,, 89 > J

n = n-l * 1&)(x,)log l+u

[ (

m(x,, 89 m(x,, 69 - 11 > _I

+ (4.99

with ai,, :=n-’ Cl !{ai)(~v).

Using inequality (7.3’) with ai=ai , ,&) and &=ai , ,,(#)Md(ai)/Mti{ai) we obtain

ai, n (X)lOg 1+ U 1 [ (2;;; -I)]

tQn(X))lOg[ 1 + U( ciE’ais ‘(” Ci,l, ai, n(x~Md~ail~M~‘tail

- 1)]

+i C \ieI-I,

a&) lOg[l-U]. >

With an :=ITlZUC{ IQ(X) {ai} - 11 :&I,) we obtain for &UT,

(ai) 15 (I+ a&)) ie Ie i

z!s(ll +cr,(x))(f - (499 = (I+ %2(x99(1 -

ogether with ( ), this implies for -8.a. IN 9

n .

d

-1 (x,, u&+(1 -ujt5j/‘m(xvs Sj]

(4.10)

(4.11)

[By] the strong law of large numbers there exists an [...]-null set $N \subset X^{\mathbb{N}}$ such [that ...]

and

$\lim_{n\to\infty} a_{i,n}(x) =$ [...]$\{a_i\}$ for $i \in I$.

Hence $x \in N^c$ implies the following relations:

$\lim_{n\to\infty} \alpha_n(x) = 0$ (since $I_\varepsilon$ is finite),

Therefore, for $x \in N^c$, the right-hand side of (4.11) converges for $n \to \infty$ to a limit which is positive because it is not smaller than the left-hand side of (4.8). This proves (2.6).

If the probability measures are discrete with comm[...] not necessary for this family to be a subfamily of a c[...] proof becomes much simpler. Hence it appears use[...] this case.

ore e Let 0 be a convex subset of a linear discrete probability measure with (ai:iEN}= concave on

, i.e.

(x) iai It4

.

To evaluate this result, recall that for discrete families in general, strong consistency for a (continuous and identifiable) parameter $\vartheta$ can only be [...] if

[...] $> -\infty$, a condition which cannot b[...] counterexamples in Bahadur, 1958, p. 208, an[...]

[...] (2.1) of the as. m.l. estimator re[...]

Since $\delta \to M_\delta\{a_i\}$ is concave, the application for $\delta = u\vartheta + (1-u)\tau_n(x)$ renders

iiiii El Qi n logrl+u 1 (

--I Sf). n+m i= 9

>1 (4.13)

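[By] the strong law of large numbers there exists an [...] such that

[...]$\{a_i\}$ for $i \in \mathbb{N}$.

We shall show that

[...]$\{a_i\}$ for $i \in \mathbb{N}$.

[To] establish [...].15) we consider an arbitrary cofinal s[...] there exists a cofinal subset $\mathbb{N}_1 \subset \mathbb{N}_0$ such that for all $i \in \mathbb{N}$,

$\lim_{n \in \mathbb{N}_1}$ [...], say. [...].16)

We have $p_i \ge 0$ for $i \in \mathbb{N}$, and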

. trict unless ai} for idN.

[...] containing all open sets),

[...] the vague topology of [...]

--) & s q) is continuous

is cl locally compact usdorff Spm? with cold (H) for p-a.a. xe

atom is strongly con&tent for

PlY ewe with an .

[...] Let $p \mid \mathbb{R}$ be a continuous probability density fulfilling [...]

[...] $\mid \mathbb{B}$ denote the probability measure [with] density $x \to p(x - \eta)$, $x \in \mathbb{R}$. Then the assumptions of [...] Hence any sequence of as. m.l. estimators is stro[...] identifiability condition is fulfilled. [...] 'kernel density estimators' for [...]

[...] investigated by [...] we obtain strong consistency

without the condition

j is a family of probability res for

n E IN, the seqmwce

is identifiable in the sense that for any sequwce ai} for ie IN impIies Wed COM-

t9f as. m.P. estimators is strongly

1Y eor .

ea§Uie concentrated i vaguely comtinuous for x = c, w

are covered by the

[...] is l.s.c. and convex on $\mathbb{R}^m$ (see e.g. [...]). Hence the natural parameter space, [...]

[...] for any $x$ in the interior of $S$ [...]

It suffices to show that [...] is compact (or empty) for all $r \in \mathbb{R}$, if $x \in$ [...]. [...] on $\mathbb{R}^m$, this set is closed. It remains to prove that it is bounded. [By] Barndorff-Nielsen (1978, p. 141, Lemmas 9.1 and 9.2), we have $a(x) :=$ [...] $< \infty$ for $x \in S^\circ$. Let $t_i \in \mathbb{R}^m$, $i = 1, \ldots, m$, be independent vectors, small enough so that $x \pm t_i \in S^\circ$. [...] $\le a(x + t) -$ [...]

[...] for any $t \in \mathbb{R}^m$ with $x + t \in S^\circ$ [...]

$$\subset \bigcap_{i=1}^{m} \{\eta : r - a(x - t_i) \le \eta' t_i \le -r + a(x + t_i)\},$$

which is bounded.

easure zero.

there exists iii p-null set N

The function x-p

Since s n NC is de

y definition of the su is an accumulati S,“. This conclu

f p is not discrete, its non-ato

[...] identifiability condition is fulfilled by Proposition 6.2,

[...] solves the consistency problem [...]

[...] solution. [...] we are able to provi[...]

[...] is relatively closed, any sequence of as. m.l. [...] any identifiable mixing distribution.

and So : = (q, sz). Apply

:={xElR:p(X,~)<cx, interval with p(sf) = 0

tion 6.2 we obtain SIC ere exists a j4-null ition of the support

therefore also dense in Sr . Since p( l , G)

is continuous on St, this implies p@ Ii) =p(x, rz) for XE &, i.e. & = rz.

The measure $\mu$ occurs in Theorem 6.4 explicitly, and in Theorem 6.5 in an indirect way, through identifiability. To make [these] results applicable to exponential families over an arbitrary measurable space with $\nu$-density

$x \to h(x) \exp[$[...]

with $T : X \to \mathbb{R}^m$, one has to require these conditions for the measure $\mu \mid \mathbb{B}^m$, defined by

[...]

The condition on $\mu$ [...] to the restriction of [...]

(i) &m(x, 6) is I.s.c. (ii) x+%(x, U) (:= in

borhooa’ w of z. ecaM that this co

xEX (see Lemma 1

I- ))S&.

Let $U_n$, $n \in \mathbb{N}$, [be] a countable decreasing local base of $\tau$. Then l.s.c. at $\tau$ implies

Therefore, for every $\varepsilon > 0$ there exists $n_\varepsilon$ such that

Hence $\delta \in U_{n_\varepsilon}$ implies for all

he following lemma generalizes Shan

) be a measurable space, p 1

jd-density ml, I= 1,2. Then

obtain

n Ai) > 0, relation (7.3) is e following. This implies in

>O, is strictly convex, we

ict, unless pa{x~A, : (x)) >O, this implie

[...] For $i \in \mathbb{N}$ let $\alpha_i \ge 0$, $\beta_i \ge 0$. Then the following holds true for any [...]

[...] inequality is strict unless there is $c \in [0, \infty)$ such that $\alpha_i = c\beta_i$ for [...]

[...] is a convex subset of a linear space, and [...] fulfilling $h_\nu(\delta_0) = 1$, $\nu = 1, \ldots, n$, for some [...]

n

w

-1

1

n n

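[...] nonincreasing density. [...] and

$\lim_{n\to\infty} m_n(x) = m_0(x)$ for $\lambda$-a.a. $x \in (0, \infty)$.

For $P_n \in$ [...] let $F_n(x) :=$ [...] for all $F_0$-continuity [...]

[...] nonincreasing, hence $F_n$ concave. Characterizing concavity of [...]

$(1-u)(F_n(y) - F_n(x)) \le F_n(ux + (1-u)y) - F_n(x)$ [...]

it follows easily [that] $F_0$ is concave, too, an[...] a nonincreasing Lebesgue density, say $m_0$.

It remains to prove (7.6). Let $A = \{x \in X : F_0'(x) = m_0(x)\}$.

[For] $x \in A$ and $\varepsilon > 0$ there exists an $F_0$-continuity point $x_\varepsilon > x$ such that

$(x_\varepsilon - x)^{-1}(F_0(x_\varepsilon) - F_0(x)) > m_0(x) - \varepsilon.$

Since $m_n$ is nonincreasing,

$m_n(x) \ge (x_\varepsilon - x)^{-1}$ [...]

Hence

$\liminf_{n\to\infty} m_n \ge m_0$ for $x \in$ [...]

Together with the inverse ine[...]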

earlier version of t

Bahadur, R.R. (1958). Examples of inconsistency of maximum likelihood estimates. Sankhyā 20, 207-210.

Bahadur, R.R. (1971). Some Limit Theorems in Statistics. SIAM, Philadelphia, PA.

[...] (1963). Properties of probability distributions with monotone hazard rate. Ann. [...]

[...] mixtures of exponential families. J. [...]

Barndorff-Nielsen, O. (1978). Information and Exponential Families in Statistical Theory. Wiley, New York.

Bauer, H. (1981). Probability Theory and Elements of Measure Theory. Academic Press, London.

[...]limko (1974). Note on the strong convergence of distributions. Ann. [...]

[...] exponential family. Ann. Statist. [...] 86-94; 783-792.

[...] distributions. Statist. Probab. Lett. 5, 375-378.

Marshall, A.W. and F. Proschan (1965). Maximum likelihood estimation for distributions with monotone failure rate. Ann. Math. Statist. [...]

Perlman, M.D. (1972). On the strong consistency of approximate maximum likelihood estimators. Proc. Sixth Berkeley Symp. Math. Statist. [...]

Pfanzagl, J. (1969). On the measurability and consistency of minimum contrast estimates. Metrika 14, 249-272.

Pfanzagl, J. and W. Wefelmeyer (1985). Asymptotic Expansions for General Statistical Models. Lecture Notes in Statistics No. 31. Springer, Berlin.

Reiss, R.-D. (1973). On the measurability and consistency of maximum likelihood estimates for unimodal densities. Ann. Statist. [...]

Reiss, R.-D. (1978). Consistency of minimum contrast estimators in non-standard cases. Metrika 25, 129[...]

Robbins, H. and E.J.G. Pitman (1949). Application of the method of mixtures to quadratic forms in normal variates. Ann. Math. Statist. 20, 552-560.

Robertson, T. (1967). On estimating a density which is measurable with respect to a σ-lattice. Ann. Math. [...]

[...] Maximum likelihood estimation of a compound Poisson process. Ann. Statist. [...]