

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 43, NO. 6, NOVEMBER 1997 1935

Reconstruction Sequences and Equipartition Measures:
An Examination of the Asymptotic Equipartition Property

John T. Lewis, Charles E. Pfister, Raymond P. Russell, and Wayne G. Sullivan

Abstract—We consider a stationary source emitting letters from a finite alphabet A. The source is described by a stationary probability measure μ on the space Ω := A^IN of sequences of letters. Denote by Ω_n the set of words of length n and by μ_n the probability measure induced on Ω_n by μ. We consider sequences {Γ_n ⊂ Ω_n : n ∈ IN} having special properties. Call {Γ_n ⊂ Ω_n : n ∈ IN} a supporting sequence for μ if lim_n μ_n[Γ_n] = 1. It is well known that the exponential growth rate of a supporting sequence is bounded below by h^Sh(μ), the Shannon entropy of the source μ. For efficient simulation, we require Γ_n to be as large as possible, subject to the condition that the measure μ_n is approximated by the equipartition measure ρ_n[·|Γ_n], the probability measure on Ω_n which gives equal weight to the words in Γ_n and zero weight to words outside it. We say that a sequence {Γ_n ⊂ Ω_n : n ∈ IN} is a reconstruction sequence for μ if each Γ_n is invariant under cyclic permutations and lim_n (ρ_n[·|Γ_n])_m = μ_m for each m ∈ IN. We prove that the exponential growth rate of a reconstruction sequence is bounded above by h^Sh(μ). We use a large-deviation property of the cyclic empirical measure to give a constructive proof of an existence theorem: if μ is a stationary source, then there exists a reconstruction sequence for μ having maximal exponential growth rate; if μ is ergodic, then the reconstruction sequence may be chosen so as to be supporting for μ. We prove also a characterization of ergodic measures which appears to be new.

Index Terms—Asymptotic equipartition property, empirical measure, ergodic, Kolmogorov, large deviations, reconstruction, stationary.

I. INTRODUCTION

LET Ω_n denote the set of words of length n formed using letters taken from a finite alphabet A of size |A|; if Γ is a subset of Ω_n, then we denote the number of elements in Γ by |Γ|. Let Ω denote the set of all sequences of letters from A—the set of words of infinite length. Suppose the source emitting the letters which form the words is described by a

Manuscript received April 15, 1996; revised April 10, 1997. This work was supported in part by the European Commission under the Human Capital and Mobility Scheme (EU Contract CHRX-CT93-0411). The material in this paper was presented in part at the 1996 IEEE Information Theory Workshop, Haifa, Israel.

J. T. Lewis and R. P. Russell are with the Dublin Institute for Advanced Studies, Dublin 4, Ireland.

C. E. Pfister is with the Département de Mathématiques, École Polytechnique Fédérale, CH-1015 Lausanne, Switzerland.

W. G. Sullivan is with the Department of Mathematics, University College, Belfield, Dublin 4, Ireland, and the Dublin Institute for Advanced Studies.

Publisher Item Identifier S 0018-9448(97)06718-7.

stationary probability measure μ on the space Ω; can we find a sequence {Γ_n : n ∈ IN} of sets of words of increasing length from which we can reconstruct the measure μ?

Take A = {0, 1} with μ the Bernoulli(p) measure on Ω. Define, for n such that np is an integer,

Γ_n := {ω ∈ Ω_n : ω_1 + ··· + ω_n = np}.   (1.1)

These are the words of length n in which the relative frequencies of ones and zeros are p and q := 1 − p. We claim that the measure μ is determined completely by the sequence {Γ_n : n ∈ IN}.

The first step is to construct a sequence of equipartition measures. Define ρ_n[·|Γ_n] to be the probability measure on Ω_n which gives equal weight to the words in Γ_n and zero weight to words outside it: for each subset B of Ω_n, put

ρ_n[B|Γ_n] := |B ∩ Γ_n| / |Γ_n|.   (1.2)

For m ≤ n, every measure on Ω_n induces a measure on Ω_m via the projection which selects the first m letters from a word of length n. We claim that

lim_n (ρ_n[·|Γ_n])_m[B] = μ_m[B]   (1.3)

for every B ⊂ Ω_m and every m ∈ IN; here μ_m is the measure on Ω_m induced by the measure μ on Ω via the projection which selects the first m letters in an infinite sequence. But the set {μ_m : m ∈ IN} is precisely the data required, according to Kolmogorov’s Reconstruction Theorem [5], to determine the measure μ completely. Our claim (1.3) can be proved using a conditional limit theorem of van Campenhout and Cover [3]; see (6.13) of Section VI.
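The claim (1.3) can be checked by brute force in small cases. The following sketch (not from the paper; the function names are ours) builds the set of words (1.1) for a Bernoulli(1/2) source, forms the equipartition measure (1.2), and projects it onto the first m letters; for p = 1/2 the one-letter marginal agrees with μ_1 exactly, by symmetry.

```python
from itertools import product
from fractions import Fraction

def gamma_n(n, p):
    """Words of length n whose relative frequency of ones is exactly p, cf. (1.1)."""
    k = p * n
    assert k == int(k), "np must be an integer"
    return [w for w in product((0, 1), repeat=n) if sum(w) == int(k)]

def equipartition_marginal(words, m):
    """m-letter marginal of the equipartition measure on `words`, cf. (1.2)-(1.3):
    equal weight on each word, projected onto the first m letters."""
    counts = {}
    for w in words:
        counts[w[:m]] = counts.get(w[:m], 0) + 1
    total = len(words)
    return {prefix: Fraction(c, total) for prefix, c in counts.items()}

# For p = 1/2 the first-letter marginal is exactly Bernoulli(1/2) for every n:
marg = equipartition_marginal(gamma_n(12, Fraction(1, 2)), 1)
print(marg[(1,)])  # 1/2
```

For larger m the marginal only converges to μ_m as n grows, which is the content of (1.3).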

We say that a sequence {Γ_n ⊂ Ω_n : n ∈ IN} is a reconstruction sequence for μ if each Γ_n is invariant under cyclic permutations and

lim_n (ρ_n[·|Γ_n])_m = μ_m   (1.4)

for each m ∈ IN; an alternative definition of the concept is discussed in Section VI. The concept of a reconstruction sequence for μ is illustrated by the example of the sequence



{Γ_n : n ∈ IN} defined by (1.1), suitably extended for the case where np is not an integer.

For efficient simulation, we would like the sequence to grow as fast as possible so that we have large samples of words of reasonable length. Consider the sequence constructed using a thickened shell

(1.5)

This sequence has a faster growth rate than the sequence defined by (1.1); it is a reconstruction sequence, but not for μ: for some values of p, the sequence of equipartition measures converges to a different Bernoulli measure on Ω. This can be deduced from a conditional limit theorem proved in [8] (see also [9]). (For the remaining values of p, we recover the Bernoulli(p) measure.)

These examples illustrate a property of reconstruction sequences: they cannot grow too quickly. In fact, we have the following upper bound on the growth rate:

• If {Γ_n : n ∈ IN} is a reconstruction sequence for μ, then

limsup_n n^{−1} log_b |Γ_n| ≤ h^Sh(μ)   (1.6)

where h^Sh(μ) is the Shannon entropy of μ.

There are reconstruction sequences which grow very slowly; in Section VI, we give a proof of the following result:

• Let μ be a stationary source; then there exists a reconstruction sequence {Γ_n : n ∈ IN} for μ which has zero growth rate

lim_n n^{−1} log_b |Γ_n| = 0.   (1.7)

We have the following existence theorem:

• Let μ be a stationary source; then there exists a reconstruction sequence for μ having maximal growth rate.

We turn our attention to another property which a sequence of sets of words may have: we call a sequence {Γ_n ⊂ Ω_n : n ∈ IN} a supporting sequence for μ if

lim_n μ_n[Γ_n] = 1   (1.8)

where μ_n is the probability measure induced on Ω_n. The sequence defined by (1.5) is, for all values of the parameter, an example of a supporting sequence for the Bernoulli measure μ while that defined by (1.1) fails to be.

A supporting sequence cannot grow too slowly. We have the following lower bound to the growth rate:

• If {Γ_n : n ∈ IN} is a supporting sequence for μ, then

liminf_n n^{−1} log_b |Γ_n| ≥ h^Sh(μ).   (1.9)

For economical coding, it is important to have a supporting sequence which grows as slowly as possible. We have the following existence theorem:

• Let μ be an ergodic source; then there exists a supporting sequence for μ having minimal growth rate.

Since the Shannon entropy is a lower bound on the growth rate of a supporting sequence and an upper bound on the growth rate of a reconstruction sequence, a sequence which has both properties has a growth rate equal to the Shannon entropy.

Let us examine how we can modify the construction (1.5) in order to get a sequence which is both a reconstruction sequence for μ and a supporting sequence for μ. Define

(1.10)

We can use the conditional limit theorem in [8] to prove that the sequence has the reconstruction property and the Central Limit Theorem to prove that it has the supporting property for the Bernoulli measure μ.

Let us examine this construction more closely. It selects those words of length n for which the relative frequency of ones lies in a closed neighborhood of p (and hence the relative frequency of zeros lies in a closed neighborhood of q). We can think of the measure μ as being described by the vector ν := (p, q). Introducing a relative frequency vector

f_n(ω) := n^{−1} (#{k ≤ n : ω_k = 1}, #{k ≤ n : ω_k = 0})   (1.11)

we can rewrite (1.10) as

Γ_n = {ω ∈ Ω_n : f_n(ω) ∈ B(ν, δ_n)}   (1.12)

where B(ν, δ_n) is the closed ball of radius δ_n centered on the point ν. In other words, what we have done is to define a mapping ω ↦ f_n(ω) from Ω_n to the space of Bernoulli measures and a decreasing sequence of closed neighborhoods of ν in the space of Bernoulli measures whose intersection is {ν}, and taken Γ_n to be those words ω for which f_n(ω) lies in the nth neighborhood. This choice has some nice properties.

1) The set Γ_n is invariant under cyclic permutations of the letters in the words—this is important because the measure which we are attempting to approximate is stationary.

2) The set Γ_n is nonempty for all n sufficiently large—this is important because we want to condition on it.
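Both properties can be verified directly for the frequency-ball sets in a small case. This is an illustrative sketch (the function names and the particular values of n, p, and the radius are ours, not the paper's):

```python
from itertools import product

def gamma_n(n, p, delta):
    """Words whose relative frequency of ones lies within delta of p, cf. (1.12)."""
    return {w for w in product((0, 1), repeat=n) if abs(sum(w) / n - p) <= delta}

def cycle(w):
    """One cyclic permutation of the word w."""
    return w[1:] + w[:1]

n, p, delta = 10, 0.3, 0.05
G = gamma_n(n, p, delta)
print(len(G) > 0)                      # property 2): nonempty
print(all(cycle(w) in G for w in G))   # property 1): cyclically invariant
```

Cyclic invariance holds because a cyclic permutation does not change the letter frequencies of a word.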

In order to prove our existence theorems, we need to generalize the construction which produced the sequence {Γ_n}. We introduce a class of sequences called canonical sequences; to define these, we make use of the cyclic empirical measure, a mapping E_n from Ω to P(Ω), the space of probability measures on Ω.

The cyclic empirical measure is a generalization of the relative frequency vector which will do what we want in the general case—its precise definition will be given later. For the present, we will describe it in terms of its marginals. For each ω ∈ Ω, we have a measure E_n(ω) defined on subsets of Ω; the projection π_m induces a measure (E_n(ω))_m on the subsets of Ω_m

(E_n(ω))_m[B] := E_n(ω)[π_m^{−1} B].   (1.13)


For ω ∈ Ω, m ≤ n, and η ∈ Ω_m, we can describe (E_n(ω))_m directly. Consider the n cyclic permutations of the word (ω_1, …, ω_n)

(ω_1, ω_2, …, ω_n), (ω_2, …, ω_n, ω_1), …, (ω_n, ω_1, …, ω_{n−1});   (1.14)

then (E_n(ω))_m[{η}] is the fraction of these in which the first m entries coincide with η. Thus (E_n(ω))_1[{a}] is just the relative frequency of the letter a in the word (ω_1, …, ω_n), (E_n(ω))_2[{ab}] is the relative frequency of the adjacent pair ab in the (cyclic) word (ω_1, …, ω_n), and so on. We take Γ_n to be the set of words in Ω_n for which (E_n(ω))_m[{η}] is close to μ_m[{η}] for all m and all η ∈ Ω_m.
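In a small case the marginals of the cyclic empirical measure can be computed by brute force. The following sketch (our notation, not the paper's) lists the n cyclic permutations of a word as in (1.14) and counts prefix matches:

```python
def cyclic_marginal(word, eta):
    """The fraction of the n cyclic permutations of `word` whose first
    len(eta) letters coincide with `eta`, cf. (1.14)."""
    n, m = len(word), len(eta)
    shifts = [word[k:] + word[:k] for k in range(n)]
    return sum(1 for s in shifts if s[:m] == eta) / n

w = "0110"
print(cyclic_marginal(w, "1"))    # relative frequency of the letter 1: 0.5
print(cyclic_marginal(w, "11"))   # relative frequency of the cyclic pair 11: 0.25
```

The one-letter marginal is the letter frequency, and the two-letter marginal is the frequency of adjacent pairs read cyclically, exactly as described above.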

The sequence {Γ_n : n ∈ IN} is a canonical sequence.

Of course, it is necessary to say what we mean by “close to”; that is what is accomplished by the formal definition: let {V_n} be a decreasing sequence of closed neighborhoods of μ in the space of measures whose intersection is {μ}; for each n, the measure E_n(ω) depends only on the first n coordinates of ω and so determines a subset Γ_n of Ω_n; a sequence {Γ_n ⊂ Ω_n : n ∈ IN} constructed in this way, with Γ_n nonempty for all n sufficiently large, is called the canonical sequence based on {V_n}. The definition of E_n ensures that the set Γ_n is cyclically invariant.

Our reason for introducing the concept of a canonical sequence is the following result which holds for an arbitrary stationary source μ:

• Every canonical sequence for μ is a reconstruction sequence for μ.

All we have done so far is to push the problem of existence one stage back: does there exist a canonical sequence for an arbitrary stationary source? There is no difficulty in finding a sequence of neighborhoods which contract to μ; the problem is to prove that the subsets Γ_n which they determine are nonempty—at least for all n sufficiently large. One way of doing this is to show that the growth rate of |Γ_n| is strictly positive; this will be the case if the sequence of neighborhoods of μ contracts sufficiently slowly. Our strategy is to start with an arbitrary sequence of closed neighborhoods contracting to μ and slow its rate of contraction until we are sure that the corresponding subsets Γ_n are growing fast enough; to check on this, we use large-deviation theory. (In fact, we use only the most basic result of the theory: the large-deviation lower bound, a direct consequence of the existence of the rate function; see [7], for example. A derivation of the large-deviation properties of the cyclic empirical measure which we require can be found in [10].) We prove the following result:

• Let μ be a stationary source; then there exists a canonical sequence for μ having maximal growth rate.

A canonical sequence is not necessarily supporting. In the case of a Bernoulli measure, we were able to use the Central Limit Theorem to find a rate which makes the sequence {Γ_n} supporting; in the general case, we do not have such a precise estimate available. Nevertheless, when the measure is ergodic we are able to use the Ergodic Theorem to prove the existence of a canonical sequence which is supporting. The converse also holds so that we have the following characterization of ergodic measures:

• Let μ be a stationary source; then μ is ergodic if and only if there exists a canonical sequence which is supporting for μ.

We have seen that canonical sequences of subsets are useful and arise naturally in the reconstruction problem for stationary sources. It is instructive to compare them with the set of “typical” sequences of letters associated with an ergodic source. Let μ be an ergodic source; there exists a set K with μ[K] = 1 such that each sequence in K determines μ uniquely (see Section VI). For a stationary source μ, a canonical sequence plays an analogous role: let {Γ_n : n ∈ IN} be a canonical sequence for μ; any sequence {ω^(n) ∈ Γ_n : n ∈ IN} of words determines μ uniquely (this is proved in Section VI). A canonical sequence has some advantages over the typical set: one is that every stationary source has a canonical sequence—it is not necessary that the source be ergodic; another is that a canonical sequence is associated with an increasing sequence {F_n : n ∈ IN} of σ-algebras, where F_n is generated by the first n coordinate functions, while the typical set K is in the tail σ-algebra so that the first n coordinates of an element of K are irrelevant.

To put our results in context, it may be useful to recall the Asymptotic Equipartition Property: in terms of the concepts used here, the conclusion of the theorem of Shannon–McMillan–Breiman ([1], [11], [13]) may be stated:

• Let μ be an ergodic source; then for each ε > 0 there exists a sequence {Γ_n(ε)} which is supporting for μ and whose growth rate satisfies

h^Sh(μ) − ε ≤ liminf_n n^{−1} log_b |Γ_n(ε)| ≤ limsup_n n^{−1} log_b |Γ_n(ε)| ≤ h^Sh(μ) + ε.   (1.15)

It follows from the construction used in the proof that each word ω ∈ Γ_n(ε) satisfies

b^{−n(h^Sh(μ)+ε)} ≤ μ_n[{ω}] ≤ b^{−n(h^Sh(μ)−ε)}   (1.16)

where b is the base of logarithms used in the definition of the Shannon entropy (see (2.15) below); this is the origin of the name asymptotic equipartition property. The sequence {Γ_n(ε)} is not a reconstruction sequence for μ. In Section VI, we discuss how this construction may be refined to yield a sequence which has both properties.
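For a Bernoulli source the equipartition bound (1.16) can be seen concretely: a word whose relative frequency of ones is exactly p sits in the middle of the range, with −(1/n) log_2 μ_n[{ω}] equal to the Shannon entropy exactly (here b = 2). A minimal sketch, with hypothetical helper names:

```python
import math

def word_prob(word, p):
    """mu_n[{w}] for a Bernoulli(p) source: p^(#ones) * (1-p)^(#zeros)."""
    k = sum(word)
    return p ** k * (1 - p) ** (len(word) - k)

def binary_entropy(p):
    """Shannon entropy of the Bernoulli(p) source, in bits (b = 2 in (2.15))."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

n, p = 10, 0.3
w = (1, 1, 1, 0, 0, 0, 0, 0, 0, 0)  # 3 ones out of 10: relative frequency exactly p
print(abs(-math.log2(word_prob(w, p)) / n - binary_entropy(p)) < 1e-12)  # True
```

Words whose frequency of ones deviates from p by at most a small amount have −(1/n) log_2 μ_n[{ω}] within ε of the entropy, which is the two-sided bound (1.16).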

II. STATEMENT OF RESULTS

In Section II, we make precise the concepts introduced informally in Section I and sketch proofs of our main theorems. The main result of this paper is an existence theorem.

Theorem 2.1: Let μ be a stationary source; then there exists a reconstruction sequence for μ having entropic growth rate. If, in addition, μ is ergodic, then the reconstruction sequence may be chosen so as to be a supporting sequence for μ.


We will give a constructive proof of this theorem. A by-product of this investigation is a characterization of ergodic measures:

Theorem 2.2: Let μ be a stationary source; then μ is ergodic if and only if there exists a canonical sequence for μ which is supporting for μ.

This section is, to some extent, self-contained: we recall

the definitions and results required to understand the concepts defined here. We state six lemmas, indicating roughly on what their proofs depend; we prove our existence theorem using the first five—the sixth is used to complete the proof of the characterization of ergodic measures. The reader who is prepared to accept the lemmas need read no further.

The first two lemmas are proved in Section III using properties of the divergence rate defined there. The third lemma is crucial: it states that every canonical sequence is a reconstruction sequence; it is proved in Section IV.

To construct sequences with the required properties, we make use of the cyclic empirical measure to define canonical sequences of subsets. The sequence of probability distributions of the cyclic empirical measure with respect to the uniform product measure on Ω satisfies a large deviation principle with the informational divergence as rate function. This is exploited in Section V, where the fourth, fifth, and sixth lemmas are proved.

Some of the ideas have their origin in statistical mechanics; some readers will find reference to this confusing while others will find it enlightening. Having in mind those in the first category, we make no reference to statistical mechanics in the body of the paper; for the others, we provide in the final section, Section VI, a commentary on the concepts and results.

We now make precise the structures we are considering. The space Ω := A^IN is the space of infinite sequences with entries taken from a finite alphabet A having |A| letters; the map X_k: Ω → A is the coordinate projection onto the kth factor in the product. Let F_n be the σ-algebra generated by the first n coordinate functions and let F be the σ-algebra generated by all coordinate functions. Since A contains |A| elements, the σ-algebra F_n is generated by the |A|^n atoms [y] := π_n^{−1}{y}, y ∈ Ω_n, where π_n is the projection onto the first n coordinates. Sometimes the discussion can be clarified by working on Ω_n rather than Ω. Let μ be a probability measure defined on F. On Ω_n we have μ_n, the image measure defined on the subsets of Ω_n by μ_n := μ ∘ π_n^{−1}. Equivalently, one could consider μ on Ω restricted to the σ-algebra F_n. The two viewpoints are complementary.

Recall that, for m ≤ n, every measure on Ω_n induces a measure on Ω_m via the projection p_{n,m} which selects the first m letters from a word of length n; since p_{n,m} ∘ π_n = π_m, it follows that if the measures {μ_n : n ∈ IN} are induced from a probability measure μ on F so that for all n ∈ IN we have

μ_n[B] = μ[π_n^{−1} B], B ⊂ Ω_n   (2.1)

then they satisfy the compatibility conditions

μ_m[B] = μ_n[p_{n,m}^{−1} B]   (2.2)

for all m ≤ n and all B ⊂ Ω_m. Conversely, Kolmogorov’s Reconstruction Theorem implies that given a sequence {μ_n : n ∈ IN} of probability measures satisfying the compatibility conditions (2.2), there exists a unique probability measure μ on F such that for all n ∈ IN the probability measures μ_n are given by (2.1).
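The compatibility conditions (2.2) are easy to verify numerically for a concrete stationary source. The sketch below uses a hypothetical two-state Markov measure (our example, not the paper's) and checks that summing out the last letter of μ_{n+1} recovers μ_n:

```python
from itertools import product

# Hypothetical two-letter Markov source: stationary distribution pi,
# transition matrix P (rows sum to 1). Chosen so that pi * P = pi.
pi = (0.6, 0.4)
P = ((0.9, 0.1), (0.15, 0.85))

def mu_n(word):
    """mu_n[{word}] for the stationary Markov measure."""
    prob = pi[word[0]]
    for a, b in zip(word, word[1:]):
        prob *= P[a][b]
    return prob

# Compatibility (2.2): projecting mu_{n+1} onto the first n letters
# must recover mu_n, i.e. summing out the last letter changes nothing.
n = 4
ok = all(
    abs(mu_n(w) - sum(mu_n(w + (a,)) for a in (0, 1))) < 1e-12
    for w in product((0, 1), repeat=n)
)
print(ok)  # True
```

Kolmogorov's theorem runs in the other direction: a compatible family like {μ_n} above determines a unique measure μ on the infinite-sequence space.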

For a function f on Ω, we write f ∈ B_n to mean that f is F_n-measurable and bounded; we write f ∈ B_loc to mean that there exists a finite n with f ∈ B_n. We use the notation P(Ω) to denote the space of probability measures on Ω with the coarsest topology for which each mapping

ν ↦ ∫_Ω f dν   (2.3)

is continuous whenever f ∈ B_loc: this is called the bounded local topology. In Section I, we encountered the following notion of convergence: a sequence {ν^(n) : n ∈ IN} of probability measures on Ω converges to the probability measure ν in the sense of convergence of finite-dimensional marginals if, for every m ∈ IN and every B ⊂ Ω_m

lim_n (ν^(n))_m[B] = ν_m[B].   (2.4)

In the present setup, convergence of finite-dimensional marginals is equivalent to convergence in the bounded local topology; this can be seen from the following considerations:

each probability ν_m[{y}] is the integral of the indicator function

1_{[y]}(ω) := 1 if π_m(ω) = y, 0 otherwise   (2.5)

of the atom [y] of F_m, and the set {1_{[y]} : y ∈ Ω_m, m ∈ IN} spans B_loc.

We use a product probability measure ρ on Ω as a reference measure; we take ρ to be the measure on Ω which, for all n ∈ IN, assigns equal probability to each of the |A|^n atoms of F_n, so that ρ_n[{y}] = |A|^{−n} for each y ∈ Ω_n. Notice that for Γ ⊂ Ω_n, we have

ρ_n[Γ] = |Γ| |A|^{−n}.   (2.6)

Let μ be a probability measure on F; we may think of μ as characterizing the statistical properties of the source of the words, and we shall refer to μ itself as the source.

We recall the definitions of stationary measure and ergodic measure. We define the shift operator T on Ω by

(Tω)_k := ω_{k+1}, k ∈ IN.   (2.7)

The shift acts on functions by composition

(Tf)(ω) := f(Tω).   (2.8)

We define the action of T on a measure ν by

(Tν)[E] := ν[T^{−1}E].   (2.9)


From the shift operator, we construct the averaging operator

M_n := n^{−1}(I + T + ··· + T^{n−1}).   (2.10)

A source μ is stationary if it is invariant under the shift: for all E ∈ F

μ[T^{−1}E] = μ[E].   (2.11)

A stationary source satisfies the Ergodic Theorem.

• Let μ be a stationary probability measure; for f ∈ B_loc, the limit

f̄(ω) := lim_n (M_n f)(ω)   (2.12)

exists μ-almost surely and the function f̄ is shift-invariant and satisfies

∫_Ω f̄ dμ = ∫_Ω f dμ.   (2.13)

A source μ is ergodic if it is stationary and it assigns probability zero or one to each invariant subset: for all E ∈ F such that T^{−1}E = E, either μ[E] = 0 or μ[E] = 1. We have the following corollary to the Ergodic Theorem:

• If μ is ergodic, then the limit (2.12) is constant for μ-almost every ω and hence

f̄ = ∫_Ω f dμ, μ-a.e.   (2.14)
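The corollary can be illustrated numerically for a Bernoulli source, which is ergodic: the ergodic averages n^{-1} Σ_{k<n} f(T^k ω), with f taken as the first coordinate, converge almost surely to the integral of f, which is p. A sketch (our example; the seed and sample size are arbitrary):

```python
import random

random.seed(0)
p, n = 0.3, 200_000
# One sample path of a Bernoulli(p) source.
omega = [1 if random.random() < p else 0 for _ in range(n)]

def ergodic_average(seq, n):
    """(1/n) * sum_{k=0}^{n-1} f(T^k w) for f = first coordinate:
    shifting k times and reading the first letter just reads seq[k]."""
    return sum(seq[k] for k in range(n)) / n

print(abs(ergodic_average(omega, n) - p) < 0.01)  # True (for this seed)
```

For a nonergodic source, e.g. a mixture of two Bernoulli measures, the limit exists but is random: it depends on which component the sample path came from, which is exactly why (2.14) requires ergodicity.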

Recall that h^Sh(μ), the Shannon entropy of a stationary source μ, is nonnegative and given by

h^Sh(μ) := −lim_n n^{−1} Σ_{y ∈ Ω_n} μ_n[{y}] log_b μ_n[{y}]   (2.15)

where the logarithm is taken in some fixed base b > 1.

Definition 2.1: Let μ be a stationary source. A sequence

{Γ_n ⊂ Ω_n : n ∈ IN} is said to be a supporting sequence for μ if and only if the condition

lim_n μ_n[Γ_n] = 1   (2.16)

holds.

Example: A simple example of a supporting sequence is {Ω_n : n ∈ IN}. The elements of this supporting sequence are too big: in the context of data compression, the goal is to choose the sets Γ_n to be as small as possible, consistent with condition (2.16) holding.

A lower bound on the exponential growth rate of a supporting sequence is provided by the following result, a consequence of elementary properties of the divergence rate; it will be proved in Section III, Proposition 3.1.

Lemma 2.1: Let μ be a stationary source. If {Γ_n : n ∈ IN} is a supporting sequence for μ, then

liminf_n n^{−1} log_b |Γ_n| ≥ h^Sh(μ).   (2.17)

This result motivates the following definition.

Definition 2.2: Let μ be a stationary source. A sequence {Γ_n ⊂ Ω_n : n ∈ IN} is said to have entropic growth rate for μ if and only if

lim_n n^{−1} log_b |Γ_n| = h^Sh(μ).   (2.18)

As a first step in the definition of a reconstruction sequence, we define a class of probability measures, the equipartition measures. In Section I, we defined them “downstairs” on Ω_n: the equipartition measure determined by the subset Γ_n of Ω_n is the probability measure on Ω_n which gives equal weight to the words in Γ_n and zero weight to words outside it. Here we define them “upstairs” on Ω with the aid of the reference measure ρ.

Definition 2.3: Let Γ be a subset of Ω_n with ρ_n[Γ] > 0; we call the probability measure ρ[·|Γ] given on F by

ρ[E|Γ] := ρ[E ∩ π_n^{−1}Γ] / ρ[π_n^{−1}Γ]   (2.19)

the equipartition measure determined by Γ. Notice that, for each subset B of Ω_n, we have

ρ_n[B|Γ] = |B ∩ Γ| / |Γ|.   (2.20)

Although the original measure μ is stationary, the equipartition measure ρ[·|Γ] is not stationary in general. Since we wish to use equipartition measures to approximate a stationary measure, we have to do something about this. The most elegant solution is to define a reconstruction sequence with the aid of the averaging operator (2.10): a sequence {Γ_n ⊂ Ω_n : n ∈ IN} is a reconstruction sequence for μ if lim_n M_n ρ[·|Γ_n] = μ. While readers familiar with ergodic theory may find this definition natural, others may find it puzzling. For this reason, we prefer to adopt a definition in which the averaging is performed “downstairs” on Ω_n rather than “upstairs” on Ω; the connection between the two definitions is discussed in Section VI. We use C_n, the cyclic permutation operator acting on Ω_n

C_n(ω_1, ω_2, …, ω_n) := (ω_2, …, ω_n, ω_1).   (2.21)

Definition 2.4: Let μ be a stationary source. A sequence {Γ_n ⊂ Ω_n : n ∈ IN} is said to be a reconstruction sequence for μ if and only if

1) for all n sufficiently large, Γ_n ≠ ∅;
2) each Γ_n is invariant under the cyclic permutation C_n;
3) the corresponding sequence {ρ[·|Γ_n]} of equipartition measures converges to μ

lim_n (ρ[·|Γ_n])_m = μ_m   (2.22)

for each m ∈ IN.

For reconstruction sequences, we have the following upper bound on the exponential growth rate; it will be proved in Section III, Proposition 3.2, using the lower semicontinuity of the divergence rate.

Lemma 2.2: Let μ be a stationary source. If {Γ_n : n ∈ IN} is a reconstruction sequence for μ, then

limsup_n n^{−1} log_b |Γ_n| ≤ h^Sh(μ).   (2.23)

We have the following obvious corollary:


Corollary 2.1: Let μ be a stationary source. If a supporting sequence for μ is also a reconstruction sequence for μ, then it has entropic growth rate.

Examples: Take Ω = {0, 1}^IN. Let μ be the Bernoulli(p) measure. Define

Γ_n(δ_n) := {ω ∈ Ω_n : |n^{−1}(ω_1 + ··· + ω_n) − p| ≤ δ_n}.   (2.24)

• If δ_n = 0, then Γ_n(δ_n) = ∅ unless np is an integer; the growth rate lim_n n^{−1} log_b |Γ_n(δ_n)| does not exist.

• If δ_n = 1/(2n), then for each n ∈ IN there is exactly one integer k_n so that ω ∈ Γ_n(δ_n) implies ω_1 + ··· + ω_n = k_n, and lim_n k_n/n = p. A direct calculation shows that {Γ_n(δ_n)} is a reconstruction sequence for μ. A simple computation using Stirling’s formula shows that {Γ_n(δ_n)} has entropic growth rate. It is not supporting.

• If δ_n → 0 with δ_n √n → ∞, then the Central Limit Theorem shows that {Γ_n(δ_n)} is supporting for μ. Direct arguments show that it is also a reconstruction sequence, hence has entropic growth rate.

• If δ_n = c, where c > 0 is a constant, then the sequence is supporting, but not a reconstruction sequence, for μ. It is a reconstruction sequence with entropic growth rate for the Bernoulli(p′) measure with p′ ≠ p. However, it is not supporting for this p′-measure.
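The Stirling computation behind the entropic growth rate can be checked numerically: (1/n) log_2 of the number of words with exactly ⌊np⌋ ones approaches the binary entropy of p. A sketch (helper names ours; `lgamma` gives exact log-factorials, avoiding Stirling's remainder by hand):

```python
import math

def log2_binom(n, k):
    """log2 of the binomial coefficient C(n, k), via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / math.log(2)

def binary_entropy(p):
    """Shannon entropy of the Bernoulli(p) source, in bits."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Growth rate of the single-shell sets: (1/n) log2 |Gamma_n| -> H(p).
n, p = 4000, 0.3
rate = log2_binom(n, int(p * n)) / n
print(abs(rate - binary_entropy(p)) < 0.005)  # True
```

The gap between the rate and the entropy is of order (log n)/n, the polynomial correction in Stirling's formula, which vanishes in the exponential growth rate.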

One may also ask if a sequence can be both supporting and have entropic growth rate for two distinct measures. To see that this is the case, let μ′ be the Bernoulli(q) measure and let μ̄ := (μ + μ′)/2; define

Γ′_n := {ω ∈ Ω_n : |n^{−1}(ω_1 + ··· + ω_n) − q| ≤ δ_n}   (2.25)

and let Γ̄_n := Γ_n(δ_n) ∪ Γ′_n. With δ_n as in the third example above, the sequence {Γ̄_n} is supporting and of entropic growth rate for μ, μ′, and μ̄. It is a reconstruction sequence for the nonergodic μ̄.

Sets forming a reconstruction sequence may grow very

slowly; for examples with zero exponential growth rate, see Section VI and [14] for a related result.

Our existence proofs make use of a construction which generalizes that used for a Bernoulli measure in the above examples. Define the blocking operator b_n: Ω → Ω as

(b_n ω)_k := ω_{((k−1) mod n) + 1}, k ∈ IN   (2.26)

which is F_n-measurable since b_n ω depends only on (ω_1, …, ω_n). Next, we define the cyclic empirical measure

E_n(ω) := n^{−1} Σ_{k=0}^{n−1} δ_{T^k b_n ω}   (2.27)

in P(Ω), the space of probability measures on Ω. Since the map ω ↦ E_n(ω) is F_n-measurable, the inverse image E_n^{−1}(V) of a subset V of P(Ω) is determined completely by the first n coordinates.

Definition 2.5: Let E_n: Ω → P(Ω) be the cyclic empirical measure. A sequence {Γ_n ⊂ Ω_n : n ∈ IN} is said to be a canonical sequence for μ if and only if

1) there exists a decreasing sequence {V_n} of closed neighborhoods of μ whose intersection is {μ};
2) each set Γ_n is given by

Γ_n := π_n({ω ∈ Ω : E_n(ω) ∈ V_n});   (2.28)

3) for all n sufficiently large, Γ_n ≠ ∅.

In this case, we shall say that the canonical sequence is based on the sequence {V_n}.

The key to the proof of our main theorem is a conditional limit theorem; this is the subject of Section IV. It says that if {Γ_n} is a canonical sequence for the stationary measure μ, then the sequence {ρ[·|Γ_n]} of conditioned measures converges to μ.

Lemma 2.3: Let μ be a stationary source; every canonical sequence for μ is a reconstruction sequence for μ.

This result is an easy consequence of the cyclical invariance of the sets Γ_n and the compactness of the space P(Ω).

A great advantage which comes from working “upstairs” on Ω is that we have available results on the large-deviation properties of the cyclic empirical measure. The results we need are summarized in Section V; proofs can be found in [10]. We use the large-deviation lower bound to prove the existence of a canonical sequence for a stationary measure.

Lemma 2.4: Let μ be a stationary source; then there exists a canonical sequence for μ having entropic growth rate.

Since the alphabet is assumed to be finite, the existence of a decreasing sequence of closed neighborhoods contracting to μ is easily established; the large-deviation lower bound is used to control the rate at which the sequence contracts to μ so as to ensure that, at least for all n sufficiently large, the sets Γ_n satisfy Γ_n ≠ ∅. We do so by exhibiting a sequence whose growth rate is bounded below by the Shannon entropy of μ; it then follows from Lemmas 2.2 and 2.3 that the growth rate is entropic.

A canonical sequence for μ is not necessarily supporting; however, when the source is ergodic, we can use the Ergodic Theorem in place of the large-deviation lower bound to control the sequence of contracting neighborhoods. In this way, we can ensure that the sequence we construct is supporting and, by Lemma 2.1, canonical. This is done in Section V, Proposition 5.2, establishing the following result:

Lemma 2.5: Let μ be an ergodic source; then there exists a canonical sequence for μ which is a supporting sequence for μ.

In Section V, we use the compactness of P(Ω) to prove the converse of Lemma 2.5:

Lemma 2.6: If there exists a sequence which is both canonical and supporting for a stationary source μ, then μ must be ergodic.

We are ready to prove Theorem 2.1. Lemmas 2.3 and 2.4 together prove that if μ is a stationary source, then there exists a reconstruction sequence for μ having entropic growth rate; Lemma 2.5 proves that if the source is ergodic, then the reconstruction sequence may be chosen so as to be supporting for μ.


Lemma 2.6 is the converse of Lemma 2.5; together they prove Theorem 2.2.

III. INFORMATIONAL DIVERGENCE

The principal tool in the proofs of Lemma 2.1, the lower bound on the growth rate of a supporting sequence, and Lemma 2.2, the upper bound on the growth rate of a reconstruction sequence, is the divergence rate (also called the specific information gain).

Definition 3.1: The informational divergence of the probability measure with respect to the probability measure is given by

(3.1)

when is absolutely continuous with respect to ; otherwise .

The informational divergence is also known as the information gain.

Definition 3.2: The divergence rate of the probability measure with respect to is given by

(3.2)

where is the restriction of to . (Note: .)

We always have , ; if , then , but the corresponding result for does not always hold. Note that, if is stationary and is a stationary product measure, then

(3.3)

When is a finite alphabet with letters, and is the uniform product measure, this yields

(3.4)

Likewise, when is a finite alphabet and , we have

(3.5)
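On a finite word space, the informational divergence of Definition 3.1 reduces to a finite sum, and the relation between divergence and entropy for a product measure against the uniform measure can be checked numerically without taking the limit in Definition 3.2. A minimal Python sketch; the function names and the Bernoulli example are illustrative, not taken from the paper:

```python
from itertools import product
from math import log

def divergence(mu, nu):
    """Informational divergence D(mu||nu) of two distributions on a finite set.

    mu, nu: dicts mapping words to probabilities.  Returns +infinity when
    mu is not absolutely continuous with respect to nu (Definition 3.1).
    """
    total = 0.0
    for w, p in mu.items():
        if p == 0.0:
            continue
        q = nu.get(w, 0.0)
        if q == 0.0:
            return float("inf")  # mu not absolutely continuous w.r.t. nu
        total += p * log(p / q)
    return total

def bernoulli_words(p, n):
    """Product (Bernoulli) distribution on binary words of length n."""
    return {w: (p ** sum(w)) * ((1 - p) ** (n - sum(w)))
            for w in product((0, 1), repeat=n)}
```

For a Bernoulli source against the uniform measure on binary words of length n, the divergence equals n times (log 2 minus the per-letter Shannon entropy), so dividing by n recovers the divergence rate of the product measure directly.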

Henceforth, we will replace by and by in the statements of the propositions. Modified in this way they hold in greater generality; this is discussed in Section VI.

Proposition 3.1: Let be a stationary source. If is a supporting sequence for , then

(3.6)

Proof: Since is in , we have

(3.7)

(3.8)

The inequality follows by dividing by and taking , using .

Lemma 2.1 follows using (3.4)–(3.6).

There are some results of a more technical character which we require concerning cyclic symmetrization and the divergence rate. We collect them in a lemma; they are proved in [10, Sec. VIII]. The space has a natural decomposition into a product space

(3.9)

and the measure is the product measure of on and on . Notice that, for each subset of , we have

(3.10)

Lemma 3.1: Let and be measurable spaces. Let with the corresponding product -algebra. Let and be probability measures on with , and denoting the restrictions to considered as sub- -algebras of . Assume . Then we have

(3.11)

We are now in a position to prove the upper bound on the growth rate of a reconstruction sequence.

Proposition 3.2: Let be a stationary source. If is a reconstruction sequence for , then

(3.12)

Proof: By direct calculation, we have

(3.13)

so that

(3.14)

Next, we make use of the cyclical invariance of : for any integer such that , the projections of on the -algebras and are the same. Let and be the largest integer smaller than . From Lemma 3.1, we have

(3.15)

Since , it follows from the lower semicontinuity of on the space of measures on that

(3.16)


Hence

(3.17)

Lemma 2.2 follows using (3.4), (3.5), and (3.12).

IV. A CONDITIONAL LIMIT THEOREM

We state and prove a conditional limit theorem. We shall need the following lemma which exploits the invariance of the reference measure under the cyclic shift operator , defined by

if
if .

(4.1)
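On finite words, a cyclic shift simply moves an initial segment of the word to its end, and iterating it generates the set of distinct cyclic permutations of a word (a set of at most n elements, which reappears in the commentary of Section VI). A minimal Python sketch; the direction of the shift and the function names are our own conventions, since the symbols in (4.1) are not reproduced here:

```python
def cyclic_shift(word, k=1):
    """Cyclic shift on finite words: move the first k letters to the end.

    word: a tuple of letters; k: number of positions to shift.
    """
    n = len(word)
    k %= n
    return word[k:] + word[:k]

def cyclic_orbit(word):
    """The set of distinct cyclic permutations of a word.

    Has at most len(word) elements; fewer when the word is periodic.
    """
    return {cyclic_shift(word, k) for k in range(len(word))}
```

A periodic word such as 0101 has an orbit of only two elements, which is why a set of cyclic permutations can be strictly smaller than the word length.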

We sometimes find the following notation useful: let and ; we set

(4.2)

Lemma 4.1: Let be -measurable with and let with ; then

(4.3)

Proof: Since is -measurable, there exists such that . Note that is bijective and

with (4.4)

Also

which imply

and

(4.5)

Since with , we have so

(4.6)

A second ingredient in the proof of our conditional limit theorem is a lemma which states a simple consequence of the compactness of ; we shall need it again in Section V.

Lemma 4.2: Let be a stationary source and let IN be a canonical sequence for based on the sequence . Then for each and each there exists such that whenever

(4.7)

Proof: is a decreasing sequence of closed neighborhoods of whose intersection is . We then have

(4.8)

We deduce there exists so that

because is compact, so we have (4.7).

Theorem 4.1: Let be a stationary source; every canonical sequence for is a reconstruction sequence for .

Proof: Let IN be a canonical sequence for based on the sequence . Then, by Lemma 4.2, for each and each there exists such that and whenever we have

(4.9)

It follows that and imply

(4.10)

Since is supported by , we have

(4.11)

It follows from the above and (4.3) that

(4.12)

whenever . This proves that the sequence converges to .
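The convergence just proved is convergence of the induced marginals of the equipartition measures, which on a finite alphabet is a finite computation: each word of the set gets equal weight, and the weights are summed over words sharing a prefix. A minimal sketch, assuming words are tuples over a finite alphabet (the function name and the two-word example are illustrative only):

```python
from collections import Counter
from fractions import Fraction

def equipartition_marginal(words, m):
    """Marginal on length-m prefixes of the equipartition measure on a
    finite set of equal-length words: each word receives weight 1/|set|,
    and the measure is projected onto the first m letters."""
    weight = Fraction(1, len(words))
    marginal = Counter()
    for w in words:
        marginal[w[:m]] += weight  # exact arithmetic via Fraction
    return dict(marginal)
```

Using exact fractions makes the check that the projected measure is again a probability measure an equality rather than a floating-point approximation.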

V. CANONICAL SEQUENCES

We begin by summarizing the ideas of large-deviation theory and the single result we shall require; proofs can be found in [10].

Denote by the distribution of the cyclic empirical measure defined on the probability space , where is our reference measure, the uniform product measure

(5.1)

For each open set , define

(5.2)

(5.3)

The following result is [10, Lemma 8.3]: for each , we have

(5.4)


Definition 5.1 [7]: The Ruelle–Lanford function is defined on by

(5.5)

(5.6)

It is a basic result in Large-Deviation Theory (see [7]) that the existence of the Ruelle–Lanford function implies that the large-deviation lower bound holds for open subsets:

• for each open set , we have

(5.7)

In the present case, we have

• the Ruelle–Lanford function (RL function) is given explicitly by

if is stationary
otherwise.

(5.8)

The following result is fundamental: it establishes the existence of the sequences of closed neighborhoods of on which our construction of canonical sequences is based.

Lemma 5.1: Let be a stationary source; then there exists a decreasing sequence of closed neighborhoods of in such that

(5.9)

Proof: The statement is a consequence of our hypothesis that the factor spaces of are finite sets, copies of a finite alphabet; it holds whenever the factor spaces are standard Borel spaces, for then there exists a sequence in which separates : if are any two probability measures which satisfy

(5.10)

for all , then . In the finite alphabet case, we can take for the set IN of all indicator functions of atoms determined by finite words

if
otherwise.

(5.11)

We can choose

(5.12)

We now use the large deviation lower bound (5.7) to prove the existence of a canonical sequence for which a lower bound on the exponential growth rate holds. The proof employs a construction which we call stretching. Let be a decreasing sequence in of closed neighborhoods of whose intersection is and let be a strictly increasing sequence of positive integers; the decreasing sequence defined, for each IN, by

if
if

(5.13)

is called the stretching of by ; note that the intersection of the stretched sequence is again .
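As a piece of bookkeeping, the stretching construction is just a reindexing: the stretched sequence stays at each term of the original sequence until the next threshold of the integer sequence is passed. A small Python sketch of this reindexing; the convention for indices below the first threshold is our assumption, since the symbols of (5.13) are not reproduced here:

```python
import bisect

def stretch(F, n_seq):
    """Stretching of a decreasing sequence F by a strictly increasing
    integer sequence n_seq: the n-th term of the stretched sequence is
    F[k] whenever n_seq[k] <= n < n_seq[k+1].  For n below n_seq[0] we
    return F[0] (an assumed convention).
    """
    def G(n):
        k = bisect.bisect_right(n_seq, n)  # number of thresholds n_j <= n
        return F[max(k - 1, 0)]
    return G
```

Because each term of the original sequence is merely repeated over a block of indices, the stretched sequence is still decreasing and has the same intersection, as the text notes.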

Proposition 5.1: Let be a stationary source; then there exists a canonical sequence for which

(5.14)

Proof: Let be a decreasing sequence in of closed neighborhoods of whose intersection is ; consider for fixed . Since is a neighborhood of , there exists an open set such that . It follows from the large deviation lower bound (5.7) that

(5.15)

hence, for each IN, there is an integer such that

(5.16)

for all . We may choose the sequence to be strictly increasing. Let be the stretching of by ; define by

(5.17)

By construction, for all such that , we have

(5.18)

It follows that

(5.19)

But is stationary, so that

(5.20)

hence, we have

(5.21)

In particular, for all sufficiently large; it follows that is a canonical sequence and that the lower bound holds. The equality (5.14) for the growth rate now follows from Theorem 4.1 and Proposition 3.2.

When the source is ergodic, we use the stretching construction together with the Ergodic Theorem to prove the existence of a canonical sequence which is supporting. First we need two lemmas:

Lemma 5.2: Let be an ergodic source and let ; then

-a.e. (5.22)


Proof: For , an elementary calculation shows that

(5.23)

so that the sequence converges whenever the sequence converges and they have the same limit; since is ergodic, it follows from the Ergodic Theorem that their common limit is .

Lemma 5.3: Let be an ergodic source and let be an open subset of containing ; then

(5.24)

Proof: Since is an open set containing , there exist and positive numbers such that

(5.25)

Since is ergodic and , it follows from Lemma 5.2 that

-a.e. (5.26)

so

(5.27)

It now follows from (5.25) that

(5.28)

Proposition 5.2: Let be an ergodic source; then there exists a canonical sequence for which is supporting for .

Proof: Let be a decreasing sequence in of closed neighborhoods of whose intersection is ; consider for fixed . Since is a neighborhood of , there exists an open set such that . Suppose that the source is ergodic; then, by Lemma 5.3, we have

(5.29)

so there exists such that, for all

(5.30)

The sequence may be chosen to be strictly increasing. Let be the stretching of by ; put

(5.31)

Then

(5.32)

so that

(5.33)

Hence, is supporting for ; in particular, by Proposition 3.1, for all sufficiently large so that is a canonical sequence.

We conclude this section by proving a theorem of relevance to the coding problem:

Theorem 5.1: Let be a stationary source. If there exists a canonical sequence for which is supporting for , then the source is ergodic.

We make use of two lemmas.

Lemma 5.4: Let be a stationary source. Let IN be a canonical sequence for and suppose there exists a sequence IN which separates and

(5.34)

for each IN. If is supporting for , then the source is ergodic.

Proof: For simplicity, we replace each by so that

for all IN (5.35)

Since , we deduce from (5.34) that

(5.36)

Since is bounded and by hypothesis, it follows that

(5.37)

Now let be any shift-invariant set with . We have

(5.38)

because both and are shift-invariant. Then (5.37) and (5.38) imply that

(5.39)

for all IN. From (5.35), we conclude that

(5.40)

which implies that . Since each shift-invariant set has either or , we deduce that is ergodic.

We use compactness to prove the existence of a separating sequence having property (5.34).

Lemma 5.5: Let be a stationary source and let IN be a canonical sequence for based on the sequence ; then condition (5.34) holds for each in .


Proof: is a decreasing sequence of closed neighborhoods of whose intersection is . It follows from Lemma 4.2 that for each and each there exists such that whenever

(5.41)

It follows that and imply

(5.42)

This implies (5.34).

Since is finite, the collection of indicator functions of atoms from for all is a countable separating set. Taken together, Lemmas 5.4 and 5.5 prove Theorem 5.1.

VI. COMMENTARY

1) To simplify the exposition, we have assumed to be a finite set with the equiprobable distribution on . Our results extend with some modification to the case in which A is a compact metric space and a probability measure on , with on and on being the product of copies of . Here are the modifications which must be made.

a) must be replaced by and by in the statements of the propositions; this is done when we come to prove them in Sections III–V.

b) The hypothesis “Let be a stationary source” must be amplified to read “Let be a stationary source with finite.”

The results may be further extended to the case in which is a standard Borel space and a probability measure on , with and the corresponding product probability measures. In this case, we need to modify the definition of “canonical sequence” so that with we require not only that be a decreasing sequence of closed neighborhoods of with , but also that there exists a sequence with each , such that the topology on determined by separates the points of , and that is a neighborhood basis for in this topology. In the noncompact case, in general, Lemma 4.2 is no longer valid. However, the conclusions of Lemma 4.2 hold when , the separating sequence. One proves Theorem 4.1 by noting that the level sets of are compact so that the sequence has limit points in . The modified Lemma 4.2 shows uniqueness of the limit point of , which implies convergence.

2) The concept of an equipartition measure is inspired by that of a microcanonical measure in statistical mechanics. For Gibbs, the microcanonical measure was fundamental; the canonical measure, an approximation to the microcanonical, was useful by virtue of being more tractable analytically. The idea of bounded local convergence is foreshadowed in the statement of his “general theorem:”

If a system of a great number of degrees of freedom is microcanonically distributed in phase, any very small part of it may be regarded as canonically distributed [4, p. 183].

In the concept of a reconstruction sequence, we turn Gibbs’ idea on its head: the stationary source corresponds to his canonical measure; our equipartition measure corresponds to his microcanonical measure. For us, the stationary source is fundamental; it can be approximated by an equipartition measure.

3) From the point of view of digital computation, a reconstruction sequence is more tractable than a stationary measure. Reconstruction sequences may prove useful in providing efficient ways of simulating stationary measures. We will pursue these ideas elsewhere.

4) The bound on the growth rate of a supporting sequence given in Lemma 2.1 is part of the folklore; it can be deduced from [15, Theorem I.7.4]. The bound on the growth rate of a reconstruction sequence is a consequence of results on the Csiszár–Körner-type class (see [15, Sec. I.6.d]). Our proofs, using the divergence rate, appear to be new.

5) The distinction between average and cyclic average becomes negligible as because the limit employs : for and , an elementary computation shows that

(6.1)

so the results of this paper hold with the following alternative definition of reconstruction sequence.

Definition 6.1: Let be a stationary source. A sequence IN is said to be a reconstruction sequence for if and only if

a) for all sufficiently large, ;
b) the corresponding sequence of averaged equipartition measures converges to

(6.2)

If we use this definition, then canonical sequences have the following important property:

Corollary 6.1 (to Theorem 4.1): Let be a canonical sequence for the stationary source . Let satisfy and for all sufficiently large. Then is a reconstruction sequence in the sense of Definition 6.1

(6.3)


Proof: Let ; from the proof of Theorem 4.1, for each , there exists so that implies

(6.4)

By hypothesis, we have for all large

(6.5)

It follows from (6.1) that

(6.6)

Hence

(6.7)

Since and are arbitrary, it follows that converges to .

Remarks:

a) If each has cyclic symmetry, then the conclusion of the corollary holds with the original definition of reconstruction sequence, Definition 2.4.

b) The set can be a singleton or, in the case of cyclic symmetry, contain at most elements. For such , we have

(6.8)

6) Examples of “small” reconstruction sequences are provided also by the Ergodic Theorem. Let be an ergodic measure on ; we give an example of a reconstruction sequence for which grows very slowly.

(6.9)

Let denote the indicator function of the atom of , where is a word in

if
otherwise.

(6.10)

The Ergodic Theorem implies that

-a.e. (6.11)

Let be the set on which the above limit holds; let denote the intersection of over all words in and all . We have ; hence is nonempty. Choose a sequence ; for each IN, define to be the set formed by the distinct cyclic permutations of the word ; then

(6.12)

so that is a reconstruction sequence for .

7) The approach to large-deviation theory sketched in Section I is described fully in [7]; it has its origins in Ruelle’s treatment [12] of thermodynamic entropy and Lanford’s proof [6] of Cramér’s Theorem.

8) Our conditional limit theorem has antecedents; the earliest we are aware of is due to van Campenhout and Cover [3]:

Let be independent and identically distributed (i.i.d.) random variables having uniform probability mass on the range . Then, for and for all , we have

(6.13)

where

(6.14)

and the constant is chosen to satisfy the constraint

A landmark in the development of such theorems is the paper by Csiszár [2], in which several important concepts are introduced.
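The limiting conditional distribution in (6.13)–(6.14) is an exponentially tilted distribution whose tilting parameter is fixed by the mean constraint. A sketch of computing it numerically for a uniform prior on {1, …, m}; the bisection bounds and function names are our own, not from [3], and the bounds assume moderate m so the exponentials stay in floating-point range:

```python
from math import exp

def tilted_distribution(m, alpha, tol=1e-10):
    """Maximum-entropy distribution on {1, ..., m} with mean alpha.

    Has the form g(k) = C * exp(lam * k); the parameter lam is found by
    bisection, using that the mean is strictly increasing in lam.
    """
    def mean(lam):
        w = [exp(lam * k) for k in range(1, m + 1)]
        z = sum(w)
        return sum(k * wk for k, wk in zip(range(1, m + 1), w)) / z

    lo, hi = -50.0, 50.0  # assumed bracket for the tilting parameter
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) < alpha:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    w = [exp(lam * k) for k in range(1, m + 1)]
    z = sum(w)
    return [wk / z for wk in w]
```

When the constrained mean equals the prior mean, the tilting parameter vanishes and the uniform distribution is recovered; raising the mean pushes mass exponentially toward the larger values.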

9) We have shown that canonical sequences have the reconstruction property. In the case where is a product measure, we have given other examples of reconstruction sequences. In the literature, sequences of the form

(6.15)

are used frequently. When is ergodic, the Shannon–McMillan–Breiman Theorem implies that is a supporting sequence

(6.16)

We can choose the strictly increasing sequence so that implies

(6.17)

Define

if
if .

(6.18)

In the case of ergodic , we can use sets of the form (6.15) to generate a supporting sequence for . Straightforward estimates show to have entropic growth rate. This suggests asking whether a supporting sequence with entropic growth rate is a reconstruction sequence. The example near (2.25) shows that, in general, a supporting sequence for with entropic growth rate need not be a reconstruction sequence for . We need an additional hypothesis so that does not include points of low -probability. Here is a result of this kind.


Proposition 6.1: Let be a stationary product measure and let be a supporting sequence for . For each , define

(6.19)

If for each

(6.20)

then is a reconstruction sequence for .

Proof: We have

(6.21)

so (6.20) implies that the sequence converges to if, and only if, the sequence converges to . Thus it suffices to consider the case in which for all and all IN. For , we have

(6.22)

where is given by

(6.23)

Assuming , we have

(6.24)

Then

(6.25)

so

(6.26)

for every , because is a supporting sequence. Any limit point of the sequence is stationary. Lemma 8.1 of [10] and the lower semicontinuity of the divergence rate imply

(6.27)

Since is assumed to be a product measure, this implies . But the level sets of are compact, so the sequence converges to .

Remarks:

a) If is a supporting sequence for , then the sequence given by

(6.28)

as given in (6.18), is a supporting sequence for which satisfies (6.20).

b) Under the condition that is weakly dependent (see [10]), one may also deduce (6.27). One can then conclude that is a Gibbs state for the interaction associated with ; this does not, in general, imply that .

ACKNOWLEDGMENT

The authors wish to thank F. den Hollander for a careful reading of an earlier draft of this paper and P. Shields for his comments and references to the literature which helped to put the results in context.

REFERENCES

[1] L. Breiman, “The individual ergodic theorem of information theory,” Ann. Math. Statist., vol. 28, pp. 809–811, 1957.

[2] I. Csiszár, “Sanov property, generalized I-projection and a conditional limit theorem,” Ann. Probab., vol. 12, pp. 768–793, 1984.

[3] J. M. van Campenhout and T. M. Cover, “Maximum entropy and conditional probability,” IEEE Trans. Inform. Theory, vol. IT-27, pp. 483–489, July 1981.

[4] J. W. Gibbs, Elementary Principles of Statistical Mechanics. New Haven, CT: Yale Univ. Press, 1902.

[5] A. N. Kolmogorov, Grundbegriffe der Wahrscheinlichkeitsrechnung (Ergebnisse der Math.). Berlin, Germany: Springer, 1933; translated as Foundations of Probability. New York: Chelsea, 1956.

[6] O. E. Lanford, “Entropy and equilibrium states in classical statistical mechanics,” in Lecture Notes in Physics, vol. 20. Berlin, Germany: Springer, 1973, pp. 1–113.

[7] J. T. Lewis and C.-E. Pfister, “Thermodynamic probability theory: Some aspects of large deviations,” Russ. Math. Surv., vol. 50, no. 2, pp. 279–317, 1995.

[8] J. T. Lewis, C.-E. Pfister, and W. G. Sullivan, “Large deviations and the thermodynamic formalism: A new proof of the equivalence of ensembles,” in On Three Levels, M. Fannes, C. Maes, and A. Verbeure, Eds. New York: Plenum, 1994, pp. 183–193.

[9] ——, “The equivalence of ensembles for lattice systems: Some examples and a counterexample,” J. Stat. Phys., vol. 77, pp. 397–419, 1994.

[10] ——, “Entropy, concentration of probability and conditional limit theorems,” Markov Processes and Related Fields, vol. 1, pp. 319–386, 1995.

[11] B. McMillan, “The basic theorems of information theory,” Ann. Math. Statist., vol. 24, pp. 196–219, 1953.

[12] D. Ruelle, “Correlation functionals,” J. Math. Phys., vol. 6, pp. 201–220, 1965.

[13] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, pp. 379–423, 623–656, 1948.

[14] P. Shields, “Two divergence-rate counterexamples,” J. Theor. Prob., vol. 6, pp. 521–545, 1993.

[15] ——, The Ergodic Theory of Discrete Sample Paths (Graduate Studies in Mathematics, vol. 13). Providence, RI: Amer. Math. Soc., 1996.