27
7 Majorizing measures 1 7.1 Introduction ............................ 2 7.2 An illuminating example ..................... 5 7.3 Measures versus partitions .................... 8 7.3.1 From measures to partitions ............... 8 7.3.2 From partitions to majorizing measure ......... 12 7.4 From partitions to expected suprema .............. 13 7.5 The role of Gaussianity ..................... 17 7.6 From functional to nested partitions .............. 18 7.7 Uniformly continuous Gaussian sample paths ......... 23 7.8 Problems ............................. 25 7.9 Notes ............................... 26 Printed: 1 January 2016 version: 1jan2016 printed: 1 January 2016 Mini-empirical c David Pollard

Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

7 Majorizing measures 17.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27.2 An illuminating example . . . . . . . . . . . . . . . . . . . . . 57.3 Measures versus partitions . . . . . . . . . . . . . . . . . . . . 8

7.3.1 From measures to partitions . . . . . . . . . . . . . . . 87.3.2 From partitions to majorizing measure . . . . . . . . . 12

7.4 From partitions to expected suprema . . . . . . . . . . . . . . 137.5 The role of Gaussianity . . . . . . . . . . . . . . . . . . . . . 177.6 From functional to nested partitions . . . . . . . . . . . . . . 187.7 Uniformly continuous Gaussian sample paths . . . . . . . . . 237.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257.9 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

Printed: 1 January 2016

version: 1jan2016printed: 1 January 2016

Mini-empiricalc©David Pollard

Page 2: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

Chapter 7

Majorizing measures

Section 7.1 reminds you of the three quantities that are inter-related in themajorizing measure approach to chaining: an integral that depends on theprobability of small balls around each point in the index set; a weightedsum involving the diameters of sets in a sequence of finite parttions ofthe index set; and the expected supremum of a stochastic process whoseincrements are controlled by a Young function involved in the definitionof the first two quantities.

Section 7.2 presents a variation on the classical example for illustratingthe differences between packing number and majorizing measure assump-tions. It describes a simple Gaussian process whose supremum has afinite expected value, which satisfies the majorizing measure condition,but which fails the usual packing number condition.

Section 7.3 explains the connection between the majorizing measures andsequences of partitions of the index set.

Section 7.4 explains how a suitable sequence of partitions leads to a chain-ing upper bound on expected suprema of the stochastic process.

Section 7.5 identifies the special properties of Gaussian processes that leadto the lower bound that complements the chaining upper bound.

Section 7.6 recursively applies a partitioning scheme, derived from theGaussian inequality in Section 7.5, to generate a nested sequence of par-titions from properties of the supremum functional.

Section 7.7 describes the effect on the majorizing measure lower bound ifa centered Gaussian processes has uniformly continuous sample paths.

version: 1jan2016printed: 1 January 2016

Mini-empiricalc©David Pollard

Page 3: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.1 Introduction 2

7.1 IntroductionMajorizing::S:intro

My overall goal for this Chapter is to have you understand the Fernique-Talagrand result that relates finiteness of the expected supremum of a cen-tered Gaussian process to the existence of a majorizing measure and theequivalent characterizations using sequences of finite partitions of the indexset. To make the Chapter more nearly self-contained I repeat (in slightlygreater generality) some of the material from Section 4.6 and 4.7.

My exposition is based largely on Talagrand (1992, 1996, 2001), andtwo books, Talagrand (2005, Sections 1.3 and 2.1) and Talagrand (2014,Chapter 2).

Throughout the Chapter there lurks a stochastic process Xt : t ∈ Tindexed by a metric space (T, d), with

D := diam(T ) := supd(s, t) : s, t ∈ T

denoting its diameter. I assume, for a given Young function Ψ, that theincrements of the X process are controlled by the Ψ-Orlicz norm, in thesense that

\E@ increment\E@ increment <1> ‖Xs −Xt‖Ψ ≤ d(s, t) for all s, t ∈ T,

which gives a tail bound

\E@ Psi.tail\E@ Psi.tail <2> P|Xs −Xt| ≥ ηd(s, t) ≤ β(η) := min (1, 1/Ψ(η)) .

I also assume the function Ψ satisfies an inequality for each x0 > 0,

\E@ growth\E@ growth <3> Ψ(x)Ψ(y) ≤ Ψ(c0(x+ y)) whenever x ∧ y ≥ x0,

where c0 is a constant depending on x0. Equivalently (cf. Ledoux andTalagrand 1991, page 310), for each u1 > 0,

\E@ igrowth\E@ igrowth <4> Ψ−1(uv) ≤ c1

(Ψ−1(u) + Ψ−1(v)

)whenever u ∧ v ≥ u1,

where c1 is a constant depending on u1. As shown by Problem [1], suchinequalities force Ψ(x)/eαx →∞ as x→∞, for some α > 0.

Each of the Young functions Ψα (growing like exp(xα)) from Section 2.3satisfies <3>. The special case where Ψ2(x) = exp(x2)−1 takes center stagein Sections 7.5 onwards, for the analysis of centered Gaussian processes.

Initially for Gaussian processes X, but subsequently for a variety ofdifferent X, Talagrand derived upper and lower bounds for analogs of thefunctional

\E@ sup.fnal\E@ sup.fnal <5> F (A) := P supt∈AXt for A ⊆ T .

Draft: 1jan2016 c©David Pollard

Page 4: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.1 Introduction 3

Remark. To avoid measurability issues you may assume that T iscountable. Or you could work with the methods described in Chapter 5to reduce the analysis to countable subsets of T . Or you couldinterpret F (A) as supPmaxt∈S Xt : finite S ⊆ A.

As mentioned in Section 4.6, Talagrand replaced chaining bounds forpacking number by chaining bounds based on admissible sequences, π =πk : k ∈ N0, of finite partitions of T that are nested (each πk+1 is a

refinement of the preceding πk) with π0 = T and #πk ≤ nk := 22k

for k ≥ 1. He worked with functionals

γα(T, π) := supt∈T∑

k∈N0

2k/αdiam(Ek(t)),

where Ek(t) denotes the member of the partition πk that contains the point t.The subscript α suggests the connection with the Young function Ψα, forwhich Ψ−1

α (nk) grows like 2k/α.To emphasize the general connection with Ψ, I decided to work with a

closely related concept that I call (Ψ, λ)-admissibility of a sequence π =πk : k ∈ N0 of (not necessarily nested) finite partitions: again π0 = Tbut #πk ≤ Ψ(λk) for some fixed λ > 1. The corresponding functionals are

γλ(T, π) := supt∈Tγλ(t, π, 0)

where γλ(t, π, p) :=∑

k≥pλkdiam(Ek(t)) for p ∈ N0.\E@ gamlampi\E@ gamlampi <6>

Remark. Nonzero values of p become relevant when studying continuityproperties of sample paths, as in Sections 7.4 and 7.7.

Section 7.3 shows that it is easy to enforce the nestedness property on thepartitions, at the cost of a slight increase in some of the constants. Alter-natively (Section 7.4), one can just slightly modify the chaining argument.The nesting is just a notational convenience, not a necessity.

Even though it is no longer essential to the “majorizing measure” method,I also want you to see the connections between the functional γλ(T, π) andthe original idea of a majorizing (Borel probability) measure µ on T :

γ(T, µ) := supt∈Tγ(t, µ, 1)

where γ(t, µ, δ) :=

∫ δ

0Ψ−1 (1/µB[t, rD]) dr for 0 < δ ≤ 1.\E@ MM.fnal\E@ MM.fnal <7>

Here B[t, rD] denotes the closed ball with center t and radius rD. The µ iscalled a majorizing measure if γ(T, µ) <∞.

Draft: 1jan2016 c©David Pollard

Page 5: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.2 An illuminating example 4

Remark. The unorthodox presence of the rD in the definitionof γ(t, µ, δ) essentially rescales the metric to a diameter of one.It ensures that γ(T, µ) is always bounded below by a fixed constant nomatter what value D takes. To see why, consider any pair of points s, tin T for which d(s, t) > 2D/3. The balls B[t,D/3] and B[s,D/3]are disjoint, which implies that at least one of them—suppose itis B[t,D/3]—has µ measure ≤ 1/2. It follows that

γ(T, µ) ≥∫ 1/3

0

Ψ−1 (1/µB[t,D/3]) dr = Ψ−1(2)/3.

Put another way, B[t, rD] is just the the closed ball of radius r for the

rescaled metric d(s, t) = d(s, t)/D and the process Xt = Xt/D has its

increments controlled by d, in the sense that∥∥∥Xs − Xt

∥∥∥Ψ≤ d(s, t) for

all s, t ∈ T .

The following Theorem, whose proof is spread out across the Sectionsof the Chapter, summarizes the interconnections between the various func-tionals.

Majorizing::summary <8> Theorem. Suppose Xt : t ∈ T has increments for which ‖Xs −Xt‖Ψ ≤d(s, t), where theYoung function Ψ satisfies the growth assumption <3>.Then there exists a finite constant Cλ, depending on λ (and Ψ) for whichthe following hold.

(i) for each majorizing measure µ there exists a (Ψ, λ)-admissible sequenceof partitions π for which γλ(T, π) ≤ CλDγ(T, µ)

(ii) for each (Ψ, λ)-admissible sequence of partitions π there exists a ma-jorizing measure µ for which Dγ(T, µ) ≤ Cλγλ(T, π)

(iii) if there exists a (Ψ, λ)-admissible sequence of partitions π and if X iscentered then F (T ) ≤ Cλγλ(T, π)

(iv) if X is a centered Gaussian process with F (T ) < ∞ then there ex-ists a (Ψ2, λ)-admissible sequence of partitions π for which γλ(T, π) ≤CλF (T ).

Remark. Even though (i) and (ii) show that majorizing measuresand admissible partitions are equivalent (up to constant multipliers),the partition functional γλ(t, π, p) has several advantages over themajorizing measure functional γ(t, µ, δ). For example, suppose T1 isa subset of T . It is not obvious that finiteness of supt∈T γ(t, µ, 1) fora probability µ on T should imply existence of a Borel probabilitymeasure ν on T1 for which supt∈T1

γ(t, ν, 1) is finite; but it is a trivialmatter to restrict the partitions πk to T1. Compare with Problem [2].

Draft: 1jan2016 c©David Pollard

Page 6: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.2 An illuminating example 5

7.2 An illuminating exampleMajorizing::S:MM.vs.pack

The following example is based on a analogous construction by Fernique(1975, Exemple 5.5.2). (See also Talagrand (2005, page 44).) It brings outsome subtle differences between conditions for bounded sample paths basedon packing numbers and the more refined majorizing measure conditions.

As before define nk := 22k so that Ψ−12 (nk) grows like 2k/2. DefineN0 = 0

and Nk :=∑

i≤k ni = nk (1 + o(1)). Partition N into blocks B1, B2, . . . oflengths n1, n2, . . . ,

Bk := (Nk−1, Nk] := t ∈ N : Nk−1 < t ≤ Nk.

The process X = Xt : t ∈ N consists of a sequence of independentrandom variables with Xt ∼ N(0, σ2

t ) with σ2t = δ2

k = 2−k for all t inblock Bk. The natural metric is given by

d2X(s, t) = P|Xs −Xt|2 = σ2

s + σ2t = δ2

j + δ2k if s ∈ Bj and t ∈ Bk.

Under dX the index set N has diameter 1 and diam (∪j≥kBj) =√

2δk, sothat pack(r,N, dX) <∞ for each r > 0.

The X process and the metric space (N, dX) have the following proper-ties.

(i) P supt |Xt| <∞;

(ii)∫ D

0 Ψ−12 (pack(r,N, dX)) dr =∞;

(iii) there exists nested partitions πk of N with #πk = O(nk) for which

supt∑

k∈N2k/2diam (Ek(t)) <∞

(iv) there exists a majorizing measure.

Remark. The results in this Chapter show that (i), (iii), and (iv) areequivalent. Nevertheless, it helped my intuition to carry out explicitcalculations for each assertion.

Remember that

\E@ normal.tail\E@ normal.tail <9> Φ(u) := (2π)−1/2

∫ ∞u

e−y2/2 dy ≤ e−u2/2 for u ≥ 0.

Draft: 1jan2016 c©David Pollard

Page 7: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.2 An illuminating example 6

Proof of (i).Convert the expected value to an integral of tail bounds:

P supt |Xt| =∫ ∞

0Psupt |Xt| > x dx

≤ 2 +

∫ ∞2

∑tP|Xt| > x dx

= 2 +∑

k∈Nnk

∫ ∞2

Φ(x/σt) dx

The union bound would be too crude for x too close to zero. Invoke <9>twice to bound the integrals:∫ ∞

2Φ(x/δk) dx ≤ δk

∫ ∞2/δk

exp(−y2/2) dy ≤√

2π exp(−(2/δk)

2/2).

The kth term in the sum is less than a constant times exp(2k log 2− 2× 2k);the sum converges.

Remark. Even though the sample paths are bounded almost surely theydo oscillate a lot. As shown by Problem [5], maxt∈Bk

Xt concentratesnear δk

√2 log nk = c :=

√log 2 . By symmetry, maxs∈Bk

(−Xs) alsoconcentrates near c. Thus maxt,s∈Bk

|Xt −Xs| concentrates near 2c,even though diam(Bk) → 0 as k → ∞. A similar calculation providesanother explanation for why supt |Xt| is well behaved.

Proof of (ii).Note that dX(s, t) =

√2δk for distinct s, t in Bk, implying Thus

pack(r,N, dX) ≥ nk for√

2δk > r,

and ∫ √2δk

√2δk+1

Ψ−12 (pack(r,N, dX)) dr ≥

√2 (δk − δk+1) Ψ−1

2 (nk) ≥ c

where c =(√

2− 1)√

log 2. The sum over k is infinite.

Proof of (iii).Define πk to consist of all the singleton sets t for 1 ≤ t ≤ Nk, the

Draft: 1jan2016 c©David Pollard

Page 8: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.3 Measures versus partitions 7

blocks Bk+1, . . . , B2k, and I2k = (N2k,∞) = t ∈ N : t ≤ Nm. Thesize of πk equals Nk + k + 1 = O(nk). If t ∈ Bj then

diam(Ek(t)) =

diam(t) = 0 if j ≤ kdiam(Bj) =

√2δj if k < j ≤ 2k

diam(I2k) ≤√

2δ2k+1 if j > 2k.

It follows, for some constant C, that∑k∈N

2k/2diam(Ek(t))

≤ C∑

k2k/2 (δ2k+1k < j/2+ δjj/2 ≤ k < j)

≤ C2

∑k<j/2

2−k/2 + C22−j/2∑

k<j2k/2,

which is bounded uniformly in j.

Proof of (iv).Define µk to be the uniform distribution on Bk and µ =

∑k∈N 2−kµk. Once

again consider a t in some Bj . The distance of t to some other point s in Bk

is Dk :=√δ2j + δ2

k . For 0 < r ≤ 1

B[t, r] =

N if r ≥ D1

[Nk+1,∞) if Dk > r ≥ Dk+1 for k ∈ Nt for δj ≥ r.

so that

µB[t, r] =

1 if r ≥ D1

2−k if Dk > r ≥ Dk+1 for k ∈ N(2jnj)

−1 for δj ≥ r.

Note that Dk −Dk+1 = (δ2k − δ2

k+1)/(Dk +Dk+1) = O(δk), uniformly in j.∫ D

0Ψ−1

2

(1

µB[t, r]

)dr

= Ψ−12 (1)(1−D1) +

∑k∈N

Ψ−12 (2k)(Dk −Dk+1) + δjΨ

−12 (2jnj)

≤ Ψ−12 (1) +O

(∑k∈N

√k2−k/2

)+O(δj

√2j )

which is uniformly bounded in j.

Draft: 1jan2016 c©David Pollard

Page 9: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.3 Measures versus partitions 8

7.3 Measures versus partitionsMajorizing::S:MM.part

This Section explains the essential equivalence between majorizing measuresand admissible sequences of partitions.

7.3.1 From measures to partitionsMajorizing::MM.to.PART

The construction is easier to understand when written in a form that hasnothing to do with Young functions and convexity. The next Lemma provesexistence of a (not necessarily nested) sequence of partitions πk of T thathas properties analogous to <6>. Be sure to notice the simple trick wherethe properties of a measure are needed: if A1, . . . , An are disjoint Borelsubsets of T with mini µAi ≥ ε then n ≤ 1/ε.

Majorizing::mm.partition <10> Lemma. Suppose h : (0, 1]→ R+ is continuous and strictly decreasing withh(r) → ∞ as r → 0 and h(1) = 1. Let µ be a Borel probability measureon T . Define

γh(t, µ, δ) :=

∫ δ

0h (rµB[t, rD]) dr for t ∈ T and δ ∈ (0, 1],

γh(T, µ, δ) := supt∈T γ(t, µ, δ).

Suppose γ(T, µ, 1) <∞. Then for each λ > 1 and for each k ∈ N there exista finite partition πk of T for which

(i) #πk ≤ 2/h−1(λk)

(ii) for each p ∈ N and t ∈ T ,∑k≥p

λkdiam(Ek(t)) ≤8Dλ

λ− 1γh(T, µ, δp)

where δp := γh(T, µ, 1)/λp.

Proof Without loss of generality we may assume D = 1. For each j ∈ N0

define rj = 2−j and abbreviate B[t, rj ] to Bj [t].The function H(t, r) := h(rµB[t, r]) is decreasing in r. The sequence

Hj(t) := H(t, rj) is increasing in j and

1 = H0(t) ≤ H1(t) ≤ · · · → ∞ as j →∞, for each t ∈ T .

For each k ∈ N define εk := h−1(λk) and

Jk(t) := maxj ∈ N0 : Hj(t) ≤ λk = maxj ∈ N0 : rjµBj [t] ≥ εk.

Draft: 1jan2016 c©David Pollard

Page 10: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.3 Measures versus partitions 9

By construction,

Jk(t) = j iff Hj(t) ≤ λk < Hj+1(t)

iff rjµBj [t] ≥ εk > rj+1µBj+1[t].\E@ Jk.def\E@ Jk.def <11>

For each k ∈ N the sets Cj,k := t ∈ T : Jk(t) = j form a partition of T .A refinement of that partition will define πk.

tiEi,j

Cj = t ∈T: Jk(t)=j

C0 C1

Construction of πkWith k fixed for the moment, let me omit some j and k subscripts whiledefining πk, in order to avoid a lot of notational clutter. Partition each

nonempty Cj into subsets Ei,j : i = 1, . . . , nj, each havingdiameter at most 4rj , by first choosing a maximal 2rj-separatedsubset ti : i = 1, . . . , nj of Cj . The balls Bj [ti] are disjointand, by <11>, each has µ measure at least εk/rj . From the factthat µ(T ) = 1 it follows—the key idea—that nj ≤ rj/εk. Bymaximality of the separated set, each t in Cj must lie withina distance 2rj of some ti. Define Ei,j to be the set of those tin Cj for which ti is the closest, with the smallest i being chosenin the case of ties. The total number of Ei,j ’s, the sum of allthe nj ’s, is at most

∑j≥0 rj/εk = 2/εk, as asserted by (i).

If Jk(t) = j then Ek(t) is one of the Ei,j sets, which has diameter atmost 4rj . Thus

diam(Ek(t)) ≤ 4× 2−Jk(t) = 4∑

j∈N0

Jk(t) < jrj .

The last expansion opens the way for another cute argument based on in-terchanging the order of summation in two geometric series.

λp

δpr

H(t,r)

rjrjrj+1

Hj(t)

(t)

Control of the sum in (ii)By the equivalences in <11>, the inequality Jk(t) < j is equal to the in-equality Hj(t) > λk. In addition, if k ≥ p then we also have Hj(t) > λp,that is, rj < δp(t), where H(t, δp(t)) = λp. To eliminate the dependenceof δp(t) on t, note that

λpδp(t) ≤∫ 1

0H(t, r) dr = γh(t, µ, 1) ≤ γh(T, µ, 1),

Draft: 1jan2016 c©David Pollard

Page 11: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.3 Measures versus partitions 10

which implies δp(t) ≤ δp := γh(T, µ, 1)/λp. Thus∑k≥p

λkdiam(Ek(t)) ≤ 4∑k≥p

λk∑

j∈N0

Jk(t) < jrj

≤ 4∑j∈N0

rjrj < δp∑k≥p

λkHj(t) > λk.

The sum over k is at most 1 + λ+ · · ·+ λK ≤ λK+1/(λ− 1) where K is thevalue for which λK < Hj(t) ≤ λK+1. The double sum is bounded by

4∑

j∈N0

rjrj < δpλHj(t)

λ− 1≤ 8λ

λ− 1γh(t, µ, δp),

as asserted.

Majorizing::MM.partition.eg <12> Example. Suppose µ is a majorizing measure on T , that is, suppose γ(T, µ) =supt∈T γ(t, µ, 1) is finite, where

γ(t, µ, δ) :=

∫ δ

0Ψ−1

(1

µB[t,Dr]

)dr.

Define h(x) := Ψ−1(κ/x), where κ := Ψ(1). Then, for some constant c1

(corresponding to u1 = 1 in inequality <4>),

γh(t, µ, δ) =

∫ δ

0h (rµB[t, rD]) dr ≤ c1

(γ(t, µ, δ) +

∫ δ

0Ψ−1(κ/r) dr

).

The growth condition<3> ensures (Problem [1]) that g(δ) :=∫ δ

0 Ψ−1(κ/r) dris finite for 0 < δ ≤ 1. In particular, there exists some constant CΨ depend-ing only on Ψ for which

γh(T, µ, 1) ≤ CΨγ(T, µ, 1).

Lemma <10> gives a sequence of (not necessarily nested) partitions πkwith

#πk ≤ 2/h−1(λk) = 2Ψ(λk)/κ

and, for some constant K = Kλ,Ψ,

\E@ gam.p\E@ gam.p <13> γλ(T, π, p) ≤ DK (γ(T, µ, δ) + g(δ)) .

Draft: 1jan2016 c©David Pollard

Page 12: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.3 Measures versus partitions 11

In particular,

γλ(T, π) ≤ DK (γ(T, µ) + g(1)) .

Because γ(T, µ) ≥ Ψ−1(2)/3 and g(1) < ∞, there exists some constant K1

(depending on Ψ) for which g(1) ≤ K1γ(T, µ). At the cost of an extrafactor 1 +K1 the g(1) can be discarded from the upper bound.

You might well object that the 2Ψ(λk)/κ bound for the size of πk doesnot quite match the Ψ(λk) upper bound for (Ψ, λ)-admissibility. That istrue, but not a great concern. You could safely skip the following nitpickingexplanation.

We could just absorb an extra factor of 2/κ into the upper bounds.Alternatively, we could find a positive integer ` for which 2/κ ≤ λ` so that

#πk ≤ λ`Ψ(λk) ≤ Ψ(λk+`).

We could then define a new sequence of partitions,

πk =

π0 for k ≤ `πk−` for k > `

,

for which #πk ≤ Ψ(λk) and γ(T, π) ≤∑

k≤` λkD + λ`γλ(T, π). The fact

that γλ(T, π) ≥ D ensures that γ(T, π) ≤ K2γλ(T, π), for some K2 depend-ing on λ and Ψ. And so on.

A similar modification bridges the gap between Talgrand’s admissibiltyand (Ψ, λ)-admissibility. Suppose Ψ(x) ≤ exp(xα) for some α > 0. If wechoose λ = 21/α then

#πk ≤ Ψ(λk) ≤ exp(2k) ≤ 22k+1.

While we are at it we can also take care of the nesting, replacing πk by thecommon refinement of π0, . . . , πk, a partition containing at most∏

i≤k22i+1

= 2∑i≤k 2i+1

≤ 22k+2

members. More relabelling, with two more copies of π0 at the start ofthe sequence then gives the desired nested sequence with the desired upperbound on the cardinality.

Draft: 1jan2016 c©David Pollard

Page 13: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.3 Measures versus partitions 12

7.3.2 From partitions to majorizing measureMajorizing::part.to.MM

The path from (Ψ, λ)-admissible partitions back to a majorizing measure ismuch more direct, particularly so if the partitions are nested.

For each k let Sk consist of a single point from each E in πk, so that

#Sk = Nk := #πk ≤ Ψ(λk).

Let µk denote the uniform distribution on Sk, with mass 1/Nk on each point.The exponential growth of Ψ ensures convergence of

∑k 1/Ψ(λk) and

existence of an ` ∈ N for which∑

k∈N0θk ≤ 1, where θk := 1/Ψ(λk+`).

Define µ := ν +∑

k∈N0θkµk with ν an arbitrary measure having total

mass 1−∑

k θk.Write ∆k(t) for diam(Ek(t))/D. The assumed finiteness of γλ(T, π) en-

sures that ∆k(t) → 0 as k → ∞, although the convergence might not bemonotone if the partitions are not nested. For each t ∈ T and each r ∈ [0, 1],the closed ball B[t, rD] contains the set Ek(t) if r ≥ ∆k(t). In that case,

µB[t, rD] ≥ µEk(t) ≥ θk/Nk,

which implies

G(t, r) := Ψ−1(1/µB[t, rD]) ≤ Ψ−1(

Ψ(λk+`)Ψ(λk))≤ c0

(λk+` + λk

).

For each t ∈ T and r ∈ (0, 1) there exists a positive integer k forwhich ∆k(t) ≤ r < ∆k−1(t): just choose the first integer k with ∆k(t) ≤ r.That is, for some subset Nt of N, the open interval (0, 1) is covered by aunion of intervals ∪k∈Nt [∆k(t),∆k−1(t)). It follows that

γ(t, µ) =

∫ 1

0G(t, r) dr

≤∑

k∈Nt

∫∆k(t) ≤ r < ∆k−1(t)G(t, r) dr

≤∑

k∈Nt2c0λ

k+`∆k−1(t)

≤ 2c0λ`+1γλ(t, π) ≤ Cλγλ(T, π).

Remark. You might have been expecting an analog of inequality <13>,something of the form

γ(T, µ, δp) ≤ Cλγλ(T, π, p) for all p ∈ N0.

Draft: 1jan2016 c©David Pollard

Page 14: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.4 From partitions to expected suprema 13

Such a universal bound could not be true in general. For example,consider the case where T consists of exactly n points. There must besome t for which µt ≤ 1/n, which implies

γ(t, µ, δ) ≥∫ δ

0

Ψ−1(n) dr = δΨ−1(n)

whereas γλ(T, π, p) is zero when p is large enough that each Ep(t) is asingleton.

7.4 From partitions to expected supremaMajorizing::S:part-fnal

The chaining argument, starting from a sequence π = πk : k ∈ N0of (Ψ, λ)-admissible partitions (with γλ(T, π) < ∞), is similar to one al-ready described in Section 4.7 if the partitions are nested. As Example <12>showed, at least for any of the Ψα Young functions, some refining and rela-belling would create a nested sequence. However, I have asserted that thenesting is more of a convenience than a necessity, so let me take this oppor-tunity to show you how chaining works when the approximating subsets Siare not nested.

As in subsection 7.3.2, for each k let Sk = tE : E ∈ πk consist of asingle point tE chosen from each E in πk, so that

#Sk = Nk := #πk ≤ Ψ(λk).

Define maps `k : T → Sk by `kt = tE if Ek(t) = E ∈ πk. For each t in Smand each p ≤ m we again have a chain of points

t 7→ `m−1t 7→ `m−2t . . . 7→ `pt ∈ Sp

linking t to a point of Sp.As in Section 4.7 we still have an upper bound via the triangle inequality

applied to the sum of increments down the chain,

\E@ Xt.incr\E@ Xt.incr <14> |X(t)−X(`pt)| ≤∑m−1

i=p|X(`i+1t)−X(`it)|.

The choice of the `i’s ensures that

d(`i+1t, `it) ≤ d(`i+1t, t) + d(t, `it) ≤ ∆i(t) + ∆i+1(t)

where ∆k(t) := diam(Ek(t). Instead of the uniform control over the diam-eters ∆i(t) provided by the packing numbers we now have indirect controlvia the bound

\E@ MMp\E@ MMp <15>∑

k≥pλk∆k(t) ≤ γλ(T, π, p).

Draft: 1jan2016 c©David Pollard

Page 15: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.4 From partitions to expected suprema 14

The lack of nesting has also created a subtle difference from the previousapproach, due the new choice of the linking functions `k.

Remark. Actually inequality <15> does give us a uniform bound,∆k(t) ≤ γλ(T, π, p)/λk, but that rate of decrease is not enough to offsetthe growth of #Sk.

In Section 4.7 the `k was defined as a map from Sk+1 to Sk, with Lkdefined by following a chain of `i-links between succesive Si+1 and Si; thedomain of `k is now the whole of T and there are no Lk maps. Previously,if Li+1t = Li+1t

′ then Lit = Lit′; now two distinct points t and t′ in Sm

might map to the same point of Si+1 but different points of Si. That is, tand t′ might belong to the same E ∈ πi+1, which ensures that `i+1t =`i+1t

′, but to different sets, F and F ′, of πi. We still have an analog ofthe bound <14> for |X(t′)−X(`pt

′)|, but the increments X(`i+1t)−X(`it)and X(`i+1t

′)−X(`it′) need not be same even though Ei+1(t) = Ei+1(t′).

E ∈ πi+1

F' ∈ πiF ∈ πi tFtE

tF'

t t'Not to despair. The difference X(`i+1t) −X(`it) is still taken between

one point in Si+1 and one point in Si. There are at most NiNi+1 suchpairs; with nested partitions we had at most Ni+1 links to worry about.Fortunately the growth assumption on Ψ makes it just as easy to handle aset of NiNi+1 increments as to handle a set of Ni+1 increments. Define

\E@ max.pairs\E@ max.pairs <16> Mi := max

|X(si+1)−X(si)|

d(si+1, si): si ∈ Si, si+1 ∈ Si+1, d(si+1, si) > 0

.

Then, for each t in Sm,

|X(`i+1t)−X(`it)| ≤Mid(`i+1t, `it) ≤Mi

[∆i(t) + ∆i+1(t)

]and on the set Ωy := Mi ≤ 2yλi for all i,

|X(t)−X(`pt)| ≤ 2y∑m−1

i=pλi[∆i(t) + ∆i+1(t)

]≤ 4yγλ(T, π, p).

Draft: 1jan2016 c©David Pollard

Page 16: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.4 From partitions to expected suprema 15

The inequality holds (on Ωy) uniformly for t in Sm, even though we do nothave strong enough uniform control over the individual ∆i(t) diameters. Wehave only to check that PΩy is close enough to 1 if y is large enough:

PΩcy = P∃i ≥ p : Mi > 2yλi

≤∑

i≥pP⋃

si,si+1

|X(si+1)−X(si)| > 2yλid(si, si+1)

≤∑

i≥pNiNi+1β(2yλi) by <2>

≤∑

i≥p

Ψ(λi)Ψ(λi+1)

Ψ(2yλi).

As usual, the β contribution has to be small enough to kill off the NiNi+1

contribution with enough in reserve to make the sum nicely convergent andsmall. Two appeals to the growth assumption <3> ensure that

Ψ(λi)Ψ(λi+1)Ψ(yλi/c0) ≤ Ψ(2yλi) if y ≥ Kλ := 2c20(1 + λ).

For such y,

PΩcy ≤

∑i≥p

1

Ψ(yλi/c0)≤∑

i≥p

1

λiΨ(y/c0)≤

K ′λΨ(y/c0)

,

where K ′λ :=∑

i≥0 λ−i <∞. That is,

\E@ MM.chain\E@ MM.chain <17> Pmaxt∈Sm

|X(t)−X(`pt)| > 4yγλ(T, π, p) ≤K ′λ

Ψ(y/c0)for y ≥ Kλ,

where Kλ and K ′λ are constants that depend on λ and Ψ.If we integrate the last inequality with respect to y we reecover a bound

on the expected value:

Pmaxt∈Sm |X(t)−X(`pt)| ≤ 4γλ(T, π, p)

(∫ Kλ

0dy +

∫ ∞Kλ

K ′λΨ(y/c0)

dy

).

That is,

\E@ MM.mean.approx\E@ MM.mean.approx <18> Pmaxt∈Sm |X(t)−X(`pt)| ≤ Cλγλ(T, π, p) for each m ≥ p ∈ N0,

where the constant Cλ depends only on λ. In particular, if the X process iscenetered (zero means) and we take p = 0 then let m tend to infinity we get

F (T ) ≤ P supt∈T |Xt| ≤ Cλγλ(T, π).

Draft: 1jan2016 c©David Pollard

Page 17: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.5 The role of Gaussianity 16

The stronger inequality <18> also delivers a result stronger than bound-edness of sample paths if γλ(T, π, p) goes to zero as p tends to infinity. Re-call from Section 5.3 that X has uniformly continuous sample paths on thecountable index set T if: for each η > 0 and ε > 0 there exists a δ > 0 suchthat Posc(δ,X, T ) > η < ε. It suffices to prove existence of such a δ (notdepending on m) for which

\E@ osc.Sm\E@ osc.Sm <19> Posc(δ,X, Sm) > η < ε for every m.

Majorizing::MM.unifcty <20> Example. An analog of the argument used in Section 4.5 to bound a normof the oscillation also establishes <19> if γλ(T, π, p)→ 0 as p→∞. In factthe argument is a little easier because π has already created the equivalenceclasses used in the proof.

Fix m. For each p define Λp := maxt∈Sm |X(t)−X(`pt)|. Invoking <18>,choose p so that PΛp ≤ Cλγλ(T, π, p) < ηε.

For distinct E,F ∈ πp choose points τE,F ∈ E and τF,E ∈ F such that

d(τE,F , τF,E) := d(E,F ) := mind(s, t) : s ∈ E ∩ Sm, t ∈ F ∩ Sm.

Define

Mp := max

|X(τE,F )−X(τF,E)|

d(E,F ): E,F ∈ πp and d(E,F ) > 0

,

a maximum of fewer than(Np2

)< N2

p random variables each with Ψ-normat most 1. From one of the maximal inequalities in Section 4.4, we have theupper bound PMp ≤ Ψ−1

(N2p

).

If s, t ∈ Sm ∩ E for some E ∈ πp then |Xs −Xt| ≤ 2Λp. If s ∈ Sm ∩ Eand t ∈ Sm ∩ F for distinct E,F in πp then `ps = `pτE,F and `pτF,E = `pt.If, in addition, d(s, t) < δ then d(E,F ) < δ and

|Xs −Xt| ≤ |Xs −X(`ps)|+ |X(`P τE,F )−X(τE,F )|+ |X(τE,F −X(τF,E)|+ |X(τF,E −X(`pτF,E |+ |X(`pt)−Xt|

≤ 4Λp + δMp.

It follows that

Posc(δ,X, Sm) ≤ 4PΛp + δPMp ≤ 4ηε+ δΨ−1(N2p ).

If δ is small enough then Posc(δ,X, Sm) > η ≤ 5ηε/η.

Draft: 1jan2016 c©David Pollard

Page 18: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.5 The role of Gaussianity 17

7.5 The role of GaussianityMajorizing::S:Gaussfnal

Suppose X = Xt : t ∈ T is a centered Gaussian process, with T atworst countably infinite. Equip T with the metric dX for which d2

X(s, t) =var(Xs −Xt).

For each λ > 1 and Ψ2(x) = ex2 − 1, Theorem <8> part (iv) asserts:

if F (T ) := P supt∈T Xt is finite then there exists a (Ψ2, λ)-admissible sequence of partitions π for which γλ(T, π) ≤ CλF (T ).

Talagrand’s argument—adapted to my setting—depends on just twoproperties of Gaussians, both of which are discussed in Chapter 6.

Majorizing::Borell.subg <21> Borell’s inequality. There exists a universal constant CBor for which: ifY = Yt : t ∈ T is Gaussian process with T finite or countably infinite andboth m := P supt∈T Yt < and σ2 := supt∈T var(Yt) are finite then

‖supt∈T Yt −m‖Ψ2≤ CBorσ.

Majorizing::Sudakov <22> Sudakov’s inequality. There exists a universal constant CSud > 0 forwhich: if Y := (Y1, Y2, . . . , Yn) has a centered multivariate normal distri-bution, with P|Yj − Yk|2 ≥ δ2 for all j 6= k then

Pmaxi≤n Yi ≥ CSudδΨ−12 (n) for n ≥ 2.

Remark. I have replaced the lower bound, which in Chapter 6 was aconstant multiple of δ

√log2 n , by a multiple of δΨ−1

2 (n), in order tostress the role played by Ψ2. The substitution is justified by the factthat log2 n/ log(1 + n) > (0.95)2 for n ≥ 2.

The next Lemma, which corresponds to Talagrand (2005, Proposition 2.1.4)and Talagrand (2014, Proposition 2.4.9), captures everything we need toknow about Gausianity before constructing the admissible sequence of par-titions.

Remark. It is truly amazing that Talagrand’s method also works forother sorts of functional, making it applicable to many nongaussianproblems. The relevant theory for general functionals, and a proof ofhow they generate nice nested partitions of the index set, was presentedin a brutal way by Talagrand (2005, Section 1.3) and more gently byTalagrand (2014, Section 2.3). I found it enlightening to see how theGaussian case works before plunging into the general setting.

Draft: 1jan2016 c©David Pollard

Page 19: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.6 From functional to nested partitions 18

Majorizing::maxfnal <23> Lemma. There exists a universal constant r0 > 2 for which the followingproperty holds for every δ > 0. If t1, . . . , tn, with n ≥ 2, is a r0δ-separatedsubset of T , and if Hi ⊆ B[ti, δ] for each i, then

F(⋃

i≤nHi

)≥ δΨ−1

2 (n) + mini≤n F (Hi).

Remark. The balls B[ti, δ] and B[tj , δ] are disjoint for i 6= j.

Proof Write H for⋃i≤nHi and define Yi = supt∈Hi(Xt −Xti). Note that

PYi = F (Hi) and supt∈Hi var(Xt − Xti) ≤ δ2 because dX(t, ti) ≤ δ for tin Hi. By Borell’s inequality, the variable Wi = Yi − F (Hi) is centeredsubgaussian with ‖Wi‖Ψ2

≤ CBorδ. As in Section 4.4,

Pmaxi≤n |Wi| ≤ CBorδΨ−1(n)

because Ψ (Pmaxi≤n |Wi|/(CBorδ)) ≤∑

i≤n PΨ2(|Wi|/(CBorδ)). By Sudakov’sinequality,

Pmaxi≤nXti ≥ CSudr0δΨ−1(n).

Calculate.

supt∈H Xt = maxi≤n (Xti + Yi)

= maxi≤n (Xti + F (Hi) +Wi)

≥ maxi≤nXti + mini≤n F (Hi)−maxi≤n |Wi|.

Take expectations to deduce

F (H) ≥ CSudr0δΨ−1(n) + mini≤n F (Hi)− CBorδΨ

−1(n)

Choose r0 := max(2.1, (1 + CBor)/CSud).

7.6 From functional to nested partitionsMajorizing::S:bdd

Once again let X = Xt : t ∈ T be a centered Gaussian process, withindex set T at worst countably infinite (to ward off measurability evils) andequipped with the metric dX . For each λ > 1 this Section shows how toconstruct a (Ψ2, λ)-admissible sequence of partitions π for which γλ(T, π) ≤CλF (T ), where Cλ is a constant that depends only on λ.

The partitions will actually be nested (despite all that talk in Section 7.4explaining why nesting is not essential). The argument is recursive. To give

Draft: 1jan2016 c©David Pollard

Page 20: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.6 From functional to nested partitions 19

myself a little more wiggle room, I’ll relax the constraint that #πk ≤ Ψ2(λk),by requiring only that #πk ≤ mk := bΨ2(λk+`)c, where the integer ` ischosen so that m1 − 1 > Ψ2(λ1+`/2) ≥ 2. You would need to engage insome relabelling in the nitpicking style of Example <12> to expunge thesuperfluous `.

The recursive strategy makes use of a localized form of F ,

F (A, δ) := supt∈T

F (A ∩B[t, δ]) = supt∈T

P supXs : s ∈ A and d(s, t) ≤ δ.

The strategy is slightly greedy. We start with π0 = T, constructing π1

as a partition of T into at most m1 subsets. Most of those subsets will havediameter smaller than 2D/r2, where r is chosen (strategically) larger thanthe r0 of Lemma <23>. Those will be called the ‘small’ sets. At most one ofthe sets (call it Ep) in π1 will be ‘big’, in the sense that its diameter might notbe smaller than D but, via Lemma <23>, we gain control over F (Ep, D/r

2).We then continue recursively, partitioning each member of π1 into at

most m2 subsets—mostly small, with no more than one of them ‘big’. Andso on. The trick is to anticipate at each step what will happen after twomore recursive steps. If the sets of πk+1 are thought of as the children of πkthen the strategy is: be greedy for the great-grandchildren. As Talagrand(2005, page 22, last line) noted, this is the main clever idea in the proof.

For each partitioning subset E the proof keeps track of not just its diam-eter but also of a closed ball, which contains E, of radius ρ(E) and centeredat a point τ(E) in T . The center τ(E) need not belong to E. The diameterof E is at most 2ρ(E).

The next Lemma captures the details of the recursive partitioning step.

Majorizing::gauss.partition <24> Lemma. Suppose A is a subset of T contained within some closed ball B[τ, ρ]and r−1 is greater than the r0 from Theorem <23>. Then for each positiveinteger m ≥ 2 and each ε > 0 there exists a finite partition π = E1, . . . , Epof A with p ≤ m and points ti ∈ T for which:

(i) Ei ⊆ B[ti, ρ/r], that is, τ(Ei) = ti and ρ(Ei) = ρ/r, for each i < m(the ‘small sets’)

(ii) if p = m (a ‘big set’) then Em ⊆ B[τ, ρ] (of course, so τ(Em) = τ andρ(Em) = ρ) and

F (Em, ρ/r2) + (ρ/r2)Ψ−1

2 (m− 1) ≤ F (A) + ε

Draft: 1jan2016 c©David Pollard

Page 21: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.6 From functional to nested partitions 20

Proof Define δ = ρ/r2. Construct the Ei sets recursively in a semi-greedyfashion: Instead of trying to make each F (Ei) as large as possible, lookahead to a recursive application of the Lemma within each Ei by trying tomake Fδ(Ei) as large as possible. Algorithmically:

procedure make small setsi← 1 and Ai ← Awhile i < m do

choose ti ∈ T with F (Ai ∩B[ti, δ]) > F (Ai, δ)− εHi ← Ai ∩B[ti, δ]Ei ← Ai ∩B[ti, rδ] % so τ(Ei) = ti and ρ(Ei) = rδAi+1 ← Ai\Eiif Ai+1 = ∅ then

partition complete; exit loopend ifi← i+ 1

end whileend procedure

The procedure ends either when Ai+1 = ∅, which means that A is alreadycompletely partitioned into fewer than m small sets, or when i = m. Inthe latter case, we are left with a nonempty subset Am and disjoint sub-sets E1, . . . , Em−1 with each Ei containing a subset Hi.

B[τ,ρ]B[t1, δ]

A2

TB[t1, rδ]

B[t1, r0δ]

The picture shows the situation after the first passage through the loop.The set A (which is represented by a rectangle, rather than a region definedby intersections of closed balls and their complements) sits inside the bigball B[τ, ρ]. For no good reason, the point t1 is shown sitting inside the sameball. The shaded region shows E1, the part of A carved out by B[t1, rδ].The dotted circle is the ball B[t1, r0δ], which has not yet been mentioned inthe construction. If A2 6= ∅, I claim that the point t2 cannot lie within the

Draft: 1jan2016 c©David Pollard

Page 22: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.6 From functional to nested partitions 21

ball B[t1, r0δ], for otherwise (because r > 1 + r0)

A2 ∩B[t2, δ] ⊆ A2 ∩B[t1, rδ] = ∅.

Maybe I should have defined F (∅) = −∞ to make sure that an empty setcannot get close to the supremum that defines F (A2, δ). A similar argumentworks for each of the succeeding runs through the loop, which ensures thatthe set t1, . . . , tm−1 is r0δ-separated. Lemma <23> then tells us that

F (A) ≥ F (∪i<mHi) ≥ δΨ−12 (m− 1) + mini<m (F (Ai, δ)− ε) .

If we choose Em = Am, a subset of every Ai, then the minimum is greaterthan F (Em, δ), as asserted by (ii).

Remark. In an earlier version of the construction from Lemma <24>,Talagrand (1992) allowed the loop to run until Ai+1 = ∅. It had tostop after a finite number of steps because finiteness of F (T ) impliesthat T is totally bounded, and all the ti’s are r0δ-separated: at somepoint Ai+1 must be empty. (Actually, Talgrand assumed finiteness of Tin that paper; total boundedness was not an issue.)

Finally I can explain in detail how to construct the nested sequence ofpartitions for which

\E@ the.task\E@ the.task <25> supt∈T∑

k≥0λkdiam(Ek(t)) ≤ CλF (T ).

We start with π0 = T and ρ(T ) = D. With mk = bΨ2(λk+`)c for a largeenough ` we ensure that mk ≥ 2 and Ψ−1

2 (mk−1) ≥ λk+`/2 for k ∈ N. Defineεk = F (T )/2k. Define r := 2λ + r0. Apply Lemma <24> first with A = Tand m = m1 and ε = ε1 to construct π1. Then apply the Lemma to each Ain π1, with m = m2 and ε = ε2 to construct π2. And so on.

Consider any t in T . As that t stays fixed for the rest of the proof I’ll dropit from the notation, writing Ek instead of Ek(t) and ρk instead of ρ(Ek(t)).The k = 0 term in <25> causes no problem because (Problem [3])

diam(E0) = D ≤√

2πF (T ).

For ease of notation let me call E0 a ‘big’ term. The sequence of setsE0 ⊇ E1 ⊇ E2 . . . consists of ‘big’ and ‘small’ sets. If Ek is small thenρk = ρk−1/r. If k ≥ 1 and Ek is big then ρk = ρk−1 and

\E@ bigEk\E@ bigEk <26> εk + F (Ek−1) ≥ F (Ek, ρk/r2) + c0ρkλ

k

Draft: 1jan2016 c©David Pollard

Page 23: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.6 From functional to nested partitions 22

where c0 := 12λ

`/(2λ+ r0)2 is a positive constant that depends only on λ.The inequality 2F (T ) ≥ εk+F (Ek−1, ρk−1) ensures ρkλ

k ≤ 2F (T )/c0 forbig Ek. There can be no infinite string of consecutive big sets in the Eksequence: If Ek, Ek+1, . . . , Ek+n are all big then ρk = ρk+1 = · · · = ρk+n

and λn ≤ 2F (T )/(c0ρkλk), constraining the size of n.

Each finite stretch B1, B2, . . . of consecutive big Ek’s is interspersed witha stretch of small sets. In fact one stretch of small Ek’s might be infinite, inwhich case there are no more big sets. The pattern looks like:

B1︷︸︸︷bbb

S1︷ ︸︸ ︷sssss

B2︷︸︸︷bb

S2︷︸︸︷ss

B3︷ ︸︸ ︷bbbbb

S3︷ ︸︸ ︷sssss

B4︷ ︸︸ ︷bbbbb

S4︷︸︸︷sss

B5︷︸︸︷b

S5︷ ︸︸ ︷sssss . . .

↓ ↓ ↓ ↓ ↓k1 k2 k3 k4 k5

The index kj = k(j) gives the position of the last big Ek in block Bj . Theset K = k1, k2, . . . might be finite or infinite.

I assert that∑k≥1

ρkλk ≤ c1

∑k∈K

ρkλk

for a constant c1 that depends only on λ. Indeed, if kj = α then ραλα

dominates the sum of all the terms in blocks Bj and Sj , in the sense that∑k∈Bj

ρkλk ≤ ρα(λα + λα−1 + . . . ) ≤ ραλα/(1− 1/λ)∑

k∈Sjρkλ

k ≤(λα+1ρα/r + λα+2ρα/r

2 + . . .)≤ ραλα/(1− λ/r).

Now you know why I made sure that r > 2λ.If #K ≤ 2 then

∑k∈K ρkλ

k ≤ 4F (T )/c0. The case where #K > 2 ismore interesting; it involves great-grandchildren.

Write fk for F (Ek, ρk/r2). For big Ek, inequality <26> becomes

εk + F (Ek−1) ≥ fk + c0ρkλk

Suppose α = kj and β = kj+2. Both Eα and Eβ are big and between themare at least two small Ek sets: at worst bsbsb. We have ρα/r

2 ≥ ρβ = ρβ−1

and

εβ + F (Eβ−1) ≥ fβ + c0ρβλβ.

Draft: 1jan2016 c©David Pollard

Page 24: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.7 Uniformly continuous Gaussian sample paths 23

The set Eβ−1 is either a great-grandchild of Eα or a descendant of such agreat-grandchild. In any case,

fα = F (Eα, ρα/r2)

≥ F (Eα ∩B[τ(Eβ−1), ρβ−1])

≥ F (Eβ−1) because Eβ−1 ⊆ Eα ∩B[τ(Eβ−1), ρβ−1]

≥ fβ + c0ρβλβ − εβ.

That is,

ρk(j+2)λk(j+2) ≤ εk(j+2)−1 + fk(j) − fk(j+2).

If we sum over all odd values of j the right-hand side telescopes, leavingsomething smaller than a constant multiple of F (T ). The argument foreven values is similar. The sum

∑k∈K ρkλ

k is bounded above by a constantmultiple of F (T ).

7.7 Uniformly continuous Gaussian sample pathsMajorizing::S:cts

Suppose X = Xt : t ∈ T is a centered Gaussian process, with T countable.Even without Gaussianity, Example <20> shows that existence of a (Ψ, λ)-admissible sequence π for which γλ(T, π, p) → 0 as p → ∞ implies uniformcontinuity of the sample paths. The converse is also true.

By virtue of Lemma <10> and Example <12> it suffices to prove ananalogous result for majorizimg probability measures.

Majorizing::MM.cts <27> Theorem. If X is a centered Gaussian process with uniformly continuoussample paths and F (T ) <∞ then there exists a Borel probability measure µon T for which

supt∈T

∫ δ

0Ψ−1

2 (1/µB[t, r]) dr → 0 as δ → 0.

Remark. The following argument is based on a proof by Talagrand(1987, page 122-124), who attributed the method of proof to Fernique.See Talagrand (2014, Appendix B) for the corresponding proof viaadmissible sequences of partitions.

The proof combines the virtues of packing numbers and majorizingmeasures.

The assumption that F (T ) is finite can be weakened to anassumption that T is totally bounded under its dX metric, becauseuniformly continuous functions on totally bounded sets are necessarilybounded.

Draft: 1jan2016 c©David Pollard

Page 25: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.8 Problems 24

Proof Uniform continuity of sample paths implies that

osc(δ,X, T ) := sup|Xs−Xt| : d(s, t) < δ → 0 almost surely as δ → 0.

And the function supt∈T |Xt| is integrable, because

P supt∈T|Xt| ≤ P sup

t∈TXt + P sup

t∈T(−Xt) = 2F (T ) <∞.

It follows by Dominated Convergence that βk := Posc(δk, X, T ) → 0 forany sequence δk → 0.

Finiteness of F (T ) and Sudakov’s inequality together imply that T istotally bounded. As a consequence, for each δk there is a finite partition πkof T into finitely many, say Nk, sets of diameter at most δk.

For each member E of the partition πk choose a point τ(E). Then

F (E) := P supt∈E Xt = P supt∈E(Xt −Xτ(E)) ≤ βk,

the final inequality coming from the fact that dX(t, τ(E)) ≤ diam(E) ≤δk. Through the equivalences noted in Theorem <8>, there exists a Borelprobability measure µE on E for which

supt∈E

∫ diam(E)

0Ψ−1

2 (1/µEB[t, r]) dr ≤ Cβk,

where C is a universal constant.Define a probability measure on T by

µ :=∑

k≥1

1

2kNk

∑E∈πk

µE .

To show that µ has property asserted by the Theorem, consider a fixed tin T . To simplify notation, abbreviate Ek(t) to Ek and µEk(t) to µk. Also

remember that Ψ−12 (uv) ≤ Ψ−1

2 (u)+Ψ−12 (v) for all u, v ≥ 0. Then for each k

and each δ ≤ δk,∫ δ

0Ψ−1

2 (1/µB[t, r]) dr ≤∫ δ

0Ψ−1

2

(2kNk

µkB[t, r]

)dr

≤∫ δ

0Ψ−1

2 (2kNk) dr +

∫ δk

0Ψ−1

2 (1/µkB[t, r]) dr

≤ δΨ−12 (2kNk) + Cβk + δkΨ

−12 (1).

The last inequality holds because µkB[t, r] = 1 for r ≥ diam(Ek). The upperbound is uniform in t. The rest is easy. Given ε > 0, just choose k then δto make the upper bound less than ε.

Draft: 1jan2016 c©David Pollard

Page 26: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

§7.9 Notes 25

7.8 ProblemsMajorizing::S:Problems

[1] Suppose the Young function Ψ satisfies the growth conditionMajorizing::P:Psi.growth

Ψ(x)2 ≤ Ψ(cx) whenever x ≥ x0

for some constant c ≥ 2. For some α > 0, show that Ψ(y)/ exp(yα) → ∞as y → ∞. Hint: For some b > x0 we have Ψ(b) ≥ e. Show that Ψ(ckb) ≥exp(2k). Consider y with ck+1b > y ≥ ckb.

[2] Suppose µ is a probability measure that concentrates on a countable sub-Majorizing::P:MM.extend

set T1 of T and γ(T1, µ) < ∞. It can also be regarded as a probabilitymeasure on T . Show that γ(t, µ, δ) ≤ sups∈T1 γ(s, µ, δ) for each t ∈ Tand 0 < δ ≤ 1. Hint: If t ∈ T and s ∈ T1 with d(s, t) < εD thenB[t, (r + ε)D] ⊃ B[s, rD].

[3] Show that F (T ) ≥ D/√

2π. Hint: Pmax(Xs, Xt) = PXs + P(Xt −Xs)+.Majorizing::P:F.diam

[4] Under the conditions of Theorem<27>, prove that δΨ−12 (pack(δ, T, d))→ 0Majorizing::P:Sud.small

as δ → 0.

[5] (Repeated from Chapter 4.) The classical bounds (Feller, 1968, Section VII.1Majorizing::P:normal.max

and Problem 7.1) show the normal tail probability Φ(x) = PN(0, 1) > xbehaves asymptotically like φ(x)/x. More precisely, 1−x−2 ≤ xΦ(x)/φ(x) ≤1 for all x > 0. Less precisely, − log

(c0Φ(x)

)= 1

2x2 + log x + O(x−2) as

x→∞, where c0 =√

2π .

(i) (Compare with Leadbetter et al. 1983, Theorem 1.5.3.) Define an =√

2 log nand Ln = log an. For each constant τ define mτ,n = an−(1+τ)Ln/an. Showthat

c0Φ (mτ,n) = n−1 exp (τLn + o(1))

(ii) Define Mn = maxi≤n Zi for iid N(0, 1) random variables Z1, Z2, . . . . Showthat

PMn ≤ mτ,n =(1− Φ(mτ,n)

)n= exp

(−aτn(c−1

0 + o(1)))

(iii) If n increases rapidly enough, show that there exists an interval In oflength o(Ln/an) containing the point an−Ln/an for which PMn /∈ In → 0very fast.

Draft: 1jan2016 c©David Pollard

Page 27: Printed: 1 January 2016 - Yale Universitypollard/Books/Mini/Majorizing.pdfk n k:= 22 k for k 1. He worked with functionals t(T;ˇ) := sup 2T X k2N 0 2k= diam(E k(t)); where E k(t)

Notes 26

7.9 NotesMajorizing::S:Notes

What more can I say? Everything in this Chapter resulted from my attemptsto understand some of the work by Talagrand on majorizing measures. WhatI have covered amounts to parts of two (out of sixteen) chapters in thedefinitive book by Talagrand (2014).

References

Feller1 Feller, W. (1968). An Introduction to Probability Theory and Its Applications(third ed.), Volume 1. New York: Wiley.

Fernique75StFlour Fernique, X. (1975). Regularite des trajectoires des fonctions aleatoiresgaussiennes. Springer Lecture Notes in Mathematics 480, 1–97. Ecoled’Ete de Probabilites de Saint-Flour IV—1974.

LeadbetterLindgrenRootzen83 Leadbetter, M. R., G. Lindgren, and H. Rootzen (1983). Extremes andRelated Properties of Random Sequences and Processes. Springer-Verlag.

LedouxTalagrand91book Ledoux, M. and M. Talagrand (1991). Probability in Banach Spaces:Isoperimetry and Processes. New York: Springer.

Talagrand1987ActaMath Talagrand, M. (1987). Regularity of Gaussian processes. Acta Mathemat-ica 159, 99–149.

Talagrand1992GAFA Talagrand, M. (1992). A simple proof of the majorizing measure theorem.Geometric and Functional Analysis 2 (1), 118–125.

Talagrand96MM Talagrand, M. (1996). Majorizing measures: The generic chaining. Annalsof Probability 24, 1049–1103.

Talagrand2001AnnProb-MMwM Talagrand, M. (2001). Majorizing measures without measures. The Annalsof Probability 29 (1), 411–417.

Talagrand2005MMbook Talagrand, M. (2005). The Generic Chaining: Upper and lower bounds ofstochastic processes. Springer-Verlag.

Talagrand2014MMbook Talagrand, M. (2014). Upper and Lower Bounds for Stochastic Processes:Modern methods and classical problems. Springer-Verlag.

Draft: 1jan2016 c©David Pollard