
A Bernstein-Chernoff deviation inequality, and geometric properties of random families of operators

Shiri Artstein-Avidan∗,
Mathematics Department, Princeton University

Abstract: In this paper we first describe a new deviation inequality for sums of independent random variables which uses the precise constants appearing in the tails of their distributions, and can reflect in full their concentration properties. In the proof we make use of Chernoff's bounds. We then apply this inequality to prove a global diameter reduction theorem for abstract families of linear operators endowed with a probability measure satisfying some condition. Next we give a local diameter reduction theorem for abstract families of linear operators. We discuss some examples and give one more global result in the reverse direction, and extensions.

Acknowledgement: I would like to thank Prof. Vitali Milman for his support and encouragement, and mainly for his mathematical help and advice.

∗This research was partially supported by BSF grant 2002-006.


The first theorem in this note is a new Bernstein-type deviation inequality which we prove using Chernoff's bounds. This theorem is different from the classical Bernstein inequality in the following way: whereas the condition in the standard Bernstein inequality is on the global behavior of the random variables in question, for example a condition on the expectation of e^{cX^2}, in Theorem 1 below the condition uses only the constants appearing in the tail of the distribution, and so can reflect concentration. Sometimes one can prove very strong estimates on the tails. In the theorem below these estimates can then be used, and they are amplified when one averages many i.i.d. copies of the variable. A special case of this theorem was brought forward and used in the paper [AFM] for a specific example. Its proof is straightforward, using only Chernoff's bounds, and we find this approach insightful and new.

We first apply the deviation inequality to a geometric question. We present several results regarding the behavior of the diameter of a convex body under some random operations. The first is a global result, namely regarding Minkowski sums of copies of a convex body acted upon by abstract families of linear operators endowed with a probability measure. The classical global diameter reduction is the well known special case where the family of operators is O(n), the family of orthogonal rotations. This was first observed in [BLM], see also [MiS] for more details. In Section 5 we revisit this case as an example.

The second result we discuss is of a local nature, and is an extension of the now well known diameter reduction phenomenon for random orthogonal projections. This phenomenon was first observed by Milman in his proof for the quotient of a subspace theorem, [Mi2] (and analyzed as a separate proposition in [Mi3], where more references can be found). It can be considered today as a consequence of the classical Dvoretzky-type theorem as proved in [Mi1]. The classical theorem concerns the case where the random operation is intersection with a random subspace or projection onto a random subspace. However, in this paper we consider a more general setting. Instead of working with projections, we deal with an abstract family of linear operators endowed with a probability measure and find a condition on this measure (which is in fact a condition on the probabilistic behavior of the operators on individual elements x ∈ R^n) which promises that a diameter reduction theorem holds. The proof of the theorem uses Talagrand's Majorizing Measures Theorem, see [Tal].

In Section 4 we describe a global result in the reverse direction, describing, in a particular case, when the resulting body contains a euclidean ball. In the classical setting this kind of containment is the only known reason for stabilization of the diameter.

We then discuss some examples. We show how the abstract propositions indeed imply Milman's diameter reduction theorem for usual orthogonal projections and the global Dvoretzky Theorem for unitary transformations (and the diameter reduction which occurs until stabilization). We describe other families of operators for which there is a similar diameter reduction. One of our main goals is to crystallize which properties of the operators are important for diameter reduction results to hold. Finally we give two more variants of the local result.

We remark that the results described in this paper have many similar variants that can be proven in exactly the same way. The choice of conditions in each one depends very much on the applications in mind. Thus, as much as we tried to give general and abstract constructions, stating each proposition in full generality would be notationally very inconvenient. We tried to indicate in remarks which main variants are possible for each statement.

Recently I learned that results in the spirit of Proposition 3 below are being studied by the team of A. Litvak, A. Pajor and N. Tomczak-Jaegermann, see [LPT].

Notation: We use | · | to denote the euclidean norm in R^n, and denote by D_n the euclidean unit ball, D_n = {x : |x| ≤ 1}. For a centrally symmetric convex body K ⊂ R^n we denote by d = d(K) its diameter, so K ⊂ d(K)D_n. We let M^* = M^*(K) denote half its mean width, that is
\[
M^*(K) = \int_{S^{n-1}} \sup_{y \in K} \langle x, y\rangle \, d\sigma(x),
\]
where S^{n-1} is the euclidean unit sphere and σ denotes the normalized Lebesgue measure on this sphere. Thus M^* is the average of the dual norm of K, which we denote by ‖x‖^* = sup_{y∈K} ⟨x, y⟩.
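For concreteness, here is a minimal Monte Carlo sketch (my own illustration, not part of the paper) of the quantity M^*(K): it estimates the spherical average of the dual norm for K = B_1^n, the cross-polytope, whose dual norm is the sup-norm. The function names and the reference value √(2 ln n / n), the standard asymptotic for this example, are my own choices.

```python
import numpy as np

def half_mean_width(dual_norm, n, samples=20000, seed=0):
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((samples, n))
    x = g / np.linalg.norm(g, axis=1, keepdims=True)   # uniform points on S^{n-1}
    return dual_norm(x).mean()                         # Monte Carlo average of ||x||^*

n = 200
est = half_mean_width(lambda x: np.abs(x).max(axis=1), n)   # dual norm of B_1^n is the sup-norm
print(est, np.sqrt(2 * np.log(n) / n))                      # estimate vs. the known asymptotic
```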

1 A Deviation Inequality

We first describe our main tool, which is a Bernstein-type deviation theorem. Its proof follows from Chernoff's bounds, and we provide it below. We wish to point out the main difference between this theorem and the classical Bernstein deviation inequality for, say, ψ_2 random variables. The classical theorem, for which we refer the reader to, say, [BLM], gives an upper bound for the probability in (1) below, in the following form: if A is the ψ_2-norm of the random variable X, and X_i are i.i.d. copies of X, then

\[
\mathbb{P}\Big[\Big|\frac{1}{N}\sum_{i=1}^{N} X_i - \mathbb{E}X\Big| > t\Big] \le 2e^{-Nt^2/(8A^2)}.
\]


The ψ_2-norm of the variable is affected by the constant in the tail estimate, but not only by it; for example, the expectation or the variance may also play a part and influence this constant A. The purpose of the deviation inequality in our Theorem 1 is to use the tail estimate itself (and not just the good ψ_p behavior following from it). This type of proposition was first used, for a special example, in [AFM].

Theorem 1 Assume X is a random variable satisfying
\[
\mathbb{P}[X > t] \le e^{-Kt^p}
\]
for some constant K > 0, some p > 1, and any t > K_0. Let X_1, . . . , X_N be i.i.d. copies of X. Then for any s > max{C(K, p), K_0},
\[
\mathbb{P}\Big[\frac{1}{N}\sum_{i=1}^{N} X_i > 3s\Big] \le C_0 e^{-N(Ks^p - \ln 2)}, \tag{1}
\]
where C_0 is a universal constant for p bounded away from 1, and where C(K, p) = (1 + \ln 2)/K^{1/p}.
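For intuition, the following small simulation (my own; the parameters K, p, N, s are arbitrary choices satisfying the hypotheses) samples a variable with the exact tail P[X > t] = e^{-Kt^p} and checks that the empirical frequency of the event in (1) is dominated by the right-hand side. In this range the bound is far from tight, so this is only a sanity test of domination, not of sharpness.

```python
import numpy as np

rng = np.random.default_rng(1)
K, p, N, s = 1.0, 2.0, 20, 1.8          # s exceeds C(K,p) = (1+ln 2)/K^{1/p} ~ 1.69
trials = 200_000
X = (-np.log(rng.random((trials, N))) / K) ** (1 / p)   # exact tail P[X > t] = exp(-K t^p)
emp = np.mean(X.mean(axis=1) > 3 * s)                   # empirical P[(1/N) sum X_i > 3s]
bound = np.exp(-N * (K * s ** p - np.log(2)))           # right-hand side of (1), without C_0
print(emp, bound)                                       # the frequency should sit below the bound
```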

Remark 1. As will be evident from the proof, it is not necessary that the variables be identically distributed; it is sufficient that they are independent and that each satisfies the tail estimate.

Remark 2. The term ln 2 appearing in the estimate is avoidable, by using the exact form of Chernoff's inequality in the proof, namely using that for i.i.d. p-Bernoulli variables Z_i, and for β < p,
\[
\mathbb{P}\Big[\sum_{i=1}^{N} Z_i \le \beta N\Big] \le e^{-N[\beta\ln(\beta/p) + (1-\beta)\ln((1-\beta)/(1-p))]}.
\]
For reference on this estimate and on the Chernoff bound used in the proof see, for example, the survey on geometric applications of Chernoff type estimates [AFM]. More precisely, if one replaces the constant 3 by C_1, then instead of ln 2 one can put a constant c_2 such that c_2 → 0 when C_1 → ∞.
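As a quick numerical illustration of the exact form quoted above (my own check, with arbitrary parameter values), one can compare the exact binomial lower tail with this bound:

```python
import math

def binom_cdf(k, N, p):
    # exact P[ sum of N Bernoulli(p) variables <= k ]
    return sum(math.comb(N, j) * p**j * (1 - p)**(N - j) for j in range(k + 1))

N, p, beta = 100, 0.3, 0.2
exact = binom_cdf(int(beta * N), N, p)
bound = math.exp(-N * (beta * math.log(beta / p) + (1 - beta) * math.log((1 - beta) / (1 - p))))
print(exact, bound)   # the exact lower-tail probability sits below the Chernoff bound
```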

Remark 3. In the case p = 1 one encounters a problem with the convergence of the probability. However, if one assumes an upper bound d on the random variable X, then the same proof as below will give an upper estimate on the probability in (1) of the form ≈ C_0 \log(d/s) e^{-NKs/\log(d/s)}, which is sufficient in some cases.

Proof of Theorem 1. We will use the standard Chernoff bound. For j = \log s + 1, \log s + 2, . . . (logarithms here are to base 2) we define
\[
A_j = \{2^{j-1} < X \le 2^j\},
\]
so that P[X_i ∈ A_j] ≤ e^{-K2^{(j-1)p}} (where we have used the assumption s > K_0). We set m_j = Ns2^{-j}/(j - \log s)^2. We measure the probability of the following event: out of the N variables X_i, for every j, no more than m_j of them are in A_j. This event is included in the event that
\[
\frac{1}{N}\sum_{i=1}^{N} X_i \le s\Big(1 + \sum_{j=1}^{\infty}\frac{1}{j^2}\Big) \le 3s.
\]

We will estimate the probability of the complementary event. It is less than the sum over j of the individual probabilities
\[
P_j = \mathbb{P}[\text{more than } m_j \text{ of the } X_i \text{ are in } A_j].
\]
As long as
\[
s2^{-j}/(j - \log s)^2 > e^{-K2^{p(j-1)}} \tag{2}
\]
(which will give us a condition on s, namely a lower bound on s in terms of K and p), this probability is small, and by Chernoff it is smaller than
\[
e^{-N[K2^{p(j-1)} s2^{-j}/(j-\log s)^2 - \ln 2]} = e^{-N[Ks2^{(p-1)j-p}/(j-\log s)^2 - \ln 2]}.
\]

(Here, by using the exact form of Chernoff's estimate, we may replace −ln 2 by the term β ln β + (1 − β) ln(1 − β) for, say, β = s2^{-j}, and this will improve the estimate. More precisely, if we sum to begin with over j = \log(C_1 s) + 1, \log(C_1 s) + 2, . . . we will have above that β ≤ 1/(2C_1), and so the additional term in the exponent is also small, and tends to 0 when C_1 increases.)

The sum of these probabilities converges (we are using the fact that p > 1), and is comparable to the first element in the series, which is e^{-N[Ks^p/2 - \ln 2]} (so, in fact, C_0 in the Theorem depends on p but can be taken universal, and even not large at all, when p is bounded away from 1).

We now have to check condition (2). Substituting i = j − \log s, the left hand side is 2^{-i}/i^2 and the right hand side is e^{-K2^{p(i-1)}s^p}. Taking the natural logarithm of both sides we see that the condition is (i + 2\ln i)/2^{p(i-1)} < Ks^p. Clearly the left hand side is largest for i = 1, 2, so we need only ensure that s > (2 + 2\ln 2)/(2K^{1/p}) = (1 + \ln 2)/K^{1/p}. Thus we have shown that the condition in Chernoff's bound is satisfied and the proof of Theorem 1 is complete. □
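The level-set bookkeeping in the proof can be visualized with a short simulation (my own sketch; s is chosen to be a power of 2 so that \log s is an integer, and log2 denotes the base-2 logarithm used above):

```python
import numpy as np

rng = np.random.default_rng(2)
K, p, N, s = 1.0, 2.0, 1000, 2.0
X = (-np.log(rng.random(N)) / K) ** (1 / p)          # exact tail P[X > t] = exp(-K t^p)

j0 = int(np.log2(s))                                 # levels start at j = log2(s) + 1
for j in range(j0 + 1, j0 + 7):
    count = int(np.sum((X > 2 ** (j - 1)) & (X <= 2 ** j)))
    m_j = N * s * 2.0 ** (-j) / (j - np.log2(s)) ** 2
    print(j, count, round(m_j, 2))                   # level counts vs. the thresholds m_j
# if every level respects its threshold, (1/N) sum X_i <= s (1 + sum 1/i^2) <= 3s:
print(X.mean(), 3 * s)
```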

2 A Global Proposition

We now state the application of this theorem, which is a global proposition regarding the decrease of the diameter of a convex body; it generalizes the well known diameter reduction for averages of random orthogonal rotations of a convex body.

Proposition 2 There exist universal constants c, C such that: Let {A} be some family of operators A : R^n → R^n with some probability measure P. Assume that for some body K ⊂ R^n and some α > 0 the following holds: for every x ∈ R^n and for any s > s_0
\[
\mathbb{P}[A : \|Ax\|_K > s|x|] \le e^{-\alpha^2 s^2 n}. \tag{3}
\]
Then, if T < 2(\frac{1}{\alpha s_0})^2, we have with probability greater than 1 − e^{-cn} that for any x ∈ R^n
\[
\frac{1}{T}\sum_{i=1}^{T} \|A_i x\|_K \le C\frac{1}{\alpha\sqrt{T}}\,|x|, \tag{4}
\]
and if T ≥ 2(\frac{1}{\alpha s_0})^2 we have with probability greater than 1 − e^{-cn} that for any x ∈ R^n
\[
\frac{1}{T}\sum_{i=1}^{T} \|A_i x\|_K \le Cs_0|x|, \tag{5}
\]
where the A_i are chosen independently according to the distribution P.

Restating the proposition in geometric form, using duality, gives

Corollary 1 There exist universal constants c, C such that: Let {A} be some family of operators A : R^n → R^n with some probability measure P. Assume that for some body K ⊂ R^n and some α > 0 the following holds: for every x ∈ R^n and for any s > s_0
\[
\mathbb{P}[A : \|A^*x\|^*_K > s|x|] \le e^{-\alpha^2 s^2 n}. \tag{6}
\]
Then, if T < 2(\frac{1}{\alpha s_0})^2, we have with probability greater than 1 − e^{-cn} that
\[
\frac{A_1K + A_2K + \cdots + A_TK}{T} \subset C\frac{1}{\alpha\sqrt{T}}D_n, \tag{7}
\]
and if T ≥ 2(\frac{1}{\alpha s_0})^2 we have with probability greater than 1 − e^{-cn} that
\[
\frac{A_1K + A_2K + \cdots + A_TK}{T} \subset Cs_0 D_n, \tag{8}
\]
where the A_i are chosen independently according to the distribution P.
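To see the corollary in action, here is a rough numerical sketch (entirely my own construction) for Haar-random rotations and K = B_1^n: the support function of A_iK at a direction y is ‖A_i^T y‖_∞, so the circumradius of the Minkowski average can be probed by maximizing the averaged support function over a finite set of directions. The probe set and the reference value √(2 ln n/n) ≈ M^*(B_1^n) are ad hoc choices of mine.

```python
import numpy as np

rng = np.random.default_rng(3)

def haar_orthogonal(n):
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))                       # sign fix for the Haar distribution

n = 200
x = rng.standard_normal((300, n))
x /= np.linalg.norm(x, axis=1, keepdims=True)            # random probe directions

for T in (1, 4, 16, 64):
    A = [haar_orthogonal(n) for _ in range(T)]
    # extra probes aligned with sums of random columns of the A_i (near-extremal directions)
    cand = np.stack([sum(a[:, rng.integers(n)] for a in A) for _ in range(50)])
    cand /= np.linalg.norm(cand, axis=1, keepdims=True)
    probe = np.vstack([x, cand])
    # support function of (A_1 K + ... + A_T K)/T at y equals (1/T) sum_i ||A_i^T y||_inf
    radius = np.mean([np.abs(probe @ a).max(axis=1) for a in A], axis=0).max()
    print(T, round(radius, 3))                           # decreases roughly like 1/sqrt(T) ...
print(round(np.sqrt(2 * np.log(n) / n), 3))              # ... until it reaches ~M*(B_1^n)
```

The printed radii should shrink roughly like 1/√T and then level off near M^*(B_1^n), in line with the two regimes (7) and (8).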

Remark. We are describing the case p = 2 because it is the most useful. However, for any p > 1 we have a similar result, namely if instead of (6) we have an estimate of the form
\[
\mathbb{P}[A : \|A^*x\|^*_K > s|x|] \le e^{-\alpha^p s^p n},
\]
then we get a separation into two cases, T < 2(\frac{1}{\alpha s_0})^p and T ≥ 2(\frac{1}{\alpha s_0})^p. In the first case, instead of (7) we get that the average is included in C\frac{1}{\alpha T^{1/p}}D_n, and in the second case we get exactly (8). The proof is identical. As for p = 1, this is different, since Theorem 1 is different, and one gets an upper estimate on the diameter of the form ≈ \frac{1}{\alpha T}\log d if this quantity is greater than s_0.

The meaning of the separation into two cases inside the proposition seems to be that there is a diameter reduction of order √T when taking an average of T copies of a convex body K operated upon by random operators A_i, until the diameter reaches some critical value at which it stabilizes. In the case of orthogonal rotations we know the reason for stabilization, namely the body becomes a euclidean ball. Of course, the proposition above gives only an upper bound, and by no means implies stabilization. To get any result in the reverse direction (namely, the inclusion of a ball of some radius after a given number of steps, and stabilization) we would need a reverse condition as well, promising that points do not shrink very much under the random operation. This is addressed in Proposition 5.

Proof of Proposition 2. We begin with the case T < 2(\frac{1}{\alpha s_0})^2, and we should show that for any x ∈ S^{n-1}
\[
\sum_{i=1}^{T} \|A_i x\| \le C\frac{\sqrt{T}}{\alpha}, \tag{9}
\]
under the assumption that for some constant α and for any s > s_0
\[
\mathbb{P}[A : \|Ax\| > s|x|] \le e^{-\alpha^2 s^2 n}.
\]
Theorem 1 with p = 2 tells us that then for s > max{s_0, (1 + \ln 2)/(\sqrt{n}\alpha)} we have
\[
\mathbb{P}\Big[\frac{1}{T}\sum_{i=1}^{T} \|A_i x\| > 3s|x|\Big] \le C_0 e^{-T(\alpha^2 n s^2 - \ln 2)}.
\]

We want this to hold with 3s = \frac{C}{2}\frac{1}{\sqrt{T}\alpha}, for every x in a 1/2-net on the sphere. Such a net has cardinality less than 5^n, and successive approximation will then guarantee that inequality (9) holds for all x ∈ S^{n-1}. The probability that we get for this is greater than 1 − 5^n e^{-T(nC^2/(36T) - \ln 2)}, which for large C is high, at least in the case where T ≤ C'n. If T is much larger than n the term ln 2 interferes, and so we have to use the stronger form of Theorem 1 avoiding this term, which we indicated in the remark following Theorem 1 and also in the proof of the theorem.

In the second case, where T ≥ 2(\frac{1}{\alpha s_0})^2, we can no longer take s = C\frac{1}{\sqrt{T}\alpha} but only s = s_0. The probability is then greater than 1 − 5^n e^{-T(\alpha^2 s_0^2 n - \ln 2)}, and from the assumption on T this probability is exponentially close to 1. So we get inequality (5). We remark that although we wrote a universal constant C, the proof shows that this constant is not large at all and can be chosen to be, say, 5 (and in some cases close to 1). □

3 A Local Proposition

In this section we describe an analogue of Milman's local diameter reduction theorem, namely the theorem for orthogonal projections. The proof of Dvoretzky's Theorem in [Mi1] implies that a random projection of a convex body K in R^n of diameter d = d(K) into a subspace of dimension k^* = c(\frac{M^*}{d})^2 n is an approximate euclidean ball of radius M^* = M^*(K). The fact that this k^* is indeed the correct formula for the dimension in which a projection is an approximate euclidean ball, and not just a lower estimate, was pointed out in [MiS].

It was then observed by Milman that for any dimension k > k^*, when one projects the body K onto a k-dimensional subspace, its diameter decreases by a factor of about √(k/n). For a detailed explanation of this fact and more of the history see [Mi4], Section 2.3.1. Thus, there is only one type of behavior of the diameter of a convex body under projections: it decreases like √(k/n) as long as k is larger than the critical value k^*, and then it stabilizes at the value M^*. (Note that here we know the exact behavior, not only upper bounds.)

In Proposition 3 below we deal with an abstract family of operators satisfying a condition which has nothing to do with the body K but only describes the way the operators act on individual points. Under this condition, a general reduction of diameter holds for all convex bodies. We discuss some examples of such families of operators in Section 5, including the classical case of orthogonal projections.

Proposition 3 Let {A} be some family of operators A : R^n → R^n with some probability measure P. Assume that for every x ∈ R^n and for any s > s_0
\[
\mathbb{P}[A : |Ax| > s|x|] \le e^{-cns^2}.
\]
Then there exist universal constants c', C such that given a convex body K ⊂ R^n the following holds: If s_0 > \frac{M^*(K)}{d(K)}, then for every j ≥ s_0^2 n, with probability greater than 1 − e^{-c'j} on the choice of A we have
\[
AK \subset C\sqrt{\frac{j}{n}}\, d(K) D_n,
\]
and if s_0 ≤ \frac{M^*(K)}{d(K)}, then with probability greater than 1 − e^{-c'ns_0^2} on the choice of A we have
\[
AK \subset CM^*(K) D_n.
\]
Moreover, the constant c' appearing in the probability is a function of the constant C, and by increasing C we can have c' as big as desired.

To prove Proposition 3 we will use Talagrand's Majorizing Measures Theorem in the form of Theorem 4 below (see [Tal]); this type of application of the Majorizing Measures Theorem was suggested to me by Prof. Keith Ball and was used in a special case in [Art1].

Theorem 4 (Majorizing Measures) There exists a universal constant C_0 such that for any dimension n and every convex body K ⊂ R^n, there exist families of points B_0 ⊂ . . . ⊂ B_{m-1} ⊂ B_m ⊂ . . . ⊂ K with cardinality |B_m| ≤ 2^{2^m} such that for every x ∈ K
\[
\sum_{m=0}^{\infty} d(x, B_m)\sqrt{2^m} \le C_0 M^*(K)\sqrt{n},
\]
where d(x, B_m) denotes the distance of the point x to the m-th family, i.e., d(x, B_m) = \inf\{d(x, y) : y ∈ B_m\}. (Moreover, these points can be constructed so that if v_m is the closest point to x in B_m, and v_{m+1} is the closest point to x in B_{m+1}, then v_m is the closest point to v_{m+1} in B_m.)
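The statement can be played with numerically. The sketch below (my own toy construction, not the one from [Tal]) builds nested nets by a greedy farthest-point rule on a finite sample of points with ‖y‖_1 = 1 and evaluates the chaining sum sup_x ∑_m d(x, B_m)√(2^m); since greedy nets need not be optimal, one only expects the result to be within a moderate factor of √n M^* for the sampled body.

```python
import numpy as np

rng = np.random.default_rng(4)
n, npts = 30, 2000
K = rng.standard_normal((npts, n))
K /= np.abs(K).sum(axis=1, keepdims=True)                # sample points with ||y||_1 = 1

def pdist(P, Q):                                         # pairwise euclidean distances
    sq = (P ** 2).sum(1)[:, None] + (Q ** 2).sum(1)[None, :] - 2 * P @ Q.T
    return np.sqrt(np.maximum(sq, 0))

order, d = [0], pdist(K, K[:1]).ravel()                  # greedy farthest-point ordering
for _ in range(npts - 1):
    i = int(d.argmax())
    order.append(i)
    d = np.minimum(d, pdist(K, K[i:i + 1]).ravel())

chain = np.zeros(npts)
for m in range(5):                                       # prefixes as the nets, |B_m| <= 2^(2^m)
    Bm = K[order[:min(2 ** 2 ** m, npts)]]
    chain += np.sqrt(2.0 ** m) * pdist(K, Bm).min(axis=1)

dirs = rng.standard_normal((4000, n))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
Mstar = (dirs @ K.T).max(axis=1).mean()                  # Monte Carlo M* of the sampled body
print(chain.max(), np.sqrt(n) * Mstar)
```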

Proof of Proposition 3. We fix m_0 = \log(c_1 j), with c_1 universal, to be chosen later. Each vector x ∈ K we write as
\[
x = v_{m_0}(x) + \sum_{m=m_0+1}^{\infty}\big(v_m(x) - v_{m-1}(x)\big),
\]
where v_m(x) ∈ B_m is the member of B_m closest to x. We denote u_m(x) = v_m(x) − v_{m-1}(x), so that |u_m(x)| ≤ 2d(x, B_{m-1}), and denoting C_m = {u_m(x) : x ∈ K}, we have |C_m| ≤ 2^{2^{m+1}}. We rewrite
\[
x = v_{m_0} + \sum_{m=m_0+1}^{\infty}\left(\frac{d(x,B_{m-1})\sqrt{2^{m-1}}}{C_0 M^*(K)\sqrt{n}}\cdot\frac{|u_m(x)|}{2d(x,B_{m-1})}\right)\left(\frac{u_m(x)}{|u_m(x)|}\cdot\frac{2C_0 M^*(K)\sqrt{n}}{\sqrt{2^{m-1}}}\right).
\]

The Majorizing Measures Theorem tells us that the sum of the coefficients in the left brackets of the infinite sum is less than 1. Therefore, for any linear operator A
\[
|Ax| \le |A(v_{m_0}(x))| + \sup_{m_0 < m < \infty}\frac{|A(u_m(x))|}{|u_m(x)|}\cdot\frac{2C_0 M^*(K)\sqrt{n}}{\sqrt{2^{m-1}}}.
\]
In other words, AK is contained in a ball of radius
\[
\max_{v \in B_{m_0}}|Av| + \sup_{m_0 < m < \infty,\; u_m \in C_m}\frac{|Au_m|}{|u_m|}\cdot\frac{2C_0 M^*(K)\sqrt{n}}{\sqrt{2^{m-1}}}. \tag{10}
\]


We first discuss the case s_0 > \frac{M^*(K)}{d(K)}. We use the assumption in the statement of the Proposition,
\[
\mathbb{P}[A : |Ax| > s|x|] \le e^{-cns^2},
\]
to show that with probability 1 − e^{-c'j} we have that

(a) For every v ∈ B_{m_0},
\[
|Av| \le C_1\sqrt{\frac{j}{n}}\,|v|,
\]
and (b) For every m > m_0, for every u_m ∈ C_m,
\[
|Au_m| \le C_1\frac{d\sqrt{2^{m-1}}}{M^*(K)\sqrt{n}}\sqrt{\frac{j}{n}}\,|u_m|.
\]
This will complete the proof of the case s_0 > \frac{M^*(K)}{d}, with C = C_1 + 2C_0C_1.

For (a), since the probability of the event happening for a specific v is at least 1 − e^{-cC_1^2 j}, we see that if B_{m_0} has no more than e^{c_1 j} elements, for c_1 = cC_1^2/2, the probability that all of them have this property is greater than 1 − e^{-c_1 j}, which is precisely why we chose m_0 = \log(c_1 j) (and we have thus specified c_1). In estimating the probability, we have used the assumption that C_1\sqrt{j/n} > s_0, which is clearly satisfied if C_1 > 1.

For (b) to happen we take care of each m separately. For each m the probability is bounded from below by
\[
1 - 2^{2^{m+1}}\,\mathbb{P}\Big[|Ax| > t\sqrt{\frac{j}{n}}\,|x|\Big],
\]
for t = \frac{d}{M^*}C_1\frac{\sqrt{2^{m-1}}}{\sqrt{n}}. Since we consider m > m_0, we have that 2^{m-1} ≥ c_1 j, and hence we can apply the estimate for the probability, as long as, say, C_1 > 1/c_1, getting
\[
1 - 2^{2^{m+1}} e^{-cC_1^2 2^{m-1}\frac{j}{n}(\frac{d}{M^*})^2}.
\]


We are assuming that s_0 > \frac{M^*(K)}{d(K)}, so \frac{j}{n} > (\frac{M^*(K)}{d})^2. Therefore (if, say, C_1 ≥ 4/\sqrt{c}) we can bound this probability from below by 1 − 2^{-c_2 2^m}, and c_2 can be large provided that C_1 is chosen large. Adding up for all m the probability of failure in (b), and adding also the probability of failure in (a) as the first summand, we see that
\[
e^{-c_1 j} + \sum_{m=m_0+1}^{\infty} 2^{-c_2 2^m} \le 2^{-c'j},
\]
which completes the proof in the first case. Clearly by increasing C_1 we can increase c' as much as required.

In the second case, where s_0 < \frac{M^*}{d}, we take m_0 = \log(c_1 n s_0^2), again with c_1 to be chosen later. We use the assumption
\[
\mathbb{P}[A : |Ax| > s|x|] \le e^{-cns^2}
\]
for s = C_1 s_0 to show that with probability 1 − e^{-cC_1^2 n s_0^2}

(a) For every v ∈ B_{m_0},
\[
|Av| \le C_1 s_0 |v| \le C_1 M^*(K),
\]
and for s = C_1\frac{\sqrt{2^{m-1}}}{\sqrt{n}} to show

(b) For every m > m_0, for every u_m ∈ C_m,
\[
|Au_m| \le C_1\frac{\sqrt{2^{m-1}}}{\sqrt{n}}|u_m|,
\]
and following (10) this will complete the proof in the second case s_0 ≤ \frac{M^*}{d}.

Calculating the probabilities, using that now \sqrt{2^m/n} > \sqrt{c_1}\, s_0, we get that, when C_1 > 1/\sqrt{c_1}, the probability is greater than
\[
1 - e^{-cC_1^2 s_0^2 n} - \sum_{m=m_0+1}^{\infty} 2^{2^{m+1}} e^{-cC_1^2 2^{m-1}},
\]
and for C_1 sufficiently large the probability is greater than 1 − e^{-c'ns_0^2}. □


4 A Global Proposition in the reverse direction

To have a complete global Dvoretzky type statement, Proposition 2 provides just one direction, namely it shows that the average of a certain number of copies of K is contained inside a euclidean ball of an appropriate size. Actually, under these conditions nothing stronger can be stated. Proposition 5 below gives the reverse side, namely the containment of a ball. Naturally, it involves a condition which promises that individual points are not shrunk "too much" by the operators. It also includes an a-priori assumption of diameter reduction, which can be obtained for example by using Proposition 2. We remark that the condition in the proposition is about a specific value ε_0 which is (four times) the radius of the ball we want inside our body. If one knows a more global condition satisfied for different values of ε, for example a small ball probability estimate such as in [LO], one can sometimes get an "inner diameter increase" lemma by applying the condition each time to a different pair (ε, T). In this sense the proof below is very simple, and so can be adapted to various initial conditions.

Proposition 5 Let {A} be some family of operators A : R^n → R^n with some probability measure P. Assume that for some body K and some ε_0 > 0 the following holds: for every x ∈ R^n
\[
\mathbb{P}[A : \|Ax\|_K < \varepsilon_0|x|] \le (F(\varepsilon_0))^n,
\]
where (F(ε_0))^n < 1/4. Assume, further, that for some R and some T, with probability p we have
\[
\frac{1}{T}\sum_{i=1}^{T} \|A_i x\|_K \le R|x|.
\]
Then, if T satisfies T > \ln(2 + 16R/\varepsilon_0)/\ln(1/(2F(\varepsilon_0))), we have with probability greater than 1 − 2^{-n} − p that
\[
\frac{\varepsilon_0}{4}|x| \le \frac{1}{T}\sum_{i=1}^{T} \|A_i x\|_K,
\]
where the A_i are chosen independently according to the distribution P.

Again we may reformulate the above in geometric form: assume that the family of operators A : R^n → R^n satisfies, for some body K, some ε_0 > 0 and F(ε_0) < (1/4)^{1/n}, that for every x ∈ R^n one has P[A : ‖A^*x‖^*_K < ε_0|x|] ≤ (F(ε_0))^n, and that for some R and some T, with probability p one has \frac{1}{T}\sum_{i=1}^{T} A_iK ⊂ RD_n. Then, if T satisfies T > \ln(2 + 16R/ε_0)/\ln(1/(2F(ε_0))), one has with probability greater than 1 − 2^{-n} − p that \frac{ε_0}{4}D_n ⊂ \frac{1}{T}\sum_{i=1}^{T} A_iK.
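Before the proof, a tiny numeric illustration (the parameter values are entirely my own, chosen so that F < 1/2 and F^n < 1/4): it evaluates the threshold for T in the proposition and the failure-probability bound that appears in the proof, checking that the latter indeed drops below 2^{-n}.

```python
import math

n, R, eps0, F = 100, 4.0, 0.5, 0.3        # assumed data: P[||Ax||_K < eps0 |x|] <= F**n
T_min = math.log(2 + 16 * R / eps0) / math.log(1 / (2 * F))
T = math.floor(T_min) + 1                  # smallest integer T allowed by the proposition
# failure bound from the proof: (1 + 8R/eps0)^n * (2 F^n)^T, compared on a log scale
log_fail = n * math.log(1 + 8 * R / eps0) + T * (math.log(2) + n * math.log(F))
print(T_min, log_fail, -n * math.log(2))   # log of the failure bound vs. log(2^{-n})
```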

Proof of Proposition 5. We will show that for an \frac{\varepsilon_0}{4R}-net on the sphere, where R is the upper bound we are assuming, the inequality holds with ε_0/2. Then by the triangle inequality we will have for every x ∈ S^{n-1}
\[
\frac{1}{T}\sum_{i=1}^{T} \|A_i x\|_K \ge \varepsilon_0/4.
\]
The net has cardinality less than (1 + 8R/ε_0)^n. From Chernoff, the probability that more than half of the numbers ‖A_i x‖_K will be greater than ε_0 is greater than 1 − (2(F(ε_0))^n)^T. The probability that this is true for every point in the net is greater than
\[
1 - (1 + 8R/\varepsilon_0)^n\big(2F(\varepsilon_0)^n\big)^T.
\]


We now write down the condition on T which promises that this quantity is greater than 1 − 2^{-n}, and the proof is complete. □

5 Some Examples

1. Orthogonal Projections. For an integer 1 ≤ k ≤ n let {P_k} be the family of orthogonal projections onto k-dimensional subspaces of R^n, endowed with the normalized Haar measure. It is well known, and was shown for example (with precise estimates on c_1, c_2 below) in [Art1], that for s > c_1\sqrt{k/n}
\[
\mathbb{P}[P_k : |P_k x| > s|x|] \le e^{-c_2 n s^2}.
\]

Proposition 3 then implies Milman's diameter reduction theorem for random projections (and, by duality, one-sided estimates for Dvoretzky's Theorem for sections). Namely, for a projection onto a random subspace of dimension k greater than k^* = c(\frac{M^*}{d})^2 n the diameter decreases like √(k/n), and for dimensions lower than this the diameter is around M^*. The other side of Dvoretzky's Theorem, namely that a projection onto a subspace of dimension k^* is already with high probability isomorphic to a euclidean ball of radius M^*, is not included in this statement.
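A quick numerical sketch of the two regimes (my own illustration, for the concrete body K = B_1^n, where the radius of a random k-dimensional projection equals max_i |P_k e_i|, i.e. the largest column norm of the first k rows of a Haar orthogonal matrix; the reference value √(2 ln n/n) stands in for M^*(B_1^n)):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
q, r = np.linalg.qr(rng.standard_normal((n, n)))
Q = q * np.sign(np.diag(r))                              # Haar-distributed orthogonal matrix
Mstar = np.sqrt(2 * np.log(n) / n)                       # ~ M*(B_1^n); here d(B_1^n) = 1
for k in (n, n // 4, n // 16, 8):
    radius = np.linalg.norm(Q[:k, :], axis=0).max()      # radius of P_k B_1^n = max_i |P_k e_i|
    print(k, round(radius, 3), round(np.sqrt(k / n), 3), round(Mstar, 3))
```

For k well above k^* the radius tracks √(k/n); for small k it stops decreasing at that rate and stays of the order of M^*, in accordance with Proposition 3 (up to the constants C).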

2. Sign-Projections. For an integer 1 ≤ k ≤ n let {S_k} be the family of k-dimensional sign-projections, defined as follows: for k sign-vectors ε_i ∈ {−1, 1}^n, i = 1, . . . , k, let S_k(x) = \sum_{i=1}^{k}\langle x, \varepsilon_i/\sqrt{n}\rangle e_i ∈ R^k. We consider the uniform measure on this set, namely each ε_i is chosen with respect to the uniform measure on the n-dimensional discrete cube. It is not difficult to show that for s > c_1\sqrt{k/n}
\[
\mathbb{P}[S_k : |S_k x| > s|x|] \le e^{-c_2 n s^2}.
\]

This is explained for example in [Art2], where, in the same spirit as the proof of Proposition 3, it was shown that a diameter reduction statement holds for this family of operators as well. In this case, however, the statement is not sharp: for certain bodies, such as B(ℓ_1^n), the decrease of the diameter continues beyond the value k^*. For a more detailed discussion see [Art2].
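An empirical look at this tail (my own check; the constants c_1, c_2 above are unspecified, so the reference curve e^{-ns^2/8} below is just an arbitrary comparison of mine, and x is a fixed generic unit vector):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, trials = 50, 10, 100_000
x = rng.standard_normal(n)
x /= np.linalg.norm(x)

signs = rng.integers(0, 2, size=(trials, k, n), dtype=np.int8) * 2 - 1
inner = signs @ x                                   # <x, eps_i> for each trial and each i
norm_Skx = np.sqrt((inner ** 2).sum(axis=1) / n)    # |S_k x| for S_k x = sum <x, eps_i/sqrt(n)> e_i
for s in (0.6, 0.7, 0.8):                           # all above sqrt(k/n) ~ 0.45
    print(s, np.mean(norm_Skx > s), np.exp(-n * s ** 2 / 8))
```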

3. Orthogonal Rotations. Consider the family O(n) of orthogonal rotations in R^n, endowed with the normalized Haar measure. Proposition 2 and Proposition 5 together give us the famous global version of Dvoretzky's Theorem, namely that the average of (d/M^*)^2 random rotations of a body K is isomorphic to a euclidean ball of radius M^*. This theorem first appeared in [BLM]. Indeed, the estimate that we use is the famous concentration of measure estimate: for t > 0,
\[
\mathbb{P}\big[x \in S^{n-1} : \big|\,\|x\|^*_K - M^*\big| > tM^*\big] \le \sqrt{\pi/2}\, e^{-(M^*/d)^2 t^2 n/2}.
\]
(For one side we use this with one specific t, say t = 1/2, and apply Proposition 5, and for the other side we use the tail estimate for t > 1 and Proposition 2.)

4. ψ_2-bodies. A convex body T of volume 1 is called a "ψ_2-body" if there exists a constant A such that for each θ ∈ S^{n-1} the random variable X_θ = ⟨y, θ⟩ (where y is random in T with respect to the volume distribution) is ψ_2 with constant at most A, that is, E(e^{(X_θ/A)^2}) ≤ 2 (the expectation being with respect to the volume distribution on T).
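For concreteness, a small Monte Carlo check (mine) of this definition for T = [−1/2, 1/2]^n, the cube of volume one; the value A = 0.7 is simply a constant that appears to work for this body and is not claimed to be optimal, and the code only samples a few random directions θ rather than all of S^{n-1}.

```python
import numpy as np

rng = np.random.default_rng(7)
n, samples, A = 50, 200_000, 0.7
y = rng.random((samples, n)) - 0.5                 # uniform samples from [-1/2, 1/2]^n
for _ in range(3):
    theta = rng.standard_normal(n)
    theta /= np.linalg.norm(theta)
    X = y @ theta                                  # X_theta = <y, theta>
    print(np.mean(np.exp((X / A) ** 2)))           # each estimate should stay below 2
```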


We can define another projection-type operator as follows: Consider k random points y_i inside this convex body T (random with respect to the volume distribution). Define G_k(x) = \frac{1}{\sqrt{n}}\sum_{i=1}^{k}\langle x, y_i\rangle e_i. That is, a projection-type operator from R^n into R^k.

To apply Proposition 3, we need to find s_0 such that for every x ∈ R^n and for any s > s_0
\[
\mathbb{P}[G_k : |G_k x| > s|x|] \le e^{-cns^2}.
\]

To do this we use the ψ_2 behavior in the standard Markov inequality scheme:
\[
\mathbb{P}[|G_k x| > s|x|] = \mathbb{P}\Big[\sum_{i=1}^{k}|\langle x, y_i\rangle|^2 > s^2 n|x|^2\Big] \le \mathbb{E}\big(e^{(X/A)^2}\big)^k e^{-s^2 n/A^2} \le e^{-n(s^2/A^2 - (k/n)\ln 2)} \le e^{-ns^2/(2A^2)}
\]
for s > A\sqrt{2\ln 2}\,\sqrt{k/n} (here X stands for X_{x/|x|}).

Following Proposition 3 we get exactly the same behavior as in the case of orthogonal projections and of sign-projections. Notice that we have two different convex bodies involved: one which helps us define the random operator, and another whose diameter is reduced by applying this operator. The first is a ψ_2-body, the second is arbitrary.

The case of the same definition of an operator, but when the body T with which we define the random operator is general and not necessarily ψ_2, is different, and is discussed in the next section.

6 Some further continuations

We want to describe two extensions of the above propositions, which, taken together, are relevant to the example indicated at the end of the previous section.

The first extension, of the local Proposition 3, is to the case of ψ_p behavior, or more precisely to the case where the tails are not as good (subgaussian) as in the statement of the proposition. To describe the result we need to introduce the parameter γ_p(K), 1 ≤ p ≤ 2, associated to a convex body K. This parameter was introduced by M. Talagrand in his Majorizing Measures theory, and is by now a widely used geometric parameter. Define
\[
\gamma_p(K) = \inf\,\sup_{x \in K}\sum_{m \ge 0} d(x, B_m)2^{m/p},
\]
where the infimum is taken over all families B_m ⊂ K, m = 0, 1, . . . with |B_m| ≤ 2^{2^m}. For p = 2 we have in effect already used the parameter γ_2(K), because γ_2(K) ≈ \sqrt{n}\,M^*(K) (this is exactly Talagrand's Majorizing Measures Theorem, one side of which is Theorem 4). For motivation for this definition, computation of γ_p for certain bodies, and many applications see [Tal].

Using this definition and the same method as in the proof of Proposition 3, one gets the following.

Proposition 6 Let {A} be some family of operators A : R^n → R^n with some probability measure P. Assume that for some 1 ≤ p ≤ 2, for every x ∈ R^n and for any s > s_0
\[
\mathbb{P}[A : |Ax| > s|x|] \le e^{-cns^p}.
\]
Then there exist universal constants c', C such that given a convex body K ⊂ R^n the following holds: If s_0 > \frac{\gamma_p(K)}{d(K)n^{1/p}}, then for every j ≥ s_0^p n, with probability greater than 1 − e^{-c'j} on the choice of A we have
\[
AK \subset C\Big(\frac{j}{n}\Big)^{1/p} d(K) D_n,
\]
and if s_0 ≤ \frac{\gamma_p(K)}{d(K)n^{1/p}}, then with probability greater than 1 − e^{-c'ns_0^p} on the choice of A we have
\[
AK \subset C\frac{\gamma_p(K)}{n^{1/p}}D_n.
\]
Moreover, the constant c' appearing in the probability is a function of the constant C, and by increasing C we can have c' as big as desired.

The second extension we describe is to the case where, instead of estimating the ℓ_2-norm of Ax, we consider its size with respect to some other norm. The most important case to consider is the ℓ_p-norm, because if we build a projection-type operator whose coordinates are ψ_p and not ψ_2 (for a general body this will be ψ_1), then to get an estimate for their sum we have to sum p-th powers of them and not squares (which may have terrible tails). We discuss the general case of measuring the size of Ax by an arbitrary norm ‖ · ‖ with unit ball, say, B. If our assumption is that the images shrink in the norm ‖ · ‖, we get that the image of K is inside appropriate copies of B. More precisely:

Proposition 7 Let {A} be some family of operators A : R^n → R^n with some probability measure P. Let ‖ · ‖ be some abstract norm, with unit ball B. Assume that for every x ∈ R^n and for any s > s_0
\[
\mathbb{P}[A : \|Ax\| > s|x|] \le e^{-cns^2}.
\]
Then there exist universal constants c', C such that given a convex body K ⊂ R^n the following holds: If s_0 > \frac{M^*(K)}{d(K)}, then for every j ≥ s_0^2 n, with probability greater than 1 − e^{-c'j} on the choice of A we have
\[
AK \subset C\sqrt{\frac{j}{n}}\, d(K) B,
\]
and if s_0 ≤ \frac{M^*(K)}{d(K)}, then with probability greater than 1 − e^{-c'ns_0^2} on the choice of A we have
\[
AK \subset CM^*(K) B.
\]
Moreover, the constant c' appearing in the probability is a function of the constant C, and by increasing C we can have c' as big as desired.

One can similarly combine the two propositions above. The proofs of these two propositions are similar to the proofs given in this note. So as not to overload this paper technically, we will publish them, together with some more applications, such as Example 4 of Section 5 with T a general convex body, elsewhere.

References

[Art1] S. Artstein, Proportional concentration phenomena on the sphere, Israel J. Math. 132 (2002), 337-358.

[Art2] S. Artstein, The change in the diameter of a convex body under a random sign-projection, Geometric Aspects of Functional Analysis, Israel Seminar Notes (2002-2003) (Eds. Milman-Schechtman), Springer Lecture Notes in Math. 1850, 31-40.

[AFM] S. Artstein, O. Friedland and V. Milman, Geometric applications of Chernoff type estimates, to appear in GAFA Seminar Notes.

[BLM] J. Bourgain, J. Lindenstrauss and V. Milman, Minkowski sums and symmetrizations, Geometric aspects of functional analysis (1986/87), Lecture Notes in Math. 1317, Springer, Berlin, 1988, 44-66.


[LO] R. Latała and K. Oleszkiewicz, Small ball probability estimates in terms of widths, Studia Math. 169 (2005), no. 3, 305-314.

[LPT] A. Litvak, A. Pajor and N. Tomczak-Jaegermann, Diameters of sections and coverings of convex bodies, J. Funct. Anal. 231 (2006), 438-457.

[Mi1] V. D. Milman, A new proof of the theorem of A. Dvoretzky on sections of convex bodies, Funct. Anal. Appl. 5 (1971), 28-37 (translated from Russian).

[Mi2] V. D. Milman, Almost Euclidean quotient spaces of subspaces of a finite dimensional normed space, Proc. Amer. Math. Soc. 94 (1985), no. 3, 445-449.

[Mi3] V. D. Milman, A note on a low M^*-estimate, Geometry of Banach spaces (1989), London Math. Soc. Lecture Note Ser. 158, Cambridge Univ. Press, Cambridge, 1990, 219-229.

[Mi4] V. D. Milman, Topics in asymptotic geometric analysis, Geom. Funct. Anal., Special Volume GAFA2000 (2000), 792-815.

[MiS] V. D. Milman and G. Schechtman, Global vs. local asymptotic theories of finite dimensional normed spaces, Duke Math. J. 90 (1997), 73-93.

[Tal] M. Talagrand, The Generic Chaining. Upper and Lower Bounds of Stochastic Processes, Springer Monographs in Mathematics, Springer-Verlag, Berlin, 2005.


Shiri Artstein-Avidan,

Department of Mathematics,

Princeton University,

Fine Hall, Washington Road,

Princeton NJ 08544-1000 USA

Email: [email protected]
