Chernoff Bound


  • 7/30/2019 Chernoff Bound


    Chernoff Bounds

Let X1,..., Xn be independent 0-1 random variables with

Pr(Xi = 1) = pi,  Pr(Xi = 0) = 1 − pi.

Let X = Σ_{i=1}^n Xi, and

μ = E[X] = Σ_{i=1}^n E[Xi] = Σ_{i=1}^n pi.

We want a bound on

Pr(|X − μ| ≥ δμ).


The Basic Idea

Using the Markov inequality we have: for any t > 0,

Pr(X ≥ a) = Pr(e^{tX} ≥ e^{ta}) ≤ E[e^{tX}] / e^{ta}.

Similarly, for any t < 0,

Pr(X ≤ a) = Pr(e^{tX} ≥ e^{ta}) ≤ E[e^{tX}] / e^{ta}.

Thus

Pr(X ≥ a) ≤ min_{t>0} E[e^{tX}] / e^{ta},

Pr(X ≤ a) ≤ min_{t<0} E[e^{tX}] / e^{ta}.
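As a sanity check, this generic recipe can be evaluated exactly for a small binomial variable. The parameters below (n = 20, p = 1/2, a = 15) are illustrative choices, not from the slides:

```python
import math

# Exact check of Pr(X >= a) <= min_{t>0} E[e^{tX}]/e^{ta} for X ~ Binomial(n, p).
n, p, a = 20, 0.5, 15

def pmf(k):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

true_tail = sum(pmf(k) for k in range(a, n + 1))

def chernoff(t):
    mgf = sum(pmf(k) * math.exp(t * k) for k in range(n + 1))  # E[e^{tX}] exactly
    return mgf / math.exp(t * a)

best = min(chernoff(i / 100) for i in range(1, 300))  # crude grid search over t > 0
assert true_tail <= best < 1  # Markov guarantees the bound for every t > 0
```

The grid search stands in for the closed-form optimization over t done in the slides that follow.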


    Moment Generating Function

    Definition

The moment generating function of a random variable X is defined for any real value t as

M_X(t) = E[e^{tX}].


    Theorem

Let X be a random variable with moment generating function M_X(t). Assuming that exchanging the expectation and differentiation operands is legitimate, for all n ≥ 1,

E[X^n] = M_X^{(n)}(0),

where M_X^{(n)}(0) is the n-th derivative of M_X(t) evaluated at t = 0.

Proof.

M_X^{(n)}(t) = E[X^n e^{tX}].

Computed at t = 0 we get

M_X^{(n)}(0) = E[X^n].
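A quick numerical illustration, using a Bernoulli(p) variable with p = 0.3 (an arbitrary choice): its MGF is M_X(t) = (1 − p) + p·e^t, and finite differences at t = 0 should recover the moments E[X^n], which all equal p for a 0-1 variable:

```python
import math

# For X ~ Bernoulli(p): M_X(t) = (1-p) + p*e^t, and E[X^n] = p for all n >= 1.
p = 0.3
M = lambda t: (1 - p) + p * math.exp(t)

h = 1e-4
first = (M(h) - M(-h)) / (2 * h)           # central difference ~ M'(0) = E[X]
second = (M(h) - 2 * M(0) + M(-h)) / h**2  # ~ M''(0) = E[X^2]

assert abs(first - p) < 1e-6
assert abs(second - p) < 1e-5
```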


    Theorem

Let X and Y be two random variables. If

M_X(t) = M_Y(t)

for all t ∈ (−δ, δ) for some δ > 0, then X and Y have the same distribution.

Theorem

If X and Y are independent random variables, then

M_{X+Y}(t) = M_X(t) M_Y(t).

Proof.

M_{X+Y}(t) = E[e^{t(X+Y)}] = E[e^{tX}] E[e^{tY}] = M_X(t) M_Y(t),

where the second equality uses the independence of X and Y.


    Chernoff Bound for Sum of Bernoulli Trials

Let X1,..., Xn be a sequence of independent Bernoulli trials with Pr(Xi = 1) = pi. Let X = Σ_{i=1}^n Xi, and let

μ = E[X] = E[Σ_{i=1}^n Xi] = Σ_{i=1}^n E[Xi] = Σ_{i=1}^n pi.

Then

M_{Xi}(t) = E[e^{tXi}] = pi e^t + (1 − pi) = 1 + pi(e^t − 1) ≤ e^{pi(e^t − 1)},

using 1 + x ≤ e^x.


Taking the product of the n generating functions we get

M_X(t) = Π_{i=1}^n M_{Xi}(t) ≤ Π_{i=1}^n e^{pi(e^t − 1)} = e^{Σ_{i=1}^n pi(e^t − 1)} = e^{μ(e^t − 1)}.
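This MGF bound is easy to check numerically; the sketch below draws ten random pi (an illustrative setup, not from the slides) and verifies the inequality at a few values of t:

```python
import math, random

# Check M_X(t) = prod(1 + p_i*(e^t - 1)) <= exp(mu*(e^t - 1)) for random p_i.
random.seed(1)
ps = [random.random() for _ in range(10)]
mu = sum(ps)
for t in [0.1, 0.5, 1.0, 2.0]:
    mgf = math.prod(1 + p * (math.exp(t) - 1) for p in ps)
    assert mgf <= math.exp(mu * (math.exp(t) - 1))
```

The inequality holds term by term, since each factor 1 + pi(e^t − 1) is at most e^{pi(e^t − 1)}.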


Theorem

Let X1,..., Xn be independent Bernoulli random variables such that Pr(Xi = 1) = pi. Let X = Σ_{i=1}^n Xi and μ = E[X]. For any δ > 0,

Pr(X ≥ (1 + δ)μ) ≤ (e^δ / (1 + δ)^{1+δ})^μ.

Proof. For any t > 0,

Pr(X ≥ (1 + δ)μ) = Pr(e^{tX} ≥ e^{t(1+δ)μ}) ≤ E[e^{tX}] / e^{t(1+δ)μ} ≤ e^{μ(e^t − 1)} / e^{t(1+δ)μ}.

Since δ > 0, we can set t = ln(1 + δ) > 0 to get:

Pr(X ≥ (1 + δ)μ) ≤ (e^δ / (1 + δ)^{1+δ})^μ.
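A Monte Carlo sketch of this tail bound, with illustrative parameters (n = 100, all pi = 0.5, δ = 0.5) that are assumptions rather than values from the slides:

```python
import math, random

# Empirical tail Pr(X >= (1+d)*mu) vs the Chernoff bound (e^d/(1+d)^(1+d))^mu.
random.seed(7)
n, p, d = 100, 0.5, 0.5
mu = n * p
trials = 20000
hits = sum(sum(random.random() < p for _ in range(n)) >= (1 + d) * mu
           for _ in range(trials))
bound = (math.exp(d) / (1 + d) ** (1 + d)) ** mu
assert hits / trials <= bound  # the bound should dominate the empirical tail
```

With these parameters the bound is roughly 4.5 × 10⁻³, while the true tail probability is far smaller; the bound holds but is not tight.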


We show that for 0 < δ < 1,

e^δ / (1 + δ)^{1+δ} ≤ e^{−δ²/3},

or that

f(δ) = δ − (1 + δ) ln(1 + δ) + δ²/3 ≤ 0

in that interval. Computing the derivatives of f(δ) we get

f′(δ) = 1 − (1 + δ)/(1 + δ) − ln(1 + δ) + (2/3)δ = −ln(1 + δ) + (2/3)δ,

f″(δ) = −1/(1 + δ) + 2/3.

f″(δ) < 0 for 0 ≤ δ < 1/2, and f″(δ) > 0 for δ > 1/2. Thus f′(δ) first decreases and then increases over the interval [0, 1]. Since f′(0) = 0 and f′(1) < 0, f′(δ) ≤ 0 in the interval [0, 1]. Since f(0) = 0, we have that f(δ) ≤ 0 in that interval.
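The inequality can also be confirmed numerically on a grid over (0, 1):

```python
import math

# Verify e^d / (1+d)^(1+d) <= e^(-d^2/3) for d on a grid over (0, 1).
for i in range(1, 100):
    d = i / 100
    assert math.exp(d) / (1 + d) ** (1 + d) <= math.exp(-d * d / 3) + 1e-15
```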


For R ≥ 6μ, set δ = R/μ − 1 ≥ 5. Then

Pr(X ≥ R) = Pr(X ≥ (1 + δ)μ) ≤ (e^δ / (1 + δ)^{1+δ})^μ ≤ (e / (1 + δ))^{(1+δ)μ} ≤ (e/6)^R ≤ 2^{−R}.


    Theorem

Let X1,..., Xn be independent Bernoulli random variables such that Pr(Xi = 1) = pi. Let X = Σ_{i=1}^n Xi and μ = E[X]. For 0 < δ < 1,

Pr(X ≤ (1 − δ)μ) ≤ (e^{−δ} / (1 − δ)^{1−δ})^μ ≤ e^{−μδ²/2}.


We need to show:

f(δ) = −δ − (1 − δ) ln(1 − δ) + δ²/2 ≤ 0.

Differentiating f(δ) we get

f′(δ) = ln(1 − δ) + δ,

f″(δ) = −1/(1 − δ) + 1.

f′(0) = 0, and since f″(δ) ≤ 0 in the range [0, 1), f′(δ) is monotonically decreasing in that interval. Hence f′(δ) ≤ 0 there, and since f(0) = 0, f(δ) ≤ 0 in [0, 1).


    Example: Coin flips

Let X be the number of heads in a sequence of n independent fair coin flips. Then μ = n/2, and

Pr(|X − n/2| ≥ (1/2)√(4n ln n))

= Pr(X ≥ (n/2)(1 + √(4 ln n / n))) + Pr(X ≤ (n/2)(1 − √(4 ln n / n)))

≤ e^{−(1/3)(n/2)(4 ln n / n)} + e^{−(1/2)(n/2)(4 ln n / n)}

= n^{−2/3} + n^{−1} ≤ 2/n^{2/3}.
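A Monte Carlo sketch of this example: deviations of (1/2)√(4n ln n) or more should occur with probability at most 2/n^{2/3}, since the two exponential terms evaluate to n^{−2/3} and n^{−1}. Here n = 500 is an arbitrary choice:

```python
import math, random

# Fraction of runs in which |X - n/2| reaches 0.5*sqrt(4 n ln n), vs the bound.
random.seed(5)
n, trials = 500, 2000
dev = 0.5 * math.sqrt(4 * n * math.log(n))
hits = sum(abs(sum(random.getrandbits(1) for _ in range(n)) - n / 2) >= dev
           for _ in range(trials))
assert hits / trials <= 2 / n ** (2 / 3)
```

The threshold is about five standard deviations here, so essentially no run should reach it.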


Using the Chebyshev bound we had:

Pr(|X − n/2| ≥ n/4) ≤ 4/n.

Using the Chernoff bound in this case, we obtain

Pr(|X − n/2| ≥ n/4) = Pr(X ≥ (n/2)(1 + 1/2)) + Pr(X ≤ (n/2)(1 − 1/2))

≤ e^{−(1/3)(n/2)(1/4)} + e^{−(1/2)(n/2)(1/4)}

≤ 2e^{−n/24}.


    Example: Estimating a Parameter

Evaluating the probability that a particular gene mutation occurs in the population.

Given a DNA sample, a lab test can determine if it carries the mutation.

The test is expensive and we would like to obtain a relatively reliable estimate from a minimum number of samples. Let p be the unknown value, n the number of samples, of which p̃n had the mutation.

Given a sufficient number of samples we expect the value p to be in the neighborhood of the sampled value p̃, but we cannot predict any single value with high confidence.


    Confidence Interval

Instead of predicting a single value for the parameter we give an interval that is likely to contain the parameter.

Definition

A 1 − q confidence interval for a parameter p is an interval [p̃ − δ, p̃ + δ] such that

Pr(p ∈ [p̃ − δ, p̃ + δ]) ≥ 1 − q.

We want to minimize 2δ and q, with minimum n. Using p̃ as our estimate for p, we need to compute δ and q such that

Pr(p ∈ [p̃ − δ, p̃ + δ]) = Pr(np ∈ [n(p̃ − δ), n(p̃ + δ)]) ≥ 1 − q.


The random variable here is the interval [p̃ − δ, p̃ + δ] (or the value p̃), while p is a fixed (unknown) value.

np̃ has a binomial distribution with parameters n and p, and E[p̃] = p. If p ∉ [p̃ − δ, p̃ + δ] then we have one of the following two events:

1. If p < p̃ − δ, then np̃ > n(p + δ) = np(1 + δ/p), i.e. np̃ is larger than its expectation by a (1 + δ/p) factor.

2. If p > p̃ + δ, then np̃ < n(p − δ) = np(1 − δ/p), and np̃ is smaller than its expectation by a (1 − δ/p) factor.


Pr(p ∉ [p̃ − δ, p̃ + δ])

= Pr(np̃ ≤ np(1 − δ/p)) + Pr(np̃ ≥ np(1 + δ/p))

≤ e^{−(1/2)np(δ/p)²} + e^{−(1/3)np(δ/p)²}

= e^{−nδ²/(2p)} + e^{−nδ²/(3p)}.

But the value of p is unknown. A simple solution is to use the fact that p ≤ 1 to prove

Pr(p ∉ [p̃ − δ, p̃ + δ]) ≤ e^{−nδ²/2} + e^{−nδ²/3}.

Setting q = e^{−nδ²/2} + e^{−nδ²/3}, we obtain a tradeoff between δ, n and the error probability q.
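The tradeoff can be turned into a sample-size rule. Since q = e^{−nδ²/2} + e^{−nδ²/3} ≤ 2e^{−nδ²/3}, taking n ≥ 3 ln(2/q)/δ² suffices (this conservative reading of the tradeoff is my own rearrangement, not a formula from the slides):

```python
import math

# Conservative sample size from q <= 2*exp(-n*delta^2/3): n >= 3*ln(2/q)/delta^2.
def samples_needed(delta, q):
    return math.ceil(3 * math.log(2 / q) / delta ** 2)

delta, q_target = 0.05, 0.05
n = samples_needed(delta, q_target)
q_actual = math.exp(-n * delta**2 / 2) + math.exp(-n * delta**2 / 3)
assert q_actual <= q_target  # the exact error probability meets the target
```

For δ = 0.05 and q = 0.05 this prescribes a few thousand samples.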


    Better Bound

The binomial probabilities are monotone increasing up to the expectation, and then monotone decreasing.

Pr(p ∉ [p̃ − δ, p̃ + δ]) ≤ Pr(np̃ ≤ np(1 − δ/p)) + Pr(np̃ ≥ np(1 + δ/p))

≤ max_{p ≥ p̃+δ} e^{−n(p − p̃)²/(2p)} + max_{p ≤ p̃−δ} e^{−n(p̃ − p)²/(3p)}

≤ e^{−nδ²/(2(p̃+δ))} + e^{−nδ²/(3(p̃−δ))}.

Setting

q = e^{−nδ²/(2(p̃+δ))} + e^{−nδ²/(3(p̃−δ))}

gives a tighter tradeoff between δ, n and q.


    Application: Set Balancing

Given an n × n matrix A with entries in {0, 1}, let

( a11 a12 ... a1n ) ( b1 )   ( c1 )
( a21 a22 ... a2n ) ( b2 )   ( c2 )
( ...         ... ) ( .. ) = ( .. )
( an1 an2 ... ann ) ( bn )   ( cn )

Find a vector b with entries in {−1, 1} that minimizes

||Ab||∞ = max_{i=1,...,n} |ci|.
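A small simulation makes the point of the theorems that follow: a uniformly random ±1 vector b already achieves small discrepancy. The sketch below draws a random 0/1 matrix and checks the discrepancy against the √(4n ln n) threshold proved later in the deck (n = 200 and the random instance are illustrative assumptions):

```python
import math, random

# Discrepancy max_i |c_i| of a random +-1 coloring of a random 0/1 matrix.
random.seed(3)
n = 200
A = [[random.randint(0, 1) for _ in range(n)] for _ in range(n)]
b = [random.choice((-1, 1)) for _ in range(n)]
disc = max(abs(sum(a * x for a, x in zip(row, b))) for row in A)
assert disc <= math.sqrt(4 * n * math.log(n))  # fails with probability O(1/n)
```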


    Theorem

For a random vector b, with entries chosen independently and with equal probability from the set {−1, 1},

Pr(||Ab||∞ ≥ √(12 n ln n)) ≤ 4/n.


Consider the i-th row a_i = (a_{i,1},..., a_{i,n}). Let k be the number of 1s in that row.

If k ≤ √(12 n ln n), clearly |a_i · b| ≤ √(12 n ln n). If k > √(12 n ln n), let

Xi = |{j | a_{i,j} = 1 and bj = 1}|

and

Yi = |{j | a_{i,j} = 1 and bj = −1}|.

Thus, Xi counts the number of +1s in the sum Σ_{j=1}^n a_{i,j} bj, Yi counts the number of −1s, and Xi + Yi = k.


If |Xi − Yi| ≥ √(12 n ln n) then |Xi − (k − Xi)| ≥ √(12 n ln n), which implies

Xi ≥ (k/2)(1 + √(12 n ln n)/k)  or  Xi ≤ (k/2)(1 − √(12 n ln n)/k).


Using the Chernoff bounds, and the fact that k ≤ n,

Pr(Xi ≥ (k/2)(1 + √(12 n ln n)/k)) ≤ e^{−(k/2)(1/3)(12 n ln n / k²)} ≤ e^{−2 ln n},

Pr(Xi ≤ (k/2)(1 − √(12 n ln n)/k)) ≤ e^{−(k/2)(1/2)(12 n ln n / k²)} ≤ e^{−3 ln n}.

Hence, for a given row,

Pr(|Xi − Yi| ≥ √(12 n ln n)) ≤ n^{−2} + n^{−3} ≤ 2/n².

Since there are n rows, the probability that any row exceeds that bound is bounded by 2/n.

Chernoff Bound for Sum of {−1, +1} Random Variables



    Theorem

Let X1,..., Xn be independent random variables with

Pr(Xi = 1) = Pr(Xi = −1) = 1/2.

Let X = Σ_{i=1}^n Xi. For any a > 0,

Pr(X ≥ a) ≤ e^{−a²/2n}.


For any t > 0,

E[e^{tXi}] = (1/2)e^t + (1/2)e^{−t}.

Since

e^t = 1 + t + t²/2! + ... + t^i/i! + ...

and

e^{−t} = 1 − t + t²/2! + ... + (−1)^i t^i/i! + ...,

we have

E[e^{tXi}] = (1/2)e^t + (1/2)e^{−t} = Σ_{i≥0} t^{2i}/(2i)! ≤ Σ_{i≥0} (t²/2)^i / i! = e^{t²/2},

using (2i)! ≥ 2^i i!.
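The inequality just derived says cosh(t) ≤ e^{t²/2}, which is easy to spot-check on a grid:

```python
import math

# Verify E[e^{tX_i}] = cosh(t) <= e^{t^2/2} on a grid of t values.
for i in range(-50, 51):
    t = i / 10
    assert math.cosh(t) <= math.exp(t * t / 2) + 1e-12
```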


E[e^{tX}] = Π_{i=1}^n E[e^{tXi}] ≤ e^{nt²/2},

so

Pr(X ≥ a) = Pr(e^{tX} ≥ e^{ta}) ≤ E[e^{tX}] / e^{ta} ≤ e^{nt²/2 − ta}.

Setting t = a/n yields

Pr(X ≥ a) ≤ e^{−a²/2n}.


    By symmetry we also have

    Corollary

Let X1,..., Xn be independent random variables with

Pr(Xi = 1) = Pr(Xi = −1) = 1/2.

Let X = Σ_{i=1}^n Xi. Then for any a > 0,

Pr(|X| > a) ≤ 2e^{−a²/2n}.
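A Monte Carlo sketch of the corollary for a random ±1 sum; n = 100 and a = 30 are illustrative parameters, not from the slides:

```python
import math, random

# Empirical two-sided tail of a +-1 random walk vs the bound 2*e^{-a^2/2n}.
random.seed(11)
n, a, trials = 100, 30, 5000
hits = sum(abs(sum(random.choice((-1, 1)) for _ in range(n))) > a
           for _ in range(trials))
assert hits / trials <= 2 * math.exp(-a * a / (2 * n))
```

Here a = 30 is three standard deviations, so the empirical tail (well under 1%) sits comfortably below the bound of about 2.2%.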

    Application: Set Balancing Revisited



    Theorem

For a random vector b, with entries chosen independently and with equal probability from the set {−1, 1},

Pr(||Ab||∞ ≥ √(4 n ln n)) ≤ 2/n.

Consider the i-th row a_i = (a_{i,1},..., a_{i,n}). Let k be the number of 1s in that row, with indices i_1,..., i_k. Then

Zi = Σ_{j=1}^k a_{i,i_j} b_{i_j} = Σ_{j=1}^k b_{i_j}.

If k ≤ √(4 n ln n) then clearly Zi satisfies the bound.


If k > √(4 n ln n), the k non-zero terms in the sum Zi are independent random variables, each with probability 1/2 of being either +1 or −1. Using the Chernoff bound:

Pr(|Zi| > √(4 n ln n)) ≤ 2e^{−4n ln n / 2k} ≤ 2/n²,

where we use the fact that n ≥ k.

    Packet Routing on Parallel Computer


Communication network:

Nodes: processors and switching nodes. Edges: communication links.


The n-cube: N = 2^n nodes. Let x = (x1,..., xn) be the binary representation of node x. Nodes x and y are connected by an edge iff their binary representations differ in exactly one bit.

Bit-fixing routing: correct bit i in the i-th transition; a route has length at most n.


A permutation communication request: each node is the source and destination of exactly one packet. Up to one packet can cross an edge per step, and each packet can cross up to one edge per step.

What is the time to route an arbitrary permutation on the n-cube?


Two-phase routing algorithm:

1. Send the packet to a randomly chosen intermediate destination.

2. Send the packet from the random intermediate destination to its real destination.

Path: correct the bits in order, from x_0 to x_{n−1}. Any greedy queuing method: if some packet can traverse an edge, one does.
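The bit-fixing path used in each phase can be sketched in a few lines; the function name and the convention of fixing the lowest-order bit first are illustrative choices:

```python
# Bit-fixing path on the n-cube: flip the bits of the current node to match the
# destination, lowest index first; each flip is one hypercube edge.
def bit_fixing_path(src: int, dst: int, n: int) -> list[int]:
    path, cur = [src], src
    for i in range(n):            # correct bit i in the i-th transition
        if (cur ^ dst) >> i & 1:  # bit i still differs from the destination
            cur ^= 1 << i         # traverse the edge that flips bit i
            path.append(cur)
    return path

# Example: route from 0b000 to 0b101 on the 3-cube.
print(bit_fixing_path(0b000, 0b101, 3))  # [0, 1, 5]
```

Consecutive nodes on the returned path differ in exactly one bit, so the path length equals the number of differing bits, at most n.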


    Theorem

The two-phase routing algorithm routes an arbitrary permutation on the n-cube in O(log N) = O(n) parallel steps with high probability.

We focus first on phase 1. We bound the routing time of a given packet M.

Let e1,..., em be the m ≤ n edges traversed by the packet M in phase 1.

Let X(e) be the total number of packets that traverse edge e in that phase. Let T(M) be the number of steps until M finishes phase 1.


    Lemma

T(M) ≤ Σ_{i=1}^m X(ei).

We call any path P = (e1, e2,..., em) of m ≤ n edges that follows the bit-fixing algorithm a possible packet path. We denote the corresponding nodes v0, v1,..., vm, with ei = (v_{i−1}, vi).

For any possible packet path P, let T(P) = Σ_{i=1}^m X(ei).

If phase 1 takes more than T steps, then for some possible


packet path P,

T(P) ≥ T.

There are at most 2^n · 2^n = 2^{2n} possible packet paths. Assume that ek connects (a1,..., ai,..., an) to (a1,..., āi,..., an). Only packets that started in an address

(*,..., *, ai,..., an)

can traverse edge ek, and only if their destination addresses are

(a1,..., a_{i−1}, āi, *,..., *).

There are 2^{i−1} possible packets, each with probability 2^{−i} to traverse ek.


Hence

E[X(ek)] ≤ 2^{i−1} · 2^{−i} = 1/2,

and

E[T(P)] = Σ_{i=1}^m E[X(ei)] ≤ m/2 ≤ n/2.

Problem: the X(ei)'s are not independent.

A packet is active with respect to possible packet path P if it


ever uses an edge of P.

For k = 1,..., N, let Hk = 1 if the packet starting at node k is active, and Hk = 0 otherwise.

The Hk are independent, since each Hk depends only on the choice of the intermediate destination of the packet starting at node k, and these choices are independent for all packets.

Let H = Σ_{k=1}^N Hk be the total number of active packets. Since every active packet traverses at least one edge of P,

E[H] ≤ E[T(P)] ≤ n.

Since H is the sum of independent 0-1 random variables, we can apply the Chernoff bound, with 6n ≥ 6E[H]:

Pr(H ≥ 6n) ≤ 2^{−6n}.


For a given possible packet path P,

Pr(T(P) ≥ 36n) ≤ Pr(H ≥ 6n) + Pr(T(P) ≥ 36n | H < 6n) ≤ 2^{−6n} + Pr(T(P) ≥ 36n | H < 6n).



    Lemma

If a packet leaves a path (of another packet) it cannot return to that path in the same phase.

Proof.

Leaving a path at the i-th transition implies a different i-th bit; this bit cannot be changed again in that phase.

Lemma

The number of transitions that a packet takes on a given path is distributed G(1/2).

Proof.

The packet has probability 1/2 of leaving the path in each transition.

The probability that the active packets cross edges of P more than


36n times is less than the probability that a fair coin flipped 36n times comes up heads fewer than 6n times. Letting Z be the number of heads in 36n fair coin flips (so E[Z] = 18n), we now apply the Chernoff bound with δ = 2/3:

Pr(T(P) ≥ 36n | H ≤ 6n) ≤ Pr(Z ≤ 6n) ≤ e^{−18n(2/3)²/2} = e^{−4n} ≤ 2^{−3n−1}.

Hence

Pr(T(P) ≥ 36n) ≤ Pr(H ≥ 6n) + Pr(T(P) ≥ 36n | H ≤ 6n) ≤ 2^{−6n} + 2^{−3n−1} ≤ 2^{−3n}.


As there are at most 2^{2n} possible packet paths in the hypercube, the probability that there is any possible packet path for which T(P) ≥ 36n is bounded by

2^{2n} · 2^{−3n} = 2^{−n} = O(N^{−1}).


The proof of phase 2 is by symmetry:

The proof of phase 1 argued about the number of packets crossing a given path, with no timing considerations.

The paths from one packet per node to random locations are similar to the paths from random locations to one packet per node, in reverse order.

Thus, the distribution of the number of packets that cross a path of a given packet is the same.

    Oblivious Routing


    Definition

A routing algorithm is oblivious if the path taken by one packet is independent of the sources and destinations of any other packets in the system.

Theorem

Given an N-node network with maximum degree d, the routing time of any deterministic oblivious routing scheme is Ω(√(N/d³)).