
Page 1:

Randomization

Carmella Kroitoru

Seminar on Communication Complexity

Page 2:

What’s new?

Until now, Alice and Bob were all-powerful, but deterministic.

But what if Alice and Bob could act in a randomized fashion?

What's a 'randomized fashion'??

Well, just flipping a coin, for instance.

Page 3:

What now??

Now the players "toss coins".

The outcomes of the tosses determine how the protocol proceeds.

The communication over an input (x, y) is no longer fixed; it's a random variable.

And so is the protocol's output P(x,y).

Page 4:

So, how can we define the success of a protocol?

The conservative way – Las Vegas protocols:

Success = the protocol always outputs the correct f(x,y).

The liberal option – Monte Carlo protocols:

Success = with high probability, the protocol outputs the correct f(x,y).

Page 5:

Where do we get all the coins?

Denote: r(I) is a random string of arbitrary length.

                      Deterministic                     Randomized
Input                 Alice gets x, Bob gets y          Alice gets x, Bob gets y
Additional info       none                              Alice has r(A), Bob has r(B)
Output dependence     f(x,y) depends only on x and y    the output depends on x, y, r(A), r(B)

Page 6:

So what does the protocol tree look like?

[Figure: protocol trees before and after randomization. Nodes branch on x1 = 1 / x1 = 0 (Alice) and y1 = 1 / y1 = 0 (Bob); the leaves hold the output values z(1,1), z(1,0), z(0,1), z(0,0).]

Page 7:

But what if we get the 'wrong' r(I)?

Does that mean the output will be wrong?

Yes!

For the same input (x,y), the protocol's output can differ, depending on r(A) and r(B).

So how will we get the right answer?

Through the magic of probability.

Page 8:

Definitions: P – a randomized protocol.

Zero error – for every (x,y):
Pr [P(x,y) = f(x,y)] = 1

ε error – for every (x,y):
Pr [P(x,y) = f(x,y)] ≥ 1 − ε

One-sided ε error – for every (x,y):
if f(x,y) = 0 then Pr [P(x,y) = 0] = 1
if f(x,y) = 1 then Pr [P(x,y) = 1] ≥ 1 − ε

Page 9:

Now the protocol's output can vary depending on r(A) and r(B).

And what about the cost? Is the number of bits exchanged fixed for a given input (x,y)?

No!

So how will we measure the running time?

Page 10:

Running time – first method

The worst-case running time of a randomized protocol P on input (x,y) is the maximum, over all random strings r(A), r(B), of the number of bits exchanged.

The worst-case cost of P is the maximum, over all inputs (x,y), of the worst-case running time on (x,y).

Page 11:

Running time – second method

The average-case running time of a randomized protocol P on input (x,y) is the expected number of bits exchanged, over the choices of r(A), r(B).

The average-case cost of P is the maximum, over all inputs (x,y), of the average-case running time on (x,y).

Page 12:

Average over what?

r(A) and r(B) are chosen independently, according to some probability distribution; the average is taken over these random strings.

Should we also consider a distribution over the inputs (x,y)?

No! There is no 'average input' – we still take the maximum over all (x,y).

Page 13:

Now what's all that 'probability' talk? Didn't I pass that course already?

Let's do a quick review.

Page 14:

Binomial distribution

A binomial experiment – a sequence of Bernoulli trials – is a statistical experiment with these properties:

- n independent trials y1, …, yn
- each trial results in success or failure (denoted 1/0, or T/F)
- the probability of success, p, is the same on every trial

Denote: S – the number of successes.

Example: coin tosses, with success = 'Heads'.

trial:   y1  y2  y3  …  yn
result:   0   1   1  …   0        S = ∑ yi

Page 15:

Exercise: Given a box with red and blue balls, suppose red balls are at least a δ fraction of the total.

What is the probability that k draws (with replacement) don't see any red ball?

At most (1 − δ)^k ≤ e^(−δk) – for instance, k = ln(3)/δ draws bring this down to 1/3.

But what if 1/3 isn't good enough? What if we want < α?

Take k = ln(1/α)/δ draws: then the probability of seeing no red ball is at most e^(−δk) = α.
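
A quick numeric check of this bound (a minimal sketch in Python; the values of δ and α are illustrative):

```python
import math

delta = 0.1   # red balls are at least a 10% fraction of the box (illustrative)
alpha = 0.01  # target failure probability

# k draws with replacement miss every red ball with probability at most
# (1 - delta)^k <= e^(-delta * k), so k = ln(1/alpha)/delta draws suffice.
k = math.ceil(math.log(1 / alpha) / delta)
print(k, (1 - delta) ** k)  # 47, ~0.007 < alpha
```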

Page 16:

Expectation

Linearity:
E( a ) = a
E( a·X + b ) = a·E(X) + b
E( X + Y ) = E(X) + E(Y)

Non-multiplicativity:
E( X·Y ) = E(X) · E(Y) holds when X and Y are independent, but not in general.

Page 17:

What is the probability that a random variable deviates from its expectation?

We focus on sums of n bounded variables: S = ∑ yi, with yi ∈ {0,1}.

Note: similar bounds can be obtained for other bounded variables.

Page 18:

Don't know anything about the yi's? If the yi's are positive but we know nothing else, we can use Markov's inequality to bound S:

Pr [ S ≥ t·E(S) ] ≤ 1/t, for any t ≥ 1.

Some intuition: no more than 1/5 of the population can earn more than 5 times the average income.

Example: toss 100 coins. Pr (# of heads ≥ 60) ≤ 5/6 (E(S) = 50, t = 6/5).

So without knowing whether the yi's are bounded, or whether they are independent, we get an upper bound.

It's not a great bound, though, and it says nothing about the lower tail.
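
To see how loose Markov is here, a small simulation (a sketch; the trial count is arbitrary):

```python
import random

# 100 fair coins: Markov gives Pr[S >= 60] <= E[S]/60 = 5/6,
# while the true probability is about 0.028.
n, trials = 100, 20_000
exceed = sum(sum(random.randint(0, 1) for _ in range(n)) >= 60
             for _ in range(trials))
print("Markov bound:", 50 / 60)          # ~0.833
print("empirical   :", exceed / trials)  # ~0.028
```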

Page 19:

Bounding S = ∑ yi when the yi's are independent

Page 20:

An exact result: using the cumulative binomial probability.

This refers to the probability that a binomial random variable falls within a specified range.

Example: toss 100 coins. Pr (# of heads ≥ 60) = ?

Solution: we compute 41 individual probabilities using the binomial formula, and their sum is the answer we seek. Denote k = # of heads:

Pr(k ≥ 60; n = 100, p = 0.5) = Pr(k = 60; 100, 0.5) + Pr(k = 61; 100, 0.5) + … + Pr(k = 100; 100, 0.5)

Pr(k ≥ 60; n = 100, p = 0.5) ≈ 0.028
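
The same number computed exactly (a short Python check using the binomial formula):

```python
from math import comb

# Pr[k >= 60] for n = 100 fair coins: sum the 41 terms Pr[k = 60..100].
n = 100
tail = sum(comb(n, k) for k in range(60, n + 1)) / 2 ** n
print(tail)  # ~0.0284
```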

Page 21:

More generally, we can use Chernoff's inequality to bound S when the yi's are independent.

For any δ > 0:

Pr [ S ≥ (1 + δ)·E(S) ] ≤ e^(−δ²·E(S)/3)

Example: toss 100 coins independently.

Pr (# of heads ≥ 60) ≤ e^(−2/3) ≈ 0.51 (E(S) = 50, δ = 1/5)

Dramatically better bounds than Markov for large deviations.

Weaker than the exact cumulative probability, but much easier to use, and it works in more cases.

Gives bounds for both tails (S bigger or smaller than E(S)).
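
Putting the two bounds next to the exact value (a sketch assuming the multiplicative Chernoff form stated above):

```python
import math

mu = 50              # E[S] for 100 fair coins
delta = 60 / mu - 1  # 60 = (1 + delta) * mu, so delta = 1/5

markov = mu / 60                          # 5/6 ~ 0.833
chernoff = math.exp(-delta**2 * mu / 3)   # e^(-2/3) ~ 0.513
print(markov, chernoff)                   # the exact tail is ~0.028
```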

Back to running time

Page 22:

3 types of errors – 3 complexity measures (1)

Let f : X × Y → {0,1}. Let's define complexity measures for randomized protocols:

R0(f) is the minimum average-case cost of a randomized protocol that computes f with zero error.

Page 23:

3 types of errors – 3 complexity measures (2)

For 0 < ε < 1, Rε(f) is the minimum worst-case cost of a randomized protocol that computes f with error ε.

For 0 < ε < 1, R¹ε(f) is the minimum worst-case cost of a randomized protocol that computes f with one-sided error ε.

Page 24:

Wait!

What's the meaning of 'average-case cost' for a zero-error protocol?

And of 'worst-case cost' for a protocol with error?

Page 25:

Worst case = Θ(average case)

Reminder: DAVG(P) – the running time averaged over the random strings r(A), r(B), maxed over all possible inputs (x,y).

Suppose a protocol A makes error ε/2 and DAVG(A) = t. Define A':
~ Execute A as long as at most 2t/ε bits are exchanged
~ If A finished, return its answer
~ Else return 0

The number of bits exchanged by A' in the worst case is DWORST(A') = 2t/ε.

So we found k = 2/ε such that DWORST(A') = k · DAVG(A).
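
A toy simulation of this truncation (a sketch; the cost distribution of A is made up purely for illustration):

```python
import random

eps = 0.1
# Model A's communication cost as a random variable with mean ~40 (toy choice).
costs = [random.expovariate(1 / 40) for _ in range(100_000)]
t = sum(costs) / len(costs)  # empirical average cost of A

budget = 2 * t / eps  # A' cuts A off after 2t/eps bits
timeouts = sum(c > budget for c in costs) / len(costs)
print(timeouts, "<=", eps / 2)  # Markov: Pr[cost > (2/eps)*t] <= eps/2
```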

Page 26:

But do A and A' have the same error? A has error ε/2.

What's the error of A'? A' can return the answer that A output; that has probability ε/2 of being wrong.

Or A' can output 0 because A wasn't done after 2t/ε bits. What's the chance of that? We use Markov:

Pr [ A exchanges more than 2t/ε bits ] = Pr [ #{bits exchanged by A} > (2/ε)·t ] ≤ 1/(2/ε) = ε/2

So A' has error at most ε/2 + ε/2 = ε, i.e., error(A') = 2·error(A); and if A's error is one-sided, so is that of A'.

Page 27:

What if ε = 0?

For zero error protocols, using the worst case cost gives exactly the deterministic communication complexity.

How come?

A deterministic protocol can fix any values of r(A) and r(B) and proceed: with zero error, every such fixing still computes f correctly on all inputs.

So for zero error protocols, we only care about the average case cost.

Page 28:

Exercise (1):

A: a protocol that computes f(x, y), for some fixed x, y, with one-sided error ε.

A': Execute A t times.

Denote: ~ fi – the result of execution #i
~ if fi = 1 for some i, then res(A') = 1; else res(A') = 0.

Pr [ res(A') ≠ f(x, y) ] = ?

Solution: If res(A') = 1, that's the right answer, 100% (with one-sided error, A outputs 1 only if f(x,y) = 1).

If res(A') = 0, it might be the right answer, or we got a wrong result in all t rounds. What's the chance of that?

Pr [ mistake in one run of A ] ≤ ε, so Pr [ mistake in A' ] ≤ ε^t.
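
A simulation of this amplification (a sketch; the one-sided-error protocol is modeled as a biased coin on an input with f(x,y) = 1):

```python
import random

eps, t = 0.3, 5  # illustrative values

def run_A():
    # One run of A on an input with f(x,y) = 1: correct (1) w.p. 1 - eps.
    return 1 if random.random() > eps else 0

trials = 100_000
# A' errs only if all t runs return 0.
fails = sum(all(run_A() == 0 for _ in range(t)) for _ in range(trials))
print(fails / trials, "vs bound", eps ** t)  # ~0.0024
```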

Page 29:

Exercise (2):

A: a protocol that computes f(x, y), for some fixed x, y, with (two-sided) error ε.

A': Execute A t times. Denote: fi – the result of execution #i.

res(A') = maj {fi}

What is the probability that res(A') ≠ f(x, y)?

Solution (1):

Define Bernoulli trials: yi = 1 if fi ≠ f(x, y).

res(A') is wrong only if more than half of the yi's are 1.

E[∑ yi] = E[S] = εt. So what's the chance that S > t/2?

Hint: use Chernoff.

Page 30:

Solution (2): Let's fix ε ≤ ¼ (the argument generalizes to any ε < ½) and take δ = 1. Since E[S] = εt ≤ t/4, we have t/2 ≥ (1 + δ)·E[S], so by Chernoff (at ε = ¼, the worst case):

Pr [ S > t/2 ] ≤ Pr [ S ≥ (1 + δ)·E[S] ] ≤ e^(−δ²·E[S]/3) = e^(−t/12)

What if we want to bound the error probability by some smaller α?

Then we need to take t = 12·ln(1/α).

So the error probability can be reduced by taking a larger t, i.e., by paying only a small penalty in communication complexity.
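
A matching simulation of majority amplification (a sketch; the ε-error protocol is again modeled as a biased coin, with t = 12·ln(1/α) as derived above):

```python
import math
import random

eps, alpha = 0.25, 0.01
t = math.ceil(12 * math.log(1 / alpha))  # 56 repetitions

def run_A():
    # One run of an eps-error protocol: returns 1 (correct) w.p. 1 - eps.
    return 1 if random.random() > eps else 0

trials = 20_000
# The majority is wrong only when at least half of the t runs err.
wrong = sum(sum(run_A() for _ in range(t)) <= t / 2 for _ in range(trials))
print(wrong / trials, "<=", alpha)
```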

Page 31:

Field

A non-empty set F with two binary operations, + (addition) and * (multiplication), is called a field if it satisfies:

- closure of F under + and *
- associativity of + and *
- commutativity of + and *
- distributivity of * over +
- identities for + and * (0 and 1)
- inverses for + and for * (the latter for nonzero elements)

Examples: the rational numbers; GF[7] = {0, 1, 2, 3, 4, 5, 6} with arithmetic mod 7.
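
A quick look at GF[7] arithmetic in code (a sketch; the multiplicative inverse is computed via Fermat's little theorem, a^(p−2) mod p):

```python
p = 7  # GF[7] = {0, 1, ..., 6}, arithmetic mod 7

a, b = 3, 5
print((a + b) % p)       # 1 : closure under +
print((a * b) % p)       # 1 : closure under *
print((-a) % p)          # 4 : additive inverse of 3
print(pow(a, p - 2, p))  # 5 : multiplicative inverse of 3 (3 * 5 = 15 = 1 mod 7)
```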

Page 32:

Example: EQ

Denote Alice's input A = a0 a1 … a(n−1), and Bob's input B = b0 b1 … b(n−1).

Let's view these inputs as polynomials over GF[p], where p is a prime with n² < p < 2n²:

A(x) = a0 + a1·x + a2·x² + … + a(n−1)·x^(n−1) mod p
B(x) = b0 + b1·x + b2·x² + … + b(n−1)·x^(n−1) mod p

Alice picks t ∈ GF[p] uniformly at random and sends Bob both t and A(t). Bob outputs 1 if A(t) = B(t), and 0 otherwise.

#{bits exchanged} = O(log p) = O(log n)
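
A runnable sketch of this protocol (the brute-force prime search and the function names are just for illustration):

```python
import random

def prime_above(m):
    # Smallest prime > m; Bertrand's postulate guarantees one below 2m.
    m += 1
    while any(m % d == 0 for d in range(2, int(m ** 0.5) + 1)):
        m += 1
    return m

def eq_protocol(a_bits, b_bits):
    # Alice sends (t, A(t)); Bob outputs 1 iff A(t) = B(t) over GF[p].
    n = len(a_bits)
    p = prime_above(n * n)       # n^2 < p < 2n^2
    t = random.randrange(p)      # Alice's random evaluation point
    A_t = sum(a * pow(t, i, p) for i, a in enumerate(a_bits)) % p
    B_t = sum(b * pow(t, i, p) for i, b in enumerate(b_bits)) % p
    return 1 if A_t == B_t else 0

x = [1, 0, 1, 1, 0, 0, 1, 0]
print(eq_protocol(x, x))         # always 1
print(eq_protocol(x, x[::-1]))   # 0 with probability >= 1 - 1/n
```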

Page 33:

Correctness

Note that if A = B, then A(t) = B(t) for every t, so the protocol always outputs 1.

If A ≠ B, we have two distinct polynomials of degree at most n−1.

Such polynomials can agree on at most n−1 (out of p) elements of the field, since their difference is a nonzero polynomial of degree ≤ n−1, which has at most n−1 roots.

Hence the probability of error is at most (n−1)/p ≤ n/n² = 1/n.

So we have shown that R(EQ) = O(log n) – and in fact the error is one-sided (the protocol never errs when A = B), so the one-sided measure is O(log n) as well.

In contrast, D(EQ) = D(NE) = n + 1.

Page 34:

Exercise

Prove that the following protocol for EQ achieves similar performance:

Alice and Bob view their inputs A and B as n-bit integers (between 1 and 2^n).

Alice chooses a prime p uniformly at random from among the first n primes. (If n = 3, then p is 2, 3 or 5.)

She sends p and (A mod p) to Bob.

Bob outputs 1 if A mod p = B mod p, and 0 otherwise.
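
A runnable sketch of this variant too (the trial-division helper is illustrative):

```python
import random

def first_n_primes(n):
    # The n smallest primes, by trial division (illustration only).
    primes, c = [], 2
    while len(primes) < n:
        if all(c % q for q in primes):
            primes.append(c)
        c += 1
    return primes

def eq_mod_prime(A, B, n):
    # Alice sends (p, A mod p); Bob outputs 1 iff A = B (mod p).
    p = random.choice(first_n_primes(n))
    return 1 if A % p == B % p else 0  # errs only if p divides A - B

n = 16
A, B = 0xBEEF, 0xFACE  # two distinct 16-bit inputs
print(eq_mod_prime(A, A, n))  # always 1
print(eq_mod_prime(A, B, n))  # errs w.p. <= 1/2; repeat to amplify
```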

Page 35:

Solution

First note that if A = B, then of course A mod p = B mod p.

If A ≠ B and Bob accepts, it means that A ≡ B (mod p).

Define BAD(A,B) = { p : p is among the first n primes and A ≡ B (mod p) }.

For example, BAD(5,8) = {3}.

Claim (without proof): |BAD(A,B)| ≤ ½n. So when A ≠ B, the probability that Bob accepts is at most ½.

We can repeat the protocol 100 times to get error 2^(−100).

So we get R(EQ) = O(log n): the largest of the first n primes is less than n³, so p and (A mod p) each take O(log n³) = O(log n) bits.

Page 36:

So how much better are randomized protocols than deterministic ones?

Lemma: D(f) ≤ 2^O(R(f)) – at most an exponential gap.

We will prove a more delicate statement: if f has a randomized protocol of cost c with error ε < ½, then D(f) = O(2^c · (c + log(1/(½ − ε)))).

Page 37:

Proof 1:

We present a deterministic simulation of a given randomized protocol P.

For each leaf l of the protocol P, Alice sends Bob p(A, l) – the probability that, given x, she would respond in a way leading to l.

Bob computes p(B, l) – the probability that, given y, he would respond in a way leading to l.

Bob then computes p(l) = p(A, l) · p(B, l) – the probability of reaching l. (The probabilities multiply because r(A) and r(B) are independent.)

Page 38:

Proof 2: Alice and Bob do this for every leaf, thus computing the probability of reaching each leaf.

Bob then checks which output value has total probability 1 − ε, and that is the right f(x,y).

What's the problem??

We would need to send exact probability values, i.e., real numbers.

But we can use a precision of O(c + log(1/(½ − ε))) bits per value, where c is the cost of P (so P has at most 2^c leaves).

Page 39:

Proof 3: This guarantees a deviation of at most (½ − ε)/2^c in each transmitted value.

This implies that each computed p(l) is at most (½ − ε)/2^c away from the true p(l) (since p(B, l) ≤ 1).

So the total error over all (at most 2^c) leaves is at most ½ − ε.

Therefore Bob only needs to check which of the values (0 or 1) has probability more than ½, and that is the correct f(x,y) value.

Q.E.D.
