Communication Complexity

Jaikumar Radhakrishnan

School of Technology and Computer Science
Tata Institute of Fundamental Research, Mumbai

1 June 2012, IIT Bombay

Plan

1 Examples, the model, the set disjointness problem

2 Lower bounds for set disjointness, application to streaming

3 Round elimination, lower bounds for data structure problems

4 Remote generation of random variables, correlated sampling


Entropy of a random variable

Definition (Shannon entropy)

Claude Shannon (1916-2001)

Let X be a random variable taking values in the set [n] = {1, 2, . . . , n}, and let p_i = Pr[X = i]. Then, its entropy is given by

H[X] = −∑_{i=1}^{n} p_i log₂ p_i.

H[X] measures the uncertainty in X.
H[X] is a function of the distribution of X, not of the actual values it takes.
H[X] ≤ log₂ n, with equality when X is uniform.
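A quick numerical check of the definition (a Python sketch; the `entropy` helper is ours, not part of the slides):

```python
import math

def entropy(p):
    """Shannon entropy H[X], in bits, of a distribution p = (p_1, ..., p_n)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

print(entropy([0.5, 0.5]))        # a fair coin: 1.0 bit
print(entropy([0.9, 0.1]))        # a biased coin: about 0.469 bits
print(entropy([0.25] * 4))        # uniform on [4]: log2 4 = 2.0, the maximum
```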


Pragmatics

Alice: observes X; sends Bob a message M.  ⇐⇒  Bob: recovers X from M.

Goal
Alice and Bob exchange bits; Bob must recover X exactly. Minimize the (expected) total number of bits transmitted.

Transmission cost
Let T[X] denote the minimum cost of transmitting X.


Entropy and transmission

Theorem

H[X] ≤ T[X] ≤ H[X] + 1.

Kraft’s inequality
Let ℓ₁, ℓ₂, . . . , ℓₙ be positive integers. Then,

there is a binary tree whose i-th leaf is at depth ℓᵢ  ⟺  ∑_i 2^(−ℓᵢ) ≤ 1.
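The upper bound T[X] ≤ H[X] + 1 comes from giving value i a codeword of length ⌈log₂(1/p_i)⌉; these lengths satisfy Kraft’s inequality, so a prefix tree exists. A small Python check (the variable names are ours):

```python
import math

p = [0.5, 0.25, 0.125, 0.125]
lengths = [math.ceil(math.log2(1 / pi)) for pi in p]   # Shannon code lengths

H = -sum(pi * math.log2(pi) for pi in p)               # entropy H[X]
avg = sum(pi * li for pi, li in zip(p, lengths))       # expected code length

assert sum(2 ** -l for l in lengths) <= 1    # Kraft: a prefix tree exists
assert H <= avg <= H + 1                     # hence H[X] <= T[X] <= H[X] + 1
print(H, avg)   # 1.75 1.75 (dyadic probabilities: the code is exactly optimal)
```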


Long years ago . . . 1948

Shannon’s source coding theorem
Let p be a probability distribution on [n]. For ε > 0 and positive integer k, let

N(k, ε) = min { |A| : A ⊆ [n]^k, p^k(A) ≥ 1 − ε }.

Theorem (Shannon)
For all ε ∈ (0, 1),

lim_{k→∞} (1/k) log₂ N(k, ε) = H(p).

. . . not wholly or in full measure, but very substantially!


Conditional entropy

Definition
Let (X, Y) be a pair of random variables with some joint distribution. Then,

H[Y | X] = ∑_i p_X(i) H[Y | X = i].

Fact
Conditioning reduces uncertainty: H[Y | X] ≤ H[Y].
Chain rule: H[XY] = H[X] + H[Y | X].
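Both facts can be verified on a small joint distribution (a Python sketch; the distribution and the `H` helper are illustrative):

```python
import math
from collections import defaultdict

def H(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Illustrative joint distribution: Y copies X with probability 3/4.
joint = {(0, 0): 3/8, (0, 1): 1/8, (1, 0): 1/8, (1, 1): 3/8}

pX = defaultdict(float)
pY = defaultdict(float)
for (x, y), p in joint.items():
    pX[x] += p
    pY[y] += p

# H[Y | X] = sum_i pX(i) H[Y | X = i]
HYgX = sum(pX[x] * H({y: joint[(x, y)] / pX[x] for y in (0, 1)})
           for x in (0, 1))

assert HYgX <= H(pY)                             # conditioning reduces uncertainty
assert abs(H(joint) - (H(pX) + HYgX)) < 1e-12    # chain rule: H[XY] = H[X] + H[Y|X]
```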


The noisy channel

Specification
Input alphabet: [m].
Output alphabet: [n].
Characteristics: Pr[output = j | input = i] = p_{j|i}.

Code of conduct
Encoding: {0, 1}^k → [m]^t.
Decoding: [n]^t → {0, 1}^k.

Goal
Error: Pr[input ≠ output] ≤ ε.
Rate: k/t should be as large as possible.


Capacity

Input to the channel: X ∈ [m].
Output of the channel: Y ∈ [n].

Definition (Capacity of a channel E)

C(E) = max_X ( H[X] + H[Y] − H[XY] ),

where the maximum is over distributions of the input X.
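For the binary symmetric channel with flip probability f, this maximum can be found numerically by searching over input distributions (a Python sketch; `bsc_capacity` is our own illustrative helper, and the closed form 1 − h(f) is the known answer it should approach):

```python
import math

def h(p):
    """Binary entropy function, in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(f, steps=10000):
    """Grid search for max_X H[X] + H[Y] - H[XY] on a binary symmetric
    channel with flip probability f (an illustrative sketch)."""
    best = 0.0
    for k in range(steps + 1):
        a = k / steps                    # Pr[X = 1]
        q = a * (1 - f) + (1 - a) * f    # Pr[Y = 1]
        # For this channel, H[X] + H[Y] - H[XY] = H[Y] - H[Y | X] = h(q) - h(f).
        best = max(best, h(q) - h(f))
    return best

f = 0.11
print(bsc_capacity(f), 1 - h(f))   # both close to 0.5: capacity is 1 - h(f)
```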


Jaane kya toone kahi . . . jaane kya maine suni (who knows what you said . . . who knows what I heard)

Theorem (Shannon)
Let C be the capacity of the channel. Then, for all ε > 0 and all k, there exist encoders and decoders such that

Encoding rate: k/t ≥ C − ε.
Error: Pr[error] → 0 as k → ∞.
Optimality: C − ε cannot be replaced by C + δ for any δ > 0.

. . . baat kuchch ban hi gayee! (something did come of it after all!)


Mutual information

Definition
For random variables X and Y with some joint probability distribution, their mutual information is

I[X : Y] = H[X] + H[Y] − H[XY]
         = H[X] − H[X | Y]
         = H[Y] − H[Y | X].
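The three expressions agree, as a small numerical check confirms (a Python sketch with an arbitrary illustrative joint distribution):

```python
import math
from collections import defaultdict

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# An arbitrary illustrative joint distribution on {0,1} x {0,1}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
pX = defaultdict(float)
pY = defaultdict(float)
for (x, y), p in joint.items():
    pX[x] += p
    pY[y] += p

HXgY = sum(pY[y] * H([joint[(x, y)] / pY[y] for x in (0, 1)]) for y in (0, 1))
HYgX = sum(pX[x] * H([joint[(x, y)] / pX[x] for y in (0, 1)]) for x in (0, 1))

I1 = H(pX.values()) + H(pY.values()) - H(joint.values())
I2 = H(pX.values()) - HXgY
I3 = H(pY.values()) - HYgX
print(I1, I2, I3)   # all three expressions for I[X : Y] agree
```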


Today: generating random variables remotely

Pair of random variables
Let (X, Y) be a pair of not necessarily independent random variables taking values in the set [m] × 𝒴.

Alice: receives x ∈ [m].  ⇄  Bob: generates y ∈ 𝒴.

Goal
Pr[Bob’s output = y | X = x] = Pr[Y = y | X = x].
Minimize the average number of bits sent by Alice; let T[X : Y] denote this quantity.


When X and Y are independent. . .

Bob can generate y on his own; no message from Alice is required.


When X and Y are highly correlated. . .

The case X = Y
Then

H[X] ≤ T[X : Y] ≤ H[X] + 1,

where H[X] is the Shannon entropy of X.


In general, . . .

A lower bound

I[X : Y] ≤ T[X : Y].

Proof
Let M be Alice’s message to Bob. Then X and Y are conditionally independent given M, so I[X : Y | M] = 0. Hence

I[X : Y] ≤ I[X : YM] = I[X : M] + I[X : Y | M] = I[X : M] ≤ H[M] ≤ E[|M|].


Bad News

A pair of random variables

X ∈_U ([n] choose n/2): a uniformly random subset of [n] of size n/2.
Y ∈_U X: a uniformly random element of X.

Then I[X : Y] = 1, yet T[X : Y] ≥ c lg n for some constant c > 0.
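The claim I[X : Y] = 1 follows from I[X : Y] = H[Y] − H[Y | X] = log₂ n − log₂(n/2) = 1, and can be checked directly for a small n (a Python sketch; helper names are ours):

```python
import math
from itertools import combinations
from collections import defaultdict

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

n = 6
subsets = list(combinations(range(n), n // 2))   # all n/2-subsets of [n]
joint = defaultdict(float)
for s in subsets:                                # X uniform over subsets,
    for y in s:                                  # Y uniform within X
        joint[(s, y)] += 1 / (len(subsets) * (n // 2))

pX = defaultdict(float)
pY = defaultdict(float)
for (s, y), p in joint.items():
    pX[s] += p
    pY[y] += p

I = H(pX.values()) + H(pY.values()) - H(joint.values())
print(I)   # I[X : Y] = log2 n - log2(n/2) = 1, for every even n
```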


Not new . . .

Wyner’s common information (1975)
Definition:

C[X : Y] = lim inf_{λ→0} [ lim_{m→∞} T_λ[X^m : Y^m] / m ].

Theorem:

C[X : Y] = min_W I[XY : W],

where the minimum is taken over all random variables W such that X and Y are conditionally independent given W.


Common information versus mutual information

T[X : Y] ≥ C[X : Y] ≥ I[X : Y].

There exist random variables where both inequalities are loose.

Example
X, Y ∈ {0, 1}^n. Let W = (i, b) ∈ [n] × {0, 1} be uniformly distributed; set X[i] = Y[i] = b, and generate the other 2(n − 1) bits uniformly and independently.

I[X : Y] = O(n^{−1/3});
C[X : Y] = 2 − o(1);
T[X : Y] = Θ(log n).


The right question?

Suppose Alice and Bob are allowed to share a random variable R generatedindependently of Alice’s input.

Alice: receives x ∈ [m].  ⇒  Bob: generates y ∈ [n].

Alice generates her message to Bob based on her input x, the random string R, and some of her own randomness.
Bob generates his output based on Alice’s message, the random string R, and some of his own randomness.
Let T_R[X : Y] denote the minimum expected number of bits communicated (by Alice) in the best strategy for generating (X, Y) with shared randomness.


The first example revisited

A pair of random variables

X ∈_U ([n] choose n/2): a uniformly random subset of [n] of size n/2.
Y ∈_U X: a uniformly random element of X.
Note that I[X : Y] = 1.

A strategy
Randomness: a random permutation R of [n].
Alice’s message: the index i of the first element of R that occurs in her set X.
Bob’s output: the i-th element of R (which is uniform in X).
Communication cost: note that E[i] ≤ 2.
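A quick simulation of this strategy (a Python sketch; names and parameters are illustrative) confirms that the index Alice sends has small expectation:

```python
import random

def run_once(n, rng):
    """One run of the shared-permutation strategy (illustrative parameters)."""
    X = set(rng.sample(range(n), n // 2))   # Alice's uniform n/2-subset
    R = list(range(n))
    rng.shuffle(R)                          # shared random permutation of [n]
    i = next(j for j, e in enumerate(R) if e in X)
    return i + 1, R[i]                      # 1-based index i, Bob's output R[i]

rng = random.Random(0)
n, trials = 8, 20000
total = 0
for _ in range(trials):
    i, y = run_once(n, rng)
    total += i

print(total / trials)   # about 1.8 here; in general E[i] <= 2, since roughly
                        # every other position of R lands in Alice's set
```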


The main result

Theorem

I[X : Y] ≤ T_R[X : Y] ≤ I[X : Y] + O(lg(I[X : Y] + 1)).

Proof
Lower bound: a minor variation of the previous proof.

I[X : Y] ≤ I[X : MRY]
         = I[X : MR] + I[X : Y | MR]     (chain rule)
         = I[X : MR]                     (X, Y conditionally independent given M, R)
         = I[X : R] + I[X : M | R]       (chain rule)
         = I[X : M | R]                  (R is independent of X)
         ≤ H[M | R] ≤ H[M] ≤ E[|M|].


Upper bound

Definition
Given two distributions P and Q on a set 𝒴, their relative entropy is

D(P‖Q) = ∑_{y∈𝒴} P(y) lg ( P(y) / Q(y) ).

Connection to mutual information

I[X : Y] = E_{x←X} [ D(Q_x ‖ Q) ],

where Q_x is the conditional distribution of Y given X = x, and Q is the marginal distribution of Y.

The idea
Randomness: R = ⟨y₁, y₂, . . . , yᵢ, . . .⟩, sampled independently from Q.
Alice’s message: an index i*.
Bob’s output: the sample y_{i*}.
Cost: approximately D(Q_x‖Q).
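The connection I[X : Y] = E_{x←X}[D(Q_x‖Q)] can be checked numerically (a Python sketch with an arbitrary illustrative joint distribution):

```python
import math
from collections import defaultdict

def D(P, Q):
    """Relative entropy D(P || Q), in bits (assumes supp(P) within supp(Q))."""
    return sum(p * math.log2(p / Q[y]) for y, p in P.items() if p > 0)

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# An arbitrary illustrative joint distribution for (X, Y).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
pX = defaultdict(float)
Q = defaultdict(float)                   # Q: the marginal distribution of Y
for (x, y), p in joint.items():
    pX[x] += p
    Q[y] += p

# Q_x: the conditional distribution of Y given X = x.
Qx = {x: {y: joint[(x, y)] / pX[x] for y in (0, 1)} for x in (0, 1)}

I = sum(pX[x] * D(Qx[x], Q) for x in (0, 1))     # E_{x <- X}[ D(Q_x || Q) ]
assert abs(I - (H(pX.values()) + H(Q.values()) - H(joint.values()))) < 1e-12
```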


Choosing the right index

Main lemma
Let P and Q be two distributions such that D(P‖Q) is finite. There is a procedure that, on input a sequence

R = 〈y1, y2, . . . , yi, . . .〉

of independently drawn samples from Q, outputs an index i∗ such that
yi∗ has distribution P, and
E[length(i∗)] ≤ D(P‖Q) + 2 lg(D(P‖Q) + 1) + O(1).
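The lemma's procedure can be sketched in code. The greedy acceptance rule used below (at stage i, accept value v with probability min(P(v) − a(v), rest·Q(v)) / (rest·Q(v)), where a(v) is the mass of v already emitted and rest = 1 − Σ a) is a reconstruction of the rejection-sampling scheme, not a transcription of the slides; the function name and bookkeeping are illustrative.

```python
import random

def greedy_rejection_sample(P, Q, rng):
    """Return (i_star, y) so that y = y_{i_star} is distributed as P,
    given shared samples y_1, y_2, ... drawn i.i.d. from Q.
    a[v] tracks the probability mass of value v already emitted by
    earlier stages; rest = 1 - sum(a) is the mass still to emit."""
    n = len(P)
    a = [0.0] * n
    i = 0
    while True:
        i += 1
        y = rng.choices(range(n), weights=Q)[0]  # shared sample y_i ~ Q
        rest = 1.0 - sum(a)
        # stage-i acceptance probability for every value v
        alpha = [min(P[v] - a[v], rest * Q[v]) / (rest * Q[v]) if Q[v] > 0 else 0.0
                 for v in range(n)]
        if rng.random() < alpha[y]:
            return i, y
        # fold this stage's (unconditional) acceptance mass into a
        for v in range(n):
            a[v] += alpha[v] * rest * Q[v]
```

On the example of the next slide (P = 〈1/4, 3/4, 0, 0〉, Q = 〈1/2, 1/8, 1/8, 1/4〉), the empirical output distribution converges to P while the stopping index stays small, as the lemma predicts.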


Proof by example

Q = 〈1/2, 1/8, 1/8, 1/4〉;  Qx = 〈1/4, 3/4, 0, 0〉.

R = 〈y1, y2, . . . , yi, . . .〉 (samples drawn from Q)

Step 1: If y1 = 1, accept with probability 1/2.
If y1 = 2, accept with probability 1.
Otherwise, reject.

Step 2 onwards: Accept iff yi = 2.
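The stage-1 probabilities above can be checked directly: in the rejection-sampling view, y1 = v is accepted with probability min(Qx(v), Q(v))/Q(v). That explicit acceptance rule is an interpretation of the slide, made concrete here.

```python
Q  = [1/2, 1/8, 1/8, 1/4]   # proposal distribution
Qx = [1/4, 3/4, 0, 0]       # target distribution
# stage-1 acceptance probability for each value v
alpha1 = [min(p, q) / q for p, q in zip(Qx, Q)]
print(alpha1)  # [0.5, 1.0, 0.0, 0.0]: accept 1 w.p. 1/2, accept 2 w.p. 1, reject 3 and 4
```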


Analysis

Claim
E[lg i∗] ≤ D(P‖Q) + O(1).

Proof
Recall that

D(P‖Q) = ∑y∈[n] P(y) lg (P(y)/Q(y)).

Idea: the element y never needs to be generated after the first ⌈P(y)/Q(y)⌉ stages.
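For the running example, both quantities in the claim are easy to evaluate numerically (a quick sanity check, not from the slides):

```python
from math import ceil, log2

P = [1/4, 3/4, 0, 0]          # Qx, the target distribution
Q = [1/2, 1/8, 1/8, 1/4]      # the proposal distribution
# relative entropy D(P || Q), summing only over the support of P
D = sum(p * log2(p / q) for p, q in zip(P, Q) if p > 0)
# stage bound ceil(P(y)/Q(y)) for each y in the support of P
stages = [ceil(p / q) for p, q in zip(P, Q) if p > 0]
print(round(D, 3), stages)    # 1.689 [1, 6]
```

So here D(P‖Q) ≈ 1.69, and no element in the support of P needs to be generated after stage 6.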


Summary

Entropy, channel capacity, mutual information.

The problem of generating correlated random variables.

A ‘characterization’ of mutual information in terms of shared randomness.

Proof via a rejection sampling procedure based on relative entropy.


An application of this result

Theorem (One-shot Reverse Shannon Theorem)

C(P) ≤ T(P) ≤ C(P) + 2 lg(C(P) + 1) + O(1)


The Braverman-Rao protocol

Alice does not know Q.
Alice: a distribution P on [n].
Bob: a distribution Q on [n].
Guarantee: D(P‖Q) ≤ k.
Goal: Alice and Bob agree on a value M whose distribution is P (or is within distance ε of P). Minimize the communication Tε(P‖Q).

Theorem (Braverman and Rao, 2010)

Tε(P‖Q) = O(log(1/ε) · D(P‖Q)).


Remarks

The first result was joint work with Prahladh Harsha, Rahul Jain and David McAllester (2010).

The asymptotic versions of the main result and of the one-shot Reverse Shannon Theorem were shown earlier by Winter (2002) and by Bennett, Shor, Smolin and Thapliyal (2002).

The one-shot version implies the asymptotic versions by a routine application of the law of large numbers.

We don’t know if the ‘extra’ log term is necessary.

Precise asymptotic tradeoffs between communication and shared randomness were obtained by Bennett & Winter and by Paul Cuff.


Plan

1 Examples, the model, the set disjointness problem

2 Lower bounds for set disjointness, application to streaming

3 Round elimination, lower bounds for data structure problems

4 Remote generation of random variables, correlated sampling⇐


Thank you
