
Page 1

Entropy, Inference, and Channel Coding

Sean Meyn

Department of Electrical and Computer Engineering, University of Illinois and the Coordinated Science Laboratory

NSF support: ECS 02-17836, ITR 00-85929 and CCF 00-49089

Page 2

Overview

Hypothesis testing and channel coding

Structure of optimal codes

Error exponents

Algorithms

[Figure: error exponent E_r(R) versus rate R for the optimal code and for QAM]

Page 3

References

Large deviations

Dembo and Zeitouni, Large Deviations Techniques And Applications, 1998

Kontoyiannis, Lastras-Montano and Meyn, Relative Entropy and Exponential Deviation Bounds for General Markov Chains, ISIT, 2005

Pandit and Meyn, Extremal Distributions and Worst-Case Large-Deviation Bounds, 2004

Hypothesis testing

D&Z 1998

Zeitouni and Gutman, On universal hypothesis testing via large deviations, IT-37, 1991

Pandit, Meyn and Veeravalli, Asymptotic Robust Neyman-Pearson Testing Based on Moment Classes, ISIT, 2004.

Page 4

References

Channel coding

Csiszár and Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, New York, 1997

MacKay, Information Theory, Inference, and Learning Algorithms, CUP, 2003 http://www.inference.phy.cam.ac.uk/mackay/itila/

Blahut, Hypothesis testing and information theory, IT-20, 1974

Page 5

Outline (today)

Introduction

Relative entropy & Large deviations

Hypothesis testing

Channel capacity

Conclusions


Page 6

Memoryless Channel Model

Memoryless channel with input sequence X, output sequence Y

Channel kernel

If X is i.i.d. with marginal distribution µ

Then, Y is i.i.d. with marginal distribution π

P(dy | x) = P{Y_t ∈ dy | X_t = x}

π( · ) = ∫ P( · | x) µ(dx)
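For a finite-alphabet channel this relation is a single matrix-vector product. A minimal sketch (the kernel and input distribution below are made-up values for illustration):

```python
import numpy as np

# Hypothetical discrete memoryless channel with 3 inputs and 2 outputs.
# P[x, y] = P(Y_t = y | X_t = x); each row sums to 1.
P = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.2, 0.8]])

mu = np.array([0.5, 0.3, 0.2])    # input marginal distribution

# Output marginal: pi(y) = sum_x P(y | x) mu(x)
pi = mu @ P
print(pi)                         # [0.64, 0.36]
assert np.isclose(pi.sum(), 1.0)
```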

Page 7

Random codebook

Channel kernel

N-dimensional code words

N-dimensional output Y received: i.i.d., with marginal distribution π

X^i,  i = 1, 2, . . . , e^{NR}

P(dy | x) = P{Y_t ∈ dy | X_t = x}
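A sketch of drawing such a random codebook: roughly e^{NR} length-N codewords sampled i.i.d. from µ (the alphabet, block length, and rate below are illustrative choices, not values from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

N, R = 8, 0.5                       # block length, rate in nats/symbol (illustrative)
M = int(np.ceil(np.exp(N * R)))     # number of codewords, about e^{NR}

alphabet = np.array([-1.0, 0.0, 1.0])   # hypothetical input alphabet
mu = np.array([0.25, 0.5, 0.25])        # input distribution mu

# codebook[i] is the N-dimensional codeword X^i, with entries drawn i.i.d. from mu
codebook = rng.choice(alphabet, size=(M, N), p=mu)
print(codebook.shape)                   # (55, 8): e^4 is about 54.6
```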

Page 8

IEEE Std 802.11a-1999, Supplement to IEEE Standard for Information Technology

[Constellation diagrams from the standard: BPSK, QPSK, 16-QAM, and 64-QAM bit-to-symbol mappings on the I/Q plane]

Page 9

Questions & Objectives

1. What is the structure of optimal µ ?

2. Construct algorithms based on this structure

3. Worst-case modeling to simplify code construction

4. Decoding algorithms and evaluation

Page 10

Questions & Objectives

1. What is the structure of optimal µ ?

2. Construct algorithms based on this structure

3. Worst-case modeling to simplify code construction

4. Decoding algorithms and evaluation

Methodology & Viewpoint:

Hypothesis testing

Large deviations

Convex & linear optimization theory

Page 11

Example: Rayleigh Channel Y = AX + N

σ_A² = 1,  σ_N² = 1,  and  σ_P² = 26.4  (SNR = 14.2 dB)

A and N are i.i.d. and mutually independent:
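A Monte Carlo sketch of this channel: Y = AX + N is simulated with a Rayleigh amplitude A normalized to E[A²] = 1 and unit-variance Gaussian noise, and the mutual information of a discrete input is estimated by simulation. The 3-point constellation and its probabilities below are placeholders, not the optimized three-point distribution discussed on the following slides:

```python
import numpy as np

rng = np.random.default_rng(1)

sigma_N = 1.0                              # noise variance sigma_N^2 = 1
points = np.array([0.0, 4.0, 8.0])         # illustrative 3-point constellation
probs  = np.array([0.5, 0.3, 0.2])         # illustrative input distribution

a_mc = rng.rayleigh(scale=1/np.sqrt(2), size=1000)   # Rayleigh A with E[A^2] = 1

def cond_density(y, x):
    """Monte Carlo estimate of p(y | x) = E_A[ Normal(y; A*x, sigma_N^2) ]."""
    z = (y[:, None] - a_mc[None, :] * x) / sigma_N
    return np.exp(-0.5 * z**2).mean(axis=1) / (np.sqrt(2 * np.pi) * sigma_N)

# Draw (X, Y) pairs and estimate I(X;Y) = E[ log p(Y|X) - log p(Y) ]
n = 5000
x = rng.choice(points, size=n, p=probs)
y = rng.rayleigh(scale=1/np.sqrt(2), size=n) * x + rng.normal(0.0, sigma_N, size=n)

p_y_given_x = np.zeros(n)
p_y = np.zeros(n)
for xv, pv in zip(points, probs):
    d = cond_density(y, xv)
    p_y += pv * d
    p_y_given_x[x == xv] = d[x == xv]

print(f"estimated I(X;Y): {np.mean(np.log(p_y_given_x) - np.log(p_y)):.3f} nats/symbol")
```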

Page 12

Example: Rayleigh Channel Y = AX + N

σ_A² = 1,  σ_N² = 1,  and  σ_P² = 26.4  (SNR = 14.2 dB)

16-point QAM

I = 0.2 nats/symbol.

A and N are i.i.d. and mutually independent:

Standard: 16-point QAM

Rate: 2.57   7.71

Page 13

Example: Rayleigh Channel Y = AX + N

σ_A² = 1,  σ_N² = 1,  and  σ_P² = 26.4  (SNR = 14.2 dB)

A and N are i.i.d. and mutually independent:

16-point QAM: 2.57   7.71        Three-point constellation: 2.7   8

Page 14

Example: Rayleigh Channel Y = AX + N

σ_A² = 1,  σ_N² = 1,  and  σ_P² = 26.4  (SNR = 14.2 dB)

A and N are i.i.d. and mutually independent:

3-point distribution: three-fold improvement over 16-point QAM

[Figure: error exponent E_r(R) versus rate R, 0 ≤ R ≤ 0.6, for the two constellations]

Page 15

Outline

Introduction

Relative entropy & Large deviations

Hypothesis testing

Channel capacity

Conclusions


Page 16

Large Deviations

Simulate a function

X = {X1, X2, . . . } a nice Markov chain on X, marginal distribution µ

g : X → R

c_n = n^{-1} ∑_{t=1}^{n} g(X_t)

Page 17

Large Deviations

Simulate a function

Probability of over-estimate

X = {X1, X2, . . . } a nice Markov chain on X, marginal distribution µ

c > c_0

n^{-1} log P{ n^{-1} ∑_{t=1}^{n} g(X_t) ≥ c } → −Λ∗(c)

g : X → R

c_n = n^{-1} ∑_{t=1}^{n} g(X_t) → c_0 = µ(g)

Page 18

Large Deviations

Simulate a function

Rate function & log-moment generating function

Probability of over-estimate

X = {X1, X2, . . . } a nice Markov chain on X, marginal distribution µ

c > c_0 = µ(g),   Λ∗(c) = sup_{θ>0} [θc − Λ(θ)]

n^{-1} log P{ n^{-1} ∑_{t=1}^{n} g(X_t) ≥ c } → −Λ∗(c)

Λ(θ) = lim_{n→∞} n^{-1} log E[ exp( θ ∑_{t=1}^{n} g(X_t) ) ]

g : X → R

c_n = n^{-1} ∑_{t=1}^{n} g(X_t) → c_0 = µ(g)
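In the i.i.d. case the limit reduces to Λ(θ) = log E[e^{θ g(X_1)}], and Λ∗(c) is a one-dimensional Legendre transform that can be evaluated numerically. A small sketch (the three-point distribution and the values of c are arbitrary):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# i.i.d. example: X_t uniform on {0, 1, 2}, g(x) = x (arbitrary choices)
vals  = np.array([0.0, 1.0, 2.0])
probs = np.array([1/3, 1/3, 1/3])

def Lambda(theta):
    """Log-moment generating function: log E[exp(theta * g(X_1))]."""
    return np.log(np.sum(probs * np.exp(theta * vals)))

def Lambda_star(c):
    """Rate function: sup over theta > 0 of theta*c - Lambda(theta)."""
    res = minimize_scalar(lambda th: -(th * c - Lambda(th)),
                          bounds=(1e-8, 50.0), method="bounded")
    return -res.fun

c0 = float(np.dot(probs, vals))     # c_0 = mu(g) = 1.0
for c in [1.2, 1.5, 1.8]:           # over-estimates c > c_0
    print(c, Lambda_star(c))        # P{c_n >= c} decays like exp(-n * Lambda_star(c))
```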

Page 19

Hoeffding's Bound

Marginal distribution µ unknown

Worst-case rate function & log-moment generating function

X = {X1, X2, . . . } is i.i.d. on X

c_n = n^{-1} ∑_{t=1}^{n} X_t → c_0 = µ(g)

g(x) = x,   X = [0, 1]

inf{Λ∗_µ(c) : µ(g) = c_0}        sup{Λ_µ(θ) : µ(g) = c_0}

Page 20

Hoeffding's Bound

Marginal distribution µ unknown

Worst-case rate function & log-moment generating function

X = {X_1, X_2, . . . } is i.i.d. on X

c_n = n^{-1} ∑_{t=1}^{n} X_t → c_0 = µ(g)

g(x) = x,   X = [0, 1]

inf{Λ∗_µ(c) : µ(g) = c_0}        sup{Λ_µ(θ) : µ(g) = c_0}

Solution: the extremal µ is binary, supported on {0, 1}
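A quick numerical illustration of why the binary law is extremal: among distributions on [0, 1] with a common mean, the one supported on {0, 1} maximizes E[e^{θX}] for every θ > 0 (by convexity of e^{θx}), and hence minimizes the rate function; the resulting worst-case rate is the binary divergence that appears in Hoeffding's bound. The mean, θ, and comparison distributions below are arbitrary:

```python
import numpy as np

c0, theta = 0.3, 2.0                       # common mean and an arbitrary theta > 0

def mgf(points, probs):
    """E[exp(theta * X)] for a discrete distribution on [0, 1]."""
    return float(np.sum(np.asarray(probs) * np.exp(theta * np.asarray(points))))

# Three distributions on [0, 1], all with mean c0 = 0.3
print(mgf([0.0, 1.0], [1 - c0, c0]))       # Bernoulli(c0): the largest MGF
print(mgf([0.1, 0.5], [0.5, 0.5]))         # a two-point alternative
print(mgf([c0], [1.0]))                    # point mass at c0: the smallest

# Worst-case rate function = binary divergence between Bernoulli(c) and Bernoulli(c0)
c = 0.5
print(c * np.log(c / c0) + (1 - c) * np.log((1 - c) / (1 - c0)))
```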

Page 21

Bennett's Lemma

Marginal distribution µ unknown

Worst-case rate function & log-moment generating function

X = {X_1, X_2, . . . } is i.i.d. on X;  mean and variance given

c_n = n^{-1} ∑_{t=1}^{n} X_t

g(x) = x,   X = [0, 1]

inf{Λ∗_µ(c) : µ(g_i) = c_i, i = 1, 2}        sup{Λ_µ(θ) : µ(g_i) = c_i, i = 1, 2}

Page 22

Bennett's Lemma

Marginal distribution µ unknown

Worst-case rate function & log-moment generating function

X = {X_1, X_2, . . . } is i.i.d. on X;  mean and variance given

c_n = n^{-1} ∑_{t=1}^{n} X_t

g(x) = x,   X = [0, 1]

inf{Λ∗_µ(c) : µ(g_i) = c_i, i = 1, 2}        sup{Λ_µ(θ) : µ(g_i) = c_i, i = 1, 2}

Solution: the extremal µ∗ is binary, supported on {x_0, 1}

Page 23

Generalized Bennett's Lemma

Marginal distribution µ unknown

Worst-case moment generating function:

X = {X_1, X_2, . . . } is i.i.d. on X;  n moments µ(g_i) = c_i given

c_n = n^{-1} ∑_{t=1}^{n} g(X_t),   X = [0, 1]

λ(θ) = E[e^{θ g(X_t)}] = ⟨µ, e^{θg}⟩

Page 24

Generalized Bennett's Lemma

Marginal distribution µ unknown

Worst-case moment generating function:

Linear program over M:

X = {X_1, X_2, . . . } is i.i.d. on X;  n moments µ(g_i) = c_i given

c_n = n^{-1} ∑_{t=1}^{n} g(X_t),   X = [0, 1]

λ(θ) = E[e^{θ g(X_t)}] = ⟨µ, e^{θg}⟩

max ⟨µ, e^{θg}⟩    s.t.   ⟨µ, g_i⟩ = c_i,  i = 1, . . . , n

The optimizer µ∗ is discrete
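A sketch of this linear program after discretizing X = [0, 1] to a finite grid, using scipy's LP solver. The function g, the value of θ, and the moment values are illustrative; the point of the example is that the maximizer returned by the solver puts mass on only a handful of grid points, consistent with µ∗ being discrete:

```python
import numpy as np
from scipy.optimize import linprog

# Discretize X = [0, 1]; maximize <mu, e^{theta g}> over mu matching given moments.
x = np.linspace(0.0, 1.0, 201)
theta = 3.0
g = x                                     # g(x) = x (illustrative)
c1, c2 = 0.3, 0.15                        # fixed mean and second moment (illustrative)

obj = -np.exp(theta * g)                  # linprog minimizes, so negate the objective
A_eq = np.vstack([np.ones_like(x), x, x**2])
b_eq = np.array([1.0, c1, c2])            # normalization plus the two moment constraints

res = linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
mu_star = res.x
print("worst-case lambda(theta):", -res.fun)
print("support of mu*:", x[mu_star > 1e-8])   # only a few points carry mass
```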

Page 25

Sanov's Theorem

State space:  X        Probability measures:  M

Empirical measures:   L_n := (1/n) ∑_{t=0}^{n−1} δ_{X_t},   with L_n ∈ M for n ≥ 1

Notation: for µ a measure and g a function on X,

⟨µ, g⟩ = µ(g) := ∫ g(y) µ(dy),   so that   ⟨L_n, g⟩ = (1/n) ∑_{t=0}^{n−1} g(X_t)

Page 26

Sanov's Theorem

State space:  X        Probability measures:  M

Empirical measures:   L_n := (1/n) ∑_{t=0}^{n−1} δ_{X_t},   with L_n ∈ M for n ≥ 1

Notation: for µ a measure and g a function on X,

⟨µ, g⟩ = µ(g) := ∫ g(y) µ(dy)

Relative entropy:   D(ν‖µ) = ⟨ν, log(dν/dµ)⟩ = ∫ log( dν/dµ (x) ) ν(dx)
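For distributions on a finite alphabet the relative entropy is a one-line computation; a minimal helper (the two example distributions are arbitrary):

```python
import numpy as np

def relative_entropy(nu, mu):
    """D(nu || mu) = <nu, log(dnu/dmu)> for distributions on a common finite alphabet."""
    nu, mu = np.asarray(nu, float), np.asarray(mu, float)
    mask = nu > 0
    if np.any(mu[mask] == 0):
        return np.inf                    # nu not absolutely continuous w.r.t. mu
    return float(np.sum(nu[mask] * np.log(nu[mask] / mu[mask])))

print(relative_entropy([0.5, 0.5], [0.9, 0.1]))   # about 0.51 nats
print(relative_entropy([0.9, 0.1], [0.9, 0.1]))   # 0.0
```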

Page 27

Sanov's Theorem

L_n := (1/n) ∑_{t=0}^{n−1} δ_{X_t}

Law of large numbers:   L_n → µ  as  n → ∞

[Figure: the simplex of probability measures, with empirical measures L_n converging to µ]

Page 28

Sanov's Theorem

K ⊂ M  a convex set of probability measures, with  µ ∉ K

n^{-1} log P{L_n ∈ K} → − ?

[Figure: the set K in the simplex of probability measures, with µ outside K and the empirical measures L_n near µ]

Page 29

Sanov's Theorem

K ⊂ M  a convex set of probability measures,  µ ∉ K

n^{-1} log P{L_n ∈ K} → − inf_{ν∈K} J(ν) = −η

Q_η = {ν : J(ν) < η}

[Figure: the sub-level set Q_η around µ, grown until it reaches K]

Page 30

Sanov's Theorem

Q_η = {ν : J(ν) < η}

i.i.d. source:   J(ν) = D(ν‖µ)

Markov:   J(ν) = inf{ D(ν ⊙ P̌ ‖ ν ⊙ P) : P̌ a transition kernel with ν invariant }

[Figure: the sub-level set Q_η around µ in the simplex]

Page 31

Sanov's Theorem

Example:   K = {ν : ⟨ν, g⟩ ≥ c}

n^{-1} log P{L_n ∈ K} → − inf_{ν∈K} J(ν) = −η = −Λ∗(c)
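A numerical check of this example on a finite alphabet: the constrained infimum of D(ν‖µ) over K = {ν : ⟨ν, g⟩ ≥ c} is attained at an exponentially tilted version of µ, and its value coincides with the Legendre transform Λ∗(c). The base distribution, g, and c below are arbitrary choices:

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar

vals = np.array([0.0, 1.0, 2.0])     # alphabet, with g(x) = x
mu   = np.array([1/3, 1/3, 1/3])     # base distribution
c    = 1.5                           # threshold, above the mean c_0 = 1

Lambda = lambda th: np.log(np.dot(mu, np.exp(th * vals)))

def tilted_mean(th):
    """Mean of g under the tilted measure nu_th(x) proportional to mu(x) exp(th*g(x))."""
    w = mu * np.exp(th * vals)
    w /= w.sum()
    return np.dot(w, vals)

# The infimum of D(nu || mu) over K is attained at the tilt with <nu_th, g> = c
theta_c = brentq(lambda th: tilted_mean(th) - c, 0.0, 50.0)
sanov_value = theta_c * c - Lambda(theta_c)

legendre = -minimize_scalar(lambda th: -(th * c - Lambda(th)),
                            bounds=(0.0, 50.0), method="bounded").fun
print(sanov_value, legendre)         # the two numbers agree: inf_K D(nu||mu) = Lambda*(c)
```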

Page 32

Sanov's Theorem

Example:   K = {ν : ⟨ν, g⟩ ≥ c},  with boundary  {ν : ⟨ν, g⟩ = c}

n^{-1} log P{L_n ∈ K} → − inf_{ν∈K} J(ν) = −η = −Λ∗(c)

Q_η = {ν : J(ν) < η}

[Figure: the half-space K touching the sub-level set Q_η on the hyperplane ⟨ν, g⟩ = c]

Page 33

Outline

Introduction

Relative entropy & Large deviations

Hypothesis testing

Channel capacity

Conclusions


Page 34

Neyman Pearson Hypothesis Testing

Observations X = {X_t : t = 1, 2, . . . , N}, i.i.d. with marginal π_j under H_j, j = 0, 1

Hypothesis test:   φ(X) = 1 if H_1 is declared true, based on the N observations

Error probabilities:

P_{e,0} = P_0{φ(X) = 1},    P_{e,1} = P_1{φ(X) = 0}

N-P criterion:    inf_φ P_{e,1}   subject to   P_{e,0} ≤ e^{−Nη}
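A sketch of the resulting universal test for a finite alphabet, in the form used on the next slides: declare H_0 exactly when the empirical distribution lies in the relative-entropy ball Q_η(π_0). The alphabet, the two marginals, η, and N below are illustrative:

```python
import numpy as np

def empirical_distribution(x, alphabet):
    """L_N: the fraction of observations equal to each symbol."""
    return np.array([(x == a).mean() for a in alphabet])

def relative_entropy(nu, mu):
    mask = nu > 0
    return np.inf if np.any(mu[mask] == 0) else float(np.sum(nu[mask] * np.log(nu[mask] / mu[mask])))

def universal_test(x, alphabet, pi0, eta):
    """phi = 0 (declare H_0) iff L_N is in Q_eta(pi0), i.e. D(L_N || pi0) <= eta."""
    L = empirical_distribution(x, alphabet)
    return 0 if relative_entropy(L, pi0) <= eta else 1

rng = np.random.default_rng(2)
alphabet = np.array([0, 1])
pi0, pi1 = np.array([0.8, 0.2]), np.array([0.5, 0.5])
eta, N = 0.05, 200
x = rng.choice(alphabet, size=N, p=pi1)          # data generated under H_1
print(universal_test(x, alphabet, pi0, eta))     # typically 1: H_1 declared
```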

Page 35

Neyman Pearson Hypothesis Testing

Observations X = {X_t : t = 1, 2, . . . , N}, i.i.d. with marginal π_j under H_j, j = 0, 1

Error probabilities:

P_{e,0} = P_0{φ(X) = 1},    P_{e,1} = P_1{φ(X) = 0}

N-P criterion:    inf_φ P_{e,1}   subject to   P_{e,0} ≤ e^{−Nη}

Solution:   φ(X) = 0  if  L_n ∈ Q_η(π_0)

[Figure: the sub-level set Q_η(π_0) around π_0, with π_1 outside]

Page 36

Neyman Pearson Hypothesis Testing

Solution:   φ(X) = 0  if  L_n ∈ Q_η(π_0)

lim_{N→∞} N^{-1} log P_0{φ_N = 1} = −η

lim_{N→∞} N^{-1} log P_1{φ_N = 0} = −β∗

[Figure: Q_η(π_0) around π_0, with π_1 outside]

Page 37

Neyman Pearson Hypothesis Testing

Solution:   φ(X) = 0  if  L_n ∈ Q_η(π_0)

lim_{N→∞} N^{-1} log P_0{φ_N = 1} = −η

lim_{N→∞} N^{-1} log P_1{φ_N = 0} = −β∗

β∗ = inf{J_1(ν) : J_0(ν) ≤ η}
   = inf{β > 0 : Q_β(π_1) ∩ Q_η(π_0) ≠ ∅}

[Figure: Q_η(π_0) and Q_{β∗}(π_1) meeting at a separating hyperplane]

Page 38

Robust Neyman Pearson Hypothesis Testing

Uncertainty classes defined by moment constraints

π_0 ∈ P_0,    π_1 ∈ P_1

[Figure: the uncertainty classes P_0 and P_1 in the simplex of probability measures]

Page 39

Robust Neyman Pearson Hypothesis Testing

Uncertainty classes defined by moment constraints

π_0 ∈ P_0,    π_1 ∈ P_1

[Figure: the uncertainty classes P_0 and P_1, and the enlarged sub-level set Q_η(P_0) around P_0]

Page 40

Robust Neyman Pearson Hypothesis Testing

Uncertainty classes defined by moment constraints

β∗ = inf_{π_1∈P_1}  inf_{µ∈Q_η(P_0)}  D(µ ‖ π_1)

There exist π_0∗ ∈ P_0, π_1∗ ∈ P_1, and µ∗ solving this pair of infima.

[Figure: P_0, P_1, Q_η(P_0), and the optimizers π_0∗, π_1∗, µ∗]

Page 41

Robust Neyman Pearson Hypothesis Testing

Uncertainty classes defined by moment constraints

Optimizers again discrete

β∗ = inf_{π_1∈P_1}  inf_{µ∈Q_η(P_0)}  D(µ ‖ π_1)

There exist π_0∗ ∈ P_0, π_1∗ ∈ P_1, and µ∗ solving this pair of infima.

[Figure: Q_η(P_0) and Q_{β∗}(P_1) meeting at µ∗, separated by the hyperplane ⟨µ, log ℓ⟩ = ⟨µ∗, log ℓ⟩ determined by the optimal likelihood ratio]

Page 42

Outline

Introduction

Relative entropy & Large deviations

Hypothesis testing

Channel capacity

Conclusions


Page 43

Channel Coding and Sanov's Theorem

Channel kernel

N-dimensional code words

X is i.i.d. with marginal distribution µ

Y is i.i.d. with marginal distribution π

N-dimensional output Y received

X^i,  i = 1, 2, . . . , e^{NR}

P(dy | x) = P{Y_t ∈ dy | X_t = x}

π( · ) = ∫ P( · | x) µ(dx)

Page 44

Channel Coding and Sanov's Theorem

Channel kernel:   P(dy | x) = P{Y_t ∈ dy | X_t = x}

N-dimensional code words:   X^i,  i = 1, 2, . . . , e^{NR}

N-dimensional output Y received

If i is the true codeword, then (X^i, Y) has marginal distribution   µ ⊙ P (dx, dy) = µ(dx) P(dy | x)

Otherwise, independence:   µ ⊗ π (dx, dy) = µ(dx) π(dy)
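For a finite-alphabet channel both joint laws, and the relative entropy between them (which reappears as the mutual information on Page 48), can be tabulated directly; the kernel and input distribution below are illustrative:

```python
import numpy as np

P  = np.array([[0.9, 0.1],
               [0.2, 0.8]])          # P[x, y] = P(y | x), a toy kernel
mu = np.array([0.6, 0.4])            # input distribution

joint   = mu[:, None] * P            # (mu ⊙ P)(x, y) = mu(x) P(y | x)
pi      = joint.sum(axis=0)          # output marginal pi
product = np.outer(mu, pi)           # (mu ⊗ pi)(x, y) = mu(x) pi(y)

# Mutual information I(X;Y) = D(mu ⊙ P || mu ⊗ pi), in nats
mask = joint > 0
print(np.sum(joint[mask] * np.log(joint[mask] / product[mask])))
```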

Page 45

Channel Coding and Sanov's Theorem

Two hypotheses based on the observations:

H_0:   µ ⊗ π (dx, dy) = µ(dx) π(dy)         (independence)

H_1:   µ ⊙ P (dx, dy) = µ(dx) P(dy | x)      (codeword i sent)

Page 46

Channel Coding and Sanov's Theorem

Two hypotheses based on the observations:

H_0:   µ ⊗ π (dx, dy) = µ(dx) π(dy)

H_1:   µ ⊙ P (dx, dy) = µ(dx) P(dy | x)

Empirical distributions for the joint observations (X^i, Y)

Solution: Reject codeword i (φ = 0)  if  L_n ∈ Q_η(π_0)

Page 47

Channel Coding and Sanov's Theorem

Solution:   φ = 0  if  L_n ∈ Q_η(π_0)

lim_{N→∞} N^{-1} log P_0{φ_N = 1} = −η

The error probability e^{−Nη} must be multiplied by the number of codewords, e^{NR}.

For vanishing error,   e^{NR} × e^{−Nη} < 1,   that is,   R < η

[Figure: Q_η(π_0) around µ ⊗ π, with µ ⊙ P outside]

Page 48

Channel Coding and Sanov's Theorem

Solution:   φ = 0  if  L_n ∈ Q_η(π_0)

lim_{N→∞} N^{-1} log P_0{φ_N = 1} = −η

The error probability e^{−Nη} is multiplied by e^{NR}, so reliable decoding requires

R < η_max = max_µ D(µ ⊙ P ‖ µ ⊗ π) = mutual information
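The maximization over µ is not spelled out on the slide; for a discrete memoryless channel one standard way to carry it out is the Blahut-Arimoto iteration, sketched below (the kernel values and iteration count are arbitrary):

```python
import numpy as np

def blahut_arimoto(P, iters=200):
    """Capacity (in nats) and capacity-achieving input law for a kernel P[x, y] = P(y|x)."""
    nx = P.shape[0]
    mu = np.full(nx, 1.0 / nx)
    for _ in range(iters):
        pi = mu @ P                              # output marginal
        q = (mu[:, None] * P) / pi[None, :]      # posterior q(x | y)
        log_r = np.sum(P * np.log(q + 1e-300), axis=1)
        mu = np.exp(log_r - log_r.max())         # update mu(x) ~ exp(sum_y P(y|x) log q(x|y))
        mu /= mu.sum()
    joint = mu[:, None] * P
    pi = joint.sum(axis=0)
    mask = joint > 0
    C = np.sum(joint[mask] * np.log(joint[mask] / np.outer(mu, pi)[mask]))
    return C, mu

P = np.array([[0.9, 0.1],
              [0.3, 0.7]])                       # toy kernel
C, mu_star = blahut_arimoto(P)
print(C, mu_star)                                # capacity (nats/symbol), optimal mu
```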

Page 49

Error Exponent

lim_{N→∞} −N^{-1} log P{error} = E(R, µ)

Formula expressed as the solution to a robust hypothesis testing problem:

For a given input distribution µ, denote the product measures on X × Y with first marginal µ by

P_0 = { µ ⊗ ν : ν is a probability measure on Y }

Page 50

Error Exponent

lim_{N→∞} −N^{-1} log P{error} = E(R, µ)

Formula expressed as the solution to a robust hypothesis testing problem:

For a given input distribution µ, denote the product measures on X × Y with first marginal µ by

P_0 = { µ ⊗ ν : ν is a probability measure on Y }

Hypothesis H_0: Code word i not sent; (X^i_j) and (Y_j) independent

Test: Empirical distribution within an entropy ball around P_0

Page 51

Error Exponent

H_0: {(X^i_j, Y_j) : j = 1, . . . , N} has marginal distribution π_0 ∈ P_0

H_1: {(X^i_j, Y_j) : j = 1, . . . , N} has marginal distribution π_1 := µ ⊙ P

Entropy neighborhood of P_0:    Q^+_R(P_0) = { γ : min_ν D(γ ‖ µ ⊗ ν) ≤ R }

Entropy neighborhood of π_1:    Q^+_β(µ ⊙ P) = { γ : D(γ ‖ µ ⊙ P) ≤ β }

Page 52

Error Exponent

lim_{N→∞} −N^{-1} log P{error} = E(R, µ)

= infimum over β such that these entropy neighborhoods meet

[Figure: Q^+_β(µ ⊙ P) around µ ⊙ P and Q^+_R(P_0) around P_0, meeting when β = E(R, µ)]

Page 53

Error Exponent

lim_{N→∞} −N^{-1} log P{error} = E(R, µ)

E(R, µ) = inf{ β : Q^+_β(µ ⊙ P) ∩ Q^+_R(P_0) ≠ ∅ }

E(R) = random coding exponent = supremum over µ of E(R, µ)

[Figure: the entropy neighborhoods Q^+_β(µ ⊙ P) and Q^+_R(P_0) meeting]
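A numerical sketch of the random coding exponent for a toy channel. Instead of the entropy-neighborhood description on the slide, it uses Gallager's classical formula for the exponent of an i.i.d. random codebook, E(R, µ) = max over 0 ≤ ρ ≤ 1 of [E_0(ρ, µ) − ρR]; taking the supremum over µ then gives the random coding exponent E_r(R). The binary symmetric channel and the rates below are illustrative:

```python
import numpy as np

def E0(rho, mu, P):
    """Gallager's E_0(rho, mu) for a discrete memoryless channel P[x, y] = P(y|x), in nats."""
    inner = np.sum(mu[:, None] * P ** (1.0 / (1.0 + rho)), axis=0)
    return -np.log(np.sum(inner ** (1.0 + rho)))

def error_exponent(R, mu, P, grid=np.linspace(0.0, 1.0, 201)):
    """E(R, mu) = max over 0 <= rho <= 1 of [E_0(rho, mu) - rho * R]."""
    return max(E0(rho, mu, P) - rho * R for rho in grid)

eps = 0.1
P = np.array([[1 - eps, eps],
              [eps, 1 - eps]])       # binary symmetric channel
mu = np.array([0.5, 0.5])            # uniform input (optimal for the BSC by symmetry)
for R in [0.05, 0.2, 0.35]:          # rates in nats/symbol
    print(R, error_exponent(R, mu, P))
```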

Page 54

Outline

Introduction

Relative entropy & Large deviations

Hypothesis testing

Channel capacity

Conclusions


Page 55

Summary

Large Deviations is the grand unifying principle of Information Theory

Page 56

Summary

Standard coding is based on AWGN models

This may be unrealistic for wireless channels with fading

Discrete distributions arise in coding and in other applications involving optimization over M

Extremal distributions arise in worst-case models


Page 57

What's Next?

Part II: Channel models

Convex optimization and channel coding

Cutting plane algorithm

Part III: Worst-case models

Extremal distributions