

Least Squares Superposition Codes of Moderate Dictionary Size,
Reliable at Rates up to Capacity

Antony Joseph

Department of Statistics

Yale University

Joint work with Andrew Barron

June 13-18, 2009

2010 IEEE International Symposium on Information Theory

Barron, Joseph (Yale) Least Squares, Superposition codes 1 / 22


Outline

1 The Gaussian channel

2 Sparse Superposition Codes

3 Partitioned Superposition Codes

    Encoding

    Performance of Least Squares Decoding

4 Analysis of Reliability at Rates up to Capacity


Gaussian Channel (Power P, Noise variance σ²)

Block diagram: message u (length K bits) → Encoder → codeword x (length n) → Channel, which adds noise ε ∼ Normal(0, σ² I_n) → received y → Decoder → estimate û

Characteristics

Power constraint: Ave ‖x‖² ≤ nP

Signal-to-noise ratio: v = P/σ²

Capacity: $C = \frac{1}{2} \log(1 + P/\sigma^2)$

Interested in

Rate: R = K/n

Prob{û ≠ u}

Low fraction of mistakes
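The quantities on this slide can be evaluated directly. The sketch below is illustrative only: the function names are made up for this example, and it assumes logarithms base 2, so that capacity and rate are in bits per channel use (this is consistent with the values C = 2 at P/σ² = 15 quoted with the error exponent plot later in the deck).

```python
import math


def snr(power: float, noise_var: float) -> float:
    """Signal-to-noise ratio v = P / sigma^2."""
    return power / noise_var


def capacity_bits(power: float, noise_var: float) -> float:
    """Capacity C = (1/2) log2(1 + P / sigma^2), in bits per channel use."""
    return 0.5 * math.log2(1.0 + snr(power, noise_var))


def rate(message_bits: float, blocklength: int) -> float:
    """Rate R = K / n, in bits per channel use."""
    return message_bits / blocklength


# P / sigma^2 = 15 gives 1 + v = 16, so C = (1/2) * 4 = 2 bits.
C = capacity_bits(15.0, 1.0)
R = rate(1300, 929)
print(C, R, R < C)
```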


Gaussian Channel

Challenges in Code construction

Achieve Rate arbitrarily close to Capacity

Good error exponents

Manageable codebook

Fast Encoding

Fast Decoding


Sparse Superposition Codes

Model : Y = Xβ + ε

Block diagram: message u (length K) → coefficient vector β → Encoder → codeword Xβ (length n) → Channel, which adds noise ε ∼ N(0, σ² I_n) → received Y → Decoder → estimate û

Code design

n × N design matrix X

Received string: Y = Xβ + ε

Coefficient vector β is sparse: it has few non-zero values

Not a linear code in the algebraic coding sense


Sparse Superposition Codes

X = [ ... ] (the n × N dictionary)

β = (..., 1, ..., 1, ..., 1, ...)

Code design

n × N matrix X with entries independent Normal(0, P/L)

β has exactly L non-zero elements, with L ≪ N, all equal to 1

Average of ‖Xβ‖² ≤ nP

Codeword is the sum of the selected columns
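A minimal sketch of this construction, using only the Python standard library. The sizes n, N, L and the power P below are arbitrary illustration values, not taken from the talk, and the helper names are hypothetical. Each codeword coordinate is a sum of L independent Normal(0, P/L) entries, so the empirical power per coordinate should be close to P.

```python
import random


def make_dictionary(n, N, P, L, rng):
    """n x N matrix (list of rows) with independent Normal(0, P/L) entries."""
    sd = (P / L) ** 0.5
    return [[rng.gauss(0.0, sd) for _ in range(N)] for _ in range(n)]


def codeword(X, support):
    """Sum of the selected columns, i.e. X times beta with beta = 1 on `support`."""
    return [sum(row[j] for j in support) for row in X]


rng = random.Random(0)
n, N, P, L = 200, 64, 4.0, 8          # illustration values only
X = make_dictionary(n, N, P, L, rng)
support = rng.sample(range(N), L)     # positions of the L non-zero elements
x = codeword(X, support)

# Empirical power per coordinate; its expectation is L * (P / L) = P.
avg_power = sum(t * t for t in x) / n
print(avg_power)
```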


Partitioned Superposition Codes

X = [ Section 1 | Section 2 | ... | Section L ], with B columns in each section

β = (..., 1, ... | ..., 1, ... | ... | ..., 1, ...), with positions within each section indexed 0, 1, ..., B−1

Code design

n × N matrix X with entries independent Normal(0, P/L)

Columns of X divided into L sections of size B, so N = LB

β has exactly one non-zero element in each section

Relationship: L, B, n and R

Number of codewords: $B^L$

Length of message string: $K = L \log_2 B$ bits

Blocklength: $n = (1/R) L \log B$
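The relationships between L, B, n and R can be checked numerically. In this sketch the function name is hypothetical, the rate R is taken in bits per channel use (so the logarithm in the blocklength formula is base 2), and rounding n up to an integer is an assumption of the example. The input values are the ones quoted with the error exponent plot later in the deck (L = 100, B = 8192, R = 0.7C with C = 2).

```python
import math


def code_parameters(L: int, B: int, R: float) -> dict:
    """Dictionary size, message length and blocklength of a partitioned code.

    R is in bits per channel use; rounding n up is an assumption of this sketch.
    """
    K = L * math.log2(B)              # message length in bits
    return {
        "N": L * B,                   # dictionary size
        "num_codewords": B ** L,      # one column chosen per section
        "K_bits": K,
        "n": math.ceil(K / R),        # blocklength n = (1/R) L log2 B
    }


params = code_parameters(L=100, B=8192, R=0.7 * 2.0)
print(params["N"], params["K_bits"], params["n"])
```

With these inputs the sketch reproduces the N = 819200 and n = 929 listed on that later slide.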


Encoding

Example message u = 0 1 1 1 0 0 0 0 1 1 0 1, split into three substrings of four bits:

0111 0000 1101, giving the addresses 7, 0, 13

Non-zero elements of β:

7th element from Section 1

0th element from Section 2

13th element from Section 3

Map from Message u to β

Assume the length of u is K = L log2 B. For example, take L = 3 and log2 B = 4.

Split u into L substrings of log2 B bits

Substrings give addresses of chosen columns

For the map u → β, choose the non-zero element of β in each section as given by these indices
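The map from u to β described on this slide can be sketched as follows. The function names are made up for this example, and the message is represented as a string of '0' and '1' characters purely for convenience; the input is the slide's own example with L = 3 and log2 B = 4.

```python
from typing import List


def message_to_indices(bits: str, L: int, log2_B: int) -> List[int]:
    """Split the message into L substrings of log2_B bits and read each as an address."""
    if len(bits) != L * log2_B:
        raise ValueError("message length must equal L * log2(B)")
    return [int(bits[i * log2_B:(i + 1) * log2_B], 2) for i in range(L)]


def indices_to_beta(indices: List[int], B: int) -> List[int]:
    """Coefficient vector with a single 1 in each section, at the given address."""
    beta = [0] * (len(indices) * B)
    for section, address in enumerate(indices):
        beta[section * B + address] = 1
    return beta


indices = message_to_indices("011100001101", L=3, log2_B=4)
beta = indices_to_beta(indices, B=16)
print(indices)  # [7, 0, 13]
```

Here sections are counted from 0 inside the code, so the 1 for Section 2 of the slide sits at overall position 16 and the one for Section 3 at position 32 + 13 = 45.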


Decoding, Least Squares

Least Squares Decoder

Find β̂ which minimizes ‖Y − Xβ‖² over β's of the assumed form

Reliability

Want the error β̂ ≠ β* to have small probability, when β* is sent

Small probability of block error

Less stringent: want β̂ to differ from β* in at most a fraction α of the sections

Small probability that the section error rate is greater than α
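For very small L and B the least squares decoder can be written as an exhaustive search over all B^L allowed coefficient vectors. The sketch below is only meant to make the definition concrete: the helper names and the toy sizes are assumptions of this example, and the search is exponential in L, so it says nothing about practical decoding.

```python
import itertools
import random


def codeword(X, indices, B):
    """Sum of the chosen column from each section (column section * B + address)."""
    return [sum(row[s * B + j] for s, j in enumerate(indices)) for row in X]


def least_squares_decode(X, Y, L, B):
    """Exhaustive search over all B**L allowed beta; feasible only for tiny L and B."""
    best, best_cost = None, float("inf")
    for indices in itertools.product(range(B), repeat=L):
        fit = codeword(X, indices, B)
        cost = sum((y - f) ** 2 for y, f in zip(Y, fit))
        if cost < best_cost:
            best, best_cost = indices, cost
    return list(best)


def section_error_rate(decoded, sent):
    """Fraction of sections in which the decoded address differs from the sent one."""
    return sum(d != s for d, s in zip(decoded, sent)) / len(sent)


rng = random.Random(1)
L, B, n, P, sigma = 3, 4, 12, 3.0, 0.1    # toy values, for illustration only
X = [[rng.gauss(0.0, (P / L) ** 0.5) for _ in range(L * B)] for _ in range(n)]
sent = [2, 0, 3]
Y = [x + rng.gauss(0.0, sigma) for x in codeword(X, sent, B)]
decoded = least_squares_decode(X, Y, L, B)
print(decoded, section_error_rate(decoded, sent))
```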


Decoding, Least Squares

Least Squares is the optimal decoder. It minimizes the probability of error under a uniform distribution on input strings

The analysis here is of the performance of Least Squares, without concern for computational feasibility

A computationally feasible algorithm is discussed today, Session S-Fr-3, 2:40-4:00 p.m.


Performance of Least Squares

Result 1: Section size

To achieve rates up to capacity, the section size B need only be a polynomial in L.

In particular, $B = L^{a_v}$, where $a_v$ is a function of only v.

$a_v$ is a decreasing function of v

$a_v$ is near 1 for large v

Dictionary size $N = L^{1 + a_v}$


Plot of $a_v$

[Figure: $a_v$ (section size rate) plotted against v, for v from 5 to 95; vertical axis from 1 to 6.]


Performance of Least Squares

Result 2: Error Exponent

Let ε = Prob{# section mistakes > αL} be the probability that more than a fraction α of the sections are wrong. Then,

$\epsilon \le \exp\{-n\, c_v \min(\alpha, (C - R)^2)\}$

where $c_v$ is a constant that depends only on v.
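Written out by cases, the minimum in the exponent separates two regimes. This is only a restatement of the bound above, with no new constants introduced:

```latex
\epsilon \le
\begin{cases}
\exp\{-n\, c_v\, \alpha\},       & \text{if } \alpha \le (C - R)^2, \\
\exp\{-n\, c_v\, (C - R)^2\},    & \text{if } \alpha > (C - R)^2.
\end{cases}
```

So tolerating a larger fraction α of section mistakes improves the exponent only until α reaches order (C − R)², after which the gap to capacity is the limiting term.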


Performance of Least Squares

From Partially Correct to Completely Correct Decoding

Let R be a rate for which the partitioned superposition code has

Prob{# section mistakes > αL} = ε.

Then through composition with an outer Reed-Solomon code, one obtains a code with rate $R_{tot} = (1 - 2\alpha)R$ and

Prob{block error} = ε

Corollary 3: Block error probability

Taking α of order $(C - R)^2$ we have

$\mathrm{Prob}\{\text{block error}\} \le \exp\{-n\, c_v (C - R_{tot})^2\}$

Optimal form of exponent, as in Shannon & Gallager or Polyanskiy et al.
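The rate cost of the outer code is simple to tabulate. In this sketch the function name and the sample values of α are illustrative assumptions; the inner rate 1.4 corresponds to R = 0.7C with C = 2, the setting quoted with the error exponent plot later in the deck.

```python
def total_rate(inner_rate: float, alpha: float) -> float:
    """Overall rate R_tot = (1 - 2 * alpha) * R after composing with the outer code."""
    if not 0.0 <= alpha < 0.5:
        raise ValueError("alpha must lie in [0, 0.5) for a positive overall rate")
    return (1.0 - 2.0 * alpha) * inner_rate


# Illustrative values of alpha only; they are not taken from the talk.
for alpha in (0.0, 0.01, 0.05):
    print(alpha, total_rate(1.4, alpha))
```

A small α therefore costs only a small fraction 2α of the inner rate, which is why taking α of order (C − R)² keeps R_tot close to R.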


Performance of Least Squares

Comparison with PPV curve

Polyanskiy, Poor and Verdú demonstrate that for the Gaussian channel the following approximate relation holds for an optimal code

$R \approx C - \sqrt{\frac{V}{n}}\, Q^{-1}(\epsilon) + \frac{1}{2}\, \frac{\log n}{n}$

$V = \frac{v}{2}\, \frac{v + 2}{(v + 1)^2}\, \log^2 e$ is the channel dispersion

Q = 1 − Φ, where Φ is the Gaussian distribution function
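The approximation can be evaluated with the standard library alone. The sketch below assumes logarithms base 2 throughout, so that R, C and the log n term are in bits; the function names are hypothetical. The inputs SNR = 100 and block error probability 10^-4 are the values stated for the comparison plot, while n = 500 is simply the right edge of that plot's axis.

```python
import math
from statistics import NormalDist


def capacity_bits(v: float) -> float:
    """C = (1/2) log2(1 + v)."""
    return 0.5 * math.log2(1.0 + v)


def dispersion_bits(v: float) -> float:
    """V = (v/2) (v + 2) / (v + 1)^2 * (log2 e)^2."""
    return (v / 2.0) * (v + 2.0) / (v + 1.0) ** 2 * math.log2(math.e) ** 2


def q_inverse(eps: float) -> float:
    """Inverse of Q = 1 - Phi, the upper tail of the standard normal."""
    return NormalDist().inv_cdf(1.0 - eps)


def ppv_rate(n: int, v: float, eps: float) -> float:
    """R ~ C - sqrt(V / n) * Qinv(eps) + (1/2) log2(n) / n."""
    return (capacity_bits(v)
            - math.sqrt(dispersion_bits(v) / n) * q_inverse(eps)
            + 0.5 * math.log2(n) / n)


r = ppv_rate(500, 100.0, 1e-4)
print(capacity_bits(100.0), r)
```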


Comparison with PPV curve

[Figure: rate R (bits/ch. use, from 0 to 3) plotted against blocklength n (from 0 to 500). Curves: Capacity, PPV curve, Superposition code (Partitioned). SNR = 100, block error prob. = $10^{-4}$.]


Superpositions for Multiple users vs our Single user channel

Cover introduced superposition codes for multi-user Gaussian channels

Codeword sent is sum of codewords for respective users

Here we use superpositions for simplification of the single user channel


Relationship to Sparse Signal Recovery

Wainwright and others: necessary and sufficient conditions on n for sparsity recovery. When applied to our setting, where N ≫ L, these give that the n required is constant × L log(N/L)

It is natural to call (the reciprocal of) the best constant, for a given set of allowed signals and given noise distribution, the compressed sensing capacity or signal recovery capacity.

For our setting n = (1/R) L log B, so that the best constant is 1/C and thus the signal recovery capacity is equal to the Shannon capacity.
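The step connecting the two expressions for n uses only the partition structure N = LB, so it can be spelled out in one line:

```latex
N = LB \;\Longrightarrow\; \log(N/L) = \log B,
\qquad\text{hence}\qquad
n = \frac{1}{R}\, L \log B = \frac{1}{R}\, L \log(N/L).
```

The constant multiplying L log(N/L) is therefore 1/R, and since reliability is shown for rates R up to C, the smallest constant is 1/C.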


Error Probability Bound

Analysis

Least squares estimate β̂ satisfies $|Y - X\hat\beta|^2 \le |Y - X\beta^*|^2$

Error event of a fraction α = ℓ/L of section mistakes is contained in

$E_\alpha = \{\, |Y - X\beta|^2 \le |Y - X\beta^*|^2 \text{ for some } \beta \in \mathrm{Wrong}_\alpha \,\}$

where $\mathrm{Wrong}_\alpha$ is the set of β differing from β* in a fraction α of the sections.

Our bound:

$\mathrm{Prob}[E_\alpha] \le \binom{L}{\alpha L} \exp\{-n D_\alpha\}$

The exponent $D_\alpha$ is sufficiently large to cancel the combinatorial coefficient, provided $B \ge L^{a_v}$ and R < C.
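To see what "cancelling the combinatorial coefficient" demands, one can compute how large the exponent must be before the bound drops below 1. This is a sketch under stated assumptions: the function names are made up, natural logarithms are assumed in the exponential, and L = 100, n = 929 are the values quoted with the error exponent plot later in the deck, with ℓ = 10 chosen arbitrarily.

```python
import math


def log_binomial(L: int, ell: int) -> float:
    """Natural log of the number of ways to choose which ell sections are wrong."""
    return math.log(math.comb(L, ell))


def exponent_needed(L: int, ell: int, n: int) -> float:
    """Smallest D for which comb(L, ell) * exp(-n * D) <= 1."""
    return log_binomial(L, ell) / n


# ell = 10 wrong sections out of L = 100, i.e. alpha = 0.1 (arbitrary choice).
print(exponent_needed(100, 10, 929))
```

Any $D_\alpha$ above this threshold makes the bound non-trivial, and the excess over the threshold is what remains as the effective exponent.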


Some ingredients of the error exponent

Analysis

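Ingredients in $D_\alpha = D(\Delta_\alpha, \rho_\alpha^2)$

$\Delta_\alpha = \alpha(C - R) + (C_\alpha - \alpha C)$

$C_\alpha = (1/2) \log(1 + \alpha v)$

$1 - \rho_\alpha^2 = \alpha(1 - \alpha)v / (1 + \alpha^2 v)$

$D(\Delta, \rho^2)$ is the cumulant generating function for $(1/2)(Z_1^2 - Z_2^2)$, with $Z_1, Z_2$ bivariate normal, mean zero, unit variance and correlation ρ.

Near $(1/2)\Delta^2/(1 - \rho^2)$ for small Δ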
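The ingredients listed here are closed-form and easy to evaluate. The sketch below computes them exactly as written on the slide, and then the small-Δ approximation rather than D itself, whose exact form is not given in the slides. Natural logarithms are assumed (so C and R are in nats), the function names are hypothetical, and the approximation is only meaningful where Δ is small.

```python
import math


def c_alpha(alpha: float, v: float) -> float:
    """C_alpha = (1/2) log(1 + alpha * v), in nats; alpha = 1 gives the capacity C."""
    return 0.5 * math.log(1.0 + alpha * v)


def delta_alpha(alpha: float, v: float, R: float) -> float:
    """Delta_alpha = alpha * (C - R) + (C_alpha - alpha * C)."""
    C = c_alpha(1.0, v)
    return alpha * (C - R) + (c_alpha(alpha, v) - alpha * C)


def one_minus_rho_sq(alpha: float, v: float) -> float:
    """1 - rho_alpha^2 = alpha * (1 - alpha) * v / (1 + alpha^2 * v), as on the slide."""
    return alpha * (1.0 - alpha) * v / (1.0 + alpha ** 2 * v)


def small_delta_approx(alpha: float, v: float, R: float) -> float:
    """(1/2) Delta^2 / (1 - rho^2): the small-Delta approximation, not D itself."""
    return 0.5 * delta_alpha(alpha, v, R) ** 2 / one_minus_rho_sq(alpha, v)


v = 15.0
C = c_alpha(1.0, v)
R = 0.7 * C
for alpha in (0.1, 0.5, 0.9):
    print(alpha, delta_alpha(alpha, v, R), one_minus_rho_sq(alpha, v))
```

Because log is concave, $C_\alpha - \alpha C$ is non-negative for α in [0, 1], so $\Delta_\alpha$ is at least α(C − R).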


Contributions to Error Exponent

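[Figure: exponent of Prob($E_\alpha$) plotted against α, for α from 0 to 1, with $\alpha_0$ marked on the horizontal axis; vertical axis "error exponents" from 0 to 80.]

Parameters: L = 100, B = 8192, P/σ² = 15, C = 2, R = 0.7C, n = 929, N = 819200

P(# mistakes > $\ell_0$) = $1.8 \times 10^{-12}$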


Summary

Sparse superposition coding is reliable at rates up to channel capacity

Error probability of the optimum form for R < C

Analysis blends modern statistical regression and information theory

Thank You
