
Lecture 8: Polar Codes

November 18, 2016

Information Theory

Polar Codes

1. Introduction

2. Coding

3. Decoding


1. Introduction

Polar codes are a class of codes which make it possible to

1. attain the capacity of all symmetric memoryless channels (= those for which the capacity is attained by a uniform input distribution),

2. with an encoding algorithm of complexity O(N log N) (N = code length),

3. with a decoding algorithm of complexity O(N log N).

This decoding algorithm borrows many ideas from the decoding algorithm used for LDPC codes.


Polar Codes

1. a coding architecture based on the Fast Fourier Transform, in which some of the input bits are fixed to 0,

2. a (suboptimal) decoding algorithm which computes, for each input bit in turn, the probability that it is equal to 0 given the previous input bits and the probabilities of the output bits.


Encoding : example

[Figure (omitted): the encoding circuit (N = 8) applied to an example input. The positions shown in red are frozen to 0; the remaining positions carry the information bits.]

Polar Code : linear code

In the previous case, it is the code with generator matrix

G = ( 1 1 1 1 0 0 0 0 )
    ( 1 1 0 0 1 1 0 0 )
    ( 1 0 1 0 1 0 1 0 )
    ( 1 1 1 1 1 1 1 1 )

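For reference, this G can be recovered as a sub-matrix of the Kronecker power of the 2×2 kernel [[1,0],[1,1]]; the row indices {3, 5, 6, 7} (the non-frozen positions) are read off from the matrix itself rather than stated on the slide. A small Python sketch:

import numpy as np
from functools import reduce

F = np.array([[1, 0], [1, 1]])            # the 2x2 polar kernel
F3 = reduce(np.kron, [F, F, F])           # its third Kronecker power: an 8x8 0/1 matrix
G = F3[[3, 5, 6, 7], :]                   # keep the rows of the non-frozen positions
print(G)
# [[1 1 1 1 0 0 0 0]
#  [1 1 0 0 1 1 0 0]
#  [1 0 1 0 1 0 1 0]
#  [1 1 1 1 1 1 1 1]]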

Decoding : example on the erasure channel

[Figure (omitted): the same example after transmission over an erasure channel; '?' marks the erased positions of the received word.]

Decoding : using the properties of the “base code”

[Figure (omitted): the base transform maps a pair (a, b) to (a + b, b).]

Decoding : an example where all erasures can be recovered

[Figure (omitted): a configuration of received and erased values for the base code in which every erasure can be recovered.]

A configuration where erasures can be partially recovered

[Figure (omitted): a configuration of received and erased values in which only some of the erasures can be recovered.]

Another configuration where erasures can be partially recovered

[Figure (omitted): a second partially recoverable configuration, involving the values a, u and a + u.]
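To make the recovery rules used on these slides concrete, here is a minimal Python sketch (the names are mine, not from the slides) of erasure decoding for a single base pair: the channel delivers possibly erased versions of u+v and v, and None stands for an erasure.

def recover_pair(y_sum, y_v):
    """y_sum, y_v: received values of u+v and v (None = erased).
    Returns (u, v); a component is None when it cannot be recovered."""
    v = y_v
    u = y_sum ^ y_v if (y_sum is not None and y_v is not None) else None
    return u, v

print(recover_pair(1, 0))      # (1, 0): both values received, everything is recovered
print(recover_pair(None, 1))   # (None, 1): only v received, u stays erased
# If u is already known from elsewhere (e.g. it is a frozen bit) and u+v is received,
# v can also be recovered as v = y_sum ^ u even when its direct observation is erased.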

[Figures omitted: the decoding example above is carried out step by step, the erasures being filled in layer by layer using the base-code rules and the fact that the red positions are known to be 0.]

2. Encoding of a polar code

Length : N = 2^n, dimension : 0 ≤ k ≤ N.
Choose a set F ⊂ {0, . . . , N − 1} of N − k positions, which are fixed to 0.
B_t = subset of {0, . . . , N − 1} of indices whose t-th bit is equal to 0.

Encoding algorithm.

Input : u ∈ {0, 1}^N, with u_i = 0 if i ∈ F.
Output : x, the codeword corresponding to u.

x ← u
for t = 0 to n − 1 do
  for all i ∈ B_t do
    x_i ← x_i ⊕ x_{i+2^t}
  end for
end for
return x
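A direct Python transcription of this loop (a sketch; the function name and the example input are mine, the frozen set {0, 1, 2, 4} being the one implied by the generator matrix G above):

def polar_encode(u):
    """Encode u (length N = 2^n, frozen positions already set to 0), following the loop above."""
    x = list(u)
    n = len(x).bit_length() - 1
    for t in range(n):                          # stages t = 0 .. n-1
        for i in range(len(x)):
            if ((i >> t) & 1) == 0:             # i is in B_t: its t-th bit equals 0
                x[i] ^= x[i + (1 << t)]
    return x

# Example with N = 8, frozen set F = {0, 1, 2, 4}; the information positions 3, 5, 6, 7
# are the ones whose rows appear in the generator matrix G above.
u = [0, 0, 0, 1, 0, 0, 1, 1]                    # information bits 1, 0, 1, 1
print(polar_encode(u))                          # [1, 0, 1, 0, 0, 1, 0, 1]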

3. Decoding

Decoding algorithm

Input : y ∈ A^N, the output of the channel corresponding to codeword x.
Output : an estimate û of u.

for all i ∈ {0, 1, . . . , N − 1} \ F do
  compute p_i = Prob(u_i = 1 | y, u_0, . . . , u_{i−1})
  if p_i > 0.5 then
    û_i ← 1
  else
    û_i ← 0
  end if
end for

Why does this work ?

• How to choose F ?

• Can p_i be computed efficiently ?

• Why does this procedure work at all ?

A simple computation

[Figure (omitted): the base scheme; (U1, U2) is mapped to (X1, X2) = (U1 ⊕ U2, U2), and X1, X2 are sent over two uses of the channel, giving Y1, Y2.]

We know p1 = Prob(x1 = 1|y1) and p2 = Prob(x2 = 1|y2). We compute

q1 = Prob(u1 = 1|y1, y2).

The formula

Lemma 1. Let X1 and X2 be two independent binary random variables and denote r_i = Prob(X_i = 1). Then

Prob(X1 ⊕ X2 = 1) = (1 − (1 − 2r1)(1 − 2r2)) / 2.

Application :

q1 = (1 − (1 − 2p1)(1 − 2p2)) / 2.
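As a quick sanity check, a one-line Python version of this combination rule (the function name is mine):

def p_xor(r1, r2):
    """Prob(X1 ^ X2 = 1) for independent bits with Prob(Xi = 1) = ri (Lemma 1)."""
    return (1 - (1 - 2 * r1) * (1 - 2 * r2)) / 2

print(p_xor(0.0, 1.0))   # 1.0 : the XOR of a sure 0 and a sure 1 is surely 1
print(p_xor(0.5, 0.9))   # 0.5 : XOR with a fully unknown bit stays fully unknown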

Another simple computation

[Figure (omitted): the same scheme, with u1 now known at the decoder.]

We know p1 = Prob(x1 = 1|y1), p2 = Prob(x2 = 1|y2) and u1. We compute

q2 = Prob(u2 = 1|u1, y1, y2).

The formula

Lemma 2. A uniformly distributed random bit B is sent through two memoryless channels; y1 and y2 are the corresponding outputs. If we denote r_i = Prob(B = 1|y_i), then

Prob(B = 1|y1, y2) = r1 r2 / (r1 r2 + (1 − r1)(1 − r2)).

Application :

q2 = p1 p2 / (p1 p2 + (1 − p1)(1 − p2))            if u1 = 0,

q2 = (1 − p1) p2 / ((1 − p1) p2 + p1 (1 − p2))      if u1 = 1.
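And its Python counterpart (again, the name is mine), together with the two application cases:

def p_combine(r1, r2):
    """Prob(B = 1 | y1, y2) for a uniform bit observed through two independent channels (Lemma 2)."""
    return r1 * r2 / (r1 * r2 + (1 - r1) * (1 - r2))

p1, p2, u1 = 0.2, 0.3, 1
q2 = p_combine(p1 if u1 == 0 else 1 - p1, p2)   # flip the first observation when u1 = 1
print(q2)                                        # ≈ 0.63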

Notation

We denote by u^t_i the input of the encoding circuit at level t (u^0_i = u_i), and by p^t_i the probabilities for these bits, which are either computed during decoding or, at t = n, given by the channel :

[Figure (omitted): the N = 8 encoding circuit annotated with the bits u^t_i at levels t = 0, 1, 2 and the probabilities p^t_i; the channel provides the p^3_i.]

Decoding algorithm (full version)

for i = 0 to N − 1 do
  for t = 1 to n − 1 do
    compute all the u^t_j that can be obtained directly from the already known u^{t−1}_l (l < i)
  end for
  for t = n − 1 down to 0 do
    compute all the p^t_j that can be obtained directly from the p^{t+1}_k and the known u^t_l (with l < i)
  end for
  if i ∉ F then
    if p^0_i > 0.5 then
      u^0_i ← 1
    else
      u^0_i ← 0
    end if
  end if
end for
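The successive-cancellation computation above is often organized recursively rather than level by level: pair each p_i with p_{i+N/2}, combine the pair with the rule of Lemma 1 to decode the first half of u, re-encode that half, then combine with the rule of Lemma 2 to decode the second half. The sketch below follows this standard recursive form (an equivalent reorganization, not a literal transcription of the pseudo-code above); it reuses polar_encode, p_xor and p_combine from the earlier sketches and the frozen set of the running example.

def sc_decode(p, frozen):
    """Successive-cancellation decoding.
    p[i]      : Prob(x_i = 1 | y_i) given by the channel (0.5 = erasure on a BEC),
    frozen[i] : True if position i is frozen to 0.
    Returns (u_hat, x_hat): decoded input bits and the corresponding re-encoded codeword."""
    N = len(p)
    if N == 1:
        u = 0 if frozen[0] else (1 if p[0] > 0.5 else 0)
        return [u], [u]
    half = N // 2
    # Channel seen by the first half of u: XOR of two observations (Lemma 1).
    p_minus = [p_xor(p[i], p[i + half]) for i in range(half)]
    u_left, x_left = sc_decode(p_minus, frozen[:half])
    # Channel seen by the second half once the first half is known (Lemma 2);
    # the observation through y_i is flipped whenever the re-encoded left bit is 1.
    p_plus = [p_combine(p[i] if x_left[i] == 0 else 1 - p[i], p[i + half])
              for i in range(half)]
    u_right, x_right = sc_decode(p_plus, frozen[half:])
    x_hat = [x_left[i] ^ x_right[i] for i in range(half)] + x_right
    return u_left + u_right, x_hat

# Round trip over a binary erasure channel (an erased position has probability 0.5):
F_frozen = {0, 1, 2, 4}
frozen = [i in F_frozen for i in range(8)]
u = [0, 0, 0, 1, 0, 0, 1, 1]                 # information bits on positions 3, 5, 6, 7
x = polar_encode(u)                          # [1, 0, 1, 0, 0, 1, 0, 1]
y = [0.5 if i in (0, 2, 5) else float(b) for i, b in enumerate(x)]   # erase three positions
u_hat, _ = sc_decode(y, frozen)
print(u_hat == u)                            # True: these three erasures are all resolved
# (After an unlucky tie-break on an erased information bit, p_combine could be fed
#  contradictory observations; a robust implementation would guard against that.)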

Correctness of the algorithm

Lemma 3. At step i of the outer loop and step t of the inner loop of the previous algorithm, the u^t_j that can be calculated correspond to the indices j in the set
{ j : 0 ≤ j ≤ 2^t ⌊i/2^t⌋ − 1 }.

Lemma 4. At step i of the outer loop and step t of the inner loop of the previous algorithm, the p^t_j that can be calculated correspond to the indices j in the set
{ j : 0 ≤ j ≤ 2^t ⌊(i + 2^t)/2^t⌋ − 1 }.

Corollary 1. p^0_i can always be computed at step i of the outer loop.

Modeling the decoder

Decoding of the base code can be modeled by the following transmission over two different channels :

u1 → [channel 1] → (y1, y2)

u2 → [channel 2] → (u1, y1, y2)

and we know these channels : they provide Prob(u1 = 1|y1, y2) and Prob(u2 = 1|u1, y1, y2).

Case of an erasure channel

Assume that

x1 → [erasure channel of prob. p1] → y1

x2 → [erasure channel of prob. p2] → y2

Prob(u1 is erased) = Prob(x1 ⊕ x2 is erased)
                   = Prob(x1 or x2 is erased)
                   = 1 − (1 − p1)(1 − p2)
                   = p1 + p2 − p1p2

Prob(u2 is erased) = Prob(x1 and x2 are erased)
                   = p1p2
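These two erasure probabilities are easy to confirm by simulation; a throwaway Python sketch (names and parameter values are mine):

import random

def erase(bit, p):
    """Send a bit over a BEC(p): return None with probability p, the bit otherwise."""
    return None if random.random() < p else bit

p1, p2, trials = 0.25, 0.4, 200_000
u1_erased = u2_erased = 0
for _ in range(trials):
    u1, u2 = random.randint(0, 1), random.randint(0, 1)
    y1, y2 = erase(u1 ^ u2, p1), erase(u2, p2)      # transmit (u1+u2, u2)
    if y1 is None or y2 is None:                    # u1 needs both observations
        u1_erased += 1
    if y1 is None and y2 is None:                   # u2 is lost only if both are erased (u1 known)
        u2_erased += 1
print(u1_erased / trials, p1 + p2 - p1 * p2)        # both ≈ 0.55
print(u2_erased / trials, p1 * p2)                  # both ≈ 0.10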

Induced channel model in the case of the erasure channel

(y1, y2) → [decoding] → u1

(u1, y1, y2) → [decoding] → u2

Induced channel model :

u1 → [channel 1, of erasure prob. p1 + p2 − p1p2] → û1

u2 → [channel 2, of erasure prob. p1p2] → û2

If we denote by C(p) the capacity of the erasure channel of erasure probability p (C(p) = 1 − p), then

C(p1) + C(p2) = C(p1 + p2 − p1p2) + C(p1p2).     (1)
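Identity (1) is a one-line expansion, worth writing out once:

C(p1 + p2 − p1p2) + C(p1p2) = (1 − p1 − p2 + p1p2) + (1 − p1p2)
                            = 2 − p1 − p2
                            = C(p1) + C(p2).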

Equivalent models for p = 0.25 and n = 3

[Figure (omitted): the tree of equivalent erasure probabilities obtained from p = 0.25: 0.4375 and 0.0625 after one level; 0.684, 0.191, 0.121 and 0.004 after two levels; and, after three levels, the erasure probabilities of the 8 equivalent channels, ranging from about 0.9 for the worst one down to nearly 0 for the best one.]

We choose the positions in red for F.
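These numbers can be reproduced with the erasure-channel recursion of the previous slides: each level replaces an erasure probability z by the pair (2z − z², z²). A small Python sketch (the indexing convention, expanding each probability into its worse child then its better child, is mine; it is consistent with the frozen set implied by the generator matrix shown earlier):

def bec_polarize(p, n):
    """Erasure probabilities of the 2^n equivalent channels, listed in the decoding order u_0 .. u_{N-1}."""
    z = [p]
    for _ in range(n):
        # each channel of erasure prob. z splits into a worse one (2z - z^2) and a better one (z^2)
        z = [w for a in z for w in (2 * a - a * a, a * a)]
    return z

z = bec_polarize(0.25, 3)
print([round(v, 4) for v in z])   # [0.9, 0.4673, 0.3462, 0.0366, 0.2275, 0.0147, 0.0078, 0.0]
print(sum(1 - v for v in z))      # 6.0 = 8 * C(0.25): the total capacity is conserved
F = set(sorted(range(8), key=lambda i: z[i])[-4:])
print(F)                          # {0, 1, 2, 4}: the four worst channels, matching the frozen set above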

Equivalent models for n ∈ {5, 8, 16}

[Figure (omitted): for n ∈ {5, 8, 16}, the empirical distribution of the equivalent erasure probabilities, plotted as “Prob(q > p)” = #{i : q > p} / 2^n.]
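The polarization behind these plots can be observed directly by reusing bec_polarize from the previous sketch (the 0.01/0.99 thresholds below are an arbitrary choice of mine):

for n in (5, 8, 16):
    z = bec_polarize(0.25, n)                              # function from the previous sketch
    unpolarized = sum(0.01 < v < 0.99 for v in z) / len(z)
    avg_capacity = sum(1 - v for v in z) / len(z)
    print(n, unpolarized, avg_capacity)
# The unpolarized fraction shrinks as n grows, while the average capacity
# stays at C(0.25) = 0.75 (up to float rounding).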

Why the whole scheme works and attains the capacity of the erasure channel

Point 1 : The equivalent channels “polarize” : their erasure probability is either close to 0 or close to 1.

Point 2 : The “conservation law” (1), C(p1) + C(p2) = C(p1 + p2 − p1p2) + C(p1p2), ensures that

∑_{i=0}^{N−1} C(q_i) = ∑_{i=0}^{N−1} C(p_i) = N C(p)

where q_i is the erasure probability of the i-th equivalent channel at the input and p_i = p is the erasure probability of the i-th channel at the output.

Point 3 : Since either C(q_i) ≈ 0 or C(q_i) ≈ 1,

k = N − |F| = #{i : C(q_i) ≈ 1} ≈ N C(p).

The general case : the basic scheme

Assumption : U1 and U2 independent and uniformly distributed in {0, 1}.

[Figure (omitted): the base scheme, (U1, U2) ↦ (X1, X2) = (U1 ⊕ U2, U2), each X_i being sent over its own copy of the channel with output Y_i.]

Same “conservation law” as for the erasure channel :

Theorem 1.

I(U1; Y1, Y2) + I(U2; U1, Y1, Y2) = I(X1; Y1) + I(X2; Y2).

A lemma on the independence of random variables

Lemma 5. U1 and U2 independent and uniformly distributed
⇒ X1 and X2 independent and uniformly distributed
⇒ Y1 and Y2 independent.

Proof : X1 and X2 are independent and uniformly distributed: obvious, since (U1, U2) ↦ (X1, X2) is a bijection.

Proof (cont’d)

P(Y1 = a, Y2 = b) = ∑_{c,d} P(Y1 = a, Y2 = b | X1 = c, X2 = d) P(X1 = c, X2 = d)

                  = ∑_{c,d} P(Y1 = a | X1 = c) P(Y2 = b | X2 = d) P(X1 = c) P(X2 = d)

                  = S1 · S2,   with

S1 = ∑_c P(Y1 = a | X1 = c) P(X1 = c) = P(Y1 = a)

S2 = ∑_d P(Y2 = b | X2 = d) P(X2 = d) = P(Y2 = b).

Hence

P(Y1 = a, Y2 = b) = P(Y1 = a) P(Y2 = b).

An important lemma in information theory

Lemma 6. If Y_i is the output corresponding to X_i after transmission through a memoryless channel, then

I(X1, X2; Y1, Y2) ≤ I(X1; Y1) + I(X2; Y2).

If Y1 and Y2 are independent,

I(X1, X2; Y1, Y2) = I(X1; Y1) + I(X2; Y2).

Proof

I(X1, X2; Y1, Y2) = H(Y1, Y2) − H(Y1, Y2 | X1, X2)          (definition of mutual information)

                  = H(Y1) + H(Y2) − H(Y1 | X1, X2) − H(Y2 | X1, X2, Y1)          (independence of the Yi's, chain rule)

                  = H(Y1) + H(Y2) − H(Y1 | X1) − H(Y2 | X2)          (memoryless channel)

                  = I(X1; Y1) + I(X2; Y2)          (definition of mutual information)

Proof of Theorem 1

I(X1; Y1) + I(X2; Y2) = I(X1, X2; Y1, Y2)          (Lemmas 5 and 6)

                      = I(U1, U2; Y1, Y2)          ((U1, U2) ↦ (X1, X2) is a bijection)

                      = H(U1, U2) − H(U1, U2 | Y1, Y2)

                      = H(U1) + H(U2) − H(U1 | Y1, Y2) − H(U2 | U1, Y1, Y2)          (U1, U2 independent; chain rule)

                      = I(U1; Y1, Y2) + I(U2; U1, Y1, Y2)
