Lecture 8: Polar Codes
November 18, 2016
Information Theory
Polar Codes
1. Introduction
2. Coding
3. Decoding
Information Theory 1/40
1. Introduction
Polar codes are a class of codes which make it possible to
1. attain the capacity of all symmetric memoryless channels (= those for which the capacity is attained for a uniform input distribution),
2. with an encoding algorithm of complexity O(N logN) (N = code length),
3. with a decoding algorithm of complexity O(N logN).
This decoding algorithm borrows many ideas from the decoding algorithm used for LDPC codes.
Information Theory 2/40
Polar Codes
1. a coding architecture based on the Fast Fourier Transform, in which some bits are fixed to 0,
2. a (suboptimal) decoding algorithm which computes the probability that the input bits are equal to 0 given the previous input bits and the probabilities of the output bits.
Information Theory 3/40
Encoding : example
[Figure: encoding circuit of length 8; the positions in red are the information positions, the remaining input positions are fixed to 0.]
Information Theory 4/40
Polar Code : linear code
In the previous example, the code is a linear code with generator matrix

G =
( 1 1 1 1 0 0 0 0 )
( 1 1 0 0 1 1 0 0 )
( 1 0 1 0 1 0 1 0 )
( 1 1 1 1 1 1 1 1 )
Information Theory 5/40
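Since the code is linear, encoding in this example can also be done as a plain matrix product over GF(2). A minimal sketch (the information vector below is an arbitrary choice, not the one used in the figure):

```python
import numpy as np

# Generator matrix of the length-8 example above.
G = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
])

u_info = np.array([1, 0, 1, 1])   # 4 arbitrary information bits
codeword = u_info.dot(G) % 2      # encoding = matrix product modulo 2
print(codeword)                   # [1 0 1 0 0 1 0 1]
```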
Decoding : example on the erasure channel
[Figure: the encoding circuit with the codeword received through the erasure channel; '?' marks the erased positions.]
Information Theory 6/40
Decoding : using the properties of the “base code”

[Figure: the base code maps the input pair (a, b) to the output pair (a + b, b).]
Information Theory 7/40
Decoding : an example where all erasures can be recovered
[Figure: a base-code configuration in which all erased values can be recovered.]
Information Theory 8/40
A configuration where erasures can be partially recovered
[Figure: a base-code configuration in which only some of the erased values can be recovered.]
Information Theory 9/40
Another configuration where erasures can be partially recovered

[Figure: another base-code configuration in which only some of the erased values can be recovered.]
Information Theory 10/40
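The configurations of the last four slides can be summarized by a small recovery rule for the base code. The sketch below is my reading of the figures (the helper name and the "value of b already known from a previous decoding step" argument are my additions): b is recovered whenever its observation or side information provides it, and a is recovered as (a + b) + b whenever both a + b and b are available.

```python
# Erasure-recovery rules for the base code (a, b) -> (a + b, b); '?' marks an erasure.

def recover_pair(y1, y2, b_known=None):
    """y1 observes a + b, y2 observes b; b_known is the value of b if it is
    already available from elsewhere. Returns the recoverable values of (a, b)."""
    b = y2 if y2 != '?' else ('?' if b_known is None else b_known)
    a = y1 ^ b if (y1 != '?' and b != '?') else '?'
    return a, b

print(recover_pair(1, 0))               # both outputs received  -> (1, 0)
print(recover_pair('?', 0))             # a + b erased           -> ('?', 0)
print(recover_pair(1, '?', b_known=0))  # b erased but known     -> (1, 0)
```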
[Figures: step-by-step decoding of the erasure-channel example, applying the base-code rules layer by layer; the erased values are progressively recovered.]
2. Encoding of a polar code

Length : N = 2^n, dimension : 0 ≤ k ≤ N.
Choose a set F of N − k positions ⊂ {0, . . . , N − 1} which are fixed to 0.
B_t = subset of {0, . . . , N − 1} whose t-th bit is equal to 0.

Encoding algorithm.

Input : u ∈ {0, 1}^N with u_i = 0 if i ∈ F.
Output : x, the codeword corresponding to u.

x ← u
for t = 0 to n − 1 do
  for all i ∈ B_t do
    x_i ← x_i ⊕ x_{i+2^t}
  end for
end for
return x
Information Theory 19/40
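A minimal Python sketch of this encoding algorithm (the function name and the test below are mine). The O(N log N) cost is visible: n = log2 N layers, each updating half of the positions. Assuming the information positions of the length-8 example are 3, 5, 6 and 7, encoding the corresponding unit vectors reproduces the four rows of the generator matrix G shown earlier.

```python
def polar_encode(u, n):
    """Butterfly encoder: at layer t, x_i <- x_i XOR x_{i+2^t} for every
    position i whose t-th bit equals 0 (the set B_t)."""
    N = 1 << n
    assert len(u) == N
    x = list(u)
    for t in range(n):
        for i in range(N):
            if (i >> t) & 1 == 0:          # i belongs to B_t
                x[i] ^= x[i + (1 << t)]
    return x

# Encoding the unit vectors at positions 3, 5, 6, 7 (assumed to be the red,
# information positions of the earlier example) gives the rows of G.
for pos in (3, 5, 6, 7):
    e = [0] * 8
    e[pos] = 1
    print(polar_encode(e, n=3))
```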
3. Decoding
Decoding algorithm
Input : y ∈ A^N, the output of the channel corresponding to codeword x.
Output : û, an estimate of u.

for all i ∈ {0, 1, . . . , N − 1} \ F do
  Compute p_i ≝ Prob(u_i = 1 | y, û_0, . . . , û_{i−1})
  if p_i > 0.5 then
    û_i = 1
  else
    û_i = 0
  end if
end for
Information Theory 20/40
Why does this work ?
• How to choose F ?
• Can p_i be computed efficiently ?
• Why does this procedure work at all ?
Information Theory 21/40
A simple computation
[Figure: the base code followed by two independent channels: inputs U1, U2, transmitted bits X1 = U1 ⊕ U2 and X2 = U2, channel outputs Y1, Y2, with associated probabilities p1, p2 and q1.]
We know p1 = Prob(x1 = 1|y1) and p2 = Prob(x2 = 1|y2). We compute
q1 = Prob(u1 = 1|y1, y2).
Information Theory 22/40
The formula
Lemma 1. Let X1 and X2 be two independent binary random variables, and denote r_i ≝ Prob(X_i = 1). Then

Prob(X1 ⊕ X2 = 1) = (1 − (1 − 2r1)(1 − 2r2)) / 2

Application :

q1 = (1 − (1 − 2p1)(1 − 2p2)) / 2
Information Theory 23/40
Another simple computation
[Figure: same setting as before, but the decoder now also knows u1; q2 is the probability to be computed.]
We know p1 = Prob(x1 = 1|y1), p2 = Prob(x2 = 1|y2) and u1. We compute q2 = Prob(u2 = 1|u1, y1, y2).
Information Theory 24/40
The formula
Lemma 2. A uniformly distributed random bit B is sent through two memoryless channels, and y1, y2 are the corresponding outputs. If we denote r_i = Prob(B = 1|y_i), then

Prob(B = 1|y1, y2) = r1r2 / (r1r2 + (1 − r1)(1 − r2)).

Application :

q2 = p1p2 / (p1p2 + (1 − p1)(1 − p2))   if u1 = 0
q2 = (1 − p1)p2 / ((1 − p1)p2 + p1(1 − p2))   if u1 = 1
Information Theory 25/40
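These two lemmas are the only probability-combination rules the decoder needs. A small sketch (the function names are mine):

```python
def prob_xor(r1, r2):
    # Lemma 1: Prob(X1 XOR X2 = 1) for independent bits with Prob(Xi = 1) = ri.
    return (1 - (1 - 2 * r1) * (1 - 2 * r2)) / 2

def prob_both(r1, r2):
    # Lemma 2: Prob(B = 1 | y1, y2) when B is uniform and ri = Prob(B = 1 | yi).
    return r1 * r2 / (r1 * r2 + (1 - r1) * (1 - r2))

p1, p2 = 0.3, 0.4
q1 = prob_xor(p1, p2)                          # Prob(u1 = 1 | y1, y2)
u1 = 0                                         # once u1 has been decided,
q2 = prob_both(p1 if u1 == 0 else 1 - p1, p2)  # Prob(u2 = 1 | u1, y1, y2)
print(q1, q2)
```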
Notation
We denote by u_i^t the input of the encoding circuit at level t (with u_i^0 = u_i), and by p_i^t the probabilities for these bits, which are either computed by the decoder or, when t = n, given by the channel :
[Figure: the length-8 encoding circuit labelled with the intermediate bits u_i^t and the associated probabilities p_i^t.]
Information Theory 26/40
Decoding algorithm (full version)
for i = 0 to N − 1 do
  for t = 1 to n − 1 do
    Compute all the u_j^t's that can be obtained directly from the already known u_l^{t−1}'s (with l < i).
  end for
  for t = n − 1 down to 0 do
    Compute all the p_j^t's that can be obtained directly from the p_k^{t+1}'s and the known u_l^t's (with l < i).
  end for
  if i ∉ F then
    if p_i^0 > 0.5 then
      û_i = 1
    else
      û_i = 0
    end if
  end if
end for
Information Theory 27/40
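The same successive-cancellation decoder can be written recursively, which makes the O(N log N) structure explicit: each call splits the problem into two halves, mirroring the last layer of the encoder, and uses only the two combination rules of Lemmas 1 and 2 (repeated here so that the sketch is self-contained). The function names are mine and the indexing follows my reading of the circuit, so this is an illustration rather than a reference implementation.

```python
def prob_xor(r1, r2):    # Lemma 1
    return (1 - (1 - 2 * r1) * (1 - 2 * r2)) / 2

def prob_both(r1, r2):   # Lemma 2
    return r1 * r2 / (r1 * r2 + (1 - r1) * (1 - r2))

def encode(u):
    # Recursive form of the butterfly encoder: x = (first XOR second, second).
    if len(u) == 1:
        return list(u)
    half = len(u) // 2
    first, second = encode(u[:half]), encode(u[half:])
    return [a ^ b for a, b in zip(first, second)] + second

def sc_decode(p, frozen):
    """p[i] = Prob(x_i = 1 | y_i); frozen[i] = True if position i is in F.
    Returns the estimated input vector u-hat."""
    if len(p) == 1:
        return [0] if frozen[0] else [1 if p[0] > 0.5 else 0]
    half = len(p) // 2
    # First half of the inputs: seen through the "XOR" channels (Lemma 1).
    q1 = [prob_xor(p[i], p[i + half]) for i in range(half)]
    u_first = sc_decode(q1, frozen[:half])
    v_first = encode(u_first)            # decided values of x_i XOR x_{i+half}
    # Second half: seen through both observations, given the first half (Lemma 2).
    q2 = [prob_both(p[i] if v_first[i] == 0 else 1 - p[i], p[i + half])
          for i in range(half)]
    u_second = sc_decode(q2, frozen[half:])
    return u_first + u_second
```

For a binary symmetric channel of crossover probability ε, for instance, one would call sc_decode with p[i] = ε when y_i = 0 and p[i] = 1 − ε when y_i = 1.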
Correctness of the algorithm
Lemma 3. At step i of the outer loop and step t of the inner loop of the previous algorithm, the u_j^t's that can be calculated correspond to the indices j in the set
{ j : 0 ≤ j ≤ 2^t ⌊i/2^t⌋ − 1 }.

Lemma 4. At step i of the outer loop and step t of the inner loop of the previous algorithm, the p_j^t's that can be calculated correspond to the indices j in the set
{ j : 0 ≤ j ≤ 2^t ⌊(i + 2^t)/2^t⌋ − 1 }.

Corollary 1. p_i^0 can always be computed at step i of the outer loop (take t = 0 in Lemma 4 : the set is {0, . . . , i}, which contains i).
Information Theory 28/40
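Restated as code (a small sketch, names are mine), the two index sets make the corollary easy to check for small N:

```python
def known_u(i, t):
    # Lemma 3: indices j for which u_j^t is available at step i.
    return set(range(2**t * (i // 2**t)))

def known_p(i, t):
    # Lemma 4: indices j for which p_j^t is available at step i.
    return set(range(2**t * ((i + 2**t) // 2**t)))

# Corollary 1: for t = 0 the second set is {0, ..., i}, which always contains i.
assert all(i in known_p(i, 0) for i in range(8))
```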
Modeling the decoder
Decoding of the base code can be modeled by the following transmission over two different channels :

u1 → channel 1 → y1, y2
u2 → channel 2 → u1, y1, y2

and we know the channels : they provide Prob(u1 = 1|y1, y2) and Prob(u2 = 1|u1, y1, y2).
Information Theory 29/40
Case of an erasure channel
Assume that
x1 → erasure channel of prob. p1 → y1
x2 → erasure channel of prob. p2 → y2

Prob(u1 is erased) = Prob(x1 ⊕ x2 is erased)
                   = Prob(x1 or x2 is erased)
                   = 1 − (1 − p1)(1 − p2)
                   = p1 + p2 − p1p2

Prob(u2 is erased) = Prob(x1 and x2 are erased) = p1p2
Information Theory 30/40
Induced channel model in the case of the erasure channel
y1, y2 → decoding → u1
u1, y1, y2 → decoding → u2

Induced channel model :

u1 → channel 1 of erasure prob. p1 + p2 − p1p2 → u1
u2 → channel 2 of erasure prob. p1p2 → u2
If we denote by C(p) the capacity of the erasure channel of probability p (C(p) = 1 − p), then

C(p1) + C(p2) = C(p1 + p2 − p1p2) + C(p1p2),    (1)

both sides being equal to 2 − p1 − p2.
Information Theory 31/40
Equivalent models for p = 0.25 and n = 3
[Figure: tree of equivalent erasure probabilities for p = 0.25 and n = 3. The eight channels of erasure probability 0.25 become channels of probability 0.4375 and 0.0625 after one level, then 0.684, 0.191, 0.121 and 0.004, and finally 0.9, 0.467, 0.346, 0.228, 0.037, 0.015, 0.008 and ≈ 1.5 · 10^−5.]
We choose the positions in red for F .
Information Theory 32/40
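The numbers in this figure can be reproduced by iterating the two formulas of the erasure-channel computation: each level replaces a channel of erasure probability q by one of probability 2q − q² and one of probability q². A small sketch (the ordering of the resulting list and the rule "freeze the N − k worst positions" are my choices; the slide only says that the red positions form F):

```python
def polarize(p, n):
    """Equivalent erasure probabilities after n levels, starting from
    2^n copies of an erasure channel of probability p."""
    probs = [p]
    for _ in range(n):
        probs = [2 * q - q * q for q in probs] + [q * q for q in probs]
    return probs

qs = polarize(0.25, 3)
print(sorted(round(q, 4) for q in qs))
# -> [0.0, 0.0078, 0.0147, 0.0366, 0.2275, 0.3462, 0.4673, 0.8999]
# i.e. the final values ~0, 0.008, 0.015, 0.037, 0.228, 0.346, 0.467, 0.9 of the figure.

# Freezing the N - k positions with the largest erasure probability (here k = 4,
# as in the length-8 example; indices refer to this list, not to the circuit):
F = sorted(range(len(qs)), key=lambda i: qs[i], reverse=True)[: len(qs) - 4]
```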
Equivalent models for n ∈ {5, 8, 16}
“Prob(q > p)” ≝ #{i : q_i > p} / 2^n
Information Theory 33/40
Why the whole scheme works and attains the capacity of the erasure channel

Point 1 : The equivalent channels “polarize” : the erasure probability is either close to 0 or close to 1.

Point 2 : The “conservation law” (1), C(p1) + C(p2) = C(p1 + p2 − p1p2) + C(p1p2), ensures that

∑_{i=0}^{N−1} C(q_i) = ∑_{i=0}^{N−1} C(p_i) = NC(p)

where q_i is the erasure probability of the i-th equivalent channel at the input and p_i = p is the erasure probability of the i-th output channel.

Point 3 : Since either C(q_i) ≈ 0 or C(q_i) ≈ 1,

k ≝ N − |F| = #{i : C(q_i) ≈ 1} ≈ NC(p)
Information Theory 34/40
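Point 1 can also be observed numerically: iterating the same recursion for a larger n, almost every equivalent channel ends up with erasure probability very close to 0 or very close to 1, and the fraction of nearly perfect channels approaches C(p) = 1 − p. A self-contained sketch (n = 16 and the thresholds are arbitrary choices):

```python
p, n = 0.25, 16
probs = [p]
for _ in range(n):                           # q -> (2q - q^2, q^2) at each level
    probs = [2 * q - q * q for q in probs] + [q * q for q in probs]

N = len(probs)                               # N = 2**16 equivalent channels
good = sum(q < 1e-3 for q in probs) / N      # erasure prob. ~ 0, i.e. C(q_i) ~ 1
bad = sum(q > 1 - 1e-3 for q in probs) / N   # erasure prob. ~ 1, i.e. C(q_i) ~ 0
print(good, bad)                             # good tends to 1 - p = 0.75 as n grows
```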
The general case : the basic scheme
Assumption : U1 and U2 independent and uniformly distributed in {0, 1}.
[Figure: the base code followed by two channels: inputs U1, U2, transmitted bits X1, X2, channel outputs Y1, Y2.]
Same “conservation law” as for the erasure channel :
Theorem 1.
I(U1;Y1, Y2) + I(U2;U1, Y1, Y2) = I(X1;Y1) + I(X2;Y2).
Information Theory 35/40
A lemma on the independence of random variables
Lemma 5. If U1 and U2 are independent and uniformly distributed, then
X1 and X2 are independent and uniformly distributed, and
Y1 and Y2 are independent.

Proof : X1 and X2 are independent and uniformly distributed (immediate, since (U1, U2) ↦ (X1, X2) is a bijection).
Information Theory 36/40
Proof (cont’d)
P(Y1 = a, Y2 = b) = ∑_{c,d} P(Y1 = a, Y2 = b | X1 = c, X2 = d) P(X1 = c, X2 = d)
                  = ∑_{c,d} P(Y1 = a | X1 = c) P(Y2 = b | X2 = d) P(X1 = c) P(X2 = d)
                  = S1 S2, with

S1 = ∑_c P(Y1 = a | X1 = c) P(X1 = c) = P(Y1 = a)

S2 = ∑_d P(Y2 = b | X2 = d) P(X2 = d) = P(Y2 = b)

Hence
P(Y1 = a, Y2 = b) = P(Y1 = a) P(Y2 = b).
Information Theory 37/40
An important lemma in information theory
Lemma 6. If Y_i is the output corresponding to X_i after transmission through a memoryless channel, then
I(X1, X2;Y1, Y2) ≤ I(X1;Y1) + I(X2;Y2).
If Y1 and Y2 are independent
I(X1, X2;Y1, Y2) = I(X1;Y1) + I(X2;Y2).
Information Theory 38/40
Proof
I(X1, X2; Y1, Y2) = H(Y1, Y2) − H(Y1, Y2 | X1, X2)
    (definition of mutual information)
= H(Y1) + H(Y2) − H(Y1 | X1, X2) − H(Y2 | X1, X2, Y1)
    (independence of the Yi's)
= H(Y1) + H(Y2) − H(Y1 | X1) − H(Y2 | X2)
    (memoryless channel)
= I(X1; Y1) + I(X2; Y2)
    (definition of mutual information)
Information Theory 39/40
Proof of Theorem 1
I(X1;Y1) + I(X2;Y2) = I(X1, X2;Y1, Y2)
    (Lemma 5 and Lemma 6)
= I(U1, U2;Y1, Y2)
    ((U1, U2) ↦ (X1, X2) is a bijection)
= H(U1, U2)−H(U1, U2|Y1, Y2)
= H(U1) +H(U2)−H(U1|Y1, Y2)−H(U2|U1, Y1, Y2)
= I(U1;Y1, Y2) + I(U2;U1, Y1, Y2)
Information Theory 40/40