Lecture 8: Polar Codes
November 18, 2016
Information Theory
Polar Codes
1. Introduction
2. Coding
3. Decoding
Information Theory 1/40
1. Introduction
Polar codes are a class of codes which make it possible to
1. attain the capacity of all symmetric memoryless channels (= those for which the capacity is attained for a uniform input distribution),
2. with an encoding algorithm of complexity O(N logN) (N = code length),
3. with a decoding algorithm of complexity O(N logN).
This decoding algorithm borrows many ideas from the decoding algorithm used for LDPC codes.
Information Theory 2/40
Polar Codes
1. a coding architecture based on the Fast Fourier Transform, in which some bits are fixed to 0,
2. a (suboptimal) decoding algorithm which computes the probability that the input bits are equal to 0 given the previous input bits and the probabilities of the output bits.
Information Theory 3/40
Encoding : example
[Figure: encoding circuit of length 8; the positions in red are the information positions, the remaining input positions are fixed to 0.]
Information Theory 4/40
Polar Code : linear code
In the previous example, the code is a linear code with generator matrix

G =
( 1 1 1 1 0 0 0 0 )
( 1 1 0 0 1 1 0 0 )
( 1 0 1 0 1 0 1 0 )
( 1 1 1 1 1 1 1 1 )
Information Theory 5/40
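Since the code is linear, encoding in this example can also be done as a plain matrix product over GF(2). A minimal sketch (the information vector below is an arbitrary choice, not the one used in the figure):

```python
import numpy as np

# Generator matrix of the length-8 example above.
G = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
])

u_info = np.array([1, 0, 1, 1])   # 4 arbitrary information bits
codeword = u_info.dot(G) % 2      # encoding = matrix product modulo 2
print(codeword)                   # [1 0 1 0 0 1 0 1]
```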
Decoding : example on the erasure channel
[Figure: the encoding circuit with the codeword received through the erasure channel; '?' marks the erased positions.]
Information Theory 6/40
Decoding : using the properties of the “base code”

[Figure: the base code maps the input pair (a, b) to the output pair (a + b, b).]
Information Theory 7/40
Decoding : an example where all erasures can be recovered
[Figure: a base-code configuration in which all erased values can be recovered.]
Information Theory 8/40
A configuration where erasures can be partially recovered
[Figure: a base-code configuration in which only some of the erased values can be recovered.]
Information Theory 9/40
Another configuration where erasures can be partially recovered

[Figure: another base-code configuration in which only some of the erased values can be recovered.]
Information Theory 10/40
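The configurations of the last four slides can be summarized by a small recovery rule for the base code. The sketch below is my reading of the figures (the helper name and the "value of b already known from a previous decoding step" argument are my additions): b is recovered whenever its observation or side information provides it, and a is recovered as (a + b) + b whenever both a + b and b are available.

```python
# Erasure-recovery rules for the base code (a, b) -> (a + b, b); '?' marks an erasure.

def recover_pair(y1, y2, b_known=None):
    """y1 observes a + b, y2 observes b; b_known is the value of b if it is
    already available from elsewhere. Returns the recoverable values of (a, b)."""
    b = y2 if y2 != '?' else ('?' if b_known is None else b_known)
    a = y1 ^ b if (y1 != '?' and b != '?') else '?'
    return a, b

print(recover_pair(1, 0))               # both outputs received  -> (1, 0)
print(recover_pair('?', 0))             # a + b erased           -> ('?', 0)
print(recover_pair(1, '?', b_known=0))  # b erased but known     -> (1, 0)
```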
[Figures: step-by-step decoding of the erasure-channel example, applying the base-code rules layer by layer; the erased values are progressively recovered.]
2. Encoding of a polar code

Length : N = 2^n, dimension : 0 ≤ k ≤ N.
Choose a set F of N − k positions ⊂ {0, . . . , N − 1} which are fixed to 0.
B_t = subset of {0, . . . , N − 1} whose t-th bit is equal to 0.

Encoding algorithm.

Input : u ∈ {0, 1}^N with u_i = 0 if i ∈ F.
Output : x, the codeword corresponding to u.

x ← u
for t = 0 to n − 1 do
  for all i ∈ B_t do
    x_i ← x_i ⊕ x_{i+2^t}
  end for
end for
return x
Information Theory 19/40
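A minimal Python sketch of this encoding algorithm (the function name and the test below are mine). The O(N log N) cost is visible: n = log2 N layers, each updating half of the positions. Assuming the information positions of the length-8 example are 3, 5, 6 and 7, encoding the corresponding unit vectors reproduces the four rows of the generator matrix G shown earlier.

```python
def polar_encode(u, n):
    """Butterfly encoder: at layer t, x_i <- x_i XOR x_{i+2^t} for every
    position i whose t-th bit equals 0 (the set B_t)."""
    N = 1 << n
    assert len(u) == N
    x = list(u)
    for t in range(n):
        for i in range(N):
            if (i >> t) & 1 == 0:          # i belongs to B_t
                x[i] ^= x[i + (1 << t)]
    return x

# Encoding the unit vectors at positions 3, 5, 6, 7 (assumed to be the red,
# information positions of the earlier example) gives the rows of G.
for pos in (3, 5, 6, 7):
    e = [0] * 8
    e[pos] = 1
    print(polar_encode(e, n=3))
```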
3. Decoding
Decoding algorithm
Input : y ∈ A^N, the output of the channel corresponding to codeword x.
Output : û, an estimate of u.

for all i ∈ {0, 1, . . . , N − 1} \ F do
  Compute p_i ≝ Prob(u_i = 1 | y, û_0, . . . , û_{i−1})
  if p_i > 0.5 then
    û_i = 1
  else
    û_i = 0
  end if
end for
Information Theory 20/40
Why does this work ?
• How to choose F ?
• Can p_i be computed efficiently ?
• Why does this procedure work at all ?
Information Theory 21/40
A simple computation
[Figure: the base code followed by two independent channels: inputs U1, U2, transmitted bits X1 = U1 ⊕ U2 and X2 = U2, channel outputs Y1, Y2, with associated probabilities p1, p2 and q1.]
We know p1 = Prob(x1 = 1|y1) and p2 = Prob(x2 = 1|y2). We compute
q1 = Prob(u1 = 1|y1, y2).
Information Theory 22/40
The formula
Lemma 1. Let X1 and X2 be two independent binary random variables, and denote r_i ≝ Prob(X_i = 1). Then

Prob(X1 ⊕ X2 = 1) = (1 − (1 − 2r1)(1 − 2r2)) / 2

Application :

q1 = (1 − (1 − 2p1)(1 − 2p2)) / 2
Information Theory 23/40
Another simple computation
[Figure: same setting as before, but the decoder now also knows u1; q2 is the probability to be computed.]
We know p1 = Prob(x1 = 1|y1), p2 = Prob(x2 = 1|y2) and u1. We compute q2 = Prob(u2 = 1|u1, y1, y2).
Information Theory 24/40
The formula
Lemma 2. A uniformly distributed random bit B is sent through two memoryless channels, and y1, y2 are the corresponding outputs. If we denote r_i = Prob(B = 1|y_i), then

Prob(B = 1|y1, y2) = r1r2 / (r1r2 + (1 − r1)(1 − r2)).

Application :

q2 = p1p2 / (p1p2 + (1 − p1)(1 − p2))   if u1 = 0
q2 = (1 − p1)p2 / ((1 − p1)p2 + p1(1 − p2))   if u1 = 1
Information Theory 25/40
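These two lemmas are the only probability-combination rules the decoder needs. A small sketch (the function names are mine):

```python
def prob_xor(r1, r2):
    # Lemma 1: Prob(X1 XOR X2 = 1) for independent bits with Prob(Xi = 1) = ri.
    return (1 - (1 - 2 * r1) * (1 - 2 * r2)) / 2

def prob_both(r1, r2):
    # Lemma 2: Prob(B = 1 | y1, y2) when B is uniform and ri = Prob(B = 1 | yi).
    return r1 * r2 / (r1 * r2 + (1 - r1) * (1 - r2))

p1, p2 = 0.3, 0.4
q1 = prob_xor(p1, p2)                          # Prob(u1 = 1 | y1, y2)
u1 = 0                                         # once u1 has been decided,
q2 = prob_both(p1 if u1 == 0 else 1 - p1, p2)  # Prob(u2 = 1 | u1, y1, y2)
print(q1, q2)
```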
Notation
We denote by u_i^t the input of the encoding circuit at level t (with u_i^0 = u_i), and by p_i^t the probabilities for these bits, which are either computed by the decoder or, when t = n, given by the channel :
[Figure: the length-8 encoding circuit labelled with the intermediate bits u_i^t and the associated probabilities p_i^t.]
Information Theory 26/40
Decoding algorithm (full version)
for i = 0 to N − 1 do
  for t = 1 to n − 1 do
    Compute all the u_j^t's that can be obtained directly from the already known u_l^{t−1}'s (with l < i).
  end for
  for t = n − 1 down to 0 do
    Compute all the p_j^t's that can be obtained directly from the p_k^{t+1}'s and the known u_l^t's (with l < i).
  end for
  if i ∉ F then
    if p_i^0 > 0.5 then
      û_i = 1
    else
      û_i = 0
    end if
  end if
end for
Information Theory 27/40
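The same successive-cancellation decoder can be written recursively, which makes the O(N log N) structure explicit: each call splits the problem into two halves, mirroring the last layer of the encoder, and uses only the two combination rules of Lemmas 1 and 2 (repeated here so that the sketch is self-contained). The function names are mine and the indexing follows my reading of the circuit, so this is an illustration rather than a reference implementation.

```python
def prob_xor(r1, r2):    # Lemma 1
    return (1 - (1 - 2 * r1) * (1 - 2 * r2)) / 2

def prob_both(r1, r2):   # Lemma 2
    return r1 * r2 / (r1 * r2 + (1 - r1) * (1 - r2))

def encode(u):
    # Recursive form of the butterfly encoder: x = (first XOR second, second).
    if len(u) == 1:
        return list(u)
    half = len(u) // 2
    first, second = encode(u[:half]), encode(u[half:])
    return [a ^ b for a, b in zip(first, second)] + second

def sc_decode(p, frozen):
    """p[i] = Prob(x_i = 1 | y_i); frozen[i] = True if position i is in F.
    Returns the estimated input vector u-hat."""
    if len(p) == 1:
        return [0] if frozen[0] else [1 if p[0] > 0.5 else 0]
    half = len(p) // 2
    # First half of the inputs: seen through the "XOR" channels (Lemma 1).
    q1 = [prob_xor(p[i], p[i + half]) for i in range(half)]
    u_first = sc_decode(q1, frozen[:half])
    v_first = encode(u_first)            # decided values of x_i XOR x_{i+half}
    # Second half: seen through both observations, given the first half (Lemma 2).
    q2 = [prob_both(p[i] if v_first[i] == 0 else 1 - p[i], p[i + half])
          for i in range(half)]
    u_second = sc_decode(q2, frozen[half:])
    return u_first + u_second
```

For a binary symmetric channel of crossover probability ε, for instance, one would call sc_decode with p[i] = ε when y_i = 0 and p[i] = 1 − ε when y_i = 1.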
Correctness of the algorithm
Lemma 3. At step i of the outer loop and step t of the inner loop of the previous algorithm, the u_j^t's that can be calculated correspond to the indices j in the set
{ j : 0 ≤ j ≤ 2^t ⌊i/2^t⌋ − 1 }.

Lemma 4. At step i of the outer loop and step t of the inner loop of the previous algorithm, the p_j^t's that can be calculated correspond to the indices j in the set
{ j : 0 ≤ j ≤ 2^t ⌊(i + 2^t)/2^t⌋ − 1 }.

Corollary 1. p_i^0 can always be computed at step i of the outer loop (take t = 0 in Lemma 4 : the set is {0, . . . , i}, which contains i).
Information Theory 28/40
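Restated as code (a small sketch, names are mine), the two index sets make the corollary easy to check for small N:

```python
def known_u(i, t):
    # Lemma 3: indices j for which u_j^t is available at step i.
    return set(range(2**t * (i // 2**t)))

def known_p(i, t):
    # Lemma 4: indices j for which p_j^t is available at step i.
    return set(range(2**t * ((i + 2**t) // 2**t)))

# Corollary 1: for t = 0 the second set is {0, ..., i}, which always contains i.
assert all(i in known_p(i, 0) for i in range(8))
```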
Modeling the decoder
Decoding of the base code can be modeled by the following transmission over two different channels :

u1 → channel 1 → y1, y2
u2 → channel 2 → u1, y1, y2

and we know the channels : they provide Prob(u1 = 1|y1, y2) and Prob(u2 = 1|u1, y1, y2).
Information Theory 29/40
Case of an erasure channel
Assume that
x1 → erasure channel of prob. p1 → y1
x2 → erasure channel of prob. p2 → y2

Prob(u1 is erased) = Prob(x1 ⊕ x2 is erased)
                   = Prob(x1 or x2 is erased)
                   = 1 − (1 − p1)(1 − p2)
                   = p1 + p2 − p1p2

Prob(u2 is erased) = Prob(x1 and x2 are erased) = p1p2
Information Theory 30/40
Induced channel model in the case of the erasure channel
y1, y2 → decoding → u1
u1, y1, y2 → decoding → u2

Induced channel model :

u1 → channel 1 of erasure prob. p1 + p2 − p1p2 → u1
u2 → channel 2 of erasure prob. p1p2 → u2
If we denote by C(p) the capacity of the erasure channel of probability p (C(p) = 1 − p), then

C(p1) + C(p2) = C(p1 + p2 − p1p2) + C(p1p2),    (1)

both sides being equal to 2 − p1 − p2.
Information Theory 31/40
Equivalent models for p = 0.25 and n = 3
[Figure: tree of equivalent erasure probabilities for p = 0.25 and n = 3. The eight channels of erasure probability 0.25 become channels of probability 0.4375 and 0.0625 after one level, then 0.684, 0.191, 0.121 and 0.004, and finally 0.9, 0.467, 0.346, 0.228, 0.037, 0.015, 0.008 and ≈ 1.5 · 10^−5.]
We choose the positions in red for F .
Information Theory 32/40
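The numbers in this figure can be reproduced by iterating the two formulas of the erasure-channel computation: each level replaces a channel of erasure probability q by one of probability 2q − q² and one of probability q². A small sketch (the ordering of the resulting list and the rule "freeze the N − k worst positions" are my choices; the slide only says that the red positions form F):

```python
def polarize(p, n):
    """Equivalent erasure probabilities after n levels, starting from
    2^n copies of an erasure channel of probability p."""
    probs = [p]
    for _ in range(n):
        probs = [2 * q - q * q for q in probs] + [q * q for q in probs]
    return probs

qs = polarize(0.25, 3)
print(sorted(round(q, 4) for q in qs))
# -> [0.0, 0.0078, 0.0147, 0.0366, 0.2275, 0.3462, 0.4673, 0.8999]
# i.e. the final values ~0, 0.008, 0.015, 0.037, 0.228, 0.346, 0.467, 0.9 of the figure.

# Freezing the N - k positions with the largest erasure probability (here k = 4,
# as in the length-8 example; indices refer to this list, not to the circuit):
F = sorted(range(len(qs)), key=lambda i: qs[i], reverse=True)[: len(qs) - 4]
```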
Equivalent models for n ∈ {5, 8, 16}
“Prob(q > p)” ≝ #{i : q_i > p} / 2^n
Information Theory 33/40
Why the whole scheme works and attains the capacity of the erasure channel

Point 1 : The equivalent channels “polarize” : the erasure probability is either close to 0 or close to 1.

Point 2 : The “conservation law” (1), C(p1) + C(p2) = C(p1 + p2 − p1p2) + C(p1p2), ensures that

∑_{i=0}^{N−1} C(q_i) = ∑_{i=0}^{N−1} C(p_i) = NC(p)

where q_i is the erasure probability of the i-th equivalent channel at the input and p_i = p is the erasure probability of the i-th output channel.

Point 3 : Since either C(q_i) ≈ 0 or C(q_i) ≈ 1,

k ≝ N − |F| = #{i : C(q_i) ≈ 1} ≈ NC(p)
Information Theory 34/40
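Point 1 can also be observed numerically: iterating the same recursion for a larger n, almost every equivalent channel ends up with erasure probability very close to 0 or very close to 1, and the fraction of nearly perfect channels approaches C(p) = 1 − p. A self-contained sketch (n = 16 and the thresholds are arbitrary choices):

```python
p, n = 0.25, 16
probs = [p]
for _ in range(n):                           # q -> (2q - q^2, q^2) at each level
    probs = [2 * q - q * q for q in probs] + [q * q for q in probs]

N = len(probs)                               # N = 2**16 equivalent channels
good = sum(q < 1e-3 for q in probs) / N      # erasure prob. ~ 0, i.e. C(q_i) ~ 1
bad = sum(q > 1 - 1e-3 for q in probs) / N   # erasure prob. ~ 1, i.e. C(q_i) ~ 0
print(good, bad)                             # good tends to 1 - p = 0.75 as n grows
```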
The general case : the basic scheme
Assumption : U1 and U2 independent and uniformly distributed in {0, 1}.
[Figure: the base code followed by two channels: inputs U1, U2, transmitted bits X1, X2, channel outputs Y1, Y2.]
Same “conservation law” as for the erasure channel :
Theorem 1.
I(U1;Y1, Y2) + I(U2;U1, Y1, Y2) = I(X1;Y1) + I(X2;Y2).
Information Theory 35/40
A lemma on the independence of random variables
Lemma 5. If U1 and U2 are independent and uniformly distributed, then
X1 and X2 are independent and uniformly distributed, and
Y1 and Y2 are independent.

Proof : X1 and X2 are independent and uniformly distributed (immediate, since (U1, U2) ↦ (X1, X2) is a bijection).
Information Theory 36/40
Proof (cont’d)
P(Y1 = a, Y2 = b) = ∑_{c,d} P(Y1 = a, Y2 = b | X1 = c, X2 = d) P(X1 = c, X2 = d)
                  = ∑_{c,d} P(Y1 = a | X1 = c) P(Y2 = b | X2 = d) P(X1 = c) P(X2 = d)
                  = S1 S2, with

S1 = ∑_c P(Y1 = a | X1 = c) P(X1 = c) = P(Y1 = a)

S2 = ∑_d P(Y2 = b | X2 = d) P(X2 = d) = P(Y2 = b)

Hence
P(Y1 = a, Y2 = b) = P(Y1 = a) P(Y2 = b).
Information Theory 37/40
An important lemma in information theory
Lemma 6. If Y_i is the output corresponding to X_i after transmission through a memoryless channel, then
I(X1, X2;Y1, Y2) ≤ I(X1;Y1) + I(X2;Y2).
If Y1 and Y2 are independent
I(X1, X2;Y1, Y2) = I(X1;Y1) + I(X2;Y2).
Information Theory 38/40
Proof
I(X1, X2; Y1, Y2) = H(Y1, Y2) − H(Y1, Y2 | X1, X2)
    (definition of mutual information)
= H(Y1) + H(Y2) − H(Y1 | X1, X2) − H(Y2 | X1, X2, Y1)
    (independence of the Yi's)
= H(Y1) + H(Y2) − H(Y1 | X1) − H(Y2 | X2)
    (memoryless channel)
= I(X1; Y1) + I(X2; Y2)
    (definition of mutual information)
Information Theory 39/40
Proof of Theorem 1
I(X1;Y1) + I(X2;Y2) = I(X1, X2;Y1, Y2)
    (Lemma 5 and Lemma 6)
= I(U1, U2;Y1, Y2)
    ((U1, U2) ↦ (X1, X2) is a bijection)
= H(U1, U2)−H(U1, U2|Y1, Y2)
= H(U1) +H(U2)−H(U1|Y1, Y2)−H(U2|U1, Y1, Y2)
= I(U1;Y1, Y2) + I(U2;U1, Y1, Y2)
Information Theory 40/40