Capacity of Permutation Channels Anuran Makur Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology 7 October 2020 Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 1 / 39




Outline

1 Introduction
  Three Motivations
  Permutation Channel Model
  Information Capacity
  Example: Binary Symmetric Channel

2 Achievability and Converse for the BSC

3 General Achievability Bound

4 General Converse Bounds

5 Conclusion

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 2 / 39

Three Motivations

Coding theory: [DG01], [Mit06], [Met09], [KV15], [KT18], . . .
  Random deletion channel: LDPC codes nearly achieve capacity for large alphabets
  Codes that correct transpositions of symbols
  Permutation channels with insertions, deletions, substitutions, or erasures
  Construction and analysis of multiset codes

Communication networks: [XZ02], [WWM09], [GG10], [KV13], . . .
  Mobile ad hoc networks, multipath routed networks, etc.
  Out-of-order delivery of packets
  Correcting packet errors/losses when packets do not carry sequence numbers

Molecular/biological communications: [YKGR+15], [KPM16], [HSRT17], [SH19], . . .
  DNA-based storage systems
  Source data encoded into DNA molecules
  Fragments of DNA molecules cached
  Receiver reads encoded data by shotgun sequencing (i.e., random sampling)

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 3 / 39

Motivation: Point-to-point Communication in Packet Networks

[Diagram: SENDER → NETWORK → RECEIVER]

Model the communication network as a channel:

  Alphabet symbols = all possible b-bit packets ⇒ 2^b input symbols
  Multipath routed network (or evolving network topology) ⇒ packets received with transpositions
  Packets are impaired (e.g., deletions, substitutions, etc.) ⇒ model using channel probabilities

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 4 / 39

Example: Coding for Random Deletion Network

Consider a communication network where packets can be dropped:

[Diagram: SENDER → NETWORK → RECEIVER, abstracted as a RANDOM DELETION (equivalently, ERASURE) CHANNEL followed by a RANDOM PERMUTATION]

Abstraction:

  n-length codeword = sequence of n packets
  Random deletion channel: delete each symbol/packet independently with prob p ∈ (0, 1)
  Equivalent erasure channel: erase each symbol/packet independently with prob p ∈ (0, 1)
  Random permutation block: randomly permute the packets of the codeword
  Coding: add sequence numbers and use standard coding techniques (packet size = b + log(n) bits, alphabet size = n · 2^b)
  More refined coding techniques simulate sequence numbers [Mit06], [Met09]

How do you code in such channels without increasing the alphabet size?

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 5 / 39
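A minimal sketch of the sequence-numbering idea above: each packet is tagged with its index before transmission, so the receiver can undo the permutation and treat deletions as erasures at known positions. The payload names and parameters here are illustrative, not from the talk.

```python
import random

def transmit(packets, p, rng):
    """Tag each packet with a sequence number, then apply random deletion
    followed by a random permutation, as in the network model above."""
    tagged = list(enumerate(packets))                # (seq_no, payload)
    kept = [t for t in tagged if rng.random() >= p]  # i.i.d. deletion w.p. p
    rng.shuffle(kept)                                # out-of-order delivery
    return kept

def receive(received, n):
    """Sequence numbers undo the permutation; missing numbers turn deletions
    into erasures ('?') at known positions, so standard erasure codes apply."""
    out = ["?"] * n
    for seq_no, payload in received:
        out[seq_no] = payload
    return out

rng = random.Random(0)
sent = ["pkt%d" % i for i in range(8)]
got = receive(transmit(sent, p=0.25, rng=rng), n=8)
# Every delivered packet is back in its original slot; the rest are erasures.
assert all(v in ("?", "pkt%d" % i) for i, v in enumerate(got))
```

The cost is exactly the one noted above: each packet grows by log(n) bits of sequence number, which is what the refined schemes of [Mit06], [Met09] try to avoid.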

Permutation Channel Model

[Block diagram: M → ENCODER → X → CHANNEL → Z → RANDOM PERMUTATION → Y → DECODER → M̂]

  Sender sends message M ∼ Uniform(M)
  n = blocklength
  Randomized encoder f_n : M → X^n produces codeword X_1^n = (X_1, . . . , X_n) = f_n(M)
  Discrete memoryless channel P_{Z|X} with input & output alphabets X & Y produces Z_1^n:

      P_{Z_1^n | X_1^n}(z_1^n | x_1^n) = ∏_{i=1}^{n} P_{Z|X}(z_i | x_i)

  Random permutation π generates Y_1^n from Z_1^n: Y_{π(i)} = Z_i for i ∈ {1, . . . , n}
  Randomized decoder g_n : Y^n → M ∪ {error} produces estimate M̂ = g_n(Y_1^n) at the receiver

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 6 / 39
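The model above is easy to simulate; a minimal sketch, with an illustrative DMC (a BSC, chosen by us, not a specific example from the talk):

```python
import random
from collections import Counter

def permutation_channel(x, P, rng):
    """One use of the permutation channel: each input symbol passes through
    the memoryless channel P (row x gives P_{Z|X}(.|x)), and the outputs
    are then uniformly randomly permuted."""
    z = [rng.choices(range(len(P[xi])), weights=P[xi])[0] for xi in x]
    rng.shuffle(z)  # Y_{pi(i)} = Z_i for a uniformly random permutation pi
    return z

# Illustrative DMC: a BSC with crossover probability 0.1.
P = [[0.9, 0.1],
     [0.1, 0.9]]
rng = random.Random(1)
x = [0] * 5 + [1] * 5
y = permutation_channel(x, P, rng)
# The receiver sees y only up to ordering, so only its histogram is informative:
print(Counter(y))
```

The shuffle is what makes this channel different from a plain DMC: the decoder can exploit only the empirical distribution of Y_1^n, which is the intuition behind the log(n) rate normalization introduced later.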

Permutation Channel Model

What if we analyze the “swapped” model?

[Block diagram: M → ENCODER → X → RANDOM PERMUTATION → V → CHANNEL → W → DECODER → M̂]

Proposition (Equivalent Models)

If channel P_{W|V} is equal to channel P_{Z|X}, then channel P_{W_1^n | X_1^n} is equal to channel P_{Y_1^n | X_1^n}.

[Block diagram: M → ENCODER → X → CHANNEL → Z → RANDOM PERMUTATION → Y → DECODER → M̂]

Remarks:

  The proof follows from a direct calculation.
  We can analyze either model!

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 7 / 39

Coding for the Permutation Channel

[Block diagram: M → ENCODER → X → CHANNEL → Z → RANDOM PERMUTATION → Y → DECODER → M̂]

General principle: “Encode the information in an object that is invariant under the [permutation] transformation.” [KV13]

Multiset codes are studied in [KV13], [KV15], and [KT18].

What are the fundamental information theoretic limits of this model?

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 8 / 39

Information Capacity of the Permutation Channel

[Block diagram: M → ENCODER → X → CHANNEL → Z → RANDOM PERMUTATION → Y → DECODER → M̂]

  Average probability of error: P_error^n ≜ P(M̂ ≠ M)
  “Rate” of a coding scheme (f_n, g_n) is R ≜ log(|M|) / log(n), i.e., |M| = n^R, because the number of empirical distributions of Y_1^n is poly(n)
  Rate R ≥ 0 is achievable ⇔ ∃ {(f_n, g_n)}_{n∈N} such that lim_{n→∞} P_error^n = 0

Definition (Permutation Channel Capacity)

  C_perm(P_{Z|X}) ≜ sup{R ≥ 0 : R is achievable}

Main Question

  What is the permutation channel capacity of a general P_{Z|X}?

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 9 / 39
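The poly(n) count behind the log(n) normalization is a standard type-counting fact: the number of empirical distributions (types) of an n-length sequence over a k-letter alphabet is C(n+k−1, k−1), a polynomial in n of degree k−1. A quick sanity check (our own, not code from the talk):

```python
from math import comb

def num_types(n, k):
    """Number of empirical distributions (types) of an n-length sequence
    over a k-letter alphabet: weak compositions of n into k parts."""
    return comb(n + k - 1, k - 1)

# Binary case: the histogram is determined by the number of ones, so n + 1 types.
assert num_types(10, 2) == 11

# In general the count is a degree-(k-1) polynomial in n, i.e., poly(n),
# which is why |M| = n^R (rather than 2^{nR}) is the right scaling here.
for n in (10, 100, 1000):
    assert num_types(n, 3) == (n + 2) * (n + 1) // 2
```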

Example: Binary Symmetric Channel

[Block diagram: M → ENCODER → BSC(p) → RANDOM PERMUTATION → DECODER → M̂]

The channel is a binary symmetric channel, denoted BSC(p):

  ∀ z, x ∈ {0, 1}:  P_{Z|X}(z|x) = 1 − p for z = x, and p for z ≠ x

  Alphabets are X = Y = {0, 1}
  Assume crossover probability p ∈ (0, 1) and p ≠ 1/2

Question: What is the permutation channel capacity of the BSC?

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 10 / 39


Outline

1 Introduction

2 Achievability and Converse for the BSC
  Encoder and Decoder
  Testing between Converging Hypotheses
  Second Moment Method for TV Distance
  Fano’s Inequality and CLT Approximation

3 General Achievability Bound

4 General Converse Bounds

5 Conclusion

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 11 / 39

Warm-up: Sending Two Messages

[Block diagram: M → ENCODER → BSC(p) → RANDOM PERMUTATION → DECODER → M̂]

  Fix a message m ∈ {0, 1}, and encode m as f_n(m) = X_1^n i.i.d. ∼ Ber(q_m)

  [Figure: q_0 = 1/3 and q_1 = 2/3 marked on the interval [0, 1]]

  Memoryless BSC(p) outputs Z_1^n i.i.d. ∼ Ber(p ∗ q_m), where p ∗ q_m ≜ p(1 − q_m) + q_m(1 − p) is the convolution of p and q_m
  Random permutation generates Y_1^n i.i.d. ∼ Ber(p ∗ q_m)
  Maximum likelihood (ML) decoder: M̂ = 1{(1/n) ∑_{i=1}^{n} Y_i ≥ 1/2} (for p < 1/2)
  (1/n) ∑_{i=1}^{n} Y_i → p ∗ q_m in probability as n → ∞ ⇒ lim_{n→∞} P_error^n = 0, since p ∗ q_0 ≠ p ∗ q_1

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 12 / 39

Page 60: Capacity of Permutation Channels - cs.purdue.edu

Encoder and Decoder

Suppose M = {1, ..., n^R} for some R > 0

Randomized encoder: Given m ∈ M, f_n(m) = X_1^n i.i.d. ∼ Ber(m/n^R)

[Figure: the encoding parameters m/n^R spread over the unit interval [0, 1]]

Given m ∈ M, Y_1^n i.i.d. ∼ Ber(p ∗ m/n^R) (as before)

ML decoder: For y_1^n ∈ {0, 1}^n, g_n(y_1^n) = argmax_{m ∈ M} P_{Y_1^n|M}(y_1^n|m)

Trade-off: Although (1/n) ∑_{i=1}^n Y_i → p ∗ m/n^R in probability as n → ∞, consecutive messages become indistinguishable, i.e. m/n^R − (m+1)/n^R → 0

Fact: Consecutive messages distinguishable ⇒ lim_{n→∞} P_error^n = 0

What is the largest R such that two consecutive messages can be distinguished?

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 13 / 39


Page 67: Capacity of Permutation Channels - cs.purdue.edu

Testing between Converging Hypotheses

Binary Hypothesis Testing:

Consider hypothesis H ∼ Ber(1/2) with uniform prior

For any n ∈ N, q ∈ (0, 1), and R > 0, consider likelihoods:

Given H = 0: X_1^n i.i.d. ∼ P_{X|H=0} = Ber(q)
Given H = 1: X_1^n i.i.d. ∼ P_{X|H=1} = Ber(q + 1/n^R)

Define the zero-mean sufficient statistic of X_1^n for H:

T_n ≜ (1/n) ∑_{i=1}^n X_i − q − 1/(2n^R)

Let Ĥ_ML^n(T_n) denote the ML decoder for H based on T_n with minimum probability of error P_ML^n ≜ P(Ĥ_ML^n(T_n) ≠ H)

Want: Largest R > 0 such that lim_{n→∞} P_ML^n = 0?

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 14 / 39


Page 72: Capacity of Permutation Channels - cs.purdue.edu

Intuition via Central Limit Theorem

For large n, P_{T_n|H}(·|0) and P_{T_n|H}(·|1) are approximately Gaussian distributions

|E[T_n|H = 0] − E[T_n|H = 1]| = 1/n^R

Standard deviations are Θ(1/√n)

[Figure: the two conditional densities P_{T_n|H}(t|0) and P_{T_n|H}(t|1), centered 1/n^R apart, each of width Θ(1/√n)]

Case R < 1/2: Decoding is possible
Case R > 1/2: Decoding is impossible

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 15 / 39
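The scaling comparison above can be checked numerically: the mean gap 1/n^R shrinks slower or faster than the Θ(1/√n) standard deviation depending on whether R is below or above 1/2. A small sketch (q = 1/3 is an arbitrary illustrative choice):

```python
import math

def separation_ratio(n, R, q=1/3):
    """Ratio of the mean gap 1/n^R between the two hypotheses to the
    standard deviation of an i.i.d. Ber(q) sample average, which is
    Theta(1/sqrt(n))."""
    gap = 1.0 / n**R                  # |E[T_n|H=1] - E[T_n|H=0]|
    std = math.sqrt(q * (1 - q) / n)  # std of the sample mean
    return gap / std

# For R < 1/2 the ratio grows with n (decoding possible);
# for R > 1/2 it shrinks to zero (decoding impossible).
```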


Page 79: Capacity of Permutation Channels - cs.purdue.edu

Second Moment Method for TV Distance

Lemma (2nd Moment Method [EKPS00])

‖P_{T_n|H=1} − P_{T_n|H=0}‖_TV ≥ (E[T_n|H = 1] − E[T_n|H = 0])^2 / (4 VAR(T_n))

where ‖P − Q‖_TV = (1/2) ‖P − Q‖_1 denotes the total variation (TV) distance between the distributions P and Q.

Proof: Let T_n^+ ∼ P_{T_n|H=1} and T_n^− ∼ P_{T_n|H=0}. By the Cauchy-Schwarz inequality,

(E[T_n^+] − E[T_n^−])^2 = (∑_t t √P_{T_n}(t) · (P_{T_n|H}(t|1) − P_{T_n|H}(t|0)) / √P_{T_n}(t))^2
                        ≤ (∑_t t^2 P_{T_n}(t)) (∑_t (P_{T_n|H}(t|1) − P_{T_n|H}(t|0))^2 / P_{T_n}(t))

Recalling that T_n is zero-mean, ∑_t t^2 P_{T_n}(t) = VAR(T_n), so (as in the Hammersley-Chapman-Robbins bound)

(E[T_n^+] − E[T_n^−])^2 ≤ 4 VAR(T_n) ((1/4) ∑_t (P_{T_n|H}(t|1) − P_{T_n|H}(t|0))^2 / P_{T_n}(t))

where the bracketed factor is the Vincze-Le Cam distance, which is upper bounded by the TV distance:

(E[T_n^+] − E[T_n^−])^2 ≤ 4 VAR(T_n) ‖P_{T_n|H=1} − P_{T_n|H=0}‖_TV

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 16 / 39
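The lemma is easy to sanity-check on small finite distributions under a uniform prior over H; a sketch:

```python
def tv_lower_bound_holds(p0, p1, support):
    """Check the 2nd-moment-method bound
    ||P1 - P0||_TV >= (E1[T] - E0[T])^2 / (4 Var(T))
    for a uniform prior over H in {0, 1}, where T has conditional
    laws p0 and p1 on the given support."""
    mix = [(a + b) / 2 for a, b in zip(p0, p1)]   # unconditional law of T
    mean0 = sum(t * a for t, a in zip(support, p0))
    mean1 = sum(t * b for t, b in zip(support, p1))
    mean = sum(t * m for t, m in zip(support, mix))
    var = sum((t - mean) ** 2 * m for t, m in zip(support, mix))
    tv = 0.5 * sum(abs(a - b) for a, b in zip(p0, p1))
    return tv >= (mean1 - mean0) ** 2 / (4 * var)
```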


Page 86: Capacity of Permutation Channels - cs.purdue.edu

BSC Achievability Proof

Proposition (BSC Achievability)

For any 0 < R < 1/2, consider the binary hypothesis testing problem with H ∼ Ber(1/2), and X_1^n i.i.d. ∼ Ber(q + h/n^R) given H = h ∈ {0, 1}. Then, lim_{n→∞} P_ML^n = 0. This implies that:

Cperm(BSC(p)) ≥ 1/2.

Proof: For any 0 < R < 1/2, start with Le Cam's relation, apply the second moment method lemma, and simplify after explicit computation:

P_ML^n = (1/2) (1 − ‖P_{T_n|H=1} − P_{T_n|H=0}‖_TV)
       ≤ (1/2) (1 − (E[T_n|H = 1] − E[T_n|H = 0])^2 / (4 VAR(T_n)))
       ≤ 3 / (2 n^{1−2R}) → 0 as n → ∞

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 17 / 39


Page 92: Capacity of Permutation Channels - cs.purdue.edu

Outline

1 Introduction

2 Achievability and Converse for the BSC
    Encoder and Decoder
    Testing between Converging Hypotheses
    Second Moment Method for TV Distance
    Fano's Inequality and CLT Approximation

3 General Achievability Bound

4 General Converse Bounds

5 Conclusion

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 18 / 39

Page 93: Capacity of Permutation Channels - cs.purdue.edu

Recall: Two Information Inequalities

Consider discrete random variables X, Y, Z that form a Markov chain X → Y → Z.

Lemma (Data Processing Inequality [CT06])

I(X; Z) ≤ I(X; Y)

with equality if and only if Z is a sufficient statistic of Y for X, i.e., X → Z → Y also forms a Markov chain.

Lemma (Fano's Inequality [CT06])

If X takes values in the finite alphabet X, then

H(X|Z) ≤ 1 + P(X ≠ Z) log(|X|)

where we perceive Z as an estimator for X based on Y.

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 19 / 39


Page 96: Capacity of Permutation Channels - cs.purdue.edu

BSC Converse Proof: Fano's Inequality Argument

Consider the Markov chain M → X_1^n → Z_1^n → Y_1^n → S_n ≜ ∑_{i=1}^n Y_i → M̂, and a sequence of encoder-decoder pairs {(f_n, g_n)}_{n ∈ N} such that |M| = n^R and lim_{n→∞} P_error^n = 0

Standard argument [CT06], using that M is uniform, Fano's inequality, the data processing inequality, sufficiency of S_n, and the data processing inequality again:

R log(n) = H(M) = H(M|M̂) + I(M; M̂)
         ≤ 1 + P_error^n R log(n) + I(M; Y_1^n)
         = 1 + P_error^n R log(n) + I(M; S_n)
         ≤ 1 + P_error^n R log(n) + I(X_1^n; S_n)

Divide by log(n) and let n → ∞:

R ≤ lim_{n→∞} I(X_1^n; S_n) / log(n)

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 20 / 39


Page 103: Capacity of Permutation Channels - cs.purdue.edu

BSC Converse Proof: CLT Approximation

Upper bound on I(X_1^n; S_n): since S_n ∈ {0, ..., n}, and given X_1^n = x_1^n with ∑_{i=1}^n x_i = k we have S_n = bin(k, 1 − p) + bin(n − k, p),

I(X_1^n; S_n) = H(S_n) − H(S_n|X_1^n)
             ≤ log(n + 1) − ∑_{x_1^n ∈ {0,1}^n} P_{X_1^n}(x_1^n) H(bin(k, 1 − p) + bin(n − k, p))
             ≤ log(n + 1) − ∑_{x_1^n ∈ {0,1}^n} P_{X_1^n}(x_1^n) H(bin(n/2, p))
             = log(n + 1) − (1/2) log(πe p(1 − p)n) + O(1/n)

where the second inequality uses max{H(X), H(Y)} ≤ H(X + Y) for X ⊥⊥ Y [CT06, Problem 2.14], and the last step approximates the binomial entropy using the CLT [ALY10].

Hence, we have R ≤ lim_{n→∞} I(X_1^n; S_n)/log(n) = 1/2.

Proposition (BSC Converse)

Cperm(BSC(p)) ≤ 1/2

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 21 / 39
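The CLT entropy approximation in the last step can be checked directly; a sketch (in nats) computing the exact entropy of bin(n/2, p) via log-gamma and comparing it with (1/2) log(πe p(1 − p)n):

```python
import math

def binomial_entropy_nats(m, p):
    """Exact entropy (in nats) of bin(m, p)."""
    h = 0.0
    logp, log1p = math.log(p), math.log(1 - p)
    for k in range(m + 1):
        # log of the binomial pmf at k, via log-gamma for stability
        logpk = (math.lgamma(m + 1) - math.lgamma(k + 1)
                 - math.lgamma(m - k + 1) + k * logp + (m - k) * log1p)
        h -= math.exp(logpk) * logpk
    return h

def clt_approximation(n, p):
    """(1/2) log(pi e p (1-p) n): the CLT estimate of H(bin(n/2, p))."""
    return 0.5 * math.log(math.pi * math.e * p * (1 - p) * n)
```

For moderate n the two values already agree to a few decimal places, consistent with the O(1/n) error term.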


Page 111: Capacity of Permutation Channels - cs.purdue.edu

Information Capacity of the BSC Permutation Channel

Proposition (Permutation Channel Capacity of BSC)

Cperm(BSC(p)) = 1, for p = 0, 1
                1/2, for p ∈ (0, 1/2) ∪ (1/2, 1)
                0, for p = 1/2

[Figure: plot of Cperm(BSC(p)) against p, equal to 1/2 on (0, 1/2) ∪ (1/2, 1), with value 1 at p ∈ {0, 1} and 0 at p = 1/2]

Remarks:

Cperm(·) is discontinuous and non-convex

Cperm(·) is generally agnostic to parameters of channel

Computationally tractable coding scheme in achievability proof

Anuran Makur (MIT) Capacity of Permutation Channels 7 October 2020 22 / 39


Page 115: Capacity of Permutation Channels - cs.purdue.edu

Outline

1 Introduction

2 Achievability and Converse for the BSC

3 General Achievability Bound
  Coding Scheme
  Rank Bound

4 General Converse Bounds

5 Conclusion

Recall General Problem

M → ENCODER → X_1^n → CHANNEL → Z_1^n → RANDOM PERMUTATION → Y_1^n → DECODER → M̂

Average probability of error: P^n_error ≜ P(M̂ ≠ M)

"Rate" of coding scheme (f_n, g_n) is R ≜ log(|M|)/log(n)

Rate R ≥ 0 is achievable ⇔ ∃ {(f_n, g_n)}_{n∈N} such that lim_{n→∞} P^n_error = 0

Definition (Permutation Channel Capacity)

Cperm(P_{Z|X}) ≜ sup{R ≥ 0 : R is achievable}

Main Question

What is the permutation channel capacity of a general P_{Z|X}?

Achievability: Coding Scheme

Let r = rank(P_{Z|X}) and k = ⌊√n⌋

Consider X′ ⊆ X with |X′| = r such that {P_{Z|X}(·|x) : x ∈ X′} are linearly independent

Message set:

M ≜ { p = (p(x) : x ∈ X′) ∈ (Z_+)^{X′} : ∑_{x∈X′} p(x) = k }

where |M| = C(k+r−1, r−1) = Θ(n^{(r−1)/2})

Randomized Encoder:

∀p ∈ M, f_n(p) = X_1^n i.i.d. ∼ P_X where P_X(x) = p(x)/k for x ∈ X′, and 0 for x ∈ X \ X′
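The message set is just the set of histograms with r nonnegative parts summing to k, so its size follows from stars and bars. A minimal sketch (the alphabet and parameters are illustrative, not from the paper) that enumerates it and samples the randomized encoder:

```python
import math
import random
from itertools import combinations

def message_set(r, k):
    """All histograms p with r nonnegative parts summing to k, enumerated via
    stars and bars: choose r-1 bar positions among k+r-1 slots."""
    msgs = []
    for bars in combinations(range(k + r - 1), r - 1):
        prev, parts = -1, []
        for b in bars:
            parts.append(b - prev - 1)  # stars strictly between consecutive bars
            prev = b
        parts.append(k + r - 2 - prev)  # stars after the last bar
        msgs.append(tuple(parts))
    return msgs

def encode(p, k, n, alphabet, rng):
    """Randomized encoder f_n: draw X_1^n i.i.d. from P_X(x) = p(x)/k on X'."""
    weights = [px / k for px in p]
    return rng.choices(alphabet, weights=weights, k=n)

r, k = 3, 5
M = message_set(r, k)
assert len(M) == math.comb(k + r - 1, r - 1)  # |M| = C(k+r-1, r-1)

rng = random.Random(0)
codeword = encode(M[0], k, n=10, alphabet=['a', 'b', 'c'], rng=rng)
```

For r = 3 and k = 5 this gives |M| = C(7, 2) = 21 messages; with k = ⌊√n⌋, the count grows as Θ(n^{(r−1)/2}), which is exactly where the rate (r−1)/2 comes from.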


Achievability: Coding Scheme

Let the stochastic matrix P_{Z|X} ∈ R^{r×|Y|} have rows {P_{Z|X}(·|x) : x ∈ X′}

Let P†_{Z|X} denote its Moore-Penrose pseudoinverse

(Sub-optimal) Thresholding Decoder: For any y_1^n ∈ Y^n,

Step 1: Construct its type/empirical distribution/histogram

∀y ∈ Y, P̂_{y_1^n}(y) = (1/n) ∑_{i=1}^n 1{y_i = y}

Step 2: Generate the estimate p̂ ∈ (Z_+)^{X′} with components

∀x ∈ X′, p̂(x) = argmin_{j∈{0,…,k}} | ∑_{y∈Y} P̂_{y_1^n}(y) [P†_{Z|X}]_{y,x} − j/k |

Step 3: Output the decoded message

g_n(y_1^n) = p̂, if p̂ ∈ M;  error, otherwise
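For the BSC the whole scheme fits in a few lines, since r = |Y| = 2 and the pseudoinverse of P_{Z|X} = [[1−p, p], [p, 1−p]] is its ordinary inverse. A toy end-to-end run (a sketch, not the paper's exact parameters: k is taken well below ⌊√n⌋ = 141 so that a single seeded run decodes with a comfortable margin):

```python
import random

p_cross, n, k = 0.1, 20000, 20           # BSC(0.1); k chosen small for margin
message = (13, 7)                         # histogram (p(0), p(1)), sums to k
rng = random.Random(7)

# Randomized encoder: X_1^n i.i.d. with P_X(1) = p(1)/k
x = [1 if rng.random() < message[1] / k else 0 for _ in range(n)]
# Memoryless BSC noise, then a random permutation (which leaves the type unchanged)
z = [xi ^ (rng.random() < p_cross) for xi in x]
rng.shuffle(z)

# Step 1: empirical type of the received sequence (fraction of ones)
type1 = sum(z) / n
# Step 2: invert the channel: P_Z(1) = p + P_X(1)*(1-2p), so the estimate of
# P_X(1) is (type1 - p)/(1 - 2p); then round to the nearest grid point j/k
est = (type1 - p_cross) / (1 - 2 * p_cross)
j = min(range(k + 1), key=lambda j: abs(est - j / k))
# Step 3: output the decoded histogram (always a valid message here)
decoded = (k - j, j)
```

The estimate's standard deviation is Θ(1/√n) while the decoding grid has spacing 1/k, so for this choice of k the rounding step succeeds with overwhelming probability.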


Achievability: Rank Bound

Theorem (Rank Bound)

For any channel P_{Z|X}:  Cperm(P_{Z|X}) ≥ (rank(P_{Z|X}) − 1)/2.

Remarks about Coding Scheme:

Showing lim_{n→∞} P^n_error = 0 proves the theorem.

Intuition: Conditioned on M = p, the empirical distribution P̂_{Y_1^n} ≈ P_Z with high probability as n → ∞. Hence, ∑_{y∈Y} P̂_{Y_1^n}(y) [P†_{Z|X}]_{y,x} ≈ P_X(x) for all x ∈ X′ with high probability.

Computational complexity: Decoder has O(n) running time.

Probabilistic method: Good deterministic codes exist.

Expurgation: Achievability bound holds under the maximal probability of error criterion.


Outline

1 Introduction

2 Achievability and Converse for the BSC

3 General Achievability Bound

4 General Converse Bounds
  Output Alphabet Bound
  Effective Input Alphabet Bound
  Degradation by Symmetric Channels

5 Conclusion

Converse: Output Alphabet Bound

Theorem (Output Alphabet Bound)

For any entry-wise strictly positive channel P_{Z|X} > 0:  Cperm(P_{Z|X}) ≤ (|Y| − 1)/2.

Remarks:

Proof hinges on Fano's inequality and a CLT approximation of the binomial entropy.

What if |X| is much smaller than |Y|? Want: a converse bound in terms of input alphabet size.


Converse: Effective Input Alphabet Bound

Theorem (Effective Input Alphabet Bound)

For any entry-wise strictly positive channel P_{Z|X} > 0:

Cperm(P_{Z|X}) ≤ (ext(P_{Z|X}) − 1)/2

where ext(P_{Z|X}) denotes the number of extreme points of conv{P_{Z|X}(·|x) : x ∈ X}.

Remarks:

Effective input alphabet size: rank(P_{Z|X}) ≤ ext(P_{Z|X}) ≤ |X|.

For any channel P_{Z|X} > 0, Cperm(P_{Z|X}) ≤ (min{ext(P_{Z|X}), |Y|} − 1)/2.

For any general channel P_{Z|X}, Cperm(P_{Z|X}) ≤ min{ext(P_{Z|X}), |Y|} − 1.

How do we prove the above theorem?
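Computing ext(P_{Z|X}) means counting extreme points of the convex hull of the channel's rows. When |Y| = 3 the rows live on the 2-simplex, so one can drop the redundant last coordinate and run a planar convex hull (Andrew's monotone chain). A sketch on a hypothetical 4-input channel whose last row is a mixture of the others, so ext = 3 < |X| = 4:

```python
def hull_vertices(points):
    """Andrew's monotone chain: vertices of the convex hull of 2-D points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    def half(seq):
        chain = []
        for p in seq:
            # pop points that do not make a strict left turn
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]
    return half(pts) + half(pts[::-1])   # lower hull + upper hull

# Rows of a hypothetical channel with |Y| = 3; the last row is the average of
# the first three, hence not an extreme point of their convex hull.
rows = [(0.8, 0.1, 0.1), (0.1, 0.8, 0.1), (0.1, 0.1, 0.8), (1/3, 1/3, 1/3)]
proj = [(r[0], r[1]) for r in rows]      # last coordinate is 1 - r[0] - r[1]
ext = len(hull_vertices(proj))
```

Here ext = 3 and the first three rows are linearly independent, so rank = 3 as well: the upper bound (min{ext, |Y|} − 1)/2 = 1 matches the rank lower bound, pinning Cperm = 1 for this strictly positive channel.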


Brief Digression: Degradation

Definition (Degradation/Blackwell Order [Bla51], [She51], [Ste51], [Cov72], [Ber73])

Given channels P_{Z1|X} and P_{Z2|X} with common input alphabet X, P_{Z2|X} is a degraded version of P_{Z1|X} if P_{Z2|X} = P_{Z1|X} P_{Z2|Z1} for some channel P_{Z2|Z1}.

Theorem (Blackwell-Sherman-Stein [Bla51], [She51], [Ste51])

The observation model P_{Z2|X} is a degraded version of P_{Z1|X} if and only if, for every prior distribution P_X and every loss function L : X × X → R, the Bayes risks satisfy:

min_{f(·)} E[L(X, f(Z1))] ≤ min_{g(·)} E[L(X, g(Z2))]

where the minima are over all randomized estimators of X.


Brief Digression: Symmetric Channels

Definition (q-ary Symmetric Channel)

A q-ary symmetric channel, denoted q-SC(δ), with total crossover probability δ ∈ [0, 1] and alphabet X where |X| = q, is given by the doubly stochastic matrix:

W_δ ≜
⎡ 1−δ       δ/(q−1)  ⋯  δ/(q−1) ⎤
⎢ δ/(q−1)   1−δ      ⋯  δ/(q−1) ⎥
⎢ ⋮         ⋮        ⋱  ⋮       ⎥
⎣ δ/(q−1)   δ/(q−1)  ⋯  1−δ     ⎦

Proposition (Degradation by Symmetric Channels)

Given a channel P_{Z|X} with ν = min_{x∈X, y∈Y} P_{Z|X}(y|x), if 0 ≤ δ ≤ ν/(1 − ν + ν/(q−1)), then P_{Z|X} is a degraded version of q-SC(δ).


Brief Digression: Symmetric Channels

Proposition (Degradation by Symmetric Channels)

Given a channel P_{Z|X} with ν = min_{x∈X, y∈Y} P_{Z|X}(y|x), if 0 ≤ δ ≤ ν/(1 − ν + ν/(q−1)), then P_{Z|X} is a degraded version of q-SC(δ).

Remarks:

The proposition follows from computing the extremal δ such that W_δ^{−1} P_{Z|X} is row stochastic.

The bound on δ can be improved when more is known about P_{Z|X}:
  Markov chain [MP18]: δ ≤ ν/(1 − (q−1)ν + ν/(q−1)).
  Additive noise channel on an Abelian group X [MP18]: δ ≤ (q−1)ν.
  Alternative bounds for Markov chains [MOS13].

Many applications in information theory, statistics, and probability [MP18], [MOS13].
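The row-stochasticity computation behind the proposition is easy to replicate: W_δ = cI + dJ with d = δ/(q−1), c = 1 − δ − d, and since its rows sum to 1 (so c + qd = 1), it has the closed-form inverse W_δ^{−1} = (1/c)(I − dJ). A sketch on a hypothetical strictly positive 3×3 channel at the extremal δ:

```python
# Verify that P_{Z|X} = W_delta * M for a row-stochastic M at the extremal delta.
P = [[0.7, 0.2, 0.1],
     [0.1, 0.6, 0.3],
     [0.2, 0.2, 0.6]]                    # hypothetical strictly positive channel
q = 3
nu = min(min(row) for row in P)          # nu = 0.1
delta = nu / (1 - nu + nu / (q - 1))     # extremal delta from the proposition

d = delta / (q - 1)                      # off-diagonal entry of W_delta
c = 1 - delta - d                        # W_delta = c*I + d*J (J = all-ones)
# Since c + q*d = 1, W_delta^{-1} = (1/c)*(I - d*J), so
# M = W_delta^{-1} P has entries (P[i][j] - d * colsum_j) / c.
col_sums = [sum(P[i][j] for i in range(q)) for j in range(q)]
M = [[(P[i][j] - d * col_sums[j]) / c for j in range(q)] for i in range(q)]

assert all(m >= -1e-12 for row in M for m in row)    # entry-wise nonnegative
assert all(abs(sum(row) - 1) < 1e-9 for row in M)    # rows sum to one
# Reconstruct P = W_delta * M to confirm the degradation factorization
W = [[1 - delta if i == j else d for j in range(q)] for i in range(q)]
recon = [[sum(W[i][l] * M[l][j] for l in range(q)) for j in range(q)]
         for i in range(q)]
assert all(abs(recon[i][j] - P[i][j]) < 1e-9 for i in range(q) for j in range(q))
```

At any larger δ, some entry of W_δ^{−1} P would go negative, which is exactly how the extremal δ in the proposition is pinned down.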


Proof Idea: Degradation by Symmetric Channels

Theorem (Effective Input Alphabet Bound)

For any entry-wise strictly positive channel P_{Z|X} > 0:  Cperm(P_{Z|X}) ≤ (ext(P_{Z|X}) − 1)/2.

Proof Sketch:

Degradation by symmetric channels + tensorization of degradation + data processing

⇒ I(X_1^n; Y_1^n) ≤ I(X_1^n; Ỹ_1^n)

where Y_1^n and Ỹ_1^n are the outputs of permutation channels with P_{Z|X} and q-SC(δ), respectively.

Convexity of KL divergence ⇒ reduce |X| to ext(P_{Z|X}).

Fano argument of the output alphabet bound ⇒ effective input alphabet bound.


Outline

1 Introduction

2 Achievability and Converse for the BSC

3 General Achievability Bound

4 General Converse Bounds

5 Conclusion
  Strictly Positive and "Full Rank" Channels

Strictly Positive and "Full Rank" Channels

Achievability and converse bounds yield:

Theorem (Strictly Positive and "Full Rank" Channels)

For any entry-wise strictly positive channel P_{Z|X} > 0 that is "full rank" in the sense that r ≜ rank(P_{Z|X}) = min{ext(P_{Z|X}), |Y|}:

Cperm(P_{Z|X}) = (r − 1)/2.

Recall Example: Cperm of a non-trivial binary symmetric channel is 1/2.
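The BSC instance of this theorem can be checked mechanically: the capacity formula reduces to computing a matrix rank. A minimal sketch with a pure-Python rank routine (Gaussian elimination, fine for small stochastic matrices):

```python
def matrix_rank(A, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting (small matrices)."""
    A = [row[:] for row in A]
    rows, cols = len(A), len(A[0])
    rank = r = 0
    for c in range(cols):
        pivot = max(range(r, rows), key=lambda i: abs(A[i][c]), default=None)
        if pivot is None or abs(A[pivot][c]) < tol:
            continue                      # no pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, rows):
            f = A[i][c] / A[r][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
        rank += 1
        if r == rows:
            break
    return rank

def cperm_full_rank(P):
    """Cperm = (rank - 1)/2, valid when P > 0 and rank(P) = min{ext(P), |Y|}."""
    return (matrix_rank(P) - 1) / 2

cap = cperm_full_rank([[0.9, 0.1], [0.1, 0.9]])   # BSC(0.1): rank 2
```

For BSC(0.1) the rank is 2 and the formula gives 1/2, matching the example above; for BSC(1/2) the two rows coincide, the rank drops to 1 (and ext = 1 = min{ext, |Y|}, so the theorem still applies), and the formula correctly degenerates to 0.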


Conclusion

Main Result: For any entry-wise strictly positive channel P_{Z|X} > 0:

(rank(P_{Z|X}) − 1)/2 ≤ Cperm(P_{Z|X}) ≤ (min{ext(P_{Z|X}), |Y|} − 1)/2.

Future Directions:

Characterize Cperm of all (entry-wise strictly positive) channels.

Perform error exponent analysis (i.e., tight bounds on P^n_error).

Prove strong converse results (i.e., a phase transition for P^n_error).

Perform finite blocklength analysis (i.e., exact asymptotics for the maximum achievable |M|).

Analyze permutation channels with more complex probability models in the random permutation block.


References

This talk was based on:

A. Makur, "Information capacity of BSC and BEC permutation channels," in Proceedings of the 56th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, October 2-5, 2018, pp. 1112-1119.

A. Makur, "Bounds on permutation channel capacity," in Proceedings of the IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, June 21-26, 2020.

A. Makur, "Coding theorems for noisy permutation channels," accepted to IEEE Transactions on Information Theory, July 2020.


Thank You!
