Digital Communications III (ECE 154C)
Introduction to Coding and Information Theory
Tara Javidi
These lecture notes were originally developed by the late Prof. J. K. Wolf.
UC San Diego
Spring 2014
Overview of ECE 154C
Course Overview I: Digital Communications Block Diagram

• Note that the Source Encoder converts all types of information to a stream of binary digits.
• Note that the Channel Encoder, in an attempt to protect the source-coded (binary) stream, judiciously adds redundant bits.
• Sometimes the output of the source decoder must be an exact replica of the information (e.g. computer data) — called NOISELESS CODING (aka lossless compression).
• Other times the output of the source decoder can be approximately equal to the information (e.g. music, TV, speech) — called CODING WITH DISTORTION (aka lossy compression).
Overview II: What will we cover?

REFERENCE: CHAPTER 10, ZIEMER & TRANTER

SOURCE CODING — NOISELESS CODES
◦ Basic idea is to use as few binary digits as possible and still be able to recover the information exactly
◦ Topics include:
  • Huffman Codes
  • Shannon-Fano Codes
  • Tunstall Codes
  • Entropy of a Source
  • Lempel-Ziv Codes

SOURCE CODING WITH DISTORTION
◦ Again the idea is to use the minimum number of binary digits for a given value of distortion
◦ Topics include:
  • Gaussian Source
  • Optimal Quantizing

CHANNEL CAPACITY OF A NOISY CHANNEL
◦ Even if the channel is noisy, messages can be sent essentially error-free if extra digits are transmitted
◦ Basic idea is to use as few extra digits as possible
◦ Topics covered:
  • Channel Capacity
  • Mutual Information
  • Some Examples

CHANNEL CODING
◦ Basic idea: detect errors that occurred on the channel and then correct them
◦ Topics covered:
  • Hamming Code
  • General Theory of Block Codes (Parity Check Matrix, Generator Matrix, Minimum Distance, etc.)
  • LDPC Codes
  • Turbo Codes
  • Code Performance
A Few Examples
Example 1: 4 letter DMS

Basic concepts came from one paper of one man named Claude Shannon! Shannon used simple models that capture the essence of the problem!

EXAMPLE 1: Simple model of a source (called a DISCRETE MEMORYLESS SOURCE, or DMS)

• I.I.D. (Independent and Identically Distributed) source letters
• Alphabet size of 4 (A, B, C, D)
• P(A) = p1, P(B) = p2, P(C) = p3, P(D) = p4, with Σ_i p_i = 1
• Simplest code: A → 00, B → 01, C → 10, D → 11
• Average length of code words:

    L̄ = 2(p1 + p2 + p3 + p4) = 2

Q: Can we use fewer than 2 binary digits per source letter (on the average) and still recover the information from the binary sequence?

A: Depends on the values of (p1, p2, p3, p4).
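For instance, a minimal sketch (Python): the variable-length code below is one that reappears later in these notes; when the letters are not equally likely, it can beat 2 binary digits per letter on the average.

```python
# Average code length for two codes over a 4-letter DMS.
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}

fixed_code = {"A": "00", "B": "01", "C": "10", "D": "11"}
varlen_code = {"A": "0", "B": "10", "C": "110", "D": "111"}  # prefix-free

def average_length(code, probs):
    """Average number of binary digits per source letter."""
    return sum(probs[s] * len(w) for s, w in code.items())

print(average_length(fixed_code, probs))   # 2.0
print(average_length(varlen_code, probs))  # 1.75 -- fewer than 2!
```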
Example 2: Binary Symmetric Channel

EXAMPLE 2: Simple model for a noisy channel

Channels, as you saw in ECE 154B, can be viewed as follows: if s0(t) = −s1(t) and the signals are equally likely,

    P_error = Q(√(2E/N0)) = p

Shannon considered a simpler channel called the binary symmetric channel (or BSC for short). Mathematically,

    P_{Y|X}(y|x) = 1 − p  if y = x
                 = p      if y ≠ x

Q: Can we send information “error-free” over such a channel even though p ≠ 0, 1?

A: Depends on the rate of transmission (how many channel uses are allowed per information bit). Essentially, for a small enough transmission rate (to be defined precisely), the answer is YES!
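A quick simulation sketch (Python; the crossover probability p = 0.1 and the naive 3x repetition code are arbitrary illustrative choices, not the course's coding scheme) shows the flavor of the answer: redundancy already cuts the error rate, at the cost of rate 1/3.

```python
import random

def bsc(bits, p):
    """Binary symmetric channel: flip each bit independently with probability p."""
    return [b ^ (random.random() < p) for b in bits]

def repeat3_encode(bits):
    return [b for b in bits for _ in range(3)]

def repeat3_decode(bits):
    """Majority vote over each group of 3 received bits."""
    return [int(sum(bits[i:i + 3]) >= 2) for i in range(0, len(bits), 3)]

random.seed(0)
p, n = 0.1, 100_000
data = [random.randint(0, 1) for _ in range(n)]

raw_errors = sum(a != b for a, b in zip(data, bsc(data, p)))
coded = repeat3_decode(bsc(repeat3_encode(data), p))
coded_errors = sum(a != b for a, b in zip(data, coded))

print(raw_errors / n)    # close to p = 0.1
print(coded_errors / n)  # close to 3p^2(1-p) + p^3 = 0.028
```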
Example 3: DMS with Alphabet size 8

EXAMPLE 3: Discrete Memoryless Source with an alphabet of 8 letters: {A, B, C, D, E, F, G, H}

• Probabilities: pA ≥ pB ≥ pC ≥ pD ≥ pE ≥ pF ≥ pG ≥ pH
• See the following codes:

Q: Which codes are uniquely decodable? Which ones are instantaneously decodable? Compute the average length of the codewords for each code.
EXAMPLE 4: Can you optimally design a code?

    L̄ = (1/2)×1 + (1/4)×2 + (1/8)×3 + (1/16)×4 + (4/64)×6
       = 1/2 + 1/2 + 3/8 + 1/4 + 3/8 = 2

We will see that this is an optimal code (not only among the single-letter constructions but overall).
EXAMPLE 5:

    L̄ = .1 + .1 + .2 + .2 + .3 + .5 + 1 = 2.4

But here we can do better by encoding 2 source letters (or more) at a time.
Source Coding: Lossless Compression
Source Coding: A Simple Example

Back to our simple example of a source:

    P[A] = 1/2, P[B] = 1/4, P[C] = 1/8, P[D] = 1/8

Assumptions
1. One must be able to uniquely recover the source sequence from the binary sequence
2. One knows the start of the binary sequence at the receiver
3. One would like to minimize the average number of binary digits per source letter

1. A → 00, B → 01, C → 10, D → 11
   ABAC → 00010010 → ABAC
   L̄ = 2

2. A → 0, B → 1, C → 10, D → 11
   AABD → 00111, which also reads as AABBB or AADB: the source sequence cannot be uniquely recovered
   L̄ = 5/4
   This code is useless. Why?

3. A → 0, B → 10, C → 110, D → 111
   ABACD → 0100110111 → ABACD
   L̄ = 7/4
   Minimum length code satisfying Assumptions 1 and 2!
Source Coding: Basic Definitions

Codeword (aka Block Code)
Each source symbol is represented by some sequence of coded symbols called a code word

Non-Singular Code
Code words are distinct

Uniquely Decodable (U.D.) Code
Every distinct concatenation of m code words is distinct, for every finite m

Instantaneous Code
A U.D. code where we can decode each code word without seeing subsequent code words

Example
Back to the simple case of the 4-letter DMS:

Source Symbols   Code 1   Code 2   Code 3   Code 4
A                0        00       0        0
B                1        01       10       01
C                00       10       110      011
D                01       11       111      111
Non-Singular     Yes      Yes      Yes      Yes
U.D.             No       Yes      Yes      Yes
Instantaneous    No       Yes      Yes      No

A NECESSARY AND SUFFICIENT CONDITION for a code to be instantaneous is that no code word be a PREFIX of any other code word.
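The prefix condition is easy to test mechanically. A small sketch (Python) applied to the four codes in the table above:

```python
# Test the prefix (instantaneous) condition for each code.
codes = {
    "Code 1": ["0", "1", "00", "01"],
    "Code 2": ["00", "01", "10", "11"],
    "Code 3": ["0", "10", "110", "111"],
    "Code 4": ["0", "01", "011", "111"],
}

def is_instantaneous(words):
    """True iff no code word is a prefix of another (prefix-free)."""
    return not any(u != v and v.startswith(u) for u in words for v in words)

for name, words in codes.items():
    print(name, is_instantaneous(words))
# Code 1 False, Code 2 True, Code 3 True, Code 4 False
```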
Coding Several Source Symbols at a Time
5 / 16
Example 1: 3 letter DMS
Source Coding
Larger Alphabet
• Example 1
Huffman Codes
Class Work
6 / 16
Source Symbols Probability U.D. Code
A .5 0
B .35 10
C .85 11
L1 = 1.5 (Bits / Symbol)
Example 1: 3 letter DMS
Source Coding
Larger Alphabet
• Example 1
Huffman Codes
Class Work
6 / 16
Source Symbols Probability U.D. Code
A .5 0
B .35 10
C .85 11
L1 = 1.5 (Bits / Symbol)
Let us consider two consecutive source symbols at a time:
2 Symbols Probability U.D. Code
AA .25 01
AB .175 11
AC .075 0010
BA .175 000
BB .1225 101
BC .0525 1001
CA .075 0011
CB .0525 10000
CC .0225 10001
L2 = 2.9275 (Bits/ 2 Symbols)L2
2= 1.46375 (Bits / Symbol)
Example 1: 3 letter DMS
Source Coding
Larger Alphabet
• Example 1
Huffman Codes
Class Work
6 / 16
In other words,
1. It is more efficient to build a code for 2 source symbols!
2. Is it possible to decrease the length more and more by
increasing the alphabet size?
To see the answer to the above question, it is useful if we can say
precisely characterize the best code. The codes given above are
Huffman Codes. The procedure for making Huffman Codes will be
described next.
Minimizing Average Length
Binary Huffman Codes

1. Order probabilities - highest to lowest
2. Add the two lowest probabilities
3. Reorder probabilities
4. Break ties in any way you want
5. Assign 0 to the top branch and 1 to the bottom branch (or vice versa)
6. Continue until we have only one probability, equal to 1
7. L̄ = sum of the probabilities of the combined nodes (i.e., the circled ones)

Example
1. {.1, .2, .15, .3, .25} -order→ {.3, .25, .2, .15, .1}
2. Combine the two lowest: {.3, .25, .2, (.15 + .1 = .25)}
3. Reordering, get either {.3, (.15, .1 → .25), .25, .2} or {.3, .25, (.15, .1 → .25), .2}

Optimality of Huffman Coding
1. A binary Huffman code will have the shortest average length as compared with any U.D. code for that set of probabilities.
2. The Huffman code is not unique. Breaking ties in different ways can result in very different codes. The average length, however, will be the same for all of these codes.
Huffman Coding: Example

Example Continued
1. {.1, .2, .15, .3, .25} -order→ {.3, .25, .2, .15, .1}

Either tree (the two ways of breaking the tie) gives

    L̄ = .25 + .45 + .55 + 1 = 2.25
Huffman Coding: Tie Breaks

In the last example, the two ways of breaking the tie led to two different codes with the same set of code lengths. This is not always the case — sometimes we get different codes with different code lengths.

EXAMPLE:
Huffman Coding: Optimal Average Length

• A binary Huffman code will have the shortest average length as compared with any U.D. code for the same set of probabilities (no U.D. code will have a shorter average length).
◦ The proof that a binary Huffman code is optimal — that is, has the shortest average code word length as compared with any U.D. code for the same set of probabilities — is omitted.
◦ However, we would like to mention that the proof is based on the fact that, in the process of constructing a Huffman code for a given set of probabilities, other codes are formed for other sets of probabilities, all of which are optimal.
Shannon-Fano Codes

SHANNON-FANO CODES are another binary coding technique to construct U.D. codes (not necessarily optimum!)

1. Order the probabilities in decreasing order.
2. Partition into 2 sets that are as close to equally probable as possible. Label the top set with a "0" and the bottom set with a "1".
3. Continue using step 2 over and over.

Example (a): probabilities (.5, .2, .15, .15)

    .5  ⇒ 0
    .2  ⇒ 1 0
    .15 ⇒ 1 1 0
    .15 ⇒ 1 1 1

    L̄ = 1.8

Example (b): probabilities (.4, .3, .1, .05, .05, .05, .05)

    .4  ⇒ 0
    .3  ⇒ 1 0
    .1  ⇒ 1 1 0 0
    .05 ⇒ 1 1 0 1
    .05 ⇒ 1 1 1 0
    .05 ⇒ 1 1 1 1 0
    .05 ⇒ 1 1 1 1 1

    L̄ = 2.3

Compare with Huffman coding. Same length as the Huffman code?!
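A recursive sketch (Python) of the partition step; the split point is chosen to make the two halves as close to equally probable as possible:

```python
def shannon_fano(items):
    """items: list of (symbol, prob) sorted by decreasing prob.
    Returns {symbol: codeword}. A sketch of the partition rule."""
    if len(items) == 1:
        return {items[0][0]: ""}
    total = sum(p for _, p in items)
    best, run = None, 0.0
    for i in range(1, len(items)):  # split making the halves most equal
        run += items[i - 1][1]
        if best is None or abs(2 * run - total) < abs(2 * best[1] - total):
            best = (i, run)
    i = best[0]
    code = {s: "0" + w for s, w in shannon_fano(items[:i]).items()}
    code.update({s: "1" + w for s, w in shannon_fano(items[i:]).items()})
    return code

items = [("a", .4), ("b", .3), ("c", .1), ("d", .05),
         ("e", .05), ("f", .05), ("g", .05)]
code = shannon_fano(items)
print(code)
print(sum(p * len(code[s]) for s, p in items))  # L̄ = 2.3
```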
Class Work
Binary Huffman Codes

Construct binary Huffman and Shannon-Fano codes where:

EXAMPLE 1: (p1, p2, p3, p4) = (1/2, 1/4, 1/8, 1/8)

EXAMPLE 2: Consider the examples on the previous slide and construct binary Huffman codes.
Large Alphabet Size

Example 3: Consider a binary source {A, B} with (p1, p2) = (.9, .1). Now construct a series of Huffman codes and a series of Shannon-Fano codes, by encoding N source symbols at a time for N = 1, 2, 3, 4.
Shannon-Fano codes are suboptimal!

Example 4: Construct a Shannon-Fano code:

Probability   Shannon-Fano Code   Length
.25           0 0                 2
.20           0 1                 2
.15           1 0 0               3
.10           1 0 1               3
.05           1 1 0 0             4
.05           1 1 0 1 0           5
.05           1 1 0 1 1           5
.05           1 1 1 0             4
.04           1 1 1 1 0           5
.03           1 1 1 1 1 0         6
.03           1 1 1 1 1 1         6

    L̄ = 3.11

Compare this with a binary Huffman code.
Noiseless Source Coding Continued
Non-binary Huffman Coding

• The objective is to create a Huffman code where the code words are from an alphabet with n letters:
  1. Order probabilities high to low (perhaps with extra symbols of probability 0)
  2. Combine the n least likely probabilities. Add them and re-order.
  3. End up with n symbols (i.e. probabilities)!!!

Example 1: A source with alphabet {A, B, C, D, E} and probabilities (.5, .3, .1, .08, .02), coded into a ternary stream (n = 3):
Example 2: n = 3, {A, B, C, D}, (p1, p2, p3, p4) = (.5, .3, .1, .1)

Placeholder Figure A: L̄₁ = 1.5, SUBOPTIMAL (combining three real symbols first)
Placeholder Figure A: L̄₁ = 1.2, OPTIMAL (one phantom symbol added)

• Sometimes one has to add Phantom Source Symbols with 0 probability in order to make a non-binary Huffman code.
• How many?
  ◦ If one starts with M source symbols and combines the n least likely into one symbol, one is left with M − (n − 1) symbols.
  ◦ After doing this α times, one is left with M − α(n − 1) symbols.
  ◦ But at the end we must be left with n symbols. If this is not the case, we must add Phantom Symbols.
• Add D Phantom Symbols to ensure that

    M + D − α(n − 1) = n,  or  (M + D) = α′(n − 1) + 1
EXAMPLES:

n = 3        n = 4        n = 5        n = 6
M    D       M    D       M    D       M    D
3    0       4    0       5    0       6    0
4    1       5    2       6    3       7    4
5    0       6    1       7    2       8    3
6    1       7    0       8    1       9    2
7    0       8    2       9    0       10   1
8    1       9    1       10   3       11   0
9    0       10   0       11   2       12   4
10   1       11   1       12   1       13   3

NOTE:
• M + D − 1 must be divisible by n − 1.
  EX: n = 3 ⇒ M + D − 1 must be even
• D ≤ n − 2
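The phantom-symbol count is a one-line computation; a sketch (Python) reproducing the tables above:

```python
def phantom_symbols(M, n):
    """Number D of zero-probability phantom symbols so that
    M + D = alpha*(n-1) + 1 for some integer alpha (n-ary Huffman)."""
    return (1 - M) % (n - 1)

for n in (3, 4, 5, 6):
    print(n, [phantom_symbols(M, n) for M in range(n, n + 8)])
# n=3: [0, 1, 0, 1, 0, 1, 0, 1]   n=4: [0, 2, 1, 0, 2, 1, 0, 2] ...
```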
Beyond Huffman Codes

Run Length Codes for Fax (B/W)
Variable-length to Fixed-length Source Coding

Previously we only considered the situation where we encoded N source symbols into variable-length code sequences for a fixed value of N. We could call this "fixed-length to variable-length" encoding. But another possibility exists. We could encode variable-length source sequences into fixed- or variable-length code words.

Example 1: Consider the DMS {A, B} with probabilities (.9, .1) and the following code book:

Source Sequence   Codeword
B                 00
AB                01
AAB               10
AAA               11

Average length of a source phrase = 1 × .1 + 2 × .09 + 3 × (.081 + .729) = 2.71
Average # of code symbols per source symbol = 2 / 2.71 = 0.738
Tunstall Codes

Tunstall Codes are U.D. variable-length to fixed-length codes with binary code words.

Basic idea: to encode into binary code words of fixed length L, make 2^L source phrases that are as nearly equally probable as we can.

We do this by making the source phrases the leaves of a tree and always splitting the leaf with the highest probability.
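A direct sketch (Python) of this leaf-splitting rule. On the four-letter source of the example that follows, it produces 16 phrases with the same average phrase length (2.2); tie-breaking between equal-probability leaves may differ from the tree in the notes.

```python
import heapq

def tunstall(probs, L):
    """Tunstall phrase set: grow a tree by always splitting the most
    probable leaf, as long as the result still fits in 2**L leaves."""
    leaves = [(-p, s) for s, p in probs.items()]  # max-heap via negation
    heapq.heapify(leaves)
    # each split removes one leaf and adds len(probs) new ones
    while len(leaves) + len(probs) - 1 <= 2 ** L:
        p, phrase = heapq.heappop(leaves)  # most probable leaf
        for s, ps in probs.items():
            heapq.heappush(leaves, (p * ps, phrase + s))
    return {phrase: -p for p, phrase in leaves}

phrases = tunstall({"A": .5, "B": .3, "C": .1, "D": .1}, L=4)
print(len(phrases), sorted(phrases))                   # 16 phrases
print(sum(p * len(ph) for ph, p in phrases.items()))   # 2.2 symbols/phrase
```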
Example 2: (A, B, C, D) with (p1, p2, p3, p4) = (.5, .3, .1, .1)

Placeholder Figure A

Source Phrase   Code word    Source Phrase   Code word
D               0000         C               0001
BB              0010         AB              1001
BC              0011         AC              1010
BD              0100         AD              1011
BAA             0101         AAA             1100
BAB             0110         AAB             1101
BAC             0111         AAC             1110
BAD             1000         AAD             1111

Average length of a source phrase = (sum of probabilities of internal nodes) = 1 + .5 + .3 + .25 + .15 = 2.2
Average number of code symbols per source symbol = 4 / 2.2 = 1.82
Improved Tunstall Coding

• Since the phrases are not equally probable, one can use a Huffman code on the phrases.
• The result is encoding a variable number of source symbols into a variable number of code symbols.

Example 3: Back to Example 1 with (A, B) and (.9, .1).
We have seen that Tunstall alone gives an average phrase length of 2.71, i.e. 2 / 2.71 = .738 code symbols per source symbol.

Source Phrase   Tunstall Code   Probability   Improved Tunstall (Huffman)
AAA             11              0.729         0
B               00              0.1           11
AB              01              0.09          100
AAB             10              0.081         101

Average # of code symbols per source symbol
    = (1 + .271 + .171) / (1 + .9 + .81) = 1.442 / 2.71 = .532
Summary of Results for (A, B) = (.9, .1)

All of the following use 4 code words in the coding table:

1. Huffman Code, N = 2
   AA → 0, AB → 11, BA → 100, BB → 101

2. Shannon-Fano Code, N = 2
   AA → 0, AB → 10, BA → 110, BB → 111

3. Tunstall Code
   B → 00, AB → 01, AAB → 10, AAA → 11

4. Tunstall/Huffman
   B → 11, AB → 100, AAA → 0, AAB → 101
Lempel-Ziv Source Coding

• The basic idea is that if we have a dictionary of 2^A source phrases (available at both the encoder and the decoder), then in order to encode one of these phrases one needs only A binary digits.
• Normally, a computer stores each symbol as an ASCII character of 8 binary digits. (Actually only 7 are needed.)
• Using L-Z encoding, far fewer than 7 binary digits per symbol are needed. Typically the compression is about 2:1 or 3:1.
• Variants of L-Z codes were the algorithms behind the widely used Unix file compression utility compress as well as gzip. Several other popular compression utilities also used L-Z, or closely related encodings.
• LZ became very widely used when it became part of the GIF image format in 1987. It may also (optionally) be used in TIFF and PDF files.
• There are two versions of L-Z codes. We will only discuss the “window” version.
• In (the window version of) Lempel-Ziv, symbols that have already been encoded are stored in a window.
• The encoder then looks at the next symbols to be encoded to find the longest string in the window that matches the source symbols to be encoded.
  ◦ If it can’t find the next symbol in the window, it sends a “0” followed by the 8 (or 7) bits of the ASCII character.
  ◦ If it finds a sequence of one or more symbols in the window, it sends a “1” followed by the bit position of the first symbol in the match, followed by the length of the match. These latter two quantities are encoded into binary.
  ◦ Then the sequence that was just encoded is put into the window.
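A toy sketch (Python) of this window scheme. The 16-symbol window mirrors the examples that follow; positions count back from the newest window symbol, ties prefer the most recent match, and bit accounting is left to the reader. Under these conventions it reproduces the token stream tabulated below.

```python
def lz_window_encode(text, window_size=16):
    """Toy window-version Lempel-Ziv. Emits (0, literal) when the next
    symbol is not in the window, else (1, pos, length) where pos counts
    back from the newest window symbol."""
    window, out, i = "", [], 0
    while i < len(text):
        best_pos, best_len = 0, 0
        buf = window + text[i:]        # a match may run past the window
        for start in range(len(window)):
            length = 0
            while (i + length < len(text)
                   and text[i + length] == buf[start + length]):
                length += 1
            if length >= best_len and length > 0:  # prefer newest on ties
                best_pos, best_len = len(window) - 1 - start, length
        if best_len >= 1:
            out.append((1, best_pos, best_len))
            step = best_len
        else:
            out.append((0, text[i]))
            step = 1
        window = (window + text[i:i + step])[-window_size:]
        i += step
    return out

for token in lz_window_encode("MY MY MY WHAT A HAT IS THAT"):
    print(token)
```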
Example: Suppose the content of the window is given as

Position:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Symbol:     T  H  E  ␣  T  H  R  E  E  ␣  A  R  E  ␣  I  N

The next word, assuming it is “THE ”, will be encoded as

    (1, "15", "4")   (flag 1, then position 15 in 4 bits, then match length 4 in ? bits)

And then the window’s content will be updated as

Position:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Symbol:     T  H  R  E  E  ␣  A  R  E  ␣  I  N  T  H  E  ␣
EXAMPLE: Encode the text “MY MY MY WHAT A HAT IS THAT” using a 16-SYMBOL WINDOW.

Placeholder Figure A: successive contents of the 16-symbol window (positions 15 ... 0) after each step

Output        # of bits
(0, M)        9
(0, Y)        9
(0, ␣)        9
(1, 2, 3)     8
(1, 2, 3)     8
(0, W)        9
(0, H)        9
(0, A)        9
(0, T)        9
(1, 4, 1)     6
(1, 2, 1)     6
(1, 1, 1)     6
(1, 5, 4)     9
(0, I)        9
(0, S)        9
(1, 2, 1)     6
(1, 4, 1)     6
(1, 7, 3)     8
Total         144
No match: (0, M) → 1 bit more than needed for an uncoded symbol (1 + 8 = 9)

Match: (1, position, length)
• 1 bit for the flag
• a number of bits that depends on the window size (4 bits for a 16-symbol window)
• a number of bits that depends on the code used to encode lengths

One really simple code for this purpose might be

    1 → 0, 2 → 10, 3 → 110, ...

What are the advantages/disadvantages of this code?
Noiseless Source Coding: Fundamental Limits
Shannon Entropy

• Let an I.I.D. source S have M source letters that occur with probabilities p_1, p_2, ..., p_M, where Σ_{i=1}^{M} p_i = 1.
• The entropy of the source S is denoted H(S) and is defined as

    H_a(S) = Σ_{i=1}^{M} p_i log_a(1/p_i) = −Σ_{i=1}^{M} p_i log_a p_i = E[log_a(1/p_i)]

• The base of the logarithms is usually taken to be equal to 2. In that case, H_2(S) is simply written as H(S) and is measured in units of “bits”.
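A one-line sketch (Python) that we will reuse to check the examples below:

```python
from math import log2

def entropy(probs):
    """Shannon entropy H2(S) in bits for a list of probabilities."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(entropy([0.9, 0.1]))            # 0.469 bits
print(entropy([1/2, 1/4, 1/8, 1/8]))  # 1.75 bits
print(entropy([0.25] * 4))            # log2(4) = 2 bits
```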
• Other bases can be used and are easy to transform to. Note:

    log_a x = (log_b x)(log_a b) = (log_b x) / (log_b a)

  Hence,

    H_a(S) = H_b(S) · log_a b = H_b(S) / (log_b a)

A Useful Theorem
Let p_1, p_2, ..., p_M be one set of probabilities and let p′_1, p′_2, ..., p′_M be another (note Σ_{i=1}^{M} p_i = 1 and Σ_{i=1}^{M} p′_i = 1). Then

    Σ_{i=1}^{M} p_i log(1/p_i) ≤ Σ_{i=1}^{M} p_i log(1/p′_i),

with equality iff p_i = p′_i for i = 1, 2, ..., M.
Proof.
First note that ln x ≤ x − 1, with equality iff x = 1.

Placeholder Figure A

    Σ_{i=1}^{M} p_i log(p′_i / p_i) = (Σ_{i=1}^{M} p_i ln(p′_i / p_i)) · log e
Proof continued.

    (Σ_{i=1}^{M} p_i ln(p′_i / p_i)) · log e ≤ (log e) Σ_{i=1}^{M} p_i (p′_i / p_i − 1)
                                            = (log e) (Σ_{i=1}^{M} p′_i − Σ_{i=1}^{M} p_i) = 0

In other words,

    Σ_{i=1}^{M} p_i log(1/p_i) ≤ Σ_{i=1}^{M} p_i log(1/p′_i)
More on Entropy

1. For an equally likely i.i.d. source with M source letters, H_2(S) = log_2 M (and H_a(S) = log_a M for any a).
2. For any i.i.d. source with M source letters, 0 ≤ H_2(S) ≤ log_2 M.
   This follows from the previous theorem with p′_i = 1/M for all i.
3. Consider an i.i.d. source with source alphabet S. If we consider encoding M source letters at a time, this is an i.i.d. source with alphabet S^M. Call this the M-tuple extension of the source and denote it by S^M. Then

    H_2(S^M) = M · H_2(S)

   (and similarly, for all a, H_a(S^M) = M · H_a(S)).

The proofs are omitted but are easy.
Computation of Entropy (base 2)

Example 1: Consider a source with M = 2 and (p1, p2) = (0.9, 0.1).

    H_2(S) = .9 log_2(1/.9) + .1 log_2(1/.1) = 0.469 bits

From before, we gave Huffman codes for this source and extensions of this source for which

    L̄_1 = 1,   L̄_2/2 = 0.645,   L̄_3/3 = 0.533,   L̄_4/4 = 0.49

In other words, we note that L̄_m/m ≥ H_2(S). Furthermore, as m gets larger, L̄_m/m gets closer to H_2(S).
Performance of Huffman Codes

→ One can prove that, in general, for a binary Huffman code,

    H_2(S) ≤ L̄_m/m < H_2(S) + 1/m

Example 2: Consider a source with M = 3 and (p1, p2, p3) = (.5, .35, .15).

    H_2(S) = .5 log_2(1/.5) + .35 log_2(1/.35) + .15 log_2(1/.15) = 1.44 bits

Again, we have already given codes for this source such that

    1 symbol at a time:  L̄_1 = 1.5
    2 symbols at a time: L̄_2/2 = 1.46
Example 3: Consider M = 4 and (p1, p2, p3, p4) = (1/2, 1/4, 1/8, 1/8).

    H_2(S) = (1/2) log_2 2 + (1/4) log_2 4 + (1/8) log_2 8 + (1/8) log_2 8 = 1.75 bits

But from before we gave the code

    Source Symbols   Codewords
    A                0
    B                10
    C                110
    D                111

for which L̄_1 = H_2(S).

This means that one cannot improve on the efficiency of this Huffman code by encoding several source symbols at a time.
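Checking with the entropy sketch above: for dyadic probabilities (each p_i a power of 1/2), the code with l_i = log_2(1/p_i) meets the entropy exactly.

```python
from math import log2

probs = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

H = sum(p * log2(1 / p) for p in probs.values())
L = sum(p * len(code[s]) for s, p in probs.items())
print(H, L)  # both 1.75 -- no extension can do better
```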
Fundamental Source Coding Limit

Theorem [Lossless Source Coding Theorem]
For any U.D. binary code corresponding to the Nth extension of the I.I.D. source S, for every N = 1, 2, ...

    L̄_N / N ≥ H(S)

• For a binary Huffman code corresponding to the Nth extension of the I.I.D. source S,

    H_2(S) + 1/N > L̄_N / N ≥ H_2(S)

• But this implies that Huffman coding is asymptotically optimal, i.e.

    lim_{N→∞} L̄_N / N = H_2(S)
NON-BINARY CODE WORDS

The code symbols that make up the codewords can be from a higher-order alphabet than 2.

Example 4: I.I.D. source {A, B, C, D, E} with U.D. codes (where each concatenation of code words can be decoded in only one way):

SYMBOLS   TERNARY   QUATERNARY   5-ary
A         0         0            0
B         1         1            1
C         20        2            2
D         21        30           3
E         22        31           4

• A lower bound on the average code length of any U.D. n-ary code (with n letters) is H_n(S).
• For example, for a ternary code, the average length (per source letter), L̄_M/M, is no less than H_3(S).
Source Coding Theorem
Fundamental Source Coding Limit

Theorem [Lossless Source Coding Theorem]
For any U.D. n-ary code corresponding to the Nth extension of the I.I.D. source S, for every N = 1, 2, ...

    L̄_N / N ≥ H_n(S)

• Furthermore, for an n-ary Huffman code corresponding to the Nth extension of the I.I.D. source S,

    H_n(S) + 1/N > L̄_N / N ≥ H_n(S)

• But this implies that Huffman coding is asymptotically optimal, i.e.

    lim_{N→∞} L̄_N / N = H_n(S)
Sketch of the Proof

• If one can construct a U.D. code such that l_i = ⌈log_n(1/p_i)⌉, we¹ will have

    H_n(S) + 1 > L̄ ≥ H_n(S).

• But is it always possible to construct such a code?
• If it is, is it good enough? Can we do better?
• To do this, we first prove useful conditions that a set of integers must satisfy if they are the lengths of U.D. codewords. These conditions are called the Kraft and McMillan inequalities.

¹ Note that ⌈x⌉ is the unique integer in the interval [x, x + 1).
Proof of Source Coding Theorem
Codeword Lengths: Necessary and Sufficient Condition

Theorem [Kraft Inequality] A necessary and sufficient condition for the construction of an instantaneous n-ary code with M code words of lengths l_1, l_2, ..., l_M, where the code symbols take on n different values, is that

    Σ_{i=1}^{M} n^{−l_i} ≤ 1

PROOF OF SUFFICIENCY:
• We construct an instantaneous code with these code word lengths. Let there be m_j code words of length j, for j = 1, 2, ..., l* = max l_i. Then

    Σ_{i=1}^{M} n^{−l_i} = Σ_{j=1}^{l*} m_j n^{−j}
Kraft Inequality: Proof of Sufficiency

• In other words, if Σ_{i=1}^{M} n^{−l_i} ≤ 1, then Σ_{j=1}^{l*} m_j n^{l*−j} ≤ n^{l*}. Or,

    m_{l*} + m_{l*−1} n + m_{l*−2} n² + ... + m_1 n^{l*−1} ≤ n^{l*},

  and, equivalently,

    m_{l*} ≤ n^{l*} − m_1 n^{l*−1} − m_2 n^{l*−2} − ... − m_{l*−1} n    (1)

• But since m_{l*} ≥ 0, we then have

    0 ≤ n^{l*} − m_1 n^{l*−1} − m_2 n^{l*−2} − ... − m_{l*−1} n,

  and, dividing by n and noting that m_{l*−1} ≥ 0, we can repeat the procedure to arrive at

    m_{l*−1} ≤ n^{l*−1} − m_1 n^{l*−2} − m_2 n^{l*−3} − ... − m_{l*−2} n    (2)
    m_{l*−2} ≤ n^{l*−2} − m_1 n^{l*−3} − m_2 n^{l*−4} − ...                (3)

• Continuing, we get

    0 ≤ m_3 ≤ n³ − m_1 n² − m_2 n    (4)
    0 ≤ m_2 ≤ n² − m_1 n             (5)
    m_1 ≤ n                           (6)

• Note that if Σ_{i=1}^{M} n^{−l_i} ≤ 1, then the m_j satisfy (1)-(6).
Note that m_1 ≤ n.
If m_1 = n, then we are done (assigning each codeword a letter).
If m_1 < n, we have (n − m_1) unused prefixes from which to form code words of length 2 for which the code words of length 1 are not prefixes. This means that there are (n − m_1)·n phrases of length 2 from which to select the codewords. But this is larger than or equal to the number of codewords of length 2, m_2, according to equation (5). So we can construct our code by selecting m_2 of the (n − m_1)·n prefix-free phrases of length 2.

If m_2 = (n − m_1)·n, we are done. If m_2 < n² − m_1 n, there are (n² − m_1 n − m_2)·n code words of length 3 which satisfy the prefix condition, etc.
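A sketch (Python) of both directions in practice: checking the Kraft sum, and the constructive assignment from the sufficiency proof (shortest lengths first, so that no assigned word is a prefix of a later one):

```python
def kraft_sum(lengths, n=2):
    """Kraft-McMillan sum; an instantaneous (indeed, any U.D.) n-ary code
    with these codeword lengths exists iff the sum is <= 1."""
    return sum(n ** (-l) for l in lengths)

def construct_prefix_code(lengths, n=2):
    """Canonical construction mirroring the sufficiency proof: process
    lengths in increasing order; each codeword is the next unused base-n
    value, scaled up when the length grows, so no word prefixes another."""
    assert kraft_sum(lengths, n) <= 1, "no prefix code with these lengths"
    code, v, prev = [], 0, 0
    for l in sorted(lengths):
        v *= n ** (l - prev)      # append zeros to reach the new length
        prev = l
        digits, x = "", v
        for _ in range(l):
            digits = str(x % n) + digits
            x //= n
        code.append(digits)
        v += 1                    # next unused value at this length
    return code

print(kraft_sum([1, 2, 3, 3]))                   # 1.0
print(construct_prefix_code([1, 2, 3, 3]))       # ['0', '10', '110', '111']
print(construct_prefix_code([1, 2, 2, 2], n=3))  # ['0', '10', '11', '12']
```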
Kraft Inequality: Proof of Necessity

The proof of necessity follows from the McMillan inequality.

Theorem [McMillan Inequality]
A necessary and sufficient condition for the existence of a U.D. code with M code words of lengths l_1, l_2, ..., l_M, where the code symbols take on n different values, is:

    Σ_{i=1}^{M} n^{−l_i} ≤ 1.

Here we sketch the proof of necessity of the McMillan inequality (enough to prove the necessity of the Kraft inequality). Proof by contradiction:

1. Assume Σ_{i=1}^{M} n^{−l_i} = A > 1 and a U.D. code exists.
2. If Σ_{i=1}^{M} n^{−l_i} > 1, then [Σ_{i=1}^{M} n^{−l_i}]^N = A^N, which grows exponentially with N.
3. On the other hand,

    (Σ_{i=1}^{M} n^{−l_i})^N = (Σ_{j=1}^{l*} m_j n^{−j})^N = Σ_{k=N}^{N·l*} N_k n^{−k}

   where

    N_k = Σ_{i_1 + i_2 + ... + i_N = k} m_{i_1} m_{i_2} ··· m_{i_N}

   denotes the number of strings of N code words that have length exactly k.

4. If the code is U.D., then N_k ≤ n^k (distinct strings of code words must be distinct sequences of k code symbols). But then, for a U.D. code,

    (Σ_{i=1}^{M} n^{−l_i})^N = Σ_{k=N}^{N·l*} N_k n^{−k} ≤ Σ_{k=N}^{N·l*} 1 = N·l* − N + 1,

   which grows linearly with N, not exponentially with N, contradicting point 2.
Complete Proof
Proof of Source Coding Theorem

Lower bound on L̄ for a U.D. code

Theorem: For any instantaneous code², L̄ ≥ H_n(S). Furthermore, L̄ = H_n(S) iff p_i = n^{−l_i}.

Proof: Let p′_i = n^{−l_i} / Σ_{j=1}^{M} n^{−l_j}. Note p′_i ≥ 0 and Σ_{i=1}^{M} p′_i = 1.

From before:

    H_n(S) = Σ_{i=1}^{M} p_i log_n(1/p_i) ≤ Σ_{i=1}^{M} p_i log_n(1/p′_i)

² Here L̄ = average length of the U.D. code and n = number of symbols in the code alphabet.
Then:

    H_n(S) ≤ Σ_{i=1}^{M} p_i l_i + Σ_{i=1}^{M} p_i log_n(Σ_{j=1}^{M} n^{−l_j})

But for a U.D. code, Σ_{j=1}^{M} n^{−l_j} ≤ 1, so log_n(Σ_{j=1}^{M} n^{−l_j}) ≤ 0, and therefore

    H_n(S) ≤ L̄

Equality occurs if and only if Σ_{j=1}^{M} n^{−l_j} = 1 and p_i = p′_i. But both of these conditions hold iff p_i = n^{−l_i} with l_i an integer.
Asymptotic Optimality of Huffman Codes
We have seen that for any U.D. n-ary code corresponding to the Nth extension of the I.I.D. source S, for every N = 1, 2, ...,

    L̄_N / N ≥ H_n(S).

→ How do you get the result for general N?

• Next we show that for an n-ary Huffman code corresponding to the Nth extension of the I.I.D. source S,

    H_n(S) + 1/N > L̄_N / N ≥ H_n(S)

• But this implies that Huffman coding is asymptotically optimal, i.e.

    lim_{N→∞} L̄_N / N = H_n(S)
Sketch of the Proof

• From the McMillan inequality, we have that one can construct a U.D. code such that l_i = ⌈log_n(1/p_i)⌉. In other words, we can construct a U.D. code for which we have

    H_n(S) + 1 > L̄ ≥ H_n(S).

• Why?
Asymptotic Optimality of Huffman Coding: Examples
Huffman Codes are asymptotically optimal

EX 1: n = 2, (.9, .09, .01) ⇒ H_2(S) = .516

Symbol   Prob   log_2(1/p_i)   l_i = ⌈log_2(1/p_i)⌉
A        .9     .152           1
B        .09    3.47           4
C        .01    6.64           7

    L̄ = 1.33

Note that H_n(S) ≤ L̄ < H_n(S) + 1:  .516 ≤ 1.33 < 1.516.

A better code (this is actually a Huffman code):

    A → 0, B → 10, C → 11    L̄ = 1.1
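The ⌈log_2(1/p_i)⌉ construction is easy to check numerically; a sketch (Python) reproducing EX 1:

```python
from math import ceil, log2

def shannon_lengths(probs):
    """Codeword lengths l_i = ceil(log2(1/p_i)); Kraft holds automatically."""
    return [ceil(log2(1 / p)) for p in probs]

probs = [0.9, 0.09, 0.01]
lengths = shannon_lengths(probs)
H = sum(p * log2(1 / p) for p in probs)
L = sum(p * l for p, l in zip(probs, lengths))
print(lengths)   # [1, 4, 7]
print(H, L)      # 0.516 <= 1.33 < 1.516 = H + 1
```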
EX 2: n = 2, (.19, .19, .19, .19, .19, .05) ⇒ H_2(S) = 2.492

Symbol   Prob   log_2(1/p_i)   l_i = ⌈log_2(1/p_i)⌉
A        .19    2.396          3
B        .19    2.396          3
C        .19    2.396          3
D        .19    2.396          3
E        .19    2.396          3
F        .05    4.322          5

    L̄ = 3.10

Huffman code:
    A → 00, B → 01, C → 100, D → 101, E → 110, F → 111    L̄ = 2.62
EX 3: n = 3, (.19, .19, .19, .19, .19, .05) ⇒ H_3(S) = 1.57

Symbol   Prob   log_3(1/p_i)   l_i = ⌈log_3(1/p_i)⌉
A        .19    1.51           2
B        .19    1.51           2
C        .19    1.51           2
D        .19    1.51           2
E        .19    1.51           2
F        .05    2.73           3

    L̄ = 2.05

Ternary Huffman code:
    A → 0, B → 10, C → 11, D → 12, E → 20, F → 21    L̄ = 1.81
Asymptotic Optimality of Huffman Codes

Theorem: For an I.I.D. source S, a Huffman code exists with code alphabet size n for the Nth extension of this source such that

    H_n(S) ≤ L̄_N / N < H_n(S) + 1/N

Proof: Earlier (slide 3) we saw that a U.D. code exists for the Nth extension of the source such that

    H_n(S^N) ≤ L̄_N < H_n(S^N) + 1

But H_n(S^N) = N·H_n(S). So all we need is to show that, for a given fixed source with a fixed alphabet, the Huffman code is at least as good as this U.D. code.
Q.E.D.

Next we prove the fact we have stated without proof: that, for any fixed and given alphabet, a Huffman code is optimal.
Properties of an Optimal (Compact) Binary Code

We only state the results for binary codes, even though similar lines of argument can be used for n-ary codes. We only need to consider instantaneous codes!

1. If p_i ≤ p_j, then l_i ≥ l_j.
   Proof: Otherwise, switching the code words would reduce L̄.
2. There is no single (unpaired) code word of the longest length l* = max l_i.
   Proof: If there were, shorten it by one digit; it would still not be a prefix of any other code word, and this would shorten L̄.
3. The code words of length l* occur in pairs in which the code words in each pair agree in all but the last digit.
   Proof: If not, shorten the code word for which this is not the case by one digit; it would not be the prefix of any other code word. This would shorten L̄.
Optimality of Binary Huffman Codes

Placeholder Figure A

    L̄_{j−1} = L̄_j + P_{α1} + P_{α2}

since the code at stage (j − 1) is the same as the code at stage (j), except for two words whose length is one more.

We now show that if code C_j is optimal, then code C_{j−1} must also be optimal. Suppose not, and there were a better code at stage (j − 1); call its average length L̄′_{j−1} < L̄_{j−1}. By the properties above, the two code words with probabilities P_{α1} and P_{α2} are identical in all but the last digit. Merging them forms a new code at stage (j) with average length L̄′_j = L̄′_{j−1} − (P_{α1} + P_{α2}) < L̄_{j−1} − (P_{α1} + P_{α2}) = L̄_j. But this can’t be the case if C_j was optimal.
Coding With Distortion (Lossy Source Coding)
Coding with Distortion

Placeholder Figure A

    ε² = lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} E[(x(t) − x̂(t))²] dt = M.S.E.
Discrete-Time Signals

If signals are bandlimited, one can sample at the Nyquist rate and convert the continuous-time problem to a discrete-time problem. This sampling is part of the A/D converter.

Placeholder Figure A

    ε² = lim_{m→∞} (1/m) Σ_{i=1}^{m} E[(x_i − x̂_i)²]
A/D Conversion and D/A Conversion

A/D Conversion

• Assume a random variable X which falls into the range (X_min, X_max).
• The goal is for X to be converted into k binary digits. Let M = 2^k.
• The usual A/D converter first subdivides the interval (X_min, X_max) into M equal sub-intervals.
• Sub-intervals are of width Δ = (X_max − X_min)/M.
• Shown below for the case of k = 3 and M = 8.

Placeholder Figure A

• We call the i-th sub-interval ℜ_i.
D/A Conversion

• Assume that X falls in the region ℜ_i, i.e. x ∈ ℜ_i.
• The D/A converter uses as an estimate of X the value X̂ = y_i, which is the center of the i-th region.
• The mean-squared error between X and X̂ is

    ε² = E[(X − X̂)²] = ∫_{X_min}^{X_max} (x − x̂)² f_X(x) dx

  where f_X(x) is the probability density function of the random variable X.
• Let f_{X|ℜi}(x) be the conditional density function of X given that X falls in the region ℜ_i. Then

    ε² = Σ_{i=1}^{M} P[x ∈ ℜ_i] ∫_{x∈ℜ_i} (x − y_i)² f_{X|ℜi}(x) dx
• Note that Σ_{i=1}^{M} P[x ∈ ℜ_i] = 1 and, for each i = 1, 2, ..., M,

    ∫_{x∈ℜ_i} f_{X|ℜi}(x) dx = 1.

• Make the further assumption that k is large enough so that f_{X|ℜi}(x) is a constant over the region ℜ_i.
• Then f_{X|ℜi}(x) = 1/Δ for all i, and, writing ℜ_i = [a, b] with b − a = Δ,

    ∫_{x∈ℜ_i} (x − y_i)² f_{X|ℜi}(x) dx = (1/Δ) ∫_a^b (x − (a + b)/2)² dx
                                        = (1/Δ) ∫_{−Δ/2}^{Δ/2} x² dx
                                        = (1/Δ) · (2/3) · (Δ/2)³
                                        = Δ²/12
• In other words,

    ε² = Σ_{i=1}^{M} P[x ∈ ℜ_i] · Δ²/12 = Δ²/12

• If X has variance σ²_x, the signal-to-noise ratio of the A-to-D (and D-to-A) converter is often defined as σ²_x / (Δ²/12).
• If X_min is equal to −∞ and/or X_max = +∞, then the first and last intervals can be infinite in extent.
• However, f_X(x) is usually small enough in those intervals so that the result is still approximately the same.

Placeholder Figure A
SCALAR QUANTIZATION OF GAUSSIAN SAMPLES

Placeholder Figure A

• ENCODER:

    x ≤ −3b        → 000        0 < x ≤ b      → 100
    −3b < x ≤ −2b  → 001        b < x ≤ 2b     → 101
    −2b < x ≤ −b   → 010        2b < x ≤ 3b    → 110
    −b < x ≤ 0     → 011        3b < x         → 111

• DECODER:

    000 → −3.5b    100 → +0.5b
    001 → −2.5b    101 → +1.5b
    010 → −1.5b    110 → +2.5b
    011 → −0.5b    111 → +3.5b
Optimum Scalar Quantizer

• Let us construct boundaries b_i (with b_0 = −∞, b_M = +∞) and quantization symbols a_i such that

    b_{i−1} ≤ x < b_i  →  x̂ = a_i,    i = 1, 2, ..., M

Placeholder Figure A

• The question is how to optimize {b_i} and {a_i} to minimize the distortion ε²:

    ε² = Σ_{i=1}^{M} ∫_{b_{i−1}}^{b_i} (x − a_i)² f_X(x) dx
• To optimize ε² = Σ_{i=1}^{M} ∫_{b_{i−1}}^{b_i} (x − a_i)² f_X(x) dx, we take derivatives and set them equal to zero:

    ∂ε²/∂a_j = 0,    ∂ε²/∂b_j = 0

• And use Leibniz's rule:

    d/dt ∫_{a(t)}^{b(t)} f(x, t) dx = f(b(t), t) · db(t)/dt − f(a(t), t) · da(t)/dt + ∫_{a(t)}^{b(t)} ∂f(x, t)/∂t dx
    ∂/∂b_j (Σ_{i=1}^{M} ∫_{b_{i−1}}^{b_i} (x − a_i)² f_X(x) dx)
        = ∂/∂b_j ∫_{b_{j−1}}^{b_j} (x − a_j)² f_X(x) dx + ∂/∂b_j ∫_{b_j}^{b_{j+1}} (x − a_{j+1})² f_X(x) dx
        = (b_j − a_j)² f_X(b_j) − (b_j − a_{j+1})² f_X(b_j) = 0

    b_j² − 2 a_j b_j + a_j² = b_j² − 2 a_{j+1} b_j + a_{j+1}²
    2 b_j (a_{j+1} − a_j) = a_{j+1}² − a_j²

    b_j = (a_{j+1} + a_j) / 2    (I)
    ∂/∂a_j (Σ_{i=1}^{M} ∫_{b_{i−1}}^{b_i} (x − a_i)² f_X(x) dx) = −2 ∫_{b_{j−1}}^{b_j} (x − a_j) f_X(x) dx = 0

    a_j ∫_{b_{j−1}}^{b_j} f_X(x) dx = ∫_{b_{j−1}}^{b_j} x f_X(x) dx

    a_j = ∫_{b_{j−1}}^{b_j} x f_X(x) dx / ∫_{b_{j−1}}^{b_j} f_X(x) dx    (II)
• Note that the {b_i} can be found from (I) once the {a_i} are known.
  ◦ In fact, the {b_i} are the midpoints of the {a_i}.
• The {a_i} can also be solved from (II) once the {b_i} are known.
  ◦ The {a_i} are the centroids of the corresponding regions.
• Thus one can use a computer to iteratively solve for the {a_i} and the {b_i}:
  1. One starts with an initial guess for the {b_i}.
  2. One uses (II) to solve for the {a_i}.
  3. One uses (I) to solve for the {b_i}.
  4. One repeats steps 2 and 3 until the {a_i} and the {b_i} "stop changing".
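A sketch (Python) of this iteration for a standard Gaussian (the Lloyd-Max procedure); the integrals in (II) are evaluated in closed form for the Gaussian density, and the initial guess is just unit-spaced boundaries.

```python
from math import exp, pi, sqrt, erf, inf

def phi(x):  # standard normal density
    return 0.0 if abs(x) == inf else exp(-x * x / 2) / sqrt(2 * pi)

def Phi(x):  # standard normal CDF
    if x == -inf: return 0.0
    if x == inf: return 1.0
    return 0.5 * (1 + erf(x / sqrt(2)))

def lloyd_max(M, iters=200):
    """Iterate (I) and (II) for a standard Gaussian source:
    a_i = centroid of its region, b_i = midpoints of the a_i."""
    b = [-inf] + [i - M / 2 + 0.5 for i in range(M - 1)] + [inf]
    for _ in range(iters):
        # (II): Gaussian centroid uses ∫xφ = φ(lo) − φ(hi), ∫φ = Φ(hi) − Φ(lo)
        a = [(phi(b[i]) - phi(b[i + 1])) / (Phi(b[i + 1]) - Phi(b[i]))
             for i in range(M)]
        b = [-inf] + [(a[i] + a[i + 1]) / 2 for i in range(M - 1)] + [inf]
    return a, b

a, b = lloyd_max(2)
print(a)  # approx [-0.7979, 0.7979] = -/+ sqrt(2/pi), as in comment 7 below
```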
Comments on the Optimum Scalar Quantizer

1. This works for any f_X(x).
2. If f_X(x) only has finite support, one adjusts b_0 and b_M to be the limits of the support.

Placeholder Figure A

3. One needs to know ∫_α^β f_X(x) dx and ∫_α^β x f_X(x) dx (true for any f_X(x)).
4. For a Gaussian, we can integrate by parts or let y = x²/2:

    ∫_α^β (1/√(2π)) e^{−x²/2} dx = Q(α) − Q(β)
    ∫_α^β x (1/√(2π)) e^{−x²/2} dx = (1/√(2π)) (e^{−α²/2} − e^{−β²/2})
5. If M = 2^a, one could use a binary digits to represent the quantized value. However, since the quantized values are not necessarily equally likely, one could use a HUFFMAN CODE to use fewer binary digits (on the average).
6. After the {a_i} and {b_i} are known, one computes ε² from

    ε² = Σ_{i=1}^{M} ∫_{b_{i−1}}^{b_i} (x − a_i)² f_X(x) dx

7. For M = 2 and f_X(x) = (1/√(2πσ²)) e^{−x²/(2σ²)}, we have

    b_0 = −∞, b_1 = 0, b_2 = +∞, and a_2 = −a_1 = √(2σ²/π)

8. It is also easy to show that ε² = (1 − 2/π) σ² = .3634 σ².
Vector Quantization

• One can achieve a smaller ε² by quantizing several samples at a time.
• We would then use regions in an m-dimensional space.

Placeholder Figure A

• Shannon characterized this in terms of the "rate-distortion formula", which tells us how small ε² can be (as m → ∞).
• For a Gaussian source with one binary digit per sample,

    ε² ≥ σ²/4 = 0.25 σ²

  ◦ This follows from the result on the next page.
  ◦ Contrast this with the scalar case: ε²_s = (1 − 2/π) σ² = .3634 σ².
VQ: Discrete Memoryless Gaussian Source

• Let the source produce i.i.d. Gaussian samples X_1, X_2, ... where

    f_X(x) = (1/√(2πσ²)) e^{−x²/(2σ²)}

• Let the source encoder produce a sequence of binary digits at a rate of R binary digits per source symbol.
  ◦ In our previous terminology, R = log M.
• Let the source decoder produce the sequence X̂_1, X̂_2, ..., X̂_i, ... such that the mean-squared error between {X_i} and {X̂_i} is

    ε² = (1/n) Σ_{i=1}^{n} E[(X_i − X̂_i)²]
VQ: Discrete Memoryless Gaussian Source
• Then one can prove that for any such system

R ≥ (1/2) log₂(σ²/ǫ²)  for ǫ² ≤ σ²

  ◦ Note that R = 0 for ǫ² ≥ σ². What does this mean?
  ◦ Note that for R = log M = 1,

1 ≥ (1/2) log₂(σ²/ǫ²)  ⇒  2 ≥ log₂(σ²/ǫ²)  ⇒  4 ≥ σ²/ǫ²  ⇒  ǫ² ≥ (1/4)σ²
• This is an example of “Rate-Distortion Theory.”
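A quick numeric illustration of this bound (the function and variable names are ours, not the notes'): at R = 1 bit/sample the distortion can in principle reach σ²/4, while the optimal scalar quantizer at the same rate only reaches (1 − 2/π)σ²:

```python
import numpy as np

def gaussian_rate_distortion(D, sigma2=1.0):
    """R(D) = max(0, 0.5*log2(sigma^2/D)) bits/sample for a Gaussian source."""
    return max(0.0, 0.5 * np.log2(sigma2 / D))

print(gaussian_rate_distortion(0.25))         # 1.0: the vector-quantization limit
print(gaussian_rate_distortion(1 - 2/np.pi))  # ≈ 0.73: the scalar quantizer spends
                                              # 1 bit where ~0.73 would suffice
```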
Reduced Fidelity Audio Compression
• MP3 players use a form of audio compression called MPEG-1 Audio Layer 3.
• It takes advantage of a psycho-acoustic phenomenon whereby
  ◦ a loud tone at one frequency “masks” the presence of softer tones at neighboring frequencies;
  ◦ hence, these softer neighboring tones need not be stored (or transmitted).
• The compression efficiency of an audio compression scheme is usually described by the encoded bit rate (prior to the introduction of coding bits).
Reduced Fidelity Audio Compression
• The CD has a bit rate of 44.1 × 10³ × 2 × 16 = 1.41 × 10⁶ bits/second.
  ◦ The term 44.1 × 10³ is the sampling rate, which is approximately the Nyquist rate of the audio to be compressed.
  ◦ The term 2 comes from the fact that there are two channels in a stereo audio system.
  ◦ The term 16 comes from the 16-bit (or 2¹⁶ = 65536-level) A-to-D converter.
  ◦ Note that a slightly higher sampling rate of 48 × 10³ samples/second is used for a DAT recorder.
Reduced Fidelity Audio Compression
• Different bit rates are used by different MP3 players.
• Several bit rates are specified in the MPEG-1, Layer 3 standard.
  ◦ These are 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, 192, 224, 256 and 320 kilobits/sec.
  ◦ The sampling rates allowed are 32, 44.1 and 48 kHz, but the sampling rate of 44.1 × 10³ Hz is almost always used.
• The basic idea behind the scheme is as follows.
  ◦ A block of 576 time-domain samples is converted into 576 frequency-domain samples using a DFT.
  ◦ The coefficients are then modified using psycho-acoustic principles.
  ◦ The processed coefficients are then converted into a bit stream using various schemes, including Huffman encoding.
  ◦ The process is reversed at the receiver: bits −→ frequency-domain coefficients −→ time-domain samples.
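Putting the CD and MP3 numbers together gives a feel for the compression factor; a back-of-the-envelope check (128 kbit/s is just one of the listed Layer 3 rates):

```python
cd_rate = 44_100 * 2 * 16      # CD: 1,411,200 bits/second, as computed above
mp3_rate = 128_000             # a common MPEG-1 Layer 3 bit rate
print(cd_rate / mp3_rate)      # ≈ 11x reduction in bit rate
```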
Information Theory:
Definitions and Equalities
Basic Definitions
• Since H(X) = ∑_x p[x] log (1/p[x])

• Then H(X,Y) = ∑_x ∑_y p[x,y] log (1/p[x,y])

• Define H[X|Y = y] = ∑_x p(x|y) log (1/p(x|y))

• Define H[X|Y] = ∑_y p(y) H[X|Y = y]

• Easy to see

H[X|Y] = ∑_y p(y) ∑_x p(x|y) log (1/p(x|y)) = ∑_x ∑_y p(x,y) log (1/p(x|y))
Fundamental Equalities
• But p[x,y] = p[y|x] p[x], so

H[X,Y] = ∑_x ∑_y p[x,y] [ log (1/p[y|x]) + log (1/p[x]) ]

       = ∑_x ∑_y p[x,y] log (1/p[y|x]) + ∑_x ( ∑_y p[x,y] ) log (1/p[x])

where the inner sum in the second term is just p[x]. The first term is defined as H[Y|X] and the second term is H[X].

• In other words

H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
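These identities are easy to check numerically; a quick sketch on an arbitrary 2×3 joint pmf (the pmf values are made up for illustration):

```python
import numpy as np

p = np.array([[0.10, 0.25, 0.15],
              [0.20, 0.05, 0.25]])        # p[x, y], entries sum to 1

def H(q):
    """Entropy in bits of a pmf given as an array; ignores zero entries."""
    q = q[q > 0]
    return float(-np.sum(q * np.log2(q)))

px, py = p.sum(axis=1), p.sum(axis=0)     # marginals
H_XY = H(p.ravel())
H_Y_given_X = H_XY - H(px)                # chain rule: H(X,Y) = H(X) + H(Y|X)
H_X_given_Y = H_XY - H(py)
print(H_XY, H(px) + H_Y_given_X, H(py) + H_X_given_Y)   # all three agree
print(H(px) + H(py) - H_XY)               # I(X;Y) ≥ 0 (see the inequality below)
```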
Basic Definition
• Define I(X;Y) = ∑_x ∑_y p(x,y) log [ p(x,y) / (p(x) p(y)) ]

• It is easy to see that

1. I(X;Y) = H(X) + H(Y) − H(X,Y), or equivalently
   H(X,Y) = H(X) + H(Y) − I(X;Y)
2. I(X;Y) = H(X) − H(X|Y)
3. I(X;Y) = H(Y) − H(Y|X)
A Basic Inequality
I(X;Y) ≥ 0 (or −I(X;Y) ≤ 0) with equality iff X and Y are statistically independent.

PROOF:

−I_a(X;Y) = ∑_x ∑_y p[x,y] log_a [ p(x)p(y) / p(x,y) ]

          = (log_a e) ∑_x ∑_y p[x,y] ln [ p(x)p(y) / p(x,y) ]

          ≤ (log_a e) ln ( ∑_x ∑_y p[x,y] · p(x)p(y)/p(x,y) )
            (by Jensen's inequality, since ln is concave)

          = (log_a e) ln ( ∑_x ∑_y p(x)p(y) ) = (log_a e) ln 1 = 0

with equality iff p[x,y] = p[x]p[y].

As a consequence:

H(X|Y) ≤ H(X)  and  H(Y|X) ≤ H(Y),  with equality iff X and Y are independent.
Three Examples
EXAMPLE 1
EX 1: One die,

X = outcome ∈ {1, 2, 3, 4, 5, 6}
Y ∈ {odd, even} −→ {0, 1}

H(X) = log₂ 6          H(Y) = 1 = log₂ 2
H(Y|X) = 0             H(X|Y) = log₂ 3
H(X,Y) = log₂ 6

H(X,Y) = log₂ 6 = log₂ 6 + 0 = log₂ 2 + log₂ 3
EXAMPLE 2
EX 2: Two dice (independent throws): (X1, X2)

H(X2|X1) = H(X2) = log₂ 6 = 2.58496
H(X1, X2) = 2 log₂ 6 = log₂ 36 = 5.16993

Let us define Y = X1 + X2. What is H(X1, X2|Y)?

It is easier to calculate H(Y|X1, X2) and H(Y) and then use

H(X1, X2|Y) = H(Y|X1, X2) + H(X1, X2) − H(Y)

which follows since

H(X1, X2, Y) = H(Y|X1, X2) + H(X1, X2)

but also

H(X1, X2, Y) = H(X1, X2|Y) + H(Y)
EX 2 (continued)
Since H(Y|X1, X2) = 0, all we need to calculate is H(Y):

Y     (X1, X2) pairs              P[Y]
2     (1,1)                       1/36
3     (1,2), (2,1)                2/36
4     (1,3), (2,2), (3,1)         3/36
5     ...                         4/36
...   ...                         ...
12    (6,6)                       1/36

H(Y) = 2 [ (1/36) log₂ 36 + · · · + (5/36) log₂ (36/5) ] + (6/36) log₂ 6
     = 3.27441

H(X1, X2|Y) = 5.16993 − 3.27441 = 1.89552
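A short numeric check of these two values (an illustrative sketch, not from the notes):

```python
from collections import Counter
from math import log2

# Distribution of Y = X1 + X2 for two fair dice.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
H_Y = -sum((c / 36) * log2(c / 36) for c in counts.values())
print(H_Y)                    # 3.27441...
print(2 * log2(6) - H_Y)      # H(X1, X2 | Y) = 1.89552...
```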
EX 2 (continued)
Another question: What is H(X1|Y)?

We take advantage of the following equality:

H(X1, X2|Y) = H(X1|Y) + H(X2|X1, Y).

Now notice that once X1 and Y are known, X2 = Y − X1 is completely determined, so

H(X2|X1, Y) = 0.

Hence,

H(X1|Y) = H(X1, X2|Y) = 1.89552.
EXAMPLE 3
EX 3: Assume a tennis match is played between two equally matched players A and B and that the first player to win 3 sets is the winner. Let X represent the outcomes of the sets. Some of the possible values of X are: AAA, BBB, ABBB, BAAA, BABAA, etc. Let Y be the number of sets played. Then Y takes on values 3, 4, or 5. Let Z represent the player who won the match. For example, if X = ABBAA, then Y = 5 and Z = A.

1. Compute H(X), H(Y), H(Z).
2. Compute H(X,Y), H(X,Z), H(Y,Z).
3. Prove that H(X|Y) = H(X) − H(Y).
EX 3 (continued)
Probability    X       Y    Z
1/8            AAA     3    A
1/8            BBB     3    B
                            (P[Y = 3] = 2/8 = 1/4)

1/16           BAAA    4    A
1/16           ABAA    4    A
1/16           AABA    4    A
1/16           ABBB    4    B
1/16           BABB    4    B
1/16           BBAB    4    B
                            (P[Y = 4] = 6/16 = 3/8)
EX 3 (continued)
Probability    X        Y    Z
1/32           BBAAA    5    A
1/32           BABAA    5    A
...            ...      ...  ...
1/32           AABBB    5    B
1/32           ABABB    5    B
1/32           ABBAB    5    B
1/32           BAABB    5    B
1/32           BABAB    5    B
1/32           BBAAB    5    B
                             (P[Y = 5] = 12/32 = 3/8)
EX 3 (continued)
1. H(X) = 2 [ (1/8) log 8 ] + 6 [ (1/16) log 16 ] + 12 [ (1/32) log 32 ]
        = 6/8 + 3/2 + 15/8 = 33/8 = 4.125 (base 2)

   H(Y) = (1/4) log 4 + 2 [ (3/8) log (8/3) ]
        = 1/2 + 9/4 − (3/4) log 3 = 11/4 − (3/4) log 3 = 1.56 (base 2)

   H(Z) = log 2 = 1 (base 2)

2. H(X,Y) = H(X) + H(Y|X) = H(X) = 33/8 = 4.125   (since H(Y|X) = 0)

   H(X,Z) = H(X) + H(Z|X) = H(X) = 33/8 = 4.125   (since H(Z|X) = 0)

   H(Y,Z) = H(Y) + H(Z|Y) = H(Y) + H(Z) = 15/4 − (3/4) log 3 = 2.56
EX 3 (continued)
3. H(X|Y) = H(X,Y) − H(Y) = H(X) + H(Y|X) − H(Y) = H(X) − H(Y) = 2.56
   (since H(Y|X) = 0).

Similarly,

H(Y|X) = H(Z|X) = 0
H(Z|Y) = H(Y,Z) − H(Y) = 2.56 − 1.56 = 1 = H(Z)
H(Z|X,Y) = H(Z|X) = 0   (the set sequence X determines the winner Z)
H(X|Z) = H(X,Z) − H(Z) = H(X) − H(Z) = 3.125
H(X,Y,Z) = H(Z|X,Y) + H(X,Y) = 0 + 4.125 = 4.125, etc.
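All of these values can be verified by brute-force enumeration of the match; a sketch (equally matched players, so a match lasting y sets has probability 2^−y; names are illustrative):

```python
from itertools import product
from collections import Counter
from math import log2

outcomes = []                                 # (set sequence, probability)
for n in range(3, 6):
    for seq in map("".join, product("AB", repeat=n)):
        # Valid iff no one had 3 wins before set n but someone has 3 after it.
        if max(seq[:-1].count("A"), seq[:-1].count("B")) < 3 \
           and max(seq.count("A"), seq.count("B")) == 3:
            outcomes.append((seq, 0.5 ** n))

def H(pmf):
    return -sum(p * log2(p) for p in pmf.values())

pX = dict(outcomes)
pY, pZ = Counter(), Counter()
for x, p in outcomes:
    pY[len(x)] += p                           # Y: number of sets played
    pZ[max("AB", key=x.count)] += p           # Z: the match winner
print(H(pX), H(pY), H(pZ))                    # 4.125, 1.561..., 1.0
```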
Discrete Memoryless Channels
Discrete Memoryless Channel
X1 ... XN −→ [DMC] −→ Y1 ... YN

P_{Y1...YN|X1...XN}(y1 ... yN | x1 ... xN) = P_{Y|X}(y1|x1) P_{Y|X}(y2|x2) · · · P_{Y|X}(yN|xN)

From the memoryless property of the channel, the “Single Input – Single Output” representation is sufficiently informative:

X −→ [DMC] −→ Y        P_{Y|X}(y|x)
Channel Capacity:
Fundamental Limits
Maximum Mutual Information
=⇒ An important property of a DMC is its mutual information, I(X;Y)

=⇒ But in order to calculate I(X;Y) we need to know
   P_{X,Y}(x,y) = P_{Y|X}(y|x) P_X(x)

=⇒ Thus, in order to calculate I(X;Y) one has to specify an input distribution P_X(x)

=⇒ The capacity, C, of a DMC is the maximum I(X;Y) that can be achieved over all input distributions:

C = max_{P_X(x)} I(X;Y)
Channel Capacity: Intuition
• Recall that I(X;Y) = H(X) − H(X|Y)
• Recall the interpretation that
  ◦ For any random variable X, H(X) measures the randomness or uncertainty about X
  ◦ Z_y = X|Y=y is a new random variable with (conditional) pmf P_{X|Y}(x|Y = y)
  ◦ H(X|Y) is the entropy of this random variable averaged over the choices of Y = y
• Mutual information is nothing but the average reduction in the randomness about X after Y is observed
• Channel capacity is the maximum such reduction in uncertainty when one can design X:

C = max_{P_X(x)} I(X;Y)
Multiple Input – Multiple Output
=⇒ One can show that for a DMC

max_{P_{X1···XN}(x1,...,xN)} (1/N) I(X1, ..., XN; Y1, ..., YN) = max_{P_X(x)} I(X;Y)

=⇒ How?
Channel Coding and Capacity
Discrete Memoryless
Channel
Capacity of DMC
Channel Coding
• Definitions
• Definitions
• Coding Theorem
Capacity: Examples
Computing Capacity
8 / 23
Coding For A Binary Input DMC
W −→ [Channel Encoder] −→ x (m-vector) −→ [Binary-Input DMC] −→ y (m-vector) −→ [Channel Decoder] −→ Ŵ

• Code Rate: R < 1 (fewer than one message bit per transmitted channel digit)
• Message: An integer number, denoted by W, between 1 and 2^{mR} (equivalently, a binary vector of length mR)

Assumption: Messages occur with equal probabilities:

P[W = i] = 1/2^{mR}, for all i = 1, 2, ..., 2^{mR}
Coding For A Binary Input DMC
W −→ [Channel Encoder] −→ x (m-vector) −→ [Binary-Input DMC] −→ y (m-vector) −→ [Channel Decoder] −→ Ŵ

• Block Code (Binary): A collection of 2^{mR} binary vectors, each of length m
• Channel Decoder: Chooses the most likely code word (or equivalently the most likely message) based upon the received vector y
• Error Probability: The probability that the message produced by the decoder is not equal to the original message, i.e. P{Ŵ ≠ W}
Channel Coding Theorem
W −→ [Channel Encoder] −→ x (m-vector) −→ [Binary-Input DMC] −→ y (m-vector) −→ [Channel Decoder] −→ Ŵ

Channel Coding Theorem:

Let R < C (base 2). For m large enough, there exists an encoder and a decoder such that P{Ŵ ≠ W} < ǫ for any ǫ > 0.
Computation of Capacity for
Well-known Channels
Capacity of BSC
EXAMPLE: Binary Symmetric Channel (BSC)
Placeholder Figure A
I₂(X;Y) = H₂(Y) − H₂(Y|X)

H₂(Y|X) = ∑_{x=0}^{1} ∑_{y=0}^{1} P_X(x) P_{Y|X}(y|x) log₂ [1/P_{Y|X}(y|x)]

        = ∑_{x=0}^{1} P_X(x) [ (1−p) log₂ (1/(1−p)) + p log₂ (1/p) ]

        = h₂(p)

where h₂(p) := (1−p) log₂ (1/(1−p)) + p log₂ (1/p).
Capacity of BSC
EXAMPLE: Binary Symmetric Channel (BSC)

On the other hand, H₂(Y) ≤ log₂ 2 with equality iff P[Y = 0] = P[Y = 1] = 1/2.

But note that if P_X(0) = P_X(1) = 1/2, then P_Y(0) = P_Y(1) = 1/2.

In other words,

C = max_{P_X(x)} I(X;Y) = log₂ 2 − h₂(p) = 1 − h₂(p).
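A quick numeric check of this formula (the names are illustrative):

```python
from math import log2

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    return 1 - h2(p)

print(bsc_capacity(0.11))   # ≈ 0.5 bits per channel use
print(bsc_capacity(0.5))    # 0: the output is independent of the input
```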
Capacity of BEC
EXAMPLE: Binary Erasure Channel (BEC)
Placeholder Figure A
I₂(X;Y) = H₂(X) − H(X|Y)

Assume P[X = 0] = α and P[X = 1] = 1 − α
(⇒ P_Y(2) = p, P_Y(0) = α(1−p), and P_Y(1) = (1−α)(1−p)).

Hence, H₂(X) = α log₂ (1/α) + (1−α) log₂ (1/(1−α)) = h₂(α).

On the other hand,

H₂(X|Y) = ∑_{y=0}^{2} P_Y(y) ∑_{x=0}^{1} P_{X|Y}(x|y) log₂ [1/P_{X|Y}(x|y)]

where the inner sum is H(X|Y = y).
Capacity of BEC
Also,

H(X|Y = 0) = ∑_{x=0}^{1} P_{X|Y}(x|0) log₂ [1/P_{X|Y}(x|0)] = 0

H(X|Y = 1) = ∑_{x=0}^{1} P_{X|Y}(x|1) log₂ [1/P_{X|Y}(x|1)] = 0

H(X|Y = 2) = ∑_{x=0}^{1} P_{X|Y}(x|2) log₂ [1/P_{X|Y}(x|2)]
           = α log₂ (1/α) + (1−α) log₂ (1/(1−α)) = h₂(α)

Hence,

C = max_α [ h₂(α) − p h₂(α) ] = max_α (1−p) h₂(α) = 1 − p,

achieved at α = 1/2.
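For contrast with the BSC, a short sweep confirming that α = 1/2 maximizes (1−p) h₂(α) and that the maximum is 1 − p (an illustrative check):

```python
import numpy as np

alpha = np.linspace(0.01, 0.99, 99)
h2 = -alpha * np.log2(alpha) - (1 - alpha) * np.log2(1 - alpha)
for p in (0.1, 0.5):
    I = (1 - p) * h2                        # I(X;Y) for the BEC as derived above
    print(p, alpha[np.argmax(I)], I.max())  # maximizer 0.5, capacity 1 - p
```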
Capacity of a Z-Channel
EXAMPLE:
Placeholder Figure A
Recall I₂(X;Y) = H₂(Y) − H₂(Y|X).

Assume P[X = 0] = α and P[X = 1] = 1 − α.

Hence, H₂(X) = α log₂ (1/α) + (1−α) log₂ (1/(1−α)) = h₂(α).

Also, H₂(Y|X) = α h₂(1/2). So

I₂(X;Y) = H₂(Y) − H₂(Y|X) = h₂(α/2) − α h₂(1/2)

        = (α/2) log₂ (2/α) + (1 − α/2) log₂ [1/(1 − α/2)] − α h₂(1/2)
Capacity of a Z-Channel
Equivalently, in natural logs,

I_e(X;Y) = (α/2) ln (2/α) + (1 − α/2) ln [1/(1 − α/2)] − α h_e(1/2).

We maximize I_e(X;Y) over α (the same α maximizes I₂(X;Y)):

0 = δI_e(X;Y)/δα

  = (1/2) ln (2/α) − (α/2)(1/α) + (1 − α/2) · [ (1/2)/(1 − α/2) ] − h_e(1/2)
    [where h_e(1/2) = ln 2]

  = (1/2) ln 2 − (1/2) ln α − 1/2 + 1/2 + (1/2) ln (1 − α/2) − ln 2

This means

(1/2) ln [α/(1 − α/2)] = −(1/2) ln 2 = (1/2) ln (1/2)
  ⇒  (1 − α/2)/α = 2  ⇒  α = 2/5

C₂ = I₂(X;Y)|_{α=2/5} = (1/5) log₂ 5 + (4/5) log₂ (5/4) − 2/5
   = log₂ 5 − 2 = 0.3219
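A numeric sanity check of this maximization (an illustrative sweep, not part of the notes):

```python
import numpy as np

def h2(p):
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

alpha = np.linspace(0.001, 0.999, 9999)
I = h2(alpha / 2) - alpha * h2(0.5)    # I₂(X;Y) for this Z-channel
i = int(np.argmax(I))
print(alpha[i], I[i])                  # ≈ 0.4 = 2/5 and ≈ 0.3219 = log₂5 − 2
```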
Capacity of a 5-ary Symmetric Channel
EXAMPLE:
Placeholder Figure A
C = max_{P(X)} I(X;Y) = I(X;Y)|_{P[X=x]=1/5, x=0,1,2,3,4}
  = [H(Y) − H(Y|X)]_{P[X=x]=1/5, x=0,1,2,3,4}.

If the inputs are equally likely, the outputs are equally likely. Thus,

H(Y) = log 5.

So what remains is to compute H(Y|X)|_{P[X=x]=1/5, x=0,1,2,3,4}.
Capacity of a 5-ary Symmetric Channel
For any value of the input, there are two possible outputs, and they are equally likely. In other words,

H(Y|X = x) = log 2,  x = 0, . . . , 4.

Hence,

H(Y|X) = log 2

and

C = log 5 − log 2 = log (5/2).

Note: one can also measure the capacity in 5-ary digits, where the capacity would be nothing but C₅ = log₅ (5/2).
A Simple Code for 5-ary Symmetric Channel
Code Word    Possible Outputs
00           00 01 10 11
12           12 13 22 23
24           24 20 34 30
31           31 32 41 42
43           43 44 03 04

The decoder works in reverse (and never gets confused!).

There are 5 code words, and each code word can represent one 5-ary digit or log₂ 5 binary digits. Since each code word represents 2 uses of the channel, the rate of the code is 1/2 = 0.5 5-ary digits ((1/2) log₂ 5 = 1.1609 binary digits) per channel use.

Now note that this rate is below the capacity of the channel: (1/2) log₂ 5 = 1.1609 ≤ log₂ (5/2) = 1.322 (or equivalently 1/2 ≤ log₅ (5/2)).
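A tiny sketch of this code in operation, with the decoder built as a reverse lookup of the table above:

```python
codebook = {"00": ["00", "01", "10", "11"],
            "12": ["12", "13", "22", "23"],
            "24": ["24", "20", "34", "30"],
            "31": ["31", "32", "41", "42"],
            "43": ["43", "44", "03", "04"]}

decode = {y: cw for cw, ys in codebook.items() for y in ys}
assert len(decode) == 20      # the 5 output sets (4 outputs each) are disjoint,
                              # which is why the decoder never gets confused
print(decode["20"])           # -> '24'
```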
Computation of Capacity
Finding the Capacity Using A Computer
• For an arbitrary DMC, one cannot, in general, use analytic techniques to find the channel capacity.
• Instead, one can use a computer to search for the input distribution, P_X(x), that maximizes I(X;Y).
• There is a special algorithm, called the BLAHUT-ARIMOTO algorithm, for doing this. For simple cases, a brute-force search can be used.

EXAMPLE:

Placeholder Figure A

1. Use a computer to evaluate I(X;Y) = f(α)
2. Optimize with respect to α
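A minimal sketch of the Blahut-Arimoto iteration, assuming the channel is given as a row-stochastic matrix W[x, y] = P(y|x); the code and names are illustrative, not from the notes:

```python
import numpy as np

def blahut_arimoto(W, iters=1000):
    """Approximate the capacity of a DMC with transition matrix W[x, y] = P(y|x)."""
    p = np.full(W.shape[0], 1.0 / W.shape[0])    # start from the uniform input
    for _ in range(iters):
        q = p @ W                                 # current output distribution
        with np.errstate(divide="ignore", invalid="ignore"):
            logratio = np.where(W > 0, np.log2(W / q), 0.0)
        D = np.sum(W * logratio, axis=1)          # D(x) = sum_y W log2(W / q)
        p = p * np.exp2(D)                        # multiplicative update ...
        p /= p.sum()                              # ... then renormalize
    return float(np.sum(p * D)), p                # capacity (bits/use), optimizer

# The Z-channel from the earlier example: input 0 flips with probability 1/2.
W = np.array([[0.5, 0.5],
              [0.0, 1.0]])
C, p = blahut_arimoto(W)
print(C, p)   # C ≈ 0.3219 = log₂5 − 2, achieved at P[X = 0] ≈ 2/5
```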