
Digital Communications III (ECE 154C)

Introduction to Coding and Information Theory

Tara Javidi

These lecture notes were originally developed by late Prof. J. K. Wolf.

UC San Diego

Spring 2014

Overview of ECE 154C


Course Overview I: Digital Communications Block Diagram

• Note that the Source Encoder converts all types of information into a stream of binary digits.

• Note that the Channel Encoder, in an attempt to protect the source-coded (binary) stream, judiciously adds redundant bits.

• Sometimes the output of the source decoder must be an exact replica of the information (e.g., computer data); this is called NOISELESS CODING (aka lossless compression).

• Other times the output of the source decoder can be approximately equal to the information (e.g., music, TV, speech); this is called CODING WITH DISTORTION (aka lossy compression).

Overview II: What will we cover?

REFERENCE: CHAPTER 10, ZIEMER & TRANTER

SOURCE CODING - NOISELESS CODES

◦ Basic idea is to use as few binary digits as possible and still be able to recover the information exactly
◦ Topics include:
  • Huffman Codes
  • Shannon-Fano Codes
  • Tunstall Codes
  • Entropy of a Source
  • Lempel-Ziv Codes

SOURCE CODING WITH DISTORTION

◦ Again the idea is to use the minimum number of binary digits for a given value of distortion
◦ Topics include:
  • Gaussian Source
  • Optimal Quantizing

CHANNEL CAPACITY OF A NOISY CHANNEL

◦ Even if the channel is noisy, messages can be sent essentially error free if extra digits are transmitted
◦ Basic idea is to use as few extra digits as possible
◦ Topics Covered:
  • Channel Capacity
  • Mutual Information
  • Some Examples

CHANNEL CODING

◦ Basic idea: detect errors that occurred on the channel and then correct them
◦ Topics Covered:
  • Hamming Code
  • General Theory of Block Codes (Parity Check Matrix, Generator Matrix, Minimum Distance, etc.)
  • LDPC Codes
  • Turbo Codes
  • Code Performance

A Few Examples


Example 1: 4 letter DMS

Basic concepts came from one paper of one man named Claude Shannon! Shannon used simple models that capture the essence of the problem!

EXAMPLE 1: Simple model of a source (called a DISCRETE MEMORYLESS SOURCE or DMS)

• I.I.D. (Independent and Identically Distributed) source letters
• Alphabet size of 4 (A, B, C, D)
• P(A) = p1, P(B) = p2, P(C) = p3, P(D) = p4, with ∑_i p_i = 1
• Simplest code: A → 00, B → 01, C → 10, D → 11
• Average length of code words: L = 2(p1 + p2 + p3 + p4) = 2

Q: Can we use fewer than 2 binary digits per source letter (on the average) and still recover the information from the binary sequence?

A: Depends on the values of (p1, p2, p3, p4).
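To make the question concrete, here is a minimal sketch (not part of the original notes) that computes the average codeword length L = ∑ p_i l_i for the fixed 2-bit code and for a variable-length code; the probabilities (1/2, 1/4, 1/8, 1/8) and the code A → 0, B → 10, C → 110, D → 111 are the example values used later in these notes.

    # Minimal sketch: average codeword length L = sum(p_i * l_i) for two codes.
    probs = [1/2, 1/4, 1/8, 1/8]          # P(A), P(B), P(C), P(D) (illustrative)
    fixed_lengths = [2, 2, 2, 2]           # A->00, B->01, C->10, D->11
    variable_lengths = [1, 2, 3, 3]        # A->0, B->10, C->110, D->111

    def average_length(p, lengths):
        """Average number of binary digits per source letter."""
        return sum(pi * li for pi, li in zip(p, lengths))

    print(average_length(probs, fixed_lengths))     # 2.0
    print(average_length(probs, variable_lengths))  # 1.75, i.e. fewer than 2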

Example 2: Binary Symmetric Channel

EXAMPLE 2: Simple model for a noisy channel

Channels, as you saw in ECE 154B, can be viewed as follows:

Placeholder Figure A

If s0(t) = −s1(t) and the signals are equally likely,

    P_error = Q(√(2E/N0)) = p

Shannon considered a simpler channel called the binary symmetric channel (or BSC for short).

Pictorially / Mathematically:

    P_{Y|X}(y|x) = 1 − p   if y = x
                   p       if y ≠ x

Q: Can we send information "error-free" over such a channel even though p ≠ 0, 1?

A: Depends on the rate of transmission (how many channel uses are allowed per information bit). Essentially, for a small enough transmission rate (to be defined precisely), the answer is YES!
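As an illustrative sketch of the BSC model P_{Y|X}(y|x) above (not from the original notes), the following simulates passing random bits through a BSC with an assumed crossover probability p = 0.1 and checks that roughly a fraction p of the bits are flipped.

    import random

    def bsc(bits, p, rng=random.Random(0)):
        """Pass a list of 0/1 bits through a binary symmetric channel:
        each bit is flipped independently with probability p."""
        return [b ^ (rng.random() < p) for b in bits]

    p = 0.1                                   # illustrative crossover probability
    x = [random.getrandbits(1) for _ in range(100_000)]
    y = bsc(x, p)
    errors = sum(xi != yi for xi, yi in zip(x, y))
    print(errors / len(x))                    # should be close to 0.1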

Example 3: DMS with Alphabet size 8

EXAMPLE 3: Discrete Memoryless Source with an alphabet size of 8 letters: {A, B, C, D, E, F, G, H}

• Probabilities: {pA ≥ pB ≥ pC ≥ pD ≥ pE ≥ pF ≥ pG ≥ pH}
• See the following codes (table omitted):

Q: Which codes are uniquely decodable? Which ones are instantaneously decodable? Compute the average length of the codewords for each code.

EXAMPLE 4: Can you optimally design a code?

    L = (1/2)(1) + (1/4)(2) + (1/8)(3) + (1/16)(4) + (4/64)(6)
      = 1/32 + 1/32 + 1/16 + 1/8 + 1/4 + 1/2 + 1 = 2

(The second line is the same average length, computed as the sum of the probabilities of the combined nodes in the Huffman construction.)

We will see that this is an optimal code (not only among the single-letter constructions but overall).

EXAMPLE 5:

    L = .1 + .1 + .2 + .2 + .3 + .5 + 1 = 2.4

But here we can do better by encoding 2 source letters (or more) at a time.

Source Coding: Lossless Compression

Source Coding: A Simple Example

Back to our simple example of a source:

    P[A] = 1/2,  P[B] = 1/4,  P[C] = 1/8,  P[D] = 1/8

Assumptions

1. One must be able to uniquely recover the source sequence from the binary sequence
2. One knows the start of the binary sequence at the receiver
3. One would like to minimize the average number of binary digits per source letter

Candidate codes:

1. A → 00, B → 01, C → 10, D → 11
   ABAC → 00010010 → ABAC
   L = 2

2. A → 0, B → 1, C → 10, D → 11
   L = 5/4, but the binary sequence cannot be uniquely recovered: for example, 00110 could be AABC, AADA, or AABBA.
   This code is useless. Why?

3. A → 0, B → 10, C → 110, D → 111
   ABACD → 0100110111 → ABACD
   L = 7/4
   Minimum average length code satisfying Assumptions 1 and 2!
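A minimal sketch (assuming code 3 above) showing that the prefix code can be decoded symbol by symbol from a single binary string and that its average length is 7/4:

    # Sketch: decode a prefix code by walking the bit string left to right.
    code = {"A": "0", "B": "10", "C": "110", "D": "111"}   # code 3 above
    decode_map = {v: k for k, v in code.items()}
    probs = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}

    def encode(symbols):
        return "".join(code[s] for s in symbols)

    def decode(bits):
        out, buf = [], ""
        for b in bits:
            buf += b
            if buf in decode_map:            # no codeword is a prefix of another,
                out.append(decode_map[buf])  # so we can emit as soon as we match
                buf = ""
        return "".join(out)

    print(encode("ABACD"))                   # 0100110111
    print(decode("0100110111"))              # ABACD
    print(sum(p * len(code[s]) for s, p in probs.items()))  # 1.75 = 7/4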

Source Coding: Basic Definitions

Codeword (aka Block Code)
Each source symbol is represented by some sequence of coded symbols called a code word

Non-Singular Code
Code words are distinct

Uniquely Decodable (U.D.) Code
Every distinct concatenation of m code words is distinct, for every finite m

Instantaneous Code
A U.D. code where we can decode each code word without seeing subsequent code words

Example

Back to the simple case of the 4-letter DMS:

    Source Symbols   Code 1   Code 2   Code 3   Code 4
    A                0        00       0        0
    B                1        01       10       01
    C                00       10       110      011
    D                01       11       111      111
    Non-Singular     Yes      Yes      Yes      Yes
    U.D.             No       Yes      Yes      Yes
    Instantaneous    No       Yes      Yes      No

A NECESSARY AND SUFFICIENT CONDITION for a code to be instantaneous is that no code word be a PREFIX of any other code word.
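The prefix condition is easy to test mechanically. A minimal sketch, with codes 1-4 transcribed from the table above:

    def is_instantaneous(codewords):
        """True iff no codeword is a prefix of any other codeword."""
        for i, a in enumerate(codewords):
            for j, b in enumerate(codewords):
                if i != j and b.startswith(a):
                    return False
        return True

    codes = {
        "Code 1": ["0", "1", "00", "01"],
        "Code 2": ["00", "01", "10", "11"],
        "Code 3": ["0", "10", "110", "111"],
        "Code 4": ["0", "01", "011", "111"],
    }
    for name, cw in codes.items():
        print(name, is_instantaneous(cw))
    # Code 1 False, Code 2 True, Code 3 True, Code 4 False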

Coding Several Source Symbols at a Time

Example 1: 3 letter DMS

    Source Symbols   Probability   U.D. Code
    A                .5            0
    B                .35           10
    C                .15           11

L1 = 1.5 (bits / symbol)

Let us consider two consecutive source symbols at a time:

    2 Symbols   Probability   U.D. Code
    AA          .25           01
    AB          .175          11
    AC          .075          0010
    BA          .175          000
    BB          .1225         101
    BC          .0525         1001
    CA          .075          0011
    CB          .0525         10000
    CC          .0225         10001

L2 = 2.9275 (bits / 2 symbols),  L2/2 = 1.46375 (bits / symbol)

In other words,

1. It is more efficient to build a code for 2 source symbols!
2. Is it possible to decrease the length more and more by increasing the alphabet size?

To answer this question, it is useful to precisely characterize the best code. The codes given above are Huffman Codes. The procedure for making Huffman Codes will be described next.

Minimizing average length

Binary Huffman Codes

1. Order probabilities - highest to lowest
2. Add the two lowest probabilities
3. Reorder probabilities
4. Break ties in any way you want
5. Assign 0 to the top branch and 1 to the bottom branch (or vice versa)
6. Continue until we have only one probability, equal to 1
7. L = sum of the probabilities of the combined nodes (i.e., the circled ones)

(A small code sketch of this procedure appears after the example below.)

Example

1. Order: {.1, .2, .15, .3, .25} → {.3, .25, .2, .15, .1}
2. Combine the two lowest probabilities: {.3, .25, .2, (.15 + .1 = .25)}
3. Reorder: get either {.3, (.15, .1) = .25, .25, .2} or {.3, .25, (.15, .1) = .25, .2}

Optimality of Huffman Coding

1. A binary Huffman code will have the shortest average length as compared with any U.D. code for that set of probabilities.
2. The Huffman code is not unique. Breaking ties in different ways can result in very different codes. The average length, however, will be the same for all of these codes.
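A minimal sketch of the merging procedure above (not the notes' own code). It only returns the average length L, computed as the sum of the probabilities of the combined nodes (step 7); the probability sets are the examples used in these notes.

    import heapq

    def huffman_average_length(probs):
        """Binary Huffman average codeword length, computed as the sum of the
        probabilities of the combined (internal) nodes -- step 7 above."""
        heap = list(probs)
        heapq.heapify(heap)
        total = 0.0
        while len(heap) > 1:
            a = heapq.heappop(heap)   # two lowest probabilities
            b = heapq.heappop(heap)
            total += a + b            # probability of the new combined node
            heapq.heappush(heap, a + b)
        return total

    print(huffman_average_length([.3, .25, .2, .15, .1]))   # 2.25
    print(huffman_average_length([.5, .25, .125, .125]))    # 1.75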

Huffman Coding: Example

Example Continued

1. Order: {.1, .2, .15, .3, .25} → {.3, .25, .2, .15, .1}

Either way of breaking the tie (two different trees, figures omitted) gives

    L = .25 + .45 + .55 + 1 = 2.25

Huffman Coding: Tie Breaks

In the last example, the two ways of breaking the tie led to two different codes with the same set of code lengths. This is not always the case: sometimes we get different codes with different sets of code lengths (though the same average length).

EXAMPLE: (figure omitted)

Huffman Coding: Optimal Average Length

• A binary Huffman code will have the shortest average length as compared with any U.D. code for that set of probabilities (no U.D. code will have a shorter average length).

  ◦ The proof that a binary Huffman code is optimal (that is, has the shortest average code word length as compared with any U.D. code for the same set of probabilities) is omitted.

  ◦ However, we would like to mention that the proof is based on the fact that, in the process of constructing a Huffman code for that set of probabilities, other codes are formed for other sets of probabilities, all of which are optimal.

Shannon-Fano Codes

SHANNON-FANO CODES: another binary coding technique to construct U.D. codes (not necessarily optimum!)

1. Order probabilities in decreasing order.
2. Partition into 2 sets that are as close to equally probable as possible. Label the top set with a "0" and the bottom set with a "1".
3. Continue using step 2 over and over.

(A rough code sketch of this procedure appears after the examples below.)

Example (probabilities .5, .2, .15, .15):

    .5   ⇒ 0
    .2   ⇒ 1 0
    .15  ⇒ 1 1 0
    .15  ⇒ 1 1 1

    L = ?

Example (probabilities .4, .3, .1, .05, .05, .05, .05):

    .4   ⇒ 0
    .3   ⇒ 1 0
    .1   ⇒ 1 1 0 0
    .05  ⇒ 1 1 0 1
    .05  ⇒ 1 1 1 0
    .05  ⇒ 1 1 1 1 0
    .05  ⇒ 1 1 1 1 1

    L = 2.3

Compare with Huffman coding. Same length as the Huffman code?!
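A rough sketch of the Shannon-Fano partitioning procedure, under one possible tie-breaking convention (the notes do not specify one); it reproduces L = 2.3 for the second example above.

    def shannon_fano_lengths(probs):
        """Return codeword lengths from the recursive 'split into two nearly
        equally probable halves' procedure described above."""
        lengths = [0] * len(probs)

        def split(indices):
            if len(indices) <= 1:
                return
            total = sum(probs[i] for i in indices)
            running, best_k, best_gap = 0.0, 1, float("inf")
            for k in range(1, len(indices)):        # choose the most even split
                running += probs[indices[k - 1]]
                gap = abs(2 * running - total)
                if gap < best_gap:
                    best_gap, best_k = gap, k
            top, bottom = indices[:best_k], indices[best_k:]
            for i in top + bottom:
                lengths[i] += 1                     # one more code digit each
            split(top)
            split(bottom)

        order = sorted(range(len(probs)), key=lambda i: -probs[i])
        split(order)
        return lengths

    p = [.4, .3, .1, .05, .05, .05, .05]
    L = shannon_fano_lengths(p)
    print(L)                                        # [1, 2, 4, 4, 4, 5, 5]
    print(sum(pi * li for pi, li in zip(p, L)))     # 2.3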

More Examples

Construct binary Huffman and Shannon-Fano codes where:

EXAMPLE 1: (p1, p2, p3, p4) = (1/2, 1/4, 1/8, 1/8)

EXAMPLE 2: Consider the examples on the previous slide and construct binary Huffman codes.

Large Alphabet Size

Example 3: Consider a binary source {A, B} with (p1, p2) = (.9, .1). Now construct a series of Huffman codes and a series of Shannon-Fano codes, by encoding N source symbols at a time for N = 1, 2, 3, 4.

Shannon-Fano codes are suboptimal!

Example 4: Construct a Shannon-Fano code:

    Probability   Length   Code
    .25           2        0 0
    .20           2        0 1
    .15           3        1 0 0
    .10           3        1 0 1
    .05           4        1 1 0 0
    .05           5        1 1 0 1 0
    .05           5        1 1 0 1 1
    .05           4        1 1 1 0
    .04           5        1 1 1 1 0
    .03           6        1 1 1 1 1 0
    .03           6        1 1 1 1 1 1

    L = 3.11

Compare this with a binary Huffman code.

Noiseless Source Coding Continued

Non-binary Huffman Coding

• The objective is to create a Huffman code where the code words are from an alphabet with n letters. To do this:

  1. Order probabilities high to low (perhaps with extra symbols of probability 0)
  2. Combine the n least likely probabilities. Add them and re-order.
  3. Repeat until you end up with n symbols (i.e., probabilities)!!!

Example 1: A source with alphabet {A, B, C, D, E} and probabilities (.5, .3, .1, .08, .02) coded into a ternary stream (n = 3):

Example 2: n = 3, {A, B, C, D}, (p1, p2, p3, p4) = (.5, .3, .1, .1)

    Placeholder Figure A    L1 = 1.5  SUBOPTIMAL
    Placeholder Figure A    L1 = 1.2  OPTIMAL

(The optimal construction adds a phantom symbol of probability 0 before the first combination; see below.)

• Sometimes one has to add Phantom Source Symbols with 0 probability in order to make a non-binary Huffman code.

• How many?

  ◦ If one starts with M source symbols and combines the n least likely into one symbol, one is left with M − (n − 1) symbols.
  ◦ After doing this α times, one is left with M − α(n − 1) symbols.
  ◦ But at the end we must be left with n symbols. If this is not the case, we must add phantom symbols.

• Add D phantom symbols to ensure that M + D − α(n − 1) = n, or equivalently (M + D) = α′(n − 1) + 1.

EXAMPLES:

    n = 3       n = 4       n = 5       n = 6
    M   D       M   D       M   D       M   D
    3   0       4   0       5   0       6   0
    4   1       5   2       6   3       7   4
    5   0       6   1       7   2       8   3
    6   1       7   0       8   1       9   2
    7   0       8   2       9   0       10  1
    8   1       9   1       10  3       11  0
    9   0       10  0       11  2       12  4
    10  1       11  2       12  1       13  3

NOTE:

• M + D − 1 must be divisible by n − 1.
  EX: n = 3 ⇒ M + D − 1 must be even
• D ≤ n − 2
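The table above follows from the rule (M + D) = α′(n − 1) + 1; a minimal sketch that regenerates it:

    def phantom_symbols(M, n):
        """Number D of zero-probability phantom symbols needed so that an
        n-ary Huffman construction on M symbols ends with exactly n nodes,
        i.e. (M + D) = alpha*(n - 1) + 1 for some integer alpha."""
        return (1 - M) % (n - 1)

    for n in (3, 4, 5, 6):
        print(n, [phantom_symbols(M, n) for M in range(n, n + 8)])
    # n=3: [0, 1, 0, 1, 0, 1, 0, 1]
    # n=4: [0, 2, 1, 0, 2, 1, 0, 2]
    # n=5: [0, 3, 2, 1, 0, 3, 2, 1]
    # n=6: [0, 4, 3, 2, 1, 0, 4, 3]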

Beyond Huffman Codes

Run Length Codes for Fax (B/W)

Variable-length to Fixed-length Source Coding

Previously we only considered the situation where we encoded N source symbols into variable-length code sequences for a fixed value of N. We could call this "fixed-length to variable-length" encoding. But another possibility exists. We could encode variable-length source sequences into fixed- or variable-length code words.

Example 1: Consider the DMS source {A, B} with probabilities (.9, .1) and the following code book:

    Source Sequence   Codeword
    B                 00
    AB                01
    AAB               10
    AAA               11

Average length of a source phrase = 1 × .1 + 2 × .09 + 3 × (.081 + .729) = 2.71
Average # of code symbols / source symbol = 2 / 2.71 = 0.738

Tunstall Codes

Tunstall codes are U.D. variable-to-fixed-length codes with binary code words.

Basic idea: encode into binary code words of fixed length L, making the 2^L source phrases as nearly equally probable as we can.

We do this by making the source phrases the leaves of a tree and always splitting the leaf with the highest probability. (A small code sketch of this construction appears after the example below.)

Example 2: (A, B, C, D) with (p1, p2, p3, p4) = (.5, .3, .1, .1)

Placeholder Figure A

    Source Phrase   Code word      Source Phrase   Code word
    D               0000           C               0001
    BB              0010           AB              1001
    BC              0011           AC              1010
    BD              0100           AD              1011
    BAA             0101           AAA             1100
    BAB             0110           AAB             1101
    BAC             0111           AAC             1110
    BAD             1000           AAD             1111

Average length of a source phrase = sum of the probabilities of the internal nodes = 1 + .5 + .3 + .25 + .15 = 2.2
Average number of code symbols / source symbol = 4 / 2.2 = 1.82
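A minimal sketch of the leaf-splitting construction described above, assuming an i.i.d. source and 2^L code words. Ties may be split differently than in the table, but the average phrase length still comes out to 2.2 for this example.

    import heapq

    def tunstall_phrases(symbol_probs, codeword_bits):
        """Grow a Tunstall tree: start with single-symbol leaves and repeatedly
        split the most probable leaf into one child per source symbol, until
        the number of leaves reaches 2**codeword_bits."""
        target = 2 ** codeword_bits
        leaves = [(-p, s) for s, p in symbol_probs.items()]   # max-heap via -p
        heapq.heapify(leaves)
        while len(leaves) + len(symbol_probs) - 1 <= target:
            negp, phrase = heapq.heappop(leaves)              # most probable leaf
            for s, p in symbol_probs.items():
                heapq.heappush(leaves, (negp * p, phrase + s))
        return sorted((phrase, -negp) for negp, phrase in leaves)

    probs = {"A": .5, "B": .3, "C": .1, "D": .1}
    phrases = tunstall_phrases(probs, codeword_bits=4)
    print(len(phrases))                                       # 16
    avg_len = sum(p * len(ph) for ph, p in phrases)
    print(round(avg_len, 10))                                 # 2.2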

Improved Tunstall Coding

• Since the phrases are not equally probable, one can use a Huffman code on the phrases.
• The result is encoding a variable number of source symbols into a variable number of code symbols.

Example 3: Back to Example 1 with (A, B) with probabilities (.9, .1).

We have seen that Tunstall coding alone gives an average phrase length of 2.71 and an average of 2/2.71 = .738 code symbols per source symbol.

    Source Phrase   Tunstall Code   Probability   Improved Tunstall (Huffman)
    AAA             11              0.729         0
    B               00              0.1           11
    AB              01              0.09          100
    AAB             10              0.081         101

Average # of code symbols per source symbol
    = (1 + .271 + .171) / (1 + .9 + .81) = 1.442 / 2.71 = .532

Summary of Results for (A, B) = (.9, .1)

All of the following use 4 code words in the coding table:

1. Huffman Code, N = 2:
   AA → 0, AB → 11, BA → 100, BB → 101

2. Shannon-Fano Code, N = 2:
   AA → 0, AB → 10, BA → 110, BB → 111

3. Tunstall Code:
   B → 00, AB → 01, AAB → 10, AAA → 11

4. Tunstall/Huffman:
   B → 11, AB → 100, AAA → 0, AAB → 101

Lempel-Ziv Source Coding

• The basic idea is that if we have a dictionary of 2^A source phrases (available at both the encoder and the decoder), then in order to encode one of these phrases one needs only A binary digits.

• Normally, a computer stores each symbol as an ASCII character of 8 binary digits. (Actually only 7 are needed.)

• Using L-Z encoding, far fewer than 7 binary digits per symbol are needed. Typically the compression is about 2:1 or 3:1.

• Variants of L-Z codes were the algorithms behind the widely used Unix file compression utility compress as well as gzip. Several other popular compression utilities also used L-Z, or closely related, encodings.

• LZ became very widely used when it became part of the GIF image format in 1987. It may also (optionally) be used in TIFF and PDF files.

• There are two versions of L-Z codes. We will only discuss the "window" version.

• In (the window version of) Lempel-Ziv, symbols that have already been encoded are stored in a window.

• The encoder then looks at the next symbols to be encoded to find the longest string in the window that matches the source symbols to be encoded.

  ◦ If it can't find the next symbol in the window, it sends a "0" followed by the 8 (or 7) bits of the ASCII character.
  ◦ If it finds a sequence of one or more symbols in the window, it sends a "1" followed by the position of the first symbol of the match, followed by the length of the match. These latter two quantities are encoded into binary.
  ◦ Then the sequence that was just encoded is put into the window.

(A rough code sketch of this window search appears below.)
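A rough sketch of the window search just described (one reasonable set of conventions, not the notes' code verbatim): literals are emitted as (0, character) and matches as (1, position, length), with position 0 denoting the most recent symbol in the window, as in the examples that follow.

    def lz_window_encode(text, window_size=16):
        """Sketch of the 'window' Lempel-Ziv encoder described above.
        Tokens: (0, ch) for a literal, or (1, pos, length) for a match, where
        pos counts back from the right end of the window (0 = most recent)."""
        tokens = []
        i = 0
        while i < len(text):
            window = text[max(0, i - window_size):i]
            best_len, best_pos = 0, None
            for pos in range(len(window)):               # try every start position
                start = len(window) - 1 - pos            # convert to string index
                length = 0
                while (length <= pos and i + length < len(text)
                       and window[start + length] == text[i + length]):
                    length += 1
                if length > best_len:
                    best_len, best_pos = length, pos
            if best_len >= 1:
                tokens.append((1, best_pos, best_len))
                i += best_len
            else:
                tokens.append((0, text[i]))
                i += 1
        return tokens

    print(lz_window_encode("MY MY WHAT A HAT IS THAT"))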

Example: Suppose the content of the window is given as

    Position:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
    Symbol:     T  H  E     T  H  R  E  E     A  R  E     I  N

The next word, assuming it is "THE ", will be encoded as

    (1, "15", "4")

where "15" is the position of the start of the match (4 bits) and "4" is the length of the match (encoded with some agreed-upon length code), both sent in binary.

And then the window's content will be updated as

    Position:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
    Symbol:     T  H  R  E  E     A  R  E     I  N     T  H  E

EXAMPLE: Encode the text "MY MY WHAT A HAT IS THAT" using a 16-symbol window.

Successive window contents after each emitted token (positions 15 ... 0; spaces in the window are not shown):

    M
    M Y
    M Y
    M Y M Y
    M Y M Y W
    M Y M Y W H
    M Y M Y W H A
    M Y M Y W H A T
    M Y M Y W H A T
    M Y M Y W H A T A
    M Y M Y W H A T A
    M Y M Y W H A T A H A T
    M Y W H A T A H A T I
    M Y W H A T A H A T I S
    M Y W H A T A H A T I S
    Y W H A T A H A T I S T
    W H A T A H A T I S T H A T

Emitted tokens and # of bits:

    (0, M)        9
    (0, Y)        9
    (0, ' ')      9
    (1, 2, 3)     8
    (0, W)        9
    (0, H)        9
    (0, A)        9
    (0, T)        9
    (1, 4, 1)     6
    (1, 2, 1)     6
    (1, 1, 1)     6
    (1, 5, 4)     9
    (0, I)        9
    (0, S)        9
    (1, 2, 1)     6
    (1, 4, 1)     6
    (1, 7, 3)     8

    Total: 136 bits

No match: (0, M) → 1 bit more than needed for a plain symbol (1 + 8 = 9 bits).

Match: (1, position, length) → 1 bit, plus a number of bits that depends on the window size (for the position), plus a number of bits that depends on the code used to encode lengths.

One really simple code for the lengths might be

    1 → 0,  2 → 10,  3 → 110,  ...

What are the advantages/disadvantages of this code?

Noiseless Source Coding: Fundamental Limits

Shannon Entropy

• Let an I.I.D. source S have M source letters that occur with probabilities p1, p2, ..., pM, with ∑_{i=1}^{M} p_i = 1.

• The entropy of the source S is denoted H(S) and is defined as

    H_a(S) = ∑_{i=1}^{M} p_i log_a(1/p_i) = − ∑_{i=1}^{M} p_i log_a p_i = E[ log_a(1/p_i) ]

• The base of the logarithms is usually taken to be equal to 2. In that case, H_2(S) is simply written as H(S) and is measured in units of "bits".
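A minimal sketch of this definition (the base argument uses the change-of-base relation given on the next slide):

    from math import log

    def entropy(probs, base=2):
        """H_a(S) = sum_i p_i * log_a(1/p_i) for an i.i.d. source."""
        return sum(p * log(1/p, base) for p in probs if p > 0)

    print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits
    print(entropy([0.9, 0.1]))                  # about 0.469 bits
    print(entropy([1/3, 1/3, 1/3], base=3))     # 1.0 ternary unit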

• Other bases can be used and easily transformed to. Note:

    log_a x = (log_b x)(log_a b) = (log_b x) / (log_b a)

Hence,

    H_a(S) = H_b(S) · log_a b = H_b(S) / (log_b a)

A Useful Theorem

Let p1, p2, ..., pM be one set of probabilities and let p′1, p′2, ..., p′M be another (note ∑_{i=1}^{M} p_i = 1 and ∑_{i=1}^{M} p′_i = 1). Then

    ∑_{i=1}^{M} p_i log(1/p_i) ≤ ∑_{i=1}^{M} p_i log(1/p′_i),

with equality iff p_i = p′_i for i = 1, 2, ..., M.

Proof.

First note that ln x ≤ x − 1, with equality iff x = 1.

Placeholder Figure A

    ∑_{i=1}^{M} p_i log(p′_i / p_i) = ( ∑_{i=1}^{M} p_i ln(p′_i / p_i) ) log e

Proof continued.

    ( ∑_{i=1}^{M} p_i ln(p′_i / p_i) ) log e ≤ (log e) ∑_{i=1}^{M} p_i ( p′_i / p_i − 1 )
                                             = (log e) ( ∑_{i=1}^{M} p′_i − ∑_{i=1}^{M} p_i ) = 0

In other words,

    ∑_{i=1}^{M} p_i log(1/p_i) ≤ ∑_{i=1}^{M} p_i log(1/p′_i)

1. For an equally likely i.i.d. source with M source letters, H_2(S) = log_2 M (and H_a(S) = log_a M for any base a).

2. For any i.i.d. source with M source letters, 0 ≤ H_a(S) ≤ log_a M, for all a.
   This follows from the previous theorem with p′_i = 1/M for all i.

3. Consider an i.i.d. source with source alphabet S. If we consider encoding m source letters at a time, this is an i.i.d. source with alphabet S^m. Call this the m-tuple extension of the source and denote it by S^m. Then

       H_2(S^m) = m H_2(S)

   (and similarly for all a, H_a(S^m) = m H_a(S)).
   The proofs are omitted but are easy.

Computation of Entropy (base 2)

Example 1: Consider a source with M = 2 and (p1, p2) = (0.9, 0.1).

    H_2(S) = .9 log_2(1/.9) + .1 log_2(1/.1) = 0.469 bits

From before, we gave Huffman codes for this source and extensions of this source for which

    L_1 = 1,   L_2/2 = 0.645,   L_3/3 = 0.533,   L_4/4 = 0.49

In other words, we note that L_m/m ≥ H_2(S). Furthermore, as m gets larger, L_m/m gets closer to H_2(S).

Performance of Huffman Codes

→ One can prove that, in general, for a binary Huffman code,

    H_2(S) ≤ L_m / m < H_2(S) + 1/m

Example 2: Consider a source with M = 3 and (p1, p2, p3) = (.5, .35, .15).

    H_2(S) = .5 log_2(1/.5) + .35 log_2(1/.35) + .15 log_2(1/.15) = 1.44 bits

Again, we have already given codes for this source such that

    1 symbol at a time:   L_1 = 1.5
    2 symbols at a time:  L_2/2 = 1.46

Example 3: Consider M = 4 and (p1, p2, p3, p4) = (1/2, 1/4, 1/8, 1/8).

    H_2(S) = (1/2) log_2 2 + (1/4) log_2 4 + (1/8) log_2 8 + (1/8) log_2 8 = 1.75 bits

But from before we gave the code

    A → 0,  B → 10,  C → 110,  D → 111

for which L_1 = H_2(S).

This means that one cannot improve on the efficiency of this Huffman code by encoding several source symbols at a time.

Fundamental Source Coding Limit

Theorem [Lossless Source Coding Theorem]
For any U.D. binary code corresponding to the N-th extension of the I.I.D. source S, for every N = 1, 2, ...

    L_N / N ≥ H(S)

• For a binary Huffman code corresponding to the N-th extension of the I.I.D. source S,

    H_2(S) + 1/N > L_N / N ≥ H_2(S)

• But this implies that Huffman coding is asymptotically optimal, i.e.,

    lim_{N→∞} L_N / N = H_2(S)

NON-BINARY CODE WORDS

The code symbols that make up the codewords can be from a higher-order alphabet than 2.

Example 4: I.I.D. source {A, B, C, D, E} with U.D. codes (where each concatenation of code words can be decoded in only one way):

    SYMBOLS   TERNARY   QUATERNARY   5-ary
    A         0         0            0
    B         1         1            1
    C         20        2            2
    D         21        30           3
    E         22        31           4

• A lower bound to the average code length of any U.D. n-ary code (with n letters) is H_n(S).
• For example, for a ternary code, the average length (per source letter), L_M/M, is no less than H_3(S).

Noiseless Source Coding: Fundamental Limits

Fundamental Source Coding Limit

Theorem [Lossless Source Coding Theorem]
For any U.D. n-ary code corresponding to the N-th extension of the I.I.D. source S, for every N = 1, 2, ...

    L_N / N ≥ H_n(S)

• Furthermore, for an n-ary Huffman code corresponding to the N-th extension of the I.I.D. source S,

    H_n(S) + 1/N > L_N / N ≥ H_n(S)

• But this implies that Huffman coding is asymptotically optimal, i.e.,

    lim_{N→∞} L_N / N = H_n(S)

Sketch of the Proof

• If one can construct a U.D. code such that l_i = ⌈ log_n(1/p_i) ⌉, we will have¹

    H_n(S) + 1 > L ≥ H_n(S).

• But is it always possible to construct such a code?

• If it is, is it good enough? Can we do better?

• To answer this, we first prove useful conditions that a set of integers must satisfy if they are the lengths of U.D. codewords. These conditions are called the Kraft and McMillan inequalities. (A small numeric sketch of these code lengths and the Kraft sum appears below.)

¹Note that ⌈x⌉ is the unique integer in the interval [x, x + 1).
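A small numeric sketch (the probabilities are the ones used in a later example) of the code lengths l_i = ⌈log_n(1/p_i)⌉ from the proof sketch, together with the Kraft sum ∑ n^{−l_i} that the next slides show must be at most 1 for a U.D. code:

    from math import ceil, log

    def shannon_lengths(probs, n=2):
        """Code lengths l_i = ceil(log_n(1/p_i)) used in the proof sketch."""
        return [ceil(log(1/p, n)) for p in probs]

    def kraft_sum(lengths, n=2):
        """Sum_i n**(-l_i); at most 1 for the lengths of any U.D. n-ary code."""
        return sum(n ** (-l) for l in lengths)

    p = [.9, .09, .01]                           # illustrative probabilities
    L = shannon_lengths(p, n=2)
    print(L)                                     # [1, 4, 7]
    print(kraft_sum(L, n=2))                     # 0.5703125 <= 1
    print(sum(pi * li for pi, li in zip(p, L)))  # 1.33, between H(S) and H(S)+1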

Proof of Source Coding Theorem

Codeword Lengths: Necessary and Sufficient Condition

Theorem [Kraft Inequality] A necessary and sufficient condition for the construction of an instantaneous n-ary code with M code words of lengths l1, l2, ..., lM, where the code symbols take on n different values, is that

    ∑_{i=1}^{M} n^{−l_i} ≤ 1

PROOF OF SUFFICIENCY:

• We construct an instantaneous code with these code word lengths. Let there be m_j code words of length j, for j = 1, 2, ..., l* = max l_i. Then

    ∑_{i=1}^{M} n^{−l_i} = ∑_{j=1}^{l*} m_j n^{−j}

• In other words, if ∑_{i=1}^{M} n^{−l_i} ≤ 1, then ∑_{j=1}^{l*} m_j n^{l*−j} ≤ n^{l*}. Or,

    m_{l*} + m_{l*−1} n + m_{l*−2} n² + ... + m_1 n^{l*−1} ≤ n^{l*},

  and, equivalently,

    m_{l*} ≤ n^{l*} − m_1 n^{l*−1} − m_2 n^{l*−2} − ... − m_{l*−1} n        (1)

• But since m_{l*} ≥ 0, we then have

    0 ≤ n^{l*} − m_1 n^{l*−1} − m_2 n^{l*−2} − ... − m_{l*−1} n

  and can repeat the procedure to arrive at

    m_{l*−1} ≤ n^{l*−1} − m_1 n^{l*−2} − m_2 n^{l*−3} − ... − m_{l*−2} n     (2)

• But dividing by n and noting that m_{l*−1} ≥ 0, we have

    0 ≤ n^{l*−2} − m_1 n^{l*−3} − m_2 n^{l*−4} − ... − m_{l*−2}

  or, equivalently,

    m_{l*−2} ≤ n^{l*−2} − m_1 n^{l*−3} − m_2 n^{l*−4} − ...                  (3)

• Continuing, we get

    0 ≤ m_3 ≤ n³ − m_1 n² − m_2 n        (4)
    0 ≤ m_2 ≤ n² − m_1 n                 (5)
    m_1 ≤ n                              (6)

• Note that if ∑_{i=1}^{M} n^{−l_i} ≤ 1, then the m_j satisfy (1)-(6).

Note that m_1 ≤ n.

If m_1 = n, then we are done (assign each codeword a letter).

If m_1 < n, we have (n − m_1) unused prefixes from which to form code words of length 2 for which the code words of length 1 are not prefixes. This means that there are (n − m_1)n phrases of length 2 to select the codewords from. But this is larger than or equal to the number of codewords of length 2, m_2, according to equation (5). So we can construct our code by selecting m_2 of the (n − m_1)n prefix-free phrases of length 2.

If m_2 = (n − m_1)n, we are done. If m_2 < n² − m_1 n, there are (n² − m_1 n − m_2)n code words of length 3 which satisfy the prefix condition, etc.

Kraft Inequality: Proof of necessity

Proof of necessity follows from the McMillan inequality.

Theorem [McMillan Inequality]
A necessary and sufficient condition for the existence of a U.D. code with M code words of lengths l1, l2, ..., lM, where the code symbols take on n different values, is:

    ∑_{i=1}^{M} n^{−l_i} ≤ 1.

Here we sketch the proof of necessity of the McMillan inequality (enough to prove the necessity of the Kraft inequality). Proof by contradiction:

1. Assume ∑_{i=1}^{M} n^{−l_i} = A > 1 and a U.D. code exists.

2. If ∑_{i=1}^{M} n^{−l_i} > 1, then [ ∑_{i=1}^{M} n^{−l_i} ]^N = A^N, which grows exponentially with N.

3.  ( ∑_{i=1}^{M} n^{−l_i} )^N = ( ∑_{j=1}^{l*} m_j n^{−j} )^N = ∑_{k=N}^{N l*} N_k n^{−k}

    where

        N_k = ∑_{i1+i2+...+iN = k} m_{i1} m_{i2} ··· m_{iN}

    denotes the # of strings of N code words that have length exactly k.

4.  If the code is U.D., then N_k ≤ n^k. But then for a U.D. code

        ( ∑_{i=1}^{M} n^{−l_i} )^N = ∑_{k=N}^{N l*} N_k n^{−k} ≤ ∑_{k=N}^{N l*} 1 = N l* − N + 1,

    which grows linearly with N, not exponentially with N, contradicting step 2.

Complete Proof

Proof of Source Coding Theorem

Lower bound on L for a U.D. Code

Theorem: For any instantaneous code², L ≥ H_n(S). Furthermore, L = H_n(S) iff p_i = n^{−l_i}.

Proof: Let

    p′_i = n^{−l_i} / ∑_{j=1}^{M} n^{−l_j}.

Note that p′_i ≥ 0 and ∑_{i=1}^{M} p′_i = 1.

From before:

    H_n(S) = ∑_{i=1}^{M} p_i log_n(1/p_i) ≤ ∑_{i=1}^{M} p_i log_n(1/p′_i)

Then:

    H_n(S) ≤ ∑_{i=1}^{M} p_i l_i + ∑_{i=1}^{M} p_i log_n ( ∑_{j=1}^{M} n^{−l_j} )

But for a U.D. code, ∑_{j=1}^{M} n^{−l_j} ≤ 1, so

    log_n ( ∑_{j=1}^{M} n^{−l_j} ) ≤ 0

and therefore

    H_n(S) ≤ L.

Equality occurs if and only if ∑_{j=1}^{M} n^{−l_j} = 1 and p_i = p′_i. But both of these conditions hold if p_i = n^{−l_i} with l_i an integer.

²Here L = average length of the U.D. code and n = number of symbols in the code alphabet.

Asymptotic Optimality of Huffman Codes

We have seen that for any U.D. n-ary code corresponding to the N-th extension of the I.I.D. source S, for every N = 1, 2, ...,

    L_N / N ≥ H_n(S).

→ How do you get the result for general N?

• Next we show that for an n-ary Huffman code corresponding to the N-th extension of the I.I.D. source S,

    H_n(S) + 1/N > L_N / N ≥ H_n(S)

• But this implies that Huffman coding is asymptotically optimal, i.e.,

    lim_{N→∞} L_N / N = H_n(S)

Sketch of the Proof

• From the McMillan inequality, we have that one can construct a U.D. code such that l_i = ⌈ log_n(1/p_i) ⌉. In other words, we can construct a U.D. code for which

    H_n(S) + 1 > L ≥ H_n(S).

• Why?

Examples: Huffman Codes are asymptotically optimal

EX 1: n = 2, (.9, .09, .01) ⇒ H_2(S) = .516

    Symbol   Prob   log_2(1/p_i)   l_i = ⌈log_2(1/p_i)⌉
    A        .9     .152           1
    B        .09    3.47           4
    C        .01    6.67           7

    L = 1.33

Note that H_n(S) ≤ L < H_n(S) + 1:  .516 ≤ 1.33 < 1.516

A better code (this is actually a Huffman code):

    A → 0,  B → 10,  C → 11,   L = 1.1

EX 2: n = 2, (.19, .19, .19, .19, .19, .05) ⇒ H_2(S) = 2.492

    Symbol   Prob   log_2(1/p_i)   l_i = ⌈log_2(1/p_i)⌉
    A        .19    2.396          3
    B        .19    2.396          3
    C        .19    2.396          3
    D        .19    2.396          3
    E        .19    2.396          3
    F        .05    4.322          5

    L = 3.10

Huffman Code:  A → 00, B → 01, C → 100, D → 101, E → 110, F → 111

    L = 2.62

EX 3: n = 3, (.19, .19, .19, .19, .19, .05) ⇒ H_2(S) = 2.492, i.e., H_3(S) = 2.492 / log_2 3 = 1.57

    Symbol   Prob   log_3(1/p_i)   l_i = ⌈log_3(1/p_i)⌉
    A        .19    1.51           2
    B        .19    1.51           2
    C        .19    1.51           2
    D        .19    1.51           2
    E        .19    1.51           2
    F        .05    2.73           3

    L = 5(.19)(2) + (.05)(3) = 2.05   (note H_3(S) ≤ L < H_3(S) + 1)

Ternary Huffman Code:  A → 0, B → 10, C → 11, D → 12, E → 20, F → 21

    L = 1.81

Asymptotic Optimality of Huffman Codes

Theorem: For an I.I.D. source S, a Huffman code exists with code alphabet size n for the N-th extension of this source such that

    H_n(S) ≤ L_N / N < H_n(S) + 1/N

Proof: Earlier (slide 3) we saw that a U.D. code exists for the N-th extension of the source such that

    H_n(S^N) ≤ L_N < H_n(S^N) + 1

But H_n(S^N) = N H_n(S). So all we need is to show that, for a given fixed source with a fixed alphabet, a Huffman code is at least as good as any U.D. code.
Q.E.D.

Next we prove the fact, stated earlier without proof, that for any fixed and given alphabet a Huffman code is optimal.

Properties of an Optimal (Compact) Binary Code

We only state the results for binary codes, even though similar lines of argument can be used for n-ary codes.

We only need to consider instantaneous codes!

1. If p_i ≤ p_j, then l_i ≥ l_j.
   Proof: Otherwise, switching the code words would reduce L.

2. There is not just a single code word of length l* = max l_i.
   Proof: If there were, shorten it by one digit; it would still not be a prefix of any other code word, and this would shorten L.

3. The code words of length l* occur in pairs in which the code words in each pair agree in all but the last digit.
   Proof: If not, shorten a code word for which this is not the case by one digit; it would not be the prefix of any other code word. This would shorten L.

Optimality of Binary Huffman Codes

Placeholder Figure A

L_{j−1} = L_j + P_{α1} + P_{α2}, since the code at stage (j − 1) is the same as the code at stage (j) except for two code words that are one digit longer.

We now show that if code C_j is optimal, then code C_{j−1} must also be optimal. Suppose not, and there were a better code at stage (j − 1); call its average length L′_{j−1} < L_{j−1}. By the properties above, the two code words with probabilities P_{α1} and P_{α2} can be taken to be identical in all but the last digit. Form a new code at stage j that uses their common prefix as the code word for the merged symbol of probability P_{α1} + P_{α2}. This code will have average length

    L′_j = L′_{j−1} − (P_{α1} + P_{α2}) < L_{j−1} − (P_{α1} + P_{α2}) = L_j,

which can't be the case if C_j was optimal.

Coding With Distortion

Placeholder Figure A

    ε² = lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} E[(x(t) − x̂(t))²] dt = M.S.E.

Discrete-Time Signals

If signals are bandlimited, one can sample at the Nyquist rate and convert the continuous-time problem to a discrete-time problem. This sampling is part of the A/D converter.

Placeholder Figure A

    ε² = lim_{m→∞} (1/m) ∑_{i=1}^{m} E[(x_i − x̂_i)²]

A/D Conversion and D/A Conversion

A/D Conversion

• Assume a random variable X which falls into the range (X_min, X_max).
• The goal is for X to be converted into k binary digits. Let M = 2^k.
• The usual A/D converter first subdivides the interval (X_min, X_max) into M equal sub-intervals.
• Sub-intervals are of width ∆ = (X_max − X_min)/M.
• Shown below for the case of k = 3 and M = 8.

Placeholder Figure A

• We call the i-th sub-interval ℜ_i.

D/A Conversion

• Assume that X falls in the region ℜ_i, i.e., x ∈ ℜ_i.
• The D/A converter uses as an estimate of X the value X̂ = y_i, which is the center of the i-th region.
• The mean-squared error between X and X̂ is

    ε² = E[(X − X̂)²] = ∫_{X_min}^{X_max} (x − x̂)² f_X(x) dx

  where f_X(x) is the probability density function of the random variable X.
• Let f_{X|ℜ_i}(x) be the conditional density function of X given that X falls in the region ℜ_i. Then

    ε² = ∑_{i=1}^{M} P[x ∈ ℜ_i] ∫_{x∈ℜ_i} (x − y_i)² f_{X|ℜ_i}(x) dx

• Note that

    ∑_{i=1}^{M} P[x ∈ ℜ_i] = 1   and   ∫_{x∈ℜ_i} f_{X|ℜ_i}(x) dx = 1   for i = 1, 2, ..., M.

• Make the further assumption that k is large enough so that f_{X|ℜ_i}(x) is a constant over the region ℜ_i.

• Then f_{X|ℜ_i}(x) = 1/∆ for all i, and

    ∫_{x∈ℜ_i} (x − y_i)² f_{X|ℜ_i}(x) dx = (1/∆) ∫_a^b ( x − (a + b)/2 )² dx
                                         = (1/∆) ∫_{−∆/2}^{∆/2} x² dx
                                         = (1/∆) · (2/3) · (∆/2)³
                                         = ∆²/12

• In other words,

    ε² = ∑_{i=1}^{M} P[x ∈ ℜ_i] · ∆²/12 = ∆²/12

• If X has variance σ_x², the signal-to-noise ratio of the A-to-D (and D-to-A) converter is often defined as σ_x² / (∆²/12).

• If X_min is equal to −∞ and/or X_max = +∞, then the first and last intervals can be infinite in extent.

• However, f_X(x) is usually small enough in those intervals so that the result is still approximately the same.

Placeholder Figure A
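A small Monte Carlo sketch of the ε² = ∆²/12 result, assuming a uniformly distributed input so that the "flat density over each cell" approximation holds exactly:

    import random

    def uniform_quantize(x, x_min, x_max, k):
        """Round x to the centre of its cell in a 2**k-level uniform quantizer."""
        M = 2 ** k
        delta = (x_max - x_min) / M
        i = min(int((x - x_min) / delta), M - 1)   # cell index, clamped at the top
        return x_min + (i + 0.5) * delta

    rng = random.Random(0)
    x_min, x_max, k = -1.0, 1.0, 3                 # illustrative values
    delta = (x_max - x_min) / 2 ** k
    samples = [rng.uniform(x_min, x_max) for _ in range(200_000)]
    mse = sum((x - uniform_quantize(x, x_min, x_max, k)) ** 2 for x in samples) / len(samples)
    print(mse, delta ** 2 / 12)                    # the two values should be close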

SCALAR QUANTIZATION of GAUSSIAN SAMPLES

Scalar Quantization

Placeholder Figure A

• ENCODER:

    x ≤ −3b        → 000        0 < x ≤ b      → 100
    −3b < x ≤ −2b  → 001        b < x ≤ 2b     → 101
    −2b < x ≤ −b   → 010        2b < x ≤ 3b    → 110
    −b < x ≤ 0     → 011        3b < x         → 111

• DECODER:

    000 → −3.5b        100 → +0.5b
    001 → −2.5b        101 → +1.5b
    010 → −1.5b        110 → +2.5b
    011 → −0.5b        111 → +3.5b

Optimum Scalar Quantizer

• Let us construct boundaries b_i (b_0 = −∞, b_M = +∞) and quantization levels a_i such that

    b_{i−1} ≤ x < b_i  →  x̂ = a_i,   i = 1, 2, ..., M

Placeholder Figure A

• The question is how to optimize {b_i} and {a_i} to minimize the distortion ε²:

    ε² = ∑_{i=1}^{M} ∫_{b_{i−1}}^{b_i} (x − a_i)² f_X(x) dx

• To optimize ε² = ∑_{i=1}^{M} ∫_{b_{i−1}}^{b_i} (x − a_i)² f_X(x) dx, we take derivatives and set them equal to zero:

    ∂ε²/∂a_j = 0,    ∂ε²/∂b_j = 0

• And use Leibnitz's Rule:

    d/dt ∫_{a(t)}^{b(t)} f(x, t) dx = f(b(t), t) db(t)/dt − f(a(t), t) da(t)/dt + ∫_{a(t)}^{b(t)} ∂f(x, t)/∂t dx

Setting the derivative with respect to b_j to zero:

    ∂/∂b_j ( ∑_{i=1}^{M} ∫_{b_{i−1}}^{b_i} (x − a_i)² f_X(x) dx )
        = ∂/∂b_j ∫_{b_{j−1}}^{b_j} (x − a_j)² f_X(x) dx + ∂/∂b_j ∫_{b_j}^{b_{j+1}} (x − a_{j+1})² f_X(x) dx
        = (b_j − a_j)² f_X(b_j) − (b_j − a_{j+1})² f_X(b_j) = 0

    b_j² − 2 a_j b_j + a_j² = b_j² − 2 b_j a_{j+1} + a_{j+1}²
    2 b_j (a_{j+1} − a_j) = a_{j+1}² − a_j²

    b_j = (a_{j+1} + a_j) / 2        (I)

Setting the derivative with respect to a_j to zero:

    ∂/∂a_j ( ∑_{i=1}^{M} ∫_{b_{i−1}}^{b_i} (x − a_i)² f_X(x) dx ) = −2 ∫_{b_{j−1}}^{b_j} (x − a_j) f_X(x) dx = 0

    a_j ∫_{b_{j−1}}^{b_j} f_X(x) dx = ∫_{b_{j−1}}^{b_j} x f_X(x) dx

    a_j = ∫_{b_{j−1}}^{b_j} x f_X(x) dx / ∫_{b_{j−1}}^{b_j} f_X(x) dx        (II)

Optimum Scalar Quantizer

Lossy Source Coding

A/D & D/A

Scalar Quantization

• Quantization

• Quantizer

• Optimal Quantizer

• Optimal Quantizater

• Optimal Quantizater

• Iterative Solution

• Comments

• Comments

Vector Quantization

Audio Compression

16 / 26

• Note that the {b_i} can be found from (I) once the {a_i} are known.
  ◦ In fact, the {b_i} are the midpoints of the {a_i}.
• The {a_i} can also be found from (II) once the {b_i} are known.
  ◦ The {a_i} are the centroids of the corresponding regions.
• Thus one can use a computer to iteratively solve for the {a_i} and the {b_i} (a sketch follows below):
  1. Start with an initial guess for the {b_i}.
  2. Use (II) to solve for the {a_i}.
  3. Use (I) to solve for the {b_i}.
  4. Repeat steps 2 and 3 until the {a_i} and the {b_i} "stop changing".
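This iteration (often called the Lloyd-Max algorithm) is easy to program. Below is a minimal sketch for a zero-mean, unit-variance Gaussian source; the function name, the initial guess, and the use of scipy's normal pdf/cdf to evaluate the integrals in (II) are illustrative choices, not part of the notes.

    import numpy as np
    from scipy.stats import norm

    def lloyd_max_gaussian(M, iters=200):
        """Iteratively solve (I) and (II) for a zero-mean, unit-variance Gaussian."""
        # step 1: initial guess for the interior boundaries b_1 .. b_{M-1}
        b = np.linspace(-2.0, 2.0, M - 1)
        for _ in range(iters):
            edges = np.concatenate(([-np.inf], b, [np.inf]))
            # step 2: (II) a_j = centroid of [b_{j-1}, b_j] under the Gaussian pdf;
            # integral of x*phi(x) over [lo, hi] equals phi(lo) - phi(hi)
            num = norm.pdf(edges[:-1]) - norm.pdf(edges[1:])
            den = norm.cdf(edges[1:]) - norm.cdf(edges[:-1])
            a = num / den
            # step 3: (I) b_j = midpoint of adjacent a's
            b = 0.5 * (a[:-1] + a[1:])
        # distortion: E[X^2] minus sum over cells of a_j * (integral of x*phi dx),
        # valid because each a_j is the centroid of its cell
        eps2 = 1.0 - np.sum(a * num)
        return a, b, eps2

    a, b, eps2 = lloyd_max_gaussian(M=2)
    print("a =", a, "b =", b, "eps^2 =", eps2)   # expect a = ±sqrt(2/pi), eps^2 ≈ 0.3634

For M = 2 this converges to a_2 = −a_1 ≈ 0.7979 = √(2/π) and ǫ² ≈ 0.3634, matching the closed-form result quoted in the comments that follow.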

Comments on Optimum Scalar Quantizer


1. This works for any f_X(x).
2. If f_X(x) has only finite support, one adjusts b_0 and b_M to be the limits of the support.

Placeholder Figure A

3. One needs to know ∫_α^β f_X(x) dx and ∫_α^β x f_X(x) dx (true for any f_X(x)).
4. For a Gaussian, we can integrate by parts or let y = x²:

    ∫_α^β (1/√(2π)) e^{−x²/2} dx = Q(α) − Q(β)

    ∫_α^β x (1/√(2π)) e^{−x²/2} dx = ...


5. If M = 2^a, one could use a binary digits to represent the quantized value. However, since the quantized values are not necessarily equally likely, one could use a HUFFMAN CODE to use fewer binary digits (on the average).
6. After the {a_i} and {b_i} are known, one computes ǫ² from

    ǫ² = Σ_{i=1}^{M} ∫_{b_{i−1}}^{b_i} (x − a_i)² f_X(x) dx

7. For M = 2 and f_X(x) = (1/√(2πσ²)) e^{−x²/(2σ²)}, we have

    b_0 = −∞,  b_1 = 0,  b_2 = +∞,  and  a_2 = −a_1 = √(2σ²/π)

8. It is also easy to show that ǫ² = (1 − 2/π) σ² = .3634 σ².

Vector Quantization


• One can achieve a smaller ǫ² by quantizing several samples at a time.
• We would then use regions in an m-dimensional space.

Placeholder Figure A

• Shannon characterized this in terms of the "rate-distortion formula," which tells us how small ǫ² can be (as m → ∞).
• For a Gaussian source with one binary digit per sample,

    ǫ² ≥ σ²/4 = 0.25 σ²

  ◦ This follows from the result below.
  ◦ Contrast this with the scalar case: ǫ²_s = (1 − 2/π) σ² = .3634 σ².

VQ: Discrete Memoryless Gaussian Source


• Let the source produce i.i.d. Gaussian samples X1, X2, ... where

    f_X(x) = (1/√(2πσ²)) e^{−x²/(2σ²)}

• Let the source encoder produce a sequence of binary digits at a rate of R binary digits/source symbol.
  ◦ In our previous terminology, R = log M.
• Let the source decoder produce the sequence X̂1, X̂2, ..., X̂i, ... such that the mean-squared error between {Xi} and {X̂i} is ǫ²:

    ǫ² = (1/n) Σ_{i=1}^{n} E{(Xi − X̂i)²}


• Then one can prove that for any such system

    R ≥ (1/2) log₂(σ²/ǫ²)    for ǫ² ≤ σ²

  ◦ Note that R = 0 for ǫ² ≥ σ². What does this mean?
  ◦ Note that for R = log M = 1,

    1 ≥ (1/2) log₂(σ²/ǫ²)  ⇒  2 ≥ log₂(σ²/ǫ²)  ⇒  4 ≥ σ²/ǫ²  ⇒  ǫ² ≥ (1/4) σ²

• This is an example of "Rate-Distortion Theory." (A numeric comparison with the scalar quantizer follows below.)
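Rewriting the bound as ǫ² ≥ σ² 2^{−2R}, a two-line numeric comparison with the scalar quantizer result from earlier (σ² = 1 is assumed purely for illustration):

    import math

    sigma2 = 1.0                             # source variance (illustrative)
    R = 1.0                                  # one binary digit per sample
    d_shannon = sigma2 * 2 ** (-2 * R)       # rate-distortion bound: eps^2 >= sigma^2 * 2^(-2R)
    d_scalar = (1 - 2 / math.pi) * sigma2    # best 1-bit scalar quantizer (earlier slide)

    print(f"rate-distortion lower bound:    {d_shannon:.4f} * sigma^2")   # 0.2500
    print(f"optimal 1-bit scalar quantizer: {d_scalar:.4f} * sigma^2")    # 0.3634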

Reduced Fidelity Audio Compression


• MP3 players use a form of audio compression called MPEG-1 Audio Layer 3.
• It takes advantage of a psycho-acoustic phenomenon whereby
  ◦ a loud tone at one frequency "masks" the presence of softer tones at neighboring frequencies;
  ◦ hence, these softer neighboring tones need not be stored (or transmitted).
• The compression efficiency of an audio compression scheme is usually described by the encoded bit rate (prior to the introduction of coding bits).


• The CD has a bit rate of (44.1 × 10³ × 2 × 16) = 1.41 × 10⁶ bits/second.
  ◦ The term 44.1 × 10³ is the sampling rate, which is approximately the Nyquist rate of the audio to be compressed.
  ◦ The term 2 comes from the fact that there are two channels in a stereo audio system.
  ◦ The term 16 comes from the 16-bit (or 2¹⁶ = 65536 level) A-to-D converter.
  ◦ Note that a slightly higher sampling rate of 48 × 10³ samples/second is used for a DAT recorder.


• Different standards are used in MP3 players.
• Several bit rates are specified in the MPEG-1 Layer 3 standard.
  ◦ These are 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, 192, 224, 256 and 320 kilobits/sec.
  ◦ The sampling rates allowed are 32, 44.1 and 48 kHz, but the sampling rate of 44.1 × 10³ Hz is almost always used.
• The basic idea behind the scheme is as follows.
  ◦ A block of 576 time-domain samples is converted into 576 frequency-domain samples using a DFT.
  ◦ The coefficients are then modified using psycho-acoustic principles.
  ◦ The processed coefficients are then converted into a bit stream using various schemes, including Huffman encoding.
  ◦ The process is reversed at the receiver: bits → frequency-domain coefficients → time-domain samples.


Information Theory: Definitions and Equalities


Basic Definitions


• Since H(X) = Σ_x p[x] log (1/p[x])
• Then H(X, Y) = Σ_x Σ_y p[x, y] log (1/p[x, y])
• Define H[X|Y = y] = Σ_x p(x|y) log (1/p(x|y))
• Define H[X|Y] = Σ_y p(y) H[X|Y = y]
• Easy to see

    H[X|Y] = Σ_y p(y) Σ_x p(x|y) log (1/p(x|y)) = Σ_x Σ_y p(x, y) log (1/p(x|y))

Fundamental Equalities


• But p[x, y] = p[y|x] p[x], so

    H[X, Y] = Σ_x Σ_y p[x, y] [ log (1/p[y|x]) + log (1/p[x]) ]
            = Σ_x Σ_y p[x, y] log (1/p[y|x]) + Σ_x ( Σ_y p[x, y] ) log (1/p[x])
            = Σ_x Σ_y p[x, y] log (1/p[y|x]) + Σ_x p[x] log (1/p[x])

  where the inner sum Σ_y p[x, y] equals p[x]; the first term is defined as H[Y|X] and the second term is H[X].

• In other words,

    H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)

Mutual Information: Basic Definition


• Define I(X; Y) = Σ_x Σ_y p(x, y) log [ p(x, y) / (p(x) p(y)) ]
• It is easy to see that (these identities are checked numerically in the sketch after this list):
  1. I(X; Y) = H(X) + H(Y) − H(X, Y), or equivalently H(X, Y) = H(X) + H(Y) − I(X; Y)
  2. I(X; Y) = H(X) − H(X|Y)
  3. I(X; Y) = H(Y) − H(Y|X)
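A minimal numerical check, assuming the joint pmf is given as a small 2-D array (the array values and helper name are illustrative only):

    import numpy as np

    def H(p):
        """Entropy in bits of a pmf given as a (possibly multi-dimensional) array."""
        p = np.asarray(p, dtype=float).ravel()
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    # toy joint pmf p[x, y] (rows: x, columns: y)
    pxy = np.array([[1/4, 1/4],
                    [0.0, 1/2]])
    px = pxy.sum(axis=1)              # marginal of X
    py = pxy.sum(axis=0)              # marginal of Y

    HX, HY, HXY = H(px), H(py), H(pxy)
    I = HX + HY - HXY                 # identity 1
    HX_given_Y = HXY - HY             # chain rule: H(X|Y) = H(X,Y) - H(Y)
    print(I, HX - HX_given_Y)         # identities 1 and 2 give the same number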

A Basic Inequality


I(X; Y) ≥ 0 (or −I(X; Y) ≤ 0), with equality iff X and Y are statistically independent.

PROOF:

    −I_a(X; Y) = Σ_x Σ_y p[x, y] log_a [ p(x) p(y) / p(x, y) ]
               = (log_a e) Σ_x Σ_y p[x, y] ln [ p(x) p(y) / p(x, y) ]
               ≤ (log_a e) ln ( Σ_x Σ_y p[x, y] · [ p(x) p(y) / p(x, y) ] )        (Jensen's inequality, since ln is concave)
               = (log_a e) ln ( Σ_x Σ_y p(x) p(y) ) = (log_a e) ln 1 = 0

with equality iff p[x, y] = p[x] p[y] for all (x, y).

As a consequence:

    H(X|Y) ≤ H(X)
    H(Y|X) ≤ H(Y)

with equality iff X and Y are independent.

Three Examples


EXAMPLE 1


EX 1: One die,

    X = outcome ∈ {1, 2, 3, 4, 5, 6},    Y ∈ {odd, even} → {0, 1}

    H(X) = log₂ 6
    H(Y|X) = 0
    H(X, Y) = log₂ 6
    H(Y) = 1 = log₂ 2
    H(X|Y) = log₂ 3

    H(X, Y) = log₂ 6 = log₂ 6 + 0 = log₂ 2 + log₂ 3

EXAMPLE 2


EX 2: Two dice (independent throws): (X1, X2)

    H(X2|X1) = H(X2) = log₂ 6 = 2.58496
    H(X1, X2) = 2 log₂ 6 = log₂ 36 = 5.16993

Let us define Y = X1 + X2. What is H(X1, X2|Y)?

It is easier to calculate H(Y|X1, X2) and H(Y) and then use

    H(X1, X2|Y) = H(Y|X1, X2) + H(X1, X2) − H(Y),

which follows since

    H(X1, X2, Y) = H(Y|X1, X2) + H(X1, X2)

but also

    H(X1, X2, Y) = H(X1, X2|Y) + H(Y).

EX 2 (continued)


Since H(Y|X1, X2) = 0, all we need to calculate is H(Y).

    Y     (X1, X2) pairs               P[Y]
    2     (1, 1)                       1/36
    3     (1, 2), (2, 1)               2/36
    4     (1, 3), (2, 2), (3, 1)       3/36
    5     ...                          4/36
    ...   ...                          ...
    12    (6, 6)                       1/36

    H(Y) = 2 [ (1/36) log₂ 36 + · · · + (5/36) log₂ (36/5) ] + (6/36) log₂ 6 = 3.27441

    H(X1, X2|Y) = 5.16993 − 3.27441 = 1.89552
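These values, and the quantity H(X1|Y) discussed next, can be verified by direct enumeration of the 36 equally likely outcomes; a small sketch (helper names are illustrative):

    from collections import Counter, defaultdict
    from math import log2

    pairs = [(x1, x2) for x1 in range(1, 7) for x2 in range(1, 7)]     # 36 equally likely outcomes

    def H(counts, total):
        """Entropy in bits of a distribution given by a Counter of outcomes."""
        return -sum((c / total) * log2(c / total) for c in counts.values())

    HY = H(Counter(x1 + x2 for x1, x2 in pairs), 36)
    print(f"H(Y)       = {HY:.5f}")                                    # 3.27441
    print(f"H(X1,X2|Y) = {log2(36) - HY:.5f}")                         # 1.89552

    # H(X1|Y) = sum over y of P[Y=y] * H(X1 | Y=y); here it equals H(X1,X2|Y)
    by_sum = defaultdict(list)
    for x1, x2 in pairs:
        by_sum[x1 + x2].append(x1)
    HX1_given_Y = sum(len(g) / 36 * H(Counter(g), len(g)) for g in by_sum.values())
    print(f"H(X1|Y)    = {HX1_given_Y:.5f}")                           # 1.89552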


Another question: What is H(X1|Y)?

We take advantage of the chain rule:

    H(X1, X2|Y) = H(X1|Y) + H(X2|X1, Y).

Now notice that once X1 and Y are known, X2 = Y − X1 is determined, so

    H(X2|X1, Y) = 0.

Hence,

    H(X1|Y) = H(X1, X2|Y) = 1.89552.

(By symmetry, H(X2|Y) = H(X1|Y) as well.)

EXAMPLE 3


EX 3: Assume a tennis match is played between two equally matched players A and B and that the first player to win 3 sets is the winner. Let X represent the outcomes of the sets. Some of the possible values of X are: AAA, BBB, ABBB, BAAA, BABAA, etc. Let Y be the number of sets played. Then Y takes on the values 3, 4, or 5. Let Z represent the player who won the match. For example, if X = ABBAA, then Y = 5 and Z = A.

1. Compute H(X), H(Y), H(Z).
2. Compute H(X, Y), H(X, Z), H(Y, Z).
3. Prove that H(X|Y) = H(X) − H(Y).

EX 3 (continued)


    P[X]     X       Y    Z
    1/8      AAA     3    A
    1/8      BBB     3    B        (P[Y = 3] = 2/8 = 1/4)
    1/16     BAAA    4    A
    1/16     ABAA    4    A
    1/16     AABA    4    A
    1/16     ABBB    4    B
    1/16     BABB    4    B
    1/16     BBAB    4    B        (P[Y = 4] = 6/16 = 3/8)


    P[X]     X        Y    Z
    1/32     BBAAA    5    A
    1/32     BABAA    5    A
    ...      ...      5    ...
    1/32     AABBB    5    B
    1/32     ABABB    5    B
    1/32     ABBAB    5    B
    1/32     BAABB    5    B
    1/32     BABAB    5    B
    1/32     BBAAB    5    B        (P[Y = 5] = 12/32 = 3/8)


1. H(X) = 2 [ (1/8) log 8 ] + 6 [ (1/16) log 16 ] + 12 [ (1/32) log 32 ]
        = 6/8 + 3/2 + 15/8 = 33/8 = 4.125 (base 2)

   H(Y) = (1/4) log 4 + 2 [ (3/8) log (8/3) ]
        = 1/2 + 9/4 − (3/4) log 3 = 11/4 − (3/4) log 3 = 1.56 (base 2)

   H(Z) = log 2 = 1 (base 2)

2. H(X, Y) = H(X) + H(Y|X) = H(X) + 0 = 33/8 = 4.125

   H(X, Z) = H(X) + H(Z|X) = H(X) + 0 = 33/8 = 4.125

   H(Y, Z) = H(Y) + H(Z|Y) = H(Y) + H(Z) = 15/4 − (3/4) log 3 = 2.56


3. H(X|Y) = H(X, Y) − H(Y) = [ H(X) + H(Y|X) ] − H(Y) = H(X) − H(Y) = 2.56,

   since H(Y|X) = 0.

   Similarly,

    H(Y|X) = H(Z|X) = 0
    H(Z|Y) = H(Y, Z) − H(Y) = 2.56 − 1.56 = 1 = H(Z)
    H(Z|X, Y) = H(Z|X) = 0        (the set-by-set outcome X determines the winner Z)
    H(X|Z) = H(X, Z) − H(Z) = H(X) − H(Z) = 3.125
    H(X, Y, Z) = H(Z|X, Y) + H(X, Y) = 0 + 4.125 = 4.125,   etc.
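The entropies above can also be checked by enumerating every possible match; a small sketch (the enumeration logic and names are illustrative, not from the notes):

    from collections import Counter
    from itertools import product
    from math import log2

    # enumerate all best-of-5 set sequences between equally matched players A and B
    matches = []          # list of (X, probability)
    for n in (3, 4, 5):
        for seq in product("AB", repeat=n):
            x = "".join(seq)
            # valid: the winner reaches 3 set wins exactly on the last set played
            if max(x.count("A"), x.count("B")) == 3 and max(x[:-1].count("A"), x[:-1].count("B")) == 2:
                matches.append((x, 0.5 ** n))

    def H(dist):
        """Entropy in bits of a dict {outcome: probability}."""
        return -sum(p * log2(p) for p in dist.values() if p > 0)

    pX = dict(matches)
    pY = Counter()
    pZ = Counter()
    for x, p in matches:
        pY[len(x)] += p                       # Y = number of sets played
        pZ[max("AB", key=x.count)] += p       # Z = player with 3 set wins

    print(f"H(X) = {H(pX):.4f}")   # 4.1250
    print(f"H(Y) = {H(pY):.4f}")   # 1.5613
    print(f"H(Z) = {H(pZ):.4f}")   # 1.0000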


Discrete Memoryless Channels


Discrete Memoryless Channel


    X1 ... XN  −→ [ DMC ] −→  Y1 ... YN

    P_{Y1...YN|X1...XN}(y1 ... yN | x1 ... xN) = P_{Y|X}(y1|x1) P_{Y|X}(y2|x2) · · · P_{Y|X}(yN|xN)

From the memoryless property of the channel, the "Single Input – Single Output" representation is sufficiently informative:

    X  −→ [ DMC ] −→  Y,        P_{Y|X}(y|x)

Channel Capacity: Fundamental Limits


Maximum Mutual Information


⇒ An important property of a DMC is its mutual information, I(X; Y).
⇒ But in order to calculate I(X; Y) we need to know P_{X,Y}(x, y) = P_{Y|X}(y|x) P_X(x).
⇒ Thus, in order to calculate I(X; Y), one has to specify an input distribution P_X(x).
⇒ The capacity, C, of a DMC is the maximum I(X; Y) that can be achieved over all input distributions:

    C = max_{P_X(x)} I(X; Y)

Channel Capacity: Intuition


• Recall that I(X; Y) = H(X) − H(X|Y).
• Recall the interpretation that
  ◦ for any random variable X, H(X) measures the randomness or uncertainty about X;
  ◦ Z_y = X|Y=y is a new random variable with (conditional) pmf P_{X|Y}(x|Y = y);
  ◦ H(X|Y) is the entropy of this random variable averaged over the choices of Y = y.
• Mutual information is nothing but the average reduction in the randomness about X after Y is observed.
• Channel capacity is the maximum such reduction in uncertainty when one can design X:

    C = max_{P_X(x)} I(X; Y)

Multiple Input – Multiple Output


⇒ One can show that for a DMC

    max_{P_{X1···XN}(x1,...,xN)} (1/N) I(X1, . . . , XN; Y1, . . . , YN) = max_{P_X(x)} I(X; Y)

⇒ How?

Channel Coding and Capacity


Coding For A Binary Input DMC


    W −→ [ Channel Encoder ] −→ x (m-vector) −→ [ Binary-Input DMC ] −→ y (m-vector) −→ [ Channel Decoder ] −→ Ŵ

• Code Rate: R < 1 (less than one message bit per transmitted binary digit)
• Message: an integer, denoted by W, between 1 and 2^{mR} (equivalently, a binary vector of length mR)

Assumption: messages occur with equal probabilities,

    P[W = i] = 1/2^{mR},    for all i = 1, 2, ..., 2^{mR}


• Block Code (Binary): a collection of 2^{mR} binary vectors of length m.
• Channel Decoder: chooses the most likely code word (equivalently, the most likely message) based upon the received vector y; a small decoding sketch follows below.
• Error Probability: the probability that the message produced by the decoder is not equal to the original message, i.e., P{Ŵ ≠ W}.
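As a concrete illustration of "choosing the most likely code word," consider the simplest binary block code, the m-fold repetition code, used over a BSC with crossover probability p < 1/2: the most likely codeword is the one closest in Hamming distance to y, i.e., a majority vote. The code and parameter values below are illustrative, not taken from the notes:

    import random

    def bsc(bits, p, rng):
        """Pass a list of bits through a BSC with crossover probability p."""
        return [b ^ (rng.random() < p) for b in bits]

    def ml_decode_repetition(y):
        """For the repetition code over a BSC with p < 1/2, the most likely
        codeword agrees with y in the most positions: a majority vote."""
        return int(sum(y) > len(y) / 2)

    rng = random.Random(0)
    m, p, trials = 5, 0.1, 10_000
    errors = sum(ml_decode_repetition(bsc([w] * m, p, rng)) != w
                 for _ in range(trials) for w in (rng.randint(0, 1),))
    print(f"estimated P(W_hat != W) = {errors / trials:.4f}")   # ≈ 0.0086 for m = 5, p = 0.1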

Channel Coding Theorem


Channel Coding Theorem:

Let R < C (base 2). For m large enough, there exists an encoder and a decoder such that P{Ŵ ≠ W} < ǫ for any ǫ > 0.

Computation of Capacity for Well-known Channels


Capacity of BSC


EXAMPLE: Binary Symmetric Channel (BSC)

Placeholder Figure A

    I₂(X; Y) = H₂(Y) − H₂(Y|X)

    H₂(Y|X) = Σ_{x=0}^{1} Σ_{y=0}^{1} P_X(x) P_{Y|X}(y|x) log₂ [ 1 / P_{Y|X}(y|x) ]
            = Σ_{x=0}^{1} P_X(x) [ (1 − p) log₂ 1/(1 − p) + p log₂ (1/p) ]
            = h₂(p)

where h₂(p) := (1 − p) log₂ 1/(1 − p) + p log₂ (1/p).


On the other hand, H₂(Y) ≤ log₂ 2 with equality iff P[Y = 0] = P[Y = 1] = 1/2.

But note that if P_X(0) = P_X(1) = 1/2, then P_Y(0) = P_Y(1) = 1/2.

In other words,

    C = max_{P_X(x)} I(X; Y) = log₂ 2 − h₂(p) = 1 − h₂(p).
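A quick numeric sketch of this formula (the values of p are illustrative):

    from math import log2

    def h2(p):
        """Binary entropy function in bits."""
        return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

    for p in (0.0, 0.01, 0.1, 0.5):
        print(f"p = {p:4.2f}  ->  C = 1 - h2(p) = {1 - h2(p):.4f} bits/channel use")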

Capacity of BEC


EXAMPLE: Binary Erasure Channel (BEC)

Placeholder Figure A

    I₂(X; Y) = H₂(X) − H₂(X|Y)

Assume P[X = 0] = α and P[X = 1] = 1 − α
(⇒ P_Y(2) = p, P_Y(0) = α(1 − p), and P_Y(1) = (1 − α)(1 − p)).

Hence, H₂(X) = α log₂ (1/α) + (1 − α) log₂ 1/(1 − α) = h₂(α).

On the other hand,

    H₂(X|Y) = Σ_{y=0}^{2} P_Y(y) Σ_{x=0}^{1} P_{X|Y}(x|y) log₂ [ 1 / P_{X|Y}(x|y) ]

where the inner sum is H(X|Y = y).


Also,

    H(X|Y = 0) = Σ_{x=0}^{1} P_{X|Y}(x|0) log₂ [ 1 / P_{X|Y}(x|0) ] = 0

    H(X|Y = 1) = Σ_{x=0}^{1} P_{X|Y}(x|1) log₂ [ 1 / P_{X|Y}(x|1) ] = 0

    H(X|Y = 2) = Σ_{x=0}^{1} P_{X|Y}(x|2) log₂ [ 1 / P_{X|Y}(x|2) ]
               = α log₂ (1/α) + (1 − α) log₂ 1/(1 − α) = h₂(α)

Hence,

    C = max_α [ h₂(α) − p h₂(α) ] = max_α (1 − p) h₂(α) = 1 − p    (achieved at α = 1/2).

Capacity of a Z-Channel


EXAMPLE:

Placeholder Figure A

Recall I₂(X; Y) = H₂(X) − H₂(X|Y).

Assume P[X = 0] = α and P[X = 1] = 1 − α.

Hence, H₂(X) = α log₂ (1/α) + (1 − α) log₂ 1/(1 − α) = h₂(α).

Also, H₂(Y|X) = α h₂(1/2). So

    I₂(X; Y) = H₂(Y) − H₂(Y|X) = h₂(α/2) − α h₂(1/2)
             = (α/2) log₂ (2/α) + (1 − α/2) log₂ 1/(1 − α/2) − α h₂(1/2)


Equivalently, working in nats,

    I_e(X; Y) = (α/2) ln (2/α) + (1 − α/2) ln 1/(1 − α/2) − α h_e(1/2).

We maximize I_e(X; Y) over α (the same α maximizes I₂(X; Y)), using h_e(1/2) = ln 2:

    0 = dI_e(X; Y)/dα = (1/2) ln 2 − (1/2) ln α − 1/2 + 1/2 + (1/2) ln (1 − α/2) − ln 2

This means

    (1/2) ln [ α / (1 − α/2) ] = −(1/2) ln 2 = (1/2) ln (1/2)
    ⇒  (1 − α/2)/α = 2  ⇒  α = 2/5

    C₂ = I₂(X; Y)|_{α=2/5} = (1/5) log₂ 5 + (4/5) log₂ (5/4) − 2/5 = log₂ 5 − 2 = .3219
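The answer can be double-checked numerically by evaluating I₂(X; Y) = h₂(α/2) − α h₂(1/2) on a fine grid of α; a brute-force sketch (names are illustrative):

    import numpy as np

    def h2(p):
        """Binary entropy in bits (expects 0 < p < 1)."""
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    alpha = np.linspace(1e-6, 1 - 1e-6, 100001)
    I = h2(alpha / 2) - alpha * 1.0      # I2(X;Y) = h2(alpha/2) - alpha*h2(1/2), and h2(1/2) = 1
    best = int(np.argmax(I))
    print(f"alpha* ≈ {alpha[best]:.4f},  C ≈ {I[best]:.4f} bits")   # ≈ 0.4000 and ≈ 0.3219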

Capacity of a 5-ary Symmetric Channel


EXAMPLE:

Placeholder Figure A

    C = max_{P_X(x)} I(X; Y) = I(X; Y)|_{P[X=x] = 1/5, x = 0,1,2,3,4}
      = [ H(Y) − H(Y|X) ]_{P[X=x] = 1/5, x = 0,1,2,3,4}

If the inputs are equally likely, the outputs are equally likely. Thus, H(Y) = log 5.

So what remains is to compute H(Y|X)|_{P[X=x] = 1/5, x = 0,1,2,3,4}.


For any value of the input, there are two possible outputs that are equally likely. In other words,

    H(Y|X = x) = log 2,    x = 0, . . . , 4.

Hence,

    H(Y|X) = log 2.

And

    C = log 5 − log 2 = log (5/2).

Note: one can also measure the capacity in 5-ary digits, where the capacity would be nothing but C₅ = log₅ (5/2).

A Simple Code for the 5-ary Symmetric Channel


    Code Word    Possible Outputs
    00           00  01  10  11
    12           12  13  22  23
    24           24  20  34  30
    31           31  32  41  42
    43           43  44  03  04

The decoder works in reverse (and never gets confused!).

There are 5 code words and each code word can represent one 5-ary digit, or log₂ 5 binary digits. Since each code word represents 2 uses of the channel, the rate of the code is 1/2 = .5 5-ary digits ((1/2) log₂ 5 = 1.1609 binary digits) per channel use.

Now note that this rate is below the capacity of the channel: (1/2) log₂ 5 = 1.1609 ≤ log₂ (5/2) = 1.322 (or equivalently 1/2 ≤ log₅ (5/2)).
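A tiny sketch of the table-lookup decoder, assuming (consistently with the table above) that each input symbol is received either unchanged or shifted up by one, mod 5; the names are illustrative:

    # channel model assumed here: input symbol i goes to i or i+1 (mod 5), each with probability 1/2
    codewords = ["00", "12", "24", "31", "43"]

    def outputs(cw):
        """All channel output pairs that the code word cw can produce."""
        return {"".join(str((int(c) + d) % 5) for c, d in zip(cw, flips))
                for flips in [(0, 0), (0, 1), (1, 0), (1, 1)]}

    decode = {y: cw for cw in codewords for y in outputs(cw)}   # never ambiguous
    assert len(decode) == 20          # 5 code words x 4 outputs, with no overlaps
    print(decode["23"], decode["30"])                            # 12 24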

Computation of Capacity


Finding the Capacity Using A Computer


• For an arbitrary DMC, one cannot, in general, use analytic techniques to find the channel capacity.
• Instead, one can use a computer to search for the input distribution, P_X(x), that maximizes I(X; Y).
• There is a special algorithm, called the BLAHUT-ARIMOTO algorithm, for doing this. For simple cases, a brute-force search can be used. (A sketch of the iteration follows below.)

EXAMPLE:

Placeholder Figure A

1. Use a computer to evaluate I(X; Y) = f(α).
2. Optimize with respect to α.
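For completeness, here is a compact sketch of the standard Blahut-Arimoto iteration for an arbitrary DMC transition matrix, applied to a Z-channel consistent with the earlier derivation; this particular implementation and its parameter choices are illustrative, not code from the course:

    import numpy as np

    def mutual_information(p, P):
        """I(X;Y) in bits for input pmf p and channel matrix P[x, y] = P(y|x)."""
        q = p @ P                                         # output distribution q(y)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(P > 0, np.log2(P / q), 0.0)  # terms with P(y|x) = 0 contribute 0
        return float(np.sum(p[:, None] * P * ratio))

    def blahut_arimoto(P, iters=500):
        """Search for the capacity-achieving input distribution of a DMC."""
        p = np.full(P.shape[0], 1.0 / P.shape[0])         # start from the uniform input
        for _ in range(iters):
            q = p @ P
            with np.errstate(divide="ignore", invalid="ignore"):
                ratio = np.where(P > 0, np.log(P / q), 0.0)
            c = np.exp(np.sum(P * ratio, axis=1))         # c_x = exp( D( P(.|x) || q ) )
            p = p * c / np.sum(p * c)                     # multiplicative update
        return mutual_information(p, P), p

    # Z-channel from the earlier slides: rows are inputs x = 0, 1; columns are outputs y = 0, 1
    P = np.array([[0.5, 0.5],
                  [0.0, 1.0]])
    C, p_opt = blahut_arimoto(P)
    print(f"C ≈ {C:.4f} bits, optimal P[X=0] ≈ {p_opt[0]:.4f}")   # ≈ 0.3219 and ≈ 0.4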
