
Page 1:

Question (from exercises 2): Are the following sources likely to be stationary and ergodic?

(i) Binary source, typical sequence aaaabaabbabbbababbbabbbbbabbbbaabbbbbba......

(ii) Quaternary source (4 symbols), typical sequences abbabbabbababbbaabbabbbab....... and cdccdcdccddcccdccdccddcccdc......

(iii) Ternary source (3 symbols), typical sequence AABACACBACCBACBABAABCACBAA…

(iv) Quaternary source, typical sequence 124124134124134124124 …

Page 2:

Definitions

• A source is stationary if its symbol probabilities do not change with time, e.g.

– Binary source: Pr(0) = Pr(1) = 0.5
– Probabilities assumed to be the same at all times

• A source is ergodic if it is stationary and

(a) no proper subset of it is stationary,
i.e. the source does not get locked into a subset of symbols or states

(b) it is not periodic,
i.e. the states do not occur in a regular pattern

E.g. output s1 s2 s3 s1 s4 s3 s1 s4 s5 s1 s2 s5 s1 s4 s3 …

is periodic because s1 occurs every 3 symbols
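A minimal sketch (assuming Python; not part of the original slides, names are illustrative) of how these properties can be probed for a sample sequence: compare symbol frequencies in the two halves (a stationary source should give similar estimates) and test whether a given symbol recurs with a single fixed gap (periodicity).

from collections import Counter

def half_frequencies(seq):
    # frequency estimates from the first and second half of the sequence;
    # for a stationary source the two estimates should be similar
    mid = len(seq) // 2
    first, second = seq[:mid], seq[mid:]
    return ({s: c / len(first) for s, c in Counter(first).items()},
            {s: c / len(second) for s, c in Counter(second).items()})

def recurs_with_fixed_period(seq, symbol):
    # True if the symbol always reappears after the same number of steps
    positions = [i for i, s in enumerate(seq) if s == symbol]
    gaps = {b - a for a, b in zip(positions, positions[1:])}
    return len(gaps) == 1

seq = "s1 s2 s3 s1 s4 s3 s1 s4 s5 s1 s2 s5 s1 s4 s3".split()
print(half_frequencies(seq))
print(recurs_with_fixed_period(seq, "s1"))   # True: s1 occurs every 3 symbols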

Page 3:

Review

[Block diagram: source → encode/transmit → channel → receive/decode → destination; ideally the destination receives the message, in practice it receives a signal corrupted by NOISE]

• measure of information – entropy
• conditional entropy, mutual information
• entropy per symbol (or per second), entropy of Markov source
• redundancy
• information capacity (ergodic source)

(i) remove redundancy to maximise information transfer
(ii) use “redundancy” to correct transmission errors

Page 4:

Shannon Source Coding Theorem

N independent, identically distributed random variables

each with entropy H(x)

[Diagram: number-of-bits axis starting at 0; compressing the block into more than N H(x) bits makes it virtually certain that no information will be lost; compressing into fewer than N H(x) bits makes it virtually certain that information will be lost]
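A small sketch (assumed example, not from the slides; the probabilities are hypothetical) computing H(x) and the N·H(x) limit referred to above:

from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]   # hypothetical symbol probabilities
N = 1000                            # number of i.i.d. source outputs
H = entropy(probs)
print(H, N * H)   # 1.75 bits/symbol, 1750 bits for the whole block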

Page 5:

Optimal Coding

• Requirements for a code
– efficiency
– uniquely decodable
– immunity to noise
– instantaneous

[Block diagram: source → encode/transmit → channel → receive/decode → destination, with alphabet A between source and encoder, X across the channel, and B at the decoder output]

A = output of source = input to encoder = {a1, a2, ... am}
X = output of transmitter = input to receiver = {b1, b2, ... bn}
B = output of decoder = input to destination = {a1, a2, ... am}

Noise-free communication channel

Page 6:

Definitions

Coding: conversion of source symbols into a different alphabet for transmission over a channel. Input to encoder = source alphabet = {a1, …, am}; encoder output alphabet = {b1, …, bn}. Coding is necessary if n < m.

Code word: group of output symbols corresponding to an input symbol (or group of input symbols)

Code: set (table) of all input symbols (or input words) and the corresponding code words

Word length: number of output symbols in a code word

Average word length (AWL): L = Σi=1..m P(ai) Ni, where Ni = length of the word for symbol ai

Optimal code: has minimum average word length for a given source

Efficiency = H / (L × log2 n), where H is the entropy per symbol of the source
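A minimal sketch (assumed helper functions, not from the notes) applying the AWL and efficiency definitions above; the example probabilities and lengths are illustrative:

from math import log2

def awl(probs, lengths):
    # L = sum over i of P(a_i) * N_i
    return sum(p * n for p, n in zip(probs, lengths))

def efficiency(probs, lengths, n=2):
    # H / (L * log2 n), with H the entropy per source symbol
    H = -sum(p * log2(p) for p in probs if p > 0)
    return H / (awl(probs, lengths) * log2(n))

print(awl([0.5, 0.25, 0.125, 0.125], [1, 2, 3, 3]))          # 1.75
print(efficiency([0.5, 0.25, 0.125, 0.125], [1, 2, 3, 3]))   # 1.0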

Page 7:

Binary encoding

• A (binary) symbol code f is a mapping f : A → {0,1}+ or (abusing notation) f : A+ → {0,1}+

where {0,1}+ = {0, 1, 00, 01, 10, 11, 000, 001, … }

• if f has an inverse then it is uniquely decodable, i.e. for all x, y ∈ A+: x ≠ y ⇒ f(x) ≠ f(y)

• compression is achieved (on average) by assigning
– shorter encodings to the more probable symbols in A
– longer encodings to the less probable symbols

• easy to decode if we can identify the end of a codeword as soon as it arrives (instantaneous)
– no codeword can be a prefix of another codeword
– e.g. 1 and 10 are prefixes of 101
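As a small illustration (assumed example, not from the slides), a symbol code can be held as a mapping from symbols to bit strings and extended to whole messages by concatenation:

code = {"a": "0", "b": "10", "c": "110", "d": "111"}   # a hypothetical prefix code

def encode(message, code):
    return "".join(code[sym] for sym in message)

print(encode("abad", code))   # '0100111'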

Page 8:

Prefix codes

• no codeword is a prefix of any other codeword

– also known as an instantaneous or self-punctuating code,

– an encoded string can be decoded from left to right without looking ahead to subsequent codewords

– a prefix code is uniquely decodable (but not all uniquely decodable codes are prefix codes)

– can be written as a tree, leaves = codewords

Example codes (symbol, codeword):

         code 1   code 2   code 3   code 4
a        1        0        00       0
b        10       10       01       01
c        100      110      10       011
d        1000     111      11       111
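A minimal sketch (assumed helper, not from the slides) of the prefix test applied to two of the codes above:

def is_prefix_code(codewords):
    # no codeword may be a prefix of any other codeword
    return not any(w != v and v.startswith(w) for w in codewords for v in codewords)

print(is_prefix_code(["1", "10", "100", "1000"]))   # False: 1 is a prefix of 10
print(is_prefix_code(["0", "10", "110", "111"]))    # True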

Page 9:

Limits on prefix codes

• the maximum number of codewords of length l is 2^l

• if we shorten one codeword, we must lengthen others to retain unique decodability

• for any uniquely decodable binary coding, the codeword lengths li satisfy

Σi 2^(-li) ≤ 1    (Kraft inequality)

[Figure: binary tree of all codewords of length 1 to 4, leaves = codewords]
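A one-line check of the Kraft inequality (assumed sketch, not from the slides):

def kraft_sum(lengths):
    return sum(2 ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))   # 1.0  -> a uniquely decodable code with these lengths exists
print(kraft_sum([1, 2, 2, 2]))   # 1.25 -> no uniquely decodable code can have these lengths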

Page 10:

Coding example

• all source digits equally probable
• source entropy = log2 10 = 3.32 bits/sym

source   code 1   code 2   code 3
0        0        0000     000
1        1        0001     001
2        10       0010     0110
3        11       0011     0111
4        100      0100     0100
5        101      0101     0101
6        110      0110     100
7        111      0111     101
8        1000     1000     110
9        1001     1001     111

average word length:   2.6      4        3.4

                       code 1     code 2   code 3
length                 variable   fixed    variable
efficiency             –          0.83     0.98
uniquely decodable     no         yes      yes
instantaneous/prefix   no         yes      yes
Kraft inequality       no         yes      yes
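A quick check (assumed sketch, not from the slides) of the average word length and Kraft sum for code 3 in the table above, with all ten digits equally probable:

code3 = ["000", "001", "0110", "0111", "0100", "0101", "100", "101", "110", "111"]
lengths = [len(w) for w in code3]
print(sum(lengths) / len(lengths))      # 3.4 output symbols per source digit
print(sum(2 ** -l for l in lengths))    # 1.0 -> Kraft inequality satisfied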

Page 11:

Prefix codes (reminder)

• variable length
• uniquely decodable
• instantaneous
• can be represented as a tree
• no code word is a prefix of another

– e.g. if ABAACA is a code word then A, AB, ABA, ABAA, ABAAC cannot be used as code words

• Kraft inequality: Σi 2^(-li) ≤ 1

Page 12:

Optimal prefix codes

• if Pr(a1) ≥ Pr(a2) ≥ … ≥ Pr(am), then l1 ≤ l2 ≤ … ≤ lm, where li = length of the word for symbol ai

• at least 2 (up to n) least probable input symbols will have the same prefix and only differ in the last output symbol

• every possible sequence of up to lm − 1 output symbols must be a code word or have one of its prefixes used as a code word (lm is the longest word length)

• for a binary code, the optimal word length for a symbol is equal to the information content i.e.

li = log2(1/pi)

Page 13:

Converse

• conversely, any set of word lengths {li} implicitly defines a set of symbol probabilities {qi} for which the word lengths {li} are optimal

qi = 2^(-li) / z,  where  z = Σj 2^(-lj)

Example:

symbol   codeword   qi
a        0          1/2
b        10         1/4
c        110        1/8
d        111        1/8

Page 14:

Compression - How close can we get to the entropy?

• We can always find a binary prefix code with average word length L satisfying

H(A) ≤ L < H(A) + 1

Let ⌈x⌉ be the smallest integer that is ≥ x. Clearly ⌈x⌉ < x + 1.

Now consider li = ⌈log2(1/pi)⌉ (this choice is checked against the Kraft inequality and the bound on the final page)
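A short sketch (assumed example, not from the slides; the probabilities are illustrative) that picks li = ⌈log2(1/pi)⌉ and checks the bound H(A) ≤ L < H(A) + 1:

from math import ceil, log2

probs = [0.4, 0.3, 0.2, 0.1]                      # hypothetical symbol probabilities
lengths = [ceil(log2(1 / p)) for p in probs]      # [2, 2, 3, 4]
H = -sum(p * log2(p) for p in probs)              # ~1.85 bits/symbol
L = sum(p * l for p, l in zip(probs, lengths))    # 2.4
print(lengths, H <= L < H + 1)                    # [2, 2, 3, 4] True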

Page 15:

Huffman prefix code

• used for image compression
• General approach
– Work out necessary conditions for a code to be optimal
– Use these to construct the code

• from condition (3) of prefix codes (earlier slide):
am    x x … x 0   (least probable)
am-1  x x … x 1   (next probable)
therefore assign the final digit first

• e.g. consider the source below

Symbol   Probability
s1       0.1
s2       0.25
s3       0.2
s4       0.45

Page 16:

Algorithm

1. Lay out all symbols in a line, one node per symbol

2. Merge the two least probable symbols into a single node

3. Add their probabilities and assign this to the merged node

4. Repeat until only one node remains

5. Assign binary code from last node, assigning 0 for the lower probability link at each step
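A compact sketch of these steps in Python (an assumed implementation for illustration, not the code from the course); a heap stands in for "find the two least probable nodes", and the bits are prepended during merging rather than assigned from the root afterwards, which yields the same code:

import heapq
from itertools import count

def huffman(probabilities):
    """probabilities: dict mapping symbol -> probability; returns symbol -> codeword."""
    tie = count()   # tie-breaker so the heap never compares dicts
    heap = [(p, next(tie), {s: ""}) for s, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:                       # step 4: repeat until one node remains
        p0, _, low = heapq.heappop(heap)       # step 2: the two least probable nodes
        p1, _, high = heapq.heappop(heap)
        for s in low:                          # step 5: 0 on the lower-probability link
            low[s] = "0" + low[s]
        for s in high:
            high[s] = "1" + high[s]
        heapq.heappush(heap, (p0 + p1, next(tie), {**low, **high}))   # step 3
    return heap[0][2]

print(huffman({"s1": 0.1, "s2": 0.25, "s3": 0.2, "s4": 0.45}))
# e.g. {'s4': '0', 's2': '10', 's1': '110', 's3': '111'}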

Page 17:

Example

[Diagram: symbol nodes s1 (Pr = 0.1), s2 (Pr = 0.25), s3 (Pr = 0.2), s4 (Pr = 0.45); the two least probable nodes, s1 and s3, are merged into a node of probability 0.3]

Page 18:

Example - contd.

[Diagram: the 0.3 node is merged with s2 (Pr = 0.25) into a node of probability 0.55, which is then merged with s4 (Pr = 0.45) into the root node of probability 1]

Page 19:

Example - step 5

[Diagram: binary digits assigned from the root, with 0 on the lower-probability link at each step, giving the code s4 = 0, s2 = 10, s1 = 110, s3 = 111]
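Assuming the hypothetical huffman sketch given after the Algorithm slide, the same source reproduces the code from these diagrams:

probs = {"s1": 0.1, "s2": 0.25, "s3": 0.2, "s4": 0.45}
code = huffman(probs)
print(code)                                              # {'s4': '0', 's2': '10', 's1': '110', 's3': '111'}
print(sum(p * len(code[s]) for s, p in probs.items()))   # 1.85 = average word length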

Page 20:

Algorithm

1. Lay out all symbols in a line, one node per symbol

2. Merge the two least probable symbols into a single node

3. Add their probabilities and assign this to the merged node

4. Repeat until only one node remains

5. Assign binary code from last node, assigning 0 for the lower probability link at each step

Page 21:

Comments

• we can choose a different ordering of 0 or 1 at each node
– 2^m different codes (m = number of merging nodes, i.e. not symbol nodes)
– 2^3 = 8 in the previous example

• But the AWL is the same for all of these codes
– hence source entropy and efficiency are the same

• What if n (number of symbols in the code alphabet) is larger than 2?
– Condition (2) says we can group from 2 to n symbols
– Condition (3) effectively says we should use groups as large as possible and end with one composite symbol

Page 22:

Disadvantages of Huffman Code

• we have assumed that the probabilities of our source symbols are known and fixed
– symbol frequencies may vary with context (e.g. Markov source)

• up to 1 extra bit per symbol is needed
– could be serious if H(A) ≈ 1 bit!
– e.g. English: entropy is approx 1 bit per character

• beyond symbol codes – arithmetic coding
– move away from the idea that one symbol maps to an integer number of bits
– e.g. Lempel-Ziv coding
– not covered in this course

Page 23:

Another question

• consider a message (sequence of characters) from {a, b, c, d} encoded using the code shown below

• what is the probability that a randomly chosen bit from the encoded message is 1?

symbol   probability   codeword
a        1/2           0
b        1/4           10
c        1/8           110
d        1/8           111
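A minimal simulation sketch (assumed code, not from the slides) for checking an answer to this question empirically:

import random

code = {"a": "0", "b": "10", "c": "110", "d": "111"}
symbols, probs = list(code), [0.5, 0.25, 0.125, 0.125]
bits = "".join(code[s] for s in random.choices(symbols, probs, k=100_000))
print(bits.count("1") / len(bits))   # empirical estimate of Pr(randomly chosen bit = 1)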

Page 24:

Shannon-Fano theorem

• Channel capacity
– Entropy (bits/sec) of encoder determined by entropy of source (bits/sym)
– If we increase the rate at which the source generates information (bits/sym), eventually we will reach the limit of the encoder (bits/sec). At this point the encoder’s entropy will have reached a limit
• This is the channel capacity

• S-F theorem
– Source has entropy H bits/symbol
– Channel has capacity C bits/sec
– Possible to encode the source so that its symbols can be transmitted at up to C/H symbols per second, but no faster
– (general proof in notes)

[Block diagram: source → encode/transmit → channel → receive/decode → destination]
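A small numeric illustration (values assumed, not from the notes): if the source entropy is H = 2 bits/symbol and the channel capacity is C = 1000 bits/sec, the theorem says the source can be encoded so that up to C/H = 500 symbols per second are transmitted, but no faster.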

Page 25:

Σi 2^(-li) = Σi 2^(-⌈log2(1/pi)⌉) ≤ Σi 2^(-log2(1/pi)) = Σi pi = 1

satisfies Kraft

average word length:

L = Σi pi ⌈log2(1/pi)⌉ < Σi pi (log2(1/pi) + 1) = H(A) + 1