
Information Theory

Computer Engineering Department

Second Year

Dr. Eng. Riyadh J.S. Al-Bahadili


Information Theory

Information theory provides a quantitative measure of the information contained in message signals and allows us to determine the capacity of a communication system to transfer this information from source to destination. Information theory was originally known as the 'Mathematical Theory of Communication', and it deals with the mathematical modeling and analysis of a communication system rather than with the physical channel.

Using information theory, we can determine the most efficient way to communicate data. It basically provides limits on:

1. The minimum number of bits per symbol required to fully represent the source.

2. The maximum rate at which reliable communication can take place over the channel.

1. Concept of Information

An info source is an object that produces an event, the outcome of which is selected at random

according to a probability distribution.

A discrete info source is a source that has only a finite set of symbols as outputs. The set of source

symbols is called the source alphabet, and the elements of the set are called symbols or letters. Info

sources can be classified as having memory or being memoryless.

A memory source is one for which a current symbol depends on the previous symbols. A

memoryless source is one for which each symbol produced is independent of the previous symbols.

A communication system can never be described in a deterministic sense; it is statistical in nature. This means that to describe a communication system completely we have to account for its unpredictable, or uncertain, behavior.

This is easily understood from an example: a transmitter transmits messages at random, and we cannot predict which message will be transmitted at the next moment, but we do know the probability of transmitting each particular message.

So, to define the system completely, we need a statistical study of it, and such a study is performed with the help of the concept of probability.

Now, consider an example of two messages:


(a) Bird flies

(b) Cat flies

Sentence (a) carries little information, but sentence (b) carries much more, since sentence (b) has the smaller probability of occurring. So we can say there is an inverse relationship between the probability of an event and the amount of information associated with it. Thus we can write:

I(xi) ∝ 1/P(xi)

where xi is an event with probability P(xi) and I(xi) is the amount of information associated with it. Generally, for simplicity, we use a logarithmic measure of information:

I(xi) = log2 (1/P(xi)) = −log2 P(xi)

Example.1:

How many bits per symbol are needed to encode 32 different symbols?

We have M = 32 equally likely symbols, so P(x) = 1/32 and I(x) = log2 32 = 5 bits/symbol
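As a quick check of Example 1, a minimal Python sketch (assuming base-2 logarithms, so the result is in bits):

import math

def self_information(p):
    # Self-information I(x) = -log2 P(x), in bits
    return -math.log2(p)

# Example 1: 32 equally likely symbols, P(x) = 1/32
print(self_information(1 / 32))   # 5.0 bits/symbol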

The advantage of the logarithmic measure is that if we have a joint event (xi, yj) and the two events are statistically independent, then

P(xi, yj) = P(xi) · P(yj)

So I(xi, yj) = I(xi) + I(yj)  [proof: H.W]

Example.2:

The symbols A, B, C, and D occur with probability ½, ¼, 1/8, and 1/8 respectively. Find the info content in the message ‘BDA’, where the symbols are independent.

I(BDA) = I(B) + I(D) + I(A) = log2 4 + log2 8 + log2 2 = 6 bits

As we know, the base of the logarithm can differ, so we may have different units of information:

• Bits (Base 2) • Nats (Base e) • Decits (Base 10)


e.g. 1 bit = loge 2 = 0.6932 nat, 1 bit = log10 2 = 0.3010 decit, 1 decit = log2 10 = 3.3219 bits

2. Entropy

In a communication system we do not have only a single message but a number of messages. So, instead of calculating the information due to the individual messages and adding them, we calculate the average information of the system, known as the entropy of the source.

Let there be M different messages m1, m2, …, mM with respective probabilities P1, P2, …, PM. Let us assume that in a long time interval, L messages have been generated, with L very large so that L >> M; then

the number of occurrences of message m1 = P1 L

The amount of information in message m1 = log2 (1/P1)

Thus, the total amount of information due to m1 = P1 L log2 (1/P1)

The total amount of information in all L messages will be

It = P1 L log (1/P1) + P2 L log (1/P2) + … + PM L log (1/PM)

So, the average information will be

H=It / L = P1 log (1/P1) + P2 log (1/P2) + … + PM log (1/PM)

Or H = Σi Pi log (1/Pi)

Thus the unit of entropy will be information/message (bits/message). I(x) is called the self-information and H(X) is called the self-entropy.


Example.3:

A discrete source has 4 symbols x=[x1, x2, x3, x4] with probability P= [1/2, 1/4, 1/8, 1/8]. Find info content in each symbol then calculate the entropy. Also calculate the bit average for the message ‘x1 x2 x1 x4 x3 x1 x1 x2’.

I(xi) = log2 (1/P(xi)):  I(x1) = 1 bit, I(x2) = 2 bits, I(x3) = 3 bits, I(x4) = 3 bits

H(X) = −Σi P(xi) log2 P(xi) = 1.75 bits/symbol

The message ‘x1 x2 x1 x4 x3 x1 x1 x2’ has 8 symbols which consist of 14 bits

So the bit average = 14/8 = 1.75 bit/symbol
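The numbers in Example 3 can be reproduced with a short Python sketch (the probabilities and the message are those given above; base-2 logarithms are assumed):

import math

P = {'x1': 0.5, 'x2': 0.25, 'x3': 0.125, 'x4': 0.125}

# Self-information of each symbol: I(x) = -log2 P(x)
I = {s: -math.log2(p) for s, p in P.items()}           # 1, 2, 3, 3 bits

# Source entropy H(X) = sum of P(x) * I(x)
H = sum(p * I[s] for s, p in P.items())                # 1.75 bits/symbol

# Bit average of the message 'x1 x2 x1 x4 x3 x1 x1 x2'
msg = ['x1', 'x2', 'x1', 'x4', 'x3', 'x1', 'x1', 'x2']
bit_average = sum(I[s] for s in msg) / len(msg)        # 14/8 = 1.75 bits/symbol

print(H, bit_average)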

Exercise.1: For a binary discrete source, X has two symbols x1 and x2. Prove mathematically that H(X) is maximum when p(x1) = p(x2), and that max H(X) = 1 bit/symbol.

3. Rate of Information

If a message source generates messages (or symbols) at a rate of r messages (or symbols) per second, then the rate of information is

R= r H bits/second

Example.4:

An event has total six outcomes with probability P1=1/2, P2=1/4, P3=1/8, P4=1/16, P5=1/32, and P6=1/32. Find the entropy of the system. Also find the rate of information if these are 18 outcomes per second.

Solution. By the formula of entropy, H = Σi Pi log (1/Pi):

H= ½ log 2+ ¼ log 4+1/8 log 8+ 1/16 log 16+ 1/32 log 32+1/32 log 32 = 31/16 bits/message

But now r= 18 outcomes / second

R= r H =18 (31/16) = 34.875 bits/sec

Exercise.2: A discrete source emits one of five symbols once every microsecond, with probabilities 1/2, 1/4, 1/8, 1/16, 1/32 respectively. Determine the entropy and the information rate.

(Check answer: H=57/32 bits/symbol, R=1.78125 Mb/s)

Exercise.3: A TV picture consists of 2 × 10^6 pixels and 16 different grey levels. The pictures are repeated at a rate of 32 pictures/sec. All grey levels have equal likelihood of occurrence. Find the average rate of info conveyed by this TV.

(Check answer: R= 256 Mbits/sec)


Exercise.4: A telegraph source has two symbols (dot and dash). The time of dot is 0.5 sec, the time of dash is 3 times of dot time, and the time between symbols is 0.2 sec. The probability of dot’s occurring is twice that of the dash. Find the average rate of info for this telegraph.

(Check answer: R= 1.725 bit/sec)

4. Discrete Memoryless Channel (DMC)

A communication channel is the path or medium through which the symbols flow to the receiver. A DMC is a statistical model with input X and output Y. It is 'memoryless' when the current output depends only on the current input and not on any of the previous inputs.

4.1 Conditional Probability

P(yj/xi) represents the conditional probability of obtaining output yj given that input xi was transmitted; it is also called the channel transition probability.

[Channel diagram: inputs x1, x2, …, xm are mapped through P(Y/X) to outputs y1, y2, …, yn]

A channel is completely specified by the complete set of transition probabilities:

P(Y/X) = [P(y1/x1)  P(y2/x1)  ⋯  P(yn/x1)]
         [P(y1/x2)  P(y2/x2)  ⋯  P(yn/x2)]
         [    ⋮                ⋱      ⋮  ]
         [P(y1/xm)  P(y2/xm)  ⋯  P(yn/xm)]

The matrix P(Y/X) is called the channel matrix, and each row in this matrix must sum to unity:

Σj P(yj/xi) = 1

Let

P(X) = [p (x1) p (x2) … p (xm)]

P(Y) = [p (y1) p (y2) … p (yn)]

Then

P(Y) = P(X) P(Y/X)


4.2 Joint Probability

If P(X) is represented as a diagonal matrix:

P(X)d = diag[p(x1)  p(x2)  ⋯  p(xm)]

Then

P(X, Y) = P(X)d P(Y/X)

The matrix P(X, Y) is the joint probability matrix, and the element p(xi, yj) is the joint probability of transmitting xi and receiving yj.

p(xi, yj) = p(xi) p(yj/xi) = p(yj) p(xi/yj)

Note:

p(xi) = Σj p(xi, yj)    and    p(yj) = Σi p(xi, yj)

Example.5:

For the following binary channel (p(x1) = p(x2) =1/2)

[Binary channel diagram: x1 → y1 with 0.9, x1 → y2 with 0.1, x2 → y1 with 0.2, x2 → y2 with 0.8]

a. Construct the channel matrix for this channel.
b. Find p(y1) and p(y2).
c. Find the joint probabilities p(x1, y2) and p(x2, y1).

Solution:

(a) P(Y/X) = [p(y1/x1)  p(y2/x1); p(y1/x2)  p(y2/x2)] = [0.9  0.1; 0.2  0.8]

(b) P(X) = [p(x1)  p(x2)] = [0.5  0.5]

P(Y) = P(X) P(Y/X) = [0.5  0.5] [0.9  0.1; 0.2  0.8] = [0.55  0.45] = [p(y1)  p(y2)]

(c) P(X, Y) = P(X)d P(Y/X) = [0.5  0; 0  0.5] [0.9  0.1; 0.2  0.8] = [0.45  0.05; 0.1  0.4]

Hence p(x1, y2) = 0.05 and p(x2, y1) = 0.1
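The matrix manipulations in Example 5 can be checked numerically; a minimal NumPy sketch (the array names are illustrative only):

import numpy as np

PX = np.array([0.5, 0.5])                  # input probabilities P(X)
PYX = np.array([[0.9, 0.1],                # channel matrix P(Y/X); rows sum to 1
                [0.2, 0.8]])

PY = PX @ PYX                              # P(Y) = P(X) P(Y/X) -> [0.55 0.45]
PXY = np.diag(PX) @ PYX                    # P(X, Y) = P(X)d P(Y/X)

print(PY)
print(PXY)                                 # [[0.45 0.05], [0.1 0.4]]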


Exercise.5:

The following channel matrix has P(X) = [.5 .5]

P(Y/X) = [1−p  p  0; 0  p  1−p]

a. Draw the channel.
b. Find P(Y) if p = 0.2.

5. Joint Entropy and Conditional Entropy

Suppose we have m transmitted messages

[X] = [x1, x2, ... xm]

and at the receiver we receive n messages

[Y] = [y1, y2, ... yn]

Then P(xi) is called the marginal probability of message xi, and P(yj) is called the marginal probability of message yj.

Thus, marginal entropy

H(X) = −Σi P(xi) log2 P(xi)

H(Y) = −Σj P(yj) log2 P(yj)

where P(xi) may be obtained as

P(xi) = Σj P(xi, yj)

(Note: 0 ≤ H(X) ≤ log2 m)

P(xi, yj) = joint probability of events xi and yj

The joint entropy of X and Y is

H(X, Y) = −Σi Σj P(xi, yj) log2 P(xi, yj)

Note that P(xi/yj) and P(yj/xi) are called conditional probabilities, which will be clear from their definitions.


P(xi/yj) = probability of xi when yj has been received

P(yj/xi) = probability of yj when xi has been transmitted

For the conditional entropy, we use the relation

H(X/Y) = −Σi Σj P(xi, yj) log2 P(xi/yj)

Similarly

H(Y/X) = −Σi Σj P(xi, yj) log2 P(yj/xi)

6. Relations between the different entropies

The relations between the joint, conditional and marginal entropies are given by

H(X, Y) = H(X/Y) + H(Y) = H(Y/X) + H(X)
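These relations are easy to verify numerically from a joint probability matrix; a small sketch using the P(X, Y) found in Example 5 (base-2 logarithms assumed):

import numpy as np

PXY = np.array([[0.45, 0.05],
                [0.10, 0.40]])             # joint probabilities p(xi, yj)

PX = PXY.sum(axis=1)                       # marginals p(xi)
PY = PXY.sum(axis=0)                       # marginals p(yj)

def entropy(p):
    p = np.ravel(p)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

HX, HY, HXY = entropy(PX), entropy(PY), entropy(PXY)
HX_given_Y = HXY - HY                      # H(X/Y) = H(X,Y) - H(Y)
HY_given_X = HXY - HX                      # H(Y/X) = H(X,Y) - H(X)

print(HXY, HX_given_Y + HY, HY_given_X + HX)   # all three are equal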

Exercise.6:

Find H(X), P(X, Y), and H(Y) for the channel shown below, given that P(x1) = 0.2, P(x2) = 0.5, and P(x3) = 0.3

[Channel diagram: P(y1/x1) = 0.8, P(y2/x1) = 0.2, P(y2/x2) = 1, P(y2/x3) = 0.3, P(y3/x3) = 0.7]


Exercise.7:

A transmitter has an alphabet of four letters [x1, x2, x3, x4] and the receiver has an alphabet of three letters. Calculate all entropies if the joint probability matrix is:

P(X, Y) = [0.3   0.05   0   ]
          [0     0.25   0   ]
          [0     0.15   0.05]
          [0     0.05   0.15]

(Check answer: H(X) = 1.96, H(Y/X) = 0.53, H(X, Y) = 2.49, H(Y) = 1.49, H(X/Y) = 1.0)

7. Mutual Information

We have

P(xi) = probability of transmitting xi
P(xi/yj) = probability that xi was transmitted, given that yj has been received

Thus P(xi) represents the probability, or uncertainty, of xi when nothing has yet been received (the prior uncertainty), and P(xi/yj) is the final uncertainty of xi after reception. The difference between these uncertainties is called the mutual information; it represents the uncertainty about the input that is resolved by observing the output.

I(X; Y) = Σi Σj P(xi, yj) log2 [P(xi/yj) / P(xi)]

Properties of I(X; Y)

• I(X; Y) = I(Y; X)

• I(X; Y) ≥ 0

• I(X; Y) = H(Y) − H(Y/X) = H(X) − H(X/Y)

• I(X; Y) = H(Y) + H(X) − H(X, Y)

8. Channel Types

8.1 Lossless Channel

A channel matrix with only one nonzero element in each column. No source information is lost in transmission.


e.g.

P(Y/X) = [3/4  1/4  0    0    0]
         [0    0    1/3  2/3  0]
         [0    0    0    0    1]

where P(xi/yj) = 0 or 1

8.2 Deterministic Channel

A channel matrix with only one nonzero element in each row; that element must be 1.

e.g.

P(Y/X) = [1  0  0]
         [1  0  0]
         [0  1  0]
         [0  1  0]
         [0  0  1]

8.3 Noiseless Channel

It’s both lossless and deterministic, with m = n.

[Noiseless channel diagram: x1 → y1, x2 → y2, …, xm → ym]

where P(yj/xi) = 1 for i = j and 0 for i ≠ j.

8.4 Binary Symmetric Channel (BSC)

P(Y/X) = [1−p  p; p  1−p]

Example.6:

Consider a BSC with P(x1) = α

[BSC diagram: x1 → y1 and x2 → y2 with probability 1−p; x1 → y2 and x2 → y1 with probability p]

a. Show that:

I(X; Y) = H(Y) + p log2 p + (1 − p) log2 (1 − p)


b. Calculate I(X; Y) for α = 0.5 and p = 0.1.
c. Repeat (b) for p = 0.5 and comment on the result.

Solution:

a. We have P(x1) = α, so P(x2) = 1 − α

P(X, Y) = P(X)d P(Y/X) = [α  0; 0  1−α] [1−p  p; p  1−p] = [α(1−p)  αp; (1−α)p  (1−α)(1−p)]

H(Y/X) = −Σi Σj P(xi, yj) log2 P(yj/xi)
= −α(1−p) log2 (1−p) − αp log2 p − (1−α)p log2 p − (1−α)(1−p) log2 (1−p)
= −p log2 p − (1−p) log2 (1−p)

I(X; Y) = H(Y) − H(Y/X)
= H(Y) + p log2 p + (1−p) log2 (1−p)

b. When α = 0.5 and p = 0.1:

P(Y) = P(X) P(Y/X) = [0.5  0.5] [0.9  0.1; 0.1  0.9] = [0.5  0.5] = [p(y1)  p(y2)]

H(Y) = −Σj p(yj) log2 p(yj) = −0.5 log2 0.5 − 0.5 log2 0.5 = 1

p log2 p + (1−p) log2 (1−p) = −0.469

Hence I(X; Y) = 1 − 0.469 = 0.531

c. When α = 0.5 and p = 0.5:

P(Y) = P(X) P(Y/X) = [0.5  0.5] [0.5  0.5; 0.5  0.5] = [0.5  0.5] = [p(y1)  p(y2)]

H(Y) = −Σj p(yj) log2 p(yj) = −0.5 log2 0.5 − 0.5 log2 0.5 = 1

p log2 p + (1−p) log2 (1−p) = −1

Hence I(X; Y) = 1 − 1 = 0

When I(X; Y) = 0, the channel is useless, i.e. when p=0.5 no information is being transmitted at all.
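Parts (b) and (c) of Example 6 can be checked with a short numerical sketch (alpha and p are the quantities defined in the example):

import numpy as np

def bsc_mutual_information(alpha, p):
    # I(X;Y) = H(Y) - H(Y/X) for a BSC with P(x1) = alpha and error probability p
    PX = np.array([alpha, 1 - alpha])
    PYX = np.array([[1 - p, p],
                    [p, 1 - p]])
    PY = PX @ PYX
    H = lambda q: -sum(v * np.log2(v) for v in q if v > 0)
    return H(PY) - H([p, 1 - p])           # H(Y/X) is the noise entropy H(p)

print(bsc_mutual_information(0.5, 0.1))    # ~0.531 bits
print(bsc_mutual_information(0.5, 0.5))    # 0.0 -> useless channel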

Exercise.8:

For a lossless channel show that: H(X/Y) = 0


Exercise.9:

For a noiseless channel with number of inputs = number of outputs = m, show that

H(X) = H(Y), and H(Y/X) =0

Exercise.10:

Show that: H(X, Y) = H(X/Y) + H(Y)

Exercise.11:

Show that: I(X; Y) = Σi Σj P(xi, yj) log2 [P(xi/yj) / P(xi)]

Exercise.12:

Show that: I(X; Y) = I(Y; X)

Exercise.13: Show that: I(X; Y) ≥ 0 (Hint: log2 α = −log2 (1/α), and ln α ≤ α − 1)

9. Channel Capacity

Mutual information represents the average information per symbol transmitted over the system. Shannon showed that the channel capacity is the maximum rate at which information can be transferred reliably over the channel. So the capacity per symbol, Cs, of a channel is given by:

Cs = max over P(X) of I(X; Y)   bits/symbol

= max [H(X) – H(X/Y)] = max [H(Y) – H(Y/X)]

If r is symbol rate, then the channel capacity per second C is:

C= r Cs …. Bit/sec

• For a lossless channel: Cs = log2 m
• For a deterministic channel: Cs = log2 n
• For a noiseless channel: Cs = log2 m = log2 n
• For a BSC: Cs = 1 + p log2 p + (1 − p) log2 (1 − p)
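The BSC entry in the list above is easy to evaluate; a minimal sketch:

import math

def bsc_capacity(p):
    # Cs = 1 + p log2 p + (1-p) log2(1-p) bits/symbol
    if p in (0.0, 1.0):
        return 1.0
    return 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

print(bsc_capacity(0.1))   # ~0.531 bits/symbol
print(bsc_capacity(0.5))   # 0.0 -> no information gets through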


Example.7:

Find the capacity of the following channel (where P(x1) = α):

[Binary erasure channel diagram: x1 → y1 with probability 1−p, x1 → y2 with probability p, x2 → y2 with probability p, x2 → y3 with probability 1−p]

Solution:

P(Y/X) = [1−p  p  0]
         [0    p  1−p]

P(Y) = P(X) P(Y/X) = [α  1−α] [1−p  p  0; 0  p  1−p] = [α(1−p)   p   (1−α)(1−p)]

P(X, Y) = P(X)d P(Y/X) = [α  0; 0  1−α] [1−p  p  0; 0  p  1−p] = [α(1−p)  αp  0; 0  (1−α)p  (1−α)(1−p)]

H(Y) = −Σj p(yj) log2 p(yj)
= (1−p)[−α log2 α − (1−α) log2 (1−α)] − p log2 p − (1−p) log2 (1−p)

H(Y/X) = −Σi Σj P(xi, yj) log2 P(yj/xi)
= −p log2 p − (1−p) log2 (1−p)

I(X; Y) = H(Y) − H(Y/X) = (1 − p) H(X)

Cs = max over P(X) of I(X; Y) = max (1 − p) H(X) = (1 − p) max H(X) = 1 − p

(Note: max H(X) = log2 m = log2 2 = 1)
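A numerical sketch confirming that I(X; Y) = (1 − p) H(X) for this channel and that the maximum (the capacity 1 − p) is reached at α = 0.5 (the channel matrix is the erasure-channel matrix from the solution above; p = 0.2 is just an illustrative value):

import numpy as np

def mutual_information(PX, PYX):
    # I(X;Y) = H(Y) - H(Y/X), computed from P(X) and the channel matrix
    PXY = np.diag(PX) @ PYX
    PY = PX @ PYX
    H = lambda q: -sum(v * np.log2(v) for v in np.ravel(q) if v > 0)
    HY_given_X = H(PXY) - H(PX)            # H(Y/X) = H(X,Y) - H(X)
    return H(PY) - HY_given_X

p = 0.2
PYX = np.array([[1 - p, p, 0],
                [0, p, 1 - p]])

alphas = np.linspace(0.01, 0.99, 99)
I = [mutual_information(np.array([a, 1 - a]), PYX) for a in alphas]
print(max(I))                              # ~0.8 = 1 - p, attained near alpha = 0.5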


10. Additive White Gaussian Noise Channel (AWGN)

[AWGN channel: Y = X + N, where N is additive white Gaussian noise]

The noise in practically observed channels is commonly assumed to be Gaussian. The channel capacity for this channel is

C = max{R}

or

C = B log2 (1 + S/N) … bits/sec

Where:

B: bandwidth of the channel (Hz)

S/N: signal to noise ratio (SNR)

S: signal power in watt

N: noise power in watts (N = N0 B), where N0 is the PSD of the noise (W/Hz)

Note: The channel can be made error-free if and only if C ≥ R.
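The Shannon–Hartley formula is straightforward to evaluate; a minimal sketch used for the examples that follow (the function and parameter names are illustrative):

import math

def awgn_capacity(B, S, N0):
    # C = B log2(1 + S/N), with noise power N = N0 * B, in bits/sec
    N = N0 * B
    return B * math.log2(1 + S / N)

# Example 8 below: B = 4 kHz, N0 = 2e-12 W/Hz, S = 0.1 mW
print(awgn_capacity(4000, 0.1e-3, 2e-12))  # ~54,440 bits/sec (about 54.44 kb/s)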

Example.8:

Consider an AWGN channel with 4 KHz bandwidth and noise PSD of 2×10^-12 W/Hz. The signal power required at the receiver is 0.1 mW. Calculate the capacity of this channel.

Solution:

We have: B = 4000 Hz, S = 0.1×10^-3 W, N0 = 2×10^-12 W/Hz

N = N0 B = 2×10^-12 × 4000 = 8×10^-9 W

SNR = S/N = (0.1×10^-3) / (8×10^-9) = 1.25×10^4

Hence C = B log2 (1 + S/N) = 4000 log2 [1 + 1.25×10^4] = 54.44 kb/s

Example.9:

The terminal of a computer used to enter alphabetic data is connected to the computer through a voice grade telephone line having a usable bandwidth of 3 KHz and SNR=10. Assume that terminal has 128 characters, determine:

a. The capacity of the channel
b. The maximum rate of transmission without error


Solution:

a. C = B log2 (1 + S/N) = 3000 log2 (1 + 10) = 10378 bits/sec

b. Average information per character: H = log2 128 = 7 bits/character

The rate of information is R = r H

For error-free transmission: R ≤ C

7 r ≤ 10378

Hence r ≤ 1482 characters/sec

Exercise.14:

Calculate the capacity of low pass channel with a usable bandwidth of 3 KHz and SNR=100 at channel output. Assume the channel noise to be white Gaussian.

Exercise.15:

A discrete signal with 256 levels is transmitted at a rate of 10^4 samples/sec.

a. What is the information rate?
b. Can the output be transmitted without error over an AWGN channel with B = 10 KHz and SNR = 100?
c. Find the required SNR for error-free transmission in part (b).
d. Find the required bandwidth of the AWGN channel for error-free transmission if SNR = 100.

11. Code Length, Code Efficiency and Redundancy

The length of a code word is the number of bits in the code word. The average code word length per source symbol is

L = Σi P(xi) ni

where ni is the length, in bits, of the code word for symbol xi.

Code efficiency may be defined as the ratio of the actual transmission rate to the maximum transmission rate:

η = I(X; Y) / max I(X; Y)

or

η = [H(X) / L] × 100%

The redundancy is defined as

γ = 1 − η

We know that the rate of information is


R = r H bits/sec

where r is the symbol rate (symbols/sec). If all symbols convey the same amount of information then H = log2 M, so R = r log2 M.

Now consider an encoder that converts the incoming symbols into code words of N bits, produced at the same fixed rate. Then

R' = r N log2 2 = r N

If the symbols have different probabilities, then H ≤ N, or N ≥ H.

Here N is a very important parameter, called the average code length, denoted by L.

For optimum source coding L = H, but practically L ≥ H.

12. Kraft Inequality

A necessary and sufficient condition for a binary code to be uniquely decipherable is that the code word lengths ni satisfy

K = Σi 2^(−ni) ≤ 1

The simplest coding generates a fixed-length code in which all code words have the same length ni = N, so

K = M 2^(−N)

This means that for decipherable (uniquely identified) codes, in the case of fixed-length coding we need

N ≥ log2 M

So the resulting efficiency can be calculated as η = H / N.

The result of this discussion is that if H < log2 M and we need higher efficiency, we have to reduce the average code length L. That is why variable-length codes are used.


Example.10:

For the following codes:

xi    Code A    Code B    Code C    Code D
x1    00        0         0         0
x2    01        10        11        100
x3    10        11        100       110
x4    11        110       110       111

a. Show that all codes except Code B satisfy the Kraft inequality.
b. Show that Codes A and D are uniquely decodable but Codes B and C are not.

Solution:

a. For Code A: n1 = n2 = n3 = n4 = 2
K = Σi 2^(−ni) = 1/4 + 1/4 + 1/4 + 1/4 = 1

For Code B: n1 = 1, n2 = n3 = 2, n4 = 3
K = Σi 2^(−ni) = 1/2 + 1/4 + 1/4 + 1/8 = 9/8 > 1

For Code C: n1 = 1, n2 = 2, n3 = n4 = 3
K = Σi 2^(−ni) = 1/2 + 1/4 + 1/8 + 1/8 = 1

For Code D: n1 = 1, n2 = n3 = n4 = 3
K = Σi 2^(−ni) = 1/2 + 1/8 + 1/8 + 1/8 = 7/8 < 1

b. Codes A and D are prefix-free codes, so they are uniquely decodable. Code B is not uniquely decodable because it does not satisfy the Kraft inequality. Code C satisfies the Kraft inequality but is not uniquely decodable (0110110 corresponds to 'x1 x2 x1 x4' or to 'x1 x4 x4').
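The Kraft sums in part (a) can be computed directly; a short sketch over the four codes of Example 10:

codes = {
    'A': ['00', '01', '10', '11'],
    'B': ['0', '10', '11', '110'],
    'C': ['0', '11', '100', '110'],
    'D': ['0', '100', '110', '111'],
}

for name, words in codes.items():
    K = sum(2 ** -len(w) for w in words)
    status = 'satisfies' if K <= 1 else 'violates'
    print(f'Code {name}: K = {K}  ({status} the Kraft inequality)')
# A: 1.0, B: 1.125 (violates), C: 1.0, D: 0.875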

13. Source Coding Theorem

The final stage we consider in a digital system is encoding. In a communication system, coding techniques are very important for efficient and error-free transmission of data. Basically we keep two things in mind:

(a) The codes generated should be in binary form.

(b) The code should be decipherable, i.e. uniquely identifiable, so that it can be decoded easily and without error at the receiver side.


An objective of source encoding is to minimize the average bit rate required to represent the source, by reducing the redundancy of the information source.

13.1 Prefix Coding

The codes generated by the encoder should be uniquely decodable so that decoding errors are minimized, but this places a limitation on the set of code words a source can use. The requirement of uniquely identifiable codes can be met by using 'prefix coding'.

'A prefix code is defined as a code in which no code word is the prefix of any other code word.'

This can be understood from the following example:

Symbol    Code I                     Code II (prefix code)
S0        0                          0
S1        1                          10
S2        00  (S0 is its prefix)     110
S3        11  (S1 is its prefix)     111

It can easily be seen that Code II can be decoded by the decoder without ambiguity, and it satisfies the Kraft inequality.

There are also some codes that satisfy the Kraft inequality but are not prefix codes, yet can still be decoded without error. For example:

Symbol    Code III
S0        0
S1        01
S2        011
S3        0111

13.2. Shannon-Fano Coding

For variable-length coding, a practical idea is that frequently used symbols (those with high probability) should be coded with the minimum length, and rarely used symbols should be coded with longer lengths, so that the efficiency is not affected too much.

Shannon–Fano coding generates efficient codes in which the word length increases as the probability of the symbol decreases.


In this method, first arrange the messages in descending order of probability. Then draw a line that divides the symbols into two groups such that the group probabilities are as nearly equal as possible. Assign the digit 0 to each symbol in the group above the line and the digit 1 to each symbol in the group below the line. For all subsequent steps, subdivide each group into subgroups and again assign digits following the previous rule. Whenever a group contains only one symbol, no further subdivision is possible and the code word for that symbol is complete. When all groups have been reduced to one symbol, the code words for all symbols are assigned.

Example.11: For the given message sequence with their probabilities, apply Shannon–Fano coding and calculate the code efficiency.

[x] = [x1 x2 x3 x4 x5 x6 x7 x8]

[P] = [1/4 1/8 1/16 1/16 1/16 1/4 1/16 1/8]

Solution. Arrange the probabilities in descending order.

Message    Prob.    Code    Length
x1         1/4      00      2
x6         1/4      01      2
x2         1/8      100     3
x8         1/8      101     3
x3         1/16     1100    4
x4         1/16     1101    4
x5         1/16     1110    4
x7         1/16     1111    4

Now the average length per message is L = Σi P(xi) ni = 2.75 bits/message, and H(X) = Σi P(xi) log2 (1/P(xi)) = 2.75 bits/message.

Code efficiency: η = [H(X) / L] × 100 = (2.75 / 2.75) × 100 = 100%
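A recursive Python sketch of the Shannon–Fano procedure described above (the split point is chosen so that the two group probabilities are as nearly equal as possible; with the probabilities of Example 11 it reproduces the same code lengths and an average of 2.75 bits/message):

def shannon_fano(symbols):
    # symbols: list of (name, probability), already sorted in descending order
    if len(symbols) == 1:
        return {symbols[0][0]: ''}
    total = sum(p for _, p in symbols)
    best_i, best_diff, running = 1, float('inf'), 0.0
    for i in range(1, len(symbols)):          # find the most balanced split
        running += symbols[i - 1][1]
        diff = abs(2 * running - total)
        if diff < best_diff:
            best_i, best_diff = i, diff
    codes = {}
    for group, bit in ((symbols[:best_i], '0'), (symbols[best_i:], '1')):
        for name, code in shannon_fano(group).items():
            codes[name] = bit + code
    return codes

probs = [('x1', 1/4), ('x6', 1/4), ('x2', 1/8), ('x8', 1/8),
         ('x3', 1/16), ('x4', 1/16), ('x5', 1/16), ('x7', 1/16)]
codes = shannon_fano(probs)
avg_len = sum(p * len(codes[s]) for s, p in probs)
print(codes, avg_len)    # x1: 00, x6: 01, x2: 100, x8: 101, ..., average 2.75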


Exercise.16:

Apply the Shannon Fano coding and find the code efficiency

[x] = [x1 x2 x3 x4 x5 x6 x7]

[P] = [.4 .2 .12 .08 .08 .08 .04]

(Check answer: efficiency = 96.03 % )

13.3. Huffman Coding

Huffman coding (compact coding) is optimum in the sense that no other uniquely decodable set of code words has a smaller average code word length for a given source.

The Huffman encoding algorithm proceeds as follows:

1. The source symbols are arranged in descending order of probability.

2. The two source symbols of least probability are combined into a new symbol with probability equal to the sum of the two original probabilities. The probability of the new symbol is placed in the list in accordance with its value.

3. The procedure is repeated until we are left with a final list of only two symbols, to which a '0' and a '1' are assigned. The code word for each symbol is then found by working backward and tracing the sequence of 0s and 1s assigned to that symbol and its successors.

Example.12: We have 5 symbols for a discrete source

Xi S0 S1 S2 S3 S4

P (xi) .4 .2 .2 .1 .1

Obtain Huffmann coding, average code word length, entropy of the given system.

Solution: (a) The entropy of the system is given by H(X) = Σi P(xi) log2 (1/P(xi)) = 2.12193 bits/message

(b) Huffman coding is performed as follows:


[Huffman construction: S3 and S4 (.1 + .1) are combined into a symbol of probability .2, giving the list .4, .2, .2, .2; the two lowest (.2 + .2) are combined into .4, giving .4, .4, .2; finally .4 and .2 are combined into .6, giving .6, .4, to which '0' and '1' are assigned. Working backward gives the code words listed below.]

Thus we have finally

Symbol    Code    Code length (Ni)
S0        00      2
S1        10      2
S2        11      2
S3        010     3
S4        011     3

(c) Average code length

L = Σi P(xi) Ni = 2.2 letters/message
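A compact sketch of the Huffman procedure using a priority queue; it yields an optimal code with the same average length of 2.2 for Example 12, although the particular code words may differ from the table above depending on how ties are broken:

import heapq

def huffman(probs):
    # probs: dict symbol -> probability; returns dict symbol -> code word
    heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)       # the two least probable groups
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in c1.items()}
        merged.update({s: '1' + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

P = {'S0': 0.4, 'S1': 0.2, 'S2': 0.2, 'S3': 0.1, 'S4': 0.1}
codes = huffman(P)
avg = sum(P[s] * len(c) for s, c in codes.items())
print(codes, avg)   # average code length 2.2 letters/message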

Exercise.17:

A message source generates ten messages with probabilities 0.1, 0.13, 0.01, 0.04, 0.08, 0.29, 0.06, 0.22, 0.05 and 0.02. The rate of message generation is 300 messages/sec. Find the entropy of the source and the information rate. Obtain the Huffman codes for the messages and calculate the average number of bits/message. What is the code redundancy?

(Check answer: H = 2.38 bits/message, R = 714 bits/sec, L = 2.43 letters/message, η = 97.94%, redundancy γ = 2.06%)


14. Error Detection and Correction

[Model of a digital communication system: Message → Source Encoder → Channel Encoder → Modulator → Channel → Demodulator → Channel Decoder → Source Decoder → User; the encoders/decoders form the codec and the modulator/demodulator form the modem.]

The goal of the channel encoding and decoding process is to detect (or decode) the data digits with minimum probability of error. This is an effective way of increasing the channel capacity.

The basic idea of coding is to add a group of check digits to the message digits. The check digits may then provide the receiver with sufficient information to either detect or correct channel errors.

n = k + r

k: number of message digits
r: number of check digits
n: total number of digits in the code word

Single parity check

A simple single-error-detecting code with r = 1 bit: the check bit is the XOR (modulo-2 sum) of the k message digits.

Hamming code

A class of linear codes which can correct all single-error patterns in the received word, with

n = 2^r − 1

Block coding

Let the encoded word be



C = [m1 m2 … mk c1 c2 … cr]

where m1 … mk are the message digits and c1 … cr are the check digits.

Then

c1 = f1(m1, m2, …, mk)

c2 = f2(m1, m2, …, mk)

⋮

cr = fr(m1, m2, …, mk)

For k = 3 and r = 3: C = [m1 m2 m3 c1 c2 c3], with the following functions for the check digits:

c1 = m2 ⊕ m3

c2 = m1 ⊕ m3

c3 = m1 ⊕ m2

(Note: the operator ⨁ is modulo-2 addition)

Then

c1 = 0·m1 ⊕ 1·m2 ⊕ 1·m3

c2 = 1·m1 ⊕ 0·m2 ⊕ 1·m3

c3 = 1·m1 ⊕ 1·m2 ⊕ 0·m3

Or in matrix form, H C^T = 0, with

H = [0 1 1 1 0 0]
    [1 0 1 0 1 0]
    [1 1 0 0 0 1]      and      C = [m1 m2 m3 c1 c2 c3]

In general:

H C^T = 0

where H is an r × n matrix called the parity check matrix.


Decoding process

Let the error vector be

E = [e1 e2 … en], where ei = 1 if an error occurred in position i and ei = 0 otherwise.

The received word is

R = [r1 r2 … rn], and R = C ⊕ E

The decoder begins by computing the syndrome S:

S = H R^T

Now we have two cases:

• If E = 0 (no error), then S = 0, i.e. R = C.
• If E ≠ 0 (there is an error), then:

S = H R^T = H (C + E)^T = H C^T + H E^T

But H C^T = 0, so S = H E^T.

For a single error, this means S equals the column of H corresponding to the position of the error.

Example.13:

For k=3, n=6 and the parity check matrix is:

H = [0 1 1 1 0 0]
    [1 0 1 0 1 0]
    [1 1 0 0 0 1]

If the received word is R= [0 1 0 0 1 1], check if there’s an error occurred in R, then find the correct transmitted word C.

Solution:

S = H R^T = H [0 1 0 0 1 1]^T = [1 1 0]^T


Since S ≠ 0, an error occurred. S matches the third column of H, so the error is in the third position:

E = [0 0 1 0 0 0]

C = R ⊕ E = [0 1 0 0 1 1] ⊕ [0 0 1 0 0 0] = [0 1 1 0 1 1]
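The syndrome decoding of Example 13 can be reproduced with a few lines of NumPy (all arithmetic is modulo 2; this corrects single errors only):

import numpy as np

H = np.array([[0, 1, 1, 1, 0, 0],          # parity check matrix (r x n)
              [1, 0, 1, 0, 1, 0],
              [1, 1, 0, 0, 0, 1]])

R = np.array([0, 1, 0, 0, 1, 1])           # received word
S = H @ R % 2                              # syndrome S = H R^T (mod 2)
print(S)                                   # [1 1 0] -> nonzero, an error occurred

# S matches the 3rd column of H, so the error is in position 3
err_pos = next(j for j in range(H.shape[1]) if np.array_equal(H[:, j], S))
E = np.zeros(H.shape[1], dtype=int)
E[err_pos] = 1                             # error vector
C = (R + E) % 2                            # corrected code word
print(C)                                   # [0 1 1 0 1 1]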

Exercise.18:

For k = 4, n = 7. If the received word is R = [1 1 1 1 0 1 0], check whether an error occurred in R, then find the correct transmitted word C for the following check-digit functions.

= ⨁ ⨁

= ⨁ ⨁

= ⨁ ⨁
