10/10/04 www.eej.ulster.ac.uk/~ian/modules/COM347J1/COM347J1_L4.ppt L4/1/39

COM347J1 Networks and Data Communications

Ian McCrum, Room 5D03B, Tel: 90 366364 (voice mail on 6th ring), Email: [email protected]

Web site: http://www.eej.ulst.ac.uk

Lecture 4: Data Compression, Error Detection and Error Correction



The Encoding and compression of data

• Introduction

• Information Content of a message stream

• simple coding methods

• Huffman coding

• compression techniques


REDUNDANCY

• Suppose you received the following telegram:

• RONMIE (ROCKTT) O’SULLIVON 146 CREAK

• Because natural language is inherently redundant, the garbled text can be reconstructed as the message below.

• RONNIE (ROCKET) O’SULLIVAN 146 BREAK

• But what about the numbers in the message?


Redundancy

• Redundancy arises because letters in natural language are correlated. Consider the word:

• YACH (if the next letter sent is T, it carries no information, because YACHT is the only likely completion)

• Is it possible for a coding scheme to produce an ideal code?


Reduction of Redundancy

• Observe the
– statistical occurrence of symbols
– repetition of symbols

• Employ
– Fano coding or Huffman coding (the most common symbols are given shorter codes)
– data compression (e.g. coding repetition as a special case)


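Packed decimal / half byte compression

• When frames contain only numeric characters, use binary coded decimal (BCD) instead of 7-bit ASCII or 8-bit EBCDIC, because only the four least significant bits change from one digit to the next. Two digits can then be packed into each byte.

– In ASCII, “:” and “;” lie in the same column as the digits (3A and 3B hex), so their low four bits are used to encode the decimal point and the space respectively.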


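Packed decimal

STX Cntrl XX ‘2’’6’ ‘:’’3’ ‘2’’;’ ‘4’’5’ ETX BCC

– STX: opening flag
– Cntrl: control character indicating half-byte compression
– XX: number of digits following
– ‘2’’6’ ‘:’’3’ ‘2’: 1st number, 26.32 (each quoted pair is packed into one byte)
– ‘;’: space separating the numbers
– ETX BCC: closing flag and block check character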
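A minimal Python sketch of half-byte packing, assuming (as the ASCII note suggests) that ‘:’ and ‘;’ stand for the decimal point and the space. The function names and the rule of padding an odd-length field with ‘;’ are hypothetical, not taken from the slides. Packing "26:32;45" yields the same four byte pairs as the frame above.

```python
def pack_decimal(text: str) -> bytes:
    """Pack digits, ':' (decimal point) and ';' (space) two per byte."""
    allowed = "0123456789:;"
    if any(ch not in allowed for ch in text):
        raise ValueError("only digits, ':' and ';' can be packed")
    # All these characters share the high nibble 0x3, so keep only the low nibble.
    nibbles = [ord(ch) & 0x0F for ch in text]
    if len(nibbles) % 2:
        nibbles.append(0x0B)  # hypothetical padding rule: pad with ';' (space)
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def unpack_decimal(data: bytes) -> str:
    """Restore the ASCII characters by putting back the common high nibble 0x3."""
    chars = []
    for byte in data:
        chars.append(chr(0x30 | (byte >> 4)))
        chars.append(chr(0x30 | (byte & 0x0F)))
    return "".join(chars)


packed = pack_decimal("26:32;45")
print(packed.hex())            # 26a32b45
print(unpack_decimal(packed))  # 26:32;45
```

Eight characters travel in four bytes, which is the 2:1 saving half-byte compression aims for.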


Relative encoding

• Whenever only small differences occur between successive values, send only the difference.
• Very effective in data logging, e.g. recording the level of a river.

Relative encoding using sign, number and delimiter:

STX ‘+’ ‘1’ ‘¬’ ‘+’ ‘4’ ‘¬’ ETX BCC

Relative encoding using signed 8-bit integers:

STX +3 -95 +11 +124 -100 ETX BCC
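A minimal Python sketch of relative (delta) encoding with signed 8-bit differences, using the standard `struct` module. The river readings are hypothetical values chosen so that the differences match the signed-integer frame above, and sending the first absolute reading separately is an assumption the slides do not spell out.

```python
import struct


def delta_encode(samples: list[int]) -> tuple[int, bytes]:
    """Return the first reading and the differences packed as signed 8-bit integers."""
    diffs = []
    for prev, cur in zip(samples, samples[1:]):
        d = cur - prev
        if not -128 <= d <= 127:
            raise ValueError(f"difference {d} does not fit in a signed byte")
        diffs.append(d)
    return samples[0], struct.pack(f"{len(diffs)}b", *diffs)


def delta_decode(first: int, packed: bytes) -> list[int]:
    """Rebuild the absolute readings by accumulating the differences."""
    values = [first]
    for d in struct.unpack(f"{len(packed)}b", packed):
        values.append(values[-1] + d)
    return values


levels = [1200, 1203, 1108, 1119, 1243, 1143]  # hypothetical river levels
first, packed = delta_encode(levels)
print(list(struct.unpack("5b", packed)))  # [3, -95, 11, 124, -100]
print(delta_decode(first, packed) == levels)  # True
```

A jump larger than the signed-byte range raises an error here; a real logger would need some escape mechanism for that case.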


Character suppression

• In a data stream there are often sequences of the same character, most frequently spaces.

• If three or more identical characters occur in sequence, the run is replaced by Cntrl, char, number.

• Thus Cntrl F 25 means a run of 25 Fs. This is a type of run-length encoding.


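Character suppression

STX Cntrl sp 45 ‘A’ ‘B’ ETX BCC

– STX: opening flag
– Cntrl: control character
– sp: character being suppressed (space)
– 45: number of characters in the run
– ‘A’ ‘B’: single letters, sent unchanged
– ETX BCC: closing flag and block check character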
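A minimal Python sketch of character suppression. The slides do not say which byte serves as Cntrl, so the DLE byte (0x10) used here is a hypothetical choice. The sketch also assumes that Cntrl never occurs in the original data and that the count is one byte, so runs longer than 255 are split.

```python
CNTRL = 0x10  # hypothetical control byte (DLE); not specified in the slides


def suppress(data: bytes, min_run: int = 3) -> bytes:
    """Replace each run of >= min_run identical bytes with CNTRL, char, count."""
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        # extend the run, capping it at 255 so the count fits in one byte
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        run = j - i
        if run >= min_run:
            out += bytes([CNTRL, data[i], run])
        else:
            out += data[i:j]  # short runs are sent unchanged
        i = j
    return bytes(out)


def expand(data: bytes) -> bytes:
    """Undo suppress(); assumes CNTRL never appears in the original data."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == CNTRL:
            out += bytes([data[i + 1]]) * data[i + 2]
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


frame = suppress(b" " * 45 + b"AB")
print(list(frame))  # [16, 32, 45, 65, 66]
```

The output mirrors the frame above: control byte, suppressed space, count 45, then the single letters A and B.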


Run length encoding

• Run-length compression in which the codeword itself contains the number of repetitions.

• A minimum repetition of three bytes is chosen, so that every run of three or more identical characters is encoded as:

• <char><char><char><n>, where n is the number of repetitions beyond the first three

• Because n is a single byte (0 to 255), this four-byte codeword can represent runs of up to 3 + 255 = 258 characters, for example:

<char><char><char> → <char><char><char><0>
<char><char><char><char> → <char><char><char><1>
<char><char><char><char><char> → <char><char><char><2>
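A minimal Python sketch of this run-length scheme. The function names are illustrative. The sketch assumes a one-byte count and splits longer runs. Decoding relies on the fact that, with this encoder, three identical bytes in a row only ever appear at the start of a codeword, and the decoder always skips the count byte that follows them.

```python
MAX_RUN = 3 + 255  # three literal bytes plus a one-byte count n


def rle_encode(data: bytes) -> bytes:
    """Encode runs of 3 or more identical bytes as <b><b><b><n>, n = run length - 3."""
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < MAX_RUN:
            j += 1
        run = j - i
        if run >= 3:
            out += bytes([data[i]] * 3 + [run - 3])
        else:
            out += data[i:j]  # runs of 1 or 2 bytes are copied unchanged
        i = j
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    """Three identical bytes in a row start a codeword; the 4th byte is n."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if i + 3 < len(data) and data[i + 1] == b and data[i + 2] == b:
            out += bytes([b]) * (3 + data[i + 3])
            i += 4
        else:
            out.append(b)
            i += 1
    return bytes(out)


print(rle_encode(b"xxx"), rle_encode(b"xxxx"), rle_encode(b"xxxxx"))
# b'xxx\x00' b'xxx\x01' b'xxx\x02'
```

Note that a run of exactly three characters grows from three bytes to four, so the scheme only pays off for longer runs.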


Huffman coding

• Instead of representing every symbol with a fixed number of bits, fewer bits are used for frequently occurring symbols and more bits for rare ones.

• Method: determine the relative frequency of the symbols, then create an unbalanced tree with unequal branches.


Example of Huffman coding

• Consider that a group of characters A to H is to be transmitted. This comprises:

• 9 As, 9 Bs, 5 Cs, 5 Ds, 2 Es, 2 Fs, 2 Gs and 2 Hs (36 symbols in total)

• Sequence of operations:
– a) Order the symbols in terms of probability.
– b) Combine the two least frequently occurring symbols,
– c) assigning 1 (upper) and 0 (lower) to each.
– d) The combined pair is now considered to be one entity.


Huffman continued

• Repeat the same steps until only two entities are left.

• Determine each codeword by reading the assigned bits from left to right; the first bit read is the least significant one.


The resulting Huffman codes for these symbols are:

A9 --> 1 0

B9 --> 0 1

C5 --> 1 1 1

D5 --> 1 1 0

E2 --> 0 0 0 1

F2 --> 0 0 0 0

G2 --> 0 0 1 1

H2 --> 0 0 1 0
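A compact Python sketch of the Huffman procedure using a priority queue (`heapq`). It assigns 0 to the less frequent entity and 1 to the other at each merge, loosely following the upper/lower rule. The tie-breaking is an arbitrary choice of this sketch, so individual codewords may differ from the table above. For these counts the codeword lengths (2, 2, 3, 3, 4, 4, 4, 4) and the 98-bit total come out the same.

```python
import heapq
from itertools import count


def huffman_codes(freqs: dict[str, int]) -> dict[str, str]:
    """Repeatedly merge the two least frequent entities, prefixing 0 and 1."""
    tiebreak = count()  # unique counter so the heap never compares the dicts
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w_low, _, low = heapq.heappop(heap)    # least frequent entity: bit 0
        w_high, _, high = heapq.heappop(heap)  # next least frequent: bit 1
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w_low + w_high, next(tiebreak), merged))
    return heap[0][2]


freqs = {"A": 9, "B": 9, "C": 5, "D": 5, "E": 2, "F": 2, "G": 2, "H": 2}
codes = huffman_codes(freqs)
for sym in freqs:
    print(sym, codes[sym])
print("total bits:", sum(freqs[s] * len(codes[s]) for s in freqs))  # total bits: 98
```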


Comparison

• If N symbols are sent, N codewords are transmitted. With fixed-length binary codes, eight symbols need 3 bits each, so the message takes 3N bits.

• How does this compare with the number of bits required by this example of Huffman encoding?


Symbol   Probability   Bits per codeword   Bits for N symbols
A        9/36          2                   18N/36
B        9/36          2                   18N/36
C        5/36          3                   15N/36
D        5/36          3                   15N/36
E        2/36          4                   8N/36
F        2/36          4                   8N/36
G        2/36          4                   8N/36
H        2/36          4                   8N/36

Total number of bits = 98N/36 ≈ 2.72N bits


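Therefore there has been a saving of 3N - 2.72N ≈ 0.28N bits in comparison with fixed-length binary codes of 3 bits each.

Redundancy: the ideal code for this sequence of symbols would use, on average, the information content (entropy) of the source, $H = -\sum_i p_i \log_2 p_i$ bits per symbol. For these probabilities $H \approx 2.718$, so the ideal code would take about 2.718N bits; this is the actual information content of the stream of symbols. Redundancy is then

$$\text{Redundancy} = 1 - \frac{\text{information content}}{\text{number of bits sent}}$$

For fixed-length binary codes: Redundancy = 1 - 2.718N/3.0N ≈ 9.4%

For Huffman: Redundancy = 1 - 2.718N/2.722N ≈ 0.2%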
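A short Python check of the information-content figures. It computes the standard Shannon entropy $H = -\sum_i p_i \log_2 p_i$ directly from the 36 symbol counts and applies the redundancy formula to the 3-bit fixed code and to the Huffman code lengths listed earlier. With these counts H comes out at about 2.718 bits per symbol, so the Huffman code (98/36 ≈ 2.722 bits per symbol) is very close to ideal.

```python
from math import log2

counts = {"A": 9, "B": 9, "C": 5, "D": 5, "E": 2, "F": 2, "G": 2, "H": 2}
huffman_bits = {"A": 2, "B": 2, "C": 3, "D": 3, "E": 4, "F": 4, "G": 4, "H": 4}

total = sum(counts.values())  # 36 symbols
# Information content per symbol: H = sum of p * log2(1/p)
entropy = sum((n / total) * log2(total / n) for n in counts.values())
huffman_avg = sum(counts[s] * huffman_bits[s] for s in counts) / total  # 98/36
fixed_avg = 3.0  # 8 symbols need 3 bits each in fixed-length binary


def redundancy(avg_bits: float) -> float:
    """1 - information content / number of bits sent (both per symbol)."""
    return 1 - entropy / avg_bits


print(f"H = {entropy:.3f} bits/symbol")                         # H = 2.718 bits/symbol
print(f"Huffman average = {huffman_avg:.3f} bits/symbol")       # Huffman average = 2.722 bits/symbol
print(f"redundancy, fixed 3-bit: {redundancy(fixed_avg):.1%}")  # redundancy, fixed 3-bit: 9.4%
print(f"redundancy, Huffman: {redundancy(huffman_avg):.1%}")    # redundancy, Huffman: 0.2%
```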


MNP Class 5 Compression

• MNP Class 5 is a combination of Huffman and run-length encoding.

• The symbol stream is run-length encoded with a minimum repetition of 3 bytes and then Huffman encoded using a statistically generated table.

• During transmission the statistics for the occurrence of each symbol are updated and the allocation of codewords is changed dynamically.

• MNP Class 5 compression regularly achieves 2:1 compression. Its major drawback is that it cannot turn itself off when it offers no gain, so an incompressible file actually expands by more than 10%.


Error detection and protection

• Introduction
• Error detection
– recognise that an error has happened
• Error correction
– repair damaged data
• Parity and CRC
• BCC and Hamming codes


Data errors

• Errors can arise from attenuation of signal strength and from other causes.
– Well-shaped signals can become distorted and thus misinterpreted.

• Random errors (each bit is in error with a certain probability)
– noise in electronics
– distance travelled

• Burst errors (groups of bits in error occur together)
– sources of interference
– faults in equipment


Error detection

• A sequence of bits (I0 … In) is subjected to some processing (P), giving rise to a check sequence (C0 … Ck).

• Both are transmitted towards a receiver, and both may be corrupted on the way.

• On reception the bit stream is separated into the received data (I0r … Inr) and the received check sequence (C0r … Ckr).

• The received data (I0r … Inr) is assumed to be correct, and the same processing (P) is performed on it, giving the reconstructed sequence (C0rr … Ckrr).

• If the received check sequence (C0r … Ckr) and the reconstructed sequence (C0rr … Ckrr) are equal, then no detectable error has occurred.
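The sender/receiver steps above can be sketched generically in Python, with the processing P passed in as a function. The 8-bit arithmetic checksum used as P here is a hypothetical stand-in for illustration only. The parity, BCC and CRC methods that follow are concrete choices of P.

```python
from typing import Callable

CheckFn = Callable[[bytes], bytes]


def send(data: bytes, process: CheckFn) -> bytes:
    """Sender: transmit the data I followed by its check sequence C = P(I)."""
    return data + process(data)


def no_detectable_error(frame: bytes, check_len: int, process: CheckFn) -> bool:
    """Receiver: split into Ir and Cr, recompute Crr = P(Ir) and compare with Cr."""
    if check_len < 1:
        raise ValueError("check_len must be at least 1")
    data_r, check_r = frame[:-check_len], frame[-check_len:]
    return process(data_r) == check_r


def sum_mod_256(data: bytes) -> bytes:
    """Hypothetical choice of P for illustration: an 8-bit arithmetic checksum."""
    return bytes([sum(data) % 256])


frame = send(b"HELLO", sum_mod_256)
damaged = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip one bit of the data
print(no_detectable_error(frame, 1, sum_mod_256))    # True
print(no_detectable_error(damaged, 1, sum_mod_256))  # False
```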


Parity for ASCII codes

• Consider a seven-bit ASCII code comprising the bits I6, I5, I4, I3, I2, I1, I0.

• A parity bit P0 is placed beside the most significant bit I6, forming the codeword P0, I6, I5, I4, I3, I2, I1, I0.

• The parity bit is determined as before: for odd parity there is an odd number of 1s in the codeword,

• and for even parity there is an even number of 1s in the codeword.
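A minimal Python sketch of the parity rule, treating the 8-bit codeword as an integer with P0 in bit 7 (beside I6). The function names are illustrative.

```python
def parity_bit(code: int, odd: bool = False) -> int:
    """P0 for a 7-bit ASCII code I6..I0 under even (default) or odd parity."""
    ones = bin(code & 0x7F).count("1")
    even_p = ones % 2  # 1 if needed to make the total count of 1s even
    return even_p ^ 1 if odd else even_p


def with_parity(code: int, odd: bool = False) -> int:
    """Place P0 beside I6, i.e. as bit 7: P0 I6 I5 I4 I3 I2 I1 I0."""
    return (parity_bit(code, odd) << 7) | (code & 0x7F)


print(format(with_parity(ord("A")), "08b"))            # 01000001
print(format(with_parity(ord("A"), odd=True), "08b"))  # 11000001
```

‘A’ (1000001) already has two 1s, so even parity leaves P0 at 0 while odd parity sets it to 1.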


Block Sum Check Character

                    P0 I6 I5 I4 I3 I2 I1 I0
Codeword 1          1  0  1  1  1  0  1  0
Codeword 2          0  1  1  1  1  0  0  1
Codeword 3          1  1  0  1  1  1  0  0
Codeword 4          0  0  1  0  0  1  1  0
Codeword 5          1  1  1  0  1  0  0  1
Codeword 6          1  1  1  1  0  1  1  1
Block Check Char.   0  1  0  1  1  0  0  0

Hey!See me!!
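A Python sketch of the block check. The convention is inferred from the numbers in the table rather than stated on the slide. Every row has an odd number of 1s. The BCC bits I6..I0 give each column odd parity. The BCC's own P0 appears to be its row parity. Under that assumption, a single flipped data bit shows up as exactly one failing row and one failing column, which locates it.

```python
def odd_parity(bits) -> int:
    """Bit that makes the total number of 1s (bits plus this one) odd."""
    return 1 - sum(bits) % 2


def block_check(rows: list[list[int]]) -> list[int]:
    """BCC: odd parity down each I6..I0 column, then its own odd row parity as P0."""
    columns = zip(*(row[1:] for row in rows))  # skip each row's P0 column
    check = [odd_parity(col) for col in columns]
    return [odd_parity(check)] + check


def failing_checks(rows: list[list[int]], bcc: list[int]) -> tuple[list[int], list[int]]:
    """Return (rows, I columns) whose odd-parity check fails."""
    bad_rows = [r for r, row in enumerate(rows) if sum(row) % 2 == 0]
    bad_cols = [c for c in range(1, 8)
                if (sum(row[c] for row in rows) + bcc[c]) % 2 == 0]
    return bad_rows, bad_cols


rows = [
    [1, 0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 1, 0, 0, 1],
    [1, 1, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0, 1, 1, 1],
]
bcc = block_check(rows)
print(bcc)  # [0, 1, 0, 1, 1, 0, 0, 0]

rows[2][4] ^= 1                   # corrupt I3 of codeword 3
print(failing_checks(rows, bcc))  # ([2], [4])
```

Row and column indices are 0-based here, so row 2 is Codeword 3 and column 4 is I3.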


Block Sum Check Character

• Consider what this method can do:
– in terms of detecting errors
– in terms of correcting errors

• Can you see where it might be used in practice?

• Where will it cease to work adequately?


Cyclic Redundancy Check (CRC)

• The CRC is so called because the codes belong to a class of cyclic codes, in which a cyclic shift of a legal codeword forms another legal codeword. When the check bits are added to a sequence of bits they increase the redundancy of the codeword.

• The data sequence is divided by a standard (generator) polynomial and the remainder forms the check bits, or CRC.

• The polynomial is of the form
– $1 \cdot x^4 + 0 \cdot x^3 + 1 \cdot x^2 + 0 \cdot x^1 + 1$
– more usually written $x^4 + x^2 + 1$
– and in binary takes the form 10101


The arithmetic is different! But easier

• In decimal, adding a digit 0..9 to a digit 0..9 gives 100 different additions and 19 different answers (0..18).

• In binary, using a half adder or exclusive OR, there are (0, 1) and (0, 1), meaning 4 different additions and only 2 answers.

• Thus $0 \oplus 0 = 0$, $0 \oplus 1 = 1$, $1 \oplus 0 = 1$ and $1 \oplus 1 = 0$, $\oplus$ being the symbol for exclusive OR.
– Think of it as the sum output of a half adder, i.e. addition with the carry discarded.


To perform CRC determination

• Get the data to be protected, e.g. 11011.

• Choose a polynomial, e.g. $x^4 + x^2 + 1$ (10101).

• Append to the data as many 0 bits as the highest power of the polynomial (4), giving 110110000.

• Divide this number by the polynomial using modulo-2 arithmetic, thus:
– 110110000 / 10101

• Take the remainder and send it after the original data.

• On reception, compare the received CRC with the reconstructed CRC to determine whether an error has occurred.


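Use the polynomial $x^4 + x^2 + 1$ (10101) to generate the CRC

Dividing 110110000 by 10101 gives the quotient 11101. At each step the divisor is XORed into the leading five bits and the next bit is brought down:

• 11011 XOR 10101 = 1110; bring down 0 → 11100
• 11100 XOR 10101 = 1001; bring down 0 → 10010
• 10010 XOR 10101 = 0111; bring down 0 → 01110 (leading bit 0, so quotient bit 0); bring down 0 → 11100
• 11100 XOR 10101 = 1001

Thus the remainder is 1001 and the codeword is 110111001.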


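Does 111010010, generated using the same polynomial as before, contain an error?

Recompute the CRC of the received data 11101 by dividing 111010000 by 10101 (quotient 11010):

• 11101 XOR 10101 = 1000; bring down 0 → 10000
• 10000 XOR 10101 = 0101; bring down 0 → 01010 (quotient bit 0); bring down 0 → 10100
• 10100 XOR 10101 = 0001; bring down 0 → 00010 (quotient bit 0)

Thus the remainder is 0010, which matches the received check bits of codeword 111010010, so no error is detected.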


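Alternatively, divide the received data and CRC together by the generating polynomial; the remainder should be zero.

Dividing 111010010 by 10101 (quotient 11010):

• 11101 XOR 10101 = 1000; bring down 0 → 10000
• 10000 XOR 10101 = 0101; bring down 0 → 01010 (quotient bit 0); bring down 1 → 10101
• 10101 XOR 10101 = 0000; bring down 0 → 00000 (quotient bit 0)

Thus the remainder is 0000, and the data 11101 was received OK!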
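The CRC procedure and the three worked divisions above can be reproduced with a short Python sketch that does modulo-2 long division on strings of '0' and '1'. The function names are illustrative. Representing bits as strings is a readability choice for this sketch; practical implementations use shift registers or table lookups on whole bytes.

```python
POLY = "10101"  # x^4 + x^2 + 1


def mod2_remainder(dividend: str, divisor: str) -> str:
    """Modulo-2 long division on bit strings; returns the last len(divisor)-1 bits."""
    bits = list(dividend)
    n = len(divisor)
    for i in range(len(bits) - n + 1):
        if bits[i] == "1":  # quotient bit 1: XOR the divisor in at this offset
            for j in range(n):
                bits[i + j] = "0" if bits[i + j] == divisor[j] else "1"
        # quotient bit 0: nothing to subtract, just move along one bit
    return "".join(bits[-(n - 1):])


def crc(data: str, poly: str = POLY) -> str:
    """Append (degree of poly) zero bits, divide, and return the remainder."""
    return mod2_remainder(data + "0" * (len(poly) - 1), poly)


data = "11011"
check = crc(data)
print(check, data + check)  # 1001 110111001

received = "111010010"
rx_data, rx_check = received[:-4], received[-4:]
print(crc(rx_data) == rx_check)        # True: recomputed CRC matches
print(mod2_remainder(received, POLY))  # 0000: whole codeword divides exactly
```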


Hamming Codes

Position in codeword:     11 10 9  8  7  6  5  4  3  2  1
Information and checks:   I6 I5 I4 C3 I3 I2 I1 C2 I0 C1 C0

Given the ASCII code 1001010, what is the Hamming code?

Position:  11 10 9  8  7  6  5  4  3  2  1
Bit:       I6 I5 I4 C3 I3 I2 I1 C2 I0 C1 C0
Value:     1  0  0  x  1  0  1  x  0  x  x

(x marks a check bit still to be determined.)


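How to determine the values of C3 C2 C1 C0

XOR together the binary position numbers of the data bits that are 1 (positions 11, 7 and 5):

Position   C3 C2 C1 C0
11          1  0  1  1
7           0  1  1  1
5           0  1  0  1
XOR         1  0  0  1

So C3 C2 C1 C0 = 1 0 0 1, and the Hamming code is:

I6 I5 I4 C3 I3 I2 I1 C2 I0 C1 C0
1  0  0  1  1  0  1  0  0  0  1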
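A small Python sketch of this XOR-of-positions rule. The check bits C3..C0 are the binary digits of the XOR of the positions of all data bits that are 1, placed at positions 8, 4, 2 and 1. Bit strings are written from position 11 down to 1, as on the slide; the function and constant names are illustrative.

```python
# Bit positions from the slide layout: 11 10 9  8  7  6  5  4  3  2  1
#                                      I6 I5 I4 C3 I3 I2 I1 C2 I0 C1 C0
DATA_POSITIONS = [11, 10, 9, 7, 6, 5, 3]  # where I6..I0 go
CHECK_POSITIONS = [1, 2, 4, 8]            # where C0..C3 go


def hamming_encode(ascii7: str) -> str:
    """Encode a 7-bit string I6..I0; the result runs from position 11 down to 1."""
    if len(ascii7) != 7 or set(ascii7) - {"0", "1"}:
        raise ValueError("expected exactly seven '0'/'1' characters")
    word = dict(zip(DATA_POSITIONS, (int(b) for b in ascii7)))
    syndrome = 0
    for pos, bit in word.items():
        if bit:
            syndrome ^= pos  # XOR the position numbers of the 1 bits
    for k, pos in enumerate(CHECK_POSITIONS):
        word[pos] = (syndrome >> k) & 1  # Ck takes bit k of the XOR
    return "".join(str(word[p]) for p in range(11, 0, -1))


print(hamming_encode("1001010"))  # 10011010001
```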


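How does this detect an error?

Suppose the following codeword is received:

I6 I5 I4 C3 I3 I2 I1 C2 I0 C1 C0
1  0  0  1  1  1  1  0  0  0  1

XOR together the position numbers of every bit that is 1:

Position   C3 C2 C1 C0
11          1  0  1  1
8           1  0  0  0
7           0  1  1  1
6           0  1  1  0
5           0  1  0  1
1           0  0  0  1
XOR         0  1  1  0

The result 0110 is 6, therefore the 6th bit was received in error. (For a correct codeword the result is 0000.)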
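The detection and correction step can be sketched in Python the same way. The receiver XORs the positions of every 1 bit, check bits included. A non-zero result names the bit to flip, assuming at most one bit is in error. The function names are illustrative.

```python
def hamming_syndrome(codeword: str) -> int:
    """XOR the positions of every 1 bit (leftmost character is the highest position)."""
    syndrome = 0
    for pos, bit in zip(range(len(codeword), 0, -1), codeword):
        if bit == "1":
            syndrome ^= pos
    return syndrome


def hamming_correct(codeword: str) -> str:
    """Flip the bit named by a non-zero syndrome, assuming at most one bit is wrong."""
    s = hamming_syndrome(codeword)
    if s == 0:
        return codeword  # no detectable error
    if s > len(codeword):
        raise ValueError("syndrome points outside the codeword: multiple errors")
    i = len(codeword) - s  # string index of position s
    flipped = "0" if codeword[i] == "1" else "1"
    return codeword[:i] + flipped + codeword[i + 1:]


received = "10011110001"
print(hamming_syndrome(received))  # 6
print(hamming_correct(received))   # 10011010001
```

With two or more errors the syndrome can point at the wrong bit, which is why a plain Hamming code only corrects single errors.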


Summary

• Hamming codes have their redundant bits in the positions that are powers of 2, i.e. 1, 2, 4, 8, etc.
• They can detect and correct single errors.
• They can indicate some multiple-error conditions but cannot correct them.
• Used for random errors.
• Can you think of how they might be applied where a burst error could occur? Assume that the burst is shorter than 8 bits and that there are 256 bytes to be transmitted.