Cyclic redundancy codes. Circuit elements in Digital computations. Prof. Seok-Bum Ko, Mehrnoosh Janbakhsh. Jan 29, 2010

Cyclic redundancy codes

Page 1: Cyclic redundancy codes

Cyclic redundancy codes

Circuit elements in Digital computations Prof. Seok-Bum Ko

Mehrnoosh Janbakhsh, Jan 29, 2010

Page 2: Cyclic redundancy codes

Novel Table Lookup-Based Algorithms for High-Performance CRC Generation

Vol. 57, No. 11, November 2008

Michael E. Kounavis, Member, IEEE, and Frank L. Berry

Page 3: Cyclic redundancy codes

Introduction

- CRCs are used to detect corruption of digital content

- A CRC algorithm treats each bitstream as a binary polynomial

- The binary word corresponding to the remainder is transmitted with the bitstream

- At the receiver side, CRC algorithms verify that the correct remainder has been received

Page 4: Cyclic redundancy codes

Point of Interest

- A new investigation of how CRC generation algorithms are implemented in software

- Useful for accelerating well-known codes

- Can speed up many commercial host, network, and server chipsets

- A number of proposed Internet protocols, such as data center protocols, require data integrity checks to be performed above the transport layer using very high-speed CRCs (e.g., 10 Gbps)

Page 5: Cyclic redundancy codes

Sarwate Algorithm

This algorithm reads 8 bits at a time from a stream and calculates the stream's CRC value by performing lookups on a table of 256 32-bit entries.

It was designed when most computer architectures allowed XOR operations between 8-bit quantities.

Modern processors perform XOR operations efficiently on 32- or 64-bit quantities and can access large on-chip cache memory in a few clock cycles.
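The byte-at-a-time lookup described on this slide can be sketched as follows. This is a minimal illustration, not the presenter's code: it assumes the standard CRC-32 in its reflected form (polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF), and the names `build_table` and `crc32_sarwate` are hypothetical.

```python
# Assumption: standard CRC-32, reflected form (the slides do not name a polynomial).
POLY = 0xEDB88320


def build_table():
    """Build the 256-entry table of 32-bit remainders, one entry per byte value."""
    table = []
    for byte in range(256):
        rem = byte
        for _ in range(8):
            # One step of carry-less division per bit of the byte.
            rem = (rem >> 1) ^ POLY if rem & 1 else rem >> 1
        table.append(rem)
    return table


TABLE = build_table()


def crc32_sarwate(data: bytes) -> int:
    """Consume one byte per iteration: one lookup, two XORs, one shift."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


print(hex(crc32_sarwate(b"123456789")))  # 0xcbf43926, the usual CRC-32 check value
```

The table holds 256 entries of 32 bits each, which is 1 Kbyte and fits easily in cache.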

Page 6: Cyclic redundancy codes

What is new here?

Novel slicing-by-4 algorithm

- Based on the Sarwate algorithm

- Uses a 4-Kbyte cache footprint

- Doubles the existing CRC performance by reading 32 bits at a time

Novel slicing-by-8 algorithm

- Based on the Sarwate algorithm

- Uses an 8-Kbyte cache footprint

- Triples the existing CRC performance by reading 64 bits at a time
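A rough sketch of the slicing-by-4 idea is given below, under the same assumptions as a conventional CRC-32 implementation (reflected polynomial 0xEDB88320, little-endian word reads). The function names are hypothetical and the code is illustrative rather than the authors' reference implementation. Four 256-entry tables of 32-bit values give the 4-Kbyte footprint quoted on the slide.

```python
# Assumption: standard CRC-32, reflected form; not stated in the slides.
POLY = 0xEDB88320


def make_tables(count):
    """Table 0 is the byte-wise table; table k advances table k-1 by one more byte."""
    base = []
    for byte in range(256):
        rem = byte
        for _ in range(8):
            rem = (rem >> 1) ^ POLY if rem & 1 else rem >> 1
        base.append(rem)
    tables = [base]
    for k in range(1, count):
        prev = tables[k - 1]
        tables.append([(v >> 8) ^ base[v & 0xFF] for v in prev])
    return tables


T = make_tables(4)  # 4 tables * 256 entries * 4 bytes = 4 Kbytes


def crc32_slice4(data: bytes) -> int:
    crc = 0xFFFFFFFF
    whole = len(data) - len(data) % 4
    for i in range(0, whole, 4):
        # Read 32 bits at once, then slice the result into four bytes,
        # each of which indexes its own table.
        crc ^= int.from_bytes(data[i:i + 4], "little")
        crc = (T[3][crc & 0xFF] ^ T[2][(crc >> 8) & 0xFF]
               ^ T[1][(crc >> 16) & 0xFF] ^ T[0][crc >> 24])
    for b in data[whole:]:
        # Leftover bytes fall back to the byte-at-a-time update.
        crc = (crc >> 8) ^ T[0][(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


print(hex(crc32_slice4(b"123456789")))  # 0xcbf43926
```

Slicing-by-8 follows the same pattern with eight tables and a 64-bit read per iteration.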

Page 7: Cyclic redundancy codes

Advantages

- Uses parallel lookup tables to generate CRC values over long bitstreams.

- Computes the next remainder by performing parallel lookups into smaller tables.

Page 8: Cyclic redundancy codes

Parallel LUTs concept

The concept of parallel table lookups appears in early CRC5 implementations and the work done by Braun and Waldvogel on performing incremental CRC updates for IP over ATM networks.

Page 9: Cyclic redundancy codes

CRC Generation Process

CRCs are error-detecting codes capable of detecting the accidental alteration of data. Data in computer systems can be modified for many reasons, such as hard drive malfunctions, Gaussian noise, and faulty physical connections.

Page 10: Cyclic redundancy codes

How does the CRC algorithm work?

It treats each bitstream as a binary polynomial B(x) and computes the remainder R(x) from the division of B(x) by a standard "generator" polynomial G(x).

The length of R(x) in bits is equal to the length of G(x) minus one.

At the receiver, CRC algorithms verify that R(x) is the correct remainder.

Additions and subtractions are carry-less, so both are equivalent to the XOR logical operation.
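The division described above can be sketched bit by bit, with XOR standing in for subtraction. The 14-bit message and the 4-bit generator 1011 are illustrative values chosen for this sketch, not taken from the slides, and the helper names are hypothetical.

```python
def mod2_remainder(dividend: int, generator: int) -> int:
    """Remainder of carry-less (modulo-2) division; subtraction is XOR."""
    glen = generator.bit_length()
    while dividend.bit_length() >= glen:
        # Align the generator under the leading 1 of the dividend and XOR it away.
        dividend ^= generator << (dividend.bit_length() - glen)
    return dividend


def encode(message: int, generator: int) -> int:
    """Append the remainder; its length is the generator length minus one."""
    r_len = generator.bit_length() - 1
    shifted = message << r_len
    return shifted ^ mod2_remainder(shifted, generator)


def verify(codeword: int, generator: int) -> bool:
    """Receiver-side check: an intact codeword divides evenly."""
    return mod2_remainder(codeword, generator) == 0


MESSAGE = 0b11010011101100   # illustrative bitstream (assumption)
GENERATOR = 0b1011           # illustrative generator x^3 + x + 1 (assumption)

codeword = encode(MESSAGE, GENERATOR)
print(bin(codeword & 0b111))                 # 0b100, the 3-bit remainder
print(verify(codeword, GENERATOR))           # True
print(verify(codeword ^ 0b10000, GENERATOR)) # False: one flipped bit is detected
```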

Page 11: Cyclic redundancy codes

Straightforward LUT Example 1

divisor    dividend
11011      1000111011000
           11011
            10101
            11011
             11101
             11011
              01100    current remainder

The three division steps shown are replaced by a LUT.

Accelerating the long division using table lookups
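One way to picture the acceleration is a small table that folds several division steps into a single lookup. The sketch below uses the divisor 11011 and the dividend 1000111011000 as read from the example on this slide, and assumes 3 bits are consumed per lookup. The names `LUT`, `K`, and `remainder_lut` are hypothetical.

```python
G = 0b11011                  # divisor from the slide's example
DEG = G.bit_length() - 1     # remainder width: 4 bits
K = 3                        # bits consumed per lookup (assumption; needs K <= DEG)


def mod2(value: int, gen: int = G) -> int:
    """Bit-by-bit modulo-2 remainder, used to build the table and as a reference."""
    glen = gen.bit_length()
    while value.bit_length() >= glen:
        value ^= gen << (value.bit_length() - glen)
    return value


# LUT[h] is the remainder of h shifted left by DEG bits: K division steps in one entry.
LUT = [mod2(h << DEG) for h in range(1 << K)]


def remainder_lut(bits: str) -> int:
    """Remainder of the bit string, reading K bits per table lookup."""
    head = len(bits) % K
    # Leading bits that do not fill a whole chunk are taken directly.
    r = mod2(int(bits[:head], 2)) if head else 0
    for i in range(head, len(bits), K):
        chunk = int(bits[i:i + K], 2)
        hi = r >> (DEG - K)                 # top K bits overflow and get reduced
        lo = (r << K) & ((1 << DEG) - 1)    # low bits stay within the remainder
        r = LUT[hi] ^ lo ^ chunk
    return r


print(format(remainder_lut("1000111011000"), "04b"))  # 0101
```

The result matches the bit-by-bit division, but each loop iteration replaces three shift-and-XOR steps with one table access.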

Page 12: Cyclic redundancy codes

Modify Ex. 1

Remainder slicing

Page 13: Cyclic redundancy codes

Sarwate Alg. disadvantage

The memory requirement is high when reading a large number of bits at a time. For example, to achieve acceleration by reading 32 bits at a time, the table-driven algorithm needs a table of 2^32 = 4G entries.
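The arithmetic behind this disadvantage can be checked directly. The sketch assumes 32-bit (4-byte) table entries and, for the slicing variants, 256-entry tables indexed by one byte each. Both are assumptions consistent with the footprints quoted elsewhere in the deck, and `table_bytes` is a hypothetical helper.

```python
ENTRY_BYTES = 4  # assumption: each table entry is a 32-bit remainder


def table_bytes(bits_per_lookup: int, num_tables: int = 1) -> int:
    """Total table size when each table is indexed by bits_per_lookup bits."""
    return num_tables * (1 << bits_per_lookup) * ENTRY_BYTES


single_32bit_table = table_bytes(32)   # 2**32 entries: 17179869184 bytes (16 GiB)
sarwate = table_bytes(8)               # 256 entries: 1024 bytes
slicing_by_4 = table_bytes(8, 4)       # four 256-entry tables: 4096 bytes
slicing_by_8 = table_bytes(8, 8)       # eight 256-entry tables: 8192 bytes

print(single_32bit_table // slicing_by_4)  # 4194304 times larger than slicing-by-4
```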

Page 14: Cyclic redundancy codes

First step

• P is the p most significant bits of the bit stream B

• l is the length of B, with l > p

• g is the length of the generator polynomial G, with g < l

• The l-g+1 most significant bits of B are the bits being encoded

• The g-1 least significant bits of B are equal to zero

Page 15: Cyclic redundancy codes

Continue

• $P = \{b_1, b_2, \ldots, b_p\}$, $B = \{b_1, b_2, \ldots, b_l\}$

• $P = [P_1 : P_2 : \ldots : P_m]$

• $p$ is the length of $P$ and $p = \sum_{i=1}^{m} p_i$, where $p_i$ is the length of slice $P_i$

• P is sliced so that the algorithm can read potentially large amounts of data without having to access a LUT of $2^{p}$ entries.

• Each slice $P_i$ has its own LUT $T_i$ of size $2^{p_i}$, which contains the remainders shifted by an offset $o_i$.

Page 16: Cyclic redundancy codes

Calculations

• $o_i = \sum_{j=i+1}^{m} p_j$

• Let $R_i^{(1)}$ be the values read from the LUTs during the first step: $R_i^{(1)} = P_i \cdot 2^{o_i} \bmod G$

• $R^{(1)} = \bigoplus_{i=1}^{m} R_i^{(1)}$

• $S^{(1)} = [R^{(1)} : Q^{(1)}] = R^{(1)} \cdot 2^{q} \oplus Q^{(1)}$

• $Q^{(1)}$ is the set of the next $q$ bits of the bit stream after the first $p$ bits: $Q^{(1)} = [b_{p+1} b_{p+2} \ldots b_{p+q}]$
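The first-step relations rest on the linearity of modulo-2 division: the remainder of P equals the XOR of the remainders of its slices, each shifted by its offset. The sketch below checks that identity numerically. The generator 0x11021 (a 16-bit CRC polynomial), the value of P, and the slice widths are all assumptions picked for illustration, and the helper names are hypothetical.

```python
def mod2(value: int, gen: int) -> int:
    """Bit-by-bit modulo-2 remainder."""
    glen = gen.bit_length()
    while value.bit_length() >= glen:
        value ^= gen << (value.bit_length() - glen)
    return value


def sliced_remainder(p_value: int, widths: list, gen: int) -> int:
    """XOR of per-slice remainders; widths lists the slice lengths p_i, MSB slice first."""
    remainder = 0
    offset = sum(widths)
    for width in widths:
        offset -= width                                   # o_i: sum of p_j for j > i
        p_i = (p_value >> offset) & ((1 << width) - 1)    # slice P_i
        # A table T_i would hold this value at index p_i.
        remainder ^= mod2(p_i << offset, gen)
    return remainder


GEN = 0x11021        # assumption: illustrative degree-16 generator
P = 0xDEADBEEF       # assumption: illustrative 32-bit prefix of the stream

direct = mod2(P, GEN)
sliced = sliced_remainder(P, [8, 8, 8, 8], GEN)
print(direct == sliced)  # True
```

Because each slice is only 8 bits wide here, four tables of 256 entries would replace one table of 2^32 entries.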

Page 17: Cyclic redundancy codes

Step k

• The first step differs from the other steps because the length $l$ of the input stream may not be a multiple of $q$, the number of bits read at a time.

• $f_i = \sum_{j=i+1}^{n} s_j$

• $R_i^{(k)} = S_i^{(k-1)} \cdot 2^{f_i} \bmod G$

• $R^{(k)} = \bigoplus_{i=1}^{n} R_i^{(k)}$

• $S^{(k)} = [R^{(k)} : Q^{(k)}] = R^{(k)} \cdot 2^{q} \oplus Q^{(k)}$

• $N = l/q + 1$

Page 18: Cyclic redundancy codes

Correctness

The authors prove the correctness of the algorithmic framework by showing that the value of R(n) produced in the last step of the framework is indeed the remainder from the division of the input stream B by the generator polynomial using modulo-2 arithmetic.

Page 19: Cyclic redundancy codes

Space and time requirements

• In the first step, m slices are created and m LUT lookups are performed.

• In the worst case, each slice needs one shift operation and one logical (AND) operation.

• m-1 XOR operations are required to execute the first step.

• The total number of operations, including shifts, ANDs, XORs, and LUT lookups, is O(1) = 4m - 1.

• Since the m LUT lookups are performed in parallel and count as one, this reduces to O(1) = 3m in the first step.

Page 20: Cyclic redundancy codes

Continue

• Summed over the steps, the total number of operations required for the execution is:

• $O = \sum_i O^{(i)} = 3n(N+1) + 3m$

n: number of LUTs

m-1: number of XORs

N: number of steps to execute

Page 21: Cyclic redundancy codes

Continue

• The space required for storing the tables used by the first step of the algorithmic framework is: $E^{(1)} = \sum_{i=1}^{m} 2^{p_i}$

• And in step k: $E^{(k)} = \sum_{i=1}^{n} 2^{s_i}$

Page 22: Cyclic redundancy codes

Reminder

• The slicing-by-4 algorithm requires 4 Kbytes of table space in total and reads 32 bits at a time.

• The slicing-by-8 algorithm requires 8 Kbytes of table space in total and reads 64 bits at a time.

Page 23: Cyclic redundancy codes

Evaluation

• There is a trade-off between the number of logical operations and the space requirement of the algorithm.

• If the tables are stored in an external memory unit, the latency of accessing them may be significantly higher than when they are stored in a cache unit.

• Slicing reduces the number of operations performed for each byte of an input stream.

Page 24: Cyclic redundancy codes

Comparison per input byte

Algorithm            No. of XOR   Shift operation   LUTs   Total operations in a loop
Sarwate Algorithm    2            1                 1      35
Slice by 4           1            1                 0      16

Page 25: Cyclic redundancy codes

Min. and Ave. processing cost

• "Warm" refers to any memory entry placed in cache memory

• "Cold" refers to any memory entry stored in an external memory unit

Page 26: Cyclic redundancy codes

Any Questions?