A Scalable and Reconfigurable Search Memory Substrate for High Throughput Packet Processing Sangyeun Cho and Rami Melhem Dept. of Computer Science University of Pittsburgh


A Scalable and Reconfigurable Search Memory Substrate for High Throughput Packet Processing

Sangyeun Cho and Rami Melhem

Dept. of Computer Science
University of Pittsburgh


Feb. 6 ’07 – CCW-21

Lookup ops in packet processing

Packet forwarding
• Given an IP address
• Look up in a table (IP table) a matching prefix
• Make sure the chosen prefix is the longest: LPM (Longest Prefix Matching)

Rule-based packet filtering
• Given a set of packet fields
• Look up in a rule database matching entries

Deep packet inspection
• Given a string in packet payload
• Look up in a signature database matching entries


Lookup performance scalability

Lookup performance must match increasing line speeds
• For OC-768, up to 104M packets must be processed per second
• Network traffic has doubled every year [McKeown03]
• Router capacity doubles every 18 months

Capacity pressure
• Routing tables (~200K prefixes in a core router) are growing [RIS]
• # of firewall rules increases; 100K rules are practical [Baboescu04]
• IPv6

Power and thermal issues are already a critical limiting factor in network processing device design [McKeown03]

Two conventional lookup solutions
• Software methods (tries, hash table, …)
• Hardware methods (TCAM, Bloom filter, …)


IP lookup using a trie

Consider an IP address: 0 1 0 0 0 1 1 0

“flexibility”

high memory capacity requirement

high memory bandwidth requirement

not SCALABLE
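A minimal sketch of trie-based longest prefix matching, to make the bandwidth concern concrete: each address bit consumed costs one node visit (one memory access in a naive layout). The prefix table, the next-hop labels "A" to "D", and the function names are hypothetical and chosen only for illustration; the address is the one from the slide.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # '0' or '1' -> TrieNode
        self.next_hop = None  # set when a stored prefix ends at this node


def insert(root, prefix, next_hop):
    """Add a prefix given as a bit string, e.g. '0100'."""
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    node.next_hop = next_hop


def longest_prefix_match(root, address):
    """Return (next_hop, nodes_visited) for a bit-string address."""
    node, best, visited = root, root.next_hop, 1
    for bit in address:
        node = node.children.get(bit)
        if node is None:
            break
        visited += 1
        if node.next_hop is not None:
            best = node.next_hop  # remember the longest match seen so far
    return best, visited


# Hypothetical routing table (not taken from the slides).
root = TrieNode()
for prefix, hop in [("0", "A"), ("0100", "B"), ("01000", "C"), ("10", "D")]:
    insert(root, prefix, hop)

hop, visited = longest_prefix_match(root, "01000110")
print(hop, visited)
```

The number of nodes visited grows with the length of the matched prefix, which is one way to read the "high memory bandwidth requirement" remark.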


IP lookup using TCAM

Consider an IP address: 0 1 0 0 0 1 1 0

110100*, 110101*, 110111*, 01000*, 01100*, 01101*, 11011*, 0100*, 0110*, 1101*, 10*, 0*

sort before storing

choose the first among the matched

high bandwidth, constant time lookup

TCAMs are relatively small, expensive

power consumption very high

not SCALABLE
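A software model of the TCAM lookup on this slide, as a sketch only: the entries are the slide's prefixes in their longest-first order, and the function name `tcam_lookup` is hypothetical. A real TCAM compares all entries in one cycle; the list comprehension below merely stands in for that parallel compare.

```python
# Prefixes from the slide, already sorted longest-first before storing.
TCAM_ENTRIES = [
    "110100*", "110101*", "110111*",
    "01000*", "01100*", "01101*", "11011*",
    "0100*", "0110*", "1101*",
    "10*", "0*",
]


def tcam_lookup(entries, address):
    """Compare the address against every entry, then pick the first match.

    Returns (index, entry) or None when nothing matches.
    """
    # Stand-in for the parallel compare across all stored entries.
    matches = [address.startswith(entry.rstrip("*")) for entry in entries]
    # Priority encoder: the lowest index among the matches wins.
    for index, hit in enumerate(matches):
        if hit:
            return index, entries[index]
    return None


print(tcam_lookup(TCAM_ENTRIES, "01000110"))
```

Because the entries are stored longest-first, the first match is also the longest matching prefix, which is why sorting before storing matters.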


CA-RAM – a hybrid approach

Can we do better than the existing conventional schemes?
• Flexibility and search performance
• Exploit optimized RAM designs

CA-RAM combines hashing w/ hardware parallel matching

CA-RAM design goals
• High lookup performance
• Low power consumption
• Smaller chip area per stored datum
• Straightforward system-level integration


CA-RAM – Content Addressable RAM

• Separate match logic and memory
• Match logic for a single row, not every row
• Allows the use of dense RAM technology
• Enables highly reconfigurable match logic
• Keep keys sorted in each row, not in entire array

[Figure: match logic and memory cells in a conventional CAM/TCAM vs. CA-RAM]


Very simple, yet efficient

• Use hashing to store keys in a particular row
• To look up, hash the key and retrieve one row
• Perform matching on entire row in parallel
• Achieve full content addressability w/o paying overhead!

[Figure: CA-RAM lookup. Labels: key, index generator, stored keys (Key i1, Key i2, Key j1, Key j2), match processors 1, 2, …]
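The lookup flow on this slide can be sketched in a few lines. This is a toy model under stated assumptions: the class name `CARAM`, the modulo hash used as the index generator, and the integer keys are all hypothetical; the slides do not specify a hash function.

```python
class CARAM:
    """Toy model: a RAM of `rows` buckets, each holding up to `slots` keys."""

    def __init__(self, rows, slots):
        self.rows = rows
        self.slots = slots
        self.memory = [[] for _ in range(rows)]

    def index(self, key):
        # Index generator. A modulo hash is an assumption for illustration.
        return key % self.rows

    def store(self, key):
        row = self.memory[self.index(key)]
        if len(row) >= self.slots:
            raise OverflowError("bucket full")
        row.append(key)

    def lookup(self, key):
        row = self.memory[self.index(key)]        # retrieve one row
        hits = [stored == key for stored in row]  # models the per-row match processors
        return any(hits)


cam = CARAM(rows=8, slots=4)
for k in (3, 11, 19, 42):
    cam.store(k)
print(cam.lookup(19), cam.lookup(27))
```

Only the selected row is compared, so the match logic is sized for one row rather than for the whole array.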


Pipelined CA-RAM operation

[Figure: pipelined lookup. Labels: search key, index generator, index, stored keys (Key i1 to Key i3, Key j1 to Key j3), match processors 1 to 3, result]

Step 1: Index generation
Step 2: Memory access
Step 3: Key matching
Step 4: Result forwarding
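A small scheduling sketch of the four steps, assuming one cycle per stage and one new lookup issued every cycle. Both timing assumptions and the function name `pipeline_schedule` are hypothetical; the slides name the stages but give no cycle counts.

```python
STAGES = ["index generation", "memory access", "key matching", "result forwarding"]


def pipeline_schedule(num_lookups):
    """Return {cycle: [(lookup_id, stage), ...]}.

    Assumes one cycle per stage and one lookup issued per cycle.
    """
    schedule = {}
    for lookup in range(num_lookups):
        for offset, stage in enumerate(STAGES):
            schedule.setdefault(lookup + offset, []).append((lookup, stage))
    return schedule


schedule = pipeline_schedule(3)
total_cycles = max(schedule) + 1
print(total_cycles)
for cycle in sorted(schedule):
    print(cycle, schedule[cycle])
```

Under these assumptions, n lookups finish in n + 3 cycles instead of 4n, so steady-state throughput approaches one lookup per cycle.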


Dealing w/ bucket overflows

Careful design of hash function

Increase bucket size
• Reduce load factor (α); α = # of occupied entries / # of total entries

Use “chaining”; store overflows in subsequent rows
• Multiple accesses per lookup

Use a small overflow CAM, accessed in parallel
• Similar to popular “victim caching” in computer architecture

Use two-level hashing and employ multiple CA-RAM banks

……
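A sketch of the "chaining" option together with the load factor definition from this slide. The class name `ChainedCARAM`, the modulo hash, and the example keys are hypothetical; the early exit on a non-full row is valid only because this toy model never deletes entries.

```python
class ChainedCARAM:
    """Toy model: overflowing keys spill into the subsequent rows."""

    def __init__(self, rows, slots):
        self.rows, self.slots = rows, slots
        self.memory = [[] for _ in range(rows)]

    def load_factor(self):
        # occupied entries / total entries
        occupied = sum(len(row) for row in self.memory)
        return occupied / (self.rows * self.slots)

    def store(self, key):
        home = key % self.rows  # hypothetical hash function
        for step in range(self.rows):
            row = self.memory[(home + step) % self.rows]
            if len(row) < self.slots:
                row.append(key)
                return step + 1  # number of rows touched
        raise OverflowError("table full")

    def lookup(self, key):
        """Return (found, rows_accessed)."""
        home = key % self.rows
        for step in range(self.rows):
            row = self.memory[(home + step) % self.rows]
            if key in row:
                return True, step + 1
            if len(row) < self.slots:  # a non-full row ends the chain (no deletions)
                return False, step + 1
        return False, self.rows


table = ChainedCARAM(rows=4, slots=2)
for k in (0, 4, 8, 12, 1):
    table.store(k)
print(table.load_factor(), table.lookup(0), table.lookup(8), table.lookup(16))
```

Keys 8 and 12 spill out of row 0, and key 1 is pushed out of its own home row as a result, which shows why chaining costs multiple accesses per lookup as the load factor rises.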


CA-RAM reconfig. opportunities

Reconfigurable match logic allows:

Adapting key size to apps
• Same hardware to support multiple apps or standards

……


Adapting key size

[Figure: reconfigurable match logic selects key bits for matching across the stored keys (Key i1 to Key i3, Key j1 to Key j3) and outputs match information]

Adapting key size is straightforward

Will benefit supporting multiple apps/standards


CA-RAM reconfig. opportunities

Reconfigurable match logic allows:

Adapting key size to apps
• Same hardware to support multiple apps or standards

Binary and ternary matching
• Some apps require ternary matching, some don’t

……


Supporting binary/ternary matching

[Figure: reconfigurable match logic compares the search key with stored keys (Key i1, Key i2, Key j1, Key j2) and masks (Mask i1, Mask j1), considering mask bits or not, and outputs match information]

Developed configurable comparator

T-matching requires 2 bits / 1 symbol

Supporting different types of matching in different bit positions feasible
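A sketch of binary vs. ternary comparison using a key word plus a mask word, which is one way to realize "2 bits / 1 symbol". The convention that a mask bit of 1 means "compare this bit" is an assumption, as are the function names and the 8-bit width; the slides do not define the mask polarity.

```python
def binary_match(search_key, stored_key):
    """Exact match: the mask bits are not considered."""
    return search_key == stored_key


def ternary_match(search_key, stored_key, care_mask):
    """True when the keys agree on every bit where care_mask is 1.

    Bits where care_mask is 0 are treated as don't-care (assumed polarity).
    """
    return ((search_key ^ stored_key) & care_mask) == 0


def encode_prefix(prefix, width=8):
    """Turn a prefix such as '0100*' into (stored_key, care_mask) integers."""
    bits = prefix.rstrip("*")
    pad = width - len(bits)
    stored = (int(bits, 2) << pad) if bits else 0
    mask = ((1 << len(bits)) - 1) << pad
    return stored, mask


stored, mask = encode_prefix("0100*")
print(stored, mask, ternary_match(0b01000110, stored, mask))
```

Setting the mask to all ones makes the ternary comparator behave like the binary one, which is the sense in which a single configurable comparator can serve both kinds of application.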


CA-RAM reconfig. opportunities

Reconfigurable match logic allows:

Adapting key size to apps
• Same hardware to support multiple apps or standards

Binary and ternary matching
• Some apps require ternary matching, some don’t

Storing data and keys in a CA-RAM module
• Cuts # of memory accesses for IP lookup by half

……


Simult. key matching & data access

[Figure: reconfigurable match logic matches the search key against stored keys (Key i1, Key i2, Key j1, Key j2) and bypasses the stored data (Data i1, Data j1), outputting match information & data]

Data access follows TCAM lookup

CA-RAM supports data embedding

Cuts memory traffic & latency by half
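A sketch contrasting the two access patterns on this slide by counting memory accesses. The function names, the modulo hash, and the "port" strings are hypothetical placeholders; the TCAM side is modeled as a plain list search.

```python
def tcam_then_sram(tcam_keys, sram_data, key):
    """Two-step lookup: search the TCAM for an index, then read a separate
    data memory at that index. Returns (data, memory_accesses)."""
    if key not in tcam_keys:
        return None, 1                 # access 1 only: TCAM search, no match
    index = tcam_keys.index(key)       # access 1: TCAM search
    return sram_data[index], 2         # access 2: data memory read


def caram_embedded(rows, key):
    """One-step lookup: each row stores (key, data) pairs, so the data of the
    matching entry comes out of the same row read."""
    row = rows[key % len(rows)]        # access 1: fetch the row (hypothetical hash)
    for stored_key, data in row:
        if stored_key == key:
            return data, 1
    return None, 1


tcam_keys = [10, 21, 33]
sram_data = ["port1", "port2", "port3"]

rows = [[] for _ in range(4)]
for k, d in zip(tcam_keys, sram_data):
    rows[k % 4].append((k, d))

print(tcam_then_sram(tcam_keys, sram_data, 21), caram_embedded(rows, 21))
```

A successful lookup costs two accesses in the first model and one in the second, matching the "by half" claim for the hit case.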


CA-RAM reconfig. opportunities

Reconfigurable match logic allows:

Adapting key size to apps
• Same hardware to support multiple apps or standards

Binary and ternary matching
• Some apps require ternary matching, some don’t

Storing data and keys in a CA-RAM module
• Cuts # of memory accesses for IP lookup by half

Providing range checking capabilities
• Beneficial for rule-based packet filtering

……


Supporting range checking

[Figure: reconfigurable match logic matches the search key against stored keys (Key i1, Key j1), checks the stored ranges (Range i1, Range j1), and outputs match information]

(Range checking causes troubles)

(Entries must be expanded)

CA-RAM can support range checking efficiently
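A sketch of "match key & check range" on one row: each entry carries an exact-match key and an inclusive range, so a range occupies a single entry instead of being expanded. The `RangeEntry` fields, the protocol and port numbers, and the action strings are hypothetical illustration values.

```python
from dataclasses import dataclass


@dataclass
class RangeEntry:
    key: int     # exact-match field (here: a protocol number, as an example)
    low: int     # inclusive lower bound of the range field
    high: int    # inclusive upper bound of the range field
    action: str


def match_row(row, search_key, value):
    """Return the action of the first entry whose key matches and whose
    range contains `value`, or None."""
    for entry in row:
        if entry.key == search_key and entry.low <= value <= entry.high:
            return entry.action
    return None


row = [
    RangeEntry(6, 1024, 65535, "allow"),
    RangeEntry(6, 0, 1023, "deny"),
    RangeEntry(17, 53, 53, "allow-dns"),
]
print(match_row(row, 6, 8080), match_row(row, 6, 22), match_row(row, 17, 80))
```

The comparison uses two magnitude checks per entry, which is the extra capability the match logic needs beyond equality.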


Evaluation

We implemented a CA-RAM design (w/ reconfigurability) and evaluated its power and area advantages over state-of-the-art TCAMs

We experimented with real routing tables to estimate the load factor and the average memory accesses per lookup


Comparing CA-RAM and TCAM

[Figure: per-cell area (µm²) @130nm CMOS for 16T SRAM-based TCAM, 8T DRAM-based TCAM, 6T DRAM-based TCAM, and DRAM-based ternary CA-RAM; annotated 4.5x and 11x]

[Figure: power (W), 4.5Mb @143MHz, for the same four designs; annotated 14x and 4x]

CA-RAM area advantage 4.5x~11x

CA-RAM power advantage 4x~14x


Mapping a large IP routing table

Consider multiple design points:

[Figure: design points A to F. Labels: 2,048 rows (32 entries); 4,096 rows (64 entries); load factors (α = 0.47), (α = 0.40), (α = 0.36), (α = 0.36), (α = 0.24), (α = 0.36)]


Mapping a large IP routing table

[Figure: spilled entries (%) for Designs A to F]

[Figures: average memory access latency (AMAL) for Designs A to F under “uniform” traffic and under “skewed” traffic; labels under Designs A to F: (α = 0.47) (α = 0.40) (α = 0.36) (α = 0.36) (α = 0.24) (α = 0.36)]

With a properly chosen α, CA-RAM achieves near-constant AMAL


Mapping a large IP routing table

[Figure: area and power, TCAM vs. CA-RAM (Design B)]

CA-RAM advantageous over TCAM


Conclusions

Compared w/ software methods
• Fewer memory accesses; higher lookup performance

Compared w/ TCAM
• Higher density, matching that of DRAM; allows a large lookup table
• Exceeds the speed of TCAM
• Low power – a critical advantage for cost-effective system design

Reconfigurability
• Can accommodate apps having different key/record sizes, binary vs. ternary searching requirements, range checking, …
• Can adopt new standards much more easily, e.g., IPv6


CA-RAM components

[Figure: CA-RAM components. Labels: index generator, stored keys (Key i1, Key i2, Key j1, Key j2), match processors 1, 2, …, result bus; dimensions: C bits, 2^R rows, N bits]