1
High-performance TCAM-based IP Lookup Engines
Authors: Hui Yu, Jing Chen, Jianping Wang and S.Q. Zheng
Publisher: IEEE INFOCOM 2008
Presenter: 林呈俞
Date: 2008/9/24
2
Outline
Introduction
• Previous works
• MSMB scheme
• MSMB-PT scheme
MSMB-LPT scheme
Goals of this paper
Proposed works
• M-MSMB-LPT scheme
• MSMB-LPT-I scheme
Experimental results
3
Introduction (1/3)
To achieve high IP lookup performance, it has been proposed to use TCAMs to implement IP-lookup accelerators.
In practice, one TCAM-based routing table is shared by multiple packet streams in one line card or in multiple line cards.
Previous works reconfigure a TCAM into several independent blocks:
• MSMB
• MSMB-PT
• MSMB-LPT
4
Introduction (2/3)
MSMB (Multi-Selector and Multi-Block) scheme
• Proposed in [6] to reconfigure a TCAM into several independent blocks so that parallel IP lookup is possible.
• With K TCAMs, instead of performing only one lookup in each cycle, all TCAMs can be used concurrently for different lookups.
• One would need M parallel RDs for this system.
5
Introduction (3/3)
MSMB-PT (Popular-Prefix Table) scheme
• This scheme is based on the temporal locality of packet destinations.
• It aims to alleviate the TCAM contention problem caused by traffic bias.
Popular-Prefix Table (PT): caches some of the prefixes recently used by all inputs.
6
MSMB-LPT (Local PT) (1/2)
A flow is a stream of packets that are transmitted as a bursty sequence.
For a given router R, the packets of flows arriving at the same input of R exhibit a bias toward a small set of IP prefixes.
For any bursty traffic period of an input of R, this bias of IP addresses is called the temporal locality of flows.
The major difference between MSMB-LPT and MSMB-PT is as follows:
• MSMB-PT: captures temporal locality global to all inputs.
• MSMB-LPT: captures the temporal locality of flows.
MSMB-LPT improves on the performance of MSMB-PT by up to 250% (speedup), 80% (hit ratio), 82% (TCAM contention), and 71% (TCAM power consumption).
LPT helps to reduce the number of accesses to the TCAM blocks and the number of TCAM contentions.
7
MSMB-LPT (Local PT) (2/2)
Local Popular-Prefix Table (LPT): used to dynamically store recently referenced IP prefixes requested from input i.
Contention Resolver (CR): chooses one request according to a priority scheme and passes it to the TCAM.
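As an illustration of what an LPT does for one input, the following is a minimal Python sketch of a small prefix cache with FIFO replacement (the policy the simulations later in this deck use for LPT updates). The class name `LocalPrefixTable`, its methods, and the next-hop strings are hypothetical and are not taken from the paper. A miss stands for a request that would be forwarded to a CR and a TCAM block.

```python
from collections import OrderedDict
from ipaddress import ip_address, ip_network


class LocalPrefixTable:
    """FIFO cache of recently referenced prefixes for one input (hypothetical API)."""

    def __init__(self, size):
        self.size = size              # n in the (m, n, k) configuration
        self.entries = OrderedDict()  # prefix -> next hop, kept in insertion order

    def lookup(self, addr):
        """Return the next hop of the longest cached prefix covering addr, or None."""
        ip = ip_address(addr)
        best = None
        for prefix, next_hop in self.entries.items():
            if ip in prefix and (best is None or prefix.prefixlen > best[0].prefixlen):
                best = (prefix, next_hop)
        return None if best is None else best[1]

    def insert(self, prefix, next_hop):
        """Cache a prefix returned by a TCAM search, evicting the oldest when full."""
        net = ip_network(prefix)
        if net in self.entries:
            return                    # FIFO: a repeated reference does not refresh age
        if len(self.entries) >= self.size:
            self.entries.popitem(last=False)  # drop the oldest entry
        self.entries[net] = next_hop


lpt = LocalPrefixTable(size=2)
lpt.insert("10.0.0.0/8", "A")
lpt.insert("192.168.1.0/24", "B")
hit = lpt.lookup("10.1.2.3")        # served locally, no TCAM access needed
lpt.insert("172.16.0.0/12", "C")    # evicts 10.0.0.0/8, the oldest entry
miss = lpt.lookup("10.1.2.3")       # None: would now go to a CR and a TCAM block
```

The linear scan is only for readability; a hardware LPT would match all cached prefixes in parallel.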
8
Goals of this paper
How to design a TCAM-based IP lookup engine that
• improves MSMB-LPT without using more HW resources?
• satisfies given performance requirements?
For large m (inputs):
• How to design a scalable TCAM-based IP lookup engine?
• How to find tradeoffs among cost, performance and reliability?
9
Proposed work (1/5)
Definitions:
• MSMB-LPT has a configuration (m, n, k):
• m inputs
• k TCAM blocks
• LPT of size n
• Total number of prefixes M (each block contains M/k prefixes).
The parameters m and k are carefully selected to achieve optimized cost and performance.
Are there better MSMB schemes for given m and k ?
Two proposed schemes:
• M – MSMB – LPT
• MSMB – LPT – I
10
Proposed work (2/5)
Multiple (M)-MSMB-LPT
• For large m (inputs), we propose to use w identical copies of an MSMB-LPT of configuration (m’, n, k), where m’ = m / w.
• Input i*m’ + j serves as the j-th input of the (i+1)-th MSMB-LPT.
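The input-to-copy mapping can be sketched in a few lines of Python. The function name `map_input` is hypothetical. The sketch assumes that inputs are numbered from 1 and that w divides m evenly, neither of which is stated explicitly on the slide.

```python
def map_input(global_input, m, w):
    """Map a 1-based global input number to (copy, local_input), both 1-based.

    Assumes w divides m, so each MSMB-LPT copy serves m' = m / w inputs.
    """
    if m % w != 0:
        raise ValueError("this sketch assumes w divides m evenly")
    if not 1 <= global_input <= m:
        raise ValueError("input number out of range")
    m_prime = m // w
    # global_input = i*m' + j with 0 <= i < w and 1 <= j <= m'
    i, j0 = divmod(global_input - 1, m_prime)
    return i + 1, j0 + 1   # j-th input of the (i+1)-th MSMB-LPT


# With m = 36 inputs split over w = 4 copies, each copy serves m' = 9 inputs.
first = map_input(1, 36, 4)     # (1, 1)
boundary = map_input(10, 36, 4) # (2, 1): first input of the second copy
last = map_input(36, 36, 4)     # (4, 9)
```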
11
Proposed work (3/5)
Multiple (M)-MSMB-LPT
The w TCAM blocks TCAMj,u, where j = 1 ~ w, have the same content as TCAMu in MSMB-LPT.
We say that an M-MSMB-LPT has configuration (m, n, w, k) if it has w MSMB-LPTs of configuration (m’, n, k).
In an M-MSMB-LPT scheme, the w MSMB-LPTs operate completely independently.
[Figure: MSMB-LPTj, with inputs (j-1)*m’ + 1, (j-1)*m’ + 2, …, j*m’ feeding k CRs and k TCAM blocks]
12
Proposed work (4/5)
MSMB-LPT-Interleaved TCAMs (MSMB-LPT-I)
• An MSMB-LPT-I of configuration (m, n, w, k) has
• m inputs and LPTs of size n.
• wk TCAM blocks that are partitioned into k groups, each called a TCAM bundle.
The w TCAM blocks in the j-th TCAM bundle contain the same content as TCAMj in the MSMB-LPT scheme.
[Figure: Input 1, Input 2, …, Input m connected to k bundles]
13
Proposed work (5/5)
[Figure: processes that run concurrently for i = 1 ~ m and j = 1 ~ k, labeled with the ni-th key from input i]
The concurrent TCAM-search processes are coordinated by the CR, which can be implemented as a round-robin m-to-w selector.
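One possible reading of a round-robin m-to-w selector is sketched below in Python: in each cycle, up to w of the m requesting inputs are granted access to the w blocks of a bundle, and the priority pointer then moves past the last granted input. The class name `RoundRobinSelector` and the pointer-update rule are assumptions for illustration; the slide does not specify them.

```python
class RoundRobinSelector:
    """Grants up to w of m requesters per cycle in rotating priority order."""

    def __init__(self, m, w):
        self.m = m
        self.w = w
        self.next_start = 0   # 0-based input with the highest priority next cycle

    def select(self, requesting):
        """Return the list of granted input indices (0-based) for this cycle."""
        pending = set(requesting)
        granted = []
        for offset in range(self.m):
            i = (self.next_start + offset) % self.m
            if i in pending:
                granted.append(i)
                if len(granted) == self.w:
                    break
        if granted:
            # Assumed policy: resume just after the last input that was served.
            self.next_start = (granted[-1] + 1) % self.m
        return granted


cr = RoundRobinSelector(m=4, w=2)
cycle1 = cr.select([0, 1, 2, 3])   # inputs 0 and 1 are served first
cycle2 = cr.select([0, 1, 2, 3])   # priority has rotated to inputs 2 and 3
cycle3 = cr.select([1, 3])         # only the requesting inputs are considered
```

Requests that are not granted in a cycle count as contentions and would be retried in a later cycle.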
14
Experimental results (1/9)
We conduct a series of simulations on M-MSMB-LPT and MSMB-LPT-I.
• A first-in-first-out (FIFO) replacement policy is used for LPT update.
• Round-robin (RR) arbitration is used for TCAM contention resolution.
Two packet traces are used in the simulations.
• 1. Generated according to the routing table described in [17].
• 2. Derived from actual packet flows given in [19].
The performance of an M-MSMB-LPT is determined by a single component MSMB-LPT.
The performance of MSMB-LPT and M-MSMB-LPT can be derived from the performance of MSMB-LPT-I with configurations (m, n, w, k) as follows.
• (m, n, 1, k) = MSMB-LPT with (m, n, k).
• (m, n, 1, k) = M-MSMB-LPT with (w*m, n, w, k).
• Example:
• MSMB-LPT-I with (6, n, 1, 4) can be used to indicate the performance of M-MSMB-LPT with (12, n, 2, 4) as well as (18, n, 3, 4).
[Slide annotation on the configuration tuple: # bundles, # blocks]
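The configuration arithmetic in the example can be checked with a short Python sketch. The function name `equivalent_m_msmb_lpt` is hypothetical, and the LPT size n is kept symbolic as the string "n" because the slide does not give a value.

```python
def equivalent_m_msmb_lpt(m, n, k, w):
    """M-MSMB-LPT configuration whose performance is indicated by MSMB-LPT-I (m, n, 1, k).

    Each of the w independent copies serves m inputs, so the total is w*m inputs.
    """
    return (w * m, n, w, k)


# MSMB-LPT-I with (6, n, 1, 4) stands in for these M-MSMB-LPT configurations.
configs = [equivalent_m_msmb_lpt(6, "n", 4, w) for w in (2, 3)]
```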
15
Experimental results (2/9)
Performance metrics
• TCAM contention ratio: # contentions at TCAM blocks / total # key searches.
• Speedup over naïve MSMB: based on the total # parallel cycles to complete IP lookup for all packets in a trace.
• TCAM utilization: based on AMSMB-LPT-I(j), the total # cycles in which the TCAMj blocks are searched.
19
Experimental results (6/9)
Contention ratio
• 36 inputs and 4 TCAM blocks in each bundle.
• Increase the number of TCAM bundles.
• From 1 to 2
• From 4 to 6
[Figure: contention ratio for configurations (36, n, w, 4), w = 1, 2, 4, 6]
20
Experimental results (7/9)
Given the available TCAM resources, such as
• # TCAM bundles: 2
• # TCAM blocks in each bundle: 4
it is important to know the expected contention ratio for different numbers of inputs.
[Figure: contention ratio for configurations (m, n, 2, 4), m = 6, 12, 18, 36]
21
Experimental results (8/9)
Speedup gain from increasing the number of TCAM bundles for a given # inputs.
[Figure: speedup for configurations (36, n, w, 4), w = 1, 2, 4, 6]