View
231
Download
1
Category
Tags:
Preview:
Citation preview
1 2003 ©UCR
CS 260Lecture 1: Introduction to
Network Processors
Instructor: L.N. Bhuyanwww.cs.ucr.edu/~bhuyan/CS260
2 2003 ©UCR
Outline°Introduction to NP Systems
°Relevant Applications
°Design Issues and Challenges
°Relevant Software and Benchmarks
°A case study: Intel IXP network processors
3 2003 ©UCR
What are Network Processors°Any device that executes programs to handle packets in a data network
°Examples• Processors on router line cards
• Processors in network access equipment
4 2003 ©UCR
Why Network Processors° Current Situation• Data rates are increasing
• Protocols are becoming more dynamic and sophisticated
• Protocols are being introduced more rapidly
° Processing Elements• GP(General-purpose Processor)
- Programmable, Not optimized for networking applications
• ASIC(Application Specific Integrated Circuit)- high processing capacity, long time to develop, Lack the flexibility
• NP(Network Processor)- achieve high processing performance
- programming flexibility
- Cheaper than GP
5 2003 ©UCR
Typical NP Architecture
SDRAM(Packet buffer)
SRAM(Routing table)
multi-threaded processing elements
Co-processor
Input ports Output ports
Network Processor
Bus Bus
6 2003 ©UCR
TCP/IP Model
° ISO OSI (Open Systems Interconnection) not fully implemented
° Presentation and Session layers not present in TCP/IP
Application
Pre.
Session
Transport
Network
Data Link
Physical
7
6
5
4
3
2
1
Application
TCP
IP
Host-to-Net
OSI TCP/IP
7 2003 ©UCR
Processing Tasks
Policy Applications
Network Management
Signaling
Topology Management
Queuing / Scheduling
Data Transformation
Classification
Data Parsing
Media Access Control
Physical Layer
DataPlane
ControlPlane
Source: Network Processor Tutorial in Micro 34 - Mangione-Smith & Memik
8 2003 ©UCR
Application Categorization
° Control-Plane tasks• Less time-critical
• Control and management of device operation
- Table maintenance, port states, etc.
° Data-Plane tasks• Operations occurring real-time on
“packet path”
• Core device operations- Receive, process and transmit packets
9 2003 ©UCR
Data Plane Tasks
° Media Access Control• Low-level protocol implementation
- Ethernet, SONET framing, ATM cell processing, etc.
° Data Parsing• Parsing cell or packet headers for address or protocol
information
° Classification• Identify packet against a criteria (filtering / forwarding
decision, QoS, accounting, etc.)
° Data Transformation• Transformation of packet data between protocols
° Traffic Management• Queuing, scheduling and policing packet data
10 2003 ©UCR
Applications: IPv4 Routing
° Routers determine next hop and forward packets
RouterA
B
C
PP P
11 2003 ©UCR
URL-based switching – My NSF Project
° Increase efficiency
° Tasks• Traverse the packet data (request) for each
arriving packet and classify it:- Contains ‘.jpg’ -> to image server
- Contains ‘cgi-bin/’ -> to application server
Switch
Image Server
Application Server
HTML Server
www.yahoo.comInternet
GET /cgi-bin/form HTTP/1.1 Host: www.yahoo.com…
APP. DATATCPIP
12 2003 ©UCR
Organizing Processor Resources°Design decisions:
• High-level organization
• ISA and micro architecture
• Memory and I/O integration
°Today’s commercial NPs:• Chip multiprocessors
• Most are multithreaded
• Exploit little ILP (Cisco does)
• No cache
• Micro-programmed
13 2003 ©UCR
Architectural Comparisons°High-level organizations
• Aggressive superscalar (SS)
• Fine-grained multithreaded (FGMT)
• Chip multiprocessor (CMP)
• Simultaneous multithreaded (SMT)
14 2003 ©UCR
Multithreading°Basic idea
• multiple register sets in the processor
• fast context switch
• switch thread on a cache access (How is this different than non-blocking cache?)
• tolerating local latency vs remote in CC-NUMA multiprocessors
• hybrids- switch on notice
- simultaneous multithreading
15 2003 ©UCR
Architectural Comparisons (cont.)Ti
me
(pro
cess
or
cycle
)Superscalar Fine-Grained Coarse-Grained Multiprocessing
Thread 1
Thread 2
Thread 3
Thread 4
Thread 5
Idle slot
SimultaneousMultithreading
16 2003 ©UCR
Tasks and Services
Three Benchmarks used in the experiment
17 2003 ©UCR
Some Challenges° Intelligent Design
• Given a selection of programs, a target network link speed, the ‘best’ design for the processor
- Least area
- Least power
- Most performance
° Write efficient multithreaded programs
• NPs have- Heterogeneous computer resources- Non-uniform memory- Multiple interacting threads of execution- Real-time constraints
• Make use of resources- How to use special instructions and hardware assists
– Compilers– Hand-coded
• Multithreaded programs- Manage access to shared state- Synchronization between threads
18 2003 ©UCR
Benchmarks for Network Processors• NetBench
- 10 applications
- http://cares.icsl.ucla.edu/NetBench
• CommBench- 8 networking and communications
applications
- http://ccrc.wustl.edu/~wolf/cb/
• EEMBC- http://www.eembc.org/benchmark
• MediaBench- Transcoders
- Some communications applications
19 2003 ©UCR
IXP1200 Block Diagram
° StrongARM processing core
° Microengines introduce new ISA
° I/O• PCI
• SDRAM
• SRAM
• IX : PCI-like packet bus
° On chip FIFOs• 16 entry 64B each
20 2003 ©UCR
IXP1200 Microengine° 4 hardware contexts
• Single issue processor
• Explicit optional context switch on SRAM access
° Registers• All are single ported
• Separate GPR
• 256*6 = 1536 registers total
° 32-bit ALU• Can access GPR or XFER registers
° Shared hash unit• 1/2/3 values – 48b/64b
• For IP routing hashing
° Standard 5 stage pipeline
° 4KB SRAM instruction store – not a cache!
° Barrel shifter
21 2003 ©UCR
IXP 2400 Block Diagram
° XScale core replaces StrongARM
° Microengines• Faster
• More: 2 clusters of 4 microengines each
° Local memory
° Next neighbor routes added between microengines
° Hardware to accelerate CRC operations and Random number generation
° 16 entry CAM
ME0 ME1
ME2ME3
ME4 ME5
ME6ME7
Scratch/Hash/CSR
MSF Unit
DDR DRAM controller
XScaleCore
QDR SRAM controller
PCI
22 2003 ©UCR
Different Types of Memory
Type Width
(byte)
Size
(bytes)
Approx unloaded latency (cycles)
Notes
Local 4 2560 1 Indexed addressing post incr/decr
On-chip Scratch
4 16K 60 Atomic ops
SRAM 4 256M 150 Atomic ops
DRAM 8 2G 300 Direct path to/fro MSF
23 2003 ©UCR
IXA Software Framework
Control Plane Protocol Stack
Control Plane PDK
Core Components
Core Component Library
Resource Manager Library
Microblock Library
Protocol Library
Hardware Abstraction Library
Microblock
Microblock
Microblock
Utility Library
XScaleCore
MicroenginePipeline Microengine
C Language
C/C++ Language
ExternalProcessors
24 2003 ©UCR
Example Toaster System: Cisco 10000
° Almost all data plane operations execute on the programmable XMC
° Pipeline stages are assigned tasks – e.g. classification, routing, firewall, MPLS
• Classic SW load balancing problem
° External SDRAM shared by common pipe stages
25 2003 ©UCR
IBM PowerNP
° 16 pico-procesors and 1 powerPC
° Each pico-processor• Support 2 hardware threads
• 3 stage pipeline : fetch/decode/execute
° Dyadic Processing Unit• Two pico-processors
• 2KB Shared memory
• Tree search engine
° Focus is layers 2-4
° PowerPC 405 for control plane operations
• 16K I and D caches
° Target is OC-48
26 2003 ©UCR
Motorola C-Port C-5 Chip Architecture
text
text
Q ueueM ngt U n it
FabricP rocessor
Tab leLookup
U nit
B uffe r M ngtU nit
E xecutiveP rocessor
C P -0
P H Y
C P -1
P H Y
C P -2
P H Y
C P -3
P H Y
C luster
textC P -12
P H Y
C P -13
P H Y
C P -14
P H Y
C P -15
P H Y
C luster
60G bps B usses
S R A MS R A M S R A MS witchFabric
PR
OM
PC
I
CO
NT
RO
L
27 2003 ©UCR
References° [NPT] W. H. Mangione-Smith, G. Memik Network Processor
Technologies
° [NPRD] Patrick Crowley, Raj Yavatkar An Introduction to Network Processor Research & Design, HPCA-9 Tutorial
Recommended