12/18/05 Yan Luo, CAR of UML 1 Network Processor: Architecture and Applications Yan Luo [email protected] http://faculty.uml.edu/yluo/

Network Processor: Architecture and Applications

Page 1: Network Processor: Architecture and Applications

12/18/05 Yan Luo, CAR of UML 1

Network Processor: Architecture and Applications

Yan Luo [email protected]

http://faculty.uml.edu/yluo/

Page 2: Network Processor: Architecture and Applications

Outline

• Overview of Network Processors
• Network Processor Architectures
• Applications
• Case Studies
    • Wireless Mesh Network
    • A Content-Aware Switch

Conclusion

Page 3: Network Processor: Architecture and Applications

Packet Processing in the Future Internet

• High processing power
• Support wire speed
• Programmable
• Scalable
• Optimized for network applications
• …

[Figure: the Future Internet brings more packets and complex packet processing; labels: ASIC, General-Purpose Processors]

Page 4: Network Processor: Architecture and Applications

What Is a Network Processor?

Programmable processors optimized for network applications and protocol processing

• High performance
• Programmable & flexible
• Optimized for packet processing

Main players: AMCC, Intel, Hifn, EZchip, Agere

Semico Research Corp. Oct. 14, 2003

Page 5: Network Processor: Architecture and Applications

Commercial Network Processors

Vendor   Product      Line speed        Features
AMCC     nP7510       OC-192/10 Gbps    Multi-core, customized ISA, multi-tasking
Intel    IXP2850      OC-192/10 Gbps    Multi-core, h/w multi-threaded, coprocessor, h/w accelerators
EZchip   NP-2         OC-192/10 Gbps    Classification engines, traffic managers
Hifn     5NP4G        OC-48/2.5 Gbps    Multi-threaded multiprocessor complex, h/w accelerators
Agere    PayloadPlus  OC-192/10 Gbps    Multi-threaded, on-chip traffic management

Page 6: Network Processor: Architecture and Applications

Typical Network Processor Architecture

[Figure: block diagram; labels: PE, co-processor, h/w accelerator, network interfaces, network processor bus, SDRAM (e.g. packet buffer), SRAM (e.g. routing table)]

Page 7: Network Processor: Architecture and Applications

Intel IXP2400 Network Processor

Page 8: Network Processor: Architecture and Applications

Snapshots of IXP2xxx-Based Systems

Radisys ENP2611 PCI Packet Processing Engine

ADI Roadrunner Platform

• Multiservice switches
• Routers, broadband access devices
• Intrusion detection and prevention (IDS/IPS)
• Voice over IP (VoIP) gateway
• Virtual Private Network gateway
• Content-aware switch

• IPv4 Forwarding/NAT
• Forwarding w/ QoS / DiffServ
• ATM RAN
• IP RAN
• IPv6/v4 dual stack forwarding

Page 9: Network Processor: Architecture and Applications

Intel IXP425 Network Processor

Page 10: Network Processor: Architecture and Applications

StarEast: IXP425-Based Multi-radio Platform

Page 11: Network Processor: Architecture and Applications

Applications of Network Processors

DSL modem

Wireless router

VoIP terminal

Printer server

Edge router

VPN gateway

Core router

Page 12: Network Processor: Architecture and Applications

Case Study 1: Wireless Mesh Network

Page 13: Network Processor: Architecture and Applications

Software Stack on StarEast

Page 14: Network Processor: Architecture and Applications

Case Study 2: Content-aware Switch

[Figure: requests for www.yahoo.com arrive from the Internet at a switch in front of a media server, an application server, and an HTML server. Example request: GET /cgi-bin/form HTTP/1.1 Host: www.yahoo.com… Packet layers shown: APP. DATA, TCP, IP]

• Front end of a Web cluster, with only one virtual IP
• Routes packets based on Layer 5 information
    • Examines application data in addition to IP & TCP
• Advantages over Layer 4 switches
    • Better load balancing: requests are distributed based on content type
    • Faster response: exploits cache affinity
    • Better resource utilization: partitions the database
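The Layer 5 routing decision described on this slide can be sketched in a few lines of Python. This is an illustration only, not the switch's actual code: the pool names, the media suffix list, and the functions `request_path`, `classify_path`, and `select_server` are hypothetical, and sending `/cgi-bin/` paths to the application server is an assumption drawn from the example request above.

```python
# Hypothetical back-end pools; the slide names only the three server roles.
SERVER_POOLS = {
    "media": ["media-1", "media-2"],
    "application": ["app-1"],
    "html": ["html-1", "html-2"],
}

# Assumed list of suffixes that mark media content.
MEDIA_SUFFIXES = (".mp3", ".mpg", ".avi", ".jpg", ".gif")


def request_path(request: bytes) -> str:
    """Extract the URL path from the first line of an HTTP request."""
    line = request.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    parts = line.split()
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    return parts[1]


def classify_path(path: str) -> str:
    """Map a URL path to a server role (the content type drives the choice)."""
    bare = path.split("?", 1)[0].lower()
    if bare.startswith("/cgi-bin/"):
        return "application"
    if bare.endswith(MEDIA_SUFFIXES):
        return "media"
    return "html"


def select_server(request: bytes) -> str:
    """Pick a back end: the role selects the pool, the path selects the member."""
    path = request_path(request)
    pool = SERVER_POOLS[classify_path(path)]
    # The same path always maps to the same member, which keeps its cache warm.
    index = sum(path.encode("ascii", errors="replace")) % len(pool)
    return pool[index]


if __name__ == "__main__":
    req = b"GET /cgi-bin/form HTTP/1.1\r\nHost: www.yahoo.com\r\n\r\n"
    print(select_server(req))
```

A Layer 4 switch cannot make this decision, because it sees only addresses and ports and never reads the request line.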

Page 15: Network Processor: Architecture and Applications

Mechanisms to Build a Content-aware Switch

TCP gateway

• An application-level proxy
• Sets up the 1st connection w/ the client, parses the request to choose a server, then sets up the 2nd connection w/ the server
• Copy overhead

TCP splicing

• Reduces the copy overhead
• Forwards packets at the network level, between the network interface driver and the TCP/IP stack
• Two connections are spliced together
• Modifies fields in the IP and TCP headers

[Figure: two diagrams, each showing a client, a server, and the user and kernel levels]

Page 16: Network Processor: Architecture and Applications

Anatomy of TCP Splicing

[Figure: two panels, "Without TCP Splicing" and "With TCP Splicing"]

• SEQ # translation
• Checksum recalculation
• Etc.

Bookkeeping of connection states, selection of servers, state migration
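The SEQ # translation and checksum recalculation named on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the SpliceNP code: `splice_rewrite` and its `seq_delta`/`ack_delta` parameters are hypothetical names, the deltas are taken to be the fixed offsets between the two spliced connections' initial sequence numbers, and only the sequence and acknowledgment fields are rewritten (address and port rewriting is left out). The checksum is patched incrementally in the style of RFC 1624 rather than recomputed over the whole segment.

```python
MASK32 = 0xFFFFFFFF


def ones_complement_add(a: int, b: int) -> int:
    """Add two 16-bit words with end-around carry."""
    total = a + b
    return (total & 0xFFFF) + (total >> 16)


def checksum(words) -> int:
    """Internet checksum over a sequence of 16-bit words (full recomputation)."""
    acc = 0
    for word in words:
        acc = ones_complement_add(acc, word)
    return ~acc & 0xFFFF


def update_checksum_32(old_csum: int, old_val: int, new_val: int) -> int:
    """Patch a checksum after one 32-bit field changes: HC' = ~(~HC + ~m + m')."""
    acc = ~old_csum & 0xFFFF
    for shift in (16, 0):
        old16 = (old_val >> shift) & 0xFFFF
        new16 = (new_val >> shift) & 0xFFFF
        acc = ones_complement_add(acc, ~old16 & 0xFFFF)
        acc = ones_complement_add(acc, new16)
    return ~acc & 0xFFFF


def splice_rewrite(seq: int, ack: int, csum: int, seq_delta: int, ack_delta: int):
    """Translate SEQ/ACK numbers (mod 2**32) and patch the TCP checksum."""
    new_seq = (seq + seq_delta) & MASK32
    new_ack = (ack + ack_delta) & MASK32
    csum = update_checksum_32(csum, seq, new_seq)
    csum = update_checksum_32(csum, ack, new_ack)
    return new_seq, new_ack, csum
```

The incremental update touches only the words that changed, so its cost does not grow with payload size; that is the main reason to prefer it on a per-packet fast path.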

Page 17: Network Processor: Architecture and Applications

Design Options

• Option 0: GP-based (Linux-based) switch
• Option 1: the CP sets up and splices connections; the DPs process packets sent after splicing
    • Connection setup & splicing is more complex than data forwarding
    • Packets before splicing need to be passed through DRAM queues
• Option 2: the DPs handle connection setup, splicing & forwarding

Page 18: Network Processor: Architecture and Applications

IXP 2400 Block Diagram

• XScale core
• Microengines (MEs)
    • 2 clusters of 4 microengines each
    • Each ME runs up to 8 threads
    • 16KB instruction store
    • Local memory
• Scratchpad memory, SRAM & DRAM controllers

[Figure: block diagram; labels: eight MEs, Scratch/Hash/CSR, IX bus interface, SRAM controller, XScale, SDRAM controller, PCI]

Page 19: Network Processor: Architecture and Applications

Resource Allocation

• Client-side control block list: records states for connections between clients and SpliceNP, and states after splicing
• Server-side control block list: records states for connections between server and SpliceNP

[Figure: resource allocation]
SRAM (8MB): client-side CB list, server-side CB list, server selection table, locks
DRAM (256MB): packet buffer
Scratchpad (16KB): packet queues
Microengines: Client ME, Server ME, Rx ME, Tx ME

Page 20: Network Processor: Architecture and Applications

Comparison of Functionality

Processing a SYN packet

Step  Functionality                                TCP  Linux Splicer  SpliceNP
1     Dequeue packet                               Y    Y              Y
2     IP header verification                       Y    Y              Y
3     IP option processing                         Y    Y              N
4     TCP header verification                      Y    Y              Y
5     Control block lookup                         Y    Y              Y
6     Create new socket and set state to LISTEN    Y    Y              No socket, only control block
7     Initialize TCP and IP header template        Y    Y              N
8     Reset idle time and keep-alive timer         Y    Y              N
9     Process TCP option                           Y    Y              Only MSS option
10    Send ACK packet, change state to SYN_RCVD    Y    Y              Y

• A lite version of TCP due to the limited instruction size of microengines.
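The SpliceNP column of the SYN-processing comparison can be sketched as a small handler. This is an illustration under assumptions, not the microengine code: `ControlBlock`, `parse_mss`, and `handle_syn` are hypothetical names, the control block table is modelled as a Python dict keyed by the connection 4-tuple, header verification (steps 2 and 4) is assumed to have already passed, and sending the reply in step 10 is left out.

```python
from dataclasses import dataclass

DEFAULT_MSS = 536  # TCP's default MSS when the peer sends no MSS option


@dataclass
class ControlBlock:
    state: str
    peer_isn: int
    mss: int = DEFAULT_MSS


def parse_mss(options: bytes) -> int:
    """Scan TCP options and return the MSS; every other option is skipped."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:                      # end of option list
            break
        if kind == 1:                      # no-operation, a single byte
            i += 1
            continue
        if i + 1 >= len(options):
            break
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break                          # malformed option, stop scanning
        if kind == 2 and length == 4:      # maximum segment size
            return int.from_bytes(options[i + 2:i + 4], "big")
        i += length
    return DEFAULT_MSS


def handle_syn(cb_table: dict, four_tuple: tuple, isn: int, options: bytes) -> ControlBlock:
    """Steps 5, 6, 9, and 10 as SpliceNP performs them; steps 3, 7, and 8 are skipped."""
    cb = cb_table.get(four_tuple)                        # step 5: control block lookup
    if cb is None:
        cb = ControlBlock(state="LISTEN", peer_isn=isn)  # step 6: control block only, no socket
        cb_table[four_tuple] = cb
    cb.mss = parse_mss(options)                          # step 9: only the MSS option
    cb.state = "SYN_RCVD"                                # step 10: state change (reply not shown)
    return cb
```

Dropping the socket layer and all options except MSS keeps the handler short, which fits the slide's point about the limited instruction store of a microengine.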

Page 21: Network Processor: Architecture and Applications

Experimental Setup

• Radisys ENP2611 containing an IXP2400
    • XScale & ME: 600MHz
    • 8MB SRAM and 128MB DRAM
    • Three 1Gbps Ethernet ports: 1 for the client port and 2 for server ports
• Server: Apache web server on an Intel 3.0GHz Xeon processor
• Client: Httperf on a 2.5GHz Intel P4 processor
• Linux-based switch
    • Loadable kernel module
    • 2.5GHz P4, two 1Gbps Ethernet NICs

Page 22: Network Processor: Architecture and Applications

Latency on a Linux-based TCP Splicer

Latency is reduced by TCP splicing

Page 23: Network Processor: Architecture and Applications

Latency vs Request File Size

• Latency is reduced significantly: by 83.3% (from 0.6 ms to 0.1 ms) @ 1KB
• The larger the file size, the higher the reduction: 89.5% @ 1MB file
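The 83.3% figure follows directly from the two latencies quoted for the 1KB case. The short sketch below reproduces that arithmetic; `reduction_percent` and `remaining_fraction` are hypothetical helper names introduced here for illustration.

```python
def reduction_percent(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


def remaining_fraction(reduction_pct: float) -> float:
    """Fraction of the baseline latency that is left after a given reduction."""
    return 1.0 - reduction_pct / 100.0


if __name__ == "__main__":
    # 1KB case from the slide: 0.6 ms on the Linux-based switch, 0.1 ms on the NP-based one.
    print(round(reduction_percent(0.6, 0.1), 1))  # 83.3
```

Read the other way, an 89.5% reduction at 1MB means the NP-based switch adds about a tenth of the Linux-based switch's latency for that file size.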

[Figure: latency on the splicer (ms), axis 0 to 20, vs. request file size (KB), 1 to 1024; series: Linux-based, NP-based]

Page 24: Network Processor: Architecture and Applications

Comparison of Packet Processing Latency

Page 25: Network Processor: Architecture and Applications

Analysis of Latency Reduction

Linux-based

• Interrupt: the NIC raises an interrupt once a packet comes
• NIC-to-mem copy: Xeon 3.0GHz dual processor w/ 1Gbps Intel Pro 1000 (88544GC) NIC, 3 us to copy a 64-byte packet by DMA
• Linux processing: OS overheads; processing a data packet in splicing state takes 13.6 us

NP-based

• Polling
• No copy: packets are processed inside the NP, without the two copies
• IXP processing: optimized ISA; 6.5 us

Page 26: Network Processor: Architecture and Applications

Throughput vs Request File Size

• Throughput is increased significantly: 5.7x for a small file @ 1KB, 2.2x for a large file @ 1MB
• Higher improvement for small files
    • The latency reduction is greater for control packets than for data packets
    • Control packets make up a larger portion of the traffic for small files
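The point that control packets weigh more heavily for small files can be made concrete with a rough packet count. The sketch below is a simplified model, not measured data: the 1460-byte segment size and the count of 7 control packets per request (a three-way handshake plus a four-packet close) are assumptions, and `data_packets` and `control_share` are hypothetical names.

```python
import math

MSS_BYTES = 1460       # assumed payload bytes per data packet
CONTROL_PACKETS = 7    # assumed: 3-packet handshake plus a 4-packet close


def data_packets(file_size_bytes: int) -> int:
    """Number of data packets needed to carry the reply."""
    return max(1, math.ceil(file_size_bytes / MSS_BYTES))


def control_share(file_size_bytes: int) -> float:
    """Fraction of a request's packets that are control packets."""
    data = data_packets(file_size_bytes)
    return CONTROL_PACKETS / (CONTROL_PACKETS + data)


if __name__ == "__main__":
    for size_kb in (1, 4, 16, 64, 256, 1024):
        share = control_share(size_kb * 1024)
        print(f"{size_kb:>5} KB  control share = {share:.3f}")
```

Under these assumptions a 1KB reply is one data packet against seven control packets, while a 1MB reply is dominated by data packets, so any saving on control packets matters most at the small end.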

[Figure: throughput (Mbps), axis 0 to 800, vs. request file size (KB), 1 to 1024; series: Linux-based, NP-based]

Page 27: Network Processor: Architecture and Applications

Conclusion

• Network processors combine high-performance packet processing and programmability
• A large variety of NP applications
• Efficient resource utilization is challenging

Page 28: Network Processor: Architecture and Applications

Thank you!

Page 29: Network Processor: Architecture and Applications

Microengine