1 2003 ©UCR CS 260 Lecture 1: Introduction to Network Processors Instructor: L.N. Bhuyan...

1 2003 ©UCR

CS 260 Lecture 1: Introduction to Network Processors

Instructor: L.N. Bhuyan
www.cs.ucr.edu/~bhuyan/CS260

Outline

° Introduction to NP Systems
° Relevant Applications
° Design Issues and Challenges
° Relevant Software and Benchmarks
° A case study: Intel IXP network processors

What Are Network Processors?

° Any device that executes programs to handle packets in a data network
° Examples
  • Processors on router line cards
  • Processors in network access equipment

Why Network Processors?

° Current situation
  • Data rates are increasing
  • Protocols are becoming more dynamic and sophisticated
  • Protocols are being introduced more rapidly
° Processing elements
  • GP (general-purpose processor)
    - Programmable, but not optimized for networking applications
  • ASIC (application-specific integrated circuit)
    - High processing capacity, but long development time and little flexibility
  • NP (network processor)
    - High processing performance
    - Programming flexibility
    - Cheaper than GP

Typical NP Architecture

[Figure: block diagram of a network processor. Labels: input ports, output ports, multi-threaded processing elements, co-processor, SDRAM (packet buffer), SRAM (routing table), two buses.]

TCP/IP Model

° ISO OSI (Open Systems Interconnection) is not fully implemented
° Presentation and Session layers are not present in TCP/IP

OSI            TCP/IP
Application    Application
Presentation   (not present)
Session        (not present)
Transport      Transport
Network        Internet
Data Link      Host-to-Net
Physical       Host-to-Net

Processing Tasks

° Control plane
  • Policy Applications
  • Network Management
  • Signaling
  • Topology Management
° Data plane
  • Queuing / Scheduling
  • Data Transformation
  • Classification
  • Data Parsing
  • Media Access Control
  • Physical Layer

Source: Network Processor Tutorial in Micro 34 - Mangione-Smith & Memik

Application Categorization

° Control-plane tasks
  • Less time-critical
  • Control and management of device operation
    - Table maintenance, port states, etc.
° Data-plane tasks
  • Operations occurring in real time on the “packet path”
  • Core device operations
    - Receive, process, and transmit packets

Data Plane Tasks

° Media Access Control
  • Low-level protocol implementation
    - Ethernet, SONET framing, ATM cell processing, etc.
° Data Parsing
  • Parsing cell or packet headers for address or protocol information
° Classification
  • Match a packet against a set of criteria (filtering / forwarding decision, QoS, accounting, etc.)
° Data Transformation
  • Transformation of packet data between protocols
° Traffic Management
  • Queuing, scheduling, and policing packet data
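The classification task can be illustrated with a minimal sketch, assuming a first-match rule table keyed on protocol and destination port. The `Rule` type, the rule set, and the action names are hypothetical and chosen only for illustration; production classifiers usually match on more header fields and use specialized data structures or hardware.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One classification rule; None in a field means 'match anything'."""
    protocol: Optional[str]
    dst_port: Optional[int]
    action: str


def classify(packet: dict, rules: list, default: str = "best-effort") -> str:
    """Return the action of the first rule that matches the packet."""
    for rule in rules:
        if rule.protocol is not None and rule.protocol != packet["protocol"]:
            continue
        if rule.dst_port is not None and rule.dst_port != packet["dst_port"]:
            continue
        return rule.action
    return default


# Hypothetical rule set: rules are checked in order, first match wins.
RULES = [
    Rule("tcp", 80, "forward-web"),
    Rule("udp", None, "low-latency-queue"),
    Rule(None, 23, "drop"),
]
```

Because the first match wins, rule order encodes priority: a UDP packet to port 23 lands in the low-latency queue rather than being dropped.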

Applications: IPv4 Routing

° Routers determine the next hop and forward packets

[Figure: network diagram; the only surviving label is "Router A".]
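Next-hop determination in IPv4 is normally a longest-prefix match against the routing table. The sketch below shows the idea with a linear scan, assuming a small hypothetical table; the prefixes and next-hop names are made up, and real routers use tries, TCAMs, or similar structures rather than a scan.

```python
import ipaddress


def build_table(routes):
    """Turn (prefix string, next hop) pairs into (network object, next hop) pairs."""
    return [(ipaddress.ip_network(prefix), hop) for prefix, hop in routes]


def next_hop(table, dst):
    """Return the next hop of the longest prefix containing dst, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net, hop in table:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None


# Hypothetical routing table for illustration only.
TABLE = build_table([
    ("0.0.0.0/0", "router-default"),
    ("10.0.0.0/8", "router-A"),
    ("10.1.0.0/16", "router-B"),
])
```

An address such as 10.1.2.3 matches all three entries, and the /16 entry is selected because it is the most specific.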

URL-based switching – My NSF Project

° Increase efficiency
° Tasks
  • Traverse the packet data (request) for each arriving packet and classify it:
    - Contains ‘.jpg’ -> to image server
    - Contains ‘cgi-bin/’ -> to application server

[Figure: a switch between the Internet and the www.yahoo.com servers (image server, application server, HTML server). Example request: "GET /cgi-bin/form HTTP/1.1 Host: www.yahoo.com…". Packet fields shown: APP. DATA, TCP, IP.]
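The classification rule on this slide can be sketched as a function over the HTTP request line. This is a minimal illustration, not the project's implementation: the function name and server labels are hypothetical, and sending everything else to the HTML server is an assumption based on the three servers in the figure.

```python
def pick_server(request: bytes) -> str:
    """Choose a back-end server from the URL in an HTTP request line."""
    request_line = request.split(b"\r\n", 1)[0]
    parts = request_line.split()
    if len(parts) < 2:
        # Malformed request line: fall back to the default server (assumption).
        return "html-server"
    url = parts[1]
    if b".jpg" in url:
        return "image-server"
    if b"cgi-bin/" in url:
        return "application-server"
    return "html-server"
```

For example, `pick_server(b"GET /cgi-bin/form HTTP/1.1\r\nHost: www.yahoo.com\r\n\r\n")` selects the application server. Note that the switch must look past the IP and TCP headers into application data, which is what makes this workload heavier than plain IP forwarding.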

Organizing Processor Resources

° Design decisions:
  • High-level organization
  • ISA and microarchitecture
  • Memory and I/O integration
° Today’s commercial NPs:
  • Chip multiprocessors
  • Most are multithreaded
  • Exploit little ILP (Cisco does)
  • No cache
  • Micro-programmed

Architectural Comparisons

° High-level organizations
  • Aggressive superscalar (SS)
  • Fine-grained multithreaded (FGMT)
  • Chip multiprocessor (CMP)
  • Simultaneous multithreaded (SMT)

Multithreading

° Basic idea
  • Multiple register sets in the processor
  • Fast context switch
  • Switch thread on a cache access (How is this different from a non-blocking cache?)
  • Tolerating local latency vs. remote latency in CC-NUMA multiprocessors
  • Hybrids
    - Switch on notice
    - Simultaneous multithreading
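Why fast context switching helps can be shown with a simple back-of-the-envelope model. The sketch below assumes each thread alternates between a fixed number of compute cycles and a fixed memory stall, and that a context switch costs nothing; both are simplifying assumptions, and the cycle counts in the usage note are illustrative rather than measured.

```python
import math


def utilization(threads: int, compute: int, latency: int) -> float:
    """Fraction of cycles the processor is busy.

    Each thread computes for `compute` cycles, then stalls for `latency`
    cycles; other threads run during the stall. Context switches are
    assumed to be free.
    """
    return min(1.0, threads * compute / (compute + latency))


def threads_to_hide(compute: int, latency: int) -> int:
    """Smallest thread count that keeps the processor fully busy."""
    return math.ceil((compute + latency) / compute)
```

With 20 compute cycles per 60-cycle stall, one thread keeps the processor 25% busy, two threads 50%, and four threads are enough to hide the latency entirely under this model.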

Architectural Comparisons (cont.)

[Figure: issue slots over time for five organizations: Superscalar, Fine-Grained, Coarse-Grained, Multiprocessing, and Simultaneous Multithreading. Legend: Thread 1 through Thread 5, idle slot.]


Tasks and Services

Three Benchmarks used in the experiment

Some Challenges

° Intelligent design
  • Given a selection of programs and a target network link speed, find the ‘best’ design for the processor
    - Least area
    - Least power
    - Most performance
° Write efficient multithreaded programs
  • NPs have
    - Heterogeneous compute resources
    - Non-uniform memory
    - Multiple interacting threads of execution
    - Real-time constraints
  • Make use of resources
    - How to use special instructions and hardware assists
      – Compilers
      – Hand-coded
  • Multithreaded programs
    - Manage access to shared state
    - Synchronization between threads
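Managing shared state is easiest to see in a small example. The sketch below uses ordinary Python threads and a lock to protect a per-flow byte counter; it is only an analogy for the problem, since microengine code would use the hardware's own synchronization primitives. The `FlowCounters` class and `run_workers` helper are hypothetical names.

```python
import threading


class FlowCounters:
    """Per-flow byte counters shared by several packet-processing threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def record(self, flow_id, nbytes):
        # The read-modify-write must be atomic, so it runs under the lock.
        with self._lock:
            self._counts[flow_id] = self._counts.get(flow_id, 0) + nbytes

    def total(self, flow_id):
        with self._lock:
            return self._counts.get(flow_id, 0)


def run_workers(counters, workers=4, packets_each=1000, size=64):
    """Start several threads that all update the same flow counter."""
    def worker():
        for _ in range(packets_each):
            counters.record("flow-1", size)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counters.total("flow-1")
```

Without the lock, two threads could read the same old value and one update would be lost; with it, four workers recording 1000 packets of 64 bytes each always total 256000 bytes.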

Benchmarks for Network Processors

• NetBench
  - 10 applications
  - http://cares.icsl.ucla.edu/NetBench
• CommBench
  - 8 networking and communications applications
  - http://ccrc.wustl.edu/~wolf/cb/
• EEMBC
  - http://www.eembc.org/benchmark
• MediaBench
  - Transcoders
  - Some communications applications

IXP1200 Block Diagram

° StrongARM processing core
° Microengines introduce new ISA
° I/O
  • PCI
  • SDRAM
  • SRAM
  • IX: PCI-like packet bus
° On-chip FIFOs
  • 16 entries, 64 B each

IXP1200 Microengine

° 4 hardware contexts
  • Single-issue processor
  • Explicit optional context switch on SRAM access
° Registers
  • All are single ported
  • Separate GPR
  • 256*6 = 1536 registers total
° 32-bit ALU
  • Can access GPR or XFER registers
° Shared hash unit
  • 1/2/3 values – 48b/64b
  • For IP routing hashing
° Standard 5-stage pipeline
° 4KB SRAM instruction store – not a cache!
° Barrel shifter

IXP 2400 Block Diagram

° XScale core replaces StrongARM
° Microengines
  • Faster
  • More: 2 clusters of 4 microengines each
° Local memory
° Next-neighbor routes added between microengines
° Hardware to accelerate CRC operations and random number generation
° 16-entry CAM

[Figure: IXP 2400 block diagram. Labels: ME0 through ME7, Scratch/Hash/CSR, MSF Unit, DDR DRAM controller, XScale Core, QDR SRAM controller.]

Different Types of Memory

Type             Width (bytes)  Size (bytes)  Approx unloaded latency (cycles)  Notes
Local            4              2560          1                                 Indexed addressing, post incr/decr
On-chip Scratch  4              16K           60                                Atomic ops
SRAM             4              256M          150                               Atomic ops
DRAM             8              2G            300                               Direct path to/from MSF

IXA Software Framework

[Figure: IXA software framework layers. Labels: External Processors; Control Plane Protocol Stack; Control Plane PDK; XScale Core (C/C++ Language): Core Components, Core Component Library, Resource Manager Library; Microengine Pipeline (Microengine C Language): Microblock Library, Microblock, Protocol Library, Hardware Abstraction Library, Utility Library.]


Example Toaster System: Cisco 10000

° Almost all data plane operations execute on the programmable XMC

° Pipeline stages are assigned tasks – e.g. classification, routing, firewall, MPLS

• Classic SW load balancing problem

° External SDRAM shared by common pipe stages

IBM PowerNP

° 16 pico-processors and 1 PowerPC
° Each pico-processor
  • Supports 2 hardware threads
  • 3-stage pipeline: fetch/decode/execute
° Dyadic Processing Unit
  • Two pico-processors
  • 2KB shared memory
  • Tree search engine
° Focus is layers 2-4
° PowerPC 405 for control plane operations
  • 16K I and D caches
° Target is OC-48
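A target line rate translates directly into a per-packet time budget. The sketch below works this out for OC-48 using the raw line rate of 2488.32 Mbit/s and ignoring SONET framing overhead, which is a simplification. The function names are hypothetical, and the 133 MHz clock in the usage note is an assumed figure for illustration, not taken from the slide.

```python
# OC-48 raw line rate in bits per second (framing overhead ignored).
OC48_BITS_PER_SECOND = 2_488_320_000


def packet_time_ns(packet_bytes, link_bps=OC48_BITS_PER_SECOND):
    """Time in nanoseconds that one packet occupies the link."""
    return packet_bytes * 8 / link_bps * 1e9


def cycles_per_packet(packet_bytes, clock_hz, link_bps=OC48_BITS_PER_SECOND):
    """Clock cycles available per packet on a single processing element."""
    return packet_bytes * 8 / link_bps * clock_hz
```

A 40-byte packet occupies the link for roughly 128.6 ns. At an assumed 133 MHz clock that is only about 17 cycles per packet for a single processing element, which is why these designs spread packets across many processors and threads.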

Motorola C-Port C-5 Chip Architecture

[Figure: C-5 block diagram. Labels: Queue Mngt Unit, Fabric Processor, Table Lookup, Buffer Mngt Unit, Executive Processor, channel processors CP-0 through CP-15 grouped in clusters (CP-0 to CP-3 and CP-12 to CP-15 shown), 60 Gbps busses, three external SRAMs, Switch Fabric.]

References

° [NPT] W. H. Mangione-Smith and G. Memik, Network Processor Technologies
° [NPRD] Patrick Crowley and Raj Yavatkar, An Introduction to Network Processor Research & Design, HPCA-9 Tutorial
