Adaptive CPU Allocation for Software based Router Systems Puneet Zaroo

Software based routers

Implement packet forwarding/processing in software, e.g. a PC with multiple NICs.

Provide value-added services such as encryption and network address translation, especially at the network edge.

Issues
  Software architecture
    Per-flow threads / per-packet threads
    Division of input, forwarding and output functions
  CPU scheduling
    How to determine CPU shares
    How to enforce CPU shares

Objective

Leverage the advantages of a component-based software router system.
  Flexibility in designing routers
  Reusability of software components
  Dynamic addition of element modules

Overlay a QoS provisioning mechanism on top of the component-based system.

Develop an adaptive QoS system.
  Adaptive to varying input rates and per-packet processing costs.

Some Software Router Systems

Router Plugins: ETH Zurich, Washington University in St. Louis
  Per-flow code modules or plugins.
  Implemented in the NetBSD kernel.

Click Modular Router: MIT
  Routers made of elements composed into a flow graph.

ANTS
  Programmable and customizable networks.
  Customizable applications acting on packets / packets carrying code as well as data.

x-kernel: University of Arizona
  Object-oriented interface to protocols.
  Can be used on end systems as well as routers.

Scout: University of Arizona, Princeton University
  Communication-oriented OS based on x-kernel.
  Path-based abstraction.
  Advanced CPU scheduling.

OS support for CPU scheduling

Scout
  Proportional scheduling.
  CPU balance (extension of work on livelock).

Resource Containers: Rice University
  Decoupling of protection domain and resource domain.
  Proper accounting of resources to processes.
  Resources include threads as well as kernel data structures and memory, bound to containers.
  E.g. a web server serving multiple connections.

Processor Capacity Reserves: CMU
  Provides support for both time-sharing and real-time systems.
  The OS enforces the reservations (CPU share, time period).
  Applications are free to change their reservations, subject to admission control.

Nemesis: Cambridge
  OS does low-level resource multiplexing.
  Avoiding QoS cross-talk.
  Support for I/O in user-level libraries.

Click

Composable flow graphs from router elements
  Packets travel along graph edges.
  Element-based processing (push/pull).
  Element-based scheduling.
  Multithreaded SMP Click.

Issues in flow-level QoS on top of an element-based architecture
  Flow-level accounting and scheduling.
  CPU balance between input, output and processing.
  CPU conservation of idle elements.

CROSS/Linux – Resource reservation with containers

Containers
  Group of related elements.
  Elements doing per-flow processing.
  Container: the CPU resource reservation unit.

Why use containers and not flows?

Types of containers
  Input
  Output
  Forwarding
    Best Effort
    QoS (packet rate reservations)

Example Router Configuration

CROSS/Linux - CPU scheduling

Three-level scheduler
  Linux schedules CROSS
    Linux process scheduler
  CROSS schedules containers
    Proportional (dynamic stride scheduling)
  Containers schedule elements
    Simple round-robin scheduling
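
The two lower scheduling levels can be illustrated with a minimal sketch. It is not the CROSS/Linux code: the `Container` class, the `STRIDE1` constant, the element names and the ticket values are all assumptions made for illustration. Containers are picked by basic stride scheduling (smallest pass value first), and each container hands out its elements round-robin. Tickets are re-read on every decision, so a changed ticket value takes effect at once, but the pass-value rescaling of full dynamic stride scheduling is left out.

```python
import heapq
from collections import Counter

STRIDE1 = 1 << 20  # assumed constant; a container's stride is STRIDE1 / tickets


class Container:
    """Hypothetical container: a ticket count plus a list of element names."""

    def __init__(self, name, tickets, elements):
        self.name = name
        self.tickets = tickets
        self.elements = list(elements)
        self.cursor = 0      # round-robin position among the elements
        self.pass_value = 0  # stride-scheduling virtual time

    def next_element(self):
        element = self.elements[self.cursor]
        self.cursor = (self.cursor + 1) % len(self.elements)
        return element


def run(containers, quanta):
    """Make `quanta` scheduling decisions; return (container, element) pairs."""
    heap = [(c.pass_value, i, c) for i, c in enumerate(containers)]
    heapq.heapify(heap)
    trace = []
    for _ in range(quanta):
        _, i, c = heapq.heappop(heap)         # smallest pass value runs next
        trace.append((c.name, c.next_element()))
        c.pass_value += STRIDE1 // c.tickets  # tickets are re-read each time
        heapq.heappush(heap, (c.pass_value, i, c))
    return trace


qos = Container("qos", tickets=4, elements=["classify", "encrypt"])
best_effort = Container("best_effort", tickets=1, elements=["forward"])
trace = run([qos, best_effort], quanta=500)
counts = Counter(name for name, _ in trace)
print(counts["qos"], counts["best_effort"])
```

With a 4:1 ticket ratio the `qos` container receives four of every five quanta, which is the proportional behavior the container level relies on.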

CROSS/Linux – Architectural Enhancements to Click

CPU conservation through sleep/wakeup
  Elements tested for scheduling eligibility
  Containers tested for scheduling eligibility
  Notifier queues: wake up elements (make them eligible for scheduling)
  Delayed wakeup

Network interface input element
  Switching between polling and interrupts
  Based on a threshold packet input rate, to reduce programmed I/O overhead

Topology discovery
  Discovering input/output queues for a container

CROSS/Linux – Enhancements to Click

Virtual interface queues, especially for gathering interface statistics

Linux /proc interface
  One directory for each container
  Directory provides information about
    Container tickets
    CPU cycles consumed
    Packet rate / drop rate
    Elements
    Input/output queues
  Set container tickets

CROSS/Linux – Share adaptation

Why?
  Inability to do a priori CPU share calculation
  Variations in packet input rate
  Variations in per-packet processing cost

How?
  Scheduler for each container keeps track of
    Packet input rate
    Packet drop rate
    CPU cycles used
  Recomputes container shares to remove packet drops.

CROSS/Linux – Share adaptation

Statistics maintained by queues
  Packet rates
  Packet drop rates

Queues are used to connect containers
  Packet pass/drop rates at queues indicate the difference between the required and the actual CPU shares for the container

Share adaptation Algorithm

Invoked every 1 second

Notation used
  T: ticket share
  C: current CPU share
  p: input packet rate
  d: packet drop rate
  m: maximum input rate

General idea
  Increase the ticket share of a container so that packet drops are removed at all the containers

Input Container share adaptation (Issues)

Pass as many packets as possible, up to a maximum.
  How to arrive at this maximum?
  Forwarding more than the maximum adversely affects the effective router throughput.

Reduce share on observing over-allocation.

Input Container – Share adaptation (Algorithm)

if p > m                    /* Input rate too high: reduce share */
    T = C * (m/p)
else if d > 0               /* Increase share to remove packet drops */
    drate = min(d + p, m)
    T = C * (drate/p)
else if (T - C) >= delta    /* Over-allocation: reduce share */
    T = T - eps
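
As a rough illustration, the input-container rule can be written as a small Python function. The function name, the default values for `delta` and `eps`, and the guard for `p <= 0` are assumptions added here; the slides do not give concrete values for the thresholds.

```python
def adapt_input_share(T, C, p, d, m, delta=0.05, eps=0.01):
    """Return the new ticket share T for an input container.

    T: ticket share, C: measured CPU share, p: packet rate,
    d: drop rate, m: maximum input rate (notation from the slides).
    The delta and eps defaults are illustrative assumptions.
    """
    if p <= 0:
        return T  # assumption: no traffic observed, leave the share alone
    if p > m:
        # Input rate too high: scale the share down to the maximum rate.
        return C * (m / p)
    if d > 0:
        # Packets are being dropped: target the offered rate, capped at m.
        drate = min(d + p, m)
        return C * (drate / p)
    if T - C >= delta:
        # More tickets than CPU actually used: give some back.
        return T - eps
    return T


# A container using 20% of the CPU, passing 40,000 packets/s and dropping
# 10,000 packets/s, asks for 25% more share.
print(adapt_input_share(T=0.2, C=0.2, p=40_000, d=10_000, m=100_000))
```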

QoS Container – Share adaptation (Issues)

Always forward up to the reserved rate.
  Target a forwarding rate range.

Reduce share in case of over-allocation.

QoS Container – Share adaptation (Algorithm)

if p ∈ [R - Dt, R + Dt]     /* No change */
    return
if p > R + Dt               /* Reduce share */
    T = C * (R/p)
else if d > 0               /* Increase share */
    drate = min(p + d, R)
    T = C * (drate/p)
else if (T - C) >= delta    /* Reduce share */
    T = T - eps
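
A Python rendering of the QoS rule might look like the sketch below. It reads `R` as the reserved packet rate and `Dt` as the half-width of the target range around it, which is an interpretation of the slides rather than a stated definition; the function name, the `delta` and `eps` defaults and the `p <= 0` guard are also assumptions.

```python
def adapt_qos_share(T, C, p, d, R, Dt, delta=0.05, eps=0.01):
    """Return the new ticket share T for a QoS container.

    R is read as the reserved packet rate and Dt as the tolerance around
    it (assumed meanings). The delta and eps defaults are illustrative.
    """
    if p <= 0:
        return T  # assumption: nothing forwarded, leave the share alone
    if R - Dt <= p <= R + Dt:
        return T  # forwarding rate already inside the target range
    if p > R + Dt:
        return C * (R / p)  # above the reservation: scale down
    if d > 0:
        drate = min(p + d, R)  # never aim beyond the reserved rate
        return C * (drate / p)
    if T - C >= delta:
        return T - eps  # over-allocation
    return T


# Reservation of 32,000 packets/s; the container passes 20,000 and drops
# 20,000, so the target is capped at the reservation.
print(adapt_qos_share(T=0.2, C=0.2, p=20_000, d=20_000, R=32_000, Dt=1_000))
```

The cap `min(p + d, R)` is what distinguishes this rule from the input-container rule: a QoS container is raised only as far as its reservation, however much traffic is offered.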

Output Container – Share adaptation (Issues)

Try to forward everything that is received.
  Throttling, if any, has already happened upstream.

Reduce share in case of over-allocation.

Output Container – Share adaptation (Algorithm)

if d > 0                    /* Increase share */
    T = C * (1 + d/p)
else if (T - C) >= delta    /* Reduce share */
    T = T - eps
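
The output-container rule has no rate cap, since it tries to forward everything it receives. A minimal Python sketch, with an assumed function name, assumed `delta` and `eps` defaults, and an assumed guard for `p <= 0`:

```python
def adapt_output_share(T, C, p, d, delta=0.05, eps=0.01):
    """Return the new ticket share T for an output container."""
    if d > 0:
        if p <= 0:
            return T  # assumption: d/p is undefined, leave the share alone
        return C * (1 + d / p)  # grow in proportion to the drop fraction
    if T - C >= delta:
        return T - eps  # over-allocation
    return T


# Passing 40,000 packets/s and dropping 10,000 packets/s at 20% CPU.
print(adapt_output_share(T=0.2, C=0.2, p=40_000, d=10_000))
```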

Best Effort Container – Share adaptation

No action taken
System makes no guarantees

Discussion

Packet rate based reservation
  Reservations based on packet rates are more intuitive.
  CPU shares may vary for the same packet rates.

C (actual share): how is it calculated?
  Input container
    Only include CPU cycles used in packet processing, as opposed to idle polling.
  Other containers
    Easy to calculate since there is no idle polling.

m (maximum forwarding rate)
  Constant determined at router initialization
  Evaluated at each iteration

Evaluation

Using a simulator
  Calculates the forwarding rate and drop rate based on the CPU shares.
  Mimics the actions of the adaptive algorithm.
  Eases loading the "router" and testing of diverse workloads.

Using a real implementation
  CROSS/Linux on an 866 MHz Pentium III CPU.

Adaptive vs. Non Adaptive (Experimental setup)

Input (2 µs), Output (2 µs), Best Effort container (6 µs).

Router: 1 MHz CPU => max forwarding rate = 100,000 packets/s

Static ticket assignment = 1:1:1

Input varied from 0 to 110,000 packets/s in increments of 10,000 packets/s every 10 s.
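
The 100,000 packets/s figure follows from the per-packet costs: each packet needs 2 + 2 + 6 = 10 µs of CPU in total. The short sketch below reproduces that arithmetic and also shows the CPU fraction each container would need at that rate. The function names are made up for illustration.

```python
def max_forwarding_rate(costs_us):
    """Packets per second when every packet pays all per-packet costs."""
    return 1_000_000 / sum(costs_us.values())


def required_shares(rate_pps, costs_us):
    """CPU fraction each container needs to sustain rate_pps."""
    return {name: rate_pps * cost * 1e-6 for name, cost in costs_us.items()}


costs = {"input": 2, "output": 2, "best_effort": 6}  # µs per packet
rate = max_forwarding_rate(costs)
shares = required_shares(rate, costs)
print(rate, shares)
```

At full load the containers need roughly 0.2, 0.2 and 0.6 of the CPU, a 1:1:3 split rather than the static 1:1:1 ticket assignment used for the non-adaptive case.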

Adaptive vs. Non Adaptive (Variation with time)

Adaptive vs. Non Adaptive (Maximum loss-free forwarding rate)

Variable packet processing time (Experimental setup)

Input (2 µs), Best Effort/QoS (6 µs), Output container (2 µs)
  Observe different convergence behavior for QoS / Best Effort

Router: 1 MHz CPU => max forwarding rate initially = 100,000 packets/s

Constant input = 50,000 packets/s

Per-packet processing cost increased by 2 µs every 10 s.
  Max forwarding rate = 50,000 packets/s at t = 50 s.

Variable packet processing time (Adaptive vs. Non Adaptive)

Variable packet processing time (Best Effort vs. QoS)

Adaptation in m

Hard to determine m at router initialization
  May vary with variations in per-packet processing costs.

$$m = \max_{c_i \in C} \frac{\mathrm{TOTAL\_CPU\_CPS}}{\mathrm{cpu\_cpp}(c_i)}$$

$$\mathrm{cpu\_cpp}(c_i) = \mathrm{cpu\_cpi}() + \frac{\mathrm{cpu\_cycles}(c_i)}{\mathrm{num\_packets}(c_i)} + \mathrm{cpu\_cpo}()$$

where
  C: the set of containers servicing active flows
  TOTAL_CPU_CPS: total CPU cycles per second available to the router
  cpu_cpp(c_i): cycles/packet being used by the flow serviced by container c_i
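
The computation of m can be sketched in Python as below. The slides do not define `cpu_cpi()` and `cpu_cpo()`; this sketch assumes they are the per-packet cycle costs of the input and output containers and passes them in as plain numbers. The `stats` dictionary layout is also an assumption. The sample numbers use a 1 MHz CPU (one cycle per µs), 5 cycles each for input and output, and per-packet costs of 30 and 3 cycles.

```python
def cpu_cpp(cpi, cycles, packets, cpo):
    """Cycles per packet for one flow: input + container + output cost."""
    return cpi + cycles / packets + cpo


def adaptive_m(total_cpu_cps, cpi, cpo, stats):
    """Maximum input rate m over containers servicing active flows.

    stats maps a container name to (cpu_cycles, num_packets) measured over
    the last interval; containers with no packets are treated as inactive.
    """
    rates = [
        total_cpu_cps / cpu_cpp(cpi, cycles, packets, cpo)
        for cycles, packets in stats.values()
        if packets > 0
    ]
    return max(rates) if rates else None


stats = {
    "qos": (450_000, 15_000),     # 30 cycles per packet
    "best_effort": (1_500, 500),  # 3 cycles per packet
}
m = adaptive_m(1_000_000, cpi=5, cpo=5, stats=stats)
print(round(m))
```

Because the maximum is taken over containers, m is set by the cheapest active flow (13 cycles per packet here), regardless of how little traffic that flow carries.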

Fixed vs. adaptive m (Experimental setup)

Input (8 µs), Best Effort/QoS (1 µs), Output container (1 µs)

Router: 1 MHz CPU => max forwarding rate initially = 100,000 packets/s

Constant input = 50,000 packets/s

Per-packet processing cost increased by 2 µs every 5 s.
  Max forwarding rate = 50,000 packets/s at t = 30 s.

Fixed vs. adaptive m (Effective Best Effort forwarding)

Fixed vs. adaptive m (Effective QoS forwarding)

Fixed vs. adaptive m (Best Effort, QoS, theoretical maximum)

Advanced Adaptation in m

Previous algorithm gives too much weight to the least expensive flow.
  Fine if all packets are destined for that flow.
  The packet rate to different flows can be variable.

$$m = \frac{\mathrm{TOTAL\_CPU\_CPS}}{\mathrm{weighted\_cpu\_cpp}}, \qquad \mathrm{weighted\_cpu\_cpp} = \frac{\sum_{c_i \in C} \mathrm{cpu\_cpp}(c_i)\,\mathrm{rate}(c_i)}{\sum_{c_i \in C} \mathrm{rate}(c_i)}$$
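
A small worked example of the rate-weighted variant, as a sketch with an assumed function name and input layout. The numbers are illustrative: a 1 MHz CPU, an expensive flow at 40 cycles per packet receiving 15,000 packets/s, and a cheap flow at 13 cycles per packet receiving 500 packets/s.

```python
def weighted_m(total_cpu_cps, flows):
    """Rate-weighted maximum input rate.

    flows is a list of (cpu_cpp, rate) pairs, one per active container.
    Returns None when no traffic has been observed.
    """
    total_rate = sum(rate for _, rate in flows)
    if total_rate == 0:
        return None
    weighted_cpu_cpp = sum(cpp * rate for cpp, rate in flows) / total_rate
    return total_cpu_cps / weighted_cpu_cpp


flows = [(40, 15_000), (13, 500)]
m = weighted_m(1_000_000, flows)
print(round(m))
```

Here m comes out near 25,556 packets/s, close to what the expensive flow alone can sustain (25,000), whereas taking the maximum over flows would give about 76,923 because of the lightly loaded cheap flow.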

Adaptive m vs. advanced adaptive m (Experimental setup)

Input container (5 µs), Output container (5 µs)

Router (1 MHz CPU)

2 flows
  QoS container (50,000 packets/s, 30 µs) => max forwarding rate achievable = 25,000 packets/s
  Best Effort container (3 µs) => max forwarding rate achievable = 77,000 packets/s

Input rate to Best Effort container = 500 packets/s

Input rate to QoS container varied from 15,000 packets/s to 50,000 packets/s in increments of 5,000 packets/s every 5 s.

Adaptive m vs. advanced adaptive m (Forwarding rate vs. time)

Evaluation on a Router

CROSS/Linux software router platform
Pentium III 866 MHz PC
3 network interface cards

QoS Forwarding (Experimental setup)

866 MHz Pentium III router

Input container (4.5 µs), Best Effort container (3 µs), QoS container (32,000 packets/s), Output container (4.9 µs)

3 different per-packet processing costs for the QoS container: 3, 9.7 and 15.2 µs

Input to QoS => 32,000 packets/s

Input to Best Effort => 27,000 packets/s
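
As a sanity check on this setup, the CPU fraction a container needs is simply its packet rate times its per-packet cost. The helper below is an illustrative sketch, not part of CROSS/Linux. For the QoS container at 32,000 packets/s it gives about 0.10, 0.31 and 0.49 for the three costs, in line with the QoS column of the CPU shares table reported later in the deck (0.10, 0.30, 0.48).

```python
def cpu_demand(rate_pps, cost_us):
    """Fraction of one CPU needed to process rate_pps packets per second."""
    return rate_pps * cost_us * 1e-6


qos_rate = 32_000          # packets/s reserved for the QoS container
best_effort_rate = 27_000  # packets/s offered to the Best Effort container

qos_demand = {cost: cpu_demand(qos_rate, cost) for cost in (3, 9.7, 15.2)}
best_effort_demand = cpu_demand(best_effort_rate, 3)

for cost, demand in qos_demand.items():
    print(f"QoS at {cost} µs/packet needs {demand:.3f} of the CPU")
print(f"Best Effort at 3 µs/packet needs {best_effort_demand:.3f} of the CPU")
```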

QoS Forwarding (Forwarding rate)

QoS Forwarding (Ticket Share)

QoS forwarding (Ticket Shares)

Case      Input   Output   Best Effort   QoS
3 µs      0.29    0.236    0.236         0.236
9.7 µs    0.27    0.282    0.153         0.293
15.2 µs   0.213   0.245    0.068         0.47

QoS forwarding (CPU Shares)

Case      Input   Output   Best Effort   QoS
3 µs      0.51    0.29     0.08          0.10
9.7 µs    0.31    0.299    0.087         0.30
15.2 µs   0.21    0.24     0.066         0.48

Effective Forwarding rate (Experimental setup)

Input (4.5 µs), Best Effort (8.3 µs) and Output (4.9 µs)

Maximum forwarding rate = 57,000 packets/s

3 different scenarios
  No adaptation
  CPU share adaptation and m = 65,000 packets/s
  CPU share adaptation and m = 110,000 packets/s

Effective Forwarding rate

Future Work

Conjoint CPU-buffer allocation
  Insufficient CPU share => always packet drops
  Once CPU shares are sufficient, more buffering => more efficiency
  More buffering => higher packet delays and packets getting dropped at line cards

Share adaptation between Linux and CROSS
  Can use the SFQ scheduler already implemented

Conclusion

Provide a QoS provisioning layer on top of a component based system.

Adaptive in response to variable packet input and processing costs.

THANK YOU