1

Adapting Routing to the Traffic

COS 461: Computer Networks

Spring 2006 (MW 1:30-2:50 in Friend 109)

Jennifer Rexford

Teaching Assistant: Mike Wawrzoniak
http://www.cs.princeton.edu/courses/archive/spring06/cos461/


2

Goals of Today’s Lecture

• Challenges
  – Reacting quickly to alleviate congestion
  – Avoiding over-reacting and causing oscillations
  – Limiting bandwidth & CPU overhead on routers

• Load-sensitive routing
  – Routers adapt to link load in a distributed fashion
  – At the packet level, or on a “group of packets”

• Traffic engineering
  – Centralized computation of routing parameters
  – Network-wide measurements of offered traffic

3

Do IP Networks Manage Themselves?

• TCP congestion control
  – Senders react to congestion
  – Decrease sending rate
  – But… the TCP sessions receive lower throughput

• IP routing protocols
  – Routers react to failures
  – Compute new paths
  – But… the new paths may be congested

[Figure: the same example network topology, with numeric link weights, shown twice]

4

Do IP Networks Manage Themselves?

• In some sense, yes:
  – TCP senders send less traffic during congestion
  – Routing protocols adapt to topology changes

• But, does the network run efficiently?
  – Congested link when idle paths exist?
  – High-delay path when a low-delay path exists?

[Figure: the same example network topology, with numeric link weights, shown twice]

5

Adapting the Routing to the Traffic

• Goal: modify the routes to steer traffic through the network in the most effective way

• Approach #1: load-sensitive protocols
  – Distribute traffic & performance measurements
  – Routers compute paths based on load

• Approach #2: adaptive management system
  – Collect measurements of traffic and topology
  – Management system optimizes the parameters

• Debates continue today about the right answer

6

Load-Sensitive Routing Protocols

• Advantages
  – Efficient use of network resources
  – Satisfying the performance needs of end users
  – Self-managing network takes care of itself

• Disadvantages
  – Higher overhead on the routers
  – Long alternate paths consume extra resources
  – Instability from out-of-date feedback information

7

Packet-Based Load-Sensitive Routing

• Packet-based routing
  – Forward packets based on forwarding table

• Load-sensitive
  – Compute table entries based on load or delay

• Questions
  – What link metrics to use?
  – How frequently to update the metrics?
  – How to propagate the metrics?
  – How to compute the paths based on metrics?

8

Original ARPANET Algorithm (1969)

• Routing algorithm
  – Shortest-path routing based on link metrics
  – Link metric: instantaneous queue length plus a constant
  – Distributed shortest-path algorithm (Bellman-Ford)

[Figure: example network topology with link metrics; a link with metric 20 is labeled “congested link”]
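The routing algorithm on this slide can be sketched as follows. This is a minimal, centralized simulation of the Bellman-Ford relaxation, not the ARPANET code; the four-node topology, the queue lengths, and the constant of 1 are assumptions made up for illustration.

```python
def link_metric(queue_length, constant=1):
    """Metric in the style of the slide: instantaneous queue length plus a constant."""
    return queue_length + constant


def bellman_ford(nodes, links, dest):
    """Relax directed links {(u, v): metric} until distances to dest settle.

    Returns (dist, next_hop), where next_hop[u] is u's neighbor toward dest.
    """
    dist = {n: float("inf") for n in nodes}
    next_hop = {n: None for n in nodes}
    dist[dest] = 0
    for _ in range(len(nodes) - 1):
        changed = False
        for (u, v), metric in links.items():
            if dist[v] + metric < dist[u]:
                dist[u] = dist[v] + metric
                next_hop[u] = v
                changed = True
        if not changed:
            break
    return dist, next_hop


# Hypothetical queue lengths; the B-D link is the congested one.
queues = {("A", "B"): 0, ("B", "D"): 20, ("A", "C"): 1, ("C", "D"): 2}
links = {}
for (u, v), q in queues.items():
    links[(u, v)] = link_metric(q)
    links[(v, u)] = link_metric(q)

dist, next_hop = bellman_ford(["A", "B", "C", "D"], links, "D")
```

In this example even B, which is adjacent to D, routes through A and C to avoid its own congested link.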

9

Performance of ARPANET Algorithm

• Light load
  – Delay dominated by transmission & propagation
  – So, link metrics don’t fluctuate much

• Medium load
  – Queuing delay is no longer negligible
  – Moderate traffic shifts to avoid congestion

• Heavy load
  – Very high metrics on congested links
  – Busy links look bad to all of the routers
  – All routers avoid the busy links
  – Routers may send packets on longer paths

10

Problem: Out-of-Date Information

• Routers make decisions based on old information
  – Propagation delay in flooding link metrics
  – Thresholds applied to limit number of updates

• Old information leads to bad decisions
  – All routers avoid the congested links
  – … leading to congestion on other links
  – … and the whole thing repeats

[Figure: the Lincoln Tunnel and the Holland Tunnel, both connecting NJ and NYC]

“Backup at Lincoln” on radio triggers congestion at Holland
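The tunnel analogy can be turned into a toy simulation. This is only a sketch under strong assumptions: all traffic sees the same report, the report is one round old, and everyone switches at once. The function name and traffic volume are made up for illustration.

```python
def simulate_stale_feedback(rounds, total_traffic=100):
    """Every driver picks the tunnel that looked emptier in the last report."""
    # The first report says the Lincoln Tunnel is backed up.
    reported = {"lincoln": total_traffic, "holland": 0}
    choices = []
    for _ in range(rounds):
        best = min(reported, key=reported.get)
        load = {name: 0 for name in reported}
        load[best] = total_traffic  # everyone reacts to the same stale report
        choices.append(best)
        reported = load  # this load only becomes visible in the next round
    return choices


history = simulate_stale_feedback(4)
```

The chosen tunnel flips every round, which is the oscillation described on the slide.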

11

Problem: Frequent Updates

• Update messages
  – Link keeps track of its metric (e.g., queuing delay)
  – Link transmits updates when the metric changes

• Frequency of updates
  – Frequent changes to the metric lead to frequent updates
  – Significantly increases the overhead of the protocol

• Oscillation makes the problem worse
  – Oscillation leads to wild swings in the link metrics
  – Forcing very frequent update messages
  – … that add to the load on the links in the network

12

Second ARPANET Algorithm (1979)

• Link-state protocol
  – Old: Distributed path computation leads to loops
  – New: Better to flood metrics and have each router compute the shortest paths

• Averaging of the link metric over time
  – Old: Instantaneous delay fluctuates a lot
  – New: Averaging reduces the fluctuations

• Reduce frequency of updates
  – Old: Sending updates on each change is too much
  – New: Send updates if change passes a threshold
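The averaging and threshold ideas above can be sketched together. The slide does not say how the averaging was done, so the exponentially weighted moving average here is an assumption, as are the class name, `alpha`, and the threshold value.

```python
class LinkMetricReporter:
    """Smooths delay samples and advertises only significant changes."""

    def __init__(self, alpha=0.25, threshold=8.0):
        self.alpha = alpha            # weight given to the newest sample
        self.threshold = threshold    # minimum change worth flooding
        self.average = None
        self.last_advertised = None

    def observe(self, delay):
        """Fold in one delay sample; return True if an update should be sent."""
        if self.average is None:
            self.average = delay
        else:
            self.average = (1 - self.alpha) * self.average + self.alpha * delay
        if (self.last_advertised is None
                or abs(self.average - self.last_advertised) >= self.threshold):
            self.last_advertised = self.average
            return True
        return False


reporter = LinkMetricReporter()
# Instantaneous delay swings between 10 and 30; the average moves far less.
updates = [reporter.observe(sample) for sample in [10, 30, 10, 30]]
```

Advertising on every change would have sent four updates for these samples; with smoothing and a threshold only the first one is sent.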

13

Problem of Long Alternate Paths

• Picking alternate paths
  – Long path chosen by one router consumes resources that other packets could have used
  – Leads other routers to pick other alternate paths

• Solution: limit path length
  – Bound the value of the link metric
  – “This link is busy enough to go two extra hops”

• Extreme case
  – Limit path selection to the shortest paths
  – Pick least-loaded shortest path in the network
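One way to read “bound the value of the link metric” is sketched below. The linear mapping from utilization to metric is an assumption, as are the function and parameter names; the slide only says that the metric is bounded.

```python
def bounded_metric(utilization, idle_cost=1.0, max_extra_hops=2):
    """Map utilization to a metric between idle_cost and (1 + max_extra_hops) * idle_cost."""
    utilization = min(max(utilization, 0.0), 1.0)  # clamp to [0, 1]
    return idle_cost * (1 + max_extra_hops * utilization)


idle = bounded_metric(0.0)
half = bounded_metric(0.5)
full = bounded_metric(1.0)
overloaded = bounded_metric(5.0)
```

With `max_extra_hops=2`, a saturated link costs as much as three idle hops, so a detour is worthwhile only if it adds at most two extra idle hops.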

14

Load-Sensitive Routing

• Timescales
  – What timescale of routing decisions?
  – What timescale of feedback about link loads?

• Load-sensitive routing at packet level
  – Routers receive feedback on load and delay
  – Routers re-compute their forwarding tables
  – Fundamental problems with oscillation

• Load-sensitive routing for groups of packets
  – Routers receive feedback on load and delay
  – Routers compute a path for the next flow or circuit
  – Less oscillation, as long as circuits last for a while

15

Reducing Effects of Out-of-Date Info

• Send link metrics more often
  – But, leads to higher overhead
  – But, propagation delay is a fundamental limit

• Make the traffic last longer
  – Route on groups of packets, rather than packets
  – Fewer routing decisions, and more accurate feedback

• Groups of packets
  – Telephone network: phone call (3 minutes long)
  – Internet: TCP connection (10 packets long)
  – Internet: all traffic between a pair of hosts, or routers, …

More when we talk about circuit switching later in the course.

16

Traffic Engineering as a Network-Management Problem: Case Study

17

Using Traditional Routing Protocols

• Routers flood information to learn topology
  – Determine “next hop” to reach other routers…
  – Compute shortest paths based on link weights

• Link weights configured by network operator

[Figure: example network topology with numeric link weights]

18

Approaches for Setting the Link Weights

• Conventional static heuristics
  – Proportional to physical distance
    Cross-country links have higher weights
    Minimizes end-to-end propagation delay
  – Inversely proportional to link capacity
    Smaller weights for higher-bandwidth links
    Attracts more traffic to links with more capacity

• Tune the weights based on the offered traffic
  – Network-wide optimization of the link weights
  – Directly minimize metrics like max link utilization
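The inverse-capacity heuristic can be sketched in a few lines. Scaling against the largest capacity and rounding to integers no smaller than 1 are assumptions, and the link names and capacities are hypothetical.

```python
def inverse_capacity_weights(capacities, reference=None):
    """Return integer weights inversely proportional to link capacity."""
    if reference is None:
        reference = max(capacities.values())  # fastest link gets weight 1
    return {
        link: max(1, round(reference / capacity))
        for link, capacity in capacities.items()
    }


# Hypothetical capacities in Mbit/s.
capacities = {"a-b": 10000, "b-c": 2500, "a-c": 1000}
weights = inverse_capacity_weights(capacities)
```

Shortest-path routing then prefers the 10 Gbit/s link over the slower ones, which is the “attracts more traffic” effect named above.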

19

Example of Tuning the Link Weights

[Figure: example network topology with link weights; the path under discussion is drawn in pink]

• Problem: congestion along the pink path
  – Second or third link on the path is overloaded

• Solution: move some traffic to the bottom path
  – E.g., by decreasing the weight of the second link

20

Measure, Model, and Control

[Figure: measure-model-control loop. Topology/configuration and offered traffic are measured from the operational network and fed into a network-wide “what if” model; the model drives control, i.e., changes to the network.]

21

Traffic Engineering Problem

• Topology
  – Connectivity and capacity of routers and links

• Traffic matrix
  – Offered load between points in the network

• Link weights
  – Configurable parameters for routing protocol

• Performance objective
  – Balanced load, low latency, service level agreements …

• Question: Given the topology and traffic matrix, which link weights should be used?

22

Key Ingredients of the Approach

• Instrumentation
  – Topology: monitoring of the routing protocols
  – Traffic matrix: fine-grained traffic measurement

• Network-wide models
  – Representations of topology and traffic
  – “What-if” models of shortest-path routing

• Network optimization
  – Efficient algorithms to find good configurations
  – Operational experience to identify key constraints

23

Formalizing the Optimization Problem

• Input: graph G(R, L)
  – R is the set of routers
  – L is the set of unidirectional links
  – $c_l$ is the capacity of link l

• Input: traffic matrix
  – $M_{i,j}$ is the load from router i to router j

• Output: setting of the link weights
  – $w_l$ is the weight on unidirectional link l
  – $P_{i,j,l}$ is the fraction of traffic from i to j traversing link l

[Figure: network with routers i and j marked]

24

Multiple Shortest Paths: Even Splitting

[Figure: example of even splitting across multiple shortest paths. Links are labeled with values of $P_{i,j,l}$: 1.0, 0.5, and 0.25.]
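The even-splitting rule can be sketched as follows: at each router, traffic is divided equally over all outgoing links that lie on a shortest path to the destination. The helper names and the small topology are made up for illustration, and the code assumes positive integer weights so that distance comparisons are exact.

```python
import heapq
from collections import defaultdict


def distances_to(dest, weights):
    """Dijkstra on reversed links: dist[n] is the shortest distance from n to dest."""
    incoming = defaultdict(list)
    for (u, v), w in weights.items():
        incoming[v].append((u, w))
    dist = {dest: 0}
    heap = [(0, dest)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for u, w in incoming[v]:
            if d + w < dist.get(u, float("inf")):
                dist[u] = d + w
                heapq.heappush(heap, (d + w, u))
    return dist


def even_split_fractions(src, dest, weights):
    """Return {link: fraction of src-to-dest traffic carried by that link}."""
    dist = distances_to(dest, weights)
    share = defaultdict(float)
    share[src] = 1.0
    fractions = defaultdict(float)
    # Visit routers from farthest to nearest, so each share is final before it is split.
    for u in sorted(dist, key=dist.get, reverse=True):
        if u == dest or share[u] == 0:
            continue
        next_links = [(a, b) for (a, b), w in weights.items()
                      if a == u and b in dist and dist[b] + w == dist[u]]
        for link in next_links:
            fractions[link] += share[u] / len(next_links)
            share[link[1]] += share[u] / len(next_links)
    return dict(fractions)


# Hypothetical topology: s has two equal-cost next hops, and so does a.
weights = {("s", "a"): 1, ("s", "b"): 1, ("a", "c"): 1, ("a", "d"): 1,
           ("c", "t"): 1, ("d", "t"): 1, ("b", "t"): 2}
fractions = even_split_fractions("s", "t", weights)
```

Here `fractions` plays the role of $P_{i,j,l}$ for the single pair i = s, j = t.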

25

Defining the Objective Function

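• Computing the link utilization
  – Link load: $u_l = \sum_{i,j} M_{i,j} P_{i,j,l}$
  – Utilization: $u_l / c_l$

• Objective functions
  – $\min \left( \max_l \, (u_l / c_l) \right)$
  – $\min \left( \sum_l f(u_l / c_l) \right)$

[Figure: plot of $f(x)$ versus $x$, with 1 marked on the x-axis]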
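The two objective functions can be evaluated directly once the traffic matrix and the splitting fractions are known. In this sketch the traffic values, fractions, and capacities are invented, and `example_penalty` is a hypothetical stand-in for f, whose exact form is not given in the text.

```python
from collections import defaultdict


def link_loads(traffic, fractions):
    """u_l = sum over (i, j) of M[i, j] * P[i, j, l]."""
    loads = defaultdict(float)
    for pair, demand in traffic.items():
        for link, fraction in fractions.get(pair, {}).items():
            loads[link] += demand * fraction
    return dict(loads)


def max_utilization(loads, capacity):
    """First objective: the largest u_l / c_l over all links."""
    return max(loads.get(link, 0.0) / c for link, c in capacity.items())


def example_penalty(x):
    """Hypothetical increasing convex f; chosen only for illustration."""
    return x * x


def total_penalty(loads, capacity, f=example_penalty):
    """Second objective: the sum over links of f(u_l / c_l)."""
    return sum(f(loads.get(link, 0.0) / c) for link, c in capacity.items())


traffic = {("a", "c"): 60, ("b", "c"): 30}                    # M[i, j]
fractions = {("a", "c"): {"a-b": 0.5, "b-c": 0.5, "a-c": 0.5},  # P[i, j, l]
             ("b", "c"): {"b-c": 1.0}}
capacity = {"a-b": 100, "b-c": 100, "a-c": 100}                # c_l

loads = link_loads(traffic, fractions)
worst = max_utilization(loads, capacity)
penalty_sum = total_penalty(loads, capacity)
```

The max-utilization objective looks only at the single busiest link, while the sum of penalties also rewards lowering load on the other links.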

26

Complexity of the Optimization Problem

• Computationally intractable problem
  – No efficient algorithm to find the link weights
  – Even for simple objective functions

• What are the implications?
  – Must resort to searching through weight settings

27

Optimization Based on Local Search

• Start with an initial setting of the link weights
  – E.g., same integer weight on every link
  – E.g., weights inversely proportional to capacity
  – E.g., existing weights in the operational network

• Compute the objective function
  – Compute the all-pairs shortest paths to get $P_{i,j,l}$
  – Apply the traffic matrix $M_{i,j}$ to get link loads $u_l$
  – Evaluate the objective function from the $u_l / c_l$

• Generate a new setting of the link weights, and repeat
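The loop above can be sketched as a simple descent over single-weight changes. This is an illustrative sketch, not the search used in practice: the neighborhood (change one weight at a time), the acceptance rule (keep any strict improvement), the weight range, and the three-link topology are all assumptions.

```python
import heapq
from collections import defaultdict


def distances_to(dest, weights):
    """Dijkstra on reversed links: shortest distance from each router to dest."""
    incoming = defaultdict(list)
    for (u, v), w in weights.items():
        incoming[v].append((u, w))
    dist, heap = {dest: 0}, [(0, dest)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue
        for u, w in incoming[v]:
            if d + w < dist.get(u, float("inf")):
                dist[u] = d + w
                heapq.heappush(heap, (d + w, u))
    return dist


def max_utilization(weights, capacity, traffic):
    """Route every demand with even splitting and return the max of u_l / c_l."""
    loads = defaultdict(float)
    for (src, dest), demand in traffic.items():
        dist = distances_to(dest, weights)
        share = defaultdict(float)
        share[src] = demand
        for u in sorted(dist, key=dist.get, reverse=True):
            if u == dest or share[u] == 0:
                continue
            nxt = [(a, b) for (a, b), w in weights.items()
                   if a == u and b in dist and dist[b] + w == dist[u]]
            for link in nxt:
                loads[link] += share[u] / len(nxt)
                share[link[1]] += share[u] / len(nxt)
    return max(loads[link] / capacity[link] for link in capacity)


def local_search(weights, capacity, traffic, max_weight=5):
    """Change one weight at a time, keeping any change that lowers the objective."""
    best = dict(weights)
    best_cost = max_utilization(best, capacity, traffic)
    improved = True
    while improved:
        improved = False
        for link in sorted(best):
            for w in range(1, max_weight + 1):
                candidate = dict(best)
                candidate[link] = w
                cost = max_utilization(candidate, capacity, traffic)
                if cost < best_cost:
                    best, best_cost, improved = candidate, cost, True
    return best, best_cost


# Hypothetical triangle: a direct a->c link and a two-hop detour through b.
capacity = {("a", "b"): 100, ("a", "c"): 100, ("b", "c"): 100}
traffic = {("a", "c"): 120}
start = {link: 1 for link in capacity}
best_weights, best_cost = local_search(start, capacity, traffic)
```

With unit weights all 120 units take the direct link (utilization 1.2); raising the direct link's weight to 2 creates a tie, and the even split brings the maximum utilization down to 0.6.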

28

Making the Search Efficient

• Avoid repeating the same weight setting
  – Keep track of past values of the weight setting
  – … or keep a small signature of past values
  – Do not evaluate setting if signatures match

• Avoid computing shortest paths from scratch
  – Explore settings that change just one weight
  – Apply fast incremental shortest-path algorithms

• Limit number of unique link-weight values
  – Don’t explore $2^{16}$ possible values for each weight

• Stop early, before exploring all settings
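The signature idea can be sketched with a hash of the weight setting. The use of SHA-256, the 16-bit signature size, and the class and function names are assumptions for illustration; the slide does not say how signatures were computed.

```python
import hashlib


def signature(weights):
    """Return a 16-bit signature of a weight setting, independent of key order."""
    canonical = ",".join(f"{link}={w}" for link, w in sorted(weights.items()))
    digest = hashlib.sha256(canonical.encode()).digest()
    return int.from_bytes(digest[:2], "big")


class SeenSettings:
    """Remembers signatures of weight settings that were already evaluated."""

    def __init__(self):
        self._seen = set()

    def is_new(self, weights):
        sig = signature(weights)
        if sig in self._seen:
            return False  # same signature: skip the expensive evaluation
        self._seen.add(sig)
        return True


seen = SeenSettings()
first = seen.is_new({"a-b": 1, "b-c": 2})
repeat = seen.is_new({"b-c": 2, "a-b": 1})  # same setting, different key order
```

A small signature saves memory at the price of occasional collisions, in which case a genuinely new setting is skipped. That is tolerable in a heuristic search that stops early anyway.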

29

Incorporating Operational Realities

• Minimize number of changes to the network
  – Changing just 1 or 2 link weights is often enough

• Tolerate failure of network equipment
  – Weights usually remain good after failure
  – … or can be fixed by changing 1-2 weights

• Limit effects of measurement accuracy
  – Good weights remain good, despite noise

• Limit frequency of changes to the weights
  – Joint optimization for day & night traffic matrices

30

Application to AT&T’s Backbone

• Performance of the optimized weights
  – Search finds a good solution within a few minutes
  – Much better than link capacity or physical distance
  – Competitive with multi-commodity flow solution

• How AT&T changes the link weights
  – Maintenance every night from midnight to 6am
  – Predict effects of removing link(s) from network
  – Reoptimize the link weights to avoid congestion
  – Configure new weights before disabling equipment

31

Example from AT&T’s Operations Center

• Amtrak repairing/moving part of train track
  – Need to move some of the fiber optic cables
  – Or, heightened risk of the cables being cut
  – Amtrak notifies AT&T of the time the work will be done

• AT&T engineers model the effects
  – Determine which IP links go over affected fiber
  – Pretend the network no longer has these links
  – Evaluate the new shortest paths and traffic flow
  – Identify whether link loads will be too high
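The modeling steps above amount to a “what-if” computation, which can be sketched as follows. This is not AT&T's tool: the function names, the topology, and the overload limit of 1.0 are assumptions, and for brevity the sketch routes each demand on a single shortest path rather than splitting over ties.

```python
import heapq


def shortest_path(src, dest, weights):
    """Dijkstra over directed links {(u, v): weight}; returns a list of links or None."""
    best = {src: (0, None)}  # node -> (distance, link used to reach it)
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u][0]:
            continue
        for (a, b), w in weights.items():
            if a == u and (b not in best or d + w < best[b][0]):
                best[b] = (d + w, (a, b))
                heapq.heappush(heap, (d + w, b))
    if dest not in best:
        return None
    path, node = [], dest
    while node != src:
        link = best[node][1]
        path.append(link)
        node = link[0]
    return path[::-1]


def what_if(weights, capacity, traffic, failed_links, limit=1.0):
    """Pretend failed_links are gone and return {link: utilization} for overloaded links."""
    remaining = {l: w for l, w in weights.items() if l not in failed_links}
    loads = {l: 0.0 for l in remaining}
    for (src, dest), demand in traffic.items():
        path = shortest_path(src, dest, remaining)
        if path is None:
            raise ValueError(f"{src} cannot reach {dest} after the failure")
        for link in path:
            loads[link] += demand
    return {l: loads[l] / capacity[l] for l in remaining
            if loads[l] / capacity[l] > limit}


# Hypothetical network: the a->c link rides on the affected fiber.
weights = {("a", "c"): 1, ("a", "b"): 1, ("b", "c"): 1}
capacity = {("a", "c"): 100, ("a", "b"): 100, ("b", "c"): 50}
traffic = {("a", "c"): 80}

before = what_if(weights, capacity, traffic, failed_links=set())
after = what_if(weights, capacity, traffic, failed_links={("a", "c")})
```

A non-empty result, as in `after`, signals that the weights on the remaining links should be reoptimized before the work starts.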

32

Example Continued

• If load will be too high
  – Reoptimize the weights on the remaining links
  – Schedule time for new weights to be configured
  – Roll back to old weights when Amtrak is done

• Same process applied to other cases
  – Assessing the network’s risk to possible failures
  – Planning for maintenance of existing equipment
  – Adapting link weights to installation of new links
  – Adapting link weights in response to traffic shifts

33

What About Interdomain Routing?

• Border Gateway Protocol
  – Announcements carry very limited information
  – E.g., AS path, but nothing about delay, loss, etc.

• Challenging to make load-sensitive protocol
  – Hard to agree upon a common metric
  – Hard to scale to such a large network
  – Hard to prevent ASes from gaming the system

• Instead, individual ASes act alone
  – Change routing policies based on link load
  – E.g., moving some traffic to another provider

34

Interdomain Traffic Engineering

• Predict effects of changes to import policies
  – Inputs: routing, traffic, and configuration data
  – Outputs: flow of traffic through the network

[Figure: topology, BGP policy configuration, externally learned routes, and offered traffic feed into a BGP routing model, which outputs the flow of traffic through the network]

35

Outbound Traffic: Pick a BGP Route

• Easier to control than inbound traffic
  – IP routing is destination based
  – Sender determines where the packets go

• Control only by selecting the next hop
  – Border router can pick the next-hop AS
  – Cannot control selection of the entire path

[Figure: a network connected to Provider 1 and Provider 2, with candidate AS paths “(2, 7, 8, 4)” and “(1, 3, 4)”]

36

Outbound Traffic: Shortest AS Path

• No import policy on border router
  – Pick route with shortest AS path
  – Arbitrary tie break (e.g., smallest router-id)

• Performance?
  – Shortest AS path is not necessarily best
  – Could have high delays or congestion

• Load balancing?
  – Could lead to uneven split in traffic
  – E.g., one provider with shorter paths
  – E.g., too many ties with skewed tie-break
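The default selection described above can be sketched as a two-key comparison. This deliberately covers only the two steps named on the slide (AS-path length, then smallest router-id), not the full BGP decision process; the dictionary layout and the router-id values are assumptions.

```python
def pick_route(routes):
    """Prefer the shortest AS path; break ties by the smallest router-id."""
    return min(routes, key=lambda r: (len(r["as_path"]), r["router_id"]))


# AS paths taken from the earlier figure; router-ids are hypothetical.
routes = [
    {"as_path": (2, 7, 8, 4), "router_id": 10},
    {"as_path": (1, 3, 4), "router_id": 20},
]
chosen = pick_route(routes)

# Two routes of equal length: the tie-break decides.
tied = [
    {"as_path": (2, 4), "router_id": 20},
    {"as_path": (1, 4), "router_id": 10},
]
tie_winner = pick_route(tied)
```

Neither key reflects delay or load, which is why the shortest AS path may still perform poorly or attract a skewed share of the traffic.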

37

Outbound Traffic: Load Balancing

• Selectively use each provider
  – Assign local-pref across destination prefixes
  – Change the local-pref assignments over time

• Useful inputs to load balancing
  – End-to-end path performance data
    E.g., active measurements along each path
  – Outbound traffic statistics per destination prefix
    E.g., packet monitors or router-level support
  – Link capacity to each provider
  – Billing model of each provider

38

Balancing Load, Performance, and Cost

• Balance traffic based on link capacity
  – Measure outbound traffic per prefix
  – Select provider per prefix for even load splitting
  – But, might lead to poor performance and high bill

• Balance traffic based on performance
  – Select provider with best performance per prefix
  – But, might lead to congestion and a high bill

• Balance traffic based on financial cost
  – Select provider per prefix over time to minimize the total financial cost
  – But, might lead to bad performance
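The capacity-based option can be sketched as a greedy assignment of prefixes to providers. The greedy rule (largest prefix first, to the provider whose link would end up least utilized) is an assumption, as are the provider names, capacities, and traffic volumes; the prefixes come from the reserved documentation ranges.

```python
def assign_prefixes(prefix_traffic, provider_capacity):
    """Greedily pick a provider per prefix to even out link utilization."""
    load = {provider: 0.0 for provider in provider_capacity}
    assignment = {}
    # Place the heaviest prefixes first; sort by prefix to make ties deterministic.
    ordered = sorted(prefix_traffic.items(), key=lambda kv: (-kv[1], kv[0]))
    for prefix, volume in ordered:
        provider = min(
            load,
            key=lambda p: ((load[p] + volume) / provider_capacity[p], p),
        )
        assignment[prefix] = provider
        load[provider] += volume
    return assignment, load


prefix_traffic = {"192.0.2.0/24": 60, "198.51.100.0/24": 30, "203.0.113.0/24": 30}
provider_capacity = {"provider1": 100, "provider2": 100}
assignment, load = assign_prefixes(prefix_traffic, provider_capacity)
```

In a deployment, each entry of `assignment` would translate into a higher local-pref for routes to that prefix learned from the chosen provider. As the slide warns, this ignores path performance and the providers' billing models.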

39

Conclusions

• Adapting routing to the traffic
  – To alleviate congestion
  – To minimize propagation delay
  – To be robust to future failures

• Two main approaches
  – Load-sensitive routing protocol
  – Optimization of configurable parameters

• Next class: World Wide Web
  – Read Section 9.2.2 of the textbook
  – Also, read handout about the Web