Intradomain Routing - University Of Illinois...Approach 1: Optimize OSPF weights • e.g. OSPF-TE • Need to propagate everywhere: can’t change often • Artiﬁcial constraints

Intradomain RoutingBrighten Godfrey

CS 538 February 19 2018

slides ©2010-2018 by Brighten Godfrey unless otherwise noted

Routing

Often defined as the job of Layer 3 (IP). But...

• Ethernet spanning tree protocol (Layer 2)• Content delivery overlays, distributed hash tables,

network virtualization, … (Layer 4+)

Choosing paths along which messages will travel from source to destination.

Problems for intradomain routing

Distributed path finding

React to dynamics

High reliability even with failures

Scale

Optimize link utilization (traffic engineering)

Protocols: The Building Blocks

Distance vector routing

Protocol variants

• Original ARPANET• More recently: RIP, EIGRP

Remember vector of distances to each destination and exchange this vector with neighbors

• Initially: distance 0 from myself • Upon receipt of vector: my distance to each destination

= min of all my neighbors’ distances + 1 • Send packet to neighbor with lowest dist.

Slow convergence and looping problems

• E.g., consider case of disconnection from destination

Path vector routing

Protocol variants

• BGP

Remember vector of paths to each destination and exchange path announcements

• Initially: empty path to myself • Upon receipt of path announcement or withdrawal:

selected path = shortest among active paths to destination

• Send packet along selected path

Link state routing

Protocol variants

• ARPANET: McQuillan, Richer, Rosen 1980; Perlman 1983• Intermediate System-to-Intermediate System (IS-IS)• Open Shortest Path First (OSPF)

Algorithm

• Gossip the entire topology to everyone• Forwarding at each hop:

- Compute shortest path (e.g., Dijkstra’s algorithm)- Send packet to neighbor along computed path

We have a network...

Question

x y

A link fails. How many total units of message does x send in immediate response?

Question

x yX

...using DV or PV? ...using link state?

Link state vs. DV/PV

Disadvantages of LS

• Need consistent computation of shortest paths- Same view of topology- Same metric in computing routes

• Slightly more complicated protocol

Advantages of LS

• Faster convergence if compute time is negligible• Gives unified global view

- Useful for other purposes, e.g., building MPLS tables

Q: Can link state have forwarding loops?

Comparison

DV/PV Link State

Computes paths incrementally as path announcements are disseminated

Decouples topology gossiping from routing decision

• Simple protocol• Local, independent decisions

• Can have convergence problems• e.g. burst of updates• Or worse if you lack a path vector…

• Slightly more complex, but not too bad• Requires consistent view of topology and

routing metric across all routers

• Faster convergence if compute time is negligible

• Provides unified global view• Useful for other purposes, e.g. building

MPLS tables

Q: Can link state have forwarding loops?

LS variant: Source routing

Algorithm:

• Broadcast the entire topology to everyone• Forwarding at source:

- Compute shortest path (Dijkstra’s algorithm)- Put path in packet header

• Forwarding at source and remaining hops:- Follow path specified by source

Q: Can this result in forwarding loops?

Source routing vs. link state

Advantages

• Essentially eliminates loops• Compute route only once rather than every hop• Forwarding table (FIB) size = #neighbors (not #nodes)• Flexible computation of paths at source

Disadvantages

• Computation of paths at source• Header size: ≥ log2(#nodes)•|Path|

- Can use local rather than global next-hop identifiers- Then, size drops to ≥ log2(#neighbors)•|Path|

• Source needs to know topology• Harder to redirect packets in flight (to avoid a failure)

MPLS design

a network

315

15

3

3

egresslabel inheader

...3 Swap in label 15; out port 120 Swap in label 3; out port 2...

Why is this more flexible than shortest path routing?

20 33 9

9

Forwarding table ...15 Swap in label 3; out port 13...

Forwarding table

Port 1

Port 2

Port 13Port 5

MPLS design

Control plane constructs paths and coordinates labels

Can also stack labels = concatenate paths

• used for backup paths in MPLS Fast ReRoute (FRR)

a network

Ingress:Traffic classification, label packets (“forwarding equivalence class”)

315

15

3

3

egress

MPLS motivation

In the design doc

• High performance forwarding• Minimal forwarding requirements, so can interface well

with many types of media such as ATM• Flexible control of traffic routing

What matters today?

Flexibility. Widely used to achieve:

• Virtual Private Network (VPN) service along dedicated paths between enterprise sites

• Control backup paths with MPLS Fast ReRoute• Traffic engineering (load balancing)

Using the Protocols: Traditional Traffic Engineering

Traffic engineering

Key task of intradomain routing: optimize utilization

No TE: Shortest path routing

• How well does this work?

What do we actually want to accomplish?

TE: Classic ISP formulation

Given

Objective

Subject to

Cij

8i,jX

s,t

xijst u

⇤Cij

u⇤

xijst

Tst

minu⇤

8st Tst =X

j

xsjst

8st Tst =X

i

xitst

8st8v 62{s,t}X

i

xivst =X

j

xvjst

Capacity of link from i to jTraffic demand from s to t

Max link utilization (0 to 1)

Traffic volume of (s,t) flowcarried over link (i,j)

Ingress flow equals demand

Egress flow equals demand

Flow conservation

Utilization cap

TE: Classic ISP formulation

Can we solve it?

• Computationally yes: multi-commodity flow problem, represented as linear program,

• Solvable in polynomial time with LP solvers• But how do we solve it in a distributed way with ever-

changing inputs?

Is it enough?

• What about different objectives (e.g. latency)?• What about different priorities (enterprise VPN vs. best-

effort Internet flow)?

Classic TE solutions

Approach 1: Optimize OSPF weights

• e.g. OSPF-TE• Need to propagate everywhere: can’t change often• Artificial constraints make it difficult to optimize

- Same weights apply to all traffic- So all traffic at one ingress follows same paths

Approach 2: Allocate traffic to explicit MPLS paths

• Control protocol like RSVP-TE reserves capacity and constructs MPLS tunnels at each router along path

• Tradeoff: improves path choice but also state in routers- Not all possible paths will be available

MPLS with RSVP-TE

[Slide Credit: Ankit Singla]

MPLS with RSVP-TE

Link-state protocol (OSPF / IS-IS)1


MPLS with RSVP-TE


Also flood available bandwidth info2

Fulfill tunnel provisioning requests3


MPLS with RSVP-TE

Update network state, flood info4





MPLS with RSVP-TE

Update network state, flood info4





More Advanced TE

TeXCP [Kandula et al 2005]

Pre-construct small set of paths between every ingress-egress pair

• 10 MPLS tunnels in implementation

Dynamically at each ingress node:

• Probe utilization, latency of each path• Dynamically reallocate traffic between paths

Walking the Tightrope: Responsive Yet StableTraffic Engineering

Srikanth KandulaMIT CSAIL

[email protected]

Dina KatabiMIT [email protected]

Bruce DavieCisco Systems

[email protected]

Anna CharnyCisco Systems

[email protected]

ABSTRACTCurrent intra-domain Traffic Engineering (TE) relies on offlinemethods, which use long term average traffic demands. It can-not react to realtime traffic changes caused by BGP reroutes, di-urnal traffic variations, attacks, or flash crowds. Further, currentTE deals with network failures by pre-computing alternative rout-ings for a limited set of failures. It may fail to prevent congestionwhen unanticipated or combination failures occur, even though thenetwork has enough capacity to handle the failure.This paper presents TeXCP, an online distributed TE protocol

that balances load in realtime, responding to actual traffic demandsand failures. TeXCP uses multiple paths to deliver demands froman ingress to an egress router, adaptively moving traffic from over-utilized to under-utilized paths. These adaptations are carefully de-signed such that, though done independently by each edge routerbased on local information, they balance load in the whole net-work without oscillations. We model TeXCP, prove the stability ofthe model, and show that it is easy to implement. Our extensivesimulations show that, for the same traffic demands, a network us-ing TeXCP supports the same utilization and failure resilience as anetwork that uses traditional offline TE, but with half or third thecapacity.

Categories and Subject DescriptorsC.2.2 [Computer Communication Networks]: Network Proto-cols; C.2.3 [Computer Communication Networks]: NetworkOperations—Network Management

General TermsAlgorithms, Design, Management, Reliability, Performance.

KeywordsTeXCP, Traffic Engineering, Responsive, Online, Distributed, Sta-ble.

1. INTRODUCTIONIntra-domain Traffic Engineering (TE) is an essential part of

modern ISP operations. The TE problem is typically formalized asminimizing the maximum utilization in the network [5, 6, 15, 26].

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.SIGCOMM’05, August 21–26, 2005, Philadelphia, Pennsylvania, USA.Copyright 2005 ACM 1-59593-009-04/05/0008 ...$5.00.

A B

Boston

NewYork

Seattle

SanDiego

Figure 1: For each Ingress-Egress (IE) pair, there is a TeXCP agent atthe ingress router, which balances the IE traffic across available pathsin an online, distributed fashion.

This allows the ISP to balance the load and avoid hot spots and fail-ures, which increases reliability and improves performance. Fur-thermore, ISPs upgrade their infrastructure when the maximumlink utilization exceeds a particular threshold (about 40% utiliza-tion [20]). By maintaining lower network utilization for the sametraffic demands, traffic engineering allows the ISP to make do withexisting infrastructure for a longer time, which reduces cost.Recent years have witnessed significant advancements in traf-

fic engineering methods, from both the research and operationalcommunities [6, 12, 15, 40]. TE methods like the OSPF weight op-timizer (OSPF-TE) [15, 16] and the MPLS multi-commodity flowoptimizer [26] have shown significant reduction in maximum uti-lization over pure shortest path routing. Nonetheless, because of itsoffline nature, current TE has the following intrinsic limitations:

• It might create a suboptimal or even inadequate load distributionfor the realtime traffic. This is because offline TE attempts tobalance load given the long term traffic demands averaged overmultiple days (potentially months). But the actual traffic maydiffer from the long term demands due to BGP re-routes, externalor internal failures, diurnal variations, flash crowds, or attacks.

• Its reaction to failures is suboptimal. Offline TE deals with net-work failures by pre-computing alternative routings for a lim-ited set of failures [16]. Since the operator cannot predict whichfailure will occur, offline TE must find a routing that works rea-sonably well under a large number of potential failures. Such arouting is unlikely to be optimal for any particular failure. Asa result, current TE may fail to prevent congestion when unan-ticipated or combination failures occur, even though the networkmay have enough capacity to handle the failure.

The natural next step is to use online traffic engineering, whichreacts to realtime traffic demands and failures. Currently, onlineTE research is still in its infancy. Indeed it is challenging to build adistributed scheme that responds quickly to changes in traffic, yetdoes not lead to oscillations, as demonstrated by the instability ofthe early ARPAnet routing [23]. Prior online TE methods are eithercentralized [9, 10] or assume an oracle that provides global knowl-edge of the network [12], and most lack a stability analysis [34,39].There is a need for an online TE protocol that combines practical

[Kandula et al, “Walking the Tightrope”,SIGCOMM 2005]

ISP(AS#) # of paths usedavg std

Ebone(1755) 4.275 1.717Exodus(3967) 4.769 1.577Abovenet(6461) 4.653 2.038Genuity(1) 4.076 1.806Sprint(1239) 4.175 1.935Tiscali(3257) 4.525 1.980AT&T(7018) 3.976 1.785

Table 4: Though a TeXCP agentis configured with a maximum ofK =10 paths, it achieves near-optimal max-utilization using manyfewer paths.

Technique Description Distributed? Reacts to changesin traffic?

Robust to fail-ures?

Oracle LP based on multi-commodity

No No No

TeXCP in §3.3 and §3.4 Yes Yes YesOSPF-TEBase Optimal Link weights

for a TM [15]No No No

OSPF-TEF ailures Opt. weights for fewcritical failures [16]

No No Limited number ofanticipated failures

OSPF-TEMulti−TM Opt. weights overmultiple TMs [15]

No Optimizes overmultiple demands

No

MATE in [12] Sim. needs globalknowledge

Yes Yes

InvCap Common Practice - No No

Table 5: Various load balancing techniques.

5. PERFORMANCEWe evaluate TeXCP and compare it with prior work.

5.1 Topologies & Traffic DemandsISPs regard their topologies and traffic demands as proprietary

information. Thus, similar to prior work [6, 29], we use the Rock-etfuel topologies in Table 3. To obtain approximate PoP to PoPtopologies, we collapse the topologies so that “nodes” correspondto “cities”. Rocketfuel does not provide link capacities; so weassign capacities to links as follows. There is a marked knee inthe degree distribution of cities–i.e., cities are either highly con-nected (high-degree) or not. The high degree cities are probablyLevel-1 PoPs [20], with the rest being smaller PoPs. We assumethat links connecting Level-1 PoPs have high capacity (10Gb/s) andthat the others have smaller capacity (2.5Gb/s). This is in line withrecent ISP case studies [1, 20].Similarly to [6], we use the gravity model to compute estimated

traffic matrices. This approach assumes that the incoming traffic ata PoP is proportional to the combined capacity of its outgoing links.Then it applies the gravity model [33] to extrapolate a completeTM. The TMs used in our experiments lead to max. utilizations inthe range 25-75%. For lack of space, we omit similar results forbimodal TMs [6] and topologies generated using GT-ITM [19].

5.2 MetricAs in [6], we compare the performance of various load balancing

techniques with respect to a particular topology and traffic matrix(TM) using the ratio of the max-utilization under the studied tech-nique to the max-utilization obtained by an oracle, i.e.:

Metric =max-utilization Tech.

max-utilization Oracle.

5.3 Simulated TE TechniquesWe compare the following techniques (see Table 5):

(a) Oracle: As the base case for all our comparisons, we use Mat-lab’s linprog solver to compute the optimal link utilization for anytopology and traffic matrix. This is the standard off-line central-ized oracle which uses instantaneous traffic demands and solvesthe multi-commodity flow optimization problem [26].(b) TeXCP: We have implemented TeXCP in ns2 [27]. The im-plementation uses Eqs. 5,12. The TeXCP probe timer is set toTp = 0.1s, and thus Td = 0.5s. TeXCP uses the constants α = 0.4and β = 0.225 as per Theorem 4.1. The processing time of a probeat a core router is uniformly distributed in [0,2]ms, consistent withInternet measurements of the delay jitter for packets processed onthe slow path [18]. Packet size is 1KB, and buffers store up to 0.1s.

1

1.2

1.4

1.6

1.8

2

2.2

Ebone Exodus Abovenet Genuity Sprint Tiscali AT&TRatio

of M

ax-U

tiliz

atio

n to

Opt

imal

InvCap OSPF-TE_base TeXCP

Figure 4: When traffic matches TM, TeXCP results in a max-utilizationwithin a few percent of the optimal, and much closer to optimal thanOSPF-TE or InvCap. Figure shows both average (thick bars) and max-imum (thin bars) taken over 40 TMs.

(c) OSPF-TE: We implemented 3 versions of the OSPF weightoptimizer. The first, which we call OSPF-TEBase, is from [15].Given a traffic matrix, it searches for link weights that result in lowmax-utilization.6 The second, OSPF-TEF ailures, computes linkweights that result in low max-utilization even when few criticalfailures happen [16]. The third, OSPF-TEMulti−TM , simultane-ously optimizes weights for multiple traffic matrices. Our imple-mentation gives results consistent with those in [15, 16].(d) MATE:We compare the performance of TeXCP with MATE,a prior online TE protocol [12]. MATE’s simulation code is propri-etary. Therefore, we compare TeXCP against MATE’s publishedresults [12], after consulting with the authors to ensure that the sim-ulation environments are identical.(e) InvCap: A common practice sets a link weight to the inverse ofits capacity and runs OSPF [11].

5.4 Comparison With the OSPF OptimizerWe would like to understand the performance gap between on-

line and offline traffic engineering. No prior work provides a quan-titative comparison of these two approaches. Hence, in this section,we compare TeXCP with the OSPF weight optimizer (OSPF-TE),one of the more sophisticated and highly studied offline TE tech-niques [15, 16]. Given a topology and a traffic matrix, OSPF-TEcomputes a set of link weights, which when used in the OSPFintra-domain routing protocol produce a routing with low max-utilization. We also compare against InvCap, a common practicethat sets link weights to the inverse of link capacity.(a) Static Traffic: First, we investigate the simplest case in

which IE traffic demands are static, i.e., the actual realtime trafficcompletely matches the long term demands in the TM.

6It minimizes the total cost in the network; where cost is assigned to each link basedon a piece-wise linear function of the link utilization [15].

TeXCP results

Q: In OSPF-TE, “Finding optimal link weights that minimize the max-utilization is NP-hard”. Why is this harder than finding the best possible (non-OSPF) solution?

Background: Segment Routing

Idea: source routing by composing path segments

• Segment identifies- link or service (local)- router (global)

• Associated actions at router:- Push a new segment onto

front of packet - Continue forwarding along a

specified segment- Go to Next segment in

packet• Can be implemented with MPLS

which can leverage the benefits of SR without the need of anMPLS implementation. Authors propose a new IPv6 ExtensionHeader to encode the SR header in [9]. We depict the currentlyproposed format for the header in Figure 2.

Next Header

Hdr Ext Len

Routing Type

Segments Left

First Segment

HMAC Key Id Flags

Segment List [0] (128 bits IPv6 Address)

…

Segment List [n] (128 bits IPv6 Address)

Policy List [0] (128 bits IPv6 Address, Optional)




HMAC (256 bits, Optional)

~ ~

Fig. 2. IPv6 header for SR [9].

B. SR control-planeThe control-plane of SR defines how the segment ID in-

formation is communicated among devices in the network. Ina SR network, Node and Adjacency SIDs will be advertisedvia the link state IGP protocol. ISIS and OSPF, the mostpopular IGP protocols in service provider networks, wereextended to support the distribution of segment IDs [10][11].The extensions of IGP protocols would allow any router tomaintain a database of all nodes and adjacency segments.Also, by leveraging the sub-second convergence propertiesof both IGPs, the segment database on each router can bequickly updated after any topology change. Note that usingthese extensions, end-to-end encapsulation can be performedin the network without requiring enabling and management ofanother protocol, such as LDP.

Another element of the control-plane of SR deals with howan ingress router is instructed to select the SR path that apacket should follow. The following methods can be used forthis purpose:

1) Distributed Constrained SPF (CSPF) calculation. Inthis approach, an ingress router calculates the shortestpath for a destination, under the constraint that this pathmatches some criteria. It then computes a sequence ofnode and adjacency segments that encodes this path.

2) SDN controller based approach. SR provides a scal-able and resilient data-plane while allowing the flexi-bility of control commonly assumed for SDN environ-ments. This aspect led to the planned support of SR intodesigns of some SDN oriented controllers. For example,OpenDaylight supports the control of SR using the PathComputation Element Protocol (PCEP) [12].

3) Statically defined by the operator. Static configurationof the tunnels might be used for specific purposes suchas testing or troubleshooting, but it is typically notrecommended for network operation in the long term,due to evident scaling, resiliency, and managementlimitations.

R1 R2

R5

R3

9001$ 9002$

9005$

9003$

R4 R6

PE3 PE2

9004$ 9006$

9008$9007$

Firewall 1003$

PE1

2023$ 2032$

PE4

PE5

9009$

Low BW, Low delay

High BW, high delay

Link A High BW

High delay

Link B Low BW Low delay

1002$

DPI

9010$

High BW, Low delay

High BW, High delay

9000$

Fig. 3. Sample network for service chaining and traffic engineering use cases.

An operator can choose any of these methods, based onthe applications and scenarios that they want to support. Notethat the three strategies can coexist in the same network. Statictunnels could be used for troubleshooting or specific, but infre-quent, purposes. The CSPF method provides a balance betweenconnectivity optimization and automation. The great flexibilitydelivered by centralized approaches makes it compelling fornetworks with TE objectives for which conflicting decisionscould be taken when performed in a distributed way (e.g.demand placement for capacity engineering purposes).

III. SEGMENT ROUTING USE CASES

In this section, we describe various use cases that canleverage SR to obtain maximum benefit.

A. Traffic Engineering using SR TunnelsNowadays, networks support a variety of applications, with

their respective constraints on how resources are used toserve them. It is prevalent for service providers to ensure thattraffic flows transported between same devices, with differentresource requirements, follow optimal, maybe dissimilar paths.SR can give control over traffic-engineered paths withoutincreasing control-plane overhead at the transit nodes.

We use the network topology of Figure 3 to illustratehow traffic engineering can be implemented with SR. Let usconsider 2 types of application traffic entering the domain viaPE1 that should be processed by the Deep Packet Inspection(DPI) service function before egressing the domain: Voiceservice and high demanding bandwidth service. Voice trafficshould be steered over a short latency path, which is PE1-R1-R2-LinkB-R5-PE3-DPI. Large bandwidth flow should besteered over high bandwidth path PE1-R1-R2-LinkA-R5-PE3-DPI.

For the voice traffic, PE1 would place the segment list{9002, 2032, 9008, 1002} in the SR header, and forward toR1. Note that since a Node SID instructs a device to forwarda packet using the shortest paths to a destination, PE1 doesnot require to define the path hop-by-hop. After receiving thepacket, R1 will perform a CONTINUE operation on the SR

DEFO

DEFO

Connectivity Layer

Optimization Layer

Operator

e.g. IS-IS

e.g. Segment Routing

Physical Layer e.g. Routers

Figure 1: Proposed architecture.

2. ARCHITECTUREIn this section, we detail the role of every component

of our architecture, which is illustrated in Fig. 1.The right part of the figure highlights that two log-

ical layers are installed on top of the physical networktopology (routers and links). The underlying connec-tivity layer is responsible for the default forwarding be-havior and for network-wide connectivity. It definesconnectivity paths between all pairs of routers, hencefor any traffic flow possibly traversing the network. Incontrast, the optimization layer defines exceptions tothis default routing behavior. It implements optimizedpaths used for a subset of the traversing flows (e.g., asubset of router pairs). Optimized paths overwrite con-nectivity ones: Whenever an optimized path is defined,the corresponding flows always use it. While connectiv-ity and optimization layers represent a logic separation,they can also be implemented by different protocols run-ning on the routers. This provides a cleaner design andcomes with a set of advantages. For example, it enableschanges in the optimization layer (expected to be fre-quent) without any impact on the connectivity one. Italso simplifies the assessment of the impact of changesin the connectivity layer (e.g., consequently to less fre-quent events like failures). Finally, it ensures that con-troller failures do not disrupt forwarding paths.The connectivity layer is configured (possibly once in

the network lifetime) by operators. This choice is moti-vated by several reasons. First, it maximizes the utilityof the operators’ domain knowledge (expected trafficmatrix, capacity of links, etc.) to provide a good basisfor network optimization. Second, it leaves operatorswith a mean to take back the control of the network,e.g., in the case of major problems on the controller orfor maintenance operations that cannot be supportedby DEFO (e.g., router physical replacement). Third, itfacilitates progressive deployment of our proposal.DEFO controls the optimization layer to fine-tune

forwarding. It receives high-level goals from operators,and automatically translates them into optimized paths,e.g., configurations of the optimization-layer protocol.Since optimized paths are preferred over connectivityones, DEFO decides the actual forwarding paths usedfor any traffic flow crossing the network. DEFO canperform this goal-to-path translation for any change ofconnectivity paths, traffic flows or physical topology,e.g., for online traffic engineering.

function DSL syntax semantics

max load d.loadmaximum load of any linkin F (d)

max delay d.delaymaximum delay of source-destination paths in F (d)

deviations d.deviationsnumber of deviations fromconnectivity paths in F (d)

traversal d passThrough Strue if F (d) crosses anynode in S, false otherwise

sequencingd passThrough S1 true if F (d) sequentially

crosses nodes in S1 . . . Skthen S2 . . . then Sk

avoid d avoid Strue if no node in S is alsoin F (d), false otherwise

d, F (d) and S, S1, . . . , Sk respectively represent any demand,the forwarding paths for d, and sets of network nodes.

Figure 2: Constructs of the current DEFO DSL.

3. EXPRESSING NETWORK GOALSDEFO exposes a high-level interface, based on a small

Scala DSL [16]. It enables operators to intuitively de-clare desired characteristics of forwarding paths.The central abstractions used in the interface are the

concepts of demand, forwarding function and goal. Ademand is an aggregate of flows. In the following, we fo-cus on demands at the granularity of source-destinationpairs, for simplicity. A forwarding function maps a setof links to the value of a parameter (e.g., maximumload) associated to them. Our DSL includes constructsfor pre-defined forwarding functions. It permits for-warding functions to be applied to sets of links, to pathsor to demands. In the latter case, a forwarding functionmaps a demand to the value of a parameter associatedto the forwarding paths for that demand. Fig. 2 detailsthe forwarding functions currently supported by DEFO(when applied to demands). A goal is defined by for-warding functions applied to demands. Those functionscan be composed as (i) constraints that restrict the setof forwarding paths for given demands; and (ii) objec-tive functions specifying parameters to be optimized.In the following, we show examples of DEFO goal def-

initions. They illustrate both usage of forwarding func-tions and practically-meaningful compositions of them.The examples include standard Scala keywords like valfor value declaration and <- for variable assignment ina loop. Variables storing parameters not determined byDEFO, like the network topology (topology variable)or sets of traffic demands (Demands, LowDelayDemandsand SecDemands variables), are dynamically initializedat runtime, e.g., reading from preconfigured input filesprovided by operators or routing daemons.

Classic Traffic Engineering Goals. The use of re-sources often needs to be optimized in carrier-grade net-works. For example, the classic MinMaxLoad goal con-sists in minimizing the load of the maximally loadedlink, e.g., to avoid performance bottlenecks. It can bedeclared in DEFO in two lines.

var MaxLoad = max(for(l<-topology.links){yield l.load})val goal = new Goal(topology){ minimize(MaxLoad) }

17

Tactical Traffic Engineering Goals. To limit traf-fic disruptions, operators are often reluctant to changemany forwarding paths, and can prefer sub-optimal con-figurations to an unbounded number of path changes.DEFO forwarding functions can be combined to expressthose tactical goals. As an example, the following codeinstructs DEFO to find the best solution bringing alllink utilization under a certain threshold with at most2 deviations from connectivity paths.

val goal = new Goal(topology){for(d<-Demands) add(d.deviations <= 2)for(l<-topology.links) add(l.load <= 0.9 l.capacity)minimize(MaxLoad)}

Refined Goals. Operators often have to accommodatespecific (e.g., per-customer) needs. In DEFO, goals in-cluding arbitrary combinations of the forwarding func-tions in Fig. 2 are supported by adding constraints andchanging the objective function. The following exam-ple shows how to define a delay-respectful goal, that is,a variant of MinMaxLoad with constraints on the max-imum delay experienced by specific demands. In theexample, the latency of every network path is assumedto be known by DEFO, e.g., as a result of few activemeasurements. Our DSL also permits to define otherdelay-respectful goals, e.g., with delay constraints ex-pressed as a percentage of their current values or delaysfor given demands in the objective function.

val goal = new Goal(topology){for(d <- LowDelayDemands) add(d.latency <= 10.ms)minimize(MaxLoad)}

DEFO also supportsmulti-objective goals [17]. The nextsnippet shows a goal in which both the maximum linkload and the average latency have to be optimized.

val goal = new Goal(topology){minimize(MaxLoad, AvgLatency)}

Support for New Applications. Many operatorshave lately shown interest for service chaining, i.e., theability to steer packets through a sequence of services.In the example below, each demand in SecDemands isforced to pass first through one firewall in FirewallSet(whose position is assumed to be known by the opera-tor) and through either IPS1 or IPS2 after.

val goal = new Goal(topology){for(d <- SecDemands){add(d passThrough FirewallSet then (’IPS1, ’IPS2))}

minimize(MaxLoad)}

The avoid and passThrough constructs can also beused to specify anycast goals, in which a set of routersmust be avoided or forcedly used. This meets the needsof operators that have to comply with political or busi-ness rules, like preventing or ensuring that given trafficflows cross a specific country.

Finally, operators can run DEFO to achieve the de-clared goal, possibly specifying an upper bound for itscomputation time.

DEFOptimizer(goal).solve(30.sec)

Note that infeasible goals can be defined in our DSL,e.g., trying to force a demand through an isolated nodeor requiring a link utilization that cannot be achievedon the given topology and input demands. In our im-plementation, DEFO discards unsatisfiable constraints,computes a solution satisfying the other ones, and re-turns a warning to the operator. Similarly, if DEFOdoes not terminate in the specified time, it returns thebest solution found during that time.

4. OPTIMIZED PATH COMPUTATIONIn this section, we describe how DEFO computes

forwarding paths. Namely, we detail how it tacklesthe challenges posed by the need for (i) supporting awide variety of possible input goals (defined by arbitrarycombinations of forwarding functions in Fig. 2), and(ii) fast computation of solutions to computationally-hard (translation) problems. First, we present the Mid-dlepoint Routing model, internally used by DEFO fora compact representation of paths (§4.1). Then, weoverview DEFO formal representation of input goals interms of Middlepoint Routing instances (§4.2). Finally,we describe the optimization algorithm that it runs tocompute compliant optimized paths (§4.3).

4.1 The Middlepoint Routing ModelDEFO relies on the Middlepoint Routing (MR) model.

MR is similar to the pathlet routing model [18, 19] inthat it represents paths as concatenations of sub-paths.In contrast to pathlet routing, however, MR is based onforwarding graphs rather than single paths. This gener-alization enables to incorporate equal-cost multipath inthe model, and to provide a compact representation ofseveral distinct paths between any source-destinationpair. Moreover, we use MR to represent forwardingpaths rather than for routing protocol design.An MR instance represents acyclic paths from any

source s to any destination d as sequences of acyclicgraphs S = G1, . . . , Gn, with n ≥ 1. In S, any pair ofconsecutive graphs Gi and Gi+1 (with i = 1, . . . , n− 1)share a single common node mi, which is the sink ofGi and the root of Gi+1. We refer to any graph in Sas partial forwarding graph (PFG), and to any sharednode between two consecutive PFGs as middlepoint. Ofcourse, a PFG can represent a single path. In an MRinstance, a source-destination pair is associated to oneor more sequences S1, . . . ,SN of PFGs. As an illustra-tion, consider Fig. 3. Circles and arrows respectivelyrepresent routers and corresponding forwarding next-hops, while dashed squares identify PFGs. The MRrepresentation of the paths from s to t is then given bysequences S1 = [G(s,m), G(m,t)] and S2 = [G(s,t)].

18

s t

m

Figure 3: MR representations of s− t paths.

In the specific case of our two-layer architecture, eachPFG represents connectivity paths. Since PFGs aredefined by the connectivity layer, we can simplify therepresentation of optimized paths as a set of middle-point sequences. The right part of Fig. 3 also reportsthe simplified MR representation of s − t paths, whichis {[m], []}, under the assumption G(s,t) represents theconnectivity paths from s to t.Intuitively, DEFO uses the compact representation

of paths provided by MR to limit the number of deci-sion variables in the forwarding optimization. Indeed,it associates decision variables to the sequences of mid-dlepoints to be used for a given traffic flow. In Fig. 3,for instance, DEFO uses 2 variables (for the two mid-dlepoint sequences from s to t) instead of 8 (as neededif variables were associated to single paths) or 23 (asneeded to represent traversed links and paths, like inclassic linear programming formulations). In the fol-lowing, we explain how DEFO computes the value formiddlepoint sequences.

4.2 Network Goal FormalizationDEFO formalizes an input goal as an (initially empty)

MR instance with associated constraints and optimiza-tion functions. This formalization is based on the Con-straint Programming (CP) optimization framework [14].A CP problem is defined by a set of variables, eachhaving its own finite domain of possible values, and aset of constraints that apply to them. Contrary to lin-ear programming, constraints are implemented by algo-rithms that preserve consistency between variable do-mains. For this reason, CP supports high-level, inde-pendent and easily composable constraints. An objec-tive function can also be added to a CP problem.We now describe novel variables, data structures and

constraints used for goal formalization in DEFO. Con-sistently with §3, we assume that topology, demandsand parameters (like path delays) independent fromDEFO computations are provided as an input. Moredetails on our CP formalization are reported in [20].

Middlepoint variables model the MR representationof per-demand forwarding paths. Every input demandis mapped to a middlepoint variable, representing a set

of middlepoint sequences. We say that a middlepointvariable is assigned to a value when all the representedsequences end with the destination of the correspond-ing demand. If this condition does not hold, we saythat the middlepoint variable is unassigned. For exam-ple, the forwarding paths for the demand from s to t inFig. 3 are represented by an assigned middlepoint vari-able {[t], [m, t]}. Middlepoint variables represent thedecision variables of the optimizations performed byDEFO. When forwarding paths need to be optimized,the middlepoint variables are re-initialized to an emptyvalue. Then, they are progressively assigned to valuesaccording to the algorithm described in §4.3.

Forwarding function algorithms guarantee the sat-isfaction of constraints defined on forwarding functions.Every supported forwarding function implemented bya specific algorithm, specialized for that function. Forexample, the load and delay forwarding functions aresupported by different algorithms. Forwarding functionalgorithms (i) extract the value of the associated for-warding function (for example, the load of the maxi-mally loaded link for the load forwarding function al-gorithm) from a set of links or from forwarding pathscorresponding to the value of a middlepoint variable;(ii) compare the extracted value with a configured thresh-old, to assess if the represented constraint is satisfied;and (iii) reduce the domain of middlepoint variables byexcluding values that violate a constraint on the as-sociated forwarding function. The thresholds checkedby those algorithms are initialized by the constraintsof the input goal. For example, a constraint l.link <l.capacity defined on a link l initializes the value to bechecked on l by the load forwarding function algorithmto the link capacity provided by the input topology.

4.3 Middlepoint SelectionWe propose an efficient algorithm to solve CP prob-

lems corresponding to DEFO input goals. By assigningvalues to middlepoint variables, our algorithms com-pute which middlepoints to use in the optimized pathsof which traffic flow. The same algorithm can also beused to compute backup paths (e.g., to be pre-installedin routers as in well-known fast reroute techniques [21]).Unfortunately, selecting middlepoints is a hard prob-

lem. Indeed, we proved [20] that middlepoint selectionproblems are NP-hard, even if only link capacity con-straints have to be respected. Additional constraint orspecific objective function can make the correspondingselection problem even harder. Despite this, our algo-rithms are required to be efficient and scalable, for thecontroller to quickly react to events (failures, demandand goal changes, etc.) in large-scale networks.To quickly compute good solutions, we then propose a

heuristic approach, mixing a pure CP solution with a lo-cal search technique called Large Neighborhood Search(LNS). Intuitively, we use CP to compute the best solu-tion in a given portion of the search space, and we rely

19

… for each ingress-egresstraffic bundle

A Declarative and Expressive Approach

to Control Forwarding Paths in Carrier-

Grade Networks

Hartert, Vissicchio, Schaus, Bonaventure,

Filsfils,Telkamp, Francois

SIGCOMM 2015

DEFO

s t

m

Figure 3: MR representations of s− t paths.

In the specific case of our two-layer architecture, eachPFG represents connectivity paths. Since PFGs aredefined by the connectivity layer, we can simplify therepresentation of optimized paths as a set of middle-point sequences. The right part of Fig. 3 also reportsthe simplified MR representation of s − t paths, whichis {[m], []}, under the assumption G(s,t) represents theconnectivity paths from s to t.Intuitively, DEFO uses the compact representation

of paths provided by MR to limit the number of deci-sion variables in the forwarding optimization. Indeed,it associates decision variables to the sequences of mid-dlepoints to be used for a given traffic flow. In Fig. 3,for instance, DEFO uses 2 variables (for the two mid-dlepoint sequences from s to t) instead of 8 (as neededif variables were associated to single paths) or 23 (asneeded to represent traversed links and paths, like inclassic linear programming formulations). In the fol-lowing, we explain how DEFO computes the value formiddlepoint sequences.

4.2 Network Goal FormalizationDEFO formalizes an input goal as an (initially empty)

MR instance with associated constraints and optimiza-tion functions. This formalization is based on the Con-straint Programming (CP) optimization framework [14].A CP problem is defined by a set of variables, eachhaving its own finite domain of possible values, and aset of constraints that apply to them. Contrary to lin-ear programming, constraints are implemented by algo-rithms that preserve consistency between variable do-mains. For this reason, CP supports high-level, inde-pendent and easily composable constraints. An objec-tive function can also be added to a CP problem.We now describe novel variables, data structures and

constraints used for goal formalization in DEFO. Con-sistently with §3, we assume that topology, demandsand parameters (like path delays) independent fromDEFO computations are provided as an input. Moredetails on our CP formalization are reported in [20].

Middlepoint variables model the MR representationof per-demand forwarding paths. Every input demandis mapped to a middlepoint variable, representing a set

of middlepoint sequences. We say that a middlepointvariable is assigned to a value when all the representedsequences end with the destination of the correspond-ing demand. If this condition does not hold, we saythat the middlepoint variable is unassigned. For exam-ple, the forwarding paths for the demand from s to t inFig. 3 are represented by an assigned middlepoint vari-able {[t], [m, t]}. Middlepoint variables represent thedecision variables of the optimizations performed byDEFO. When forwarding paths need to be optimized,the middlepoint variables are re-initialized to an emptyvalue. Then, they are progressively assigned to valuesaccording to the algorithm described in §4.3.

Forwarding function algorithms guarantee the sat-isfaction of constraints defined on forwarding functions.Every supported forwarding function implemented bya specific algorithm, specialized for that function. Forexample, the load and delay forwarding functions aresupported by different algorithms. Forwarding functionalgorithms (i) extract the value of the associated for-warding function (for example, the load of the maxi-mally loaded link for the load forwarding function al-gorithm) from a set of links or from forwarding pathscorresponding to the value of a middlepoint variable;(ii) compare the extracted value with a configured thresh-old, to assess if the represented constraint is satisfied;and (iii) reduce the domain of middlepoint variables byexcluding values that violate a constraint on the as-sociated forwarding function. The thresholds checkedby those algorithms are initialized by the constraintsof the input goal. For example, a constraint l.link <l.capacity defined on a link l initializes the value to bechecked on l by the load forwarding function algorithmto the link capacity provided by the input topology.

4.3 Middlepoint SelectionWe propose an efficient algorithm to solve CP prob-

lems corresponding to DEFO input goals. By assigningvalues to middlepoint variables, our algorithms com-pute which middlepoints to use in the optimized pathsof which traffic flow. The same algorithm can also beused to compute backup paths (e.g., to be pre-installedin routers as in well-known fast reroute techniques [21]).

Unfortunately, selecting middlepoints is a hard prob-lem. Indeed, we proved [20] that middlepoint selectionproblems are NP-hard, even if only link capacity con-straints have to be respected. Additional constraint orspecific objective function can make the correspondingselection problem even harder. Despite this, our algo-rithms are required to be efficient and scalable, for thecontroller to quickly react to events (failures, demandand goal changes, etc.) in large-scale networks.

To quickly compute good solutions, we then propose aheuristic approach, mixing a pure CP solution with a lo-cal search technique called Large Neighborhood Search(LNS). Intuitively, we use CP to compute the best solu-tion in a given portion of the search space, and we rely

19

ECMP

ECMP

DEFO discussion

What’s the benefit of using a middlepoint instead of an explicit path?

What are the advantages & disadvantages of DEFO compared to TeXCP?

Announcements

Wednesday

Project proposal feedback later today

Readings

• OpenFlow (McKeown et al, 2008)• Fabric: A Retrospective on Evolving SDN (Casado et al,

HotSDN 2012)• Recommended, but no review: The Future of Networking

(Shenker, ONS 2011)

Documents

Intradomain Routing - University Of Illinois...Approach 1: Optimize OSPF weights • e.g. OSPF-TE • Need to propagate everywhere: can’t change often • Artiﬁcial constraints