1. Modelling and Understanding End-To-End Class of Service Policies in Operational Networks

Embed Size (px)

Citation preview

  • 8/12/2019 1. Modelling and Understanding End-To-End Class of Service Policies in Operational Networks

    1/12

    Modeling and Understanding End-to-End Class of ServicePolicies in Operational Networks

    Yu-Wei Eric Sung , Carsten Lund , Mark Lyn , Sanjay Rao , and Subhabrata Sen Purdue University, AT&T Labs Research, AT&T Inc.

    {sungy,sanjay}@ecn.purdue.edu, {lund,sen}@research.att.com, [email protected]

    ABSTRACTBusiness and economic considerations are driving the extensive useof service differentiation in Virtual Private Networks (VPNs) op-erated for business enterprises today. The resulting Class of Ser-vice (CoS) designs embed complex policy decisions based on thedescribed priorities of various applications, extent of bandwidthavailability, and cost considerations. These inherently complexhigh-level policies are realized through low-level router congu-

    rations. The conguration process is tedious and error-prone giventhe highly intertwined nature of CoS conguration, the multiplerouter congurations over which the policies are instantiated, andthe complex access control lists (ACLs) involved. Our contribu-tions include (i) a formal approach to modeling CoS policies fromrouter conguration les in a precise manner; (ii) a practical andcomputationally efcient tool that can determine the CoS treat-ments received by an arbitrary set of ows across multiple routers;and (iii) a validation of our approach in enabling applications suchas troubleshooting, auditing, and visualization of network-wide CoSdesign, using router conguration data from a cross-section of 150diverse enterprise VPNs. To our knowledge, this is the rst effortaimed at modeling and analyzing CoS congurations.

    Categories and Subject Descriptors: C.2.3 [Network Operations] :

    Network ManagementGeneral Terms: Design, Management, MeasurementKeywords: Conguration modeling, differentiated service

    1. INTRODUCTIONService differentiation using Class of Service (CoS) is being in-

    creasingly adopted in a wide variety of network settings includingenterprise Virtual Private Networks (VPNs). The need to supportCoS differentiation arises from several considerations: (i) IP net-works today carry trafc from a diverse set of applications withvery different performance needs such as low delay and low lossfor Voice over IP (VoIP) trafc, and high throughput for bulk datatransfer like FTP; (ii) different applications can have very differ-ent relative importance to an enterprise customer, e.g., web servicetransactions supporting a vital workow are likely to be accorded

    Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for prot or commercial advantage and that copiesbear this notice and the full citation on the rst page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specicpermission and/or a fee.SIGCOMM09, August 1721, 2009, Barcelona, Spain.Copyright 2009 ACM 978-1-60558-594-9/09/08 ...$10.00.

    much higher priority compared to trafc associated with brows-ing external web sites; and (iii) economic considerations drive theneed to optimize the usage of existing network infrastructures un-der nite capacity and cost constraints, while ensuring good perfor-mance for important applications.

    CoS designs embed inherently complex policy decisions basedon the described priorities of various applications, extent of band-width availability, and cost considerations. Realizing these pol-icy objectives involves conguring multiple routers in low-level,device-specic conguration languages. Troubleshooting customercomplaints regarding poor performance involves tracing the con-gured CoS treatment of individual ows by looking across routerconguration les, each of which can contain thousands of com-mand lines. Misconguration could potentially lead to violationsof the service level agreements (SLAs), business service disrup-tions for the enterprise, and penalties for the service provider. Theproblem is daunting considering the large number of enterprise net-works managed, their large size and geographical span, and the di-versity of services and conguration options.

    While managing network conguration is hard in general, CoSconguration poses several unique challenges. First, CoS cong-uration is highly intertwined, with dependencies across differentparts of a conguration le. Second, the overall CoS treatment

    seen by a ow between two end-points may be impacted by policyblocks (i.e., parts of the conguration specifying CoS policy rules)instantiated in multiple device, possibly on both input and outputinterfaces of each device, and by multiple policy blocks even on asingle interface. Third, each policy block can transform a packetbefore it enters the next policy block for instance, a router maymark a packet at the inbound interface to indicate the priority class,and the actions of the routers outbound interface or a downstreamrouter may depend on the marking. Finally, each policy block clas-sies ows into different classes using multiple access control lists(ACLs), each of which can have hundreds of inter-dependent rules.

    1.1 ContributionsIn this paper, we make the following contributions:

    We draw attention to the prevalence of CoS, and the complexity

    inherent in managing CoS conguration, a topic that is little knownoutside the operational community. Our understanding has beengleaned from repeated inspections of routerconguration les froma large number of enterprise VPNs, and through close interactionswith network designers. We believe our efforts pave the way forthe research community to further engage in the area. We propose a formal representation of CoS policies that is inde-pendent of the underlying conguration syntax. This representationis easily derivable from low-level congurations and can enableus to systematically compose CoS policies across multiple policyblocks within a single router and across multiple routers.

  • 8/12/2019 1. Modelling and Understanding End-To-End Class of Service Policies in Operational Networks

    2/12

    CER1

    MPLSBackbone

    PER4

    PER3

    PER2

    PER1

    CER4 CER2

    CER3LA

    San Diego Dallas

    LA Dallas branch traffic

    CER: Customer-Edge RouterPER: Provider-Edge Router

    New York

    CER5

    Houston

    CER4b

    Figure 1: An enterprise VPN spanning multiple sites.

    Based on our framework, we have built a query tool that enablesoperators to determine how an arbitrary set of ows are treated onan end-to-end basis. We use the term end-to-end throughoutthis paper to refer to the fact that the tool can model the net ef-fect that a ow would see if it were to traverse a given sequenceof devices, each of which could potentially have multiple CoS pol-icy blocks. The set of devices of interest is assumed to be known,though in the future, routing table information could be incorpo-rated to automate the discovery of the devices involved. The toolleverages binary decision diagrams (BDD) [14], a practical and ef-cient data structure for manipulating boolean operations. Withour tool, the computation time for query resolution is in the orderof seconds for the real-world datasets we considered.

    While we believe our modeling approach is itself more general,the current implementation of the tool is targeted at Cisco IOS con-gurations, and enterprise VPN settings. Further, the focus is onCoS policies instantiated in routers at the customer and provideredge, at both the ingress and egress. We focus on these policies asthey reect unique requirements of individual customers and tendto change frequently. CoS policies in the core are not consideredas they reect provider policies which are relatively homogeneousand stable, and do not involve IP packet transformations. We have evaluated our tool on a cross-section of 150 differ-ent enterprise VPNs. Even though the VPNs are managed by thesame provider, their CoS designs show signicant diversity acrossenterprises, perhaps reecting the different application mixes and

    cost considerations of each enterprise. Further, their designs showa large degree of heterogeneity in terms of how different routerswithin an enterprise are congured, reecting the different func-tionality offered by sites within an enterprise (e.g., a VoIP cus-tomer service center may be congured differently than a data cen-ter). The results also conrm the importance and effectiveness of our approach in assisting operators to reason about network-wideCoS operations. In particular, our tool enables operators to auditthe overall CoS design, identify potentially anomalous ow treat-ments and conguration lines that are never triggered, and derive anetwork-wide view of the CoS design which summarizes the trafcclasses that can be exchanged between every pair of VPN sites.

    The rest of the paper is organized as follows. 2 presents thebackground on CoS in enterprise VPN settings and motivates ourformal modeling approach. 3 describes a framework for modeling

    CoS conguration. 4 outlines the design and key implementationaspects of our CoS query tool. 5 describes potential usage sce-narios for the tool. 6 studies CoS usage in practice and evaluatesthe effectiveness of the tool in troubleshooting, auditing, and deriv-ing network-wide CoS designs. 7 summarizes related works. 8discusses some open issues, and nally 9 concludes the paper.

    2. BACKGROUND AND MOTIVATIONEnterprise networks are increasingly moving from using dedi-

    cated private lines to using VPNs, especially layer-3 Multi Protocol

    P routers

    PER1 PER2CER1 CER2

    MPLS VPN Backbone

    Packets aremarked

    inbound offLAN givenDSCP setting

    for appropriateCoS.

    Packets are policeddepending on whetherthey are conforming or

    non-conforming.

    Packets are scheduledusing various queuing

    disciplines andcongestion avoidance

    mechanisms.

    Packets are still markedwith the same DSCP

    value as they were whenthey left CER1.

    Packets are scheduled

    using various queuingdisciplines andcongestion avoidance

    mechanisms.

    EXP-based CoS

    DSCP-based CoS

    LA Dallas

    Figure 2: CoS treatment along an end-to-end path.

    Label Switching (MPLS) VPN technology, to connect geographi-cally disparate sites. In this architecture (see Fig. 1), each site hasone or more customer edge routers (CER), which can be jointlycongured by the enterprise customer and service provider. EachCER connects to the provider network via one or more provideredge routers (PER). Customer IP trafc from a CER is encapsu-lated by MPLS labels at the ingress PER, carried over MPLS tun-nels across core routers (referred to as Provider or P routers) in theMPLS provider backbone, decapsulated by a remote egress PER,

    and sent to the appropriate destination CER. Such provider-basedVPNs provide a scalable and secure way for service providers tosupport different customers across a common MPLS backbone.

    2.1 Service differentiation in VPNsA primary consideration for service providers in designing en-

    terprise VPNs is to satisfy various application needs such as delay, jitter, and bandwidth. In this architecture, both the enterprise andprovider are faced with supporting trafc for applications with dif-ferent performance requirements and service priorities over a com-mon network infrastructure. Furthermore, in the provider back-bone, services with different requirements such as VPNs, Internet,multicast, and VoIP may coexist. At the edge of the network, be-tween the enterprise and the provider, limited available bandwidthcapacity coupled with the high cost of deploying and provision-ing additional capacity means that bandwidth needs to be carefullymanaged across the applications competing for it. While the core istypically overprovisioned, bandwidth must still be carefully man-aged under failure scenarios.

    Service differentiation is essential to trafc management in sucha heterogeneous environment. A provider and an enterprise cus-tomer enter into service level agreements (SLAs) that describe theagreed-to requirements on performance for different applications this involves complex choices based on application requirementsand behaviors, the relative priorities of applications used by thatenterprise, and cost. The routers in the VPN need to then be con-gured appropriately to honor the SLAs. SLA violations can haveadverse consequences, potentially involving disruption of criticalenterprise activity and heavy penalties for the service provider.

    2.2 Realizing service differentiation using CoSClass of Service (CoS) is a way of realizing service differen-

    tiation, by grouping trafc with similar service requirements andtreating each group as a class with its own level of service priority.The Type of Service (ToS) eld of the IP header of each packet in-dicates the priority class to which it belongs. Typically, the rst 6bits of the ToS eld encodes the class information, and these bitsare referred to as the Differentiated Services Code Point (DSCP)bits. The DSCP bits are directly used by the CERs and PERs indeciding the treatment that a packet must receive on the CER-PER

  • 8/12/2019 1. Modelling and Understanding End-To-End Class of Service Policies in Operational Networks

    3/12

    1 class-map match-any REALTIME2 match access-group VOICE3 match access-group INTERACTIVE-VIDEO4 match ip dscp 465 class-map match-all ... ! OTHER class-maps6 !7 policy-map REALTIME-POLICER8 class REALTIME9 police 30000 conform-action set-dscp-transmit 46

    exceed-action drop10 policy-map CRITICAL-DATA-POLICER11 class ROUTING12 set ip dscp 4813 class OTHER-CRITICAL-DATA14 police 15000 conform-action set-dscp-transmit 26

    exceed-action set-dscp-transmit 2815 policy-map BEST-EFFORT-MARKER16 ...17 !18 policy-map WAN-EGRESS-POLICER-QUEUE19 class REALTIME20 priority percent 3521 service-policy REALTIME-POLICER22 class CRITICAL-DATA23 bandwidth percent 4024 service-policy CRITICAL-DATA-POLICER25 class class-default26 bandwidth percent 2527 service-policy BEST-EFFORT-MARKER28 !29 interface Ethernet0/1 ! INTERFACE TO LA BRANCHs LAN30 ... ! LAN INGRESS MARKING POLICY31 !32 interface Serial1/0 ! INTERFACE TO PER133 service-policy output WAN-EGRESS-POLICER-QUEUE34 !35 ip access-list extended VOICE36 permit ip 192.168.1.0 0.0.0.255 any37 permit ip any 192.168.1.0 0.0.0.25538 ip access-list . .. ! ACLs FOR OTHER TRAFFIC TYPES

    Figure 3: Class of Service conguration of CER1.

    and PER-CER links. Before forwarding an IP packet to the back-bone, the PER encapsulates it with an MPLS label and maps theDSCP value of the packet to a 3-bit eld in the MPLS label re-ferred to as the experimental (EXP) value. This EXP value is usedby P routers to differentially treat trafc in the core. Fig. 2 illus-trates the overall process of realizing differentiated treatment alongan end-to-end path.

    Based on the SLAs and customer input, the network designeris tasked with determining the CoS policies to instantiate at eachCER and PER for a given VPN. Conceptually, there are three maincomponents to CoS policies: Marking: This component involves rules to determine how in-coming trafc may be assigned to a particular class by setting ap-propriate ToS values based on packet parameters. Typically, theparameters used are a subset of the 6-tuple: source and destinationIPs and ports, protocol, and ToS. Note that the ToS eld is includedas the classication decision may depend on the ToS eld set by anearlier device, e.g., a router at the customer premise. Policing: This component involves rules that determine whether

    the packet arrival rate of a class conforms to the specied trafcrates and what actions should be taken. For example, conformantand non-conformant trafc of each data class could be assigneddifferent DSCP bits to indicate they must be treated differently. Al-ternately, non-conformant trafc could simply be dropped. Queuing: This component applies queuing rules to outgoing routerinterfaces that may experience congestion (e.g., a PER-facing CERinterface). The rules for each queue specify the discipline (e.g.,RED, Weighted Fair Queuing) and allocated bandwidth of thequeue,as well as attributes (e.g., drop rate) for different trafc classes as-signed to the queue.

    1 class-map match-any REALTIME2 match ip dscp 463 class-map match-any CRITICAL-DATA4 match ip dscp 485 match ip dscp 266 match ip dscp 287 !8 policy-map WAN-INGRESS-POLICING9 class REALTIME

    10 police 95000 conform-action set-mpls-exp-transmit5 exceed-action drop

    11 class CRITICAL-DATA12 police 60000 conform-action set-mpls-exp-transmit

    3 exceed-action set-mpls-exp-transmit 713 class class-default14 police 75000 conform-action set-mpls-exp-transmit

    0 exceed-action set-mpls-exp-transmit 415 !16 interface Serial1/0 ! INTERFACE TO CER117 service-policy input WAN-INGRESS-POLICING18 service-policy output WAN-EGRESS-QUEUE ! TO LA19 !

    Figure 4: Class of Service conguration of PER1.

    2.3 Complexity of CoS congurationIn this section, we draw attention to a few key aspects of CoS

    conguration that make CoS policies particularly complex to man-age and understand. Our insights are drawn from extensive anal-ysis of conguration data from operational enterprise VPNs (see6). Our discussion is conducted in the context of Fig. 3 and Fig. 4which respectively present CoS conguration snippets of CER1and PER1 in Fig. 2. The congurations are based on Cisco IOSand are signicantly simplied for ease of illustration.Instantiated over multiple devices: CoS policies may be instan-tiated at the CERs, PERs, and P routers, all of which may impactthe treatment of a ow traversing these routers. For example, poli-cies in the ingress CER are customized to individual customers andimpact how different trafc ows are prioritized on the CER-PERlink. Policies in the ingress PER typically set the appropriate EXPvalue in the MPLS label based on the DSCP marking from theingress CER, and trafc conformance. Policies in the P routers de-termine the per-hop queuing strategy inside the core based on theEXP bits and are typically homogeneous across enterprises sincethey reect provider-level priorities inside the backbone. Finally,policies in the egress PER prioritize ows on the PER-CER link according to their ToS values.Diversity and dynamics of customer policies: The congurationof theingress CER is tailored to meet theuniquerequirements of in-dividual customers. As our analysis of real enterprise VPN datasetsin 6.2 will show, CERs within an enterprise VPN may be very dif-ferent and heterogeneous in terms of the data classes that each CERsupports, while the polices used across enterprises may be very di-verse. Further, these policies are very dynamic, requiring congu-ration changes over time to reect the evolving nature of customerrequirements, emergence of new applications, and shifts in trafcpatterns (e.g., migration of a database server to another site).

    Multiple policy blocks per device: The CoS conguration of eachrouter can be composed of multiple CoS policy blocks. Typically,the output PER-facing interface of a CER is associated with twopolicy blocks, corresponding to the policing and queuing rules.In addition, every input interface may be associated with a policyblock corresponding to the marking rules, or the marking rules maybe merged with the policing rules on the output interface. Fig. 3illustrates CER1s conguration, which employs the latter style.Lines 32-34 show thedenition of the PER1-facing WAN interface,which is associated with a single policy construct embedding twoegress policy blocks (policing and queueing). Lines 18-28 show

  • 8/12/2019 1. Modelling and Understanding End-To-End Class of Service Policies in Operational Networks

    4/12

    0

    20

    40

    60

    80

    100

    0 100 200 300 400 500

    C D F ( C E R )

    Number of ACL rules related to CoS

    Figure 5: CDF of the number of CoS-related ACL rules per CER.

    the denition of the queuing policy block, and lines 7-17 show thedenition of the policing block. Each of these conguration blocksstarts with the keyword policy-map and is organized into multi-ple sub-blocks, each of which starts with the keyword class . Eachof these sub-blocks corresponds to a particular class of trafc, anddenes the (policing or queuing) rules for that class. For instance,for the REALTIME class, line 9 indicates that based on the param-eter, trafc is either marked with a DSCP value of 46 to indicateit is conformant, or dropped if it is non-conformant, while line 20indicates that the queueing discipline to be used is a priority queue.Specifying class membership using multiple large ACLs: Flowsare usually classied into different classes by using multiple ACLs.Forinstance, in Fig. 3, the class membership is specied in lines 1-5using conguration blocks that start with the keyword class-map .Each class-map identies matching ows with a list of criteria,such as ACLs (lines 2-3) or packet parameters (line 4), and basedon the match-all or match-any keyword might declare a packetto be a member of that class if all or any of the criteria is matched.An ACL consists of a list of permit/deny rules evaluated sequen-tially, and the membership described by an ACL includes all thepermitted ows. For example, the ACL used by line 2 describesVOICE trafc with two permit rules (lines 35-37). Fig. 5 shows aCDF of the total number of ACL rules used by CoS policies con-gured in a CER, across all VPNs in our dataset. We see that morethan 100 ACL rules are congured for CoS in 20% of the CERs,and the CER with the most ACL rules has more than 500 rules.Discussions with the designers revealed that the large number of rules may be due to (i) a customer not using contiguous subnet orport ranges to classify applications; and (ii) evolution of classi-cation policies, when ACL entries are added without consolidatingexisting entries. Given that each policy block can specify member-ship of multiple classes, each expressed as a logical combinationof multiple ACLs with potentially hundreds of rules, manually de-termining the class which each ow in a set of ows belongs to isimpractical from the perspective of operators.Transformations of ows: Each CoS policy block may modify thepacket headers or MPLS labels, impacting how they are treated atthe next policy block or device. For instance, marking rules may

    be instantiated in the input LAN-facing interface of a CER, whichcan modify the DSCP header of a packet, which in turn affects howthe packet is treated by the policing and queueing rules. Likewise,the treatment of a ow at the PER may depend on how it has beenmarked at the CER. Fig. 4 presents a conguration snippet of PER1.The conguration has a similar structure to that of CER1. For eachclass of trafc, the appropriate EXP value in the MPLS label is setbased on the DSCP value and trafc conformance as shown in thepolicy-map block in lines 8-15. Interestingly, we note that trafcis labeled with the same EXP bits for multiple DSCP values of 48,26, and 28. Thus, though the trafc belonging to these classes may

    Ethernet0/1 interface

    WAN-EGRESS-POLICER-QUEUE

    Serial1/0

    REALTIME

    policy-map(parent)

    class-map

    LAN-POLICY

    VOICE access-list

    CONFIG CONFIG

    BLOCK BLOCK

    CRITICAL-DATA

    REALTIME-POLICERBEST-EFFORT-MARKER

    class-default OTHER-CRITICAL-

    DATA

    INTERACTIVE-VIDEO

    BGPTRANSACTIONAL-

    DATA

    policy-map

    (child)

    CRITICAL-DATA-POLICER

    ROUTING

    Figure 6: Syntactic reference structure of conguration in Fig. 3.

    be treated differently at the CER1-PER1 link, they are treated in thesame fashion in the backbone. Given such complexity in design, itis necessary to consider potential transformations by every routerin determining the end-to-end treatment of a ow.Potential for errors: CoS conguration is highly intertwined, withseveral dependencies that exist across different logical blocks of the conguration. Fig. 6 illustrates the syntactic dependencies of CER1s conguration. Congurations with such a highly inter-twined structure are hard to manually navigate and prone to mis-congurations. An example misconguration may involve real-time trafc being incorrectly congured to enter the queue cor-responding to best-effort trafc. As another example, errors mayarise due to the complex ordering relationships in the congura-tion. For instance, consider the policy-map block in lines 18-28of Fig. 3. If the class-map REALTIME had a catch-all clause (alltrafc is matched), then, no trafc wouldmatch the class CRITICAL-DATA. This would imply that no trafc would be sent to that par-ticular queue, even though this is not obvious from a casual inspec-tion of the conguration. We refer to a situation where a policycorresponding to a trafc class can never see trafc as a shadowed policy . We discuss more examples of such errors in 6.

    3. MODELING CLASS OF SERVICEIn this section, we present a model to extract the end-to-end CoS

    policies between a pair of devices from low-level congurations.We view such a model as an essential building block that can en-able applications useful to operators such as troubleshooting, visu-alization of network-wide CoS designs, auditing, and analysis of conguration changes. Performing these tasks is challenging to-day due to the complex nature of CoS conguration, motivating theneed for formal modeling of CoS policies.

    In modeling CoS conguration, we had several objectives. First,we wanted a formal representation of CoS policies that preciselyand unambiguously captures the policy goals, yet is independent of low-level conguration syntax. Second, the representation shouldbe easily derivable from low-level congurations through a sim-ple parser. Third, we wanted the representation to be amenable to

    composition, i.e., it should be possible to compose the formal rep-resentations corresponding to different policy blocks in a router, oracross routers to obtain an end-to-end view of the CoS design.

    We provide an overview of our approach to achieve these goalsin 3.1, and discuss the details in the rest of the section.

    3.1 OverviewWe model the overall CoSpolicy as a function that we calla rule-

    set that takes a multiple dimensional input and produces an output.The input is a owset , which is an arbitrary set of ows. A sin-gle ow is identied by the 6-tuple IP header elds: source and

  • 8/12/2019 1. Modelling and Understanding End-To-End Class of Service Policies in Operational Networks

    5/12

    destination IP addresses, source and destination port numbers, pro-tocol, and ToS byte. In addition, there may be other inputs thatmodel attributes outside the scope of a static analysis. For exam-ple, a policing policy treats packets differently depending on theirconformance to the SLA. We handle this by having an additionalbit in the input that species whether the model should treat thepackets in a ow as conformant or not. The output provides infor-mation about the treatment seen by a packet corresponding to eachow (e.g., which queue would be used by the packet), how thepacket gets changed (e.g., how the routers modify the ToS byte),and where the packet ends up (e.g., does it get dropped by a policerif non-conformant).

    Our overall approach consists of three steps: First, we construct a root ruleset that models each CoS policyblock of every device from the conguration les. For instance, forthe CER conguration in Fig. 3, two root rulesets are constructed,which correspond to the policing rules (lines 7-17) and the queuingrules (lines 18-28). For the PER conguration in Fig. 4, one rootruleset is constructed corresponding to the policing rules (lines 8-15). We initially begin with a representation of a root ruleset which isitself expressed in terms of other rulesets. These dependencies mir-ror the inherent nested structure of conguration, enabling the root

    ruleset to be easily derivable from the low-level congurations. Wethen show how a at representation which eliminates these depen-dencies may be derived. The at representation makes it feasible tocompose multiple root rulesets. Finally, we model the end-to-end CoS policy as a sequence of rootrulesets in at representation, with each root ruleset being allowedto transform the input before sending it to the next root ruleset. Wetake such a sequence and collapse it into one single ruleset in atrepresentation that contains the overall CoS behavior.

    3.2 Recursive representation of rulesetsA ruleset can be viewed as a generalization of access control

    lists (ACLs), typically used for reachability control [18]. A stan-dard ACL has only two basic actions: permit and deny. Thus,it can be seen as a function that maps an input to one of these twooutput values. A ruleset generalizes the notion in a few ways. First,a ruleset may recursively depend on other rulesets. The ruleset atthe top of the recursive hierarchy is a root ruleset that encompassesa single CoS policy block. The rulesets at the bottom of the hierar-chy, which we refer to as leaf rulesets , capture the matching criteriafor a ow (typically by using ACLs). Second, a ruleset permits abroader range of actions than simple permit/deny. Third, a rulesetcan have an output action that transforms an input and forwards itto another ruleset. We note however that only the root ruleset is al-lowed to have an output action that transforms the input the otherrulesets in the hierarchy are restricted to having simple outputs.

    More formally, a ruleset is represented as a table F that mapsan input N-tuple ow f to an output action. Each column j in thetable is associated with either a basic eld value (e.g., source IP) of

    the input ow f (denoted by vj (f )), or another ruleset F j , whichis itself represented as another mapping table. Let vars F = { j |column j is associated with a basic input eld }. For j varsF ,the cell F ij species a match expression. The match expressioncan include wildcards, and ranges of values just like for standardACLs. For j varsF the cell F ij contains a subset of possibleoutputs that could be produced by ruleset F j . In addition, for eachrow i , an output action act (i ) gives the output of this row. Givenan input ow f we evaluate a ruleset by recursively evaluating theF j for each j varsF . Flow f is said to match a row i if and onlyif it matches every cell in row i . The output of table F given an

    ***

    protocol

    *192.168.1.0/24

    *destIP

    ***

    srcPort

    deny**permit**permit*192.168.1.0/24outputdestPortsrcIP

    ok46***

    permit*

    INTERACTIVE- VIDEO

    no**

    ok**ok*permit

    outputdscpVOICE

    (b) REALTIME(a) VOICE

    P4**ok* P5****

    P3ok*ok*P2*okok*P1***ok

    outputOTHER-

    CRITICAL- DATA

    ROUTING CRITICAL-

    DATAREALTIME

    (c) WAN-EGRESS-POLICER-QUEUE (Policer)

    Q3**Q2ok*Q1*ok

    outputCRITICAL-

    DATAREALTIME

    (d) WAN-EGRESS-POLICER-QUEUE(Queue)

    Figure 7: Recursive ruleset representation of CoS conguration inFig. 3. Columns in italics refer to rulesets which are not shown.

    input ow f , denoted F (f ), is dened as the output action for therow with the smallest index i that matches f , i.e., the action for therst-matching row is taken. If no match is found until the end of the table, a special output empty is associated with the ow.

    Example: Fig. 7 shows how the example conguration pre-sented in Fig. 3 can be depicted using two recursive hierarchiesof rulesets. Fig. 3 has two CoS policy blocks describing the polic-ing rules (lines 7-17) and queueing rules (lines 18-28). Each of these blocks corresponds to a root ruleset as shown in Fig. 7(c)

    and Fig. 7(d), respectively. Each ACL can be represented as a leaf ruleset, while conguration constructs such as class-map corre-spond to intermediate rulesets. For instance, Fig. 7(a) and 7(b)respectively show the ruleset representation of the ACL VOICEand the class-map REALTIME. Intermediate and root rulesets areexpressed in terms of other rulesets lower in the hierarchy. Forinstance, consider Fig. 7(b). The rst two columns refer to the out-puts of the rulesets VOICE and INTERACTIVE-VIDEO. The rstrow is matched as long as the ruleset VOICE produces an outputof permit . The second row is matched for ows not matching theruleset VOICE, but matching the ruleset INTERACTIVE-VIDEO.The remaining rows are matched in a similar fashion. Likewise, inFig. 7(c), each column refers to the output of the ruleset indicated,and the output action of each row corresponds to a policing rule tobe invoked. For example, the policing rule P2 (line 12 in Fig. 3) is

    invoked only if both the CRITICAL-DATA and ROUTING rulesetsare matched, and the REALTIME ruleset is not matched.

    3.3 Flat representation of rulesetsThe representation of a ruleset as dened in 3.2 is recursive in

    that it may depend on other rulesets. While this recursive repre-sentation is easily derivable from the congurations, we require anon-recursive, at representation for composing multiple rulesets.The at representation of a ruleset F consists of a set of uniqueoutput actions {a l }, and an associated subset S (F, a l ) includingall inputs (simply abbreviated as S l when the context is clear), forwhich action a l is triggered. The subsets are non-overlapping (e.g.,l = l = S l S l = ). Note that the non-overlapping propertymakes the ordering of the actions irrelevant.

    We now present expressions that can construct the at represen-tation of a ruleset from its recursive representation using set oper-ations. Let M (F, k ) denote the set of ows that match row k of ruleset F . Let F M (F, k ) denote the set of ows that match row kof function F and none prior to it. We can dene S (F, A ) to be theset of ows for which the function F produces an output action A .Then,

    S (F, A ) = k,act ( k )= A F M (F, k ) (1a)

    F M (F, k ) = k 1i =1 (M (F, i )) M (F, k ) (1b)M (F, k ) = j vars F S (F j , F kj ) j vars F {f |v j (f ) F kj } (1c)

  • 8/12/2019 1. Modelling and Understanding End-To-End Class of Service Policies in Operational Networks

    6/12

    srcIP 1.2.2.0/24|(M4 P4,set dscp=40,Q)

    M P

    drop

    drop

    (b)

    (c)

    done

    srcIP 1.2.2.0/24|(M4,set dscp=40,P)

    srcIP 1.2.2.0/24|(M1,set dscp=10,P)

    M

    dscp=10&NC|(P1,n/a,drop)

    P

    dscp=10|(Q1,n/a,done)

    dscp z 10|(Q4,n/a,done)

    Q

    drop(a) done

    dscp=10&C|(P1,set dscp=10,Q)

    srcIP 1.2.2.0/24&NC|(M1

    P1,set dscp=10,drop)

    srcIP 1.2.2.0/24&C|(M1 P1,set dscp=10,Q)

    dscp=10|(Q1,n/a,done)

    dscp z 10|(Q4,n/a,done)

    Q

    done

    srcIP 1.2.2.0/24|(M4 P4 Q4,set dscp=40,done)

    M P Q

    srcIP 1.2.2.0/24&NC|(M1 P1 Q1,set dscp=10,drop)

    srcIP 1.2.2.0/24&C|(M1 P1 Q1,set dscp=10,done)

    dscp=40|(P4,n/a,Q)

    Figure 8: Example of composing a tree of rulesets. S l and a l areseparated by " |". C and NC respectively denote conformant and non-conformant ows.

    3.4 Composing rulesetsOnce each CoS policy block (e.g., marking, policing, queuing)

    of each device (e.g., CER, PER) is expressed using a at rulesetrepresentation, these rulesets can be composed into a single rule-set that captures the end-to-end CoS policies. There are two keyconsiderations when composing rulesets. First, an output actionmay change the input ow, e.g., the marking action, which sets theDSCP bits to some specic value. Second, some action may changethe path of the ow. For example, a policing policy may choose todrop non-conformant trafc while allowing conformant trafc tocontinue on the path. This leads us to the following formulation.An end-to-end policy can be described by a tree of nodes with eachnode of the tree having a ruleset. Each unique output action in aruleset is a triplet ( tag , func , next ), where tag is a sequenceof string tags recording output actions encountered so far, func isa mapping of the input (e.g., setting the DSCP bits), and next isa reference to another node. Note that next can also be an emptyreference in this case, the semantic is that the ow stops here. Allleaf nodes have a single triplet with an empty next for all incom-ing ows. An input ow starts at the root of the tree and will thenget forwarded to the next node until reaching a leaf node. Twotypes of leaf nodes drop and done exist to respectively signal thediscard of the ow and the end of the path traversed by the ow.The func , if exists, transforms the input at each ruleset, and theend-to-end treatment received by the ow is the concatenation of all the tag s on the path.

    We cannow more formally describe thecomposition steps neededto collapse a tree of rulesets. Given a node n and a child node n ,we combine them by replacing the ruleset in node n with a com-bined ruleset and redirect any node that previously referred to nto now refer to n . In particular, let S and S be the at repre-

    sentations for the rulesets in nodes n and n , respectively. We canderive the at representation of the combined ruleset (denoted byS ) by using a cross-product construction with S and S , wherewe replace the ows in S belonging to triplets having next = nwith ows in S using the func mapping, and leave the other owsin S unchanged. More formally, we dene the set of ows S l lassociated with the combined action a l l in S as follows:

    S l l = {f | f S l and a l (f ) S l } (2a)a l l = ( tag l tag l , func l func l , next l ) (2b)

    Note that if S l l is empty, we just delete it from S .

    Recursive RulesetRepresentation

    Section 3.2Flat Ruleset

    Representation

    Section 3.3

    ComposingRulesets

    Input FlowSpecification(in ACL form)

    End-to-endrouter path

    Auxiliary Information(e.g., Forwarding Tables,Flow Conformance)

    RouterConfigs

    Section 3.4

    End-to-endtreatment ofinput flows

    Preprocessing

    Query Processing

    ili i i l

    l

    Figure 9: Overview of CoS query tool.

    With ruleset composition, any tree can be attened into a singlenode representation. Thus, we can derive the sequence of tags thatexplain the per-ruleset treatment of a set S l of ows along the path,the func l (S l ) will describe how the ows are transformed at theend of the path. By examining the path of tags, design patterns andany misbehavior can be easily discovered.Example: Fig. 8(a) shows an example tree of at rulesets includingmarking (M), policing (P) and queueing (Q) policies. Each outputaction is stored in a tag (e.g., M1 contains the marking action setdscp=10 ). For the M ruleset, any ow not from 1.2.2.0/24 sub-net is marked with DSCP of 40, and any ow from 1.2.2.0/24 ismarked with DSCP of 10. The P ruleset examines the DSCP value

    of ows output from the M ruleset and performs the correspondingoutput action. Note that for ows marked with DSCP of 10, non-conformant trafc is dropped, which is modeled with next point-ing to a drop node. For ows marked with DSCP 40, no action istaken. The rules for the Q ruleset are simple: ows marked withDSCP of 10 goes into Q1 queue, and ows with all other DSCPvalues goes into the Q4 queue. Flows leaving the Q ruleset entera done node, indicating that the end of the path is reached and theend-to-end treatments of the ows can be reported. Fig. 8(b) andFig. 8(c) respectively show the rulesets after composing the M andP rulesets, and after composing the M, P, and Q rulesets.

    4. A TOOL FOR QUERYING COS TREAT-MENTS OF FLOWSETS

    Based on our model in 3, we have designed and implemented aquery tool that can trace the end-to-end CoS treatments of owsets.In building thetool, our primary focus was on efcient and practicalcomposition of rulesets in a real-time fashion. As a secondary goal,we were interested in augmenting the tool with auxiliary informa-tion such as router forwarding tables. In the rest of this section, wediscuss our approach to achieve these goals.

    4.1 Efciently operating on rulesetsFig. 9 shows a conceptual overview of the query tool. The tool

    consists of two phases. In the preprocessing phase, based on theconguration le of each router in the network, the appropriaterulesets for each CoS policy block are created. Next, the at rulesetrepresentation for each CoS policy is derived. The query process-

    ing phase takes two inputs: (i) a set of ows described by an ACL,whose CoS treatment needs to be determined; and (ii) a sequenceof routers that may be traversed by the owset, over which the anal-ysis must be conducted. Based on the sequence of routers speciedin the input, the preprocessed rulesets for the relevant CoS policieson the path are loaded, and these rulesets are composed with theinput owset to generate all possible end-to-end treatments that theset of ows may receive when they traverse the path.

    Flattening and composing rulesets (Equations (1) and (2)) re-quire performing set operations such as union, intersection andcomparison. In general, performing such operations on large multi-

  • 8/12/2019 1. Modelling and Understanding End-To-End Class of Service Policies in Operational Networks

    7/12

    ( ,M1)fin finv SM3 =

    SM1

    SM3

    PER1

    End-to-endrouter path

    End-to-endtreatment ofinput flows

    CER1

    Direction of traffic flow

    deny 192.168.1.1/32 permit 192.168.1.0/24

    markingpolicing

    SP1

    SP3

    SP4

    flowset (ACL)

    T

    finv SM1 finv SM1

    Figure 10: Composing a owset with rulesets on an end-to-end path.

    dimensional rulesets is computationally expensive. In fact, in ourdatasets, we found that intuitive approaches to combining rulesetsusing simple cross-products of ACLs in the recursive representa-tions of the rulesets could result in computation times of severaldays even for a single enterprise VPN.

    To effectively describe a set of ows and perform various opera-tions on owsets, we use binary decision diagrams (BDD) [14] asthe underlying data structure. A BDD is an efcient data structurethat can compactly and canonically represents a boolean functionas a directed acyclic graph, BDDs have been widely used in formalverication of digital circuits, and we have been inspired by theiruse to encode rewall rules in recent studies [19, 20, 12, 13].

    In our context, a single ow is captured by a 6-tuple includingsource and destination IP addresses and ports, protocol, and ToSbyte. Every bit in each of these elds corresponds to a BDD vari-able for instance an IP address is modeled with 32 BDD variables.Performing standard set operations such as intersection, union, andcomplement using BDDs is straightforward. It is also easy to deter-mine membership, i.e., whether a particular ow belongs to a givenowset. We refer readers to [14, 20] for more detailed informationabout BDDs.

    While standard set operators for manipulating BDDs are easilyimplementable, one atypical operation we need in our context is thefunc operator which may require the transformation of a subset of nodes in a BDD in order to support marking actions during rule-set composition. The steps that we employed to transform a BDDinvolve rst removing all BDD nodes corresponding to bits in theToS byte using a technique called existential quantication [15],constructing a separate BDD using the new ToS value alone, andmerging the two BDDs together using set-union to create the trans-formed BDD. While in our context, transformations primarily in-volve a modication of the ToS byte, we defer an investigation onwhether more general transformation functions can be efcientlysupported by BDDs to future work.

    Fig. 10 illustrates how we leverage BDDs to compose rulesets.First, the at representation of each root ruleset is precomputedusing BDDs by performing the set operations specied in Equa-tion (1). For instance, in Fig. 10, the ruleset corresponding tothe marking policy block might be represented by multiple BDDs

    (S M 1 and S M 3 ), each corresponding to a set of ows that are asso-ciated with marking actions M 1 and M 3, respectively. This com-putation itself leverages standard BDD operations. The network operator enters an ACL which is converted to a owset describedby a BDD ( f in ). The intersections of f in with S M 1 and S M 3 in-dicates the subset of input ows that are marked M 1 and M 3. Inthis example, the intersection of f in and S M 3 is a null set, indicat-ing all ows are marked as M 1. Since the marking policy changesthe ToS byte, an explicit transformation ( T ) of the output BDD isrequired before the next stage is entered. A similar process is nowfollowed for the policing policy, and the remaining stages.

    Usage Flowset ACL representation

    troubleshooting a owset operator specied operator speciedauditing a CER universal permit any any

    address space permit AS(CER) anyauditing a path universal permit any anybetween CERs address space permit AS(CER1) AS(CER2)

    Table 1: Potential usage of the query tool. AS(CER) denotes the ad-dress space of CER.

    4.2 Considering forwarding informationIn the basic version of our tool, the operator species a owset

    whose treatment needs to be determined, and the list of routers inthe path of interest. While this is already useful, the value of thetool would be greatly augmented if it could automatically identifythe routers involved.

    Our tool takes a rst but limited step towards this goal by identi-fying the source and destination CERs and PERs. To achieve this,the tool makes use of the PER forwarding tables for each CERinterface. Specically, each PER has a separate forwarding tableknown as a VRF (Virtual Routing and Forwarding), one per CER-facing interface. This table is looked up by the PER to forwardtrafc arriving from the CER. Using this information, we determinethe address space of each CER interface by nding all addresses inthe VRF for which the CER interface is used as the next hop. When

    an operator wishes to determine the ow treatment between two 2IP addresses, the CER interfaces to which they belong is rst iden-tied based on the extracted address space information. Once theCERs are determined, it is easy to determine the PERs to whichthey are attached by correlating interfaces whose IP addresses fallinto the same subnet from the router congurations. It is possiblethat for redundancy/load-sharing reasons, a CER may be attachedto multiple PERs, or may have multiple links to a PER. In such sce-narios, we determine all possible paths between the pair of CERs,and trace the CoS treatment along each path.

    In some VPNs, trafc between two CERs maybe relayed throughother intermediate CERs. Our tool may be extended to provide in-formation about the CoS treatment at each intermediate CER, if inaddition to forwarding tables, routing table data is also utilized sointermediate CERs can be determined. We note that our analysis in

    6 focuses on VPNs where all sites can directly communicate witheach other, and this issue does not arise.

    Our tool currently does not consider the P routers in the MPLSbackbone, as we did not have access to conguration and forward-ing table information of P routers. While our techniques can beextended to P routers if this information is made available, the de-signer indicated that CoS policies in the P routers are typically ho-mogeneous queueing policies that are rarely activated due to band-width overprovisioning in the backbone, and do not involve owtransformations.

    5. APPLYING THE TOOLThe tool enables an operator to input a owset using an ACL.

    This exibility enables us to apply it to various scenarios (sum-

    marized in Table 1). We describe some of the scenarios we haveexplored below: Troubleshooting and what-if analysis: In its basic usage, thetool could take an operator-specied ACL as input, to study theCoS treatment of any ow(s) of interest, locate the root cause of apotential problem, or conduct a what-if analysis. For a single owand path, we see a single possible end-to-end treatment. However,if a owset is provided, or if there are multiple paths that may betaken for redundancy reasons, then the output may include multiplepossible ow-treatments. Auditing individual routers: The tool can be used by an op-

  • 8/12/2019 1. Modelling and Understanding End-To-End Class of Service Policies in Operational Networks

    8/12

    erator to trace all possible ow treatments within a router. Thismay be useful to determine potential miscongurations such asthe presence of policies in the router that are never triggered andnon-standard ow treatments. A rst interesting mode involves us-ing the universal owset (the set of all possible ows). A secondmode, particularly to our enterprise VPN scenario, involves usinga owset which corresponds to all trafc sourced from addresses inthe address space of a CER, determined using forwarding informa-tion as described in 4.2. While using address space informationhelps detect anomalous ow treatments that are potentially presentin the network today, using the universal owset helps detect latenterrors that may not exist today, but could arise if the address spaceof the CER changes. Auditing policies across a pair of routers: Similar to the above,the tool could be used to understand all possible CoS patterns be-tween a pair of routers either by using (i) the universal owset; or(ii) the owset corresponding to all trafc with source and destina-tion addresses corresponding to the address space of the two CERs.

    6. EVALUATION AND RESULTSWe have applied our tool to study the CoS designs of several op-

    erational enterprise VPNs. In this section, we describe our datasetsand present results from our analysis.

    6.1 Data SetsWe collected router conguration les from 150 enterprise VPNs

    (ENT1ENT150). 40% of the enterprises have more than 10 CERswhile 5% have more than 100 CERs. Note that these numbers onlyreect the number of CERs, and the size of each enterprise site be-hind a CER can be much larger. These enterprises employ VPNswith any-to-any connectivity (i.e., any site can directly communi-cate with any other site). The dataset contains CER congurationles from theseenterprises, and the relevantPER conguration lesfrom the MPLS backbone. All the routers in our dataset are Ciscorouters managed by a tier-1 ISP. In addition, we have informationregarding the VRF forwarding tables of every PER on each CER-PER interface, as described in 4.2.Running Time: To get a sense of the running time of our tool, weran the tool with a query involving the universal owset (the mostcomputationally expensive query) on all CER congurations acrossall enterprises. On a dual-core Intel Itanium 2 1.6GHz system with32GB of RAM, the median time across all CER congurations was1.56 seconds, with values ranging between 0.27 and 6.55 seconds.The larger computation times were in general associated with CERswith larger ACLs. These numbers are very encouraging, and indi-cate the potential for operators to use the tool for tracing owsetsin an interactive fashion.

    6.2 Usage of CoS in practiceIn this section, we present a high-level analysis of our datasets to

    better understand the prevalence and usage of CoS policies in oper-ational networks. CoS conguration is the largest single functional

    piece in CER congurations in our datasets, with 20% 60% of theconguration lines of all CERs associated with CoS alone. Further,we have observed that CoS-related changes are among the mostfrequent in the VPNs we studied, with each CoS-related changeinvolving modications to several conguration blocks.

    Each CER in these VPNs is congured to support up to four dif-ferent data classes C1, C2, C3, and C4 (in order of decreasingpriority). C1 is the real-time class, designed for jitter and latencysensitive applications like voice and video. C2 is the premiumclass, designed for critical business applications such as databasetransactions. C3 is the bulk data class, designed for medium pri-

    0

    20

    40

    60

    80

    100

    0 1 2 3 4 5 6 7 8

    C D F ( e n t e r p r i s e s )

    Number of Distinct Data Class Combinations

    Figure 11: CDF of the number of congured class combinations forCERs in each enterprise.

    0

    10

    20

    30

    40

    50

    60

    70

    80

    90

    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

    N u m

    b e r o

    f E n

    t e r p r i s e s

    Data Class Combination ID

    Figure 12: Number of enterprises with some CER belonging to a par-ticular combination.

    ority business applications such as email and le transfers. C4 isthe best-effort class, designed for the remaining background traf-c. In addition, other classes may exist to handle network manage-ment trafc, such as SNMP query trafc, and trafc correspond-ing to routing updates. Considering the data classes alone, eachCER could be congured to use the 4 data classes in 16 ( 24 ) pos-sible ways (which we refer to as combinations), corresponding towhether a particular data class is present or not. For instance, onecombination could include both C1 and C4, and another combina-tion could include C1 or C4 alone.

    Fig. 11 studies the diversity of CoS conguration of CERs withinan enterprise. The X-Axis is the number of combinations that arepresent in an enterprise, and the Y-Axis is the fraction of enterprisesthat contain fewer than a particular number of combinations. Wesee that 60% of the enterprises are homogeneous , containing justone single combination (i.e., the same set of classes are conguredin all CERs). However, the other enterprises show more diversity,with some having as many as 7 different combinations that are con-gured. This heterogeneity may stem from variations in bandwidthcapacity provisioned across sites, and the unique CoS requirementsof each site. For example, a site hosting the VoIP customer ser-vice center may only need high priority real-time class, while a site

    with multiple applications may need a mixture of voice and dataclasses. As another example, data centers and head ofces are of-ten provisioned with higher capacity to the provider backbone thanperipheral locations.

    Fig. 12 studies whether all, or only a subset, of the 16 possiblecombinations are actually congured. Each combination is taken,and the number of enterprises with at least one CER that uses thecombination is considered. It is interesting to see that all possiblecombinations arecongured for some VPN. In addition, some com-binations are more popular than the others 8 combinations (com-binations 9 to 16) are each present in over 10 enterprises. These

  • 8/12/2019 1. Modelling and Understanding End-To-End Class of Service Policies in Operational Networks

    9/12

    CER1 PER1 PER2

    MPLS

    Direction of traffic flow

    S

    To CER2

    CER2

    CustomerLAN

    M2

    M3

    M4

    Pm

    Tx

    P2

    Tx

    Tx

    P3

    P4

    Qm

    Q2

    Q3

    Q4

    exp:4

    exp:4

    exp:3

    Q(C3)

    Q(C4)

    Q(C1,C2,NM)

    Figure 13: All possible end-to-end CoS treatments received by owssent from CER1 to CER2 in enterprise ENT6. Each of the round boxcorresponds to a CoS policy block.

    combinations correspond to different arrangements with the classC3 (i.e., C3 alone, or C3 plus a subset of other classes), with thelargest combination (present in 100 enterprises) being the use of C3

    alone. Further discussion with the designer indicates that C3 is therecommended default class for customers. Finally, we observe thateven across homogeneous enterprises, 11 combinations are used,indicating signicant diversity in CoS usage across enterprises.

    Overall, these results show that (i) the use of CoS is prevalentin enterprise VPNs, (ii) there is signicant diversity in CoS usageacross enterprises, and (iii) there exists a large degree of hetero-geneity in how routers within an enterprise are congured. Theseresults strengthen the case for formal modeling of CoS policies.

    6.3 Tracing end-to-end CoS design patternsAn application of our tool is to identify all possible treatments

    that a owset may receive on an end-to-end path. This can helpan operator analyze the design patterns used in putting together theCoS design, as well as identify potentially anomalous patterns.

    Fig. 13 pictorially shows the output obtained with our tool, whenrun on the end-to-end path between the two CERs (CER1-PER1-PER2-CER2) in ENT6. The left side of the gure shows three poli-cies within CER1, the central portion shows the marking policy atthe ingress interface of PER1, and the right side shows the queuingpolicy at the egress interface of PER2. At CER1, on the input inter-face, trafc from the customer site (denoted by S ) could be markedusing markers M2, M3, or M4, as belonging to C2, C3, or C4. Traf-c corresponding to each of these classes is policed according todifferent parameters (dictated by policers P2 , P3 , and P4 ), and thecompliant/non-compliant trafc of different classes is then queuedin separate queues Q2 , Q3 and Q4 . Some interesting parts of thedesign deserve further discussion. First, a subset of trafc markedas belonging to each of these classes are simply transmitted (de-

    noted by a policing rule Tx ) without checks for compliance andare respectively queued in the appropriate class based on markingsobtained at the input interface. This trafc corresponds to SLAprobe trafc, which may be injected to help determine that the traf-c corresponding to a particular class meets the performance met-rics specied in the SLAs. Since the operator may explicitly desireto test the performance of compliant trafc, the trafc must not besubject to normal policing checks and should not be re-marked asnon-compliant. Second, a subset of trafc marked as belonging toC2, C3, and C4 in the input interface are overridden and re-markedas corresponding to network management trafc by the policer Pm.

    0

    20

    40

    60

    80

    100

    0 5 10 15 20 25 30 35

    C D F ( e n t e r p r i s e s )

    Number of CERs with shadowed policies

    universal shadowingaddress space shadowing

    Figure 14: CDF of the number of CERs with some shadowed trafcclass per enterprise.

    This trafc typically corresponds to SNMP query trafc and BGProuting update trafc, that must be treated separately from the dataclasses. In fact, we can see this trafc is queued separately in Qm.

    Customer trafc leaving CER1 enters the ingress interface of PER1. Based on the ToS byte of each packet, PER1 changes theEXP eld in its MPLS label (see 2.2). With network managementtrafc, the EXP value is set unconditionally. However, the EXPvalue of data trafc is changed conditionally based on compliance

    to pre-set trafc parameters. Interestingly, trafc corresponding totwo of the data classes C2 and C3 is provided with an identicalEXP value, indicating that while the differentiation between trafcbelonging to C2 and C3 occurs at the enterprise edge, the trafc istreated identically in the MPLS core. However, trafc in C4 con-tinues to be treated at a lower priority level inside the core.

    Finally, the right side of the gure shows how trafc is treated asit comes out of PER2, and before entering CER2. We see that whileC3, and C4 trafc enter separate queues, the same queue is usedfor C1, C2, and network management (NM) trafc. Note howeverthat the treatment handed out to these 3 trafc classes can still bedifferent (e.g., with different drop probabilities during congestion).

    This example illustrates the power of our model in automaticallyextracting the patterns used by the designer in identifying treat-ment corresponding to various ows. It also highlights the need fora systematic model, given the signicant complexity of possibletreatments received by a owset.

    6.4 Detecting shadowed policy congurationsOne application of our model is that it can help highlight shad-

    owed policies (see 2.3). For instance, a CER may have a queuingpolicy that pertains to four different classes. However, it is possiblethat no trafc is ever classied as belonging to one of the classes,and hence that portion of the queuing policy is never exercised.

    In identifying shadowed policy congurations within a CER, weconsider two types of shadowing: (i) Universal Shadowing: Here,we consider classes shadowed when the universal set of ows (see5 and Table 1) are fed through a CER this indicates that a por-tion of policy conguration is never utilized regardless of what traf-

    c may ow through the CER; (ii) Address Space Shadowing: weconsider shadowing that occurs only when the input ow containsa source address in the address space of the CER. This indicatesthat a portion of policy conguration is not utilized given the cur-rent address space assignments to the CER, but might be used laterwhen the address space changes.

    Fig. 14 shows the prevalence of shadowed policies in the 150enterprises. The number of CERs where a particular type of shad-owing is present is computed for each enterprise, and a CDF isplotted. We see that around 35% of the enterprises have someCERs with classes shadowed, and 5% of the enterprises have more

  • 8/12/2019 1. Modelling and Understanding End-To-End Class of Service Policies in Operational Networks

    10/12

    CER1d:C1,C3,C4

    a:C1

    CER2d:C1,C3,C4

    a:C1

    C1C1

    CER1d:C2,C3 a:C2,C3

    (a) ENT1 CER2

    d:C2,C3 a:C2,C3

    CER3d:C2,C3 a:C2,C3

    C3C3

    C3C3

    C3C3

    (b) ENT20Figure 15: CoS matrix for enterprises ENT1 and ENT20.

    CER1d:C1a:C1

    ENT83

    CER4d:C1,C3 a:C1,C3

    CER3d:C1,C3 a:C1,C3

    CER6d:C1,C3 a:C1,C3

    CER5d:C1,C3 a:C1,C3

    CER2d:C1a:C1

    CER1bd:C1a:C1

    CER2bd:C1a:C1

    CER3bd:C1,C3 a:C1,C3

    C1,C3C1,C3C1C1

    Figure 16: CoS matrix for enterprise ENT83.

    than 5 routers shadowed. While the presence of shadowing mightcorrespond to an inadvertent error made by the operator, it mayalso correspond to legacy conguration lines or actual design in-tent. For example, the tail in Fig. 14 corresponds to enterpriseENT139, where 36 CERs have universal shadowing of class C4in the policing and queueing policy congurations. A potential be-nign explanation is that this was due to a deliberate design intentto remove class C4 from these routers. To achieve this change, theoperators possibly modied the marking policy to no longer mark ows as corresponding to C4, but did not remove the policing andqueuing rules already congured for that class. While a judgementregarding whether a particular example represents genuine designintent, or an error can only be made by the appropriate operators, atool like ours can bring such examples to the operators attention.

    6.5 Mapping network-wide CoS designsUsing our tool, one may obtain network-wide views of CoS de-

    signs. To derive such views, we consider a simple abstraction,called a CoS matrix that models how various trafc classes can beexchanged between two CERs (see Fig. 15(a) for an example). Wecan visualize a CoS matrix with a directed graph. Here, each noderepresents a CER annotated with a set of classes congured (de-noted d: ) and classes that may actually be active (denoted a: ) tak-ing shadowing into account. A bidirectional link between two nodesindicates that when only the address spaces of the two CERs areconsidered, they can communicate in a symmetric fashion using a

    particular set of trafc classes as annotated on the link.We have obtained such graphs for all enterprises in our datasets,and we discuss some interesting examples that point the value of our model and abstraction below.

    Fig. 15(a) illustrates the CoS matrix for ENT1, a simple VPNwith two CERs. The annotations on each CER indicate that policiescorresponding to C1, C3, and C4 are dened in the conguration,but only trafc corresponding to C1 may exit the router once shad-owed policies are considered. Further, the two routers have a link between them annotated with C1 indicating that C1 trafc may beexchanged in either direction. Fig. 15(b) depicts the CoS matrix for

    ENT20, which has three CERs. In this case, trafc correspondingto C2 and C3 may exit each router, which is consistent with whatis congured by the network operator. However, it is interesting tosee that once the address spaces of the CERs are considered, eachrouter pair may only exchange C3 trafc. These examples high-light the value of our model that can enable us to reason about whattrafc can be effectively exchanged between two VPN sites.

    To show the value of our CoS matrix abstraction in reasoningabout the network design, Fig. 16 considers ENT83, a more com-plex VPN with 9 CERs. It is interesting that 4 of the routers,namely CER3, CER4, CER5, and CER6, form a clique and ex-change trafc corresponding to C1 and C3 as shown by the dottedlines. However two other routers (CER1 and CER2) are exclu-sively congured with C1 and can only exchange C1 trafc withother sites. We hypothesize that these two sites correspond to voicecall centers and only involve voice trafc. In contrast, the othersites may include both voice and data trafc. Another interestingaspect is that CER1 and CER2 are each co-located with anotherrouter (i.e., CER1b and CER2b). This corresponds to a primary-backup arrangement, where each site has two CERs, with one con-gured as a primary and another congured as a backup. Thebackup router normally does not involve trafc but may take overif the primary fails. Finally, we note that a primary-backup ar-

    rangement is also used with CER3. Interestingly, we observed thatCER3 has a default route back to itself. Discussions with the de-signers shows that this is an indication that CER3 is a gateway site,through which all trafc to the Internet is routed. It makes sensethat such redundant arrangements are mainly employed for criticalsites such as CER1, CER2, and CER3 that correspond to voice cen-ters or gateway sites, where the costs of having redundancy setupsmay be worthwhile.

    One interesting observation is that in some enterprise VPNs, wehave seen examples of asymmetric classes between CER pairs. Forexample, a pair of CERs may be able to communicate in C1 inone direction, but not in the reverse direction. While we are un-able to conrm the exact reasons behind this, we hypothesize thatthere may be two possible scenarios. First, this could correspondto streaming video trafc, or certain trafc types which are usu-

    ally purely unidirectional. Second, it is possible that the two sitesare not intended to exchange C1 trafc in practice. To know thisfor sure, we would need to combine actual trafc data exchangedbetween sites with our CoS model.

    6.6 Non-standard ow treatmentsOne application of our model and tool is in detecting whether

    various application ows are treated in a correct and expected fash-ion, and to identify any non-standard ow treatment that is a depar-ture from best practice. To investigate this further, we used our toolto collect a list of standard ow-treatment patterns that are expectedto hold and conrmed them with network designers. These patternscapture the expected consistency in treatment of ows throughoutthe marking, policing and queuing stages of the CERs. While we

    have observed several patterns, in this paper we only list three mostsignicant and interesting ones, denoted by P1 P3. P1: Flows marked as belonging to a data class by the markingstage are usually re-marked by the policing stage as belonging tothe same class, but either conformant or non-conformant. A depar-ture would be a case where a ow marked as belonging to a dataclass (say C1) is re-marked by the policing stage as correspondingto a different data class (say C2). P2: Flows leaving the policing stage marked as a particular dataclass should go to the corresponding queue. A potential departureis a scenario where a ow does not go into any queue dened in the

  • 8/12/2019 1. Modelling and Understanding End-To-End Class of Service Policies in Operational Networks

    11/12

    0

    20

    40

    60

    80

    100

    0 5 10 15 20

    C D F ( e n t e r p r i s e s )

    Number of CERs with non-standard flow treatments

    departure from P1departure from P2departure from P3

    Figure 17: CDF of the number of CERs with non-standard ow treat-ment.

    conguration. In such cases, the ow enters a default queue andcould see non-deterministic and likely degraded performance, be-cause the behavior of default queues is vendor and model-specic. P3: Flows that exit the CER should be explicitly marked by theCER. A departure is a concern since the treatment of the ow is notdeterministic, and depends on how that ow was marked prior toentering the CER.

    In detecting the presence of non-standard ow treatments forCERs, we have considered cases where non-standard ow treat-ments could occur for (i) ows that correspond to the address spaceof the CER; and (ii) the universal owset (see 5 and Table 1). Forspace reasons, we present results only when the address space of the CER is considered.

    Fig. 17 shows the departure from standard ow treatments acrossall enterprises, considering the address space of each CER. Eachcurve corresponds to one of the particular standard ow treatmentconditions that were violated. The X-Axis shows the number of routers in an enterprise for which such departure from standardtreatment exists, and a point (X, Y ) shows that Y % of the enter-prises have at most X CERs with non-standard treatment. For allthree patterns, there is no departure from standard treatment fornearly 80% of the enterprises, and at most 5 routers for 95% of theenterprises. This result indicates although non-standard ow treat-ments are not prevalent, they do exist in a small set of routers in afew enterprises.

    We further analyzed the cases involving departures from stan-dard treatments, and discussed these cases with designers and op-erators. We summarize some of the insights from those discussions: Change in conguration practices, and handling of legacy routers:The departures from pattern P1 corresponded to cases where owsthat traverse a CER could be marked on the input interface as corre-sponding to a particular data class, and could then be re-marked onthe output interface as corresponding to another different data class.This was not expected, as the standard expected ow treatments in-volved each CER marking the ow only once, or re-marking asconformant or non-conformant trafc for the same data class. Dis-cussions with operators revealed that this departure was potentially

    due to changes in conguration practices, as well the handling of legacy routers. While the earlier practice was to congure markingpolicies in the input interfaces of routers due to vendor capabilitiesexisting at the time, the newer practice was to congure markingpolicies in the output interfaces of routers as underlying vendor ca-pabilities evolved. Legacy routers congured using the earlier ap-proach were not typically modied unless an actual policy changewas involved, in which case the policy changes were made on theoutput interfaces, which is consistent with the newer practices. Thelegacy conguration in the input interface was left behind, thoughthe best practice would have been to remove that.

    Underestimating performance of network through miscongura-tion of probe trafc: The departures from pattern P2 correspondedto cases where trafc corresponding to SLA probes from someCERs were incorrectly queued. Recall from 6.3 that SLA probesare used to monitor the performance of each trafc class and ensurethat it is in compliance with the requirements. Incorrect congura-tion of the SLA trafc does not impact the performance of the ap-plication itself; however, it could impact the monitoring results. Inthis scenario, the SLA probes were incorrectly assigned to the de-fault queue which receives a lower priority than any other queues.This would result in the probes underestimating the performancecompared to what the trafc classes are actually experiencing. Pre-marked customer trafc: We found examples where owsthat traverse a CER could potentially not be marked at all (neitherat the policing nor marking stage) by the CER, thereby departingfrom pattern P3. The operators indicated that these scenarios likelycorresponded to cases where an explicit agreement with the cus-tomer stipulates that the customer would properly mark all trafcbefore it reached the CER. While best practice would have been forthe CERs to explicitly re-mark the trafc, the net treatment receivedby correctly pre-marked ows would not be affected.

    7. RELATED WORKIn recent years, the modeling and understanding of network de-

    signs, and detection of errors through conguration analysis hasevolved into an important areaof research. Researchers have lookedat routing designs [17, 5], route redistribution policies [16], reach-ability analysis in enterprises [18], modeling of BGP policies [10,6, 4], and intra-domain trafc engineering [11]. Industry-driven ef-forts have attempted to simplify conguration through the use of templates [1, 3, 9], or vendor-neutral conguration languages [7, 8,2]. In contrast to these works, our focus is on CoS policies in enter-prise networks, an important area that has received little attention.The CoS domain offers several distinguishing challenges such ashighly intertwined and nested conguration, policies instantiatedover multiple routers, and the use of large ACLs. Further, manag-ing CoS conguration involves tuning class memberships and CoS

    policies at the granularity of individual ows, and miscongura-tions can result in SLA violations and adverse consequences.The analysis of CoS conguration has some similarity to the

    analysis of rewall rules in that both domains deal with large ACLsand operate at the granularity of ows. Further, our use of BDDshas been inspired by [20, 12, 13]. However, while much work onrewall analysis deals with issues related to misconguration of in-dividual ACLs, our focus is on modeling CoS policies in a network-wide fashion across multiple routers. Recent works on distributedrewalls [20, 12, 13] and enterprise reachability [18, 5] do considercombining ACLs across devices. However, the actions are simple,involving just a permit/deny of ows. With CoS, each router couldbe associated with multiple policies (marking, policing, queueing),each of which could have multiple ACLs corresponding to dif-ferent classes, and actions could be much more complex, poten-tially involving packet transformations. Our approach could thusbe viewed as a generalization of modeling simple ACLs.

    8. DISCUSSION AND OPEN ISSUESIn this section, we discuss some key aspects of our work, and

    open issues:Scalability of ruleset composition: While our modeling frame-work allows for the composition of an arbitrary number of rule-sets, our current tool implementation focuses on conguration of the CERs and PERs. Thus, relatively few rulesets need to be com-

  • 8/12/2019 1. Modelling and Understanding End-To-End Class of Service Policies in Operational Networks

    12/12

    posed despite each router may have multiple root rulesets, corre-sponding to each of the marking, policing and queueing policies.We believe that this is reasonable in our MPLS VPN settings giventhat the CoS policies in the CER and PERs exhibit signicant het-erogeneity across customer networks and change frequently overtime, while the CoS policies in the P routers reect provider poli-cies, and tend to be homogeneous and stable. That said, one im-portant question pertains to the scalability of our approach with thenumber of rulesets that need to be composed. We believe that theuse of BDDs helps ensure betterscaling properties than naive cross-product techniques, and in the general, the computational complex-ity would depend on the number of distinct ow treatments, and thetypes of ow transformations that may occur. In practice, we ex-pect the number of distinct ow treatments to be bounded, and theow transformations to be relatively simple (such as changing theToS byte), which would help contain the complexity. We defer amore detailed investigation of these issues to the future.Combining with routing information: In this paper, we focus oncomposing rulesets given a set of routers, as well as use forward-ing table information to determine the appropriate CERs and PERsfor a given ow. An interesting direction for future work involvesautomatically determining the entire set of routers on the path, per-haps through the use of routing table information. Availability of

    such information will also enable use of the tool in settings suchas hub-and-spoke VPNs where trafc between two sites may tra-verse through intermediate PERs and CERs.Extension to other vendors: Our tool is currently based on CiscoIOS. An interesting future direction is to extend our approach tosupport other vendors such as Juniper and Alcatel. Based on a pre-liminary inspection of congurations of these vendors, we believea modeling approach such as ours is still important and relevant.The primary effort in adapting to other vendors involves the devel-opment of a language-specic parser that can derive the formal rep-resentation that we have proposed. One open question is whetherour model is expressive enough to capture all conguration optionsof these vendors. In general, we believe that if a conguration lan-guage uses ACLs to describe matching ows that belong to a trafcclass (as is the case with Cisco IOS), it ts naturally into our model.

    Visualizing CoS matrix: While our CoS matrix abstraction canhelp identify the set of classes that can be exchanged between ev-ery pair of VPN sites, a ripe avenue for future work involves devel-oping better visualization techniques that can compactly show theCoS design for large-scale VPNs. One simple heuristic that we be-lieve can signicantly help involves identifying sites that exchangeexactly the same set of classes with other sites and collapsing theminto a single instance.

    9. CONCLUSIONIn this paper, we have shown how to model CoS policies from

    low-level device congurations. Our approach centers around rule-sets , a formal representation of conguration policies. Our repre-sentation captures CoS policies in a manner independent of cong-

    uration syntax, allows the composition of policies across multiplepolicy blocks within and across devices, and enables us to deal withissues such as the transformation of ows across stages.

    Based on our model, we have built a tool that can trace the end-to-end CoS treatments of an arbitrary set of ows. The tool hascomputation times in the order of seconds on real datasets and isconducive for use by operators in an interactive fashion.

    Using our tool, we have conducted the rst study on CoS designsof operational networks and analyzed a cross-section of 150 differ-ent enterprise VPNs. Our analysis shows that the usage of CoSis widely prevalent, points to signicant diversity in CoS designs

    across enterprises, and indicates that a large degree of heterogene-ity exists in terms of how different routers within an enterprise arecongured.

    The results also conrm the importance and effectiveness of ourmodel in assisting operators to reason about network-wide CoSoperations. In particular, we have demonstrated the potential of the tool in extracting all possible ow treatments across a givenset of devices, identifying non-standard and potentially anomalousow treatments and shadowed conguration policies, and derivingnetwork-wide views of CoS designs.

    Moving forward, we are interested in leveraging the concept of rulesets to model other areas of network conguration, extendingour tool to consider other vendor congurations, and enhancing thecapability of our tool by incorporating models of routing design,control and data plane reachability, as well as trafc data.

    10. ACKNOWLEDGMENTSWe thank everyone who helped make this research possible, in-

    cluding Jennifer Rexford for her helpful comments on an earlierversion of the paper. We also thank the anonymous reviewers andGianluca Iannaccone, whose suggestions beneted the nal ver-sion of the paper. This work was supported by NSF grant CNS-0721488.

    11. REFERENCES[1] Cisco IP solution center. http://www.cisco.com/en/US/products/

    sw/netmgtsw/ps4748/index.html .[2] DSL forum TR-069. http://www.broadband-forum.org/

    technical/download/TR-069.pdf .[3] Intelliden. http://www.intelliden.com/ .[4] C. Alaettinoglu, C. Villamizar, E. Gerich, D. Kessensand, D. Meyer, T. Bates,

    D. Karrenberg, and M. Terpstra. Routing policy specication language (RPSL).RFC 2622, June 1999.

    [5] T. Benson, A. Akella, and D. Maltz. Unraveling the complexity of network management. In Proc. NSDI , 2009.

    [6] H. Boehm, A. Feldmann, O. Maennel, C. Reiser, and R. Volk. Network-wideinter-domain routing policies: Design and realization. In Proc. NANOG 34 ,2005.

    [7] J. Case, M. Fedor, M. Schoffstall, and J. Davin. A simple network managementprotocol (SNMP). RFC 1157, May 1990.

    [8] Distributed Management Task Force, Inc. http://www.dmtf.org .[9] W. Enck, P. McDaniel, S. Sen, P. Sebos, S. Spoerel, A. Greenberg, S. Rao, and

    W. Aiello. Conguration management at massive scale: System design andexperience. In Proc. USENIX , 2007.

    [10] N. Feamster and H. Balakrishnan. Detecting BGP conguration faults withstatic analysis. In Proc. NSDI , 2005.

    [11] A. Feldmann and J. Rexford. IP network conguration for intradomain trafcengineering. In IEEE Network Magazine , Sept. 2001.

    [12] H. Hamed and E. Al-Shaer. Anomaly discovery in distributed rewalls. In Proc. IEEE INF OCOM , 2004.

    [13] H. Hamed, E. Al-Shaer, and W. Marrero. Modeling and verication of ipsec andvpn security policies. In Proc. IEEE ICNP , 2005.

    [14] S. Hazelhurst, A. Attar, and R. Sinnappan. Algorithms for improving thedependability of rewall and lter rule lists. In Proc. DSN , 2000.

    [15] P. G. Hinman. Fundamentals of Mathematical Logic . A K Peters Ltd, 2005.[16] F. Le, G. Xie, D. Pei, J. Wang, and H. Zhang. Shedding light on the glue logic

    of internet routing architecture. In Proc. ACM SIGCOMM , 2008.[17] D. Maltz, G. Xie, J. Zhan, H. Zhang, G. Hjalmtysson, and A. Greenberg.

    Routing design in operational networks: A look from the inside. In Proc. ACM SIGCOMM , 2004.

    [18] G. Xie, J. Zhan, D. A. Maltz, H. Zhang, A. Greenberg, G. Hjalmtysson, andJ. Rexford. On static reachability analysis of IP networks. In Proc. IEEE INFOCOM , 2005.

    [19] L. Yuan, C.-N. Chuah, and P. Mohapatra. Progme: Towards programmablenetwork measurement. In Proc. ACM SIGCOMM , 2007.

    [20] L. Yuan, J. Mai, Z. Su, H. Chen, C.-N. Chuah, and P. Mohapatra. Fireman: Atoolkit for rewall modeling and analysis. In Proc. IEEE Symposium onSecurity and Privacy , 2006.