13
IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 9, NO. 6, DECEMBER 2001 733 On Inferring Autonomous System Relationships in the Internet Lixin Gao, Member, IEEE Abstract—The Internet consists of rapidly increasing number of hosts interconnected by constantly evolving networks of links and routers. Interdomain routing in the Internet is coordinated by the Border Gateway Protocol (BGP). BGP allows each autonomous system (AS) to choose its own administrative policy in selecting routes and propagating reachability information to others. These routing policies are constrained by the contractual commercial agreements between administrative domains. For example, an AS sets its policy so that it does not provide transit services between its providers. Such policies imply that AS relationships are an important aspect of Internet structure. We propose an augmented AS graph representation that classifies AS relation- ships into customer–provider, peering, and sibling relationships. We classify the types of routes that can appear in BGP routing tables based on the relationships between the ASs in the path and present heuristic algorithms that infer AS relationships from BGP routing tables. The algorithms are tested on publicly available BGP routing tables. We verify our inference results with AT&T internal information on its relationship with neighboring ASs. As much as 99.1% of our inference results are confirmed by the AT&T internal information. We also verify our inferred sibling relationships with the information acquired from the WHOIS lookup service. More than half of our inferred sibling-to-sibling relationships are confirmed by the WHOIS lookup service. To the best of our knowledge, there has been no publicly available information about AS relationships and this is the first attempt in understanding and inferring AS relationships in the Internet. We show evidence that some routing table entries stem from router misconfigurations. Index Terms—Border Gateway Protocol (BGP), Internet, proto- cols, routing. I. INTRODUCTION T HE INTERNET has experienced a tremendous growth in its size and complexity since its commercialization. The Internet connects thousands of autonomous systems (ASs) oper- ated by many different administrative domains such as Internet service providers (ISPs), companies and universities. Since two ISPs might merge into one and each administrative domain can possess several ASs, an administrative domain can operate one or several ASs. Routing within an AS is controlled by intrado- main routing protocols such as static routing, OSPF, IS-IS, and Manuscript received September 5, 2000; revised November 21, 2000; rec- ommended by IEEE/ACM TRANSACTIONS ON NETWORKING Editor V. Paxson. The author was supported in part by the National Science Foundation under Grants ANI-9977555 and ANI-0085848, and by an NSF CAREER Award Grant ANI-9875513. The author is with the Department of Electrical and Computer Engi- neering, University of Massachusetts, Amherst, MA 01003 USA (e-mail: [email protected]). Publisher Item Identifier S 1063-6692(01)10548-0. RIP. A pair of ASs interconnect via dedicated links and/or public network access points, and routing between ASs is determined by the interdomain routing protocol such as Border Gateway Protocol (BGP). One key distinct feature of the interdomain routing protocol is that it allows each AS to choose its own ad- ministrative policy in selecting the best route, and announcing and accepting routes. One of the most important factors in deter- mining routing policies is the commercial contractual relation- ships between administrative domains. The commercial agreements between pairs of administrative domains can be classified into customer–provider, peering, mu- tual-transit, and mutual-backup agreements [2], [3]. A customer pays its provider for connectivity to the rest of the Internet. Therefore, a provider does transit traffic for its customers. However, a customer does not transit traffic between two of its providers. A pair of peers agree to exchange traffic between their respective customers free of charge. A mutual-transit agreement allows a pair of administrative domains to provide connectivity to the rest of the Internet for each other. This mutual-transit relationship is typically between two adminis- trative domains such as small ISPs who are located close to each other and who cannot afford additional Internet services for better connectivity. A pair of administrative domains may also provide backup connectivity to the Internet for each other in the event that one administrative domain’s connection to its provider fails. These contractual commercial agreements between adminis- trative domains play a crucial role in shaping the structure of the Internet and the end-to-end performance characteristics. Pre- vious work on the Internet topology has been focused on the interconnection structure at either AS or router level [4]–[9]. Since routing between ASs is controlled by BGP, a policy-based routing protocol, connectivity does not imply reachability. For example, national ISPs A and B are connected to their customer, a regional ISP, C, respectively. Although ISPs A and B are con- nected through ISP C, ISP A cannot reach ISP B via ISP C, since C as a customer does not provide transit services between its providers. Even if ISPs A and B can reach each other via other ISPs, the end-to-end performance characteristics between A and B cannot be inferred from that of between A and C and between C and B. For example, the delay between A and B is independent of the total delay between A and C and between C and B. This has been observed by several measurement studies [10], [11]. Therefore, a global picture of AS relationships is an important aspect of the Internet structure. We propose an augmented AS graph representation to cap- ture AS relationships. We classify the relationship between a pair of interconnected ASs into customer–provider, peering, and 1063–6692/01$10.00 © 2001 IEEE

On inferring autonomous system relationships in the

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: On inferring autonomous system relationships in the

IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 9, NO. 6, DECEMBER 2001 733

On Inferring Autonomous System Relationshipsin the InternetLixin Gao, Member, IEEE

Abstract—The Internet consists of rapidly increasing number ofhosts interconnected by constantly evolving networks of links androuters. Interdomain routing in the Internet is coordinated by theBorder Gateway Protocol (BGP). BGP allows each autonomoussystem (AS) to choose its own administrative policy in selectingroutes and propagating reachability information to others. Theserouting policies are constrained by the contractual commercialagreements between administrative domains. For example, anAS sets its policy so that it does not provide transit servicesbetween its providers. Such policies imply that AS relationshipsare an important aspect of Internet structure. We propose anaugmented AS graph representation that classifies AS relation-ships into customer–provider, peering, and sibling relationships.We classify the types of routes that can appear in BGP routingtables based on the relationships between the ASs in the path andpresent heuristic algorithms that infer AS relationships from BGProuting tables. The algorithms are tested on publicly availableBGP routing tables. We verify our inference results with AT&Tinternal information on its relationship with neighboring ASs.As much as 99.1% of our inference results are confirmed by theAT&T internal information. We also verify our inferred siblingrelationships with the information acquired from the WHOISlookup service. More than half of our inferred sibling-to-siblingrelationships are confirmed by the WHOIS lookup service. Tothe best of our knowledge, there has been no publicly availableinformation about AS relationships and this is the first attempt inunderstanding and inferring AS relationships in the Internet. Weshow evidence that some routing table entries stem from routermisconfigurations.

Index Terms—Border Gateway Protocol (BGP), Internet, proto-cols, routing.

I. INTRODUCTION

THE INTERNET has experienced a tremendous growth inits size and complexity since its commercialization. The

Internet connects thousands of autonomous systems (ASs) oper-ated by many different administrative domains such as Internetservice providers (ISPs), companies and universities. Since twoISPs might merge into one and each administrative domain canpossess several ASs, an administrative domain can operate oneor several ASs. Routing within an AS is controlled by intrado-main routing protocols such as static routing, OSPF, IS-IS, and

Manuscript received September 5, 2000; revised November 21, 2000; rec-ommended by IEEE/ACM TRANSACTIONS ON NETWORKING Editor V. Paxson.The author was supported in part by the National Science Foundation underGrants ANI-9977555 andANI-0085848, and by an NSFCAREERAwardGrantANI-9875513.The author is with the Department of Electrical and Computer Engi-

neering, University of Massachusetts, Amherst, MA 01003 USA (e-mail:[email protected]).Publisher Item Identifier S 1063-6692(01)10548-0.

RIP. A pair of ASs interconnect via dedicated links and/or publicnetwork access points, and routing between ASs is determinedby the interdomain routing protocol such as Border GatewayProtocol (BGP). One key distinct feature of the interdomainrouting protocol is that it allows each AS to choose its own ad-ministrative policy in selecting the best route, and announcingand accepting routes. One of the most important factors in deter-mining routing policies is the commercial contractual relation-ships between administrative domains.The commercial agreements between pairs of administrative

domains can be classified into customer–provider, peering, mu-tual-transit, and mutual-backup agreements [2], [3]. A customerpays its provider for connectivity to the rest of the Internet.Therefore, a provider does transit traffic for its customers.However, a customer does not transit traffic between two of itsproviders. A pair of peers agree to exchange traffic betweentheir respective customers free of charge. A mutual-transitagreement allows a pair of administrative domains to provideconnectivity to the rest of the Internet for each other. Thismutual-transit relationship is typically between two adminis-trative domains such as small ISPs who are located close toeach other and who cannot afford additional Internet servicesfor better connectivity. A pair of administrative domains mayalso provide backup connectivity to the Internet for each otherin the event that one administrative domain’s connection to itsprovider fails.These contractual commercial agreements between adminis-

trative domains play a crucial role in shaping the structure ofthe Internet and the end-to-end performance characteristics. Pre-vious work on the Internet topology has been focused on theinterconnection structure at either AS or router level [4]–[9].Since routing between ASs is controlled by BGP, a policy-basedrouting protocol, connectivity does not imply reachability. Forexample, national ISPs A and B are connected to their customer,a regional ISP, C, respectively. Although ISPs A and B are con-nected through ISP C, ISP A cannot reach ISP B via ISP C,since C as a customer does not provide transit services betweenits providers. Even if ISPs A and B can reach each other viaother ISPs, the end-to-end performance characteristics betweenA and B cannot be inferred from that of between A and C andbetween C and B. For example, the delay between A and B isindependent of the total delay between A and C and between Cand B. This has been observed by several measurement studies[10], [11]. Therefore, a global picture of AS relationships is animportant aspect of the Internet structure.We propose an augmented AS graph representation to cap-

ture AS relationships. We classify the relationship between apair of interconnected ASs into customer–provider, peering, and

1063–6692/01$10.00 © 2001 IEEE

Page 2: On inferring autonomous system relationships in the

734 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 9, NO. 6, DECEMBER 2001

sibling relationships. There is no publicly available informa-tion about inter-AS relationships. ISPs do not register their re-lationships to the Internet registries. Internet registries such asARIN [1] do provide information such as who administrates anAS. However, the information can be out of date and does notimply anything about how ASs relate to each other. Contrac-tual agreements between ISPs are proprietary and companiesare unwilling to reveal even the names of their ISPs [12]. In-ternet Routing Registries (IRR) was created as a repository ofrouting policies. However, some ISPs are not willing to revealtheir policies, and even if they were, these routing policies mightnot specify AS relationships.In this paper, we present heuristic algorithms that infer the

augmentedAS graph fromBGP routing tables.We first formallypresent the routing policies implied by AS relationships and de-rive routing table entry patterns as the result of routing poli-cies. We then infer the AS relationships based on the heuristicthat the size of an AS is typically proportional to its degreein the AS graph. This heuristic has been used by Govindanand Reddy [5] in classifying ASs into four levels of hierarchy.Our heuristic algorithms classify an interconnected AS pair intohaving a provider–customer, peering, or sibling relationship.The running time of the algorithm is linear in the total numberof consecutive AS pairs in the routing tables. To the best of ourknowledge, this is the first attempt in understanding and infer-ring AS relationships in the Internet.Furthermore, we perform an experimental study of AS rela-

tionships in the Internet. BGP routing tables are retrieved fromthe Route Views server in Oregon [16], which is publicly avail-able and has the most complete view currently available. TheRouteViews server establishes BGP peering sessionswithmanytier-1 and tier-2 ISPs. Among the connected AS pairs, the algo-rithms infer that more than 90.5% of the AS pairs have cus-tomer–provider relationships, less than 1.5% of the AS pairshave sibling relationships, and less than 8% of the AS pairshave peering relationships. We verify our inference results withAT&T internal information on its relationship with neighboringASs. Our result shows that 100% of our inferred customers areconfirmed by the AT&T internal information. 100% of our in-ferred peers are confirmed by the AT&T internal information.20% of our inferred siblings are confirmed by the AT&T in-ternal information. Out of all of our inference results, 98.9% ofinference results are confirmed by the AT&T internal informa-tion. We also verify our inferred sibling relationships with theinformation acquired from theWHOIS lookup service [1].Morethan half of the inferred sibling relationships are confirmed bythe WHOIS lookup service. We show evidence that some BGProuting table entries stem from router misconfigurations.The remainder of the paper is structured as follows. Section II

presents an overview of interdomain routing and discusses pre-vious work on the Internet topology and routing. In Section III,we define the types of relationships between ASs and impliedexport policies. We also derive routing table entry patterns re-sulted from the export policies. Section IV presents heuristicalgorithms for inferring AS relationships. In Section V, we per-form an empirical study of inferring AS relationships by usingpublicly available BGP routing tables. We conclude the paperin Section VI with a summary and future work.

Fig. 1. AS graph example.

II. BACKGROUND ON INTERDOMAIN ROUTINGAND RELATED WORK

In this section, we present background material on the In-ternet routing architecture [17] and the use of BGP for inter-domain routing [18], [19]. We also summarize previous workon the Internet topology discovery.

A. Internet Architecture

The Internet consists of a large collection of hosts intercon-nected by networks of links and routers. The Internet is dividedinto thousands of distinct regions of administrative domain, eachof which possesses one or several ASs. Examples of adminis-trative domain range from college campuses and corporate net-works to large ISPs such as AT&T or MCI Worldcom. EachAS in the Internet is represented by a 16-bit AS number, whichbrings to a total of 65 536 possible ASs. Not all AS numbersare assigned to administrative domains, and some assigned ASnumbers are not used. On January 2, 2000, there are at least6474 ASs in use [20]. Many ISPs possess several ASs. For ex-ample, MCI Worldcom owned at least 143 ASs on December10, 1997 [20]. An AS has its own routers and routing poli-cies, and connects to other ASs to exchange traffic with re-mote hosts. A router typically has very detailed knowledge ofthe topology within its AS, and limited reachability informa-tion about other ASs. ASs interconnect at dedicated point-to-point links or public Internet exchange points (IXPs) such asMAE-EAST or MAE-WEST. Public exchange points typicallyconsist of a shared medium, such as a Gigabit Ethernet or anATM switch, that interconnects routers from several differentASs. Physical connectivity at the IXP does not necessarily implythat every pair of ASs exchanges traffic with each other.We can model the connectivity between ASs in the Internet

using an AS graph , where the node set consistsof ASs and the edge set consists of AS pairs that exchangetraffic with each other. Note that the edges of AS graph representlogical relationships between ASs and do not represent the formof the physical connection. Fig. 1 shows an example of an ASgraph. The degree of an AS is the number of ASs that are itsneighbors. Formally, the degree of AS ,

. The degree of an AS can be a good heuristic in determiningthe size of the AS. In [5], AS degrees have been used to classifyASs into four levels of hierarchy.

Page 3: On inferring autonomous system relationships in the

GAO: ON INFERRING AUTONOMOUS SYSTEM RELATIONSHIPS IN THE INTERNET 735

Each AS has responsibility for carrying traffic to and froma set of customer IP addresses. The scalability of the Internetrouting infrastructure depends on the aggregation of IP ad-dresses in contiguous blocks, called prefixes, each consistingof a 32-bit IP address and a mask length (e.g., 1.2.3.0/24). AnAS employs an intradomain routing protocol (such as OSPFor IS–IS) to determine how to reach each customer prefix, andemploys an interdomain routing protocol (such as BGP) toadvertise the reachability of these prefixes to neighboring ASs.We denote the set of prefixes that are originated from AS by

.Since the commercialization of the Internet in 1995, the

Internet has experienced tremendous growth in both size andcomplexity. The interconnections between ASs are dynamicallyevolving since ISPs can add or remove connections to otherASs and companies can change their Internet service providers.Furthermore, the contractual agreements between ASs canchange due to ISP merging and restructuring. There are severalregistration services for the administration and registrationof IP and AS numbers. ARIN [1] is an Internet registry thatprovides the WHOIS lookup service in North America, SouthAmerica, the Caribbean, and sub-Saharan Africa. The WHOISservice provides information about each AS such as the nameand address of the administrative domain that the AS belongsto. Other registration services include RIPE NCC, whichprovides services for Europe, the Middle East, and parts ofAfrica, and APNIC, which provides services for Asia Pacific.However, answering simple questions such as which ASsbelong to an ISP or which prefixes are originated from ASis not a straightforward undertaking. There is no one-to-onerelationship between AS numbers and ISPs, and networks areat times connected via multiple ISPs.

B. Routing Policies and BGP Routing TablesBGP is a path-vector protocol that constructs paths by suc-

cessively propagating updates between pairs of BGP speakingrouters that establish BGP peering sessions [18], [19]. Eachupdate concerns a particular prefix, , and includesthe list of the ASs along the path (the AS path), .Each BGP speaking router originates updates for one or moreprefixes, and can send the updates to its immediate neighborsvia BGP sessions. The simplest path-vector protocol wouldemploy shortest AS path routing, where each AS selects a routewith the shortest AS path. However, BGP allows a much widerrange of routing policies so as to honor contractual agreementsthat control the exchange of traffic. Upon receiving an update,a router must decide whether or not to use this path accordingto import policies and, if the path is chosen, whether or not topropagate the update to neighboring ASs according to exportpolicies. Routing policies are set by manipulating updateattributes including next-hop interface address ( ),local preference ( ), multiple-exit discriminator( ), and community set ( ) as described in thefollowing paragraphs. Routing policies are configured on eachBGP speaking router. For the simplicity of exposition, we usean AS to represent BGP speaking routers in the AS and use theAS and its BGP speaking routers interchangeably throughoutthis paper.

An AS uses import policies to transform incoming route up-dates. These import policies include denying an update, or per-mitting an update and assigning a local preference to indicatehow favorable the path is. We consider a BGP sessionbetween two ASs, and . receives a set of route updates

from . Let represent ’s update set after ap-plying the import policy. For example, an import policy couldassign if AS 1 appears in or denyany update that includes AS 2 in . Further, BGP dis-cards a routing update when already appears in the AS path ofthe update; this is essential to avoid introducing a cycle in theAS path. That is, BGP has the following loop-avoidance rule:

if then

After applying the import policies for route updates from aBGP session, an AS saves all the imported updates in its BGProuting table. The AS then follows a route selection process thatpicks the best route for each prefix. Let denote the bestroute selected by for prefix . is selected by pickingthe route with the highest , breaking ties by selectingthe route with the shortest . Note that local preferenceoverrides the AS-path length. Among the remaining routes, theAS picks the one with the smallest , breaking ties by se-lecting the route with the smallest intradomain routing cost. If atie still exists, further tie-breaking rules can be found at [19].Each AS sends only its best route for a prefix to a neighbor.

Export policies allow an AS to determine whether to send itsbest route to a neighbor and, if it does send the route, whathint it should send to its neighbor on using the route. AS ap-plies export policies to its best route set, , forsending to a neighboring AS . Export policies include permit-ting or denying a route, assigning multiple exit discriminator(to control how traffic enters its network), adding a communityvalue to community set (to hint on what preference a neighborshould give to the route), and prepending one or more times toAS path (to discourage traffic from entering its network by in-flating the length of the AS path listing its AS number multipletimes). For example, AS could decline to advertise routes toAS that have community 10 in the community set. Also, AScould prepend two times to the AS path for prefix 1.2.3.0/24and for any route that includes AS 2 in the AS path. For anyroute update , an AS always applies an implicit policy that sets

and to default values, assignsto ’s interface connecting to , and prepends to .Ultimately, the export policy transforms the set of updates as

, which transmits to using a BGP session.Each BGP speaking router keeps a BGP routing table, which

stores a set of candidate routes for the router.We refer to a candi-date route as a routing table entry, which includes a destinationprefix, next-hop, med, local preference and AS path of the route.For the sake of simplicity, we describe the routing table entriesfor a fixed prefix . The routing table entry in AS for desti-nation is a route with empty AS path, denoted as , iforiginates prefix . Otherwise, the routing table entries in fordepend on the best route of its neighboring AS , , as

well as the import policies of from and the export policies

Page 4: On inferring autonomous system relationships in the

736 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 9, NO. 6, DECEMBER 2001

of to . Formally, the routing table entries of for destinationprefix

if

otherwise(1)

We show a router’s BGP routing table entries for destinationprefix 4.2.24.0/21 below. The AS has five candidate routes to4.2.24.0/21: AS path (1740 1) via 134.24.127.3, ASpath (5459 5413 1) via 194.68.130.254, etc. Note thatthe third candidate route has AS path (1849 702 702 701 1),where AS 702 appears twice consecutively. This is due to ASprepending; AS 702 appends its AS number twice before ex-porting to AS 1849. Since we are interested in inferring AS re-lationships in this paper, the extra appearance of an AS numberdoes not give us additional information in this context. There-fore, for the sake of simplicity, we assume that the AS path inthe BGP routing table entry is preprocessed so that no AS ap-pears more than once throughout this paper. In addition, we listASs in an AS path in the order that ASs are traversed when apacket is sent from the source to the destination throughout thispaper.

An AS can specify a diverse set of routing policies includingits preference on route selection and filtering. However, routingpolicies are typically constrained by commercial contractualagreements negotiated between administrative domain pairs.Routing policies are often manually configured in BGPspeaking routers by administrative domain operators. Thepotential for the various policies to conflict with and contradictone another is enormous [15]. To address these challenges,Internet Routing Registries (IRR), a distributed database ofrouting registries, was created. The aim of IRR is to act asa repository of routing policies and to perform consistencychecking on the registered information. However, not all ISPsare willing to reveal their policies and even if they are, RoutingPolicy Specification Language (RPSL) [21], [22] has onlybeen recently standardized. ISPs are still in the early stage ofmigrating to the new standard. As a result, information on IRRis far from complete.

C. Related WorkThe increasing importance and complexity of the Internet

routing infrastructure has sparked interest in understandingInternet topology and its effect on the end-to-end performance.Previous work consists of discovering the Internet topology,

constructing the Internet distance map, and identifying inherentstructural properties of the Internet. Several studies [6], [9]present heuristics on discovering the router adjacencies byeffectively using the traceroute tool. Motivated by impor-tant problems such as the mirror site placement, Jamin andTheilmann et al. study the construction of distance maps byestimating the end-to-end distance using strategically placedmeasurement servers [7], [8], [10]. Faloutsos et al. identify thepower-law properties of the Internet connectivity at both routerand AS levels [4]. In [5], inter-AS connectivity is characterizedby a hierarchy of ASs, where ASs are classified into four levelsby the degree of the AS.With the exception of [5], all of the aforementioned work do

not have explicit notion of AS hierarchy. To the best of ourknowledge, all studies have assumed that the connectivity isequivalent to the reachability and there is no explicit notion ofAS relationships in the topology characterization. Our paper isthe first study that explores AS relationships, which is an in-herent aspect of the policy-based Internet routing structure. Theinformation about AS relationships is crucial in fully under-standing structural properties of the Internet. Further, AS rela-tionships can help to effectively place measurement servers andbetter approximate end-to-end distances.

III. AS RELATIONSHIPS AND ROUTING TABLE ENTRYPATTERNS

Our algorithm for inferring AS relationships is based on thefact that each AS sets up its export policies according to its rela-tionships with neighboring ASs. In this section, we describe theannotated AS graph representation, export polices, and routingtable entry patterns resulted from the export polices.

A. Annotated AS Graph and Selective Export RuleWe propose to represent AS relationships by an annotated

AS graph. An annotated AS graph is a partially directed graphwhose nodes represent ASs and whose edges are classifiedinto provider-to-customer, customer-to-provider, peer-to-peerand sibling-to-sibling edges. Furthermore, only edges betweenproviders and customers are directed. When traversing anedge from a provider to a customer, we refer to the edgeas a provider-to-customer edge. When traversing an edgefrom a customer to a provider, we refer to the edge as acustomer-to-provider edge. We call the edge between two ASsthat have a peering relationship a peer-to-peer edge and theedge between two ASs that have a sibling relationship a sib-ling-to-sibling edge. Fig. 2 shows an example of an annotatedAS graph.Each AS sets up its export policies according to its relation-

ships with neighboring ASs. We define , ,, and as the set of customers, peers,

siblings, and providers of , respectively. We classify the set ofroutes for an AS into customer, provider, and peer routes. Aroute of AS is a customer (provider, or peer) route if the firstnonsibling-to-sibling edge in is a provider-to-cus-tomer (customer-to-provider, or peer-to-peer) edge. Moreprecisely, let . If isa sibling-to-sibling edge for all and is a

Page 5: On inferring autonomous system relationships in the

GAO: ON INFERRING AUTONOMOUS SYSTEM RELATIONSHIPS IN THE INTERNET 737

Fig. 2. Annotated AS graph.

provider-to-customer (customer-to-provider or peer-to-peer)edge, then is a customer (provider or peer) route. Ifis a sibling-to-sibling edge for all . then is defined to be’s own route. The AS relationships translate into the followingrules that govern BGP export policies [23], [3]:• Exporting to a provider: In exchanging routing informa-tion with a provider, an AS can export its routes and itscustomer routes, but usually does not export its provideror peer routes.

• Exporting to a customer: In exchanging routing informa-tion with a customer, an AS can export its routes and itscustomer routes, and as well as its provider or peer routes.

• Exporting to a peer: In exchanging routing informationwith a peer, an AS can export its routes and its customerroutes, but usually does not export its provider or peerroutes.

• Exporting to a sibling: In exchanging routing informa-tion with a sibling, an AS can export its routes and routesof its customers, and as well as its provider or peer routes.

In summary, an AS selectively provides transit services for itsneighboring ASs. An AS sets up its export policy according tothe following rule.Selective Export Rule:a) Consider AS and AS . For

each best route of , if is a provider or peer route of, then .

b) Consider AS and AS .There is a best route of such that is a provider orpeer route of , and .

Note that although exporting policies are the same forproviders and peers (or customers and siblings), provider–cus-tomer relationships are asymmetric and peering (or sibling)relationships are symmetric, which is the key in distinguishingprovider–customer relationships from peering (or sibling)relationships. Formally, AS transits traffic for AS iff AStransits some of its provider or peer routes to AS , i.e., thereis a best route of such that is a provider or peer route ofand . Now we can determine AS

relationships as follows.• ASs and have a peering relationship iff neither tran-sits traffic for nor transits traffic for .

• AS is a provider of AS iff transits traffic for anddoes not transit traffic for .

• ASs and have a sibling relationship iff both transitstraffic for and transits traffic for .

Note that the relationship between two ASs might not di-rectly correspond to the business or commercial agreement be-tween the administrative domains that the ASs belong to. Itis possible to have a commercial agreement between two ASsthat includes a mixture of provider–customer, peering, and mu-tual-backup agreements. Two ASs might set up different exportpolicies at different BGP sessions between the ASs. The rela-tionship between two ASs reflects the most dominant commer-cial agreement between their respective administrative domains.The commercial agreements can be ordered from the most dom-inant to the least dominant as follows: mutual transit/backup,provider–customer, and peering agreement.

B. Routing Table Entry Patterns

The selective export rule indicates that a BGP routing tableentry should have a certain pattern. Before we explain the pat-tern, we present a lemma that infers export policies from routingtable entries. This lemma aids us to derive the routing table entrypatterns.Lemma 3.1: If ’s BGP routing table contains an entry

with AS path for destination prefix , i.e.,there is an entry such that and

, then we conclude that for

1) selects a route with asthe best route to prefix , i.e.,

.2) exports its best route to , i.e.,

.Proof: We prove by induction on . We first prove for the

case that . From (1),since otherwise ’s BGP routing table does not contain anentry with AS path for destination prefix .If , then the routing table ofdoes not contains a route to with AS path .

Therefore, . Suppose thelemma is true for . That is,

. Then ’s BGP routing table contains anentry with AS path for destination prefix . Nowwe can use the same argument for as for .In the next theorem, we show that the selective export rule

and Lemma 3.1 ensure that the AS path of an BGP routing tableentry has the property.Valley-Free: After traversing a provider-to-customer

or peer-to-peer edge, the AS path cannot traverse a cus-tomer-to-provider or peer-to-peer edge. Formally, an AS path

is valley-free iff the following conditionshold true.• A provider-to-customer edge can be followed byonly provider-to-customer or sibling-to-sibling edges:If is a provider-to-customer edge, then

must be either a provider-to-customer or asibling-to-sibling edge for any .

• A peer-to-peer edge can be followed by onlyprovider-to-customer or sibling-to-sibling edges:If is a peer-to-peer edge, then must

Page 6: On inferring autonomous system relationships in the

738 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 9, NO. 6, DECEMBER 2001

Fig. 3. AS paths (1, 2, 3) and (1, 2, 6, 3) are valley-free while AS paths (1, 4,3) and (1, 4, 5, 3) are not valley-free.

be either a provider-to-customer or a sibling-to-siblingedge for any .

For example, in Fig. 3, AS paths (1, 2, 3) and (1, 2, 6, 3) arevalley-free while AS paths (1, 4, 3) and (1, 4, 5, 3) are notvalley-free. Note that the selective export rule ensures that BGProuting table entries contain only valley-free AS paths. For ex-ample, if AS path (1, 4, 3) appears in a BGP routing table, thenAS 4 exports its provider route (3) to its provider AS 1. This vi-olates the selective export rule. Formally, we have the followingtheorem.Theorem 3.1: If all ASs set their export policies according to

the selective export rule, then the AS path in any BGP routingtable entry is valley-free.

Proof: We prove by contradiction. Suppose that AS pathin a BGP routing table entry is not valley-free.

Let be the destination prefix for the routing table entry. ASpath contains either 1) a provider-to-cus-tomer edge that is followed by a customer-to-provider orpeer-to-peer edge, or 2) a peer-to-peer edge that is followed bya customer-to-provider or peer-to-peer edge.In the case of 1), there is such that

is a provider-to-customer edge and there is suchthat is a customer-to-provider or peer-to-peeredge. Assume that is smallest such that is acustomer-to-provider or peer-to-peer edge. This means that

is either a provider-to-customer or sibling-to-sib-ling edge for . Let be the largest such that

is a provider-to-customer edge. That is, for, is a sibling-to-sibling edge. From

Lemma 3.1, we have that is a provider route of and. However, this contradicts

the selective export policy rule since is a provider or peerof .In the case of 2), a similar argument applies.The valley-free property derived from Theorem 3.1 enables

us to identify patterns for BGP routing table entries. We havea corollary that indicates such patterns. But first, we define no-tations that simplify the description of the routing table entrypatterns.Downhill Path: A sequence of edges that are either

provider-to-customer or sibling-to-sibling edges. Formally, apath is a downhill path iffis either a provider-to-customer or a sibling-to-sibling edge forall .

Uphill Path: A sequence of edges that are either cus-tomer-to-provider or sibling-to-sibling edges. Formally, a path

is an uphill path iff is either acustomer-to-provider or a sibling-to-sibling edge for all .Corollary 3.1: An AS path of a BGP routing table entry has

one of the following patterns:1) an uphill path;2) a downhill path;3) an uphill path followed by a downhill path;4) an uphill path followed by a peer-to-peer edge;5) a peer-to-peer edge followed by a downhill path; or6) an uphill path followed by a peer-to-peer edge, which isfollowed by a downhill path.

It is easy to verify that any other types of AS paths are notvalley-free. Corollary 3.1 implies that an AS path can be parti-tioned into either1) the maximal uphill path, the peer-to-peer edge, and themaximal downhill path in order; or

2) the maximal uphill path and the maximal downhill pathin order

where the maximal uphill path and the maximal downhill pathare defined as follows.Maximal Uphill Path: The longest uphill path in the AS

path. Formally, is the maximal uphill path ofAS path iff is an uphill path and

is a provider-to-customer or peer-to-peer edge.Maximal Downhill Path: The remaining AS path after

removing the maximal uphill path and the peer-to-peer edge.Formally, is the maximal downhill path of ASpath iff is a downhill path and

is a peer-to-peer edge or belongs to the maximaluphill path.Note that any one or both of the maximal uphill path and the

maximal downhill path of an AS path can be empty. An AS pathcan have an uphill top provider and a downhill top provider,where the uphill top provider is the last AS in its maximal uphillpath and the downhill top provider is the first AS in its maximaldownhill path. Note that an AS path’s uphill top provider anddownhill top provider are the sameAS if there is no peer-to-peeredge in the AS path. If the uphill and downhill top providers areknown, then we can infer the relationship between any consec-utive pair of the AS path. Therefore, identifying the uphill anddownhill top providers is the key in inferring AS relationships.The goal of this paper is to produce an annotated AS graph by

taking advantage of BGP routing table entry patterns. In otherwords, given BGP routing tables, we derive an annotated ASgraph that is consistent with the BGP routing tables. In thenext two sections, we present heuristic algorithms that use BGProuting tables to infer AS relationships and show experimentalresults derived from BGP routing tables.

IV. HEURISTIC ALGORITHMS FOR INFERRING ASRELATIONSHIPS

In this section, we present heuristic algorithms for inferringAS relationships given a set of routing tables. Our algorithms

Page 7: On inferring autonomous system relationships in the

GAO: ON INFERRING AUTONOMOUS SYSTEM RELATIONSHIPS IN THE INTERNET 739

are based on the intuition that a provider typically has a largersize than its customer does and the size of an AS is typically pro-portional to its degree in the AS graph. The uphill (or downhill)top provider of an AS path should be the AS that has the highestdegree among all ASs in its maximal uphill (or downhill) path.Let the top provider of an AS path to be the AS that has a higherdegree between the uphill and downhill top provider. Therefore,the top provider of an AS path is the AS that has the highest de-gree among all ASs in the AS path.For the sake of simplicity, we first classify edges into

provider-to-customer and sibling-to-sibling edges. Now, wecan infer that consecutive AS pairs that appear before thetop provider in the AS path are customer-to-provider or sib-ling-to-sibling edges, and consecutive pairs that appear afterthe top provider in the AS path are provider-to-customer orsibling-to-sibling edges. We then identify peer-to-peer edgesfrom the set of AS pairs that appear only as the top providerand the top provider’s neighbor in an AS path. We first showalgorithms that classify AS relationships into provider-to-cus-tomer and sibling-to-sibling edges in Section IV-A, and thenpresent an algorithm that identifies AS pairs that have peeringrelationships in Section IV-B.

A. Algorithms for Inferring Provider–Customer and SiblingRelationshipsIn this section, we first present a basic algorithm for inferring

provider–customer and sibling relationships in Section IV-A-1.We then refine this algorithm in Section IV-A-2.1) Basic Algorithm: Our basic heuristic algorithm goes

through the AS path of each routing table entry. It finds thehighest degree AS and lets the AS be the top provider of the ASpath. Knowing the top provider, we can infer that consecutiveAS pairs before the top provider are customer-to-provider orsibling-to-sibling edges, and consecutive AS pairs after the topprovider are provider-to-customer or sibling-to-sibling edges.Note that we traverse an AS path in the order that ASs arevisited when a packet is sent from the source to the destination.More precisely, if an AS pair appears before the topprovider of an AS path, then provides transit services for, and if an AS pair appears after the top provider of

an AS path, then provides transit services for . Therefore,is a provider of iff provides transit services for anddoes not provide transit services for . An AS pair have a

sibling relationship if the pair provide transit services for eachother.This leads to a three-phase heuristic algorithm for inferring

provider–customer and sibling relationships. The first phaseparses routing tables and calculates the degree of each AS.The second phase parses each entry of the routing tables. Itfirst identifies the top provider and then assigns consecutiveAS pairs before the top provider with a transit relationshipand consecutive AS pairs after the top provider with a transitrelationship. Fig. 4 shows the basic algorithm in details.The basic algorithm has the running time complexity of

, where is the total number of consecutive AS pairsin the routing tables. As we will see later, we evaluate thealgorithm by using a publicly available routing table, in whichthere are 1 million route entries and is over 2.6 million.

Fig. 4. Basic heuristic algorithm.

Therefore, it is important to construct a linear time algorithmin .2) Refined Algorithm: The basic algorithm assumes that all

BGP speaking routers are configured correctly. However, it ispossible that some BGP speaking routers are misconfigured inthe sense that they do not conform to the selective export rule.This might lead to incorrect inference of AS relationships fromrouting tables. For example, AS and AS are providers ofAS . However, AS misconfigures its BGP speaker routersuch that AS transits traffic between ASs and . In therouting table of AS , there is a routing table entry with ASpath . Suppose AS has the highest degree among thethree ASs. The Basic algorithm infers that transits traffic for, which contradicts with the fact that is a provider of . Toreduce the possibility of incorrect inference, we propose a re-fined algorithm that determines AS relationships by countingthe number of routing table entries that conclude transit rela-tionships. In the refined algorithm, we assume that misconfig-ured BGP speaking routers affect only a small number of routingtable entries. Specifically, we use the heuristic that if no morethan routing table entries infer that AS provides transit ser-vices for AS and more than routing table entries infer thatAS provides transit services for AS , then we ignore therouting table entries that infer that AS transits for AS and weconclude that is a customer of , where is a small constant.The refined algorithm infers AS relationships as follows. The

first phase parses routing tables and calculates the degree of eachAS. The second phase parses each entry of the routing tables. Itcounts the number of routing table entries that infer an AS pairhaving a transit relationship by assigning consecutive AS pairs

Page 8: On inferring autonomous system relationships in the

740 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 9, NO. 6, DECEMBER 2001

Fig. 5. Refined heuristic algorithm.

before the top provider with a transit relationship and consecu-tive AS pairs after the top provider with a transit relationship.The third phase finalizes the relationship between AS pairs. Ifmore than routing table entries infer that AS transits trafficfor AS and more than routing table entries infer that AStransits traffic for AS , then is a sibling of . If at least oneand at most routing table entries infer that AS transits trafficfor AS and at least one and at most routing table entries inferthat AS transits traffic for AS , then is a sibling of . Oth-erwise, if no routing table entry infers that transits traffic foror at least routing table entries infer that transits traffic for, then is a provider of . Note that unlike the basic algorithm,the refined algorithm ignores some routing table entries. Fig. 5shows the refined algorithm in details.

B. A Heuristic Algorithm for Inferring Peering RelationshipsBoth the basic and refined algorithms classify AS relation-

ships into provider–customer or sibling relationships only. Inthis section, we present a heuristic algorithm for identifyingpeering relationships. An AS pair have a peering relationship ifand only if theAS pair do not transit traffic for each other. There-fore, we first identify all AS pairs that have transit relationships.According to Corollary 3.1, if an AS pair appear consecutivelyin an AS path and neither of the AS pair is the top provider ofthe AS path, then the AS pair have a transit relationship andtherefore, the AS pair do not have a peering relationship. Fur-thermore, according to Corollary 3.1, an AS path has at mostone consecutive AS pair that have a peering relationship. Thatis, a top provider can have a peering relationship with at mostone of its neighbors in the AS path. Since an AS pair that have apeering relationship are typically of comparable size, we iden-tify the neighboring AS of the top provider that the top providerdoes not have a peering relationship with using the heuristic thatthe top provider is more likely to peer with its neighbor with a

higher degree. That is, if the top provider does not have a siblingrelationship with any one of its neighboring ASs in the AS path,then the top provider does not have a peering relationship withthe neighbor with smaller degree.Finally, since we might not have routing tables from all BGP

speaking routers, we might not be able to identify all AS pairsthat do not have peering relationships using the heuristic de-scribed above. To eliminate more AS pairs from having peeringrelationships, we use the heuristic that the sizes of two peersdo not differ significantly. Specifically, we assume that the de-grees of the two ASs that have a peering relationship do notdiffer by more than times, where is a constant that has tobe fine-tuned. Note that the need for the constant is unfortu-nate and it is not clear how to properly set it. Accordingly, someinaccuracy might be introduced to the corresponding inference.On the other hand, the more BGP routing tables we use for theinference, the less crucial the choice of is. This is because wecan eliminate an AS pair from having a peering relationship ifthe AS pair appear in any BGP routing table entry as having atransit relationship or being not likely to peer, as described ear-lier. The use of to eliminate an AS pair from having a peeringrelationship plays a less significant role.The final algorithm infers peering relationships as follows.

Phase 1 coarsely classifies AS pairs into having provider–cus-tomer or sibling relationships. Phase 2 identifies all AS pairs thatcannot peer with each other. Finally, Phase 3 identify peeringrelationships from the rest of connected AS pairs by using theheuristic that two peering ASs’ degrees do not differ by morethan times. Fig. 6 presents the final algorithm in detail.

V. INFERRING AS RELATIONSHIPS IN THE INTERNET

In this section, we present experimental results of inferringAS relationships in the Internet. Ideally, we would perform ex-periments on BGP routing tables of all BGP speaking routers

Page 9: On inferring autonomous system relationships in the

GAO: ON INFERRING AUTONOMOUS SYSTEM RELATIONSHIPS IN THE INTERNET 741

Fig. 6. Final algorithm.

TABLE ICURRENT CONTRIBUTORS OF ROUTE VIEWS

in the Internet. However, there are a limited number of BGProuting tables publicly available. We choose to use BGP routingtables from the Route Views router in Oregon [16], which hasthe most complete view currently available. The Router Viewsrouter establishes BGP peering sessions with 22 ISPs at 24 lo-cations, as shown in Table I. The Route Views server collects

the BGP routing table once every night [20]. For the detaileddescription of the Route Views server, see [16].

A. Experimental ResultsWe implement the Basic, Refined, and Final algorithms using

the Perl programming language. For the Refined algorithm, we

Page 10: On inferring autonomous system relationships in the

742 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 9, NO. 6, DECEMBER 2001

TABLE IIINFERENCE RESULTS

choose so as to ignore fewer number of routing tableentries. For the Final algorithm, we use the Basic algorithmto infer sibling relationships and let the limit of the ratio be-tween the degrees of two peering ASs, , be infinite or 60.Note that we choose to use due to the current con-nectivity of top tier providers. There are only two ASs whosedegrees are greater than 420, and few ASs with degree less thanseven peer with tier-1 providers. Admittedly, this might excludesome peer-to-peer edges. We will see in the next subsection thatfine-tuning to be 60 can significantly improve the inferenceresult for peering relationships. We run the algorithms for theBGP routing tables from September 27, 1999, January 2, 2000,and March 9, 2000. The number of edges in the AS graph, thenumber of sibling-to-sibling edges inferred by both the Basicand Refined algorithms, and the number of peer-to-peer edgesinferred by the Final algorithm are shown in Table II.Note that the total number of edges in the AS graph is in-

consistent with the publicly available data at [20]. In [20], ASsummary data indicates that there are 13 895 edges on January2, 2000 and 12 468 edges on September 27, 1999. Because ofthe operation in BGP, an AS can appear more thanonce in a routing table entry. The Perl script [20] overcounts thenumber of edges by including self edges (A self edge is an edgebetween an AS and itself) when parsing the BGP routing table.We eliminate self edges in our Perl programs.These BGP routing tables contain almost 1 million routing

table entries. From the BGP routing table on September 27,1999, theBasic and Final algorithms infer that among 11 288ASgraph edges, there are 10 745 provider-to-customer edges, 149sibling-to-sibling edges, and 884 peer-to-peer edges. By usingthe Refined algorithm, the number of sibling-to-sibling edges isreduced to 124 and by using the Final algorithm with ,the number of peer-to-peer edges is reduced to 733.From the BGP routing table on January 2, 2000, the Basic

and Final algorithms infer that among 12 571 AS graph edges,there are 12 013 provider-to-customer edges, 186 sibling-to-sib-ling edges, and 372 peer-to-peer edges. By using the Refinedalgorithm, the number of sibling-to-sibling edges is reduced to135 and by using the Final algorithm with , the numberof peer-to-peer edges is reduced to 668. From the BGP routingtable onMarch 9, 2000, the Basic and Final algorithms infer thatamong 13 800AS graph edges, there are 13 661 provider-to-cus-tomer edges, 203 sibling-to-sibling edges, and 836 peer-to-peeredges. By using the Refined algorithm, the number of sibling-to-sibling edges is reduced to 157 and by using the Final algorithmwith , the number of peer-to-peer edges is reduced to713. Therefore, for all three routing tables, we can find a con-sistent view of AS relationships which has more than 90.5%

TABLE IIICOMPARING INFERENCE RESULTS FROM BASIC AND FINAL ( ) WITH

AT&T INTERNAL INFORMATION

provider-to-customer edges, less than 1.5% of sibling-to-siblingedges and less than 8%of peer-to-peer edges. Note that the smallpercent of peer-to-peer edgesmight be caused by the incompleteview of the Route Views router. Since the Route Views routerpeers with mostly tier-1 providers, peering between tier-2 ortier-3 ISPs might not be manifested in the Route Views routingtable due to the selective export rule and the fact that only thebest routes are exported. We also observe that the number or thepercentage of sibling-to-sibling edges is increasing. It might becaused by the increasing number of complex AS relationshipsand ISP mergers.

B. Verification of Inferred Relationships by AT&TAlthough there is no publicly available information about AS

relationships, we verify our inferred relationships by comparingwith AT&T internal information on AT&T Common IP Back-bone.We compare our data forMarch 9, 2000with that of AT&TCommon IP Backbone on the same day.Table III compares the inference results from Basic and

Final( ) algorithms with AT&T internal information.Since we cannot reveal AT&T internal information on eachtype of relationship, we present the comparison results in termsof percentage except for some special cases. From the table,we see that 100% of our inferred customers are confirmed bythe AT&T internal information. 0% of our inferred provideris confirmed by the AT&T internal information. Note that weinfer that AT&T has only one provider while AT&T has noprovider. 77.4% of our inferred peers are confirmed by theAT&T internal information. 20% of our inferred siblings areconfirmed by the AT&T internal information. Note also thatwe do not necessarily have all AT&T’s neighbors from therouting table of Route Views since AT&T announces onlyits best routes to outside and some of its announced routesare aggregate routes. Out of neighbors from the AT&T list,

Page 11: On inferring autonomous system relationships in the

GAO: ON INFERRING AUTONOMOUS SYSTEM RELATIONSHIPS IN THE INTERNET 743

TABLE IVCOMPARING INFERENCE RESULTS FROM REFINED ( ) AND FINAL

( ) WITH AT&T INTERNAL INFORMATION

20% ASs do not exist in our adjacency list and most of theseASs are customers of AT&T. Out of all of our inferenceresults, 96.3% of inference results are confirmed by the AT&Tinternal information. Note that 96.3% accuracy is relative toonly AT&T’s relationships with its neighbors. Using AT&T’sinternal information, we can verify 3.32% of edges that appearin the Router Views’ BGP routing table. For AT&T’s relation-ships with its neighbors, most of the inaccurate inference iscaused by the misclassification of peering relationships intoprovider–customer relationship. This confirms the need offine-tuning due to the lack of sufficient BGP routing tables.We will see later that by fine-tuning , it is possible to achievebetter accuracy for AT&T’s relationships with others.Table IV compares the inference results fromRefined( )

and Final( ) with AT&T internal information. From thetable, we see that 100% of our inferred customers are confirmedby the AT&T internal information. 77.4% of our inferred peersare confirmed by the AT&T internal information. 25% of ourinferred siblings are confirmed by the AT&T internal informa-tion. Out of all of our inference results, 96.5% of inference re-sults are confirmed by the AT&T internal information. UsingRefined algorithm, we improve the inference results on siblingrelationships.Table V compares the inference results from Refined( )

and Final( ) with AT&T internal information. From thetable, we see that 100% of our inferred customers are confirmedby the AT&T internal information. 100% of our inferred peersare confirmed by the AT&T internal information. 25% of our in-ferred siblings are confirmed by the AT&T internal information.Out of our inference results, 99.1% of inference results are con-firmed by the AT&T internal information. Using the heuristicthat peers are typically of comparable sizes by setting a reason-able value for , we improve the inference results on peeringrelationships significantly. Note that although it is problematicto select a proper value for , it is encouraging to see that itis possible to achieve 100% confirmation for peering relation-ship inference for an ISP. At the same time, this suggests that weshould combine other informationwith our inference techniquesto achieve better accuracy for important business applications.

C. Verifications by the WHOIS ServerWe verify our inferred sibling relationships by checking with

the WHOIS lookup service. Since the WHOIS lookup servicesupplies the name and address of the company that owns an AS,

TABLE VCOMPARING INFERENCE RESULTS FROM REFINED ( ) AND FINAL

( ) WITH AT&T INTERNAL INFORMATION

we can confirm that an AS pair has a sibling relationship if thetwoASs belong to the same company or twomerging companies(such as AT&T and Cerfnet). Further, we also confirm that anAS pair has a sibling relationship if the ASs belong to two smallcompanies that are located in the same city (which increases thelikelihood that they have a mutual-transit agreement). We man-ually queried the WHOIS lookup service and confirmed 101 of186 inferred sibling relationships for the January 2, 2000 data.This is 54.3% of the inferred sibling relationships. Note that,however, the WHOIS server might not be completely accuratesince its database might contain stale records. Therefore, it ispossible that theWHOIS servermight falsely confirm a relation-ship. With that caveat in mind, we use it for the lack of a betteravenue by which to verify our inference results. By the sametoken, other unconfirmed sibling relationships might still havesibling relationships since the WHOIS lookup service might beout of date and we do not have sufficient information about ISPmergers.The Refined algorithm reduces the number of sibling rela-

tionships by ignoring some of routing table entries as shown inTable II. For September 27, 1999 data, the Refined algorithminfers 124 sibling-to-sibling edges by ignoring only 25 routeentries. For January 2, 2000 data, the Refined algorithm infers135 sibling-to-sibling edges by ignoring only 51 route entries.For March 9, 2000 data, the Refined algorithm infers 157 sib-ling-to-sibling edges by ignoring only 46 route entries. This en-courages us to look into routing table entries that infer uncon-firmed sibling relationships and analyze routes that might mis-lead us in inferring AS relationships.We analyze the routes that contribute to the inference of un-

confirmed sibling relationships. Our goal is to find the possiblepatterns for these routes and perhaps to use the patterns to in-crease the accuracy of then inference. We report here severalpossible reasons behind the inference of unconfirmed siblingrelationships.

1) Router Configuration Typo: Some router prepends itsAS number by explicitly specifying the AS numbers toprepend. A typo in the configuration can result routingtable entries that violate the loop-avoidance rule definedin BGP. For example, in AS path (7018 3561 7057 70757057), AS7057 appears twice and does not appear con-secutively. This might be the result of the router configu-ration typo in AS7057.

Page 12: On inferring autonomous system relationships in the

744 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 9, NO. 6, DECEMBER 2001

2) Misconfiguration of small ISPs: Some small ISPsdo not follow the selective export rule in their routerconfiguration. For example, AS path (1239 11 116 7017018) has Sprint (AS1239) using a small ISP in Cali-fornia (AS11116) to get to AT&T (AS7018) via UUnet(AS701). According to the AS graph, UUnet and Sprintare directly connected, although the route does not usethe direct connection. It is likely that AS11116 is a cus-tomer of both AS701 and AS1239. Therefore, this routemight be caused by the misconfiguration of AS11116that announces its provider route to another provider.

3) Unusual AS relationships: Some AS pairs have their re-lationships defined at the prefix level. For example, inAS path (1239 3561 2856 701 702 1849 9090), Sprint(AS1239) uses AS3561 and AS2856 to get the route ofUUnet (AS701) instead of using the direct link to UUnet(AS701). Note that AS2856, AS1849, and AS9090 areEuropean ASs. This might be the result of specifically de-fined relationship for prefixes in Europe.

4) Inaccuracy of the heuristic: The top provider does nothave the highest degree. For example, in AS path (33337905 5727 1327), although AS3333 has the highest de-gree, it might not be the top provider of the AS path sinceAS5727 (AT&T) is likely to be the top provider. Note thatAS3333 is a European ISP.

Reasons 1, 2, and 3 suggest that we have to ignore somerouting table entries in inferring AS relationships. It is not clear,however, how to identify these entries. Reason 4 hints that itmight be wise to modify our heuristic for special ASs. How-ever, this cannot be done without additional knowledge such aswhich ISP an AS belongs to. Therefore, it is a challenging taskto increase the accuracy of the AS relationship inference withonly BGP routing tables.

VI. CONCLUSIONS AND FUTURE WORK

Interdomain routing policies are constrained by commercialcontractual relationships between administrative domains. As aresult, AS relationships are an inherent aspect of the Internetrouting structure. We present heuristic algorithms that infer ASrelationships from BGP routing tables. The algorithm is basedon the fact that a provider is typically larger than its customersand two peers are typically of comparable size. We perform anexperimental study ofAS relationships in the Internet. Out of theconnected AS pairs seen from Route Views, our heuristic algo-rithm classifies more than 90.5% of AS pairs into provider–cus-tomer relationships, less than 1.5% of AS pairs into sibling re-lationships, and less than 8% of AS pairs into peering relation-ships. We verify our inferred relationships with both AT&T in-ternal information and theWHOIS lookup service. 99.1% of ourinferred relationships between AT&T and its neighboring ASsare confirmed by the AT&T internal information. More than50% of inferred sibling pairs can be confirmed by the data fromthe WHOIS lookup service. Furthermore, we identify routingtable entries that stem from unusual AS relationships or routermisconfiguration/bugs.As part of ongoing work, we are exploring heuristics that

can improve AS relationship inference. Many of scenarios dis-

cussed in Section V-C can be combined with additional knowl-edge about ASs to improve our heuristic algorithms. In addition,we plan to study several applications of AS relationships. First,ISPs can reduce misconfiguration and debug router configura-tion files [13], [15]. Route policies are often manually config-ured and therefore prone to errors. Such errors can propagatefurther to other ASs and can potentially cause outage. Therefore,it is important for ISPs to monitor the received route announce-ments using AS relationship knowledge and perhaps filter erro-neous routes such as a route using a customer for transit trafficbetween its two providers. Further, an ISP operator can scan itsBGP routing tables periodically to identify potential erroneousroutes and inform the originating AS. We would like to buildtools to improve the reliability of Internet routing. These toolsinclude features such as debugging router configuration files soas to conform to the selective export rule.Second, ISPs or companies can use AS relationship informa-

tion to plan for future contractual agreements. The contractualagreement between ISPs is constantly evolving. For example,a company might decide to switch to or add a tier-1 ISP as itsprovider. Verifying whether an ISP is a tier-1 ISPs involves un-derstandingwhether the ISP has a provider. As another example,an ISP might decide to establish private peering relationshipswith other ISPs as it becomes larger. The ISP might want to firstunderstand which tier that a potential ISP belongs to before col-lecting information on traffic volume between the two ISPs. Weplan to systematically study the AS hierarchical structure usingthe AS relationship information.

ACKNOWLEDGMENT

The author would like to thank J. Rexford at AT&T ResearchLabs for many helpful discussion and comments on the paper,J. M. Gottlieb at AT&T Research Labs for providing the AT&Tinternal information, D.Meyer at CISCO for allowing the use ofRoute Views data, and the anonymous reviewers for many valu-able comments. Any opinions, findings, conclusions, or recom-mendations expressed in this material are those of the authorand do not necessarily reflect the views of the National ScienceFoundation.

REFERENCES[1] [Online]. Available: http://www.arin.net/whois/arinwhois.html.[2] G. Huston, “Interconnection, peering and settlements—Part I,” Internet

Protocol J., Mar. 1999.[3] , “Interconnection, peering and settlements—Part II,” Internet Pro-

tocol J., June 1999.[4] M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On power-law relation-

ships of the Internet topology,” in Proc. ACM SIGCOMM, Aug. 1999,pp. 251–262.

[5] R. Govindan and A. Reddy, “An analysis of Internet inter-domaintopology and route stability,” in Proc. IEEE INFOCOM, vol. 2, Apr.1997, pp. 850–857.

[6] R. Govindan and H. Tangmunarunkit, “Heuristics for Internet map dis-covery,” in Proc. IEEE INFOCOM, vol. 3, Mar. 2000, pp. 1371–1380.

[7] W. Theilmann and K. Rothermel, “Dynamic distance maps of the In-ternet,” in Proc. IEEE INFOCOM, vol. 1, Mar. 2000, pp. 275–284.

[8] S. Jamin, C. Jin, Y. Jin, D. Raz, Y. Shavitt, and L. Zhang, “On the place-ment of Internet instrumentation,” in Proc. IEEE INFOCOM, vol. 1,Mar. 2000, pp. 295–304.

[9] R. Siamwalla, R. Sharma, and S. Keshav, “Discovering Internettopology,” Cornell Univ., Ithaca, NY, Tech. Rep., May 1999.

Page 13: On inferring autonomous system relationships in the

GAO: ON INFERRING AUTONOMOUS SYSTEM RELATIONSHIPS IN THE INTERNET 745

[10] S. Jamin, P. Francis, V. Paxson, L. Zhang, D. Gryniewicz, and Y. Jin,“An architecture for a global Internet host distance estimation service,”in Proc. IEEE INFOCOM, vol. 1, Mar. 1999, pp. 210–217.

[11] S. Savage, A. Collins, E. Hoffman, J. Snell, and T. Anderson, “Theend-to-end effects of Internet path selection,” in Proc. ACM SIGCOMM,Aug. 1999, pp. 289–299.

[12] J. Rickard, “Mapping the Internet with traceroute,” Broadwatch Mag.,Dec. 1996.

[13] C. Labovitz, A. Ahuja, and F. Jahanian, “Experimental study of Internetstability and wide-area network failures,” in Proc. Int. Symp. Fault-Tol-erant Computing, June 1999, pp. 278–285.

[14] V. Paxson, “End-to-end routing behavior in the Internet,” IEEE/ACMTrans. Networking, vol. 5, pp. 601–615, Oct. 1997.

[15] A. Feldmann and J. Rexford, “IP network configuration for intradomaintraffic engineering,” IEEE Network, pp. 46–57, Sept./Oct. 2001.

[16] University of Oregon Route Views Project, D. Meyer. (2001). [Online].Available: http://www.antc.uoregon.edu/route-views/.

[17] B. Halabi, Internet Routing Architectures. Indianapolis, IN: Cisco,1997.

[18] J. W. Stewart, BGP4: Inter-Domain Routing in the Internet. Reading,MA: Addison-Wesley, 1999.

[19] Y. Rekhter and T. Li, “A Border Gateway Protocol 4 (BGP-4),” IETF,Request for Comments 1771, Mar. 1995.

[20] National Laboratory for Applied Network Research. Global ISP inter-connectivity by AS number. San Diego Supercomputer Center, Univ. ofCalifornia, San Diego. [Online]. Available: http://moat.nlanr.net/AS/.

[21] D. Meyer, J. Schmitz, C. Orange, M. Prior, and C. Alaettinoglu, “UsingRPSL in practice,” IETF, Request for Comments 2650, Aug. 1999.

[22] C. Alaettinoglu, C. Villamizar, E. Gerich, D. Kessens, D. Meyer, T.Bates, D. Karrenberg, and M. Terpstra, “Routing policy specificationlanguage (RPSL),” IETF, Request for Comments 2622, June 1999.

[23] C. Alaettinoglu, “Scalable router configuration for the Internet,” in Proc.IEEE IC3N, Oct. 1996.

[24] K. Varadhan, R. Govindan, and D. Estrin, “Persistent route oscillationsin inter-domain routing,” Univ. of Southern California Information Sci-ences Inst., Los Angeles, Tech. Rep. 96-631, Feb. 1996.

[25] T. G. Griffin and G. Wilfong, “An analysis of BGP convergence proper-ties,” in Proc. ACM SIGCOMM, Sept. 1999, pp. 277–288.

[26] L. Gao and J. Rexford, “A stable Internet routing without global coordi-nation,” IEEE/ACM Trans. Networking, vol. 9, pp. 681–692, Dec. 2001.

Lixin Gao (M’96) received the Ph.D. degree in com-puter science from the University of Massachusetts,Amherst (UMass), in 1996.She is currently an Associate Professor of

Electrical and Computer Engineering at UMass. Herresearch interests include multimedia networkingand Internet routing. She was a visiting researcherat AT&T Research Labs and DIMACS betweenMay 1999 and January 2000. She is the foundingdirector of the Multimedia Networking and InternetResearch Laboratory at UMass. She is a principal

investigator of a number of the National Science Foundation grants, includingthe NSF CAREER award.