19
On the Geographic Location of Internet Resources Anukool Lakhina John W. Byers Mark Crovella Ibrahim Matta Department of Computer Science Boston University anukool, byers, crovella, matta @cs.bu.edu Abstract— One relatively unexplored question about the Internet’s physical structure concerns the geographical location of its components: routers, links and autonomous systems (ASes). We study this question using two large inventories of Internet routers and links, collected by different methods and about two years apart. We first map each router to its geographical location using two differ- ent state-of-the-art tools. We then study the relationship between router location and population density; between geographic distance and link density; and between the size and geographic extent of ASes. Our findings are consistent across the two datasets and both mapping methods. First, as expected, router density per person varies widely over different economic regions; however, in economically homogeneous regions, router density shows a strong superlinear relationship to population density. Second, the probability that two routers are directly connected is strongly dependent on distance; our data is consistent with a model in which a majority (up to 75-95%) of link formation is based on geo- graphical distance (as in the Waxman topology generation method). Finally, we find that ASes show high variability in geographic size, which is correlated with other measures of AS size (degree and number of interfaces). Among small to medium ASes, ASes show wide variability in their geographic dispersal; however, all ASes exceeding a certain threshold in size are maximally dispersed geo- graphically. These findings have many implications for the next generation of topology generators, which we envisage as producing router-level graphs annotated with attributes such as link latencies, AS identifiers and geographical locations. I. I NTRODUCTION Despite the Internet’s critical importance in so- ciety, surprisingly little quantitative information is known about its physical structure and about the This work was partially supported by NSF research grants CCR- 9706685, ANI-9986397, ANI-0095988, and CAREER award ANI- 0093296. Some of the data used in this research was collected as part of CAIDA’s skitter initiative, http://www.caida.org. Support for skitter is provided by DARPA, NSF, and CAIDA membership. dynamic processes that drive its rapid growth. De- veloping a better understanding of the Internet’s structure is of interest from a purely scientific stand- point, but is also of immediate practical interest, since knowledge of the network’s properties enables researchers to optimize network applications and to conduct more representative network simulations. Previous attempts to model Internet structure have often made implicit or explicit assumptions about the network’s geometry. For example, the Waxman model [41] makes two such assumptions: 1) that network nodes are placed uniformly at random in the plane; and 2) that the likelihood two nodes are directly connected is an exponentially declining function of separation distance. On the other hand, other models have implicitly assumed that there is no important underlying geometry to the network, and the patterns of connectivity are only influenced by topological factors [9], [44], [1]. Despite these prevalent assumptions about net- work geometry, very little work to date has actually examined the geometry of the Internet’s infrastruc- ture. In this paper, we present initial results bearing on these questions. For example, with respect to the Waxman assumptions, we find that assumption 1 (uniform distribution of routers) is very inaccurate — the actual distribution pattern of routers is highly irregular. On the other hand, we find evidence that supports assumption 2 — the connectivity patterns of routers show a strong relationship to distance. In the process of obtaining these results, we ask a number of basic questions. Regarding router placement, we ask: Where are the routers com- prising the Internet physically located? and: What factors drive the geographic placement of routers? Turning to connectivity, the key questions we wish to answer are: Where are the links between Internet routers physically located? and: To what extent does router connectivity appear to be sensitive to physical

On the Geographic Location of Internet Resources - Computer Science

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: On the Geographic Location of Internet Resources - Computer Science

On the Geographic Location of Internet ResourcesAnukool Lakhina John W. Byers Mark Crovella Ibrahim Matta

Department of Computer ScienceBoston University

fanukool, byers, crovella, [email protected]

Abstract— One relatively unexplored question about theInternet’s physical structure concerns the geographicallocation of its components: routers, links and autonomoussystems (ASes). We study this question using two largeinventories of Internet routers and links, collected bydifferent methods and about two years apart. We first mapeach router to its geographical location using two differ-ent state-of-the-art tools. We then study the relationshipbetween router location and population density; betweengeographic distance and link density; and between the sizeand geographic extent of ASes.

Our findings are consistent across the two datasetsand both mapping methods. First, as expected, routerdensity per person varies widely over different economicregions; however, in economically homogeneous regions,router density shows a strong superlinear relationshipto population density. Second, the probability that tworouters are directly connected is strongly dependent ondistance; our data is consistent with a model in which amajority (up to 75-95%) of link formation is based on geo-graphical distance (as in the Waxman topology generationmethod). Finally, we find that ASes show high variabilityin geographic size, which is correlated with other measuresof AS size (degree and number of interfaces). Amongsmall to medium ASes, ASes show wide variability intheir geographic dispersal; however, all ASes exceedinga certain threshold in size are maximally dispersed geo-graphically. These findings have many implications for thenext generation of topology generators, which we envisageas producing router-level graphs annotated with attributessuch as link latencies, AS identifiers and geographicallocations.

I. I NTRODUCTION

Despite the Internet’s critical importance in so-ciety, surprisingly little quantitative information isknown about its physical structure and about the

This work was partially supported by NSF research grants CCR-9706685, ANI-9986397, ANI-0095988, and CAREER award ANI-0093296. Some of the data used in this research was collected aspart of CAIDA’s skitter initiative, http://www.caida.org. Support forskitter is provided by DARPA, NSF, and CAIDA membership.

dynamic processes that drive its rapid growth. De-veloping a better understanding of the Internet’sstructure is of interest from a purely scientific stand-point, but is also of immediate practical interest,since knowledge of the network’s properties enablesresearchers to optimize network applications and toconduct more representative network simulations.

Previous attempts to model Internet structure haveoften made implicit or explicit assumptions aboutthe network’s geometry. For example, the Waxmanmodel [41] makes two such assumptions: 1) thatnetwork nodes are placed uniformly at random inthe plane; and 2) that the likelihood two nodesare directly connected is an exponentially decliningfunction of separation distance. On the other hand,other models have implicitly assumed that there isno important underlying geometry to the network,and the patterns of connectivity are only influencedby topological factors [9], [44], [1].

Despite these prevalent assumptions about net-work geometry, very little work to date has actuallyexamined the geometry of the Internet’s infrastruc-ture. In this paper, we present initial results bearingon these questions. For example, with respect to theWaxman assumptions, we find that assumption 1(uniform distribution of routers) is very inaccurate— the actual distribution pattern of routers is highlyirregular. On the other hand, we find evidence thatsupports assumption 2 — the connectivity patternsof routers show a strong relationship to distance.

In the process of obtaining these results, weask a number of basic questions. Regarding routerplacement, we ask: Where are the routers com-prising the Internet physically located? and: Whatfactors drive the geographic placement of routers?Turning to connectivity, the key questions we wishto answer are: Where are the links between Internetrouters physically located? and: To what extent doesrouter connectivity appear to be sensitive to physical

Page 2: On the Geographic Location of Internet Resources - Computer Science

2

distance? Our third set of questions concerns theautonomous system (AS) structure of the network:How does geographical size (number of locations)relate to previously studied measures of AS size?How do ASes disperse their resources geographi-cally? and: How do interdomain links differ fromintradomain links geographically? The answers wefind to our main questions are consistent acrossthree different regions of the world, across two verydifferent sources of data, and across two differentgeographic mapping techniques.

The choice of these questions is motivated by cur-rent problems in network topology generation. Weturn to geography for inspiration because a numberof unsolved problems in topology generation appearmuch easier to solve given an underlying geograph-ical model. For example, an accurate geometricmodel of router placement and link formation wouldmake the labelling of links with latency values astraightforward matter.

Although the questions we pose are relativelysimple, providing reasoned and justifiable methodsto answer them is surprisingly difficult. The fore-most difficulty is that there does not exist a recent“snapshot” of the Internet that provides geograph-ical location of routers, links, and ASes.1 To buildsuch snapshots, we took two large inventories ofInternet routers and links, collected by differentmethods about two years apart, and processed themin two stages: first, by mapping each router toits associated AS number, and second, using twodifferent state-of-the-art tools to determine eachrouter’s geographical location.

We present our main results in Sections IV toVI. In Section IV we show that router densityper person varies widely over different economicregions, but that router density per “online user”(defined in Section IV) shows much less variability— suggesting that the number of network usersin a geographic region (as determined,e.g., bysurveys) can be used to roughly size the amount ofnetwork infrastructure expected in the region. Whenwe restrict our focus to economically homogeneousregions, we find that router density shows a strongsuperlinear relationship to population density; thatis, the number of routers per person is higherin highly populated areas. (This may reflect the

1The most recent geographical map of the entire Internet we havebeen able to find dates from 1982 (ARPANET).

superlinear scaling of the number of communicationpaths needed as a function of the number of networkusers in an area.) These results justify the use ofpopulation distribution (which is well studied, witheasily accessible datasets [5]) as an effective proxyfor the actual distribution of routers.

Next, in Section V, we show that the probabilitythat two routers are directly connected is stronglydependent on the distance between them. In fact,our data is consistent with a model in which asurprisingly large majority (up to 75-95%) of linkformation is influenced by geographical distance. Asmentioned above, this is the assumption made inthe Waxman model [41] but it is explicitlynot anassumption in more recent and more sophisticatedtopology models. In fact, we even find that thefunctional form of distance dependence used byWaxman (i.e., an exponentially declining connec-tion probability) is in agreement with our data. Ofcourse, the Waxman method produces topologiesvery different from reality; but our results high-light the relative importance in examining the pointdistribution assumptions in the Waxman model inassessing the sources of its inaccuracy.

Finally, in Section VI, we turn to questions ofhow to use geographical information to assign nodesto Autonomous Systems. We find that ASes showremarkable variability in geographic extent. Weshow that the number of distinct locations in whichan AS places routers has a long-tail distributionsimilar to that previously reported for AS degree[13] and number of routers in an AS [38]. We alsoshow that all three of these measures of AS size areclearly correlated. In examining the geographic areacovered by the routers of an AS, we show evidencefor two distinct types of ASes: smaller ASes show awide range of variation in the geographic dispersionof their infrastructure. On the other hand, there isan upper cutoff in size (in terms of degree, numberof routers, or number of locations) beyond whichall ASes are maximally dispersed geographically. Inexamining the AS-crossing properties of links, wefind that intradomain links constitute the majority oflinks in our dataset (generally over 80%) and thatthey are on average only half as long as interdomainlinks.

We conclude in Section VII with a review ofour findings and a look to the future, including theimplications of our work for representative topologygeneration.

Page 3: On the Geographic Location of Internet Resources - Computer Science

3

II. RELATED WORK

Early work in generating test topologies fo-cused on simple and natural methods for produc-ing interconnections between a set of nodes onthe plane. The widely studied Erdos-Renyi randomgraph model [11] includes each possible connectionwith a fixed probabilityp, but typically yields agraph which is not connected whenp is chosenso that the resulting graph is sparse. Waxman [41]created topologies in which the probability thata connection between a pair of nodes is madedecays exponentially as the distance between thenodes increases, emphasizingspatialconsiderationsin topology generation. Structural models such asTiers and GT-ITM [9], [44] chose a different tack,building an explicithierarchy into their topologies.

Following the discovery of then-unexplainedpower laws in Internet topologies of Faloutsoset al:[13], subsequent methods, notably the Barabasi-Albert model [1], and topology generators such asInet [21] and generation models in BRITE[27], mea-sured success primarily in terms of graph connec-tivity properties, such as node degree distributions.An active debate about the merits and limitationsof these approaches is ongoing [21], [24], [7], [4];the jury is still out on which models are best andstudies have shown varying conclusions dependingon the generators used [31].

Our goal is not to propose a new topology gen-eration method in this paper, but to suggest a widerset of bases for the construction of topology gen-eration tools. To this end, we study the geographiclocation of Internet links, routers and ASs. CAIDA’sNetGeo [14] is a database that contains mappingsfrom IP addresses, domain names and AS numbersto latitude/longitude values. NetGeo’s database isbuilt using whois lookups to the ARIN, RIPE,and APNIC servers. Ixia’s IxMapper [20] database,extends NetGeo by using other data sources andheuristics, including geographically-based hostnameconventions. Padmanabhan and Subramanian [30]show that this hostname based mapping is accurateup to the granularity of a city. Another mappingtool is Akamai’s EdgeScape [10] which uses ge-ographical information gathered from ISPs alongwith hostname conventions to resolve IPs to theirgeographical locations. Besides Ixia and Akamai,other commercial providers include Matrix NetSys-tems [26].

To our knowledge, the only other work whichmeasures and models geographic location of Inter-net resources is recent work of Yook, Jeong andBarabasi [43]. That paper demonstrated the similarfractal dimension (� 1.5) of routers, ASes, andpopulation density; our work, not shown in thispaper, confirms this result for our datasets as well(via the box-counting method [25], [12]). However,our goals differ with respect to links and distance:while [43] studied the distribution of link lengths,we are concerned with the likelihood that two nodesare directly connected as a function of the distancebetween them. A preliminary version of our workappears as [23].

III. M ETHODOLOGY

We use router level topology snapshots fromtwo sources, collected by different methods andabout two years apart. For each router interface IPaddress in the datasets, we obtained a geographicalcoordinate and the AS that originated that address.

A. Datasets

Our first topology dataset is a large collectionof ICMP forward path (traceroute) probes. Thisdata was collected by Skitter, a measurement toolrun on more than 20 monitors around the worldby CAIDA [15]. Skitter sends hop-limited probesto a list of destination nodes located worldwide.Intermediate routers which respond to packets withexpired TTL values transmit an ICMP messageback to the source. Contained within this packetis the IP address of aninterface on the router;thus a successful Skitter probe reports a sequenceof interfaces along contiguous routers on the pathfrom the source to the destination. In this study,we treat interfaces as virtual nodes, and define alink to mean a connection between two adjacentinterfaces. The destination lists are created with theaim to cover all blocks of 256 addresses (/24s)in the IPv4 space [3]. Destinations are selectedby several methods, among which are: results ofsearches for several hundred thousand geographicnames and popular science articles from the topfive search engines, Squid web cache logs [42],CAIDA’s IP geography server [14], and UCSD webserver and traffic logs. Our particular dataset wasgathered between December 26, 2001 and January1, 2002 and is the union of traceroute paths from 19

Page 4: On the Geographic Location of Internet Resources - Computer Science

4

monitors, each probing a destination list of varyingsize. This dataset contains 704,107 router interfacesand 1,075,454 incident links. To our knowledge, thisis one of the largest and most recent router leveldataset studied to date. On this dataset, we followedthe methods of [3] and discarded anomalies suchas self-loops. We further discarded all interfacesappearing in the destination lists (18%). This wasmotivated by the fact that many destinations in theselists are end-hosts and we are interested only inrouters.

Our second dataset is another router level topol-ogy snapshot collected during August 1999 bythe Scan Project’s [36] Mercator tool. Mercatoralso uses hop-limited probes to discover and maprouters and links. Unlike Skitter, Mercator is runfrom a single host to a heuristically determineddestination address space [16]. Further, Mercatoremploys loose source routing to discover lateralconnectivity and therefore limits the tree-like prop-erties commonly found in single-source router map-ping efforts such as [33]. Our Mercator dataset isconsiderably smaller than our Skitter dataset, at268,382 router interfaces and 320,149 links. Mer-cator employs published techniques [32] to collapseinterface IP addresses belonging to the same routerto a canonical IP address for that router. After thisdisambiguation process, we are left with 228,263routers and 320,149 links.

An important distinction between maps generatedby Mercator and Skitter is that the former generatesa map of routers, while the latter generates mapsof interfaces. Routers often have multiple interfaces,thus maps that are unable to resolve which interfacesare present on which routers are prone to inac-curacies described elsewhere in the literature [2].The primary method to resolve interfaces [32] is tosend UDP probe packets to unknown ports for everyinterface in the dataset. When two interfaces are onthe same router, the router will respond with twoICMP Port unreachable messages, both of whichhave the same source IP address. Unfortunately, thistechnique suffers from numerous limitations, espe-cially because probe packets now frequently triggernetwork intrusion detection systems, and routersmay not respond correctly to the probes. Because ofthese reasons, we were not able to perform interfacedisambiguation on the Skitter datasets. Despite thisdifference, our conclusions seem robust whetherexpressed in terms of routers or interfaces. But to

Dataset No. of No. of No. ofNodes Links Locations

IxMapper, Mercator 214,498 258,999 7,696IxMapper, Skitter 563,521 862,933 12,610EdgeScape, Mercator 216,116 269,484 7,076EdgeScape, Skitter 570,761 881,618 13,767

TABLE I

SIZES OFPROCESSEDDATASETS

emphasize this difference, we will always keep theterms “router” and “interface” distinct in this paper.

B. Geographical Mapping

We draw on two different state-of-the-art ge-ographic mapping tools to identify IP addresseswith their geographical longitude and latitude: Ixia’sIxMapper [20] and Akamai’sEdgeScape[10].

IxMapper extends NetGeo’s [28] methods forlocation mapping by using several data sources anda library of heuristics to infer the geographicallocation of an IP address. The primary techniqueemployed by IxMapper is hostname based mapping.This technique exploits the fact that ISPs usually ad-here to a strict naming convention for each of theirrouters in which some sense of geographical loca-tion (such as city name or airport codes) is specified.For instance, 0.so-5-2-0.XL1.NYC8.ALTER.NETmaps to New York City. IxMapper also uses othertechniques, parsing whois records [17] and DNSLOC records [8]. The whois lookup method is gen-erally accurate for small organizations but may failin cases where geographically dispersed hosts aremapped to an organization’s registered headquarters.DNS LOC records, while accurate, are not requiredand are therefore not always available. IxMapperalways tries to use hostname based mapping, de-faulting to DNS LOC records if available and finallyto whois records.

Akamai’s EdgeScape service supplements host-name based mapping techniques with internal ISPgeographical information. Akamai’s many relation-ships with networks coupled with its extensiveserver deployment give it access to such informa-tion.

Our principle results are consistent across bothmapping tools. However, due to space limitationsand to avoid confusion, we only present results ob-tained from IxMapper in the next sections. Resultsfrom EdgeScape are provided in the Appendix.

Page 5: On the Geographic Location of Internet Resources - Computer Science

5

(a) US (b) Europe (c) Japan

Fig. 1. Regions Studied (Not to Same Scale).

Name North South West EastUS 50˚ N 25˚ N 150˚ W 45˚ WEurope 58˚ N 42˚ N 5˚ W 22˚ EJapan 60˚ N 30˚ N 130˚ E 150˚ E

TABLE II

BOUNDARIES OFREGIONSSTUDIED

Our results of geographic mapping therouter/interfaces from both datasets are encouraging.After discarding private addresses originating frommisconfigured routers, only 1% of Mercator’srouters and 1.5% of Skitter’s interfaces couldnot be located by IxMapper. Similarly, only0.6% of Mercator’s routers and 0.3% of Skitter’sinterfaces could not be identified by EdgeScape.All unmapped interfaces were discarded. For theMercator dataset, we determined the locationof a router by the location most commonlyreported across all its interfaces. We discardedrouters with ties for the most commonly reportedinterface location (2.9% for IxMapper and 2.5% forEdgeScape). Table I summarizes the final numberof geographically mapped interfaces/routers andlinks for both datasets.

For reasons described in the next section, themajority of our results are based on analysis of threeregions, delineated by the latitude and longitudelines shown in Table II. These regions, along withthe results of our IxMapper mapping for the Skitterdataset, are shown in Figure 1.

C. Mapping to Autonomous Systems

We next label all nodes in both datasets with theirparent AS. This was done by identifying the longestadvertised prefix in a BGP table that matches theIP address and recording the AS which originatedthat prefix. While there are several publicly available

sources of raw and processed BGP data [34], [19],[35], we used the RouteViews data from the Uni-versity of Oregon’s Advanced Network TechnologyCenter which has been the most comprehensivepublic source since 1997. RouteViews data is theunion of many BGP backbone tables contributedby several dozen participating ASes. We used BGPtables from August 10, 1999 and January 1, 2002 tomap the routers in the Mercator and Skitter datasets,respectively. For the Mercator dataset, we againdetermined the parent AS of a router by choosingthe AS most commonly reported by its interfaces.A small fraction of IP addresses (2.8% for Mercatorand 1.5% for Skitter) were not mapped. We groupedthese into a separate AS, which was omitted in ouranalysis of Autonomous Systems (Section VI).

D. Sources of Error and Validation

While we have taken considerable effort to ensurethe validity of our results that we present in thefollowing sections, there are inevitable sources oferror given the current state of the art in constructingInternet maps and in identifying geographic loca-tions of routers. Foremost among these sources oferror are that our router-level datasets are known tobe incomplete and subject to measurement biases,and that the accuracy of our geographic mappingtools has not been formally quantified. For thesereasons, we confirmed our results across two dif-ferent router inventory collection methods and twodifferent geographic mapping tools, reflecting thebest available snapshots of the router topology andthe best mapping methods in use today. While theresults we present hold for all combinations of thesepairings, we cannot rule out the possibility that ourfindings might differ if we had access to completelyaccurate maps.

We now briefly survey some experiments weconducted in assessing the accuracy of the geo-

Page 6: On the Geographic Location of Internet Resources - Computer Science

6

Population Interfaces People Per Online Online per(Millions) Interface (Millions) Interface

Africa 837 8,379 100,011 4.15 495South America 341 10,131 33,752 21.9 2,161Mexico 154 4,361 35,534 3.42 784W. Europe 366 95,993 3,817 143 1,489Japan 136 37,649 3,631 47.1 1,250Australia 18 18,277 975 10.1 552USA 299 282,048 1,061 166 588World 5,653 563,521 10,032 513 910

TABLE III

VARIATION IN PEOPLE/INTERFACEDENSITY ACROSSREGIONS

graphic mapping tools. The accuracy of these toolsis difficult to verify without a priori knowledge ofIP locations; moreover, a systematic validation ofthese tools is a significant undertaking in and ofitself, and one that leads us away from the centralquestions we seek to answer in this paper. Weinstead examine where and how often both toolsdiffer on our datasets. No clear pattern of biasemerged when we manually inspected outliers, i.e.those IP addresses for which the mapping toolsstrongly disagreed. Instead of trying to deduce and“model” where a particular tool fails, we consideredan alternative methodology to assess the impact ofmapping inaccuracies. We discarded all IPs thatwere mapped more than100 miles apart, and reranall of our experiments on this reduced graph, wherewe have high confidence in the accuracy of the map-ping. We found that our principle results do indeedhold on these reduced versions of the datasets, afact that supports our comparable findings on thecomplete datasets (which we now present).

IV. ROUTERS ANDPOPULATION

It is natural to assume that demand for Internetservices is greater in areas of higher population.All of the drivers for Internet service would seemto have a connection to population:e.g., end-userdemand, content availability, and switching capacity.What is less obvious is what precise relationshipwe should expect between population density anddensity of network infrastructure. In this section weexplore that relationship quantitatively; the resultsthen form a foundation for subsequent sections.

A. Variation Across Economic Regions

While a relationship between population and net-work infrastructure density is natural, it is also

obvious that this relationship is not the same inall parts of the world. We explore the variation indegree of Internet development in Table III. Thistable shows various regions of the world, includingboth less developed regions and highly developedregions.2 The Interfacescolumn shows the numberof interfaces from our Skitter dataset that weremapped into this region.Population numbers arefrom Columbia University’s CIESIN database [5],and the number ofOnline Usersper region is fromthe extensive repository of survey statistics gatheredand maintained by Nua, Inc [29].3

Looking at the first three columns of the table,it is clear that penetration of Internet infrastructurevaries dramatically across regions; the ratio of peo-ple to interfaces varies by a factor of over 100 fromless developed to highly developed regions. Thismakes it clear that studying population vs. interfacedensity over the entire world will be misleading.On the other hand, the last two columns providea different perspective: the ratio of online peopleto interfaces showsmuch less variability — onlyabout a factor of four across the regions studied.This is encouraging for two reasons: first, it suggeststhat the number of online users in an area mayprovide a rough indicator of the amount of networkinfrastructure present; and second, it suggests thatour datasets are not excessively biased in favor ofany particular geographical area. We note that wefound the same ranges of variation in our Mercatordataset — a variation of a factor over 100 in peopleper router, and only about a factor of 4 in online

2Regions in this table, and throughout this paper, are delineatedby simple latitude/longitude boundaries. The names assigned to suchregions are only approximate, since we are not working with precisepolitical boundaries.

3According to Nua, an online user is defined as an adult or childwho has accessed the Internet at least once during the last 3 months,although not all of their data is strictly based on this definition.

Page 7: On the Geographic Location of Internet Resources - Computer Science

7

0

1

2

3

4

5

3 3.5 4 4.5 5 5.5 6 6.5 7 7.5

log1

0(R

oute

r C

ount

)

log10(Population Count)

y = 1.20x-4.82

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

4.5 5 5.5 6 6.5 7 7.5

log(

10)

Rou

ter

Cou

nt

log(10) Population Count

y = 1.56x-8.18

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5 5.5 6 6.5 7 7.5 8

log1

0(R

oute

r C

ount

)

log10(Population Count)

y = 1.75x-9.55

0

1

2

3

4

5

3 3.5 4 4.5 5 5.5 6 6.5 7 7.5

log1

0(In

terf

ace

Cou

nt)

log10(Population Count)

y = 1.26x-4.70

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

4.5 5 5.5 6 6.5 7 7.5

log1

0(In

terf

ace

Cou

nt)

log10(Population Count)

y = 1.60x-7.77

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5 5.5 6 6.5 7 7.5 8

log1

0(In

terf

ace

Cou

nt)

log10(Population Count)

y = 1.71x-8.86

(a) US (b) Europe (c) Japan

Fig. 2. Router/Interface Density vs. Population Density: Upper, Mercator (Routers); Lower, Skitter (Interfaces). Corresponding results usingEdgeScape can be found in the Appendix (Figure 11).

people per router.

Fig. 3. Regions Used to Test for Homogeneity

Population Interfaces People Per(Millions) Interface

Northern US 168 182,846 991Southern US 132 101,102 1305Central Am. 154 4,361 35,533

TABLE IV

TESTING FORHOMOGENEITY

Thus it is important to restrict our study toregions that are roughly homogeneous in terms ofdevelopment of Internet infrastructure. Using simpletests we can easily verify whether a region meetsthis criterion. For example, consider the case of thecontinental US. We can test its homogeneity by di-viding it into two subregions, as shown in Figure 3.We also include a portion of Central America as athird region for comparison. The statistics for these

three regions are shown in Table IV. It is clear thatthe two subregions of the US are quite similar indeployment of network infrastructure, and that theCentral American region is dramatically different.

B. Infrastructure vs. People in Homogeneous Re-gions

Focusing on the economically homogeneous re-gions shown in Figure 1 and delineated in Table IIallows us to ask how router density relates topopulation density. To answer this question, wesubdivided each region into patches of size 75 arc-minutes� 75 arc-minutes. At the latitudes studied,this creates patches about 90 miles on a side. Thissize is much larger than the median location errorreported by Padmanabhan and Subramanian [30]for their toolset, which employs techniques similarto (and a subset of) those used by IxMapper andEdgeScape. Within each patch, we tally the popu-lation and the number of routers or interfaces.

The results are plotted on log-log scale in Fig-ure 2, for the two datasets and three regions.Each plot includes a least-squares fitted line forcomparison purposes. The figure shows that withineach region, the plots for routers and interfaces arequalitatively quite similar, as are the properties ofthe fitted lines. This similarity is striking given the

Page 8: On the Geographic Location of Internet Resources - Computer Science

8

0

0.0002

0.0004

0.0006

0.0008

0.001

0.0012

0.0014

0.0016

0.0018

0 500 1000 1500 2000 2500 3000

f(d)

est

imat

e

d (miles)

0

0.0002

0.0004

0.0006

0.0008

0.001

0.0012

0.0014

0 100 200 300 400 500 600 700 800

f(d)

est

imat

e

d (miles)

0

5e-05

0.0001

0.00015

0.0002

0.00025

0.0003

0.00035

0 100 200 300 400 500 600

f(d)

est

imat

e

d (miles)

0

2e-05

4e-05

6e-05

8e-05

0.0001

0.00012

0.00014

0.00016

0.00018

0 500 1000 1500 2000 2500 3000

f(d)

est

imat

e

d (miles)

05e-05

0.00010.000150.0002

0.000250.0003

0.000350.0004

0.000450.0005

0 100 200 300 400 500 600 700 800

f(d)

est

imat

e

d (miles)

0

2e-05

4e-05

6e-05

8e-05

0.0001

0.00012

0.00014

0.00016

0 100 200 300 400 500 600

f(d)

est

imat

e

d (miles)

(a) US, bin size = 35 mi. (b) Europe, bin size = 15 mi. (c) Japan, bin size = 11 mi.

Fig. 4. Empirical Distance Preference Function: Upper, Mercator; Lower, Skitter. Corresponding results using EdgeScape can be found inthe Appendix (Figure 12).

considerable time difference of collection betweenthe two datasets, and the very different collectionmethods.

All the plots show a strong relationship be-tween infrastructure and population density. Al-though these plots appear roughly linear on theselog-log axes, the precise functional relationshipbetween population density and router density isdifficult to identify from these data because of thesignificant amount of noise, and the relatively lim-ited range of scales available. For example, it wouldbe hard to distinguish an log n relationship from apower law relationship for the data in Figure 2(a).

Nonetheless, we conclude that in each plot,router/interface density clearly bears asuperlinearrelationship to population density (slope of thefitted line is larger than 1). This surprising resultindicates that the number of routers or interfaces perperson ishigher in areas of high population density(population centers).

Furthermore, it seems reasonable to use a simplepower law relationship as an approximation for thetrends seen in these plots; that is, over the limitedrange of data studied, we can approximately modelrouter or interface densityR and population densityP as related by

R � P�

with � varying from 1.2 to 1.7 across the regionsstudied, based on the slopes of the fitted lines.

This result may be interpreted as a consequenceof simple scaling effects: as the number of networkusersn in a region grows, the number of potentialconnections between pairs of users grows via ann2

law. If the capacity of individual switches does notscale accordingly, then in order to provide accept-able service it becomes necessary to add switchesin a superlinear fashion. Thus,e.g.,multistage inter-connection networks for multiprocessor computersare often designed to scale inn log n fashion [22],[18].

V. L INKS AND DISTANCE

Given an understanding of how routers are dis-tributed over the Earth’s surface, we next proceedto examine the geographical properties of node-node links. As described in Section II, early workin topology generation used a distance-sensitivefunction for link creation, while later work has fo-cused on different features, such as overall networkstructure and node degree distribution.

Our data provides an opportunity to examine thesensitivity of router connections to distance. To doso we proceed as follows: we measure the empiricalprobability that two routers separated by great-circledistanced, are directly connected.

Page 9: On the Geographic Location of Internet Resources - Computer Science

9

-7.5

-7

-6.5

-6

-5.5

-5

-4.5

-4

0 50 100 150 200 250

ln(f

(d)

estim

ate)

d (miles)

y = -0.00691x - 5.11

-14

-13

-12

-11

-10

-9

-8

-7

-6

0 50 100 150 200 250 300

ln(f

(d)

estim

ate)

d (miles)

y = -0.0128x - 7.51

-11

-10.5

-10

-9.5

-9

-8.5

-8

-7.5

-7

0 20 40 60 80 100120140160180200

ln(f

(d)

estim

ate)

d (miles)

y = -0.00689x - 8.62

-9

-8.5

-8

-7.5

-7

-6.5

-6

-5.5

-5

0 50 100 150 200 250

ln(f

(d)

estim

ate)

d (miles)

y = -0.00705x - 6.61

-14

-13

-12

-11

-10

-9

-8

-7

0 50 100 150 200 250 300

ln(f

(d)

estim

ate)

d (miles)

y = -0.0123x - 8.40

-13

-12

-11

-10

-9

-8

0 20 40 60 80 100120140160180200

ln(f

(d)

estim

ate)

d (miles)

y = -0.00882x - 9.84

(a) US (b) Europe (c) Japan

Fig. 5. Empirical Distance Preference Function, Smalld, Semi-Log: Upper, Mercator; Lower, Skitter. Corresponding results from EdgeScapecan be found in the Appendix (Figure 13).

Note that this is not the same as assuming thatlink creation is in fact dependent on distance; moredetailed data would be needed to verify that claim.However, evidence of distance-sensitivity in routerconnectivity is suggestive of factors influencing linkcreation, and provides an important characteristic tobe taken into account in constructing and validatingtopology generators.

For any pair of routers separated by distancedlet C be the event that the two routers are directlyconnected. Then we are interested in estimating thelikelihood function:

f(d) = P [Cjd]:

We call this thedistance preference function.Weestimate this function by placing the data into binsof size b. Then we form the empirical distancepreference function as:

f(d) =# links with length in[d; d+ b)

# node pairs with distance in[d; d + b)(1)

for values ofd that are multiples ofb.The resulting estimates for our three regions are

shown in Figure 4. The maximum value ofd varieswith size of the region considered; in each case weuse 100 bins (the bin sizes are noted on the figure).Note that for large distances the number of links and

router pairs grows small, making the estimate basedon (1) noisy, so we omit the very largest distancesfrom these plots.

Broadly speaking, these plots appear to showtwo regimes: for short distances,f(d) declines withdistance; while for longer distances,f(d) seemsnearly constant. To explore this relationship further,we break the data up into two regions, “smalld”and “larged”, and plot the two regions separately.We motivate how to choose the cut-off point mo-mentarily.

Focusing first on smalld, we plot ln(f(d)) vs.d. These plots are shown in Figure 5. Surprisingly,these plots show a linear tendency on the semi-log axes, suggestive of an exponentially decliningfunction.4 In fact, these fits can be characterized interms of Waxman’s method for topology generation[41]. In the Waxman model, the probability that twonodes are connectedfW (d) is:

fW (d) = � exp(�d=�L)

whereL is the maximum distance between nodes,0 < � � 1 is the sensitivity of link formation todistance, and0 < � � 1 controls link density.

4Note again that the much smaller number of routers and linksfor the Japan region means that the method results in more noisyestimates.

Page 10: On the Geographic Location of Internet Resources - Computer Science

10

0

0.002

0.004

0.006

0.008

0 500 1000 1500 2000 2500 3000

F(d

)

d (miles)

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0 100 200 300 400 500 600 700 800

F(d

)

d (miles)

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0 100 200 300 400 500 600

F(d

)

d (miles)

0

0.002

0.004

0.006

0.008

0.01

0 500 1000 1500 2000 2500 3000

F(d

)

d (miles)

0

0.002

0.004

0.006

0.008

0.01

0.012

0.014

0.016

0.018

0 100 200 300 400 500 600 700 800

F(d

)

d (miles)

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0 100 200 300 400 500 600

F(d

)

d (miles)

(a) US (b) Europe (c) Japan

Fig. 6. Cumulated Empirical Distance Preference Function, Larged: Upper, Mercator; Lower, Skitter. Corresponding results obtained fromEdgeScape can be found in the Appendix (Figure 14).

In terms of the Waxman model, we find estimatesof �L � 140 miles for the US and Japan, and 80miles for Europe. This is not to suggest that theWaxman model is a correct model for the growthof the Internet over these distance ranges, but ratherthat it is surprisingly descriptive of the end result.5

In the other region (larged), the functionf(d)appears nearly constant,i.e., insensitive to distance.Because of the noise in the data, we study thecumulated distance preference function,

F (d) =X

d0<d

f(d0):

Summing the data smooths out noise, and if theoriginal functionf(d) is constant, then the cumu-lated functionF (d) will be linear.

The results are shown in Figure 6. In each plot, afitted least square line is also shown for comparison.Again, for large distances the number of links androuter pairs grows small, so the trailing off of thepoints from the fitted line may not be significant. Allof these plots but one (Mercator, Europe) show goodagreement with the linear fit line, suggesting that theprobability two routers are directly connected forlarged is independent of their separation distance.

5It is important to note that the Waxman topology generator placespoints randomly in the plane, which is very far from the case for ourdata.

Mercator Skitter% Links % Links

Limit < Limit Limit < LimitUSA 820 mi. 82.1% 818 mi. 77.2%Europe 383 mi. 97.3% 366 mi. 95.4%Japan 165 mi. 91.5% 116 mi. 92.8%

TABLE V

LIMITS OF DISTANCE SENSITIVITY

By setting the exponential functional fits in Fig-ure 5 equal to the averagef(d) value for larged,we obtain a value for each plot that approximatelydemarcates the limit of the distance-sensitive por-tion of the empirical preference function. Roughlyspeaking, links between router pairs that are furtherapart than this limit can be considered distance-independent, while links with length less thanthe limit are consistent with a distance-dependentmodel.

The limit values are shown in Table V. The tablealso shows the fraction of links whose length is lessthan the limit in each case. The table shows thatvalues across datasets are strikingly consistent, butacross regions are not. The variation across regionsis a consequence of the differences in overall densityof links and different distance sensitivity parameters(�L) in each region.

Even more notable is the fraction of links in each

Page 11: On the Geographic Location of Internet Resources - Computer Science

11

-4

-3.5

-3

-2.5

-2

-1.5

-1

-0.5

0

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5

log1

0(P

[X>

x])

log10(Number of Interfaces)

-4

-3.5

-3

-2.5

-2

-1.5

-1

-0.5

0

0 0.5 1 1.5 2 2.5 3 3.5

log1

0(P

[X>

x])

log10(Number of Locations)

-4

-3.5

-3

-2.5

-2

-1.5

-1

-0.5

0

0 0.5 1 1.5 2 2.5 3 3.5

log1

0(P

[X>

x])

log10(AS Degree)

(a) No. of Interfaces (b) No. of Locations (c) AS degree

Fig. 7. Distributions of AS Sizes (World). Corresponding results using EdgeScape can be found in the Appendix (Figure 15).

0

0.5

1

1.5

2

2.5

3

3.5

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5

log1

0(N

umbe

r of

Loc

atio

ns)

log10(Number of Interfaces)

0

0.5

1

1.5

2

2.5

3

3.5

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5

log1

0(A

S D

egre

e)

log10(Number of Interfaces)

0

0.5

1

1.5

2

2.5

3

3.5

0 0.5 1 1.5 2 2.5 3 3.5

log1

0(A

S D

egre

e)

log10(Number of Locations)

(a) No. Interfaces and No. Locations (b) No. of Interfaces and Degree (c) No. Locations and Degree

Fig. 8. Scatterplots of AS Size Measures (World). Corresponding results using EdgeScape can be found in the Appendix (Figure 16).

dataset with length less than the sensitivity limit.Most links (from 75% to 95%) fall within the rangeof link lengths considered distance-sensitive. Weconclude that distance sensitivity of router connec-tivity applies to the vast majority of router-routerlinks in our datasets.

On the other hand, we note that although asmall fraction of routers are connected in a mannerinsensitive to distance, they are clearly not randomlyconnected, and their connections doubtless play animportant structural role. In fact, work in [40] hasshown that only a very small fraction of non-locallinks is needed to dramatically reduce the averagediameter of an otherwise locally-connected graph.

VI. A UTONOMOUSSYSTEMS

Having developed an understanding of how prop-erties of routers and links relate to geography, wenow turn to the properties of autonomous systems.A significant, unsolved problem common to allcurrent topology generators is their inability to labelrouters with autonomous system information in arepresentative way. This prevents topology genera-tors from being able to assign IP addresses automat-ically to routers for simulating interdomain routing.

In this section we study two geographic propertiesof ASes that can help guide the assignment ofrouters to ASes: the number of distinct locationsspanned by an AS, and the geographical dispersionof an AS’s components.

Due to space limitations, this section uses datafrom Skitter (with IxMapper) only, but our resultsin this section are consistent across the two datasetsand both mapping methods.

A. AS Size: Number of Locations

Previous work has documented the distribution ofAS sizes measured in degree in the AS-graph [13]and measured in the number of routers within theAS [38]. In both cases, the observed distributionis highly variable, with a long tail spanning manyorders of magnitude. In this section we show that asimilar property holds for AS size when measuredas the number of distinct locationsin which aninterface for the AS is present.

In Figure 7 we show log-log complementarydistribution plots of three measures of AS size in ourskitter data: (a) the number of interfaces containedin an AS; (b) the number of distinct geographic

Page 12: On the Geographic Location of Internet Resources - Computer Science

12

0.65

0.7

0.75

0.8

0.85

0.9

0.95

1

0 4e+07 8e+07 1.2e+08 1.6e+08

P(X

<=

x)

Area of AS Convex Hull (sq. mi.)

0.7

0.75

0.8

0.85

0.9

0.95

1

0 1e+06 2e+06 3e+06 4e+06 5e+06

P(X

<=

x)

Area of AS Convex Hull (sq. mi.)

0.7

0.75

0.8

0.85

0.9

0.95

1

0 200000 400000 600000 800000 1e+06

P(X

<=

x)

Area of Convex Hull (sq. mi.)

(a) World (b) US (c) Europe

Fig. 9. CDFs of AS Convex Hull Size

0

1

2

3

4

5

6

7

8

9

0 0.5 1 1.5 2 2.5 3 3.5

log1

0(S

ize

of C

onve

x H

ull)

log10(Degree of AS)

0

1

2

3

4

5

6

7

8

9

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5

log1

0(A

rea

of C

onve

x-H

ull)

log10(Number of Router Interfaces)

0

1

2

3

4

5

6

7

8

9

0 0.5 1 1.5 2 2.5 3 3.5

log1

0(A

rea

of C

onve

x H

ull)

log10(Number of Locations)

(a) Degree vs. CH (b) No. Interfaces vs. CH (c) No. Locations vs. CH

Fig. 10. Scatterplots of Size Measures vs. Convex Hull (World). Corresponding results using EdgeScape can be found in the Appendix(Figure 17).

locations contained in an AS; and (c) the degreeof the AS in the AS-graph (the number of otherASes directly connected to an AS).

Figures 7(a) and (c) generally agree with previouswork suggesting that these AS size measures havelong-tail distributions. Figure 7(b) broadens thisunderstanding by showing that the same is true forthe number of distinct locations spanned by an AS.

In [38], the authors point out that the numberof routers in an AS and the degree of the AS arestrongly related. Our data shows that in thethree-way relationship among (1) number of interfaces,(2) number of locations, and (3) degree, each pairof measures shows correlation. This is shown inFigure 8, which shows scatter plots of (a) number ofinterfaces and number of locations; (b) number ofinterfaces and degree; (c) number of locations anddegree.

These plots show that the correlation betweennumber of locations and degree (Figure 8(c)) isas strong or stronger than the previously docu-mented correlation between the number of interfacesand degree (Figure 8(b)). The strongest correlation(tightest scatterplot) appears to be that between

number of interfaces and number of locations (Fig-ure 8(a)), suggesting that ASes with a large numberof interfaces (routers) tend to place those resourcesin many distinct locations.

Figure 8(a) also reveals that there is surprisinggeographic diversity in how ASes place routers. Forexample, it shows that some ASes with hundredsof interfaces have placed them in onlytwo loca-tions distinguishable by our methods (lowest lineof points in plot).

B. AS Size: Geographical Extent

The results in the last subsection suggest thattopology generators that label routers with AS num-bers should do so in a manner that creates manygeographically distinct locations for large ASes,while creating a more variable number of distinctlocations for medium and small ASes. However,it is not clear from this datawheresuch locationsshould be chosen relative to each other. To answerthis question we examine the geographical extentof ASes — the degree to which an AS’s routers aredispersed over the Earth’s surface.

Page 13: On the Geographic Location of Internet Resources - Computer Science

13

Inter Address Space Intra Address SpaceCount Mean Length Count Mean Length

World 146,936 1664 mi. 715,997 757 mi.US 77,367 762 mi. 354,593 421 mi.Europe 15,365 88.6 mi. 99,023 29.1 mi.Japan 3,651 181 mi. 44,701 54.5 mi.

TABLE VI

INTER ADDRESSSPACE VS. INTRA ADDRESSSPACE LINKS

To assess this property we measured the convexhull of each AS’s interface set. The standard defi-nition of convexity of a point set is not applicableon a manifold such as the globe, so we adopted thefollowing simple approach: we mapped each pointonto the plane using the Albers Equal Area projec-tion [37]. This conic projection does not preserveareas perfectly (no projection can) but since our goalin this section is primarily qualitative, this approachwas deemed sufficient. The globe is unfolded at thepoles and the International Date Line, thus yieldinga standard planar geometry in which convexity of aset is well defined.

Figure 9 shows CDFs of convex hull size for theWorld, and for portions of the map restricted to theUS and Europe regions. These plots show that thevast majority of ASes have no extent at all: around80% of ASes in each dataset have either one or twolocations (and thus zero area). However, among theremaining ASes, there is considerable variability ingeographical dispersion.

To understand what drives geographical disper-sion, we compare the size measures from the lastsection to the convex hull measure. The results areshown in Figure 10. These plots expose two distinctstrategies or regimes for AS interface placement,depending on AS size.

For smaller ASes, the minimum convex hull sizegrows with other size measures, but there is amaximum size that is often attained even for thesmallest ASes. That is, even small ASes (e.g.,thosewith only three or four locations, or two or threeconnections to other ASes) may be very widelydispersed geographically (in fact, worldwide).

For the largest ASes, there is no relationship be-tween other size measures and geographical extent;all these ASes are fully dispersed geographically.This can be seen for ASes with degree greater thanabout 100 (Figure 10(a)), with more than about 1000interfaces (Figure 10(b)), or with more than about

100 locations (Figure 10(c)).

C. Domains and Link Lengths

A final question bearing on the geographic ar-rangement of ASes regards the properties of AS-AS connections. To study this question we make thedistinction between links that cross ASes (interdo-main) and links that lie within an AS (intradomain).Previous work have labeled a link as interdomainif the routers it connects are assigned to differentASes, and intradomain otherwise [39], [3]. Aproblem with such an approach is that border routersare often assigned an IP address from a neighboringASs’ address space, as detailed in [6]. Therefore,we instead study links that cross address spaces andlinks that lie within the same address space. Theproperties of these links in the Skitter dataset areshown in Table VI. This table shows that about halfof all links in our dataset lie within the continentalUS. With respect to length, the table shows thatfor our dataset, inter address space links tend tobe about twice as long as intra links, and thatthe majority of links (83% or more) lie within anaddress space. Comparing this table with Table Vshows that the mean length of intra address spacelinks is well within the limits of distance sensitivity,while for the US and Japan, the mean lengths ofinter address space links approaches or exceeds thelimits of distance sensitivity.

VII. C ONCLUSIONS

In this paper we have described a wide rangeof geographical properties of the Internet, focusingon routers, links, and autonomous systems. We arespecifically motivated to develop results to guidethe development of geographically-driven topologygeneration methods.

We believe that geographically-based topologygeneration has some advantages over existing meth-ods. To be useful, network topologies must be

Page 14: On the Geographic Location of Internet Resources - Computer Science

14

labeled with link latencies; these can be approxi-mated in a straightforward manner when nodes havegeographical location. Second, network topologiesmust be labeled with link bandwidths; such labelingmight also be more tractable when starting froma geographical placement of nodes. Finally, routersneed autonomous system labels in order to assignIP addresses to them in a realistic manner,e.g.,to simulate inter-domain routing; we believe thatknowledge of geography provides leverage here aswell.

Given the evident promise of geographically-based topology generation, we have presented in thispaper a collection of results intended to bring thatgoal closer. First, we have described the quantitativerelationship between population density, and thecorresponding density of routers (or router inter-faces). We have shown that, within economicallyhomogeneous regions, there is a systematic rela-tionship between these two densities that appearssuperlinear, so that router density per person ishigher in population centers.

Second, we have shown that the connection pat-terns between routers are strongly related to geo-graphical distance. For our data, between 75% and95% of all links are consistent with a distance-basedmodel for link formation.

Finally, we have shown that the number of dis-tinct locations spanned by an AS is strongly cor-related with two other measures of size: numberof interfaces, and degree in the AS graph. Amongsmall to medium ASs, these locations show widevariability in their geographic dispersal. However,all ASs exceeding a certain threshold in size aremaximally dispersed geographically.

Beyond and apart from these implications fortopology generation, we believe that understandingthe relationship between physical geography and theInternet’s resources is an important component ofInternet science(loosely, the study of laws and pat-terns in Internet structure); we anticipate that under-standing the relationship between network structureand geography will have broad applicability acrossdisciplines and for future problems.

Acknowledgements

This work benefitted significantly from the helpand comments of a number of people, and wethank them. CAIDA provided facilities and support

for part of this work. David Moore and k claffy(CAIDA) provided important guidance and advicein using IxMapper. Andre Broido (CAIDA) helpedus understand Skitter data and how to use it. BruceMaggs (CMU/Akamai) helped map our datasetsusing EdgeScape. Ramesh Govindan (ICSI) andHongsuda Tangmunarunkit (USC/ISI) provided theMercator data and advice on AS mapping. We ben-efitted from extensive discussions about the impor-tance of geographical location of Internet resourceswith Albert-Laszlo Barabasi and Hawoong Jeong(both at University of Notre Dame) which helpedshape our interest in this topic. In addition, we’vebenefitted from discussions on this work with AviFreedman (Akamai), Gregory Yetmen (CIESIN),and Larry Landweber (U. Wisconsin).

REFERENCES

[1] A.-L. Barabasi and R. Albert. Emergence of Scaling in RandomNetworks. Science, pages 509–512, October 1999.

[2] P. Barford, A. Bestavros, J. Byers, and M. Crovella. Onthe Marginal Utility of Network Topology Measurements. InProcedings of the SIGCOMM Internet Measurement Workshop(IMW ’01), November 2001.

[3] A. Broido and K. Claffy. Connectivity of IP Graphs. InProceedings of SPIE ITCom ’01, Scalability and Traffic Controlin IP Networks, August 2001.

[4] T. Bu and D. Towsley. On Distinguishing between InternetPower Law Topology Generators. InProceeedings of IEEEINFOCOM ’02, 2002.

[5] Center for International Earth Science Information Network(CIESIN), Columbia University. Gridded population of theworld. Available atwww.ciesin.org.

[6] H. Chang, S. Jamin, and W. Willinger. Inferring AS-Level Inter-net Topology From Router-Level Path Traces. InProceedings ofSPIE ITCom ’01, Scalability and Traffic Control in IP Networks,August 2001.

[7] Q. Chen, H. Chang, R. Govindan, S. Jamin, S. Shenker, andW. Willinger. The Origin of Power Laws in Internet TopologiesRevisited. InProceeedings of IEEE INFOCOM ’02, 2002.

[8] C. Davis, P. Vixie, T. Goodwin, and I. Dickinson. A Means forExpressing Location Information in the Domain Name System.RFC 1876, January 1996.

[9] M. Doar. A Better Model for Generating Test Networks. InProceedings of IEEE GLOBECOM, November 1996.

[10] EdgeScape. Athttp://www.akamai.com/.[11] P. Erd�os and A. Renyi. On the evolution of random graphs.

Publ. Math. Inst. Hung. Acad. Sci., 5:17–61, 1960. Discussionappears in B. Bollobas, Random Graphs, Academic Press, Inc.,1985.

[12] K. Falconer.Fractal Geometry. John Wiley & Sons, Ltd., 1990.[13] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On Power-Law

Relationships of the Internet Topology. InACM SIGCOMM,pages 251–62, Cambridge, MA, September 1999.

[14] Cooperative Association for Internet Data Analysis (CAIDA).Netgeo. Athttp://www.caida.org/tools/utilities/netgeo.

[15] Cooperative Association for Internet Data Analysis (CAIDA).The Skitter project. Athttp://www.caida.org/Tools/Skitter.

Page 15: On the Geographic Location of Internet Resources - Computer Science

15

[16] R. Govindan and H. Tangmunarunkit. Heuristics for InternetMap Discovery. InProceedings of IEEE INFOCOM ’00, March2000.

[17] K. Harrenstien, M. Stahl, and E. Feinler. NICNAME/WHOIS.RFC 954, October 1985.

[18] P. G. Harrison. Analytic models for multistage interconnectionnetworks. Journal of Parallel and Distributed Computing,12:357–369, 1991.

[19] Internet Routing Registry. Athttp://www.irr.net.[20] IxMapper. At http://www.ixiacom.com/products/.[21] C. Jin, Q. Chen, and S. Jamin. Inet: Internet Topology

Generator. Technical Report Research Report CSE-TR-433-00,University of Michigan at Ann Arbor, 2000.

[22] C. Kruskal and M. Snir. The performance of multistage inter-connection networks for multiprocessors.IEEE Transactionson Computers, 32(12):1091–1098, 1983.

[23] A. Lakhina, J. W. Byers, M. Crovella, and I. Matta. On theGeographic Location of Internet Resources (Short Abstract). InProceedings of the SIGCOMM Internet Measurement Workshop(IMW ’02), November 2002.

[24] D. Magoni and J. Pansiot. Analysis and Comparison ofInternet Topology Generators. In2nd International IFIP-TC6Networking Conference, 2002.

[25] B. Mandelbrot. The Fractal Geometry of Nature. W. H.Freedman and Co., New York, 1983.

[26] Matrix NetSystems. Athttp://www.matrixnetsystems.com.[27] A. Medina, A. Lakhina, I. Matta, and J. Byers. BRITE: An

Approach to Universal Topology Generation. InProceedingsof IEEE MASCOTS ’01, August 2001.

[28] D. Moore, R. Periakaruppan, J. Donohoe, and K. Claffy. Wherein the world is netgeo.caida.org?INET, July 2000.

[29] How many online? Available atwww.nua.com.[30] V. N. Padmanabhan and L. Subramanian. An Investigation

of Geographic Mapping Techniques for Internet Hosts. InProceedings of ACM/SIGCOMM ’01, August 2001.

[31] C.R. Palmer and J. G. Steffan. Generating Network TopologiesThat Obey Power Laws. InProceedings of Global InternetSymposium, Globecom 2000, November 2000.

[32] J. Pansiot and D. Grad. On Routes and Multicast Trees in theInternet.ACM Computer Communication Review, 28(1):41–50,January 1998.

[33] Internet Mapping Project. Athttp://www.cs.belllabs.com/who/-ches/map/, 1999.

[34] RIPE BGP Data. Athttp://www.ripe.net.[35] RouteViews Project at University of Oregon. At

http://www.antc.uoregon.edu/route-views/.[36] The SCAN project. Athttp://www.isi.edu/scan/, 1999.[37] J. P. Snyder.Flattening the Earth: Two Thousand Years of Map

Projection. University of Chicago Press, 1997.[38] H. Tangmunarunkit, J. Doyle, R. Govindan, S. Jamin,

S. Shenker, and W. Willinger. Does AS Size Determine Degreein AS Topology? InACM Computer Communication Review,October 2001.

[39] H. Tangmunarunkit, R. Govindan, S. Shenker, and D. Estrin.The Impact of Routing Policy on Internet Paths. InProceedingsof IEEE INFOCOM ’01, 2001.

[40] D. J. Watts and S. H. Strogatz. Collective Dynamics of ‘Small-World’ Networks. Nature, pages 440–442, June 1998.

[41] B. Waxman. Routing of Multipoint Connections.IEEE J.Select. Areas Commun., December 1988.

[42] D. Wessels. Squid web proxy cache. Athttp://www.squid-cache.org.

[43] S.-H. Yook, H. Jeong, and A.-L. Barabasi. Modeling theinternet’s large-scale topology.Proceedings of the NationalAcademy of Sciences, 99:13382–13386, 2002.

[44] E. W. Zegura, K.L. Calvert, and M. J. Donahoo. A Quanti-tative Comparison of Graph-based Models for Internetworks.IEEE/ACM Transactions on Networking, pages 770–783, De-cember 1997.

APPENDIX

RESULTS FROMEDGESCAPE MAPPING

The results presented in this Appendix wereobtained from Akamai’s EdgeScape mapping andare provided as a comparison point against plotspresented in the paper using IxMapper.

Page 16: On the Geographic Location of Internet Resources - Computer Science

16

0

1

2

3

4

5

3 3.5 4 4.5 5 5.5 6 6.5 7 7.5

log1

0(R

oute

r C

ount

)

log10(Population Count)

y=1.26x -5.35

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

4.5 5 5.5 6 6.5 7 7.5

log(

10)

Rou

ter

Cou

nt

log(10) Population Count

y=1.58x -8.27

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5 5.5 6 6.5 7 7.5 8

log(

10)

Rou

ter

Cou

nt

log(10) Population Count

y=1.15x -5.53

0

1

2

3

4

5

3 3.5 4 4.5 5 5.5 6 6.5 7 7.5

log(

10)

Rou

ter

Cou

nt

log(10) Population Count

y=1.27x -5.05

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

4.5 5 5.5 6 6.5 7 7.5

log(

10)

Rou

ter

Cou

nt

log(10) Population Count

y=1.66x -8.29

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5 5.5 6 6.5 7 7.5 8

log(

10)

Rou

ter

Cou

nt

log(10) Population Count

y=1.29x-6.23

(a) US (b) Europe (c) Japan

Fig. 11. Router/Interface Density vs. Population Density: Upper, Mercator (Routers); Lower, Skitter (Interfaces). Compare with Figure 2.

0

0.0005

0.001

0.0015

0.002

0.0025

0 500 1000 1500 2000 2500 3000

f(d)

est

imat

e

d (miles)

0

0.0002

0.0004

0.0006

0.0008

0.001

0.0012

0 100 200 300 400 500 600 700 800

f(d)

est

imat

e

d (miles)

0

5e-05

0.0001

0.00015

0.0002

0.00025

0 100 200 300 400 500 600

f(d)

est

imat

e

d (miles)

0

5e-05

0.0001

0.00015

0.0002

0.00025

0.0003

0 500 1000 1500 2000 2500 3000

f(d)

est

imat

e

d (miles)

0

0.0001

0.0002

0.0003

0.0004

0.0005

0.0006

0 100 200 300 400 500 600 700 800

f(d)

est

imat

e

d (miles)

0

2e-05

4e-05

6e-05

8e-05

0.0001

0.00012

0.00014

0 100 200 300 400 500 600

f(d)

est

imat

e

d (miles)

(a) US, bin size = 35 mi. (b) Europe, bin size = 15 mi. (c) Japan, bin size = 11 mi.

Fig. 12. Empirical Distance Preference Function: Upper, Mercator; Lower, Skitter. Compare with Figure 4.

Page 17: On the Geographic Location of Internet Resources - Computer Science

17

-7

-6

-5

-4

-3

-2

0 50 100 150 200 250

ln(f

(d)

estim

ate)

d (miles)

y=-0.008x -4.06

-13

-12

-11

-10

-9

-8

-7

-6

0 50 100 150 200 250 300

ln(f

(d)

estim

ate)

d (miles)

y=-0.01x -7.51

-11

-10.5

-10

-9.5

-9

-8.5

-8

-7.5

-7

0 20 40 60 80 100120140160180200

ln(f

(d)

estim

ate)

d (miles)

y=-0.003x-8.91

-9

-8.5

-8

-7.5

-7

-6.5

-6

0 50 100 150 200 250

ln(f

(d)

estim

ate)

d (miles)

y=-0.009x -6.490

-11

-10

-9

-8

-7

-6

-5

0 50 100 150 200 250 300

ln(f

(d)

estim

ate)

d (miles)

y=-0.013x-6.228

-12.5

-12

-11.5

-11

-10.5

-10

-9.5

-9

-8.5

-8

0 50 100 150 200

ln(f

(d)

estim

ate)

d (miles)

y=-0.003x-10.04

(a) US (b) Europe (c) Japan

Fig. 13. Empirical Distance Preference Function, Smalld, Semi-Log: Upper, Mercator; Lower, Skitter. Compare with Figure 5.

0

0.002

0.004

0.006

0.008

0.01

0 500 1000 1500 2000 2500 3000

F(d

)

d (miles)

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0 100 200 300 400 500 600 700 800

F(d

)

d (miles)

0

0.02

0.04

0.06

0.08

0.1

0 100 200 300 400 500 600

F(d

)

d (miles)

0

0.001

0.002

0.003

0.004

0.005

0 500 1000 1500 2000 2500 3000

F(d

)

d (miles)

0

0.002

0.004

0.006

0.008

0 100 200 300 400 500 600 700 800

F(d

)

d (miles)

0

0.01

0.02

0.03

0 100 200 300 400 500 600

F(d

)

d (miles)

(a) US (b) Europe (c) Japan

Fig. 14. Cumulated Empirical Distance Preference Function, Larged: Upper, Mercator; Lower, Skitter. Compare with Figure 6.

Page 18: On the Geographic Location of Internet Resources - Computer Science

18

-4

-3.5

-3

-2.5

-2

-1.5

-1

-0.5

0

0 0.5 1 1.5 2 2.5 3 3.5 4-4

-3.5

-3

-2.5

-2

-1.5

-1

-0.5

0

0 0.5 1 1.5 2 2.5-4

-3.5

-3

-2.5

-2

-1.5

-1

-0.5

0

0 0.5 1 1.5 2 2.5 3

-4

-3.5

-3

-2.5

-2

-1.5

-1

-0.5

0

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5-4

-3.5

-3

-2.5

-2

-1.5

-1

-0.5

0

0 0.5 1 1.5 2 2.5 3 3.5-4

-3.5

-3

-2.5

-2

-1.5

-1

-0.5

0

0 0.5 1 1.5 2 2.5 3 3.5

(a) No. of Interfaces (b) No. of Locations (c) AS degree

Fig. 15. Distributions of AS Sizes (World): Upper, Mercator; Lower, Skitter. Compare with Figure 7.

0

0.5

1

1.5

2

2.5

3

3.5

0 0.5 1 1.5 2 2.5 3 3.5 40

0.5

1

1.5

2

2.5

3

0 0.5 1 1.5 2 2.5 3 3.5 40

0.5

1

1.5

2

2.5

3

0 0.5 1 1.5 2 2.5 3 3.5

0

0.5

1

1.5

2

2.5

3

3.5

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

0.5

1

1.5

2

2.5

3

3.5

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

0.5

1

1.5

2

2.5

3

3.5

0 0.5 1 1.5 2 2.5 3 3.5

(a) No. Interfaces and No. Locations (b) No. of Interfaces and Degree (c) No. Locations and Degree

Fig. 16. Scatterplots of AS Size Measures (World): Upper, Mercator; Lower, Skitter. Compare with Figure 8.

Page 19: On the Geographic Location of Internet Resources - Computer Science

19

0

1

2

3

4

5

6

7

8

0 0.5 1 1.5 2 2.5 30

1

2

3

4

5

6

7

8

0 0.5 1 1.5 2 2.5 3 3.5 40

1

2

3

4

5

6

7

8

0 0.5 1 1.5 2 2.5 3 3.5

0

1

2

3

4

5

6

7

8

9

0 0.5 1 1.5 2 2.5 3 3.50

1

2

3

4

5

6

7

8

9

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

1

2

3

4

5

6

7

8

9

0 0.5 1 1.5 2 2.5 3 3.5

(a) Degree vs. CH (b) No. Interfaces vs. CH (c) No. Locations vs. CH

Fig. 17. Scatterplots of Size Measures vs. Convex Hull (World): Upper,Mercator; Lower, Skitter. Compare with Figure 10.