VOLUME 87, NUMBER 25 P H Y S I C A L R E V I E W L E T T E R S 17 DECEMBER 2001
258701-Dynamical and Correlation Properties of the Internet
Romualdo Pastor-Satorras,1 Alexei Vzquez,2 and Alessandro Vespignani31Departament de Fsica i Enginyeria Nuclear, Universitat Politcnica de Catalunya, Campus Nord, Mdul B4,
08034 Barcelona, Spain2International School for Advanced Studies SISSA/ISAS, via Beirut 4, 34014 Trieste, Italy
3The Abdus Salam International Centre for Theoretical Physics (ICTP), P.O. Box 586, 34100 Trieste, Italy(Received 8 May 2001; published 28 November 2001)
The description of the Internet topology is an important open problem, recently tackled with theintroduction of scale-free networks. We focus on the topological and dynamical properties of real Internetmaps in a three-year time interval. We study higher order correlation functions as well as the dynamicsof several quantities. We find that the Internet is characterized by nontrivial correlations among nodesand different dynamical regimes. We point out the importance of node hierarchy and aging in the Internetstructure and growth. Our results provide hints towards the realistic modeling of the Internet evolution.
DOI: 10.1103/PhysRevLett.87.258701 PACS numbers: 89.20.Hh, 05.70.Ln, 87.23.Ge, 89.75.HcComplex networks play an important role in the under-standing of many natural systems [1,2]. A network is aset of nodes and links, representing individuals and theinteractions among them, respectively. Despite this simpledefinition, growing networks can exhibit a high degree ofcomplexity, due to the inherent wiring entanglement occur-ring during their growth. The Internet is a capital exampleof growing network with technological and economicalrelevance; however, the recollection of router-level mapsof the Internet has received the attention of the researchcommunity only very recently . The statisticalanalysis performed so far has revealed that the Internet ex-hibits several nontrivial topological properties (wiringredundancy, clustering, etc.). Among them, the presenceof a power-law connectivity distribution [6,7] makes theInternet an example of the recently identified class ofscale-free networks .
In this Letter, we focus on the dynamical properties ofthe Internet. We shall consider the evolution of real In-ternet maps from 1997 to 2000, collected by the NationalLaboratory for Applied Network Research (NLANR) .In particular, we will inspect the correlation propertiesof nodes connectivity, as well as the time behavior ofseveral quantities related to the growth dynamics of newnodes. Our analysis shows dynamical behavior with dif-ferent growth regimes depending on the nodes age andconnectivity. The analysis points out two distinct wiringprocesses: the first one concerns newly added nodes, whilethe second is related to already existing nodes increasingtheir interconnections. A feature introduced in this pa-per refers to the Internet hierarchical structure, reflectedin a nontrivial scale-free connectivity correlation function.Finally, we discuss recent models for the generation ofscale-free networks in the light of the present analysis ofreal Internet maps. The results presented in this Lettercould help develop more accurate models of the Internet.
Several Internet mapping projects are currently de-voted to obtaining high-quality router-level maps of theInternet. In most cases, the map is constructed by using a1 0031-90070187(25)258701(4)$15.00hop-limited probe (such as the UNIX trace-route tool) froma single location in the network. In this case the result is adirected map as seen from a specific point on the Inter-net . This approach does not correspond to a completemap of the Internet because cross-links and other technicalproblems (such as multiple internet provider aliases) arenot considered. Heuristic methods to take into accountthese problems have been proposed . However, it is notclear if they are reliable and if the corresponding complete-ness of maps can be constructed in this way. A differentrepresentation of the Internet is obtained by mapping theautonomous systems (AS) topology. Each AS numberapproximately maps to an internet service provider (ISP)and its links are inter-ISP connections. In this case it ispossible to collect data from several probing stations toobtain complete interconnectivity maps [3,4]. In particular,the NLANR project has been collecting data since Novem-ber 1997, and it provides topological as well as dynamicalinformation on a consistent subset of the Internet. The firstNovember 1997 map contains 3180 AS, and it has grownin time until the December 1999 measurement, consistingof 6374 AS. In the following we will consider the graphwhose nodes represent AS and whose links represent theconnections between AS.
In dealing with the Internet as an evolving network, it isimportant to discern whether or not it has reached a station-ary state whose average properties are time independent.As a first step, we analyzed the behavior in time of sev-eral average quantities, such as the connectivity k, theclustering coefficient C, and the average minimum pathdistance d, of the network . The first two quanti-ties (see Table I) show a very slow tendency to increase intime, while the average minimum path distance slowly de-creases with time. A more clear-cut characterization of thetopological properties of the network is given by the con-nectivity distribution, Pk. In Fig. 1 we show the proba-bility Pk that a given node has k links to other nodes. Wereport the distribution of snapshots of the Internet at dif-ferent times. In all cases, we found a clear power-law 2001 The American Physical Society 258701-1
VOLUME 87, NUMBER 25 P H Y S I C A L R E V I E W L E T T E R S 17 DECEMBER 2001TABLE I. Average properties for three different years. k isthe average connectivity. d is the minimum path distance dijaveraged over every pair of nodes i, j. C is the clusteringcoefficient Ci averaged over all nodes i, where Ci is defined asthe ratio between the number of links between the neighbors ofi and its maximum possible value ki ki 2 12. The numbersin parentheses indicate the statistical uncertainty from averagingthe values of the corresponding months in each year.
Year 1997 1998 1999
k 3.47(4) 3.62(5) 3.82(6)C 0.18(1) 0.21(2) 0.24(1)d 3.77(1) 3.76(2) 3.72(1)
behavior Pk k2g with g 2.2 6 0.1. The distribu-tion cutoff is fixed by the maximum connectivity of thesystem and is related to the overall size of the Internetmap. On the other hand, the power-law exponent g seemsto be independent of time and in good agreement with pre-vious measurements . This evidence seems to point outthat the Internets topological properties have already set-tled into a rather well-defined stationary state.
Initially, the modeling of the Internet considered algo-rithms based on its static topological properties . How-ever, since the Internet is the natural outcome of a complexgrowth process, the understanding of the dynamical pro-cesses leading to its present structure must be considereda fundamental goal. From this perspective, the Barabsi-Albert (BA) model (Refs. [8,12]) can be considered as amajor step forward in the understanding of evolving net-works. Underlying the BA model is the preferential at-tachment rule ; i.e., new nodes will link with higherprobability to nodes with an already large connectivity.This feature is quantitatively accounted for by postulat-ing that the probability of a new link attaching to an old
FIG. 1. The cumulated connectivity distribution for the 1997,1998, and 1999 snapshots of the Internet. The power-law behav-ior is characterized by a slope 21.2, which yields a connectivityexponent g 2.2.258701-2node with connectivity ki, Pki, is linearly proportional toki , Pki ki. This is an intuitive feature of the Internetgrowth, where large provider hubs are more likely to estab-lish connections than smaller providers. The BA model hasbeen successively modified with the introduction of severalingredients in order to account for connectivity distributionwith 2 , g , 3 [13,14], local geographical factors ,wiring among existing nodes , and age effects .While all these models reproduce the scale-free behaviorof the connectivity distribution, it is interesting to inspectdeeper the Internets topology to eventually find a few dis-criminating features of the dynamical processes at the basisof the Internet growth.
A first step in a more detailed characterization of theInternet concerns the exploration of the connectivitycorrelations. This factor is best represented by the con-ditional probability Pck0 j k that a link belonging to anode with connectivity k points to a node with connec-tivity k0. If this conditional probability is independentof k, we are in the presence of a topology without anycorrelation among the nodes connectivity. In this case,Pck0 j k Pck0 k0Pk0, in view of the fact thatany link points to nodes with a probability proportionalto their connectivity. On the contrary, the explicit de-pendence on k is a signature of nontrivial correlationsamong the nodes connectivity, and the possible presenceof a hierarchical structure in the network topology. Adirect measurement of the Pck0 jk function is a rathercomplex task due to large statistical fluctuations. Moreclear indications can be extracted by studying the quan-tity knn
Pk 0 k
0Pck0 j k; i.e., the nearest neighborsaverage connectivity of nodes with connectivity k. InFig. 2, we show the results obtained for the Internetmap of 1998 which strikingly exhibit a clear power-lawdependence on the connectivity degree knn k2n ,with n 0.5. This result clearly implies the existenceof nontrivial correlation properties for the Internet. Theprimary known structural difference between Internetnodes is the distinction between stub and transit do-mains. Nodes in stub domains have links that go throughonly the domain itself. Stub domains, on the other hand,are connected via a gateway node to transit domainsthat, on the contrary, are fairly well interconnected viamany paths. In other words, there is a hierarchy imposedon nodes that is very likely at the basis of the abovecorrelation properties. As instructive examples, we reportin Fig. 2 the average nearest-neighbor connectivity forthe generalized BA model, with g 2.2 , and thefitness model described in Ref. , with g 2.25, fornetworks with the same size as the Internet snapshotconsidered. While in the first case we do not observe anynoticeable structure with respect to the connectivity k, inthe latter we obtain a power-law dependence similar tothe experimental findings. The general analytic study ofconnectivity correlations in growing network models canbe found in Ref. . A detailed discussion of different258701-2
VOLUME 87, NUMBER 25 P H Y S I C A L R E V I E W L E T T E R S 17 DECEMBER 2001FIG. 2. The average connectivity knn of the nearest neighborsof a node depending on its connectivity k for the 1998 snap-shot of the Internet, the generalized BA model with g 2.2(Ref. ), and the fitness model (Ref. ). The solid line hasa slope 20.5. The scattered results for very large k are due tostatistical fluctuations.
models is beyond the scope of this paper; however, it isworth noticing that a k structure in correlation functions,as probed by the quantity knn, does not arise in allgrowing network models.
In order to inspect the Internet dynamics, we focus ourattention on the addition of new nodes and links into themaps. In the three-year range considered, we keep track ofthe number of links new appearing between a newly intro-duced node and an already existing node. We also monitorthe rate of appearance of links old between already ex-isting nodes. In Table II we see that the creation of newlinks is governed by these two processes at the same time.Specifically, the largest contribution to the growth is givenby the appearance of links between already existing nodes.This clearly points out that the Internet growth is stronglydriven by the need for redundancy wiring and an increasedneed for an available bandwidth for data transmission.
A customarily measured quantity in the case of growingnetworks is the average connectivity kit of new nodesas a function of their age t. In Refs. [8,19] it is shownthat kit is a scaling function of both t and the absolutetime of birth of the node t0. We thus consider the totalnumber of nodes born within a small observation win-
TABLE II. Monthly rate of new links connecting existingnodes to new new and old old nodes.
Year 1997 1998 1999
new 183(9) 170(8) 231(11)old 546(35) 350(9) 450(29)newold 0.34(2) 0.48(2) 0.53(3)258701-3dow Dt0, such that t0 const with respect to the absolutetime scale that is the Internet lifetime. For these nodes,we measure the average connectivity as a function of thetime t elapsed since their birth. The data for two differenttime windows are reported in Fig. 3, where it is possibleto distinguish two different dynamical regimes: At earlytimes, the connectivity is nearly constant with a very slowincrease kit t0.1. Later, the behavior approachesa power-law growth kit t0.5. While exponent es-timates are affected by noise and limited time windoweffects, the crossover between two distinct dynamicalregimes is compatible with the general aging form obtainedin Ref. . In particular both the generalized BA model and the fitness model  present aging effects similarto those obtained in real data. A more detailed comparisonwould require quantitative knowledge of the parameters tobe used in the models and will be reported elsewhere.
A basic issue in the modeling of growing networksconcerns the preferential attachment hypothesis . Bygeneralizing the BA model algorithm it is possible to definemodels in which the rate Pk, with which a node withk links receives new nodes, is proportional to ka . Theinspection of the exact value of a in real networks is animportant issue since the connectivity properties stronglydepend on this exponent [14,20]. Here we use a simplerecipe that allows us to extract the value of a by studyingthe appearance of new links. We focus on links emanatingfrom newly appeared nodes in different time windowsranging from one to three years. We consider the frequencymk of links that connect to nodes with connectivity k.By using the preferential attachment hypothesis, this ef-fective probability is mk kaPk. Since we know
FIG. 3. The average connectivity of nodes borne within a smalltime window Dt0, after a time t elapsed since their appearance.Time t is measured in days. As a comparison we report the linescorresponding to t0.1 and t0.5.258701-3
VOLUME 87, NUMBER 25 P H Y S I C A L R E V I E W L E T T E R S 17 DECEMBER 2001FIG. 4. The cumulative frequency of links emanating fromnew and existing nodes that attach to nodes with connectivityk. The straight line corresponds to a slope 20.2. The flat tailoriginates from the poor statistics at very...