Optimization-Based Approaches to Understanding and Modeling
Internet Topology
David Alderson, California Institute of Technology
INFORMS Telecom Conference 2004, March 8, 2004
Acknowledgments
• This talk represents joint work with John Doyle (Caltech), Lun Li (Caltech), and Walter Willinger (AT&T Labs-Research)
• Many others have contributed to this story:
– Reiko Tanaka (Caltech)
– Steven Low (Caltech)
– Ramesh Govindan (USC)
– Neil Spring (U. Washington)
– Stanislav Shalunov (Abilene)
– Heather Sherman (CENIC)
– John Dundas (Caltech)
Today’s Agenda
• Review recent work of empiricists and theoreticians to understand the router-level topology of the Internet
• Understand the causes and implications of “heavy tails” in the complex structure of the Internet
• Illustrate how recently popular “scale-free” models of Internet topology are not just wrong, but wildly so
• Describe the importance of optimization in the development of explanatory models of Internet topology
• Present the HOT framework as an alternative means to understanding the “robust, yet fragile” structure of the Internet and other complex engineering systems
• Highlight some open research problems and areas where contributions can be made by the OR/MS community
First Question:
Why should we care about modeling the topology of the Internet?
Understanding the topology of the current (and future) Internet is important for many reasons
• Design and evaluation of networking protocols
– Topology affects performance, not correctness
• Understanding large-scale network behavior
– Closed-loop feedback: topology design vs. protocols
• Is the current design a result of the dominant routing protocols?
• Or are the presently used routing protocols the result of some prevailing network design principles?
– Ability to study what-if scenarios
• Operating policy shifts
• Economic changes in the ISP market
• Implications for tomorrow’s networks
– Provisioning requirements, traffic engineering, operating and management policies
Performance Evaluation via Simulation
[Figure: a network simulator (e.g. ns-2) takes application-specific traffic demand info, protocol info, and network topology info (connectivity, bandwidths, delays) as inputs and produces performance measures. (Traditional) topology generators provide only connectivity information!]

Performance Evaluation via Simulation
[Figure: the same simulation pipeline, but with the topology input provided as an annotated network graph carrying connectivity, bandwidths, and delays.]
The Internet as an Infrastructure
As the Internet has grown in capability and importance, we have become increasingly dependent on it.
• Directly: communication (email, instant messenger, VoIP), information and entertainment, e-commerce
• Indirectly: business, education, government have (permanently) replaced physical/manual methods with electronic processes, many of which rely on the Internet.
The central importance, open architecture, and evolving technology landscape make the Internet an attractive target for asymmetric attack.
The Internet as a Case Study
The Internet is a great starting point for the study of other highly engineered complex network systems:
• To the user, it creates the illusion of a simple, robust, homogeneous resource, while in fact supporting endless varieties and types of technologies, physical infrastructures, virtual networks, and applications (heterogeneous).
• Its complexity is starting to approach that of simple biological systems.
• Our understanding of the underlying technology, together with the ability to perform detailed measurements, means that most conjectures about its large-scale properties can be unambiguously resolved, though often not without substantial effort.
Question Two:
Why is research on Internet topology interesting/difficult?
A Challenging Problem
Since the decommissioning of the NSFNet in 1995, it has been difficult to obtain comprehensive knowledge about the topology of the Internet:
• The network has grown dramatically (number of hosts, amount of traffic, number of ISPs, etc.)
• There have been economic incentives for ISPs to maintain secrecy about their topologies
• Direct inspection is usually not allowed
Qwest US Fiber Map – June 2001
[Figure: map of Qwest’s US fiber network, with links to Europe, Hawaii, and Japan. Source: www.qwest.com]
[Figure: Sprint backbone map]
Measuring Topology
• Marketing documents are not very helpful
• The task of “discovering” the network has been left to experimentalists
– Must develop sophisticated methods to infer this topology from appropriate network measurements.
– Many possible measurements can be made.
Router-Level Topology
[Figure: hosts connected to routers]
• Nodes are machines (routers or hosts) running the IP protocol
• Measurements taken from traceroute experiments that infer topology from traffic sent over the network
• Subject to sampling errors and bias
• Requires careful interpretation
AS Topology
[Figure: four interconnected autonomous systems, AS1–AS4]
• Nodes are entire networks (ASes)
• Links = peering relationships between ASes
• Relationships inferred from Border Gateway Protocol (BGP) information
• Really a measure of business relationships, not network structure
Measuring Topology
• Marketing documents are not very helpful
• The task of “discovering” the network has been left to experimentalists
– Must develop sophisticated methods to infer this topology from appropriate network measurements.
– Many possible measurements can be made.
– Each type of measurement has its own strengths, weaknesses, and idiosyncrasies, and results in a distinct view of the network topology.
• Hard to know what “matters”…
“Standard” Approach
1. Choose a sequence of well-understood metrics or observed features of interest, such as
– hierarchy
– node-degree distributions
– clustering coefficients
2. Develop a method that matches these metrics
Pros:
– Always possible to obtain a good “fit” on a chosen metric.
Cons:
– Hard to choose the “right” metric. What is “right” is apt to vary, depending on the intended use of the topology.
– A method that does a good job of matching the chosen metric often does not fit other metrics well.
– No predictive power.
We call this approach descriptive (evocative) modeling.
An Alternative Approach
1. Identify the causal forces at work in the design and evolution of real topologies.
2. Develop methods that generate and evolve topologies in a manner consistent with these forces.
Pros:
– Ability to generate (and design!) topologies at different levels of hierarchy
– More realistic topologies
– Greater predictive power
– Possibly reveal some relationship with routing protocols
Cons:
– Difficult to identify the causal forces
– Requires careful development and diligent validation
We call this approach explanatory modeling.
Our Approach: Focus on the ISP
• Capture and represent realistic drivers of Internet deployment and operation at the level of the single ISP
• Many important networking issues are relevant at the level of the ISP (e.g. configuration, management, pricing, provisioning)
• Common topologies represented in terms of ISPs
– Router-level graphs: connectivity within the ISP
– AS graphs: connectivity between ISPs
• First-Order Objective: the ability to generate a “realistic, but fictitious” ISP topology at different levels of hierarchy
ISP Driving Forces
• Economic Factors:
– Cost of procuring, installing, and maintaining the necessary facilities and equipment
– Limited budget for capital expenditures
– Need to balance expenditures with revenue streams
– Need to leverage investment in existing infrastructure
– Location of customers
• Technological Factors:
– Hardware constraints (e.g. router speeds, limited # of interfaces or line cards per router)
– Layer 2 technologies (SONET, ATM, WDM)
– Existing legacy infrastructure
– Location and availability of dark fiber
Mathematical Framework
1. Use combinatorial optimization to represent the problem and its constraints
– Objectives (min cost, max profitability, satisfy demand)
– Constraints (equipment costs/capacities, legacy infrastructure)
– Parameters (pricing, provisioning, facility location)
2. Study and explore how this framework allows for a range of ISP behavior
– Effects of objectives and constraints are the most important
3. Part of a more general framework
– Highly Optimized Tolerance (HOT), Carlson and Doyle, 1999
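The combinatorial-optimization view above can be made concrete with a deliberately tiny, hypothetical instance: choose a set of links connecting four sites at minimum cost. This brute-force sketch ignores capacities, demands, and equipment constraints; it only illustrates the objective/constraint structure, and all costs are made up.

```python
from itertools import combinations

# Toy network-design problem: pick the cheapest set of links that
# connects 4 sites (hypothetical costs; real ISP design would add
# capacity, demand, and equipment constraints).
sites = range(4)
cost = {(0, 1): 4, (0, 2): 1, (0, 3): 5,
        (1, 2): 2, (1, 3): 6, (2, 3): 3}

def connected(links):
    # Breadth-first search from site 0 over the chosen links.
    seen = {0}
    frontier = [0]
    while frontier:
        u = frontier.pop()
        for a, b in links:
            for x, y in ((a, b), (b, a)):
                if x == u and y not in seen:
                    seen.add(y)
                    frontier.append(y)
    return len(seen) == len(sites)

# Enumerate all link subsets (feasible = connected), minimize cost.
best = min((links for r in range(3, len(cost) + 1)
            for links in combinations(cost, r)
            if connected(links)),
           key=lambda ls: sum(cost[l] for l in ls))
print(sorted(best), sum(cost[l] for l in best))  # [(0, 2), (1, 2), (2, 3)] 6
```

With all costs positive, the optimum is a minimum spanning tree; adding budget limits or capacity constraints turns the same skeleton into the kind of ISP design problem the slide describes.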
Question Three:
How is research on Internet topology different from what OR/MS researchers are used to doing on other complex engineering networks?
A Different Type of Problem
• Researchers in network optimization are used to problems in which the objectives and constraints are well-defined
• Here, we are trying to uncover the “most significant” drivers of topology evolution, so that we can create “fictitious, yet realistic” network counterparts and also design and build improved networks
• We will need to iterate between modeling, measurement, and analysis to get it right
• In this sense, this is a bit more like biology
• Recent progress has given hope that a comprehensive theory for the Internet may be possible
Heterogeneity of the Internet
• Full of “high variability”:
– Link bandwidth: Kbps – Gbps
– File sizes: a few bytes – Mega/Gigabytes
– Flows: a few packets – 100,000+ packets
– In/out-degree (Web graph): 1 – 100,000+
– Delay: milliseconds – seconds and beyond
• Diversity in the technologies that comprise the physical and link layers
• Diversity in the applications and services that are supported
• This heterogeneity has evolved organically from an architecture that was designed to be robust to changes (failures or innovation), and it is permanent
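The “high variability” above is characteristic of heavy-tailed distributions. A minimal sketch, sampling Pareto-distributed values as a stand-in for Internet file sizes (the shape parameter 1.1 is an illustrative choice, not a measured value), shows how a handful of samples dominate the rest:

```python
import random
import statistics

# High variability: heavy-tailed (Pareto) samples, often used to
# model Internet file sizes. Shape 1.1 is an illustrative value.
random.seed(42)
sizes = [random.paretovariate(1.1) for _ in range(10000)]

med = statistics.median(sizes)
print(max(sizes) > 100 * med)  # the largest sample dwarfs the median
```

For a distribution this heavy-tailed, the sample maximum is typically thousands of times the median, which is exactly the Kbps-to-Gbps, bytes-to-Gigabytes spread the slide lists.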
The Internet hourglass
[Figure: the protocol stack drawn as an hourglass, shown as a three-slide build. Applications (Web, FTP, Mail, News, Video, Audio, ping, Kazaa) run over transport protocols (TCP, SCTP, UDP, ICMP), which run over IP; IP in turn runs over many link technologies (Ethernet, 802.11, Satellite, Optical, Power lines, Bluetooth, ATM). IP is the narrow waist: “everything on IP” above, “IP on everything” below.]
[Figure: protocol stack: Applications / TCP/AQM / IP / Link]
A Theory for the Internet?
• Vertical decomposition of the protocol stack allows for the treatment of layers in isolation (a separation theorem)
• Assume that layers not considered perform in a near-optimal manner
• Use an engineering design-based perspective in two ways:
– Analysis: explain the complex structure that is observed
– Synthesis: suggest changes or improvements to the current design
A Theory for the Internet?
How to design an application that “performs well” in meeting user demands, subject to the resources/constraints made available by TCP/IP?
A Theory for the Internet?
How to design a network that “performs well” and satisfies traffic demands subject to the physical resources/constraints?
A Theory for the Internet?
If TCP/AQM is the answer, what is the question?
Primal/dual model of TCP/AQM congestion control:

  max_{x_s ≥ 0}  Σ_s U_s(x_s)   subject to   R x ≤ c

– TCP gives the source rates x (primal variables)
– AQM gives the link prices (dual variables)
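A minimal numerical sketch of this primal/dual view, on a hypothetical 2-link, 3-source network with log utilities: sources pick rates from path prices (the primal/TCP side), links adjust prices by gradient ascent on the dual (the AQM side). The topology, capacities, and step size are all made up for illustration.

```python
# Dual-decomposition sketch of TCP/AQM congestion control:
# maximize sum_s log(x_s) subject to R x <= c.
# Hypothetical 3-source, 2-link example, not any specific TCP variant.

R = [[1, 1, 0],   # R[l][s] = 1 if source s uses link l
     [0, 1, 1]]
c = [1.0, 2.0]    # link capacities
p = [0.1, 0.1]    # link prices (dual variables, the "AQM" side)
step = 0.01

for _ in range(5000):
    # Sources: maximize log(x_s) - x_s * (path price) => x_s = 1 / q_s
    q = [sum(R[l][s] * p[l] for l in range(2)) for s in range(3)]
    x = [1.0 / q[s] for s in range(3)]
    # Links: gradient ascent on the dual; raise price when overloaded
    for l in range(2):
        y = sum(R[l][s] * x[s] for s in range(3))
        p[l] = max(0.0, p[l] + step * (y - c[l]))

print([round(v, 2) for v in x])  # rates settle near the utility optimum
```

For this instance the optimum works out to x ≈ (0.58, 0.42, 1.58), with both links fully utilized; the iteration converges there without any central coordination, which is the point of the primal/dual interpretation.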
A Theory for the Internet?
If the current topology of the Internet is the answer, what is the question?
Next Question:
What’s been done to try and understand the large-scale structure of the Internet?
How should we think about the Internet’s router-level topology?
Trends in Topology Modeling

Observation → Modeling approach
• Real networks are not random, but have obvious hierarchy (router-level). → Structural models (GT-ITM, Calvert/Zegura, 1996) generate connectivity-only topologies with inherent hierarchy.
• Long-range links are expensive (router-level). → Random graph models (Waxman, 1988) generate connectivity-only topologies.
• Router-level and AS graphs exhibit heavy-tailed distributions (power laws) in characteristics such as node degree. → Degree-based models (including popular “scale-free” models) generate connectivity-only topologies with inherent power laws in the node degree distribution.
Power Laws and Internet Topology
[Figure: log/log rank vs. node-degree plots for the Autonomous System (AS) graph and the router-level graph. Source: Faloutsos et al. (1999)]
Observed scaling in node degree and other statistics:
– Autonomous System (AS) graph
– Router-level graph
How to account for high variability in node degree? Most nodes have few connections; a few nodes have lots of connections.
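The rank plots above are built by sorting nodes by degree and plotting degree against rank on log/log axes. A sketch of that computation, on a made-up degree sequence:

```python
# Rank-degree data for a log/log plot: sort degrees descending,
# pair each degree with its rank (1 = highest-degree node).
degrees = [1, 1, 1, 1, 2, 2, 3, 5, 9, 40]   # made-up degree sequence
ranked = sorted(degrees, reverse=True)
points = [(rank, d) for rank, d in enumerate(ranked, start=1)]
# For a power law, log(degree) vs. log(rank) is roughly linear.
print(points[:3])  # [(1, 40), (2, 9), (3, 5)]
```

Plotting `points` on log/log axes and checking for a straight line is exactly the diagnostic Faloutsos et al. applied to the AS and router-level graphs.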
Power Laws in Topology Modeling
• Recent emphasis has been on whether or not a given topology model/generator can reproduce the same types of macroscopic statistics, especially power law-type degree distributions
• Lots of degree-based models have been proposed
– All of them are based on random graphs, usually with some form of preferential attachment
– All of them are connectivity-only models and tend to ignore engineering-specific system details
• Examples: BRITE, INET, Barabasi-Albert, GLP, PLRG, CMU generator
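The preferential-attachment idea common to these generators can be sketched in a few lines (a Barabasi-Albert-style construction with one link per new node; parameters are illustrative). Note it produces exactly what the slide criticizes: connectivity only, no bandwidths or costs.

```python
import random

# Minimal preferential-attachment sketch (Barabasi-Albert style, m=1):
# each new node links to an existing node chosen with probability
# proportional to its current degree.
def preferential_attachment(n, seed=0):
    random.seed(seed)
    edges = [(0, 1)]                 # start from a single link
    targets = [0, 1]                 # node appears once per unit degree
    for new in range(2, n):
        t = random.choice(targets)   # degree-proportional choice
        edges.append((new, t))
        targets += [new, t]
    return edges

edges = preferential_attachment(1000)
deg = {}
for u, v in edges:
    deg[u] = deg.get(u, 0) + 1
    deg[v] = deg.get(v, 0) + 1
# Heavy tail: the maximum degree far exceeds the mean (about 2).
print(max(deg.values()))
```

The `targets` list trick makes a uniform choice degree-proportional; it is the standard implementation shortcut for this family of models.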
Models of Internet Topology
• These topology models are merely descriptive
– Measure some feature of interest (connectivity)
– Develop a model that replicates that feature
– Make claims about the similarity between the real system and the model
– A type of “curve fitting”?
• Unfortunately, by focusing exclusively on the node degree distribution, these models get the story wrong
• We seek something that is explanatory
– Consistent with the drivers of topology design and deployment
– Consistent with the engineering-related details
– Can be verified through the measurement of appropriate system-specific details
Our Perspective
• Must consider the explicit design of the Internet
– Protocol layers on top of a physical infrastructure
– Physical infrastructure constrained by technological and economic limitations
– Emphasis on network performance
– Critical role of feedback at all levels
• Consider the ability to match large-scale statistics (e.g. power laws) as secondary evidence of having accounted for key factors affecting design
Trends in Topology Modeling

Observation → Modeling approach
• Real networks are not random, but have obvious hierarchy (router-level). → Structural models (GT-ITM, Calvert/Zegura, 1996) generate connectivity-only topologies with inherent hierarchy.
• Long-range links are expensive (router-level). → Random graph models (Waxman, 1988) generate connectivity-only topologies.
• Router-level and AS graphs exhibit heavy-tailed distributions (power laws) in characteristics such as node degree. → Degree-based models (including popular “scale-free” models) generate connectivity-only topologies with inherent power laws in node degree.
• Physical networks have hard technological (and economic) constraints. → Optimization-driven models generate annotated topologies consistent with the design tradeoffs of network engineers.
HOT: Highly/Heavily/Heuristically Optimized/Organized Tolerance/Tradeoffs
• Based on ideas of Carlson and Doyle (1999)
• The complex structure (including power laws) of highly engineered technological (and biological) systems is viewed as the natural by-product of tradeoffs between system-specific objectives and constraints
• Non-generic, highly engineered configurations are extremely unlikely to occur by chance
• The result is “robust, yet fragile” system behavior
Heuristic Network Design
What factors dominate network design?
• Economic constraints
– User demands
– Link costs
– Equipment costs
• Technology constraints
– Router capacity
– Link capacity
Internet End-User Bandwidths
[Figure: log/log plot of connection speed (Mbps) vs. rank (number of users). Most users have low speed connections; a few users have very high speed connections. Tiers: dial-up (~56 Kbps) and broadband cable/DSL (~500 Kbps) for residential and small business; Ethernet (10-100 Mbps) for academic and corporate; POS/Ethernet (1-10 Gbps) for high performance computing.]
How to build a network that satisfies these end-user demands?
Economic Constraints
• Network operators have a limited budget to construct and maintain their networks
• Links are tremendously expensive
• Tremendous drive to operate the network so that traffic shares the same links
– Enabling technology: multiplexing
– Resulting feature: traffic aggregation at edges
– Diversity of technologies at the network edge (Ethernet, DSL, broadband cable, wireless) is evidence of the drive to provide connectivity and aggregation using many media types
Heuristically Optimal Network
[Figure: hosts connect through high degree nodes at the edges to a mesh-like core of fast, low degree routers.]
Heuristically Optimal Network
Claim: economic considerations alone suggest a structure having
– a mesh-like core of high-speed, low degree routers
– high degree, low-speed nodes at the edge
• Is this consistent with technology capability?
• Is this consistent with real network design?
Cisco 12000 Series Routers

Chassis | Rack size | Slots | Switching capacity
12416   | Full      | 16    | 320 Gbps
12410   | 1/2       | 10    | 200 Gbps
12406   | 1/4       | 6     | 120 Gbps
12404   | 1/8       | 4     | 80 Gbps

• Modular in design, creating flexibility in configuration.
• Router capacity is constrained by the number and speed of line cards inserted in each slot.
Source: www.cisco.com

Cisco 12000 Series Routers
Technology constrains the number and capacity of line cards that can be installed, creating a feasible region.
Cisco 12000 Series Routers
Pricing info: State of Washington Master Contract, June 2002
(http://techmall.dis.wa.gov/master_contracts/intranet/routers_switches.asp)
[Figure: log/log plot of bandwidth vs. degree showing the technically feasible region for the 12000 series, with per-port speeds of 155 Mb/s, 625 Mb/s, 2.5 Gb/s, and 10 Gb/s, degree from 1 to 16 (up to 256 via channelization), aggregate bandwidth up to 160 Gb/s, and listed chassis prices ranging from $128,500 to $2,762,500 depending on configuration. Technological advances push the feasible frontier outward.]
Technologically Feasible Region
[Figure: log/log plot of bandwidth (Mbps) vs. degree for a range of equipment: Cisco 12416, 12410, 12406, 12404, 7500, 7200, 3600/3700, and 2600 routers; a Linksys 4-port router; and high-degree aggregation devices (uBR7246 CMTS for cable, Cisco 6260 DSLAM for DSL, Cisco AS5850 for dial-up). Core backbone routers occupy the high-bandwidth, low-degree region; high-end gateways sit between; edge shared-media equipment (LAN, DSL, cable, wireless, dial-up) and older/cheaper technology occupy the high-degree, low-bandwidth region.]
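The core tradeoff behind the feasible region can be sketched as a single constraint: with a fixed switching capacity and port budget, a router trades port count (degree) against per-port bandwidth. The numbers below are illustrative stand-ins, not actual Cisco specifications.

```python
# Illustrative router feasibility check: total port bandwidth cannot
# exceed switching capacity, and port count cannot exceed the maximum
# number of ports. Numbers are made up for illustration.
def feasible(degree, per_port_mbps, capacity_mbps=320_000, max_ports=64):
    return degree <= max_ports and degree * per_port_mbps <= capacity_mbps

# A few (degree, per-port bandwidth) points around the frontier:
print(feasible(16, 10_000))   # 16 x 10 Gb/s = 160 Gb/s -> True
print(feasible(64, 10_000))   # 64 x 10 Gb/s = 640 Gb/s -> False
print(feasible(64, 5_000))    # 64 x 5 Gb/s = 320 Gb/s -> True
```

Sweeping `degree` and recording the largest feasible `per_port_mbps` traces out exactly the bandwidth-vs-degree frontier shown in the figure.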
Abilene Backbone Physical Connectivity (as of December 16, 2003)
[Figure: map of the Abilene backbone. A sparse mesh of core nodes (Seattle, Sunnyvale, Los Angeles, Houston, Denver, Kansas City, Indianapolis, Atlanta, Chicago, New York, Washington D.C.) connects dozens of regional GigaPoPs, universities, and peer networks (e.g. GEANT, SURFNet, WIDE, ESnet, TransPAC/APAN, StarLight, MANLAN). Link speeds: OC-3 (155 Mb/s), OC-12 (622 Mb/s), GE (1 Gb/s), OC-48 (2.5 Gb/s), OC-192/10GE (10 Gb/s). Routers include Cisco 750X, 12008, and 12410.]
CENIC Backbone (as of January 2004)
[Figure: map of the CENIC backbone in California. Core sites (SAC, OAK, SVL, FRG, FRE, SLO, BAK, LAX, TUS, SDG, SOL, COR) host dc1/dc2/dc3 and hpr routers, with connections to Abilene at Sunnyvale and Los Angeles. Link speeds: OC-3 (155 Mb/s), OC-12 (622 Mb/s), GE (1 Gb/s), OC-48 (2.5 Gb/s), 10GE (10 Gb/s).]
Backbone topologies for both Abilene and CENIC are built as a mesh of high speed, low degree routers.
As one moves from the core out toward the edge, connectivity gets higher, and speeds get lower.
Corporation for Education Network Initiatives in California (CENIC) runs the educational backbone for the State of California
CENIC Backbone for Southern California
[Figure: detail of the CENIC backbone around LAX, SDG, SLO, BAK, and TUS, with links north to Soledad, Sunnyvale, Sacramento, and Fremont. Attached networks include UC campuses (UCLA, UC Irvine, UC San Diego, UC Santa Barbara, UC Riverside), Caltech, SDSC, LA USD and other school districts, numerous community college districts, county offices of education, Los Nettos, LAAP, UCSSN (Las Vegas), Johnson & Johnson, and peers such as Abilene, ESnet, and CUDI.]
Heuristically Optimal Network
• Mesh-like core of high-speed, low degree routers
• High degree, low-speed nodes at the edge
• Claim: consistent with drivers of topology design
– Economic considerations (traffic aggregation)
– End user demands
• Claim: consistent with technology constraints
• Claim: consistent with real observed networks
Question: How could anyone imagine anything else?
Two opposite views of complexity

Physics:
• Pattern formation by reaction/diffusion
• Edge-of-chaos
• Order for free
• Self-organized criticality
• Phase transitions
• Scale-free networks
• Equilibrium, linear
• Nonlinear, heavy tails as exotica

Engineering and math:
• Constraints
• Tradeoffs
• Structure
• Organization
• Optimality
• Robustness/fragility
• Verification
• Far from equilibrium
• Nonlinear, heavy tails as tool
Principal Difference:
Random vs. Designed
Models of Internet Topology
• To physicists, scaling relationships are suggestive of critical phenomena and phase transitions
• The starting assumption is that of randomness, and one looks for emergent behaviors
Random Networks
Two methods for generating random networks having power law distributions in node degree:
• Preferential attachment (“scale-free” networks)
– Inspired by statistical physics
– Barabasi et al., 1999
• Power Law Random Graph (PLRG)
– Inspired by graph theory
– Aiello, Chung, and Lu, 2000
Summary of the “Scale-Free” Story
• Fact: “Scale-free” networks have roughly power law degree distributions
• Claim:
– If the Internet has a power law degree distribution
– Then it must be “scale-free” (oops)
– Therefore, it has the properties of a “scale-free” network
• Characteristic features of “scale-free” networks:
– High degree central “hubs”
– Network connectivity is robust to loss of random nodes, but fragile to attacks on central hubs
– Highly likely to result from various random constructions
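The robust-to-random-loss / fragile-to-attack claim for hub-dominated networks can be sketched on the most extreme hub topology, a star (a deliberately simple, made-up example): removing a random leaf barely matters, removing the hub disconnects everything.

```python
# Fragility of a hub: remove the highest-degree node from a star
# network vs. a random leaf, and count the node pairs still connected.
def reachable_pairs(nodes, edges):
    pairs = 0
    for s in nodes:
        seen = {s}
        stack = [s]
        while stack:
            u = stack.pop()
            for a, b in edges:
                for x, y in ((a, b), (b, a)):
                    if x == u and y in nodes and y not in seen:
                        seen.add(y)
                        stack.append(y)
        pairs += len(seen) - 1
    return pairs // 2   # each connected pair counted from both ends

nodes = set(range(10))
star = [(0, i) for i in range(1, 10)]            # node 0 is the hub

all_pairs = reachable_pairs(nodes, star)          # intact network
leaf_gone = reachable_pairs(nodes - {9}, star)    # random node loss
hub_gone = reachable_pairs(nodes - {0}, star)     # targeted attack
print(all_pairs, leaf_gone, hub_gone)  # 45 36 0
```

The talk's point is that the real router-level Internet does not look like this star: its high-degree nodes sit at the low-speed edge, so attacking them does not shatter the core.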
One of the most-read papers ever on the Internet!
Scientists spot Achilles heel of the Internet
• "The reason this is so is because there are a couple of very big nodes and all messages are going through them. But if someone maliciously takes down the biggest nodes you can harm the system in incredible ways. You can very easily destroy the function of the Internet."
• These scientists compared the structure of the Internet to the airline network of the United States.
Key Points
• The scale-free story is based critically on the implied relationship between power laws and a network structure that has highly connected “central hubs”
• Not all networks with power law degree distributions have the properties of scale-free networks. (The Internet is just one example!)
• Building a model to replicate power law data is no more than curve fitting (descriptive, not explanatory)
• The ubiquity of heavy-tailed (power-law) relationships in highly variable phenomena is to be expected for statistical reasons alone and requires no “special” or “exotic” explanation
End Result
The “scale-free” claims about the Internet are not merely wrong; they suggest properties that are the opposite of the real thing.
Fundamental difference: random vs. designed
nodes=routersedges=links
25 interior routers818 end systems
High degree hub-like coreLow degree
mesh-like core
101
102
100
101
degree
rank
identical power-law degrees
How to characterize / compare these two networks?
“scale-rich” vs. “scale-free”
Network Performance
Given realistic technology constraints on routers, how well is the network able to carry traffic?

Step 1: Constrain each router to be feasible (restrict its degree/bandwidth combination to the abstracted technologically feasible region).

Step 2: Compute the traffic demand from end-user bandwidths: x_ij = ρ B_i B_j, where B_i is the bandwidth of end host i.

Step 3: Compute the max flow:

  max_ρ  Σ_{i,j} x_ij   subject to   Σ_{i,j : k ∈ r_ij} x_ij ≤ C_k for all routers k,

where r_ij is the route carrying flow x_ij and C_k is the capacity of router k.
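The three steps above can be sketched on a tiny hypothetical network (one hub, three edge routers; all bandwidths and capacities made up): gravity-model demands are scaled by the largest ρ that keeps every router within capacity.

```python
from collections import defaultdict

# Simplified version of the performance metric: demands
# x_ij = rho * B_i * B_j on fixed routes, with rho maximized so no
# router exceeds its capacity. Tiny made-up network: hub 'h'
# connects edge routers a, b, c.
B = {'a': 1.0, 'b': 1.0, 'c': 2.0}              # end-host bandwidths
C = {'h': 8.0, 'a': 4.0, 'b': 4.0, 'c': 4.0}    # router capacities

pairs = [('a', 'b'), ('a', 'c'), ('b', 'c')]
route = {p: (p[0], 'h', p[1]) for p in pairs}   # all via the hub

# Load on each router per unit rho.
load = defaultdict(float)
for (i, j) in pairs:
    for k in route[(i, j)]:
        load[k] += B[i] * B[j]

# Largest rho keeping every router within capacity.
rho = min(C[k] / load[k] for k in load)
throughput = rho * sum(B[i] * B[j] for i, j in pairs)
print(round(throughput, 3))  # 5.0, limited by router 'c'
```

In the full formulation this is a linear program over all flows; here the single scale factor ρ keeps the example small while preserving the structure of the constraint.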
Network Likelihood
How likely is a particular graph (having a given node degree distribution) to be constructed?
• The notion of likelihood depends on defining an appropriate probability space for random graphs.
• Many methods (all based on probabilistic preferential attachment) exist for randomly generating graphs having power law degree distributions:
– Power Law Random Graph (PLRG) [Aiello et al.]
– Random rewiring (Markov chains)
In both cases, LogLikelihood (LLH) ∝ Σ_{(i,j) connected} d_i d_j

[Figure: performance (fast to slow) plotted against likelihood (low to high) for graphs with the same node degree distribution.]
Why such striking differences with the same node degree distribution?
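The likelihood metric, the sum of d_i·d_j over connected node pairs, is easy to compute directly. A sketch on two made-up graphs with the identical degree sequence shows that the graph whose high-degree nodes attach to each other scores higher:

```python
# Graph likelihood metric: LLH(g) = sum of d_i * d_j over the
# connected pairs (i, j), i.e. over the edges of g.
def llh(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return sum(deg[u] * deg[v] for u, v in edges)

# Two graphs with the identical degree sequence [2, 2, 2, 1, 1]:
path = [(3, 0), (0, 1), (1, 2), (2, 4)]          # a simple path
cycle_plus = [(0, 1), (1, 2), (2, 0), (3, 4)]    # triangle + lone edge
print(llh(path), llh(cycle_plus))  # 12 13
```

This is why random constructions land in the high-LLH region: hub-to-hub attachment is the generic outcome, while the mesh-core HOT designs sit at low LLH.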
[Figure: log/log plots of router bandwidth (Mbps) vs. degree for the two networks. The HOT network has a fast core and a high-degree edge; the “scale-free” network has a slow core and an even slower edge.]
Performance vs. Likelihood
[Figure: performance (fast to slow) vs. likelihood (low to high) for graphs with the same node degree distribution.]

“Scale-free”
• Core: hub-like, high degree
• Edge: low degree
• Robust to random node loss, but fragile to “attack” on hubs
• Low performance, high link costs
• Highly likely, generic; preserved by rewiring
• Opposite of the real Internet

HOT “scale-rich”
• Core: mesh-like, low degree
• Edge: high degree
• Robust to random node loss and robust to “attack”
• High performance, low link costs
• Unlikely, rare, designed; destroyed by rewiring
• Similar to the real Internet

[Figure: in the performance-likelihood plane, random rewirings are the most likely but perform poorly; the Hierarchical Scale-Free (HSF) network has low likelihood and low performance; adding objectives and constraints yields the HOT network, with high performance and low likelihood.]
Key Points
1. High performance networks are extremely unlikely to be found by a random process.
2. Models that focus on “highly likely” constructions will result in graphs that are poorly performing and are not representative of highly engineered networks.
Recap
• We do not claim that our “heuristically optimal topology” is an accurate representation of the real Internet, simply that it captures some basic features observed in real networks.
• But it goes a long way toward dispelling much of the confusion about heavy-tailed node degree distributions (i.e. “scale-free” models are fundamentally inconsistent with engineering design despite their ability to match macro-statistics).
• “Scale-free” models may be good representations of other systems, simply not the router-level topology of the Internet.
• This highlights the importance of “random vs. designed”.
• It is remarkable how even simple models based on fundamental technological and economic tradeoffs can go a long way toward explaining large-scale network features.
• These models are only a starting point for a more detailed investigation of Internet topology.
Final Question:
What still needs to be done to understand the large-scale structure of the Internet?
How can researchers in OR/MS help to solve this problem?
(to be done)
• Understand the relationship between optimization drivers and topology
– Example: Papadimitriou’s HOT
• Description of more detailed optimization models that account for real economic and technological considerations (?)
• Tie-in to WDM network optimization currently in vogue
• (Need help thinking this through)
Today’s Agenda
• Review recent work of empiricists and theoreticians to understand the router-level topology of the Internet
• Understand the causes and implications of “heavy tails” in the complex structure of the Internet
• Illustrate how recently popular “scale-free” models of Internet topology are not just wrong, but wildly so
• Describe the importance of optimization in the development of explanatory models of Internet topology
• Present the HOT framework as an alternative means to understanding the “robust, yet fragile” structure of the Internet and other complex engineering systems
• Highlight some open research problems and areas where contributions can be made by the OR/MS community
Thank You
Appendix
Recent Measurement Experiments
• R. Govindan and H. Tangmunarunkit. Heuristics for Internet Map Discovery, Proc. IEEE INFOCOM (2000) [often known as the Mercator Project]
• L. Gao. On Inferring Autonomous System Relationships in the Internet, Proc. IEEE Global Internet Symposium, November 2000.
• Route Views, University of Oregon Route Views Project. Available at http://www.antc.uoregon.edu/route-views/.
• A. Broido and k. Claffy. Internet Topology: Connectivity of IP Graphs, Proc. SPIE ITCom WWW Conf. (2001) [often known as the Skitter Project]
• N. Spring, R. Mahajan, and D. Wetherall. Measuring ISP Topologies with Rocketfuel, Proc. ACM SIGCOMM (2002)
• L. Subramanian, S. Agarwal, J. Rexford, and R. Katz. Characterizing the Internet Hierarchy from Multiple Vantage Points, Proc. IEEE INFOCOM (2002)
• H. Chang, R. Govindan, S. Jamin, S. Shenker, and W. Willinger. Towards Capturing Representative AS-Level Internet Topologies, Proc. ACM SIGMETRICS (2002)
Models based on preferential attachment:
• INET: uses both curve fitting and preferential attachment. It first calculates the frequency-degree and rank-degree distributions, then assigns degrees to each node according to these distributions, and finally matches these degrees according to the linear preferential model.
• BRITE: incorporates recent preferential attachment and observations of skewed network placement and locality in network connections on the Internet.
• BA: preferential attachment
• AB: preferential attachment plus a third rewiring operation, consisting of choosing links randomly and re-wiring each end of them according to the same linear preference rule as used in the BA generator.
• GLP (Generalized Linear Preference): changes the preferential probability from d_i/sum(d_i) to (d_i - beta)/sum(d_i - beta), where beta is a tunable parameter in (-infinity, 1). With some probability links are added preferentially, and with some other probability nodes are added preferentially.
Other models:
• PLRG: given N nodes with an expected degree distribution, assign links among nodes with probability proportional to the product of the expected degrees of the two endpoints.
• CMU power-law generator: works in two ways: (1) assign a power-law degree distribution to nodes and then place links in the adjacency matrix such that every node obtains the assigned degree; (2) recursively define a probability distribution function that randomly selects a pair of nodes, and use it to produce a network graph with real-valued edge weights.
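The PLRG/Chung-Lu idea described above can be sketched directly: connect each pair (i, j) independently with probability proportional to the product of their expected degrees. The weight sequence below is a made-up power-law-ish example, not drawn from any measured topology.

```python
import random

# Chung-Lu / PLRG-style generator: edge (i, j) appears with
# probability proportional to w_i * w_j, where w is the expected
# degree sequence.
def chung_lu(w, seed=0):
    random.seed(seed)
    total = sum(w)
    edges = []
    for i in range(len(w)):
        for j in range(i + 1, len(w)):
            if random.random() < min(1.0, w[i] * w[j] / total):
                edges.append((i, j))
    return edges

# Made-up heavy-tailed expected degrees: w_i proportional to (i+1)^-0.5
w = [10.0 / ((i + 1) ** 0.5) for i in range(50)]
edges = chung_lu(w)

deg = [0] * 50
for u, v in edges:
    deg[u] += 1
    deg[v] += 1
# Realized degree of node i is approximately w[i] in expectation.
print(sum(deg) == 2 * len(edges))  # True: handshake lemma
```

Like the preferential-attachment generators, this produces connectivity only; it says nothing about bandwidths, costs, or the feasibility constraints the talk argues actually shape the router-level Internet.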