Models, Simulation, Emulation, [3] Experimention for Grid

1/ 43

Models, Simulation, Emulation, Experimention

for Grid Computing

Arnaud Legrand

ID laboratory, [email protected]

Grid 5000 school: March 8, 2006

[email protected]

2/ 43

Grid Research

Grid researchers often ask questions in the broad area of distributedcomputing

I Which scheduling algorithm is best for this application on agiven Grid?

I Which design is best for implementing a distributed Grid re-source information service?

I Which caching strategy is best for enabling a community ofusers that to distributed data analysis?

I What are the properties of several resource management poli-cies in terms of fairness and throughput?

I What is the scalability of my publish-subscribe system undervarious levels of failures

I . . .

3/ 43

Analytical or Experimental?

Analytical Grid Computing researchI Some have developed purely analytical / mathematical models

for Grid computingI makes it possible to prove interesting theoremsI often too simplistic to convince practitionersI but generally useful for understanding principles in spite of du-

bious applicability

I One often uncovers NP-complete problems anyway e.g., rout-ing, partitioning, scheduling problems

I And one must run experiments

; Grid computing research is based on experiments

4/ 43

Grid Computing as Science?

You can tell you are a scientific discipline if

I You can read a paper, easily reproduce (at least a subset of)its results, and improve

I You can tell to a grad student: “Here are the standard tools,go learn how to use them and come back in one month”.

I You can give a 1-hour seminar on widely accepted tools thatare the basis for doing research in the area

We are not there today. . .

But perhaps I can give a 1-hour seminar on emerging tools thatcould be the basis for doing research in the area, provided a fewopen questions are addressed.

Need for standard ways to run Grid experiments

5/ 43

Grid Experiments

Real-world experiments are good

I Eminently believableI Demonstrates that proposed approach can be implemented

in practice

But...

Can be time-intensive Execution of “applications” for hours, days,months, ...

Can be labor-intensive Entire application needs to be built and func-tional (including all design / algorithms alternatives include allhooks for deployment).Is it a good engineering practice to build many full-fledge solu-tions to find out which ones work best?

6/ 43

Grid Experiments: What experimental testbed?

My own little testbed well-behaved, controlled, stable, often not rep-resentative of real Grids.

Real grid platforms

I (Still) challenging for many grid researchers to obtainI Not built as a computer scientist’s playpen (other users may

disrupt experiments, other users may find experiments dis-ruptive)

I Platform will experience failures that may disrupt the exper-iments

I Platform configuration may change drastically while experi-ments are being conducted

I Experiments are uncontrolled and unrepeatable: even if dis-ruption from other users is part of the experiments, it pre-vents back-to-back runs of competitor designs / algorithms

7/ 43

Grid Experiments are Limited and Non Reproducible

; Difficult to obtain statistically significant results on an appropri-ate testbed

And to make things worse...

Experiments are limited to the testbed

I What part of the results are due to idiosyncrasies of thetestbed?

I Extrapolations are possible, but rarely convincingI Must use a collection of testbeds...I Still limited explorations of “what if” scenarios (what if the

network were different? what if we were in 2015? what ifwe had a different workload?)

Difficult for others to reproduce results This is the basis for scien-tific advances!

8/ 43

Simulation

Simulation can solve many (all) of these difficulties

I No need to build a real system

I Conduct controlled / repeatable experiments

I In principle, no limits to experimental scenarios

I Possible for anybody to reproduce results

Definition: Simulation.

Attempting to predict aspects of the behaviour of some system bycreating an approximate (mathematical) model of it.

Key question: Validation (correspondence between simulation andreal-world)

9/ 43

Simulation in Computer Science

Microprocessor Design A few standard “cycle-accurate” simulatorsare used extensively

http://www.cs.wisc.edu/∼arch/www/tools.html

Possible to reproduce simulation results

Networking Network-guys are better at standards than we are... ,I A few standard “packet-level” simulators NS-2, DaSSF, OM-

NeT++I Well-known datasets for network topologiesI Well-known generators of synthetic topologiesI SSF standard: http://www.ssfnet.org/I Possible to reproduce simulation results

Grid Computing? None of the above up until a few years ago (mostpeople built their own “ad-hoc” solutions).

http://www.cs.wisc.edu/~arch/www/tools.html

10/ 43

Simulation of Distributed Computing Platforms?

Simulation of parallel platforms used for decades. Most simulationshave made drastic assumptions

Simplistic platform model

I Processors perform work at some fixed rate (Flops)I Network links send data at some fixed rateI Topology is fully connected (no communication interference)

or a bus (simple communication interference)I Communication and computation are perfectly overlappable

Simplistic application model

I All computations are CPU intensiveI Clear-cut communication and computation phasesI Application is deterministic

Straightforward simulation in most cases Just fill in a Gantt chartwith a computer rather than by hand. No need for a “simulationstandard”

11/ 43

Grid Simulations?

Simple models are perhaps justifiable for a switched dedicated clusterrunning a matrix multiply application.They are hardly justifiable for grid platforms. . .

I Complex and wide-reaching network topologies (multi-hop net-works, heterogeneous bandwidths and latencies, non-negligiblelatencies, complex bandwidth sharing behaviors, contention withother traffic)

I Overhead of middleware

I Complex resource access/management policies

I Interference of communication and computation

12/ 43

Grid Experiments

Recognized as a critical area

GRID 5000 A reconfigurable grid to enable reproductibility.

Grid eXplorer (GdX) project (INRIA) Build an actual scientific in-strument

I Databases of “experimental conditions”I 1000-node cluster for simulation/emulationI Visualization tools

What simulation technology???

What kind of results in my paper?

I Results for a few “real” platforms

I Results for many synthetic platforms

Two main questions:

1 What does a “representative” Grid look like?

2 How does one do simulation on a synthetic representative Grid?

13/ 43

Outline

1 Simulation for Grid Computing

2 Generating Synthetic GridsTopologyCompute resourcesBackground conditions

3 Simulating Applications on Synthetic GridsIntroductionMicroGridModelNetPlanetLabChicagoSim, OptorSim, GridSimSimGrid

4 Conclusion

14/ 43

Outline




4 Conclusion

15/ 43

Why Synthetic Platforms?

Two goals of simulations:

I Simulate platforms beyond the ones at hand

I Perform sensitivity analyses

Need: Synthetic platforms

I Examine real platforms

I Discover principles

I Implement “platform generators”

16/ 43

Generation of Synthetic Grids

Network Topology

I Graph

I Bandwidth and Latencies

Compute Resources And other re-sources “Background” Conditions

I CPU capacity

I Load

I Failures

What is Representative and Tractable?

16/ 43


Network Topology

I Graph



I CPU capacity

I Load

I Failures


16/ 43


Network Topology

I Graph



I CPU capacity

I Load

I Failures


16/ 43


Network Topology

I Graph



I CPU capacity

I Load

I Failures


16/ 43


Network Topology

I Graph



I CPU capacity

I Load

I Failures


16/ 43


Network Topology

I Graph



I CPU capacity

I Load

I Failures


17/ 43

Synthetic Network Topologies

The network community has wondered about the properties of theInternet topology for years

I The Internet grows in a decentralized fashion with seeminglycomplex rules and incentives

I Could it have a mathematical structure?

I Could we then have generative models?

Three “generations of graph generators”

I Random

I Structural

I Degree-based

18/ 43

Flat models

Brain-dead N dots are randomly chosen (using a uniform distribu-tion) in a square. Then they are randomly connected with auniform probability α.

Waxman [Wax88] Dots are randomly placed on a square of side cand are randomly connected with a probability P (u, v) = αe−d/(βL),0 < α, β 6 1 where d is the Euclidean distance between u and vand L = c

√2. The edge number increases with α and the edge

length heterogeneity increases with β.

Exponential Dots are randomly placed and are connected with aprobability P (u, v) = αe−d/(L−d).

Locality [ZCD97] This model is due to Zegura. Dots are randomlyplaced and are connected with a probability

P (u, v) =

{α if d < L× r

β if d > L× r.

18/ 43

Flat models







P (u, v) =

{α if d < L× r

β if d > L× r.

18/ 43

Flat models







P (u, v) =

{α if d < L× r

β if d > L× r.

18/ 43

Flat models







P (u, v) =

{α if d < L× r

β if d > L× r.

18/ 43

Flat models







P (u, v) =

{α if d < L× r

β if d > L× r.

19/ 43

Node placement

Uniform

19/ 43

Node placement

Heavy Tailed

20/ 43

What about hierarchy ?

Top-Down

N-level [ZCD97] Starting from a connected graph, at each step, anode is replaced by another connected graph (Tiers, GT-ITM).

Transit-stub [ZCD97] 2-levels of hierarchy and some additional edges(GT-ITM, BRITE).

Stub Domains

Multi-homed stubTransit Domains

Stub-Stub Edge

20/ 43

What about hierarchy ?

Top-Down

N-level [ZCD97] Starting from a connected graph, at each step, anode is replaced by another connected graph (Tiers, GT-ITM).

Transit-stub [ZCD97] 2-levels of hierarchy and some additional edges(GT-ITM, BRITE).

AS-level Topology

Router Level

EdgeConnectionMethod

Topologies

(1)

(2)

(3)

AS Nodes

21/ 43

Power-Law : rank exponent

Faloutsos brothers [FFF99] have analyzed the topology at the ASlevel and have established power-laws describing this topology.The rank rv of a note v is its index in the order of decreasing degree.

21/ 43



0.1

1

10

100

1000

1 10 100 1000 10000

"971108.rank"exp(6.34763) * x ** ( −0.811392 )

0.1

1

10

100

1000

1 10 100 1000 10000

"980410.rank"exp(6.62082) * x ** ( −0.821274 )

0.1

1

10

100

1000

1 10 100 1000 10000

"981205.rank"exp(6.16576) * x ** ( −0.74496 )

1

10

100

1 10 100 1000 10000

"routes.rank"exp(4.39519) * x ** ( −0.487592 )

21/ 43



Power Law (rank exponent).

Given a graph, the degree dv of a node v is propor-tional to the rank of the node rv to the power of aconstant R.

dv ∝ rRv

Nov. 97 Apr. 98 Dec. 98 Router 95

R 0,81 0,82 0,74 0,48

22/ 43

What about Power Laws ?

Incremental growth and affinity lead to Power Laws [BA99].Nodes are incrementally added. The probability that v is connectedto u depends on du:

P (u, v) =du∑k dk

22/ 43

What about Power Laws ?

Incremental growth and affinity lead to Power Laws [BA99].Nodes are incrementally added. The probability that v is connectedto u depends on du:

P (u, v) =du∑k dk

23/ 43

Let us check two Power-Laws

1

10

100

1 10 100 1000

outd

egre

e_ra

nk

rank

1

10

100

1000

1 10

freq

uenc

youtdegree_freq

Interdomain November 1997

23/ 43


100

1 10 100 1000

outd

egre

e_ra

nk

rank

1

10

100

freq

uenc

youtdegree_freq

GT-ITM flat ?

23/ 43


10

1 10 100 1000 10000

outd

egre

e_ra

nk

rank

1

10

100

1000

1 10

freq

uenc

youtdegree_freq

Waxman (BRITE)

23/ 43


1

10

1 10 100 1000

outd

egre

e_ra

nk

rank

1

10

100

1000

1 10

freq

uenc

youtdegree_freq

Transit-Stub (GT-ITM)

23/ 43


10

100

1 10 100 1000 10000

outd

egre

e_ra

nk

rank

1

10

100

1000

1 10

freq

uenc

youtdegree_freq

Barabasi Albert (BRITE)

24/ 43

Are these measurements meaningful?

We know network have structure AND power laws. Some projectstry to combine both (GridG [LD03]).However, some of the power laws observed by the Faloutsos brothersare correlated. Power laws could be a measurement bias. What kindof measurements can be used ?

I Expension

I Resilience

I Distortion

I Excentricity distri-bution

I Eigenvalues distribu-tion

I Set cover size, . . .

Observation [TGJ+02]:

I AS-level and router-level have similar characteristics

I Degree-based generators are significantly better at representinglarge scale properties of the Internet than structural ones.

I Hierarchy seem to arise from degree-based generators.

24/ 43



I Expension

I Resilience

I Distortion








24/ 43



I Expension

I Resilience

I Distortion








24/ 43



I Expension

I Resilience

I Distortion








24/ 43



I Expension

I Resilience

I Distortion








24/ 43



I Expension

I Resilience

I Distortion








24/ 43



I Expension

I Resilience

I Distortion








24/ 43



I Expension

I Resilience

I Distortion








24/ 43



I Expension

I Resilience

I Distortion








24/ 43



I Expension

I Resilience

I Distortion








24/ 43



I Expension

I Resilience

I Distortion








24/ 43



I Expension

I Resilience

I Distortion








24/ 43



I Expension

I Resilience

I Distortion








25/ 43

I’m lost. . . what should I do then?

I For a 10 000 nodes platform, degree-based generators seem togive good results.

I For a 100 node platform, power laws do not make sense andstructural generators may be more appropriate.

We need some additional informations(e.g. routing, bandwidth, latency, sharing capacity, . . . ).

25/ 43





25/ 43





26/ 43

Bandwidths, latencies, traffic, etc.

Topology generators only produce a graph: We need link character-istics as well

1 Model physical characteristics.I Some “models” in topology generatorsI Need to simulate background trafficI No accepted model for generating background trafficI Simulation can be very costly

2 Model end-to-end performance. Models ([LS01]) or Measure-ments (NWS, ...). Go from path modeling to link modeling?

Turns out to be a difficult question (DARPA workshop on networkmodeling).

27/ 43

Bandwidths, latencies, traffic, etc.

Maybe none of this matters?

I Fiber inside the network mostly unused

I Communication bottleneck is the local link

I Appropriate tuning of TCP or better protocols should saturatethe local link

I Don’t care about topology at all!

I Maybe none of this matters for my application (no networkcontention)

28/ 43

Compute Resources

What resources do we put at the end- points?

“Ad-hoc” generalization Look at the TeraGrid. Generate new sitesbased on existing sites

Statistical modeling

I Examing many production resourcesI Identify key statistical characteristicsI Come up with a generative/predictive model

29/ 43

Synthetic Clusters?

Many Grid resources are clusters. What is the “typical” distributionof clusters?“Commodity Cluster synthesizer” [KCC04a]

I Examined 114 production clusters (10K+ procs)I Came up with statistical models

I Linear fit between clock-rate and release-year within a processorfamily

I Quadratic fraction of processors released on a given year

I Validated model against a set of 191 clusters (10K+ procs)

I Models allow “extrapolation” for future configurations

I Models implemented in a resource generator

30/ 43

Resource Availability / Workload

Probabilistic models

I Naive: experimental distributed availability and unavailabil-ity intervals

I Sophisticated: Weibull distributions [NBW05]

Traces NWS, Desktop Grid resources [KCC04b]

Workload models e.g. batch schedulers

I TracesI Models [LF03]: job inter-arrival times (Gamma), amount

of work requested (Hyper-Gamma), number of processorsrequested: Compounded (2p, 1, ...)

I Adversary ?

31/ 43

A Sample Synthetic Grid

I Generate a 5,000 node graph with BRITE

I Annotate latency according to BRITE’s Euclidian distance method(scaling to obtain the desired network diameter)

I Annotate bandwidth based on a set of end-to-end NWS mea-surements

I Pick 30% of the end-points

I Generate a cluster at each end-point according to Kee’s syn-thesizer for Year 2006

I Model cluster load with Feitelson’s model with a range of pa-rameters for the random distributions

I Model resource failures based on Inca measurements on Tera-Grid

32/ 43

Outline




4 Conclusion

33/ 43

Simulation in a nutshell

Simulations are configurable, repeatable, fast.Key question: “Validation: correspondence between simulation andreal world”.

Boundaries above are blurred (d.e. simulation/simulation).A simulation can combine all paradigms at different levels.I MicroGrid

I ModelNet

I Emulab/DummyNet

I PlatetLab

I ChicagoSim,OptorSim

I SimGrid

33/ 43



Based solely on equations

Abstraction of system as a setof dependant actions and events(fine- or coarse- grain)

Trapping and virtualization of low-levelapplication/system actions

less abstract

more abstract

Discrete-EventSimulation

SimulationMathematical

Emulation


I ModelNet

I Emulab/DummyNet

I PlatetLab


I SimGrid

33/ 43



Microscopic: packet-level(fine-grain d.e. simulation)

Macroscopic: flows in a pipe(coarse-grain d.e simulation + math. simulation)

Actual flows go through some network

Network

less abstract

more abstract



Emulation


I ModelNet

I Emulab/DummyNet

I PlatetLab


I SimGrid

33/ 43



Microscopic: Cycle-accurate simulation(fine-grain d.e. simulation)

Macroscopic: flows in a pipe(coarse-grain d.e simulation + math. simulation)

Virtualization via another CPU/virtual machine

CPU

less abstract

more abstract



Emulation


I ModelNet

I Emulab/DummyNet

I PlatetLab


I SimGrid

33/ 43



Application

Virtualization (emulation of actual codewith trapping of application generated events)

Macroscopic: application = analytical ”flow”

with resource needs and dependanciesLess Macroscopic: set of abstract tasks

Application




less abstract

more abstract



Emulation


I ModelNet

I Emulab/DummyNet

I PlatetLab


I SimGrid

33/ 43



Application




Application




less abstract

more abstract



Emulation


I ModelNet

I Emulab/DummyNet

I PlatetLab


I SimGrid

33/ 43



Application




Application




less abstract

more abstract



Emulation


I ModelNet

I Emulab/DummyNet

I PlatetLab


I SimGrid

34/ 43

MicroGrid

MicroGrid is a UCSD project [SLJ+00] lead by Andrew Chien.Applications are supported by emulation and virtualization: Actualapplication code is executed on “virtualized” resourcesMicrogrid accounts for CPU and network

Resource gethostnames, sockets,GIS, MDS, NWS are wrapped

CPU Direct execution on a frac-tion of physical CPU: find agood mapping

Network Packet-level simulation(parallel version of MaSSF)

Time Synchronize real time andvirtual time: find the good ex-ecution rate

Virtual

Resources

Physical

Ressources

Application

MicroGrid

34/ 43

MicroGrid

MicroGrid is a UCSD project [SLJ+00] lead by Andrew Chien.Applications are supported by emulation and virtualization: Actualapplication code is executed on “virtualized” resourcesMicrogrid accounts for CPU and network

Resource gethostnames, sockets,GIS, MDS, NWS are wrapped

CPU Direct execution on a frac-tion of physical CPU: find agood mapping

Network Packet-level simulation(parallel version of MaSSF)

Time Synchronize real time andvirtual time: find the good ex-ecution rate

Virtual

Resources

Physical

Ressources

Application

MicroGrid

MicroGrid in a Nutshell

Network

Application Slow but hopefullyaccurate

CPU Can have high overheadsBut captures the overhead!

Slow but hopefullyaccurate

MicroGrid

less abstract

more abstract



Emulation

35/ 43

ModelNet

ModelNet is a UCSD/Duke project [VYW+02] lead by Amin Vahdat.Applications are supported by emulation and virtualization: Actualapplication code is executed on “virtualized” resourcesA key tradeoff in ModelNet is scalability versus accuracy.ModelNet accounts for network but not for CPU:

Resource gethostnames, socketsare wrapped

CPU Plain mapping, slowdown isnot taken into account

Network one emulator runningFreeBSD, a gigabit LAN, somehost machines with IP aliasingfor the virtual nodes ; emula-tion of heterogeneous links

Similar ideas have been used in other projects (Emulab [WLS+02],DummyNet, Panda [KBM+02], . . . )

35/ 43

ModelNet

ModelNet is a UCSD/Duke project [VYW+02] lead by Amin Vahdat.Applications are supported by emulation and virtualization: Actualapplication code is executed on “virtualized” resourcesA key tradeoff in ModelNet is scalability versus accuracy.ModelNet accounts for network but not for CPU:

Resource gethostnames, socketsare wrapped

CPU Plain mapping, slowdown isnot taken into account

Network one emulator runningFreeBSD, a gigabit LAN, somehost machines with IP aliasingfor the virtual nodes ; emula-tion of heterogeneous links

ModelNet in a Nutshell

Network

Application Slow but hopefullyaccurate

CPU

Fast and scalableSomehow realist

Doesn’t take computationinto account

ModelNet

less abstract

more abstract



Emulation

Similar ideas have been used in other projects (Emulab [WLS+02],DummyNet, Panda [KBM+02], . . . )

36/ 43

PlanetLab

PlanetLab is an open platform for developping, deploying, and ac-cessing planetary-scale services.

Planetary-scale 350 machines, 140 sites, 20 countries

Distribution Virtualization each user can get a slice of the platform.

Unbundled Management

I OS defines only local (per-node) behavior (global (network-wide) behavior implemented by services)

I multiple competing services running in parallel (shared, un-privileged interfaces)

Unstable like the real world Convenient to try P2P applications ordemonstrate the feasability of a middleware.

36/ 43

PlanetLab

PlanetLab is an open platform for developping, deploying, and ac-cessing planetary-scale services.

Planetary-scale 350 machines, 140 sites, 20 countries

Distribution Virtualization each user can get a slice of the platform.

Unbundled Management

I OS defines only local (per-node) behavior (global (network-wide) behavior implemented by services)

I multiple competing services running in parallel (shared, un-privileged interfaces)

Unstable like the real world Convenient to try P2P applications ordemonstrate the feasability of a middleware.

PlanetLab in a Nutshell

Application

CPU

Network

PlanetLab

This is not reproducible!!!

less abstract

more abstract



Emulation

37/ 43

ChicagoSim, OptorSim, GridSim

Previous solutions are kind of heavy. What about Higher-Level con-cerns ?

I data placement/replication

I grid economy

PhD students need a all-in-one Grid simulator to try a few things.They’d like to only have to plug in their algorithm.Amongst many simulators we can cite:

ChicSim built on ParSec [RF02]

OptorSim developped for Data-Grid [opt]

GridSim focus on Grid economy [MBA01]

Unfortunately, different needs require different simulation paradigms.

37/ 43




I grid economy





ChicSim, OptorSim, GridSim in a Nutshell

Application

Network

CPU

ChicSim, OptorSim, GridSim

less abstract

more abstract



Emulation


37/ 43




I grid economy





ChicSim, OptorSim, GridSim in a Nutshell

Application

Network

CPU

ChicSim, OptorSim, GridSim

less abstract

more abstract



Emulation


38/ 43

SIMGRID

Originally developed for scheduling research ; must be fast to allowfor thousands of simulation

Application No real application code is executed. Simulation is ex-pressed in term of communicating process that can perform com-putations (task communication or computation).

Resources No virtualization. A resource is defined by a rate a whichit does “work”, a fixed overhead that must be paid by each task,traces of the above if needed + failures.

Tasks Tasks can use multiple resources (data transfer over multiplelinks, computation that uses a disk and a CPU)

In previous projects, resource sharing “emerges” out of the low-levelemulation and simulation.SIMGRID uses a blend of mathematical simulation and coarse-graindiscrete event simulation.Simple API to “specify” an application rather than having it alreadyimplemented [CALLM03].

38/ 43

SIMGRID

Originally developed for scheduling research ; must be fast to allowfor thousands of simulation

Application No real application code is executed. Simulation is ex-pressed in term of communicating process that can perform com-putations (task communication or computation).

Resources No virtualization. A resource is defined by a rate a whichit does “work”, a fixed overhead that must be paid by each task,traces of the above if needed + failures.

Tasks Tasks can use multiple resources (data transfer over multiplelinks, computation that uses a disk and a CPU)

In previous projects, resource sharing “emerges” out of the low-levelemulation and simulation.SIMGRID uses a blend of mathematical simulation and coarse-graindiscrete event simulation.Simple API to “specify” an application rather than having it alreadyimplemented [CALLM03].

SimGrid in a Nutshell

SIMGRID

Application

CPUNetwork

less abstract

more abstract



Emulation

39/ 43

Outline




4 Conclusion

40/ 43

So what should I use?

It really depends on your goal / resources

I SimGrid’s models have clear limitations (e.g. for short trans-fers)

I SimGrid simulations are easy to set upI MicroGrid simulations take a lot of time (although they can

be parallelized)I ModelNet requires some hardware setupI SimGrid does not require for a full application to be writtenI MicroGrid models overhead of system calls implicitlyI PlanetLab does not enable reproducible experiments

Key trade-off: accuracy and speed

I The more abstract the simulation the fastestI The less abstract the simulation the most accurate

Does this trade-off really hold?

41/ 43

Simulation Validation

The crux of most simulation work in most domains of computerscience.

Validation is difficult and almost never done convincingly.

I Provide justification that the model is plausible

I Convince people that the simulator implements the model (ver-ification)

I Provide a few graphs that show that it’s reasonable (validationin a few special cases at best or validation against another“validated” simulator)

I Argue that although absolute values are off, the trends are re-spected

I Conclude that the simulator is useful to compare algorithms/designs

I Obtain scientific results?????

42/ 43

FLASH vs. FLASH

FLASH vs. (Simulated) FLASH: Closing the Simulation Loop [GKOH00]

FLASH project at Stanford

I building large-scale shared-memory multiprocessorsI Went from conception, to design, to actual hardware (32-node)I Used simulation heavily over 6 years

The authors went back and compared simulation to the real world!

I Simulation error is unavoidable(30% error in their case was notrare; Negating the impact of “we got 1.5% improvement”)

I One should focus on simulating the important thingsI A more complex simulator does not ensure better simulation

I Simple simulators worked better than sophisticated ones, which wereunstable

I Simple simulators predicted trends as well as slower, sophisticatedones

I It is key to use the real-world to tune/calibrate the simulator

42/ 43

FLASH vs. FLASH










42/ 43

FLASH vs. FLASH










42/ 43

FLASH vs. FLASH










42/ 43

FLASH vs. FLASH










42/ 43

FLASH vs. FLASH










ConclusionFor FLASH, the simple simula-tor was all that was needed. . .

43/ 43

Conclusion

Simulation is difficult : What does really matters?

Grid researchers are actively working on it : Usable tools exist; Gridsimulation today should not “re-invent the wheel”.

Two crucial next steps

I Repository of synthetic Grids, Grid measurement datasets,and Grid simulation software

I Scientifically sound validation experiments. Validate simu-lators. Understand what matters and what does not.

Only then will we have a scientific discipline with a standard wayto conduct experiments and a way for researcher to reproduceeach other’s results.

43/ 43

A.L. Barabasi and R. Albert.Emergence of scaling in random networks.Science, 59:509–512, 1999.Available at http://www.nd.edu/∼networks/Papers/science.pdf.

Henri Casanova, Arnaud Legrand, and Loris Marchal.Scheduling Distributed Applications: the SimGrid SimulationFramework.In Proceedings of the third IEEE International Symposium onCluster Computing and the Grid (CCGrid’03). IEEE ComputerSociety Press, May 2003.

Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos.On power-law relationships of the internet topology.In SIGCOMM, pages 251–262, 1999.Available at http://citeseer.nj.nec.com/faloutsos99powerlaw.html.

Jeff Gibson, Robert Kunz, David Ofelt, and Mark Heinrich.FLASH vs. (simulated) FLASH: Closing the simulation loop.

http://www.nd.edu/~networks/Papers/science.pdf

http://www.nd.edu/~networks/Papers/science.pdf

http://citeseer.nj.nec.com/faloutsos99powerlaw.html

http://citeseer.nj.nec.com/faloutsos99powerlaw.html

43/ 43

In Architectural Support for Programming Languages and Op-erating Systems, pages 49–58, 2000.

Thilo Kielmann, Henri E. Bal, Jason Maassen, Rob van Nieuw-poort, Lionel Eyraud, Rutger Hofman, and Kees Verstoep.Programming environments for high-performance grid comput-ing: the albatross project.Future Generation Computer Systems, 2002.Available at http://www.cs.vu.nl/∼kielmann/papers/fgcs01.ps.gz.

Yang-Suk Kee, Henri Casanova, and Andrew A. Chien.Realistic modeling and synthesis of resources for computationalgrids.In Proceedings of Supercomputing 2004. IEEE Computer Soci-ety Press, 2004.

Derrick Kondo, Andrew A. Chien, and Henri Casanova.Resource management for rapid application turnaround on en-terprise desktop grids.

http://www.cs.vu.nl/~kielmann/papers/fgcs01.ps.gz

http://www.cs.vu.nl/~kielmann/papers/fgcs01.ps.gz

43/ 43

In Proceedings of Supercomputing 2004. IEEE Computer Soci-ety Press, 2004.

D. Lu and P. Dinda.GridG: Generating realistic computational grids.Performance Evaluation Review, 30(4), March 2003.Available at http://www.cs.nwu.edu/∼pdinda/Papers/lu-dinda-per.pdf.

Uri Lublin and Dror G. Feitelson.The workload on parallel supercomputers: modeling the char-acteristics of rigid jobs.Journal of Parallel and Distributed Computing, 63(11):1105–1122, 2003.

C. Lee and J. Stepanek.On future global grid communication performance.In 10th Heterogeneous Computing Workshop (HCW’2001).IEEE Computer Society Press, 2001.

M. Murshed, R. Buyya, and D. Abramson.

http://www.cs.nwu.edu/~pdinda/Papers/lu-dinda-per.pdf

http://www.cs.nwu.edu/~pdinda/Papers/lu-dinda-per.pdf

43/ 43

Gridsim: A toolkit for the modeling and simulation of globalgrids.Technical report, Monash University, October 2001.Available at http://www.csse.monash.edu.au/∼rajkumar/gridsim/.

Daniel Nurmi, John Brevik, , and Rich Wolski.Modeling machine availability in enterprise and wide-area dis-tributed computing environments.In Proceedings of EuroPar 2005, 2005.

OptorSim: Simulating data access optimization algorithms.URL:http://edg-wp2.web.cern.ch/edg-wp2/optimization/optorsim.html.

Kavitha Ranganathan and Ian Foster.Decoupling computation and data scheduling in distributeddata-intensive applications.In Proceedings of the 11 th IEEE International Symposiumon High Performance Distributed Computing HPDC-11 20002(HPDC’02), Chicago, IL, 2002. IEEE Computer Science Press.

http://www.csse.monash.edu.au/~rajkumar/gridsim/

http://www.csse.monash.edu.au/~rajkumar/gridsim/

http://edg-wp2.web.cern.ch/edg-wp2/optimization/optorsim.html

http://edg-wp2.web.cern.ch/edg-wp2/optimization/optorsim.html

43/ 43

H. J. Song, X. Liu, D. Jakobsen, R. Bhagwan, X. Zhang, KenjiroTaura, and Andrew A. Chien.The microgrid: a scientific tool for modeling computationalgrids.In Supercomputing, 2000.Available at http://www.sc2000.org/techpapr/papers/pap.pap286.pdf.

H. Tangmunarunkit, R. Govindan, S. Jamin, S. Shenker, andW. Willinger.Network topology generators: Degree-based vs structural.In Proceedings of SIGCOMM’02. ACM Press, August 2002.Available at http://citeseer.nj.nec.com/tangmunarunkit02network.html.

Amin Vahdat, Ken Yocum, Kesvin Walsh, Priya Mahadevan,Dejan Kosti0107, Jeffrey Chase, and David Becker.Scalability and accuracy in a largescale network emulator.In Proceedings of the 5th ACM/USENIX Symposium on Oper-ating System Design and Implementation (OSDI), 2002.

http://www.sc2000.org/techpapr/papers/pap.pap286.pdf

http://www.sc2000.org/techpapr/papers/pap.pap286.pdf

http://citeseer.nj.nec.com/tangmunarunkit02network.html

http://citeseer.nj.nec.com/tangmunarunkit02network.html

43/ 43

Bernard M. Waxman.Routing of Multipoint Connections.IEEE Journal on Selected Areas in Communications, 6(9):1617–1622, December 1988.

Brian White, Jay Lepreau, Leigh Stoller, Robert Ricci, ShashiGuruprasad, Mac Newbold, Mike Hibler, Chad Barb, and Abhi-jeet Joglekar.An integrated experimental environment for distributed systemsand networks.pages 255–270, Boston, MA, December 2002.

Ellen W. Zegura, Kenneth L. Calvert, and Michael J. Donahoo.

A quantitative comparison of graph-based models for Internettopology.IEEE/ACM Transactions on Networking, 5(6):770–783, 1997.Available at http://citeseer.nj.nec.com/zegura97quantitative.html.

http://citeseer.nj.nec.com/zegura97quantitative.html

http://citeseer.nj.nec.com/zegura97quantitative.html

Documents

Models, Simulation, Emulation, [3] Experimention for Grid