Computing Outside The Box September 2009


Keynote talk at ParCo 2009 in Lyon, France. An updated version of http://www.slideshare.net/ianfoster/computing-outside-the-box-june-2009.


1

Ian Foster, Computation Institute

Argonne National Lab & University of Chicago

3

“I’ve been doing cloud computing since before it was called grid.”

4

1890

5

1953

6

“Computation may someday be organized as a public utility … The computing utility could become the basis for a new and important industry.”

John McCarthy (1961)

7

8

[Chart: connectivity (on log scale) vs. time, with the “Science” sector labeled “Grid”]

“When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances”

(George Gilder, 2001)

9

Application

Infrastructure

10

Layered grid architecture

Application

User: “Specialized services”: user- or application-specific distributed services

Collective: “Managing multiple resources”: ubiquitous infrastructure services

Resource: “Sharing single resources”: negotiating access, controlling use

Connectivity: “Talking to things”: communication (Internet protocols) & security

Fabric: “Controlling things locally”: access to, & control of, resources

(Compare the Internet Protocol Architecture: Application, Transport, Internet, Link)

(“The Anatomy of the Grid,” 2001)

11

Application

Infrastructure: Service-oriented infrastructure

12

13

www.opensciencegrid.org

14

www.opensciencegrid.org

15

Application

Infrastructure: Service-oriented infrastructure

16

Application: Service-oriented applications

Infrastructure: Service-oriented infrastructure

17

18

As of Oct 19, 2008:

122 participants, 105 services

70 data, 35 analytical

19

Microarray clustering using Taverna

1. Query and retrieve microarray data from a caArray data service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub

2. Normalize microarray data using a GenePattern analytical service: node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService

3. Hierarchical clustering using a geWorkbench analytical service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage

Workflow in/output

caGrid services

“Shim” services, others

Wei Tan
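For a sense of what this composition does, here is a minimal sketch of the three chained service calls (illustrative only: the invoke helper and the operation names are hypothetical stand-ins for the Taverna/caGrid client machinery, while the endpoints are those listed above):

    def invoke(endpoint, operation, payload):
        # Hypothetical stand-in for a caGrid/WSRF service call;
        # in the actual workflow Taverna performs the invocation.
        raise NotImplementedError("plug in a real service client here")

    def microarray_clustering(query):
        # 1. Query and retrieve microarray data from the caArray data service.
        data = invoke("cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub",
                      "query", query)
        # 2. Normalize the data using the GenePattern analytical service.
        normalized = invoke("node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService",
                            "preprocess", data)
        # 3. Hierarchical clustering using the geWorkbench analytical service.
        return invoke("cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage",
                      "cluster", normalized)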

20

Infrastructure

Applications

21

Energy

Progress of adoption

22

Energy

Progress of adoption

$$ $$$$

23

Energy

Progress of adoption

$$ $$$$

24

[Chart: connectivity (on log scale) vs. time, with the “Science” sector labeled “Grid” and the “Enterprise” sector labeled “Cloud”]

“When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances”

(George Gilder, 2001)

25

26

27

US$3

28

Credit: Werner Vogels

29

Credit: Werner Vogels

30

Animoto EC2 image usage

[Chart: EC2 instance count rising from near 0 to ~4,000 between Day 1 and Day 8]

31

Software: Salesforce.com, Google, Animoto, …, …, caBIG, TeraGrid gateways

Platform

Infrastructure

32

Software: Salesforce.com, Google, Animoto, …, …, caBIG, TeraGrid gateways

Platform

Infrastructure: Amazon, GoGrid, Sun, Microsoft, …

33

Software: Salesforce.com, Google, Animoto, …, …, caBIG, TeraGrid gateways

Platform: Google, Microsoft, Amazon, …

Infrastructure: Amazon, GoGrid, Microsoft, Flexiscale, …

34

35

Dynamo: Amazon’s highly available key-value store (DeCandia et al., SOSP’07)

• Simple query model
• Weak consistency, no isolation
• Stringent SLAs (e.g., 300 ms for 99.9% of requests; peak 500 requests/sec)
• Incremental scalability
• Symmetry
• Decentralization
• Heterogeneity

Technologies used in Dynamo

Problem | Technique | Advantage
Partitioning | Consistent hashing | Incremental scalability
High availability for writes | Vector clocks with reconciliation during reads | Version size is decoupled from update rates
Handling temporary failures | Sloppy quorum and hinted handoff | Provides high availability and durability guarantee when some of the replicas are not available
Recovering from permanent failures | Anti-entropy using Merkle trees | Synchronizes divergent replicas in the background
Membership and failure detection | Gossip-based membership protocol and failure detection | Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information
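As an aside, here is a minimal sketch of the consistent-hashing idea behind the partitioning row above (illustrative only; the HashRing class and its method names are my own stand-ins, not Dynamo's implementation):

    import bisect
    import hashlib

    def _hash(key):
        # Map a key onto a large circular hash space.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class HashRing:
        """Toy consistent-hash ring with virtual nodes, for illustration only."""

        def __init__(self, nodes, vnodes=100):
            self._ring = []                      # sorted list of (position, node)
            for node in nodes:
                self.add_node(node, vnodes)

        def add_node(self, node, vnodes=100):
            # Adding a node only remaps the keys that fall into its new arcs,
            # which is what gives incremental scalability.
            for i in range(vnodes):
                bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

        def node_for(self, key):
            # Walk clockwise from the key's position to the next virtual node.
            pos = _hash(key)
            idx = bisect.bisect(self._ring, (pos, "")) % len(self._ring)
            return self._ring[idx][1]

    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.node_for("customer:42"))          # owning node for this key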

Using IaaS for elastic capacity

[Diagram: STAR nodes on a local cluster, extended with additional STAR nodes on Amazon EC2 provisioned via Nimbus]

Kate Keahey et al.
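A minimal sketch of the elastic-capacity pattern in the figure (illustrative only; local_free_slots, provision_cloud_workers, and submit are hypothetical hooks, not Nimbus or EC2 APIs):

    def elastic_submit(jobs, local_free_slots, provision_cloud_workers, submit):
        # jobs:                        list of work items to run
        # local_free_slots():          number of idle slots on the local cluster
        # provision_cloud_workers(n):  start n IaaS workers, return their ids
        # submit(job, worker):         dispatch one job to one worker
        pending = list(jobs)
        local = min(local_free_slots(), len(pending))
        overflow = len(pending) - local
        # Fill local capacity first, then provision only the overflow on IaaS.
        workers = [("local", i) for i in range(local)]
        workers += [("cloud", w) for w in provision_cloud_workers(overflow)]
        for job, worker in zip(pending, workers):
            submit(job, worker)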

38

Application: Service-oriented applications

Infrastructure: Service-oriented infrastructure

39

The Globus-based LIGO data grid

LIGO Gravitational Wave Observatory: replicating >1 Terabyte/day to 8 sites; >100 million replicas so far; MTBF = 1 month

[Map of sites, including Birmingham, Cardiff, and AEI/Golm]

40

Data replication service: pull “missing” files to a storage system

[Diagram: a list of required files goes to the Data Replication Service; data location via the Replica Location Index and Local Replica Catalogs; data movement via the Reliable File Transfer Service and GridFTP]

“Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System,” Chervenak et al., 2005
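A minimal sketch of the “pull missing files” pattern above (illustrative only; locate_replica and fetch are stand-ins for a Replica Location Index lookup and an RFT/GridFTP transfer, not the actual DRS APIs; local_catalog is assumed to be a set of logical file names already present):

    def find_missing(required, local_catalog):
        # Compare the list of required logical file names against the
        # local replica catalog; return those not yet present locally.
        return [lfn for lfn in required if lfn not in local_catalog]

    def replicate_missing(required, local_catalog, locate_replica, fetch):
        # locate_replica(lfn) -> a source URL (data location)
        # fetch(src, lfn)        copies the file (data movement)
        for lfn in find_missing(required, local_catalog):
            src = locate_replica(lfn)   # data location
            fetch(src, lfn)             # data movement
            local_catalog.add(lfn)      # record the new local replica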

41

Specializing further …

User → Service Provider: “Provide access to data D at S1, S2, S3 with performance P”

Service Provider → Resource Provider: “Provide storage with performance P1, network with P2, …”

[Diagram: data D replicated across sites S1, S2, S3, using a replica catalog, user-level multicast, …]

42

Using IaaS in biomedical informatics

[Diagram: “my servers” in Chicago, handle.net, and BIRN, with services moved onto an IaaS provider]

43

Clouds and supercomputers: Conventional wisdom?

                               Clouds/clusters   Supercomputers
Loosely coupled applications                     Too expensive
Tightly coupled applications   Too slow

44-47

[Figures from] Ed Walker, Benchmarking Amazon EC2 for high-performance scientific computing, ;login:, October 2008.

48-49

[Figures from] D. Nurmi, J. Brevik, R. Wolski, QBETS: Queue Bounds Estimation from Time Series, SIGMETRICS 2007, 379-380.

50

51

Clouds and supercomputers: Conventional wisdom?

                               Clouds/clusters           Supercomputers
Loosely coupled applications                             Too expensive
Tightly coupled applications   Good for rapid response

52

Loosely coupled problems

• Ensemble runs to quantify climate model uncertainty
• Identify potential drug targets by screening a database of ligand structures against target proteins
• Study economic model sensitivity to parameters
• Analyze turbulence dataset from many perspectives
• Perform numerical optimization to determine optimal resource assignment in energy problems
• Mine collection of data from advanced light sources
• Construct databases of computed properties of chemical compounds
• Analyze data from the Large Hadron Collider
• Analyze log data from 100,000-node parallel computations
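All of these share a many-task pattern: a large set of independent tasks farmed out to whatever cores are available. A minimal sketch of that pattern, using Python's standard process pool purely for illustration (the runs described on the later slides use Falkon and Swift, not this code):

    from concurrent.futures import ProcessPoolExecutor, as_completed

    def run_task(task_id):
        # Stand-in for one independent unit of work
        # (e.g., docking one ligand, or running one ensemble member).
        return sum(i * i for i in range(10_000)) * task_id

    def run_campaign(n_tasks, workers):
        results = []
        with ProcessPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(run_task, t) for t in range(n_tasks)]
            for fut in as_completed(futures):    # collect results as tasks finish
                results.append(fut.result())
        return results

    if __name__ == "__main__":
        print(len(run_campaign(n_tasks=1000, workers=8)))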

53

Many many tasks: Identifying potential drug targets

2M+ ligands x protein target(s)

(Mike Kubal, Benoit Roux, and others)

54

[Workflow diagram: start → docking → Amber scoring → GCMC → report → end]

Inputs: PDB protein descriptions; 1 protein (1 MB); ZINC 3-D structures, 2M structures (6 GB); DOCK6 and FRED receptor files (1 per protein: defines pocket to bind to; manually prepared); NAB script parameters (defines flexible residues, # MD steps); NAB script template

Docking with DOCK6 / FRED: ~4M x 60 s x 1 cpu → ~60K cpu-hrs; select best ~5K (intermediates: ligands, complexes)

Amber prep and score (1. AmberizeLigand, 2. AmberizeReceptor, 3. AmberizeComplex, 4. perl: gen nabscript / BuildNABScript, 5. RunNABScript): ~10K x 20 min x 1 cpu → ~3K cpu-hrs; select best ~500

GCMC: ~500 x 10 hr x 100 cpu → ~500K cpu-hrs

For 1 target: 4 million tasks, 500,000 cpu-hrs (50 cpu-years)
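A quick back-of-the-envelope check of those totals (the counts are the slide's own estimates; the arithmetic below just reproduces them):

    # Rough totals for one protein target, using the slide's estimates.
    dock_hours  = 4_000_000 * 60 / 3600     # DOCK6/FRED: ~4M tasks x 60 s        -> ~66,700 cpu-hrs (~60K)
    amber_hours = 10_000 * 20 / 60          # Amber: ~10K tasks x 20 min          -> ~3,300 cpu-hrs (~3K)
    gcmc_hours  = 500 * 10 * 100            # GCMC: ~500 runs x 10 hr x 100 cpu   -> 500,000 cpu-hrs

    total_hours = dock_hours + amber_hours + gcmc_hours
    print(f"~{total_hours:,.0f} cpu-hrs (~{total_hours / 8760:.0f} cpu-years)")
    # -> ~570,000 cpu-hrs, the same order as the slide's "500,000 cpu-hrs (50 cpu-years)"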

55

56

DOCK on BG/P: ~1M tasks on 118,000 CPUs

CPU cores: 118,784; tasks: 934,803; elapsed time: 7,257 s; compute time: 21.43 CPU-years; average task time: 667 s; relative efficiency: 99.7% (from 16 to 32 racks); utilization: 99.6% sustained, 78.3% overall

Per-task I/O:
• GPFS: 1 script (~5 KB), 2 file reads (~10 KB), 1 file write (~10 KB)
• RAM (cached from GPFS on first task per node): 1 binary (~7 MB), static input data (~45 MB)

[Chart: time (secs)]

Ioan Raicu, Zhao Zhang, Mike Wilde
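As a sanity check, the overall utilization figure follows directly from the numbers above (compute time divided by the core-time available during the run):

    cores       = 118_784
    elapsed_s   = 7_257
    compute_yrs = 21.43                                # CPU-years of useful compute, as reported

    available_yrs = cores * elapsed_s / 3600 / 8760    # total core-time available during the run
    print(f"available: {available_yrs:.2f} CPU-years")
    print(f"overall utilization: {compute_yrs / available_yrs:.1%}")  # ~78%, matching the reported 78.3%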

57

Managing 160,000 cores

Slower shared storage

High-speed local “disk”

Falkon

58

Scaling POSIX to petascale

[Diagram: a large dataset staged from the global file system to a CN-striped intermediate file system (Chirp multicast, MosaStore striping) over the torus and tree interconnects, then to local file systems (LFS) on compute nodes holding local datasets; tiers: staging, intermediate, local]

59

Efficiency for 4-second tasks and varying data size (1 KB to 1 MB) for CIO and GPFS, up to 32K processors

60

“Sine” workload, 2M tasks, 10MB:10ms ratio, 100 nodes, GCC policy, 50GB caches/node

Ioan Raicu

61

“Sine” workload, 2M tasks, 10MB:10ms ratio, 100 nodes, GCC policy, 50GB caches/node

Ioan Raicu

62

Same scenario, but with dynamic resource provisioning

63

Same scenario, but with dynamic resource provisioning

64

Data diffusion sine-wave workload: Summary

GPFS:    5.70 hrs, ~8 Gb/s,  1,138 CPU-hrs
DD+SRP:  1.80 hrs, ~25 Gb/s,   361 CPU-hrs
DD+DRP:  1.86 hrs, ~24 Gb/s,   253 CPU-hrs
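Put another way (simple ratios over the numbers above), data diffusion cuts time to solution by roughly 3x and CPU time by roughly 3x to 4.5x relative to GPFS:

    runs = {                  # hours, approx. throughput (Gb/s), CPU-hours
        "GPFS":   (5.70,  8, 1138),
        "DD+SRP": (1.80, 25,  361),
        "DD+DRP": (1.86, 24,  253),
    }
    base_time, _, base_cpu = runs["GPFS"]
    for name, (hrs, gbps, cpu_hrs) in runs.items():
        print(f"{name}: {base_time / hrs:.1f}x faster, {base_cpu / cpu_hrs:.1f}x fewer CPU-hours")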

65

Clouds and supercomputers:Conventional wisdom?

Good for rapid

response

Excellent

Clouds/clusters

Supercomputers

Loosely coupledapplications

Tightly coupledapplications

66

“The computer revolution hasn’t happened yet.”

Alan Kay, 1997

67

[Chart: connectivity (on log scale) vs. time, with sectors “Science” (Grid), “Enterprise” (Cloud), and “Consumer” (????)]

“When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances”

(George Gilder, 2001)

68

Energy Internet: The Shape of Grids to Come?

Computation Institute, www.ci.uchicago.edu

Thank you!