Computing Outside The Box September 2009


Keynote talk at ParCo 2009 in Lyon, France. An updated version of http://www.slideshare.net/ianfoster/computing-outside-the-box-june-2009.


1

Ian Foster, Computation Institute

Argonne National Lab & University of Chicago

3

“I’ve been doing cloud computing since before it was called grid.”

4

1890

5

1953

6

“Computation may someday be organized as a public utility … The computing utility could become the basis for a new and important industry.”

John McCarthy (1961)

7

8

[Chart: connectivity (on log scale) vs. time, with the “Science” sector labeled “Grid”]

“When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances”

(George Gilder, 2001)

9

Application

Infrastructure

10

Layered grid architecture

Application

User: “Specialized services”: user- or application-specific distributed services

Collective: “Managing multiple resources”: ubiquitous infrastructure services

Resource: “Sharing single resources”: negotiating access, controlling use

Connectivity: “Talking to things”: communication (Internet protocols) & security

Fabric: “Controlling things locally”: access to, & control of, resources

(Compare the Internet Protocol Architecture: Application, Transport, Internet, Link)

(“The Anatomy of the Grid,” 2001)

11

Application

Infrastructure: Service-oriented infrastructure

12

13

www.opensciencegrid.org

14

www.opensciencegrid.org

15

Application

Infrastructure: Service-oriented infrastructure

16

Application: Service-oriented applications

Infrastructure: Service-oriented infrastructure

17

18

As of Oct 19, 2008:

122 participants, 105 services

70 data, 35 analytical

19

Microarray clustering using Taverna

1. Query and retrieve microarray data from a caArray data service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub

2. Normalize microarray data using a GenePattern analytical service: node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService

3. Hierarchical clustering using a geWorkbench analytical service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage

Workflow in/output

caGrid services

“Shim” services, others

Wei Tan
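For a sense of what this composition does, here is a minimal sketch of the three chained service calls (illustrative only: the invoke helper and the operation names are hypothetical stand-ins for the Taverna/caGrid client machinery, while the endpoints are those listed above):

    def invoke(endpoint, operation, payload):
        # Hypothetical stand-in for a caGrid/WSRF service call;
        # in the actual workflow Taverna performs the invocation.
        raise NotImplementedError("plug in a real service client here")

    def microarray_clustering(query):
        # 1. Query and retrieve microarray data from the caArray data service.
        data = invoke("cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub",
                      "query", query)
        # 2. Normalize the data using the GenePattern analytical service.
        normalized = invoke("node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService",
                            "preprocess", data)
        # 3. Hierarchical clustering using the geWorkbench analytical service.
        return invoke("cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage",
                      "cluster", normalized)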

20

Infrastructure

Applications

21

Energy

Progress of adoption

22

Energy

Progress of adoption

$$ $$$$

23

Energy

Progress of adoption

$$ $$$$

24

[Chart: connectivity (on log scale) vs. time, with the “Science” sector labeled “Grid” and the “Enterprise” sector labeled “Cloud”]

“When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances”

(George Gilder, 2001)

25

26

27

US$3

28

Credit: Werner Vogels

29

Credit: Werner Vogels

30

Animoto EC2 image usage

[Chart: EC2 instance count rising from near 0 to ~4,000 between Day 1 and Day 8]

31

Software: Salesforce.com, Google, Animoto, …, …, caBIG, TeraGrid gateways

Platform

Infrastructure

32

Software: Salesforce.com, Google, Animoto, …, …, caBIG, TeraGrid gateways

Platform

Infrastructure: Amazon, GoGrid, Sun, Microsoft, …

33

Software: Salesforce.com, Google, Animoto, …, …, caBIG, TeraGrid gateways

Platform: Google, Microsoft, Amazon, …

Infrastructure: Amazon, GoGrid, Microsoft, Flexiscale, …

34

35

Dynamo: Amazon’s highly available key-value store (DeCandia et al., SOSP’07)

• Simple query model
• Weak consistency, no isolation
• Stringent SLAs (e.g., 300 ms for 99.9% of requests; peak 500 requests/sec)
• Incremental scalability
• Symmetry
• Decentralization
• Heterogeneity

Technologies used in Dynamo

Problem | Technique | Advantage
Partitioning | Consistent hashing | Incremental scalability
High availability for writes | Vector clocks with reconciliation during reads | Version size is decoupled from update rates
Handling temporary failures | Sloppy quorum and hinted handoff | Provides high availability and durability guarantee when some of the replicas are not available
Recovering from permanent failures | Anti-entropy using Merkle trees | Synchronizes divergent replicas in the background
Membership and failure detection | Gossip-based membership protocol and failure detection | Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information
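As an aside, here is a minimal sketch of the consistent-hashing idea behind the partitioning row above (illustrative only; the HashRing class and its method names are my own stand-ins, not Dynamo's implementation):

    import bisect
    import hashlib

    def _hash(key):
        # Map a key onto a large circular hash space.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class HashRing:
        """Toy consistent-hash ring with virtual nodes, for illustration only."""

        def __init__(self, nodes, vnodes=100):
            self._ring = []                      # sorted list of (position, node)
            for node in nodes:
                self.add_node(node, vnodes)

        def add_node(self, node, vnodes=100):
            # Adding a node only remaps the keys that fall into its new arcs,
            # which is what gives incremental scalability.
            for i in range(vnodes):
                bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

        def node_for(self, key):
            # Walk clockwise from the key's position to the next virtual node.
            pos = _hash(key)
            idx = bisect.bisect(self._ring, (pos, "")) % len(self._ring)
            return self._ring[idx][1]

    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.node_for("customer:42"))          # owning node for this key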

Using IaaS for elastic capacity

[Diagram: STAR nodes on a local cluster, extended with additional STAR nodes on Amazon EC2 provisioned via Nimbus]

Kate Keahey et al.
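A minimal sketch of the elastic-capacity pattern in the figure (illustrative only; local_free_slots, provision_cloud_workers, and submit are hypothetical hooks, not Nimbus or EC2 APIs):

    def elastic_submit(jobs, local_free_slots, provision_cloud_workers, submit):
        # jobs:                        list of work items to run
        # local_free_slots():          number of idle slots on the local cluster
        # provision_cloud_workers(n):  start n IaaS workers, return their ids
        # submit(job, worker):         dispatch one job to one worker
        pending = list(jobs)
        local = min(local_free_slots(), len(pending))
        overflow = len(pending) - local
        # Fill local capacity first, then provision only the overflow on IaaS.
        workers = [("local", i) for i in range(local)]
        workers += [("cloud", w) for w in provision_cloud_workers(overflow)]
        for job, worker in zip(pending, workers):
            submit(job, worker)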

38

Application: Service-oriented applications

Infrastructure: Service-oriented infrastructure

39

The Globus-based LIGO data grid

LIGO Gravitational Wave Observatory: replicating >1 Terabyte/day to 8 sites; >100 million replicas so far; MTBF = 1 month

[Map of sites, including Birmingham, Cardiff, and AEI/Golm]

40

Data replication service: pull “missing” files to a storage system

[Diagram: a list of required files goes to the Data Replication Service; data location via the Replica Location Index and Local Replica Catalogs; data movement via the Reliable File Transfer Service and GridFTP]

“Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System,” Chervenak et al., 2005
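A minimal sketch of the “pull missing files” pattern above (illustrative only; locate_replica and fetch are stand-ins for a Replica Location Index lookup and an RFT/GridFTP transfer, not the actual DRS APIs; local_catalog is assumed to be a set of logical file names already present):

    def find_missing(required, local_catalog):
        # Compare the list of required logical file names against the
        # local replica catalog; return those not yet present locally.
        return [lfn for lfn in required if lfn not in local_catalog]

    def replicate_missing(required, local_catalog, locate_replica, fetch):
        # locate_replica(lfn) -> a source URL (data location)
        # fetch(src, lfn)        copies the file (data movement)
        for lfn in find_missing(required, local_catalog):
            src = locate_replica(lfn)   # data location
            fetch(src, lfn)             # data movement
            local_catalog.add(lfn)      # record the new local replica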

41

Specializing further …

User → Service Provider: “Provide access to data D at S1, S2, S3 with performance P”

Service Provider → Resource Provider: “Provide storage with performance P1, network with P2, …”

[Diagram: data D replicated across sites S1, S2, S3, using a replica catalog, user-level multicast, …]

42

Using IaaS in biomedical informatics

[Diagram: “my servers” in Chicago, handle.net, and BIRN, with services moved onto an IaaS provider]

43

Clouds and supercomputers: Conventional wisdom?

                               Clouds/clusters   Supercomputers
Loosely coupled applications                     Too expensive
Tightly coupled applications   Too slow

44-47

[Figures from] Ed Walker, Benchmarking Amazon EC2 for high-performance scientific computing, ;login:, October 2008.

48-49

[Figures from] D. Nurmi, J. Brevik, R. Wolski, QBETS: Queue Bounds Estimation from Time Series, SIGMETRICS 2007, 379-380.

50

51

Clouds and supercomputers: Conventional wisdom?

                               Clouds/clusters           Supercomputers
Loosely coupled applications                             Too expensive
Tightly coupled applications   Good for rapid response

52

Loosely coupled problems

• Ensemble runs to quantify climate model uncertainty
• Identify potential drug targets by screening a database of ligand structures against target proteins
• Study economic model sensitivity to parameters
• Analyze turbulence dataset from many perspectives
• Perform numerical optimization to determine optimal resource assignment in energy problems
• Mine collection of data from advanced light sources
• Construct databases of computed properties of chemical compounds
• Analyze data from the Large Hadron Collider
• Analyze log data from 100,000-node parallel computations
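All of these share a many-task pattern: a large set of independent tasks farmed out to whatever cores are available. A minimal sketch of that pattern, using Python's standard process pool purely for illustration (the runs described on the later slides use Falkon and Swift, not this code):

    from concurrent.futures import ProcessPoolExecutor, as_completed

    def run_task(task_id):
        # Stand-in for one independent unit of work
        # (e.g., docking one ligand, or running one ensemble member).
        return sum(i * i for i in range(10_000)) * task_id

    def run_campaign(n_tasks, workers):
        results = []
        with ProcessPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(run_task, t) for t in range(n_tasks)]
            for fut in as_completed(futures):    # collect results as tasks finish
                results.append(fut.result())
        return results

    if __name__ == "__main__":
        print(len(run_campaign(n_tasks=1000, workers=8)))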

53

Many many tasks: Identifying potential drug targets

2M+ ligands x protein target(s)

(Mike Kubal, Benoit Roux, and others)

54

[Workflow diagram: start → docking → Amber scoring → GCMC → report → end]

Inputs: PDB protein descriptions; 1 protein (1 MB); ZINC 3-D structures, 2M structures (6 GB); DOCK6 and FRED receptor files (1 per protein: defines pocket to bind to; manually prepared); NAB script parameters (defines flexible residues, # MD steps); NAB script template

Docking with DOCK6 / FRED: ~4M x 60 s x 1 cpu → ~60K cpu-hrs; select best ~5K (intermediates: ligands, complexes)

Amber prep and score (1. AmberizeLigand, 2. AmberizeReceptor, 3. AmberizeComplex, 4. perl: gen nabscript / BuildNABScript, 5. RunNABScript): ~10K x 20 min x 1 cpu → ~3K cpu-hrs; select best ~500

GCMC: ~500 x 10 hr x 100 cpu → ~500K cpu-hrs

For 1 target: 4 million tasks, 500,000 cpu-hrs (50 cpu-years)
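A quick back-of-the-envelope check of those totals (the counts are the slide's own estimates; the arithmetic below just reproduces them):

    # Rough totals for one protein target, using the slide's estimates.
    dock_hours  = 4_000_000 * 60 / 3600     # DOCK6/FRED: ~4M tasks x 60 s        -> ~66,700 cpu-hrs (~60K)
    amber_hours = 10_000 * 20 / 60          # Amber: ~10K tasks x 20 min          -> ~3,300 cpu-hrs (~3K)
    gcmc_hours  = 500 * 10 * 100            # GCMC: ~500 runs x 10 hr x 100 cpu   -> 500,000 cpu-hrs

    total_hours = dock_hours + amber_hours + gcmc_hours
    print(f"~{total_hours:,.0f} cpu-hrs (~{total_hours / 8760:.0f} cpu-years)")
    # -> ~570,000 cpu-hrs, the same order as the slide's "500,000 cpu-hrs (50 cpu-years)"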

55

56

DOCK on BG/P: ~1M tasks on 118,000 CPUs

CPU cores: 118,784; tasks: 934,803; elapsed time: 7,257 s; compute time: 21.43 CPU-years; average task time: 667 s; relative efficiency: 99.7% (from 16 to 32 racks); utilization: 99.6% sustained, 78.3% overall

Per-task I/O:
• GPFS: 1 script (~5 KB), 2 file reads (~10 KB), 1 file write (~10 KB)
• RAM (cached from GPFS on first task per node): 1 binary (~7 MB), static input data (~45 MB)

[Chart: time (secs)]

Ioan Raicu, Zhao Zhang, Mike Wilde
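As a sanity check, the overall utilization figure follows directly from the numbers above (compute time divided by the core-time available during the run):

    cores       = 118_784
    elapsed_s   = 7_257
    compute_yrs = 21.43                                # CPU-years of useful compute, as reported

    available_yrs = cores * elapsed_s / 3600 / 8760    # total core-time available during the run
    print(f"available: {available_yrs:.2f} CPU-years")
    print(f"overall utilization: {compute_yrs / available_yrs:.1%}")  # ~78%, matching the reported 78.3%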

57

Managing 160,000 cores

Slower shared storage

High-speed local “disk”

Falkon

58

Scaling POSIX to petascale

[Diagram: a large dataset staged from the global file system to a CN-striped intermediate file system (Chirp multicast, MosaStore striping) over the torus and tree interconnects, then to local file systems (LFS) on compute nodes holding local datasets; tiers: staging, intermediate, local]

59

Efficiency for 4-second tasks and varying data size (1 KB to 1 MB) for CIO and GPFS, up to 32K processors

60

“Sine” workload, 2M tasks, 10MB:10ms ratio, 100 nodes, GCC policy, 50GB caches/node

Ioan Raicu

61

“Sine” workload, 2M tasks, 10MB:10ms ratio, 100 nodes, GCC policy, 50GB caches/node

Ioan Raicu

62

Same scenario, but with dynamic resource provisioning

63

Same scenario, but with dynamic resource provisioning

64

Data diffusion sine-wave workload: Summary

GPFS:    5.70 hrs, ~8 Gb/s,  1,138 CPU-hrs
DD+SRP:  1.80 hrs, ~25 Gb/s,   361 CPU-hrs
DD+DRP:  1.86 hrs, ~24 Gb/s,   253 CPU-hrs
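Put another way (simple ratios over the numbers above), data diffusion cuts time to solution by roughly 3x and CPU time by roughly 3x to 4.5x relative to GPFS:

    runs = {                  # hours, approx. throughput (Gb/s), CPU-hours
        "GPFS":   (5.70,  8, 1138),
        "DD+SRP": (1.80, 25,  361),
        "DD+DRP": (1.86, 24,  253),
    }
    base_time, _, base_cpu = runs["GPFS"]
    for name, (hrs, gbps, cpu_hrs) in runs.items():
        print(f"{name}: {base_time / hrs:.1f}x faster, {base_cpu / cpu_hrs:.1f}x fewer CPU-hours")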

65

Clouds and supercomputers:Conventional wisdom?

Good for rapid

response

Excellent

Clouds/clusters

Supercomputers

Loosely coupledapplications

Tightly coupledapplications

66

“The computer revolution hasn’t happened yet.”

Alan Kay, 1997

67

[Chart: connectivity (on log scale) vs. time, with sectors “Science” (Grid), “Enterprise” (Cloud), and “Consumer” (????)]

“When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances”

(George Gilder, 2001)

68

Energy Internet: The Shape of Grids to Come?

Computation Institute, www.ci.uchicago.edu

Thank you!