An Overview of Swift
Parallel scripting for distributed systems

Mike Wilde
[email protected]
Computation Institute, University of Chicago and Argonne National Laboratory
www.ci.uchicago.edu/swift


Acknowledgments

• The Swift effort is supported in part by NSF grants OCI-721939 and PHY-636265, NIH DC08638 and DA024304, the Argonne LDRD program, and the UChicago/Argonne Computation Institute.
• The Swift team:
  – Ben Clifford, Ian Foster, Mihael Hategan, Sarah Kenny, Mike Wilde, Zhao Zhang, Yong Zhao
• Java CoG Kit used by Swift developed by:
  – Mihael Hategan, Gregor Von Laszewski, and many others
• User-contributed workflows and application use:
  – I2U2 Project, U.Chicago Molecular Dynamics, U.Chicago Radiology and Human Neuroscience Lab, U.Maryland, Dartmouth Brain Imaging Center

Swift programs

• A Swift script is a set of functions
  – Atomic functions wrap and invoke application programs
  – Composite functions invoke other functions

• Data is typed as composable arrays and structures of files and simple scalar types (int, float, string)

• Collections of persistent file structures (datasets) are mapped into this data model

• Members of datasets can be processed in parallel

• Statements in a procedure are executed concurrently, in data-flow dependency order

• Variables are single assignment

• Provenance is gathered as scripts execute

A simple Swift script

type imagefile;

(imagefile output) flip(imagefile input) {
  app {
    convert "-rotate" "180" @input @output;
  }
}

imagefile stars <"orion.2008.0117.jpg">;
imagefile flipped <"output.jpg">;

flipped = flip(stars);

Parallelism via foreach { }

type imagefile;

(imagefile output) flip(imagefile input) {
  app {
    convert "-rotate" "180" @input @output;
  }
}

imagefile observations[ ] <simple_mapper; prefix="orion">;
imagefile flipped[ ] <simple_mapper; prefix="orion-flipped">;

foreach obs,i in observations {
  flipped[i] = flip(obs);
}

• Name outputs based on inputs
• Process all dataset members in parallel
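As a rough analogy only (this is Python, not SwiftScript), the foreach pattern above can be sketched with a thread pool. The `flip` function below is a stand-in that rotates a tiny in-memory grid rather than invoking `convert`, and the output-naming rule and file names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

Image = list[list[int]]


def flip(image: Image) -> Image:
    """Rotate a tiny in-memory 'image' by 180 degrees (stand-in for convert)."""
    return [row[::-1] for row in image[::-1]]


def output_name(input_name: str) -> str:
    """Derive the output name from the input name (assumed naming rule)."""
    return input_name.replace("orion", "orion-flipped", 1)


def flip_all(observations: dict[str, Image]) -> dict[str, Image]:
    """Apply flip to every dataset member in parallel."""
    names = list(observations)
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order, so names and results line up.
        flipped = list(pool.map(flip, (observations[n] for n in names)))
    return {output_name(n): img for n, img in zip(names, flipped)}


if __name__ == "__main__":
    data = {"orion.1.jpg": [[1, 2], [3, 4]], "orion.2.jpg": [[5, 6], [7, 8]]}
    print(flip_all(data))
```

The point of the sketch is that the loop body names no ordering between members, so each member can be processed independently.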

The Swift Scripting Model

• Program in a high-level, functional model
• Swift hides issues of location, mechanism and data representation
• Basic active elements are functions that encapsulate application tools and run jobs
• Typed data model: structures and arrays of files and scalar types
• Variables are single assignment
• Control structures perform conditional, iterative and parallel operations

The Variable model

• Single assignment:
  – A value can be assigned to a variable only once
  – This makes data-flow semantics much cleaner to specify, understand and implement
• Variables are scalars or references to composite objects
• Variables are typed
• File-typed variables are “mapped” to files
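To illustrate why single assignment keeps data-flow semantics simple, here is a minimal Python sketch of a write-once variable. It is an analogy, not how the Swift runtime is implemented; the class name and methods are hypothetical.

```python
import threading
from typing import Optional


class SingleAssignment:
    """A write-once variable: readers block until the value is assigned."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._ready = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def set(self, value) -> None:
        """Assign the value; a second assignment is an error."""
        with self._lock:
            if self._ready.is_set():
                raise RuntimeError(f"{self.name} is already assigned")
            self._value = value
            self._ready.set()

    def get(self, timeout: Optional[float] = None):
        """Wait until the value exists, then return it."""
        if not self._ready.wait(timeout):
            raise TimeoutError(f"{self.name} was never assigned")
        return self._value
```

Because a value never changes once set, any reader that obtains it can proceed without worrying about a later write, which is what lets consumers run as soon as their inputs exist.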

Swift statements

• Variable declarations
  – Can be mapped
• Type declarations
• Assignment statements
  – Assignments are type-checked
• Control-flow statements
  – if, foreach, iterate
• Function declarations


Data Flow Model

• This is what makes it possible to be location independent

• Computations proceed when data is ready (often not in source-code order)

• User specifies DATA dependencies, doesn’t worry about sequencing of operations

• Exposes maximal parallelism
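The idea that computations proceed when their data is ready can be sketched in Python with futures. This is an illustrative model under stated assumptions, not Swift's execution engine: the `run_dataflow` helper and the task-table format are hypothetical, and the dependency graph is assumed to be acyclic.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def run_dataflow(tasks: dict) -> dict:
    """Run tasks given as name -> (function, [names of inputs]).

    Every task is started immediately and blocks only on the futures of
    its own inputs, so execution order follows the data, not the order in
    which tasks are listed. Assumes the dependency graph has no cycles.
    """
    futures = {name: Future() for name in tasks}

    def run(name):
        func, deps = tasks[name]
        try:
            args = [futures[d].result() for d in deps]
            futures[name].set_result(func(*args))
        except Exception as exc:  # propagate failures to dependents
            futures[name].set_exception(exc)

    # One worker per task, so a task blocked on its inputs never starves
    # the task that produces them.
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        for name in tasks:
            pool.submit(run, name)
    return {name: f.result() for name, f in futures.items()}


if __name__ == "__main__":
    tasks = {
        "c": (lambda a, b: a + b, ["a", "b"]),  # listed first, runs last
        "a": (lambda: 1, []),
        "b": (lambda a: a * 10, ["a"]),
    }
    print(run_dataflow(tasks))
```

Here `c` is written first but completes last, because only the data dependencies determine sequencing.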

Data Management

• Directories and management model
  – local dir, storage dir, work dir
  – caching within workflow
  – reuse of files on restart
• Makes unique names for: jobs, files, workflows
• Can leave data on a site
  – For now, in Swift you need to track it
  – In Pegasus (and VDS) this is done automatically

Mappers and Vars

• Vars can be “file valued”

• Many useful mappers are built in, written in Java to the Mapper interface

• An “ext” (external) mapper can easily be written as an external script in any language

Example: fMRI Type Definitions

type Study { Group g[ ]; }
type Group { Subject s[ ]; }
type Subject { Volume anat; Run run[ ]; }
type Run { Volume v[ ]; }
type Volume { Image img; Header hdr; }

type Image {};
type Header {};
type Warp {};
type Air {};
type AirVec { Air a[ ]; }
type NormAnat { Volume anat; Warp aWarp; Volume nHires; }

Simplified version of fMRI AIRSN Program (Spatial Normalization)

Coding your own “external” mapper

awk <angle-spool-1-2 'BEGIN { server="gsiftp://tp-osg.ci.uchicago.edu//disks/ci-gpfs/angle"; }
{ printf "[%d] %s/%s\n", i++, server, $0 }'

$ cat angle-spool-1-2
spool_1/anl2-1182294000-dump.1.167.pcap.gz
spool_1/anl2-1182295800-dump.1.170.pcap.gz
spool_1/anl2-1182296400-dump.1.171.pcap.gz
spool_1/anl2-1182297600-dump.1.173.pcap.gz
…
$ ./map1 | head
[0] gsiftp://tp-osg.ci.uchicago.edu//disks/ci-gpfs/angle/spool_1/anl2-1182294000-dump.1.167.pcap.gz
[1] gsiftp://tp-osg.ci.uchicago.edu//disks/ci-gpfs/angle/spool_1/anl2-1182295800-dump.1.170.pcap.gz
[2] gsiftp://tp-osg.ci.uchicago.edu//disks/ci-gpfs/angle/spool_1/anl2-1182296400-dump.1.171.pcap.gz
…
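Since an external mapper can be a script in any language, the same mapping could also be written in Python. The sketch below mirrors the awk one-liner above: it reads paths from standard input and prints one `[index] url` line per path. The function name and the script name in the usage note are hypothetical.

```python
import sys
from typing import Iterable, Iterator

SERVER = "gsiftp://tp-osg.ci.uchicago.edu//disks/ci-gpfs/angle"


def map_lines(paths: Iterable[str], server: str = SERVER) -> Iterator[str]:
    """Yield one '[index] url' line per input path, like the awk script."""
    for index, path in enumerate(paths):
        name = path.rstrip("\n")  # drop the line terminator only
        yield f"[{index}] {server}/{name}"


if __name__ == "__main__":
    for line in map_lines(sys.stdin):
        print(line)
```

Saved as, say, `map1.py`, it would be run as `python map1.py < angle-spool-1-2`.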

fMRI Example Workflow

(Run resliced) reslice_wf ( Run r) {
  Run yR = reorientRun( r , "y", "n" );
  Run roR = reorientRun( yR , "x", "n" );
  Volume std = roR.v[1];
  AirVector roAirVec = alignlinearRun(std, roR, 12, 1000, 1000, "81 3 3");
  resliced = resliceRun( roR, roAirVec, "-o", "-k");
}

(Run or) reorientRun (Run ir, string direction, string overwrite) {
  foreach Volume iv, i in ir.v {
    or.v[i] = reorient (iv, direction, overwrite);
  }
}

Collaboration with James Dobson, Dartmouth

AIRSN Program Definition

(Run snr) functional ( Run r, NormAnat a, Air shrink ) {
  Run yroRun = reorientRun( r , "y" );
  Run roRun = reorientRun( yroRun , "x" );
  Volume std = roRun[0];
  Run rndr = random_select( roRun, 0.1 );
  AirVector rndAirVec = align_linearRun( rndr, std, 12, 1000, 1000, "81 3 3" );
  Run reslicedRndr = resliceRun( rndr, rndAirVec, "o", "k" );
  Volume meanRand = softmean( reslicedRndr, "y", "null" );
  Air mnQAAir = alignlinear( a.nHires, meanRand, 6, 1000, 4, "81 3 3" );
  Warp boldNormWarp = combinewarp( shrink, a.aWarp, mnQAAir );
  Run nr = reslice_warp_run( boldNormWarp, roRun );
  Volume meanAll = strictmean( nr, "y", "null" );
  Volume boldMask = binarize( meanAll, "y" );
  snr = gsmoothRun( nr, boldMask, "6 6 6" );
}

(Run or) reorientRun (Run ir, string direction) {
  foreach Volume iv, i in ir.v {
    or.v[i] = reorient(iv, direction);
  }
}

Automated image registration for spatial normalization

[Figure: two graphs. "AIRSN workflow" shows the stages reorientRun, reorientRun, random_select, alignlinearRun, resliceRun, softmean, alignlinear, combinewarp, reslice_warpRun, strictmean, binarize, gsmoothRun. "AIRSN workflow expanded" shows the individual numbered task nodes, for example reorient/01, alignlinear/03, reslice/04, softmean/13, combinewarp/21, reslice_warp/22, strictmean/39, binarize/40, gsmooth/41.]

Collaboration with James Dobson, Dartmouth [SIGMOD Record Sep05]


Workflow Coding/Style Issues

• Swift is not a data manipulation language - use scripting tools for that

• Expose or hide parameters

• One atomic, many variants

• Expose or hide program structure

• Driving a parameter sweep with readdata() - reads a csv file into struct[].

• Can code your own mappers
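The parameter-sweep bullet describes reading a CSV file into an array of structures. As a language-neutral illustration of that idea (in Python, not SwiftScript, and not the `readdata()` implementation), each CSV row can become one typed record. The record type and its column names are assumptions chosen for the example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class SweepPoint:
    """One row of the sweep; the field names are illustrative assumptions."""
    temperature: float
    steps: int


def read_sweep(text: str) -> list[SweepPoint]:
    """Parse CSV text with a header row into a list of typed records."""
    reader = csv.DictReader(io.StringIO(text))
    return [SweepPoint(float(r["temperature"]), int(r["steps"])) for r in reader]


if __name__ == "__main__":
    table = "temperature,steps\n300.0,1000\n310.5,2000\n"
    for point in read_sweep(table):
        print(point)
```

Each record would then drive one invocation of the application, so the sweep size is set by the data file rather than by the script.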

Swift Architecture

[Figure: Swift architecture diagram. Stage labels: Specification, Scheduling, Execution, Provisioning. Component labels: SwiftScript, abstract computation, Virtual Data Catalog, SwiftScript compiler, execution engine (Karajan w/ Swift runtime), Swift runtime callouts, status reporting, virtual node(s), worker nodes, launchers, applications AppF1 and AppF2, files file1, file2 and file3, provenance data, provenance collector, Falkon resource provisioner, Amazon EC2.]

Running Swift

• Fully contained Java grid client

• Can test on a local machine

• Can run on a PBS cluster

• Runs on multiple clusters over Grid interfaces

Using Swift

[Figure: the swift command is shown with a SwiftScript, a site list, an app list, applications (App a1, App a2) and data files (f1, f2, f3); launchers run App a1 and App a2 on worker nodes; workflow status and logs and provenance data are also shown.]

Site selection and throttling

• Avoid overloading target infrastructure

• Base resource choice on current conditions and on the response each site is actually giving you

• Balance this with space availability

• Things are getting more automated.
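One simple way to picture throttling is a cap on the number of jobs in flight at once. The Python sketch below uses a semaphore for that; it illustrates the general technique only and is not Swift's throttling mechanism. The `run_throttled` helper is hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def run_throttled(jobs: Sequence[Callable[[], object]], limit: int):
    """Run jobs concurrently with at most `limit` in flight at once.

    Returns (results in job order, peak number of simultaneous jobs).
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    gate = threading.Semaphore(limit)
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def guarded(job):
        with gate:  # blocks while `limit` jobs are already running
            with lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            try:
                return job()
            finally:
                with lock:
                    state["active"] -= 1

    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        results = list(pool.map(guarded, jobs))
    return results, state["peak"]
```

A real scheduler would adjust the limit per site from observed conditions; here it is a fixed argument to keep the sketch small.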


Clustering and Provisioning

• Can cluster jobs together to reduce grid overhead for small jobs

• Can use a provisioner

• Can use a provider to go straight to a cluster
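Clustering, in the sense of grouping small jobs so that each submission carries several of them, can be sketched as simple batching. This Python example assumes fixed-size clusters, which is a simplification; the function name and job labels are hypothetical.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def cluster_jobs(jobs: Sequence[T], cluster_size: int) -> Iterator[list[T]]:
    """Yield consecutive groups of at most `cluster_size` jobs."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    for start in range(0, len(jobs), cluster_size):
        yield list(jobs[start:start + cluster_size])


if __name__ == "__main__":
    for batch in cluster_jobs(["j1", "j2", "j3", "j4", "j5"], 2):
        print(batch)
```

With five jobs and a cluster size of two, three submissions replace five, so the per-submission overhead is paid fewer times.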

Research topic: Collective I/O for Petascale Workflow

• Goal: efficient I/O at petascale for loosely coupled programming
• Approach
  – Distribute input data to local filesystems
  – Create fast local filesystems distributed throughout the cluster
  – Asynchronously batch output data in a pipelined manner to later stages and persistent storage
• Work is in early stages

Abstract architecture

[Figure: diagram with a Global FS, a workload processor, an intermediate filesystem and I/O processor (IFS), and compute nodes each with a local filesystem (LFS), joined by an interconnect.]

Application Dataflow Patterns

[Figure: groups of App invocations shown with common input datasets, per-invocation datasets, and a large output dataset.]

LCP Collective IO Model

[Figure: diagram with an application script, Global FS, ZOID on the I/O node, a ZOID IFS for staging, torus and tree interconnects, a CN-striped IFS for data made of IFS segments on IFS compute nodes, compute nodes with local datasets on LFS, and a large input dataset.]

Distributor and collector

[Figure: a distributor and a collector shown between the Global FS, IFS nodes, and compute nodes (CN) with LFS; the diagram carries step labels 1 through 5.]

Mapping Compute Nodes to IFSs

[Figure: rows of compute nodes (CN), each with an LFS, on a torus interconnect, together with IFS instances.]

Asynchronous Output Staging

[Figure: timeline of repeated "compute" and "out" phases, with outputs going "to GFS" and to the "next task".]

Getting Started

• www.ci.uchicago.edu/swift
  – Documentation -> tutorials
• Get CI accounts
  – https://www.ci.uchicago.edu/accounts/
    • Request: workstation, gridlab, teraport
• Get a DOEGrids Grid Certificate
  – http://www.doegrids.org/pages/cert-request.html
    • Virtual organization: OSG / OSGEDU
    • Sponsor: Mike Wilde, [email protected], 630-252-7497
• Develop your Swift code and test locally, then:
  – On PBS / TeraPort
  – On OSG: OSGEDU
• Use simple scripts (Perl, Python) as your test apps

Summary

• Clean separation of logical/physical concerns
  – XDTM specification of logical data structures
+ Concise specification of parallel programs
  – SwiftScript, with iteration, etc.
+ Efficient execution (on distributed resources)
  – Karajan+Falkon: Grid interface, lightweight dispatch, pipelining, clustering, provisioning
+ Rigorous provenance tracking and query
  – Records provenance data of each job executed

Improved usability and productivity
  – Demonstrated in numerous applications

For Swift Information

• www.ci.uchicago.edu/swift
  – Quick Start Guide:
    • http://www.ci.uchicago.edu/swift/guides/quickstartguide.php
  – User Guide:
    • http://www.ci.uchicago.edu/swift/guides/userguide.php
  – Introductory Swift Tutorials:
    • http://www.ci.uchicago.edu/swift/docs/index.php