Exploring Scientific Discovery with Large-Scale Parallel Scripting

Tim Armstrong (1), Justin M. Wozniak (2), Michael Wilde (1, 2)

(1) University of Chicago, (2) Argonne National Laboratory

May 15, 2013

Slides: people.cs.uchicago.edu/~tga/pubs/scale-2013-slides.pdf

Parallel Scripting with Swift/T SciColSim Application Scaling up SciColSim with Swift/T

Overview

Parallel scripting: massive scalability with (relative) ease

• Scaling up real science applications is difficult:
  • Must adapt code to radically different programming model
  • Concurrency bugs
  • Load balancing, data management, etc.

• SciColSim: compute-intensive science app

• Swift/T: super-scalable high-performance scripting system for parallel composition of existing code

The Scripting Paradigm

• Low-level language (e.g. C) + high-level language (e.g. Python)

[Figure: a high-level script orchestrates optimized performance-critical functions]

Parallel Scripting

• Can retrofit parallelism onto sequential scripting languages:
  • Threads
  • Message passing (MPI, etc.)
  • Abstractions (MapReduce, etc.)

• But parallelism is a second-class concept in the language...

• Q: Why can’t I express parallelism with loops, conditionals, variables, etc.?

Parallel Scripting in Swift/T

• Q: Why can’t I express parallelism with loops, conditionals, variables?

• A: You can in Swift!

• The Swift parallel scripting language [WFI+09]:
  • Implicit dataflow parallelism
  • Language statements execute concurrently in dataflow order
  • Single-assignment variables guarantee determinism
  • Determinism extends to additional, rich data structures: arrays, hash tables, structs

    float results[];
    file data = input_file("my.data");

    foreach i in [1:N] {                  // Independent parallel iterations
      if (predicate(i)) {                 // Dataflow dependencies
        results[i] = compute(i, data);
      }
    }
    mean, stdev = stat_summ(results);

Swift code with implied parallel dataflow
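
For comparison, the same pattern can be approximated in a conventional scripting language with explicit futures. This is only a rough analogy and not how Swift/T executes: `predicate` and `compute` are hypothetical stand-ins, a plain number replaces the input file, and the population standard deviation is an assumed choice for the summary. The thread pool and the explicit `result()` calls are exactly the bookkeeping that Swift infers from data dependencies.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev


def predicate(i):
    """Hypothetical filter: keep even indices only."""
    return i % 2 == 0


def compute(i, data):
    """Hypothetical stand-in for an expensive leaf function."""
    return float(i) * data


def run(n, data):
    """Launch independent iterations, then summarize once all results exist."""
    with ThreadPoolExecutor() as pool:
        # Each selected iteration becomes an independent task.
        futures = {
            i: pool.submit(compute, i, data)
            for i in range(1, n + 1)
            if predicate(i)
        }
        # The summary depends on every result, so wait for all of them here.
        results = {i: fut.result() for i, fut in futures.items()}
    values = [results[i] for i in sorted(results)]
    return mean(values), pstdev(values)
```

Calling `run(4, 1.0)` evaluates iterations 2 and 4 concurrently and then summarizes the two values.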

Swift/T Scalable Implementation [WAW+]

• Can harness tens or hundreds of thousands of cores

• All runtime components distributed and scalable: data store, task distributor & script executor

• Optimizing compiler (stc) reduces messaging

[Figure: Swift/T runtime services breakdown (left) and task dispatch (right). Diagram labels: Shared State, Data Store, Task Queue, Server Processes, Execution, Control/Worker Processes, Control Flow, Task Execution, Load Balancing, Rule Engine, Server. Legend: process, task flow.]

SciColSim Application: Simulating Scientific Discovery

• Ongoing research at University of Chicago

• Want to understand process of scientific discovery [ER10]:
  • How do scientists select hypotheses to work on?
  • What are the most effective strategies?

• Can explore with simulation:
  • Model knowledge as graph of concepts
  • Simulate different graph exploration strategies
  • Can measure how “efficient” a strategy is

• Computational characteristics:
  • Each simulation implemented with sequential C++ code
  • Floating point intensive: many probability calculations

Evaluating model parameters

• “Ensemble” of randomized simulations

• Results of the simulations are averaged to evaluate “goodness” of current parameters

• Task duration is 0.2-20s. Runtime depends on input parameters, plus significant random variation.

[Figure: Evaluating objective function and updating parameters. For each parameter set (i, i + 1, i + 2), an ensemble of randomized simulations runs, followed by a step that analyzes the results and chooses new parameters. Legend: task, dataflow variable, data dependency.]
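
A minimal sketch of the ensemble evaluation, assuming a hypothetical `simulate` function in place of the C++ simulation and an arbitrary ensemble size of 8. Each ensemble member gets its own seed, the members run concurrently, and their outcomes are averaged into one "goodness" value for the parameter set.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def simulate(param, seed):
    """Hypothetical stand-in for one randomized simulation run."""
    rng = random.Random(seed)
    return param + rng.uniform(-1.0, 1.0)


def evaluate(param, n_sims=8, base_seed=0):
    """Average an ensemble of randomized simulations for one parameter set."""
    seeds = [base_seed + k for k in range(n_sims)]
    with ThreadPoolExecutor() as pool:
        # Ensemble members are independent, so they can run in parallel.
        outcomes = list(pool.map(lambda s: simulate(param, s), seeds))
    # Fan-in: the average is the objective value for this parameter set.
    return mean(outcomes)
```

Because every member is seeded explicitly, repeated calls with the same arguments give the same average.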

Simulated Annealing

• Want to find “best” set of simulation parameters

• Optimize using a simulated annealing algorithm

• Basic idea:
  1. Perturb one parameter
  2. Evaluate objective function for current parameters
  3. Depending on result, maybe undo parameter change
  4. Repeat...

[Figure: Visualization of parallel simulated annealing with 8-way parallelized objective function: 10x independent simulated annealing instances, each performing 500-1000x parameter updates. Real runs have 1000-way parallelism.]
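
The four steps can be sketched as follows. The slides do not give the acceptance rule or cooling schedule, so this sketch assumes the standard Metropolis criterion with geometric cooling, and it assumes a lower objective value is better. All names and default values are hypothetical.

```python
import math
import random


def anneal(objective, params, n_updates=500, temp=1.0, cooling=0.99,
           step=0.5, seed=0):
    """Minimize objective(params) with a basic simulated annealing loop."""
    rng = random.Random(seed)
    params = list(params)
    cost = objective(params)
    best, best_cost = list(params), cost
    for _ in range(n_updates):
        # 1. Perturb one randomly chosen parameter.
        k = rng.randrange(len(params))
        old = params[k]
        params[k] = old + rng.uniform(-step, step)
        # 2. Evaluate the objective function for the current parameters.
        new_cost = objective(params)
        # 3. Keep improvements; keep a worse result only with a
        #    temperature-dependent probability, otherwise undo the change.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = list(params), cost
        else:
            params[k] = old
        # 4. Cool down and repeat.
        temp *= cooling
    return best, best_cost
```

In the application described here, step 2 would be the expensive ensemble evaluation, which is where the parallelism comes from.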

Scale-up requirements

• Optimization + validation: 0.25–0.5M CPU-hours per model

• Fast feedback needed: scientists want to iterate models

• Need to get high speedup: 4000x+ to get timely results

• Relatively short-lived tasks: 0.2s-20s. Fan-out and fan-in every 1-2 minutes.

• Unpredictable task duration: need to dynamically assign tasks to processors, in a scalable way

• High-performance dynamic task allocation mandatory
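
To illustrate why unpredictable task durations call for dynamic allocation, the sketch below compares a fixed round-robin assignment with an idealized dynamic scheme in which each task goes to whichever worker becomes free first. It is a toy model with made-up durations, not Swift/T's actual load balancer, and it ignores dispatch overhead.

```python
import heapq


def static_makespan(durations, n_workers):
    """Finish time when tasks are assigned round-robin before any task runs."""
    loads = [0] * n_workers
    for i, d in enumerate(durations):
        loads[i % n_workers] += d
    return max(loads)


def dynamic_makespan(durations, n_workers):
    """Finish time when each task goes to the worker that frees up first."""
    free_at = [(0, w) for w in range(n_workers)]
    heapq.heapify(free_at)
    for d in durations:
        t, w = heapq.heappop(free_at)       # earliest available worker
        heapq.heappush(free_at, (t + d, w))
    return max(t for t, _ in free_at)


# Hypothetical durations in tenths of a second: a mix of 20 s and 0.2 s tasks.
durations = [200, 2, 2, 2, 200, 2, 2, 2]
```

With two workers, round-robin places both long tasks on the same worker and finishes at 404 (40.4 s), while the dynamic scheme finishes at 206 (20.6 s).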

Adapting for Swift/T

• Kept compute-intensive simulation logic in C++

• Converted simulated annealing algorithm to Swift/T:
  • Nested parallel loops
  • Sequential iteration
  • Logic and formulas to update parameters
  • Logging and output

                 Original                Swift/T Version
Lines of Code    Python: 33 lines        Swift/T: 269 lines
                 C++: 1175 lines         C++: 861 lines
Scalability      One node, many cores    Many cores, 100’s or 1000’s of nodes

Scaling up!

• Strong scaling results for production workload at different compiler optimization levels: scales well!

• Mainly limited by amount of parallelism in workload ⇒ could scale further with different optimization algorithm

• STC compiler optimization: reduces messaging ⇒ better scaling

[Figure: Strong scaling for the down-scaled problem at different STC optimization levels (left) and for the full-scale problem (right). Both panels plot iterations/sec against cores. Left panel: series O0, O1, O2, O3 and Ideal, up to 4000 cores. Right panel: measured iters/sec and Ideal, up to 8000 cores.]

Task Prioritization

• Key technique enabled by Swift/T: task prioritization

• Improves resource utilization and time-to-solution

• Exploits application knowledge:
  • “Catch-up” heuristic for slower optimization chains
  • Prioritize long-running tasks: target parameter correlated with runtime

    @prio=100*(niters - iter) + target
    run_simulation(...);

[Figure: busy cores versus time (seconds, 1200 to 1600), comparing runs with priorities and without priorities]
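
The effect of the priority formula can be sketched with an ordinary priority queue. This assumes that a larger value means the task is dispatched earlier, which is what the catch-up heuristic implies: chains with fewer completed iterations get larger values. The task tuple layout and function names are hypothetical.

```python
import heapq


def priority(niters, it, target):
    """Priority formula from the slide: 100*(niters - iter) + target."""
    return 100 * (niters - it) + target


def dispatch_order(tasks, niters):
    """Return task names from highest to lowest priority.

    tasks: list of (name, iteration, target) tuples (hypothetical layout).
    """
    # heapq is a min-heap, so negate the priority to pop the largest first.
    heap = [(-priority(niters, it, target), name) for name, it, target in tasks]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

For example, with `niters = 10`, a chain at iteration 1 outranks a chain at iteration 3, and among chains at the same iteration the one with the larger target (the likely longer-running task) goes first.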

References

[ER10] J. Evans and A. Rzhetsky, Machine science, Science 329 (2010), no. 5990.

[WAW+] J. M. Wozniak, T. G. Armstrong, M. Wilde, D. S. Katz, E. Lusk, and I. T. Foster, Swift/T: Large-scale application composition via distributed-memory data flow processing, Proc. CCGrid ’13.

[WFI+09] M. Wilde, I. Foster, K. Iskra, P. Beckman, Z. Zhang, A. Espinosa, M. Hategan, B. Clifford, and I. Raicu, Parallel scripting for applications at the petascale and beyond, Computer 42 (2009), no. 11.

Acknowledgements: This research is supported in part by the U.S. DOE Office of Science under contract DE-AC02-06CH11357, FWP-57810. This research was supported in part by NIH through resources provided by the Computation Institute and the Biological Sciences Division of the University of Chicago and Argonne National Laboratory, under grant S10 RR029030-01.

Demo

• Compile application from scratch to illustrate toolchain

• Production-scale run of SciColSim on 8400 cores of the Beagle Cray XE6 supercomputer @ UChicago

Conclusions

• Can scale up existing applications with parallel scripting

• Quick development cycle: easy to debug and modify code, compared with alternative cluster programming models

• Appropriate for applications that can be implemented as user-defined tasks with explicit data dependences

• Much better for moderately fine-grained workloads on large clusters than traditional centralized workflow systems

• Does not support wide-area grids/clouds (yet)

Task Dispatch Speed

• Cray XE6

• On 10 nodes, 24 cores per node

• Many independent zero-duration (0 s) tasks

[Figure: bar chart of work tasks/s (millions, axis 0.0 to 1.4) for ADLB and for Swift/T at O0, O1, O2 and O3]

Scaling up to 10^5

• Experiment on Blue Gene/P Intrepid at Argonne National Lab

• 100s task durations

• Experiment used old version of Swift/T. Many improvements since.

Optimizations

• O0: Only optimize write reference counts.

• O1: Basic optimizations: constant folding, dead code elimination, forward data flow, and loop fusion.

• O2: More aggressive optimizations: asynchronous op expansion, wait coalescing, hoisting, and small loop expansion.

• O3: All optimizations: function inlining, pipeline fusion, loop unrolling, intra-block instruction reordering, and simple algebra.
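
As a generic illustration of one of the listed techniques, constant folding, the sketch below rewrites a Python expression tree so that arithmetic on literal operands is computed ahead of time. It uses Python's `ast` module purely for demonstration and says nothing about how STC implements the optimization; the class and function names are hypothetical.

```python
import ast
import operator

# Operators this sketch knows how to fold.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    """Replace arithmetic on two numeric literals with its computed value."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        op = _OPS.get(type(node.op))
        if (op is not None
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.left.value, (int, float))
                and isinstance(node.right.value, (int, float))):
            folded = ast.Constant(value=op(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


def fold(source):
    """Parse an expression and return its tree with constants folded."""
    tree = ConstantFolder().visit(ast.parse(source, mode="eval"))
    ast.fix_missing_locations(tree)
    return tree
```

Folding `2 * 3 + y` leaves a single addition of the literal 6 and the variable `y`, so the multiplication no longer happens at run time.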

Comparison with Other Systems

• Hadoop
  • Fixed communication pattern
  • Must reorganize code to fit MapReduce model
  • Minimize per-data-item overhead versus minimize per-task overhead

• Swift and other workflow systems
  • Single master node limits scalability
  • Optimizing compiler
  • Better foreign-function interface for directly calling C++ code
  • No support yet for wide-area systems

Comparison with PGAS

• Scripting paradigm versus one language for computation + coordination

• Focus on simplicity

• No explicit data placement: managed by runtime

• Strong safety guarantees (e.g. determinism)