Many-Task Applications in the Integrated Plasma Simulator

Samantha S. Foley, Wael R. Elwasif, David E. Bernholdt, Aniruddha G. Shet (Oak Ridge National Laboratory)
Randall Bramley (Indiana University)



Motivation

- Computational science is moving from single SPMD codes to loosely coupled MPMD applications
- MPMD viewed through a many-task computing (MTC) paradigm:
  - Some degree of data and task coupling
  - Varying parallelism and runtime between tasks
  - Modest number of tasks, executed in a time-stepped style
- Mismatches in runtime and parallelism, together with dependencies between tasks, lead to poor load balancing

Nov. 15, 2010 2 MTAGS - SC10

The Integrated Plasma Simulator (IPS)

- The Integrated Plasma Simulator (IPS) is a component framework for fusion energy simulation for the Center for Simulation of RF Wave Interactions with Magnetohydrodynamics (SWIM)
- One of three US DOE SciDAC 2 projects to explore integrated fusion simulation
- Primary directive: “Explore the targeted coupled physics interactions while constituent codes evolve independently, minimizing impact on long lived codes and other research/production use”.
  - Code refactoring and/or rewriting ruled out.


IPS Landscape

- Existing physics codes
- Little prior experience with coupling in the fusion community
- Loose coupling and modest data communication
- Target platforms are leadership-class facilities (Cray)


IPS: Architecture

[Figure: architecture diagram. Labels: Physics App., Component Adapter, State Adapter, Plasma State, Framework, Framework Services]

Solution: evolutionary development of a lightweight Python framework that leaves the underlying codes unchanged, provides a flexible execution environment, and supports loosely coupled simulation composition strategies with file-based data coupling.

[Figure: execution diagram. Labels: Tasks (Parallel Physics Codes), Resource Manager, Task Manager, Batch Allocation, Head Node, IPS Framework]

IPS: Levels of Parallelism

[Figure: Simulation A and Simulation B, each with a Driver and components Comp 1, Comp 2, Comp 3, within one Framework]

1. Tasks are parallel codes
2. Tasks of a single component can run concurrently
3. Tasks of multiple components can run concurrently
4. Multiple simulations can run concurrently within the same batch allocation and framework instance

These levels of parallelism can be used to improve resource utilization efficiency.

Resource Manager (RM) and Task Manager (TM) in the IPS

[Figure: Simulation A (Driver, Comp 1, Comp 2, Comp 3), RM, TM, Batch Allocation, Queue of Tasks]
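The division of labor between a resource manager and a task manager can be illustrated with a minimal Python sketch. It is not the IPS implementation: the class names, the methods `allocate`, `release`, `submit` and `launch_ready`, and the scan-the-queue-in-order launch policy are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str   # hypothetical label, e.g. the physics code being launched
    cores: int  # cores the task holds for its whole run


class ResourceManager:
    """Tracks how many cores of the batch allocation are free."""

    def __init__(self, total_cores):
        self.total_cores = total_cores
        self.free_cores = total_cores

    def allocate(self, cores):
        """Reserve cores if enough are free; report whether it worked."""
        if cores > self.free_cores:
            return False
        self.free_cores -= cores
        return True

    def release(self, cores):
        """Return cores to the pool when a task finishes."""
        self.free_cores = min(self.total_cores, self.free_cores + cores)


class TaskManager:
    """Holds the queue of tasks and launches those that currently fit."""

    def __init__(self, rm):
        self.rm = rm
        self.queue = deque()

    def submit(self, task):
        self.queue.append(task)

    def launch_ready(self):
        """Scan the queue once, in order, and start every task that fits."""
        launched = []
        for _ in range(len(self.queue)):
            task = self.queue.popleft()
            if self.rm.allocate(task.cores):
                launched.append(task)    # a real framework would spawn it here
            else:
                self.queue.append(task)  # keeps waiting; relative order is kept
        return launched


rm = ResourceManager(total_cores=16)
tm = TaskManager(rm)
for task in (Task("TORIC", 4), Task("NUBEAM", 16), Task("TSC", 1)):
    tm.submit(task)
print([t.name for t in tm.launch_ready()], "free cores:", rm.free_cores)
```

In this sketch a task that does not fit simply stays queued until `release` frees enough cores, which is one simple way the queue of tasks from several simulations could be interleaved on one allocation.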

Resource Usage Simulator (RUS)

- We created RUS to examine the resource utilization and efficiency of IPS simulations
  - Accurately simulates task and resource management in the IPS
  - Random variation of task execution times
- RUS provides the ability to examine how the multiple levels of parallelism and the characteristics of the tasks interact
  - Focus on the multiple-simulations capability
- Ultimately, this tool will be used to inform how IPS simulations can be configured with respect to resource efficiency
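To make the idea of such a simulator concrete, here is a small discrete-event sketch in Python. It is not RUS itself. It assumes that each simulation runs its tasks strictly one after another, that the queue is scanned in order and every task that fits is started, and that run times are drawn uniformly from the mean plus or minus the stated spread. The names `Task` and `simulate` are hypothetical, and the task values are the TNT scenario figures quoted elsewhere in these slides.

```python
import heapq
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    cores: int
    mean_time: float  # seconds
    spread: float     # run time is drawn uniformly from mean_time +/- spread


def simulate(tasks, n_sims, total_cores, steps_per_sim, seed=0, vary=True):
    """Return resource efficiency: busy core-seconds / (cores * makespan)."""
    if n_sims < 1 or steps_per_sim < 1:
        raise ValueError("need at least one simulation and one step")
    if any(t.cores > total_cores for t in tasks):
        raise ValueError("a task needs more cores than the allocation has")
    rng = random.Random(seed)
    tasks_per_sim = steps_per_sim * len(tasks)
    launched = [0] * n_sims        # tasks each simulation has started so far
    waiting = list(range(n_sims))  # simulations whose next task is ready
    running = []                   # min-heap of (finish_time, sim, cores)
    free, clock, busy = total_cores, 0.0, 0.0

    while waiting or running:
        # Start every ready task that fits, scanning the queue in order.
        blocked = []
        for sim in waiting:
            task = tasks[launched[sim] % len(tasks)]
            if task.cores <= free:
                jitter = rng.uniform(-task.spread, task.spread) if vary else 0.0
                duration = task.mean_time + jitter
                free -= task.cores
                busy += task.cores * duration
                heapq.heappush(running, (clock + duration, sim, task.cores))
                launched[sim] += 1
            else:
                blocked.append(sim)
        waiting = blocked

        # Advance the clock to the next completion and free its cores.
        clock, sim, cores = heapq.heappop(running)
        free += cores
        if launched[sim] < tasks_per_sim:
            waiting.append(sim)    # this simulation's next task is now ready

    return busy / (total_cores * clock)


TNT = [Task("TORIC", 4, 97, 2), Task("NUBEAM", 16, 115, 15), Task("TSC", 1, 130, 40)]
for n in (1, 2, 4):
    print(n, "simulation(s):", round(simulate(TNT, n, total_cores=16, steps_per_sim=8), 2))
```

Passing `vary=False` removes the random variation, which makes it easier to check the bookkeeping by hand before looking at randomized runs.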


SWIM Scenarios

- TNT Scenario
  - TORIC: 4 processes, 97 ± 2 seconds
  - NUBEAM: 16 processes, 115 ± 15 seconds
  - TSC: 1 process, 130 ± 40 seconds
- ANT Scenario
  - AORSA: 1024 processes, 1020 ± 5 seconds
  - NUBEAM: 512 processes, 1020 ± 300 seconds
  - TSC: 1 process, 130 ± 40 seconds

[Figure: two task diagrams, one per scenario (tasks labeled T, T, N and T, A, N), with axes Time and Cores]

Multiple Simulation Task Interleaving

- Single simulation
  - 43% resource efficiency
  - 8 steps completed
- Two simulations
  - 64% resource efficiency
  - 12 total steps
- Four simulations
  - 86% resource efficiency
  - 16 total steps
- More physics can be done in the same time and with the same resources using the MTC capability
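These percentages can be approximated with simple arithmetic. The sketch below assumes the figures refer to the TNT scenario on a 16-core allocation (the configuration labeled on the TNT utilization plot), that the three tasks of a step run back to back, and that mean run times are used. Under those assumptions it gives values close to the 43%, 64% and 86% reported above.

```python
# Mean values for the TNT scenario: name -> (processes, seconds).
TNT = {"TORIC": (4, 97), "NUBEAM": (16, 115), "TSC": (1, 130)}
ALLOCATION = 16  # cores; assumed from the "16 cores" label on the TNT plot

# Core-seconds of useful work in one time step of one simulation.
work_per_step = sum(procs * secs for procs, secs in TNT.values())

# Wall-clock time of one step when its three tasks run back to back.
step_time = sum(secs for _, secs in TNT.values())

# Efficiency = useful core-seconds / core-seconds held by the allocation.
single = work_per_step / (ALLOCATION * step_time)
print(f"single simulation: {single:.1%}")

# In a fixed time window on a fixed allocation, efficiency grows in
# proportion to the steps completed (8 steps for the single simulation).
for total_steps in (12, 16):
    print(f"{total_steps} steps: {single * total_steps / 8:.1%}")
```

The single simulation is inefficient mainly because the 4-process and 1-process tasks leave most of the 16 cores idle while they run. Interleaving fills that idle capacity with tasks from other simulations.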


Resource Utilization - TNT

[Figure: resource efficiency and average time per simulation for the TNT scenario. Annotation: 16 cores, 4 simulations, 86% efficiency. Inset task diagram (T 4p, N 16p, T 1p) with axes Time and Cores]

Resource Utilization - ANT

[Figure: resource efficiency and average time per simulation for the ANT scenario. Inset task diagram (T 1p, A 1024p, N 512p) with axes Time and Cores]

- >90% efficiency achievable for all multi-simulation cases
- Peak efficiencies occur at multiples of the cores needed to run each task
  - E.g., 1540 cores allows 1 instance of each task to run concurrently
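The 1540-core example can be checked against the ANT process counts. The sketch below reads "the cores needed to run each task" as the sum over one instance of every task, which is an interpretation of the slide rather than a stated definition. The helper `concurrent_sets` is hypothetical. The raw sum is 1537, slightly below the 1540 quoted above. The slides do not explain the difference.

```python
ANT = {"AORSA": 1024, "NUBEAM": 512, "TSC": 1}  # processes per task


def concurrent_sets(allocation, procs_per_task):
    """How many full sets (one instance of every task) fit at the same time."""
    return allocation // sum(procs_per_task.values())


one_set = sum(ANT.values())
print("cores for one instance of each task:", one_set)
for allocation in (1540, 3080):
    print(allocation, "cores ->", concurrent_sets(allocation, ANT), "set(s)")
```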

Study of Resource Utilization Trends

- Using RUS, we examine the resource utilization efficiency of variations in SWIM workloads
  - What happens to resource utilization when multiple instances of the same simulation execute concurrently?
  - What happens to resource utilization when the time or parallelism of the tasks is varied?
- We performed four studies on the two scenarios:
  1. Time scaling of TSC
  2. Time scaling of NUBEAM
  3. Weak parallel scaling of NUBEAM
  4. Strong parallel scaling of NUBEAM
- The following graphs show the highest peak for a given number of simulations versus the experiment variation (time or parallelism)


Scaling Trends

[Figure: scaling trend plot. Inset task diagram (T 4p, N 16p, T 1p) with axes Time and Cores]

Time Scaling of TSC

[Figure: two panels, TNT and ANT. Inset task diagrams (TNT: T 4p, N 16p, T 1p; ANT: T 1p, A 1024p, N 512p) with axes Time and Cores]

Time Scaling of NUBEAM

[Figure: two panels, TNT and ANT. Inset task diagrams (TNT: T 4p, N 16p, T 1p; ANT: T 1p, A 1024p, N 512p) with axes Time and Cores]

Weak Scaling of NUBEAM

Weak scaling = increase work, increase parallelism, same runtime

[Figure: two panels, TNT and ANT. Inset task diagrams (TNT: T 4p, N 16p, T 1p; ANT: T 1p, A 1024p, N 512p) with axes Time and Cores]

Strong Scaling of NUBEAM

Strong scaling = same work, increase parallelism, decrease runtime

[Figure: two panels, TNT and ANT. Inset task diagrams (TNT: T 4p, N 16p, T 1p; ANT: T 1p, A 1024p, N 512p) with axes Time and Cores]
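The three kinds of variation used in the studies (time scaling, weak scaling, strong scaling) can be expressed as transformations of a task description. This is a sketch, not the RUS input format. The `Task` class and the three helper functions are hypothetical, and the strong-scaling helper assumes ideal linear speedup, which the slides do not state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Task:
    name: str
    procs: int
    seconds: float


def time_scale(task, factor):
    """Time scaling: same parallelism, run time multiplied by factor."""
    return replace(task, seconds=task.seconds * factor)


def weak_scale(task, factor):
    """Weak scaling: more work on more processes, same run time."""
    return replace(task, procs=task.procs * factor)


def strong_scale(task, factor):
    """Strong scaling: same work on more processes, shorter run time.

    Assumes ideal (linear) speedup, which real codes rarely achieve.
    """
    return replace(task, procs=task.procs * factor, seconds=task.seconds / factor)


nubeam = Task("NUBEAM", 16, 115.0)  # mean values from the TNT scenario
for variant in (time_scale(nubeam, 2), weak_scale(nubeam, 2), strong_scale(nubeam, 2)):
    print(variant, "core-seconds:", variant.procs * variant.seconds)
```

Under the ideal-speedup assumption, strong scaling leaves the core-seconds of a task unchanged, while time scaling and weak scaling both increase them. That difference is one reason the variations can affect resource efficiency differently.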

General Observations for Many Task Execution

- Interleaving multiple simulations is an effective way to increase resource utilization efficiency
  - Even small numbers of interleaved simulations (3 or 4) are sufficient for significant resource efficiency improvements
- Modest increases in allocation size produce high efficiencies
  - Local maxima at larger allocation sizes tend to be lower than the first or second peak
- Great differences in parallelism of tasks provide more opportunities for effective resource utilization
  - However, it is more important for the tasks to match in parallelism than in time to improve resource efficiency


Future Work

- Examine different SWIM simulation scenarios
- Validate and improve the model using data from IPS runs
- Study the impact of concurrent task execution in a single simulation
- Study, develop and include models in RUS for overheads such as task launch time, I/O, and component and framework activities
- Develop the capability to use RUS as a recommendation system for IPS simulation configuration to maximize resource utilization
- Explore the impact of different scheduling algorithms and policies


Summary

- The IPS provides a flexible and lightweight execution environment and coupling framework for MPMD fusion energy applications
- Characteristics of fusion tasks lead to poor resource utilization
- Using RUS, we showed how the execution of small numbers of simultaneous simulations can dramatically improve resource utilization
- Through simulation of resource utilization of real and synthetic workloads, we are able to extract some preliminary guidelines for constructing more efficient coupled simulations using a many-task approach


Questions?
