Overview of Recent MCMD Developments Jarek Nieplocha CCA Forum Meeting San Francisco




Page 1:

Overview of Recent MCMD Developments

Jarek Nieplocha

CCA Forum Meeting

San Francisco

Page 2:

MCMD Working Group

- Recent activities focus on development of specifications for CCA-based processor groups (teams)
  - BOFs held during CCA meetings in April and July 2007
  - Mini-workshop held January 24, 2007
  - Use cases documented and analyzed
  - Wiki page and mailing list: https://www.cca-forum.org/wiki/tiki-index.php?page=MCMD-WG
  - Specifications document, version 0.3
- Telecon held September 28, 2007
  - Several other people sent good comments by email
  - Issues raised: threads, fault-tolerant environments, the MPI-centric narrative and examples, id representation
- Plans
  - Complete work on the spec document by end of 2007 (telecons, mailing-list discussions, and reviews)
  - Prototype implementation and some application evaluation (NWChem, subsurface)

Page 3:

Multilevel Parallelism

- How can applications effectively exploit the massive amount of hardware parallelism available in petaflop-scale machines?
- Massive numbers of CPUs in future systems require algorithm and software redesign to exploit all available parallelism
- Multilevel parallelism
  - Divide work into parts that can be executed concurrently on groups of processors
  - Can exploit massive hardware parallelism
  - Increases granularity of computation => improves the overall scalability

[Figure: Task 1 and Task 2 executing concurrently on separate processor groups]

Page 4:

Multiple Component Multiple Data

- MCMD extends the SCMD (single component multiple data) model that was the main focus of CCA in SciDAC-1
- Prototype solution described at SC'05 for computational chemistry
- Allows different groups of processors to execute different CCA components
- Main motivation for MCMD is support for multiple levels of parallelism in applications

[Figure: SCMD vs. MCMD execution; NWChem example]

Page 5:

MCMD Use Cases

- Coop Parallelism
- Hierarchical Parallelism in Computational Chemistry
- Ab Initio Nuclear Structure Calculations
- Coupled Climate Modeling
- Molecular Dynamics, Multiphysics Simulations
- Fusion use case described at the Silver Springs meeting

Page 6:

Target Execution Model and Global Ids

- Global id specification:
  global id = <machine id> + <job id> + <task/process rank> + <thread id>

[Figure: execution hierarchy — single/multiple mpiruns, MPI tasks/processes, threads]
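The global id composition above can be sketched as a simple record type. This is a minimal Python sketch, not part of the specification; the class and field names are illustrative assumptions:

```python
from dataclasses import dataclass

# Illustrative sketch of the proposed composition:
#   global id = <machine id> + <job id> + <task/process rank> + <thread id>
# The class and field names are hypothetical, not from the CCA spec.
@dataclass(frozen=True)
class GlobalId:
    machine_id: int
    job_id: int
    task_rank: int      # task/process rank within the job
    thread_id: int = 0  # thread id, if threads are ever supported at component level

# Two global ids are equal only if every component matches.
a = GlobalId(machine_id=0, job_id=1, task_rank=7)
b = GlobalId(machine_id=0, job_id=1, task_rank=7)
```

Making the record immutable (`frozen=True`) lets ids be used as dictionary keys, which the later team-management sketches rely on.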

Page 7:

Group Management

- Various execution models
  - E.g., coop parallelism vs. a single mpirun
- Programming models
  - Should be MPI-friendly but also open to other models: MPI, threads, GAS models including GA, UPC, HPCS languages
- Global process and team ids
- Group translators

Page 8:

CCA Processor Teams

- We propose the slightly different term process(or) teams rather than groups
  - Avoids confusion with existing terminology and interfaces in programming models
- Some use cases call for something more general than MPI groups, e.g., COOP with multiple mpiruns
  - For example, a CCA team can encompass a collection of processes in two different MPI jobs; we cannot construct a single MPI group corresponding to that
- Operations on CCA teams might not have a direct mapping to group operations in programming models that support groups

[Figure: a CCA process team spanning MPI groups in MPI Job A and MPI Job B]

Page 9:

CCA Team Service

- How do we initialize the application?
  - The COOP example makes it non-trivial
- Provides the following:
  - Create, destroy, compare, and split teams
  - More capabilities can be added as required
- Assigns global ids to tasks from one or more jobs running on one or more machines
  - Global id = <machine id> + <job id> + <task id>
  - Also <thread id>, if we were to support threads at the component level in the future
- Locality information
  - Gets the job id, machine id, and task id of a given task
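The operations listed above (create, destroy, compare, split) can be sketched as a small service class. This is a hypothetical Python sketch of the behavior, not the CCA interface; all names are assumptions, and global ids are represented as (machine, job, task) tuples:

```python
# Hypothetical sketch of the CCA Team Service operations named on the
# slide: create, destroy, compare, and split teams. None of these method
# names come from the specification document.
class TeamService:
    def __init__(self):
        self._teams = {}
        self._next = 0

    def create(self, global_ids):
        """Form a new team from a collection of global ids; return a team handle."""
        tid = self._next
        self._next += 1
        self._teams[tid] = tuple(global_ids)
        return tid

    def destroy(self, team):
        del self._teams[team]

    def members(self, team):
        return self._teams[team]

    def compare(self, t1, t2):
        """Two teams are considered equal if they have the same membership."""
        return set(self._teams[t1]) == set(self._teams[t2])

    def split(self, team, color_of):
        # color_of maps a global id to a color; members with the same color
        # land in the same sub-team (in the style of MPI_Comm_split).
        buckets = {}
        for gid in self._teams[team]:
            buckets.setdefault(color_of(gid), []).append(gid)
        return {c: self.create(m) for c, m in sorted(buckets.items())}

# Usage: split a team spanning two jobs into one sub-team per job.
svc = TeamService()
ids = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]  # (machine, job, task)
world = svc.create(ids)
by_job = svc.split(world, lambda gid: gid[1])
```

Because a CCA team may span multiple MPI jobs, the split here works on global ids directly rather than delegating to any one programming model's group operation.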

Page 10:

Plugins

[Figure: the CCA Team Service sits on an interoperable group service layer, with plugins for the MPI Group Service, GA Group Service, PVM Group Service, and other programming models' group services]

- Provide mappings between CCA teams and the task/image/thread groups of the programming models that components are written in
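The plugin layer described above can be sketched as a registry of per-model translators. This is an illustrative Python sketch only: the registry, the translator, and the stand-in "native group" (a list of in-job ranks instead of a real MPI group handle) are all assumptions, not the specification:

```python
# Hypothetical sketch of the plugin layer: each programming model registers
# a translator that maps a CCA team (a collection of global ids) into that
# model's native group representation. Names are illustrative only.
class GroupServiceRegistry:
    def __init__(self):
        self._translators = {}

    def register(self, model, translate):
        self._translators[model] = translate

    def to_native_group(self, model, team_members):
        return self._translators[model](team_members)

registry = GroupServiceRegistry()

# An "MPI-style" translator: keep only the team members belonging to one
# MPI job and return their in-job task ranks. A real plugin would build an
# MPI group from these ranks; the rank list is a stand-in here.
def mpi_translator(members, job_id=0):
    return [task for (machine, job, task) in members if job == job_id]

registry.register("MPI", mpi_translator)
```

The point of the indirection is that a CCA team spanning two MPI jobs cannot become a single MPI group; a translator can only map the per-job portion of the team into each model's native group.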

Page 11:

Example

[Figure: a coupled land–ocean system — Ocean Model, Land Model, and I/O components running on PVM, GA, and MPI process groups across PVM Job A and MPI/GA Job B, all joined under a global CCA team]

Page 12:

Specification Document

- Version 0.3 on the wiki (Word, PDF)
- Please review and contribute
- Looking at candidate applications and component software for initial evaluation
  - Numerical, I/O

Page 13:

Issues from the Telecon

- Eliminate threads from the spec
- Add more emphasis on mixing multiple programming models
- How do we handle global ids?
  - Pros and cons of using integers
  - Conclusion is to use "global ids" as objects and introduce a new representation called "global ranks"
- Need for dynamic team management
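The "global ids as objects, global ranks as integers" conclusion can be sketched as follows. The ordering rule used here (sort ids by machine, then job, then task) is an assumption for illustration, not something the telecon decided:

```python
# Sketch of the telecon conclusion: keep "global ids" as opaque objects and
# derive an integer "global rank" for each task as its position within a
# canonical ordering of the team's ids. The sort-based ordering is an
# assumed convention, not part of the specification.
def global_ranks(team_ids):
    ordered = sorted(team_ids)  # canonical order: (machine, job, task)
    return {gid: rank for rank, gid in enumerate(ordered)}

# Usage: ids arrive in arbitrary order; ranks come out dense, 0..n-1.
ids = [(1, 0, 2), (0, 0, 1), (0, 0, 0)]
ranks = global_ranks(ids)
```

This keeps the convenience of integer ranks for indexing and collective-style operations while leaving the id itself structured and machine/job-aware.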

Page 14:

Dynamic Behavior

- We want to support the dynamic nature of applications
  - Applications composed of parallel jobs that are launched and complete at different stages of application execution
- Fault tolerance in the style of FT-MPI
  - Adaptation to faults
  - Teams can shrink/expand; cannot count on persistence of values returned by team service calls
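The non-persistence caveat can be made concrete with a small sketch: callers must re-query team properties rather than cache them, because a fault can change the answer. The `Team` class below is illustrative only:

```python
# Sketch of the non-persistence caveat: teams can shrink (e.g., after a
# fault, in the style of FT-MPI), so values returned by team service calls
# must be re-queried, not cached. The Team class is hypothetical.
class Team:
    def __init__(self, members):
        self._members = set(members)

    def size(self):
        """Query the current team size; callers should not cache this."""
        return len(self._members)

    def remove(self, gid):
        """Drop a member lost to a fault; the team shrinks in place."""
        self._members.discard(gid)

team = Team({0, 1, 2, 3})
before = team.size()  # queried before the fault
team.remove(2)        # a fault removes one member
after = team.size()   # re-queried afterward; a cached 'before' is now stale
```

Loops that partition work by team size would need to recompute their decomposition whenever the team changes, which is exactly why the spec cannot promise persistent return values.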