
1

Charm++ Tutorial

Parallel Programming Laboratory, UIUC


2

Overview

Introduction
– Virtualization
– Data Driven Execution in Charm++
– Object-based Parallelization

Charm++ features with simple examples
– Chares and Chare Arrays
– Parameter Marshalling
– Structured Dagger Construct
– Load Balancing
– Tools
  – Projections
  – LiveViz


3

Technical Approach

[Diagram: a spectrum from specialization to automation spanning Decomposition, Mapping, and Scheduling; Charm++ sits at the point where only decomposition is left to the programmer]

Decomposition done by the programmer, everything else automated

Seek an optimal division of labor between "system" and programmer


4

Object-based Parallelization

[Diagram: the user's view shows only interacting objects; the system implementation maps those objects onto processors]

The user is only concerned with the interaction between objects


5

Virtualization: Object-based Decomposition

Divide the computation into a large number of pieces
– Independent of the number of processors
– Typically larger than the number of processors

Let the system map objects to processors


6

Chares – Concurrent Objects

Can be dynamically created on any available processor
Can be accessed from remote processors
Send messages to each other asynchronously
Contain "entry methods"


7

"Hello World!"

.ci file:

mainmodule hello {
  mainchare mymain {
    entry mymain();
  };
};

Compiling the .ci file generates hello.decl.h and hello.def.h

.C file:

#include "hello.decl.h"

class mymain : public CBase_mymain {
public:
  mymain(int argc, char **argv) {
    ckout << "Hello World!" << endl;
    CkExit();
  }
};

#include "hello.def.h"


8

Compile and run the program

Compiling
• charmc <options> <source file>
• options: -o, -g, -language, -module, -tracemode

pgm: pgm.ci pgm.h pgm.C
  charmc pgm.ci
  charmc pgm.C
  charmc -o pgm pgm.o -language charm++

To run a Charm++ program named "pgm" on four processors, type:

charmrun pgm +p4 <params>

Nodelist file (for network architectures)
• a list of machines on which to run the program
• host <hostname> <qualifiers>
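For illustration, a minimal nodelist file might look like the following (the hostnames are placeholders):

group main
host node01
host node02
host node03
host node04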


9

Data Driven Execution in Charm++

[Diagram: on each processor a scheduler repeatedly picks messages from its message queue and delivers them to local objects x and y; the program ends with CkExit()]

But how does x invoke y->f() when y lives on another processor?


10

Charm++ solution: proxy classes

A proxy class is generated for each chare class
– For instance, CProxy_Y is the proxy class generated for chare class Y.
– Proxy objects know where the real object is
– Methods invoked on a proxy simply put the data in an "envelope" and send it out to the destination

Given a proxy p, you can invoke methods
– p.method(msg);
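As a concrete sketch, assuming a chare class Y with an entry method f(int) (both hypothetical names):

CProxy_Y yProxy = CProxy_Y::ckNew(); // create the chare on some available processor; keep its proxy
yProxy.f(42);                        // asynchronous: the argument is packed into an envelope
                                     // and sent to wherever the chare lives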


11

Ring program

• Array of objects of the same kind
• Each one communicates with the next one
• Individual chares: cumbersome and not practical

A collection of chares,
– with a single global name for the collection
– each member addressed by an index
– mapping of element objects to processors handled by the system
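The forwarding step of the ring can be sketched in a single entry method (class and member names are hypothetical):

void Ring::pass(int token) {
  // Forward the token to the next element, wrapping around at the
  // end of the array; the system routes the message to whichever
  // processor currently holds that element.
  thisProxy[(thisIndex + 1) % nElements].pass(token);
}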


12

Chare Arrays

[Diagram]
User's view: A[0] A[1] A[2] A[3] A[..]
System view: the elements (e.g., A[0] and A[1]) are distributed across the processors


13

Array Hello

(.ci) file:

mainmodule m {
  readonly CProxy_mymain mainProxy;
  readonly int nElements;

  mainchare mymain { .... };

  array [1D] Hello {
    entry Hello(void);
    entry void sayHi(int hiNo);
  };
};

In mymain::mymain():

nElements = 4;
mainProxy = thisProxy;
CProxy_Hello p = CProxy_Hello::ckNew(nElements);
// Have element 0 say "hi"
p[0].sayHi(12345);

Class declaration:

class Hello : public CBase_Hello {
public:
  Hello();
  Hello(CkMigrateMessage *m) {}
  void sayHi(int hiNo);
};


14

Array Hello (contd.)

void Hello::sayHi(int hiNo)
{
  ckout << hiNo << " from element " << thisIndex << endl;
  if (thisIndex < nElements - 1)
    // Pass the hello on:
    thisProxy[thisIndex + 1].sayHi(hiNo + 1);
  else
    // We've been around once; we're done.
    mainProxy.done();
}

(Callouts: nElements is a read-only; thisIndex is the element index; thisProxy is the array proxy)

void mymain::done(void)
{
  CkExit();
}


15

Sorting numbers

Sort n integers in increasing order.
Create n chares, each holding one number.
In every odd iteration, chare 2i swaps with chare 2i+1 if required.
In every even iteration, chare 2i swaps with chare 2i-1 if required.
After each iteration all chares report to the mainchare. After everybody reports, the mainchare signals the next iteration.
Sorting completes in n iterations.

[Diagram: neighbor pairings in even and odd rounds]


16

Array Sort

sort.ci:

mainmodule sort {
  readonly CProxy_myMain mainProxy;
  readonly int nElements;

  mainchare myMain {
    entry myMain(CkArgMsg *m);
    entry void swapdone(void);
  };

  array [1D] sort {
    entry sort(void);
    entry void setValue(int myvalue);
    entry void swap(int round_no);
    entry void swapReceive(int from_index, int value);
  };
};

sort.h:

class sort : public CBase_sort {
private:
  int myValue;
public:
  sort();
  sort(CkMigrateMessage *m);
  void setValue(int number);
  void swap(int round_no);
  void swapReceive(int from_index, int value);
};

Main::Main():

swapcount = 0;
roundsDone = 0;
mainProxy = thishandle;
CProxy_sort arr = CProxy_sort::ckNew(nElements);
for (int i = 0; i < nElements; i++)
  arr[i].setValue(rand());
arr.swap(0);


17

Array Sort (contd.)

void sort::swap(int roundno)
{
  bool sendright = false;
  if ((roundno % 2 == 0 && thisIndex % 2 == 0) ||
      (roundno % 2 == 1 && thisIndex % 2 == 1))
    sendright = true;
  // sendright is true if I have to send to the right

  if ((sendright && thisIndex == nElements - 1) ||
      (!sendright && thisIndex == 0))
    mainProxy.swapdone();
  else {
    if (sendright)
      thisProxy[thisIndex + 1].swapReceive(thisIndex, myValue);
    else
      thisProxy[thisIndex - 1].swapReceive(thisIndex, myValue);
  }
}

void sort::swapReceive(int from_index, int value)
{
  if (from_index == thisIndex - 1 && value > myValue) myValue = value;
  if (from_index == thisIndex + 1 && value < myValue) myValue = value;
  mainProxy.swapdone();
}

void myMain::swapdone(void)
{
  if (++swapcount == nElements) {
    swapcount = 0;
    roundsDone++;
    if (roundsDone == nElements) CkExit();
    else arr.swap(roundsDone);
  }
}

Error!!


18

Remember:

Message passing is asynchronous.
Messages can be delivered out of order.

[Diagram: elements holding 3, 2, 3; because swap and swapReceive messages can be delivered out of order, an element overwrites myValue before it has sent its own value onward, and the 2 is lost!]


19

Array Sort (correct)

void sort::swap(int roundno)
{
  bool sendright = false;
  if ((roundno % 2 == 0 && thisIndex % 2 == 0) ||
      (roundno % 2 == 1 && thisIndex % 2 == 1))
    sendright = true;
  // sendright is true if I have to send to the right

  if ((sendright && thisIndex == nElements - 1) ||
      (!sendright && thisIndex == 0))
    mainProxy.swapdone();
  else {
    if (sendright)
      thisProxy[thisIndex + 1].swapReceive(thisIndex, myValue);
  }
}

void sort::swapReceive(int from_index, int value)
{
  if (from_index == thisIndex - 1) {
    if (value > myValue) {
      thisProxy[thisIndex - 1].swapReceive(thisIndex, myValue);
      myValue = value;
    }
    else
      thisProxy[thisIndex - 1].swapReceive(thisIndex, value);
  }
  if (from_index == thisIndex + 1)
    myValue = value;
  mainProxy.swapdone();
}

void myMain::swapdone(void)
{
  if (++swapcount == nElements) {
    swapcount = 0;
    roundsDone++;
    if (roundsDone == nElements) CkExit();
    else arr.swap(roundsDone);
  }
}


20

Basic Entities in Charm++ Programs

Sequential objects
– ordinary sequential C++ code and objects

Read-only variables
– initialized in main::main()
– used as "global" variables

Chares (concurrent objects)

Chare Arrays (an indexed collection of chares)


21

Illustrative example: Jacobi 1D

Input: 2D array of values with boundary conditions

In each iteration, each array element is computed as the average of itself and its neighbors

Iterations are repeated until some threshold error value is reached


22

Jacobi 1D: Parallel Solution!

Slice up the 2D array into sets of columns
Chare = computations in one set
At the end of each iteration
– Chares exchange boundaries
– Determine the maximum change in the computation
Output the result when the threshold is reached


23

Arrays as Parameters

An array cannot be passed as a pointer; specify the length of the array in the interface file:
– entry void bar(int n, double arr[n])
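On the caller's side the array is marshalled automatically; a minimal sketch, assuming p is a proxy to the chare declaring bar:

double data[8] = {0.0};
p.bar(8, data); // the runtime copies the 8 doubles into the outgoing message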


24

Jacobi Code

void Ar1::doWork(int sendersID, int n, double arr[n])
{
  maxChange = 0.0;
  if (sendersID == thisIndex - 1) {
    leftmsg = 1;   // set boolean to indicate we received the left neighbor's message
  }
  else if (sendersID == thisIndex + 1) {
    rightmsg = 1;  // set boolean to indicate we received the right neighbor's message
  }
  // Rest of the code on the next slide
}


25

Reduction

Like a barrier in MPI, but also applies an operation
Apply a single operation (add, max, min, ...) to data items scattered across many processors
Collect the result in one place

Reduce x across all elements:
– contribute(sizeof(x), &x, CkReduction::sum_int, processResult);
– the function processResult() receives the reduced result

All contribute calls from one array must name the same function


26

Jacobi Code Continued

void Ar1::doWork(int sendersID, int n, double arr[n])
{
  // Code on previous slide ...
  if (((rightmsg == 1) && (leftmsg == 1)) ||
      ((thisIndex == 0) && (rightmsg == 1)) ||
      ((thisIndex == K - 1) && (leftmsg == 1))) {
    // Both messages have been received and we can now
    // compute the new values of the matrix
    ...
    // Use a reduction to determine whether the maximum change
    // on every element is below our threshold value.
    contribute(sizeof(double), &maxChange, CkReduction::max_double, cb);
  }
}


27

Structured Dagger

What is it?
– A coordination language built on top of Charm++

Motivation:
– To reduce the complexity of program development without adding any overhead


28

Structured Dagger Constructs

atomic {code} – Specifies that no structured dagger constructs appear inside of the code so it executes atomically.

overlap {code} – Enables all of its component constructs concurrently and can execute these constructs in any order.

when <entrylist> {code} – Specifies dependencies between computation and message arrival.


29

Structured Dagger Constructs Continued

if / else / while / for – These are the same as their C++ counterparts, except that they can contain when blocks in their respective code segments. Hence execution can be suspended while they wait for messages.

forall – Functions like a for statement, but enables its component constructs for its entire iteration space at once. As a result it doesn't need to execute its iteration space in strict sequence.


30

Jacobi Example Using Structured Dagger

jacobi.ci:

array [1D] Ar1 {
  ...
  entry void GetMessages(MyMsg *msg) {
    when rightmsgEntry(MyMsg *right), leftmsgEntry(MyMsg *left) {
      atomic {
        CkPrintf("Got both left and right messages\n");
        doWork(right, left);
      }
    }
  };
  entry void rightmsgEntry(MyMsg *m);
  entry void leftmsgEntry(MyMsg *m);
  ...
};

In a 1D Jacobi that doesn't use Structured Dagger, the code in the .C file for the doWork function is much more complex: it must manually check, with if/else statements, whether both messages have been received. With Structured Dagger, doWork will not be called until both messages have arrived. The compiler translates the Structured Dagger code into code that does the appropriate checks, making the programmer's job simpler.


31

Screen shots – Load imbalance

Jacobi 2048 x 2048, threshold 0.1, 32 chares, 4 processors


32

Timelines – load imbalance


33

Migration

Array objects can migrate from one PE to another
To migrate, they must implement a pack/unpack (pup) method
Needed because migration creates a new object on the destination processor while destroying the original
pup combines 3 functions into one
– Data structure traversal: compute the message size, in bytes
– Pack: write the object into the message
– Unpack: read the object out of the message
Basic contract: here are my fields (types, sizes and a pointer)


34

Pup – How to write it?

class ShowPup {
  double a;
  int x;
  char y;
  unsigned long z;
  float q[3];
  int *r;  // heap-allocated memory

public:
  ... other methods ...

  void pup(PUP::er &p) {
    p | a;    // notice that you can either use the | operator
    p | x;
    p | y;
    p(z);     // or ()
    p(q, 3);  // but you need () for arrays
    if (p.isUnpacking())
      r = new int[ARRAY_SIZE];
    p(r, ARRAY_SIZE);
  }
};


35

Load Balancing

All you need is a working pup

Link an LB module
– -module <strategy>
– RefineLB, NeighborLB, GreedyCommLB, others...
– -module EveryLB includes all load balancing strategies

Runtime option
– +balancer RefineLB
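Putting it together, the link and run lines might look like this (program name is a placeholder):

charmc -o pgm pgm.o -language charm++ -module EveryLB
./charmrun pgm +p4 +balancer RefineLB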


36

Centralized Load Balancing

Uses information about activity on all processors to make load balancing decisions

Advantage: since it has the entire object communication graph, it can make the best global decision

Disadvantage: Higher communication costs/latency, since this requires information from all running chares


37

Neighborhood Load Balancing

Load balances among a small set of processors (the neighborhood) to decrease communication costs

Advantage: Lower communication costs, since communication is between a smaller subset of processors

Disadvantage: Could leave a system which is globally poorly balanced


38

Main Centralized Load Balancing Strategies

GreedyCommLB – a "greedy" load balancing strategy which uses the process load and communication graph to map the processes with the highest load onto the processors with the lowest load, while trying to keep communicating processes on the same processor

RefineLB – moves objects off overloaded processors to under-utilized processors to reach the average load

Others – the manual discusses several other load balancers which are not used as often, but are available for testing and experimentation


39

Neighborhood Load Balancing Strategies

NeighborLB – neighborhood load balancer, currently uses a neighborhood of 4 processors


40

When to Re-balance Load?

Programmer control: AtSync load balancing

AtSync method: enable load balancing at a specific point
– Object ready to migrate
– Re-balance if needed
– AtSync() is called when your chare is ready to be load balanced; load balancing may not start right away
– ResumeFromSync() is called when load balancing for this chare has finished

Default: the load balancer will migrate objects when needed (see the sketch below)
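A sketch of the AtSync protocol in an array element (class and method names are hypothetical; assumes usesAtSync is set in the constructor):

void Worker::step() {
  // ... one iteration of work ...
  if (iter % 64 == 0)
    AtSync();                     // ready to migrate; balancing may not start right away
  else
    thisProxy[thisIndex].step();  // otherwise continue immediately
}

void Worker::ResumeFromSync() {
  thisProxy[thisIndex].step();    // invoked when load balancing for this chare is done
}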


41

Processor Utilization: After Load Balance


42

Timelines: Before and After Load Balancing


43

Other tools: LiveViz


44

LiveViz – What is it?

Charm++ library
Visualization tool
Inspect your program's current state
Client runs on any machine (Java)
You code the image generation
2D and 3D modes


45

LiveViz – Monitoring Your Application

LiveViz allows you to watch your application’s progress

Can use it from work or home

Doesn’t slow down computation when there is no client


46

LiveViz – Compilation

Compile the LiveViz library itself
– Must have built Charm++ first!

From the charm directory, run:
– cd tmp/libs/ck-libs/liveViz
– make


47

Running LiveViz

Build and run the server
– cd pgms/charm++/ccs/liveViz/serverpush
– make
– ./run_server

Or in detail...


48

Running LiveViz

Run the client
– cd pgms/charm++/ccs/liveViz/client
– ./run_client [<host> [<port>]]

Should get a result window:


49

LiveViz Request Model

[Diagram: the client sends a "get image" request to the LiveViz server code, which buffers the request; the parallel application polls for requests, and the poll returns work; each element passes its image chunk to the server; the server combines the chunks and sends the finished image to the client]


50

Jacobi 2D Example Structure

Main:
Set up the worker array and pass data to the workers

Workers:
Start looping
Send messages to all neighbors with ghost rows
Wait for all neighbors to send ghost rows to me
Once they arrive, do the regular Jacobi relaxation
Calculate the maximum error; do a reduction to compute the global maximum error
If the timestep is a multiple of 64, load balance the computation; then restart the loop


51

LiveViz Setup

Before:

void main::main(. . .) {
  // Do misc initialization stuff

  // Now create the (empty) jacobi 2D array
  work = CProxy_matrix::ckNew(0);

  // Distribute work to the array, filling it as you do
}

After:

#include <liveVizPoll.h>

void main::main(. . .) {
  // Do misc initialization stuff

  // Create the workers and register with liveViz
  CkArrayOptions opts(0);         // By default allocate 0 array elements.
  liveVizConfig cfg(true, true);  // color image = true and animate image = true
  liveVizPollInit(cfg, opts);     // Initialize the library

  // Now create the jacobi 2D array
  work = CProxy_matrix::ckNew(opts);

  // Distribute work to the array, filling it as you do
}


52

Adding LiveViz To Your Code

void matrix::serviceLiveViz() {
  liveVizPollRequestMsg *m;
  while ((m = liveVizPoll((ArrayElement *)this, timestep)) != NULL) {
    requestNextFrame(m);
  }
}

Before:

void matrix::startTimeSlice() {
  // Send ghost row north, south, east, west, . . .
  sendMsg(dims.x-2, NORTH, dims.x+1, 1, +0, -1);
}

After:

void matrix::startTimeSlice() {
  // Send ghost row north, south, east, west, . . .
  sendMsg(dims.x-2, NORTH, dims.x+1, 1, +0, -1);

  // Now having sent all our ghosts, service liveViz
  // while waiting for neighbors' ghosts to arrive.
  serviceLiveViz();
}


53

Generate an Image For a Request

void matrix::requestNextFrame(liveVizPollRequestMsg *m) {
  // Compute the dimensions of the image bit we'll send.

  // Compute the image data of the chunk we'll send:
  // image data is just a linear array of bytes in row-major
  // order. For greyscale it's 1 byte per pixel, for color
  // it's 3 bytes (rgb).

  // The liveViz library routine colorScale(value, min, max,
  // *array) will rainbow-color your data automatically.

  // Finally, return the image data to the library.
  liveVizPollDeposit((ArrayElement *)this, timestep, m,
                     loc_x, loc_y, width, height, imageBits);
}


54

Link With The LiveViz Library

Before:

OPTS=-g
CHARMC=charmc $(OPTS)
LB=-module RefineLB
OBJS = jacobi2d.o

all: jacobi2d

jacobi2d: $(OBJS)
	$(CHARMC) -language charm++ \
	-o jacobi2d $(OBJS) $(LB) -lm

jacobi2d.o: jacobi2d.C jacobi2d.decl.h
	$(CHARMC) -c jacobi2d.C

After:

OPTS=-g
CHARMC=charmc $(OPTS)
LB=-module RefineLB
OBJS = jacobi2d.o

all: jacobi2d

jacobi2d: $(OBJS)
	$(CHARMC) -language charm++ \
	-o jacobi2d $(OBJS) $(LB) -lm \
	-module liveViz

jacobi2d.o: jacobi2d.C jacobi2d.decl.h
	$(CHARMC) -c jacobi2d.C


55

LiveViz Summary

Easy-to-use visualization library
Simple code handles any number of clients
Doesn't slow computation when there are no clients connected
Works in parallel, with load balancing, etc.


56

Advanced Features

Groups
Node Groups
Priorities
Reductions


57

Advanced Features: Groups

With arrays, the standard method of communications between elements is message passing

With a large number of chares, this can lead to large numbers of messages in the system

For global operations like reductions, this could make the receiving chare a bottleneck

Can this be fixed?


58

Advanced Features: Groups

Solution: Groups

Groups are more of a "system level" programming feature of Charm++, versus "user level" arrays
Groups are similar to arrays, except only one element is on each processor; the index to access the group is the processor ID
Groups can be used to batch messages from chares running on a single processor, which cuts down on the message traffic
Disadvantage: does not allow for effective load balancing, since groups are stationary (they are not virtualized); see the sketch below
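A minimal sketch of a group (names are hypothetical; the PE-indexed proxy and ckLocalBranch() are the standard access mechanisms):

In the .ci file:

group MyGroup {
  entry MyGroup();
  entry void deliver(int value);
};

In C++:

CProxy_MyGroup g = CProxy_MyGroup::ckNew();
g[CkMyPe()].deliver(42);             // index a branch by processor ID
MyGroup *local = g.ckLocalBranch();  // direct pointer to this PE's branch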


59

Advanced Features: Node Groups

Similar to groups, but one element per node instead of one per processor; the index is the node number
Can be used to solve similar problems: with one node group element per SMP node, it can act as a collection point for messages on the node, lowering message traffic on the interconnects between nodes
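The declaration is analogous to a group; a sketch with hypothetical names ([exclusive] is the node group attribute that serializes entry methods within a node):

nodegroup NodeCache {
  entry NodeCache();
  entry [exclusive] void add(int value);  // runs serialized within its node
};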


60

Advanced Features: Priorities

In general, messages in Charm++ are unordered: but what if an order is needed?

Solution: priorities

Messages can be assigned different priorities
The simplest priorities just specify that the message should go either on the end of the queue (standard behavior) or the beginning of the queue
Specific priorities can also be assigned to messages, using either numbers or bit vectors
Note that message processing is non-preemptive: a lower priority message will continue processing, even if a higher priority message shows up
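For marshalled parameters, a priority can be attached via CkEntryOptions; a sketch (the proxy and entry method names are hypothetical):

CkEntryOptions opts;
opts.setPriority(-100);       // integer priorities: smaller value = higher priority
workers[3].doStep(7, &opts);  // the options ride along as a trailing argument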


61

Advanced Features: Entry Method Attributes

entry [attribute1, ..., attributeN] void EntryMethod(parameters);

Attributes:

threaded
– entry methods which run in their own non-preemptible threads

sync
– methods which return a message as a result
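A sketch of how the two attributes combine (names are hypothetical); a sync method blocks its caller, so it must be invoked from a threaded entry method:

// .ci:
//   entry [threaded] void driver();
//   entry [sync] ResultMsg *compute(int x);

void A::driver() {
  ResultMsg *r = bProxy.compute(10);  // blocks this thread until the reply arrives
  CkPrintf("result: %d\n", r->value);
  delete r;
}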


62

Advanced features: Reductions

Callbacks
– transfer control back to a client after a library has finished
– various pre-defined callbacks, e.g. one that exits the program by calling CkExit()

Callbacks in reductions (see the sketch below)
– can be specified in the main chare on processor 0:
  myProxy.ckSetReductionClient(new CkCallback(...));
– can be specified in the call to contribute by specifying the callback method:
  contribute(sizeof(x), &x, CkReduction::sum_int, processResult);
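A sketch of wiring a reduction result to a main-chare entry method (names are hypothetical; the entry method must take a CkReductionMsg*):

CkCallback *cb = new CkCallback(CkIndex_myMain::processResult(NULL), mainProxy);
arrayProxy.ckSetReductionClient(cb);

void myMain::processResult(CkReductionMsg *msg) {
  int total = *(int *)msg->getData();  // the reduced value
  delete msg;
}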


63

Reductions, Part 2

Predefined reductions – a number of reductions are predefined, including ones that
– sum values or arrays
– calculate the product of values or arrays
– calculate the maximum contributed value
– calculate the minimum contributed value
– calculate the logical AND of integer values
– calculate the logical OR of contributed integer values
– form a set of all contributed values
– concatenate bytes of all contributed values

Plus, you can create your own
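These correspond to reducer types such as CkReduction::sum_int, product_double, max_double, min_int, logical_and, logical_or, set, and concat (a partial list; see the manual for the full set).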


64

Other Advanced Features

Custom array indexes
Array creation/mapping options
Additional load balancers


65

Benefits of Virtualization

Better software engineering
– logical units decoupled from "number of processors"

Message-driven execution
– adaptive overlap between computation and communication
– predictability of execution

Flexible and dynamic mapping to processors
– flexible mapping on clusters
– change the set of processors for a given job
– automatic checkpointing

Principle of Persistence


66

More Information

http://charm.cs.uiuc.edu
– Manuals
– Papers
– Download files
– FAQs

[email protected]