Computer Science 320 Load Balancing for Hybrid SMP/Clusters




Load Balancing Strategies

• For SMP, use a dynamic schedule to break the work into smaller chunks to keep the threads continually busy

• For cluster, use the master/worker pattern with a dynamic schedule to keep the nodes continually busy

• For hybrid, put several worker threads in each node, and schedule them as in the cluster program
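The dynamic schedule in the first bullet can be sketched with the standard library alone. This is a minimal illustration, not the course's code: the class name `DynamicScheduleSketch`, the method `run`, and the shared-counter approach are assumptions made for the example. Each thread claims the next small chunk of indices as soon as it finishes its previous one, so no thread idles while work remains.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration: threads claim small chunks of a loop range on demand. */
final class DynamicScheduleSketch {

    /**
     * Runs the given number of threads over indices 0..n-1, handing out chunks
     * of chunkSize indices on demand. Returns how many indices each thread processed.
     */
    static int[] run(int n, int chunkSize, int threads) {
        AtomicInteger next = new AtomicInteger(0);  // first index not yet claimed
        int[] processed = new int[threads];         // each slot is written by one thread only
        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; ++t) {
            final int id = t;
            pool[t] = new Thread(() -> {
                for (;;) {
                    int lb = next.getAndAdd(chunkSize);    // claim the next chunk
                    if (lb >= n) break;                    // no work left
                    int ub = Math.min(lb + chunkSize, n);  // exclusive upper bound
                    for (int i = lb; i < ub; ++i) {
                        ++processed[id];                   // stand-in for real work on index i
                    }
                }
            });
            pool[t].start();
        }
        try {
            for (Thread th : pool) th.join();  // join makes the counts visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed;
    }

    public static void main(String[] args) {
        int total = 0;
        for (int c : run(1000, 10, 4)) total += c;
        System.out.println("total indices processed = " + total);
    }
}
```

How the indices are split among the threads varies from run to run; only the total is fixed.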


One-Level Scheduling Strategy

[Figure: one-level scheduling, shown side by side for the cluster program and the hybrid program]


Hybrid Mandelbrot Set Program

• Each of Kp nodes has Kt worker threads

• Node 0 has one extra thread (the master)

• Each worker thread is numbered, from 0 to Kt * Kp - 1

• The master thread communicates with all worker threads; message tags identify them
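The worker numbering above follows the expression the program passes to its worker section, `rank * Kt + getThreadIndex()`. The sketch below works through that arithmetic. The class `WorkerIds` and its method names are hypothetical, and the inverse mappings are included only to show that the numbering is unambiguous.

```java
/** Hypothetical helper illustrating the worker numbering 0 .. Kt * Kp - 1. */
final class WorkerIds {

    /** Global worker number for a thread in a process with the given rank. */
    static int workerId(int rank, int threadIndex, int kt) {
        return rank * kt + threadIndex;
    }

    /** Rank of the process that hosts the given worker. */
    static int processOf(int worker, int kt) {
        return worker / kt;
    }

    /** Thread index of the given worker within its process. */
    static int threadOf(int worker, int kt) {
        return worker % kt;
    }

    public static void main(String[] args) {
        int kp = 3, kt = 4;  // example sizes, chosen only for illustration
        for (int rank = 0; rank < kp; ++rank) {
            for (int thread = 0; thread < kt; ++thread) {
                System.out.println("rank " + rank + ", thread " + thread
                        + " -> worker " + workerId(rank, thread, kt));
            }
        }
    }
}
```

With Kt = 4, thread 3 of process 2 is worker 11, which is also the tag on its messages.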


Set Up and Run the Threads

    ParallelTeam team = new ParallelTeam(rank == 0 ? Kt+1 : Kt);

    // Every parallel team thread runs the worker section, except thread Kt
    // (which exists only in process 0) runs the master section.
    team.execute(new ParallelRegion() {
        public void run() throws Exception {
            if (getThreadIndex() == Kt)
                masterSection();
            else
                workerSection(rank * Kt + getThreadIndex());
        }
    });

The workerSection method takes the worker number as a parameter; it serves as the message tag for messages to and from the master thread.


Scheduling the Threads in the Master

    private static void masterSection() throws IOException {
        int process, thread, worker;
        Range range;

        // Set up a schedule object to divide the row range into chunks.
        IntegerSchedule schedule = IntegerSchedule.runtime();
        schedule.start(K, new Range(0, height-1));

        // Send initial chunk range to each worker. If range is null, no more
        // work for that worker. Keep count of active workers.
        int activeWorkers = K; // K = Kp * Kt
        for (process = 0; process < Kp; ++process) {
            for (thread = 0; thread < Kt; ++thread) {
                worker = process * Kt + thread;
                range = schedule.next(worker);
                world.send(process, worker, ObjectBuf.buffer(range));
                if (range == null) --activeWorkers;
            }
        }


Scheduling the Threads in the Master

    private static void masterSection() throws IOException {
        int process, thread, worker;
        Range range;

        // Repeat until all workers have finished.
        while (activeWorkers > 0) {
            // Receive an empty message from any worker.
            CommStatus status = world.receive(null, null, IntegerBuf.emptyBuffer());
            process = status.fromRank;
            worker = status.tag;

            // Send next chunk range to that specific worker.
            // If null, no more work.
            range = schedule.next(worker);
            world.send(process, worker, ObjectBuf.buffer(range));
            if (range == null) --activeWorkers;
        }
    }
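The master's protocol can be tried out without a cluster. The sketch below is a stand-in, not the program from the slides: it replaces message passing with `java.util.concurrent` queues, uses one inbox queue per worker in place of message tags, and uses a sentinel array `DONE` in place of the null range. The class name `MasterWorkerSketch` and fixed-size chunking are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical in-process model of the master/worker scheduling loop. */
final class MasterWorkerSketch {

    /** Sentinel standing in for the null range that means "no more work". */
    private static final int[] DONE = new int[0];

    /** Returns {lb, ub} (inclusive) for the chunk starting at start, or DONE. */
    private static int[] chunk(int start, int rows, int size) {
        if (start >= rows) return DONE;
        return new int[] { start, Math.min(start + size, rows) - 1 };
    }

    /** Schedules rows 0..rows-1 over the workers; returns rows computed per worker. */
    static int[] run(int rows, int size, int workers) {
        List<BlockingQueue<int[]>> inbox = new ArrayList<>();
        for (int w = 0; w < workers; ++w) inbox.add(new LinkedBlockingQueue<>());
        BlockingQueue<Integer> reports = new LinkedBlockingQueue<>();
        int[] rowsDone = new int[workers];
        Thread[] pool = new Thread[workers];

        for (int w = 0; w < workers; ++w) {
            final int id = w;
            final BlockingQueue<int[]> mine = inbox.get(w);
            pool[w] = new Thread(() -> {
                try {
                    for (;;) {
                        int[] range = mine.take();          // receive chunk range
                        if (range == DONE) break;           // no more work
                        rowsDone[id] += range[1] - range[0] + 1;  // stand-in for computing
                        reports.put(id);                    // report completion
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            pool[w].start();
        }

        try {
            int next = 0;
            int active = workers;
            // Send an initial chunk to every worker.
            for (int w = 0; w < workers; ++w) {
                int[] range = chunk(next, rows, size);
                next += size;
                inbox.get(w).put(range);
                if (range == DONE) --active;
            }
            // Answer each completion report with the next chunk, or DONE.
            while (active > 0) {
                int w = reports.take();
                int[] range = chunk(next, rows, size);
                next += size;
                inbox.get(w).put(range);
                if (range == DONE) --active;
            }
            for (Thread t : pool) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return rowsDone;
    }
}
```

As in the slides, the count of active workers drops only when a worker is told there is no more work, so the loop ends once every worker has received the final message.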


Worker Thread Activity: Receive

    private static void workerSection(int worker) throws IOException {
        // Image, writer, matrix, and row slice variables are now local here.
        . . .
        for (;;) {
            // Receive chunk range from master. If null, no more work.
            ObjectItemBuf<Range> rangeBuf = ObjectBuf.buffer();
            world.receive(0, worker, rangeBuf);
            Range range = rangeBuf.item;
            if (range == null) break;
            int lb = range.lb();
            int ub = range.ub();
            int len = range.length();

            // Allocate storage for matrix row slice if necessary.
            if (slice == null || slice.length < len)
                slice = new int[len][width];

            // Code to compute rows and columns of slice goes here.


Worker Thread Activity: Send

    private static void workerSection(int worker) throws IOException {
        // Image, writer, matrix, and row slice variables are now local here.
        . . .
        for (;;) {
            // Receive chunk range from master. If null, no more work.
            ObjectItemBuf<Range> rangeBuf = ObjectBuf.buffer();
            world.receive(0, worker, rangeBuf);
            Range range = rangeBuf.item;
            if (range == null) break;
            . . .
            . . .
            // Report completion of slice to master.
            world.send(0, worker, IntegerBuf.emptyBuffer());

            // Set full pixel matrix rows to refer to slice rows.
            System.arraycopy(slice, 0, matrix, lb, len);

            // Write row slice of full pixel matrix to image file.
            writer.writeRowSlice(range);
        }


One-Level Scheduling Performance

• With one master and Kt * Kp workers, the master exchanges many messages just to schedule them all

• Two-level scheduling:
  – One worker per node, but each worker uses multiple threads
  – Two schedules, one from the master for each worker and one from each worker for its threads
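A rough count shows why the number of scheduling messages matters. The sketch below assumes the protocol shown for the master: each chunk costs one range message and one completion report, and every worker receives one final "no more work" message. The image height, the node and thread counts, and both chunk sizes in `main` are illustrative values; the class `ScheduleMessageCount` is hypothetical.

```java
/** Hypothetical estimate of scheduling messages under the master/worker protocol. */
final class ScheduleMessageCount {

    /** Number of chunks needed to cover the given rows with chunks of chunkSize rows. */
    static long chunks(long rows, long chunkSize) {
        return (rows + chunkSize - 1) / chunkSize;  // ceiling division
    }

    /**
     * Total messages: two per chunk (range out, completion report back),
     * plus one final "no more work" message per worker.
     */
    static long messages(long rows, long chunkSize, long workers) {
        return 2 * chunks(rows, chunkSize) + workers;
    }

    public static void main(String[] args) {
        long rows = 1200;    // illustrative image height
        int kp = 4, kt = 4;  // illustrative: 4 nodes, 4 threads per node
        // One-level: every thread is a worker; small chunks assumed (1 row each).
        System.out.println("one-level: " + messages(rows, 1, kp * kt));  // 2416
        // Two-level: one worker per node; large chunks assumed (100 rows each).
        System.out.println("two-level: " + messages(rows, 100, kp));     // 28
    }
}
```

Under these assumptions the saving comes mostly from the larger chunks the master hands out; the thread-level schedule inside each node needs no messages at all.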


Two-Level Scheduling


Changes to Program

• Master uses a schedule with chunk size of 100, worker uses schedule with chunk size of 1

• Master node has two parallel sections as well as a worker team

• No worker tags needed

• Master section has no changes otherwise
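The two chunk sizes can be seen working together in a single-process stand-in. The sketch below is an assumption-laden illustration, not the course program: there is no master and no message passing. The outer loop plays the part of a worker receiving 100-row chunks, and for each chunk a team of plain Java threads takes rows one at a time from a shared counter. The class `TwoLevelSketch` and its methods are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical single-process model of two-level scheduling. */
final class TwoLevelSketch {

    static final int NODE_CHUNK = 100;  // rows per master-to-worker chunk (from the slide)

    /**
     * Computes rows lb..ub (inclusive) with a team of threads that each take
     * one row at a time (thread-level chunk size 1). Returns rows per thread.
     */
    static int[] computeSlice(int lb, int ub, int threads) {
        AtomicInteger nextRow = new AtomicInteger(lb);
        int[] rowsPerThread = new int[threads];
        Thread[] team = new Thread[threads];
        for (int t = 0; t < threads; ++t) {
            final int id = t;
            team[t] = new Thread(() -> {
                for (;;) {
                    int r = nextRow.getAndIncrement();  // claim one row
                    if (r > ub) break;                  // slice finished
                    ++rowsPerThread[id];                // stand-in for computing row r
                }
            });
            team[t].start();
        }
        try {
            for (Thread th : team) th.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return rowsPerThread;
    }

    /** Splits rows 0..rows-1 into node-level chunks; returns the total rows computed. */
    static int run(int rows, int threads) {
        int total = 0;
        for (int lb = 0; lb < rows; lb += NODE_CHUNK) {
            int ub = Math.min(lb + NODE_CHUNK, rows) - 1;
            for (int c : computeSlice(lb, ub, threads)) total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("rows computed = " + run(1000, 4));
    }
}
```

The sketch creates a fresh set of threads for every slice to stay short; the program in the slides builds its ParallelTeam once, before the receive loop, and reuses it.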


Set Up and Run the Threads

    // In master process, run master section and worker section in parallel.
    if (rank == 0)
        new ParallelTeam(2).execute(new ParallelRegion() {
            public void run() throws Exception {
                execute(new ParallelSection() {
                    public void run() throws Exception {
                        masterSection();
                    }
                }, new ParallelSection() {
                    public void run() throws Exception {
                        workerSection();
                    }
                });
            }
        });

    // In worker process, run only worker section.
    else
        workerSection();


Worker Thread Activity

    private static void workerSection() throws IOException {
        // Image, writer, matrix, and row slice variables are now local here.
        . . .
        // Parallel team to calculate each slice in multiple threads.
        ParallelTeam team = new ParallelTeam();
        for (;;) {
            // Receive chunk range from master. If null, no more work.
            ObjectItemBuf<Range> rangeBuf = ObjectBuf.buffer();
            world.receive(0, rangeBuf);
            Range range = rangeBuf.item;
            if (range == null) break;
            final int lb = range.lb();
            final int ub = range.ub();
            final int len = range.length();

            // Allocate storage for matrix row slice if necessary.
            if (slice == null || slice.length < len)
                slice = new int[len][width];


Worker Thread Activity

    private static void workerSection() throws IOException {
        // Image, writer, matrix, and row slice variables are now local here.
        . . .
        // Parallel team to calculate each slice in multiple threads.
        ParallelTeam team = new ParallelTeam();
        for (;;) {
            . . .
            // Compute rows of slice in parallel threads.
            team.execute(new ParallelRegion() {
                public void run() throws Exception {
                    execute(lb, ub, new IntegerForLoop() {
                        // Use the thread-level loop schedule.
                        public IntegerSchedule schedule() {
                            return thrschedule;
                        }

                        // Compute all rows and columns in slice.
                        public void run(int first, int last) {
                            for (int r = first; r <= last; ++r) {
                                // Yadah, yadah, yadah