Computer Science 320
Load Balancing for Hybrid SMP/Clusters
Load Balancing Strategies
• For SMP, use a dynamic schedule to break the work into smaller chunks to keep the threads continually busy
• For cluster, use the master/worker pattern with a dynamic schedule to keep the nodes continually busy
• For hybrid, put several worker threads in each node, and schedule them as in the cluster program
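The SMP strategy above (a dynamic schedule that hands out small chunks on demand) can be sketched in plain Java. This is a minimal sketch, not the Parallel Java `IntegerSchedule` used later in the lecture. It assumes a shared `AtomicInteger` as the "next unassigned row" counter, and the class name `DynamicScheduleSketch` and its parameters are hypothetical. Each thread claims the next chunk of rows when it finishes the previous one, so fast threads simply take more chunks.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a dynamic schedule on one SMP node (plain Java threads,
// not the Parallel Java library). Threads grab `chunk` rows at a time until none remain.
class DynamicScheduleSketch {

    // Processes rows [0, height) with nThreads threads; returns how often each row was processed.
    static int[] run(int height, int chunk, int nThreads) {
        AtomicInteger nextRow = new AtomicInteger(0); // first row not yet handed out
        int[] visits = new int[height];               // each row is touched by exactly one thread
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (;;) {
                    int lb = nextRow.getAndAdd(chunk);         // claim the next chunk
                    if (lb >= height) break;                   // no more work
                    int ub = Math.min(lb + chunk, height) - 1; // last chunk may be short
                    for (int r = lb; r <= ub; r++) {
                        visits[r]++;                           // stand-in for computing row r
                    }
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join(); // join also makes the threads' writes to visits[] visible here
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return visits;
    }

    public static void main(String[] args) {
        int[] visits = run(1000, 7, 4);
        for (int v : visits) if (v != 1) throw new AssertionError("row processed " + v + " times");
        System.out.println("every row processed exactly once");
    }
}
```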
One-Level Scheduling Strategy
[Figure: one-level scheduling in the cluster program and in the hybrid program]
Hybrid Mandelbrot Set Program
• Each of Kp nodes has Kt worker threads
• Node 0 has one extra thread (the master)
• Each worker thread has a number from 0 to Kp * Kt - 1; thread t in node p is worker p * Kt + t
• The master thread communicates with all worker threads; message tags identify them
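The numbering rule can be checked with a few lines of arithmetic. The setup code computes a worker's number as `rank * Kt + getThreadIndex()`; the sketch below shows that this covers 0 to Kp * Kt - 1 without collisions, and that integer division and remainder recover the node and thread. The class `WorkerNumbering` and its methods are hypothetical illustrations, not part of the lecture's program (whose master gets the sender's rank from the message status instead).

```java
// Hypothetical helper showing the worker numbering scheme: worker = process * Kt + thread.
class WorkerNumbering {
    static int worker(int process, int thread, int Kt) { return process * Kt + thread; }
    static int process(int worker, int Kt) { return worker / Kt; } // which node
    static int thread(int worker, int Kt) { return worker % Kt; }  // which thread in that node

    public static void main(String[] args) {
        int Kp = 3, Kt = 4; // example sizes (assumed)
        for (int p = 0; p < Kp; p++) {
            for (int t = 0; t < Kt; t++) {
                int w = worker(p, t, Kt);
                System.out.printf("process %d, thread %d -> worker %d%n", p, t, w);
                if (process(w, Kt) != p || thread(w, Kt) != t) throw new AssertionError();
            }
        }
    }
}
```

With Kp = 3 and Kt = 4 the last worker is 2 * 4 + 3 = 11, which is Kp * Kt - 1.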
Set Up and Run the Threads

    ParallelTeam team = new ParallelTeam(rank == 0 ? Kt+1 : Kt);

    // Every parallel team thread runs the worker section, except thread Kt
    // (which exists only in process 0) runs the master section.
    team.execute(new ParallelRegion() {
        public void run() throws Exception {
            if (getThreadIndex() == Kt)
                masterSection();
            else
                workerSection(rank * Kt + getThreadIndex());
        }
    });
The workerSection method takes the worker's number as a parameter; it serves as the message tag for messages to and from the master thread.
Scheduling the Threads in the Master

    private static void masterSection() throws IOException {
        int process, thread, worker;
        Range range;

        // Set up a schedule object to divide the row range into chunks.
        IntegerSchedule schedule = IntegerSchedule.runtime();
        schedule.start(K, new Range(0, height-1));

        // Send initial chunk range to each worker. If range is null, no more
        // work for that worker. Keep count of active workers.
        int activeWorkers = K; // (Kp * Kt)
        for (process = 0; process < Kp; ++ process) {
            for (thread = 0; thread < Kt; ++ thread) {
                worker = process * Kt + thread;
                range = schedule.next(worker);
                world.send(process, worker, ObjectBuf.buffer(range));
                if (range == null) --activeWorkers;
            }
        }
        // Repeat until all workers have finished.
        while (activeWorkers > 0) {
            // Receive an empty message from any worker.
            CommStatus status = world.receive(null, null, IntegerBuf.emptyBuffer());
            process = status.fromRank;
            worker = status.tag;

            // Send next chunk range to that specific worker.
            // If null, no more work.
            range = schedule.next(worker);
            world.send(process, worker, ObjectBuf.buffer(range));
            if (range == null) --activeWorkers;
        }
    }
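The master's protocol (one initial chunk per worker, then one new chunk per completion message, with a null range to retire each worker) can be simulated in a single JVM. This is a sketch under stated assumptions: `BlockingQueue`s stand in for `world.send`/`world.receive`, an empty `int[]` stands in for a null `Range`, and `nextRange` is a hypothetical fixed-chunk dynamic schedule rather than `IntegerSchedule.runtime()`. The class name `MasterWorkerSketch` is likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical single-JVM simulation of the one-level master/worker protocol.
class MasterWorkerSketch {
    static final int[] NO_MORE_WORK = new int[0]; // plays the role of a null Range

    // Hypothetical dynamic schedule: the next `chunk` rows, or NO_MORE_WORK when exhausted.
    static int[] nextRange(int[] cursor, int height, int chunk) {
        if (cursor[0] >= height) return NO_MORE_WORK;
        int lb = cursor[0];
        int ub = Math.min(lb + chunk, height) - 1;
        cursor[0] = ub + 1;
        return new int[] {lb, ub};
    }

    // Returns, for each row in [0, height), the number of the worker that computed it.
    static int[] run(int height, int chunk, int K) {
        List<BlockingQueue<int[]>> toWorker = new ArrayList<>();       // one inbox per worker
        for (int w = 0; w < K; w++) toWorker.add(new LinkedBlockingQueue<>());
        BlockingQueue<Integer> toMaster = new LinkedBlockingQueue<>(); // carries only the worker's tag
        int[] owner = new int[height];
        Arrays.fill(owner, -1);

        List<Thread> workers = new ArrayList<>();
        for (int w = 0; w < K; w++) {
            final int worker = w;
            Thread t = new Thread(() -> {
                try {
                    for (;;) {
                        int[] range = toWorker.get(worker).take();   // receive chunk range
                        if (range == NO_MORE_WORK) break;            // "null": no more work
                        for (int r = range[0]; r <= range[1]; r++) owner[r] = worker; // "compute"
                        toMaster.put(worker);                        // report completion
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(t);
            t.start();
        }

        // Master section, running in the calling thread.
        try {
            int[] cursor = {0};
            int activeWorkers = K;
            for (int w = 0; w < K; w++) {                // initial chunk to every worker
                int[] range = nextRange(cursor, height, chunk);
                toWorker.get(w).put(range);
                if (range == NO_MORE_WORK) --activeWorkers;
            }
            while (activeWorkers > 0) {
                int w = toMaster.take();                 // completion from any worker
                int[] range = nextRange(cursor, height, chunk);
                toWorker.get(w).put(range);              // reply to that specific worker
                if (range == NO_MORE_WORK) --activeWorkers;
            }
            for (Thread t : workers) t.join();           // makes owner[] writes visible
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return owner;
    }

    public static void main(String[] args) {
        int[] owner = run(1000, 10, 4);
        for (int o : owner) if (o < 0) throw new AssertionError("row not computed");
        System.out.println("all rows computed");
    }
}
```

The loop terminates because `activeWorkers` counts exactly the workers that hold a real chunk and have not yet been sent the "null" range, so each pass through `take()` is matched by a pending completion message.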
Worker Thread Activity: Receive

    private static void workerSection(int worker) throws IOException {
        // Image, writer, matrix, and row slice variables are now local here.
        . . .
        for (;;) {
            // Receive chunk range from master. If null, no more work.
            ObjectItemBuf<Range> rangeBuf = ObjectBuf.buffer();
            world.receive(0, worker, rangeBuf);
            Range range = rangeBuf.item;
            if (range == null) break;
            int lb = range.lb();
            int ub = range.ub();
            int len = range.length();

            // Allocate storage for matrix row slice if necessary.
            if (slice == null || slice.length < len)
                slice = new int [len] [width];

            // Code to compute rows and columns of slice goes here.
Worker Thread Activity: Send

    private static void workerSection(int worker) throws IOException {
        // Image, writer, matrix, and row slice variables are now local here.
        . . .
        for (;;) {
            // Receive chunk range from master. If null, no more work.
            ObjectItemBuf<Range> rangeBuf = ObjectBuf.buffer();
            world.receive(0, worker, rangeBuf);
            Range range = rangeBuf.item;
            if (range == null) break;
            . . .

            // Report completion of slice to master.
            world.send(0, worker, IntegerBuf.emptyBuffer());

            // Set full pixel matrix rows to refer to slice rows.
            System.arraycopy(slice, 0, matrix, lb, len);

            // Write row slice of full pixel matrix to image file.
            writer.writeRowSlice(range);
        }
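Two details of the worker loop are worth checking in isolation: the slice is only reallocated when it is missing or too short, and `System.arraycopy` on an `int[][]` copies row references, not pixel values. After the copy, rows of `matrix` alias rows of `slice`, so refilling the reused slice for the next chunk also changes those matrix rows; presumably that is why each row slice is written to the file in the same loop iteration. The sketch below uses hypothetical sizes and a hypothetical helper `ensureSlice` that mirrors the allocation test in the worker.

```java
import java.util.Arrays;

// Hypothetical demo of the grow-only slice buffer and of row aliasing after System.arraycopy.
class SliceAliasingSketch {
    // Same test as the worker: reuse the slice unless it is missing or too short.
    static int[][] ensureSlice(int[][] slice, int len, int width) {
        if (slice == null || slice.length < len) slice = new int[len][width];
        return slice;
    }

    public static void main(String[] args) {
        int width = 4, height = 6;
        int[][] matrix = new int[height][];            // full pixel matrix; rows set by reference

        // First chunk: rows 0..2.
        int[][] slice = ensureSlice(null, 3, width);   // allocates a 3 x 4 slice
        for (int[] row : slice) Arrays.fill(row, 1);   // stand-in for computing pixels
        System.arraycopy(slice, 0, matrix, 0, 3);      // copies 3 row references
        System.out.println(matrix[0] == slice[0]);     // true: same row object

        // Second chunk: rows 3..5. The slice is long enough, so it is reused.
        int[][] reused = ensureSlice(slice, 3, width);
        System.out.println(reused == slice);           // true
        for (int[] row : reused) Arrays.fill(row, 2);
        System.arraycopy(reused, 0, matrix, 3, 3);
        System.out.println(matrix[0][0]);              // 2: row 0 was overwritten through the alias
    }
}
```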
One-Level Scheduling Performance
• With one master and Kp * Kt workers, the program sends many messages just to schedule them all
• Two-level scheduling:
  – One worker per node, but each worker uses multiple threads
  – Two schedules, one from the master for each worker and one from each worker for its threads
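The message cost of the master protocol can be counted directly from the master section: with C chunks and K workers, the master sends C chunk ranges plus K null ranges and receives C completion messages, for 2C + K messages in total. The sketch below evaluates that count; the image height, node and thread counts, and the one-level chunk size of 1 row are hypothetical values chosen only for illustration. In the two-level scheme the per-row scheduling happens inside a node, so it generates no messages.

```java
// Counts scheduling messages for the master protocol: 2 * chunks + workers.
// All parameter values in main() are hypothetical.
class MessageCount {
    static long messages(long height, long chunk, long workers) {
        long chunks = (height + chunk - 1) / chunk; // ceil(height / chunk)
        return 2 * chunks + workers;                // chunk sends + completions + final nulls
    }

    public static void main(String[] args) {
        // One level: every thread is a worker (Kp = 4 nodes x Kt = 4 threads), 1-row chunks.
        System.out.println(messages(3200, 1, 4 * 4));   // 6416
        // Two level: one worker per node, 100-row chunks from the master.
        System.out.println(messages(3200, 100, 4));     // 68
    }
}
```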
Two-Level Scheduling
Changes to Program
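• The master uses a schedule with a chunk size of 100; each worker uses a schedule with a chunk size of 1 for its threads
• The master node (process 0) runs two parallel sections, the master section and the worker section, and its worker section has its own thread team
• No worker tags are needed, since each process has exactly one worker
• Otherwise, the master section is unchanged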
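The two levels of scheduling can be sketched with threads alone. In this hypothetical stand-in (plain Java, not Parallel Java), a shared counter that hands out 100-row chunks plays the master's schedule, one thread per "node" takes those chunks, and inside each chunk a second counter hands single rows to Kt threads, playing the role of the thread-level schedule. In the real program the first level goes through messages; the class name `TwoLevelSketch` and its parameters are assumptions.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical two-level dynamic schedule: coarse chunks per node, single rows per thread.
class TwoLevelSketch {
    // Runs nThreads copies of body in parallel and waits for all of them.
    static void parallel(int nThreads, Runnable body) {
        Thread[] ts = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) { ts[i] = new Thread(body); ts[i].start(); }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
    }

    // Returns how many times each row in [0, height) was computed.
    static int[] run(int height, int Kp, int Kt, int nodeChunk) {
        int[] visits = new int[height];
        AtomicInteger nextNodeRow = new AtomicInteger(0);      // level 1: master's schedule
        parallel(Kp, () -> {                                   // one worker per "node"
            for (;;) {
                int lb = nextNodeRow.getAndAdd(nodeChunk);     // next coarse chunk
                if (lb >= height) break;
                int ub = Math.min(lb + nodeChunk, height) - 1;
                AtomicInteger nextRow = new AtomicInteger(lb); // level 2: chunk size 1
                parallel(Kt, () -> {
                    int r;
                    while ((r = nextRow.getAndIncrement()) <= ub) visits[r]++; // "compute" row r
                });
            }
        });
        return visits;
    }

    public static void main(String[] args) {
        int[] visits = run(1000, 3, 4, 100);
        for (int v : visits) if (v != 1) throw new AssertionError();
        System.out.println("every row computed exactly once");
    }
}
```

Creating a new thread team for every chunk keeps the sketch short; the lecture's worker instead reuses one `ParallelTeam` across chunks.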
Set Up and Run the Threads

    // In master process, run master section and worker section in parallel.
    if (rank == 0)
        new ParallelTeam(2).execute(new ParallelRegion() {
            public void run() throws Exception {
                execute(new ParallelSection() {
                    public void run() throws Exception {
                        masterSection();
                    }
                },
                new ParallelSection() {
                    public void run() throws Exception {
                        workerSection();
                    }
                });
            }
        });

    // In worker process, run only worker section.
    else
        workerSection();
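Process 0 must run its master and worker sections at the same time, because each one blocks waiting for messages from the other. The sketch below illustrates this with plain Java: a two-thread `ExecutorService` stands in for `ParallelTeam(2)` with two parallel sections, and a `SynchronousQueue` stands in for master-to-worker messages (a `put` blocks until the other side takes). The names `TwoSectionsSketch`, `runInParallel`, and `demo`, and the use of -1 as "no more work", are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;

// Hypothetical illustration: two sections that must run concurrently in one process.
class TwoSectionsSketch {
    // Runs both sections on separate threads and waits for both to finish.
    static void runInParallel(Runnable section1, Runnable section2) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<?> f1 = pool.submit(section1);
            Future<?> f2 = pool.submit(section2);
            f1.get();
            f2.get(); // also makes the worker's writes visible to this thread
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Master hands out `chunks` chunk numbers, then -1; returns how many the worker processed.
    static int demo(int chunks) {
        SynchronousQueue<Integer> toWorker = new SynchronousQueue<>();
        int[] processed = {0};
        Runnable master = () -> {
            try {
                for (int c = 0; c < chunks; c++) toWorker.put(c); // blocks until the worker takes it
                toWorker.put(-1);                                 // "no more work"
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Runnable worker = () -> {
            try {
                while (toWorker.take() >= 0) processed[0]++;      // "compute" one chunk
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        runInParallel(master, worker);
        return processed[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(5)); // 5
    }
}
```

Calling `master.run()` and then `worker.run()` on a single thread would block forever at the master's first `put`, which is the situation the two parallel sections avoid.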
Worker Thread Activity

    private static void workerSection() throws IOException {
        // Image, writer, matrix, and row slice variables are now local here.
        . . .
        // Parallel team to calculate each slice in multiple threads.
        ParallelTeam team = new ParallelTeam();
        for (;;) {
            // Receive chunk range from master. If null, no more work.
            ObjectItemBuf<Range> rangeBuf = ObjectBuf.buffer();
            world.receive(0, rangeBuf);
            Range range = rangeBuf.item;
            if (range == null) break;
            final int lb = range.lb();
            final int ub = range.ub();
            final int len = range.length();

            // Allocate storage for matrix row slice if necessary.
            if (slice == null || slice.length < len)
                slice = new int [len] [width];
Worker Thread Activity

    private static void workerSection() throws IOException {
        // Image, writer, matrix, and row slice variables are now local here.
        . . .
        // Parallel team to calculate each slice in multiple threads.
        ParallelTeam team = new ParallelTeam();
        for (;;) {
            . . .
            // Compute rows of slice in parallel threads.
            team.execute(new ParallelRegion() {
                public void run() throws Exception {
                    execute(lb, ub, new IntegerForLoop() {
                        // Use the thread-level loop schedule.
                        public IntegerSchedule schedule() {
                            return thrschedule;
                        }

                        // Compute all rows and columns in slice.
                        public void run(int first, int last) {
                            for (int r = first; r <= last; ++ r) {
                                // Yadah, yadah, yadah