Adaptive Execution Support for Malleable Computation
Speaker: LIN Qian
http://www.comp.nus.edu.sg/~linqian
Outline
• Introduce the key ideas of 3 selected papers
• Discussion
FORMLESS
• FORMLESS: Scalable Utilization of Embedded Manycores in Streaming Applications [LCTES’12]
  – Functionally-cOnsistent stRucturally-MalLEable Streaming Specification
  – Actor-oriented specification models
  – Space exploration scheme
    • customizes the application specification to better fit the target platform
FORMLESS (cont.)
• Space exploration for platform-driven instantiation
FORMLESS (cont.)
• Example:
Dynamic Load Balancing
• A Distributed and Adaptive Dynamic Load Balancing Scheme for Parallel Processing of Medium-Grain Tasks [IEEE Journal, 1990]
  – Challenge: allocate and distribute tasks dynamically with minimum run-time overhead.
  – Design: a distributed and adaptive load balancing scheme for medium-grain tasks
Dynamic Load Balancing (cont.)
• Key idea 1: Neighborhood average strategy
  – Attempts to balance load within a neighborhood by distributing tasks
    • such that all neighbors have loads close to the neighborhood average.
  – The decision of when to balance load is based on neighborhood state information that is checked periodically.
    • Each processor maintains status information for all its neighbors.
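The neighborhood average strategy can be illustrated with a minimal sketch. The slides do not give the paper's actual thresholds or transfer rules, so the function name `neighborhood_transfers`, the `threshold` parameter, and the "largest deficit first" order are all assumptions made for illustration.

```python
def neighborhood_transfers(loads, neighbors, node, threshold=1):
    """Return {neighbor: n_tasks} that `node` would send so that loads in
    its neighborhood move toward the neighborhood average.

    loads     -- dict: processor id -> number of queued tasks
    neighbors -- dict: processor id -> list of neighboring processor ids
    threshold -- hypothetical minimum surplus that triggers balancing
    """
    group = [node] + list(neighbors[node])
    avg = sum(loads[p] for p in group) / len(group)

    # Whole tasks this node holds above the neighborhood average.
    surplus = int(loads[node] - avg)
    if surplus < threshold:
        return {}

    # Neighbors below the average, and how far below they are.
    deficits = {p: avg - loads[p] for p in neighbors[node] if loads[p] < avg}

    transfers = {}
    # Assumed policy: serve the most underloaded neighbor first.
    for p, deficit in sorted(deficits.items(), key=lambda kv: -kv[1]):
        if surplus <= 0:
            break
        n = min(surplus, int(deficit))
        if n > 0:
            transfers[p] = n
            surplus -= n
    return transfers
```

In a real system each processor would run this check periodically against the status information it keeps for its neighbors; here the whole load table is passed in directly to keep the sketch self-contained.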
Dynamic Load Balancing (cont.)
• Key idea 2: Grain size control
  – If the cost of making work available to another processor exceeds the cost of executing it on the local processor, it does not make sense to decompose and parallelize work below a certain size (granularity).
  – Granularity control: decide when to stop breaking a computation down into parallel computations at a frontier node, treating it as a leaf node and executing it sequentially.
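As a rough sketch of grain size control, the recursion below stops splitting once the estimated local execution cost of a piece no longer exceeds the cost of handing it to another processor. The cost model (`cost_per_item`, `transfer_cost`) and the function name `decompose` are hypothetical; the slides do not say how the paper estimates these costs.

```python
def decompose(lo, hi, cost_per_item, transfer_cost):
    """Split the index range [lo, hi) into leaf tasks.

    A piece becomes a leaf (to be executed sequentially) as soon as running
    it locally costs no more than making it available to another processor.
    Both cost parameters are illustrative assumptions.
    """
    size = hi - lo
    if size <= 1 or size * cost_per_item <= transfer_cost:
        # Frontier node treated as a leaf: do not parallelize further.
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return (decompose(lo, mid, cost_per_item, transfer_cost)
            + decompose(mid, hi, cost_per_item, transfer_cost))
```

With a low transfer cost the range is split into many small leaves; with a high transfer cost the whole range stays a single sequential task.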
Adaptive Load Balancing
• Compiler and Run-Time Support for Adaptive Load Balancing in Software Distributed Shared Memory Systems [1998]
  – Use information provided by the compiler to help the run-time system distribute the work of parallel loops
    • according to the relative power of the processors
    • while minimizing communication and page sharing
Adaptive Load Balancing (cont.)
• Compile-Time Support for Load Balancing
  – The compiler is built on the SUIF system, which is organized as a set of compiler passes.
  – The SUIF pass extracts the shared data access patterns in each of the SPMD regions and feeds this information to the run-time system.
    • It is also responsible for adding hooks in the parallelized code that allow the run-time library to change the load distribution.
--------
SUIF: Stanford University Intermediate Format
SPMD: Single-Program Multiple-Data
Adaptive Load Balancing (cont.)
  – Access pattern extraction
    • The SUIF pass walks through the program looking for accesses to shared memory.
  – Prefetching
    • Uses the access pattern information to prefetch data through prefetching calls.
  – Load balancing interface and strategy
    • The compiler can direct the run-time to choose between two partitioning strategies for distributing the parallel loops:
      1. Shifting of loop boundaries
      2. Multiple loop bounds
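The first strategy, shifting of loop boundaries, can be pictured as moving the cut points of a block-distributed loop so that each processor's contiguous block is proportional to its relative power. The sketch below is an assumption about how such a partition could be computed, not the paper's run-time interface; `shifted_loop_bounds` and `powers` are hypothetical names. The second strategy is not sketched because the slides do not describe it.

```python
def shifted_loop_bounds(n_iters, powers):
    """Partition iterations [0, n_iters) into contiguous blocks, one per
    processor, sized in proportion to each processor's relative power.

    Returns a list of (start, end) half-open ranges in processor order.
    """
    total = sum(powers)
    bounds = []
    start = 0
    cumulative = 0.0
    for i, power in enumerate(powers):
        cumulative += power
        if i == len(powers) - 1:
            end = n_iters  # last block absorbs any rounding remainder
        else:
            end = round(n_iters * cumulative / total)
        bounds.append((start, end))
        start = end
    return bounds
```

Because every block stays contiguous and keeps its position in processor order, a change in relative power only moves data near the block edges, which fits the stated goal of limiting communication and page sharing.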
Adaptive Load Balancing (cont.)
• Run-Time Load Balancing Support
  – The run-time library is responsible for keeping track of the progress of each process:
    • collect statistics about the execution time of each parallel task, and
    • adjust the load accordingly
  – Load balancing vs. locality management
    • need to avoid unnecessary movement of data and minimize page sharing
    • Locality-conscious load balancing: the run-time library uses the information supplied by the compiler about which loop distribution strategy to use.
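One way to picture the run-time side is a pair of helpers: one turns measured execution times into relative processor power, the other decides whether the imbalance is large enough to justify moving data. Both functions, their names, and the `tolerance` parameter are assumptions made for illustration; the slides do not specify the statistics or thresholds the library uses.

```python
def estimate_relative_power(iters_done, elapsed):
    """Estimate each processor's share of total throughput from the number
    of loop iterations it completed and the time it took (seconds)."""
    rates = [n / t for n, t in zip(iters_done, elapsed)]
    total = sum(rates)
    return [r / total for r in rates]


def should_rebalance(elapsed, tolerance=0.1):
    """Redistribute only if the fastest and slowest processors differ by
    more than `tolerance` (as a fraction of the slowest time), to avoid
    moving data for marginal gains. The threshold is hypothetical."""
    slowest = max(elapsed)
    fastest = min(elapsed)
    return (slowest - fastest) / slowest > tolerance
```

The estimated shares could then be used as the weights for whichever loop distribution strategy the compiler selected.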
Algorithms for Scheduling
• Scheduling Malleable Parallel Tasks: An Asymptotic Fully Polynomial-Time Approximation Scheme [2002]
• Mapping and Scheduling Heterogeneous Tasks using Genetic Algorithms [1995]