Outline
I. Motivation
II. Solutions
(1) Dimensional Ordering
(2) Space Filling Curve Ordering
III. Evaluation Metrics
IV. Experimental Results
What is job allocation on HPC?
Why we care about job allocation
• Jobs submitted to an HPC system require different numbers of processors.
• An HPC system consists of hundreds or thousands of processors, organized in the hierarchy:
Processors -> Nodes -> Midplanes (or blade cards) -> Racks (chassis) -> Cabinets
• HPC network resources are limited, especially bandwidth and connections (routing paths).
• Communication in HPC is expensive, more expensive than computation.
IBM            Ratio     Cray    Ratio
Blue Gene/L    0.375     XT3     8.77
Blue Gene/P    0.375     XT4     1.36
Blue Gene/Q    0.117     XT5     0.23
Table 1: Byte-to-flop ratios
With each machine generation, the interconnection network can communicate fewer bytes for each flop performed on a node.
Topology-aware job scheduling and allocation is therefore of great importance for HPC systems.
Currently, only 6% of the Top500 machines (primarily the IBM Blue Gene series) provide contiguous node allocation for their jobs.
Contiguous VS Non-Contiguous job allocation
Contiguous
Pros • Low communication cost/network contention
Cons • Low system utilization • Long wait time • Fragmentation
Non-Contiguous
Pros • High system utilization • Short wait time • No fragmentation
Cons • High communication cost/network contention
Processor Ordering—Sequence of allocation
1. Dimensional Ordering
2. Space Filling Curve (Hilbert Curve) Ordering
In a 3D torus topology with dimensions w, l, d, each node has an index ind and coordinates (x, y, z).
Dimensional Ordering: ind = z*w*l + y*w + x
Space Filling Curve Ordering: ind = H(x, y, z) = (h(x), h(y), h(z))
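The two orderings can be compared with a small sketch. The dimensional index follows the formula on the slide directly. The slides do not say how H is computed, so the Hilbert index below is an assumption: it uses Skilling's transpose method and only applies when every dimension has the same power-of-two length. The function names and the 4x4x4 example are illustrative, not taken from the original work.

```python
def dimensional_index(x, y, z, w, l):
    """Row-major index from the slide: ind = z*w*l + y*w + x."""
    return z * w * l + y * w + x


def hilbert_index(coords, bits):
    """Hilbert-curve index of a point whose coordinates each fit in `bits` bits.

    Assumption: Skilling's transpose method (not confirmed by the slides).
    Works for any number of dimensions.
    """
    x = list(coords)
    n = len(x)
    top = 1 << (bits - 1)

    # Undo the rotations/reflections, from the top bit down.
    q = top
    while q > 1:
        p = q - 1
        for i in range(n):
            if x[i] & q:
                x[0] ^= p                 # invert the low bits of x[0]
            else:
                t = (x[0] ^ x[i]) & p     # exchange low bits of x[0] and x[i]
                x[0] ^= t
                x[i] ^= t
        q >>= 1

    # Gray encode.
    for i in range(1, n):
        x[i] ^= x[i - 1]
    t = 0
    q = top
    while q > 1:
        if x[n - 1] & q:
            t ^= q - 1
        q >>= 1
    for i in range(n):
        x[i] ^= t

    # Interleave the bits, most significant first, into one integer.
    h = 0
    for b in range(bits - 1, -1, -1):
        for i in range(n):
            h = (h << 1) | ((x[i] >> b) & 1)
    return h


if __name__ == "__main__":
    side, bits = 4, 2  # hypothetical 4x4x4 machine
    nodes = [(x, y, z) for z in range(side) for y in range(side) for x in range(side)]
    by_do = sorted(nodes, key=lambda c: dimensional_index(*c, side, side))
    by_hsfc = sorted(nodes, key=lambda c: hilbert_index(c, bits))
    print("DO  :", by_do[:8])    # first 8 nodes fill rows of a single plane
    print("HSFC:", by_hsfc[:8])  # first 8 nodes fill a 2x2x2 cube
```

In the Hilbert ordering, consecutive nodes are always one hop apart, which is why the first N nodes tend to form a compact block rather than a plane.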
colored by job and illustrates the planar and fragmented
nature of the default selection algorithm.
The new node selection algorithm was designed to select
nodes in a cubic geometry by using a node ordering mask,
a static, total ordering of all compute nodes, constructed
by taking the shortest path through the machine from node
to node. The mask was then used to order free nodes on
each scheduling cycle, assigning the first N nodes from
this list to a job requiring N nodes. The reader is
encouraged to view an animation[7] illustrating the
construction of the node ordering mask by comparing the
physical and wired views of the machine as nodes are
added to the mask.
Ordering the list of free nodes according to this mask is
computationally no more expensive than sorting them
numerically, so there is no additional overhead in using
this new algorithm.
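The selection step described above can be sketched as follows. This is a minimal illustration that assumes the mask is already available as a list of node ids in mask order. Constructing the mask itself (the shortest path through the machine) is not shown, and the name select_nodes and the example ids are hypothetical.

```python
def select_nodes(free_nodes, mask, n):
    """Return the first n free nodes in mask order, or None if too few are free.

    free_nodes: iterable of currently free node ids
    mask:       static total ordering of all compute node ids
    """
    free_nodes = list(free_nodes)
    if n > len(free_nodes):
        return None
    # Position of each node in the mask; the sort cost is the same as for
    # a numeric sort of node ids, only the key differs.
    rank = {node: pos for pos, node in enumerate(mask)}
    return sorted(free_nodes, key=rank.__getitem__)[:n]


if __name__ == "__main__":
    mask = [3, 0, 2, 1, 4]      # hypothetical ordering of five nodes
    free = {4, 1, 3}
    print(select_nodes(free, mask, 2))
```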
Figure 5. Xt3dmon wired view showing planar nature of
default node selection algorithm leading to non-contiguous
node assignment within a job. Jobs are color
coded. Service nodes are yellow.
To illustrate the node selection differences between the
default and new algorithms on a set of real jobs, a time
lapse animation[8] has been produced that shows a six
hour window starting from an empty state on the machine.
This animation contrasts the differences between the two
algorithms on the same set of jobs and shows how larger
jobs generally get contiguous nodes in a cubic geometry
using the new algorithm while jobs using the old default
node id ordering algorithm have a more planar and non-
contiguous geometry. Figures 5 and 6 also help to
illustrate these differences.
4.0 System Changes to Benefit Specific Jobs
The changes detailed in section 3 were made to help
improve interconnect performance for all jobs. In this
section system changes to accommodate applications that
understand the machine topology and that can assign tasks
to take advantage of node proximity will be reviewed.
For these topology-aware codes each must be given a
specific geometry or shape. In addition the codes must
know the coordinates of the nodes that have been assigned
so that they may assign tasks appropriately.
Figure 6. Xt3dmon wired view showing cubic nature of
new node selection algorithm leading to contiguous node
assignment within a job. Jobs are color coded. Service
nodes are yellow.
Figure 7. Xt3dmon wired view showing an 8x8x8 node
job allocation in red.
4.1 OpenAtom
OpenAtom is a quantum chemistry code that is highly
communications bound and its performance is highly
influenced by placement on a torus topology machine[9].
The goal of the researchers working with this code on
BigBen is to minimize the communication volume of
Dimensional Ordering job allocation algorithm leading to non-contiguous node assignment within a job. Jobs are color coded.
I. Dimensional Ordering
Evaluation Metrics
Parameter    Geo-Metric
α1           Average Pairwise Distance (m1)
α2           Diameter (m2)
α3           Max Dimension (m3)
α4           Distance between Logic Neighbors (m4)
• αi is obtained from running benchmarks on Blue Gene/Q
• Penalty function: p = ∑ αi · mi
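A rough sketch of how the four geo-metrics and the penalty could be computed for one job on a torus. The slides do not define the metrics precisely, so the definitions of m3 (widest bounding-box extent, ignoring wraparound) and m4 (mean distance between consecutively allocated nodes) are assumptions. The α values in the example are placeholders, not the benchmark-derived weights.

```python
from itertools import combinations


def torus_distance(a, b, dims):
    """Hop count between nodes a and b on a torus with the given dimensions."""
    return sum(min(abs(p - q), size - abs(p - q)) for p, q, size in zip(a, b, dims))


def geo_metrics(nodes, dims):
    """Return (m1, m2, m3, m4) for a job's nodes, listed in allocation order.

    Needs at least two nodes.
    """
    pair_dists = [torus_distance(a, b, dims) for a, b in combinations(nodes, 2)]
    m1 = sum(pair_dists) / len(pair_dists)   # average pairwise distance
    m2 = max(pair_dists)                     # diameter
    # Assumed definition: widest bounding-box extent, ignoring wraparound.
    m3 = max(max(axis) - min(axis) + 1 for axis in zip(*nodes))
    # Assumed definition: mean distance between consecutively allocated nodes.
    hops = [torus_distance(a, b, dims) for a, b in zip(nodes, nodes[1:])]
    m4 = sum(hops) / len(hops)
    return m1, m2, m3, m4


def penalty(metrics, alphas):
    """Penalty function p = sum(alpha_i * m_i)."""
    return sum(a * m for a, m in zip(alphas, metrics))


if __name__ == "__main__":
    dims = (4, 4, 4)                                   # hypothetical torus
    job = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
    alphas = (1.0, 1.0, 1.0, 1.0)                      # placeholder weights
    m = geo_metrics(job, dims)
    print(m, penalty(m, alphas))
```

Computing the penalty for the same job under both orderings gives a per-job difference that can be compared across a trace.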
Communication Pattern
• Broadcast
• P2P
Communication Pattern    Dominant Metric
All-to-All               Average Pairwise Distance
One-to-All               Diameter
Nearest Neighbor         Distance Between Neighbors
I. SDSC Blue
• System: IBM SP at SDSC; 144 nodes; 1152 processors
• Duration: Apr 2000 to May 2000
• Jobs: 2,440
Traces and Evaluation
SDSC-BLUE Trace
• Horizontal axis is the id of each job • HSFC: Hilbert Space Filling Curve Ordering • DO: Dimensional Ordering
[Figure: Average Pairwise Distance Difference, HSFC vs DO]
[Figure: Max Dimension Difference, HSFC vs DO]
[Figure: Diameter Difference, HSFC vs DO]
[Figure: Job Runtime Improvement (%), HSFC vs DO]
II. LLNL Thunder
• System: Linux Cluster (Thunder) at LLNL; 1024 nodes; 4096 processors
• Duration: Feb 2007 to Mar 2007
• Jobs: 2,662
LLNL Thunder
• Horizontal axis is the id of each job • HSFC: Hilbert Space Filling Curve Ordering • DO: Dimensional Ordering
[Figure: Average Pairwise Distance Difference, HSFC vs DO]
[Figure: Max Dimension Difference, HSFC vs DO]
[Figure: Diameter Difference, HSFC vs DO]
[Figure: Job Runtime Improvement (%), HSFC vs DO]