The HPC Challenge (HPCC) Benchmark Suite
Presented by Piotr Luszczek
The MathWorks, Inc.
http://icl.cs.utk.edu/hpcc/
2 Luszczek_HPCC_SC07
HPCC: Components
1. HPL (High Performance LINPACK): Ax = b
2. STREAM: four vector kernels (table below)
3. PTRANS: A ← A^T + B
4. RandomAccess: T[k] ← T[k] ⊕ a_i, where T is a table of 64-bit entries
5. FFT: $z_k = \sum_j x_j \exp(-2\pi\sqrt{-1}\, jk/n)$
6. Matrix-matrix multiply: C ← s*C + t*A*B
7. b_eff (effective bandwidth/latency): ping-pong

-------------------------------------------------------
name    kernel                  bytes/iter  FLOPS/iter
-------------------------------------------------------
COPY:   a(i) = b(i)             16          0
SCALE:  a(i) = q*b(i)           16          1
SUM:    a(i) = b(i) + c(i)      24          1
TRIAD:  a(i) = b(i) + q*c(i)    24          2
-------------------------------------------------------
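The bytes/iter column of the STREAM table can be turned into a bandwidth figure once a kernel has been timed. The following is a minimal sketch in pure Python, not the STREAM source: the names `KERNELS`, `triad`, and `bandwidth_gb_per_s` are hypothetical, and interpreted Python will not give a meaningful hardware measurement. It only illustrates the arithmetic.

```python
import time

# Bytes moved and floating-point operations per loop iteration,
# copied from the STREAM table (8-byte double-precision elements).
KERNELS = {
    "COPY": {"bytes": 16, "flops": 0},
    "SCALE": {"bytes": 16, "flops": 1},
    "SUM": {"bytes": 24, "flops": 1},
    "TRIAD": {"bytes": 24, "flops": 2},
}


def triad(a, b, c, q):
    """TRIAD kernel: a(i) = b(i) + q*c(i), updating a in place."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def bandwidth_gb_per_s(kernel, n, seconds):
    """Bandwidth implied by n iterations of a kernel (hypothetical helper)."""
    return KERNELS[kernel]["bytes"] * n / seconds / 1e9


if __name__ == "__main__":
    n = 100_000
    a, b, c = [0.0] * n, [1.0] * n, [2.0] * n
    start = time.perf_counter()
    triad(a, b, c, 3.0)
    elapsed = time.perf_counter() - start
    # Illustrative only: this times the Python interpreter, not memory.
    print(f"TRIAD: {bandwidth_gb_per_s('TRIAD', n, elapsed):.4f} GB/s")
```

TRIAD reads two operands and writes one per iteration, which is where the 24 bytes come from; COPY and SCALE touch only two arrays, hence 16.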
HPCC: Motivation and measurement
[Figure, "Measurement": scatter plot with axes spatial locality and temporal locality, each from 0.00 to 1.00, showing HPC Challenge benchmarks (HPL, STREAM, RandomAccess) and select applications (Test3D, CG, Overflow, Gamess, AVUS, OOCore, RFCTH2, HYCOM). Generated by PMaC @ SDSC.]
Spatial and temporal data locality here is for one node/processor, i.e., locally or "in the small."
[Figure, "Concept": spatial locality and temporal locality axes, each from low to high, marking DGEMM, HPL, PTRANS, STREAM, FFT, RandomAccess, and mission partner applications.]
HPCC: Scope and naming conventions
Scope prefixes:
• G: Global
• EP: Embarrassingly Parallel
• S: Single
[Figure: memory (M) and processor (P) units connected by a network, drawn once for each of the Global, Embarrassingly Parallel, and Single scopes.]
Computational resources:
• CPU
• Memory
• Interconnect
[Figure labels: HPL, STREAM, FFT..., RandomAccess (1 m), HPL (25%)]
Software modules:
• MPI
• OpenMP
• Vectorize
[Figure labels: system, CPU, core(s), thread]
HPCC: Hardware probes
HPCS program has developed a new suite of benchmarks (HPC Challenge).
Each benchmark focuses on a different part of the memory hierarchy.
HPCS program performance targets will flatten the memory hierarchy, improve real application performance, and make programming easier.

HPC Challenge benchmark           Operation                HPCS performance target (improvement)
Top500: solves a system           Ax = b                   2 Petaflops (8x)
STREAM: vector operations         A = B + s*C              6.5 Petabytes/s (40x)
FFT: 1D fast Fourier transform    Z = FFT(X)               0.5 Petaflops (200x)
RandomAccess: random updates      T(i) = XOR( T(i), r )    64,000 GUPS (2000x)

Corresponding memory hierarchy: Registers, Cache, Local memory, Remote memory, Disk.
Units moved between levels: instruction operands, blocks, messages, pages.
[Figure labels: bandwidth, latency]
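The RandomAccess update T(i) = XOR( T(i), r ) and the GUPS (giga-updates per second) unit can be illustrated with a short sketch. This is not the HPCC implementation: the function names are hypothetical, and Python's `random.Random` stands in for the benchmark's own pseudo-random stream, which the slides do not describe.

```python
import random


def random_access(table, num_updates, seed=0):
    """Apply T[k] = T[k] XOR r for a stream of 64-bit values r.

    The table length must be a power of two so that the low bits of r
    select the entry to update.
    """
    mask = len(table) - 1
    rng = random.Random(seed)  # stand-in stream, not the HPCC generator
    for _ in range(num_updates):
        r = rng.getrandbits(64)
        table[r & mask] ^= r


def gups(num_updates, seconds):
    """Giga-updates per second for a timed run (hypothetical helper)."""
    return num_updates / seconds / 1e9


if __name__ == "__main__":
    table = list(range(1 << 10))
    random_access(table, 4 * len(table), seed=1)
    # XOR is its own inverse, so replaying the same stream restores the table.
    random_access(table, 4 * len(table), seed=1)
    print(table == list(range(1 << 10)))
```

Because each update lands on an essentially arbitrary table entry, the kernel stresses memory latency rather than bandwidth, which is why it sits at the far end of the hierarchy from Top500.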
HPCC: Official submission process
Prerequisites:
• C compiler
• BLAS
• MPI
1. Download
2. Install
3. Run
4. Upload results
5. Confirm via email
Optional:
6. Tune
7. Run
8. Upload results
9. Confirm via email
Provide detailed installation and execution environment.
• Only some routines can be replaced.
• Data layout needs to be preserved.
• Multiple languages can be used.
Results are immediately available on the Web site:
● Interactive HTML
● XML
● MS Excel
● Kiviat charts (radar plots)
HPCC: Submissions over time
[Figure: four log-scale time-series panels, STREAM [GB/s], HPL [Tflop/s], FFT [Gflop/s], and RandomAccess [GUPS], each with a "Sum" series and a "#1" series. The time axes run from Nov-03 to Sep-06 in two panels and from Jul-04 to Oct-06 in the other two.]
HPCC: Comparing three interconnects
Kiviat chart (radar plot):
• 3 AMD Opteron clusters
• Clock: 2.2 GHz
• 64-processor cluster
Interconnect types:
A. Vendor
B. Commodity
C. GigE
Cannot be differentiated based on:
• G-HPL
• Matrix-matrix multiply
Available on HPCC Web site: http://icl.cs.utk.edu/hpcc/
HPCC: Analysis of sample results
[Figure: effective bandwidth in words/second on a log scale from 1.E+03 to 1.E+15 (Kilo, Mega, Giga, Tera, Peta), with series Top500, STREAM, FFT, and RandomAccess plotted per system in Top500 order.]
Systems: DARPA HPCS Goals, IBM BG/L (LLNL) Opt, IBM BG/L (LLNL), IBM Power5 (LLNL), Cray XT3 (ORNL), Cray XT3 (ERDC), Cray X1 (ORNL) Opt, Cray X1 (ORNL), NEC SX-8 (HLRS), SGI Altix (NASA), Cray X1E (AHPCRC), Opteron (AMD), Dell GigE P64 (MITLL), Dell GigE P32 (MITLL), Dell GigE P16 (MITLL), Dell GigE P8 (MITLL), Dell GigE P4 (MITLL), Dell GigE P2 (MITLL), Dell GigE P1 (MITLL)
• All results in words/second
• Highlights memory hierarchy
• Clusters (~10^6): hierarchy steepens
• HPC systems (~10^4): hierarchy constant
• HPCS goals (~10^2): hierarchy flattens, easier to program
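Putting all results in words/second requires converting each benchmark's native unit. The sketch below covers only the two conversions that follow directly from the units, under the assumption that one word is 64 bits (8 bytes) and that one RandomAccess update touches one word. The slides do not state how Top500 and FFT flop rates were mapped to words, so those are left out. All names are hypothetical, and reading the ~10^2, ~10^4, ~10^6 figures as the ratio of the fastest to the slowest rate is an interpretation, not something the slides spell out.

```python
WORD_BYTES = 8  # assumption: one word = 64 bits


def stream_words_per_s(gb_per_s):
    """Convert a STREAM result in GB/s to words/second."""
    return gb_per_s * 1e9 / WORD_BYTES


def random_access_words_per_s(gups):
    """Convert a RandomAccess result in GUPS to words/second,
    assuming one 64-bit word per update."""
    return gups * 1e9


def hierarchy_spread(rates_words_per_s):
    """Ratio of the fastest to the slowest rate (assumed meaning of
    the ~10^2, ~10^4, ~10^6 annotations)."""
    return max(rates_words_per_s) / min(rates_words_per_s)


if __name__ == "__main__":
    # Made-up inputs for illustration, not measured results.
    rates = [stream_words_per_s(8.0), random_access_words_per_s(0.001)]
    print(f"spread: {hierarchy_spread(rates):.1e}")
```

A smaller spread corresponds to the flatter hierarchy that the HPCS goals aim for.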
HPCC: Augmenting June TOP500
[Table columns: TOP500 rating; data provided by HPCC database]
Contacts
Piotr Luszczek
The MathWorks, Inc.
(508) 647-6767
luszczek@eecs.utk.edu