CDA 3101 Fall 2013
Introduction to Computer Organization
Benchmarks
30 August 2013
Overview
• Benchmarks
• Popular benchmarks
  – Linpack
  – Intel’s iCOMP
• SPEC Benchmarks
• MIPS Benchmark
• Fallacies and Pitfalls
Benchmarks
• Benchmarks measure different aspects of component and system performance
• Ideal situation: use real workload
• Risk: adjusting the design to benchmark requirements
  – (partial) solution: use real programs and update them constantly
• Types of benchmarks
  – Real programs
    • Engineering or scientific applications
    • Software development tools
    • Transaction processing
    • Office applications
  – Kernels
  – Toy benchmarks
  – Synthetic benchmarks
A Benchmark Story
1. You create a benchmark called the vmark
2. Run it on lots of different computers
3. Publish the vmarks in www.vmark.org
4. vmark and www.vmark.org become popular
   – Users start buying their PCs based on vmark
   – Vendors start banging on your door
5. Vendors examine the vmark code and fix up their compilers and/or microarchitecture to run vmark
6. Your vmark benchmark has been broken
7. Create vmark 2.0
Performance Reports
• Reproducibility
  – Include hardware / software configuration (SPEC)
  – Evaluation process conditions
• Summarizing performance
  – Total time
  – Arithmetic mean: AM = (1/n) Σ exec_time_i
  – Harmonic mean: HM = n / Σ (1/rate_i)
  – Weighted mean: WM = Σ w_i × exec_time_i
  – Geometric mean: GM = (Π exec_time_ratio_i)^(1/n)
  – Useful GM property: GM(X_i) / GM(Y_i) = GM(X_i / Y_i)
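The summary statistics above can be sketched in Python; the sample times are made up, and the last lines check the GM property GM(X_i) / GM(Y_i) = GM(X_i / Y_i):

```python
import math

def arithmetic_mean(times):
    """AM of execution times: (1/n) * sum(t_i)."""
    return sum(times) / len(times)

def harmonic_mean(rates):
    """HM of rates: n / sum(1/r_i)."""
    return len(rates) / sum(1.0 / r for r in rates)

def weighted_mean(times, weights):
    """WM: sum(w_i * t_i); weights should sum to 1."""
    return sum(w * t for w, t in zip(weights, times))

def geometric_mean(xs):
    """GM: (prod(x_i))^(1/n), computed in log space for numerical stability."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

times = [10.0, 20.0, 40.0]          # hypothetical execution times (s)
am = arithmetic_mean(times)          # ~23.33 s

# GM property: the GM of ratios equals the ratio of GMs.
X, Y = [2.0, 8.0], [1.0, 4.0]
lhs = geometric_mean(X) / geometric_mean(Y)
rhs = geometric_mean([x / y for x, y in zip(X, Y)])
```

This property is why the GM of normalized ratios does not depend on which machine is chosen as the reference.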
Ex.1: Linpack Benchmark
• “Mother of all benchmarks”
• Time to solve a dense system of linear equations
  DO I = 1, N
     DY(I) = DY(I) + DA * DX(I)
  END DO
• Metrics– Rpeak: system peak Gflops
– Nmax: matrix size that gives the highest Gflops
– N1/2: matrix size that achieves half the rated Rmax Gflops
– Rmax: the Gflops achieved for the Nmax size matrix
• Used in http://www.top500.org
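The Fortran loop above is the DAXPY kernel at the heart of Linpack. A minimal Python sketch of the same kernel, with a made-up problem size and an illustrative (not meaningful) Mflops figure:

```python
import time

def daxpy(n, da, dx, dy):
    """Linpack's inner kernel: DY(I) = DY(I) + DA * DX(I)."""
    for i in range(n):
        dy[i] += da * dx[i]
    return dy

n = 100_000                          # hypothetical vector length
dx = [1.0] * n
dy = [2.0] * n

start = time.perf_counter()
daxpy(n, 0.5, dx, dy)
elapsed = time.perf_counter() - start

flops = 2 * n                        # one multiply + one add per element
print(f"{flops / elapsed / 1e6:.1f} Mflops (pure Python, illustrative only)")
```

Real Linpack runs the full LU factorization in optimized Fortran/C; this sketch only shows where the flop count that feeds Rmax and Rpeak comes from.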
Ex.2: Intel’s iCOMP Index 3.0
• New version (3.0) reflects:
  – Mix of instructions for existing and emerging software
  – Increasing use of 3D, multimedia, and Internet software
• Benchmarks
  – 2 integer productivity applications (20% each)
  – 3D geometry and lighting calculations (20%)
  – FP engineering and finance programs and games (5%)
  – Multimedia and Internet application (25%)
  – Java application (10%)
• Weighted GM of relative performance
  – Baseline processor: Pentium II at 350 MHz
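iCOMP 3.0 combines category scores with a weighted geometric mean. A sketch, where the relative-performance ratios are hypothetical and the weights follow the category mix above:

```python
import math

def weighted_geometric_mean(ratios, weights):
    """WGM = prod(r_i ** w_i), computed in log space; weights sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return math.exp(sum(w * math.log(r) for r, w in zip(ratios, weights)))

# Hypothetical relative-performance ratios vs. the 350 MHz Pentium II baseline,
# one per category, weighted per the iCOMP 3.0 mix shown above.
ratios  = [1.8, 1.8, 2.0, 1.5, 2.2, 1.6]
weights = [0.20, 0.20, 0.20, 0.05, 0.25, 0.10]
index = weighted_geometric_mean(ratios, weights)
```

With equal weights this reduces to the ordinary GM; unequal weights let heavily used categories (e.g. multimedia at 25%) count more.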
Ex.3: SPEC CPU Benchmarks
• SPEC: Standard Performance Evaluation Corporation
• Need to update/upgrade benchmarks
  – Longer run times
  – Larger problems
  – Application diversity
• Rules to run and report
  – Baseline and optimized results
  – Geometric mean of normalized execution times
  – Reference machine: Sun Ultra5_10 (300-MHz SPARC, 256 MB)
• CPU2006: latest SPEC CPU benchmark (4th version)
  – 12 integer and 17 floating-point programs
• Metrics: response time and throughput
www.spec.org
Ex.3: SPEC CPU Benchmarks
[Timeline of SPEC CPU benchmark suites, 1989-2006: previous benchmarks, now retired]
Ex.3: SPEC CPU Benchmarks
• Observe: We will use the SPEC 2000 and 2006 CPU benchmarks in this set of notes.
• Task: You are also asked to read about the SPEC 2006 CPU benchmark suite, described at www.spec.org/cpu2006.
• Result: Compare SPEC 2006 data with SPEC 2000 data (www.spec.org/cpu2000) to answer the extra-credit questions in Homework #2.
SPEC CINT2000 Benchmarks
1. 164.gzip      C    Compression
2. 175.vpr       C    FPGA Circuit Placement and Routing
3. 176.gcc       C    C Programming Language Compiler
4. 181.mcf       C    Combinatorial Optimization
5. 186.crafty    C    Game Playing: Chess
6. 197.parser    C    Word Processing
7. 252.eon       C++  Computer Visualization
8. 253.perlbmk   C    PERL Programming Language
9. 254.gap       C    Group Theory, Interpreter
10. 255.vortex   C    Object-oriented Database
11. 256.bzip2    C    Compression
12. 300.twolf    C    Place and Route Simulator
SPEC CFP2000 Benchmarks
1. 168.wupwise   F77  Physics / Quantum Chromodynamics
2. 171.swim      F77  Shallow Water Modeling
3. 172.mgrid     F77  Multi-grid Solver: 3D Potential Field
4. 173.applu     F77  Parabolic / Elliptic Partial Differential Equations
5. 177.mesa      C    3-D Graphics Library
6. 178.galgel    F90  Computational Fluid Dynamics
7. 179.art       C    Image Recognition / Neural Networks
8. 183.equake    C    Seismic Wave Propagation Simulation
9. 187.facerec   F90  Image Processing: Face Recognition
10. 188.ammp     C    Computational Chemistry
11. 189.lucas    F90  Number Theory / Primality Testing
12. 191.fma3d    F90  Finite-element Crash Simulation
13. 200.sixtrack F77  High Energy Nuclear Physics Accelerator Design
14. 301.apsi     F77  Meteorology: Pollutant Distribution
SPECint2000 Metrics
• SPECint2000: geometric mean of 12 normalized ratios (one per integer benchmark) when each benchmark is compiled with "aggressive" optimization
• SPECint_base2000: geometric mean of 12 normalized ratios when compiled with "conservative" optimization
• SPECint_rate2000: geometric mean of 12 normalized throughput ratios when compiled with "aggressive" optimization
• SPECint_rate_base2000: geometric mean of 12 normalized throughput ratios when compiled with "conservative" optimization
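Each SPEC ratio is the reference machine's time divided by the measured time, and the suite score is the geometric mean of those ratios. A sketch with hypothetical times for three of the integer benchmarks (real reference times come from the Sun Ultra5_10 runs):

```python
import math

# Hypothetical reference-machine and measured run times (seconds).
ref_times      = {"164.gzip": 1400.0, "181.mcf": 1800.0, "256.bzip2": 1500.0}
measured_times = {"164.gzip":  350.0, "181.mcf":  600.0, "256.bzip2":  500.0}

# SPEC ratio per benchmark: reference time / measured time (bigger is faster).
ratios = [ref_times[b] / measured_times[b] for b in ref_times]

# Suite score: geometric mean of the per-benchmark ratios.
spec_score = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
```

Because of the GM ratio property, the relative score of two machines is the same no matter which reference machine SPEC picks.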
SPECint_base2000 Results
[Chart comparing: Alpha/Tru64 21264 @ 667 MHz; MIPS/IRIX R12000 @ 400 MHz; Intel/NT 4.0 PIII @ 733 MHz]
SPECfp_base2000 Results
[Chart comparing: Alpha/Tru64 21264 @ 667 MHz; MIPS/IRIX R12000 @ 400 MHz; Intel/NT 4.0 PIII @ 733 MHz]
Effect of CPI: SPECint95 Ratings
Microarchitecture improvements
CPU time = IC * CPI * clock cycle
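The CPU-time equation above can be worked through numerically; the instruction count, CPI values, and clock rate below are hypothetical:

```python
def cpu_time(ic, cpi, clock_hz):
    """CPU time = IC * CPI * clock cycle time, with cycle time = 1 / clock rate."""
    return ic * cpi * (1.0 / clock_hz)

# Hypothetical program: 1e9 instructions on a 500 MHz processor.
before = cpu_time(1e9, 2.0, 500e6)   # CPI = 2.0 -> 4.0 s
after  = cpu_time(1e9, 1.5, 500e6)   # microarchitecture lowers CPI to 1.5 -> 3.0 s
speedup = before / after
```

This is the mechanism behind the SPECint95/SPECfp95 charts: at a fixed clock rate and instruction count, microarchitecture improvements raise the rating exactly in proportion to the CPI reduction.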
Effect of CPI: SPECfp95 Ratings
Microarchitecture improvements
SPEC Recommended Readings
SPEC 2006 – Survey of Benchmark Programs
http://www.spec.org/cpu2006/publications/CPU2006benchmarks.pdf
SPEC 2006 Benchmarks - Journal Articles on Implementation Techniques and Problems
http://www.spec.org/cpu2006/publications/SIGARCH-2007-03/
SPEC 2006 Installation, Build, and Runtime Issues
http://www.spec.org/cpu2006/issues/
Another Benchmark: MIPS
• MIPS: Millions of Instructions Per Second
• MIPS = IC / (CPU time × 10^6)
• Comparing apples to oranges
• Flaw: 1 MIPS on one processor does not accomplish the same work as 1 MIPS on another
  – Like determining the winner of a foot race by counting who took fewer steps
  – Some processors do FP in software (e.g., 1 FP op = 100 INT ops)
  – Different instructions take different amounts of time
• Useful for comparisons between two processors from the same vendor that support the same ISA with the same compiler (e.g., Intel’s iCOMP benchmark)
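The flaw is easy to show with numbers. In this hypothetical, the same task needs different instruction counts on two ISAs, so the machine with the *lower* MIPS rating actually finishes first:

```python
def mips(instruction_count, cpu_time_s):
    """MIPS = IC / (CPU time * 10^6)."""
    return instruction_count / (cpu_time_s * 1e6)

# Hypothetical: the same task compiled for two different ISAs.
# Machine A executes 200M simple instructions in 1.0 s.
# Machine B executes 50M more complex instructions in 0.5 s.
mips_a = mips(200e6, 1.0)   # 200 MIPS
mips_b = mips(50e6, 0.5)    # 100 MIPS, yet B finishes the task in half the time
```

MIPS counts "steps", not work done, so it only ranks machines fairly when IC is fixed: same ISA, same vendor, same compiler.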
Fallacies and Pitfalls
• Ignoring Amdahl’s law
• Using clock rate or MIPS as a performance metric
• Using the arithmetic mean of normalized CPU times (ratios) instead of the geometric mean
• Using hardware-independent metrics
  – e.g., using code size as a measure of speed
• Believing that synthetic benchmarks predict performance
  – They do not reflect the behavior of real programs
• Believing that the geometric mean of CPU time ratios is proportional to total execution time [NOT!!]
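The AM-of-ratios pitfall can be demonstrated with two made-up programs on two machines: the arithmetic mean declares each machine slower than the other depending on which one is used as the reference, while the geometric mean gives a consistent answer either way.

```python
import math

def am(xs):
    """Arithmetic mean."""
    return sum(xs) / len(xs)

def gm(xs):
    """Geometric mean, in log space."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical CPU times (s) for two programs on machines A and B.
t_A = [1.0, 100.0]
t_B = [10.0, 10.0]

am_B_vs_A = am([b / a for a, b in zip(t_A, t_B)])   # 5.05 -> "B is 5x slower"
am_A_vs_B = am([a / b for a, b in zip(t_A, t_B)])   # 5.05 -> "A is 5x slower" (!)
gm_B_vs_A = gm([b / a for a, b in zip(t_A, t_B)])   # 1.0  -> tie
gm_A_vs_B = gm([a / b for a, b in zip(t_A, t_B)])   # 1.0  -> tie, either reference
```

Note the GM's consistency does not make it proportional to total execution time (the last fallacy above): here total time is 101 s on A vs. 20 s on B, a difference the GM tie hides entirely.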
Conclusions
• Performance is specific to a particular program or set of programs
• CPU time: the only adequate measure of performance
• For a given ISA, performance increases come from:
  – increases in clock rate (without adverse CPI effects)
  – improvements in processor organization that lower CPI
  – compiler enhancements that lower CPI and/or IC
• Your workload: the ideal benchmark
• You should not believe everything you read!
Happy & Safe Holiday Weekend