CDA 3101 Fall 2013
Introduction to Computer Organization
Benchmarks
30 August 2013
Overview
• Benchmarks
• Popular benchmarks
  – Linpack
  – Intel’s iCOMP
• SPEC Benchmarks
• MIPS Benchmark
• Fallacies and Pitfalls
Benchmarks
• Benchmarks measure different aspects of component and system performance
• Ideal situation: use real workload
• Risk: adjusting the design to benchmark requirements
  – (partial) solution: use real programs and update them constantly
• Types of benchmarks
  – Real programs
    • Engineering or scientific applications
    • Software development tools
    • Transaction processing
    • Office applications
  – Kernels
  – Toy benchmarks
  – Synthetic benchmarks
A Benchmark Story
1. You create a benchmark called the vmark
2. Run it on lots of different computers
3. Publish the vmarks in www.vmark.org
4. vmark and www.vmark.org become popular
   – Users start buying their PCs based on vmark
   – Vendors start banging on your door
5. Vendors examine the vmark code and fix up their compilers and/or microarchitecture to run vmark
6. Your vmark benchmark has been broken
7. Create vmark 2.0
Performance Reports
• Reproducibility
  – Include hardware / software configuration (SPEC)
  – Evaluation process conditions
• Summarizing performance
  – Total time
  – Arithmetic mean: AM = (1/n) Σ exec_time_i
  – Harmonic mean: HM = n / Σ (1/rate_i)
  – Weighted mean: WM = Σ w_i × exec_time_i
  – Geometric mean: GM = (Π exec_time_ratio_i)^(1/n)
  – Useful GM property: GM(X_i) / GM(Y_i) = GM(X_i / Y_i)
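The summary statistics above can be sketched in Python; the sample times are made up, and the last lines check the GM property GM(X_i) / GM(Y_i) = GM(X_i / Y_i):

```python
import math

def arithmetic_mean(times):
    """AM of execution times: (1/n) * sum(t_i)."""
    return sum(times) / len(times)

def harmonic_mean(rates):
    """HM of rates: n / sum(1/r_i)."""
    return len(rates) / sum(1.0 / r for r in rates)

def weighted_mean(times, weights):
    """WM: sum(w_i * t_i); weights should sum to 1."""
    return sum(w * t for w, t in zip(weights, times))

def geometric_mean(xs):
    """GM: (prod(x_i))^(1/n), computed in log space for numerical stability."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

times = [10.0, 20.0, 40.0]          # hypothetical execution times (s)
am = arithmetic_mean(times)          # ~23.33 s

# GM property: the GM of ratios equals the ratio of GMs.
X, Y = [2.0, 8.0], [1.0, 4.0]
lhs = geometric_mean(X) / geometric_mean(Y)
rhs = geometric_mean([x / y for x, y in zip(X, Y)])
```

This property is why the GM of normalized ratios does not depend on which machine is chosen as the reference.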
Ex.1: Linpack Benchmark
• “Mother of all benchmarks”
• Time to solve a dense system of linear equations
  DO I = 1, N
     DY(I) = DY(I) + DA * DX(I)
  END DO
• Metrics– Rpeak: system peak Gflops
– Nmax: matrix size that gives the highest Gflops
– N1/2: matrix size that achieves half the rated Rmax Gflops
– Rmax: the Gflops achieved for the Nmax size matrix
• Used in http://www.top500.org
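The Fortran loop above is the DAXPY kernel at the heart of Linpack. A minimal Python sketch of the same kernel, with a made-up problem size and an illustrative (not meaningful) Mflops figure:

```python
import time

def daxpy(n, da, dx, dy):
    """Linpack's inner kernel: DY(I) = DY(I) + DA * DX(I)."""
    for i in range(n):
        dy[i] += da * dx[i]
    return dy

n = 100_000                          # hypothetical vector length
dx = [1.0] * n
dy = [2.0] * n

start = time.perf_counter()
daxpy(n, 0.5, dx, dy)
elapsed = time.perf_counter() - start

flops = 2 * n                        # one multiply + one add per element
print(f"{flops / elapsed / 1e6:.1f} Mflops (pure Python, illustrative only)")
```

Real Linpack runs the full LU factorization in optimized Fortran/C; this sketch only shows where the flop count that feeds Rmax and Rpeak comes from.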
Ex.2: Intel’s iCOMP Index 3.0
• New version (3.0) reflects:
  – Mix of instructions for existing and emerging software
  – Increasing use of 3D, multimedia, and Internet software
• Benchmarks
  – 2 integer productivity applications (20% each)
  – 3D geometry and lighting calculations (20%)
  – FP engineering and finance programs and games (5%)
  – Multimedia and Internet application (25%)
  – Java application (10%)
• Weighted GM of relative performance
  – Baseline processor: Pentium II at 350 MHz
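iCOMP 3.0 combines category scores with a weighted geometric mean. A sketch, where the relative-performance ratios are hypothetical and the weights follow the category mix above:

```python
import math

def weighted_geometric_mean(ratios, weights):
    """WGM = prod(r_i ** w_i), computed in log space; weights sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return math.exp(sum(w * math.log(r) for r, w in zip(ratios, weights)))

# Hypothetical relative-performance ratios vs. the 350 MHz Pentium II baseline,
# one per category, weighted per the iCOMP 3.0 mix shown above.
ratios  = [1.8, 1.8, 2.0, 1.5, 2.2, 1.6]
weights = [0.20, 0.20, 0.20, 0.05, 0.25, 0.10]
index = weighted_geometric_mean(ratios, weights)
```

With equal weights this reduces to the ordinary GM; unequal weights let heavily used categories (e.g. multimedia at 25%) count more.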
Ex.3: SPEC CPU Benchmarks
• SPEC: Standard Performance Evaluation Corporation
• Need to update/upgrade benchmarks
  – Longer run times
  – Larger problems
  – Application diversity
• Rules to run and report
  – Baseline and optimized results
  – Geometric mean of normalized execution times
  – Reference machine: Sun Ultra5_10 (300-MHz SPARC, 256 MB)
• CPU2006: latest SPEC CPU benchmark (4th version)
  – 12 integer and 17 floating-point programs
• Metrics: response time and throughput
www.spec.org
Ex.3: SPEC CPU Benchmarks
[Timeline of SPEC CPU benchmark suites, 1989-2006: previous benchmarks, now retired]
Ex.3: SPEC CPU Benchmarks
• Observe: We will use the SPEC 2000 and 2006 CPU benchmarks in this set of notes.
• Task: You are also asked to read about the SPEC 2006 CPU benchmark suite, described at www.spec.org/cpu2006.
• Result: Compare SPEC 2006 data with SPEC 2000 data (www.spec.org/cpu2000) to answer the extra-credit questions in Homework #2.
SPEC CINT2000 Benchmarks
1. 164.gzip      C    Compression
2. 175.vpr       C    FPGA Circuit Placement and Routing
3. 176.gcc       C    C Programming Language Compiler
4. 181.mcf       C    Combinatorial Optimization
5. 186.crafty    C    Game Playing: Chess
6. 197.parser    C    Word Processing
7. 252.eon       C++  Computer Visualization
8. 253.perlbmk   C    PERL Programming Language
9. 254.gap       C    Group Theory, Interpreter
10. 255.vortex   C    Object-oriented Database
11. 256.bzip2    C    Compression
12. 300.twolf    C    Place and Route Simulator
SPEC CFP2000 Benchmarks
1. 168.wupwise   F77  Physics / Quantum Chromodynamics
2. 171.swim      F77  Shallow Water Modeling
3. 172.mgrid     F77  Multi-grid Solver: 3D Potential Field
4. 173.applu     F77  Parabolic / Elliptic Partial Differential Equations
5. 177.mesa      C    3-D Graphics Library
6. 178.galgel    F90  Computational Fluid Dynamics
7. 179.art       C    Image Recognition / Neural Networks
8. 183.equake    C    Seismic Wave Propagation Simulation
9. 187.facerec   F90  Image Processing: Face Recognition
10. 188.ammp     C    Computational Chemistry
11. 189.lucas    F90  Number Theory / Primality Testing
12. 191.fma3d    F90  Finite-element Crash Simulation
13. 200.sixtrack F77  High Energy Nuclear Physics Accelerator Design
14. 301.apsi     F77  Meteorology: Pollutant Distribution
SPECint2000 Metrics
• SPECint2000: geometric mean of 12 normalized ratios (one per integer benchmark) when each benchmark is compiled with "aggressive" optimization
• SPECint_base2000: geometric mean of 12 normalized ratios when compiled with "conservative" optimization
• SPECint_rate2000: geometric mean of 12 normalized throughput ratios when compiled with "aggressive" optimization
• SPECint_rate_base2000: geometric mean of 12 normalized throughput ratios when compiled with "conservative" optimization
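Each SPEC ratio is the reference machine's time divided by the measured time, and the suite score is the geometric mean of those ratios. A sketch with hypothetical times for three of the integer benchmarks (real reference times come from the Sun Ultra5_10 runs):

```python
import math

# Hypothetical reference-machine and measured run times (seconds).
ref_times      = {"164.gzip": 1400.0, "181.mcf": 1800.0, "256.bzip2": 1500.0}
measured_times = {"164.gzip":  350.0, "181.mcf":  600.0, "256.bzip2":  500.0}

# SPEC ratio per benchmark: reference time / measured time (bigger is faster).
ratios = [ref_times[b] / measured_times[b] for b in ref_times]

# Suite score: geometric mean of the per-benchmark ratios.
spec_score = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
```

Because of the GM ratio property, the relative score of two machines is the same no matter which reference machine SPEC picks.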
SPECint_base2000 Results
[Chart comparing: Alpha/Tru64 21264 @ 667 MHz; MIPS/IRIX R12000 @ 400 MHz; Intel/NT 4.0 PIII @ 733 MHz]
SPECfp_base2000 Results
[Chart comparing: Alpha/Tru64 21264 @ 667 MHz; MIPS/IRIX R12000 @ 400 MHz; Intel/NT 4.0 PIII @ 733 MHz]
Effect of CPI: SPECint95 Ratings
Microarchitecture improvements
CPU time = IC * CPI * clock cycle
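The CPU-time equation above can be worked through numerically; the instruction count, CPI values, and clock rate below are hypothetical:

```python
def cpu_time(ic, cpi, clock_hz):
    """CPU time = IC * CPI * clock cycle time, with cycle time = 1 / clock rate."""
    return ic * cpi * (1.0 / clock_hz)

# Hypothetical program: 1e9 instructions on a 500 MHz processor.
before = cpu_time(1e9, 2.0, 500e6)   # CPI = 2.0 -> 4.0 s
after  = cpu_time(1e9, 1.5, 500e6)   # microarchitecture lowers CPI to 1.5 -> 3.0 s
speedup = before / after
```

This is the mechanism behind the SPECint95/SPECfp95 charts: at a fixed clock rate and instruction count, microarchitecture improvements raise the rating exactly in proportion to the CPI reduction.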
Effect of CPI: SPECfp95 Ratings
Microarchitecture improvements
SPEC Recommended Readings
SPEC 2006 – Survey of Benchmark Programs
http://www.spec.org/cpu2006/publications/CPU2006benchmarks.pdf
SPEC 2006 Benchmarks - Journal Articles on Implementation Techniques and Problems
http://www.spec.org/cpu2006/publications/SIGARCH-2007-03/
SPEC 2006 Installation, Build, and Runtime Issues
http://www.spec.org/cpu2006/issues/
Another Benchmark: MIPS
• MIPS: Millions of Instructions Per Second
• MIPS = IC / (CPU time × 10^6)
• Comparing apples to oranges
• Flaw: 1 MIPS on one processor does not accomplish the same work as 1 MIPS on another
  – Like determining the winner of a foot race by counting who took fewer steps
  – Some processors do FP in software (e.g., 1 FP op = 100 INT ops)
  – Different instructions take different amounts of time
• Useful for comparisons between two processors from the same vendor that support the same ISA with the same compiler (e.g., Intel’s iCOMP benchmark)
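The flaw is easy to show with numbers. In this hypothetical, the same task needs different instruction counts on two ISAs, so the machine with the *lower* MIPS rating actually finishes first:

```python
def mips(instruction_count, cpu_time_s):
    """MIPS = IC / (CPU time * 10^6)."""
    return instruction_count / (cpu_time_s * 1e6)

# Hypothetical: the same task compiled for two different ISAs.
# Machine A executes 200M simple instructions in 1.0 s.
# Machine B executes 50M more complex instructions in 0.5 s.
mips_a = mips(200e6, 1.0)   # 200 MIPS
mips_b = mips(50e6, 0.5)    # 100 MIPS, yet B finishes the task in half the time
```

MIPS counts "steps", not work done, so it only ranks machines fairly when IC is fixed: same ISA, same vendor, same compiler.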
Fallacies and Pitfalls
• Ignoring Amdahl’s law
• Using clock rate or MIPS as a performance metric
• Using the arithmetic mean of normalized CPU times (ratios) instead of the geometric mean
• Using hardware-independent metrics
  – e.g., using code size as a measure of speed
• Believing that synthetic benchmarks predict performance
  – They do not reflect the behavior of real programs
• Believing that the geometric mean of CPU time ratios is proportional to total execution time [NOT!!]
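The AM-of-ratios pitfall can be demonstrated with two made-up programs on two machines: the arithmetic mean declares each machine slower than the other depending on which one is used as the reference, while the geometric mean gives a consistent answer either way.

```python
import math

def am(xs):
    """Arithmetic mean."""
    return sum(xs) / len(xs)

def gm(xs):
    """Geometric mean, in log space."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical CPU times (s) for two programs on machines A and B.
t_A = [1.0, 100.0]
t_B = [10.0, 10.0]

am_B_vs_A = am([b / a for a, b in zip(t_A, t_B)])   # 5.05 -> "B is 5x slower"
am_A_vs_B = am([a / b for a, b in zip(t_A, t_B)])   # 5.05 -> "A is 5x slower" (!)
gm_B_vs_A = gm([b / a for a, b in zip(t_A, t_B)])   # 1.0  -> tie
gm_A_vs_B = gm([a / b for a, b in zip(t_A, t_B)])   # 1.0  -> tie, either reference
```

Note the GM's consistency does not make it proportional to total execution time (the last fallacy above): here total time is 101 s on A vs. 20 s on B, a difference the GM tie hides entirely.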
Conclusions
• Performance is specific to a particular program or set of programs
• CPU time: the only adequate measure of performance
• For a given ISA, performance increases come from:
  – increases in clock rate (without adverse CPI effects)
  – improvements in processor organization that lower CPI
  – compiler enhancements that lower CPI and/or IC
• Your workload: the ideal benchmark
• You should not believe everything you read!
Happy & Safe Holiday Weekend