Regression Models - TU Dresden · 8 Holger Brunst, Matthias Müller: Leistungsanalyse 2k Factorial Designs Determines the effect of k factors with 2 levels each Easy to analyze Helps


Matthias Müller ([email protected])

Center for Information Services and High Performance Computing (ZIH)

Lecture: Leistungsanalyse (Performance Analysis)

Parallel SPEC Benchmarks

Regression Models


Summary of Previous Lecture

Experimental Design

3 Holger Brunst, Matthias Müller: Leistungsanalyse

Goal & Terminology

Goal: obtain the maximum information with the minimum number of experiments

Response Variable (Zielgröße)

Factors (Einflussfaktoren), also called predictor variables or predictors

Levels, also called treatments

Primary Factors

Secondary Factors

Replication

Experiment Design (Versuchsplanung)

Experimental Unit

Terminology: Interaction

Interaction (Wechselwirkung)

Two factors A and B are interacting factors if the effect of one depends upon the level of the other

Noninteracting Factors

      A1   A2
B1     2    4
B2     5    7

Interacting Factors

      A1   A2
B1     2    4
B2     5    8

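[Figure: for each table, the response is plotted against A (one line each for B1 and B2) and against B (one line each for A1 and A2). For the noninteracting factors the lines are parallel; for the interacting factors they are not.]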
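The definition of interaction can be checked mechanically on the two tables above: compute the effect of moving A from A1 to A2 at each level of B and compare. A minimal Python sketch; the function names are illustrative, and it compares exact values, whereas real measurements would need a tolerance that accounts for experimental error.

```python
def effect_of_a(table):
    """Effect of changing A from A1 to A2, computed separately at each level of B.

    table maps (a_level, b_level) -> response, e.g. ("A1", "B1") -> 2.
    """
    return {b: table[("A2", b)] - table[("A1", b)] for b in ("B1", "B2")}


def interacts(table):
    """A and B interact if the effect of A differs between the levels of B."""
    effects = effect_of_a(table)
    return effects["B1"] != effects["B2"]


# Response tables from the slide.
noninteracting = {("A1", "B1"): 2, ("A2", "B1"): 4, ("A1", "B2"): 5, ("A2", "B2"): 7}
interacting = {("A1", "B1"): 2, ("A2", "B1"): 4, ("A1", "B2"): 5, ("A2", "B2"): 8}

print(effect_of_a(noninteracting), interacts(noninteracting))  # {'B1': 2, 'B2': 2} False
print(effect_of_a(interacting), interacts(interacting))        # {'B1': 2, 'B2': 3} True
```

In the noninteracting case the effect of A is 2 at both levels of B; in the interacting case it is 2 at B1 but 3 at B2.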

Common Mistakes

Variation due to experimental error is ignored

Important parameters are not controlled

Effects of different factors are not isolated

Simple one-factor-at-a-time designs are used

Interactions are ignored

Too many experiments are conducted

Full Factorial Designs

Uses every possible combination at all levels of all factors. With k factors, where factor i has $n_i$ levels, this requires $n = \prod_{i=1}^{k} n_i$ experiments.

In the workstation example: 7 CPUs × 3 memory sizes × 4 disk drives × 4 workloads × 4 operating systems × 3 educational levels = 4032 experiments

Advantage: Every possible factor combination is examined. This includes secondary factors and their interactions.

Disadvantages:

– Cost of the study in time and money

– Too many experiments to conduct

– Replication multiplies the count further: r replications of every combination require r · n experiments
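The experiment count of a full factorial design is just the product of the level counts. A small Python sketch that computes it for the workstation example and enumerates the design explicitly; the factor names and the replication count r = 3 are illustrative assumptions.

```python
import itertools
import math

# Number of levels per factor in the workstation example from the slide.
levels = {
    "cpu": 7,
    "memory_size": 3,
    "disk_drives": 4,
    "workload": 4,
    "operating_system": 4,
    "educational_level": 3,
}

# Full factorial: n = product of the number of levels of all factors.
n = math.prod(levels.values())
print(n)  # 4032

# Enumerate the design: one tuple of level indices per experiment.
design = list(itertools.product(*(range(k) for k in levels.values())))
assert len(design) == n

# With r replications every combination is run r times.
r = 3  # hypothetical replication count
print(r * n)  # 12096
```

Even at three replications the study grows to more than 12,000 runs, which is why the reduced designs on the following slides matter.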

2^k Factorial Designs

Determines the effect of k factors with 2 levels each

Easy to analyze

Helps to rank performance factors by their impact

At the beginning of a performance study:

– Large number of factors and levels

– Full factorial design most likely not possible

– Reduce the number of factors by selecting the significant ones

For unidirectional factors (performance changes monotonically with the level), the impact can be estimated from the minimum and maximum levels alone

Then decide whether the performance difference is worth further examination with more levels

Explanation of the concept: start with k = 2, then generalize
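For k = 2 the usual analysis fits the model $y = q_0 + q_A x_A + q_B x_B + q_{AB} x_A x_B$ with $x_A, x_B \in \{-1, +1\}$; each q is the response column multiplied by a column of ±1 signs, divided by the number of runs. The Python sketch below generalizes this sign-table method to any k. The run-ordering convention, the function name and the four responses are assumptions made for illustration.

```python
import itertools


def factorial_effects(responses, k):
    """Estimate the effects of a 2^k design via the sign-table method.

    Run i has factor j (0-based) at its high level (+1) iff bit j of i is set,
    so for k = 2 the order is (A-,B-), (A+,B-), (A-,B+), (A+,B+).
    Returns a dict mapping effect names ("I", "A", "B", "AB", ...) to q values.
    """
    if not 1 <= k <= 10:
        raise ValueError("this sketch supports 1 <= k <= 10")
    n = 2 ** k
    if len(responses) != n:
        raise ValueError("need exactly 2**k responses")
    names = "ABCDEFGHIJ"[:k]
    signs = [[1 if (i >> j) & 1 else -1 for j in range(k)] for i in range(n)]
    effects = {}
    # Every subset of factors is one column of the sign table (empty subset = I).
    for size in range(k + 1):
        for subset in itertools.combinations(range(k), size):
            label = "".join(names[j] for j in subset) or "I"
            total = 0
            for i in range(n):
                sign = 1
                for j in subset:
                    sign *= signs[i][j]
                total += sign * responses[i]
            effects[label] = total / n
    return effects


# Hypothetical responses for k = 2.
q = factorial_effects([15, 45, 25, 75], k=2)
print(q)  # {'I': 40.0, 'A': 20.0, 'B': 10.0, 'AB': 5.0}
```

Sorting the non-I effects by absolute value gives the ranking of factors by impact: here A matters most, then B, with a small AB interaction.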

Parallel SPEC Benchmarks

SPEC OMP

SPEC OMP

Benchmark suite developed by the SPEC High Performance Group (HPG)

Benchmark suite for performance testing of shared-memory multiprocessor systems

Uses OpenMP versions of SPEC CPU2000 benchmarks

SPEC OMP mixes integer and floating-point (FP) codes in one suite

OMPM (medium data set) is focused on 4-way to 16-way systems

OMPL (large data set) targets 32-way and larger systems

SPEC OMP Applications

Code      Application                          Language   Lines
ammp      Molecular dynamics                   C          13500
applu     CFD, partial LU                      Fortran    4000
apsi      Air pollution                        Fortran    7500
art       Image recognition/neural networks    C          1300
fma3d     Crash simulation                     Fortran    60000
gafort    Genetic algorithm                    Fortran    1500
galgel    CFD, Galerkin FE                     Fortran    15300
equake    Earthquake modeling                  C          1500
mgrid     Multigrid solver                     Fortran    500
swim      Shallow water modeling               Fortran    400
wupwise   Quantum chromodynamics               Fortran    2200

CPU2000 vs OMPL2001

SPEC MPI2007

An application benchmark suite that measures the performance impact of:

– Type of computer processor

– Number of computer processors

– Communication interconnect

– Memory architecture

– Compilers

– MPI library performance

– File system performance

Identifying candidate applications:

– From SPEC CPU2006

– Through a call for candidate applications

MPI2007 design goal: a benchmark for distributed-memory systems

Comparison of Different Benchmarks using MPI

                                   SPEC MPI           NPB       HPCC
Number of applications             13                 8         7
Language                           F77, F90, C, C++   F77, C    C
Code size (lines)                  ~530,000           28,000    47,200
#MPI calls in the code             ~2400              ~400      ~600
#different MPI calls in the code   ~59                ~36       ~44

Application Fields

– Computational fluid dynamics

– Quantum chromodynamics

– Climate modeling

– Ray tracing

– Molecular Dynamics

– Weather prediction

– Heat transfer

– Hydrodynamics

– Flow Simulation

MPI2007 Benchmark Goals

– Runs on clusters or SMPs

– Validates for correctness and measures performance

– Supports 32-bit or 64-bit OS/ABI

– Consists of applications drawn from national labs and university research centers

– Supports a broad range of MPI implementations and operating systems, including Windows, Linux, and proprietary Unix

– Has a runtime of ~1 hour per benchmark test at 16 ranks using GigE, with a 1 GB memory footprint per rank

– Scales to 128 ranks

– Is extensible to future large and extreme data sets, planned to cover a larger number of ranks

MPI2007 – tested for portability

– Architectures:

• Opteron, Xeon, Itanium 2, PA-RISC, POWER5, SPARC

– Interconnects:

• Ethernet, InfiniBand, InfiniPath, SGI NUMAlink, and shared memory

– Operating systems:

• Linux (RH FC3, SLES 9/10, SuSE 9.3), Windows CCS, HP-UX, Solaris, AIX

– MPI implementations:

• HP-MPI, MPICH, MPICH2, Open MPI, IBM-MPI, Intel MPI, MPICH-GM, MVAPICH, Fujitsu MPI, InfiniPath MPI, SGI MPT

– Compilers:

• Sun Studio, Fujitsu, Intel, PathScale, PGI, HP, and IBM compilers

MPI2007 – tested for scalability

– Scalable from 16 to 128 ranks (processes) for medium data set

– Runtime of 1 hour per benchmark test at 16 ranks using GigE on an unspecified reference cluster.

– Memory footprint should be < 1GB per rank at 16 ranks.

– Exhaustively tested for each rank count - 12 - 15 -> 130 - 140, 160, 180, 200, 225, 256, 512

Overview of the applications

Code            LOC      Language   MPI call sites   MPI calls (distinct)   Area
104.milc        17987    C          51               18                     Lattice QCD
107.leslie3d    10503    F77, F90   43               13                     Combustion
113.GemsFDTD    21858    F90        237              16                     Electrodynamic simulation
115.fds4        44524    F90, C     239              15                     CFD
121.pop2        69203    F90        158              17                     Geophysical fluid dynamics
122.tachyon     15512    C          17               16                     Ray tracing
126.lammps      6796     C++        625              25                     Molecular dynamics
127.wrf2        163462   F90, C     132              23                     Weather forecast
128.GAPgeofem   30935    F77, C     58               18                     Geophysical FEM
129.tera_tf     6468     F90        42               13                     Eulerian hydrodynamics
130.socorro     91585    F90        155              20                     Density-functional theory
132.zeusmp2     44441    C, F90     639              21                     Astrophysical CFD
137.lu          5671     F90        72               13                     SSOR
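Summing the per-code columns of this table is a quick consistency check against the suite-level figures on the comparison slide (~530,000 lines, ~2400 MPI calls). A short Python sketch with the values copied from the table:

```python
# (code, lines of code, MPI call sites) from the MPI2007 overview table.
codes = [
    ("104.milc", 17987, 51),
    ("107.leslie3d", 10503, 43),
    ("113.GemsFDTD", 21858, 237),
    ("115.fds4", 44524, 239),
    ("121.pop2", 69203, 158),
    ("122.tachyon", 15512, 17),
    ("126.lammps", 6796, 625),
    ("127.wrf2", 163462, 132),
    ("128.GAPgeofem", 30935, 58),
    ("129.tera_tf", 6468, 42),
    ("130.socorro", 91585, 155),
    ("132.zeusmp2", 44441, 639),
    ("137.lu", 5671, 72),
]

total_loc = sum(loc for _, loc, _ in codes)
total_sites = sum(sites for _, _, sites in codes)
print(len(codes), total_loc, total_sites)  # 13 528945 2468
```

The totals (13 codes, 528,945 lines, 2,468 call sites) agree with the rounded suite-level numbers.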

MPI2007 Benchmark dynamic message call counts

Pt2Pt Communication Statistics: 122.tachyon (ray tracing)

Pt2Pt Communication Statistics: 107.leslie3d (combustion)

Pt2Pt Communication Statistics: 113.GemsFDTD (electrodynamics)

Message Length Statistics (Pt2Pt)

Available Results

Available Results (blind submission)

– AMD A2210 Reference Platform (16 cores)

• Gigabit Ethernet

• Single-core AMD Opteron 848, 2.2 GHz

– SGI Altix 4700 (16-128 cores)

• SGI NUMAlink, SGI MPT 1.15

• Dual-core Intel Itanium 2 9040, 1.6 GHz

– HP ProLiant BL460c Blade Cluster Platform 3000 BL (16-256 cores)

• InfiniBand DDR, HP-MPI 2.2.5

• Dual-core Intel Xeon 5160, 3.0 GHz

– QLogic, U. Cambridge Darwin Cluster (32-512 cores)

• InfiniPath, QLogic InfiniPath MPI library 2.0

• Dual-core Intel Xeon 5160, 3.0 GHz

– QLogic, AMD Emerald Cluster (32-512 cores)

• InfiniPath, QLogic InfiniPath MPI library 2.1

• Dual-core AMD Opteron 290, 2.8 GHz

Scales to 128 ranks, works on 512 ranks

Scalability on U. Cambridge’s Darwin Cluster (II)

Scalability on HP Cluster

Summary and Conclusion

SPEC MPI2007 properties:

– Application benchmark with 13 different codes

– Run and reporting rules for reproducibility

– Tested on a wide range of platforms:

• CPU and Node Architectures

• Interconnects

• Compilers

• MPI implementations

– Available dataset (medium) scales to 128 ranks

– Next steps:

• Large dataset with enhanced scalability for larger systems

• …

Use Cases

Use cases

– Performance trends

– Compiler and performance

– Comparing different Itanium systems

– Comparing different system generations

SPEC performance trends (performance per thread)

Where Does the Performance Go? or Why Should I Care About the Memory Hierarchy?

[Figure: Processor-DRAM memory gap (latency), CPU vs. DRAM performance over time. Processor: 60%/yr (2X/1.5 yr, "Moore's Law"); DRAM: 9%/yr (2X/10 yrs); the processor-memory performance gap grows 50%/year.]
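The gap figure follows from compounding the two growth rates: per year the CPU improves by a factor 1.60 and DRAM by 1.09, so the gap grows by their ratio. A small Python check using the rates from the chart; the doubling-time helper is illustrative.

```python
import math

cpu_growth = 0.60   # processor performance improvement per year (from the chart)
dram_growth = 0.09  # DRAM latency improvement per year (from the chart)

# The gap grows by the ratio of the two compound growth factors per year.
gap_growth = (1 + cpu_growth) / (1 + dram_growth) - 1
print(f"gap grows {gap_growth:.1%} per year")  # gap grows 46.8% per year


def doubling_time(g):
    """Years T such that (1 + g) ** T == 2."""
    return math.log(2) / math.log(1 + g)


print(f"CPU doubles every {doubling_time(cpu_growth):.2f} years")    # 1.47
print(f"DRAM doubles every {doubling_time(dram_growth):.2f} years")  # 8.04
```

The chart's "50% / year" corresponds to about 47% when compounded exactly, and "2X/10 yrs" is a round figure: 9%/yr doubles in about 8 years.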

Comparison OMPM base compilers

Influence of compilers on OMPM base 32-way results

Comparison OMPM on 32-way 1.5 GHz Itanium

SMP Performance Gain Itanium/Itanium 2

The history of the NEC SX series

[Figure: performance over time (1985, 1990, 1995, 1998, 2001, 2004) for SX-1/2, SX-3, SX-4, SX-5, SX-6/7 and SX-8. Technology: from bipolar, water-cooled to CMOS, air-cooled; CPU size shrinking from 45.7 cm × 38.6 cm to a 2 cm × 2 cm single-chip vector processor. Architecture: multi CPUs, multi nodes, large-scale cluster (>100 nodes, over 1 GFLOP per node), single-module node, massive-scale cluster (>500 nodes).]

Performance Properties of Different SX systems

System   Availability   CPU perf.   Mem. bandwidth/CPU   Node perf.   Mem. bandwidth/node
SX-4     1996           2 GF/s      16 GB/s              64 GF/s      512 GB/s
SX-5e    1999           4 GF/s      32 GB/s              64 GF/s      512 GB/s
SX-6     2001           8 GF/s      32 GB/s              64 GF/s      256 GB/s
SX-6+    2002           9 GF/s      36 GB/s              72 GF/s      324 GB/s
SX-8     2004           16 GF/s     64 GB/s              128 GF/s     512 GB/s

CPU performance: factor 2 in two years (SX-5e to SX-6); node performance: factor 2 in eight years (SX-4 to SX-8)
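Two derived quantities help when reading the later measurements: CPUs per node (node peak divided by CPU peak, assuming the node peak is simply the sum of its CPUs) and memory bandwidth per flop of one CPU. A short Python sketch over the table values; the function name is illustrative.

```python
# (system, CPU peak GF/s, memory bandwidth per CPU GB/s, node peak GF/s)
systems = [
    ("SX-4", 2, 16, 64),
    ("SX-5e", 4, 32, 64),
    ("SX-6", 8, 32, 64),
    ("SX-6+", 9, 36, 72),
    ("SX-8", 16, 64, 128),
]


def derived_metrics(rows):
    """Map system -> (CPUs per node, bytes per flop per CPU)."""
    result = {}
    for name, cpu_gf, bw_gb, node_gf in rows:
        cpus_per_node = node_gf // cpu_gf  # assumes node peak = CPUs * CPU peak
        bytes_per_flop = bw_gb / cpu_gf    # GB/s divided by GF/s
        result[name] = (cpus_per_node, bytes_per_flop)
    return result


for name, (cpus, bpf) in derived_metrics(systems).items():
    print(f"{name:6s} {cpus:3d} CPUs/node  {bpf:.0f} bytes/flop")
```

This yields 32, 16, 8, 8 and 8 CPUs per node, which matches the largest thread counts used in the measurements, and shows the per-CPU bandwidth dropping from 8 to 4 bytes per flop from SX-6 on.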

Properties of SPEC codes on vector systems

Name      Lang   Vratio   Vlen     MEM (MB)
Wupwise   F      87.34    58.74    1488
Swim      F      99.75    253.48   1584
Mgrid     F      99.14    211.04   480
Applu     F      81.31    34.17    1520
Galgel    F      92.57    45.14    272
Equake    C      0.06     9.6      464
Apsi      F      76.70    23.02    1648
Gafort    F      40.25    59.60    1680
Fma3d     F      10.29    8.95     1040
Art       C      32.06    242.14   272
Ammp      C      76.67    102.79   176

Lang: F = Fortran, C = C; Vratio: vector operation ratio (%); Vlen: average vector length; MEM: memory footprint

Expectations

Swim, mgrid and maybe galgel should perform well

Equake, fma3d and art should perform poorly

However, the focus was not on absolute performance but on relative performance and scalability
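These expectations can be reproduced by ranking the codes by vector operation ratio alone. A Python sketch with the Vratio values from the table; it deliberately ignores vector length and memory footprint, which is a simplification.

```python
# Vector operation ratio in % per code, from the table of SPEC codes on vector systems.
vratio = {
    "wupwise": 87.34, "swim": 99.75, "mgrid": 99.14, "applu": 81.31,
    "galgel": 92.57, "equake": 0.06, "apsi": 76.70, "gafort": 40.25,
    "fma3d": 10.29, "art": 32.06, "ammp": 76.67,
}

# Highest vector ratio first.
ranked = sorted(vratio, key=vratio.get, reverse=True)
print("best candidates: ", ranked[:3])   # ['swim', 'mgrid', 'galgel']
print("worst candidates:", ranked[-3:])  # ['art', 'fma3d', 'equake']
```

The top three and bottom three match the codes named in the expectations above.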

SPEC efficiency on SX

Performance measurements

All performance is reported relative to the performance of one thread on SX-4

Number of threads used:

– 1,2,4,8,16,32 on SX-4

– 1,2,4,8,16 on SX-5

– 1,2,4,8 on SX-6+

– 1,2,4,8 on SX-8

Wupwise – expected behavior

Same node performance of SX-4/5/6

Art – improves more than peak performance

Art benefits from improvements of the scalar unit

Swim – surprisingly improves with every generation

Compute bound on SX-4 and SX-5!

Mgrid – large improvements from SX-6+ to SX-8

Improved stride-2 memory access

Not much improvement from SX-4 to SX-5 and from SX-6+ to SX-8

Explanation for ammp improvements

Ammp contains a lot of locks

Lock performance (measured by EPCC microbenchmarks)

System   Lock     Lock ratio   Ammp   Ammp ratio
SX-6+    4.3 µs   1.23         2.82   1
SX-8     3.5 µs   1            3.40   1.21
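The argument is that, if ammp's runtime is dominated by lock overhead, its speedup from SX-6+ to SX-8 should track the lock speedup rather than the peak ratio of the CPUs (16 vs. 9 GF/s from the SX table). A small Python check of the three ratios:

```python
lock_time_us = {"SX-6+": 4.3, "SX-8": 3.5}  # EPCC lock overhead in microseconds
ammp_perf = {"SX-6+": 2.82, "SX-8": 3.40}   # ammp performance from the table
cpu_peak_gf = {"SX-6+": 9, "SX-8": 16}      # CPU peak from the SX table

lock_speedup = lock_time_us["SX-6+"] / lock_time_us["SX-8"]
ammp_speedup = ammp_perf["SX-8"] / ammp_perf["SX-6+"]
peak_speedup = cpu_peak_gf["SX-8"] / cpu_peak_gf["SX-6+"]
print(f"lock {lock_speedup:.2f}, ammp {ammp_speedup:.2f}, peak {peak_speedup:.2f}")
# lock 1.23, ammp 1.21, peak 1.78
```

Ammp gains about as much as the locks (1.21 vs. 1.23), far below the peak ratio of 1.78.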

General observations

With the exception of equake and galgel, the applications show good scalability

Share of the peak performance improvement that is realized:

– 87% to 96% for 1 thread

– 81% to 89% for 8 threads

On average, an SX-8 CPU is 6.14 times faster than an SX-4 CPU (peak ratio is 8)

No significant difference between scalar and vector codes

Summary for SPEC

Summary – What you should have learned

– There are many different benchmark approaches: microbenchmarks, kernels, applications…

– SPEC benchmarks are application benchmarks, or at least application-oriented benchmarks, designed to represent current workloads

• They therefore need to be updated every few years

– SPEC benchmarks are used to:

• Measure and compare performance of systems

• Drive future development

• …

– Different metrics are used (base/peak, speed/throughput)

– Many different factors have an influence on application performance:

• CPU

• Memory system

• Compilers

• OS and runtime environment

• I/O system

• …

Regression Models

Terms

Regression models allow one to estimate or predict a random variable as a function of several other variables

The estimated variable is called the response variable; the variables used to predict the response are called predictor variables, predictors, or factors.

Simple Linear Regression Model

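Predictor variable x and predicted response $\hat{y}$: $\hat{y} = b_0 + b_1 x$

Regression parameters: $b_0$ (intercept) and $b_1$ (slope)

Error: $e = y - \hat{y}$, the difference between the measured y and the estimated y

[Figure: measured y and estimated y plotted against x; the error is the vertical distance between a measured point and the regression line.]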

Definitions

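n observation pairs: $\{(x_1, y_1), \ldots, (x_n, y_n)\}$

Error: $e_i = y_i - \hat{y}_i = y_i - b_0 - b_1 x_i$

Sum of Squared Errors (SSE): $SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$

Mean error: $\bar{e} = \frac{1}{n} \sum_{i=1}^{n} e_i = \frac{1}{n} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)$

The best linear model minimizes SSE and has a mean error of zero.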

Exercise: calculate regression parameters for best linear model

Calculation of Linear Regression Parameters
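Setting the mean error to zero gives $b_0 = \bar{y} - b_1 \bar{x}$; substituting this into SSE and setting the derivative with respect to $b_1$ to zero gives the standard closed form $b_1 = \frac{\sum x_i y_i - n \bar{x} \bar{y}}{\sum x_i^2 - n \bar{x}^2}$. A minimal Python sketch of this calculation; the function name and the four data points are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares estimates b0, b1 for the model y_hat = b0 + b1 * x.

    Uses the closed form
        b1 = (sum(x*y) - n*xbar*ybar) / (sum(x**2) - n*xbar**2)
        b0 = ybar - b1 * xbar
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar
    sxx = sum(x * x for x in xs) - n * xbar * xbar
    if sxx == 0:
        raise ValueError("all x values are equal; slope is undefined")
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


xs = [0, 1, 2, 3]
ys = [1, 3, 2, 4]
b0, b1 = fit_line(xs, ys)
errors = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(b0, b1)                  # approximately 1.3 and 0.8
print(sum(errors) / len(xs))  # mean error, zero up to rounding
```

The mean error comes out as zero by construction, which is the second property of the best linear model stated on the previous slide.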

Coefficient of determination

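Sum of Squared Errors (SSE): $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

SSE without regression, i.e. predicting every $y_i$ by the mean $\bar{y}$ (total sum of squares SST): $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$

The difference between SST and SSE is explained by the regression: $SSR = SST - SSE$

Coefficient of determination: $R^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST}$ (the closer $R^2$ is to 1, the better the regression)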
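The coefficient of determination can be computed directly from the measured and predicted responses. A short Python sketch; the data are the same illustrative points as before, with predictions from the line $\hat{y} = 1.3 + 0.8x$ fitted to them.

```python
def r_squared(ys, y_hat):
    """Coefficient of determination R^2 = (SST - SSE) / SST."""
    n = len(ys)
    ybar = sum(ys) / n
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
    sst = sum((y - ybar) ** 2 for y in ys)
    if sst == 0:
        raise ValueError("all y values are equal; R^2 is undefined")
    return (sst - sse) / sst


ys = [1, 3, 2, 4]
y_hat = [1.3 + 0.8 * x for x in [0, 1, 2, 3]]  # line fitted to these points
print(round(r_squared(ys, y_hat), 3))  # 0.64
```

Here SST = 5 and SSE = 1.8, so the regression explains 64% of the variation. Predicting the mean for every point gives $R^2 = 0$; a perfect fit gives $R^2 = 1$.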

Assumptions

The relationship between the response variable y and the predictor variable x is linear

The predictor variable x is measured without any error

The model errors are statistically independent

The errors are normally distributed with zero mean and a constant standard deviation

Visual tests: look at the data

[Figure: scatter plots of y versus x: (a) linear, (b) multilinear, (c) outlier, (d) nonlinear.]

Residual versus predicted response graph

[Figure: residual versus predicted response: (a) no trend, (b) trend, (c) trend.]

Residual versus experiment number

[Figure: residual versus experiment number: (a) no trend, (b) trend.]

Example of such a trend: a physical experiment started under insufficiently controlled initial conditions.
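As a numeric companion to this visual check (not part of the original slides), the correlation between the residuals and the experiment number can flag a linear drift. A Python sketch; the function name and both residual sequences are hypothetical, and what counts as "far from zero" is left to the analyst. It only detects linear drift, so the plot remains the primary tool.

```python
import math


def residual_order_correlation(residuals):
    """Pearson correlation between residuals and their experiment number 1..n.

    A value near 0 is consistent with no trend; a value far from 0 hints at
    a drift over the course of the experiments.
    """
    n = len(residuals)
    order = range(1, n + 1)
    mean_o = (n + 1) / 2
    mean_r = sum(residuals) / n
    cov = sum((o - mean_o) * (r - mean_r) for o, r in zip(order, residuals))
    var_o = sum((o - mean_o) ** 2 for o in order)
    var_r = sum((r - mean_r) ** 2 for r in residuals)
    if var_r == 0:
        return 0.0  # constant residuals: no trend
    return cov / math.sqrt(var_o * var_r)


print(residual_order_correlation([0.1, -0.1, -0.1, 0.1, 0.1, -0.1]))      # about -0.1: no clear trend
print(residual_order_correlation([-0.25, -0.15, -0.05, 0.05, 0.15, 0.25]))  # close to 1: clear trend
```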

Check for constant standard deviation of errors

[Figure: residual versus predicted response: (a) no trend, (b) increasing spread.]

Automatic fitting with gnuplot

gnuplot> f(x)=a*x+b
gnuplot> fit f(x) "data.txt" u 1:2 via a,b

After 4 iterations the fit converged.
final sum of squares of residuals : 1.80841
rel. change during last iteration : -6.64694e-07

degrees of freedom    (ndf) : 15
rms of residuals      (stdfit) = sqrt(WSSR/ndf)    : 0.347218
variance of residuals (reduced chisquare) = WSSR/ndf : 0.120561

Final set of parameters            Asymptotic Standard Error
=======================            ==========================
a               = 0.530196         +/- 0.01719      (3.242%)
b               = 3.70353          +/- 0.1761       (4.756%)
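The quantities in this report can be reproduced for a straight line with ordinary least squares: WSSR is the SSE (all weights 1), ndf = n - 2, stdfit = sqrt(WSSR/ndf), and the parameter errors are the usual OLS standard errors. That these coincide with gnuplot's "asymptotic standard errors" assumes gnuplot's default scaling of the errors by stdfit for unweighted fits; check it against your gnuplot version. Since data.txt is not available, the four points below are hypothetical.

```python
import math


def linear_fit_report(xs, ys):
    """Unweighted least-squares fit of f(x) = a*x + b with gnuplot-style statistics."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points for ndf > 0")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = ybar - a * xbar
    wssr = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))  # unit weights
    ndf = n - 2                                                 # two fitted parameters
    stdfit = math.sqrt(wssr / ndf)                              # rms of residuals
    return {
        "a": a,
        "b": b,
        "wssr": wssr,
        "ndf": ndf,
        "stdfit": stdfit,
        "a_err": stdfit / math.sqrt(sxx),
        "b_err": stdfit * math.sqrt(1 / n + xbar ** 2 / sxx),
    }


report = linear_fit_report([0, 1, 2, 3], [1, 3, 2, 4])
for key, value in report.items():
    print(f"{key:7s} {value:.6g}")
```

As a sanity check on the gnuplot output itself: ndf = 15 implies 17 data points, and 1.80841 / 15 = 0.120561, whose square root is the reported stdfit of 0.347218.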

Visual test

plot [0:][0:] "data.txt" u 1:2 w p, f(x)

Residual versus predicted response graph

Residual versus experiment number

Fitting with gnuplot: basics

The `fit` command can fit a user-defined function to a set of data points (x,y), using an implementation of the nonlinear least-squares (NLLS) Marquardt-Levenberg algorithm. Any user-defined variable occurring in the function body may serve as a fit parameter, but the return type of the function must be real.

Syntax: fit {[xrange] {[yrange]}} <function> '<datafile>' {datafile-modifiers} via '<parameter file>' | <var1>{,<var2>,...}
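To make the Marquardt-Levenberg idea concrete, here is a minimal pure-Python sketch for a model with two parameters. It is not gnuplot's implementation: the model a*exp(b*x), the function names, the initial damping of 1e-3 and the factor of 10 for adjusting it are illustrative assumptions. Each iteration solves (JᵀJ + λ·diag(JᵀJ))·δ = Jᵀr for the parameter step δ; a small λ gives an almost pure Gauss-Newton step, a large λ a short step along the gradient.

```python
import math


def model(x, a, b):
    """Example nonlinear model f(x) = a * exp(b * x)."""
    return a * math.exp(b * x)


def gradient(x, a, b):
    """Partial derivatives of the model with respect to a and b."""
    e = math.exp(b * x)
    return e, a * x * e


def lm_fit(xs, ys, a, b, lam=1e-3, max_iter=200, tol=1e-12):
    """Levenberg-Marquardt fit of (a, b); returns (a, b, sum of squares)."""

    def ssr(pa, pb):
        return sum((y - model(x, pa, pb)) ** 2 for x, y in zip(xs, ys))

    current = ssr(a, b)
    for _ in range(max_iter):
        # Accumulate J^T J (symmetric 2x2) and J^T r, with r = y - f(x)
        h00 = h01 = h11 = g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            d0, d1 = gradient(x, a, b)
            r = y - model(x, a, b)
            h00 += d0 * d0
            h01 += d0 * d1
            h11 += d1 * d1
            g0 += d0 * r
            g1 += d1 * r
        # Increase the damping until a step reduces the sum of squares
        while True:
            m00 = h00 * (1.0 + lam)  # Marquardt: scale the diagonal
            m11 = h11 * (1.0 + lam)
            det = m00 * m11 - h01 * h01
            da = (g0 * m11 - h01 * g1) / det  # Cramer's rule for 2x2
            db = (m00 * g1 - h01 * g0) / det
            trial = ssr(a + da, b + db)
            if trial < current:
                break
            lam *= 10.0
            if lam > 1e15:           # no further improvement possible
                return a, b, current
        a, b = a + da, b + db
        lam /= 10.0                  # success: move toward Gauss-Newton
        converged = (current - trial) <= tol * current
        current = trial
        if converged:
            break
    return a, b, current


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4]
    ys = [2.0 * math.exp(0.5 * x) for x in xs]  # hypothetical noise-free data
    print(lm_fit(xs, ys, a=1.0, b=0.3))
```

As with gnuplot, the starting values matter: for a nonlinear model the iteration only finds the minimum near which it starts.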


Fitting with gnuplot: advanced

The default data formats for fitting functions with a single independent variable, y=f(x), are {x:}y or x:y:s; those formats can be changed with the datafile `using` qualifier. The third item (a column number or an expression), if present, is interpreted as the standard deviation of the corresponding y value and is used to compute a weight for the datum, 1/s**2.
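The effect of the weight 1/s**2 can be seen in a weighted straight-line fit. This Python sketch (function name and data hypothetical) minimizes the weighted sum of squared residuals in closed form; a point with a large standard deviation s barely influences the result.

```python
def weighted_linear_fit(xs, ys, sigmas):
    """Fit y = a*x + b, weighting each point by 1/s**2.

    sigmas holds the standard deviation s of each y value (the third
    item of an x:y:s data format). Returns (a, b, WSSR).
    """
    ws = [1.0 / s ** 2 for s in sigmas]
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw  # weighted means
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    a = sxy / sxx
    b = my - a * mx
    wssr = sum(w * (y - (a * x + b)) ** 2 for w, x, y in zip(ws, xs, ys))
    return a, b, wssr


if __name__ == "__main__":
    # The outlier at x = 3 carries a huge uncertainty and is effectively ignored
    print(weighted_linear_fit([0, 1, 2, 3], [1, 3, 5, 100], [1, 1, 1, 1e6]))
```

If all points share the same s, the parameters equal the unweighted ones and only the WSSR is scaled by 1/s**2.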


Curvilinear regression

Sometimes the relationship is not linear, and nonlinear regression is needed

Often it is sufficient to transform the nonlinear function into a linear form by a suitable change of variables; this is called curvilinear regression

Example:


Examples of curvilinear regression functions

Note: if a predictor variable appears in more than one transformed predictor variable, the transformed variables are likely to be correlated, causing the problem of multicollinearity

Nonlinear            Linear
y = a + b/x          y = a + b(1/x)
y = 1/(a + bx)       1/y = a + bx
y = x/(a + bx)       x/y = a + bx
y = a·b^x            ln y = ln a + (ln b)x
y = a + b·x^n        y = a + b(x^n)
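Two rows of the table can be turned into a Python sketch: transform the data, run an ordinary straight-line fit, and map the coefficients back. The function names and test data are hypothetical; the transformations are those listed above.

```python
import math


def ols(xs, ys):
    """Ordinary least squares for y = intercept + slope*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_exponential(xs, ys):
    """Fit y = a * b**x via ln y = ln a + (ln b) x (requires all y > 0)."""
    ln_a, ln_b = ols(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), math.exp(ln_b)


def fit_reciprocal(xs, ys):
    """Fit y = 1/(a + b*x) via 1/y = a + b*x (requires all y != 0)."""
    return ols(xs, [1.0 / y for y in ys])


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4]
    print(fit_exponential(xs, [3 * 1.5 ** x for x in xs]))
    print(fit_reciprocal(xs, [1 / (2 + 0.5 * x) for x in xs]))
```

One caveat of this design: the least-squares criterion is applied to ln y or 1/y, not to y, so with noisy data the parameters can differ from those of a direct nonlinear fit such as gnuplot's `fit`.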


Common mistakes in regression

Not verifying that the relationship is linear

Relying on automated results without visual verification

Not specifying confidence intervals for the regression parameters

Not specifying the coefficient of determination

Confusing the coefficient of determination R^2 and the coefficient of correlation R

Using regression to predict far beyond the measured range
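The difference between R and R^2 is easy to check numerically. In this Python sketch (function name and data hypothetical), R is the sample correlation coefficient, whose sign gives the direction of the trend, and R^2 = 1 - SSE/SST is the fraction of the variation of y explained by the line; for simple linear regression R^2 equals R squared.

```python
import math


def correlation_and_determination(xs, ys):
    """Return (R, R^2) for a simple linear regression of y on x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)  # total sum of squares SST
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r2 = 1.0 - sse / syy
    return r, r2


if __name__ == "__main__":
    print(correlation_and_determination([0, 1, 2, 3], [0, 1, 1, 2]))
    print(correlation_and_determination([0, 1, 2, 3], [2, 1, 1, 0]))
```

For both data sets R^2 is 0.9 while |R| ≈ 0.95, and R is negative for the decreasing data; reporting R as if it were R^2 overstates how much of the variation the model explains.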


Coefficient of determination provides wrong indication

[Figure: four scatter plots of y versus x]
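The point of this slide can be reproduced with a small hypothetical data set. The Python sketch below fits a straight line to y = x² for x = 1..10: the coefficient of determination is about 0.95, yet the residuals follow a clear U-shaped pattern, so a high R^2 alone does not show that the linear model is adequate. The function name is an assumption for illustration.

```python
def linear_fit_r2(xs, ys):
    """Least-squares line y = a*x + b; returns (a, b, R^2, residuals)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    sse = sum(r ** 2 for r in residuals)
    sst = sum((y - my) ** 2 for y in ys)
    return a, b, 1.0 - sse / sst, residuals


if __name__ == "__main__":
    xs = list(range(1, 11))
    ys = [x * x for x in xs]  # clearly nonlinear relationship
    a, b, r2, res = linear_fit_r2(xs, ys)
    print(f"y = {a}x + {b}, R^2 = {r2:.3f}")
    print("residuals:", res)  # positive at both ends, negative in the middle
```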


Short checklist for simple linear regression analysis

1. Has the relationship been visually verified to be linear?

2. Are all predictors in appropriate units so that the regression coefficients are comparable?

3. Has the coefficient of determination been specified?

4. Is the coefficient of determination high enough?

5. Have the confidence intervals for the regression parameters been calculated?

6. Are all regression parameters statistically significant?

7. Has the regression been used only for predictions close to the measured range?
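Checklist item 7 can be enforced in code. This Python sketch (function name hypothetical) refuses predictions that lie too far outside the measured range; the allowed overshoot of 10 % of the range is an arbitrary assumption and should be chosen for the problem at hand.

```python
def predict_within_range(a, b, x, x_min, x_max, tolerance=0.1):
    """Predict y = a*x + b, refusing to extrapolate far beyond [x_min, x_max].

    `tolerance` is the permitted overshoot as a fraction of the measured
    range (the default of 10 % is a hypothetical choice).
    """
    span = x_max - x_min
    if x < x_min - tolerance * span or x > x_max + tolerance * span:
        raise ValueError(
            f"x = {x} lies outside the measured range [{x_min}, {x_max}]")
    return a * x + b


if __name__ == "__main__":
    print(predict_within_range(2.0, 1.0, 5.0, x_min=0.0, x_max=10.0))
```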


Not treated here

Confidence intervals for regression parameters

Confidence intervals for predictions

Multiple linear regression

General transformations

... and much more


Jens Doleschal ([email protected])

Internal Timer Synchronization for

Parallel Event Tracing

Use Case for Linear Regression

Center for Information Services and High Performance Computing (ZIH)


82 Jens Doleschal

Introduction

• Timers in distributed environments are not synchronized precisely enough for event tracing (required accuracy ~2-3 μs, i.e., the network latency)

Causes of insufficient timer synchronization in distributed environments:

– Every host typically has its own local timer

– System timers synchronized with NTP are far too imprecise for event tracing (~1 ms)

– Some timers, such as cycle counters, are not synchronized by default

– The timer speed fluctuates due to temperature and aging of the quartz oscillator

– SpeedStep technology (dynamic CPU frequency scaling)


Introduction

Use of inaccurately synchronized timers results in an erroneous representation of the program trace data:

Q1 Qualitative error: Violation of the logical order of distributed events.

Q2 Quantitative error: Distorted time measurement of distributed activities. Leads to skewed performance values.
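Error Q1 can be detected directly in a trace, because a message cannot be received before it was sent. The Python sketch below uses a hypothetical record layout (`Message`, `order_violations` are illustrative names); the optional minimum latency, for example the network latency mentioned in the introduction, tightens the check.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """One point-to-point message as recorded in a trace (hypothetical layout)."""
    sender: int
    receiver: int
    send_time: float  # timestamp from the sender's local timer
    recv_time: float  # timestamp from the receiver's local timer


def order_violations(messages, min_latency=0.0):
    """Return messages received less than `min_latency` after they were sent.

    With accurately synchronized timers and a correct lower bound on the
    latency this list is empty; every entry is an instance of error Q1.
    """
    return [m for m in messages if m.recv_time < m.send_time + min_latency]


if __name__ == "__main__":
    trace = [Message(0, 1, 10.0, 12.5), Message(1, 0, 20.0, 19.0)]
    for m in order_violations(trace):
        print("receive before send:", m)
```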


Timer Synchronization: Overview

Two parts of the synchronization scheme:

– Recording synchronization information during runtime

– Subsequent correction, i.e., transformation of asynchronous local time stamps into synchronous global time stamps by linear interpolation

Because the timer drift fluctuates slightly, the synchronization error accumulates over long intervals

A linear begin-to-end correction is therefore insufficient for long trace runs

Synchronize the timers frequently and interpolate the timer parameters piecewise between the synchronization phases
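The correction step can be sketched as a piecewise linear mapping from local to global time. In this Python sketch (names and sample values are hypothetical), each synchronization phase contributes one (local time, global time) pair; passing only the first and last pair yields the linear begin-to-end correction, so both variants can be compared directly.

```python
import bisect


def make_corrector(sync_points):
    """Map local timestamps to global ones by piecewise linear interpolation.

    sync_points: (local_time, global_time) pairs from synchronization
    phases, with distinct local times. Outside the measured interval the
    nearest segment is extrapolated.
    """
    pts = sorted(sync_points)
    if len(pts) < 2:
        raise ValueError("need at least two synchronization points")
    local = [p[0] for p in pts]

    def correct(t):
        i = bisect.bisect_right(local, t) - 1
        i = min(max(i, 0), len(pts) - 2)  # clamp to a valid segment
        (l0, g0), (l1, g1) = pts[i], pts[i + 1]
        return g0 + (t - l0) * (g1 - g0) / (l1 - l0)

    return correct


if __name__ == "__main__":
    # Hypothetical synchronization results in seconds, one phase every 2 minutes
    sync = [(0.0, 0.0), (120.0, 120.0004), (240.0, 240.0011)]
    piecewise = make_corrector(sync)
    begin_to_end = make_corrector([sync[0], sync[-1]])
    print(piecewise(120.0) - begin_to_end(120.0))
```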


Timer Synchronization: Clock Model

Timer correction:

– A system of linear equations is solved with the least-squares method

– The equation system is built from the message-passing relationships between the local timers

– No reference timer is needed

Statistical estimation:

– Message delays are determined by a maximum likelihood estimator

– The error is normally distributed

– Best linear unbiased estimator (BLUE)
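A strongly simplified version of such a least-squares system fits in a few lines. The Python sketch below estimates only timer offsets (no drift, no maximum likelihood delay estimation) from pairwise measurements "offset[j] - offset[i] ≈ d". Because these equations fix the offsets only up to a common constant, the sketch adds the gauge condition sum(offset) = 0 instead of choosing a reference timer; this condition, the function names and the data are assumptions for illustration, not the method from the slides.

```python
def estimate_offsets(n, measurements):
    """Least-squares offsets of n timers from (i, j, d) triples.

    Each triple states offset[j] - offset[i] ≈ d. The measurement graph
    must be connected for the system to be solvable.
    """
    # Normal equations (A^T A) o = A^T d; the gauge row adds all ones to A^T A
    m = [[1.0] * n for _ in range(n)]
    v = [0.0] * n
    for i, j, d in measurements:
        m[i][i] += 1.0
        m[j][j] += 1.0
        m[i][j] -= 1.0
        m[j][i] -= 1.0
        v[i] -= d
        v[j] += d
    return solve(m, v)


def solve(m, v):
    """Gaussian elimination with partial pivoting on copies of m and v."""
    n = len(v)
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / a[r][r]
    return x


if __name__ == "__main__":
    # Hypothetical, slightly inconsistent pairwise measurements
    print(estimate_offsets(3, [(0, 1, 1.1), (1, 2, 0.9), (0, 2, 2.3)]))
```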


Timer Synchronization: Resynchronization

Relationships between the local timers are determined with multiple ping-pong messages, which identify concurrent points in time with low uncertainty

Within each synchronization phase, a specially designed message pattern monitors the timer alignment

Frequent repetition of the synchronization phase

[Figure: 1st sync phase, 2nd sync phase]
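The slides do not detail the message pattern. As a simplified stand-in, the Python sketch below uses the classic four-timestamp (NTP-style) estimate for one ping-pong between timers A and B, assuming equal one-way delays. The error of that assumption is at most half the round-trip delay, which is why, among several ping-pongs, the one with the smallest round trip gives the estimate with the lowest uncertainty. Function names and timestamps are hypothetical.

```python
def pingpong_offset(a_send, b_recv, b_send, a_recv):
    """Offset of timer B relative to A and round-trip delay of one ping-pong.

    a_send/a_recv are read from timer A, b_recv/b_send from timer B.
    Assumes equal one-way delays in both directions.
    """
    offset = ((b_recv - a_send) + (b_send - a_recv)) / 2.0
    delay = (a_recv - a_send) - (b_send - b_recv)
    return offset, delay


def best_offset(samples):
    """Offset from the ping-pong with the smallest round-trip delay."""
    return min((pingpong_offset(*s) for s in samples), key=lambda od: od[1])[0]


if __name__ == "__main__":
    # Hypothetical timestamps (A-send, B-receive, B-send, A-receive)
    samples = [(10.0, 17.0, 18.0, 19.0), (0.0, 7.0, 8.0, 5.0)]
    print(best_offset(samples))
```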


Measurement Results: Linear begin-to-end Synchronization


Measurement Results: Resynchronization every 4 Minutes


Measurement Results: Resynchronization every 2 Minutes