OpenMP Performance Visualization with Paraver

Jesús Labarta, Jordi Caubet, Judit Gimenez, Sergi Girona, Francesc Escale

CEPBA-UPC



Page 2: PARAVER

Jesús Labarta, SPSciComp2000

(1992- )

Flexible performance visualization tool

Functions of time

Precedence relationships

Quantitative, comparative

Powerful / not trivial

You drive the analysis

MPI + OpenMP, System activity, performance counters,…

Distributed by CEPBA

Page 3: Process model

Multithreaded + message passing + multiprogramming

Objects: Thread, Task, Ptask (application)

Page 4: Tracefile

Records

State (Object, time_start, time_end, state)

Events: Flag (Object, time, type, value)

Precedence (Object_src, Object_dst, time_src, time_dst, tag, size)

Instrumented codes: MPI + OpenMP, Java, Pthreads, shmem

Monitoring tools: System activity (SCPUs), InfoPerfex

Simulators: Dimemas, SimpleScalar

Filters: par2Paraver, UTE2Paraver
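The three record types on this slide can be pictured as plain data structures. The following is a minimal Python sketch that only mirrors the field lists shown above; the class names, the sample values and the time units are hypothetical, and this is not the on-disk Paraver trace format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Interval during which an object (e.g. a thread) stays in one state."""
    obj: str
    time_start: int
    time_end: int
    state: int


@dataclass(frozen=True)
class Flag:
    """Punctual event: a (type, value) pair stamped at one instant."""
    obj: str
    time: int
    type: int
    value: int


@dataclass(frozen=True)
class Precedence:
    """Relation between two objects, e.g. a message from source to destination."""
    obj_src: str
    obj_dst: str
    time_src: int
    time_dst: int
    tag: int
    size: int


# Hypothetical sample trace: arbitrary time units, state 1 meaning "running".
trace = [
    State("thread-1", 0, 100, 1),
    Flag("thread-1", 40, type=7, value=3),
    Precedence("thread-1", "thread-2", 50, 65, tag=0, size=1024),
]

# Total time covered by state records in the sample trace.
states = [r for r in trace if isinstance(r, State)]
total_state_time = sum(s.time_end - s.time_start for s in states)
```

Keeping the three kinds of record separate matches the slide: states carry a duration, flags carry a single timestamp, and precedence records link two objects at two timestamps.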

Page 5: Structure

[Diagram: Tracefile -> Filter -> Reduced tracefile -> Semantics -> Function of time (semantic value), Events -> Representation: Visualization, Analysis, Textual]

Demand-driven evaluation

Page 6: Filter module

Events: by type, by value

Communications: by tag, by size, by source / destination, logical / physical
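The filter criteria above amount to simple predicates over trace records. The sketch below is one possible reading in Python, assuming a criterion left as `None` means "accept any"; the record layouts and the function names `keep_event` and `keep_comm` are hypothetical, and the logical / physical distinction is not modelled.

```python
from typing import NamedTuple


class Event(NamedTuple):
    obj: str
    time: int
    type: int
    value: int


class Comm(NamedTuple):
    src: str
    dst: str
    tag: int
    size: int


def keep_event(e, types=None, values=None):
    """Keep an event if it matches every criterion that is set."""
    return ((types is None or e.type in types)
            and (values is None or e.value in values))


def keep_comm(c, tags=None, size_range=None, sources=None, destinations=None):
    """Keep a communication if it matches every criterion that is set.

    size_range is an inclusive (low, high) pair.
    """
    return ((tags is None or c.tag in tags)
            and (size_range is None or size_range[0] <= c.size <= size_range[1])
            and (sources is None or c.src in sources)
            and (destinations is None or c.dst in destinations))


# Hypothetical sample records.
events = [Event("t1", 10, 1, 0), Event("t1", 20, 2, 5), Event("t2", 30, 2, 9)]
comms = [Comm("t1", "t2", 0, 64), Comm("t2", "t1", 1, 4096), Comm("t1", "t3", 1, 128)]

selected_events = [e for e in events if keep_event(e, types={2}, values={5, 9})]
selected_comms = [c for c in comms if keep_comm(c, tags={1}, size_range=(0, 1024))]
```

With these sample records, two events of type 2 pass the event filter, and only the 128-byte message with tag 1 passes the communication filter.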

Page 7: Semantic module

Semantic value: f(t)

$f = f_{comp2} \circ f_{comp1} \circ f_{Ptask} \circ f_{task} \circ f_{thread}$

Semantic functions

$f_{comp2}$, $f_{comp1}$: sign, mod, div, in range

$f_{Ptask}$: add, average, max, select

$f_{task}$: add, average, max, select

$f_{thread}$: in state, useful, given state, last event value, next event value, average next event value

[Diagram: tree of semantic functions; several f_thread outputs feed each f_task, the f_task outputs feed f_Ptask, and the result passes through f_comp1]
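The layered composition of semantic functions can be illustrated with ordinary closures. The Python sketch below is an assumption-laden illustration, not Paraver's implementation: it uses "in state" at the thread level, "add" at the task and Ptask levels, and "in range" as the compose function; the record layout and all helper names are hypothetical. Values are computed only when `f(t)` is called, in the spirit of demand-driven evaluation.

```python
RUNNING, IDLE = 1, 0

# Hypothetical state records per (task, thread): (time_start, time_end, state).
threads = {
    ("task1", "th1"): [(0, 10, RUNNING), (10, 20, IDLE)],
    ("task1", "th2"): [(0, 5, IDLE), (5, 20, RUNNING)],
    ("task2", "th1"): [(0, 20, RUNNING)],
}


def f_thread_in_state(records, state):
    """Thread level, 'in state': 1 while the thread is in `state`, else 0."""
    def f(t):
        return int(any(s <= t < e and st == state for s, e, st in records))
    return f


def combine(op, funcs):
    """Task or Ptask level: apply `op` (e.g. sum, max) to lower-level values."""
    funcs = list(funcs)
    return lambda t: op(g(t) for g in funcs)


def in_range(lo, hi):
    """Compose function: 1 if the incoming value lies in [lo, hi], else 0."""
    return lambda v: int(lo <= v <= hi)


def compose(outer, inner):
    """Chain a compose function after a function of time."""
    return lambda t: outer(inner(t))


f_th = {key: f_thread_in_state(recs, RUNNING) for key, recs in threads.items()}
tasks = sorted({task for task, _ in threads})
f_task = {
    task: combine(sum, [g for (tk, _), g in f_th.items() if tk == task])
    for task in tasks
}
f_ptask = combine(sum, f_task.values())

# f(t) is 1 only at instants where all three threads are running.
f = compose(in_range(3, 3), f_ptask)
```

Swapping `sum` for `max`, or `in_range` for another compose function, changes the view without touching the trace records, which is the flexibility the composition is meant to give.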

Page 8: Visualization

Type of window

Ptask / Task / thread: one row per object of selected type

Object selection (scalability)

Representation

Color encoded / Gradient / Function of time

Multiple windows

Synchronised

Forward/backward animation

Precise time measurement

Within/between windows

Page 9: Textual

Textual detail of area around point within window

Semantic value and duration / flag / communication

Numeric / translated text (.pcf file)

Page 10: Analysis

Time and object range selected by pointing on the window

Analysis function applied to output of semantic module

Average semantic value

Average duration/variance/number of bursts (if within range)

Number of events

Number of communications

...
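As an illustration of analysis functions applied to the output of the semantic module, the Python sketch below computes an average semantic value and burst statistics over a selected time range. It assumes the semantic value for one object is available as piecewise-constant segments, and it reads "if within range" as "count only bursts that lie entirely inside the selection"; both the data layout and that reading are assumptions.

```python
from statistics import mean, pvariance

# Hypothetical semantic-module output for one object:
# piecewise-constant segments (time_start, time_end, value).
segments = [(0, 10, 1), (10, 14, 0), (14, 30, 1), (30, 40, 0), (40, 50, 1)]


def average_value(segs, t0, t1):
    """Time-weighted average of the semantic value over [t0, t1)."""
    total = 0.0
    for s, e, v in segs:
        overlap = min(e, t1) - max(s, t0)
        if overlap > 0:
            total += v * overlap
    return total / (t1 - t0)


def burst_stats(segs, t0, t1):
    """Count, mean duration and variance of non-zero bursts inside [t0, t1]."""
    durations = [e - s for s, e, v in segs if v != 0 and s >= t0 and e <= t1]
    if not durations:
        return 0, 0.0, 0.0
    return len(durations), mean(durations), pvariance(durations)


avg = average_value(segments, 5, 45)
count, avg_duration, var_duration = burst_stats(segments, 0, 50)
```

Because the functions take the time range as arguments, the same trace can be re-analysed for any selection without recomputing the segments.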

Page 11: OpenMP instrumentation

Compiler instrumentation: NANOS compiler

Dynamic interception: SGI native OpenMP (MP library)

Tracing of thread status: running, idle (busy wait), scheduling, blocked

Page 12: OpenMP analysis

Application structure: stamping code

Page 13: OpenMP analysis

Loop scheduling: antenna design

Page 14: OpenMP analysis

Page 15: OpenMP analysis

How do bees see flowers?

Page 16: OpenMP analysis

Page 17: OpenMP analysis

Page 18: OpenMP analysis

What bees don’t see

Function            A      B      C      D
Av. L2 misses/ms    62     52     163    14
FLOPS/ms            41K    21K    8K     1K
Loads/ms            57K    52K    18K    100K
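One way to compare the four functions is to normalise the counters by the load rate. The Python sketch below derives two ratios from the table values; the ratios themselves (L2 misses per thousand loads, floating-point operations per load) are not on the slide and are chosen here only for illustration, reading "K" as one thousand.

```python
# Per-millisecond rates copied from the table; "K" is read as one thousand.
counters = {
    "A": {"l2_misses": 62, "flops": 41_000, "loads": 57_000},
    "B": {"l2_misses": 52, "flops": 21_000, "loads": 52_000},
    "C": {"l2_misses": 163, "flops": 8_000, "loads": 18_000},
    "D": {"l2_misses": 14, "flops": 1_000, "loads": 100_000},
}

# Derived ratios (illustrative, not part of the original slide).
derived = {
    name: {
        "misses_per_kload": 1000 * c["l2_misses"] / c["loads"],
        "flops_per_load": c["flops"] / c["loads"],
    }
    for name, c in counters.items()
}

worst_locality = max(derived, key=lambda n: derived[n]["misses_per_kload"])
least_compute = min(derived, key=lambda n: derived[n]["flops_per_load"])

for name, d in derived.items():
    print(f"{name}: {d['misses_per_kload']:.2f} L2 misses per 1000 loads, "
          f"{d['flops_per_load']:.2f} FLOP per load")
```

Under these ratios, function C stands out with roughly nine L2 misses per thousand loads, while function D issues the most loads but performs about 0.01 floating-point operations per load.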

Page 19: Static vs. Dynamic Parallelism

Page 20: More on hardware counters

Fewer misses, more time

Page 21: More on hardware counters

More memory accesses per second

Fewer coherence state changes

Page 22: MPI + OpenMP

NAS FT

Quantitative data:

% MPI collective comm: 18%

% OMP fork/join: 5%

% non-parallelized: 32%

Avg. || loop: 50 ms

# || loops: 38

# || loops < 5 ms: 6

Page 23: Other uses

System activity

InfoPerfex

Pthreads

Average: 33 MFLOPS

Peak: 60 MFLOPS

Page 24: Paraver on IBM

DPCL + PAPI: sequential programs, OpenMP

UTE: MPI, MPI + OpenMP

Page 25: UTE Paraver

Filter

Thread states: executing application code, executing MPI Receive, executing MPI Send, descheduled

Statistics

Appl. Code    MPI Rec.    MPI Send    Descheduled
97%           1%          0%          1%
38%           9%          1%          52%
46%           1%          1%          52%
47%           8%          0%          45%
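Statistics of this kind follow from summing the time each thread spends in each state. The Python sketch below shows one way to do that, assuming state records of the form (thread, time_start, time_end, state); the sample records are hypothetical and do not reproduce the trace behind the table.

```python
from collections import defaultdict

# Hypothetical state records: (thread, time_start, time_end, state).
records = [
    ("t1", 0, 60, "Appl. Code"),
    ("t1", 60, 70, "MPI Rec."),
    ("t1", 70, 100, "Appl. Code"),
    ("t2", 0, 40, "Appl. Code"),
    ("t2", 40, 90, "Descheduled"),
    ("t2", 90, 100, "MPI Send"),
]


def state_percentages(recs):
    """Return {thread: {state: percentage of that thread's traced time}}."""
    time_in_state = defaultdict(lambda: defaultdict(int))
    for thread, start, end, state in recs:
        time_in_state[thread][state] += end - start
    result = {}
    for thread, per_state in time_in_state.items():
        total = sum(per_state.values())
        result[thread] = {s: 100 * d / total for s, d in per_state.items()}
    return result


percentages = state_percentages(records)

for thread, per_state in sorted(percentages.items()):
    row = ", ".join(f"{s}: {p:.0f}%" for s, p in per_state.items())
    print(f"{thread}: {row}")
```

Each row of the resulting dictionary corresponds to one row of a table like the one above, with one percentage per thread state.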

Page 26: UTE Analysis

Communication pattern: exchanges 1-2; 3-4

Load balance: more load on thread 1

MPI implementation: busy wait on receives

Scheduling: threads 2 and 3 time-share one CPU; thread 4 time-shares one CPU with other processes; OS quantum: 10 ms

Page 27: More information

http://www.cepba.upc.es/paraver

[email protected]