OpenMP Performance Visualization with Paraver
Jesús Labarta, Jordi Caubet, Judit Gimenez
Sergi Girona, Francesc Escale
CEPBA-UPC
Jesús Labarta, SPSciComp2000
PARAVER
(1992- )
Flexible performance visualization tool
Functions of time
Precedence relationships
Quantitative, comparative
Powerful / not trivial
You drive the analysis
MPI + OpenMP, System activity, performance counters,…
Distributed by CEPBA
Process model
Multithreaded + message passing + multiprogramming
Objects:
Thread
Task
Ptask (application)
Tracefile
Records:
State (Object, time_start, time_end, state)
Events: Flag (Object, time, type, value)
Precedence (Object_src, Object_dst, time_src, time_dst, tag, size)
Instrumented codes:
MPI + OpenMP
Java
Pthreads, shmem
Monitoring tools:
System activity (SCPUs)
InfoPerfex
Simulators:
Dimemas
Simplescalar
Filters:
par2Paraver
UTE2Paraver
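The three record types listed above can be modelled as plain data classes. The Python sketch below is illustrative only: the class names, the integer field types, and the helper `duration` are assumptions made for this example and do not describe the actual Paraver tracefile encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateRecord:
    obj: int         # hypothetical integer id of the thread
    time_start: int
    time_end: int
    state: int


@dataclass(frozen=True)
class FlagRecord:
    obj: int
    time: int
    type: int
    value: int


@dataclass(frozen=True)
class PrecedenceRecord:
    obj_src: int
    obj_dst: int
    time_src: int
    time_dst: int
    tag: int
    size: int


def duration(rec: StateRecord) -> int:
    """Length of a state interval in trace time units."""
    return rec.time_end - rec.time_start


# A tiny made-up trace mixing the three record kinds.
trace = [
    StateRecord(obj=1, time_start=0, time_end=40, state=1),
    FlagRecord(obj=1, time=10, type=100, value=3),
    PrecedenceRecord(obj_src=1, obj_dst=2, time_src=12, time_dst=20, tag=7, size=1024),
]
states = [r for r in trace if isinstance(r, StateRecord)]
```

Keeping the records immutable makes it safe to share one trace between several views.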
Structure
[Diagram: Tracefile -> Filter -> Reduced tracefile -> Semantics -> Function of time (semantic value), events -> Representation (Visualization, Textual, Analysis)]
Demand-driven evaluation
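One way to picture demand-driven evaluation is a chain of lazy stages in which the display pulls values and nothing upstream is computed until it is asked for. The sketch below uses Python generators for this; the stage names, the tuple layout, and the assumption that the trace is sorted by time are all hypothetical and are not taken from Paraver itself.

```python
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[int, int, int, int]   # (object, time, type, value), assumed layout


def filter_stage(trace: Iterable[Event], wanted_type: int,
                 pulled: List[Event]) -> Iterator[Event]:
    """Yield only events of one type; `pulled` logs every record actually read."""
    for ev in trace:
        pulled.append(ev)
        if ev[2] == wanted_type:
            yield ev


def semantic_stage(events: Iterable[Event]) -> Iterator[Tuple[int, int]]:
    """Turn each event into a (time, semantic value) point."""
    for _obj, time, _type, value in events:
        yield (time, value)


def visible_window(points: Iterable[Tuple[int, int]],
                   t_end: int) -> Iterator[Tuple[int, int]]:
    """Stop pulling as soon as a point lies beyond the displayed time range."""
    for t, v in points:
        if t > t_end:
            return
        yield (t, v)


trace = [(1, 0, 10, 5), (1, 5, 20, 7), (1, 10, 10, 6), (1, 20, 10, 8), (1, 30, 10, 9)]
pulled: List[Event] = []
points = list(visible_window(semantic_stage(filter_stage(trace, 10, pulled)), t_end=10))
```

Here the last trace record is never read, because the window stage stops requesting data once it sees a point past `t_end`.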
Filter module
Events
by type
by value
Communications
by tag
by size
by source / destination
logical / physical
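The filter criteria above can be sketched as two small predicates over events and communications. Everything in this Python example is an assumption made for illustration: the `Event` and `Comm` classes, the parameter names, and the choice to treat size as a minimum threshold. The logical / physical distinction is not modelled.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Event:
    obj: int
    time: int
    type: int
    value: int


@dataclass(frozen=True)
class Comm:
    src: int
    dst: int
    tag: int
    size: int


def filter_events(events: Iterable[Event],
                  types: Optional[Set[int]] = None,
                  values: Optional[Set[int]] = None) -> List[Event]:
    """Keep events whose type and value are in the given sets (None keeps all)."""
    return [e for e in events
            if (types is None or e.type in types)
            and (values is None or e.value in values)]


def filter_comms(comms: Iterable[Comm],
                 tags: Optional[Set[int]] = None,
                 min_size: int = 0,
                 src: Optional[int] = None,
                 dst: Optional[int] = None) -> List[Comm]:
    """Keep communications matching tag, minimum size, source and destination."""
    return [c for c in comms
            if (tags is None or c.tag in tags)
            and c.size >= min_size
            and (src is None or c.src == src)
            and (dst is None or c.dst == dst)]


events = [Event(1, 0, 100, 1), Event(1, 5, 200, 3), Event(2, 7, 100, 0)]
comms = [Comm(1, 2, 7, 1024), Comm(2, 1, 9, 16), Comm(1, 3, 7, 8)]

kept_events = filter_events(events, types={100}, values={1})
kept_comms = filter_comms(comms, tags={7}, min_size=64, src=1)
```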
Semantic module
Semantic value: f(t)
f = f_comp2 ∘ f_comp1 ∘ f_Ptask ∘ f_task ∘ f_thread
Semantic functions:
f_comp2, f_comp1: sign, mod, div, in range
f_Ptask: add, average, max, select
f_task: add, average, max, select
f_thread: in state, useful, given state, last event value, next event value, average next event value
[Diagram: f_thread values combined per task by f_task, task values combined by f_Ptask, result passed to f_comp1]
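The layered composition of semantic functions can be sketched as ordinary function composition. In this Python example the thread level uses "in state", the task and Ptask levels use "add", and the compose levels use "sign" and an identity. The state code `RUNNING`, the interval layout, and the exact meaning given to `sign` (the mathematical signum) are assumptions for illustration, not Paraver's definitions.

```python
from typing import Callable, Sequence, Tuple

Interval = Tuple[int, int, int]      # (time_start, time_end, state), assumed layout
TimeFn = Callable[[int], float]

RUNNING = 1                          # hypothetical state code


def in_state(intervals: Sequence[Interval], state: int) -> TimeFn:
    """Thread level: 1.0 while the thread is in `state`, else 0.0."""
    def f_thread(t: int) -> float:
        for start, end, s in intervals:
            if start <= t < end:
                return 1.0 if s == state else 0.0
        return 0.0
    return f_thread


def add(children: Sequence[TimeFn]) -> TimeFn:
    """Task / Ptask level: sum of the children's values at time t."""
    return lambda t: sum(child(t) for child in children)


def sign(child: TimeFn) -> TimeFn:
    """Compose level: signum of the child's value (assumed semantics)."""
    def composed(t: int) -> float:
        v = child(t)
        return float((v > 0) - (v < 0))
    return composed


def identity(child: TimeFn) -> TimeFn:
    """Compose level that leaves the value unchanged."""
    return child


# One Ptask with two tasks of two threads each (made-up intervals).
ptask = [
    [[(0, 10, 1), (10, 20, 0)], [(0, 5, 0), (5, 20, 1)]],
    [[(0, 20, 0)], [(0, 15, 1), (15, 20, 0)]],
]
f_tasks = [add([in_state(th, RUNNING) for th in task]) for task in ptask]
f_ptask = add(f_tasks)               # number of running threads at time t
f = identity(sign(f_ptask))          # 1.0 if any thread is running
```

Swapping `add` for an average or a maximum at either level changes the view without touching the thread-level function.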
Visualization
Type of window
Ptask / Task / thread: one row per object of selected type
Object selection (scalability)
Representation
Color encoded / Gradient / Function of time
Multiple windows
Synchronised
Forward/backward animation
Precise time measurement
Within/between windows
Textual
Textual detail of area around point within window
Semantic value and duration / flag / communication
Numeric / translated text (.pcf file)
Analysis
Time and object range selected by pointing at the window
Analysis function applied to output of semantic module
Average semantic value
Average duration/variance/number of bursts (if within range)
Number of events
Number of communications
...
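The analysis functions listed above can be sketched over a piecewise-constant semantic function. This Python example makes several assumptions that are not in the slides: the segment layout `(start, end, value)`, a time-weighted definition of the average semantic value, treating a burst as a segment with non-zero value, and counting only bursts that lie entirely within the selected range.

```python
from statistics import mean, pvariance
from typing import Dict, Sequence, Tuple

Segment = Tuple[int, int, float]     # (time_start, time_end, semantic value)


def analyze(segments: Sequence[Segment], t0: int, t1: int) -> Dict[str, float]:
    """Summarize the semantic function over the selected range [t0, t1)."""
    weighted = 0.0
    for start, end, value in segments:
        overlap = max(0, min(end, t1) - max(start, t0))
        weighted += value * overlap

    # Only bursts fully inside the range contribute to duration statistics.
    durations = [end - start for start, end, value in segments
                 if value != 0 and start >= t0 and end <= t1]

    return {
        "average_value": weighted / (t1 - t0),
        "bursts": len(durations),
        "average_duration": mean(durations) if durations else 0.0,
        "duration_variance": pvariance(durations) if durations else 0.0,
    }


# Made-up output of the semantic module for one thread.
segments = [(0, 10, 1.0), (10, 15, 0.0), (15, 25, 1.0), (25, 40, 0.0), (40, 60, 1.0)]
stats = analyze(segments, t0=0, t1=50)
```

The burst that starts at time 40 still contributes to the average value but is excluded from the duration statistics because it extends past the selected range.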
OpenMP instrumentation
Compiler instrumentation
NANOS compiler
Dynamic interception
SGI native OpenMP (MP library)
Tracing of thread status:
running
idle (busy wait)
scheduling
blocked
OpenMP analysis
Application structure
Stamping code
OpenMP analysis
Loop scheduling
Antenna design
OpenMP analysis
How do bees see flowers?
OpenMP analysis
What bees don’t see
Function            A      B      C      D
Av. L2 misses/ms    62     52     163    14
FLOPS/ms            41K    21K    8K     1K
Loads/ms            57K    52K    18K    100K
Static vs. Dynamic Parallelism
More on hardware counters
Fewer misses, more time
More on hardware counters
More memory accesses per second
Fewer coherence state changes
MPI + OpenMP
NAS FT
Quantitative data:
% MPI collective comm.: 18%
% OMP fork/join: 5%
% non-parallelized: 32%
Avg. parallel loop: 50 ms
# parallel loops: 38
# parallel loops < 5 ms: 6
Other uses
System activity
InfoPerfex
Pthreads
Average : 33 MFLOPS
Peak: 60 MFLOPS
Paraver on IBM
DPCL + PAPI: sequential programs, OpenMP
UTE: MPI, MPI + OpenMP
UTE Paraver
Filter
Thread states:
Executing application code
Executing MPI Receive
Executing MPI Send
Descheduled
Statistics:
Appl. code    MPI Rec.    MPI Send    Descheduled
97%           1%          0%          1%
38%           9%          1%          52%
46%           1%          1%          52%
47%           8%          0%          45%
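A per-thread breakdown like the statistics above can be derived from state records by summing the time spent in each state. The Python sketch below shows one way to do that; the record layout, the state names, and the sample data are made up for illustration and are not the trace behind the figures on the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

StateRecord = Tuple[int, int, int, str]   # (thread, time_start, time_end, state)


def state_percentages(records: Iterable[StateRecord]) -> Dict[int, Dict[str, float]]:
    """Percentage of each thread's traced time spent in each state."""
    totals: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for thread, start, end, state in records:
        totals[thread][state] += end - start

    result: Dict[int, Dict[str, float]] = {}
    for thread, per_state in totals.items():
        span = sum(per_state.values())
        result[thread] = {state: 100.0 * d / span for state, d in per_state.items()}
    return result


# Made-up records for two threads.
records = [
    (1, 0, 60, "application"),
    (1, 60, 75, "mpi_receive"),
    (1, 75, 100, "descheduled"),
    (2, 0, 50, "application"),
    (2, 50, 100, "descheduled"),
]
percentages = state_percentages(records)
```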
UTE Analysis
Communication pattern: exchanges between 1 and 2; 3 and 4
Load balance: more load on thread 1
MPI implementation: busy wait on receives
Scheduling: threads 2 and 3 time-share one CPU; thread 4 time-shares one CPU with other processes; OS quantum: 10 ms
More information
http://www.cepba.upc.es/paraver