
Tool Visualizations, Metrics, and Profiled Entities Overview [Brief Version]

Adam Leko

HCS Research Laboratory, University of Florida

Summary

- Give characteristics of existing tools to aid our design discussions
  - Metrics (what is recorded, any hardware counters, etc.)
  - Profiled entities
  - Visualizations
- Most information & some slides taken from the tool evaluations
- Tools overviewed: TAU, Paradyn, MPE/Jumpshot, Dimemas/Paraver/MPITrace, mpiP, Dynaprof, KOJAK, Intel Cluster Tools (old Vampir/VampirTrace), Pablo, MPICL/ParaGraph

TAU

Metrics recorded
- Two modes: profile, trace
- Profile mode
  - Inclusive/exclusive time spent in functions
  - Hardware counter information
    - PAPI/PCL: L1/L2/L3 cache reads/writes/misses, TLB misses, cycles, integer/floating point/load/store/stalls executed, wall clock time, virtual time
  - Other OS timers (gettimeofday, getrusage)
  - MPI message size sent
- Trace mode
  - Same as profile (minus hardware counters?)
  - Message send time, message receive time, message size, message sender/recipient(?)

Profiled entities
- Functions (automatic & dynamic instrumentation), loops + regions (manual instrumentation; see the sketch below)
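For the "loops + regions (manual instrumentation)" entry above, a minimal sketch of TAU's C instrumentation macros is shown below. It assumes a TAU installation whose headers and compiler wrapper (e.g. tau_cc.sh) are available; the macro names are the commonly documented ones but may vary across TAU versions.

    #include <TAU.h>                /* TAU instrumentation API (profile and trace modes) */

    int main(int argc, char **argv)
    {
        /* Declare a timer for a user-defined region ("loop" here). */
        TAU_PROFILE_TIMER(loop_timer, "main loop", "", TAU_USER);
        TAU_PROFILE_INIT(argc, argv);
        TAU_PROFILE_SET_NODE(0);        /* single-process example */

        TAU_PROFILE_START(loop_timer);  /* region entry: time (and PAPI counters, if configured) start here */
        volatile double sum = 0.0;
        for (int i = 0; i < 1000000; i++)
            sum += i * 0.5;
        TAU_PROFILE_STOP(loop_timer);   /* region exit: inclusive/exclusive time attributed to "main loop" */

        return 0;
    }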

TAU

Visualizations
- Profile mode
  - Text-based: pprof, shows a summary of profile information
  - Graphical: racy (old), jracy a.k.a. paraprof
- Trace mode
  - No built-in visualizations
  - Can export to CUBE (see KOJAK), Jumpshot (see MPE), and Vampir format (see Intel Cluster Tools)

Paradyn

Metrics recorded
- Number of CPUs, number of active threads, CPU time and inclusive CPU time
- Function calls to and by
- Synchronization (# operations, wait time, inclusive wait time)
- Overall communication (# messages, bytes sent and received), collective communication (# messages, bytes sent and received), point-to-point communication (# messages, bytes sent and received)
- I/O (# operations, wait time, inclusive wait time, total bytes)
- All metrics recorded as "time histograms" (a fixed-size data structure; see the sketch below)

Profiled entities
- Functions only (but includes functions linked to in existing libraries)
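The "time histogram" mentioned above is a fixed-size structure: a constant number of buckets spans the execution so far, and when the run outgrows them, adjacent buckets are folded together and the bucket width doubles. The C sketch below illustrates that idea under those assumptions; it is not Paradyn's actual implementation.

    #include <stddef.h>

    #define TH_BUCKETS 256              /* fixed number of buckets; the array never grows */

    typedef struct {
        double bucket_width;            /* seconds of execution time covered by each bucket */
        double value[TH_BUCKETS];       /* accumulated metric value per bucket */
    } time_histogram;

    /* Record a metric sample v observed at time t (seconds since program start). */
    static void th_add(time_histogram *h, double t, double v)
    {
        size_t i = (size_t)(t / h->bucket_width);
        while (i >= TH_BUCKETS) {
            /* Out of room: fold pairs of buckets and double the width so the
               fixed-size array again covers the whole run. */
            for (size_t j = 0; j < TH_BUCKETS / 2; j++)
                h->value[j] = h->value[2 * j] + h->value[2 * j + 1];
            for (size_t j = TH_BUCKETS / 2; j < TH_BUCKETS; j++)
                h->value[j] = 0.0;
            h->bucket_width *= 2.0;
            i = (size_t)(t / h->bucket_width);
        }
        h->value[i] += v;
    }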

Paradyn

Visualizations
- Time histograms
- Tables
- Bar charts
- "Terrains" (3-D histograms)

MPE/Jumpshot

Metrics collected
- MPI message send time, receive time, size, message sender/recipient
- User-defined event entry & exit

Profiled entities
- All MPI functions
- Functions or regions via manual instrumentation and custom events (see the sketch below)

Visualization
- Jumpshot: timeline view (space-time diagram overlaid on a Gantt chart), histogram
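The custom-event path works through MPE's logging calls: the program asks for event numbers, describes a state, and logs entry/exit events that Jumpshot then draws as a colored interval. A minimal sketch, assuming the code is built against MPE's logging libraries (e.g. via mpecc -mpilog) so that MPI_Init/MPI_Finalize start and flush the log:

    #include <mpi.h>
    #include <mpe.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);                /* with the MPI logging wrappers, logging starts here */

        /* Define a custom "state" (entry/exit event pair) for Jumpshot to draw. */
        int ev_begin = MPE_Log_get_event_number();
        int ev_end   = MPE_Log_get_event_number();
        MPE_Describe_state(ev_begin, ev_end, "compute region", "red");

        MPE_Log_event(ev_begin, 0, "begin");   /* region entry */
        /* ... user computation to be shown as one interval ... */
        MPE_Log_event(ev_end, 0, "end");       /* region exit */

        MPI_Finalize();                        /* trace (clog2) file written out here */
        return 0;
    }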

Dimemas/Paraver/MPITrace

Metrics recorded (MPITrace)
- All MPI functions
- Hardware counters (2 from the following two lists, uses PAPI; see the sketch after this list)
  - Counter 1
    - Cycles
    - Issued instructions, loads, stores, store conditionals
    - Failed store conditionals
    - Decoded branches
    - Quadwords written back from scache(?)
    - Correctable scache data array errors(?)
    - Primary/secondary I-cache misses
    - Instructions mispredicted from scache way prediction table(?)
    - External interventions (cache coherency?)
    - External invalidations (cache coherency?)
    - Graduated instructions
  - Counter 2
    - Cycles
    - Graduated instructions, loads, stores, store conditionals, floating point instructions
    - TLB misses
    - Mispredicted branches
    - Primary/secondary data cache miss rates
    - Data mispredictions from scache way prediction table(?)
    - External intervention/invalidation (cache coherency?)
    - Store/prefetch exclusive to clean/shared block
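MPITrace reads these counters through PAPI, so the sampling pattern looks roughly like the sketch below. It uses PAPI's older high-level API (PAPI_start_counters/PAPI_stop_counters, removed in recent PAPI releases) and two preset events chosen only for illustration; the actual events MPITrace programs are platform-specific, as the list above shows.

    #include <stdio.h>
    #include <papi.h>

    int main(void)
    {
        /* Two counters at a time, mirroring the "pick 2" model above. */
        int events[2] = { PAPI_TOT_CYC, PAPI_L1_DCM };   /* total cycles + L1 data cache misses */
        long long values[2];

        if (PAPI_start_counters(events, 2) != PAPI_OK)
            return 1;

        volatile double x = 0.0;
        for (int i = 0; i < 1000000; i++)                /* code region being measured */
            x += i * 0.5;

        if (PAPI_stop_counters(values, 2) != PAPI_OK)
            return 1;

        printf("cycles = %lld, L1 data misses = %lld\n", values[0], values[1]);
        return 0;
    }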

Dimemas/Paraver/MPITrace

Profiled entities (MPITrace)
- All MPI functions (message start time, message end time, message size, message recipient/sender)
- User regions/functions via manual instrumentation

Visualization
- Timeline display (like Jumpshot)
  - Shows Gantt chart and messages
  - Can also overlay hardware counter information
  - Clicking on the timeline brings up a text listing of events near where you clicked
- 1D/2D analysis modules

mpiP

Metrics collected
- Start time, end time, and message size for each MPI call

Profiled entities
- MPI function calls, intercepted via PMPI wrappers (see the sketch below)

Visualization
- Text-based output, with a graphical browser that displays statistics in-line with the source
- Displayed information:
  - Overall time (%) for each MPI node
  - Top 20 call sites for time (MPI%, App%, variance)
  - Top 20 call sites for message size (MPI%, App%, variance)
  - Min/max/average/MPI%/App% time spent at each call site
  - Min/max/average/sum of message sizes at each call site
- Definitions:
  - App time = wall clock time between MPI_Init and MPI_Finalize
  - MPI time = all time consumed by MPI functions
  - App% = % of the metric relative to overall app time
  - MPI% = % of the metric relative to overall MPI time
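mpiP's interception works through the standard PMPI profiling interface: a wrapper library re-defines each MPI entry point, records the call's start time, end time, and message size, and then calls the real routine via its PMPI_ name. A minimal sketch of one wrapper is shown below; the record_call helper is hypothetical and only stands in for mpiP's internal per-call-site statistics.

    #include <stdio.h>
    #include <mpi.h>

    /* Hypothetical bookkeeping hook; mpiP's real statistics gathering differs. */
    static void record_call(const char *fn, double start, double end, long bytes)
    {
        fprintf(stderr, "%s: %.6f s, %ld bytes\n", fn, end - start, bytes);
    }

    /* The wrapper library defines MPI_Send itself; the MPI library's real
       implementation stays reachable under the PMPI_Send name.
       (The const qualifier on buf follows MPI-3; older MPI headers omit it.) */
    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        double start = MPI_Wtime();                       /* call start time */
        int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);
        double end = MPI_Wtime();                         /* call end time */

        int type_size;
        PMPI_Type_size(datatype, &type_size);             /* bytes = count * datatype size */
        record_call("MPI_Send", start, end, (long)count * type_size);
        return rc;
    }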

Dynaprof

Metrics collected
- Wall clock time or a PAPI metric for each profiled entity
- Collects inclusive, exclusive, and 1-level call tree % information

Profiled entities
- Functions (dynamic instrumentation)

Visualizations
- Simple text-based output
- Simple GUI (shows the same information as the text-based output)

KOJAK

Metrics collected
- MPI: message start time, receive time, size, message sender/recipient
- Manual instrumentation: start and stop times
- 1 PAPI metric per run (only FLOPS and L1 data misses visualized)

Profiled entities
- MPI calls (MPI wrapper library)
- Function calls (automatic instrumentation, only available on a few platforms)
- Regions and function calls via manual instrumentation (see the sketch below)

Visualizations
- Can export traces to Vampir trace format (see ICT)
- Shows profile and analyzed data via CUBE
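For the manual-instrumentation entry, KOJAK user regions are typically marked with POMP user-instrumentation directives that its OPARI source-to-source tool rewrites into measurement calls. The exact directive spelling below is an assumption from memory and may differ between KOJAK versions; unprocessed, the pragmas are simply ignored by the compiler.

    #include <stdio.h>

    int main(void)
    {
        #pragma pomp inst init              /* assumed: initialize user instrumentation */

        #pragma pomp inst begin(compute)    /* assumed: start of user region "compute" */
        double sum = 0.0;
        for (int i = 0; i < 1000000; i++)
            sum += i * 0.5;
        #pragma pomp inst end(compute)      /* assumed: end of user region "compute" */

        printf("%f\n", sum);
        return 0;
    }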

Intel Cluster Tools (ICT)

Metrics collected
- MPI functions: start time, end time, message size, message sender/recipient
- User-defined events: counter, start & end times
- Code location for source-code correlation

Instrumented entities
- MPI functions via wrapper library
- User functions via binary instrumentation(?)
- User functions & regions via manual instrumentation (see the sketch below)

Visualizations
- Different types: timelines, statistics & counter info
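For the manual-instrumentation entry, the Vampir/Intel Trace Collector API lets a program define a named region and bracket it with begin/end calls that then appear on the timeline. The header name and the VT_classdef/VT_funcdef/VT_begin/VT_end signatures below are assumptions based on the documented trace-collector API and may differ between releases.

    #include <mpi.h>
    #include <VT.h>      /* assumed header for the Vampir/Intel Trace Collector API */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);                 /* trace collection starts with MPI_Init */

        int class_handle, region_handle;
        VT_classdef("user", &class_handle);                   /* assumed signature */
        VT_funcdef("compute", class_handle, &region_handle);  /* assumed signature */

        VT_begin(region_handle);                /* region entry shows up on the timeline */
        /* ... user computation ... */
        VT_end(region_handle);                  /* region exit */

        MPI_Finalize();                         /* trace file written at finalize */
        return 0;
    }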

Pablo

Metrics collected
- Time inclusive/exclusive of a function
- Hardware counters via PAPI
- Summary metrics computed from timing info: min/max/avg/stdev/count

Profiled entities
- Functions, function calls, and outer loops (all selected via GUI)

Visualizations
- Displays derived summary metrics color-coded and inline with the source code

MPICL/ParaGraph

Metrics collected
- MPI functions: start time, end time, message size, message sender/recipient
- Manual instrumentation: start time, end time, "work" done (up to the user to pass this in)

Profiled entities
- MPI function calls via the PMPI interface
- User functions/regions via manual instrumentation

Visualizations
- Many, separated into 4 categories: utilization, communication, task, "other"

ParaGraph visualizations

Utilization visualizations
- Display a rough estimate of processor utilization
- Utilization broken down into 3 states:
  - Idle: the program is blocked waiting for a communication operation (or it has stopped execution)
  - Overhead: the program is performing communication but is not blocked (time spent within the MPI library)
  - Busy: the program is executing anything other than communication
- "Busy" doesn't necessarily mean useful work is being done, since the classification simply assumes busy := (not communication); see the sketch after this list

Communication visualizations
- Display different aspects of communication: frequency, volume, overall pattern, etc.
- "Distance" computed by setting the topology in the options menu

Task visualizations
- Display information about when processors start & stop tasks
- Requires manually instrumented code to identify when processors start/stop tasks

Other visualizations
- Miscellaneous things
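To make the three-state split concrete, the sketch below classifies one processor's trace intervals using exactly the rule above: blocked in communication is idle, communicating but not blocked is overhead, and everything else counts as busy whether or not it is useful work. The interval record is hypothetical, used only for illustration.

    typedef enum { STATE_IDLE, STATE_OVERHEAD, STATE_BUSY } util_state;

    /* Hypothetical per-interval trace record: was the process inside the MPI
       library during this interval, and if so, was it blocked waiting? */
    typedef struct {
        int in_mpi;       /* 1 if executing inside an MPI routine */
        int blocked;      /* 1 if blocked waiting on a message or collective */
    } trace_interval;

    /* ParaGraph-style utilization rule: idle = blocked in communication,
       overhead = communicating but not blocked, busy = everything else. */
    static util_state classify(const trace_interval *iv)
    {
        if (iv->in_mpi)
            return iv->blocked ? STATE_IDLE : STATE_OVERHEAD;
        return STATE_BUSY;
    }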