Advanced Computing Technology Center © 2005 IBM Corporation The IBM High Performance Computing Toolkit Guojing Cong



IBM High Performance Computing Toolkit (HPCT)

One consolidated package

Components:

– Hardware Performance Monitor (HPM)

– Simulation Guided Memory Analyzer (SiGMA)

– MPI Profiler (MP_profiler)

– OpenMP Profiler (PompProf)

– Modular I/O Performance Tool (MIO)

– Xprofiler

– GUI integration tool w/ source code traceback (PeekPerf)

– Watson Sparse Matrix Library (WSMP) included


Our Vision

A toolkit that spans various aspects of high performance computing

– CPU profiling, memory behavior analysis, communication profiling, I/O analysis and optimization

Integrated performance monitoring and profiling environment

– one single consistent interface for all components

– enhanced functionality

• Binary instrumentation (without source code modification)

• Dynamic instrumentation

Available on IBM Platforms

– AIX, Linux on POWER (LoP), and Blue Gene


Support Matrix

Component                      AIX Power                  Linux Power                Linux JS20                 Linux BG/L

HPMCount & HPMlib              today (AIX 5L 5.1, 5.3)    Aug/05 (Linux 2.4 & 2.6)   Aug/05 (Linux 2.4 & 2.6)   Aug/05
MP-profiler & MP-tracer        today (AIX 4.3.3+)         May/05 (Linux 2.6)         May/05 (Linux 2.6)         today
Xprofiler                      today (AIX 5L 5.1)         Aug-Sep/05 (Linux 2.6)     Aug-Sep/05 (Linux 2.6)     Aug/05
SHMEM & SHMEM-profiler         today (AIX 5L 5.1)         N/A                        N/A                        N/A
MIO                            today (AIX 5L 5.1)         TBT (Linux 2.6)            TBT (Linux 2.6)            TBT
PompProfiler                   today (AIX 5L 5.1)         N/A                        N/A                        N/A
SiGMA                          today (AIX 4.3.3+)         Aug-Sep/05 (Linux 2.6)     Aug-Sep/05 (Linux 2.6)     N/A
PeekPerf                       today (AIX 4.3.3+)         TBT                        TBT                        today
Watson Sparse Matrix Package   today (AIX 5L 5.1)         TBT (Linux 2.6)            TBT (Linux 2.6)            N/A


Outline

Xprofiler

HPM

MP Profiler

OpenMP Profiler

MIO


Xprofiler

CPU profiling tool similar to gprof

Can be used to profile both serial and parallel applications

Uses procedure-profiling information to construct a graphical display of the functions within an application

Provides quick access to the profiled data and helps users identify the most CPU-intensive functions

Based on sampling (support from both compiler and kernel)

Charges execution time to source lines and shows disassembly code
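
Sampling-based profiling of this kind can be pictured with a small C sketch. This is not Xprofiler's implementation: it assumes a hypothetical table of function address ranges and a list of sampled program-counter values, one per 0.01 s tick, and charges each tick to the function whose range contains the sample. All names and addresses are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function table entry covering addresses [start, end). */
struct func_range {
    const char   *name;
    unsigned long start;
    unsigned long end;
    unsigned long ticks;   /* samples charged to this function */
};

/* Charge each sampled program counter to the function that contains it.
   Returns the number of samples that fell outside every known function. */
size_t charge_samples(struct func_range *funcs, size_t nfuncs,
                      const unsigned long *pcs, size_t npcs)
{
    size_t unmatched = 0;

    assert(funcs != NULL && pcs != NULL);
    for (size_t i = 0; i < npcs; i++) {
        int found = 0;
        for (size_t f = 0; f < nfuncs; f++) {
            if (pcs[i] >= funcs[f].start && pcs[i] < funcs[f].end) {
                funcs[f].ticks++;
                found = 1;
                break;
            }
        }
        if (!found)
            unmatched++;
    }
    return unmatched;
}

/* Convert a tick count to seconds, assuming one tick is 0.01 s. */
double ticks_to_seconds(unsigned long ticks)
{
    return (double)ticks * 0.01;
}
```

Because time is attributed from samples, a function that runs for less than one tick may receive no time at all; the resulting profile is a statistical estimate rather than an exact measurement.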


Xprofiler: Main Display

Width of a bar: time including called routines

Height of a bar: time excluding called routines

Call arrows labeled with number of calls

Overview window for easy navigation (View Overview)


Xprofiler: Source Code Window

Source code window displays source code with time profile (in ticks = 0.01 sec)

Access

– Select function in main display

– context menu

– Select function in flat profile

– Code Display

– Show Source Code


Xprofiler - Disassembler Code


HPM provides comprehensive reports of hardware events that are critical to performance

– Accurate and low overhead

– Comprehensive

• E.g., number of floating-point instructions executed, cache misses, TLB misses

Derived metrics

– correlate the behavior of the application to one or more of the hardware components

Thread-level support

Includes

– hpmcount, libhpm, hpmstat


HPM Visualization Using PeekPerf


MP_profiler

A set of libraries that collect profiling data for MPI and TurboSHMEM applications

– Implements wrappers using PMPI interface

Reports performance metrics, e.g.,

– time used by MPI function calls

– message sizes

Visualization tools help users identify performance bottlenecks

– peekperf maps performance metrics back to the source code

– peekview gives a visual representation of the overall computation and communication pattern of the system
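
The wrapper technique can be sketched in C without an MPI installation. In a real MPI program the wrapper would be named MPI_Send and would forward to PMPI_Send; here the hypothetical pair DEMO_Send and PDEMO_Send stands in for them so the example compiles on its own. The statistics structure is also invented for illustration and is not MP_profiler's data layout.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Stand-in for the library's name-shifted entry point (the role that
   PMPI_Send plays for MPI_Send). It only pretends to send. */
int PDEMO_Send(const void *buf, size_t nbytes, int dest)
{
    (void)buf;
    (void)nbytes;
    (void)dest;
    return 0;
}

/* Hypothetical per-function statistics gathered by the wrapper. */
struct send_stats {
    unsigned long      calls;
    unsigned long long bytes;
    double             seconds;
};

struct send_stats demo_send_stats;

/* Wrapper that carries the name the application calls. It times the
   real operation, records the message size, and returns its result. */
int DEMO_Send(const void *buf, size_t nbytes, int dest)
{
    clock_t t0, t1;
    int rc;

    assert(buf != NULL || nbytes == 0);
    t0 = clock();
    rc = PDEMO_Send(buf, nbytes, dest);
    t1 = clock();

    demo_send_stats.calls++;
    demo_send_stats.bytes += nbytes;
    demo_send_stats.seconds += (double)(t1 - t0) / CLOCKS_PER_SEC;
    return rc;
}
```

The application source is unchanged: linking the wrapper library ahead of the real one is enough for every call to pass through the wrapper.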


MP_Profiler Visualization Using PeekPerf


MP_Tracer Visualization Using PeekPerf


POMP Profiler (PompProf)

Generates a detailed profile describing overheads and time spent by each thread in three key regions of the parallel application:

– Parallel regions

– OpenMP loops inside a parallel region

– User-defined functions

Profile data is presented in the form of an XML file that can be visualized with PeekPerf


DPOMP

Dynamically instruments OpenMP applications

Can add performance instrumentation to binaries without requiring access to source code or recompilation

Based on dynamic probes using DPCL


PompProf Visualization Using PeekPerf


Modular I/O Performance Tool (MIO)

I/O Analysis

– Trace module

– Summary of File I/O Activity + Binary Events File

– Low CPU overhead

I/O Performance Enhancement Library

– Prefetch module (optimizes asynchronous prefetch and write-behind)

– System Buffer Bypass capability

– User-controlled pages (size and number)

Recoverable Error Handling

– Recover module (monitors return values and errno, and reissues failed requests)

Remote Data Server

– Remote module (simple socket protocol for moving data)

Shared object library for AIX
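
The idea behind the recover module, checking the return value and errno and then reissuing the request, can be sketched in C as follows. This is not MIO's code: the function names, the attempt limit, and the choice of EINTR and EAGAIN as the retryable errors are assumptions made for the example. The flaky_write function is a fake operation that exists only to exercise the retry loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Signature shared by write(2)-like operations. */
typedef ssize_t (*io_op)(int fd, const void *buf, size_t count);

/* Issue the request, and reissue it while it fails with an error that
   is assumed to be transient, up to max_attempts attempts in total. */
ssize_t retry_io(io_op op, int fd, const void *buf, size_t count,
                 int max_attempts)
{
    ssize_t rc = -1;

    assert(op != NULL && max_attempts > 0);
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        errno = 0;
        rc = op(fd, buf, count);
        if (rc >= 0)
            return rc;                  /* success: stop retrying */
        if (errno != EINTR && errno != EAGAIN)
            break;                      /* treated as non-recoverable */
    }
    return rc;
}

/* Fake operation: fails with EINTR on its first two calls, then
   reports the whole request as written. */
int flaky_calls = 0;

ssize_t flaky_write(int fd, const void *buf, size_t count)
{
    (void)fd;
    (void)buf;
    if (++flaky_calls <= 2) {
        errno = EINTR;
        return -1;
    }
    return (ssize_t)count;
}
```

With five attempts allowed, a request through flaky_write succeeds on the third call; with only two attempts allowed, the failure is returned to the caller.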


MIO User Code Interface

#define open64(a,b,c)  MIO_open64(a,b,c,0)
#define read           MIO_read
#define write          MIO_write
#define close          MIO_close
#define lseek64        MIO_lseek64
#define fcntl          MIO_fcntl
#define ftruncate64    MIO_ftruncate64
#define fstat64        MIO_fstat64
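
The effect of these macros is that application code written against the usual POSIX names is compiled into calls to the replacement layer. The C sketch below shows the same redirection with stand-in functions, since it does not link against MIO: the demo_* functions, the call counter, and copy_roundtrip are hypothetical, and plain open is used in place of open64 for portability.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-in layer: counts each call, then forwards to the system call.
   These demo_* functions are hypothetical and are not MIO's API. */
unsigned long demo_calls;

int demo_open(const char *path, int flags, mode_t mode, int extra)
{
    (void)extra;            /* mirrors the extra argument in the macro */
    demo_calls++;
    return open(path, flags, mode);
}

ssize_t demo_read(int fd, void *buf, size_t n)
{
    demo_calls++;
    return read(fd, buf, n);
}

ssize_t demo_write(int fd, const void *buf, size_t n)
{
    demo_calls++;
    return write(fd, buf, n);
}

int demo_close(int fd)
{
    demo_calls++;
    return close(fd);
}

/* From here on, the usual names are redirected to the stand-in layer. */
#define open(a, b, c) demo_open(a, b, c, 0)
#define read          demo_read
#define write         demo_write
#define close         demo_close

/* Application-style code, written against the usual names:
   writes len bytes to a file, then reads them back into out. */
int copy_roundtrip(const char *path, const char *text, size_t len, char *out)
{
    ssize_t got;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    assert(path != NULL && text != NULL && out != NULL);
    if (fd < 0)
        return -1;
    if (write(fd, text, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    close(fd);

    fd = open(path, O_RDONLY, 0);
    if (fd < 0)
        return -1;
    got = read(fd, out, len);
    close(fd);
    return got == (ssize_t)len ? 0 : -1;
}

/* End of redirected code: restore the usual names. */
#undef open
#undef read
#undef write
#undef close
```

The macros must appear after the system headers are included and after the replacement functions are defined; otherwise the declarations and the forwarding calls themselves would be rewritten too.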


MIO Trace Module (sample partial text output)

Trace close : program <-> pf : /bmwfs/cdh108.T20536_13.SCR300 : (281946/2162.61)=130.37 mbytes/s
  current size=0 max_size=16277
  mode =0777 sector size=4096 oflags =0x302=RDWR CREAT TRUNC
  open          1      0.01
  write    478193    462.10   59774   59774  131072  131072
  read    1777376   1700.48  222172  222172  131072  131072
  seek     911572      2.83
  fcntl         3      0.00
  trunc        16      0.40
  close         1      0.03
  size     127787
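
The figures in this trace are consistent with one another under one reading of the columns, which is an interpretation and is not stated in the output itself: on the write and read lines the first number is the call count, the second is seconds, and 131072 is the request size in bytes. The C sketch below reproduces the arithmetic; the function names are invented for the example.

```c
#include <assert.h>

/* Megabytes moved by `calls` requests of `request_bytes` bytes each,
   taking 1 megabyte as 2^20 bytes. */
double mbytes_moved(double calls, double request_bytes)
{
    assert(calls >= 0.0 && request_bytes >= 0.0);
    return calls * request_bytes / (1024.0 * 1024.0);
}

/* Average transfer rate in megabytes per second. */
double mbytes_per_second(double mbytes, double seconds)
{
    assert(seconds > 0.0);
    return mbytes / seconds;
}
```

Under that reading, 478193 writes of 131072 bytes move about 59774 megabytes and 1777376 reads move 222172 megabytes, which sum to the 281946 shown in the header line. Dividing by the 2162.61 seconds printed there gives the reported 130.37 mbytes/s.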


MSC.Nastran V2001

Benchmark: SOL 111, 1.7M DOF, 1578 modes, 146 frequencies, residual flexibility and acoustics. 120 GB of disk space.

Machine: 4-way, 1.3 GHz p655, 32 GB with 16 GB large pages, JFS striped on 16 SCSI disks.

MSC.Nastran: V2001.0.9 with large pages, dmp=2 parallel=2 mem=700mb

The run with MIO used mio=1000mb

[Bar chart: elapsed time and CPU time in seconds (axis 0 to 60,000), no MIO vs. with MIO]

6.8 TB of I/O in 26666 seconds is an average of about 250 MB/sec


Problems that we are considering

Performance profiling and monitoring for scientific applications on large systems

– Selectively generate and report profiling data

– Management and analysis of large amounts of performance data

Composite profiling and presentation

– CPU profiling

– Hardware Performance Counter profiling

– Communication profiling