Paradyn Week – April 14, 2004 – Madison, WI
DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP
Applications
Bernd Mohr, Forschungszentrum Jülich, [email protected]
Seetharami Seelam*, University of Texas El Paso, [email protected]
Luiz DeRose*, Cray, [email protected]
* Work done while authors were at IBM Research
Outline
What is POMP
What is DPCL
IBM compiler and run-time library features that make DPOMP possible
Implementation
POMP features not supported (and why)
Probe libraries
POMPROF
KOJAK
Conclusions
“Standard” OpenMP Monitoring API?
Problem: OpenMP (unlike MPI) does not define a standard monitoring interface
OpenMP is defined mainly by directives/pragmas
Solution: POMP, the OpenMP Monitoring Interface
Joint development: Forschungszentrum Jülich, University of Oregon
Presented at EWOMP’01, LACSI’01 and SC’01
“The Journal of Supercomputing”, 23, Aug. 2002.
What is POMP?
Portable cross-platform/cross-language API to simplify the design and implementation of OpenMP tools
POMP was motivated by the MPI profiling interface (PMPI)
PMPI allows selective replacement of MPI routines at link time
Used by most MPI performance tools
VampirTrace
MP-Profiler
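[Figure: a user program calls MPI_Bcast and MPI_Send. The MPI library provides MPI_Bcast, MPI_Send and PMPI_Send. When a profiling library is linked in, its own MPI_Send is called in place of the MPI library's MPI_Send and forwards to PMPI_Send; MPI_Bcast still goes directly to the MPI library.]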
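The name-shifted interface described above can be illustrated without an MPI installation. The following is a minimal sketch in plain C, not real MPI code: `mock_PMPI_Send` stands in for the library's real entry point and `mock_MPI_Send` for the wrapper a profiling library would supply under the public name. Both names, the simplified argument list, and the counters are hypothetical. With a real MPI library the wrapper would be named `MPI_Send`, take the full MPI argument list, and forward to `PMPI_Send`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the MPI library's real entry point
   (PMPI_Send in a real MPI implementation). It only counts bytes. */
static size_t bytes_delivered = 0;

static int mock_PMPI_Send(const void *buf, size_t nbytes, int dest)
{
    (void)buf;
    (void)dest;
    bytes_delivered += nbytes;
    return 0; /* success */
}

/* Profiling data gathered by the wrapper. */
static unsigned long send_calls = 0;
static size_t send_bytes = 0;

/* Hypothetical stand-in for the wrapper that a profiling library would
   provide under the public name (MPI_Send). It records the call and
   then forwards to the name-shifted entry point. */
int mock_MPI_Send(const void *buf, size_t nbytes, int dest)
{
    assert(dest >= 0);
    send_calls += 1;
    send_bytes += nbytes;
    return mock_PMPI_Send(buf, nbytes, dest);
}

/* Accessors a tool could use to report the collected data. */
unsigned long profiled_send_calls(void) { return send_calls; }
size_t profiled_send_bytes(void) { return send_bytes; }
size_t delivered_bytes(void) { return bytes_delivered; }
```

The user program is unchanged in this scheme: it keeps calling the public name, and whether it reaches the wrapper or the library directly is decided at link time.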
A Brief History of OpenMP Instrumentation
POMP1: OpenMP performance monitoring interface
Forschungszentrum Jülich, University of Oregon
Published in "The Journal of Supercomputing", Vol. 23, 2002.
European IST Project INTONE: development of OpenMP tools (includes monitoring interface)
Pallas, CEPBA, Royal Inst. of Technology, Tech. Univ. Dresden
http://www.cepba.upc.es/intone/
Intel KAI Software Laboratory (KSL) - POMP: development of OpenMP monitoring interface inside ASCI
Based on POMP, but further developed in other directions
Joint proposal presented at EWOMP’02: POMP2 == POMP
POMP Proposal
Three groups of events:
OpenMP constructs and directives/pragmas
Enter/Exit around each OpenMP construct
Begin/End around associated body
Special case for parallel loops: ChunkBegin/End, IterBegin/End, or IterEvent instead of Begin/End
“Single” events for small constructs like atomic or flush
OpenMP API calls
Enter/Exit events around omp_set_*_lock() functions
“single” events for all API functions
User functions and regions
Allows application programmers to specify and control amount of instrumentation
Example: Standard Instrumentation
Original source (demo.c):
 1: int main() {
 2:   int id;
 3:
 4:   #pragma omp parallel private(id)
 5:   {
 6:     id = omp_get_thread_num();
 7:     printf("hello from %d\n", id);
 8:   }
 9: }
Instrumented source (lines marked *** are inserted by the instrumenter):
 1: int main() {
 2:   int id;
***   POMP_Init();
 3:
***   { POMP_handle_t pomp_hd1 = 0;
***     int32 pomp_tid = omp_get_thread_num();
***     POMP_Parallel_enter(&pomp_hd1, pomp_tid, -1, 1,
***       "49*type=pregion*file=demo.c*slines=4,4*elines=8,8**");
 4:   #pragma omp parallel private(id)
 5:   {
***     int32 pomp_tid = omp_get_thread_num();
***     POMP_Parallel_begin(pomp_hd1, pomp_tid);
 6:     id = omp_get_thread_num();
 7:     printf("hello from %d\n", id);
***     POMP_Parallel_end(pomp_hd1, pomp_tid);
 8:   }
***     POMP_Parallel_exit(pomp_hd1, pomp_tid);
***   }
***   POMP_Finalize();
 9: }
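To see the event order produced by the instrumented example, the POMP calls can be backed by a small stub library. The following is a minimal sketch under stated assumptions: the stub bodies, the opaque `POMP_handle_t` type, the counters, and the function name `run_instrumented_hello` are all hypothetical, the slide's `int32` is written as `int32_t`, and the meaning of the third and fourth arguments of `POMP_Parallel_enter` is not given in the slides, so they are accepted and ignored. The serial fallback lets the sketch build with or without OpenMP.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch also builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Assumed handle type: an opaque pointer owned by the monitoring library. */
typedef void *POMP_handle_t;

/* Event counters kept by this stub library. */
static int n_enter, n_begin, n_end, n_exit;
static int region_descriptor; /* object the handle points to */

static void POMP_Init(void) { n_enter = n_begin = n_end = n_exit = 0; }

static void POMP_Finalize(void)
{
    printf("enter=%d begin=%d end=%d exit=%d\n",
           n_enter, n_begin, n_end, n_exit);
}

/* Called once by the encountering thread, before the region starts. */
static void POMP_Parallel_enter(POMP_handle_t *hd, int32_t tid,
                                int32_t arg3, int32_t arg4, const char *ctc)
{
    (void)tid; (void)arg3; (void)arg4; (void)ctc;
    *hd = &region_descriptor;
    n_enter++;
}

/* Called by every thread of the team at the start of the region body. */
static void POMP_Parallel_begin(POMP_handle_t hd, int32_t tid)
{
    (void)tid;
    assert(hd == &region_descriptor);
    #pragma omp atomic
    n_begin++;
}

/* Called by every thread of the team at the end of the region body. */
static void POMP_Parallel_end(POMP_handle_t hd, int32_t tid)
{
    (void)hd; (void)tid;
    #pragma omp atomic
    n_end++;
}

/* Called once by the encountering thread, after the region ends. */
static void POMP_Parallel_exit(POMP_handle_t hd, int32_t tid)
{
    (void)hd; (void)tid;
    n_exit++;
}

/* The instrumented program from the slide; returns the team size seen. */
int run_instrumented_hello(void)
{
    int id;
    POMP_Init();
    {
        POMP_handle_t pomp_hd1 = 0;
        int32_t pomp_tid = omp_get_thread_num();
        POMP_Parallel_enter(&pomp_hd1, pomp_tid, -1, 1,
            "49*type=pregion*file=demo.c*slines=4,4*elines=8,8**");
#pragma omp parallel private(id)
        {
            int32_t pomp_tid = omp_get_thread_num();
            POMP_Parallel_begin(pomp_hd1, pomp_tid);
            id = omp_get_thread_num();
            printf("hello from %d\n", id);
            POMP_Parallel_end(pomp_hd1, pomp_tid);
        }
        POMP_Parallel_exit(pomp_hd1, pomp_tid);
    }
    POMP_Finalize();
    return n_begin;
}
```

Enter and Exit fire once per region on the encountering thread, while Begin and End fire once per team member, which is why the sketch updates only those two counters atomically.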
Dynamic Performance Monitoring Interface for OpenMP
Collaboration with Forschungszentrum Jülich
Motivation: POMP under review by the OpenMP ARB!
May take too long to be implemented (if accepted)
Approach: a POMP implementation based on dynamic probes
Built on top of DPCL
Modifies the binary with performance instrumentation
No source code or re-compilation required
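DPOMP Instrumentation
The IBM compiler and run-time library
[Figure: source code versus compiler-generated code. Source code: main() calls A(); A() contains an OMP parallel region (OMP parallel ... OMP end parallel) enclosing an OMP loop. Compiler-generated code: main() calls A() on the master thread; A() calls the run-time library routine xlf_Par; the outlined function A@0L1, run by all threads, calls the run-time library routine xlf_DoPar; the outlined function A@0L1@OL2 contains the loop "do I=start,end / loop body / enddo". Probes shown: POMP_Function_enter, POMP_Function_exit, POMP_Parallel_enter, POMP_Parallel_exit, POMP_Parallel_begin, POMP_Parallel_end, POMP_Loop_enter, POMP_Loop_exit, POMP_Loop_chunk_begin, POMP_Loop_chunk_end.]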
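The outlining scheme in the figure is what makes binary instrumentation possible: once the body of a parallel region is a separate function, probes can be attached at its entry and exit and around the run-time call that starts it. The following is a minimal sketch of that idea in C, under one plausible reading of the figure. Everything in it is hypothetical: `mock_runtime_parallel` stands in for a run-time entry point such as xlf_Par, `A_outlined` for an outlined function such as A@0L1, the `probe_*` functions for POMP calls, and the "threads" are run one after another so the result is deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Event log: one character per probe, in firing order. */
static char event_log[64];
static size_t log_len = 0;

void reset_log(void)
{
    memset(event_log, 0, sizeof event_log);
    log_len = 0;
}

static void log_event(char c)
{
    assert(log_len + 1 < sizeof event_log);
    event_log[log_len++] = c;
    event_log[log_len] = '\0';
}

/* Hypothetical probes; real POMP calls would take handles and thread ids. */
static void probe_parallel_enter(void) { log_event('E'); }
static void probe_parallel_begin(void) { log_event('b'); }
static void probe_parallel_end(void)   { log_event('e'); }
static void probe_parallel_exit(void)  { log_event('X'); }

/* Outlined body of the parallel region (the role of A@0L1). */
static long partial_sum[4];

static void A_outlined(int tid, int nthreads)
{
    probe_parallel_begin();              /* probe at function entry */
    long s = 0;
    for (int i = tid; i < 100; i += nthreads)
        s += i;
    partial_sum[tid] = s;
    probe_parallel_end();                /* probe at function exit */
}

/* Stand-in for the run-time entry point (the role of xlf_Par).
   It runs the "threads" sequentially to keep the sketch deterministic. */
static void mock_runtime_parallel(void (*body)(int, int), int nthreads)
{
    for (int tid = 0; tid < nthreads; tid++)
        body(tid, nthreads);
}

/* Function A after outlining, with probes around the run-time call. */
long A(int nthreads)
{
    assert(nthreads >= 1 && nthreads <= 4);
    probe_parallel_enter();              /* before the run-time call */
    mock_runtime_parallel(A_outlined, nthreads);
    probe_parallel_exit();               /* after the run-time call */
    long total = 0;
    for (int t = 0; t < nthreads; t++)
        total += partial_sum[t];
    return total;
}

const char *probe_log(void) { return event_log; }
```

None of the probe placement requires the source of A: the instrumenter only needs to recognize the run-time call site and the outlined function in the binary.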
DPOMP Usage
% dpomp [-f function.lst] <pomp-lib> <exe>
Input parameters
<exe>: OpenMP application (or mixed-mode)
<pomp-lib>: POMP compliant monitoring library
function.lst: list of user functions to instrument (optional)
Example: dpomp -f function.lst libpomp a.out
DPOMP Instrumentation
The amount of instrumentation can be controlled
By the tool builder: set of POMP calls available in the monitoring library
and/or
By the user: environment variables
Events instrumented by default:
All OpenMP constructs
All user functions called in the main program
All MPI calls
Once the instrumentation is finished, the modified program is executed
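The tool-builder side of this control can be pictured as a filter: only events for which the monitoring library provides a POMP call get a probe. The following is a minimal sketch of that filter in C. The slides do not say how DPOMP discovers which calls a library provides, so the static export table, `library_provides`, and `select_probes` are hypothetical illustrations rather than DPOMP's mechanism.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical export list of a small monitoring library that only
   cares about the body of parallel regions. */
static const char *const library_exports[] = {
    "POMP_Parallel_begin",
    "POMP_Parallel_end",
};

/* Returns 1 if the library provides the named POMP call, else 0. */
static int library_provides(const char *name)
{
    size_t n = sizeof library_exports / sizeof library_exports[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(library_exports[i], name) == 0)
            return 1;
    return 0;
}

/* Copies into `selected` the candidate events that the library
   implements, preserving order, and returns how many were kept.
   `selected` must have room for `ncand` entries. */
size_t select_probes(const char *const cand[], size_t ncand,
                     const char *selected[])
{
    assert(cand != NULL && selected != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < ncand; i++)
        if (library_provides(cand[i]))
            selected[kept++] = cand[i];
    return kept;
}
```

A library that exports fewer POMP calls therefore yields a more lightly instrumented binary, without any change to the application.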
Limitations
63 out of 68 POMP events supported!
Limitations due to compiler issues
POMP_Loop_iter_(begin, or end, or event)
POMP_Implicit_barrier_(end, or exit)
OMP Parallel Loop NOT = OMP Parallel / OMP Loop
Compile Time Context (CTC): hasFirstPrivate, hasLastPrivate, hasNowait, hasCopyin, schedule, hasOrdered, and hasCopypriv not available
Limitations due to DPCL issues
Loop iteration values (init, final, incr, chunk)
Other limitations
C++ not supported
POMP Profiler (POMPROF)
Generates a detailed profile describing overheads and time spent by each thread in three key regions of the parallel application:
Parallel regions
OpenMP loops inside a parallel region
User defined functions
Profile data is presented in the form of an XML file that can be visualized with PeekPerf
POMP Profiler (pomprof)
POMP Profiler (II)
KOJAK POMP Library (Forschungszentrum Juelich)
POMP monitoring library which generates EPILOG event traces
Processed by KOJAK’s automatic event trace analyzer EXPERT
Performance Property: Which type of behavior caused the problem?
Call Tree: Where in the source code? In which context?
Location: How is the problem distributed across the machine?
Color Scale: How severe is the problem?
EPILOG Trace Converted to VTF3 (FZ Juelich)
EPILOG-to-VTF3: maps OpenMP constructs into VAMPIR symbols and activities
Conclusions
DPCL based implementation of the POMP performance monitoring interface for OpenMP
Easy to use
Two POMP libraries:
KOJAK POMP Library
POMP Profiler Library
Or build your own library