Upload
others
View
35
Download
0
Embed Size (px)
Citation preview
1
Entornos Operativos para la Gestión de Recursos... Dept. d’Arquitectura de Computadors
UPC
Performance analysis support on the Operating Environment
Marisa Gil, Xavier Martorell {marisa, xavim}@ac.upc.es
Curs de doctorat 2002/2003
Departament d’Arquitectura de Computadors
UPC
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 2
UPC
Motivation
n Complexity grows in all the areasl Processorsl Computersl Operating Systeml Run-time librariesl Applications
n We have a problem...
2
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 3
UPC
Outline
n Motivation
n Introduction
n Support to performance analysisl Architectural, OS, RTL
n Examplesl PAPIl Rabbitl libFASTparparaver / Ompitracel Intone OMPI tracingl Scpus
n Conclusions
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 4
UPC
Motivation
n Why such a loop is performing so bad???
l Is it a matter of TLB misses?l Primary data cache misses?l External interventions???l False sharing?
!$OMP PARALLEL DOdo i = 1, N
do j = 1, NA(i,j) = 0.0
3
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 5
UPC
Motivation
n Can we obtain statistics about performance?l Resource utilizationüCachesüMemoryü FP instructions
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 6
UPC
Introduction
n Tracing is necessary to figure out about performance
n Need for low interference
n Two viewsl Internal to the applicationl External from the operating systemü Several applications executed at a time
4
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 7
UPC
Architectural support
n Performance event countersl Userl Kernell Exception
n Limitationsl Small number of registers to count onüMultiplex
l Small register size (R10K)üOverhead
l Hardware dependentü Processors from the same family have different counters
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 8
UPC
OS support
n None (...)l Currently, Linux
n Reading/writing global countersl Some support libraries
n Performance counters on thread/process contextl IRIX
5
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 9
UPC
OS support
n Process/thread informationl /procl system calls
n Memory/swap usage
n [Processor information]l [Which process is it executing?]l How many time is spent in OS/exception code
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 10
UPC
OS support
n Availability of information related to processorsl Time spent in... [sysmp (MP_SAGET, SINFO_CPU, &data)]üUser levelü System levelüWaiting (IO, SWAP, PhysIO, Graphics...) [SGI]üMemory managementü Interruptü Idle
l [Process currently executed]üNot usually available :-(
6
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 11
UPC
RTL support
n Countingl User-defined event-setsüGroups of counters got as a unit of informationüMove counters as a bulk across the OS interfaceü Specialize use of counters and relate different ones
l Multiplexl Overflow detectionüUser-programmed routines
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 12
UPC
RTL support
n Multiplexl Limited number of events counted at a timel Software can switch events countedüResolution depends on interval timers
l Overhead in the implementation
l Inaccuracyü Short high performance routines hidden
7
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 13
UPC
RTL support
n User-defined handlersl Activated on an event exceeding a specific thresholdl Usually through OS supportüOtherwise an alarm handler can help
l Implemented through signals... efficient???
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 14
UPC
RTL support
n Getting/extracting the information l From the kernelü Specific system call neededüReading /proc
l Directly from the processorü Such machine language instructions are usually privilegedü Pentium architecture allows them to be read from user-level
– Setting a bit in the processor (CR0?)
ü Always will read own counters
8
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 15
UPC
Support to the end-users
n Profiling toolsl Traces are analyzed/visualized by end-user utilitiesl Prof generates histogramsüNumber of events per subroutine/program line
l PARAVER shows the exact behaviour of the applicationl Vampir
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 16
UPC
Outline
n Motivation
n Introduction
n Support to performance analysisl Architectural, OS, RTL
n Examplesl PAPIl Rabbitl libFASTparparaver / Ompitracel Intone OMPI tracingl Scpus
n Conclusions
9
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 17
UPC
PAPI
n Goall Specify a standard API for accessing hardware performance
counters
n Definesl Standard set of hardware eventsl Library interface
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 18
UPC
PAPI
n Architecture
Performance Counter Hardware
Operating System
Kernel extension
PAPI machine dependentsubstrate
PAPI low levelMultiplex countersOverflow managementTimer interrupt
PAPI high level
10
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 19
UPC
PAPI
n How to use
PAPI low levelMultiplex countersOverflow managementTimer interrupt
PAPI high level
Performance Analysis ToolsFeedback-Directed CompilersAdaptive Run-Time LibrariesApplication measurement and timing
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 20
UPC
PAPI
n Processor / OS supportedl POWER3, AIX4.3l POWER4, AIX5l CRAY T3E, Unicosl AMD Athlon, Pentium III, Linux [patched]l Pentium III, Windowsl Itanium I & II, Linux [patched]l UltraSparc I, II & III, Solaris 2.8l MIPS R10K, R12K, IRIXl Alpha, Tru64 UNIXl Alpha, Linux [patched?]
11
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 21
UPC
PAPI
n Overview of the interfacel Set of hardware independent eventsüNot available in all hardwareüCheck list!
l Low level APIüCounter management and acquisition
l High level APIü Easier counter acquisition
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 22
UPC
Low-level PAPI
PAPI_create_eventset (eventSet)q Creates an eventSet. Not thread-safe
PAPI_destroy_event_set (eventSet)
PAPI_add_event (eventSet, event)
PAPI_add_events (eventSet, events, number)
PAPI_cleanup_eventset (eventSet)
PAPI_rem_event (eventset, event)
PAPI_rem_events (eventSet, events, number)q EventSet management
12
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 23
UPC
Low-level PAPI
PAPI_get_hardware_info (hw_info)
PAPI_get_executable_info (exe_info)
PAPI_get_opt (option, value)q Access to the environment
PAPI_query_event (eventCode)q Checks whether the hardware support such event
PAPI_set_domain (domain)q Execution domain to countqUser, kernel...
PAPI_set_granularity (granularity)q thread/process/process group/current cpu/all cpus
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 24
UPC
Low-level PAPI
PAPI_get_overflow_address (context)q Gets overflow PC
PAPI_overflow (eventSet, eventCode, threshold,flags, handler)
q Sets the overflow handler for the indicated event in eventSet
PAPI_set_multiplex (eventSet)q Allows multiplexion
PAPI_get_real_cyc (long long)
PAPI_get_real_usec (long long)q Time control
13
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 25
UPC
Low-level PAPI
PAPI_start (eventSet)q Starts counting...not if in conflict with another eventSet
PAPI_stop (eventSet, values)q Stops the eventSet and reads values
PAPI_accum (eventSet, values)q Accumulates current counters in eventSet to values
PAPI_read (eventSet, values)q Copies current counters in eventSet to values
PAPI_write (eventSet, values)q Copies the values to the current counters in eventSet
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 26
UPC
High-level PAPI
n PAPI_start_counters (events, number)
n PAPI_read_counters (values, number)
n PAPI_accum_counters (values, number)
n PAPI_stop_counters (values, number)
n PAPI_flops (...)
14
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 27
UPC
Outline
n Motivation
n Introduction
n Support to performance analysisl Architectural, OS, RTL
n Examplesl PAPIl Rabbitl libFASTparparaver / Ompitracel Intone OMPI tracingl Scpus
n Conclusions
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 28
UPC
Rabbit
n Thesisl “All programs should be designed to measure their own
performance”
n Preconditionl “Performance measurement software should be completely
portable”
n Preclusionsl “Performance measurement hardware and operating system
interfaces are completely non-portable”
15
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 29
UPC
Rabbit
n Supports Pentium/AMDl Different hardwareü Pentium has two 40-bit countersü AMD has two 48-bit counters
l Different event setsl Single register counting cycles (64 bits)
n Supports Linuxl No patchl On dual/quad -processor systemsü counters are lost on process migrations
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 30
UPC
Rabbit
n Simple interfacel pmc_open/pmc_closel pmc_startl pmc_selectl pmc_readl pmc_counter_initl pmc_accumulate
16
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 31
UPC
Outline
n Motivation
n Introduction
n Support to performance analysisl Architectural, OS, RTL
n Examplesl PAPIl Rabbitl libFASTparparaver / Ompitracel Intone OMPI tracingl Scpus
n Conclusions
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 32
UPC
libFASTparparaver
n Mergesl tracing applicationsl hardware event counters
n Visualization in Paraver
17
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 33
UPC
libFASTparparaver
n Interfacel Compiler orientedü pushstate(newstate)ü popstate()ü changestate(newstate)ü changeandevent(newstate, event, value)ü pushandevent(newstate, event, value)ü popandevent(event, value)ü userevent(event, value)ü eventcount(userevent)ü endeventcount(userevent)
l Internal (user oriented?)
n Counters sampled implicitlyl Using IRIX/AIX system interfaces
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 34
UPC
libFASTparparaver
n What can we get?l thread statusl Counter valuesl Source code line
18
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 35
UPC
Outline
n Motivation
n Introduction
n Support to performance analysisl Architectural, OS, RTL
n Examplesl PAPIl Rabbitl libFASTparparaver / Ompitracel Intone OMPI tracingl Scpus
n Conclusions
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 36
UPC
Intone OMPI
n Application/counting events related to source codel Automatically
n OMPI interfacel OpenMP constructsl Functionsl Event countersl Handles and tagsl Source code locations
19
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 37
UPC
Intone OMPI
n OpenMP constructsl ompi_enter_parallel (parHandle, regionTag, numThreads,
sclHandle)l ompi_leave_handle (sclHandle)l ompi_enter_openmp (eventHandle, tag, sclHandle)l ompi_leave_openmp (sclHandle)üDO, SECTION, BARRIER, SINGLE, MASTER, CRITICAL...
l ompi_define_parallel (regionName, parHandle)
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 38
UPC
Intone OMPI
n Functionsl ompi_enter_function (fHandle, sclHandle)l ompi_leave_function (sclHandle)
l ompi_define_function (functionName, classHandle, fHandle)
20
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 39
UPC
Intone OMPI
n Event countersl ompi_log_counter (noCounters, counterHandles, values,
sclHandle)
l ompi_define_counter (counterName, type, unit, bounds, counterHandle)
l ompi_set_counter_callback (fCallback, custom, nCounters)
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 40
UPC
Intone OMPI
n Handles and tagsl ompi_define_tag (tagString, tagHandle)
n Source code locationsl ompi_define_scl (fileName, lineNumber, sclHandle)
21
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 41
UPC
Intone OMPI
n Visualization
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 42
UPC
Intone OMPI
n Counting iterations (or “find the 8 differences” - I)
22
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 43
UPC
Intone OMPI
n Counting iterations (or “find the 8 differences” - II)
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 44
UPC
Outline
n Motivation
n Introduction
n Support to performance analysisl Architectural, OS, RTL
n Examplesl PAPIl Rabbitl libFASTparparaver / Ompitracel Intone OMPI tracingl Scpus
n Conclusions
23
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 45
UPC
Scpus
n Allows tracing of OS view l Externally to applications
n Searches /proc for any useful informationl Mapping of processes/threads to processors
n Gets other information from the systeml Processor states
n Would be useful to have the process running in each processor as a OS primitive!!!
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 46
UPC
Scpus
n Visualization
24
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 47
UPC
Scpus
n Poor coordinationn High number of
movementsn Higher execution
time in kernel moden Worse execution
times and throughput
ft.Amg.Acg.A
Kernel modeKernel idle
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 48
UPC
Scpus
n Better coordinationn Better processor to
cpu mappingn Consistent
execution timesn Improved memory
behaviourü Local accessesü Lower kernel mode
execution time
ft.Amg.Acg.A
Kernel modeKernel idle
25
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 49
UPC
Outline
n Motivation
n Introduction
n Support to performance analysisl Architectural, OS, RTL
n Examplesl PAPIl Rabbitl libFASTparparaver / Ompitracel Intone OMPI tracingl Scpus
n Conclusions
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 50
UPC
Conclusions
n Complexity grows in all areasl Processorsl Computersl Operating Systeml Run-time librariesl Applications
n Have we solved that problem???l Or at least, do we have now some clues on how to deal with it?
26
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 51
UPC
Conclusions
n Multiple tools support performance analysisl Librariesü PAPIüRabbitü libFASTparparaver
l End-user toolsü Visualization
n Internal and external viewsl Application orientedl Workload oriented
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 52
UPC
Support libraries
n List...l Papi http://icl.cs.utk.edu/projects/papiü Innovative Computing Lab (Computer Science, University of
Tennessee)– Jack Dongarra,
l Rabbit http://www.scl.ameslab.gov/Projects/Rabbitü Scalable Computing Lab (Ames Lab, US DOE, Iowa State
University)– Don Heller
27
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 53
UPC
Support libraries
n List...l Vtune
http://developer.intel.com/software/products/vtune/index.htmü Intel
l DCPI http://www.tru64unix.compaq.com/dcpiüCompaq (Digital)
Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 54
UPC
Other
n Parallel Tools (www.ptools.org)
n Solaris hardware statistics tool