27
Entornos Operativos para la Gestión de Recursos... Dept. d’Arquitectura de Computadors UPC Performance analysis support on the Operating Environment Marisa Gil, Xavier Martorell {marisa, xavim}@ac.upc.es Curs de doctorat 2002/2003 Departament d’Arquitectura de Computadors UPC Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 2 UPC Motivation n Complexity grows in all the areas l Processors l Computers l Operating System l Run-time libraries l Applications n We have a problem...

Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

  • Upload
    others

  • View
    35

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

1

Entornos Operativos para la Gestión de Recursos... Dept. d’Arquitectura de Computadors

UPC

Performance analysis support on the Operating Environment

Marisa Gil, Xavier Martorell {marisa, xavim}@ac.upc.es

Curs de doctorat 2002/2003

Departament d’Arquitectura de Computadors

UPC

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 2

UPC

Motivation

n Complexity grows in all the areasl Processorsl Computersl Operating Systeml Run-time librariesl Applications

n We have a problem...

Page 2: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

2

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 3

UPC

Outline

n Motivation

n Introduction

n Support to performance analysisl Architectural, OS, RTL

n Examplesl PAPIl Rabbitl libFASTparparaver / Ompitracel Intone OMPI tracingl Scpus

n Conclusions

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 4

UPC

Motivation

n Why such a loop is performing so bad???

l Is it a matter of TLB misses?l Primary data cache misses?l External interventions???l False sharing?

!$OMP PARALLEL DOdo i = 1, N

do j = 1, NA(i,j) = 0.0

Page 3: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

3

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 5

UPC

Motivation

n Can we obtain statistics about performance?l Resource utilizationüCachesüMemoryü FP instructions

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 6

UPC

Introduction

n Tracing is necessary to figure out about performance

n Need for low interference

n Two viewsl Internal to the applicationl External from the operating systemü Several applications executed at a time

Page 4: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

4

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 7

UPC

Architectural support

n Performance event countersl Userl Kernell Exception

n Limitationsl Small number of registers to count onüMultiplex

l Small register size (R10K)üOverhead

l Hardware dependentü Processors from the same family have different counters

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 8

UPC

OS support

n None (...)l Currently, Linux

n Reading/writing global countersl Some support libraries

n Performance counters on thread/process contextl IRIX

Page 5: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

5

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 9

UPC

OS support

n Process/thread informationl /procl system calls

n Memory/swap usage

n [Processor information]l [Which process is it executing?]l How many time is spent in OS/exception code

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 10

UPC

OS support

n Availability of information related to processorsl Time spent in... [sysmp (MP_SAGET, SINFO_CPU, &data)]üUser levelü System levelüWaiting (IO, SWAP, PhysIO, Graphics...) [SGI]üMemory managementü Interruptü Idle

l [Process currently executed]üNot usually available :-(

Page 6: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

6

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 11

UPC

RTL support

n Countingl User-defined event-setsüGroups of counters got as a unit of informationüMove counters as a bulk across the OS interfaceü Specialize use of counters and relate different ones

l Multiplexl Overflow detectionüUser-programmed routines

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 12

UPC

RTL support

n Multiplexl Limited number of events counted at a timel Software can switch events countedüResolution depends on interval timers

l Overhead in the implementation

l Inaccuracyü Short high performance routines hidden

Page 7: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

7

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 13

UPC

RTL support

n User-defined handlersl Activated on an event exceeding a specific thresholdl Usually through OS supportüOtherwise an alarm handler can help

l Implemented through signals... efficient???

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 14

UPC

RTL support

n Getting/extracting the information l From the kernelü Specific system call neededüReading /proc

l Directly from the processorü Such machine language instructions are usually privilegedü Pentium architecture allows them to be read from user-level

– Setting a bit in the processor (CR0?)

ü Always will read own counters

Page 8: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

8

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 15

UPC

Support to the end-users

n Profiling toolsl Traces are analyzed/visualized by end-user utilitiesl Prof generates histogramsüNumber of events per subroutine/program line

l PARAVER shows the exact behaviour of the applicationl Vampir

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 16

UPC

Outline

n Motivation

n Introduction

n Support to performance analysisl Architectural, OS, RTL

n Examplesl PAPIl Rabbitl libFASTparparaver / Ompitracel Intone OMPI tracingl Scpus

n Conclusions

Page 9: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

9

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 17

UPC

PAPI

n Goall Specify a standard API for accessing hardware performance

counters

n Definesl Standard set of hardware eventsl Library interface

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 18

UPC

PAPI

n Architecture

Performance Counter Hardware

Operating System

Kernel extension

PAPI machine dependentsubstrate

PAPI low levelMultiplex countersOverflow managementTimer interrupt

PAPI high level

Page 10: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

10

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 19

UPC

PAPI

n How to use

PAPI low levelMultiplex countersOverflow managementTimer interrupt

PAPI high level

Performance Analysis ToolsFeedback-Directed CompilersAdaptive Run-Time LibrariesApplication measurement and timing

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 20

UPC

PAPI

n Processor / OS supportedl POWER3, AIX4.3l POWER4, AIX5l CRAY T3E, Unicosl AMD Athlon, Pentium III, Linux [patched]l Pentium III, Windowsl Itanium I & II, Linux [patched]l UltraSparc I, II & III, Solaris 2.8l MIPS R10K, R12K, IRIXl Alpha, Tru64 UNIXl Alpha, Linux [patched?]

Page 11: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

11

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 21

UPC

PAPI

n Overview of the interfacel Set of hardware independent eventsüNot available in all hardwareüCheck list!

l Low level APIüCounter management and acquisition

l High level APIü Easier counter acquisition

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 22

UPC

Low-level PAPI

PAPI_create_eventset (eventSet)q Creates an eventSet. Not thread-safe

PAPI_destroy_event_set (eventSet)

PAPI_add_event (eventSet, event)

PAPI_add_events (eventSet, events, number)

PAPI_cleanup_eventset (eventSet)

PAPI_rem_event (eventset, event)

PAPI_rem_events (eventSet, events, number)q EventSet management

Page 12: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

12

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 23

UPC

Low-level PAPI

PAPI_get_hardware_info (hw_info)

PAPI_get_executable_info (exe_info)

PAPI_get_opt (option, value)q Access to the environment

PAPI_query_event (eventCode)q Checks whether the hardware support such event

PAPI_set_domain (domain)q Execution domain to countqUser, kernel...

PAPI_set_granularity (granularity)q thread/process/process group/current cpu/all cpus

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 24

UPC

Low-level PAPI

PAPI_get_overflow_address (context)q Gets overflow PC

PAPI_overflow (eventSet, eventCode, threshold,flags, handler)

q Sets the overflow handler for the indicated event in eventSet

PAPI_set_multiplex (eventSet)q Allows multiplexion

PAPI_get_real_cyc (long long)

PAPI_get_real_usec (long long)q Time control

Page 13: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

13

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 25

UPC

Low-level PAPI

PAPI_start (eventSet)q Starts counting...not if in conflict with another eventSet

PAPI_stop (eventSet, values)q Stops the eventSet and reads values

PAPI_accum (eventSet, values)q Accumulates current counters in eventSet to values

PAPI_read (eventSet, values)q Copies current counters in eventSet to values

PAPI_write (eventSet, values)q Copies the values to the current counters in eventSet

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 26

UPC

High-level PAPI

n PAPI_start_counters (events, number)

n PAPI_read_counters (values, number)

n PAPI_accum_counters (values, number)

n PAPI_stop_counters (values, number)

n PAPI_flops (...)

Page 14: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

14

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 27

UPC

Outline

n Motivation

n Introduction

n Support to performance analysisl Architectural, OS, RTL

n Examplesl PAPIl Rabbitl libFASTparparaver / Ompitracel Intone OMPI tracingl Scpus

n Conclusions

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 28

UPC

Rabbit

n Thesisl “All programs should be designed to measure their own

performance”

n Preconditionl “Performance measurement software should be completely

portable”

n Preclusionsl “Performance measurement hardware and operating system

interfaces are completely non-portable”

Page 15: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

15

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 29

UPC

Rabbit

n Supports Pentium/AMDl Different hardwareü Pentium has two 40-bit countersü AMD has two 48-bit counters

l Different event setsl Single register counting cycles (64 bits)

n Supports Linuxl No patchl On dual/quad -processor systemsü counters are lost on process migrations

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 30

UPC

Rabbit

n Simple interfacel pmc_open/pmc_closel pmc_startl pmc_selectl pmc_readl pmc_counter_initl pmc_accumulate

Page 16: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

16

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 31

UPC

Outline

n Motivation

n Introduction

n Support to performance analysisl Architectural, OS, RTL

n Examplesl PAPIl Rabbitl libFASTparparaver / Ompitracel Intone OMPI tracingl Scpus

n Conclusions

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 32

UPC

libFASTparparaver

n Mergesl tracing applicationsl hardware event counters

n Visualization in Paraver

Page 17: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

17

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 33

UPC

libFASTparparaver

n Interfacel Compiler orientedü pushstate(newstate)ü popstate()ü changestate(newstate)ü changeandevent(newstate, event, value)ü pushandevent(newstate, event, value)ü popandevent(event, value)ü userevent(event, value)ü eventcount(userevent)ü endeventcount(userevent)

l Internal (user oriented?)

n Counters sampled implicitlyl Using IRIX/AIX system interfaces

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 34

UPC

libFASTparparaver

n What can we get?l thread statusl Counter valuesl Source code line

Page 18: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

18

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 35

UPC

Outline

n Motivation

n Introduction

n Support to performance analysisl Architectural, OS, RTL

n Examplesl PAPIl Rabbitl libFASTparparaver / Ompitracel Intone OMPI tracingl Scpus

n Conclusions

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 36

UPC

Intone OMPI

n Application/counting events related to source codel Automatically

n OMPI interfacel OpenMP constructsl Functionsl Event countersl Handles and tagsl Source code locations

Page 19: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

19

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 37

UPC

Intone OMPI

n OpenMP constructsl ompi_enter_parallel (parHandle, regionTag, numThreads,

sclHandle)l ompi_leave_handle (sclHandle)l ompi_enter_openmp (eventHandle, tag, sclHandle)l ompi_leave_openmp (sclHandle)üDO, SECTION, BARRIER, SINGLE, MASTER, CRITICAL...

l ompi_define_parallel (regionName, parHandle)

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 38

UPC

Intone OMPI

n Functionsl ompi_enter_function (fHandle, sclHandle)l ompi_leave_function (sclHandle)

l ompi_define_function (functionName, classHandle, fHandle)

Page 20: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

20

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 39

UPC

Intone OMPI

n Event countersl ompi_log_counter (noCounters, counterHandles, values,

sclHandle)

l ompi_define_counter (counterName, type, unit, bounds, counterHandle)

l ompi_set_counter_callback (fCallback, custom, nCounters)

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 40

UPC

Intone OMPI

n Handles and tagsl ompi_define_tag (tagString, tagHandle)

n Source code locationsl ompi_define_scl (fileName, lineNumber, sclHandle)

Page 21: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

21

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 41

UPC

Intone OMPI

n Visualization

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 42

UPC

Intone OMPI

n Counting iterations (or “find the 8 differences” - I)

Page 22: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

22

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 43

UPC

Intone OMPI

n Counting iterations (or “find the 8 differences” - II)

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 44

UPC

Outline

n Motivation

n Introduction

n Support to performance analysisl Architectural, OS, RTL

n Examplesl PAPIl Rabbitl libFASTparparaver / Ompitracel Intone OMPI tracingl Scpus

n Conclusions

Page 23: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

23

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 45

UPC

Scpus

n Allows tracing of OS view l Externally to applications

n Searches /proc for any useful informationl Mapping of processes/threads to processors

n Gets other information from the systeml Processor states

n Would be useful to have the process running in each processor as a OS primitive!!!

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 46

UPC

Scpus

n Visualization

Page 24: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

24

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 47

UPC

Scpus

n Poor coordinationn High number of

movementsn Higher execution

time in kernel moden Worse execution

times and throughput

ft.Amg.Acg.A

Kernel modeKernel idle

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 48

UPC

Scpus

n Better coordinationn Better processor to

cpu mappingn Consistent

execution timesn Improved memory

behaviourü Local accessesü Lower kernel mode

execution time

ft.Amg.Acg.A

Kernel modeKernel idle

Page 25: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

25

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 49

UPC

Outline

n Motivation

n Introduction

n Support to performance analysisl Architectural, OS, RTL

n Examplesl PAPIl Rabbitl libFASTparparaver / Ompitracel Intone OMPI tracingl Scpus

n Conclusions

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 50

UPC

Conclusions

n Complexity grows in all areasl Processorsl Computersl Operating Systeml Run-time librariesl Applications

n Have we solved that problem???l Or at least, do we have now some clues on how to deal with it?

Page 26: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

26

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 51

UPC

Conclusions

n Multiple tools support performance analysisl Librariesü PAPIüRabbitü libFASTparparaver

l End-user toolsü Visualization

n Internal and external viewsl Application orientedl Workload oriented

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 52

UPC

Support libraries

n List...l Papi http://icl.cs.utk.edu/projects/papiü Innovative Computing Lab (Computer Science, University of

Tennessee)– Jack Dongarra,

l Rabbit http://www.scl.ameslab.gov/Projects/Rabbitü Scalable Computing Lab (Ames Lab, US DOE, Iowa State

University)– Don Heller

Page 27: Performance analysis support on the Operating …people.ac.upc.edu/marisa/doctorado/PerfAnalysis.pdfPerformance analysis support on the Operating Environment Marisa Gil, Xavier Martorell

27

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 53

UPC

Support libraries

n List...l Vtune

http://developer.intel.com/software/products/vtune/index.htmü Intel

l DCPI http://www.tru64unix.compaq.com/dcpiüCompaq (Digital)

Dept. d’Arquitectura de Computadors Entornos Operativos para la Gestión de Recursos... 54

UPC

Other

n Parallel Tools (www.ptools.org)

n Solaris hardware statistics tool