Techniques for Debugging HPC Applications

Nikolay Piskun, TotalView Software Architect
August 11, 2021, ATPESC 2021

TotalView by Perforce © 2019 Perforce Software, Inc.


Agenda

  • Introduction
  • Overview of TotalView Features
  • TotalView Debugging Solution
  • General Debugging Features for C, C++, and Fortran
  • UI Navigation and Process Control
  • Action Points
  • Examining and Editing Data
  • Advanced C++ and Data Debugging
  • Mixed Language C/C++/Fortran and Python Debugging
  • Remote Debugging
  • MPI, OpenMP, CUDA, GPU, and Hybrid Debugging
  • Reverse Connect/Attach
  • Memory Debugging
  • Reverse Debugging
  • HPC Debugging Techniques
  • TotalView Resources and Documentation
  • Q&A

What is Debugging and Why do you need TotalView?

TotalView Features

  • Comprehensive C, C++, and Fortran debugger
  • Multi-process/multi-thread dynamic analysis
  • Thread-specific breakpoints with individual thread control
  • View thread-specific stack and data
  • View complex data types easily
  • MPI, OpenMP, Hybrid, and CUDA debugging
  • Convenient remote debugging for HPC
  • Integrated Reverse and Memory debugging
  • Mixed Language: Python and C/C++ debugging
  • Script debugging
  • Linux, macOS, and UNIX

What is TotalView used for?

  • More than just a tool to find bugs
    • Understand complex code
    • Improve code quality
    • Collaborate with team members to resolve issues faster
    • Shorten development time
  • Finds problems and bugs in applications, including
    • Program crash or incorrect behavior
    • Data issues
    • Application memory leaks and errors
    • Communication problems between processes and threads
    • CUDA application analysis and debugging
    • Applications in automated test and batch environments

UI Navigation and Process Control

TotalView's Default Views

  1. Processes & Threads Control View
    • Lookup File or Function
    • Documents
  2. Source View
  3. Call Stack View
  4. Local Variables View
  5. Data View, Command Line, Input/Output
  6. Action Points, Replay Bookmarks

Process and Threads View

Source View

Call Stack View and Local Variables View

Action Points View

Data View, Command Line View, and Input/Output View

Action Points

Action Points

  • Breakpoint
  • Evaluation Point (Evalpoint)
  • Watchpoint
  • Barrier point

Setting Breakpoints

  • Setting action points
    • Single-click the line number
    • Right-click the line number and use the context menu
    • Click a line in the Source View, then select the Action Points -> Set Breakpoint menu option
  • Breakpoint -> At Location…
    • Specify a function name or line number
    • If given a function name, TotalView sets a breakpoint at the first executable line in the function

Evaluation Points

  • Use Eval points to
    • Include instructions that stop a process and its relatives
    • Test potential fixes or patches for your program
    • Include a goto for C or Fortran that transfers control to a line number in your program
    • Execute a TotalView function
    • Set the values of your program's variables

Evaluation Points: Examples

  • Print the value of a variable to the command line

        printf("The value of result is %d\n", result);

  • Skip some code

        goto 63;

  • Stop a loop after a certain number of iterations (here, every 100th iteration)

        if ((i % 100) == 0) {
            printf("The value of i is %d\n", i);
            $stop;
        }

See "Using Built-in Statements" in Appendix A of the User Guide for more information on "$" expressions:
https://help.totalview.io/current/HTML/index.html#page/TotalView/BuiltInStatments.html#ww1894979

Watchpoints

  • Watchpoints are set on a specific memory location
  • Execution is stopped when the value stored in that memory location changes
  • A breakpoint stops before an instruction executes; a watchpoint stops after an instruction executes
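The "stops after the write" behavior can be illustrated with a rough Python analogy. This is not how TotalView implements watchpoints (a debugger watches a memory address in the target process); the class `WatchedCell` and its `on_change` callback are hypothetical names introduced only for this sketch.

```python
class WatchedCell:
    """Holds one value and reports writes that change it, after they happen."""

    def __init__(self, value, on_change):
        self._value = value
        self._on_change = on_change  # called as on_change(old, new)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        old = self._value
        self._value = new        # the write completes first ...
        if new != old:           # ... then the "watchpoint" triggers
            self._on_change(old, new)


events = []
cell = WatchedCell(0, lambda old, new: events.append((old, new)))
cell.value = 0   # same value stored: no trigger
cell.value = 5   # value changed: trigger after the write
cell.value = 5   # unchanged again: no trigger
cell.value = 7
print(events)    # [(0, 5), (5, 7)]
```

By the time the callback runs, the cell already holds the new value, which mirrors the difference from a breakpoint that stops before the instruction executes.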

Barrier Breakpoints

  • Used to synchronize a group of threads or processes defined in the action point
  • Threads or processes are held at the barrier point until all threads or processes in the group arrive
  • When all threads or processes arrive, the barrier is satisfied and the threads or processes are released
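The hold-until-all-arrive semantics are the same as those of an ordinary thread barrier. The sketch below uses Python's standard `threading.Barrier` as a stand-in for a debugger barrier point; the worker function and the `arrived`/`released` lists are illustrative names, not part of TotalView.

```python
import threading
import time

NUM_THREADS = 4
barrier = threading.Barrier(NUM_THREADS)
arrived = []    # ranks in the order they reach the barrier
released = []   # (rank, number of arrivals seen at release time)
lock = threading.Lock()


def worker(rank):
    time.sleep(0.01 * rank)   # threads reach the barrier at different times
    with lock:
        arrived.append(rank)
    barrier.wait()            # held here until all NUM_THREADS have arrived
    with lock:
        released.append((rank, len(arrived)))


threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(arrived))
print(all(count == NUM_THREADS for _, count in released))
```

No thread is released until every member of the group has arrived, so each released thread observes the full arrival count.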

Saving Breakpoints

  • From the Action Points menu, select Save or Save As to save breakpoints
  • Turn on the option to save action points on exit

Examining and Editing Data

Call Stack and Local Variables

Call Stack View
  • Lists the set of call frames as the program calls from one function or method to another
  • Filter button used to turn filtering of frames on or off

Local Variables View
  • Displays local variables relative to the current thread of interest and the selected stack frame
  • Organized by arguments and blocks
  • To edit values, add the variable to the Data View

The Data View Panel

  • Data View allows deeper exploration of data structures
  • Edit data values
  • Cast to new data types
  • Add data to the Data View using the context menu or by dragging and dropping

(Figure callouts: Context menu; Drag and drop)

The Data View Panel – Expanding Arrays and Structures

  • Select the right arrow to display the substructures in a complex variable
  • Any nested structures are displayed in the Data View

The Data View – Dive in All

  • Use Dive in All to easily see each member of a data structure from an array of structures

The Data View Panel – Entering Expressions

  • Enter a new expression in the Data View panel to view that data
  • Type the expression in the [Add New expression] field
  • A new expression is added
  • Example: increment a variable

The Data View Panel – Casting

  • Casting to another type
  • Cast a variable into an array by adding the array specifier
  • TotalView displays the array
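As a loose analogy for casting a pointer into an array, the following sketch uses Python's standard `ctypes` module to reinterpret the same address first as a pointer to one `int` and then as an array of five. The buffer contents and variable names are made up for illustration; in TotalView the cast is done by editing the type in the Data View, not by calling an API like this.

```python
import ctypes

# Five C ints in memory; a debugger often starts with only a pointer to the first one.
storage = (ctypes.c_int * 5)(10, 20, 30, 40, 50)
p = ctypes.cast(storage, ctypes.POINTER(ctypes.c_int))

print(p[0])   # viewed as int*, only the first element is obvious

# Reinterpret the same address as "array of 5 ints" (the analogue of adding [5] to the type).
as_array = ctypes.cast(p, ctypes.POINTER(ctypes.c_int * 5)).contents
values = list(as_array)
print(values)
```

The data in memory never changes; only the type used to interpret it does, which is why the debugger can then display all elements.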

Extending Debugging Capabilities: How to Debug (AI) Mixed Python/C++ Code

Mixed Language Python Debugging

  • Debugging one language is difficult enough
  • Understanding the flow of execution across language barriers is hard
  • Examining and comparing data in both languages is challenging
  • What TotalView provides
    • Easy Python debugging session setup
    • Fully integrated Python and C/C++ call stack
    • "Glue" layers between the languages removed
    • Easily examine and compare variables in Python and C++
    • Modest system requirements
    • Utilize reverse debugging and memory debugging
  • What TotalView does not provide (yet)
    • Setting breakpoints and stepping within Python code

Python debugging with TotalView (demo)

    #!/usr/bin/python

    def callFact():
        import tv_python_example as tp
        a = 3
        b = 10
        c = a + b
        ch = "local string"
        # ……
        return tp.fact(a)

    if __name__ == '__main__':
        b = 2
        result = callFact()
        print(result)

Launch the script under TotalView:

    totalview -args python test_python_types.py

Remote Debugging - TotalView Remote UI

TotalView Remote UI

  • Combine the convenience of establishing a remote connection to a cluster and the ability to run the TotalView GUI locally
  • Front-end GUI architecture does not need to match back-end target architecture (macOS front-end -> Linux back-end)
  • Secure communications
  • Convenient saved sessions
  • Once connected, debug as normal with access to all TotalView features
  • Front-end GUI currently supports macOS and Linux x86/x86-64; a Windows client is coming


Remote UI Architecture

TotalView Reverse Connections

Reverse Connection Flow

(Diagram, shown as a six-frame animation. Front-end node: TotalView UI. Batch node: srun, tvconnect, tvdsvr. Compute nodes: application ranks. Request directory: $HOME/.totalview/connect)

  1. tvconnect writes request
  2. TotalView UI reads request
  3. TotalView returns response
  4. tvconnect reads response
  5. exec
  6. Socket connection opened (tvdsvr)
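The numbered flow above is a request/response handshake through a shared directory, followed by an exec and a socket connection. The sketch below simulates only the handshake, under loudly stated assumptions: the file names `request.json` and `response.json`, the JSON payloads, and the host and port values are all hypothetical, and a temporary directory stands in for the connect directory. It does not reproduce the actual tvconnect file format.

```python
import json
import tempfile
import threading
import time
from pathlib import Path


def write_atomic(path, payload):
    """Write via rename so a reader never sees a half-written file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(payload))
    tmp.replace(path)


def wait_for(path, timeout=5.0):
    """Poll until the file appears, then return its parsed content."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if path.exists():
            return json.loads(path.read_text())
        time.sleep(0.01)
    raise TimeoutError(f"no file at {path}")


def ui_side(connect_dir, log):
    request = wait_for(connect_dir / "request.json")
    log.append("2 UI reads request")
    log.append("3 UI returns response")
    # Hypothetical response: where the debugger server should connect back to.
    write_atomic(connect_dir / "response.json",
                 {"host": "frontend.example", "port": 4142, "job": request["job"]})


def tvconnect_side(connect_dir, command, log):
    log.append("1 tvconnect writes request")
    write_atomic(connect_dir / "request.json", {"job": command})
    response = wait_for(connect_dir / "response.json")
    log.append("4 tvconnect reads response")
    # Steps 5 and 6 are only recorded here, not performed.
    log.append(f"5-6 would exec the server and connect to "
               f"{response['host']}:{response['port']}")
    return response


with tempfile.TemporaryDirectory() as tmp_dir:
    connect_dir = Path(tmp_dir)
    log = []
    ui = threading.Thread(target=ui_side, args=(connect_dir, log))
    ui.start()
    response = tvconnect_side(connect_dir, ["srun", "-n", "2", "hybrid_fib"], log)
    ui.join()

for line in log:
    print(line)
```

A file-based rendezvous like this needs no inbound network access to the batch node, which is one plausible reason for the reverse direction of the final connection.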

Batch Script Submission with Reverse Connect

  • Start a debugging session using TotalView Reverse Connect
  • Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run
  • Enables running the TotalView UI on the front-end node and remotely debugging jobs executing on the compute nodes
  • Very easy to use: simply prefix the job launch or application start with the "tvconnect" command

    #!/bin/bash
    #SBATCH -J hybrid_fib
    # …
    #SBATCH -n 2
    #SBATCH -c 4
    #SBATCH --mem-per-cpu=4000
    export OMP_NUM_THREADS=4

    tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks, Heap Status, and Identifying Dangling Pointers

What is a Memory Bug?

  • A memory bug is a mistake in the management of heap memory
    • Leaking: failure to free memory
    • Dangling references: failure to clear pointers
    • Failure to check for error conditions
    • Memory corruption
      • Writing to memory not allocated
      • Overrunning array bounds

TotalView Heap Interposition Agent (HIA) Technology

  • Advantages of TotalView HIA Technology
    • Use it with your existing builds
      • No source code or binary instrumentation
    • Programs run at nearly full speed
      • Low performance overhead
    • Low memory overhead
      • Efficient memory usage

(Diagram labels: Process; User Code and Libraries; Malloc API; Heap Interposition Agent (HIA); Allocation Table; Deallocation Table; TotalView)
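The idea of an agent sitting between user code and the malloc API, keeping an allocation table and a deallocation table, can be sketched as a toy model. Everything below is hypothetical (the `HeapAgent` class, its fake addresses, the tags); the real HIA interposes on the C allocator inside the process and is far more involved.

```python
class HeapAgent:
    """Toy interposition layer: every allocate/free passes through it and is recorded."""

    def __init__(self):
        self._next_addr = 0x1000
        self.allocations = {}     # live blocks: address -> (size, tag)
        self.deallocations = {}   # freed blocks: address -> (size, tag)

    def malloc(self, size, tag):
        addr = self._next_addr
        self._next_addr += max(size, 1)   # fake addresses are never reused
        self.allocations[addr] = (size, tag)
        return addr

    def free(self, addr):
        if addr in self.deallocations:
            raise RuntimeError(f"double free of {addr:#x}")
        if addr not in self.allocations:
            raise RuntimeError(f"free of unallocated address {addr:#x}")
        self.deallocations[addr] = self.allocations.pop(addr)

    def is_dangling(self, addr):
        """A pointer dangles if it refers to a block that was already freed."""
        return addr in self.deallocations

    def leaks(self):
        """Tags of blocks that are still allocated, e.g. at program exit."""
        return sorted(tag for _, tag in self.allocations.values())


heap = HeapAgent()
a = heap.malloc(64, "matrix")
b = heap.malloc(16, "name")
c = heap.malloc(32, "scratch")
heap.free(b)                  # freed, but the program still holds the pointer b

print(heap.is_dangling(b))    # True
print(heap.leaks())           # ['matrix', 'scratch']
```

Because the bookkeeping happens at the allocator boundary, the user code itself needs no changes, which matches the "existing builds, no instrumentation" point on the slide.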

Memory Debugging Features – MemoryScape TotalView

TotalView New UI Features
  • Leak detection
  • Heap Status
  • Dangling pointer detection

Coming Features
  • Memory Error Events
  • Memory Corruption Detection
  • Memory Block Painting
  • Memory Hoarding
  • Memory Comparisons between processes

TotalView Reverse Debugging

Reverse Debugging with TotalView

  • Reverse debugging provides the ability for developers to go back in execution history
  • Activated either before the program starts running or at some point after execution begins
  • Captures and deterministically replays execution
  • Enables stepping backwards and forward by function, line, or instruction
  • Run backwards to breakpoints
  • Run backwards and stop when a variable changes value
  • Save recording files for later analysis or collaboration
    • For a remote connection, use the CLI: dhistory -save <name>
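A minimal way to picture "go back in execution history" and "run backwards until a variable changes" is a recorder that snapshots state after every step. This is an illustration only: the `Recorder` class is hypothetical, and a real reverse debugger records execution at a much lower level than whole-state snapshots.

```python
import copy


class Recorder:
    """Keeps a snapshot of program state after every step so history can be revisited."""

    def __init__(self, state):
        self.history = [copy.deepcopy(state)]
        self.position = 0                        # index into history

    def record(self, state):
        self.history.append(copy.deepcopy(state))
        self.position = len(self.history) - 1    # back at the "live" end

    def step_back(self):
        if self.position > 0:
            self.position -= 1
        return self.history[self.position]

    def run_back_until_changed(self, name):
        """Move backwards and stop at the step where `name` got its current value."""
        current = self.history[self.position][name]
        while self.position > 0 and self.history[self.position - 1][name] == current:
            self.position -= 1
        return self.position


state = {"i": 0, "total": 0}
rec = Recorder(state)
for i in range(1, 6):
    state["i"] = i
    if i == 3:
        state["total"] = 99      # the "bug": total is overwritten at step 3
    rec.record(state)

step = rec.run_back_until_changed("total")
print(step, rec.history[step])   # 3 {'i': 3, 'total': 99}
print(rec.step_back())           # {'i': 2, 'total': 0}
```

Starting from the end of the run, the search lands directly on the step that corrupted `total`, without re-running the program with new breakpoints.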

Reverse Debugging Controls

  • Run forward / Run backwards
  • Next forward over functions / Next backwards over functions
  • Step forward into functions / Step backwards into functions
  • Advance forward out of function call / Advance backwards to calling function
  • Advance forward to selected line / Advance backward to selected line
  • Advance to "live" session
  • Create a bookmark at this point in recorded history
  • Save the recorded session

Debugging CUDA Applications with TotalView

TotalView for the NVIDIA® GPU Accelerator

  • NVIDIA Tesla, Fermi, Kepler, Pascal, Volta, Turing, Ampere
  • NVIDIA CUDA 9.2, 10, and 11
    • With support for Unified Memory
  • Debugging 64-bit CUDA programs
  • Features and capabilities include
    • Support for dynamic parallelism
    • Support for MPI-based clusters and multi-card configurations
    • Flexible display and navigation on the CUDA device
      • Physical (device, SM, warp, lane)
      • Logical (grid, block) tuples
    • CUDA device window reveals what is running where
    • Support for types and separate memory address spaces
    • Leverages CUDA memcheck

Source View Opened on CUDA Host Code

Breakpoint Set in CUDA Kernel Code Before Launch

  • A hollow breakpoint indicates a breakpoint will be set when the code is loaded onto the GPU

GPU Physical and Logical Toolbars

  • Logical toolbar displays the Block and Thread coordinates
  • Physical toolbar displays the Device number, Streaming Multiprocessor, Warp, and Lane
  • To view a CUDA host thread, select a thread with a positive thread ID in the Process and Threads view
  • To view a CUDA GPU thread, select a thread with a negative thread ID, then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread
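The relationship between logical thread coordinates and warp/lane can be worked through with a little arithmetic. The sketch below assumes the usual CUDA conventions (warp size 32, thread index x varying fastest within a block) and computes the warp index within a block and the lane within that warp. The device, SM, and hardware warp slot shown on the physical toolbar are assigned at run time and cannot be derived this way; function names are illustrative.

```python
WARP_SIZE = 32  # warp size on current NVIDIA GPUs


def linear_thread_id(thread_idx, block_dim):
    """Flatten an (x, y, z) thread index within its block; x varies fastest."""
    x, y, z = thread_idx
    dim_x, dim_y, _ = block_dim
    return x + y * dim_x + z * dim_x * dim_y


def warp_and_lane(thread_idx, block_dim):
    """Return (warp index within the block, lane within the warp)."""
    tid = linear_thread_id(thread_idx, block_dim)
    return tid // WARP_SIZE, tid % WARP_SIZE


block_dim = (16, 16, 1)                        # 256 threads per block
print(warp_and_lane((0, 0, 0), block_dim))     # (0, 0)
print(warp_and_lane((5, 2, 0), block_dim))     # linear id 37 -> (1, 5)
print(warp_and_lane((15, 15, 0), block_dim))   # linear id 255 -> (7, 31)
```

This is why two threads that look far apart in logical (x, y) coordinates can still share a warp and therefore execute in lockstep.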

Displaying CUDA Program Elements

  • The identifier @local is a TotalView built-in type storage qualifier that tells the debugger that the storage kind of A is local storage
  • The debugger uses the storage qualifier to determine how to locate A in device memory

(Figure callouts: the @local type qualifier indicates that variable A is in local storage; "elements" is a pointer to a float in generic storage)

Using TotalView for Parallel Debugging on ANL

TotalView Remote Debugging on Linux and macOS

  • Download and install TotalView on your Linux or Mac machine
  • Connect to the remote front-end node
  • Run the labs remotely

Hands-on Labs

  • Install TotalView from the installers on Mac or Linux
    • Ignore the license code
  • Start TotalView
  • Remotely connect to Cooley and enable Reverse Connection

Labs

  • Lab 1: Debugger Basics
  • Lab 2: Viewing, Examining, Watching, and Editing Data
  • Lab 3: Examining and Controlling a Parallel Application (on Cooley)
    • Using remote connect (tvconnect)
    • qsub -q training tvconnect.job
    • Modify and submit tvconnect.job on your machine

TotalView is available on Theta and Cooley

  • Connect to Cooley/Theta
  • Get an allocation first
    • qsub -A ATPESC2021 -n 4 -q debug-flat-quad -I (Theta)
    • qsub -A ATPESC2021 -n 4 -q training -I (Cooley)
  • module load totalview (Theta)
  • soft add +totalview (Cooley)
  • totalview -args mpiexec -np <N> demoMpi_v2
  • tvconnect mpiexec -np <N> demoMpi_v2

  • Installed at /soft/debuggers/totalview-2021-08-04/toolworks/totalview.2021X.3.756/bin/totalview

TotalView Resources and Documentation

  • TotalView website
    • https://totalview.io
  • TotalView documentation
    • https://help.totalview.io
  • TotalView Video Tutorials
    • https://totalview.io/support/video-tutorials
  • Other Resources
    • Blog: https://totalview.io/blog

Summary

  • Using a modern debugger saves you time
  • TotalView can help you because
    • It's cross-platform (the only debugger you ever need)
    • It allows you to debug accelerators (GPU) and CPU in one session
    • It allows you to debug multiple languages (C++/Python/Fortran)

TotalView by Perforcecopy 2019 Perforce Software IncTotalView by Perforce copy Perforce Software Inc

Page 2: Techniques for Debugging HPC Applications

totalviewio2 | TotalView by Perforce copy Perforce Software Inc

bull Introduction

bull Overview of TotalView Features

bull TotalView Debugging Solution

bull General Debugging Features for C C++ and Fortran

bull UI Navigation and Process Control

bull Action Points

bull Examining and Editing Data

bull Advanced C++ and Data Debugging

bull Mixed Language CC++Fortran and Python Debugging

bull Remote Debugging

bull MPI OpenMP CUDA GPU and Hybrid Debugging

bull Reverse ConnectAttach

bull Memory Debugging

bull Reverse Debugging

bull HPC Debugging Techniques

bull TotalView Resources and Documentation

bull QampA

Agenda

What is Debugging andWhy do you need TotalView

totalviewio4 | TotalView by Perforce copy Perforce Software Inc

TotalView Features

bull Comprehensive C C++ and Fortran

debugger

bull Multi-processmulti-thread dynamic

analysis

bull Thread specific breakpoints with

individual thread control

bull View thread specific stack and data

bull View complex data types easily

bull MPI OpenMP Hybrid and CUDA

debugging

bull Convenient remote debugging for HPC

bull Integrated Reverse and Memory

debugging

bull Mixed Language - Python CC++ debugging

bull Script debugging

bull Linux macOS and UNIX

totalviewio5 | TotalView by Perforce copy Perforce Software Inc

bull More than just a tool to find bugs

bull Understand complex code

bull Improve code quality

bull Collaborate with team members to resolve issues faster

bull Shorten development time

bull Finds problems and bugs in applications including

bull Program crash or incorrect behavior

bull Data issues

bull Application memory leaks and errors

bull Communication problems between processes and threads

bull CUDA application analysis and debugging

bull Applications in an automated test and batch environments

What is TotalView used for

UI Navigation and Process Control

totalviewio7 | TotalView by Perforce copy Perforce Software Inc

TotalViewrsquos Default Views

1 Processes amp Threads Control Viewbull Lookup File or Functionbull Documents

2 Source View

3 Call Stack View

4 Local Variables View

5 Data View Command LineInputOutput

6 Action Points Replay Bookmarks

totalviewio8 | TotalView by Perforce copy Perforce Software Inc

Process and Threads View

totalviewio9 | TotalView by Perforce copy Perforce Software Inc

Source View

totalviewio10 | TotalView by Perforce copy Perforce Software Inc

Call Stack View and Local Variables View

totalviewio11 | TotalView by Perforce copy Perforce Software Inc

Action Points View

totalviewio12 | TotalView by Perforce copy Perforce Software Inc

Data View Command Line View and InputOutput View

Action Points

totalviewio16 | TotalView by Perforce copy Perforce Software Inc

Breakpoint

Evaluation Point (Evalpoint)

Watchpoint

Barrier point

Action Points

totalviewio17 | TotalView by Perforce copy Perforce Software Inc

Setting Breakpoints

bull Setting action points

bull Single-click line number

bull Right clicking on the line

number and using the

context menu

bull Clicking a line in the source

view then selecting the

Action Points -gt Set

breakpoint menu option

totalviewio18 | TotalView by Perforce copy Perforce Software Inc

bull Breakpoint-gtAt Locationhellip

bull Specify function name or line number

bull If function name TotalView sets a breakpoint at

first executable line in the function

Setting Breakpoints

totalviewio20 | TotalView by Perforce copy Perforce Software Inc

Evaluation points

bull Use Eval points to

bull Include instructions that stop a process and its relatives

bull Test potential fixes or patches for your program

bull Include a goto for C or Fortran that transfers control to a

line number in your program

bull Execute a TotalView function

bull Set the values of your programrsquos variables

totalviewio21 | TotalView by Perforce copy Perforce Software Inc

bull Print the value of a variable to the command line

printf(The value of result is dn result)

bull Skip some code

goto 63

bull Stop a loop after a certain number of iterations

if ( (i 100) == 0)

printf(The value of i is dn i)

$stop

See ldquoUsing Built-in Statementsrdquo in Appendix A of the User Guide for more information on ldquo$rdquo expressions

httpshelptotalviewiocurrentHTMLindexhtmlpageTotalViewBuiltInStatmentshtmlww1894979

Evaluation points Examples

totalviewio22 | TotalView by Perforce copy Perforce Software Inc

bull Watchpoints are set on a specific memory location

bull Execution is stopped when the value stored in that memory location changes

bull A breakpoint stops before an instruction executes A watchpoint stops after an instruction executes

Watchpoints

totalviewio23 | TotalView by Perforce copy Perforce Software Inc

bull Used to synchronize a group of threads or processes defined in the action point

bull Threads or processes are held at barrierpoint until all threads or processes in the group arrive

bull When all threads or processes arrive the barrier is satisfied and the threads or processes are released

Barrier Breakpoints

totalviewio24 | TotalView by Perforce copy Perforce Software Inc

Saving Breakpoints

From the Action Points menu select Save or Save As to save breakpointsTurn on option to save action points on exit

Examining and Editing Data

totalviewio26 | TotalView by Perforce copy Perforce Software Inc

Call Stack and Local Variables

Call Stack Viewbull Lists the set of call frames as the

program calls from one function or method to another

bull Filter button used to turn on or off filtering of frames

Local Variables Viewbull Displays local variables relative to the

current thread of interest and the selected stack frame

bull Organized by arguments and blocksbull To edit values add variable to the Data

View

totalviewio27 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel

bull Data View allows deeper exploration of data structures

bull Edit data valuesbull Cast to new data typesbull Add data to the Data View using the context

menu or by dragging and dropping

Context menu

Drag and drop

totalviewio28 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel ndash Expanding Arrays and Structures

Select the right arrow to display the substructures in a complex variable

Any nested structures are displayed in the data view

totalviewio29 | TotalView by Perforce copy Perforce Software Inc

bull Dive in All

bull Use Dive in All to easily see each member of a data structure from an array of structures

The Data View ndash Dive in All

totalviewio30 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel ndash Entering Expressions

Enter a new expression in the Data View panel to view that data

A new expression is added

Increment a variable

Type the expression in the [Add New expression] field

totalviewio32 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel - Casting

Casting to another type

TotalView displays the array

Cast a variable into an array by adding the array specifier

Extending Debugging Capabilities How to Debug (AI) Mixed PythonC++ Code

totalviewio34 | TotalView by Perforce copy Perforce Software Inc

bull Debugging one language is difficult enough

bull Understanding the flow of execution across language barriers is hard

bull Examining and comparing data in both languages is challenging

bull What TotalView provides

bull Easy python debugging session setup

bull Fully integrated Python and CC++ call stack

bull rdquoGluerdquo layers between the languages removed

bull Easily examine and compare variables in Python and C++

bull Modest system requirements

bull Utilize reverse debugging and memory debugging

bull What TotalView does not provide (yet)

bull Setting breakpoints and stepping within Python code

Mixed Language Python Debugging

totalviewio35 | TotalView by Perforce copy Perforce Software Inc

Python debugging with TotalView (demo)

usrbinpython

def callFact()import tv_python_example as tpa = 3b = 10c = a+bch = ldquolocal stringrdquohelliphellip

return tpfact(a)if __name__ == __main__rsquo

b = 2result = callFact()print result

totalview -args python test_python_typespy

Remote Debugging - TotalView Remote UI

roguewavecom38 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Combine the convenience of establishing a remote

connection to a cluster and the ability to run the

TotalView GUI locally

bull Front-end GUI architecture does not need to match back-

end target architecture (macOS front-end -gt Linux back-

end)

bull Secure communications

bull Convenient saved sessions

bull Once connected debug as normal with access to all

TotalView features

bull Front-end GUI currently supports macOS and Linux

x86x86-64 Windows client is coming

TotalView Remote UI

roguewavecom39 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Remote UI Architecture

TotalView Reverse Connections

roguewavecom41 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

roguewavecom42 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

2 TotalView UI reads request

3 TotalView returns response

roguewavecom43 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

2 TotalView UI reads request

3 TotalView returns response

roguewavecom44 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response


What is Debugging and Why Do You Need TotalView?

TotalView Features

• Comprehensive C, C++ and Fortran debugger
• Multi-process/multi-thread dynamic analysis
• Thread-specific breakpoints with individual thread control
• View thread-specific stack and data
• View complex data types easily
• MPI, OpenMP, hybrid and CUDA debugging
• Convenient remote debugging for HPC
• Integrated reverse and memory debugging
• Mixed-language Python and C/C++ debugging
• Script debugging
• Linux, macOS and UNIX

What is TotalView used for?

• More than just a tool to find bugs
  • Understand complex code
  • Improve code quality
  • Collaborate with team members to resolve issues faster
  • Shorten development time
• Finds problems and bugs in applications, including:
  • Program crash or incorrect behavior
  • Data issues
  • Application memory leaks and errors
  • Communication problems between processes and threads
  • CUDA application analysis and debugging
  • Applications in automated test and batch environments

UI Navigation and Process Control

TotalView's Default Views

1. Processes & Threads Control View
   • Lookup File or Function
   • Documents
2. Source View
3. Call Stack View
4. Local Variables View
5. Data View, Command Line, Input/Output
6. Action Points, Replay Bookmarks

[Screenshots: Process and Threads View; Source View; Call Stack View and Local Variables View; Action Points View; Data View, Command Line View and Input/Output View]

Action Points

• Breakpoint
• Evaluation Point (Evalpoint)
• Watchpoint
• Barrier point

Setting Breakpoints

• Setting action points
  • Single-click the line number
  • Right-click the line number and use the context menu
  • Click a line in the Source View, then select the Action Points -> Set Breakpoint menu option
• Breakpoint -> At Location...
  • Specify a function name or line number
  • If a function name is given, TotalView sets a breakpoint at the first executable line in the function

Evaluation Points

• Use eval points to:
  • Include instructions that stop a process and its relatives
  • Test potential fixes or patches for your program
  • Include a goto for C or Fortran that transfers control to a line number in your program
  • Execute a TotalView function
  • Set the values of your program's variables

Evaluation Points: Examples

• Print the value of a variable to the command line:

    printf("The value of result is %d\n", result);

• Skip some code:

    goto 63;

• Stop a loop after a certain number of iterations:

    if ((i % 100) == 0) {
        printf("The value of i is %d\n", i);
        $stop;
    }

See "Using Built-in Statements" in Appendix A of the User Guide for more information on "$" expressions:
https://help.totalview.io/current/HTML/index.html#page/TotalView/BuiltInStatments.html#ww1894979

Watchpoints

• Watchpoints are set on a specific memory location
• Execution is stopped when the value stored in that memory location changes
• A breakpoint stops before an instruction executes; a watchpoint stops after an instruction executes

Barrier Breakpoints

• Used to synchronize a group of threads or processes defined in the action point
• Threads or processes are held at the barrier point until all threads or processes in the group arrive
• When all threads or processes have arrived, the barrier is satisfied and the threads or processes are released
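The hold-until-everyone-arrives behavior of a barrier point can be illustrated with an ordinary in-program barrier. The sketch below is only an analogy: it uses Python's standard `threading.Barrier`, whereas a debugger barrier point holds threads from outside the program. The function name `run_with_barrier` and the event log are made up for this illustration.

```python
import threading


def run_with_barrier(n_threads):
    """Each worker logs 'arrive', waits at the barrier, then logs 'release'."""
    events = []
    lock = threading.Lock()
    barrier = threading.Barrier(n_threads)

    def worker(rank):
        with lock:
            events.append(("arrive", rank))
        barrier.wait()  # held here until all n_threads workers have arrived
        with lock:
            events.append(("release", rank))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    for kind, rank in run_with_barrier(4):
        print(kind, rank)
```

No worker can log "release" before every worker has logged "arrive", which is the property the barrier point gives you when you want all ranks or threads stopped at the same source line.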

Saving Breakpoints

• From the Action Points menu, select Save or Save As to save breakpoints
• Turn on the option to save action points on exit

Examining and Editing Data

Call Stack and Local Variables

Call Stack View
• Lists the set of call frames as the program calls from one function or method to another
• Filter button used to turn filtering of frames on or off

Local Variables View
• Displays local variables relative to the current thread of interest and the selected stack frame
• Organized by arguments and blocks
• To edit values, add the variable to the Data View

The Data View Panel

• Data View allows deeper exploration of data structures
• Edit data values
• Cast to new data types
• Add data to the Data View using the context menu or by dragging and dropping

The Data View Panel - Expanding Arrays and Structures

• Select the right arrow to display the substructures in a complex variable
• Any nested structures are displayed in the Data View

The Data View - Dive in All

• Use Dive in All to easily see each member of a data structure from an array of structures

The Data View Panel - Entering Expressions

• Enter a new expression in the Data View panel to view that data
• Type the expression in the [Add New Expression] field
• A new expression is added
• Increment a variable

The Data View Panel - Casting

• Casting to another type
• Cast a variable into an array by adding the array specifier
• TotalView displays the array

Extending Debugging Capabilities: How to Debug (AI) Mixed Python/C++ Code

Mixed Language Python Debugging

• Debugging one language is difficult enough
  • Understanding the flow of execution across language barriers is hard
  • Examining and comparing data in both languages is challenging
• What TotalView provides
  • Easy Python debugging session setup
  • Fully integrated Python and C/C++ call stack
  • "Glue" layers between the languages removed
  • Easily examine and compare variables in Python and C++
  • Modest system requirements
  • Utilize reverse debugging and memory debugging
• What TotalView does not provide (yet)
  • Setting breakpoints and stepping within Python code

Python debugging with TotalView (demo)

    #!/usr/bin/python

    def callFact():
        import tv_python_example as tp
        a = 3
        b = 10
        c = a + b
        ch = "local string"
        # ……
        return tp.fact(a)

    if __name__ == '__main__':
        b = 2
        result = callFact()
        print(result)

Launch the script under TotalView:

    totalview -args python test_python_types.py

Remote Debugging - TotalView Remote UI

TotalView Remote UI

• Combines the convenience of establishing a remote connection to a cluster with the ability to run the TotalView GUI locally
• Front-end GUI architecture does not need to match the back-end target architecture (macOS front-end -> Linux back-end)
• Secure communications
• Convenient saved sessions
• Once connected, debug as normal with access to all TotalView features
• Front-end GUI currently supports macOS and Linux x86/x86-64; a Windows client is coming

[Diagram: Remote UI Architecture]

TotalView Reverse Connections

Reverse Connection Flow

[Diagram, shown as a six-step animation. Labels: front-end node, batch node, compute nodes; TotalView UI, tvconnect, srun, tvdsvr, application ranks; shared directory $HOME/.totalview/connect]

1. tvconnect writes request
2. TotalView UI reads request
3. TotalView returns response
4. tvconnect reads response
5. exec
6. Socket connection opened (tvdsvr)
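Steps 1 to 4 of the reverse connection flow amount to a rendezvous through a shared directory. The sketch below is a toy model of that idea only. The file names `request.json` and `response.json`, the JSON payload, the host name and the port number are all hypothetical and are not TotalView's actual on-disk format.

```python
import json
import os
import tempfile


def write_request(connect_dir, command):
    """Step 1 (tvconnect side): drop a request into the shared directory."""
    path = os.path.join(connect_dir, "request.json")  # hypothetical file name
    with open(path, "w") as f:
        json.dump({"command": command}, f)
    return path


def answer_request(connect_dir, ui_host, ui_port):
    """Steps 2-3 (UI side): read the request, then publish where to connect."""
    with open(os.path.join(connect_dir, "request.json")) as f:
        request = json.load(f)
    response = {"host": ui_host, "port": ui_port, "command": request["command"]}
    with open(os.path.join(connect_dir, "response.json"), "w") as f:
        json.dump(response, f)


def read_response(connect_dir):
    """Step 4 (tvconnect side): learn the UI address before the exec step."""
    with open(os.path.join(connect_dir, "response.json")) as f:
        return json.load(f)


def demo():
    # A temporary directory stands in for the shared connect directory.
    with tempfile.TemporaryDirectory() as connect_dir:
        write_request(connect_dir, ["srun", "-n", "2", "hybrid_fib"])
        answer_request(connect_dir, "frontend.example", 5000)  # made-up address
        return read_response(connect_dir)


if __name__ == "__main__":
    print(demo())
```

The point of the design is that the batch job never needs an inbound connection from the front-end node: it only needs a file system both sides can see, and the socket in step 6 is opened outward from the job.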

Batch Script Submission with Reverse Connect

• Start a debugging session using TotalView Reverse Connect
• Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once it runs
• Enables running the TotalView UI on the front-end node while remotely debugging jobs executing on the compute nodes
• Very easy to use: simply prefix the job launch or application start with the "tvconnect" command

    #!/bin/bash
    #SBATCH -J hybrid_fib
    # …
    #SBATCH -n 2
    #SBATCH -c 4
    #SBATCH --mem-per-cpu=4000
    export OMP_NUM_THREADS=4

    tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks, Heap Status, and Identifying Dangling Pointers

What is a Memory Bug?

• A memory bug is a mistake in the management of heap memory
  • Leaking: failure to free memory
  • Dangling references: failure to clear pointers
  • Failure to check for error conditions
• Memory corruption
  • Writing to memory not allocated
  • Overrunning array bounds

TotalView Heap Interposition Agent (HIA) Technology

• Advantages of TotalView HIA technology
  • Use it with your existing builds
    • No source code or binary instrumentation
  • Programs run at nearly full speed
    • Low performance overhead
  • Low memory overhead
    • Efficient memory usage

[Diagram labels: Process; User Code and Libraries; Malloc API; Heap Interposition Agent (HIA); Allocation Table; Deallocation Table; TotalView]
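The allocation and deallocation tables in the HIA diagram can be pictured with a small bookkeeping model. The sketch below is a conceptual stand-in, not TotalView's implementation. The class `ToyHeapAgent`, its method names and the bump-style fake addresses are all invented for illustration, and "leak" is simplified to "never freed".

```python
class ToyHeapAgent:
    """Conceptual stand-in for an interposition layer around malloc/free."""

    def __init__(self):
        self._next_addr = 0x1000
        self.allocations = {}    # address -> (size, tag) for live blocks
        self.deallocations = {}  # address -> (size, tag) for freed blocks

    def malloc(self, size, tag):
        # Hand out fresh fake addresses; they are never reused in this toy.
        addr = self._next_addr
        self._next_addr += size
        self.allocations[addr] = (size, tag)
        return addr

    def free(self, addr):
        if addr not in self.allocations:
            raise ValueError(f"free of unknown or already freed address {addr:#x}")
        self.deallocations[addr] = self.allocations.pop(addr)

    def leaks(self):
        """Tags of blocks that were allocated but never freed."""
        return sorted(tag for _, tag in self.allocations.values())

    def is_dangling(self, addr):
        """A pointer dangles if it refers to a block that has been freed."""
        return addr in self.deallocations


def demo():
    heap = ToyHeapAgent()
    a = heap.malloc(64, "buffer_a")
    b = heap.malloc(32, "buffer_b")
    heap.free(a)  # any remaining use of 'a' is now a dangling reference
    return heap.leaks(), heap.is_dangling(a), heap.is_dangling(b)


if __name__ == "__main__":
    print(demo())
```

Because the bookkeeping lives in a layer between the program and the allocator, the program itself needs no source changes, which mirrors the "use it with your existing builds" point above.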

Memory Debugging Features - MemoryScape/TotalView

TotalView New UI Features
• Leak detection
• Heap status
• Dangling pointer detection

Coming Features
• Memory error events
• Memory corruption detection
• Memory block painting
• Memory hoarding
• Memory comparisons between processes

TotalView Reverse Debugging

Reverse Debugging with TotalView

• Reverse debugging provides the ability for developers to go back in execution history
• Activated either before the program starts running or at some point after execution begins
• Captures and deterministically replays execution
• Enables stepping backwards and forwards by function, line or instruction
• Run backwards to breakpoints
• Run backwards and stop when a variable changes value
• Save recording files for later analysis or collaboration
  • For a remote connection, use the CLI: dhistory -save <name>
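"Run backwards and stop when a variable changes value" can be pictured with a toy recorder that snapshots program state after every step and then searches the history from the end. This is a deliberately naive model under stated assumptions: the names `record` and `last_change` are made up, and a real record-and-replay engine does not work by copying the whole state at each step.

```python
def record(program_steps, state):
    """Run each step, keeping a snapshot of the state after every step."""
    history = [dict(state)]  # snapshot 0: state before the first step
    for step in program_steps:
        step(state)
        history.append(dict(state))
    return history


def last_change(history, name):
    """Walk backwards and return the number of the step that last changed `name`."""
    for i in range(len(history) - 1, 0, -1):
        if history[i].get(name) != history[i - 1].get(name):
            return i  # step i (counting from 1) modified the variable
    return None


def demo():
    steps = [
        lambda s: s.update(total=s["total"] + 5),  # step 1
        lambda s: s.update(count=s["count"] + 1),  # step 2
        lambda s: s.update(total=-1),              # step 3: the "bug"
        lambda s: s.update(count=s["count"] + 1),  # step 4
    ]
    history = record(steps, {"total": 0, "count": 0})
    culprit = last_change(history, "total")
    value_before = history[culprit - 1]["total"]
    return culprit, value_before


if __name__ == "__main__":
    print(demo())
```

Starting from the end of the recording, the search lands on step 3, the step that overwrote `total`, and the previous snapshot shows the value it had before the overwrite. That is the workflow a backwards watchpoint gives you without rerunning the program.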

Reverse Debugging Controls

• Run forward / Run backwards
• Next forward over functions / Next backwards over functions
• Step forward into functions / Step backwards into functions
• Advance forward out of function call / Advance backwards to calling function
• Advance forward to selected line / Advance backward to selected line
• Advance to "live" session
• Create a bookmark at this point in recorded history
• Save the recorded session

Debugging CUDA Applications with TotalView

TotalView for the NVIDIA® GPU Accelerator

• NVIDIA Tesla, Fermi, Kepler, Pascal, Volta, Turing, Ampere
• NVIDIA CUDA 9.2, 10 and 11
  • With support for Unified Memory
• Debugging 64-bit CUDA programs
• Features and capabilities include
  • Support for dynamic parallelism
  • Support for MPI-based clusters and multi-card configurations
  • Flexible display and navigation on the CUDA device
    • Physical (device, SM, warp, lane)
    • Logical (grid, block) tuples
  • CUDA device window reveals what is running where
  • Support for types and separate memory address spaces
  • Leverages CUDA memcheck

[Screenshot: Source View opened on CUDA host code]

Breakpoint Set in CUDA Kernel Code Before Launch

• A hollow breakpoint indicates that the breakpoint will be set when the code is loaded onto the GPU

GPU Physical and Logical Toolbars

• The logical toolbar displays the block and thread coordinates
• The physical toolbar displays the device number, streaming multiprocessor, warp and lane
• To view a CUDA host thread, select a thread with a positive thread ID in the Process and Threads view
• To view a CUDA GPU thread, select a thread with a negative thread ID, then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread
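Part of the link between the logical and physical views is plain arithmetic: threads in a block are numbered with x varying fastest, and consecutive groups of 32 form a warp. The sketch below computes a thread's warp index within its block and its lane. The helper names are made up for illustration. The device, the SM and the hardware warp slot shown by a debugger are assigned at run time and cannot be derived this way.

```python
WARP_SIZE = 32  # warp size on current NVIDIA GPUs


def linear_thread_index(thread_idx, block_dim):
    """Flatten (x, y, z) thread coordinates within a block; x varies fastest."""
    x, y, z = thread_idx
    dim_x, dim_y, _ = block_dim
    return x + y * dim_x + z * dim_x * dim_y


def warp_and_lane(thread_idx, block_dim):
    """Return (warp index within the block, lane within that warp)."""
    linear = linear_thread_index(thread_idx, block_dim)
    return linear // WARP_SIZE, linear % WARP_SIZE


if __name__ == "__main__":
    # Thread (5, 2, 0) in a 16 x 16 x 1 block has linear index 5 + 2 * 16 = 37.
    print(warp_and_lane((5, 2, 0), (16, 16, 1)))
```

This is useful when a fault is reported for a particular lane and you want to focus the logical toolbar on the matching thread coordinates.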

Displaying CUDA Program Elements

• The identifier @local is a TotalView built-in type storage qualifier that tells the debugger that the storage kind of A is local storage
• The debugger uses the storage qualifier to determine how to locate A in device memory
• Screenshot callouts: the @local type qualifier indicates that variable A is in local storage; "elements" is a pointer to a float in generic storage

Using TotalView for Parallel Debugging on ANL

TotalView remote debugging on Linux and macOS

• Download and install TotalView on your Linux or Mac machine
• Connect to the remote front-end node
• Run labs remotely

Hands-on labs

• Install TotalView from the installers on Mac or Linux
  • Ignore the license code
• Start TotalView
• Remotely connect to Cooley and enable Reverse Connection

Labs

• Lab 1: Debugger Basics
• Lab 2: Viewing, Examining, Watching, and Editing Data
• Lab 3: Examining and Controlling a Parallel Application (on Cooley)
• Using remote connect (tvconnect)
  • qsub -q training tvconnect.job
  • Modify and submit tvconnect.job on your machine

TotalView is available on Theta and Cooley

• Connect to Cooley/Theta
• Get an allocation first
  • qsub -A ATPESC2021 -n 4 -q debug-flat-quad -I (Theta)
  • qsub -A ATPESC2021 -n 4 -q training -I (Cooley)
• module load totalview (Theta)
• soft add +totalview (Cooley)
• totalview -args mpiexec -np <N> demoMpi_v2
• tvconnect mpiexec -np <N> demoMpi_v2

• Installed at /soft/debuggers/totalview-2021-08-04/toolworks/totalview.2021X.3.756/bin/totalview

TotalView Resources and Documentation

• TotalView website
  • https://totalview.io
• TotalView documentation
  • https://help.totalview.io
• TotalView video tutorials
  • https://totalview.io/support/video-tutorials
• Other resources
  • Blog: https://totalview.io/blog

Summary

• Using a modern debugger saves you time
• TotalView can help you because
  • It's cross-platform (the only debugger you will ever need)
  • It allows you to debug accelerators (GPU) and CPU in one session
  • It allows you to debug multiple languages (C++/Python/Fortran)

Page 4: Techniques for Debugging HPC Applications

totalviewio4 | TotalView by Perforce copy Perforce Software Inc

TotalView Features

bull Comprehensive C C++ and Fortran

debugger

bull Multi-processmulti-thread dynamic

analysis

bull Thread specific breakpoints with

individual thread control

bull View thread specific stack and data

bull View complex data types easily

bull MPI OpenMP Hybrid and CUDA

debugging

bull Convenient remote debugging for HPC

bull Integrated Reverse and Memory

debugging

bull Mixed Language - Python CC++ debugging

bull Script debugging

bull Linux macOS and UNIX

totalviewio5 | TotalView by Perforce copy Perforce Software Inc

bull More than just a tool to find bugs

bull Understand complex code

bull Improve code quality

bull Collaborate with team members to resolve issues faster

bull Shorten development time

bull Finds problems and bugs in applications including

bull Program crash or incorrect behavior

bull Data issues

bull Application memory leaks and errors

bull Communication problems between processes and threads

bull CUDA application analysis and debugging

bull Applications in an automated test and batch environments

What is TotalView used for

UI Navigation and Process Control

totalviewio7 | TotalView by Perforce copy Perforce Software Inc

TotalViewrsquos Default Views

1 Processes amp Threads Control Viewbull Lookup File or Functionbull Documents

2 Source View

3 Call Stack View

4 Local Variables View

5 Data View Command LineInputOutput

6 Action Points Replay Bookmarks

totalviewio8 | TotalView by Perforce copy Perforce Software Inc

Process and Threads View

totalviewio9 | TotalView by Perforce copy Perforce Software Inc

Source View

totalviewio10 | TotalView by Perforce copy Perforce Software Inc

Call Stack View and Local Variables View

totalviewio11 | TotalView by Perforce copy Perforce Software Inc

Action Points View

totalviewio12 | TotalView by Perforce copy Perforce Software Inc

Data View Command Line View and InputOutput View

Action Points

totalviewio16 | TotalView by Perforce copy Perforce Software Inc

Breakpoint

Evaluation Point (Evalpoint)

Watchpoint

Barrier point

Action Points

totalviewio17 | TotalView by Perforce copy Perforce Software Inc

Setting Breakpoints

bull Setting action points

bull Single-click line number

bull Right clicking on the line

number and using the

context menu

bull Clicking a line in the source

view then selecting the

Action Points -gt Set

breakpoint menu option

totalviewio18 | TotalView by Perforce copy Perforce Software Inc

bull Breakpoint-gtAt Locationhellip

bull Specify function name or line number

bull If function name TotalView sets a breakpoint at

first executable line in the function

Setting Breakpoints

totalviewio20 | TotalView by Perforce copy Perforce Software Inc

Evaluation points

bull Use Eval points to

bull Include instructions that stop a process and its relatives

bull Test potential fixes or patches for your program

bull Include a goto for C or Fortran that transfers control to a

line number in your program

bull Execute a TotalView function

bull Set the values of your programrsquos variables

totalviewio21 | TotalView by Perforce copy Perforce Software Inc

bull Print the value of a variable to the command line

printf(The value of result is dn result)

bull Skip some code

goto 63

bull Stop a loop after a certain number of iterations

if ( (i 100) == 0)

printf(The value of i is dn i)

$stop

See ldquoUsing Built-in Statementsrdquo in Appendix A of the User Guide for more information on ldquo$rdquo expressions

httpshelptotalviewiocurrentHTMLindexhtmlpageTotalViewBuiltInStatmentshtmlww1894979

Evaluation points Examples

totalviewio22 | TotalView by Perforce copy Perforce Software Inc

bull Watchpoints are set on a specific memory location

bull Execution is stopped when the value stored in that memory location changes

bull A breakpoint stops before an instruction executes A watchpoint stops after an instruction executes

Watchpoints

totalviewio23 | TotalView by Perforce copy Perforce Software Inc

bull Used to synchronize a group of threads or processes defined in the action point

bull Threads or processes are held at barrierpoint until all threads or processes in the group arrive

bull When all threads or processes arrive the barrier is satisfied and the threads or processes are released

Barrier Breakpoints

totalviewio24 | TotalView by Perforce copy Perforce Software Inc

Saving Breakpoints

From the Action Points menu select Save or Save As to save breakpointsTurn on option to save action points on exit

Examining and Editing Data

totalviewio26 | TotalView by Perforce copy Perforce Software Inc

Call Stack and Local Variables

Call Stack Viewbull Lists the set of call frames as the

program calls from one function or method to another

bull Filter button used to turn on or off filtering of frames

Local Variables Viewbull Displays local variables relative to the

current thread of interest and the selected stack frame

bull Organized by arguments and blocksbull To edit values add variable to the Data

View

totalviewio27 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel

bull Data View allows deeper exploration of data structures

bull Edit data valuesbull Cast to new data typesbull Add data to the Data View using the context

menu or by dragging and dropping

Context menu

Drag and drop

totalviewio28 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel ndash Expanding Arrays and Structures

Select the right arrow to display the substructures in a complex variable

Any nested structures are displayed in the data view

totalviewio29 | TotalView by Perforce copy Perforce Software Inc

bull Dive in All

bull Use Dive in All to easily see each member of a data structure from an array of structures

The Data View ndash Dive in All

totalviewio30 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel ndash Entering Expressions

Enter a new expression in the Data View panel to view that data

A new expression is added

Increment a variable

Type the expression in the [Add New expression] field

totalviewio32 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel - Casting

Casting to another type

TotalView displays the array

Cast a variable into an array by adding the array specifier

Extending Debugging Capabilities How to Debug (AI) Mixed PythonC++ Code

totalviewio34 | TotalView by Perforce copy Perforce Software Inc

bull Debugging one language is difficult enough

bull Understanding the flow of execution across language barriers is hard

bull Examining and comparing data in both languages is challenging

bull What TotalView provides

bull Easy python debugging session setup

bull Fully integrated Python and CC++ call stack

bull rdquoGluerdquo layers between the languages removed

bull Easily examine and compare variables in Python and C++

bull Modest system requirements

bull Utilize reverse debugging and memory debugging

bull What TotalView does not provide (yet)

bull Setting breakpoints and stepping within Python code

Mixed Language Python Debugging

Python debugging with TotalView (demo)

    #!/usr/bin/python

    def callFact():
        import tv_python_example as tp
        a = 3
        b = 10
        c = a+b
        ch = "local string"
        # ……
        return tp.fact(a)

    if __name__ == '__main__':
        b = 2
        result = callFact()
        print(result)

Start the session with:

    totalview -args python test_python_types.py

Remote Debugging - TotalView Remote UI

TotalView Remote UI

• Combine the convenience of establishing a remote connection to a cluster and the ability to run the TotalView GUI locally
• Front-end GUI architecture does not need to match back-end target architecture (macOS front-end -> Linux back-end)
• Secure communications
• Convenient saved sessions
• Once connected, debug as normal with access to all TotalView features
• Front-end GUI currently supports macOS and Linux x86/x86-64; a Windows client is coming

Remote UI Architecture

TotalView Reverse Connections

Reverse Connection Flow

[Diagram, shown in six animation steps. Regions: front-end node, batch node, compute nodes. Components: TotalView UI, $HOME/.totalview/connect, srun, tvconnect, tvdsvr, application ranks.]

1. tvconnect writes request
2. TotalView UI reads request
3. TotalView returns response
4. tvconnect reads response
5. exec
6. socket connection opened
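The first four steps of the reverse connection flow (request written, request read, response returned, response read) amount to a handshake through a shared directory. The sketch below imitates that handshake with two threads and a temporary directory. It is a toy under stated assumptions: the file names `request.json` and `response.json`, the JSON payloads, and the function names are all hypothetical, and the real tvconnect protocol and file format are not shown in the slides. Steps 5 and 6 (exec and the socket connection) are left out.

```python
import json
import os
import tempfile
import threading
import time


def wait_for(path, timeout=5.0):
    """Poll until `path` exists, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() > deadline:
            raise TimeoutError(path)
        time.sleep(0.01)


def write_atomic(path, payload):
    """Write JSON to a temp file, then rename so readers never see a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(payload, f)
    os.replace(tmp, path)


def read_json(path):
    with open(path) as f:
        return json.load(f)


def client(connect_dir, command):
    """Stands in for tvconnect on the batch node."""
    request = os.path.join(connect_dir, "request.json")    # hypothetical name
    response = os.path.join(connect_dir, "response.json")  # hypothetical name
    write_atomic(request, {"command": command})            # step 1
    wait_for(response)
    return read_json(response)                             # step 4


def ui(connect_dir, address):
    """Stands in for the TotalView UI on the front-end node."""
    request = os.path.join(connect_dir, "request.json")
    response = os.path.join(connect_dir, "response.json")
    wait_for(request)
    req = read_json(request)                               # step 2
    write_atomic(response, {"connect_to": address,         # step 3
                            "command": req["command"]})


def run_handshake(command, address):
    """Run both sides against one shared directory and return the reply."""
    with tempfile.TemporaryDirectory() as connect_dir:
        ui_thread = threading.Thread(target=ui, args=(connect_dir, address))
        ui_thread.start()
        reply = client(connect_dir, command)
        ui_thread.join()
        return reply
```

A shared directory works here because the front-end and batch nodes typically see the same home file system, so neither side needs to know the other's address before the exchange.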

Batch Script Submission with Reverse Connect

• Start a debugging session using TotalView Reverse Connect
• Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run
• Enables running the TotalView UI on the front-end node and remotely debugging jobs executing on the compute nodes
• Very easy to utilize: simply prefix the job launch or application start with the "tvconnect" command

    #!/bin/bash
    #SBATCH -J hybrid_fib
    # …
    #SBATCH -n 2
    #SBATCH -c 4
    #SBATCH --mem-per-cpu=4000
    export OMP_NUM_THREADS=4

    tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks, Heap Status, and Identifying Dangling Pointers

What is a Memory Bug?

• A memory bug is a mistake in the management of heap memory
  • Leaking: failure to free memory
  • Dangling references: failure to clear pointers
  • Failure to check for error conditions
  • Memory corruption
    • Writing to memory not allocated
    • Overrunning array bounds

TotalView Heap Interposition Agent (HIA) Technology

• Advantages of TotalView HIA technology
  • Use it with your existing builds
    • No source code or binary instrumentation
  • Programs run at nearly full speed
    • Low performance overhead
  • Low memory overhead
    • Efficient memory usage

[Diagram labels: Process, User Code and Libraries, Malloc API, Heap Interposition Agent (HIA), Allocation Table, Deallocation Table, TotalView.]
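The idea of interposing on the malloc API and keeping allocation and deallocation tables can be sketched in a few lines. This is a toy model and not how the HIA is implemented: `ToyHeap`, `InterposingAgent`, and the `site` tags are hypothetical names, and a real agent intercepts the C allocator inside the process rather than wrapping a Python object.

```python
class ToyHeap:
    """Stands in for the real allocator behind malloc/free."""

    def __init__(self):
        self._next = 0x1000

    def malloc(self, size):
        addr = self._next
        self._next += size
        return addr

    def free(self, addr):
        pass  # nothing to do in this toy


class InterposingAgent:
    """Sits between user code and the heap and records every call."""

    def __init__(self, heap):
        self._heap = heap
        self.allocations = {}    # address -> (size, allocation site)
        self.deallocations = {}  # address -> (size, free site)
        self.errors = []

    def malloc(self, size, site):
        addr = self._heap.malloc(size)
        self.allocations[addr] = (size, site)
        self.deallocations.pop(addr, None)  # address is live again
        return addr

    def free(self, addr, site):
        if addr not in self.allocations:
            kind = ("double free" if addr in self.deallocations
                    else "free of unallocated address")
            self.errors.append((kind, addr, site))
            return
        size, _ = self.allocations.pop(addr)
        self.deallocations[addr] = (size, site)
        self._heap.free(addr)

    def unfreed_blocks(self):
        """Blocks still allocated; at program exit these are leak candidates."""
        return sorted(self.allocations.items())

    def is_dangling(self, pointer_value):
        """True if the pointer refers to a block that has been freed."""
        return pointer_value in self.deallocations
```

Because the bookkeeping happens at the allocator boundary, the program being debugged needs no source changes or recompilation, which matches the advantage the slide lists.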

Memory Debugging Features - MemoryScape TotalView

TotalView New UI Features

• Leak detection
• Heap Status
• Dangling pointer detection

Coming Features

• Memory Error Events
• Memory Corruption Detection
• Memory Block Painting
• Memory Hoarding
• Memory Comparisons between processes

TotalView Reverse Debugging

Reverse Debugging with TotalView

• Reverse debugging provides the ability for developers to go back in execution history
  • Activated either before the program starts running or at some point after execution begins
  • Captures and deterministically replays execution
  • Enables stepping backwards and forwards by function, line, or instruction
  • Run backwards to breakpoints
  • Run backwards and stop when a variable changes value
• Save recording files for later analysis or collaboration
  • For a remote connection, use the CLI: dhistory -save <name>
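A toy model can make "go back in execution history" and "run backwards and stop when a variable changes value" more tangible. The sketch below simply stores a snapshot after each step; real reverse debuggers capture and replay execution rather than keeping full snapshots, and the `Recorder` class and its method names are hypothetical.

```python
class Recorder:
    """Keeps a snapshot of program state after every executed step."""

    def __init__(self, initial_state):
        self.history = [dict(initial_state)]
        self.position = 0

    def record(self, state):
        """Called in live mode after each step."""
        self.history.append(dict(state))
        self.position = len(self.history) - 1

    def state(self):
        return self.history[self.position]

    def step_back(self):
        if self.position > 0:
            self.position -= 1
        return self.state()

    def step_forward(self):
        if self.position < len(self.history) - 1:
            self.position += 1
        return self.state()

    def run_back_until_changed(self, name):
        """Move back to the snapshot taken just after `name` got its current value."""
        current = self.state().get(name)
        while (self.position > 0
               and self.history[self.position - 1].get(name) == current):
            self.position -= 1
        return self.position
```

Starting from a bad value and running back to the step that produced it is the typical use: one more `step_back` shows the state immediately before the offending write.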

Reverse Debugging Controls

• Run forward / Run backwards
• Next forward over functions / Next backwards over functions
• Step forward into functions / Step backwards into functions
• Advance forward out of function call / Advance backwards to calling function
• Advance forward to selected line / Advance backward to selected line
• Advance to "live" session
• Create a bookmark at this point in recorded history
• Save the recorded session

Debugging CUDA Applications with TotalView

TotalView for the NVIDIA® GPU Accelerator

• NVIDIA Tesla, Fermi, Kepler, Pascal, Volta, Turing, Ampere
• NVIDIA CUDA 9.2, 10, and 11
  • With support for Unified Memory
• Debugging 64-bit CUDA programs
• Features and capabilities include
  • Support for dynamic parallelism
  • Support for MPI-based clusters and multi-card configurations
  • Flexible display and navigation on the CUDA device
    • Physical (device, SM, warp, lane)
    • Logical (grid, block) tuples
  • CUDA device window reveals what is running where
  • Support for types and separate memory address spaces
  • Leverages CUDA memcheck

Source View Opened on CUDA Host Code

Breakpoint Set in CUDA Kernel Code Before Launch

A hollow breakpoint indicates that the breakpoint will be set when the code is loaded onto the GPU.

GPU Physical and Logical Toolbars

• The logical toolbar displays the block and thread coordinates
• The physical toolbar displays the device number, streaming multiprocessor, warp, and lane
• To view a CUDA host thread, select a thread with a positive thread ID in the Process and Threads view
• To view a CUDA GPU thread, select a thread with a negative thread ID, then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread
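The relationship between logical thread coordinates and warp/lane positions follows from how CUDA linearizes threads within a block. The sketch below computes it for one thread, assuming a warp size of 32. The warp index it returns is relative to the block; the device, SM, and warp numbers shown on a physical toolbar are assigned by the hardware at run time and cannot be derived this way. Function names are hypothetical.

```python
WARP_SIZE = 32  # warp size on current NVIDIA GPUs (assumption stated above)


def linear_thread_id(thread_idx, block_dim):
    """Flatten (x, y, z) thread coordinates within a block: x + y*Dx + z*Dx*Dy."""
    x, y, z = thread_idx
    dim_x, dim_y, _ = block_dim
    return x + y * dim_x + z * dim_x * dim_y


def warp_and_lane(thread_idx, block_dim):
    """Return (warp index within the block, lane within the warp)."""
    return divmod(linear_thread_id(thread_idx, block_dim), WARP_SIZE)


# Thread (5, 3, 0) in a 16x16x1 block
warp, lane = warp_and_lane((5, 3, 0), (16, 16, 1))
```

Threads that share a warp execute together, so when a problem appears in one lane it is often worth checking the logical coordinates of the other 31 lanes in the same warp.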

Displaying CUDA Program Elements

• The identifier @local is a TotalView built-in type storage qualifier that tells the debugger that the storage kind of A is local storage
• The debugger uses the storage qualifier to determine how to locate A in device memory

In the example shown, the @local type qualifier indicates that variable A is in local storage, and "elements" is a pointer to a float in generic storage.

Using TotalView for Parallel Debugging on ANL

TotalView remote debugging on Linux and macOS

• Download and install TotalView on your Linux or Mac machine
• Connect to the remote front-end node
• Run the labs remotely

Hands-on labs

• Install TV from the installers on Mac or Linux
  • Ignore license code
• Start TotalView
• Remotely connect to Cooley and enable Reverse Connection

Labs

• Lab 1: Debugger Basics
• Lab 2: Viewing, Examining, Watching, and Editing Data
• Lab 3: Examining and Controlling a Parallel Application (on Cooley)
  • Using remote connect (tvconnect)
    • qsub -q training tvconnect.job
    • Modify and submit tvconnect.job on your machine

TotalView is available on Theta and Cooley

• Connect to Cooley/Theta
• Get an allocation first
  • qsub -A ATPESC2021 -n 4 -q debug-flat-quad -I (Theta)
  • qsub -A ATPESC2021 -n 4 -q training -I (Cooley)
• module load totalview (Theta)
• soft add +totalview (Cooley)
• totalview -args mpiexec -np <N> demoMpi_v2
• tvconnect mpiexec -np <N> demoMpi_v2
• Installed at /soft/debuggers/totalview-2021-08-04/toolworks/totalview.2021X.3.756/bin/totalview

TotalView Resources and Documentation

• TotalView website
  • https://totalview.io
• TotalView documentation
  • https://help.totalview.io
• TotalView Video Tutorials
  • https://totalview.io/support/video-tutorials
• Other Resources
  • Blog: https://totalview.io/blog

Summary

• Using a modern debugger saves you time
• TotalView can help you because
  • It's cross-platform (the only debugger you ever need)
  • It allows you to debug accelerators (GPU) and CPU in one session
  • It allows you to debug multiple languages (C++/Python/Fortran)


What is TotalView used for?

• More than just a tool to find bugs
  • Understand complex code
  • Improve code quality
  • Collaborate with team members to resolve issues faster
  • Shorten development time
• Finds problems and bugs in applications including
  • Program crash or incorrect behavior
  • Data issues
  • Application memory leaks and errors
  • Communication problems between processes and threads
  • CUDA application analysis and debugging
  • Applications in automated test and batch environments

UI Navigation and Process Control

TotalView's Default Views

1. Processes & Threads Control View
   • Lookup File or Function
   • Documents
2. Source View
3. Call Stack View
4. Local Variables View
5. Data View, Command Line, Input/Output
6. Action Points, Replay Bookmarks

Process and Threads View

Source View

Call Stack View and Local Variables View

Action Points View

Data View, Command Line View, and Input/Output View

Action Points

• Breakpoint
• Evaluation Point (Evalpoint)
• Watchpoint
• Barrier point

Setting Breakpoints

• Setting action points
  • Single-click the line number
  • Right-click the line number and use the context menu
  • Click a line in the Source view, then select the Action Points -> Set Breakpoint menu option
• Breakpoint -> At Location…
  • Specify a function name or line number
  • If a function name is given, TotalView sets a breakpoint at the first executable line in the function

Evaluation Points

• Use Eval points to
  • Include instructions that stop a process and its relatives
  • Test potential fixes or patches for your program
  • Include a goto for C or Fortran that transfers control to a line number in your program
  • Execute a TotalView function
  • Set the values of your program's variables

Evaluation Points: Examples

• Print the value of a variable to the command line

    printf("The value of result is %d\n", result);

• Skip some code

    goto 63;

• Stop a loop after a certain number of iterations

    if ( (i % 100) == 0) {
        printf("The value of i is %d\n", i);
        $stop;
    }

See "Using Built-in Statements" in Appendix A of the User Guide for more information on "$" expressions:
https://help.totalview.io/current/HTML/index.html#page/TotalView/BuiltInStatments.html#ww1894979
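The conditional-stop evaluation point can be pictured as a small piece of code that runs each time execution reaches a line and decides whether to halt. The sketch below models that in plain Python: `eval_point` plays the role of the expression attached to the line and returning True plays the role of `$stop`. The function names are hypothetical and this is not how TotalView evaluates expressions.

```python
def eval_point(i, log):
    """Runs each time the loop line is reached; True means stop here."""
    if i % 100 == 0:
        log.append(f"The value of i is {i}")
        return True   # analogous to $stop
    return False      # otherwise execution continues untouched


def run_loop(start, end, log):
    """Run the loop and return the iteration at which execution stopped, if any."""
    for i in range(start, end):
        if eval_point(i, log):
            return i
        # ... the loop body would run here ...
    return None


log = []
stopped_at = run_loop(1, 1000, log)
```

Putting the condition inside the evaluation point means the program is not interrupted on the iterations that do not match, which matters for long-running loops.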

Watchpoints

• Watchpoints are set on a specific memory location
• Execution is stopped when the value stored in that memory location changes
• A breakpoint stops before an instruction executes; a watchpoint stops after an instruction executes
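The "stops after the instruction executes" behavior of a watchpoint can be sketched with a property setter that completes the write and only then signals a stop. This is an analogy in plain Python; `Watched` and `WatchpointHit` are hypothetical names, and real watchpoints monitor a memory address rather than an object attribute.

```python
class WatchpointHit(Exception):
    """Raised after a watched value changes, carrying the old and new values."""

    def __init__(self, name, old, new):
        super().__init__(f"{name} changed from {old!r} to {new!r}")
        self.name, self.old, self.new = name, old, new


class Watched:
    """A single watched storage location."""

    def __init__(self, name, value):
        self.name = name
        self._value = value

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        old = self._value
        self._value = new              # the write completes first...
        if new != old:                 # ...and only a real change triggers
            raise WatchpointHit(self.name, old, new)
```

Because the stop happens after the write, the location already holds the new value when execution halts, and the old value has to come from what the debugger recorded.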

Barrier Breakpoints

• Used to synchronize a group of threads or processes defined in the action point
• Threads or processes are held at the barrier point until all threads or processes in the group arrive
• When all threads or processes arrive, the barrier is satisfied and the threads or processes are released
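The hold-until-all-arrive behavior is the same idea as a programming barrier, so it can be sketched with Python's `threading.Barrier`. This only illustrates the synchronization semantics; in a debugger the barrier is imposed from outside the program, and `run_group` is a hypothetical helper.

```python
import threading


def run_group(num_threads):
    """Start a group of threads that all meet at one barrier."""
    barrier = threading.Barrier(num_threads)
    lock = threading.Lock()
    arrived = []          # ranks, in arrival order
    seen_at_release = []  # how many had arrived when each thread was released

    def worker(rank):
        with lock:
            arrived.append(rank)
        barrier.wait()    # held here until every thread in the group arrives
        with lock:
            seen_at_release.append(len(arrived))

    threads = [threading.Thread(target=worker, args=(rank,))
               for rank in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return arrived, seen_at_release


arrived, seen_at_release = run_group(4)
```

Every released thread observes that the whole group has arrived, which is what makes a barrier point useful for lining up all ranks at the same place before inspecting them.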

Saving Breakpoints

• From the Action Points menu, select Save or Save As to save breakpoints
• Turn on the option to save action points on exit

Examining and Editing Data

Call Stack and Local Variables

Call Stack View

• Lists the set of call frames as the program calls from one function or method to another
• Filter button used to turn on or off filtering of frames

Local Variables View

• Displays local variables relative to the current thread of interest and the selected stack frame
• Organized by arguments and blocks
• To edit values, add the variable to the Data View

The Data View Panel

• Data View allows deeper exploration of data structures
• Edit data values
• Cast to new data types
• Add data to the Data View using the context menu or by dragging and dropping

Page 6: Techniques for Debugging HPC Applications

UI Navigation and Process Control

totalviewio7 | TotalView by Perforce copy Perforce Software Inc

TotalViewrsquos Default Views

1 Processes amp Threads Control Viewbull Lookup File or Functionbull Documents

2 Source View

3 Call Stack View

4 Local Variables View

5 Data View Command LineInputOutput

6 Action Points Replay Bookmarks

totalviewio8 | TotalView by Perforce copy Perforce Software Inc

Process and Threads View

totalviewio9 | TotalView by Perforce copy Perforce Software Inc

Source View

totalviewio10 | TotalView by Perforce copy Perforce Software Inc

Call Stack View and Local Variables View

totalviewio11 | TotalView by Perforce copy Perforce Software Inc

Action Points View

totalviewio12 | TotalView by Perforce copy Perforce Software Inc

Data View Command Line View and InputOutput View

Action Points

totalviewio16 | TotalView by Perforce copy Perforce Software Inc

Breakpoint

Evaluation Point (Evalpoint)

Watchpoint

Barrier point

Action Points

totalviewio17 | TotalView by Perforce copy Perforce Software Inc

Setting Breakpoints

bull Setting action points

bull Single-click line number

bull Right clicking on the line

number and using the

context menu

bull Clicking a line in the source

view then selecting the

Action Points -gt Set

breakpoint menu option

totalviewio18 | TotalView by Perforce copy Perforce Software Inc

bull Breakpoint-gtAt Locationhellip

bull Specify function name or line number

bull If function name TotalView sets a breakpoint at

first executable line in the function

Setting Breakpoints

totalviewio20 | TotalView by Perforce copy Perforce Software Inc

Evaluation points

bull Use Eval points to

bull Include instructions that stop a process and its relatives

bull Test potential fixes or patches for your program

bull Include a goto for C or Fortran that transfers control to a

line number in your program

bull Execute a TotalView function

bull Set the values of your programrsquos variables

totalviewio21 | TotalView by Perforce copy Perforce Software Inc

bull Print the value of a variable to the command line

printf(The value of result is dn result)

bull Skip some code

goto 63

bull Stop a loop after a certain number of iterations

if ( (i 100) == 0)

printf(The value of i is dn i)

$stop

See ldquoUsing Built-in Statementsrdquo in Appendix A of the User Guide for more information on ldquo$rdquo expressions

httpshelptotalviewiocurrentHTMLindexhtmlpageTotalViewBuiltInStatmentshtmlww1894979

Evaluation points Examples

totalviewio22 | TotalView by Perforce copy Perforce Software Inc

bull Watchpoints are set on a specific memory location

bull Execution is stopped when the value stored in that memory location changes

bull A breakpoint stops before an instruction executes A watchpoint stops after an instruction executes

Watchpoints

totalviewio23 | TotalView by Perforce copy Perforce Software Inc

bull Used to synchronize a group of threads or processes defined in the action point

bull Threads or processes are held at barrierpoint until all threads or processes in the group arrive

bull When all threads or processes arrive the barrier is satisfied and the threads or processes are released

Barrier Breakpoints

totalviewio24 | TotalView by Perforce copy Perforce Software Inc

Saving Breakpoints

From the Action Points menu select Save or Save As to save breakpointsTurn on option to save action points on exit

Examining and Editing Data

Call Stack and Local Variables

Call Stack View
• Lists the set of call frames as the program calls from one function or method to another
• Filter button used to turn on or off filtering of frames

Local Variables View
• Displays local variables relative to the current thread of interest and the selected stack frame
• Organized by arguments and blocks
• To edit values, add the variable to the Data View

The Data View Panel

• Data View allows deeper exploration of data structures
• Edit data values
• Cast to new data types
• Add data to the Data View using the context menu or by dragging and dropping

(Screenshot callouts: Context menu; Drag and drop)

The Data View Panel – Expanding Arrays and Structures

Select the right arrow to display the substructures in a complex variable.
Any nested structures are displayed in the Data View.

The Data View – Dive in All

• Dive in All
  • Use Dive in All to easily see each member of a data structure from an array of structures
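Conceptually, Dive in All turns an array of structures into one array per member. A minimal Python sketch of that projection follows; the `Particle` type and the `dive_in_all` helper are hypothetical and exist only to illustrate the idea, not to mirror anything in TotalView.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    x: float
    mass: float


def dive_in_all(array, member):
    """Collect one member from every element of an array of structures."""
    return [getattr(element, member) for element in array]


particles = [Particle(0.0, 1.5), Particle(2.0, 0.5), Particle(4.0, 3.0)]
print(dive_in_all(particles, "mass"))
```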

The Data View Panel – Entering Expressions

Enter a new expression in the Data View panel to view that data.

• Type the expression in the [Add New expression] field
• A new expression is added
• Increment a variable

The Data View Panel – Casting

• Casting to another type
• Cast a variable into an array by adding the array specifier
• TotalView displays the array

Extending Debugging Capabilities: How to Debug (AI) Mixed Python/C++ Code

Mixed Language Python Debugging

• Debugging one language is difficult enough
• Understanding the flow of execution across language barriers is hard
• Examining and comparing data in both languages is challenging
• What TotalView provides
  • Easy Python debugging session setup
  • Fully integrated Python and C/C++ call stack
  • "Glue" layers between the languages removed
  • Easily examine and compare variables in Python and C++
  • Modest system requirements
  • Utilize reverse debugging and memory debugging
• What TotalView does not provide (yet)
  • Setting breakpoints and stepping within Python code

Python debugging with TotalView (demo)

    #!/usr/bin/python

    def callFact():
        import tv_python_example as tp
        a = 3
        b = 10
        c = a + b
        ch = "local string"
        # ……
        return tp.fact(a)

    if __name__ == '__main__':
        b = 2
        result = callFact()
        print(result)

Launch command:

    totalview -args python test_python_types.py

Remote Debugging - TotalView Remote UI

TotalView Remote UI

• Combine the convenience of establishing a remote connection to a cluster and the ability to run the TotalView GUI locally
• Front-end GUI architecture does not need to match back-end target architecture (macOS front-end -> Linux back-end)
• Secure communications
• Convenient saved sessions
• Once connected, debug as normal with access to all TotalView features
• Front-end GUI currently supports macOS and Linux x86/x86-64; a Windows client is coming


Remote UI Architecture

TotalView Reverse Connections

Reverse Connection Flow

(Diagram, repeated across six animation frames. Labels: FRONT-END NODE, BATCH NODE, COMPUTE NODES, $HOME/.totalview/connect, TotalView UI, tvconnect, srun, tvdsvr, Rank 0 (shown three times).)

1. tvconnect writes request
2. TotalView UI reads request
3. TotalView returns response
4. tvconnect reads response
5. exec
6. Socket connection opened
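Steps 1 to 4 of the flow amount to a request/response exchange through a shared directory. The Python sketch below mimics that exchange so the ordering is easier to follow. Everything in it is an assumption made for illustration: the file names `request.json` and `response.json`, the JSON payloads, the host name `login1`, and the port number are invented, and the real files that tvconnect and the TotalView UI exchange are not described on the slides.

```python
import json
import os
import tempfile


def tvconnect_write_request(connect_dir, command):
    """Step 1: the batch-side helper drops a request file."""
    path = os.path.join(connect_dir, "request.json")
    with open(path, "w") as handle:
        json.dump({"command": command}, handle)
    return path


def ui_read_request(connect_dir):
    """Step 2: the UI side reads the pending request."""
    with open(os.path.join(connect_dir, "request.json")) as handle:
        return json.load(handle)


def ui_write_response(connect_dir, host, port):
    """Step 3: the UI answers with the address it is listening on."""
    with open(os.path.join(connect_dir, "response.json"), "w") as handle:
        json.dump({"host": host, "port": port}, handle)


def tvconnect_read_response(connect_dir):
    """Step 4: the batch side learns where to connect back to."""
    with open(os.path.join(connect_dir, "response.json")) as handle:
        reply = json.load(handle)
    return reply["host"], reply["port"]


def simulate_handshake(command, host, port):
    # A temporary directory stands in for the shared connect directory.
    with tempfile.TemporaryDirectory() as connect_dir:
        tvconnect_write_request(connect_dir, command)
        request = ui_read_request(connect_dir)
        ui_write_response(connect_dir, host, port)
        address = tvconnect_read_response(connect_dir)
    # Steps 5 and 6 (exec, socket connection) are not simulated here.
    return request["command"], address


print(simulate_handshake(["srun", "-n", "2", "hybrid_fib"], "login1", 4142))
```

The point of the sketch is the direction of the exchange: the batch side starts the conversation by writing a file, so the front-end UI never has to reach into the batch or compute nodes first.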

Batch Script Submission with Reverse Connect

• Start a debugging session using TotalView Reverse Connect
• Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run
• Enables running the TotalView UI on the front-end node and remotely debugging jobs executing on the compute nodes
• Very easy to utilize: simply prefix the job launch or application start with the "tvconnect" command

    #!/bin/bash
    #SBATCH -J hybrid_fib
    # …
    #SBATCH -n 2
    #SBATCH -c 4
    #SBATCH --mem-per-cpu=4000
    export OMP_NUM_THREADS=4

    tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks, Heap Status, and Identifying Dangling Pointers

What is a Memory Bug?

• A memory bug is a mistake in the management of heap memory
  • Leaking: failure to free memory
  • Dangling references: failure to clear pointers
  • Failure to check for error conditions
• Memory corruption
  • Writing to memory not allocated
  • Overrunning array bounds

TotalView Heap Interposition Agent (HIA) Technology

• Advantages of TotalView HIA Technology
  • Use it with your existing builds
  • No source code or binary instrumentation
  • Programs run nearly full speed
  • Low performance overhead
  • Low memory overhead
  • Efficient memory usage

(Diagram labels: Process; User Code and Libraries; Malloc API; Heap Interposition Agent (HIA); Allocation Table; Deallocation Table; TotalView)
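The allocation and deallocation tables in the diagram suggest how leaks and dangling pointers can be reported without instrumenting the program. The following Python model is a rough sketch of that bookkeeping, under the assumption that the agent records every block on allocation and moves it to a second table on free. `ToyHeapAgent` and all of its methods are hypothetical names; the real HIA interposes on the C malloc API and is not shown here.

```python
class ToyHeapAgent:
    """Toy model of an agent sitting between a program and the allocator."""

    def __init__(self):
        self._next_addr = 0x1000
        self.allocations = {}     # address -> size, blocks still live
        self.deallocations = {}   # address -> size, blocks already freed

    def malloc(self, size):
        addr = self._next_addr
        self._next_addr += size   # addresses are never reused in this model
        self.allocations[addr] = size
        return addr

    def free(self, addr):
        size = self.allocations.pop(addr)   # KeyError models an invalid free
        self.deallocations[addr] = size

    def leaks(self):
        """Blocks that were allocated but never freed."""
        return sorted(self.allocations)

    def is_dangling(self, pointer):
        """True if the pointer lands inside a block that was freed."""
        return any(base <= pointer < base + size
                   for base, size in self.deallocations.items())


agent = ToyHeapAgent()
a = agent.malloc(16)
b = agent.malloc(32)
agent.free(a)
print(agent.leaks() == [b], agent.is_dangling(a), agent.is_dangling(b))
```

A pointer into the middle of a freed block also counts as dangling in this model, which is why the check compares against the whole block range rather than only its base address.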

Memory Debugging Features – MemoryScape TotalView

TotalView New UI Features
• Leak detection
• Heap Status
• Dangling pointer detection

Coming Features
• Memory Error Events
• Memory Corruption Detection
• Memory Block Painting
• Memory Hoarding
• Memory Comparisons between processes

TotalView Reverse Debugging

Reverse Debugging with TotalView

• Reverse debugging provides the ability for developers to go back in execution history
• Activated either before the program starts running or at some point after execution begins
• Captures and deterministically replays execution
• Enables stepping backwards and forward by function, line, or instruction
• Run backwards to breakpoints
• Run backwards and stop when a variable changes value
• Save recording files for later analysis or collaboration
  • For a remote connection, use the CLI: dhistory -save <name>
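The two backward operations, stepping back and running back until a variable changes, can be pictured as moves over a recorded history. The sketch below is a deliberately simple Python model that stores one snapshot per step; `ToyRecorder` is a hypothetical name, and a real record-and-replay engine works very differently (it replays execution rather than storing every variable).

```python
class ToyRecorder:
    """Keeps one snapshot of the variables per executed step."""

    def __init__(self):
        self.history = []
        self.position = -1

    def record(self, **variables):
        self.history.append(dict(variables))
        self.position = len(self.history) - 1

    def step_back(self):
        """Move one step back in history (stops at the first snapshot)."""
        self.position = max(self.position - 1, 0)
        return self.history[self.position]

    def run_back_until_change(self, name):
        """Go back to the latest snapshot where `name` had a different value."""
        current = self.history[self.position][name]
        for index in range(self.position - 1, -1, -1):
            if self.history[index][name] != current:
                self.position = index
                return self.history[index]
        return None  # the value never changed in the recorded history


recorder = ToyRecorder()
total = 0
for i in range(5):
    if i == 3:
        total = 10
    recorder.record(i=i, total=total)

print(recorder.step_back())
print(recorder.run_back_until_change("total"))
```

Running back on `total` lands on the last step before it was overwritten, which is usually the place to start looking for the line that wrote the bad value.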

Reverse Debugging Controls

• Run forward / Run backwards
• Next forward over functions / Next backwards over functions
• Step forward into functions / Step backwards into functions
• Advance forward out of function call / Advance backwards to calling function
• Advance forward to selected line / Advance backward to selected line
• Advance to "live" session
• Create a bookmark at this point in recorded history
• Save the recorded session

Debugging CUDA Applications with TotalView

TotalView for the NVIDIA® GPU Accelerator

• NVIDIA Tesla, Fermi, Kepler, Pascal, Volta, Turing, Ampere
• NVIDIA CUDA 9.2, 10, and 11
  • With support for Unified Memory
• Debugging 64-bit CUDA programs
• Features and capabilities include
  • Support for dynamic parallelism
  • Support for MPI-based clusters and multi-card configurations
  • Flexible display and navigation on the CUDA device
    • Physical (device, SM, Warp, Lane)
    • Logical (Grid, Block) tuples
  • CUDA device window reveals what is running where
  • Support for types and separate memory address spaces
  • Leverages CUDA memcheck

Source View Opened on CUDA Host Code

Breakpoint Set in CUDA Kernel Code Before Launch

A hollow breakpoint indicates that a breakpoint will be set when the code is loaded onto the GPU.

GPU Physical and Logical Toolbars

• The logical toolbar displays the Block and Thread coordinates
• The physical toolbar displays the Device number, Streaming Multiprocessor, Warp, and Lane
• To view a CUDA host thread, select a thread with a positive thread ID in the Process and Threads view
• To view a CUDA GPU thread, select a thread with a negative thread ID, then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread
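When switching between the logical and physical views, it can help to know how a thread's position inside its block relates to a warp and lane. The Python sketch below applies the usual CUDA layout, in which the threads of a block are linearized x-first and split into consecutive groups of 32. The function name is invented for this example, and the warp number it returns is the warp's index within the block, not the hardware warp slot on the SM that a debugger's physical view reports.

```python
WARP_SIZE = 32  # threads per warp on current NVIDIA GPUs


def warp_and_lane(thread_idx, block_dim):
    """Map a (x, y, z) thread index to (warp within block, lane)."""
    tx, ty, tz = thread_idx
    bx, by, _ = block_dim
    linear = tx + ty * bx + tz * bx * by   # x varies fastest
    return linear // WARP_SIZE, linear % WARP_SIZE


# A 64 x 2 x 1 block: thread (5, 1, 0) has linear index 69.
print(warp_and_lane((5, 1, 0), (64, 2, 1)))
```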

Displaying CUDA Program Elements

• The identifier @local is a TotalView built-in type storage qualifier that tells the debugger the storage kind of A is local storage
• The debugger uses the storage qualifier to determine how to locate A in device memory

Screenshot callouts:
• The @local type qualifier indicates that variable A is in local storage
• "elements" is a pointer to a float in generic storage

Using TotalView for Parallel Debugging on ANL

TotalView remote debugging on Linux and macOS

• Download and install TotalView on your Linux or Mac machine
• Connect to the remote front-end node
• Run labs remotely

Hands-on labs

• Install TV from installers on Mac or Linux
  • Ignore license code
• Start TotalView
• Remotely connect to Cooley and enable Reverse Connection

Labs

• Lab 1: Debugger Basics
• Lab 2: Viewing, Examining, Watching, and Editing Data
• Lab 3: Examining and Controlling a Parallel Application (on Cooley)
  • Using remote connect (tvconnect)
  • qsub -q training tvconnect.job
  • Modify and submit tvconnect.job on your machine

TotalView is available on Theta and Cooley

• Connect to Cooley/Theta
• Get an allocation first
  • qsub -A ATPESC2021 -n 4 -q debug-flat-quad -I (Theta)
  • qsub -A ATPESC2021 -n 4 -q training -I (Cooley)
• module load totalview (Theta)
• soft add +totalview (Cooley)
• totalview -args mpiexec -np <N> demoMpi_v2
• tvconnect mpiexec -np <N> demoMpi_v2

• Installed at /soft/debuggers/totalview-2021-08-04/toolworks/totalview.2021X.3.756/bin/totalview

TotalView Resources and Documentation

• TotalView website
  • https://totalview.io
• TotalView documentation
  • https://help.totalview.io
• TotalView Video Tutorials
  • https://totalview.io/support/video-tutorials
• Other Resources
  • Blog: https://totalview.io/blog

Summary

• Using a modern debugger saves you time
• TotalView can help you because
  • It's cross-platform (the only debugger you ever need)
  • It allows you to debug accelerators (GPU) and CPU in one session
  • It allows you to debug multiple languages (C++/Python/Fortran)

Page 8: Techniques for Debugging HPC Applications

totalviewio8 | TotalView by Perforce copy Perforce Software Inc

Process and Threads View

totalviewio9 | TotalView by Perforce copy Perforce Software Inc

Source View

totalviewio10 | TotalView by Perforce copy Perforce Software Inc

Call Stack View and Local Variables View

totalviewio11 | TotalView by Perforce copy Perforce Software Inc

Action Points View

totalviewio12 | TotalView by Perforce copy Perforce Software Inc

Data View Command Line View and InputOutput View

Action Points

totalviewio16 | TotalView by Perforce copy Perforce Software Inc

Breakpoint

Evaluation Point (Evalpoint)

Watchpoint

Barrier point

Action Points

totalviewio17 | TotalView by Perforce copy Perforce Software Inc

Setting Breakpoints

bull Setting action points

bull Single-click line number

bull Right clicking on the line

number and using the

context menu

bull Clicking a line in the source

view then selecting the

Action Points -gt Set

breakpoint menu option

totalviewio18 | TotalView by Perforce copy Perforce Software Inc

bull Breakpoint-gtAt Locationhellip

bull Specify function name or line number

bull If function name TotalView sets a breakpoint at

first executable line in the function

Setting Breakpoints

totalviewio20 | TotalView by Perforce copy Perforce Software Inc

Evaluation points

bull Use Eval points to

bull Include instructions that stop a process and its relatives

bull Test potential fixes or patches for your program

bull Include a goto for C or Fortran that transfers control to a

line number in your program

bull Execute a TotalView function

bull Set the values of your programrsquos variables

Evaluation Points: Examples

• Print the value of a variable to the command line

    printf("The value of result is %d\n", result);

• Skip some code

    goto 63;

• Stop a loop after a certain number of iterations

    if ( (i % 100) == 0) {
        printf("The value of i is %d\n", i);
        $stop;
    }

See "Using Built-in Statements" in Appendix A of the User Guide for more information on "$" expressions:
https://help.totalview.io/current/HTML/index.html#page/TotalView/BuiltInStatments.html#ww1894979

Watchpoints

• Watchpoints are set on a specific memory location
• Execution stops when the value stored in that memory location changes
• A breakpoint stops before an instruction executes; a watchpoint stops after an instruction executes
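The "stops after the instruction executes" behavior can be illustrated with a small model. This is only a sketch of the semantics in plain Python, not how TotalView implements watchpoints; the `WatchedCell` class and its `hits` list are hypothetical names invented for the illustration.

```python
class WatchedCell:
    """A single memory cell that reports writes which change its value."""

    def __init__(self, value):
        self._value = value
        self.hits = []  # (old, new) pairs, recorded after each changing write

    def write(self, new_value):
        old_value = self._value
        self._value = new_value        # the store completes first...
        if new_value != old_value:     # ...and only then is the change noticed
            self.hits.append((old_value, new_value))

    def read(self):
        return self._value


cell = WatchedCell(0)
for v in (0, 5, 5, 9):
    cell.write(v)
```

Writes that store the same value again do not trigger, which matches the slide's wording that execution stops when the stored value changes.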

Barrier Breakpoints

• Used to synchronize a group of threads or processes defined in the action point
• Threads or processes are held at the barrier point until all threads or processes in the group arrive
• When all threads or processes have arrived, the barrier is satisfied and the threads or processes are released
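The hold-until-everyone-arrives rule is the same one a program-level barrier follows. The sketch below uses Python's standard `threading.Barrier` as an analogy; it shows the synchronization pattern only and says nothing about how the debugger holds threads. The worker function and the `arrived`/`released` lists are made up for the example.

```python
import threading

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)
arrived = []
released = []
lock = threading.Lock()


def worker(rank):
    with lock:
        arrived.append(rank)
    barrier.wait()  # held here until all NUM_WORKERS threads have arrived
    with lock:
        # By the time any thread is released, every thread has arrived.
        released.append((rank, len(arrived)))


threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every entry in `released` records a full arrival count, because no thread gets past `barrier.wait()` until the last one reaches it.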

Saving Breakpoints

• From the Action Points menu, select Save or Save As to save breakpoints
• Turn on the option to save action points on exit

Examining and Editing Data

Call Stack and Local Variables

Call Stack View
• Lists the set of call frames as the program calls from one function or method to another
• The Filter button turns filtering of frames on or off

Local Variables View
• Displays local variables relative to the current thread of interest and the selected stack frame
• Organized by arguments and blocks
• To edit values, add the variable to the Data View

The Data View Panel

• Data View allows deeper exploration of data structures
• Edit data values
• Cast to new data types
• Add data to the Data View using the context menu or by dragging and dropping

Figure labels: Context menu; Drag and drop

The Data View Panel – Expanding Arrays and Structures

Select the right arrow to display the substructures in a complex variable.

Any nested structures are displayed in the Data View.

The Data View – Dive in All

• Use Dive in All to easily see each member of a data structure from an array of structures

The Data View Panel – Entering Expressions

Enter a new expression in the Data View panel to view that data.

• Type the expression in the [Add New Expression] field
• A new expression is added
• Example shown on the slide: increment a variable

The Data View Panel – Casting

• Casting to another type
• Cast a variable into an array by adding the array specifier
• TotalView displays the array
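What the cast changes is the interpretation of the address, not the data behind it. The sketch below shows the same idea with Python's standard `ctypes` module: a pointer to one `int` is reinterpreted as a pointer to an `int[5]`. The values and the length 5 are arbitrary, and the exact type string you would type in the Data View is not shown on the slide, so none is assumed here.

```python
import ctypes

# A block of five ints, as a C program might obtain from malloc().
storage = (ctypes.c_int * 5)(10, 20, 30, 40, 50)

# What a debugger sees at first: a plain int* to the first element.
ptr = ctypes.cast(storage, ctypes.POINTER(ctypes.c_int))
first_only = ptr[0]

# Reinterpreting the same address as int[5] exposes every element.
as_array = ctypes.cast(ptr, ctypes.POINTER(ctypes.c_int * 5)).contents
values = list(as_array)
```

The array length has to come from you: the pointer itself carries no size, which is why the debugger needs the array specifier added to the type.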

Extending Debugging Capabilities: How to Debug (AI) Mixed Python/C++ Code

Mixed Language Python Debugging

• Debugging one language is difficult enough
  • Understanding the flow of execution across language barriers is hard
  • Examining and comparing data in both languages is challenging
• What TotalView provides
  • Easy Python debugging session setup
  • Fully integrated Python and C/C++ call stack
  • "Glue" layers between the languages removed
  • Easily examine and compare variables in Python and C++
  • Modest system requirements
  • Utilize reverse debugging and memory debugging
• What TotalView does not provide (yet)
  • Setting breakpoints and stepping within Python code

Python debugging with TotalView (demo)

    #!/usr/bin/python

    def callFact():
        import tv_python_example as tp
        a = 3
        b = 10
        c = a+b
        ch = "local string"
        # …… (elided on the slide)
        return tp.fact(a)

    if __name__ == '__main__':
        b = 2
        result = callFact()
        print(result)

Launch the script under TotalView:

    totalview -args python test_python_types.py

Remote Debugging - TotalView Remote UI

TotalView Remote UI

• Combines the convenience of establishing a remote connection to a cluster with the ability to run the TotalView GUI locally
• Front-end GUI architecture does not need to match the back-end target architecture (macOS front-end -> Linux back-end)
• Secure communications
• Convenient saved sessions
• Once connected, debug as normal with access to all TotalView features
• Front-end GUI currently supports macOS and Linux x86/x86-64; a Windows client is coming


Remote UI Architecture

TotalView Reverse Connections

Reverse Connection Flow

Diagram (six animation frames of one figure): a FRONT-END NODE running the TotalView UI, a BATCH NODE running tvconnect, srun, and tvdsvr, and COMPUTE NODES running the application ranks. Requests and responses pass through $HOME/.totalview/connect.

1. tvconnect writes request
2. TotalView UI reads request
3. TotalView returns response
4. tvconnect reads response
5. exec
6. Socket connection opened
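Steps 1 to 4 of the flow amount to a request/response exchange through a shared directory. The sketch below models just that exchange in Python, using a temporary directory in place of `$HOME/.totalview/connect`. The file names (`job-0001.request`, `.response`), the file contents, and the address are all invented for illustration; the real tvconnect file format is not described on the slides. Steps 5 and 6 (the exec and the socket connection) are not modelled.

```python
import tempfile
from pathlib import Path


def write_request(connect_dir: Path, job_command: str) -> Path:
    """Step 1: the tvconnect side drops a request file into the shared directory."""
    request = connect_dir / "job-0001.request"  # hypothetical file name
    request.write_text(job_command)
    return request


def answer_requests(connect_dir: Path, listen_address: str) -> int:
    """Steps 2-3: the UI side reads each request and writes a response beside it."""
    answered = 0
    for request in sorted(connect_dir.glob("*.request")):
        request.with_suffix(".response").write_text(listen_address)
        answered += 1
    return answered


def read_response(request: Path) -> str:
    """Step 4: the tvconnect side learns where the debugger server should connect."""
    return request.with_suffix(".response").read_text()


with tempfile.TemporaryDirectory() as tmp:
    connect_dir = Path(tmp)  # stands in for $HOME/.totalview/connect
    req = write_request(connect_dir, "srun -n 2 hybrid_fib")
    handled = answer_requests(connect_dir, "frontend.example:4142")
    address = read_response(req)
```

Exchanging files in a directory both sides can see means the batch job never needs an inbound connection from the front end, which is the point of connecting in reverse.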

Batch Script Submission with Reverse Connect

• Start a debugging session using TotalView Reverse Connect
• Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run
• Enables running the TotalView UI on the front-end node while remotely debugging jobs executing on the compute nodes
• Very easy to use: simply prefix the job launch or application start with the "tvconnect" command

    #!/bin/bash
    #SBATCH -J hybrid_fib
    # … (further directives elided on the slide)
    #SBATCH -n 2
    #SBATCH -c 4
    #SBATCH --mem-per-cpu=4000
    export OMP_NUM_THREADS=4

    tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks, Heap Status, and Identifying Dangling Pointers

What is a Memory Bug?

• A memory bug is a mistake in the management of heap memory
  • Leaking: failure to free memory
  • Dangling references: failure to clear pointers
  • Failure to check for error conditions
  • Memory corruption
    • Writing to memory not allocated
    • Overrunning array bounds

TotalView Heap Interposition Agent (HIA) Technology

• Advantages of TotalView HIA technology
  • Use it with your existing builds
    • No source code or binary instrumentation
  • Programs run at nearly full speed
    • Low performance overhead
  • Low memory overhead
    • Efficient memory usage

Diagram labels: TotalView; Process; User Code and Libraries; Malloc API; Heap Interposition Agent (HIA); Allocation Table; Deallocation Table
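The two tables in the diagram suggest how interposition can find leaks and dangling pointers without instrumenting the program: every allocation and free passes through the agent, which keeps a record of each. The Python sketch below is a toy model of that bookkeeping, not the HIA's actual data structures. `TrackingHeap`, its methods, the fake addresses, and the `main.c:NN` call sites are all hypothetical.

```python
class TrackingHeap:
    """Toy allocator wrapper that records every allocation and every free."""

    def __init__(self):
        self._next_address = 0x1000
        self.allocated = {}    # address -> (size, where it was allocated)
        self.deallocated = {}  # address -> (size, where it was freed)

    def malloc(self, size, site):
        address = self._next_address
        self._next_address += size  # bump allocation; addresses are never reused
        self.allocated[address] = (size, site)
        return address

    def free(self, address, site):
        if address not in self.allocated:
            raise ValueError(f"free of unknown or already freed block {address:#x}")
        size, _ = self.allocated.pop(address)
        self.deallocated[address] = (size, site)

    def leaks(self):
        """Blocks that were never freed: candidates for a leak report."""
        return sorted(self.allocated.items())

    def is_dangling(self, pointer):
        """True if the pointer refers to a block that has already been freed."""
        return pointer in self.deallocated


heap = TrackingHeap()
a = heap.malloc(64, "main.c:10")
b = heap.malloc(32, "main.c:11")
heap.free(a, "main.c:20")

leak_report = heap.leaks()
dangling = heap.is_dangling(a)
```

Here `b` appears in the leak report because it is still in the allocation table at the end, and `a` is flagged as dangling because its address sits in the deallocation table.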

Memory Debugging Features – MemoryScape TotalView

TotalView New UI Features
• Leak detection
• Heap status
• Dangling pointer detection

Coming Features
• Memory error events
• Memory corruption detection
• Memory block painting
• Memory hoarding
• Memory comparisons between processes

TotalView Reverse Debugging

Reverse Debugging with TotalView

• Reverse debugging lets developers go back in execution history
• Activated either before the program starts running or at some point after execution begins
• Captures and deterministically replays execution
• Enables stepping backwards and forwards by function, line, or instruction
• Run backwards to breakpoints
• Run backwards and stop when a variable changes value
• Save recording files for later analysis or collaboration
  • For a remote connection, use the CLI: dhistory -save <name>
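A tiny model helps make "run backwards and stop when a variable changes value" concrete. The sketch below stores a full snapshot of the state at every step, which is a deliberate simplification: nothing on the slides says TotalView records execution this way. The `Recording` class and its method names are hypothetical.

```python
class Recording:
    """Snapshots of program state, one per executed step."""

    def __init__(self):
        self.history = []
        self.position = -1

    def record(self, state):
        self.history.append(dict(state))
        self.position = len(self.history) - 1

    def step_back(self):
        """Move one step backwards (stopping at the first recorded step)."""
        if self.position > 0:
            self.position -= 1
        return self.history[self.position]

    def run_back_until_changed(self, name):
        """Move backwards to the step that gave `name` its current value."""
        current = self.history[self.position][name]
        while self.position > 0 and self.history[self.position - 1][name] == current:
            self.position -= 1
        return self.position


rec = Recording()
total = 0
for i in range(5):
    if i == 2:
        total = 7  # the write we want to find later
    rec.record({"i": i, "total": total})

where = rec.run_back_until_changed("total")
state = rec.history[where]
```

Starting from the last step, the search lands on the iteration where `total` was assigned, without re-running the program or guessing where to put a breakpoint.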

Reverse Debugging Controls

• Run forward / Run backwards
• Next forward over functions / Next backwards over functions
• Step forward into functions / Step backwards into functions
• Advance forward out of function call / Advance backwards to calling function
• Advance forward to selected line / Advance backward to selected line
• Advance to "live" session
• Create a bookmark at this point in recorded history
• Save the recorded session

Debugging CUDA Applications with TotalView

TotalView for the NVIDIA® GPU Accelerator

• NVIDIA Tesla, Fermi, Kepler, Pascal, Volta, Turing, Ampere
• NVIDIA CUDA 9.2, 10, and 11
  • With support for Unified Memory
• Debugging 64-bit CUDA programs
• Features and capabilities include
  • Support for dynamic parallelism
  • Support for MPI-based clusters and multi-card configurations
  • Flexible display and navigation on the CUDA device
    • Physical (device, SM, warp, lane)
    • Logical (grid, block) tuples
  • CUDA device window reveals what is running where
  • Support for types and separate memory address spaces
  • Leverages CUDA memcheck

Source View Opened on CUDA Host Code

Breakpoint Set in CUDA Kernel Code Before Launch

A hollow breakpoint indicates that a breakpoint will be set when the code is loaded onto the GPU.

GPU Physical and Logical Toolbars

• The logical toolbar displays the Block and Thread coordinates
• The physical toolbar displays the Device number, Streaming Multiprocessor, Warp, and Lane
• To view a CUDA host thread, select a thread with a positive thread ID in the Process and Threads view
• To view a CUDA GPU thread, select a thread with a negative thread ID, then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread
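Part of the link between the logical and physical coordinates can be worked out by hand. Under the CUDA programming model, threads in a block are numbered in x-fastest order and grouped into warps of 32 consecutive threads, so a thread's warp within its block and its lane follow from its thread index. The sketch below computes that in Python, with helper names invented for the example. Which device and streaming multiprocessor a block lands on is decided at run time, so it is not computed here.

```python
WARP_SIZE = 32  # warp width on NVIDIA GPUs to date


def linear_thread_id(thread_idx, block_dim):
    """Flatten (x, y, z) thread coordinates in x-fastest order."""
    x, y, z = thread_idx
    dim_x, dim_y, _ = block_dim
    return x + y * dim_x + z * dim_x * dim_y


def warp_and_lane(thread_idx, block_dim):
    """Return (warp index within the block, lane within the warp)."""
    tid = linear_thread_id(thread_idx, block_dim)
    return tid // WARP_SIZE, tid % WARP_SIZE


# Thread (5, 3, 0) in a 16 x 16 x 1 block has linear id 5 + 3 * 16 = 53.
warp, lane = warp_and_lane((5, 3, 0), (16, 16, 1))
```

In a 16-wide block, each warp covers two rows of threads, so threads in adjacent rows can share a warp and diverge together.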

Displaying CUDA Program Elements

• The identifier @local is a TotalView built-in type storage qualifier that tells the debugger the storage kind of A is local storage
• The debugger uses the storage qualifier to determine how to locate A in device memory

Figure labels: the @local type qualifier indicates that variable A is in local storage; "elements" is a pointer to a float in generic storage.

Using TotalView for Parallel Debugging on ANL

TotalView Remote Debugging on Linux and macOS

• Download and install TotalView on your Linux or Mac machine
• Connect to the remote front-end node
• Run the labs remotely

Hands-on Labs

• Install TV from the installers on Mac or Linux
  • Ignore the license code
• Start TotalView
• Remotely connect to Cooley and enable Reverse Connection

Labs

• Lab 1: Debugger Basics
• Lab 2: Viewing, Examining, Watching, and Editing Data
• Lab 3: Examining and Controlling a Parallel Application (on Cooley)
  • Using remote connect (tvconnect)
  • qsub -q training tvconnect.job
  • Modify and submit tvconnect.job on your machine

TotalView Is Available on Theta and Cooley

• Connect to Cooley/Theta
• Get an allocation first
  • qsub -A ATPESC2021 -n 4 -q debug-flat-quad -I (Theta)
  • qsub -A ATPESC2021 -n 4 -q training -I (Cooley)
• module load totalview (Theta)
• soft add +totalview (Cooley)
• totalview -args mpiexec -np <N> demoMpi_v2
• tvconnect mpiexec -np <N> demoMpi_v2
• Installed at /soft/debuggers/totalview-2021-08-04/toolworks/totalview.2021X.3.756/bin/totalview

TotalView Resources and Documentation

• TotalView website
  • https://totalview.io
• TotalView documentation
  • https://help.totalview.io
• TotalView video tutorials
  • https://totalview.io/support/video-tutorials
• Other resources
  • Blog: https://totalview.io/blog

Summary

• Using a modern debugger saves you time
• TotalView can help you because
  • It's cross-platform (the only debugger you ever need)
  • It allows you to debug accelerators (GPU) and CPU in one session
  • It allows you to debug multiple languages (C++/Python/Fortran)


Page 10: Techniques for Debugging HPC Applications

totalviewio10 | TotalView by Perforce copy Perforce Software Inc

Call Stack View and Local Variables View

totalviewio11 | TotalView by Perforce copy Perforce Software Inc

Action Points View

totalviewio12 | TotalView by Perforce copy Perforce Software Inc

Data View Command Line View and InputOutput View

Action Points

totalviewio16 | TotalView by Perforce copy Perforce Software Inc

Breakpoint

Evaluation Point (Evalpoint)

Watchpoint

Barrier point

Action Points

Setting Breakpoints

• Setting action points
  • Single-click the line number
  • Right-click the line number and use the context menu
  • Click a line in the source view, then select the Action Points -> Set breakpoint menu option
• Breakpoint -> At Location…
  • Specify a function name or line number
  • If a function name is given, TotalView sets a breakpoint at the first executable line in the function

Evaluation Points

• Use evaluation points (evalpoints) to
  • Include instructions that stop a process and its relatives
  • Test potential fixes or patches for your program
  • Include a goto for C or Fortran that transfers control to a line number in your program
  • Execute a TotalView function
  • Set the values of your program's variables

Evaluation Points: Examples

• Print the value of a variable to the command line

    printf("The value of result is %d\n", result);

• Skip some code

    goto 63;

• Stop a loop after a certain number of iterations

    if ((i % 100) == 0) {
        printf("The value of i is %d\n", i);
        $stop;
    }

See "Using Built-in Statements" in Appendix A of the User Guide for more information on "$" expressions:

https://help.totalview.io/current/HTML/index.html#page/TotalView/BuiltInStatments.html#ww1894979
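The conditional-stop example can be modeled outside the debugger to see which iterations it would halt on. The sketch below is a plain Python analogy, not TotalView expression syntax: `evalpoint` and `run_loop` are hypothetical names, and returning `True` only stands in for `$stop`.

```python
def evalpoint(i, log):
    """Mimic the evalpoint body: on every 100th iteration, print and request a stop."""
    if i % 100 == 0:
        log.append(f"The value of i is {i}")
        return True   # stands in for $stop
    return False


def run_loop(n):
    """Run n iterations and collect the iterations at which the evalpoint fires."""
    log = []
    stops = []
    for i in range(1, n + 1):
        if evalpoint(i, log):
            stops.append(i)   # a real debugger would halt the process here
    return stops, log


stops, log = run_loop(350)
print(stops)
```

Because the condition is evaluated inside the target at the action point, iterations that do not match cost no user interaction.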

Watchpoints

• Watchpoints are set on a specific memory location
• Execution stops when the value stored in that memory location changes
• A breakpoint stops before an instruction executes; a watchpoint stops after an instruction executes
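The "stops after the write, and only when the value changes" behavior can be imitated in a few lines. This is only an analogy in Python, assuming a hypothetical `Watched` wrapper; it says nothing about how TotalView implements watchpoints internally.

```python
class Watched:
    """Holds one value and records every write that changes it."""

    def __init__(self, value):
        self._value = value
        self.hits = []          # (old, new) pairs, one per reported change

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        old = self._value
        self._value = new       # the write completes first...
        if new != old:          # ...then the change is noticed, as with a watchpoint
            self.hits.append((old, new))


w = Watched(0)
w.value = 0     # same value stored again: nothing reported
w.value = 5     # value changed: reported after the write
w.value = 7
print(w.hits)
```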

Barrier Breakpoints

• Used to synchronize a group of threads or processes defined in the action point
• Threads or processes are held at the barrier point until all threads or processes in the group arrive
• When all threads or processes have arrived, the barrier is satisfied and the threads or processes are released
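The hold-until-all-arrive semantics match an ordinary thread barrier, so a small sketch with Python's `threading.Barrier` may help. The `run_group` helper is hypothetical; it only illustrates the release rule, not the debugger's own mechanism.

```python
import threading


def run_group(n_threads):
    """Hold n_threads at a barrier; return how many had arrived when each was released."""
    barrier = threading.Barrier(n_threads)
    arrived = []
    seen_at_release = []
    lock = threading.Lock()

    def worker(rank):
        with lock:
            arrived.append(rank)
        barrier.wait()                       # held here until every thread has arrived
        with lock:
            seen_at_release.append(len(arrived))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen_at_release


print(run_group(4))
```

Every released thread observes that the whole group had already arrived, which is the property that makes barrier points useful for lining up ranks before inspecting them.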

Saving Breakpoints

• From the Action Points menu, select Save or Save As to save breakpoints
• Turn on the option to save action points on exit

Examining and Editing Data

Call Stack and Local Variables

Call Stack View
• Lists the set of call frames as the program calls from one function or method to another
• Filter button turns filtering of frames on or off

Local Variables View
• Displays local variables relative to the current thread of interest and the selected stack frame
• Organized by arguments and blocks
• To edit values, add the variable to the Data View
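The relationship between call frames and their local variables can be reproduced in a script. This sketch uses Python's standard `inspect` module, with hypothetical function names, to list the newest frames and the locals that belong to each; it is an illustration of the concept rather than of the TotalView views themselves.

```python
import inspect


def innermost(depth):
    """Return (function name, locals) for the newest `depth` call frames."""
    stack = inspect.stack()
    return [(fi.function, dict(fi.frame.f_locals)) for fi in stack[:depth]]


def middle(n):
    doubled = n * 2
    return innermost(3)


def outer():
    start = 21
    return middle(start)


frames = outer()
print([name for name, _ in frames])   # newest frame first
print(frames[1][1])                   # locals of the selected frame, here middle()
```

Selecting a different frame changes which locals are visible, which is why the Local Variables view is tied to the selected stack frame.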

The Data View Panel

• Data View allows deeper exploration of data structures
• Edit data values
• Cast to new data types
• Add data to the Data View using the context menu or by dragging and dropping

The Data View Panel – Expanding Arrays and Structures

• Select the right arrow to display the substructures in a complex variable
• Any nested structures are displayed in the Data View

The Data View – Dive in All

• Use Dive in All to easily see each member of a data structure from an array of structures
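In data terms, Dive in All projects one member out of every element of an array of structures. A minimal sketch of that projection, assuming a hypothetical `Particle` structure and `dive_in_all` helper:

```python
from dataclasses import dataclass


@dataclass
class Particle:
    x: float
    y: float
    mass: float


def dive_in_all(structs, member):
    """Return one member from every element of an array of structures."""
    return [getattr(s, member) for s in structs]


particles = [Particle(0.0, 1.0, 2.5), Particle(3.0, 4.0, 1.5)]
print(dive_in_all(particles, "mass"))
```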

The Data View Panel – Entering Expressions

Enter a new expression in the Data View panel to view that data

• Type the expression in the [Add New expression] field
• A new expression is added
• Increment a variable

The Data View Panel – Casting

• Casting to another type
• Cast a variable into an array by adding the array specifier
• TotalView displays the array
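Casting a pointer to an array type is what lets a debugger show all the elements behind it rather than just the first. The same reinterpretation can be sketched with Python's `ctypes`; the buffer contents and the `view_as_array` helper are made up for illustration.

```python
import ctypes


def view_as_array(ptr, count):
    """Reinterpret an int pointer as an array of `count` ints and copy the values out."""
    array_type = ctypes.c_int * count
    return list(ctypes.cast(ptr, ctypes.POINTER(array_type)).contents)


buf = (ctypes.c_int * 5)(10, 20, 30, 40, 50)
p = ctypes.cast(buf, ctypes.POINTER(ctypes.c_int))   # now looks like a plain int*

print(p[0])                  # a bare pointer shows only the first element
print(view_as_array(p, 5))   # with the array specifier, all five are visible
```

The element count is not stored in the pointer, so it has to be supplied by the person doing the cast, just as in the Data View.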

Extending Debugging Capabilities: How to Debug (AI) Mixed Python/C++ Code

Mixed Language Python Debugging

• Debugging one language is difficult enough
  • Understanding the flow of execution across language barriers is hard
  • Examining and comparing data in both languages is challenging
• What TotalView provides
  • Easy Python debugging session setup
  • Fully integrated Python and C/C++ call stack
  • "Glue" layers between the languages removed
  • Easily examine and compare variables in Python and C++
  • Modest system requirements
  • Utilize reverse debugging and memory debugging
• What TotalView does not provide (yet)
  • Setting breakpoints and stepping within Python code

Python debugging with TotalView (demo)

    #!/usr/bin/python

    def callFact():
        import tv_python_example as tp
        a = 3
        b = 10
        c = a+b
        ch = "local string"
        ……
        return tp.fact(a)

    if __name__ == '__main__':
        b = 2
        result = callFact()
        print result

Launch the script under the debugger:

    totalview -args python test_python_types.py

Remote Debugging - TotalView Remote UI

TotalView Remote UI

• Combine the convenience of establishing a remote connection to a cluster with the ability to run the TotalView GUI locally
• Front-end GUI architecture does not need to match back-end target architecture (macOS front-end -> Linux back-end)
• Secure communications
• Convenient saved sessions
• Once connected, debug as normal with access to all TotalView features
• Front-end GUI currently supports macOS and Linux x86/x86-64; a Windows client is coming


Remote UI Architecture

TotalView Reverse Connections

Reverse Connection Flow

[Diagram, built up over six slides: a front-end node, a batch node, and compute nodes, with the TotalView UI, the $HOME/.totalview/connect directory, tvconnect, srun, tvdsvr, and the application ranks.]

1. tvconnect writes request
2. TotalView UI reads request
3. TotalView returns response
4. tvconnect reads response
5. exec
6. Socket connection opened
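Steps 1 to 4 of the flow amount to a request/response exchange through a shared directory. The sketch below imitates that exchange with a temporary directory so it can be run anywhere. The file names, the JSON payload, the host name, and the port are all assumptions made for illustration; the actual file format used by tvconnect is not described on the slides. Steps 5 and 6 (the exec and the socket connection) are not modeled.

```python
import json
import tempfile
from pathlib import Path


def tvconnect_write_request(connect_dir, command):
    """Step 1: the batch-side helper drops a request file (hypothetical format)."""
    request = connect_dir / "request.json"
    request.write_text(json.dumps({"command": command}))
    return request


def ui_answer_request(connect_dir, host, port):
    """Steps 2-3: the UI reads the request and writes a response next to it."""
    request = json.loads((connect_dir / "request.json").read_text())
    response = {"command": request["command"], "host": host, "port": port}
    (connect_dir / "response.json").write_text(json.dumps(response))


def tvconnect_read_response(connect_dir):
    """Step 4: the helper reads the response and learns where to connect back."""
    return json.loads((connect_dir / "response.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    connect_dir = Path(tmp)   # stands in for the shared connect directory
    tvconnect_write_request(connect_dir, ["srun", "-n", "2", "hybrid_fib"])
    ui_answer_request(connect_dir, host="frontend", port=4142)
    reply = tvconnect_read_response(connect_dir)

print(reply["host"], reply["port"])
```

Exchanging files through a directory visible to both sides means the batch job never needs an inbound connection from the front-end node; the connection is opened from the job side once the response is read.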

Batch Script Submission with Reverse Connect

• Start a debugging session using TotalView Reverse Connect
• Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once it runs
• Enables running the TotalView UI on the front-end node while remotely debugging jobs executing on the compute nodes
• Very easy to use: simply prefix the job launch or application start with the "tvconnect" command

    #!/bin/bash
    #SBATCH -J hybrid_fib
    …
    #SBATCH -n 2
    #SBATCH -c 4
    #SBATCH --mem-per-cpu=4000
    export OMP_NUM_THREADS=4

    tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks, Heap Status, and Identifying Dangling Pointers

What is a Memory Bug?

• A memory bug is a mistake in the management of heap memory
  • Leaking: failure to free memory
  • Dangling references: failure to clear pointers
  • Failure to check for error conditions
• Memory corruption
  • Writing to memory not allocated
  • Overrunning array bounds

TotalView Heap Interposition Agent (HIA) Technology

• Advantages of TotalView HIA technology
  • Use it with your existing builds
  • No source code or binary instrumentation
  • Programs run at nearly full speed
  • Low performance overhead
  • Low memory overhead
  • Efficient memory usage

[Diagram labels: Process, User Code and Libraries, Malloc API, Heap Interposition Agent (HIA), Allocation Table, Deallocation Table, TotalView.]
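The idea of an agent that sits in front of the allocator and keeps allocation and deallocation tables can be sketched as a toy model. Everything here is hypothetical: `HeapAgent` hands out fake addresses and is written in Python purely to show how the two tables make leaks and dangling pointers detectable. It is not the HIA's actual data layout.

```python
class HeapAgent:
    """Stands between 'user code' and the allocator and records every call."""

    def __init__(self):
        self._next_addr = 0x1000
        self.allocations = {}     # address -> size, blocks still live
        self.deallocations = {}   # address -> size, blocks already freed

    def malloc(self, size):
        addr = self._next_addr    # fake address; no real memory is reserved
        self._next_addr += size
        self.allocations[addr] = size
        return addr

    def free(self, addr):
        if addr not in self.allocations:
            raise ValueError(f"free of unknown or already freed block {addr:#x}")
        self.deallocations[addr] = self.allocations.pop(addr)

    def leaks(self):
        """Blocks that were allocated but never freed."""
        return dict(self.allocations)

    def is_dangling(self, addr):
        """True if addr refers to a block that has already been freed."""
        return addr in self.deallocations


agent = HeapAgent()
a = agent.malloc(16)
b = agent.malloc(32)
agent.free(a)

print(agent.leaks() == {b: 32})   # b was never freed, so it is reported as a leak
print(agent.is_dangling(a))       # a still holds the address of a freed block
```

Because the bookkeeping happens at the allocator boundary, the user code itself needs no changes, which is the point of the "no source code or binary instrumentation" bullet.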

Memory Debugging Features – MemoryScape TotalView

TotalView New UI Features
• Leak detection
• Heap Status
• Dangling pointer detection

Coming Features
• Memory Error Events
• Memory Corruption Detection
• Memory Block Painting
• Memory Hoarding
• Memory Comparisons between processes

TotalView Reverse Debugging

Reverse Debugging with TotalView

• Reverse debugging lets developers go back in execution history
• Activated either before the program starts running or at some point after execution begins
• Captures and deterministically replays execution
• Enables stepping backwards and forwards by function, line, or instruction
• Run backwards to breakpoints
• Run backwards and stop when a variable changes value
• Save recording files for later analysis or collaboration
  • For a remote connection, use the CLI: dhistory -save <name>
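"Run backwards and stop when a variable changes value" is easiest to picture as a search through recorded history. The toy `Recorder` below stores a snapshot of the variables at each step and walks backwards through them. This is a deliberate simplification with hypothetical names: a replay engine records and re-executes the program rather than snapshotting variables.

```python
class Recorder:
    """Records variable snapshots so earlier points in a run can be revisited."""

    def __init__(self):
        self.history = []   # one dict of variables per executed step
        self.pos = -1       # index of the step currently being viewed

    def record(self, **variables):
        self.history.append(dict(variables))
        self.pos = len(self.history) - 1

    def step_back(self):
        """Move one step back in recorded history (stays at the first step)."""
        if self.pos > 0:
            self.pos -= 1
        return self.history[self.pos]

    def run_back_until_changed(self, name):
        """Move backwards to the step at which `name` received its current value."""
        current = self.history[self.pos][name]
        while self.pos > 0 and self.history[self.pos - 1][name] == current:
            self.pos -= 1
        return self.pos


rec = Recorder()
total = 0
for i in range(5):
    if i == 2:
        total = 10          # the write we want to find after the fact
    rec.record(i=i, total=total)

print(rec.run_back_until_changed("total"))   # index of the step where total became 10
print(rec.step_back())                       # one step earlier, total still has its old value
```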

Reverse Debugging Controls

• Run forward / Run backwards
• Next forward over functions / Next backwards over functions
• Step forward into functions / Step backwards into functions
• Advance forward out of function call / Advance backwards to calling function
• Advance forward to selected line / Advance backward to selected line
• Advance to "live" session
• Create a bookmark at this point in recorded history
• Save the recorded session

Debugging CUDA Applications with TotalView

TotalView for the NVIDIA® GPU Accelerator

• NVIDIA Tesla, Fermi, Kepler, Pascal, Volta, Turing, Ampere
• NVIDIA CUDA 9.2, 10, and 11
  • With support for Unified Memory
• Debugging 64-bit CUDA programs
• Features and capabilities include
  • Support for dynamic parallelism
  • Support for MPI-based clusters and multi-card configurations
  • Flexible display and navigation on the CUDA device
    • Physical (device, SM, warp, lane)
    • Logical (grid, block) tuples
  • CUDA device window reveals what is running where
  • Support for types and separate memory address spaces
  • Leverages CUDA memcheck

Source View Opened on CUDA Host Code

Breakpoint Set in CUDA Kernel Code Before Launch

• A hollow breakpoint indicates that a breakpoint will be set when the code is loaded onto the GPU

GPU Physical and Logical Toolbars

• Logical toolbar displays the Block and Thread coordinates
• Physical toolbar displays the Device number, Streaming Multiprocessor, Warp, and Lane
• To view a CUDA host thread, select a thread with a positive thread ID in the Process and Threads view
• To view a CUDA GPU thread, select a thread with a negative thread ID, then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread
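Part of the link between the logical and physical coordinates is fixed by the CUDA execution model: threads in a block are linearized with x varying fastest and grouped into warps of 32 consecutive threads. The sketch below computes the warp (within the block) and lane for a logical thread coordinate. The helper name is hypothetical, and the device and SM numbers are not computed because they are assigned by the hardware at run time.

```python
WARP_SIZE = 32  # threads per warp on current NVIDIA GPUs


def warp_and_lane(thread_idx, block_dim):
    """Map a logical thread coordinate (x, y, z) within a block to (warp, lane)."""
    x, y, z = thread_idx
    dim_x, dim_y, _ = block_dim
    linear = x + y * dim_x + z * dim_x * dim_y   # x varies fastest
    return linear // WARP_SIZE, linear % WARP_SIZE


# Thread (5, 2, 0) in a 16x16x1 block has linear index 5 + 2*16 = 37.
print(warp_and_lane((5, 2, 0), (16, 16, 1)))
```

Knowing which threads share a warp helps when a breakpoint in divergent code stops only some lanes.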

Displaying CUDA Program Elements

• The identifier @local is a TotalView built-in type storage qualifier that tells the debugger the storage kind of A is local storage
• The debugger uses the storage qualifier to determine how to locate A in device memory
• The @local type qualifier indicates that variable A is in local storage
• "elements" is a pointer to a float in generic storage

Using TotalView for Parallel Debugging on ANL

TotalView remote debugging on Linux and macOS

• Download and install TotalView on your Linux or Mac machine
• Connect to the remote front-end node
• Run labs remotely

Hands-on labs

• Install TV from installers on Mac or Linux
  • Ignore license code
• Start TotalView
• Remotely connect to Cooley and enable Reverse Connection

Labs

• Lab 1: Debugger Basics
• Lab 2: Viewing, Examining, Watching, and Editing Data
• Lab 3: Examining and Controlling a Parallel Application (on Cooley)
  • Using remote connect (tvconnect)
  • qsub -q training tvconnectjob
  • Modify and submit tvconnectjob on your machine

TotalView is available on Theta and Cooley

• Connect to Cooley/Theta
• Get an allocation first
  • qsub -A ATPESC2021 -n 4 -q debug-flat-quad -I (Theta)
  • qsub -A ATPESC2021 -n 4 -q training -I (Cooley)
• module load totalview (Theta)
• soft add +totalview (Cooley)
• totalview -args mpiexec -np <N> demoMpi_v2
• tvconnect mpiexec -np <N> demoMpi_v2
• Installed at /soft/debuggers/totalview-2021-08-04/toolworks/totalview.2021X.3.756/bin/totalview

TotalView Resources and Documentation

• TotalView website
  • https://totalview.io
• TotalView documentation
  • https://help.totalview.io
• TotalView Video Tutorials
  • https://totalview.io/support/video-tutorials
• Other Resources
  • Blog: https://totalview.io/blog

Summary

• Using a modern debugger saves you time
• TotalView can help you because
  • It's cross-platform (the only debugger you ever need)
  • It allows you to debug accelerators (GPU) and CPU in one session
  • It allows you to debug multiple languages (C++/Python/Fortran)

TotalView by Perforcecopy 2019 Perforce Software IncTotalView by Perforce copy Perforce Software Inc

Page 11: Techniques for Debugging HPC Applications

totalviewio11 | TotalView by Perforce copy Perforce Software Inc

Action Points View

totalviewio12 | TotalView by Perforce copy Perforce Software Inc

Data View Command Line View and InputOutput View

Action Points

totalviewio16 | TotalView by Perforce copy Perforce Software Inc

Breakpoint

Evaluation Point (Evalpoint)

Watchpoint

Barrier point

Action Points

totalviewio17 | TotalView by Perforce copy Perforce Software Inc

Setting Breakpoints

bull Setting action points

bull Single-click line number

bull Right clicking on the line

number and using the

context menu

bull Clicking a line in the source

view then selecting the

Action Points -gt Set

breakpoint menu option

totalviewio18 | TotalView by Perforce copy Perforce Software Inc

bull Breakpoint-gtAt Locationhellip

bull Specify function name or line number

bull If function name TotalView sets a breakpoint at

first executable line in the function

Setting Breakpoints

totalviewio20 | TotalView by Perforce copy Perforce Software Inc

Evaluation points

bull Use Eval points to

bull Include instructions that stop a process and its relatives

bull Test potential fixes or patches for your program

bull Include a goto for C or Fortran that transfers control to a

line number in your program

bull Execute a TotalView function

bull Set the values of your programrsquos variables

totalviewio21 | TotalView by Perforce copy Perforce Software Inc

bull Print the value of a variable to the command line

printf(The value of result is dn result)

bull Skip some code

goto 63

bull Stop a loop after a certain number of iterations

if ( (i 100) == 0)

printf(The value of i is dn i)

$stop

See ldquoUsing Built-in Statementsrdquo in Appendix A of the User Guide for more information on ldquo$rdquo expressions

httpshelptotalviewiocurrentHTMLindexhtmlpageTotalViewBuiltInStatmentshtmlww1894979

Evaluation points Examples

totalviewio22 | TotalView by Perforce copy Perforce Software Inc

bull Watchpoints are set on a specific memory location

bull Execution is stopped when the value stored in that memory location changes

bull A breakpoint stops before an instruction executes A watchpoint stops after an instruction executes

Watchpoints

totalviewio23 | TotalView by Perforce copy Perforce Software Inc

bull Used to synchronize a group of threads or processes defined in the action point

bull Threads or processes are held at barrierpoint until all threads or processes in the group arrive

bull When all threads or processes arrive the barrier is satisfied and the threads or processes are released

Barrier Breakpoints

totalviewio24 | TotalView by Perforce copy Perforce Software Inc

Saving Breakpoints

From the Action Points menu select Save or Save As to save breakpointsTurn on option to save action points on exit

Examining and Editing Data

totalviewio26 | TotalView by Perforce copy Perforce Software Inc

Call Stack and Local Variables

Call Stack Viewbull Lists the set of call frames as the

program calls from one function or method to another

bull Filter button used to turn on or off filtering of frames

Local Variables Viewbull Displays local variables relative to the

current thread of interest and the selected stack frame

bull Organized by arguments and blocksbull To edit values add variable to the Data

View

totalviewio27 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel

bull Data View allows deeper exploration of data structures

bull Edit data valuesbull Cast to new data typesbull Add data to the Data View using the context

menu or by dragging and dropping

Context menu

Drag and drop

totalviewio28 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel ndash Expanding Arrays and Structures

Select the right arrow to display the substructures in a complex variable

Any nested structures are displayed in the data view

totalviewio29 | TotalView by Perforce copy Perforce Software Inc

bull Dive in All

bull Use Dive in All to easily see each member of a data structure from an array of structures

The Data View ndash Dive in All

totalviewio30 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel ndash Entering Expressions

Enter a new expression in the Data View panel to view that data

A new expression is added

Increment a variable

Type the expression in the [Add New expression] field

totalviewio32 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel - Casting

Casting to another type

TotalView displays the array

Cast a variable into an array by adding the array specifier

Extending Debugging Capabilities How to Debug (AI) Mixed PythonC++ Code

totalviewio34 | TotalView by Perforce copy Perforce Software Inc

bull Debugging one language is difficult enough

bull Understanding the flow of execution across language barriers is hard

bull Examining and comparing data in both languages is challenging

bull What TotalView provides

bull Easy python debugging session setup

bull Fully integrated Python and CC++ call stack

bull rdquoGluerdquo layers between the languages removed

bull Easily examine and compare variables in Python and C++

bull Modest system requirements

bull Utilize reverse debugging and memory debugging

bull What TotalView does not provide (yet)

bull Setting breakpoints and stepping within Python code

Mixed Language Python Debugging

totalviewio35 | TotalView by Perforce copy Perforce Software Inc

Python debugging with TotalView (demo)

usrbinpython

def callFact()import tv_python_example as tpa = 3b = 10c = a+bch = ldquolocal stringrdquohelliphellip

return tpfact(a)if __name__ == __main__rsquo

b = 2result = callFact()print result

totalview -args python test_python_typespy

Remote Debugging - TotalView Remote UI

roguewavecom38 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Combine the convenience of establishing a remote

connection to a cluster and the ability to run the

TotalView GUI locally

bull Front-end GUI architecture does not need to match back-

end target architecture (macOS front-end -gt Linux back-

end)

bull Secure communications

bull Convenient saved sessions

bull Once connected debug as normal with access to all

TotalView features

bull Front-end GUI currently supports macOS and Linux

x86x86-64 Windows client is coming

TotalView Remote UI

roguewavecom39 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Remote UI Architecture

TotalView Reverse Connections

roguewavecom41 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

roguewavecom42 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

2 TotalView UI reads request

3 TotalView returns response

roguewavecom43 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

2 TotalView UI reads request

3 TotalView returns response

roguewavecom44 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

roguewavecom45 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

roguewavecom46 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

v TotalView UI

2 TotalView UI reads request

3 TotalView returns response

tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

6 socket connection opened

roguewavecom47 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Start a debugging session using TotalView Reverse Connect

bull Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run

bull Enables running TotalView UI on the front-end node and remotely

debug jobs executing on the compute nodes

bull Very easy to utilize simply prefix job launch or application start

with ldquotvconnectrdquo command

Batch Script Submission with Reverse Connect

binbashSBATCH -J hybrid_fibhellipSBATCH -n 2SBATCH -c 4SBATCH --mem-per-cpu=4000export OMP_NUM_THREADS=4

tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks Heap Status and Identifying Dangling Pointers

roguewavecom50 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull A Memory Bug is a mistake in the management of heap memory

bull Leaking Failure to free memory

bull Dangling references Failure to clear pointers

bull Failure to check for error conditions

bull Memory Corruption

bull Writing to memory not allocated

bull Overrunning array bounds

What is a Memory Bug

roguewavecom51 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Advantages of TotalView HIA Technology

bull Use it with your existing builds

bull No Source Code or Binary Instrumentation

bull Programs run nearly full speed

bull Low performance overhead

bull Low memory overhead

bull Efficient memory usage

TotalView Heap Interposition Agent (HIA) Technology

Malloc API

User Code and Libraries

Process

TotalView

Heap Interposition

Agent (HIA)Allocation

Table

Deallocation

Table

roguewavecom52 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

TotalView New UI Features

bull Leak detection

bull Heap Status

bull Dangling pointer detection

Coming Features

bull Memory Error Events

bull Memory Corruption Detection

bull Memory Block Painting

bull Memory Hoarding

bull Memory Comparisons between processes

Memory Debugging Features ndash MemoryScape TotalView

TotalView Reverse Debugging

roguewavecom54 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Reverse debugging provides the ability for developers to go back in execution history

bull Activated either before program starts running or at some point after execution begins

bull Capturing and deterministically replay execution

bull Enables stepping backwards and forward by function line or instruction

bull Run backwards to breakpoints

bull Run backwards and stop when a variable changes value

bull Saving recording files for later analysis or collaboration

bull For remote connection use CLI dhistory ndashsave ltnamegt

Reverse Debugging with TotalView

roguewavecom55 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Reverse Debugging Controls

Run forward-

Run backwards

Next forward over functions

-Next backwards over functions

Step forward into functions

-Step backwards into

functions

Advance forward out of function call

-Advance backwards to

calling function

Advance forward to selected line

-Advance backward to

selected line

Advance to ldquoliverdquo session

Create a bookmark at this point in recorded history

Save the recorded session

Debugging CUDA Applications with TotalView

roguewavecom59 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull NVIDIA Tesla Fermi Kepler Pascal Volta Turing Ampere

bull NVIDIA CUDA 92 10 and 11

bull With support for Unified Memory

bull Debugging 64-bit CUDA programs

bull Features and capabilities include

bull Support for dynamic parallelism

bull Support for MPI based clusters and multi-card configurations

bull Flexible Display and Navigation on the CUDA device

bull Physical (device SM Warp Lane)

bull Logical (Grid Block) tuples

bull CUDA device window reveals what is running where

bull Support for types and separate memory address spaces

bull Leverages CUDA memcheck

TotalView for the NVIDIA reg GPU Accelerator

roguewavecom60 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Source View Opened on CUDA host code

roguewavecom61 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Breakpoint Set in CUDA Kernel Code Before Launch

Hollow breakpoint indicates a breakpoint will be set when the code is loaded onto the GPU

roguewavecom62 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

GPU Physical and Logical Toolbars

Logical toolbar displays the Block and Thread coordinates

Physical toolbar displays the Device number Streaming Multiprocessor Warp and Lane

To view a CUDA host thread select a thread with a positive thread ID in the Process and Threads view

To view a CUDA GPU thread select a thread with a negative thread ID then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread

roguewavecom63 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull The identifier local is a TotalView built-in type storage qualifier that tells the debugger the storage kind of A is

local storage

bull The debugger uses the storage qualifier to determine how to locate A in device memory

Displaying CUDA Program Elementslocal type qualifier indicates that variable A is in local storage

ldquoelementsrdquo is a pointer to a float in generic storage

Using TotalView for Parallel Debugging on ANL

totalviewio65 | TotalView by Perforce copy Perforce Software Inc

TotalView remote debugging on Linux and Mac OS

bull Download and install TotalView on your linux or mac

bull Connect to remote front node

bull Run labs remotely

totalviewio66 | TotalView by Perforce copy Perforce Software Inc

Hands-on labs

bull Install TV from installers on Mac or Linux

bull Ignore license code

bull Star TotalView

bull Remotely connect to cooley and enable Reverse Connection

Labs

bull Lab 1 Debugger Basic

bull Lab 2 Viewing Examining Watching and Editing Data

bull Lab 3 Examining and Controlling a Parallel Application (on Cooley)

bull Using remote connect (tvconnect)

bull qsub ndashq training tvconnectjob

bull Modify and submit tvconnectjob on your machine

totalviewio67 | TotalView by Perforce copy Perforce Software Inc

TotalView is available on Theta Cooley

bull Connect to CooleyTheta

bull Get allocation first

bull qsub -A ATPESC2021 ndashn 4 ndashq debug-flat-quad ndashI (theta)

bull qsub -A ATPESC2021 ndashn 4 ndashq training ndashI (Cooley)

bull module load totalview (theta)

bull soft add +totalview (cooley)

bull totalview -args mpiexec ndashnp ltNgt demoMpi_v2

bull tvconnect mpiexec ndashnp ltNgt demoMpi_v2

bull Installed at softdebuggerstotalview-2021-08-04toolworkstotalview2021X3756bintotalview

roguewavecom68 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull TotalView website

bull httpstotalviewio

bull TotalView documentation

bull httpshelptotalviewio

bull TotalView Video Tutorials

bull httpstotalviewiosupportvideo-tutorials

bull Other Resources

bull Blog httpstotalviewioblog

TotalView Resources and Documentation

Summary

• Using a modern debugger saves you time

• TotalView can help you because

  • It’s cross-platform (the only debugger you ever need)

  • It allows you to debug accelerators (GPU) and CPU in one session

  • It allows you to debug multiple languages (C++/Python/Fortran)


Page 12: Techniques for Debugging HPC Applications

Data View, Command Line View, and Input/Output View

Action Points

• Breakpoint

• Evaluation Point (Evalpoint)

• Watchpoint

• Barrier point

Setting Breakpoints

• Setting action points

  • Single-click the line number

  • Right-click the line number and use the context menu

  • Click a line in the source view, then select the Action Points -> Set Breakpoint menu option

• Breakpoint -> At Location…

  • Specify a function name or line number

  • If you give a function name, TotalView sets a breakpoint at the first executable line in the function

Evaluation Points

• Use evalpoints to

  • Include instructions that stop a process and its relatives

  • Test potential fixes or patches for your program

  • Include a goto for C or Fortran that transfers control to a line number in your program

  • Execute a TotalView function

  • Set the values of your program’s variables

Evaluation Points: Examples

• Print the value of a variable to the command line

    printf("The value of result is %d\n", result);

• Skip some code

    goto 63;

• Stop a loop after a certain number of iterations

    if ((i % 100) == 0) {
        printf("The value of i is %d\n", i);
        $stop;
    }

See “Using Built-in Statements” in Appendix A of the User Guide for more information on “$” expressions:

https://help.totalview.io/current/HTML/index.html#page/TotalView/BuiltInStatments.html#ww1894979
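The conditional evalpoint above behaves like a small piece of code spliced into the loop. The Python sketch below only models that behavior and is not TotalView syntax: `run_loop`, `evalpoint`, and `stop_every` are hypothetical names, and raising an exception stands in for `$stop`.

```python
class StopRequested(Exception):
    """Stands in for TotalView's $stop: execution halts at this point."""


def evalpoint(i, stop_every=100):
    # Models: if ((i % 100) == 0) { printf(...); $stop; }
    if i % stop_every == 0:
        print(f"The value of i is {i}")
        raise StopRequested(i)


def run_loop(start, end, stop_every=100):
    """Run the loop until the evalpoint fires; return the iteration where it stopped."""
    for i in range(start, end):
        try:
            evalpoint(i, stop_every)
        except StopRequested:
            return i      # a debugger would hand control back to the user here
    return None           # the loop finished without the condition ever holding
```

Calling `run_loop(1, 1000)` stops at the first iteration where the condition holds; iterations where it does not hold run at full speed without user interaction.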

Watchpoints

• Watchpoints are set on a specific memory location

• Execution is stopped when the value stored in that memory location changes

• A breakpoint stops before an instruction executes; a watchpoint stops after an instruction executes
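The two properties above (trigger only on a change, and only after the store has happened) can be modeled in a few lines. This is a sketch under stated assumptions: real watchpoints monitor a memory address, whereas the hypothetical `Watched` class below wraps a single Python value and records each trigger instead of stopping.

```python
class Watched:
    """Holds one value and records writes that change it, after the write lands."""

    def __init__(self, value):
        self._value = value
        self.hits = []            # (old, new) pairs, one per triggered watchpoint

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        old = self._value
        self._value = new         # the store executes first ...
        if new != old:            # ... then the change is detected
            self.hits.append((old, new))


def watch_demo():
    counter = Watched(0)
    counter.value = 0             # same value written: no trigger
    counter.value = 5             # value changed: trigger
    counter.value = 5             # same value again: no trigger
    counter.value = 7             # value changed: trigger
    return counter.hits
```

Because the check runs after the assignment, the new value is already visible when the trigger is recorded, which mirrors the "stops after an instruction executes" behavior.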

Barrier Breakpoints

• Used to synchronize a group of threads or processes defined in the action point

• Threads or processes are held at the barrier point until all threads or processes in the group arrive

• When all threads or processes arrive, the barrier is satisfied and the threads or processes are released
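The hold-until-all-arrive semantics are the same as those of an ordinary thread barrier. As an illustration only (this uses Python's standard `threading.Barrier`, not a debugger feature, and `run_barrier_demo` is a hypothetical helper), the sketch below logs when each thread arrives and when it is released.

```python
import threading


def run_barrier_demo(n_threads=4):
    """Return the ordered event log of threads arriving at and leaving a barrier."""
    barrier = threading.Barrier(n_threads)
    log = []
    lock = threading.Lock()

    def worker(rank):
        with lock:
            log.append(("arrive", rank))
        barrier.wait()            # held here until all n_threads have arrived
        with lock:
            log.append(("release", rank))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In the returned log every "arrive" entry precedes every "release" entry, regardless of how the threads are scheduled.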

Saving Breakpoints

From the Action Points menu, select Save or Save As to save breakpoints.

Turn on the option to save action points on exit.

Examining and Editing Data

Call Stack and Local Variables

Call Stack View

• Lists the set of call frames as the program calls from one function or method to another

• Filter button used to turn on or off filtering of frames

Local Variables View

• Displays local variables relative to the current thread of interest and the selected stack frame

• Organized by arguments and blocks

• To edit values, add the variable to the Data View

The Data View Panel

• Data View allows deeper exploration of data structures

• Edit data values

• Cast to new data types

• Add data to the Data View using the context menu or by dragging and dropping

[Figure: screenshots of the context menu and of drag and drop]

The Data View Panel – Expanding Arrays and Structures

Select the right arrow to display the substructures in a complex variable.

Any nested structures are displayed in the Data View.

The Data View – Dive in All

• Dive in All

  • Use Dive in All to easily see each member of a data structure from an array of structures

The Data View Panel – Entering Expressions

Enter a new expression in the Data View panel to view that data.

• Type the expression in the [Add New Expression] field

• A new expression is added

• Increment a variable

The Data View Panel – Casting

Casting to another type

• Cast a variable into an array by adding the array specifier

• TotalView displays the array
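Casting a pointer to an array type changes only how the same memory is displayed. The sketch below illustrates the idea with Python's standard `ctypes` module rather than the Data View itself; `view_pointer_as_array` and `cast_demo` are hypothetical helper names.

```python
import ctypes


def view_pointer_as_array(ptr, length):
    """Reinterpret an int* as int[length] so every element can be inspected."""
    array_type = ctypes.c_int * length
    return ctypes.cast(ptr, ctypes.POINTER(array_type)).contents


def cast_demo():
    storage = (ctypes.c_int * 5)(10, 20, 30, 40, 50)
    ptr = ctypes.cast(storage, ctypes.POINTER(ctypes.c_int))   # a plain int*
    first_only = ptr.contents.value            # a bare pointer shows one element
    as_array = view_pointer_as_array(ptr, 5)   # after the cast to int[5]
    return first_only, list(as_array)
```

The length is supplied by the person doing the cast; nothing in the pointer itself records how many elements are valid, so an overly large length would read past the buffer.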

Extending Debugging Capabilities: How to Debug (AI) Mixed Python/C++ Code

Mixed Language Python Debugging

• Debugging one language is difficult enough

  • Understanding the flow of execution across language barriers is hard

  • Examining and comparing data in both languages is challenging

• What TotalView provides

  • Easy Python debugging session setup

  • Fully integrated Python and C/C++ call stack

  • “Glue” layers between the languages removed

  • Easily examine and compare variables in Python and C++

  • Modest system requirements

  • Utilize reverse debugging and memory debugging

• What TotalView does not provide (yet)

  • Setting breakpoints and stepping within Python code

Python debugging with TotalView (demo)

    #!/usr/bin/python

    def callFact():
        # tv_python_example is the demo's extension module
        import tv_python_example as tp
        a = 3
        b = 10
        c = a + b
        ch = "local string"
        # …… (elided on the slide)
        return tp.fact(a)

    if __name__ == '__main__':
        b = 2
        result = callFact()
        print(result)

Launch the script under the debugger:

    totalview -args python test_python_types.py

Remote Debugging - TotalView Remote UI

TotalView Remote UI

• Combine the convenience of establishing a remote connection to a cluster and the ability to run the TotalView GUI locally

• Front-end GUI architecture does not need to match back-end target architecture (macOS front-end -> Linux back-end)

• Secure communications

• Convenient saved sessions

• Once connected, debug as normal with access to all TotalView features

• Front-end GUI currently supports macOS and Linux x86/x86-64; a Windows client is coming


Remote UI Architecture

TotalView Reverse Connections

Reverse Connection Flow

[Diagram, animated over six slides: the components shown are the TotalView UI, the $HOME/.totalview/connect directory, tvconnect, srun, tvdsvr, and the application ranks, spread across the front-end node, the batch node, and the compute nodes.]

1. tvconnect writes request

2. TotalView UI reads request

3. TotalView returns response

4. tvconnect reads response

5. exec

6. Socket connection opened
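The first four steps form a request and response exchange through a shared directory. The Python sketch below models only that pattern: the file names, the JSON payloads, and the host and port values are invented for illustration and are not TotalView's actual file format, which the slides do not describe.

```python
import json
import os
import tempfile


def tvconnect_write_request(connect_dir, job_id, command):
    """Step 1: the batch-side helper drops a request file into the shared directory."""
    path = os.path.join(connect_dir, f"{job_id}.request")
    with open(path, "w") as f:
        json.dump({"job": job_id, "command": command}, f)
    return path


def ui_answer_requests(connect_dir, host, port):
    """Steps 2-3: the UI reads each pending request and writes a response next to it."""
    answered = []
    for name in sorted(os.listdir(connect_dir)):
        if name.endswith(".request"):
            with open(os.path.join(connect_dir, name)) as f:
                request = json.load(f)
            response = os.path.join(connect_dir, f"{request['job']}.response")
            with open(response, "w") as f:
                json.dump({"host": host, "port": port}, f)
            answered.append(request["job"])
    return answered


def tvconnect_read_response(connect_dir, job_id):
    """Step 4: the helper learns where the UI is listening. Steps 5-6 (exec and
    opening the socket) are not modeled here."""
    with open(os.path.join(connect_dir, f"{job_id}.response")) as f:
        return json.load(f)


def handshake_demo():
    with tempfile.TemporaryDirectory() as connect_dir:
        tvconnect_write_request(connect_dir, "job42", ["srun", "-n", "2", "hybrid_fib"])
        ui_answer_requests(connect_dir, host="login1", port=4142)
        return tvconnect_read_response(connect_dir, "job42")
```

A directory in the user's home works as a rendezvous point because it is usually visible from both the front-end node and the batch node, so neither side needs to know the other's address in advance.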

Batch Script Submission with Reverse Connect

• Start a debugging session using TotalView Reverse Connect

• Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run

• Enables running the TotalView UI on the front-end node and remotely debugging jobs executing on the compute nodes

• Very easy to utilize: simply prefix the job launch or application start with the “tvconnect” command

    #!/bin/bash
    #SBATCH -J hybrid_fib
    …
    #SBATCH -n 2
    #SBATCH -c 4
    #SBATCH --mem-per-cpu=4000
    export OMP_NUM_THREADS=4

    tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks, Heap Status, and Identifying Dangling Pointers

What is a Memory Bug?

• A memory bug is a mistake in the management of heap memory

  • Leaking: failure to free memory

  • Dangling references: failure to clear pointers

  • Failure to check for error conditions

• Memory corruption

  • Writing to memory not allocated

  • Overrunning array bounds

TotalView Heap Interposition Agent (HIA) Technology

• Advantages of TotalView HIA Technology

  • Use it with your existing builds

    • No source code or binary instrumentation

  • Programs run at nearly full speed

    • Low performance overhead

  • Low memory overhead

    • Efficient memory usage

[Diagram: a Process box containing User Code and Libraries, the Malloc API, and the Heap Interposition Agent (HIA) with its Allocation Table and Deallocation Table, alongside TotalView.]
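Interposition means that every allocation and deallocation call passes through a thin layer that keeps its own bookkeeping. The toy model below shows how an allocation table and a deallocation table are enough to flag leaks, dangling pointers, and double frees. It is a sketch with hypothetical names (`HeapAgent`, `leaks`, `is_dangling`) and simulated addresses, not a description of how the HIA is implemented.

```python
class HeapAgent:
    """Toy interposition layer: every malloc/free call passes through it."""

    def __init__(self):
        self._next_addr = 0x1000
        self.allocations = {}      # address -> size, blocks currently live
        self.deallocations = {}    # address -> size, blocks already freed

    def malloc(self, size):
        addr = self._next_addr
        self._next_addr += max(size, 1)   # hand out a fresh simulated address
        self.allocations[addr] = size
        return addr

    def free(self, addr):
        if addr in self.deallocations:
            raise RuntimeError(f"double free of {addr:#x}")
        if addr not in self.allocations:
            raise RuntimeError(f"free of unallocated address {addr:#x}")
        self.deallocations[addr] = self.allocations.pop(addr)

    def is_dangling(self, addr):
        """A pointer dangles if it still refers to a block that was freed."""
        return addr in self.deallocations

    def leaks(self, reachable):
        """Live blocks that no pointer in `reachable` refers to any more."""
        return sorted(a for a in self.allocations if a not in reachable)
```

For example, after allocating three blocks, freeing the second, and losing the pointer to the third, `is_dangling` is true for the second block's address and `leaks` reports the third.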

Memory Debugging Features – MemoryScape TotalView

TotalView New UI Features

• Leak detection

• Heap Status

• Dangling pointer detection

Coming Features

• Memory Error Events

• Memory Corruption Detection

• Memory Block Painting

• Memory Hoarding

• Memory Comparisons between processes

TotalView Reverse Debugging

Reverse Debugging with TotalView

• Reverse debugging provides the ability for developers to go back in execution history

• Activated either before the program starts running or at some point after execution begins

• Captures and deterministically replays execution

• Enables stepping backwards and forwards by function, line, or instruction

• Run backwards to breakpoints

• Run backwards and stop when a variable changes value

• Save recording files for later analysis or collaboration

  • For a remote connection, use the CLI: dhistory -save <name>
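Stepping backwards and running backwards until a variable changes both reduce to navigating a recorded history. The sketch below stores one snapshot of the variables per step, which is a deliberate simplification: a real reverse debugger records execution at a much lower level and replays it deterministically. `Recording` and `record_sum` are hypothetical names.

```python
class Recording:
    """Stores a snapshot of the variables per executed step so history can be revisited."""

    def __init__(self):
        self.history = []          # one dict of variables per recorded step
        self.position = -1         # index of the step currently being viewed

    def record(self, **variables):
        self.history.append(dict(variables))
        self.position = len(self.history) - 1

    def step_back(self):
        if self.position > 0:
            self.position -= 1
        return self.history[self.position]

    def step_forward(self):
        if self.position < len(self.history) - 1:
            self.position += 1
        return self.history[self.position]

    def run_back_until_changed(self, name):
        """Move backwards to the step that gave `name` its current value."""
        current = self.history[self.position][name]
        while self.position > 0 and self.history[self.position - 1].get(name) == current:
            self.position -= 1
        return self.position


def record_sum(values):
    """Record a running total, one snapshot per loop iteration."""
    rec = Recording()
    total = 0
    for i, v in enumerate(values):
        total += v
        rec.record(i=i, total=total)
    return rec
```

With `record_sum([1, 0, 0, 5])`, running backwards on `total` from the end lands on step 3, where the value became 6; after one step back, the same query lands on step 0, where it became 1.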

Reverse Debugging Controls

• Run forward / Run backwards

• Next forward over functions / Next backwards over functions

• Step forward into functions / Step backwards into functions

• Advance forward out of function call / Advance backwards to calling function

• Advance forward to selected line / Advance backward to selected line

• Advance to “live” session

• Create a bookmark at this point in recorded history

• Save the recorded session

Debugging CUDA Applications with TotalView

TotalView for the NVIDIA® GPU Accelerator

• NVIDIA Tesla, Fermi, Kepler, Pascal, Volta, Turing, Ampere

• NVIDIA CUDA 9.2, 10, and 11

  • With support for Unified Memory

• Debugging 64-bit CUDA programs

• Features and capabilities include

  • Support for dynamic parallelism

  • Support for MPI-based clusters and multi-card configurations

  • Flexible display and navigation on the CUDA device

    • Physical (device, SM, warp, lane)

    • Logical (grid, block) tuples

  • CUDA device window reveals what is running where

  • Support for types and separate memory address spaces

  • Leverages CUDA memcheck

Source View Opened on CUDA Host Code

Breakpoint Set in CUDA Kernel Code Before Launch

A hollow breakpoint indicates that a breakpoint will be set when the code is loaded onto the GPU.

GPU Physical and Logical Toolbars

The logical toolbar displays the Block and Thread coordinates.

The physical toolbar displays the Device number, Streaming Multiprocessor, Warp, and Lane.

To view a CUDA host thread, select a thread with a positive thread ID in the Process and Threads view.

To view a CUDA GPU thread, select a thread with a negative thread ID, then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread.
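The logical and physical coordinates are related through CUDA's thread linearization: threads in a block are numbered with x varying fastest, and consecutive groups of 32 form a warp. The sketch below computes the warp within the block and the lane for a logical thread index. The function names are hypothetical, and the result is not the hardware warp or SM number shown on the physical toolbar, since those are assigned by the GPU at run time.

```python
WARP_SIZE = 32  # warp size on current NVIDIA GPUs


def linear_thread_index(thread_idx, block_dim):
    """Flatten an (x, y, z) thread index within its block, x varying fastest."""
    x, y, z = thread_idx
    dim_x, dim_y, _dim_z = block_dim
    return x + y * dim_x + z * dim_x * dim_y


def warp_and_lane(thread_idx, block_dim):
    """Return (warp within the block, lane within the warp) for a logical thread."""
    linear = linear_thread_index(thread_idx, block_dim)
    return linear // WARP_SIZE, linear % WARP_SIZE
```

For a 16 x 16 block, thread (3, 2, 0) has linear index 35, so it is lane 3 of the block's second warp. Knowing which logical threads share a warp helps when a breakpoint stops a whole warp at once.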

Displaying CUDA Program Elements

• The identifier @local is a TotalView built-in type storage qualifier that tells the debugger the storage kind of A is local storage

• The debugger uses the storage qualifier to determine how to locate A in device memory

The @local type qualifier indicates that variable A is in local storage.

“elements” is a pointer to a float in generic storage.


Page 13: Techniques for Debugging HPC Applications

Action Points

totalviewio16 | TotalView by Perforce copy Perforce Software Inc

Breakpoint

Evaluation Point (Evalpoint)

Watchpoint

Barrier point

Action Points

totalviewio17 | TotalView by Perforce copy Perforce Software Inc

Setting Breakpoints

bull Setting action points

bull Single-click line number

bull Right clicking on the line

number and using the

context menu

bull Clicking a line in the source

view then selecting the

Action Points -gt Set

breakpoint menu option

totalviewio18 | TotalView by Perforce copy Perforce Software Inc

bull Breakpoint-gtAt Locationhellip

bull Specify function name or line number

bull If function name TotalView sets a breakpoint at

first executable line in the function

Setting Breakpoints

totalviewio20 | TotalView by Perforce copy Perforce Software Inc

Evaluation points

bull Use Eval points to

bull Include instructions that stop a process and its relatives

bull Test potential fixes or patches for your program

bull Include a goto for C or Fortran that transfers control to a

line number in your program

bull Execute a TotalView function

bull Set the values of your programrsquos variables

totalviewio21 | TotalView by Perforce copy Perforce Software Inc

bull Print the value of a variable to the command line

printf(The value of result is dn result)

bull Skip some code

goto 63

bull Stop a loop after a certain number of iterations

if ( (i 100) == 0)

printf(The value of i is dn i)

$stop

See ldquoUsing Built-in Statementsrdquo in Appendix A of the User Guide for more information on ldquo$rdquo expressions

httpshelptotalviewiocurrentHTMLindexhtmlpageTotalViewBuiltInStatmentshtmlww1894979

Evaluation points Examples

totalviewio22 | TotalView by Perforce copy Perforce Software Inc

bull Watchpoints are set on a specific memory location

bull Execution is stopped when the value stored in that memory location changes

bull A breakpoint stops before an instruction executes A watchpoint stops after an instruction executes

Watchpoints

totalviewio23 | TotalView by Perforce copy Perforce Software Inc

bull Used to synchronize a group of threads or processes defined in the action point

bull Threads or processes are held at barrierpoint until all threads or processes in the group arrive

bull When all threads or processes arrive the barrier is satisfied and the threads or processes are released

Barrier Breakpoints

totalviewio24 | TotalView by Perforce copy Perforce Software Inc

Saving Breakpoints

From the Action Points menu select Save or Save As to save breakpointsTurn on option to save action points on exit

Examining and Editing Data

totalviewio26 | TotalView by Perforce copy Perforce Software Inc

Call Stack and Local Variables

Call Stack Viewbull Lists the set of call frames as the

program calls from one function or method to another

bull Filter button used to turn on or off filtering of frames

Local Variables Viewbull Displays local variables relative to the

current thread of interest and the selected stack frame

bull Organized by arguments and blocksbull To edit values add variable to the Data

View

totalviewio27 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel

bull Data View allows deeper exploration of data structures

bull Edit data valuesbull Cast to new data typesbull Add data to the Data View using the context

menu or by dragging and dropping

Context menu

Drag and drop

totalviewio28 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel ndash Expanding Arrays and Structures

Select the right arrow to display the substructures in a complex variable

Any nested structures are displayed in the data view

totalviewio29 | TotalView by Perforce copy Perforce Software Inc

bull Dive in All

bull Use Dive in All to easily see each member of a data structure from an array of structures

The Data View ndash Dive in All

totalviewio30 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel ndash Entering Expressions

Enter a new expression in the Data View panel to view that data

A new expression is added

Increment a variable

Type the expression in the [Add New expression] field

totalviewio32 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel - Casting

Casting to another type

TotalView displays the array

Cast a variable into an array by adding the array specifier

Extending Debugging Capabilities How to Debug (AI) Mixed PythonC++ Code

totalviewio34 | TotalView by Perforce copy Perforce Software Inc

bull Debugging one language is difficult enough

bull Understanding the flow of execution across language barriers is hard

bull Examining and comparing data in both languages is challenging

bull What TotalView provides

bull Easy python debugging session setup

bull Fully integrated Python and CC++ call stack

bull rdquoGluerdquo layers between the languages removed

bull Easily examine and compare variables in Python and C++

bull Modest system requirements

bull Utilize reverse debugging and memory debugging

bull What TotalView does not provide (yet)

bull Setting breakpoints and stepping within Python code

Mixed Language Python Debugging

totalviewio35 | TotalView by Perforce copy Perforce Software Inc

Python debugging with TotalView (demo)

usrbinpython

def callFact()import tv_python_example as tpa = 3b = 10c = a+bch = ldquolocal stringrdquohelliphellip

return tpfact(a)if __name__ == __main__rsquo

b = 2result = callFact()print result

totalview -args python test_python_typespy

Remote Debugging - TotalView Remote UI

roguewavecom38 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Combine the convenience of establishing a remote

connection to a cluster and the ability to run the

TotalView GUI locally

bull Front-end GUI architecture does not need to match back-

end target architecture (macOS front-end -gt Linux back-

end)

bull Secure communications

bull Convenient saved sessions

bull Once connected debug as normal with access to all

TotalView features

bull Front-end GUI currently supports macOS and Linux

x86x86-64 Windows client is coming

TotalView Remote UI

roguewavecom39 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Remote UI Architecture

TotalView Reverse Connections

roguewavecom41 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

roguewavecom42 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

2 TotalView UI reads request

3 TotalView returns response

roguewavecom43 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

2 TotalView UI reads request

3 TotalView returns response

roguewavecom44 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

roguewavecom45 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

roguewavecom46 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

v TotalView UI

2 TotalView UI reads request

3 TotalView returns response

tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

6 socket connection opened

roguewavecom47 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Start a debugging session using TotalView Reverse Connect

bull Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run

bull Enables running TotalView UI on the front-end node and remotely

debug jobs executing on the compute nodes

bull Very easy to utilize simply prefix job launch or application start

with ldquotvconnectrdquo command

Batch Script Submission with Reverse Connect

binbashSBATCH -J hybrid_fibhellipSBATCH -n 2SBATCH -c 4SBATCH --mem-per-cpu=4000export OMP_NUM_THREADS=4

tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks Heap Status and Identifying Dangling Pointers

roguewavecom50 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull A Memory Bug is a mistake in the management of heap memory

bull Leaking Failure to free memory

bull Dangling references Failure to clear pointers

bull Failure to check for error conditions

bull Memory Corruption

bull Writing to memory not allocated

bull Overrunning array bounds

What is a Memory Bug

roguewavecom51 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Advantages of TotalView HIA Technology

bull Use it with your existing builds

bull No Source Code or Binary Instrumentation

bull Programs run nearly full speed

bull Low performance overhead

bull Low memory overhead

bull Efficient memory usage

TotalView Heap Interposition Agent (HIA) Technology

Malloc API

User Code and Libraries

Process

TotalView

Heap Interposition

Agent (HIA)Allocation

Table

Deallocation

Table

roguewavecom52 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

TotalView New UI Features

bull Leak detection

bull Heap Status

bull Dangling pointer detection

Coming Features

bull Memory Error Events

bull Memory Corruption Detection

bull Memory Block Painting

bull Memory Hoarding

bull Memory Comparisons between processes

Memory Debugging Features ndash MemoryScape TotalView

TotalView Reverse Debugging

roguewavecom54 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Reverse debugging provides the ability for developers to go back in execution history

bull Activated either before program starts running or at some point after execution begins

bull Capturing and deterministically replay execution

bull Enables stepping backwards and forward by function line or instruction

bull Run backwards to breakpoints

bull Run backwards and stop when a variable changes value

bull Saving recording files for later analysis or collaboration

bull For remote connection use CLI dhistory ndashsave ltnamegt

Reverse Debugging with TotalView

roguewavecom55 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Reverse Debugging Controls

Run forward-

Run backwards

Next forward over functions

-Next backwards over functions

Step forward into functions

-Step backwards into

functions

Advance forward out of function call

-Advance backwards to

calling function

Advance forward to selected line

-Advance backward to

selected line

Advance to ldquoliverdquo session

Create a bookmark at this point in recorded history

Save the recorded session

Debugging CUDA Applications with TotalView

roguewavecom59 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull NVIDIA Tesla Fermi Kepler Pascal Volta Turing Ampere

bull NVIDIA CUDA 92 10 and 11

bull With support for Unified Memory

bull Debugging 64-bit CUDA programs

bull Features and capabilities include

bull Support for dynamic parallelism

bull Support for MPI based clusters and multi-card configurations

bull Flexible Display and Navigation on the CUDA device

bull Physical (device SM Warp Lane)

bull Logical (Grid Block) tuples

bull CUDA device window reveals what is running where

bull Support for types and separate memory address spaces

bull Leverages CUDA memcheck

TotalView for the NVIDIA reg GPU Accelerator

roguewavecom60 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Source View Opened on CUDA host code

roguewavecom61 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Breakpoint Set in CUDA Kernel Code Before Launch

Hollow breakpoint indicates a breakpoint will be set when the code is loaded onto the GPU

roguewavecom62 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

GPU Physical and Logical Toolbars

Logical toolbar displays the Block and Thread coordinates

Physical toolbar displays the Device number Streaming Multiprocessor Warp and Lane

To view a CUDA host thread select a thread with a positive thread ID in the Process and Threads view

To view a CUDA GPU thread select a thread with a negative thread ID then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread

roguewavecom63 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull The identifier local is a TotalView built-in type storage qualifier that tells the debugger the storage kind of A is

local storage

bull The debugger uses the storage qualifier to determine how to locate A in device memory

Displaying CUDA Program Elementslocal type qualifier indicates that variable A is in local storage

ldquoelementsrdquo is a pointer to a float in generic storage

Using TotalView for Parallel Debugging on ANL

totalviewio65 | TotalView by Perforce copy Perforce Software Inc

TotalView remote debugging on Linux and Mac OS

bull Download and install TotalView on your linux or mac

bull Connect to remote front node

bull Run labs remotely

totalviewio66 | TotalView by Perforce copy Perforce Software Inc

Hands-on labs

bull Install TV from installers on Mac or Linux

bull Ignore license code

bull Star TotalView

bull Remotely connect to cooley and enable Reverse Connection

Labs

bull Lab 1 Debugger Basic

bull Lab 2 Viewing Examining Watching and Editing Data

bull Lab 3 Examining and Controlling a Parallel Application (on Cooley)

bull Using remote connect (tvconnect)

bull qsub ndashq training tvconnectjob

bull Modify and submit tvconnectjob on your machine

totalviewio67 | TotalView by Perforce copy Perforce Software Inc

TotalView is available on Theta Cooley

bull Connect to CooleyTheta

bull Get allocation first

bull qsub -A ATPESC2021 ndashn 4 ndashq debug-flat-quad ndashI (theta)

bull qsub -A ATPESC2021 ndashn 4 ndashq training ndashI (Cooley)

bull module load totalview (theta)

bull soft add +totalview (cooley)

bull totalview -args mpiexec ndashnp ltNgt demoMpi_v2

bull tvconnect mpiexec ndashnp ltNgt demoMpi_v2

bull Installed at softdebuggerstotalview-2021-08-04toolworkstotalview2021X3756bintotalview

TotalView Resources and Documentation

• TotalView website
    • https://totalview.io
• TotalView documentation
    • https://help.totalview.io
• TotalView Video Tutorials
    • https://totalview.io/support/video-tutorials
• Other Resources
    • Blog: https://totalview.io/blog

Summary

• Using a modern debugger saves you time
• TotalView can help you because
    • It's cross-platform (the only debugger you will ever need)
    • It allows you to debug accelerators (GPU) and CPU in one session
    • It allows you to debug multiple languages (C++/Python/Fortran)


Action Points

• Breakpoint
• Evaluation Point (Evalpoint)
• Watchpoint
• Barrier point

Setting Breakpoints

• Setting action points
    • Single-click the line number
    • Right-click the line number and use the context menu
    • Click a line in the source view, then select the Action Points -> Set Breakpoint menu option

Setting Breakpoints

• Breakpoint -> At Location…
    • Specify a function name or line number
    • If given a function name, TotalView sets a breakpoint at the first executable line in the function

Evaluation points

• Use Eval points to
    • Include instructions that stop a process and its relatives
    • Test potential fixes or patches for your program
    • Include a goto for C or Fortran that transfers control to a line number in your program
    • Execute a TotalView function
    • Set the values of your program's variables

Evaluation points: Examples

• Print the value of a variable to the command line

    printf("The value of result is %d\n", result);

• Skip some code

    goto 63;

• Stop a loop after a certain number of iterations

    if ((i % 100) == 0) {
        printf("The value of i is %d\n", i);
        $stop;
    }

See "Using Built-in Statements" in Appendix A of the User Guide for more information on "$" expressions:

https://help.totalview.io/current/HTML/index.html#page/TotalView/BuiltInStatments.html#ww1894979

Watchpoints

• Watchpoints are set on a specific memory location
• Execution is stopped when the value stored in that memory location changes
• A breakpoint stops before an instruction executes; a watchpoint stops after an instruction executes
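The watchpoint behavior described on this slide can be modeled in a few lines of Python. This is only an analogy, not how TotalView implements watchpoints: `WatchedCell` and its `hits` list are hypothetical names standing in for one watched memory location and the stops it would cause.

```python
class WatchedCell:
    """Hypothetical stand-in for a single watched memory location."""

    def __init__(self, value):
        self._value = value
        self.hits = []  # one (old, new) pair per triggered stop

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        old = self._value
        self._value = new      # the write completes first...
        if new != old:         # ...and only a changed value triggers a stop
            self.hits.append((old, new))


cell = WatchedCell(0)
cell.value = 0   # same value stored again: no stop
cell.value = 7   # value changed: one stop, reported after the write
cell.value = 7   # unchanged: no stop
```

After these three writes, `cell.hits` holds a single entry for the change from 0 to 7, and the new value is already in place when the stop is recorded.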

Barrier Breakpoints

• Used to synchronize a group of threads or processes defined in the action point
• Threads or processes are held at the barrier point until all threads or processes in the group arrive
• When all threads or processes arrive, the barrier is satisfied and the threads or processes are released
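The hold-then-release behavior is the same idea as a barrier in ordinary threaded code. The sketch below uses Python's `threading.Barrier` purely as an analogy; `NUM_WORKERS`, `worker`, and `events` are illustrative names, and nothing here reflects TotalView internals.

```python
import threading

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)
events = []
events_lock = threading.Lock()


def worker(rank):
    with events_lock:
        events.append(("arrived", rank))
    barrier.wait()  # held here until all NUM_WORKERS threads have arrived
    with events_lock:
        events.append(("released", rank))


threads = [threading.Thread(target=worker, args=(rank,))
           for rank in range(NUM_WORKERS)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

kinds = [kind for kind, _ in events]
```

Whatever order the threads run in, every "arrived" entry is logged before the first "released" entry, because no thread gets past `barrier.wait()` until the last one reaches it.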

Saving Breakpoints

From the Action Points menu, select Save or Save As to save breakpoints.

Turn on the option to save action points on exit.

Examining and Editing Data

Call Stack and Local Variables

Call Stack View
• Lists the set of call frames as the program calls from one function or method to another
• Filter button used to turn filtering of frames on or off

Local Variables View
• Displays local variables relative to the current thread of interest and the selected stack frame
• Organized by arguments and blocks
• To edit values, add the variable to the Data View

The Data View Panel

• Data View allows deeper exploration of data structures
• Edit data values
• Cast to new data types
• Add data to the Data View using the context menu or by dragging and dropping

Context menu

Drag and drop

The Data View Panel – Expanding Arrays and Structures

Select the right arrow to display the substructures in a complex variable.

Any nested structures are displayed in the Data View.

The Data View – Dive in All

• Dive in All
    • Use Dive in All to easily see each member of a data structure from an array of structures

The Data View Panel – Entering Expressions

Enter a new expression in the Data View panel to view that data.

Type the expression in the [Add New expression] field.

A new expression is added.

Increment a variable.

The Data View Panel – Casting

Casting to another type

Cast a variable into an array by adding the array specifier.

TotalView displays the array.
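The idea behind the array cast can be tried outside the debugger with Python's `ctypes`. This is an analogy under stated assumptions, not TotalView syntax: `values`, `ptr`, and `as_array` are illustrative names, and the length 5 is something the caller has to know, just as the array specifier in the Data View supplies it.

```python
import ctypes

# Five ints in memory; keep `values` alive while the pointers are in use.
values = (ctypes.c_int * 5)(10, 20, 30, 40, 50)

# Seen as a plain `int *`, the variable appears to refer to a single int.
ptr = ctypes.cast(values, ctypes.POINTER(ctypes.c_int))
first_only = ptr.contents.value

# Re-cast to "pointer to int[5]" so the same memory is shown as an array.
as_array = ctypes.cast(ptr, ctypes.POINTER(ctypes.c_int * 5)).contents
all_elements = list(as_array)

# The cast is a view, not a copy: edits through it land in the original data.
as_array[1] = 99
```

The cast changes only how the memory is interpreted, which is why the write through `as_array` is visible in `values`.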

Extending Debugging Capabilities: How to Debug (AI) Mixed Python/C++ Code

Mixed Language Python Debugging

• Debugging one language is difficult enough
    • Understanding the flow of execution across language barriers is hard
    • Examining and comparing data in both languages is challenging
• What TotalView provides
    • Easy Python debugging session setup
    • Fully integrated Python and C/C++ call stack
    • "Glue" layers between the languages removed
    • Easily examine and compare variables in Python and C++
    • Modest system requirements
    • Utilize reverse debugging and memory debugging
• What TotalView does not provide (yet)
    • Setting breakpoints and stepping within Python code

Python debugging with TotalView (demo)

    #!/usr/bin/python

    def callFact():
        import tv_python_example as tp
        a = 3
        b = 10
        c = a+b
        ch = "local string"
        # ……
        return tp.fact(a)

    if __name__ == '__main__':
        b = 2
        result = callFact()
        print(result)

    totalview -args python test_python_types.py

Remote Debugging - TotalView Remote UI

TotalView Remote UI

• Combine the convenience of establishing a remote connection to a cluster and the ability to run the TotalView GUI locally
• Front-end GUI architecture does not need to match back-end target architecture (macOS front-end -> Linux back-end)
• Secure communications
• Convenient saved sessions
• Once connected, debug as normal with access to all TotalView features
• Front-end GUI currently supports macOS and Linux x86/x86-64; a Windows client is coming


Remote UI Architecture

TotalView Reverse Connections

Reverse Connection Flow

[Diagram: the TotalView UI runs on the front-end node; srun, tvconnect, and tvdsvr run on the batch node; the application ranks run on the compute nodes. Requests and responses pass through $HOME/.totalview/connect.]

1. tvconnect writes request
2. TotalView UI reads request
3. TotalView returns response
4. tvconnect reads response
5. exec (tvconnect launches tvdsvr)
6. Socket connection opened
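The six-step flow is a file-based rendezvous followed by a socket connection, and that pattern can be sketched in Python. Everything below is a toy under stated assumptions: the file names `request` and `response`, the message formats, and the function names are invented for illustration and are not TotalView's actual protocol, and a temporary directory stands in for the shared connect directory.

```python
import os
import socket
import tempfile
import threading
import time


def wait_for(path, timeout=5.0):
    """Poll until `path` exists, then return its text."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() > deadline:
            raise TimeoutError(path)
        time.sleep(0.01)
    with open(path) as handle:
        return handle.read()


def publish(path, text):
    """Write via rename so a poller never sees a half-written file."""
    with open(path + ".tmp", "w") as handle:
        handle.write(text)
    os.replace(path + ".tmp", path)


def batch_side(connect_dir, command):
    publish(os.path.join(connect_dir, "request"), command)        # step 1
    reply = wait_for(os.path.join(connect_dir, "response"))       # step 4
    host, port = reply.split(":")
    # Step 5 would exec the debugger server; the sketch just connects.
    with socket.create_connection((host, int(port))) as conn:     # step 6
        conn.sendall(("server ready for: " + command).encode())


def ui_side(connect_dir):
    with socket.socket() as listener:
        listener.bind(("127.0.0.1", 0))  # any free local port
        listener.listen(1)
        command = wait_for(os.path.join(connect_dir, "request"))  # step 2
        port = listener.getsockname()[1]
        publish(os.path.join(connect_dir, "response"),
                "127.0.0.1:%d" % port)                            # step 3
        conn, _ = listener.accept()
        with conn, conn.makefile("r") as reader:
            return command, reader.read()


def run_demo():
    with tempfile.TemporaryDirectory() as connect_dir:
        job = threading.Thread(target=batch_side,
                               args=(connect_dir, "srun -n 2 hybrid_fib"))
        job.start()
        result = ui_side(connect_dir)
        job.join()
        return result


if __name__ == "__main__":
    print(run_demo())
```

The point of the design is the direction of the connection: the batch-side process is the one that dials out, so the UI does not need to know in advance where the scheduler will place the job.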

Batch Script Submission with Reverse Connect

• Start a debugging session using TotalView Reverse Connect
• Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run
• Enables running the TotalView UI on the front-end node while remotely debugging jobs executing on the compute nodes
• Very easy to use: simply prefix the job launch or application start with the "tvconnect" command

    #!/bin/bash
    #SBATCH -J hybrid_fib
    # …
    #SBATCH -n 2
    #SBATCH -c 4
    #SBATCH --mem-per-cpu=4000
    export OMP_NUM_THREADS=4

    tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks, Heap Status, and Identifying Dangling Pointers

What is a Memory Bug?

• A Memory Bug is a mistake in the management of heap memory
    • Leaking: failure to free memory
    • Dangling references: failure to clear pointers
    • Failure to check for error conditions
• Memory Corruption
    • Writing to memory not allocated
    • Overrunning array bounds

TotalView Heap Interposition Agent (HIA) Technology

• Advantages of TotalView HIA Technology
    • Use it with your existing builds
        • No source code or binary instrumentation
    • Programs run nearly full speed
        • Low performance overhead
    • Low memory overhead
        • Efficient memory usage

[Diagram components: Process (User Code and Libraries, Heap Interposition Agent (HIA), Malloc API), TotalView, Allocation Table, Deallocation Table]
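The interposition idea, an agent that sits between user code and the allocator and keeps allocation and deallocation tables, can be illustrated with a toy model. This is a sketch under stated assumptions, not the HIA itself: `HeapAgent`, its method names, the fake addresses, and the call-site labels are all hypothetical.

```python
class HeapAgent:
    """Toy interposer that records every allocation and free it sees."""

    def __init__(self):
        self._next_address = 0x1000
        self.allocations = {}    # live blocks: address -> (size, call site)
        self.deallocations = {}  # freed blocks: address -> (size, call site)

    def malloc(self, size, site):
        address = self._next_address
        self._next_address += max(size, 1)  # keep fake addresses unique
        self.allocations[address] = (size, site)
        return address

    def free(self, address):
        if address not in self.allocations:
            raise ValueError("free of unknown or already freed block")
        self.deallocations[address] = self.allocations.pop(address)

    def leaks(self):
        """Blocks still allocated; at program exit these are the leaks."""
        return dict(self.allocations)

    def is_dangling(self, address):
        """True if a pointer value refers to a block that was freed."""
        return address in self.deallocations


agent = HeapAgent()
buffer_a = agent.malloc(16, "init")
buffer_b = agent.malloc(32, "load")
agent.free(buffer_a)  # buffer_a still holds the old address: dangling
```

Here `agent.leaks()` reports only the 32-byte block from "load", and `agent.is_dangling(buffer_a)` is true because the address is found in the deallocation table. The user code is unchanged apart from calling through the agent, which mirrors the "use it with your existing builds" point on the slide.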

Memory Debugging Features – MemoryScape TotalView

TotalView New UI Features
• Leak detection
• Heap Status
• Dangling pointer detection

Coming Features
• Memory Error Events
• Memory Corruption Detection
• Memory Block Painting
• Memory Hoarding
• Memory Comparisons between processes

TotalView Reverse Debugging

Reverse Debugging with TotalView

• Reverse debugging provides the ability for developers to go back in execution history
• Activated either before the program starts running or at some point after execution begins
• Captures and deterministically replays execution
• Enables stepping backwards and forwards by function, line, or instruction
• Run backwards to breakpoints
• Run backwards and stop when a variable changes value
• Save recording files for later analysis or collaboration
    • For a remote connection, use the CLI: dhistory -save <name>
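"Run backwards and stop when a variable changes value" can be pictured with a toy recorder. The sketch below snapshots the whole state after every step, which is only workable for a tiny example; the slide describes capture and deterministic replay instead, so treat `record`, `run_back_to_change`, and the step functions as hypothetical illustrations rather than TotalView's mechanism.

```python
def record(program_steps, state):
    """Run each step and keep a snapshot of the state after it."""
    history = [dict(state)]  # history[0] is the state before any step
    for step in program_steps:
        step(state)
        history.append(dict(state))
    return history


def run_back_to_change(history, name, position):
    """Walk backwards from `position` to the step that last changed `name`."""
    for index in range(position, 0, -1):
        if history[index][name] != history[index - 1][name]:
            return index
    return None  # the variable never changed before this point


def add_one(state):
    state["total"] += 1


def bump_count(state):
    state["count"] += 1


def times_ten(state):
    state["total"] *= 10


history = record([add_one, bump_count, times_ten, bump_count],
                 {"total": 0, "count": 0})
last_step = len(history) - 1
culprit = run_back_to_change(history, "total", last_step)
```

Starting from the end of the recording, the search skips the final `bump_count` step and lands on step 3, where `total` went from 1 to 10. Repeating the search from step 2 finds the earlier change at step 1.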

Reverse Debugging Controls

• Run forward / Run backwards
• Next forward over functions / Next backwards over functions
• Step forward into functions / Step backwards into functions
• Advance forward out of function call / Advance backwards to calling function
• Advance forward to selected line / Advance backward to selected line
• Advance to "live" session
• Create a bookmark at this point in recorded history
• Save the recorded session

Debugging CUDA Applications with TotalView

TotalView for the NVIDIA® GPU Accelerator

• NVIDIA Tesla, Fermi, Kepler, Pascal, Volta, Turing, Ampere
• NVIDIA CUDA 9.2, 10, and 11
    • With support for Unified Memory
• Debugging 64-bit CUDA programs
• Features and capabilities include
    • Support for dynamic parallelism
    • Support for MPI-based clusters and multi-card configurations
    • Flexible display and navigation on the CUDA device
        • Physical (Device, SM, Warp, Lane)
        • Logical (Grid, Block) tuples
    • CUDA device window reveals what is running where
    • Support for types and separate memory address spaces
    • Leverages CUDA memcheck


Source View Opened on CUDA host code

Breakpoint Set in CUDA Kernel Code Before Launch

A hollow breakpoint indicates that a breakpoint will be set when the code is loaded onto the GPU.

GPU Physical and Logical Toolbars

The logical toolbar displays the Block and Thread coordinates.

The physical toolbar displays the Device number, Streaming Multiprocessor, Warp, and Lane.

To view a CUDA host thread, select a thread with a positive thread ID in the Process and Threads view.

To view a CUDA GPU thread, select a thread with a negative thread ID, then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread.
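The logical and physical views are related by simple arithmetic: CUDA linearizes the thread coordinates within a block, and consecutive groups of 32 threads form a warp. The sketch below works that out in Python. The function names are illustrative, and the warp number it returns is the warp's index within its block; the hardware warp slot shown on the physical toolbar is assigned at run time and may differ.

```python
WARP_SIZE = 32  # threads per warp on current NVIDIA GPUs


def linear_thread_index(thread_idx, block_dim):
    """Flatten (x, y, z) coordinates in CUDA's x-fastest thread order."""
    x, y, z = thread_idx
    dim_x, dim_y, _ = block_dim
    return x + y * dim_x + z * dim_x * dim_y


def warp_and_lane(thread_idx, block_dim):
    """Return (warp index within the block, lane within that warp)."""
    linear = linear_thread_index(thread_idx, block_dim)
    return linear // WARP_SIZE, linear % WARP_SIZE


# Thread (5, 2, 0) in a 16 x 16 x 1 block has linear index 5 + 2 * 16 = 37.
example = warp_and_lane((5, 2, 0), (16, 16, 1))
```

For that thread the result is warp 1, lane 5, so it shares a warp with the threads whose linear indices run from 32 to 63.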



Page 16: Techniques for Debugging HPC Applications

Setting Breakpoints

• Breakpoint -> At Location…
• Specify a function name or line number
• If you give a function name, TotalView sets a breakpoint at the first executable line in the function

Evaluation points

• Use Eval points to:
  • Include instructions that stop a process and its relatives
  • Test potential fixes or patches for your program
  • Include a goto for C or Fortran that transfers control to a line number in your program
  • Execute a TotalView function
  • Set the values of your program's variables

Evaluation points: Examples

• Print the value of a variable to the command line
    printf("The value of result is %d\n", result);
• Skip some code
    goto 63;
• Stop a loop after a certain number of iterations
    if ((i % 100) == 0) {
        printf("The value of i is %d\n", i);
        $stop;
    }

See "Using Built-in Statements" in Appendix A of the User Guide for more information on "$" expressions:
https://help.totalview.io/current/HTML/index.html#page/TotalView/BuiltInStatments.html#ww1894979
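As a rough illustration of the loop example on this slide, the Python sketch below models what an eval point does each time execution reaches its line: it evaluates a condition, prints, and asks for a stop. The names `eval_point` and `run_loop` are hypothetical and are not TotalView functions; returning `True` only stands in for `$stop`.

```python
def eval_point(i):
    """Model of: if ((i % 100) == 0) { printf(...); $stop; }"""
    if i % 100 == 0:
        print(f"The value of i is {i}")
        return True  # stands in for $stop (assumption, not a TotalView call)
    return False


def run_loop(iterations):
    """Run a loop of 1..iterations and collect the iterations that would stop."""
    stops = []
    for i in range(1, iterations + 1):
        if eval_point(i):
            stops.append(i)
    return stops
```

Calling `run_loop(250)` reports the two iterations at which the modeled eval point would stop the process.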

Watchpoints

• Watchpoints are set on a specific memory location
• Execution is stopped when the value stored in that memory location changes
• A breakpoint stops before an instruction executes. A watchpoint stops after an instruction executes.
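The "stops after the instruction executes" behavior can be pictured with a small Python model. This is only a sketch: the class `Watched` is hypothetical, and a property setter stands in for the hardware or debugger mechanism that a real watchpoint uses.

```python
class Watched:
    """A single value whose changes are reported after the write completes."""

    def __init__(self, value):
        self._value = value
        self.hits = []  # (old, new) pairs, one per reported change

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        old = self._value
        self._value = new  # the write happens first
        if new != old:
            # Only a change in the stored value triggers the watchpoint.
            self.hits.append((old, new))
```

Writing the same value twice does not add a hit, which mirrors the slide's wording that execution stops when the stored value changes.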

Barrier Breakpoints

• Used to synchronize a group of threads or processes defined in the action point
• Threads or processes are held at the barrier point until all threads or processes in the group arrive
• When all threads or processes arrive, the barrier is satisfied and the threads or processes are released
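The hold-then-release behavior resembles an ordinary thread barrier. The sketch below uses Python's `threading.Barrier` as an analogy; it is not how TotalView implements barrier points, and the function name `run_barrier_demo` is made up for this example.

```python
import threading
import time


def run_barrier_demo(workers=4):
    """Return the ordered list of (event, rank) tuples seen at the barrier."""
    barrier = threading.Barrier(workers)
    lock = threading.Lock()
    events = []

    def worker(rank):
        time.sleep(0.01 * rank)  # ranks reach the barrier at different times
        with lock:
            events.append(("arrive", rank))
        barrier.wait()  # held here until every worker has arrived
        with lock:
            events.append(("release", rank))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events
```

Every "arrive" event is recorded before the first "release" event, because no worker passes `barrier.wait()` until the whole group is present.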

Saving Breakpoints

• From the Action Points menu, select Save or Save As to save breakpoints
• Turn on the option to save action points on exit

Examining and Editing Data

Call Stack and Local Variables

Call Stack View
• Lists the set of call frames as the program calls from one function or method to another
• Filter button used to turn on or off filtering of frames

Local Variables View
• Displays local variables relative to the current thread of interest and the selected stack frame
• Organized by arguments and blocks
• To edit values, add the variable to the Data View

The Data View Panel

• Data View allows deeper exploration of data structures
• Edit data values
• Cast to new data types
• Add data to the Data View using the context menu or by dragging and dropping

[Figure callouts: Context menu; Drag and drop]

The Data View Panel - Expanding Arrays and Structures

• Select the right arrow to display the substructures in a complex variable
• Any nested structures are displayed in the Data View

The Data View - Dive in All

• Dive in All
  • Use Dive in All to easily see each member of a data structure from an array of structures
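Conceptually, Dive in All turns an array of structures into a per-member array. The Python sketch below shows that projection on made-up data; `Particle` and `dive_in_all` are hypothetical names used only for illustration and are not part of TotalView.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    x: float
    mass: float


def dive_in_all(elements, member):
    """Collect one member from every element of an array of structures."""
    return [getattr(element, member) for element in elements]


particles = [Particle(0.0, 1.5), Particle(1.0, 2.5), Particle(2.0, 3.5)]
masses = dive_in_all(particles, "mass")
```

Here `masses` holds the `mass` member of each element in order, which is the view the slide describes for a single member across the whole array.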

The Data View Panel - Entering Expressions

• Enter a new expression in the Data View panel to view that data
• Type the expression in the [Add New expression] field

[Figure callouts: A new expression is added; Increment a variable]

The Data View Panel - Casting

• Casting to another type
• Cast a variable into an array by adding the array specifier
• TotalView displays the array
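The idea of viewing a pointer as an array can be tried outside the debugger with Python's `ctypes`. This is an analogy under stated assumptions: the helper `view_as_array` is a hypothetical name, and the element count is supplied by hand, just as the array specifier is in the Data View.

```python
import ctypes


def view_as_array(pointer, count):
    """Reinterpret a pointer to int as an int[count] array over the same memory."""
    array_type = ctypes.c_int * count
    return ctypes.cast(pointer, ctypes.POINTER(array_type)).contents


# A block of five ints, seen first through a plain int pointer.
buffer = (ctypes.c_int * 5)(10, 20, 30, 40, 50)
first = ctypes.cast(buffer, ctypes.POINTER(ctypes.c_int))

# The same memory, now viewed as an array of five elements.
as_array = view_as_array(first, 5)
```

Through `first` only one element is directly visible, while `as_array` exposes all five. Both views share the same storage, so a write through one is seen through the other.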

Extending Debugging Capabilities: How to Debug (AI) Mixed Python/C++ Code

Mixed Language Python Debugging

• Debugging one language is difficult enough
  • Understanding the flow of execution across language barriers is hard
  • Examining and comparing data in both languages is challenging
• What TotalView provides
  • Easy Python debugging session setup
  • Fully integrated Python and C/C++ call stack
  • "Glue" layers between the languages removed
  • Easily examine and compare variables in Python and C++
  • Modest system requirements
  • Utilize reverse debugging and memory debugging
• What TotalView does not provide (yet)
  • Setting breakpoints and stepping within Python code

Python debugging with TotalView (demo)

    #!/usr/bin/python

    def callFact():
        import tv_python_example as tp
        a = 3
        b = 10
        c = a + b
        ch = "local string"
        # ……
        return tp.fact(a)

    if __name__ == '__main__':
        b = 2
        result = callFact()
        print(result)

Launch the script under the debugger:

    totalview -args python test_python_types.py

Remote Debugging - TotalView Remote UI

TotalView Remote UI

• Combine the convenience of establishing a remote connection to a cluster and the ability to run the TotalView GUI locally
• Front-end GUI architecture does not need to match back-end target architecture (macOS front-end -> Linux back-end)
• Secure communications
• Convenient saved sessions
• Once connected, debug as normal with access to all TotalView features
• Front-end GUI currently supports macOS and Linux x86/x86-64; a Windows client is coming

[Figure: Remote UI Architecture]

TotalView Reverse Connections

Reverse Connection Flow

[Figure: animated diagram of a FRONT-END NODE, a BATCH NODE, and COMPUTE NODES, with the labels $HOME/.totalview/connect, TotalView UI, tvconnect, srun, tvdsvr, and Rank 0. The frames step through the sequence below.]

1. tvconnect writes request
2. TotalView UI reads request
3. TotalView returns response
4. tvconnect reads response
5. exec
6. Socket connection opened
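Steps 1 to 4 of the flow amount to a request/response exchange through a shared directory. The Python sketch below imitates that exchange in a temporary directory. Everything specific in it is an assumption made for illustration: the `.request` and `.response` file names, the JSON payload, and the host and port values are invented and do not describe TotalView's actual file format. Steps 5 and 6 (exec and the socket connection) are not modeled.

```python
import json
import tempfile
from pathlib import Path


def write_request(connect_dir, job_id, command):
    """Step 1: the batch-side helper drops a request file."""
    path = Path(connect_dir) / f"{job_id}.request"
    path.write_text(json.dumps({"job": job_id, "command": command}))
    return path


def answer_requests(connect_dir, ui_host, ui_port):
    """Steps 2-3: the UI side reads each request and writes a response."""
    answered = []
    for request in sorted(Path(connect_dir).glob("*.request")):
        job = json.loads(request.read_text())["job"]
        response = request.with_suffix(".response")
        response.write_text(json.dumps({"job": job, "host": ui_host, "port": ui_port}))
        answered.append(job)
    return answered


def read_response(connect_dir, job_id):
    """Step 4: the batch-side helper learns where the UI is listening."""
    return json.loads((Path(connect_dir) / f"{job_id}.response").read_text())


def demo():
    with tempfile.TemporaryDirectory() as connect_dir:
        write_request(connect_dir, "job42", ["srun", "-n", "2", "hybrid_fib"])
        answer_requests(connect_dir, "login1", 4142)
        return read_response(connect_dir, "job42")
```

The point of the design is that the batch job initiates the exchange, so the UI on the front-end node does not need to know in advance where or when the job will run.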

Batch Script Submission with Reverse Connect

• Start a debugging session using TotalView Reverse Connect
• Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run
• Enables running the TotalView UI on the front-end node and remotely debugging jobs executing on the compute nodes
• Very easy to utilize: simply prefix the job launch or application start with the "tvconnect" command

    #!/bin/bash
    #SBATCH -J hybrid_fib
    …
    #SBATCH -n 2
    #SBATCH -c 4
    #SBATCH --mem-per-cpu=4000
    export OMP_NUM_THREADS=4

    tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks, Heap Status, and Identifying Dangling Pointers

What is a Memory Bug?

• A Memory Bug is a mistake in the management of heap memory
  • Leaking: Failure to free memory
  • Dangling references: Failure to clear pointers
  • Failure to check for error conditions
  • Memory Corruption
    • Writing to memory not allocated
    • Overrunning array bounds

TotalView Heap Interposition Agent (HIA) Technology

• Advantages of TotalView HIA Technology
  • Use it with your existing builds
    • No Source Code or Binary Instrumentation
  • Programs run nearly full speed
    • Low performance overhead
  • Low memory overhead
    • Efficient memory usage

[Figure: a Process containing User Code and Libraries, the Malloc API, and the Heap Interposition Agent (HIA) with its Allocation Table and Deallocation Table, shown alongside TotalView]
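To make the allocation and deallocation tables concrete, here is a toy Python model of an agent that sits between user code and the allocator and records every call. It is a sketch only: `ToyHeapAgent` is a hypothetical class, the addresses are simulated integers that are never reused, and nothing here reflects how the real HIA is implemented.

```python
class ToyHeapAgent:
    """Records allocations and frees so leaks and stale pointers can be listed."""

    def __init__(self):
        self._next_address = 0x1000
        self.allocated = {}    # address -> (size, tag) for live blocks
        self.deallocated = {}  # address -> (size, tag) for freed blocks

    def malloc(self, size, tag):
        address = self._next_address
        self._next_address += size  # simulated addresses are never reused
        self.allocated[address] = (size, tag)
        return address

    def free(self, address):
        if address not in self.allocated:
            raise ValueError(f"free of unknown or already freed block {address:#x}")
        self.deallocated[address] = self.allocated.pop(address)

    def leaks(self):
        """Blocks still allocated, reported by tag."""
        return sorted(tag for _, tag in self.allocated.values())

    def is_dangling(self, address):
        """True if the address refers to a block that has been freed."""
        return address in self.deallocated
```

With both tables available, a leak report is the set of blocks that never moved to the deallocation table, and a dangling pointer is any pointer whose target appears there.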

Memory Debugging Features - MemoryScape TotalView

TotalView New UI Features
• Leak detection
• Heap Status
• Dangling pointer detection

Coming Features
• Memory Error Events
• Memory Corruption Detection
• Memory Block Painting
• Memory Hoarding
• Memory Comparisons between processes

TotalView Reverse Debugging

Reverse Debugging with TotalView

• Reverse debugging lets developers go back in execution history
• Activated either before the program starts running or at some point after execution begins
• Captures and deterministically replays execution
• Enables stepping backwards and forwards by function, line, or instruction
• Run backwards to breakpoints
• Run backwards and stop when a variable changes value
• Save recording files for later analysis or collaboration
  • For a remote connection, use the CLI: dhistory -save <name>
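A toy record-and-replay model helps show what "run backwards and stop when a variable changes value" means. The Python sketch below keeps a list of snapshots and walks it in reverse. The class `Recorder` and its methods are hypothetical, and storing whole snapshots is a simplification chosen for clarity, not a description of how TotalView records execution.

```python
class Recorder:
    """Stores (line, variables) snapshots and supports moving backwards."""

    def __init__(self):
        self.history = []
        self.position = -1

    def record(self, line, **variables):
        self.history.append((line, dict(variables)))
        self.position = len(self.history) - 1

    def step_back(self):
        """Move one snapshot back, stopping at the start of the recording."""
        if self.position > 0:
            self.position -= 1
        return self.history[self.position]

    def run_back_until_change(self, name):
        """Go back to the latest snapshot where `name` differs from its current value."""
        current = self.history[self.position][1].get(name)
        while self.position > 0:
            self.position -= 1
            if self.history[self.position][1].get(name) != current:
                break
        return self.history[self.position]
```

In this model the returned snapshot is the state just before the write that produced the current value, which is usually the place to start looking for the cause of a bad value.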

Reverse Debugging Controls

• Run forward / Run backwards
• Next forward over functions / Next backwards over functions
• Step forward into functions / Step backwards into functions
• Advance forward out of function call / Advance backwards to calling function
• Advance forward to selected line / Advance backward to selected line
• Advance to "live" session
• Create a bookmark at this point in recorded history
• Save the recorded session

Debugging CUDA Applications with TotalView

TotalView for the NVIDIA® GPU Accelerator

• NVIDIA Tesla, Fermi, Kepler, Pascal, Volta, Turing, Ampere
• NVIDIA CUDA 9.2, 10 and 11
  • With support for Unified Memory
• Debugging 64-bit CUDA programs
• Features and capabilities include
  • Support for dynamic parallelism
  • Support for MPI based clusters and multi-card configurations
  • Flexible Display and Navigation on the CUDA device
    • Physical (device, SM, Warp, Lane)
    • Logical (Grid, Block) tuples
  • CUDA device window reveals what is running where
  • Support for types and separate memory address spaces
  • Leverages CUDA memcheck

Source View Opened on CUDA host code

Breakpoint Set in CUDA Kernel Code Before Launch

• A hollow breakpoint indicates that a breakpoint will be set when the code is loaded onto the GPU

GPU Physical and Logical Toolbars

• The logical toolbar displays the Block and Thread coordinates
• The physical toolbar displays the Device number, Streaming Multiprocessor, Warp, and Lane
• To view a CUDA host thread, select a thread with a positive thread ID in the Process and Threads view
• To view a CUDA GPU thread, select a thread with a negative thread ID, then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread
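The relationship between the logical thread coordinates and warp/lane positions can be worked out with a little arithmetic. The Python sketch below follows the standard CUDA rule for linearizing a thread index within its block and assumes a warp size of 32. The function names are hypothetical, and the warp number computed here is the warp's index within the block, which is not the same thing as the hardware warp slot shown on the physical toolbar.

```python
WARP_SIZE = 32  # assumed; true for current NVIDIA GPUs


def linear_thread_id(thread_idx, block_dim):
    """Flatten (x, y, z) thread coordinates within a block of size block_dim."""
    x, y, z = thread_idx
    dim_x, dim_y, _ = block_dim
    return x + y * dim_x + z * dim_x * dim_y


def warp_and_lane(thread_idx, block_dim):
    """Return (warp index within the block, lane within the warp)."""
    tid = linear_thread_id(thread_idx, block_dim)
    return tid // WARP_SIZE, tid % WARP_SIZE
```

For example, thread (5, 2, 0) in a 16 x 16 x 1 block has linear index 37, so it is lane 5 of the block's second warp.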

Displaying CUDA Program Elements

• The identifier @local is a TotalView built-in type storage qualifier that tells the debugger that the storage kind of "A" is local storage
• The debugger uses the storage qualifier to determine how to locate A in device memory

[Figure callouts: the @local type qualifier indicates that variable A is in local storage; "elements" is a pointer to a float in @generic storage]

Using TotalView for Parallel Debugging on ANL

TotalView remote debugging on Linux and macOS

• Download and install TotalView on your Linux or Mac machine
• Connect to the remote front-end node
• Run labs remotely

Hands-on labs

• Install TV from installers on Mac or Linux
• Ignore license code
• Start TotalView
• Remotely connect to Cooley and enable Reverse Connection

Labs

• Lab 1: Debugger Basics
• Lab 2: Viewing, Examining, Watching, and Editing Data
• Lab 3: Examining and Controlling a Parallel Application (on Cooley)
  • Using remote connect (tvconnect)
  • qsub -q training tvconnect.job
  • Modify and submit tvconnect.job on your machine

TotalView is available on Theta and Cooley

• Connect to Cooley/Theta
• Get an allocation first
  • qsub -A ATPESC2021 -n 4 -q debug-flat-quad -I (Theta)
  • qsub -A ATPESC2021 -n 4 -q training -I (Cooley)
• module load totalview (Theta)
• soft add +totalview (Cooley)
• totalview -args mpiexec -np <N> demoMpi_v2
• tvconnect mpiexec -np <N> demoMpi_v2

• Installed at /soft/debuggers/totalview-2021-08-04/toolworks/totalview.2021X.3.756/bin/totalview

TotalView Resources and Documentation

• TotalView website
  • https://totalview.io
• TotalView documentation
  • https://help.totalview.io
• TotalView Video Tutorials
  • https://totalview.io/support/video-tutorials
• Other Resources
  • Blog: https://totalview.io/blog

Summary

• Use of a modern debugger saves you time
• TotalView can help you because
  • It's cross-platform (the only debugger you ever need)
  • It allows you to debug accelerators (GPU) and CPU in one session
  • It allows you to debug multiple languages (C++/Python/Fortran)

Page 18: Techniques for Debugging HPC Applications

totalviewio21 | TotalView by Perforce copy Perforce Software Inc

Evaluation Points: Examples

• Print the value of a variable to the command line

    printf("The value of result is %d\n", result);

• Skip some code

    goto 63;

• Stop a loop after a certain number of iterations (here, every 100th)

    if ((i % 100) == 0) {
        printf("The value of i is %d\n", i);
        $stop;
    }

See "Using Built-in Statements" in Appendix A of the User Guide for more information on "$" expressions:

https://help.totalview.io/current/HTML/index.html#page/TotalView/BuiltInStatments.html#ww1894979

Watchpoints

• Watchpoints are set on a specific memory location
• Execution is stopped when the value stored in that memory location changes
• A breakpoint stops before an instruction executes; a watchpoint stops after an instruction executes
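The "stops after the instruction executes" behavior can be sketched with a small Python analogy. This is a toy model, not a TotalView API: the `Watched` class and its `write`, `read`, and `hits` names are hypothetical. The point it illustrates is that the store completes first and the trigger fires afterwards, and only when the stored value actually changed.

```python
class Watched:
    """Toy stand-in for a watched memory location (hypothetical, not a TotalView API)."""

    def __init__(self, value):
        self._value = value
        self.hits = []  # (old, new) pairs, recorded after each write that changes the value

    def write(self, new):
        old = self._value
        self._value = new      # the "instruction" executes first ...
        if new != old:         # ... and only then does the watchpoint trigger
            self.hits.append((old, new))

    def read(self):
        return self._value


def run():
    total = Watched(0)
    for value in [0, 5, 5, 7]:
        total.write(value)
    return total.hits
```

Writes that store the same value again do not trigger, so only two of the four writes in `run()` are recorded.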

Barrier Breakpoints

• Used to synchronize a group of threads or processes defined in the action point
• Threads or processes are held at the barrier point until all threads or processes in the group arrive
• When all threads or processes arrive, the barrier is satisfied and the threads or processes are released
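The hold-then-release behavior is the same idea as a program-level barrier. A minimal sketch using Python's standard `threading.Barrier` follows; it only illustrates the semantics and says nothing about how the debugger implements barrier breakpoints. The `run` and `worker` names are illustrative.

```python
import threading


def run(n_workers=4):
    barrier = threading.Barrier(n_workers)  # satisfied once n_workers threads arrive
    arrived = []                            # ranks, in order of arrival
    released = []                           # (rank, number arrived at release time)
    lock = threading.Lock()

    def worker(rank):
        with lock:
            arrived.append(rank)
        barrier.wait()                      # held here until every worker has arrived
        with lock:
            released.append((rank, len(arrived)))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return arrived, released
```

Every worker observes that all workers had arrived by the time it was released, which is the property that makes a barrier useful for lining up ranks or threads before inspecting them.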

Saving Breakpoints

• From the Action Points menu, select Save or Save As to save breakpoints
• Turn on the option to save action points on exit

Examining and Editing Data

Call Stack and Local Variables

Call Stack View
• Lists the set of call frames as the program calls from one function or method to another
• Filter button used to turn on or off filtering of frames

Local Variables View
• Displays local variables relative to the current thread of interest and the selected stack frame
• Organized by arguments and blocks
• To edit values, add the variable to the Data View

The Data View Panel

• Data View allows deeper exploration of data structures
• Edit data values
• Cast to new data types
• Add data to the Data View using the context menu or by dragging and dropping

[Screenshots: context menu; drag and drop]

The Data View Panel - Expanding Arrays and Structures

• Select the right arrow to display the substructures in a complex variable
• Any nested structures are displayed in the Data View

The Data View - Dive in All

• Use Dive in All to easily see each member of a data structure from an array of structures

The Data View Panel - Entering Expressions

• Enter a new expression in the Data View panel to view that data
• Type the expression in the [Add New expression] field
• A new expression is added
• Example: increment a variable

The Data View Panel - Casting

• Casting to another type
• Cast a variable into an array by adding the array specifier
• TotalView displays the array
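Casting a pointer to an array type tells the viewer how many elements to show starting at the pointed-to address. The same reinterpretation can be sketched in Python with the standard `ctypes` module. This is an analogy only; the names `buf`, `ptr`, and `as_array` are illustrative and do not come from the slides.

```python
import ctypes

# A buffer that the program only knows through a pointer to its first element.
buf = (ctypes.c_int * 5)(10, 20, 30, 40, 50)
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_int))  # comparable to `int *p`


def as_array(pointer, length):
    """Reinterpret `pointer` as an array of `length` ints (like casting int* to int[length])."""
    array_type = ctypes.c_int * length
    return list(ctypes.cast(pointer, ctypes.POINTER(array_type)).contents)
```

Through the plain pointer only `ptr[0]` is obviously meaningful; after the cast, `as_array(ptr, 5)` exposes all five elements. Choosing a length larger than the real allocation would read memory that does not belong to the buffer, which is equally true of an array cast in a debugger.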

Extending Debugging Capabilities: How to Debug (AI) Mixed Python/C++ Code

Mixed Language Python Debugging

• Debugging one language is difficult enough
  • Understanding the flow of execution across language barriers is hard
  • Examining and comparing data in both languages is challenging
• What TotalView provides
  • Easy Python debugging session setup
  • Fully integrated Python and C/C++ call stack
  • "Glue" layers between the languages removed
  • Easily examine and compare variables in Python and C++
  • Modest system requirements
  • Utilize reverse debugging and memory debugging
• What TotalView does not provide (yet)
  • Setting breakpoints and stepping within Python code

Python debugging with TotalView (demo)

    #!/usr/bin/python

    def callFact():
        import tv_python_example as tp
        a = 3
        b = 10
        c = a+b
        ch = "local string"
        # ……
        return tp.fact(a)

    if __name__ == '__main__':
        b = 2
        result = callFact()
        print(result)

Launch the script under the debugger:

    totalview -args python test_python_types.py

Remote Debugging - TotalView Remote UI

TotalView Remote UI

• Combine the convenience of establishing a remote connection to a cluster and the ability to run the TotalView GUI locally
• Front-end GUI architecture does not need to match back-end target architecture (macOS front-end -> Linux back-end)
• Secure communications
• Convenient saved sessions
• Once connected, debug as normal with access to all TotalView features
• Front-end GUI currently supports macOS and Linux x86/x86-64; a Windows client is coming

Remote UI Architecture

TotalView Reverse Connections

Reverse Connection Flow

[Diagram, built up over six slides. Regions: front-end node, batch node, compute nodes. Components: TotalView UI, $HOME/.totalview/connect, tvconnect, tvdsvr, srun, application ranks.]

1. tvconnect writes request
2. TotalView UI reads request
3. TotalView returns response
4. tvconnect reads response
5. exec
6. Socket connection opened
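Steps 1 to 4 of the flow amount to a request/response rendezvous through a shared directory. The sketch below models only that idea. Everything concrete in it is an assumption made for illustration: the file names `request.json` and `response.json`, the JSON payloads, and the host and port values are hypothetical and are not TotalView's actual format. Steps 5 and 6 (the exec and the socket connection) are left out.

```python
import json
import os
import tempfile
import threading
import time


def write_atomically(path, payload):
    """Write JSON to a temp file, then rename, so a reader never sees a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(payload, f)
    os.replace(tmp, path)


def wait_for(path, timeout=5.0):
    """Poll until `path` exists, then return its parsed JSON contents."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() > deadline:
            raise TimeoutError(path)
        time.sleep(0.01)
    with open(path) as f:
        return json.load(f)


def ui_side(connect_dir, log):
    """Front-end role: pick up the request and answer it (hypothetical file names)."""
    request = wait_for(os.path.join(connect_dir, "request.json"))
    log.append("2 UI reads request")
    log.append("3 UI returns response")
    write_atomically(
        os.path.join(connect_dir, "response.json"),
        {"host": "frontend.example", "port": 12345, "job": request["job"]},  # made-up values
    )


def job_side(connect_dir, log):
    """Batch-job role: announce the job, then wait to learn where to connect."""
    log.append("1 job writes request")
    write_atomically(os.path.join(connect_dir, "request.json"), {"job": "hybrid_fib"})
    response = wait_for(os.path.join(connect_dir, "response.json"))
    log.append("4 job reads response")
    return response


def run():
    log = []
    with tempfile.TemporaryDirectory() as connect_dir:
        ui = threading.Thread(target=ui_side, args=(connect_dir, log))
        ui.start()
        response = job_side(connect_dir, log)
        ui.join()
    return log, response
```

The appeal of this pattern on a cluster is that the job does not need to know in advance where the UI is running; both sides only need access to the same directory.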

Batch Script Submission with Reverse Connect

• Start a debugging session using TotalView Reverse Connect
• Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once it runs
• Enables running the TotalView UI on the front-end node while remotely debugging jobs executing on the compute nodes
• Very easy to use: simply prefix the job launch or application start with the "tvconnect" command

    #!/bin/bash
    #SBATCH -J hybrid_fib
    …
    #SBATCH -n 2
    #SBATCH -c 4
    #SBATCH --mem-per-cpu=4000
    export OMP_NUM_THREADS=4

    tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks, Heap Status, and Identifying Dangling Pointers

What is a Memory Bug?

• A memory bug is a mistake in the management of heap memory
  • Leaking: failure to free memory
  • Dangling references: failure to clear pointers
  • Failure to check for error conditions
• Memory corruption
  • Writing to memory not allocated
  • Overrunning array bounds

TotalView Heap Interposition Agent (HIA) Technology

• Advantages of TotalView HIA Technology
  • Use it with your existing builds
  • No source code or binary instrumentation
  • Programs run nearly full speed
  • Low performance overhead
  • Low memory overhead
  • Efficient memory usage

[Diagram: within the process, user code and libraries call the malloc API; the Heap Interposition Agent (HIA) sits in between and maintains an allocation table and a deallocation table, which TotalView reads.]
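The allocation and deallocation tables in the diagram suggest the kind of bookkeeping an interposition layer can do. The following is a toy model of that bookkeeping, assuming a simple address-to-size mapping; it is not TotalView's HIA, and the `HeapTracker` class with its methods is hypothetical.

```python
class HeapTracker:
    """Toy bookkeeping for heap events (illustrative only; not TotalView's HIA)."""

    def __init__(self):
        self.allocations = {}    # address -> size of each live block
        self.deallocations = {}  # address -> size of each freed block
        self.errors = []
        self._next_address = 0x1000  # fake addresses, never reused in this model

    def malloc(self, size):
        address = self._next_address
        self._next_address += size
        self.allocations[address] = size
        return address

    def free(self, address):
        if address in self.allocations:
            self.deallocations[address] = self.allocations.pop(address)
        elif address in self.deallocations:
            self.errors.append(("double free", address))
        else:
            self.errors.append(("free of unallocated memory", address))

    def is_dangling(self, address):
        """A pointer dangles if it still refers to a block that has been freed."""
        return address in self.deallocations

    def leaks(self):
        """Blocks that were never freed; a real tool would also check reachability."""
        return dict(self.allocations)
```

Because the tracker only observes calls to `malloc` and `free`, the program itself needs no source or binary changes, which mirrors the "use it with your existing builds" point on the slide.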

Memory Debugging Features - MemoryScape TotalView

TotalView New UI Features
• Leak detection
• Heap Status
• Dangling pointer detection

Coming Features
• Memory Error Events
• Memory Corruption Detection
• Memory Block Painting
• Memory Hoarding
• Memory Comparisons between processes

TotalView Reverse Debugging

Reverse Debugging with TotalView

• Reverse debugging provides the ability for developers to go back in execution history
• Activated either before the program starts running or at some point after execution begins
• Captures and deterministically replays execution
• Enables stepping backwards and forwards by function, line, or instruction
• Run backwards to breakpoints
• Run backwards and stop when a variable changes value
• Save recording files for later analysis or collaboration
  • For a remote connection, use the CLI: dhistory -save <name>
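"Run backwards and stop when a variable changes value" can be illustrated with a toy history of snapshots. This sketch assumes one full snapshot per step, which is a simplification chosen for clarity and not a description of how TotalView records execution. The `Recorder` class and its methods are hypothetical.

```python
class Recorder:
    """Toy execution history: one snapshot of the variables per executed step."""

    def __init__(self):
        self.history = []

    def record(self, **variables):
        self.history.append(dict(variables))

    def last_change(self, name, start=None):
        """Walk backwards from `start` and return the step index that last changed `name`."""
        i = len(self.history) - 1 if start is None else start
        while i > 0:
            if self.history[i][name] != self.history[i - 1][name]:
                return i
            i -= 1
        return None


def run():
    rec = Recorder()
    total = 0
    for i in range(6):
        if i == 3:
            total = -1  # the "bug": an unexpected write
        rec.record(i=i, total=total)
    return rec
```

Starting from the end of the recording, `last_change("total")` lands on step 3, the step where the unexpected write happened, without rerunning the program or guessing where to place a breakpoint.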

Reverse Debugging Controls

[Toolbar screenshot with callouts; each forward control has a backward counterpart.]

• Run forward / Run backwards
• Next forward over functions / Next backwards over functions
• Step forward into functions / Step backwards into functions
• Advance forward out of function call / Advance backwards to calling function
• Advance forward to selected line / Advance backward to selected line
• Advance to "live" session
• Create a bookmark at this point in recorded history
• Save the recorded session

Debugging CUDA Applications with TotalView

TotalView for the NVIDIA® GPU Accelerator

• NVIDIA Tesla, Fermi, Kepler, Pascal, Volta, Turing, Ampere
• NVIDIA CUDA 9.2, 10 and 11
  • With support for Unified Memory
• Debugging 64-bit CUDA programs
• Features and capabilities include
  • Support for dynamic parallelism
  • Support for MPI-based clusters and multi-card configurations
  • Flexible display and navigation on the CUDA device
    • Physical (Device, SM, Warp, Lane)
    • Logical (Grid, Block) tuples
  • CUDA device window reveals what is running where
  • Support for types and separate memory address spaces
  • Leverages CUDA memcheck

Source View Opened on CUDA host code

Breakpoint Set in CUDA Kernel Code Before Launch

• A hollow breakpoint indicates that a breakpoint will be set when the code is loaded onto the GPU

GPU Physical and Logical Toolbars

• Logical toolbar displays the Block and Thread coordinates
• Physical toolbar displays the Device number, Streaming Multiprocessor, Warp, and Lane
• To view a CUDA host thread, select a thread with a positive thread ID in the Process and Threads view
• To view a CUDA GPU thread, select a thread with a negative thread ID, then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread
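The relationship between logical thread coordinates and warp and lane can be worked out by hand. The sketch below applies the standard CUDA rule for linearizing a thread index within a block and assumes a warp size of 32; the function names are illustrative. It shows only how the threads of a block group into warps. The device, SM, and hardware warp slot shown on the physical toolbar are assigned at run time and cannot be computed this way.

```python
WARP_SIZE = 32  # threads per warp on current NVIDIA GPUs


def linear_thread_id(thread_idx, block_dim):
    """Flatten (x, y, z) thread coordinates within a block: x + y*Dx + z*Dx*Dy."""
    x, y, z = thread_idx
    dx, dy, _ = block_dim
    return x + y * dx + z * dx * dy


def warp_and_lane(thread_idx, block_dim):
    """Return (warp index within the block, lane within the warp) for a thread."""
    tid = linear_thread_id(thread_idx, block_dim)
    return tid // WARP_SIZE, tid % WARP_SIZE
```

For a 16 x 16 block, thread (5, 2, 0) has linear ID 5 + 2 * 16 = 37, so it is lane 5 of the block's second warp (index 1).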

Displaying CUDA Program Elements

• The identifier @local is a TotalView built-in type storage qualifier that tells the debugger the storage kind of "A" is local storage
• The debugger uses the storage qualifier to determine how to locate A in device memory

[Screenshot callouts: the @local type qualifier indicates that variable A is in local storage; "elements" is a pointer to a float in generic storage.]

Using TotalView for Parallel Debugging on ANL

TotalView remote debugging on Linux and macOS

• Download and install TotalView on your Linux or Mac machine
• Connect to the remote front-end node
• Run the labs remotely

Hands-on labs

• Install TotalView from the installers on Mac or Linux
  • Ignore the license code
  • Start TotalView
  • Remotely connect to Cooley and enable Reverse Connection

Labs

• Lab 1: Debugger Basics
• Lab 2: Viewing, Examining, Watching, and Editing Data
• Lab 3: Examining and Controlling a Parallel Application (on Cooley)
  • Using remote connect (tvconnect)
  • qsub -q training tvconnect.job
  • Modify and submit tvconnect.job on your machine

TotalView is available on Theta and Cooley

• Connect to Cooley/Theta
• Get an allocation first
  • qsub -A ATPESC2021 -n 4 -q debug-flat-quad -I (Theta)
  • qsub -A ATPESC2021 -n 4 -q training -I (Cooley)
• module load totalview (Theta)
• soft add +totalview (Cooley)
• totalview -args mpiexec -np <N> demoMpi_v2
• tvconnect mpiexec -np <N> demoMpi_v2
• Installed at /soft/debuggers/totalview-2021-08-04/toolworks/totalview.2021X.3.756/bin/totalview

TotalView Resources and Documentation

• TotalView website
  • https://totalview.io
• TotalView documentation
  • https://help.totalview.io
• TotalView Video Tutorials
  • https://totalview.io/support/video-tutorials
• Other Resources
  • Blog: https://totalview.io/blog

Summary

• Using a modern debugger saves you time
• TotalView can help you because
  • It's cross-platform (the only debugger you ever need)
  • It allows you to debug accelerators (GPU) and CPU in one session
  • It allows you to debug multiple languages (C++/Python/Fortran)

Page 20: Techniques for Debugging HPC Applications

totalviewio23 | TotalView by Perforce copy Perforce Software Inc

bull Used to synchronize a group of threads or processes defined in the action point

bull Threads or processes are held at barrierpoint until all threads or processes in the group arrive

bull When all threads or processes arrive the barrier is satisfied and the threads or processes are released

Barrier Breakpoints

totalviewio24 | TotalView by Perforce copy Perforce Software Inc

Saving Breakpoints

From the Action Points menu select Save or Save As to save breakpointsTurn on option to save action points on exit

Examining and Editing Data

totalviewio26 | TotalView by Perforce copy Perforce Software Inc

Call Stack and Local Variables

Call Stack Viewbull Lists the set of call frames as the

program calls from one function or method to another

bull Filter button used to turn on or off filtering of frames

Local Variables Viewbull Displays local variables relative to the

current thread of interest and the selected stack frame

bull Organized by arguments and blocksbull To edit values add variable to the Data

View

totalviewio27 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel

bull Data View allows deeper exploration of data structures

bull Edit data valuesbull Cast to new data typesbull Add data to the Data View using the context

menu or by dragging and dropping

Context menu

Drag and drop

totalviewio28 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel ndash Expanding Arrays and Structures

Select the right arrow to display the substructures in a complex variable

Any nested structures are displayed in the data view

totalviewio29 | TotalView by Perforce copy Perforce Software Inc

bull Dive in All

bull Use Dive in All to easily see each member of a data structure from an array of structures

The Data View ndash Dive in All

totalviewio30 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel ndash Entering Expressions

Enter a new expression in the Data View panel to view that data

A new expression is added

Increment a variable

Type the expression in the [Add New expression] field

totalviewio32 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel - Casting

Casting to another type

TotalView displays the array

Cast a variable into an array by adding the array specifier

Extending Debugging Capabilities: How to Debug (AI) Mixed Python/C++ Code

Mixed Language Python Debugging

• Debugging one language is difficult enough
• Understanding the flow of execution across language barriers is hard
• Examining and comparing data in both languages is challenging
• What TotalView provides
  • Easy Python debugging session setup
  • Fully integrated Python and C/C++ call stack
  • "Glue" layers between the languages removed
  • Easily examine and compare variables in Python and C++
  • Modest system requirements
  • Utilize reverse debugging and memory debugging
• What TotalView does not provide (yet)
  • Setting breakpoints and stepping within Python code

Python debugging with TotalView (demo)

    #!/usr/bin/python

    def callFact():
        # Demo extension module; assumed here to wrap the C++ fact() function.
        import tv_python_example as tp
        a = 3
        b = 10
        c = a + b
        ch = "local string"
        # …… (remaining lines elided on the slide)
        return tp.fact(a)

    if __name__ == '__main__':
        b = 2
        result = callFact()
        print(result)

Start the session with:

    totalview -args python test_python_types.py

Remote Debugging - TotalView Remote UI

TotalView Remote UI

• Combine the convenience of establishing a remote connection to a cluster and the ability to run the TotalView GUI locally
• Front-end GUI architecture does not need to match back-end target architecture (macOS front-end -> Linux back-end)
• Secure communications
• Convenient saved sessions
• Once connected, debug as normal with access to all TotalView features
• Front-end GUI currently supports macOS and Linux x86/x86-64; a Windows client is coming


Remote UI Architecture

TotalView Reverse Connections

Reverse Connection Flow

[Diagram, built up over six animation slides. Elements shown: front-end node, batch node, compute nodes, TotalView UI, tvconnect, srun, tvdsvr, the application ranks, and the $HOME/.totalview/connect directory.]

1. tvconnect writes request
2. TotalView UI reads request
3. TotalView returns response
4. tvconnect reads response
5. exec
6. Socket connection opened
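Steps 1 to 4 of the Reverse Connection Flow diagram describe a request/response handshake through a shared directory. The sketch below is a toy model of that idea only. The file names, the JSON payload, the host name login1, and the port number are all invented for illustration and say nothing about the files TotalView actually writes. Steps 5 and 6 (exec and the socket connection) are left out.

```python
import json
import tempfile
from pathlib import Path


def write_request(connect_dir, job_name):
    """Step 1: the job side drops a request file into the shared directory."""
    request_path = Path(connect_dir) / f"{job_name}.request"
    request_path.write_text(json.dumps({"job": job_name}))
    return request_path


def answer_requests(connect_dir, ui_host, ui_port):
    """Steps 2-3: the UI side reads each request and writes a matching response."""
    answered = []
    for request_path in sorted(Path(connect_dir).glob("*.request")):
        request = json.loads(request_path.read_text())
        response = {"job": request["job"], "host": ui_host, "port": ui_port}
        request_path.with_suffix(".response").write_text(json.dumps(response))
        answered.append(request["job"])
    return answered


def read_response(request_path):
    """Step 4: the job side reads where it should connect back to."""
    return json.loads(request_path.with_suffix(".response").read_text())


# Hypothetical values throughout: a temp directory stands in for the shared directory.
with tempfile.TemporaryDirectory() as connect_dir:
    req = write_request(connect_dir, "hybrid_fib")
    answered = answer_requests(connect_dir, "login1", 4142)
    reply = read_response(req)

print(answered, reply)
```

The point of routing the handshake through a directory both sides can see is that the job does not need to know in advance where the UI is running, and the UI does not need to know when the batch system will start the job.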

Batch Script Submission with Reverse Connect

• Start a debugging session using TotalView Reverse Connect
• Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run
• Enables running the TotalView UI on the front-end node and remotely debugging jobs executing on the compute nodes
• Very easy to utilize: simply prefix the job launch or application start with the "tvconnect" command

    #!/bin/bash
    #SBATCH -J hybrid_fib
    …
    #SBATCH -n 2
    #SBATCH -c 4
    #SBATCH --mem-per-cpu=4000
    export OMP_NUM_THREADS=4

    tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks, Heap Status, and Identifying Dangling Pointers

What is a Memory Bug?

• A memory bug is a mistake in the management of heap memory
  • Leaking: failure to free memory
  • Dangling references: failure to clear pointers
  • Failure to check for error conditions
• Memory corruption
  • Writing to memory not allocated
  • Overrunning array bounds

TotalView Heap Interposition Agent (HIA) Technology

• Advantages of TotalView HIA Technology
  • Use it with your existing builds
    • No source code or binary instrumentation
  • Programs run nearly full speed
    • Low performance overhead
  • Low memory overhead
    • Efficient memory usage

[Diagram labels: Process, User Code and Libraries, Malloc API, Heap Interposition Agent (HIA), Allocation Table, Deallocation Table, TotalView]
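The allocation and deallocation tables in the HIA diagram suggest how leaks and dangling pointers can be reported without instrumenting the program: every allocation and free is recorded as it passes through. The sketch below is a toy model of that bookkeeping, not the HIA itself. The class HeapTracker, its method names, the fake addresses, and the call-site labels are all hypothetical.

```python
class HeapTracker:
    """Toy model of an allocation table and a deallocation table."""

    def __init__(self):
        self.allocations = {}     # address -> (size, call site) for live blocks
        self.deallocations = {}   # address -> call site that freed the block
        self._next_address = 0x1000

    def malloc(self, size, site):
        address = self._next_address
        self._next_address += size
        self.allocations[address] = (size, site)
        return address

    def free(self, address, site):
        if address not in self.allocations:
            # Freed twice, or never allocated: report it rather than corrupt state.
            return f"invalid free of {address:#x} at {site}"
        del self.allocations[address]
        self.deallocations[address] = site
        return None

    def is_dangling(self, address):
        """A pointer is dangling if its block was freed and is no longer live."""
        return address in self.deallocations and address not in self.allocations

    def leaks(self):
        """Blocks still in the allocation table at exit were never freed."""
        return sorted(site for _, site in self.allocations.values())


heap = HeapTracker()
a = heap.malloc(64, "init_grid")
b = heap.malloc(128, "load_input")
heap.free(a, "teardown")
double_free = heap.free(a, "teardown_again")
print(heap.leaks(), heap.is_dangling(a), double_free)
```

In this model a leak is simply an entry left in the allocation table, and a dangling pointer is an address that appears only in the deallocation table. The toy never reuses addresses, which a real allocator does, so a real tool has to track more than this.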

Memory Debugging Features - MemoryScape TotalView

TotalView New UI Features
• Leak detection
• Heap Status
• Dangling pointer detection

Coming Features
• Memory Error Events
• Memory Corruption Detection
• Memory Block Painting
• Memory Hoarding
• Memory Comparisons between processes

TotalView Reverse Debugging

Reverse Debugging with TotalView

• Reverse debugging provides the ability for developers to go back in execution history
• Activated either before the program starts running or at some point after execution begins
• Captures and deterministically replays execution
• Enables stepping backwards and forwards by function, line, or instruction
• Run backwards to breakpoints
• Run backwards and stop when a variable changes value
• Save recording files for later analysis or collaboration
  • For a remote connection, use the CLI: dhistory -save <name>
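"Run backwards and stop when a variable changes value" is easiest to picture as a search through recorded history. The sketch below is a conceptual toy that stores one snapshot of the variables per step and walks backwards through them. It does not reflect how TotalView records or replays execution, and the class Recording and its methods are hypothetical names.

```python
class Recording:
    """Toy execution history: one snapshot of the variables per executed step."""

    def __init__(self):
        self.snapshots = []

    def record(self, **variables):
        self.snapshots.append(dict(variables))

    def last_change_before(self, position, name):
        """Walk backwards from `position` to the step that last changed `name`."""
        for step in range(position, 0, -1):
            if self.snapshots[step][name] != self.snapshots[step - 1][name]:
                return step
        return None


history = Recording()
total = 0
for i, value in enumerate([4, 0, 0, -7, 0]):
    total += value
    history.record(i=i, total=total)

# Start at the end of the recording and run backwards to the last write to `total`.
end = len(history.snapshots) - 1
changed_at = history.last_change_before(end, "total")
print(changed_at, history.snapshots[changed_at])
```

Starting from a wrong value at the end of the run, the backward search lands directly on the step that produced it, which is the workflow the slide describes for tracking down where a variable was corrupted.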

Reverse Debugging Controls

Forward and backward controls, in pairs:
• Run forward / Run backwards
• Next forward over functions / Next backwards over functions
• Step forward into functions / Step backwards into functions
• Advance forward out of function call / Advance backwards to calling function
• Advance forward to selected line / Advance backward to selected line

Other controls:
• Advance to "live" session
• Create a bookmark at this point in recorded history
• Save the recorded session

Debugging CUDA Applications with TotalView

TotalView for the NVIDIA® GPU Accelerator

• NVIDIA Tesla, Fermi, Kepler, Pascal, Volta, Turing, Ampere
• NVIDIA CUDA 9.2, 10 and 11
  • With support for Unified Memory
• Debugging 64-bit CUDA programs
• Features and capabilities include
  • Support for dynamic parallelism
  • Support for MPI-based clusters and multi-card configurations
  • Flexible display and navigation on the CUDA device
    • Physical (device, SM, warp, lane)
    • Logical (grid, block) tuples
  • CUDA device window reveals what is running where
  • Support for types and separate memory address spaces
  • Leverages CUDA memcheck

Source View Opened on CUDA host code

Breakpoint Set in CUDA Kernel Code Before Launch

A hollow breakpoint indicates that a breakpoint will be set when the code is loaded onto the GPU.

GPU Physical and Logical Toolbars

• The logical toolbar displays the Block and Thread coordinates.
• The physical toolbar displays the Device number, Streaming Multiprocessor, Warp, and Lane.
• To view a CUDA host thread, select a thread with a positive thread ID in the Process and Threads view.
• To view a CUDA GPU thread, select a thread with a negative thread ID, then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread.
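The logical and physical coordinates are related: CUDA flattens a thread's (x, y, z) index within its block with x varying fastest, and consecutive groups of 32 flattened IDs form a warp. The sketch below works out the warp index within the block and the lane for a given logical thread. The function names are hypothetical, and the device number and Streaming Multiprocessor cannot be computed this way because the hardware scheduler assigns them at run time.

```python
WARP_SIZE = 32  # warp size on current NVIDIA GPUs


def linear_thread_id(thread_idx, block_dim):
    """Flatten an (x, y, z) thread index within its block, x varying fastest."""
    x, y, z = thread_idx
    dim_x, dim_y, _ = block_dim
    return x + y * dim_x + z * dim_x * dim_y


def warp_and_lane(thread_idx, block_dim):
    """Return (warp index within the block, lane within the warp)."""
    return divmod(linear_thread_id(thread_idx, block_dim), WARP_SIZE)


block_dim = (16, 16, 1)
print(warp_and_lane((0, 0, 0), block_dim))
print(warp_and_lane((5, 3, 0), block_dim))
```

With a 16 x 16 block, each warp covers two rows of the block, so threads that look far apart in logical coordinates can still execute in lockstep in the same warp.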

Displaying CUDA Program Elements

• The identifier @local is a TotalView built-in type storage qualifier that tells the debugger the storage kind of "A" is local storage
• The debugger uses the storage qualifier to determine how to locate A in device memory

The @local type qualifier indicates that variable A is in local storage.

"elements" is a pointer to a float in @generic storage.

Using TotalView for Parallel Debugging on ANL

TotalView remote debugging on Linux and macOS

• Download and install TotalView on your Linux or Mac machine
• Connect to the remote front-end node
• Run labs remotely

Hands-on labs

• Install TV from installers on Mac or Linux
  • Ignore license code
• Start TotalView
• Remotely connect to Cooley and enable Reverse Connection

Labs
• Lab 1: Debugger Basics
• Lab 2: Viewing, Examining, Watching, and Editing Data
• Lab 3: Examining and Controlling a Parallel Application (on Cooley)
  • Using remote connect (tvconnect)
  • qsub -q training tvconnect.job
  • Modify and submit tvconnect.job on your machine

TotalView is available on Theta and Cooley

• Connect to Cooley/Theta
• Get an allocation first
  • qsub -A ATPESC2021 -n 4 -q debug-flat-quad -I (Theta)
  • qsub -A ATPESC2021 -n 4 -q training -I (Cooley)
• module load totalview (Theta)
• soft add +totalview (Cooley)
• totalview -args mpiexec -np <N> demoMpi_v2
• tvconnect mpiexec -np <N> demoMpi_v2
• Installed at softdebuggerstotalview-2021-08-04toolworkstotalview2021X3756bintotalview

TotalView Resources and Documentation

• TotalView website
  • https://totalview.io
• TotalView documentation
  • https://help.totalview.io
• TotalView Video Tutorials
  • https://totalview.io/support/video-tutorials
• Other Resources
  • Blog: https://totalview.io/blog

Summary

• Using a modern debugger saves you time
• TotalView can help you because
  • It's cross-platform (the only debugger you ever need)
  • It allows you to debug accelerators (GPU) and CPU in one session
  • It allows you to debug multiple languages (C++/Python/Fortran)


Page 22: Techniques for Debugging HPC Applications

Examining and Editing Data

totalviewio26 | TotalView by Perforce copy Perforce Software Inc

Call Stack and Local Variables

Call Stack Viewbull Lists the set of call frames as the

program calls from one function or method to another

bull Filter button used to turn on or off filtering of frames

Local Variables Viewbull Displays local variables relative to the

current thread of interest and the selected stack frame

bull Organized by arguments and blocksbull To edit values add variable to the Data

View

totalviewio27 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel

bull Data View allows deeper exploration of data structures

bull Edit data valuesbull Cast to new data typesbull Add data to the Data View using the context

menu or by dragging and dropping

Context menu

Drag and drop

totalviewio28 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel ndash Expanding Arrays and Structures

Select the right arrow to display the substructures in a complex variable

Any nested structures are displayed in the data view

totalviewio29 | TotalView by Perforce copy Perforce Software Inc

bull Dive in All

bull Use Dive in All to easily see each member of a data structure from an array of structures

The Data View ndash Dive in All

totalviewio30 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel ndash Entering Expressions

Enter a new expression in the Data View panel to view that data

A new expression is added

Increment a variable

Type the expression in the [Add New expression] field

totalviewio32 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel - Casting

Casting to another type

TotalView displays the array

Cast a variable into an array by adding the array specifier

Extending Debugging Capabilities How to Debug (AI) Mixed PythonC++ Code

totalviewio34 | TotalView by Perforce copy Perforce Software Inc

bull Debugging one language is difficult enough

bull Understanding the flow of execution across language barriers is hard

bull Examining and comparing data in both languages is challenging

bull What TotalView provides

bull Easy python debugging session setup

bull Fully integrated Python and CC++ call stack

bull rdquoGluerdquo layers between the languages removed

bull Easily examine and compare variables in Python and C++

bull Modest system requirements

bull Utilize reverse debugging and memory debugging

bull What TotalView does not provide (yet)

bull Setting breakpoints and stepping within Python code

Mixed Language Python Debugging

totalviewio35 | TotalView by Perforce copy Perforce Software Inc

Python debugging with TotalView (demo)

usrbinpython

def callFact()import tv_python_example as tpa = 3b = 10c = a+bch = ldquolocal stringrdquohelliphellip

return tpfact(a)if __name__ == __main__rsquo

b = 2result = callFact()print result

totalview -args python test_python_typespy

Remote Debugging - TotalView Remote UI

roguewavecom38 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Combine the convenience of establishing a remote

connection to a cluster and the ability to run the

TotalView GUI locally

bull Front-end GUI architecture does not need to match back-

end target architecture (macOS front-end -gt Linux back-

end)

bull Secure communications

bull Convenient saved sessions

bull Once connected debug as normal with access to all

TotalView features

bull Front-end GUI currently supports macOS and Linux

x86x86-64 Windows client is coming

TotalView Remote UI

roguewavecom39 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Remote UI Architecture

TotalView Reverse Connections

roguewavecom41 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

roguewavecom42 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

2 TotalView UI reads request

3 TotalView returns response

roguewavecom43 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

2 TotalView UI reads request

3 TotalView returns response

roguewavecom44 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

roguewavecom45 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

roguewavecom46 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

v TotalView UI

2 TotalView UI reads request

3 TotalView returns response

tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

6 socket connection opened

Batch Script Submission with Reverse Connect

• Start a debugging session using TotalView Reverse Connect
• Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once it runs
• Enables running the TotalView UI on the front-end node and remotely debugging jobs executing on the compute nodes
• Very easy to use: simply prefix the job launch or application start with the "tvconnect" command

#!/bin/bash
#SBATCH -J hybrid_fib
…
#SBATCH -n 2
#SBATCH -c 4
#SBATCH --mem-per-cpu=4000
export OMP_NUM_THREADS=4

tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib
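If job scripts are generated by a tool rather than written by hand, the "prefix with tvconnect" rule can be applied mechanically. A minimal sketch, assuming a hypothetical helper named `launch_line` (not part of TotalView or Slurm) that only builds the command string:

```python
import shlex


def launch_line(launcher_args, reverse_connect=True):
    """Return the job-launch line, optionally prefixed with tvconnect."""
    words = list(launcher_args)
    if reverse_connect:
        # Reverse Connect only needs the prefix; the launch itself is unchanged.
        words.insert(0, "tvconnect")
    return shlex.join(words)


srun = ["srun", "-n", "2", "--cpus-per-task=4", "--mpi=pmix", "hybrid_fib"]
print(launch_line(srun))
print(launch_line(srun, reverse_connect=False))
```

Keeping the prefix behind a flag makes it easy to submit the same job with or without the debugger attached.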

Memory Leaks, Heap Status, and Identifying Dangling Pointers

What is a Memory Bug?

• A memory bug is a mistake in the management of heap memory
  • Leaking: failure to free memory
  • Dangling references: failure to clear pointers
  • Failure to check for error conditions
• Memory corruption
  • Writing to memory not allocated
  • Overrunning array bounds

TotalView Heap Interposition Agent (HIA) Technology

• Advantages of TotalView HIA technology
  • Use it with your existing builds
  • No source code or binary instrumentation
  • Programs run at nearly full speed
  • Low performance overhead
  • Low memory overhead
  • Efficient memory usage

[Diagram: a process containing user code and libraries, the malloc API, and the Heap Interposition Agent (HIA) with its allocation table and deallocation table, connected to TotalView.]
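The idea of an interposition layer that keeps allocation and deallocation tables can be shown with a toy model. The sketch below is an assumption-laden illustration, not the HIA itself: `TrackingHeap`, its fake addresses, and the "call site" strings are all invented for this example, and a block is reported simply because it is still allocated, whereas a real leak detector also checks whether the block is still reachable.

```python
import itertools


class TrackingHeap:
    """Toy allocator wrapper that records every allocation and release."""

    def __init__(self):
        self._next_address = itertools.count(0x1000, 0x100)  # fake addresses
        self.allocations = {}    # address -> (size, call site) for live blocks
        self.deallocations = {}  # address -> (size, call site) for freed blocks

    def malloc(self, size, site):
        address = next(self._next_address)
        self.allocations[address] = (size, site)
        return address

    def free(self, address, site):
        if address not in self.allocations:
            kind = "double free" if address in self.deallocations else "invalid free"
            raise ValueError(f"{kind} of {address:#x} at {site}")
        size, _ = self.allocations.pop(address)
        self.deallocations[address] = (size, site)

    def leak_report(self):
        """Blocks that were allocated and never freed, as (call site, size)."""
        return sorted((site, size) for size, site in self.allocations.values())


heap = TrackingHeap()
grid = heap.malloc(64, "init_grid")
mesh = heap.malloc(128, "load_mesh")
heap.free(grid, "teardown")
print(heap.leak_report())
```

Because the bookkeeping lives in the wrapper, the "user code" calling `malloc` and `free` needs no changes, which mirrors the "use it with your existing builds" point on the slide.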

Memory Debugging Features – MemoryScape TotalView

TotalView New UI Features

• Leak detection
• Heap status
• Dangling pointer detection

Coming Features

• Memory error events
• Memory corruption detection
• Memory block painting
• Memory hoarding
• Memory comparisons between processes

TotalView Reverse Debugging

Reverse Debugging with TotalView

• Reverse debugging lets developers go back in execution history
• Activated either before the program starts running or at some point after execution begins
• Captures and deterministically replays execution
• Enables stepping backwards and forwards by function, line, or instruction
• Run backwards to breakpoints
• Run backwards and stop when a variable changes value
• Save recording files for later analysis or collaboration
  • For remote connections, use the CLI: dhistory -save <name>
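The "run backwards and stop when a variable changes value" operation is easiest to picture on a recorded history. The sketch below is a deliberately simplified model: the `Recorder` class is hypothetical and stores a full snapshot of the variables at every step, whereas a real reverse debugger records enough of the execution to replay it deterministically rather than copying all state.

```python
class Recorder:
    """Keeps one snapshot of the watched variables per executed step."""

    def __init__(self):
        self.history = []   # snapshots in execution order
        self.position = -1  # index of the snapshot currently being viewed

    def record(self, **variables):
        self.history.append(dict(variables))
        self.position = len(self.history) - 1

    def step_back(self):
        if self.position > 0:
            self.position -= 1
        return self.history[self.position]

    def step_forward(self):
        if self.position < len(self.history) - 1:
            self.position += 1
        return self.history[self.position]

    def run_back_until_changed(self, name):
        """Go back to the step that gave `name` the value it has now."""
        current = self.history[self.position][name]
        while self.position > 0 and self.history[self.position - 1][name] == current:
            self.position -= 1
        return self.history[self.position]


rec = Recorder()
total = 0
for i in range(5):
    if i <= 2:
        total += i  # total stops changing after i == 2
    rec.record(i=i, total=total)

print(rec.run_back_until_changed("total"))
```

Starting from the last step, the search lands on the iteration that last wrote `total`, which is the question a reverse watchpoint answers without rerunning the program.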

Reverse Debugging Controls

[Toolbar screenshot; the callouts label the buttons:]

• Run forward / Run backwards
• Next forward over functions / Next backwards over functions
• Step forward into functions / Step backwards into functions
• Advance forward out of function call / Advance backwards to calling function
• Advance forward to selected line / Advance backward to selected line
• Advance to "live" session
• Create a bookmark at this point in recorded history
• Save the recorded session

Debugging CUDA Applications with TotalView

TotalView for the NVIDIA® GPU Accelerator

• NVIDIA Tesla, Fermi, Kepler, Pascal, Volta, Turing, Ampere
• NVIDIA CUDA 9.2, 10, and 11
  • With support for Unified Memory
• Debugging 64-bit CUDA programs
• Features and capabilities include
  • Support for dynamic parallelism
  • Support for MPI-based clusters and multi-card configurations
  • Flexible display and navigation on the CUDA device
    • Physical (device, SM, warp, lane)
    • Logical (grid, block) tuples
  • CUDA device window reveals what is running where
  • Support for types and separate memory address spaces
  • Leverages CUDA memcheck

Source View Opened on CUDA Host Code

Breakpoint Set in CUDA Kernel Code Before Launch

A hollow breakpoint indicates that a breakpoint will be set when the code is loaded onto the GPU.

GPU Physical and Logical Toolbars

The logical toolbar displays the Block and Thread coordinates.

The physical toolbar displays the Device number, Streaming Multiprocessor, Warp, and Lane.

To view a CUDA host thread, select a thread with a positive thread ID in the Process and Threads view.

To view a CUDA GPU thread, select a thread with a negative thread ID, then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread.
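The logical and physical views are related: within a block, CUDA groups threads into warps of 32 in order of their linearized thread index. The sketch below computes the warp and lane a thread occupies within its own block from its logical coordinates. The function name `warp_and_lane` is made up for this example, and the device and streaming multiprocessor shown on the physical toolbar are not covered, since the hardware scheduler assigns those at run time.

```python
WARP_SIZE = 32  # threads per warp on current NVIDIA GPUs


def warp_and_lane(thread_idx, block_dim):
    """Map a thread's (x, y, z) index within its block to (warp, lane) in that block."""
    x, y, z = thread_idx
    dim_x, dim_y, dim_z = block_dim
    if not (0 <= x < dim_x and 0 <= y < dim_y and 0 <= z < dim_z):
        raise ValueError("thread index outside block dimensions")
    # x varies fastest, then y, then z, as in the CUDA thread hierarchy.
    linear = x + y * dim_x + z * dim_x * dim_y
    return divmod(linear, WARP_SIZE)


# Thread (5, 3, 0) in a 16x16x1 block: linear index 5 + 3*16 = 53.
print(warp_and_lane((5, 3, 0), (16, 16, 1)))
```

This is handy when a fault is reported for a particular lane and you want to know which logical thread to select in the GPU thread selector, or the other way round.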

Displaying CUDA Program Elements

• The identifier @local is a TotalView built-in type storage qualifier that tells the debugger that the storage kind of A is local storage
• The debugger uses the storage qualifier to determine how to locate A in device memory

[Screenshot callouts:] The @local type qualifier indicates that variable A is in local storage. "elements" is a pointer to a float in generic storage.

Using TotalView for Parallel Debugging on ANL

TotalView remote debugging on Linux and macOS

• Download and install TotalView on your Linux or Mac machine
• Connect to the remote front-end node
• Run labs remotely

Hands-on labs

• Install TotalView from the installers on Mac or Linux
• Ignore license code
• Start TotalView
• Remotely connect to Cooley and enable Reverse Connection

Labs

• Lab 1: Debugger Basics
• Lab 2: Viewing, Examining, Watching, and Editing Data
• Lab 3: Examining and Controlling a Parallel Application (on Cooley)
  • Using remote connect (tvconnect)
  • qsub -q training tvconnect.job
  • Modify and submit tvconnect.job on your machine

TotalView is available on Theta and Cooley

• Connect to Cooley/Theta
• Get an allocation first
  • qsub -A ATPESC2021 -n 4 -q debug-flat-quad -I (Theta)
  • qsub -A ATPESC2021 -n 4 -q training -I (Cooley)
• module load totalview (Theta)
• soft add +totalview (Cooley)
• totalview -args mpiexec -np <N> demoMpi_v2
• tvconnect mpiexec -np <N> demoMpi_v2

• Installed at /soft/debuggers/totalview-2021-08-04/toolworks/totalview.2021X.3.756/bin/totalview

TotalView Resources and Documentation

• TotalView website
  • https://totalview.io
• TotalView documentation
  • https://help.totalview.io
• TotalView video tutorials
  • https://totalview.io/support/video-tutorials
• Other resources
  • Blog: https://totalview.io/blog

Summary

• Using a modern debugger saves you time
• TotalView can help you because
  • It's cross-platform (the only debugger you ever need)
  • It allows you to debug accelerators (GPU) and CPU in one session
  • It allows you to debug multiple languages (C++/Python/Fortran)

Page 23: Techniques for Debugging HPC Applications

Call Stack and Local Variables

Call Stack View

• Lists the set of call frames as the program calls from one function or method to another
• Filter button used to turn on or off filtering of frames

Local Variables View

• Displays local variables relative to the current thread of interest and the selected stack frame
• Organized by arguments and blocks
• To edit values, add the variable to the Data View

The Data View Panel

• Data View allows deeper exploration of data structures
• Edit data values
• Cast to new data types
• Add data to the Data View using the context menu or by dragging and dropping

[Screenshot callouts: Context menu; Drag and drop]

The Data View Panel – Expanding Arrays and Structures

Select the right arrow to display the substructures in a complex variable.

Any nested structures are displayed in the Data View.

The Data View – Dive in All

• Dive in All
  • Use Dive in All to easily see each member of a data structure from an array of structures

The Data View Panel – Entering Expressions

Enter a new expression in the Data View panel to view that data.

Type the expression in the [Add New expression] field.

[Screenshot callouts: A new expression is added; Increment a variable]

The Data View Panel – Casting

Cast a variable into an array by adding the array specifier.

TotalView displays the array.

[Screenshot callout: Casting to another type]

Extending Debugging Capabilities: How to Debug (AI) Mixed Python/C++ Code

Mixed Language Python Debugging

• Debugging one language is difficult enough
  • Understanding the flow of execution across language barriers is hard
  • Examining and comparing data in both languages is challenging
• What TotalView provides
  • Easy Python debugging session setup
  • Fully integrated Python and C/C++ call stack
  • "Glue" layers between the languages removed
  • Easily examine and compare variables in Python and C++
  • Modest system requirements
  • Utilize reverse debugging and memory debugging
• What TotalView does not provide (yet)
  • Setting breakpoints and stepping within Python code

Python debugging with TotalView (demo)

#!/usr/bin/python

def callFact():
    # tv_python_example is the compiled extension module used in the demo
    import tv_python_example as tp
    a = 3
    b = 10
    c = a + b
    ch = "local string"
    # ……
    return tp.fact(a)

if __name__ == '__main__':
    b = 2
    result = callFact()
    print(result)

totalview -args python test_python_types.py

Remote Debugging - TotalView Remote UI

TotalView Remote UI

• Combines the convenience of establishing a remote connection to a cluster with the ability to run the TotalView GUI locally
• Front-end GUI architecture does not need to match back-end target architecture (macOS front-end -> Linux back-end)
• Secure communications
• Convenient saved sessions
• Once connected, debug as normal with access to all TotalView features
• Front-end GUI currently supports macOS and Linux x86/x86-64; a Windows client is coming

Remote UI Architecture

TotalView Reverse Connections

Reverse Connection Flow

[Diagram, built up over several slides: front-end node (TotalView UI, $HOME/.totalview/connect), batch node (srun, tvconnect, tvdsvr), compute nodes (ranks).]

1. tvconnect writes request
2. TotalView UI reads request
3. TotalView returns response
4. tvconnect reads response
5. exec
6. Socket connection opened


Page 24: Techniques for Debugging HPC Applications

totalviewio27 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel

bull Data View allows deeper exploration of data structures

bull Edit data valuesbull Cast to new data typesbull Add data to the Data View using the context

menu or by dragging and dropping

Context menu

Drag and drop

totalviewio28 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel ndash Expanding Arrays and Structures

Select the right arrow to display the substructures in a complex variable

Any nested structures are displayed in the data view

totalviewio29 | TotalView by Perforce copy Perforce Software Inc

bull Dive in All

bull Use Dive in All to easily see each member of a data structure from an array of structures

The Data View ndash Dive in All

totalviewio30 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel ndash Entering Expressions

Enter a new expression in the Data View panel to view that data

A new expression is added

Increment a variable

Type the expression in the [Add New expression] field

totalviewio32 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel - Casting

Casting to another type

TotalView displays the array

Cast a variable into an array by adding the array specifier

Extending Debugging Capabilities How to Debug (AI) Mixed PythonC++ Code

totalviewio34 | TotalView by Perforce copy Perforce Software Inc

bull Debugging one language is difficult enough

bull Understanding the flow of execution across language barriers is hard

bull Examining and comparing data in both languages is challenging

bull What TotalView provides

bull Easy python debugging session setup

bull Fully integrated Python and CC++ call stack

bull rdquoGluerdquo layers between the languages removed

bull Easily examine and compare variables in Python and C++

bull Modest system requirements

bull Utilize reverse debugging and memory debugging

bull What TotalView does not provide (yet)

bull Setting breakpoints and stepping within Python code

Mixed Language Python Debugging

totalviewio35 | TotalView by Perforce copy Perforce Software Inc

Python debugging with TotalView (demo)

usrbinpython

def callFact()import tv_python_example as tpa = 3b = 10c = a+bch = ldquolocal stringrdquohelliphellip

return tpfact(a)if __name__ == __main__rsquo

b = 2result = callFact()print result

totalview -args python test_python_typespy

Remote Debugging - TotalView Remote UI

roguewavecom38 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Combine the convenience of establishing a remote

connection to a cluster and the ability to run the

TotalView GUI locally

bull Front-end GUI architecture does not need to match back-

end target architecture (macOS front-end -gt Linux back-

end)

bull Secure communications

bull Convenient saved sessions

bull Once connected debug as normal with access to all

TotalView features

bull Front-end GUI currently supports macOS and Linux

x86x86-64 Windows client is coming

TotalView Remote UI

roguewavecom39 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Remote UI Architecture

TotalView Reverse Connections

roguewavecom41 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

roguewavecom42 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

2 TotalView UI reads request

3 TotalView returns response

roguewavecom43 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

2 TotalView UI reads request

3 TotalView returns response

roguewavecom44 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

roguewavecom45 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

roguewavecom46 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

v TotalView UI

2 TotalView UI reads request

3 TotalView returns response

tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

6 socket connection opened

roguewavecom47 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Start a debugging session using TotalView Reverse Connect

bull Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run

bull Enables running TotalView UI on the front-end node and remotely

debug jobs executing on the compute nodes

bull Very easy to utilize simply prefix job launch or application start

with ldquotvconnectrdquo command

Batch Script Submission with Reverse Connect

binbashSBATCH -J hybrid_fibhellipSBATCH -n 2SBATCH -c 4SBATCH --mem-per-cpu=4000export OMP_NUM_THREADS=4

tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks Heap Status and Identifying Dangling Pointers

roguewavecom50 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull A Memory Bug is a mistake in the management of heap memory

bull Leaking Failure to free memory

bull Dangling references Failure to clear pointers

bull Failure to check for error conditions

bull Memory Corruption

bull Writing to memory not allocated

bull Overrunning array bounds

What is a Memory Bug

roguewavecom51 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Advantages of TotalView HIA Technology

bull Use it with your existing builds

bull No Source Code or Binary Instrumentation

bull Programs run nearly full speed

bull Low performance overhead

bull Low memory overhead

bull Efficient memory usage

TotalView Heap Interposition Agent (HIA) Technology

Malloc API

User Code and Libraries

Process

TotalView

Heap Interposition

Agent (HIA)Allocation

Table

Deallocation

Table

roguewavecom52 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

TotalView New UI Features

bull Leak detection

bull Heap Status

bull Dangling pointer detection

Coming Features

bull Memory Error Events

bull Memory Corruption Detection

bull Memory Block Painting

bull Memory Hoarding

bull Memory Comparisons between processes

Memory Debugging Features ndash MemoryScape TotalView

TotalView Reverse Debugging

roguewavecom54 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Reverse debugging provides the ability for developers to go back in execution history

bull Activated either before program starts running or at some point after execution begins

bull Capturing and deterministically replay execution

bull Enables stepping backwards and forward by function line or instruction

bull Run backwards to breakpoints

bull Run backwards and stop when a variable changes value

bull Saving recording files for later analysis or collaboration

bull For remote connection use CLI dhistory ndashsave ltnamegt

Reverse Debugging with TotalView

roguewavecom55 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Reverse Debugging Controls

Run forward-

Run backwards

Next forward over functions

-Next backwards over functions

Step forward into functions

-Step backwards into

functions

Advance forward out of function call

-Advance backwards to

calling function

Advance forward to selected line

-Advance backward to

selected line

Advance to ldquoliverdquo session

Create a bookmark at this point in recorded history

Save the recorded session

Debugging CUDA Applications with TotalView

roguewavecom59 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull NVIDIA Tesla Fermi Kepler Pascal Volta Turing Ampere

bull NVIDIA CUDA 92 10 and 11

bull With support for Unified Memory

bull Debugging 64-bit CUDA programs

bull Features and capabilities include

bull Support for dynamic parallelism

bull Support for MPI based clusters and multi-card configurations

bull Flexible Display and Navigation on the CUDA device

bull Physical (device SM Warp Lane)

bull Logical (Grid Block) tuples

bull CUDA device window reveals what is running where

bull Support for types and separate memory address spaces

bull Leverages CUDA memcheck

TotalView for the NVIDIA reg GPU Accelerator

roguewavecom60 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Source View Opened on CUDA host code

roguewavecom61 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Breakpoint Set in CUDA Kernel Code Before Launch

Hollow breakpoint indicates a breakpoint will be set when the code is loaded onto the GPU

roguewavecom62 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

GPU Physical and Logical Toolbars

Logical toolbar displays the Block and Thread coordinates

Physical toolbar displays the Device number Streaming Multiprocessor Warp and Lane

To view a CUDA host thread select a thread with a positive thread ID in the Process and Threads view

To view a CUDA GPU thread select a thread with a negative thread ID then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread

roguewavecom63 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull The identifier local is a TotalView built-in type storage qualifier that tells the debugger the storage kind of A is

local storage

bull The debugger uses the storage qualifier to determine how to locate A in device memory

Displaying CUDA Program Elementslocal type qualifier indicates that variable A is in local storage

ldquoelementsrdquo is a pointer to a float in generic storage

Using TotalView for Parallel Debugging on ANL

totalviewio65 | TotalView by Perforce copy Perforce Software Inc

TotalView remote debugging on Linux and Mac OS

bull Download and install TotalView on your linux or mac

bull Connect to remote front node

bull Run labs remotely

Hands-on labs

• Install TV from installers on Mac or Linux

• Ignore license code

• Start TotalView

• Remotely connect to Cooley and enable Reverse Connection

Labs

• Lab 1: Debugger Basics

• Lab 2: Viewing, Examining, Watching, and Editing Data

• Lab 3: Examining and Controlling a Parallel Application (on Cooley)

  • Using remote connect (tvconnect)

  • qsub -q training tvconnect.job

  • Modify and submit tvconnect.job on your machine

TotalView is available on Theta and Cooley

• Connect to Cooley/Theta

• Get an allocation first

  • qsub -A ATPESC2021 -n 4 -q debug-flat-quad -I (Theta)

  • qsub -A ATPESC2021 -n 4 -q training -I (Cooley)

• module load totalview (Theta)

• soft add +totalview (Cooley)

• totalview -args mpiexec -np <N> demoMpi_v2

• tvconnect mpiexec -np <N> demoMpi_v2

• Installed at /soft/debuggers/totalview-2021-08-04/toolworks/totalview2021X3756/bin/totalview

TotalView Resources and Documentation

• TotalView website

  • https://totalview.io

• TotalView documentation

  • https://help.totalview.io

• TotalView Video Tutorials

  • https://totalview.io/support/video-tutorials

• Other Resources

  • Blog: https://totalview.io/blog

Summary

• Using a modern debugger saves you time

• TotalView can help you because

  • It’s cross-platform (the only debugger you ever need)

  • It allows you to debug accelerators (GPU) and CPU in one session

  • It allows you to debug multiple languages (C++/Python/Fortran)

The Data View Panel – Expanding Arrays and Structures

Select the right arrow to display the substructures in a complex variable

Any nested structures are displayed in the data view

The Data View – Dive in All

• Dive in All

  • Use Dive in All to easily see each member of a data structure from an array of structures

The Data View Panel – Entering Expressions

Enter a new expression in the Data View panel to view that data

A new expression is added

Increment a variable

Type the expression in the [Add New expression] field

The Data View Panel - Casting

Casting to another type

TotalView displays the array

Cast a variable into an array by adding the array specifier

Extending Debugging Capabilities: How to Debug (AI) Mixed Python/C++ Code

Mixed Language Python Debugging

• Debugging one language is difficult enough

  • Understanding the flow of execution across language barriers is hard

  • Examining and comparing data in both languages is challenging

• What TotalView provides

  • Easy Python debugging session setup

  • Fully integrated Python and C/C++ call stack

  • “Glue” layers between the languages removed

  • Easily examine and compare variables in Python and C++

  • Modest system requirements

  • Utilize reverse debugging and memory debugging

• What TotalView does not provide (yet)

  • Setting breakpoints and stepping within Python code

Python debugging with TotalView (demo)

    #!/usr/bin/python

    def callFact():
        import tv_python_example as tp
        a = 3
        b = 10
        c = a + b
        ch = "local string"
        # ……
        return tp.fact(a)

    if __name__ == '__main__':
        b = 2
        result = callFact()
        print(result)

totalview -args python test_python_types.py
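The demo's `tv_python_example` module is a compiled extension that the slides do not show. As a stand-in, the sketch below crosses the same kind of Python-to-C boundary with the standard `ctypes` module, calling the C library's `strlen` on the demo's string. It assumes a POSIX system where the C library can be located or is already loaded in the process; `c_strlen` is a name invented for this example, and nothing here is specific to TotalView.

```python
import ctypes
import ctypes.util

# Assumption: POSIX system. If the library name is not found, CDLL(None)
# falls back to the symbols already loaded in the running process.
_libc = ctypes.CDLL(ctypes.util.find_library("c"))
_libc.strlen.argtypes = [ctypes.c_char_p]
_libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Return the byte length of text as computed by C's strlen()."""
    return _libc.strlen(text.encode("utf-8"))


if __name__ == "__main__":
    ch = "local string"
    print(c_strlen(ch))
```

A call such as `c_strlen(ch)` is the kind of language boundary the previous slide refers to: Python frames on one side, compiled C frames on the other.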

Remote Debugging - TotalView Remote UI

TotalView Remote UI

• Combine the convenience of establishing a remote connection to a cluster and the ability to run the TotalView GUI locally

• Front-end GUI architecture does not need to match back-end target architecture (macOS front-end -> Linux back-end)

• Secure communications

• Convenient saved sessions

• Once connected, debug as normal with access to all TotalView features

• Front-end GUI currently supports macOS and Linux x86/x86-64; a Windows client is coming

Remote UI Architecture

TotalView Reverse Connections

Reverse Connection Flow

[Diagram, shown as a six-frame animation. Labels: FRONT-END NODE, BATCH NODE, COMPUTE NODES, TotalView UI, $HOME/.totalview/connect, tvconnect, srun, tvdsvr, and the MPI ranks]

1. tvconnect writes request

2. TotalView UI reads request

3. TotalView returns response

4. tvconnect reads response

5. exec

6. socket connection opened
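The six steps can be mimicked with a toy model. The sketch below is not TotalView's protocol: the file names `request.txt` and `response.txt`, the message contents, and the direction of the final socket connection (job side connecting back to the UI) are all assumptions made for illustration. It only shows the general pattern of a launcher and a UI meeting through a shared directory before a socket is opened.

```python
import os
import socket
import tempfile


def reverse_connect_demo():
    """Walk through a file-based rendezvous followed by a socket connection."""
    log = []
    # The temporary directory stands in for the shared connect directory.
    with tempfile.TemporaryDirectory() as connect_dir:
        request = os.path.join(connect_dir, "request.txt")    # hypothetical name
        response = os.path.join(connect_dir, "response.txt")  # hypothetical name

        # 1. The job-side launcher writes a request.
        with open(request, "w") as f:
            f.write("debug hybrid_fib")
        log.append("request written")

        # 2. The UI side reads the request.
        with open(request) as f:
            job = f.read()
        log.append("request read")

        # 3. The UI side listens on a free local port and returns it in a response.
        ui = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        ui.bind(("127.0.0.1", 0))
        ui.listen(1)
        host, port = ui.getsockname()
        with open(response, "w") as f:
            f.write(f"{host} {port}")
        log.append("response written")

        # 4. The launcher reads the response.
        with open(response) as f:
            host, port_text = f.read().split()
        log.append("response read")

        # 5. The launcher would now exec the real job; here it is only noted.
        log.append("exec (simulated)")

        # 6. The job side opens a socket connection back to the UI.
        server = socket.create_connection((host, int(port_text)))
        conn, _ = ui.accept()
        server.sendall(job.encode())
        received = conn.recv(1024).decode()
        log.append("socket connected")
        for s in (server, conn, ui):
            s.close()
    return log, received


if __name__ == "__main__":
    steps, message = reverse_connect_demo()
    print(steps)
    print(message)
```

The point of the pattern is that the job side initiates the connection once it is scheduled, so the UI can be started first and simply wait for a request to appear.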

Batch Script Submission with Reverse Connect

• Start a debugging session using TotalView Reverse Connect

• Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run

• Enables running the TotalView UI on the front-end node and remotely debugging jobs executing on the compute nodes

• Very easy to utilize: simply prefix the job launch or application start with the “tvconnect” command

    #!/bin/bash
    #SBATCH -J hybrid_fib
    # …
    #SBATCH -n 2
    #SBATCH -c 4
    #SBATCH --mem-per-cpu=4000
    export OMP_NUM_THREADS=4

    tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks, Heap Status, and Identifying Dangling Pointers

What is a Memory Bug?

• A Memory Bug is a mistake in the management of heap memory

  • Leaking: Failure to free memory

  • Dangling references: Failure to clear pointers

  • Failure to check for error conditions

  • Memory Corruption

    • Writing to memory not allocated

    • Overrunning array bounds

TotalView Heap Interposition Agent (HIA) Technology

• Advantages of TotalView HIA Technology

  • Use it with your existing builds

  • No Source Code or Binary Instrumentation

  • Programs run nearly full speed

  • Low performance overhead

  • Low memory overhead

  • Efficient memory usage

[Diagram labels: Process, User Code and Libraries, Malloc API, Heap Interposition Agent (HIA), Allocation Table, Deallocation Table, TotalView]
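To make the idea of interposition concrete, here is a conceptual model in Python rather than the real agent, which sits between a native program and the system allocator. The class `TrackingHeap` and its methods are invented for this sketch. It wraps an allocate/free pair, records each call in an allocation table and a deallocation table, and uses the two tables to report leaks and frees of blocks that are no longer live.

```python
import itertools


class TrackingHeap:
    """Toy allocator wrapper that records every allocation and free."""

    def __init__(self):
        # Fake, never-reused addresses keep the example deterministic.
        self._next_address = itertools.count(0x1000, 0x100)
        self.allocations = {}    # address -> (size, tag) for live blocks
        self.deallocations = {}  # address -> (size, tag) for freed blocks
        self.errors = []

    def malloc(self, size, tag):
        address = next(self._next_address)
        self.allocations[address] = (size, tag)
        return address

    def free(self, address):
        if address in self.allocations:
            self.deallocations[address] = self.allocations.pop(address)
        elif address in self.deallocations:
            self.errors.append(f"double free of block at {address:#x}")
        else:
            self.errors.append(f"free of unknown address {address:#x}")

    def is_dangling(self, address):
        """True if address refers to a block that has already been freed."""
        return address in self.deallocations

    def leaks(self):
        """Tags of blocks that are still live, i.e. never freed so far."""
        return sorted(tag for _, tag in self.allocations.values())


if __name__ == "__main__":
    heap = TrackingHeap()
    a = heap.malloc(64, "buffer")
    b = heap.malloc(16, "node")
    heap.free(b)
    heap.free(b)  # recorded as an error instead of corrupting anything
    print(heap.leaks())
    print(heap.is_dangling(b))
    print(heap.errors)
```

Because the bookkeeping lives in the wrapper, the code under test needs no changes, which is the property the slide highlights when it says no source or binary instrumentation is required.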

Memory Debugging Features – MemoryScape TotalView

TotalView New UI Features

• Leak detection

• Heap Status

• Dangling pointer detection

Coming Features

• Memory Error Events

• Memory Corruption Detection

• Memory Block Painting

• Memory Hoarding

• Memory Comparisons between processes

TotalView Reverse Debugging

Reverse Debugging with TotalView

• Reverse debugging provides the ability for developers to go back in execution history

• Activated either before the program starts running or at some point after execution begins

• Captures and deterministically replays execution

• Enables stepping backwards and forward by function, line, or instruction

• Run backwards to breakpoints

• Run backwards and stop when a variable changes value

• Save recording files for later analysis or collaboration

  • For remote connection, use the CLI: dhistory -save <name>
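This slide describes behavior, not mechanism. As a conceptual illustration only, the sketch below records a snapshot of a program's variables after every step so that history can be walked backwards, including running back to the point where a variable last changed. `Recorder` and its methods are hypothetical names, and snapshotting every step is a simplification chosen for clarity, not a description of how TotalView records execution.

```python
import copy


class Recorder:
    """Toy execution history: one snapshot of the variables per step."""

    def __init__(self, state):
        self.history = [copy.deepcopy(state)]
        self.position = 0  # index of the snapshot currently being viewed

    def record(self, state):
        """Append a snapshot taken after executing one more step."""
        self.history.append(copy.deepcopy(state))
        self.position = len(self.history) - 1

    def current(self):
        return self.history[self.position]

    def step_back(self):
        """Move one step back in history, stopping at the start."""
        self.position = max(0, self.position - 1)
        return self.current()

    def run_back_until_changed(self, name):
        """Run backwards and stop just after the step that last changed `name`."""
        value = self.current()[name]
        while self.position > 0 and self.history[self.position - 1][name] == value:
            self.position -= 1
        return self.position


if __name__ == "__main__":
    state = {"i": 0, "total": 0}
    rec = Recorder(state)
    for i, delta in enumerate([5, 0, 0, 7, 0], start=1):
        state["i"] = i
        state["total"] += delta
        rec.record(state)
    print(rec.run_back_until_changed("total"), rec.current())
```

Running backwards to a change is often the fastest route from a wrong value to the statement that produced it, since there is no need to guess where to put a breakpoint and rerun.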

Reverse Debugging Controls

Run forward

Run backwards

Next forward over functions

Next backwards over functions

Step forward into functions

Step backwards into functions

Advance forward out of function call

Advance backwards to calling function

Advance forward to selected line

Advance backward to selected line

Advance to “live” session

Create a bookmark at this point in recorded history

Save the recorded session

Page 27: Techniques for Debugging HPC Applications

totalviewio30 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel ndash Entering Expressions

Enter a new expression in the Data View panel to view that data

A new expression is added

Increment a variable

Type the expression in the [Add New expression] field

totalviewio32 | TotalView by Perforce copy Perforce Software Inc

The Data View Panel - Casting

Casting to another type

TotalView displays the array

Cast a variable into an array by adding the array specifier

Extending Debugging Capabilities How to Debug (AI) Mixed PythonC++ Code

totalviewio34 | TotalView by Perforce copy Perforce Software Inc

bull Debugging one language is difficult enough

bull Understanding the flow of execution across language barriers is hard

bull Examining and comparing data in both languages is challenging

bull What TotalView provides

bull Easy python debugging session setup

bull Fully integrated Python and CC++ call stack

bull rdquoGluerdquo layers between the languages removed

bull Easily examine and compare variables in Python and C++

bull Modest system requirements

bull Utilize reverse debugging and memory debugging

bull What TotalView does not provide (yet)

bull Setting breakpoints and stepping within Python code

Mixed Language Python Debugging

totalviewio35 | TotalView by Perforce copy Perforce Software Inc

Python debugging with TotalView (demo)

usrbinpython

def callFact()import tv_python_example as tpa = 3b = 10c = a+bch = ldquolocal stringrdquohelliphellip

return tpfact(a)if __name__ == __main__rsquo

b = 2result = callFact()print result

totalview -args python test_python_typespy

Remote Debugging - TotalView Remote UI

roguewavecom38 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Combine the convenience of establishing a remote

connection to a cluster and the ability to run the

TotalView GUI locally

bull Front-end GUI architecture does not need to match back-

end target architecture (macOS front-end -gt Linux back-

end)

bull Secure communications

bull Convenient saved sessions

bull Once connected debug as normal with access to all

TotalView features

bull Front-end GUI currently supports macOS and Linux

x86x86-64 Windows client is coming

TotalView Remote UI

roguewavecom39 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Remote UI Architecture

TotalView Reverse Connections

roguewavecom41 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

roguewavecom42 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

2 TotalView UI reads request

3 TotalView returns response

roguewavecom43 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

2 TotalView UI reads request

3 TotalView returns response

roguewavecom44 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

roguewavecom45 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

roguewavecom46 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

v TotalView UI

2 TotalView UI reads request

3 TotalView returns response

tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

6 socket connection opened

roguewavecom47 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Batch Script Submission with Reverse Connect

• Start a debugging session using TotalView Reverse Connect
• Reverse Connect lets the debugger be submitted to a cluster as part of a batch job and connect back to the GUI once the job runs
• Run the TotalView UI on the front-end node and remotely debug jobs executing on the compute nodes
• Easy to use: prefix the job launch or application start with the "tvconnect" command

#!/bin/bash
#SBATCH -J hybrid_fib
…
#SBATCH -n 2
#SBATCH -c 4
#SBATCH --mem-per-cpu=4000
export OMP_NUM_THREADS=4

tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib
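Because Reverse Connect only requires prefixing the launch line, a job-generation script can toggle it with a flag. The following is a minimal sketch of that idea; the helper name `with_tvconnect` and the `enabled` flag are hypothetical, and only the srun line itself comes from the batch script above.

```python
import shlex


def with_tvconnect(launch, enabled=True):
    """Return the launch line, prefixed with tvconnect when debugging is enabled."""
    words = (["tvconnect"] if enabled else []) + list(launch)
    # shlex.quote keeps arguments with spaces or shell metacharacters intact.
    return " ".join(shlex.quote(word) for word in words)


launch = ["srun", "-n", "2", "--cpus-per-task=4", "--mpi=pmix", "hybrid_fib"]
debug_line = with_tvconnect(launch)
plain_line = with_tvconnect(launch, enabled=False)
print(debug_line)
print(plain_line)
```

The same batch script can then serve both production and debugging runs, with the prefix as the only difference.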

Memory Leaks, Heap Status, and Identifying Dangling Pointers

What is a Memory Bug?

• A memory bug is a mistake in the management of heap memory
  • Leaking: failure to free memory
  • Dangling references: failure to clear pointers
  • Failure to check for error conditions
  • Memory corruption
    • Writing to memory not allocated
    • Overrunning array bounds

TotalView Heap Interposition Agent (HIA) Technology

• Advantages of TotalView HIA technology
  • Use it with your existing builds
  • No source code or binary instrumentation
  • Programs run at nearly full speed
  • Low performance overhead
  • Low memory overhead
  • Efficient memory usage

Diagram labels: Process (User Code and Libraries, Malloc API, Heap Interposition Agent (HIA) with its Allocation Table and Deallocation Table) and TotalView.
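The allocation and deallocation tables in the HIA diagram suggest how leaks and dangling pointers can be identified without instrumenting the program. The sketch below is a toy model of that bookkeeping, not TotalView's implementation: `ToyHeapAgent`, its bump allocator, and the source-site strings are all hypothetical, and a real agent interposes on the C malloc API rather than being called directly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyHeapAgent:
    """Toy stand-in for an interposition agent that sees every malloc/free call."""

    next_address: int = 0x1000
    # address -> (size, allocation site)
    allocations: dict[int, tuple[int, str]] = field(default_factory=dict)
    deallocations: dict[int, tuple[int, str]] = field(default_factory=dict)

    def malloc(self, size: int, site: str) -> int:
        # Bump allocator: addresses are never reused, which keeps the toy simple.
        address = self.next_address
        self.next_address += size
        self.allocations[address] = (size, site)
        return address

    def free(self, address: int) -> None:
        # Move the block from the allocation table to the deallocation table.
        # A double free or a bad address raises KeyError here.
        self.deallocations[address] = self.allocations.pop(address)

    def leaks(self, reachable: set[int]) -> list[str]:
        """Sites of blocks that are still allocated but that no pointer refers to."""
        return sorted(site for address, (_, site) in self.allocations.items()
                      if address not in reachable)

    def dangling(self, pointers: dict[str, int]) -> list[str]:
        """Names of pointers whose value lies inside a freed block."""
        found = []
        for name, value in pointers.items():
            for address, (size, _) in self.deallocations.items():
                if address <= value < address + size:
                    found.append(name)
        return sorted(found)


agent = ToyHeapAgent()
buf = agent.malloc(64, "main.c:10")
tmp = agent.malloc(16, "main.c:12")
lost = agent.malloc(32, "main.c:14")
agent.free(tmp)                       # tmp still holds the freed address
pointers = {"buf": buf, "tmp": tmp}   # nothing points at the 'lost' block any more

print(agent.leaks(set(pointers.values())))
print(agent.dangling(pointers))
```

The leak report names the allocation site rather than the address, which is the information a developer usually needs first.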

Memory Debugging Features - MemoryScape TotalView

TotalView New UI Features
• Leak detection
• Heap status
• Dangling pointer detection

Coming Features
• Memory error events
• Memory corruption detection
• Memory block painting
• Memory hoarding
• Memory comparisons between processes

TotalView Reverse Debugging

Reverse Debugging with TotalView

• Reverse debugging lets developers go back in execution history
• Activate it either before the program starts running or at some point after execution begins
• Captures and deterministically replays execution
• Enables stepping backwards and forwards by function, line, or instruction
• Run backwards to breakpoints
• Run backwards and stop when a variable changes value
• Save recording files for later analysis or collaboration
  • For a remote connection, use the CLI: dhistory -save <name>
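To make "run backwards and stop when a variable changes value" concrete, here is a toy model that records a snapshot of program state after every step and then searches the recording backwards. It only illustrates the idea: a real reverse debugger records and replays execution rather than copying a dictionary, and the names `record` and `run_back_until_change` are hypothetical.

```python
def record(program, state):
    """Run each step function in `program` and snapshot the state after it."""
    history = [dict(state)]            # snapshot 0: state before the first step
    for step in program:
        step(state)
        history.append(dict(state))
    return history


def run_back_until_change(history, position, name):
    """Walk backwards from `position` to the snapshot where `name` last changed."""
    for i in range(position, 0, -1):
        if history[i].get(name) != history[i - 1].get(name):
            return i
    return 0


program = [
    lambda s: s.update(total=0),
    lambda s: s.update(i=1),
    lambda s: s.update(total=s["total"] + s["i"]),
    lambda s: s.update(i=2),
    lambda s: s.update(total=s["total"] * -1),   # the suspicious write
    lambda s: s.update(i=3),
]

history = record(program, {})
stop = run_back_until_change(history, len(history) - 1, "total")
print(stop, history[stop - 1]["total"], history[stop]["total"])
```

Starting from the end of the recording, the search stops at the step that flipped the sign of `total`, with the values before and after the write both available for inspection.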

Reverse Debugging Controls

Toolbar buttons (forward / backward pairs):
• Run forward / Run backwards
• Next forward over functions / Next backwards over functions
• Step forward into functions / Step backwards into functions
• Advance forward out of function call / Advance backwards to calling function
• Advance forward to selected line / Advance backward to selected line
• Advance to "live" session
• Create a bookmark at this point in recorded history
• Save the recorded session

Debugging CUDA Applications with TotalView

TotalView for the NVIDIA® GPU Accelerator

• NVIDIA Tesla, Fermi, Kepler, Pascal, Volta, Turing, Ampere
• NVIDIA CUDA 9.2, 10, and 11
  • With support for Unified Memory
• Debugging 64-bit CUDA programs
• Features and capabilities include
  • Support for dynamic parallelism
  • Support for MPI-based clusters and multi-card configurations
  • Flexible display and navigation on the CUDA device
    • Physical (Device, SM, Warp, Lane)
    • Logical (Grid, Block) tuples
  • CUDA device window reveals what is running where
  • Support for types and separate memory address spaces
  • Leverages CUDA memcheck

Source View Opened on CUDA host code

Breakpoint Set in CUDA Kernel Code Before Launch

A hollow breakpoint indicates that the breakpoint will be set when the code is loaded onto the GPU.

GPU Physical and Logical Toolbars

The logical toolbar displays the Block and Thread coordinates.

The physical toolbar displays the Device number, Streaming Multiprocessor, Warp, and Lane.

To view a CUDA host thread, select a thread with a positive thread ID in the Process and Threads view.

To view a CUDA GPU thread, select a thread with a negative thread ID, then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread.
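The logical and physical coordinates are related by how CUDA groups the threads of a block into warps: threads are linearized with x varying fastest, and each run of 32 consecutive threads forms a warp. The sketch below computes the block-relative warp and the lane for a logical thread coordinate. It assumes a warp size of 32, and the function names are hypothetical. The hardware Device, SM, and Warp numbers shown in the physical toolbar are assigned at run time and cannot be derived this way.

```python
WARP_SIZE = 32  # assumption: warp size on current NVIDIA GPUs


def linear_thread_index(thread, block_dim):
    """Flatten (x, y, z) thread coordinates within a block, x varying fastest."""
    tx, ty, tz = thread
    dx, dy, _dz = block_dim
    return tx + ty * dx + tz * dx * dy


def warp_and_lane(thread, block_dim):
    """Return (warp index within the block, lane within the warp)."""
    return divmod(linear_thread_index(thread, block_dim), WARP_SIZE)


block_dim = (16, 16, 1)
print(warp_and_lane((0, 0, 0), block_dim))
print(warp_and_lane((15, 1, 0), block_dim))
print(warp_and_lane((5, 3, 0), block_dim))
```

With a 16x16 block, each warp covers two rows of the block, so threads (0, 0, 0) and (15, 1, 0) are the first and last lanes of the same warp.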

Displaying CUDA Program Elements

• The identifier @local is a TotalView built-in type storage qualifier that tells the debugger that the storage kind of A is local storage
• The debugger uses the storage qualifier to determine how to locate A in device memory

Figure annotations: the @local type qualifier indicates that variable A is in local storage; "elements" is a pointer to a float in @generic storage.

Using TotalView for Parallel Debugging on ANL

TotalView remote debugging on Linux and Mac OS

• Download and install TotalView on your Linux or Mac machine
• Connect to the remote front-end node
• Run the labs remotely

Hands-on labs

• Install TV from the installers on Mac or Linux
  • Ignore the license code
• Start TotalView
• Remotely connect to Cooley and enable Reverse Connection

Labs

• Lab 1: Debugger Basics
• Lab 2: Viewing, Examining, Watching, and Editing Data
• Lab 3: Examining and Controlling a Parallel Application (on Cooley)
  • Using remote connect (tvconnect)
  • qsub -q training tvconnect.job
  • Modify and submit tvconnect.job on your machine

TotalView is available on Theta and Cooley

• Connect to Cooley/Theta
• Get an allocation first
  • qsub -A ATPESC2021 -n 4 -q debug-flat-quad -I (Theta)
  • qsub -A ATPESC2021 -n 4 -q training -I (Cooley)
• module load totalview (Theta)
• soft add +totalview (Cooley)
• totalview -args mpiexec -np <N> demoMpi_v2
• tvconnect mpiexec -np <N> demoMpi_v2
• Installed at /soft/debuggers/totalview-2021-08-04/toolworks/totalview.2021X.3.756/bin/totalview

TotalView Resources and Documentation

• TotalView website
  • https://totalview.io
• TotalView documentation
  • https://help.totalview.io
• TotalView Video Tutorials
  • https://totalview.io/support/video-tutorials
• Other Resources
  • Blog: https://totalview.io/blog

Summary

• Using a modern debugger saves you time
• TotalView can help you because
  • It's cross-platform (the only debugger you will ever need)
  • It allows you to debug accelerators (GPU) and CPU in one session
  • It allows you to debug multiple languages (C++/Python/Fortran)

The Data View Panel - Casting

Figure annotations:
• Casting to another type
• Cast a variable into an array by adding the array specifier
• TotalView displays the array
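Casting a pointer to an array type tells a viewer how many elements to show instead of just the first one. The snippet below is a Python analogue of that cast using the standard ctypes module. It is not TotalView syntax, and the buffer contents are made up for illustration.

```python
import ctypes

# A C-style buffer of 5 ints and a bare pointer to its first element.
buffer = (ctypes.c_int * 5)(10, 20, 30, 40, 50)
ptr = ctypes.cast(buffer, ctypes.POINTER(ctypes.c_int))

# Through the bare pointer, only a single int has a known extent.
first = ptr.contents.value

# Casting to "pointer to int[5]" is the analogue of adding the array specifier:
# the same address is now interpreted as a whole array.
as_array = ctypes.cast(ptr, ctypes.POINTER(ctypes.c_int * 5)).contents
values = list(as_array)

print(first, values)
```

The address never changes; only the type used to interpret the memory does, which is why the cast is safe only when the chosen length matches the real allocation.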

Extending Debugging Capabilities: How to Debug (AI) Mixed Python/C++ Code

Mixed Language Python Debugging

• Debugging one language is difficult enough
  • Understanding the flow of execution across language barriers is hard
  • Examining and comparing data in both languages is challenging
• What TotalView provides
  • Easy Python debugging session setup
  • Fully integrated Python and C/C++ call stack
  • "Glue" layers between the languages removed
  • Easily examine and compare variables in Python and C++
  • Modest system requirements
  • Utilize reverse debugging and memory debugging
• What TotalView does not provide (yet)
  • Setting breakpoints and stepping within Python code

Python debugging with TotalView (demo)

#!/usr/bin/python

def callFact():
    # tv_python_example is the demo's compiled extension module
    import tv_python_example as tp
    a = 3
    b = 10
    c = a + b
    ch = "local string"
    # ……
    return tp.fact(a)

if __name__ == '__main__':
    b = 2
    result = callFact()
    print(result)

Launch command: totalview -args python test_python_types.py

Remote Debugging - TotalView Remote UI

TotalView Remote UI

• Combines the convenience of establishing a remote connection to a cluster with the ability to run the TotalView GUI locally
• Front-end GUI architecture does not need to match back-end target architecture (macOS front-end -> Linux back-end)
• Secure communications
• Convenient saved sessions
• Once connected, debug as normal with access to all TotalView features
• Front-end GUI currently supports macOS and Linux x86/x86-64; a Windows client is coming

Remote UI Architecture

TotalView Reverse Connections




Page 30: Techniques for Debugging HPC Applications

totalviewio34 | TotalView by Perforce copy Perforce Software Inc

bull Debugging one language is difficult enough

bull Understanding the flow of execution across language barriers is hard

bull Examining and comparing data in both languages is challenging

bull What TotalView provides

bull Easy python debugging session setup

bull Fully integrated Python and CC++ call stack

bull rdquoGluerdquo layers between the languages removed

bull Easily examine and compare variables in Python and C++

bull Modest system requirements

bull Utilize reverse debugging and memory debugging

bull What TotalView does not provide (yet)

bull Setting breakpoints and stepping within Python code

Mixed Language Python Debugging

totalviewio35 | TotalView by Perforce copy Perforce Software Inc

Python debugging with TotalView (demo)

usrbinpython

def callFact()import tv_python_example as tpa = 3b = 10c = a+bch = ldquolocal stringrdquohelliphellip

return tpfact(a)if __name__ == __main__rsquo

b = 2result = callFact()print result

totalview -args python test_python_typespy

Remote Debugging - TotalView Remote UI

TotalView Remote UI

• Combine the convenience of establishing a remote connection to a cluster and the ability to run the TotalView GUI locally
• Front-end GUI architecture does not need to match back-end target architecture (macOS front-end -> Linux back-end)
• Secure communications
• Convenient saved sessions
• Once connected, debug as normal with access to all TotalView features
• Front-end GUI currently supports macOS and Linux x86/x86-64; a Windows client is coming

Remote UI Architecture (diagram)

TotalView Reverse Connections

Reverse Connection Flow

(Diagram, shown as an animated build across several slides. Labels: front-end node, batch node, compute nodes, TotalView UI, $HOME/.totalview/connect, tvconnect, srun, tvdsvr, application ranks.)

1. tvconnect writes request
2. TotalView UI reads request
3. TotalView returns response
4. tvconnect reads response
5. exec
6. socket connection opened
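The request/response part of the flow (steps 1 to 4) can be pictured as a rendezvous through files in a shared directory. The sketch below is a toy model of that idea only, not TotalView's actual protocol or file format: the `.req`/`.rsp` file names, the JSON payloads, the host name and the port number are all hypothetical, and a thread stands in for the UI on the front-end node.

```python
import json
import os
import tempfile
import threading
import time


def write_request(connect_dir, request_id, payload):
    """Step 1 (launcher side): drop a request file into the shared directory."""
    tmp = os.path.join(connect_dir, f"{request_id}.req.tmp")
    with open(tmp, "w") as f:
        json.dump(payload, f)
    # Atomic rename so a reader never sees a half-written request.
    os.replace(tmp, os.path.join(connect_dir, f"{request_id}.req"))


def serve_one_request(connect_dir, make_response, poll=0.01, timeout=5.0):
    """Steps 2-3 (UI side): poll for a request, then write a response."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for name in sorted(os.listdir(connect_dir)):
            if not name.endswith(".req"):
                continue
            request_id = name[: -len(".req")]
            request_path = os.path.join(connect_dir, name)
            with open(request_path) as f:
                request = json.load(f)
            tmp = os.path.join(connect_dir, f"{request_id}.rsp.tmp")
            with open(tmp, "w") as f:
                json.dump(make_response(request), f)
            os.replace(tmp, os.path.join(connect_dir, f"{request_id}.rsp"))
            os.remove(request_path)
            return request
        time.sleep(poll)
    raise TimeoutError("no request arrived")


def wait_for_response(connect_dir, request_id, poll=0.01, timeout=5.0):
    """Step 4 (launcher side): wait until the response file appears."""
    path = os.path.join(connect_dir, f"{request_id}.rsp")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            with open(path) as f:
                return json.load(f)
        time.sleep(poll)
    raise TimeoutError("no response arrived")


def demo():
    """Run both sides in one process, using a temporary directory."""
    with tempfile.TemporaryDirectory() as connect_dir:
        def respond(request):
            # Hypothetical response: where the launcher should connect back to.
            return {"host": "frontend.example", "port": 4142,
                    "for": request["command"]}

        ui = threading.Thread(target=serve_one_request,
                              args=(connect_dir, respond))
        ui.start()
        write_request(connect_dir, "job1",
                      {"command": ["srun", "-n", "2", "hybrid_fib"]})
        response = wait_for_response(connect_dir, "job1")
        ui.join()
        return response
```

Calling `demo()` returns the response dictionary that the launcher side would use for steps 5 and 6. The point of the design is that neither side needs to reach the other over the network first; both only need access to the same directory.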

Batch Script Submission with Reverse Connect

• Start a debugging session using TotalView Reverse Connect
• Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run
• Enables running the TotalView UI on the front-end node while remotely debugging jobs executing on the compute nodes
• Very easy to use: simply prefix the job launch or application start with the "tvconnect" command

    #!/bin/bash
    #SBATCH -J hybrid_fib
    …
    #SBATCH -n 2
    #SBATCH -c 4
    #SBATCH --mem-per-cpu=4000
    export OMP_NUM_THREADS=4

    tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib
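Because Reverse Connect only requires prefixing the launch line, a job-generation script can switch it on with a flag. The following is a minimal sketch, assuming a hypothetical helper named `build_launch_command`; only the `tvconnect` prefix and the `srun` line come from the slide.

```python
import shlex


def build_launch_command(launch_args, debug=False):
    """Return the launch argv, prefixed with tvconnect when debugging."""
    command = list(launch_args)
    if debug:
        command = ["tvconnect"] + command
    return command


# The srun line from the batch script above, as an argv list.
launch = ["srun", "-n", "2", "--cpus-per-task=4", "--mpi=pmix", "hybrid_fib"]

normal_line = shlex.join(build_launch_command(launch))
debug_line = shlex.join(build_launch_command(launch, debug=True))
```

Here `debug_line` is the string to write into the batch script for a debugging run, and `normal_line` is the unchanged production launch.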

Memory Leaks, Heap Status, and Identifying Dangling Pointers

What is a Memory Bug?

• A Memory Bug is a mistake in the management of heap memory
  • Leaking: Failure to free memory
  • Dangling references: Failure to clear pointers
  • Failure to check for error conditions
  • Memory Corruption
    • Writing to memory not allocated
    • Overrunning array bounds

TotalView Heap Interposition Agent (HIA) Technology

• Advantages of TotalView HIA Technology
  • Use it with your existing builds
    • No Source Code or Binary Instrumentation
  • Programs run nearly full speed
    • Low performance overhead
  • Low memory overhead
    • Efficient memory usage

(Diagram labels: Process, User Code and Libraries, Malloc API, Heap Interposition Agent (HIA), Allocation Table, Deallocation Table, TotalView.)
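The bookkeeping idea behind an interposition agent, an allocation table and a deallocation table kept alongside the heap calls, can be illustrated with a toy model. The sketch below is not the HIA itself: the `HeapTracker` class, its fake addresses and the call-site strings are all hypothetical, and a real agent intercepts the malloc API of a running process rather than being called explicitly.

```python
class HeapTracker:
    """Toy allocation/deallocation tables, as a stand-in for an interposer."""

    def __init__(self):
        self.allocations = {}    # address -> (size, allocation site)
        self.deallocations = {}  # address -> site of the free
        self._next_address = 0x1000

    def malloc(self, size, site):
        """Record a new block and return its (fake) address."""
        address = self._next_address
        self._next_address += max(size, 1)
        self.allocations[address] = (size, site)
        self.deallocations.pop(address, None)
        return address

    def free(self, address, site):
        """Move a block to the deallocation table and classify the call."""
        if address in self.allocations:
            del self.allocations[address]
            self.deallocations[address] = site
            return "ok"
        if address in self.deallocations:
            return "double free"
        return "free of unallocated address"

    def is_dangling(self, address):
        """A pointer is dangling if its block has already been freed."""
        return address in self.deallocations

    def leaks(self):
        """Blocks still in the allocation table have never been freed."""
        return sorted(self.allocations.items())


tracker = HeapTracker()
a = tracker.malloc(16, "init")
b = tracker.malloc(32, "load")
first_free = tracker.free(a, "cleanup")
second_free = tracker.free(a, "cleanup again")
leaked = tracker.leaks()
```

In this run the block at `a` is freed twice and the block at `b` is never freed, so `second_free` reports the double free and `leaked` lists the block allocated at "load". Keeping freed blocks in a separate table is what makes dangling pointers and double frees distinguishable from frees of addresses that were never allocated.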

Memory Debugging Features - MemoryScape TotalView

TotalView New UI Features

• Leak detection
• Heap Status
• Dangling pointer detection

Coming Features

• Memory Error Events
• Memory Corruption Detection
• Memory Block Painting
• Memory Hoarding
• Memory Comparisons between processes

TotalView Reverse Debugging

Reverse Debugging with TotalView

• Reverse debugging lets developers go back in execution history
• Activated either before the program starts running or at some point after execution begins
• Captures and deterministically replays execution
• Enables stepping backwards and forward by function, line, or instruction
• Run backwards to breakpoints
• Run backwards and stop when a variable changes value
• Save recording files for later analysis or collaboration
  • For a remote connection, use the CLI: dhistory -save <name>
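"Run backwards and stop when a variable changes value" can be pictured with a toy recording. The sketch below is only a model of the idea, assuming the program is a list of hypothetical step functions and the recording is a list of full state snapshots; a real record/replay engine captures execution far more compactly and works on machine state.

```python
def record(program_steps, initial_state):
    """Run each step and keep a snapshot of the state after it."""
    state = dict(initial_state)
    history = [dict(state)]  # history[0] is the state before any step ran
    for step in program_steps:
        step(state)
        history.append(dict(state))
    return history


def run_back_until_change(history, position, name):
    """Walk backwards from `position` to the snapshot where `name` last changed.

    Returns the index of the earliest snapshot, going back from `position`,
    that still holds the current value of `name`.
    """
    current = history[position].get(name)
    for i in range(position - 1, -1, -1):
        if history[i].get(name) != current:
            return i + 1
    return 0


# A four-step toy program operating on a dictionary of variables.
steps = [
    lambda s: s.update(x=1),
    lambda s: s.update(y=s["x"] + 1),
    lambda s: s.update(x=5),
    lambda s: s.update(y=s["y"] * 2),
]
history = record(steps, {})
where_x_changed = run_back_until_change(history, len(history) - 1, "x")
```

Starting from the end of the recording, the search lands on the snapshot taken right after the third step, the one that set `x` to 5. Because the recording is fixed, repeating the search always gives the same answer, which is the practical benefit of deterministic replay.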

Reverse Debugging Controls

• Run forward / Run backwards
• Next forward over functions / Next backwards over functions
• Step forward into functions / Step backwards into functions
• Advance forward out of function call / Advance backwards to calling function
• Advance forward to selected line / Advance backward to selected line
• Advance to "live" session
• Create a bookmark at this point in recorded history
• Save the recorded session

Debugging CUDA Applications with TotalView

TotalView for the NVIDIA® GPU Accelerator

• NVIDIA Tesla, Fermi, Kepler, Pascal, Volta, Turing, Ampere
• NVIDIA CUDA 9.2, 10 and 11
  • With support for Unified Memory
• Debugging 64-bit CUDA programs
• Features and capabilities include
  • Support for dynamic parallelism
  • Support for MPI based clusters and multi-card configurations
  • Flexible Display and Navigation on the CUDA device
    • Physical (device, SM, Warp, Lane)
    • Logical (Grid, Block) tuples
  • CUDA device window reveals what is running where
  • Support for types and separate memory address spaces
  • Leverages CUDA memcheck

Source View Opened on CUDA host code (screenshot)

Breakpoint Set in CUDA Kernel Code Before Launch

A hollow breakpoint indicates that a breakpoint will be set when the code is loaded onto the GPU.

GPU Physical and Logical Toolbars

• The logical toolbar displays the Block and Thread coordinates.
• The physical toolbar displays the Device number, Streaming Multiprocessor, Warp, and Lane.
• To view a CUDA host thread, select a thread with a positive thread ID in the Process and Threads view.
• To view a CUDA GPU thread, select a thread with a negative thread ID, then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread.
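The relationship between the logical thread coordinates and the warp and lane shown on the physical toolbar follows from how CUDA linearizes threads within a block. The sketch below assumes the standard warp size of 32 and uses hypothetical helper names; the device number and streaming multiprocessor are assigned at run time by the hardware scheduler and cannot be computed this way.

```python
WARP_SIZE = 32  # warp size on current NVIDIA GPUs


def linear_thread_index(thread_idx, block_dim):
    """Linearize (x, y, z) thread coordinates within a block, x fastest."""
    x, y, z = thread_idx
    dx, dy, dz = block_dim
    if not (0 <= x < dx and 0 <= y < dy and 0 <= z < dz):
        raise ValueError("thread index outside block dimensions")
    return x + y * dx + z * dx * dy


def warp_and_lane(thread_idx, block_dim):
    """Return (warp within the block, lane within the warp) for a thread."""
    return divmod(linear_thread_index(thread_idx, block_dim), WARP_SIZE)


# Thread (5, 2, 0) in a 16 x 16 x 1 block.
example = warp_and_lane((5, 2, 0), (16, 16, 1))
```

For this example the linear index is 5 + 2 * 16 = 37, so the thread is lane 5 of warp 1 within its block. The warp number here is relative to the block; the warp slot a debugger reports on a given SM is a hardware resource and may differ.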

Displaying CUDA Program Elements

• The identifier @local is a TotalView built-in type storage qualifier that tells the debugger that the storage kind of A is local storage
• The debugger uses the storage qualifier to determine how to locate A in device memory

Screenshot callouts:

• The @local type qualifier indicates that variable A is in local storage
• "elements" is a pointer to a float in generic storage

Using TotalView for Parallel Debugging on ANL

TotalView remote debugging on Linux and macOS

• Download and install TotalView on your Linux or Mac machine
• Connect to the remote front-end node
• Run labs remotely

Hands-on labs

• Install TV from installers on Mac or Linux
  • Ignore license code
• Start TotalView
• Remotely connect to Cooley and enable Reverse Connection

Labs

• Lab 1: Debugger Basics
• Lab 2: Viewing, Examining, Watching, and Editing Data
• Lab 3: Examining and Controlling a Parallel Application (on Cooley)
  • Using remote connect (tvconnect)
  • qsub -q training tvconnect.job
  • Modify and submit tvconnect.job on your machine

TotalView is available on Theta, Cooley

• Connect to Cooley/Theta
• Get allocation first
  • qsub -A ATPESC2021 -n 4 -q debug-flat-quad -I (Theta)
  • qsub -A ATPESC2021 -n 4 -q training -I (Cooley)
• module load totalview (Theta)
• soft add +totalview (Cooley)
• totalview -args mpiexec -np <N> demoMpi_v2
• tvconnect mpiexec -np <N> demoMpi_v2
• Installed at /soft/debuggers/totalview-2021-08-04/toolworks/totalview.2021X.3.756/bin/totalview

TotalView Resources and Documentation

• TotalView website
  • https://totalview.io
• TotalView documentation
  • https://help.totalview.io
• TotalView Video Tutorials
  • https://totalview.io/support/video-tutorials
• Other Resources
  • Blog: https://totalview.io/blog

Summary

• Using a modern debugger saves you time
• TotalView can help you because
  • It's cross-platform (the only debugger you ever need)
  • It allows you to debug accelerators (GPU) and CPU in one session
  • It allows you to debug multiple languages (C++, Python, Fortran)

Page 33: Techniques for Debugging HPC Applications

Remote Debugging - TotalView Remote UI

TotalView Remote UI

• Combines the convenience of establishing a remote connection to a cluster with the ability to run the TotalView GUI locally
• Front-end GUI architecture does not need to match the back-end target architecture (macOS front-end -> Linux back-end)
• Secure communications
• Convenient saved sessions
• Once connected, debug as normal with access to all TotalView features
• Front-end GUI currently supports macOS and Linux x86/x86-64; a Windows client is coming

Remote UI Architecture

TotalView Reverse Connections

Reverse Connection Flow

[Diagram labels: Front-End Node, Batch Node, Compute Nodes, TotalView UI, $HOME/.totalview/connect, tvconnect, srun, tvdsvr, Rank 0]

1. tvconnect writes request
2. TotalView UI reads request
3. TotalView returns response
4. tvconnect reads response
5. exec
6. socket connection opened
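
The six numbered steps of the reverse connection flow can be walked through with a small file-based rendezvous. This is only a toy simulation, not TotalView's actual protocol: the file names `request.json` and `response.json`, the JSON message format, the `frontend` host name, and port `4142` are all made up for illustration, and a temporary directory stands in for the shared connect directory.

```python
import json
import tempfile
from pathlib import Path


def write_request(connect_dir, command):
    """Step 1: the batch-side helper drops a request file."""
    path = connect_dir / "request.json"  # hypothetical file name and format
    path.write_text(json.dumps({"command": command}))


def answer_request(connect_dir, host, port):
    """Steps 2-3: the UI reads the request and writes back where it listens."""
    request = json.loads((connect_dir / "request.json").read_text())
    reply = {"host": host, "port": port, "command": request["command"]}
    (connect_dir / "response.json").write_text(json.dumps(reply))


def read_response(connect_dir):
    """Step 4: the batch-side helper learns where to connect."""
    return json.loads((connect_dir / "response.json").read_text())


def run_flow(command, host="frontend", port=4142):
    """Walk through the six steps once and return a log of what happened."""
    log = []
    with tempfile.TemporaryDirectory() as tmp:
        connect_dir = Path(tmp)  # stand-in for the shared connect directory
        write_request(connect_dir, command)
        log.append("1 request written")
        answer_request(connect_dir, host, port)
        log.append("2 request read")
        log.append("3 response returned")
        reply = read_response(connect_dir)
        log.append("4 response read")
        # Step 5: a real helper would exec the debugger server here.
        log.append("5 exec " + " ".join(reply["command"]))
        # Step 6: the server would now open a socket back to the UI.
        log.append("6 connect to %s:%d" % (reply["host"], reply["port"]))
    return log
```

Calling `run_flow(["srun", "-n", "2", "hybrid_fib"])` returns one log entry per step. The point of the design is that the batch job and the UI only need a directory they can both see; the socket is opened last, from the cluster side back to the UI.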

Batch Script Submission with Reverse Connect

• Start a debugging session using TotalView Reverse Connect
• Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once it runs
• Enables running the TotalView UI on the front-end node while remotely debugging jobs executing on the compute nodes
• Very easy to use: simply prefix the job launch or application start with the "tvconnect" command

#!/bin/bash
#SBATCH -J hybrid_fib
…
#SBATCH -n 2
#SBATCH -c 4
#SBATCH --mem-per-cpu=4000
export OMP_NUM_THREADS=4

tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib
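
Prefixing the launch line is mechanical enough to script. The sketch below is a hypothetical helper, not part of TotalView: the function name `add_tvconnect` and the list of launcher names it looks for are assumptions chosen for illustration.

```python
# Launcher commands this sketch recognizes (an assumed, non-exhaustive list).
LAUNCHERS = ("srun", "mpiexec", "mpirun", "aprun")


def add_tvconnect(script):
    """Return the batch script with 'tvconnect' prefixed to each launch line."""
    result = []
    for line in script.splitlines():
        words = line.split()
        if words and words[0] in LAUNCHERS:
            stripped = line.lstrip()
            indent = line[: len(line) - len(stripped)]
            line = indent + "tvconnect " + stripped
        result.append(line)
    return "\n".join(result)
```

Lines that already start with `tvconnect` are left alone, so running the helper twice gives the same script as running it once.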

Memory Leaks, Heap Status, and Identifying Dangling Pointers

What is a Memory Bug?

• A memory bug is a mistake in the management of heap memory
  • Leaking: failure to free memory
  • Dangling references: failure to clear pointers
  • Failure to check for error conditions
• Memory corruption
  • Writing to memory not allocated
  • Overrunning array bounds

TotalView Heap Interposition Agent (HIA) Technology

• Advantages of TotalView HIA technology
  • Use it with your existing builds
  • No source code or binary instrumentation
  • Programs run at nearly full speed
  • Low performance overhead
  • Low memory overhead
  • Efficient memory usage

[Diagram labels: Process, User Code and Libraries, Malloc API, Heap Interposition Agent (HIA), Allocation Table, Deallocation Table, TotalView]
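
The allocation and deallocation tables in the diagram suggest how leaks and dangling pointers can be found without instrumenting user code. The following is a toy model only: `ToyHeap`, its fake addresses, and its method names are invented for illustration and say nothing about how the real HIA is implemented.

```python
import itertools

BLOCK_SPACING = 0x1000  # fake addresses are spaced this far apart


class ToyHeap:
    """Toy allocator that records every block the way an interposer might."""

    def __init__(self):
        # Fake addresses that are never reused, so freed blocks stay distinct.
        self._addresses = itertools.count(0x1000, BLOCK_SPACING)
        self.allocation_table = {}    # address -> (size, label), live blocks
        self.deallocation_table = {}  # address -> (size, label), freed blocks

    def malloc(self, size, label):
        if not 0 < size <= BLOCK_SPACING:
            raise ValueError("size must fit in one fake block")
        address = next(self._addresses)
        self.allocation_table[address] = (size, label)
        return address

    def free(self, address):
        if address not in self.allocation_table:
            raise ValueError("free of an address that is not allocated")
        self.deallocation_table[address] = self.allocation_table.pop(address)

    def leaks(self):
        """Labels of blocks still allocated, e.g. when the program exits."""
        return sorted(label for _size, label in self.allocation_table.values())

    def is_dangling(self, pointer):
        """True if the pointer refers into a block that was already freed."""
        return any(start <= pointer < start + size
                   for start, (size, _label) in self.deallocation_table.items())
```

After `a = heap.malloc(64, "a")`, `b = heap.malloc(32, "b")` and `heap.free(a)`, the block labeled `b` is reported by `leaks()` and any pointer into the old `a` block is reported as dangling. Keeping freed blocks in their own table is what makes the dangling check possible.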

Memory Debugging Features - MemoryScape TotalView

TotalView New UI Features

• Leak detection
• Heap status
• Dangling pointer detection

Coming Features

• Memory error events
• Memory corruption detection
• Memory block painting
• Memory hoarding
• Memory comparisons between processes

TotalView Reverse Debugging

Reverse Debugging with TotalView

• Reverse debugging lets developers go back in execution history
• Activated either before the program starts running or at some point after execution begins
• Captures and deterministically replays execution
• Enables stepping backwards and forwards by function, line, or instruction
• Run backwards to breakpoints
• Run backwards and stop when a variable changes value
• Save recording files for later analysis or collaboration
  • For a remote connection, use the CLI command: dhistory -save <name>
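
The idea of stepping back through recorded history, and of running backwards until a variable changes, can be shown with a toy snapshot recorder. This is a sketch under simplifying assumptions: it stores a copy of a few variables per step, which is not how a process-level recording works, and the names `Recorder` and `record_running_max` are invented for the example. At least one step must be recorded before navigating.

```python
class Recorder:
    """Toy execution history: one snapshot of the variables per step."""

    def __init__(self):
        self.history = []   # list of (line, variables) snapshots
        self.position = -1  # index of the snapshot currently being viewed

    def record(self, line, variables):
        self.history.append((line, dict(variables)))
        self.position = len(self.history) - 1

    def step_back(self):
        self.position = max(self.position - 1, 0)
        return self.history[self.position]

    def step_forward(self):
        self.position = min(self.position + 1, len(self.history) - 1)
        return self.history[self.position]

    def run_back_to_change(self, name):
        """Go back to the step where `name` first took its current value."""
        current = self.history[self.position][1][name]
        while (self.position > 0
               and self.history[self.position - 1][1][name] == current):
            self.position -= 1
        return self.history[self.position]


def record_running_max(values):
    """Example program: track the largest value seen, recording every step."""
    recorder = Recorder()
    best = None
    for line, value in enumerate(values, start=1):
        if best is None or value > best:
            best = value
        recorder.record(line, {"value": value, "best": best})
    return recorder
```

For the input `[3, 9, 2, 4]`, `run_back_to_change("best")` moves from the last step back to step 2, the step at which `best` became 9. One further `step_back()` shows the state before the change.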

Reverse Debugging Controls

• Run forward / Run backwards
• Next forward over functions / Next backwards over functions
• Step forward into functions / Step backwards into functions
• Advance forward out of function call / Advance backwards to calling function
• Advance forward to selected line / Advance backward to selected line
• Advance to "live" session
• Create a bookmark at this point in recorded history
• Save the recorded session

Debugging CUDA Applications with TotalView

TotalView for the NVIDIA® GPU Accelerator

• NVIDIA Tesla, Fermi, Kepler, Pascal, Volta, Turing, Ampere
• NVIDIA CUDA 9.2, 10, and 11
  • With support for Unified Memory
• Debugging 64-bit CUDA programs
• Features and capabilities include
  • Support for dynamic parallelism
  • Support for MPI-based clusters and multi-card configurations
  • Flexible display and navigation on the CUDA device
    • Physical (device, SM, warp, lane)
    • Logical (grid, block) tuples
  • CUDA device window reveals what is running where
  • Support for types and separate memory address spaces
  • Leverages CUDA memcheck

Source View Opened on CUDA host code

Breakpoint Set in CUDA Kernel Code Before Launch

A hollow breakpoint indicates that a breakpoint will be set when the code is loaded onto the GPU.

GPU Physical and Logical Toolbars

• The logical toolbar displays the Block and Thread coordinates.
• The physical toolbar displays the Device number, Streaming Multiprocessor, Warp, and Lane.
• To view a CUDA host thread, select a thread with a positive thread ID in the Process and Threads view.
• To view a CUDA GPU thread, select a thread with a negative thread ID, then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread.
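
The logical and physical coordinates are related by simple arithmetic, sketched below. It assumes the usual CUDA conventions of a warp size of 32 and threads numbered within a block with x varying fastest; the function names are invented for this example. The warp index computed here is relative to the block, whereas the warp shown on the physical toolbar is a hardware slot chosen by the scheduler, so the two numbers need not match.

```python
WARP_SIZE = 32  # warp size on current NVIDIA GPUs


def linear_thread_id(thread, block_dim):
    """Flatten a (x, y, z) thread coordinate within its block, x fastest."""
    x, y, z = thread
    dim_x, dim_y, dim_z = block_dim
    if not (0 <= x < dim_x and 0 <= y < dim_y and 0 <= z < dim_z):
        raise ValueError("thread coordinate lies outside the block")
    return x + y * dim_x + z * dim_x * dim_y


def warp_and_lane(thread, block_dim):
    """Return (warp index within the block, lane within the warp)."""
    return divmod(linear_thread_id(thread, block_dim), WARP_SIZE)
```

For example, thread (5, 2, 0) in a 16 x 16 x 1 block has linear ID 5 + 2 * 16 = 37, which is lane 5 of the block's second warp (index 1).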

Displaying CUDA Program Elements

• The identifier @local is a TotalView built-in type storage qualifier that tells the debugger the storage kind of "A" is local storage
• The debugger uses the storage qualifier to determine how to locate A in device memory

The @local type qualifier indicates that variable A is in local storage

"elements" is a pointer to a float in generic storage

Using TotalView for Parallel Debugging on ANL

TotalView remote debugging on Linux and macOS

• Download and install TotalView on your Linux or Mac machine
• Connect to the remote front-end node
• Run the labs remotely

Hands-on labs

• Install TV from the installers on Mac or Linux
  • Ignore the license code
• Start TotalView
• Remotely connect to Cooley and enable Reverse Connection

Labs

• Lab 1: Debugger Basics
• Lab 2: Viewing, Examining, Watching, and Editing Data
• Lab 3: Examining and Controlling a Parallel Application (on Cooley)
  • Using remote connect (tvconnect)
  • qsub -q training tvconnectjob
  • Modify and submit tvconnectjob on your machine

TotalView is available on Theta and Cooley

• Connect to Cooley/Theta
• Get an allocation first
  • qsub -A ATPESC2021 -n 4 -q debug-flat-quad -I (Theta)
  • qsub -A ATPESC2021 -n 4 -q training -I (Cooley)
• module load totalview (Theta)
• soft add +totalview (Cooley)
• totalview -args mpiexec -np <N> demoMpi_v2
• tvconnect mpiexec -np <N> demoMpi_v2
• Installed at /soft/debuggers/totalview-2021-08-04/toolworks/totalview.2021X.3.756/bin/totalview

TotalView Resources and Documentation

• TotalView website
  • https://totalview.io
• TotalView documentation
  • https://help.totalview.io
• TotalView Video Tutorials
  • https://totalview.io/support/video-tutorials
• Other Resources
  • Blog: https://totalview.io/blog

Summary

• Using a modern debugger saves you time
• TotalView can help you because
  • It's cross-platform (the only debugger you ever need)
  • It allows you to debug accelerators (GPU) and CPU in one session
  • It allows you to debug multiple languages (C++/Python/Fortran)



Page 36: Techniques for Debugging HPC Applications

TotalView Reverse Connections

Reverse Connection Flow

Diagram labels: Front-End Node, Batch Node, Compute Nodes, TotalView UI, $HOME/.totalview/connect, tvconnect, tvdsvr, srun, Rank 0.

1. tvconnect writes request
2. TotalView UI reads request
3. TotalView returns response
4. tvconnect reads response
5. exec (from tvconnect)
6. Socket connection opened (tvdsvr to TotalView UI)
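The request and response steps of this flow can be pictured as a handshake through a shared directory. The sketch below is only a toy model of steps 1 to 4: the file names, the JSON payload, and the host and port values are hypothetical and are not taken from TotalView, whose actual request format the slides do not describe.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

# Hypothetical file names, chosen for illustration only.
REQUEST = "request.json"
RESPONSE = "response.json"


def launcher_write_request(connect_dir: Path, command: list[str]) -> None:
    """Step 1: the batch-side launcher drops a request into the shared directory."""
    (connect_dir / REQUEST).write_text(json.dumps({"command": command}))


def ui_handle_request(connect_dir: Path, host: str, port: int) -> dict:
    """Steps 2 and 3: the UI reads the request and writes a response next to it."""
    request = json.loads((connect_dir / REQUEST).read_text())
    # Assumption: the response tells the launcher where the UI is listening.
    (connect_dir / RESPONSE).write_text(json.dumps({"host": host, "port": port}))
    return request


def launcher_read_response(connect_dir: Path) -> dict:
    """Step 4: the launcher reads the response before starting the debugger server."""
    return json.loads((connect_dir / RESPONSE).read_text())


def run_handshake(command: list[str]) -> tuple[dict, dict]:
    """Run steps 1 to 4 against a temporary directory standing in for the shared one."""
    with tempfile.TemporaryDirectory() as tmp:
        connect_dir = Path(tmp)
        launcher_write_request(connect_dir, command)
        seen = ui_handle_request(connect_dir, host="login1", port=4142)
        reply = launcher_read_response(connect_dir)
    return seen, reply
```

Calling `run_handshake(["srun", "-n", "2", "hybrid_fib"])` returns the request as the UI saw it and the reply as the launcher read it. The point of the design is that neither side needs to know the other's address in advance; both only need access to the same directory.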

Batch Script Submission with Reverse Connect

• Start a debugging session using TotalView Reverse Connect
• Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run
• Enables running the TotalView UI on the front-end node and remotely debugging jobs executing on the compute nodes
• Very easy to use: simply prefix the job launch or application start with the "tvconnect" command

    #!/bin/bash
    #SBATCH -J hybrid_fib
    …
    #SBATCH -n 2
    #SBATCH -c 4
    #SBATCH --mem-per-cpu=4000
    export OMP_NUM_THREADS=4

    tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks, Heap Status, and Identifying Dangling Pointers

What is a Memory Bug?

• A memory bug is a mistake in the management of heap memory
  • Leaking: failure to free memory
  • Dangling references: failure to clear pointers
  • Failure to check for error conditions
• Memory corruption
  • Writing to memory not allocated
  • Overrunning array bounds

TotalView Heap Interposition Agent (HIA) Technology

• Advantages of TotalView HIA Technology
  • Use it with your existing builds
  • No source code or binary instrumentation
  • Programs run at nearly full speed
  • Low performance overhead
  • Low memory overhead
  • Efficient memory usage

Diagram labels: TotalView; Process containing User Code and Libraries, Malloc API, and the Heap Interposition Agent (HIA) with its Allocation Table and Deallocation Table.
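The idea of an agent that sits in front of the allocator and keeps an allocation table and a deallocation table can be illustrated with a small toy. The sketch below is not how the HIA is implemented; the class name, the fake addresses, and the `tag` parameter are hypothetical and exist only to show how two tables are enough to report leaks and dangling pointers.

```python
class ToyHeapAgent:
    """Toy interposition layer: wraps allocate/free calls and keeps two tables."""

    def __init__(self):
        self._next_addr = 0x1000   # fake addresses; never reused in this toy
        self.allocations = {}      # addr -> (size, tag) for blocks still live
        self.deallocations = {}    # addr -> (size, tag) for blocks already freed

    def malloc(self, size, tag=""):
        """Hand out a fake address and record the block in the allocation table."""
        addr = self._next_addr
        self._next_addr += size
        self.allocations[addr] = (size, tag)
        return addr

    def free(self, addr):
        """Move the block to the deallocation table, rejecting bad frees."""
        if addr in self.deallocations:
            raise RuntimeError(f"double free of block 0x{addr:x}")
        if addr not in self.allocations:
            raise RuntimeError(f"free of unallocated address 0x{addr:x}")
        self.deallocations[addr] = self.allocations.pop(addr)

    def is_dangling(self, addr):
        """A pointer is dangling if it still refers to a block that was freed."""
        return addr in self.deallocations

    def leaks(self):
        """Blocks never freed; at program exit these are the leak candidates."""
        return sorted(tag for _, tag in self.allocations.values())
```

For example, after `a = agent.malloc(64, "buffer_a")`, `b = agent.malloc(32, "buffer_b")`, and `agent.free(a)`, the call `agent.is_dangling(a)` is true and `agent.leaks()` lists only `buffer_b`. Because the bookkeeping lives in the wrapper, the user code needs no changes, which mirrors the "use it with your existing builds" point.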

Memory Debugging Features - MemoryScape TotalView

TotalView New UI Features

• Leak detection
• Heap Status
• Dangling pointer detection

Coming Features

• Memory Error Events
• Memory Corruption Detection
• Memory Block Painting
• Memory Hoarding
• Memory Comparisons between processes

TotalView Reverse Debugging

Reverse Debugging with TotalView

• Reverse debugging lets developers go back in execution history
• Activated either before the program starts running or at some point after execution begins
• Captures and deterministically replays execution
• Enables stepping backwards and forwards by function, line, or instruction
• Run backwards to breakpoints
• Run backwards and stop when a variable changes value
• Save recording files for later analysis or collaboration
  • For a remote connection, use the CLI: dhistory -save <name>
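Stepping backwards and running back until a variable changes can be pictured as navigation over a recorded history. The sketch below is a toy model under that assumption: it stores a full snapshot of the variables at every step, which is not how TotalView records execution, and every name in it is hypothetical.

```python
import copy


class ToyRecorder:
    """Toy execution history: one snapshot of the variables per recorded step."""

    def __init__(self):
        self.history = []    # snapshots in execution order
        self.position = -1   # index of the snapshot currently being viewed

    def record(self, variables):
        """Append a snapshot and move the view to the newest ("live") step."""
        self.history.append(copy.deepcopy(variables))
        self.position = len(self.history) - 1

    def step_back(self):
        """Move one step backwards, stopping at the start of the recording."""
        if self.position > 0:
            self.position -= 1
        return self.history[self.position]

    def step_forward(self):
        """Move one step forwards, stopping at the live end of the recording."""
        if self.position < len(self.history) - 1:
            self.position += 1
        return self.history[self.position]

    def run_back_until_change(self, name):
        """Run backwards and stop at the step where `name` took its current value."""
        current = self.history[self.position][name]
        while self.position > 0 and self.history[self.position - 1].get(name) == current:
            self.position -= 1
        return self.position
```

If a loop records `{"i": i, "total": total}` on each iteration and `total` jumps from 0 to 10 at `i == 2`, then `run_back_until_change("total")` from the last step lands on step 2, the point where the bad value first appeared. Replay is deterministic here simply because the snapshots never change once recorded.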

Reverse Debugging Controls

• Run forward / Run backwards
• Next forward over functions / Next backwards over functions
• Step forward into functions / Step backwards into functions
• Advance forward out of function call / Advance backwards to calling function
• Advance forward to selected line / Advance backward to selected line
• Advance to "live" session
• Create a bookmark at this point in recorded history
• Save the recorded session

Debugging CUDA Applications with TotalView

TotalView for the NVIDIA® GPU Accelerator

• NVIDIA Tesla, Fermi, Kepler, Pascal, Volta, Turing, Ampere
• NVIDIA CUDA 9.2, 10, and 11
  • With support for Unified Memory
• Debugging 64-bit CUDA programs
• Features and capabilities include
  • Support for dynamic parallelism
  • Support for MPI-based clusters and multi-card configurations
  • Flexible display and navigation on the CUDA device
    • Physical (device, SM, warp, lane)
    • Logical (grid, block) tuples
  • CUDA device window reveals what is running where
  • Support for types and separate memory address spaces
  • Leverages CUDA memcheck

Source View Opened on CUDA Host Code

Breakpoint Set in CUDA Kernel Code Before Launch

A hollow breakpoint indicates that the breakpoint will be set when the code is loaded onto the GPU

GPU Physical and Logical Toolbars

The logical toolbar displays the block and thread coordinates

The physical toolbar displays the device number, streaming multiprocessor, warp, and lane

To view a CUDA host thread, select a thread with a positive thread ID in the Process and Threads view

To view a CUDA GPU thread, select a thread with a negative thread ID, then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread
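The relationship between logical thread coordinates and warp and lane can be worked out by hand, which helps when reading the two toolbars side by side. The sketch below assumes the usual CUDA linearization of a thread index within its block and a warp size of 32. The warp index it returns is the warp's position within the block; the device, SM, and hardware warp slot shown on the physical toolbar are assigned at run time and cannot be computed this way. The function names are hypothetical.

```python
WARP_SIZE = 32  # assumption: warp size of 32, as on current NVIDIA GPUs


def linear_thread_id(thread_idx, block_dim):
    """Flatten (x, y, z) thread coordinates within a block, x varying fastest."""
    x, y, z = thread_idx
    dim_x, dim_y, _ = block_dim
    return x + y * dim_x + z * dim_x * dim_y


def warp_and_lane(thread_idx, block_dim):
    """Return (warp index within the block, lane within the warp) for a thread."""
    tid = linear_thread_id(thread_idx, block_dim)
    return tid // WARP_SIZE, tid % WARP_SIZE
```

For a 16 x 16 x 1 block, thread (5, 2, 0) has linear ID 5 + 2 * 16 = 37, so `warp_and_lane((5, 2, 0), (16, 16, 1))` gives warp 1, lane 5. Threads that share a warp index execute together, which is why focusing on one GPU thread often shows its neighbours stopped at the same line.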

Displaying CUDA Program Elements

• The identifier @local is a TotalView built-in type storage qualifier that tells the debugger the storage kind of A is local storage
• The debugger uses the storage qualifier to determine how to locate A in device memory

The @local type qualifier indicates that variable A is in local storage

"elements" is a pointer to a float in generic storage

Using TotalView for Parallel Debugging on ANL

TotalView remote debugging on Linux and macOS

• Download and install TotalView on your Linux or Mac machine
• Connect to the remote front-end node
• Run labs remotely

Hands-on labs

• Install TV from installers on Mac or Linux
• Ignore license code
• Start TotalView
• Remotely connect to Cooley and enable Reverse Connection

Labs

• Lab 1: Debugger Basics
• Lab 2: Viewing, Examining, Watching, and Editing Data
• Lab 3: Examining and Controlling a Parallel Application (on Cooley)
  • Using remote connect (tvconnect)
  • qsub -q training tvconnectjob
  • Modify and submit tvconnectjob on your machine

TotalView is available on Theta and Cooley

• Connect to Cooley/Theta
• Get an allocation first
  • qsub -A ATPESC2021 -n 4 -q debug-flat-quad -I (Theta)
  • qsub -A ATPESC2021 -n 4 -q training -I (Cooley)
• module load totalview (Theta)
• soft add +totalview (Cooley)
• totalview -args mpiexec -np <N> demoMpi_v2
• tvconnect mpiexec -np <N> demoMpi_v2

• Installed at /soft/debuggers/totalview-2021-08-04/toolworks/totalview.2021X.3.756/bin/totalview

TotalView Resources and Documentation

• TotalView website
  • https://totalview.io
• TotalView documentation
  • https://help.totalview.io
• TotalView Video Tutorials
  • https://totalview.io/support/video-tutorials
• Other Resources
  • Blog: https://totalview.io/blog

Summary

• Using a modern debugger saves you time
• TotalView can help you because
  • It's cross-platform (the only debugger you ever need)
  • It allows you to debug accelerators (GPU) and CPU in one session
  • It allows you to debug multiple languages (C++/Python/Fortran)

Page 37: Techniques for Debugging HPC Applications

roguewavecom41 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

roguewavecom42 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

2 TotalView UI reads request

3 TotalView returns response

roguewavecom43 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

2 TotalView UI reads request

3 TotalView returns response

roguewavecom44 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

roguewavecom45 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

roguewavecom46 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

v TotalView UI

2 TotalView UI reads request

3 TotalView returns response

tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

6 socket connection opened

roguewavecom47 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Start a debugging session using TotalView Reverse Connect

bull Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run

bull Enables running TotalView UI on the front-end node and remotely

debug jobs executing on the compute nodes

bull Very easy to utilize simply prefix job launch or application start

with ldquotvconnectrdquo command

Batch Script Submission with Reverse Connect

binbashSBATCH -J hybrid_fibhellipSBATCH -n 2SBATCH -c 4SBATCH --mem-per-cpu=4000export OMP_NUM_THREADS=4

tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks Heap Status and Identifying Dangling Pointers

roguewavecom50 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull A Memory Bug is a mistake in the management of heap memory

bull Leaking Failure to free memory

bull Dangling references Failure to clear pointers

bull Failure to check for error conditions

bull Memory Corruption

bull Writing to memory not allocated

bull Overrunning array bounds

What is a Memory Bug

roguewavecom51 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Advantages of TotalView HIA Technology

bull Use it with your existing builds

bull No Source Code or Binary Instrumentation

bull Programs run nearly full speed

bull Low performance overhead

bull Low memory overhead

bull Efficient memory usage

TotalView Heap Interposition Agent (HIA) Technology

Malloc API

User Code and Libraries

Process

TotalView

Heap Interposition

Agent (HIA)Allocation

Table

Deallocation

Table

roguewavecom52 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

TotalView New UI Features

bull Leak detection

bull Heap Status

bull Dangling pointer detection

Coming Features

bull Memory Error Events

bull Memory Corruption Detection

bull Memory Block Painting

bull Memory Hoarding

bull Memory Comparisons between processes

Memory Debugging Features ndash MemoryScape TotalView

TotalView Reverse Debugging

roguewavecom54 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Reverse debugging provides the ability for developers to go back in execution history

bull Activated either before program starts running or at some point after execution begins

bull Capturing and deterministically replay execution

bull Enables stepping backwards and forward by function line or instruction

bull Run backwards to breakpoints

bull Run backwards and stop when a variable changes value

bull Saving recording files for later analysis or collaboration

bull For remote connection use CLI dhistory ndashsave ltnamegt

Reverse Debugging with TotalView

roguewavecom55 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Reverse Debugging Controls

Run forward-

Run backwards

Next forward over functions

-Next backwards over functions

Step forward into functions

-Step backwards into

functions

Advance forward out of function call

-Advance backwards to

calling function

Advance forward to selected line

-Advance backward to

selected line

Advance to ldquoliverdquo session

Create a bookmark at this point in recorded history

Save the recorded session

Debugging CUDA Applications with TotalView

roguewavecom59 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull NVIDIA Tesla Fermi Kepler Pascal Volta Turing Ampere

bull NVIDIA CUDA 92 10 and 11

bull With support for Unified Memory

bull Debugging 64-bit CUDA programs

bull Features and capabilities include

bull Support for dynamic parallelism

bull Support for MPI based clusters and multi-card configurations

bull Flexible Display and Navigation on the CUDA device

bull Physical (device SM Warp Lane)

bull Logical (Grid Block) tuples

bull CUDA device window reveals what is running where

bull Support for types and separate memory address spaces

bull Leverages CUDA memcheck

TotalView for the NVIDIA reg GPU Accelerator

roguewavecom60 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Source View Opened on CUDA host code

roguewavecom61 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Breakpoint Set in CUDA Kernel Code Before Launch

Hollow breakpoint indicates a breakpoint will be set when the code is loaded onto the GPU

roguewavecom62 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

GPU Physical and Logical Toolbars

Logical toolbar displays the Block and Thread coordinates

Physical toolbar displays the Device number Streaming Multiprocessor Warp and Lane

To view a CUDA host thread select a thread with a positive thread ID in the Process and Threads view

To view a CUDA GPU thread select a thread with a negative thread ID then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread

roguewavecom63 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull The identifier local is a TotalView built-in type storage qualifier that tells the debugger the storage kind of A is

local storage

bull The debugger uses the storage qualifier to determine how to locate A in device memory

Displaying CUDA Program Elementslocal type qualifier indicates that variable A is in local storage

ldquoelementsrdquo is a pointer to a float in generic storage

Using TotalView for Parallel Debugging on ANL

totalviewio65 | TotalView by Perforce copy Perforce Software Inc

TotalView remote debugging on Linux and Mac OS

bull Download and install TotalView on your linux or mac

bull Connect to remote front node

bull Run labs remotely

totalviewio66 | TotalView by Perforce copy Perforce Software Inc

Hands-on labs

bull Install TV from installers on Mac or Linux

bull Ignore license code

bull Star TotalView

bull Remotely connect to cooley and enable Reverse Connection

Labs

bull Lab 1 Debugger Basic

bull Lab 2 Viewing Examining Watching and Editing Data

bull Lab 3 Examining and Controlling a Parallel Application (on Cooley)

bull Using remote connect (tvconnect)

bull qsub ndashq training tvconnectjob

bull Modify and submit tvconnectjob on your machine

totalviewio67 | TotalView by Perforce copy Perforce Software Inc

TotalView is available on Theta Cooley

bull Connect to CooleyTheta

bull Get allocation first

bull qsub -A ATPESC2021 ndashn 4 ndashq debug-flat-quad ndashI (theta)

bull qsub -A ATPESC2021 ndashn 4 ndashq training ndashI (Cooley)

bull module load totalview (theta)

bull soft add +totalview (cooley)

bull totalview -args mpiexec ndashnp ltNgt demoMpi_v2

bull tvconnect mpiexec ndashnp ltNgt demoMpi_v2

bull Installed at softdebuggerstotalview-2021-08-04toolworkstotalview2021X3756bintotalview

roguewavecom68 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull TotalView website

bull httpstotalviewio

bull TotalView documentation

bull httpshelptotalviewio

bull TotalView Video Tutorials

bull httpstotalviewiosupportvideo-tutorials

bull Other Resources

bull Blog httpstotalviewioblog

TotalView Resources and Documentation

totalviewio69 | TotalView by Perforce copy Perforce Software Inc

bull Use of modern debugger saves you time

bull TotalView can help you because

bull Itrsquos cross-platform (the only debugger you ever need)

bull Allow you to debug accelerators (GPU) and CPU in one session

bull Allow you to debug multiple languages (C++PythonFortran)

Summary

TotalView by Perforcecopy 2019 Perforce Software IncTotalView by Perforce copy Perforce Software Inc

Page 38: Techniques for Debugging HPC Applications

roguewavecom42 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

2 TotalView UI reads request

3 TotalView returns response

roguewavecom43 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

2 TotalView UI reads request

3 TotalView returns response

roguewavecom44 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

roguewavecom45 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

roguewavecom46 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

v TotalView UI

2 TotalView UI reads request

3 TotalView returns response

tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

6 socket connection opened

roguewavecom47 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Start a debugging session using TotalView Reverse Connect

bull Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run

bull Enables running TotalView UI on the front-end node and remotely

debug jobs executing on the compute nodes

bull Very easy to utilize simply prefix job launch or application start

with ldquotvconnectrdquo command

Batch Script Submission with Reverse Connect

binbashSBATCH -J hybrid_fibhellipSBATCH -n 2SBATCH -c 4SBATCH --mem-per-cpu=4000export OMP_NUM_THREADS=4

tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks Heap Status and Identifying Dangling Pointers

roguewavecom50 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull A Memory Bug is a mistake in the management of heap memory

bull Leaking Failure to free memory

bull Dangling references Failure to clear pointers

bull Failure to check for error conditions

bull Memory Corruption

bull Writing to memory not allocated

bull Overrunning array bounds

What is a Memory Bug

roguewavecom51 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Advantages of TotalView HIA Technology

bull Use it with your existing builds

bull No Source Code or Binary Instrumentation

bull Programs run nearly full speed

bull Low performance overhead

bull Low memory overhead

bull Efficient memory usage

TotalView Heap Interposition Agent (HIA) Technology

Malloc API

User Code and Libraries

Process

TotalView

Heap Interposition

Agent (HIA)Allocation

Table

Deallocation

Table

roguewavecom52 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

TotalView New UI Features

bull Leak detection

bull Heap Status

bull Dangling pointer detection

Coming Features

bull Memory Error Events

bull Memory Corruption Detection

bull Memory Block Painting

bull Memory Hoarding

bull Memory Comparisons between processes

Memory Debugging Features ndash MemoryScape TotalView

TotalView Reverse Debugging

roguewavecom54 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Reverse debugging provides the ability for developers to go back in execution history

bull Activated either before program starts running or at some point after execution begins

bull Capturing and deterministically replay execution

bull Enables stepping backwards and forward by function line or instruction

bull Run backwards to breakpoints

bull Run backwards and stop when a variable changes value

bull Saving recording files for later analysis or collaboration

bull For remote connection use CLI dhistory ndashsave ltnamegt

Reverse Debugging with TotalView

roguewavecom55 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Reverse Debugging Controls

Run forward-

Run backwards

Next forward over functions

-Next backwards over functions

Step forward into functions

-Step backwards into

functions

Advance forward out of function call

-Advance backwards to

calling function

Advance forward to selected line

-Advance backward to

selected line

Advance to ldquoliverdquo session

Create a bookmark at this point in recorded history

Save the recorded session

Debugging CUDA Applications with TotalView

roguewavecom59 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull NVIDIA Tesla Fermi Kepler Pascal Volta Turing Ampere

bull NVIDIA CUDA 92 10 and 11

bull With support for Unified Memory

bull Debugging 64-bit CUDA programs

bull Features and capabilities include

bull Support for dynamic parallelism

bull Support for MPI based clusters and multi-card configurations

bull Flexible Display and Navigation on the CUDA device

bull Physical (device SM Warp Lane)

bull Logical (Grid Block) tuples

bull CUDA device window reveals what is running where

bull Support for types and separate memory address spaces

bull Leverages CUDA memcheck

TotalView for the NVIDIA reg GPU Accelerator

roguewavecom60 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Source View Opened on CUDA host code

roguewavecom61 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Breakpoint Set in CUDA Kernel Code Before Launch

Hollow breakpoint indicates a breakpoint will be set when the code is loaded onto the GPU

roguewavecom62 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

GPU Physical and Logical Toolbars

Logical toolbar displays the Block and Thread coordinates

Physical toolbar displays the Device number Streaming Multiprocessor Warp and Lane

To view a CUDA host thread select a thread with a positive thread ID in the Process and Threads view

To view a CUDA GPU thread select a thread with a negative thread ID then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread

roguewavecom63 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull The identifier local is a TotalView built-in type storage qualifier that tells the debugger the storage kind of A is

local storage

bull The debugger uses the storage qualifier to determine how to locate A in device memory

Displaying CUDA Program Elementslocal type qualifier indicates that variable A is in local storage

ldquoelementsrdquo is a pointer to a float in generic storage

Using TotalView for Parallel Debugging on ANL

totalviewio65 | TotalView by Perforce copy Perforce Software Inc

TotalView remote debugging on Linux and Mac OS

bull Download and install TotalView on your linux or mac

bull Connect to remote front node

bull Run labs remotely

totalviewio66 | TotalView by Perforce copy Perforce Software Inc

Hands-on labs

bull Install TV from installers on Mac or Linux

bull Ignore license code

bull Star TotalView

bull Remotely connect to cooley and enable Reverse Connection

Labs

bull Lab 1 Debugger Basic

bull Lab 2 Viewing Examining Watching and Editing Data

bull Lab 3 Examining and Controlling a Parallel Application (on Cooley)

bull Using remote connect (tvconnect)

bull qsub ndashq training tvconnectjob

bull Modify and submit tvconnectjob on your machine

totalviewio67 | TotalView by Perforce copy Perforce Software Inc

TotalView is available on Theta Cooley

bull Connect to CooleyTheta

bull Get allocation first

bull qsub -A ATPESC2021 ndashn 4 ndashq debug-flat-quad ndashI (theta)

bull qsub -A ATPESC2021 ndashn 4 ndashq training ndashI (Cooley)

bull module load totalview (theta)

bull soft add +totalview (cooley)

bull totalview -args mpiexec ndashnp ltNgt demoMpi_v2

bull tvconnect mpiexec ndashnp ltNgt demoMpi_v2

bull Installed at softdebuggerstotalview-2021-08-04toolworkstotalview2021X3756bintotalview

roguewavecom68 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull TotalView website

bull httpstotalviewio

bull TotalView documentation

bull httpshelptotalviewio

bull TotalView Video Tutorials

bull httpstotalviewiosupportvideo-tutorials

bull Other Resources

bull Blog httpstotalviewioblog

TotalView Resources and Documentation

totalviewio69 | TotalView by Perforce copy Perforce Software Inc

bull Use of modern debugger saves you time

bull TotalView can help you because

bull Itrsquos cross-platform (the only debugger you ever need)

bull Allow you to debug accelerators (GPU) and CPU in one session

bull Allow you to debug multiple languages (C++PythonFortran)

Summary

TotalView by Perforcecopy 2019 Perforce Software IncTotalView by Perforce copy Perforce Software Inc

Page 39: Techniques for Debugging HPC Applications

roguewavecom43 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

2 TotalView UI reads request

3 TotalView returns response

roguewavecom44 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

roguewavecom45 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

roguewavecom46 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

v TotalView UI

2 TotalView UI reads request

3 TotalView returns response

tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

6 socket connection opened

roguewavecom47 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Start a debugging session using TotalView Reverse Connect

bull Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run

bull Enables running TotalView UI on the front-end node and remotely

debug jobs executing on the compute nodes

bull Very easy to utilize simply prefix job launch or application start

with ldquotvconnectrdquo command

Batch Script Submission with Reverse Connect

binbashSBATCH -J hybrid_fibhellipSBATCH -n 2SBATCH -c 4SBATCH --mem-per-cpu=4000export OMP_NUM_THREADS=4

tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks Heap Status and Identifying Dangling Pointers

roguewavecom50 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull A Memory Bug is a mistake in the management of heap memory

bull Leaking Failure to free memory

bull Dangling references Failure to clear pointers

bull Failure to check for error conditions

bull Memory Corruption

bull Writing to memory not allocated

bull Overrunning array bounds

What is a Memory Bug

TotalView Heap Interposition Agent (HIA) Technology

• Advantages of TotalView HIA Technology
  • Use it with your existing builds
    • No source code or binary instrumentation
  • Programs run at nearly full speed
    • Low performance overhead
  • Low memory overhead
    • Efficient memory usage

[Diagram: a process containing user code and libraries, the Heap Interposition Agent (HIA) with its allocation table and deallocation table, and the malloc API, shown alongside TotalView.]
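The bookkeeping idea behind the HIA diagram (every allocation and deallocation passes through an agent that records it in a table) can be modeled in a few lines. This Python sketch is a conceptual toy, not TotalView's implementation: the real HIA interposes on the native malloc API, whereas the `HeapTracker` class, its method names, and the fake addresses here are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeapTracker:
    """Toy interposition layer: it sees every malloc/free call and records it."""
    allocations: Dict[int, int] = field(default_factory=dict)    # address -> size, live blocks
    deallocations: Dict[int, int] = field(default_factory=dict)  # address -> size, freed blocks
    errors: List[str] = field(default_factory=list)
    _next_address: int = 0x1000  # fake addresses, handed out in increasing order

    def malloc(self, size: int) -> int:
        """Record a new block in the allocation table and return its fake address."""
        address = self._next_address
        self._next_address += max(size, 1)
        self.allocations[address] = size
        return address

    def free(self, address: int) -> None:
        """Move a block to the deallocation table, or record the misuse."""
        if address in self.allocations:
            self.deallocations[address] = self.allocations.pop(address)
        elif address in self.deallocations:
            self.errors.append(f"double free of {address:#x}")
        else:
            self.errors.append(f"free of unknown address {address:#x}")

    def is_dangling(self, address: int) -> bool:
        """A pointer is dangling if the block it refers to has already been freed."""
        return address in self.deallocations

    def live_blocks(self) -> Dict[int, int]:
        """Blocks never freed: the starting point for a heap-status or leak report.
        (A real leak detector would also check whether any pointer still reaches them.)"""
        return dict(self.allocations)


if __name__ == "__main__":
    heap = HeapTracker()
    a = heap.malloc(64)
    b = heap.malloc(16)
    heap.free(a)
    heap.free(a)                 # recorded as a double free
    print(heap.is_dangling(a))   # the pointer 'a' now refers to freed memory
    print(heap.live_blocks())    # 'b' was never freed
    print(heap.errors)
```

Because the tracking happens at the allocation interface rather than inside the program, the program itself needs no changes, which is the property the slide summarizes as "use it with your existing builds".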

Memory Debugging Features – MemoryScape TotalView

TotalView New UI Features
• Leak detection
• Heap Status
• Dangling pointer detection

Coming Features
• Memory Error Events
• Memory Corruption Detection
• Memory Block Painting
• Memory Hoarding
• Memory Comparisons between processes

TotalView Reverse Debugging

Reverse Debugging with TotalView

• Reverse debugging lets developers go back in execution history
• Activated either before the program starts running or at some point after execution begins
• Captures and deterministically replays execution
• Enables stepping backward and forward by function, line, or instruction
• Run backward to breakpoints
• Run backward and stop when a variable changes value
• Save recording files for later analysis or collaboration
  • For a remote connection, use the CLI: dhistory -save <name>
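Two of the operations listed above, running backward to a breakpoint and running backward until a variable changes value, amount to searching a recorded history in reverse. The Python sketch below shows that search on a hand-written history of one variable. It is a conceptual toy only: the `Snapshot` type, the function names, and the sample data are hypothetical, and this is not how TotalView records or replays execution.

```python
from typing import List, NamedTuple, Optional, Set


class Snapshot(NamedTuple):
    line: int   # source line that was just executed
    value: int  # value of the watched variable after that line


# Hypothetical recording: the watched variable changes at lines 12 and 15.
HISTORY: List[Snapshot] = [
    Snapshot(10, 0),
    Snapshot(11, 0),
    Snapshot(12, 5),
    Snapshot(13, 5),
    Snapshot(14, 5),
    Snapshot(15, 7),
]


def run_backward_until_change(history: List[Snapshot], position: int) -> Optional[int]:
    """Index of the most recent step at or before `position` that changed the
    watched value, or None if the value never changed before that point."""
    for i in range(position, 0, -1):
        if history[i].value != history[i - 1].value:
            return i
    return None


def run_backward_to_breakpoint(history: List[Snapshot], position: int,
                               breakpoint_lines: Set[int]) -> Optional[int]:
    """Index of the nearest step strictly before `position` whose line carries
    a breakpoint, or None if no earlier step hits one."""
    for i in range(position - 1, -1, -1):
        if history[i].line in breakpoint_lines:
            return i
    return None


if __name__ == "__main__":
    here = 4  # currently stopped after line 14
    changed_at = run_backward_until_change(HISTORY, here)
    print(HISTORY[changed_at])                              # the step that set the value to 5
    print(run_backward_to_breakpoint(HISTORY, here, {11}))  # index of the step at line 11
```

Starting from the point where a wrong value is observed and searching backward for the step that wrote it is often faster than rerunning the program with breakpoints placed by guesswork.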

Reverse Debugging Controls

• Run forward / Run backward
• Next forward over functions / Next backward over functions
• Step forward into functions / Step backward into functions
• Advance forward out of function call / Advance backward to calling function
• Advance forward to selected line / Advance backward to selected line
• Advance to "live" session
• Create a bookmark at this point in recorded history
• Save the recorded session

Debugging CUDA Applications with TotalView

TotalView for the NVIDIA® GPU Accelerator

• NVIDIA Tesla, Fermi, Kepler, Pascal, Volta, Turing, Ampere
• NVIDIA CUDA 9.2, 10, and 11
  • With support for Unified Memory
• Debugging 64-bit CUDA programs
• Features and capabilities include:
  • Support for dynamic parallelism
  • Support for MPI-based clusters and multi-card configurations
  • Flexible display and navigation on the CUDA device
    • Physical (device, SM, warp, lane)
    • Logical (grid, block) tuples
  • CUDA device window reveals what is running where
  • Support for types and separate memory address spaces
  • Leverages CUDA memcheck

Source View Opened on CUDA Host Code

Breakpoint Set in CUDA Kernel Code Before Launch

A hollow breakpoint indicates that the breakpoint will be set when the code is loaded onto the GPU.

GPU Physical and Logical Toolbars

• The logical toolbar displays the Block and Thread coordinates.
• The physical toolbar displays the Device number, Streaming Multiprocessor, Warp, and Lane.
• To view a CUDA host thread, select a thread with a positive thread ID in the Process and Threads view.
• To view a CUDA GPU thread, select a thread with a negative thread ID, then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread.

Displaying CUDA Program Elements

• The identifier @local is a TotalView built-in type storage qualifier that tells the debugger the storage kind of A is local storage
• The debugger uses the storage qualifier to determine how to locate A in device memory

Screenshot callouts:
• The @local type qualifier indicates that variable A is in local storage
• "elements" is a pointer to a float in generic storage

Using TotalView for Parallel Debugging on ANL

TotalView remote debugging on Linux and macOS

• Download and install TotalView on your Linux or Mac machine
• Connect to the remote front-end node
• Run the labs remotely

Hands-on labs

• Install TV from the installers on Mac or Linux
  • Ignore the license code
• Start TotalView
• Remotely connect to Cooley and enable Reverse Connection

Labs

• Lab 1: Debugger Basics
• Lab 2: Viewing, Examining, Watching, and Editing Data
• Lab 3: Examining and Controlling a Parallel Application (on Cooley)
  • Using remote connect (tvconnect)
  • qsub -q training tvconnect.job
  • Modify and submit tvconnect.job on your machine

TotalView is available on Theta and Cooley

• Connect to Cooley/Theta
• Get an allocation first
  • qsub -A ATPESC2021 -n 4 -q debug-flat-quad -I (Theta)
  • qsub -A ATPESC2021 -n 4 -q training -I (Cooley)
• module load totalview (Theta)
• soft add +totalview (Cooley)
• totalview -args mpiexec -np <N> demoMpi_v2
• tvconnect mpiexec -np <N> demoMpi_v2

• Installed at /soft/debuggers/totalview-2021-08-04/toolworks/totalview.2021X.3.756/bin/totalview

TotalView Resources and Documentation

• TotalView website
  • https://totalview.io
• TotalView documentation
  • https://help.totalview.io
• TotalView Video Tutorials
  • https://totalview.io/support/video-tutorials
• Other Resources
  • Blog: https://totalview.io/blog

Summary

• Using a modern debugger saves you time
• TotalView can help you because:
  • It's cross-platform (the only debugger you ever need)
  • It allows you to debug accelerators (GPU) and CPU in one session
  • It allows you to debug multiple languages (C++/Python/Fortran)

TotalView by Perforcecopy 2019 Perforce Software IncTotalView by Perforce copy Perforce Software Inc

Page 40: Techniques for Debugging HPC Applications

roguewavecom44 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect 5 exec

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

roguewavecom45 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

roguewavecom46 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

v TotalView UI

2 TotalView UI reads request

3 TotalView returns response

tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

6 socket connection opened

roguewavecom47 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Start a debugging session using TotalView Reverse Connect

bull Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run

bull Enables running TotalView UI on the front-end node and remotely

debug jobs executing on the compute nodes

bull Very easy to utilize simply prefix job launch or application start

with ldquotvconnectrdquo command

Batch Script Submission with Reverse Connect

binbashSBATCH -J hybrid_fibhellipSBATCH -n 2SBATCH -c 4SBATCH --mem-per-cpu=4000export OMP_NUM_THREADS=4

tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks Heap Status and Identifying Dangling Pointers

roguewavecom50 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull A Memory Bug is a mistake in the management of heap memory

bull Leaking Failure to free memory

bull Dangling references Failure to clear pointers

bull Failure to check for error conditions

bull Memory Corruption

bull Writing to memory not allocated

bull Overrunning array bounds

What is a Memory Bug

roguewavecom51 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Advantages of TotalView HIA Technology

bull Use it with your existing builds

bull No Source Code or Binary Instrumentation

bull Programs run nearly full speed

bull Low performance overhead

bull Low memory overhead

bull Efficient memory usage

TotalView Heap Interposition Agent (HIA) Technology

Malloc API

User Code and Libraries

Process

TotalView

Heap Interposition

Agent (HIA)Allocation

Table

Deallocation

Table

roguewavecom52 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

TotalView New UI Features

bull Leak detection

bull Heap Status

bull Dangling pointer detection

Coming Features

bull Memory Error Events

bull Memory Corruption Detection

bull Memory Block Painting

bull Memory Hoarding

bull Memory Comparisons between processes

Memory Debugging Features ndash MemoryScape TotalView

TotalView Reverse Debugging

roguewavecom54 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Reverse debugging provides the ability for developers to go back in execution history

bull Activated either before program starts running or at some point after execution begins

bull Capturing and deterministically replay execution

bull Enables stepping backwards and forward by function line or instruction

bull Run backwards to breakpoints

bull Run backwards and stop when a variable changes value

bull Saving recording files for later analysis or collaboration

bull For remote connection use CLI dhistory ndashsave ltnamegt

Reverse Debugging with TotalView

roguewavecom55 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Reverse Debugging Controls

Run forward-

Run backwards

Next forward over functions

-Next backwards over functions

Step forward into functions

-Step backwards into

functions

Advance forward out of function call

-Advance backwards to

calling function

Advance forward to selected line

-Advance backward to

selected line

Advance to ldquoliverdquo session

Create a bookmark at this point in recorded history

Save the recorded session

Debugging CUDA Applications with TotalView

roguewavecom59 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull NVIDIA Tesla Fermi Kepler Pascal Volta Turing Ampere

bull NVIDIA CUDA 92 10 and 11

bull With support for Unified Memory

bull Debugging 64-bit CUDA programs

bull Features and capabilities include

bull Support for dynamic parallelism

bull Support for MPI based clusters and multi-card configurations

bull Flexible Display and Navigation on the CUDA device

bull Physical (device SM Warp Lane)

bull Logical (Grid Block) tuples

bull CUDA device window reveals what is running where

bull Support for types and separate memory address spaces

bull Leverages CUDA memcheck

TotalView for the NVIDIA reg GPU Accelerator

roguewavecom60 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Source View Opened on CUDA host code

roguewavecom61 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Breakpoint Set in CUDA Kernel Code Before Launch

Hollow breakpoint indicates a breakpoint will be set when the code is loaded onto the GPU

roguewavecom62 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

GPU Physical and Logical Toolbars

Logical toolbar displays the Block and Thread coordinates

Physical toolbar displays the Device number Streaming Multiprocessor Warp and Lane

To view a CUDA host thread select a thread with a positive thread ID in the Process and Threads view

To view a CUDA GPU thread select a thread with a negative thread ID then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread

roguewavecom63 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull The identifier local is a TotalView built-in type storage qualifier that tells the debugger the storage kind of A is

local storage

bull The debugger uses the storage qualifier to determine how to locate A in device memory

Displaying CUDA Program Elementslocal type qualifier indicates that variable A is in local storage

ldquoelementsrdquo is a pointer to a float in generic storage

Using TotalView for Parallel Debugging on ANL

totalviewio65 | TotalView by Perforce copy Perforce Software Inc

TotalView remote debugging on Linux and Mac OS

bull Download and install TotalView on your linux or mac

bull Connect to remote front node

bull Run labs remotely

totalviewio66 | TotalView by Perforce copy Perforce Software Inc

Hands-on labs

bull Install TV from installers on Mac or Linux

bull Ignore license code

bull Star TotalView

bull Remotely connect to cooley and enable Reverse Connection

Labs

bull Lab 1 Debugger Basic

bull Lab 2 Viewing Examining Watching and Editing Data

bull Lab 3 Examining and Controlling a Parallel Application (on Cooley)

bull Using remote connect (tvconnect)

bull qsub ndashq training tvconnectjob

bull Modify and submit tvconnectjob on your machine

totalviewio67 | TotalView by Perforce copy Perforce Software Inc

TotalView is available on Theta Cooley

bull Connect to CooleyTheta

bull Get allocation first

bull qsub -A ATPESC2021 ndashn 4 ndashq debug-flat-quad ndashI (theta)

bull qsub -A ATPESC2021 ndashn 4 ndashq training ndashI (Cooley)

bull module load totalview (theta)

bull soft add +totalview (cooley)

bull totalview -args mpiexec ndashnp ltNgt demoMpi_v2

bull tvconnect mpiexec ndashnp ltNgt demoMpi_v2

bull Installed at softdebuggerstotalview-2021-08-04toolworkstotalview2021X3756bintotalview

roguewavecom68 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull TotalView website

bull httpstotalviewio

bull TotalView documentation

bull httpshelptotalviewio

bull TotalView Video Tutorials

bull httpstotalviewiosupportvideo-tutorials

bull Other Resources

bull Blog httpstotalviewioblog

TotalView Resources and Documentation

totalviewio69 | TotalView by Perforce copy Perforce Software Inc

bull Use of modern debugger saves you time

bull TotalView can help you because

bull Itrsquos cross-platform (the only debugger you ever need)

bull Allow you to debug accelerators (GPU) and CPU in one session

bull Allow you to debug multiple languages (C++PythonFortran)

Summary

TotalView by Perforcecopy 2019 Perforce Software IncTotalView by Perforce copy Perforce Software Inc

Page 41: Techniques for Debugging HPC Applications

roguewavecom45 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

TotalView UI

2 TotalView UI reads request

3 TotalView returns response

6 socket connection opened tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

roguewavecom46 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

v TotalView UI

2 TotalView UI reads request

3 TotalView returns response

tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

6 socket connection opened

roguewavecom47 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Start a debugging session using TotalView Reverse Connect

bull Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run

bull Enables running TotalView UI on the front-end node and remotely

debug jobs executing on the compute nodes

bull Very easy to utilize simply prefix job launch or application start

with ldquotvconnectrdquo command

Batch Script Submission with Reverse Connect

binbashSBATCH -J hybrid_fibhellipSBATCH -n 2SBATCH -c 4SBATCH --mem-per-cpu=4000export OMP_NUM_THREADS=4

tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks Heap Status and Identifying Dangling Pointers

roguewavecom50 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull A Memory Bug is a mistake in the management of heap memory

bull Leaking Failure to free memory

bull Dangling references Failure to clear pointers

bull Failure to check for error conditions

bull Memory Corruption

bull Writing to memory not allocated

bull Overrunning array bounds

What is a Memory Bug

roguewavecom51 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Advantages of TotalView HIA Technology

bull Use it with your existing builds

bull No Source Code or Binary Instrumentation

bull Programs run nearly full speed

bull Low performance overhead

bull Low memory overhead

bull Efficient memory usage

TotalView Heap Interposition Agent (HIA) Technology

Malloc API

User Code and Libraries

Process

TotalView

Heap Interposition

Agent (HIA)Allocation

Table

Deallocation

Table

roguewavecom52 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

TotalView New UI Features

bull Leak detection

bull Heap Status

bull Dangling pointer detection

Coming Features

bull Memory Error Events

bull Memory Corruption Detection

bull Memory Block Painting

bull Memory Hoarding

bull Memory Comparisons between processes

Memory Debugging Features ndash MemoryScape TotalView

TotalView Reverse Debugging

roguewavecom54 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Reverse debugging provides the ability for developers to go back in execution history

bull Activated either before program starts running or at some point after execution begins

bull Capturing and deterministically replay execution

bull Enables stepping backwards and forward by function line or instruction

bull Run backwards to breakpoints

bull Run backwards and stop when a variable changes value

bull Saving recording files for later analysis or collaboration

bull For remote connection use CLI dhistory ndashsave ltnamegt

Reverse Debugging with TotalView

roguewavecom55 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Reverse Debugging Controls

Run forward-

Run backwards

Next forward over functions

-Next backwards over functions

Step forward into functions

-Step backwards into

functions

Advance forward out of function call

-Advance backwards to

calling function

Advance forward to selected line

-Advance backward to

selected line

Advance to ldquoliverdquo session

Create a bookmark at this point in recorded history

Save the recorded session

Debugging CUDA Applications with TotalView

roguewavecom59 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull NVIDIA Tesla Fermi Kepler Pascal Volta Turing Ampere

bull NVIDIA CUDA 92 10 and 11

bull With support for Unified Memory

bull Debugging 64-bit CUDA programs

bull Features and capabilities include

bull Support for dynamic parallelism

bull Support for MPI based clusters and multi-card configurations

bull Flexible Display and Navigation on the CUDA device

bull Physical (device SM Warp Lane)

bull Logical (Grid Block) tuples

bull CUDA device window reveals what is running where

bull Support for types and separate memory address spaces

bull Leverages CUDA memcheck

TotalView for the NVIDIA reg GPU Accelerator

roguewavecom60 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Source View Opened on CUDA host code

roguewavecom61 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Breakpoint Set in CUDA Kernel Code Before Launch

Hollow breakpoint indicates a breakpoint will be set when the code is loaded onto the GPU

roguewavecom62 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

GPU Physical and Logical Toolbars

Logical toolbar displays the Block and Thread coordinates

Physical toolbar displays the Device number Streaming Multiprocessor Warp and Lane

To view a CUDA host thread select a thread with a positive thread ID in the Process and Threads view

To view a CUDA GPU thread select a thread with a negative thread ID then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread

roguewavecom63 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull The identifier local is a TotalView built-in type storage qualifier that tells the debugger the storage kind of A is

local storage

bull The debugger uses the storage qualifier to determine how to locate A in device memory

Displaying CUDA Program Elementslocal type qualifier indicates that variable A is in local storage

ldquoelementsrdquo is a pointer to a float in generic storage

Using TotalView for Parallel Debugging on ANL

totalviewio65 | TotalView by Perforce copy Perforce Software Inc

TotalView remote debugging on Linux and Mac OS

bull Download and install TotalView on your linux or mac

bull Connect to remote front node

bull Run labs remotely

totalviewio66 | TotalView by Perforce copy Perforce Software Inc

Hands-on labs

bull Install TV from installers on Mac or Linux

bull Ignore license code

bull Star TotalView

bull Remotely connect to cooley and enable Reverse Connection

Labs

bull Lab 1 Debugger Basic

bull Lab 2 Viewing Examining Watching and Editing Data

bull Lab 3 Examining and Controlling a Parallel Application (on Cooley)

bull Using remote connect (tvconnect)

bull qsub ndashq training tvconnectjob

bull Modify and submit tvconnectjob on your machine

totalviewio67 | TotalView by Perforce copy Perforce Software Inc

TotalView is available on Theta Cooley

bull Connect to CooleyTheta

bull Get allocation first

bull qsub -A ATPESC2021 ndashn 4 ndashq debug-flat-quad ndashI (theta)

bull qsub -A ATPESC2021 ndashn 4 ndashq training ndashI (Cooley)

bull module load totalview (theta)

bull soft add +totalview (cooley)

bull totalview -args mpiexec ndashnp ltNgt demoMpi_v2

bull tvconnect mpiexec ndashnp ltNgt demoMpi_v2

bull Installed at softdebuggerstotalview-2021-08-04toolworkstotalview2021X3756bintotalview

roguewavecom68 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull TotalView website

bull httpstotalviewio

bull TotalView documentation

bull httpshelptotalviewio

bull TotalView Video Tutorials

bull httpstotalviewiosupportvideo-tutorials

bull Other Resources

bull Blog httpstotalviewioblog

TotalView Resources and Documentation

totalviewio69 | TotalView by Perforce copy Perforce Software Inc

bull Use of modern debugger saves you time

bull TotalView can help you because

bull Itrsquos cross-platform (the only debugger you ever need)

bull Allow you to debug accelerators (GPU) and CPU in one session

bull Allow you to debug multiple languages (C++PythonFortran)

Summary

TotalView by Perforcecopy 2019 Perforce Software IncTotalView by Perforce copy Perforce Software Inc

Page 42: Techniques for Debugging HPC Applications

roguewavecom46 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

F R O N T - E N D N O D E B A T C H N O D E

Reverse Connection Flow

$HOMEtotalviewconnect

v TotalView UI

2 TotalView UI reads request

3 TotalView returns response

tvdsvr

srun

tvconnect

C O M P U T E N O D E S

Rank 0Rank 0Rank 0

1 tvconnect writes request

4 tvconnect reads response

5 exec

6 socket connection opened

roguewavecom47 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Start a debugging session using TotalView Reverse Connect

bull Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run

bull Enables running TotalView UI on the front-end node and remotely

debug jobs executing on the compute nodes

bull Very easy to utilize simply prefix job launch or application start

with ldquotvconnectrdquo command

Batch Script Submission with Reverse Connect

binbashSBATCH -J hybrid_fibhellipSBATCH -n 2SBATCH -c 4SBATCH --mem-per-cpu=4000export OMP_NUM_THREADS=4

tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks Heap Status and Identifying Dangling Pointers

roguewavecom50 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull A Memory Bug is a mistake in the management of heap memory

bull Leaking Failure to free memory

bull Dangling references Failure to clear pointers

bull Failure to check for error conditions

bull Memory Corruption

bull Writing to memory not allocated

bull Overrunning array bounds

What is a Memory Bug

roguewavecom51 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Advantages of TotalView HIA Technology

bull Use it with your existing builds

bull No Source Code or Binary Instrumentation

bull Programs run nearly full speed

bull Low performance overhead

bull Low memory overhead

bull Efficient memory usage

TotalView Heap Interposition Agent (HIA) Technology

Malloc API

User Code and Libraries

Process

TotalView

Heap Interposition

Agent (HIA)Allocation

Table

Deallocation

Table

roguewavecom52 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

TotalView New UI Features

bull Leak detection

bull Heap Status

bull Dangling pointer detection

Coming Features

bull Memory Error Events

bull Memory Corruption Detection

bull Memory Block Painting

bull Memory Hoarding

bull Memory Comparisons between processes

Memory Debugging Features ndash MemoryScape TotalView

TotalView Reverse Debugging

roguewavecom54 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Reverse debugging provides the ability for developers to go back in execution history

bull Activated either before program starts running or at some point after execution begins

bull Capturing and deterministically replay execution

bull Enables stepping backwards and forward by function line or instruction

bull Run backwards to breakpoints

bull Run backwards and stop when a variable changes value

bull Saving recording files for later analysis or collaboration

bull For remote connection use CLI dhistory ndashsave ltnamegt

Reverse Debugging with TotalView

roguewavecom55 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Reverse Debugging Controls

Run forward-

Run backwards

Next forward over functions

-Next backwards over functions

Step forward into functions

-Step backwards into

functions

Advance forward out of function call

-Advance backwards to

calling function

Advance forward to selected line

-Advance backward to

selected line

Advance to ldquoliverdquo session

Create a bookmark at this point in recorded history

Save the recorded session

Debugging CUDA Applications with TotalView

roguewavecom59 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull NVIDIA Tesla Fermi Kepler Pascal Volta Turing Ampere

bull NVIDIA CUDA 92 10 and 11

bull With support for Unified Memory

bull Debugging 64-bit CUDA programs

bull Features and capabilities include

bull Support for dynamic parallelism

bull Support for MPI based clusters and multi-card configurations

bull Flexible Display and Navigation on the CUDA device

bull Physical (device SM Warp Lane)

bull Logical (Grid Block) tuples

bull CUDA device window reveals what is running where

bull Support for types and separate memory address spaces

bull Leverages CUDA memcheck

TotalView for the NVIDIA reg GPU Accelerator

roguewavecom60 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Source View Opened on CUDA host code

roguewavecom61 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Breakpoint Set in CUDA Kernel Code Before Launch

Hollow breakpoint indicates a breakpoint will be set when the code is loaded onto the GPU

roguewavecom62 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

GPU Physical and Logical Toolbars

Logical toolbar displays the Block and Thread coordinates

Physical toolbar displays the Device number Streaming Multiprocessor Warp and Lane

To view a CUDA host thread select a thread with a positive thread ID in the Process and Threads view

To view a CUDA GPU thread select a thread with a negative thread ID then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread

roguewavecom63 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull The identifier local is a TotalView built-in type storage qualifier that tells the debugger the storage kind of A is

local storage

bull The debugger uses the storage qualifier to determine how to locate A in device memory

Displaying CUDA Program Elementslocal type qualifier indicates that variable A is in local storage

ldquoelementsrdquo is a pointer to a float in generic storage

Using TotalView for Parallel Debugging on ANL

totalviewio65 | TotalView by Perforce copy Perforce Software Inc

TotalView remote debugging on Linux and Mac OS

bull Download and install TotalView on your linux or mac

bull Connect to remote front node

bull Run labs remotely

totalviewio66 | TotalView by Perforce copy Perforce Software Inc

Hands-on labs

bull Install TV from installers on Mac or Linux

bull Ignore license code

bull Star TotalView

bull Remotely connect to cooley and enable Reverse Connection

Labs

bull Lab 1 Debugger Basic

bull Lab 2 Viewing Examining Watching and Editing Data

bull Lab 3 Examining and Controlling a Parallel Application (on Cooley)

bull Using remote connect (tvconnect)

bull qsub ndashq training tvconnectjob

bull Modify and submit tvconnectjob on your machine

totalviewio67 | TotalView by Perforce copy Perforce Software Inc

TotalView is available on Theta Cooley

bull Connect to CooleyTheta

bull Get allocation first

bull qsub -A ATPESC2021 ndashn 4 ndashq debug-flat-quad ndashI (theta)

bull qsub -A ATPESC2021 ndashn 4 ndashq training ndashI (Cooley)

bull module load totalview (theta)

bull soft add +totalview (cooley)

bull totalview -args mpiexec ndashnp ltNgt demoMpi_v2

bull tvconnect mpiexec ndashnp ltNgt demoMpi_v2

bull Installed at softdebuggerstotalview-2021-08-04toolworkstotalview2021X3756bintotalview

roguewavecom68 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull TotalView website

bull httpstotalviewio

bull TotalView documentation

bull httpshelptotalviewio

bull TotalView Video Tutorials

bull httpstotalviewiosupportvideo-tutorials

bull Other Resources

bull Blog httpstotalviewioblog

TotalView Resources and Documentation

totalviewio69 | TotalView by Perforce copy Perforce Software Inc

bull Use of modern debugger saves you time

bull TotalView can help you because

bull Itrsquos cross-platform (the only debugger you ever need)

bull Allow you to debug accelerators (GPU) and CPU in one session

bull Allow you to debug multiple languages (C++PythonFortran)

Summary

TotalView by Perforcecopy 2019 Perforce Software IncTotalView by Perforce copy Perforce Software Inc

Page 43: Techniques for Debugging HPC Applications

roguewavecom47 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Start a debugging session using TotalView Reverse Connect

bull Reverse Connect enables the debugger to be submitted to a cluster and connected to the GUI once run

bull Enables running TotalView UI on the front-end node and remotely

debug jobs executing on the compute nodes

bull Very easy to utilize simply prefix job launch or application start

with ldquotvconnectrdquo command

Batch Script Submission with Reverse Connect

binbashSBATCH -J hybrid_fibhellipSBATCH -n 2SBATCH -c 4SBATCH --mem-per-cpu=4000export OMP_NUM_THREADS=4

tvconnect srun -n 2 --cpus-per-task=4 --mpi=pmix hybrid_fib

Memory Leaks Heap Status and Identifying Dangling Pointers

What is a Memory Bug?

• A memory bug is a mistake in the management of heap memory
• Leaking: failure to free memory
• Dangling references: failure to clear pointers
• Failure to check for error conditions
• Memory corruption
  • Writing to memory not allocated
  • Overrunning array bounds
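To make the "overrunning array bounds" case concrete, here is a toy Python model (not TotalView code). A `bytearray` stands in for the heap, and two hypothetical blocks, `BLOCK_A` and `BLOCK_B`, are laid out back to back. All names and the layout are assumptions made for illustration. An unchecked write that is longer than the first block silently changes the start of its neighbour, which is why this kind of corruption tends to surface far from the line that caused it.

```python
# Toy model of heap corruption by overrun. The arena, block layout, and
# function names are illustrative assumptions, not part of TotalView.

ARENA_SIZE = 16
BLOCK_A = (0, 4)   # (offset, size) of block a
BLOCK_B = (4, 4)   # block b sits directly after block a


def unchecked_write(arena, block, data):
    """Copy data to the start of block without checking its size (the bug)."""
    offset, _size = block
    arena[offset:offset + len(data)] = data


def checked_write(arena, block, data):
    """Same copy, but refuse writes that do not fit in the block."""
    offset, size = block
    if len(data) > size:
        raise ValueError(f"write of {len(data)} bytes overruns {size}-byte block")
    arena[offset:offset + len(data)] = data


def read_block(arena, block):
    """Return the current contents of a block as bytes."""
    offset, size = block
    return bytes(arena[offset:offset + size])


def demo():
    arena = bytearray(ARENA_SIZE)
    checked_write(arena, BLOCK_B, b"BBBB")
    before = read_block(arena, BLOCK_B)
    unchecked_write(arena, BLOCK_A, b"AAAAXX")  # 6 bytes into a 4-byte block
    after = read_block(arena, BLOCK_B)
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("block b before:", before)
    print("block b after: ", after)
```

In this sketch the program never writes to block b after initializing it, yet its contents change. In a real C, C++, or Fortran program the same pattern would show up as a variable whose value is "impossibly" wrong.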

TotalView Heap Interposition Agent (HIA) Technology

• Advantages of TotalView HIA technology:
• Use it with your existing builds
• No source code or binary instrumentation
• Programs run at nearly full speed
• Low performance overhead
• Low memory overhead
• Efficient memory usage

[Diagram: Process (User Code and Libraries, Malloc API, Heap Interposition Agent (HIA) with Allocation Table and Deallocation Table) and TotalView]
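The diagram's allocation and deallocation tables can be pictured with a small Python model. This is a sketch under stated assumptions and not how the HIA is implemented. The class `ToyHeapTracker`, its method names, and the fake addresses are all hypothetical. It only shows how two tables are enough to report live blocks (leak candidates) and to decide whether a pointer value refers to freed memory.

```python
# Toy model of heap bookkeeping. Class and method names are hypothetical;
# this is not TotalView's HIA, only an illustration of the idea.
import itertools


class ToyHeapTracker:
    def __init__(self):
        # Fake, never-reused addresses spaced 0x100 apart (sizes must stay below that).
        self._next_addr = itertools.count(0x1000, 0x100)
        self.allocations = {}    # addr -> (size, tag) for live blocks
        self.deallocations = {}  # addr -> (size, tag) for freed blocks

    def malloc(self, size, tag):
        addr = next(self._next_addr)
        self.allocations[addr] = (size, tag)
        return addr

    def free(self, addr):
        if addr not in self.allocations:
            raise ValueError(f"free of unknown or already freed block {addr:#x}")
        self.deallocations[addr] = self.allocations.pop(addr)

    def live_blocks(self):
        """Blocks still allocated; at program exit these are leak candidates."""
        return sorted(tag for _size, tag in self.allocations.values())

    def is_dangling(self, pointer):
        """True if pointer falls inside a block that has been freed."""
        return any(addr <= pointer < addr + size
                   for addr, (size, _tag) in self.deallocations.items())


def demo():
    heap = ToyHeapTracker()
    p = heap.malloc(64, "buffer")
    q = heap.malloc(32, "scratch")
    heap.free(q)  # q now dangles if the program keeps using it
    return heap.live_blocks(), heap.is_dangling(q), heap.is_dangling(p)


if __name__ == "__main__":
    print(demo())
```

Because the bookkeeping lives beside the allocator calls rather than in the application, the application code itself does not change, which is the point of the "no source code or binary instrumentation" bullet.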

Memory Debugging Features – MemoryScape TotalView

TotalView New UI Features

• Leak detection
• Heap status
• Dangling pointer detection

Coming Features

• Memory error events
• Memory corruption detection
• Memory block painting
• Memory hoarding
• Memory comparisons between processes

TotalView Reverse Debugging

Reverse Debugging with TotalView

• Reverse debugging lets developers go back in execution history
• Activate it either before the program starts running or at some point after execution begins
• Captures and deterministically replays execution
• Enables stepping backward and forward by function, line, or instruction
• Run backward to breakpoints
• Run backward and stop when a variable changes value
• Save recording files for later analysis or collaboration
• For a remote connection, use the CLI command: dhistory -save <name>
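The idea of "run backward and stop when a variable changes value" can be sketched with a toy record-and-replay model in Python. This is an illustration only. The function names `record` and `run_back_until_changed` are hypothetical, and a real reverse debugger records execution at a much lower level than dictionaries of variables. The sketch records a snapshot after every step, then walks the history backward to find the step that last modified a chosen variable.

```python
# Toy record-and-replay model. All names are illustrative assumptions; a real
# reverse debugger records at a much lower level than Python dictionaries.

def record(program_steps, initial_state):
    """Run each step once, keeping a snapshot of the state after every step."""
    history = [dict(initial_state)]
    state = dict(initial_state)
    for step in program_steps:
        step(state)
        history.append(dict(state))
    return history


def run_back_until_changed(history, position, name):
    """From position, walk backward to the step that last changed `name`.

    Returns the index i such that history[i][name] differs from
    history[i - 1][name], or None if the value never changed.
    """
    for i in range(position, 0, -1):
        if history[i].get(name) != history[i - 1].get(name):
            return i
    return None


def demo():
    steps = [
        lambda s: s.update(total=s["total"] + 1),   # step 1
        lambda s: s.update(count=s["count"] + 1),   # step 2
        lambda s: s.update(total=-1),               # step 3: the suspicious write
        lambda s: s.update(count=s["count"] + 1),   # step 4
    ]
    history = record(steps, {"total": 0, "count": 0})
    last = len(history) - 1
    culprit = run_back_until_changed(history, last, "total")
    return culprit, history[culprit - 1]["total"], history[culprit]["total"]


if __name__ == "__main__":
    print(demo())
```

Starting from the end of the recording, the search skips step 4 (which only touched `count`) and stops at step 3, where `total` went from 1 to -1. That is the workflow the bullet describes: start where the bad value is observed and go back to the write that produced it.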

Reverse Debugging Controls

• Run forward / Run backward
• Next forward over functions / Next backward over functions
• Step forward into functions / Step backward into functions
• Advance forward out of function call / Advance backward to calling function
• Advance forward to selected line / Advance backward to selected line
• Advance to "live" session
• Create a bookmark at this point in recorded history
• Save the recorded session

Debugging CUDA Applications with TotalView

TotalView for the NVIDIA® GPU Accelerator

• NVIDIA Tesla, Fermi, Kepler, Pascal, Volta, Turing, Ampere
• NVIDIA CUDA 9.2, 10, and 11
  • With support for Unified Memory
• Debugging 64-bit CUDA programs
• Features and capabilities include:
  • Support for dynamic parallelism
  • Support for MPI-based clusters and multi-card configurations
  • Flexible display and navigation on the CUDA device
    • Physical (device, SM, warp, lane)
    • Logical (grid, block) tuples
  • CUDA device window reveals what is running where
  • Support for types and separate memory address spaces
  • Leverages CUDA memcheck

Source View Opened on CUDA Host Code

Breakpoint Set in CUDA Kernel Code Before Launch

A hollow breakpoint indicates that the breakpoint will be set when the code is loaded onto the GPU.

GPU Physical and Logical Toolbars

The logical toolbar displays the block and thread coordinates.

The physical toolbar displays the device number, streaming multiprocessor, warp, and lane.

To view a CUDA host thread, select a thread with a positive thread ID in the Process and Threads view.

To view a CUDA GPU thread, select a thread with a negative thread ID, then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread.
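The two coordinate systems are related, although only partly computable by hand. As a rough sketch, the CUDA programming model flattens a thread's (x, y, z) index within its block with x varying fastest, and consecutive groups of 32 threads form warps. The Python functions below are hypothetical helpers, not a TotalView or CUDA API. They compute the warp index within the block and, as it works out in practice, the lane. The device, SM, and hardware warp slot shown on the physical toolbar are assigned when the kernel is scheduled, so they cannot be derived this way.

```python
# Sketch of the logical-to-warp mapping described in the CUDA programming
# model. Function names are hypothetical; WARP_SIZE = 32 holds for current
# NVIDIA GPUs but is queried at run time in real CUDA code.

WARP_SIZE = 32


def linear_thread_index(thread_idx, block_dim):
    """Flatten (x, y, z) thread coordinates within a block, x varying fastest."""
    x, y, z = thread_idx
    dim_x, dim_y, dim_z = block_dim
    if not (0 <= x < dim_x and 0 <= y < dim_y and 0 <= z < dim_z):
        raise ValueError("thread index outside block dimensions")
    return x + y * dim_x + z * dim_x * dim_y


def warp_and_lane(thread_idx, block_dim):
    """Return (warp within the block, lane within the warp) for one thread."""
    linear = linear_thread_index(thread_idx, block_dim)
    return linear // WARP_SIZE, linear % WARP_SIZE


if __name__ == "__main__":
    # Thread (5, 3, 0) in a 16 x 16 x 1 block has linear index 5 + 3 * 16 = 53.
    print(warp_and_lane((5, 3, 0), (16, 16, 1)))
```

For example, thread (5, 3, 0) in a 16 x 16 x 1 block has linear index 53, so it is lane 21 of the block's second warp (warp index 1). Knowing this helps when the logical toolbar shows one thread and you want to see which other threads execute in lockstep with it.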

Displaying CUDA Program Elements

• The identifier @local is a TotalView built-in type storage qualifier that tells the debugger that the storage kind of A is local storage
• The debugger uses the storage qualifier to determine how to locate A in device memory

The @local type qualifier indicates that variable A is in local storage.

"elements" is a pointer to a float in generic storage.

Using TotalView for Parallel Debugging on ANL

TotalView Remote Debugging on Linux and macOS

• Download and install TotalView on your Linux or Mac machine
• Connect to the remote front-end node
• Run the labs remotely

Hands-on Labs

• Install TotalView (TV) from the installers on Mac or Linux
• Ignore the license code
• Start TotalView
• Remotely connect to Cooley and enable Reverse Connection

Labs

• Lab 1: Debugger Basics
• Lab 2: Viewing, Examining, Watching, and Editing Data
• Lab 3: Examining and Controlling a Parallel Application (on Cooley)
  • Using remote connect (tvconnect)
  • qsub -q training tvconnect.job
  • Modify and submit tvconnect.job on your machine

TotalView is Available on Theta and Cooley

• Connect to Cooley/Theta
• Get an allocation first
  • qsub -A ATPESC2021 -n 4 -q debug-flat-quad -I (Theta)
  • qsub -A ATPESC2021 -n 4 -q training -I (Cooley)
• module load totalview (Theta)
• soft add +totalview (Cooley)
• totalview -args mpiexec -np <N> demoMpi_v2
• tvconnect mpiexec -np <N> demoMpi_v2

• Installed at /soft/debuggers/totalview-2021-08-04/toolworks/totalview.2021X.3.756/bin/totalview

TotalView Resources and Documentation

• TotalView website: https://totalview.io
• TotalView documentation: https://help.totalview.io
• TotalView video tutorials: https://totalview.io/support/video-tutorials
• Other resources
  • Blog: https://totalview.io/blog

Summary

• Using a modern debugger saves you time
• TotalView can help you because:
  • It's cross-platform (the only debugger you'll ever need)
  • It allows you to debug accelerators (GPU) and CPU in one session
  • It allows you to debug multiple languages (C++/Python/Fortran)

Page 47: Techniques for Debugging HPC Applications

roguewavecom52 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

TotalView New UI Features

bull Leak detection

bull Heap Status

bull Dangling pointer detection

Coming Features

bull Memory Error Events

bull Memory Corruption Detection

bull Memory Block Painting

bull Memory Hoarding

bull Memory Comparisons between processes

Memory Debugging Features ndash MemoryScape TotalView

TotalView Reverse Debugging

roguewavecom54 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull Reverse debugging provides the ability for developers to go back in execution history

bull Activated either before program starts running or at some point after execution begins

bull Capturing and deterministically replay execution

bull Enables stepping backwards and forward by function line or instruction

bull Run backwards to breakpoints

bull Run backwards and stop when a variable changes value

bull Saving recording files for later analysis or collaboration

bull For remote connection use CLI dhistory ndashsave ltnamegt

Reverse Debugging with TotalView

roguewavecom55 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Reverse Debugging Controls

Run forward-

Run backwards

Next forward over functions

-Next backwards over functions

Step forward into functions

-Step backwards into

functions

Advance forward out of function call

-Advance backwards to

calling function

Advance forward to selected line

-Advance backward to

selected line

Advance to ldquoliverdquo session

Create a bookmark at this point in recorded history

Save the recorded session

Debugging CUDA Applications with TotalView

roguewavecom59 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull NVIDIA Tesla Fermi Kepler Pascal Volta Turing Ampere

bull NVIDIA CUDA 92 10 and 11

bull With support for Unified Memory

bull Debugging 64-bit CUDA programs

bull Features and capabilities include

bull Support for dynamic parallelism

bull Support for MPI based clusters and multi-card configurations

bull Flexible Display and Navigation on the CUDA device

bull Physical (device SM Warp Lane)

bull Logical (Grid Block) tuples

bull CUDA device window reveals what is running where

bull Support for types and separate memory address spaces

bull Leverages CUDA memcheck

TotalView for the NVIDIA reg GPU Accelerator

roguewavecom60 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Source View Opened on CUDA host code

roguewavecom61 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Breakpoint Set in CUDA Kernel Code Before Launch

Hollow breakpoint indicates a breakpoint will be set when the code is loaded onto the GPU

roguewavecom62 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

GPU Physical and Logical Toolbars

Logical toolbar displays the Block and Thread coordinates

Physical toolbar displays the Device number Streaming Multiprocessor Warp and Lane

To view a CUDA host thread select a thread with a positive thread ID in the Process and Threads view

To view a CUDA GPU thread select a thread with a negative thread ID then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread

roguewavecom63 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull The identifier local is a TotalView built-in type storage qualifier that tells the debugger the storage kind of A is

local storage

bull The debugger uses the storage qualifier to determine how to locate A in device memory

Displaying CUDA Program Elementslocal type qualifier indicates that variable A is in local storage

ldquoelementsrdquo is a pointer to a float in generic storage

Using TotalView for Parallel Debugging on ANL

totalviewio65 | TotalView by Perforce copy Perforce Software Inc

TotalView remote debugging on Linux and Mac OS

bull Download and install TotalView on your linux or mac

bull Connect to remote front node

bull Run labs remotely

totalviewio66 | TotalView by Perforce copy Perforce Software Inc

Hands-on labs

bull Install TV from installers on Mac or Linux

bull Ignore license code

bull Star TotalView

bull Remotely connect to cooley and enable Reverse Connection

Labs

bull Lab 1 Debugger Basic

bull Lab 2 Viewing Examining Watching and Editing Data

bull Lab 3 Examining and Controlling a Parallel Application (on Cooley)

bull Using remote connect (tvconnect)

bull qsub ndashq training tvconnectjob

bull Modify and submit tvconnectjob on your machine

totalviewio67 | TotalView by Perforce copy Perforce Software Inc

TotalView is available on Theta Cooley

bull Connect to CooleyTheta

bull Get allocation first

bull qsub -A ATPESC2021 ndashn 4 ndashq debug-flat-quad ndashI (theta)

bull qsub -A ATPESC2021 ndashn 4 ndashq training ndashI (Cooley)

bull module load totalview (theta)

bull soft add +totalview (cooley)

bull totalview -args mpiexec ndashnp ltNgt demoMpi_v2

bull tvconnect mpiexec ndashnp ltNgt demoMpi_v2

bull Installed at softdebuggerstotalview-2021-08-04toolworkstotalview2021X3756bintotalview

roguewavecom68 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

bull TotalView website

bull httpstotalviewio

bull TotalView documentation

bull httpshelptotalviewio

bull TotalView Video Tutorials

bull httpstotalviewiosupportvideo-tutorials

bull Other Resources

bull Blog httpstotalviewioblog

TotalView Resources and Documentation

totalviewio69 | TotalView by Perforce copy Perforce Software Inc

Summary

• Using a modern debugger saves you time

• TotalView can help you because

• It's cross-platform (the only debugger you ever need)

• It allows you to debug accelerators (GPU) and CPU in one session

• It allows you to debug multiple languages (C++, Python, Fortran)

Page 48: Techniques for Debugging HPC Applications

TotalView Reverse Debugging

Reverse Debugging with TotalView

• Reverse debugging lets developers go back in execution history

• It can be activated either before the program starts running or at some point after execution begins

• Captures and deterministically replays execution

• Enables stepping backward and forward by function, line, or instruction

• Run backward to breakpoints

• Run backward and stop when a variable changes value

• Save recording files for later analysis or collaboration

• For a remote connection, use the CLI command dhistory -save <name>

Reverse Debugging Controls

• Run forward / Run backward

• Next forward over functions / Next backward over functions

• Step forward into functions / Step backward into functions

• Advance forward out of function call / Advance backward to calling function

• Advance forward to selected line / Advance backward to selected line

• Advance to "live" session

• Create a bookmark at this point in recorded history

• Save the recorded session

Debugging CUDA Applications with TotalView

TotalView for the NVIDIA® GPU Accelerator

• NVIDIA Tesla, Fermi, Kepler, Pascal, Volta, Turing, Ampere

• NVIDIA CUDA 9.2, 10, and 11

• With support for Unified Memory

• Debugging 64-bit CUDA programs

• Features and capabilities include

• Support for dynamic parallelism

• Support for MPI-based clusters and multi-card configurations

• Flexible display and navigation on the CUDA device

• Physical (device, SM, warp, lane)

• Logical (grid, block) tuples

• CUDA device window reveals what is running where

• Support for types and separate memory address spaces

• Leverages CUDA memcheck

Source View Opened on CUDA host code

Breakpoint Set in CUDA Kernel Code Before Launch

A hollow breakpoint indicates that the breakpoint will be set when the code is loaded onto the GPU.

GPU Physical and Logical Toolbars

The logical toolbar displays the Block and Thread coordinates.

The physical toolbar displays the Device number, Streaming Multiprocessor, Warp, and Lane.

To view a CUDA host thread, select a thread with a positive thread ID in the Process and Threads view.

To view a CUDA GPU thread, select a thread with a negative thread ID, then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread.

Displaying CUDA Program Elements

• The identifier @local is a TotalView built-in type storage qualifier that tells the debugger that the storage kind of A is local storage

• The debugger uses the storage qualifier to determine how to locate A in device memory

The @local type qualifier indicates that variable A is in local storage

"elements" is a pointer to a float in generic storage

Page 55: Techniques for Debugging HPC Applications

roguewavecom62 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

GPU Physical and Logical Toolbars

Logical toolbar displays the Block and Thread coordinates

Physical toolbar displays the Device number Streaming Multiprocessor Warp and Lane

To view a CUDA host thread select a thread with a positive thread ID in the Process and Threads view

To view a CUDA GPU thread select a thread with a negative thread ID then use the GPU thread selector on the logical toolbar to focus on a specific GPU thread

roguewavecom63 | Rogue Wave by Perforce copy 2019 Perforce Software Inc

Displaying CUDA Program Elements

• The identifier @local is a TotalView built-in type storage qualifier that tells the debugger that the storage kind of A is local storage

• The debugger uses the storage qualifier to determine how to locate A in device memory

The @local type qualifier indicates that variable A is in local storage

"elements" is a pointer to a float in @generic storage

Using TotalView for Parallel Debugging on ANL


TotalView remote debugging on Linux and macOS

• Download and install TotalView on your Linux or Mac machine

• Connect to the remote front-end node

• Run the labs remotely


Hands-on labs

• Install TotalView from the installers on Mac or Linux

• Ignore the license code

• Start TotalView

• Remotely connect to Cooley and enable Reverse Connection

Labs

• Lab 1: Debugger Basics

• Lab 2: Viewing, Examining, Watching, and Editing Data

• Lab 3: Examining and Controlling a Parallel Application (on Cooley)

• Using remote connect (tvconnect)

• qsub -q training tvconnect.job

• Modify and submit tvconnect.job on your machine
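
The remote-connect step in Lab 3 could look roughly like the sketch below, which writes a small Cobalt batch file that starts the MPI demo under tvconnect. This is only an illustration, not the lab's actual job file: the file name tvconnect.job, the 30-minute walltime, the rank count, and the assumption that tvconnect and demoMpi_v2 are reachable from the batch environment are all hypothetical choices made here.

```shell
#!/bin/bash
# Sketch: generate a Cobalt batch script that runs the MPI demo under tvconnect.
# Assumptions (not from the slides): 30-minute walltime, 4 MPI ranks,
# demoMpi_v2 in the submission directory, tvconnect on PATH inside the job.
set -eu

NP=4                 # number of MPI ranks (the <N> placeholder in the slides)
JOB=tvconnect.job    # hypothetical job file name

# Unquoted EOF so that $NP is expanded while the file is written.
cat > "$JOB" <<EOF
#!/bin/bash
#COBALT -A ATPESC2021 -n 4 -t 30 -q training
# tvconnect waits until a TotalView UI with Reverse Connection enabled accepts the request.
tvconnect mpiexec -np $NP ./demoMpi_v2
EOF

# Cobalt expects the submitted script to be executable.
chmod +x "$JOB"

echo "Wrote $JOB; submit it with: qsub -q training $JOB"
```

Once the job starts, a TotalView session that is connected to Cooley with Reverse Connection enabled should be offered the pending tvconnect request.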








