OAK RIDGE NATIONAL LABORATORY
U. S. DEPARTMENT OF ENERGY

On-line Automated Performance Diagnosis on Thousands of Processors

Philip C. Roth

Future Technologies Group
Computer Science and Mathematics Division
Oak Ridge National Laboratory

Paradyn Research Group
Computer Sciences Department
University of Wisconsin-Madison



High Performance Computing Today

• Large parallel computing resources
  • Tightly coupled systems (Earth Simulator, BlueGene/L, XT3)
  • Clusters (LANL Lightning, LLNL Thunder)
  • Grid
• Large, complex applications
  • ASCI Blue Mountain job sizes (2001)
    • 512 CPUs: 17.8%
    • 1024 CPUs: 34.9%
    • 2048 CPUs: 19.9%
• Small fraction of peak performance is the rule


Achieving Good Performance

• Need to know what and where to tune
• Diagnosis and tuning tools are critical for realizing potential of large-scale systems
• On-line automated tools are especially desirable
  • Manual tuning is difficult
    • Finding interesting data in large data volume
    • Understanding application, OS, hardware interactions
  • Automated tools require minimal user involvement; expertise is built into the tool
  • On-line automated tools can adapt dynamically
    • Dynamic control over data volume
    • Useful results from a single run
• But: tools that work well in small-scale environments often don’t scale


Barriers to Large-Scale Performance Diagnosis

[Figure: Tool Front End; Tool Daemons d0, d1, d2, d3, …, dP-4, dP-3, dP-2, dP-1; App Processes a0, a1, a2, a3, …, aP-4, aP-3, aP-2, aP-1]

• Managing performance data volume
• Communicating efficiently between distributed tool components
• Making scalable presentation of data and analysis results


Our Approach for Addressing These Scalability Barriers

MRNet: multicast/reduction infrastructure for scalable tools

Distributed Performance Consultant: strategy for efficiently finding performance bottlenecks in large-scale applications

Sub-Graph Folding Algorithm: algorithm for effectively presenting bottleneck diagnosis results for large-scale applications


Outline

• Performance Consultant
• MRNet
• Distributed Performance Consultant
• Sub-Graph Folding Algorithm
• Evaluation
• Summary


Performance Consultant

• Automated performance diagnosis
• Search for application performance problems
  • Start with global, general experiments (e.g., test CPUbound across all processes)
  • Collect performance data using dynamic instrumentation
    • Collect only the data desired
    • Remove the instrumentation when no longer needed
  • Make decisions about truth of each experiment
  • Refine search: create more specific experiments based on “true” experiments (those whose data is above user-configurable threshold)
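
The search loop described on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Paradyn's implementation: the `Experiment` class, the `run_search` function, the breadth-first order, and the CPU-time fractions are all hypothetical, and the `measure` callback stands in for inserting dynamic instrumentation, collecting data, and removing the instrumentation again.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """One search node: a hypothesis (e.g. CPUbound) tested at one focus."""
    hypothesis: str
    focus: str
    refinements: list = field(default_factory=list)


def run_search(root, measure, threshold):
    """Test experiments breadth-first and refine only the true ones."""
    true_experiments = []
    pending = deque([root])
    while pending:
        experiment = pending.popleft()
        value = measure(experiment)
        if value > threshold:
            # "True" experiment: record it and queue its refinements.
            true_experiments.append((experiment.focus, value))
            pending.extend(experiment.refinements)
    return true_experiments


# Hypothetical CPU-time fractions per focus, invented for this example.
fractions = {"all processes": 0.90, "main": 0.85,
             "Do_row": 0.05, "Do_col": 0.10, "Do_mult": 0.70}

search_root = Experiment("CPUbound", "all processes", [
    Experiment("CPUbound", "main", [
        Experiment("CPUbound", "Do_row"),
        Experiment("CPUbound", "Do_col"),
        Experiment("CPUbound", "Do_mult"),
    ]),
])

found = run_search(search_root, lambda e: fractions[e.focus], threshold=0.2)
print([focus for focus, _ in found])
```

With these made-up numbers, only `Do_mult` survives at the function level, so experiments below `Do_row` and `Do_col` would never be created or instrumented.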


Performance Consultant

[Figure: hosts c001.cs.wisc.edu, c002.cs.wisc.edu, c128.cs.wisc.edu; processes myapp367, myapp4287, myapp27549]


Performance Consultant

[Figure: search graph starting from CPUbound. Labels: hosts c001.cs.wisc.edu, c002.cs.wisc.edu, …, c128.cs.wisc.edu; processes myapp{367}, myapp{4287}, …, myapp{27549}; functions main, Do_row, Do_col, Do_mult]


Performance Consultant

[Figure: search graph starting from CPUbound. Labels: hosts cham.cs.wisc.edu, c001.cs.wisc.edu, c002.cs.wisc.edu, …, c128.cs.wisc.edu; processes myapp{367}, myapp{4287}, …, myapp{27549}; functions main, Do_row, Do_col, Do_mult]


MRNet: Multicast/Reduction Overlay Network

• Parallel tool infrastructure providing:
  • Scalable multicast
  • Scalable data synchronization and transformation
• Network of processes between tool front-end and back-ends
• Useful for parallelizing and distributing tool activities
  • Reduce latency
  • Reduce computation and communication load at tool front-end
• Joint work with Dorian Arnold (University of Wisconsin-Madison)
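
To make the load-reduction claim concrete, here is a toy, single-process model of a reduction tree. It does not use MRNet's API; `build_tree`, `reduce_tree`, `max_inputs`, the fan-out of 8, and the sample values are all assumptions chosen for illustration. The point it shows is that placing a filter at each internal process bounds the number of inputs any one process has to combine.

```python
def build_tree(leaves, fanout):
    """Group nodes level by level until one root remains (assumes fanout >= 2)."""
    level = [("leaf", value) for value in leaves]
    while len(level) > 1:
        level = [("node", level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]


def reduce_tree(node, combine):
    """Apply the filter `combine` at every internal node, bottom-up."""
    kind, payload = node
    if kind == "leaf":
        return payload
    return combine(reduce_tree(child, combine) for child in payload)


def max_inputs(node):
    """Largest number of inputs that any single node has to combine."""
    kind, payload = node
    if kind == "leaf":
        return 0
    return max(len(payload), *(max_inputs(child) for child in payload))


# Hypothetical per-daemon values, e.g. a counter reported by 512 daemons.
values = [i % 7 for i in range(512)]

flat = build_tree(values, fanout=512)  # front end talks to every daemon
tree = build_tree(values, fanout=8)    # internal processes with fan-out 8

print(reduce_tree(flat, sum), max_inputs(flat))
print(reduce_tree(tree, sum), max_inputs(tree))
```

Both organizations compute the same sum, but in the flat case the root combines 512 inputs, while in the tree no node combines more than 8.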


Typical Parallel Tool Organization

[Figure: Tool Front End; Tool Daemons d0, d1, d2, d3, …, dP-4, dP-3, dP-2, dP-1; App Processes a0, a1, a2, a3, …, aP-4, aP-3, aP-2, aP-1]


MRNet-based Parallel Tool Organization

[Figure: Tool Front End; Multicast/Reduction Network (legend: Internal Process, Filter); Tool Daemons d0, d1, d2, d3, …, dP-4, dP-3, dP-2, dP-1; App Processes a0, a1, a2, a3, …, aP-4, aP-3, aP-2, aP-1]


Performance Consultant: Scalability Barriers

MRNet can alleviate scalability problem for global performance data (e.g., CPU utilization across all processes)

But front-end still processes local performance data (e.g., utilization of process 5247 on host mcr398.llnl.gov)


Performance Consultant

[Figure: search graph starting from CPUbound. Labels: hosts cham.cs.wisc.edu, c001.cs.wisc.edu, c002.cs.wisc.edu, …, c128.cs.wisc.edu; processes myapp{367}, myapp{4287}, …, myapp{27549}; functions main, Do_row, Do_col, Do_mult]


Distributed Performance Consultant

[Figure: search graph starting from CPUbound. Labels: hosts cham.cs.wisc.edu, c001.cs.wisc.edu, c002.cs.wisc.edu, …, c128.cs.wisc.edu; processes myapp{367}, myapp{4287}, …, myapp{27549}; functions main, Do_row, Do_col, Do_mult]


Distributed Performance Consultant: Variants

• Natural steps from traditional centralized approach (CA)
• Partially Distributed Approach (PDA)
  • Distributed local searches, centralized global search
  • Requires complex instrumentation management
• Truly Distributed Approach (TDA)
  • Distributed local searches only
  • Insight into global behavior from combining local search results (e.g., using Sub-Graph Folding Algorithm)
  • Simpler tool design than PDA
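
One way to read the three variants is by asking how many search experiments the front end itself has to evaluate. The sketch below is a back-of-the-envelope model only: the function name, the parameters, and the example counts are hypothetical, and it assumes that every process contributes the same number of local experiments.

```python
def front_end_experiments(approach, processes, local_per_process, global_count):
    """Experiments evaluated at the front end under each approach (toy model)."""
    if approach == "CA":
        # Centralized: the front end runs the global and every local search.
        return global_count + processes * local_per_process
    if approach == "PDA":
        # Local searches move to the daemons; the global search stays central.
        return global_count
    if approach == "TDA":
        # Local searches only, all of them run by the daemons.
        return 0
    raise ValueError(f"unknown approach: {approach}")


# Hypothetical workload: 1024 processes, 20 local and 20 global experiments.
for approach in ("CA", "PDA", "TDA"):
    print(approach, front_end_experiments(approach, 1024, 20, 20))
```

Under TDA the front end still receives results, but in this model it only combines them rather than evaluating experiments itself.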


Distributed Performance Consultant: PDA

[Figure: search graph starting from CPUbound. Labels: hosts cham.cs.wisc.edu, c001.cs.wisc.edu, c002.cs.wisc.edu, …, c128.cs.wisc.edu; processes myapp{367}, myapp{4287}, …, myapp{27549}; functions main, Do_row, Do_col, Do_mult]


Distributed Performance Consultant: TDA

[Figure: per-host search graphs. Labels: hosts cham.cs.wisc.edu, c001.cs.wisc.edu, c002.cs.wisc.edu, …, c128.cs.wisc.edu; processes myapp{367}, myapp{4287}, …, myapp{27549}; functions main, Do_row, Do_col, Do_mult]


Distributed Performance Consultant: TDA

[Figure: per-host search graphs combined by the Sub-Graph Folding Algorithm. Labels: hosts cham.cs.wisc.edu, c001.cs.wisc.edu, c002.cs.wisc.edu, …, c128.cs.wisc.edu; processes myapp{367}, myapp{4287}, …, myapp{27549}; functions main, Do_row, Do_col, Do_mult]


Search History Graph Example

[Figure: search history graph starting from CPUbound. Labels: hosts c33.cs.wisc.edu, c34.cs.wisc.edu; processes myapp{1272}, myapp{1273}, myapp{7624}, myapp{7625}; function sub-graphs over main, A, B, C, D, one of which also includes E]


Search History Graphs

Search History Graph is effective for presenting search-based performance diagnosis results…

…but it does not scale to a large number of processes because it shows one sub-graph per process


Sub-Graph Folding Algorithm

Combines host-specific sub-graphs into composite sub-graphs

Each composite sub-graph represents a behavioral category among application processes

Dynamic clustering of processes by qualitative behavior
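
The folding idea can be illustrated with a deliberately simplified sketch. It assumes that two processes belong to the same behavioral category exactly when their search sub-graphs are identical; the slides do not spell out the similarity criterion, so that rule, the `fold` function, the edge sets, and the assignment of processes to hosts below are all hypothetical.

```python
from collections import defaultdict


def fold(subgraphs):
    """Group processes whose search sub-graphs are identical.

    `subgraphs` maps (host, process) to a set of (parent, child) edges.
    Returns a dict from each distinct edge set to its sorted member list.
    """
    categories = defaultdict(list)
    for member, edges in subgraphs.items():
        categories[frozenset(edges)].append(member)
    return {shape: sorted(members) for shape, members in categories.items()}


# Invented edges; the real figure's edges are not recoverable here.
common = {("main", "A"), ("main", "B"), ("A", "C"), ("C", "D")}
subgraphs = {
    ("c33.cs.wisc.edu", "myapp{1272}"): common,
    ("c33.cs.wisc.edu", "myapp{1273}"): common | {("C", "E")},
    ("c34.cs.wisc.edu", "myapp{7624}"): common,
    ("c34.cs.wisc.edu", "myapp{7625}"): common,
}

folded = fold(subgraphs)
for shape, members in folded.items():
    print(len(shape), "edges:", members)
```

Four per-process sub-graphs collapse into two composite sub-graphs, so the display grows with the number of distinct behaviors rather than with the number of processes.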


SGFA: Example

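[Figure: search history graph starting from CPUbound, shown with its folded form. Labels: hosts c33.cs.wisc.edu, c34.cs.wisc.edu; processes myapp{1272}, myapp{1273}, myapp{7624}, myapp{7625}; function sub-graphs over main, A, B, C, D, one of which also includes E; composite labels c*.cs.wisc.edu and myapp{*}]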


SGFA: Implementation

• Custom MRNet filter
• Filter in each MRNet process keeps folded graph of search results from all reachable daemons
• Updates periodically sent upstream
• By induction, filter in front-end holds entire folded graph
• Optimization for unchanged graphs
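
The bullets above can be sketched as a small stateful filter. This is an illustration only and does not use MRNet's filter interface: the `FoldingFilter` class, its method names, and the string-encoded sub-graph shapes are hypothetical, and reading the "unchanged graphs" optimization as "send nothing when the folded graph has not changed" is an assumption.

```python
class FoldingFilter:
    """Keeps a folded graph of results from everything below this node."""

    def __init__(self):
        self.folded = {}        # sub-graph shape -> set of member processes
        self._last_sent = None  # snapshot most recently sent upstream

    def receive(self, update):
        """Merge an update from a child (a daemon or another filter)."""
        for shape, members in update.items():
            self.folded.setdefault(shape, set()).update(members)

    def flush(self):
        """Return the folded graph to send upstream, or None if unchanged."""
        snapshot = {shape: frozenset(m) for shape, m in self.folded.items()}
        if snapshot == self._last_sent:
            return None
        self._last_sent = snapshot
        return snapshot


# Two internal filters feed the filter at the front end.
left, right, front_end = FoldingFilter(), FoldingFilter(), FoldingFilter()
left.receive({"main/A/C/D": {"p0"}})
left.receive({"main/A/C/D": {"p1"}})
right.receive({"main/A/C/D/E": {"p2"}})

for child in (left, right):
    front_end.receive(child.flush())

print(front_end.folded)
print(left.flush())  # nothing new since the last flush
```

Because each filter holds the folded graph for its own subtree, the front-end filter ends up with the folded graph for all daemons without ever seeing one message per process.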


DPC + SGFA: Evaluation

• Modified Paradyn to perform bottleneck searches using CA, PDA, or TDA approach
• Modified instrumentation cost tracking to support PDA
  • Track global, per-process instrumentation cost separately
  • Simple fixed-partition policy for scheduling global and local instrumentation
• Implemented Sub-Graph Folding Algorithm as custom MRNet filter to support TDA (used by all)
• Instrumented front-end, daemons, and MRNet internal processes to collect CPU, I/O load information


DPC + SGFA: Evaluation

• su3_rmd
  • QCD pure lattice gauge theory code
  • C, MPI
• Weak scaling scalability study
• LLNL MCR cluster
  • 1152 nodes (1048 compute nodes)
  • Two 2.4 GHz Intel Xeons per node
  • 4 GB memory per node
  • Quadrics Elan3 interconnect (fat tree)
  • Lustre parallel file system


DPC + SGFA: Evaluation

PDA and TDA: bottleneck searches with up to 1024 processes so far, limited by partition size

CA: scalability limit at fewer than 64 processes

Similar qualitative results from all approaches


DPC: Evaluation


SGFA: Evaluation


Summary

• Tool scalability is critical for effective use of large-scale computing resources
• On-line automated performance tools are especially important at large scale
• Our approach:
  • MRNet
  • Distributed Performance Consultant (TDA) plus Sub-Graph Folding Algorithm


References

P.C. Roth, D.C. Arnold, and B.P. Miller, “MRNet: a Software-Based Multicast/Reduction Network for Scalable Tools,” SC 2003, Phoenix, Arizona, November 2003

P.C. Roth and B.P. Miller, “The Distributed Performance Consultant and the Sub-Graph Folding Algorithm: On-line Automated Performance Diagnosis on Thousands of Processes,” in submission

Publications available from http://www.paradyn.org

MRNet software available from http://www.paradyn.org/mrnet