Visualizing Distributed Memory Computations with Hive Plots
Sophie Engle and Sean Whalen
VizSec 2012, October 15, 2012, Seattle, Washington

  • Introduction

    •  High performance computing environments
       – Used for scientific computing applications at several national laboratories
       – Potential for misuse by insiders and outsiders

    •  Anomaly detection
       – Determine normal versus abnormal behavior for these environments to prevent unauthorized use
       – Can classify codes into “computational dwarves” to determine “normal” (Asanovic 2006)

  • Introduction

    •  Several network measures can be used as features in classification
       – Time consuming to calculate these measures
       – Time consuming to compare how well these measures perform for classification

    •  Use visualization to choose network measures to use as classification features
       – Which measures look similar for similar codes?
       – Which measures look distinct for distinct codes?

  • Dataset

  • Original Dataset

    •  Data collection
       – Collected by NERSC at LBNL
       – Used IPM to monitor MPI calls between ranks (captures communication between compute nodes)

    •  Dataset contents
       – Total of 1681 IPM logs
       – Covers 29 different scientific computing codes with varying ranks, parameters, and architectures

  • Original Dataset

    Src,Dst,MPICall,Bytes,Repeats,Code
    0,1,29,99856,52,cactus
    0,4,29,99856,52,cactus
    0,0,2,4,5,cactus
    0,0,2,8,7,cactus
    0,1,22,599136,26,cactus
    0,-1,5,0,1,cactus
    0,4,22,599136,26,cactus
    0,16,29,99856,52,cactus
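
A minimal sketch of how rows like these could be aggregated into a weighted edge list, assuming the columns mean what their names suggest. Two choices here are assumptions not stated on the slide: a `Dst` of -1 is treated as a call without a single destination rank and skipped, and the edge weight is taken as `Bytes` times `Repeats`. The helper name `load_edges` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the slide.
SAMPLE = """Src,Dst,MPICall,Bytes,Repeats,Code
0,1,29,99856,52,cactus
0,4,29,99856,52,cactus
0,0,2,4,5,cactus
0,0,2,8,7,cactus
0,1,22,599136,26,cactus
0,-1,5,0,1,cactus
0,4,22,599136,26,cactus
0,16,29,99856,52,cactus
"""


def load_edges(text):
    """Aggregate log rows into {(src, dst): total_bytes}."""
    edges = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        src, dst = int(row["Src"]), int(row["Dst"])
        if dst < 0:
            # Assumption: a negative destination is not a real rank.
            continue
        # Assumption: Bytes is per call and Repeats counts the calls.
        edges[(src, dst)] += int(row["Bytes"]) * int(row["Repeats"])
    return dict(edges)


edges = load_edges(SAMPLE)
for (src, dst), total in sorted(edges.items()):
    print(src, dst, total)
```

Rows that share a source and destination collapse into one edge, and the `(0, 0)` entry is a self loop.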

  • Subset Analyzed

    Code      Description             Nodes   Edges
    cactus    astrophysics               64     989
    ij        algebraic multi-grid       64    8596
    milc      lattice gauge theory       64    1473
    namd      molecular dynamics         64    8208
    paratec   materials science          64   16492
    superlu   sparse linear algebra      64    3239
    tgyro     magnetic fusion            64    1123
    vasp      materials science          64   13760

  • Network Measures

    Measure        Description
    degree         the number of adjacent edges
    betweenness    number of shortest paths going through a node
    closeness      measures steps required to reach every other node
    eccentricity   shortest path distance from farthest node
    page rank      measures relative importance of node
    transitivity   probability adjacent nodes are connected (clustering coefficient)

    Calculated in R using the igraph library.
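
To make four of these definitions concrete, here is a small sketch in plain Python (not the R/igraph code used for the talk). It assumes an undirected, connected graph without self loops, and it normalizes closeness as (n - 1) divided by the sum of distances. igraph's defaults may normalize differently, so values are not guaranteed to match. Betweenness and page rank are left out to keep the sketch short; the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, start):
    """Shortest-path hop counts from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def node_measures(adj):
    """adj: {node: set(neighbors)}; undirected, connected, no self loops."""
    n = len(adj)
    out = {}
    for node, nbrs in adj.items():
        dist = bfs_distances(adj, node)
        k = len(nbrs)
        # Number of edges that connect two neighbors of this node.
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        out[node] = {
            "degree": k,
            "closeness": (n - 1) / sum(dist.values()),
            "eccentricity": max(dist.values()),
            "transitivity": 2 * links / (k * (k - 1)) if k > 1 else 0.0,
        }
    return out


# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
measures = node_measures(toy)
for node in sorted(measures):
    print(node, measures[node])
```

In the toy graph, node 2 has the highest degree and closeness, while node 0 has the highest transitivity because its two neighbors are connected to each other.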

  • Motivation

  • Traditional Degree CCDF Plot

    [Figure: degree CCDF plot with one line per code: cactus, ij, milc, namd, superlu, tgyro]

  • Traditional Degree CCDF Plot

    •  Pros:
       – Able to compare individual metrics across datasets
       – Simple approach, widely used

    •  Cons:
       – Contains no information on topology
       – Lines look visually similar, may not be appropriate for generating visual signatures
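
A short sketch of how a degree CCDF can be computed, and of why it carries no topology. The CCDF is defined here as P(degree >= k), which is one common convention and an assumption about the plot in the talk. The two degree lists are hypothetical: a single 6-node ring and two separate triangles both give every node degree 2.

```python
from collections import Counter


def degree_ccdf(degrees):
    """Return sorted (k, P(degree >= k)) pairs for the observed degrees."""
    n = len(degrees)
    counts = Counter(degrees)
    remaining = n
    ccdf = []
    for k in sorted(counts):
        ccdf.append((k, remaining / n))
        remaining -= counts[k]
    return ccdf


points = degree_ccdf([1, 2, 2, 3])
print(points)

# Two different topologies with the same degree sequence.
ring_of_six = [2, 2, 2, 2, 2, 2]
two_triangles = [2, 2, 2, 2, 2, 2]
same_curve = degree_ccdf(ring_of_six) == degree_ccdf(two_triangles)
print(same_curve)
```

Because the curve depends only on the multiset of degrees, the ring and the two triangles are indistinguishable in this plot.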

  • Adjacency Matrices

    •  Pros:
       – Comparable across datasets
       – Easy to see communication patterns
       – Many distinct codes look distinct

    •  Cons:
       – No information on metrics needed for classification
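
As an illustration of why communication patterns are easy to read from an adjacency matrix, the sketch below builds one from a list of (source rank, destination rank) pairs and prints it as text. The ring pattern is a hypothetical example, not one of the codes in the dataset, and `adjacency_matrix` is a hypothetical helper name.

```python
def adjacency_matrix(edges, n):
    """Build an n x n 0/1 matrix with a 1 where src sends to dst."""
    matrix = [[0] * n for _ in range(n)]
    for src, dst in edges:
        matrix[src][dst] = 1
    return matrix


# Hypothetical pattern: each of 4 ranks sends to the next rank in a ring.
ring = [(rank, (rank + 1) % 4) for rank in range(4)]
matrix = adjacency_matrix(ring, 4)
for row in matrix:
    print(" ".join("#" if cell else "." for cell in row))
```

The ring appears as a shifted diagonal. The matrix shows who talks to whom, but none of the per-node measures are visible in it.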

  • Issues Identified

    •  Traditional CCDF plot does not convey any information about network topology

    •  Traditional adjacency matrices do not convey any information about network properties

    •  Traditional network layout algorithms are not repeatable or comparable across networks

  • Hive Plots

  • Introduction to Hive Plots

    •  What are hive plots?
       – Network layout algorithm using network properties for consistent node placement
       – A radially-arranged parallel coordinate plot

    •  Why use hive plots?
       – Repeatable, comparable network layouts
       – Integration of network properties with topology

  • Understanding Hive Plots

    [Figure: annotated hive plot of degree for tgyro; axis labels 32, 259, and 837]

    Annotations in the figure:
    –  primary axis
    –  duplicate axis
    –  max axis value
    –  self loop
    –  edges between nodes on different axes
    –  edges between nodes on same axis
    –  degree ranges from 0 to 837 across datasets
    –  node degree between 260 and 837
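
One way to read the figure is that each node is assigned to an axis by its measure value and then placed along that axis. The sketch below shows that idea under stated assumptions: the three axis ranges are inferred from the labels 32, 259, and 837, the axes are spaced 120 degrees apart, and position within an axis is linear. None of this is confirmed as the authors' exact layout, and duplicate axes are not modeled. `AXIS_RANGES` and `hive_position` are hypothetical names.

```python
import math

# Assumed axis ranges, inferred from the figure labels (32, 259, 837).
AXIS_RANGES = [(0, 32), (33, 259), (260, 837)]


def hive_position(value, inner=0.1, outer=1.0):
    """Map a node's measure value to (axis index, x, y)."""
    for axis, (low, high) in enumerate(AXIS_RANGES):
        if low <= value <= high:
            # Linear position between the inner and outer radius.
            frac = (value - low) / (high - low)
            radius = inner + frac * (outer - inner)
            # Assumption: axes are evenly spaced around the circle.
            angle = 2 * math.pi * axis / len(AXIS_RANGES)
            return axis, radius * math.cos(angle), radius * math.sin(angle)
    raise ValueError(f"value {value} is outside every axis range")


for degree in (0, 32, 100, 837):
    print(degree, hive_position(degree))
```

Because placement depends only on the measure value and fixed ranges, the same node value lands in the same place in every plot, which is what makes the layouts comparable across codes.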

  • Implementation

    •  Existing implementations
       – JHive (Java)
       – HiveR (R)
       – HiveGraph (webapp)
       – Prototypes in Perl and d3.js

    •  Custom implementation in R and ggplot2
       – Implements grammar of graphics (Wilkinson)
       – Polar plots to create hive plots
       – Facets to create hive panels*
       – Non-interactive

  • Implementation

    [Figure: page rank hive plots for milc and cactus under three node placement strategies: evenly spaced nodes, linear interpolation, and interpolation and jitter; axis labels 0.008, 0.015, and 0.661]
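
The three placement strategies named in the figure can be sketched as follows. This is an illustrative Python version, not the authors' R/ggplot2 code. The jitter amount, the fixed seed, the clamping to the axis, and the sample values are all assumptions, and the function names are hypothetical.

```python
import random


def evenly_spaced(values):
    """Place nodes by rank order, ignoring the gaps between values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    positions = [0.0] * len(values)
    for rank, index in enumerate(order):
        positions[index] = rank / (len(values) - 1) if len(values) > 1 else 0.0
    return positions


def interpolated(values, low, high):
    """Place nodes proportionally to their value within [low, high]."""
    span = high - low
    return [(v - low) / span if span else 0.0 for v in values]


def interpolated_with_jitter(values, low, high, amount=0.02, seed=0):
    """Interpolate, then add small random offsets so ties do not overlap."""
    rng = random.Random(seed)
    return [
        min(1.0, max(0.0, p + rng.uniform(-amount, amount)))
        for p in interpolated(values, low, high)
    ]


# Hypothetical page rank values on one axis spanning 0.008 to 0.015.
values = [0.008, 0.008, 0.009, 0.015]
even = evenly_spaced(values)
linear = interpolated(values, 0.008, 0.015)
jittered = interpolated_with_jitter(values, 0.008, 0.015)
print(even)
print(linear)
print(jittered)
```

Even spacing separates tied nodes but hides how close their values are, linear interpolation shows the true spacing but stacks ties on top of each other, and jitter is a compromise between the two.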

  • Implementation

    [Figure: closeness (cactus) and degree (superlu) hive plots drawn with consistent alpha and with variable alpha; axis labels 32, 259, and 837 and 0.008, 0.016, and 0.016]

  • Hive Plot References

    Hive Plots—Rational Approach to Visualizing Networks by Martin Krzywinski, Inanc Birol, Steven JM Jones and Marco A Marra in Briefings in Bioinformatics, volume 13, issue 5, pages 627–644, 2012

    Hive Plots: Rational Network Visualization—Farewell to Hairballs by Martin Krzywinski at http://www.hiveplot.com online

    Getting Into Visualization of Large Biological Data Sets by Martin Krzywinski, Inanc Birol, Steven Jones, Marco Marra in BioVis 2012 Posters, 2nd floor foyer, Sunday 8:30am – Monday 5:55pm

  • Initial Results

  • Degree

    [Figure: hive panel of degree for cactus, namd, ij, superlu, milc, and tgyro]

  • Betweenness

    [Figure: hive panel of betweenness for cactus, namd, ij, superlu, milc, and tgyro]

  • Closeness

    [Figure: hive panel of closeness for cactus, namd, ij, superlu, milc, and tgyro]

  • Eccentricity

    [Figure: hive panel of eccentricity for cactus, namd, ij, superlu, milc, and tgyro]

  • Page Rank

    [Figure: hive panel of page rank for cactus, namd, ij, superlu, milc, and tgyro]

  • Transitivity

    [Figure: hive panel of transitivity for cactus, namd, ij, superlu, milc, and tgyro]

  • Visually Distinct

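    Codes compared: cactus, ij, milc, namd, superlu, tgyro

    Measure        Codes marked (x)
    degree         4 of 6
    betweenness    6 of 6
    closeness      4 of 6
    eccentricity   3 of 6
    page rank      4 of 6
    transitivity   6 of 6

    (Which codes are marked in the partial rows is not recoverable from the extracted slide text.)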

  • Next Steps

    •  Improve hive plot visualizations
       – Explore variable-length axes
       – Explore better axes assignment

    •  Incorporate more information from data set
       – Multiple-edge connections
       – Type of IPM calls
       – Amount of data transmitted

  • Next Steps

    •  Feature identification
       – Compare hive plots for more distinct codes
       – Compare hive plots for similar codes
       – Identify features that visually distinguish codes

    •  Classification and anomaly detection
       – Determine if features identified by visualization lead to better classifiers and anomaly detection

  • Conclusion

  • Summary

    •  Motivation and goals
       – Improve anomaly detection in HPC environments
       – Improve classification of HPC codes
       – Use exploratory visualization for feature selection

    •  Initial results
       – Hive plots allow visual comparison of HPC codes
       – Some features distinguish distinct HPC codes

  • References

    Hive Plots—Rational Approach to Visualizing Networks by Martin Krzywinski, Inanc Birol, Steven JM Jones and Marco A Marra in Briefings in Bioinformatics, volume 13, issue 5, pages 627–644, 2012

    Network-Theoretic Classification of Parallel Computation Patterns by Sean Whalen, Sophie Engle, Sean Peisert, and Matt Bishop in International Journal of High Performance Computing Applications (IJHPCA), volume 26, number 2, pages 159–169, May 2012

    Multiclass Classification of Distributed Memory Parallel Computations by Sean Whalen, Sean Peisert, and Matt Bishop to appear in Pattern Recognition Letters (PRL), 2012

  • Contact Information

    Sophie Engle
    University of San Francisco, Department of Computer Science
    http://sjengle.cs.usfca.edu

    Sean Whalen
    Mount Sinai School of Medicine, Institute for Genomics and Multiscale Biology
    http://node99.org

  • Questions?
