
Slycat Ensemble Analysis

Patricia J. Crossno, Timothy M. Shead, Milosz A. Sielicki, Warren L. Hunt, Shawn Martin, and Ming-Yu Hsieh
Sandia National Laboratories

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND 2014-1399P

Patricia J. Crossno: [email protected]
Timothy M. Shead: [email protected]
Milosz A. Sielicki: [email protected]
Warren L. Hunt: [email protected]
Shawn Martin: [email protected]
Ming-Yu Hsieh: [email protected]

Analysis Tasks:
• Find strongest input/output correlations
• Find inputs with least impact on outputs
• Find anomalous simulation runs

CCA Visual Representations

Scatterplot: Each Simulation Relative to Ensemble

A simulation's distance off the diagonal shows how much it differs from the ensemble as a whole and flags potential anomalies.
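The scatterplot places each simulation by its input and output scores on the selected CCA component, so points on the diagonal follow the ensemble-wide relationship. A minimal sketch of turning distance from the diagonal into an anomaly flag, assuming standardized scores; `diagonal_distance`, `flag_off_diagonal`, the threshold, and the sample scores are hypothetical and not part of Slycat:

```python
import math


def diagonal_distance(u: float, v: float) -> float:
    """Perpendicular distance of the point (u, v) from the diagonal line v = u."""
    return abs(v - u) / math.sqrt(2.0)


def flag_off_diagonal(scores, threshold):
    """Indices of simulations lying farther than threshold from the diagonal.

    scores: list of (input_score, output_score) pairs, one per simulation.
    """
    return [i for i, (u, v) in enumerate(scores) if diagonal_distance(u, v) > threshold]


# Hypothetical standardized scores for five simulations.
scores = [(0.1, 0.2), (-1.0, -0.9), (1.5, 1.4), (0.0, 2.5), (-0.5, -0.6)]
print(flag_off_diagonal(scores, threshold=1.0))  # [3]
```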

Purple = Outputs

Bar chart: Ensemble-wide Relationships

Viewing 1st CCA component in both views

Input x1 has the least impact on outputs y1 and y2

250 simulations, each color-coded by its y1 output value

Selected simulation

Positive many-to-many correlation (bar color the same) between x25 & x14 and y2 & y1
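Bar color encodes the sign of each variable's structure correlation within the selected component. A small sketch of the reading rule, assuming the bar values are structure correlations; `relationship` and the sample values are hypothetical:

```python
def relationship(input_corr: float, output_corr: float) -> str:
    """Read an input/output pair from one CCA component's bar chart.

    Same-sign structure correlations (same bar color) mean the input and
    output rise together; opposite signs mean an inverse relationship.
    """
    if input_corr == 0 or output_corr == 0:
        return "none"
    return "positive" if (input_corr > 0) == (output_corr > 0) else "inverse"


# Hypothetical structure correlations.
print(relationship(0.85, 0.90))   # positive
print(relationship(0.90, -0.80))  # inverse
```

The rule works because each component's input and output scores are positively correlated by construction (a canonical correlation is never negative).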

Green = Inputs

Inputs x25 & x14 have the most impact on both outputs y1 and y2

Viewing 2nd CCA component in both bar chart & scatterplot

250 simulations, each color-coded by its x23 input value

Inputs and outputs sorted by correlation strength within the CCA2 component

x23 selected for scatterplot color-coding (dark green row highlight)

Three distinct groups of input values

Inverse correlation (red vs. blue) between x23 & y4; CCA3 captures the relationship between x8 & y3

Scatterplot color-coding changed by clicking on y4 row (darker purple highlight)

Three output value groups map to the three input groups

250 simulations, each color-coded by its y4 output value

Click a CCA column header to select that CCA component in both views

Viewing 3rd CCA component in both bar chart & scatterplot

Inverse correlation between x8 & y3; CCA2 captures the relationship between x23 & y4

250 simulations, each color-coded by its x8 input value

x8 inputs range from low (blue) to high (red)
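A sketch of the low-to-high color ramp, assuming a simple linear blend from pure blue to pure red; Slycat's actual colormap may differ, and `value_to_rgb` and the x8 values are hypothetical:

```python
def value_to_rgb(value: float, vmin: float, vmax: float) -> tuple[int, int, int]:
    """Blend linearly from blue (vmin) to red (vmax); returns (r, g, b) in 0..255."""
    t = 0.5 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp values outside [vmin, vmax]
    return (round(255 * t), 0, round(255 * (1 - t)))


x8_values = [0.0, 2.5, 10.0]  # hypothetical x8 inputs
print([value_to_rgb(v, min(x8_values), max(x8_values)) for v in x8_values])
# [(0, 0, 255), (64, 0, 191), (255, 0, 0)]
```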

x8 selected for scatterplot color-coding (dark green row highlight)

Click the header triangle to sort variables (toggles from decreasing to increasing)
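Sorting by correlation strength can be sketched as ordering variables by the magnitude of their structure correlation in the selected component, with a flag standing in for the header triangle. Sorting by absolute value is an assumption, and `sort_variables` and the sample values are hypothetical:

```python
def sort_variables(corrs: dict[str, float], descending: bool = True) -> list[str]:
    """Order variable names by the magnitude of their structure correlation."""
    return sorted(corrs, key=lambda name: abs(corrs[name]), reverse=descending)


corrs = {"x1": 0.05, "x8": -0.91, "x23": 0.20}  # hypothetical values
descending = True
print(sort_variables(corrs, descending))  # ['x8', 'x23', 'x1']
descending = not descending  # clicking the triangle again flips the order
print(sort_variables(corrs, descending))  # ['x1', 'x23', 'x8']
```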

250 simulations, each color-coded by its y3 output value

Corresponding y3 outputs inversely range from high (red) to low (blue)

Scatterplot color-coding changed by clicking on y3 row (darker purple highlight)

Approach: Canonical Correlation Analysis (CCA)

[Figure: CCA schematic. A table of simulations (s1 … sn) by features is split into inputs (i1 … ik) and outputs (o1 … om); CCA produces components c1 … ck, and for CCA1 each simulation (s1 … sn) is plotted by its input meta-feature against its output meta-feature.]

Structure Correlations
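The approach can be sketched in NumPy: standardize inputs and outputs, whiten each block, and take a singular value decomposition of the whitened cross-covariance. This is a textbook CCA formulation, not necessarily Slycat's implementation; the function names and the synthetic 250-run data are assumptions. Component scores give per-simulation scatterplot coordinates, and structure correlations give bar chart values:

```python
import numpy as np


def _inv_sqrt(c: np.ndarray) -> np.ndarray:
    """Inverse square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(c)
    return v @ np.diag(1.0 / np.sqrt(w)) @ v.T


def cca(x: np.ndarray, y: np.ndarray):
    """Canonical correlation analysis of inputs x (n x k) and outputs y (n x m).

    Returns (r, x_scores, y_scores, x_struct, y_struct):
      r         canonical correlations, one per component, in decreasing order
      x_scores  input meta-feature value of each simulation (n x components)
      y_scores  output meta-feature value of each simulation
      x_struct  correlation of each input with each input component
      y_struct  correlation of each output with each output component
    """
    n = x.shape[0]
    # Standardize so every variable has mean 0 and unit variance.
    xs = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    ys = (y - y.mean(axis=0)) / y.std(axis=0, ddof=1)
    cxx = xs.T @ xs / (n - 1)
    cyy = ys.T @ ys / (n - 1)
    cxy = xs.T @ ys / (n - 1)
    # Whiten each block; singular values of the whitened cross-covariance
    # are the canonical correlations.
    kx, ky = _inv_sqrt(cxx), _inv_sqrt(cyy)
    u, r, vt = np.linalg.svd(kx @ cxy @ ky, full_matrices=False)
    a = kx @ u      # input weights, one column per component
    b = ky @ vt.T   # output weights, one column per component
    x_scores = xs @ a  # unit variance by construction
    y_scores = ys @ b
    # With unit-variance scores, corr(variable, score) = covariance @ weights.
    x_struct = cxx @ a
    y_struct = cyy @ b
    return r, x_scores, y_scores, x_struct, y_struct


# Synthetic ensemble: 250 runs, 3 inputs, 2 outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=(250, 3))
y = np.column_stack([
    x[:, 0] + 0.1 * rng.normal(size=250),   # strong positive link to x0
    -x[:, 1] + 0.5 * rng.normal(size=250),  # weaker inverse link to x1
])
r, x_scores, y_scores, x_struct, y_struct = cca(x, y)
print(np.round(r, 2))
```

Components come out ordered by decreasing canonical correlation, which matches the CCA1, CCA2, CCA3 ordering of the views.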

Slycat Sensitivity Analysis

[Figure: sensitivity analysis overview with labels: Input parameters, Simulation Ensemble, Simple Regression (1-to-1), Multiple Regression (Many-to-1), Model Confidence]

How About Many-to-Many Correlations?
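For reference, the three levels of analysis in standard textbook notation (an addition, not taken from the poster), with $X$ the $n \times k$ matrix of inputs and $Y$ the $n \times m$ matrix of outputs over $n$ simulations:

```latex
\begin{aligned}
\text{Simple regression (1-to-1):}\quad & y = \beta_0 + \beta_1 x_j + \varepsilon \\
\text{Multiple regression (many-to-1):}\quad & y = \beta_0 + \sum_{j=1}^{k} \beta_j x_j + \varepsilon \\
\text{CCA (many-to-many):}\quad & \rho_1 = \max_{\mathbf{a},\,\mathbf{b}} \operatorname{corr}\!\left(X\mathbf{a},\, Y\mathbf{b}\right)
\end{aligned}
```

Each later component maximizes the same correlation subject to being uncorrelated with the earlier ones, giving at most $\min(k, m)$ components.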

Problem: Electrical Circuit Simulation Sensitivity Analysis

Rerun CCA between all inputs and y4 to find the strongest correlations (all-to-1)
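With a single output, CCA reduces to multiple regression: the one canonical correlation equals the multiple correlation R, and the input scores are proportional to the least-squares fitted values. A NumPy sketch of an all-to-1 ranking under that equivalence; `all_to_one` and the synthetic data are hypothetical:

```python
import numpy as np


def all_to_one(x: np.ndarray, y: np.ndarray, names: list[str]):
    """Relate all inputs x (n x p) to one output y (n,).

    Returns (R, ranking): the multiple correlation R, which equals the single
    canonical correlation, and inputs ranked by |corr(input, fitted values)|,
    i.e. by the magnitude of their structure correlation (up to sign).
    """
    design = np.column_stack([np.ones(len(y)), x])  # intercept + inputs
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    big_r = float(np.corrcoef(fitted, y)[0, 1])
    loadings = [float(np.corrcoef(x[:, j], fitted)[0, 1]) for j in range(x.shape[1])]
    ranking = sorted(zip(names, loadings), key=lambda t: abs(t[1]), reverse=True)
    return big_r, ranking


# Synthetic ensemble: the output depends mostly on x2, weakly on x4.
rng = np.random.default_rng(1)
x = rng.normal(size=(500, 5))
y = 3.0 * x[:, 2] + x[:, 4] + 0.5 * rng.normal(size=500)
big_r, ranking = all_to_one(x, y, [f"x{j}" for j in range(5)])
print(round(big_r ** 2, 3), ranking[:2])
```

Squaring R gives the ordinary least-squares coefficient of determination.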

All-to-y4 analysis: 4 anomalous runs share common x248 values

2641 simulations, each color-coded by its x248 input value (strongest)

All-to-y4 analysis: 4 anomalous runs share common x255 values

2641 simulations, each color-coded by its x255 input value (2nd strongest)

2641 simulations, each color-coded by its y4 output value

4 anomalous runs in y4 values
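The poster identifies the anomalous runs visually. One hypothetical way to automate the same two steps is to flag outlying output values with a median-absolute-deviation (modified z-score) test, then list the inputs that take one common value across every flagged run. The 3.5 cutoff is a common convention, not a Slycat setting, and the data below are synthetic:

```python
from statistics import median


def robust_outliers(values, cutoff=3.5):
    """Indices whose modified z-score, 0.6745 * |v - median| / MAD, exceeds cutoff."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median; treat any other value as anomalous.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > cutoff]


def shared_inputs(runs, indices):
    """Names of inputs that take the same value in every run listed in indices.

    runs: one dict per simulation, mapping input name -> value.
    """
    if not indices:
        return []
    first = runs[indices[0]]
    return sorted(name for name, value in first.items()
                  if all(runs[i][name] == value for i in indices[1:]))


# Synthetic example: 10 runs, two of which have anomalous y4 values.
y4 = [1.0, 1.1, 0.9, 1.05, 10.0, 0.95, 1.0, 9.5, 1.02, 0.98]
runs = [{"x248": float(i), "x255": float(i % 4)} for i in range(10)]
anomalies = robust_outliers(y4)
for i in anomalies:
    runs[i]["x248"] = 42.0  # the anomalous runs share one x248 setting
print(anomalies, shared_inputs(runs, anomalies))  # [4, 7] ['x248']
```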

All-to-all analysis

Finding Anomalous Simulations

Finding Most Significant Inputs

Objectives:
• Map Output Variability Back to Inputs
• Reduce Number of Input Parameters
• Reduce Number of Simulations to Run
• Identify Anomalous Runs
• Increase Model Confidence

266 scrollable inputs

Note: R² increases and P decreases with each CCA component

Available Open Source: https://github.com/sandialabs/slycat

Reduce Inputs & Simulations

In the 2641-run ensemble above, analysis reduced the number of input parameters from 266 to 21, decreasing simulation time ten-fold.
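The poster does not state the criterion used to reduce 266 inputs to 21. One hypothetical rule is to keep only inputs whose structure correlation magnitude reaches a threshold in at least one component; `select_inputs`, the threshold, and the values below are illustrative only:

```python
def select_inputs(structure, threshold):
    """Inputs whose |structure correlation| reaches threshold in at least one component.

    structure: dict mapping input name -> list of structure correlations,
    one entry per CCA component.
    """
    return sorted(name for name, loads in structure.items()
                  if max(abs(v) for v in loads) >= threshold)


# Illustrative values for four inputs over three components.
structure = {
    "x1": [0.05, -0.10, 0.08],
    "x8": [0.12, 0.20, -0.91],
    "x14": [0.88, 0.03, 0.10],
    "x23": [0.02, 0.95, 0.04],
}
kept = select_inputs(structure, threshold=0.5)
print(kept)  # ['x14', 'x23', 'x8']
```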