20
Konduri Aditya a , Hemanth Kolla a , W. Philip Kegelmeyer a , Timothy M. Shead b , Julia Ling c , Warren L. Davis b a Sandia National Laboratories, Livermore, CA 94550, United States b Sandia National Laboratories, Albuquerque, NM 87123, United States c Citrine Informatics, Redwood City, CA 94063, United States Princeton Combustion Institute Summer Shool June 23-28, 2019 Part 6. Anomaly detection in scientific data using joint statistical moments

Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc, Warren

L. Davisb

aSandia National Laboratories, Livermore, CA 94550, United States bSandia National Laboratories, Albuquerque, NM 87123, United States

cCitrine Informatics, Redwood City, CA 94063, UnitedStates

Princeton Combustion Institute Summer Shool June 23-28, 2019

Part 6. Anomaly detection in scientific data using joint statistical moments

Page 2: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

Combustion simulations

• Direct numerical simulations: resolve all the scales in space and time• Attributes:

• Multi-variate: ∼10 − 100 variables; multi-scale• High resolution data of smoothly varying functions

• Massively parallel solvers (e.g. S3D Chen at al. (CSD 2009)) • computationally expensive (tens millions of CPU hours) • large amount of data (∼ 100 TB)

• Exascale (millions processing elements): need efficient workflows

Iso-surfaces of vorticity magnitude colored by temperature

Page 3: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

Initialization

Compute RHS terms

Update solution

Save data (I/O)

In-situ analysis

Mesh (AMR)

End simulation

if(end time)

M. A. Kopera et al. (2014)

A. Krisman (2016)

Time loop

Simulation workflow

Page 4: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

Existing Anomaly Detection Methods

• k-nearest neighbors

• compute “mean” distance from k nearest neighbors

• Local outlier factor

• compute outlier score based on local density

• Supervised learning methods (e.g. neural networks)

• regression and classification

Not efficient and expensive to compute

Page 5: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

Idea• Bivariate dataset

• Characterize the data distribution: principal component analysis• Change in distribution: effects the magnitude of principal value

and orientation of principal vectors

Page 6: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

Principal component analysis

• Mainly captures variance• Need to look at higher order moments to capture extreme events

• Scale the data: with mean and maximum• Compute the co-variance matrix (second order joint moment)• Perform Eigen decomposition to obtain the principal values

and vectors

Page 7: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

Fourth order joint momentKurtosis: measure of “either existing outliers (for the sample kurtosis) or propensity to produce outliers (for the kurtosis of a probability distribution)” (P. H. Westfall, 2014)

• Compute the fourth joint moment (cumulant tensor, T )

• Decompose the fourth order symmetric tensor• Tensor size (Nf

4), Nf : number of features• Matricize the tensor and perform SVD (A. Anandkumar et al.,

JMLR 2014) • Obtain principal kurtosis values and vectors

Page 8: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

Anomaly detection

• First principal kurtosis vector aligns in the direction of anomalies

• Can be used to characterize extreme events

Page 9: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

Combustion test caseConsider a simple problem with a 1D domain• Initial condition

• Fuel-air composition: premixed – 0.6CO + 0.4H2 + 0.5(O2 + 3.76N2) • Solver: scalable reacting flow code S3D (J. H. Chen et al., CSD 2009)

• Number of subdomains: Nd = 4 • Time-steps: ∆t = 0.001 μs• Number of checkpoints: Nt = 20, interval: 1 μs

Page 10: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

Auto-ignition solution

• Early ignition occurs in Region 1: spatial anomaly

• Eventually ignites in Regions 4, 2 and 3

• Time evolution of temperature

Page 11: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

ResultsTime evolution of principal vectors in Region 1 (axes: scaled)

Page 12: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

Feature moment metrics

• Number of features: Nf = 13 (12 species + temperature), index i• Number of subdomains: Nd = 4, index j • Number of time steps: Nt = 20, index n • Project the principal vectors weighted by the principal values onto the features to obtain the feature moment metrics (Fi

j,n)

• eˆ · vˆ is effectively the i-th entry in the k-th vector vˆk

• Property:

Page 13: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

Results

• FMMs distribute across different features when ignition (anomaly) occurs

Page 14: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

Anomaly metricsIdentify spatial and temporal anomalies • Statistical signature: distribution of feature moment metrics changes• Hellinger distance: a symmetric measure of difference between two

discrete distributions P and Q

• Spatial metric: compare each FMM distribution with the average

• Temporal metric: compare FMM distribution between successive time steps

Page 15: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

Algorithm

Page 16: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

Results

• Dash line: threshold for anomaly (=0.5)• Anomalies detected in space and time

Page 17: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

HCCI dataset• Premixed ethanol-air with EGR 2D simulation (A. Bhagatwala et al., 2014) • Equivalence ratio: 0.4, pressure: 45 atm• Mean temperature: 924 K, rms: 40 K• Grid: 672 x 672• Domain decomposition: 12 x 12• Initial condition:

Temperature u - velocity

Page 18: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

HCCI dataset• Earliest inception of ignition kernels• “True” kernel identification: heat release rate > 1x109 J/m3/s

Heat release

0 0.1 0.2 0.30

0.1

0.2

0.3

0 0.1 0.2 0.30

0.1

0.2

0.3

0 0.1 0.2 0.30

0.1

0.2

0.3

0

0.2

0.4

0.6

0 0.1 0.2 0.30

0.1

0.2

0.3

0

0.2

0.4

0.6

True positive

True negative

False positive

False negative0

0.2

0.4

0.6

0.8

1

M1 M2

True events Predicted events

Page 19: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

HCCI dataset• Performance of the algorithm• Receiver Operating Characteristic (ROC) curves• Study the effect of thresholds

Page 20: Part 6. Anomaly detection in scientific data using joint … · Konduri Adityaa, Hemanth Kollaa, W. Philip Kegelmeyera, Timothy M. Sheadb, Julia Lingc,Warren L.Davisb aSandia National

Results• Proposed an unsupervised anomaly detection algorithm

• Verified the idea using synthetic and auto-ignition data

• Future work:

• in-situ implementation of the algorithm into the massively parallel direct numerical simulation solver (S3D)

• evaluate scalability

• apply the algorithm to detect anomalies in other scientific phenomena