

QPRC June 2009

Wookyeon Hwang Univ. of South Carolina

George Runger
Industrial Engineering
Industrial, Systems, and Operations Engineering
School of Computing, Informatics, and Decision Systems Engineering
Arizona State University

Eugene Tuv Intel

Process Monitoring with Supervised Learning and Artificial Contrasts



Statistical Process Control/Anomaly Detection

• Objective is to detect change in a system
  – Transportation, environmental, security, health, processes, etc.

• In the modern approach, leverage massive data
  – Continuous, categorical, missing, outliers, nonlinear relationships

• Goal is a widely applicable, flexible method
  – Normal conditions and fault type unknown

• Capture relationships between multiple variables
  – Learn patterns, exploit patterns
  – Traditional Hotelling’s T² captures structure, provides a control region (boundary), and quantifies false alarms



Traditional Monitoring

• Traditional approach is Hotelling’s (1948) T-squared chart

• Numerical measurements, based on multivariate normality

• Simple elliptical pattern (Mahalanobis distance)

• Time-weighted extensions: exponentially weighted moving average (EWMA) and cumulative sum (CUSUM)
  – More efficient, but same elliptical pattern
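A minimal NumPy sketch of the T-squared statistic described above, assuming an in-control reference sample is available to estimate the mean vector and covariance matrix. The limit uses the chi-squared approximation, which for p = 2 has the closed form -2 ln(alpha); with estimated parameters and small reference samples an F-based limit is more accurate. The function names are illustrative, not taken from the original work.

```python
import numpy as np

def hotelling_t2(X_ref, X_new):
    """T^2 = (x - xbar)' S^{-1} (x - xbar) for each row x of X_new.

    xbar and S are estimated from the in-control reference sample X_ref.
    """
    xbar = X_ref.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X_ref, rowvar=False))
    D = X_new - xbar
    # Squared Mahalanobis distance of every row: sum_jk D_ij S_inv_jk D_ik
    return np.einsum("ij,jk,ik->i", D, S_inv, D)

def chi2_limit_2d(alpha):
    """Upper (1 - alpha) chi-squared quantile with 2 degrees of freedom: -2 ln(alpha)."""
    return -2.0 * np.log(alpha)

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X_ref = rng.multivariate_normal([0.0, 0.0], cov, size=5000)
X_new = rng.multivariate_normal([0.0, 0.0], cov, size=5000)
limit = chi2_limit_2d(0.05)  # about 5.99
false_alarm_rate = (hotelling_t2(X_ref, X_new) > limit).mean()  # close to 0.05
```

The contour T² = limit is the ellipse referred to on the slide; it fixes the false-alarm rate but cannot bend to follow a nonlinear in-control pattern.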



Transform to Supervised Learning

• Process monitoring can be transformed into a supervised learning problem
  – One approach: supplement the data with artificial, contrasting data
  – Any one of multiple learners can be used, without prespecified faults
  – Results can generalize monitoring in several directions, such as arbitrary (nonlinear) in-control conditions, fault knowledge, and categorical variables
  – High-dimensional problems can be handled with an appropriate learner



Learn Process Patterns

• Learn the pattern by comparing it with a “structureless” alternative

• Generate noise: artificial data without structure, to differentiate against
  – For example, $f(\mathbf{x}) = f_1(x_1) f_2(x_2) \cdots f_p(x_p)$, the joint distribution as a product of marginals (enforces independence)
  – Or $f(\mathbf{x})$ = product of uniforms

• Define and assign $y = \pm 1$ to “actual” and “artificial” data (the artificial contrast)

• Use a supervised (classification) learner to distinguish the two data sets
  – Only simple examples are used here
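A minimal NumPy sketch of the two structureless alternatives listed above. The product of marginals is approximated by resampling each column independently from its own observed values, and the product of uniforms uses the observed range of each variable; both the range choice and the mapping of +1 to actual and -1 to artificial data are assumptions, since the slide only says y = ±1. `make_contrast` is a hypothetical name.

```python
import numpy as np

def make_contrast(X, n_artificial, method="marginals", seed=0):
    """Stack actual data X (label +1) with structureless artificial data (label -1)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if method == "marginals":
        # Product of marginals: draw each column independently from its own observed
        # values, which keeps every marginal but destroys dependence between columns.
        idx = rng.integers(0, n, size=(n_artificial, p))
        A = X[idx, np.arange(p)]
    elif method == "uniform":
        # Product of uniforms over the observed range of each variable (assumed range).
        A = rng.uniform(X.min(axis=0), X.max(axis=0), size=(n_artificial, p))
    else:
        raise ValueError(f"unknown method: {method}")
    data = np.vstack([X, A])
    y = np.concatenate([np.ones(n), -np.ones(n_artificial)])
    return data, y

# Example: strongly correlated actual data; the artificial contrast is uncorrelated.
rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=1000)
data, y = make_contrast(X, n_artificial=2000)
```

Any classifier that separates the two labels must then learn the dependence structure of the actual data, because that is the only thing distinguishing it from the contrast.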



Learn Pattern from Artificial Contrast



Regularized Least Squares (Kernel Ridge) Classifier with Radial Basis Functions

• Model with a linear combination of basis functions

• Smoothness penalty controls complexity
  – Closely related to support vector machines (SVM)
  – Regularized least squares allows a closed-form solution but gives up sparsity in exchange; one may not want to make that trade!

• Previous example: a challenge for a generalized learner, namely multivariate normal data!

[Figure: f(x) over the (x1, x2) plane]



RLS Classifier

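$$\min_{f \in \mathcal{H}} \sum_{i=1}^{n} L[y_i, f(\mathbf{x}_i)] + \lambda \|f\|_K^2$$

where

$$f(\mathbf{x}) = \sum_{i=1}^{n} c_i K(\mathbf{x}, \mathbf{x}_i), \qquad K(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\sigma^2}\right)$$

with parameters $\lambda$ and $\sigma$. With squared-error loss the problem becomes

$$\min_{\mathbf{c}} \; (\mathbf{y} - K\mathbf{c})'(\mathbf{y} - K\mathbf{c}) + \lambda\, \mathbf{c}'K\mathbf{c}$$

Solution

$$(K + \lambda I)\,\mathbf{c} = \mathbf{y}$$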
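A minimal NumPy sketch of the RLS classifier above: build the RBF kernel matrix, solve (K + λI)c = y, and classify a new point by the sign of f(x) = Σ c_i K(x, x_i). σ² = 5 follows the slides; λ = 1.0 is a hypothetical choice (the slides report a "Complexity" value whose exact mapping to λ is not stated), and the data sizes are reduced for illustration.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)) for all row pairs of A and B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma2))

def fit_rlsc(X, y, lam, sigma2):
    """Coefficients c solving (K + lam I) c = y."""
    K = rbf_kernel(X, X, sigma2)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def rlsc_decision(X_train, c, X_new, sigma2):
    """f(x) = sum_i c_i K(x, x_i); the assigned label is sign(f(x))."""
    return rbf_kernel(X_new, X_train, sigma2) @ c

# Actual (in-control) data labeled +1, uniform artificial contrast labeled -1.
rng = np.random.default_rng(0)
X_act = rng.standard_normal((300, 2))
X_art = rng.uniform(-4.0, 4.0, size=(600, 2))
X = np.vstack([X_act, X_art])
y = np.concatenate([np.ones(300), -np.ones(600)])

sigma2, lam = 5.0, 1.0  # sigma^2 as on the slides; lam is a hypothetical choice
c = fit_rlsc(X, y, lam, sigma2)
f = rlsc_decision(X, c, np.array([[0.0, 0.0], [3.5, 3.5]]), sigma2)
# f[0] > 0: the center is assigned to the in-control class; f[1] < 0: the far corner is not
```

The closed form is one linear solve of size n, which is why the slide contrasts it with the sparse but iterative SVM solution.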



Patterns Learned from Artificial Contrast RLSC

• True Hotelling’s 95% probability bound

• Red: contour of the learned function used to assign ±1

• Actual: n = 1000; Artificial: n = 2000

• Complexity: 4/3000

• σ² = 5



More Challenging Example with Hotelling’s Contour




• Actual: n = 1000; Artificial: n = 2000

• Complexity: 4/3000

• σ² = 5




[Figure legend: random data, given data, decision boundary]

Actual: n = 1000; Artificial: n = 1000

Complexity: 4/2000

σ² = 5



RLSC for p = 10 dimensions

Shift = 1
         Training error   Testing error     Chi-squared (99.5%)
         (Type I error)   (Type II error)   (Type II error)
Mean     0.00666          0.980             0.982
StDev    0.00057          0.00305

Shift = 3
         Training error   Testing error     Chi-squared (99.5%)
         (Type I error)   (Type II error)   (Type II error)
Mean     0.005            0.487             0.489
StDev    0.00264          0.0483



Tree-Based Ensembles, p = 10

• Alternative learner
  – works with mixed data
  – elegantly handles missing data
  – scale invariant
  – outlier resistant
  – insensitive to extraneous predictors

• Provides an implicit ability to select key variables

Shift = 1
         Training error   OOB for         Testing error     OOB for     Chi-squared (99.5%)
         (Type I error)   training data   (Type II error)   test data   (Type II error)
Mean     0                0.00233         0.989             0.0026      0.982
StDev    0                0.00152         0.0075            0.0011

Shift = 3
         Training error   OOB for         Testing error     OOB for     Chi-squared (99.5%)
         (Type I error)   training data   (Type II error)   test data   (Type II error)
Mean     0                0.00266         0.532             0.0033      0.489
StDev    0                0.00115         0.2270            0.0023



Nonlinear Patterns

• Hotelling’s boundary is not a good solution when patterns are not linear

• Control boundaries from supervised learning capture the normal operating condition



[Figure: Boundaries by RLSC vs. Hotelling 95% boundary; axes x1 and x2; legend: in-control data, reference data]

Tuned Control

• Extend to incorporate specific process knowledge of faults

• Artificial contrasts generated from the specified fault distribution
  – or from a mixture of samples from different fault distributions

• Numerical optimization to design a control statistic can be very complicated
  – maximizes the likelihood function under a specified fault (alternative)



Tuned Control

• Fault: the means of both variables x1 and x2 are known to increase

• Artificial data (black) are sampled from 12 independent normal distributions
  – Mean vectors are selected from a grid over the area [0, 3] × [0, 3]

• The learned control region is shown in the right panel; it approximately matches the theoretical result in Testik et al., 2004.
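A minimal NumPy sketch of drawing a tuned artificial contrast from a mixture of normal fault distributions. The slide does not give the layout of the 12 grid points or the component covariance, so a 4 × 3 grid over [0, 3] × [0, 3] and an identity covariance are assumptions here; `fault_mixture_contrast` is a hypothetical name.

```python
import numpy as np

def fault_mixture_contrast(means, n_per_component, cov=None, seed=0):
    """Artificial contrast drawn from a mixture of normal fault distributions.

    Draws n_per_component points from N(mu_k, cov) for every fault mean mu_k;
    cov defaults to the identity (an assumption, not stated on the slide).
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    cov = np.eye(means.shape[1]) if cov is None else np.asarray(cov, dtype=float)
    return np.vstack([rng.multivariate_normal(mu, cov, size=n_per_component) for mu in means])

# Hypothetical 4 x 3 grid of 12 mean vectors over [0, 3] x [0, 3].
grid = [(a, b) for a in np.linspace(0.0, 3.0, 4) for b in np.linspace(0.0, 3.0, 3)]
artificial = fault_mixture_contrast(grid, n_per_component=500)
```

These rows would be labeled as the contrast class and stacked with the in-control data before fitting a learner, so the learned boundary is pushed toward the directions in which the fault is expected.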



Incorporate Time-Weighted Rules

• What form of statistic should be filtered and monitored?
  – Log likelihood ratio

• Some learners provide class probability estimates

• Bayes’ theorem (for equal sample size) gives

$$\frac{p_{out}(\mathbf{x})}{p_{in}(\mathbf{x})} = \frac{f_{out}(\mathbf{x})}{f_{in}(\mathbf{x})}$$

• Log likelihood ratio for an observation $\mathbf{x}_t$ estimated as

$$l_t = \ln p_{out}(\mathbf{x}_t) - \ln p_{in}(\mathbf{x}_t)$$

• Apply EWMA (or CUSUM, etc.) to $l_t$

$$Z_t = \lambda l_t + (1 - \lambda) Z_{t-1}$$
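A minimal pure-Python sketch of the two formulas above, assuming a two-class learner whose output is the estimated probability p_out that x_t belongs to the artificial (out) class, so that p_in = 1 - p_out. The clipping constant `eps`, the smoothing constant λ = 0.2, the start Z_0 = 0, and the probability values are hypothetical.

```python
import math

def log_likelihood_ratios(p_out, eps=1e-6):
    """l_t = ln p_out(x_t) - ln p_in(x_t), taking p_in = 1 - p_out (two classes).

    Probabilities are clipped to [eps, 1 - eps] so the logs stay finite.
    """
    ratios = []
    for p in p_out:
        p = min(max(p, eps), 1.0 - eps)
        ratios.append(math.log(p) - math.log(1.0 - p))
    return ratios

def ewma(values, lam=0.2, z0=0.0):
    """Z_t = lam * l_t + (1 - lam) * Z_{t-1}, starting from Z_0 = z0."""
    z, path = z0, []
    for v in values:
        z = lam * v + (1.0 - lam) * z
        path.append(z)
    return path

# Hypothetical classifier outputs: estimated probability that x_t is in the "out" class.
p_out = [0.2, 0.3, 0.25, 0.6, 0.8, 0.9]
l = log_likelihood_ratios(p_out)
z = ewma(l, lam=0.2)
```

If the two training classes have unequal sizes (for example 1000 actual and 2000 artificial points), Bayes’ theorem adds a constant, ln(f_out/f_in) = ln(p_out/p_in) - ln(n_out/n_in), so ln(n_out/n_in) would be subtracted from each l_t.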



Time-Weighted ARLs

• ARLs for selected schemes applied to the $l_t$ statistic
  – 10-dimensional, independent normal

Scheme   Statistic   No shift   5 vars. shift 1 sigma   10 vars. shift 1 sigma
EWMA     Average     202.8      10.1                    4.68
EWMA     StDev       3.65       1.21                    0.27
Ind      Average     200.5      39.4                    11.8
Ind      StDev       5.79       18.68                   5.62
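A rough sketch of how ARLs such as those in the table can be estimated by simulation. To keep it self-contained, the learned l_t is replaced (an assumption) by the exact log likelihood ratio for a single normal variable, in-control N(0, 1) against out-of-control N(δ, 1), which reduces to δx - δ²/2. The limit h = 0.1, λ = 0.1, the number of replicates, and the starting value are hypothetical and are not calibrated to the in-control ARL of about 200 on the slide.

```python
import random

def run_length(draw_stat, lam, h, z0, max_len=10_000):
    """Number of observations until Z_t = lam*l_t + (1 - lam)*Z_{t-1} first exceeds h."""
    z = z0
    for t in range(1, max_len + 1):
        z = lam * draw_stat() + (1.0 - lam) * z
        if z > h:
            return t
    return max_len  # truncated: no signal within max_len observations

def estimate_arl(shift, delta=1.0, lam=0.1, h=0.1, reps=200, seed=1):
    """Monte Carlo ARL of an upper EWMA chart on the exact normal log likelihood ratio.

    x_t ~ N(shift, 1); l_t = ln N(x_t; delta, 1) - ln N(x_t; 0, 1) = delta*x_t - delta**2/2.
    """
    rng = random.Random(seed)

    def draw_stat():
        return delta * rng.gauss(shift, 1.0) - delta ** 2 / 2.0

    z0 = -delta ** 2 / 2.0  # start at the in-control mean of l_t
    lengths = [run_length(draw_stat, lam, h, z0) for _ in range(reps)]
    return sum(lengths) / reps

arl_in = estimate_arl(shift=0.0)   # no shift
arl_out = estimate_arl(shift=1.0)  # mean shifted by one sigma
```

In practice h would be tuned, for example by bisection on simulated in-control runs, until the in-control ARL reaches the target value, and only then would out-of-control ARLs be compared.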



Example: 50 Dimensions



Example: 50 Dimensions

• Hotelling’s: left

• Artificial contrast: right



Example: Credit Data (UCI)

• 20 attributes: 7 numerical and 13 categorical

• Associated class label of “good” or “bad” credit risk

• Artificial data generated independently for each attribute, from continuous uniform distributions (numerical attributes) and discrete uniform distributions (categorical attributes)

• Ordered as 300 “good” instances followed by 300 “bad”
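A minimal pure-Python sketch of the mixed-type contrast described above: each numerical attribute is drawn from a continuous uniform distribution and each categorical attribute from a discrete uniform over its levels, independently across attributes. Using the observed minimum/maximum and the observed set of levels is an assumption, and the toy rows and the name `mixed_uniform_contrast` are hypothetical (not the UCI data).

```python
import random

def mixed_uniform_contrast(rows, categorical, n_artificial, seed=0):
    """Draw n_artificial rows with no joint structure.

    Numerical column j: continuous uniform on the observed [min, max].
    Categorical column j (j in `categorical`): uniform over the observed levels.
    Columns are sampled independently of one another.
    """
    rng = random.Random(seed)
    samplers = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        if j in categorical:
            levels = sorted(set(col))
            samplers.append(lambda levels=levels: rng.choice(levels))
        else:
            lo, hi = min(col), max(col)
            samplers.append(lambda lo=lo, hi=hi: rng.uniform(lo, hi))
    return [[draw() for draw in samplers] for _ in range(n_artificial)]

# Hypothetical toy rows: amount (numerical), category (categorical), duration (numerical).
rows = [[1200.0, "a", 24.0], [5951.0, "b", 48.0], [2096.0, "c", 12.0]]
artificial = mixed_uniform_contrast(rows, categorical={1}, n_artificial=5)
```

Default arguments in the lambdas bind each column's range or levels at definition time, so every sampler draws from its own column.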



Artificial Contrasts for Credit Data

• Plot of $l_t$ over time



Diagnostics: Contribution Plots

• 50 dimensions: 2 contributors, 48 noise variables (scatter plot projections onto the contributor variables)



Contributor Plots from PCA T2



Contributor Plots from PCA SPE
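The slides do not give the contribution formula used for the PCA SPE plots. A common choice, sketched here under that assumption, is the squared residual of each standardized variable after projecting onto the first k principal components, so that the per-variable contributions sum to SPE (the Q statistic). The number of components k, the standardization, and the function name are hypothetical.

```python
import numpy as np

def spe_contributions(X_ref, X_new, k):
    """Per-variable SPE contributions e_tj^2 from a k-component PCA model of X_ref."""
    mu = X_ref.mean(axis=0)
    sd = X_ref.std(axis=0, ddof=1)
    Z_ref = (X_ref - mu) / sd
    _, _, Vt = np.linalg.svd(Z_ref, full_matrices=False)
    P = Vt[:k].T                  # p x k loadings of the retained components
    Z = (X_new - mu) / sd
    E = Z - Z @ P @ P.T           # residual outside the k-dimensional principal subspace
    return E ** 2                 # row t sums to SPE_t

rng = np.random.default_rng(0)
t = rng.standard_normal((1000, 1))
X_ref = t + 0.1 * rng.standard_normal((1000, 5))  # five variables driven by one factor
x_fault = np.array([[0.0, 0.0, 3.0, 0.0, 0.0]])   # breaks the correlation in variable 2
C = spe_contributions(X_ref, x_fault, k=1)
```

Here the broken-correlation variable receives the largest contribution, but the residual also leaks onto the other variables, one reason PCA contribution plots can smear a fault across several inputs.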



Contributor Plots from Artificial Contrast Ensemble (ACE)

• Impurity importance weighted by means of split variable



Contributor Plots for Nonlinear System

• Contributor plots from SPE, T2 and ACE in left, center, right, respectively



Conclusions

• Can (and must) leverage the automated, ubiquitous, data-computational environment
  – Professional obsolescence

• Employ a flexible, powerful control solution for broad applications (environment, health, security, etc.) as well as manufacturing
  – “Normal” sensors not obvious, patterns not known

• Include automated diagnosis
  – Tools to filter and identify contributors

• Computational feasibility in embedded software

This material is based upon work supported by the National Science Foundation under Grant No. 0355575.
