QPRC June 2009 [email protected] 1
Wookyeon Hwang, Univ. of South Carolina
George Runger, Industrial Engineering; Industrial, Systems, and Operations Engineering; School of Computing, Informatics, and Decision Systems Engineering; Arizona State University
Eugene Tuv, Intel
Process Monitoring with Supervised Learning and Artificial Contrasts
Statistical Process Control/Anomaly Detection
• Objective is to detect change in a system
  – Transportation, environmental, security, health, processes, etc.
• In the modern approach, leverage massive data
  – Continuous, categorical, missing, outliers, nonlinear relationships
• Goal is a widely applicable, flexible method
  – Normal conditions and fault type unknown
• Capture relationships between multiple variables
  – Learn patterns, exploit patterns
  – Traditional Hotelling’s T2 captures structure, provides control region (boundary), quantifies false alarms
Traditional Monitoring
• Traditional approach is Hotelling’s (1948) T-squared chart
• Numerical measurements, based on multivariate normality
• Simple elliptical pattern (Mahalanobis distance)
• Time-weighted extensions: exponentially weighted moving average (EWMA) and cumulative sum (CUSUM)
  – More efficient, but same elliptical pattern
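As a rough illustration of the traditional chart, the sketch below computes the T-squared statistic (squared Mahalanobis distance) against a reference sample and compares it with a chi-squared limit. The function name `hotelling_t2`, the simulated data, and the shift are assumptions made for this example; the chi-squared limit is a large-sample approximation, and the closed-form quantile used here holds only for two variables.

```python
import numpy as np

def hotelling_t2(reference, x):
    """Squared Mahalanobis distance of each row of x from the reference mean."""
    mean = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)
    diff = np.atleast_2d(x) - mean
    # Solve cov @ z = diff.T rather than inverting the covariance matrix.
    return np.einsum("ij,ji->i", diff, np.linalg.solve(cov, diff.T))

# Simulated in-control data (an assumption for illustration only).
rng = np.random.default_rng(0)
reference = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=1000)

# 95% chi-squared quantile with 2 degrees of freedom: -2 ln(0.05), about 5.99.
limit = -2.0 * np.log(0.05)

t2_in = hotelling_t2(reference, reference)
t2_shifted = hotelling_t2(reference, reference + np.array([3.0, -3.0]))

false_alarm_rate = np.mean(t2_in > limit)      # fraction of in-control points signaled
detection_rate = np.mean(t2_shifted > limit)   # fraction of shifted points signaled
```

The control region `t2 <= limit` is an ellipse, which is the "simple elliptical pattern" noted above.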
Transform to Supervised Learning
• Process monitoring can be transformed to a supervised learning problem
  – One approach: supplement with artificial, contrasting data
  – Any one of multiple learners can be used, without pre-specified faults
  – Results can generalize monitoring in several directions, such as arbitrary (nonlinear) in-control conditions, fault knowledge, and categorical variables
  – High-dimensional problems can be handled with an appropriate learner
Learn Process Patterns
• Learn pattern compared to “structureless” alternative
• Generate noise, artificial data without structure to differentiate
  – For example, f(x) = f1(x1) f2(x2) … joint distribution as product of marginals (enforce independence)
  – Or f(x) = product of uniforms
• Define & assign y = +/–1 to “actual” and “artificial” data, artificial contrast
• Use supervised (classification) learner to distinguish the data sets
  – Only simple examples used here
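The two ways of generating structureless data listed above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the product of marginals is approximated by resampling each column independently from the actual data, and the uniform ranges are taken from the observed minimum and maximum of each variable (an assumption).

```python
import numpy as np

def contrast_from_marginals(actual, n_artificial, rng):
    """Resample every column independently, so each marginal is kept
    but any dependence between variables is destroyed."""
    cols = [rng.choice(actual[:, j], size=n_artificial, replace=True)
            for j in range(actual.shape[1])]
    return np.column_stack(cols)

def contrast_from_uniforms(actual, n_artificial, rng):
    """Draw each variable uniformly over its observed range."""
    lo, hi = actual.min(axis=0), actual.max(axis=0)
    return rng.uniform(lo, hi, size=(n_artificial, actual.shape[1]))

def label_contrast(actual, artificial):
    """Stack both samples and assign y = +1 (actual) and y = -1 (artificial)."""
    X = np.vstack([actual, artificial])
    y = np.concatenate([np.ones(len(actual)), -np.ones(len(artificial))])
    return X, y

rng = np.random.default_rng(1)
actual = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=2000)
artificial = contrast_from_marginals(actual, 4000, rng)
X, y = label_contrast(actual, artificial)
```

Any classifier trained on `(X, y)` then has to learn whatever structure separates the actual data from the independent version of itself.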
Learn Pattern from Artificial Contrast
Regularized Least Squares (Kernel Ridge) Classifier with Radial Basis Functions
• Model with a linear combination of basis functions
• Smoothness penalty controls complexity
  – Tightly related to Support Vector Machines (SVM)
  – Regularized least squares allows closed form solution, trades it for sparsity, may not want to trade!
• Previous example is a challenge for a generalized learner: multivariate normal data!
[Figure: f(x) plotted over x1 and x2]
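RLS Classifier
$$\min_{f \in H} \; \sum_{i=1}^{n} L[y_i, f(x_i)] + \gamma \|f\|_K^2$$
where
$$f(x) = \sum_{i=1}^{n} c_i K(x, x_i), \qquad K(x, x') = \exp\left(-\|x - x'\|^2 / 2\sigma^2\right)$$
with parameters $\gamma$, $\sigma$. With squared-error loss this becomes
$$\min_{c} \; (y - Kc)'(y - Kc) + \gamma \, c'Kc$$
Solution
$$(\gamma I + K)\, c = y$$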
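The closed-form solution can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the simulated data, the uniform box for the artificial sample, and the value `gamma = 0.1` are all choices made for this example; only `sigma2 = 5` echoes a setting quoted on the slides.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """K[i, j] = exp(-||A_i - B_j||^2 / (2 * sigma2))."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma2))

def rlsc_fit(X, y, gamma, sigma2):
    """Solve (gamma * I + K) c = y for the coefficient vector c."""
    K = rbf_kernel(X, X, sigma2)
    return np.linalg.solve(K + gamma * np.eye(len(X)), y)

def rlsc_predict(X_train, c, X_new, sigma2):
    """f(x) = sum_i c_i K(x, x_i); the sign of f assigns +1 or -1."""
    return rbf_kernel(X_new, X_train, sigma2) @ c

rng = np.random.default_rng(0)
actual = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=300)
artificial = rng.uniform(-5.0, 5.0, size=(600, 2))   # structureless contrast
X = np.vstack([actual, artificial])
y = np.concatenate([np.ones(300), -np.ones(600)])

gamma, sigma2 = 0.1, 5.0
c = rlsc_fit(X, y, gamma, sigma2)

# Evaluate f at the center of the actual data and at a point far from it.
f_center, f_far = rlsc_predict(X, c, np.array([[0.0, 0.0], [3.5, -3.5]]), sigma2)
```

The contour `f(x) = 0` serves as the learned control boundary: points with positive `f` are treated as consistent with the actual data.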
Patterns Learned from Artificial Contrast RLSC
• True Hotelling’s 95% probability bound
• Red: learned contour function to assign +/-1
• Actual: n = 1000 Artificial: n = 2000
• Complexity: 4/3000
• Sigma2 = 5
More Challenging Example with Hotelling’s Contour
Patterns Learned from Artificial Contrast RLSC
• Actual: n = 1000 Artificial: n = 2000
• Complexity: 4/3000
• Sigma2 = 5
Patterns Learned from Artificial Contrast RLSC
[Figure: scatter plot with both axes spanning -20 to 20; legend: random data, given data, decision boundary]
• Actual: n = 1000 Artificial: n = 1000
• Complexity: 4/2000
• Sigma2 = 5
RLSC for p = 10 dimensions

             Training error    Testing error     Chi-squared (99.5%)
             (Type II error)   (Type II error)   (Type II error)
Shift = 1
  Mean       0.00666           0.980             0.982
  StDev      0.00057           0.00305
Shift = 3
  Mean       0.005             0.487             0.489
  StDev      0.00264           0.0483
Tree-Based Ensembles p = 10
• Alternative learner
  – works with mixed data
  – elegantly handles missing data
  – scale invariant
  – outlier resistance
  – insensitive to extraneous predictors
• Provide an implicit ability to select key variables

             Training error   OOB for         Testing error     OOB for     Chi-squared (99.5%)
             (Type I error)   training data   (Type II error)   test data   (Type II error)
Shift = 1
  Mean       0                0.00233         0.989             0.0026      0.982
  StDev      0                0.00152         0.0075            0.0011
Shift = 3
  Mean       0                0.00266         0.532             0.0033      0.489
  StDev      0                0.00115         0.2270            0.0023
Nonlinear Patterns
• Hotelling’s boundary: not a good solution when patterns are not linear
• Control boundaries from supervised learning capture the normal operating condition
[Figure: Boundaries by RLSC vs Hotelling 95% boundary; axes x1 and x2; legend: in-control data, reference data]
Tuned Control
• Extend to incorporate specific process knowledge of faults
• Artificial contrasts generated from the specified fault distribution
  – or from a mixture of samples from different fault distributions
• Numerical optimization to design a control statistic can be very complicated
  – maximizes the likelihood function under a specified fault (alternative)
Tuned Control
• Fault: means of both variables x1 and x2 are known to increase
• Artificial data (black) are sampled from 12 independent normal distributions
  – Mean vectors are selected from a grid over the area [0, 3] x [0, 3]
• Learned control region is shown in the right panel; it approximately matches the theoretical result in Testik et al., 2004.
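One way to generate such a fault-tuned contrast is sketched below. The slides give only the number of distributions (12) and the area of the grid; the 4 x 3 grid layout, the unit variances, the equal sample size per fault, and the function name `fault_contrast` are all assumptions made for illustration.

```python
import numpy as np

def fault_contrast(n_per_fault, rng, grid_x=4, grid_y=3, upper=3.0):
    """Sample artificial data from normal distributions whose mean vectors
    lie on a grid_x by grid_y grid over [0, upper] x [0, upper]."""
    gx = np.linspace(0.0, upper, grid_x)
    gy = np.linspace(0.0, upper, grid_y)
    means = np.array([(a, b) for a in gx for b in gy])
    # One independent normal sample (unit variance, assumed) per grid point.
    samples = [rng.normal(loc=m, scale=1.0, size=(n_per_fault, 2)) for m in means]
    return means, np.vstack(samples)

rng = np.random.default_rng(0)
means, artificial = fault_contrast(n_per_fault=100, rng=rng)
```

The resulting mixture replaces the structureless artificial sample, so the learned boundary is sensitive mainly to increases in the means of x1 and x2.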
Incorporate Time-Weighted Rules
• What form of statistic should be filtered and monitored?
  – Log likelihood ratio
• Some learners provide class probability estimates
• Bayes’ theorem (for equal sample size) gives
$$\frac{f_{out}(x)}{f_{in}(x)} = \frac{p_{out}(x)}{p_{in}(x)}$$
• Log likelihood ratio for an observation $x_t$ estimated as
$$l_t = \ln p_{out,t} - \ln p_{in,t}$$
• Apply EWMA (or CUSUM, etc.) to $l_t$
$$Z_t = \lambda l_t + (1 - \lambda) Z_{t-1}$$
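The statistic and the EWMA recursion can be sketched as follows. The function names, the clipping constant `eps`, the starting value `z0 = 0`, and the example probability sequence are assumptions for illustration; the sketch also assumes a two-class learner, so that the in-control probability is one minus the out-of-control probability.

```python
import numpy as np

def log_likelihood_ratio(p_out, eps=1e-6):
    """l_t = ln p_out - ln p_in, with p_in = 1 - p_out (two-class assumption).
    Probabilities are clipped away from 0 and 1 to keep the logs finite."""
    p_out = np.clip(np.asarray(p_out, dtype=float), eps, 1.0 - eps)
    return np.log(p_out) - np.log(1.0 - p_out)

def ewma(l, lam=0.2, z0=0.0):
    """Z_t = lam * l_t + (1 - lam) * Z_{t-1}, starting from Z_0 = z0."""
    z = np.empty(len(l))
    prev = z0
    for t, lt in enumerate(l):
        prev = lam * lt + (1.0 - lam) * prev
        z[t] = prev
    return z

# Hypothetical class-probability estimates: in control, then a change at t = 50.
p_out = np.concatenate([np.full(50, 0.2), np.full(50, 0.8)])
l = log_likelihood_ratio(p_out)
z = ewma(l, lam=0.2)
```

A signal is raised when `z` exceeds a control limit, which in practice would be chosen to give a target in-control ARL.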
Time-Weighted ARLs
• ARLs for selected schemes applied to lt statistic
  – 10-dimensional, independent normal

                 No shift   5 vars. shift 1 sigma   10 vars. shift 1 sigma
EWMA  Average    202.8      10.1                    4.68
      Stdev      3.65       1.21                    0.27
Ind   Average    200.5      39.4                    11.8
      Stdev      5.79       18.68                   5.62
Example: 50 Dimensions
• Hotelling’s: left
• Artificial contrast: right
Example: Credit Data (UCI)
• 20 attributes: 7 numerical and 13 categorical
• Associated class label of “good” or “bad” credit risk
• Artificial data generated from continuous and discrete uniform distributions, respectively, independently for each attribute
• Ordered by 300 “good” instances followed by 300 “bad”
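For mixed data, the artificial sample described above can be sketched as below. The example uses small simulated arrays rather than the UCI credit data; the function name, the integer coding of categories, and the use of each attribute's observed range and observed levels are assumptions made for illustration.

```python
import numpy as np

def uniform_contrast_mixed(numeric, categorical, n_artificial, rng):
    """Numeric attributes: continuous uniform over each observed range.
    Categorical attributes: discrete uniform over each attribute's observed levels.
    Every attribute is drawn independently of the others."""
    lo, hi = numeric.min(axis=0), numeric.max(axis=0)
    art_num = rng.uniform(lo, hi, size=(n_artificial, numeric.shape[1]))
    art_cat = np.column_stack([
        rng.choice(np.unique(categorical[:, j]), size=n_artificial)
        for j in range(categorical.shape[1])
    ])
    return art_num, art_cat

# Toy stand-in data: 3 numeric attributes and 2 integer-coded categorical attributes.
rng = np.random.default_rng(0)
numeric = rng.normal(size=(50, 3))
categorical = rng.integers(0, 4, size=(50, 2))
art_num, art_cat = uniform_contrast_mixed(numeric, categorical, 200, rng)
```

A learner that accepts mixed inputs, such as a tree-based ensemble, can then be trained to separate the actual rows from these artificial rows.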
Artificial Contrasts for Credit Data
• Plot of lt over time
Diagnostics: Contribution Plots
• 50 dimensions: 2 contributors, 48 noise variables (scatter plot projections to contributor variables)
Contributor Plots from PCA T2
Contributor Plots from PCA SPE
Contributor Plots from Artificial Contrast Ensemble (ACE)
• Impurity importance weighted by means of split variable
Contributor Plots for Nonlinear System
• Contributor plots from SPE, T2 and ACE in left, center, right, respectively
Conclusions
• Can/must leverage the automated-ubiquitous, data-computational environment
  – Professional obsolescence
• Employ flexible, powerful control solution, for broad applications: environment, health, security, etc., as well as manufacturing
  – “Normal” sensors not obvious, patterns not known
• Include automated diagnosis
  – Tools to filter to identify contributors
• Computational feasibility in embedded software
This material is based upon work supported by the National Science Foundation under Grant No. 0355575.