Principal Component Analysis for Novelty Detection
A journal article submitted to and accepted by Pattern Recognition Letters
Jordan McBain, P.Eng.
Markus Timusk, PhD, P.Eng.
Condition Monitoring
A maintenance technique: maintenance undertaken when some indicator of health is flagged
An advanced technique, employed when a cost-benefit analysis justifies the expense of monitoring equipment
An alternative to run-to-failure maintenance and statistically determined time-based maintenance
Employ pattern recognition to automate diagnosis
An expert system is employed to replicate the technician's maintenance insight
A computer and sensors replace the technician and the screwdriver set atop a vibrating machine; the nature of the vibration is used to discern the machine's state
Pattern Recognition
Equality is an insufficient means of classifying real-world members of a class (noise, variance, etc.)
Real-world signals presumed representative of a class are reduced to representative n-dimensional feature vectors
These are plotted in n-dimensional space
A decision boundary is generated with pattern-recognition techniques and employed as the classification rule
Problems
Choice of features: how representative? Maximize the number of features? The curse of dimensionality argues against it
Imbalance of data
Principal Component Analysis
One technique used to find an "optimal" set of features
Finds the axes of normally distributed data
Select the largest axes and omit the smaller ones to define a new basis
Project the data onto this basis to reduce the dimensionality of the problem space
Each feature is presumed to be normally distributed
The n-dimensional scattering of features is presumed independent
Combined probability: $P(A \cap B) = P(A)\,P(B)$

$$p(x) = \prod_{i=1}^{d} p(x_i) = \prod_{i=1}^{d} \frac{1}{(2\pi)^{1/2}\,\sigma_i}\, e^{-\frac{(x_i - m_i)^2}{2\sigma_i^2}} = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x - m)^{T}\Sigma^{-1}(x - m)}$$

where $\Sigma = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_d^2)$ under the independence assumption.
Find the principal components (i.e., the axes of the hyper-ellipsoidal distribution)
Select maximum variance (the largest axes)
Eigenvalue problem: the eigenvectors are the principal components; the eigenvalues give the size of each axis
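To make the recipe concrete, a minimal numpy sketch of PCA by eigendecomposition (the function name and interface are illustrative, not the authors' code):

```python
import numpy as np

def pca_basis(X, k):
    """Principal components via eigendecomposition of the sample covariance."""
    Xc = X - X.mean(axis=0)              # center the data at its mean
    C = Xc.T @ Xc / (len(X) - 1)         # sample covariance matrix
    vals, vecs = np.linalg.eigh(C)       # eigh: symmetric matrix, ascending order
    order = np.argsort(vals)[::-1]       # largest axes (variances) first
    return vecs[:, order[:k]], vals[order[:k]]

# Reduce dimensionality by projecting onto the k largest axes:
# W, sizes = pca_basis(X, k=2)
# Z = (X - X.mean(axis=0)) @ W
```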
Novelty Detection
Deals with the imbalance of data between classes
Fault detection in machinery:
It is easy to collect data representative of the healthy state
It is difficult to collect data representative of faulted states: it is costly to break machinery, operationally unacceptable, poor databases of faults are kept, and we can never capture them all!
Model the healthy data with a decision boundary; if test patterns fall outside, classify them as a fault!
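For illustration, one simple way to realize such a boundary (not necessarily the classifier used in this work) is a Mahalanobis-distance threshold fitted to the healthy data; a minimal sketch:

```python
import numpy as np

def fit_healthy(X):
    """Model the healthy class by its mean and (inverse) covariance."""
    m = X.mean(axis=0)
    C_inv = np.linalg.inv(np.cov(X, rowvar=False))
    return m, C_inv

def is_fault(x, m, C_inv, threshold):
    """Classify a test pattern as a fault if it falls outside the boundary."""
    d2 = (x - m) @ C_inv @ (x - m)       # squared Mahalanobis distance
    return d2 > threshold                # threshold chosen from healthy data
```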
Problem
PCA is best for selecting the subspace that best represents the data
In pattern recognition, we seek to discriminate between classes
The objectives of most feature-reduction techniques are not optimized for novelty detection
Feature Reduction Techniques
Feature Reduction Techniques
Feature selection vs. feature extraction
Selection: choosing small subsets of features that are adequate to describe the classes
E.g., "search": examine all subsets of feature combinations to find the one that maximizes some objective function (classifier error may be employed as the objective function)
The number of subsets explodes exponentially; heuristics can mitigate this
If computationally feasible, selection gives the best results (see the sketch below)
Extraction: computes a small number of new features from the set of old features
E.g., PCA
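A minimal sketch of such an exhaustive search (the helper name and the objective-function interface are assumptions):

```python
from itertools import combinations

def exhaustive_search(features, score):
    """Score every subset of features and keep the best one.
    The number of subsets grows exponentially with len(features)."""
    best, best_score = None, float("-inf")
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            s = score(subset)            # e.g., negative classifier error
            if s > best_score:
                best, best_score = subset, s
    return best, best_score
```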
Principal Component Analysis
Seeks a subspace in which the data representation error is minimal
Development
For a set of n vectors in d-dimensional space, seek the equation of a hyperplane onto which the data may be projected with minimal representation error
The hyperplane is fixed at the data's mean, m
The hyperplane's orientation is defined by a direction vector, w (the normal definition of a plane)
Derive the error function; the optimization problem is a well-known eigenvalue problem (a sketch of the derivation follows this slide)
The resultant feature space is linear and may not represent non-linear or changing data well; kernel PCA and dynamic PCA address this
These techniques are suitable only for representing data, not for discriminating between classes
Source: Duda, 2000
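A sketch of the derivation, following Duda (2000): write each point as $m + a_k w$, minimize the squared representation error over the coefficients $a_k$ and the direction $w$, and the problem reduces to an eigenvalue problem on the scatter matrix $S = \sum_k (x_k - m)(x_k - m)^{T}$:

$$J(w) = \sum_{k=1}^{n} \left\| (m + a_k w) - x_k \right\|^2,\qquad a_k = w^{T}(x_k - m) \;\Longrightarrow\; \min_{\|w\|=1} J(w) \equiv \max_{\|w\|=1} w^{T} S\, w \;\Longrightarrow\; S w = \lambda w$$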
Multiple Discriminant Analysis
Seeks subspaces that are efficient for discrimination rather than representation
Development
Two-class problem with a d-dimensional set of n vectors grouped into D1 and D2
The vectors are projected onto some direction vector w to give scalar samples $y = w^{T}x$
These are consequently grouped into subsets Y1 and Y2
Find the direction vector w such that the distance between the projected sample means m1 and m2 is maximized
Rationalize the distance against the relative scatter of the samples
Reduces to the criterion $J(w) = \dfrac{w^{T} S_B\, w}{w^{T} S_W\, w}$, whose solution is described as "analogous to the well-known Rayleigh quotient"
The technique extends to problems with n classes; the objective is then to maximize the spread between all classes in the projected space
For the two-class case, the optimal direction is:
$$w = S_W^{-1}(m_1 - m_2)$$
Source: Duda, 2000
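A minimal numpy sketch of the two-class solution above (illustrative, not the authors' implementation):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Two-class MDA/Fisher direction: w = Sw^{-1} (m1 - m2)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - m1).T @ (X1 - m1)               # class-1 within-class scatter
    S2 = (X2 - m2).T @ (X2 - m2)               # class-2 within-class scatter
    return np.linalg.solve(S1 + S2, m1 - m2)   # solves Sw w = m1 - m2
```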
Extraction for Novelty Detection
Development
Objective: distinguish between normal and abnormal classes
KFDA is inappropriate (it assumes the classes group well into separate clusters)
In novelty detection, classes may cluster well, but the abnormal classes are expected to orbit the normal data
The means could overlap, eliminating the previous objective functions
Approach: find the subspace maximizing the difference between the average spread of the normal class and the average spread of the abnormal class, measured from the mean of the normal class
Mathematically, for an outlier class containing b elements and a target class containing a elements with mean m_t
To simplify, introduce the outlier scatter matrix, O, for the outlier data centered at m_t
Reducing to the objective $J(w) = w^{T}(S_t - O)\,w$
Maximize this objective function: find the eigenvectors and eigenvalues of the matrix $S_t - O$
Select the k largest eigenvalues and use the corresponding eigenvectors as the new basis
Project the data onto the new basis and proceed with classification (a sketch follows below)
Limitations
Still dependent on the assumption of normally distributed data (as are the other PCA techniques)
Assumption: the normal data scatter somewhat circularly and the outlier data orbit them without intruding (as with PCA and MDA)
Machinery vibration data are not, in general, Gaussian (a heuristic observation)
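A minimal numpy sketch of the proposed extraction as summarized above (the function name and the per-class averaging convention are assumptions):

```python
import numpy as np

def novelty_basis(X_target, X_outlier, k):
    """Eigendecompose S_t - O, both scatters centered at the target mean m_t."""
    m_t = X_target.mean(axis=0)                 # mean of the normal (target) class
    Xt = X_target - m_t
    Xo = X_outlier - m_t                        # outliers centered at the *target* mean
    S_t = Xt.T @ Xt / len(X_target)             # average target scatter
    O = Xo.T @ Xo / len(X_outlier)              # average outlier scatter
    vals, vecs = np.linalg.eigh(S_t - O)        # symmetric eigenvalue problem
    order = np.argsort(vals)[::-1]              # k largest eigenvalues first
    return vecs[:, order[:k]]

# Project all data onto the new basis, then proceed with classification:
# W = novelty_basis(X_healthy, X_faulty, k=2); Z = X @ W
```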
Validation: Artificial Data
Artificial 3-D data set (a generation sketch follows below)
Normal distribution: spherical (radius 50), centered at the origin
Outlier distribution: a randomly generated spherical distribution (radius 100), not permitted to fall within a cylinder concentric with the normal data's sphere and oriented with its length parallel to [1,1,1]
Validation: Artificial Data
Results (reduced to 2 dimensions): the subspace's normal vector is only 7 degrees off from the expected [1,1,1]
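A sketch of how such a data set might be generated (uniform-in-sphere sampling and the cylinder radius are assumptions not fixed by the text above):

```python
import numpy as np

rng = np.random.default_rng(0)
axis = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)   # cylinder axis direction

def sphere(n, radius):
    """Points drawn uniformly inside a 3-D sphere."""
    v = rng.normal(size=(n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return v * (radius * rng.random((n, 1)) ** (1 / 3))

def outliers(n, radius=100.0, cyl_radius=50.0):
    """Rejection-sample outliers outside a cylinder concentric with the origin."""
    pts = []
    while len(pts) < n:
        p = sphere(1, radius)[0]
        if np.linalg.norm(p - (p @ axis) * axis) > cyl_radius:  # distance to axis
            pts.append(p)
    return np.array(pts)

X_normal, X_outlier = sphere(500, 50.0), outliers(500)
```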
Experimental Methodology
Apparatus
SpectraQuest gear dynamics simulator
3-hp motor
Magnetic particle brake loading
National Instruments PXI data acquisition and control
Accelerometers (sampled at 4 kHz)
Faults
4 motors employed: healthy, combo bearing faults, broken rotor bars, rotor unbalance
Gearbox faults: fault-free condition, gear with a missing tooth, chipped tooth, bearing with outer-race faults, bearings with inner- and outer-race faults
Feature Extraction
Autoregressive (AR) model
A model of a statistical process generated by regressing previous values of that process against itself
A model of the sampled signal that best represents the original samples
Order 10 (the standard form is given below)
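For reference, the standard form of an order-p autoregressive model, which the order-10 model above instantiates; the fitted coefficients $a_i$ serve as the feature vector:

$$x_t = \sum_{i=1}^{p} a_i\, x_{t-i} + \varepsilon_t, \qquad p = 10$$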
Segmentation
Vibration data segmented into groups based on intervals with a constant number of shaft rotations
Gaussian window; 70% overlap between segments (a sketch follows below)
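A minimal sketch of such windowed segmentation (fixed-length segments and the window width are simplifying assumptions; the text above segments by shaft rotations):

```python
import numpy as np

def segment(x, seg_len, overlap=0.7):
    """Split a signal into Gaussian-windowed segments with 70% overlap."""
    step = max(1, int(seg_len * (1 - overlap)))
    n = np.arange(seg_len)
    sigma = 0.2 * seg_len                        # window width: an assumption
    w = np.exp(-0.5 * ((n - (seg_len - 1) / 2) / sigma) ** 2)
    return np.array([x[i:i + seg_len] * w
                     for i in range(0, len(x) - seg_len + 1, step)])
```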
Results: Proposed Algorithm
Results: Kernel PCA
Results: Kernel FDA (N.B. potential for singular matrices)
Results: Exhaustive Feature Search
Feature Extraction in the Absence of Outliers
Motivation and Development
The above violates an assumption of novelty detection: only limited data from the fault classes are available
In the case where we know nothing of the outlier classes, work with what we have: the normal data
Minimize the variance of the normal data (formalized below)
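A minimal formalization, assuming "minimize variance" means keeping the directions along which the normal-data scatter $S_t$ is smallest:

$$\min_{\|w\|=1} w^{T} S_t\, w \;\Longrightarrow\; S_t w = \lambda w,\quad \text{keeping the eigenvectors with the smallest } \lambda$$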
Results: Novelty Reduction (Outlier Absence)
Conclusions
Conclusions
Reduce a large feature space to a smaller one
Mitigates the curse of dimensionality
Objective function tailored for novelty detection
Similar to MDA, but modified to accommodate the case where the normal and outlier means are closely separated
Results are good for both the artificial and the machinery data
Future work
Extend the technique with kernels; a difficult problem due to the need for a mean
Thanks
CEMI
Dr. Mechefske, Queen's