Geometric margin domain description with instance-specific
margins
Adam Gripton
Thursday, 5th May, 2011
Presentation Contents
• High-level motivation
• Development of system
• Exact-centre method
• Dual-optimisation method
• Experimental data collection
• Conclusions
High-level motivation
Task as originally stated:
• Expert replacement system to deal with Non-Destructive Assay (NDA) data provided by a sponsor for analysis and classification
• Involves automatic feature extraction and inference via classification step
High-level motivation
Data consignment:

Fissile elements
• Californium-252
• Highly Enriched Uranium
• Weapons-Grade Plutonium

Shielding methods
• Aluminium
• Steel ball
• Steel planar to detector
• Lead
• CHON (high-explosive simulant)

Detectors
• Sodium Iodide scintillator (NaI)
• High-Resolution Germanium (semiconductor) spectrometer (HRGS)
• Neutron array counter (N50R)
[Diagram: source and shield arrangements measured against the NaI, HRGS and neutron detectors]
High-level motivation
Data consignment:
Spectroscopy experiments
High-level motivation
Data consignment:
Neutron multiplicity arrays:

        τ0           2·τ0         3·τ0         …
BX 0    279384403    138774738    91909165     …
BX 1    1805235      1785515      1770553      …
BX 2    49548        58784        65688        …
…       …            …            …
High-level motivation

f1      f2      f3      f4      Class
0.1     0.2     -0.3    0.4     1
0.15    x       -0.2    x       1
0.05    0.22    x       x       2
x       x       -0.4    0.401   2
0.08    0.24    -0.5    0.399   3

• Features (columns) based on physically relevant projections of raw experimental data
• Class vector: refers to fissile material or shielding method
• Some data absent: either not measured or not applicable (structurally missing)
High-level motivation
Two principal aims:
1. Devise a novel contribution to the existing literature on classification methods
2. Provide a system of classification of abstract data that is applicable to the provided dataset
Development of system
Overview

[Diagram: Aim 1 (Novel Contribution) and Aim 2 (Applicability to Dataset) overlap in three themes: Multi-Class, Kernel Methods, Missing Data]
Development of system
Overview

Related work on Multi-Class, Kernel Methods and Missing Data:
• SVDD (Tax, Duin) and Multi-Class Hybrid (Lee)
• Geometric SVM (Chechik)
Development of system
Working with Kernels
• “Kernel trick”: ML algorithms that query data values only implicitly, via the dot product
• Replace <x,y> ← k(x,y) to imitate a mapping {x → φ(x)} such that k(x,y) = <φ(x), φ(y)>
• Valid if the Mercer condition holds (the Gram matrix {k(xi,xj)} is positive semidefinite)
• Allows analysis in a complex superspace without the need to directly address its Cartesian form
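As a minimal illustration (not part of the original system), the Mercer condition can be checked empirically on a finite sample by testing whether the Gram matrix is positive semidefinite; `gram_matrix` and `is_mercer` are hypothetical helper names:

```python
import numpy as np

def gram_matrix(X, k):
    """Build the Gram matrix K[i, j] = k(x_i, x_j) for a dataset X."""
    n = len(X)
    return np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])

def is_mercer(K, tol=1e-9):
    """Empirical Mercer check: the Gram matrix must be (numerically) p.s.d."""
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

X = np.array([[0.1, 0.2], [0.15, -0.2], [0.05, 0.22]])
k_quad = lambda x, y: (1.0 + x @ y) ** 2   # quadratic kernel used later in the talk
assert is_mercer(gram_matrix(X, k_quad))
```

A symmetric matrix with a negative eigenvalue (e.g. [[0, 1], [1, 0]]) fails the check, so no valid feature map φ exists for it.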
Development of system
Support Vector Domain Description
• “One-class classification”
• Fits a sphere around a cluster of data, allowing errors {ξi}
• Extends in kernel space to a more complex boundary
• Hybrid methods: multi-class classification
Development of system
Support Vector Domain Description
The dual formulation allows the centre to be described in kernel space via weighting factors αi:

    a = Σi αi φ(xi),   with Σi αi = 1 and 0 ≤ αi ≤ C
Development of system
Support Vector Domain Description
Values of αi form a partition:
• αi = 0: inside the sphere
• 0 < αi < C: on the boundary (support vectors)
• αi = C: outside the sphere (errors)

Only the support vectors determine the size and position of the sphere
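Because the centre is a weighted sum of mapped points, squared point-to-centre distances expand entirely in kernel evaluations: ||φ(z) − a||² = k(z,z) − 2 Σi αi k(z,xi) + Σij αi αj k(xi,xj). A minimal sketch of that expansion (linear kernel for easy checking; `dist2_to_centre` is an illustrative helper, not the author's code):

```python
import numpy as np

def dist2_to_centre(z, X, alpha, k):
    """Squared kernel-space distance ||phi(z) - a||^2 for a = sum_i alpha_i phi(x_i)."""
    cross = sum(a * k(z, x) for a, x in zip(alpha, X))
    K = np.array([[k(xi, xj) for xj in X] for xi in X])
    return k(z, z) - 2.0 * cross + alpha @ K @ alpha

X = np.array([[0.0, 0.0], [2.0, 0.0]])
alpha = np.array([0.5, 0.5])          # centre = midpoint (1, 0) under the linear kernel
k_lin = lambda x, y: float(x @ y)
d2 = dist2_to_centre(np.array([1.0, 1.0]), X, alpha, k_lin)
assert abs(d2 - 1.0) < 1e-12          # (1,1) is at distance 1 from the centre (1,0)
```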
Development of system
Dealing with Missing Data
• Cannot use kernel methods directly with missing features
• Must impute (fill in) missing values or assume a probability distribution over them: a pre-processing step
• Missing features describe complex parametric curves in kernel space
• Seek a method which can address incomplete data directly: minimise point-to-line distances
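The imputation pre-processing referred to above can be sketched minimally as follows (zeros and per-feature means; `impute` is a hypothetical helper, not the study's code):

```python
import numpy as np

def impute(X, strategy="zeros"):
    """Fill NaN entries as a pre-processing step: 'zeros' or per-feature 'means'."""
    X = X.astype(float).copy()
    if strategy == "zeros":
        X[np.isnan(X)] = 0.0
    elif strategy == "means":
        col_means = np.nanmean(X, axis=0)     # mean over observed entries per feature
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = col_means[cols]
    return X

X = np.array([[0.1, 0.2], [0.3, np.nan]])
assert impute(X, "means")[1, 1] == 0.2
assert impute(X, "zeros")[1, 1] == 0.0
```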
Development of system
Dealing with Missing Data
• Chechik’s GM-SVM method provides an analogue of the binary SVM for structurally missing data
• Uses two loops of optimisation to replace instance-specific norms with scalings of the full norm
• Questionable applicability to kernel spaces: difficult to choose proper scaling terms, and ultimately equivalent to zero imputation
Development of system
Synopsis for Novel System
• Structurally missing features
• Abstract, context-free
• Domain description (one-class)
• Avoid imputation / probabilistic models
• Kernel extension
• Applicable to provided data
Exact-centre method
• Seeks a solution in input space only
• Demonstrates the concept of an optimisation-based distance metric
Exact-centre method
• Cannot sample from the entire feature space!
• Selects a centre point a such that φ(a) is the optimal centre (hence solves a slightly different problem)
• Tricky (but possible) to optimise for soft margins
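Under a linear kernel, the point-to-line distance minimisation mentioned earlier has a closed form: the missing coordinates of a point can always be completed to match the centre exactly, so only the observed coordinates contribute. A small sketch under those assumptions (axis-aligned missingness, input space only; the helper name is illustrative):

```python
import numpy as np

def min_dist_to_incomplete(a, x, observed):
    """Minimal distance from centre a to the set of completions of x:
    missing coordinates of x can be set equal to a's, so only the
    observed coordinates contribute (linear kernel / input space)."""
    a, x = np.asarray(a, float), np.asarray(x, float)
    obs = np.asarray(observed, bool)
    return float(np.linalg.norm(a[obs] - x[obs]))

# x = (3, ?, 4): the missing second coordinate may take any value
d = min_dist_to_incomplete([0.0, 7.0, 0.0], [3.0, np.nan, 4.0], [True, False, True])
assert abs(d - 5.0) < 1e-12   # 3-4-5 triangle on the observed coordinates
```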
Exact-centre method
• Always performs at least as well as imputation in linear space w.r.t. sphere volume
• Often underperforms in quadratic space (expected, as the search domain is restricted)
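The quadratic-space gap is visible from the explicit feature map of the quadratic kernel used later in the talk: the optimal kernel-space centre is a convex combination of mapped points, which generally lies off the manifold {φ(a)} that the exact-centre search is restricted to. A quick verification of the map itself in two dimensions:

```python
import numpy as np

def phi(x):
    """Explicit feature map for K(x,y) = (1 + <x,y>)^2 in two dimensions."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 * x1, x2 * x2, s * x1 * x2])

x, y = np.array([0.1, 0.2]), np.array([-0.3, 0.4])
# The explicit map reproduces the kernel value exactly
assert abs(phi(x) @ phi(y) - (1.0 + x @ y) ** 2) < 1e-12
```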
Dual-optimisation method
• Motivated by the desire to search over the entire kernel feature space, to match imputation methods for non-trivial kernel maps
• Takes its lead from the dual formulation of SVDD, where weighting factors αi are appended to the dataset and implicitly describe the centre a
Dual-optimisation method
• a must itself have full features, and therefore so must the “xi” in the sum
• Must therefore provide an auxiliary dataset X* with full features to perform this computation
• The choice is largely arbitrary, but it must span the feature space
• Weighting factors no longer “tied” to the dataset
Dual-optimisation method

Given an initial guess α:
• First produce a full dataset Xa optimally aligned to a, by optimisation over all possible imputations of the incomplete dataset
• Then perform a minimax optimisation step on the vector of point-to-centre distances, yielding a new candidate α at each optimisation step
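A toy version of the two-step idea can be sketched as follows. This is not the author's minimax procedure: the centre update here is a plain mean under a linear kernel, purely to illustrate alternating between imputing toward the current centre and re-fitting the centre:

```python
import numpy as np

def fit_centre(X_missing, n_iter=50):
    """Toy alternating scheme (linear kernel): set each NaN to the current
    centre's coordinate (the distance-minimising imputation), then recompute
    the centre (here a simple mean, standing in for the minimax step)."""
    X = np.where(np.isnan(X_missing), 0.0, X_missing)
    mask = np.isnan(X_missing)
    a = X.mean(axis=0)
    for _ in range(n_iter):
        X[mask] = np.broadcast_to(a, X.shape)[mask]   # optimal imputation toward a
        a = X.mean(axis=0)                            # candidate centre update
    return a, X

X = np.array([[0.0, 0.0], [2.0, np.nan]])
a, X_full = fit_centre(X)
assert abs(a[0] - 1.0) < 1e-9
assert abs(X_full[1, 1] - a[1]) < 1e-9   # missing value aligned with the centre
```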
Experimental data collection
Synthetic Data

Preparatory trials on datasets constructed to exhibit a degree of “structural missingness”:
• 2-D cluster of data with censoring applied to all values |x| > 1
• Two disjoint clusters: one in [f1,f2], one in [f3,f4]
• One common dimension, plus three other dimensions each common to one part of the set
Experimental data collection
Structure of comparisons:

Synthetic data, under:
• Linear kernel: K(x,y) = <x,y>
• Quadratic kernel: K(x,y) = (1 + <x,y>)²
• Hard margin (all within sphere)
• Soft margin (50% outwith sphere)

Imputation with [zeros, feature means, 3 nearest neighbours] vs. our XC and DO methods
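The nearest-neighbour baseline can be sketched as a generic k-NN imputer, with distances taken over commonly observed features (the study's exact implementation is not given in the slides; this is an illustrative version):

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill each NaN with the mean of that feature over the k nearest rows,
    measuring distance on the features both rows have observed."""
    X = X.astype(float)
    out = X.copy()
    for i in range(len(X)):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        dists = []
        for j in range(len(X)):
            common = ~np.isnan(X[i]) & ~np.isnan(X[j])
            if j == i or not common.any():
                continue
            dists.append((np.linalg.norm(X[i, common] - X[j, common]), j))
        for f in np.where(miss)[0]:
            vals = [X[j, f] for _, j in sorted(dists)[:k] if not np.isnan(X[j, f])]
            if vals:
                out[i, f] = np.mean(vals)
    return out

X = np.array([[0.0, 1.0], [0.1, 1.2], [0.2, np.nan], [5.0, 9.0]])
filled = knn_impute(X, k=2)
assert abs(filled[2, 1] - 1.1) < 1e-12  # mean of f2 over the two nearest rows
```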
Experimental data collection
Structure of comparisons (continued):
• Dual-optimisation method on hard margins only
• Particle-Swarm Optimisation also used to provide a cross-validated classification study
• Main study is into the effect on sphere size
Experimental data collection
Feature Extraction

Four main feature groups selected for analysis:
• Compton edge position (6 features)
• Area under graph up to Compton edge (6)
• Mean multiplicity of neutron data (1)
• Poisson fit on neutron data (9) and chi-squared goodness-of-fit (3)
Total: 25 features
Experimental data collection
Feature Extraction
PCA used on groups of features with identical presence flags to reduce the dataset to 10 principal components, leaving the missingness pattern intact
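A group-wise PCA that preserves structural missingness might look like the following sketch (per-group SVD; `groupwise_pca`, the grouping, and the component count are placeholders, not the study's choices):

```python
import numpy as np

def groupwise_pca(X, groups, n_comp):
    """Reduce each feature group (columns sharing a presence flag) with PCA,
    keeping the group's missingness pattern intact: absent rows stay NaN."""
    parts = []
    for cols in groups:
        G = X[:, cols]
        present = ~np.isnan(G).any(axis=1)      # rows where this group is measured
        Z = np.full((len(X), n_comp), np.nan)
        Gc = G[present] - G[present].mean(axis=0)
        _, _, Vt = np.linalg.svd(Gc, full_matrices=False)
        Z[present] = Gc @ Vt[:n_comp].T         # project onto leading components
        parts.append(Z)
    return np.hstack(parts)

X = np.array([[1.0, 2.0, 5.0],
              [2.0, 4.0, np.nan],
              [3.0, 6.0, 7.0]])
Z = groupwise_pca(X, groups=[[0, 1], [2]], n_comp=1)
assert np.isnan(Z[1, 1])        # structural missingness preserved
assert not np.isnan(Z[1, 0])    # measured group still reduced normally
```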
Conclusions
• The Dual-Opt method generally equalled or surpassed imputation methods in hard-margin cases; the XC method, predictably, did not perform as well in the quadratic case
• Unreasonably small spheres start appearing with a soft-margin classifier, as datapoints with few features start carrying too much weight
• Cross-validation study using a joint optimiser shows improvement with quadratic kernel
Conclusions
• Insight provided into the behaviour of a kernel method with missing data – not much literature deals with this issue
• A link exists with the Randomised Maximum Likelihood (RML) sampling technique
• Deliberate concentration for now on entirely uninformed methods; scope exists to incorporate prior information, where known, to improve efficiency
Conclusions

Caveats
• Sphere size ≠ overall classification accuracy (cf. a delta-function Parzen window), but this is arguably not what we set out to achieve
• Divergent remit: not a catch-all procedure for handling all types of data, but it gives insight into how structural missingness can be analysed
Conclusions

Room for Improvement
• Fuller exploration of the PSJO technique to provide an alternative to the auxiliary dataset
• Heavily reliant on optimisation procedures: could be made more efficient than a nested loop
• Extension to the popular radial-basis function (RBF) kernel
• A more concrete application to the sponsor dataset
Thank you for listening…