Geometric margin domain description with instance-specific
margins
Adam Gripton
Thursday, 5th May, 2011
Presentation Contents
• High-level motivation
• Development of system
• Exact-centre method
• Dual-optimisation method
• Experimental data collection
• Conclusions
High-level motivation
Task as originally stated:
• Expert replacement system to deal with Non-Destructive Assay (NDA) data provided by a sponsor for analysis and classification
• Involves automatic feature extraction and inference via classification step
High-level motivation
Data consignment:

Fissile elements
• Californium-252
• Highly Enriched Uranium
• Weapons-Grade Plutonium

Shielding methods
• Aluminium
• Steel ball
• Steel planar to detector
• Lead
• CHON (high-explosive simulant)

Detectors
• Sodium Iodide scintillator (NaI)
• High-Resolution Germanium (semiconductor) spectrometer (HRGS)
• Neutron array counter (N50R)
[Diagram: source and shield arrangements measured against the NaI, HRGS and neutron detectors]
High-level motivation
Data consignment:
Spectroscopy experiments
High-level motivation
Data consignment:
Neutron multiplicity arrays:

        τ0           2·τ0         3·τ0         …
BX 0    279384403    138774738    91909165     …
BX 1    1805235      1785515      1770553      …
BX 2    49548        58784        65688        …
…       …            …            …
High-level motivation

f1      f2      f3      f4      Class
0.1     0.2     -0.3    0.4     1
0.15    x       -0.2    x       1
0.05    0.22    x       x       2
x       x       -0.4    0.401   2
0.08    0.24    -0.5    0.399   3

• Features (columns) based on physically relevant projections of raw experimental data
• Class vector: refers to fissile material or shielding method
• Some data absent: either not measured or not applicable (structurally missing)
High-level motivation
Two principal aims:
1. Devise a novel contribution to the existing literature on classification methods
2. Provide a system of classification of abstract data that is applicable to the provided dataset
Development of system
Overview

[Diagram: Aim 1 (Novel Contribution) and Aim 2 (Applicability to Dataset) overlap in three themes: Multi-Class, Kernel Methods, Missing Data]
Development of system
Overview

Related work on Multi-Class, Kernel Methods and Missing Data:
• SVDD (Tax, Duin) and Multi-Class Hybrid (Lee)
• Geometric SVM (Chechik)
Development of system
Working with Kernels
• “Kernel trick”: ML algorithms that query data values only implicitly, via the dot product
• Replace <x,y> ← k(x,y) to imitate a mapping {x → φ(x)} such that k(x,y) = <φ(x), φ(y)>
• Valid if the Mercer condition holds (the Gram matrix {k(xi,xj)} is positive semidefinite)
• Allows analysis in a complex superspace without the need to directly address its Cartesian form
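As a minimal illustration (not part of the original system), the Mercer condition can be checked empirically on a finite sample by testing whether the Gram matrix is positive semidefinite; `gram_matrix` and `is_mercer` are hypothetical helper names:

```python
import numpy as np

def gram_matrix(X, k):
    """Build the Gram matrix K[i, j] = k(x_i, x_j) for a dataset X."""
    n = len(X)
    return np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])

def is_mercer(K, tol=1e-9):
    """Empirical Mercer check: the Gram matrix must be (numerically) p.s.d."""
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

X = np.array([[0.1, 0.2], [0.15, -0.2], [0.05, 0.22]])
k_quad = lambda x, y: (1.0 + x @ y) ** 2   # quadratic kernel used later in the talk
assert is_mercer(gram_matrix(X, k_quad))
```

A symmetric matrix with a negative eigenvalue (e.g. [[0, 1], [1, 0]]) fails the check, so no valid feature map φ exists for it.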
Development of system
Support Vector Domain Description
• “One-class classification”
• Fits a sphere around a cluster of data, allowing errors {ξi}
• Extends in kernel space to a more complex boundary
• Hybrid methods: multi-class classification
Development of system
Support Vector Domain Description
The dual formulation allows the centre to be described in kernel space via weighting factors αi:

    a = Σi αi φ(xi),   with Σi αi = 1 and 0 ≤ αi ≤ C
Development of system
Support Vector Domain Description
Values of αi form a partition:
• αi = 0: inside the sphere
• 0 < αi < C: on the boundary (support vectors)
• αi = C: outside the sphere (errors)

Only the support vectors determine the size and position of the sphere
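Because the centre is a weighted sum of mapped points, squared point-to-centre distances expand entirely in kernel evaluations: ||φ(z) − a||² = k(z,z) − 2 Σi αi k(z,xi) + Σij αi αj k(xi,xj). A minimal sketch of that expansion (linear kernel for easy checking; `dist2_to_centre` is an illustrative helper, not the author's code):

```python
import numpy as np

def dist2_to_centre(z, X, alpha, k):
    """Squared kernel-space distance ||phi(z) - a||^2 for a = sum_i alpha_i phi(x_i)."""
    cross = sum(a * k(z, x) for a, x in zip(alpha, X))
    K = np.array([[k(xi, xj) for xj in X] for xi in X])
    return k(z, z) - 2.0 * cross + alpha @ K @ alpha

X = np.array([[0.0, 0.0], [2.0, 0.0]])
alpha = np.array([0.5, 0.5])          # centre = midpoint (1, 0) under the linear kernel
k_lin = lambda x, y: float(x @ y)
d2 = dist2_to_centre(np.array([1.0, 1.0]), X, alpha, k_lin)
assert abs(d2 - 1.0) < 1e-12          # (1,1) is at distance 1 from the centre (1,0)
```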
Development of system
Dealing with Missing Data
• Cannot use kernel methods directly with missing features
• Must impute (fill in) missing values or assume a probability distribution over them: a pre-processing step
• Missing features describe complex parametric curves in kernel space
• Seek a method which can address incomplete data directly: minimise point-to-line distances
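The imputation pre-processing referred to above can be sketched minimally as follows (zeros and per-feature means; `impute` is a hypothetical helper, not the study's code):

```python
import numpy as np

def impute(X, strategy="zeros"):
    """Fill NaN entries as a pre-processing step: 'zeros' or per-feature 'means'."""
    X = X.astype(float).copy()
    if strategy == "zeros":
        X[np.isnan(X)] = 0.0
    elif strategy == "means":
        col_means = np.nanmean(X, axis=0)     # mean over observed entries per feature
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = col_means[cols]
    return X

X = np.array([[0.1, 0.2], [0.3, np.nan]])
assert impute(X, "means")[1, 1] == 0.2
assert impute(X, "zeros")[1, 1] == 0.0
```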
Development of system
Dealing with Missing Data
• Chechik’s GM-SVM method provides an analogue of the binary SVM for structurally missing data
• Uses two loops of optimisation to replace instance-specific norms with scalings of the full norm
• Questionable applicability to kernel spaces: difficult to choose proper scaling terms, and ultimately equivalent to zero imputation
Development of system
Synopsis for Novel System
• Structurally missing features
• Abstract, context-free
• Domain description (one-class)
• Avoid imputation / probabilistic models
• Kernel extension
• Applicable to provided data
Exact-centre method
• Seeks a solution in input space only
• Demonstrates the concept of an optimisation-based distance metric
Exact-centre method
• Cannot sample from the entire feature space!
• Selects a centre point a such that φ(a) is the optimal centre (hence solves a slightly different problem)
• Tricky (but possible) to optimise for soft margins
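Under a linear kernel, the point-to-line distance minimisation mentioned earlier has a closed form: the missing coordinates of a point can always be completed to match the centre exactly, so only the observed coordinates contribute. A small sketch under those assumptions (axis-aligned missingness, input space only; the helper name is illustrative):

```python
import numpy as np

def min_dist_to_incomplete(a, x, observed):
    """Minimal distance from centre a to the set of completions of x:
    missing coordinates of x can be set equal to a's, so only the
    observed coordinates contribute (linear kernel / input space)."""
    a, x = np.asarray(a, float), np.asarray(x, float)
    obs = np.asarray(observed, bool)
    return float(np.linalg.norm(a[obs] - x[obs]))

# x = (3, ?, 4): the missing second coordinate may take any value
d = min_dist_to_incomplete([0.0, 7.0, 0.0], [3.0, np.nan, 4.0], [True, False, True])
assert abs(d - 5.0) < 1e-12   # 3-4-5 triangle on the observed coordinates
```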
Exact-centre method
• Always performs at least as well as imputation in linear space w.r.t. sphere volume
• Often underperforms in quadratic space (expected, as the search domain is restricted)
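The quadratic-space gap is visible from the explicit feature map of the quadratic kernel used later in the talk: the optimal kernel-space centre is a convex combination of mapped points, which generally lies off the manifold {φ(a)} that the exact-centre search is restricted to. A quick verification of the map itself in two dimensions:

```python
import numpy as np

def phi(x):
    """Explicit feature map for K(x,y) = (1 + <x,y>)^2 in two dimensions."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 * x1, x2 * x2, s * x1 * x2])

x, y = np.array([0.1, 0.2]), np.array([-0.3, 0.4])
# The explicit map reproduces the kernel value exactly
assert abs(phi(x) @ phi(y) - (1.0 + x @ y) ** 2) < 1e-12
```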
Dual-optimisation method
• Motivated by the desire to search over the entire kernel feature space, to match imputation methods for non-trivial kernel maps
• Takes its lead from the dual formulation of SVDD, where weighting factors αi are appended to the dataset and implicitly describe the centre a
Dual-optimisation method
• a must itself have full features, and therefore so must the “xi” in the sum
• Must therefore provide an auxiliary dataset X* with full features to perform this computation
• The choice is largely arbitrary, but it must span the feature space
• Weighting factors no longer “tied” to the dataset
Dual-optimisation method

Given an initial guess α:
• First produce a full dataset Xa optimally aligned to a, by optimisation over all possible imputations of the incomplete dataset
• Then perform a minimax optimisation step on the vector of point-to-centre distances, yielding a new candidate α at each optimisation step
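A toy version of the two-step idea can be sketched as follows. This is not the author's minimax procedure: the centre update here is a plain mean under a linear kernel, purely to illustrate alternating between imputing toward the current centre and re-fitting the centre:

```python
import numpy as np

def fit_centre(X_missing, n_iter=50):
    """Toy alternating scheme (linear kernel): set each NaN to the current
    centre's coordinate (the distance-minimising imputation), then recompute
    the centre (here a simple mean, standing in for the minimax step)."""
    X = np.where(np.isnan(X_missing), 0.0, X_missing)
    mask = np.isnan(X_missing)
    a = X.mean(axis=0)
    for _ in range(n_iter):
        X[mask] = np.broadcast_to(a, X.shape)[mask]   # optimal imputation toward a
        a = X.mean(axis=0)                            # candidate centre update
    return a, X

X = np.array([[0.0, 0.0], [2.0, np.nan]])
a, X_full = fit_centre(X)
assert abs(a[0] - 1.0) < 1e-9
assert abs(X_full[1, 1] - a[1]) < 1e-9   # missing value aligned with the centre
```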
Experimental data collection
Synthetic Data

Preparatory trials on datasets constructed to exhibit a degree of “structural missingness”:
• 2-D cluster of data with censoring applied to all values |x| > 1
• Two disjoint clusters: one in [f1,f2], one in [f3,f4]
• One common dimension, plus three other dimensions each common to one part of the set
Experimental data collection
Structure of comparisons:

Synthetic data, under:
• Linear kernel: K(x,y) = <x,y>
• Quadratic kernel: K(x,y) = (1 + <x,y>)²
• Hard margin (all within sphere)
• Soft margin (50% outwith sphere)

Imputation with [zeros, feature means, 3 nearest neighbours] vs. our XC and DO methods
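The nearest-neighbour baseline can be sketched as a generic k-NN imputer, with distances taken over commonly observed features (the study's exact implementation is not given in the slides; this is an illustrative version):

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill each NaN with the mean of that feature over the k nearest rows,
    measuring distance on the features both rows have observed."""
    X = X.astype(float)
    out = X.copy()
    for i in range(len(X)):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        dists = []
        for j in range(len(X)):
            common = ~np.isnan(X[i]) & ~np.isnan(X[j])
            if j == i or not common.any():
                continue
            dists.append((np.linalg.norm(X[i, common] - X[j, common]), j))
        for f in np.where(miss)[0]:
            vals = [X[j, f] for _, j in sorted(dists)[:k] if not np.isnan(X[j, f])]
            if vals:
                out[i, f] = np.mean(vals)
    return out

X = np.array([[0.0, 1.0], [0.1, 1.2], [0.2, np.nan], [5.0, 9.0]])
filled = knn_impute(X, k=2)
assert abs(filled[2, 1] - 1.1) < 1e-12  # mean of f2 over the two nearest rows
```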
Experimental data collection
Structure of comparisons (continued):
• Dual-optimisation method on hard margins only
• Particle-Swarm Optimisation also used to provide a cross-validated classification study
• Main study is into the effect on sphere size
Experimental data collection
Feature Extraction

Four main feature groups selected for analysis:
• Compton edge position (6 features)
• Area under graph up to Compton edge (6)
• Mean multiplicity of neutron data (1)
• Poisson fit on neutron data (9) and chi-squared goodness-of-fit (3)
Total: 25 features
Experimental data collection
Feature Extraction
PCA used on groups of features with identical presence flags to reduce the dataset to 10 principal components, leaving the missingness pattern intact
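A group-wise PCA that preserves structural missingness might look like the following sketch (per-group SVD; `groupwise_pca`, the grouping, and the component count are placeholders, not the study's choices):

```python
import numpy as np

def groupwise_pca(X, groups, n_comp):
    """Reduce each feature group (columns sharing a presence flag) with PCA,
    keeping the group's missingness pattern intact: absent rows stay NaN."""
    parts = []
    for cols in groups:
        G = X[:, cols]
        present = ~np.isnan(G).any(axis=1)      # rows where this group is measured
        Z = np.full((len(X), n_comp), np.nan)
        Gc = G[present] - G[present].mean(axis=0)
        _, _, Vt = np.linalg.svd(Gc, full_matrices=False)
        Z[present] = Gc @ Vt[:n_comp].T         # project onto leading components
        parts.append(Z)
    return np.hstack(parts)

X = np.array([[1.0, 2.0, 5.0],
              [2.0, 4.0, np.nan],
              [3.0, 6.0, 7.0]])
Z = groupwise_pca(X, groups=[[0, 1], [2]], n_comp=1)
assert np.isnan(Z[1, 1])        # structural missingness preserved
assert not np.isnan(Z[1, 0])    # measured group still reduced normally
```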
Conclusions
• The Dual-Opt method generally equalled or surpassed imputation methods in hard-margin cases; the XC method, predictably, did not perform as well in the quadratic case
• Unreasonably small spheres start appearing with a soft-margin classifier, as datapoints with few features start carrying too much weight
• Cross-validation study using a joint optimiser shows improvement with quadratic kernel
Conclusions
• Insight provided into the behaviour of a kernel method with missing data – not much literature deals with this issue
• A link exists with the Randomised Maximum Likelihood (RML) sampling technique
• Deliberate concentration for now on entirely uninformed methods; scope exists to incorporate prior information, where known, to improve efficiency
Conclusions

Caveats
• Sphere size ≠ overall classification accuracy (cf. a delta-function Parzen window), but this is arguably not what we set out to achieve
• Divergent remit: not a catch-all procedure for handling all types of data, but it gives insight into how structural missingness can be analysed
Conclusions

Room for Improvement
• Fuller exploration of the PSJO technique to provide an alternative to the auxiliary dataset
• Heavily reliant on optimisation procedures: could be made more efficient than a nested loop
• Extension to the popular radial-basis function (RBF) kernel
• A more concrete application to the sponsor dataset
Thank you for listening…