
Object Orie’d Data Analysis, Last Time

• Kernel Embedding
  – Embed data in higher dimensional manifold
  – Gives greater flexibility to linear methods

• Support Vector Machines
  – Aimed at very non-Gaussian Data
  – E.g. from Kernel Embedding

• Distance Weighted Discrimination
  – HDLSS Improvement of SVM

Support Vector Machines

Graphical View, using Toy Example:

• Find separating plane

• To maximize distances from data to plane

• In particular smallest distance

• Data points closest are called support vectors

• Gap between is called margin
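The quantities named on this slide can be computed directly for any candidate plane. The following is a minimal sketch in Python with NumPy, assuming a small made-up data set and a hand-picked plane `(w, b)`. It is not an SVM fit, and the names `signed_distances`, `X`, `y` are illustrative only.

```python
import numpy as np

def signed_distances(X, y, w, b):
    """Distance of each point to the plane {x : w.x + b = 0}.

    The value is positive when the point lies on the side matching its label (+1 / -1).
    """
    w = np.asarray(w, dtype=float)
    return y * (X @ w + b) / np.linalg.norm(w)

# Hypothetical toy data (not the data set shown in the slides).
X = np.array([[2.0, 0.0], [3.0, 1.0], [4.0, -1.0],
              [-2.0, 0.0], [-3.0, 1.0], [-5.0, -1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# Hand-picked candidate plane x_1 = 0 (an assumption, not the SVM solution).
w, b = np.array([1.0, 0.0]), 0.0

r = signed_distances(X, y, w, b)
smallest = r.min()                                  # smallest distance from data to plane
support = np.flatnonzero(np.isclose(r, smallest))   # indices of the closest points
gap = r[y == 1].min() + r[y == -1].min()            # width of the gap between the classes

print("smallest distance:", smallest)
print("closest points (support vector candidates):", support)
print("gap (margin):", gap)
```

An SVM would search over `(w, b)` to make `smallest` as large as possible. Here the plane is fixed, so the sketch only illustrates the definitions.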


Support Vector Machines

Forgotten last time, Important Extension: Multi-Class SVMs

Hsu & Lin (2002)

Lee, Lin, & Wahba (2002)

• Defined for “implicit” version

• “Direction Based” variation???

Support Vector Machines

Also forgotten last time, toy examples illustrating Explicit vs. Implicit Kernel Embedding

As well as effect of window width, σ, on Gaussian kernel embedding

SVMs, Comput’n & Embedding

For an “Embedding Map” $\Phi(x)$, e.g. $x \mapsto (x, x^2)^T$

Explicit Embedding:

Maximize:

$$L_D(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, \Phi(x_i)^T \Phi(x_j)$$

Get classification function:

$$f(x) = \sum_{i=1}^n \alpha_i y_i \, \Phi(x_i)^T \Phi(x) + b$$

• Straightforward application of embedding

• But loses inner product advantage
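To make the explicit embedding concrete, here is a small sketch, assuming the quadratic map x ↦ (x, x²) applied to made-up 1-d data. The data values, the helper names `embed` and `separable_by_threshold`, and the hand-picked plane are all assumptions for illustration. It shows data that no single cut point on the line can separate becoming linearly separable after embedding.

```python
import numpy as np

def embed(x):
    """Explicit embedding map x -> (x, x^2) for 1-d inputs."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x, x ** 2])

def separable_by_threshold(x, y):
    """True if a single cut point on the line puts each class on its own side."""
    order = np.argsort(x)
    label_changes = np.count_nonzero(np.diff(y[order]) != 0)
    return label_changes <= 1

# Hypothetical 1-d data: class +1 sits in the middle, class -1 on both sides.
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.4, 2.0, 3.5])
y = np.where(np.abs(x) < 1.0, 1, -1)

Z = embed(x)                          # 2-d embedded data

# Hand-picked plane in the embedded space: f(z) = 2 - z_2, i.e. the cut x^2 = 2.
w, b = np.array([0.0, -1.0]), 2.0
predicted = np.sign(Z @ w + b)

print("separable on the line:", separable_by_threshold(x, y))
print("separated after embedding:", np.array_equal(predicted, y))
```

A linear rule in the embedded space corresponds to a nonlinear rule (here, an interval) in the original space, which is the extra flexibility the embedding provides.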

SVMs, Comput’n & Embedding

Implicit Embedding:

Maximize (with a kernel $K(x_i, x_j)$ in place of the inner product $x_i^T x_j$):

$$L_D(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, K(x_i, x_j)$$

Get classification function:

$$f(x) = \sum_{i=1}^n \alpha_i y_i \, K(x_i, x) + b$$

• Still defined only via inner products
• Retains optimization advantage
• Thus used very commonly
• Comparison to explicit embedding?
• Which is “better”???
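The relationship between the explicit and implicit versions can be checked numerically when the embedding is finite dimensional. The sketch below swaps in a degree-2 polynomial kernel (not the Gaussian kernel used in the toy examples), because its explicit feature map can be written down. The weights `alpha` and offset `b` are arbitrary illustrative numbers, not the solution of the dual problem.

```python
import numpy as np

def poly_kernel(A, B):
    """Implicit version: K(a, b) = (a.b + 1)^2, computed from inner products only."""
    return (A @ B.T + 1.0) ** 2

def poly_features(A):
    """Explicit map Phi for 2-d inputs whose inner products reproduce poly_kernel."""
    x1, x2 = A[:, 0], A[:, 1]
    s = np.sqrt(2.0)
    return np.column_stack([np.ones(len(A)), s * x1, s * x2,
                            x1 ** 2, x2 ** 2, s * x1 * x2])

def decision_implicit(x_new, X, y, alpha, b):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b."""
    k = poly_kernel(X, np.atleast_2d(x_new))[:, 0]
    return float((alpha * y) @ k + b)

def decision_explicit(x_new, X, y, alpha, b):
    """f(x) = sum_i alpha_i y_i Phi(x_i).Phi(x) + b."""
    phi_new = poly_features(np.atleast_2d(x_new))[0]
    return float((alpha * y) @ (poly_features(X) @ phi_new) + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                    # hypothetical training points
y = np.array([1, -1, 1, -1, 1])
alpha = np.array([0.5, 0.2, 0.0, 0.3, 0.1])    # arbitrary weights, NOT a dual solution
b = 0.1
x_new = np.array([0.3, -1.2])

gram_implicit = poly_kernel(X, X)
gram_explicit = poly_features(X) @ poly_features(X).T
f_implicit = decision_implicit(x_new, X, y, alpha, b)
f_explicit = decision_explicit(x_new, X, y, alpha, b)

print("Gram matrices agree:", np.allclose(gram_implicit, gram_explicit))
print("decision values agree:", np.isclose(f_implicit, f_explicit))
```

The implicit route never forms the 6-dimensional features, which is the computational advantage the slide refers to.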

Support Vector Machines

Target Toy Data set: (figure)

Explicit Embedding, window σ = 0.1: (figure)

Explicit Embedding, window σ = 1: (figure)

Explicit Embedding, window σ = 10: (figure)

Explicit Embedding, window σ = 100: (figure)

Support Vector Machines

Notes on Explicit Embedding:

• Too small σ → poor generalizability

• Too big σ → miss important regions

• Classical lessons from kernel smoothing

• Surprisingly large “reasonable region”

• I.e. parameter less critical (sometimes?)

Also explore projections (in kernel space)

Support Vector Machines

Kernel space projection, window σ = 0.1: (figure)

Kernel space projection, window σ = 1: (figure)

Kernel space projection, window σ = 10: (figure)

Kernel space projection, window σ = 100: (figure)

Support Vector Machines

Notes on Kernel space projection:

• Too small σ
  – Great separation
  – But recall, poor generalizability

• Too big σ → no longer separable

• As above:
  – Classical lessons from kernel smoothing
  – Surprisingly large “reasonable region”
  – I.e. parameter less critical (sometimes?)

Also explore projections (in kernel space)
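One way to see why the window width matters is to look at the Gaussian kernel matrix itself. The sketch below assumes the common parametrization K(x, z) = exp(−‖x − z‖² / (2σ²)), which the slides do not spell out, and uses four made-up points. For very small σ every point is nearly orthogonal to every other in kernel space (easy separation, little generalization). For very large σ all points look nearly identical.

```python
import numpy as np

def gaussian_gram(X, sigma):
    """Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) (assumed parametrization)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # squared pairwise distances
    d2 = np.maximum(d2, 0.0)                           # guard against tiny negative round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Hypothetical points, chosen only for illustration.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
mask = ~np.eye(len(X), dtype=bool)                     # selects off-diagonal entries

for sigma in (0.1, 1.0, 10.0, 100.0):
    off = gaussian_gram(X, sigma)[mask]
    print(f"sigma = {sigma:>5}: off-diagonal entries range from "
          f"{off.min():.3g} to {off.max():.3g}")
```

The window values match those in the toy examples. Everything else here is an assumption made for the sake of a runnable illustration.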

Support Vector Machines

Implicit Embedding, window σ = 0.1: (figure)

Implicit Embedding, window σ = 0.5: (figure)

Implicit Embedding, window σ = 1: (figure)

Implicit Embedding, window σ = 10: (figure)

Support Vector Machines

Notes on Implicit Embedding:

• Similar Large vs. Small lessons

• Range of “reasonable results” seems to be smaller (note different range of windows)

• Much different “edge” behavior

Interesting topic for future work…

Distance Weighted Discrim’n

2-d Visualization:

Pushes Plane Away From Data

All Points Have Some Influence

$$\min_{w,b} \sum_{i=1}^n \frac{1}{r_i}$$
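The contrast between the two criteria can be illustrated numerically. The sketch below treats r_i as the distance from point i to a candidate plane and compares the SVM criterion (the smallest r_i) with the DWD criterion (the sum of 1/r_i). It is a simplification that assumes separable data and omits the slack and penalty terms of the full DWD optimization. The data and the plane are made up.

```python
import numpy as np

def distances(X, y, w, b):
    """Distances r_i of correctly classified points to the plane w.x + b = 0."""
    w = np.asarray(w, dtype=float)
    return y * (X @ w + b) / np.linalg.norm(w)

def svm_criterion(r):
    """SVM looks only at the smallest distance (to be maximized)."""
    return r.min()

def dwd_criterion(r):
    """DWD sums reciprocal distances (to be minimized); every point contributes."""
    if np.any(r <= 0):
        raise ValueError("sketch assumes all points are on the correct side")
    return float(np.sum(1.0 / r))

# Hypothetical separable data and a hand-picked plane x_1 = 0.
X = np.array([[2.0, 0.0], [3.0, 1.0], [4.0, -1.0],
              [-2.0, 0.0], [-3.0, 1.0], [-5.0, -1.0]])
y = np.array([1, 1, 1, -1, -1, -1])
w, b = np.array([1.0, 0.0]), 0.0

# Move a point that is already far from the plane even farther away.
X_moved = X.copy()
X_moved[5, 0] = -10.0

r, r_moved = distances(X, y, w, b), distances(X_moved, y, w, b)
print("SVM criterion before/after:", svm_criterion(r), svm_criterion(r_moved))
print("DWD criterion before/after:", dwd_criterion(r), dwd_criterion(r_moved))
```

Moving a distant point leaves the SVM criterion unchanged but alters the DWD criterion, which is the sense in which all points have some influence.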

Distance Weighted Discrim’n

References for more on DWD:

• Current paper: Marron, Todd and Ahn (2007)

• Links to more papers: Ahn (2007)

• JAVA Implementation of DWD: caBIG (2006)

• SDPT3 Software: Toh (2007)

Batch and Source Adjustment

Recall from Class Notes 8/28/07

• For Stanford Breast Cancer Data (C. Perou)

• Analysis in Benito, et al (2004) Bioinformatics, 20, 105-114. https://genome.unc.edu/pubsup/dwd/

• Adjust for Source Effects
  – Different sources of mRNA

• Adjust for Batch Effects
  – Arrays fabricated at different times

Source Batch Adj: Biological Class Col. & Symbols (figure)

Source Batch Adj: Source Colors (figure)

Source Batch Adj: PC 1-3 & DWD direction (figure)

Source Batch Adj: DWD Source Adjustment (figure)

Source Batch Adj: Source Adj’d, PCA view (figure)

Source Batch Adj: S. & B Adj’d, Adj’d PCA (figure)

UNC, Stat & OR

Why not adjust using SVM?

Major Problem: Proj’d Distrib’al Shape

Triangular Dist’ns (opposite skewed)

Does not allow sensible rigid shift


Why not adjust using SVM?

Nicely Fixed by DWD

Projected Dist’ns near Gaussian

Sensible to shift
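The "rigid shift" can be sketched as follows: project each batch onto a separating direction, then move the whole batch along that direction so that its projected mean becomes zero. In the analysis described here the direction would be the DWD direction. Computing that requires a cone-programming solver, so this sketch passes in a hand-picked unit vector as a stand-in. The function name `shift_adjust` and the data are hypothetical.

```python
import numpy as np

def shift_adjust(X, batch, direction):
    """Shift each batch rigidly along `direction` so its projected mean is zero.

    X         : cases x variables data matrix
    batch     : batch (or source) label for each case
    direction : separating direction (a stand-in for the DWD direction here)
    """
    v = np.asarray(direction, dtype=float)
    v = v / np.linalg.norm(v)
    X_adj = X.astype(float).copy()
    for g in np.unique(batch):
        rows = batch == g
        projected_mean = (X[rows] @ v).mean()
        X_adj[rows] -= projected_mean * v     # same shift for every case in the batch
    return X_adj

# Hypothetical data: batch 1 is offset from batch 0 along the first coordinate.
X = np.array([[0.0, 1.0], [1.0, -1.0], [-1.0, 0.0],
              [5.0, 1.0], [6.0, 0.0], [4.0, -1.0]])
batch = np.array([0, 0, 0, 1, 1, 1])
direction = np.array([1.0, 0.0])              # hand-picked, NOT a fitted DWD direction

X_adj = shift_adjust(X, batch, direction)
print(X_adj)
```

Because every case in a batch moves by the same vector, the shape of each batch is preserved. That is sensible only when the projected distributions have similar shapes, as noted above.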


Why not adjust by means?

DWD is complicated: value added?

Xuxin Liu example…

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

(although still not perfect)
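A small numerical illustration of how differing subtype ratios trip up mean adjustment. This is a made-up one-gene example, not the Xuxin Liu example itself: the two batches have no batch effect at all, but contain the two subtypes in different proportions.

```python
import numpy as np

def center_by_batch_mean(values, batch):
    """Mean adjustment: subtract each batch's own mean from its cases."""
    out = values.astype(float).copy()
    for g in np.unique(batch):
        rows = batch == g
        out[rows] -= values[rows].mean()
    return out

# Hypothetical design: batch 1 is 80% subtype A, batch 2 is 20% subtype A.
subtype = np.array(["A"] * 8 + ["B"] * 2 + ["A"] * 2 + ["B"] * 8)
batch = np.array([1] * 10 + [2] * 10)

# Expression depends on subtype only, so there is NO batch effect to remove.
values = np.where(subtype == "A", 0.0, 4.0)

adjusted = center_by_batch_mean(values, batch)

def subtype_a_gap(v):
    """Difference between batches in the mean of subtype A."""
    in_a = subtype == "A"
    return v[in_a & (batch == 1)].mean() - v[in_a & (batch == 2)].mean()

gap_before = subtype_a_gap(values)
gap_after = subtype_a_gap(adjusted)
print("subtype A gap between batches before adjustment:", gap_before)
print("subtype A gap between batches after adjustment: ", gap_after)
```

Before adjustment the same subtype agrees across batches. After mean adjustment the batches disagree, because each batch mean mostly reflects its subtype mix rather than any batch effect.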

Why not adjust by means?

Next time: Work in before and after, slides like 138-141 from DWDnormPreso.ppt in Research/Bioinf/caBIG


Twiddle ratios of subtypes

Why not adjust by means?

DWD robust against non-proportional subtypes…

Mathematical Statistical Question:

Is there mathematics behind this?

(will answer next time…)

DWD in Face Recognition

Face Images as Data (with M. Benito & D. Peña)

Male – Female Difference?

Discrimination Rule?

Represented as long vector of pixel gray levels

Registration is critical

DWD in Face Recognition, (cont.)

Registered Data

Shifts and scale manually chosen to align eyes and mouth

Still large variation

See males vs. females???

DWD in Face Recognition, (cont.)

DWD Direction

Good separation

Images “make sense”

Garbage at ends? (extrapolation effects?)

DWD in Face Recognition, (cont.)

Unregistered Version

Much blurrier, since features don’t properly line up

Nonlinear Variation

But DWD still works

Can see M-F differ’ce?

DWD in Face Recognition, (cont.)

Interesting summary:

Jump between means (in DWD direction)

Clear separation of Maleness vs. Femaleness

DWD in Face Recognition, (cont.)

Fun Comparison:

Jump between means (in SVM direction)

Also distinguishes Maleness vs. Femaleness

But not as well as DWD

DWD in Face Recognition, (cont.)

Analysis of difference: Project onto normals

• SVM has “small gap” (feels noise artifacts?)

• DWD “more informative” (feels real structure?)

DWD in Face Recognition, (cont.)

Current Work:

Focus on “drivers” (regions of interest)

Relation to Discr’n?

Which is “best”?

Lessons for human perception?

Outcomes Data

Breast Cancer Study (C. M. Perou):

• Outcome of interest = death or survival

• Connection with gene expression?

Approach:

• Treat death vs. survival during study as “classes”

• Find “direction that best separates the classes”

Outcomes Data

Find “direction that best separates the classes” (figures)

Outcomes Data

Find “direction that best separates classes”

• DWD Projection
• SVM Projection

Notes:

• SVM is “better separated”? (recall “data piling” problems….)
• DWD gives “more spread between sub-populations”??? (perhaps “more stable”?)

Outcomes Data

Which is “better”?

Approach:

• Find “genes of interest”

• To maximize loadings of direction vectors

(reflects pointing in gene direction)

• Show intensity plot (of gene expression)

• Using top 20 genes in each direction

Outcomes Data

Which is “better”?

• Study with gene intensity plot

• Order cases by DWD score (proj’n)

• Order genes by DWD loading (vec. entry)

• Reduce to top & bottom 20

• Color map shows gene expression

• Shows genes that drive classification

• Gene names also available
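The ordering steps listed above can be sketched in a few lines. This assumes a data matrix with cases in rows and genes in columns, plus a direction vector with one loading per gene. The function name `intensity_plot_matrix` is hypothetical, and random numbers stand in for both the expression data and the DWD direction.

```python
import numpy as np

def intensity_plot_matrix(X, direction, n_keep=20):
    """Arrange expression data for a gene intensity plot.

    Cases (rows) are ordered by their score, i.e. projection on the direction.
    Genes (columns) are ordered by loading and reduced to the n_keep smallest
    and n_keep largest loadings. Assumes at least 2 * n_keep genes.
    """
    v = np.asarray(direction, dtype=float)
    v = v / np.linalg.norm(v)
    scores = X @ v
    case_order = np.argsort(scores)               # order cases by score
    gene_order = np.argsort(v)                    # order genes by loading
    kept_genes = np.concatenate([gene_order[:n_keep], gene_order[-n_keep:]])
    return X[np.ix_(case_order, kept_genes)], case_order, kept_genes

# Stand-in data: 50 cases, 100 genes, and a random direction (NOT a DWD fit).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 100))
direction = rng.normal(size=100)

M, case_order, kept_genes = intensity_plot_matrix(X, direction)
print("matrix to display as a color map:", M.shape)
```

The returned matrix `M` is what a color map would display. `kept_genes` gives the column indices from which gene names could be looked up.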

Outcomes Data

Which is “better”? DWD direction (figure)

Which is “better”? SVM direction (figure)

Outcomes Data

Which is “better”?

• DWD finds genes showing better separation

• SVM genes are less informative

Outcomes Data

How about Centroid (Mean Diff’nce) Method? (figures)

Compare to DWD direction (figure)

Outcomes Data

How about Centroid (Mean Diff’nce) Method?

• Best yet, in terms of red – green plot?

• Projections unacceptably mixed?

• These are two different goals…

• Try for trade-off? Scale space approach???

• Interesting philosophical point: Very simple things often “best”

Outcomes Data

Weakness of above analysis:

• Some with “genes prone to disease” have not died yet

• Perhaps can see in DWD expression plot?

Better analysis:

• More sophisticated survival methods

• Work in progress w/ Brent Johnson, Danyu Li, Helen Zhang

Distance Weighted Discrim’n

2-d Visualization:

Pushes Plane Away From Data

All Points Have Some Influence

$$\min_{w,b} \sum_{i=1}^n \frac{1}{r_i}$$

Distance Weighted Discrim’n

Maximal Data Piling (figure)


HDLSS Discrim’n Simulations

Main idea:

Comparison of

• SVM (Support Vector Machine)

• DWD (Distance Weighted Discrimination)

• MD (Mean Difference, a.k.a. Centroid)

Linear versions, across dimensions

HDLSS Discrim’n Simulations

Overall Approach:

• Study different known phenomena
  – Spherical Gaussians
  – Outliers
  – Polynomial Embedding

• Common Sample Sizes: $n_+ = n_- = 25$

• But wide range of dimensions: $d = 10, 40, 100, 400, 1600$

HDLSS Discrim’n Simulations

Spherical Gaussians: (figure)

Spherical Gaussians:

• Same setup as before
• Means shifted in dim 1 only, $\mu_1 = 2.2$
• All methods pretty good
• Harder problem for higher dimension
• SVM noticeably worse
• MD best (Likelihood method)
• DWD very close to MD
• Methods converge for higher dimension??
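A rough sketch of this kind of simulation, covering only the Mean Difference (MD) rule. It assumes 25 training points per class, unit-variance spherical Gaussians, and class means at +2.2 and −2.2 in the first dimension. The sign convention, the test-set size, and the random seed are assumptions, and the numbers it prints are not the results shown in the slides.

```python
import numpy as np

def make_classes(n, d, mu1, rng):
    """n points per class; class +1 has mean +mu1 in dim 1, class -1 has mean -mu1."""
    pos = rng.standard_normal((n, d))
    neg = rng.standard_normal((n, d))
    pos[:, 0] += mu1
    neg[:, 0] -= mu1
    return pos, neg

def md_error(d, n_train=25, n_test=1000, mu1=2.2, seed=0):
    """Test error of the Mean Difference (centroid) rule in dimension d."""
    rng = np.random.default_rng(seed)
    pos, neg = make_classes(n_train, d, mu1, rng)
    w = pos.mean(axis=0) - neg.mean(axis=0)            # MD direction
    mid = (pos.mean(axis=0) + neg.mean(axis=0)) / 2.0  # cutoff halfway between centroids
    test_pos, test_neg = make_classes(n_test, d, mu1, rng)
    wrong = np.sum((test_pos - mid) @ w <= 0) + np.sum((test_neg - mid) @ w >= 0)
    return float(wrong) / (2 * n_test)

errors = {d: md_error(d) for d in (10, 40, 100, 400, 1600)}
for d, err in errors.items():
    print(f"d = {d:>4}: MD test error = {err:.3f}")
```

With the training sample size fixed, the estimated direction picks up more noise as d grows, which is one way to see why the problem gets harder in higher dimension.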

HDLSS Discrim’n Simulations

Outlier Mixture: (figure)

Outlier Mixture:

80% dim. 1 $\mu_1 = 2.2$, other dims 0
20% dim. 1 ±100, dim. 2 ±500, others 0

• MD is a disaster, driven by outliers
• SVM & DWD are both very robust
• SVM is best
• DWD very close to SVM (insig’t difference)
• Methods converge for higher dimension??

Ignore RLR (a mistake)

HDLSS Discrim’n Simulations

Wobble Mixture: (figure)

Wobble Mixture:

80% dim. 1 $\mu_1 = 2.2$, other dims 0
20% dim. 1 ±0.1, rand dim ±100, others 0

• MD still very bad, driven by outliers
• SVM & DWD are both very robust
• SVM loses (affected by margin push)
• DWD slightly better (by w’ted influence)
• Methods converge for higher dimension??

Ignore RLR (a mistake)

HDLSS Discrim’n Simulations

Nested Spheres: (figure)

Nested Spheres:

1st d/2 dim’s, Gaussian with var 1 or C
2nd d/2 dim’s, the squares of the 1st dim’s (as for 2nd degree polynomial embedding)

• Each method best somewhere
• MD best in highest d (data non-Gaussian)
• Methods not comparable (realistic)
• Methods converge for higher dimension??
• HDLSS space is a strange place

Ignore RLR (a mistake)
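The nested spheres construction can be written down directly from its description. In the sketch below the value C = 4, the function name `nested_spheres`, and the assignment of variance 1 to the first class and variance C to the second are assumptions made for illustration.

```python
import numpy as np

def nested_spheres(n, d, C, rng):
    """Generate n points per class in d dimensions (d even).

    First d/2 coordinates : Gaussian with variance 1 (class 1) or C (class 2).
    Second d/2 coordinates: squares of the first d/2 coordinates.
    """
    half = d // 2

    def one_class(variance):
        z = rng.normal(scale=np.sqrt(variance), size=(n, half))
        return np.hstack([z, z ** 2])

    return one_class(1.0), one_class(C)

rng = np.random.default_rng(2)
inner, outer = nested_spheres(n=25, d=10, C=4.0, rng=rng)   # C = 4 is hypothetical

print("inner class shape:", inner.shape)
print("mean of squared coordinates, inner:", inner[:, 5:].mean())
print("mean of squared coordinates, outer:", outer[:, 5:].mean())
```

In the first d/2 coordinates the two classes share the same mean, so no linear rule can use them. The squared coordinates differ in mean, which is what gives linear methods something to work with after the polynomial embedding.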

HDLSS Discrim’n Simulations

Conclusions:

• Everything (sensible) is best sometimes
• DWD often very near best
• MD weak beyond Gaussian

Caution about simulations (and examples):

• Very easy to cherry pick best ones
• Good practice in Machine Learning
  – “Ignore method proposed, but read paper for useful comparison of others”

HDLSS Discrim’n Simulations

Caution: There are additional players

E.g. Regularized Logistic Regression also looks very competitive

Interesting Phenomenon:

All methods come together in very high dimensions???

HDLSS Discrim’n Simulations

Can we say more about:

All methods come together in very high dimensions???

Mathematical Statistical Question:

Mathematics behind this???

(will answer next time)