Informatics and Mathematical Modelling / Cognitive Systems Group. MLSP 2010, September 1st. Archetypal Analysis for Machine Learning. Morten Mørup, DTU Informatics, Cognitive Systems Group, Technical University of Denmark. Joint work with Lars Kai Hansen, DTU Informatics, Cognitive Systems Group, Technical University of Denmark.


Page 1: Archetypal Analysis for Machine Learning

Informatics and Mathematical Modelling / Cognitive Systems Group

MLSP 2010, September 1st

Archetypal Analysis for Machine Learning

Morten Mørup, DTU Informatics, Cognitive Systems Group, Technical University of Denmark

Joint work with Lars Kai Hansen, DTU Informatics, Cognitive Systems Group, Technical University of Denmark

Page 2: Archetypal Analysis for Machine Learning


Page 3: Archetypal Analysis for Machine Learning

Archetypal Analysis (AA)

X ≈ XCS

AA is formed by two simplex constraints:

Archetype: Xc_k is formed by a convex combination of the data points.

Projection: s_n gives the convex combination of archetypes forming each data point.
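As a minimal sketch of the factorization X ≈ XCS and its two simplex constraints, the snippet below builds a toy example and checks it. It assumes X is an M x N list of rows with data points as columns, C is N x K and S is K x N. The helper names (matmul, columns_on_simplex, aa_reconstruct) and the toy numbers are illustrative and not taken from the talk.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def columns_on_simplex(M, tol=1e-9):
    """True if every column of M is non-negative and sums to one."""
    return all(min(col) >= -tol and abs(sum(col) - 1.0) <= tol for col in zip(*M))


def aa_reconstruct(X, C, S):
    """Return (archetypes XC, reconstruction XCS) for an M x N data matrix X."""
    if not (columns_on_simplex(C) and columns_on_simplex(S)):
        raise ValueError("columns of C and S must lie on the simplex")
    archetypes = matmul(X, C)               # M x K: convex combinations of data points
    return archetypes, matmul(archetypes, S)  # M x N: convex combinations of archetypes


# Toy data (made up): 4 points in 2D stored as columns, so M=2 and N=4.
X = [[0.0, 2.0, 0.0, 1.0],
     [0.0, 0.0, 2.0, 0.5]]
# K=3 archetypes; each column of C puts all its weight on one corner point.
C = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
# Each column of S mixes the archetypes; the 4th point is 0.25*a1 + 0.5*a2 + 0.25*a3.
S = [[1.0, 0.0, 0.0, 0.25],
     [0.0, 1.0, 0.0, 0.50],
     [0.0, 0.0, 1.0, 0.25]]

archetypes, X_hat = aa_reconstruct(X, C, S)
```

In this toy case the three corner points act as archetypes and the fourth point is reproduced exactly, because it lies inside the triangle they span.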

Page 4: Archetypal Analysis for Machine Learning

The original paper of Cutler and Breiman (1994) considered three applications:

Swiss army head shape

Los Angeles Basin air pollution, 1976

Tokamak fusion data

Other applications:

Flame dynamics (Stone & Cutler, 1996)

End-member extraction of galaxy spectra (Chan et al., 2003)

Data-driven benchmarking (Porzio et al., 2008)

Page 5: Archetypal Analysis for Machine Learning

Archetypal analysis extracts the “principal convex hull” (PCH) of the data cloud.

Convex hull: blue lines and light shaded region (dots indicate points in the convex set).

Dominant convex hull: green lines and gray shaded region (dots indicate archetypes).

While the convex set can be identified in linear time O(N) (McCallum & Avis, 1979), finding C and S is a non-convex (NP-hard) problem.

(Dwyer, 1988)

NB: One might think that AA is highly driven by outliers; however, “outliers” are only relevant if they reflect representative dynamics in the data!
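To make "points in the convex set" concrete, here is a small sketch that finds the hull vertices of a 2D point cloud. It uses Andrew's monotone chain method, which runs in O(N log N) because of the sort; it is not the linear-time algorithm of McCallum & Avis cited on the slide. The function names and the toy points are illustrative.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices of 2D points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would create a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


# Toy cloud (made up): four corners of a square plus three interior points.
points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0.5), (0.5, 1.5)]
hull = convex_hull(points)
```

Only the four corners survive as hull vertices; the interior points can all be written as convex combinations of them.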

Page 6: Archetypal Analysis for Machine Learning

Our (new) mathematical results:

1: The AA/PCH model is in general unique! See Theorem 1.

2: The AA/PCH model can be efficiently initialized by the proposed FurthestSum algorithm. The proposed FurthestSum algorithm guarantees extraction of points in the convex set; see Theorem 2.

3: The AA/PCH model parameters can be efficiently optimized by normalization-invariant projected gradient. For details on the derivation of the updates and their computational complexity, see Section 2.3.

Large-scale applications
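A simplified sketch of a FurthestSum-style initialization follows. It assumes Euclidean distances and brute-force distance computation, and it assumes that the randomly chosen seed point is discarded and re-selected at the end to reduce dependence on the random start. The paper's own statement of the algorithm (around Theorem 2) should be treated as the reference; the function name and the toy points are illustrative.

```python
import math
import random


def furthest_sum(points, k, seed=0):
    """Pick k candidate indices, each time taking the unselected point with
    the largest summed distance to the points selected so far."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    selected = [rng.randrange(len(points))]  # random seed point
    sum_dist = [0.0] * len(points)           # summed distance to selected points

    def update(j, sign):
        # Add (sign=+1) or remove (sign=-1) the distances to point j.
        for i, p in enumerate(points):
            sum_dist[i] += sign * math.dist(p, points[j])

    def pick():
        candidates = [i for i in range(len(points)) if i not in selected]
        return max(candidates, key=lambda i: sum_dist[i])

    update(selected[0], +1)
    while len(selected) < k:
        j = pick()
        selected.append(j)
        update(j, +1)

    # Assumption of this sketch: drop the random seed point and select once more.
    first = selected.pop(0)
    update(first, -1)
    selected.append(pick())
    return selected


# Toy cloud (made up): three triangle corners (indices 0-2) and two interior points.
points = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (3.0, 3.0), (4.0, 2.0)]
initial = furthest_sum(points, k=3)
```

On this toy cloud the selected indices are the three triangle corners regardless of which point is drawn as the seed, which matches the intuition that summed distance is maximized at extreme points.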

Page 7: Archetypal Analysis for Machine Learning

Our Machine Learning Applications

Computer Vision

NeuroImaging

Text Mining

Collaborative Filtering

Page 8: Archetypal Analysis for Machine Learning

Computer Vision: CBCL face database

Face database: K=361 pixels, N=2429 images; all images belong with probability 1 to the convex set.

X ≈ XCS

SVD/PCA: Low -> high frequency dynamics

NMF: Part-based representation

AA: Archetypes/Freaks

K-means: Centroids/Prototypes

Page 9: Archetypal Analysis for Machine Learning


Archetypal Analysis naturally bridges clustering methods with low rank representations
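One way to read this bridge, offered as an interpretation rather than a statement quoted from the slide: if every column of S is restricted to a one-hot vector and every column of C averages the members of one cluster, then XC holds cluster centroids and XCS is a k-means style reconstruction. Letting S range over the whole simplex gives the soft, low-rank mixing of AA. The sketch below builds that special case for a fixed, made-up cluster assignment; all names are illustrative.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def hard_assignment_factors(labels, num_clusters):
    """Build C (N x K) and S (K x N) from hard cluster labels.

    Column k of C averages the members of cluster k; column n of S is the
    one-hot indicator of the cluster that point n belongs to.
    """
    n_points = len(labels)
    sizes = [labels.count(k) for k in range(num_clusters)]
    C = [[1.0 / sizes[k] if labels[n] == k else 0.0 for k in range(num_clusters)]
         for n in range(n_points)]
    S = [[1.0 if labels[n] == k else 0.0 for n in range(n_points)]
         for k in range(num_clusters)]
    return C, S


# Toy data (made up): 4 points in 2D stored as columns, forming two tight groups.
X = [[0.0, 1.0, 10.0, 11.0],
     [0.0, 1.0, 10.0, 11.0]]
labels = [0, 0, 1, 1]

C, S = hard_assignment_factors(labels, num_clusters=2)
centroids = matmul(X, C)      # columns are the cluster means
X_hat = matmul(centroids, S)  # every point is replaced by its centroid
```

Both C and S still satisfy the simplex constraints of AA here; the only extra restriction is that the columns of S are binary.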


Page 10: Archetypal Analysis for Machine Learning

NeuroImaging: Positron Emission Tomography

An altanserin tracer is injected; the recorded signal is in theory a mixture of 3 underlying binding profiles (archetypes): low-binding regions, high-binding regions, and arteries/veins. Each voxel is a given concentration fraction of these tissue types.

X ≈ XCS

[Figure: archetypes XC and mixing fractions S, with panels Low Binding, High Binding, Artery/Veins]
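To illustrate what "concentration fraction" means for a single voxel, the sketch below estimates the fractions s for one signal, given fixed profiles, by minimizing the squared error subject to s lying on the simplex. It uses plain projected gradient descent with a Euclidean projection onto the simplex, which is a swapped-in technique and not the normalization-invariant update described in the talk. The profile values are made up and the function names are illustrative.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {s : s >= 0, sum(s) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t  # keep the threshold from the last index that qualifies
    return [max(vi - theta, 0.0) for vi in v]


def mixing_fractions(profiles, signal, iterations=300):
    """Estimate simplex-constrained weights s such that sum_k s[k]*profiles[k] ~ signal."""
    K = len(profiles)
    gram = [[sum(a * b for a, b in zip(p, q)) for q in profiles] for p in profiles]
    rhs = [sum(a * x for a, x in zip(p, signal)) for p in profiles]
    # The trace of the Gram matrix bounds its largest eigenvalue, so this step is safe.
    step = 1.0 / sum(gram[k][k] for k in range(K))
    s = [1.0 / K] * K
    for _ in range(iterations):
        grad = [sum(gram[k][j] * s[j] for j in range(K)) - rhs[k] for k in range(K)]
        s = project_to_simplex([s[k] - step * grad[k] for k in range(K)])
    return s


# Made-up time-activity profiles over 4 time points (not real PET data).
profiles = [[3.0, 1.0, 0.0, 0.0],   # stands in for "low binding"
            [0.0, 1.0, 3.0, 1.0],   # stands in for "high binding"
            [0.0, 0.0, 1.0, 3.0]]   # stands in for "artery/veins"
true_fractions = [0.2, 0.5, 0.3]
signal = [sum(f * p[t] for f, p in zip(true_fractions, profiles)) for t in range(4)]

estimated = mixing_fractions(profiles, signal)
```

Because the toy signal is an exact mixture of linearly independent profiles, the estimate converges to the fractions used to generate it.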

Page 11: Archetypal Analysis for Machine Learning

Text Mining: NIPS term-document (bag of words)

X ≈ XCS

[Figure: XC, with panels Distinct Aspects and Prototypical Aspects]

Page 12: Archetypal Analysis for Machine Learning

Collaborative filtering: MovieLens

Medium-size and large-size MovieLens data (www.grouplens.org)

Medium size: 1,000,209 ratings of 3,952 movies by 6,040 users

Large size: 10,000,054 ratings of 10,677 movies given by 71,567 users

AA extracts features representing distinct user types; each user is represented as a given concentration fraction of the user types. AA appears to have less tendency to overfit.

Page 13: Archetypal Analysis for Machine Learning

Conclusion

Archetypal Analysis is unique in general (Theorem 1).

Archetypal Analysis can be efficiently initialized by the proposed FurthestSum algorithm (Theorem 2) and optimized through normalization-invariant projected gradient.

Archetypal Analysis naturally bridges clustering with low-rank approximations.

Archetypal Analysis results in easily interpretable features that are closely related to the actual data.

Archetypal Analysis is useful for a large variety of machine learning problem domains within unsupervised learning (Computer Vision, NeuroImaging, Text Mining, Collaborative Filtering).

Archetypal Analysis can be extended to kernel representations, finding the principal convex hull in a (potentially infinite-dimensional) Hilbert space (see Section 2.4 of the paper).

Page 14: Archetypal Analysis for Machine Learning

Open problems and current research directions:

What is the optimal number of components?

Cross-validation based on missing value prediction (see also the collaborative filtering example in the paper).

Bayesian generative models for AA/PCH that automatically penalize model complexity.

What if “pure” archetypes cannot be well represented by the data available?
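The cross-validation idea for choosing the number of components can be sketched as follows: hide some entries of X, fit a model for each candidate number of components on the remaining entries, and keep the candidate that predicts the hidden entries best. The fitting step is not shown; the reconstructions below are made-up placeholders, and the function names are illustrative.

```python
import math


def held_out_rmse(X, X_hat, held_out):
    """Root mean squared error over the (row, column) pairs in held_out."""
    if not held_out:
        raise ValueError("held_out must contain at least one entry")
    squared = [(X[i][j] - X_hat[i][j]) ** 2 for i, j in held_out]
    return math.sqrt(sum(squared) / len(squared))


def select_num_components(X, reconstructions, held_out):
    """Return (best K, scores), where scores maps each K to its held-out RMSE.

    reconstructions maps each candidate K to a reconstruction of X that is
    assumed to have been fitted without access to the held-out entries.
    """
    scores = {k: held_out_rmse(X, X_hat, held_out)
              for k, X_hat in reconstructions.items()}
    return min(scores, key=scores.get), scores


# Made-up ratings matrix and two made-up reconstructions (placeholders).
X = [[5.0, 3.0, 4.0],
     [4.0, 1.0, 2.0]]
held_out = [(0, 0), (1, 1)]
reconstructions = {
    2: [[4.0, 3.0, 4.0], [4.0, 3.0, 2.0]],  # held-out errors: 1 and -2
    3: [[5.0, 3.0, 4.0], [4.0, 2.0, 2.0]],  # held-out errors: 0 and -1
}

best_k, scores = select_num_components(X, reconstructions, held_out)
```

In practice the held-out error typically stops improving, or starts to rise, once additional components only fit noise, which is what makes it usable as a selection criterion.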

Page 15: Archetypal Analysis for Machine Learning

Selected References from the paper

[1] Adele Cutler and Leo Breiman, “Archetypal analysis,” Technometrics, vol. 36, no. 4, pp. 338–347, Nov. 1994.

[2] D. S. Hochbaum and D. B. Shmoys, “A best possible heuristic for the k-center problem,” Mathematics of Operations Research, vol. 10, no. 2, pp. 180–184, 1985.

[7] Emily Stone and Adele Cutler, “Introduction to archetypal analysis of spatio-temporal dynamics,” Phys. D, vol. 96, no. 1–4, pp. 110–131, 1996.

[8] Giovanni C. Porzio, Giancarlo Ragozini, and Domenico Vistocco, “On the use of archetypes as benchmarks,” Appl. Stoch. Model. Bus. Ind., vol. 24, no. 5, pp. 419–437, 2008.

[9] B. H. P. Chan, D. A. Mitchell, and L. E. Cram, “Archetypal analysis of galaxy spectra,” Mon. Not. R. Astron. Soc., vol. 338, pp. 790, 2003.

[11] D. McCallum and D. Avis, “A linear algorithm for finding the convex hull of a simple polygon,” Information Processing Letters, vol. 9, pp. 201–206, 1979.

[12] Rex A. Dwyer, “On the convex hull of random points in a polytope,” Journal of Applied Probability, vol. 25, no. 4, pp. 688–699, 1988.