ETC3250/5250: Dimension reduction Semester 1, 2020 Professor Di Cook Econometrics and Business Statistics Monash University Week 4 (b)


ETC3250/5250: Dimension reduction

Semester 1, 2020

Professor Di Cook

Econometrics and Business Statistics Monash University

Week 4 (b)


PCA vs LDA

Discriminant space: is the low-dimensional space where the class means are the furthest apart relative to the common variance-covariance.

The discriminant space is provided by the eigenvectors after making an eigen-decomposition of $\Sigma^{-1}\Sigma_B$, where

$$\Sigma_B = \frac{1}{K}\sum_{i=1}^{K} (\mu_i - \mu)(\mu_i - \mu)' \quad\text{and}\quad \Sigma = \frac{1}{K}\sum_{k=1}^{K} \frac{1}{n_k}\sum_{i=1}^{n_k} (x_i - \mu_k)(x_i - \mu_k)'$$
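The eigen-decomposition above can be sketched in base R. This is a minimal illustration, not the course's own code: it assumes the built-in `iris` data as a stand-in, takes $\mu$ to be the overall mean, and the names `Sigma_B`, `Sigma` and `disc` are made up for the example.

```r
# Sketch: discriminant space from the eigenvectors of Sigma^{-1} Sigma_B.
# Assumption: iris is used only as a convenient built-in example.
X  <- as.matrix(iris[, 1:4])
cl <- iris$Species
K  <- nlevels(cl)

mu          <- colMeans(X)                            # overall mean (assumed meaning of mu)
class_means <- rowsum(X, cl) / as.vector(table(cl))   # one row per class mean

# Between-class matrix: (1/K) * sum_i (mu_i - mu)(mu_i - mu)'
Sigma_B <- crossprod(sweep(class_means, 2, mu)) / K

# Common variance-covariance: average of the within-class covariance matrices
Sigma <- Reduce(`+`, lapply(levels(cl), function(k) {
  Xk <- X[cl == k, , drop = FALSE]
  crossprod(sweep(Xk, 2, colMeans(Xk))) / nrow(Xk)
})) / K

# Sigma^{-1} Sigma_B is not symmetric, so keep only the real parts
e      <- eigen(solve(Sigma) %*% Sigma_B)
values <- Re(e$values)
disc   <- Re(e$vectors[, 1:(K - 1)])   # at most K - 1 useful directions

Z <- X %*% disc                        # data projected into the discriminant space
```

With $K$ classes only the first $K - 1$ eigenvalues are non-zero, so `Z` has two columns here.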


Mahalanobis distance

For two $p$-dimensional vectors, Euclidean distance is

$$d(x, y) = \sqrt{(x - y)'(x - y)}$$

and Mahalanobis distance is

$$d(x, y) = \sqrt{(x - y)'\Sigma^{-1}(x - y)}$$

Which points are closest according to Euclidean distance? Which points are closest relative to the variance-covariance?
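A small numerical sketch of the two distances, using made-up points and a made-up variance-covariance matrix (none of these values come from the slides). The helper names `euclid` and `mahal` are hypothetical; base R's `mahalanobis()` returns the squared distance.

```r
# Sketch: the same pair of points can rank differently under the two distances.
# Assumption: x, y, z and Sigma are illustrative values only.
x <- c(0, 0)
y <- c(1.5, 1.5)    # lies along the direction of high variance
z <- c(1, -1)       # lies across the direction of high variance
Sigma <- matrix(c(1, 0.9,
                  0.9, 1), nrow = 2, byrow = TRUE)

euclid <- function(a, b) sqrt(sum((a - b)^2))
mahal  <- function(a, b, Sigma) {
  sqrt(drop(t(a - b) %*% solve(Sigma) %*% (a - b)))
}

c(euclid_xy = euclid(x, y), euclid_xz = euclid(x, z))
c(mahal_xy = mahal(x, y, Sigma), mahal_xz = mahal(x, z, Sigma))

# Cross-check against base R, which works with the squared distance
mahalanobis(y, center = x, cov = Sigma)
```

Here `z` is closer to `x` in Euclidean distance, while `y` is closer once distances are measured relative to `Sigma`.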


Discriminant space

Both examples have the same means, but two different variance-covariance matrices. The discriminant space depends on the variance-covariance matrix.
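To see this dependence numerically, here is a minimal two-class sketch. It relies on the fact that with two classes the single discriminant direction is proportional to $\Sigma^{-1}(\mu_1 - \mu_2)$. The means, the two matrices and the helper `disc_dir` are all made up for illustration.

```r
# Sketch: identical class means, two different variance-covariance matrices.
# Assumption: all numbers below are illustrative, not taken from the slides.
mu1 <- c(-1, 0)
mu2 <- c(1, 0)
sigma_pos <- matrix(c(1,  0.8,  0.8, 1), nrow = 2)   # positive correlation
sigma_neg <- matrix(c(1, -0.8, -0.8, 1), nrow = 2)   # negative correlation

# Two-class discriminant direction, scaled to unit length
disc_dir <- function(sigma, mu1, mu2) {
  d <- solve(sigma, mu1 - mu2)
  d / sqrt(sum(d^2))
}

dir_pos <- disc_dir(sigma_pos, mu1, mu2)
dir_neg <- disc_dir(sigma_neg, mu1, mu2)
rbind(dir_pos, dir_neg)
```

The two directions differ even though the means are unchanged.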


Projection pursuit (PP) generalises PCA

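PCA:

$$\underset{\phi_{11},\dots,\phi_{p1}}{\text{maximize}} \; \frac{1}{n}\sum_{i=1}^{n}\left(\sum_{j=1}^{p}\phi_{j1}x_{ij}\right)^{2} \text{ subject to } \sum_{j=1}^{p}\phi_{j1}^{2} = 1$$

PP:

$$\underset{\phi_{11},\dots,\phi_{p1}}{\text{maximize}} \; f\left(\sum_{j=1}^{p}\phi_{j1}x_{ij}\right) \text{ subject to } \sum_{j=1}^{p}\phi_{j1}^{2} = 1$$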
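A rough sketch of the idea: optimise an index $f$ of the projected data over unit-length vectors. When $f$ is the mean squared projection of centred data, the search should land on the first principal component. The function `pp_direction`, the use of `optim()` with a few random starts, and the kurtosis-based index are assumptions made for this illustration, not the method used in the course.

```r
# Sketch: projection pursuit as a generic 1D index optimisation.
# Assumption: iris (scaled) is a stand-in data set.
set.seed(1)
X <- scale(as.matrix(iris[, 1:4]))   # centred, unit-variance columns

pp_direction <- function(X, index, n_starts = 5) {
  unit <- function(v) v / sqrt(sum(v^2))            # enforce sum(phi^2) = 1
  neg_index <- function(v) -index(drop(X %*% unit(v)))
  fits <- lapply(seq_len(n_starts), function(i) {
    optim(rnorm(ncol(X)), neg_index, method = "BFGS")
  })
  best <- fits[[which.min(sapply(fits, function(f) f$value))]]
  unit(best$par)
}

# PCA as a special case: f = mean of squared projections
phi_var <- pp_direction(X, function(z) mean(z^2))
phi_pca <- prcomp(X)$rotation[, 1]
abs(sum(phi_var * phi_pca))          # absolute cosine between the two directions

# A different (illustrative) index: low kurtosis favours bimodal projections
phi_kurt <- pp_direction(X, function(z) -mean(z^4) / mean(z^2)^2)
```

Swapping the index function is all it takes to change what kind of structure the projection looks for.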


MDS

Multidimensional scaling (MDS) finds a low-dimensional layout of points that minimises the difference between distances computed in the $p$-dimensional space, and those computed in the low-dimensional space.

$$\text{Stress}_D(x_1, \dots, x_N) = \left(\sum_{i,j=1;\, i\neq j}^{N} (d_{ij} - d_k(i, j))^2\right)^{1/2}$$

where $D$ is an $N \times N$ matrix of distances $(d_{ij})$ between all pairs of points, and $d_k(i, j)$ is the distance between the points in the low-dimensional space.
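The stress can be computed directly from its definition. The sketch below assumes the built-in `USArrests` data as a stand-in and uses `cmdscale()` to produce layouts in $k = 1, \dots, 4$ dimensions. The helper name `stress` is hypothetical.

```r
# Sketch: stress of a low-dimensional layout, following the formula above.
# Assumption: USArrests (scaled) is used only as example data.
stress <- function(D, layout) {
  d_high <- as.matrix(D)                 # d_ij, distances in p dimensions
  d_low  <- as.matrix(dist(layout))      # d_k(i, j), distances in the layout
  off    <- row(d_high) != col(d_high)   # all pairs with i != j
  sqrt(sum((d_high[off] - d_low[off])^2))
}

D <- dist(scale(USArrests))

# Stress for layouts of increasing dimension
stress_by_k <- sapply(1:4, function(k) stress(D, cmdscale(D, k = k)))
round(stress_by_k, 3)
```

For these Euclidean distances the stress shrinks as dimensions are added, and it is numerically zero once $k$ reaches the number of variables.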


MDS

Classical MDS is the same as PCA (when the distances are Euclidean).

Metric MDS incorporates power transformations on the distances, $d_{ij}^r$.

Non-metric MDS incorporates a monotonic transformation of the distances, e.g. rank.

    library(tidyverse)

    # Classical MDS on the first seven columns of the women's track data
    track <- read_csv("data/womens_track.csv")
    track_mds <- cmdscale(dist(track[, 1:7])) %>%
      as_tibble() %>%
      mutate(country = track$country)
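The link between classical MDS and PCA can be checked numerically. This sketch assumes the built-in `USArrests` data as a stand-in for the track data, which is not available here. The two layouts should agree up to the sign of each axis.

```r
# Sketch: classical MDS on Euclidean distances reproduces the PCA scores.
# Assumption: USArrests (scaled) replaces the track data for illustration.
X <- scale(as.matrix(USArrests))

mds <- unname(cmdscale(dist(X), k = 2))   # classical MDS layout
pca <- unname(prcomp(X)$x[, 1:2])         # first two principal component scores

# Axes may be flipped, so compare absolute values
max_diff <- max(abs(abs(mds) - abs(pca)))
max_diff
```

A power transformation for metric MDS could be tried in the same way by passing `dist(X)^r` for some chosen `r` to `cmdscale()`, although the result then no longer matches PCA.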


Challenge

For each of these distance matrices, find a layout in 1 or 2D that accurately reflects the full distances.

    ## # A tibble: 3 x 4
    ##   name      A     B     C
    ##   <chr> <dbl> <dbl> <dbl>
    ## 1 A       0.1   3.2   3.9
    ## 2 B       3.2  -0.1   5.1
    ## 3 C       3.9   5.1   0

    ## # A tibble: 4 x 5
    ##   name      A     B     C     D
    ##   <chr> <dbl> <dbl> <dbl> <dbl>
    ## 1 A       0.1   0.9   2.1   3
    ## 2 B       0.9   0     1.1   1.9
    ## 3 C       2.1   1.1   0.1   1.1
    ## 4 D       3     1.9   1.1  -0.1
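One way to check a hand-drawn answer is to let classical MDS propose a layout. The sketch below is only an assumption about how one might do this: it treats the small non-zero diagonal entries as noise and sets them to zero, places the first matrix in 2D and the second in 1D, and the names `d3`, `d4`, `err3` and `err4` are made up.

```r
# Sketch: classical MDS layouts for the two challenge matrices.
# Assumption: diagonal entries are treated as exactly zero.
labels3 <- c("A", "B", "C")
d3 <- matrix(c(0,   3.2, 3.9,
               3.2, 0,   5.1,
               3.9, 5.1, 0),
             nrow = 3, byrow = TRUE, dimnames = list(labels3, labels3))

labels4 <- c("A", "B", "C", "D")
d4 <- matrix(c(0,   0.9, 2.1, 3,
               0.9, 0,   1.1, 1.9,
               2.1, 1.1, 0,   1.1,
               3,   1.9, 1.1, 0),
             nrow = 4, byrow = TRUE, dimnames = list(labels4, labels4))

layout3 <- cmdscale(as.dist(d3), k = 2)   # three points form a triangle in 2D
layout4 <- cmdscale(as.dist(d4), k = 1)   # four points roughly along a line

# Largest gap between the given distances and the layout distances
err3 <- max(abs(as.vector(dist(layout3)) - as.vector(as.dist(d3))))
err4 <- max(abs(as.vector(dist(layout4)) - as.vector(as.dist(d4))))
c(err3 = err3, err4 = err4)
```

The 2D layout of the first matrix reproduces its distances essentially exactly, while the 1D layout of the second is only approximate because its distances are not perfectly consistent with points on a line.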


Non-linear dimension reduction

T-distributed Stochastic Neighbor Embedding (t-SNE): similar to MDS, except emphasis is placed on grouping observations into clusters. Observations within a cluster are placed close in the low-dimensional representation, but clusters themselves are placed far apart.


Non-linear dimension reduction

Local linear embedding (LLE): Finds nearest neighbours of points, defines interpoint distances relative to neighbours, and preserves these proximities in the low-dimensional mapping. Optimisation is used to solve an eigen-decomposition of the knn distance construction.


Non-linear dimension reduction

Self-organising maps (SOM): First clusters the observations into $k \times k$ groups. Uses the mean of each group laid out in a constrained 2D grid to create a 2D projection.


Made by a human with a computer

Slides at https://iml.numbat.space.

Code and data at https://github.com/numbats/iml.

Created using R Markdown with flair by xaringan, and kunoichi (female ninja) style.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
