Intelligent Database Systems Lab
國立雲林科技大學 National Yunlin University of Science and Technology
Visualizing Data using t-SNE
Presenter: Wei-Hao Huang
Authors: Laurens van der Maaten, Geoffrey Hinton
JMLR 2008
Outline
· Motivation
· Objectives
· Methodology
· Experiments
· Conclusions
· Comments
Motivation
· Visualization of high-dimensional data is an important problem, and the data involved vary widely in dimensionality.
· Dimensionality reduction techniques can be linear or nonlinear.
· Many nonlinear techniques perform strongly on artificial data sets but are often unsuccessful at visualizing real high-dimensional data.
Objectives
• To convert a high-dimensional data set into a matrix of pairwise similarities.
• To introduce a new technique, called “t-SNE”, for visualizing the resulting similarity data.
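The first objective, turning a data set into a matrix of pairwise similarities, can be sketched in NumPy as follows. It builds SNE-style Gaussian conditional probabilities p_{j|i}, choosing each point's bandwidth by binary search so that the row's perplexity matches a target value. The function names, the default perplexity of 30, and the tolerance are illustrative assumptions, not values taken from the slides.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    sum_sq = np.sum(X ** 2, axis=1)
    D = sum_sq[:, None] + sum_sq[None, :] - 2.0 * X @ X.T
    return np.maximum(D, 0.0)  # clip tiny negatives caused by round-off


def conditional_similarities(X, perplexity=30.0, tol=1e-5, max_iter=50):
    """Row-stochastic matrix P with P[i, j] = p_{j|i} and P[i, i] = 0.

    Each row uses a Gaussian whose precision beta_i = 1 / (2 sigma_i^2)
    is found by binary search so that 2**H(P_i) matches `perplexity`.
    The default perplexity is an illustrative assumption.
    """
    D = pairwise_sq_dists(X)
    n = D.shape[0]
    P = np.zeros((n, n))
    target_entropy = np.log2(perplexity)
    for i in range(n):
        d = np.delete(D[i], i)  # distances to all other points
        beta, lo, hi = 1.0, 0.0, np.inf
        for _ in range(max_iter):
            w = np.exp(-(d - d.min()) * beta)  # shift for numerical stability
            p = w / w.sum()
            entropy = -np.sum(p * np.log2(p + 1e-12))
            if abs(entropy - target_entropy) < tol:
                break
            if entropy > target_entropy:  # too flat: increase precision
                lo = beta
                beta = beta * 2.0 if hi == np.inf else (beta + hi) / 2.0
            else:  # too peaked: decrease precision
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p
    return P
```

Subtracting the row minimum before exponentiating changes every weight by the same factor, so the normalized probabilities are unaffected while underflow is avoided.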
Methodology
· Stochastic Neighbor Embedding
· t-Distributed Stochastic Neighbor Embedding
─ Symmetric SNE
─ Mismatched Tails can Compensate for Mismatched Dimensionalities
Stochastic Neighbor Embedding
· Data space
· Map space
· Cost function
· Perplexity
· Gradient descent method
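The labels on this slide correspond to the standard SNE definitions, summarized here for reference. Here x_i are the high-dimensional points, y_i their map counterparts, and σ_i the per-point Gaussian bandwidth.

```latex
% Data space: probability that x_i would pick x_j as its neighbor
p_{j|i} = \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad p_{i|i} = 0

% Map space: the same construction for the low-dimensional points y_i
q_{j|i} = \frac{\exp\!\left(-\lVert y_i - y_j \rVert^2\right)}
               {\sum_{k \neq i} \exp\!\left(-\lVert y_i - y_k \rVert^2\right)}

% Cost function: sum of Kullback-Leibler divergences
C = \sum_i \mathrm{KL}(P_i \,\Vert\, Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}

% Perplexity: sets each \sigma_i
\mathrm{Perp}(P_i) = 2^{H(P_i)}, \qquad H(P_i) = -\sum_j p_{j|i} \log_2 p_{j|i}

% Gradient used by gradient descent
\frac{\partial C}{\partial y_i} = 2 \sum_j \left(p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}\right)(y_i - y_j)
```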
Symmetric SNE (t-SNE)
· t-SNE uses a Student-t distribution to improve on SNE:
─ The SNE cost function is difficult to optimize, so t-SNE uses a symmetrized version of it.
─ SNE suffers from the crowding problem, so t-SNE uses a heavy-tailed distribution in the map space.
· Cost function
· Data space
· Map space
· Gradient descent method
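The symmetrized cost and the heavy-tailed map distribution can be sketched in NumPy as shown here. The sketch assumes `P_cond` is a row-stochastic matrix of conditional probabilities p_{j|i}, for example from a perplexity search. The function names are hypothetical. The formulas are the standard t-SNE ones: p_ij = (p_{j|i} + p_{i|j}) / 2n, q_ij proportional to (1 + ||y_i - y_j||²)⁻¹, and gradient 4 Σ_j (p_ij - q_ij)(y_i - y_j)(1 + ||y_i - y_j||²)⁻¹.

```python
import numpy as np


def symmetrize(P_cond):
    """Joint probabilities p_ij = (p_{j|i} + p_{i|j}) / (2n)."""
    n = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)


def tsne_cost_and_grad(P, Y):
    """KL(P || Q) and its gradient for a Student-t (one degree of freedom) map."""
    sum_sq = np.sum(Y ** 2, axis=1)
    D = sum_sq[:, None] + sum_sq[None, :] - 2.0 * Y @ Y.T
    W = 1.0 / (1.0 + D)          # heavy-tailed kernel (1 + ||y_i - y_j||^2)^-1
    np.fill_diagonal(W, 0.0)     # q_ii = 0
    Q = W / W.sum()              # q_ij, normalized over all pairs
    mask = P > 0
    cost = np.sum(P[mask] * np.log(P[mask] / Q[mask]))
    # dC/dy_i = 4 * sum_j (p_ij - q_ij)(y_i - y_j)(1 + ||y_i - y_j||^2)^-1
    M = (P - Q) * W
    grad = 4.0 * (np.diag(M.sum(axis=1)) - M) @ Y
    return cost, grad
```

Normalizing `W` over all pairs, rather than per row, makes q_ij a joint distribution. This is what keeps the gradient in the simple form above.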
Mismatched Tails can Compensate for Mismatched Dimensionalities (t-SNE)
· Gradient descent method
· Map space
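The effect of mismatched tails can be illustrated numerically. A heavy-tailed kernel needs a much larger map distance than a Gaussian to fall to the same small similarity, so moderately dissimilar points can be placed farther apart in the map. This simplified sketch compares the unnormalized kernels exp(-d²) and (1 + d²)⁻¹ and ignores normalization. The helper names are hypothetical.

```python
import math


def gaussian_distance(value):
    """Distance d at which the unnormalized Gaussian kernel exp(-d^2) equals value."""
    return math.sqrt(-math.log(value))


def student_t_distance(value):
    """Distance d at which the Student-t kernel (1 + d^2)^-1 equals value."""
    return math.sqrt(1.0 / value - 1.0)


for v in (0.5, 0.1, 0.01):
    print(f"kernel value {v}: Gaussian d = {gaussian_distance(v):.2f}, "
          f"Student-t d = {student_t_distance(v):.2f}")
```

For a kernel value of 0.01, the Student-t distance is about 9.95, compared with about 2.15 for the Gaussian.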
t-SNE Algorithm
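The t-SNE algorithm can be sketched in NumPy as plain gradient descent with momentum. The sketch rests on several assumptions:

- squared Euclidean input distances;
- per-point Gaussians calibrated to a perplexity by binary search;
- a small random Gaussian initialization;
- the update Y(t) = Y(t-1) - η ∂C/∂Y + α(t)(Y(t-1) - Y(t-2)).

The defaults (momentum 0.5, then 0.8 after 250 iterations; learning rate 100; 1000 iterations) are illustrative assumptions. Refinements such as early exaggeration and adaptive learning rates are omitted.

```python
import numpy as np


def _sq_dists(Z):
    """Squared Euclidean distances between all rows of Z."""
    s = np.sum(Z ** 2, axis=1)
    return np.maximum(s[:, None] + s[None, :] - 2.0 * Z @ Z.T, 0.0)


def _joint_probabilities(X, perplexity, tol=1e-5, max_iter=50):
    """Symmetrized p_ij from per-point Gaussians calibrated to `perplexity`."""
    D = _sq_dists(X)
    n = D.shape[0]
    P = np.zeros((n, n))
    target = np.log(perplexity)  # target entropy in nats
    for i in range(n):
        d = np.delete(D[i], i)
        beta, lo, hi = 1.0, 0.0, np.inf  # beta = 1 / (2 sigma_i^2)
        for _ in range(max_iter):
            p = np.exp(-(d - d.min()) * beta)
            p /= p.sum()
            entropy = -np.sum(p * np.log(p + 1e-12))
            if abs(entropy - target) < tol:
                break
            if entropy > target:  # too flat: narrow the Gaussian
                lo = beta
                beta = beta * 2.0 if hi == np.inf else (beta + hi) / 2.0
            else:  # too peaked: widen the Gaussian
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p
    return (P + P.T) / (2.0 * n)


def tsne(X, n_components=2, perplexity=30.0, n_iter=1000,
         learning_rate=100.0, seed=0):
    """Plain t-SNE: gradient descent with momentum, no exaggeration or gains.

    All default parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    P = _joint_probabilities(X, perplexity)
    n = X.shape[0]
    Y = rng.normal(scale=1e-2, size=(n, n_components))  # sample from N(0, 1e-4 I)
    update = np.zeros_like(Y)  # holds Y(t-1) - Y(t-2)
    for t in range(n_iter):
        W = 1.0 / (1.0 + _sq_dists(Y))  # Student-t kernel, one degree of freedom
        np.fill_diagonal(W, 0.0)
        Q = W / W.sum()
        M = (P - Q) * W
        grad = 4.0 * (np.diag(M.sum(axis=1)) - M) @ Y  # dC/dY
        momentum = 0.5 if t < 250 else 0.8
        update = momentum * update - learning_rate * grad
        Y = Y + update
    return Y
```

For a small data set, call it as `Y = tsne(X, perplexity=5.0, n_iter=300, learning_rate=10.0)`. With only a few dozen points, a learning rate well below 100 keeps plain momentum descent from overshooting. The reason is that the attractive part of each point's gradient scales roughly with Σ_j p_ij, which is about 1/n.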
Experiments
· Data Sets
─ MNIST, Olivetti faces, COIL-20, word-features, and Netflix data sets.
· Experimental Setup
─ PCA is used to reduce the dimensionality of the data before t-SNE is applied.
─ Cost function parameter settings.
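The PCA preprocessing step can be sketched with NumPy's SVD. The data are centered, then projected onto the leading principal directions. The function name and the default of 30 output dimensions are illustrative assumptions.

```python
import numpy as np


def pca_reduce(X, n_components=30):
    """Project X onto its top principal components (SVD of centered data).

    n_components=30 is an illustrative assumption.
    """
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T
```

Reducing the input this way makes the pairwise distance computations cheaper. It also tends to suppress some noise while keeping the dominant structure of the interpoint distances.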
Visualizations of 6,000 handwritten digits from the MNIST data set
(Figure panels: t-SNE, Sammon mapping, LLE, Isomap)
Visualizations of the Olivetti faces data set
(Figure panels: t-SNE, Sammon mapping, LLE, Isomap)
Visualizations of the COIL-20 data set
(Figure panels: t-SNE, Sammon mapping, LLE, Isomap)
Applying t-SNE to Large Data Sets
(Figure: neighborhood graph, K = 20)
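In the landmark approach, affinities between a subset of landmark points come from random walks on a K-nearest-neighbor graph built over the full data set. The affinity from landmark a to landmark b is the probability that a walk starting at a first reaches b. The sketch below estimates these probabilities by Monte Carlo simulation. The edge weighting (proportional to exp of minus the squared distance), the number of walks, the step cap, and the function names are assumptions for illustration. An exact version would solve a linear system for the absorbing Markov chain instead of sampling.

```python
import numpy as np


def knn_graph(X, k=20):
    """Indices of each point's k nearest neighbors and transition weights."""
    s = np.sum(X ** 2, axis=1)
    D = np.maximum(s[:, None] + s[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(D, np.inf)  # exclude self-loops
    nbrs = np.argsort(D, axis=1)[:, :k]
    d = np.take_along_axis(D, nbrs, axis=1)
    # Assumed edge weights exp(-d^2), shifted per row to avoid underflow.
    weights = np.exp(-(d - d[:, :1]))
    return nbrs, weights


def landmark_affinities(X, landmarks, k=20, n_walks=200, max_steps=1000, seed=0):
    """Estimate P[a, b]: probability a walk from landmark a first reaches landmark b."""
    rng = np.random.default_rng(seed)
    nbrs, weights = knn_graph(X, k)
    probs = weights / weights.sum(axis=1, keepdims=True)
    index = {int(p): a for a, p in enumerate(landmarks)}  # point id -> landmark row
    m = len(landmarks)
    P = np.zeros((m, m))
    for a, start in enumerate(landmarks):
        start = int(start)
        for _ in range(n_walks):
            node = start
            for _ in range(max_steps):
                node = int(nbrs[node, rng.choice(k, p=probs[node])])
                if node in index and node != start:  # stop at another landmark
                    P[a, index[node]] += 1
                    break
    row_sums = P.sum(axis=1, keepdims=True)
    return np.divide(P, row_sums, out=np.zeros_like(P), where=row_sums > 0)
```

Walks that return to their own starting landmark simply continue. Walks that hit the step cap without reaching another landmark are discarded before the rows are normalized.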
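Weaknesses
· Dimensionality reduction for other purposes: it is unclear how t-SNE performs when reducing data to more than two or three dimensions.
· Curse of intrinsic dimensionality: t-SNE relies on local properties of the data, which makes it sensitive to high intrinsic dimensionality.
· Non-convexity of the t-SNE cost function: results depend on the choice of optimization parameters.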
Conclusions
• t-SNE is capable of retaining the local structure of the data while also revealing some important global structure.
• A landmark approach makes it possible to successfully visualize large real-world data sets.
Comments
· Advantages
─ Visualizes high-dimensional data very well.
─ Open source.
· Applications
─ Visual exploration of data.