DATA MINING from data to information
Ronald Westra, Dep. Mathematics, Knowledge Engineering, Maastricht University



Page 1:

DATA MINING from data to information

Ronald Westra
Dep. Mathematics
Knowledge Engineering
Maastricht University

Page 2:

PART 2

Exploratory Data Analysis

Page 3:

VISUALISING AND EXPLORING DATA-SPACE

Data Mining Lecture II [Chapter 3 from Principles of Data Mining by Hand, Mannila, Smyth]

Page 4:

LECTURE 3: Visualising and Exploring Data-Space

Readings:

• Chapter 3 from Principles of Data Mining by Hand, Mannila, Smyth.

3.1 Obtain insight into the structure of the data space

1. Distribution over the space

2. Are there separate and disconnected parts?

3. Is there a model?

4. Data-driven hypothesis testing

5. Starting point: use the strong perceptual powers of humans

Page 5:

LECTURE 3: Visualising and Exploring Data-Space

3.2 Tools to represent a single variable

1. Mean, variance, standard deviation, skewness

2. Plot

3. Moving-average plot

4. Histogram, kernel density estimate (see the sketch below)
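To make these single-variable tools concrete, here is a minimal Python sketch (an illustrative addition, not part of the original slides) that computes the summary statistics and draws a histogram with a kernel density estimate on top; the gamma-distributed sample is purely made up:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Illustrative single variable: 500 samples from a skewed distribution
rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.5, size=500)

# 1. Summary statistics
print("mean    :", x.mean())
print("variance:", x.var())
print("std dev :", x.std())
print("skewness:", stats.skew(x))

# 4. Histogram with a kernel density estimate drawn over it
grid = np.linspace(x.min(), x.max(), 200)
kde = stats.gaussian_kde(x)
plt.hist(x, bins=30, density=True, alpha=0.5, label="histogram")
plt.plot(grid, kde(grid), label="kernel density estimate")
plt.legend()
plt.show()
```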

Page 6:

histogram

Page 7:

Box Plots

Page 8:

Overprinting

Page 9:

Contour plot

Page 10:

LECTURE 3: Visualising and Exploring Data-Space

3.3 Tools for representing two variables

1. Scatter plot

2. Moving-average plots (see the sketch below)
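A small sketch of both tools for two variables, assuming an illustrative noisy (x, y) relation; the moving-average window of 15 points is an arbitrary choice:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative paired data: a noisy relation between x and y
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.4, size=x.size)

# Moving average of y over a window of 15 points
w = 15
y_ma = np.convolve(y, np.ones(w) / w, mode="valid")
x_ma = x[w // 2 : w // 2 + y_ma.size]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(x, y, s=8)                 # scatter plot
ax1.set_title("scatter plot")
ax2.scatter(x, y, s=8, alpha=0.3)
ax2.plot(x_ma, y_ma, color="red")      # moving-average plot
ax2.set_title("moving-average plot")
plt.tight_layout()
plt.show()
```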

Page 11:

scatter plot

Page 12:

scatter plots

Page 13:

scatter plots

Page 14:

LECTURE 3: Visualising and Exploring Data-Space

3.4 Tools for representing multiple variables

1. All scatter plots, or a selection of them (see the sketch below)

2. Likewise for moving-average plots

3. 'Trellis' or other parameterised plots

4. Icons: star icons, Chernoff's faces
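As an illustration, a minimal pandas sketch of a scatter-plot matrix and of the parallel-coordinates view shown on a later slide; the two-group dataset and its column names are made up:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative 4-variable dataset with two groups
rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, size=(50, 4))
b = rng.normal(loc=2.0, size=(50, 4))
df = pd.DataFrame(np.vstack([a, b]), columns=["v1", "v2", "v3", "v4"])
df["group"] = ["A"] * 50 + ["B"] * 50

# All pairwise scatter plots in one figure
pd.plotting.scatter_matrix(df[["v1", "v2", "v3", "v4"]], figsize=(6, 6))
plt.show()

# Parallel coordinates: one line per observation, one axis per variable
pd.plotting.parallel_coordinates(df, class_column="group")
plt.show()
```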

Page 15:

Chernoff’s faces

Page 16:

Chernoff’s faces

Page 17:

Chernoff’s faces

Page 18:

Star Plots

Page 19:

Parallel coordinates

Page 20:

3.5 PCA: Principal Component Analysis

3.6 MDS: Multidimensional Scaling

DIMENSION REDUCTION

Page 21:

3.5 PCA: Principal Component Analysis

With the sub-scatter plots we already noticed that the best projections were the ones that maximised the spread of the projected set of data points. This spread is largest in the direction of maximum variance.

This idea is worked out in the approach of

Principal Component Analysis.

Page 22:

3.5 PCA: Principal Component Analysis

Principal component analysis (PCA) is a vector space transform often used to reduce multidimensional data sets to lower dimensions for analysis.

Depending on the field of application, it is also named the discrete Karhunen-Loève transform (KLT), the Hotelling transform or proper orthogonal decomposition (POD).

PCA is now mostly used as a tool in exploratory data analysis and for making predictive models. PCA involves the calculation of the eigenvalue decomposition of a data covariance matrix after mean centering the data for each attribute. The results of a PCA are usually discussed in terms of component scores and contributions.

Page 23:

3.5 PCA: Principal Component Analysis

PCA is the simplest of the true eigenvector-based multivariate analyses. Often, its operation can be thought of as revealing the internal structure of the data in a way which best explains the variance in the data.

If a multivariate dataset is visualised as a set of coordinates in a high-dimensional data space (1 axis per variable), PCA supplies the user with a lower-dimensional picture, a "shadow" of this object when viewed from its (in some sense) most informative viewpoint.

Page 24:

3.5 PCA: Principal Component Analysis

PCA is closely related to factor analysis.

Page 25:

3.5 PCA: Principal Component Analysis

Consider a multivariate set in data space: a set of points that is normally distributed in multiple dimensions, for instance:

Observe that the spatial extent appears different in each dimension.

Also observe that in this case the set is almost 1-dimensional.

Can we project the set so that the spatial extent in one dimension is optimal?

Page 26:

3.5 PCA: Principal Component Analysis

Data X: n rows of p fields; the data vectors are the rows of X.

STEP 1: Subtract the mean value from the dataset X: mean-centered data.

The spatial extent of this cloud of points can be measured by the variance in the dataset X. The variance along each coordinate is an entry (on the diagonal) of the correlation matrix V = X^T X.

The projection of the dataset X onto a direction a is: y = Xa.

The spatial extent in direction a is the variance of the projected dataset y, i.e. the variance σ_a^2 = y^T y = (Xa)^T (Xa) = a^T X^T X a = a^T V a.

We now want to maximize this extent σ_a^2 over all possible vectors a (why?).

Page 27:

3.5 PCA: Principal Component Analysis

STEP 2: Maximize σ_a^2 = a^T V a over all possible vectors a.

This is unbounded, just like maximizing x^2 over x, so we restrict the size of the vector a to 1: a^T a - 1 = 0.

So we have: maximize a^T V a subject to a^T a - 1 = 0.

This can be solved with the method of Lagrange multipliers:

maximize f(x) subject to g(x) = 0  →  d/dx { f(x) - λ g(x) } = 0

For our case this means: d/da { a^T V a - λ (a^T a - 1) } = 0

→ 2 V a - 2 λ a = 0

→ V a = λ a

This means that we are looking for the eigenvectors and eigenvalues of the correlation matrix V = X^T X.
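A minimal numpy sketch of STEP 1 and STEP 2: mean-centering the data, forming V = X^T X and solving V a = λ a; the synthetic, elongated point cloud is illustrative only:

```python
import numpy as np

# Synthetic, elongated 2-D point cloud (illustrative data only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.5], [0.5, 0.4]])

# STEP 1: mean-center the data
Xc = X - X.mean(axis=0)

# Correlation matrix V = X^T X of the mean-centered data
V = Xc.T @ Xc

# STEP 2: eigen-decomposition of the symmetric matrix V
eigvals, eigvecs = np.linalg.eigh(V)          # returned in ascending order
order = np.argsort(eigvals)[::-1]             # re-sort in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The first principal direction a maximizes the projected variance a^T V a
a = eigvecs[:, 0]
y = Xc @ a                                     # projection onto direction a
print("lambda_1          :", eigvals[0])
print("projected variance:", y @ y)            # equals a^T V a = lambda_1
```

Because a is a unit eigenvector, the projected variance y^T y equals a^T V a = λ_1, which is exactly what the Lagrange argument above predicts.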

Page 28:

3.5 PCA: Principal Component Analysis

So, the underlying idea is: suppose you have a high-dimensional normally distributed data set. This will take the shape of a high-dimensional ellipsoid.

An ellipsoid is structured from its centre by orthogonal vectors with different radii. The largest radii have the strongest influence on the shape of the ellipsoid. The ellipsoid is described by the covariance matrix of the set of data points. The axes are defined by the orthogonal eigen-vectors (from the centre, i.e. the centroid, of the set); the radii are defined by the associated eigen-values.

So determine the eigen-values and order them in decreasing size: {λ_1, λ_2, λ_3, ..., λ_N} with λ_1 ≥ λ_2 ≥ ... ≥ λ_N.

The first n ordered eigen-vectors thus 'explain' the following fraction of the data: (λ_1 + ... + λ_n) / (λ_1 + ... + λ_N) = Σ_{i=1..n} λ_i / Σ_{i=1..N} λ_i.

Page 29:

3.5 PCA: Principal Component Analysis

Page 30:

3.5 PCA: Principal Component Analysis

Page 31:

3.5 PCA: Principal Component Analysis

[Figure: the data cloud with its MEAN indicated]

Page 32:

3.5 PCA: Principal Component Analysis

[Figure: the MEAN of the data cloud with principal axis 1 and principal axis 2]

Page 33:

3.5 PCA: Principal Component Analysis

STEP 3: Plot the ordered eigen-values versus the index number and inspect where a 'shoulder' occurs: this determines the number of eigen-values you take into account. This is a so-called 'scree plot'.
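A small matplotlib sketch of such a scree plot, together with the cumulative explained-variance fraction from the previous slides; the eigenvalues below are made-up illustrative numbers:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative eigenvalues of V, already sorted in decreasing order
eigvals = np.array([4.2, 1.3, 0.4, 0.15, 0.08, 0.05])
idx = np.arange(1, len(eigvals) + 1)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.plot(idx, eigvals, "o-")
ax1.set_xlabel("index i")
ax1.set_ylabel("eigenvalue λ_i")
ax1.set_title("scree plot: look for the 'shoulder'")

# Cumulative fraction of explained variance: Σ_{i<=n} λ_i / Σ_i λ_i
ax2.plot(idx, np.cumsum(eigvals) / eigvals.sum(), "o-")
ax2.set_xlabel("number of components n")
ax2.set_ylabel("explained fraction")
plt.tight_layout()
plt.show()
```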

Page 34:

3.5 PCA: Principal Component Analysis

For n points of p components, O(np^2 + p^3) operations are required. Use LU decomposition, etc.

Page 35:

3.5 PCA: Principal Component Analysis

Many benefits: considerable data reduction, which is necessary for computational techniques like Fisher discriminant analysis and clustering.

This works very well in practice.

Page 36:

3.5 PCA: Principal Component Analysis

PCA is closely related to and often confused with Factor Analysis:

Factor Analysis is the explanation of p-dimensional data by a smaller number of m < p factors.

Page 37:

EXAMPLE of PCA

Page 38:

Dressler et al. 1987

Astronomical application: PCs for elliptical galaxies

Rotating to the PC in BT – Σ space improves the Faber-Jackson relation as a distance indicator

Page 39:

Astronomical application: eigenspectra (KL transform)

Connolly et al. 1995

Page 40:
Page 41:
Page 42:
Page 43:

[Figure panels labelled 1 pc, 2 pc, 3 pc and 4 pc: reconstructions with increasing numbers of principal components]

Page 44:

3.6 Multi-Dimensional Scaling [MDS]

1. Same purpose: represent a high-dimensional data set

2. In the case of MDS not by projections, but by reconstruction from the distance table. The computed points are represented in a Euclidean sub-space, preferably a 2D plane.

3. MDS performs better than PCA in the case of strongly curved sets.

Page 45:

3.6 Multidimensional Scaling

The purpose of multidimensional scaling (MDS) is to provide a visual representation of the pattern of proximities (i.e., similarities or distances) among a set of objects.

INPUT: distances dist[Ai,Aj], where A is some class of objects

OUTPUT: positions X[Ai], where each X[Ai] is a D-dimensional vector (see the sketch below)
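One standard way to turn such a distance table into positions is classical (Torgerson) MDS: double-center the squared distances and take the leading eigenvectors. A minimal sketch, with a made-up 4-object distance table:

```python
import numpy as np

def classical_mds(dist, dim=2):
    """Reconstruct point positions from a distance matrix (Torgerson MDS)."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (dist ** 2) @ J               # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:dim]      # keep the 'dim' largest eigenvalues
    L = np.maximum(eigvals[order], 0.0)          # clip tiny negative values from noise
    return eigvecs[:, order] * np.sqrt(L)        # positions, one row per object

# Illustrative symmetric distance table for 4 objects (made-up numbers)
D = np.array([[0.0, 2.0, 5.0, 6.0],
              [2.0, 0.0, 4.0, 5.0],
              [5.0, 4.0, 0.0, 2.0],
              [6.0, 5.0, 2.0, 0.0]])
X = classical_mds(D, dim=2)
print(X)   # recovered 2-D coordinates (up to rotation/translation)
```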

Page 46:

3.6 Multidimensional Scaling

Page 47:

3.6 Multidimensional Scaling

Page 48:

3.6 Multidimensional Scaling

INPUT: distances dist[Ai,Aj] where A is some class of objects

Page 49:

3.6 Multidimensional Scaling

OUTPUT: positions X[Ai] where X is a D-dimensional vector

Page 50:

3.6 Multidimensional Scaling

How many dimensions? Inspect the SCREE PLOT.

Page 51:

Multidimensional Scaling: Dutch dialects

Page 52:
Page 53:
Page 54:

3.7 Kohonen’s Self Organizing Map (SOM) and Sammon mapping

1. Same purpose: DIMENSION REDUCTION: represent a high-dimensional set in a smaller sub-space, e.g. a 2D plane.

2. SOM gives better results than Sammon mapping, but is strongly sensitive to initial values.

3. This is close to clustering! (a minimal SOM sketch follows below)
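A minimal sketch of a Kohonen SOM in numpy, purely to illustrate the idea; the grid size, decay schedules and Gaussian input data are arbitrary choices, not the lecture's settings:

```python
import numpy as np

def train_som(X, grid=(8, 8), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal Kohonen SOM: map data X (n x p) onto a 2-D grid of units."""
    rng = np.random.default_rng(seed)
    gx, gy = np.meshgrid(np.arange(grid[0]), np.arange(grid[1]), indexing="ij")
    coords = np.stack([gx.ravel(), gy.ravel()], axis=1).astype(float)  # unit positions on the grid
    W = rng.normal(size=(coords.shape[0], X.shape[1]))                 # unit weight vectors
    t, t_max = 0, epochs * len(X)
    for _ in range(epochs):
        for x in rng.permutation(X):
            frac = t / t_max
            lr = lr0 * (1.0 - frac)                          # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5              # decaying neighbourhood width
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))      # best-matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)   # squared grid distances to the BMU
            h = np.exp(-d2 / (2.0 * sigma ** 2))             # Gaussian neighbourhood function
            W += lr * h[:, None] * (x - W)                   # pull neighbouring units towards x
            t += 1
    return coords, W

# Illustrative use: map a 5-D Gaussian cloud onto an 8 x 8 grid
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
coords, W = train_som(X)
bmus = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)
print(coords[bmus][:5])   # 2-D grid position assigned to the first 5 points
```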

Page 55:

3.7 Kohonen’s Self Organizing Map (SOM)

Page 56:

3.7 Kohonen’s Self Organizing Map (SOM)

Page 57:

Sammon mapping

Page 58:


All information on the math part of the course can be found at:

http://www.math.unimaas.nl/personal/ronaldw/DAM/DataMiningPage.htm

Page 59: