7/31/2019 Regression, Classification and Clustering
http://slidepdf.com/reader/full/regression-classification-and-clustering 1/23
Mah-Rukh Fida
June 2012
Topics to be discussed
• Data Mining
• Regression
• Classification
• Clustering
DATA MINING
Definition: Exploring data to uncover hidden information.
Models of data mining
Two categories of data mining models
Prediction Model
• Makes predictions using known results found from different data objects.
Descriptive Model
• Identifies patterns or relationships in data.
• Explores properties of the data examined.
• Does not predict new properties.
REGRESSION
Definition
• Numeric prediction of the value of a dependent variable.
• The relationship between the dependent and independent variable(s) is expressible through a mathematical equation.
Types of regression
Types of Regression
• Linear regression: $y = c + mx$, where $c$ and $m$ are regression coefficients.
• Multi-linear regression: $y = c_0 + c_1 x_1 + c_2 x_2 + \dots + c_n x_n$, where $c_0, c_1, \dots, c_n$ are regression coefficients and $x_1, x_2, \dots, x_n$ are independent variables.
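As a rough illustration of how the coefficients $c$ and $m$ of the linear form can be estimated, the sketch below uses ordinary least squares. The slides do not say which fitting method is intended, so least squares is an assumption here, and the function name `fit_line` and the data points are made up for the example.

```python
def fit_line(xs, ys):
    """Estimate (c, m) for y = c + m*x by ordinary least squares (assumed method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx             # slope
    c = mean_y - m * mean_x   # intercept
    return c, m


# Illustrative data lying exactly on y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
c, m = fit_line(xs, ys)
print(c, m)  # 1.0 2.0
```

Once `c` and `m` are known, predicting a new value is just evaluating `c + m * x` for an unseen `x`.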
Regression Continued …
A regression model is selected when
• Prediction of a continuous or numerical value is needed.
• The relationship between predictor and response can be expressed in the form of a curve or a mathematical equation.
Regression is not suitable when
• The data may not fit a linear model.
• The linear fit may be poor due to noise or outliers.
• The data is non-numeric.
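To make the point about outliers concrete, here is a small sketch that fits a least-squares slope to the same data with and without a single outlier. The fitting method, the helper name `least_squares_slope`, and the numbers are all assumptions chosen for illustration.

```python
def least_squares_slope(xs, ys):
    """Slope m of the least-squares line y = c + m*x (assumed fitting method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
clean_ys = [2.0, 4.0, 6.0, 8.0, 10.0]   # exactly y = 2x
noisy_ys = [2.0, 4.0, 6.0, 8.0, 30.0]   # last point replaced by an outlier

clean_m = least_squares_slope(xs, clean_ys)
noisy_m = least_squares_slope(xs, noisy_ys)
print(clean_m, noisy_m)  # 2.0 6.0
```

A single bad point triples the estimated slope in this toy data, which is why a linear fit can be poor when noise or outliers are present.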
CLASSIFICATION
Definition
• Predicts class membership of data instances.
• Classes are non-overlapping.
• Classes are already defined.
Basic Steps for Prediction
• Model Construction
• Model Usage
Example:
• Height-based output follows the division criteria given below:
  - 2m ≤ Height: Tall
  - 1.7m < Height < 2m: Medium
  - Height ≤ 1.7m: Short
• Classify <Pat, F, 1.6> using KNN with K=5.
  - The five nearest neighbours are {<Kristina, F, 1.6>, <Kathy, F, 1.6>, <Stephanie, F, 1.7>, <Dave, M, 1.7>, <Wynette, F, 1.75>}.
  - Four of the five neighbours are Short (Height ≤ 1.7m), so Pat is Short.
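The KNN example can be reproduced with a short sketch. It assumes that distance is measured on height alone (gender is ignored), that each neighbour's class label comes from the height division criteria on the slide, and that the vote is a simple majority. The function names `height_class` and `knn_classify` are hypothetical, and only the five records listed on the slide are used as training data.

```python
from collections import Counter


def height_class(height_m):
    """Class label from the slide's division criteria."""
    if height_m >= 2.0:
        return "Tall"
    if height_m > 1.7:
        return "Medium"
    return "Short"


def knn_classify(query_height, records, k):
    """Majority vote among the k records closest in height (assumed distance)."""
    nearest = sorted(records, key=lambda r: abs(r[2] - query_height))[:k]
    votes = Counter(height_class(r[2]) for r in nearest)
    return votes.most_common(1)[0][0]


# The five neighbours given on the slide: (name, gender, height in metres)
neighbours = [
    ("Kristina", "F", 1.6),
    ("Kathy", "F", 1.6),
    ("Stephanie", "F", 1.7),
    ("Dave", "M", 1.7),
    ("Wynette", "F", 1.75),
]

print(knn_classify(1.6, neighbours, k=5))  # Short
```

Four of the five neighbours fall in the Short class and one (Wynette) in Medium, so the majority vote assigns Pat to Short.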
Validation Criteria
CLUSTERING
Definition
• Grouping of similar items.
• Groups are not predefined.
Figure: Four Clusters
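To show what grouping without predefined groups can look like in practice, here is a minimal one-dimensional k-means sketch. The slides do not name a specific algorithm at this point, so k-means is used here only as one well-known example; the function name `kmeans_1d`, the data, and the starting centroids are assumptions.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Very small k-means on 1-D data; starting centroids are supplied by the caller."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Illustrative data with two obvious groups
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = kmeans_1d(points, centroids=[1.0, 5.0])
print(clusters)  # [[1.0, 1.2, 0.8], [5.0, 5.2, 4.8]]
```

No class labels are given to the algorithm; the two groups emerge only from the distances between points.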
Clustering Algorithms
Clustering Algorithm
Result Validation
• If the clusters do not make sense, go back to the prior stage.
• Check for clustering tendency in the data set.
Selection Criteria
• Simplification
• Useful in data concept construction
• Unsupervised learning
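Validation Criteria
External criteria
• Entropy, F-Measure, NMI (Normalized Mutual Information), Purity
Internal criteria
• Sum of Squared Error (SSE), BIC (Bayesian Information Criterion), CH (Calinski-Harabasz index), DB (Davies-Bouldin index), SIL (Silhouette), DUNN (Dunn index)
Relative criteria
• Entropy, SSE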
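Two of the listed measures are simple enough to sketch directly: SSE as an internal criterion (it needs only the clustered points) and purity as an external criterion (it needs known class labels). The sketch uses the usual textbook definitions; the function names and the sample data are made up for illustration.

```python
from collections import Counter


def sse(clusters):
    """Sum of squared distances from each 1-D point to its cluster mean."""
    total = 0.0
    for cluster in clusters:
        centroid = sum(cluster) / len(cluster)
        total += sum((p - centroid) ** 2 for p in cluster)
    return total


def purity(cluster_labels):
    """Fraction of points that carry the majority true label of their cluster."""
    n = sum(len(labels) for labels in cluster_labels)
    majority = sum(Counter(labels).most_common(1)[0][1] for labels in cluster_labels)
    return majority / n


# Internal check: only the clustered points are needed
print(sse([[1.0, 2.0, 3.0], [10.0, 12.0]]))        # 4.0

# External check: true class labels of the points in each cluster
print(purity([["a", "a", "b"], ["b", "b"]]))       # 0.8
```

Lower SSE indicates tighter clusters, while purity closer to 1 indicates that clusters line up with the known classes.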