Supervised Learning
Regression, Classification: Linear regression, k-NN classification
Debapriyo Majumdar
Data Mining – Fall 2014
Indian Statistical Institute Kolkata
August 11, 2014
An Example: Size of Engine vs Power
[Scatter plot: Engine displacement (cc) on the x-axis vs Power (bhp) on the y-axis]
An unknown car has an engine of size 1800cc. What is likely to be the power of the engine?
An Example: Size of Engine vs Power
[Scatter plot: Engine displacement (cc) vs Power (bhp); power is the target variable]
– Intuitively, the two variables are related
– Learn the relation from the given data
– Predict the target variable after learning
Exercise: on a simpler set of data points
Predict y for x = 2.5
[Plot: the data points (x, y) below]

x     y
1     1
2     3
3     7
4     10
2.5   ?
Linear Regression
[Scatter plot: Engine displacement (cc) vs Power (bhp); training set]
Assume the relation is linear. Then, for a given x (= 1800), predict the value of y.
Linear Regression
[Scatter plot: Engine displacement (cc) vs Power (bhp), with a fitted line]
Linear regression:
– Assume y = a·x + b
– Try to find suitable a and b

Optional exercise:

Engine (cc)   Power (bhp)
800           60
1000          90
1200          80
1200          100
1200          75
1400          90
1500          120
1800          160
2000          140
2000          170
2400          180
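The fit in the optional exercise can be sketched with the standard closed-form least-squares formulas (a = Sxy / Sxx, b = ȳ − a·x̄), applied to the table above; the variable and function names here are my own:

```python
# Closed-form least-squares fit of y = a*x + b to the engine table above.
data = [(800, 60), (1000, 90), (1200, 80), (1200, 100), (1200, 75),
        (1400, 90), (1500, 120), (1800, 160), (2000, 140), (2000, 170),
        (2400, 180)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Standard OLS formulas: a = Sxy / Sxx, b = mean_y - a * mean_x
sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
sxx = sum((x - mean_x) ** 2 for x, _ in data)
a = sxy / sxx
b = mean_y - a * mean_x

def predict(x):
    return a * x + b

print(round(predict(1800), 1))  # prints 138.9
```

For the 1800 cc query from the earlier slide, this line predicts roughly 139 bhp.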
Exercise: using Linear Regression
– Define a regression line of your choice
– Predict y for x = 2.5
[Plot: the data points (x, y) below]

x     y
1     1
2     3
3     7
4     10
2.5   ?
Choosing the parameters right
The data points: (x1, y1), (x2, y2), … , (xm, ym)
The regression line: f(x) = y = a·x + b
Least-square cost function: J = Σᵢ (f(xᵢ) − yᵢ)²
Goal: minimize J over choices of a and b
[Plot: a regression line through the data points, showing the deviations from the line]
Goal: minimizing the deviation from the actual data points
How to Minimize the Cost Function?
Goal: minimize J over all values of a and b
– Start from some a = a0 and b = b0
– Compute J(a0, b0)
– Simultaneously change a and b in the direction of the negative gradient, hoping eventually to arrive at an optimum
Question: can there be more than one optimum?
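The descent procedure can be sketched as follows on the small exercise data set; the learning rate and iteration count are illustrative choices, not values from the slides:

```python
# Gradient descent on J(a, b) = sum_i (a*x_i + b - y_i)^2 for the toy data.
points = [(1, 1), (2, 3), (3, 7), (4, 10)]

a, b = 0.0, 0.0           # start from a0 = b0 = 0
lr = 0.01                 # learning rate (step size), an illustrative choice
for _ in range(20000):
    # Partial derivatives of J with respect to a and b
    grad_a = sum(2 * (a * x + b - y) * x for x, y in points)
    grad_b = sum(2 * (a * x + b - y) for x, y in points)
    # Simultaneous update in the direction of the negative gradient
    a -= lr * grad_a
    b -= lr * grad_b

print(round(a, 2), round(b, 2))  # prints 3.1 -2.5
```

This also answers the question above for this model: with the least-square cost, J is convex in (a, b), so there is a single global optimum, and gradient descent with a suitable step size converges to the unique least-squares solution (here a = 3.1, b = −2.5).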
Another example:
Given that a person’s age is 24, predict if (s)he has high blood sugar
– Discrete values of the target variable (Y / N)
– Many ways of approaching this problem
[Plot: Age (x-axis) vs high blood sugar, Y/N (y-axis); training set]
Classification problem
One approach: what other data points are nearest to the new point?
Other approaches?
[Plot: Age vs high blood sugar (Y/N); the query point at age 24 is marked with a ?]
Classification Algorithms
– The k-nearest neighbor (k-NN) classification
– Naïve Bayes classification
– Decision Tree
– Linear Discriminant Analysis
– Logistic Regression
– Support Vector Machine
Classification or Regression?
Given data about some cars: engine size, number of seats, petrol / diesel, has airbag or not, price
Problem 1: Given engine size of a new car, what is likely to be the price?
Problem 2: Given the engine size of a new car, is it likely that the car is run by petrol?
Problem 3: Given the engine size, is it likely that the car has airbags?
Problem 1 is a regression problem; Problems 2 and 3 are classification problems.
Example: Age, Income and Owning a flat
[Scatter plot: Age (x-axis) vs Monthly income in thousand rupees (y-axis); training set labeled: owns a flat / does not own a flat]
Given a new person’s age and income, predict – does (s)he own a flat?
Example: Age, Income and Owning a flat
[Scatter plot: Age vs Monthly income (thousand rupees), same training set]
Nearest neighbor approach: find the nearest neighbors among the known data points and check their labels.
Example: Age, Income and Owning a flat
[Scatter plot: Age vs Monthly income (thousand rupees), same training set, with the query point and its nearest neighbor]
The 1-Nearest Neighbor (1-NN) Algorithm:
– Find the closest point in the training set
– Output the label of that nearest neighbor
The k-Nearest Neighbor Algorithm
[Scatter plot: Age vs Monthly income (thousand rupees), same training set, with the query point and its k nearest neighbors]
The k-Nearest Neighbor (k-NN) Algorithm:
– Find the closest k points in the training set
– Take a majority vote among the labels of those k points
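A minimal sketch of the k-NN rule on the (age, monthly income) example; the training coordinates below are illustrative, not read off the chart, and the label True stands for "owns a flat":

```python
import math

# Illustrative training set: ((age, monthly income), owns_flat)
train = [
    ((22, 20), False), ((25, 30), False), ((28, 60), False),
    ((32, 40), False), ((35, 120), True), ((45, 80), True),
    ((50, 200), True), ((55, 150), True), ((60, 90), True),
]

def knn_predict(query, k=3):
    # Sort training points by Euclidean distance to the query point
    by_dist = sorted(train, key=lambda p: math.dist(p[0], query))
    labels = [label for _, label in by_dist[:k]]
    # Majority vote among the k nearest labels (k = 1 gives the 1-NN rule)
    return labels.count(True) > k // 2

print(knn_predict((40, 100), k=3))  # prints True
```

Note that with raw units, the income axis dominates the distance; in practice the features would be scaled first, which connects to the distance-measure discussion below.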
Distance measures
How do we measure distance to find the closest points?
– Euclidean distance between vectors x = (x1, …, xk) and y = (y1, …, yk): d(x, y) = √(Σᵢ (xᵢ − yᵢ)²)
– Manhattan distance: d(x, y) = Σᵢ |xᵢ − yᵢ|
– Generalized squared interpoint distance: d²(x, y) = (x − y)ᵀ S⁻¹ (x − y), where S is the covariance matrix; this is the Mahalanobis distance (1936)
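The three measures can be written directly for equal-length vectors; `mahalanobis_sq` below is a hypothetical helper that takes the inverse covariance matrix as a nested list, to stay dependency-free:

```python
import math

def euclidean(x, y):
    # d(x, y) = sqrt(sum_i (x_i - y_i)^2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # d(x, y) = sum_i |x_i - y_i|
    return sum(abs(a - b) for a, b in zip(x, y))

def mahalanobis_sq(x, y, s_inv):
    # Generalized squared interpoint distance (x - y)^T S^-1 (x - y),
    # with s_inv the inverse covariance matrix as a nested list.
    d = [a - b for a, b in zip(x, y)]
    return sum(d[i] * s_inv[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))
```

With S the identity matrix, the Mahalanobis form reduces to the squared Euclidean distance.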
Classification setup
– Training data / set: set of input data points and given answers for the data points
– Labels: the list of possible answers
– Test data / set: inputs to the classification algorithm for finding labels; used for evaluating the algorithm in case the answers are known (but not given to the algorithm)
– Classification task: determining labels of the data points for which the label is not known or not passed to the algorithm
– Features: attributes that represent the data
Evaluation
– Test set accuracy: the correct performance measure
– Accuracy = # of correct answers / # of all answers
– Need to know the true test labels
– Option: use the training set itself; parameter selection (e.g., k for k-NN) by accuracy on the training set
– Overfitting: a classifier performs too well on the training set compared to new (unlabeled) test data
Better validation methods
Leave one out:
– For each data point x of the training set D, construct training set D − {x} and test set {x}
– Train on D − {x}, test on x
– Overall accuracy = average over all such cases
– Expensive to compute
Hold-out set:
– Randomly choose x% (say 25–30%) of the training data and set it aside as the test set
– Train on the rest of the training data, test on the test set
– Easy to compute, but tends to have higher variance
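A sketch of leave-one-out with a 1-NN classifier, on a tiny illustrative age data set in the spirit of the blood-sugar example (the numbers are my own, not from the slides):

```python
# Leave-one-out validation of a 1-NN classifier on a 1-D data set
# (age -> high blood sugar, Y/N). Illustrative data.
data = [(18, 'N'), (25, 'N'), (30, 'N'), (40, 'Y'), (55, 'Y'), (70, 'Y')]

def nn_label(x, train):
    # 1-NN: return the label of the closest training point
    return min(train, key=lambda p: abs(p[0] - x))[1]

correct = 0
for i, (x, y) in enumerate(data):
    train = data[:i] + data[i + 1:]      # D - {x}: leave one point out
    correct += (nn_label(x, train) == y)

accuracy = correct / len(data)           # average over all such cases
print(accuracy)
```

Each point is tested exactly once against a model that never saw it, which is what makes the estimate honest despite the small data set.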
The k-fold Cross Validation Method
– Randomly divide the training data into k partitions D1, …, Dk (a roughly equal division)
– For each fold Di: train a classifier with training data D − Di, then test and validate with Di
– Overall accuracy: average accuracy over all k folds
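The procedure above can be sketched as a reusable skeleton; the classifier inside is a stand-in majority-class predictor of my own choosing, and any train/predict pair could be substituted:

```python
import random

def k_fold_accuracy(data, k, seed=0):
    # Randomly divide the data into k roughly equal partitions
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]

    accuracies = []
    for i in range(k):
        test = folds[i]                                   # D_i
        train = [p for j, fold in enumerate(folds)        # D - D_i
                 if j != i for p in fold]
        # Stand-in classifier: always predict the training fold's majority label
        labels = [y for _, y in train]
        majority = max(set(labels), key=labels.count)
        acc = sum(y == majority for _, y in test) / len(test)
        accuracies.append(acc)
    return sum(accuracies) / k    # overall accuracy: average over the k folds
```

For example, `k_fold_accuracy([(i, 'Y') for i in range(10)], 5)` returns 1.0, since a single-label data set is trivially predicted on every fold.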
References Lecture videos by Prof. Andrew Ng, Stanford University
Available on Coursera (Course: Machine Learning)
Data Mining Map: http://www.saedsayad.com/