
Page 1: Classification Techniques II

Tan, Steinbach, Kumar, Eick: NN-Classifiers and Support Vector Machines

Classification Techniques II

1. Nearest Neighbor Classifiers
2. Support Vector Machines
3. Editing and Condensing Techniques for NN Classifiers
4. Ensemble Methods and Adaboost
5. Model Evaluation

Page 2: Classification Techniques II


Classification and Decision Boundaries

Classification can be viewed as “learning good decision boundaries” that separate the examples belonging to different classes in a data set.

[Figure: a decision boundary separating the examples of two classes]

Page 3: Classification Techniques II


Instance-Based Classifiers

[Figure: a set of stored cases, each with attributes Atr1, ..., AtrN and a class label (A, B, or C), next to an unseen case that has attributes Atr1, ..., AtrN but no class label]

• Store the training records
• Use the training records to predict the class label of unseen cases

Page 4: Classification Techniques II


Instance-Based Classifiers

Instance-based classifiers do not create a model but use the training examples directly to classify unseen examples ("lazy" classifiers).

Examples:
– Rote-learner: memorizes the entire training data and classifies a record only if its attributes match one of the training examples exactly
– Nearest neighbor: uses the k "closest" points (nearest neighbors) to perform classification

Page 5: Classification Techniques II


1. Nearest-Neighbor Classifiers

Requires three things:
– The set of stored records
– A distance metric to compute the distance between records
– The value of k, the number of nearest neighbors to retrieve

To classify an unknown record:
– Compute its distance to the other training records
– Identify the k nearest neighbors
– Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)

[Figure: an unknown record to be classified among the stored training records]
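To make this procedure concrete, here is a minimal sketch of a k-nearest-neighbor classifier in Python with NumPy; the toy data and the helper name knn_predict are illustrative assumptions, not part of the original slides.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training records."""
    # Euclidean distance from the query to every stored record
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nn_idx = np.argsort(dists)[:k]
    # Majority vote over the neighbors' class labels
    votes = Counter(y_train[i] for i in nn_idx)
    return votes.most_common(1)[0][0]

# Toy example (illustrative data only)
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # expected: "A"
```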

Page 6: Classification Techniques II


Definition of Nearest Neighbor

[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbors, (c) 3-nearest neighbors of a query point x]

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.

Page 7: Classification Techniques II


Voronoi Diagrams for NN-Classifiers

Each cell contains one sample, and every location within the cell is closer to that sample than to any other sample.

A Voronoi diagram divides the space into such cells.

Every query point will be assigned the classification of the sample within that cell. The decision boundary separates the class regions based on the 1-NN decision rule.

Knowledge of this boundary is sufficient to classify new points.

Remarks: Voronoi diagrams can be computed in low-dimensional spaces, but doing so is infeasible in higher-dimensional spaces. They also represent models for clusters that have been generated by representative-based clustering algorithms.
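For readers who want to see a Voronoi diagram computed in practice, the sketch below uses scipy.spatial.Voronoi on a handful of made-up 2-D samples; the data and labels are assumptions, and classifying a query by its Voronoi cell reduces to finding its single nearest stored sample.

```python
import numpy as np
from scipy.spatial import Voronoi

# Stored samples and their class labels (illustrative)
samples = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0], [3.0, 2.5]])
labels = np.array(["A", "B", "A", "B"])

vor = Voronoi(samples)   # one Voronoi cell per stored sample
print(vor.vertices)      # corners of the cell boundaries

# Classifying a query point is equivalent to finding the cell it falls into,
# i.e., its single nearest stored sample (the 1-NN decision rule).
query = np.array([1.8, 0.4])
nearest = np.argmin(((samples - query) ** 2).sum(axis=1))
print(labels[nearest])
```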

Page 8: Classification Techniques II


Nearest Neighbor Classification

Compute the distance between two points:
– Euclidean distance: $d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$

Determine the class from the nearest-neighbor list:
– take the majority vote of the class labels among the k nearest neighbors
– or weigh each vote according to distance, using the weight factor $w = 1/d^2$
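As a sketch of the distance-weighted variant (the helper name weighted_knn_predict and the small eps guard are my own choices, not from the slides), each neighbor's vote is weighted by 1/d²:

```python
import numpy as np
from collections import defaultdict

def weighted_knn_predict(X_train, y_train, x_query, k=3, eps=1e-12):
    """k-NN prediction where each neighbor votes with weight 1/d^2."""
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    nn_idx = np.argsort(dists)[:k]
    scores = defaultdict(float)
    for i in nn_idx:
        # eps guards against division by zero when the query coincides with a record
        scores[y_train[i]] += 1.0 / (dists[i] ** 2 + eps)
    return max(scores, key=scores.get)
```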

Page 9: Classification Techniques II


Nearest Neighbor Classification…

Choosing the value of k:
– If k is too small, the classifier is sensitive to noise points
– If k is too large, the neighborhood may include points from other classes

[Figure: a query point x with neighborhoods of increasing size]
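One common way to choose k, offered here as a sketch rather than as the slides' recommendation, is cross-validation; the snippet below assumes scikit-learn and synthetic data purely for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data, for illustration only
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

for k in [1, 3, 5, 11, 21, 51]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k={k:<3} mean CV accuracy = {scores.mean():.3f}")
```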

Page 10: Classification Techniques II


Nearest Neighbor Classification…

Scaling issues:
– Attributes may have to be scaled to prevent the distance measure from being dominated by one of the attributes
– Example:
  height of a person may vary from 1.5 m to 1.8 m
  weight of a person may vary from 90 lb to 300 lb
  income of a person may vary from $10K to $1M
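A simple remedy, shown here as a sketch rather than as the slides' prescription, is min-max scaling of each attribute to [0, 1] before computing distances:

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column of X to the range [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero

# Illustrative records: [height in m, weight in lb, income in $]
X = np.array([[1.5,  90.0,    10_000.0],
              [1.8, 300.0, 1_000_000.0],
              [1.7, 150.0,    50_000.0]])
print(min_max_scale(X))  # all attributes now contribute comparably to distances
```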

Page 11: Classification Techniques II


Summary Nearest Neighbor Classifiers

k-NN classifiers are lazy learners:
– Unlike eager learners such as decision tree induction and rule-based systems, they do not build a model explicitly
– Classifying unknown records is therefore relatively expensive

k-NN classifiers rely on a distance function; the quality of the distance function is critical for the performance of a k-NN classifier.

k-NN classifiers obtain high accuracies and are quite popular in some fields, such as text data mining and information retrieval in general.

Page 12: Classification Techniques II


2. Support Vector Machines

One Possible Solution

[Figure: a candidate decision boundary B1 separating the two classes]

Page 13: Classification Techniques II


Support Vector Machines

Another possible solution

[Figure: another candidate decision boundary B2]

Page 14: Classification Techniques II


Support Vector Machines

Other possible solutions

[Figure: several other candidate decision boundaries, including B2]

Page 15: Classification Techniques II


Support Vector Machines

Which one is better? B1 or B2? How do you define better?

[Figure: candidate decision boundaries B1 and B2]

Page 16: Classification Techniques II


Support Vector Machines

Find the hyperplane that maximizes the margin => B1 is better than B2

[Figure: hyperplanes B1 and B2 with their margin boundaries b11, b12 and b21, b22; B1 has the larger margin]

Page 17: Classification Techniques II


Support Vector Machines

[Figure: hyperplane B1 with its margin boundaries b11 and b12]

Separating hyperplane: $\vec{w} \cdot \vec{x} + b = 0$
Margin boundaries: $\vec{w} \cdot \vec{x} + b = 1$ and $\vec{w} \cdot \vec{x} + b = -1$

Decision function:
$f(\vec{x}) = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x} + b \ge 1 \\ -1 & \text{if } \vec{w} \cdot \vec{x} + b \le -1 \end{cases}$

$\text{Margin} = \dfrac{2}{\|\vec{w}\|}$

Training examples are of the form $(x_1, \ldots, x_n, y)$ with $y \in \{-1, 1\}$.
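As a tiny numeric illustration (the weight vector w and bias b below are made up, not taken from the slides), the decision function and the margin can be evaluated directly:

```python
import numpy as np

w = np.array([1.0, 1.0])   # hypothetical weight vector
b = -3.0                   # hypothetical bias

def f(x):
    """Decision function from the slide: defined outside the margin boundaries."""
    s = np.dot(w, x) + b
    if s >= 1:
        return 1
    if s <= -1:
        return -1
    return None  # inside the margin: not covered by the slide's definition

print(f(np.array([3.0, 2.0])))               # w.x + b =  2 -> +1
print(f(np.array([0.5, 0.5])))               # w.x + b = -2 -> -1
print("margin =", 2.0 / np.linalg.norm(w))   # 2 / ||w|| ~ 1.41
```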

Page 18: Classification Techniques II


Support Vector Machines

We want to maximize: $\text{Margin} = \dfrac{2}{\|\vec{w}\|}$

– which is equivalent to minimizing: $L(\vec{w}) = \dfrac{\|\vec{w}\|^2}{2}$

– but subject to the following N constraints: $y_i(\vec{w} \cdot \vec{x}_i + b) \ge 1, \quad i = 1, \ldots, N$

This is a constrained convex quadratic optimization problem that can be solved in polynomial time.
– Numerical approaches to solve it (e.g., quadratic programming) exist
– The function to be optimized has only a single minimum, so there is no local-minimum problem
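For experimentation, here is a hedged sketch that approximates the hard-margin formulation with scikit-learn's SVC, using a linear kernel and a very large C; the toy, linearly separable data are my own assumption.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (illustrative)
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C heavily penalizes constraint violations, approximating the hard margin
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]        # learned weight vector
b = clf.intercept_[0]   # learned bias
print("w =", w, "b =", b)
print("margin =", 2.0 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
```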

Page 19: Classification Techniques II


Support Vector Machines

What if the problem is not linearly separable?

Page 20: Classification Techniques II


Linear SVM for Non-linearly Separable Problems

What if the problem is not linearly separable?
– Introduce slack variables $\xi_i$

Need to minimize:
$L(\vec{w}) = \dfrac{\|\vec{w}\|^2}{2} + C \sum_{i=1}^{N} \xi_i^{k}$

Subject to (i = 1, ..., N):
(1) $y_i(\vec{w} \cdot \vec{x}_i + b) \ge 1 - \xi_i$
(2) $\xi_i \ge 0$

C is chosen using a validation set, trying to keep the margin wide while keeping the training error low.

Remarks: in $L(\vec{w})$, the first term corresponds to the inverse of the margin between the hyperplanes, while the second term measures the training error; the parameter C trades the two off, and the slack variables $\xi_i$ allow constraint violations to a certain degree.
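To see the role of C, the following sketch (with assumed, overlapping toy data) fits soft-margin linear SVMs for several C values and reports margin width, number of support vectors, and training error: small C favors a wide margin at the cost of training error, large C the opposite.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs, so some slack is unavoidable
X = np.vstack([rng.normal(0.0, 1.2, (50, 2)), rng.normal(2.5, 1.2, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_[0])
    train_err = 1.0 - clf.score(X, y)
    print(f"C={C:<6} margin={margin:.2f}  support vectors={len(clf.support_vectors_)}  "
          f"training error={train_err:.2f}")
```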

Page 21: Classification Techniques II


Nonlinear Support Vector Machines

What if the decision boundary is not linear?

Alternative 1: use a technique that employs non-linear decision boundaries.

[Figure: two classes separated by a non-linear function]

Page 22: Classification Techniques II


Nonlinear Support Vector Machines

Alternative 2: transform the data into a higher-dimensional attribute space and find linear decision boundaries in this space.

1. Transform the data into a higher-dimensional space
2. Find the best hyperplane using the methods introduced earlier
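A minimal sketch of Alternative 2 under my own toy setup (not from the slides): 1-D data that no single threshold can separate become linearly separable after the assumed transformation x -> (x, x²).

```python
import numpy as np
from sklearn.svm import SVC

# 1-D data: class +1 sits between the two groups of class -1,
# so no single threshold on x separates the classes.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([-1, -1, 1, 1, 1, -1, -1])

# Transform into a 2-D attribute space: x -> (x, x^2)
X2 = np.column_stack([x, x ** 2])

clf = SVC(kernel="linear", C=1e3).fit(X2, y)
print(clf.score(X2, y))  # 1.0: a linear boundary in (x, x^2) separates the classes
```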

Page 23: Classification Techniques II


Nonlinear Support Vector Machines

1. Choose a non-linear mapping $\Phi$ (associated with a kernel function) to transform the data into a different, usually higher-dimensional, attribute space

2. Minimize $L(\vec{w}) = \dfrac{\|\vec{w}\|^2}{2}$
– but subject to the following N constraints:
$y_i(\vec{w} \cdot \Phi(\vec{x}_i) + b) \ge 1, \quad i = 1, \ldots, N$

That is, find a good hyperplane in the transformed space.

Page 24: Classification Techniques II


Example: Polynomial Kernel Function

Polynomial kernel (for two attributes):
$\Phi(x_1, x_2) = (x_1^2,\ x_2^2,\ \sqrt{2}\,x_1 x_2,\ \sqrt{2}\,x_1,\ \sqrt{2}\,x_2,\ 1)$
$K(\vec{u}, \vec{v}) = \Phi(\vec{u}) \cdot \Phi(\vec{v}) = (\vec{u} \cdot \vec{v} + 1)^2$

A support vector machine with a polynomial kernel function classifies a new example $\vec{z}$ as follows:
$\operatorname{sign}\Big(\sum_i \lambda_i y_i\, \Phi(\vec{x}_i) \cdot \Phi(\vec{z}) + b\Big) = \operatorname{sign}\Big(\sum_i \lambda_i y_i\, (\vec{x}_i \cdot \vec{z} + 1)^2 + b\Big)$

Remark: the $\lambda_i$ and $b$ are determined using the methods for linear SVMs that were discussed earlier.

Kernel function trick: perform the computations in the original space, although we solve an optimization problem in the transformed space; this is much more efficient!
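As a sanity check on the kernel trick (the vectors u and v below are made up), this sketch verifies numerically that the explicit feature map Φ and the kernel K(u, v) = (u·v + 1)² yield the same inner product:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 polynomial feature map for a 2-D input."""
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     1.0])

def K(u, v):
    """Polynomial kernel computed directly in the original 2-D space."""
    return (np.dot(u, v) + 1.0) ** 2

u, v = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.dot(phi(u), phi(v)))  # inner product in the transformed space
print(K(u, v))                 # same value, computed without the transformation
```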

Page 25: Classification Techniques II


Summary: Support Vector Machines

Support vector machines learn hyperplanes that separate two classes while maximizing the margin between them (the empty space between the instances of the two classes).

Support vector machines introduce slack variables in case the classes are not linearly separable, trying to keep the margin wide while keeping the training error low.

The most popular versions of SVMs use non-linear kernel functions to map the attribute space into a higher-dimensional space, to facilitate finding "good" linear decision boundaries in the modified space.

Support vector machines find "margin-optimal" hyperplanes by solving a convex quadratic optimization problem. However, this optimization process is quite slow, and support vector machines tend to fail if the number of examples goes beyond 500/2000/5000…

In general, support vector machines achieve quite high accuracies compared to other techniques.

Page 26: Classification Techniques II


Useful Support Vector Machine Links

Lecture notes that are very helpful for understanding the basic ideas:
  http://www.ics.uci.edu/~welling/teaching/KernelsICS273B/Kernels.html
  http://cerium.raunvis.hi.is/~tpr/courseware/svm/kraekjur.html
Tools often used in publications:
  libsvm: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
  spider: http://www.kyb.tuebingen.mpg.de/bs/people/spider/index.html
Tutorial slides:
  http://www.support-vector.net/icml-tutorial.pdf
Surveys:
  http://www.svms.org/survey/Camp00.pdf
More general material:
  http://www.learning-with-kernels.org/
  http://www.kernel-machines.org/
  http://kernel-machines.org/publications.html
  http://www.support-vector.net/tutorial.html

Remark: Thanks to Chaofan Sun for providing these links!