Support Vector Machines
Optimization objective
Machine Learning
Andrew Ng
Alternative view of logistic regression
hθ(x) = 1 / (1 + e^(-θᵀx))
If y = 1, we want hθ(x) ≈ 1, θᵀx ≫ 0
If y = 0, we want hθ(x) ≈ 0, θᵀx ≪ 0

Cost of example: -( y log hθ(x) + (1 - y) log(1 - hθ(x)) )
If y = 1 (want θᵀx ≫ 0): cost is -log(1 / (1 + e^(-θᵀx))); the SVM replaces it with a piecewise-linear cost₁(θᵀx)
If y = 0 (want θᵀx ≪ 0): cost is -log(1 - 1 / (1 + e^(-θᵀx))); the SVM replaces it with cost₀(θᵀx)
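The two surrogate costs can be sketched numerically. A minimal sketch; the slides only specify cost₁/cost₀ as piecewise linear with kinks at z = 1 and z = -1, so the standard hinge shape is assumed here:

```python
import math

# Assumption: cost1/cost0 use the standard hinge shape; the slides only
# show them as piecewise-linear approximations of the logistic cost.

def logistic_cost_pos(z):
    # logistic cost for a y = 1 example: -log h(z), with h(z) = 1/(1 + e^-z)
    return -math.log(1.0 / (1.0 + math.exp(-z)))

def cost1(z):
    # SVM surrogate for y = 1: zero once z >= 1, linear below
    return max(0.0, 1.0 - z)

def cost0(z):
    # SVM surrogate for y = 0: zero once z <= -1, linear above
    return max(0.0, 1.0 + z)

for z in (-2.0, 0.0, 2.0):
    print(z, round(logistic_cost_pos(z), 3), cost1(z), cost0(z))
```

Note how both costs vanish exactly once the margin condition is met, unlike the logistic cost, which only approaches zero asymptotically.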
Support vector machine
Logistic regression: min over θ of (1/m) Σᵢ [ y⁽ⁱ⁾ (-log hθ(x⁽ⁱ⁾)) + (1 - y⁽ⁱ⁾) (-log(1 - hθ(x⁽ⁱ⁾))) ] + (λ/2m) Σⱼ θⱼ²
Support vector machine: min over θ of C Σᵢ [ y⁽ⁱ⁾ cost₁(θᵀx⁽ⁱ⁾) + (1 - y⁽ⁱ⁾) cost₀(θᵀx⁽ⁱ⁾) ] + (1/2) Σⱼ θⱼ²

SVM hypothesis
Hypothesis: hθ(x) = 1 if θᵀx ≥ 0, and 0 otherwise
Large Margin Intuition
Support Vector Machine
If y = 1, we want θᵀx ≥ 1 (not just ≥ 0)
If y = 0, we want θᵀx ≤ -1 (not just < 0)
[Figure: cost₁(z) is zero for z ≥ 1; cost₀(z) is zero for z ≤ -1]
SVM Decision Boundary
min over θ of C Σᵢ [ y⁽ⁱ⁾ cost₁(θᵀx⁽ⁱ⁾) + (1 - y⁽ⁱ⁾) cost₀(θᵀx⁽ⁱ⁾) ] + (1/2) Σⱼ θⱼ²
Whenever y⁽ⁱ⁾ = 1: we need θᵀx⁽ⁱ⁾ ≥ 1
Whenever y⁽ⁱ⁾ = 0: we need θᵀx⁽ⁱ⁾ ≤ -1
With C very large, this reduces to: min over θ of (1/2) Σⱼ θⱼ², subject to θᵀx⁽ⁱ⁾ ≥ 1 if y⁽ⁱ⁾ = 1 and θᵀx⁽ⁱ⁾ ≤ -1 if y⁽ⁱ⁾ = 0
SVM Decision Boundary: Linearly separable case
[Figure: linearly separable data in the (x1, x2) plane; the SVM chooses the separating line with the largest margin]
Large margin classifier
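The margin in the figure can be made concrete: the geometric margin is the smallest distance from any example to the line θᵀx + b = 0, and the SVM's objective of minimizing ‖θ‖ under the ≥ 1 constraints is what maximizes it. A minimal sketch with a made-up boundary and points:

```python
import math

# Hypothetical boundary theta . x + b = 0 and three toy points;
# the margin is the smallest point-to-line distance.
theta = [1.0, 1.0]
b = -3.0
points = [[1.0, 1.0], [1.0, 4.0], [4.0, 1.0]]

def distance(x):
    # perpendicular distance from x to the line theta . x + b = 0
    score = theta[0] * x[0] + theta[1] * x[1] + b
    return abs(score) / math.hypot(theta[0], theta[1])

margin = min(distance(x) for x in points)
print(round(margin, 4))  # 1/sqrt(2) for this toy boundary
```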
Large margin classifier in presence of outliers
[Figure: data with one outlier in the (x1, x2) plane; with C very large the boundary moves to accommodate the outlier, with C not too large it barely changes]
The mathematics behind large margin classification (optional)
Vector Inner Product
uᵀv = u₁v₁ + u₂v₂ = p · ‖u‖, where p is the (signed) length of the projection of v onto u
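A quick numerical check of this identity (the vectors are made-up illustrations):

```python
import math

# u . v equals p * ||u||, where p is the signed length of the
# projection of v onto u.
u = [4.0, 2.0]
v = [1.0, 3.0]

dot = u[0] * v[0] + u[1] * v[1]
norm_u = math.hypot(u[0], u[1])
p = dot / norm_u               # signed projection length of v onto u
print(dot, p * norm_u)         # the two quantities agree
```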
SVM Decision Boundary
min over θ of (1/2) Σⱼ θⱼ², subject to θᵀx⁽ⁱ⁾ ≥ 1 if y⁽ⁱ⁾ = 1 and θᵀx⁽ⁱ⁾ ≤ -1 if y⁽ⁱ⁾ = 0
Writing θᵀx⁽ⁱ⁾ = p⁽ⁱ⁾ · ‖θ‖, where p⁽ⁱ⁾ is the projection of x⁽ⁱ⁾ onto θ, the constraints become p⁽ⁱ⁾ · ‖θ‖ ≥ 1 or ≤ -1: satisfying them while keeping ‖θ‖ small forces the projections |p⁽ⁱ⁾| to be large, which is exactly a large margin
Kernels I
Non-linear Decision Boundary
[Figure: non-linearly separable data in the (x1, x2) plane]
Predict y = 1 if θ₀ + θ₁x₁ + θ₂x₂ + θ₃x₁x₂ + θ₄x₁² + θ₅x₂² + … ≥ 0
Is there a different / better choice of the features f₁, f₂, f₃, …?
Kernel
[Figure: three landmarks l⁽¹⁾, l⁽²⁾, l⁽³⁾ in the (x1, x2) plane]
Given x, compute new features f₁, f₂, f₃ depending on proximity to the landmarks l⁽¹⁾, l⁽²⁾, l⁽³⁾
Kernels and Similarity
f₁ = similarity(x, l⁽¹⁾) = exp(-‖x - l⁽¹⁾‖² / (2σ²))
If x ≈ l⁽¹⁾: f₁ ≈ 1; if x is far from l⁽¹⁾: f₁ ≈ 0
Example: f₁ = exp(-‖x - l⁽¹⁾‖² / (2σ²)) plotted for several values of σ²; smaller σ² gives a narrower bump around the landmark
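The effect of σ² can be checked directly; the landmark and query points below are made up:

```python
import math

# Gaussian (RBF) similarity to a landmark l, as defined on the slide.
def gaussian_similarity(x, l, sigma_sq):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, l))
    return math.exp(-sq_dist / (2.0 * sigma_sq))

l1 = [3.0, 5.0]
x_near = [3.5, 5.0]
print(gaussian_similarity([3.0, 5.0], l1, 1.0))   # x at the landmark: exactly 1
print(gaussian_similarity(x_near, l1, 1.0))       # nearby point: close to 1
print(gaussian_similarity(x_near, l1, 0.01))      # smaller sigma^2: falls off much faster
```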
Kernels II
Choosing the landmarks
[Figure: candidate landmarks in the (x1, x2) plane]
Given x: fᵢ = similarity(x, l⁽ⁱ⁾) = exp(-‖x - l⁽ⁱ⁾‖² / (2σ²))
Predict y = 1 if θ₀ + θ₁f₁ + θ₂f₂ + θ₃f₃ ≥ 0
Where to get l⁽¹⁾, l⁽²⁾, l⁽³⁾, …?
SVM with Kernels
Given (x⁽¹⁾, y⁽¹⁾), (x⁽²⁾, y⁽²⁾), …, (x⁽ᵐ⁾, y⁽ᵐ⁾), choose the landmarks l⁽¹⁾ = x⁽¹⁾, l⁽²⁾ = x⁽²⁾, …, l⁽ᵐ⁾ = x⁽ᵐ⁾
Given example x: f₁ = similarity(x, l⁽¹⁾), f₂ = similarity(x, l⁽²⁾), …, collected into f ∈ ℝ^(m+1) with f₀ = 1
For training example (x⁽ⁱ⁾, y⁽ⁱ⁾): compute f⁽ⁱ⁾ the same way, with fⱼ⁽ⁱ⁾ = similarity(x⁽ⁱ⁾, l⁽ʲ⁾)
SVM with Kernels
Hypothesis: Given x, compute features f ∈ ℝ^(m+1). Predict "y = 1" if θᵀf ≥ 0
Training: min over θ of C Σᵢ [ y⁽ⁱ⁾ cost₁(θᵀf⁽ⁱ⁾) + (1 - y⁽ⁱ⁾) cost₀(θᵀf⁽ⁱ⁾) ] + (1/2) Σⱼ θⱼ²
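Mapping every training example through all m landmarks amounts to building an m × m feature (kernel) matrix. A minimal sketch; the data and σ² are made up, and the bias feature f₀ = 1 is omitted for brevity:

```python
import math

# With landmarks l(i) = x(i), example x(i) is mapped to
# f(i) = [sim(x(i), x(1)), ..., sim(x(i), x(m))].
def sim(x, l, sigma_sq=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, l)) / (2 * sigma_sq))

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]   # m = 3 toy examples
F = [[sim(x, l) for l in X] for x in X]    # row i is the feature vector f(i)

print(F[0])  # similarity of x(1) to every landmark; first entry is 1
```

The matrix is symmetric with ones on the diagonal, since every example is maximally similar to itself.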
SVM parameters:
C (= 1/λ). Large C: lower bias, higher variance. Small C: higher bias, lower variance.
σ². Large σ²: features fᵢ vary more smoothly; higher bias, lower variance. Small σ²: features fᵢ vary less smoothly; lower bias, higher variance.
Using an SVM
Use SVM software package (e.g. liblinear, libsvm, …) to solve for parameters θ.
Need to specify:
- Choice of parameter C.
- Choice of kernel (similarity function), e.g.:
No kernel ("linear kernel"): predict "y = 1" if θᵀx ≥ 0
Gaussian kernel: fᵢ = exp(-‖x - l⁽ⁱ⁾‖² / (2σ²)), where l⁽ⁱ⁾ = x⁽ⁱ⁾. Need to choose σ².
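As an illustration of "use a package", here is a minimal scikit-learn sketch (one common SVM implementation, not the one the slides name; the toy data and the C and gamma values are made up, and sklearn's gamma plays the role of 1/(2σ²)):

```python
from sklearn.svm import SVC

# Toy separable data.
X = [[0, 0], [1, 1], [4, 5], [5, 4]]
y = [0, 0, 1, 1]

# "No kernel" (linear kernel), with the regularization parameter C.
linear_clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Gaussian kernel; gamma corresponds to 1 / (2 * sigma^2).
rbf_clf = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)

print(linear_clf.predict([[0.5, 0.5], [5, 5]]))
print(rbf_clf.predict([[0.5, 0.5], [5, 5]]))
```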
Kernel (similarity) functions:

function f = kernel(x1, x2)
  f = exp(-sum((x1 - x2) .^ 2) / (2 * sigma2));  % Gaussian similarity (sigma2 = σ²)
return

Note: do perform feature scaling before using the Gaussian kernel.
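Why the scaling note matters: without it, a feature with a large range dominates ‖x - l‖². A min-max scaling sketch (the feature values are made up, and min-max is just one of several reasonable scalings):

```python
# Without scaling, the squared distance is dominated by the large-range
# feature (e.g. house size in ft^2 vs. number of bedrooms).
def min_max_scale(col):
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]

sizes = [1000.0, 2000.0, 3000.0]   # range in the thousands
bedrooms = [2.0, 3.0, 4.0]         # range of a few units

print(min_max_scale(sizes))        # [0.0, 0.5, 1.0]
print(min_max_scale(bedrooms))     # [0.0, 0.5, 1.0]
```

After scaling, both features contribute on the same [0, 1] scale to the kernel's distance.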
Other choices of kernel
Note: Not all similarity functions similarity(x, l) make valid kernels.
(They need to satisfy a technical condition called "Mercer's Theorem" to make sure SVM packages' optimizations run correctly and do not diverge.)
Many off-the-shelf kernels available:
- Polynomial kernel: k(x, l) = (xᵀl + constant)^degree
- More esoteric: String kernel, chi-square kernel, histogram intersection kernel, …
Multi-class classification
Many SVM packages already have built-in multi-class classification functionality.
Otherwise, use the one-vs-all method:
Train K SVMs, one to distinguish y = i from the rest, for i = 1, 2, …, K; get θ⁽¹⁾, θ⁽²⁾, …, θ⁽ᴷ⁾.
Pick the class i with the largest (θ⁽ⁱ⁾)ᵀx.
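The pick-the-largest rule at prediction time can be sketched directly; the "trained" θ⁽ⁱ⁾ vectors below are made up for illustration:

```python
# One-vs-all prediction: score x against each class's parameter
# vector and return the argmax.
thetas = [
    [1.0, -0.5],   # hypothetical theta(1)
    [-0.2, 1.0],   # hypothetical theta(2)
    [0.1, 0.1],    # hypothetical theta(3)
]

def predict(x):
    scores = [sum(t * xi for t, xi in zip(theta, x)) for theta in thetas]
    # classes are numbered 1..K
    return max(range(len(scores)), key=lambda i: scores[i]) + 1

print(predict([2.0, 0.0]))   # theta(1) scores highest for this x
```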
Logistic regression vs. SVMs
n = number of features (x ∈ ℝ^(n+1)), m = number of training examples
If n is large (relative to m): use logistic regression, or SVM without a kernel ("linear kernel")
If n is small and m is intermediate: use SVM with Gaussian kernel
If n is small and m is large: create/add more features, then use logistic regression or SVM without a kernel
Neural network likely to work well for most of these settings, but may be slower to train.