Online Multiple Kernel Classification
Steven C.H. Hoi, Rong Jin, Peilin Zhao, Tianbao Yang
Machine Learning (2013)
Presented by Audrey Cheong
Electrical & Computer Engineering
MATH 6397: Data Mining
Online Multiple Kernel Classification (OMKC)
Background - Online
• Online learning
  • Learns one instance at a time and predicts labels for future instances
  • Learner is given an instance
  • Learner predicts the label of the instance
  • Learner is given the correct label
  • Learner refines its prediction mechanism
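The four-step protocol can be sketched as a short loop. This is a minimal illustration, not code from the paper: `run_online` and the toy `MajorityLearner` are hypothetical names introduced here only to show the order of the steps.

```python
from typing import Callable, Iterable, List, Tuple


def run_online(
    stream: Iterable[Tuple[List[float], int]],
    predict: Callable[[List[float]], int],
    update: Callable[[List[float], int, int], None],
) -> int:
    """Run the online protocol and return the number of mistakes."""
    mistakes = 0
    for x, y in stream:          # 1. learner is given an instance
        y_hat = predict(x)       # 2. learner predicts its label
        if y_hat != y:           # 3. learner is given the correct label
            mistakes += 1
        update(x, y, y_hat)      # 4. learner refines its prediction mechanism
    return mistakes


class MajorityLearner:
    """Toy learner (hypothetical): predicts the majority label seen so far."""

    def __init__(self) -> None:
        self.total = 0  # running sum of the labels (+1 / -1)

    def predict(self, x: List[float]) -> int:
        return 1 if self.total >= 0 else -1

    def update(self, x: List[float], y: int, y_hat: int) -> None:
        self.total += y
```

Any learner exposing `predict` and `update` can be plugged into the same loop, which is what lets the Perceptron and Hedge steps be composed later.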
Background – Multiple Kernel
• Composed of two online learning algorithms:
  • Perceptron algorithm (Rosenblatt 1958)
    • Type of linear classifier
    • Learns a classifier for a given kernel
  • Hedge algorithm (Freund and Schapire 1997)
    • Combines classifiers by linear weights
[Figure: three Perceptron learners produce Classifier 1, Classifier 2 and Classifier 3, which Hedge combines into a single classifier]
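Perceptron algorithm
• Input vector: $x_i$
• Output: $y_i \in \{-1, +1\}$
• Weights: $\alpha$
• Threshold: $\theta$
• Arithmetic test: $\alpha \cdot x_i \ge \theta$

$$y_i = \begin{cases} -1 & \text{if } \alpha \cdot x_i < \theta \\ +1 & \text{if } \alpha \cdot x_i \ge \theta \end{cases}$$

• Minimize :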
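As a rough sketch of the thresholded test above, the following linear Perceptron predicts +1 when the weighted sum reaches the threshold and adds or subtracts the instance from the weights after each mistake. The function names, the number of epochs and the additive update are the textbook Perceptron rule, assumed here for illustration rather than taken from the slides.

```python
from typing import List, Sequence, Tuple


def predict(alpha: Sequence[float], x: Sequence[float], theta: float = 0.0) -> int:
    """Arithmetic test: +1 if alpha . x >= theta, otherwise -1."""
    score = sum(a * xi for a, xi in zip(alpha, x))
    return 1 if score >= theta else -1


def train_perceptron(
    data: Sequence[Tuple[Sequence[float], int]],
    epochs: int = 10,
    theta: float = 0.0,
) -> List[float]:
    """Textbook Perceptron: move the weights toward each misclassified instance."""
    alpha = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            if predict(alpha, x, theta) != y:
                # Mistake-driven update: alpha <- alpha + y * x
                alpha = [a + y * xi for a, xi in zip(alpha, x)]
    return alpha


# Toy data: the second feature is a constant bias term.
toy = [([2.0, 1.0], 1), ([1.0, 1.0], 1), ([-1.0, 1.0], -1), ([-2.0, 1.0], -1)]
alpha = train_perceptron(toy)
```

On this separable toy set the weights stop changing after the first pass, since every later prediction is already correct.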
Hedge algorithm
• Distribute weight among classifiers
• Setting new weights: $w_i \leftarrow w_i \, \beta$ if the prediction of classifier $i$ is incorrect and $w_i \leftarrow w_i$ if correct, for discount weight $\beta \in (0, 1)$
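A small sketch of the Hedge weight update may help: each classifier that predicted incorrectly has its weight multiplied by a discount weight between 0 and 1, and correct classifiers keep their weight. The function names and the value `beta=0.5` are illustrative assumptions, not settings from the slides.

```python
from typing import List, Sequence


def hedge_update(
    weights: Sequence[float], wrong: Sequence[bool], beta: float = 0.5
) -> List[float]:
    """Discount the weight of every classifier whose prediction was wrong."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return [w * beta if is_wrong else w for w, is_wrong in zip(weights, wrong)]


def normalize(weights: Sequence[float]) -> List[float]:
    """Rescale the weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


# Classifiers 1 and 3 err, classifier 2 is correct.
new_weights = hedge_update([1.0, 1.0, 1.0], [True, False, True], beta=0.5)
shares = normalize(new_weights)
```

After normalization the correct classifier holds half of the total weight in this example, which is how repeated mistakes gradually shift influence toward the better classifiers.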
Notations
• $t$ : trial
• : mixture of kernel classifiers
• : indicates if training instance is misclassified by the kernel classifier at trial $t$
• : indicator function
• $\hat{y}(x)$ : prediction from combination of $m$ kernel classifiers
• $f_i(x)$ : classifier function
Proposed framework
• We define the optimal margin classification error for the kernel with respect to a collection of training examples as
where
Algorithms
Deterministic approach: all kernels are used
Stochastic approach: a subset of kernels is used
Either approach can be chosen independently for each of the two steps:
Update: Deterministic | Stochastic
Combination: Deterministic | Stochastic
OMKC(D,D)
• Training sample: $x$
• Kernel classifiers: $f_1(x), f_2(x), \dots, f_m(x)$
• Predictions: $z_1, z_2, \dots, z_m$
• Combined prediction: $\hat{y}(x) = \sum_{i=1}^{m} w_i z_i$
• Deterministic update: reduce $w_i$ if the prediction $z_i$ is incorrect, for every $i = 1, \dots, m$
• Deterministic combination: all $m$ predictions enter the weighted sum
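Putting the pieces together, a deterministic variant can be sketched as one kernel Perceptron per kernel plus a Hedge-style weight for each. This is a simplified illustration under stated assumptions, not the paper's algorithm listing: the class and function names, the two example kernels, `beta=0.5` and the rule that a zero margin counts as a mistake are all choices made here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def rbf(a: Vector, b: Vector) -> float:
    # Gaussian kernel with width 1 (an arbitrary illustrative choice).
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / 2.0)


class KernelPerceptron:
    """Kernel Perceptron that stores every misclassified instance."""

    def __init__(self, kernel: Kernel) -> None:
        self.kernel = kernel
        self.support: List[Tuple[int, Vector]] = []

    def score(self, x: Vector) -> float:
        return sum(y * self.kernel(sx, x) for y, sx in self.support)

    def update(self, x: Vector, y: int) -> None:
        self.support.append((y, x))


def omkc_dd(
    stream: Sequence[Tuple[Vector, int]], kernels: Sequence[Kernel], beta: float = 0.5
) -> Tuple[List[float], int]:
    """Deterministic update and combination: every kernel is used at every trial."""
    learners = [KernelPerceptron(k) for k in kernels]
    weights = [1.0] * len(kernels)
    mistakes = 0
    for x, y in stream:
        scores = [learner.score(x) for learner in learners]
        z = [1 if s >= 0 else -1 for s in scores]
        # Deterministic combination: weighted vote over all classifiers.
        combined = sum(w * zi for w, zi in zip(weights, z))
        y_hat = 1 if combined >= 0 else -1
        mistakes += int(y_hat != y)
        # Deterministic update: every classifier that erred is discounted and refined.
        for i, learner in enumerate(learners):
            if y * scores[i] <= 0:
                weights[i] *= beta
                learner.update(x, y)
    return weights, mistakes


# XOR-style toy data: not linearly separable, but separable with the Gaussian kernel.
xor = [((1.0, 1.0), 1), ((-1.0, -1.0), 1), ((1.0, -1.0), -1), ((-1.0, 1.0), -1)]
weights, mistakes = omkc_dd(xor * 5, [linear, rbf])
```

On this toy stream the linear classifier errs at every trial while the Gaussian one stops making mistakes after a few trials, so the Gaussian kernel ends up with the larger weight.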
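OMKC(S,S)
• Training sample: $x$
• Kernel classifiers: $f_1(x), f_2(x), \dots, f_m(x)$
• Predictions: $z_1, z_2, \dots, z_m$
• Combined prediction: $\hat{y}(x) = \sum_{i=1}^{m} w_i z_i$
• Stochastic update: reduce $w_i$ only for a sampled classifier whose prediction $z_i$ is incorrect
• Stochastic combination: only the sampled classifiers contribute to the sum (in the illustrated example $w_1 \neq 0$ while $w_2 = 0, \dots, w_m = 0$)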
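One way to picture stochastic combination is to draw a random subset of classifiers and let only that subset vote. The sketch below assumes each classifier is kept with probability equal to its weight divided by the largest weight; that sampling rule and the function names are assumptions made for illustration and should be checked against the paper before being relied on.

```python
import random
from typing import List, Sequence, Tuple


def sample_subset(weights: Sequence[float], rng: random.Random) -> List[bool]:
    """Keep classifier i with probability w_i / max_j w_j (assumed rule)."""
    top = max(weights)
    return [rng.random() < w / top for w in weights]


def stochastic_combination(
    weights: Sequence[float], z: Sequence[int], rng: random.Random
) -> Tuple[int, List[bool]]:
    """Weighted vote restricted to the sampled classifiers."""
    selected = sample_subset(weights, rng)
    total = sum(w * zi for w, zi, keep in zip(weights, z, selected) if keep)
    return (1 if total >= 0 else -1), selected


# Mirrors the illustrated case: only the first classifier has non-zero weight.
rng = random.Random(0)
y_hat, selected = stochastic_combination([1.0, 0.0, 0.0], [-1, 1, 1], rng)
```

Because the classifier with the largest weight is always kept and zero-weight classifiers are never kept, this example selects only the first classifier whatever the seed.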
Experimental setup: binary datasets
Experimental setup
• 15 diverse datasets obtained from LIBSVM and UCI machine learning repository
• Predefine 16 kernel functions
  • 3 polynomial kernels
  • 13 Gaussian kernels
• Fix discount weight
• Results are averaged over 20 runs
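A pool of 3 polynomial and 13 Gaussian kernels could be built along the following lines. The kernel parameters did not survive in the slides, so the degrees 1 to 3 and the widths 2^-6 to 2^6 used here are assumptions chosen only because they give the stated counts; the function names are likewise hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def polynomial_kernel(degree: int) -> Kernel:
    """k(a, b) = (a . b) ** degree"""
    def k(a: Vector, b: Vector) -> float:
        return sum(p * q for p, q in zip(a, b)) ** degree
    return k


def gaussian_kernel(sigma: float) -> Kernel:
    """k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))"""
    def k(a: Vector, b: Vector) -> float:
        sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
        return math.exp(-sq_dist / (2.0 * sigma ** 2))
    return k


def build_kernel_pool() -> List[Kernel]:
    # Assumed parameters: degrees 1..3 and widths 2^-6 .. 2^6 (13 values).
    polys = [polynomial_kernel(d) for d in (1, 2, 3)]
    gaussians = [gaussian_kernel(2.0 ** e) for e in range(-6, 7)]
    return polys + gaussians


pool = build_kernel_pool()
```

Each entry of `pool` is a plain function of two vectors, so the list can be handed directly to a learner that keeps one classifier per kernel.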
Evaluation of the deterministic OMKC algorithm
• Comparison of the deterministic OMKC algorithm with three Perceptron-based algorithms and one online multiple kernel learning algorithm
• Perceptron : the well-known Perceptron baseline algorithm with a linear kernel (Rosenblatt 1958; Freund and Schapire 1999)
• Perceptron(u) : another Perceptron baseline algorithm with an unbiased/uniform combination of all the kernels
• Perceptron(*): an online validation procedure searches for the best kernel among the pool of kernels (using the first 10% of the training examples), and the Perceptron algorithm is then applied with the best kernel
• OM-2: a state-of-the-art online learning algorithm for multiple kernel learning (Jie et al. 2010; Orabona et al. 2010)
Average mistake rate (20 runs)
Number of support vectors (20 runs)
Kernel weights
Effect of optimal $\beta = \dfrac{\sqrt{T}}{\sqrt{T} + \sqrt{\ln m}}$, where $T$ is the number of training examples and $m$ is the number of kernels
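The optimal discount weight can be evaluated directly from the number of training examples and the number of kernels. The helper name `optimal_beta` is hypothetical; the formula is the one stated on the slide.

```python
import math


def optimal_beta(num_examples: int, num_kernels: int) -> float:
    """beta = sqrt(T) / (sqrt(T) + sqrt(ln m))"""
    if num_examples <= 0 or num_kernels < 1:
        raise ValueError("need T > 0 and m >= 1")
    root_t = math.sqrt(num_examples)
    return root_t / (root_t + math.sqrt(math.log(num_kernels)))


# With 16 kernels, beta moves toward 1 as the number of examples grows.
beta_small = optimal_beta(100, 16)
beta_large = optimal_beta(10000, 16)
```

A value close to 1 means each mistake discounts a classifier only slightly, which matches the intuition that with many examples the weights can be adjusted more gradually.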
Time Efficiency
Decreases as size increases
Conclusion
• All the OMKC algorithms usually perform better than
  • the regular Perceptron algorithm with an unbiased linear combination of multiple kernels
  • the Perceptron algorithm with the best kernel found by validation
  • the state-of-the-art online MKL algorithm (OM-2)
• The deterministic combination strategy usually performs better than the stochastic combination strategy
• The stochastic updating strategy improves computational efficiency without decreasing the accuracy significantly
Questions?
1) How many kernel classifiers were used in the stochastic combination?
2) How was the number of support vectors determined? Should the number of support vectors be reported per kernel classifier? Did support vectors overlap between kernel classifiers?
References
• Hoi, S. C. H., Jin, R., Zhao, P., & Yang, T. (2012). Online Multiple Kernel Classification. Machine Learning, 90(2), 289–316. doi:10.1007/s10994-012-5319-2
Algorithm 1
• Deterministic update, deterministic combination: all kernels are used
• : represents the classifier at trial $t$
• : combination of $m$ kernel classifiers
• Normalize the weights
Algorithm 1 → 2
• : represents the classifier at trial $t$
• : combination of $m$ kernel classifiers
• Deterministic update
• Stochastic combination
Algorithm 2 → 3
• Deterministic combination
• Stochastic update
  • Guarantees each kernel a minimum probability of being selected
  • Tradeoff between exploration and exploitation (Auer et al. 2003)
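The idea of a guaranteed minimum selection probability can be sketched by mixing a weight-based probability with a constant floor. The specific form below, with a smoothing parameter called `delta` here, is an assumption made for illustration; the exact expression is not recoverable from the slides.

```python
import random
from typing import List, Sequence


def selection_probabilities(weights: Sequence[float], delta: float) -> List[float]:
    """Assumed form: p_i = (1 - delta) * w_i / max_j w_j + delta."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    top = max(weights)
    return [(1.0 - delta) * w / top + delta for w in weights]


def sample_for_update(
    weights: Sequence[float], delta: float, rng: random.Random
) -> List[bool]:
    """Draw, per classifier, whether it is selected for update at this trial."""
    return [rng.random() < p for p in selection_probabilities(weights, delta)]


probs = selection_probabilities([1.0, 0.5, 0.0], delta=0.25)
chosen = sample_for_update([1.0, 0.5, 0.0], 0.25, random.Random(0))
```

Even a classifier whose weight has decayed to zero keeps probability `delta` of being selected (exploration), while highly weighted classifiers are selected almost always (exploitation).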
Algorithm 4
• Stochastic update
• Stochastic combination