Support Vector Machines (SVM):
Recent Research
Panos M. Pardalos
www.ise.ufl.edu/pardalos
https://nnov.hse.ru/en/latna/
Winter School on Data Analytics (Nov 20-22, 2020, HSE)
Classification and Clustering in
Data Analysis
Classification (supervised learning) assigns objects to predefined classes, while clustering (unsupervised learning) identifies similarities between objects and groups them by the characteristics they share and that differentiate them from other groups. These groups are known as "clusters".
Applications of Classification
Algorithms
Speech recognition
Face recognition
Handwriting recognition
Biometric identification
Document classification
Fraud detection in finance
Biomedicine
Classification Algorithms
Neural Networks
Random Forest
Decision Trees
Nearest Neighbor
Boosted Trees
Linear Classifiers: Logistic Regression, Naïve Bayes Classifier
Support Vector Machines
Fuzzy approaches to classification
Ducange, P., Fazzolari, M. & Marcelloni, F. An overview of
recent distributed algorithms for learning fuzzy models in
Big Data classification. J Big Data 7, 19
(2020). https://doi.org/10.1186/s40537-020-00298-6
Quantum approaches to classification
Is Quantum Machine Learning the next thing?
https://medium.com/illumination-curated/is-quantum-machine-learning-the-next-thing-6328b594f424
Quantum Machine Learning Is The Next Big Thing
https://thequantumdaily.com/2020/05/28/quantum-machine-learning-is-the-next-big-thing/
Daniel K. Park, Carsten Blank, Francesco Petruccione,
The theory of the quantum kernel-based binary
classifier, Physics Letters A, Volume 384, Issue 21, 2020, 126422
Complexity of Classification
Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann,
Marcilio C. P. Souto, and Tin Kam Ho. 2019. How
Complex Is Your Classification Problem?: A Survey on
Measuring Classification Complexity. ACM Comput.
Surv. 52, 5, Article 107 (September 2019), 34 pages.
https://doi.org/10.1145/3347711
Since each measure provides a distinct perspective on classification complexity, a combination of different measures is advised. Nonetheless, whether some subset of the complexity measures can be considered core for characterizing the difficulty of problems from different application domains is still an open issue.
What about clustering?
A density-based statistical analysis of graph clustering
algorithm performance
Pierre Miasnikof, Alexander Y Shestopaloff, Anthony J
Bonner, Yuri Lawryshyn, Panos M Pardalos
Journal of Complex Networks, Volume 8, Issue 3, June
2020, cnaa012, https://doi.org/10.1093/comnet/cnaa012
Complexity measures
(1) Feature-based measures, which characterize how informative the available features are to separate the classes;
(2) Linearity measures, which try to quantify whether the classes can be linearly separated;
(3) Neighborhood measures, which characterize the presence and density of same or different classes in local neighborhoods;
(4) Network measures, which extract structural information from the dataset by modeling it as a graph;
(5) Dimensionality measures, which evaluate data sparsity based on the number of samples relative to the data dimensionality;
(6) Class imbalance measures, which consider the ratio of the numbers of examples between classes.
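As a minimal illustration, here is a sketch of simplified two-class versions of measures (1) and (6); the survey defines several refined variants, so treat these as assumptions about the basic idea rather than the exact definitions:

```python
import numpy as np

def max_fisher_ratio(X, y):
    """Feature-based measure (1), two-class sketch: the best single-feature
    Fisher discriminant ratio (mu0 - mu1)^2 / (var0 + var1)."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0)
    return float(np.max(num / den))

def imbalance_ratio(y):
    """Class-imbalance measure (6), sketch: largest over smallest class size."""
    counts = np.bincount(y)
    counts = counts[counts > 0]
    return counts.max() / counts.min()
```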
Clustering and Classification (P. Arabie, L. J. Hubert, and G. De Soete, https://doi.org/10.1142/1930, January 1996)
Any issues with data
analysis?
Five Machine Learning Paradoxes that will Change the Way You Think About Data
https://medium.com/dataseries/five-machine-learning-paradoxes-that-will-change-the-way-you-think-about-data-3b82513482b8
Machine Learning Paradoxes
Basic Support Vector Machines (SVM)
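A minimal linear SVM sketch (scikit-learn on synthetic data; an illustration, not the talk's example):

```python
# Minimal linear SVM: fit, then note that the decision function is
# determined only by the support vectors (the basis of SVM sparseness).
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.n_support_)  # number of support vectors per class
```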
Twin support vector machines
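For reference, the standard TWSVM primal of Jayadeva, Khemchandani and Chandra (2007) seeks two nonparallel hyperplanes, each close to its own class and at least unit distance from the other. With $A$ and $B$ the data matrices of the two classes, the first of the paired problems is

$$
\begin{aligned}
\min_{\mathbf{w}_1, b_1, \boldsymbol{\xi}}\quad & \tfrac{1}{2}\left\|A\mathbf{w}_1 + \mathbf{e}_1 b_1\right\|^2 + c_1\,\mathbf{e}_2^T\boldsymbol{\xi}\\
\text{s.t.}\quad & -\left(B\mathbf{w}_1 + \mathbf{e}_2 b_1\right) + \boldsymbol{\xi} \ge \mathbf{e}_2, \qquad \boldsymbol{\xi} \ge \mathbf{0},
\end{aligned}
$$

and a symmetric problem determines $(\mathbf{w}_2, b_2)$.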
Many Models of SVM
Wang, X., Pardalos, P.M. A Survey of Support Vector Machines with Uncertainties. Ann. Data. Sci. 1, 293–309 (2014). https://doi.org/10.1007/s40745-014-0022-8
Explosive research on SVM
Kernels: see e.g. https://www.educba.com/kernel-methods/
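As a small sketch of the idea behind kernel methods (an illustration; the linked page covers more kernels), the Gram matrix of the popular RBF kernel can be computed directly from pairwise distances, so the SVM never forms the implicit feature map:

```python
import numpy as np

def rbf_gram(X, Z, gamma=1.0):
    """K[i, j] = exp(-gamma * ||x_i - z_j||^2): the kernel trick replaces
    inner products in feature space with these similarities."""
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Z**2, axis=1)[None, :]
                - 2.0 * X @ Z.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clip tiny negative round-off
```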
Nonparallel support vector
regression
Structural risk minimization (SRM) principle. The SRM principle addresses overfitting by balancing the model's complexity against its success at fitting the training data. This principle was first set out in a 1974 paper by Vladimir Vapnik and Alexey Chervonenkis.
Sparsity of the model (number of support vectors). The decision functions constructed by support vector machines usually depend only on a subset of the training set, the so-called support vectors.
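For standard $\varepsilon$-SVR, for example, the learned function takes the form

$$f(\mathbf{x}) \;=\; \sum_{i=1}^{N}\left(\alpha_i - \alpha_i^{*}\right) K(\mathbf{x}_i, \mathbf{x}) + b,$$

where $(\alpha_i - \alpha_i^{*}) \ne 0$ only for the support vectors, so only that subset of the training set enters the decision function.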
Nonparallel support vector regression
Primal problem
Lower bound:

$$
\begin{aligned}
\min_{\mathbf{w}_1, b_1, \boldsymbol{\eta}_1, \boldsymbol{\eta}_1^{*}, \boldsymbol{\xi}_1}\quad & \tfrac{1}{2}\mathbf{w}_1^T\mathbf{w}_1 + C_1\left(\left\|\boldsymbol{\eta}_1\right\|_1 + \left\|\boldsymbol{\eta}_1^{*}\right\|_1\right) + C_3\left\|\boldsymbol{\xi}_1\right\|_1\\
\text{s.t.}\quad & \mathbf{y} - \mathbf{e}\varepsilon_1 - \left(A\mathbf{w}_1 + \mathbf{e}b_1\right) \le \boldsymbol{\eta}_1 + \mathbf{e}\varepsilon\\
& -\mathbf{y} + \mathbf{e}\varepsilon_1 + \left(A\mathbf{w}_1 + \mathbf{e}b_1\right) \le \boldsymbol{\eta}_1^{*} + \mathbf{e}\varepsilon\\
& \mathbf{y} - \left(A\mathbf{w}_1 + \mathbf{e}b_1\right) \ge \mathbf{e}\varepsilon_1 - \boldsymbol{\xi}_1\\
& \boldsymbol{\eta}_1, \boldsymbol{\eta}_1^{*}, \boldsymbol{\xi}_1 \ge \mathbf{0}
\end{aligned}
$$

Upper bound:

$$
\begin{aligned}
\min_{\mathbf{w}_2, b_2, \boldsymbol{\eta}_2, \boldsymbol{\eta}_2^{*}, \boldsymbol{\xi}_2}\quad & \tfrac{1}{2}\mathbf{w}_2^T\mathbf{w}_2 + C_2\left(\left\|\boldsymbol{\eta}_2\right\|_1 + \left\|\boldsymbol{\eta}_2^{*}\right\|_1\right) + C_4\left\|\boldsymbol{\xi}_2\right\|_1\\
\text{s.t.}\quad & \mathbf{y} + \mathbf{e}\varepsilon_2 - \left(A\mathbf{w}_2 + \mathbf{e}b_2\right) \le \boldsymbol{\eta}_2 + \mathbf{e}\varepsilon\\
& -\mathbf{y} - \mathbf{e}\varepsilon_2 + \left(A\mathbf{w}_2 + \mathbf{e}b_2\right) \le \boldsymbol{\eta}_2^{*} + \mathbf{e}\varepsilon\\
& \left(A\mathbf{w}_2 + \mathbf{e}b_2\right) - \mathbf{y} \ge \mathbf{e}\varepsilon_2 - \boldsymbol{\xi}_2\\
& \boldsymbol{\eta}_2, \boldsymbol{\eta}_2^{*}, \boldsymbol{\xi}_2 \ge \mathbf{0}
\end{aligned}
$$
[Figure: the down-bound function $f_1(x)$ with its shifted band $f_1(x) + \varepsilon_1 \pm \varepsilon$, and the up-bound function $f_2(x)$ with its band $f_2(x) - \varepsilon_2 \pm \varepsilon$, plotted in the $(x, y)$ plane.]
Advantages of NPSVR
Equivalent sparseness to the standard SVR;
Does not involve computing an inverse matrix;
Same formulation as the standard SVR, so an SMO-type solver can be developed to accelerate the training process.
[Figure: NPSVR sparseness. Up-bound function $f_2(x)$ with $f_2(x) - \varepsilon_2$ and $f_2(x) - \varepsilon_2 - \varepsilon$, and down-bound function $f_1(x)$ with $f_1(x) + \varepsilon_1$ and $f_1(x) + \varepsilon_1 + \varepsilon$; training samples and support vectors are marked.]
[Figure: convergence of the SMO-type solver. Objective values $Z_1$ (down-bound function) and $Z_2$ (up-bound function) vs. iteration.]
NPSVR
[Figure: training speed test on large-scale data sets. Training time (s) vs. training size (1000-5000) for NPSVR, TSVR, RLTSVR, L1-TWSVR, and SVR.]
[Table: accuracy test on UCI data sets.]
Tang Long, Tian Yingjie*, Yang Chunyan. Nonparallel support vector regression and its SMO-type solver. Neural Networks, 2018, 105: 431-446.
Ramp loss function based nonparallel support vector regression (RL-NPSVR)
A ramp ε-insensitive loss function is constructed to compel as many training samples as possible to lie within a 2ε-wide band around the down-bound (up-bound) hyperplane.
A ramp loss function is constructed to keep as many training samples as possible above (below) the down-bound (up-bound) hyperplane.
A regularization term is added to each primal problem, strictly following the SRM principle.
See "Trading Convexity for Scalability" (Collobert et al., ICML 2006).
Ramp-loss NPSVR
Compared to the existing TSVRs, the proposed RL-NPSVR has the following merits:
(1) It can explicitly filter noise and suppress outliers during training.
(2) RL-NPSVR has the same inherent sparseness as the standard SVR, and the adopted ramp-type loss functions make it sparser.
(3) The dual of each reconstructed convex optimization problem has the same formulation as that of the standard SVR, so computing an inverse matrix is avoided and the kernel trick can be applied directly in the nonlinear case.
(4) An SMO-type fast algorithm is available to solve this problem.
The original loss function is sensitive to outlier data, which limits generalization ability. The ramp loss is adopted to improve the robustness of the model to outliers.
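One common way to write a ramp ε-insensitive loss (a generic form, stated here as an assumption; the paper's exact parameterization may differ) caps the usual ε-insensitive loss at a level $t > \varepsilon$:

$$R_{\varepsilon,t}(u) \;=\; \min\!\bigl(t - \varepsilon,\ \max(0,\ |u| - \varepsilon)\bigr) \;=\; L_{\varepsilon}(u) - L_{t}(u), \qquad L_{\varepsilon}(u) = \max(0,\ |u| - \varepsilon).$$

The cap keeps any single outlier's penalty bounded, and writing the loss as a difference of two convex functions is exactly what makes the CCCP procedure below applicable.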
Dual problem

$$
\begin{aligned}
\min_{\tilde{\boldsymbol{\alpha}}_1, \bar{\boldsymbol{\alpha}}_1, \bar{\boldsymbol{\beta}}_1}\quad & \tfrac{1}{2}\left(\tilde{\boldsymbol{\alpha}}_1 - \bar{\boldsymbol{\alpha}}_1 - \bar{\boldsymbol{\beta}}_1\right)^T A A^T \left(\tilde{\boldsymbol{\alpha}}_1 - \bar{\boldsymbol{\alpha}}_1 - \bar{\boldsymbol{\beta}}_1\right) - \left(\tilde{\boldsymbol{\alpha}}_1 - \bar{\boldsymbol{\alpha}}_1 - \bar{\boldsymbol{\beta}}_1\right)^T \mathbf{y} + \left(\tilde{\boldsymbol{\alpha}}_1 + \bar{\boldsymbol{\alpha}}_1\right)^T \mathbf{e}\varepsilon\\
\text{s.t.}\quad & \left(\tilde{\boldsymbol{\alpha}}_1 - \bar{\boldsymbol{\alpha}}_1 - \bar{\boldsymbol{\beta}}_1\right)^T \mathbf{e} = 0\\
& -\tilde{\boldsymbol{\theta}}_1^{\,t} \le \tilde{\boldsymbol{\alpha}}_1 \le C_1\mathbf{e} - \tilde{\boldsymbol{\theta}}_1^{\,t}\\
& \bar{\boldsymbol{\theta}}_1^{\,t} \le \bar{\boldsymbol{\alpha}}_1 \le C_1\mathbf{e} + \bar{\boldsymbol{\theta}}_1^{\,t}\\
& \boldsymbol{\delta}_1^{\,t} \le \bar{\boldsymbol{\beta}}_1 \le C_3\mathbf{e} + \boldsymbol{\delta}_1^{\,t}
\end{aligned}
$$
Ramp-loss NPSVR
The SMO-type solver of NPSVR can be used to solve each convex sub-problem.
The non-convexity is handled by CCCP (the concave-convex procedure).
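As a sketch of the CCCP idea (the callables solve_convex and grad_concave are placeholders, not the paper's solver): split the objective as $J = J_{\text{vex}} + J_{\text{cav}}$ with $J_{\text{vex}}$ convex and $J_{\text{cav}}$ concave, and repeatedly solve the convex problem obtained by linearizing the concave part at the current iterate:

```python
import numpy as np

def cccp(theta0, solve_convex, grad_concave, tol=1e-6, max_iter=100):
    """Minimize J_vex(theta) + J_cav(theta) by iterative linearization.
    solve_convex(g) returns argmin_theta J_vex(theta) + g @ theta;
    grad_concave(theta) is the gradient of the concave part J_cav."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = grad_concave(theta)        # linearize the concave part
        theta_next = solve_convex(g)   # convex subproblem (e.g., an NPSVR-type dual)
        if np.max(np.abs(theta_next - theta)) < tol:
            return theta_next
        theta = theta_next
    return theta
```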
Capacity of filtering outlier data
[Figure: observed function vs. RL-NPSVR, TSVR, and RLTSVR fits in the presence of outlier points, for the two test functions $f(x) = \frac{\sin x}{x},\ x \in [-4\pi, 4\pi] \setminus \{0\}$ and $f(x) = \sin\!\left(\frac{9\pi}{0.35x + 1}\right),\ x \in [0, 10]$.]
For each function, 200 training points are generated stochastically, 5% of which are set as outlier points.
Accuracy test of UCI data sets
Tang Long, Tian Yingjie, Pardalos P. M.*, Yang Chunyan. Ramp-loss nonparallel support vector regression: robust, sparse and scalable approximation. Knowledge-Based Systems, 2018, 147: 55-67.
[Figure: training speed test on large-scale data sets. Training time (s) vs. training size (1000-5000) for RL-NPSVR, TSVR, and RLTSVR, shown for both Ts-total and Ttotal.]
Regular simplex support vector machine (RSSVM) for K-class classification
RSSVM maps the K classes to the K vertices of a (K-1)-dimensional regular simplex, so that K-class classification becomes a (K-1)-output learning task.
The training loss is measured through the squared distances between each sample's output point and the simplex vertices.
Adding an appropriate regularization term to the primal problem makes the dual a quadratic programming problem, and a dedicated sequential minimal optimization (SMO)-type solver was developed to accelerate its solution; a sketch of the simplex-vertex construction follows.
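As mentioned above, here is a small sketch of one standard construction (our illustration, not code from the paper) of the $K$ unit-edge simplex vertices in $\mathbb{R}^{K-1}$, matching the coordinates listed below:

```python
import numpy as np

def regular_simplex_vertices(K):
    """K vertices of a unit-edge regular simplex in R^(K-1).
    Each new vertex sits above the centroid of the previous ones,
    at unit distance from all of them."""
    V = np.zeros((K, K - 1))
    for k in range(1, K):
        c = V[:k].mean(axis=0)                     # centroid of existing vertices
        V[k, :k - 1] = c[:k - 1]
        V[k, k - 1] = np.sqrt(1.0 - np.dot(c, c))  # height giving unit edge length
    return V

print(regular_simplex_vertices(3))  # vertices (0, 0), (1, 0), (0.5, 0.866)
```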
Regular simplex SVM for multi-classification
Limitations of the traditional partitioning strategies, one-versus-one (1-v-1) and one-versus-rest (1-v-r):
They establish multiple binary sub-classifiers, limiting the sparseness of the model;
They lack definite classification boundaries;
An individual classifier can hardly use the complete information of the training samples.
Primal problem
[Figure: mapping classes to simplex vertices. For K = 3: V1 = (0, 0)^T, V2 = (1, 0)^T, V3 = (0.5, 0.866)^T; for K = 4: V1 = (0, 0, 0)^T, V2 = (1, 0, 0)^T, V3 = (0.5, 0.866, 0)^T, V4 = (0.5, 0.2887, 0.8165)^T.]
$$
\begin{aligned}
\min_{\mathbf{w}, \mathbf{b}}\quad & \sum_{j=1}^{K-1} \frac{1}{2}\left(\mathbf{w}_j^T\mathbf{w}_j + b_j^2\right) + C \sum_{i=1}^{N} \sum_{k \ne c_i} \xi_{i,k}\\
\text{s.t.}\quad & \sum_{j=1}^{K-1}\left[2\left(V_{c_i,j} - V_{k,j}\right)\left(\mathbf{w}_j^T\mathbf{x}_i + b_j\right) + V_{k,j}^2 - V_{c_i,j}^2\right] \ge \varepsilon - \xi_{i,k},\\
& \xi_{i,k} \ge 0, \qquad i = 1, 2, \dots, N,\ k \ne c_i
\end{aligned}
$$
RSSVM
The classes are mapped to different vertices of a regular simplex, and squared distance is used to measure the loss.
Advantages of RSSVM
The primal includes only a single optimization problem.
The adapted loss function preserves in RSSVM the equivalent sparseness of the original SVM.
A matched SMO-type solver can be developed for training.
Dual problem

$$
\begin{aligned}
\min_{\hat{\boldsymbol{\alpha}}}\quad & \frac{1}{2}\hat{\boldsymbol{\alpha}}^T\left[\sum_{j=1}^{K-1}\mathbf{E}_j\left(\mathbf{A}\mathbf{A}^T + \mathbf{e}\mathbf{e}^T\right)\mathbf{E}_j^T\right]\hat{\boldsymbol{\alpha}} - \hat{\boldsymbol{\alpha}}^T\left[\sum_{j=1}^{K-1}\mathbf{F}_j + \varepsilon\mathbf{e}\right]\\
\text{s.t.}\quad & \mathbf{0} \le \hat{\boldsymbol{\alpha}} \le C\mathbf{e}
\end{aligned}
$$
Classifying mode
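Consistent with the vertex mapping and the squared-distance loss above, a natural reading of the classifying mode (an assumption on our part) assigns a new sample to the class whose vertex is nearest to its $(K-1)$-dimensional output:

$$\hat{c}(\mathbf{x}) \;=\; \arg\min_{k \in \{1, \dots, K\}} \left\| g(\mathbf{x}) - V_k \right\|^2, \qquad g_j(\mathbf{x}) = \mathbf{w}_j^T\mathbf{x} + b_j.$$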
Tang Long, Tian Yingjie, Pardalos P. M.*. A novel perspective on multiclass classification: regular simplex support vector machine. Information Sciences, 2019, 480: 324-338.
RSSVM
The developed SMO-type solver has
excellent scalability.
Training speed test of large-scale data sets
Accuracy test of UCI data sets
Shortcomings of directly combining the partitioning (1-v-1, 1-v-r) strategies and RSSVM:
Repeatedly computing the cluster information matrices under different partitions increases the training time.
An individual classifier can hardly use the complete information of the training samples.
Structural improved RSSVM (SIRSSVM): an all-in-one multi-classification model obtained by combining RSSVM with SRSVM, which embeds the cluster granularity into the binary-classification SVM.
Primal problem

$$
\begin{aligned}
\min_{\mathbf{w}, \mathbf{b}}\quad & \sum_{j=1}^{K-1} \frac{1}{2}\left(\mathbf{w}_j^T\mathbf{w}_j + b_j^2\right) + d_1 \sum_{i=1}^{N} \sum_{k \ne c_i} \xi_{i,k} + \sum_{j=1}^{K-1} \frac{d_2}{2}\mathbf{w}_j^T \Sigma\, \mathbf{w}_j\\
\text{s.t.}\quad & \sum_{j=1}^{K-1}\left[2\left(V_{c_i,j} - V_{k,j}\right)\left(\mathbf{w}_j^T\mathbf{x}_i + b_j\right) + V_{k,j}^2 - V_{c_i,j}^2\right] \ge \varepsilon - \xi_{i,k},\\
& \xi_{i,k} \ge 0, \qquad i = 1, 2, \dots, N,\ k \ne c_i
\end{aligned}
$$

where $\Sigma$ is the complete cluster information matrix, computed once rather than repeatedly under different partitions. An improved SMO-type solver is used for training.
Convergence process
Accuracy test
Comparison of training speed. SIRSSVM has better convergence than RSSVM.
Long Tang, Yingjie Tian, Wenjun Li, Panos M. Pardalos*. Structural improved regular simplex support vector machine for multiclass classification. Applied Soft Computing, 2020, 91. https://doi.org/10.1016/j.asoc.2020.106235
Challenging issues with SVM
Unbalanced data (see the sketch after this list's survey reference)
Structural data sets
Multi-label classification
Semi-supervised learning
Massive data sets
Jair Cervantes, Farid Garcia-Lamont, Lisbeth Rodríguez-Mazahua, Asdrubal Lopez. A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing, Volume 408, 2020, Pages 189-215. https://www.sciencedirect.com/science/article/pii/S0925231220307153
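Following up on the first item in the list above, one standard mitigation for unbalanced data (a generic scikit-learn idiom, not a technique from the survey) is to reweight classes inversely to their frequencies:

```python
from sklearn.svm import SVC

# class_weight="balanced" sets each class weight to
# n_samples / (n_classes * n_samples_in_class)
clf = SVC(kernel="rbf", class_weight="balanced")
```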
Thank you!