Aggressive Double Sampling for Reducing Multi-class Classification to Binary Classification
Bikash Joshi (PhD Student), AMA team, LIG
Supervised By:
Prof. Massih-Reza Amini and Dr. Franck Iutzeler
March 20, 2017
Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 1 / 27
Outline
1 Introduction
2 Multiclass to Binary Reduction
3 Double-Sampled Multiclass to Binary Reduction
4 Experimental Results
5 Conclusion
Multiclass Classification: Introduction
Figure: Digit classification
Figure: Image classification
Figure: Text classification
Finite set of categories (K > 2)
Popular applications: image and text classification.
Multiclass classification: Related Work
1 Combined approaches based on binary classification:
  - One-Vs-Rest
    - One binary problem for each class
    - K binary problems
    - $O(K \times d)$
  - One-Vs-One
    - One binary problem for each pair of classes
    - $O(K^2 \times d)$
2 Uncombined approaches
  - For example: multiclass SVM, MLP
  - One scoring function per class
3 Logarithmic time algorithms
  - For example: logTree, Recall-Tree
  - Each leaf node represents a class
  - $O(\log K)$
Multiclass classification : Challenges
The number of classes, K, in emerging multiclass problems, for example in text and image classification, may reach $10^5$ to $10^6$ categories.
For example:
  - $4 \times 10^6$ sites
  - $10^6$ categories
  - $10^5$ editors
  - Imbalanced nature of hierarchies
Multiclass classification : Challenges
Class imbalance problem
Majority of classes have few representative examples
Long tailed distribution
Figure: Number of classes (y-axis, 0 to 4000) by number of documents per class (x-axis bins: 2-5, 6-10, 11-30, 31-100, 101-200, >200) for DMOZ-7500.
Text Classification:
Task: Automatic classification of an example text into one of a fixed set of categories.
Feature Representation:
Bag of Words:
  - Extract the vocabulary from the training corpus.
  - Represent each term as 0 or 1 (absent or present in the document).
  - Highly sparse.
Document-class joint feature representation:
  - Inspired by learning to rank.
  - Similarity features between an example and a class of examples.
  - For example: $\sum_{t \in y \cap x} 1$, the number of terms shared by $x$ and $y$, where $x$ is one document and $y$ is a class of documents.
Motivation of our work
Baselines: Model complexity increases with the number of classes (K) and the feature dimension (d).
Algorithm that scales well for large scale data
Does not suffer from class imbalance problem
Less complex model
Competitive with the state of the art approaches
Framework
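$\mathcal{X} \subseteq \mathbb{R}^d$: input space
$\mathcal{Y} = \{1, \dots, K\}$: output space
$S = (x_i^{y_i})_{i=1}^{m}$: training set of i.i.d. pairs, where $x^y$ denotes example $x$ paired with class $y$
$\mathcal{G} = \{g : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}\}$: class of predictors
Instantaneous Loss
$$e(g, x^y) = \frac{1}{K-1} \sum_{y' \in \mathcal{Y} \setminus \{y\}} \mathbb{1}_{g(x^y) \le g(x^{y'})} \qquad (1)$$
$\mathbb{1}_{\pi}$ is the indicator function (1 if the predicate $\pi$ holds, 0 otherwise).
Fraction of the $K - 1$ other classes that $g$ scores at least as high as the true class.
Ranking loss used in multiclass SVM (Weston et al., 1998).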
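As a minimal sketch of the instantaneous loss in equation (1), the snippet below counts the wrong classes that score at least as high as the true class and divides by K − 1. The function name `instantaneous_loss`, the 0-based class indices, and the toy scores are illustrative assumptions, not part of the original slides.

```python
def instantaneous_loss(scores, y):
    """Equation (1): fraction of classes y' != y with g(x^y) <= g(x^{y'}).

    scores[k] plays the role of g(x^k); classes are 0-based here (an assumption).
    """
    K = len(scores)
    errors = sum(1 for k in range(K) if k != y and scores[y] <= scores[k])
    return errors / (K - 1)


# Hypothetical scores of one example against K = 4 classes.
scores = [0.2, 1.5, 0.9, 1.5]
loss_true_is_2 = instantaneous_loss(scores, 2)  # classes 1 and 3 outscore class 2
loss_true_is_1 = instantaneous_loss(scores, 1)  # the tie with class 3 counts as an error
```

Because the comparison is non-strict, a tie between the true class and another class is counted as a mistake, which matches the ≤ in equation (1).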
Framework
Empirical Loss
Empirical error of g ∈ G over S is:
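$$L_m(g, S) = \frac{1}{m(K-1)} \sum_{i=1}^{m} \sum_{y' \in \mathcal{Y} \setminus \{y_i\}} \mathbb{1}_{g(x_i^{y_i}) \le g(x_i^{y'})} \qquad (2)$$
$$= \frac{1}{m(K-1)} \sum_{i=1}^{m} \sum_{y' \in \mathcal{Y} \setminus \{y_i\}} \mathbb{1}_{h(x_i^{y_i}, x_i^{y'}) \le 0} \qquad (3)$$
where $h(x_i^{y_i}, x_i^{y'}) = g(x_i^{y_i}) - g(x_i^{y'})$.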
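Resembles a risk based on the binary classification loss.
Selecting a hypothesis in $\mathcal{G}$ that minimizes the risk over $S$ is equivalent to searching for a hypothesis $h$ in the induced class $\mathcal{H}$ that minimizes the risk over $T(S)$, a transformed set of size $m \times (K - 1)$.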
Multiclass to binary reduction example
We consider the following transformation
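$$T(S) = \left( \begin{cases} \left(z_j = (x_i^{k}, x_i^{y_i}),\ y_j = -1\right) & \text{if } k < y_i \\ \left(z_j = (x_i^{y_i}, x_i^{k}),\ y_j = +1\right) & \text{elsewhere} \end{cases} \right)_{j \doteq (i-1)(K-1)+k},$$
$|T(S)| = m \times (K - 1)$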
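The transformation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: examples are scalars, the scoring function `g` is a hypothetical stand-in, and a pair is treated as misclassified when `label * h(z) <= 0` with h defined as the difference of the two scores.

```python
def g(x, k):
    # Hypothetical scoring function: class k is represented by the value k.
    return -abs(x - k)


def transform(S, K):
    """Build T(S): one labelled pair per example and per class k != y_i."""
    T = []
    for x, y in S:
        for k in range(1, K + 1):
            if k == y:
                continue
            if k < y:
                T.append((((x, k), (x, y)), -1))  # z = (x^k, x^y), label -1
            else:
                T.append((((x, y), (x, k)), +1))  # z = (x^y, x^k), label +1
    return T


def multiclass_loss(S, K):
    # Empirical loss L_m(g, S) from equation (2).
    errors = sum(1 for x, y in S for k in range(1, K + 1)
                 if k != y and g(x, y) <= g(x, k))
    return errors / (len(S) * (K - 1))


def binary_error(T):
    # h(z) = g(first) - g(second); a pair is wrong when label * h(z) <= 0.
    errors = sum(1 for (first, second), label in T
                 if label * (g(*first) - g(*second)) <= 0)
    return errors / len(T)


S = [(1.2, 1), (2.6, 2), (2.9, 3)]  # toy (x, y) pairs with K = 3
K = 3
T = transform(S, K)
```

On this toy set, `len(T)` is m × (K − 1) = 6, and `binary_error(T)` coincides with `multiclass_loss(S, K)`, which illustrates the equivalence between the two risks.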
Multiclass to binary reduction algorithm
[Bikash et al. 2015]
Improvements and New challenges
Improvements:
One parameter vector for all classes.
Low-dimensional feature space.
Overcome class imbalance.
New Challenges:
The number of transformed examples, m × (K − 1), becomes huge for large K
Large computational overhead
Large memory requirement
Aggressive double sampling
1 Draw uniformly µ examples per class to form a practical set $S_\mu$:
  - Reduces redundancy in examples
  - Emphasizes rare classes
2 For each example $x^y$ in $S_\mu$, draw uniformly κ adversarial classes in $\mathcal{Y} \setminus \{y\}$:
  - Reduces time complexity
  - Low memory requirement
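The two sampling steps can be sketched as follows. This is a rough illustration rather than the algorithm from the slides: the function name `double_sample`, sampling without replacement, and capping µ at the class size are all assumptions made here.

```python
import random
from collections import defaultdict


def double_sample(S, K, mu, kappa, seed=0):
    """Return sampled pairs (z, label) from examples S = [(x, y), ...]."""
    rng = random.Random(seed)

    # Group examples by class.
    by_class = defaultdict(list)
    for x, y in S:
        by_class[y].append(x)

    # Step 1: draw up to mu examples per class (assumed without replacement).
    S_mu = []
    for y, xs in by_class.items():
        for x in rng.sample(xs, min(mu, len(xs))):
            S_mu.append((x, y))

    # Step 2: draw kappa adversarial classes per example from Y \ {y}.
    T = []
    for x, y in S_mu:
        adversaries = [k for k in range(1, K + 1) if k != y]
        for k in rng.sample(adversaries, min(kappa, len(adversaries))):
            if k < y:
                T.append((((x, k), (x, y)), -1))
            else:
                T.append((((x, y), (x, k)), +1))
    return T


# Toy imbalanced set: class 1 has five examples, class 2 one, class 3 two.
S = [(0.9, 1), (1.0, 1), (1.1, 1), (1.2, 1), (1.3, 1), (2.0, 2), (3.0, 3), (3.1, 3)]
T = double_sample(S, K=4, mu=2, kappa=2)
```

With µ = 2 and κ = 2 the toy set yields 5 sampled examples and 10 pairs, compared with 8 × 3 = 24 pairs for the full transformation. Building the list of adversarial classes costs O(K) per example in this sketch; for very large K one would likely sample class indices directly instead.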
Double Sampled Multi to Binary Reduction
Experimental Setup
Datasets:
Application: Text Classification
DMOZ and Wikipedia datasets (LSHTC challenge).
Pre-processed with stop word removal and stemming.
Random samples of 1000, 2000, 3000, 4000, 5000, 7500, 10000, and 20000 classes.
Comparison:
DS-m2b: Proposed double sampled multiclass to binary algorithm
OVA: One-Vs-All algorithm
M-SVM: Crammer-Singer implementation of multiclass SVM
Recall Tree: Hierarchical One-Vs-Some algorithm
Feature representation Φ(xy)
Features
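1. $\sum_{t \in y \cap x} \ln(1 + y_t)$
2. $\sum_{t \in y \cap x} \ln\left(1 + \frac{l_S}{S_t}\right)$
3. $\sum_{t \in y \cap x} I_t$
4. $\sum_{t \in y \cap x} \ln\left(1 + \frac{y_t}{|y|}\right)$
5. $\sum_{t \in y \cap x} \ln\left(1 + \frac{y_t}{|y|} \cdot I_t\right)$
6. $\sum_{t \in y \cap x} \ln\left(1 + \frac{y_t}{|y|} \cdot \frac{l_S}{S_t}\right)$
7. $\sum_{t \in y \cap x} 1$
8. $\sum_{t \in y \cap x} \frac{y_t}{|y|} \cdot I_t$
9. BM25
10. $d(x^y, \mathrm{centroid}(y))$
Notation:
$x_t$: number of occurrences of term $t$ in document $x$
$\mathcal{V}$: set of distinct terms in $S$
$y_t = \sum_{x \in y} x_t$, $|y| = \sum_{t \in \mathcal{V}} y_t$, $S_t = \sum_{x \in S} x_t$, $l_S = \sum_{t \in \mathcal{V}} S_t$
$I_t$: idf of the term $t$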
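To make the joint representation concrete, the sketch below computes features 1, 4, and 7 for one document against one class. Representing documents as `Counter` objects of term counts, the helper name `joint_features`, and the toy vocabulary are assumptions for illustration only.

```python
import math
from collections import Counter


def joint_features(x, class_docs):
    """Features 1, 4 and 7 of Phi(x^y) for document x and class y.

    x: Counter of term counts (x_t); class_docs: list of Counters in class y.
    """
    y_t = Counter()
    for doc in class_docs:
        y_t.update(doc)                  # y_t = sum of x_t over documents in y
    y_len = sum(y_t.values())            # |y| = sum of y_t over all terms

    # Terms occurring both in the document and in the class (t in y ∩ x).
    shared = [t for t in x if x[t] > 0 and y_t[t] > 0]

    f1 = sum(math.log(1 + y_t[t]) for t in shared)
    f4 = sum(math.log(1 + y_t[t] / y_len) for t in shared)
    f7 = len(shared)
    return f1, f4, f7


# Hypothetical class of two documents and one query document.
sports = [Counter({"ball": 2, "goal": 1}), Counter({"goal": 3, "team": 1})]
x = Counter({"goal": 1, "vote": 2})
f1, f4, f7 = joint_features(x, sports)
```

Here only "goal" is shared, with y_t = 4 and |y| = 7. The feature vector has the same length for every class, which is why a single parameter vector can serve all classes.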
Results: Runtime Comparison
Figure: Total runtime (seconds, log scale from $10^1$ to $10^6$) versus number of classes (1000 to 20000) for OVA, M-SVM, Recall Tree, and DS-m2b.
Results: Memory Comparison
Figure: Total memory usage (GB, 0 to 60) versus number of classes (1000 to 20000) for OVA, M-SVM, Recall Tree, and DS-m2b, with reference lines at the 16 GB and 32 GB limits.
Results: Prediction Performance Comparison
Figure: MAF (0.0 to 0.6) versus number of classes (1000 to 20000) for OVA, M-SVM, Recall Tree, and DS-m2b.
Conclusion:
Multiclass to binary reduction handles scenarios with a large number of classes and overcomes the class imbalance problem.
Double sampling further reduces computational complexity and memory usage.