Chapter 5: Partially-Supervised Learning
CS583, Bing Liu, UIC
Outline
Fully supervised learning (traditional classification)
Partially (semi-) supervised learning (or classification)
  Learning with a small set of labeled examples and a large set of unlabeled examples (LU learning)
  Learning with positive and unlabeled examples (no labeled negative examples) (PU learning).
Learning from a small labeled set and a large unlabeled set: LU learning
Unlabeled Data
One of the bottlenecks of classification is the labeling of a large set of examples (data records or text documents).
  Labeling is often done manually and is time consuming.
Can we label only a small number of examples and make use of a large number of unlabeled examples to learn?
  This is possible in many cases.
Why are unlabeled data useful?
Unlabeled data are usually plentiful; labeled data are expensive.
Unlabeled data provide information about the joint probability distribution over words and collocations (in texts).
We will use text classification to study this problem.
(Figure: example labeled and unlabeled documents. The labeled documents, all of class Positive, contain the word "homework"; the unlabeled documents contain "homework" together with "lecture".)
Documents containing "homework" tend to belong to the positive class, so the co-occurrence of "homework" and "lecture" in the unlabeled documents suggests that documents containing "lecture" are also likely to be positive.
Incorporating unlabeled data with EM (Nigam et al., 2000)
Basic EM
Augmented EM with weighted unlabeled data
Augmented EM with multiple mixture components per class
Algorithm Outline
1. Train a classifier with only the labeled documents.
2. Use it to probabilistically classify the unlabeled documents.
3. Use ALL the documents to train a new classifier.
4. Iterate steps 2 and 3 to convergence.
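A minimal sketch of this outline in Python, assuming scikit-learn's MultinomialNB on a dense document-term matrix; the function name em_nb and the sample-weight trick used to approximate fractional counts are illustrative assumptions, not the exact formulation of Nigam et al.:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def em_nb(X_labeled, y_labeled, X_unlabeled, n_iter=10):
    # Step 1: train an initial classifier on the labeled documents only.
    clf = MultinomialNB().fit(X_labeled, y_labeled)
    classes = clf.classes_
    X_all = np.vstack([X_labeled, X_unlabeled])
    # Hard class assignments of the labeled docs, as probability vectors.
    P_labeled = np.eye(len(classes))[np.searchsorted(classes, y_labeled)]
    for _ in range(n_iter):
        # Step 2 (E step): probabilistically classify the unlabeled documents.
        P_unlabeled = clf.predict_proba(X_unlabeled)
        P_all = np.vstack([P_labeled, P_unlabeled])
        # Step 3 (M step): retrain on ALL documents; each document contributes
        # to every class, weighted by its class probability (sample_weight).
        X_rep = np.vstack([X_all] * len(classes))
        y_rep = np.repeat(classes, X_all.shape[0])
        w_rep = np.concatenate([P_all[:, k] for k in range(len(classes))])
        clf = MultinomialNB().fit(X_rep, y_rep, sample_weight=w_rep)
    return clf
```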
Basic Algorithm (Fig. 5.1; figure not reproduced here)
Basic EM: E Step & M Step
E Step:
M Step:
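The slide's equation images are not reproduced here. As a reference sketch, an assumption based on the standard naive Bayes EM formulation of Nigam et al. (2000) rather than copied from the slide, with Laplace smoothing, vocabulary V, document set D, and N_{ti} the count of word w_t in document d_i:

$$\Pr(c_j \mid d_i) = \frac{\Pr(c_j)\prod_{k=1}^{|d_i|}\Pr(w_{d_i,k}\mid c_j)}{\sum_{r}\Pr(c_r)\prod_{k=1}^{|d_i|}\Pr(w_{d_i,k}\mid c_r)} \qquad \text{(E step)}$$

$$\Pr(w_t \mid c_j) = \frac{1+\sum_{i}N_{ti}\Pr(c_j\mid d_i)}{|V|+\sum_{s=1}^{|V|}\sum_{i}N_{si}\Pr(c_j\mid d_i)}, \qquad \Pr(c_j) = \frac{\sum_{i}\Pr(c_j\mid d_i)}{|D|} \qquad \text{(M step)}$$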
The problem
It has been shown that the EM algorithm in Fig. 5.1 works well if the two mixture model assumptions hold for the particular data set.
The two mixture model assumptions, however, can cause major problems when they do not hold. In many real-life situations, they are violated.
It is often the case that a class (or topic) contains a number of sub-classes (or sub-topics). For example, the class Sports may contain documents about different sub-classes of sports: Baseball, Basketball, Tennis, and Softball.
Some methods have been proposed to deal with this problem.
Weighting the influence of unlabeled examples by a factor
New M step:
The prior probability also needs to be weighted.
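The slide's equation is not reproduced here. A sketch of the weighted M step in the spirit of EM-λ (Nigam et al., 2000), with L the labeled set, U the unlabeled set, and λ ∈ [0, 1] the weight on unlabeled documents; the exact form below is an assumption reconstructed from the description above, not copied from the slide:

$$\Pr(w_t \mid c_j) = \frac{1+\sum_{i\in L}N_{ti}\Pr(c_j\mid d_i)+\lambda\sum_{i\in U}N_{ti}\Pr(c_j\mid d_i)}{|V|+\sum_{s=1}^{|V|}\Big(\sum_{i\in L}N_{si}\Pr(c_j\mid d_i)+\lambda\sum_{i\in U}N_{si}\Pr(c_j\mid d_i)\Big)}$$

The class prior Pr(c_j) is re-estimated in the same way, with unlabeled documents again weighted by λ.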
Experimental Evaluation
Newsgroup postings: 20 newsgroups, 1,000 postings per group
Web page classification: student, faculty, course, project; 4,199 web pages
Reuters newswire articles: 12,902 articles, 10 main topic categories
20 Newsgroups (results figures not reproduced here)
Another approach: Co-training
Again, learning with a small labeled set and a large unlabeled set.
The attributes describing each example or instance can be partitioned into two subsets, each of which is sufficient for learning the target function.
  E.g., hyperlinks and page contents in Web page classification.
Two classifiers can be learned from the same data.
Co-training Algorithm [Blum and Mitchell, 1998]
Given: labeled data L, unlabeled data U
Loop:
  Train h1 (e.g., hyperlink classifier) using L
  Train h2 (e.g., page classifier) using L
  Allow h1 to label p positive, n negative examples from U
  Allow h2 to label p positive, n negative examples from U
  Add these most confident self-labeled examples to L
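A minimal sketch of this loop in Python, assuming scikit-learn's MultinomialNB for both views, binary labels {0, 1}, and no separate example pool; the function name co_train and the per-round bookkeeping are illustrative assumptions:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_train(X1_L, X2_L, y_L, X1_U, X2_U, p=1, n=3, rounds=30):
    y_L = list(y_L)
    unlabeled = list(range(X1_U.shape[0]))      # indices of still-unlabeled examples
    for _ in range(rounds):
        h1 = MultinomialNB().fit(X1_L, y_L)     # view-1 classifier (e.g., hyperlinks)
        h2 = MultinomialNB().fit(X2_L, y_L)     # view-2 classifier (e.g., page text)
        if not unlabeled:
            break
        for h, view_U in ((h1, X1_U), (h2, X2_U)):
            proba = h.predict_proba(view_U[unlabeled])
            pos_col = list(h.classes_).index(1)
            # pick the p most confident positives and n most confident negatives
            pos_idx = np.argsort(-proba[:, pos_col])[:p]
            neg_idx = np.argsort(proba[:, pos_col])[:n]
            chosen = [unlabeled[i] for i in np.concatenate([pos_idx, neg_idx])]
            labels = [1] * len(pos_idx) + [0] * len(neg_idx)
            # add these most confident self-labeled examples to L (both views)
            X1_L = np.vstack([X1_L, X1_U[chosen]])
            X2_L = np.vstack([X2_L, X2_U[chosen]])
            y_L.extend(labels)
            unlabeled = [u for u in unlabeled if u not in chosen]
    return h1, h2
```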
Co-training: Experimental Results
Begin with 12 labeled web pages (academic course); provide 1,000 additional unlabeled web pages.
Average error: learning from labeled data 11.1%; average error: co-training 5.0%.

                      Page-based classifier   Link-based classifier   Combined classifier
Supervised training   12.9                    12.4                    11.1
Co-training           6.2                     11.6                    5.0
When the generative model is not suitable
Multiple Mixture Components per Class (M-EM). E.g., a class may consist of a number of sub-topics or clusters.
Results of an example using the 20 newsgroup data (40 labeled; 2,360 unlabeled; 1,600 test), accuracy:
  NB 68%, EM 59.6%
Solutions:
  M-EM (Nigam et al., 2000): cross-validation on the training data to determine the number of components.
  Partitioned-EM (Cong et al., 2004): uses hierarchical clustering. It does significantly better than M-EM.
Summary
Using unlabeled data can improve the accuracy of a classifier when the data fit the generative model.
Partitioned-EM and the EM classifier based on the multiple mixture components model (M-EM) are more suitable for real data, where one class often contains multiple mixture components.
Co-training is another effective technique when redundantly sufficient features are available.
Learning from Positive and Unlabeled Examples: PU learning
Learning from Positive & Unlabeled data
Positive examples: one has a set P of examples of a class.
Unlabeled set: one also has a set U of unlabeled (or mixed) examples, with instances from P and also not from P (negative examples).
Build a classifier: build a classifier to classify the examples in U and/or future (test) data.
Key feature of the problem: no labeled negative training data.
We call this problem PU learning.
Applications of the problem
With the growing volume of online texts available through the Web and digital libraries, one often wants to find those documents that are related to one's work or one's interest.
For example, given the ICML proceedings, find all machine learning papers from AAAI, IJCAI, and KDD. No labeling of negative examples from each of these collections is needed.
Similarly, given one's bookmarks (positive documents), identify those documents from Web sources that are of interest to him/her.
Direct Marketing
A company has a database with details of its customers (positive examples), but no information on those who are not its customers, i.e., no negative examples.
It wants to find people who are similar to its customers for marketing.
It can buy a database consisting of details of people, some of whom may be potential customers (unlabeled examples).
Are Unlabeled Examples Helpful?
(Figure: positive examples (+) and unlabeled examples (u) in a 2-D space, with the two candidate decision boundaries x1 < 0 and x2 > 0.)
The target function is known to be either x1 < 0 or x2 > 0. Which one is it?
It is not learnable with only positive examples. However, the addition of unlabeled examples makes it learnable.
Theoretical foundations
(X, Y): X is the input vector, Y ∈ {1, -1} is the class label. f is a classification function.
We rewrite the probability of error as
  Pr[f(X) ≠ Y] = Pr[f(X) = 1 and Y = -1] + Pr[f(X) = -1 and Y = 1]    (1)
We have
  Pr[f(X) = 1 and Y = -1] = Pr[f(X) = 1] - Pr[f(X) = 1 and Y = 1]
                          = Pr[f(X) = 1] - (Pr[Y = 1] - Pr[f(X) = -1 and Y = 1]).
Plugging this into (1), we obtain
  Pr[f(X) ≠ Y] = Pr[f(X) = 1] - Pr[Y = 1] + 2 Pr[f(X) = -1 | Y = 1] Pr[Y = 1]    (2)
Theoretical foundations (cont)
  Pr[f(X) ≠ Y] = Pr[f(X) = 1] - Pr[Y = 1] + 2 Pr[f(X) = -1 | Y = 1] Pr[Y = 1]    (2)
Note that Pr[Y = 1] is constant.
If we can hold Pr[f(X) = -1 | Y = 1] small, then learning is approximately the same as minimizing Pr[f(X) = 1].
Holding Pr[f(X) = -1 | Y = 1] small while minimizing Pr[f(X) = 1] is approximately the same as
  minimizing Pr_U[f(X) = 1] while holding Pr_P[f(X) = 1] ≥ r (where r is the recall Pr[f(X) = 1 | Y = 1]),
  which is the same as holding Pr_P[f(X) = -1] ≤ 1 - r,
if the set of positive examples P and the set of unlabeled examples U are large enough.
Theorem 1 and Theorem 2 in [Liu et al., 2002] state these formally for the noiseless case and the noisy case.
Put it simply
This is a constrained optimization problem.
A reasonably good generalization (learning) result can be achieved if the algorithm tries to minimize the number of unlabeled examples labeled as positive, subject to the constraint that the fraction of errors on the positive examples is no more than 1 - r.
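Written out as a formula (a restatement of the statement above, with the probabilities estimated on the unlabeled set U and the positive set P respectively):

$$\min_f \ \Pr\nolimits_U[f(X)=1] \qquad \text{subject to} \qquad \Pr\nolimits_P[f(X)=1] \ge r.$$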
An illustration
Assume a linear classifier. Line 3 (in the illustration, not reproduced here) is the best solution.
Existing 2-step strategy
Step 1: Identifying a set of reliable negative documents from the unlabeled set.
  S-EM [Liu et al., 2002] uses a Spy technique.
  PEBL [Yu et al., 2002] uses a 1-DNF technique.
  Roc-SVM [Li & Liu, 2003] uses the Rocchio algorithm.
Step 2: Building a sequence of classifiers by iteratively applying a classification algorithm and then selecting a good classifier.
  S-EM uses the Expectation Maximization (EM) algorithm, with an error-based classifier selection mechanism.
  PEBL uses SVM, and gives the classifier at convergence, i.e., no classifier selection.
  Roc-SVM uses SVM with a heuristic method for selecting the final classifier.
(Diagram: the two-step strategy. Step 1 takes the positive set P and the unlabeled set U and extracts a set of reliable negatives RN from U, leaving Q = U - RN. Step 2 uses P, RN and Q to build the final classifier iteratively, or uses only P and RN to build a classifier.)
Step 1: The Spy technique
Sample a certain % of positive examples and put them into the unlabeled set to act as spies.
Run a classification algorithm assuming all unlabeled examples are negative; through the spies, we learn the behavior of the actual positive examples hidden in the unlabeled set.
We can then extract reliable negative examples from the unlabeled set more accurately.
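A minimal sketch of the Spy technique in Python; the 15% spy ratio, the 5% noise-level threshold, and the use of a single naive Bayes fit (rather than the full EM of S-EM) are illustrative assumptions:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def spy_reliable_negatives(X_P, X_U, spy_ratio=0.15, noise_level=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n_p = X_P.shape[0]
    spy_mask = np.zeros(n_p, dtype=bool)
    spy_mask[rng.choice(n_p, size=max(1, int(spy_ratio * n_p)), replace=False)] = True

    # Treat P minus the spies as positive, and U plus the spies as "negative".
    X_pos = X_P[~spy_mask]
    X_neg = np.vstack([X_U, X_P[spy_mask]])
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(X_pos.shape[0], dtype=int),
                        np.zeros(X_neg.shape[0], dtype=int)])
    clf = MultinomialNB().fit(X, y)

    # The spies reveal how genuine positives hidden in U behave; use a low
    # percentile of their positive-class probabilities as the threshold t.
    t = np.quantile(clf.predict_proba(X_P[spy_mask])[:, 1], noise_level)

    # Unlabeled documents scoring below t become the reliable negatives (RN).
    return X_U[clf.predict_proba(X_U)[:, 1] < t]
```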
Step 1: Other methods
1-DNF method (see the sketch after this list):
  Find the set of words W that occur in the positive documents more frequently than in the unlabeled set.
  Extract those documents from the unlabeled set that do not contain any word in W. These documents form the reliable negative documents.
Rocchio method from information retrieval.
Naïve Bayesian method.
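A minimal sketch of the 1-DNF step in Python, assuming binary (word-presence) document-term matrices stored as numpy arrays:

```python
import numpy as np

def one_dnf_reliable_negatives(X_P, X_U):
    # Fraction of documents containing each word, in P and in U.
    freq_P = X_P.mean(axis=0)
    freq_U = X_U.mean(axis=0)
    # W: words that occur more frequently in the positive set than in U.
    W = np.where(freq_P > freq_U)[0]
    # Reliable negatives: unlabeled documents containing no word from W.
    contains_W = X_U[:, W].sum(axis=1) > 0
    return X_U[~contains_W]
```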
Step 2: Running EM or SVM iteratively
(1) Running a classification algorithm iteratively:
  Run EM using P, RN and Q until it converges, or
  Run SVM iteratively using P, RN and Q until no document from Q can be classified as negative. RN and Q are updated in each iteration, or
(2) Classifier selection.
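A minimal sketch of the iterative SVM variant in Python; LinearSVC and the stopping condition written this way are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import LinearSVC

def iterative_svm(X_P, X_RN, X_Q):
    RN, Q = X_RN, X_Q
    while True:
        X = np.vstack([X_P, RN])
        y = np.concatenate([np.ones(X_P.shape[0], dtype=int),
                            -np.ones(RN.shape[0], dtype=int)])
        clf = LinearSVC().fit(X, y)    # train on P (positive) vs RN (negative)
        if Q.shape[0] == 0:
            break
        pred = clf.predict(Q)
        newly_negative = Q[pred == -1]
        if newly_negative.shape[0] == 0:
            break                      # no document from Q is classified as negative
        # update: move the newly classified negatives from Q into RN and iterate
        RN = np.vstack([RN, newly_negative])
        Q = Q[pred != -1]
    return clf
```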
Do they follow the theory?
Yes, as heuristic methods:
  Step 1 tries to find some initial reliable negative examples from the unlabeled set.
  Step 2 tries to identify more and more negative examples iteratively.
The two steps together form an iterative strategy of increasing the number of unlabeled examples that are classified as negative while keeping the positive examples correctly classified.
Can SVM be applied directly?
Can we use SVM to directly deal with the problem of learning with positive and unlabeled examples, without using two steps?
Yes, with a little re-formulation.
Support Vector Machines
Support vector machines (SVM) are linear functions of the form f(x) = w^T x + b, where w is the weight vector and x is the input vector.
Let the set of training examples be {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where x_i is an input vector and y_i is its class label, y_i ∈ {1, -1}.
To find the linear function:
  Minimize:   (1/2) w^T w
  Subject to: y_i (w^T x_i + b) ≥ 1,   i = 1, 2, ..., n
Soft margin SVM
To deal with cases where there may be no separating hyperplane due to noisy labels of both positive and negative training examples, the soft margin SVM is proposed:
  Minimize:   (1/2) w^T w + C Σ_{i=1}^{n} ξ_i
  Subject to: y_i (w^T x_i + b) ≥ 1 - ξ_i,  ξ_i ≥ 0,  i = 1, 2, ..., n
where C ≥ 0 is a parameter that controls the amount of training errors allowed.
Biased SVM (noiseless case)
Assume that the first k-1 examples are positive examples (labeled 1), while the rest are unlabeled examples, which we label negative (-1).
  Minimize:   (1/2) w^T w + C Σ_{i=k}^{n} ξ_i
  Subject to: w^T x_i + b ≥ 1,              i = 1, 2, ..., k-1
              -(w^T x_i + b) ≥ 1 - ξ_i,     i = k, k+1, ..., n
              ξ_i ≥ 0,                      i = k, k+1, ..., n
Biased SVM (noisy case)
If we also allow the positive set to have some noisy negative examples, then we have:
  Minimize:   (1/2) w^T w + C+ Σ_{i=1}^{k-1} ξ_i + C- Σ_{i=k}^{n} ξ_i
  Subject to: y_i (w^T x_i + b) ≥ 1 - ξ_i,  i = 1, 2, ..., n
              ξ_i ≥ 0,                      i = 1, 2, ..., n
This turns out to be the same as the asymmetric cost SVM for dealing with unbalanced data. Of course, we have a different motivation.
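A minimal sketch of the Biased SVM idea using asymmetric class weights; the original experiments used the SVMlight package, and the LinearSVC call and the particular C+ / C- values here are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import LinearSVC

def biased_svm(X_P, X_U, c_plus=10.0, c_minus=1.0):
    # Positives keep label +1; every unlabeled example is labeled -1.
    X = np.vstack([X_P, X_U])
    y = np.concatenate([np.ones(X_P.shape[0], dtype=int),
                        -np.ones(X_U.shape[0], dtype=int)])
    # Errors on the positives are penalized by c_plus, errors on the noisy
    # "negatives" drawn from the unlabeled set by c_minus (c_plus > c_minus).
    clf = LinearSVC(C=1.0, class_weight={1: c_plus, -1: c_minus})
    return clf.fit(X, y)
```

In practice, C+ and C- are chosen by searching over a grid of values and evaluating each setting with the criterion described on the following slides.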
Estimating performance
We need to estimate the performance in order to select the parameters.
Since learning from positive and unlabeled examples often arises in retrieval situations, we use the F score as the classification performance measure: F = 2pr / (p + r) (p: precision, r: recall).
To get a high F score, both precision and recall have to be high.
However, without labeled negative examples, we do not know how to estimate the F score.
A performance criterion
Performance criterion pr/Pr[Y=1]: it can be estimated directly from the validation set as r²/Pr[f(X) = 1], where
  recall r = Pr[f(X)=1 | Y=1]
  precision p = Pr[Y=1 | f(X)=1]
To see this, note that
  Pr[f(X)=1 | Y=1] Pr[Y=1] = Pr[Y=1 | f(X)=1] Pr[f(X)=1],
so p = r Pr[Y=1] / Pr[f(X)=1]. Multiplying both sides by r gives
  pr / Pr[Y=1] = r² / Pr[f(X)=1].
Its behavior is similar to that of the F-score (= 2pr / (p + r)).
A performance criterion (cont)
r²/Pr[f(X) = 1]:
  r can be estimated from the positive examples in the validation set.
  Pr[f(X) = 1] can be obtained using the full validation set.
This criterion actually reflects the theory very well.
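A minimal sketch of estimating this criterion on a validation set that contains only positive and unlabeled examples; the function name and inputs are illustrative assumptions:

```python
import numpy as np

def pu_criterion(clf, X_val_P, X_val_U):
    # r = Pr[f(X)=1 | Y=1], estimated from the positive validation examples.
    r = np.mean(clf.predict(X_val_P) == 1)
    # Pr[f(X)=1], estimated from the whole validation set (positive + unlabeled).
    pr_f1 = np.mean(clf.predict(np.vstack([X_val_P, X_val_U])) == 1)
    return (r ** 2) / pr_f1 if pr_f1 > 0 else 0.0
```

For Biased-SVM, for example, one would pick the (C+, C-) pair whose classifier maximizes this value.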
Empirical Evaluation
Two-step strategy: we implemented a benchmark system, called LPU, which is available at http://www.cs.uic.edu/~liub/LPU/LPU-download.html
Step 1:
  Spy
  1-DNF
  Rocchio
  Naïve Bayesian (NB)
Step 2:
  EM with classifier selection
  SVM: run SVM once
  SVM-I: run SVM iteratively and give the converged classifier
  SVM-IS: run SVM iteratively with classifier selection
Biased-SVM (we used the SVMlight package)
Table 1: Average F scores on the Reuters collection.
Table 2: Average F scores on the 20 Newsgroup collection.
(Both tables compare the two-step combinations, Step 1: 1-DNF, Spy, Rocchio or NB with Step 2: EM, SVM, SVM-I or SVM-IS, together with PEBL, S-EM and Roc-SVM, across nine settings (0.1 to 0.9) of the first column. The detailed numbers are not reproduced here.)
Results of Biased SVM
Table 3: Average F scores on the two collections

                      Biased-SVM   Previous best F score
Reuters         0.3   0.785        0.780
                0.7   0.856        0.845
20 Newsgroups   0.3   0.742        0.689
                0.7   0.805        0.774
Summary
Gave an overview of the theory on learning with positive and unlabeled examples.
Described the existing two-step strategy for learning.
Presented a more principled approach to solve the problem based on a biased SVM formulation.
Presented a performance measure pr/Pr(Y=1) that can be estimated from data.
Experimental results using text classification show the superior classification power of Biased-SVM.
Some more experimental work is being performed to compare Biased-SVM with the weighted logistic regression method [Lee & Liu, 2003].