A Selective Incremental Approach for Transductive
Nearest Neighbor Classification
P. Viswanath1, K. Rajesh2, C. Lavanya3
Departments of CSE and IT
Rajeev Gandhi Memorial College of Eng. and Technology
Nandyal-518501, A.P., India. Email: [email protected],
[email protected], [email protected]
Y.C.A. Padmanabha Reddy
Department of Computer Science and Engineering
Madanapalle Institute of Technology and Science
Madanapalle-517325, A.P., India
Email: [email protected]
Abstract—Transductive learning is a special case of semi-supervised learning, where class labels are found for the test patterns alone. That is, the domain of the learner is the test set alone. Often, transductive learners achieve better classification accuracy, since additional information in the form of the test patterns' location in the feature space is used. For several inductive learners, there exist corresponding transductive learners; for example, for SVMs there exist transductive SVMs (TSVMs). For nearest neighbor based classifiers, the corresponding transductive methods are achieved through graph mincuts or spectral graph mincuts. It is shown that these solutions achieve low leave-one-out cross-validation (LOOCV) error with respect to nearest neighbor based classifiers. It is formally shown in this paper that, through a clustering method, it is possible to get various solutions having zero LOOCV error with respect to nearest neighbor based classifiers. Some of these solutions can have low classification accuracy. The paper proposes, instead of optimizing LOOCV error, to optimize a margin-like criterion. This criterion is based on the observation that similarly labeled patterns should be near to each other, while dissimilarly labeled patterns should be far away. An approximate method to solve the proposed optimization problem is given in the paper, called the selective incremental transductive nearest neighbor classifier (SI-TNNC). SI-TNNC finds the test pattern from the test set which is very close to one class of training patterns and at the same time very far from the other class of training patterns. The selected test pattern is given its nearest neighbor's label and is added to the training set; this pattern is removed from the test set. The process is repeated with the next best test pattern, and stops only when the test set becomes empty. An algorithm to implement the SI-TNNC method is given in the paper, which has quadratic time complexity. Other related solutions have either cubic time complexity or are NP-hard. Experimentally, using several standard data-sets, it is shown that the proposed transductive learner achieves on-par or better classification accuracy than its related competitors.
Index Terms—semi-supervised learning, transduction, graph mincut, nearest neighbor classifier.
I. INTRODUCTION
Semi-supervised learning [1][2] deals with learning problems where the available data-set has two parts, viz., a subset consisting of labeled data (training set) and the remaining part consisting of unlabeled data (test set). Transductive inference is limited to predicting labels for the unlabeled data (test set) alone. This is in contrast to inductive methods, which use the given training set to find a classifier that is applicable over the entire feature space. Inductive learning is often an ill-posed problem [3][4]. Semi-supervised learning can use the location of test points (which is additional information beyond the training set) along with the training set. It has been shown that for high dimensional problems having small training sets (when compared to test sets), transductive inference works better than inductive methods like conventional classifiers [1].
Several transductive methods have been proposed, like transductive SVMs [3] introduced by Vapnik, which were later studied by others [5], [6]. In this connection, graph theoretic solutions, like s-t mincut based solutions [7][8] and spectral graph partitioning based solutions [9], are promising. The basic conjecture used in these methods is, "similar examples ought to have similar labels" [8]. A distance based similarity measure can be used. These methods find a labeling for the test data in such a way that the leave-one-out cross-validation (LOOCV) error of the nearest-neighbor or k nearest-neighbor algorithm, when applied to the entire data-set, is minimized. Graph mincut methods bear the same relation to nearest neighbor methods as TSVMs do to SVMs [7]. Related solutions are a soft-partitioning based method proposed in [10], where labels are assigned based on minimizing a quadratic cost function, and a spectral partitioning method which produces an approximate minimum ratio cut, given in [9].
In this paper, first, a few drawbacks of LOOCV error based methods are formally shown, followed by a transductive nearest neighbor classifier which optimizes a margin-like criterion; this classifier is the main contribution of the paper. The conjecture, "similar examples ought to have similar labels" [8], is in a sense only a halfway belief. Clustering (an unsupervised learning problem) is often solved based on the conjecture, "similar examples should be grouped together, while dissimilar examples should form different groups". Achieving a low LOOCV error with respect to a nearest neighbor based classifier is a misleading criterion, since often a degenerate solution is obtained where very few (almost zero) unlabeled patterns are labeled with one class while the rest are given the other label. A normalized version of the mincut, where the number of examples in each group is balanced, can overcome this problem, but this solution is shown to be NP-hard [7]. Spectral graph partitioning, where the Laplacian of the graph is a normalized one, can also overcome
this problem [9] and can be achieved in O(n^3) time, where n is the total data-set size. The solution given in this paper takes O(n^2) time.
Since obtaining an exact solution to the optimization problem introduced in the paper is time consuming, a heuristic based approximate method is given. The heuristic used is, stated informally: (i) select an example from the test set which is very similar to one class and at the same time very dissimilar from the other class, (ii) label the selected example and add it to the training set. Repeat this process until all examples in the test set are consumed by the training set. This approach is named selective incremental transductive
nearest neighbor classification (SI-TNNC). Experimentally, it
is shown that SI-TNNC can achieve similar or sometimes
better classification accuracy than graph based methods.
The rest of the paper is organized as follows. Section II gives
notation and definitions used in the paper. Section III formally
establishes shortcomings of LOOCV error based solutions
with respect to nearest neighbor based classifiers. Section IV
describes the proposed method of the paper. Section V gives
the experimental comparison. Finally, Section VI concludes
the paper while giving some future directions of research.
II. NOTATION AND DEFINITIONS
1) Class labels: The set of class labels, Ω = {−1, +1}.
For simplicity, a two class problem is assumed.
2) Training set: L = {(x1, y1), . . . , (xl, yl)} is the training set, where xi is a d-dimensional feature-vector, and yi is its class label. This is also called the labeled set. l is the training set size. Training patterns with class-label +1 form the subset L+ ⊆ L, and those with class-label −1 form L− ⊆ L. The feature space, by default, is assumed to be a d-dimensional Euclidean space, R^d.
3) Test set: U = {x_{l+1}, . . . , x_n}. This is also called the unlabeled set. u is the test set size. We have n = l + u.
4) Distance function: ||xi − xj || is the distance between
two patterns xi and xj . If feature-vectors are from a
Euclidean space, then this is Euclidean distance. Oth-
erwise, an appropriate distance is used (like matching
coefficient) based on the feature space.
5) Inductive learner: This is a function f^(L) : R^d → Ω. This function is learned using the training set L and can be used to predict a label for any x ∈ R^d.
6) Transductive learner: This is a function g^(L ∪ U) : U → Ω. Note, here the domain of the function is limited to U. This function can use the labeled as well as the unlabeled set in predicting the class label for a test pattern.
7) Loss function: For a pattern x, if y′ is the predicted
label by a learner, and y is its actual label, then the loss
associated with this prediction is,
L(y, y′) = { 0 if y = y′;  1 if y ≠ y′ }    (1)
8) LOOCV error: This is leave-one-out cross-validation
(LOOCV) error of an inductive learner. Let U′ = {(x_{l+1}, y′_{l+1}), . . . , (x_n, y′_n)}, where each test pattern x is associated with its predicted label y′. Let D = L ∪ U′. Then the LOOCV error of an inductive learner f with respect to D is,

LOOCV-error(f, D) = (1/n) ( Σ_{i=1}^{l} L(y_i, f^{(D−{x_i})}(x_i)) + Σ_{i=l+1}^{n} L(y′_i, f^{(D−{x_i})}(x_i)) )
9) Classification Accuracy (CA): This is prediction accu-
racy over the test set. We assume that for each test
pattern xi, for i = l + 1, . . . , n, its actual label yi is
available. Let the predicted label for xi be y′i. Then the
classification accuracy is,

CA = ( (1/u) Σ_{i=l+1}^{n} (1 − L(y_i, y′_i)) ) × 100%    (2)
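The Section II definitions translate directly into code. The following is a minimal sketch for the 1-nearest-neighbor case, assuming Euclidean distance; the function names (loss, nn_predict, loocv_error_1nn, classification_accuracy) are illustrative choices, not from the paper.

import numpy as np

def loss(y, y_pred):
    # 0-1 loss of Eq. (1)
    return 0 if y == y_pred else 1

def nn_predict(X, y, x):
    # 1-NN prediction for x from the labeled pool (X, y)
    d = np.linalg.norm(np.asarray(X, float) - np.asarray(x, float), axis=1)
    return y[int(np.argmin(d))]

def loocv_error_1nn(X, y):
    # LOOCV error of the 1-NN classifier over D = (X, y): leave each pattern
    # out in turn and predict it from the rest, as in the Section II definition.
    X, y = np.asarray(X, float), np.asarray(y)
    errors = 0
    for i in range(len(X)):
        X_rest, y_rest = np.delete(X, i, axis=0), np.delete(y, i)
        errors += loss(y[i], nn_predict(X_rest, y_rest, X[i]))
    return errors / len(X)

def classification_accuracy(y_true, y_pred):
    # Eq. (2): percentage of test patterns predicted correctly
    correct = sum(1 - loss(a, b) for a, b in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)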
III. DRAWBACKS OF LOOCV ERROR BASED SOLUTIONS
LOOCV error is often seen as an unbiased estimate of the
error-rate of an inductive learner [9]. LOOCV-error(f, L ∪ U′) is a measure of goodness for the predictions given to test patterns with respect to the classifier f. So, one way of doing transductive inference is to assign labels to test patterns such that LOOCV-error(f, L ∪ U′) is minimized for a chosen f. When f is the nearest neighbor classifier (or k-
nearest neighbor classifier) a solution produced through graph
mincuts achieves least LOOCV error [7][8][9]. First, solutions
proposed in [7],[8] and [9] are briefly described, followed by
a formal explanation of drawbacks of LOOCV error based
solutions, when the associated inductive learner is a nearest
neighbor based one.
A. Graph mincut and Spectral graph theory based solutions
A nearest neighbor graph (or k-nearest neighbor graph) G = (V, E) is an undirected graph, where V is the set of patterns
which corresponds to the nodes of the graph, and an edge
between a pair of patterns is given a weight (w) which is
the similarity between them. Set of edges is E ⊆ V × V . If
similarity between a pair is less than or equal to a threshold,
then the corresponding edge does not exist in the graph. For
nearest neighbor based graphs, this threshold is often set to
zero.
In [7], for theoretical analysis, the similarity measure used is as follows. Two special nodes called classification nodes v+ and v− are added to V. For all x in L+, w(x, v+) = ∞. Similarly, for all x in L−, w(x, v−) = ∞. For other patterns the following neighborhood relation is used, where x and y are any patterns:

nn(x, y) = { 1 if y is the nearest neighbor of x;  0 otherwise }    (3)
Then, the similarity weight between x and y is,
w(x, y) = nn(x, y) + nn(y, x).
A cut partitions V into V+ and V−. The cost of this cut is,

cut(V+, V−) = Σ_{x ∈ V+, y ∈ V−} w(x, y)    (4)
It is theoretically shown that, if we get a minimum cut solution
and label all patterns in V+ with +1, and in V− with −1, then
the LOOCV error of the nearest neighbor classifier over the
entire data-set will be zero.
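As an illustration of this construction (not the authors' implementation), the mutual-nearest-neighbor weights and the s-t mincut can be sketched with NetworkX as below; the helper names nn_index and mincut_transduction, and the conversion to a directed flow graph, are assumptions of this sketch.

import numpy as np
import networkx as nx
from collections import defaultdict

def nn_index(X, i):
    # index of the nearest neighbor of X[i], excluding i itself
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf
    return int(np.argmin(d))

def mincut_transduction(X_pos, X_neg, X_unlab):
    # Label the unlabeled patterns via an s-t mincut on the graph whose
    # edge weights are w(x, y) = nn(x, y) + nn(y, x).
    X = np.vstack([X_pos, X_neg, X_unlab]).astype(float)
    n_pos, n_neg = len(X_pos), len(X_neg)
    w = defaultdict(int)
    for i in range(len(X)):
        j = nn_index(X, i)
        w[(min(i, j), max(i, j))] += 1      # mutual pairs accumulate weight 2
    G = nx.DiGraph()
    for (i, j), wij in w.items():
        G.add_edge(i, j, capacity=wij)
        G.add_edge(j, i, capacity=wij)
    # Classification nodes v+ and v-: edges without a 'capacity' attribute are
    # treated by NetworkX flow routines as having infinite capacity.
    for i in range(n_pos):
        G.add_edge("v+", i)
        G.add_edge(i, "v+")
    for i in range(n_pos, n_pos + n_neg):
        G.add_edge("v-", i)
        G.add_edge(i, "v-")
    _, (source_side, _) = nx.minimum_cut(G, "v+", "v-")
    return [+1 if idx in source_side else -1
            for idx in range(n_pos + n_neg, len(X))]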
In practice, this formulation can admit several different cuts having zero cut value. Also, often, solutions are found where very few test patterns are labeled with one class-label, while the rest get the opposite class-label. To overcome these, practical solutions
given are based on Mincut-δ, where a graph is constructed
such that two points are connected with an edge if they are
closer than δ. Further, variations of these are,
• Mincut-δ0: The value of δ chosen is the maximum value
such that the graph has a cut of value 0.
• Mincut-δ1/2: The value of δ chosen is such that the largest
connected component in the graph has size equal to half
the number of data points.
• Mincut-δopt: The value of δ chosen corresponds to the
least classification error in hindsight. Experimentally, this
is shown to give good results.
The solution given in [8] is an extension of the above solution. Several mincuts are found from several graphs obtained by adding some random noise to the edges. Among these, unbalanced solutions are thrown away. From the remaining solutions, a labeling is given to the test patterns according to the majority consensus. This technique is named the randomized mincuts solution.
In [9] a k-nearest neighbor graph is constructed, where
w(x, y) = { sim(x, y) / Σ_{z ∈ knn(x)} sim(x, z)  if y ∈ knn(x);  0 otherwise }    (5)

where sim(x, y) is the similarity between x and y, and knn(x) is the set of k-nearest neighbors of x. A distance based similarity like sim(x, y) = 1/||x − y|| or sim(x, y) = −||x − y|| can be used. The graph is represented as an adjacency matrix A with the corresponding w(x, y) values. The normalized Laplacian of A is used for doing spectral clustering. Based on this clustering result, labels are given to the test patterns.
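A rough sketch of this pipeline is given below: build the row-normalized k-NN similarity matrix of Eq. (5), form the normalized Laplacian, and split on the sign of the second-smallest eigenvector (the Fiedler vector). This is a generic spectral-partitioning illustration under the stated weighting, not a reproduction of the constrained formulation of [9]; names and defaults (e.g. k = 5) are assumptions.

import numpy as np

def knn_weights(X, k, eps=1e-12):
    # Row-normalized k-NN similarity matrix as in Eq. (5), with sim = 1/dist.
    X = np.asarray(X, float)
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    W = np.zeros((n, n))
    for i in range(n):
        neigh = [j for j in np.argsort(D[i]) if j != i][:k]
        sim = 1.0 / (D[i, neigh] + eps)
        W[i, neigh] = sim / sim.sum()
    return (W + W.T) / 2.0          # symmetrize before building the Laplacian

def spectral_split(X, k=5):
    # Two-way split from the sign of the Fiedler vector of the normalized Laplacian.
    W = knn_weights(X, k)
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L_sym = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    _, vecs = np.linalg.eigh(L_sym)  # eigenvectors in ascending eigenvalue order
    fiedler = vecs[:, 1]             # second-smallest eigenvector
    return np.where(fiedler >= 0, +1, -1)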
Mincut-δ based solutions given in [7] and in [8] are NP-hard
since these are equivalent to finding a normalized graph mincut
(ratiocut of the graph) [9]. But, spectral graph partitioning can
be done in O(n^3) time.
B. LOOCV error for k-nearest neighbor classifier
First, a clustering of the data-set L∪U (class-labels present
in L are ignored while doing clustering) called k-nn consistent
clustering is defined. Based on this clustering result, a formal
way of labeling the unlabeled data is described. We show that
this labeling will have zero LOOCV error with respect to k-
nearest neighbor classifier.
Fig. 1. An example of 3-nn consistent clustering: two-dimensional data (axes Feature 1 and Feature 2) partitioned into clusters C1, C2, C3, and C4, which contain labeled patterns of label +1, labeled patterns of label −1, and unlabeled patterns.

k-nn consistent clustering: For a given positive integer k, a clustering π = {C1, C2, . . . , Cm} of the data-set D = L ∪ U is said to be k-nn consistent, if each cluster satisfies the following three conditions.
1) At least k + 1 patterns are present in each cluster.
2) All k nearest neighbors of a pattern x (excluding x) are
contained in the same cluster as x, for all x ∈ D.
3) In any cluster, if there are labeled patterns, then the class-labels of all these patterns are the same.
For small values of k, finding k-nn consistent clustering is
quite easy. For larger values of k, k-nn consistent clustering
may not exist at all. When k = 1, finding k-nn consistent
clustering can be achieved through the well known single-link
clustering [11] method. Here, merging of nearest clusters is
progressively done. The merging process can be stopped when all clusters have at least 2 patterns and no cluster has oppositely labeled patterns. A general procedure for this is:
1) For each pattern x ∈ D, find the group of its k-nearest neighbors (excluding itself). That is, find in total k + 1 patterns, including x, which are the k-nearest neighbors of x. Let us call these subsets groups.
2) Merge two groups if they are not disjoint. Repeat this merging till all resulting groups are disjoint.
3) If any group has labeled patterns of two different labels, then output "k-nn consistent clustering is not possible". Else output the final groups as the k-nn consistent clustering.
An example of 3-nn consistent clustering is given in Fig. 1 for 2-dimensional data. A sketch of the above group-merging procedure is given below.
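The following is an illustrative implementation of the procedure, assuming Euclidean distance, with labels encoded as +1/−1 for labeled patterns and 0 for unlabeled ones; the encoding and names are assumptions of this sketch, not the paper's.

import numpy as np

def knn_consistent_clustering(X, labels, k):
    # labels[i] is +1 or -1 for labeled patterns and 0 for unlabeled ones.
    X = np.asarray(X, float)
    labels = np.asarray(labels)
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    # Step 1: each pattern together with its k nearest neighbors forms a group.
    groups = [set([i]) | set(int(j) for j in np.argsort(D[i])[:k]) for i in range(n)]
    # Step 2: merge groups that are not disjoint, until all groups are disjoint.
    merged = True
    while merged:
        merged = False
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                if groups[a] & groups[b]:
                    groups[a] |= groups[b]
                    del groups[b]
                    merged = True
                    break
            if merged:
                break
    # Step 3: reject the clustering if a cluster mixes labels of both classes.
    for g in groups:
        present = {int(labels[i]) for i in g if labels[i] != 0}
        if len(present) > 1:
            return None      # "k-nn consistent clustering is not possible"
    return groups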
k-nn consistent labeling: Let π = {C1, C2, . . . , Cm} be a k-nn consistent clustering. For clusters where labeled patterns are present, label the remaining unlabeled patterns with the same label as the labeled patterns. For the other clusters, where all patterns are unlabeled, assign either +1 or −1 to all patterns.
Theorem 3.1: If labels are given to the unlabeled patterns according to k-nn consistent labeling, then the LOOCV error of the entire data-set with respect to the k-nearest neighbor classifier is zero.
Proof: The k-nearest neighbors of each pattern x (excluding itself) are in the same cluster. Since the labeling within a cluster is uniform, the predicted label of x is the same as the assigned label of x. Hence, the loss associated with x is zero. This is true for all patterns, hence the total loss is zero.
Fig. 2. A 3-nn consistent labeling for the data shown in Fig. 1 having zero LOOCV error.
Fig. 3. A desirable 3-nn consistent labeling for the data shown in Fig. 1 having zero LOOCV error.
Two 3-nn consistent labellings for the example in Fig. 1 are shown in Fig. 2 and Fig. 3. Note, for both labellings, the LOOCV error with respect to the 3-nearest neighbor classifier is zero. The labeling shown in Fig. 2 is not a desirable one, but that shown in Fig. 3 is. For this example, there are two more solutions having zero LOOCV error, where all patterns in C2 and C3 are labeled with either +1 or with −1. But, an LOOCV error based method can produce any of these solutions, either randomly or based on the ordering in which the patterns are processed.
This is clearly a drawback of LOOCV error based labeling
where the inductive classifier is a nearest neighbor based one.
This kind of drawback is not present, for example, when
one uses LOOCV error with respect to SVMs to guide the
labeling. Nearest neighbor based methods are highly localized
ones, hence LOOCV error can be a misleading criterion for
transductive learning.
IV. A SELECTIVE INCREMENTAL TRANSDUCTIVE
NEAREST NEIGHBOR CLASSIFICATION (SI-TNNC)
This Section presents the proposed transductive nearest
neighbor classifier. First, motivation (or intuition) behind this
transductive labeling scheme is informally explained, followed
by a detailed formal description.
The nearest neighbor classifier (NNC) classifies a given
test pattern according to its nearest neighbor in the training
set. This can be done in the following way also, where
nearest neighbors in each class of training patterns are found
separately, and then the nearest neighbor is found. Let x be
the test pattern, let its nearest neighbor’s distance in L+ be
d+(x), and in L− be d−(x). Then, the class-label assigned to x is y′ = +1 if d+(x) < d−(x), and y′ = −1 otherwise. One way to measure the goodness (γ) of this assignment is,

γ(x, y′) = { (d−(x) − d+(x)) if y′ = +1;  (d+(x) − d−(x)) if y′ = −1 }    (6)
This can be simplified as

γ(x, y′) = y′ (d−(x) − d+(x))    (7)

This goodness measure γ(x, y′) is called the margin of x with respect to L. This is analogous to the way the functional margin of a hyper-plane is used in learning an SVM [12].
Intuitively, if the margin is large, the confidence in that labeling assignment is high. If labeling has to be done for the test set U collectively, then, suppose we assign a labeling to all patterns in U to get U′ = {(x_{l+1}, y′_{l+1}), . . . , (x_n, y′_n)}; the goodness of this labeling is,

Γ(L ∪ U′) = min { y′_i (d−(x_i) − d+(x_i)) | i = 1 to n }    (8)

Here, for a training pattern x, y′ = y, i.e., the available label in the training set itself is used. Further, nearest neighbors of x are found from (L ∪ U′) − {x}. That is, excluding x, its nearest neighbors from L ∪ U′ with label +1 and with label −1 are found.
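For concreteness, the collective margin Γ of Eq. (8) could be computed as in the minimal sketch below, assuming Euclidean distance; the function name and signature are illustrative, not the paper's.

import numpy as np

def margin(X, y):
    # X: all patterns in L ∪ U'; y: given labels for L, assigned labels for U'.
    X, y = np.asarray(X, float), np.asarray(y)
    gammas = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                              # exclude the pattern itself
        d_plus = d[y == +1].min()                  # nearest neighbor labeled +1
        d_minus = d[y == -1].min()                 # nearest neighbor labeled -1
        gammas.append(y[i] * (d_minus - d_plus))   # Eq. (7) for this pattern
    return min(gammas)                             # Eq. (8)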
Now, a labeling is given to the test data in such a way
that Γ(L ∪ U ′) is maximized. Intuitively, this can be related
to SVM learning. SVMs find a hyper-plane either in the feature
space or in a kernel induced space such that the margin is
maximized [12].
Formally, this is: find a labeling to U such that

U′ = arg max_{U′} Γ(L ∪ U′)    (9)
This optimization problem is a discrete one and hence gradient-descent style procedures are not feasible. Evolutionary techniques, like genetic algorithms, can be used, but these take excessive amounts of time. Instead, the paper proposes a heuristic based approximate technique to solve this problem. This is an iterative incremental labeling procedure. For each pattern x ∈ U a score s(x) = |d+(x) − d−(x)| is given. Here we consider only L to find the nearest neighbors of x in both of the classes. The pattern with the highest score
is added to L along with its nearest neighbor’s label. The
process is repeated till all patterns in the test set, along with
their nearest neighbor’s label, are added to L. This method
is called selective incremental transductive nearest neighbor
classification (SI-TNNC). The method is described formally in
Algorithm 1.
A. The time complexity analysis of SI-TNNC
Training set size |L| = l, test set size |U| = u, and l + u = n is the total data-set size (refer to Section II for the notation
used). Outer “while loop” is executed u times. Time taken for
each iteration of the outer “while loop” depends on the current
training set size, and current test set size. After each “while
loop” iteration training set size is increased by 1, and test
set size is decreased by 1. At the kth iteration of the “while
loop”, the time taken by the inner “for loop” is proportional to (l + k)(u − k). So the time taken by Algorithm 1 is proportional to

lu + (l + 1)(u − 1) + (l + 2)(u − 2) + · · · + (l + u − 1)(1).

So, the time complexity is O(lu^2). In the worst case, when l = u = n/2, the time complexity of SI-TNNC is O(n^3). An improved algorithm, which achieves the same result as SI-TNNC but in O(n^2) worst-case time, is presented below; it is called the Improved SI-TNNC method.
Algorithm 1 SI-TNNC(L, U)
while U ≠ ∅ do
  for all x ∈ U do
    Find d+(x) = distance between x and its nearest neighbor in L+;
    Find d−(x) = distance between x and its nearest neighbor in L−;
    Find s(x) = |d+(x) − d−(x)|;
  end for
  Find x ∈ U having maximum s(x);
  if d+(x) < d−(x) then
    y′ = +1;
  else
    y′ = −1;
  end if
  L = L ∪ {(x, y′)};
  U = U − {x};
end while
Output U′ = {(x_{l+1}, y′_{l+1}), . . . , (x_n, y′_n)}, which is now a subset of L;
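A direct Python rendering of Algorithm 1 might look like the following sketch; the Euclidean distance, the list-based data structures, and the names are illustrative choices, not the authors' code.

import numpy as np

def si_tnnc(L_pos, L_neg, U):
    # Repeatedly pick the test pattern with the largest score |d+ - d-|,
    # give it its nearest neighbor's label, and move it into the training set.
    L_pos = [np.asarray(x, float) for x in L_pos]
    L_neg = [np.asarray(x, float) for x in L_neg]
    U = [np.asarray(x, float) for x in U]
    predictions = []                      # (pattern, predicted label) pairs, i.e. U'
    while U:
        scores = []
        for x in U:                       # score against the *current* training set
            d_plus = min(np.linalg.norm(x - z) for z in L_pos)
            d_minus = min(np.linalg.norm(x - z) for z in L_neg)
            scores.append((abs(d_plus - d_minus), d_plus < d_minus))
        best = max(range(len(U)), key=lambda i: scores[i][0])
        x = U.pop(best)
        if scores[best][1]:               # d+(x) < d-(x)  ->  label +1
            L_pos.append(x)
            predictions.append((x, +1))
        else:
            L_neg.append(x)
            predictions.append((x, -1))
    return predictions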
B. Improved SI-TNNC method
For each test pattern x, we store the pair (d+(x), d−(x)) with respect to the current training set L, called the distance pair of x. Let z be the next test pattern which is added to the training set, and let z be assigned the class-label yz. Then L is updated to L = L ∪ {(z, yz)}. Now, to update the distance pair of x, there is no need to consider all patterns in L; considering (z, yz) is enough. Let the updated distance pair for x be (d+^new(x), d−^new(x)), which can be obtained as described below.

d+^new(x) = { min(d+(x), ||x − z||) if yz = +1;  d+(x) if yz = −1 }    (10)

d−^new(x) = { d−(x) if yz = +1;  min(d−(x), ||x − z||) if yz = −1 }    (11)
Algorithm 2 describes the Improved SI-TNNC method
which is based on the above observation. It is easy to see
that Algorithm 2 produces the same result as Algorithm 1.
C. The time complexity analysis of Improved-SI-TNNC
The time taken by Algorithm 2 is proportional to

lu + (u − 1) + (u − 2) + · · · + 1.

Hence the time complexity is O(lu + u^2). In the worst case, when l = u = n/2, the time complexity is O(n^2).
V. EXPERIMENTAL RESULTS
Experiments are conducted with five standard data-sets drawn from the UCI Machine Learning Repository [13]. Properties of the data-sets, along with the distance function used, are shown in Table I. Note, the same data-sets are used in [7] and [8]. The classifiers
Algorithm 2 Improved-SI-TNNC(L, U)
U′ = ∅;
for all x ∈ U do
  Find d+(x) = distance between x and its nearest neighbor in L+;
  Find d−(x) = distance between x and its nearest neighbor in L−;
  Store distance-pair(x) = (d+(x), d−(x));
end for
while U ≠ ∅ do
  Find z ∈ U such that s(z) = |d+(z) − d−(z)| is maximum;
  if d+(z) < d−(z) then
    yz = +1;
  else
    yz = −1;
  end if
  U′ = U′ ∪ {(z, yz)};
  U = U − {z};
  for all x ∈ U do
    if yz = +1 then
      d+(x) = min(d+(x), ||x − z||);
    else
      d−(x) = min(d−(x), ||x − z||);
    end if
  end for
end while
Output U′;
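In the same spirit, a sketch of Algorithm 2 keeps the distance pair (d+(x), d−(x)) of every remaining test pattern and updates it only against the newly added pattern z, per Eqs. (10) and (11); again the Euclidean distance and the names are assumptions of the sketch, not the authors' code.

import numpy as np

def improved_si_tnnc(L_pos, L_neg, U):
    # Distance pairs are computed once against L and then updated incrementally.
    L_pos = [np.asarray(x, float) for x in L_pos]
    L_neg = [np.asarray(x, float) for x in L_neg]
    U = [np.asarray(x, float) for x in U]
    pairs = [[min(np.linalg.norm(x - z) for z in L_pos),
              min(np.linalg.norm(x - z) for z in L_neg)] for x in U]
    predictions = []
    while U:
        best = max(range(len(U)), key=lambda i: abs(pairs[i][0] - pairs[i][1]))
        z = U.pop(best)
        d_plus, d_minus = pairs.pop(best)
        y_z = +1 if d_plus < d_minus else -1
        predictions.append((z, y_z))
        for i, x in enumerate(U):          # Eqs. (10)-(11): update against z only
            if y_z == +1:
                pairs[i][0] = min(pairs[i][0], np.linalg.norm(x - z))
            else:
                pairs[i][1] = min(pairs[i][1], np.linalg.norm(x - z))
    return predictions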
TABLE I
DETAILS OF DATA-SETS USED

Data-set   Number of Features   |L|   |U|    Distance function
VOTING     16                   45    390    Jaccard Coefficient
MUSH       22                   20    1000   Simple Matching
IONO       34                   50    300    Euclidean
BUPA       6                    45    300    Euclidean
PIMA       8                    50    718    Euclidean
used for the comparison purpose are, (i) graph mincut-δopt [7]
(a transductive classifier), (ii) randomized graph mincut [8]
(a transductive classifier), (iii) spectral graph partitioning [9]
(a transductive classifier), (iv) ID3 (a decision tree based
classifier, an inductive classifier) [14][15], (v) 3-NNC (3-
nearest neighbor classifier, an inductive classifier) [16], (vi)
SI-TNNC (the proposed method of this paper, a transductive
classifier). The classifiers for comparison are chosen so as to compare the proposed method with other, similar transductive methods. Two well-known inductive classifiers, viz., ID3 and 3-NNC, are also used for comparison.
Classification accuracies are given in Table II. It can be seen that the proposed SI-TNNC method is comparable with the other classifiers and in some cases shows better results. A plot comparing classification accuracy with
TABLE II
CA (%) FOR VARIOUS CLASSIFIERS

Data-set   Mincut-δopt   Rand. Graph Mincut   Spectral Graph Partit.   ID3    3-NNC   SI-TNNC
VOTING     91.3          91.2                 85.9                     86.4   89.6    93.5
MUSH       97.7          94.2                 91.6                     93.3   91.1    96.7
IONO       81.6          82.8                 79.7                     88.6   69.6    82.67
BUPA       59.3          63.5                 61.6                     55.3   52.7    64.67
PIMA       72.3          67.5                 68.2                     70.0   68.1    72.70
Fig. 4. Classification Accuracy vs. Training Set Size for MUSH data (x-axis: training set size |L|; y-axis: classification accuracy in %; curves for Mincut, Rand. Graph Mincut, Spectral Graph Part., ID3, 3-NNC, and SI-TNNC).
training set size for MUSH (Mushroom) data is shown in
Fig. 4. It can be seen that the SI-TNNC method shows consistent improvement as the training set size increases. For some smaller-sized training sets, SI-TNNC performs better than all the other classifiers.
It is worth noting that a mincut of a graph (called an s-t mincut) can be found in O(n^3) time [17]. Several standard techniques for this are available. But, this can give a degenerate solution having a highly imbalanced labeling. To overcome this problem a normalized graph mincut is used (variations of such solutions are given in [7], [8]); these are NP-hard, and hence are feasible only for small data-sets. A randomized algorithm for the s-t mincut problem runs in O(n^2 log n) time and uses O(n^2) space to find a solution with high probability [18] (with probability O(1/log n)), where n is the data-set size. Spectral graph partitioning can be done in O(n^3) time. The proposed method of this paper, viz., SI-TNNC, whose improved version is given in Algorithm 2, takes O(n^2) time.
VI. CONCLUSION AND FUTURE WORK
Nearest neighbor based transductive learners optimize leave-
one-out cross-validation (LOOCV) error. These methods are
realized by graph mincut or by spectral graph partitioning.
But, LOOCV error with nearest neighbor based classifiers has some drawbacks. It is shown formally that various labellings of the test set are possible which all have zero LOOCV error. Some of these solutions can have low classification accuracy.
The paper proposed a transductive learner called selective in-
cremental transductive nearest neighbor classifier (SI-TNNC)
to label the test patterns. The method is an approximate
one which maximizes a margin-like criterion. The proposed
method has quadratic time complexity, whereas normalized
graph mincut methods are either NP-hard or have cubic
time complexity (with spectral graph partitioning method).
Experimentally, the proposed method is shown to have on-par or better classification accuracy than its competitors on a few standard data-sets.
Future work is to devise an algorithm to find an exact solution which maximizes the criterion proposed in this paper.
ACKNOWLEDGMENTS
The work is partly funded by an AICTE Project under RPS
Scheme with reference: “F.No: 8023/BOR/RID/RPS-51/2009-
10”.

REFERENCES

[1] O. Chapelle, B. Schölkopf, and A. Zien, Semi-Supervised Learning. Cambridge, Massachusetts: The MIT Press, 2006.
[2] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf, “Learning with local and global consistency,” in Advances in Neural Information Processing Systems, S. Thrun, L. Saul, and B. Schölkopf, Eds., vol. 16. Cambridge, MA: The MIT Press, 2004, pp. 321–328.
[3] V. Vapnik, Statistical Learning Theory. New York: John Wiley & Sons, A Wiley-Interscience Publication, 1998.
[4] V. Vapnik, Estimation of Dependences Based on Empirical Data, 2nd ed. New York: Springer Series in Statistics, Springer-Verlag, 2006.
[5] K. Bennett, “Combining support vector and mathematical programming methods for classification,” in Advances in Kernel Methods - Support Vector Learning, B. Schölkopf et al., Eds. MIT Press, 1999.
[6] T. Joachims, “Transductive inference for text classification using support vector machines,” in Sixteenth International Conference on Machine Learning. Bled, Slovenia: Morgan Kaufmann, 1999, pp. 200–209.
[7] A. Blum and S. Chawla, “Learning from labeled and unlabeled data using graph mincuts,” in Eighteenth International Conference on Machine Learning. Morgan Kaufmann, 2001, pp. 19–26.
[8] A. Blum, J. Lafferty, M. Rwebangira, and R. Reddy, “Semi-supervised learning using randomized mincuts,” in International Conference on Machine Learning. Morgan Kaufmann, 2004.
[9] T. Joachims, “Transductive learning via spectral graph partitioning,” in International Conference on Machine Learning, 2003, pp. 290–297.
[10] X. Zhu, Z. Ghahramani, and J. Lafferty, “Semi-supervised learning using Gaussian fields and harmonic functions,” in 20th International Conference on Machine Learning, 2003, pp. 912–919.
[11] A. Jain, M. N. Murty, and P. J. Flynn, “Data clustering: A review,” ACM Computing Surveys, vol. 31, no. 3, pp. 264–323, 1999.
[12] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, 1st ed. Cambridge University Press, 2000.
[13] P. M. Murphy, UCI Repository of Machine Learning Databases [http://www.ics.uci.edu/mlearn/MLRepository.html], Department of Information and Computer Science, University of California, Irvine, CA, 2000.
[14] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. John Wiley & Sons: A Wiley-Interscience Publication, 2000.
[15] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Academic Press, 2001.
[16] B. V. Dasarathy, “Data mining tasks and methods: Classification: Nearest-neighbor approaches,” in Handbook of Data Mining and Knowledge Discovery. New York: Oxford University Press, 2002, pp. 288–298.
[17] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms. Cambridge, MA, USA: The MIT Press, 1990.
[18] R. Motwani and P. Raghavan, Randomized Algorithms. Cambridge, UK: Cambridge University Press, 1995.