
A Selective Incremental Approach for Transductive Nearest Neighbor Classification

P. Viswanath1, K. Rajesh2, C. Lavanya3

Departments of CSE and IT

Rajeev Gandhi Memorial College of Eng. and Technology

Nandyal-518501, A.P., India. Email: [email protected],

[email protected], [email protected]

Y.C.A. Padmanabha Reddy

Department of Computer Science and Engineering

Madanapalle Institute of Technology and Science

Madanapalle-517325, A.P., India

Email: [email protected]

Abstract—Transductive learning is a special case of semi-supervised learning in which class labels are found for the test patterns alone; that is, the domain of the learner is the test set alone. Transductive learners often achieve better classification accuracy, since additional information, namely the locations of the test patterns in the feature space, is used. For several inductive learners there exist corresponding transductive learners; for example, for SVMs there exist transductive SVMs (TSVMs). For nearest neighbor based classifiers, the corresponding transductive methods are obtained through graph mincuts or spectral graph mincuts. These solutions are known to achieve low leave-one-out cross-validation (LOOCV) error with respect to nearest neighbor based classifiers. This paper formally shows that, through a clustering method, it is possible to obtain many different solutions having zero LOOCV error with respect to nearest neighbor based classifiers, and that some of these solutions can have low classification accuracy. Instead of optimizing LOOCV error, the paper proposes to optimize a margin-like criterion, based on the observation that similarly labeled patterns should be near each other while dissimilarly labeled patterns should be far apart. An approximate method to solve the proposed optimization problem, called the selective incremental transductive nearest neighbor classifier (SI-TNNC), is given. SI-TNNC finds the test pattern that is closest to one class of training patterns and at the same time farthest from the other class. The selected test pattern is given its nearest neighbor's label, added to the training set, and removed from the test set. The process is repeated with the next best test pattern and stops only when the test set becomes empty. An algorithm implementing the SI-TNNC method with quadratic time complexity is given; other related solutions either have cubic time complexity or are NP-hard. Experiments on several standard data-sets show that the proposed transductive learner achieves on-par or better classification accuracy than its related competitors.

Index Terms—semi-supervised learning, transduction, graph mincut, nearest neighbor classifier.

I. INTRODUCTION

Semi-supervised learning [1][2] deals with learning problems in which the available data-set has two parts: a subset consisting of labeled data (the training set) and the remaining part consisting of unlabeled data (the test set). Transductive inference is limited to predicting labels for the unlabeled data (test set) alone. This is in contrast to inductive methods, which use the given training set to find a classifier applicable over the entire feature space. Inductive learning is often an ill-posed problem [3][4]. Semi-supervised learning can use the locations of the test points (additional information beyond the training set alone) along with the training set. It has been shown that for high dimensional problems with small training sets (compared to the test sets), transductive inference works better than inductive methods like conventional classifiers [1].

Several transductive methods have been proposed, such as transductive SVMs [3] introduced by Vapnik and later studied by others [5], [6]. In this connection, graph theoretic solutions, like s-t mincut based solutions [7][8] and spectral graph partitioning based solutions [9], are promising. The basic conjecture used in these methods is that "similar examples ought to have similar labels" [8]; a distance based similarity measure can be used. These methods find a labeling of the test data such that the leave-one-out cross-validation (LOOCV) error of the nearest-neighbor or k-nearest-neighbor algorithm, applied to the entire data-set, is minimized. Graph mincut methods bear the same relation to nearest neighbor methods as TSVMs bear to SVMs [7]. Related solutions include a soft-partitioning based method proposed in [10], where labels are assigned by minimizing a quadratic cost function, and a spectral partitioning method that produces an approximate minimum ratio cut, given in [9].

In this paper, first, a few drawbacks of LOOCV error based methods are formally shown, followed by a transductive nearest neighbor classifier that optimizes a margin-like criterion, which is the main contribution of the paper. The conjecture that "similar examples ought to have similar labels" [8] is, in a sense, only half of the picture. Clustering (an unsupervised learning problem) is often solved based on the conjecture that "similar examples should be grouped together, while dissimilar examples should form different groups". Achieving a low LOOCV error with respect to a nearest neighbor based classifier is a misleading criterion, since a degenerate solution is often obtained in which very few (almost zero) unlabeled patterns are labeled with one class while the rest are given the other label. A normalized version of the mincut, in which the number of examples in each group is balanced, can overcome this problem, but this solution is shown to be NP-hard [7]. Spectral graph partitioning, in which a normalized Laplacian of the graph is used, can also overcome this problem [9] and can be computed in O(n³) time, where n is the total data-set size. The solution given in this paper takes O(n²) time.

Since obtaining an exact solution to the optimization problem introduced in this paper is time consuming, a heuristic based approximate method is given. Informally, the heuristic is: (i) select an example from the test set that is very similar to one class and at the same time very dissimilar from the other class, (ii) label the selected example and add it to the training set. This process is repeated until all examples in the test set are consumed by the training set. The approach is named selective incremental transductive nearest neighbor classification (SI-TNNC). Experimentally, it is shown that SI-TNNC can achieve similar, and sometimes better, classification accuracy than graph based methods.

The rest of the paper is organized as follows. Section II gives the notation and definitions used in the paper. Section III formally establishes the shortcomings of LOOCV error based solutions with respect to nearest neighbor based classifiers. Section IV describes the proposed method of the paper. Section V gives the experimental comparison. Finally, Section VI concludes the paper and gives some future directions of research.

II. NOTATION AND DEFINITIONS

1) Class labels: The set of class labels is Ω = {−1, +1}. For simplicity, a two class problem is assumed.
2) Training set: L = {(x_1, y_1), . . . , (x_l, y_l)} is the training set, where x_i is a d-dimensional feature-vector and y_i is its class label. This is also called the labeled set; l is the training set size. Training patterns with class-label +1 form the subset L+ ⊆ L, and those with class-label −1 form L− ⊆ L. The feature space, by default, is assumed to be a d-dimensional Euclidean space, R^d.
3) Test set: U = {x_{l+1}, . . . , x_n}. This is also called the unlabeled set; u is the test set size. We have n = l + u.
4) Distance function: ||x_i − x_j|| is the distance between two patterns x_i and x_j. If the feature-vectors are from a Euclidean space, then this is the Euclidean distance; otherwise, an appropriate distance (like the matching coefficient) is used, based on the feature space.

5) Inductive learner: This is a function f^{(L)} : R^d → Ω. The function is learned using the training set L and can be used to predict a label for any x ∈ R^d.
6) Transductive learner: This is a function g^{(L ∪ U)} : U → Ω. Note that here the domain of the function is limited to U. The function can use the labeled as well as the unlabeled set in predicting the class label of a test pattern.
7) Loss function: For a pattern x, if y′ is the label predicted by a learner and y is its actual label, then the loss associated with this prediction is

L(y, y′) = 0 if y = y′, and 1 if y ≠ y′.   (1)

8) LOOCV error: This is the leave-one-out cross-validation (LOOCV) error of an inductive learner. Let U′ = {(x_{l+1}, y′_{l+1}), . . . , (x_n, y′_n)}, where each test pattern x_i is associated with its predicted label y′_i. Let D = L ∪ U′. Then the LOOCV error of an inductive learner f with respect to D is

LOOCV-error(f, D) = (1/n) ( Σ_{i=1}^{l} L(y_i, f^{(D − {x_i})}(x_i)) + Σ_{i=l+1}^{n} L(y′_i, f^{(D − {x_i})}(x_i)) ).

9) Classification Accuracy (CA): This is the prediction accuracy over the test set. We assume that for each test pattern x_i, i = l + 1, . . . , n, its actual label y_i is available. Let the predicted label for x_i be y′_i. Then the classification accuracy is

CA = ( (1/u) Σ_{i=l+1}^{n} (1 − L(y_i, y′_i)) ) × 100%.   (2)
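To make these definitions concrete, the following Python/NumPy sketch (not from the paper; the function names and the toy data are illustrative assumptions) computes the LOOCV error of the 1-nearest-neighbor classifier over a labeled-up data-set D = L ∪ U′ and the classification accuracy of Eq. (2) over the test portion.

```python
import numpy as np

def loocv_error_1nn(X, y):
    """LOOCV error of the 1-NN classifier over (X, y): each pattern is
    classified by its nearest neighbor among the remaining patterns."""
    n = len(X)
    errors = 0
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)   # distances to all patterns
        d[i] = np.inf                           # exclude the pattern itself
        j = np.argmin(d)                        # nearest neighbor of x_i
        errors += int(y[j] != y[i])             # 0-1 loss of Eq. (1)
    return errors / n

def classification_accuracy(y_true, y_pred):
    """Prediction accuracy (%) over the test set, Eq. (2)."""
    return 100.0 * np.mean(np.asarray(y_true) == np.asarray(y_pred))

# Toy usage: four labeled patterns plus two test patterns with predicted labels.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.2, 0.1], [4.9, 5.1]])
y = np.array([+1, +1, -1, -1, +1, -1])          # last two entries are predicted labels y'
print(loocv_error_1nn(X, y))                    # 0.0 for this labeling
print(classification_accuracy([+1, -1], y[4:])) # 100.0
```

The brute-force leave-one-out loop above costs O(n²) distance evaluations, which is adequate for the data-set sizes considered later in the paper.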

III. DRAWBACKS OF LOOCV ERROR BASED SOLUTIONS

LOOCV error is often seen as an unbiased estimate of the error-rate of an inductive learner [9]. LOOCV-error(f, L ∪ U′) is a measure of the goodness of the predictions given to the test patterns with respect to the classifier f. So, one way of doing transductive inference is to assign labels to the test patterns such that LOOCV-error(f, L ∪ U′) is minimized for a chosen f. When f is the nearest neighbor classifier (or the k-nearest neighbor classifier), a solution produced through graph mincuts achieves the least LOOCV error [7][8][9]. First, the solutions proposed in [7], [8] and [9] are briefly described, followed by a formal explanation of the drawbacks of LOOCV error based solutions when the associated inductive learner is a nearest neighbor based one.

A. Graph mincut and Spectral graph theory based solutions

A nearest neighbor graph (or k-nearest neighbor graph) G = (V, E) is an undirected graph, where V is the set of patterns, which correspond to the nodes of the graph, and an edge between a pair of patterns is given a weight (w) equal to the similarity between them. The set of edges is E ⊆ V × V. If the similarity between a pair is less than or equal to a threshold, then the corresponding edge does not exist in the graph. For nearest neighbor based graphs, this threshold is often set to zero.

In [7], for theoretical analysis, the similarity measure used is as follows. Two special nodes called classification nodes, v+ and v−, are added to V. For all x in L+, w(x, v+) = ∞; similarly, for all x in L−, w(x, v−) = ∞. For the other patterns the following neighborhood relation is used, where x and y are any patterns:

nn(x, y) = 1 if y is the nearest neighbor of x, and 0 otherwise.   (3)

Then, the similarity weight between x and y is w(x, y) = nn(x, y) + nn(y, x).


A cut partitions V into V+ and V−. The cost of this cut is

cut(V+, V−) = Σ_{x ∈ V+, y ∈ V−} w(x, y).   (4)

It is theoretically shown that, if we find a minimum cut and label all patterns in V+ with +1 and all patterns in V− with −1, then the LOOCV error of the nearest neighbor classifier over the entire data-set will be zero.
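As an illustration, the Python/NumPy sketch below (our own naming, not the authors' implementation) builds the mutual nearest-neighbor weights w(x, y) = nn(x, y) + nn(y, x) of Eq. (3) and evaluates the cut cost of Eq. (4) for a candidate ±1 labeling; keeping the training patterns' labels fixed plays the role of the infinite-weight edges to v+ and v−.

```python
import numpy as np

def nn_weights(X):
    """w(x, y) = nn(x, y) + nn(y, x): weight 2 for mutual nearest neighbors,
    1 when only one of the two points is the other's nearest neighbor."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    nn = np.zeros((n, n))
    nn[np.arange(n), D.argmin(axis=1)] = 1      # nn(x, y) indicator of Eq. (3)
    return nn + nn.T                            # symmetric weights w(x, y)

def cut_cost(W, labels):
    """cut(V+, V-) of Eq. (4): sum of w(x, y) over oppositely labeled pairs."""
    labels = np.asarray(labels)
    opposite = labels[:, None] != labels[None, :]
    return W[opposite].sum() / 2.0              # each unordered pair counted once

# Candidate labeling: training patterns keep their labels, test patterns get
# tentative +1/-1 assignments.
X = np.array([[0.0, 0.0], [0.2, 0.0], [3.0, 3.0], [3.2, 3.0]])
labels = np.array([+1, +1, -1, -1])
print(cut_cost(nn_weights(X), labels))          # 0.0: no cross-class NN edges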

Practically, this formulation can admit several solutions having zero cut value. Also, solutions are often found in which very few test patterns are labeled with one class-label while the rest get the opposite class-label. To overcome these issues, the practical solutions given are based on Mincut-δ, where a graph is constructed such that two points are connected with an edge if they are closer than δ. Further variations of this are:
• Mincut-δ0: the value of δ chosen is the maximum value such that the graph has a cut of value 0.
• Mincut-δ1/2: the value of δ chosen is such that the largest connected component in the graph has size equal to half the number of data points.
• Mincut-δopt: the value of δ chosen corresponds to the least classification error in hindsight. Experimentally, this is shown to give good results.

The solution given in [8] is an extension of the above. Several mincuts are found by producing several graphs obtained by adding some random noise to the edges. Among these, unbalanced solutions are discarded. From the remaining solutions, a labeling is given to the test patterns according to the majority consensus. This technique is called the randomized mincuts solution.
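A rough sketch of the randomized mincut idea follows, using the s-t minimum cut routine from networkx; the perturbation scale, the balance threshold and all helper names are illustrative assumptions rather than the settings of [8], and a very large finite capacity stands in for the infinite weights to the classification nodes.

```python
import numpy as np
import networkx as nx

def st_mincut_labeling(W, labeled_plus, labeled_minus, noise=0.0, rng=None):
    """One mincut labeling: perturb edge weights, tie labeled nodes to the
    classification nodes v+/v- with very large capacities, cut between them."""
    rng = rng or np.random.default_rng()
    n = len(W)
    G = nx.DiGraph()
    for i in range(n):
        for j in range(i + 1, n):
            if W[i][j] > 0:
                c = W[i][j] + noise * rng.random()   # randomly perturbed weight
                G.add_edge(i, j, capacity=c)
                G.add_edge(j, i, capacity=c)
    BIG = 1e9                                        # stands in for the infinite weights
    for i in labeled_plus:                           # assumed non-empty
        G.add_edge('v+', i, capacity=BIG)
        G.add_edge(i, 'v+', capacity=BIG)
    for i in labeled_minus:                          # assumed non-empty
        G.add_edge(i, 'v-', capacity=BIG)
        G.add_edge('v-', i, capacity=BIG)
    _, (side_plus, _) = nx.minimum_cut(G, 'v+', 'v-')
    return np.array([+1 if i in side_plus else -1 for i in range(n)])

def randomized_mincut(W, labeled_plus, labeled_minus, trials=20, balance=0.1):
    """Repeat the perturbed mincut, drop very unbalanced labelings,
    and take the per-pattern majority vote of the remaining ones."""
    votes = []
    for _ in range(trials):
        lab = st_mincut_labeling(W, labeled_plus, labeled_minus, noise=0.5)
        frac = (lab == +1).mean()
        if balance <= frac <= 1 - balance:           # discard unbalanced cuts
            votes.append(lab)
    if not votes:                                    # fall back to a plain mincut
        return st_mincut_labeling(W, labeled_plus, labeled_minus)
    consensus = np.sum(votes, axis=0)
    return np.where(consensus >= 0, +1, -1)          # majority vote (ties -> +1)
```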

In [9] a k-nearest neighbor graph is constructed, where

w(x, y) = sim(x, y) / Σ_{z ∈ knn(x)} sim(x, z)  if y ∈ knn(x), and 0 otherwise,   (5)

where sim(x, y) is the similarity between x and y, and knn(x) is the set of k nearest neighbors of x. A distance based similarity such as sim(x, y) = 1/||x − y|| or sim(x, y) = −||x − y|| can be used. The graph is represented as an adjacency matrix A with the corresponding w(x, y) values. The normalized Laplacian of A is used for spectral clustering, and based on the clustering result, labels are given to the test patterns.
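The sketch below is a simplified NumPy rendering of this idea (not the exact procedure of [9]): it builds the row-normalized k-NN weights of Eq. (5) with sim(x, y) = 1/||x − y||, symmetrizes them, forms the normalized Laplacian, and splits the data on the sign of the second-smallest eigenvector; the two sides are then aligned with the majority of the training labels. All names are ours.

```python
import numpy as np

def knn_weights(X, k=3):
    """Row-normalized k-NN weights of Eq. (5), with sim(x, y) = 1/||x - y||."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D[i])[:k]                 # k nearest neighbors of x_i
        sims = 1.0 / D[i, nbrs]
        W[i, nbrs] = sims / sims.sum()              # each row sums to one
    return W

def spectral_partition(W):
    """Sign of the second-smallest eigenvector of the normalized Laplacian."""
    A = (W + W.T) / 2.0                             # symmetrize the weights
    d = A.sum(axis=1)                               # > 0 since rows of W sum to one
    Dinv = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(A)) - Dinv @ A @ Dinv            # normalized Laplacian
    vals, vecs = np.linalg.eigh(L)                  # eigenvalues in ascending order
    return np.where(vecs[:, 1] >= 0, +1, -1)        # split on the Fiedler vector

def label_by_majority(partition, train_idx, train_labels):
    """Give each side of the partition the majority label of its training patterns."""
    agree = (partition[train_idx] == train_labels).sum()
    return partition if agree >= len(train_idx) / 2 else -partition
```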

Mincut-δ based solutions given in [7] and in [8] are NP-hard, since these are equivalent to finding a normalized graph mincut (the ratio cut of the graph) [9]. But spectral graph partitioning can be done in O(n³) time.

B. LOOCV error for k-nearest neighbor classifier

First, a clustering of the data-set L ∪ U (the class-labels present in L are ignored while clustering), called k-nn consistent clustering, is defined. Based on this clustering, a formal way of labeling the unlabeled data is described. We show that this labeling has zero LOOCV error with respect to the k-nearest neighbor classifier.

k-nn consistent clustering: For a given positive integer k, a clustering π = {C1, C2, . . . , Cm} of the data-set D = L ∪ U is said to be k-nn consistent if each cluster satisfies the following three conditions.
1) At least k + 1 patterns are present in each cluster.
2) All k nearest neighbors of a pattern x (excluding x) are contained in the same cluster as x, for every x ∈ D.
3) In any cluster, if there are labeled patterns, then the class-labels of all these patterns are the same.

[Fig. 1. An example of 3-nn consistent clustering: two-dimensional data (Feature 1 vs. Feature 2) grouped into clusters C1, C2, C3 and C4; each pattern is either labeled +1, labeled −1, or unlabeled (*).]

For small values of k, finding a k-nn consistent clustering is quite easy; for larger values of k, a k-nn consistent clustering may not exist at all. When k = 1, a k-nn consistent clustering can be obtained through the well known single-link clustering method [11]: the nearest clusters are merged progressively, and the merging can be stopped when every cluster has at least 2 patterns and no cluster contains oppositely labeled patterns. A general procedure is:
1) For each pattern x ∈ D, find the group consisting of x and its k nearest neighbors (excluding x itself), i.e., k + 1 patterns in total. Call these subsets groups.
2) Merge two groups if they are not disjoint. Repeat this merging until all resulting groups are disjoint.
3) If any group has labeled patterns with two different labels, then output "k-nn consistent clustering is not possible". Otherwise, output the final groups as the k-nn consistent clustering.

An example of 3-nn consistent clustering for two-dimensional data is given in Fig. 1.

k-nn consistent labeling: Let π = {C1, C2, . . . , Cm} be a k-nn consistent clustering. For clusters in which labeled patterns are present, label the remaining unlabeled patterns with the same label as the labeled patterns. For the other clusters, in which all patterns are unlabeled, assign either +1 or −1 to all patterns.
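A minimal Python sketch of the clustering procedure above and of k-nn consistent labeling is given below (the merge-by-overlap step is written naively for clarity; all names are ours, and labels are encoded as ±1 for training patterns and None for test patterns).

```python
import numpy as np

def knn_consistent_clustering(X, labels, k):
    """Groups = each pattern with its k nearest neighbors; merge overlapping
    groups; fail if a final group contains two different class labels."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    groups = [set(np.argsort(D[i])[:k]) | {i} for i in range(n)]  # k+1 patterns each
    merged = True
    while merged:                                   # naively merge intersecting groups
        merged = False
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                if groups[a] & groups[b]:
                    groups[a] |= groups.pop(b)
                    merged = True
                    break
            if merged:
                break
    for g in groups:                                # check label consistency
        seen = {labels[i] for i in g if labels[i] is not None}
        if len(seen) > 1:
            return None                             # "k-nn consistent clustering is not possible"
    return groups

def knn_consistent_labeling(groups, labels, default=+1):
    """Propagate a cluster's label to its unlabeled members; fully unlabeled
    clusters get the default label (Theorem 3.1 allows either +1 or -1)."""
    out = list(labels)
    for g in groups:
        seen = [labels[i] for i in g if labels[i] is not None]
        lab = seen[0] if seen else default
        for i in g:
            out[i] = lab
    return out
```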

Theorem 3.1: If labels are given to the unlabeled patterns according to a k-nn consistent labeling, then the LOOCV error over the entire data-set with respect to the k-nearest neighbor classifier is zero.
Proof: The k nearest neighbors of each pattern x (excluding x itself) are in the same cluster as x. Since the labeling within a cluster is uniform, the predicted label of x is the same as its assigned label. Hence, the loss associated with x is zero. This is true for all patterns, hence the total loss is zero.


[Fig. 2. A 3-nn consistent labeling for the data shown in Fig. 1 having zero LOOCV error.]

[Fig. 3. A desirable 3-nn consistent labeling for the data shown in Fig. 1 having zero LOOCV error.]

Two 3-nn consistent labellings of the example in Fig. 1 are shown in Fig. 2 and Fig. 3. Note that for both labellings the LOOCV error with respect to the 3-nearest neighbor classifier is zero. The labeling shown in Fig. 2 is not a desirable one, while the one shown in Fig. 3 is. For this example, there are two more solutions with zero LOOCV error, in which all patterns in C2 and C3 are labeled with either +1 or with −1. An LOOCV error based method can produce any of these solutions, either randomly or based on the order in which patterns are processed. This is clearly a drawback of LOOCV error based labeling when the inductive classifier is a nearest neighbor based one. This kind of drawback is not present, for example, when LOOCV error with respect to SVMs is used to guide the labeling. Nearest neighbor based methods are highly localized, hence LOOCV error can be a misleading criterion for transductive learning.

IV. A SELECTIVE INCREMENTAL TRANSDUCTIVE NEAREST NEIGHBOR CLASSIFICATION (SI-TNNC)

This section presents the proposed transductive nearest neighbor classifier. First, the motivation (or intuition) behind the transductive labeling scheme is informally explained, followed by a detailed formal description.

The nearest neighbor classifier (NNC) classifies a given test pattern according to its nearest neighbor in the training set. This can equivalently be done as follows: the nearest neighbor in each class of training patterns is found separately, and then the overall nearest neighbor is taken. Let x be the test pattern, let its nearest neighbor's distance in L+ be d+(x), and in L− be d−(x). Then the class-label assigned to x is y′ = +1 if d+(x) < d−(x), and y′ = −1 otherwise. One way to measure the goodness (γ) of this assignment is

γ(x, y′) = (d−(x) − d+(x)) if y′ = +1, and (d+(x) − d−(x)) if y′ = −1.   (6)

This can be simplified as

γ(x, y′) = y′ (d−(x) − d+(x)).   (7)

This goodness measure γ(x, y′) is called the margin of x with respect to L, in analogy with the functional margin of a hyper-plane used in learning an SVM [12]. Intuitively, if the margin is large, the confidence in that label assignment is high. If labeling has to be done for the test set U collectively, then, supposing we assign a labeling to all patterns in U to get U′ = {(x_{l+1}, y′_{l+1}), . . . , (x_n, y′_n)}, the goodness of this labeling is

Γ(L ∪ U′) = min { y′_i (d−(x_i) − d+(x_i)) : i = 1, . . . , n }.   (8)

Here, for a training pattern x_i, y′_i = y_i, i.e., the label available in the training set itself is used. Further, the nearest neighbors of x_i are found from (L ∪ U′) − {x_i}; that is, excluding x_i, its nearest neighbor in L ∪ U′ with label +1 and its nearest neighbor with label −1 are found.
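To make the criterion concrete, the following NumPy sketch (names ours) computes Γ(L ∪ U′) of Eq. (8) for a candidate labeling: for every pattern, the nearest +1-labeled and nearest −1-labeled neighbors are taken from L ∪ U′ excluding the pattern itself, and the smallest signed margin is returned. A candidate labeling with a larger value of this quantity is preferred.

```python
import numpy as np

def margin_of_labeling(X, labels):
    """Gamma(L u U') of Eq. (8): the minimum of y'_i (d_-(x_i) - d_+(x_i)),
    with neighbors taken from the whole labeled-up data-set (both classes
    are assumed to be present in the labeling)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)                     # exclude the pattern itself
    margins = []
    for i in range(len(X)):
        d_plus = D[i, labels == +1].min()           # nearest neighbor with label +1
        d_minus = D[i, labels == -1].min()          # nearest neighbor with label -1
        margins.append(labels[i] * (d_minus - d_plus))
    return min(margins)
```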

Now, a labeling is given to the test data in such a way that Γ(L ∪ U′) is maximized. Intuitively, this can be related to SVM learning: SVMs find a hyper-plane, either in the feature space or in a kernel induced space, such that the margin is maximized [12]. Formally, the problem is to find a labeling of U such that

U′ = arg max_{U′} Γ(L ∪ U′).   (9)

This optimization problem is a discrete one, and hence gradient-descent style procedures are not feasible. Evolutionary techniques, like genetic algorithms, can be used, but they take excessive amounts of time. Instead, the paper proposes a heuristic based approximate technique: an iterative, incremental labeling procedure. For each pattern x ∈ U a score s(x) = |d+(x) − d−(x)| is computed; here only L is used to find the nearest neighbors of x in both classes. The pattern with the highest score is added to L along with its nearest neighbor's label. The process is repeated until all patterns in the test set, along with their nearest neighbors' labels, have been added to L. This method is called selective incremental transductive nearest neighbor classification (SI-TNNC), and is described formally in Algorithm 1.

A. The time complexity analysis of SI-TNNC

The training set size is |L| = l, the test set size is |U| = u, and l + u = n is the total data-set size (refer to Section II for the notation used). The outer while loop is executed u times. The time taken by each iteration of the outer while loop depends on the current training set size and the current test set size: after each iteration the training set size increases by 1 and the test set size decreases by 1. At the k-th iteration of the while loop, the time taken by the inner for loop is proportional to (l + k)(u − k). So the total time taken by Algorithm 1 is proportional to

lu + (l + 1)(u − 1) + (l + 2)(u − 2) + · · · + (l + u − 1)(1).

Each of the u terms is at most nu, so the time complexity is O(nu²). In the worst case, l = u = n/2, the time complexity of SI-TNNC is O(n³). An improved algorithm, which produces the same result as SI-TNNC but in O(n²) worst-case time, is presented below; it is called the Improved SI-TNNC method.


Algorithm 1 SI-TNNC(L, U)
while U ≠ ∅ do
  for all x ∈ U do
    Find d+(x) = distance between x and its nearest neighbor in L+;
    Find d−(x) = distance between x and its nearest neighbor in L−;
    Find s(x) = |d+(x) − d−(x)|;
  end for
  Find x ∈ U having maximum s(x);
  if d+(x) < d−(x) then
    y′ = +1;
  else
    y′ = −1;
  end if
  L = L ∪ {(x, y′)};
  U = U − {x};
end while
Output U′ = {(x_{l+1}, y′_{l+1}), . . . , (x_n, y′_n)}, which is now a subset of L;

B. Improved SI-TNNC method

For each test pattern x, we store the pair (d+(x), d−(x)) with respect to the current training set L, called the distance pair of x. Let z be the next test pattern added to the training set, and let z be assigned the class-label y_z. Then L is updated to L = L ∪ {(z, y_z)}. Now, to update the distance pair of x there is no need to consider all patterns in L; considering (z, y_z) alone is enough. Let the updated distance pair of x be (d+^new(x), d−^new(x)); it is obtained as follows:

d+^new(x) = min(d+(x), ||x − z||) if y_z = +1, and d+^new(x) = d+(x) if y_z = −1.   (10)

d−^new(x) = d−(x) if y_z = +1, and d−^new(x) = min(d−(x), ||x − z||) if y_z = −1.   (11)

Algorithm 2 describes the Improved SI-TNNC method, which is based on the above observation. It is easy to see that Algorithm 2 produces the same result as Algorithm 1.

C. The time complexity analysis of Improved-SI-TNNC

The time taken by Algorithm 2 is proportional to

lu + (u − 1) + (u − 2) + · · · + 1 = lu + u(u − 1)/2.

Hence the time complexity is O(lu + u²). In the worst case, when l = u = n/2, the time complexity is O(n²).

V. EXPERIMENTAL RESULTS

Experiments are conducted with five standard data-sets drawn from the UCI Machine Learning Repository [13]. Properties of the data-sets, along with the distance function used for each, are shown in Table I. Note that the same data-sets are used in [7] and [8].

Algorithm 2 Improved-SI-TNNC(L, U)
U′ = ∅;
for all x ∈ U do
  Find d+(x) = distance between x and its nearest neighbor in L+;
  Find d−(x) = distance between x and its nearest neighbor in L−;
  Store distance-pair(x) = (d+(x), d−(x));
end for
while U ≠ ∅ do
  Find z ∈ U such that s(z) = |d+(z) − d−(z)| is maximum;
  if d+(z) < d−(z) then
    y_z = +1;
  else
    y_z = −1;
  end if
  U′ = U′ ∪ {(z, y_z)};
  U = U − {z};
  for all x ∈ U do
    if y_z = +1 then
      d+(x) = min(d+(x), ||x − z||);
    else
      d−(x) = min(d−(x), ||x − z||);
    end if
  end for
end while
Output U′;
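A direct Python/NumPy rendering of Algorithm 2 is sketched below; it is our reading of the pseudocode (ties between scores and the choice of distance function are left as simple defaults), not code from the paper.

```python
import numpy as np

def improved_si_tnnc(X_train, y_train, X_test):
    """Improved SI-TNNC: repeatedly pick the test pattern whose distances to the
    two classes differ the most, give it its nearest neighbor's label, and update
    the stored distance pairs (d+, d-) using only the newly added pattern."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)

    def dist_to_class(x, label):
        members = X_train[y_train == label]
        return np.linalg.norm(members - x, axis=1).min()

    # Initial distance pairs with respect to the given training set L.
    d_plus = np.array([dist_to_class(x, +1) for x in X_test])
    d_minus = np.array([dist_to_class(x, -1) for x in X_test])

    remaining = list(range(len(X_test)))
    y_pred = np.zeros(len(X_test), dtype=int)

    while remaining:
        # Select the pattern with the largest score s(z) = |d+(z) - d-(z)|.
        z = max(remaining, key=lambda i: abs(d_plus[i] - d_minus[i]))
        y_z = +1 if d_plus[z] < d_minus[z] else -1
        y_pred[z] = y_z
        remaining.remove(z)
        # Update the distance pairs of the still-unlabeled patterns using z only.
        for i in remaining:
            d = np.linalg.norm(X_test[i] - X_test[z])
            if y_z == +1:
                d_plus[i] = min(d_plus[i], d)
            else:
                d_minus[i] = min(d_minus[i], d)
    return y_pred
```

Each iteration touches only the still-unlabeled test patterns, which is what gives the O(lu + u²) bound of Section IV-C.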

TABLE I
DETAILS OF DATA-SETS USED

Data-set   Number of Features   |L|   |U|    Distance function
VOTING     16                   45    390    Jaccard Coefficient
MUSH       22                   20    1000   Simple Matching
IONO       34                   50    300    Euclidean
BUPA       6                    45    300    Euclidean
PIMA       8                    50    718    Euclidean

The classifiers used for comparison are: (i) graph mincut-δopt [7] (a transductive classifier), (ii) randomized graph mincut [8] (a transductive classifier), (iii) spectral graph partitioning [9] (a transductive classifier), (iv) ID3 (a decision tree based classifier, an inductive classifier) [14][15], (v) 3-NNC (the 3-nearest neighbor classifier, an inductive classifier) [16], and (vi) SI-TNNC (the proposed method of this paper, a transductive classifier). The classifiers are chosen so as to compare against other transductive methods that are similar to the proposed method; two well known inductive classifiers, viz., ID3 and 3-NNC, are also included for comparison. The classification accuracies obtained are reported in Table II. It can be seen that the proposed SI-TNNC method is comparable with the other classifiers and in some cases shows better results.


TABLE II
CA (%) FOR VARIOUS CLASSIFIERS

Data-set   Mincut-δopt   Rand. Graph Mincut   Spectral Graph Partit.   ID3    3-NNC   SI-TNNC
VOTING     91.3          91.2                 85.9                     86.4   89.6    93.5
MUSH       97.7          94.2                 91.6                     93.3   91.1    96.7
IONO       81.6          82.8                 79.7                     88.6   69.6    82.67
BUPA       59.3          63.5                 61.6                     55.3   52.7    64.67
PIMA       72.3          67.5                 68.2                     70.0   68.1    72.70

[Fig. 4. Classification Accuracy (%) vs. Training Set Size (|L|) for MUSH data; curves shown for Mincut, Rand. Graph Mincut, Spectral Graph Part., ID3, 3-NNC and SI-TNNC.]

A plot comparing classification accuracy with training set size for the MUSH (Mushroom) data is shown in Fig. 4. It can be seen that the SI-TNNC method shows consistent improvement as the training set size increases, and for some smaller training set sizes SI-TNNC performs better than all the other classifiers.

It is worth noting that a mincut of a graph (the s-t mincut) can be found in O(n³) time [17], and several standard techniques for this are available. But this can give a degenerate solution having a highly imbalanced labeling. To overcome this problem a normalized graph mincut is used (variations of such solutions are given in [7], [8]), which is NP-hard and hence feasible only for small data-sets. A randomized algorithm can solve the s-t mincut problem in O(n² log n) time using O(n²) space, finding a solution with high probability [18], where n is the data-set size. Spectral graph partitioning can be done in O(n³) time. The proposed method of this paper, viz., SI-TNNC, whose improved version is given in Algorithm 2, takes O(n²) time.

VI. CONCLUSION AND FUTURE WORK

Nearest neighbor based transductive learners optimize the leave-one-out cross-validation (LOOCV) error; these methods are realized by graph mincut or by spectral graph partitioning. But LOOCV error with nearest neighbor based classifiers has some drawbacks. It is shown formally that various labellings of the test set are possible which have zero LOOCV error, and some of these solutions can have low classification accuracy. The paper proposed a transductive learner called the selective incremental transductive nearest neighbor classifier (SI-TNNC) to label the test patterns. The method is an approximate one which maximizes a margin-like criterion. The proposed method has quadratic time complexity, whereas normalized graph mincut methods are either NP-hard or have cubic time complexity (with the spectral graph partitioning method). Experimentally, on a few standard data-sets, the proposed method is shown to have on-par or better classification accuracy than its competitors.

Future work is to devise an algorithm to find an exact solution maximizing the criterion proposed in this paper.

ACKNOWLEDGMENTS

The work is partly funded by an AICTE Project under RPS Scheme with reference: "F.No: 8023/BOR/RID/RPS-51/2009-10".

REFERENCES

[1] O. Chapelle, B. Scholkopf, and A. Zien, Semi-Supervised Learning. Cambridge, Massachusetts: The MIT Press, 2006.
[2] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Scholkopf, "Learning with local and global consistency," in Advances in Neural Information Processing Systems, S. Thrun, L. Saul, and B. Scholkopf, Eds., vol. 16. Cambridge, MA: The MIT Press, 2004, pp. 321–328.
[3] V. Vapnik, Statistical Learning Theory. New York: John Wiley & Sons, A Wiley-Interscience Publication, 1998.
[4] V. Vapnik, Estimation of Dependences Based on Empirical Data, 2nd ed. New York: Springer Series in Statistics, Springer-Verlag, 2006.
[5] K. Bennett, "Combining support vector and mathematical programming methods for classification," in Advances in Kernel Methods - Support Vector Learning, B. Scholkopf et al., Eds. MIT Press, 1999.
[6] T. Joachims, "Transductive inference for text classification using support vector machines," in Sixteenth International Conference on Machine Learning. Bled, Slovenia: Morgan Kaufmann, 1999, pp. 200–209.
[7] A. Blum and S. Chawla, "Learning from labeled and unlabeled data using graph mincuts," in Eighteenth International Conference on Machine Learning. Morgan Kaufmann, 2001, pp. 19–26.
[8] A. Blum, J. Lafferty, M. Rwebangira, and R. Reddy, "Semi-supervised learning using randomized mincuts," in International Conference on Machine Learning. Morgan Kaufmann, 2004.
[9] T. Joachims, "Transductive learning via spectral graph partitioning," in International Conference on Machine Learning, 2003, pp. 290–297.
[10] X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," in 20th International Conference on Machine Learning, 2003, pp. 912–919.
[11] A. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: A review," ACM Computing Surveys, vol. 31, no. 3, pp. 264–323, 1999.
[12] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, 1st ed. Cambridge University Press, 2000.
[13] P. M. Murphy, UCI Repository of Machine Learning Databases [http://www.ics.uci.edu/mlearn/MLRepository.html], Department of Information and Computer Science, University of California, Irvine, CA, 2000.
[14] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. John Wiley & Sons, A Wiley-Interscience Publication, 2000.
[15] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Academic Press, 2001.
[16] B. V. Dasarathy, "Data mining tasks and methods: Classification: Nearest-neighbor approaches," in Handbook of Data Mining and Knowledge Discovery. New York: Oxford University Press, 2002, pp. 288–298.
[17] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms. Cambridge, MA, USA: The MIT Press, 1990.
[18] R. Motwani and P. Raghavan, Randomized Algorithms. Cambridge, UK: Cambridge University Press, 1995.
