doi:10.1016/j.neucom.2007.11.008
*Corresponding author. E-mail address: [email protected] (X. Liu).
Neurocomputing 71 (2008) 1735–1740
www.elsevier.com/locate/neucom
Letters
A new fuzzy approach for handling class labels in canonical correlation analysis
Yanyan Liu, Xiuping Liu*, Zhixun Su
Department of Applied Mathematics, Dalian University of Technology, Dalian 116024, China
Received 25 April 2007; received in revised form 25 November 2007; accepted 30 November 2007
Communicated by S. Mitra
Available online 8 February 2008
Abstract
Canonical correlation analysis (CCA) can extract more discriminative features by utilizing class labels, especially the ones that can
reflect the sample distribution appropriately. In this paper, a new fuzzy approach for handling class labels in the form of fuzzy
membership degrees is proposed. We elaborately design a novel fuzzy membership function to represent the distribution of image
samples. These fuzzy class labels improve the classification performance of CCA and kernel CCA (KCCA) by incorporating
distribution information into the process of feature extraction. Comprehensive experimental results on face recognition demonstrate the
effectiveness and feasibility of the proposed method.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Feature extraction; Fuzzy membership degree; Sample distribution; Kernel methods; Face recognition
1. Introduction
Canonical correlation analysis (CCA) was initially proposed as a multivariate analysis method by Hotelling [6] for correlating linear relationships between two sets of variables. Recently, CCA has gained much attention in the fields of image analysis [4] and pattern recognition [5,11,12]. When one set of variables is taken as samples and the other as the corresponding class labels, CCA can be used for supervised feature extraction; in particular, if these labels are binary vectors, CCA is equivalent to Fisher linear discriminant analysis (FLDA) [1,5,12]. These binary vectors, in which a single component is set to one to denote the correct class and all the other components are set to zero, encode a {0, 1} class assignment, i.e. each sample fully belongs to a unique class. However, image samples, such as face images, are significantly affected by numerous environmental conditions. These influences may blur the boundaries between classes and make some samples locate in or near overlapping regions
among classes. This characteristic of the sample distribution is not considered in binary label vectors, which results in the loss of useful information for classification.

To overcome this shortcoming of CCA methods based on
binary labels, the information of the sample distribution should be incorporated into the process of feature extraction. Since the distribution information is not provided explicitly by the training images, how to represent it appropriately in the form of numerical values is a key problem. It can be seen as the problem of transforming categorical variables into interval measures by measurement transformations in multivariate statistics [14]. Such transformations are subject to some measurement restraints and can be obtained by an alternating least squares algorithm [14]. We do not focus on discussing these restraints or proposing new iterative algorithms, but rather on constructing an appropriate function of class labels that reflects the sample distribution directly. Fuzzy set theory is a natural choice here, and the fuzzy k-nearest neighbor (FKNN) method has been used to yield class labels by utilizing neighborhood information [7,12]. In this paper, we design a novel fuzzy membership function for handling fuzzy class labels to represent the distribution of image samples. We
anticipate that CCA incorporating fuzzy class labels, referred to as fuzzy label CCA below, can improve classification performance. Furthermore, modified fuzzy class labels via the kernel trick [10], corresponding to KCCA [4], are also proposed to obtain nonlinear discriminative features. Comprehensive experimental results on face databases demonstrate the effectiveness of fuzzy label based CCA and KCCA.
The rest of this paper is organized as follows. A brief review of CCA and KCCA is given in Section 2. In Section 3 we present the fuzzy approach for handling class labels, and then incorporate the resulting labels into CCA and KCCA. Experimental results are presented in Section 4. In Section 5 the conclusions and future work are discussed.
2. Overview of CCA and KCCA
CCA can be defined as the problem of finding basis vectors for two sets of variables such that the correlation between the projections of the variables onto these basis vectors is maximized [4]. More formally, consider two multidimensional variables $x$ and $y$ with zero mean, and suppose that $\{x_i\}_{i=1,\dots,N}$ and $\{y_i\}_{i=1,\dots,N}$ are $N$ observations of them, respectively. The goal of CCA is then to find pairs of basis vectors $a$ and $b$ that maximize
$$\rho = \frac{a^{\mathrm T} X Y^{\mathrm T} b}{\sqrt{a^{\mathrm T} X X^{\mathrm T} a \cdot b^{\mathrm T} Y Y^{\mathrm T} b}}, \qquad (1)$$

where $X = [x_1, \dots, x_N]$, $Y = [y_1, \dots, y_N]$, and the superscript $\mathrm T$ denotes transpose. Solving this optimization problem, we obtain the following generalized eigenvalue equations [4,12]:

$$X Y^{\mathrm T} (Y Y^{\mathrm T})^{-1} Y X^{\mathrm T} a = \lambda X X^{\mathrm T} a, \qquad Y X^{\mathrm T} (X X^{\mathrm T})^{-1} X Y^{\mathrm T} b = \lambda Y Y^{\mathrm T} b, \qquad (2)$$
where the eigenvalue $\lambda$ equals $\rho^2$. Generally, only the first equation of (2) needs to be solved for subsequent feature extraction. If $X X^{\mathrm T}$ is invertible, the generalized eigenproblem can be converted into a standard eigenproblem. However, in pattern recognition the small sample size problem often occurs and makes $X X^{\mathrm T}$ singular. To address this, we perform the PCA algorithm on the original data before CCA for dimension reduction [11]. This process does not lose any discriminative information, which is an advantage over FLDA [5].
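As an illustration of this PCA-plus-CCA recipe, the following NumPy sketch solves the first equation of (2) after PCA, with a small ridge added for numerical safety (our addition, anticipating the regularization used in Section 3.2); the function name `cca_directions` is ours, not the paper's.

```python
import numpy as np

def cca_directions(X, Y, d, mu=1e-6):
    """Solve the first generalized eigenproblem of Eq. (2) for the top-d
    directions a, after PCA on X (Section 2).

    X: p x N zero-mean data matrix, Y: c x N zero-mean label matrix.
    mu is a small ridge term (cf. the regularization in Eq. (6)).
    """
    # PCA: keep eigenvectors of X X^T with nonzero eigenvalues, so that
    # P^T X X^T P is invertible (the small-sample-size fix).
    e, V = np.linalg.eigh(X @ X.T)
    P = V[:, e > 1e-10 * e.max()]        # p x r, r = rank(X X^T)
    Xr = P.T @ X                         # r x N, PCA-reduced data

    Sxx = Xr @ Xr.T + mu * np.eye(Xr.shape[0])
    Syy = Y @ Y.T + mu * np.eye(Y.shape[0])
    Sxy = Xr @ Y.T

    # X Y^T (Y Y^T)^{-1} Y X^T a = lambda X X^T a, in the reduced space.
    M = np.linalg.solve(Sxx, Sxy @ np.linalg.solve(Syy, Sxy.T))
    lam, A = np.linalg.eig(M)
    order = np.argsort(-lam.real)[:d]
    return P, A[:, order].real           # features: A^T P^T x, cf. Eq. (7)
```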
Because of its linearity, CCA may fail to extract useful descriptors of the data. As a nonlinear extension of CCA via the kernel trick, KCCA offers a solution by first implicitly mapping the data into a higher-dimensional feature space $F$,

$$\phi: x \to \phi(x), \qquad X \to X_\phi = [\phi(x_1), \dots, \phi(x_N)], \qquad (3)$$

and then performing CCA in $F$ [4]. Different from the derivation in [4], where KCCA was formulated as a generalized eigenproblem, in this paper we take an alternative, natural approach to obtain a standard eigenproblem corresponding to KCCA: we perform the CCA algorithm on $X_\phi$ and $Y_\phi$ directly, as described in the previous paragraph, and utilize PCA to reduce the dimension of $\phi(x)$; the details are presented in Section 3.2.
3. Fuzzy class label based CCA and KCCA
By correlating samples with appropriate class labels that represent the sample distribution, CCA can extract combined features that fuse gray-level and distribution information. But what kind of class labels can represent the distribution information, at least approximately? As environmental effects blur the boundaries between classes, an image sample may be related to every class. These relationships can be represented by fuzzy membership degrees. Therefore, we define a novel fuzzy membership function for handling class labels to reflect the distribution of image samples.
3.1. Fuzzy label approach
Let $\Omega = \{x_{ij} \in \mathbb{R}^p;\ i = 1, \dots, c;\ j = 1, \dots, N_i\}$ be a sample set with $N$ elements, where $c$ is the number of image classes, $N_i$ is the number of samples in the $i$th class and $x_{ij}$ denotes the $j$th sample in the $i$th class. Evidently, $\Omega$ contains the gray-level information of the images. Suppose $\Omega_k = \{x_{kj} \in \Omega;\ j = 1, \dots, N_k\}$ is the sample set of the $k$th class. Let $d^k_{ij} = \|x_{ij} - m_k\|$ be the Euclidean distance between $x_{ij}$ and the mean of $\Omega_k$ ($k = 1, \dots, c$). For every $x_{ij} \in \Omega$, we define the membership degree of $x_{ij}$ belonging to $\Omega_k$ as

$$\omega^k_{ij} = \frac{d^k_{ij} - \max_{s=1,\dots,c} d^s_{ij}}{\min_{s=1,\dots,c} d^s_{ij} - \max_{s=1,\dots,c} d^s_{ij}}, \qquad (4)$$
where $\omega^k_{ij} \in [0, 1]$ essentially uses distances to represent the membership of samples and to yield class labels. However, due to the effect of numerous environmental conditions, many samples may lie away from their own regions, and may even be surrounded by samples of other classes. If we use Eq. (4) alone to reflect the sample distribution, this wrong category information may result in inaccurate classification. Therefore, a penalty term is introduced to define the improved membership function as

$$\tilde{\omega}^k_{ij} = \begin{cases} 1, & k = i, \\[4pt] \dfrac{\omega^k_{ij}}{\omega^i_{ij} + \gamma_{ij}}, & k \ne i, \end{cases} \qquad \gamma_{ij} = \begin{cases} \dfrac{\max_{s \ne i} \omega^s_{ij} - \theta\, \omega^i_{ij}}{\theta}, & \dfrac{\max_{s \ne i} \omega^s_{ij}}{\omega^i_{ij}} > \theta, \\[8pt] 0, & \text{otherwise.} \end{cases} \qquad (5)$$
It is easy to see that the penalty term $\gamma_{ij}$ should be moderate: too much penalty may lead to the loss of distribution information, while too small a value may not work properly. Here, the threshold $\theta \in (0, 1)$ is used to determine the value of $\gamma_{ij}$. Obviously, the improved membership degrees satisfy $\tilde{\omega}^i_{ij} = 1$ and $\tilde{\omega}^k_{ij} \le \theta$ ($k \ne i$), which means that sample $x_{ij}$ exhibits the highest degree of membership to its own class in contrast to the other classes. Moreover, the identity $\tilde{\omega}^k_{ij} / \tilde{\omega}^l_{ij} = \omega^k_{ij} / \omega^l_{ij}$ ($k, l \ne i$) guarantees a limited loss of fuzzy relationship information by preserving the proportions between the memberships of $x_{ij}$ to the other classes. Consequently, for every $x_{ij} \in \Omega$, the vector of class labels $y_{ij} = (\tilde{\omega}^1_{ij}, \tilde{\omega}^2_{ij}, \dots, \tilde{\omega}^c_{ij})^{\mathrm T}$ reflects, in some sense, the approximate location of sample $x_{ij}$ in $\Omega$. In other words, $C = \{y_{ij} \in \mathbb{R}^c;\ i = 1, \dots, c;\ j = 1, \dots, N_i\}$ contains the information of the sample distribution.
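To make Eqs. (4) and (5) concrete, here is a minimal NumPy sketch, assuming samples as rows and integer class indices $0, \dots, c-1$; the function name `fuzzy_labels` is our choice, not the paper's.

```python
import numpy as np

def fuzzy_labels(X, labels, theta=0.2):
    """Fuzzy class-label vectors y_ij per Eqs. (4) and (5).

    X: N x p sample matrix (one row per image); labels: length-N array
    of integer class indices 0..c-1. Returns an N x c label matrix.
    """
    classes = np.unique(labels)
    means = np.stack([X[labels == k].mean(axis=0) for k in classes])
    # d[n, k]: Euclidean distance from sample n to the mean of class k.
    d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)

    # Eq. (4): linearly map distances into [0, 1]; the nearest class
    # mean receives membership 1, the farthest receives 0.
    dmin = d.min(axis=1, keepdims=True)
    dmax = d.max(axis=1, keepdims=True)
    omega = (d - dmax) / (dmin - dmax)

    # Eq. (5): a penalty gamma caps memberships to other classes at theta.
    out = np.empty_like(omega)
    for n in range(X.shape[0]):
        i = labels[n]
        m = np.delete(omega[n], i).max()     # max over s != i
        # condition m / omega[n, i] > theta, written division-free
        gamma = (m - theta * omega[n, i]) / theta if m > theta * omega[n, i] else 0.0
        out[n] = omega[n] / (omega[n, i] + gamma)
        out[n, i] = 1.0                      # own class always gets 1
    return out
```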
3.2. Fuzzy label CCA/KCCA
To implement the CCA algorithm, $\Omega$ and $C$ are regarded as two sets of variables. Suppose both have zero mean, for derivational convenience. Then we only need to solve the first equation of (2) with $X = [x_{11}, x_{12}, \dots, x_{cN_c}]$ and $Y = [y_{11}, y_{12}, \dots, y_{cN_c}]$. Since $X X^{\mathrm T}$ is singular, as mentioned in Section 2, the PCA algorithm is used for dimension reduction [11]. Specifically, we compute all eigenvectors $\xi$ corresponding to the nonzero eigenvalues of $X X^{\mathrm T}$ and let $P = (\xi_1, \dots, \xi_r)$, where $r = \mathrm{rank}(X X^{\mathrm T})$. Obviously, the matrix $P^{\mathrm T} X X^{\mathrm T} P$ is invertible. In addition, the small nonzero eigenvalues of $X X^{\mathrm T}$ generally carry more interference information, which is blown up by the inversion. To counteract this effect, the regularization technique [3] is employed. Thus the eigenvalue equation is converted to

$$(P^{\mathrm T} X X^{\mathrm T} P + \mu I_r)^{-1} P^{\mathrm T} X Y^{\mathrm T} (Y Y^{\mathrm T})^{-1} Y X^{\mathrm T} P\, a = \lambda a, \qquad (6)$$

where $I_r$ denotes the $r \times r$ identity matrix and $\mu$ is a small nonnegative number. Solving Eq. (6), we obtain the eigenvectors $a$ corresponding to the first $d$ largest eigenvalues ($d \le \min(r, c)$). The extracted feature vector of a test sample $x$ is calculated as

$$z = (a_1, a_2, \dots, a_d)^{\mathrm T} P^{\mathrm T} x. \qquad (7)$$
We anticipate that the features extracted by fuzzy label CCA have more discriminative power, as they incorporate both the gray-level and the distribution information of the samples.
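Assuming the two sketches above (`fuzzy_labels` from Section 3.1 and `cca_directions` from Section 2), the whole fuzzy label CCA pipeline of Eqs. (6) and (7) might be wired together as follows; `Xtr`, `ytr` and `x_test` are illustrative names only.

```python
import numpy as np

# Xtr: N x p training images (one row each), ytr: integer class labels,
# reusing fuzzy_labels() and cca_directions() from the sketches above.
mean_face = Xtr.mean(axis=0)
X = (Xtr - mean_face).T                    # p x N, zero-mean columns
Y = fuzzy_labels(Xtr, ytr, theta=0.2).T    # c x N fuzzy label matrix
Y = Y - Y.mean(axis=1, keepdims=True)      # center the labels as well

c = Y.shape[0]
P, A = cca_directions(X, Y, d=c)           # d <= min(r, c), cf. Eq. (6)
z = A.T @ (P.T @ (x_test - mean_face))     # Eq. (7): test feature vector
```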
For the corresponding kernel version, the class labels should be recomputed, because the distribution of the samples in $F$ may differ from that in the original space. Since the implicit nonlinear mapping $\phi$ is unknown, we cannot use Eqs. (4) and (5) to obtain $Y_\phi$ directly. Fortunately, the inner product in $F$ can be computed via a kernel function $k(x_i, x_j) = \phi(x_i)^{\mathrm T} \phi(x_j)$ using the kernel trick [10]. Thereby, $(d^k_{ij})_\phi$ can be obtained as follows:

$$(d^k_{ij})_\phi = \left\| \phi(x_{ij}) - \frac{1}{N_k} \sum_{l=1}^{N_k} \phi(x_{kl}) \right\| = \sqrt{\phi(x_{ij})^{\mathrm T} \phi(x_{ij}) - \frac{2}{N_k} \sum_{l=1}^{N_k} \phi(x_{ij})^{\mathrm T} \phi(x_{kl}) + \frac{1}{N_k^2} \sum_{l=1}^{N_k} \sum_{s=1}^{N_k} \phi(x_{kl})^{\mathrm T} \phi(x_{ks})}$$
$$= \sqrt{k(x_{ij}, x_{ij}) - \frac{2}{N_k} \sum_{l=1}^{N_k} k(x_{ij}, x_{kl}) + \frac{1}{N_k^2} \sum_{l=1}^{N_k} \sum_{s=1}^{N_k} k(x_{kl}, x_{ks})}.$$

Replacing $d^k_{ij}$ with $(d^k_{ij})_\phi$ in Eqs. (4) and (5), $Y_\phi$ is obtained easily.
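A small sketch of this computation from the kernel matrix alone; `kernel_class_distances` is our name, and clipping tiny negative values under the square root is a numerical safeguard we add.

```python
import numpy as np

def kernel_class_distances(K, labels):
    """(d^k_ij)_phi for all training samples and classes, using only the
    kernel matrix K with K[i, j] = k(x_i, x_j) (displayed equation above).

    Returns an N x c matrix of feature-space distances to class means,
    which can then be fed through Eqs. (4) and (5) to build Y_phi.
    """
    classes = np.unique(labels)
    N = K.shape[0]
    D = np.empty((N, len(classes)))
    for col, k in enumerate(classes):
        idx = np.flatnonzero(labels == k)
        Nk = len(idx)
        cross = K[:, idx].sum(axis=1)        # sum_l k(x, x_kl)
        within = K[np.ix_(idx, idx)].sum()   # sum_{l,s} k(x_kl, x_ks)
        D[:, col] = np.sqrt(np.maximum(
            np.diag(K) - 2.0 * cross / Nk + within / Nk**2, 0.0))
    return D
```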
Assume that $X_\phi$ and $Y_\phi$ are both centered in $F$ (see [3] for a method to center samples in feature space). For simplicity, we denote all the training samples in $F$ by $\phi(x_1), \phi(x_2), \dots, \phi(x_N)$. Similarly to the derivation of CCA, KCCA is equivalent to solving the following generalized eigenvalue equation:

$$X_\phi Y_\phi^{\mathrm T} (Y_\phi Y_\phi^{\mathrm T})^{-1} Y_\phi X_\phi^{\mathrm T} a_\phi = \lambda_\phi X_\phi X_\phi^{\mathrm T} a_\phi. \qquad (8)$$

Since $X_\phi X_\phi^{\mathrm T}$ is unknown, to perform PCA we first compute the nonzero eigenvalues $\varepsilon_s$ and the corresponding eigenvectors $u_s$ of $K$, where $K = X_\phi^{\mathrm T} X_\phi$ is the kernel matrix with elements $K_{ij} = \phi(x_i)^{\mathrm T} \phi(x_j) = k(x_i, x_j)$. According to the SVD theorem [3], the eigenvectors of $X_\phi X_\phi^{\mathrm T}$ satisfy $v_s = X_\phi u_s / \sqrt{\varepsilon_s}$, where $s = 1, \dots, n$ and $n = \mathrm{rank}(K)$, i.e. $V = X_\phi U \Lambda^{-1/2}$, where $V = (v_1, \dots, v_n)$, $U = (u_1, \dots, u_n)$ and $\Lambda = \mathrm{diag}(\varepsilon_1, \dots, \varepsilon_n)$. Consequently, after PCA and regularization, Eq. (8) is converted to

$$(U^{\mathrm T} K^2 U + \mu I_n)^{-1} U^{\mathrm T} K Y_\phi^{\mathrm T} (Y_\phi Y_\phi^{\mathrm T})^{-1} Y_\phi K U a_\phi = \lambda_\phi a_\phi. \qquad (9)$$

Accordingly, the fuzzy label KCCA features of a test sample $x$ are calculated as

$$z_\phi = (a_{\phi 1}, a_{\phi 2}, \dots, a_{\phi d_\phi})^{\mathrm T} \Lambda^{-1/2} U^{\mathrm T} K_x, \qquad (10)$$

where $K_x \in \mathbb{R}^N$ with elements $(K_x)_i = k(x_i, x)$, $i = 1, 2, \dots, N$, and $d_\phi \le \min(n, c)$.
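Putting Eqs. (9) and (10) together, here is a sketch of fuzzy label KCCA that works purely with the kernel matrix; it assumes $K$ and $Y_\phi$ have already been centered as in [3], and the small ridge on $Y_\phi Y_\phi^{\mathrm T}$ is our numerical safeguard rather than part of Eq. (9). The name `fuzzy_kcca` is ours.

```python
import numpy as np

def fuzzy_kcca(K, Yphi, d, mu=1e-6):
    """Fuzzy label KCCA: solve Eq. (9) and return a feature map, Eq. (10).

    K: N x N centered kernel matrix; Yphi: c x N centered fuzzy labels
    (built from the kernelized distances above); d <= min(n, c).
    """
    e, U = np.linalg.eigh(K)
    keep = e > 1e-10 * e.max()            # keep the nonzero eigenvalues
    e, U = e[keep], U[:, keep]            # n = rank(K) columns
    n = U.shape[1]

    Syy = Yphi @ Yphi.T + mu * np.eye(Yphi.shape[0])  # ridge: our safeguard
    left = U.T @ K @ K @ U + mu * np.eye(n)           # U^T K^2 U + mu I_n
    right = (U.T @ K @ Yphi.T) @ np.linalg.solve(Syy, Yphi @ K @ U)
    lam, Aphi = np.linalg.eig(np.linalg.solve(left, right))
    Aphi = Aphi[:, np.argsort(-lam.real)[:d]].real

    ULinv = U / np.sqrt(e)                # U Lambda^{-1/2}
    def features(Kx):
        """Kx: length-N vector with (Kx)_i = k(x_i, x_test)."""
        return Aphi.T @ (ULinv.T @ Kx)    # Eq. (10)
    return features
```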
4. Experimental results
To evaluate the performance of the proposed fuzzy label method, classification experiments are performed on face images. We construct a large face database by combining the ORL [9], Yale [13] and Lab databases in order to sufficiently represent the variations occurring in the real world.
Fig. 1. Some face images from the combined database.
Fig. 2. The change of recognition rates w.r.t. $\theta$ on 10 different data sets.
Fig. 3. The change of average recognition rates w.r.t. $\theta$ (curves for Eq. (5), Eq. (4) and binary labels).
The Lab database contains 450 images of 30 members of our laboratory, and these images are taken with few restrictions. The combined database is composed of 85 distinct subjects, each with 10 selected images. All images are cropped to $112 \times 92$ pixels. Pose, illumination and expression variations are contained in this database, as shown in Fig. 1. The experiments are conducted on a P4 2.4 GHz PC with 512 MB RAM in the Matlab 6.5 environment. All of the obtained eigenvectors $a$/$a_\phi$ are used to extract features for classification by the nearest neighbor classifier.
To show how the threshold $\theta$ affects recognition performance, we vary $\theta \in (0, 1)$ in steps of 0.05 and repeat the classification experiment 10 times, randomly choosing different training and testing sets from the combined database. The number of training samples per subject is 4. In the PCA step, very small nonzero eigenvalues and their corresponding eigenvectors are discarded, to make the illustration of $\theta$'s effect convincing. The recognition results in each round are shown in Fig. 2. The results using Eq. (4) to handle fuzzy labels are also shown, as dotted lines, since they are independent of $\theta$. The change of average recognition rates w.r.t. $\theta$ is illustrated in Fig. 3. From the experimental results in Figs. 2 and 3, we observe that in most cases the proposed method works well by simply setting $\theta \in (0.1, 0.4)$, which corresponds to a moderate $\gamma$, and $\theta = 0.2$ achieves the highest accuracy.
We then fix $\theta = 0.2$ for fuzzy label CCA/KCCA and compare them with Fisherfaces [2] and PCA+CCA [5]. In this experiment, the regularization parameter $\mu$ is set to the empirical value $10^{-6}$ by trial and error, since $\mu$ cannot be determined automatically and the optimal value is not easily specified [8]. The Gaussian RBF kernel function $k(x, y) = \exp(-\|x - y\|^2 / 2\sigma^2)$ with $\sigma = 10\,000$ is used in fuzzy label KCCA. The number of training samples per subject increases from 2 to 9. Training samples are selected randomly and the remaining samples are used for testing.
Fig. 4. Recognition rates comparison on the combined database (fuzzy label KCCA, fuzzy label CCA, PCA+CCA and Fisherfaces).
Table 1
Recognition rates comparison on ORL database

Method                  Recognition rate (%)   Average time (s)
Binary labels           95.25 ± 1.3591         10.4581
FKNN                    96.20 ± 1.3166         24.0373
New method (θ = 0.1)    96.45 ± 1.3632         13.5352
New method (θ = 0.2)    96.60 ± 1.2867         14.0040
New method (θ = 0.3)    96.70 ± 1.0055         13.8280
New method (θ = 0.4)    96.40 ± 1.3499         13.9389
Random selection is repeated ten times on the combined database. Fig. 4 shows the average recognition rates. It can be seen that fuzzy label KCCA achieves better results than fuzzy label CCA, and both of them outperform the Fisherfaces and PCA+CCA methods. These results show that the sample distribution information and nonlinear features are beneficial to classification.
Finally, experiments are performed on the ORL database to compare the performance of different class assignment methods. The PCA eigenvectors corresponding to the first 60 largest eigenvalues are retained for dimension reduction, and the parameter of the FKNN method is fixed at $k = 17$ [12]. For the new method, $\theta$ is set to 0.1, 0.2, 0.3 and 0.4, respectively. The mean, standard deviation and average runtime over 10 repeated experiments are tabulated in Table 1, from which we observe that the proposed fuzzy label approach achieves satisfactory performance in terms of both accuracy and efficiency. It is worth noting that the optimal performance of the new method on the ORL database is obtained at $\theta = 0.3$, while on the combined database it is obtained at $\theta = 0.2$, which implies that the value of $\theta$ should be smaller for a more complicated database severely affected by environmental conditions.
5. Conclusions and future work
In this paper, we design a new fuzzy membership function for handling class labels to reflect the distribution of image samples. Incorporating these class labels into CCA and KCCA, we can extract more discriminative features by making use of the distribution information. Classification experiments on face images show that the new approach is effective and feasible.

Intuitively, different class labels result in different classification performances, so constructing more appropriate functions of the labels deserves further study. In addition, it is desirable to develop alternative approaches to yield class labels based on the techniques for handling categorical variables in multivariate data analysis [14].
Acknowledgments
This work was supported by the Program for New Century Excellent Talents in University of China under Grant no. NCET-05-0275 and the National Natural Science Foundation of China under Grant no. 60673006. The authors would like to thank the anonymous reviewers for their precious suggestions.
References
[1] M. Barker, W. Rayens, Partial least squares for discrimination,
J. Chemometrics 17 (2003) 166–173.
[2] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs.
Fisherfaces: recognition using class specific linear projection, IEEE
Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 711–720.
[3] T. De Bie, N. Cristianini, R. Rosipal, Eigenproblems in pattern
recognition, in: E. Bayro-Corrochano (Ed.), Handbook of Computa-
tional Geometry for Pattern Recognition, Computer Vision, Neuro-
computing and Robotics, Springer, Heidelberg, 2004.
[4] D.R. Hardoon, S. Szedmak, J. Shawe-Taylor, Canonical correlation
analysis: an overview with application to learning methods, Technical
Report CSD-TR-03-02, Computer Science Department, Royal
Holloway, University of London, 2003.
[5] Y.H. He, L. Zhao, C.R. Zou, Face recognition based on PCA/KPCA
plus CCA, in: Proceedings of the ICNC 2005, Lecture Notes in
Computer Science, vol. 3611, Springer, Berlin, 2005, pp. 71–74.
[6] H. Hotelling, Relations between two sets of variates, Biometrika 28
(1936) 321–377.
[7] J.M. Keller, M.R. Gray, J.A. Givens, A fuzzy k-nearest neighbor
algorithm, IEEE Trans. Syst. Man Cybern. 15 (4) (1985) 580–585.
[8] T. Melzer, Generalized canonical correlation analysis for object
recognition, Ph.D. Thesis, Institute of Automation, Vienna Uni-
versity of Technology, 2002.
[9] ORL face database <http://www.uk.research.att.com/facedatabase.html>.
[10] J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis,
Cambridge University Press, Cambridge, 2004.
[11] Q.S. Sun, S.G. Zeng, Y. Liu, P.A. Heng, D.S. Xia, A new method of
feature fusion and its application in image recognition, Pattern
Recognition 38 (2005) 2437–2448.
[12] T.K. Sun, S.C. Chen, Class label versus sample label-based CCA,
Appl. Math. Comput. 185 (2007) 272–283.
[13] Yale face database <http://cvc.yale.edu/projects/yalefaces/yalefaces.html>.
[14] F.W. Young, J. de Leeuw, Y. Takane, Regression with qualitative
and quantitative variables: an alternating least squares method with
optimal scaling features, Psychometrika 41 (1976) 505–529.
Yanyan Liu was born in Hebei, China in 1980.
Currently, she is a Ph.D. candidate in the
Department of Applied Mathematics of Dalian
University of Technology. Her current research
interests include face recognition, image processing and computer vision.
Xiuping Liu was born in 1964. She received the
Ph.D. degree in Computational Mathematics in
1999 from Dalian University of Technology,
China. She was a postdoctoral research fellow
in School of Mathematics and Computational
Science of Sun Yat-sen University from October
1999 to October 2001. She is now an associate
professor in the Department of Applied Mathematics, Dalian University of Technology. Her
research activities are in the areas of computer
vision, multivariate spline function, CAGD and wavelet analysis.
Zhixun Su received his B.Sc. degree in Mathematics from Jilin University in 1987 and M.S.
degree in Computer Science from Nankai
University in 1990. He received his Ph.D. degree
in 1993 from Dalian University of Technology,
where he has been a professor in the Department
of Applied Mathematics since 1999. His research
interests include computer graphics and image
processing, computational geometry, computer
vision, etc.