
IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 21, NO. 4, AUGUST 2013 597

Knowledge-Leverage-Based Fuzzy System and Its Modeling

Zhaohong Deng, Member, IEEE, Yizhang Jiang, Member, IEEE, Fu-Lai Chung, Member, IEEE, Hisao Ishibuchi, and Shitong Wang

Abstract—Classical fuzzy system modeling methods consider only the current scene, where the training data are assumed to be fully collectable. However, if the available data from that scene are insufficient, the trained fuzzy systems will suffer from weak generalization for the modeling task in this scene. In order to overcome this problem, a fuzzy system with knowledge-leverage capability, called a knowledge-leverage-based fuzzy system (KL-FS), is proposed in this paper. The KL-FS not only makes full use of the data from the current scene in the learning procedure but also effectively leverages existing knowledge from the reference scene, e.g., the parameters of a fuzzy system obtained from a reference scene. Specifically, a knowledge-leverage-based Mamdani–Larsen-type fuzzy system (KL-ML-FS) is proposed by integrating the reduced set density estimation technique with the corresponding knowledge-leverage mechanism. The new fuzzy system modeling technique has been verified by experiments on synthetic and real-world datasets, where KL-ML-FS shows better performance and adaptability than traditional fuzzy modeling methods in scenarios with insufficient data.

Index Terms—Fuzzy modeling, fuzzy systems, knowledge leverage, Mamdani–Larsen fuzzy model, missing data, reduced set density estimator (RSDE), transfer learning.

I. INTRODUCTION

TRANSFER learning and/or domain adaptive learning [1]–[10] are becoming increasingly attractive in the fields of computational intelligence and machine learning as a way to cope with new challenges in real-world modeling tasks, where the data of the current scene are insufficient but some useful information from related scenes (reference scenes) is available. As shown in Fig. 1, with the auxiliary explanations in Table I,

Manuscript received September 6, 2011; revised April 13, 2012; accepted June 14, 2012. Date of publication August 8, 2012; date of current version July 31, 2013. This work was supported in part by the Hong Kong Polytechnic University under Grant Z-08R and Grant 1-ZV6C, the National Natural Science Foundation of China under Grant 60903100, Grant 60975027, and Grant 60117012, the Natural Science Foundation of Jiangsu Province under Grant BK2009067 and Grant BK2011003, the Jiangsu 333 Expert Engineering Grant BRA2011142, and the 2011 Postgraduate Student's Creative Research Fund of Jiangsu Province under Grant CXZZ11-0483.

Z. H. Deng, Y. Jiang, and S. Wang are with the School of Information Technology, Jiangnan University, Wuxi 214122, China (e-mail: [email protected]; [email protected]).

F. L. Chung is with the Department of Computing, Hong Kong Polytechnic University, Hong Kong (e-mail: [email protected]).

H. Ishibuchi is with the Department of Computer Science and Intelligent Systems, Osaka Prefecture University, Osaka 599-8531, Japan (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TFUZZ.2012.2212444

Fig. 1. Transfer learning.

transfer learning is a promising approach to obtaining an effective model of the current scene by effectively leveraging useful information from the reference scenes in the learning procedure.

The situations that call for transfer learning are becoming common in real-world applications. Two examples can be given as follows. One is the modeling of a fermentation process [11]. In the current scene of a microbiological fermentation process, the collected data may be insufficient, or some of the data may be missing due to deficiencies in the sensors' setup. Thus, we cannot effectively model the fermentation process for this scene with the collected data alone. However, sufficient data may be available from another, similar microbiological fermentation process, which can be viewed as a reference scene for the current scene. Hence, transfer learning can be exploited to make use of the information from the reference scene to improve the modeling of the current scene, resulting in a model with better generalization capability. Another example is the indoor WiFi localization problem [12], which aims to detect a user's current location based on previously collected WiFi data. The WiFi data, in terms of signal-strength values, may be a function of time, device, or other dynamic factors. For a new scene, it is very expensive to collect sufficient data for modeling. In this situation, a model trained for the reference scene's location estimation (corresponding to a particular time period or device) may suffer from performance degradation when it is directly applied to the current scene. Meanwhile, a model trained on the data collected in the current scene also cannot perform the location estimation task satisfactorily, as the training data are limited. In this case, transfer learning can be an effective solution to the corresponding modeling task because it can enhance the modeling effect by leveraging the available information of the reference scenes, such as the data collected in other time periods or with other setups.

Recently, transfer learning has been studied extensively for different applications [13], such as text classification and indoor WiFi location estimation. The existing works can be generally

1063-6706/$31.00 © 2013 IEEE


TABLE I
SOME TERMS USED IN THIS PAPER

categorized into three tasks: 1) transfer learning for classification [1]–[10], [12]; 2) transfer learning for unsupervised learning (clustering [14] and dimensionality reduction [15]); and 3) transfer learning for regression [16]–[20]. Transfer learning for classification has been studied in depth, while studies on regression remain scarce. Although transfer learning is well suited to many regression problems, research on transfer learning for regression tasks still cannot keep up with the demand. In this study, we focus on transfer regression modeling. As an important class of regression models [21]–[23], [41]–[43], fuzzy systems are an appropriate candidate for the incorporation of transfer learning. To the best of our knowledge, the study of transfer learning for fuzzy system modeling has yet to be reported.

For fuzzy system modeling, transfer learning is very useful in some real-world modeling tasks where the traditional methods may not work well. For example, trained fuzzy systems have much weaker generalization capability when the training data are insufficient or only partially available [24], [25]. Since in real-world applications the sensors and setups for data sampling may not be steady, due to noisy environments or other malfunctions, the data collected for the modeling task may turn out to be insufficient. In order to tackle this problem, a feasible remedy is to boost performance by borrowing useful information from reference scenes (or related scenes), which may be data or relevant knowledge such as the density distribution and fuzzy rules. The simplest way to obtain information from the reference scenes is to use the data collected from them. However, borrowing data from reference scenes faces two major challenges. First, due to the necessity of privacy protection in some applications, such as the aforementioned fermentation process, the data of reference scenes cannot always be obtained. In this situation, knowledge about the reference scenes, such as the density distribution and model parameters, is easier to obtain for enhancing the modeling of the current scene. Second, the data from the reference scenes cannot be used directly in the current scene, as there may exist a drifting phenomenon between these scenes. These two challenges must be properly addressed in order to develop an effective transfer regression modeling strategy for fuzzy system modeling.

In this study, a fuzzy system modeling approach with knowledge-leverage capability from reference scenes is exploited. In view of its popularity, the Mamdani–Larsen-type fuzzy system (ML-FS) was chosen for incorporating a knowledge-leverage mechanism. The approach makes use of a novel objective criterion to integrate the model knowledge of the reference scene with the data of the current scene and, thus, learn the induced fuzzy rules of the model accordingly. Learning from the knowledge of the reference scene effectively makes up for the deficiency of learning from the insufficient data of the current scene. Hence, the proposed knowledge-leverage-based ML-FS (KL-ML-FS) and its modeling/learning algorithm are better adapted to situations where the data are only partially available from the current scene but some useful knowledge of the reference scene is available. Besides, the proposed method is distinctive in preserving data privacy, as only the knowledge (such as the corresponding density distribution) rather than the data of the reference scene is adopted.

The rest of this paper is organized as follows. In Section II, the concepts of knowledge leverage and knowledge-leverage-based fuzzy systems (KL-FSs) are introduced. In Section III, a specific knowledge-leverage strategy and the corresponding knowledge-leverage-based fuzzy system, i.e., the KL-ML-FS and its learning algorithm, are proposed. In Section IV, the experimental results are reported and discussed. Conclusions are given in the final section.

II. FUZZY SYSTEM LEARNING AND KNOWLEDGE LEVERAGE

A. Classical Fuzzy System Learning Methods

Traditional fuzzy system modeling depends strongly on the experience of experts, with the model parameters determined by those experts. With the introduction of machine-learning techniques, the most commonly used training methods for fuzzy systems have become data-driven: they typically obtain the model parameters through optimization techniques based on a certain objective function and the available data sampled from the scene to be modeled. Almost all data-driven training algorithms for fuzzy system modeling consider only the data collected from the current scene, even if some useful information from the reference scenes is available.

B. Learning with Knowledge Leverage

As mentioned in Section I, when the data are insufficient, the existing data-driven fuzzy system training methods are no longer effective, and the generalization performance of the trained systems becomes inferior. A promising strategy to cope with this challenge is to refer to other reference scenes, which may hold important information that is usually similar to that of the current scene, at least to a certain extent. From


Fig. 2. Framework of KL-FS modeling.

the reference scenes, two categories of information are usually available, i.e., the data and the knowledge. The corresponding characteristics of these two types of information are given as follows.

1) The data are the original information and can be further processed to obtain knowledge. However, the data are not always available. For example, many data samples cannot be made open due to the necessity of privacy protection. Moreover, even if the data of the reference scenes are available, it may not be appropriate to adopt them directly for the modeling task in the current scene, as there may exist a drift between the models of different scenes; thus, some data from the reference scene may negatively influence the system modeling of the current scene.

2) Knowledge is another important kind of information from a scene. Types of knowledge include the density distribution, fuzzy rules, and model parameters, most of which can be obtained by learning procedures. For example, the parameters of a fuzzy system for the reference scene are learned by a fuzzy system training algorithm from the data collected in that scene. Although most of the obtained knowledge cannot be inversely mapped to the original data, which is good from a privacy-preservation point of view, it is nonetheless important information from the reference scene for improving the modeling of the current scene.

Based on these characteristics, it is very appropriate to exploit the knowledge, rather than the data, from the reference scene, together with the data available from the current scene, to accomplish the modeling task. In this study, the fuzzy system trained with such a strategy is called a KL-FS.

C. Framework of KL-FS Modeling

The construction process of the KL-FS is described in Fig. 2, where we can see that there are two main information sources for the learning of the KL-FS, i.e., the data of the current scene and the knowledge of the reference scenes. With these two categories of information, parameter learning is carried out, and the fuzzy system is obtained for the modeling task of the current scene.

III. KNOWLEDGE-LEVERAGE-BASED MAMDANI–LARSEN-TYPE FUZZY SYSTEMS

Classical fuzzy system models include the Mamdani–Larsen (ML) model [26], [27], the Takagi–Sugeno (T–S) model [28], and the generalized fuzzy model (GFM) [29], [30]. The ML model is popular due to its simplicity. In this study, we adopt it and propose a KL-ML-FS and the associated learning algorithm. First, a recently proposed ML-type fuzzy system construction method, i.e., the reduced set density estimation-based ML-type fuzzy system (RSDE-ML-FS) construction method [33], [34], is reviewed. Based on this method, a corresponding knowledge-leverage mechanism is derived and then incorporated to develop the KL-ML-FS and its learning algorithm.

A. Reduced Set Density Estimator-Based Mamdani–Larsen-Type Fuzzy System Construction

1) Mamdani–Larsen-Type Fuzzy Systems: The fuzzy inference rules of ML-type fuzzy systems can be described as follows:

IF $x_1$ is $A_1^k$ $\wedge$ $x_2$ is $A_2^k$ $\wedge \cdots \wedge$ $x_D$ is $A_D^k$ THEN $y$ is $B^k(b_k, v_k)$

which is premised on the input vector $\mathbf{x} = [x_1, x_2, \ldots, x_D]^T \in R^D$ and maps the fuzzy set in the input space, $A^k \subset R^D$, to a fuzzy set in the output space, $B^k \subset R$. Here, $A_i^k$ is the fuzzy set subscribed by the input variable $x_i$ for the $k$th rule, and $\wedge$ is a fuzzy conjunction operator. $B^k(b_k, v_k)$ in the THEN-part of the ML-type fuzzy rule is a fuzzy set with centroid $b_k$ and fuzziness index $v_k$. The firing strength of the $k$th rule, $\mu_k(\mathbf{x})$, can be obtained by taking the fuzzy conjunction (denoted by $\wedge$) of the membership functions of the rule's IF-part, i.e.,

$$\mu_k(\mathbf{x}) = \mu_1^k(x_1) \wedge \mu_2^k(x_2) \wedge \cdots \wedge \mu_D^k(x_D). \quad (1)$$
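To make the antecedent evaluation concrete, the following sketch computes the firing strength in (1), assuming Gaussian membership functions and the product t-norm for the conjunction operator (both are assumptions here; the function names are illustrative):

```python
import numpy as np

def gaussian_membership(x, center, sigma):
    """Membership grade of input x in a Gaussian fuzzy set."""
    x, center, sigma = (np.asarray(a, dtype=float) for a in (x, center, sigma))
    return np.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def firing_strength(x, centers, sigmas):
    """Eq. (1) with the product t-norm as the conjunction: the firing
    strength of one rule is the product over the D input dimensions
    of the antecedent membership grades."""
    return float(np.prod(gaussian_membership(x, centers, sigmas)))

# An input located at the rule's antecedent centers fires the rule fully
mu_full = firing_strength([0.5, 1.0], centers=[0.5, 1.0], sigmas=[0.2, 0.3])
mu_partial = firing_strength([0.6, 1.0], centers=[0.5, 1.0], sigmas=[0.2, 0.3])
```
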

When the multiplicative conjunction, multiplicative implication, and additive disjunction are employed for the conjunction, implication, and disjunction operators, respectively, the defuzzified output of the ML-type fuzzy system can be given by

$$y_0 = \frac{\sum_{k=1}^{K} \mu_k(\mathbf{x}) \cdot v_k \cdot b_k}{\sum_{k'=1}^{K} \mu_{k'}(\mathbf{x}) \cdot v_{k'}} = \frac{\sum_{k=1}^{K} \prod_{j=1}^{D} \mu_j^k(x_j) \cdot v_k \cdot b_k}{\sum_{k'=1}^{K} \prod_{j=1}^{D} \mu_j^{k'}(x_j) \cdot v_{k'}}. \quad (2)$$
2) Reduced Set Density Estimator-Based Mamdani–Larsen-Type Fuzzy System Training: An important advance in ML-type fuzzy systems is the family of kernel density estimation-based ML-type fuzzy system construction methods. In [31] and [32], the reduced set density estimator (RSDE) and its scalable version were proposed for kernel density estimation. Given a reference dataset $S = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\} \subset R^D$, the corresponding RSDE can be denoted as

$$\hat{p}(\mathbf{x}; h, \boldsymbol{\gamma}) = \sum_{i=1}^{M} \gamma_i K_h(\mathbf{x}, \mathbf{x}_i) \quad (3)$$

where $M$ denotes the number of data samples in the obtained reduced set, which is a subset of the reference dataset $S$, and $\mathbf{x}_i$ is the $i$th datum in this reduced set. If the density distribution of a regression dataset is estimated by using RSDE, then, given the input of an input–output data pair, the corresponding expected output can be denoted as [33]

$$E[y|\mathbf{x}] = \frac{\sum_{i=1}^{M} \gamma_i G(\mathbf{x}, \mathbf{x}_i, \sigma_{\mathrm{rsde}}^2)\, y_i}{\sum_{i'=1}^{M} \gamma_{i'} G(\mathbf{x}, \mathbf{x}_{i'}, \sigma_{\mathrm{rsde}}^2)} = \frac{\sum_{i=1}^{M} \gamma_i \prod_{j=1}^{D} G(x_j, x_{ij}, \sigma_{\mathrm{rsde}}^2)\, y_i}{\sum_{i'=1}^{M} \gamma_{i'} \prod_{j=1}^{D} G(x_j, x_{i'j}, \sigma_{\mathrm{rsde}}^2)} \quad (4)$$

where $(\mathbf{x}_i \in R^D, y_i \in R)$ denotes the $i$th data pair in the reduced set obtained by using RSDE, and the Gaussian density function $G(\mathbf{x}, \mathbf{x}_i, \sigma_{\mathrm{rsde}}^2)$ is adopted as the kernel function $K_h(\mathbf{x}, \mathbf{x}_i)$ in (3). Since (2) and (4) have the same form, the equivalence between RSDE and ML-FS construction was revealed in [33] and [34], and thus the RSDE-ML-FS training algorithm [34] and its scalable version [33] were proposed. For details of the construction procedure of an ML-FS based on RSDE, see [33] and [34].
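As a concrete illustration of (4) (equivalently, the defuzzified output (2) with Gaussian memberships), the sketch below computes the normalized kernel-weighted output. The shared kernel width and the variable names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def mlfs_predict(x, centers, gammas, outputs, sigma):
    """Eq. (4): expected output of an RSDE-based ML-FS.
    centers: (M, D) reduced-set points x_i; gammas: (M,) mixing
    coefficients; outputs: (M,) consequent values y_i; sigma:
    shared Gaussian kernel width sigma_rsde."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    gammas = np.asarray(gammas, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    # Product of per-dimension Gaussians = one multivariate kernel;
    # the normalization constants cancel between numerator and denominator.
    k = np.exp(-np.sum((centers - x) ** 2, axis=1) / (2.0 * sigma ** 2))
    return float(np.sum(gammas * k * outputs) / np.sum(gammas * k))

# Equidistant from two equally weighted rules -> average of their outputs
y_mid = mlfs_predict([0.5], centers=[[0.0], [1.0]], gammas=[0.5, 0.5],
                     outputs=[0.0, 2.0], sigma=0.3)
```
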

B. Knowledge-Leverage-Based Mamdani–Larsen-Type Fuzzy System

With respect to the ML-type fuzzy system, a KL-ML-FS is proposed by integrating the RSDE-ML-FS construction method with a corresponding knowledge-leverage mechanism. The challenge here is how to make good use of the knowledge of the reference scenes to remedy the deficiency of insufficient data in the current scene and to develop the corresponding learning algorithm.

1) Objective Criterion Integrating the Knowledge of Reference Scenes: For an ML-FS constructed by the RSDE technique, the density estimate obtained by RSDE may be viewed as the knowledge. Thus, for an ML-FS obtained in a reference scene, we can take the associated RSDE density estimate as the useful knowledge of this reference scene. In this study, in order to develop an effective KL-ML-FS, we present an optimization criterion integrated with the knowledge of the reference scene for the model learning of the current scene.

First, the RSDE density estimate corresponding to the ML-FS obtained in a reference scene is denoted as

$$\hat{p}(\mathbf{x}) = \sum_{i=1}^{M} p_i G(\mathbf{x}, \mathbf{u}_{p,i}, \sigma_{p,i}^2) \quad (5)$$

where the parameters have been determined beforehand and can be taken as the existing useful knowledge of the reference scene. Meanwhile, for the ML-FS to be constructed in the current scene, the corresponding RSDE density estimate is expressed as

$$\hat{q}(\mathbf{x}) = \sum_{i=1}^{M} q_i G(\mathbf{x}, \mathbf{u}_{q,i}, \sigma_{q,i}^2) \quad (6)$$

where the parameters of this density estimate need to be determined, which is also the core task of the proposed KL-ML-FS construction method.

The original RSDE is usually implemented by optimizing a certain criterion based on the given dataset [31], [32]. Thus, the obtained density estimate is learned only from the available data, and if the original RSDE technique is used for ML-FS modeling, the obtained fuzzy system is likewise learned only from the data of the current scene. In this situation, when the data are insufficient, the obtained fuzzy system will have weak generalization for the modeling task in this scene.

In order to overcome this problem, we present the following optimization criterion, which integrates the knowledge-leverage term of the RSDE with the corresponding fuzzy modeling task:

$$\mathop{\arg\min}_{q_i, \mathbf{u}_{q,i}, \sigma_{q,i}} J_1 = \int (\hat{p}(\mathbf{x}) - \hat{q}(\mathbf{x}))^2 d\mathbf{x} + \lambda \int (\hat{q}(\mathbf{x}) - q(\mathbf{x}))^2 d\mathbf{x}. \quad (7\text{-}1)$$

Here, (7-1) consists of two terms: the first term denotes the knowledge leverage from the reference scene's RSDE, which makes the desired density estimate approximate the density estimate associated with the reference scene; the second term denotes RSDE learning from the data of the current scene, which makes the desired density estimate approximate the true density distribution $q(\mathbf{x})$ of the current scene. More information about the second term can be found in [31] and [32]. The parameter $\lambda$ in (7-1) balances the influence of the two terms, and an optimal value can be determined by the commonly used cross-validation strategy.

From (7-1), we can see that the learning procedure of the RSDE (or the corresponding ML-FS) will not only learn from the data of the current scene but also inherit the knowledge of the reference scene. Furthermore, (7-1) can be expanded as

$$\begin{aligned}
\mathop{\arg\min}_{q_i, \mathbf{u}_{q,i}, \sigma_{q,i}} J_1 &= \int \big(\hat{p}(\mathbf{x})^2 - 2\hat{p}(\mathbf{x})\hat{q}(\mathbf{x}) + \hat{q}(\mathbf{x})^2\big) d\mathbf{x} + \lambda \int \big(\hat{q}(\mathbf{x})^2 - 2\hat{q}(\mathbf{x})q(\mathbf{x}) + q(\mathbf{x})^2\big) d\mathbf{x} \\
&= \mathop{\arg\min}_{q_i, \mathbf{u}_{q,i}, \sigma_{q,i}} \int \big(-2\hat{p}(\mathbf{x})\hat{q}(\mathbf{x}) + \hat{q}(\mathbf{x})^2\big) d\mathbf{x} + \lambda \int \big(\hat{q}(\mathbf{x})^2 - 2q(\mathbf{x})\hat{q}(\mathbf{x})\big) d\mathbf{x} \\
&= \mathop{\arg\min}_{q_i, \mathbf{u}_{q,i}, \sigma_{q,i}} (1+\lambda) \int \hat{q}(\mathbf{x})^2 d\mathbf{x} - 2 \int \hat{p}(\mathbf{x})\hat{q}(\mathbf{x}) d\mathbf{x} - 2\lambda \int q(\mathbf{x})\hat{q}(\mathbf{x}) d\mathbf{x}. \quad (7\text{-}2)
\end{aligned}$$

Since $\int G(\mathbf{x}, \mathbf{u}_{p,i}, \sigma_{p,i}^2) G(\mathbf{x}, \mathbf{u}_{q,j}, \sigma_{q,j}^2) d\mathbf{x} = G(\mathbf{u}_{p,i}, \mathbf{u}_{q,j}, \sigma_{p,i}^2 + \sigma_{q,j}^2)$, and based on (5) and (6), we have

$$\int \hat{p}(\mathbf{x})\hat{q}(\mathbf{x}) d\mathbf{x} = \sum_{i=1}^{M} \sum_{j=1}^{M} p_i q_j G(\mathbf{u}_{p,i}, \mathbf{u}_{q,j}, \sigma_{p,i}^2 + \sigma_{q,j}^2) \quad (8\text{-}1)$$

$$\int \hat{q}(\mathbf{x})^2 d\mathbf{x} = \sum_{i=1}^{M} \sum_{j=1}^{M} q_i q_j G(\mathbf{u}_{q,i}, \mathbf{u}_{q,j}, \sigma_{q,i}^2 + \sigma_{q,j}^2). \quad (8\text{-}2)$$
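The closed forms in (8-1) and (8-2) rest on the Gaussian convolution identity $\int G(\mathbf{x}, \mathbf{u}_1, \sigma_1^2) G(\mathbf{x}, \mathbf{u}_2, \sigma_2^2) d\mathbf{x} = G(\mathbf{u}_1, \mathbf{u}_2, \sigma_1^2 + \sigma_2^2)$. A quick one-dimensional numerical check of that identity (the specific means and variances are illustrative):

```python
import numpy as np

def gauss(x, u, s2):
    """Normalized Gaussian density G(x, u, s2) with variance s2."""
    return np.exp(-((x - u) ** 2) / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)

u1, s1 = 0.3, 0.4    # first component: mean, variance
u2, s2 = 1.1, 0.25   # second component: mean, variance
x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]
# Riemann sum of the product of the two densities over a wide grid
numeric = float(np.sum(gauss(x, u1, s1) * gauss(x, u2, s2)) * dx)
closed = float(gauss(u1, u2, s1 + s2))  # identity behind (8-1)/(8-2)
```
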


Meanwhile, we may approximate $\int \hat{q}(\mathbf{x})q(\mathbf{x}) d\mathbf{x}$ as follows [31], [32]:

$$\int \hat{q}(\mathbf{x})q(\mathbf{x}) d\mathbf{x} = E(\hat{q}(\mathbf{x})) = E\left(\sum_{j=1}^{M} q_j G(\mathbf{x}, \mathbf{u}_{q,j}, \sigma_{q,j}^2)\right) \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} q_j G(\mathbf{x}_i, \mathbf{u}_{q,j}, \sigma_{q,j}^2). \quad (8\text{-}3)$$

Substituting (8-1)–(8-3) into (7-2), we have

$$\begin{aligned}
\mathop{\arg\min}_{q_i, \mathbf{u}_{q,i}, \sigma_{q,i}} J_1 &= (1+\lambda) \sum_{i=1}^{M} \sum_{j=1}^{M} q_i q_j G(\mathbf{u}_{q,i}, \mathbf{u}_{q,j}, \sigma_{q,i}^2 + \sigma_{q,j}^2) \\
&\quad - 2 \sum_{i=1}^{M} \sum_{j=1}^{M} p_i q_j G(\mathbf{u}_{p,i}, \mathbf{u}_{q,j}, \sigma_{p,i}^2 + \sigma_{q,j}^2) \\
&\quad - 2\lambda \sum_{i=1}^{N} \sum_{j=1}^{M} \frac{1}{N} q_j G(\mathbf{x}_i, \mathbf{u}_{q,j}, \sigma_{q,j}^2). \quad (9)
\end{aligned}$$
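For reference when implementing the learning rules below, the discretized objective (9) can be evaluated directly. A one-dimensional sketch (variable names are illustrative; the constant terms dropped in (7-2) are omitted here as well):

```python
import numpy as np

def gauss(u1, u2, s2):
    """G(u1, u2, s2): Gaussian with variance s2 evaluated at u1 - u2."""
    return np.exp(-((u1 - u2) ** 2) / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)

def objective_J1(q, uq, s2q, p, up, s2p, data, lam):
    """Eq. (9) in one dimension: a knowledge-leverage term toward the
    reference-scene RSDE (p, up, s2p) plus a data-fit term on the
    current-scene samples, weighted by lam."""
    q, uq, s2q, p, up, s2p, x = (np.asarray(a, dtype=float)
                                 for a in (q, uq, s2q, p, up, s2p, data))
    term1 = (1.0 + lam) * np.sum(np.outer(q, q)
             * gauss(uq[:, None], uq[None, :], s2q[:, None] + s2q[None, :]))
    term2 = -2.0 * np.sum(np.outer(p, q)
             * gauss(up[:, None], uq[None, :], s2p[:, None] + s2q[None, :]))
    term3 = -2.0 * lam / len(x) * np.sum(q[None, :]
             * gauss(x[:, None], uq[None, :], s2q[None, :]))
    return float(term1 + term2 + term3)

# Matching the reference density should score lower than a mismatch
J_near = objective_J1([1.0], [0.2], [0.1], [1.0], [0.2], [0.1], [0.2], 0.0)
J_far = objective_J1([1.0], [5.0], [0.1], [1.0], [0.2], [0.1], [0.2], 0.0)
```
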

Based on (9), we next give the detailed learning rules of the proposed KL-ML-FS construction method.

2) Parameter Learning: In (9), $q_j$, $\mathbf{u}_{q,j}$, and $\sigma_{q,j}$ are all variables to be determined. Solving this problem directly is not a trivial task. In this study, an iterative method is adopted. The iterative procedure contains three main steps.

Step 1: Fix $\mathbf{u}_{q,j}$ and $\sigma_{q,j}$, and optimize $q_j$ using (9).

When $\mathbf{u}_{q,j}$ and $\sigma_{q,j}$ are fixed, (9) becomes the following typical quadratic programming (QP) problem [35]:

$$\mathop{\arg\min}_{q_i} \sum_{i=1}^{M} \sum_{j=1}^{M} q_i q_j h_{ij} + \sum_{i=1}^{M} q_i \beta_i \quad (10\text{-}1)$$

where

$$h_{ij} = (1+\lambda) G(\mathbf{u}_{q,i}, \mathbf{u}_{q,j}, \sigma_{q,i}^2 + \sigma_{q,j}^2) \quad (10\text{-}2)$$

$$\beta_j = -2 \sum_{i=1}^{M} p_i G(\mathbf{u}_{p,i}, \mathbf{u}_{q,j}, \sigma_{p,i}^2 + \sigma_{q,j}^2) - 2\lambda \sum_{i=1}^{N} \frac{1}{N} G(\mathbf{x}_i, \mathbf{u}_{q,j}, \sigma_{q,j}^2). \quad (10\text{-}3)$$

Many classical QP algorithms [35]–[38] can be directly adopted to solve (10-1).
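As a minimal illustration of Step 1, the sketch below finds the stationary point of the quadratic objective (10-1) by solving the linear system $2H\mathbf{q} + \boldsymbol{\beta} = 0$. This deliberately ignores any constraints on the mixing coefficients that a full QP solver would enforce (a simplifying assumption, not the paper's solver):

```python
import numpy as np

def solve_q_stationary(H, beta):
    """Unconstrained stationary point of q^T H q + beta^T q from (10-1):
    setting the gradient 2 H q + beta to zero gives q = -0.5 H^{-1} beta.
    A real implementation would add the mixing-coefficient constraints
    and call a QP solver instead (illustrative simplification)."""
    H = np.asarray(H, dtype=float)
    beta = np.asarray(beta, dtype=float)
    return np.linalg.solve(H, -0.5 * beta)

q = solve_q_stationary([[2.0, 0.0], [0.0, 2.0]], [-2.0, -4.0])
```
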

Step 2: Fix $q_j$ and $\sigma_{q,j}$, and optimize $\mathbf{u}_{q,j}$ using (9).

When $q_j$ and $\sigma_{q,j}$ are fixed, the necessary condition for minimizing (9) with respect to $\mathbf{u}_{q,j}$ can be obtained as

$$\mathbf{u}_{q,j} = \mathbf{b}/a \quad (11\text{-}1)$$

where

$$a = -\sum_{i=1}^{M} p_i q_j G1_{ij} (\sigma_{p,i}^2 + \sigma_{q,j}^2)^{-1} - \lambda \sum_{i=1}^{N} \frac{1}{N} q_j G3_{ij} (\sigma_{q,j}^2)^{-1} + (1+\lambda) \sum_{i=1, i \neq j}^{M} q_i q_j \big[ G2_{ij} (\sigma_{q,i}^2 + \sigma_{q,j}^2)^{-1} \big] \quad (11\text{-}2)$$

$$\mathbf{b} = -\sum_{i=1}^{M} p_i q_j G1_{ij} (\sigma_{p,i}^2 + \sigma_{q,j}^2)^{-1} \mathbf{u}_{p,i} - \lambda \sum_{i=1}^{N} \frac{1}{N} q_j G3_{ij} (\sigma_{q,j}^2)^{-1} \mathbf{x}_i + (1+\lambda) \sum_{i=1, i \neq j}^{M} q_i q_j \big[ G2_{ij} (\sigma_{q,i}^2 + \sigma_{q,j}^2)^{-1} \big] \mathbf{u}_{q,i} \quad (11\text{-}3)$$

with $G1_{ij} = G(\mathbf{u}_{p,i}, \mathbf{u}_{q,j}, \sigma_{p,i}^2 + \sigma_{q,j}^2)$, $G2_{ij} = G(\mathbf{u}_{q,i}, \mathbf{u}_{q,j}, \sigma_{q,i}^2 + \sigma_{q,j}^2)$, and $G3_{ij} = G(\mathbf{x}_i, \mathbf{u}_{q,j}, \sigma_{q,j}^2)$. Here, (11-1) can be taken as the learning rule for the parameters $\mathbf{u}_{q,j}$.

Step 3: Fix $q_j$ and $\mathbf{u}_{q,j}$, and optimize $\sigma_{q,j}$ using (9).

When $q_j$ and $\mathbf{u}_{q,j}$ are fixed, the necessary condition for minimizing (9) with respect to $\sigma_{q,j}$ can be expressed as

$$\frac{\partial J_1}{\partial \sigma_{q,j}} = 0 \quad (12\text{-}1)$$

where $\partial J_1 / \partial \sigma_{q,j}$ can be formulated as

$$\begin{aligned}
\frac{\partial J_1}{\partial \sigma_{q,j}} &= 2 \sum_{i=1}^{M} p_i q_j G1_{ij} \big[ D \cdot (\sigma_{p,i}^2 + \sigma_{q,j}^2)^{-1} - (\sigma_{p,i}^2 + \sigma_{q,j}^2)^{-2} (\mathbf{u}_{p,i} - \mathbf{u}_{q,j})^T (\mathbf{u}_{p,i} - \mathbf{u}_{q,j}) \big] \cdot \sigma_{q,j} \\
&\quad + (1+\lambda) \Bigg\{ -2 \sum_{i=1, i \neq j}^{M} q_i q_j G2_{ij} \big[ D \cdot (\sigma_{q,i}^2 + \sigma_{q,j}^2)^{-1} - (\sigma_{q,i}^2 + \sigma_{q,j}^2)^{-2} (\mathbf{u}_{q,i} - \mathbf{u}_{q,j})^T (\mathbf{u}_{q,i} - \mathbf{u}_{q,j}) \big] \cdot \sigma_{q,j} \\
&\qquad\qquad - 2 q_j^2 G2_{jj} \cdot D \cdot (2\sigma_{q,j}^2)^{-1} \cdot \sigma_{q,j} \Bigg\} \\
&\quad + 2\lambda \sum_{i=1}^{N} \frac{1}{N} q_j G3_{ij} \big[ D \cdot (\sigma_{q,j}^2)^{-1} - (\sigma_{q,j}^2)^{-2} (\mathbf{x}_i - \mathbf{u}_{q,j})^T (\mathbf{x}_i - \mathbf{u}_{q,j}) \big] \cdot \sigma_{q,j}. \quad (12\text{-}2)
\end{aligned}$$

In (12-2), $D$ denotes the dimensionality of $\mathbf{x}_i$. Since it is difficult to obtain the analytical solution of (12-1) with respect to $\sigma_{q,j}$, the following gradient descent learning rule is proposed for optimizing the parameters $\sigma_{q,j}$:

$$\sigma_{q,j}(s+1) = \sigma_{q,j}(s) - \eta \frac{\partial J_1}{\partial \sigma_{q,j}(s)} \quad (12\text{-}3)$$

where $s$ is the iteration index. The detailed derivations of (11-1)–(11-3) and (12-1)–(12-3) can be found in the Appendix.

3) Algorithm: Based on the aforementioned update rules, the learning algorithm of the proposed KL-ML-FS is presented in Algorithm 1 below.
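The outer loop of Algorithm 1 can be sketched as a generic coordinate-descent skeleton; the `step_*` callables below stand in for the update rules (10-1), (11-1), and (12-3), and all names are illustrative rather than taken from the paper:

```python
def alternate_minimize(step_q, step_u, step_sigma, objective, state,
                       max_iter=5, tol=1e-7):
    """Outer loop of Algorithm 1: cyclically apply the three update
    steps and stop when J1 decreases by less than tol (or when max_iter
    external iterations are reached, as in the experimental setup)."""
    prev = objective(state)
    for _ in range(max_iter):
        state = step_q(state)      # Step 1: update q_j via the QP (10-1)
        state = step_u(state)      # Step 2: update u_{q,j} via (11-1)
        state = step_sigma(state)  # Step 3: update sigma_{q,j} via (12-3)
        cur = objective(state)
        if prev - cur < tol:
            break
        prev = cur
    return state, cur

# Toy check: a separable objective is minimized coordinate-wise
f = lambda st: (st[0] - 1.0) ** 2 + (st[1] - 2.0) ** 2 + (st[2] - 3.0) ** 2
final, J = alternate_minimize(lambda st: (1.0, st[1], st[2]),
                              lambda st: (st[0], 2.0, st[2]),
                              lambda st: (st[0], st[1], 3.0),
                              f, state=(0.0, 0.0, 0.0))
```
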

The proposed KL-ML-FS and its learning algorithm have the following distinctive advantages over the existing methods. First, it inherits the characteristics of the original RSDE-ML-FS method. Second, it has better adaptability to the reference scenes through its knowledge-leverage mechanism. Third, due to the introduction of useful knowledge


from the reference scenes, the generalization performance of the proposed algorithm is better than that of existing methods without knowledge-transfer abilities.

The convergence of the proposed KL-ML-FS learning algorithm (Algorithm 1) can be briefly discussed as follows. When $\mathbf{u}_{q,j}$ and $\sigma_{q,j}$ are fixed, the $q_j$ obtained using (10-1) may be the global minimum solution of (9), and we have $J_1(q_j(t+1)) \leq J_1(q_j(t))$. When $q_j$ and $\sigma_{q,j}$ are fixed, the $\mathbf{u}_{q,j}$ obtained using (11-1) satisfies $J_1(\mathbf{u}_{q,j}(t+1)) \leq J_1(\mathbf{u}_{q,j}(t))$. When $q_j$ and $\mathbf{u}_{q,j}$ are fixed, the $\sigma_{q,j}$ obtained by gradient descent learning also satisfies $J_1(\sigma_{q,j}(t+1)) \leq J_1(\sigma_{q,j}(t))$. Thus, we get the following convergence series:

$$J_1(q_j(t+1), \mathbf{u}_{q,j}(t+1), \sigma_{q,j}(t+1)) \leq J_1(q_j(t+1), \mathbf{u}_{q,j}(t+1), \sigma_{q,j}(t)) \leq J_1(q_j(t+1), \mathbf{u}_{q,j}(t), \sigma_{q,j}(t)) \leq J_1(q_j(t), \mathbf{u}_{q,j}(t), \sigma_{q,j}(t)).$$

Based on this analysis, we can infer that the KL-ML-FS learning algorithm converges to a local optimal solution or a local saddle point.

IV. EXPERIMENTAL RESULTS

A. Setup

In order to evaluate the proposed KL-ML-FS and its learning algorithm, both synthetic and real-world datasets were adopted, as described in Sections IV-B and IV-C, respectively. Performance comparisons are reported from two aspects, namely, against the traditional RSDE-based ML-FS construction method (RSDE-ML-FS) and against regression methods designed for datasets with missing or noisy data.

For the first aspect, the comparison was set up as follows: 1) construct the ML-FS by RSDE-ML-FS based on the data in the reference scene, i.e., RSDE-ML-FS(D1); 2) construct the ML-FS by RSDE-ML-FS based on the data in the current scene, i.e., RSDE-ML-FS(D2); 3) construct the ML-FS by RSDE-ML-FS based on the data in both the current scene and the reference scene, i.e., RSDE-ML-FS(D1+D2); and 4) construct the ML-FS by using the proposed KL-ML-FS, i.e., KL-ML-FS(D2+Knowledge). With the four fuzzy systems obtained, the testing data of the current scene were used to evaluate generalization performance.

For the second aspect, the compared methods are 1) the TS-fuzzy-system-based support vector regression (TSFS-SVR) [40], 2) the fuzzy system learned through fuzzy clustering and support vector machine (FS-FCSVM) [39], and 3) the Bayesian task-level transfer learning method for nonlinear regression (HiRBF) [16].

The methods adopted for performance comparison are summarized in Table II, and the following performance index [30] is used:

J = \frac{\sum_{i=1}^{N} e^2(i)}{N \cdot y_r}   (13)

where N denotes the size of the testing dataset, e(i) = y_{target}(i) - y_{model}(i), and y_r = [\max(y_{target}) - \min(y_{target})]^2, with y_{target}(i) being the expected output for the ith testing data input and y_{model}(i) being the actual output of the fuzzy system for the ith testing data input. The smaller the value of J, the better the generalization performance.
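The index (13) can be computed directly; the following is a minimal Python/NumPy sketch (the names y_target and y_model are illustrative):

```python
import numpy as np

# Performance index J of (13): mean squared error normalized by the squared
# range of the expected outputs.  Smaller J means better generalization.

def perf_index(y_target, y_model):
    y_target = np.asarray(y_target, dtype=float)
    y_model = np.asarray(y_model, dtype=float)
    e = y_target - y_model                           # e(i) = y_target(i) - y_model(i)
    N = len(y_target)
    y_r = (y_target.max() - y_target.min()) ** 2     # squared output range
    return float(np.sum(e ** 2) / (N * y_r))
```

A perfect model gives J = 0; the normalization by y_r makes J comparable across output variables of different scales.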

In all our experiments, the attributes of the data were normalized into the interval [0, 1] before running the corresponding algorithms, and the outputs of the fuzzy systems were reverted to the original interval for computing the performance index. For the proposed algorithm, the user parameters for the maximum number of external iterations, the maximum number of internal iterations, and the convergence threshold were set to 5, 100, and 1e-7, respectively. For all the compared methods, the hyperparameters were determined by a cross-validation strategy on the training dataset. All algorithms were implemented in MATLAB on a computer with an Intel Core 2 Duo P8600 2.4-GHz CPU and 1 GB of RAM. For clarity, some notations and their definitions for the datasets are listed in Table III.
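The normalize-then-revert preprocessing described above can be sketched as a simple min-max round trip (in Python/NumPy rather than the paper's MATLAB; the function names are illustrative):

```python
import numpy as np

# Min-max preprocessing: attributes are normalized into [0, 1] before
# training, and model outputs are reverted to the original interval before
# computing the performance index.

def minmax_fit(X):
    """Per-attribute minimum and maximum of the training data."""
    X = np.asarray(X, dtype=float)
    return X.min(axis=0), X.max(axis=0)

def minmax_normalize(X, lo, hi):
    return (np.asarray(X, dtype=float) - lo) / (hi - lo)

def minmax_revert(Xn, lo, hi):
    return np.asarray(Xn, dtype=float) * (hi - lo) + lo
```

Note that lo and hi must come from the training data only, and the same pair is reused to revert the model outputs.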

B. On Synthetic Datasets

1) Construction of Synthetic Datasets: In order to generate the synthetic datasets used to simulate the scenarios discussed in this study, the following requirements need to be satisfied: 1) the reference scene should be related to the current scene, i.e., the reference and current scenes are different but related; and 2) part of the data of the current scene are not available or missing, i.e., the available data from the current scene are insufficient.

Based on the aforementioned requirements, we consider the function Y = f(x) = cos(x) · x, x ∈ [−8, 8], to describe the reference scene, which is used to generate the dataset of


TABLE II
METHODS ADOPTED FOR PERFORMANCE COMPARISON

TABLE III
NOTATIONS ABOUT THE ADOPTED DATASETS AND THEIR DEFINITIONS

TABLE IV
DETAILS ABOUT THE SYNTHETIC DATASETS

Fig. 3. Two functions for representing two different scenes with relation parameter r = 0.85 and the corresponding sampled data from these scenes. (a) Functions for representing the reference scene (Y) and the current scene (y). (b) Data of the reference scene and the training data of the current scene with intervals [−7.2, −5.6] and [0, 2] having missing data.

the reference scene (D1). On the other hand, the function y = r · f(x) = r · cos(x) · x, x ∈ [−8, 8], is used to describe the current scene and to generate the training dataset (D2) and testing dataset (D2_test) of the current scene. Here, r is a relation parameter between the reference scene and the current scene, which controls the similarity and difference between the two scenes; r = 1 means that the two scenes are identical. When the training set of the current scene is generated, we designate some intervals with missing data to simulate the lack of information (insufficient data), as shown in Table IV. Fig. 3(a) shows the two functions used to simulate the two related scenes with relation parameter r = 0.85, and Fig. 3(b) shows the dataset of the reference scene and the training set of the current scene with two different intervals having missing data under the same relation (i.e., r = 0.85).
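The synthetic-data setup above can be sketched as follows (Python/NumPy; the sample size, random grid, and seed are illustrative assumptions, not the paper's exact settings):

```python
import numpy as np

# Reference scene: Y = cos(x) * x on [-8, 8].  Current scene: y = r * cos(x) * x
# with relation parameter r.  The current-scene training set drops points
# inside the given "missing" intervals to simulate insufficient data.

def make_scenes(r=0.85, n=400, missing=((-7.2, -5.6), (0.0, 2.0)), seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-8.0, 8.0, size=n)
    D1 = (x, np.cos(x) * x)                  # reference-scene dataset
    keep = np.ones_like(x, dtype=bool)
    for lo, hi in missing:
        keep &= ~((x >= lo) & (x <= hi))     # drop points in missing intervals
    D2 = (x[keep], r * np.cos(x[keep]) * x[keep])  # current-scene training set
    return D1, D2

D1, D2 = make_scenes()
```

Setting r = 1 makes the two scenes identical; smaller r increases the drift that the knowledge-leverage mechanism must cope with.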

2) Comparing With Reduced Set Density Estimator-Based Mamdani–Larsen-Type Modeling: In this section, a comparison between the proposed method and the traditional RSDE-based ML-FS modeling methods is reported. The experimental results are recorded in Table V and Fig. 4, and based on them, the following observations can be made.

1) From Table V, we can see that the generalization performance of the knowledge-leverage-based fuzzy system KL-ML-FS is better than that of the traditional RSDE-ML-FS methods.

2) Fig. 4(a) shows the modeling results of the RSDE-ML-FS based on only the data of the reference scene. These results indicate that the generalization performance of the ML-FS obtained by RSDE-ML-FS is weak, as there usually exists a drift between the reference scene and the current scene.

3) Fig. 4(b) shows the modeling results of the RSDE-ML-FS based on only the data of the current scene. The results indicate that the generalization performance of the ML-FS obtained by RSDE-ML-FS is also much weaker for


TABLE V
GENERALIZATION PERFORMANCE J OF THE PROPOSED KL-ML-FS METHOD AND THE TRADITIONAL RSDE-BASED ML-FS CONSTRUCTION METHODS ON THE SYNTHETIC DATASETS

Fig. 4. Performance comparison of the proposed KL-ML-FS method and the traditional RSDE-based ML-FS construction methods on the synthetic datasets of Fig. 3(b). (a) RSDE-ML-FS (D1). (b) RSDE-ML-FS (D2). (c) RSDE-ML-FS (D1+D2). (d) KL-ML-FS (D2+Knowledge).

the current scene. An obvious reason is that the data in the training set are insufficient, which degrades the generalization capability of the obtained ML-FS. In particular, the prediction performance is obviously poor in the intervals with missing data in the training dataset.

4) Fig. 4(c) shows the modeling results of the RSDE-ML-FS based on the data of both the current scene and the reference scene. The results indicate that, although the data of both scenes have been used for training, the generalization performance of the obtained ML-FS is still not good enough for the current scene. This can be explained by two reasons. First, there exists a drift phenomenon between the reference and current scenes, i.e., not all data in the reference scene are useful for the modeling task of the current scene, and some of them may even have a negative influence. Second, the dataset of the reference scene is larger than that of the current scene, which makes the obtained ML-type fuzzy system more apt to approximate the reference scene than the current scene.

5) Fig. 4(d) shows the modeling results of the proposed KL-ML-FS. Comparing the result of KL-ML-FS with the results of RSDE-ML-FS, we can make the following observations. First, when comparing Fig. 4(a) and (d), we see that the KL-ML-FS demonstrates better prediction results

Fig. 5. Performance comparison of the proposed KL-ML-FS method and three related regression methods on the synthetic datasets of Fig. 3(b). (a) TSFS-SVR. (b) FS-FCSVM. (c) HiRBF. (d) The proposed KL-ML-FS.

than the RSDE-ML-FS, which only uses the data of the reference scene. Second, when comparing Fig. 4(b) and (d), it is easy to see that the KL-ML-FS has effectively remedied the deficiency of the ML-FS obtained by RSDE-ML-FS by introducing the knowledge-leverage mechanism. When comparing Fig. 4(c) and (d), we also find that the KL-ML-FS shows better generalization performance than the RSDE-ML-FS trained with the data of both the reference and current scenes. It is worth pointing out that the KL-ML-FS also has better privacy-protection capability than the methods that use the data of the reference scenes directly. In many situations, the data in the reference scenes are not available due to the necessity of privacy protection and only some knowledge is revealed; the methods that use the data of all scenes are then no longer feasible. Thus, the proposed KL-ML-FS is distinctive in these situations.

3) Comparing With Regression Methods Designed for Noisy and/or Missing Data: In this section, a comparison between the proposed method KL-ML-FS and three related regression methods designed for handling noisy/missing data is reported. The experimental results are shown in Table VI and Fig. 5. From these results, we can make the following observations.

1) The KL-ML-FS has shown better generalization performance than the other three related methods.


TABLE VI
GENERALIZATION PERFORMANCE J OF THE PROPOSED KL-ML-FS METHOD AND SEVERAL RELATED REGRESSION METHODS ON THE SYNTHETIC DATASETS

2) Fig. 5(a) and (b) shows that the support vector learning-based fuzzy modeling methods TSFS-SVR and FS-FCSVM have a better generalization performance to some extent. For example, although the data in the interval [0, 2] are missing, these two methods still show promising generalization capability in this interval. However, in the other interval with missing data, [−7.2, −5.6], these two methods cannot give acceptable generalization abilities.

3) Although the transfer learning-based method HiRBF uses the data of both the current scene and the reference scene in the training procedure, Fig. 5(c) shows that this method cannot effectively cope with the problem caused by the missing data and shows much weaker generalization ability in both intervals with missing data.

4) Fig. 5(d) shows that the proposed KL-ML-FS achieves acceptable generalization capability in the intervals with missing data, since it has effectively leveraged the useful knowledge from the reference scene to remedy the deficiency of insufficient data in the training procedure.

C. On Real-World Datasets

1) Glutamic Acid Fermentation Process Modeling: To further evaluate the performance of the proposed KL-FS learning method, an experiment was conducted to apply the proposed method to model a biochemical process [11]. The dataset adopted originates from the glutamic acid fermentation process and is a multiple-input-multiple-output dataset. The input variables of the dataset are the fermentation time h, glucose concentration S(h), thalli concentration X(h), glutamic acid concentration P(h), stirring speed R(h), and ventilation Q(h) at time h, where h = 0, 2, ..., 28. The output variables are the glucose concentration S(h+2), thalli concentration X(h+2), and glutamic acid concentration P(h+2) at the future time h+2. The ML-FS-based biochemical process prediction model is illustrated in Fig. 6. The data in this experiment were collected from 21 batch fermentation processes, with each batch containing 14 effective data samples. In order to match the situation discussed in this study, the data were split into two scenes, i.e., the reference scene and the current scene, as described in Table VII.

2) Comparing With Reduced Set Density Estimator-Based Mamdani–Larsen-Type Modeling: In this section, a comparison between the proposed method KL-ML-FS and the traditional RSDE-based ML-FS modeling method RSDE-ML-FS for fermentation process modeling is reported. The experimental results are given in Table VIII and Fig. 7. From these results, we can see that the observations are similar to

Fig. 6. Glutamic acid fermentation process prediction model based on ML-FS.

those obtained for the synthetic datasets. The modeling results of the KL-ML-FS are better than those of the traditional RSDE-ML-FS method. As the proposed method can learn not only from the data of the current scene but also from the useful knowledge of the reference scene, the obtained ML-FS demonstrates better adaptive abilities. From the experimental results, we find that even if the training data of the current scene are missing to some extent, the generalization capability of the ML-FS obtained by the proposed KL-ML-FS does not degrade significantly. This is very valuable for biochemical process modeling, since the lack of some sampling data often occurs due to the insensitivity of sensors in noisy environments.

3) Comparing With Regression Methods Designed for Noisy and/or Missing Data: In this section, a comparison between the proposed method KL-ML-FS and three related regression methods for fermentation process modeling is reported. The experimental results are shown in Table IX and Fig. 8. From these experimental results, we can make similar observations as in the previous experiments. In general, the proposed KL-ML-FS shows better generalization performance than the other three related methods for fermentation process modeling. This can be explained again by the fact that the proposed KL-ML-FS effectively leverages the useful knowledge from the reference scene in the training procedure, such that the influence of the missing data is properly reduced.

V. CONCLUSION

In this study, the concept of knowledge leverage is proposed and used to develop a KL-ML-FS for scenarios with insufficient data in the current scene and useful knowledge available from the reference scene. Accordingly, the KL-ML-FS and its learning algorithm are presented. The proposed algorithm can learn not only from the data of the current scene but also from the knowledge of the reference scene. Moreover, as only the knowledge is used and the data of the reference scene are not required by the training procedure of the fuzzy system in the current


TABLE VII
FERMENTATION PROCESS MODELING DATASETS

TABLE VIII
GENERALIZATION PERFORMANCE J OF THE PROPOSED KL-ML-FS METHOD AND THE TRADITIONAL RSDE-BASED ML-FS CONSTRUCTION METHODS FOR THE GLUTAMIC ACID FERMENTATION PROCESS MODELING

TABLE IX
GENERALIZATION PERFORMANCE J OF SEVERAL METHODS IN GLUTAMIC ACID FERMENTATION PROCESS MODELING

Fig. 7. Performance comparison between the proposed KL-ML-FS method and the traditional RSDE-based ML-FS construction methods for the fermentation process modeling. (a) Prediction results of S(h+2) for the 20th batch. (b) Prediction results of S(h+2) for the 21st batch. (c) Prediction results of X(h+2) for the 20th batch. (d) Prediction results of X(h+2) for the 21st batch. (e) Prediction results of P(h+2) for the 20th batch. (f) Prediction results of P(h+2) for the 21st batch.

Fig. 8. Performance comparison between the proposed KL-ML-FS method and three related regression methods for the fermentation process modeling. (a) Prediction results of S(h+2) for the 20th batch. (b) Prediction results of S(h+2) for the 21st batch. (c) Prediction results of X(h+2) for the 20th batch. (d) Prediction results of X(h+2) for the 21st batch. (e) Prediction results of P(h+2) for the 20th batch. (f) Prediction results of P(h+2) for the 21st batch.


scene, the proposed learning algorithm has a very good provision for privacy protection of the data in the reference scenes.

The experimental results demonstrate the attractiveness and effectiveness of the proposed method when compared with several existing related methods. Although the proposed fuzzy system and its learning algorithm are very promising, there are still further works that deserve in-depth study. For example, extending the knowledge-leverage capability to the other commonly used TSK fuzzy system should be very valuable. Besides, for the proposed knowledge-leverage-based ML-type fuzzy system and its learning algorithm, reducing the training time is also important for its application to large datasets. In the near future, we will try to address these issues.

APPENDIX

A. Derivations of (11-1)–(11-3) and (12-1)–(12-3)

For the objective function J_1 in (9), i.e.,

J_1 = -2\sum_{i=1}^{M}\sum_{j=1}^{M} p_i q_j G(u_{p,i}, u_{q,j}, \sigma_{p,i}^2 + \sigma_{q,j}^2)
    + (1+\lambda)\sum_{i=1}^{M}\sum_{j=1}^{M} q_i q_j G(u_{q,i}, u_{q,j}, \sigma_{q,i}^2 + \sigma_{q,j}^2)
    - 2\lambda\sum_{i=1}^{N}\sum_{j=1}^{M} \frac{1}{N} q_j G(x_i, u_{q,j}, \sigma_{q,j}^2)

it is partitioned into three parts in order to simplify the derivation:

J_{11} = -2\sum_{i=1}^{M}\sum_{j=1}^{M} p_i q_j G(u_{p,i}, u_{q,j}, \sigma_{p,i}^2 + \sigma_{q,j}^2)   (A1-1)

J_{12} = (1+\lambda)\sum_{i=1}^{M}\sum_{j=1}^{M} q_i q_j G(u_{q,i}, u_{q,j}, \sigma_{q,i}^2 + \sigma_{q,j}^2)   (A1-2)

J_{13} = -2\lambda\sum_{i=1}^{N}\sum_{j=1}^{M} \frac{1}{N} q_j G(x_i, u_{q,j}, \sigma_{q,j}^2).   (A1-3)

Thus, we have J_1 = J_{11} + J_{12} + J_{13} and obtain

\frac{\partial J_1}{\partial u_{q,j}} = \frac{\partial J_{11}}{\partial u_{q,j}} + \frac{\partial J_{12}}{\partial u_{q,j}} + \frac{\partial J_{13}}{\partial u_{q,j}}   (A2-1)

\frac{\partial J_1}{\partial \sigma_{q,j}} = \frac{\partial J_{11}}{\partial \sigma_{q,j}} + \frac{\partial J_{12}}{\partial \sigma_{q,j}} + \frac{\partial J_{13}}{\partial \sigma_{q,j}}.   (A2-2)

1) Derivation of (11-1)–(11-3): Let

G1_{ij} = G(u_{p,i}, u_{q,j}, \sigma_{p,i}^2 + \sigma_{q,j}^2)
        = (2\pi)^{-D/2} (\sigma_{p,i}^2 + \sigma_{q,j}^2)^{-D/2} \exp\!\left[-\frac{1}{2}(\sigma_{p,i}^2 + \sigma_{q,j}^2)^{-1}(u_{p,i} - u_{q,j})^T (u_{p,i} - u_{q,j})\right]   (A3-1)

G2_{ij} = G(u_{q,i}, u_{q,j}, \sigma_{q,i}^2 + \sigma_{q,j}^2)
        = (2\pi)^{-D/2} (\sigma_{q,i}^2 + \sigma_{q,j}^2)^{-D/2} \exp\!\left[-\frac{1}{2}(\sigma_{q,i}^2 + \sigma_{q,j}^2)^{-1}(u_{q,i} - u_{q,j})^T (u_{q,i} - u_{q,j})\right]   (A3-2)

G3_{ij} = G(x_i, u_{q,j}, \sigma_{q,j}^2)
        = (2\pi)^{-D/2} (\sigma_{q,j}^2)^{-D/2} \exp\!\left[-\frac{1}{2}(\sigma_{q,j}^2)^{-1}(x_i - u_{q,j})^T (x_i - u_{q,j})\right].   (A3-3)
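The D-dimensional isotropic Gaussian G(u, v, s) used throughout the appendix can be sketched directly (Python/NumPy; s is the scalar variance, e.g., s = \sigma_{p,i}^2 + \sigma_{q,j}^2):

```python
import numpy as np

# Isotropic D-dimensional Gaussian of (A3-1)-(A3-3):
# G(u, v, s) = (2*pi)^(-D/2) * s^(-D/2) * exp(-0.5 * (u - v)^T (u - v) / s)

def gauss(u, v, s):
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    D = u.size
    diff = u - v
    return (2 * np.pi) ** (-D / 2) * s ** (-D / 2) * np.exp(-0.5 * (diff @ diff) / s)
```

Note that G is symmetric in u and v, which is what makes the pairwise sums over G1, G2, and G3 straightforward to differentiate.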

We have

\frac{\partial J_{11}}{\partial u_{q,j}} = -2\sum_{i=1}^{M} p_i q_j G1_{ij} (\sigma_{p,i}^2 + \sigma_{q,j}^2)^{-1}(u_{p,i} - u_{q,j})   (A4-1)

\frac{\partial J_{12}}{\partial u_{q,j}} = (1+\lambda)\sum_{i=1, i\neq j}^{M} q_i q_j \left[2\, G2_{ij} (\sigma_{q,i}^2 + \sigma_{q,j}^2)^{-1}(u_{q,i} - u_{q,j})\right]   (A4-2)

\frac{\partial J_{13}}{\partial u_{q,j}} = -2\lambda\sum_{i=1}^{N} \frac{1}{N} q_j \frac{\partial G3_{ij}}{\partial u_{q,j}}
                                        = -2\lambda\sum_{i=1}^{N} \frac{1}{N} q_j G3_{ij} (\sigma_{q,j}^2)^{-1}(x_i - u_{q,j}).   (A4-3)

Substituting (A4-1)–(A4-3) into (A2-1) and setting \partial J_1/\partial u_{q,j} = \partial J_{11}/\partial u_{q,j} + \partial J_{12}/\partial u_{q,j} + \partial J_{13}/\partial u_{q,j} = 0, we obtain the following necessary condition for minimizing (9) when q_j and \sigma_{q,j} are fixed:

u_{q,j} = b/a   (A5-1)

where

a = -\sum_{i=1}^{M} p_i q_j G1_{ij} (\sigma_{p,i}^2 + \sigma_{q,j}^2)^{-1}
  + (1+\lambda)\sum_{i=1, i\neq j}^{M} q_i q_j G2_{ij} (\sigma_{q,i}^2 + \sigma_{q,j}^2)^{-1}
  - \lambda\sum_{i=1}^{N} \frac{1}{N} q_j G3_{ij} (\sigma_{q,j}^2)^{-1}   (A5-2)

b = -\sum_{i=1}^{M} p_i q_j G1_{ij} (\sigma_{p,i}^2 + \sigma_{q,j}^2)^{-1} u_{p,i}
  + (1+\lambda)\sum_{i=1, i\neq j}^{M} q_i q_j G2_{ij} (\sigma_{q,i}^2 + \sigma_{q,j}^2)^{-1} u_{q,i}
  - \lambda\sum_{i=1}^{N} \frac{1}{N} q_j G3_{ij} (\sigma_{q,j}^2)^{-1} x_i.   (A5-3)

Here, (A5-1)–(A5-3) are just (11-1)–(11-3) in the text.
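The closed-form center update (A5-1)–(A5-3) can be sketched in vectorized form (Python/NumPy rather than the paper's MATLAB). It assumes the Gaussian values G1, G2, G3 for the fixed index j have already been evaluated; the array shapes and names are illustrative assumptions:

```python
import numpy as np

# Closed-form update u_{q,j} = b / a of (A5-1)-(A5-3).
# Shapes: p, q are length-M weight vectors; up, uq are (M, D) center arrays;
# x is (N, D) data; sp2, sq2 are length-M variance vectors; G1, G2 are the
# length-M Gaussian values for the fixed j; G3 is length-N.

def update_center(j, p, q, up, uq, x, sp2, sq2, G1, G2, G3, lam):
    N = len(x)
    w1 = p * q[j] * G1 / (sp2 + sq2[j])      # per-i weights from the J11 term
    w2 = q * q[j] * G2 / (sq2 + sq2[j])      # per-i weights from the J12 term
    w2[j] = 0.0                              # the J12 sum excludes i == j
    w3 = (q[j] / N) * G3 / sq2[j]            # per-i weights from the J13 term
    a = -w1.sum() + (1 + lam) * w2.sum() - lam * w3.sum()
    b = -(w1 @ up) + (1 + lam) * (w2 @ uq) - lam * (w3 @ x)
    return b / a
```

As a sanity check, when the J11 and J12 terms vanish, the update reduces to a G3-weighted mean of the data points, which matches the structure of (A5-2) and (A5-3).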


2) Derivation of (12-1) and (12-2): According to (A1-1)–(A1-3) and (A3-1)–(A3-3), we have

\frac{\partial J_{11}}{\partial \sigma_{q,j}} = -2\sum_{i=1}^{M} p_i q_j \frac{\partial G1_{ij}}{\partial \sigma_{q,j}}
 = 2\sum_{i=1}^{M} p_i q_j G1_{ij}\left[D(\sigma_{p,i}^2 + \sigma_{q,j}^2)^{-1} - (\sigma_{p,i}^2 + \sigma_{q,j}^2)^{-2}(u_{p,i} - u_{q,j})^T(u_{p,i} - u_{q,j})\right]\sigma_{q,j}   (A6-1)

\frac{\partial J_{12}}{\partial \sigma_{q,j}} = (1+\lambda)\left\{-2\sum_{i=1, i\neq j}^{M} q_i q_j G2_{ij}\left[D(\sigma_{q,i}^2 + \sigma_{q,j}^2)^{-1} - (\sigma_{q,i}^2 + \sigma_{q,j}^2)^{-2}(u_{q,i} - u_{q,j})^T(u_{q,i} - u_{q,j})\right]\sigma_{q,j} - 2 q_j^2 G2_{jj} D (2\sigma_{q,j}^2)^{-1}\sigma_{q,j}\right\}   (A6-2)

\frac{\partial J_{13}}{\partial \sigma_{q,j}} = 2\lambda\sum_{i=1}^{N} \frac{1}{N} q_j G3_{ij}\left[D(\sigma_{q,j}^2)^{-1} - (\sigma_{q,j}^2)^{-2}(x_i - u_{q,j})^T(x_i - u_{q,j})\right]\sigma_{q,j}.   (A6-3)

Substituting (A6-1)–(A6-3) into (A2-2) and setting \partial J_1/\partial \sigma_{q,j} = 0, the necessary conditions for minimizing (9) with q_j and u_{q,j} fixed are

\frac{\partial J_1}{\partial \sigma_{q,j}} = 0   (A7)

with

\frac{\partial J_1}{\partial \sigma_{q,j}} = 2\sum_{i=1}^{M} p_i q_j G1_{ij}\left[D(\sigma_{p,i}^2 + \sigma_{q,j}^2)^{-1} - (\sigma_{p,i}^2 + \sigma_{q,j}^2)^{-2}(u_{p,i} - u_{q,j})^T(u_{p,i} - u_{q,j})\right]\sigma_{q,j}
 + (1+\lambda)\left\{-2\sum_{i=1, i\neq j}^{M} q_i q_j G2_{ij}\left[D(\sigma_{q,i}^2 + \sigma_{q,j}^2)^{-1} - (\sigma_{q,i}^2 + \sigma_{q,j}^2)^{-2}(u_{q,i} - u_{q,j})^T(u_{q,i} - u_{q,j})\right]\sigma_{q,j} - 2 q_j^2 G2_{jj} D (2\sigma_{q,j}^2)^{-1}\sigma_{q,j}\right\}
 + 2\lambda\sum_{i=1}^{N} \frac{1}{N} q_j G3_{ij}\left[D(\sigma_{q,j}^2)^{-1} - (\sigma_{q,j}^2)^{-2}(x_i - u_{q,j})^T(x_i - u_{q,j})\right]\sigma_{q,j}.   (A8)

Since it is difficult to obtain an analytical solution of (A7) for \sigma_{q,j}, the following gradient descent learning rule can be used to optimize \sigma_{q,j}:

\sigma_{q,j}(t+1) = \sigma_{q,j}(t) - \eta\, \frac{\partial J_1}{\partial \sigma_{q,j}(t)}.   (A9)

Equations (A7)–(A9) are just (12-1)–(12-3) in the text.
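The rule (A9) is ordinary gradient descent in a single scalar; the following sketch demonstrates it on a toy one-dimensional objective standing in for J_1(\sigma_{q,j}). The objective, step size, and iteration count are illustrative assumptions:

```python
# Gradient-descent rule of (A9): sigma(t+1) = sigma(t) - eta * dJ1/dsigma.

def descend(grad, s0, eta=0.1, iters=50):
    s = s0
    for _ in range(iters):
        s = s - eta * grad(s)   # the update rule of (A9)
    return s

# Toy objective f(s) = (s - 2)^2 with gradient 2(s - 2); the iterates
# approach the minimizer s = 2 for this step size.
s_star = descend(lambda s: 2.0 * (s - 2.0), s0=5.0)
```

For the actual J_1, the gradient is the full expression (A8), and the step size \eta must be chosen small enough that the sub-step does not increase J_1, which is what the convergence argument of Section III relies on.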

ACKNOWLEDGMENT

The authors would like to thank the reviewers for their valuable comments, which have greatly improved the quality of the manuscript in many ways.

REFERENCES

[1] X. Liao, Y. Xue, and L. Carin, "Logistic regression with an auxiliary data source," in Proc. 21st Int. Conf. Mach. Learn., Aug. 2005, pp. 505–512.
[2] J. Huang, A. Smola, A. Gretton, K. M. Borgwardt, and B. Scholkopf, "Correcting sample selection bias by unlabeled data," in Proc. 19th Annu. Conf. Neural Inf. Process. Syst., 2007.
[3] S. Bickel, M. Bruckner, and T. Scheffer, "Discriminative learning for differing training and test distributions," in Proc. 24th Int. Conf. Mach. Learn., 2007, pp. 81–88.
[4] M. Sugiyama, S. Nakajima, H. Kashima, P. V. Buenau, and M. Kawanabe, "Direct importance estimation with model selection and its application to covariate shift adaptation," in Proc. 20th Annu. Conf. Neural Inf. Process. Syst., Dec. 2008, pp. 1433–1440.
[5] N. D. Lawrence and J. C. Platt, "Learning to learn with the informative vector machine," in Proc. 21st Int. Conf. Mach. Learn., Jul. 2004, pp. 512–519.
[6] A. Schwaighofer, V. Tresp, and K. Yu, "Learning Gaussian process kernels via hierarchical Bayes," in Proc. 17th Annu. Conf. Neural Inf. Process. Syst., 2005, pp. 1209–1216.
[7] J. Gao, W. Fan, J. Jiang, and J. Han, "Knowledge transfer via multiple model local structure mapping," in Proc. 14th ACM SIGKDD Int. Conf. Knowl. Discovery Data Min., Aug. 2008, pp. 283–291.
[8] L. Mihalkova, T. Huynh, and R. J. Mooney, "Mapping and revising Markov logic networks for transfer learning," in Proc. 22nd Assoc. Adv. Artif. Intell. Conf., Jul. 2007, pp. 608–614.
[9] L. Mihalkova and R. J. Mooney, "Transfer learning by mapping with minimal target data," in Proc. Assoc. Adv. Artif. Intell. Workshop Transfer Learning for Complex Tasks, Jul. 2008.
[10] J. Davis and P. Domingos, "Deep transfer via second-order Markov logic," in Proc. Assoc. Adv. Artif. Intell. Workshop Transfer Learning for Complex Tasks, Jul. 2008.
[11] Z. H. Deng, K. S. Choi, F. L. Chung, and S. T. Wang, "Scalable TSK fuzzy modeling for very large datasets using minimal-enclosing-ball approximation," IEEE Trans. Fuzzy Syst., vol. 19, no. 2, pp. 210–226, Apr. 2011.
[12] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, "Domain adaptation via transfer component analysis," IEEE Trans. Neural Netw., vol. 22, no. 2, pp. 199–210, Feb. 2011.
[13] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[14] W. Dai, Q. Yang, G. Xue, and Y. Yu, "Self-taught clustering," in Proc. 25th Int. Conf. Mach. Learn., Jul. 2008, pp. 200–207.
[15] Z. Wang, Y. Song, and C. Zhang, "Transferred dimensionality reduction," in Proc. Eur. Conf. Mach. Learn. KDD, Sep. 2008, pp. 550–565.
[16] P. Yang, Q. Tan, and Y. Ding, "Bayesian task-level transfer learning for non-linear regression," in Proc. Int. Conf. Comput. Sci. Software Eng., Dec. 2008, pp. 62–65.
[17] L. Borzemski and G. Starczewski, "Application of transfer regression to TCP throughput prediction," in Proc. 1st Asian Conf. Intell. Inf. Database Syst., Apr. 2009, pp. 28–33.
[18] W. Mao, G. Yan, J. Bai, and H. Li, "Regression transfer learning based on principal curve," in Advances in Neural Networks (Lecture Notes in Computer Science 6063), New York, 2010, pp. 365–372.
[19] J. Liu, Y. Chen, and Y. Zhang, "Transfer regression model for indoor 3D location estimation," in Proc. 16th Int. Conf. Adv. Multimedia Modeling (Lecture Notes in Computer Science 5916), 2010, pp. 603–613.
[20] D. Pardoe and P. Stone, "Boosting for regression transfer," in Proc. Int. Conf. Mach. Learn., 2010, pp. 863–870.
[21] J. M. Mendel, Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions. Upper Saddle River, NJ: Prentice-Hall, 2001.
[22] J. C. Bezdek, J. Keller, and R. Krishnapuram, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing. San Francisco, CA: Kluwer, 1999.
[23] S. T. Wang, Neural-Fuzzy Systems and Their Application. Beijing, China: Beijing Aeronautical Univ. Press, 1998.
[24] L. X. Wang, Adaptive Fuzzy Systems and Control: Design and Stability Analysis. Upper Saddle River, NJ: Prentice-Hall, 1994.
[25] J. S. R. Jang, C. T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing. Upper Saddle River, NJ: Prentice-Hall, 1997.
[26] E. H. Mamdani, "Application of fuzzy logic to approximate reasoning using linguistic synthesis," IEEE Trans. Comput., vol. C-26, no. 12, pp. 1182–1191, Dec. 1977.
[27] P. M. Larsen, "Industrial applications of fuzzy logic control," Int. J. Man-Mach. Stud., vol. 12, no. 1, pp. 3–10, 1980.
[28] T. Takagi and M. Sugeno, "Fuzzy identification of systems and its application to modeling and control," IEEE Trans. Syst. Man Cybern., vol. SMC-15, no. 1, pp. 116–132, 1985.
[29] M. F. Azeem, M. Hanmandlu, and N. Ahmad, "Generalization of adaptive neural-fuzzy inference systems," IEEE Trans. Neural Netw., vol. 11, no. 6, pp. 1332–1346, Nov. 2000.
[30] M. T. Gan, M. Hanmandlu, and A. H. Tan, "From a Gaussian mixture model to additive fuzzy systems," IEEE Trans. Fuzzy Syst., vol. 13, no. 3, pp. 303–316, Jun. 2005.
[31] M. Girolami and C. He, "Probability density estimation from optimally condensed data samples," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 10, pp. 1253–1264, Oct. 2003.
[32] Z. H. Deng, F. L. Chung, and S. T. Wang, "FRSDE: Fast reduced set density estimator using minimal enclosing ball," Pattern Recognit., vol. 41, no. 4, pp. 1363–1372, 2008.
[33] F. L. Chung, Z. H. Deng, and S. T. Wang, "From minimum enclosing ball to fast fuzzy inference system training on large datasets," IEEE Trans. Fuzzy Syst., vol. 17, no. 1, pp. 173–184, Feb. 2009.
[34] Z. H. Deng and S. T. Wang, "Reduced set density estimation based ML fuzzy inference systems construction," J. Jiangnan Univ. (Nat. Sci. Ed.), vol. 9, no. 1, pp. 1–6, 2010.
[35] R. E. Fan, P. H. Chen, and C. J. Lin, "Working set selection using second order information for training support vector machines," J. Mach. Learn. Res., vol. 6, pp. 1889–1918, 2005.
[36] P. H. Chen, R. E. Fan, and C. J. Lin, "A study on SMO-type decomposition methods for support vector machines," IEEE Trans. Neural Netw., vol. 17, no. 4, pp. 893–908, Jul. 2006.
[37] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, and K. Murthy, "Improvements to Platt's SMO algorithm for SVM classifier design," Neural Comput., vol. 13, pp. 637–649, 2001.
[38] C. J. Lin, "Asymptotic convergence of an SMO algorithm without any assumptions," IEEE Trans. Neural Netw., vol. 13, no. 1, pp. 248–250, Jan. 2002.
[39] C. F. Juang, S. H. Chiu, and S. J. Shiu, "Fuzzy system learned through fuzzy clustering and support vector machine for human skin color segmentation," IEEE Trans. Syst. Man Cybern., vol. 37, no. 6, pp. 1077–1087, Nov. 2007.
[40] C. F. Juang and C. D. Hsieh, "TS-fuzzy system-based support vector regression," Fuzzy Sets Syst., vol. 160, no. 17, pp. 2486–2504, 2009.
[41] R. Alcala, M. J. Gacto, and F. Herrera, "A fast and scalable multiobjective genetic fuzzy system for linguistic fuzzy modeling in high-dimensional regression problems," IEEE Trans. Fuzzy Syst., vol. 19, no. 4, pp. 666–681, Aug. 2011.
[42] A. Lemos, W. Caminhas, and F. Gomide, "Multivariable Gaussian evolving fuzzy modeling system," IEEE Trans. Fuzzy Syst., vol. 19, no. 1, pp. 91–104, Feb. 2011.
[43] Z. Chen, S. Aghakhani, J. Man, and S. Dick, "ANCFIS: A neurofuzzy architecture employing complex fuzzy sets," IEEE Trans. Fuzzy Syst., vol. 19, no. 2, pp. 305–322, Apr. 2011.

Zhaohong Deng (M’12) received the B.S. degree in physics from Fuyang Normal College, Fuyang, China, in 2002 and the Ph.D. degree in light industry information technology and engineering from Jiangnan University, Wuxi, China, in 2008.

He is currently an Associate Professor with the School of Digital Media, Jiangnan University. His current research interests include neurofuzzy systems and pattern recognition and their applications. He is the author or coauthor of more than 40 research papers in international/national journals.

Yizhang Jiang (M’12) is working toward the Ph.D. degree with the School of Digital Media, Jiangnan University, Wuxi, China.

His research interests include pattern recognition and intelligent computation and their applications.

Fu-lai Chung (M’95) received the B.Sc. degree from the University of Manitoba, Winnipeg, MB, Canada, in 1987 and the M.Phil. and Ph.D. degrees from the Chinese University of Hong Kong, Hong Kong, in 1991 and 1995, respectively. In 1994, he joined the Department of Computing, Hong Kong Polytechnic University, where he is currently an Associate Professor.

He has authored or coauthored over 80 journal papers in the areas of soft computing, data mining, machine intelligence, and multimedia. His current research interests include transfer learning, social network analysis and mining, kernel learning, dimensionality reduction, and big data learning.

Hisao Ishibuchi received the B.S., M.S., and Ph.D. degrees in industrial engineering from Osaka Prefecture University, Osaka, Japan.

Since 1999, he has been a Full Professor with Osaka Prefecture University. His research interests include artificial intelligence, neural fuzzy systems, and data mining.

Dr. Ishibuchi is on the editorial boards of several journals, including the IEEE TRANSACTIONS ON FUZZY SYSTEMS and the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS.

Shitong Wang received the M.S. degree in computer science from Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 1987.

He visited London University and Bristol University in the U.K., Hiroshima International University in Japan, the Hong Kong University of Science and Technology, and Hong Kong Polytechnic University as a Research Scientist for over six years. He is currently a Full Professor with the School of Digital Media, Jiangnan University, Wuxi, China. His research interests include artificial intelligence, neurofuzzy systems, pattern recognition, and image processing. He has published more than 80 papers in international/national journals and has authored seven books.