
Scalable TSK Fuzzy Modeling for Very Large Datasets Using Minimal-Enclosing-Ball Approximation


Zhaohong Deng, Kup-Sze Choi, Member, IEEE, Fu-Lai Chung, Member, IEEE, and Shitong Wang

Abstract—In order to overcome the difficulty in Takagi–Sugeno–Kang (TSK) fuzzy modeling for large datasets, scalable TSK (STSK) fuzzy-model training is investigated in this study based on the core-set-based minimal-enclosing-ball (MEB) approximation technique. The specified L2-norm penalty-based ε-insensitive criterion is first proposed for TSK-model training, and it is found that such TSK fuzzy-model training can be equivalently expressed as a center-constrained MEB problem. With this finding, an STSK fuzzy-model-training algorithm, which is called STSK, for large or very large datasets is then proposed by using the core-set-based MEB-approximation technique. The proposed algorithm has two distinctive advantages over classical TSK fuzzy-model training algorithms: The maximum space complexity for training is not reliant on the size of the training dataset, and the maximum time complexity for training is linear with the size of the training dataset, as confirmed by extensive experiments on both synthetic and real-world regression datasets.

Index Terms—Core set, core vector machine (CVM), ε-insensitive training, minimal-enclosing-ball (MEB) approximation, Takagi–Sugeno–Kang (TSK) fuzzy modeling, very large datasets.

I. INTRODUCTION

TAKAGI–Sugeno–Kang (TSK) fuzzy modeling is one of the most important fuzzy-modeling methods and has been extensively studied in the past decades [22], [24]. While this method has been investigated in depth for small- and middle-size datasets [1]–[5], the development of a scalable and fast training algorithm for very large datasets, where the number of samples is usually greater than 10 000 or even up to 1 million, remains a critical challenge due to the time and space complexities involved.

Manuscript received November 21, 2009; revised April 20, 2010 and August 13, 2010; accepted November 1, 2010. Date of publication November 11, 2010; date of current version April 4, 2011. This work was supported in part by the Hong Kong Polytechnic University under Grant Z-08R and Grant 1-ZV6C, in part by the National Natural Science Foundation of China under Grant 60773206, Grant 60903100, Grant 60975027, and Grant 90820002, in part by the Natural Science Foundation of Jiangsu Province under Grant BK2009067, in part by the Doctoral Foundation of Jiangnan University, and in part by the Research Grants Council of the Hong Kong Special Administrative Region under Project 5147/06E and Project 5152/09E.

Z. H. Deng and S. T. Wang are with the School of Information Technology, Jiangnan University, Wuxi 214122, China (e-mail: [email protected]; [email protected]).

K.-S. Choi is with the School of Nursing, The Hong Kong Polytechnic University, Kowloon, Hong Kong (e-mail: [email protected]).

F.-L. Chung is with the Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TFUZZ.2010.2091961

For example, the training time increases sharply when the TSK fuzzy model is trained with the adaptive-network-based fuzzy-inference system (ANFIS) [2] to handle large datasets. The ε-insensitive TSK fuzzy-model training algorithms [14] usually fail for large datasets due to their high space complexity. Thus, the development of an effective TSK fuzzy-model training method for large datasets is very important for broadening the applications of the TSK fuzzy model. In this study, we address the issue based on a recent and promising advance in machine learning for very large datasets, i.e., the core-set-based minimal-enclosing-ball (MEB) approximation technique [6], [13].

The MEB-approximation technique has attracted considerable attention in the field of machine learning due to its distinct advantages in developing scalable algorithms [6]–[13], [19]. Tsang et al. showed that, by introducing the kernel trick, some kernel methods can be equivalently formulated as MEB problems, and a core vector machine (CVM) was proposed for support-vector-machine (SVM) training on large datasets by using the core-set-based MEB-approximation technique [7]. They also extended the MEB problem to the center-constrained MEB (CC-MEB) problem and established connections bridging several other kernel methods, such as L2-norm-based support-vector regression (L2-SVR) and imbalanced L2-norm-based support-vector classification (L2-SVC). These kernel methods can then be efficiently realized for very large datasets by using the fast core-set-based MEB-approximation algorithm [8]. Deng et al. [9] revealed the connection between reduced set density estimation (RSDE) and CC-MEB and proposed the fast reduced set density estimator (FRSDE) for large datasets. Based on this advance, Chung et al. [10] first identified the relationship between CC-MEB and the Mamdani–Larsen (ML) FIS [22], [23] and proposed the scalable ML fuzzy-model training algorithm, i.e., fast ML-FIS. In addition, the core-set-based MEB-approximation technique has been used for feature extraction and clustering [11], [12]. As reported in [6]–[13], the algorithms based on fast MEB approximation show the following advantages: 1) The upper bound on their space complexity is independent of the size of the training dataset, and 2) the upper bound on their time complexity is linear with the size of the training dataset.

In this study, the core-set-based MEB-approximation technique is introduced to develop a scalable TSK (STSK) fuzzy-model training algorithm for very large datasets. First, based on the ε-insensitive training of the TSK fuzzy model [14], a modified objective function for TSK fuzzy-model training is proposed by introducing the L2-norm penalty term. Then, the fact that the L2-norm penalty-based ε-insensitive training of the TSK fuzzy model can be equivalently expressed as a CC-MEB problem is proved. With this finding, the fast core-set-based MEB-approximation algorithm, i.e., CVM [7], [8], is adopted to implement the TSK fuzzy-model training, and the corresponding STSK fuzzy-model training algorithm, i.e., STSK, is obtained. As a machine-learning algorithm based on fast MEB approximation, the proposed STSK algorithm also inherits the advantages stated above.



The rest of this paper is organized as follows. The related concepts of TSK fuzzy modeling and MEB are reviewed in Section II. In Section III, the modified ε-insensitive TSK fuzzy-model training is first proposed, and the connection bridging CC-MEB and the TSK fuzzy-model training is then established. In Section IV, an STSK training algorithm is proposed for large and very large datasets by using the findings in Section III. The experimental results on synthetic and real-world datasets are reported in Section V. Finally, conclusions and the potential of the proposed method are given in the last section. A list of abbreviations used in this paper is provided in Appendix B to enhance readability.

II. RELATED WORK

In this section, the TSK fuzzy system and the classical objective criteria used for model training are described. The MEB technique that will be used to develop the STSK fuzzy-model training algorithm is also introduced briefly.

A. Takagi–Sugeno–Kang Fuzzy System

The TSK FIS is one of the most important fuzzy models for modeling and intelligent control [1]–[5]. The training of the TSK FIS can be considered as a linear-regression problem or a quadratic-programming optimization problem, as briefly discussed in the following sections.

1) Takagi–Sugeno–Kang Fuzzy System and Linear Regression: For TSK FISs, the most commonly used fuzzy-inference rules are defined as follows.

TSK Fuzzy Rule $R^k$:

$$\text{IF } x_1 \text{ is } A_1^k \wedge x_2 \text{ is } A_2^k \wedge \cdots \wedge x_d \text{ is } A_d^k$$
$$\text{THEN } f_k(\mathbf{x}) = p_0^k + p_1^k x_1 + \cdots + p_d^k x_d, \quad k = 1, \ldots, K. \tag{1}$$

Here, $A_i^k$ is the fuzzy subset associated with the input variable $x_i$ for the $k$th rule, $K$ is the number of fuzzy rules, and $\wedge$ is a fuzzy conjunction operator. Each rule is premised on the input vector $\mathbf{x} = [x_1, x_2, \ldots, x_d]^T$ and maps the fuzzy sets in the input space $A^k \subset R^d$ to a varying singleton denoted by $f_k(\mathbf{x})$. When multiplicative conjunction is employed as the conjunction operator, multiplicative implication as the implication operator, and additive disjunction as the disjunction operator, the output of the TSK fuzzy model can be formulated as

$$y^o = \sum_{k=1}^{K} \frac{\mu^k(\mathbf{x})}{\sum_{k'=1}^{K} \mu^{k'}(\mathbf{x})} f_k(\mathbf{x}) = \sum_{k=1}^{K} \tilde{\mu}^k(\mathbf{x}) f_k(\mathbf{x}) \tag{2a}$$

where $\mu^k(\mathbf{x})$ and $\tilde{\mu}^k(\mathbf{x})$ denote the fuzzy-membership function and the normalized fuzzy-membership function associated with the fuzzy set $A^k$, respectively. These two functions can be calculated by using

$$\mu^k(\mathbf{x}) = \prod_{i=1}^{d} \mu_{A_i^k}(x_i) \tag{2b}$$

and

$$\tilde{\mu}^k(\mathbf{x}) = \mu^k(\mathbf{x}) \Big/ \sum_{k'=1}^{K} \mu^{k'}(\mathbf{x}). \tag{2c}$$

A commonly used fuzzy-membership function is the Gaussian membership function with the following formulation:

$$\mu_{A_i^k}(x_i) = \exp\left(\frac{-(x_i - c_i^k)^2}{2\delta_i^k}\right) \tag{2d}$$

where the parameters $c_i^k$ and $\delta_i^k$ can be estimated by a clustering technique or other partition methods. For example, with fuzzy c-means (FCM) clustering, $c_i^k$ and $\delta_i^k$ can be estimated as follows:

$$c_i^k = \sum_{j=1}^{N} u_{jk} x_{ji} \Big/ \sum_{j=1}^{N} u_{jk} \tag{2e}$$

$$\delta_i^k = h \cdot \sum_{j=1}^{N} u_{jk} (x_{ji} - c_i^k)^2 \Big/ \sum_{j=1}^{N} u_{jk} \tag{2f}$$

where $u_{jk}$ denotes the fuzzy membership of the $j$th input datum $\mathbf{x}_j = (x_{j1}, \ldots, x_{jd})^T$ belonging to the $k$th cluster obtained by FCM clustering [14]. Here, $h$ is a scale parameter and can be adjusted manually.
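To make the antecedent construction concrete, the following is a minimal NumPy sketch of (2b)–(2f), assuming the FCM memberships U are already available; the function names are illustrative, not from the paper:

```python
import numpy as np

def antecedent_params(X, U, h=1.0):
    """Estimate Gaussian antecedent centers c[k, i] and widths delta[k, i]
    from FCM memberships, following (2e) and (2f).
    X: (N, d) inputs; U: (N, K) memberships u_jk; h: manual scale parameter."""
    w = U / U.sum(axis=0)                  # column-normalized weights, (N, K)
    c = w.T @ X                            # (K, d) weighted means, eq. (2e)
    delta = h * np.stack([w[:, k] @ (X - c[k]) ** 2 for k in range(U.shape[1])])
    return c, delta                        # delta follows eq. (2f), shape (K, d)

def firing_levels(X, c, delta):
    """Normalized firing levels mu~_k(x) of (2b)-(2d): products of Gaussians."""
    d2 = (X[:, None, :] - c[None, :, :]) ** 2                   # (N, K, d)
    mu = np.exp(-d2 / (2.0 * delta[None, :, :])).prod(axis=2)   # eqs. (2b), (2d)
    return mu / mu.sum(axis=1, keepdims=True)                   # eq. (2c)
```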

When the premise of the TSK fuzzy model is determined, let

$$\mathbf{x}_e = (1, \mathbf{x}^T)^T \tag{3a}$$
$$\tilde{\mathbf{x}}^k = \tilde{\mu}^k(\mathbf{x}) \mathbf{x}_e \tag{3b}$$
$$\mathbf{x}_g = ((\tilde{\mathbf{x}}^1)^T, (\tilde{\mathbf{x}}^2)^T, \ldots, (\tilde{\mathbf{x}}^K)^T)^T \tag{3c}$$
$$\mathbf{p}^k = (p_0^k, p_1^k, \ldots, p_d^k)^T \tag{3d}$$
$$\mathbf{p}_g = ((\mathbf{p}^1)^T, (\mathbf{p}^2)^T, \ldots, (\mathbf{p}^K)^T)^T \tag{3e}$$

then (2a) can be formulated as the following linear-regression problem [14]:

$$y^o = \mathbf{p}_g^T \mathbf{x}_g. \tag{3f}$$

Thus, the problem of TSK fuzzy-model training can be transformed into the learning of the parameters in the corresponding linear-regression model [2], [14].
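Continuing the sketch above, the mapping of (3a)–(3c) from an input batch to the regression vectors $\mathbf{x}_g$ could look as follows:

```python
def build_xg(X, mu_tilde):
    """Stack the regression vectors of (3a)-(3c): x_g concatenates
    mu~_k(x) * (1, x^T)^T over the K rules, giving K*(d+1) features."""
    Xe = np.hstack([np.ones((X.shape[0], 1)), X])       # (N, d+1), eq. (3a)
    return (mu_tilde[:, :, None] * Xe[:, None, :]).reshape(len(X), -1)
```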

2) ε-Insensitive Takagi–Sugeno–Kang Fuzzy-Model Training With L1-Norm Penalty Term: Given a training dataset $D_{tr} = \{\mathbf{x}_i, y_i\}$, $\mathbf{x}_i \in R^d$, $y_i \in R$, $i = 1, \ldots, N$, for fixed antecedents obtained via clustering of the input space or other partition techniques, the least-squares (LS) solution to the consequent parameters is to minimize the following criterion function [14], [15]:

$$\min_{\mathbf{p}_g} E = \sum_{i=1}^{N} \left[\mathbf{p}_g^T \mathbf{x}_{gi} - y_i\right]^2 = (\mathbf{y} - \mathbf{X}_g \mathbf{p}_g)^T (\mathbf{y} - \mathbf{X}_g \mathbf{p}_g) \tag{4}$$


where $\mathbf{X}_g = [\mathbf{x}_{g1}, \ldots, \mathbf{x}_{gN}]^T \in R^{N \times K(d+1)}$ and $\mathbf{y} = [y_1, \ldots, y_N]^T \in R^N$.
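For fixed antecedents, (4) is then an ordinary linear least-squares problem; a one-line sketch (helper name assumed):

```python
def ls_consequents(Xg, y):
    """Least-squares solution of (4) for the consequent vector p_g."""
    return np.linalg.lstsq(Xg, y, rcond=None)[0]
```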

The most popular LS-criterion-based TSK fuzzy-model training algorithm is perhaps the ANFIS method [2]. However, it is inefficient when applied to very large datasets since the time and space complexities increase sharply. Further, LS-criterion-based training algorithms usually have weak robustness for noisy datasets when the size of the training dataset is small [14].

In addition to the LS criterion, another important criterion for TSK fuzzy-model training is the ε-insensitive criterion [14]. Given a scalar $g$ and a vector $\mathbf{g} = [g_1, \ldots, g_d]^T$, the corresponding ε-insensitive loss functions take the following forms, respectively [14], [26]:

$$|g|_\varepsilon = \begin{cases} |g| - \varepsilon, & |g| > \varepsilon \\ 0, & |g| \leq \varepsilon \end{cases} \tag{5}$$

and

$$|\mathbf{g}|_\varepsilon = \sum_{i=1}^{d} |g_i|_\varepsilon. \tag{6}$$

For the linear-regression problem of the TSK fuzzy model in (3f), the corresponding ε-insensitive-loss-based criterion function [14] is defined as

$$\min_{\mathbf{p}_g} E = \sum_{i=1}^{N} |y_i^o - y_i|_\varepsilon = \sum_{i=1}^{N} |\mathbf{p}_g^T \mathbf{x}_{gi} - y_i|_\varepsilon. \tag{7}$$

In general, the inequalities $y_i - \mathbf{p}_g^T \mathbf{x}_{gi} < \varepsilon$ and $\mathbf{p}_g^T \mathbf{x}_{gi} - y_i < \varepsilon$ are not satisfied for all data pairs $(\mathbf{x}_{gi}, y_i)$. By introducing the slack variables $\xi_i^+ \geq 0$ and $\xi_i^- \geq 0$, the following constraints are obtained:

$$\begin{cases} y_i - \mathbf{p}_g^T \mathbf{x}_{gi} < \varepsilon + \xi_i^+ \\ \mathbf{p}_g^T \mathbf{x}_{gi} - y_i < \varepsilon + \xi_i^-. \end{cases} \tag{8}$$

Using (8), the criterion in (7) can be equivalently written as

$$\min_{\mathbf{p}_g, \boldsymbol{\xi}^+, \boldsymbol{\xi}^-} E = \sum_{i=1}^{N} (\xi_i^+ + \xi_i^-)$$
$$\text{s.t.} \begin{cases} y_i - \mathbf{p}_g^T \mathbf{x}_{gi} < \varepsilon + \xi_i^+ \\ \mathbf{p}_g^T \mathbf{x}_{gi} - y_i < \varepsilon + \xi_i^- \end{cases}, \quad \xi_i^+ \geq 0, \; \xi_i^- \geq 0 \; \forall i. \tag{9}$$

By introducing the regularization term [14], (9) is modified to become

$$\min_{\mathbf{p}_g, \boldsymbol{\xi}^+, \boldsymbol{\xi}^-} E = \frac{1}{\tau} \sum_{i=1}^{N} (\xi_i^+ + \xi_i^-) + \frac{1}{2} \mathbf{p}_g^T \mathbf{p}_g \tag{10}$$
$$\text{s.t.} \begin{cases} y_i - \mathbf{p}_g^T \mathbf{x}_{gi} < \varepsilon + \xi_i^+ \\ \mathbf{p}_g^T \mathbf{x}_{gi} - y_i < \varepsilon + \xi_i^- \end{cases}, \quad \xi_i^+ \geq 0, \; \xi_i^- \geq 0 \; \forall i$$

where $\tau > 0$ controls the tradeoff between the complexity of the regression model and the tolerance of the errors. Here, $\xi_i^+$ and $\xi_i^-$ can be taken as the L1-norm penalty terms, and thus, (10) is an objective function based on L1-norm penalty terms. The dual optimization for (10) can be expressed as

$$\max_{\boldsymbol{\lambda}^+, \boldsymbol{\lambda}^-} -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\lambda_i^+ - \lambda_i^-)(\lambda_j^+ - \lambda_j^-) \, \mathbf{x}_{gi}^T \mathbf{x}_{gj} - \sum_{i=1}^{N} \varepsilon (\lambda_i^+ + \lambda_i^-) + \sum_{i=1}^{N} y_i (\lambda_i^+ - \lambda_i^-) \tag{11}$$
$$\text{s.t.} \sum_{i=1}^{N} (\lambda_i^+ - \lambda_i^-) = 0, \quad \lambda_i^+, \lambda_i^- \in [0, \tau] \; \forall i.$$

Thus, the ε-insensitive TSK fuzzy-model training can be taken as a quadratic-programming (QP) optimization problem, and existing QP algorithms can be adopted directly for TSK fuzzy-model training. As the classical QP solutions are inefficient for large and very large datasets, two new algorithms, namely, the iterative QP solution (IQP) and ε-insensitive learning by solving a system of linear inequalities (ε-LSSLI), were proposed for the ε-insensitive TSK fuzzy-model training [14]. Although these two algorithms are more effective than the classical QP solutions, they are still infeasible, due to their high space complexities, when the training dataset is much larger (e.g., sample size greater than 10 000). In order to overcome these weaknesses, the scalable ε-insensitive TSK fuzzy-model training is proposed in this study by using the core-set-based MEB-approximation technique, which will be discussed in Sections III and IV.

B. MEB

The MEB problem is a classical computational-geometry problem [6], [13], [16]. The applications of MEB and CC-MEB in the field of machine learning through the introduction of kernel tricks are reviewed in the following sections.

1) MEB and Kernelized Minimal Enclosing Ball: Given a set of data points $S = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\} \subset R^d$, the MEB of $S$, which is denoted as MEB($S$), is the smallest ball that contains all the points in $S$. Let $B(\mathbf{c}, R)$ denote a ball with center $\mathbf{c}$ and radius $R$. The MEB problem can be formulated as the following optimization problem:

$$\min_{\mathbf{c}, R} R^2 \quad \text{s.t. } (\mathbf{x}_i - \mathbf{c})^T (\mathbf{x}_i - \mathbf{c}) \leq R^2 \; \forall \mathbf{x}_i \in S. \tag{12}$$

Recently, the MEB-approximation technique has been introduced for machine learning [7]–[12]. The resulting MEB problem has been properly solved by the introduction of kernel tricks [25], where the corresponding kernel version of the MEB can be described as

$$\min_{\mathbf{c}, R} R^2 \quad \text{s.t. } (\varphi(\mathbf{x}_i) - \mathbf{c})^T (\varphi(\mathbf{x}_i) - \mathbf{c}) \leq R^2 \; \forall i \tag{13}$$

where $\varphi(\cdot)$ denotes the feature mapping associated with a given kernel $k(\cdot, \cdot)$, and $B(\mathbf{c}, R)$ is the desired MEB in the kernel feature space. The dual form of (13) can be expressed by the following


QP problem:

$$\max_{\boldsymbol{\alpha}} \; \boldsymbol{\alpha}^T \text{diag}(\mathbf{K}) - \boldsymbol{\alpha}^T \mathbf{K} \boldsymbol{\alpha} \quad \text{s.t. } \boldsymbol{\alpha}^T \mathbf{1} = 1, \; \alpha_i \geq 0 \; \forall i \tag{14}$$

where $\boldsymbol{\alpha} = [\alpha_1, \ldots, \alpha_N]^T$ is the vector of Lagrange multipliers, $\mathbf{1} = [1, \ldots, 1]^T$ is an $N$-dimensional vector, and $\mathbf{K} = [k(\mathbf{x}_i, \mathbf{x}_j)]_{N \times N} = [\varphi(\mathbf{x}_i)^T \varphi(\mathbf{x}_j)]_{N \times N}$ is the corresponding kernel matrix.

Studies have been conducted to demonstrate the important relationships between several kernel methods and the kernelized MEB problems [7]. Under some constrained conditions, the kernelized MEB problem is equivalent to the hard-margin support-vector data description (SVDD) [17], which is often used in novelty detection. Besides this direct connection, it is also shown that the soft-margin one-class and two-class L2-norm SVMs (L2-SVMs) can be regarded as the corresponding kernelized MEB problems. Based on this finding, the fast core-set-based MEB approximation is used to develop more effective kernel methods. For example, the CVM in [7] has demonstrated satisfactory performance for several machine-learning tasks on large datasets.

2) Center-Constrained Minimal Enclosing Ball: To further investigate the relationships between the MEB and other kernel methods, the MEB is extended to a CC-MEB [8], and the connections between MEB and several kernel methods, including the L2-SVR and imbalanced L2-SVC, are then established. For the kernelized MEB problem in (13), the aim is to find the smallest ball containing all $\varphi(\mathbf{x}_i)$ in the feature space. In the CC-MEB problem, an extra attribute $h_i \in R$ is augmented to each $\varphi(\mathbf{x}_i)$ to form $\tilde{\varphi}(\mathbf{x}_i) = [\varphi(\mathbf{x}_i)^T, h_i]^T$ in the extended feature space. The MEB for these augmented points can then be found, while at the same time the last coordinate of the ball's center is constrained to be zero (i.e., the center of the CC-MEB is of the form $[\mathbf{c}^T, 0]^T$, where $\mathbf{c}$ is the center in the unextended feature space). The primal formulation of the CC-MEB problem can be expressed as

$$\min_{\mathbf{c}, R} R^2 \quad \text{s.t. } (\varphi(\mathbf{x}_i) - \mathbf{c})^T (\varphi(\mathbf{x}_i) - \mathbf{c}) + h_i^2 \leq R^2 \; \forall i. \tag{15}$$

The dual optimization of (15) can be formulated as the following QP problem:

$$\max_{\boldsymbol{\alpha}} \; \boldsymbol{\alpha}^T (\text{diag}(\mathbf{K}) + \boldsymbol{\Delta}) - \boldsymbol{\alpha}^T \mathbf{K} \boldsymbol{\alpha} \quad \text{s.t. } \boldsymbol{\alpha}^T \mathbf{1} = 1, \; \alpha_i \geq 0 \; \forall i \tag{16}$$

where

$$\boldsymbol{\Delta} = [h_1^2, \ldots, h_N^2]^T \geq \mathbf{0}. \tag{17}$$

Because of the constraint $\boldsymbol{\alpha}^T \mathbf{1} = 1$ in (16), an arbitrary multiple of $\boldsymbol{\alpha}^T \mathbf{1}$ can be added to the objective function without affecting the solution for $\boldsymbol{\alpha}$. In other words, for an arbitrary $\eta \in R$, (16) yields the same optimal $\boldsymbol{\alpha}$, which is given by

$$\max_{\boldsymbol{\alpha}} \; \boldsymbol{\alpha}^T (\text{diag}(\mathbf{K}) + \boldsymbol{\Delta} - \eta \mathbf{1}) - \boldsymbol{\alpha}^T \mathbf{K} \boldsymbol{\alpha} \quad \text{s.t. } \boldsymbol{\alpha}^T \mathbf{1} = 1, \; \alpha_i \geq 0 \; \forall i. \tag{18}$$

By employing the CC-MEB, the MEB is linked to margin-based feature transformation, and a fast algorithm is proposed for large-dataset feature transformation [11]. The fast reduced set density estimator (FRSDE) is also proposed by relating kernel density estimation to the CC-MEB [9]. Further, based on the FRSDE, it is revealed for the first time that the training of the ML-FIS can be expressed equivalently as a CC-MEB problem, and the fast ML-FIS training algorithm is proposed for very large datasets [10].

Besides ML-FIS, another representative fuzzy-inference model is the TSK FIS, which is more commonly used. Efficient implementation of the training of the TSK fuzzy model for very large datasets is a challenging issue due to high time and space complexities. This issue will be investigated in this paper by employing the core-set-based MEB-approximation technique.

3) Core-Set-Based Minimal Enclosing Ball Approximation: In this section, the core-set-based fast MEB-approximation algorithms are briefly introduced. Given a parameter $\rho > 0$, a ball $B(\mathbf{c}, (1+\rho)R)$ is a $(1+\rho)$-approximation of MEB($S$) if $R \leq R_{\text{MEB}(S)}$ and $S \subset B(\mathbf{c}, (1+\rho)R)$. It is found that solving the MEB problem on a subset $Q$ of $S$, which is called the core set, can often give an accurate and efficient approximation [6], [13]. More formally, a subset $Q \subset S$ is a core set of $S$ if an expansion by a factor $(1+\rho)$ of its MEB contains $S$, i.e., $S \subset B(\mathbf{c}, (1+\rho)R)$, where $B(\mathbf{c}, R) = \text{MEB}(Q)$. A breakthrough in achieving such a $(1+\rho)$-approximation was first presented in [6] and further explored in [7]–[13] and [19].

A simple iterative scheme is typically used for core-set-based MEB approximation. For example, in the $t$th iteration, the current estimate $B(\mathbf{c}_t, R_t)$ is expanded by including the farthest data point outside the $(1+\rho)$-ball $B(\mathbf{c}_t, (1+\rho)R_t)$. This process is repeated until all the points in $S$ are covered by $B(\mathbf{c}_t, (1+\rho)R_t)$. A remarkable property of this simple scheme is that the maximum number of iterations and the size of the final core set depend only on $\rho$ but not on the dimensionality or the size of the dataset [13]. Based on this idea, CVM [7], [8] is proposed to realize a fast kernelized MEB approximation for large datasets; its core-set loop follows the scheme sketched below.
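The following is a minimal sketch of that iterative $(1+\rho)$-approximation loop, shown in the input space for readability (this is not the paper's CVM listing; the exact MEB sub-problem solver is replaced by a simple centroid stand-in, and the function name is hypothetical):

```python
def meb_approx(S, rho, n_init=2):
    """Core-set (1+rho)-approximation loop in the input space.
    The exact MEB(Q) sub-solver is replaced by a centroid stand-in here;
    CVM solves the analogous kernelized sub-problem (e.g., with SMO) instead."""
    core = list(range(n_init))                 # small initial core set
    while True:
        Q = S[core]
        c = Q.mean(axis=0)                     # stand-in for solving MEB(Q) exactly
        R = np.linalg.norm(Q - c, axis=1).max()
        dist = np.linalg.norm(S - c, axis=1)
        far = int(dist.argmax())
        if dist[far] <= (1.0 + rho) * R:       # all of S inside the (1+rho)-ball
            return c, R, core
        core.append(far)                       # add the farthest point and repeat
```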


III. L2-NORM PENALTY-BASED ε-INSENSITIVE TAKAGI–SUGENO–KANG FUZZY-MODEL TRAINING

In this section, the modified ε-insensitive criterion function is proposed for TSK fuzzy-model training. With this new criterion introduced, we will prove that the optimization problem for TSK fuzzy-model training is equivalent to a corresponding CC-MEB problem.

A. L2-Norm Penalty-Based ε-Insensitive Criterion Function

To introduce the new criterion, the L2-norm penalty terms are employed instead of the L1-norm penalty terms in (10). Meanwhile, we also introduce the insensitive parameter ε as a penalty term, as in other existing L2-norm penalty-based methods such as L2-SVR [8]. The following ε-insensitive objective function for TSK fuzzy-model training is then obtained:

$$\min_{\mathbf{p}_g, \boldsymbol{\xi}^+, \boldsymbol{\xi}^-, \varepsilon} g(\mathbf{p}_g, \boldsymbol{\xi}^+, \boldsymbol{\xi}^-, \varepsilon) = \frac{1}{\tau} \cdot \frac{1}{N} \sum_{i=1}^{N} \left((\xi_i^+)^2 + (\xi_i^-)^2\right) + \frac{1}{2} \mathbf{p}_g^T \mathbf{p}_g + \frac{2}{\tau} \cdot \varepsilon \tag{19}$$
$$\text{s.t.} \begin{cases} y_i - \mathbf{p}_g^T \mathbf{x}_{gi} < \varepsilon + \xi_i^+ \\ \mathbf{p}_g^T \mathbf{x}_{gi} - y_i < \varepsilon + \xi_i^- \end{cases} \forall i.$$

Compared with the L1-norm penalty-based ε-insensitive criterion function, the proposed L2-norm penalty-based criterion has the following characteristics: 1) the constraints $\xi_i^+ \geq 0$ and $\xi_i^- \geq 0$ are not needed for the optimization in (19), and 2) the insensitive parameter ε can be obtained automatically by optimization without the need for manual setting. Similar properties can also be found in other L2-norm penalty-based machine-learning algorithms, such as L2-SVR [8]. For convenience, the L2-norm penalty-based ε-insensitive TSK fuzzy-model training is referred to as L2-TSK in this paper.

Based on optimization theory, the dual problem of (19) can be formulated as the following QP problem (with the derivation provided in Appendix A):

$$\max_{\boldsymbol{\lambda}^+, \boldsymbol{\lambda}^-} -\sum_{i=1}^{N} \sum_{j=1}^{N} (\lambda_i^+ - \lambda_i^-)(\lambda_j^+ - \lambda_j^-) \, \mathbf{x}_{gi}^T \mathbf{x}_{gj} - \frac{N\tau}{2} \sum_{i=1}^{N} \left((\lambda_i^+)^2 + (\lambda_i^-)^2\right) + \sum_{i=1}^{N} \lambda_i^+ y_i \tau - \sum_{i=1}^{N} \lambda_i^- y_i \tau \tag{20}$$
$$\text{s.t.} \sum_{i=1}^{N} (\lambda_i^+ + \lambda_i^-) = 1, \quad \lambda_i^+, \lambda_i^- \geq 0 \; \forall i.$$

Let

$$\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_{2N})^T = ((\boldsymbol{\lambda}^+)^T, (\boldsymbol{\lambda}^-)^T)^T \tag{21a}$$
$$\mathbf{z}_i = \begin{cases} \mathbf{x}_{gi}, & i = 1, \ldots, N \\ -\mathbf{x}_{g(i-N)}, & i = N+1, \ldots, 2N \end{cases} \tag{21b}$$
$$\boldsymbol{\beta} = (\tau \mathbf{y}^T, -\tau \mathbf{y}^T)^T, \quad \mathbf{y} = (y_1, y_2, \ldots, y_N)^T. \tag{21c}$$

Equation (20) can then be formulated as

$$\arg\max_{\boldsymbol{\alpha}} \; -\boldsymbol{\alpha}^T \tilde{\mathbf{K}} \boldsymbol{\alpha} + \boldsymbol{\alpha}^T \boldsymbol{\beta} \quad \text{s.t. } \boldsymbol{\alpha}^T \mathbf{1} = 1, \; \alpha_i \geq 0 \; \forall i \tag{22}$$

where $\tilde{\mathbf{K}} = [\tilde{k}_{ij}]_{2N \times 2N}$, $\tilde{k}_{ij} = \mathbf{z}_i^T \mathbf{z}_j + \frac{N\tau}{2} \delta_{ij}$, and

$$\delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j. \end{cases}$$
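In code, assembling the dual data of (21)–(22) is direct; a sketch (the helper name is assumed):

```python
def l2_tsk_dual(Xg, y, tau):
    """Assemble the dual QP data of (21)-(22): K~ (2N x 2N) and beta (2N)."""
    N = Xg.shape[0]
    Z = np.vstack([Xg, -Xg])                          # z_i of eq. (21b)
    K = Z @ Z.T + (N * tau / 2.0) * np.eye(2 * N)     # k~_ij of eq. (22)
    beta = np.concatenate([tau * y, -tau * y])        # eq. (21c)
    return K, beta
```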

The optimal $\boldsymbol{\lambda}^{+*}, \boldsymbol{\lambda}^{-*}$ can be obtained by solving (22), and the solutions to (19) can then be obtained with (23a)–(23d) (with the derivation provided in Appendix A):

$$\mathbf{p}_g = \frac{2}{\tau} \sum_{i=1}^{N} (\lambda_i^+ - \lambda_i^-) \mathbf{x}_{gi} \tag{23a}$$
$$\xi_i^+ = N \lambda_i^+ \tag{23b}$$
$$\xi_i^- = N \lambda_i^- \tag{23c}$$
$$\varepsilon = \sum_{i=1}^{N} (\lambda_i^+ - \lambda_i^-) y_i - \frac{N}{2} \sum_{i=1}^{N} \left((\lambda_i^+)^2 + (\lambda_i^-)^2\right) - \frac{1}{\tau} \sum_{i=1}^{N} \sum_{j=1}^{N} (\lambda_i^+ - \lambda_i^-)(\lambda_j^+ - \lambda_j^-) \, \mathbf{x}_{gi}^T \mathbf{x}_{gj}. \tag{23d}$$

It is clear from the above results that the optimization problem for TSK fuzzy-model training is a QP problem that can be directly solved by traditional QP solutions. However, the traditional algorithms cannot handle large datasets, not even the sophisticated sequential-minimal-optimization (SMO) algorithms [18]. Fortunately, the QP problem in (22) has some special characteristics that enable the use of the core-set-based MEB-approximation technique to solve the problem for very large datasets.

B. Relationship Between L2-Takagi–Sugeno–Kang and Minimal Enclosing Ball

For the optimization objective of L2-TSK in (22), let

$$\boldsymbol{\Delta} = (h_1^2, \ldots, h_{2N}^2)^T = -\text{diag}(\tilde{\mathbf{K}}) + \boldsymbol{\beta} + \eta \mathbf{1} \tag{24}$$

where $\eta \geq 0$ is a sufficiently large constant such that $\boldsymbol{\Delta} \geq \mathbf{0}$. Then, (22) can be rewritten as

$$\max_{\boldsymbol{\alpha}} \; \boldsymbol{\alpha}^T \left(\text{diag}(\tilde{\mathbf{K}}) + \boldsymbol{\Delta} - \eta \mathbf{1}\right) - \boldsymbol{\alpha}^T \tilde{\mathbf{K}} \boldsymbol{\alpha} \quad \text{s.t. } \boldsymbol{\alpha}^T \mathbf{1} = 1, \; \alpha_i \geq 0 \; \forall i. \tag{25}$$
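A small sketch of the construction in (24); choosing η as the smallest value that keeps Δ nonnegative is one convenient option, not prescribed by the paper:

```python
def ccmeb_data(K, beta):
    """Build Delta of (24); eta is chosen as the smallest value with Delta >= 0
    (the paper only requires Delta to be nonnegative)."""
    eta = float(np.max(np.diag(K) - beta))
    Delta = -np.diag(K) + beta + eta       # eq. (24), componentwise
    return Delta, eta
```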

By comparing (25) of L2-TSK with (18) of CC-MEB, we can easily obtain the following theorem.

Theorem 1: L2-TSK can be taken as a special MEB problem, i.e., the CC-MEB problem, under Condition Set A.

Condition Set A:
i) The kernel matrix $\mathbf{K}$ of the CC-MEB problem in (18) is given by the matrix $\tilde{\mathbf{K}}$ in (25) of L2-TSK, i.e., $\mathbf{K} = \tilde{\mathbf{K}}$.
ii) The Lagrange-multiplier vector $\boldsymbol{\alpha}$ of the CC-MEB problem in (18) is set by the Lagrange-multiplier vector $\boldsymbol{\alpha}$ of L2-TSK in (25).
iii) The term $\boldsymbol{\Delta}$ of the CC-MEB problem in (18) is set with the term $\boldsymbol{\Delta}$ of L2-TSK in (25).


Here, the corresponding CC-MEB to be solved for L2-TSK contains the following dataset in the extended feature space, i.e., $\tilde{D} = \{\tilde{\varphi}(\mathbf{z}_i)\}$, with

$$\tilde{\varphi}(\mathbf{z}_i) = \begin{pmatrix} \varphi(\mathbf{z}_i) \\ h_i \end{pmatrix} = \begin{pmatrix} \mathbf{z}_i \\ (\sqrt{2N\tau} \cdot \mathbf{e}_i)/2 \\ h_i \end{pmatrix}, \quad i = 1, 2, \ldots, 2N \tag{26}$$

where $\mathbf{z}_i$ is defined in (21b), $\mathbf{e}_i$ is a $2N$-dimensional vector in which the $i$th element equals one and all the other elements equal zero, and $h_i^2$ is the $i$th component of $\boldsymbol{\Delta}$ as defined in (24).

IV. SCALABLE TAKAGI–SUGENO–KANG FUZZY-MODEL TRAINING FOR VERY LARGE DATASETS

Approximation algorithms have long been used in the field of theoretical computer science for tackling computationally difficult problems [6], [13], [16]. For MEB problems, as shown in Section II-B3, it is usually of practical interest to study fast approximation algorithms that only aim at returning a good approximate solution. For large datasets, the core-set-based MEB approximation is highly effective for obtaining approximate solutions. Based on Theorem 1, we propose the STSK fuzzy-model training algorithm in this section by using the fast core-set-based MEB approximation. Details of this algorithm are as follows.

A. Scalable L2-Takagi–Sugeno–Kang Training Algorithm

With reference to the CVM algorithm for fast kernelized MEB approximation as discussed in Section II-B3, we propose the STSK fuzzy-model training algorithm (i.e., STSK). The STSK algorithm consists of three stages. In the first stage, the dataset for the corresponding CC-MEB problem is constructed. In the second stage, the corresponding CC-MEB problem is solved by using the CVM algorithm. In the final stage, the corresponding TSK fuzzy system is generated using the results obtained in the second stage. The three-stage STSK algorithm can be summarized as follows.
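The paper's boxed algorithm listing is not reproduced here; the following is a minimal sketch of the three stages under the helpers sketched earlier, with fcm() and cvm_solve() as hypothetical placeholders for an FCM routine and a core-set CC-MEB (CVM) solver:

```python
def stsk_train(X, y, K_rules, tau, rho, h=1.0):
    """Sketch of the three STSK stages; fcm() and cvm_solve() are assumptions."""
    # Stage 1: antecedents via FCM, then the CC-MEB dataset of (21)-(26).
    U = fcm(X, K_rules)
    c, delta = antecedent_params(X, U, h)
    Xg = build_xg(X, firing_levels(X, c, delta))
    K, beta = l2_tsk_dual(Xg, y, tau)
    # Stage 2: solve the CC-MEB form (25) with a core-set solver such as CVM.
    alpha = cvm_solve(K, beta, rho)                 # returns the 2N multipliers
    # Stage 3: recover the consequents via (23a): p_g = (2/tau) Xg^T (l+ - l-).
    lam_p, lam_m = alpha[:len(y)], alpha[len(y):]
    pg = (2.0 / tau) * Xg.T @ (lam_p - lam_m)
    return c, delta, pg
```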

B. Properties

The proposed STSK algorithm is indeed a special case of the generalized core-set-based MEB-approximation algorithms [6]–[8], [13]. Thus, the properties concluded from the core-set-based MEB-approximation algorithms still hold true for the STSK algorithm. Based on the theoretical results of core-set-based MEB approximation in [6] and [13], important properties of the proposed STSK algorithm are summarized below.

Property 1: Given a fixed MEB-approximation parameter $\rho$, the upper bound on the size of the core set obtained by the STSK algorithm in the second stage is $O(1/\rho)$ and is independent of the size of the training set.

Property 2: Given a fixed MEB-approximation parameter $\rho$, the upper bound on the space complexity of the STSK algorithm is $O(1/\rho^2)$ and is independent of the size of the training set. Here, the $O(N)$ space requirement for storing the whole dataset is ignored because the data can be stored outside the core memory.

Property 3: Given a fixed MEB-approximation parameter $\rho$, the upper bound on the time complexity of the STSK algorithm is $O(N/\rho^2 + 1/\rho^4)$, i.e., it is linear with the size of the training set. Furthermore, with the use of the probabilistic speedup strategy in [20], it can be reduced to $O(1/\rho^8)$, which is independent of $N$.

These properties characterize the upper bounds on the time and space complexities of the STSK algorithm. In fact, the experimental results to be presented in Section V will show that for large and very large datasets, the typical CPU running time and space consumption are usually much less than the theoretical upper bounds, as demonstrated by other MEB-approximation-based learning algorithms, such as [7]–[11]. Thus, the proposed algorithm is highly effective and efficient for large and very large datasets. The number of attributes is not considered here. Indeed, even if dimensionality is considered, the conclusion regarding the space and time complexities of the proposed method still holds true, since this characteristic is inherited from the core-set-based MEB-approximation algorithm and has already been confirmed by many existing MEB-approximation-based methods [6]–[13].

C. Difference From Fast ML-FIS Algorithm

Although the scalable ML-FIS training algorithm (i.e., fast ML-FIS) [10] is also developed by using the MEB-approximation technique, the proposed STSK algorithm is fundamentally different from the fast ML-FIS algorithm. First, the fast ML-FIS and STSK algorithms are proposed to train two of the most representative and distinct fuzzy models, namely, the ML and the TSK fuzzy systems, respectively. Details of these two fuzzy models can be found in [31]. Second, the equivalence between the ML-FIS training and the MEB problem is established by introducing the reduced set density estimation (RSDE) [10], and the MEB-approximation technique is then adopted to develop the fast ML fuzzy-model training algorithm (fast ML-FIS). However, this strategy is not applicable to the development of an STSK fuzzy-model training algorithm since the characteristics of the TSK and ML fuzzy models are different. In our work, the equivalence between the TSK fuzzy-system training and MEB problems is established instead by introducing the linear-regression model and the L2-norm penalty term, and the development of the STSK algorithm is based on this.



An additional but relatively minor difference between the STSK algorithm and the fast ML-FIS algorithm is that manual setting of the number of fuzzy rules is required in the STSK algorithm, while the number of fuzzy rules is determined automatically in the fast ML-FIS algorithm. For the fast ML-FIS algorithm, a major weakness is that too many rules may be generated for the ML fuzzy model, as pointed out in [10].

D. Curse of Dimensionality

The curse of dimensionality has been a major problem with neural-fuzzy (NF) techniques. Efforts have been made to overcome this difficulty. A very useful method is to adopt the support-vector learning technique, which originates from statistical learning theory [25], [27] and has been extensively studied in the field of machine learning. For example, support-vector learning for fuzzy modeling and classification [14], [28], [29] has demonstrated promising performance for high-dimensional data by introducing an additional term to control the structural risk in the objective function. The L2-norm penalty-based objective function proposed in this study and the L1-norm penalty-based objective function proposed in [14] [i.e., eq. (10)] both belong to the category of support-vector learning techniques, where the regularization term $(1/2)\mathbf{p}_g^T \mathbf{p}_g$ introduced to the objective function corresponds to the term of structural risk. Hence, the proposed STSK algorithm in this study can overcome the curse of dimensionality in NF modeling to some extent.

V. EXPERIMENTAL STUDIES

The proposed STSK fuzzy-model training algorithm has been evaluated with several experiments on both synthetic and real-world datasets. The experimental results are presented as follows. Section V-A describes the experimental settings. Sections V-B and V-C report the experimental results on the synthetic and real-world datasets, respectively. In Section V-D, the application of the STSK algorithm to biochemical process modeling with large datasets is presented and discussed.

A. Experimental Settings

The performance of the proposed STSK algorithm was compared with three existing TSK fuzzy-model training algorithms, namely, the L1-norm penalty-based ε-insensitive training algorithms IQP and LSSLI [14], and the well-known LS-criterion-based ANFIS algorithm [2]. Meanwhile, since the MEB approximation is used to develop the proposed STSK algorithm, the performance of the STSK algorithm was compared with that of two existing MEB-approximation-based modeling algorithms, i.e., the fast ML fuzzy-model training algorithm (fast ML-FIS) [10] and the fast SVR training algorithm (CVR) [8] for large datasets.

For the proposed STSK algorithm and the other two MEB-approximation-based algorithms, i.e., fast ML-FIS and CVR, the MEB-approximation parameter $\rho$ was set to $10^{-5}$ in our experiments, by referring to the settings of this parameter in existing MEB-approximation-based methods [6]–[10]. All the other parameters of the six adopted modeling methods were determined with a validation set containing 10% of the data sampled from the adopted datasets. All algorithms were implemented with MATLAB code. For the three MEB-approximation-based algorithms, i.e., STSK, fast ML-FIS, and CVR, the second-order-information-based SMO strategy [18] was adopted to solve the corresponding sub-QP problems. All algorithms were run on a computer with an Intel Core 2 Duo P8600 2.4-GHz CPU and 2 GB RAM.

To evaluate the generalization ability of the algorithms, the following performance index was adopted:

$$J = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i' - y_i)^2 \Bigg/ \frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})^2} \tag{27}$$

where $N$ is the size of the test dataset, $y_i$ is the output for the $i$th test input, $y_i'$ is the fuzzy-model output for the $i$th test input, and $\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i$. The smaller the value of $J$, the better the generalization performance. In all the experiments, each attribute of the data was normalized into the range [−1, 1] for each dataset. Meanwhile, the fivefold cross-validation strategy was adopted to evaluate the modeling performance. The settings of the experiments and the experimental setup in this study are summarized in Table I and Fig. 1.
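For reference, the index (27) in code (helper name assumed):

```python
def perf_index_J(y_true, y_pred):
    """Index J of (27): root of the MSE over the variance of the test targets."""
    num = np.mean((y_pred - y_true) ** 2)
    den = np.mean((y_true - y_true.mean()) ** 2)
    return float(np.sqrt(num / den))
```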

B. Synthetic Datasets

First, the algorithms were tested on the synthetic Friedman datasets [30]. The input attributes $\mathbf{x} = (x_1, x_2, \ldots, x_5)^T$ were generated independently, each of them being uniformly distributed over [0, 1]. The target was defined by

$$y = 10 \sin(\pi x_1 x_2) + 20 (x_3 - 0.5)^2 + 10 x_4 + 5 x_5 + \sigma(0, 1) \tag{28}$$

where $\sigma(0, 1)$ is the noise term, which was normally distributed with mean and variance set to 0 and 1, respectively.

In this experiment, datasets of different sizes were generated and adopted for performance evaluation, the range of sizes being from $10^3$ to $10^6$.
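A sketch of a generator for (28) (hypothetical helper):

```python
def friedman(n, seed=0):
    """Generate n samples of the Friedman benchmark (28) with N(0, 1) noise."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, 5))
    y = (10 * np.sin(np.pi * X[:, 0] * X[:, 1]) + 20 * (X[:, 2] - 0.5) ** 2
         + 10 * X[:, 3] + 5 * X[:, 4] + rng.normal(0.0, 1.0, size=n))
    return X, y
```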

1) Takagi–Sugeno–Kang Fuzzy-Model Training Algorithms: An experiment was conducted to compare the performance of the proposed STSK algorithm with that of the three existing TSK fuzzy-model training algorithms, i.e., IQP [14], LSSLI [14], and ANFIS [2]. The number of fuzzy rules was set to $M = 16$ for the STSK, IQP, and LSSLI algorithms. Since ANFIS employs the grid-type partition strategy to generate the fuzzy rules, the number of fuzzy rules is large if the number of input variables is large. In this experiment, the number of fuzzy rules for ANFIS was set to $M = 32$ ($= 2^5$).

The results of this experiment are shown in Fig. 2. The average generalization performance indices $J$ of the four algorithms, obtained with the fivefold validation strategy on the Friedman datasets, are compared in Fig. 2(a). The average CPU running times of these algorithms for model training are recorded and presented in Fig. 2(b). The following observations are noted from the results: 1) the generalization ability of the proposed STSK algorithm on the Friedman datasets is comparable with that of the other three TSK fuzzy-model training algorithms;


TABLE I
SETTINGS OF THE EXPERIMENTS

Fig. 1. Experimental setup. Each of the five datasets (top row) is used to evaluate the two categories of algorithms (middle row), respectively.

2) the scalability of the STSK algorithm clearly outperforms that of the other three algorithms: the IQP and LSSLI algorithms usually fail when the sizes of the datasets are larger than $5 \times 10^3$, while for the ANFIS algorithm the CPU running time increases sharply with increasing size of the datasets; and 3) for small- and middle-size training datasets, the STSK algorithm usually requires more CPU running time than the other three algorithms. However, for large and very large datasets, the CPU running time of the STSK algorithm usually increases asymptotically with the size of the datasets. The STSK algorithm is thus quite efficient and effective for large and very large datasets.


Fig. 2. Performance comparison of four TSK fuzzy-model training algorithms on synthetic Friedman datasets. (a) Average generalization performance index J. (b) Average CPU running time for training.

Fig. 3. Performance comparison of three MEB-approximation-based modeling methods on synthetic Friedman datasets. (a) Average generalization performance index J. (b) Average CPU running time for training.

2) MEB-Approximation-Based Modeling Methods: An experiment was also conducted to evaluate the performance of the STSK algorithm compared with that of the two existing MEB-approximation-based algorithms, i.e., fast ML-FIS [10] and CVR [8]. For STSK, the number of fuzzy rules was set to $M = 9$, 16, and 25, respectively. For the fast ML-FIS algorithm, it is not necessary to set the number of fuzzy rules for the ML fuzzy model since it is determined automatically. For the CVR algorithm, the radial basis function (RBF) was adopted as the kernel function to train the L2-norm SVR.

The results of this experiment are presented in Fig. 3. The average generalization performance indices $J$ of the three algorithms with the fivefold cross-validation strategy on the Friedman datasets are shown in Fig. 3(a). The average CPU running times of these algorithms with the fivefold cross-validation strategy are given in Fig. 3(b). It is observed that 1) the generalization ability of the proposed STSK algorithm on the Friedman datasets is comparable with that of the other two MEB-approximation-based algorithms; 2) all three algorithms demonstrate good scalability, and the CPU running time for training usually increases asymptotically with the increasing size of the datasets, which is inherited from the core-set-based MEB-approximation technique; and 3) for the STSK algorithm, the generalization ability on the Friedman datasets improves with an increasing number of fuzzy rules.

C. Real-World Datasets

The performance of the proposed STSK algorithm was also evaluated by conducting experiments on real-world datasets available from the Laboratory of Artificial Intelligence and Computer Science at the University of Porto, Porto, Portugal [21]. The original datasets adopted in the experiments are described in Table II. In this experiment, datasets of different sizes were generated by random sampling of the original datasets with different percentages (i.e., 1%–100%) and used for performance evaluation. The settings of the experiment were the same as those described in Section V-B.

Similarly, performance evaluation of the STSK algorithm was made by comparing it with the TSK fuzzy-model training algorithms and the MEB-approximation-based modeling algorithms, respectively. The performance of the four TSK fuzzy-model training algorithms is presented in Tables III–V, while that of the three MEB-approximation-based modeling algorithms is shown in Figs. 4–6.


TABLE II
THREE REAL-WORLD REGRESSION DATASETS

TABLE III
PERFORMANCE COMPARISON OF FOUR TSK FUZZY-MODEL TRAINING ALGORITHMS ON CartExample DATASETS

TABLE IV
PERFORMANCE COMPARISON OF FOUR TSK FUZZY-MODEL TRAINING ALGORITHMS ON Census DATASETS


TABLE V
PERFORMANCE COMPARISON OF FOUR TSK FUZZY-MODEL TRAINING ALGORITHMS ON delta_elevators DATASETS

Fig. 4. Performance comparison of three MEB-approximation-based modeling methods on CartExample datasets. (a) Average generalization performance index J. (b) Average CPU running time for training.

It is observed from these experimental results that 1) while the generalization ability of the STSK algorithm is comparable to that of the three existing TSK fuzzy-model training algorithms, the scalability of the proposed algorithm obviously outperforms the others, and 2) the STSK algorithm is competitive with the two MEB-approximation-based modeling algorithms in terms of scalability (see Figs. 4–6). This experiment indicates that the proposed STSK algorithm has promising scalability, making it possible to take advantage of the TSK fuzzy model even for applications involving the large datasets that are often encountered in the real world.

D. Biochemical Process Modeling

To further evaluate the performance of the STSK algorithm, an experiment was conducted by applying the algorithm to model a biochemical process that involves a large dataset. The dataset adopted originates from the glutamic-acid fermentation process and is a multiple-input–multiple-output dataset. The input variables of the dataset include the fermentation time $k$, glucose concentration $S(k)$, thalli concentration $X(k)$, glutamic-acid concentration $P(k)$, stirring speed $R(k)$, and ventilation $Q(k)$ at time $k$. The output variables are the glucose concentration $S(k+1)$, thalli concentration $X(k+1)$, and glutamic-acid concentration $P(k+1)$. The estimation model based on the TSK fuzzy system is illustrated in Fig. 7.


Fig. 5. Performance comparison of three MEB-approximation-based modeling methods on Census datasets. (a) Average generalization performance index J. (b) Average CPU running time for training.

Fig. 6. Performance comparison of three MEB-approximation-based modeling methods on delta_elevators datasets. (a) Average generalization performance index J. (b) Average CPU running time for training.

Fig. 7. Glutamic-acid fermentation process prediction model based on the TSK fuzzy system.

The data in this experiment were collected from a $1.1 \times 10^4$ batch-fermentation process, each batch containing 15 data samples. The description of the whole dataset is given in Table VI. In this experiment, datasets of different sizes were generated by random sampling of the original dataset with different percentages (i.e., 1%–100%) for performance evaluation. The other experimental settings were the same as those mentioned in the previous section.


TABLE VI
BIOCHEMICAL PROCESS MODELING DATASET

Fig. 8. Performance comparison of four TSK fuzzy-model training algorithms on glucose concentration S(k). (a) Average generalization performance index J. (b) Average CPU running time for training.

Fig. 9. Performance comparison of four TSK fuzzy-model training algorithms on glutamic-acid concentration P(k). (a) Average generalization performance index J. (b) Average CPU running time for training.

The performances of the STSK algorithm and the other three TSK fuzzy-model training algorithms on the prediction of the three biochemical variables are shown in Figs. 8–10. The proposed STSK algorithm is found to be effective for biochemical process modeling on large datasets. The results indicate that the STSK algorithm has much better generalization performance and scalability than the IQP and LSSLI algorithms. When compared with the ANFIS algorithm, the generalization ability of the STSK algorithm is competitive, and an obvious advantage in terms of CPU running time for model training is also demonstrated by the algorithm.

On the other hand, the performances of the STSK algorithm and the other two MEB-approximation-based modeling algorithms on the prediction of the three biochemical variables are shown in Figs. 11–13. The proposed STSK fuzzy-model training algorithm demonstrates competitive performance against the fast ML-FIS and CVR modeling algorithms in terms of both generalization ability and scalability. Compared with the fast ML-FIS, the STSK algorithm usually requires fewer fuzzy rules to model a complicated system. Besides, the STSK algorithm enables the interpretation of the TSK fuzzy model, while the model obtained with the CVR is usually difficult to interpret and is taken as a black box.


Fig. 10. Performance comparison of four TSK fuzzy-model training algorithms on thalli concentration X(k). (a) Average generalization performance index J. (b) Average CPU running time for training.

Fig. 11. Performance comparison of three MEB-approximation-based modeling algorithms on glucose concentration S(k). (a) Average generalization performance index J. (b) Average CPU running time for training.

Fig. 12. Performance comparison of three MEB-approximation-based modeling algorithms on glutamic-acid concentration P(k). (a) Average generalization performance index J. (b) Average CPU running time for training.


Fig. 13. Performance comparison of three MEB-approximation-based modeling algorithms on thalli concentration X(k). (a) Average generalization performance index J. (b) Average CPU running time for training.

This experiment illustrates that the STSK algorithm has a distinctive advantage for biochemical process modeling with large datasets.

VI. CONCLUSION

In this study, STSK fuzzy-model training for large datasets is investigated by introducing the core-set-based MEB-approximation technique. The two major contributions of the work are as follows. First, the equivalence between the MEB problem and TSK fuzzy-model training is proved and demonstrated. Second, the fast core-set-based MEB-approximation technique used in computational geometry is adopted to develop the STSK fuzzy-model training algorithm for large datasets.

The proposed STSK algorithm has been studied with experiments on both synthetic and real-world datasets. The results demonstrate that the algorithm has outstanding performance in scalability and CPU running time for large and very large datasets. This study has developed a promising method for TSK fuzzy modeling in situations involving large datasets.

Future work will be performed to extend this study to the fast and scalable training of the generalized fuzzy system (GFS) [26], [31]. The GFS inherits the characteristics of both the TSK and the ML fuzzy models. It also confronts the difficulty of high space and time complexities when handling large datasets during the training procedure. As the GFS has extensive applications, the development of a scalable GFS training algorithm is of particular importance. This can be achieved by using the MEB-approximation technique, the critical challenge being to establish a connection bridging the GFS with the MEB. A possible strategy is to utilize both the Gaussian mixture model and the linear-regression model as the media to relate the GFS and the MEB. Once the equivalence between the GFS and the MEB is proved, scalable GFS training algorithms can be easily developed by using the core-set-based MEB-approximation technique. The corresponding training algorithm will possess similar characteristics as those of other MEB-approximation-based algorithms.

ACKNOWLEDGMENT

The authors would like to thank the reviewers for their valuable comments, which have greatly improved the quality of our manuscript in many ways.

APPENDIX A

For (19), the corresponding Lagrangian function is given by

$$\Phi(\mathbf{p}_g, \boldsymbol{\xi}^+, \boldsymbol{\xi}^-, \varepsilon, \boldsymbol{\lambda}^+, \boldsymbol{\lambda}^-) = \frac{1}{\tau} \cdot \frac{1}{N} \sum_{i=1}^{N} \left((\xi_i^+)^2 + (\xi_i^-)^2\right) + \frac{1}{2} \mathbf{p}_g^T \mathbf{p}_g + \frac{2}{\tau} \cdot \varepsilon + \sum_{i=1}^{N} \lambda_i^+ (y_i - \mathbf{p}_g^T \mathbf{x}_{gi} - \varepsilon - \xi_i^+) + \sum_{i=1}^{N} \lambda_i^- (\mathbf{p}_g^T \mathbf{x}_{gi} - y_i - \varepsilon - \xi_i^-). \tag{A1}$$

From this equation, the optimal values can be computed by setting the derivatives of $\Phi(\cdot)$ w.r.t. $\mathbf{p}_g$, $\xi_i^+$, $\xi_i^-$, and $\varepsilon$ to zero, respectively, i.e.,

$$\frac{\partial \Phi}{\partial \mathbf{p}_g} = \mathbf{p}_g - \sum_{i=1}^{N} \lambda_i^+ \mathbf{x}_{gi} + \sum_{i=1}^{N} \lambda_i^- \mathbf{x}_{gi} = \mathbf{0} \tag{A2}$$
$$\frac{\partial \Phi}{\partial \xi_i^+} = \frac{2}{N\tau} \xi_i^+ - \lambda_i^+ = 0 \tag{A3}$$
$$\frac{\partial \Phi}{\partial \xi_i^-} = \frac{2}{N\tau} \xi_i^- - \lambda_i^- = 0 \tag{A4}$$
$$\frac{\partial \Phi}{\partial \varepsilon} = \frac{2}{\tau} - \sum_{i=1}^{N} \lambda_i^+ - \sum_{i=1}^{N} \lambda_i^- = 0. \tag{A5}$$

From (A2)–(A5), we obtain

$$\mathbf{p}_g = \sum_{i=1}^{N} \lambda_i^+ \mathbf{x}_{gi} - \sum_{i=1}^{N} \lambda_i^- \mathbf{x}_{gi} = \sum_{i=1}^{N} (\lambda_i^+ - \lambda_i^-) \mathbf{x}_{gi} \tag{A6}$$


TABLE VII
ABBREVIATIONS USED IN THIS PAPER

$$\xi_i^+ = \frac{N\tau \lambda_i^+}{2} \tag{A7}$$
$$\xi_i^- = \frac{N\tau \lambda_i^-}{2} \tag{A8}$$
$$\frac{2}{\tau} = \sum_{i=1}^{N} (\lambda_i^+ + \lambda_i^-). \tag{A9}$$

Letting $\lambda_i^+ = 2\lambda_i'^+ / \tau$ and $\lambda_i^- = 2\lambda_i'^- / \tau$, (A6)–(A9) can be rewritten as

$$\mathbf{p}_g = \frac{2}{\tau} \sum_{i=1}^{N} (\lambda_i'^+ - \lambda_i'^-) \mathbf{x}_{gi} \tag{A10}$$
$$\xi_i^+ = N \lambda_i'^+ \tag{A11}$$
$$\xi_i^- = N \lambda_i'^- \tag{A12}$$
$$\sum_{i=1}^{N} (\lambda_i'^+ + \lambda_i'^-) = 1. \tag{A13}$$

Substituting (A10)–(A13) into (A1), the following optimization problem is obtained:

$$\max_{\boldsymbol{\lambda}'^+, \boldsymbol{\lambda}'^-} \Phi(\boldsymbol{\lambda}'^+, \boldsymbol{\lambda}'^-) = -\frac{2}{\tau^2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\lambda_i'^+ - \lambda_i'^-)(\lambda_j'^+ - \lambda_j'^-) \, \mathbf{x}_{gi}^T \mathbf{x}_{gj} - \frac{N}{\tau} \sum_{i=1}^{N} \left((\lambda_i'^+)^2 + (\lambda_i'^-)^2\right) + \frac{2}{\tau} \sum_{i=1}^{N} \lambda_i'^+ y_i - \frac{2}{\tau} \sum_{i=1}^{N} \lambda_i'^- y_i$$
$$\text{s.t.} \sum_{i=1}^{N} (\lambda_i'^+ + \lambda_i'^-) = 1, \quad \lambda_i'^+, \lambda_i'^- \geq 0 \; \forall i. \tag{A14}$$

By multiplying the above objective function by the constant $\tau^2/2$, (A14) can be equivalently expressed as the following optimization problem:

$$\max_{\boldsymbol{\lambda}'^+, \boldsymbol{\lambda}'^-} \Phi(\boldsymbol{\lambda}'^+, \boldsymbol{\lambda}'^-) = -\sum_{i=1}^{N} \sum_{j=1}^{N} (\lambda_i'^+ - \lambda_i'^-)(\lambda_j'^+ - \lambda_j'^-) \, \mathbf{x}_{gi}^T \mathbf{x}_{gj} - \frac{N\tau}{2} \sum_{i=1}^{N} \left((\lambda_i'^+)^2 + (\lambda_i'^-)^2\right) + \tau \sum_{i=1}^{N} (\lambda_i'^+ - \lambda_i'^-) y_i$$
$$\text{s.t.} \sum_{i=1}^{N} (\lambda_i'^+ + \lambda_i'^-) = 1, \quad \lambda_i'^+, \lambda_i'^- \geq 0 \; \forall i. \tag{A15}$$

With (19) and (A14), let

$$
\max_{\boldsymbol{\lambda}'^+,\,\boldsymbol{\lambda}'^-}\Phi(\boldsymbol{\lambda}'^+, \boldsymbol{\lambda}'^-)
= \min_{\mathbf{p}_g,\,\boldsymbol{\xi}^+,\,\boldsymbol{\xi}^-,\,\varepsilon} g(\mathbf{p}_g, \boldsymbol{\xi}^+, \boldsymbol{\xi}^-, \varepsilon).
\tag{A16}
$$

We then obtain

$$
\varepsilon = \sum_{i=1}^{N}(\lambda_i'^+ - \lambda_i'^-)\,y_i
- \frac{N}{2}\sum_{i=1}^{N}\big((\lambda_i'^+)^2 + (\lambda_i'^-)^2\big)
- \frac{1}{\tau}\sum_{i=1}^{N}\sum_{j=1}^{N}(\lambda_i'^+ - \lambda_i'^-)(\lambda_j'^+ - \lambda_j'^-)\,\mathbf{x}_{gi}^{\mathrm{T}}\mathbf{x}_{gj}.
\tag{A17}
$$

By writing $\lambda_i^+, \lambda_i^-$ in place of $\lambda_i'^+, \lambda_i'^-$, it is clear that (A10)–(A12), (A15), and (A17) are equivalent to (23a)–(23d) and (20), respectively.

APPENDIX B

Table VII contains a list of abbreviations used in this paper.

REFERENCES

[1] T. Takagi and M. Sugeno, "Fuzzy identification of systems and its application to modeling and control," IEEE Trans. Syst., Man, Cybern., vol. SMC-15, no. 1, pp. 116–132, Jan. 1985.

[2] J.-S. R. Jang, "ANFIS: Adaptive-network-based fuzzy inference systems," IEEE Trans. Syst., Man, Cybern., vol. 23, no. 3, pp. 665–685, May 1993.


[3] J. C. Bezdek, J. Keller, and R. Krishnapuram, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing. San Francisco, CA: Kluwer, 1999.

[4] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum, 1982.

[5] L. X. Wang, Adaptive Fuzzy Systems and Control: Design and Stability Analysis. Upper Saddle River, NJ: Prentice-Hall, 1994.

[6] M. Badoiu and K. L. Clarkson, "Optimal core sets for balls," presented at the DIMACS Workshop Comput. Geometry, Piscataway, NJ, Nov. 2002.

[7] I. W. Tsang, J. T. Kwok, and P. M. Cheung, "Core vector machines: Fast SVM training on very large data sets," J. Mach. Learn. Res., vol. 6, pp. 363–392, 2005.

[8] I. W. Tsang, J. T. Kwok, and J. M. Zurada, "Generalized core vector machines," IEEE Trans. Neural Netw., vol. 17, no. 5, pp. 1126–1140, Sep. 2006.

[9] Z. H. Deng, F. L. Chung, and S. T. Wang, "FRSDE: Fast reduced set density estimator using minimal enclosing ball approximation," Pattern Recog., vol. 41, no. 4, pp. 1363–1372, 2008.

[10] F. L. Chung, Z. H. Deng, and S. T. Wang, "From minimum enclosing ball to fast fuzzy inference system training on large datasets," IEEE Trans. Fuzzy Syst., vol. 17, no. 1, pp. 173–184, Feb. 2009.

[11] I. W. Tsang, A. Kocsor, and J. T. Kwok, "Large-scale maximum margin discriminant analysis using core vector machines," IEEE Trans. Neural Netw., vol. 19, no. 4, pp. 610–624, Apr. 2008.

[12] S. Varma, S. Asharaf, and M. Narasimha Murty, "Core vector clustering," in Proc. Third Indian Int. Conf. Artif. Intell., Pune, India, 2007, pp. 565–574.

[13] P. Kumar, J. S. B. Mitchell, and A. Yildirim, "Approximate minimum enclosing balls in high dimensions using core-sets," ACM J. Exp. Algorithmics, vol. 8, 2003. [Online]. Available: http://portal.acm.org/citation.cfm?id=996548

[14] J. Leski, "TSK-fuzzy modeling based on ε-insensitive learning," IEEE Trans. Fuzzy Syst., vol. 13, no. 2, pp. 181–193, Apr. 2005.

[15] J. Yen, L. Wang, and C. W. Gillespie, "Improving the interpretability of TSK fuzzy models by combining global learning and local learning," IEEE Trans. Fuzzy Syst., vol. 6, no. 4, pp. 530–537, Aug. 1998.

[16] F. Nielsen and R. Nock, "Approximating smallest enclosing balls," in Proc. Int. Conf. Comput. Sci. Appl., 2004, vol. 3045, pp. 147–157.

[17] D. Tax and R. Duin, "Support vector domain description," Pattern Recog. Lett., vol. 20, no. 14, pp. 1191–1199, 1999.

[18] R. E. Fan, P. H. Chen, and C. J. Lin, "Working set selection using second order information for training support vector machines," J. Mach. Learn. Res., vol. 6, pp. 1889–1918, 2005.

[19] S. Asharaf, M. N. Murty, and S. K. Shevade, "Multiclass core vector machine," in Proc. ICML, Corvallis, OR, 2007.

[20] A. Smola and B. Scholkopf, "Sparse greedy matrix approximation for machine learning," in Proc. 17th Int. Conf. Mach. Learn., Stanford, CA, Jun. 2000, pp. 911–918.

[21] L. Torgo. (2009). "Regression datasets," Dep. Comput. Sci., Porto Univ., Porto, Portugal. [Online]. Available: http://www.liaad.up.pt/~ltorgo/Regression/DataSets.html

[22] J. S. R. Jang, C. T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing. Upper Saddle River, NJ: Prentice-Hall, 1997.

[23] E. H. Mamdani, "Application of fuzzy logic to approximate reasoning using linguistic systems," IEEE Trans. Comput., vol. C-26, no. 12, pp. 1182–1191, Dec. 1977.

[24] P. M. Larsen, "Industrial applications of fuzzy logic control," Int. J. Man-Mach. Stud., vol. 12, pp. 3–10, 1980.

[25] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.

[26] M. F. Azeem, M. Hanmandlu, and N. Ahmad, "Generalization of adaptive neuro-fuzzy inference systems," IEEE Trans. Neural Netw., vol. 11, no. 6, pp. 1332–1346, Nov. 2000.

[27] V. Vapnik, The Nature of Statistical Learning Theory. Berlin, Germany: Springer, 1995.

[28] C. F. Juang, S. H. Chiu, and S. W. Chang, "A self-organizing TS-type fuzzy network with support vector learning and its application to classification problems," IEEE Trans. Fuzzy Syst., vol. 15, pp. 998–1008, Oct. 2007.

[29] Y. X. Chen and J. Z. Wang, "Support vector learning for fuzzy rule-based classification systems," IEEE Trans. Fuzzy Syst., vol. 11, pp. 716–723, Dec. 2003.

[30] J. Friedman, "Multivariate adaptive regression splines (with discussion)," Ann. Stat., vol. 19, no. 1, pp. 1–141, 1991.

[31] M. T. Gan, M. Hanmandlu, and A. Tan, "From a Gaussian mixture model to additive fuzzy systems," IEEE Trans. Fuzzy Syst., vol. 13, no. 3, pp. 303–316, Jun. 2005.

Zhaohong Deng received the B.S. degree in physics from Fuyang Normal College, Fuyang, China, in 2002 and the Ph.D. degree in light industry information technology and engineering from Jiangnan University, Wuxi, China, in 2008.

He is currently an Assistant Professor with the School of Information Technology, Jiangnan University. He is the author or coauthor of more than 30 research papers in international/national journals. His current research interests include neurofuzzy systems, pattern recognition, and their applications.

Kup-Sze Choi (M'97) received the Ph.D. degree in computer science and engineering from the Chinese University of Hong Kong, Shatin, Hong Kong.

He is currently an Assistant Professor with the School of Nursing, the Hong Kong Polytechnic University, Kowloon, Hong Kong, and the Leader of the Technology in Health Care research team. His research interests include computational intelligence, computer graphics, virtual reality, physics-based simulation, and their applications in medicine and health care.

Fu-Lai Chung (M'95) received the B.Sc. degree from the University of Manitoba, Winnipeg, MB, Canada, in 1987 and the M.Phil. and Ph.D. degrees from the Chinese University of Hong Kong, Shatin, Hong Kong, in 1991 and 1995, respectively.

In 1994, he joined the Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong, where he is currently an Associate Professor. He is also a Guest Professor with the School of Information Technology, Jiangnan University, Wuxi, China. He has authored or coauthored more than 50 journal papers published in the areas of soft computing, data mining, machine intelligence, and multimedia. His current research interests include novel feature selection techniques, text data mining, and fuzzy system modeling.

Shitong Wang received the M.S. degree in computer science from Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 1987.

He visited London University, London, U.K.; Bristol University, Bristol, U.K.; Hiroshima International University, Hiroshima, Japan; and the Hong Kong University of Science and Technology, Kowloon, Hong Kong, as a Research Scientist for more than four years. He is currently a Full Professor and the Dean of the School of Information Technology, Jiangnan University, Wuxi, China. His research interests include artificial intelligence, neuro-fuzzy systems, pattern recognition, and image processing. He has authored or coauthored about 80 papers in international/national journals and has authored seven books.