

IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 16, NO. 4, AUGUST 2008 1027

Theoretical Choice of the Optimal Threshold for Possibilistic Linear Model With Noisy Input

Hongwei Ge, Fu-Lai Chung, Member, IEEE, and Shitong Wang

Abstract—Based on possibility concepts, various possibilistic linear models (PLMs) have been proposed, and their pivotal role in fuzzy modeling and associated applications has been established. When adopting PLMs, one has to adopt an appropriate threshold (λ) value. However, choosing such a value is by no means trivial, and is still an open theoretical issue. In this paper, we propose a solution by first extending the PLM to its regularized version, i.e., a regularized PLM (RPLM), such that its generalization capability can be enhanced. The RPLM is then formulated as a maximum a posteriori (MAP) framework, which facilitates the determination of the theoretically optimal threshold value for the RPLM with noisy input. Our mathematical derivations reveal the approximately inversely proportional relationship between the threshold λ and the standard deviation of Gaussian noisy input. This is also confirmed by the simulation results. This finding is very helpful for the practical applications of both PLMs and RPLMs.

Index Terms—Maximum a posteriori (MAP), possibilistic linear model (PLM), possibility theory.

I. INTRODUCTION

IN MANY practical applications, the available data are often uncertain, imprecise, and even incomplete, and can be appropriately represented as fuzzy data. To process such data, fuzzy regression analysis [1] has been considered an important tool and has been successfully applied in many areas, such as market forecasting, system identification, and civil engineering. Fuzzy regression analysis can be classified into two categories: possibilistic regression analysis [2]–[6], which works on possibility concepts, and least squares methods [1], [7]–[15], which minimize the errors between the given outputs and the estimated outputs. Here, possibilistic regression analysis is our main concern.

Possibilistic regression attempts to obtain new estimators by directly dealing with models formulated by possibility concepts. It was first presented by Tanaka et al. [1], where a fuzzy linear system was adopted as a regression model. Because the membership functions of fuzzy sets are often described as possibility distributions, this approach is usually termed possibilistic regression analysis [2]–[6], where the corresponding fuzzy coefficients are assumed to be independent. Since then, there have been many advances in this line of research. For example, a modified version of Tanaka’s possibilistic regression model was given in [12], where possibilistic regression for fuzzy input–output data was considered. In [13], fuzzy linear programming was introduced into the modified Tanaka’s model. The important properties of possibilistic regression have also been studied in [2], [3], and [14], and more variants of Tanaka’s model can be seen in [4]–[6] and [15]–[18], among which [16] introduces a support vector technique [19]–[21] into possibilistic regression analysis to enhance its generalization capability.

Manuscript received March 8, 2006; revised February 14, 2007 and April 28, 2007; accepted August 23, 2007. This work was supported in part by the Hong Kong Polytechnic University under Grant Z-08R, in part by the National 973 Key Project under Grant 2006CB705700, in part by the National 863 Projects under Grant 2006AA10Z313 and Grant 2007AA1Z1158, in part by the 2007 National Science Foundation of China, in part by the New_Century Outstanding Young Scholar Grant of the Ministry of Education of China under Grant NCET-04–0496, in part by the National KeySoft Laboratory at Nanjing University, in part by the Key Laboratory of Computer Science at Institute of Software, Chinese Academy of Sciences (CAS), China, in part by the Key Laboratory of Computer Information Technologies at JiangSu Province, in part by the 2004 Key Project of Ministry of Education of China, and in part by the National Key Laboratory of Pattern Recognition at Institute of Automation, CAS, China.

H. Ge and S. Wang are with the School of Information Engineering, Southern Yangtze University, WuXi 214122, China (e-mail: [email protected]; [email protected]).

F.-L. Chung is with the Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong (e-mail: [email protected]).

Digital Object Identifier 10.1109/TFUZZ.2008.917290

Noise often appears in real input–output data. With the existence of noise, one very interesting but challenging issue is how to determine the free parameters (i.e., thresholds) in the possibilistic regression models. When some prior knowledge about the threshold values is available, such an issue is not critical. Unfortunately, such knowledge is not common in most practical situations, and a bad choice of threshold values will severely deteriorate the performance of these models. Data-resampling techniques [22], [23], such as cross-validation, seem to be feasible approaches to this issue. However, they are usually very expensive in terms of computation and/or data. Another approach is to consider the theoretically optimal choice. To the best of our knowledge, very little attention has been paid to this theoretical issue, particularly for the case of noisy inputs.

In this paper, we first extend the traditional possibilistic linear model (PLM) to its regularized version to enhance its generalization capability. The regularized model is then formulated as a maximum a posteriori (MAP) framework [22]–[24]. With the help of the MAP framework, we prove that there exists an approximately inversely proportional relationship between the free threshold of the model and the standard deviation of Gaussian noisy input. The remainder of this paper is organized as follows. In Section II, the PLM is first introduced in brief. It is then extended to its regularized version, i.e., a regularized PLM (RPLM), and the equivalent relationship between the RPLM and the MAP is finally derived. In Section III, for the case of Gaussian noisy input, the optimal choice of the threshold value for the RPLM is theoretically determined. In Section IV, several experimental results are reported to confirm the theoretical claim. The final section concludes this paper.

1063-6706/$25.00 © 2008 IEEE


II. REGULARIZED POSSIBILISTIC LINEAR MODEL AND MAP

A. Possibilistic Linear Model

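A PLM can be written as

$$\tilde{Y} = f(x, \tilde{A}) = \tilde{A}_0 + \tilde{A}_1 x_1 + \tilde{A}_2 x_2 + \cdots + \tilde{A}_n x_n \tag{1}$$

where $\tilde{A}_i$ $(0 \le i \le n)$ denotes a symmetric triangular fuzzy number, i.e., $\tilde{A}_i = (\xi_i, \eta_i)$, whose membership function is defined as

$$\mu_{A_i}(a_i) = \begin{cases} 1 - \dfrac{|a_i - \xi_i|}{\eta_i}, & \xi_i - \eta_i \le a_i \le \xi_i + \eta_i \\ 0, & \text{otherwise} \end{cases} \tag{2}$$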

where $\xi_i$ is the center and $\eta_i$ is the radius. According to fuzzy arithmetic, we have

$$\tilde{A}_i x_i = (\xi_i x_i, \eta_i |x_i|) \tag{3}$$

$$\tilde{A}_i + \tilde{A}_j = (\xi_i + \xi_j, \eta_i + \eta_j). \tag{4}$$
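The membership function (2) and the arithmetic rules (3) and (4) can be sketched in a few lines of Python. This is only an illustration; the class name `TriFuzzy` and its method names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriFuzzy:
    """Symmetric triangular fuzzy number with center xi and radius eta > 0."""
    center: float
    radius: float

    def membership(self, a: float) -> float:
        # Eq. (2): 1 - |a - xi| / eta inside the support, 0 outside it.
        return max(0.0, 1.0 - abs(a - self.center) / self.radius)

    def scale(self, x: float) -> "TriFuzzy":
        # Eq. (3): a crisp factor x scales the center by x and the radius by |x|.
        # Scaling by x = 0 gives radius 0, for which membership() is undefined.
        return TriFuzzy(self.center * x, self.radius * abs(x))

    def __add__(self, other: "TriFuzzy") -> "TriFuzzy":
        # Eq. (4): centers add and radii add.
        return TriFuzzy(self.center + other.center, self.radius + other.radius)
```

For instance, `TriFuzzy(2.0, 1.0).scale(-3.0)` has center −6 and radius 3, in line with (3).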

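Thus, let

$$x = (1, x_1, x_2, \ldots, x_n)^T \tag{5}$$

$$|x| = (1, |x_1|, |x_2|, \ldots, |x_n|)^T \tag{6}$$

$$\xi^T = (\xi_0, \xi_1, \xi_2, \ldots, \xi_n) \tag{7}$$

$$\eta^T = (\eta_0, \eta_1, \eta_2, \ldots, \eta_n). \tag{8}$$

Then, the aforementioned PLM becomes

$$\begin{aligned} \tilde{Y} &= \tilde{A}_0 + \tilde{A}_1 x_1 + \tilde{A}_2 x_2 + \cdots + \tilde{A}_n x_n \\ &= (\xi_0 + \xi_1 x_1 + \xi_2 x_2 + \cdots + \xi_n x_n,\ \eta_0 + \eta_1 |x_1| + \eta_2 |x_2| + \cdots + \eta_n |x_n|) \\ &= (\xi^T x, \eta^T |x|) \end{aligned} \tag{9}$$

and the acceptable fuzzy membership function $\mu_{\tilde{Y}}(y)$ of output $y$ can be derived as

$$\mu_{\tilde{Y}}(y) = 1 - \frac{|y - \xi^T x|}{\eta^T |x|} \tag{10}$$

where $\mu_{\tilde{Y}}(y) = 0$ when $\eta^T |x| \le |y - \xi^T x|$.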
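As a small worked sketch of (9) and (10), the functions below compute the center and radius of the fuzzy output and the membership of a crisp observation. The function names are hypothetical, and the input `x` is assumed to be passed without the leading 1, which the code prepends as in (5).

```python
def plm_output(xi, eta, x):
    """Return (center, radius) of the fuzzy output in Eq. (9).

    xi and eta have n + 1 entries; x has n entries (the leading 1 is added here).
    """
    xv = [1.0] + list(x)
    center = sum(c * v for c, v in zip(xi, xv))
    radius = sum(r * abs(v) for r, v in zip(eta, xv))
    return center, radius


def output_membership(xi, eta, x, y):
    """Eq. (10), clipped to 0 when y lies outside the support."""
    center, radius = plm_output(xi, eta, x)
    return max(0.0, 1.0 - abs(y - center) / radius)
```

With `xi = [1.0, 0.25]`, `eta = [0.5, 0.1]`, and `x = [-2.0]`, the output is centered at 0.5 with radius 0.7, so an observation at the center has membership 1.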

For a given input–output dataset $D$ defined as

$$D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}, \quad x_i \in R^{n+1},\ y_i \in R,\ i = 1, 2, \ldots, N, \tag{11}$$

the possibilistic linear regression analysis attempts to find the corresponding appropriate triangular fuzzy numbers such that

$$\tilde{Y}_i^* = \tilde{A}^{*T} x_i$$

where $x_i = (1, x_{i1}, x_{i2}, \ldots, x_{in})^T$, $\tilde{A} = (\tilde{A}_0, \tilde{A}_1, \tilde{A}_2, \ldots, \tilde{A}_n)^T$, $\tilde{A}^* = (\tilde{A}_0^*, \tilde{A}_1^*, \tilde{A}_2^*, \ldots, \tilde{A}_n^*)^T$, $i = 1, 2, \ldots, N$, and $\tilde{Y}_i^*$ and $\tilde{A}^*$ denote the estimates of $\tilde{Y}_i$ and $\tilde{A}$, respectively.

TABLE I. BENCHMARKING DATASET FROM [6]

In the possibilistic linear regression model, we often need to choose an appropriate free parameter (i.e., the threshold $\lambda$), such that $\mu_{\tilde{Y}_i^*}(y_i) \ge \lambda$, i.e.

$$\mu_{\tilde{Y}_i^*}(y_i) = 1 - \frac{|y_i - \xi^T x_i|}{\eta^T |x_i|} \ge \lambda \tag{12}$$

where $\eta^T |x_i|$ denotes a radius of $\tilde{Y}_i^*$ and $\xi^T x_i$ denotes a center of $\tilde{Y}_i^*$. That is, the degree of fitting the estimated possibilistic linear regression model $\tilde{Y}_i^* = \tilde{A}^{*T} x_i$ to the given output data is determined by the corresponding $\lambda$-level set [25]. Therefore, $\lambda$ plays a key role in the possibilistic linear regression model. If we take $\sum_{i=1}^{N} \eta^T |x_i|$ as the objective index, then the possibilistic linear regression analysis in (1) will become the following optimization problem [8], [26], [27]:

$$\begin{aligned} \min\ & \Phi(\eta, \xi, \Delta) = \sum_i \eta^T |x_i| + c \sum_i (\Delta_i^- + \Delta_i^+) \\ \text{such that}\ & \frac{y_i - \xi^T x_i}{\eta^T |x_i|} \le (1 - \lambda) + \Delta_i^- \\ & \frac{\xi^T x_i - y_i}{\eta^T |x_i|} \le (1 - \lambda) + \Delta_i^+ \\ & \Delta_i^-, \Delta_i^+ \ge 0 \\ & i = 1, 2, \ldots, N \end{aligned} \tag{13}$$

where $c$ is a predefined constant, and $\Delta^-$ and $\Delta^+$ denote the latent variables of the upper/lower bounds of the output, respectively. Thus, (1) and (13) describe our PLM, and the PLM without the $\Delta$ terms in (13) corresponds to the original fuzzy regression model proposed by Tanaka in [2] and [3].

In order to enhance the PLM’s generalization capability and follow the spirit of support vector regression techniques in [16], [19], [20], and [24], we propose to introduce a regularized term $\frac{1}{2}\xi^T \xi$ into (13), i.e.

$$\min\ \Phi(\eta, \xi, \Delta) = \sum_i \eta^T |x_i| + \frac{1}{2}\xi^T \xi + c \sum_i (\Delta_i^- + \Delta_i^+). \tag{14}$$
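To make the roles of λ and c in (13) and (14) concrete, the sketch below evaluates the regularized objective for a given pair (ξ, η). It does not solve the optimization problem. It assumes that, for fixed ξ and η, each slack takes the smallest value allowed by the constraints in (13); the function name and data layout are hypothetical.

```python
def rplm_objective(xi, eta, data, lam, c):
    """Evaluate Eq. (14) at (xi, eta) with the smallest slacks permitted by Eq. (13).

    data is a list of (x, y) pairs where x already includes the leading 1.
    """
    total_radius = 0.0
    total_slack = 0.0
    for x, y in data:
        center = sum(a * b for a, b in zip(xi, x))
        radius = sum(a * abs(b) for a, b in zip(eta, x))
        ratio = (y - center) / radius
        # Smallest non-negative slacks satisfying the two constraints in Eq. (13).
        slack_minus = max(0.0, ratio - (1.0 - lam))
        slack_plus = max(0.0, -ratio - (1.0 - lam))
        total_radius += radius
        total_slack += slack_minus + slack_plus
    regularizer = 0.5 * sum(a * a for a in xi)
    return total_radius + regularizer + c * total_slack
```

Raising λ narrows the slack-free band around the center, so the same fit incurs more slack; raising c makes that slack more expensive relative to the total radius.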

Thus, an RPLM is obtained. As mentioned previously, the threshold λ and the parameter c are influential to the performance of both the PLM and the RPLM. Let us take a simple single-input–single-output example to illustrate the performance dependency of the RPLM on λ and c. Table I lists the benchmarking dataset taken from [6]. Figs. 1 and 2 illustrate the fuzzy regression results obtained using the RPLM with different λ and c. Clearly, both λ and c influence the obtained fuzzy regression models. For the parameter c, which plays the same role as in support vector regression, we can readily refer to [19] and [21] to determine its value. However, how to determine an appropriate value for the threshold λ is still an open issue, particularly in the case of noisy inputs. In the following section, an attempt to address this issue is presented.

Fig. 1. Obtained fuzzy regression results using an RPLM with λ = 0.7 and c = 1, 10, or 100.

Fig. 2. Obtained fuzzy regression results using an RPLM with c = 100 and λ = 0.3, 0.7, or 0.9.

B. Regularized Model and MAP

In this section, by using the evidence theory [22], [23], we show that the RPLM is equivalent to an MAP estimation. Given a dataset $D$ defined in (11), we are interested in obtaining a weight vector $\tilde{\xi}$ such that

$$y_i = \tilde{\xi}^T x_i + n_i, \quad i = 1, 2, \ldots, N \tag{15}$$

for all the data in the dataset $D$, where all the data $x_i$ follow some distribution $p(\cdot)$ and all $n_i$ are independent identically distributed (i.i.d.) noises following some distribution $\phi(\cdot)$. Thus, the corresponding density function on $y$ can be denoted by $p(y|x) = \phi(y - \tilde{\xi}^T x)$. The degree of such an approximation may be measured by the following loss function $L(\xi^T x, \eta^T x, y)$, given as

$$L(\xi^T x, \eta^T x, y) = \begin{cases} 0, & \dfrac{|\xi^T x - y|}{\eta^T |x|} \le 1 - \lambda \\ \dfrac{|\xi^T x - y|}{\eta^T |x|} - 1 + \lambda, & \text{otherwise.} \end{cases} \tag{16}$$

Assume that the loss function $L(\xi^T x, \eta^T x, y)$ leads to the Gaussian probability density function on $y$, given by

$$p(y_i | x_i, \xi, \eta, \beta, \lambda) = \frac{1}{C(\beta, \lambda)} \exp[-\beta L(\xi^T x_i, \eta^T x_i, y_i)] \tag{17}$$

where $\beta$ is a parameter and $C(\beta, \lambda) = \iint_D \exp[-\beta L(\xi^T x, \eta^T x, y)]\,dx\,dy$.
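The loss (16) and the unnormalized form of the density (17) are short enough to write out directly. The sketch below takes the center ξᵀx and radius ηᵀ|x| as precomputed numbers; the function names are hypothetical, and the normalizer C(β, λ) is deliberately left out.

```python
import math


def plm_loss(center, radius, y, lam):
    """Eq. (16): zero inside the band |center - y| <= (1 - lam) * radius, linear outside."""
    return max(0.0, abs(center - y) / radius - (1.0 - lam))


def unnormalized_density(center, radius, y, lam, beta):
    """exp(-beta * L) from Eq. (17), without the factor 1 / C(beta, lam)."""
    return math.exp(-beta * plm_loss(center, radius, y, lam))
```

Inside the band the density is flat, and outside it decays exponentially at rate β in the normalized residual, which is the shape integrated in the later derivation of C(β, λ).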

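With the Gaussian prior on $\eta$ and $\xi$, i.e.

$$p(\eta | \alpha, x_i) = \frac{1}{M(\alpha)} \exp(-\alpha \eta^T |x_i|) \tag{18}$$

where

$$M(\alpha) = \int \exp(-\alpha \eta^T |x_i|)\,d\eta \tag{19}$$

and

$$p(\xi | \gamma) = \frac{1}{F(\gamma)} \exp\left(-\frac{\gamma}{2} \xi^T \xi\right) \tag{20}$$

where

$$F(\gamma) = \int \exp\left(-\frac{\gamma}{2} \xi^T \xi\right) d\xi. \tag{21}$$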

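By applying the Bayes rule, we have

$$\begin{aligned} p(\eta, \xi | D, \beta, \lambda) &\propto p(\eta | \xi, D, \beta, \lambda)\,p(\xi | \gamma) \\ &\propto p(D | \eta, \xi, \beta, \lambda)\,p(\eta | \alpha)\,p(\xi | \gamma) \end{aligned} \tag{22}$$

that is

$$p(\eta, \xi | D, \beta, \lambda) \propto p(D | \eta, \xi, \beta, \lambda)\,p(\xi | \gamma) \prod_{i=1}^{N} p(\eta | \alpha, x_i)\,p(x_i), \tag{23}$$

and then, we have

$$\ln p(\eta, \xi | D, \beta, \lambda) = -\alpha \sum_{i=1}^{N} \eta^T |x_i| - \frac{\gamma}{2} \xi^T \xi - \beta \sum_{i=1}^{N} L(\eta^T x_i, \xi^T x_i, y_i) - N \ln C(\beta, \lambda) + \text{const}. \tag{24}$$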

Furthermore, when γ = α and c = β/α, optimizing (14) can be interpreted as finding the MAP estimates of η, ξ for the given values of β, λ. In other words, the RPLM in (14) is equivalent to the MAP problem in (24).


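In general, it is not trivial to obtain the MAP estimate $\eta$, $\xi$ by directly solving (24), because it depends on the given training dataset. To ease our analysis, we replace $\frac{1}{N} \sum_{i=1}^{N} L(\eta^T x_i, \xi^T x_i, y_i)$ in (24) by its expectation

$$\begin{aligned} E(L(\eta^T x, \xi^T x, y)) = \int_D \Bigg[ &\int_{-\infty}^{\xi^T x - (1 - \lambda)\eta^T |x|} \left( \frac{\xi^T x - y}{\eta^T |x|} - 1 + \lambda \right) p(y|x)\,dy \\ &+ \int_{\xi^T x + (1 - \lambda)\eta^T |x|}^{+\infty} \left( \frac{y - \xi^T x}{\eta^T |x|} - 1 + \lambda \right) p(y|x)\,dy \Bigg] p(x)\,dx. \end{aligned} \tag{25}$$

Note that $\frac{1}{N} \sum_{i=1}^{N} \eta^T |x_i|$ in (24) can also be approximated by

$$E(\eta^T |x|) = \int_D \eta^T |x|\,p(x)\,dx. \tag{26}$$

Thus, (24) becomes

$$\begin{aligned} M(\eta, \xi, \beta, \lambda) &= \ln p(\eta, \xi | D, \beta, \lambda) \\ &= -\alpha N\,E(\eta^T |x|) - \frac{\gamma}{2} \xi^T \xi - \beta N\,E(L(\eta^T x, \xi^T x, y)) - N \ln C(\beta, \lambda) + \text{const}. \end{aligned} \tag{27}$$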

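In order to maximize $M(\eta, \xi, \beta, \lambda)$, its derivatives with respect to $\eta, \xi, \beta, \lambda$ must be zero. In other words

$$\left. \frac{\partial M(\eta, \xi, \beta, \lambda)}{\partial \eta} \right|_{\eta = \hat{\eta}} = -\alpha N\,E(|x|) - \beta N\,\frac{\partial E(L(\hat{\eta}^T x, \xi^T x, y))}{\partial \eta} = 0 \tag{28}$$

$$\left. \frac{\partial M(\eta, \xi, \beta, \lambda)}{\partial \xi} \right|_{\xi = \hat{\xi}} = -\gamma \hat{\xi} - \beta N\,\frac{\partial E(L(\eta^T x, \hat{\xi}^T x, y))}{\partial \xi} = 0 \tag{29}$$

$$\begin{aligned} \frac{\partial M(\eta, \xi, \beta, \lambda)}{\partial \beta} = {} & \left[ \left. \frac{\partial M(\eta, \xi, \beta, \lambda)}{\partial \eta} \right|_{\eta = \hat{\eta}} \right] \frac{\partial \hat{\eta}}{\partial \beta} + \left[ \left. \frac{\partial M(\eta, \xi, \beta, \lambda)}{\partial \xi} \right|_{\xi = \hat{\xi}} \right] \frac{\partial \hat{\xi}}{\partial \beta} \\ & - N\,E(L(\hat{\eta}^T x, \hat{\xi}^T x, y)) - N\,\frac{\partial C(\beta, \lambda) / \partial \beta}{C(\beta, \lambda)} = 0 \end{aligned} \tag{30}$$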

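i.e.

$$E(L(\hat{\eta}^T x, \hat{\xi}^T x, y)) = -\frac{\partial C(\beta, \lambda) / \partial \beta}{C(\beta, \lambda)} \tag{31}$$

$$\begin{aligned} \frac{\partial M(\eta, \xi, \beta, \lambda)}{\partial \lambda} = {} & \left[ \left. \frac{\partial M(\eta, \xi, \beta, \lambda)}{\partial \eta} \right|_{\eta = \hat{\eta}} \right] \frac{\partial \hat{\eta}}{\partial \lambda} + \left[ \left. \frac{\partial M(\eta, \xi, \beta, \lambda)}{\partial \xi} \right|_{\xi = \hat{\xi}} \right] \frac{\partial \hat{\xi}}{\partial \lambda} \\ & - \beta N \int_D \left[ \int_{-\infty}^{\hat{\xi}^T x - (1 - \lambda)\hat{\eta}^T |x|} p(y|x)\,dy + \int_{\hat{\xi}^T x + (1 - \lambda)\hat{\eta}^T |x|}^{+\infty} p(y|x)\,dy \right] p(x)\,dx \\ & - N\,\frac{\partial C(\beta, \lambda) / \partial \lambda}{C(\beta, \lambda)} = 0 \end{aligned} \tag{32}$$

i.e.

$$\int_D \left[ \int_{-\infty}^{\hat{\xi}^T x - (1 - \lambda)\hat{\eta}^T |x|} p(y|x)\,dy + \int_{\hat{\xi}^T x + (1 - \lambda)\hat{\eta}^T |x|}^{+\infty} p(y|x)\,dy \right] p(x)\,dx = -\frac{\partial C(\beta, \lambda) / \partial \lambda}{\beta\,C(\beta, \lambda)}. \tag{33}$$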

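When $\eta = \hat{\eta}$ and $\xi = \hat{\xi}$, maximizing (27) actually becomes the optimization problem

$$\arg\min_{\alpha, \beta, \lambda}\ \alpha E(\hat{\eta}^T |x|) + \beta E(L(\hat{\eta}^T x, \hat{\xi}^T x, y)) + \ln C(\beta, \lambda). \tag{34}$$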

Thus, (25) and (31)–(33) can be used to find the optimal α, β, λ values in our RPLM. In the next section, the dependency relationship between the threshold λ and the noisy input is analyzed. Without loss of generality, a Gaussian noise model with zero mean and standard deviation σ is used.

III. ESTIMATING THE OPTIMAL THRESHOLD λ IN THE RPLM WITH GAUSSIAN NOISY INPUT

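According to the corresponding MAP estimation, $C(\beta, \lambda)$ in (34) can be rewritten as

$$\begin{aligned} C(\beta, \lambda) &= \iint_D \exp[-\beta L(\eta^T x, \xi^T x, y)]\,dx\,dy \\ &= \int_{-\infty}^{+\infty} \exp(-\beta L(\eta^T x, \xi^T x, y))\,dt \\ &= 2 \left( \int_0^{1 - \lambda} \exp(0)\,dt + \int_{1 - \lambda}^{+\infty} \exp(-\beta (t - 1 + \lambda))\,dt \right) \\ &= \frac{2(1 + (1 - \lambda)\beta)}{\beta}. \end{aligned} \tag{35}$$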
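The closed form in (35) can be sanity-checked numerically. The sketch below assumes, as the third line of (35) suggests, that the loss is written in terms of the normalized residual t as max(0, |t| − (1 − λ)), and integrates with a plain midpoint rule over a truncated range; the function names, the range, and the step count are illustrative choices.

```python
import math


def normalizer_closed_form(beta, lam):
    """Last line of Eq. (35)."""
    return 2.0 * (1.0 + (1.0 - lam) * beta) / beta


def normalizer_numeric(beta, lam, t_max=40.0, steps=200_000):
    """Midpoint-rule integral of exp(-beta * max(0, |t| - (1 - lam))) on [-t_max, t_max]."""
    h = 2.0 * t_max / steps
    total = 0.0
    for i in range(steps):
        t = -t_max + (i + 0.5) * h
        total += math.exp(-beta * max(0.0, abs(t) - (1.0 - lam)))
    return total * h
```

For β = 2 and λ = 0.5 the closed form gives 2, and the numerical integral agrees to well within the discretization error.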

In terms of (31) and (35), we immediately have

$$E(L(\hat{\eta}^T x, \hat{\xi}^T x, y)) = \frac{1}{\beta (1 + (1 - \lambda)\beta)}. \tag{36}$$

Furthermore, according to (33) and (35), we obtain

$$\int_D \left[ \int_{-\infty}^{\hat{\xi}^T x - (1 - \lambda)\hat{\eta}^T |x|} p(y|x)\,dy + \int_{\hat{\xi}^T x + (1 - \lambda)\hat{\eta}^T |x|}^{+\infty} p(y|x)\,dy \right] p(x)\,dx = \frac{1}{1 + (1 - \lambda)\beta}. \tag{37}$$

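Assume that a Gaussian noise model with zero mean and standard deviation $\sigma$ is taken, i.e.

$$p(y|x) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left[ -\frac{(y - \tilde{\xi}^T x)^2}{2\sigma^2} \right]. \tag{38}$$

Let

$$t = \frac{y - \tilde{\xi}^T x}{\sqrt{2}\sigma}, \qquad y - \tilde{\xi}^T x \approx \hat{\xi}^T x - \tilde{\xi}^T x = \delta(x)$$

$$\varepsilon = 1 - \lambda$$

$$A(x) = \hat{\eta}^T |x|$$

$$E = \int_D A(x)\,p(x)\,dx$$

$$b_1(x) = \frac{1}{\sqrt{2}} \left( \frac{\varepsilon A(x)}{\sigma} - \frac{\delta(x)}{\sigma} \right)$$

$$b_2(x) = \frac{1}{\sqrt{2}} \left( \frac{\varepsilon A(x)}{\sigma} + \frac{\delta(x)}{\sigma} \right)$$

and

$$f(x) = \frac{2}{\sqrt{\pi}} \int_x^{+\infty} \exp(-t^2)\,dt.$$

Also, note that

$$\int_D p(x)\,dx = 1, \qquad \int_D \delta(x)\,p(x)\,dx = 0, \qquad \int_D \delta^2(x)\,p(x)\,dx = \sigma^2,$$

and

$$\exp\left[ -\frac{\delta^2(x)}{2\sigma^2} \right] \approx 1 - \frac{\delta^2(x)}{2\sigma^2}.$$

Thus, from (37), we have

$$\frac{1}{1 + \varepsilon\beta} = \int_D \frac{f(b_1) + f(b_2)}{2}\,p(x)\,dx. \tag{39}$$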

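From (25) and (36), we have

$$\frac{1}{\beta (1 + \varepsilon\beta)} = \int_D \left[ \frac{1}{A(x)} \left( -\frac{\sigma b_1}{\sqrt{2}} f(b_1) - \frac{\sigma b_2}{\sqrt{2}} f(b_2) + \frac{\sigma}{\sqrt{2\pi}} \exp(-b_1^2) + \frac{\sigma}{\sqrt{2\pi}} \exp(-b_2^2) \right) \right] p(x)\,dx. \tag{40}$$

By applying the Taylor series expansion on $f(x)$ and $\exp(-x^2)$, i.e.

$$f(x + h) = f(x) - \frac{2}{\sqrt{\pi}} e^{-x^2} h + \frac{2}{\sqrt{\pi}} x e^{-x^2} h^2 + o(h^3) \tag{41}$$

$$\exp(-(x + h)^2) = \exp(-x^2)(1 - 2xh + (2x^2 - 1)h^2) + o(h^3) \tag{42}$$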

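and after some tedious derivations, (39) and (40) become

$$\begin{aligned} \frac{1}{1 + \varepsilon\beta} &\approx \int_D \left[ f\left( \frac{\varepsilon A(x)}{\sqrt{2}\sigma} \right) + \frac{2}{\sqrt{\pi}} \frac{\varepsilon A(x)}{\sqrt{2}\sigma} \exp\left( -\frac{\varepsilon^2 A^2(x)}{2\sigma^2} \right) \frac{\delta^2(x)}{2\sigma^2} \right] p(x)\,dx \\ &= f\left( \frac{\varepsilon E}{\sqrt{2}\sigma} \right) + \frac{\varepsilon E}{\sqrt{2\pi}\sigma} \exp\left( -\frac{\varepsilon^2 E^2}{2\sigma^2} \right) \end{aligned} \tag{43}$$

and

$$\begin{aligned} \frac{1}{\beta (1 + \varepsilon\beta)} \approx \int_D \Bigg[ & -\varepsilon \left( \frac{f(b_1) + f(b_2)}{2} \right) + \frac{\delta(x)}{2A(x)} (f(b_1) - f(b_2)) \\ & + \frac{\sigma}{\sqrt{2\pi} A(x)}\,2 \exp\left( -\frac{\varepsilon^2 A^2(x)}{2\sigma^2} \right) \left( 1 + \left( \left( \frac{\varepsilon^2 A^2(x)}{\sigma^2} \right) - 1 \right) \frac{\delta^2(x)}{2\sigma^2} \right) \Bigg] p(x)\,dx. \end{aligned} \tag{44}$$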
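The function f defined before (39) is the complementary error function, which Python exposes as `math.erfc`, so the second-order expansions (41) and (42) used in these derivations can be checked numerically. The helper names below are illustrative.

```python
import math


def f(x):
    """f(x) = (2 / sqrt(pi)) * integral of exp(-t^2) from x to infinity, i.e. erfc(x)."""
    return math.erfc(x)


def f_taylor(x, h):
    """Eq. (41), truncated after the h**2 term."""
    g = 2.0 / math.sqrt(math.pi) * math.exp(-x * x)
    return f(x) - g * h + g * x * h * h


def gauss_taylor(x, h):
    """Eq. (42), truncated after the h**2 term."""
    return math.exp(-x * x) * (1.0 - 2.0 * x * h + (2.0 * x * x - 1.0) * h * h)
```

Because the odd-order terms cancel in f(b₁) + f(b₂), the average of `f(x - h)` and `f(x + h)` is `f(x)` plus the h² term alone, which is where the δ²(x) factor in (43) comes from.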

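Again, after some tedious derivations using (36), (39), (41), and (44), we have

$$\begin{aligned} E(L(\hat{\eta}^T x, \hat{\xi}^T x, y)) &= \frac{1}{\beta (1 + \varepsilon\beta)} \\ &= -\frac{\varepsilon}{1 + \varepsilon\beta} + \int_D \left[ \int_{-\infty}^{\hat{\xi}^T x - \varepsilon A(x)} \frac{\hat{\xi}^T x - y}{A(x)}\,p(y|x)\,dy + \int_{\hat{\xi}^T x + \varepsilon A(x)}^{+\infty} \frac{y - \hat{\xi}^T x}{A(x)}\,p(y|x)\,dy \right] p(x)\,dx \\ &= -\frac{\varepsilon}{1 + \varepsilon\beta} + \int_D \left[ \frac{\sqrt{2}}{\sqrt{\pi} A(x)} \exp\left( -\frac{\varepsilon^2 A^2(x)}{2\sigma^2} \right) \left( \sigma + \left( \frac{\varepsilon^2 A^2(x)}{\sigma^2} + 1 \right) \frac{\delta^2(x)}{2\sigma} \right) \right] p(x)\,dx \\ &= -\frac{\varepsilon}{1 + \varepsilon\beta} + \frac{\sqrt{2}\sigma}{\sqrt{\pi} E} \exp\left( -\frac{\varepsilon^2 E^2}{2\sigma^2} \right) \left( \frac{3}{2} + \frac{\varepsilon^2 E^2}{2\sigma^2} \right) \end{aligned} \tag{45}$$

i.e.

$$\frac{1}{\beta} = \frac{\sqrt{2}\sigma}{\sqrt{\pi} E} \exp\left( -\frac{\varepsilon^2 E^2}{2\sigma^2} \right) \left( \frac{3}{2} + \frac{\varepsilon^2 E^2}{2\sigma^2} \right). \tag{46}$$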


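After substituting (43) into (45), we have

$$E(L(\hat{\eta}^T x, \hat{\xi}^T x, y)) = -\varepsilon f\left( \frac{\varepsilon E}{\sqrt{2}\sigma} \right) + \frac{3\sigma}{\sqrt{2\pi} E} \exp\left( -\frac{\varepsilon^2 E^2}{2\sigma^2} \right). \tag{47}$$

Thus,

$$\beta\,E(L(\hat{\eta}^T x, \hat{\xi}^T x, y)) = \sqrt{2\pi} \exp\left( \frac{\varepsilon^2 E^2}{2\sigma^2} \right) \left( 3 + \frac{\varepsilon^2 E^2}{\sigma^2} \right)^{-1} \left( -\frac{\varepsilon E}{\sigma} f\left( \frac{\varepsilon E}{\sqrt{2}\sigma} \right) + \frac{3}{\sqrt{2\pi}} \exp\left( -\frac{\varepsilon^2 E^2}{2\sigma^2} \right) \right). \tag{48}$$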
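The chain (43), (45), (46), (47), (48) can be checked for internal consistency by evaluating each right-hand side numerically. The sketch below assumes that the right-hand side of (46) is an expression for 1/β, which is what (45) implies after collecting the terms in 1/(1 + εβ), and it uses `math.erfc` for f. The function names are illustrative.

```python
import math


def rhs_43(eps, E, sigma):
    """Right-hand side of Eq. (43), i.e. the approximation of 1 / (1 + eps * beta)."""
    u = eps * E / sigma
    return math.erfc(u / math.sqrt(2.0)) + u / math.sqrt(2.0 * math.pi) * math.exp(-u * u / 2.0)


def inv_beta_46(eps, E, sigma):
    """Right-hand side of Eq. (46), read here as 1 / beta (an assumption)."""
    u = eps * E / sigma
    scale = math.sqrt(2.0) * sigma / (math.sqrt(math.pi) * E)
    return scale * math.exp(-u * u / 2.0) * (1.5 + u * u / 2.0)


def expected_loss_47(eps, E, sigma):
    """Right-hand side of Eq. (47)."""
    u = eps * E / sigma
    tail = 3.0 * sigma / (math.sqrt(2.0 * math.pi) * E) * math.exp(-u * u / 2.0)
    return -eps * math.erfc(u / math.sqrt(2.0)) + tail


def beta_times_loss_48(eps, E, sigma):
    """Right-hand side of Eq. (48); it depends on eps, E, sigma only through u."""
    u = eps * E / sigma
    bracket = -u * math.erfc(u / math.sqrt(2.0)) + 3.0 / math.sqrt(2.0 * math.pi) * math.exp(-u * u / 2.0)
    return math.sqrt(2.0 * math.pi) * math.exp(u * u / 2.0) / (3.0 + u * u) * bracket
```

The last function makes explicit that (48) is a function of the single ratio εE/σ, which is the property the final argument about ε and σ relies on.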

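Now, let us go back to (28) for $\eta = \hat{\eta}$, $\xi = \hat{\xi}$, i.e.

$$\alpha E(|x|) = -\beta\,\frac{\partial E(L(\hat{\eta}^T x, \hat{\xi}^T x, y))}{\partial \eta}. \tag{49}$$

Obviously, for the given dataset $D$ defined in (11), we can view $E(|x|)$ as a constant vector. Thus, there exists a constant vector $k$ such that

$$E(L(\hat{\eta}^T x, \hat{\xi}^T x, y)) = \hat{\eta}^T k \tag{50}$$

i.e.

$$E(L(\hat{\eta}^T x, \hat{\xi}^T x, y))\,|x| = \hat{\eta}^T k |x| = \hat{\eta}^T |x| k. \tag{51}$$

So,

$$E(\hat{\eta}^T |x|) = E(L(\hat{\eta}^T x, \hat{\xi}^T x, y))\,(k^{-1})\,E(|x|) \tag{52}$$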

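where $k^{-1}k = I$. After substituting (35), (43), (46), (48), and (52) into (34), minimizing (34) becomes minimizing

$$\begin{aligned} h\left( \frac{\varepsilon}{\sigma} \right) = {} & \sqrt{2\pi} \exp\left( \frac{\varepsilon^2 E^2}{2\sigma^2} \right) \left( 3 + \frac{\varepsilon^2 E^2}{\sigma^2} \right)^{-1} \left( \alpha (k^{-1}) E(|x|) - \frac{\varepsilon E}{\sigma} f\left( \frac{\varepsilon E}{\sqrt{2}\sigma} \right) + \frac{3}{\sqrt{2\pi}} \exp\left( -\frac{\varepsilon^2 E^2}{2\sigma^2} \right) \right) \\ & - \ln\left( f\left( \frac{\varepsilon E}{\sqrt{2}\sigma} \right) + \frac{\varepsilon E}{\sqrt{2\pi}\sigma} \exp\left( -\frac{\varepsilon^2 E^2}{2\sigma^2} \right) \right) \\ & + \ln\left( \frac{\sqrt{2}\sigma}{\sqrt{\pi} E} \exp\left( -\frac{\varepsilon^2 E^2}{2\sigma^2} \right) \left( \frac{3}{2} + \frac{\varepsilon^2 E^2}{2\sigma^2} \right) \right) + \text{const}. \end{aligned} \tag{53}$$

Obviously, (53) achieves its minimum when ε/σ takes some fixed value, indicating that there is an approximately linear relationship between ε and σ. As ε = 1 − λ, there is an approximately inversely linear relationship between λ and the standard deviation σ of noisy input. This finding offers an important reference for determining the threshold λ in the PLM with noisy input.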
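In practice, the relationship ε ≈ rσ with ε = 1 − λ suggests a simple rule of thumb. The sketch below is not a procedure from the paper. It assumes the ratio r is roughly constant for a given dataset, calibrates it from one trusted (λ, σ) pair, and clips the suggested threshold to [0, 1]. Both function names are hypothetical.

```python
def calibrate_ratio(lam_ref, sigma_ref):
    """Hypothetical helper: r = (1 - lambda) / sigma from one trusted (lambda, sigma) pair."""
    return (1.0 - lam_ref) / sigma_ref


def suggest_threshold(sigma, ratio):
    """Suggested lambda = 1 - ratio * sigma, clipped to [0, 1]."""
    return min(1.0, max(0.0, 1.0 - ratio * sigma))
```

If λ = 0.8 worked well at σ = 0.5, the calibrated ratio is 0.4, and doubling the noise level to σ = 1.0 lowers the suggested threshold to 0.6.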

Fig. 3. Original data set.

Fig. 4. Relationship between λ and σ when k = 0.01.

IV. EXPERIMENTAL RESULTS

Four experiments were conducted to validate our findings in the previous section, and the results are reported next.

Experiment 1: In this experiment, we adopted the function $y = (1/4)x + 1 + r$, $x \in [-5, 5]$, where $r \in [-1, 1]$ denotes a random value, and generated the original dataset $(x_i, y_i)$, $i = 1, 2, \ldots, 41$, with $x$ varying from −5 to 5 using a step length equal to 0.25, as shown in Fig. 3. We first set $\lambda = 0.5$, and obtained the regression value $(\xi_i, \eta_i)$, $i = 1, 2, \ldots, 41$, using our RPLM model, where $\xi_i$ denotes the center and $\eta_i$ denotes the radius for the $i$th fuzzy membership function. In order to study the dependency relationship between $\lambda$ and $\sigma$, let $y_i' = y_i + kn$, $i = 1, 2, \ldots, 41$, where $k$ is an SNR and $n \sim N(0, \sigma)$ represents the Gaussian noise. Similarly, we can generate its corresponding dataset $(x_i, y_i')$, $i = 1, 2, \ldots, 41$, and obtain its regression value $(\xi_i', \eta_i')$, $i = 1, 2, \ldots, 41$, using the same RPLM model. To ensure fairness in this experiment, we took $\sigma$ from [0.1, 2.0], with step length equal to 0.1, and used the Gaussian noise distribution to generate 20 groups of the corresponding datasets for each given $\sigma$. For each given $\sigma$, we computed $\lambda$ as the average result of all 20 $\lambda$ values that can minimize $\sum_{i=1}^{41} \sqrt{(\xi_i - \xi_i')^2 + (\eta_i - \eta_i')^2}$, respectively, for each group of the generated datasets.

Fig. 5. Relationship between λ and σ when k = 0.05.

Fig. 6. Relationship between λ and σ when k = 0.1.
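The data generation and the comparison criterion of Experiment 1 can be sketched as follows. Two points are assumptions not stated in the text: the random value r is drawn uniformly from [−1, 1], and the second argument of N(0, σ) is read as a standard deviation. The RPLM fit itself is not reproduced here, and the function names are illustrative.

```python
import math
import random


def make_dataset(rng):
    """41 points of y = x/4 + 1 + r on x = -5, -4.75, ..., 5 (r assumed uniform on [-1, 1])."""
    xs = [-5.0 + 0.25 * i for i in range(41)]
    ys = [0.25 * x + 1.0 + rng.uniform(-1.0, 1.0) for x in xs]
    return xs, ys


def add_noise(ys, k, sigma, rng):
    """y'_i = y_i + k * n with n drawn from a zero-mean Gaussian of standard deviation sigma."""
    return [y + k * rng.gauss(0.0, sigma) for y in ys]


def model_distance(fit_a, fit_b):
    """Sum over i of sqrt((xi_i - xi'_i)^2 + (eta_i - eta'_i)^2) for two lists of (center, radius)."""
    return sum(math.hypot(ca - cb, ra - rb) for (ca, ra), (cb, rb) in zip(fit_a, fit_b))
```

A full replication would fit the RPLM to both the original and the noisy data for each candidate λ, keep the λ that minimizes `model_distance`, and average that choice over the 20 noisy datasets generated for each σ.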

Figs. 4–6 demonstrate the dependency relationship between the threshold λ and σ for all 20 σ values with different k values (see ‘+’ in the figures). Here, the curves depict the change tendencies between λ and σ, respectively. We can easily see from these figures that, when the noise is small, i.e., k and σ are comparatively small, there is an approximately inversely proportional relationship between λ and σ. However, when k and/or σ are comparatively large, i.e., the datasets are seriously distorted, this dependency cannot be maintained any more (see Figs. 5 and 6).

Experiment 2: We took the dataset from [6] in this experiment (see also Table I). With the same experimental setup and method as in Experiment 1, Figs. 7–9 demonstrate our experimental results. Clearly, the results here support the findings in Section III, and conclusions like those in Experiment 1 can be drawn.

Fig. 7. Relationship between λ and σ when k = 0.05.

Fig. 8. Relationship between λ and σ when k = 0.1.

Fig. 9. Relationship between λ and σ when k = 0.15.

TABLE II. 21 ORIGINAL SPECTRAL DATA OF A KIND OF DIESEL OIL

Fig. 10. Relationship between λ and σ when k = 0.01.

Experiment 3: In this experiment, a real-life dataset was taken from [26] and listed in Table II. This dataset collects 21 real spectral analysis data of a kind of diesel oil used in a gas engine. In the original dataset, the input variables are Po, Cr, Mo, Al, Ma, Ni, and V, and the output variable is Fe. With the same experimental setup and method as before, Figs. 10–12 show the experimental results, which again validate our theoretical conclusion.

Fig. 11. Relationship between λ and σ when k = 0.05.

Fig. 12. Relationship between λ and σ when k = 0.1.

Experiment 4: In this experiment, we apply the RPLM to a real dataset about wastewater treatment systems in Taiwan [25], as shown in Table III. The fuzzy regression was used in [25] to build a cost function for this basic factual data including flow rate, treatment degree, influent biochemical oxygen demand (BOD), and collection area. As pointed out in [25], this dataset was collected from various sources, which contain some degree of uncertainty originating from errors of various types and even noise. Therefore, when the RPLM is utilized, we should consider the choice of λ with noisy input. With the same experimental setup and method as in Experiment 1, Figs. 13–15 illustrate the relationship between λ and σ with different k. Clearly, the obtained results tell us that λ should take a small value for very noisy input. In other words, if a dataset contains small noise, we can take a comparatively bigger λ value.

TABLE III. INPUT–OUTPUT DATA CONCERNING WASTEWATER TREATMENT SYSTEMS IN TAIWAN [25]

Fig. 13. Relationship between λ and σ when k = 0.1.

Fig. 14. Relationship between λ and σ when k = 0.15.

Fig. 15. Relationship between λ and σ when k = 0.2.

V. CONCLUSION

In this paper, the PLM was extended to its regularized ver-sion, the RPLM, in order to enhance its generalization capability.Accordingly, the RPLM was proved to be equivalent to the cor-responding MAP framework. With the help of this framework,an approximately inversely proportional relationship betweenthe threshold λ in the RPLM and the standard deviation σ ofGaussian noisy input was derived. Our experimental results con-firmed this dependency relationship.

It should be pointed out that, although this conclusion isbased on the RPLM, we can also refer to it when the PLM isapplied. In other words, our theoretical result on the optimalchoice of the threshold λ is very valuable for both the PLM andRPLM. Moreover, in order to apply the obtained approximatelyinversely proportional relationship in practical applications, onehas to first obtain an estimate of the noise level σ. One way todo so is by using Bayesian methods. In the future, we will studyhow to integrate PLMs/RPLMs with Bayesian methods.

Although the Gaussian noise is typically adopted in a ro-bust data analysis, there remain other symmetric/nonsymmetrictypes of noise such as the Student-t-distribution noise and theDirichlet distribution noise. The way to determine the optimalchoices of the threshold λ in the RPLM with such kinds of noisyinput is also worthy to be studied in the future.

ACKNOWLEDGMENT

The authors would like to thank the referees for their valuable comments, which greatly helped to improve the quality of the paper.

REFERENCES

[1] H. Tanaka, S. Uejima, and K. Asai, “Linear regression analysis with fuzzy model,” IEEE Trans. Syst., Man, Cybern., vol. SMC-12, no. 6, pp. 903–907, Dec. 1982.

[2] H. Tanaka, “Fuzzy data analysis by possibilistic linear model,” Fuzzy Sets Syst., vol. 24, pp. 363–375, 1987.

[3] H. Tanaka, “Possibilistic linear system and their application to the linear regression model,” Fuzzy Sets Syst., vol. 27, pp. 275–289, 1988.

[4] H. Tanaka and H. Ishibuchi, “Possibilistic regression analysis based on linear programming,” in Fuzzy Regression Analysis, J. Kacprzyk and M. Fedrizzi, Eds. Heidelberg, Germany: Omnitech; Warsaw, Poland: Springer-Verlag, 1992, pp. 47–60.

[5] H. Tanaka, H. Ishibuchi, and S. Yoshikawa, “Exponential possibility regression analysis,” Fuzzy Sets Syst., vol. 69, pp. 305–318, 1995.

[6] H. Tanaka and H. Ishibuchi, “Identification of possibilistic linear systems by quadratic membership functions of fuzzy parameters,” Fuzzy Sets Syst., vol. 41, pp. 145–160, 1991.

[7] P. Diamond, “Fuzzy least squares,” Inf. Sci., vol. 46, pp. 145–157, 1988.

[8] K. K. Yen, S. Ghoshray, and G. Roig, “A linear regression model using triangular fuzzy number coefficients,” Fuzzy Sets Syst., vol. 106, pp. 167–177, 1999.

[9] C. Kao and C.-L. Chyu, “A fuzzy linear regression model with better explanatory power,” Fuzzy Sets Syst., vol. 126, pp. 401–409, 2002.

[10] M. Modarres and E. Nasrabadi, “Fuzzy linear regression models with least square errors,” Appl. Math. Comput., vol. 163, pp. 977–989, 2005.

[11] M. Hojati and C. R. Bector, “A simple method for computation of fuzzy linear regression,” Eur. J. Oper. Res., vol. 166, pp. 172–184, 2005.

[12] D. Savic and W. Pedrycz, “Evaluation of fuzzy linear regression models,” Fuzzy Sets Syst., vol. 39, pp. 51–63, 1991.

[13] G. Peters, “Fuzzy linear regression with fuzzy intervals,” Fuzzy Sets Syst., vol. 63, pp. 45–55, 1994.

[14] D. T. Redden and W. H. Woodall, “Properties of certain fuzzy linear regression methods,” Fuzzy Sets Syst., vol. 64, pp. 361–375, 1994.

[15] H. Tanaka and H. Lee, “Interval regression analysis by quadratic programming approach,” IEEE Trans. Fuzzy Syst., vol. 6, no. 4, pp. 473–481, Nov. 1998.

[16] D. H. Hong and C. Hwang, “Support vector fuzzy regression machines,” Fuzzy Sets Syst., vol. 138, pp. 271–281, 2003.

[17] M. Sakawa and H. Yano, “Multiobjective fuzzy linear regression analysis for fuzzy input-output data,” Fuzzy Sets Syst., vol. 47, pp. 173–181, 1992.

[18] H. Ishibuchi and M. Nii, “Numerical analysis of the learning of fuzzified neural networks from fuzzy if-then rules,” Fuzzy Sets Syst., vol. 120, no. 2, pp. 281–307, 2001.

[19] M. H. Law and J. T. Kwok, “Bayesian support vector regression,” in Proc. 8th Int. Workshop Artif. Intell. Statist., 2001, pp. 239–244.

[20] V. Cherkassky and Y. Ma, “Practical selection of SVM parameters and noise estimation for SVM regression,” Neural Netw., vol. 17, no. 1, pp. 113–126, 2004.

[21] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines. Cambridge, U.K.: Cambridge Univ. Press, 2000.

[22] J. B. Gao, S. R. Gunn, C. J. Harris, and M. Brown, “A probabilistic framework for SVM regression and error bar estimation,” Mach. Learning, vol. 46, pp. 71–89, 2002.

[23] J. T. Kwok and I. W. Tsang, “Linear dependency between ε and the input noise in ε-support vector regression,” IEEE Trans. Neural Netw., vol. 14, no. 3, pp. 544–553, May 2003.

[24] S. Wang, J. Zhu, F.-L. Chung, Q. Lin, and D. Hu, “Theoretically optimal parameter choices for support vector regression machines with noisy input,” Soft Comput., vol. 9, no. 10, pp. 732–741, 2005.

[25] C.-G. Wen and C.-S. Lee, “Development of a cost function for wastewater treatment systems with fuzzy regression,” Fuzzy Sets Syst., vol. 106, pp. 143–153, 1999.

[26] J. Wu, “Device diagnosis based on possibility theory,” J. Fuzzy Syst. Math., vol. 11, no. 2, pp. 75–84, 1999.

[27] S. Wang, F.-L. Chung, H. B. Shen, and R. Q. Zhu, “Note on the relationship between probabilistic and fuzzy clustering,” Soft Comput., vol. 8, pp. 523–526, 2004.

Hongwei Ge received the M.S. degree in computer science from Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 1990. He is currently working toward the Ph.D. degree in computer science at the School of Information, Southern Yangtse University, Wuxi, China.

He is currently an Associate Professor in the School of Information, Southern Yangtse University, Wuxi, China. His current research interests include neuro-fuzzy systems, machine learning, and pattern recognition and their applications.

Fu-Lai Chung (M’96) received the B.Sc. degree from the University of Manitoba, Winnipeg, Canada, in 1987, and the M.Phil. and Ph.D. degrees in electronic engineering from the Chinese University of Hong Kong, Shatin, Hong Kong, in 1991 and 1995, respectively.

In 1994, he joined the Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong, where he is currently an Associate Professor. He has published widely in the areas of fuzzy systems, neural networks, and pattern recognition. His current research interests include fuzzy data mining, fuzzy neural network modeling, and fuzzy techniques for multimedia applications.

Shitong Wang received the M.S. degree in computer science from Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 1987.

He was a Research Scientist at London University and Bristol University in the U.K., Hiroshima International University in Japan, and Hong Kong University of Science and Technology, Hong Kong Polytechnic University, and Hong Kong City University in Hong Kong. He is currently a Full Professor in the School of Information, Southern Yangtse University, Wuxi, China. His current research interests include artificial intelligence (AI), neuro-fuzzy systems, pattern recognition, and image processing. He is the author or coauthor of more than 80 papers published in various national and international journals. He is also the author or coauthor of seven books.