

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 40, NO. 3, JUNE 2010 845

Multiobjective Optimization of Temporal Processes
Zhe Song, Member, IEEE, and Andrew Kusiak, Member, IEEE

Abstract—This paper presents a dynamic predictive-optimization framework of a nonlinear temporal process. Data-mining (DM) and evolutionary strategy algorithms are integrated in the framework for solving the optimization model. DM algorithms learn dynamic equations from the process data. An evolutionary strategy algorithm is then applied to solve the optimization problem guided by the knowledge extracted by the DM algorithm. The concept presented in this paper is illustrated with the data from a power plant, where the goal is to maximize the boiler efficiency and minimize the limestone consumption. This multiobjective optimization problem can be either transformed into a single-objective optimization problem through preference-aggregation approaches or treated as a Pareto-optimal optimization problem. The computational results have shown the effectiveness of the proposed optimization framework.

Index Terms—Data mining (DM), dynamic modeling, evolutionary algorithms (EAs), multiobjective optimization, nonlinear temporal process, power plant, predictive control, preference-based optimization.

NOMENCLATURE

$x$  Vector of the controllable variables of a process.
$x_i$  $i$th controllable variable.
$v$  Vector of the noncontrollable variables of a process.
$v_i$  $i$th noncontrollable variable.
$y$  Vector of the response variables of a process.
$y_i$  $i$th response variable.
$y_P$  Vector of the performance variables of a process; $y_P$ is a subset of $y$.
$y_{NP}$  Vector of the nonperformance variables of a process; $y_{NP}$ is a subset of $y$.
$\Omega_x$  Search space of $x$.
$\Omega_{y_{NP}}$  Constraint space of $y_{NP}$.
$f(\bullet)$  Function capturing the mapping between $(x, v)$ and $y$.
$t$  Sampling time stamp.
$d_{y_i}$, $d_{x_i}$, $d_{v_i}$  Maximum possible time delays for $y_i$, $x_i$, $v_i$.
$D_{y_i}^{y_i}$, $D_{x_i}^{y_i}$, $D_{v_i}^{y_i}$  Sets of time-delay constants selected for the corresponding variables $y_i$, $x_i$, $v_i$ under the response variable $y_i$.

Manuscript received November 2, 2008; revised February 8, 2009 and April 25, 2009. First published November 6, 2009; current version published June 16, 2010. This work was supported by the Iowa Energy Center under Grant 07-01. This paper was recommended by Associate Editor Y. S. Ong.

The authors are with the Department of Mechanical and Industrial Engineering, The University of Iowa, Iowa City, IA 52242-1527 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSMCB.2009.2030667

$d_x^{y_i,\min}$, $d_x^{y_i,\max}$  Minimum and maximum values of the sets $D_{x_1}^{y_i}, \ldots, D_{x_k}^{y_i}$.
$d_v^{y_i,\min}$, $d_v^{y_i,\max}$  Minimum and maximum values of the sets $D_{v_1}^{y_i}, \ldots, D_{v_k}^{y_i}$.
$X$  Set of all controllable variables.
$X^{1,y_i}$, $X^{2,y_i}$  Two subsets of $X$: the actionable and nonactionable sets for $y_i$.
$Y^{NP}$  Set of response variables that are not performance variables.
$Y^{1,NP}$, $Y^{2,NP}$  Two subsets of $Y^{NP}$: one is affected by changing the controllable variables; the other is not.
$\alpha_i(\bullet)$  Preference function for the performance variable $y_i$.
$y_i(LB)$, $y_i(CP)$, $y_i(UB)$  Lower bound, center point, and upper bound for the preference function $\alpha_i(\bullet)$.
$\Delta y_i$  Small positive constant to evaluate the decrease or increase of $y_i$.
$\Omega_p$  Preference space.
$\mathrm{Region}_i$  Region $i$ in the preference space, characterized by its preference values.
$C(\bullet)$  Cost function of the controllable variables.
$R$, $S$  Positive semidefinite matrices.
$\beta(\bullet)$, $w_1, \ldots, w_M, w_C$  Aggregation function and the weights used in it.
$\lambda$  Offspring size.
$\mu$  Parent size or initial population size.
$s_i$  Solution vector of the $i$th individual.
$\sigma_i$  Mutation vector of the $i$th individual.
$N(\bullet)$  Normal distribution.
$\delta$  Threshold-distance vector to differentiate two individuals.
$X^{*,\mathrm{Region}_i}(t)$  Set of Pareto-optimal solutions leading to $\mathrm{Region}_i$ at sampling time $t$.
$X^{\mathrm{Region}_i}(t)$  Set of solutions in the offspring pool leading to $\mathrm{Region}_i$ at sampling time $t$.
$n_{local}$  Number of dominated individuals in a preference region.
$n_{global}$  Number of dominated individuals in the preference space.

I. INTRODUCTION

OPTIMIZING nonlinear and nonstationary processes with multiple objectives presents a challenge for traditional solution approaches. In this paper, a process is represented as a triplet (x, v, y), where x ∈ R^k is a vector of k controllable variables, v ∈ R^m is a vector of m noncontrollable (measurable) variables, and y ∈ R^l is a vector of l system response variables. The value of a response variable changes due to



the controllable and noncontrollable variables. The controllable and noncontrollable variables are considered in this paper as input variables. The underlying relationship is represented as y = f(x, v), where f(•) is a function capturing the process, and it may change in time. y = f(x, v) can be further expanded as y1 = f1(x, v), y2 = f2(x, v), . . . , yl = fl(x, v) for each response variable. Finding the optimal control settings for optimizing the process can be formulated as a multiobjective optimization problem with constraints. Without loss of generality, the first lp response variables (lp ≤ l) are assumed to be the performance metrics to be maximized. Let yP = [y1, . . . , ylp]^T be a vector of all the performance response variables, and yNP = [ylp+1, . . . , yl]^T a vector of the remaining nonperformance response variables

$$\max_{x} \left\{ y_1, y_2, \ldots, y_{l_p} \right\} \quad \text{s.t.} \quad x \in \Omega_x, \;\; y_{NP} \in \Omega_{y_{NP}}. \qquad (1)$$

In model (1), Ωx is the feasible search space of the controllable variables x, and ΩyNP is the constraint space within which yNP has to stay. In many industrial applications, the noncontrollable variables v, the underlying function f(•), and the spaces Ωx and ΩyNP are time dependent. Thus, the optimization model should be solved repeatedly.

Finding the optimal control settings for nonlinear and temporal processes with multiple objectives poses several challenges.

1) It is difficult to derive analytic models describing f(•). For example, modeling the relationship between combustion-process efficiency and input variables is not trivial. Thus, it is difficult to solve model (1) with traditional optimization techniques.

2) The function f(•) is nonstationary. Updating f(•) is necessary in practical applications. The function f(•) can be extracted with data-mining (DM) algorithms from the current process data, and, thus, it remains current. For example, a combustor ages over time. Regular maintenance and repair change the combustor's properties, thus impacting the combustion process and the function f(•).

3) How to decide the tradeoffs among the multiple objectives? How to find a set of potential solutions that is Pareto optimal?

To deal with the aforementioned challenges, a framework integrating DM and evolutionary algorithms (EAs) is presented. The underlying process is captured with dynamic equations learned by DM algorithms from process data. Then, the optimization model is solved using EAs. Domain knowledge is thereby reflected through the definition of a preference function transforming performance metrics into the preference space, which is easier for the decision makers to understand.

Recent advances in EAs and DM present an opportunity to model and optimize complex systems using operational data. DM algorithms have proved to be effective in applications such as manufacturing, marketing, combustion processes, and so on [4], [20], [25], [33]. The use of DM algorithms to extract process models and then to optimize the models by an EA can be traced back to [31] and [37], where neural networks (NNs) were used to identify the process with static equations (i.e., no time delays were considered). Clustering algorithms were used in [25] to extract patterns leading to higher combustion efficiency in steady states. It has been recognized that DM algorithms can identify dynamic equations from process data, which, in turn, can be solved by EAs. Numerous successful EA applications have been reported in the literature [1]–[3], [8], [9], [15], [18], [19], [23], [31], [32], [37]–[39], [42], [43]. An EA was applied to solve a multiobjective power-dispatch problem [1] and a racing-car design optimization [3]. NNs and genetic algorithms were used in [2] to optimize engine emissions, where the objective function was a nonlinear weighted function. An EA was applied to the multiobjective optimization of gas-turbine emissions and pressure fluctuation [8]. The research results reported in the literature offer a promising direction for solving complex problems that are difficult to handle by traditional analytical approaches.

This paper presents a framework for optimizing temporal processes. The process model is described by a set of dynamic equations identified by DM algorithms. As gradients are usually not available for these dynamic equations assembled in a model, an EA is used to solve the model. The optimization problem discussed in this paper can be considered as a dynamic multiobjective optimization problem [15], [44]. The focus of this paper is on modeling processes with dynamic equations and generating solutions at the specific intervals required by the application; the latter parallels solving static optimization problems. From the dynamic-optimization perspective, tracking optimal solutions is achieved by iteratively solving the optimization problem or updating the underlying process model with recent process data.

The proposed approach integrating DM with evolutionary computation has been applied to optimize a combustion process in a power plant. Computational results have shown that the models extracted by the DM algorithms are accurate and can be used to control an industrial combustion process. The optimization framework presented is applicable to other processes, e.g., refinery processes and wind energy conversion, where analytical approaches are not able to handle the complexity and scope of the models. The proposed approach calls for the use of a large volume of process data representing the process.

II. PROCESS MODELING AND OPTIMIZATION BASED ON DYNAMIC EQUATIONS

A process can be considered as a dynamic multiple-input-multiple-output system. Assume that the value of the first performance response variable at time t, i.e., $y_1(t)$, could be determined by the values of the previous system status (i.e., predictors): $\{y_1(t-1), \ldots, y_1(t-d_{y_1})\}$, $\{x_1(t-1), \ldots, x_1(t-d_{x_1})\}$, $\ldots$, $\{x_k(t-1), \ldots, x_k(t-d_{x_k})\}$, $\{v_1(t-1), \ldots, v_1(t-d_{v_1})\}$, $\ldots$, $\{v_m(t-1), \ldots, v_m(t-d_{v_m})\}$. Similarly, $y_i(t)$ could be affected by $\{y_i(t-1), \ldots, y_i(t-d_{y_i})\}$, $\{x_1(t-1), \ldots, x_1(t-d_{x_1})\}$, $\ldots$, $\{x_k(t-1), \ldots, x_k(t-d_{x_k})\}$, $\{v_1(t-1), \ldots, v_1(t-d_{v_1})\}$, $\ldots$, $\{v_m(t-1), \ldots, v_m(t-d_{v_m})\}$, for i = 1 to l. Here, $d_{y_i}, d_{x_1}, \ldots, d_{x_k}, d_{v_1}, \ldots, d_{v_m}$ are maximum possible time delays to be considered for the corresponding variables, and they are all positive constants. To obtain an accurate dynamic model that can be applied to optimize the


process, selecting the appropriate predictors is important. For example, for the performance variable $y_1(t)$, a predictor-selection algorithm selects a set of important predictors among $\{y_1(t-1), \ldots, y_1(t-d_{y_1})\}$, $\{x_1(t-1), \ldots, x_1(t-d_{x_1})\}$, $\ldots$, $\{x_k(t-1), \ldots, x_k(t-d_{x_k})\}$, $\{v_1(t-1), \ldots, v_1(t-d_{v_1})\}$, $\ldots$, $\{v_m(t-1), \ldots, v_m(t-d_{v_m})\}$. DM offers algorithms that can perform such a task. For example, the boosting-tree algorithm [16], [17] can be used to determine each predictor's importance. Wrapper and genetic random-search procedures can determine the best set of predictors [14], [35]. Aside from the algorithms for predictor selection, domain knowledge is another important source of information for case-by-case applications. Predictor selection is not discussed in detail in this paper; rather, it is accomplished by domain knowledge and importance ranking. For clarity of presentation, some definitions and observations about the dynamic equations are included in the Appendix.

Since the process is modeled with a set of dynamic equations, model (1) can be reformulated as a one-step predictive-optimization model. The predictive-optimization model resembles the widely used predictive-control idea, which has proven to be successful in industry [18], [21], [26], [27], [36]. At sampling time t, the system status is {y1(t), . . . , yl(t), x1(t), . . . , xk(t), v1(t), . . . , vm(t)}, and all historical information is available. The optimal values of {x1(t), . . . , xk(t)} are determined by solving

$$\begin{aligned}
\max_{x(t)}\;& \left\{ y_1\!\left(t + d_x^{y_1,\min}\right),\; y_2\!\left(t + d_x^{y_2,\min}\right),\; \ldots,\; y_{l_p}\!\left(t + d_x^{y_{l_p},\min}\right) \right\} \\
\text{s.t.}\;& x(t) \in \Omega_x \\
& \left[ y_{l_p+1}\!\left(t + d_x^{y_{l_p+1},\min}\right), \ldots, y_l\!\left(t + d_x^{y_l,\min}\right) \right]^T \in \Omega_{y_{NP}} \\
& y_i\!\left(t + d_x^{y_i,\min}\right) = f_i\!\left(\ldots,\; [x_j(t)]_{x_j \in X^{1,y_i}},\; \left[x_j\!\left(t - d + d_x^{y_i,\min}\right)\right]_{x_j \in X^{2,y_i},\, d \in D_{x_j}^{y_i}},\; \ldots\right), \quad i = 1, \ldots, l. \qquad (2)
\end{aligned}$$

Thus, the performance metrics $\{y_1, \ldots, y_{l_p}\}$ are optimized at sampling times $\{t + d_x^{y_1,\min}, t + d_x^{y_2,\min}, \ldots, t + d_x^{y_{l_p},\min}\}$. Based on observation 3, the x vector in model (2) is composed of variables belonging to $\bigcup_{i=1}^{l_p} X^{1,y_i}$, i.e., $x(t) = [x_j(t)]_{x_j \in \bigcup_{i=1}^{l_p} X^{1,y_i}}$. Note that, in order to improve the solution robustness of model (2), different techniques could be used; interested readers are referred to [12], [19], and [34]. Different DM algorithms can also be used to learn the dynamic equations to form an ensemble [33] and combine the predictions.
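The dynamic equations above take lagged values of y, x, and v as predictors. A minimal sketch (hypothetical helper, not from the paper) of assembling such a predictor vector from plain history lists, with index -1 denoting time t-1 and the selected lag sets supplied per series:

```python
def lagged_predictors(y_hist, x_hist, v_hist, delays):
    """Assemble the predictor vector for one dynamic equation
    y_i(t) = f_i(lagged y_i, lagged x_j, lagged v_j).

    y_hist: history of y_i as a list (y_hist[-1] is y_i(t-1)).
    x_hist, v_hist: lists of per-variable history lists.
    delays: dict with lag sets, e.g. {"y": [1, 2], "x": [[1]], "v": [[2]]},
    playing the role of the sets D^{y_i} in the nomenclature.
    """
    row = []
    for d in delays["y"]:
        row.append(y_hist[-d])          # y_i(t-d)
    for series, lags in zip(x_hist, delays["x"]):
        row.extend(series[-d] for d in lags)  # x_j(t-d)
    for series, lags in zip(v_hist, delays["v"]):
        row.extend(series[-d] for d in lags)  # v_j(t-d)
    return row
```

A DM algorithm would then be trained to map such rows to the observed $y_i(t)$ values.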

Model (2) can be treated either as a single-objective optimization or as a Pareto-optimization problem; both are discussed in this paper and illustrated with case studies.

The concept of a preference function is introduced, whereby domain knowledge can be used to simplify decision making and to help design multiobjective evolutionary strategy algorithms that keep desired individuals in the elite set. The single-objective representation of model (2) is solved with an evolutionary strategy. The strength Pareto EA (SPEA) [38] is used to optimize the Pareto-optimal representation of model (2).

A. Incorporating Domain Knowledge in Preference Functions

In a multiobjective decision-making process, domain knowledge is important in determining the optimal or satisfactory solution. In this paper, domain knowledge is incorporated in model (2) through the preference functions [10], [29], [30].

Definition 1: Preference function αi(•) is defined to transform performance variable yi into the interval [0, 1], with "0" denoting complete unacceptability, "1" standing for total satisfaction, and "0.5" denoting "not good and not bad," i = 1, . . . , lp.

Based on the previous assumption, a high value of yi(t) means better performance; thus, the derivative dαi/dyi should be greater than or equal to zero, i.e., dαi/dyi ≥ 0. To characterize the preference function αi(•), three characteristic points need to be defined.

Definition 2: For αi(•), let yi(LB), yi(CP), and yi(UB) stand for yi's lower bound, center point, and upper bound for the preference, i.e., αi(yi) = 1 for yi ≥ yi(UB), αi(yi) = 0.5 for yi = yi(CP), and αi(yi) = 0 for yi ≤ yi(LB), i = 1, . . . , lp.
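The three characteristic points of Definitions 1 and 2 admit, for instance, a simple piecewise-linear preference function. The sketch below is one possible choice, not the specific form used in the paper's case study; the function name is illustrative:

```python
def preference(y, lb, cp, ub):
    """Piecewise-linear preference alpha(y) in [0, 1]:
    0 at or below lb, 0.5 at cp, 1 at or above ub,
    monotone nondecreasing in between (d alpha / dy >= 0)."""
    if y <= lb:
        return 0.0
    if y >= ub:
        return 1.0
    if y <= cp:
        return 0.5 * (y - lb) / (cp - lb)
    return 0.5 + 0.5 * (y - cp) / (ub - cp)
```

With the dynamic bounds introduced later in this section, one would call it as `preference(y_observed, y_pred - dy, y_pred, y_pred + dy)`.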

Model (2) can be expressed as follows by using the preference functions:

$$\begin{aligned}
\max_{x(t)}\;& \left\{ \alpha_1\!\left(y_1\!\left(t + d_x^{y_1,\min}\right)\right),\; \alpha_2\!\left(y_2\!\left(t + d_x^{y_2,\min}\right)\right),\; \ldots,\; \alpha_{l_p}\!\left(y_{l_p}\!\left(t + d_x^{y_{l_p},\min}\right)\right) \right\} \\
\text{s.t.}\;& x(t) \in \Omega_x \\
& \left[ y_{l_p+1}\!\left(t + d_x^{y_{l_p+1},\min}\right), \ldots, y_l\!\left(t + d_x^{y_l,\min}\right) \right]^T \in \Omega_{y_{NP}} \\
& y_i\!\left(t + d_x^{y_i,\min}\right) = f_i\!\left(\ldots,\; [x_j(t)]_{x_j \in X^{1,y_i}},\; \left[x_j\!\left(t - d + d_x^{y_i,\min}\right)\right]_{x_j \in X^{2,y_i},\, d \in D_{x_j}^{y_i}},\; \ldots\right), \quad i = 1, \ldots, l. \qquad (3)
\end{aligned}$$

To continuously improve the performance of the dynamic process, yi(LB), yi(CP), and yi(UB) need to be dynamic. If, at sampling time t, the process is not optimized (it is left alone), the performance response variables are $\{y_1(t + d_x^{y_1,\min}), y_2(t + d_x^{y_2,\min}), \ldots, y_{l_p}(t + d_x^{y_{l_p},\min})\}$. Thus, yi(CP) at sampling time t can be defined as $y_i(CP) = y_i(t + d_x^{y_i,\min})$; similarly, $y_i(LB) = y_i(t + d_x^{y_i,\min}) - \Delta y_i$, and $y_i(UB) = y_i(t + d_x^{y_i,\min}) + \Delta y_i$, where Δyi is a small positive constant. The intuitive explanation is that, if, after implementing the optimal control settings, the performance increases by Δyi, the result is deemed totally satisfactory; if the control settings decrease the performance by Δyi, the result is totally unacceptable. For ease of discussion, $\alpha_i(y_i(t + d_x^{y_i,\min}))$ is simplified as $\alpha_i(t + d_x^{y_i,\min})$, and $\{\alpha_1(y_1(t + d_x^{y_1,\min})), \alpha_2(y_2(t + d_x^{y_2,\min})), \ldots, \alpha_{l_p}(y_{l_p}(t + d_x^{y_{l_p},\min}))\}$ can be represented as a vector $\alpha(t) = [\alpha_1(t + d_x^{y_1,\min}), \alpha_2(t + d_x^{y_2,\min}), \ldots, \alpha_{l_p}(t + d_x^{y_{l_p},\min})]^T$ in the preference space.

Model (3) can be solved by different algorithms. Using the Pareto-optimal set concept to directly solve model (3) is a valid approach; a solution is then selected from the solution set based on domain knowledge. Another approach is to combine the


lp objectives into a single-objective function. The simplest way is a weighted sum of all the objectives.

B. Pareto-Optimal Set Approach

Since all preferences are between zero and one, the preference space is a unit hypercube with lp dimensions. It can be divided into several regions with different characteristics.

Definition 3: Let $\Omega_p$ be the preference space, i.e., $\Omega_p = [0, 1]^{l_p}$. Let $[0.5]^{l_p}$ be a vector consisting of $l_p$ 0.5's and $[1]^{l_p}$ a vector consisting of $l_p$ 1's. $p = [p_1, \ldots, p_{l_p}]^T$ is a vector in the preference space. Four regions of $\Omega_p$ can be defined:

$$\mathrm{Region}_1 = \left\{ p \mid p \in \Omega_p,\; p_i \ge 0.5,\; i = 1, \ldots, l_p,\; p \ne [0.5]^{l_p} \right\}$$

$$\mathrm{Region}_4 = \left\{ p \mid p \in \Omega_p,\; p_i \le 0.5,\; i = 1, \ldots, l_p,\; p \ne [0.5]^{l_p} \right\}$$

$$\mathrm{Region}_2 = \left\{ p \mid p \in \Omega_p,\; p \notin \mathrm{Region}_1,\; p^T [1]^{l_p} \ge 0.5\, l_p,\; p \ne [0.5]^{l_p} \right\}$$

$$\mathrm{Region}_3 = \left\{ p \mid p \in \Omega_p,\; p \notin \mathrm{Region}_4,\; p^T [1]^{l_p} < 0.5\, l_p,\; p \ne [0.5]^{l_p} \right\}$$

Observation 1: At sampling time t, for a Pareto-optimal frontier α*(t) and its corresponding solution x*(t): if α*(t) ∈ Region1, then implementing x*(t) will increase some of the lp preferences without decreasing any other preferences; if α*(t) ∈ Region4, then implementing x*(t) will decrease some of the lp preferences without increasing any other preferences; if α*(t) ∈ Region2, then implementing x*(t) will increase some of the preferences at the expense of decreasing other preferences, but the total sum of the lp preference values will increase or stay the same; if α*(t) ∈ Region3, then implementing x*(t) will increase some of the preferences at the expense of decreasing other preferences, but the total sum of the lp preference values will decrease.

Based on observation 1, the solutions leading to Region1 are the most desirable, and the solutions leading to Region4 are definitely not desirable. For the solutions leading to Region2 or Region3, the situation is not clear; it is difficult to determine whether the process is optimized without deeper domain knowledge. The solutions leading to the boundary regions between Region2 and Region1 could also be considered for decision making. In some cases, sacrificing one performance metric slightly is worthwhile if the improvement of the other performance metrics is significant.
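The region membership of Definition 3 can be checked mechanically for a given preference vector. A minimal sketch (hypothetical function name; the center point, which belongs to no region, is returned as None):

```python
def classify_region(p):
    """Classify a preference vector p in [0, 1]^lp into Region 1-4 of
    Definition 3.  Region 1: all preferences >= 0.5; Region 4: all <= 0.5;
    otherwise Region 2 if the preference sum is at least 0.5*lp, else
    Region 3.  The excluded center point [0.5]^lp returns None."""
    lp = len(p)
    if all(abs(pi - 0.5) < 1e-12 for pi in p):
        return None
    if all(pi >= 0.5 for pi in p):
        return 1
    if all(pi <= 0.5 for pi in p):
        return 4
    return 2 if sum(p) >= 0.5 * lp else 3
```

Such a classifier is what allows a multiobjective EA to keep only the individuals leading to the desired regions, as done in Section III.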

Until now, the input cost has not been considered. However, in any real application, every input costs some type of energy. It is therefore desirable to optimize the process with a low input cost.

Definition 4: Let x*(t) be a solution of model (3) at sampling time t. The cost associated with this input x*(t) is defined as

$$C\left(x^*(t)\right) = x^*(t)^T R\, x^*(t) + \left(x^*(t) - x(t-1)\right)^T S \left(x^*(t) - x(t-1)\right)$$

where R and S are positive semidefinite matrices.

Based on the cost, the Pareto-optimal solutions can be ranked in ascending order. A user may select the Pareto-optimal solution with a small input cost and desirable preference values.
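The cost of Definition 4 penalizes both the magnitude of the control settings and the change from the previous settings. A sketch assuming diagonal R and S passed as weight lists (an illustrative simplification of the general positive-semidefinite matrices):

```python
def input_cost(x_new, x_prev, R, S):
    """C(x*) = x*^T R x* + (x* - x(t-1))^T S (x* - x(t-1)),
    with R and S restricted to diagonal matrices given as weight lists."""
    quad = sum(r * xi * xi for r, xi in zip(R, x_new))
    delta = [a - b for a, b in zip(x_new, x_prev)]
    move = sum(s * di * di for s, di in zip(S, delta))
    return quad + move
```

Ranking candidate solutions by this value in ascending order implements the selection rule described above.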

C. Preference Aggregation Approach

Traditional Pareto-dominance-based EAs are ineffective when the number of objectives becomes large [28], as there could be many Pareto-optimal solutions. A common practice is to reduce the multiple objectives to a single objective. The lp objectives could be appropriately aggregated by some type of aggregation function. Based on previous research in engineering design [10], [29], [30], an aggregation function is adopted due to its useful properties.

Definition 5: Let $\beta(\bullet)$ be an aggregation function combining the $l_p$ objectives of model (3), i.e., $\beta(\alpha_1, \ldots, \alpha_{l_p}, w_1, \ldots, w_{l_p}) = \left(w_1 \alpha_1^s + \cdots + w_{l_p} \alpha_{l_p}^s\right)^{1/s}$, where $w_1, \ldots, w_{l_p}$ are the weights, $w_1, \ldots, w_{l_p} \ge 0$, and $w_1 + \cdots + w_{l_p} = 1$.

If the input cost is considered as another performance metric to be optimized, the aggregation function is expressed as $\beta(\alpha_1, \ldots, \alpha_{l_p}, C, w_1, \ldots, w_{l_p}, w_C) = \left(w_1 \alpha_1^s + \cdots + w_{l_p} \alpha_{l_p}^s + w_C C^{-s}\right)^{1/s}$, with $w_1, \ldots, w_{l_p}, w_C \ge 0$ and $w_1 + \cdots + w_{l_p} + w_C = 1$.
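The cost-augmented aggregation function of Definition 5 can be sketched as follows (illustrative names; the exponent s is an application-specific input, and the weights, including the cost weight, are assumed to already satisfy the normalization constraint):

```python
def aggregate(alphas, cost, weights, w_cost, s=2.0):
    """Power-mean aggregation of Definition 5 with the input cost folded in:
    beta = (sum_i w_i * alpha_i^s + w_C * C^(-s))^(1/s).
    The C^(-s) term rewards low cost, matching the maximization setting."""
    total = sum(w * a ** s for w, a in zip(weights, alphas))
    total += w_cost * cost ** (-s)
    return total ** (1.0 / s)
```

Note the design choice inherited from the definition: preferences enter with exponent s while the cost enters with exponent -s, so maximizing beta pushes preferences up and cost down simultaneously.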

Model (3) can be transformed into a single-objective model as follows:

$$\begin{aligned}
\max_{x(t)}\;& \left\{ \beta\left(\alpha_1, \ldots, \alpha_{l_p}, C, w_1, \ldots, w_{l_p}, w_C\right) \right\} \\
\text{s.t.}\;& x(t) \in \Omega_x \\
& \left[ y_{l_p+1}\!\left(t + d_x^{y_{l_p+1},\min}\right), \ldots, y_l\!\left(t + d_x^{y_l,\min}\right) \right]^T \in \Omega_{y_{NP}} \\
& y_i\!\left(t + d_x^{y_i,\min}\right) = f_i\!\left(\ldots,\; [x_j(t)]_{x_j \in X^{1,y_i}},\; \left[x_j\!\left(t - d + d_x^{y_i,\min}\right)\right]_{x_j \in X^{2,y_i},\, d \in D_{x_j}^{y_i}},\; \ldots\right), \quad i = 1, \ldots, l \qquad (4)
\end{aligned}$$

where

$$\beta\left(\alpha_1, \ldots, \alpha_{l_p}, C, w_1, \ldots, w_{l_p}, w_C\right) = \left( w_1\, \alpha_1\!\left(t + d_x^{y_1,\min}\right)^s + \cdots + w_{l_p}\, \alpha_{l_p}\!\left(t + d_x^{y_{l_p},\min}\right)^s + w_C\, C\left(x(t)\right)^{-s} \right)^{\frac{1}{s}}.$$

III. EVOLUTIONARY STRATEGY ALGORITHM

Since the dynamic equations fi are constructed by the DM algorithms, traditional optimization algorithms cannot be applied, as they usually require fi to be in a specific form. In this paper, different evolutionary strategy algorithms are used to solve optimization models (3) and (4) at time t. Other EAs (e.g., the genetic algorithm) could also be used, and their performance could be studied.

Definition 6: Let λ be the offspring size and μ be both the number of offspring selected and the initial population size. Individuals in the parent population are numbered from 1 to μ, and individuals in the offspring population are numbered from 1 to λ. For ease of discussion, μ is assumed to be divisible by four.

Recall that $x(t) = [x_j(t)]_{x_j \in \bigcup_{i=1}^{l_p} X^{1,y_i}} = \left[ x_{j_{low}}(t), \ldots, x_{j_{high}}(t) \right]^T$, which is a vector with index j varying from $j_{low}$ to $j_{high}$.


Definition 7: The general form of the ith individual in the evolutionary strategy is defined as $(s_i, \sigma_i)$, where $s_i = \left[ x^i_{j_{low}}(t), \ldots, x^i_{j_{high}}(t) \right]^T$ and $\sigma_i = \left[ \sigma^i_{j_{low}}, \ldots, \sigma^i_{j_{high}} \right]^T$. Each element of $\sigma_i$ is used as the standard deviation of a normal distribution with zero mean.

The basic steps of an evolutionary strategy algorithm [13] for solving a single-objective model are shown next.

Algorithm 1
1) Initialize μ individuals (candidate solutions) to form the initial parent population.
2) Repeat until the stopping criteria are satisfied:
   a) Select and recombine parents from the parent population to generate λ offspring (children).
   b) Mutate the λ children.
   c) Select the best μ children based on the fitness-function values.
   d) Use the selected μ children as parents for the next generation.

In an evolutionary strategy algorithm, an individual $(s_i, \sigma_i)$ can be mutated by following (5) and (6), with $\sigma_i$ mutated first and $s_i$ mutated next:

$$\sigma_i = \sigma_i \bullet \begin{pmatrix} e^{N(0,\tau') + N_{j_{low}}(0,\tau)} \\ \vdots \\ e^{N(0,\tau') + N_{j_{high}}(0,\tau)} \end{pmatrix} \qquad (5)$$

where $N(0, \tau')$ is a random number drawn from the normal distribution with zero mean and standard deviation $\tau'$, and $N_{j_{low}}(0, \tau)$ is a random number drawn from the normal distribution with zero mean and standard deviation $\tau$. $N_{j_{low}}(0, \tau)$ is generated specifically for $\sigma^i_{j_{low}}$, whereas $N(0, \tau')$ is shared by all entries. "$\bullet$" is the Hadamard (elementwise) matrix product [40]

$$s_i = s_i + N(0, \sigma_i) \qquad (6)$$

where $N(0, \sigma_i)$ is a vector of the same size as $s_i$. Each element of $N(0, \sigma_i)$ is generated from a normal distribution with zero mean and the corresponding standard deviation in vector $\sigma_i$.

Definition 8: Let SelectedParents be an index set composed of two unique randomly selected indexes from 1 to μ. SelectedParents changes every time it is generated.

To generate λ children, two parents are selected from the parent population and recombined λ times. Assume each time that two parents are selected randomly to produce one child by using

$$\left( \frac{1}{2} \sum_{i \in \mathrm{SelectedParents}} s_i,\;\; \frac{1}{2} \sum_{i \in \mathrm{SelectedParents}} \sigma_i \right). \qquad (7)$$

A discrete recombination operator [13] was also applied in this research; however, it did not perform as well as the intermediary recombination operator used in (7).
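Algorithm 1 together with the mutation operators (5) and (6) and the intermediary recombination (7) can be sketched as a simplified, unconstrained (μ, λ) evolutionary strategy. Function names and parameter values are illustrative; the feasibility checks of algorithm 2 and the threshold-distance selection are omitted here:

```python
import math
import random

def mutate(s, sigma, tau, tau_p):
    """Self-adaptive mutation, (5)-(6): sigma is scaled log-normally using
    one shared N(0, tau') draw plus a per-coordinate N(0, tau) draw, then
    s is perturbed coordinatewise by N(0, sigma)."""
    g = random.gauss(0.0, tau_p)                     # shared N(0, tau')
    new_sigma = [sg * math.exp(g + random.gauss(0.0, tau)) for sg in sigma]
    new_s = [si + random.gauss(0.0, sg) for si, sg in zip(s, new_sigma)]
    return new_s, new_sigma

def recombine(parent_a, parent_b):
    """Intermediary recombination, (7): the child is the coordinatewise
    mean of two randomly selected parents (both s and sigma parts)."""
    s = [(a + b) / 2.0 for a, b in zip(parent_a[0], parent_b[0])]
    sigma = [(a + b) / 2.0 for a, b in zip(parent_a[1], parent_b[1])]
    return s, sigma

def evolve(fitness, init, mu, lam, generations, tau=0.1, tau_p=0.1):
    """(mu, lambda) evolutionary strategy following Algorithm 1,
    maximizing `fitness` over solution vectors."""
    pop = [mutate(init, [1.0] * len(init), tau, tau_p) for _ in range(mu)]
    for _ in range(generations):
        children = []
        for _ in range(lam):
            a, b = random.sample(pop, 2)             # Definition 8
            children.append(mutate(*recombine(a, b), tau, tau_p))
        pop = sorted(children, key=lambda ind: fitness(ind[0]),
                     reverse=True)[:mu]              # best-mu selection
    return max(pop, key=lambda ind: fitness(ind[0]))
```

In the actual framework, `fitness` would evaluate the aggregated preference objective of model (4) through the DM-learned dynamic equations.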

In the traditional evolutionary strategy algorithm, λ children are generated, and the best μ of them are selected based on the fitness-function value. However, in order to keep the diversity of the population and to prevent the algorithm from converging too fast, a threshold-distance selection operator is used to solve model (4).

A. Solving the Preference Aggregation Model

Model (4) is a constrained optimization problem; thus, an additional constraint-handling technique has to be incorporated into algorithm 1. Although there are different techniques to handle the constraints in evolutionary computation [9], [13], a procedure similar to that in [1] is incorporated in algorithm 1 to check the feasibility of all the individuals, thus resulting in algorithm 2.

Algorithm 2
1) Initialize μ feasible individuals (candidate solutions) to form the initial parent population.
2) Repeat until the stopping criteria are satisfied:
   a) Select and recombine parents from the parent population to generate λ offspring (children).
   b) Mutate the λ children.
   c) Check the feasibility of all children. If all children are feasible, go to step d); otherwise, go to step a).
   d) Select μ children based on the threshold-distance selection operator.

To avoid selecting similar individuals for the next generation, a threshold-distance selection operator is used in algorithm 2.

Threshold-Distance Selection Operator: An individual $s_i$ is considered similar to another individual $s_j$ if $|s_i - s_j| \le \delta$ elementwise, i.e., $\left| x^i_{j_{low}}(t) - x^j_{j_{low}}(t) \right| \le \delta_{j_{low}}, \ldots, \left| x^i_{j_{high}}(t) - x^j_{j_{high}}(t) \right| \le \delta_{j_{high}}$. $\delta$ is a threshold-distance vector to differentiate two individuals. When $\delta = 0$ or $\delta = \infty$ (i.e., large enough), the threshold-distance selection operator reduces to the traditional evolutionary-strategy (ES) selection operator in algorithm 1. The threshold-distance selection operator is formulated next.

Threshold-Distance Selection:
1) Sort the λ children in descending order based on their fitness values.
2) Add the first child into the parent population. Let the current selected individual be the first child.
3) Do:
   a) Select the next child into the parent population if it is not similar to the current selected individual.
   b) Update the current selected individual.
   c) While the parent population is not full, or all the remaining children are similar to the current selected individual, or the current selected individual is the last child.
4) If the number of individuals in the parent population is smaller than μ, supplement the parent population by selecting the best children that have not been selected yet.


B. Solving the Pareto-Optimal Set Model

Major modifications need to be made to algorithm 1 to generate the Pareto-optimal set for model (3). Since not all Pareto-optimal solutions will lead to Region1 and Region2, the Pareto-optimal solutions leading to other regions are not stored in an elite external set. For a traditional multiobjective EA, the preference functions are not considered until the Pareto-optimal set is generated by the EA. In this scenario, the Pareto-optimal solutions are scattered across different regions in the preference space, including undesirable regions. If the preference functions are utilized before designing a multiobjective EA, the preference space can be divided into desired and undesired regions. Specific techniques could be developed within a multiobjective EA so that the Pareto-optimal set contains only the solutions leading to the desired preference regions.

Definition 9: Let Offspring be the set consisting of λ children and Parent be the set consisting of μ parents. Let X*,Region1(t) be a set consisting of Pareto-optimal solutions leading to Region1, and X*,Region2(t) be a set consisting of Pareto-optimal solutions leading to Region2.

Algorithm 3 is the SPEA [11], [38] modified so that different regions in the preference space are considered and the individuals leading to the desired regions are retained.

Algorithm 3

1) Initialize μ feasible individuals, and save them into Parent. Each individual can be labeled with one of the four regions. Initialize X*,Region1(t) and X*,Region2(t) as empty sets.
2) Copy from Parent the nondominated solutions that lead to Region1 or Region2 into X*,Region1(t) and X*,Region2(t).
3) Remove the solutions in X*,Region1(t) ∪ X*,Region2(t) that are dominated by the other members in it.
4) If the number of solutions in X*,Region1(t) ∪ X*,Region2(t) exceeds some threshold, use clustering algorithms to reduce it.
5) Select and recombine the parents from Parent to generate λ offspring, and save them into Offspring.
6) Mutate the λ children.
7) Check the feasibility of all children. If all children are feasible, go to step 8); otherwise, go to step 5).
8) Label each child with a corresponding region. Calculate the local and global strength of each child in Offspring.
9) Empty Parent. For each region, select for Parent the best μ/4 individuals based on their local strength. If the total number of selected individuals in Parent is smaller than μ, select the remaining individuals from Offspring based on their global strength.
10) If the maximum number of generations is reached, then stop; else, go to step 2).

Local and Global Strength:

Definition 10: Let XRegioni(t) be a set consisting of individuals from Offspring leading to Regioni, i = 1, . . . , 4.

TABLE I
DATA SET DESCRIPTION

For an individual (s_j, σ_j) in XRegioni(t), its local strength is calculated as

Local_strength_j = n_local / (|XRegioni(t)| + 1)    (8)

where n_local is the number of individuals of XRegioni(t) dominated by individual (s_j, σ_j). Its global strength is calculated as

Global_strength_j = n_global / (|Offspring| + 1)    (9)

where n_global is the number of individuals of Offspring dominated by individual (s_j, σ_j), and |·| is the cardinality operator of a set. The proposed algorithm 3 heuristically keeps the diversity in the populations and uses the local and global strength to converge quickly. The elite solutions leading to Region1 and Region2 are not lost until better ones are found.
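Equations (8) and (9) can be illustrated with a short Python sketch. This is an assumed implementation for clarity, not the authors' code: objective vectors hold the preference values (maximized), `region_of` is a parallel list of region labels, and `dominates` is standard Pareto dominance.

```python
def dominates(a, b):
    """Pareto dominance for maximization: a dominates b if a is no worse
    in every objective and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def strengths(offspring_objs, region_of):
    """Return (local, global) strength for each child, following (8) and (9)."""
    n = len(offspring_objs)
    local, glob = [], []
    for j in range(n):
        same_region = [i for i in range(n) if region_of[i] == region_of[j]]
        # n_local: members of the same region dominated by child j.
        n_local = sum(1 for i in same_region
                      if i != j and dominates(offspring_objs[j], offspring_objs[i]))
        # n_global: members of the whole offspring set dominated by child j.
        n_global = sum(1 for i in range(n)
                       if i != j and dominates(offspring_objs[j], offspring_objs[i]))
        local.append(n_local / (len(same_region) + 1))   # eq. (8)
        glob.append(n_global / (n + 1))                  # eq. (9)
    return local, glob
```

A child that dominates many individuals in its own region gets a high local strength; the global strength measures the same count over the entire offspring set.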

IV. INDUSTRIAL CASE STUDY

To validate the concepts introduced in this paper, data from Boiler 11 at The University of Iowa Power Plant (UIPP) were collected. The boiler burns coal and biomass (oat hulls). The ratio of coal to oat hulls changes depending on the availability of the oat hulls. The two performance metrics are the boiler efficiency and the coal–limestone ratio; both are to be maximized. Limestone is used to reduce the SO2 emission. Thus, the response variable to be constrained is the SO2 emission, which is to be controlled below some level. SO2 could be considered as another performance metric; however, restricting the study to two metrics makes the optimization results easier to visualize and discuss.

A. Data Set, Sampling Frequency, and Predictor Selection

From the UIPP data historian, 5729 data points were sampled at 5-min intervals. Data set 1 in Table I is the total data set composed of 5729 data points starting from “2/1/07 2:50 A.M.” and continuing to “2/21/07 11:45 A.M.” During this time period, the boiler operations could be described as normal. Considering the noise in the industrial data, data set 1 was denoised by the moving-average approach [41] with a lag of four, and error readings of the SO2–coal ratio were deleted. Data set 1 was divided into two data sets. Data set 2, consisting of 4600 data points, was used to extract a model by the DM algorithms. Data set 3 was used to test the model learned from data set 2.
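The denoising step can be sketched as a trailing moving average with a lag of four. The paper cites a weighted moving average [41] but does not specify the exact windowing, so a simple unweighted trailing window is assumed here; early points are averaged over the samples available so far.

```python
def moving_average(series, lag=4):
    """Denoise a series with a trailing moving average of the given lag
    (lag of four is used for data set 1).  The first lag-1 outputs average
    only the samples observed up to that point."""
    out = []
    for t in range(len(series)):
        window = series[max(0, t - lag + 1): t + 1]
        out.append(sum(window) / len(window))
    return out
```

For example, `moving_average([1, 2, 3, 4, 5], 4)` smooths the tail of the series while leaving the first point unchanged.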

TABLE II
PROCESS VARIABLES OF THE DATA SET

In this paper, the boiler efficiency, the coal–limestone ratio, and the SO2–coal ratio (Table II) are heuristically modeled as a function of the coal-and-primary-air ratio [coal flow (kilopounds per hour)/primary air flow (kilopounds per hour)], the coal-and-secondary-air ratio [coal flow (kilopounds per hour)/secondary air flow (kilopounds per hour)], the coal-to-oat-hull ratio [coal flow (kilopounds per hour)/oat hull flow (kilopounds per hour)], and the coal and oat hull quality (in British thermal units per pound). However, other variables could be considered based on the application context. Using the coal-and-primary-air ratio, the primary air flow could be determined based on the current coal flow. Similarly, the secondary air flow could be determined by using the current coal flow and the coal-and-secondary-air ratio. The coal-to-oat-hull ratio is considered a noncontrollable variable.

The maximum time delays, namely, d_y1, d_y2, d_y3, d_x1, d_x2, d_v1, d_v2, d_v3, are assumed to be nine. In the context of the 5-min sampling intervals, 9 × 5 = 45 min is assumed to be the maximum time delay in the combustion process. For example, if the operator were to change the primary air flow, it would take at most 45 min to observe that this change had some effect on the boiler efficiency. After running the classification and regression tree (C&R tree) algorithm [7] on data set 1, the importance of each predictor was calculated. Considering observations 2 and 3, the predictors were heuristically selected for each response variable.

Then, the current boiler efficiency y1(t) is expressed as

y1(t) = f1(y1(t − 1), y1(t − 2), x1(t − 5), x1(t − 6), x1(t − 7), x2(t − 5), x2(t − 6), x2(t − 7), v1(t − 9), v2(t − 9), v3(t − 9)).    (10)

The coal–limestone ratio y2(t) can be written as

y2(t) = f2(y2(t − 1), y2(t − 2), y2(t − 3), x1(t − 1), x1(t − 2), x2(t − 1), x2(t − 2), v1(t − 1), v1(t − 2), v2(t − 9)).    (11)

The SO2–coal ratio y3(t) is

y3(t) = f3(y3(t − 1), y3(t − 2), x1(t − 1), x1(t − 8), x1(t − 9), x2(t − 1), x2(t − 7), x2(t − 9), v1(t − 3), v2(t − 1), v3(t − 6)).    (12)

When the structures of the dynamic equations describing the combustion process are known, the DM algorithms are used to extract the actual equations.

B. Learning Dynamic Equations From the Process Data

In this paper, the C&R tree algorithm [7] and the NN algorithm [5] are used to learn the dynamic equations (10)–(12) from data set 2. The NN performed better in predicting the boiler efficiency and the coal–limestone ratio. The C&R tree algorithm performed better than the NN only in predicting the SO2–coal ratio. The extracted models were tested using data set 3. Table III summarizes the models’ prediction accuracy based on data set 3. Fig. 1(a)–(c) shows the first 200 predicted and observed values of data set 3.

In summary, all the models made high-quality predictions on the testing data sets and captured the system dynamics. The modeling task is simplified by using the DM algorithms. However, updating (online or offline) the learned models with new data points is necessary for a temporal process. Data filtering is used for removing the data in error. Model performance is monitored so that the updating procedure can be triggered as needed. These issues, however, are beyond the scope of this paper.

Although only the C&R tree and NN algorithms are used in this paper, other DM algorithms (such as random forest [6], boosting tree [16], [17], or radial basis function [22]) could be selected to further improve the prediction accuracy.

Based on the selected predictors and dynamic equations (10)–(12), model (2) is instantiated as

max_{x(t)} {y1(t + 5), y2(t + 1)}

s.t. [x1(t), x2(t)]^T ∈ Ω_x;
     y3(t + 1) ∈ Ω_yNP;
     y1(t + 5) = f1(y1(t + 4), y1(t + 3), x1(t), x1(t − 1), x1(t − 2), x2(t), x2(t − 1), x2(t − 2), v1(t − 4), v2(t − 4), v3(t − 4));
     y2(t + 1) = f2(y2(t), y2(t − 1), y2(t − 2), x1(t), x1(t − 1), x2(t), x2(t − 1), v1(t), v1(t − 1), v2(t − 8));
     y3(t + 1) = f3(y3(t), y3(t − 1), x1(t), x1(t − 7), x1(t − 8), x2(t), x2(t − 6), x2(t − 8), v1(t − 2), v2(t), v3(t − 5)).    (13)

To solve model (13), preference functions are used.

C. Preference Functions

Based on domain knowledge, improving the boiler efficiency by 0.01 would be considered a significant achievement in any power plant. An increase of the coal–limestone ratio by one would be regarded as totally satisfactory, i.e., Δy1 = 0.01 and Δy2 = 1. Note that, in different applications, the degree of satisfaction and dissatisfaction could vary. Thus, Δyi should also change based on the domain knowledge.


TABLE III
PREDICTION ACCURACY OF DIFFERENT MODELS FOR DATA SET 3

Fig. 1. Predicted values and observed values of the first 200 data points of data set 3. (a) Boiler efficiency. (b) Coal–limestone ratio. (c) SO2–coal ratio.

Thus, two linear preference functions could be defined based on y1(CP) = y1(t + 5), y1(LB) = y1(t + 5) − 0.01, and y1(UB) = y1(t + 5) + 0.01, i.e., α1(y1) = 50y1 + 0.5 − 50y1(t + 5) for y1(t + 5) − 0.01 ≤ y1 ≤ y1(t + 5) + 0.01. Similarly, α2(y2) = 0.5y2 + 0.5 − 0.5y2(t + 1) for y2(t + 1) − 1 ≤ y2 ≤ y2(t + 1) + 1.
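The two linear preference functions can be written directly in Python. The function names and the center-point arguments are conventions of this sketch; the formulas themselves follow the definitions above, mapping the lower bound to 0, the center point to 0.5, and the upper bound to 1.

```python
def alpha1(y1, y1_cp):
    """Boiler-efficiency preference: 0 at y1_cp - 0.01, 0.5 at the
    center point y1_cp, 1 at y1_cp + 0.01."""
    assert y1_cp - 0.01 <= y1 <= y1_cp + 0.01, "outside the preference interval"
    return 50 * y1 + 0.5 - 50 * y1_cp

def alpha2(y2, y2_cp):
    """Coal-limestone-ratio preference: 0 at y2_cp - 1, 0.5 at the
    center point y2_cp, 1 at y2_cp + 1."""
    assert y2_cp - 1 <= y2 <= y2_cp + 1, "outside the preference interval"
    return 0.5 * y2 + 0.5 - 0.5 * y2_cp
```

For instance, holding the efficiency at its current value yields a preference of 0.5, while improving it by the full Δy1 = 0.01 yields a preference of 1.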

Fig. 2(a) and (b) illustrates the shapes of the preference functions for the boiler efficiency and the coal–limestone ratio, respectively. Other nonlinear shape functions can also be considered in future research.

Once the preference functions are determined, model (13) can be solved using the preference aggregation or the Pareto-optimal set approach.

Fig. 2. Linear dynamic preference functions for the boiler efficiency and coal–limestone ratio. (a) Boiler efficiency preference function. (b) Coal–limestone ratio preference function.

D. Solving the Preference Aggregation Model

Once the linear preference functions are assumed, model (4) can be instantiated as follows:

max_{x(t)} {β(α1, α2, C, w1, w2, wC)}

s.t. [x1(t), x2(t)]^T ∈ Ω_x;
     y3(t + 1) ∈ Ω_yNP;
     y1(t + 5) = f1(y1(t + 4), y1(t + 3), x1(t), x1(t − 1), x1(t − 2), x2(t), x2(t − 1), x2(t − 2), v1(t − 4), v2(t − 4), v3(t − 4));
     y2(t + 1) = f2(y2(t), y2(t − 1), y2(t − 2), x1(t), x1(t − 1), x2(t), x2(t − 1), v1(t), v1(t − 1), v2(t − 8));
     y3(t + 1) = f3(y3(t), y3(t − 1), x1(t), x1(t − 7), x1(t − 8), x2(t), x2(t − 6), x2(t − 8), v1(t − 2), v2(t), v3(t − 5))    (14)

where

β(α1, α2, C, w1, w2, wC) = (w1 α1(t + 5)^s + w2 α2(t + 1)^s + wC C(x(t))^(−s))^(1/s).


Fig. 3. ES solving model (14) for different offspring sizes at sampling time “2/21/2007 9:55 A.M.”; the objective function value is the best individual’s fitness value at each generation.

Let w1 = 0.4, w2 = 0.58, and wC = 0.02 (other weight scenarios are discussed later in this paper). Let s = 1 and R = S = I, the 2 × 2 identity matrix; thus, C(x(t)) = x1(t)^2 + x2(t)^2 + (x1(t) − x1(t − 1))^2 + (x2(t) − x2(t − 1))^2. These weights and constants are fixed in this paper to examine the impact of the weights of interest, namely, w1 and w2.
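The cost term and the aggregation function β can be sketched as follows. This is an illustrative implementation under the settings just stated (R = S = I, s = 1, default weights w1 = 0.4, w2 = 0.58, wC = 0.02); the function names are assumptions of the sketch.

```python
def cost(x_t, x_prev):
    """C(x(t)) with R = S = I: squared magnitude of the current settings
    plus the squared change from the previous settings."""
    return (sum(xi ** 2 for xi in x_t)
            + sum((a - b) ** 2 for a, b in zip(x_t, x_prev)))

def beta(a1, a2, c, w1=0.4, w2=0.58, wc=0.02, s=1):
    """Aggregated objective (w1*a1^s + w2*a2^s + wc*c^(-s))^(1/s).
    The cost enters with a negative exponent, so a lower cost
    raises the aggregated value."""
    return (w1 * a1 ** s + w2 * a2 ** s + wc * c ** (-s)) ** (1 / s)
```

With s = 1 the aggregation is a weighted sum of the two preference values plus the reciprocal of the cost, which is what the ES maximizes in model (14).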

In this paper, the impact of the input cost weight wC is not studied. δ is set to [0.01, 0.01]^T based on the analysis of historical data. The impact of δ on the solution is discussed later in this paper. [x1, x2]^T is limited to between [0.05, 0.05]^T and [0.2, 0.29]^T in this case study, based on analyzing the historical data distribution of x. y3(t + 1) should be smaller than or equal to the current SO2–coal ratio without changing the controllable settings.

The ES parameters τ′ and τ are determined heuristically as τ′ = 1/√(2b) and τ = 1/√(2√b), with b = 2 being, in this case, the number of actionable variables [13]. The lower and upper bounds for the standard deviation values of σi are set to 0.005 and 0.1 based on the analysis of the historical data.

Two parents are selected to generate one child. For the initial population at sampling time t, si is generated by drawing random numbers uniformly between [0.05, 0.05]^T and [x1(t) + 0.1, x2(t) + 0.1]^T, which allows the algorithm to perform a local search. From an application point of view, finding local solutions is preferred, as dramatically different solutions could make the process unstable. Similarly, σi is generated by drawing random numbers uniformly between [0.005, 0.005]^T and [0.1, 0.1]^T.

Although the research suggests a selection pressure of μ/λ = 1/7 [13], [24], numerous experiments were conducted, and it was determined that μ/λ = 20/120 produced satisfactory results. The algorithm converged to a local optimum in 25 generations (see Fig. 3). A sampling time stamp (test set) was randomly selected between “2/17/07 11:45 A.M.” and “2/21/07 11:45 A.M.” Then, model (14) was solved for different values of the selection pressure. Similar patterns were observed by solving the optimization model for other test sets.

To evaluate the effect of the initial parent size, the selection pressure was fixed at μ/λ = 1/6. Fig. 4 shows that a parent size of 20 is enough for the algorithm to converge to local optima. Smaller initial parent sizes led to unstable results.

To evaluate the impact of the threshold-distance selection operator, vector δ was varied from [0, 0]^T to [0.3, 0.3]^T. It can be observed from Fig. 5 that δ = [0.005, 0.005]^T performs a little better. When δ is [0.3, 0.3]^T, the threshold-distance selection operator reduces to the traditional ES selection process, i.e., the same as δ = [0, 0]^T.

Fig. 4. ES solving model (14) for different parent sizes at sampling time “2/21/2007 9:55 A.M.”; the objective function value is the best individual’s fitness value at each generation.

Fig. 5. ES solving model (14) with μ/λ = 20/120 for different threshold-distance selection operators at sampling time “2/21/2007 9:55 A.M.”; the objective function value is the best individual’s fitness value at each generation.

Fig. 6 illustrates the impact of different weight combinations on the aggregation function. The vertical axis is the coal–limestone ratio preference value, and the horizontal axis is the boiler efficiency preference value. The five points in Fig. 6 correspond to five different weight combinations. From left to right, the weight combinations are

w1 = 0.08, w2 = 0.9, wC = 0.02, s = 1;
w1 = 0.38, w2 = 0.6, wC = 0.02, s = 1;
w1 = 0.4, w2 = 0.58, wC = 0.02, s = 1;
w1 = 0.6, w2 = 0.38, wC = 0.02, s = 1;
w1 = 0.9, w2 = 0.08, wC = 0.02, s = 1.

It is easy to see that, as w1 increases, the best solution tends to emphasize the efficiency optimization. It is also apparent that the coal–limestone ratio is hard to optimize. For w1 = 0.08 and w2 = 0.9, the algorithm cannot find solutions that improve the coal–limestone ratio significantly. A possible reason is that even a slight increase of the coal–limestone ratio may sacrifice too much boiler efficiency, which may lead to a net decrease of the objective function.

Preference aggregation allows the two objectives to be balanced with the weights. This approach is effective when deep domain knowledge about the process is available to determine the weights. The Pareto-optimal approach, in contrast, produces a solution set.


Fig. 6. ES solving model (14) with μ/λ = 20/120 for different weights at sampling time “2/19/2007 7:25 A.M.”; the plotted solution is the best individual at the 25th generation.

E. Solving the Optimization Model With the Pareto-Optimal Set Approach

Only the most important results are reported in this paper. Model (13) is solved by algorithm 3 for μ/λ = 20/120 without restricting the size of the two elite sets. The preference values are generated with the DM algorithms. Based on Fig. 7, it is easy to observe that the coal–limestone ratio preference (vertical axis) and the boiler efficiency preference (horizontal axis) are two competing performance metrics that are hard to optimize at the same time. The randomly generated initial population shows that increasing the coal–limestone ratio will lead to a total dissatisfaction of the boiler efficiency, whereas increasing the boiler efficiency will sacrifice the coal–limestone ratio only slightly. Ten generations are not enough for the algorithm to find enough Pareto-optimal solutions. As the generation number increases from 25 to 50, the algorithm tends to find more similar solutions. The initial population mostly spans Region2 and Region3. After several generations, the algorithm converges to the Pareto-optimal front, as shown in Fig. 7.

The computational results reported in Sections IV-D and IV-E show that both optimization models have the ability to provide satisfactory solutions. The preference aggregation optimization model achieves the goal by adjusting the weights and calls for domain knowledge about the process. The Pareto-optimal set approach does not rely on domain knowledge in solving the optimization model and can provide a set of potential solutions. However, a decision is needed to select the final solution.

V. CONCLUSION

DM algorithms and EAs were integrated within a framework to optimize a complex time-dependent process. The underlying process dynamic equations were identified by the DM approach. The equations can be updated with the current process data if their prediction accuracy decreases. Modified evolutionary strategy algorithms were applied to solve the optimization models either through preference aggregation or through a Pareto-optimal set approach.

Fig. 7. Pareto-optimal set approach to solve model (13) at sampling time “2/21/2007 9:55 A.M.” (a) Initial population. (b) Pareto-optimal front at the tenth generation. (c) Pareto-optimal front at the 25th generation. (d) Pareto-optimal front at the 50th generation.


To solve the preference aggregation model, a traditional evolutionary strategy algorithm was modified by introducing a threshold-distance selection operator, which maintained diversity in the population and thereby prevented premature convergence. For the Pareto-optimal set approach, a traditional evolutionary strategy algorithm was modified according to the SPEA. The local and global strength of each individual were used to select the offspring for the next generation. The diversity of the population was maintained by the classification of the individuals into different regions in the preference space. The solutions leading to the undesired regions were not kept in the Pareto-optimal set, which significantly reduced the size of the elite set.

The industrial case study illustrated the effectiveness of the proposed approach and the possibility of applying this framework to other similar processes. The combustion efficiency and the coal–limestone ratio were optimized by adjusting two controllable variables. The computational results showed that small improvements of the coal–limestone ratio produced a significant decrease in the combustion efficiency. It is conceivable that other controllable variables that were not considered in this research could be used to optimize the coal–limestone ratio without an adverse impact on the boiler efficiency.

APPENDIX

Definition 11: For response variable y1, D^{y1}_{y1} = {d^{y1,low}_{y1}, . . . , d^{y1,high}_{y1}} is a set composed of integers selected from {1, . . . , d_{y1}}, related to y1's previous values and arranged in ascending order, with d^{y1,low}_{y1} ≤ d^{y1,high}_{y1}. Similarly, D^{y1}_{x1} = {d^{y1,low}_{x1}, . . . , d^{y1,high}_{x1}} is a set selected from {1, . . . , d_{x1}} for predictors related to x1, and D^{y1}_{v1} = {d^{y1,low}_{v1}, . . . , d^{y1,high}_{v1}} is a set selected from {1, . . . , d_{v1}} for predictors related to v1. In total, there are 1 + k + m individual sets for y1: D^{y1}_{y1}, D^{y1}_{x1}, . . . , D^{y1}_{xk}, D^{y1}_{v1}, . . . , D^{y1}_{vm}. For any response variable yi, i = 1 to l, there are 1 + k + m individual sets: D^{yi}_{yi}, D^{yi}_{x1}, . . . , D^{yi}_{xk}, D^{yi}_{v1}, . . . , D^{yi}_{vm}.

Based on definition 11, for i = 1 to l, yi = fi(x, v) can be written as the following dynamic equation:

yi(t) = fi([yi(t − d)]_{d ∈ D^{yi}_{yi}}, [x1(t − d)]_{d ∈ D^{yi}_{x1}}, . . . , [xk(t − d)]_{d ∈ D^{yi}_{xk}}, [v1(t − d)]_{d ∈ D^{yi}_{v1}}, . . . , [vm(t − d)]_{d ∈ D^{yi}_{vm}})

where each bracketed term [·]_{d ∈ D} expands by enumerating all elements of the corresponding set.

Definition 12: For i = 1 to l, let d^{yi,min}_x = min{d^{yi,low}_{x1}, . . . , d^{yi,low}_{xk}} be the smallest time delay of the controllable variables x for response variable yi, d^{yi,max}_x = max{d^{yi,high}_{x1}, . . . , d^{yi,high}_{xk}} be the largest time delay of the controllable variables x for response variable yi, d^{yi,min}_v = min{d^{yi,low}_{v1}, . . . , d^{yi,low}_{vm}} be the smallest time delay of the noncontrollable variables v for response variable yi, and d^{yi,max}_v = max{d^{yi,high}_{v1}, . . . , d^{yi,high}_{vm}} be the largest time delay of the noncontrollable variables v for response variable yi.

Observation 2: If there exists a constant d ∈ ∪_{j=1}^{m} D^{yi}_{vj} such that d < d^{yi,min}_x, there is not sufficient information about the noncontrollable variables to predict the future value yi(t + d^{yi,min}_x) at sampling time t.

To explain observation 2, consider an illustrative example: y(t) = f(x1(t − 3), v1(t − 1)). At sampling time t, y(t + 3) is to be determined, i.e., y(t + 3) = f(x1(t), v1(t + 2)). As v1(t + 2) is not known, it is difficult to predict y(t + 3); nevertheless, to optimize y(t + 3), x1(t) can be changed.

Definition 13: Let X be the set composed of all controllable variables, X = {x1, . . . , xk}, and, for a response variable yi, define X1,yi = {xj | d^{yi,low}_{xj} = d^{yi,min}_x, j = 1, . . . , k}. It is obvious that X1,yi ⊆ X. Let X2,yi = X − X1,yi; for any controllable variable xj ∈ X2,yi, d^{yi,low}_{xj} > d^{yi,min}_x holds.

Observation 3: At sampling time t, one can only modify the controllable variables in X1,yi to optimize (or change) yi. X1,yi is called the actionable variable set of yi.

Observation 3 is explained with the following example. For y(t) = f(x1(t − 3), x2(t − 4)), at sampling time t, y(t + 3) is optimized by modifying the values of the controllable variables, i.e., y(t + 3) = f(x1(t), x2(t − 1)). As x2(t − 1) has already happened, y(t + 3) is optimized with x1(t) only.
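Definition 13 amounts to a simple partition, sketched below. This helper is an assumption for illustration: `low_delays` maps each controllable variable name to its smallest delay d^{yi,low}_{xj} for the response yi, and the function returns the actionable set X1,yi and the nonactionable set X2,yi.

```python
def actionable_set(low_delays):
    """Partition the controllable variables of a response y_i into the
    actionable set X1 (smallest delay equals the overall minimum
    d_x^{yi,min}) and the nonactionable set X2 (everything else)."""
    d_min = min(low_delays.values())     # d_x^{yi,min} from definition 12
    x1 = {v for v, d in low_delays.items() if d == d_min}
    x2 = set(low_delays) - x1
    return x1, x2
```

In the example above, x1 has its smallest delay 3 = d^{y,min}_x while x2 has delay 4, so only x1 is actionable.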

To make sure that all controllable variables are used to opti-mize yi, observation 3 offers hints for selecting the predictorsand the corresponding time delays. From now on, it is assumedthat, for each response variable yi, there is enough informationto predict its values at sampling time t based on observation 2.

Definition 14: Let Y^NP = {y_{lp+1}, . . . , y_l} be the set of all nonperformance response variables, Y^{1,NP} = {yj | (∪_{i=1}^{lp} X1,yi) ∩ X1,yj ≠ ∅, j = lp + 1, . . . , l}, and Y^{2,NP} = Y^NP − Y^{1,NP}.

Y^{1,NP} is the set of nonperformance response variables that will be affected when changing the controllable variables to optimize the process. Y^{2,NP} is the set of nonperformance response variables that will not be affected during the optimization process.

REFERENCES

[1] M. A. Abido, “Multiobjective evolutionary algorithms for electric power dispatch problem,” IEEE Trans. Evol. Comput., vol. 10, no. 3, pp. 315–329, Jun. 2006.

[2] J. M. Alonso, F. Alvarruiz, J. M. Desantes, L. Hernández, V. Hernández, and G. Moltó, “Combining neural networks and genetic algorithms to predict and reduce diesel engine emissions,” IEEE Trans. Evol. Comput., vol. 11, no. 1, pp. 46–55, Feb. 2007.

[3] A. Benedetti, M. Farina, and M. Gobbi, “Evolutionary multiobjective industrial design: The case of a racing car tire-suspension system,” IEEE Trans. Evol. Comput., vol. 10, no. 3, pp. 230–244, Jun. 2006.

[4] M. J. A. Berry and G. S. Linoff, Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. New York: Wiley, 2004.

[5] C. Bishop, Neural Networks for Pattern Recognition. Oxford, U.K.: Oxford Univ. Press, 1995.

[6] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, Oct. 2001.

[7] L. Breiman, J. H. Friedman, C. J. Stone, and R. A. Olshen, Classification and Regression Trees. Monterey, CA: Wadsworth, 1984.

[8] D. Büche, P. Stoll, R. Dornberger, and P. Koumoutsakos, “Multiobjective evolutionary algorithm for the optimization of noisy combustion processes,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 32, no. 4, pp. 460–473, Nov. 2002.


[9] Z. Cai and Y. Wang, “A multiobjective optimization-based evolutionary algorithm for constrained optimization,” IEEE Trans. Evol. Comput., vol. 10, no. 6, pp. 658–675, Dec. 2006.

[10] Z. Dai and M. J. Scott, “Effective product family design using preference aggregation,” Trans. ASME, J. Mech. Des., vol. 128, no. 4, pp. 659–667, 2006.

[11] K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms. New York: Wiley, 2001.

[12] K. Deb and H. Gupta, “Introducing robustness in multi-objective optimization,” Evol. Comput., vol. 14, no. 4, pp. 463–494, Dec. 2006.

[13] A. E. Eiben and J. E. Smith, Introduction to Evolutionary Computation. New York: Springer-Verlag, 2003.

[14] J. Espinosa, J. Vandewalle, and V. Wertz, Fuzzy Logic, Identification and Predictive Control. London, U.K.: Springer-Verlag, 2005.

[15] M. Farina, K. Deb, and P. Amato, “Dynamic multiobjective optimization problems: Test cases, approximations, and applications,” IEEE Trans. Evol. Comput., vol. 8, no. 5, pp. 425–442, Oct. 2004.

[16] J. H. Friedman, “Stochastic gradient boosting,” Comput. Stat. Data Anal., vol. 38, no. 4, pp. 367–378, Feb. 2002.

[17] J. H. Friedman, “Greedy function approximation: A gradient boosting machine,” Ann. Stat., vol. 29, no. 5, pp. 1189–1232, 2001.

[18] H. Ghezelayagh and K. Y. Lee, “Intelligent predictive control of a power plant with evolutionary programming optimizer and neuro-fuzzy identifier,” in Proc. Congr. Evol. Comput., 2002, pp. 1308–1313.

[19] C. K. Goh and K. C. Tan, “An investigation on noisy environments in evolutionary multiobjective optimization,” IEEE Trans. Evol. Comput., vol. 11, no. 3, pp. 354–381, Jun. 2007.

[20] J. A. Harding, M. Shahbaz, S. Srinivas, and A. Kusiak, “Data mining in manufacturing: A review,” Trans. ASME, J. Manuf. Sci. Eng., vol. 128, no. 4, pp. 969–976, 2006.

[21] V. Havlena and J. Findejs, “Application of model predictive control to advanced combustion control,” Control Eng. Pract., vol. 13, no. 6, pp. 671–680, Jun. 2005.

[22] S. Haykin, Neural Networks: A Comprehensive Foundation. New York: Macmillan, 1994.

[23] J. S. Heo, K. Y. Lee, and R. Garduno-Ramirez, “Multiobjective control of power plants using particle swarm optimization techniques,” IEEE Trans. Energy Convers., vol. 21, no. 2, pp. 552–561, Jun. 2006.

[24] T. Jansen, K. A. De Jong, and I. Wegener, “On the choice of the offspring population size in evolutionary algorithms,” Evol. Comput., vol. 13, no. 4, pp. 413–440, Dec. 2005.

[25] A. Kusiak and Z. Song, “Combustion efficiency optimization and virtual testing: A data-mining approach,” IEEE Trans. Ind. Informat., vol. 2, no. 3, pp. 176–184, Aug. 2006.

[26] X. J. Liu and C. W. Chan, “Neural-fuzzy generalized predictive control of boiler steam temperature,” IEEE Trans. Energy Convers., vol. 21, no. 4, pp. 900–908, Dec. 2006.

[27] C. H. Lu and C. C. Tsai, “Generalized predictive control using recurrent fuzzy neural networks for industrial processes,” J. Process Control, vol. 17, no. 1, pp. 83–92, Jan. 2007.

[28] F. Pierro, S. Khu, and D. A. Savic, “An investigation on preference order ranking scheme for multiobjective evolutionary optimization,” IEEE Trans. Evol. Comput., vol. 11, no. 1, pp. 17–45, Feb. 2007.

[29] M. J. Scott, “Formalizing negotiation in engineering design,” Ph.D. dissertation, Stanford Univ., Stanford, CA, 1999.

[30] M. J. Scott and E. K. Antonsson, “Compensation and weights for trade-offs in engineering design: Beyond the weighted sum,” Trans. ASME, J. Mech. Des., vol. 127, no. 6, pp. 1045–1055, Nov. 2005.

[31] C. K. Tan, S. Kakietek, S. J. Wilcox, and J. Ward, “Constrained optimisation of pulverised coal fired boiler using genetic algorithms and artificial neural networks,” Int. J. COMADEM, vol. 9, no. 3, pp. 39–46, 2006.

[32] K. C. Tan, T. H. Lee, D. Khoo, and E. F. Khor, “A multiobjective evolutionary algorithm toolbox for computer-aided multiobjective optimization,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 31, no. 4, pp. 537–556, Aug. 2001.

[33] P. N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Reading, MA: Addison-Wesley, 2006.

[34] S. Tsutsui and A. Ghosh, “Genetic algorithms with a robust solution searching scheme,” IEEE Trans. Evol. Comput., vol. 1, no. 3, pp. 201–208, Sep. 1997.

[35] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. San Francisco, CA: Morgan Kaufmann, 2005.

[36] T. Zhang and J. Lu, “A PSO-based multivariable fuzzy decision-making predictive controller for a once-through 300-MW power plant,” Cybern. Syst., vol. 37, no. 5, pp. 417–441, Jul./Aug. 2006.

[37] H. Zhou, K. Cen, and J. Fan, “Modeling and optimization of the NOx emission characteristics of a tangentially fired boiler with artificial neural networks,” Energy, vol. 29, no. 1, pp. 167–183, Jan. 2004.

[38] E. Zitzler and L. Thiele, “Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach,” IEEE Trans. Evol. Comput., vol. 3, no. 4, pp. 257–271, Nov. 1999.

[39] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. Fonseca, “Performance assessment of multiobjective optimizers: An analysis and review,” IEEE Trans. Evol. Comput., vol. 7, no. 2, pp. 117–132, Apr. 2003.

[40] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1994.

[41] Accessed Jan. 28, 2009. [Online]. Available: http://en.wikipedia.org/wiki/Weighted_moving_average#Weighted_moving_average

[42] K. C. Tan, Q. Yu, and T. H. Lee, “A distributed evolutionary classifier for knowledge discovery in data mining,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 35, no. 2, pp. 131–142, May 2005.

[43] D. Liu, K. C. Tan, C. K. Goh, and W. K. Ho, “A multiobjective memetic algorithm based on particle swarm optimization,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 1, pp. 42–50, Feb. 2007.

[44] C. K. Goh and K. C. Tan, “A competitive-cooperative coevolutionary paradigm for dynamic multiobjective optimization,” IEEE Trans. Evol. Comput., vol. 13, no. 1, pp. 103–127, Feb. 2009.

Zhe Song (M’08) received the B.S. and M.S. degrees from China University of Petroleum, Dongying, China, in 2001 and 2004, respectively, and the Ph.D. degree from The University of Iowa, Iowa City, in 2008.

He is currently an Associate Professor with the Business Administration Department, School of Business, Nanjing University. He has published technical papers in journals sponsored by IEEE, ESOR, and IFPR. His research concentrates on modeling energy systems, control and optimization, data mining, computational intelligence, decision theory, control theory, and statistics.

Andrew Kusiak (M’89) received the B.S. and M.S. degrees in engineering from the Warsaw University of Technology, Warsaw, Poland, in 1972 and 1974, respectively, and the Ph.D. degree in operations research from the Polish Academy of Sciences, Warsaw, in 1979.

He is currently a Professor with the Intelligent Systems Laboratory, Department of Mechanical and Industrial Engineering, The University of Iowa, Iowa City. He speaks frequently at international meetings, conducts professional seminars, and does consultation for industrial corporations. He has served on the editorial boards of over 40 journals. He is the author or coauthor of numerous books and technical papers in journals sponsored by professional societies, such as the Association for the Advancement of Artificial Intelligence, the American Society of Mechanical Engineers, etc. His current research interests include applications of computational intelligence in automation, wind and combustion energy, manufacturing, product development, and healthcare.

Prof. Kusiak is an Institute of Industrial Engineers fellow and the Editor-in-Chief of the Journal of Intelligent Manufacturing.