Case-based Approach for
Process Modeling
Authors: Qin Cao and Bijun Wan
Supervisor: Ning Xiong, Mälardalen University
Examiner: Peter Funk
28th May 2013
ABSTRACT
Case-based reasoning (CBR) is a technique for solving new problems on the basis of previously solved cases, under the central assumption that similar problems have similar solutions. Prediction of real-valued attributes often suffers from limited accuracy, and some algorithms are time-consuming; to address this, a novel case-based approach combining the K nearest neighbor (KNN) algorithm with regression is proposed in this thesis. The approach is studied with the aim of improving the efficiency of case retrieval, which is known to be computationally expensive when regression is applied, owing to the matrix calculations involved and to the use of pairwise Euclidean distances as a similarity measure. The authors propose to alleviate this drawback by performing off-line calculation with the KNN regression method for the existing cases in the dataset, so that each case gains an extra useful attribute. In the experiments, the approach is applied to a furnace process. By making full use of the updated cases, the on-line calculation for an unsolved problem proves much faster, owing to the substantial reduction in computation. Several other common methods are also studied in this thesis. Ten-fold cross validation shows that the prediction performance of this CBR system is nearly the same as that of the other methods, while the number of updated cases needed decreases significantly.
Date: 28 May 2013
Carried out at: MDH
Supervisor at MDH: Ning Xiong
Examiner: Peter Funk
PREFACE
First and foremost, we sincerely thank our advisor, Ning Xiong, for supervising our thesis. His suggestions and comments were always right to the point, and he gave us helpful advice on how to work more efficiently. Through this thesis work we gained not only knowledge but also experience of effective teamwork. We are also very grateful for the exchange student program, which gave us the chance to study in Sweden. It has been an unforgettable experience: we learned a great deal and made many new friends.
At the same time, we want to thank the friends who cared for us and helped us in our studies and daily life. Finally, we believe that this thesis work and our life here will remain a very pleasant and unforgettable memory.
Place and date: Västerås, May 2013
Qin Cao
Bijun Wan
Content
1. Introduction
2. Background
   2.1 Basic Concept of CBR
   2.2 Case-based Reasoning Foundations
   2.3 Application of CBR
3. Algorithm
   3.1 Off-line Calculation Method
      3.1.1 K Nearest Neighbor Algorithm
      3.1.2 Case Adaptation
      3.1.3 Regression
   3.2 On-line Calculation Method
   3.3 Some Related Methods
      3.3.1 Average
      3.3.2 Weighted Average
4 Ten-fold Cross Validation
   4.1 Historical Background
   4.2 Why 10-Fold Cross-Validation: From Ideal to Reality
   4.3 Applications of Cross Validation
5 Implementation Procedure
6 Experiment Tests
   6.1 Data Set
   6.2 Results and Evaluation
      6.2.1 Test for Simple Linear Regression
      6.2.2 Test for Adaptation
      6.2.3 Test for Improved Algorithm
7 Summary and Conclusion
8 Future Work
Reference
1. Introduction
Case-based Reasoning (CBR) systems fall within the domain of Artificial Intelligence (AI). Problems are solved by reusing past knowledge, much as humans draw on their past experiences. Past cases, together with their solutions, are stored in the case base of a particular domain. As an important reasoning approach in AI, CBR overcomes the bottleneck of knowledge acquisition and avoids solving every problem from scratch, thereby improving reasoning efficiency.
A case-based reasoning (CBR) system comprises four processes: case retrieval, case reuse, case revision and case retention [1], among which retrieval is the key link of the reasoning system. The efficiency of retrieval affects the whole system.
Predicting important parameters with CBR is of significant value in industry, including in continuous industrial processes. Outcome estimation in a continuous process requires tuning several other model parameters, which may affect the result, and knowledge acquisition for a furnace process is a difficult task.
The furnace process has inherently time-delayed, time-varying and non-linear characteristics. Traditional control strategies such as PID are no longer adequate for the combustion problem of the furnace process, because no exact mathematical models exist to describe these characteristics. Advanced control strategies have therefore been researched and implemented for this control problem, such as the fuzzy control and expert control methods reported in the literature [2-5]. However, these intelligent control strategies are rule-based in nature, so the obstacle, the so-called "bottleneck" of acquiring the expert knowledge on which they depend, is ultimately inevitable.
We propose a novel case-based approach which uses CBR rather than rule-based reasoning (RBR) as its reasoning machine for obtaining the control information. The approach proposed in this thesis essentially applies the K nearest neighbor regression algorithm, but with some significant improvements.
The thesis is organized as follows: chapter 2 introduces related background and theory. Chapter 3 describes the novel case-based approach in detail, and chapter 4 explains the procedure of cross validation. Chapter 5 gives a brief description of the implementation procedure. Chapter 6 is one of the most important parts of the thesis, focusing on implementation and evaluation; several methods are compared there to show that the proposed approach brings improvements in certain respects. Finally, conclusions are given in chapter 7 and future work in chapter 8.
2. Background
2.1 Basic Concept of CBR
Case-based reasoning (CBR), broadly construed, is the process of solving new problems based on the solutions of similar past problems. It has been argued that CBR is not only a powerful method for computer reasoning, but also a pervasive behavior in everyday human problem solving; more radically, that all reasoning is based on past cases. This view is related to prototype theory, which is most deeply explored in cognitive science.
Although CBR is rooted in cognitive science and was born within artificial intelligence, it still has notable disadvantages. Its justification rests on experience rather than strict proof, which makes some difficult problems hard to address; in other words, it is a methodology rather than a specific technology. A complete CBR system follows a cycle of four steps: Retrieve, Reuse, Revise and Retain, as shown in the cycle diagram of figure 1. After retrieving similar cases, we can match and test them, and obtain a solution by modifying them, so that the most suitable case is found.
Figure 1: Work flow of CBR
2.2 Case-based Reasoning Foundations
Case-based reasoning was conceived at the end of the 1970s, when first Schank and Abelson [6] and then Schank [7] laid its foundations. Since then, CBR has evolved to extend its reasoning and learning capabilities to more complex situations, for example through the use of fuzzy logic as the main knowledge representation [8]. Special attention is required for the use of CBR in distributed problems that rely on physically dispersed information, or that comprise diverse and independent pieces of knowledge; these issues are dealt with in Hayes [9], Chaudhury [10], and Watson [11]. The reasoning cycle of a CBR system comprises four stages which continuously access a shared knowledge base. The four stages of this cycle are:
1) Retrieve: Given a target problem, retrieve from memory those cases relevant to
solving it. A case consists of a problem, its solution, and, typically, annotations about
how the solution was derived.
2) Reuse: Map the solution from the previous case to the target problem. This
can be done in various ways, either through the substitution of characteristics, human
expert intervention, or the reapplication of the same reasoning process as that
followed in the case retrieval.
3) Revise: Having mapped the previous solution to the target situation, test the
new solution in the real world (or a simulation) and, if necessary, revise.
4) Retain: After the solution has been successfully adapted to the target problem,
store the resulting experience as a new case in memory.
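The four-stage cycle above can be sketched as a minimal skeleton (an illustrative sketch only; the function names and the pluggable retrieve/reuse/revise helpers are our own, not a standard CBR API):

```python
def cbr_cycle(case_base, problem, retrieve, reuse, revise):
    """Run one pass of the Retrieve-Reuse-Revise-Retain cycle.

    case_base is a mutable list of (problem, solution) pairs; the
    three helpers encode the domain-specific parts of each stage.
    """
    retrieved = retrieve(case_base, problem)   # 1) Retrieve a relevant case
    proposed = reuse(retrieved, problem)       # 2) Reuse: map its solution
    confirmed = revise(proposed, problem)      # 3) Revise: test and repair
    case_base.append((problem, confirmed))     # 4) Retain the new experience
    return confirmed
```

With trivial helpers (take the first case, copy its solution, accept it unchanged), the cycle simply replays a stored solution and grows the case base by one.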
2.3 Application of CBR
Since case-based reasoning was put forward, research in this area has grown considerably. After continuous development, CBR was being applied in both academic and commercial fields by the 1990s, and the range of fields in which it can be used keeps widening, one example being the application of CBR to industrial processes.
CBR can be used for diagnosis in different areas, such as commerce and industry, and a good case-based system can also support decision making. Moreover, a CBR system can supply useful successful experience that contributes greatly to parts of a design. The most commonly seen applications are in the commercial field, for decision making, assessment and so on. However, the wide applicability of CBR does not mean it can be used everywhere: certain conditions must be satisfied, for example that a case base is available and that the underlying assumption holds, namely that similarity between problems indicates similarity between solutions.
Although CBR is useful in many ways, it usually works together with other methods to achieve a desired purpose. Article [12] introduces an improved case-based reasoning method: because the large number of attributes in a CBR system brings considerable information redundancy, which reduces matching and retrieval efficiency, a novel reduction method based on Water-Filling is proposed there to remove unnecessary attributes. A hierarchical memetic algorithm was proposed in [13] for combined feature selection and similarity modeling in case-based reasoning. Chun Guang Chang [14-15] put forward another way to reduce the number of cases and applied it to a practical dynamic scheduling problem in iron and steel works; a rough-set-based reduction technique for case attributes was studied there to improve the efficiency of case retrieval.
Learning and knowledge discovery have received much attention from the CBR community as means of extracting valuable knowledge from case bases to support the various steps of a CBR process. Genetic algorithms have been employed to learn or adapt the parameters of a similarity metric [16,17,18]. Fuzzy rule learning was conducted in [19,20] to construct a fuzzy knowledge base for similarity assessment. Different learning algorithms were utilized in [21] for adaptation knowledge modeling in a case-based pharmaceutical design task.
3. Algorithm
After considering many aspects, a novel case-based approach is proposed to obtain the unknown real-valued output. The new algorithm has two main steps. First, the K nearest neighbor regression algorithm is used to add one more attribute to each of the original cases, namely the coefficients of the fitted linear function. Second, when a new unknown input x0 arrives, useful cases which already carry the coefficient attribute are retrieved, and the output of the new problem can be calculated in a very simple and time-saving way.
3.1 Off-line Calculation Method
The K nearest neighbor algorithm together with regression is applied in the off-line calculation stage. Using KNN regression, the coefficients of the linear function can be calculated, which is an important step in the proposed case-based approach.
3.1.1 K Nearest Neighbor Algorithm
K Nearest Neighbor (KNN) is a classification and prediction method widely used in pattern recognition and data mining, and it is a supervised machine learning method as well as one of the basic techniques in the field of data mining. It has proved very effective in many fields, but research on applying the KNN algorithm to real-valued outputs is relatively rare.
Given a set of training samples
S = {(xi, yi) | i = 1, 2, …, m, (xi, yi) ∈ R^p × R}
where xi and yi respectively represent the input attributes and the real-valued output, and a sample x0 to be predicted, KNN is used to predict its associated real-valued output y0.
The traditional KNN regression algorithm mainly includes two steps [22]. First, the outputs of the K training samples nearest to x0 (those with the shortest distance to x0) are selected by KNN from S, denoted Y = {y1, y2, …, yk}. Second, the average value of Y is used as the predicted value of y0, that is, y0 = (1/K) Σ yi. An obvious improvement is to use the weighted average of Y as the predicted value of y0, that is, y0 = (Σ wi yi) / (Σ wi), where the distance weight wi is inversely proportional to the distance; the associated algorithm is called distance-weighted KNN [22]. Studies have shown that the KNN regression algorithm achieves relatively good results in some practical applications [22].
However, a new way is proposed here to obtain the output y0 corresponding to the input x0. The first step is the same as in the traditional KNN algorithm: the Euclidean distance shown in equation 1 below serves as the similarity metric.
Distance = √((x1 − x1′)² + (x2 − x2′)² + … + (xn − xn′)²)   (Equation 1)
where x = (x1, x2, …, xn) is the n-dimensional input attribute of the unsolved problem and x′ = (x1′, x2′, …, xn′) is an existing case. By calculating the Euclidean distances between the unsolved problem x0 and all known cases in the dataset, the specified K nearest neighbors can be obtained. The second step is essentially regression, but some more work has to be done before applying it, namely the case adaptation described next.
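The retrieval step just described can be sketched in Python (an illustrative sketch, not code from the thesis; the case representation and function names are our own):

```python
import math

def euclidean(a, b):
    """Equation 1: Euclidean distance between two attribute vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def k_nearest(cases, x0, k):
    """Return the k cases whose input attributes lie closest to x0.

    Each case is an (x, y) pair: x is a tuple of input attributes,
    y is the real-valued output.
    """
    return sorted(cases, key=lambda case: euclidean(case[0], x0))[:k]
```

For example, with cases [((1.0,), 2.0), ((2.0,), 4.1), ((3.0,), 5.9), ((10.0,), 20.0)] and x0 = (2.2,), the two nearest neighbors are the cases at 2.0 and 3.0.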
3.1.2 Case Adaptation
The major difficulty in a case-based reasoning system is the adaptation of retrieved cases. Most design problems have to be adapted manually due to the complexity of the adaptation process, and approaches to case adaptation vary widely.
Here the adaptation patterns are combinations of process parameters which affect the final outcome of a product. A particular pattern observed in a process suggests a good or bad effect in the final product, unlike traditional adaptation rules, which suggest the amount of change to be made to the final solution.
In this thesis, given the K neighbors nearest to the unsolved input x0, it would be possible to obtain the output for x0 directly, for example by averaging the outputs of these neighbors. Instead, case adaptation is conducted to extract more accurate information from these limited neighbors: the incremental value between every two different neighbors is computed. In this way, the number of useful data points increases to CK2 (the number of pairs among the K neighbors), so more accurate information is obtained from the limited neighbors.
Xnew = { xi − xj : 1 ≤ j < i ≤ K }   (Equation 2)
Ynew = { yi − yj : 1 ≤ j < i ≤ K }   (Equation 3)
In the equations above, Xnew and Ynew collect the adapted cases, and xi, yi are the cases obtained with the K nearest neighbor algorithm.
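The pairwise-increment adaptation of Equations 2 and 3 can be sketched as follows (an illustrative sketch; itertools.combinations enumerates the CK2 pairs):

```python
from itertools import combinations

def adapt_cases(neighbors):
    """Pairwise increments between the K retrieved neighbors
    (Equations 2 and 3): every pair yields one adapted case
    (xi - xj, yi - yj), giving C(K, 2) useful points."""
    x_new, y_new = [], []
    for (xi, yi), (xj, yj) in combinations(neighbors, 2):
        x_new.append([a - b for a, b in zip(xi, xj)])
        y_new.append(yi - yj)
    return x_new, y_new

neighbors = [((1.0,), 2.0), ((2.0,), 4.0), ((4.0,), 8.0)]
X, Y = adapt_cases(neighbors)
print(X)  # [[-1.0], [-3.0], [-2.0]]
print(Y)  # [-2.0, -6.0, -4.0]
```

Three neighbors produce C(3, 2) = 3 adapted points, as shown by the printed lists.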
3.1.3 Regression
In normal linear regression, data are modeled using a linear predictor function as follows:
y = a1x1 + a2x2 + … + akxk + b = XA + b   (Equation 4)
However, since case adaptation has been performed, that is, incremental values between elements have been taken, a new equation for linear regression can be obtained. The following shows how it works.
y = xA + b
y0 = x0A + b
y − y0 = (x − x0)A
Ynew = Xnew A   (Equation 5)
As shown in Equation 5, the estimation of the variable b is eliminated, compared with Equation 4. The value of A, that is, (a1, a2, …, ak)ᵀ, can be derived from the useful adapted cases. Usually, X is a matrix as follows:
X = [ Xnew1 ]
    [ Xnew2 ]
    [   ⋮   ]
    [ Xnewn ]   (Equation 6)
where X is a matrix of size CK2 × k, each Xnewi is a 1 × k row vector holding the input attributes of one adapted case among all the useful cases, and n is the number of useful points.
Similarly, in the equation below, Y is a CK2 × 1 matrix, with as many rows as there are useful points after adaptation, each row being the output of the corresponding input:
Y = [ Ynew1 ]
    [ Ynew2 ]
    [   ⋮   ]
    [ Ynewn ]   (Equation 7)
In this project, after obtaining these qualified cases, the linear regression model is fitted using the least squares approach:
E = Σi=1..n (Yi − YLi)² = Σi=1..n (Yi − XiA)²   (Equation 8)
In equation 8, n is the number of useful points, Yi is the output value of the ith qualified case, and YLi is the value obtained with linear regression for the corresponding existing case.
By minimizing the value of equation 8, we obtain the following equation for the coefficients of the linear function:
A = (XᵀX)⁻¹XᵀY   (Equation 9)
As can be seen, the K nearest neighbor regression method involves a large amount of matrix calculation, which is really time-consuming. However, all of the costly calculation can be finished before the unsolved problem actually arrives. This works as follows: without a real unsolved problem as input, each existing case in the dataset in turn acts as the unsolved problem while all the other cases act as training data, and the desired coefficients for this temporary test case are computed. In this way, using the knowledge of the existing successful cases, every case in the dataset gains one more useful input attribute.
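The off-line stage can be sketched as below (a sketch under our reading of sections 3.1.1-3.1.3; NumPy's lstsq is used in place of the explicit inverse of Equation 9 for numerical stability, and the k_nearest retrieval helper is passed in as a parameter rather than fixed, so all names here are our own):

```python
import numpy as np
from itertools import combinations

def fit_coefficients(neighbors):
    """Fit A in Ynew = Xnew A (Equations 5-9) from the pairwise
    increments of the retrieved neighbors; the intercept b cancels,
    so only the slope coefficients are estimated."""
    X = np.array([np.subtract(xi, xj)
                  for (xi, _), (xj, _) in combinations(neighbors, 2)])
    Y = np.array([yi - yj
                  for (_, yi), (_, yj) in combinations(neighbors, 2)])
    # Least squares solution of Equation 9, A = (X^T X)^-1 X^T Y.
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A

def add_coefficient_attribute(cases, k, k_nearest):
    """Off-line stage: each stored case plays the unsolved problem
    once; the coefficients fitted from its k neighbors among the
    remaining cases are attached as an extra attribute."""
    updated = []
    for i, (x, y) in enumerate(cases):
        others = cases[:i] + cases[i + 1:]
        updated.append((x, y, fit_coefficients(k_nearest(others, x, k))))
    return updated
```

On cases sampled from a noiseless line y = 2x + 1, every fitted coefficient vector comes out as [2.0], since the pairwise increments are exactly collinear.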
3.2 On-line Calculation Method
After the off-line work has been done, an updated dataset is available. Every time a new problem arrives, instead of the heavy computation of the K nearest neighbor regression method described above, the output value can be calculated in a very simple way with the help of the coefficient attribute. Obviously, this way of operating the CBR system is much less time-consuming.
It is also necessary to understand why this new input attribute is valid and reliable. Consider two situations. In the first, the K nearest neighbor regression method is applied and everything is calculated on-line when a new problem arrives: the new problem is the test case, all the data in the original dataset are training data, and the coefficients of the linear function are computed and used to obtain the output value. The second situation is the approach put forward in this thesis: some calculation is done off-line and the on-line result is then obtained from it. Here, one case of the original dataset acts as the test case and all the others are training data. The only difference between the two situations, when the coefficient attribute is calculated, is that the second has exactly one case fewer in its training data. Usually, removing one case from a reasonably large dataset makes no big difference to the result.
To make the output value more accurate, a specified number of neighbors of the unsolved problem can be found, and the average of the outputs obtained from these neighbors taken.
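A sketch of the on-line stage follows, under our reading of Equation 5: each updated case is assumed to be stored as a triple (x, y, A), as produced off-line, and each neighbor contributes the estimate y0 = y + (x0 − x)·A before averaging (the representation and names are ours, not the thesis's):

```python
import math

def k_nearest(cases, x0, k):
    """Retrieve the k updated cases closest to x0 (Euclidean)."""
    d = lambda x: math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x0)))
    return sorted(cases, key=lambda c: d(c[0]))[:k]

def predict(updated_cases, x0, k):
    """On-line prediction: each of the k nearest updated cases
    (x, y, A) yields the estimate y + (x0 - x) . A via Equation 5;
    the estimates are averaged."""
    estimates = [
        y + sum(a * (p - q) for a, p, q in zip(A, x0, x))
        for x, y, A in k_nearest(updated_cases, x0, k)
    ]
    return sum(estimates) / len(estimates)
```

No matrix calculation occurs on-line: only k distance computations, k dot products, and one average.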
3.3 Some Related Methods
3.3.1 Average
An average is a measure of the "middle" or "typical" value of a data set; it is thus a measure of central tendency. In the most common case, the data set is a list of numbers, and the average of the list is a single number intended to typify the numbers in it.
y=(y1+y2+…+yk)/k (Equation 10)
3.3.2 Weighted Average
A weighted average computes a kind of arithmetic mean of a set of numbers in which some elements are more important than others. The influence of each element is expressed by its weighting coefficient wi; that is, wi determines the relative importance of each element in the final average. The plain average, or arithmetic mean, is actually the special case of a weighted average in which all weights equal 1.
y=(w1y1+w2y2+…+wkyk)/(w1+w2+…+wk) (Equation 11)
In part of the work in this thesis, the weighting coefficient is related to the Euclidean distance between an element of the data set and the unsolved element: the closer they are, the bigger the weighting coefficient. That is,
wi = 1 / √((x1 − x1′)² + (x2 − x2′)² + … + (xn − xn′)²)   (Equation 12)
In [23] it was suggested that case similarity degrees be used as the weights when calculating the weighted average of the outcomes of the retrieved cases to predict the new query problem.
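Equations 11 and 12 combine into the following sketch (illustrative only; an exact match, which would receive infinite weight under Equation 12, is returned directly):

```python
import math

def weighted_average_predict(neighbors, x0):
    """Distance-weighted prediction (Equations 11 and 12): each
    neighbor's output is weighted by the inverse of its Euclidean
    distance to the query x0."""
    weights, outputs = [], []
    for x, y in neighbors:
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x0)))
        weights.append(1.0 / d if d > 0 else float("inf"))
        outputs.append(y)
    # A zero-distance neighbor dominates; return its output directly.
    for w, y in zip(weights, outputs):
        if math.isinf(w):
            return y
    return sum(w * y for w, y in zip(weights, outputs)) / sum(weights)
```

When all neighbors are equidistant from the query, the result reduces to the plain average of Equation 10.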
4 Ten-fold Cross Validation
Cross-Validation is a statistical method of evaluating and comparing learning
algorithms by dividing data into two segments: one used to learn or train a model and
the other used to validate the model. In typical cross-validation, the training and
validation sets must cross-over in successive rounds such that each data point has a
chance of being validated against. The basic form of cross-validation is k-fold
cross-validation. Other forms of cross-validation are special cases of k-fold
cross-validation or involve repeated rounds of k-fold cross-validation.
Cross-validation is used to evaluate or compare learning algorithms as follows: in
each iteration, one or more learning algorithms use k-1 folds of data to learn one or
more models, and subsequently the learned models are asked to make predictions
about the data in the validation fold. The performance of each learning algorithm on
each fold can be tracked using some predetermined performance metric like accuracy.
Upon completion, k samples of the performance metric will be available for each
algorithm. Different methodologies such as averaging can be used to obtain an
aggregate measure from these samples, or these samples can be used in a statistical
hypothesis test to show that one algorithm is superior to another. The following is the procedure of three-fold cross-validation.
Figure 4: Procedure of three-fold cross-validation
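A k-fold split along the lines of the description above can be sketched as follows (illustrative; the stride-based fold assignment is our own choice, and shuffling is omitted for brevity):

```python
def k_fold_splits(n_samples, k):
    """Partition indices 0..n_samples-1 into k disjoint folds; each
    fold serves as the validation set exactly once while the other
    k-1 folds form the training set."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val
```

Each data point appears in exactly one validation fold, so every point is validated against exactly once, as the description above requires.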
4.1 Historical Background
In statistics and data mining, a typical task is to learn a model from available data; such a model may be a regression model or a classifier. The problem with evaluating such a model is that it may demonstrate adequate prediction capability on the training data yet fail to predict future unseen data. Cross-validation is a procedure for estimating the generalization performance in this context. The idea originated in the 1930s [24], in a paper where one sample is used for regression and a second for prediction. Mosteller and Tukey [25], among others, further developed the idea. A clear statement of cross-validation, similar to the current version of k-fold cross-validation, first appeared in [26]. In the 1970s, both Stone [27] and Geisser [28] employed cross-validation as a means of choosing proper model parameters, as opposed to using it purely for estimating model performance. Today, cross-validation is widely accepted in the data mining and machine learning community, and serves as a standard procedure for performance estimation and model selection.
4.2 Why 10-Fold Cross-Validation: From Ideal to Reality
Whether estimating the performance of a learning algorithm or comparing two or more algorithms in terms of their ability to learn, an ideal, statistically sound experimental design must provide a sufficiently large number of independent measurements of the algorithms' performance. To make independent measurements of an algorithm's performance, one must ensure that the factors affecting the measurement are independent from one run to the next. These factors are the training data the algorithm learns from and the test data used to measure its performance. If some data is used for testing in more than one round, the results obtained in those rounds, for example the accuracies, will be dependent, and a statistical comparison may not be valid.
Now the issue becomes selecting an appropriate value for k. A large k seems desirable: with a larger k there are more performance estimates, and the training set size is closer to the full data size, increasing the chance that any conclusion made about the learning algorithms under test will generalize to the case where all the data is used to train the model. As k increases, however, the overlap between training sets also increases. For example, with 5-fold cross-validation each training set shares only 3/4 of its instances with each of the other four training sets, whereas with 10-fold cross-validation each training set shares 8/9 of its instances with each of the other nine. Furthermore, increasing k shrinks the test set, leading to a less precise, less fine-grained performance metric. Weighing these competing factors, the general consensus in the data mining community seems to be that k = 10 is a good compromise. This value of k is particularly attractive because predictions are made using models trained on 90% of the data, making the conclusions more likely to generalize to the full dataset.
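The overlap fractions quoted above follow from the fold structure: two training sets each contain k − 1 folds and differ in exactly one of them, so they share (k − 2)/(k − 1) of their folds. A quick check:

```python
from fractions import Fraction

def training_set_overlap(k):
    """Fraction of a training set shared with another training set
    in k-fold CV: each training set holds k-1 folds, and any two
    training sets differ in exactly one fold each."""
    return Fraction(k - 2, k - 1)

print(training_set_overlap(5))   # 3/4
print(training_set_overlap(10))  # 8/9
```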
4.3 Applications of Cross Validation
Cross-validation can be applied in three contexts: performance estimation, model selection, and tuning of learning model parameters. The early uses of cross-validation were for the investigation of prediction equations; more recently, it has been used for model selection, and it can also be used to compare the performance of different predictive modeling procedures. Many classifiers are parameterized, and their parameters can be tuned to achieve the best result on a particular dataset. In most cases it is easy to learn the proper value for a parameter from the available data, and cross-validation can be performed on the training data to measure the performance with each candidate value. Alternatively, a portion of the training set can be reserved for this purpose and excluded from the rest of the learning process; but if the amount of labeled data is limited, this can significantly degrade the performance of the learned model, and cross-validation may be the best option.
5 Implementation Procedure
With the help of some graphs, the description below shows how this case-based approach is carried out. The graphs depict how it works for a single-input, single-output continuous process, and ten-fold cross validation is used to test the accuracy.
1. Make use of the existing dataset. The first case in the dataset acts as the test case, that is, the unsolved problem x0; all the other cases are training data.
2. Find the K nearest neighbors of the test case x0 in the continuous process.
Figure 2: find K nearest neighbors
3. Perform case adaptation. After the incremental calculations, CK2 useful points are found.
4. Conduct regression on the adapted points, thus obtaining the coefficients of the linear function.
Figure 3: get the coefficients of the linear function
5. Attach the coefficients to the test case as a new part.
6. Make the next case in the dataset the test case x0, and repeat from step 2 until every case in the dataset has gained one more part indicating the coefficients of the corresponding linear function.
7. Using the coefficients newly added to the dataset, calculate the output of a real unsolved problem.
8. Find a specified number of nearest neighbors and take the average output for the unsolved problem.
9. Conduct ten-fold cross validation to test the accuracy of this method.
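Steps 1 through 8 can be compressed into one end-to-end sketch (our reading of the procedure; NumPy handles the least squares step, and all helper names are our own):

```python
import math
import numpy as np
from itertools import combinations

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def coeffs(cases, x0, k):
    """Steps 2-4: retrieve k neighbors of x0, adapt them into
    pairwise increments, and regress (the intercept cancels)."""
    nb = sorted(cases, key=lambda c: dist(c[0], x0))[:k]
    X = np.array([np.subtract(a[0], b[0]) for a, b in combinations(nb, 2)])
    Y = np.array([a[1] - b[1] for a, b in combinations(nb, 2)])
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A

def run(cases, query, k):
    """Steps 1-8: off-line augmentation, then on-line prediction."""
    updated = [(x, y, coeffs(cases[:i] + cases[i + 1:], x, k))
               for i, (x, y) in enumerate(cases)]          # steps 1-6
    nb = sorted(updated, key=lambda c: dist(c[0], query))[:k]
    preds = [y + float(np.dot(A, np.subtract(query, x)))
             for x, y, A in nb]                            # step 7
    return sum(preds) / len(preds)                         # step 8
```

On cases drawn from a noiseless line y = 3x − 1, the pipeline reproduces the line exactly, since every locally fitted coefficient equals the true slope.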
6 Experiment Tests
The gas furnace dataset is a commonly used benchmark, and it is also very useful for evaluation. We compare the results of three different algorithms applied to estimate the output of the furnace process on this dataset. First, we use simple linear fitting and compare its result (squared error) with the average and weighted-average methods. Second, we add case adaptation to improve the first method, since this allows us to use fewer data points to obtain appropriate results. After that, we replace the on-line calculation with off-line calculation. To make the results more reliable, ten-fold cross validation is applied when testing these algorithms.
6.1 Data Set
The data set consists of 296 input-output measurements sampled at a fixed interval of 9 seconds. The measured input u(k) represents the flow rate of methane gas into a gas furnace, and the output measurement y(k) represents the concentration of carbon dioxide in the gas mixture flowing out of the furnace under a steady air supply.
In the attached dataset, the first column represents u(k) and the second column y(k), with k between 1 and 296. In this test we chose y(k-1), y(k-2), y(k-3), u(k), u(k-1) and u(k-2) as the inputs of the cases to predict the concentration of carbon dioxide at time instant k. The following graph gives a clear image of the relationship between the inputs and the output.
Figure 1. The process of furnace
Figure 2 shows the distribution of this dataset. We can see that the dataset is continuous; over small numbers of points it can be treated as a non-linear model.
Figure 2. The gas furnace data
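The case construction described in section 6.1 can be sketched as follows (illustrative; 0-based indexing is used here, so prediction starts at the fourth sample):

```python
def build_cases(u, y):
    """Build CBR cases from the gas furnace series: inputs are
    y(k-1), y(k-2), y(k-3), u(k), u(k-1), u(k-2); output is y(k).
    Indices are 0-based here, so k starts at 3."""
    cases = []
    for k in range(3, len(y)):
        x = (y[k - 1], y[k - 2], y[k - 3], u[k], u[k - 1], u[k - 2])
        cases.append((x, y[k]))
    return cases

u = [1.0, 1.1, 1.2, 1.3, 1.4]
y = [50.0, 51.0, 52.0, 53.0, 54.0]
print(build_cases(u, y)[0])
# ((52.0, 51.0, 50.0, 1.3, 1.2, 1.1), 53.0)
```

With 296 measurements, this construction yields 293 cases, each with six input attributes and one real-valued output.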
6.2 Results and Evaluation
6.2.1 Test for simple linear regression
In the following table, K is the number of nearest neighbors. Since there are
six input attributes, K starts from six. The results are the average squared
errors over the data library when the specified method is used to predict the
output values. Ave denotes the average method: after selecting the nearest
neighbors, we sum their output values and take the mean. Weighted-ave denotes the
weighted-average method, in which each neighbor's contribution is weighted
according to its Euclidean distance from the query.
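The two baseline predictors can be sketched as follows. The inverse-distance weighting shown here is one common choice and reflects our reading of the weighted-average method; the function names and toy data are illustrative, not the thesis implementation.

```python
import numpy as np

def average_predict(X_train, y_train, x_query, k):
    """Plain mean of the k nearest neighbors' outputs."""
    d = np.linalg.norm(X_train - x_query, axis=1)
    nn = np.argsort(d)[:k]
    return y_train[nn].mean()

def weighted_average_predict(X_train, y_train, x_query, k, eps=1e-12):
    """Weight each neighbor by the inverse of its Euclidean distance."""
    d = np.linalg.norm(X_train - x_query, axis=1)
    nn = np.argsort(d)[:k]
    w = 1.0 / (d[nn] + eps)          # closer cases count more
    return np.sum(w * y_train[nn]) / np.sum(w)

X = np.array([[0.0], [1.0], [3.0]])
y = np.array([0.0, 1.0, 3.0])
q = np.array([0.5])
print(average_predict(X, y, q, k=2))           # 0.5: plain mean of the two closest
print(weighted_average_predict(X, y, q, k=2))  # 0.5: equal distances, same result
```

When the query is not equidistant from its neighbors, the weighted version pulls the prediction toward the closer cases, which is why it outperforms the plain average in the table below.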
Table 1. The result of simple regression
K Ave Weighted-ave Simple Linear regression
6 0.41473 0.37658 9.2607
9 0.43734 0.38322 0.24766
14 0.53548 0.42601 0.15619
18 0.62127 0.46772 0.12564
24 0.74047 0.52547 0.099747
27 0.80127 0.55302 0.095479
32 0.91621 0.59950 0.082702
37 1.0213 0.6410 0.074715
42 1.1003 0.67187 0.073254
46 1.1674 0.69672 0.074356
51 1.2645 0.73203 0.074191
55 1.3319 0.75543 0.071290
60 1.4194 0.78353 0.069408
64 1.4915 0.80619 0.070306
67 1.5598 0.82657 0.070117
71 1.6346 0.84935 0.06975
75 1.7150 0.87165 0.068703
79 1.7897 0.89345 0.070054
83 1.8858 0.91884 0.070159
87 1.9709 0.93850 0.070491
92 2.0811 0.96159 0.070425
97 2.2095 0.98749 0.070730
100 2.2845 1.0031 0.071613
From the results above, several factors deserve attention because they strongly
affect the outcome. First, the results vary greatly with the number of neighbors
K. For the first method, the error increases as K increases. For simple linear
regression, the error is very large when K is small (9.2607 at K = 6), then
decreases gradually as K grows, reaching its minimum at K = 75; when K continues
to increase beyond that, the error grows again. The error behaves this way for
the following reasons:
1) When the number of neighbors is small, inaccuracy is more likely: with few
nearest neighbors the selected data points may not be concentrated enough, and
the fitted line is easily a poor approximation.
2) On the other hand, if the number of neighbors is too large, searching for the
nearest neighbors becomes pointless: with many data points the fitted line
deviates greatly from the true local behavior, whereas linear regression is only
a good approximation locally, over a small interval.
Meanwhile, the results show that simple linear regression is much more accurate
than the other two methods (the average method and the weighted-average method).
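The local linear fit behind the third column can be sketched as below: fit least squares on the K retrieved neighbors only, then evaluate the fitted model at the query. This is a minimal illustrative version with names and toy data of our own.

```python
import numpy as np

def local_linear_predict(X_train, y_train, x_query, k):
    """Fit y ≈ w·x + b on the k nearest neighbors, then evaluate at the query."""
    d = np.linalg.norm(X_train - x_query, axis=1)
    nn = np.argsort(d)[:k]
    A = np.hstack([X_train[nn], np.ones((k, 1))])   # append a bias column
    coef, *_ = np.linalg.lstsq(A, y_train[nn], rcond=None)
    return np.append(x_query, 1.0) @ coef

# On a noiseless linear relation the local fit recovers the exact value
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(50, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5
q = np.array([0.4, 0.6])
print(local_linear_predict(X, y, q, k=10))  # ≈ 3*0.4 - 2*0.6 + 0.5 = 0.5
```

This also illustrates why K must exceed the input dimension: with six inputs plus a bias term, at least seven neighbors are needed for the least-squares system to be determined, which is why the table starts at K = 6 and the error there is so large.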
6.2.2 Test for Adaptation
The parameters below (K, Ave, Weighted-ave) have the same meanings as described
above. The adaptation method makes only a small change to simple linear
regression: after selecting the nearest neighbors, we do not use these data
points for the regression directly as in the previous method; instead, we first
apply adaptation to them. For instance, if we choose four data points, adaptation
yields six data points for the regression, so for this six-input system the
starting value of K is four.
Table 2. The result of Adaptation
K Ave Weighted-ave Adaptation
4 0.41307 0.38424 0.37819
6 0.41473 0.37658 0.62961
9 0.43734 0.38322 0.24766
13 0.50263 0.41179 0.15938
19 0.63703 0.47507 0.12132
24 0.74047 0.52547 0.099747
28 0.82313 0.56156 0.092044
32 0.91621 0.59950 0.082702
36 0.99675 0.63176 0.07603
41 1.0815 0.66383 0.073324
46 1.1674 0.69672 0.074356
55 1.3319 0.75543 0.071290
59 1.4094 0.77932 0.069752
64 1.4915 0.80619 0.070306
68 1.5774 0.83243 0.069882
72 1.6538 0.855 0.069261
75 1.7150 0.87165 0.068703
81 1.8427 0.90686 0.070501
84 1.9095 0.92407 0.070016
87 1.9709 0.93850 0.070491
91 2.0545 0.95562 0.070433
96 2.1844 0.98256 0.070626
100 2.2845 1.0031 0.071613
From these results, the errors of the first two methods again increase with K.
For the adaptation method, the behavior matches that of simple linear regression:
the best result is obtained when K equals 75.
6.2.3 Test for improved algorithm
From the previous calculations we can see that, for the gas furnace dataset, the
best result (squared error) under these methods is 0.068703, obtained at K = 75.
The following method is a new algorithm for this dataset: it combines averaging
and adaptation with linear regression, and the big difference from the previous
two methods is that it performs the heavy calculation off-line.
The data points are divided into 10 sets; one set is chosen as test data and the
rest as training data. Unlike the previous methods, we take each data point in
the training set in turn, find its nearest neighbors using K = 75 as in the
adaptation method of Section 6.2.2, apply adaptation to these neighbors, and then
compute the coefficients of the local equation associated with that data point. A
loop works out the coefficients for all data points in the training set. After
these steps, we return to the test set: for each test data point, we use the
stored coefficients of its nearest neighbors, combined with the average method,
to calculate its output value. In the table below, K is the number of nearest
neighbors used for the test dataset.
Table 3. The result of improved algorithm
K Ave-adaptation
4 0.086943
6 0.081838
9 0.07669
14 0.072748
18 0.07208
22 0.071868
26 0.071744
30 0.07064
32 0.070388
34 0.069724
35 0.069432
36 0.069743
39 0.0698
43 0.07048
46 0.070755
50 0.071305
53 0.07148
56 0.071506
Skimming through the results, we see that the squared error decreases as K
increases until K = 35, where the smallest error, 0.069432, is reached; beyond
that point, the error increases again. Comparing this method with the two
discussed before, its result (squared error) is almost the same as that of the
adaptation method, but far fewer nearest neighbors are needed for the test
dataset: just 35, much less than the 75 required by simple linear regression and
the adaptation method.
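The offline/online split evaluated above can be sketched as follows: offline, each training case stores the coefficients of a local linear fit over its own neighbors; online, a query simply averages the predictions produced by its nearest neighbors' stored models. For brevity this sketch omits the adaptation step and uses a small toy problem, so the function names, data and `k_fit` value are illustrative, not the thesis code.

```python
import numpy as np

def fit_local_coeffs(X, y, k_fit):
    """Offline: for each training case, least-squares coefficients over its k_fit neighbors."""
    n = len(X)
    coeffs = np.empty((n, X.shape[1] + 1))
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        nn = np.argsort(d)[:k_fit]
        A = np.hstack([X[nn], np.ones((k_fit, 1))])   # bias column
        coeffs[i], *_ = np.linalg.lstsq(A, y[nn], rcond=None)
    return coeffs

def online_predict(X_train, coeffs, x_query, k):
    """Online: average the stored local models of the k nearest neighbors."""
    d = np.linalg.norm(X_train - x_query, axis=1)
    nn = np.argsort(d)[:k]
    xq = np.append(x_query, 1.0)
    return np.mean(coeffs[nn] @ xq)

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(80, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5
coeffs = fit_local_coeffs(X, y, k_fit=20)   # done once, before any query arrives
print(online_predict(X, coeffs, np.array([0.4, 0.6]), k=5))  # ≈ 0.5 on this linear toy
```

The matrix solve, the expensive part, happens only in `fit_local_coeffs`; answering a query costs just one distance scan and a handful of dot products, which is the source of the on-line speed-up.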
7 Summary and Conclusion
The tests show that, without designing complicated mathematical models, CBR can
obtain the desired output with a tolerable error. From simple linear regression
and adaptation we see that the choice of the number of nearest neighbors is
critical: different values of K produce very different results. We should also
note that linear regression and adaptation perform much better than the average
and weighted-average methods, so they are more appropriate for this continuous
dataset.
In the third part, we presented an operational new approach, evaluated on the
classical gas furnace dataset in CBR. The method mainly modifies the previous
methods (simple linear fitting and adaptation): it combines linear regression and
adaptation with the nearest neighbors, and it first calculates the coefficients
of the equations associated with the training dataset, so when a new problem is
given we do not need to perform the time-consuming coefficient calculation
on-line; we simply reuse the coefficients calculated beforehand. Regarding the
number of nearest neighbors, this method achieves its best result at K = 35
instead of the 75 required by the other two methods: it uses fewer cases to reach
a best result that is almost the same as the previous methods'. Moreover, the
approach has the advantage of being general, easy to understand and easy to
reuse. In particular, it fits well with real-world applications.
8 Future work
The experiments show that the approach proposed in this thesis has advantages
over some other methods. However, there are still ways to improve the prediction
performance of the CBR system; due to time limitations, we were not able to study
them further.
In our opinion, there is at least one improvement to this method: it is possible
and reasonable to use additional techniques, such as the K-means algorithm, to
deal with more complex and high-dimensional datasets. In this way, combining
clustering with case-based reasoning may achieve better accuracy.
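One way the suggested combination might look is sketched below: cluster the case base with K-means first, then retrieve neighbors only inside the query's cluster, so retrieval scans a fraction of the cases. This is a hypothetical sketch of our own, not something implemented in the thesis.

```python
import numpy as np

def kmeans(X, n_clusters, iters=50, seed=0):
    """Plain Lloyd's algorithm: returns cluster centers and a label per case."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):                 # skip empty clusters
                centers[c] = X[labels == c].mean(axis=0)
    return centers, labels

def clustered_knn_predict(X, y, centers, labels, x_query, k):
    """Retrieve nearest neighbors only within the query's own cluster."""
    c = np.argmin(np.linalg.norm(centers - x_query, axis=1))
    idx = np.where(labels == c)[0]
    d = np.linalg.norm(X[idx] - x_query, axis=1)
    nn = idx[np.argsort(d)[:k]]
    return y[nn].mean()

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(100, 2))
y = X[:, 0] + X[:, 1]
centers, labels = kmeans(X, n_clusters=4)
print(clustered_knn_predict(X, y, centers, labels, np.array([0.5, 0.5]), k=5))
```

With the case base partitioned into c clusters, each retrieval touches roughly n/c cases instead of n, at the risk of missing true neighbors that lie just across a cluster boundary; whether the accuracy holds up would need to be verified experimentally.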