Sales Using ANN n Fuzzy


    Computers in Industry 37 (1998) 1–15

    An intelligent sales forecasting system through integration of artificial neural network and fuzzy neural network

    R.J. Kuo a,*, K.C. Xue b

    a Department of Marketing and Distribution Management, National Institute of Technology at Kaohsiung, Kaohsiung County, 824, Taiwan

    b Graduate School of Management Science, I-Shou University, Kaohsiung County, 80048, Taiwan

    Received 3 May 1997; accepted 16 February 1998

    Abstract

    This paper attempts to develop an intelligent sales forecasting system which can consider the quantitative factors as well as the non-quantitative factors. The proposed forecasting system consists of four parts: (1) data collection, (2) general pattern model, (3) special pattern model, and (4) decision integration. In the general pattern model, a feedforward neural network with the error-backpropagation (EBP) learning algorithm is employed to learn the time series data, or quantitative factors. However, unique circumstances, e.g., promotion, may cause a sudden change in the sales pattern. To this end, this paper utilizes a fuzzy logic that is capable of learning (fuzzy neural network, FNN) to learn the experts' knowledge regarding the effect of promotion on the sales. Finally, the outputs of the two models are integrated with the time effect through a feedforward neural network with the EBP learning algorithm. Evaluation of the model results indicates that the proposed system performs more accurately than the conventional statistical method and a single artificial neural network (ANN).

    © 1998 Elsevier Science B.V. All rights reserved.

    Keywords: Sales forecasting; Artificial neural networks; Fuzzy neural networks

    1. Introduction

    To enhance its commercial competitive advantage in a constantly fluctuating environment, an organization's management must make the right decision in time, depending on the information at hand. The decision lead time ranges from several years to several hours, depending on the type of business. Thus, making an accurate decision plays a prominent role.

    * Corresponding author. Tel.: +886-7-6011000 ext. 4213; e-mail: [email protected]

    0166-3615/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved. PII S0166-3615(98)00066-9

    Intuitively, forecasting models can provide reasonable estimates by using historical data. Therefore, if the marketing department can estimate the sales quantity for the next period, the materials department can effectively control the inventory to achieve just-in-time (JIT). In addition, the production department can schedule production and arrange facility utilization, which may decrease the production cost. Obtaining an accurate forecast therefore appears to be critical. Statistical methods, such as regression models and ARMA, have long been candidates for decision makers. However, these methods are only efficient for data which are seasonal or cyclical. If the data are influenced by special cases, like promotion, they are inaccurate. Artificial neural networks (ANNs), which are better than the conventional statistical methods [1,2], have been recently employed, but even then the problem still arises. Thus, this paper first attempts to develop a forecasting system capable of handling such a circumstance, i.e., promotion.

    Fuzzy logic has been applied in the area of control and has shown highly promising results. Fuzzy logic attempts to capture vague experts' knowledge. However, the setup of the experts' knowledge, which results from the setup of the membership functions of the fuzzy sets, is quite subjective. Therefore, this paper also aims to develop a learning algorithm for fuzzy logic, called the fuzzy neural network (FNN).

    Thus, this paper develops a sales forecasting system which consists of four parts: (1) data collection, (2) general pattern model (ANN), (3) special pattern model (FNN), and (4) decision integration (ANN). To evaluate the proposed system, the actual data provided by a chain supermarket company are used. According to these results, the proposed system performs more accurately than the conventional statistical method and a single ANN, particularly when a promotion is conducted.

    The rest of this paper is organized as follows. Section 2 provides some necessary background information, while the proposed system is discussed in Section 3. Section 4 summarizes the evaluation results. Discussion and concluding remarks are finally made in Sections 5 and 6, respectively.

    2. Background

    This section briefly reviews the sales forecasting methods and the applications of artificial neural networks in this area. In addition, fuzzy modeling and fuzzy neural networks are also discussed.

    2.1. Sales forecasting

    Sales forecasting always plays a prominent role in a decision support system. Obtaining an effective sales forecast in advance can help the decision maker calculate production and materials costs, and even determine the sale price [3]. This results in a lower inventory level and achieves the objective of just-in-time. Most conventional sales forecasting methods [4–6] use either factors or time series data to determine the forecast. However, the relationship between the factors or the past time series data (independent variables) and the sales data (dependent variable) is always quite complicated, so obtaining results through the above-mentioned approaches is quite difficult. Therefore, various decision makers prefer using their own intuition instead of model-based approaches (i.e., time series or regression models). However, there is a model-free approach, ANN, that can be applied in the area of forecasting owing to its adequate performance in control and pattern recognition. Thus, ANN is reviewed in Section 2.2.

    2.2. Artificial neural networks in sales forecasting

    An artificial neural network (ANN) is a system derived through models of neurophysiology. In general, it consists of a collection of simple nonlinear computing elements whose inputs and outputs are tied together to form a network. The learning algorithms of ANNs can generally be divided into three different types: supervised, unsupervised, and hybrid learning.

    Many studies have attempted to apply ANN to time-series forecasting. However, their conclusions are often contradictory. Some studies conclude that ANNs are better than conventional methods [7], while others reach an opposite conclusion [8]. Weigend et al. [7] introduced the weight-elimination backpropagation learning procedure to effectively deal with the overfitting problem, and applied it to sunspots and an exchange rate time series. Tang et al. [8] compared the ANN and Box-Jenkins models, using international airline passenger traffic, domestic car sales and foreign car sales in the USA. They concluded that the Box-Jenkins models outperformed the ANN models in short-term forecasting. On the other hand, the ANN models outperformed the Box-Jenkins models in long-term forecasting.

    Chakraborty et al. [1] presented an ANN approach to multivariate time-series analysis. They accurately predicted the flour prices in three cities in the USA. According to their results, the ANN approach is a leading contender among statistical modeling approaches.


    Lachtermacher and Fuller [2] developed a calibrated ANN model. The model used Box-Jenkins methods to identify the lag components of the data that should be used as input variables. Also, it employed a heuristic to suggest the number of hidden units needed in structuring the model. In examining the stationary series, they observed that the calibrated ANN models have only a slightly better overall performance than the conventional time-series methods used in the benchmark. In the case of non-stationary series, the calibrated ANN models outperformed the ARMA model for three of the four series, and performed almost as well as the ARMA model in the fourth series. The above survey indicates that ANN is more appropriate for the time series data. However, only considering the time series data may result in a bad forecast. Including both the time series data and factors in the forecasting model seems to be preferable. Fuzzy logic has not only been successful in engineering applications, but can also be used to represent these factors, as introduced in Section 2.3.

    2.3. Fuzzy modeling

    The theory of fuzzy sets was first developed by Lotfi Zadeh [9], primarily in the context of his interest in the analysis of complex systems. However, some of the key ideas of the theory were envisioned by Max Black, a philosopher, almost 30 years prior to Zadeh's seminal paper [10]. Basically, the concept of the fuzzy set is a generalization of the classical or crisp set.

    The crisp set is defined in such a way as to dichotomize the individuals in some given universe of discourse into two groups: members (those that certainly belong in the set) and nonmembers (those that certainly do not). A sharp, unambiguous distinction exists between the members and nonmembers of the class or category represented by the crisp set. However, many collections and categories do not have this kind of characteristic. Instead, their boundaries seem vague, and the transition from member to nonmember appears gradual rather than abrupt. Thus, the fuzzy set introduces vagueness by eliminating the sharp boundary dividing members of the class from nonmembers. The logic based on fuzzy set theory is called fuzzy logic. A simple example is used to explain the general concept of fuzzy modeling. For instance, two arbitrary rules are illustrated as follows.

    Rule 1: IF X is Small ($A_1$) and Y is Small ($B_1$) THEN Z is $f_1$
    Rule 2: IF X is Large ($A_2$) and Y is Large ($B_2$) THEN Z is $f_2$

    The membership functions for X and Y are shown in Figs. 2 and 3. Both X and Y are divided into four linguistic variables, say very small (VS), small (S), large (L), and very large (VL). Each rule has a premise, or IF part, which contains several preconditions, and a consequent, or THEN part, which describes the value of one or more output actions. Now suppose there are two preconditions, x and y, for fuzzy variables X and Y, respectively. Then, for rule 1, the membership function values are represented as $\mu_1(x)$ and $\mu_1(y)$ for $A_1$ and $B_1$, respectively. Similarly, for rule 2, we have $\mu_2(x)$ and $\mu_2(y)$ as the membership function values. Hence, the firing strength of rule 1 is obtained as $w_1 = \mu_1(x)\,\mu_1(y)$, while $w_2 = \mu_2(x)\,\mu_2(y)$ is the firing strength of rule 2.

    The overall output O is determined by using centroid defuzzification, where

    $$O = \frac{\sum_i w_i f_i}{\sum_i w_i} = \frac{\mu_1(x)\,\mu_1(y)\,f_1 + \mu_2(x)\,\mu_2(y)\,f_2}{\mu_1(x)\,\mu_1(y) + \mu_2(x)\,\mu_2(y)} \tag{1}$$

    and $f_i$ is the consequence, or control action, value of rule i [11].
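The two-rule example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triangular membership shapes, their breakpoints, and the consequent values 10 and 30 are hypothetical and do not come from the paper; only the product firing strength and the weighted-average output follow the text.

```python
def triangular(x, left, peak, right):
    """Membership value of x in a triangular fuzzy set (hypothetical shape)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def infer(x, y, rules):
    """Return O = sum(w_i * f_i) / sum(w_i) for rules given as (mu_x, mu_y, f)."""
    numerator = 0.0
    denominator = 0.0
    for mu_x, mu_y, f in rules:
        w = mu_x(x) * mu_y(y)  # firing strength of the rule
        numerator += w * f
        denominator += w
    if denominator == 0.0:
        raise ValueError("no rule fires for this input")
    return numerator / denominator


# Hypothetical membership functions on [0, 1] and hypothetical consequents.
def small(v):
    return triangular(v, 0.0, 0.25, 0.75)


def large(v):
    return triangular(v, 0.25, 0.75, 1.0)


rules = [(small, small, 10.0), (large, large, 30.0)]
output_mid = infer(0.5, 0.5, rules)    # both rules fire equally
output_low = infer(0.25, 0.25, rules)  # only the "Small" rule fires
```

When both rules fire with the same strength, the output lies midway between the two consequents; when only one rule fires, the output equals that rule's consequent.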

    During the past several years, fuzzy modeling has been applied in many practical areas, ranging from finance to engineering. In particular, fuzzy control has emerged as one of the most promising areas for research in the application of fuzzy modeling. In many applications, such as control, fuzzy-based systems have proved to be superior in performance to conventional systems.

    2.4. Fuzzy neural network

    An ANN, which is employed for recognition purposes, generally lacks the ability to be developed for a given task within a reasonable time. On the other hand, fuzzy modeling [9,12], which is applied to fuse the decisions from the different variables, requires an approach to learn from experience (i.e., data collected in advance). Therefore, how to successfully combine these two approaches, ANNs and fuzzy modeling, has become a relevant concern of further studies.

    Recently, ANN learning algorithms have been applied to enhance the performance of fuzzy systems, and this has been demonstrated to be an innovative approach. Also, fuzzy if-then rules were generated and adjusted by learning methods using numerical data. Lin and Lee [13,14] proposed the so-called Neural-Network-Based Fuzzy Logic Control System (NN-FLCS). They introduced the low-level learning power of neural networks into the fuzzy logic system and provided high-level human-understandable meaning to the normal connectionist architecture. Also, Kuo [15] introduced a feedforward ANN into fuzzy inference represented by Takagi's fuzzy modeling.

    The above-mentioned FNNs are only appropriate for numerical data. However, the experts' knowledge is always of a fuzzy nature. Thus, some researchers have attempted to address this problem. Ishibuchi et al. [16,17] proposed learning methods of neural networks that utilize not only numerical data but also expert knowledge represented by fuzzy if-then rules. Lin and Lee [18] also presented an FNN capable of handling both fuzzy inputs and outputs. Meanwhile, Buckley and Hayashi [19] surveyed recent results on learning algorithms and applications for FNNs.

    3. Methodology

    Section 2 has emphasized the relevance of sales forecasting as well as some necessary background information. The proposed forecasting system is discussed in more detail in this section.

    The proposed intelligent forecasting system consists of (1) data collection, (2) general pattern model (ANN), (3) special pattern model (FNN), and (4) decision integration (ANN). Fig. 1 shows the proposed system architecture. Basically, the system first determines the product to be forecasted and the factors affecting its sales. Thereafter, the general pattern of the sales is forecasted by an ANN, while the FNN considers the effect on the sales if a promotion is conducted. Finally, the decisions from these two networks and the time effect are integrated through another ANN. Each part is thoroughly discussed in the following subsections.

    3.1. Data collection

    The first part of the proposed forecasting system, data collection, concerns two data resources: (1) quantitative factors and (2) non-quantitative factors. The quantitative factors can be accumulated from the supermarket company, while the questionnaire survey can provide the non-quantitative data.

    3.2. ANN architecture

    In this part (i.e., ANN), a feedforward ANN with the error backpropagation (EBP) learning algorithm is employed. Fig. 2 shows the general structure of the one-hidden-layer ANN. The neurons of the ANN's input layer represent the previous sales data, say period t−p to period t−1, and are connected to the hidden layer. The hidden layer is connected to the output layer, whose single neuron represents the sales for period t. Usually, the input neurons use $y = f(x) = x$ (no change in input) and all the other neurons have the sigmoidal function $y = f(x) = (1 + \exp(-x))^{-1}$. The objective is to minimize a cost function defined as:

    $$E = \frac{1}{2} \sum_p (T_p - O_p)^2, \tag{2}$$

    where $T_p$ and $O_p$ are the desired and actual outputs, respectively, and p is the sample number. The derivation is based on the gradient descent technique to converge toward improved weights and biases (thresholds). The updating rule for all the weights and biases is

    $$\Delta W_{kh}(t) = -\eta \frac{\partial E}{\partial W_{kh}} + \alpha\, \Delta W_{kh}(t-1), \tag{3}$$

    where $\eta$ is the learning rate and $\alpha$ is the momentum. Therefore, from the preceding setup and required data, the ANN can learn the relationship between the sales at period t and the previous sales from periods t−1 to t−p. Thus, to forecast the sales at period m, simply input the sales from periods m−1 to m−p.

    Fig. 1. The architecture of the proposed forecasting system.
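As a rough illustration of this setup, the sketch below builds lagged input windows and trains a one-hidden-layer network with the momentum update of Eq. (3). It is a minimal sketch, not the authors' implementation: the class and function names, the network size, the values of the learning rate and momentum, and the synthetic sine series are all assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_windows(series, p):
    """Pair the sales of periods t-p..t-1 (inputs) with the sales of period t."""
    return [(series[t - p:t], series[t]) for t in range(p, len(series))]


class OneHiddenLayerNet:
    """Identity input neurons, sigmoid hidden and output neurons (hypothetical helper)."""

    def __init__(self, n_in, n_hidden, eta=0.2, alpha=0.5, seed=0):
        rng = random.Random(seed)
        # One weight row per neuron; the last entry of each row is the bias.
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                    for _ in range(n_hidden)]
        self.w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        self.dw_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.dw_o = [0.0] * (n_hidden + 1)
        self.eta, self.alpha = eta, alpha

    def forward(self, x):
        xs = list(x) + [1.0]
        hs = [sigmoid(sum(w * v for w, v in zip(row, xs))) for row in self.w_h]
        hs.append(1.0)
        out = sigmoid(sum(w * v for w, v in zip(self.w_o, hs)))
        return xs, hs, out

    def train_sample(self, x, target):
        xs, hs, out = self.forward(x)
        # dE/dnet for E = 0.5 * (T - O)^2 with a sigmoid output neuron.
        delta_o = -(target - out) * out * (1.0 - out)
        # Hidden deltas are computed with the output weights before they change.
        delta_h = [delta_o * self.w_o[h] * hs[h] * (1.0 - hs[h])
                   for h in range(len(self.w_h))]
        for h, v in enumerate(hs):
            # Eq. (3): dW(t) = -eta * dE/dW + alpha * dW(t-1)
            self.dw_o[h] = -self.eta * delta_o * v + self.alpha * self.dw_o[h]
            self.w_o[h] += self.dw_o[h]
        for h, row in enumerate(self.w_h):
            for i, v in enumerate(xs):
                self.dw_h[h][i] = (-self.eta * delta_h[h] * v
                                   + self.alpha * self.dw_h[h][i])
                row[i] += self.dw_h[h][i]
        return 0.5 * (target - out) ** 2


# Synthetic stand-in for a sales series scaled into (0, 1); not the paper's data.
series = [0.5 + 0.3 * math.sin(0.4 * t) for t in range(120)]
samples = make_windows(series, p=4)
net = OneHiddenLayerNet(n_in=4, n_hidden=5)
first_epoch = sum(net.train_sample(x, t) for x, t in samples)
for _ in range(200):
    last_epoch = sum(net.train_sample(x, t) for x, t in samples)
```

A forecast for period m is then `net.forward(series[m - 4:m])[2]`. Scaling the sales into (0, 1) matters here because the sigmoid output neuron cannot produce values outside that range.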

    3.3. FNN architecture

    In the first part, the ANN provides the sales forecast without promotion effects. However, if a promotion is running, the ANN's forecast will be inaccurate. Thus, this section discusses how to effectively handle the circumstance of promotion by means of an FNN. Since the FNN architecture is based on fuzzy logic, which possesses both a precondition and a consequence, the precondition variables represent the effective factors while the sales represent the consequence variable. First, the data and if-then rules are obtained through the fuzzy Delphi method. After this procedure, the collected data can be applied to train the proposed FNN. The structure of the FNN presented in this study is similar to that of Ref. [17]. The main difference is that the network employs asymmetric bell-shaped fuzzy weights instead of triangular ones. In the following, the two components, FNN and fuzzy Delphi, are discussed in more detail.


    Fig. 2. The ANN structure.

    3.3.1. FNN with asymmetric bell-shaped fuzzy weights

    Most FNNs can only handle real-valued inputs and outputs, except those of Refs. [17,18]. Thus, this component intends to modify Ishibuchi's work [17]. In Ishibuchi's work, the input, weight, and output fuzzy numbers are symmetric triangular. However, this assumption does not resemble human thinking. Thus, this paper replaces the triangular fuzzy numbers with asymmetric Gaussian functions. The input-output relation of the proposed FNN is discussed in the following. However, the operations of fuzzy numbers are presented first.

    3.3.1.1. Operations of fuzzy numbers. Before describing the FNN architecture, fuzzy numbers and fuzzy number operations are defined by the extension principle. In the proposed algorithm, real numbers and fuzzy numbers are denoted by lowercase letters (e.g., a, b, ...) and by a bar placed over uppercase letters (e.g., $\bar{A}$, $\bar{B}$, ...), respectively.

    Fig. 3. The FNN architecture.

    Since input vectors, connection weights and output vectors of multi-layer feedforward neural networks are fuzzified in the proposed FNN, the addition, multiplication and nonlinear mapping of fuzzy numbers are necessary for defining the proposed FNN. Thus, they are defined as follows:

    Z z sX x qY y . . .


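    For the hidden layer,

    $$O_{ph}[\alpha] = \left[ O_{ph}[\alpha]^L,\; O_{ph}[\alpha]^U \right] = \left[ f\left(\mathrm{Net}_{ph}[\alpha]^L\right),\; f\left(\mathrm{Net}_{ph}[\alpha]^U\right) \right], \quad h = 1, 2, \ldots, n_H, \tag{8}$$

    $$\mathrm{Net}_{ph}[\alpha]^L = \sum_{\substack{i=1 \\ W_{hi}[\alpha]^L \ge 0}}^{n_I} W_{hi}[\alpha]^L \, O_{pi}[\alpha]^L + \sum_{\substack{i=1 \\ W_{hi}[\alpha]^L < 0}}^{n_I} W_{hi}[\alpha]^L \, O_{pi}[\alpha]^U + \Theta_h[\alpha]^L, \tag{9}$$

    $$\mathrm{Net}_{ph}[\alpha]^U = \sum_{\substack{i=1 \\ W_{hi}[\alpha]^U \ge 0}}^{n_I} W_{hi}[\alpha]^U \, O_{pi}[\alpha]^U + \sum_{\substack{i=1 \\ W_{hi}[\alpha]^U < 0}}^{n_I} W_{hi}[\alpha]^U \, O_{pi}[\alpha]^L + \Theta_h[\alpha]^U. \tag{10}$$

    For the output layer,

    $$O_{pk}[\alpha] = \left[ O_{pk}[\alpha]^L,\; O_{pk}[\alpha]^U \right] = \left[ f\left(\mathrm{Net}_{pk}[\alpha]^L\right),\; f\left(\mathrm{Net}_{pk}[\alpha]^U\right) \right], \quad k = 1, 2, \ldots, n_O, \tag{11}$$

    $$\mathrm{Net}_{pk}[\alpha]^L = \sum_{\substack{h=1 \\ W_{kh}[\alpha]^L \ge 0}}^{n_H} W_{kh}[\alpha]^L \, O_{ph}[\alpha]^L + \sum_{\substack{h=1 \\ W_{kh}[\alpha]^L < 0}}^{n_H} W_{kh}[\alpha]^L \, O_{ph}[\alpha]^U + \Theta_k[\alpha]^L, \tag{12}$$

    $$\mathrm{Net}_{pk}[\alpha]^U = \sum_{\substack{h=1 \\ W_{kh}[\alpha]^U \ge 0}}^{n_H} W_{kh}[\alpha]^U \, O_{ph}[\alpha]^U + \sum_{\substack{h=1 \\ W_{kh}[\alpha]^U < 0}}^{n_H} W_{kh}[\alpha]^U \, O_{ph}[\alpha]^L + \Theta_k[\alpha]^U. \tag{13}$$

    The objective is to minimize the cost function defined as:

    $$E_p = \sum_{\alpha} \sum_{k=1}^{n_O} \alpha \left( E_{k(\alpha)}^L + E_{k(\alpha)}^U \right) = \sum_{\alpha} E_{p(\alpha)}, \tag{14}$$

    where

    $$E_{p(\alpha)} = \sum_{k=1}^{n_O} \alpha \left( E_{k(\alpha)}^L + E_{k(\alpha)}^U \right), \tag{15}$$

    $$E_{k(\alpha)}^L = \frac{1}{2} \left( T_{pk}[\alpha]^L - O_{pk}[\alpha]^L \right)^2, \tag{16}$$

    $$E_{k(\alpha)}^U = \frac{1}{2} \left( T_{pk}[\alpha]^U - O_{pk}[\alpha]^U \right)^2. \tag{17}$$

    Here $E_{k(\alpha)}^L$ and $E_{k(\alpha)}^U$ can be viewed as the squared errors for the lower boundaries and the upper boundaries of the α-cut sets of a fuzzy weight. Other α-cut sets of a fuzzy weight are independently modified to reduce $E_{p(\alpha)}$. Otherwise, the fuzzy numbers after modifications are distorted. Therefore, each fuzzy weight is updated in a way similar to, but different from, the approach of Ishibuchi [17]. That is, in the proposed FNN, the membership functions are asymmetric Gaussian functions (i.e., a general shape), represented as:

    $$A(x) = \begin{cases} \exp\left( -\dfrac{1}{2} \left( \dfrac{x - m}{\sigma^L} \right)^2 \right), & x < m, \\ 1, & x = m, \\ \exp\left( -\dfrac{1}{2} \left( \dfrac{x - m}{\sigma^U} \right)^2 \right), & \text{otherwise.} \end{cases} \tag{18}$$

    Thus, the asymmetric Gaussian fuzzy weights are specified by their three parameters (i.e., center, right width and left width). The gradient search method is derived for each parameter, similar to the error-backpropagation learning algorithm.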
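The interval arithmetic behind the net-input equations (9), (10), (12) and (13) can be illustrated for a single neuron and a single α level. The sketch below is only an illustration: the function name and the numbers are hypothetical, and it assumes non-negative input intervals, which is the condition under which choosing the bound by the sign of the weight gives the correct interval product.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def interval_neuron(weights, inputs, bias):
    """Propagate alpha-cut intervals through one neuron.

    weights and inputs are lists of (lower, upper) pairs and bias is one
    (lower, upper) pair. Inputs are assumed to satisfy 0 <= lower <= upper.
    """
    net_lo, net_hi = bias
    for (w_lo, w_hi), (o_lo, o_hi) in zip(weights, inputs):
        # Lower bound: a non-negative weight pairs with the lower input bound,
        # a negative weight with the upper input bound.
        net_lo += w_lo * (o_lo if w_lo >= 0 else o_hi)
        # Upper bound: the pairing is mirrored.
        net_hi += w_hi * (o_hi if w_hi >= 0 else o_lo)
    # The sigmoid is increasing, so it maps interval bounds to interval bounds.
    return (net_lo, net_hi), (sigmoid(net_lo), sigmoid(net_hi))


# Hypothetical alpha-cut intervals for two inputs feeding one neuron.
net, out = interval_neuron(
    weights=[(-1.0, 1.0), (0.5, 2.0)],
    inputs=[(0.2, 0.5), (0.1, 0.3)],
    bias=(0.0, 0.1),
)
```

Repeating this for several α levels and for every neuron gives the fuzzy output whose lower and upper bounds enter the squared errors of Eqs. (16) and (17).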


    3.3.2. Fuzzy Delphi

    Section 3.3.1 has provided a novel FNN to learn the relationship between the fuzzy inputs and fuzzy outputs. Thus, this subsection proposes a pilot study, say fuzzy Delphi, for extracting the fuzzy inputs and outputs.

    The Delphi method was first developed by Dalkey at the Rand Corporation. This approach has been widely applied in many management areas, e.g., forecasting, public policy analysis, or project planning. However, the conventional Delphi method does not converge very well. Besides, high survey frequencies always result in high costs. Thus, Ishikawa et al. [20] utilized fuzzy set theory in the Delphi method to resolve the above shortcomings. This study modifies the method proposed by Ishikawa et al. [20], since the Gaussian function is used instead of the triangular function. In addition, the other difference is that this study puts three indexes (pessimistic, optimistic and average indexes) in the questionnaire in order to accelerate the convergence. Therefore, the procedures of the modified fuzzy Delphi method for this research are as follows.

    a. Collect all the possible factors which may affect the sales, then sort and group them in order to formulate the first questionnaire. In this questionnaire, all the factors are separate and the survey results are the FNN inputs.

    b. Select one event from each group (or dimension) to formulate an if-then rule, in order to form the second questionnaire, which is a set of if-then rules.

    c. Fuzzify the returned second questionnaires from the senior managers and determine the pessimistic index, optimistic index and average index. The formulations are as follows:

    (1) Pessimistic (minimum) index

    $$l = \frac{l_1 + l_2 + \cdots + l_n}{n} \tag{19}$$

    where $l_i$ is the pessimistic index of the ith expert and n is the number of experts.

    (2) Optimistic (maximum) index

    $$u = \frac{u_1 + u_2 + \cdots + u_n}{n} \tag{20}$$

    where $u_i$ is the optimistic index of the ith expert.

    (3) Average (most appropriate) index. For each interval $(l_i, u_i)$, calculate the midpoint $m_i = (l_i + u_i)/2$ and then find

    $$m = (m_1 \times m_2 \times \cdots \times m_n)^{1/n} \tag{21}$$

    Thereafter, the fuzzy number $A = (m, \sigma^R, \sigma^L)$, which represents the mean, right width, and left width, respectively, for an asymmetric bell-shaped function, can be determined through the above indexes:

    $$\sigma^R = \frac{l - m}{3} \tag{22}$$

    $$\sigma^L = \frac{u - m}{3} \tag{23}$$

    d. Formulate the third questionnaire with the above indexes and conduct the survey.

    e. Repeat procedure c.

    f. Employ the dissemblance index rule [21] to examine the fuzzy numbers of the second and third questionnaires. Restated, the dissemblance index rule determines whether the membership functions have converged or not (Fig. 4). If not, continue with the next questionnaire until convergence. Otherwise, the results of the third questionnaire are the FNN output. The distance between two fuzzy numbers A and B is

    $$d(A, B) = \int_{\alpha=0}^{1} d(A[\alpha], B[\alpha])\, d\alpha = \frac{1}{2(\beta_2 - \beta_1)} \int_{\alpha=0}^{1} \left( \left| A[\alpha]^L - B[\alpha]^L \right| + \left| A[\alpha]^U - B[\alpha]^U \right| \right) d\alpha \tag{24}$$

    where $\beta_1$ and $\beta_2$ are given any convenient values in order to surround both $A[\alpha = 0]$ and $B[\alpha = 0]$ (Fig. 3). Basically, $d(A, B)$ is in the interval [0, 1].

    Fig. 4. The concept of dissemblance index of two fuzzy numbers.
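Step c of the procedure can be sketched as a small aggregation function. This is an illustrative sketch rather than the authors' code: the function name, the variable names and the expert answers are hypothetical, and the two widths are taken here as positive magnitudes of the differences in Eqs. (22) and (23), which is an assumption about the sign convention.

```python
import math


def fuzzy_delphi(intervals):
    """Aggregate expert answers given as (l_i, u_i) pairs, one per expert."""
    n = len(intervals)
    l = sum(lo for lo, _ in intervals) / n           # Eq. (19): pessimistic index
    u = sum(hi for _, hi in intervals) / n           # Eq. (20): optimistic index
    midpoints = [(lo + hi) / 2.0 for lo, hi in intervals]
    m = math.prod(midpoints) ** (1.0 / n)            # Eq. (21): geometric mean
    # Assumption: widths are reported as magnitudes of the Eq. (22)-(23) terms.
    width_pessimistic = abs(l - m) / 3.0
    width_optimistic = abs(u - m) / 3.0
    return m, width_pessimistic, width_optimistic


# Two hypothetical experts; the numbers are made up for the example.
m, width_pessimistic, width_optimistic = fuzzy_delphi([(0.0, 4.0), (6.0, 10.0)])
```

Because the average index is a geometric mean of midpoints, it generally sits closer to the pessimistic index than to the optimistic one, so the two widths differ and the resulting bell shape is asymmetric.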


    Fig. 5. The time series data.

    3.4. Decision integration

    From the above two parts, the ANN provides the sales without any special promotion, while the promotion effect is forecasted by the FNN. To yield the final result, i.e., the sales that reflect both the general pattern and the promotional effect, integrating only the results from the ANN and the FNN is inadequate, because the promotional effect may differ over the promotion interval. Thus, this part employs another ANN to combine the ANN result, the FNN result, and the time effect.

    4. Model evaluation results

    Section 3 has presented the proposed forecasting system. To verify the proposed system's feasibility, a set of real data is collected. In addition, the proposed system is also compared with other methods, such as a single ANN and ARMA. Both the procedures and the results are shown sequentially in the following subsections.

    Table 1
    The third questionnaire (related products without promotion; 15-day promotion length)

    Promotion methods      Advertising media               Reference index                       Affective interval
                                                           Pessimistic   Average   Optimistic
    $10 discount           1. From 2000 to 2100 h on TV    7.2           7.8       8.6           _ to _
                           2. At noon on TV                2             2.1       3.8           _ to _
                           3. In the evening on TV         3.6           3.9       5.2           _ to _
                           4. Radio                        4             4.6       5.4           _ to _
                           5. Newspaper                    4.8           5.5       6.4           _ to _
                           6. POP notice                   6.4           7         8             _ to _
                           7. Poster                       6.2           6.5       7.4           _ to _
    $5 discount            1. From 2000 to 2100 h on TV    5.4           6         7.2           _ to _
                           2. At noon on TV                1             1.1       2             _ to _
                           3. In the evening on TV         3             3.3       4.4           _ to _
                           4. Radio                        3             3.4       4.4           _ to _
                           5. Newspaper                    3.6           4.2       5.2           _ to _
                           6. POP notice                   4.6           5         6.2           _ to _
                           7. Poster                       5             5.3       6.2           _ to _
    $3 discount            1. From 2000 to 2100 h on TV    1             1.3       2.3           _ to _
                           2. At noon on TV                0.8           1         1.8           _ to _
                           3. In the evening on TV         0.8           1         1.8           _ to _
                           4. Radio                        1.8           2         3.2           _ to _
                           5. Newspaper                    1             1.3       2.3           _ to _
                           6. POP notice                   1.8           1.9       3.3           _ to _
                           7. Poster                       2.8           2.9       4.2           _ to _
    Buy two get one free   1. From 2000 to 2100 h on TV    6.6           7.3       8.2           _ to _
                           2. At noon on TV                3.2           3.2       4.6           _ to _
                           3. In the evening on TV         3.6           3.7       5             _ to _
                           4. Radio                        4.6           5.2       6.2           _ to _
                           5. Newspaper                    4.8           5.4       6.4           _ to _
                           6. POP notice                   5.8           6.5       7.4           _ to _
                           7. Poster                       6             6.6       7.4           _ to _

    4.1. Data collection

    The data are provided by a locally well-known chain supermarket. Since the forecasting pattern is divided into two categories, general pattern and special pattern, the data collection also comprises two parts, as follows.

    4.1.1. Time series data

    The company provides the daily sales of 500 cc papaya milk. The total number of data points is 379, as shown in Fig. 5. Each sudden peak indicates that a promotion was conducted; in total, there are five promotions. The data span January 1, 1995 to January 14, 1996. For the purpose of testing, these 379 data points are further divided into training samples and testing samples. The former has 334 data points while the latter has 45 data points.
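The counts quoted above can be checked with a short calculation. The sketch below assumes one observation per calendar day and a chronological split in which the last 45 days form the test set; the function name and the placeholder series are hypothetical.

```python
from datetime import date

# One observation per day, both end dates included.
start, end = date(1995, 1, 1), date(1996, 1, 14)
n_days = (end - start).days + 1


def chronological_split(series, n_test):
    """Keep the time order: the earliest points train, the latest points test."""
    return series[:-n_test], series[-n_test:]


daily_sales = list(range(n_days))  # placeholder values, not the real sales
train, test = chronological_split(daily_sales, 45)
```

Splitting by time rather than at random keeps every test day later than every training day, which mirrors how a forecast would actually be used.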

    4.1.2. Expert questionnaire

    To survey all the possible factors of promotion and their effects on the sales, this paper employs the fuzzy Delphi method. The questionnaire setup is based on related references, such as professional journals, and on the senior managers' knowledge. The procedures based on the modified fuzzy Delphi method presented in the above section are as follows.

    Fig. 6. Four fuzzy numbers.

    4.1.2.1. Factors determination. A large number of factors can generally affect the sales. However, different products have different characteristics. After discussion with the senior managers, all the factors are divided into three dimensions. The first dimension represents the methods of promotion, while the types of advertising media are presented in the second dimension. The third dimension represents the competitors' actions.

    4.1.2.2. First questionnaire formulation. After the events have been determined in the above procedure, these events can be used to formulate the first questionnaire. The questionnaires are provided to the senior managers for survey, and the membership function for each event is calculated. Results obtained from the first questionnaire provide the degree of importance of each event and are the FNN inputs.

    Table 2
    The fuzzy number of each event for the third questionnaire

    Factors             Events                               Average (m)   σ^R = (l − m)/3   σ^L = (u − m)/3
    Promotion methods   $10 discount                         0.85          0.0310            0.0338
                        $5 discount                          0.48          0.0257            0.0378
                        $3 discount                          0.21          0.0206            0.0454
                        Buy two get one free                 0.70          0.0266            0.0369
    Advertising media   At night on TV                       0.77          0.0345            0.0387
                        At noon on TV                        0.32          0.0159            0.0523
                        In the evening on TV                 0.36          0.0223            0.0524
                        Radio                                0.44          0.0126            0.0534
                        Newspaper                            0.43          0.0213            0.0492
                        POP notice                           0.69          0.0255            0.0325
                        Poster                               0.65          0.0290            0.0365
    Promotion length    15 days                              0.54          0.0221            0.0432
                        20 days                              0.58          0.0343            0.0477
                        30 days                              0.68          0.0227            0.0413
                        45 days                              0.62          0.0245            0.0442
    Others              Related products without promotion   0.73          0.0259            0.0348
                        Related products with promotion      0.48          0.0177            0.0343

    Table 3
    The combinatorial fuzzy number for each linguistic term

    Levels      Mean (m)   Right standard deviation (σ^R)   Left standard deviation (σ^L)
    Poor        0.234188   0.039563                         0.066271
    Medium      0.446196   0.046899                         0.059435
    Good        0.637987   0.047829                         0.049004
    Very good   0.778053   0.023351                         0.029316

    4.1.2.3. Second questionnaire formulation. Since the questionnaire attempts to provide training data for the FNN, the events of all three dimensions are used to formulate the second questionnaire, which is a set of if-then rules (Table 1). In total, there are 56 (4 × 7 × 2) if-then rules. For example, the first rule of the questionnaire is

    if
    discount is $10 and advertising time is from 2000 to 2100 h on TV and no related product is under promotion
    then
    the effect on sales ranges from _ to _.

    After the survey of this questionnaire, all the required information, i.e., the pessimistic, optimistic, and average indexes, is calculated on the basis of Section 3.
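The rule count can be checked by enumerating the full factorial design. The sketch below is only illustrative: it assumes the three dimensions are promotion method, advertising medium and related-product status (the three factors in Table 2 whose event counts are 4, 7 and 2), and the variable and function names are hypothetical. It also shows the reduction to 48 rules that Section 4.3 reports after the radio and newspaper events are merged.

```python
from itertools import product

# Event lists copied from Table 2; the names of these variables are illustrative.
PROMOTION_METHODS = ["$10 discount", "$5 discount", "$3 discount", "Buy two get one free"]
ADVERTISING_MEDIA = [
    "At night on TV", "At noon on TV", "In the evening on TV",
    "Radio", "Newspaper", "POP notice", "Poster",
]
RELATED_PRODUCTS = ["Related products without promotion", "Related products with promotion"]


def build_rules(methods, media, related):
    """Return one antecedent tuple per if-then rule (every combination of events)."""
    return list(product(methods, media, related))


rules = build_rules(PROMOTION_METHODS, ADVERTISING_MEDIA, RELATED_PRODUCTS)

# Section 4.3 treats "Radio" and "Newspaper" as one event, leaving six media.
merged_media = [m for m in ADVERTISING_MEDIA if m not in ("Radio", "Newspaper")]
merged_media.append("Radio/Newspaper")
merged_rules = build_rules(PROMOTION_METHODS, merged_media, RELATED_PRODUCTS)

print(len(rules), len(merged_rules))
print(rules[0])
```

Each tuple is only the "if" part of a rule; the "then" part (the range of the effect on sales) is what the surveyed experts fill in.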

    4.1.2.4. Third questionnaire survey. Formulate the third questionnaire, which includes the pessimistic, optimistic and average indexes. Continue with the next (third) survey and calculate all the information again. Table 2 presents the fuzzy number of each event. However, directly using these fuzzy numbers is time-consuming and complicated. Therefore, all the similar events should be combined together. Finally, only four linguistic terms are used: poor, medium, good and very good. Fig. 6 and Table 3 present the fuzzy numbers of these four linguistic terms.

    Table 4
    Dissemblance index rule testing results for separate events

    Factors            Events                              d(A, B)   d = 0.06  d = 0.08
    Promotion methods  $10 discount                        0.010842  Accept    Accept
                       $5 discount                         0.012683  Accept    Accept
                       $3 discount                         0.055033  Accept    Accept
                       Buy two get one free                0.021496  Accept    Accept
    Advertising media  At night on TV                      0.008304  Accept    Accept
                       At noon on TV                       0.072394  Reject    Accept
                       In the evening on TV                0.036122  Accept    Accept
                       Radio                               0.020922  Accept    Accept
                       Newspaper                           0.039173  Accept    Accept
                       POP notice                          0.017868  Accept    Accept
                       Poster                              0.011316  Accept    Accept
    Promotion length   15 days                             0.032976  Accept    Accept
                       20 days                             0.015148  Accept    Accept
                       30 days                             0.006339  Accept    Accept
                       45 days                             0.013148  Accept    Accept
    Others             Related products without promotion  0.031801  Accept    Accept
                       Related products with promotion     0.02144   Accept    Accept


    Table 5
    The dissemblance index rule test results for combinatorial events

    Sales linguistic terms  d(A, B)   d = 0.06  d = 0.08
    Poor                    0.018939  Accept    Accept
    Medium                  0.004207  Accept    Accept
    Good                    0.013379  Accept    Accept
    Very good               0.005568  Accept    Accept

    4.1.2.5. Similarity testing. The dissemblance index rule is used to determine whether another survey is necessary. The results for the single events and for the combinatorial events are summarized in Tables 4 and 5, respectively. These results suggest that the membership function of each linguistic term has converged, so the next survey does not need to be performed. Therefore, this knowledge base is applied to train the FNN and represents the FNN outputs.
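The accept/reject columns of Table 4 follow from comparing each dissemblance index with the chosen threshold. The sketch below takes the d(A, B) values from Table 4 as given (how they are computed is defined in Section 3 and is not reproduced here) and assumes that an event is accepted when its index is strictly below the threshold; the function names are hypothetical.

```python
# d(A, B) values copied from Table 4 (separate events).
DISSEMBLANCE = {
    "$10 discount": 0.010842,
    "$5 discount": 0.012683,
    "$3 discount": 0.055033,
    "Buy two get one free": 0.021496,
    "At night on TV": 0.008304,
    "At noon on TV": 0.072394,
    "In the evening on TV": 0.036122,
    "Radio": 0.020922,
    "Newspaper": 0.039173,
    "POP notice": 0.017868,
    "Poster": 0.011316,
    "15 days": 0.032976,
    "20 days": 0.015148,
    "30 days": 0.006339,
    "45 days": 0.013148,
    "Related products without promotion": 0.031801,
    "Related products with promotion": 0.02144,
}


def similarity_test(d_ab, threshold):
    """Accept when d(A, B) is below the threshold (assumed strict inequality)."""
    return "Accept" if d_ab < threshold else "Reject"


def rejected_events(table, threshold):
    """List the events whose fuzzy numbers have not yet converged at this threshold."""
    return [event for event, d_ab in table.items()
            if similarity_test(d_ab, threshold) == "Reject"]


print(rejected_events(DISSEMBLANCE, 0.06))
print(rejected_events(DISSEMBLANCE, 0.08))
```

An empty list at the chosen threshold corresponds to the stopping condition: no further survey round is needed.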

    Table 7
    MSEs (× 10⁻³) for different FNN setups

    Momentum  Training rate
              0.1    0.3    0.5
    0.1       8.31   9.10   8.46
    0.5       8.15   8.52   8.63
    0.8       13.85  8.96   11.30

    4.2. General pattern model (ANN)

    Only 379 data points are used for training. However, in the general pattern model (ANN), the data points under promotion are not included in the training samples. Thus, after subtracting the promotion data points from the 379, 288 training samples remain. This study tests several alternatives, e.g., different network structures. For instance, pattern 1 in Table 6 means that the input nodes are the (t−1)th and (t−2)th sales, which are connected to two hidden nodes, which are in turn connected to one output node, the (t)th sales. All of these data points are normalized to [0, 1]. The training rate and momentum are set to 0.3 and 0.8, respectively. Training stops after 50,000 epochs. In addition, a statistical method, ARMA, is also used to model the above time series data. AR(3) has an MSE of 2.21 × 10⁻³, while the network whose structure is 7 × 7 × 1 has the lowest MSE, 2.20 × 10⁻³.
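The input/output layout described for pattern 1 (the two previous sales values as inputs, the current value as target) can be sketched as a sliding window over the normalized series. This is a minimal illustration, not the authors' preprocessing: the paper does not say how the data are scaled to [0, 1], so min-max scaling is an assumption, the function names are hypothetical, and the sales figures are made up.

```python
def min_max_normalize(values):
    """Scale a sequence linearly so its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def make_lagged_samples(series, n_lags):
    """Pair the n_lags previous values (most recent first) with the value at time t."""
    samples = []
    for t in range(n_lags, len(series)):
        inputs = [series[t - k] for k in range(1, n_lags + 1)]
        samples.append((inputs, series[t]))
    return samples


# Hypothetical daily sales with promotion days already removed (not data from the paper).
daily_sales = [120.0, 135.0, 150.0, 110.0, 160.0]
scaled = min_max_normalize(daily_sales)
samples = make_lagged_samples(scaled, n_lags=2)

for inputs, target in samples:
    print(inputs, "->", target)
```

Setting `n_lags` to 5 or 7 gives the input layout of the longer-memory structures in Table 6; each extra lag costs one training sample at the start of the series.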

    Table 6
    The testing alternatives for general pattern model

    Pattern number  Network structure (I × H × O)  MSE (× 10⁻³)  Percentage increase (%)
    0               AR(3)                          2.21          **
    1               2 × 2 × 1                      2.36          6.79
    2               2 × 2 × 2 × 1                  2.36          6.78
    3               5 × 5 × 1                      2.22          0.45
    4               5 × 5 × 5 × 1                  2.23          0.76
    5               7 × 7 × 1                      2.20          −0.45
    6               7 × 7 × 7 × 1                  2.20          −0.52

    Table 8
    MSE values of training results for different models

    Forecasting model       MSE (× 10⁻³)  Percentage increase of MSE
    Integration ANN         3.37          **
    ARMA(2,5)               4.88          46.13
    ANN1 (5 × 5 × 1)        4.95          48.35
    ANN2 (5 × 5 × 5 × 1)    5.50          49.80
    ANN3 (10 × 10 × 1)      4.76          44.77
    ANN4 (10 × 10 × 10 × 1) 4.77          43.10

    4.3. Special pattern model (FNN)

    Data collection showed that the questionnaire should have fifty-six if-then rules. After the dissemblance index rule is used to test the similarity, the result indicates that the event newspaper (A) and the event radio (B) are quite similar, in that d(A, B) is equal to 0.017966, i.e., smaller than the threshold d = 0.08. Thus, combining the two events reduces the total number of rules to 48 (4 × 6 × 2).

    The setup of the proposed FNN with asymmetric bell-shaped fuzzy weights is similar to that of the general ANN. The network structure consists of three input nodes, i.e., dimensions, which are connected to six hidden nodes, i.e., numbers, which are connected to one output node. Training stops after 30,000 epochs. The α-level sets are 0.2, 0.4, 0.6, 0.8, and 1.0. Different training rates and momentum terms may yield different results. The testing results (Table 7) indicate that the lowest MSE value is 8.15 × 10⁻³, obtained when the training rate and momentum are 0.1 and 0.5, respectively. Finally, this network is integrated with the general pattern model (ANN) in the next part, decision integration.
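Choosing the FNN setup amounts to picking the cell of Table 7 with the smallest MSE. The sketch below does this lookup over the reported values; it assumes, as the table header and the text suggest, that rows are momentum values and columns are training rates. The names `FNN_MSE` and `best_setup` are hypothetical.

```python
# MSE (x 10^-3) from Table 7, keyed by (momentum, training_rate).
FNN_MSE = {
    (0.1, 0.1): 8.31, (0.1, 0.3): 9.10, (0.1, 0.5): 8.46,
    (0.5, 0.1): 8.15, (0.5, 0.3): 8.52, (0.5, 0.5): 8.63,
    (0.8, 0.1): 13.85, (0.8, 0.3): 8.96, (0.8, 0.5): 11.30,
}


def best_setup(results):
    """Return (momentum, training_rate, mse) for the lowest reported MSE."""
    (momentum, rate), mse = min(results.items(), key=lambda item: item[1])
    return momentum, rate, mse


momentum, rate, mse = best_setup(FNN_MSE)
print(f"momentum={momentum}, training rate={rate}, MSE={mse}e-3")
```

In a real experiment each cell would come from a full training run of 30,000 epochs, so a coarse 3 × 3 grid like this keeps the search affordable.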

    4.4. Decision integration model (ANN)

    The above two parts provide results obtained from the general pattern and special pattern (promotion) models. This part attempts to integrate these two models. The inputs originate from the following three sources:

    a. The general pattern model provides one input node, which is the general pattern trend;

    b. The special pattern model provides five input nodes, which are taken from the FNN output fuzzy number at α equal to 0.6, 0.8 and 1.0, respectively; and

    c. The promotion length provides one input node.
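One way to read item b is that the five FNN inputs are the α-cut bounds of the output fuzzy number: a lower and an upper bound at α = 0.6 and at α = 0.8, plus the single peak value at α = 1.0. The sketch below builds the seven-element input vector under that reading. Two things are assumptions rather than statements from the paper: the asymmetric Gaussian membership function exp(−(x − μ)²/(2σ²)), with σ^L to the left of the mean and σ^R to the right, and the ordering of the inputs. The function names and the example values for the general forecast and promotion length are hypothetical.

```python
import math


def alpha_cut(mean, sigma_left, sigma_right, alpha):
    """Return (lower, upper) bounds of the alpha-cut of an asymmetric Gaussian fuzzy number."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    # Solve exp(-(x - mean)^2 / (2 * sigma^2)) = alpha for x on each side of the mean.
    spread = math.sqrt(-2.0 * math.log(alpha))
    return mean - sigma_left * spread, mean + sigma_right * spread


def integration_inputs(general_forecast, fuzzy_output, promotion_length):
    """Assemble 1 + 5 + 1 = 7 inputs for the integration network (assumed ordering)."""
    mean, sigma_left, sigma_right = fuzzy_output
    lo6, hi6 = alpha_cut(mean, sigma_left, sigma_right, 0.6)
    lo8, hi8 = alpha_cut(mean, sigma_left, sigma_right, 0.8)
    peak, _ = alpha_cut(mean, sigma_left, sigma_right, 1.0)  # both bounds equal the mean
    return [general_forecast, lo6, lo8, peak, hi8, hi6, promotion_length]


# "Good" from Table 3 as (mean, left sigma, right sigma); the other two values are made up.
good = (0.637987, 0.049004, 0.047829)
x = integration_inputs(general_forecast=0.42, fuzzy_output=good, promotion_length=0.5)
print(len(x), x)
```

At α = 1.0 the cut collapses to a single point, which is why three α levels yield five values rather than six.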

    Therefore, the total number of integration ANN input nodes is seven. Consequently, the network structure is 7 × 7 × 1. The training rate and momentum are 0.3 and 0.8, respectively. The training limit is 50,000 epochs. Table 8 presents the MSE value, which is equal to 3.37 × 10⁻³. For the purpose of comparison, this paper also uses the collected data to formulate the ARMA model and ANN models. Table 8 summarizes those results.

    Table 9
    The forecasting results

    Forecasting model       MSE (× 10⁻³)  Percentage increase of MSE
    Integration ANN         3.10          **
    Short-term memory ANN   6.61          113.63
    Long-term memory ANN    6.78          118.99

    Fig. 7. The integration ANN forecasting result.

    So far, the paper has presented the training results. In the following, three models, i.e., the integration ANN, the short-term memory ANN (5 × 5 × 1), and the long-term memory ANN (10 × 10 × 1), are used to forecast 45 data points that include one promotion. These are the last forty-five data points shown in Fig. 5. The related results are listed in Table 9 and Figs. 7–9.

    Fig. 8. The short-term memory ANN forecasting result.

    Fig. 9. The long-term memory ANN forecasting result.
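Tables 6, 8 and 9 report a "percentage increase of MSE" without stating the formula. The sketch below assumes it is the relative difference from the baseline row (the one marked **), and also shows the usual definition of MSE; both function names are hypothetical. With the rounded MSEs printed in the tables this reproduces the 6.79% of pattern 1 in Table 6 and gives roughly 113.2% for the short-term memory ANN in Table 9, where the table reports 113.63. Such gaps may simply reflect that the published percentages were computed from unrounded MSEs.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def percentage_increase(mse, baseline_mse):
    """Relative change of an MSE against a baseline MSE, in percent (assumed definition)."""
    return (mse - baseline_mse) / baseline_mse * 100.0


# Table 6: pattern 1 (2 x 2 x 1) against the AR(3) baseline.
print(round(percentage_increase(2.36, 2.21), 2))
# Table 9: short-term memory ANN against the integration ANN.
print(round(percentage_increase(6.61, 3.10), 2))
```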


    5. Discussion

    Section 4 has presented evaluation results based on data accumulated from a chain supermarket company. The data collection appears to be subjective, since the data are provided by either the senior managers or experts. However, the number of experts is twenty, so the influence of subjective factors is kept to a minimum. In particular, because the fuzzy Delphi method is employed, this concern is further reduced. Among all types of promotion methods, the event $10 discount is the most effective, while the worst event is $3 discount. Regarding the types of advertising media, the event advertising time from 2000 to 2100 h on TV affects sales the most. This is unsurprising, since 2000 to 2100 h is the so-called golden interval. The promotion of a related product yields a negative effect.

    Notably, the pessimistic, optimistic, and average indexes are put in the third questionnaire. The reason is that providing the other experts' opinions may accelerate the convergence of the fuzzy numbers. In the third questionnaire, 13 fuzzy numbers are combined into four fuzzy numbers. The main reason is to reduce the computational complexity; the forecast precision may not differ significantly. Moreover, the third questionnaire also used the dissemblance index rule to determine the necessity of the next survey. Table 5 indicates that whether d is equal to 0.06 or 0.08, all the paired fuzzy numbers are similar. Even for the separate events (Table 4), only the event advertising at noon on TV is not similar when d is 0.06. Therefore, using these data (or fuzzy numbers) to train the FNN is obviously subjective. Basically, this is a pilot study to indicate the sources of the fuzzy inputs and fuzzy outputs. Both Refs. [18,17], i.e., two representative papers considering fuzzy inputs and outputs, did not provide a means to find the fuzzy inputs and outputs for training.

    In the second part of the proposed system, six alternatives, from short-term to long-term memories, are tested. The MSE values indicate that the network with long-term memory (seven) outperforms the networks with short-term memory (two or five). However, continuously increasing the length of memory does not yield a better result. The reason is that unnecessary information may mislead the network's memory. Regarding the number of hidden layers, the results indicate that either one or two hidden layers can provide a similar forecast. Owing to computational considerations, one hidden layer is more feasible. This is also the reason for selecting the 7 × 7 × 1 network instead of the 7 × 7 × 7 × 1 network to integrate with the FNN.

    Table 8 indicates that the integration ANN outperforms all other forecasting methods, e.g., ARMA(2, 5) and the single ANN. The reason is that the integration ANN takes the promotion effect on the sales pattern into account. In terms of the forecasting results, the integration ANN is also second to none. Interestingly, the short-term and long-term ANNs always have a one-day delay. This is because this kind of ANN depends only on the previous data, whereas the integration ANN includes the promotion effect in the network.

    6. Conclusions

    This paper has developed a forecasting system based on FNN to solve the sales forecasting problem under promotion. Though directly using a single ANN to model the sales pattern has been shown to be better than the conventional statistical methods, it still needs further improvement. Integrating the ANN and the FNN can provide a more reliable forecast. In addition, the fuzzy Delphi method was applied to collect the fuzzy inputs and outputs for the FNN. To our knowledge, this is the first paper to consider this existing but unresolved problem. In the future, the authors would like to further improve the FNN, for example by pruning the fuzzy weights and increasing the training speed of the network. From a marketing perspective, more factors, which may yield a more precise result, can be included.

    Acknowledgements

    The authors would like to thank the National Science Council, Republic of China, for partially supporting this manuscript under Contract No. NSC 86-2416-H-327-003-E10. The authors also thank Mr. L.C. Shie for providing the daily sales data and for his valuable discussion regarding chain store promotion. In addition, the authors thank the anonymous referees for reading the paper and offering many helpful comments.

    References

    [1] K. Chakraborty, K. Mehrotra, C.K. Mohan, Forecasting the behavior of multivariate time series using neural networks, Neural Networks 5 (1992) 961–970.
    [2] G. Lachtermacher, J.D. Fuller, Backpropagation in time-series forecasting, J. Forecasting 14 (1995) 381–393.
    [3] G.S. LeVee, The key to understanding the forecasting process, J. Business Forecasting 11 (4) (1992) 12–16.
    [4] G.G. Meyer, Marketing research and sales forecasting at Schlegel Corporation, J. Business Forecasting 12 (2) (1993) 22–23.
    [5] C.W. Chase, Ways to improve sales forecasts, J. Business Forecasting 12 (3) (1993) 15–17.
    [6] M.M. Florance, M.S. Sawicz, Positioning sales forecasting for better results, J. Business Forecasting 12 (4) (1993) 27–28.
    [7] A.S. Weigen, D.E. Rumelhart, B.A. Huberman, Generalization by weight-elimination with application to forecasting, Adv. Neural Inf. Processing Syst. 3 (1991) 875–882.
    [8] Z. Tang, C. Almeida, P.A. Fishwick, Times series forecasting using neural networks vs. Box-Jenkins methodology, Simulations, Simulations Councils, Nov., 1991, pp. 303–310.
    [9] L.A. Zadeh, The concept of a linguistic variable and its application to approximate reasoning: Parts 1–3, Inf. Sci. 8 (1965).
    [10] M. Black, Vagueness: An Exercise in Logical Analysis, Philosophy of Science 4, pp. 427–455, 1937.
    [11] T. Takagi, M. Sugeno, Derivation of Fuzzy Control Rules from Human Operator's Control Actions, Proc. of the IFAC Symposium on Fuzzy Information, Knowledge Representation and Decision Analysis, Marseilles, France, July 1983, pp. 55–60.
    [12] C.C. Lee, Fuzzy logic in control systems: fuzzy logic controller, Parts I and II, IEEE Trans. Syst. Man, Cybernetics SMC 20 (2) (1990) 404–435.
    [13] C.T. Lin, C.S.G. Lee, Neural-network-based fuzzy logic control and decision system, IEEE Trans. Comput. C 40 (12) (1991) 1320–1336.
    [14] C.T. Lin, C.S.G. Lee, Reinforcement structure/parameter learning for neural network based fuzzy logic control systems, IEEE Trans. Fuzzy Syst. 2 (1) (1994) 46–63.
    [15] R.J. Kuo, P.H. Cohen, Manufacturing process control through integration of artificial neural networks and fuzzy model, Fuzzy Sets and Systems 98 (1) (1998) 15–31.
    [16] H. Ishibuchi, H. Okada, R. Fujioka, H. Tanaka, Neural networks that learn from fuzzy if-then rules, IEEE Trans. Fuzzy Syst. FS 1 (2) (1993) 85–97.
    [17] H. Ishibuchi, K. Kwon, H. Tanaka, A learning algorithm of fuzzy neural networks with triangular fuzzy weights, Fuzzy Sets Syst. 71 (1995) 277–293.
    [18] C.T. Lin, C.S.G. Lee, Fuzzy adaptive learning control network with on-line neural learning, Fuzzy Sets Syst. 71 (1995) 25–45.
    [19] J.J. Buckley, Y. Hayashi, Fuzzy neural networks: a survey, Fuzzy Sets Syst. 66 (1994) 1–13.
    [20] A. Ishikawa, M. Amagasa, G. Tomiqawa, R. Tatsuta, H. Mieno, The max-min Delphi method and fuzzy Delphi method via fuzzy integration, Fuzzy Sets Syst. 55 (1993) 241–253.
    [21] A. Kaufmann, M.M. Gupta, Introduction to Fuzzy Arithmetic, North-Holland, Amsterdam, 1985.

    R.J. Kuo received an MS degree in Industrial and Manufacturing Systems Engineering from the Iowa State University, Ames, IA, in 1990 and a Ph.D. degree in Industrial and Management Systems Engineering from the Pennsylvania State University, University Park, PA, in 1994. Currently, he is an Associate Professor in the Department of Marketing and Distribution Management, National Institute of Technology at Kaohsiung, Taiwan. His research interests include architecture issues of neural networks, fuzzy logic, and genetic algorithm, and their applications in process control, forecasting and marketing.

    K.C. Xue earned his MS degree in Management Science from the I-Shou University, Taiwan, in 1996. His research interests include neural networks, fuzzy logic, and their applications in marketing. Currently, he is in the military service.