Hydrological Sciences-Journal-des Sciences Hydrologiques, 46(2) April 2001 255

Genetic algorithms for the classification and prediction of precipitation occurrence

ZEKAİ ŞEN Hydraulics Division, Istanbul Technical University, Civil Engineering Faculty, Maslak 80626, Istanbul, Turkey e-mail: [email protected]

AHMET ÖZTOPAL Meteorology Department, Istanbul Technical University, Maslak 80626, Istanbul, Turkey

Abstract Using an approach similar to the biological processes of natural selection and evolution, the genetic algorithm (GA) is a nonconventional optimum search technique. Genetic algorithms have the ability to search large and complex decision spaces and handle nonconvexities. In this paper, the GA is applied for solving the optimum classification of rainy and non-rainy day occurrences based on vertical velocity, dew-point depression, temperature and humidity data. The problem involves finding optimum classification based on known data, training the future prediction system and then making reliable predictions for rainfall occurrences which have significance in agricultural, transportation, water resources and tourism activities. Various statistical approaches require restrictive assumptions such as stationarity, homogeneity and normal probability distribution of the hydrological variables concerned. The GAs do not require any of these assumptions in their applications. The GA approach for the occurrence classifications and predictions is presented in steps and then the application of the methodology is shown for precipitation occurrence (non-occurrence) data. It has been shown that GAs give better results than classical approaches such as discriminant analysis. The application of the methodology is presented independently for the precipitation event occurrences and forecasting at the Lake Van station in eastern Turkey. Finally, the amounts of precipitation are predicted with a model similar to a third order Markov model whose parameters are estimated by the GA technique.

Key words genetic algorithms; precipitation; prediction

Algorithmes génétiques appliqués à la classification et la prédiction de l'occurrence de précipitation

Résumé Basé sur une approche analogue aux processus biologiques de sélection naturelle et d'évolution, l'algorithme génétique (AG) est une technique non conventionnelle de recherche d'optimum. Les algorithmes génétiques permettent d'appréhender des espaces de décision vastes et complexes ainsi que des situations de non-convexité. Dans cet article, l'AG est appliqué à une classification optimale des occurrences de jours avec et sans pluie, à partir de données de vitesse verticale, de point de rosée, de température et d'humidité. Le problème comprend l'identification de la classification optimale à partir de données connues, l'entraînement du système pour la prédiction et enfin l'utilisation en prédiction de l'occurrence de pluie pour les usages agricoles, de transport, de gestion de la ressource et de tourisme. Plusieurs approches statistiques nécessitent des hypothèses restrictives comme la stationnarité, l'homogénéité et une distribution normale des variables hydrologiques concernées. Les AG ne nécessitent aucune de ces hypothèses dans leur mise en œuvre. L'approche AG pour des classifications et des prédictions d'occurrences est présentée en détail et la méthodologie est appliquée à des données d'occurrence (non-occurrence) de précipitation. Nous avons montré que les AG donnent de meilleurs résultats que les approches classiques comme l'analyse discriminante. La méthodologie est mise en œuvre au niveau de la station du lac de Van dans l'Est de la Turquie de manière indépendante pour les occurrences des pluies et pour les prédictions. Finalement, les hauteurs précipitées sont prédites avec un modèle analogue à un modèle de Markov du troisième ordre dont les paramètres sont estimés avec la technique d'AG.

Mots clefs algorithmes génétiques; précipitation; prédiction

Open for discussion until 1 October 2001


INTRODUCTION

Classifications of hydrological events such as rainfall and runoff, as well as of extreme events such as droughts and floods, are very important in many areas of human activity, including agriculture, transportation, sports and water resources. For instance, in agriculture it may be necessary to classify whether the coming days or months will be rainy or non-rainy. Data of this type are qualitative (verbal) in nature and are therefore not easily assessed with many quantitative approaches. However, they can be described, forecast and classified quantitatively by using probability theory. Classifying droughts, floods, earthquakes, avalanches, fog, hail and many other natural events as occurring or nonoccurring in the future, rather than estimating their quantitative amounts, is among the most frequently needed types of information. For example, the occurrence of fog, 24-h precipitation in excess of 25 mm, tornados, convective storms, or Atlantic hurricanes in excess of some critical threshold might cause significant hazards in many parts of the world and result in possible loss of human lives (Tremant, 1987).

There exist many statistical methods for performing classifications. Some well-known methods are regression (Draper & Smith, 1981), discriminant analysis (McLachlan, 1992), classification and regression trees (Burrows, 1991) and generalized adaptive models (Vislocky & Fritsch, 1995). Some of these methods require theoretical formulations including derivatives and do not guarantee that globally optimal solutions will be reached.

Unlike traditional methods, the genetic algorithm (GA) uses the objective itself, not the derivative information. The inherent random property of the GA helps it avoid local optima. According to Adeli & Hung (1995), there are five components of GAs, namely, encoding, initialization of the population, fitness evaluation, evolution performance and working parameters. These steps are explained in the following section.

In the domains of hydrology and water resources, GAs have been used to calibrate rainfall-runoff models by Wang (1991), Franchini (1996) and Franchini & Galeati (1997). In groundwater hydrology, McKinney & Lin (1992, 1994) employed GA methodology to optimize well pumping tests in addition to some other groundwater management problems. Ritzel et al. (1994) presented a solution technique using GAs for groundwater pollution problems. Pipe network optimization by Simpson et al. (1994) and groundwater monitoring problems by Cieniawski et al. (1995) have adopted GAs as solution procedures. Recently, river level forecasting applications of GAs have been explained by See & Openshaw (1999). Large and complex multidimensional decision spaces can be searched efficiently by GAs, which handle the nonconvexities that cause difficulties for traditional optimization methods. A classification forecasting procedure based on GAs is presented herein and then implemented for rainfall occurrence events.

GENETIC ALGORITHMS

Genetic algorithms are search algorithms based on the mechanics of natural selection and genetics. They combine the concept of survival of the fittest among string structures with a structured yet randomized information exchange to form a search algorithm with some of the innovative flair of human search (Goldberg, 1989). In every generation, a new set of artificial creatures in the form of strings is created using bits and pieces of the fittest of the old. An occasional new part is tried for good measure. While randomized, GAs are not a simple random walk. They efficiently exploit historical information to speculate on a new search point with expected improvement in performance. Genetic algorithms differ from traditional search techniques in several ways (Buckles & Petry, 1994):

(a) GAs optimize the trade-off between exploring new points in the search space and exploiting the information discovered so far.

(b) GAs have the property of implicit parallelism. This means that their effect is equivalent to an extensive search of hyperplanes of the given space, without directly testing all of the hyperplane values.

(c) GAs are randomized algorithms, in that they use operators whose results are governed by probability. The results of such operations are based on the value of a random number.

(d) GAs operate on several solutions simultaneously, gathering information from current search points to direct subsequent search. Their ability to maintain multiple solutions concurrently makes GAs less susceptible to the problem of local maxima and noise.

The GA technique starts by generating initial populations of individuals that are stored as binary strings in computer memory. Subsequently, each population is evolved with the use of selection, crossover and mutation principles similar to the natural biological process of evolution. Holland (1975) described the idea of selection and variation in order to evolve solutions to problems, as summarized in Fig. 1. Simply, each row in the population consists of a string of binary digits (bits). As in biological systems, the string of bits is referred to as the genotype. Furthermore, each individual consists only of its genetic material, which is organized into one "chromosome". Each bit position assumes a value of either 1 or 0, representing a single gene. The term "bit string" refers to both genotypes and the individuals that they define. There is a variety of techniques for mapping bit strings to different problem domains.

The initial population of individuals is usually generated randomly and then each individual is tested empirically in an "environment" and is assigned a numerical evaluation of its merit by a fitness function, F. In this paper, the environment is the precipitation classification with a set of hydrological variables. The fitness function returns either a single number or a vector of numbers. It determines how each gene (bit) of an individual will be interpreted and thus what specific problem the population will evolve to solve.

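Fig. 1 Genetic algorithm procedure. [Flow diagram: an initial population of bit strings 00111, 11100 and 01010, with fitness values F(00111) = 0.1, F(11100) = 0.9 and F(01010) = 0.5, passes through selection (giving 11100, 11100 and 01010), crossover and mutation; the strings 01100, 11010 and 01100 appear as the updated population.]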


Once all the individuals in the population have been evaluated, their fitnesses are used as the basis for selection. Selection is implemented by eliminating low-fitness individuals from the population and inheritance is implemented by making multiple copies of high-fitness individuals. These inherited chromosomes are treated later by further genetic operations, such as mutation (flipping individual bits) and crossover (exchanging substrings of two individuals to obtain two offspring), which are applied probabilistically to the selected individuals to produce a new population of individuals.

The term crossover is used here to refer to the exchange of homologous strings between individuals, although the biological term "crossing over" generally implies exchange within an individual. New generations can be produced either synchronously, which leads to the complete replacement of the old generation, or asynchronously, so that generations overlap. Additionally, in the GA technique, the crossover operator mixes and matches attributes between two chromosomes through random processes. Two random selections are applied successively, namely, the random selection of two chromosomes from the chromosome population and then that of an arbitrary gene position within the chromosomes (referred to as the crossover point). Accordingly, the genes following the crossover point are swapped between the two chromosomes, resulting in two offspring. Finally, the mutation operation is applied occasionally in order to alter some gene values in a chromosome. Every gene within a chromosome has an equal chance of mutation, depending on the mutation probability, which is usually kept at a low value to avoid losing a large number of good chromosomes.

By updating the previous set of good individuals, the operators generate a new set of individuals that have a better than average chance of also being good. When the cycle of evaluation, selection and genetic operations is iterated for many generations, the overall fitness of the population generally improves, and the individuals in the population represent improved "solutions" to whatever problem is posed in the fitness function. Selection can arbitrarily eliminate the least fit 50% of the population and make one copy of all the remaining individuals; it could also replicate individuals in direct proportion to their fitness or scale the fitness, in any of several ways and replicate individuals in direct proportion to their scaled values. Likewise, the crossover operator can pass on both offspring to the new generation or it can arbitrarily choose one to be passed on. The number of crossovers can be restricted to one per pair, two per pair, or N per pair. These and other variations of the basic algorithm have been discussed extensively by Goldberg (1989), Davis (1991), Grefenstette (1985, 1987), Schaffer (1989) and Belew & Booker (1991).
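The cycle of evaluation, selection, crossover and mutation described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the fitness function (the fraction of 1-bits in a string) is a hypothetical stand-in for a real problem, and the population size, string length and number of generations are arbitrary choices. The crossover and mutation probabilities default to 0.80 and 0.01, which are common textbook values.

```python
import random


def fitness(bits):
    """Hypothetical fitness: fraction of 1-bits (a stand-in, not from the paper)."""
    return sum(bits) / len(bits)


def select(population, rng):
    """Pick one individual with probability proportional to its fitness."""
    # A tiny offset keeps the weights valid even if every fitness is zero.
    weights = [fitness(ind) + 1e-9 for ind in population]
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(p1, p2, rng):
    """One-point crossover: swap the genes that follow a random crossover point."""
    point = rng.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def mutate(bits, p_mut, rng):
    """Flip each gene independently with probability p_mut."""
    return [1 - b if rng.random() < p_mut else b for b in bits]


def evolve(pop_size=20, n_bits=16, generations=40,
           p_cross=0.80, p_mut=0.01, seed=0):
    """Run the synchronous GA cycle and return the fittest final individual."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        new_population = []
        while len(new_population) < pop_size:
            a, b = select(population, rng), select(population, rng)
            if rng.random() < p_cross:
                a, b = crossover(a, b, rng)
            new_population.extend([mutate(a, p_mut, rng),
                                   mutate(b, p_mut, rng)])
        population = new_population[:pop_size]  # complete replacement
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

The sketch uses synchronous replacement, so each generation fully replaces the previous one, and it passes both offspring of every crossover to the new generation. The other variants mentioned above (overlapping generations, truncation of the least fit half, fitness scaling) would change only `select` and the replacement step.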

PRECIPITATION OCCURRENCE CLASSIFICATION

The classification of any natural event as occurring or nonoccurring is very significant in many practical applications in hydrology, agriculture and meteorology. Most often, statistical methods such as regression and discriminant analysis are used for classification purposes. Statistical methods provide explicit forecasts of event occurrence classification. The basis of the statistical classification methods is that an implicit relationship is valid between the occurrence of the event and measured forecast variables. Again, in precipitation occurrence classification, meteorological variables such as the dew-point depression and vertical velocity constitute sufficient sources of information. Statistical methods do not provide understanding of the underlying physical mechanism of the phenomenon, but they provide an empirical basis to establish correlations using available field measurements and observations. They require some restrictive assumptions such as stationarity, homogeneity, and a normal probability distribution of the random variables concerned. Genetic algorithms, on the other hand, do not require assumptions concerning the statistical distribution of the random variables and help to find the optimum solution with minimum error value.

The general presentation of the event prediction problem can be exemplified with the scatter of points on a Cartesian coordinate system (Fig. 2). Let the classification of any event, X, be related to two other events, Y and Z. In this case, the variable X to be classified has two alternatives, falling into class 1, C1, or class 2, C2, depending on the values of Y and Z. There are two phases for effective classification. During the first phase, known classes are used for training the GA procedure. The second phase classifies the event into either C1 or C2 without training. This second phase constitutes the classification prediction procedure. During the training phase the following steps are used:

(a) A portion of the data, Y vs Z, is plotted and a scatter diagram (Fig. 2(a)) obtained. This is referred to as the "training scatter diagram". If the purpose is to find a relationship between these two variables then it is sufficient to fit a curve by the least squares technique through this training scatter diagram. This corresponds to the classical regression technique.

(b) It is possible to label each one of the points in the training scatter diagram according to the occurrence or nonoccurrence of the variable, X. Hence, the training scatter diagram is grouped into two classes, as in Fig. 2(b), where occurrences are shown with squares and non-occurrences with stars.

(c) A boundary (linear or nonlinear) should be found that separates these two classes on the Cartesian coordinate plane. It is not necessary that the discriminating function be linear with respect to both parameters. A nonlinear form of relationship may result in a classification with fewer miscalculations. Nonlinear functions would perhaps lead to improvement with respect to linear discrimination without requiring the iterative solution of the GA and thus avoiding problems of convergence, local minima and computer time.

Fig. 2 Scatter diagram with (a) regression line; and (b) discriminant line. [Two panels; horizontal axis Z, ranging from 0 to 20.]

This boundary separates two classes such that the total number of misclassifications is at a minimum. Such problems can be solved by multiple linear regression which is called the "discriminant function". This function is expected to discriminate between two classes most successfully. Discriminant analysis was conceived by Fisher (1936) and first brought into the literature by Barnard (1935). In the second phase of the GA classification, unclassified data (values of Y and Z) are classified in the light of the training phase, during which the discriminating boundary between C1 and C2 is determined.
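The quantity being minimized, the number of points falling on the wrong side of a linear boundary, can be made concrete with a short Python sketch. It assumes a boundary of the form Y = a + bZ and, as a labelled assumption rather than something stated in the text, that the event is predicted to occur when a point lies above the line. The function name, the sample points and the coefficients are all hypothetical.

```python
def misclassifications(a, b, points):
    """Count points on the wrong side of the line Y = a + b*Z.

    points: iterable of (y, z, occurred) tuples.
    Assumption: the event is predicted to occur when Y lies above the line.
    """
    errors = 0
    for y, z, occurred in points:
        predicted = y > a + b * z
        if predicted != occurred:
            errors += 1
    return errors


# Made-up training points (y, z, occurred), for illustration only.
sample = [
    (50.0, 5.0, True),
    (10.0, 5.0, False),
    (20.0, 2.0, True),
    (45.0, 12.0, False),
]

errors = misclassifications(30.0, 1.0, sample)
print(errors, errors / len(sample))
```

With these invented points the line Y = 30 + Z misplaces the last two points, so the error fraction is 2/4. A GA or any other search would vary a and b to reduce this count.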

APPLICATION

The application of the genetic algorithm for classification is presented for two independent sets of data. The first data set is taken from the literature and relates to precipitation occurrences during a 24-h period at Albany, New York (Panofsky & Brier, 1968). Precipitation occurrences are sought in relation to measures of vertical velocity (mm s⁻¹), Y, and the dew-point depression (°F), Z. Vertical velocity is important for the uplift of moist air, which feeds the clouds and leads to higher precipitation generation. It is possible to rely on the vertical velocity measurement prior to the rainfall occurrences. Furthermore, there may be other parameters that can be considered to classify the precipitation occurrence/nonoccurrence in addition to the vertical velocity and dew-point depression. Among such parameters are evaporation rates, humidity, temperature, etc. The sample consists of 91 days, of which 29 days have rainfall occurrences. A scatter diagram of these data is shown in Fig. 3. For finding the best linear discriminant line by means of the GA method, the following steps are executed in sequence:

(a) Two sequences of random numbers are generated on a digital computer. Let these be denoted by $a_1, a_2, \ldots, a_n$ for the intercept of the possible discriminant straight line and $b_1, b_2, \ldots, b_n$ for the slope of this line. Since each intercept can be combined with each slope, $n^2$ corresponds to the population size in GA terms.

(b) For each one of these combinations, say $i$, $a_i$ and $b_i$ are coded into a binary system, each with a specified string length, $m$. Placing the two parameters in a single string of length $2m$ constitutes the basic chromosome for the problem at hand. During the whole genetic operation, it must be kept in mind that the first half of any chromosome represents the intercept value and the second half the slope.

(c) The relative importance of each chromosome in the population is evaluated through an error evaluation function. Similar percent classification errors are used for the discriminant and GA approaches, as stated below. Completion of these three steps constitutes the initial preparation stage of the GA procedure.

(d) The chromosomes in the initial population are now ready for the genetic operations in sequence: selection (or reproduction), crossover and mutation. The selection is achieved through a "roulette wheel" whose sectors are sized according to the errors: the smaller the error, the greater the share of the chromosome on the roulette wheel. The roulette wheel is metaphorical and the actual decisions are based on pseudorandom numbers from a computer. The purpose of this procedure is to eliminate the worst chromosomes and to regenerate better substitutes. Crossover has the purpose of further improving chromosomes. Mutation, in turn, makes it possible to reach good chromosomes outside the neighbourhood of the locally better ones. Hence, new and updated populations are obtained.

(e) The overall error of the most recent population is calculated and if it is found to be close to the previous error value, within the ±5% limit adopted in this paper, then the GA procedure is terminated; otherwise the population undergoes the same operations again from step (d) onwards.

Finally, coupled sequences of $a$s and $b$s are obtained in such a manner that the error term cannot be reduced by further application of the GA procedure. In light of the above explanations, for the application of GAs to the data presented in Fig. 3, $n^2 = 100$ chromosomes are considered ($n = 10$) with $m = 16$ genes for each parameter. According to step (b), each chromosome in the population of 100 therefore has 32 genes. In this study, typical values for the probabilities of crossover and bit mutation are adopted as 0.80 and 0.01, respectively.
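Steps (b) and (d) can be illustrated with a short Python sketch of the chromosome layout and the roulette wheel. Several details are assumptions made only for illustration, since the text does not specify them: the linear scaling of each parameter onto a 16-bit integer, the parameter ranges `A_RANGE` and `B_RANGE`, the sector weight 1/(error + small constant), and all function names.

```python
import random

M = 16  # genes per parameter, as adopted in the text

# Hypothetical search ranges for the intercept and slope (not from the paper).
A_RANGE = (0.0, 100.0)
B_RANGE = (0.0, 5.0)


def encode(value, lo, hi, m=M):
    """Map a real value in [lo, hi] to an m-bit string (assumed linear scaling)."""
    level = round((value - lo) / (hi - lo) * (2**m - 1))
    return format(level, f"0{m}b")


def decode(bits, lo, hi):
    """Inverse of encode: map a bit string back to a real value in [lo, hi]."""
    return lo + int(bits, 2) / (2**len(bits) - 1) * (hi - lo)


def make_chromosome(a, b):
    """First half of the chromosome holds the intercept, second half the slope."""
    return encode(a, *A_RANGE) + encode(b, *B_RANGE)


def split_chromosome(chrom, m=M):
    """Recover (intercept, slope) from a 2m-gene chromosome."""
    return decode(chrom[:m], *A_RANGE), decode(chrom[m:], *B_RANGE)


def roulette_select(chromosomes, errors, rng):
    """Roulette wheel in which a smaller error earns a larger sector.

    The weight 1/(error + eps) is an assumed choice; any decreasing
    function of the error would follow the same principle.
    """
    weights = [1.0 / (e + 1e-6) for e in errors]
    return rng.choices(chromosomes, weights=weights, k=1)[0]


chrom = make_chromosome(30.0, 1.0)
a, b = split_chromosome(chrom)
print(chrom, len(chrom), round(a, 3), round(b, 3))
```

With 16 genes per parameter the decoded values differ from the originals by at most one quantization step, (hi - lo)/65535, which is the resolution this encoding offers over the assumed ranges.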

After the execution of the above steps, the discriminant line is obtained by the GA technique as (Öztopal, 1998):

$$Y = 27.83 + 1.12Z \qquad (1)$$

However, the application of the statistical linear discriminant analysis leads to (Panofsky & Brier, 1968):

$$Y = 37.53 + 0.75Z \qquad (2)$$

Fig. 3 Precipitation occurrence and discriminant line (after Panofsky & Brier, 1968). [Scatter diagram; horizontal axis Z (°F).]

In Fig. 3, the solid line is that suggested by Panofsky & Brier (1968), the dashed line is that given by the GA and the dotted line is hand drawn. The comparison of the solid and dashed lines and their calculation principles leads to the following conclusions:

(a) The GA solution has a steeper slope than the statistical discriminant line, which implies that as the dew-point depression Z approaches zero, comparatively more precipitation occurrences take place.

(b) The statistical discriminant line separates rainy and non-rainy occurrences with 15 misclassifications. Hence, this technique yields an error of 15/91 = 0.16, which means that 16% of the points are mispredicted. However, the GA line gives only 12 misclassifications, an error level of 12/91 = 0.13. Hence, the GA clearly has a smaller error than the discriminant approach.

(c) The GA procedure provides an adaptive approach by decreasing the number of mislocations, i.e. errors, with iteration number, as shown in Fig. 4. However, the statistical discriminant analysis does not give way to successive error reduction but treats the whole data set globally.

Fig. 4 Number of misplacements reduction with iteration number. [Plot of the number of misplacements against iteration number, from 0 to 40.]

The same data set is now considered in two parts. The first part includes 45 randomly chosen data as a training set for establishing discriminant and GA lines. These lines are shown together with the training data set in Fig. 5. Even in the training data set, the GA approach has five misclassified points, whereas the discriminant line shows seven misclassifications.

Fig. 5 Training set and model lines. [Scatter diagram; points marked as no precipitation or precipitation.]

Now, the remaining 46 data points are classified as predictions on the basis of training data model lines in Fig. 5. The results are shown collectively in Fig. 6. Here, again the GA procedure indicates fewer misclassification points than the classical discriminant analysis.

Fig. 6 Prediction of rainfall occurrences. [Scatter diagram; horizontal axis Z (°F); points marked as no precipitation or precipitation.]

Fig. 7 Precipitation occurrence and model lines. [Scatter diagram; points marked as precipitation < 10 mm or precipitation > 10 mm, with the discriminant line shown.]

In order to investigate the performance of the GA technique, a second data set is taken, consisting of monthly temperature, humidity and precipitation records from 1940 to 1993 at the Lake Van meteorological station in eastern Turkey. Hence, there are 648 simultaneous monthly data. For classifying months on the basis of precipitation occurrences, a threshold value of 10 mm is selected. Accordingly, if the precipitation is less than or equal to 10 mm it is regarded as nonoccurring. The scatter diagram is shown in Fig. 7 with temperature as the Y variable on the vertical axis and humidity (%), Z, on the horizontal axis. Application of the GA steps yields a discriminant line as:

$$Y = 8.95 + 0.16Z \qquad (3)$$

which classifies rainy and non-rainy months with 10% misclassification error (68/648 = 0.10). On the other hand, the discriminant line model, also shown in the same figure, misclassifies 134 data points. This is equivalent to 134/648 ≈ 0.21, about twice the error of the GA line model.
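The error levels quoted for the two data sets follow directly from the misclassification counts given in the text. As a simple arithmetic check, a few lines of Python reproduce them; the dictionary keys are descriptive labels chosen here, not names from the paper.

```python
# (misclassified points, sample size) as reported in the text
counts = {
    "Albany, discriminant line": (15, 91),
    "Albany, GA line": (12, 91),
    "Lake Van, GA line": (68, 648),
    "Lake Van, discriminant line": (134, 648),
}

# Misclassification error = misclassified points / sample size
rates = {name: wrong / total for name, (wrong, total) in counts.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")

# Ratio of the discriminant error to the GA error at Lake Van
ratio = rates["Lake Van, discriminant line"] / rates["Lake Van, GA line"]
print(f"Lake Van error ratio: {ratio:.2f}")
```

The Lake Van ratio is 134/68, slightly below 2, which is the sense in which the discriminant line error is about twice that of the GA line.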

In order to further test the reliability of the GA approach, a discriminant line is determined by training the GA technique on the basis of a 1940-1966 dataset having 324 data values. The results are represented in Fig. 8 with the GA discriminant line as:

$$Y = 12.52 + 0.10Z \qquad (4)$$

which achieves classification with 37 misclassification cases, while the discriminant line misclassifications are equal to 59 points. Finally, Fig. 9 shows the scatter diagram of a 1966-1993 data set with the discriminant line from the training period of 1940-1966 as presented in Fig. 8. It is observed that there are 34 misclassifications with 10% error. Again, the GA model line shows better performance than the discriminant line.

Fig. 8 Training data (1940-1966) set model lines. [Scatter diagram; horizontal axis Z (%), from 20 to 100.]

Fig. 9 Precipitation occurrence scatter diagram for 1966-1993 data set with discriminant line of 1940-1966 data set. [Scatter diagram; horizontal axis Z (%), from 20 to 100; points marked as precipitation < 10 mm or precipitation > 10 mm, with the discriminant line and the genetic line shown.]

Finally, the suitability of GAs for prediction of the precipitation amounts is tested by considering the monthly precipitation series recorded at Lake Van. Although various previous precipitation records are considered for the present monthly precipitation forecast, it is found after many trials that the present monthly value is forecast best on the basis of the three previous monthly precipitation records. Accordingly, the linear model adopted for forecasting becomes:

$$Y_t = aY_{t-1} + bY_{t-2} + cY_{t-3} + e \qquad (5)$$

where $Y_t$ is the monthly precipitation forecast value and $Y_{t-1}$, $Y_{t-2}$ and $Y_{t-3}$ are the past records; $a$, $b$ and $c$ are model parameters and $e$ is the error term. The model parameters can be estimated by using the GA procedure. Herein, the first 346-month (28 years) records are used for training the GA procedure, which yielded model parameters $a = 0.49$, $b = 0.15$ and $c = 0.14$. Subsequently, the remaining months' precipitation amounts are forecast by employing the model in equation (5) with these parameters. The forecast and observed values are shown together in Fig. 10. It may be seen from this figure that, in general, forecast sequences follow observed records rather closely. However, the peaks are underestimated, which may be due to the linear form of equation (5).
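A one-step forecast with equation (5) can be sketched in Python as follows. The parameter values are those reported in the text; the function names and the short precipitation series are hypothetical, and the error term $e$ is simply omitted, which is an assumption about how the model is applied in forecasting mode.

```python
def forecast_next(history, a=0.49, b=0.15, c=0.14):
    """One-step forecast from the three most recent monthly values.

    Implements Y_t = a*Y_{t-1} + b*Y_{t-2} + c*Y_{t-3}; the error
    term e of equation (5) is omitted (assumed zero when forecasting).
    """
    if len(history) < 3:
        raise ValueError("at least three past records are required")
    return a * history[-1] + b * history[-2] + c * history[-3]


def forecast_series(series, a=0.49, b=0.15, c=0.14):
    """Forecast every month that has three observed predecessors."""
    return [forecast_next(series[:t], a, b, c) for t in range(3, len(series))]


# Made-up monthly precipitation totals in mm, for illustration only.
monthly = [30.0, 50.0, 40.0, 20.0, 60.0]
predicted = forecast_series(monthly)

for observed, forecast in zip(monthly[3:], predicted):
    print(f"observed {observed:5.1f} mm, forecast {forecast:5.1f} mm")
```

Each forecast uses observed rather than previously forecast values, so errors do not accumulate from one month to the next.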

Fig. 10 Forecast and observed monthly precipitation series. [Time series plot; horizontal axis: months, from 340 to 440.]


CONCLUSION

A genetic algorithm procedure for event occurrence forecasting is proposed by taking into consideration hydrological variables observed in the field. The essence of the approach is to relate occurrences or nonoccurrences of any event to some other quantitative data and then to classify event occurrences through the genetic algorithm (GA) procedure.

The fundamentals of the GA application to classification and prediction problems are presented with an application to precipitation data. The GA classification yields comparatively better results than the classical statistical methods. The error reduction in the GA procedure can be traced with iteration number, which is not possible in the conventional methods. The calculations indicate that the number of misclassifications in the GA method is smaller than in the classical discriminant method. The GA technique is also employed for forecasting rainfall amounts from the three previous monthly records.

REFERENCES

Adeli, H. & Hung, S. L. (1995) Machine Learning—Neural Networks, Genetic Algorithms, and Fuzzy Systems. 128-135. John Wiley & Sons, New York, USA.

Barnard, M. (1935) The secular variations of skull characters in four series of Egyptian skulls. Annals of Eugenics 6, 352-371.

Belew, R. K. & Booker, L. B. (1991) (eds) Proc. Fourth Int. Conf. on Genetic Algorithms. Morgan Kaufmann, Los Altos, California, USA.

Buckles, B. P. & Petry, F. E. (1994) An overview of genetic algorithms and their applications. In: Genetic Algorithms, 1-4. IEEE Computer Society Press, Piscataway, New Jersey, USA.

Burrows, W. R. (1991) Objective guidance for 0-24 hour and 24-48 hour mesoscale forecasts of lake-effect snow using CART. Weather Forecasting 6, 357-378.

Cieniawski, S. E., Eheart, J. W. & Ranjithan, S. (1995) Using genetic algorithms to solve a multiobjective groundwater monitoring problem. Wat. Resour. Res. 31(2), 399-409.

Davis, L. (1991) Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York, USA.

Draper, N. R. & Smith, H. (1981) Applied Regression Analysis. John Wiley & Sons, New York, USA.

Enger, I., Russo, J. A. Jr & Sorenson, E. L. (1964) A statistical approach to 2-7 hr precipitation of ceiling and visibility, vols I and II. Travelers Research Center, Inc., Hartford, Connecticut, USA.

Fisher, R. A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(11), 179-188.

Franchini, M. (1996) Use of a genetic algorithm combined with a local search method for the automatic calibration of conceptual rainfall-runoff models. Hydrol. Sci. J. 41(1), 21-39.

Franchini, M. & Galeati, G. (1997) Comparing several genetic algorithm schemes for the calibration of conceptual rainfall-runoff models. Hydrol. Sci. J. 42(3), 357-380.

Goldberg, D. E. (1989) Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, Massachusetts, USA.

Grefenstette, J. J. (1985) (ed.) Proc. Int. Conf. Genetic Algorithms and Their Applications. NCARAI, Washington, DC and Texas Instruments, Dallas, Texas, USA.

Grefenstette, J. J. (1987) (ed.) Proc. Second Int. Conf. Genetic Algorithms. Erlbaum, Hillsdale, New Jersey, USA.

Holland, J. (1975) Adaptation in Natural and Artificial Systems. Univ. Michigan Press, Ann Arbor, Michigan, USA.

McKinney, D. C. & Lin, M. D. (1992) Genetic algorithms in groundwater flow optimization. EOS, Trans. AGU 73(43), 229.

McKinney, D. C. & Lin, M. D. (1994) Genetic algorithm solution of groundwater management models. Wat. Resour. Res. 30(6), 1897-1906.

McLachlan, G. J. (1992) Discriminant Analysis and Statistical Pattern Recognition. John Wiley & Sons, New York, USA.

Öztopal, A. (1998) Genetik algoritmaların meteorolojik uygulamaları (Meteorological applications of genetic algorithms). Master's thesis (in Turkish), Istanbul Technical University, Istanbul, Turkey.

Panofsky, H. A. & Brier, G. W. (1968) Some Applications of Statistics to Meteorology. Pennsylvania State University Press, USA.

Ritzel, B. J., Eheart, J. W. & Ranjithan, S. (1994) Using genetic algorithm to solve a multiple objective groundwater pollution problem. Wat. Resour. Res. 30(5), 1589-1603.

Schaffer, J. D. (1989) (ed.) Proc. Third Int. Conf. on Genetic Algorithms. Morgan Kaufmann, Los Altos, California, USA.

See, L. & Openshaw, S. (1999) Applying soft computing approaches to river level forecasting. Hydrol. Sci. J. 44(5), 763-778.


Simpson, A. R., Dandy, G. C. & Murphy, L. J. (1994) Genetic algorithms compared to other techniques for pipe optimization. J. Wat. Resour. Plan. Manage. 120(4), 423-443.

Tremant, M. (1987) La prévision du brouillard en mer. Météorologie Maritime et Activités Océanographiques Connexes, Rapport No. 20. TD no. 211. World Meteorological Organization, Geneva, Switzerland.

Vislocky, R. L. & Fritsch, J. M. (1995) Generalized additive models vs linear regression in generating probabilistic MOS forecasts of aviation weather parameters. Weather Forecasting 10, 669-680.

Wang, Q. J. (1991) The genetic algorithm and its application to calibrating conceptual rainfall-runoff models. Wat. Resour. Res. 27(9), 2467-2471.

Received 18 January 2000; accepted 9 November 2000