
Hydrological Sciences-Journal-des Sciences Hydrologiques, 47(1), February 2002, 107

A note on the applicability of log-Gumbel and log-logistic probability distributions in hydrological analyses: I. Known pdf

PAWEL M. ROWINSKI, WITOLD G. STRUPCZEWSKI Water Resources Department, Institute of Geophysics, Polish Academy of Sciences, ul. Ksiecia Janusza 64, 01-452 Warsaw, Poland

[email protected]

VIJAY P. SINGH Department of Civil and Environmental Engineering, Louisiana State University, Baton Rouge, Louisiana 70803-6405, USA [email protected]

Abstract Two probability density functions (pdf), popular in hydrological analyses, namely the log-Gumbel (LG) and log-logistic (LL), are discussed with respect to (a) their applicability to hydrological data and (b) the drawbacks resulting from their mathematical properties. This paper, the first in a two-part series, examines a classical problem in which the considered pdf is assumed to be the true distribution. The most significant drawback is the existence of the statistical moments of LG and LL for a very limited range of parameters. For these parameters, a very rapid increase of the skewness coefficient, as a function of the coefficient of variation, is observed (especially for the log-Gumbel distribution), which is seldom observed in the hydrological data. These probability distributions can be applied with confidence only to extreme situations. For other cases, there is an important disagreement between empirical data and theoretical distributions in their tails, which is very important for the characterization of the distribution asymmetry. The limited range of shape parameters in both distributions makes the analyses (such as the method of moments), that make use of the interpretation of moments, inconvenient. It is also shown that the often-used L-moments are not sufficient for the characterization of the location, scale and shape parameters of pdfs, particularly in the case where attention is paid to the tail part of probability distributions. The maximum likelihood method guarantees an asymptotic convergence of the estimators beyond the domain of the existence of the first two moments (or L-moments), but it is not sensitive enough to the shape of the upper tails.

Key words probability density functions; log-Gumbel pdf; log-logistic pdf; statistical moments; L-moments; flood frequency analysis; Poland

Applicability of log-Gumbel and log-logistic probability distributions in hydrological analyses: I. Known pdf

Summary Two probability density functions (pdf) widely used in hydrological analyses, namely the log-Gumbel (LG) and log-logistic (LL), are considered from two points of view: (a) their applicability to hydrological data, and (b) their shortcomings resulting from their mathematical properties. This paper, the first in a series, studies the classical problem in which the pdf considered is assumed to be the true distribution. The greatest shortcoming is that the statistical moments of LG and LL exist only for a very limited range of parameters. For those parameters, the skewness coefficient increases sharply as a function of the coefficient of variation (especially for the log-Gumbel distribution), which is rarely observed in hydrological data. These models can be applied with confidence only to extreme situations. In other cases, the agreement between experimental data and theoretical distributions is poor in the tail, which is very important for characterizing the asymmetry of the distribution. The limited range of the shape parameters in both distributions makes any analysis that relies on the interpretation of moments (for example the method of moments) inconvenient. It is also shown that the frequently used L-moments are not sufficient to characterize the location, scale and shape parameters of pdfs, in particular when the tails of the distributions are of interest. The maximum likelihood method guarantees asymptotic convergence of the estimators beyond the domain of existence of the first two moments (or L-moments), but is not sensitive enough to the shape of the upper tails.

Key words probability density functions; log-Gumbel pdf; log-logistic pdf; statistical moments; L-moments; flood frequency analysis; Poland

Open for discussion until 1 August 2002

INTRODUCTION

In many cases, hydrologists construct probability distributions that seem to have a suitable shape for their purposes, but pay little attention to the mathematical relevance of their propositions. The "true" probability density function (pdf) is usually not known in hydrological applications. Even if such information is available, a great number of unknown parameters would have to be estimated, which renders such a pdf meaningless for practical purposes (Landwehr et al., 1980). In practice, asymmetrical probability distributions with a relatively low number of parameters are frequently sought in hydrological and environmental analyses. An interesting discussion of lower-bounded distributions with significant upper tails, used in frequency analyses for hydrological extremes, has been given recently in Klemes (2000a,b).

The objective of this study is to critically examine the applicability of two unimodal, skewed, log-type distributions, namely log-Gumbel and log-logistic. The examination is based on the properties of both traditional moments and L-moments.

BACKGROUND

Statistical moments of some orders may not exist for several pdfs used in hydrology. This means that the frequently used method of moments (MOM), or the description of statistical properties of the distribution by means of moments cannot be applied, or is restricted to a limited range of parameters; consequently, the assumed pdf may not be adequate for many hydrological analyses. For example, one may consider a non-stationary flood frequency analysis where the primary task is to investigate the time trend in the first two statistical moments. How can it be done if such moments do not exist? Also, one has to remember that numerous basic concepts of the probability theory are derived under the assumption of the existence of the first two moments. Examples of such concepts include the Tchebycheff or Holder inequalities, the laws of large numbers given by Tchebycheff, Markov, Khinchine, the central limit theorem, etc. Apart from that, it is known that if the statistical moments are finite and the characteristic function may be represented by an absolutely convergent series in the neighbourhood of a zero value, then the pdf is uniquely determined from its moments. However, this useful property is lost in the situation described above. It is worth mentioning that even the popular lognormal distribution is not uniquely determined by its existing moments (Kendall & Stuart, 1969).

In the case of the non-existence of statistical moments, one can try to use the maximum likelihood method (MLM), or the recently advocated and used L-moments, which also describe the location, scale and shape of probability distributions (Gingras & Adamowski, 1994; Hosking, 1990; Hosking & Wallis, 1997; Stedinger et al., 1993; Vogel & Fennessey, 1993; Singh, 1998). A number of relationships between traditional moments and L-moments exist, but they are applicable only when both types of moments exist. The estimation of parameters may be based on L-moments only and it is expected that the admissible range of parameters can be larger than for traditional moments. It is worth mentioning that the use of the MLM could be worthwhile when one is interested only in the estimate of parameters, since this method works beyond the domain of existence of statistical moments and L-moments.

In what follows, two two-parameter distributions, namely the log-Gumbel (LGD) and log-logistic (LLD) distributions, are considered. Both distributions have been used recently in hydrological analyses since they have three important features: (a) they are specified in the domain of positive values, (b) they depend on only two parameters, and (c) they are characterized by a positively skewed shape (Cunnane, 1989; Gunasekara & Cunnane, 1992; Haktanir, 1992; Haktanir & Horlacher, 1993; Singh et al., 1993; Singh, 1998). Both distributions are obtained by applying the logarithmic transformation to the popular Fisher-Tippett type I (Gumbel) and logistic probability density functions, respectively. The procedure is analogous to the method of obtaining the lognormal distribution from the Gaussian pdf and, at first sight, the approach is absolutely correct. However, some basic problems appear when statistical moments are considered, which are discussed below.

The importance of the log-Gumbel distribution (LGD) for flood data was emphasized by Shen et al. (1980) and Ochoa et al. (1980). Singh (1985) derived parameters of LGD using the entropy theory, whereas Heo & Salas (1996) estimated quantiles and confidence intervals for it. Lee et al. (1986) employed the log-logistic distribution (LLD) for frequency analysis of multiyear drought durations. Shoukri et al. (1988) applied the two-parameter LLD to analyse extensive Canadian precipitation data, whereas Ahmad et al. (1988) employed its three-parameter version for flood frequency analysis. Haktanir (1992) concurred with Ahmad et al. (1988) for Turkish rivers. Narda & Malik (1993) used LLD to develop a model of root growth and water uptake in wheat. Singh et al. (1993) compared different parameter estimation methods for LLD. Fitzgerald (1996) derived the maximum product of spacings for LLD from the standpoint of information theory.

Both traditional moments and L-moments are discussed with respect to the assumed distributions. Although in other methods, such as those based on the maximum likelihood (ML), the principle of maximum entropy or the indirect method of moments, the range of parameters is not limited for the assumed distributions, other serious disadvantages may be readily revealed (see Weglarczyk et al., 2002 for details). Therefore, MOM and the L-moments method (LMM) are extremely important and will constitute the basis for the discussion that follows.

FLOW DATA

For analysis of data, the annual peak flow series at 39 gauging stations were employed. These come from Polish territory, covering the period 1921-1990, and are from drainage basins ranging in area from 100 to 194 000 km². The majority of the basins are in the south of Poland, which is a mountainous part of the country. Some pertinent characteristics of the data are given in Strupczewski et al. (2001). These data were selected on the basis of length, completeness and homogeneity of records and in each case the sample was found homogeneous and independent at a 5% significance level.

LOG-GUMBEL DISTRIBUTION

If a random variable $Y$ has a Gumbel distribution, then its probability density function $g(y)$ can be defined as:

$$g(y) = \frac{1}{\alpha}\exp\left[-\frac{y-u}{\alpha} - \exp\left(-\frac{y-u}{\alpha}\right)\right] \tag{1}$$

where parameter $\alpha$ is positive and $u$ can assume any value. Furthermore, if the random variable $Y = \ln X$, one is interested in the distribution of the original random variable $X$. Simple mathematical considerations lead to the following pdf of $X$:

$$f(x) = \frac{1}{\alpha x}\exp\left[-\frac{\ln(x)-u}{\alpha} - \exp\left(-\frac{\ln(x)-u}{\alpha}\right)\right], \quad \alpha > 0 \tag{2}$$

Similar to $g(y)$, $f(x)$ is also a unimodal probability distribution, which is a consequence of the logarithmic transformation. In most studies, to apply the method of ordinary moments, the distribution $f(x)$ of the log extreme value (type 1) has been used for the logarithms of the considered data $Y$ (indirect method). The indirect method of moments omits the problem of the existence of moments for variable $X$ and hence the question about the applicability of the LG distribution remains unsettled. Another question is whether the comparison of various probability density functions defined in different domains is justified.

Moments and their existence

Simple algebraic transformations lead to a more friendly form of the distribution $f(x)$:

$$f(x) = \frac{\xi}{\alpha}\, x^{-1/\alpha - 1}\exp\left(-\xi x^{-1/\alpha}\right) \tag{3}$$

where $\xi = \exp(u/\alpha)$. The statistical moments of the LG distribution are found in the literature (e.g. Heo & Salas, 1996); in the present study, only a slightly different parameterization is applied. The mean value and the next statistical moments about the origin are as follows:

$$m_r = \xi^{r\alpha}\,\Gamma(1 - r\alpha) \quad \text{where } r = 1, 2, 3, \ldots \tag{4}$$

It is appropriate to limit the considerations to parameter $\alpha$ taken from the range (0, 1) for the mean value, (0, 1/2) for the second moment and the range (0, 1/3) for the third moment. Negative values of $\alpha$ lead in principle to the Weibull probability density functions. They behave adequately to represent many hydrological problems (e.g. Heo et al., 2000; Kaczmarek et al., 1998; Kebaili-Bargaoui, 1994; Veneziano & Villani, 1999; Tate & Freeman, 2000).
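The existence conditions attached to equation (4) can be checked numerically. The following is a minimal sketch, not the authors' code: the function names `lg_raw_moment` and `highest_finite_moment` are hypothetical, and the convention of returning infinity for a non-existent moment is an assumption made for illustration.

```python
import math


def lg_raw_moment(r, alpha, xi):
    """r-th moment about the origin of the log-Gumbel pdf, equation (4).

    Returns math.inf when r * alpha >= 1, i.e. outside the range of
    alpha for which the defining integral converges (illustrative
    convention, not taken from the paper).
    """
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if r * alpha >= 1.0:
        return math.inf
    return xi ** (r * alpha) * math.gamma(1.0 - r * alpha)


def highest_finite_moment(alpha):
    """Largest integer r with r * alpha < 1 (0 if even the mean diverges)."""
    return max(math.ceil(1.0 / alpha) - 1, 0)


if __name__ == "__main__":
    for alpha in (0.25, 0.3, 0.5, 1.2):
        moments = [lg_raw_moment(r, alpha, xi=1.0) for r in (1, 2, 3)]
        print(alpha, highest_finite_moment(alpha), moments)
```

With $\xi = 1$ the sketch isolates the role of $\alpha$: the scale factor $\xi^{r\alpha}$ never affects whether a moment exists.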

In hydrological applications, when dealing with two-parameter probability distributions, one usually limits one's consideration to the first three statistical moments and therefore only the interval (0, 1/3) for the parameter $\alpha$ is relevant. However, it is important to note that, if for some reason one were interested in further moments (for instance, the kurtosis as a measure of the flatness or "peakedness" of the distribution or as a weight of the tail), the range for parameter $\alpha$ should decrease with every new moment. This occurs because of the convergence of the relevant integrals. In general, for an $r$th moment, $\alpha \in (0, 1/r)$. The question remains as to how this fact can be interpreted physically. Applying the indirect MOM or MLM for parameter estimation, the problem is reduced to parameter estimation of the Gumbel distribution (equation (1)) and one can obtain a value of parameter $\alpha$ greater than one, i.e. in the range where the mean of the LG distribution does not exist.
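The last point can be illustrated with a small sketch of the indirect method of moments. It assumes the standard Gumbel moment relations (mean $u + \gamma\alpha$, variance $\pi^2\alpha^2/6$, with $\gamma$ the Euler constant), which are textbook results rather than formulas quoted from this paper. The function names and the synthetic sample are hypothetical: the sample is built deterministically from Gumbel quantiles at plotting positions $(i + 0.5)/n$ with an arbitrarily chosen $\alpha = 1.2$, not from the Polish flow records.

```python
import math

EULER_GAMMA = 0.5772156649015329


def gumbel_mom(y):
    """Moment estimates (alpha, u) of a Gumbel pdf, equation (1).

    Uses variance = pi^2 * alpha^2 / 6 and mean = u + gamma * alpha.
    """
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    alpha = math.sqrt(6.0 * var) / math.pi
    u = mean - EULER_GAMMA * alpha
    return alpha, u


def indirect_mom_lg(x):
    """Indirect MOM for log-Gumbel: fit a Gumbel pdf to ln(x)."""
    return gumbel_mom([math.log(v) for v in x])


# Synthetic, deterministic sample (illustrative assumption).
n = 2000
true_alpha, true_u = 1.2, 3.0
x = [
    math.exp(true_u - true_alpha * math.log(-math.log((i + 0.5) / n)))
    for i in range(n)
]

alpha_hat, u_hat = indirect_mom_lg(x)
print(alpha_hat, u_hat)
# The estimate of alpha exceeds 1, so by equation (4) not even the mean
# of the fitted log-Gumbel distribution exists.
print("mean exists:", alpha_hat < 1.0)
```

The estimator never inspects the moments of $X$ itself, which is why nothing in the fitting step warns that the fitted distribution has no mean.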

The coefficient of variation, $C_v$, and skewness coefficient, $C_s$, representing the asymmetry of the probability distribution, are as follows:

$$C_v = \frac{\sqrt{\Gamma(1-2\alpha) - \Gamma^2(1-\alpha)}}{\Gamma(1-\alpha)} \tag{5}$$

$$C_s = \frac{\Gamma(1-3\alpha) - 3\Gamma(1-\alpha)\Gamma(1-2\alpha) + 2\Gamma^3(1-\alpha)}{\left[\Gamma(1-2\alpha) - \Gamma^2(1-\alpha)\right]^{3/2}} \tag{6}$$

Fig. 1 (a) Variation coefficient $C_v$ for log-Gumbel distribution plotted against model parameter $\alpha$; and (b) skewness coefficient $C_s$ for log-Gumbel distribution plotted against model parameter $\alpha$.
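Equations (5) and (6) are straightforward to evaluate, which gives a rough numerical counterpart of the curves in Fig. 1. The sketch below is illustrative only; the function names `lg_cv` and `lg_cs` and the choice of sample values of $\alpha$ are assumptions.

```python
import math


def lg_cv(alpha):
    """Coefficient of variation of the log-Gumbel pdf, equation (5)."""
    if not 0.0 < alpha < 0.5:
        raise ValueError("C_v requires 0 < alpha < 1/2")
    g1 = math.gamma(1.0 - alpha)
    g2 = math.gamma(1.0 - 2.0 * alpha)
    return math.sqrt(g2 - g1 * g1) / g1


def lg_cs(alpha):
    """Skewness coefficient of the log-Gumbel pdf, equation (6)."""
    if not 0.0 < alpha < 1.0 / 3.0:
        raise ValueError("C_s requires 0 < alpha < 1/3")
    g1 = math.gamma(1.0 - alpha)
    g2 = math.gamma(1.0 - 2.0 * alpha)
    g3 = math.gamma(1.0 - 3.0 * alpha)
    return (g3 - 3.0 * g1 * g2 + 2.0 * g1 ** 3) / (g2 - g1 * g1) ** 1.5


if __name__ == "__main__":
    for alpha in (0.05, 0.10, 0.20, 0.30, 0.33):
        print(f"alpha={alpha:.2f}  Cv={lg_cv(alpha):.3f}  Cs={lg_cs(alpha):.3f}")
```

For very small $\alpha$ the differences of gamma functions suffer from cancellation in floating point, so the sketch should not be pushed far below about $10^{-4}$.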


The coefficient of variation is defined for $\alpha \in (0, 1/2)$, while the skewness coefficient is defined for $\alpha \in (0, 1/3)$. Both $C_s$ and $C_v$ can achieve values in the interval $(0, \infty)$. This suggests, for example, that one can always obtain a certain parameter $\alpha$ by comparing equations (5) and (6), respectively, to the coefficients of variation and skewness obtained from empirical data, even if these data were created by the LG distribution based on parameter $\alpha > 1/3$. A dependence of the coefficients of skewness and variation on parameter $\alpha$ is presented in Fig. 1. Additionally, one needs to take into account that some value $\alpha \approx 0.00026$ also lies beyond the domain of the skewness function. The above discussion shows that a simple fit of the data to the assumed LG shape can easily lead to confusing results when one employs basic analyses of statistical moments.

Relationship between Cs and Cv

In many cases, one is interested in the relationship between, say, the coefficient of variation and the coefficient of skewness for a broad range of parameters (e.g. Svanidze & Grigolia, 1973). This may help in the understanding and quantification of model errors with respect to $C_s$, for example, and in assessing the applicability of a distribution for a given data set. In the case of the LG distribution, one is limited to a relatively narrow range of parameters, which does not permit one to perform a detailed analysis of this kind. The dependence of $C_s$ on the coefficient of variation $C_v$ for admissible and physically reasonable ranges of parameters shows that LG behaves differently from most of the two-parameter distributions, specified for positive values of $x$, used in the hydrological sciences (Fig. 2). One can easily see that the plot of $C_s(C_v)$ increases more quickly for the LG distribution than for the other distributions considered. At very low values of $C_v$, where the increase of the $C_s(C_v)$ plot is extremely high, the value of skewness coefficients may exceed 1 for very small coefficients of variation (of the order of $10^{-1}$). Fortunately, such values of $C_v$ are seldom found in hydrological practice. It is obvious, therefore, that in the applications where any of the listed distributions fits the data in a reasonable way, LG cannot be expected to provide good results, especially in its tail part. When looking at the annual peak flow data only, it is hard to find large values of the skewness coefficient at the same time as the coefficient of variation is relatively low. Within 39 gauging stations in Poland, the highest value of the skewness coefficient was found to be 3.96 ($C_v$ = 0.69 for these data) (Strupczewski et al., 2001). The authors realize that the sample estimates of skew may be heavily downward biased, but a similar picture will be obtained with the use of nearly unbiased L-moments.

Fig. 2 Relationship between variation and skewness coefficients for log-Gumbel, log-logistic, lognormal, linear diffusion analogy (LDA), Weibull and gamma probability distributions and for empirical data on annual peak flows in 39 gauging stations in Poland.
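The steepness of the log-Gumbel $C_s(C_v)$ curve can be sketched numerically by inverting equation (5) for $\alpha$ and substituting into equation (6). The comparison curves use the standard textbook relations $C_s = 2C_v$ for the gamma distribution and $C_s = 3C_v + C_v^3$ for the lognormal distribution; these are not quoted in the text above and are brought in here as assumptions. All function names and the bisection bounds are hypothetical choices made for illustration.

```python
import math


def lg_cv(alpha):
    """Equation (5): coefficient of variation of the log-Gumbel pdf."""
    g1 = math.gamma(1.0 - alpha)
    g2 = math.gamma(1.0 - 2.0 * alpha)
    return math.sqrt(g2 - g1 * g1) / g1


def lg_cs(alpha):
    """Equation (6): skewness coefficient of the log-Gumbel pdf."""
    g1 = math.gamma(1.0 - alpha)
    g2 = math.gamma(1.0 - 2.0 * alpha)
    g3 = math.gamma(1.0 - 3.0 * alpha)
    return (g3 - 3.0 * g1 * g2 + 2.0 * g1 ** 3) / (g2 - g1 * g1) ** 1.5


# Search bounds: alpha must stay below 1/3 for C_s to exist; the lower
# bound avoids floating-point cancellation in equation (5).
ALPHA_LO = 1e-4
ALPHA_HI = 1.0 / 3.0 - 1e-9


def alpha_from_cv(cv):
    """Invert equation (5) by bisection (C_v increases with alpha)."""
    if not lg_cv(ALPHA_LO) < cv < lg_cv(ALPHA_HI):
        raise ValueError("C_v outside the range where C_s exists")
    lo, hi = ALPHA_LO, ALPHA_HI
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if lg_cv(mid) < cv:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lg_cs_from_cv(cv):
    return lg_cs(alpha_from_cv(cv))


def gamma_cs_from_cv(cv):
    return 2.0 * cv  # standard gamma-distribution relation (assumption)


def lognormal_cs_from_cv(cv):
    return 3.0 * cv + cv ** 3  # standard lognormal relation (assumption)


if __name__ == "__main__":
    for cv in (0.1, 0.3, 0.5):
        print(cv, lg_cs_from_cv(cv), lognormal_cs_from_cv(cv), gamma_cs_from_cv(cv))
```

Because $C_s$ exists only for $\alpha < 1/3$, the inversion is restricted to the values of $C_v$ reached on that interval; larger coefficients of variation have no finite log-Gumbel skewness to compare with.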

It seems that the LG distribution may provide useful results only when all traditional distributions fail. In such cases, one should either look for some equivalent distribution to LG or accept its disadvantages. It is worth pointing out that, when considering just the goodness-of-fit test to identify "objectively" if the model behaves in a reasonable way, the results can be somewhat confusing. Moreover, various goodness-of-fit tests can lead to different results for the normal sizes of hydrological samples, so, as noticed by Cunnane (1989), the goodness of fit alone need not be the sole criterion in choosing a proper distribution. The basic properties of the assumed probability density functions must also be taken into account. The coefficients $C_s$ and $C_v$ can provide enough information to disqualify a certain distribution as one which, for example, does not describe the discussed phenomenon correctly in its tail part. It may simply mean that the number and also the values of outliers are unacceptable. This feature cannot be easily "grasped" by traditional goodness-of-fit criteria. One may trivially claim that the models one selects to fit the data must come from a well-defined class and that, if the model-fitting process is to be useful, this class must be determined by a preceding study. Such a study should allow for selecting the class relevant to the kind of data under study.

It remains to be emphasized that, although the indirect method of moments and the maximum likelihood method give asymptotically convergent values of the parameters of the LG distribution, in the case of the direct method of moments, this convergence is limited only to the interval (0, 1/2), where both the first two moments exist.

L-moments and their existence

The above observations are, to a certain extent, obvious, but on the other hand one can note that pdfs like the LG one are used in hydrological analyses without paying much attention to the basic properties described above, namely, to the domain for the existence of statistical moments and their behaviour. One solution to the above problems is to use an alternative system of describing the shapes of probability distributions, which is the system of L-moments (Hosking & Wallis, 1997). Various investigators have used the concept of L-statistics recently (e.g. Adamowski, 2000; Martins & Stedinger, 2000). L-moments are summary statistics for probability distributions and data samples. The first three L-moments can be obtained from a quantile function as linear functions of probability weighted moments. In the case of the LG distribution, the quantile function $x(F)$ can be expressed as:

$$x(F) = \left(\frac{\xi}{-\ln(F)}\right)^{\alpha} \tag{7}$$
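A quantile function in closed form also makes inverse-transform sampling easy, which is convenient for checking any of the formulas in this section by simulation. The sketch below is illustrative: the function names are hypothetical, and the cumulative distribution function $F(x) = \exp(-\xi x^{-1/\alpha})$ is not printed in the text but follows from inverting equation (7) (equivalently, from integrating equation (3)).

```python
import math
import random


def lg_cdf(x, alpha, xi):
    """CDF obtained by inverting equation (7): F(x) = exp(-xi * x**(-1/alpha))."""
    return math.exp(-xi * x ** (-1.0 / alpha))


def lg_quantile(F, alpha, xi):
    """Quantile function of the log-Gumbel pdf, equation (7)."""
    if not 0.0 < F < 1.0:
        raise ValueError("F must lie strictly between 0 and 1")
    return (xi / -math.log(F)) ** alpha


def lg_sample(n, alpha, xi, seed=0):
    """Draw n log-Gumbel variates by inverse-transform sampling."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        F = rng.random()
        while F == 0.0:  # random() may return 0.0; the quantile needs F > 0
            F = rng.random()
        out.append(lg_quantile(F, alpha, xi))
    return out


if __name__ == "__main__":
    alpha, xi = 0.3, 2.0
    print("median:", lg_quantile(0.5, alpha, xi))
    print("first draws:", lg_sample(5, alpha, xi))
```

Exponentiating Gumbel variates with parameters $\alpha$ and $u = \alpha \ln \xi$ would give the same distribution; the quantile route is used here only because equation (7) is already available.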

L-moments for the considered pdf may be readily derived from probability weighted moments found in the literature (e.g. Heo & Salas, 1996):

$$\lambda_1 = \xi^{\alpha}\,\Gamma(1-\alpha) \tag{8}$$

$$\lambda_2 = \xi^{\alpha}\,\Gamma(1-\alpha)\,(2^{\alpha} - 1) \tag{9}$$

$$\lambda_3 = \xi^{\alpha}\,\Gamma(1-\alpha)\,(2 \cdot 3^{\alpha} - 3 \cdot 2^{\alpha} + 1) \tag{10}$$

It is easily noticed that, for the existence of the above expressions, only the condition $\alpha < 1$ must be fulfilled, and it is a much weaker requirement than that for the existence of traditional moments. Dimensionless versions of L-moments are very often used as an analogy of dimensionless statistical parameters, such as the coefficients of variation, skewness, etc. An analogy with the coefficient of variation yields the L-$C_v$ parameter given as $\tau = \lambda_2/\lambda_1$. In the case of the LG distribution, the L-coefficient of variation is given as:

$$\tau = 2^{\alpha} - 1 \tag{11}$$

One can easily note that $0 < \tau < 1$, and that parameter $\tau$ varies moderately in the neighbourhood of the critical point for the existence of the traditional coefficient of variation (when $\alpha = 0.5$, then $\tau \approx 0.414$). Figure 3(a) compares the measures of variation $C_v$ and $\tau$ for the admissible range of parameters. The simplicity of the above expression, in contrast to the traditional coefficient of variation, is attractive. Also, a measure of skewness in terms of L-moments is extremely simple:

$$\tau_3 = \frac{\lambda_3}{\lambda_2} = \frac{2 \cdot 3^{\alpha} - 3 \cdot 2^{\alpha} + 1}{2^{\alpha} - 1} \tag{12}$$

where $\tau_3$ is limited: $0 < \tau_3 < 1$. Figure 3(b) reveals the relationship between the values of $C_s$ and $\tau_3$.
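Equations (8) to (12) can be evaluated directly, and the two reference values quoted in the text ($\tau \approx 0.414$ at $\alpha = 0.5$ and $\tau_3 \approx 0.4$ at $\alpha = 1/3$) serve as a check. This is a minimal sketch with hypothetical function names, not code from the paper.

```python
import math


def lg_l_moments(alpha, xi):
    """First three L-moments of the log-Gumbel pdf, equations (8)-(10)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("L-moments require 0 < alpha < 1")
    base = xi ** alpha * math.gamma(1.0 - alpha)
    lam1 = base
    lam2 = base * (2.0 ** alpha - 1.0)
    lam3 = base * (2.0 * 3.0 ** alpha - 3.0 * 2.0 ** alpha + 1.0)
    return lam1, lam2, lam3


def lg_l_cv(alpha):
    """L-coefficient of variation, equation (11)."""
    return 2.0 ** alpha - 1.0


def lg_l_skew(alpha):
    """L-skewness, equation (12)."""
    return (2.0 * 3.0 ** alpha - 3.0 * 2.0 ** alpha + 1.0) / (2.0 ** alpha - 1.0)


if __name__ == "__main__":
    print("tau at alpha = 0.5:", lg_l_cv(0.5))
    print("tau_3 at alpha = 1/3:", lg_l_skew(1.0 / 3.0))
    # Both ratios stay finite for alpha in (1/3, 1), where C_s does not exist.
    for alpha in (0.4, 0.6, 0.9):
        print(alpha, lg_l_cv(alpha), lg_l_skew(alpha))
```

Note that neither ratio depends on $\xi$, so the L-moment ratios of the log-Gumbel distribution are functions of the single parameter $\alpha$.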

One may still ask: What is the meaning of such an interpretation of the coefficient of variation and the coefficient of skewness when these quantities do not exist in a traditional sense? Does it weaken the traditional approach, or does one get a confusing picture? Traditional moments definitely have much clearer physical interpretation. But in such cases when they do not exist, what is the real interpretation of L-moments? When one deals with an infinite skewness coefficient, can one claim that a moderate value of L-skewness ($\tau_3(\alpha = 1/3) = 0.4$) has some simple interpretation as a representation of the tail part of the distribution and that it is just a matter of sensitivity of these parameters to the weight of the tails? If yes, then L-moments can prove to be too weak a tool in the analyses of subtle questions concerning just tail parts of probability distribution functions. The often-stressed positive feature, that L-moments are much more robust to outliers in the data, can appear to cause problems when the information about the "tail" is valuable. Klemes (2000b) stressed this fact as a dangerous feature. The lack of sensitivity is precisely in that part of the distribution which matters most in every safety-related design.

Fig. 3 (a) Variation $C_v$ vs L-variation $\tau$ for log-Gumbel distribution ($\tau = 0.414$ when $\alpha = 0.5$); and (b) skewness $C_s$ vs L-skewness $\tau_3$ for log-Gumbel distribution ($\tau_3(\alpha = 1/3) = 0.4$).

On the other hand, L-moments have an important feature: they are nearly unbiased. Therefore, it may be interesting to note how the LG distribution is placed on an L-moment ratio diagram in comparison to the computed experimental data (Fig. 4).

Fig. 4 L-moment ratio diagram: L-$C_v$ vs L-$C_s$ for log-Gumbel and log-logistic distributions and for empirical data on annual peak flows in 39 gauging stations in Poland.


One may see that the L-moments of samples representing annual peak flow series in Poland differ from the theoretical results obtained from the LG distribution by more than 60%. Similar results were obtained by Gunasekara & Cunnane (1992) and Heo et al. (2000).

LOG-LOGISTIC DISTRIBUTION

Similar considerations can be applied to the log-logistic (LL) distribution. Only a brief introduction of the LL distribution is made, since it is described in more detail in the literature (Haktanir, 1992; Haktanir & Horlacher, 1993; Singh et al., 1993). Ahmad et al. (1988) evaluated this distribution for flood frequency analysis and proved its suitability for such studies. The pdf of a random variable Y having a two-parameter LL distribution is obtained from a standard logistic distribution given by:

$$g(y) = \exp(-y)\left[1 + \exp(-y)\right]^{-2} \tag{13}$$

Two parameters are introduced to the distribution with the use of the transformation $Y = \ln(X/a)/\kappa$, and the distribution of the original random variable $X$ is obtained as:

$$f(x) = \frac{1}{\kappa x}\left(\frac{x}{a}\right)^{-1/\kappa}\left[1 + \left(\frac{x}{a}\right)^{-1/\kappa}\right]^{-2} \tag{14}$$

where $a > 0$ is a scale parameter, and $\kappa > 0$ is a shape parameter.

Moments and their existence

It is easy to show that the mean and the next moments about the origin for a distribution can be computed by means of the formula (Ahmad et al., 1988):

mr = a'A(r,K) where r- 1,2,3,... (15)

r(m)T(n) where A(r, K) = B(\ +rK, 1 - nc) r e N and B(m,n) >——L [s the beta function.

T{m + n) The coefficients of variation and skewness are:

c 2L±±J. \J-L (16) .4(1, K)

c = A(3, K) - 3A(2, K)A(l, K) + 2 A3 (1, K)

[A(2,K)-A2(1,K)}-


Fig. 5 (a) Variation coefficient $C_v$ and (b) skewness coefficient $C_s$ for the log-logistic distribution plotted against the model parameter K.

The functional dependence of the coefficients $C_v$ and $C_s$ on the parameter K is presented in Fig. 5. Again, the shape parameter K has to fulfil certain requirements for the particular statistical moments to exist. One can easily note that the r-th moment is finite only when $K \in (0, 1/r)$. Again, one encounters a situation in which, although the shapes of the distribution for some parameters are reasonable, their statistical moments do not exist, which makes, for example, the usefulness of the popular method of moments very restricted. As in the case of the LG distribution, the indirect method of moments and also the maximum likelihood method do not impose restrictions on the admissible range of parameters and one can readily obtain the shape parameter K > 1 from those methods.
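The moment formulas for the LL distribution and their domain of existence can be illustrated with a short Python sketch. It assumes A(r, K) = B(1 + rK, 1 - rK) with the beta function expressed through gamma functions; the helper names `A`, `cv` and `cs` are illustrative only.

```python
import math


def A(r, K):
    """A(r, K) = B(1 + rK, 1 - rK); finite only for 0 < K < 1/r."""
    if not 0.0 < r * K < 1.0:
        raise ValueError(f"moment of order {r} does not exist for K = {K}")
    # B(m, n) = Gamma(m) * Gamma(n) / Gamma(m + n), and here m + n = 2, Gamma(2) = 1
    return math.gamma(1.0 + r * K) * math.gamma(1.0 - r * K)


def cv(K):
    """Coefficient of variation; requires the second moment, i.e. K < 1/2."""
    a1, a2 = A(1, K), A(2, K)
    return math.sqrt(a2 - a1 ** 2) / a1


def cs(K):
    """Coefficient of skewness; requires the third moment, i.e. K < 1/3."""
    a1, a2, a3 = A(1, K), A(2, K), A(3, K)
    return (a3 - 3.0 * a2 * a1 + 2.0 * a1 ** 3) / (a2 - a1 ** 2) ** 1.5
```

Calling `cs(0.4)` raises a `ValueError` because the third moment does not exist for K >= 1/3, although the pdf itself is perfectly well defined for that K. This is the restriction that limits the method of moments.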

Relationship between skewness and variation coefficients

One may note that, compared to other pdfs used in the hydrological sciences, the curve representing the dependence of $C_s$ on $C_v$ for the LL distribution lies between the curve for the LG distribution discussed herein and all the others, exceeding the latter significantly (Fig. 2). In cases when $C_s$ is much larger (for relatively small values of the coefficient of variation) than in typical distributions such as the gamma, lognormal or Weibull, the LL can work relatively well (Ahmad et al., 1988). However, it is still not clear how to avoid the limitations on the shape parameter when one is interested in statistical moments of higher order.

L-moments and their existence

Again, one may utilize the concept of L-moments as discussed above. The quantile function for the LL distribution assumes the following form:

$$x(F) = a\left(\frac{1}{F} - 1\right)^{-K} \qquad (18)$$

L-moments for the LL distribution are as follows (e.g. Hosking & Wallis, 1997):

$$\lambda_r = a K^{r-1}\,\Gamma(1 + K)\,\Gamma(1 - K) \quad \text{for } r = 1, 2, 3 \qquad (19)$$

The dimensionless L-coefficient of variation and the L-skewness can be characterized by the same expression:

$$\tau = \frac{\lambda_2}{\lambda_1} = K = \tau_3 = \frac{\lambda_3}{\lambda_2} \qquad (20)$$

Again, the domain in which L-moments exist is larger than in the case of traditional moments and is defined by the inequality K < 1. Note that Fig. 5(a) and (b) also represent the dependence of the coefficients of variation and skewness on the coefficients of L-variation and L-skewness, since the latter are both equal to the model parameter K. All the questions of the previous section on the interpretation of L-moments still hold. Do the L-measures of the shape of probability density functions provide enough information from a sample when attention is paid to the tail part of the probability distribution? When criticizing the use of the LL distribution, one may also be reminded of the "condemnation" by Feller (1981) of the broad use of logistic distributions, which he treated as completely senseless and as just an artificial manipulation to replace other, well-established distributions.
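The equality of the L-coefficient of variation, the L-skewness and K can be verified with a short Python sketch. It assumes the quantile function x(F) = a(1/F - 1)^(-K) and computes the first three L-moments from probability-weighted moments, beta_j = integral of x(F) F^j over (0, 1), which for this quantile function reduce to beta functions. The function names are illustrative.

```python
import math


def beta_fn(m, n):
    """Beta function B(m, n) = Gamma(m) Gamma(n) / Gamma(m + n)."""
    return math.gamma(m) * math.gamma(n) / math.gamma(m + n)


def ll_pwm(j, a, K):
    """Probability-weighted moment beta_j of the LL distribution.

    With x(F) = a * (F / (1 - F))**K the integral of x(F) * F**j over (0, 1)
    equals a * B(1 + K + j, 1 - K), which is finite only for 0 < K < 1.
    """
    if not 0.0 < K < 1.0:
        raise ValueError(f"L-moments do not exist for K = {K}")
    return a * beta_fn(1.0 + K + j, 1.0 - K)


def ll_lmoments(a, K):
    """First three L-moments from the standard PWM relations."""
    b0, b1, b2 = (ll_pwm(j, a, K) for j in range(3))
    l1 = b0
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    return l1, l2, l3
```

For example, `ll_lmoments(2.0, 0.7)` returns finite values whose ratios l2/l1 and l3/l2 both equal 0.7 up to rounding, even though the variance does not exist for K = 0.7.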

NUMERICAL EXPERIMENT

Returning to the problem of estimating parameters with the various methods: as mentioned before, a precondition for the asymptotic convergence of the MOM and LMM parameter estimators is the existence of the relevant statistical moments. It has also been stated that the range of parameters for which those moments exist shrinks to zero as the order of the moments increases to infinity. When one limits one's considerations to the first two moments and L-moments, respectively, it is obvious that, even if the first moment does not exist, the mean taken over a finite random sample is a finite value, but it ceases to be an unbiased estimator of the first moment and becomes an increasing function of the sample size, i.e.:

$$\lim_{N \to \infty} m_1^{(N)} = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} x_i = \infty \qquad (21)$$


One deals with exactly the same problem when the second central moment does not exist:

$$\lim_{N \to \infty} \mu_2^{(N)} = \lim_{N \to \infty} \frac{1}{N-1} \sum_{i=1}^{N} \left(x_i - m_1^{(N)}\right)^2 = \infty \qquad (22)$$

Assume that one has an N-element random sample from a population following one of the considered pdfs, with parameters outside the domain of existence of both the first and second moments, or of the second moment only. By applying MOM, one would obtain an estimator from the range $\hat{a}_{MOM} \in (0, 0.5)$ for the log-Gumbel and $\hat{K}_{MOM} \in (0, 0.5)$ for the LL distribution. When the LMM is applied, the estimators are obtained from a larger domain, namely $\hat{a}_{LMM} \in (0, 1)$ and $\hat{K}_{LMM} \in (0, 1)$. The MLM does not impose any limitations on the sought estimators $\hat{a}_{MLM}$ and $\hat{K}_{MLM}$, and those estimators are asymptotically efficient and unbiased. The lack of convergence of the mentioned methods again results from the nonexistence of the moments, and a formal way to overcome this obstacle may be the application of the indirect methods of moments and linear moments. However, especially in the case of flood frequency analysis, significance is attached to the fitting of upper tails, and the use of logarithms would be very hard to justify (unless robustness to outliers is discussed). Logarithmic transformation reduces the information contained in the largest values of the sample. Despite this problem, the use of indirect MOM is a frequent practice in hydrological research.

A special numerical experiment was designed to illustrate the above-mentioned problems. To focus the discussion, the LG distribution is considered again; the conclusions for the LL distribution remain the same. A Monte Carlo method was applied to generate 500 random samples of N = 20 and N = 40 elements from the LG distribution with the parameter a equal to 0.25, 0.55 and 1.05, respectively, and u = 0.0. The parameter estimates were then obtained with three methods, namely MOM, LMM and MLM, and the results of the computations are shown in Table 1. The solutions of the relevant nonlinear algebraic equations were found with an evolutionary strategy in its so-called (μ + λ)-ES variant (e.g. Michalewicz, 1996). Table 1 shows that one can easily make a mistake when estimating the parameters from samples generated from a pdf with parameters outside the domain of existence of statistical moments or L-moments. For a sample of 40 elements generated with parameters a = 1.05 and u = 0.0, for example, MOM produces a as low as 0.49 (close to its limit of 0.5) and u = 0.05, and LMM gives a = 0.81 and u = 0.04. This divergence is more pronounced when shorter samples are considered. Such divergence may easily arise when a practitioner automatically applies one of the methods, MOM or LMM. On the other hand, it is shown that MLM is also not acceptable in practice for the considered pdfs (Weglarczyk et al., 2002).
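The MOM part of such an experiment can be sketched in Python. This is only an illustration under stated assumptions: the LG variate is taken as X = exp(u + aG) with G a standard Gumbel variate, so that the squared coefficient of variation is Γ(1 - 2a)/Γ²(1 - a) - 1 for a < 0.5; only a is estimated; and plain bisection replaces the (μ + λ)-ES solver used in the paper. All function names are hypothetical.

```python
import math
import random


def sample_lg(n, a, u, rng):
    """Draw n values X = exp(u + a*G), G standard Gumbel (assumed LG form)."""
    out = []
    while len(out) < n:
        p = rng.random()
        if p > 0.0:  # skip p == 0 to avoid log(0)
            out.append(math.exp(u - a * math.log(-math.log(p))))
    return out


def cv2_theory(a):
    """Squared Cv of the assumed LG form; defined only for 0 < a < 0.5."""
    return math.gamma(1.0 - 2.0 * a) / math.gamma(1.0 - a) ** 2 - 1.0


def mom_estimate_a(xs):
    """MOM estimate of a: match the sample Cv^2 by bisection on (0, 0.5)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    target = var / mean ** 2
    lo, hi = 1e-9, 0.5 - 1e-9
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cv2_theory(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# 500 samples of N = 40 generated with a = 1.05, where no moments exist
rng = random.Random(12345)
estimates = [mom_estimate_a(sample_lg(40, 1.05, 0.0, rng)) for _ in range(500)]
mean_estimate = sum(estimates) / len(estimates)
```

Whatever the true value of a, every entry of `estimates` lies below 0.5, because a finite sample always has a finite coefficient of variation. The MOM estimate therefore cannot reveal that the sample came from a distribution without moments.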

Table 1 Parameter estimates for the log-Gumbel distribution obtained with MOM, LMM and MLM.

Method   E(a)    σ(a)    Cs(a)    E(u)    σ(u)    Cs(u)
Generated for N = 20, a = 0.25, u = 0.0
MOM      0.209   0.130   -0.504   0.039   0.031    0.048
LMM      0.216   0.056   -0.504   0.027   0.034    1.065
MLM      0.242   0.044    0.253   0.038   0.027    0.729
Generated for N = 20, a = 0.55, u = 0.0
MOM      0.395   0.069   -0.670   0.061   0.029   -0.542
LMM      0.477   0.124   -0.393   0.031   0.034    0.943
MLM      0.533   0.088   -0.005   0.046   0.031    0.149
Generated for N = 20, a = 1.05, u = 0.0
MOM      0.483   0.024   -2.605   0.05    0.029   -0.014
LMM      0.781   0.144   -0.790   0.04    0.028    0.346
MLM      1.010   0.174   -0.170   0.049   0.049    0.061
Generated for N = 40, a = 0.25, u = 0.0
MOM      0.219   0.137   -1.315   0.034   0.028    0.678
LMM      0.229   0.037   -0.636   0.025   0.031    1.264
MLM      0.247   0.031   -0.102   0.037   0.013    0.479
Generated for N = 40, a = 0.55, u = 0.0
MOM      0.416   0.051   -0.558   0.064   0.027   -0.576
LMM      0.502   0.088   -0.213   0.031   0.034    0.237
MLM      0.538   0.067   -0.071   0.045   0.029    0.251
Generated for N = 40, a = 1.05, u = 0.0
MOM      0.491   0.016   -5.895   0.05    0.029    0.005
LMM      0.809   0.104   -0.469   0.041   0.028   -0.058
MLM      1.025   0.127   -0.016   0.048   0.028    0.066

CONCLUDING REMARKS

One may pose the question: why use particular probability density functions that reveal very inconvenient properties, i.e. nonexistence of statistical moments for certain values of parameters? Although they describe (by their shape) some physical processes in a reasonable way, it is suggested that one should rather look for other mathematical representations having better computational features or, at least, limit their use to absolutely necessary situations. It is usually possible to find a few statistical distributions revealing a similar goodness of fit to the data, but both the LG and LL distributions behave differently from most of the distributions used in hydrology. Therefore, some basic research in this respect is still necessary. A question is also raised about the applicability of L-moments when the values from the tail part of distributions provide useful information. In such cases, it is not obvious that L-moments enable secure inferences about the underlying distributions.

It is obvious that, when statistical moments exist, the estimates of distribution parameters from the method of moments, the method of linear moments and the maximum likelihood method should asymptotically lead to the same results. However, this property fails when one operates outside the parameter domain in which the statistical moments of the LG and LL distributions exist.

The discussion on the applicability of both distributions is continued from another perspective in the second paper in this two-part series (Weglarczyk et al., 2002), which considers the situation in which the assumed distribution differs from the true one.

Acknowledgements This study has been financed by the Polish State Committee for Scientific Research (Grant KBN no. 6 P 04 D 056 17), entitled "Revision of applicability of the parametric methods of estimation of statistical characteristics of floods". The authors are thankful to A. Kozlowski for his help in the numerical computations.

REFERENCES

Adamowski, K. (2000) Regional analysis of annual maximum and partial duration flood data by nonparametric and L-moment methods. J. Hydrol. 229, 219-231.

Ahmad, M. I., Sinclair, C. D. & Werritty, A. (1988) Log-logistic flood frequency analysis. J. Hydrol. 98, 205-224.

Cunnane, C. (1989) Statistical distributions for flood frequency analysis. WMO Operational Hydrological Report no. 33, 73. World Meteorological Organization, Geneva, Switzerland.

Feller, W. (1981) An Introduction to Probability Theory and its Applications, vol. 2. PWN, Warsaw, Poland (Polish translation).

Fitzgerald, D. L. (1996) Maximum product of spacings estimators for the generalized Pareto and log-logistic distributions. Stochast. Hydrol. Hydraul. 10, 1-15.

Gingras, D. & Adamowski, K. (1994) Performance of L-moments and nonparametric flood frequency analysis. Can. J. Civil Engng 21, 856-862.

Gunasekara, T. A. G. & Cunnane, C. (1992) Split sampling technique for selecting a flood frequency analysis procedure. J. Hydrol. 130, 189-200.

Haktanir, T. (1992) Comparison of various flood frequency distributions using annual flood peaks data of rivers in Anatolia. J. Hydrol. 136, 1-31.

Haktanir, T. & Horlacher, H. B. (1993) Evaluation of various distributions for flood frequency analysis. Hydrol. Sci. J. 38(1), 15-32.

Heo, J. H. & Salas, J. D. (1996) Estimation of quantiles and confidence intervals for the log-Gumbel distribution. Stochast. Hydrol. Hydraul. 10, 187-207.

Heo, J. H., Lee, D. J. & Kim, K. D. (2000) Caution on regional flood frequency analysis based on Weibull model. Wat. Engng Res. (KWRA) 1(1), 11-23.

Hosking, J. R. M. (1990) L-moments: analysis and estimation using linear combinations of order statistics. J. Roy. Statist. Soc. B 52(2), 105-124.

Hosking, J. R. M. & Wallis, J. R. (1997) Regional Frequency Analysis: An Approach Based on L-moments. Cambridge University Press, Cambridge, UK.

Kaczmarek, Z., Napiórkowski, J. J. & Rowinski, P. M. (1998) Conceptual catchment water balance model. In: Hydroinformatics '98 (ed. by V. Babovic & L. C. Larsen), 143-147. Balkema, Rotterdam, The Netherlands.

Kebaili-Bargaoui, Z. (1994) Comparison of some estimation methods in frequency analysis. J. Hydraul. Engng 120(2), 228-235.

Kendall, M. G. & Stuart, A. (1969) The Advanced Theory of Statistics, Part 1. Distribution theory, 109, 179. Griffin, London, UK.

Klemes, V. (2000a) Tall tales about tails of hydrological distributions, I. J. Hydrol. Engng ASCE 5(3), 227-231.

Klemes, V. (2000b) Tall tales about tails of hydrological distributions, II. J. Hydrol. Engng ASCE 5(3), 232-239.

Landwehr, J. M., Matalas, N. C. & Wallis, J. R. (1980) Quantile estimation with more or less floodlike distributions. Wat. Resour. Res. 16(3), 547-555.

Lee, K. S., Sadeghipour, J. & Dracup, J. A. (1986) An approach for frequency analysis of multiyear drought duration. Wat. Resour. Res. 22(5), 655-662.

Martins, E. S. & Stedinger, J. R. (2000) Generalized maximum-likelihood generalized extreme-value quantile estimations for hydrologic data. Wat. Resour. Res. 36(3), 737-744.

Michalewicz, Z. (1996) Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, Heidelberg, Germany.

Narda, N. K. & Malik, R. K. (1993) Dynamic model of root growth and water uptake in wheat. Indian J. Agric. Engng 3(3 & 4), 147-155.

Ochoa, I. D., Bryson, M. C. & Shen, H. W. (1980) On the occurrence and importance of Paretian-tailed distributions in hydrology. J. Hydrol. 48, 53-62.

Shen, H. W., Bryson, M. C. & Ochoa, I. D. (1980) Effect of tail behavior assumptions on flood predictions. Wat. Resour. Res. 16, 361-364.

Shoukri, M. M., Mian, I. U. H. & Tracy, D. S. (1988) Sampling properties of estimators of the log-logistic distribution with application to Canadian precipitation data. Can. J. Statist. 16(3), 223-236.

Singh, V. P. (1985) On the log-Gumbel (LG) distribution. Hydrol. J. E1HH(4), 34-42.

Singh, V. P. (1998) Entropy-based Parameter Estimation in Hydrology. Kluwer, Dordrecht, The Netherlands.

Singh, V. P., Guo, H. & Yu, F. X. (1993) Parameter estimation for 3-parameter log-logistic distribution (LLD3) by POME. Stochast. Hydrol. Hydraul. 7(3), 163-177.

Stedinger, J. R., Vogel, R. M. & Foufoula-Georgiou, E. (1993) Frequency analysis of extreme events. In: Handbook of Hydrology (ed. by D. R. Maidment), Chapter 18. McGraw-Hill, New York, USA.

Strupczewski, W. G., Singh, V. P. & Mitosek, H. T. (2001) Non-stationary approach to at-site flood frequency modeling, III. Flood analysis of Polish rivers. J. Hydrol. 248, 152-167.

Svanidze, G. G. & Grigolia, G. L. (1973) O vybore podchodzascego zakona raspredelenia dla rascota rechnego stoka (On the selection of a suitable distribution for the calculation of river flow, in Russian). J. Vodnye Resursy 6, 73-81.

Tate, E. L. & Freeman, S. N. (2000) Three modeling approaches for seasonal streamflow in southern Africa: the use of censored data. Hydrol. Sci. J. 45(1), 27-42.

Weglarczyk, S., Strupczewski, W. G. & Singh, V. P. (2002) A note on the applicability of log-Gumbel and log-logistic probability distributions in hydrological analyses: II. Assumed pdf. Hydrol. Sci. J. 47(1), 123-138.

Veneziano, D. & Villani, P. (1999) Best linear unbiased design hyetograph. Wat. Resour. Res. 35(9), 2725-2738.

Vogel, R. M. & Fennessey, N. M. (1993) L-moment diagrams should replace product-moment diagrams. Wat. Resour. Res. 29, 1745-1752.

Received 20 October 2000; accepted 27 August 2001