Hindawi Publishing Corporation
ISRN Probability and Statistics
Volume 2013, Article ID 829131, 12 pages
http://dx.doi.org/10.1155/2013/829131

Research Article
A Weighted Estimation for Risk Model

Mei Ling Huang¹ and Ke Zhao²

¹ Department of Mathematics, Brock University, St. Catharines, ON, Canada L2S 3A1
² Alberta Health, Edmonton, AB, Canada T5J 4R7

Correspondence should be addressed to Mei Ling Huang; mhuang@brocku.ca

Received 16 June 2013; Accepted 19 August 2013

Academic Editors: J. Abellan, P. Dai Pra, and S. Sagitov

Copyright © 2013 M. L. Huang and K. Zhao. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We propose a weighted estimation method for risk models. Two examples of natural disasters are studied: hurricane loss in the USA and forest fire loss in Canada. Risk data is often fitted by a heavy-tailed distribution, for example, a Pareto distribution, which has many applications in economics, actuarial science, survival analysis, networks, and other stochastic models. Inference for the Pareto distribution is difficult because it has infinite moments in the heavy-tailed case. Firstly, this paper applies the truncated Pareto distribution to overcome this difficulty. Secondly, we propose a weighted semiparametric method to estimate the truncated Pareto distribution. The idea of the new method is to place less weight on the extreme data values. This paper gives an exact efficiency function, $L_1$-optimal weights, and $L_2$-optimal weights of the new estimator. Monte Carlo simulation results confirm the theoretical conclusions. The two above-mentioned examples are analyzed by using the proposed method. This paper shows that the new estimation method is more efficient by mean square error relative to several existing methods and fits risk data well.

1. Introduction

1.1. Two Motivating Examples. In recent years, many extreme events have occurred in financial markets, natural disasters, disease control, and industrial quality control. Natural disasters, for example, earthquakes, hurricanes, forest fires, volcanoes, and floods, affect human life. It is important to predict and prepare for the next disaster occurrence and to estimate losses to inhabitants, insurance companies, and governments. In this section, we study two examples.

1.1.1. A Hurricane Loss Example. Strong winds, heavy rainfall, and storm surges caused by hurricanes cause death and destroy properties. They generate great losses to insurance companies as well. Figure 1 shows the 49 costliest Atlantic hurricane losses for the United States during 1900–2005 [1]. The hurricane loss data is measured in US dollars; all amounts have been adjusted by using the inflation rates from 1900 to 2005.

From the data in [1], we note that the most costly hurricane is the 1926 Great Miami Hurricane, with damage of 157 billion, which is 1.58 times larger than the second worst hurricane, the 1900 Galveston Hurricane. After 1926, on August 28, 2005, Hurricane Katrina caused damage of 81 billion. This is approximately 1.19 times larger than Hurricane Andrew in August 1992, which caused damage of 56 billion. On the other hand, we note that 80% of the hurricane losses are less than 21 billion. Many smaller hurricanes are not listed. These considerations raise a number of questions.

(a) How do we predict the loss of the next hurricane? Will it again be 1.58 times larger than the worst one so far?
(b) Do we keep the traditional approach, which is to prepare for the worst event? Should we focus on the extreme events?
(c) What is the value at risk with 5% probability?
(d) How do we set an upper limit of the losses?
(e) How do we estimate the distribution of largest losses?

The objective of this study is to find the best model to fit the empirical data and to answer the above questions as accurately as possible. Since the data has large losses, we consider that the data should fit a heavy-tailed distribution. Studies on Pareto-type heavy-tailed distributions are rapidly



increasing, with applications to extreme values, insurance, survival analysis, networks, and risk analysis [2, 3]. Therefore, we choose a Pareto model. In this paper, we propose a weighted method to study the hurricane loss data. The statistical analysis results are given in Section 6.

Figure 1: The 49 costliest US Atlantic hurricane losses, 1900–2005 (horizontal axis: year; vertical axis: loss in billions of US dollars).

1.1.2. A Forest Fire Loss Example. Large forest fires have a significant impact on natural, social, and economic systems. However, most fires are extinguished in the initial stages and thus remain small. These smaller fires have a large probability of occurrence, but the resulting damage is almost negligible on an individual basis. Large forest fires, however, have a low probability of occurrence, but the damage and the losses are huge. Modelling large fire losses is therefore becoming critical in the analysis of the risk of the next large forest fire. Figure 2 shows 30 forest fire losses during 1977 to 2006 in AB, Canada. We are concerned that the losses have been increasing over the last 30 years. The data listed in Table 1 are fire occurrence records from the Forestry Service (Council of Canadian Fire Marshals and Fire Commissioners, 2008, Canadian Fire Statistics, http://www.ccfmfc.ca/stats.html). The database includes forest fire records for all the insurance losses. The data contains a relatively substantial number of large losses, which convinces us to use a Pareto model. The statistical analysis results are given in Section 6.

1.2. The Truncated Pareto Distribution. Many risk models with heavy tails have been developed using the class of Pareto distributions. Examples include city population sizes, the occurrence of natural resources, the sizes of firms, and personal income [3]. The Pareto distribution belongs to the Fréchet domain, that is, to the extreme value distributions of type II [4]. It is important to explore estimation methods for the Pareto distribution. There are theoretical difficulties in studying the Pareto distribution since it has infinite moments in heavy-tailed situations. We propose using the truncated Pareto distribution in these models to overcome these difficulties. Usually we choose the upper limit as the largest value in the data set. In recent years, the truncated Pareto distribution has become an alternative model for the original Pareto distribution.

Figure 2: The 30 forest fire losses, 1977–2006, AB, Canada (horizontal axis: year; vertical axis: loss in millions).

There are several kinds of Pareto distributions. We consider a type I Pareto distribution in this paper [3].

Definition 1. The probability density function (pdf) and the cumulative distribution function (cdf) of a random variable $X$ having the Pareto distribution are given by

$$f_p(x) = \frac{\alpha\gamma^{\alpha}}{x^{\alpha+1}}, \quad 0 < \gamma \le x < \infty,\ \alpha > 0, \tag{1}$$

$$F_p(x) = 1 - \left(\frac{\gamma}{x}\right)^{\alpha}, \quad 0 < \gamma \le x < \infty,\ \alpha > 0, \tag{2}$$

where $\alpha$ is the shape parameter.

When $0 < \alpha \le 1$, which is a heavy-tailed case, the mean and variance of $X$ are infinite, and the distribution is heavier on the right tail as $\alpha$ decreases.

The truncated Pareto distribution was originally used to describe the distribution of oil fields by size. It has a lower limit $\gamma$, an upper limit $\nu$, and a shape parameter $\alpha$. In fact, it has been shown that the truncated Pareto distribution fits better than the nontruncated distribution for positively skewed populations [5].

Definition 2. The pdf and cdf of a random variable $X$ having the truncated Pareto distribution are given by

$$f(x) = \frac{\alpha\gamma^{\alpha}x^{-\alpha-1}}{1-(\gamma/\nu)^{\alpha}}, \quad 0 < \gamma \le x \le \nu < \infty,\ \alpha > 0, \tag{3}$$

$$F(x) = 1 - \frac{\gamma^{\alpha}\left(x^{-\alpha}-\nu^{-\alpha}\right)}{1-(\gamma/\nu)^{\alpha}}, \quad 0 < \gamma \le x \le \nu < \infty,\ \alpha > 0. \tag{4}$$

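Table 1: Forest fire loss data, AB, Canada (in million 2006 CAD).

| Year | Loss | Year | Loss | Year | Loss |
|------|-------|------|--------|------|--------|
| 1977 | 571.7 | 1987 | 956.2 | 1997 | 1291.6 |
| 1978 | 654.9 | 1988 | 1004.3 | 1998 | 1175.6 |
| 1979 | 756.4 | 1989 | 1119.6 | 1999 | 1231.9 |
| 1980 | 979.0 | 1990 | 1237.4 | 2000 | 1185.2 |
| 1981 | 905.4 | 1991 | 1239.7 | 2001 | 1420.8 |
| 1982 | 998.9 | 1992 | 1241.4 | 2002 | 1489.0 |
| 1983 | 816.0 | 1993 | 1181.9 | 2003 | 1356.1 |
| 1984 | 929.5 | 1994 | 1151.5 | 2004 | 1126.6 |
| 1985 | 926.0 | 1995 | 1110.8 | 2005 | 1265.9 |
| 1986 | 973.5 | 1996 | 1163.3 | 2006 | 1403.3 |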

The quantile function of the truncated Pareto distribution is

$$F^{-1}(u) = \left(\frac{1-u}{\gamma^{\alpha}} + \frac{u}{\nu^{\alpha}}\right)^{-1/\alpha}, \quad 0 \le u \le 1,\ 0 < \gamma \le \nu < \infty,\ \alpha > 0. \tag{5}$$

The mean and variance of $X$ are

$$\mu = \frac{\alpha\gamma^{\alpha}\left(\gamma^{1-\alpha}-\nu^{1-\alpha}\right)}{(\alpha-1)\left(1-(\gamma/\nu)^{\alpha}\right)}, \quad 0 < \gamma < \nu < \infty,\ \alpha > 0,\ \alpha \ne 1, \tag{6}$$

$$\sigma^{2} = \frac{\alpha\gamma^{\alpha}\left(\nu^{2-\alpha}-\gamma^{2-\alpha}\right)}{(2-\alpha)\left(1-(\gamma/\nu)^{\alpha}\right)} - \frac{\alpha^{2}\gamma^{2\alpha}\left(\nu^{1-\alpha}-\gamma^{1-\alpha}\right)^{2}}{(1-\alpha)^{2}\left(1-(\gamma/\nu)^{\alpha}\right)^{2}}, \quad 0 < \gamma < \nu < \infty,\ \alpha > 0,\ \alpha \ne 1, 2. \tag{7}$$
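The formulas (4)–(6) can be exercised numerically. The following is a minimal sketch, not the authors' MAPLE code: it implements the cdf, the quantile function, and the mean, and draws samples by inverse-transform sampling. The function names, the seed, and the parameter values $\alpha = 0.8$, $\gamma = 1$, $\nu = 10$ (borrowed from the setting of Table 2) are choices made for this illustration.

```python
import math
import random


def tp_cdf(x, alpha, gamma, nu):
    """Truncated Pareto cdf, equation (4)."""
    return 1.0 - gamma**alpha * (x**(-alpha) - nu**(-alpha)) / (1.0 - (gamma / nu)**alpha)


def tp_quantile(u, alpha, gamma, nu):
    """Truncated Pareto quantile function, equation (5)."""
    return ((1.0 - u) / gamma**alpha + u / nu**alpha) ** (-1.0 / alpha)


def tp_mean(alpha, gamma, nu):
    """Truncated Pareto mean, equation (6); not defined at alpha == 1."""
    num = alpha * gamma**alpha * (gamma**(1.0 - alpha) - nu**(1.0 - alpha))
    return num / ((alpha - 1.0) * (1.0 - (gamma / nu)**alpha))


def tp_sample(n, alpha, gamma, nu, rng):
    """Inverse-transform sampling: push uniform draws through the quantile function."""
    return [tp_quantile(rng.random(), alpha, gamma, nu) for _ in range(n)]


# Illustrative parameter values (assumptions of this sketch).
ALPHA, GAMMA, NU = 0.8, 1.0, 10.0
rng = random.Random(2013)  # arbitrary seed, for reproducibility only
draws = tp_sample(20000, ALPHA, GAMMA, NU, rng)
sample_mean = sum(draws) / len(draws)

print("theoretical mean:", tp_mean(ALPHA, GAMMA, NU))
print("simulated mean:  ", sample_mean)
```

Because the support is bounded by $\nu$, the simulated mean settles near the value from (6) even though $\alpha < 1$, which is the practical point of truncation.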

1.3. The Weighted Empirical Distribution Function. Recently, some parametric estimation methods for the truncated Pareto distribution have been developed [6]. But there is an efficiency problem in the estimates of the distribution tails. The distribution tail values and their probabilities are important in many fields, for example, value at risk in risk analysis, survival probability in survival analysis, tolerance limits in quality control, prediction intervals, and confidence intervals. Classical statistical inference theory depends on the classical empirical distribution function (EDF) $S_n(x)$:

$$S_n(x) = \frac{1}{n}\sum_{i=1}^{n} I_{(-\infty,x]}(X_i), \quad \text{where } I_A(x) = \begin{cases} 1 & \text{if } x \in A,\\ 0 & \text{if } x \notin A. \end{cases} \tag{8}$$

$S_n(x)$ is a minimum variance unbiased estimator for the cdf $F(x)$ based on a random sample $X_1, X_2, \dots, X_n$. Note that $S_n(x)$ uses the equal weight $1/n$ for each sample point.

Should we use equal weights on extreme data values as well? Recently, authors have applied various weights to data points by using different philosophies; for example, the Jackknife method gives zero weight to eliminated data [7], and the weighted bootstrap [8] and weighted empirical distribution functions or processes have been discussed [9, 10]. But it is difficult to determine what weights should be used for the data points. Huang and Brill [11] introduced a weighted level crossing estimation method from a geometric point of view to visualize random samples; in the $L_1$-optimal sense, the method improves the efficiency of the estimation of tails.

This paper proposes a semiparametric approach to estimate $F(x)$ in (4) using $L_1$-optimal and $L_2$-optimal weights. Both theoretical and simulation efficiencies are consistently improved when compared with existing methods. This method is based on a symmetric weighted empirical distribution function (SWEDF) of Huang [12], namely,

$$F_n^{*}(x) = \sum_{i=1}^{n} I_{(-\infty,x]}\left(X_{(i)}\right) p_{n,i}, \quad x \in \mathbb{R},\ n > 2, \tag{9}$$

where the $p_{n,i}$ are symmetric general weights

$$p_{n,i} \equiv \begin{cases} w, & i = 2, \dots, n-1,\ 0 < w < \dfrac{1}{n-2},\\[2mm] w_{1,n} \equiv \dfrac{1}{2}\bigl(1-(n-2)w\bigr), & i = 1, n, \end{cases} \tag{10}$$

and $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are the order statistics of the random sample. Note that

$$0 < p_{n,i} < 1,\ i = 1, \dots, n; \qquad \sum_{i=1}^{n} p_{n,i} = 1. \tag{11}$$
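To make (9)–(11) concrete, here is a small sketch of the symmetric weights and the resulting SWEDF. The function names and the toy sample are assumptions of this illustration; only the weight formula comes from (10).

```python
def symmetric_weights(n, w):
    """Weights p_{n,i} of equation (10): w for the middle n - 2 points,
    (1 - (n - 2) w) / 2 for each of the two extreme points."""
    if n <= 2:
        raise ValueError("equation (9) requires n > 2")
    if not 0.0 < w < 1.0 / (n - 2):
        raise ValueError("equation (10) requires 0 < w < 1 / (n - 2)")
    w_extreme = 0.5 * (1.0 - (n - 2) * w)
    return [w_extreme] + [w] * (n - 2) + [w_extreme]


def swedf(x, sample, w):
    """Symmetric weighted empirical distribution function F*_n(x), equation (9)."""
    ordered = sorted(sample)  # order statistics X_(1) <= ... <= X_(n)
    weights = symmetric_weights(len(ordered), w)
    return sum(p for xi, p in zip(ordered, weights) if xi <= x)


toy_sample = [3.0, 1.0, 4.0, 1.5, 9.0]  # made-up data, n = 5
weights = symmetric_weights(len(toy_sample), 0.25)
print(weights)                          # extremes get 0.125, middle points 0.25
print(swedf(3.5, toy_sample, 0.25))     # weighted version
print(swedf(3.5, toy_sample, 1.0 / 5))  # w = 1/n gives back the classical EDF
```

Choosing $w = 1/n$ makes every weight equal to $1/n$, so the classical EDF in (8) is the special case of (9) with equal weights.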

The parameter $w$ in (10) is the weight for the middle $n-2$ data; $w_{1,n}$ in (10) is the weight for the extreme data. It is interesting to explore how the value of $w$ affects the estimation of a heavy-tailed distribution. We may use $w_{1,n}$ flexibly.

In Section 2, we propose a weighted method for estimating the shape parameter and the mean of the truncated Pareto distribution. In Section 3, an exact efficiency function of the new mean estimator relative to the classical estimator is derived. Section 4 explores the $L_1$-optimal weights and $L_2$-optimal weights for estimating the mean. Section 5 gives results of Monte Carlo simulations. The simulation efficiencies are consistent with the exact efficiencies in Section 3. In Section 6, we analyze the hurricane loss data and forest fire loss data given in Section 1 by using the proposed method. The statistical inference in these examples shows that the distribution curve estimated by the proposed weighted estimation method fits the tails of the data better than several existing methods. Suggestions for further studies are also discussed.

2. Estimation Methods

In this section, we discuss the existing and proposed methodologies for the truncated Pareto distribution. Consider a random sample $X_1, X_2, \dots, X_n$ from the distribution in (3), and let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ denote its order statistics.

2.1. Maximum Likelihood Estimators (Hill, Beg, and Aban). There are several different maximum likelihood estimators (MLE) for estimating the shape parameter $\alpha$.

A popular estimator is the Hill [13] MLE, which uses the $r$ largest order statistics, $1 \le r \le n-1$, to estimate the original Pareto shape parameter $\alpha$ in (1). When applying it to the truncated Pareto distribution in (3), it is defined as

$$\hat{\alpha}_{\text{Hill}} = \left[ r^{-1}\sum_{i=1}^{r}\left(\ln X_{(n-i+1)} - \ln X_{(n-r)}\right) \right]^{-1}. \tag{12}$$
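Equation (12) translates almost line by line into code. The sketch below is one possible reading of it; the function name and both test samples are assumptions of this illustration. The second sample is drawn from an untruncated Pareto distribution via the inverse of (2).

```python
import math
import random


def hill_estimator(sample, r):
    """Hill estimator of the Pareto shape, equation (12), from the r largest order statistics."""
    ordered = sorted(sample)
    n = len(ordered)
    if not 1 <= r <= n - 1:
        raise ValueError("equation (12) requires 1 <= r <= n - 1")
    log_threshold = math.log(ordered[n - r - 1])  # ln X_(n-r), 0-based index n - r - 1
    # X_(n-i+1) for i = 1..r are the r largest values: 0-based indices n-1 down to n-r.
    mean_log_excess = sum(math.log(ordered[n - i]) - log_threshold for i in range(1, r + 1)) / r
    return 1.0 / mean_log_excess


# Hand-checkable sample: logs are 0, 1, 2, 3, 4, so with r = 4 the mean log excess is 2.5.
tiny = [math.exp(k) for k in range(5)]
print(hill_estimator(tiny, 4))

# Simulated untruncated Pareto sample with shape 1.8 and gamma = 1 (illustrative values).
rng = random.Random(7)
pareto_draws = [(1.0 - rng.random()) ** (-1.0 / 1.8) for _ in range(5000)]
alpha_hat = hill_estimator(pareto_draws, len(pareto_draws) - 1)
print(alpha_hat)
```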

Beg [5] developed the MLE method for the truncated Pareto distribution when $\nu$ is known. The Beg MLE for $\alpha$ can be obtained by solving the following equation:

$$n\left[\frac{1}{\hat{\alpha}_{\text{Beg}}} + \frac{\ln\left(X_{(1)}/\nu\right)}{\left(X_{(1)}/\nu\right)^{-\hat{\alpha}_{\text{Beg}}} - 1} - \ln\left(\frac{g}{X_{(1)}}\right)\right] = 0, \tag{13}$$

where $X_{(1)} = \min(X_1, X_2, \dots, X_n)$ and $g = \left(X_{(1)}X_{(2)}\cdots X_{(n)}\right)^{1/n}$ is the sample geometric mean.

Aban's MLE [6], when $\gamma$ and $\nu$ are known, is obtained by solving the equation

$$\frac{n}{\hat{\alpha}_{\text{Aban}}} + \frac{n\,(\gamma/\nu)^{\hat{\alpha}_{\text{Aban}}}\ln(\gamma/\nu)}{1-(\gamma/\nu)^{\hat{\alpha}_{\text{Aban}}}} - \sum_{i=1}^{n}\left[\ln X_{(i)} - \ln\gamma\right] = 0, \tag{14}$$

where $\gamma \le X_{(i)} \le \nu$ and $n$ is the sample size.

Note that we may use the estimators $\hat{\gamma} = \min(X_1, X_2, \dots, X_n)$ and $\hat{\nu} = \max(X_1, X_2, \dots, X_n)$ in (14) when $\gamma$ and $\nu$ are unknown. The same applies to equations (16) and (18) below. There are other estimators of $\gamma$ and $\nu$ in the literature, for example, Cooke [14].
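Equation (14) has no closed-form solution, so it has to be solved numerically. The sketch below uses plain bisection, which is a choice of this illustration and not necessarily what the cited authors used; the bracket $[10^{-6}, 50]$, the tolerance, and all names are likewise assumptions. The left-hand side of (14) is decreasing in $\alpha$, so a sign change on the bracket identifies the root; when there is none, the function raises an error instead of returning a value.

```python
import math
import random


def aban_score(alpha, xs, gamma, nu):
    """Left-hand side of equation (14) as a function of alpha."""
    n = len(xs)
    log_q = math.log(gamma / nu)            # negative because gamma < nu
    q_alpha = math.exp(alpha * log_q)       # (gamma / nu) ** alpha
    one_minus = -math.expm1(alpha * log_q)  # 1 - (gamma / nu) ** alpha, computed stably
    log_excess = sum(math.log(x) - math.log(gamma) for x in xs)
    return n / alpha + n * q_alpha * log_q / one_minus - log_excess


def aban_mle(xs, gamma=None, nu=None, lo=1e-6, hi=50.0, tol=1e-10):
    """Solve equation (14) by bisection; gamma and nu default to the sample min and max."""
    gamma = min(xs) if gamma is None else gamma
    nu = max(xs) if nu is None else nu
    if aban_score(lo, xs, gamma, nu) <= 0.0 or aban_score(hi, xs, gamma, nu) >= 0.0:
        raise ValueError("no sign change on [lo, hi]; equation (14) has no root there")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if aban_score(mid, xs, gamma, nu) > 0.0:
            lo = mid  # score still positive: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tp_quantile(u, alpha, gamma, nu):
    """Truncated Pareto quantile function, equation (5), used here only to simulate data."""
    return ((1.0 - u) / gamma**alpha + u / nu**alpha) ** (-1.0 / alpha)


rng = random.Random(14)  # arbitrary seed
data = [tp_quantile(rng.random(), 0.8, 1.0, 10.0) for _ in range(2000)]
alpha_known = aban_mle(data, gamma=1.0, nu=10.0)  # gamma, nu treated as known
alpha_plugin = aban_mle(data)                     # gamma, nu replaced by min and max
print(alpha_known, alpha_plugin)
```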

2.2. Moment Estimator. To estimate the population truncated Pareto mean, the sample mean estimator is

$$\hat{\mu}_X = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i. \tag{15}$$

A Moment estimator $\hat{\alpha}_M$ for estimating $\alpha$ is the solution of the equation

$$\bar{X} = \frac{\hat{\alpha}_M\gamma^{\hat{\alpha}_M}\left(\gamma^{1-\hat{\alpha}_M}-\nu^{1-\hat{\alpha}_M}\right)}{\left(\hat{\alpha}_M-1\right)\left(1-(\gamma/\nu)^{\hat{\alpha}_M}\right)}, \quad \hat{\alpha}_M > 0. \tag{16}$$

2.3. A Proposed Weighted Estimator. Now, to estimate the population truncated Pareto mean, we define a weighted mean based on the weighted empirical distribution function in (9) as

$$\hat{\mu}_w = \bar{X}_w = \sum_{i=1}^{n} p_{n,i}X_{(i)} = \sum_{i=2}^{n-1} wX_{(i)} + \frac{1}{2}\bigl(1-(n-2)w\bigr)\left[X_{(1)} + X_{(n)}\right], \tag{17}$$

where $n$ is the sample size and $w$ is the weight as defined in (10). Then, for estimating $\alpha$, we define $\hat{\alpha}_w$ to be the solution of the equation

$$\bar{X}_w = \frac{\hat{\alpha}_w\gamma^{\hat{\alpha}_w}\left(\gamma^{1-\hat{\alpha}_w}-\nu^{1-\hat{\alpha}_w}\right)}{\left(\hat{\alpha}_w-1\right)\left(1-(\gamma/\nu)^{\hat{\alpha}_w}\right)}, \quad \hat{\alpha}_w > 0. \tag{18}$$
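Equations (16) and (18) have the same form and differ only in the left-hand side, so one solver covers both. The sketch below is an illustration under stated assumptions: it uses bisection on $[10^{-6}, 50]$ (the truncated Pareto mean decreases as $\alpha$ grows), treats $\gamma = 1$ and $\nu = 10$ as known, and uses the weight $1/\sqrt{n(n-1)}$ that the paper introduces later in (23). All function names are hypothetical.

```python
import math
import random


def tp_mean(alpha, gamma, nu):
    """Truncated Pareto mean, equation (6), with the alpha -> 1 limit handled separately."""
    if abs(alpha - 1.0) < 1e-9:
        return gamma * math.log(nu / gamma) / (1.0 - gamma / nu)
    num = alpha * gamma**alpha * (gamma**(1.0 - alpha) - nu**(1.0 - alpha))
    return num / ((alpha - 1.0) * (1.0 - (gamma / nu)**alpha))


def weighted_mean(sample, w):
    """Weighted mean of equation (17): w on the middle points, (1 - (n-2)w)/2 on min and max."""
    xs = sorted(sample)
    n = len(xs)
    w_extreme = 0.5 * (1.0 - (n - 2) * w)
    return w * sum(xs[1:-1]) + w_extreme * (xs[0] + xs[-1])


def solve_alpha(target_mean, gamma, nu, lo=1e-6, hi=50.0, tol=1e-10):
    """Solve tp_mean(alpha) = target_mean for alpha, i.e. equation (16) or (18), by bisection."""
    if not tp_mean(hi, gamma, nu) < target_mean < tp_mean(lo, gamma, nu):
        raise ValueError("target mean is outside the range attainable on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tp_mean(mid, gamma, nu) > target_mean:
            lo = mid  # model mean too large: alpha must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tp_quantile(u, alpha, gamma, nu):
    """Quantile function (5), used only to simulate an example sample."""
    return ((1.0 - u) / gamma**alpha + u / nu**alpha) ** (-1.0 / alpha)


GAMMA, NU = 1.0, 10.0
rng = random.Random(18)  # arbitrary seed
sample = [tp_quantile(rng.random(), 0.8, GAMMA, NU) for _ in range(100)]
n = len(sample)

x_bar = weighted_mean(sample, 1.0 / n)                   # w = 1/n reproduces the sample mean (15)
x_bar_w = weighted_mean(sample, 1.0 / math.sqrt(n * (n - 1)))

alpha_m = solve_alpha(x_bar, GAMMA, NU)    # Moment estimator, equation (16)
alpha_w = solve_alpha(x_bar_w, GAMMA, NU)  # weighted estimator, equation (18)
print(x_bar, x_bar_w, alpha_m, alpha_w)
```

For right-skewed samples the weighted mean is typically smaller than the sample mean, because the down-weighted maximum sits far above the bulk of the data; the solver then returns a larger shape estimate.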

3. An Exact Efficiency Function

In this section, an exact efficiency function of the weighted estimator $\hat{\mu}_w$ in (17) relative to the sample mean $\bar{X}$ in (15) is given. The mathematical derivations are given in the appendix.

Theorem 3. The mean and mean square error (MSE) of $\hat{\mu}_w$ in (17), when $\alpha > 0$, are given by

$$E\left[\hat{\mu}_w\right] = w\,\gamma\sum_{i=2}^{n-1}\sum_{k=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\rho^{k}\,\frac{\Gamma(n+1)\Gamma(k+i)}{\Gamma(i)\Gamma(k+n+1)} + \frac{1}{2}\bigl(1-(n-2)w\bigr)\,\gamma\sum_{k=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\rho^{k}\left(\frac{\Gamma(n+1)\Gamma(k+1)}{\Gamma(k+n+1)} + \frac{n}{k+n}\right), \tag{19}$$

$$\mathrm{MSE}\left(\hat{\mu}_w\right) = aw^{2} + bw + c, \quad \text{where } 0 < w < 1,$$

$$a = A + \frac{1}{4}(n-2)^{2}B - (n-2)C,$$

$$b = -\frac{1}{2}(n-2)B + C - 2\mu D + \mu(n-2)E,$$

$$c = \frac{1}{4}B - \mu E + \mu^{2},$$

$$A = \gamma^{2}\sum_{i=2}^{n-1}\sum_{k=0}^{\infty}\binom{\frac{2}{\alpha}+k-1}{k}\rho^{k}\,\frac{\Gamma(n+1)\Gamma(k+i)}{\Gamma(i)\Gamma(k+n+1)} + 2\gamma^{2}\sum_{i=2}^{n-2}\sum_{j=i+1}^{n-1}\sum_{k=0}^{\infty}\sum_{l=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\binom{\frac{1}{\alpha}+l-1}{l}\rho^{k+l}\left[\frac{\Gamma(n+1)\Gamma(k+i)\Gamma(k+l+j)}{\Gamma(i)\Gamma(k+j)\Gamma(k+l+n+1)}\right],$$

$$B = \gamma^{2}\sum_{k=0}^{\infty}\binom{\frac{2}{\alpha}+k-1}{k}\rho^{k}\left[\frac{\Gamma(n+1)\Gamma(k+1)}{\Gamma(k+n+1)} + \frac{n}{k+n}\right] + 2\gamma^{2}\sum_{k=0}^{\infty}\sum_{l=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\binom{\frac{1}{\alpha}+l-1}{l}\rho^{k+l}\left[\frac{\Gamma(n+1)\Gamma(k+1)}{\Gamma(k+n)\,(k+l+n)}\right],$$

$$C = \gamma^{2}\sum_{i=2}^{n-1}\sum_{k=0}^{\infty}\sum_{l=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\binom{\frac{1}{\alpha}+l-1}{l}\rho^{k+l}\left[\frac{\Gamma(n+1)\Gamma(k+1)\Gamma(k+l+i)}{\Gamma(k+i)\Gamma(k+l+n+1)} + \frac{\Gamma(n+1)\Gamma(k+i)}{\Gamma(i)\Gamma(k+n)\,(k+l+n)}\right],$$

$$D = \gamma\sum_{i=2}^{n-1}\sum_{k=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\rho^{k}\,\frac{\Gamma(n+1)\Gamma(k+i)}{\Gamma(i)\Gamma(k+n+1)},$$

$$E = \gamma\sum_{k=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\rho^{k}\left[\frac{\Gamma(n+1)\Gamma(k+1)}{\Gamma(k+n+1)} + \frac{n}{k+n}\right], \tag{20}$$

where $\rho = (g^{\alpha}-1)/g^{\alpha}$, $g = \nu/\gamma$, $0 < \rho < 1$, $g > 1$, $0 < \gamma \le x \le \nu < \infty$, and

$$\Gamma(z) = \int_{0}^{\infty} t^{z-1}e^{-t}\,dt, \quad z > 0, \text{ is the Gamma function.} \tag{21}$$
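The building block of (19) and (20) is the series for the expected value of a single order statistic, $E[X_{(i)}] = \gamma\sum_{k}\binom{1/\alpha+k-1}{k}\rho^{k}\,\Gamma(n+1)\Gamma(k+i)/(\Gamma(i)\Gamma(k+n+1))$, which appears inside $D$ (for $i = 2, \dots, n-1$) and $E$ (for $i = 1$ and $i = n$). The sketch below evaluates that series with log-gamma functions and checks it against the closed-form mean in (6), using the fact that the expected order statistics must sum to $n\mu$. The truncation at 600 terms, the small sample size $n = 5$, and the function names are assumptions of this illustration.

```python
import math


def log_gen_binom(s, k):
    """log of the generalized binomial coefficient C(s + k - 1, k) for real s > 0."""
    return math.lgamma(s + k) - math.lgamma(s) - math.lgamma(k + 1)


def expected_order_stat(i, n, alpha, gamma, nu, terms=600):
    """Series for E[X_(i)] in a truncated Pareto sample of size n (summand of D and E in (20))."""
    rho = 1.0 - (gamma / nu) ** alpha  # same as (g^alpha - 1) / g^alpha with g = nu / gamma
    total = 0.0
    for k in range(terms):
        log_term = (log_gen_binom(1.0 / alpha, k) + k * math.log(rho)
                    + math.lgamma(n + 1) + math.lgamma(k + i)
                    - math.lgamma(i) - math.lgamma(k + n + 1))
        total += math.exp(log_term)
    return gamma * total


def tp_mean(alpha, gamma, nu):
    """Closed-form truncated Pareto mean, equation (6)."""
    num = alpha * gamma**alpha * (gamma**(1.0 - alpha) - nu**(1.0 - alpha))
    return num / ((alpha - 1.0) * (1.0 - (gamma / nu)**alpha))


ALPHA, GAMMA, NU, N = 0.8, 1.0, 10.0, 5  # illustrative values
order_means = [expected_order_stat(i, N, ALPHA, GAMMA, NU) for i in range(1, N + 1)]
D = sum(order_means[1:-1])             # middle order statistics, i = 2..n-1
E = order_means[0] + order_means[-1]   # the two extremes, i = 1 and i = n

w = 1.0 / N                            # equal weights: the weighted mean is the sample mean
mean_of_weighted = w * D + 0.5 * (1.0 - (N - 2) * w) * E   # equation (19)
print(order_means)
print(mean_of_weighted, tp_mean(ALPHA, GAMMA, NU))
```

With equal weights, (19) reduces to $\mu$, which gives a quick sanity check of the series before it is used for other values of $w$.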

Corollary 4. The efficiency function (EFF) of $\hat{\mu}_w$ in (17) relative to the sample mean $\bar{X}$ in (15) is given by

$$\mathrm{EFF}\left(\hat{\mu}_w\right) = \frac{\mathrm{Var}(\bar{X})}{\mathrm{MSE}\left(\hat{\mu}_w\right)} = \frac{\sigma^{2}/n}{\mathrm{MSE}\left(\hat{\mu}_w\right)} = \frac{\alpha\gamma^{\alpha}\left(\nu^{2-\alpha}-\gamma^{2-\alpha}\right)}{(2-\alpha)\left(1-(\gamma/\nu)^{\alpha}\right)\,n\,\mathrm{MSE}\left(\hat{\mu}_w\right)} - \frac{\alpha^{2}\gamma^{2\alpha}\left(\nu^{1-\alpha}-\gamma^{1-\alpha}\right)^{2}}{(1-\alpha)^{2}\left(1-(\gamma/\nu)^{\alpha}\right)^{2}\,n\,\mathrm{MSE}\left(\hat{\mu}_w\right)}, \tag{22}$$

where $\sigma^{2}$ and $\mathrm{MSE}(\hat{\mu}_w)$ are given in (7) and (20).

4. Optimal Weights

4.1. $L_1$-Optimal Weights. Huang and Brill [11] proposed an $L_1$-optimal weight, which is based on the Manhattan metric, for the $F_n^{*}(x)$ in (9). It is

$$w_{L_1\text{-opt}} = \frac{1}{\sqrt{n(n-1)}} > \frac{1}{n} \quad \text{for the middle data.} \tag{23}$$

Huang and Brill [11] proved that the exact efficiency (EFF) of $F_n^{*}(x)$ in (9) relative to the EDF $S_n(x)$ exceeds 1 on the tails of the distribution. In this paper, we use this $w_{L_1\text{-opt}}$ weight in order to improve the efficiency of estimating the tail probability of the truncated Pareto distribution. Huang [12] indicates that, in general, for any distribution, if $w > 1/n$, which means putting more weight on the middle data, the efficiency of estimating the tail probability will be improved.
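A quick numerical look at (23) shows how the weight compares with the equal weight $1/n$ and what it implies for the two extreme points through (10). The function names are assumptions of this sketch; the sample sizes are those used in Table 2.

```python
import math


def l1_optimal_weight(n):
    """L1-optimal weight for the middle n - 2 points, equation (23)."""
    return 1.0 / math.sqrt(n * (n - 1))


def extreme_weight(n, w):
    """Weight left for each of the two extreme points, from equation (10)."""
    return 0.5 * (1.0 - (n - 2) * w)


for n in (10, 20, 30, 50, 100):
    w = l1_optimal_weight(n)
    # columns: n, middle weight, equal weight 1/n, extreme weight
    print(n, round(w, 5), round(1.0 / n, 5), round(extreme_weight(n, w), 5))
```

The middle weight is always slightly above $1/n$ and the extreme weight slightly below it, and the gap shrinks as $n$ grows.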

Next, we explore an alternative $L_2$-optimal weight.

4.2. $L_2$-Optimal Weights

Corollary 5. An $L_2$-optimal weight $w_{L_2\text{-opt}}$ for the efficiency function of the $\hat{\mu}_w$ given in (17) for estimating the population mean of the distribution in (3), relative to the sample mean $\bar{X}$ in (15), when $n > 4C/B + 2$, is given by

$$w_{L_2\text{-opt}} = -\frac{b}{2a}. \tag{24}$$

The minimum $\mathrm{MSE}_{\min}(\hat{\mu}_w)$ and $\mathrm{EFF}_{\max}(\hat{\mu}_w)$ are given by

$$\mathrm{MSE}_{\min}\left(\hat{\mu}_w\right) = -\frac{b^{2}}{4a} + c,$$

$$\mathrm{EFF}_{\max}\left(\hat{\mu}_w\right) = \frac{\alpha\gamma^{\alpha}\left(\nu^{2-\alpha}-\gamma^{2-\alpha}\right)}{(2-\alpha)\left(1-(\gamma/\nu)^{\alpha}\right)\,n\left(-b^{2}/(4a)+c\right)} - \frac{\alpha^{2}\gamma^{2\alpha}\left(\nu^{1-\alpha}-\gamma^{1-\alpha}\right)^{2}}{(1-\alpha)^{2}\left(1-(\gamma/\nu)^{\alpha}\right)^{2}\,n\left(-b^{2}/(4a)+c\right)}. \tag{25}$$

In (24) and (25), $a$, $b$, $c$, $C$, and $B$ are defined in (20).
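The reasoning behind (24) and (25) can be sketched from the quadratic form of the MSE in (20) alone. The last line is our reading of why the condition $n > 4C/B + 2$ appears: with $A > 0$ and $B > 0$ (both are sums of second moments of positive quantities), it is sufficient for $a > 0$, so that the stationary point is a minimum. The paper's own proof is in its appendix and may proceed differently.

```latex
\begin{aligned}
\frac{d}{dw}\,\mathrm{MSE}(\hat{\mu}_w) &= 2aw + b = 0
  \quad\Longrightarrow\quad w_{L_2\text{-opt}} = -\frac{b}{2a}, \\[1mm]
\mathrm{MSE}_{\min}(\hat{\mu}_w) &= a\,\frac{b^{2}}{4a^{2}} - \frac{b^{2}}{2a} + c
  = -\frac{b^{2}}{4a} + c, \\[1mm]
a &= A + (n-2)\left[\frac{(n-2)B}{4} - C\right] > 0
  \quad\text{whenever } n > \frac{4C}{B} + 2 .
\end{aligned}
```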

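Table 2: Exact efficiencies of $\hat{\mu}_w$ relative to other estimators by using $L_2$-optimal weights; $\gamma = 1$, $\nu = 10$.

| Sample size | $\alpha$ | $w_{L_2\text{-opt}}$ | $w = 1/n$ | $w_{L_1\text{-opt}}$ | MSE($\hat{\mu}_w$) | MSE($\bar{X}$) | EFF($\hat{\mu}_w$) relative to $\bar{X}$ |
|-------------|-----|---------|---------|---------|--------|--------|--------|
| $n = 10$ | 0.8 | 0.10396 | 0.1 | 0.10541 | 0.4007 | 0.4034 | 1.0068 |
| $n = 10$ | 1.8 | 0.11446 | 0.1 | 0.10541 | 0.1406 | 0.1647 | 1.1717 |
| $n = 10$ | 3 | 0.11847 | 0.1 | 0.10541 | 0.0365 | 0.0493 | 1.3501 |
| $n = 20$ | 0.8 | 0.05015 | 0.05 | 0.05130 | 0.2017 | 0.2017 | 1.0001 |
| $n = 20$ | 1.8 | 0.05255 | 0.05 | 0.05130 | 0.0766 | 0.0824 | 1.0759 |
| $n = 20$ | 3 | 0.05463 | 0.05 | 0.05130 | 0.0217 | 0.0247 | 1.1367 |
| $n = 30$ | 0.8 | 0.03323 | 0.03333 | 0.03390 | 0.1344 | 0.1345 | 1.0003 |
| $n = 30$ | 1.8 | 0.03425 | 0.03333 | 0.03390 | 0.0528 | 0.0549 | 1.0399 |
| $n = 30$ | 3 | 0.03529 | 0.03333 | 0.03390 | 0.0155 | 0.0164 | 1.0628 |
| $n = 50$ | 0.8 | 0.02021 | 0.02 | 0.02020 | 0.0968 | 0.1063 | 1.0978 |
| $n = 50$ | 1.8 | 0.02159 | 0.02 | 0.02020 | 0.0514 | 0.0592 | 1.1536 |
| $n = 50$ | 3 | 0.02298 | 0.02 | 0.02020 | 0.0274 | 0.0352 | 1.2849 |
| $n = 100$ | 0.8 | 0.01001 | 0.01 | 0.01005 | 0.0765 | 0.0830 | 1.0845 |
| $n = 100$ | 1.8 | 0.01002 | 0.01 | 0.01005 | 0.0542 | 0.0576 | 1.0637 |
| $n = 100$ | 3 | 0.01002 | 0.01 | 0.01005 | 0.0380 | 0.0400 | 1.0526 |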

The proof of Corollary 5 is in the appendix.

Table 2 lists the values of $w_{L_2\text{-opt}}$, $w_{L_1\text{-opt}}$, $\mathrm{MSE}_{\min}(\hat{\mu}_w)$, $\mathrm{MSE}(\bar{X})$, and the exact $\mathrm{EFF}_{\max}(\hat{\mu}_w)$ of $\hat{\mu}_w$ relative to $\bar{X}$ for $n = 10, 20, 30, 50$, and $100$ and $\alpha = 0.8, 1.8$, and $3$, by using (24) and (25). We note that all values of $w_{L_2\text{-opt}}$ are greater than $1/n$ and all exact $\mathrm{EFF}_{\max}(\hat{\mu}_w)$ relative to $\bar{X}$ are greater than 1.

Remark 6. The $w_{L_1\text{-opt}}$ in (23) is totally nonparametric; it is more robust and easy to use. Note that $w_{L_2\text{-opt}}$ in (24) depends on the parameter $\alpha$. In practice, we may estimate $\alpha$ first and then obtain a $w_{L_2\text{-opt}}$ while still keeping the optimal advantage. Of course, we use the given $\alpha$ values in the simulations. However, $w_{L_1\text{-opt}}$ and $w_{L_2\text{-opt}}$ are close to each other when $n > 100$.

5. Simulations

Next, we use simulations to compare the performance of the three parametric MLE estimators, that is, Hill's, Beg's, and Aban's estimators in (12), (13), and (14), and the two semiparametric estimators, that is, the Moment and the weighted estimators in (16) and (18), for estimating the shape parameter $\alpha$ in (3). We generate $m = 1000$ random samples of size $n = 100$ from the distribution of (3). Only $w_{L_1\text{-opt}}$ is used for the weighted estimator, since Table 2 indicates that the values of $w_{L_2\text{-opt}}$ and $w_{L_1\text{-opt}}$ are close to each other when $n = 100$.

We know that if $\alpha \le 2$, the original Pareto distribution has an infinite variance; if $\alpha \le 1$, then the mean is infinite. These cases have inference difficulties. We focus on $\alpha = 0.5$, $0.8$, and $1.8$ and let $\gamma = 1$, $\nu = 10$. Figure 3 contains the box-plots of the comparison of these five estimators. Note that in the cases $\alpha = 0.5$ and $0.8$, Beg's and Hill's estimators have large biases, and the weighted, Aban's, and the Moment estimators performed very well, but sometimes Aban's MLE estimates have unstable solutions. When $\alpha = 1.8$, there are similar results, except that Hill's and Beg's estimators performed better. It is interesting to see that in all three cases, the median of the weighted estimator for $\alpha$ is relatively larger than those of Aban's and the Moment estimators. We will discuss how these affect the tail estimation in the next section. The simulations were run by using MAPLE 15 with double precision.
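A Monte Carlo experiment of this kind is easy to sketch. The code below is not the paper's MAPLE program and does not reproduce Figure 3; as a simpler illustration, it compares the mean square error of the weighted mean (17), using the $L_1$-optimal weight (23), with that of the sample mean (15) in one setting of Table 2 ($n = 10$, $\alpha = 3$, $\gamma = 1$, $\nu = 10$). The number of replications, the seed, and all names are assumptions of this sketch.

```python
import math
import random


def tp_quantile(u, alpha, gamma, nu):
    """Truncated Pareto quantile function, equation (5)."""
    return ((1.0 - u) / gamma**alpha + u / nu**alpha) ** (-1.0 / alpha)


def tp_mean(alpha, gamma, nu):
    """Truncated Pareto mean, equation (6); alpha must differ from 1."""
    num = alpha * gamma**alpha * (gamma**(1.0 - alpha) - nu**(1.0 - alpha))
    return num / ((alpha - 1.0) * (1.0 - (gamma / nu)**alpha))


def weighted_mean(sorted_xs, w):
    """Weighted mean (17) for an already sorted sample."""
    n = len(sorted_xs)
    w_extreme = 0.5 * (1.0 - (n - 2) * w)
    return w * sum(sorted_xs[1:-1]) + w_extreme * (sorted_xs[0] + sorted_xs[-1])


def simulate_mse(n, alpha, gamma, nu, w, reps, seed):
    """Return (MSE of sample mean, MSE of weighted mean) estimated from `reps` samples."""
    rng = random.Random(seed)
    mu = tp_mean(alpha, gamma, nu)
    sq_err_mean = 0.0
    sq_err_weighted = 0.0
    for _ in range(reps):
        xs = sorted(tp_quantile(rng.random(), alpha, gamma, nu) for _ in range(n))
        sq_err_mean += (sum(xs) / n - mu) ** 2
        sq_err_weighted += (weighted_mean(xs, w) - mu) ** 2
    return sq_err_mean / reps, sq_err_weighted / reps


n = 10
w_l1 = 1.0 / math.sqrt(n * (n - 1))
mse_mean, mse_weighted = simulate_mse(n, 3.0, 1.0, 10.0, w_l1, reps=20000, seed=5)
print("MSE of sample mean:  ", mse_mean)
print("MSE of weighted mean:", mse_weighted)
print("simulated efficiency:", mse_mean / mse_weighted)
```

Both estimators are evaluated on the same simulated samples, so the comparison is less noisy than it would be with independent runs.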

6 Examples

Now we use the proposed method and compare it withexisting methods to analyze the data of the two examplesoutlined in Section 1

61 Hurricane Loss Example

611 Comparison of Four Estimation Methods At first Welook at the hurricane loss example in Section 111 Based onthe simulation results in Figure 3 we consider the three betterestimators out of the five in the heavy-tailed case that isAbanrsquos Moment and the weighted estimators in (27) (28)and (29) for the truncated Pareto pdf 119891(119909) in (3) and thecdf 119865(119909) in (4)We also compare them with Hillrsquos estimatorin (26) for the original Pareto pdf 119891

119901(119909) in (1) and the cdf

119865119901(119909) in (2) Here

119891Hill (119909) =Hill120574

Hill

119909119908 119865Hill (119909) = 1 minus (

120574

119909)

Hill (26)

where Hill is given in (12) with 119903 = 119899 minus 1 consider

119891MLE (119909) =MLE120574

MLE119909minusMLEminus1

1 minus (120574])MLE

119865MLE (119909) = 1 minus120574MLE (119909

minusMLE minus ]minusMLE)

1 minus (120574])MLE

(27)

ISRN Probability and Statistics 7

Table 3 Estimation of index mean median and VaR for hurricane loss data

Method 120583 Median 5 value at riskPareto

(Hill) 08126 infin 868 billion 14768 billionTPD

(Aban) 06206 2174 billion 973 billion 8515 billionTPD

(Moment) 06476 2048 billion 947 billion 8255 billionTPD

(Weighted) 06990 1937 billion 902 billion 7758 billion

1614121

080604020

181614121

080604020

181614121

2

262422

080

2 4 6 8 10Weighted Aban Beg Hill Moment

2 4 6 8 10Weighted Aban Beg Hill Moment

2 4 6 8 10Weighted Aban Beg Hill Moment

120572 = 05

120572 = 08

120572 = 18

minus02

Figure 3 Box-plot of the estimators for the shape parameter 120572 byusing 119871

1-optimal weight 120574 = 1 ] = 10 sample size 119899 = 100 and

generating timesm = 1000

where MLE is Abanrsquos MLE given in (14) consider

119891119872(119909) =

119872120574119872119909

minus119872minus1

1 minus (120574])119872

119865119872(119909) = 1 minus

120574119872 (119909

minus119872 minus ]minus119872)

1 minus (120574])119872

(28)

where 119872

is given in (16) consider

119891119908(119909) =

119908120574119908119909

minus119908minus1

1 minus (120574])119908

119865119908(119909) = 1 minus

120574119908 (119909

minus119908 minus ]minus119908)

1 minus (120574])119908

(29)

where 119908

is given in (18) using the weight 1199081198711-opt =

1radic119899(119899 minus 1)

The results of these four methods are listed in Table 3 for the hurricane loss data, where $\gamma = X_{(\min)}$, $\nu = X_{(\max)}$, and $n = 49$.

Figure 4 is a log-log plot showing the upper tail of the hurricane loss data. In this plot, the circles represent the real data and the straight line represents the estimated original Pareto distribution. The dashed line, dotted line, and thick solid line represent the estimated truncated Pareto distributions obtained with Aban's, the Moment, and Huang's estimators, respectively. Two conclusions follow from visual inspection.

(1) The original Pareto distribution (straight line) does not fit the data well in the tail. The truncated Pareto distribution fits the data very well using all three estimation methods (Aban, Moment, and weighted). Note that the 5% value at risk estimated by the original Pareto model is 147 billion; the three truncated model estimates are about 80 billion. It appears that the original Pareto model overestimates the risk. An insurance company would set a high premium if it is using the Pareto model, with the result that many people cannot afford to buy insurance.

(2) We examine the three truncated Pareto estimates. Around the tail, the weighted estimate fits the data the best; that is, the $\ln(P(X > x))$ curve turns downward more quickly, following the trend of the data pattern, because the weighted estimate $\hat{\alpha}_w$ is the largest among the three truncated model estimates (this is consistent with the simulation results in Figure 4), and the weighted mean estimate $\hat{\mu}_w$ and the 5% value at risk are the smallest among the three methods (this is obtained by placing less weight on the extreme value, e.g., the 1926 Great Miami Hurricane's loss), so its estimated distribution is less heavy in the tail compared with other methods.

Next, in order to confirm these conclusions, we run three goodness-of-fit tests. Later we define the absolute error and integrated error as measures of the distance from the empirical data points to the estimated Pareto curve and truncated Pareto curves.

6.1.2. Goodness-of-Fit Tests. Our objective is to test if the estimated distributions in (26)–(29) fit the data properly. We test the hypotheses $H_0: F(x) = F^*(x)$ against $H_1: F(x) \neq F^*(x)$, where $F(x)$ is the true unknown distribution function and $F^*(x)$ is the estimated Pareto cdf in (26) or the estimated truncated Pareto cdf in (27)–(29). In this paper, we use three EDF goodness-of-fit tests.

Figure 4: Log-log plot of the hurricane loss data, $\ln(P(X > x))$ against $\ln(x)$, with curves for Data, Pareto, Aban, Moment, and Weighted.

(1) The Kolmogorov-Smirnov (K-S) test [15]: the test statistic is given by

$$T = \sup_x \left|F^*(x) - S_n(x)\right|, \quad -\infty < x < \infty, \tag{30}$$

where $S_n(x)$ is the EDF in (8). Under $H_0$, the two-tailed $P$-value for the K-S test is as follows:

$$P\text{-value} = 1 - \left[1 - T\sum_{j=0}^{[n(1-T)]} \frac{n!}{j!\,(n-j)!}\left(1 - T - \frac{j}{n}\right)^{n-j}\left(T + \frac{j}{n}\right)^{j-1}\right]^2, \tag{31}$$

where $[n(1-T)]$ is the integer part of $n(1-T)$.

(2) Anderson and Darling [16] (A-D test) introduced a measure of "distance" between the empirical distribution $S_n(x)$ and the proposed cdf $F^*(x)$ by using a metric function space:

$$W_n^2 = n\int_{-\infty}^{\infty}\left[S_n(x) - F^*(x)\right]^2 \psi\left(F^*(x)\right)\,dF, \tag{32}$$

where $\psi(u) = 1/(u(1-u))$ is a weight function with $u = F(x)$. Let $u_i = F(X_{(i)})$, $i = 1, \ldots, n$; under $H_0$, the test statistic and $P$-value are given by

$$W_n^2 = -n - \frac{1}{n}\sum_{j=1}^{n}(2j-1)\left(\log(u_j) + \log(1 - u_{n-j+1})\right),$$

$$P\text{-value} = 1 - \frac{\sqrt{2}}{z}\sum_{j=0}^{\infty}\frac{(-1)^j\,\Gamma\left(j+\frac{1}{2}\right)(4j+1)}{j!}\,e^{-(4j+1)^2\pi^2/(8z)} \times \int_0^{\infty} e^{\,z/(8(w^2+1)) - (4j+1)^2\pi^2 w^2/(8z)}\,dw, \tag{33}$$

where $z$ is the observed value of $W_n^2$ and $\Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t}\,dt$, $x > 0$, is the Gamma function.

Figure 5: Absolute errors $|F^*(x) - S_n(x)|$ for the hurricane loss example on the 10 largest losses (curves: Aban, Moment, Pareto, Weighted).

(3) The Cramér-von Mises test (C-v-M test) [16] uses $\psi(u) = 1$ in (32); thus, under $H_0$, the test statistic and $P$-value are given by

$$n\omega^2 = \frac{1}{12n} + \sum_{j=1}^{n}\left(u_j - \frac{2j-1}{2n}\right)^2,$$

$$P\text{-value} = 1 - \frac{1}{\pi\sqrt{z}}\sum_{j=0}^{\infty}\frac{(-1)^{2j}\,\Gamma\left(j+\frac{1}{2}\right)(4j+1)^{1/2}}{j!\,\sqrt{\pi}}\,e^{-(4j+1)^2/(16z)} \times K_{1/4}\left(\frac{(4j+1)^2}{16z}\right), \tag{34}$$

where $K_{1/4}(x)$ is the modified Bessel function of the second kind:

$$K_{\alpha}(x) = \frac{\pi}{2}\,\frac{I_{-\alpha}(x) - I_{\alpha}(x)}{\sin(\alpha\pi)}, \qquad I_{\alpha}(x) = \sum_{m=0}^{\infty}\frac{1}{m!\,\Gamma(m+\alpha+1)}\left(\frac{x}{2}\right)^{2m+\alpha}. \tag{35}$$
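The three test statistics above can be sketched in a few lines once the fitted cdf has been evaluated at the ordered data. The following is a minimal illustration, not the authors' code: it assumes the values $u_i = F^*(X_{(i)})$ are already sorted and lie strictly between 0 and 1, it uses the standard closed form of the K-S supremum for a continuous $F^*$, and the function names are hypothetical. Only the K-S $P$-value (31) is coded; the series in (33) and (34) are omitted.

```python
import math

def ks_statistic(u):
    """T in (30) from sorted u_i = F*(X_(i)); the supremum is attained at a jump of S_n."""
    n = len(u)
    return max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))

def ks_pvalue(t, n):
    """Two-tailed P-value exactly as written in (31); t <= 0 and t >= 1 are handled as assumed edge cases."""
    if t <= 0.0:
        return 1.0
    if t >= 1.0:
        return 0.0
    upper = math.floor(n * (1.0 - t))          # integer part of n(1 - T)
    total = 0.0
    for j in range(upper + 1):
        base = max(0.0, 1.0 - t - j / n)       # guard against tiny negative rounding
        total += math.comb(n, j) * base ** (n - j) * (t + j / n) ** (j - 1)
    one_sided = t * total
    return 1.0 - (1.0 - one_sided) ** 2

def anderson_darling(u):
    """W_n^2, the first line of (33)."""
    n = len(u)
    s = sum((2 * j - 1) * (math.log(u[j - 1]) + math.log(1.0 - u[n - j]))
            for j in range(1, n + 1))
    return -n - s / n

def cramer_von_mises(u):
    """n * omega^2, the first line of (34)."""
    n = len(u)
    return 1.0 / (12 * n) + sum((u[j - 1] - (2 * j - 1) / (2 * n)) ** 2
                                for j in range(1, n + 1))

# Illustrative input only: fitted cdf values at four ordered observations.
u = [0.10, 0.35, 0.60, 0.90]
t = ks_statistic(u)
print(t, ks_pvalue(t, len(u)), anderson_darling(u), cramer_von_mises(u))
```

All three functions take the same input, so one evaluation of the fitted cdf per estimator is enough to fill a row of an error table.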

For fitting the 49 losses in the hurricane loss data, for example, after computing estimates of $F^*(x)$ by using (26), (27), (28), and (29), we compute the absolute error (AE) in (30) and define the integrated error (IE) by

$$\text{IE} = \left[\int_{X_{(n-r+1)}}^{X_{(n)}}\left(S_n(x) - F^*(x)\right)^2 dx\right]^{1/2}. \tag{36}$$

Figure 5 gives the absolute errors (AE) in (30) for the 10 largest losses and confirms that the original Pareto estimate has relatively larger errors and the weighted estimate has relatively smaller errors in the tail. Figure 5 explains the data fitting of the tail of the distributions in Figure 4.

We also compute the AE in (30) and IE in (36) to confirm those tail errors by using $r$ as the number of the largest losses, $1 \le r \le n$. The AE and IE values are given in Table 4 (the smallest values are marked with *).
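The integrated error (36) can be evaluated numerically because $S_n(x)$ is constant between consecutive order statistics. The sketch below is an illustration under stated assumptions: the function name, the midpoint rule, and the `steps` parameter are choices made here, not details taken from the paper.

```python
import math

def integrated_error(sample, cdf, r, steps=1000):
    """IE in (36): L2 distance between S_n and a fitted cdf over the r largest observations.

    Between X_(k) and X_(k+1) the EDF equals k/n, so the integral is split at the
    order statistics and each piece is approximated with the midpoint rule.
    """
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    total = 0.0
    for idx in range(n - r, n - 1):            # 0-based index of X_(n-r+1), ..., X_(n-1)
        a, b = xs[idx], xs[idx + 1]
        s_n = (idx + 1) / n                    # EDF value on [a, b)
        h = (b - a) / steps
        for k in range(steps):
            x = a + (k + 0.5) * h
            total += (s_n - cdf(x)) ** 2 * h
    return math.sqrt(total)

# Illustrative check with a uniform cdf on [0, 1] and two observations.
print(integrated_error([0.0, 1.0], lambda x: x, r=2))
```

With `r = n` the integral covers the whole sample range; smaller `r` restricts it to the upper tail, which is the comparison reported for the tail fits.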


Table 4: Errors of goodness-of-fit tests, $n = 49$, for hurricane example.

| Method | AE, r = 49 | AE, r = 18 | AE, r = 10 | IE, r = 49 | IE, r = 18 | IE, r = 10 |
|---|---|---|---|---|---|---|
| Pareto (Hill) | 0.1340 | 0.0584 | 0.0584 | 0.4844 | 0.3818 | 0.3723 |
| TPD (Aban) | 0.0948* | 0.0839 | 0.0832 | 0.3114 | 0.2565 | 0.2161 |
| TPD (Moment) | 0.1053 | 0.0738 | 0.0737 | 0.2985* | 0.2171 | 0.1825 |
| TPD (Weighted) | 0.1250 | 0.0561* | 0.0561* | 0.3098 | 0.1678* | 0.1453* |

Table 5: The estimation of index, mean, median, and VaR for forest fire loss data.

| Method | $\hat{\alpha}$ | $\hat{\mu}$ | Median | 5% value at risk |
|---|---|---|---|---|
| Pareto (Hill) | 4.3044 | 1.096 billion | 1.09 billion | 1.855 billion |
| TPD (Aban) | 0.7964 | 1.167 billion | 1.15 billion | 1.446 billion |
| TPD (Moment) | 0.8260 | 1.166 billion | 1.15 billion | 1.446 billion |
| TPD (Weighted) | 0.8642 | 1.165 billion | 1.15 billion | 1.446 billion |

The weighted estimator has the smallest AE and IE values for the $r = 18$ and $r = 10$ largest losses, and its IE value is almost equal to the smallest IE value for $r = 49$ (all losses). We statistically conclude that the weighted estimated distribution is the best fit in the tail of the hurricane loss data.

6.2. Forest Fire Loss Data Example. Next, we look at the forest fire loss example in Section 1.1.2. The data in Table 1 contain a relatively substantial number of large forest fire losses, which convinces us to use a truncated Pareto model and compare the four estimators in (26), (27), (28), and (29). We use the 25 largest losses in this study.

Figure 6 is a log-log plot which shows the upper tail of the forest fire loss data. The circles represent the real data, and the straight line represents the estimated original Pareto distribution. The dashed line, dotted line, and thick solid line represent the estimated truncated Pareto distributions obtained with Aban's, the Moment, and the weighted estimators, respectively. We can see that the estimated truncated Pareto distributions fit the data very well using all three estimation methods and are much better than the original Pareto distribution. Around the tail, the weighted and the Moment estimation methods perform the best. The fact that the tail of the data curves downward in Figure 6 is evidence in support of using a truncated Pareto model.

The results of these estimators are listed in Table 5 for the forest fire loss data, where $\gamma = X_{(1)}$, $\nu = X_{(n)}$, and $n = 25$.

Note that the weighted estimate $\hat{\alpha}_w$ is the largest among the three truncated Pareto estimates. The 5% value at risk of the original Pareto estimate is the largest. The 5% VaR may be overestimated compared to the truncated Pareto models.

As in the hurricane example, we compute the AE in (30) and IE in (36) again for fitting the 25 losses of the forest fire loss data.

Figure 7 gives the absolute errors (AE) in (30) for the 10 largest losses and confirms that the original Pareto estimate has relatively larger errors and the weighted estimate has relatively smaller errors in the tail. Figure 7 explains the data fitting of the tail of the distributions in Figure 6.

Figure 6: Log-log plot for the forest fire example, $\ln(P(X > x))$ against $\ln(x)$, with curves for Data, Pareto, Aban, Moment, and Weighted.

We also compute the AE in (30) and IE in (36) to confirm those tail errors by using $r$ as the number of the largest losses, $1 \le r \le n$. The AE and IE values are given in Table 6 (the smallest values are marked with *).

Table 6: Errors of goodness-of-fit tests, $n = 25$, for forest fire loss data.

| Method | AE, r = 25 | AE, r = 10 | AE, r = 5 | IE, r = 25 | IE, r = 10 | IE, r = 5 |
|---|---|---|---|---|---|---|
| Pareto (Hill) | 0.2656 | 0.1288 | 0.1288 | 2.7125 | 1.1493 | 1.0985 |
| TPD (Aban) | 0.1503* | 0.1195 | 0.1195 | 1.3687* | 0.7911 | 0.5430 |
| TPD (Moment) | 0.1520 | 0.0982 | 0.0982 | 1.3736 | 0.7764 | 0.5430 |
| TPD (Weighted) | 0.1542 | 0.0964* | 0.0964* | 1.3796 | 0.7577* | 0.5203* |

Figure 7: Absolute errors $|F^*(x) - S_n(x)|$ for the forest fire loss example on the 10 largest losses (curves: Aban, Moment, Pareto, Weighted).

The weighted estimator has the smallest AE and IE values for $r = 10$ and $r = 5$, and its IE value for $r = 25$ is almost equal to the smallest IE value. We statistically conclude that the weighted estimated distribution is the best fit to the tail of the forest fire loss data.

7. Conclusions

In the complicated real world, it is difficult to construct a model combining all the desired features. In general, the final model selection depends on the best fitting model. The criteria are based on goodness-of-fit tests, existence of the moments, characteristic largest values, and log-log plots. The hurricane loss data and forest fire loss data are well fitted by the truncated Pareto distribution. In summary,

(a) we recommend the truncated Pareto model as an appropriate loss distribution for analyzing huge risk loss data sets. The upper and lower limits can be set by the largest and smallest losses or other reasonable values;

(b) the estimated loss distribution provides a prediction of the next disaster's 5% value at risk. The largest loss in the data set plays an important role in the prediction. In Figures 4 and 6, the trend of the tail distribution is crucial for an insurance company setting policy, and also for inhabitants and governments making plans to minimize damage from natural disasters;

(c) the semiparametric methods (Moment and weighted) are robust, easy to use, more stable, and fit the data better than the MLE methods. In both of the foregoing examples, the estimated shape parameter is less than 1. The original Pareto model is not able to use the Moment or the weighted methods in this case, since its mean is infinite when $0 < \alpha \le 1$. This is another advantage of using the truncated Pareto model;

(d) the statistical inference in these examples shows that the distribution curve estimated by the proposed weighted estimation method fits the tails of the data better than the Moment and MLE estimators. This is due to the fact that the weighted estimator places less weight on the extreme values and has good MSE. Based on these studies, we suggest that further studies on the usage of weights may be useful.

Appendix

Mathematical Derivations

Lemma A.1. For a truncated Pareto random variable $X$ with pdf given in (3), for $i, j \in \{1, 2, \ldots, n\}$, $i < j$, where $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are order statistics, one has

$$E\left[X_{(i)}\right] = \gamma\sum_{k=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\rho^k\,\frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)}, \tag{A1}$$

$$E\left[X_{(i)}^2\right] = \gamma^2\sum_{k=0}^{\infty}\binom{\frac{2}{\alpha}+k-1}{k}\rho^k\,\frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)}, \tag{A2}$$

$$E\left[X_{(i)}X_{(j)}\right] = \gamma^2\sum_{k=0}^{\infty}\sum_{l=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\binom{\frac{1}{\alpha}+l-1}{l}\rho^{k+l}\left[\frac{\Gamma(n+1)\,\Gamma(k+i)\,\Gamma(k+l+j)}{\Gamma(i)\,\Gamma(k+j)\,\Gamma(k+l+n+1)}\right], \tag{A3}$$

where $\rho = (g^{\alpha} - 1)/g^{\alpha}$, $g = \nu/\gamma$, $0 < \rho < 1$, $g > 1$, $0 < \gamma \le x \le \nu < \infty$, and $\alpha > 0$.

Proof. Let the cdf of a truncated Pareto distribution in (4) be

$$F(x) = u, \qquad x = \gamma(1 - \rho u)^{-1/\alpha}, \qquad \rho = \frac{g^{\alpha} - 1}{g^{\alpha}}, \qquad g = \frac{\nu}{\gamma}. \tag{A4}$$

By the theory of order statistics, we have

$$f_i(x) = \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)}\left[F(x)\right]^{i-1}\left[1 - F(x)\right]^{n-i} f(x), \tag{A5}$$

$$f_{ij}(x, y) = \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(j-i)\,\Gamma(n-j+1)}\left[F(x)\right]^{i-1}\left[F(y) - F(x)\right]^{j-i-1}\left[1 - F(y)\right]^{n-j} f(x)\,f(y). \tag{A6}$$

Using the binomial formula $1/(1-x)^r = \sum_{k=0}^{\infty}\binom{r+k-1}{k}x^k$, for $m = 1, 2$ we have

$$
\begin{aligned}
E\left[X_{(i)}^m\right] &= \int_{\gamma}^{\nu} x^m f_i(x)\,dx \\
&= \int_0^1 \gamma^m(1 - \rho u)^{-m/\alpha}\,\frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)}\,u^{i-1}(1-u)^{n-i}\,du \\
&= \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)}\,\gamma^m\sum_{k=0}^{\infty}\binom{\frac{m}{\alpha}+k-1}{k}\int_0^1 \rho^k u^{k+i-1}(1-u)^{n-i}\,du \\
&= \gamma^m\sum_{k=0}^{\infty}\binom{\frac{m}{\alpha}+k-1}{k}\rho^k\,\frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)}.
\end{aligned} \tag{A7}
$$

By substituting $m = 1$ and $m = 2$, respectively, we have (A1) and (A2). And

$$
\begin{aligned}
E\left[X_{(i)}X_{(j)}\right] &= \int_{\gamma}^{\nu}\int_{\gamma}^{y} x\,y\,f_{ij}(x, y)\,dx\,dy \\
&= \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(j-i)\,\Gamma(n-j+1)}\int_0^1\int_0^h \gamma(1-\rho u)^{-1/\alpha}\,\gamma(1-\rho h)^{-1/\alpha}\,u^{i-1}(h-u)^{j-i-1}(1-h)^{n-j}\,du\,dh \\
&= \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(j-i)\,\Gamma(n-j+1)}\,\gamma^2\sum_{k=0}^{\infty}\sum_{l=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\binom{\frac{1}{\alpha}+l-1}{l}\rho^{k+l} \\
&\qquad\times\int_0^1\int_0^1\left(\frac{u}{h}\right)^{k+i-1}\left(1-\frac{u}{h}\right)^{j-i-1} h^{k+l+j-1}(1-h)^{n-j}\,d\!\left(\frac{u}{h}\right)dh \\
&= \gamma^2\sum_{k=0}^{\infty}\sum_{l=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\binom{\frac{1}{\alpha}+l-1}{l}\rho^{k+l}\,\frac{\Gamma(n+1)\,\Gamma(k+i)\,\Gamma(k+l+j)}{\Gamma(i)\,\Gamma(k+j)\,\Gamma(k+l+n+1)}.
\end{aligned} \tag{A8}
$$
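The series in (A7) can be checked numerically against the population moments, since the order-statistic moments of a sample of size $n$ must add up to $n\,E[X^m]$. The sketch below is an illustration only: the function name, the truncation at `terms` summands, and the parameter values are assumptions, and the prefactor is taken as $\gamma^m$, which follows from $x = \gamma(1-\rho u)^{-1/\alpha}$ in (A4).

```python
def order_stat_moment(i, n, m, alpha, gamma, nu, terms=2000):
    """Truncated series for E[X_(i)^m] in the form of (A7), with prefactor gamma**m."""
    rho = 1.0 - (gamma / nu) ** alpha      # rho = (g^alpha - 1) / g^alpha with g = nu / gamma
    r = m / alpha
    coef = 1.0    # generalized binomial C(r + k - 1, k), starting at k = 0
    ratio = 1.0   # Gamma(n+1) Gamma(k+i) / (Gamma(i) Gamma(k+n+1)), starting at k = 0
    power = 1.0   # rho ** k
    total = 0.0
    for k in range(terms):
        total += coef * power * ratio
        coef *= (r + k) / (k + 1)          # C(r+k, k+1) = C(r+k-1, k) (r+k)/(k+1)
        ratio *= (k + i) / (k + n + 1)     # Gamma ratio recurrence in k
        power *= rho
    return gamma ** m * total

# Illustrative parameters (assumed, not taken from the data examples).
alpha, gamma, nu, n = 0.8, 2.0, 20.0, 5
norm = 1.0 - (gamma / nu) ** alpha
mean = alpha * gamma**alpha * (gamma**(1 - alpha) - nu**(1 - alpha)) / ((alpha - 1) * norm)
second = alpha * gamma**alpha * (nu**(2 - alpha) - gamma**(2 - alpha)) / ((2 - alpha) * norm)

avg_first = sum(order_stat_moment(i, n, 1, alpha, gamma, nu) for i in range(1, n + 1)) / n
avg_second = sum(order_stat_moment(i, n, 2, alpha, gamma, nu) for i in range(1, n + 1)) / n
print(avg_first, mean)      # the two values should agree
print(avg_second, second)   # likewise for the second moment
```

Because $0 < \rho < 1$, the summands decay geometrically, so a fixed truncation is adequate unless $\nu/\gamma$ is extremely large.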

Proof of Theorem 3. When $\alpha > 0$, we use

$$
\begin{aligned}
E\left[\hat{\mu}_w\right] &= \sum_{i=2}^{n-1} w\,E\left[X_{(i)}\right] + \frac{1}{2}\left(1 - (n-2)w\right)\left(E\left[X_{(1)}\right] + E\left[X_{(n)}\right]\right), \\
\text{MSE}\left(\hat{\mu}_w\right) &= E\left[\left(\hat{\mu}_w - \mu\right)^2\right] = E\left[\hat{\mu}_w^2\right] - 2\mu E\left[\hat{\mu}_w\right] + \mu^2.
\end{aligned} \tag{A9}
$$

And we have

$$
\begin{aligned}
E\left[\hat{\mu}_w^2\right] &= \sum_{i=2}^{n-1} w^2 E\left[X_{(i)}^2\right] + 2\sum_{i=2}^{n-2}\sum_{j=i+1}^{n-1} w^2 E\left[X_{(i)}X_{(j)}\right] \\
&\quad + \frac{1}{4}\left(1 - (n-2)w\right)^2\left(E\left[X_{(1)}^2\right] + E\left[X_{(n)}^2\right] + 2E\left[X_{(1)}X_{(n)}\right]\right) \\
&\quad + \sum_{i=2}^{n-1} w\left(1 - (n-2)w\right)\left(E\left[X_{(1)}X_{(i)}\right] + E\left[X_{(i)}X_{(n)}\right]\right).
\end{aligned} \tag{A10}
$$

Let

$$
\begin{aligned}
A &= \sum_{i=2}^{n-1} E\left[X_{(i)}^2\right] + 2\sum_{i=2}^{n-2}\sum_{j=i+1}^{n-1} E\left[X_{(i)}X_{(j)}\right], \\
B &= E\left[X_{(1)}^2\right] + E\left[X_{(n)}^2\right] + 2E\left[X_{(1)}X_{(n)}\right], \\
C &= \sum_{i=2}^{n-1}\left(E\left[X_{(1)}X_{(i)}\right] + E\left[X_{(i)}X_{(n)}\right]\right), \\
D &= \sum_{i=2}^{n-1} E\left[X_{(i)}\right], \\
E &= E\left[X_{(1)}\right] + E\left[X_{(n)}\right].
\end{aligned} \tag{A11}
$$

Then

$$
\begin{aligned}
\text{MSE}\left(\hat{\mu}_w\right) &= A w^2 + \frac{1}{4}\left(1 - (n-2)w\right)^2 B + w\left(1 - (n-2)w\right)C - 2\mu\left(Dw + \frac{1}{2}\left(1 - (n-2)w\right)E\right) + \mu^2 \\
&= \left(A + \frac{1}{4}(n-2)^2 B - (n-2)C\right)w^2 + \left(-\frac{1}{2}(n-2)B + C - 2\mu D + \mu(n-2)E\right)w + \frac{1}{4}B - \mu E + \mu^2.
\end{aligned} \tag{A12}
$$

Now use Lemma A.1; then we have Theorem 3.

Lemma A.2. $\text{MSE}(\hat{\mu}_w)$ in (20) is a convex function with $a > 0$ when $n > 4C/B + 2$; $C$ and $B$ are given in (20).

Proof. Let $f(w) = \text{MSE}(\hat{\mu}_w) = aw^2 + bw + c$. We have

$$a = A + \frac{1}{4}(n-2)^2 B - (n-2)C, \tag{A13}$$

where

$$
\begin{aligned}
A &= E\left[\left(\sum_{i=2}^{n-1} X_{(i)}\right)^2\right] \ge 0, \\
B &= E\left[\left(X_{(1)} + X_{(n)}\right)^2\right] \ge 0, \\
C &= E\left[\sum_{i=2}^{n-1} X_{(i)}\left(X_{(1)} + X_{(n)}\right)\right] \ge 0.
\end{aligned} \tag{A14}
$$

Writing $a = A + (n-2)\left[\frac{1}{4}(n-2)B - C\right]$ with $A \ge 0$ and $n > 2$, we see that

$$\frac{1}{4}(n-2)B - C > 0, \text{ such that } a > 0, \text{ when } n > \frac{4C}{B} + 2. \tag{A15}$$

Proof of Corollary 5. By Theorem 3 and Lemma A.2, let

$$f(w) = \text{MSE}\left(\hat{\mu}_w\right) = aw^2 + bw + c, \tag{A16}$$

which is a quadratic function with first and second derivatives of $f(w)$ with respect to $w$

$$f'(w) = 2aw + b, \qquad f''(w) = 2a > 0. \tag{A17}$$

Hence $f(w)$ is a convex function with minimum value

$$f_{\min} = a\left(-\frac{b}{2a}\right)^2 + b\left(-\frac{b}{2a}\right) + c = -\frac{b^2}{4a} + c. \tag{A18}$$

The maximum value of the $\text{EFF}(\hat{\mu}_w)$ is

$$\text{EFF}_{\max}\left(\hat{\mu}_w\right) = \frac{\alpha\gamma^{\alpha}\left(\nu^{2-\alpha} - \gamma^{2-\alpha}\right)}{(2-\alpha)\left(1 - (\gamma/\nu)^{\alpha}\right)\,n\left(-b^2/(4a) + c\right)} - \frac{\alpha^2\gamma^{2\alpha}\left(\nu^{1-\alpha} - \gamma^{1-\alpha}\right)^2}{(1-\alpha)^2\left(1 - (\gamma/\nu)^{\alpha}\right)^2\,n\left(-b^2/(4a) + c\right)}, \quad \text{at } w = -\frac{b}{2a}. \tag{A19}$$
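The step from (A12) to the optimal weight in (A18) is ordinary minimization of a quadratic, and it can be sketched directly. The code below is a minimal illustration: the function names are hypothetical, the numerical inputs are made-up placeholders rather than values from the paper, and in practice $A$ through $E$ would come from Lemma A.1. The sketch does not enforce the constraint $0 < w < 1/(n-2)$ from (10).

```python
def mse_coefficients(A, B, C, D, E, mu, n):
    """Coefficients a, b, c of MSE(w) = a w^2 + b w + c, read off the second line of (A12)."""
    a = A + 0.25 * (n - 2) ** 2 * B - (n - 2) * C
    b = -0.5 * (n - 2) * B + C - 2.0 * mu * D + mu * (n - 2) * E
    c = 0.25 * B - mu * E + mu ** 2
    return a, b, c

def mse_direct(w, A, B, C, D, E, mu, n):
    """MSE(w) from the first line of (A12), used here only as a cross-check."""
    e = 1.0 - (n - 2) * w
    return A * w**2 + 0.25 * e**2 * B + w * e * C - 2.0 * mu * (D * w + 0.5 * e * E) + mu**2

def optimal_weight(a, b, c):
    """Minimizer w = -b / (2a) and minimum -b^2 / (4a) + c from (A18); requires a > 0."""
    if a <= 0:
        raise ValueError("MSE(w) is convex only when a > 0 (Lemma A.2)")
    return -b / (2.0 * a), c - b * b / (4.0 * a)

# Placeholder inputs for illustration only.
A, B, C, D, E, mu, n = 40.0, 30.0, 20.0, 6.0, 5.0, 2.0, 5
a, b, c = mse_coefficients(A, B, C, D, E, mu, n)
w_opt, mse_min = optimal_weight(a, b, c)
print(a, b, c, w_opt, mse_min)
```

Keeping the direct form alongside the expanded coefficients is a cheap guard against sign errors when the expansion is transcribed.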

Acknowledgments

The authors thank the referees and the editor for their comments, which helped to improve the paper. This research is supported by the Natural Sciences and Engineering Research Council of Canada.

References

[1] R. A. Pielke Jr., J. Gratz, C. W. Landsea, D. Collins, M. A. Saunders, and R. Musulin, "Normalized hurricane damage in the United States: 1900–2005," Natural Hazards Review, vol. 9, no. 1, pp. 29–42, 2008.

[2] P. Embrechts, C. Klüppelberg, and T. Mikosch, Modelling Extremal Events for Insurance and Finance, Springer, New York, NY, USA, 2003.

[3] C. Kleiber and S. Kotz, Statistical Size Distributions in Economics and Actuarial Sciences, John Wiley & Sons, New York, NY, USA, 2003.

[4] J. Beirlant, Y. Goegebeur, J. Segers, and J. Teugels, Statistics of Extremes: Theory and Applications, John Wiley & Sons, New York, NY, USA, 2005.

[5] M. A. Beg, "Estimation of the tail probability of the truncated Pareto distribution," Journal of Information & Optimization Sciences, vol. 2, no. 2, pp. 192–198, 1981.

[6] I. B. Aban, M. M. Meerschaert, and A. K. Panorska, "Parameter estimation for the truncated Pareto distribution," Journal of the American Statistical Association, vol. 101, no. 473, pp. 270–277, 2006.

[7] B. Efron, "Bootstrap methods: another look at the jackknife," The Annals of Statistics, vol. 7, no. 1, pp. 1–26, 1979.

[8] P. Barbe and P. Bertail, The Weighted Bootstrap, Springer, New York, NY, USA, 1995.

[9] G. R. Shorack and J. A. Wellner, Empirical Processes with Applications to Statistics, John Wiley & Sons, New York, NY, USA, 1986.

[10] H. L. Koul, Weighted Empirical and Linear Models, vol. 21 of Lecture Notes-Monograph Series, Institute of Mathematical Statistics, Hayward, Calif, USA, 1992.

[11] M. L. Huang and P. H. Brill, "A distribution estimation method based on level crossings," Journal of Statistical Planning and Inference, vol. 124, no. 1, pp. 45–62, 2004.

[12] M. L. Huang, "The efficiencies of a weighted distribution function estimator," in The Proceeding of American Statistical Association, Nonparametric Statistics Section, pp. 1502–1506, 2003.

[13] B. M. Hill, "A simple general approach to inference about the tail of a distribution," The Annals of Statistics, vol. 3, no. 5, pp. 1163–1174, 1975.

[14] P. Cooke, "Statistical inference for bounds of random variables," Biometrika, vol. 66, no. 2, pp. 367–374, 1979.

[15] A. N. Kolmogorov, "Sulla determinazione empirica di una legge di distribuzione," Giornale dell'Istituto Italiano degli Attuari, vol. 4, pp. 83–91, 1933.

[16] T. W. Anderson and D. A. Darling, "Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes," The Annals of Mathematical Statistics, vol. 23, pp. 193–212, 1952.


Figure 1: The 49 USA Atlantic hurricane losses, 1900–2005 (loss in billions by year).

increasing, with applications to extreme values, insurance, survival analysis, networks, and risk analysis [2, 3]. Therefore, we choose a Pareto model. In this paper, we will propose a weighted method to study the hurricane loss data. The statistical analysis results are given in Section 6.

1.1.2. A Forest Fire Loss Example. Large forest fires have a significant impact on natural, social, and economic systems. However, most fires are extinguished in the initial stages and thus remain small. These smaller fires have a large probability of occurrence, but the resulting damage is almost negligible on an individual basis. Large forest fires, however, have a low probability of occurrence, but the damage and the losses are huge. So modelling large fire losses is becoming critical in the analysis of the risk of the next large forest fire. Figure 2 shows 30 forest fire losses during 1977 to 2006 in AB, Canada. We are concerned that the losses have been increasing over the last 30 years. The data listed in Table 1 are fire occurrence records from the Forestry Service (Council of Canadian Fire Marshals and Fire Commissioners, 2008, Canadian Fire Statistics (http://www.ccfmfc.ca/stats.html)). The database includes forest fire records for all the insurance losses. The data contain a relatively substantial number of large losses, which convinces us to use a Pareto model. The statistical analysis results are given in Section 6.

1.2. The Truncated Pareto Distribution. Many risk models with heavy tails have been developed using the class of Pareto distributions. Some of these cases are found in city population sizes, the occurrence of natural resources, the sizes of firms, and personal income [3]. The Pareto distribution is on the list of distributions of the Fréchet domain, which belongs to the extreme value distribution of type II [4]. It is important to explore estimation methods for the Pareto distribution. There are theoretical difficulties in studying the Pareto distribution since it has infinite moments in heavy-tailed situations. We propose using the truncated Pareto distribution in these models to overcome these difficulties. Usually we choose the upper limit as the largest value in the data set. In recent years, the truncated Pareto distribution has become an alternative model for the original Pareto distribution.

Figure 2: The 30 forest fire losses, 1977–2006, AB, Canada (loss in millions by year).

There are several kinds of Pareto distributions. We consider a type I Pareto distribution in this paper [3].

Definition 1. The probability density function (pdf) and the cumulative distribution function (cdf) of a random variable $X$ having the Pareto distribution are given by

$$f_p(x) = \frac{\alpha\gamma^{\alpha}}{x^{\alpha+1}}, \quad 0 < \gamma \le x < \infty,\ \alpha > 0, \tag{1}$$

$$F_p(x) = 1 - \left(\frac{\gamma}{x}\right)^{\alpha}, \quad 0 < \gamma \le x < \infty,\ \alpha > 0, \tag{2}$$

where $\alpha$ is the shape parameter.

When $0 < \alpha \le 1$, which is a heavy-tailed case, the mean and variance of $X$ are infinite, and the distribution is heavier on the right tail as $\alpha$ decreases.

The truncated Pareto distribution was originally used to describe the distribution of oil fields by size. It has a lower limit $\gamma$, an upper limit $\nu$, and a shape parameter $\alpha$. In fact, it has been shown that the truncated Pareto distribution fits better than the nontruncated distribution for positively skewed populations [5].

Definition 2. The pdf and cdf of a random variable $X$ having the truncated Pareto distribution are given by

$$f(x) = \frac{\alpha\gamma^{\alpha}x^{-\alpha-1}}{1 - (\gamma/\nu)^{\alpha}}, \quad 0 < \gamma \le x \le \nu < \infty,\ \alpha > 0, \tag{3}$$

$$F(x) = 1 - \frac{\gamma^{\alpha}\left(x^{-\alpha} - \nu^{-\alpha}\right)}{1 - (\gamma/\nu)^{\alpha}}, \quad 0 < \gamma \le x \le \nu < \infty,\ \alpha > 0. \tag{4}$$
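Definitions 1 and 2 translate directly into code. The following is a small sketch rather than a reference implementation: the function names and the clamping of the cdf outside $[\gamma, \nu]$ are choices made here, and the parameter values are illustrative.

```python
def tp_pdf(x, alpha, gamma, nu):
    """Truncated Pareto pdf, equation (3); zero outside [gamma, nu]."""
    if x < gamma or x > nu:
        return 0.0
    return alpha * gamma**alpha * x ** (-alpha - 1) / (1.0 - (gamma / nu) ** alpha)

def tp_cdf(x, alpha, gamma, nu):
    """Truncated Pareto cdf, equation (4); clamped to 0 and 1 outside the support."""
    if x <= gamma:
        return 0.0
    if x >= nu:
        return 1.0
    return 1.0 - gamma**alpha * (x ** (-alpha) - nu ** (-alpha)) / (1.0 - (gamma / nu) ** alpha)

def pareto_cdf(x, alpha, gamma):
    """Untruncated Pareto cdf, equation (2)."""
    return 0.0 if x < gamma else 1.0 - (gamma / x) ** alpha

# Illustrative parameters only.
alpha, gamma, nu = 0.8, 1.0, 10.0
print(tp_cdf(5.0, alpha, gamma, nu), pareto_cdf(5.0, alpha, gamma))
```

On $[\gamma, \nu]$ the truncated cdf equals the Pareto cdf rescaled by $F_p(\nu)$, so it lies above the untruncated cdf and reaches 1 at the upper limit.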

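Table 1: Forest fire loss data, AB, Canada (in million 2006 CAD).

| Year | Loss |
|---|---|
| 1977 | 571.7 |
| 1978 | 654.9 |
| 1979 | 756.4 |
| 1980 | 979.0 |
| 1981 | 905.4 |
| 1982 | 998.9 |
| 1983 | 816.0 |
| 1984 | 929.5 |
| 1985 | 926.0 |
| 1986 | 973.5 |
| 1987 | 956.2 |
| 1988 | 1004.3 |
| 1989 | 1119.6 |
| 1990 | 1237.4 |
| 1991 | 1239.7 |
| 1992 | 1241.4 |
| 1993 | 1181.9 |
| 1994 | 1151.5 |
| 1995 | 1110.8 |
| 1996 | 1163.3 |
| 1997 | 1291.6 |
| 1998 | 1175.6 |
| 1999 | 1231.9 |
| 2000 | 1185.2 |
| 2001 | 1420.8 |
| 2002 | 1489.0 |
| 2003 | 1356.1 |
| 2004 | 1126.6 |
| 2005 | 1265.9 |
| 2006 | 1403.3 |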

The quantile function of the truncated Pareto distribution is

$$F^{-1}(u) = \left(\frac{1-u}{\gamma^{\alpha}} + \frac{u}{\nu^{\alpha}}\right)^{-1/\alpha}, \quad 0 \le u \le 1,\ 0 < \gamma \le \nu < \infty,\ \alpha > 0. \tag{5}$$
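The quantile function (5) gives both a sampler and the tail summaries used in the data examples. The sketch below is illustrative: the function names are hypothetical, the sampler is plain inverse-transform sampling, and it assumes that the 5% value at risk is read as the 0.95 quantile, which is an interpretation rather than a definition quoted from the text.

```python
import random

def tp_quantile(u, alpha, gamma, nu):
    """Quantile function of the truncated Pareto distribution, equation (5)."""
    return ((1.0 - u) / gamma**alpha + u / nu**alpha) ** (-1.0 / alpha)

def tp_sample(n, alpha, gamma, nu, seed=0):
    """Inverse-transform sampling: apply (5) to uniform random numbers."""
    rng = random.Random(seed)
    return [tp_quantile(rng.random(), alpha, gamma, nu) for _ in range(n)]

# Illustrative parameters only.
alpha, gamma, nu = 0.8, 1.0, 10.0
median = tp_quantile(0.5, alpha, gamma, nu)
var_5 = tp_quantile(0.95, alpha, gamma, nu)   # assumed reading of the 5% value at risk
sample = tp_sample(1000, alpha, gamma, nu)
print(median, var_5, min(sample), max(sample))
```

Because the support is bounded by $\nu$, every quantile, including the value at risk, stays below the upper limit, which is the mechanism behind the smaller VaR estimates of the truncated models.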

The mean and variance of $X$ are

$$\mu = \frac{\alpha\gamma^{\alpha}\left(\gamma^{1-\alpha} - \nu^{1-\alpha}\right)}{(\alpha - 1)\left(1 - (\gamma/\nu)^{\alpha}\right)}, \quad 0 < \gamma < \nu < \infty,\ \alpha > 0, \tag{6}$$

$$\sigma^2 = \frac{\alpha\gamma^{\alpha}\left(\nu^{2-\alpha} - \gamma^{2-\alpha}\right)}{(2-\alpha)\left(1 - (\gamma/\nu)^{\alpha}\right)} - \frac{\alpha^2\gamma^{2\alpha}\left(\nu^{1-\alpha} - \gamma^{1-\alpha}\right)^2}{(1-\alpha)^2\left(1 - (\gamma/\nu)^{\alpha}\right)^2}, \quad 0 < \gamma < \nu < \infty,\ \alpha > 0. \tag{7}$$
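Formulas (6) and (7) are easy to check against numerical integration of the pdf in (3). The sketch below is an illustration under assumptions: it takes the variance as $E[X^2] - \mu^2$ with $E[X^2] = \alpha\gamma^{\alpha}(\nu^{2-\alpha} - \gamma^{2-\alpha})/((2-\alpha)(1-(\gamma/\nu)^{\alpha}))$, the function names and the midpoint rule are choices made here, and the closed forms require $\alpha \neq 1$ and $\alpha \neq 2$.

```python
def tp_pdf(x, alpha, gamma, nu):
    """Truncated Pareto pdf, equation (3), for gamma <= x <= nu."""
    return alpha * gamma**alpha * x ** (-alpha - 1) / (1.0 - (gamma / nu) ** alpha)

def tp_mean(alpha, gamma, nu):
    """Mean, equation (6); requires alpha != 1."""
    norm = 1.0 - (gamma / nu) ** alpha
    return alpha * gamma**alpha * (gamma ** (1 - alpha) - nu ** (1 - alpha)) / ((alpha - 1) * norm)

def tp_second_moment(alpha, gamma, nu):
    """E[X^2], the first term of equation (7); requires alpha != 2."""
    norm = 1.0 - (gamma / nu) ** alpha
    return alpha * gamma**alpha * (nu ** (2 - alpha) - gamma ** (2 - alpha)) / ((2 - alpha) * norm)

def tp_variance(alpha, gamma, nu):
    """Variance, equation (7), computed as E[X^2] - mu^2."""
    return tp_second_moment(alpha, gamma, nu) - tp_mean(alpha, gamma, nu) ** 2

def midpoint_integral(f, a, b, steps=200000):
    """Plain midpoint rule, used only to cross-check the closed forms."""
    h = (b - a) / steps
    return sum(f(a + (k + 0.5) * h) for k in range(steps)) * h

# Illustrative heavy-tailed case: finite moments despite alpha < 1.
alpha, gamma, nu = 0.8, 1.0, 10.0
mu = tp_mean(alpha, gamma, nu)
var = tp_variance(alpha, gamma, nu)
mu_num = midpoint_integral(lambda x: x * tp_pdf(x, alpha, gamma, nu), gamma, nu)
print(mu, mu_num, var)
```

The example uses $\alpha < 1$ on purpose: the untruncated Pareto mean would be infinite there, while the truncated mean and variance are finite.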

1.3. The Weighted Empirical Distribution Function. Recently, some parametric estimation methods for the truncated Pareto distribution have been developed [6]. But there is an efficiency problem in the estimates of the distribution tails. The distribution tail values and their probabilities are important in many fields, for example, value at risk in risk analysis, survival probability in survival analysis, tolerance limits in quality control, prediction intervals, and confidence intervals. Classical statistical inference theory depends on the classical empirical distribution function (EDF) $S_n(x)$:

$$S_n(x) = \frac{1}{n}\sum_{i=1}^{n} I_{(-\infty, x]}(X_i), \quad \text{where } I_A = \begin{cases} 1, & \text{if } x \in A, \\ 0, & \text{if } x \notin A. \end{cases} \tag{8}$$

$S_n(x)$ is a minimum variance unbiased estimator for the cdf $F(x)$ based on a random sample $X_1, X_2, \ldots, X_n$. Note that $S_n(x)$ uses the equal weight $1/n$ for each sample point.

Should we use equal weights on extreme data values as well? Recently, authors have applied various weights to data points by using different philosophies; for example, the Jackknife method gives zero weight to eliminated data [7], and the weighted bootstrap [8] and weighted empirical distribution functions or processes have been discussed [9, 10]. But it is difficult to determine what weights should be used for the data points. Huang and Brill [11] introduced a weighted level crossing estimation method from a geometric point of view to visualize random samples; in the $L_1$-optimal sense, the method improves the efficiency of the estimation of tails.

This paper proposes a semiparametric approach to estimate $F(x)$ in (4) using $L_1$-optimal and $L_2$-optimal weights. Both theoretical and simulation efficiencies are consistently improved when compared with existing methods. This method is based on a symmetric weighted empirical distribution function (SWEDF) of Huang [12], namely,

$$F_n^{*}(x) = \sum_{i=1}^{n} I_{(-\infty,x]}\left(X_{(i)}\right) p_{n,i}, \quad x \in \mathbb{R},\ n > 2, \tag{9}$$

where the $p_{n,i}$ are symmetric general weights,

$$p_{n,i} \equiv \begin{cases} w, & i = 2, \ldots, n-1, \quad 0 < w < \dfrac{1}{n-2}, \\[6pt] w_{1,n} \equiv \dfrac{1}{2}\,(1 - (n-2)w), & i = 1, n, \end{cases} \tag{10}$$

and $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are the order statistics of the random sample. Note that

$$0 < p_{n,i} < 1, \quad i = 1, \ldots, n, \qquad \sum_{i=1}^{n} p_{n,i} = 1. \tag{11}$$
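The weights in (10) and the step function in (9) can be sketched in a few lines of Python. This is only an illustration of the definitions above, not the authors' implementation; the function names `symmetric_weights` and `swedf` are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

def symmetric_weights(n, w):
    """Weights p_{n,i} from (10): w for the n-2 middle order statistics,
    (1 - (n-2)w)/2 for the smallest and the largest."""
    if n <= 2:
        raise ValueError("the SWEDF in (9) requires n > 2")
    if not 0.0 < w < 1.0 / (n - 2):
        raise ValueError("w must satisfy 0 < w < 1/(n-2)")
    w_extreme = 0.5 * (1.0 - (n - 2) * w)
    return [w_extreme] + [w] * (n - 2) + [w_extreme]

def swedf(sample, w):
    """Return F_n^*(x) from (9) as a callable step function."""
    xs = sorted(sample)                      # order statistics X_(1) <= ... <= X_(n)
    cum = list(accumulate(symmetric_weights(len(xs), w)))

    def cdf(x):
        k = bisect_right(xs, x)              # number of order statistics <= x
        return cum[k - 1] if k else 0.0

    return cdf
```

With $w = 1/n$ every weight equals $1/n$ and `swedf` reduces to the EDF in (8); any $w > 1/n$ moves mass from the two extreme order statistics to the middle ones.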

The parameter $w$ in (10) is the weight for the middle $n-2$ data; $w_{1,n}$ in (10) is the weight for the extreme data. It is interesting to explore how the value of $w$ affects the estimation of a heavy-tailed distribution. We may use $w_{1,n}$ flexibly.

In Section 2, we propose a weighted method for estimating the shape parameter and the mean of the truncated Pareto distribution. In Section 3, an exact efficiency function of the new mean estimator relative to the classical estimator is derived. Section 4 explores the $L_1$-optimal weights and $L_2$-optimal weights for estimating the mean. Section 5 gives results of Monte Carlo simulations. The simulation efficiencies are consistent with the exact efficiencies in Section 3. In Section 6, we analyze the hurricane loss data and forest fire loss data given in Section 1 by using the proposed method. The statistical inference in these examples shows that the distribution curve estimated by the proposed weighted estimation method fits the tails of the data better than several existing methods. Suggestions for further studies are also discussed.

2. Estimation Methods

In this section, we discuss the existing and proposed methodologies for the truncated Pareto distribution. Consider a random sample $X_1, X_2, \ldots, X_n$ from the distribution in (3), and let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ denote its order statistics.

2.1. Maximum Likelihood Estimators (Hill, Beg, and Aban). There are several different maximum likelihood estimators (MLE) for estimating the shape parameter $\alpha$.

A popular estimator is the Hill [13] MLE, which uses the $r$ largest order statistics, $1 \le r \le n-1$, to estimate the original Pareto shape parameter $\alpha$ in (1). When applying it to the truncated Pareto distribution in (3), it is defined as

$$\hat{\alpha}_{\mathrm{Hill}} = \left[ r^{-1} \sum_{i=1}^{r} \ln X_{(n-i+1)} - \ln X_{(n-r)} \right]^{-1}. \tag{12}$$
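As a minimal sketch, (12) translates directly into code once the sample is sorted. The function name `hill_estimator` is hypothetical, and the sketch assumes strictly positive data with at least one of the $r$ largest values above $X_{(n-r)}$.

```python
import math

def hill_estimator(sample, r):
    """Hill estimate of alpha from (12), using the r largest order statistics."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= r <= n - 1:
        raise ValueError("r must satisfy 1 <= r <= n-1")
    # xs[n - i] is X_(n-i+1) and xs[n - r - 1] is X_(n-r) in 1-based notation.
    mean_log_top = sum(math.log(xs[n - i]) for i in range(1, r + 1)) / r
    return 1.0 / (mean_log_top - math.log(xs[n - r - 1]))
```

Section 6 uses this estimator with $r = n - 1$, so that $X_{(1)}$ plays the role of the threshold.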

Beg [5] developed the MLE method for the truncated Pareto distribution when $\nu$ is known. The Beg MLE for $\alpha$ can be obtained by solving the following equation:

$$n\left[ \frac{1}{\hat{\alpha}_{\mathrm{Beg}}} + \frac{\ln\left(X_{(1)}/\nu\right)}{\left(X_{(1)}/\nu\right)^{-\hat{\alpha}_{\mathrm{Beg}}} - 1} - \ln\left(\frac{g}{X_{(1)}}\right) \right] = 0, \tag{13}$$

where $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$ and $g = \left(X_{(1)} X_{(2)} \cdots X_{(n)}\right)^{1/n}$ is the sample geometric mean.

Aban's MLE [6], when $\gamma$ and $\nu$ are known, is obtained by solving the equation

$$\frac{n}{\hat{\alpha}_{\mathrm{Aban}}} + \frac{n\,(\gamma/\nu)^{\hat{\alpha}_{\mathrm{Aban}}} \ln(\gamma/\nu)}{1 - (\gamma/\nu)^{\hat{\alpha}_{\mathrm{Aban}}}} - \sum_{i=1}^{n} \left[ \ln X_{(i)} - \ln \gamma \right] = 0, \tag{14}$$

where $\gamma \le X_{(i)} \le \nu$ and $n$ is the sample size.

Note that we may use the estimators $\hat{\gamma} = \min(X_1, X_2, \ldots, X_n)$ and $\hat{\nu} = \max(X_1, X_2, \ldots, X_n)$ in (14) when $\gamma$ and $\nu$ are unknown. The same applies to equations (16) and (18) below. There are other estimators of $\gamma$ and $\nu$ in the literature, for example, Cooke [14].
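Equation (14) has no closed-form solution, so it has to be solved numerically. The sketch below uses plain bisection, which is an assumption of this illustration (the paper does not state which root finder was used); the names `aban_score` and `aban_mle` and the bracket `[lo, hi]` are hypothetical. The left-hand side of (14) is decreasing in $\alpha$, so a sign change on the bracket identifies the root; when there is none, the function raises an error rather than returning a value.

```python
import math

def aban_score(alpha, sample, gamma, nu):
    """Left-hand side of (14) for known gamma and nu."""
    n = len(sample)
    log_q = math.log(gamma / nu)             # negative, since gamma < nu
    # (gamma/nu)^alpha / (1 - (gamma/nu)^alpha) = 1 / expm1(-alpha * log_q)
    truncation = n * log_q / math.expm1(-alpha * log_q)
    return n / alpha + truncation - sum(math.log(x / gamma) for x in sample)

def aban_mle(sample, gamma, nu, lo=1e-6, hi=50.0, tol=1e-10):
    """Solve (14) for alpha by bisection on the assumed bracket [lo, hi]."""
    f_lo = aban_score(lo, sample, gamma, nu)
    f_hi = aban_score(hi, sample, gamma, nu)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change of the score on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if aban_score(mid, sample, gamma, nu) > 0:   # score decreases in alpha
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Samples for which the score is already negative near $\alpha = 0$ have no positive root on the bracket and trigger the `ValueError`.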

2.2. Moment Estimator. To estimate the population truncated Pareto mean, the sample mean estimator is

$$\hat{\mu}_X = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i. \tag{15}$$

A Moment estimator $\hat{\alpha}_M$ for estimating $\alpha$ is the solution of the equation

$$\bar{X} = \frac{\hat{\alpha}_M\, \gamma^{\hat{\alpha}_M} \left( \gamma^{1-\hat{\alpha}_M} - \nu^{1-\hat{\alpha}_M} \right)}{\left(\hat{\alpha}_M - 1\right)\left(1 - (\gamma/\nu)^{\hat{\alpha}_M}\right)}, \quad \hat{\alpha}_M > 0. \tag{16}$$

2.3. A Proposed Weighted Estimator. Now, to estimate the population truncated Pareto mean, we define a weighted mean based on the weighted empirical distribution function in (9) as

$$\hat{\mu}_w = \bar{X}_w = \sum_{i=1}^{n} p_{n,i} X_{(i)} = \sum_{i=2}^{n-1} w X_{(i)} + \frac{1}{2}\,(1 - (n-2)w)\left[ X_{(1)} + X_{(n)} \right], \tag{17}$$

where $n$ is the sample size and $w$ is the weight as defined in (10). Since the weights $p_{n,i}$ sum to one by (11), $\bar{X}_w$ reduces to the sample mean $\bar{X}$ in (15) when $w = 1/n$. Then, for estimating $\alpha$, we define $\hat{\alpha}_w$ to be the solution of the equation

$$\bar{X}_w = \frac{\hat{\alpha}_w\, \gamma^{\hat{\alpha}_w} \left( \gamma^{1-\hat{\alpha}_w} - \nu^{1-\hat{\alpha}_w} \right)}{\left(\hat{\alpha}_w - 1\right)\left(1 - (\gamma/\nu)^{\hat{\alpha}_w}\right)}, \quad \hat{\alpha}_w > 0. \tag{18}$$
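Equations (16) and (18) share the same right-hand side and differ only in the target mean, so one solver serves both. The following is a hedged sketch rather than the authors' code: the function names, the bisection method, and the bracket `[lo, hi]` are assumptions of this illustration. The truncated Pareto mean is rewritten in terms of $q = \gamma/\nu$ (algebraically the same expression) to avoid overflow for large $\gamma$, and the removable singularity at $\alpha = 1$ is handled separately.

```python
import math

def tp_mean(alpha, gamma, nu):
    """Truncated Pareto mean, i.e. the right-hand side of (16) and (18)."""
    log_q = math.log(gamma / nu)
    if abs(alpha - 1.0) < 1e-12:              # limit of the expression at alpha = 1
        return -gamma * log_q / (1.0 - gamma / nu)
    top = -math.expm1((alpha - 1.0) * log_q)  # 1 - q^(alpha-1)
    bottom = -math.expm1(alpha * log_q)       # 1 - q^alpha
    return alpha * gamma * top / ((alpha - 1.0) * bottom)

def weighted_mean(sample, w):
    """Weighted mean in (17) with weight w on each middle order statistic."""
    xs = sorted(sample)
    n = len(xs)
    return w * sum(xs[1:-1]) + 0.5 * (1.0 - (n - 2) * w) * (xs[0] + xs[-1])

def solve_alpha(target_mean, gamma, nu, lo=1e-6, hi=50.0, tol=1e-10):
    """Find alpha with tp_mean(alpha) = target_mean by bisection."""
    if not tp_mean(hi, gamma, nu) < target_mean < tp_mean(lo, gamma, nu):
        raise ValueError("target mean is outside the attainable range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tp_mean(mid, gamma, nu) > target_mean:   # mean decreases in alpha
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The Moment estimate is `solve_alpha(sum(xs) / len(xs), gamma, nu)` and the weighted estimate is `solve_alpha(weighted_mean(xs, w), gamma, nu)`. For right-skewed data and $w > 1/n$ the weighted mean is below the sample mean, which gives a larger estimate of $\alpha$.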

3. An Exact Efficiency Function

In this section, an exact efficiency function of the weighted estimator $\hat{\mu}_w$ in (17) relative to the sample mean $\bar{X}$ in (15) is given. The mathematical derivations are given in the appendix.

Theorem 3. The mean and mean square error (MSE) of $\hat{\mu}_w$ in (17), when $\alpha > 0$, are given by

$$E[\hat{\mu}_w] = w\,\gamma \sum_{i=2}^{n-1} \sum_{k=0}^{\infty} \binom{\frac{1}{\alpha}+k-1}{k} \rho^{k}\, \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)} + \frac{1}{2}\,(1-(n-2)w)\,\gamma \sum_{k=0}^{\infty} \binom{\frac{1}{\alpha}+k-1}{k} \rho^{k} \left( \frac{\Gamma(n+1)\,\Gamma(k+1)}{\Gamma(k+n+1)} + \frac{n}{k+n} \right), \tag{19}$$

$$\mathrm{MSE}(\hat{\mu}_w) = a w^{2} + b w + c, \quad \text{where } 0 < w < 1,$$

$$a = A + \frac{1}{4}(n-2)^{2} B - (n-2)\,C,$$

$$b = -\frac{1}{2}(n-2)\,B + C - 2\mu D + \mu\,(n-2)\,E,$$

$$c = \frac{1}{4} B - \mu E + \mu^{2},$$

$$A = \gamma^{2} \sum_{i=2}^{n-1} \sum_{k=0}^{\infty} \binom{\frac{2}{\alpha}+k-1}{k} \rho^{k}\, \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)} + 2\gamma^{2} \sum_{i=2}^{n-2} \sum_{j=i+1}^{n-1} \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha}+k-1}{k} \binom{\frac{1}{\alpha}+l-1}{l} \rho^{k+l} \left[ \frac{\Gamma(n+1)\,\Gamma(k+i)\,\Gamma(k+l+j)}{\Gamma(i)\,\Gamma(k+j)\,\Gamma(k+l+n+1)} \right],$$

$$B = \gamma^{2} \sum_{k=0}^{\infty} \binom{\frac{2}{\alpha}+k-1}{k} \rho^{k} \left[ \frac{\Gamma(n+1)\,\Gamma(k+1)}{\Gamma(k+n+1)} + \frac{n}{k+n} \right] + 2\gamma^{2} \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha}+k-1}{k} \binom{\frac{1}{\alpha}+l-1}{l} \rho^{k+l} \left[ \frac{\Gamma(n+1)\,\Gamma(k+1)}{\Gamma(k+n)\,(k+l+n)} \right],$$

$$C = \gamma^{2} \sum_{i=2}^{n-1} \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha}+k-1}{k} \binom{\frac{1}{\alpha}+l-1}{l} \rho^{k+l} \left[ \frac{\Gamma(n+1)\,\Gamma(k+1)\,\Gamma(k+l+i)}{\Gamma(k+i)\,\Gamma(k+l+n+1)} + \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n)\,(k+l+n)} \right],$$

$$D = \gamma \sum_{i=2}^{n-1} \sum_{k=0}^{\infty} \binom{\frac{1}{\alpha}+k-1}{k} \rho^{k}\, \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)},$$

$$E = \gamma \sum_{k=0}^{\infty} \binom{\frac{1}{\alpha}+k-1}{k} \rho^{k} \left[ \frac{\Gamma(n+1)\,\Gamma(k+1)}{\Gamma(k+n+1)} + \frac{n}{k+n} \right], \tag{20}$$

where $\rho = (g^{\alpha} - 1)/g^{\alpha}$, $g = \nu/\gamma$, $0 < \rho < 1$, $g > 1$, $0 < \gamma \le x \le \nu < \infty$, and

$$\Gamma(z) = \int_{0}^{\infty} t^{z-1} e^{-t}\, dt, \quad z > 0, \text{ is the Gamma function.} \tag{21}$$

In terms of these quantities, (19) reads $E[\hat{\mu}_w] = wD + \frac{1}{2}(1-(n-2)w)E$. The factor $\gamma^{2}$ in the single-sum terms of $A$ and $B$ comes from $E[X_{(i)}^{2}]$, since $x^{2} = \gamma^{2}(1-\rho u)^{-2/\alpha}$.

Corollary 4. The efficiency function (EFF) of $\hat{\mu}_w$ in (17) relative to the sample mean $\bar{X}$ in (15) is given by

$$\mathrm{EFF}(\hat{\mu}_w) = \frac{\mathrm{Var}(\bar{X})}{\mathrm{MSE}(\hat{\mu}_w)} = \frac{\sigma^{2}/n}{\mathrm{MSE}(\hat{\mu}_w)} = \frac{\dfrac{\alpha\gamma^{\alpha}\left(\nu^{2-\alpha} - \gamma^{2-\alpha}\right)}{(2-\alpha)\left(1-(\gamma/\nu)^{\alpha}\right)}}{n\,\mathrm{MSE}(\hat{\mu}_w)} - \frac{\dfrac{\alpha^{2}\gamma^{2\alpha}\left(\nu^{1-\alpha} - \gamma^{1-\alpha}\right)^{2}}{(1-\alpha)^{2}\left(1-(\gamma/\nu)^{\alpha}\right)^{2}}}{n\,\mathrm{MSE}(\hat{\mu}_w)}, \tag{22}$$

where $\sigma^{2}$ and $\mathrm{MSE}(\hat{\mu}_w)$ are given in (7) and (20).

4. Optimal Weights

4.1. $L_1$-Optimal Weights. Huang and Brill [11] proposed an $L_1$-optimal weight, which is based on the Manhattan metric, for the $F_n^{*}(x)$ in (9). It is

$$w_{L_1\text{-opt}} = \frac{1}{\sqrt{n(n-1)}} > \frac{1}{n} \quad \text{for the middle data.} \tag{23}$$

Huang and Brill [11] proved that the exact efficiency (EFF) of $F_n^{*}(x)$ in (9) relative to the EDF $S_n(x)$ exceeds 1 on the tails of the distribution. In this paper, we use this $w_{L_1\text{-opt}}$ weight in order to improve the efficiency of estimating the tail probability of the truncated Pareto distribution. Huang [12] indicates that, in general, for any distribution, if $w > 1/n$, which means putting more weight on the middle data, the efficiency of estimating the tail probability will be improved.
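For concreteness, the weight in (23) and the share it leaves for each extreme observation under (10) can be computed as follows. This is a small illustrative sketch; the function names are hypothetical.

```python
import math

def l1_optimal_weight(n):
    """w_{L1-opt} = 1 / sqrt(n(n-1)) from (23), the weight on each middle order statistic."""
    if n <= 2:
        raise ValueError("n must exceed 2")
    return 1.0 / math.sqrt(n * (n - 1))

def extreme_weight(n, w):
    """Weight w_{1,n} = (1 - (n-2)w)/2 left for each of X_(1) and X_(n), as in (10)."""
    return 0.5 * (1.0 - (n - 2) * w)
```

For $n = 10$ the middle weight is about 0.10541, as listed in Table 2, so each extreme observation receives about 0.0784 instead of the equal weight 0.1. Because $\sqrt{n(n-1)} > n - 2$, the weight always satisfies the constraint $w < 1/(n-2)$ in (10).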

Next, we explore an alternative $L_2$-optimal weight.

4.2. $L_2$-Optimal Weights

Corollary 5. An $L_2$-optimal weight $w_{L_2\text{-opt}}$ for the efficiency function of the $\hat{\mu}_w$ given in (17), for estimating the population mean of the distribution in (3), relative to the sample mean $\bar{X}$ in (15), when $n > 4C/B + 2$, is given by

$$w_{L_2\text{-opt}} = -\frac{b}{2a}. \tag{24}$$

The minimum $\mathrm{MSE}_{\min}(\hat{\mu}_w)$ and $\mathrm{EFF}_{\max}(\hat{\mu}_w)$ are given by

$$\mathrm{MSE}_{\min}(\hat{\mu}_w) = -\frac{b^{2}}{4a} + c,$$

$$\mathrm{EFF}_{\max}(\hat{\mu}_w) = \frac{\dfrac{\alpha\gamma^{\alpha}\left(\nu^{2-\alpha} - \gamma^{2-\alpha}\right)}{(2-\alpha)\left(1-(\gamma/\nu)^{\alpha}\right)}}{n\left(-b^{2}/(4a) + c\right)} - \frac{\dfrac{\alpha^{2}\gamma^{2\alpha}\left(\nu^{1-\alpha} - \gamma^{1-\alpha}\right)^{2}}{(1-\alpha)^{2}\left(1-(\gamma/\nu)^{\alpha}\right)^{2}}}{n\left(-b^{2}/(4a) + c\right)}. \tag{25}$$

In (24) and (25), $a$, $b$, $c$, $C$, and $B$ are defined in (20).
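The step from the quadratic form of the MSE in (20) to the weight in (24) is short enough to spell out. The following is a sketch of that calculation, not the proof from the appendix. It uses only the definitions in (20) and two observations that are assumptions of this sketch: $A \ge 0$ and $B > 0$, since they are the second moments of $\sum_{i=2}^{n-1} X_{(i)}$ and $X_{(1)} + X_{(n)}$, respectively.

```latex
\begin{align*}
\mathrm{MSE}(\hat\mu_w) &= a w^2 + b w + c
  = a\left(w + \frac{b}{2a}\right)^2 + c - \frac{b^2}{4a},\\
a &= A + (n-2)\left[\frac{(n-2)B}{4} - C\right] > 0
  \quad\text{whenever } n > \frac{4C}{B} + 2
  \text{ (with } A \ge 0,\ B > 0\text{)},\\
\frac{d}{dw}\,\mathrm{MSE}(\hat\mu_w) &= 2aw + b = 0
  \;\Longrightarrow\; w = -\frac{b}{2a},
  \qquad \mathrm{MSE}_{\min} = c - \frac{b^2}{4a}.
\end{align*}
```

The condition on $n$ therefore guarantees that the parabola opens upward, so the stationary point is a minimum. Whether $-b/(2a)$ also falls in the range $0 < w < 1/(n-2)$ required by (10) has to be checked separately for given $n$ and $\alpha$.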

Table 2: Exact efficiencies of $\hat{\mu}_w$ relative to other estimators by using $L_2$-optimal weights; $\gamma = 1$, $\nu = 10$.

Sample size   α     w_L2-opt   w = 1/n   w_L1-opt   MSE(μ̂_w)   MSE(X̄)   EFF(μ̂_w (X̄))
n = 10        0.8   0.10396    0.1       0.10541    0.4007      0.4034    1.0068
              1.8   0.11446                         0.1406      0.1647    1.1717
              3     0.11847                         0.0365      0.0493    1.3501
n = 20        0.8   0.05015    0.05      0.05130    0.2017      0.2017    1.0001
              1.8   0.05255                         0.0766      0.0824    1.0759
              3     0.05463                         0.0217      0.0247    1.1367
n = 30        0.8   0.03323    0.03333   0.03390    0.1344      0.1345    1.0003
              1.8   0.03425                         0.0528      0.0549    1.0399
              3     0.03529                         0.0155      0.0164    1.0628
n = 50        0.8   0.02021    0.02      0.02020    0.0968      0.1063    1.0978
              1.8   0.02159                         0.0514      0.0592    1.1536
              3     0.02298                         0.0274      0.0352    1.2849
n = 100       0.8   0.01001    0.01      0.01005    0.0765      0.0830    1.0845
              1.8   0.01002                         0.0542      0.0576    1.0637
              3     0.01002                         0.0380      0.0400    1.0526

The proof of Corollary 5 is in the appendix. Table 2 lists the values of $w_{L_2\text{-opt}}$, $w_{L_1\text{-opt}}$, $\mathrm{MSE}_{\min}(\hat{\mu}_w)$, $\mathrm{MSE}(\bar{X})$, and the exact $\mathrm{EFF}_{\max}(\hat{\mu}_w)$ of $\hat{\mu}_w$ relative to $\bar{X}$ for $n = 10, 20, 30, 50$, and $100$ and $\alpha = 0.8, 1.8$, and $3$, by using (24) and (25). We note that the values of $w_{L_2\text{-opt}}$ are greater than $1/n$ (the entry printed for $n = 30$, $\alpha = 0.8$ is the one exception) and that all exact $\mathrm{EFF}_{\max}(\hat{\mu}_w)$ relative to $\bar{X}$ are greater than 1.

Remark 6. The $w_{L_1\text{-opt}}$ in (23) is totally nonparametric; it is more robust and easy to use. Note that $w_{L_2\text{-opt}}$ in (24) depends on the parameter $\alpha$. In practice, we may estimate $\alpha$ first and then obtain a $w_{L_2\text{-opt}}$ while still keeping the optimal advantage. Of course, we use the given $\alpha$ values in the simulations. However, $w_{L_1\text{-opt}}$ and $w_{L_2\text{-opt}}$ are close to each other when $n > 100$.

5. Simulations

Next, we use simulations to compare the performance of the three parametric MLE estimators, that is, Hill's, Beg's, and Aban's estimators in (12), (13), and (14), and the two semiparametric estimators, that is, the Moment and the weighted estimators in (16) and (18), for estimating the shape parameter $\alpha$ in (3). We generate $m = 1000$ random samples of size $n = 100$ from the distribution of (3). Only $w_{L_1\text{-opt}}$ is used for the weighted estimator, since Table 2 indicates that the values of $w_{L_2\text{-opt}}$ and $w_{L_1\text{-opt}}$ are close to each other when $n = 100$.

We know that if $\alpha \le 2$, the original Pareto distribution has an infinite variance; if $\alpha \le 1$, then the mean is infinite. These cases have inference difficulties. We focus on $\alpha = 0.5$, $0.8$, and $1.8$ and let $\gamma = 1$, $\nu = 10$. Figure 3 contains the box-plots of the comparison of these five estimators. Note that in the cases $\alpha = 0.5$ and $0.8$, Beg's and Hill's estimators have large biases, while the weighted, Aban's, and the Moment estimators performed very well, although Aban's MLE sometimes has unstable solutions. When $\alpha = 1.8$, the results are similar, except that Hill's and Beg's estimators performed better. It is interesting to see that, in all three cases, the median of the weighted estimator for $\alpha$ is larger than those of Aban's and the Moment estimators. We will discuss how this affects the tail estimation in the next section. The simulations were run by using MAPLE 15 with double precision.
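The paper's simulations were run in MAPLE. As an illustrative substitute, the sketch below shows how samples from the truncated Pareto distribution can be drawn in Python by inverse-transform sampling, using $x = \gamma(1-\rho u)^{-1/\alpha}$ with $\rho = 1 - (\gamma/\nu)^{\alpha}$, which is the inverse of the cdf in (4) as used in the appendix. The function names and the `simulate` harness are hypothetical; any estimator that maps a sample to a number can be passed in.

```python
import random

def tp_quantile(u, alpha, gamma, nu):
    """Inverse of the truncated Pareto cdf in (4): x = gamma * (1 - rho*u)^(-1/alpha)."""
    rho = 1.0 - (gamma / nu) ** alpha
    return gamma * (1.0 - rho * u) ** (-1.0 / alpha)

def tp_sample(n, alpha, gamma, nu, rng):
    """Draw n values by inverse-transform sampling with uniform draws from rng."""
    return [tp_quantile(rng.random(), alpha, gamma, nu) for _ in range(n)]

def simulate(estimator, m, n, alpha, gamma, nu, seed=0):
    """Apply `estimator` to m independent samples of size n; returns the m estimates."""
    rng = random.Random(seed)
    return [estimator(tp_sample(n, alpha, gamma, nu, rng)) for _ in range(m)]
```

A call such as `simulate(my_estimator, 1000, 100, 0.8, 1.0, 10.0)` mirrors the design above ($m = 1000$, $n = 100$, $\gamma = 1$, $\nu = 10$); the resulting list can then be summarized by its median and quartiles, as in a box-plot.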

6. Examples

Now we use the proposed method and compare it with existing methods to analyze the data of the two examples outlined in Section 1.

6.1. Hurricane Loss Example

6.1.1. Comparison of Four Estimation Methods. First, we look at the hurricane loss example in Section 1.1.1. Based on the simulation results in Figure 3, we consider the three better estimators out of the five in the heavy-tailed case, that is, Aban's, the Moment, and the weighted estimators in (27), (28), and (29) for the truncated Pareto pdf $f(x)$ in (3) and the cdf $F(x)$ in (4). We also compare them with Hill's estimator in (26) for the original Pareto pdf $f_p(x)$ in (1) and the cdf $F_p(x)$ in (2). Here,

$$f_{\mathrm{Hill}}(x) = \frac{\hat{\alpha}_{\mathrm{Hill}}\, \gamma^{\hat{\alpha}_{\mathrm{Hill}}}}{x^{\hat{\alpha}_{\mathrm{Hill}}+1}}, \qquad F_{\mathrm{Hill}}(x) = 1 - \left(\frac{\gamma}{x}\right)^{\hat{\alpha}_{\mathrm{Hill}}}, \tag{26}$$

where $\hat{\alpha}_{\mathrm{Hill}}$ is given in (12) with $r = n - 1$; consider

$$f_{\mathrm{MLE}}(x) = \frac{\hat{\alpha}_{\mathrm{MLE}}\, \gamma^{\hat{\alpha}_{\mathrm{MLE}}}\, x^{-\hat{\alpha}_{\mathrm{MLE}}-1}}{1 - (\gamma/\nu)^{\hat{\alpha}_{\mathrm{MLE}}}}, \qquad F_{\mathrm{MLE}}(x) = 1 - \frac{\gamma^{\hat{\alpha}_{\mathrm{MLE}}}\left(x^{-\hat{\alpha}_{\mathrm{MLE}}} - \nu^{-\hat{\alpha}_{\mathrm{MLE}}}\right)}{1 - (\gamma/\nu)^{\hat{\alpha}_{\mathrm{MLE}}}}, \tag{27}$$

where $\hat{\alpha}_{\mathrm{MLE}}$ is Aban's MLE given in (14); consider

$$f_{M}(x) = \frac{\hat{\alpha}_{M}\, \gamma^{\hat{\alpha}_{M}}\, x^{-\hat{\alpha}_{M}-1}}{1 - (\gamma/\nu)^{\hat{\alpha}_{M}}}, \qquad F_{M}(x) = 1 - \frac{\gamma^{\hat{\alpha}_{M}}\left(x^{-\hat{\alpha}_{M}} - \nu^{-\hat{\alpha}_{M}}\right)}{1 - (\gamma/\nu)^{\hat{\alpha}_{M}}}, \tag{28}$$

where $\hat{\alpha}_{M}$ is given in (16); consider

$$f_{w}(x) = \frac{\hat{\alpha}_{w}\, \gamma^{\hat{\alpha}_{w}}\, x^{-\hat{\alpha}_{w}-1}}{1 - (\gamma/\nu)^{\hat{\alpha}_{w}}}, \qquad F_{w}(x) = 1 - \frac{\gamma^{\hat{\alpha}_{w}}\left(x^{-\hat{\alpha}_{w}} - \nu^{-\hat{\alpha}_{w}}\right)}{1 - (\gamma/\nu)^{\hat{\alpha}_{w}}}, \tag{29}$$

where $\hat{\alpha}_{w}$ is given in (18), using the weight $w_{L_1\text{-opt}} = 1/\sqrt{n(n-1)}$.

Table 3: Estimation of index, mean, median, and VaR for hurricane loss data.

Method            α̂        μ̂               Median          5% value at risk
Pareto (Hill)     0.8126    ∞                8.68 billion    147.68 billion
TPD (Aban)        0.6206    21.74 billion    9.73 billion    85.15 billion
TPD (Moment)      0.6476    20.48 billion    9.47 billion    82.55 billion
TPD (Weighted)    0.6990    19.37 billion    9.02 billion    77.58 billion

Figure 3: Box-plots of the estimators (Weighted, Aban, Beg, Hill, Moment) for the shape parameter $\alpha$ by using the $L_1$-optimal weight; panels for $\alpha = 0.5$, $\alpha = 0.8$, and $\alpha = 1.8$; $\gamma = 1$, $\nu = 10$, sample size $n = 100$, and generating times $m = 1000$.
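The fitted cdfs in (26) to (29) and the 5% value at risk reported in Table 3 follow from closed-form expressions. The sketch below is illustrative only (the function names are hypothetical): it evaluates the truncated Pareto cdf and inverts $1 - F(x) = p$ for the loss level exceeded with probability $p$, alongside the corresponding formula for the original Pareto cdf.

```python
def tp_cdf(x, alpha, gamma, nu):
    """Truncated Pareto cdf in the form used in (27)-(29), for gamma <= x <= nu."""
    q_nu = (gamma / nu) ** alpha
    return 1.0 - ((gamma / x) ** alpha - q_nu) / (1.0 - q_nu)

def tp_value_at_risk(p, alpha, gamma, nu):
    """Loss level exceeded with probability p under the truncated Pareto model."""
    q_nu = (gamma / nu) ** alpha
    # Solve 1 - F(x) = p for x.
    return gamma * (p * (1.0 - q_nu) + q_nu) ** (-1.0 / alpha)

def pareto_value_at_risk(p, alpha, gamma):
    """Loss level exceeded with probability p under the original Pareto cdf in (26)."""
    return gamma * p ** (-1.0 / alpha)
```

Under the original Pareto model the ratio of the 5% value at risk to the median is $(0.05/0.5)^{-1/\hat{\alpha}} = 10^{1/\hat{\alpha}}$, which is about 17.0 for $\hat{\alpha}_{\mathrm{Hill}} = 0.8126$, in line with the Table 3 ratio $147.68/8.68$. The truncated model can never return a value above $\nu$, which is why its value-at-risk estimates are smaller.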

The results of these four methods are listed in Table 3 by using the hurricane loss data, where $\gamma = X_{(\min)}$, $\nu = X_{(\max)}$, and $n = 49$.

Figure 4 is a log-log plot showing the upper tail for the hurricane loss data. In this plot, the circles represent the real data, and the straight line represents the estimated original Pareto distribution. The dashed line, dotted line, and thick solid line represent the estimated truncated Pareto distributions by using Aban's, the Moment, and the weighted estimators, respectively. We observe two conclusions intuitively.

(1) The original Pareto distribution (straight line) does not fit the data well in the tail. The truncated Pareto distribution fits the data very well using all three estimation methods (Aban, Moment, and weighted). Note that the 5% value at risk estimated by the original Pareto model is about 147 billion, whereas the three truncated model estimates are about 80 billion. It appears that the original Pareto model overestimates the risk. An insurance company would set a high premium if it used the Pareto model, with the result that many people could not afford to buy insurance.

(2) We examine the three truncated Pareto estimates. Around the tail, the weighted estimate fits the data best; that is, the $\ln(P(X > x))$ curve turns downward more quickly, following the trend of the data pattern. This is because the weighted estimate $\hat{\alpha}_w$ is the largest among the three truncated model estimates (this is consistent with the simulation results in Figure 3), and the weighted mean estimate $\hat{\mu}_w$ and the 5% value at risk are the smallest among the three methods (this is obtained by placing less weight on the extreme values, e.g., the loss of the 1926 Great Miami Hurricane), so its estimated distribution is less heavy in the tail compared with the other methods.

Next, in order to confirm these conclusions, we run three goodness-of-fit tests. Later, we define the absolute error and integrated error as the measures of the distance from the empirical data points to the estimated Pareto curve and truncated Pareto curves.

6.1.2. Goodness-of-Fit Tests. Our objective is to test if the estimated distributions in (26)–(29) fit the data properly. We test the hypotheses $H_0\colon F(x) = F^{*}(x)$ against $H_1\colon F(x) \ne F^{*}(x)$, where $F(x)$ is the true unknown distribution function and $F^{*}(x)$ is the estimated Pareto cdf in (26) or the estimated truncated Pareto cdf in (27)–(29). In this paper, we use three EDF goodness-of-fit tests.

Figure 4: Log-log plot of the hurricane loss data, $\ln(P(X > x))$ against $\ln(x)$ (legend: Data, Pareto, Aban, Moment, Weighted).

(1) The Kolmogorov-Smirnov (K-S) test [15]: the test statistic is given by

$$T = \sup_{x} \left| F^{*}(x) - S_n(x) \right|, \quad -\infty < x < \infty, \tag{30}$$

where $S_n(x)$ is the EDF in (8). Under $H_0$, the two-tailed $P$-value for the K-S test is as follows:

$$P\text{-value} = 1 - \left[ 1 - T \sum_{j=0}^{[n(1-T)]} \frac{n!}{j!\,(n-j)!} \left(1 - T - \frac{j}{n}\right)^{n-j} \left(T + \frac{j}{n}\right)^{j-1} \right]^{2}, \tag{31}$$

where $[n(1-T)]$ is the integer part of $n(1-T)$.

(2) The Anderson-Darling (A-D) test: Anderson and Darling [16] introduced a measure of "distance" between the empirical distribution $S_n(x)$ and the proposed cdf $F^{*}(x)$ by using a metric function space:

$$W_n^{2} = n \int_{-\infty}^{\infty} \left[ S_n(x) - F^{*}(x) \right]^{2} \psi\left(F^{*}(x)\right) dF(x), \tag{32}$$

where $\psi(u) = 1/(u(1-u))$ is a weight function with $u = F(x)$. Let $u_i = F(X_{(i)})$, $i = 1, \ldots, n$; under $H_0$, the test statistic and $P$-value are given by

$$W_n^{2} = -n - \frac{1}{n} \sum_{j=1}^{n} (2j-1)\left( \log(u_j) + \log(1 - u_{n-j+1}) \right),$$

$$P\text{-value} = 1 - \frac{\sqrt{2}}{z} \sum_{j=0}^{\infty} (-1)^{j}\, \frac{\Gamma\left(j + \frac{1}{2}\right)(4j+1)}{j!}\, e^{-(4j+1)^{2}\pi^{2}/(8z)} \times \int_{0}^{\infty} e^{\left( z/(8(w^{2}+1)) - (4j+1)^{2}\pi^{2}w^{2}/(8z) \right)}\, dw, \tag{33}$$

where $z$ is the observed value of $W_n^{2}$ and $\Gamma(x) = \int_{0}^{\infty} t^{x-1} e^{-t}\, dt$, $x > 0$, is the Gamma function.

Figure 5: Absolute errors $|F^{*}(x) - S_n(x)|$ for the hurricane loss example on the 10 largest losses (legend: Aban, Moment, Pareto, Weighted).

(3) The Cramer-von Mises (C-v-M) test [16] uses $\psi(u) = 1$ in (32); thus, under $H_0$, the test statistic and $P$-value are given by

$$n\omega^{2} = \frac{1}{12n} + \sum_{j=1}^{n} \left( u_j - \frac{2j-1}{2n} \right)^{2},$$

$$P\text{-value} = 1 - \frac{1}{\pi\sqrt{z}} \sum_{j=0}^{\infty} (-1)^{2j}\, \frac{\Gamma\left(j + \frac{1}{2}\right)(4j+1)^{1/2}}{j!\,\sqrt{\pi}}\, e^{-(4j+1)^{2}/(16z)} \times K_{1/4}\left( \frac{(4j+1)^{2}}{16z} \right), \tag{34}$$

where $z$ is the observed value of $n\omega^{2}$ and $K_{1/4}(x)$ is the modified Bessel function of the second kind,

$$K_{\alpha}(x) = \frac{\pi}{2}\, \frac{I_{-\alpha}(x) - I_{\alpha}(x)}{\sin(\alpha\pi)}, \qquad I_{\alpha}(x) = \sum_{m=0}^{\infty} \frac{1}{m!\,\Gamma(m+\alpha+1)} \left(\frac{x}{2}\right)^{2m+\alpha}. \tag{35}$$
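The three test statistics above depend on the data only through $u_j = F^{*}(X_{(j)})$, so they can be computed together. The sketch below is a minimal illustration with a hypothetical function name; it returns the statistics only and does not evaluate the $P$-value series in (31), (33), and (34). One assumption is made explicit in the code: when a fitted cdf gives $u_j = 0$ or $u_j = 1$ for some observation (which happens for a truncated model with $\gamma$ and $\nu$ set to the sample minimum and maximum), the logarithms in the A-D sum diverge, and the sketch reports infinity rather than failing.

```python
import math

def gof_statistics(sample, cdf):
    """Return (T, W2, n_omega2): the K-S statistic in (30), the A-D statistic in (33)
    and the C-v-M statistic in (34), for a fitted cdf F*."""
    xs = sorted(sample)
    n = len(xs)
    u = [cdf(x) for x in xs]                                  # u_j = F*(X_(j))
    # sup |F* - S_n| is attained at a jump of S_n, just before or at an order statistic.
    ks = max(max(j / n - u[j - 1], u[j - 1] - (j - 1) / n) for j in range(1, n + 1))
    cvm = 1.0 / (12 * n) + sum((u[j - 1] - (2 * j - 1) / (2 * n)) ** 2
                               for j in range(1, n + 1))
    if min(u) <= 0.0 or max(u) >= 1.0:
        ad = math.inf                                         # log(0) in the A-D sum
    else:
        ad = -n - sum((2 * j - 1) * (math.log(u[j - 1]) + math.log(1.0 - u[n - j]))
                      for j in range(1, n + 1)) / n
    return ks, ad, cvm
```

Any callable cdf can be passed in, for example one of the fitted cdfs in (26) to (29) with the estimated parameters plugged in.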

For fitting the 49 losses in the hurricane loss data, for example, after computing estimates of $F^{*}(x)$ by using (26), (27), (28), and (29), we compute the absolute error (AE) in (30) and define the integrated error (IE) by

$$\mathrm{IE} = \left[ \int_{X_{(n-r+1)}}^{X_{(n)}} \left( S_n(x) - F^{*}(x) \right)^{2} dx \right]^{1/2}. \tag{36}$$
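Because $S_n(x)$ is constant between consecutive order statistics, the integral in (36) splits into $r - 1$ pieces, each with a smooth integrand. The sketch below approximates every piece with a composite midpoint rule; the quadrature rule, the number of subintervals `steps`, and the function name are assumptions of this illustration, since the paper does not say how the integral was evaluated.

```python
import math

def integrated_error(sample, cdf, r, steps=200):
    """IE in (36): L2 distance between S_n and F* over [X_(n-r+1), X_(n)],
    approximated with a composite midpoint rule on each step of S_n."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    total = 0.0
    for i in range(n - r + 1, n):            # 1-based i: S_n = i/n on [X_(i), X_(i+1))
        a, b = xs[i - 1], xs[i]
        h = (b - a) / steps
        level = i / n
        total += h * sum((level - cdf(a + (k + 0.5) * h)) ** 2 for k in range(steps))
    return math.sqrt(total)
```

With $r = n$ the integral runs over the whole sample range; smaller $r$ restricts it to the upper tail, as in the $r = 18$ and $r = 10$ columns of Table 4.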

Figure 5 gives the absolute errors (AE) in (30) for the 10 largest losses and confirms that the original Pareto estimate has relatively larger errors and the weighted estimate has relatively smaller errors in the tail. Figure 5 explains the data fitting of the tail of the distributions in Figure 4.

We also compute the AE in (30) and IE in (36) to confirm those tail errors by using $r$ as the number of the largest losses, $1 \le r \le n$. The AE and IE values are given in Table 4 (the smallest values are marked with ∗).

Table 4: Errors of goodness-of-fit tests, $n = 49$, for hurricane example.

                  Absolute error (AE)              Integrated error (IE)
Method            r = 49    r = 18    r = 10      r = 49    r = 18    r = 10
Pareto (Hill)     0.1340    0.0584    0.0584      0.4844    0.3818    0.3723
TPD (Aban)        0.0948∗   0.0839    0.0832      0.3114    0.2565    0.2161
TPD (Moment)      0.1053    0.0738    0.0737      0.2985∗   0.2171    0.1825
TPD (Weighted)    0.1250    0.0561∗   0.0561∗     0.3098    0.1678∗   0.1453∗

Table 5: The estimation of index, mean, median, and VaR for forest fire loss data.

Method            α̂        μ̂               Median          5% value at risk
Pareto (Hill)     4.3044    1.096 billion    1.09 billion    1.855 billion
TPD (Aban)        0.7964    1.167 billion    1.15 billion    1.446 billion
TPD (Moment)      0.8260    1.166 billion    1.15 billion    1.446 billion
TPD (Weighted)    0.8642    1.165 billion    1.15 billion    1.446 billion

The weighted estimator has the smallest AE and IE values for the $r = 18$ and $10$ largest losses, and its IE value is almost equal to the smallest IE value for $r = 49$, that is, for all losses. We statistically conclude that the weighted estimated distribution is the best fit in the tail of the hurricane loss data.

6.2. Forest Fire Loss Data Example. Next, we look at the forest fire loss example in Section 1.1.2. The data in Table 1 contain a relatively substantial number of large forest fire losses, which convinces us to use a truncated Pareto model and compare the four estimators in (26), (27), (28), and (29). We use the 25 largest losses in this study.

Figure 6 is a log-log plot which shows the upper tail for the forest fire loss data. The circles represent the real data, and the straight line represents the estimated original Pareto distribution. The dashed line, dotted line, and thick solid line represent the estimated truncated Pareto distribution by using Aban's, the Moment, and the weighted estimators, respectively. We can see that the estimated truncated Pareto distributions fit the data very well using all three estimation methods and are much better than the original Pareto distribution. Around the tail, the weighted and the Moment estimation methods perform the best. The fact that the tail of the data curves downward in Figure 6 is evidence in support of using a truncated Pareto model.

The results of these estimators are listed in Table 5 by using the forest fire loss data, where $\gamma = X_{(1)}$, $\nu = X_{(n)}$, and $n = 25$.

Note that the weighted estimate $\hat{\alpha}_w$ is the largest among the three truncated Pareto estimates. The 5% value at risk of the original Pareto estimate is the largest. The 5% VaR may be overestimated compared to the truncated Pareto models.

As in the hurricane example, we compute the AE in (30) and IE in (36) again for fitting the 25 losses of the forest fire loss data.

Figure 7 gives the absolute errors (AE) in (30) for the 10 largest losses and confirms that the original Pareto estimate has relatively larger errors and the weighted estimate has relatively smaller errors in the tail. Figure 7 explains the data fitting of the tail of the distributions in Figure 6.

We also compute the AE in (30) and IE in (36) to confirm those tail errors by using $r$ as the number of the largest losses, $1 \le r \le n$. The AE and IE values are given in Table 6 (the smallest values are marked with ∗).

The weighted estimator has the smallest AE and IE values for $r = 10$ and $5$, and its IE value for $r = 25$ is almost equal to the smallest IE value. We statistically conclude that the weighted estimated distribution is the best fit to the tail of the forest fire loss data.

Figure 6: Log-log plot for forest fire example, $\ln(P(X > x))$ against $\ln(x)$ (legend: Data, Pareto, Aban, Moment, Weighted).

Table 6: Errors of goodness-of-fit tests, $n = 25$, for forest fire loss data.

                  Absolute error (AE)              Integrated error (IE)
Method            r = 25    r = 10    r = 5       r = 25    r = 10    r = 5
Pareto (Hill)     0.2656    0.1288    0.1288      2.7125    1.1493    1.0985
TPD (Aban)        0.1503∗   0.1195    0.1195      1.3687∗   0.7911    0.5430
TPD (Moment)      0.1520    0.0982    0.0982      1.3736    0.7764    0.5430
TPD (Weighted)    0.1542    0.0964∗   0.0964∗     1.3796    0.7577∗   0.5203∗

Figure 7: Absolute errors $|F^{*}(x) - S_n(x)|$ for the forest fire loss example on the 10 largest losses (legend: Aban, Moment, Pareto, Weighted).

7. Conclusions

In the complicated real world, it is difficult to construct a model combining all the desired features. In general, the final model selection depends on the best fitting model. The criteria are based on goodness-of-fit tests, existence of the moments, characteristic largest values, and log-log plots. The hurricane loss data and forest fire loss data are well fitted by the truncated Pareto distribution. In summary,

(a) we recommend the truncated Pareto model as an appropriate loss distribution for analyzing huge risk loss data sets. The upper and lower limits can be set by the largest and smallest losses or other reasonable values;

(b) the estimated loss distribution provides a prediction of the next disaster's 5% value at risk. The largest loss in the data set plays an important role in the prediction. In Figures 4 and 6, the trend of the tail distribution is crucial for an insurance company setting policy and also for inhabitants and governments making plans to minimize damage from natural disasters;

(c) the semiparametric methods (Moment and weighted) are robust, easy to use, and more stable, and they fit the data better than the MLE methods. In both of the foregoing examples, the estimated shape parameter is less than 1. The original Pareto model is not able to use the Moment or the weighted methods. This is another advantage of using the truncated Pareto model;

(d) the statistical inference in these examples shows that the distribution curve estimated by the proposed weighted estimation method fits the tails of the data better than the Moment and MLE estimators. This is due to the fact that the weighted estimator gives less weight to the extreme values and has a good MSE. Based on these studies, we suggest that further studies on the usage of weights may be useful.

Appendix

Mathematical Derivations

Lemma A.1. For a truncated Pareto random variable $X$ with pdf given in (3), for $i, j \in \{1, 2, \ldots, n\}$, $i < j$, where $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are order statistics, one has

$$E\left[X_{(i)}\right] = \gamma \sum_{k=0}^{\infty} \binom{\frac{1}{\alpha}+k-1}{k} \rho^{k}\, \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)}, \tag{A.1}$$

$$E\left[X_{(i)}^{2}\right] = \gamma^{2} \sum_{k=0}^{\infty} \binom{\frac{2}{\alpha}+k-1}{k} \rho^{k}\, \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)}, \tag{A.2}$$

$$E\left[X_{(i)} X_{(j)}\right] = \gamma^{2} \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha}+k-1}{k} \binom{\frac{1}{\alpha}+l-1}{l} \rho^{k+l} \left[ \frac{\Gamma(n+1)\,\Gamma(k+i)\,\Gamma(k+l+j)}{\Gamma(i)\,\Gamma(k+j)\,\Gamma(k+l+n+1)} \right], \tag{A.3}$$

where $\rho = (g^{\alpha} - 1)/g^{\alpha}$, $g = \nu/\gamma$, $0 < \rho < 1$, $g > 1$, $0 < \gamma \le x \le \nu < \infty$, and $\alpha > 0$.

Proof. Let the cdf of a truncated Pareto distribution in (4) be

$$F(x) = u, \qquad x = \gamma\,(1 - \rho u)^{-1/\alpha}, \qquad \rho = \frac{g^{\alpha} - 1}{g^{\alpha}}, \qquad g = \frac{\nu}{\gamma}. \tag{A.4}$$

By the theory of order statistics, we have

$$f_{i}(x) = \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)}\, [F(x)]^{i-1}\, [1 - F(x)]^{n-i}\, f(x), \tag{A.5}$$

$$f_{ij}(x, y) = \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(j-i)\,\Gamma(n-j+1)}\, [F(x)]^{i-1}\, [F(y) - F(x)]^{j-i-1}\, [1 - F(y)]^{n-j}\, f(x)\, f(y). \tag{A.6}$$

Using the binomial formula $1/(1-x)^{r} = \sum_{k=0}^{\infty} \binom{r+k-1}{k} x^{k}$, for $m = 1, 2$, we have

$$\begin{aligned} E\left[X_{(i)}^{m}\right] &= \int_{\gamma}^{\nu} x^{m} f_{i}(x)\, dx \\ &= \int_{0}^{1} \gamma^{m} (1 - \rho u)^{-m/\alpha}\, \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)}\, u^{i-1} (1-u)^{n-i}\, du \\ &= \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)}\, \gamma^{m} \sum_{k=0}^{\infty} \binom{\frac{m}{\alpha}+k-1}{k} \int_{0}^{1} \rho^{k}\, u^{k+i-1} (1-u)^{n-i}\, du \\ &= \gamma^{m} \sum_{k=0}^{\infty} \binom{\frac{m}{\alpha}+k-1}{k} \rho^{k}\, \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)}. \end{aligned} \tag{A.7}$$

By substituting $m = 1$ and $2$, respectively, we have (A.1) and (A.2). Moreover, writing $u = F(x)$ and $h = F(y)$,

$$\begin{aligned} E\left[X_{(i)} X_{(j)}\right] &= \int_{\gamma}^{\nu} \int_{\gamma}^{y} x\, y\, f_{ij}(x, y)\, dx\, dy \\ &= \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(j-i)\,\Gamma(n-j+1)} \int_{0}^{1} \int_{0}^{h} \gamma (1 - \rho u)^{-1/\alpha}\, \gamma (1 - \rho h)^{-1/\alpha}\, u^{i-1} (h-u)^{j-i-1} (1-h)^{n-j}\, du\, dh \\ &= \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(j-i)\,\Gamma(n-j+1)}\, \gamma^{2} \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha}+k-1}{k} \binom{\frac{1}{\alpha}+l-1}{l} \rho^{k+l} \int_{0}^{1} \int_{0}^{1} \left(\frac{u}{h}\right)^{k+i-1} \left(1 - \frac{u}{h}\right)^{j-i-1} h^{k+l+j-1} (1-h)^{n-j}\, d\!\left(\frac{u}{h}\right) dh \\ &= \gamma^{2} \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha}+k-1}{k} \binom{\frac{1}{\alpha}+l-1}{l} \rho^{k+l}\, \frac{\Gamma(n+1)\,\Gamma(k+i)\,\Gamma(k+l+j)}{\Gamma(i)\,\Gamma(k+j)\,\Gamma(k+l+n+1)}. \end{aligned} \tag{A.8}$$

The last step uses the two Beta integrals $\int_{0}^{1} t^{k+i-1}(1-t)^{j-i-1}\, dt = \Gamma(k+i)\,\Gamma(j-i)/\Gamma(k+j)$ and $\int_{0}^{1} h^{k+l+j-1}(1-h)^{n-j}\, dh = \Gamma(k+l+j)\,\Gamma(n-j+1)/\Gamma(k+l+n+1)$.

Proof of Theorem 3. When $\alpha > 0$, we use

$$E[\mu_w] = \sum_{i=2}^{n-1} w\,E[X_{(i)}] + \frac{1}{2}\bigl(1-(n-2)w\bigr)\bigl(E[X_{(1)}] + E[X_{(n)}]\bigr),$$

$$\mathrm{MSE}(\mu_w) = E[(\mu_w-\mu)^{2}] = E[\mu_w^{2}] - 2\mu E[\mu_w] + \mu^{2}. \tag{A.9}$$

And we have

$$\begin{aligned}
E[\mu_w^{2}] &= \sum_{i=2}^{n-1} w^{2}E[X_{(i)}^{2}] + 2\sum_{i=2}^{n-2}\sum_{j=i+1}^{n-1} w^{2}E[X_{(i)}X_{(j)}] \\
&\quad+ \frac{1}{4}\bigl(1-(n-2)w\bigr)^{2}\bigl(E[X_{(1)}^{2}] + E[X_{(n)}^{2}] + 2E[X_{(1)}X_{(n)}]\bigr) \\
&\quad+ \sum_{i=2}^{n-1} w\bigl(1-(n-2)w\bigr)\bigl(E[X_{(1)}X_{(i)}] + E[X_{(i)}X_{(n)}]\bigr).
\end{aligned} \tag{A.10}$$

Let

$$\begin{aligned}
A &= \sum_{i=2}^{n-1}E[X_{(i)}^{2}] + 2\sum_{i=2}^{n-2}\sum_{j=i+1}^{n-1}E[X_{(i)}X_{(j)}], \\
B &= E[X_{(1)}^{2}] + E[X_{(n)}^{2}] + 2E[X_{(1)}X_{(n)}], \\
C &= \sum_{i=2}^{n-1}\bigl(E[X_{(1)}X_{(i)}] + E[X_{(i)}X_{(n)}]\bigr), \\
D &= \sum_{i=2}^{n-1}E[X_{(i)}], \\
E &= E[X_{(1)}] + E[X_{(n)}].
\end{aligned} \tag{A.11}$$

Then

$$\begin{aligned}
\mathrm{MSE}(\mu_w) &= Aw^{2} + \frac{1}{4}\bigl(1-(n-2)w\bigr)^{2}B + w\bigl(1-(n-2)w\bigr)C - 2\mu\left(Dw + \frac{1}{2}\bigl(1-(n-2)w\bigr)E\right) + \mu^{2} \\
&= \left(A + \frac{1}{4}(n-2)^{2}B - (n-2)C\right)w^{2} + \left(-\frac{1}{2}(n-2)B + C - 2\mu D + \mu(n-2)E\right)w + \frac{1}{4}B - \mu E + \mu^{2}.
\end{aligned} \tag{A.12}$$

Now use Lemma A.1; then we have Theorem 3.

Lemma A.2. $\mathrm{MSE}(\mu_w)$ in (20) is a convex function with $a > 0$ when $n > 4C/B + 2$, where $C$ and $B$ are given in (20).

Proof. Let $f(w) = \mathrm{MSE}(\mu_w) = aw^{2} + bw + c$. We have

$$a = A + \frac{1}{4}(n-2)^{2}B - (n-2)C, \tag{A.13}$$

where

$$\begin{aligned}
A &= E\left[\left(\sum_{i=2}^{n-1}X_{(i)}\right)^{2}\right] \ge 0, \\
B &= E\left[\bigl(X_{(1)} + X_{(n)}\bigr)^{2}\right] \ge 0, \\
C &= E\left[\sum_{i=2}^{n-1}X_{(i)}\bigl(X_{(1)} + X_{(n)}\bigr)\right] \ge 0.
\end{aligned} \tag{A.14}$$

Writing $a = A + (n-2)\left[\frac{1}{4}(n-2)B - C\right]$ and noting that $n > 2$, we have

$$\frac{1}{4}(n-2)B - C > 0, \text{ such that } a > 0, \text{ when } n > \frac{4C}{B} + 2. \tag{A.15}$$

Proof of Corollary 5. By Theorem 3 and Lemma A.2, let

$$f(w) = \mathrm{MSE}(\mu_w) = aw^{2} + bw + c, \tag{A.16}$$

which is a quadratic function with first and second derivatives with respect to $w$

$$f'(w) = 2aw + b, \qquad f''(w) = 2a > 0. \tag{A.17}$$

Hence $f(w)$ is a convex function with minimum value

$$f_{\min} = a\left(-\frac{b}{2a}\right)^{2} + b\left(-\frac{b}{2a}\right) + c = -\frac{b^{2}}{4a} + c. \tag{A.18}$$

The maximum value of $\mathrm{EFF}(\mu_w)$ is

$$\mathrm{EFF}_{\max}(\mu_w) = \frac{1}{n\left(-b^{2}/(4a) + c\right)}\left[\frac{\alpha\gamma^{\alpha}\bigl(\nu^{2-\alpha} - \gamma^{2-\alpha}\bigr)}{(2-\alpha)\bigl(1-(\gamma/\nu)^{\alpha}\bigr)} - \frac{\alpha^{2}\gamma^{2\alpha}\bigl(\nu^{1-\alpha} - \gamma^{1-\alpha}\bigr)^{2}}{(1-\alpha)^{2}\bigl(1-(\gamma/\nu)^{\alpha}\bigr)^{2}}\right] \quad\text{at } w = -\frac{b}{2a}. \tag{A.19}$$

Acknowledgments

The authors thank the referees and the editor for their comments, which helped to improve the paper. This research is supported by the Natural Sciences and Engineering Research Council of Canada.

References

[1] R. A. Pielke Jr., J. Gratz, C. W. Landsea, D. Collins, M. A. Saunders, and R. Musulin, "Normalized hurricane damage in the United States: 1900–2005," Natural Hazards Review, vol. 9, no. 1, pp. 29–42, 2008.

[2] P. Embrechts, C. Klüppelberg, and T. Mikosch, Modelling Extremal Events for Insurance and Finance, Springer, New York, NY, USA, 2003.

[3] C. Kleiber and S. Kotz, Statistical Size Distributions in Economics and Actuarial Sciences, John Wiley & Sons, New York, NY, USA, 2003.

[4] J. Beirlant, Y. Goegebeur, J. Segers, and J. Teugels, Statistics of Extremes: Theory and Applications, John Wiley & Sons, New York, NY, USA, 2005.

[5] M. A. Beg, "Estimation of the tail probability of the truncated Pareto distribution," Journal of Information & Optimization Sciences, vol. 2, no. 2, pp. 192–198, 1981.

[6] I. B. Aban, M. M. Meerschaert, and A. K. Panorska, "Parameter estimation for the truncated Pareto distribution," Journal of the American Statistical Association, vol. 101, no. 473, pp. 270–277, 2006.

[7] B. Efron, "Bootstrap methods: another look at the jackknife," The Annals of Statistics, vol. 7, no. 1, pp. 1–26, 1979.

[8] P. Barbe and P. Bertail, The Weighted Bootstrap, Springer, New York, NY, USA, 1995.

[9] G. R. Shorack and J. A. Wellner, Empirical Processes with Applications to Statistics, John Wiley & Sons, New York, NY, USA, 1986.

[10] H. L. Koul, Weighted Empirical and Linear Models, vol. 21 of Lecture Notes-Monograph Series, Institute of Mathematical Statistics, Hayward, Calif, USA, 1992.

[11] M. L. Huang and P. H. Brill, "A distribution estimation method based on level crossings," Journal of Statistical Planning and Inference, vol. 124, no. 1, pp. 45–62, 2004.

[12] M. L. Huang, "The efficiencies of a weighted distribution function estimator," in The Proceeding of American Statistical Association, Nonparametric Statistics Section, pp. 1502–1506, 2003.

[13] B. M. Hill, "A simple general approach to inference about the tail of a distribution," The Annals of Statistics, vol. 3, no. 5, pp. 1163–1174, 1975.

[14] P. Cooke, "Statistical inference for bounds of random variables," Biometrika, vol. 66, no. 2, pp. 367–374, 1979.

[15] A. N. Kolmogorov, "Sulla determinazione empirica di una legge di distribuzione," Giornale dell'Istituto Italiano degli Attuari, vol. 4, pp. 83–91, 1933.

[16] T. W. Anderson and D. A. Darling, "Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes," The Annals of Mathematical Statistics, vol. 23, pp. 193–212, 1952.


Table 1: Forest fire loss data, AB, Canada (in million 2006 CAD).

Year   Loss     Year   Loss     Year   Loss
1977   5717     1987   9562     1997   12916
1978   6549     1988   10043    1998   11756
1979   7564     1989   11196    1999   12319
1980   9790     1990   12374    2000   11852
1981   9054     1991   12397    2001   14208
1982   9989     1992   12414    2002   14890
1983   8160     1993   11819    2003   13561
1984   9295     1994   11515    2004   11266
1985   9260     1995   11108    2005   12659
1986   9735     1996   11633    2006   14033

The quantile function of the truncated Pareto distribution is

$$F^{-1}(u) = \left(\frac{1-u}{\gamma^{\alpha}} + \frac{u}{\nu^{\alpha}}\right)^{-1/\alpha}, \quad 0 \le u \le 1,\ 0 < \gamma \le \nu < \infty,\ \alpha > 0. \tag{5}$$
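The quantile function gives a direct way to simulate from the truncated Pareto distribution by inverse transform sampling. The sketch below is illustrative only: the function names are hypothetical, and the cdf is written as the algebraic inverse of the quantile function above, $F(x) = \bigl(1-(\gamma/x)^{\alpha}\bigr)/\bigl(1-(\gamma/\nu)^{\alpha}\bigr)$.

```python
import random


def tp_quantile(u, alpha, gamma_, nu):
    """Quantile function (5) of the truncated Pareto distribution on [gamma_, nu]."""
    return ((1.0 - u) / gamma_ ** alpha + u / nu ** alpha) ** (-1.0 / alpha)


def tp_cdf(x, alpha, gamma_, nu):
    """Truncated Pareto cdf, written as the inverse of tp_quantile."""
    return (1.0 - (gamma_ / x) ** alpha) / (1.0 - (gamma_ / nu) ** alpha)


def tp_sample(n, alpha, gamma_, nu, rng=None):
    """Draw n values by applying the quantile function to uniform(0, 1) draws."""
    rng = rng or random.Random()
    return [tp_quantile(rng.random(), alpha, gamma_, nu) for _ in range(n)]
```

Passing a seeded `random.Random` instance as `rng` makes a simulation reproducible.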

The mean and variance of $X$ are

$$\mu = \frac{\alpha\gamma^{\alpha}\bigl(\gamma^{1-\alpha} - \nu^{1-\alpha}\bigr)}{(\alpha-1)\bigl(1-(\gamma/\nu)^{\alpha}\bigr)}, \quad 0 < \gamma < \nu < \infty,\ \alpha > 0, \tag{6}$$

$$\sigma^{2} = \frac{\alpha\gamma^{\alpha}\bigl(\nu^{2-\alpha} - \gamma^{2-\alpha}\bigr)}{(2-\alpha)\bigl(1-(\gamma/\nu)^{\alpha}\bigr)} - \frac{\alpha^{2}\gamma^{2\alpha}\bigl(\nu^{1-\alpha} - \gamma^{1-\alpha}\bigr)^{2}}{(1-\alpha)^{2}\bigl(1-(\gamma/\nu)^{\alpha}\bigr)^{2}}, \quad 0 < \gamma < \nu < \infty,\ \alpha > 0. \tag{7}$$
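As a quick sanity check of (6) and (7), the sketch below evaluates the two formulas directly. It is a hedged illustration: the function names are hypothetical, and the removable singularities at $\alpha = 1$ and $\alpha = 2$ are deliberately not handled. With $\gamma = 1$ and $\nu = 10$, the resulting $\sigma^{2}/n$ can be compared with the MSE of the sample mean reported in Table 2.

```python
def tp_mean(alpha, gamma_, nu):
    """Mean (6) of the truncated Pareto distribution; assumes alpha != 1."""
    trunc = 1.0 - (gamma_ / nu) ** alpha
    return (alpha * gamma_ ** alpha * (gamma_ ** (1 - alpha) - nu ** (1 - alpha))
            / ((alpha - 1) * trunc))


def tp_second_moment(alpha, gamma_, nu):
    """E[X^2], the first term of (7); assumes alpha != 2."""
    trunc = 1.0 - (gamma_ / nu) ** alpha
    return (alpha * gamma_ ** alpha * (nu ** (2 - alpha) - gamma_ ** (2 - alpha))
            / ((2 - alpha) * trunc))


def tp_variance(alpha, gamma_, nu):
    """Variance (7) as E[X^2] minus the squared mean."""
    return tp_second_moment(alpha, gamma_, nu) - tp_mean(alpha, gamma_, nu) ** 2
```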

1.3. The Weighted Empirical Distribution Function. Recently, some parametric estimation methods for the truncated Pareto distribution have been developed [6]. But there is an efficiency problem in the estimates of the distribution tails. The distribution tail values and their probabilities are important in many fields, for example, value at risk in risk analysis, survival probability in survival analysis, tolerance limits in quality control, prediction intervals, and confidence intervals. Classical statistical inference theory depends on the classical empirical distribution function (EDF) $S_n(x)$:

$$S_n(x) = \frac{1}{n}\sum_{i=1}^{n} I_{(-\infty,x]}(X_i), \quad\text{where } I_A(x) = \begin{cases} 1, & \text{if } x \in A, \\ 0, & \text{if } x \notin A. \end{cases} \tag{8}$$

$S_n(x)$ is a minimum variance unbiased estimator for the cdf $F(x)$ based on a random sample $X_1, X_2, \ldots, X_n$. Note that $S_n(x)$ uses the equal weight $1/n$ for each sample point. Should we use equal weights on extreme data values as well? Recently, authors have applied various weights to data points by using different philosophies; for example, the jackknife method gives zero weight to eliminated data [7], and the weighted bootstrap [8] and weighted empirical distribution functions or processes have been discussed [9, 10]. But it is difficult to determine what weights should be used for the data points. Huang and Brill [11] introduced a weighted level crossing estimation method from a geometric point of view to visualize random samples; in the $L_1$-optimal sense, the method improves the efficiency of the estimation of tails.

This paper proposes a semiparametric approach to estimate $F(x)$ in (4) using $L_1$-optimal and $L_2$-optimal weights. Both theoretical and simulation efficiencies are consistently improved when compared with existing methods. This method is based on a symmetric weighted empirical distribution function (SWEDF) of Huang [12], namely,

$$F_n^{*}(x) = \sum_{i=1}^{n} I_{(-\infty,x]}\bigl(X_{(i)}\bigr)\,p_{n,i}, \quad x \in \mathbb{R},\ n > 2, \tag{9}$$

where the $p_{n,i}$ are symmetric general weights

$$p_{n,i} \equiv \begin{cases} w, & i = 2, \ldots, n-1,\ 0 < w < \dfrac{1}{n-2}, \\[2mm] w_{1,n} \equiv \dfrac{1}{2}\bigl(1-(n-2)w\bigr), & i = 1, n, \end{cases} \tag{10}$$

and $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are the order statistics of the random sample. Note that

$$0 < p_{n,i} < 1, \quad i = 1, \ldots, n, \qquad \sum_{i=1}^{n} p_{n,i} = 1. \tag{11}$$
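The weights in (10) and the estimator in (9) translate into a few lines of code. The following is a minimal sketch with hypothetical function names; it assumes the sample has no special handling for ties, and it rejects values of $w$ outside the range stated in (10).

```python
def swedf_weights(n, w):
    """Symmetric weights (10): w for the middle n - 2 points, w_{1,n} for the extremes."""
    if n <= 2:
        raise ValueError("n must exceed 2")
    if not 0.0 < w < 1.0 / (n - 2):
        raise ValueError("w must lie in (0, 1/(n-2))")
    end_weight = 0.5 * (1.0 - (n - 2) * w)
    return [end_weight] + [w] * (n - 2) + [end_weight]


def swedf(x, sample, w):
    """Symmetric weighted empirical distribution function (9) evaluated at x."""
    xs = sorted(sample)
    weights = swedf_weights(len(xs), w)
    return sum(p for xi, p in zip(xs, weights) if xi <= x)
```

With $w = 1/n$ every weight equals $1/n$ and the function reduces to the classical EDF in (8).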

The parameter $w$ in (10) is the weight for the middle $n-2$ data; $w_{1,n}$ in (10) is the weight for the extreme data. It is interesting to explore how the value of $w$ affects the estimation of a heavy-tailed distribution. We may use $w_{1,n}$ flexibly.

In Section 2, we propose a weighted method for estimating the shape parameter and the mean of the truncated Pareto distribution. In Section 3, an exact efficiency function of the new mean estimator relative to the classical estimator is derived. Section 4 explores the $L_1$-optimal weights and $L_2$-optimal weights for estimating the mean. Section 5 gives results of Monte Carlo simulations. The simulation efficiencies are consistent with the exact efficiencies in Section 3. In Section 6, we analyze the hurricane loss data and forest fire loss data given in Section 1 by using the proposed method. The statistical inference in these examples shows that the distribution curve estimated by the proposed weighted estimation method fits the tails of the data better than several existing methods. Suggestions for further studies are also discussed.

2. Estimation Methods

In this section, we discuss the existing and proposed methodologies for the truncated Pareto distribution. Consider a random sample $X_1, X_2, \ldots, X_n$ from the distribution in (3), and let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ denote its order statistics.

2.1. Maximum Likelihood Estimators (Hill, Beg, and Aban). There are several different maximum likelihood estimators (MLE) for estimating the shape parameter $\alpha$.

A popular estimator is the Hill [13] MLE, which uses the $r$ largest order statistics, $1 \le r \le n-1$, to estimate the original Pareto shape parameter $\alpha$ in (1). When applying it to the truncated Pareto distribution in (3), it is defined as

$$\hat\alpha_{\mathrm{Hill}} = \left[r^{-1}\sum_{i=1}^{r}\ln X_{(n-i+1)} - \ln X_{(n-r)}\right]^{-1}. \tag{12}$$
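The Hill estimator in (12) is straightforward to compute from the sorted sample. The sketch below is an illustration with a hypothetical function name; it assumes strictly positive data and uses zero-based list indexing, so that $X_{(n-i+1)}$ is `xs[n - i]` and $X_{(n-r)}$ is `xs[n - r - 1]`.

```python
import math


def hill_estimator(sample, r):
    """Hill estimate (12) of the shape parameter from the r largest order statistics."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= r <= n - 1:
        raise ValueError("r must satisfy 1 <= r <= n - 1")
    log_threshold = math.log(xs[n - r - 1])  # ln X_(n-r)
    mean_log_top = sum(math.log(xs[n - i]) for i in range(1, r + 1)) / r
    return 1.0 / (mean_log_top - log_threshold)
```

For example, for the sample $\{1, e, e^{2}\}$ with $r = 2$ the mean log excess is $1.5$, so the estimate is $2/3$.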

Beg [5] developed the MLE method for the truncated Pareto distribution when $\nu$ is known. The Beg MLE $\hat\alpha_{\mathrm{Beg}}$ for $\alpha$ can be obtained by solving the following equation:

$$n\left[\frac{1}{\hat\alpha_{\mathrm{Beg}}} + \frac{\ln\bigl(X_{(1)}/\nu\bigr)}{\bigl(X_{(1)}/\nu\bigr)^{-\hat\alpha_{\mathrm{Beg}}} - 1} - \ln\left(\frac{g}{X_{(1)}}\right)\right] = 0, \tag{13}$$

where $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$ and $g = \bigl(X_{(1)}X_{(2)}\cdots X_{(n)}\bigr)^{1/n}$ is the sample geometric mean.

Aban's MLE $\hat\alpha_{\mathrm{Aban}}$ [6], when $\gamma$ and $\nu$ are known, is obtained by solving the equation

$$\frac{n}{\hat\alpha_{\mathrm{Aban}}} + \frac{n(\gamma/\nu)^{\hat\alpha_{\mathrm{Aban}}}\ln(\gamma/\nu)}{1-(\gamma/\nu)^{\hat\alpha_{\mathrm{Aban}}}} - \sum_{i=1}^{n}\bigl[\ln X_{(i)} - \ln\gamma\bigr] = 0, \tag{14}$$

where $\gamma \le X_{(i)} \le \nu$ and $n$ is the sample size.

Note that we may use the estimators $\gamma = \min(X_1, X_2, \ldots, X_n)$ and $\nu = \max(X_1, X_2, \ldots, X_n)$ in (14) when $\gamma$ and $\nu$ are unknown. The same applies to equations (16) and (18) below. There are other estimators of $\gamma$ and $\nu$ in the literature, for example, Cooke [14].
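Equation (14) has no closed-form solution, so it has to be solved numerically. The sketch below uses plain bisection on the left-hand side of (14); this root finder, the bracketing interval, and the function names are assumptions made for illustration and are not taken from [6]. When the bracket contains no sign change, which can happen when the mean log excess is at least $\frac{1}{2}\ln(\nu/\gamma)$, the sketch raises an error instead of returning a value.

```python
import math


def aban_score(alpha, n, log_excess_sum, gamma_, nu):
    """Left-hand side of (14) as a function of alpha."""
    q = (gamma_ / nu) ** alpha
    return n / alpha + n * q * math.log(gamma_ / nu) / (1.0 - q) - log_excess_sum


def aban_mle(sample, gamma_=None, nu=None, lo=1e-6, hi=50.0, tol=1e-10):
    """Solve (14) by bisection; gamma_ and nu default to the sample min and max."""
    gamma_ = min(sample) if gamma_ is None else gamma_
    nu = max(sample) if nu is None else nu
    n = len(sample)
    log_excess_sum = sum(math.log(x / gamma_) for x in sample)
    f_lo = aban_score(lo, n, log_excess_sum, gamma_, nu)
    f_hi = aban_score(hi, n, log_excess_sum, gamma_, nu)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change of the score on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = aban_score(mid, n, log_excess_sum, gamma_, nu)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)
```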

2.2. Moment Estimator. To estimate the population truncated Pareto mean, the sample mean estimator is

$$\mu_X = \bar X = \frac{1}{n}\sum_{i=1}^{n}X_i. \tag{15}$$

A moment estimator $\hat\alpha_M$ for estimating $\alpha$ is the solution of the equation

$$\bar X = \frac{\hat\alpha_M\,\gamma^{\hat\alpha_M}\bigl(\gamma^{1-\hat\alpha_M} - \nu^{1-\hat\alpha_M}\bigr)}{(\hat\alpha_M - 1)\bigl(1-(\gamma/\nu)^{\hat\alpha_M}\bigr)}, \quad \hat\alpha_M > 0. \tag{16}$$

2.3. A Proposed Weighted Estimator. Now, to estimate the population truncated Pareto mean, we define a weighted mean based on the weighted empirical distribution function in (9) as

$$\mu_w = \bar X_w = \sum_{i=1}^{n} p_{n,i}X_{(i)} = \sum_{i=2}^{n-1} wX_{(i)} + \frac{1}{2}\bigl(1-(n-2)w\bigr)\bigl[X_{(1)} + X_{(n)}\bigr], \tag{17}$$

where $n$ is the sample size and $w$ is the weight as defined in (10). Then, for estimating $\alpha$, we define $\hat\alpha_w$ to be the solution of the equation

$$\bar X_w = \frac{\hat\alpha_w\,\gamma^{\hat\alpha_w}\bigl(\gamma^{1-\hat\alpha_w} - \nu^{1-\hat\alpha_w}\bigr)}{(\hat\alpha_w - 1)\bigl(1-(\gamma/\nu)^{\hat\alpha_w}\bigr)}, \quad \hat\alpha_w > 0. \tag{18}$$
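Equations (16) and (18) share the same right-hand side, the truncated Pareto mean as a function of the shape parameter, so one numerical solver serves both. The sketch below is an assumption-laden illustration: it uses bisection, relies on the mean being decreasing in the shape parameter, and replaces the formula by its limit $\gamma\ln(\nu/\gamma)/(1-\gamma/\nu)$ near $\alpha = 1$, where the expression is of the form $0/0$. All function names are hypothetical.

```python
import math


def tp_mean(alpha, gamma_, nu):
    """Truncated Pareto mean; uses the limiting value near alpha = 1."""
    if abs(alpha - 1.0) < 1e-8:
        return gamma_ * math.log(nu / gamma_) / (1.0 - gamma_ / nu)
    trunc = 1.0 - (gamma_ / nu) ** alpha
    return (alpha * gamma_ ** alpha * (gamma_ ** (1 - alpha) - nu ** (1 - alpha))
            / ((alpha - 1) * trunc))


def weighted_mean(sample, w):
    """Weighted mean (17): weight w on the middle data, w_{1,n} on the two extremes."""
    xs = sorted(sample)
    n = len(xs)
    end_weight = 0.5 * (1.0 - (n - 2) * w)
    return w * sum(xs[1:-1]) + end_weight * (xs[0] + xs[-1])


def solve_alpha(target_mean, gamma_, nu, lo=1e-6, hi=50.0, tol=1e-10):
    """Solve tp_mean(alpha) = target_mean by bisection (mean decreases in alpha)."""
    if not tp_mean(hi, gamma_, nu) < target_mean < tp_mean(lo, gamma_, nu):
        raise ValueError("target mean is outside the attainable range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tp_mean(mid, gamma_, nu) > target_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `solve_alpha(weighted_mean(sample, w), min(sample), max(sample))` gives a weighted estimate in the sense of (18); with $w = 1/n$ the weighted mean equals the sample mean and the same call gives the moment estimate in (16).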

3. An Exact Efficiency Function

In this section, an exact efficiency function of the weighted estimator $\mu_w$ in (17) relative to the sample mean $\bar X$ in (15) is given. The mathematical derivations are given in the appendix.

Theorem 3. The mean and mean square error (MSE) of $\mu_w$ in (17), when $\alpha > 0$, are given by

$$\begin{aligned}
E[\mu_w] &= w\,\gamma\sum_{i=2}^{n-1}\sum_{k=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\rho^{k}\,\frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)} \\
&\quad+ \frac{1}{2}\bigl(1-(n-2)w\bigr)\,\gamma\sum_{k=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\rho^{k}\left[\frac{\Gamma(n+1)\,\Gamma(k+1)}{\Gamma(k+n+1)} + \frac{n}{k+n}\right],
\end{aligned} \tag{19}$$

$$\mathrm{MSE}(\mu_w) = aw^{2} + bw + c, \quad\text{where } 0 < w < \frac{1}{n-2},$$

$$\begin{aligned}
a &= A + \frac{1}{4}(n-2)^{2}B - (n-2)C, \\
b &= -\frac{1}{2}(n-2)B + C - 2\mu D + \mu(n-2)E, \\
c &= \frac{1}{4}B - \mu E + \mu^{2}, \\
A &= \gamma^{2}\sum_{i=2}^{n-1}\sum_{k=0}^{\infty}\binom{\frac{2}{\alpha}+k-1}{k}\rho^{k}\,\frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)} \\
&\quad+ 2\gamma^{2}\sum_{i=2}^{n-2}\sum_{j=i+1}^{n-1}\sum_{k=0}^{\infty}\sum_{l=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\binom{\frac{1}{\alpha}+l-1}{l}\rho^{k+l}\left[\frac{\Gamma(n+1)\,\Gamma(k+i)\,\Gamma(k+l+j)}{\Gamma(i)\,\Gamma(k+j)\,\Gamma(k+l+n+1)}\right], \\
B &= \gamma^{2}\sum_{k=0}^{\infty}\binom{\frac{2}{\alpha}+k-1}{k}\rho^{k}\left[\frac{\Gamma(n+1)\,\Gamma(k+1)}{\Gamma(k+n+1)} + \frac{n}{k+n}\right] \\
&\quad+ 2\gamma^{2}\sum_{k=0}^{\infty}\sum_{l=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\binom{\frac{1}{\alpha}+l-1}{l}\rho^{k+l}\left[\frac{\Gamma(n+1)\,\Gamma(k+1)}{\Gamma(k+n)\,(k+l+n)}\right], \\
C &= \gamma^{2}\sum_{i=2}^{n-1}\sum_{k=0}^{\infty}\sum_{l=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\binom{\frac{1}{\alpha}+l-1}{l}\rho^{k+l}\left[\frac{\Gamma(n+1)\,\Gamma(k+1)\,\Gamma(k+l+i)}{\Gamma(k+i)\,\Gamma(k+l+n+1)} + \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n)\,(k+l+n)}\right], \\
D &= \gamma\sum_{i=2}^{n-1}\sum_{k=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\rho^{k}\,\frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)}, \\
E &= \gamma\sum_{k=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\rho^{k}\left[\frac{\Gamma(n+1)\,\Gamma(k+1)}{\Gamma(k+n+1)} + \frac{n}{k+n}\right],
\end{aligned} \tag{20}$$

where $\rho = (g^{\alpha}-1)/g^{\alpha}$, $g = \nu/\gamma$, $0 < \rho < 1$, $g > 1$, $0 < \gamma \le x \le \nu < \infty$, and

$$\Gamma(z) = \int_{0}^{\infty}t^{z-1}e^{-t}\,dt, \quad z > 0, \text{ is the Gamma function.} \tag{21}$$

Corollary 4. The efficiency function (EFF) of $\mu_w$ in (17) relative to the sample mean $\bar X$ in (15) is given by

$$\mathrm{EFF}(\mu_w) = \frac{\mathrm{Var}(\bar X)}{\mathrm{MSE}(\mu_w)} = \frac{\sigma^{2}/n}{\mathrm{MSE}(\mu_w)} = \frac{1}{n\,\mathrm{MSE}(\mu_w)}\left[\frac{\alpha\gamma^{\alpha}\bigl(\nu^{2-\alpha} - \gamma^{2-\alpha}\bigr)}{(2-\alpha)\bigl(1-(\gamma/\nu)^{\alpha}\bigr)} - \frac{\alpha^{2}\gamma^{2\alpha}\bigl(\nu^{1-\alpha} - \gamma^{1-\alpha}\bigr)^{2}}{(1-\alpha)^{2}\bigl(1-(\gamma/\nu)^{\alpha}\bigr)^{2}}\right], \tag{22}$$

where $\sigma^{2}$ and $\mathrm{MSE}(\mu_w)$ are given in (7) and (20).

4. Optimal Weights

4.1. $L_1$-Optimal Weights. Huang and Brill [11] proposed an $L_1$-optimal weight, which is based on the Manhattan metric, for the $F_n^{*}(x)$ in (9). It is

$$w_{L_1\text{-opt}} = \frac{1}{\sqrt{n(n-1)}} > \frac{1}{n} \quad\text{for the middle data.} \tag{23}$$

Huang and Brill [11] proved that the exact efficiency (EFF) of $F_n^{*}(x)$ in (9) relative to the EDF $S_n(x)$ exceeds 1 on the tails of the distribution. In this paper, we use this $w_{L_1\text{-opt}}$ weight in order to improve the efficiency of estimating the tail probability of the truncated Pareto distribution. Huang [12] indicates that, in general, for any distribution, if $w > 1/n$, which means putting more weight on the middle data, the efficiency of estimating the tail probability will be improved.

Next, we explore an alternative $L_2$-optimal weight.
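The weight in (23) depends only on the sample size, so it is trivial to tabulate. A minimal sketch follows, with a hypothetical function name; the values it returns can be compared with the $w_{L_1\text{-opt}}$ column of Table 2.

```python
import math


def w_l1_opt(n):
    """L1-optimal middle weight (23) for a sample of size n > 2."""
    if n <= 2:
        raise ValueError("n must exceed 2")
    return 1.0 / math.sqrt(n * (n - 1))
```

Because $n-2 < \sqrt{n(n-1)} < n$, this weight always lies strictly between $1/n$ and the upper bound $1/(n-2)$ required in (10).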

4.2. $L_2$-Optimal Weights

Corollary 5. An $L_2$-optimal weight $w_{L_2\text{-opt}}$ for the efficiency function of the $\mu_w$ given in (17), for estimating the mean of the population in (3) relative to the sample mean $\bar X$ in (15), when $n > 4C/B + 2$, is given by

$$w_{L_2\text{-opt}} = -\frac{b}{2a}. \tag{24}$$

The minimum $\mathrm{MSE}_{\min}(\mu_w)$ and $\mathrm{EFF}_{\max}(\mu_w)$ are given by

$$\begin{aligned}
\mathrm{MSE}_{\min}(\mu_w) &= -\frac{b^{2}}{4a} + c, \\
\mathrm{EFF}_{\max}(\mu_w) &= \frac{1}{n\left(-b^{2}/(4a) + c\right)}\left[\frac{\alpha\gamma^{\alpha}\bigl(\nu^{2-\alpha} - \gamma^{2-\alpha}\bigr)}{(2-\alpha)\bigl(1-(\gamma/\nu)^{\alpha}\bigr)} - \frac{\alpha^{2}\gamma^{2\alpha}\bigl(\nu^{1-\alpha} - \gamma^{1-\alpha}\bigr)^{2}}{(1-\alpha)^{2}\bigl(1-(\gamma/\nu)^{\alpha}\bigr)^{2}}\right].
\end{aligned} \tag{25}$$

In (24) and (25), $a$, $b$, $c$, $C$, and $B$ are defined in (20).
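Once the five moment sums $A$, $B$, $C$, $D$, and $E$ of (20) are available (from the series, or from simulation), the quadratic coefficients and the $L_2$-optimal weight follow by simple arithmetic. The sketch below shows that step only; the function names are hypothetical and the inputs are taken as given numbers rather than computed from the series.

```python
def mse_coefficients(A, B, C, D, E, mu, n):
    """Coefficients a, b, c of MSE(mu_w) = a w^2 + b w + c as defined in (20)."""
    a = A + 0.25 * (n - 2) ** 2 * B - (n - 2) * C
    b = -0.5 * (n - 2) * B + C - 2.0 * mu * D + mu * (n - 2) * E
    c = 0.25 * B - mu * E + mu ** 2
    return a, b, c


def mse_unexpanded(w, A, B, C, D, E, mu, n):
    """The same MSE written before collecting powers of w."""
    end = 1.0 - (n - 2) * w
    return (A * w ** 2 + 0.25 * end ** 2 * B + w * end * C
            - 2.0 * mu * (D * w + 0.5 * end * E) + mu ** 2)


def l2_optimal_weight(a, b):
    """Minimizer (24) of the convex quadratic; requires a > 0."""
    if a <= 0:
        raise ValueError("the quadratic is not convex (a <= 0)")
    return -b / (2.0 * a)
```

In practice the minimizer should also be checked against the admissible range $0 < w < 1/(n-2)$ from (10) before it is used as a weight.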

Table 2: Exact efficiencies of $\mu_w$ relative to other estimators by using $L_2$-optimal weights, $\gamma = 1$, $\nu = 10$.

Sample size   α     w_L2-opt   w = 1/n   w_L1-opt   MSE(μ_w)   MSE(X̄)   EFF(μ_w)
n = 10        0.8   0.10396    0.1       0.10541    0.4007     0.4034    1.0068
              1.8   0.11446                         0.1406     0.1647    1.1717
              3     0.11847                         0.0365     0.0493    1.3501
n = 20        0.8   0.05015    0.05      0.05130    0.2017     0.2017    1.0001
              1.8   0.05255                         0.0766     0.0824    1.0759
              3     0.05463                         0.0217     0.0247    1.1367
n = 30        0.8   0.03323    0.03333   0.03390    0.1344     0.1345    1.0003
              1.8   0.03425                         0.0528     0.0549    1.0399
              3     0.03529                         0.0155     0.0164    1.0628
n = 50        0.8   0.02021    0.02      0.02020    0.0968     0.1063    1.0978
              1.8   0.02159                         0.0514     0.0592    1.1536
              3     0.02298                         0.0274     0.0352    1.2849
n = 100       0.8   0.01001    0.01      0.01005    0.0765     0.0830    1.0845
              1.8   0.01002                         0.0542     0.0576    1.0637
              3     0.01002                         0.0380     0.0400    1.0526

The proof of Corollary 5 is in the appendix.

Table 2 lists the values of $w_{L_2\text{-opt}}$, $w_{L_1\text{-opt}}$, $\mathrm{MSE}_{\min}(\mu_w)$, $\mathrm{MSE}(\bar X)$, and the exact $\mathrm{EFF}_{\max}(\mu_w)$ of $\mu_w$ relative to $\bar X$ for $n = 10, 20, 30, 50$, and $100$, and $\alpha = 0.8, 1.8$, and $3$, by using (24) and (25). We note that all values of $w_{L_2\text{-opt}}$ are greater than $1/n$ and all exact $\mathrm{EFF}_{\max}(\mu_w)$ relative to $\bar X$ are greater than 1.

Remark 6. The $w_{L_1\text{-opt}}$ in (23) is totally nonparametric; it is more robust and easy to use. Note that $w_{L_2\text{-opt}}$ in (24) depends on the parameter $\alpha$. In practice, we may estimate $\alpha$ first and then obtain a $w_{L_2\text{-opt}}$ while still keeping the optimal advantage. Of course, we use the given $\alpha$ values in the simulations. However, $w_{L_1\text{-opt}}$ and $w_{L_2\text{-opt}}$ are close to each other when $n > 100$.

5. Simulations

Next, we use simulations to compare the performance of the three parametric MLE estimators, that is, Hill's, Beg's, and Aban's estimators in (12), (13), and (14), and the two semiparametric estimators, that is, the Moment and the weighted estimators in (16) and (18), for estimating the shape parameter $\alpha$ in (3). We generate $m = 1000$ random samples of size $n = 100$ from the distribution of (3). Only $w_{L_1\text{-opt}}$ is used for the weighted estimator, since Table 2 indicates that the values of $w_{L_2\text{-opt}}$ and $w_{L_1\text{-opt}}$ are close to each other when $n = 100$.

We know that if $\alpha \le 2$, the original Pareto distribution has an infinite variance; if $\alpha \le 1$, then the mean is infinite. These cases have inference difficulties. We focus on $\alpha = 0.5$, $0.8$, and $1.8$, and let $\gamma = 1$, $\nu = 10$. Figure 3 contains the box-plots comparing these five estimators. Note that in the cases $\alpha = 0.5$ and $0.8$, Beg's and Hill's estimators have large biases, while the weighted, Aban's, and the Moment estimators performed very well, although Aban's MLE sometimes has unstable solutions. When $\alpha = 1.8$, the results are similar, except that Hill's and Beg's estimators performed better. It is interesting to see that in all three cases the median of the weighted estimator for $\alpha$ is relatively larger than those of Aban's and the Moment estimators. We will discuss how these affect the tail estimation in the next section. The simulations were run by using MAPLE 15 with double precision.
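A small Monte Carlo experiment in the same spirit can also be run for the mean estimators. The sketch below is not the authors' MAPLE code: it is a hypothetical Python illustration that draws truncated Pareto samples by inverse transform, then compares the simulated MSE of the sample mean (15) with that of the weighted mean (17). The number of replications and the seed are arbitrary choices, and the population mean uses the formula for $\alpha \ne 1$ only.

```python
import random


def tp_quantile(u, alpha, gamma_, nu):
    """Quantile function of the truncated Pareto distribution."""
    return ((1.0 - u) / gamma_ ** alpha + u / nu ** alpha) ** (-1.0 / alpha)


def tp_mean(alpha, gamma_, nu):
    """Population mean of the truncated Pareto distribution; assumes alpha != 1."""
    trunc = 1.0 - (gamma_ / nu) ** alpha
    return (alpha * gamma_ ** alpha * (gamma_ ** (1 - alpha) - nu ** (1 - alpha))
            / ((alpha - 1) * trunc))


def simulate_mse(n, alpha, gamma_, nu, w, reps=20000, seed=12345):
    """Return simulated (MSE of sample mean, MSE of weighted mean with weight w)."""
    rng = random.Random(seed)
    mu = tp_mean(alpha, gamma_, nu)
    end_weight = 0.5 * (1.0 - (n - 2) * w)
    se_mean = 0.0
    se_weighted = 0.0
    for _ in range(reps):
        xs = sorted(tp_quantile(rng.random(), alpha, gamma_, nu) for _ in range(n))
        sample_mean = sum(xs) / n
        weighted = w * sum(xs[1:-1]) + end_weight * (xs[0] + xs[-1])
        se_mean += (sample_mean - mu) ** 2
        se_weighted += (weighted - mu) ** 2
    return se_mean / reps, se_weighted / reps
```

For instance, `simulate_mse(10, 3.0, 1.0, 10.0, 0.11847)` uses the $n = 10$, $\alpha = 3$ setting of Table 2 with its $L_2$-optimal weight, so the two simulated values can be set against the MSE columns of that table.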

6. Examples

Now we use the proposed method and compare it with existing methods to analyze the data of the two examples outlined in Section 1.

6.1. Hurricane Loss Example

6.1.1. Comparison of Four Estimation Methods. First, we look at the hurricane loss example in Section 1.1.1. Based on the simulation results in Figure 3, we consider the three better estimators out of the five in the heavy-tailed case, that is, Aban's, the Moment, and the weighted estimators in (27), (28), and (29), for the truncated Pareto pdf $f(x)$ in (3) and the cdf $F(x)$ in (4). We also compare them with Hill's estimator in (26) for the original Pareto pdf $f_p(x)$ in (1) and the cdf $F_p(x)$ in (2). Here,

$$f_{\mathrm{Hill}}(x) = \frac{\hat\alpha_{\mathrm{Hill}}\,\gamma^{\hat\alpha_{\mathrm{Hill}}}}{x^{\hat\alpha_{\mathrm{Hill}}+1}}, \qquad F_{\mathrm{Hill}}(x) = 1 - \left(\frac{\gamma}{x}\right)^{\hat\alpha_{\mathrm{Hill}}}, \tag{26}$$

where $\hat\alpha_{\mathrm{Hill}}$ is given in (12) with $r = n-1$; consider

$$f_{\mathrm{MLE}}(x) = \frac{\hat\alpha_{\mathrm{MLE}}\,\gamma^{\hat\alpha_{\mathrm{MLE}}}\,x^{-\hat\alpha_{\mathrm{MLE}}-1}}{1-(\gamma/\nu)^{\hat\alpha_{\mathrm{MLE}}}}, \qquad F_{\mathrm{MLE}}(x) = 1 - \frac{\gamma^{\hat\alpha_{\mathrm{MLE}}}\bigl(x^{-\hat\alpha_{\mathrm{MLE}}} - \nu^{-\hat\alpha_{\mathrm{MLE}}}\bigr)}{1-(\gamma/\nu)^{\hat\alpha_{\mathrm{MLE}}}}, \tag{27}$$

Table 3: Estimation of index, mean, median, and VaR for hurricane loss data.

Method            α̂        μ̂               Median         5% value at risk
Pareto (Hill)     0.8126   ∞               8.68 billion   147.68 billion
TPD (Aban)        0.6206   21.74 billion   9.73 billion   85.15 billion
TPD (Moment)      0.6476   20.48 billion   9.47 billion   82.55 billion
TPD (Weighted)    0.6990   19.37 billion   9.02 billion   77.58 billion
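The Pareto (Hill) row of Table 3 can be cross-checked from the original Pareto cdf $F_p(x) = 1-(\gamma/x)^{\alpha}$. The sketch below rests on two assumptions that are not stated explicitly in the table: that the 5% value at risk is the 95% quantile, and that the scale $\gamma$ can be backed out from the reported median. Under those assumptions the implied value at risk should land close to the tabulated 147.68 billion, up to rounding of the inputs.

```python
def pareto_quantile(p, alpha, gamma_):
    """Quantile of the original Pareto cdf F_p(x) = 1 - (gamma_/x)**alpha."""
    return gamma_ * (1.0 - p) ** (-1.0 / alpha)


alpha_hill = 0.8126       # Hill estimate of the index reported in Table 3
median_billion = 8.68     # Pareto (Hill) median reported in Table 3, in billions

# Scale implied by the reported median (an assumption of this check).
gamma_implied = median_billion / pareto_quantile(0.5, alpha_hill, 1.0)

# 95% quantile, interpreted here as the 5% value at risk, in billions.
var_95 = pareto_quantile(0.95, alpha_hill, gamma_implied)
```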

Figure 3: Box-plots of the estimators (Weighted, Aban, Beg, Hill, and Moment) for the shape parameter $\alpha$, in three panels ($\alpha = 0.5$, $\alpha = 0.8$, and $\alpha = 1.8$), by using the $L_1$-optimal weight; $\gamma = 1$, $\nu = 10$, sample size $n = 100$, and number of generated samples $m = 1000$.

where $\hat\alpha_{\mathrm{MLE}}$ is Aban's MLE given in (14); consider

$$f_M(x) = \frac{\hat\alpha_M\,\gamma^{\hat\alpha_M}\,x^{-\hat\alpha_M-1}}{1-(\gamma/\nu)^{\hat\alpha_M}}, \qquad F_M(x) = 1 - \frac{\gamma^{\hat\alpha_M}\bigl(x^{-\hat\alpha_M} - \nu^{-\hat\alpha_M}\bigr)}{1-(\gamma/\nu)^{\hat\alpha_M}}, \tag{28}$$

where $\hat\alpha_M$ is given in (16); consider

$$f_w(x) = \frac{\hat\alpha_w\,\gamma^{\hat\alpha_w}\,x^{-\hat\alpha_w-1}}{1-(\gamma/\nu)^{\hat\alpha_w}}, \qquad F_w(x) = 1 - \frac{\gamma^{\hat\alpha_w}\bigl(x^{-\hat\alpha_w} - \nu^{-\hat\alpha_w}\bigr)}{1-(\gamma/\nu)^{\hat\alpha_w}}, \tag{29}$$

where $\hat\alpha_w$ is given in (18), using the weight $w_{L_1\text{-opt}} = 1/\sqrt{n(n-1)}$.

The results of these four methods, using the hurricane loss data with $\gamma = X_{(\min)}$, $\nu = X_{(\max)}$, and $n = 49$, are listed in Table 3.

Figure 4 is a log-log plot showing the upper tail for the hurricane loss data. In this plot, the circles represent the real data, and the straight line represents the estimated original Pareto distribution. The dashed line, dotted line, and thick solid line represent the estimated truncated Pareto distributions by using Aban's, the Moment, and the weighted estimators, respectively. We draw two conclusions intuitively.

(1) The original Pareto distribution (straight line) does not fit the data well in the tail. The truncated Pareto distribution fits the data very well using all three estimation methods (Aban, Moment, and weighted). Note that the 5% value at risk estimated by the original Pareto model is 147 billion, while the three truncated model estimates are about 80 billion. It appears that the original Pareto model overestimates the risk. An insurance company using the Pareto model would set a high premium, with the result that many people could not afford to buy insurance.

(2) We examine the three truncated Pareto estimates. Around the tail, the weighted estimate fits the data the best; that is, the $\ln(P(X > x))$ curve turns downward more quickly, following the trend of the data pattern. This is because the weighted estimate $\hat\alpha_w$ is the largest among the three truncated model estimates (this is consistent with the simulation results in Figure 3), and the weighted mean estimate $\mu_w$ and the 5% value at risk are the smallest among the three methods (this is obtained by placing less weight on the extreme values, e.g., the 1926 Great Miami Hurricane's loss), so its estimated distribution is less heavy in the tail compared with the other methods.

Next, in order to confirm these conclusions, we run three goodness-of-fit tests. Later, we define the absolute error and integrated error as the measures of the distance from the empirical data points to the estimated Pareto curve and truncated Pareto curves.

6.1.2. Goodness-of-Fit Tests. Our objective is to test whether the estimated distributions in (26)–(29) fit the data properly. We test the hypotheses $H_0: F(x) = F^{*}(x)$ against $H_1: F(x) \ne F^{*}(x)$, where $F(x)$ is the true unknown distribution function and $F^{*}(x)$ is the estimated Pareto cdf in (26) or the estimated truncated Pareto cdf in (27)–(29). In this paper, we use three EDF goodness-of-fit tests.

Figure 4: Log-log plot of the hurricane loss data, $\ln(P(X > x))$ against $\ln(x)$, showing the data and the Pareto, Aban, Moment, and Weighted fits.

(1) The Kolmogorov-Smirnov (K-S) test [15]: the test statistic is given by

$$T = \sup_{x}\bigl|F^{*}(x) - S_n(x)\bigr|, \quad -\infty < x < \infty, \tag{30}$$

where $S_n(x)$ is the EDF in (8). Under $H_0$, the two-tailed $P$-value for the K-S test is as follows:

$$P\text{-value} = 1 - \left[1 - T\sum_{j=0}^{[n(1-T)]}\frac{n!}{j!\,(n-j)!}\left(1 - T - \frac{j}{n}\right)^{n-j}\left(T + \frac{j}{n}\right)^{j-1}\right]^{2}, \tag{31}$$

where $[n(1-T)]$ is the integer part of $n(1-T)$.

(2) The Anderson and Darling [16] test (A-D test) introduced a measure of "distance" between the empirical distribution $S_n(x)$ and the proposed cdf $F^{*}(x)$ by using a metric function space:

$$W_n^{2} = n\int_{-\infty}^{\infty}\bigl[S_n(x) - F^{*}(x)\bigr]^{2}\,\psi\bigl(F^{*}(x)\bigr)\,dF^{*}(x), \tag{32}$$

where $\psi(u) = 1/(u(1-u))$ is a weight function with $u = F^{*}(x)$. Let $u_i = F^{*}(X_{(i)})$, $i = 1, \ldots, n$; under $H_0$, the test statistic and $P$-value are given by

$$\begin{aligned}
W_n^{2} &= -n - \frac{1}{n}\sum_{j=1}^{n}(2j-1)\bigl(\log(u_j) + \log(1-u_{n-j+1})\bigr), \\
P\text{-value} &= 1 - \frac{\sqrt{2}}{z}\sum_{j=0}^{\infty}(-1)^{j}\,\frac{\Gamma\bigl(j+\frac{1}{2}\bigr)(4j+1)}{j!}\,e^{-(4j+1)^{2}\pi^{2}/(8z)}\int_{0}^{\infty}e^{\,z/(8(w^{2}+1)) - (4j+1)^{2}\pi^{2}w^{2}/(8z)}\,dw,
\end{aligned} \tag{33}$$

Figure 5: Absolute errors $|F^{*}(x) - S_n(x)|$ for the hurricane loss example on the 10 largest losses, plotted against $x$ (series: Aban, Moment, Pareto, Weighted).

where $z$ is the observed value of $W_n^2$ and $\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\, dt$, $x > 0$, is the Gamma function.

(3) The Cramér-von Mises (C-v-M) test [16] uses $\psi(u) = 1$ in (32); thus, under $H_0$, the test statistic and $P$-value are given by

$$n\omega^2 = \frac{1}{12n} + \sum_{j=1}^{n} \left( u_j - \frac{2j-1}{2n} \right)^2,$$

$$P\text{-value} = 1 - \frac{1}{\pi\sqrt{z}} \sum_{j=0}^{\infty} (-1)^{2j} \frac{\Gamma(j + 1/2)\,(4j+1)^{1/2}}{j!\,\sqrt{\pi}}\, e^{-(4j+1)^2/(16z)}\, K_{1/4}\!\left( \frac{(4j+1)^2}{16z} \right), \tag{34}$$

where $K_{1/4}(x)$ is the modified Bessel function of the second kind:

$$K_{\alpha}(x) = \frac{\pi}{2}\, \frac{I_{-\alpha}(x) - I_{\alpha}(x)}{\sin(\alpha\pi)}, \qquad I_{\alpha}(x) = \sum_{m=0}^{\infty} \frac{1}{m!\,\Gamma(m + \alpha + 1)} \left( \frac{x}{2} \right)^{2m+\alpha}. \tag{35}$$
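The three test statistics in (30), (33), and (34) can be computed directly from the sorted sample. The following is a minimal sketch, not the authors' code; the function names are hypothetical, the truncated Pareto cdf is the form implied by (A4), and the toy data are invented for illustration. It computes only the statistics, not the P-value series, and the A-D statistic requires $0 < F^{*}(X_{(j)}) < 1$ for every observation.

```python
import math


def edf_test_statistics(sample, cdf):
    """Return the (K-S, A-D, C-v-M) statistics for a fitted cdf F*."""
    xs = sorted(sample)
    n = len(xs)
    u = [cdf(x) for x in xs]  # u_j = F*(X_(j)), stored 0-based

    # K-S statistic (30): the supremum is reached at a jump of the EDF,
    # so compare F* with the EDF just after and just before each jump.
    ks = max(max((j + 1) / n - u[j], u[j] - j / n) for j in range(n))

    # A-D statistic in (33); the logarithms need 0 < u_j < 1.
    ad = -n - sum(
        (2 * j - 1) * (math.log(u[j - 1]) + math.log(1.0 - u[n - j]))
        for j in range(1, n + 1)
    ) / n

    # C-v-M statistic in (34).
    cvm = 1.0 / (12 * n) + sum(
        (u[j - 1] - (2 * j - 1) / (2 * n)) ** 2 for j in range(1, n + 1)
    )
    return ks, ad, cvm


def truncated_pareto_cdf(x, alpha, gamma_, nu):
    """Truncated Pareto cdf on [gamma_, nu] (form assumed from (A4))."""
    return (1.0 - (gamma_ / x) ** alpha) / (1.0 - (gamma_ / nu) ** alpha)


if __name__ == "__main__":
    toy = [1.2, 1.5, 2.1, 3.4, 6.0]  # invented values inside (1, 10)
    stats = edf_test_statistics(
        toy, lambda x: truncated_pareto_cdf(x, 0.8, 1.0, 10.0)
    )
    print(stats)
```

If the truncation limits are estimated by the sample minimum and maximum, the fitted cdf equals 0 and 1 at those points, so the A-D logarithms are undefined there; the sketch leaves that case to the caller.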

For fitting the 49 losses in the hurricane loss data, for example, after computing estimates of $F^{*}(x)$ by using (26), (27), (28), and (29), we compute the absolute error (AE) in (30) and define the integrated error (IE) by

$$\mathrm{IE} = \left[ \int_{X_{(n-r+1)}}^{X_{(n)}} \left( S_n(x) - F^{*}(x) \right)^2 dx \right]^{1/2}. \tag{36}$$

Figure 5 gives the absolute errors (AE) in (30) for the 10 largest losses and confirms that the original Pareto estimate has relatively larger errors and the weighted estimate has relatively smaller errors in the tail. Figure 5 explains the data fitting of the tail of the distributions in Figure 4.

We also compute the AE in (30) and IE in (36) to confirm those tail errors by using $r$ as the number of the largest losses, $1 \le r \le n$. The AE and IE values are given in Table 4 (the smallest values are bold with *).
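The tail errors can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the AE is taken here as the largest gap between $F^{*}$ and the EDF at the $r$ largest observations, and the integral in (36) is approximated with a midpoint rule on each interval between consecutive order statistics, where the EDF is constant. The function name `tail_errors` and the `steps` parameter are hypothetical.

```python
import math


def tail_errors(sample, cdf, r, steps=200):
    """Return (AE, IE) over the r largest observations.

    AE: max |F*(X_(i)) - S_n(X_(i))| for i = n-r+1, ..., n (assumed form).
    IE: square root of the integral in (36), by a midpoint rule.
    """
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")

    # The EDF at the k-th tail point (0-based k) equals (n - r + k + 1) / n.
    tail = xs[n - r:]
    ae = max(abs(cdf(x) - (n - r + k + 1) / n) for k, x in enumerate(tail))

    # On [X_(i), X_(i+1)) the EDF is constant and equal to i / n.
    total = 0.0
    for i in range(n - r, n - 1):
        a, b = xs[i], xs[i + 1]
        level = (i + 1) / n
        h = (b - a) / steps
        total += h * sum(
            (level - cdf(a + (k + 0.5) * h)) ** 2 for k in range(steps)
        )
    return ae, math.sqrt(total)


if __name__ == "__main__":
    # Toy check with a uniform cdf on [0, 4]; values are illustrative only.
    print(tail_errors([1.0, 2.0, 3.0, 4.0], lambda x: x / 4.0, r=4))
```

With $r = 1$ the integration range in (36) is empty, so the sketch returns an IE of zero.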


Table 4: Errors of goodness-of-fit tests, n = 49, for hurricane example (* marks the smallest value in each column).

                  Absolute error (AE)            Integrated error (IE)
Method            r = 49    r = 18    r = 10     r = 49    r = 18    r = 10
Pareto (Hill)     0.1340    0.0584    0.0584     0.4844    0.3818    0.3723
TPD (Aban)        0.0948*   0.0839    0.0832     0.3114    0.2565    0.2161
TPD (Moment)      0.1053    0.0738    0.0737     0.2985*   0.2171    0.1825
TPD (Weighted)    0.1250    0.0561*   0.0561*    0.3098    0.1678*   0.1453*

Table 5: The estimation of index, mean, median, and VaR for forest fire loss data.

Method            $\hat{\alpha}$    $\mu$            Median          5% value at risk
Pareto (Hill)     4.3044            1.096 billion    1.09 billion    1.855 billion
TPD (Aban)        0.7964            1.167 billion    1.15 billion    1.446 billion
TPD (Moment)      0.8260            1.166 billion    1.15 billion    1.446 billion
TPD (Weighted)    0.8642            1.165 billion    1.15 billion    1.446 billion

The weighted estimator has the smallest AE and IE values for the $r = 18$ and 10 largest losses, and its IE value is almost equal to the smallest IE value for $r = 49$, that is, all losses. We statistically conclude that the weighted estimated distribution is the best fit in the tail of the hurricane loss data.

6.2. Forest Fire Loss Data Example. Next, we look at the forest fire loss example in Section 1.1.2. The data in Table 1 contains a relatively substantial number of large forest fire losses, which convinces us to use a truncated Pareto model and compare the four estimators in (26), (27), (28), and (29). We use the 25 largest losses in this study.

Figure 6 is a log-log plot which shows the upper tail for the forest fire loss data. The circles represent the real data, and the straight line represents the estimated original Pareto distribution. The dashed line, dotted line, and thick solid line represent the estimated truncated Pareto distribution by using Aban's, Moment, and the weighted estimators, respectively. We can see that the estimated truncated Pareto distributions fit the data very well using all three estimation methods and are much better than the original Pareto distribution. Around the tail, the weighted and the Moment estimation methods perform the best. The fact that the tail of the data curves downward in Figure 6 is evidence in support of using a truncated Pareto model.

The results of these three estimators are listed in Table 5 by using the forest fire loss data, where $\hat{\gamma} = X_{(1)}$, $\hat{\nu} = X_{(n)}$, and $n = 25$. Note that the weighted estimate $\hat{\alpha}_w$ is the largest among the three truncated Pareto estimates. The 5% value at risk of the original Pareto estimate is the largest. The 5% VaR may be overestimated compared to the truncated Pareto models.

As in the hurricane example, we again compute the AE in (30) and IE in (36) for fitting the 25 losses of the forest fire loss data.

Figure 7 gives the absolute errors (AE) in (30) for the 10 largest losses and confirms that the original Pareto estimate has relatively larger errors and the weighted estimate has relatively smaller errors in the tail. Figure 7 explains the data fitting of the tail of the distributions in Figure 6.

Figure 6: Log-log plot for forest fire example, $\ln(P(X > x))$ against $\ln(x)$ (series: Data, Pareto, Aban, Moment, Weighted).

We also compute the AE in (30) and IE in (36) to confirm those tail errors by using $r$ as the number of the largest losses, $1 \le r \le n$. The IE values are given in Table 6 (the smallest values are bold with *).

The weighted estimator has the smallest AE and IE values for $r = 10$ and 5, and its IE value for $r = 25$ is almost equal to the smallest IE value. We statistically conclude that the weighted estimated distribution is the best fit to the tail of the forest fire loss data.

Table 6: Errors of goodness-of-fit tests, n = 25, for forest fire loss data (* marks the smallest value in each column).

                  Absolute error (AE)            Integrated error (IE)
Method            r = 25    r = 10    r = 5      r = 25    r = 10    r = 5
Pareto (Hill)     0.2656    0.1288    0.1288     2.7125    1.1493    1.0985
TPD (Aban)        0.1503*   0.1195    0.1195     1.3687*   0.7911    0.5430
TPD (Moment)      0.1520    0.0982    0.0982     1.3736    0.7764    0.5430
TPD (Weighted)    0.1542    0.0964*   0.0964*    1.3796    0.7577*   0.5203*

Figure 7: Absolute errors $|F^{*}(x) - S_n(x)|$ for the forest fire loss example on the 10 largest losses, plotted against $x$ (series: Aban, Moment, Pareto, Weighted).

7. Conclusions

In the complicated real world, it is difficult to construct a model combining all the desired features. In general, the final model selection depends on the best fitting model. The criteria are based on goodness-of-fit tests, existence of the moments, characteristic largest values, and log-log plots. The hurricane loss data and forest fire loss data are well fitted by the truncated Pareto distribution. In summary:

(a) we recommend the truncated Pareto model as an appropriate loss distribution for analyzing huge risk loss data sets. The upper and lower limits can be set by the largest and smallest losses or other reasonable values.

(b) the estimated loss distribution provides a prediction of the next disaster's 5% value at risk. The largest loss in the data set plays an important role in the prediction. In Figures 4 and 6, the trend of the tail distribution is crucial for an insurance company setting policy and also for inhabitants and government making plans to minimize damage from natural disasters.

(c) the semiparametric methods (Moment and weighted) are robust, easy to use, more stable, and fit the data better than the MLE methods. In both of the foregoing examples, the estimated shape parameter is less than 1. The original Pareto model is not able to use the Moment or the weighted methods. This is another advantage of using the truncated Pareto model.

(d) the statistical inference in these examples shows that the distribution curve estimated by the proposed weighted estimation method fits the tails of the data better than the Moment and MLE estimators. This is due to the fact that the weighted estimator places less weight on the extreme values and has good MSE. Based on these studies, we suggest that further studies on the usage of weights may be useful.

Appendix

Mathematical Derivations

Lemma A.1. For a truncated Pareto random variable $X$ with pdf given in (3), for $i, j \in \{1, 2, \ldots, n\}$, $i < j$, where $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are order statistics, one has

$$E[X_{(i)}] = \gamma \sum_{k=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \rho^{k}\, \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)}, \tag{A1}$$

$$E[X_{(i)}^{2}] = \gamma \sum_{k=0}^{\infty} \binom{\frac{2}{\alpha} + k - 1}{k} \rho^{k}\, \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)}, \tag{A2}$$

$$E[X_{(i)} X_{(j)}] = \gamma^{2} \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \binom{\frac{1}{\alpha} + l - 1}{l} \rho^{k+l} \left[ \frac{\Gamma(n+1)\,\Gamma(k+i)\,\Gamma(k+l+j)}{\Gamma(i)\,\Gamma(k+j)\,\Gamma(k+l+n+1)} \right], \tag{A3}$$

where $\rho = (g^{\alpha} - 1)/g^{\alpha}$, $g = \nu/\gamma$, $0 < \rho < 1$, $g > 1$, $0 < \gamma \le x \le \nu < \infty$, and $\alpha > 0$.

Proof. Let the cdf of a truncated Pareto distribution in (4) be

$$F(x) = u, \qquad x = \gamma (1 - \rho u)^{-1/\alpha}, \qquad \rho = \frac{g^{\alpha} - 1}{g^{\alpha}}, \qquad g = \frac{\nu}{\gamma}. \tag{A4}$$

By the theory of order statistics, we have

$$f_i(x) = \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)} [F(x)]^{i-1} [1 - F(x)]^{n-i} f(x), \tag{A5}$$

$$f_{ij}(x, y) = \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(j-i)\,\Gamma(n-j+1)} [F(x)]^{i-1} [F(y) - F(x)]^{j-i-1} [1 - F(y)]^{n-j} f(x) f(y). \tag{A6}$$


Using the binomial formula $1/(1-x)^{r} = \sum_{k=0}^{\infty} \binom{r+k-1}{k} x^{k}$, for $m = 1, 2$, we have

$$\begin{aligned}
E[X_{(i)}^{m}] &= \int_{\gamma}^{\nu} x^{m} f_i(x)\, dx \\
&= \int_{0}^{1} \gamma (1 - \rho u)^{-m/\alpha}\, \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)}\, u^{i-1} (1-u)^{n-i}\, du \\
&= \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)}\, \gamma \sum_{k=0}^{\infty} \binom{\frac{m}{\alpha} + k - 1}{k} \int_{0}^{1} \rho^{k} u^{k+i-1} (1-u)^{n-i}\, du \\
&= \gamma \sum_{k=0}^{\infty} \binom{\frac{m}{\alpha} + k - 1}{k} \rho^{k}\, \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)}.
\end{aligned} \tag{A7}$$

By substituting $m = 1$ and 2, respectively, we have (A1) and (A2). And

$$\begin{aligned}
E[X_{(i)} X_{(j)}] &= \int_{\gamma}^{\nu} \int_{\gamma}^{y} x\, y\, f_{ij}(x, y)\, dx\, dy \\
&= \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(j-i)\,\Gamma(n-j+1)} \int_{0}^{1} \int_{0}^{h} \gamma (1 - \rho u)^{-1/\alpha}\, \gamma (1 - \rho h)^{-1/\alpha}\, u^{i-1} (h - u)^{j-i-1} (1 - h)^{n-j}\, du\, dh \\
&= \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(j-i)\,\Gamma(n-j+1)}\, \gamma^{2} \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \binom{\frac{1}{\alpha} + l - 1}{l} \rho^{k+l} \int_{0}^{1} \int_{0}^{1} \left( \frac{u}{h} \right)^{k+i-1} \left( 1 - \frac{u}{h} \right)^{j-i-1} h^{k+l+j-1} (1 - h)^{n-j}\, d\!\left( \frac{u}{h} \right) dh \\
&= \gamma^{2} \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \binom{\frac{1}{\alpha} + l - 1}{l} \rho^{k+l}\, \frac{\Gamma(n+1)\,\Gamma(k+i)\,\Gamma(k+l+j)}{\Gamma(i)\,\Gamma(k+j)\,\Gamma(k+l+n+1)}.
\end{aligned} \tag{A8}$$
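The series in (A7) can be checked numerically against a direct quadrature of the second line of (A7). The sketch below is an illustration, not part of the paper; the function names and the truncation at `terms` summands are assumptions. One further assumption is labeled in the code: the prefactor is written as $\gamma^{m}$, which follows from $x^{m} = \gamma^{m}(1 - \rho u)^{-m/\alpha}$ and coincides with the printed factor $\gamma$ when $m = 1$ or when $\gamma = 1$, as in the paper's simulations.

```python
import math


def order_stat_moment_series(i, n, alpha, gamma_, nu, m=1, terms=400):
    """Series form of E[X_(i)^m] following (A7), truncated at `terms`."""
    rho = 1.0 - (gamma_ / nu) ** alpha  # same as (g^alpha - 1) / g^alpha
    r = m / alpha
    total = 0.0
    for k in range(terms):
        # log of the generalized binomial coefficient C(r + k - 1, k)
        log_binom = math.lgamma(r + k) - math.lgamma(k + 1) - math.lgamma(r)
        log_ratio = (math.lgamma(n + 1) + math.lgamma(k + i)
                     - math.lgamma(i) - math.lgamma(k + n + 1))
        total += math.exp(log_binom + k * math.log(rho) + log_ratio)
    # Assumption: prefactor gamma^m (equals the printed gamma when m = 1).
    return gamma_ ** m * total


def order_stat_moment_quadrature(i, n, alpha, gamma_, nu, m=1, steps=20000):
    """Midpoint-rule value of the integral over u in (0, 1) in (A7)."""
    rho = 1.0 - (gamma_ / nu) ** alpha
    log_c = math.lgamma(n + 1) - math.lgamma(i) - math.lgamma(n - i + 1)
    h = 1.0 / steps
    total = 0.0
    for s in range(steps):
        u = (s + 0.5) * h
        x = gamma_ * (1.0 - rho * u) ** (-1.0 / alpha)  # quantile from (A4)
        weight = math.exp(log_c + (i - 1) * math.log(u)
                          + (n - i) * math.log(1.0 - u))
        total += x ** m * weight
    return total * h


if __name__ == "__main__":
    args = (3, 10, 1.8, 1.0, 10.0)
    print(order_stat_moment_series(*args), order_stat_moment_quadrature(*args))
```

Working with `math.lgamma` keeps the Gamma-function ratios from overflowing for larger $n$ and $k$.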

Proof of Theorem 3. When $\alpha > 0$, we use

$$E[\hat{\mu}_w] = \sum_{i=2}^{n-1} w\, E[X_{(i)}] + \frac{1}{2}\left(1 - (n-2)w\right) \left( E[X_{(1)}] + E[X_{(n)}] \right),$$

$$\mathrm{MSE}(\hat{\mu}_w) = E\left[ (\hat{\mu}_w - \mu)^{2} \right] = E[\hat{\mu}_w^{2}] - 2\mu E[\hat{\mu}_w] + \mu^{2}. \tag{A9}$$

And we have

$$\begin{aligned}
E[\hat{\mu}_w^{2}] &= \sum_{i=2}^{n-1} w^{2} E[X_{(i)}^{2}] + 2 \sum_{i=2}^{n-2} \sum_{j=i+1}^{n-1} w^{2} E[X_{(i)} X_{(j)}] \\
&\quad + \frac{1}{4}\left(1 - (n-2)w\right)^{2} \left( E[X_{(1)}^{2}] + E[X_{(n)}^{2}] + 2E[X_{(1)} X_{(n)}] \right) \\
&\quad + \sum_{i=2}^{n-1} w \left(1 - (n-2)w\right) \left( E[X_{(1)} X_{(i)}] + E[X_{(i)} X_{(n)}] \right).
\end{aligned} \tag{A10}$$

Let

$$\begin{aligned}
A &= \sum_{i=2}^{n-1} E[X_{(i)}^{2}] + 2 \sum_{i=2}^{n-2} \sum_{j=i+1}^{n-1} E[X_{(i)} X_{(j)}], \\
B &= E[X_{(1)}^{2}] + E[X_{(n)}^{2}] + 2E[X_{(1)} X_{(n)}], \\
C &= \sum_{i=2}^{n-1} \left( E[X_{(1)} X_{(i)}] + E[X_{(i)} X_{(n)}] \right), \\
D &= \sum_{i=2}^{n-1} E[X_{(i)}], \\
E &= E[X_{(1)}] + E[X_{(n)}].
\end{aligned} \tag{A11}$$

Then

$$\begin{aligned}
\mathrm{MSE}(\hat{\mu}_w) &= A w^{2} + \frac{1}{4}\left(1 - (n-2)w\right)^{2} B + w\left(1 - (n-2)w\right) C - 2\mu \left( D w + \frac{1}{2}\left(1 - (n-2)w\right) E \right) + \mu^{2} \\
&= \left( A + \frac{1}{4}(n-2)^{2} B - (n-2) C \right) w^{2} + \left( -\frac{1}{2}(n-2) B + C - 2\mu D + \mu (n-2) E \right) w + \frac{1}{4} B - \mu E + \mu^{2}.
\end{aligned} \tag{A12}$$

Now use Lemma A.1; then we have Theorem 3.

Lemma A.2. $\mathrm{MSE}(\hat{\mu}_w)$ in (20) is a convex function with $a > 0$ when $n > 4C/B + 2$, where $C$ and $B$ are given in (20).

Proof. Let $f(w) = \mathrm{MSE}(\hat{\mu}_w) = a w^{2} + b w + c$. We have

$$a = A + \frac{1}{4}(n-2)^{2} B - (n-2) C, \tag{A13}$$

where

$$A = E\left[ \left( \sum_{i=2}^{n-1} X_{(i)} \right)^{2} \right] \ge 0, \qquad B = E\left[ \left( X_{(1)} + X_{(n)} \right)^{2} \right] \ge 0, \qquad C = E\left[ \sum_{i=2}^{n-1} X_{(i)} \left( X_{(1)} + X_{(n)} \right) \right] \ge 0. \tag{A14}$$

Since $n > 2$, then

$$\frac{1}{4}(n-2) B - C > 0, \text{ such that } a > 0, \text{ when } n > \frac{4C}{B} + 2. \tag{A15}$$

Proof of Corollary 5. By Theorem 3 and Lemma A.2, let

$$f(w) = \mathrm{MSE}(\hat{\mu}_w) = a w^{2} + b w + c, \tag{A16}$$

which is a quadratic function with first and second derivatives of $f(w)$ with respect to $w$:

$$f'(w) = 2aw + b, \qquad f''(w) = 2a > 0. \tag{A17}$$

Hence $f(w)$ is a convex function with minimum value

$$f_{\min} = a \left( -\frac{b}{2a} \right)^{2} + b \left( -\frac{b}{2a} \right) + c = -\frac{b^{2}}{4a} + c. \tag{A18}$$

The maximum value of the $\mathrm{EFF}(\hat{\mu}_w)$ is

$$\mathrm{EFF}_{\max}(\hat{\mu}_w) = \frac{\alpha \gamma^{\alpha} \left( \nu^{2-\alpha} - \gamma^{2-\alpha} \right)}{(2 - \alpha) \left( 1 - (\gamma/\nu)^{\alpha} \right) n \left( -b^{2}/(4a) + c \right)} - \frac{\alpha^{2} \gamma^{2\alpha} \left( \nu^{1-\alpha} - \gamma^{1-\alpha} \right)^{2}}{(1 - \alpha)^{2} \left( 1 - (\gamma/\nu)^{\alpha} \right)^{2} n \left( -b^{2}/(4a) + c \right)} \quad \text{at } w = -\frac{b}{2a}. \tag{A19}$$

Acknowledgments

The authors thank the referees and the editor for their comments, which helped to improve the paper. This research is supported by the Natural Sciences and Engineering Research Council of Canada.

References

[1] R. A. Pielke Jr., J. Gratz, C. W. Landsea, D. Collins, M. A. Saunders, and R. Musulin, "Normalized hurricane damage in the United States: 1900–2005," Natural Hazards Review, vol. 9, no. 1, pp. 29–42, 2008.

[2] P. Embrechts, C. Kluppelberg, and T. Mikosch, Modelling Extremal Events for Insurance and Finance, Springer, New York, NY, USA, 2003.

[3] C. Kleiber and S. Kotz, Statistical Size Distribution in Economics and Actuarial Sciences, John Wiley & Sons, New York, NY, USA, 2003.

[4] J. Beirlant, Y. Goegebeur, I. Sergers, and J. Teugels, Statistics of Extremes: Theory and Application, John Wiley & Sons, New York, NY, USA, 2005.

[5] M. A. Beg, "Estimation of the tail probability of the truncated Pareto distribution," Journal of Information & Optimization Sciences, vol. 2, no. 2, pp. 192–198, 1981.

[6] I. B. Aban, M. M. Meerschaert, and A. K. Panorska, "Parameter estimation for the truncated Pareto distribution," Journal of the American Statistical Association, vol. 101, no. 473, pp. 270–277, 2006.

[7] B. Efron, "Bootstrap methods: another look at the jackknife," The Annals of Statistics, vol. 7, no. 1, pp. 1–26, 1979.

[8] P. Barbe and P. Bertail, The Weighted Bootstrap, Springer, New York, NY, USA, 1995.

[9] G. R. Shorack and J. A. Wellner, Empirical Processes with Applications to Statistics, John Wiley & Sons, New York, NY, USA, 1986.

[10] H. L. Koul, Weighted Empirical and Linear Models, vol. 21 of Lecture Notes-Monograph Series, Institute of Mathematical Statistics, Hayward, Calif, USA, 1992.

[11] M. L. Huang and P. H. Brill, "A distribution estimation method based on level crossings," Journal of Statistical Planning and Inference, vol. 124, no. 1, pp. 45–62, 2004.

[12] M. L. Huang, "The efficiencies of a weighted distribution function estimator," in The Proceeding of American Statistical Association, Nonparametric Statistics Section, pp. 1502–1506, 2003.

[13] B. M. Hill, "A simple general approach to inference about the tail of a distribution," The Annals of Statistics, vol. 3, no. 5, pp. 1163–1174, 1975.

[14] P. Cooke, "Statistical inference for bounds of random variables," Biometrika, vol. 66, no. 2, pp. 367–374, 1979.

[15] A. N. Kolmogorov, "Sulla determinazione empirica di una legge di distribuzione," Giornale dell'Istituto Italiano degli Attuari, vol. 4, pp. 83–91, 1933.

[16] T. W. Anderson and D. A. Darling, "Asymptotic theory of certain 'goodness of fit' criteria based on stochastic processes," The Annals of Mathematical Statistics, vol. 23, pp. 193–212, 1952.


results of Monte Carlo simulations. The simulation efficiencies are consistent with the exact efficiencies in Section 3. In Section 6, we analyze the hurricane loss data and forest fire loss data given in Section 1 by using the proposed method. The statistical inference in these examples shows that the distribution curve estimated by the proposed weighted estimation method fits the tails of the data better than several existing methods. Suggestions for further studies are also discussed.

2. Estimation Methods

In this section, we discuss the existing and proposed methodologies for the truncated Pareto distribution. Consider a random sample $X_1, X_2, \ldots, X_n$ from the distribution in (3), and let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ denote its order statistics.

2.1. Maximum Likelihood Estimators (Hill, Beg, and Aban). There are several different maximum likelihood estimators (MLE) for estimating the shape parameter $\alpha$.

A popular estimator is the Hill [13] MLE, which uses the $r$ largest order statistics, $1 \le r \le n - 1$, to estimate the original Pareto shape parameter $\alpha$ in (1). When applying it to the truncated Pareto distribution in (3), it is defined as

$$\hat{\alpha}_{\mathrm{Hill}} = \left[ r^{-1} \sum_{i=1}^{r} \ln X_{(n-i+1)} - \ln X_{(n-r)} \right]^{-1}. \tag{12}$$

Beg [5] developed the MLE method for the truncated Pareto distribution when $\nu$ is known. The Beg MLE for $\alpha$ can be obtained by solving the following equation:

$$n \left[ \frac{1}{\hat{\alpha}_{\mathrm{Beg}}} + \frac{\ln\left( X_{(1)}/\nu \right)}{\left( X_{(1)}/\nu \right)^{-\hat{\alpha}_{\mathrm{Beg}}} - 1} - \ln\left( \frac{g}{X_{(1)}} \right) \right] = 0, \tag{13}$$

where $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$ and $g = (X_{(1)} X_{(2)} \cdots X_{(n)})^{1/n}$ is the sample geometric mean.

Aban's MLE [6], when $\gamma$ and $\nu$ are known, is obtained by solving the equation

$$\frac{n}{\hat{\alpha}_{\mathrm{Aban}}} + \frac{n (\gamma/\nu)^{\hat{\alpha}_{\mathrm{Aban}}} \ln(\gamma/\nu)}{1 - (\gamma/\nu)^{\hat{\alpha}_{\mathrm{Aban}}}} - \sum_{i=1}^{n} \left[ \ln X_{(i)} - \ln \gamma \right] = 0, \tag{14}$$

where $\gamma \le X_{(i)} \le \nu$ and $n$ is the sample size.

Note that we may use the estimators $\hat{\gamma} = \min(X_1, X_2, \ldots, X_n)$ and $\hat{\nu} = \max(X_1, X_2, \ldots, X_n)$ in (14) when $\gamma$ and $\nu$ are unknown. The same applies to equations (16) and (18) below. There are other estimators of $\gamma$ and $\nu$ in the literature, for example, Cooke [14].
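Equations (12) and (14) can be evaluated with a few lines of code. The sketch below is a hedged illustration rather than the authors' implementation (the paper reports using MAPLE): the function names, the bisection bracket `[lo, hi]`, and the tolerance are assumptions. The left side of (14) decreases in $\alpha$, so bisection applies whenever it changes sign on the bracket; when it does not, the sketch raises an error instead of guessing.

```python
import math


def hill_estimator(sample, r):
    """Hill estimate (12) from the r largest order statistics."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= r <= n - 1:
        raise ValueError("r must satisfy 1 <= r <= n - 1")
    mean_log_top = sum(math.log(x) for x in xs[n - r:]) / r
    # xs[n - r - 1] is X_(n-r) in 1-based order-statistic notation.
    return 1.0 / (mean_log_top - math.log(xs[n - r - 1]))


def aban_mle(sample, gamma_, nu, lo=1e-6, hi=50.0, tol=1e-10):
    """Solve (14) for alpha by bisection, with gamma_ and nu given."""
    n = len(sample)
    log_q = math.log(gamma_ / nu)  # ln(gamma/nu) < 0
    s = sum(math.log(x / gamma_) for x in sample)

    def score(alpha):
        q_alpha = math.exp(alpha * log_q)  # (gamma/nu)^alpha
        # -expm1(alpha * log_q) equals 1 - (gamma/nu)^alpha, computed stably.
        return (n / alpha
                + n * q_alpha * log_q / (-math.expm1(alpha * log_q))
                - s)

    if score(lo) <= 0.0 or score(hi) >= 0.0:
        raise ValueError("no root of (14) inside the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Deterministic pseudo-sample: truncated Pareto quantiles, alpha = 1.8.
    alpha, g, v, n = 1.8, 1.0, 10.0, 500
    rho = 1.0 - (g / v) ** alpha
    qs = [g * (1.0 - rho * (k + 0.5) / n) ** (-1.0 / alpha) for k in range(n)]
    print(aban_mle(qs, g, v), hill_estimator(qs, 50))
```

The error branch matters in practice: for some samples the left side of (14) stays negative for every positive $\alpha$, so no positive solution exists.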

2.2. Moment Estimator. To estimate the population truncated Pareto mean, the sample mean estimator is

$$\hat{\mu}_X = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i. \tag{15}$$

A Moment estimator $\hat{\alpha}_M$ for estimating $\alpha$ is the solution of the equation

$$\bar{X} = \frac{\hat{\alpha}_M \gamma^{\hat{\alpha}_M} \left( \gamma^{1-\hat{\alpha}_M} - \nu^{1-\hat{\alpha}_M} \right)}{\left( \hat{\alpha}_M - 1 \right) \left( 1 - (\gamma/\nu)^{\hat{\alpha}_M} \right)}, \quad \hat{\alpha}_M > 0. \tag{16}$$
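Equation (16) has no closed-form solution, but its right side decreases in $\alpha$, so a bracketing root-finder is enough. The following sketch assumes known $\gamma$ and $\nu$; the function names, the bracket, and the tolerance are hypothetical choices, and the mean is rewritten with `math.expm1` only for numerical stability near $\alpha = 1$, where (16) has a removable singularity.

```python
import math


def truncated_pareto_mean(alpha, gamma_, nu):
    """Right side of (16): the truncated Pareto mean as a function of alpha."""
    k = math.log(nu / gamma_)
    if abs(alpha - 1.0) < 1e-12:
        num = k  # limit of expm1((1 - alpha) k) / (1 - alpha) as alpha -> 1
    else:
        num = math.expm1((1.0 - alpha) * k) / (1.0 - alpha)
    # Algebraically equal to (16): the denominator is 1 - (gamma/nu)^alpha.
    return alpha * gamma_ * num / (-math.expm1(-alpha * k))


def moment_estimator(xbar, gamma_, nu, lo=1e-6, hi=50.0, tol=1e-12):
    """Solve (16) for alpha by bisection, given the sample mean xbar."""
    def f(alpha):
        return truncated_pareto_mean(alpha, gamma_, nu) - xbar

    if f(lo) <= 0.0 or f(hi) >= 0.0:
        raise ValueError("xbar is outside the range attainable in (16)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    target = truncated_pareto_mean(0.8, 1.0, 10.0)
    print(target, moment_estimator(target, 1.0, 10.0))
```

The same root-finder could be applied to (18) by passing a weighted mean instead of the sample mean.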

2.3. A Proposed Weighted Estimator. Now, to estimate the population truncated Pareto mean, we define a weighted mean based on the weighted empirical distribution function in (9) as

$$\hat{\mu}_w = \bar{X}_w = \frac{1}{n} \sum_{i=1}^{n} p_{n,i} X_{(i)} = \sum_{i=2}^{n-1} w X_{(i)} + \frac{1}{2}\left( 1 - (n-2)w \right) \left[ X_{(1)} + X_{(n)} \right], \tag{17}$$

where $n$ is the sample size and $w$ is the weight as defined in (9). Then, for estimating $\alpha$, we define $\hat{\alpha}_w$ to be the solution of the equation

$$\bar{X}_w = \frac{\hat{\alpha}_w \gamma^{\hat{\alpha}_w} \left( \gamma^{1-\hat{\alpha}_w} - \nu^{1-\hat{\alpha}_w} \right)}{\left( \hat{\alpha}_w - 1 \right) \left( 1 - (\gamma/\nu)^{\hat{\alpha}_w} \right)}, \quad \hat{\alpha}_w > 0. \tag{18}$$
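The weighted mean is easiest to read from the last expression in (17): every interior order statistic receives weight $w$, and the smallest and largest observations share the remaining mass equally. A minimal sketch follows; the function name and the toy data are hypothetical.

```python
def weighted_mean(sample, w):
    """Weighted mean from the last expression in (17).

    Interior order statistics get weight w; the minimum and the maximum
    each get (1 - (n - 2) * w) / 2, so all weights sum to one.
    """
    xs = sorted(sample)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    end_weight = 0.5 * (1.0 - (n - 2) * w)
    return w * sum(xs[1:-1]) + end_weight * (xs[0] + xs[-1])


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 10.0]  # invented values
    print(weighted_mean(data, 1.0 / len(data)))  # w = 1/n: the ordinary mean
    print(weighted_mean(data, 0.25))             # w > 1/n: extremes count less
```

Choosing $w = 1/n$ reproduces the sample mean in (15), while any $w > 1/n$ moves weight from the two extreme observations to the middle of the data.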

3. An Exact Efficiency Function

In this section, an exact efficiency function of the weighted estimator $\hat{\mu}_w$ in (17) relative to the sample mean $\bar{X}$ in (15) is given. The mathematical derivations are given in the appendix.

Theorem 3. The mean and mean square error (MSE) of $\hat{\mu}_w$ in (17), when $\alpha > 0$, are given by

$$E[\hat{\mu}_w] = w \gamma \sum_{i=2}^{n-1} \sum_{k=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \rho^{k}\, \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)} + \frac{1}{2}\left( 1 - (n-2)w \right) \gamma \sum_{k=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \rho^{k} \left( \frac{\Gamma(n+1)\,\Gamma(k+1)}{\Gamma(k+n+1)} + \frac{n}{k+n} \right), \tag{19}$$

$$\mathrm{MSE}(\hat{\mu}_w) = a w^{2} + b w + c, \quad \text{where } 0 < w < 1,$$

$$a = A + \frac{1}{4}(n-2)^{2} B - (n-2) C,$$

$$b = -\frac{1}{2}(n-2) B + C - 2\mu D + \mu (n-2) E,$$

$$c = \frac{1}{4} B - \mu E + \mu^{2},$$

$$A = \gamma \sum_{i=2}^{n-1} \sum_{k=0}^{\infty} \binom{\frac{2}{\alpha} + k - 1}{k} \rho^{k}\, \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)} + 2\gamma^{2} \sum_{i=2}^{n-2} \sum_{j=i+1}^{n-1} \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \binom{\frac{1}{\alpha} + l - 1}{l} \rho^{k+l} \left[ \frac{\Gamma(n+1)\,\Gamma(k+i)\,\Gamma(k+l+j)}{\Gamma(i)\,\Gamma(k+j)\,\Gamma(k+l+n+1)} \right],$$

$$B = \gamma \sum_{k=0}^{\infty} \binom{\frac{2}{\alpha} + k - 1}{k} \rho^{k} \left[ \frac{\Gamma(n+1)\,\Gamma(k+1)}{\Gamma(k+n+1)} + \frac{n}{k+n} \right] + 2\gamma^{2} \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \binom{\frac{1}{\alpha} + l - 1}{l} \rho^{k+l} \left[ \frac{\Gamma(n+1)\,\Gamma(k+1)}{\Gamma(k+n)\,(k+l+n)} \right],$$

$$C = \gamma^{2} \sum_{i=2}^{n-1} \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \binom{\frac{1}{\alpha} + l - 1}{l} \rho^{k+l} \left[ \frac{\Gamma(n+1)\,\Gamma(k+1)\,\Gamma(k+l+i)}{\Gamma(k+i)\,\Gamma(k+l+n+1)} + \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n)\,(k+l+n)} \right],$$

$$D = \gamma \sum_{i=2}^{n-1} \sum_{k=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \rho^{k}\, \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)},$$

$$E = \gamma \sum_{k=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \rho^{k} \left[ \frac{\Gamma(n+1)\,\Gamma(k+1)}{\Gamma(k+n+1)} + \frac{n}{k+n} \right], \tag{20}$$

where $\rho = (g^{\alpha} - 1)/g^{\alpha}$, $g = \nu/\gamma$, $0 < \rho < 1$, $g > 1$, $0 < \gamma \le x \le \nu < \infty$, and

$$\Gamma(z) = \int_{0}^{\infty} t^{z-1} e^{-t}\, dt, \quad z > 0, \text{ is the Gamma function.} \tag{21}$$

Corollary 4. The efficiency function (EFF) of $\hat{\mu}_w$ in (17) relative to the sample mean $\bar{X}$ in (15) is given by

$$\mathrm{EFF}(\hat{\mu}_w) = \frac{\mathrm{Var}(\bar{X})}{\mathrm{MSE}(\hat{\mu}_w)} = \frac{\sigma^{2}/n}{\mathrm{MSE}(\hat{\mu}_w)} = \frac{\alpha \gamma^{\alpha} \left( \nu^{2-\alpha} - \gamma^{2-\alpha} \right)}{(2 - \alpha) \left( 1 - (\gamma/\nu)^{\alpha} \right) n\, \mathrm{MSE}(\hat{\mu}_w)} - \frac{\alpha^{2} \gamma^{2\alpha} \left( \nu^{1-\alpha} - \gamma^{1-\alpha} \right)^{2}}{(1 - \alpha)^{2} \left( 1 - (\gamma/\nu)^{\alpha} \right)^{2} n\, \mathrm{MSE}(\hat{\mu}_w)}, \tag{22}$$

where $\sigma^{2}$ and $\mathrm{MSE}(\hat{\mu}_w)$ are given in (7) and (20).

4. Optimal Weights

4.1. $L_1$-Optimal Weights. Huang and Brill [11] proposed an $L_1$-optimal weight, which is based on the Manhattan metric, for the $F_n^{*}(x)$ in (9). It is

$$w_{L_1\text{-opt}} = \frac{1}{\sqrt{n(n-1)}} > \frac{1}{n}, \quad \text{for the middle data.} \tag{23}$$

Huang and Brill [11] proved that the exact efficiency (EFF) of $F_n^{*}(x)$ in (9) relative to the EDF $S_n(x)$ exceeds 1 on the tails of the distribution. In this paper, we use this $w_{L_1\text{-opt}}$ weight in order to improve the efficiency of estimating the tail probability of the truncated Pareto distribution. Huang [12] indicates that, in general, for any distribution, if $w > 1/n$, which means putting more weight on the middle data, the efficiency of estimating the tail probability will be improved.

Next, we explore an alternative $L_2$-optimal weight.
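For a concrete sense of scale, the weight in (23) and the matching end-point weight from (17) can be computed as below. This is a small illustrative sketch; the function name is hypothetical.

```python
import math


def l1_optimal_weights(n):
    """Return (w, end_weight): the weight (23) for the middle data and the
    weight each extreme order statistic receives in (17)."""
    w = 1.0 / math.sqrt(n * (n - 1))
    end_weight = 0.5 * (1.0 - (n - 2) * w)
    return w, end_weight


if __name__ == "__main__":
    for n in (10, 20, 30, 50, 100):
        w, end_weight = l1_optimal_weights(n)
        print(n, round(w, 5), round(end_weight, 5), w > 1.0 / n)
```

For $n = 10$ the middle weight is about 0.10541, slightly above $1/n = 0.1$, so each of the two extreme observations carries less than an equal share.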

4.2. $L_2$-Optimal Weights

Corollary 5. An $L_2$-optimal weight $w_{L_2\text{-opt}}$ for the efficiency function of the $\hat{\mu}_w$ given in (17) for estimating the population mean in (3) relative to the sample mean $\bar{X}$ in (15), when $n > 4C/B + 2$, is given by

$$w_{L_2\text{-opt}} = -\frac{b}{2a}. \tag{24}$$

The minimum $\mathrm{MSE}_{\min}(\hat{\mu}_w)$ and $\mathrm{EFF}_{\max}(\hat{\mu}_w)$ are given by

$$\mathrm{MSE}_{\min}(\hat{\mu}_w) = -\frac{b^{2}}{4a} + c,$$

$$\mathrm{EFF}_{\max}(\hat{\mu}_w) = \frac{\alpha \gamma^{\alpha} \left( \nu^{2-\alpha} - \gamma^{2-\alpha} \right)}{(2 - \alpha) \left( 1 - (\gamma/\nu)^{\alpha} \right) n \left( -b^{2}/(4a) + c \right)} - \frac{\alpha^{2} \gamma^{2\alpha} \left( \nu^{1-\alpha} - \gamma^{1-\alpha} \right)^{2}}{(1 - \alpha)^{2} \left( 1 - (\gamma/\nu)^{\alpha} \right)^{2} n \left( -b^{2}/(4a) + c \right)}. \tag{25}$$

In (24) and (25), $a$, $b$, $c$, $C$, and $B$ are defined in (20).
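Once the five moment sums $A$ to $E$ in (20) are available, the optimal weight in (24) is a one-line computation. The sketch below only illustrates the algebra; the function names are hypothetical and the numeric inputs in the usage lines are arbitrary placeholders, not moments of any real distribution.

```python
def quadratic_coefficients(A, B, C, D, E, mu, n):
    """Coefficients a, b, c of MSE = a w^2 + b w + c, as listed in (20)."""
    a = A + 0.25 * (n - 2) ** 2 * B - (n - 2) * C
    b = -0.5 * (n - 2) * B + C - 2.0 * mu * D + mu * (n - 2) * E
    c = 0.25 * B - mu * E + mu ** 2
    return a, b, c


def l2_optimal_weight(a, b, c):
    """Return (w_opt, mse_min) from (24) and the first line of (25)."""
    if a <= 0.0:
        raise ValueError("the quadratic is not convex (a <= 0)")
    w_opt = -b / (2.0 * a)
    return w_opt, c - b * b / (4.0 * a)


if __name__ == "__main__":
    # Placeholder inputs chosen only so that a > 0; not real moments.
    a, b, c = quadratic_coefficients(100.0, 5.0, 20.0, 8.0, 2.2, 1.0, 10)
    print(a, b, c, l2_optimal_weight(a, b, c))
```

The guard on $a$ mirrors the convexity condition $n > 4C/B + 2$ in Corollary 5: without it, $-b/(2a)$ would locate a maximum rather than a minimum.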


Table 2: Exact efficiencies of $\hat{\mu}_w$ relative to other estimators by using $L_2$-optimal weights, $\gamma = 1$, $\nu = 10$.

Sample size   alpha   w_L2-opt   w = 1/n   w_L1-opt   MSE(mu_w)   MSE(X-bar)   EFF(mu_w (X-bar))
n = 10        0.8     0.10396    0.1       0.10541    0.4007      0.4034       1.0068
              1.8     0.11446                         0.1406      0.1647       1.1717
              3       0.11847                         0.0365      0.0493       1.3501
n = 20        0.8     0.05015    0.05      0.05130    0.2017      0.2017       1.0001
              1.8     0.05255                         0.0766      0.0824       1.0759
              3       0.05463                         0.0217      0.0247       1.1367
n = 30        0.8     0.03323    0.03333   0.03390    0.1344      0.1345       1.0003
              1.8     0.03425                         0.0528      0.0549       1.0399
              3       0.03529                         0.0155      0.0164       1.0628
n = 50        0.8     0.02021    0.02      0.02020    0.0968      0.1063       1.0978
              1.8     0.02159                         0.0514      0.0592       1.1536
              3       0.02298                         0.0274      0.0352       1.2849
n = 100       0.8     0.01001    0.01      0.01005    0.0765      0.0830       1.0845
              1.8     0.01002                         0.0542      0.0576       1.0637
              3       0.01002                         0.0380      0.0400       1.0526

The proof of Corollary 5 is in the appendix. Table 2 lists the values of $w_{L_2\text{-opt}}$, $w_{L_1\text{-opt}}$, $\mathrm{MSE}_{\min}(\hat{\mu}_w)$, $\mathrm{MSE}_{\min}(\bar{X})$, and the exact $\mathrm{EFF}_{\max}(\hat{\mu}_w)$ of $\hat{\mu}_w$ relative to $\bar{X}$ for $n = 10$, 20, 30, 50, and 100 and $\alpha = 0.8$, 1.8, and 3 by using (24) and (25). We note that all values of $w_{L_2\text{-opt}}$ are greater than $1/n$ and all exact $\mathrm{EFF}_{\max}(\hat{\mu}_w)$ relative to $\bar{X}$ are greater than 1.

Remark 6. The $w_{L_1\text{-opt}}$ in (23) is totally nonparametric; it is more robust and easy to use. Note that $w_{L_2\text{-opt}}$ in (24) depends on the parameter $\alpha$. In practice, we may estimate $\alpha$ first and then obtain a $w_{L_2\text{-opt}}$ while still keeping the optimal advantage. Of course, we use the given $\alpha$ values in the simulations. However, $w_{L_1\text{-opt}}$ and $w_{L_2\text{-opt}}$ are close to each other when $n > 100$.

5 Simulations

Next, we use simulations to compare the performance of the three parametric MLE estimators, that is, Hill's, Beg's, and Aban's estimators in (12), (13), and (14), and the two semiparametric estimators, that is, the Moment and the weighted estimators in (16) and (18), for estimating the shape parameter $\alpha$ in (3). We generate $m = 1000$ random samples of size $n = 100$ from the distribution of (3). Only $w_{L_1\text{-opt}}$ is used for the weighted estimator, since Table 2 indicates that the values of $w_{L_2\text{-opt}}$ and $w_{L_1\text{-opt}}$ are close to each other when $n = 100$.

We know that if $\alpha \le 2$, the original Pareto distribution has an infinite variance; if $\alpha \le 1$, then the mean is infinite. These cases have inference difficulties. We focus on $\alpha = 0.5$, 0.8, and 1.8 and let $\gamma = 1$, $\nu = 10$. Figure 3 contains the box-plots of the comparison of these five estimators. Note that in the cases $\alpha = 0.5$ and 0.8, Beg's and Hill's estimators have large biases, while the weighted, Aban's, and the Moment estimators performed very well, although Aban's MLE estimates sometimes have unstable solutions. When $\alpha = 1.8$, there are similar results, except that Hill's and Beg's estimators performed better. It is interesting to see that in all three cases the median of the weighted estimator for $\alpha$ is relatively larger than those of Aban's and the Moment estimators. We will discuss how these affect the tail estimation in the next section. The simulations were run by using MAPLE 15 with double precision.
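The simulation design can be sketched in a few lines. The code below is not the authors' MAPLE program: it draws truncated Pareto samples by inverting the cdf, $x = \gamma(1 - \rho u)^{-1/\alpha}$ with $\rho = 1 - (\gamma/\nu)^\alpha$, and applies the textbook form of Hill's estimator with $r = n - 1$, which may differ in detail from (12). The seed and the function names are arbitrary choices.

```python
import math
import random


def sample_truncated_pareto(n, alpha, gamma, nu, rng):
    """Draw n values on [gamma, nu] by inverting the truncated Pareto cdf."""
    rho = 1.0 - (gamma / nu) ** alpha
    return [gamma * (1.0 - rho * rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def hill_estimate(sample):
    """Textbook Hill estimator using all r = n - 1 upper order statistics."""
    xs = sorted(sample)
    r = len(xs) - 1
    return r / sum(math.log(x / xs[0]) for x in xs[1:])


rng = random.Random(2013)            # arbitrary seed, for reproducibility only
alpha, gamma, nu = 0.8, 1.0, 10.0    # one of the settings used in Section 5
estimates = sorted(
    hill_estimate(sample_truncated_pareto(100, alpha, gamma, nu, rng))
    for _ in range(1000)             # m = 1000 samples of size n = 100
)
median_estimate = 0.5 * (estimates[499] + estimates[500])
print("true alpha:", alpha, " median Hill estimate:", median_estimate)
```

Because Hill's estimator ignores the upper truncation point, its estimates on truncated data sit above the true $\alpha$, which is the kind of bias visible in the box-plots. Other estimators can be dropped into the same loop in place of `hill_estimate`.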

6. Examples

Now we use the proposed method and compare it with existing methods to analyze the data of the two examples outlined in Section 1.

6.1. Hurricane Loss Example

6.1.1. Comparison of Four Estimation Methods. At first, we look at the hurricane loss example in Section 1.1.1. Based on the simulation results in Figure 3, we consider the three better estimators out of the five in the heavy-tailed case, that is, Aban's, the Moment, and the weighted estimators in (27), (28), and (29) for the truncated Pareto pdf $f(x)$ in (3) and the cdf $F(x)$ in (4). We also compare them with Hill's estimator in (26) for the original Pareto pdf $f_p(x)$ in (1) and the cdf $F_p(x)$ in (2). Here

$$
f_{\text{Hill}}(x) = \frac{\hat{\alpha}_{\text{Hill}}\,\gamma^{\hat{\alpha}_{\text{Hill}}}}{x^{\hat{\alpha}_{\text{Hill}}+1}}, \qquad
F_{\text{Hill}}(x) = 1 - \left(\frac{\gamma}{x}\right)^{\hat{\alpha}_{\text{Hill}}}, \tag{26}
$$

where $\hat{\alpha}_{\text{Hill}}$ is given in (12) with $r = n - 1$; consider

$$
f_{\text{MLE}}(x) = \frac{\hat{\alpha}_{\text{MLE}}\,\gamma^{\hat{\alpha}_{\text{MLE}}}\,x^{-\hat{\alpha}_{\text{MLE}}-1}}{1 - (\gamma/\nu)^{\hat{\alpha}_{\text{MLE}}}}, \qquad
F_{\text{MLE}}(x) = 1 - \frac{\gamma^{\hat{\alpha}_{\text{MLE}}}\left(x^{-\hat{\alpha}_{\text{MLE}}} - \nu^{-\hat{\alpha}_{\text{MLE}}}\right)}{1 - (\gamma/\nu)^{\hat{\alpha}_{\text{MLE}}}}, \tag{27}
$$

Table 3: Estimation of index, mean, median, and VaR for hurricane loss data.

| Method | $\hat{\alpha}$ | $\mu$ | Median | 5% value at risk |
|---|---|---|---|---|
| Pareto (Hill) | 0.8126 | $\infty$ | 8.68 billion | 147.68 billion |
| TPD (Aban) | 0.6206 | 21.74 billion | 9.73 billion | 85.15 billion |
| TPD (Moment) | 0.6476 | 20.48 billion | 9.47 billion | 82.55 billion |
| TPD (Weighted) | 0.6990 | 19.37 billion | 9.02 billion | 77.58 billion |

Figure 3: Box-plots of the estimators (Weighted, Aban, Beg, Hill, and Moment) for the shape parameter $\alpha$ by using the $L_1$-optimal weight, $\gamma = 1$, $\nu = 10$, sample size $n = 100$, and generating times $m = 1000$. Panels: $\alpha = 0.5$, $\alpha = 0.8$, and $\alpha = 1.8$.

where $\hat{\alpha}_{\text{MLE}}$ is Aban's MLE given in (14); consider

$$
f_M(x) = \frac{\hat{\alpha}_M\,\gamma^{\hat{\alpha}_M}\,x^{-\hat{\alpha}_M-1}}{1 - (\gamma/\nu)^{\hat{\alpha}_M}}, \qquad
F_M(x) = 1 - \frac{\gamma^{\hat{\alpha}_M}\left(x^{-\hat{\alpha}_M} - \nu^{-\hat{\alpha}_M}\right)}{1 - (\gamma/\nu)^{\hat{\alpha}_M}}, \tag{28}
$$

where $\hat{\alpha}_M$ is given in (16); consider

$$
f_w(x) = \frac{\hat{\alpha}_w\,\gamma^{\hat{\alpha}_w}\,x^{-\hat{\alpha}_w-1}}{1 - (\gamma/\nu)^{\hat{\alpha}_w}}, \qquad
F_w(x) = 1 - \frac{\gamma^{\hat{\alpha}_w}\left(x^{-\hat{\alpha}_w} - \nu^{-\hat{\alpha}_w}\right)}{1 - (\gamma/\nu)^{\hat{\alpha}_w}}, \tag{29}
$$

where $\hat{\alpha}_w$ is given in (18), using the weight $w_{L_1\text{-opt}} = 1/\sqrt{n(n-1)}$.

The results of these four methods are listed in Table 3 by using the hurricane loss data, where $\gamma = X_{(\min)}$, $\nu = X_{(\max)}$, and $n = 49$.

Figure 4 is a log-log plot showing the upper tail for the hurricane loss data. In this plot, the circles represent the real data and the straight line represents the estimated original Pareto distribution. The dashed line, dotted line, and thick solid line represent the estimated truncated Pareto distributions by using Aban's, the Moment, and Huang's (weighted) estimators, respectively. We observe two conclusions intuitively.

(1) The original Pareto distribution (straight line) does not fit the data well in the tail. The truncated Pareto distribution fits the data very well using all three estimation methods (Aban, Moment, and weighted). Note that the 5% value at risk estimated by the original Pareto model is 147 billion, whereas the three truncated model estimates are about 80 billion. It appears that the original Pareto model overestimates the risk. An insurance company would set a high premium if it is using the Pareto model, with the result that many people cannot afford to buy insurance.

(2) We examine the three truncated Pareto estimates. Around the tail, the weighted estimate fits the data the best; that is, the $\ln P(X > x)$ curve turns downward more quickly, following the trend of the data pattern, because the weighted estimate $\hat{\alpha}_w$ is the largest among the three truncated model estimates (this is consistent with the simulation results in Figure 3). The weighted mean estimate $\mu_w$ and the 5% value at risk are the smallest among the three methods (this is obtained by placing less weight on the extreme value, e.g., the 1926 Great Miami Hurricane's loss), so its estimated distribution is less heavy in the tail compared with the other methods.
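The median and value-at-risk comparison above can be explored with a short sketch. It reads the 5% value at risk as the loss level exceeded with probability 0.05, that is, the 0.95 quantile, and inverts the cdf in (27)–(29) in closed form. The upper limit $\nu = 157$ billion is the largest loss quoted in Section 1.1.1; the lower limit $\gamma = 3.7$ billion is a hypothetical value chosen so that the untruncated Hill row comes out roughly where Table 3 puts it, and is not taken from the data.

```python
def tp_cdf(x, alpha, gamma, nu):
    """Truncated Pareto cdf on [gamma, nu], written as in (27)-(29)."""
    norm = 1.0 - (gamma / nu) ** alpha
    return 1.0 - gamma ** alpha * (x ** -alpha - nu ** -alpha) / norm


def tp_quantile(p, alpha, gamma, nu):
    """Inverse of tp_cdf: the loss level x with F(x) = p."""
    rho = 1.0 - (gamma / nu) ** alpha
    return gamma * (1.0 - rho * p) ** (-1.0 / alpha)


def pareto_quantile(p, alpha, gamma):
    """Quantile of the untruncated Pareto cdf 1 - (gamma / x)^alpha."""
    return gamma * (1.0 - p) ** (-1.0 / alpha)


# Units: billions of dollars. gamma is hypothetical (see the text above).
gamma, nu = 3.7, 157.0
for label, alpha in (("Aban", 0.6206), ("Moment", 0.6476), ("Weighted", 0.6990)):
    print(label,
          "median:", tp_quantile(0.50, alpha, gamma, nu),
          "5% VaR:", tp_quantile(0.95, alpha, gamma, nu))
print("Pareto (Hill) 5% VaR:", pareto_quantile(0.95, 0.8126, gamma))
```

A larger shape estimate pulls both quantiles down, which is why the weighted fit gives the smallest median and value at risk of the three truncated models, and the truncation at $\nu$ keeps every truncated quantile below the largest observed loss.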

Next, in order to confirm these conclusions, we run three goodness-of-fit tests. Later, we define the absolute error and integrated error as measures of the distance from the empirical data points to the estimated Pareto curve and truncated Pareto curves.

6.1.2. Goodness-of-Fit Tests. Our objective is to test whether the estimated distributions in (26)–(29) fit the data properly. We test the hypotheses $H_0\colon F(x) = F^*(x)$ against $H_1\colon F(x) \ne F^*(x)$, where $F(x)$ is the true unknown distribution function and $F^*(x)$ is the estimated Pareto cdf in (26) or the estimated truncated Pareto cdf in (27)–(29). In this paper, we use three EDF goodness-of-fit tests.

Figure 4: Log-log plot of the hurricane loss data, $\ln P(X > x)$ against $\ln(x)$, showing the data and the Pareto, Aban, Moment, and Weighted fits.

(1) The Kolmogorov-Smirnov (K-S) test [15]: the test statistic is given by

$$
T = \sup_x \left| F^*(x) - S_n(x) \right|, \qquad -\infty < x < \infty, \tag{30}
$$

where $S_n(x)$ is the EDF in (8). Under $H_0$, the two-tailed $P$-value for the K-S test is as follows:

$$
P\text{-value} = 1 - \left[ 1 - T \sum_{j=0}^{[n(1-T)]} \frac{n!}{j!\,(n-j)!} \left(1 - T - \frac{j}{n}\right)^{n-j} \left(T + \frac{j}{n}\right)^{j-1} \right]^2, \tag{31}
$$

where $[n(1 - T)]$ is the integer part of $n(1 - T)$.

(2) The Anderson and Darling [16] test (A-D test) introduced a measure of "distance" between the empirical distribution $S_n(x)$ and the proposed cdf $F^*(x)$ by using a metric function space:

$$
W_n^2 = n \int_{-\infty}^{\infty} \left[ S_n(x) - F^*(x) \right]^2 \psi\!\left(F^*(x)\right) dF^*(x), \tag{32}
$$

where $\psi(u) = 1/(u(1 - u))$ is a weight function with $u = F^*(x)$. Let $u_i = F^*(X_{(i)})$, $i = 1, \ldots, n$; under $H_0$, the test statistic and $P$-value are given by

$$
\begin{aligned}
W_n^2 &= -n - \frac{1}{n} \sum_{j=1}^{n} (2j - 1) \left( \log(u_j) + \log(1 - u_{n-j+1}) \right), \\
P\text{-value} &= 1 - \frac{\sqrt{2}}{z} \sum_{j=0}^{\infty} (-1)^j \frac{\Gamma\!\left(j + \tfrac{1}{2}\right)(4j + 1)}{j!}\, e^{-(4j+1)^2 \pi^2/(8z)} \\
&\qquad \times \int_0^{\infty} \exp\!\left( \frac{z}{8(w^2 + 1)} - \frac{(4j + 1)^2 \pi^2 w^2}{8z} \right) dw,
\end{aligned} \tag{33}
$$

Figure 5: Absolute errors $|F^*(x) - S_n(x)|$ for the hurricane loss example on the 10 largest losses (curves: Aban, Moment, Pareto, and Weighted).

where $z$ is the observed value of $W_n^2$ and $\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\,dt$, $x > 0$, is the Gamma function.
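The two statistics defined so far, $T$ in (30) and $W_n^2$ in (33), can be computed directly from the fitted values $u_j = F^*(X_{(j)})$. The sketch below covers only the statistics, not the $P$-value series; the function names and the toy data (four points against a uniform cdf on $[0, 4]$) are illustrative assumptions.

```python
import math


def ks_statistic(sample, cdf):
    """T in (30): largest gap between the fitted cdf and the EDF step function."""
    xs = sorted(sample)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs, start=1):
        u = cdf(x)
        # The EDF jumps from (i - 1)/n to i/n at the i-th order statistic.
        gap = max(gap, abs(u - i / n), abs(u - (i - 1) / n))
    return gap


def ad_statistic(sample, cdf):
    """W_n^2 in (33); every fitted value must lie strictly between 0 and 1."""
    us = sorted(cdf(x) for x in sample)
    n = len(us)
    total = sum((2 * j - 1) * (math.log(us[j - 1]) + math.log(1.0 - us[n - j]))
                for j in range(1, n + 1))
    return -n - total / n


def uniform_cdf(x):
    """Toy fitted cdf: uniform on [0, 4]."""
    return x / 4.0


data = [0.5, 1.5, 2.5, 3.5]
print("K-S statistic:", ks_statistic(data, uniform_cdf))
print("A-D statistic:", ad_statistic(data, uniform_cdf))
```

One practical caveat: when the truncation limits are set to the sample extremes, the fitted truncated Pareto cdf equals 0 or 1 at those points, so the logarithms in (33) are undefined there and the extremes need separate handling.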

(3) The Cramér-von Mises test [16] (C-v-M test) uses $\psi(u) = 1$ in (32); thus, under $H_0$, the test statistic and $P$-value are given by

$$
\begin{aligned}
n\omega^2 &= \frac{1}{12n} + \sum_{j=1}^{n} \left( u_j - \frac{2j - 1}{2n} \right)^2, \\
P\text{-value} &= 1 - \frac{1}{\pi\sqrt{z}} \sum_{j=0}^{\infty} (-1)^{2j}\, \frac{\Gamma\!\left(j + \tfrac{1}{2}\right)(4j + 1)^{1/2}}{j!\,\sqrt{\pi}}\, e^{-(4j+1)^2/(16z)} \\
&\qquad \times K_{1/4}\!\left( \frac{(4j + 1)^2}{16z} \right),
\end{aligned} \tag{34}
$$

where $K_{1/4}(x)$ is the modified Bessel function of the second kind:

$$
K_{\alpha}(x) = \frac{\pi}{2}\, \frac{I_{-\alpha}(x) - I_{\alpha}(x)}{\sin(\alpha\pi)}, \qquad
I_{\alpha}(x) = \sum_{m=0}^{\infty} \frac{1}{m!\,\Gamma(m + \alpha + 1)} \left( \frac{x}{2} \right)^{2m+\alpha}. \tag{35}
$$
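The C-v-M statistic in (34) is the simplest of the three to compute. A minimal sketch follows, again restricted to the statistic itself; the function name and the toy data are illustrative.

```python
def cvm_statistic(sample, cdf):
    """n * omega^2 in (34), computed from the sorted fitted values u_j."""
    us = sorted(cdf(x) for x in sample)
    n = len(us)
    return 1.0 / (12 * n) + sum((u - (2 * j - 1) / (2 * n)) ** 2
                                 for j, u in enumerate(us, start=1))


def uniform_cdf(x):
    """Toy fitted cdf: uniform on [0, 4]."""
    return x / 4.0


# These points sit exactly at the mid-ranks (2j - 1)/(2n), so only 1/(12n) remains.
data = [0.5, 1.5, 2.5, 3.5]
print("C-v-M statistic:", cvm_statistic(data, uniform_cdf))
```

The statistic reaches its lower bound $1/(12n)$ when every fitted value coincides with its mid-rank, as in this toy example; larger values indicate a poorer fit.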

For fitting the 49 losses in the hurricane loss data, for example, after computing estimates of $F^*(x)$ by using (26), (27), (28), and (29), we compute the absolute error (AE) in (30) and define the integrated error (IE) by

$$
\mathrm{IE} = \left[ \int_{X_{(n-r+1)}}^{X_{(n)}} \left( S_n(x) - F^*(x) \right)^2 dx \right]^{1/2}. \tag{36}
$$

Figure 5 gives the absolute errors (AE) in (30) for the 10 largest losses and confirms that the original Pareto estimate has relatively larger errors and the weighted estimate has relatively smaller errors in the tail. Figure 5 explains the data fitting of the tail of the distributions in Figure 4.

We also compute the AE in (30) and the IE in (36) to confirm those tail errors by using $r$ as the number of the largest losses, $1 \le r \le n$. The AE and IE values are given in Table 4 (the smallest values are marked with $*$).
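The tail errors can be sketched as follows. Two choices here are assumptions of the sketch rather than statements from the paper: the AE is taken as the largest gap in (30) over the $r$ largest observations (including the left limit of the EDF at each of them), and the integral in (36) is evaluated with a composite Simpson rule on each interval between consecutive order statistics, where the EDF is constant.

```python
import math


def tail_errors(sample, cdf, r, steps=200):
    """Return (AE, IE) over the r largest observations; steps must be even."""
    xs = sorted(sample)
    n = len(xs)
    start = n - r                         # 0-based index of X_(n-r+1)
    ae = 0.0
    for i in range(start, n):             # xs[i] is the order statistic X_(i+1)
        u = cdf(xs[i])
        ae = max(ae, abs(u - (i + 1) / n), abs(u - i / n))
    integral = 0.0
    for i in range(start, n - 1):         # the EDF equals (i+1)/n on [xs[i], xs[i+1])
        a, b = xs[i], xs[i + 1]
        level = (i + 1) / n
        h = (b - a) / steps
        acc = 0.0
        for k in range(steps + 1):        # Simpson weights 1, 4, 2, ..., 4, 1
            weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
            acc += weight * (level - cdf(a + k * h)) ** 2
        integral += acc * h / 3.0
    return ae, math.sqrt(integral)


def uniform_cdf(x):
    """Toy fitted cdf: uniform on [0, 4]."""
    return x / 4.0


ae, ie = tail_errors([1.0, 2.0, 3.0, 4.0], uniform_cdf, r=4)
print("AE:", ae, "IE:", ie)
```

Calling the function with $r = n$ covers the whole sample, while smaller $r$ restricts both errors to the upper tail, which is how the columns of Table 4 differ.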

Table 4: Errors of goodness-of-fit tests, $n = 49$, for the hurricane example (the smallest values are marked with $*$).

| Method | AE, $r = 49$ | AE, $r = 18$ | AE, $r = 10$ | IE, $r = 49$ | IE, $r = 18$ | IE, $r = 10$ |
|---|---|---|---|---|---|---|
| Pareto (Hill) | 0.1340 | 0.0584 | 0.0584 | 0.4844 | 0.3818 | 0.3723 |
| TPD (Aban) | 0.0948* | 0.0839 | 0.0832 | 0.3114 | 0.2565 | 0.2161 |
| TPD (Moment) | 0.1053 | 0.0738 | 0.0737 | 0.2985* | 0.2171 | 0.1825 |
| TPD (Weighted) | 0.1250 | 0.0561* | 0.0561* | 0.3098 | 0.1678* | 0.1453* |

Table 5: The estimation of index, mean, median, and VaR for forest fire loss data.

| Method | $\hat{\alpha}$ | $\mu$ | Median | 5% value at risk |
|---|---|---|---|---|
| Pareto (Hill) | 4.3044 | 1.096 billion | 1.09 billion | 1.855 billion |
| TPD (Aban) | 0.7964 | 1.167 billion | 1.15 billion | 1.446 billion |
| TPD (Moment) | 0.8260 | 1.166 billion | 1.15 billion | 1.446 billion |
| TPD (Weighted) | 0.8642 | 1.165 billion | 1.15 billion | 1.446 billion |

The weighted estimator has the smallest AE and IE values for the $r = 18$ and $r = 10$ largest losses, and its IE value is almost equal to the smallest IE value for $r = 49$, that is, for all losses. We statistically conclude that the weighted estimated distribution is the best fit in the tail of the hurricane loss data.

6.2. Forest Fire Loss Data Example. Next, we look at the forest fire loss example in Section 1.1.2. The data in Table 1 contain a relatively substantial number of large forest fire losses, which convinces us to use a truncated Pareto model and compare the four estimators in (26), (27), (28), and (29). We use the 25 largest losses in this study.

Figure 6 is a log-log plot which shows the upper tail for the forest fire loss data. The circles represent the real data and the straight line represents the estimated original Pareto distribution. The dashed line, dotted line, and thick solid line represent the estimated truncated Pareto distribution by using Aban's, the Moment, and the weighted estimators, respectively. We can see that the estimated truncated Pareto distributions fit the data very well using all three estimation methods and are much better than the original Pareto distribution. Around the tail, the weighted and the Moment estimation methods perform the best. The fact that the tail of the data curves downward in Figure 6 is evidence in support of using a truncated Pareto model.

The results of these estimators are listed in Table 5 by using the forest fire loss data, where $\gamma = X_{(1)}$, $\nu = X_{(n)}$, and $n = 25$.

Note that the weighted estimate $\hat{\alpha}_w$ is the largest among the three truncated Pareto estimates. The 5% value at risk of the original Pareto estimate is the largest. The 5% VaR may be overestimated compared to the truncated Pareto models.

Similarly, as in the hurricane example, we compute the AE in (30) and the IE in (36) again for fitting the 25 losses of the forest fire loss data.

Figure 7 gives the absolute errors (AE) in (30) for the 10 largest losses and confirms that the original Pareto estimate has relatively larger errors and the weighted estimate has relatively smaller errors in the tail. Figure 7 explains the data fitting of the tail of the distributions in Figure 6.

Figure 6: Log-log plot for the forest fire example, $\ln P(X > x)$ against $\ln(x)$, showing the data and the Pareto, Aban, Moment, and Weighted fits.

We also compute the AE in (30) and the IE in (36) to confirm those tail errors by using $r$ as the number of the largest losses, $1 \le r \le n$. The AE and IE values are given in Table 6 (the smallest values are marked with $*$).

The weighted estimator has the smallest AE and IE values for $r = 10$ and $r = 5$, and its IE value for $r = 25$ is almost equal to the smallest IE value. We statistically conclude that the weighted estimated distribution is the best fit to the tail of the forest fire loss data.

Table 6: Errors of goodness-of-fit tests, $n = 25$, for forest fire loss data (the smallest values are marked with $*$).

| Method | AE, $r = 25$ | AE, $r = 10$ | AE, $r = 5$ | IE, $r = 25$ | IE, $r = 10$ | IE, $r = 5$ |
|---|---|---|---|---|---|---|
| Pareto (Hill) | 0.2656 | 0.1288 | 0.1288 | 2.7125 | 1.1493 | 1.0985 |
| TPD (Aban) | 0.1503* | 0.1195 | 0.1195 | 1.3687* | 0.7911 | 0.5430 |
| TPD (Moment) | 0.1520 | 0.0982 | 0.0982 | 1.3736 | 0.7764 | 0.5430 |
| TPD (Weighted) | 0.1542 | 0.0964* | 0.0964* | 1.3796 | 0.7577* | 0.5203* |

Figure 7: Absolute errors $|F^*(x) - S_n(x)|$ for the forest fire loss example on the 10 largest losses (curves: Aban, Moment, Pareto, and Weighted).

7. Conclusions

In the complicated real world, it is difficult to construct a model combining all the desired features. In general, the final model selection depends on the best fitting model. The criteria are based on goodness-of-fit tests, existence of the moments, characteristic largest values, and log-log plots. The hurricane loss data and forest fire loss data are well fitted by the truncated Pareto distribution. In summary,

(a) we recommend the truncated Pareto model as an appropriate loss distribution for analyzing huge risk loss data sets. The upper and lower limits can be set by the largest and smallest losses or other reasonable values;

(b) the estimated loss distribution provides a prediction of the next disaster's 5% value at risk. The largest loss in the data set plays an important role in the prediction. In Figures 4 and 6, the trend of the tail distribution is crucial for an insurance company setting policy, and also for inhabitants and governments making plans to minimize damage from natural disasters;

(c) the semiparametric methods (Moment and weighted) are robust, easy to use, and more stable, and they fit the data better than the MLE methods. In both of the foregoing two examples, the estimated shape parameter is less than 1. In that case, the Moment and the weighted methods cannot be used with the original Pareto model, whose mean is infinite. This is another advantage of using the truncated Pareto model;

(d) the statistical inference in these examples shows that the distribution curve estimated by the proposed weighted estimation method fits the tails of the data better than those of the Moment and MLE estimators. This is due to the fact that the weighted estimator places less weight on the extreme values and has a good MSE. Based on these studies, we suggest that further studies on the usage of weights may be useful.

Appendix

Mathematical Derivations

Lemma A.1. For a truncated Pareto random variable $X$ with pdf given in (3), let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ be the order statistics. For $i, j \in \{1, 2, \ldots, n\}$, $i < j$, one has

$$
E\left[X_{(i)}\right] = \gamma \sum_{k=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \rho^k\, \frac{\Gamma(n + 1)\,\Gamma(k + i)}{\Gamma(i)\,\Gamma(k + n + 1)}, \tag{A.1}
$$

$$
E\left[X_{(i)}^2\right] = \gamma^2 \sum_{k=0}^{\infty} \binom{\frac{2}{\alpha} + k - 1}{k} \rho^k\, \frac{\Gamma(n + 1)\,\Gamma(k + i)}{\Gamma(i)\,\Gamma(k + n + 1)}, \tag{A.2}
$$

$$
E\left[X_{(i)} X_{(j)}\right] = \gamma^2 \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \binom{\frac{1}{\alpha} + l - 1}{l} \rho^{k+l} \left[ \frac{\Gamma(n + 1)\,\Gamma(k + i)\,\Gamma(k + l + j)}{\Gamma(i)\,\Gamma(k + j)\,\Gamma(k + l + n + 1)} \right], \tag{A.3}
$$

where $\rho = (g^{\alpha} - 1)/g^{\alpha}$, $g = \nu/\gamma$, $0 < \rho < 1$, $g > 1$, $0 < \gamma \le x \le \nu < \infty$, and $\alpha > 0$.

Proof. Let the cdf of a truncated Pareto distribution in (4) be

$$
F(x) = u, \qquad x = \gamma (1 - \rho u)^{-1/\alpha}, \qquad \rho = \frac{g^{\alpha} - 1}{g^{\alpha}}, \qquad g = \frac{\nu}{\gamma}. \tag{A.4}
$$

By the theory of order statistics, we have

$$
f_i(x) = \frac{\Gamma(n + 1)}{\Gamma(i)\,\Gamma(n - i + 1)} \left[F(x)\right]^{i-1} \left[1 - F(x)\right]^{n-i} f(x), \tag{A.5}
$$

$$
f_{ij}(x, y) = \frac{\Gamma(n + 1)}{\Gamma(i)\,\Gamma(j - i)\,\Gamma(n - j + 1)} \left[F(x)\right]^{i-1} \left[F(y) - F(x)\right]^{j-i-1} \left[1 - F(y)\right]^{n-j} f(x)\, f(y). \tag{A.6}
$$

Using the binomial formula $1/(1 - x)^r = \sum_{k=0}^{\infty} \binom{r + k - 1}{k} x^k$, for $m = 1, 2$, we have

$$
\begin{aligned}
E\left[X_{(i)}^m\right] &= \int_{\gamma}^{\nu} x^m f_i(x)\,dx \\
&= \int_0^1 \gamma^m (1 - \rho u)^{-m/\alpha}\, \frac{\Gamma(n + 1)}{\Gamma(i)\,\Gamma(n - i + 1)}\, u^{i-1} (1 - u)^{n-i}\,du \\
&= \frac{\Gamma(n + 1)}{\Gamma(i)\,\Gamma(n - i + 1)}\, \gamma^m \sum_{k=0}^{\infty} \binom{\frac{m}{\alpha} + k - 1}{k} \int_0^1 \rho^k u^{k+i-1} (1 - u)^{n-i}\,du \\
&= \gamma^m \sum_{k=0}^{\infty} \binom{\frac{m}{\alpha} + k - 1}{k} \rho^k\, \frac{\Gamma(n + 1)\,\Gamma(k + i)}{\Gamma(i)\,\Gamma(k + n + 1)}.
\end{aligned} \tag{A.7}
$$

By substituting $m = 1$ and $m = 2$, respectively, we have (A.1) and (A.2). Moreover, writing $u = F(x)$ and $h = F(y)$,

$$
\begin{aligned}
E\left[X_{(i)} X_{(j)}\right] &= \int_{\gamma}^{\nu} \int_{\gamma}^{y} x\,y\, f_{ij}(x, y)\,dx\,dy \\
&= \frac{\Gamma(n + 1)}{\Gamma(i)\,\Gamma(j - i)\,\Gamma(n - j + 1)} \int_0^1 \int_0^h \gamma (1 - \rho u)^{-1/\alpha}\, \gamma (1 - \rho h)^{-1/\alpha}\, u^{i-1} (h - u)^{j-i-1} (1 - h)^{n-j}\,du\,dh \\
&= \frac{\Gamma(n + 1)}{\Gamma(i)\,\Gamma(j - i)\,\Gamma(n - j + 1)}\, \gamma^2 \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \binom{\frac{1}{\alpha} + l - 1}{l} \rho^{k+l} \\
&\qquad \times \int_0^1 \int_0^1 \left(\frac{u}{h}\right)^{k+i-1} \left(1 - \frac{u}{h}\right)^{j-i-1} h^{k+l+j-1} (1 - h)^{n-j}\, d\!\left(\frac{u}{h}\right) dh \\
&= \gamma^2 \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \binom{\frac{1}{\alpha} + l - 1}{l} \rho^{k+l}\, \frac{\Gamma(n + 1)\,\Gamma(k + i)\,\Gamma(k + l + j)}{\Gamma(i)\,\Gamma(k + j)\,\Gamma(k + l + n + 1)}.
\end{aligned} \tag{A.8}
$$
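As a numerical sanity check on the first series of Lemma A1, one can use the identity $\sum_{i=1}^{n} E[X_{(i)}] = n\,E[X]$, where $E[X]$ is the closed-form mean of the truncated Pareto distribution. The sketch below evaluates partial sums of the series with log-gamma functions to avoid overflow; the generalized binomial coefficient is computed as $\Gamma(r + k)/(\Gamma(r)\,k!)$ with $r = 1/\alpha$. The function names and the truncation at 5000 terms are illustrative choices.

```python
import math


def expected_order_stat(i, n, alpha, gamma, nu, terms=5000):
    """Partial sum of the series for E[X_(i)], using log-gamma for stability."""
    rho = 1.0 - (gamma / nu) ** alpha
    r = 1.0 / alpha
    const = math.lgamma(n + 1) - math.lgamma(i) - math.lgamma(r)
    total = 0.0
    for k in range(terms):
        log_term = (math.lgamma(r + k) - math.lgamma(k + 1)   # Gamma(r + k) / k!
                    + k * math.log(rho)
                    + math.lgamma(k + i) - math.lgamma(k + n + 1)
                    + const)
        total += math.exp(log_term)
    return gamma * total


def truncated_pareto_mean(alpha, gamma, nu):
    """Closed-form mean of the truncated Pareto on [gamma, nu]; requires alpha != 1."""
    norm = 1.0 - (gamma / nu) ** alpha
    return (alpha * gamma ** alpha
            * (nu ** (1 - alpha) - gamma ** (1 - alpha))
            / ((1 - alpha) * norm))


n, alpha, gamma, nu = 10, 1.8, 1.0, 10.0
series_sum = sum(expected_order_stat(i, n, alpha, gamma, nu) for i in range(1, n + 1))
print("sum of E[X_(i)] from the series:", series_sum)
print("n times the closed-form mean   :", n * truncated_pareto_mean(alpha, gamma, nu))
```

The series converges geometrically at rate $\rho$, so the number of terms needed grows as $\nu/\gamma$ increases and $\rho$ approaches 1.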

Proof of Theorem 3. When $\alpha > 0$, we use

$$
\begin{aligned}
E\left[\mu_w\right] &= \sum_{i=2}^{n-1} w\,E\left[X_{(i)}\right] + \frac{1}{2}\left(1 - (n - 2)w\right)\left(E\left[X_{(1)}\right] + E\left[X_{(n)}\right]\right), \\
\mathrm{MSE}\left(\mu_w\right) &= E\left[\left(\mu_w - \mu\right)^2\right] = E\left[\mu_w^2\right] - 2\mu\,E\left[\mu_w\right] + \mu^2.
\end{aligned} \tag{A.9}
$$

And we have

$$
\begin{aligned}
E\left[\mu_w^2\right] &= \sum_{i=2}^{n-1} w^2 E\left[X_{(i)}^2\right] + 2 \sum_{i=2}^{n-2} \sum_{j=i+1}^{n-1} w^2 E\left[X_{(i)} X_{(j)}\right] \\
&\quad + \frac{1}{4}\left(1 - (n - 2)w\right)^2 \left(E\left[X_{(1)}^2\right] + E\left[X_{(n)}^2\right] + 2E\left[X_{(1)} X_{(n)}\right]\right) \\
&\quad + \sum_{i=2}^{n-1} w\left(1 - (n - 2)w\right)\left(E\left[X_{(1)} X_{(i)}\right] + E\left[X_{(i)} X_{(n)}\right]\right).
\end{aligned} \tag{A.10}
$$

Let

$$
\begin{aligned}
A &= \sum_{i=2}^{n-1} E\left[X_{(i)}^2\right] + 2 \sum_{i=2}^{n-2} \sum_{j=i+1}^{n-1} E\left[X_{(i)} X_{(j)}\right], \\
B &= E\left[X_{(1)}^2\right] + E\left[X_{(n)}^2\right] + 2E\left[X_{(1)} X_{(n)}\right], \\
C &= \sum_{i=2}^{n-1} \left(E\left[X_{(1)} X_{(i)}\right] + E\left[X_{(i)} X_{(n)}\right]\right), \\
D &= \sum_{i=2}^{n-1} E\left[X_{(i)}\right], \\
E &= E\left[X_{(1)}\right] + E\left[X_{(n)}\right].
\end{aligned} \tag{A.11}
$$

Then

$$
\begin{aligned}
\mathrm{MSE}\left(\mu_w\right) &= A w^2 + \frac{1}{4}\left(1 - (n - 2)w\right)^2 B + w\left(1 - (n - 2)w\right) C \\
&\quad - 2\mu\left(D w + \frac{1}{2}\left(1 - (n - 2)w\right) E\right) + \mu^2 \\
&= \left(A + \frac{1}{4}(n - 2)^2 B - (n - 2) C\right) w^2 \\
&\quad + \left(-\frac{1}{2}(n - 2) B + C - 2\mu D + \mu (n - 2) E\right) w \\
&\quad + \frac{1}{4} B - \mu E + \mu^2.
\end{aligned} \tag{A.12}
$$

Now using Lemma A.1, we have Theorem 3.

Lemma A.2. $\mathrm{MSE}(\mu_w)$ in (20) is a convex function with $a > 0$ when $n > 4C/B + 2$; $C$ and $B$ are given in (20).

Proof. Let $f(w) = \mathrm{MSE}(\mu_w) = a w^2 + b w + c$. We have

$$
a = A + \frac{1}{4}(n - 2)^2 B - (n - 2) C, \tag{A.13}
$$

where

$$
\begin{aligned}
A &= E\left[\left(\sum_{i=2}^{n-1} X_{(i)}\right)^2\right] \ge 0, \\
B &= E\left[\left(X_{(1)} + X_{(n)}\right)^2\right] \ge 0, \\
C &= E\left[\sum_{i=2}^{n-1} X_{(i)} \left(X_{(1)} + X_{(n)}\right)\right] \ge 0.
\end{aligned} \tag{A.14}
$$

Writing $a = A + (n - 2)\left(\frac{1}{4}(n - 2) B - C\right)$ and noting that $n > 2$, we have

$$
\frac{1}{4}(n - 2) B - C > 0, \quad \text{such that } a > 0, \quad \text{when } n > \frac{4C}{B} + 2. \tag{A.15}
$$

Proof of Corollary 5. By Theorem 3 and Lemma A.2, let

$$
f(w) = \mathrm{MSE}\left(\mu_w\right) = a w^2 + b w + c, \tag{A.16}
$$

which is a quadratic function with first and second derivatives with respect to $w$

$$
f'(w) = 2aw + b, \qquad f''(w) = 2a > 0. \tag{A.17}
$$

Hence $f(w)$ is a convex function with minimum value

$$
f_{\min} = a\left(-\frac{b}{2a}\right)^2 + b\left(-\frac{b}{2a}\right) + c = -\frac{b^2}{4a} + c. \tag{A.18}
$$

The maximum value of $\mathrm{EFF}(\mu_w)$ is

$$
\mathrm{EFF}_{\max}\left(\mu_w\right) = \frac{\alpha \gamma^{\alpha} \left(\nu^{2-\alpha} - \gamma^{2-\alpha}\right)}{(2 - \alpha)\left(1 - (\gamma/\nu)^{\alpha}\right) n \left(-b^2/(4a) + c\right)} - \frac{\alpha^2 \gamma^{2\alpha} \left(\nu^{1-\alpha} - \gamma^{1-\alpha}\right)^2}{(1 - \alpha)^2 \left(1 - (\gamma/\nu)^{\alpha}\right)^2 n \left(-b^2/(4a) + c\right)} \quad \text{at } w = -\frac{b}{2a}. \tag{A.19}
$$
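The algebra leading from the term-by-term MSE to the quadratic $a w^2 + b w + c$, and then to the optimal weight $-b/(2a)$, is easy to verify numerically. In the sketch below the moment values $A$, $B$, $C$, $D$, $E$ and $\mu$ are made up (chosen only so that $a > 0$); in practice they would come from Lemma A1. The function names are illustrative.

```python
def mse_direct(w, A, B, C, D, E, mu, n):
    """MSE(w) written term by term, before collecting powers of w."""
    end = 1.0 - (n - 2) * w
    return (A * w ** 2 + 0.25 * end ** 2 * B + w * end * C
            - 2.0 * mu * (D * w + 0.5 * end * E) + mu ** 2)


def mse_coefficients(A, B, C, D, E, mu, n):
    """Coefficients (a, b, c) of MSE(w) = a w^2 + b w + c after collecting terms."""
    a = A + 0.25 * (n - 2) ** 2 * B - (n - 2) * C
    b = -0.5 * (n - 2) * B + C - 2.0 * mu * D + mu * (n - 2) * E
    c = 0.25 * B - mu * E + mu ** 2
    return a, b, c


def l2_optimal_weight(a, b, c):
    """Minimiser -b / (2a) and minimum value c - b^2 / (4a); requires a > 0."""
    if a <= 0:
        raise ValueError("MSE is not convex in w (a <= 0)")
    return -b / (2.0 * a), c - b * b / (4.0 * a)


# Made-up moment values for n = 10, for illustration only.
n, mu = 10, 2.02
A, B, C, D, E = 262.0, 22.0, 66.0, 16.0, 4.2
a, b, c = mse_coefficients(A, B, C, D, E, mu, n)
w_opt, mse_min = l2_optimal_weight(a, b, c)
print("a, b, c:", a, b, c)
print("w_opt:", w_opt, " MSE_min:", mse_min)
```

Dividing $\sigma^2/n$ by the returned minimum gives the maximum efficiency, and the convexity condition $a > 0$ is checked explicitly rather than assumed.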

Acknowledgments

The authors thank the referees and the editor for their comments, which helped to improve the paper. This research is supported by the Natural Sciences and Engineering Research Council of Canada.

References

[1] R. A. Pielke Jr., J. Gratz, C. W. Landsea, D. Collins, M. A. Saunders, and R. Musulin, "Normalized hurricane damage in the United States: 1900–2005," Natural Hazards Review, vol. 9, no. 1, pp. 29–42, 2008.

[2] P. Embrechts, C. Klüppelberg, and T. Mikosch, Modelling Extremal Events for Insurance and Finance, Springer, New York, NY, USA, 2003.

[3] C. Kleiber and S. Kotz, Statistical Size Distribution in Economics and Actuarial Sciences, John Wiley & Sons, New York, NY, USA, 2003.

[4] J. Beirlant, Y. Goegebeur, I. Sergers, and J. Teugels, Statistics of Extremes: Theory and Application, John Wiley & Sons, New York, NY, USA, 2005.

[5] M. A. Beg, "Estimation of the tail probability of the truncated Pareto distribution," Journal of Information & Optimization Sciences, vol. 2, no. 2, pp. 192–198, 1981.

[6] I. B. Aban, M. M. Meerschaert, and A. K. Panorska, "Parameter estimation for the truncated Pareto distribution," Journal of the American Statistical Association, vol. 101, no. 473, pp. 270–277, 2006.

[7] B. Efron, "Bootstrap methods: another look at the jackknife," The Annals of Statistics, vol. 7, no. 1, pp. 1–26, 1979.

[8] P. Barbe and P. Bertail, The Weighted Bootstrap, Springer, New York, NY, USA, 1995.

[9] G. R. Shorack and J. A. Wellner, Empirical Processes with Applications to Statistics, John Wiley & Sons, New York, NY, USA, 1986.

[10] H. L. Koul, Weighted Empirical and Linear Models, vol. 21 of Lecture Notes-Monograph Series, Institute of Mathematical Statistics, Hayward, Calif, USA, 1992.

[11] M. L. Huang and P. H. Brill, "A distribution estimation method based on level crossings," Journal of Statistical Planning and Inference, vol. 124, no. 1, pp. 45–62, 2004.

[12] M. L. Huang, "The efficiencies of a weighted distribution function estimator," in The Proceeding of American Statistical Association, Nonparametric Statistics Section, pp. 1502–1506, 2003.

[13] B. M. Hill, "A simple general approach to inference about the tail of a distribution," The Annals of Statistics, vol. 3, no. 5, pp. 1163–1174, 1975.

[14] P. Cooke, "Statistical inference for bounds of random variables," Biometrika, vol. 66, no. 2, pp. 367–374, 1979.

[15] A. N. Kolmogorov, "Sulla determinazione empirica di una legge di distribuzione," Giornale dell'Istituto Italiano degli Attuari, vol. 4, pp. 83–91, 1933.

[16] T. W. Anderson and D. A. Darling, "Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes," The Annals of Mathematical Statistics, vol. 23, pp. 193–212, 1952.


Page 5: Research Article A Weighted Estimation for Risk Modeldownloads.hindawi.com › archive › 2013 › 829131.pdf · 2019-07-31 · Research Article A Weighted Estimation for Risk Model

ISRN Probability and Statistics 5

+ 21205742

119899minus2

sum

119894=2

119899minus1

sum

119895=119894+1

infin

sum

119896=0

infin

sum

119897=0

(

1

120572+ 119896 minus 1

119896

)(

1

120572+ 119897 minus 1

119897

)

times 120588119896+119897

times[Γ (119899 + 1) Γ (119896 + 119894) Γ (119896 + 119897 + 119895)

Γ (119894) Γ (119896 + 119895) Γ (119896 + 119897 + 119899 + 1)]

119861 = 120574

infin

sum

119896=0

(

2

120572+ 119896 minus 1

119896

)

times120588119896[Γ (119899 + 1) Γ (119896 + 1)

Γ (119896 + 119899 + 1)+

119899

119896 + 119899]

+ 21205742

infin

sum

119896=0

infin

sum

119897=0

(

1

120572+ 119896 minus 1

119896

)(

1

120572+ 119897 minus 1

119897

)

times120588119896+119897[Γ (119899 + 1) Γ (119896 + 1)

Γ (119896 + 119899) (119896 + 119897 + 119899)]

119862 = 1205742

119899minus1

sum

119894=2

infin

sum

119896=0

infin

sum

119897=0

(

1

120572+ 119896 minus 1

119896

)(

1

120572+ 119897 minus 1

119897

)

times 120588119896+119897[Γ (119899 + 1) Γ (119896 + 1) Γ (119896 + 119897 + 119894)

Γ (119896 + 119894) Γ (119896 + 119897 + 119899 + 1)

+Γ (119899 + 1) Γ (119896 + 119894)

Γ (119894) Γ (119896 + 119899) (119896 + 119897 + 119899)]

119863 = 120574

119899minus1

sum

119894=2

infin

sum

119896=0

(

1

120572+ 119896 minus 1

119896

)

times120588119896 Γ (119899 + 1) Γ (119896 + 119894)

Γ (119894) Γ (119896 + 119899 + 1)

119864 = 120574

infin

sum

119896=0

(

1

120572+ 119896 minus 1

119896

)

times 120588119896[Γ (119899 + 1) Γ (119896 + 1)

Γ (119896 + 119899 + 1)

+119899

119896 + 119899]

(20)

where 120588 = (119892120572minus 1)119892

120572 119892 = ]120574 0 lt 120588 lt 1 0 lt 119892 lt 10 lt 120574 le 119909 le ] lt infin

Γ (119911) = int

infin

0

119905119911minus1119890minus119905119889119905 119911 gt 0 119894119904 119905ℎ119890 119866119886119898119898119886 119891119906119899119888119905119894119900119899

(21)

Corollary 4 The efficiency function (EFF) of 120583119908

in (17)relative to the sample mean 119883 in (15) is given by119864119865119865 (120583

119908)

=

Var (119883)119872119878119864 (120583

119908)=

1205902119899

119872119878119864 (120583119908)

=

120572120574120572(]2minus120572 minus 1205742minus120572) (2 minus 120572) (1 minus (120574])

120572

)

119899119872119878119864 (120583119908)

minus

12057221205742120572(]1minus120572 minus 1205741minus120572)

2

(1 minus 120572)2(1 minus (120574])

120572

)2

119899119872119878119864 (120583119908)

(22)

where 1205902 and119872119878119864(120583119908) are given in (7) and (20)

4. Optimal Weights

4.1. $L_1$-Optimal Weights. Huang and Brill [11] proposed an $L_1$-optimal weight, which is based on the Manhattan metric, for the $F_n^{*}(x)$ in (9). It is

\[
w_{L_1\text{-opt}} = \frac{1}{\sqrt{n(n-1)}} > \frac{1}{n} \quad \text{for the middle data.}
\tag{23}
\]

Huang and Brill [11] proved that the exact efficiency (EFF) of $F_n^{*}(x)$ in (9) relative to the EDF $S_n(x)$ exceeds 1 on the tails of the distribution. In this paper, we use this $w_{L_1\text{-opt}}$ weight in order to improve the efficiency of estimating the tail probability of the truncated Pareto distribution. Huang [12] indicates that, in general, for any distribution, if $w > 1/n$, which means putting more weight on the middle data, the efficiency of estimating the tail probability will be improved.
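As a quick illustration of (23), the sketch below computes the $L_1$-optimal weight for a given sample size and confirms that it exceeds $1/n$. The function name is hypothetical.

```python
import math


def l1_optimal_weight(n):
    """L1-optimal weight of (23): 1 / sqrt(n (n - 1)), defined for n >= 2."""
    if n < 2:
        raise ValueError("need at least two observations")
    return 1.0 / math.sqrt(n * (n - 1))


for n in (10, 20, 30, 50, 100):
    w = l1_optimal_weight(n)
    # The weight on the middle data is always slightly above the equal weight 1/n.
    print(n, round(w, 5), w > 1.0 / n)
```

The gap between the weight and $1/n$ shrinks as $n$ grows, so the weighted estimator approaches the equally weighted one in large samples.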

Next, we explore an alternative $L_2$-optimal weight.

4.2. $L_2$-Optimal Weights

Corollary 5. An $L_2$-optimal weight $w_{L_2\text{-opt}}$ for the efficiency function of the $\mu_w$ given in (17) for estimating the population mean in (3), relative to the sample mean $\bar{X}$ in (16), when $n > 4C/B + 2$, is given by

\[
w_{L_2\text{-opt}} = -\frac{b}{2a}.
\tag{24}
\]

The minimum $\mathrm{MSE}_{\min}(\mu_w)$ and $\mathrm{EFF}_{\max}(\mu_w)$ are given by

\[
\begin{aligned}
\mathrm{MSE}_{\min}(\mu_w) &= -\frac{b^{2}}{4a} + c,\\
\mathrm{EFF}_{\max}(\mu_w) &= \frac{\alpha\gamma^{\alpha}\left(\nu^{2-\alpha} - \gamma^{2-\alpha}\right)}{(2-\alpha)\left(1 - (\gamma/\nu)^{\alpha}\right) n\left(-b^{2}/(4a) + c\right)} - \frac{\alpha^{2}\gamma^{2\alpha}\left(\nu^{1-\alpha} - \gamma^{1-\alpha}\right)^{2}}{(1-\alpha)^{2}\left(1 - (\gamma/\nu)^{\alpha}\right)^{2} n\left(-b^{2}/(4a) + c\right)}.
\end{aligned}
\tag{25}
\]

In (24) and (25), $a$, $b$, $c$, $C$, and $B$ are defined in (20).
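Once the coefficients $a$, $b$, $c$ of the quadratic $\mathrm{MSE}(\mu_w) = aw^{2} + bw + c$ are available, (24) and the first line of (25) are a vertex computation. A minimal sketch follows; the helper name and the numeric coefficients in the usage line are purely illustrative and are not values from the paper.

```python
def l2_optimal_weight(a, b, c):
    """Vertex of MSE(w) = a w^2 + b w + c: returns (w_opt, mse_min) as in (24)-(25)."""
    if a <= 0:
        raise ValueError("the quadratic is convex only when a > 0")
    w_opt = -b / (2.0 * a)
    mse_min = c - b * b / (4.0 * a)
    return w_opt, mse_min


# Illustrative coefficients only (not from the paper).
w_opt, mse_min = l2_optimal_weight(2.0, -4.0, 5.0)
```

The guard on `a` mirrors the condition $n > 4C/B + 2$ in Corollary 5, under which the quadratic is convex.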


Table 2: Exact efficiencies of $\mu_w$ relative to other estimators by using $L_2$-optimal weights, $\gamma = 1$, $\nu = 10$. The last column is the EFF of $\mu_w$ relative to $\bar{X}$.

Sample size   α     w_L2-opt   w = 1/n   w_L1-opt   MSE(μ_w)   MSE(X̄)   EFF
n = 10        0.8   0.10396    0.1       0.10541    0.4007     0.4034    1.0068
              1.8   0.11446    0.1       0.10541    0.1406     0.1647    1.1717
              3     0.11847    0.1       0.10541    0.0365     0.0493    1.3501
n = 20        0.8   0.05015    0.05      0.05130    0.2017     0.2017    1.0001
              1.8   0.05255    0.05      0.05130    0.0766     0.0824    1.0759
              3     0.05463    0.05      0.05130    0.0217     0.0247    1.1367
n = 30        0.8   0.03323    0.03333   0.03390    0.1344     0.1345    1.0003
              1.8   0.03425    0.03333   0.03390    0.0528     0.0549    1.0399
              3     0.03529    0.03333   0.03390    0.0155     0.0164    1.0628
n = 50        0.8   0.02021    0.02      0.02020    0.0968     0.1063    1.0978
              1.8   0.02159    0.02      0.02020    0.0514     0.0592    1.1536
              3     0.02298    0.02      0.02020    0.0274     0.0352    1.2849
n = 100       0.8   0.01001    0.01      0.01005    0.0765     0.0830    1.0845
              1.8   0.01002    0.01      0.01005    0.0542     0.0576    1.0637
              3     0.01002    0.01      0.01005    0.0380     0.0400    1.0526

The proof of Corollary 5 is in the appendix. Table 2 lists the values of $w_{L_2\text{-opt}}$, $w_{L_1\text{-opt}}$, $\mathrm{MSE}_{\min}(\mu_w)$, $\mathrm{MSE}(\bar{X})$, and the exact $\mathrm{EFF}_{\max}(\mu_w)$ of $\mu_w$ relative to $\bar{X}$ for $n = 10$, 20, 30, 50, and 100 and $\alpha = 0.8$, 1.8, and 3, by using (24) and (25). We note that all values of $w_{L_2\text{-opt}}$ are greater than $1/n$, and all exact $\mathrm{EFF}_{\max}(\mu_w)$ relative to $\bar{X}$ are greater than 1.

Remark 6. The $w_{L_1\text{-opt}}$ in (23) is totally nonparametric; it is more robust and easy to use. Note that $w_{L_2\text{-opt}}$ in (24) depends on the parameter $\alpha$. In practice, we may estimate $\alpha$ first and then obtain a $w_{L_2\text{-opt}}$ while still keeping the optimal advantage. Of course, we use the given $\alpha$ values in the simulations. However, $w_{L_1\text{-opt}}$ and $w_{L_2\text{-opt}}$ are close to each other when $n > 100$.

5. Simulations

Next, we use simulations to compare the performance of the three parametric MLE estimators, that is, Hill's, Beg's, and Aban's estimators in (12), (13), and (14), and the two semiparametric estimators, that is, the Moment and the weighted estimators in (16) and (18), for estimating the shape parameter $\alpha$ in (3). We generate $m = 1000$ random samples of size $n = 100$ from the distribution of (3). Only $w_{L_1\text{-opt}}$ is used for the weighted estimator, since Table 2 indicates that the values of $w_{L_2\text{-opt}}$ and $w_{L_1\text{-opt}}$ are close to each other when $n = 100$.

We know that if $\alpha \le 2$, the original Pareto distribution has an infinite variance; if $\alpha \le 1$, then the mean is infinite. These cases have inference difficulties. We focus on $\alpha = 0.5$, 0.8, and 1.8 and let $\gamma = 1$, $\nu = 10$. Figure 3 contains the box-plots of the comparison of these five estimators. Note that, in the cases $\alpha = 0.5$ and 0.8, Beg's and Hill's estimators have large biases, and the weighted, Aban's, and the Moment estimators performed very well, but sometimes Aban's MLE estimates have unstable solutions. When $\alpha = 1.8$, there are similar results, except that Hill's and Beg's estimators performed better. It is interesting to see that, in all three cases, the median of the weighted estimator for $\alpha$ is relatively larger than those of Aban's and the Moment estimators. We will discuss how these affect the tail estimation in the next section. The simulations were run by using MAPLE 15 with double precision.
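The sampling step of such a simulation can be sketched with the inverse-cdf relation $x = \gamma(1 - \rho u)^{-1/\alpha}$, $\rho = 1 - (\gamma/\nu)^{\alpha}$, which is the substitution used in the appendix. The paper's own runs used MAPLE 15; the Python code below is only an illustrative reimplementation of the sample generation under that assumption, with hypothetical names, and it does not reproduce the estimators in (12)-(18).

```python
import random


def sample_truncated_pareto(n, alpha, gamma, nu, rng):
    """Draw n values from the truncated Pareto on [gamma, nu] by inverting the cdf."""
    rho = 1.0 - (gamma / nu) ** alpha
    return [gamma * (1.0 - rho * rng.random()) ** (-1.0 / alpha) for _ in range(n)]


# m = 1000 samples of size n = 100, as in the simulation design described above.
rng = random.Random(2013)  # fixed seed so the sketch is repeatable
samples = [sample_truncated_pareto(100, 0.8, 1.0, 10.0, rng) for _ in range(1000)]
```

Each inner list can then be passed to whichever estimator of $\alpha$ is being compared.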

6. Examples

Now we use the proposed method, and compare it with existing methods, to analyze the data of the two examples outlined in Section 1.

6.1. Hurricane Loss Example

6.1.1. Comparison of Four Estimation Methods. First, we look at the hurricane loss example in Section 1.1.1. Based on the simulation results in Figure 3, we consider the three better estimators out of the five in the heavy-tailed case, that is, Aban's, the Moment, and the weighted estimators in (27), (28), and (29) for the truncated Pareto pdf $f(x)$ in (3) and the cdf $F(x)$ in (4). We also compare them with Hill's estimator in (26) for the original Pareto pdf $f_p(x)$ in (1) and the cdf $F_p(x)$ in (2). Here

\[
f_{\mathrm{Hill}}(x) = \hat{\alpha}_{\mathrm{Hill}}\,\gamma^{\hat{\alpha}_{\mathrm{Hill}}}\,x^{-\hat{\alpha}_{\mathrm{Hill}}-1}, \qquad F_{\mathrm{Hill}}(x) = 1 - \left(\frac{\gamma}{x}\right)^{\hat{\alpha}_{\mathrm{Hill}}},
\tag{26}
\]

where $\hat{\alpha}_{\mathrm{Hill}}$ is given in (12) with $r = n - 1$; consider

\[
f_{\mathrm{MLE}}(x) = \frac{\hat{\alpha}_{\mathrm{MLE}}\,\gamma^{\hat{\alpha}_{\mathrm{MLE}}}\,x^{-\hat{\alpha}_{\mathrm{MLE}}-1}}{1 - (\gamma/\nu)^{\hat{\alpha}_{\mathrm{MLE}}}}, \qquad F_{\mathrm{MLE}}(x) = 1 - \frac{\gamma^{\hat{\alpha}_{\mathrm{MLE}}}\left(x^{-\hat{\alpha}_{\mathrm{MLE}}} - \nu^{-\hat{\alpha}_{\mathrm{MLE}}}\right)}{1 - (\gamma/\nu)^{\hat{\alpha}_{\mathrm{MLE}}}},
\tag{27}
\]


Table 3: Estimation of index, mean, median, and VaR for hurricane loss data.

Method           Index α̂   μ               Median         5% value at risk
Pareto (Hill)    0.8126     ∞               8.68 billion   147.68 billion
TPD (Aban)       0.6206     21.74 billion   9.73 billion   85.15 billion
TPD (Moment)     0.6476     20.48 billion   9.47 billion   82.55 billion
TPD (Weighted)   0.6990     19.37 billion   9.02 billion   77.58 billion

Figure 3: Box-plot of the estimators (Weighted, Aban, Beg, Hill, and Moment) for the shape parameter $\alpha$ by using the $L_1$-optimal weight; panels for $\alpha = 0.5$, $\alpha = 0.8$, and $\alpha = 1.8$; $\gamma = 1$, $\nu = 10$, sample size $n = 100$, and generating times $m = 1000$.

where $\hat{\alpha}_{\mathrm{MLE}}$ is Aban's MLE given in (14); consider

\[
f_{M}(x) = \frac{\hat{\alpha}_{M}\,\gamma^{\hat{\alpha}_{M}}\,x^{-\hat{\alpha}_{M}-1}}{1 - (\gamma/\nu)^{\hat{\alpha}_{M}}}, \qquad F_{M}(x) = 1 - \frac{\gamma^{\hat{\alpha}_{M}}\left(x^{-\hat{\alpha}_{M}} - \nu^{-\hat{\alpha}_{M}}\right)}{1 - (\gamma/\nu)^{\hat{\alpha}_{M}}},
\tag{28}
\]

where $\hat{\alpha}_{M}$ is given in (16); consider

\[
f_{w}(x) = \frac{\hat{\alpha}_{w}\,\gamma^{\hat{\alpha}_{w}}\,x^{-\hat{\alpha}_{w}-1}}{1 - (\gamma/\nu)^{\hat{\alpha}_{w}}}, \qquad F_{w}(x) = 1 - \frac{\gamma^{\hat{\alpha}_{w}}\left(x^{-\hat{\alpha}_{w}} - \nu^{-\hat{\alpha}_{w}}\right)}{1 - (\gamma/\nu)^{\hat{\alpha}_{w}}},
\tag{29}
\]

where $\hat{\alpha}_{w}$ is given in (18), using the weight $w_{L_1\text{-opt}} = 1/\sqrt{n(n-1)}$.
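The fitted cdfs above lead directly to quantile-based summaries such as the median and the value at risk. The sketch below is a minimal illustration: it assumes that the 5% value at risk is read as the 0.95 quantile of the fitted loss distribution, and the values of `gamma` and `nu` in the usage lines are placeholders rather than the hurricane data limits. Function names are hypothetical.

```python
def truncated_pareto_cdf(x, alpha, gamma, nu):
    """cdf of the truncated Pareto on [gamma, nu], the common form of (27)-(29)."""
    if x <= gamma:
        return 0.0
    if x >= nu:
        return 1.0
    norm = 1.0 - (gamma / nu) ** alpha
    return 1.0 - gamma**alpha * (x**-alpha - nu**-alpha) / norm


def truncated_pareto_quantile(p, alpha, gamma, nu):
    """Inverse of the truncated Pareto cdf for 0 <= p <= 1."""
    rho = 1.0 - (gamma / nu) ** alpha
    return gamma * (1.0 - rho * p) ** (-1.0 / alpha)


def pareto_quantile(p, alpha, gamma):
    """Inverse of the original Pareto cdf in (26) for 0 <= p < 1."""
    return gamma * (1.0 - p) ** (-1.0 / alpha)


# Placeholder limits (assumed, not the data values); alpha is the weighted estimate.
gamma, nu, alpha_w = 1.0, 10.0, 0.6990
median_w = truncated_pareto_quantile(0.50, alpha_w, gamma, nu)
var5_w = truncated_pareto_quantile(0.95, alpha_w, gamma, nu)  # assumed VaR convention
```

Because the truncated quantile can never exceed `nu`, the truncated models bound the estimated risk by the upper limit, whereas `pareto_quantile` grows without bound as `p` approaches 1.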

The results of these four methods are listed in Table 3 by using the hurricane loss data, where $\gamma = X_{(\min)}$, $\nu = X_{(\max)}$, and $n = 49$.

Figure 4 is a log-log plot showing the upper tail for the hurricane loss data. In this plot, the circles represent the real data, and the straight line represents the estimated original Pareto distribution. The dashed line, dotted line, and thick solid line represent the estimated truncated Pareto distributions by using Aban's, the Moment, and the weighted estimators, respectively. Two conclusions are apparent from the plot.

(1) The original Pareto distribution (straight line) does not fit the data well in the tail. The truncated Pareto distribution fits the data very well using all three estimation methods (Aban, Moment, and weighted). Note that the 5% value at risk estimated by the original Pareto model is about 147 billion, while the three truncated model estimates are about 80 billion. It appears that the original Pareto model overestimates the risk. An insurance company would set a high premium if it used the Pareto model, with the result that many people could not afford to buy insurance.

(2) We examine the three truncated Pareto estimates. Around the tail, the weighted estimate fits the data the best; that is, the $\ln(P(X > x))$ curve turns downward more quickly, following the trend of the data pattern. This is because the weighted estimate $\hat{\alpha}_w$ is the largest among the three truncated model estimates (this is consistent with the simulation results in Figure 3), and the weighted mean estimate $\mu_w$ and the 5% value at risk are the smallest among the three methods (this is obtained by placing less weight on the extreme value, e.g., the loss of the 1926 Great Miami Hurricane), so its estimated distribution is less heavy in the tail compared with the other methods.

Next, in order to confirm these conclusions, we run three goodness-of-fit tests. Later, we define the absolute error and the integrated error as the measures of the distance from the empirical data points to the estimated Pareto curve and truncated Pareto curves.

6.1.2. Goodness-of-Fit Tests. Our objective is to test if the estimated distributions in (26)–(29) fit the data properly. We test the hypotheses $H_0\colon F(x) = F^{*}(x)$ against $H_1\colon F(x) \ne F^{*}(x)$, where $F(x)$ is the true unknown distribution function and $F^{*}(x)$ is the estimated Pareto cdf in (26) or the estimated truncated Pareto cdf in (27)–(29). In this paper, we use three EDF goodness-of-fit tests.

Figure 4: Log-log plot of the hurricane loss data; horizontal axis $\ln(x)$, vertical axis $\ln(P(X > x))$; series: Data, Pareto, Aban, Moment, and Weighted.

(1) The Kolmogorov-Smirnov (K-S) test [15]: the test statistic is given by

\[
T = \sup_{x}\left|F^{*}(x) - S_n(x)\right|, \quad -\infty < x < \infty,
\tag{30}
\]

where $S_n(x)$ is the EDF in (8). Under $H_0$, the two-tailed $P$-value for the K-S test is as follows:

\[
P\text{-value} = 1 - \left[1 - T\sum_{j=0}^{[n(1-T)]}\frac{n!}{j!\,(n-j)!}\left(1 - T - \frac{j}{n}\right)^{n-j}\left(T + \frac{j}{n}\right)^{j-1}\right]^{2},
\tag{31}
\]

where $[n(1-T)]$ is the integer part of $n(1-T)$.

(2) The Anderson-Darling (A-D) test: Anderson and Darling [16] introduced a measure of "distance" between the empirical distribution $S_n(x)$ and the proposed cdf $F^{*}(x)$ by using a metric function space:

\[
W_n^{2} = n\int_{-\infty}^{\infty}\left[S_n(x) - F^{*}(x)\right]^{2}\psi\left(F^{*}(x)\right)\,dF(x),
\tag{32}
\]

where $\psi(u) = 1/(u(1-u))$ is a weight function with $u = F(x)$. Let $u_i = F(X_{(i)})$, $i = 1, \ldots, n$; under $H_0$, the test statistic and $P$-value are given by

\[
\begin{aligned}
W_n^{2} &= -n - \frac{1}{n}\sum_{j=1}^{n}(2j-1)\left(\log(u_j) + \log(1 - u_{n-j+1})\right),\\
P\text{-value} &= 1 - \frac{\sqrt{2}}{z}\sum_{j=0}^{\infty}\frac{(-1)^{j}\,\Gamma\!\left(j+\tfrac{1}{2}\right)(4j+1)}{j!}\,e^{-(4j+1)^{2}\pi^{2}/(8z)}\int_{0}^{\infty}e^{\,z/(8(w^{2}+1)) - (4j+1)^{2}\pi^{2}w^{2}/(8z)}\,dw,
\end{aligned}
\tag{33}
\]

Figure 5: Absolute errors $|F^{*}(x) - S_n(x)|$ for the hurricane loss example on the 10 largest losses; series: Aban, Moment, Pareto, and Weighted.

where $z$ is the observed value of $W_n^{2}$ and $\Gamma(x) = \int_{0}^{\infty}t^{x-1}e^{-t}\,dt$, $x > 0$, is the Gamma function.

(3) The Cramér-von Mises (C-v-M) test [16] uses $\psi(u) = 1$ in (32); thus, under $H_0$, the test statistic and $P$-value are given by

\[
\begin{aligned}
n\omega^{2} &= \frac{1}{12n} + \sum_{j=1}^{n}\left(u_j - \frac{2j-1}{2n}\right)^{2},\\
P\text{-value} &= 1 - \frac{1}{\pi\sqrt{z}}\sum_{j=0}^{\infty}\frac{(-1)^{2j}\,\Gamma\!\left(j+\tfrac{1}{2}\right)(4j+1)^{1/2}}{j!\,\sqrt{\pi}}\,e^{-(4j+1)^{2}/(16z)}\,K_{1/4}\!\left(\frac{(4j+1)^{2}}{16z}\right),
\end{aligned}
\tag{34}
\]

where $K_{1/4}(x)$ is the modified Bessel function of the second kind:

\[
K_{\alpha}(x) = \frac{\pi}{2}\,\frac{I_{-\alpha}(x) - I_{\alpha}(x)}{\sin(\alpha\pi)}, \qquad I_{\alpha}(x) = \sum_{m=0}^{\infty}\frac{1}{m!\,\Gamma(m+\alpha+1)}\left(\frac{x}{2}\right)^{2m+\alpha}.
\tag{35}
\]
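The three test statistics in (30), (33), and (34) depend on the data only through the fitted cdf values $u_j = F^{*}(X_{(j)})$. The following sketch computes the statistics only, not the $P$-values; it assumes a continuous fitted cdf, so that the supremum in (30) is attained at the jump points of the EDF, and it requires every $u_j$ to lie strictly between 0 and 1 so that the logarithms are finite. The function name is hypothetical.

```python
import math


def edf_statistics(u):
    """Return (K-S T, A-D W_n^2, C-v-M n*omega^2) from sorted fitted cdf values u."""
    n = len(u)
    # K-S: the largest gap between the fitted cdf and the EDF just before/after a jump.
    ks = max(max(j / n - u[j - 1], u[j - 1] - (j - 1) / n) for j in range(1, n + 1))
    # A-D: first line of (33); u[n - j] is u_{n-j+1} in 1-based notation.
    ad = -n - sum(
        (2 * j - 1) * (math.log(u[j - 1]) + math.log(1.0 - u[n - j]))
        for j in range(1, n + 1)
    ) / n
    # C-v-M: first line of (34).
    cvm = 1.0 / (12 * n) + sum(
        (u[j - 1] - (2 * j - 1) / (2 * n)) ** 2 for j in range(1, n + 1)
    )
    return ks, ad, cvm


# Evenly spread cdf values give the smallest possible C-v-M statistic, 1 / (12 n).
ks, ad, cvm = edf_statistics([0.125, 0.375, 0.625, 0.875])
```

In an application, `u` would be obtained by applying one of the fitted cdfs in (26)-(29) to the ordered losses.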

For fitting the 49 losses in the hurricane loss data, for example, after computing estimates of $F^{*}(x)$ by using (26), (27), (28), and (29), we compute the absolute error (AE) in (30) and define the integrated error (IE) by

\[
\mathrm{IE} = \left[\int_{X_{(n-r+1)}}^{X_{(n)}}\left(S_n(x) - F^{*}(x)\right)^{2}dx\right]^{1/2}.
\tag{36}
\]

Figure 5 gives the absolute errors (AE) in (30) for the 10 largest losses and confirms that the original Pareto estimate has relatively larger errors and the weighted estimate has relatively smaller errors in the tail. Figure 5 explains the data fitting of the tail of the distributions in Figure 4.

We also compute the AE in (30) and IE in (36) to confirm those tail errors by using $r$ as the number of the largest losses, $1 \le r \le n$. The AE and IE values are given in Table 4 (the smallest values are marked with $*$).
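The integrated error in (36) can be approximated numerically because the EDF is constant between consecutive order statistics. The sketch below uses a midpoint rule on each such interval; this quadrature choice, the step count, and the function name are assumptions made for illustration, and the paper does not state how the integral was evaluated.

```python
import math


def integrated_error(sorted_x, cdf, r, steps=200):
    """Approximate IE in (36) over the r largest observations of sorted_x."""
    n = len(sorted_x)
    total = 0.0
    # 0-based index i covers [X_(i+1), X_(i+2)), where the EDF equals (i + 1) / n.
    for i in range(n - r, n - 1):
        lo, hi = sorted_x[i], sorted_x[i + 1]
        edf = (i + 1) / n
        h = (hi - lo) / steps
        for s in range(steps):
            mid = lo + (s + 0.5) * h
            total += (edf - cdf(mid)) ** 2 * h
    return math.sqrt(total)


# Toy check: three points and a uniform cdf on [0, 2]; the exact value is sqrt(1/18).
ie = integrated_error([0.0, 1.0, 2.0], lambda x: x / 2.0, r=3)
```

With `r` equal to the sample size the integral covers the whole data range, while a small `r` isolates the upper tail, which is how Table 4 is organized.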


Table 4: Errors of goodness-of-fit tests, $n = 49$, for hurricane example (the smallest values are marked with $*$).

                 Absolute error (AE)              Integrated error (IE)
Method           r = 49    r = 18    r = 10       r = 49    r = 18    r = 10
Pareto (Hill)    0.1340    0.0584    0.0584       0.4844    0.3818    0.3723
TPD (Aban)       0.0948*   0.0839    0.0832       0.3114    0.2565    0.2161
TPD (Moment)     0.1053    0.0738    0.0737       0.2985*   0.2171    0.1825
TPD (Weighted)   0.1250    0.0561*   0.0561*      0.3098    0.1678*   0.1453*

Table 5: The estimation of index, mean, median, and VaR for forest fire loss data.

Method           Index α̂   μ               Median         5% value at risk
Pareto (Hill)    4.3044     1.096 billion   1.09 billion   1.855 billion
TPD (Aban)       0.7964     1.167 billion   1.15 billion   1.446 billion
TPD (Moment)     0.8260     1.166 billion   1.15 billion   1.446 billion
TPD (Weighted)   0.8642     1.165 billion   1.15 billion   1.446 billion

The weighted estimator has the smallest AE and IE values for the $r = 18$ and 10 largest losses, and its IE value is almost equal to the smallest IE value for $r = 49$, that is, for all losses. We statistically conclude that the weighted estimated distribution is the best fit in the tail of the hurricane loss data.

6.2. Forest Fire Loss Data Example. Next, we look at the forest fire loss example in Section 1.1.2. The data in Table 1 contain a relatively substantial number of large forest fire losses, which convinces us to use a truncated Pareto model and compare the four estimators in (26), (27), (28), and (29). We use the 25 largest losses in this study.

Figure 6 is a log-log plot which shows the upper tail for the forest fire loss data. The circles represent the real data, and the straight line represents the estimated original Pareto distribution. The dashed line, dotted line, and thick solid line represent the estimated truncated Pareto distribution by using Aban's, the Moment, and the weighted estimators, respectively. We can see that the estimated truncated Pareto distributions fit the data very well using all three estimation methods and are much better than the original Pareto distribution. Around the tail, the weighted and the Moment estimation methods perform the best. The fact that the tail of the data curves downward in Figure 6 is evidence in support of using a truncated Pareto model.

The results of these estimators are listed in Table 5 by using the forest fire loss data, where $\gamma = X_{(1)}$, $\nu = X_{(n)}$, and $n = 25$.

Note that the weighted estimate $\hat{\alpha}_w$ is the largest among the three truncated Pareto estimates. The 5% value at risk of the original Pareto estimate is the largest. The 5% VaR may be overestimated compared to the truncated Pareto models.

Similarly, as in the hurricane example, we compute the AE in (30) and IE in (36) again for fitting the 25 losses of the forest fire loss data.

Figure 7 gives the absolute errors (AE) in (30) for the 10 largest losses and confirms that the original Pareto estimate has relatively larger errors and the weighted estimate has relatively smaller errors in the tail. Figure 7 explains the data fitting of the tail of the distributions in Figure 6.

Figure 6: Log-log plot for forest fire example; horizontal axis $\ln(x)$, vertical axis $\ln(P(X > x))$; series: Data, Pareto, Aban, Moment, and Weighted.

We also compute the AE in (30) and IE in (36) to confirm those tail errors by using $r$ as the number of the largest losses, $1 \le r \le n$. The AE and IE values are given in Table 6 (the smallest values are marked with $*$).

The weighted estimator has the smallest AE and IE values for $r = 10$ and 5, and its IE value for $r = 25$ is almost equal to the smallest IE value. We statistically conclude that the weighted estimated distribution is the best fit to the tail of the forest fire loss data.

Table 6: Errors of goodness-of-fit tests, $n = 25$, for forest fire loss data (the smallest values are marked with $*$).

                 Absolute error (AE)              Integrated error (IE)
Method           r = 25    r = 10    r = 5        r = 25    r = 10    r = 5
Pareto (Hill)    0.2656    0.1288    0.1288       2.7125    1.1493    1.0985
TPD (Aban)       0.1503*   0.1195    0.1195       1.3687*   0.7911    0.5430
TPD (Moment)     0.1520    0.0982    0.0982       1.3736    0.7764    0.5430
TPD (Weighted)   0.1542    0.0964*   0.0964*      1.3796    0.7577*   0.5203*

Figure 7: Absolute errors $|F^{*}(x) - S_n(x)|$ for the forest fire loss example on the 10 largest losses; series: Aban, Moment, Pareto, and Weighted.

7. Conclusions

In the complicated real world, it is difficult to construct a model combining all the desired features. In general, the final model selection depends on the best fitting model. The criteria are based on goodness-of-fit tests, existence of the moments, characteristic largest values, and log-log plots. The hurricane loss data and forest fire loss data are well fitted by the truncated Pareto distribution. In summary,

(a) we recommend the truncated Pareto model as an appropriate loss distribution for analyzing huge risk loss data sets. The upper and lower limits can be set by the largest and smallest losses or other reasonable values;

(b) the estimated loss distribution provides a prediction of the next disaster's 5% value at risk. The largest loss in the data set plays an important role in the prediction. In Figures 4 and 6, the trend of the tail distribution is crucial for an insurance company setting policy, and also for inhabitants and governments making plans to minimize damage from natural disasters;

(c) the semiparametric methods (Moment and weighted) are robust, easy to use, more stable, and fit the data better than the MLE methods. In both of the foregoing examples, the estimated shape parameter is less than 1. The Moment and the weighted methods cannot be used with the original Pareto model in this case, because its mean is infinite when $\alpha \le 1$. This is another advantage of using the truncated Pareto model;

(d) the statistical inference in these examples shows that the distribution curve estimated by the proposed weighted estimation method fits the tails of the data better than the Moment and MLE estimators. This is due to the fact that the weighted estimator places less weight on the extreme values and has good MSE. Based on these studies, we suggest that further studies on the usage of weights may be useful.

Appendix

Mathematical Derivations

Lemma A.1. For a truncated Pareto random variable $X$ with pdf given in (3), for $i, j \in \{1, 2, \ldots, n\}$, $i < j$, where $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are order statistics, one has

\[
E\left[X_{(i)}\right] = \gamma\sum_{k=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\,\rho^{k}\,\frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)},
\tag{A1}
\]

\[
E\left[X_{(i)}^{2}\right] = \gamma^{2}\sum_{k=0}^{\infty}\binom{\frac{2}{\alpha}+k-1}{k}\,\rho^{k}\,\frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)},
\tag{A2}
\]

\[
E\left[X_{(i)}X_{(j)}\right] = \gamma^{2}\sum_{k=0}^{\infty}\sum_{l=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\binom{\frac{1}{\alpha}+l-1}{l}\,\rho^{k+l}\left[\frac{\Gamma(n+1)\,\Gamma(k+i)\,\Gamma(k+l+j)}{\Gamma(i)\,\Gamma(k+j)\,\Gamma(k+l+n+1)}\right],
\tag{A3}
\]

where $\rho = (g^{\alpha} - 1)/g^{\alpha}$, $g = \nu/\gamma$, $0 < \rho < 1$, $g > 1$, $0 < \gamma \le x \le \nu < \infty$, and $\alpha > 0$.
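The series for the order-statistic moments can be evaluated numerically by working with logarithms of the Gamma functions. The sketch below is one possible implementation, not the authors' code: it truncates the series at a fixed number of terms (an assumption that is adequate when $\rho$ is not too close to 1), and it uses the prefactor $\gamma^{m}$ that follows from the substitution $x = \gamma(1-\rho u)^{-1/\alpha}$.

```python
import math


def order_stat_moment(i, n, m, alpha, gamma, nu, terms=5000):
    """Series for E[X_(i)^m], m = 1 or 2, of a truncated Pareto sample of size n."""
    rho = 1.0 - (gamma / nu) ** alpha
    r = m / alpha
    log_rho = math.log(rho)
    const = math.lgamma(n + 1) - math.lgamma(i) - math.lgamma(r)
    total = 0.0
    for k in range(terms):
        # log of binom(r + k - 1, k) * rho^k * G(n+1) G(k+i) / (G(i) G(k+n+1))
        log_term = (math.lgamma(r + k) - math.lgamma(k + 1) + k * log_rho
                    + math.lgamma(k + i) - math.lgamma(k + n + 1) + const)
        total += math.exp(log_term)
    return gamma**m * total


# Expected smallest and largest of n = 10 observations, alpha = 1.8 on [1, 10].
e_min = order_stat_moment(1, 10, 1, 1.8, 1.0, 10.0)
e_max = order_stat_moment(10, 10, 1, 1.8, 1.0, 10.0)
```

A convenient sanity check is that the expected order statistics of a sample of size $n$ sum to $n$ times the population mean.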

Proof. Let the cdf of a truncated Pareto distribution in (4) be

\[
F(x) = u, \qquad x = \gamma(1 - \rho u)^{-1/\alpha}, \qquad \rho = \frac{g^{\alpha} - 1}{g^{\alpha}}, \qquad g = \frac{\nu}{\gamma}.
\tag{A4}
\]

By the theory of order statistics, we have

\[
f_{i}(x) = \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)}\,[F(x)]^{i-1}\,[1 - F(x)]^{n-i}\,f(x),
\tag{A5}
\]

\[
f_{ij}(x, y) = \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(j-i)\,\Gamma(n-j+1)}\,[F(x)]^{i-1}\,[F(y) - F(x)]^{j-i-1}\,[1 - F(y)]^{n-j}\,f(x)\,f(y).
\tag{A6}
\]

Using the binomial formula $1/(1-x)^{r} = \sum_{k=0}^{\infty}\binom{r+k-1}{k}x^{k}$, for $m = 1, 2$, we have

\[
\begin{aligned}
E\left[X_{(i)}^{m}\right] &= \int_{\gamma}^{\nu}x^{m}f_{i}(x)\,dx\\
&= \int_{0}^{1}\gamma^{m}(1 - \rho u)^{-m/\alpha}\,\frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)}\,u^{i-1}(1-u)^{n-i}\,du\\
&= \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)}\,\gamma^{m}\sum_{k=0}^{\infty}\binom{\frac{m}{\alpha}+k-1}{k}\int_{0}^{1}\rho^{k}\,u^{k+i-1}(1-u)^{n-i}\,du\\
&= \gamma^{m}\sum_{k=0}^{\infty}\binom{\frac{m}{\alpha}+k-1}{k}\,\rho^{k}\,\frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)}.
\end{aligned}
\tag{A7}
\]

By substituting $m = 1$ and 2, respectively, we have (A1) and (A2). And

\[
\begin{aligned}
E\left[X_{(i)}X_{(j)}\right] &= \int_{\gamma}^{\nu}\int_{\gamma}^{y}x\,y\,f_{ij}(x, y)\,dx\,dy\\
&= \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(j-i)\,\Gamma(n-j+1)}\int_{0}^{1}\int_{0}^{h}\gamma(1-\rho u)^{-1/\alpha}\,\gamma(1-\rho h)^{-1/\alpha}\,u^{i-1}(h-u)^{j-i-1}(1-h)^{n-j}\,du\,dh\\
&= \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(j-i)\,\Gamma(n-j+1)}\,\gamma^{2}\sum_{k=0}^{\infty}\sum_{l=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\binom{\frac{1}{\alpha}+l-1}{l}\,\rho^{k+l}\\
&\qquad\times\int_{0}^{1}\int_{0}^{1}\left(\frac{u}{h}\right)^{k+i-1}\left(1 - \frac{u}{h}\right)^{j-i-1}h^{k+l+j-1}(1-h)^{n-j}\,d\!\left(\frac{u}{h}\right)dh\\
&= \gamma^{2}\sum_{k=0}^{\infty}\sum_{l=0}^{\infty}\binom{\frac{1}{\alpha}+k-1}{k}\binom{\frac{1}{\alpha}+l-1}{l}\,\rho^{k+l}\,\frac{\Gamma(n+1)\,\Gamma(k+i)\,\Gamma(k+l+j)}{\Gamma(i)\,\Gamma(k+j)\,\Gamma(k+l+n+1)}.
\end{aligned}
\tag{A8}
\]

Proof of Theorem 3. When $\alpha > 0$, we use

\[
\begin{aligned}
E[\mu_w] &= \sum_{i=2}^{n-1}w\,E\left[X_{(i)}\right] + \frac{1}{2}\left(1 - (n-2)w\right)\left(E\left[X_{(1)}\right] + E\left[X_{(n)}\right]\right),\\
\mathrm{MSE}(\mu_w) &= E\left[(\mu_w - \mu)^{2}\right] = E\left[\mu_w^{2}\right] - 2\mu E[\mu_w] + \mu^{2}.
\end{aligned}
\tag{A9}
\]

And we have

\[
\begin{aligned}
E\left[\mu_w^{2}\right] &= \sum_{i=2}^{n-1}w^{2}E\left[X_{(i)}^{2}\right] + 2\sum_{i=2}^{n-2}\sum_{j=i+1}^{n-1}w^{2}E\left[X_{(i)}X_{(j)}\right]\\
&\quad + \frac{1}{4}\left(1 - (n-2)w\right)^{2}\left(E\left[X_{(1)}^{2}\right] + E\left[X_{(n)}^{2}\right] + 2E\left[X_{(1)}X_{(n)}\right]\right)\\
&\quad + \sum_{i=2}^{n-1}w\left(1 - (n-2)w\right)\left(E\left[X_{(1)}X_{(i)}\right] + E\left[X_{(i)}X_{(n)}\right]\right).
\end{aligned}
\tag{A10}
\]
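The expansion above corresponds to an estimator that puts weight $w$ on each middle order statistic and splits the remaining weight equally between the smallest and largest observations. The sketch below implements that reading; the estimator's formal definition is equation (17), which is not reproduced here, so this form is inferred from (A10) and should be treated as an assumption. The function name and the toy data are illustrative.

```python
import math


def weighted_mean(sample, w):
    """mu_w = w * (sum of middle order statistics) + (1 - (n-2) w)/2 * (min + max)."""
    xs = sorted(sample)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    edge_weight = 0.5 * (1.0 - (n - 2) * w)
    return w * sum(xs[1:-1]) + edge_weight * (xs[0] + xs[-1])


# Toy data with one extreme value; w = 1/n recovers the ordinary sample mean.
data = [1.0, 2.0, 3.0, 10.0]
plain = weighted_mean(data, 1.0 / len(data))
damped = weighted_mean(data, 1.0 / math.sqrt(len(data) * (len(data) - 1)))
```

Choosing $w > 1/n$ lowers the weight on the two extremes, which is why `damped` is pulled less by the large observation than `plain`.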

Let

\[
\begin{aligned}
A &= \sum_{i=2}^{n-1}E\left[X_{(i)}^{2}\right] + 2\sum_{i=2}^{n-2}\sum_{j=i+1}^{n-1}E\left[X_{(i)}X_{(j)}\right],\\
B &= E\left[X_{(1)}^{2}\right] + E\left[X_{(n)}^{2}\right] + 2E\left[X_{(1)}X_{(n)}\right],\\
C &= \sum_{i=2}^{n-1}\left(E\left[X_{(1)}X_{(i)}\right] + E\left[X_{(i)}X_{(n)}\right]\right),\\
D &= \sum_{i=2}^{n-1}E\left[X_{(i)}\right],\\
E &= E\left[X_{(1)}\right] + E\left[X_{(n)}\right].
\end{aligned}
\tag{A11}
\]

Then

\[
\begin{aligned}
\mathrm{MSE}(\mu_w) &= Aw^{2} + \frac{1}{4}\left(1 - (n-2)w\right)^{2}B + w\left(1 - (n-2)w\right)C - 2\mu\left(Dw + \frac{1}{2}\left(1 - (n-2)w\right)E\right) + \mu^{2}\\
&= \left(A + \frac{1}{4}(n-2)^{2}B - (n-2)C\right)w^{2} + \left(-\frac{1}{2}(n-2)B + C - 2\mu D + \mu(n-2)E\right)w + \frac{1}{4}B - \mu E + \mu^{2}.
\end{aligned}
\tag{A12}
\]

Now use Lemma A.1; then we have Theorem 3.
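The regrouping in (A12) can be verified mechanically: the coefficients read off from the second line must reproduce the first line for any $w$. The sketch below performs that check with arbitrary illustrative inputs; the numbers are not moments from the paper, and the function names are hypothetical.

```python
def mse_quadratic_coefficients(A, B, C, D, E, mu, n):
    """Coefficients (a, b, c) of MSE(mu_w) = a w^2 + b w + c, second line of (A12)."""
    a = A + 0.25 * (n - 2) ** 2 * B - (n - 2) * C
    b = -0.5 * (n - 2) * B + C - 2.0 * mu * D + mu * (n - 2) * E
    c = 0.25 * B - mu * E + mu**2
    return a, b, c


def mse_direct(w, A, B, C, D, E, mu, n):
    """First line of (A12), before collecting powers of w."""
    edge = 1.0 - (n - 2) * w
    return (A * w * w + 0.25 * edge * edge * B + w * edge * C
            - 2.0 * mu * (D * w + 0.5 * edge * E) + mu**2)


# Arbitrary illustrative inputs (not values from the paper).
args = dict(A=40.0, B=30.0, C=25.0, D=6.0, E=5.0, mu=2.5, n=6)
a, b, c = mse_quadratic_coefficients(**args)
gap = abs(mse_direct(0.2, **args) - (a * 0.2**2 + b * 0.2 + c))
```

In practice, $A$ through $E$ would be filled in from the series of Lemma A.1, after which the coefficients feed directly into the vertex formula for the $L_2$-optimal weight.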

Lemma A.2. $\mathrm{MSE}(\mu_w)$ in (20) is a convex function with $a > 0$ when $n > 4C/B + 2$; $C$ and $B$ are given in (20).

Proof. Let $f(w) = \mathrm{MSE}(\mu_w) = aw^{2} + bw + c$. We have

\[
a = A + \frac{1}{4}(n-2)^{2}B - (n-2)C,
\tag{A13}
\]

where

\[
\begin{aligned}
A &= E\left[\left(\sum_{i=2}^{n-1}X_{(i)}\right)^{2}\right] \ge 0,\\
B &= E\left[\left(X_{(1)} + X_{(n)}\right)^{2}\right] \ge 0,\\
C &= E\left[\sum_{i=2}^{n-1}X_{(i)}\left(X_{(1)} + X_{(n)}\right)\right] \ge 0.
\end{aligned}
\tag{A14}
\]

Since $n > 2$ and $a = A + (n-2)\left[\frac{1}{4}(n-2)B - C\right]$ with $A \ge 0$, then

\[
\frac{1}{4}(n-2)B - C > 0 \text{ such that } a > 0, \text{ when } n > \frac{4C}{B} + 2.
\tag{A15}
\]

Proof of Corollary 5. By Theorem 3 and Lemma A.2, let

\[
f(w) = \mathrm{MSE}(\mu_w) = aw^{2} + bw + c,
\tag{A16}
\]

which is a quadratic function with first and second derivatives of $f(w)$ with respect to $w$:

\[
f'(w) = 2aw + b, \qquad f''(w) = 2a > 0.
\tag{A17}
\]

Hence $f(w)$ is a convex function with minimum value

\[
f_{\min} = a\left(-\frac{b}{2a}\right)^{2} + b\left(-\frac{b}{2a}\right) + c = -\frac{b^{2}}{4a} + c.
\tag{A18}
\]

The maximum value of the $\mathrm{EFF}(\mu_w)$ is

\[
\mathrm{EFF}_{\max}(\mu_w) = \frac{\alpha\gamma^{\alpha}\left(\nu^{2-\alpha} - \gamma^{2-\alpha}\right)}{(2-\alpha)\left(1 - (\gamma/\nu)^{\alpha}\right) n\left(-b^{2}/(4a) + c\right)} - \frac{\alpha^{2}\gamma^{2\alpha}\left(\nu^{1-\alpha} - \gamma^{1-\alpha}\right)^{2}}{(1-\alpha)^{2}\left(1 - (\gamma/\nu)^{\alpha}\right)^{2} n\left(-b^{2}/(4a) + c\right)}, \quad \text{at } w = -\frac{b}{2a}.
\tag{A19}
\]

Acknowledgments

The authors thank the referees and the editor for their comments, which helped to improve the paper. This research is supported by the Natural Sciences and Engineering Research Council of Canada.

References

[1] R. A. Pielke Jr., J. Gratz, C. W. Landsea, D. Collins, M. A. Saunders, and R. Musulin, "Normalized hurricane damage in the United States: 1900–2005," Natural Hazards Review, vol. 9, no. 1, pp. 29–42, 2008.

[2] P. Embrechts, C. Klüppelberg, and T. Mikosch, Modelling Extremal Events for Insurance and Finance, Springer, New York, NY, USA, 2003.

[3] C. Kleiber and S. Kotz, Statistical Size Distribution in Economics and Actuarial Sciences, John Wiley & Sons, New York, NY, USA, 2003.

[4] J. Beirlant, Y. Goegebeur, J. Segers, and J. Teugels, Statistics of Extremes: Theory and Application, John Wiley & Sons, New York, NY, USA, 2005.

[5] M. A. Beg, "Estimation of the tail probability of the truncated Pareto distribution," Journal of Information & Optimization Sciences, vol. 2, no. 2, pp. 192–198, 1981.

[6] I. B. Aban, M. M. Meerschaert, and A. K. Panorska, "Parameter estimation for the truncated Pareto distribution," Journal of the American Statistical Association, vol. 101, no. 473, pp. 270–277, 2006.

[7] B. Efron, "Bootstrap methods: another look at the jackknife," The Annals of Statistics, vol. 7, no. 1, pp. 1–26, 1979.

[8] P. Barbe and P. Bertail, The Weighted Bootstrap, Springer, New York, NY, USA, 1995.

[9] G. R. Shorack and J. A. Wellner, Empirical Processes with Applications to Statistics, John Wiley & Sons, New York, NY, USA, 1986.

[10] H. L. Koul, Weighted Empirical and Linear Models, vol. 21 of Lecture Notes-Monograph Series, Institute of Mathematical Statistics, Hayward, Calif, USA, 1992.

[11] M. L. Huang and P. H. Brill, "A distribution estimation method based on level crossings," Journal of Statistical Planning and Inference, vol. 124, no. 1, pp. 45–62, 2004.

[12] M. L. Huang, "The efficiencies of a weighted distribution function estimator," in The Proceeding of American Statistical Association, Nonparametric Statistics Section, pp. 1502–1506, 2003.

[13] B. M. Hill, "A simple general approach to inference about the tail of a distribution," The Annals of Statistics, vol. 3, no. 5, pp. 1163–1174, 1975.

[14] P. Cooke, "Statistical inference for bounds of random variables," Biometrika, vol. 66, no. 2, pp. 367–374, 1979.

[15] A. N. Kolmogorov, "Sulla determinazione empirica di una legge di distribuzione," Giornale dell'Istituto Italiano degli Attuari, vol. 4, pp. 83–91, 1933.

[16] T. W. Anderson and D. A. Darling, "Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes," The Annals of Mathematical Statistics, vol. 23, pp. 193–212, 1952.


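Table 2: Exact efficiencies of $\hat{\mu}_w$ relative to other estimators by using $L_2$-optimal weights, $\gamma = 1$, $\nu = 10$.

| Sample size | $\alpha$ | $w_{L_2\text{-opt}}$ | $w = 1/n$ | $w_{L_1\text{-opt}}$ | $\mathrm{MSE}(\hat{\mu}_w)$ | $\mathrm{MSE}(\bar{X})$ | $\mathrm{EFF}(\hat{\mu}_w(\bar{X}))$ |
|---|---|---|---|---|---|---|---|
| $n = 10$ | 0.8 | 0.10396 | 0.1 | 0.10541 | 0.4007 | 0.4034 | 1.0068 |
| | 1.8 | 0.11446 | 0.1 | 0.10541 | 0.1406 | 0.1647 | 1.1717 |
| | 3 | 0.11847 | 0.1 | 0.10541 | 0.0365 | 0.0493 | 1.3501 |
| $n = 20$ | 0.8 | 0.05015 | 0.05 | 0.05130 | 0.2017 | 0.2017 | 1.0001 |
| | 1.8 | 0.05255 | 0.05 | 0.05130 | 0.0766 | 0.0824 | 1.0759 |
| | 3 | 0.05463 | 0.05 | 0.05130 | 0.0217 | 0.0247 | 1.1367 |
| $n = 30$ | 0.8 | 0.03323 | 0.03333 | 0.03390 | 0.1344 | 0.1345 | 1.0003 |
| | 1.8 | 0.03425 | 0.03333 | 0.03390 | 0.0528 | 0.0549 | 1.0399 |
| | 3 | 0.03529 | 0.03333 | 0.03390 | 0.0155 | 0.0164 | 1.0628 |
| $n = 50$ | 0.8 | 0.02021 | 0.02 | 0.02020 | 0.0968 | 0.1063 | 1.0978 |
| | 1.8 | 0.02159 | 0.02 | 0.02020 | 0.0514 | 0.0592 | 1.1536 |
| | 3 | 0.02298 | 0.02 | 0.02020 | 0.0274 | 0.0352 | 1.2849 |
| $n = 100$ | 0.8 | 0.01001 | 0.01 | 0.01005 | 0.0765 | 0.0830 | 1.0845 |
| | 1.8 | 0.01002 | 0.01 | 0.01005 | 0.0542 | 0.0576 | 1.0637 |
| | 3 | 0.01002 | 0.01 | 0.01005 | 0.0380 | 0.0400 | 1.0526 |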

The proof of Corollary 5 is in the appendix. Table 2 lists the values of $w_{L_2\text{-opt}}$, $w_{L_1\text{-opt}}$, $\mathrm{MSE}_{\min}(\hat{\mu}_w)$, $\mathrm{MSE}(\bar{X})$, and the exact $\mathrm{EFF}_{\max}(\hat{\mu}_w)$ of $\hat{\mu}_w$ relative to $\bar{X}$ for $n = 10$, 20, 30, 50, and 100 and $\alpha = 0.8$, 1.8, and 3, by using (24) and (25). We note that all values of $w_{L_2\text{-opt}}$ are greater than $1/n$ and all exact $\mathrm{EFF}_{\max}(\hat{\mu}_w)$ relative to $\bar{X}$ are greater than 1.

Remark 6. The $w_{L_1\text{-opt}}$ in (23) is totally nonparametric; it is more robust and easy to use. Note that $w_{L_2\text{-opt}}$ in (24) depends on the parameter $\alpha$. In practice, we may estimate $\alpha$ first and then obtain a $w_{L_2\text{-opt}}$ while still keeping the optimal advantage. Of course, we use the given $\alpha$ values in the simulations. However, $w_{L_1\text{-opt}}$ and $w_{L_2\text{-opt}}$ are close to each other when $n > 100$.

5. Simulations

Next, we use simulations to compare the performance of the three parametric MLE estimators, that is, Hill's, Beg's, and Aban's estimators in (12), (13), and (14), and the two semiparametric estimators, that is, the Moment and the weighted estimators in (16) and (18), for estimating the shape parameter $\alpha$ in (3). We generate $m = 1000$ random samples of size $n = 100$ from the distribution in (3). Only $w_{L_1\text{-opt}}$ is used for the weighted estimator, since Table 2 indicates that the values of $w_{L_2\text{-opt}}$ and $w_{L_1\text{-opt}}$ are close to each other when $n = 100$.

If $\alpha \le 2$, the original Pareto distribution has an infinite variance; if $\alpha \le 1$, the mean is infinite as well. These cases make inference difficult. We focus on $\alpha = 0.5$, 0.8, and 1.8 and let $\gamma = 1$, $\nu = 10$. Figure 3 contains the box-plots comparing these five estimators. In the cases $\alpha = 0.5$ and 0.8, Beg's and Hill's estimators have large biases, whereas the weighted, Aban's, and the Moment estimators perform very well, although Aban's MLE sometimes has unstable solutions. When $\alpha = 1.8$, the results are similar, except that Hill's and Beg's estimators perform better. It is interesting to see that in all three cases the median of the weighted estimator for $\alpha$ is relatively larger than those of Aban's and the Moment estimators. We will discuss how this affects the tail estimation in the next section. The simulations were run by using MAPLE 15 with double precision.
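The sampling step of such a simulation can be sketched in a few lines. The following Python sketch is an illustration only, not the MAPLE code used for the paper: it draws truncated Pareto samples by inverting the cdf, $x = \gamma(1 - \rho u)^{-1/\alpha}$ with $\rho = 1 - (\gamma/\nu)^{\alpha}$, and compares the Monte Carlo MSE of the weighted mean $\hat{\mu}_w$ with that of $\bar{X}$. The weighted-mean form used here (weight $w$ on the interior order statistics and $\frac{1}{2}(1 - (n-2)w)$ on each of the minimum and maximum) is taken from the expression for $E[\hat{\mu}_w]$ in the appendix; the function names and the seed are assumptions, and the estimator of $\alpha$ in (18) is not reproduced.

```python
import math
import random

def rtpareto(n, alpha, gamma, nu, rng):
    """Draw n truncated Pareto values on [gamma, nu] by inverting the cdf."""
    rho = 1.0 - (gamma / nu) ** alpha  # rho = (g^alpha - 1) / g^alpha, g = nu / gamma
    return [gamma * (1.0 - rho * rng.random()) ** (-1.0 / alpha) for _ in range(n)]

def tpareto_mean(alpha, gamma, nu):
    """Mean of the truncated Pareto distribution (valid for alpha != 1)."""
    num = alpha * gamma ** alpha * (nu ** (1 - alpha) - gamma ** (1 - alpha))
    return num / ((1 - alpha) * (1 - (gamma / nu) ** alpha))

def weighted_mean(sample, w):
    """Weight w on the n - 2 interior order statistics; the rest is split
    equally between the smallest and the largest observation."""
    xs = sorted(sample)
    n = len(xs)
    edge = 0.5 * (1.0 - (n - 2) * w)
    return w * sum(xs[1:-1]) + edge * (xs[0] + xs[-1])

def simulate(alpha=0.8, gamma=1.0, nu=10.0, n=100, m=1000, seed=1):
    """Monte Carlo MSE of the weighted mean (L1-optimal weight) and of the sample mean."""
    rng = random.Random(seed)          # the seed is an arbitrary choice
    mu = tpareto_mean(alpha, gamma, nu)
    w = 1.0 / math.sqrt(n * (n - 1))   # L1-optimal weight
    se_w = se_bar = 0.0
    for _ in range(m):
        x = rtpareto(n, alpha, gamma, nu, rng)
        se_w += (weighted_mean(x, w) - mu) ** 2
        se_bar += (sum(x) / n - mu) ** 2
    return se_w / m, se_bar / m

if __name__ == "__main__":
    mse_w, mse_bar = simulate()
    print("MSE(mu_w) =", mse_w, " MSE(X bar) =", mse_bar)
```

Setting $w = 1/n$ in `weighted_mean` recovers the ordinary sample mean, which is a convenient sanity check before comparing the two MSE values.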

6. Examples

Now we use the proposed method and compare it with existing methods to analyze the data of the two examples outlined in Section 1.

6.1. Hurricane Loss Example

6.1.1. Comparison of Four Estimation Methods. First, we look at the hurricane loss example in Section 1.1.1. Based on the simulation results in Figure 3, we consider the three better estimators out of the five in the heavy-tailed case, that is, Aban's, the Moment, and the weighted estimators in (27), (28), and (29) for the truncated Pareto pdf $f(x)$ in (3) and the cdf $F(x)$ in (4). We also compare them with Hill's estimator in (26) for the original Pareto pdf $f_p(x)$ in (1) and the cdf $F_p(x)$ in (2). Here,

$$f_{\mathrm{Hill}}(x) = \frac{\hat{\alpha}_{\mathrm{Hill}}\,\gamma^{\hat{\alpha}_{\mathrm{Hill}}}}{x^{\hat{\alpha}_{\mathrm{Hill}}+1}}, \qquad F_{\mathrm{Hill}}(x) = 1 - \left(\frac{\gamma}{x}\right)^{\hat{\alpha}_{\mathrm{Hill}}}, \tag{26}$$

where $\hat{\alpha}_{\mathrm{Hill}}$ is given in (12) with $r = n - 1$; consider

$$f_{\mathrm{MLE}}(x) = \frac{\hat{\alpha}_{\mathrm{MLE}}\,\gamma^{\hat{\alpha}_{\mathrm{MLE}}}\,x^{-\hat{\alpha}_{\mathrm{MLE}}-1}}{1 - (\gamma/\nu)^{\hat{\alpha}_{\mathrm{MLE}}}}, \qquad F_{\mathrm{MLE}}(x) = 1 - \frac{\gamma^{\hat{\alpha}_{\mathrm{MLE}}}\left(x^{-\hat{\alpha}_{\mathrm{MLE}}} - \nu^{-\hat{\alpha}_{\mathrm{MLE}}}\right)}{1 - (\gamma/\nu)^{\hat{\alpha}_{\mathrm{MLE}}}}, \tag{27}$$

where $\hat{\alpha}_{\mathrm{MLE}}$ is Aban's MLE given in (14); consider

$$f_{M}(x) = \frac{\hat{\alpha}_{M}\,\gamma^{\hat{\alpha}_{M}}\,x^{-\hat{\alpha}_{M}-1}}{1 - (\gamma/\nu)^{\hat{\alpha}_{M}}}, \qquad F_{M}(x) = 1 - \frac{\gamma^{\hat{\alpha}_{M}}\left(x^{-\hat{\alpha}_{M}} - \nu^{-\hat{\alpha}_{M}}\right)}{1 - (\gamma/\nu)^{\hat{\alpha}_{M}}}, \tag{28}$$

where $\hat{\alpha}_{M}$ is given in (16); consider

$$f_{w}(x) = \frac{\hat{\alpha}_{w}\,\gamma^{\hat{\alpha}_{w}}\,x^{-\hat{\alpha}_{w}-1}}{1 - (\gamma/\nu)^{\hat{\alpha}_{w}}}, \qquad F_{w}(x) = 1 - \frac{\gamma^{\hat{\alpha}_{w}}\left(x^{-\hat{\alpha}_{w}} - \nu^{-\hat{\alpha}_{w}}\right)}{1 - (\gamma/\nu)^{\hat{\alpha}_{w}}}, \tag{29}$$

where $\hat{\alpha}_{w}$ is given in (18), using the weight $w_{L_1\text{-opt}} = 1/\sqrt{n(n-1)}$.

Figure 3: Box-plots of the estimators (weighted, Aban, Beg, Hill, Moment) for the shape parameter $\alpha$ by using the $L_1$-optimal weight, $\gamma = 1$, $\nu = 10$, sample size $n = 100$, and $m = 1000$ generated samples. Panels: $\alpha = 0.5$, $\alpha = 0.8$, $\alpha = 1.8$.

Table 3: Estimation of index, mean, median, and VaR for hurricane loss data.

| Method | $\hat{\alpha}$ | $\hat{\mu}$ | Median | 5% value at risk |
|---|---|---|---|---|
| Pareto (Hill) | 0.8126 | $\infty$ | 8.68 billion | 147.68 billion |
| TPD (Aban) | 0.6206 | 21.74 billion | 9.73 billion | 85.15 billion |
| TPD (Moment) | 0.6476 | 20.48 billion | 9.47 billion | 82.55 billion |
| TPD (Weighted) | 0.6990 | 19.37 billion | 9.02 billion | 77.58 billion |
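To make the link between the fitted cdfs in (26)–(29) and the median and value-at-risk columns of Table 3 concrete, the following Python sketch inverts the original and truncated Pareto cdfs. It is a sketch under stated assumptions: the function names are hypothetical, $\nu = 157$ (billion) is the largest loss quoted in Section 1.1.1, and $\gamma = 3.7$ (billion) is not stated in the paper but is back-solved here from the Hill row of Table 3 (median $= \gamma\,2^{1/\hat{\alpha}}$), so it should be read as an illustrative value only.

```python
def tpareto_cdf(x, alpha, gamma, nu):
    """cdf of the truncated Pareto distribution on [gamma, nu]."""
    if x <= gamma:
        return 0.0
    if x >= nu:
        return 1.0
    return (1.0 - (gamma / x) ** alpha) / (1.0 - (gamma / nu) ** alpha)

def tpareto_quantile(p, alpha, gamma, nu):
    """Inverse of tpareto_cdf: x = gamma * (1 - rho * p)^(-1/alpha)."""
    rho = 1.0 - (gamma / nu) ** alpha
    return gamma * (1.0 - rho * p) ** (-1.0 / alpha)

def pareto_quantile(p, alpha, gamma):
    """Inverse of the original Pareto cdf F_p(x) = 1 - (gamma / x)^alpha."""
    return gamma * (1.0 - p) ** (-1.0 / alpha)

if __name__ == "__main__":
    gamma, nu = 3.7, 157.0  # billions; gamma is an assumed, back-solved value
    # The 5% value at risk is the 0.95 quantile of the fitted loss distribution.
    print("Hill     VaR:", pareto_quantile(0.95, 0.8126, gamma))
    print("Aban     VaR:", tpareto_quantile(0.95, 0.6206, gamma, nu))
    print("Weighted VaR:", tpareto_quantile(0.95, 0.6990, gamma, nu))
```

With these assumed inputs, the quantiles come out close to the Table 3 entries (roughly 148, 85, and 78 billion), which suggests that the 5% value at risk in the table is the 0.95 quantile of the corresponding fitted cdf.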

The results of these four methods are listed in Table 3 by using the hurricane loss data, where $\gamma = X_{(\min)}$, $\nu = X_{(\max)}$, and $n = 49$.

Figure 4 is a log-log plot showing the upper tail for the hurricane loss data. In this plot, the circles represent the real data and the straight line represents the estimated original Pareto distribution. The dashed line, dotted line, and thick solid line represent the estimated truncated Pareto distributions by using Aban's, the Moment, and the weighted (Huang's) estimators, respectively. Two conclusions can be drawn from the plot.

(1) The original Pareto distribution (straight line) does not fit the data well in the tail. The truncated Pareto distribution fits the data very well using all three estimation methods (Aban, Moment, and weighted). Note that the 5% value at risk estimated by the original Pareto model is 147 billion, whereas the three truncated model estimates are about 80 billion. It appears that the original Pareto model overestimates the risk. An insurance company using the Pareto model would set a high premium, with the result that many people could not afford to buy insurance.

(2) We examine the three truncated Pareto estimates. Around the tail, the weighted estimate fits the data the best; that is, the $\ln P(X > x)$ curve turns downward more quickly, following the trend of the data pattern. This is because the weighted estimate $\hat{\alpha}_w$ is the largest among the three truncated model estimates (this is consistent with the simulation results in Figure 3), and the weighted mean estimate $\hat{\mu}_w$ and the 5% value at risk are the smallest among the three methods (this is obtained by placing less weight on the extreme value, e.g., the loss of the 1926 Great Miami Hurricane), so its estimated distribution is less heavy in the tail compared with the other methods.

Next, in order to confirm these conclusions, we run three goodness-of-fit tests. Later, we define the absolute error and integrated error as measures of the distance from the empirical data points to the estimated Pareto curve and truncated Pareto curves.

6.1.2. Goodness-of-Fit Tests. Our objective is to test if the estimated distributions in (26)–(29) fit the data properly. We test the hypotheses $H_0$: $F(x) = F^{*}(x)$ against $H_1$: $F(x) \ne F^{*}(x)$, where $F(x)$ is the true unknown distribution function and $F^{*}(x)$ is the estimated Pareto cdf in (26) or the estimated truncated Pareto cdf in (27)–(29). In this paper, we use three EDF goodness-of-fit tests.

Figure 4: Log-log plot of the hurricane loss data. Horizontal axis: $\ln(x)$; vertical axis: $\ln(P(X > x))$; curves: Data, Pareto, Aban, Moment, Weighted.

(1) The Kolmogorov-Smirnov (K-S) test [15]: the test statistic is given by

$$T = \sup_{x}\left|F^{*}(x) - S_n(x)\right|, \quad -\infty < x < \infty, \tag{30}$$

where $S_n(x)$ is the EDF in (8). Under $H_0$, the two-tailed $P$-value for the K-S test is as follows:

$$P\text{-value} = 1 - \left[1 - T\sum_{j=0}^{[n(1-T)]}\frac{n!}{j!\,(n-j)!}\left(1 - T - \frac{j}{n}\right)^{n-j}\left(T + \frac{j}{n}\right)^{j-1}\right]^{2}, \tag{31}$$

where $[n(1 - T)]$ is the integer part of $n(1 - T)$.

(2) The Anderson-Darling (A-D) test: Anderson and Darling [16] introduced a measure of "distance" between the empirical distribution $S_n(x)$ and the proposed cdf $F^{*}(x)$ by using a metric function space:

$$W_n^{2} = n\int_{-\infty}^{\infty}\left[S_n(x) - F^{*}(x)\right]^{2}\psi\left(F^{*}(x)\right)dF(x), \tag{32}$$

where $\psi(u) = 1/(u(1 - u))$ is a weight function with $u = F(x)$. Let $u_i = F(X_{(i)})$, $i = 1, \ldots, n$; under $H_0$, the test statistic and $P$-value are given by

$$W_n^{2} = -n - \frac{1}{n}\sum_{j=1}^{n}(2j - 1)\left(\log(u_j) + \log(1 - u_{n-j+1})\right),$$

$$P\text{-value} = 1 - \frac{\sqrt{2}}{z}\sum_{j=0}^{\infty}\frac{(-1)^{j}\,\Gamma(j + 1/2)\,(4j + 1)}{j!}\,e^{-(4j+1)^{2}\pi^{2}/(8z)}\int_{0}^{\infty}e^{z/(8(w^{2}+1)) - (4j+1)^{2}\pi^{2}w^{2}/(8z)}\,dw, \tag{33}$$

where $z$ is the observed value of $W_n^{2}$ and $\Gamma(x) = \int_{0}^{\infty}t^{x-1}e^{-t}\,dt$, $x > 0$, is the Gamma function.

(3) The Cramér-von Mises (C-v-M) test [16] uses $\psi(u) = 1$ in (32); thus, under $H_0$, the test statistic and $P$-value are given by

$$n\omega^{2} = \frac{1}{12n} + \sum_{j=1}^{n}\left(u_j - \frac{2j - 1}{2n}\right)^{2},$$

$$P\text{-value} = 1 - \frac{1}{\pi\sqrt{z}}\sum_{j=0}^{\infty}\frac{(-1)^{2j}\,\Gamma(j + 1/2)\,(4j + 1)^{1/2}}{j!\,\sqrt{\pi}}\,e^{-(4j+1)^{2}/(16z)}\,K_{1/4}\!\left(\frac{(4j + 1)^{2}}{16z}\right), \tag{34}$$

where $K_{1/4}(x)$ is the modified Bessel function of the second kind:

$$K_{\alpha}(x) = \frac{\pi}{2}\,\frac{I_{-\alpha}(x) - I_{\alpha}(x)}{\sin(\alpha\pi)}, \qquad I_{\alpha}(x) = \sum_{m=0}^{\infty}\frac{1}{m!\,\Gamma(m + \alpha + 1)}\left(\frac{x}{2}\right)^{2m+\alpha}. \tag{35}$$

Figure 5: Absolute errors $|F^{*}(x) - S_n(x)|$ for the hurricane loss example on the 10 largest losses. Horizontal axis: $x$; curves: Aban, Moment, Pareto, Weighted.
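The three EDF test statistics in (30), (33), and (34) are straightforward to compute once the fitted cdf has been evaluated at the order statistics. The Python sketch below is a minimal illustration, not the authors' code; the function name, the convention $S_n(x) = \#\{X_i \le x\}/n$ for the EDF in (8), and the clamping constant `eps` are assumptions. The clamp is needed because, when a truncated Pareto model is fitted with $\gamma = X_{(1)}$ and $\nu = X_{(n)}$, the fitted cdf equals 0 and 1 at the extremes and the logarithms in the A-D statistic would otherwise be undefined. The $P$-value series in (31), (33), and (34) are not implemented here.

```python
import math

def edf_statistics(sample, cdf, eps=1e-12):
    """Return the K-S, A-D, and C-v-M statistics of `sample` against `cdf`.

    `cdf` is a callable giving the fitted F*(x). Values are clamped to
    [eps, 1 - eps]; this clamp is an assumption of the sketch, not part of
    the formulas in the paper.
    """
    u = [min(max(cdf(x), eps), 1.0 - eps) for x in sorted(sample)]
    n = len(u)
    # K-S: the largest gap between F* and the EDF occurs at a jump of the EDF.
    ks = max(max((j + 1) / n - u[j], u[j] - j / n) for j in range(n))
    # A-D: with 0-based j, (2j + 1) is (2j - 1) in 1-based indexing and
    # u[n - 1 - j] is u_{n-j+1}.
    ad = -n - sum(
        (2 * j + 1) * (math.log(u[j]) + math.log(1.0 - u[n - 1 - j]))
        for j in range(n)
    ) / n
    # C-v-M: 1/(12n) plus squared distances to the plotting positions.
    cvm = 1.0 / (12 * n) + sum((u[j] - (2 * j + 1) / (2 * n)) ** 2 for j in range(n))
    return ks, ad, cvm

if __name__ == "__main__":
    # Two points checked against the uniform cdf on [0, 1].
    print(edf_statistics([0.25, 0.75], lambda x: x))
```

Passing the fitted cdf as a callable keeps the same routine usable for all four fitted models in (26)–(29).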

For fitting the 49 losses in the hurricane loss data, for example, after computing estimates of $F^{*}(x)$ by using (26), (27), (28), and (29), we compute the absolute error (AE) in (30) and define the integrated error (IE) by

$$\mathrm{IE} = \left[\int_{X_{(n-r+1)}}^{X_{(n)}}\left(S_n(x) - F^{*}(x)\right)^{2}dx\right]^{1/2}. \tag{36}$$
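Since $S_n(x)$ is a step function, the integral in (36) splits into one piece per gap between consecutive order statistics. A minimal Python sketch of this computation follows; the function name, the midpoint rule with `steps` subintervals, and the convention that $S_n(x) = k/n$ on $[X_{(k)}, X_{(k+1)})$ (no ties) are assumptions made for illustration.

```python
import math

def integrated_error(sample, cdf, r, steps=2000):
    """Integrated error (IE) over the r largest observations, 1 <= r <= n.

    Integrates (S_n(x) - F*(x))^2 from X_(n-r+1) to X_(n) with a midpoint
    rule on each gap between consecutive order statistics.
    """
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for k in range(n - r, n - 1):      # 0-based index of the left endpoint
        a, b = xs[k], xs[k + 1]
        s = (k + 1) / n                # EDF value on [xs[k], xs[k+1])
        h = (b - a) / steps
        total += h * sum((s - cdf(a + (i + 0.5) * h)) ** 2 for i in range(steps))
    return math.sqrt(total)

if __name__ == "__main__":
    # Toy check: with F*(x) = 0 the integrand is the squared EDF itself.
    print(integrated_error([1.0, 2.0, 3.0], lambda x: 0.0, r=3))
```

For $r = 1$ the integration range collapses to a single point, so the sketch returns 0.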

Figure 5 gives the absolute errors (AE) in (30) for the 10 largest losses and confirms that the original Pareto estimate has relatively larger errors and the weighted estimate has relatively smaller errors in the tail. Figure 5 explains the data fitting of the tail of the distributions in Figure 4.

We also compute the AE in (30) and IE in (36) to confirm those tail errors by using $r$ as the number of the largest losses, $1 \le r \le n$. The AE and IE values are given in Table 4 (the smallest values are marked with *).

Table 4: Errors of goodness-of-fit tests, $n = 49$, for hurricane example.

| Method | AE, $r = 49$ | AE, $r = 18$ | AE, $r = 10$ | IE, $r = 49$ | IE, $r = 18$ | IE, $r = 10$ |
|---|---|---|---|---|---|---|
| Pareto (Hill) | 0.1340 | 0.0584 | 0.0584 | 0.4844 | 0.3818 | 0.3723 |
| TPD (Aban) | 0.0948* | 0.0839 | 0.0832 | 0.3114 | 0.2565 | 0.2161 |
| TPD (Moment) | 0.1053 | 0.0738 | 0.0737 | 0.2985* | 0.2171 | 0.1825 |
| TPD (Weighted) | 0.1250 | 0.0561* | 0.0561* | 0.3098 | 0.1678* | 0.1453* |

AE: absolute error; IE: integrated error.

Table 5: The estimation of index, mean, median, and VaR for forest fire loss data.

| Method | $\hat{\alpha}$ | $\hat{\mu}$ | Median | 5% value at risk |
|---|---|---|---|---|
| Pareto (Hill) | 4.3044 | 1.096 billion | 1.09 billion | 1.855 billion |
| TPD (Aban) | 0.7964 | 1.167 billion | 1.15 billion | 1.446 billion |
| TPD (Moment) | 0.8260 | 1.166 billion | 1.15 billion | 1.446 billion |
| TPD (Weighted) | 0.8642 | 1.165 billion | 1.15 billion | 1.446 billion |

The weighted estimator has the smallest AE and IE values for the $r = 18$ and 10 largest losses, and its IE value is almost equal to the smallest IE value for $r = 49$ (all losses). We conclude that the weighted estimated distribution is the best fit in the tail of the hurricane loss data.

6.2. Forest Fire Loss Data Example. Next, we look at the forest fire loss example in Section 1.1.2. The data in Table 1 contain a relatively substantial number of large forest fire losses, which convinces us to use a truncated Pareto model and compare the four estimators in (26), (27), (28), and (29). We use the 25 largest losses in this study.

Figure 6 is a log-log plot which shows the upper tail for the forest fire loss data. The circles represent the real data and the straight line represents the estimated original Pareto distribution. The dashed line, dotted line, and thick solid line represent the estimated truncated Pareto distributions by using Aban's, the Moment, and the weighted estimators, respectively. We can see that the estimated truncated Pareto distributions fit the data very well using all three estimation methods and are much better than the original Pareto distribution. Around the tail, the weighted and the Moment estimation methods perform the best. The fact that the tail of the data curves downward in Figure 6 is evidence in support of using a truncated Pareto model.

The results of these four estimators are listed in Table 5 by using the forest fire loss data, where $\gamma = X_{(1)}$, $\nu = X_{(n)}$, and $n = 25$.

Note that the weighted estimate $\hat{\alpha}_w$ is the largest among the three truncated Pareto estimates. The 5% value at risk of the original Pareto estimate is the largest; it may be overestimated compared to the truncated Pareto models.

As in the hurricane example, we compute the AE in (30) and IE in (36) again for fitting the 25 losses of the forest fire loss data.

Figure 7 gives the absolute errors (AE) in (30) for the 10 largest losses and confirms that the original Pareto estimate has relatively larger errors and the weighted estimate has relatively smaller errors in the tail. Figure 7 explains the data fitting of the tail of the distributions in Figure 6.

Figure 6: Log-log plot for forest fire example. Horizontal axis: $\ln(x)$; vertical axis: $\ln(P(X > x))$; curves: Data, Pareto, Aban, Moment, Weighted.

We also compute the AE in (30) and IE in (36) to confirm those tail errors by using $r$ as the number of the largest losses, $1 \le r \le n$. The AE and IE values are given in Table 6 (the smallest values are marked with *).

The weighted estimator has the smallest AE and IE values for $r = 10$ and 5, and its IE value for $r = 25$ is almost equal to the smallest IE value. We conclude that the weighted estimated distribution is the best fit to the tail of the forest fire loss data.

Table 6: Errors of goodness-of-fit tests, $n = 25$, for forest fire loss data.

| Method | AE, $r = 25$ | AE, $r = 10$ | AE, $r = 5$ | IE, $r = 25$ | IE, $r = 10$ | IE, $r = 5$ |
|---|---|---|---|---|---|---|
| Pareto (Hill) | 0.2656 | 0.1288 | 0.1288 | 2.7125 | 1.1493 | 1.0985 |
| TPD (Aban) | 0.1503* | 0.1195 | 0.1195 | 1.3687* | 0.7911 | 0.5430 |
| TPD (Moment) | 0.1520 | 0.0982 | 0.0982 | 1.3736 | 0.7764 | 0.5430 |
| TPD (Weighted) | 0.1542 | 0.0964* | 0.0964* | 1.3796 | 0.7577* | 0.5203* |

AE: absolute error; IE: integrated error.

Figure 7: Absolute errors $|F^{*}(x) - S_n(x)|$ for the forest fire loss example on the 10 largest losses. Horizontal axis: $x$; curves: Aban, Moment, Pareto, Weighted.

7. Conclusions

In the complicated real world, it is difficult to construct a model combining all the desired features. In general, the final model selection depends on the best fitting model. The criteria are based on goodness-of-fit tests, existence of the moments, characteristic largest values, and log-log plots. The hurricane loss data and forest fire loss data are well fitted by the truncated Pareto distribution. In summary,

(a) we recommend the truncated Pareto model as an appropriate loss distribution when analyzing huge risk loss data sets. The upper and lower limits can be set by the largest and smallest losses or other reasonable values;

(b) the estimated loss distribution provides a prediction of the next disaster's 5% value at risk. The largest loss in the data set plays an important role in the prediction. In Figures 4 and 6, the trend of the tail distribution is crucial for an insurance company setting policy and also for inhabitants and governments making plans to minimize damage from natural disasters;

(c) the semiparametric methods (Moment and weighted) are robust, easy to use, more stable, and fit the data better than the MLE methods. In both of the foregoing examples, the estimated shape parameter is less than 1. With the original Pareto model, the Moment and the weighted methods cannot be used. This is another advantage of using the truncated Pareto model;

(d) the statistical inference in these examples shows that the distribution curve estimated by the proposed weighted estimation method fits the tails of the data better than the Moment and MLE estimators. This is due to the fact that the weighted estimator places less weight on the extreme values and has a good MSE. Based on these studies, we suggest that further studies on the usage of weights may be useful.

Appendix

Mathematical Derivations

Lemma A.1. For a truncated Pareto random variable $X$ with pdf given in (3), for $i, j \in \{1, 2, \ldots, n\}$, $i < j$, where $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are order statistics, one has

$$E[X_{(i)}] = \gamma\sum_{k=0}^{\infty}\binom{\frac{1}{\alpha} + k - 1}{k}\rho^{k}\,\frac{\Gamma(n + 1)\,\Gamma(k + i)}{\Gamma(i)\,\Gamma(k + n + 1)}, \tag{A.1}$$

$$E[X_{(i)}^{2}] = \gamma^{2}\sum_{k=0}^{\infty}\binom{\frac{2}{\alpha} + k - 1}{k}\rho^{k}\,\frac{\Gamma(n + 1)\,\Gamma(k + i)}{\Gamma(i)\,\Gamma(k + n + 1)}, \tag{A.2}$$

$$E[X_{(i)}X_{(j)}] = \gamma^{2}\sum_{k=0}^{\infty}\sum_{l=0}^{\infty}\binom{\frac{1}{\alpha} + k - 1}{k}\binom{\frac{1}{\alpha} + l - 1}{l}\rho^{k+l}\left[\frac{\Gamma(n + 1)\,\Gamma(k + i)\,\Gamma(k + l + j)}{\Gamma(i)\,\Gamma(k + j)\,\Gamma(k + l + n + 1)}\right], \tag{A.3}$$

where $\rho = (g^{\alpha} - 1)/g^{\alpha}$, $g = \nu/\gamma > 1$, $0 < \rho < 1$, $0 < \gamma \le x \le \nu < \infty$, and $\alpha > 0$.

Proof. Let the cdf of a truncated Pareto distribution in (4) be

$$F(x) = u, \qquad x = \gamma(1 - \rho u)^{-1/\alpha}, \qquad \rho = \frac{g^{\alpha} - 1}{g^{\alpha}}, \qquad g = \frac{\nu}{\gamma}. \tag{A.4}$$

By the theory of order statistics, we have

$$f_{i}(x) = \frac{\Gamma(n + 1)}{\Gamma(i)\,\Gamma(n - i + 1)}[F(x)]^{i-1}[1 - F(x)]^{n-i}f(x), \tag{A.5}$$

$$f_{ij}(x, y) = \frac{\Gamma(n + 1)}{\Gamma(i)\,\Gamma(j - i)\,\Gamma(n - j + 1)}[F(x)]^{i-1}[F(y) - F(x)]^{j-i-1}[1 - F(y)]^{n-j}f(x)f(y). \tag{A.6}$$

Using the binomial formula $1/(1 - x)^{r} = \sum_{k=0}^{\infty}\binom{r + k - 1}{k}x^{k}$, for $m = 1, 2$, we have

$$\begin{aligned}
E[X_{(i)}^{m}] &= \int_{\gamma}^{\nu}x^{m}f_{i}(x)\,dx \\
&= \int_{0}^{1}\gamma^{m}(1 - \rho u)^{-m/\alpha}\,\frac{\Gamma(n + 1)}{\Gamma(i)\,\Gamma(n - i + 1)}\,u^{i-1}(1 - u)^{n-i}\,du \\
&= \frac{\Gamma(n + 1)}{\Gamma(i)\,\Gamma(n - i + 1)}\,\gamma^{m}\sum_{k=0}^{\infty}\binom{\frac{m}{\alpha} + k - 1}{k}\int_{0}^{1}\rho^{k}u^{k+i-1}(1 - u)^{n-i}\,du \\
&= \gamma^{m}\sum_{k=0}^{\infty}\binom{\frac{m}{\alpha} + k - 1}{k}\rho^{k}\,\frac{\Gamma(n + 1)\,\Gamma(k + i)}{\Gamma(i)\,\Gamma(k + n + 1)}.
\end{aligned} \tag{A.7}$$

By substituting $m = 1$ and 2, respectively, we have (A.1) and (A.2). Next, writing $u = F(x)$ and $h = F(y)$ with $0 \le u \le h \le 1$,

$$\begin{aligned}
E[X_{(i)}X_{(j)}] &= \int_{\gamma}^{\nu}\int_{\gamma}^{y}x\,y\,f_{ij}(x, y)\,dx\,dy \\
&= \frac{\Gamma(n + 1)}{\Gamma(i)\,\Gamma(j - i)\,\Gamma(n - j + 1)}\int_{0}^{1}\int_{0}^{h}\gamma(1 - \rho u)^{-1/\alpha}\,\gamma(1 - \rho h)^{-1/\alpha}\,u^{i-1}(h - u)^{j-i-1}(1 - h)^{n-j}\,du\,dh \\
&= \frac{\Gamma(n + 1)}{\Gamma(i)\,\Gamma(j - i)\,\Gamma(n - j + 1)}\,\gamma^{2}\sum_{k=0}^{\infty}\sum_{l=0}^{\infty}\binom{\frac{1}{\alpha} + k - 1}{k}\binom{\frac{1}{\alpha} + l - 1}{l}\rho^{k+l} \\
&\qquad\times\int_{0}^{1}\int_{0}^{1}\left(\frac{u}{h}\right)^{k+i-1}\left(1 - \frac{u}{h}\right)^{j-i-1}h^{k+l+j-1}(1 - h)^{n-j}\,d\!\left(\frac{u}{h}\right)dh \\
&= \gamma^{2}\sum_{k=0}^{\infty}\sum_{l=0}^{\infty}\binom{\frac{1}{\alpha} + k - 1}{k}\binom{\frac{1}{\alpha} + l - 1}{l}\rho^{k+l}\,\frac{\Gamma(n + 1)\,\Gamma(k + i)\,\Gamma(k + l + j)}{\Gamma(i)\,\Gamma(k + j)\,\Gamma(k + l + n + 1)}.
\end{aligned} \tag{A.8}$$
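The series in (A.1) can be checked numerically against the defining integral $E[X_{(i)}] = \int_{0}^{1}\gamma(1 - \rho u)^{-1/\alpha}\,\frac{\Gamma(n+1)}{\Gamma(i)\Gamma(n-i+1)}u^{i-1}(1-u)^{n-i}\,du$. The Python sketch below does this under stated assumptions: the function names, the truncation of the series at `terms`, and the midpoint quadrature are illustrative choices, and the generalized binomial coefficient is evaluated as $\binom{r+k-1}{k} = \Gamma(r+k)/(\Gamma(r)\,k!)$ through log-Gamma values to avoid overflow.

```python
import math

def order_stat_mean_series(i, n, alpha, gamma, nu, terms=2000):
    """E[X_(i)] from the series (A.1), truncated after `terms` terms."""
    rho = 1.0 - (gamma / nu) ** alpha
    r = 1.0 / alpha
    total = 0.0
    for k in range(terms):
        # log of the generalized binomial coefficient C(r + k - 1, k)
        log_binom = math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
        log_ratio = (math.lgamma(n + 1) + math.lgamma(k + i)
                     - math.lgamma(i) - math.lgamma(k + n + 1))
        total += math.exp(log_binom + k * math.log(rho) + log_ratio)
    return gamma * total

def order_stat_mean_quad(i, n, alpha, gamma, nu, steps=20000):
    """E[X_(i)] by midpoint quadrature of the order-statistic integral in u = F(x)."""
    rho = 1.0 - (gamma / nu) ** alpha
    log_c = math.lgamma(n + 1) - math.lgamma(i) - math.lgamma(n - i + 1)
    h = 1.0 / steps
    total = 0.0
    for s in range(steps):
        u = (s + 0.5) * h
        weight = math.exp(log_c + (i - 1) * math.log(u) + (n - i) * math.log(1.0 - u))
        total += gamma * (1.0 - rho * u) ** (-1.0 / alpha) * weight
    return total * h

if __name__ == "__main__":
    args = (3, 5, 0.8, 1.0, 10.0)  # i, n, alpha, gamma, nu
    print(order_stat_mean_series(*args), order_stat_mean_quad(*args))
```

For $n = i = 1$ the series reduces to the mean of the truncated Pareto distribution itself, which gives a second, closed-form check.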

Proof of Theorem 3. When $\alpha > 0$, we use

$$\begin{aligned}
E[\hat{\mu}_w] &= w\sum_{i=2}^{n-1}E[X_{(i)}] + \frac{1}{2}\left(1 - (n - 2)w\right)\left(E[X_{(1)}] + E[X_{(n)}]\right), \\
\mathrm{MSE}(\hat{\mu}_w) &= E\left[(\hat{\mu}_w - \mu)^{2}\right] = E[\hat{\mu}_w^{2}] - 2\mu E[\hat{\mu}_w] + \mu^{2}.
\end{aligned} \tag{A.9}$$

And we have

$$\begin{aligned}
E[\hat{\mu}_w^{2}] &= \sum_{i=2}^{n-1}w^{2}E[X_{(i)}^{2}] + 2\sum_{i=2}^{n-2}\sum_{j=i+1}^{n-1}w^{2}E[X_{(i)}X_{(j)}] \\
&\quad + \frac{1}{4}\left(1 - (n - 2)w\right)^{2}\left(E[X_{(1)}^{2}] + E[X_{(n)}^{2}] + 2E[X_{(1)}X_{(n)}]\right) \\
&\quad + \sum_{i=2}^{n-1}w\left(1 - (n - 2)w\right)\left(E[X_{(1)}X_{(i)}] + E[X_{(i)}X_{(n)}]\right).
\end{aligned} \tag{A.10}$$

Let

$$\begin{aligned}
A &= \sum_{i=2}^{n-1}E[X_{(i)}^{2}] + 2\sum_{i=2}^{n-2}\sum_{j=i+1}^{n-1}E[X_{(i)}X_{(j)}], \\
B &= E[X_{(1)}^{2}] + E[X_{(n)}^{2}] + 2E[X_{(1)}X_{(n)}], \\
C &= \sum_{i=2}^{n-1}\left(E[X_{(1)}X_{(i)}] + E[X_{(i)}X_{(n)}]\right), \\
D &= \sum_{i=2}^{n-1}E[X_{(i)}], \\
E &= E[X_{(1)}] + E[X_{(n)}].
\end{aligned} \tag{A.11}$$

Then

$$\begin{aligned}
\mathrm{MSE}(\hat{\mu}_w) &= Aw^{2} + \frac{1}{4}\left(1 - (n - 2)w\right)^{2}B + w\left(1 - (n - 2)w\right)C - 2\mu\left(Dw + \frac{1}{2}\left(1 - (n - 2)w\right)E\right) + \mu^{2} \\
&= \left(A + \frac{1}{4}(n - 2)^{2}B - (n - 2)C\right)w^{2} + \left(-\frac{1}{2}(n - 2)B + C - 2\mu D + \mu(n - 2)E\right)w + \frac{1}{4}B - \mu E + \mu^{2}.
\end{aligned} \tag{A.12}$$

Now use Lemma A.1; then we have Theorem 3.

Lemma A.2. $\mathrm{MSE}(\hat{\mu}_w)$ in (20) is a convex function with $a > 0$ when $n > 4C/B + 2$, where $C$ and $B$ are given in (20).

Proof. Let $f(w) = \mathrm{MSE}(\hat{\mu}_w) = aw^{2} + bw + c$. We have

$$a = A + \frac{1}{4}(n - 2)^{2}B - (n - 2)C, \tag{A.13}$$

where

$$\begin{aligned}
A &= E\left[\left(\sum_{i=2}^{n-1}X_{(i)}\right)^{2}\right] \ge 0, \\
B &= E\left[\left(X_{(1)} + X_{(n)}\right)^{2}\right] \ge 0, \\
C &= E\left[\sum_{i=2}^{n-1}X_{(i)}\left(X_{(1)} + X_{(n)}\right)\right] \ge 0.
\end{aligned} \tag{A.14}$$

Since $n > 2$ and $a = A + (n - 2)\left[\frac{1}{4}(n - 2)B - C\right]$ with $A \ge 0$, we have

$$\frac{1}{4}(n - 2)B - C > 0, \text{ such that } a > 0, \text{ when } n > \frac{4C}{B} + 2. \tag{A.15}$$

Proof of Corollary 5. By Theorem 3 and Lemma A.2, let

$$f(w) = \mathrm{MSE}(\hat{\mu}_w) = aw^{2} + bw + c, \tag{A.16}$$

which is a quadratic function with first and second derivatives of $f(w)$ with respect to $w$

$$f'(w) = 2aw + b, \qquad f''(w) = 2a > 0. \tag{A.17}$$

Hence, $f(w)$ is a convex function with minimum value

$$f_{\min} = a\left(-\frac{b}{2a}\right)^{2} + b\left(-\frac{b}{2a}\right) + c = -\frac{b^{2}}{4a} + c. \tag{A.18}$$

The maximum value of $\mathrm{EFF}(\hat{\mu}_w)$ is

$$\mathrm{EFF}_{\max}(\hat{\mu}_w) = \frac{\alpha\gamma^{\alpha}\left(\nu^{2-\alpha} - \gamma^{2-\alpha}\right)}{(2 - \alpha)\left(1 - (\gamma/\nu)^{\alpha}\right)n\left(-b^{2}/(4a) + c\right)} - \frac{\alpha^{2}\gamma^{2\alpha}\left(\nu^{1-\alpha} - \gamma^{1-\alpha}\right)^{2}}{(1 - \alpha)^{2}\left(1 - (\gamma/\nu)^{\alpha}\right)^{2}n\left(-b^{2}/(4a) + c\right)}, \quad \text{at } w = -\frac{b}{2a}. \tag{A.19}$$
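The quadratic form in (A.12) and the minimizer $w = -b/(2a)$ can be illustrated by replacing the exact moments $A, \ldots, E$ of (A.11) with Monte Carlo averages. The Python sketch below is such an illustration under stated assumptions: the function names, sample sizes, and seed are hypothetical, and the resulting weight is a simulation-based approximation rather than the exact $w_{L_2\text{-opt}}$ of (24). With sample averages in place of expectations, $aw^{2} + bw + c$ coincides (up to rounding) with the directly computed average of $(\hat{\mu}_w - \mu)^{2}$ over the same samples, which gives a simple internal check.

```python
import random

def tpareto_samples(m, n, alpha, gamma, nu, seed=0):
    """m truncated Pareto samples of size n, drawn by inverting the cdf."""
    rng = random.Random(seed)
    rho = 1.0 - (gamma / nu) ** alpha
    return [[gamma * (1.0 - rho * rng.random()) ** (-1.0 / alpha) for _ in range(n)]
            for _ in range(m)]

def mse_quadratic(samples, mu):
    """Coefficients (a, b, c) of MSE(w) = a w^2 + b w + c as in (A.12),
    with A, B, C, D, E of (A.11) replaced by sample averages."""
    m, n = len(samples), len(samples[0])
    A = B = C = D = E = 0.0
    for x in samples:
        xs = sorted(x)
        inner = sum(xs[1:-1])   # X_(2) + ... + X_(n-1)
        ends = xs[0] + xs[-1]   # X_(1) + X_(n)
        A += inner ** 2
        B += ends ** 2
        C += inner * ends
        D += inner
        E += ends
    A, B, C, D, E = (v / m for v in (A, B, C, D, E))
    a = A + 0.25 * (n - 2) ** 2 * B - (n - 2) * C
    b = -0.5 * (n - 2) * B + C - 2.0 * mu * D + mu * (n - 2) * E
    c = 0.25 * B - mu * E + mu ** 2
    return a, b, c

def direct_mse(samples, mu, w):
    """Average of (mu_w - mu)^2 computed sample by sample."""
    n = len(samples[0])
    total = 0.0
    for x in samples:
        xs = sorted(x)
        mu_w = w * sum(xs[1:-1]) + 0.5 * (1.0 - (n - 2) * w) * (xs[0] + xs[-1])
        total += (mu_w - mu) ** 2
    return total / len(samples)

if __name__ == "__main__":
    samples = tpareto_samples(2000, 10, 0.8, 1.0, 10.0)
    mu = 2.7802  # mean of the truncated Pareto for alpha = 0.8, gamma = 1, nu = 10
    a, b, c = mse_quadratic(samples, mu)
    w_opt = -b / (2.0 * a)
    print("w_opt =", w_opt, " minimum MSE =", c - b * b / (4.0 * a))
```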

Acknowledgments

The authors thank the referees and the editor for their comments, which helped to improve the paper. This research is supported by the Natural Sciences and Engineering Research Council of Canada.

References

[1] R. A. Pielke Jr., J. Gratz, C. W. Landsea, D. Collins, M. A. Saunders, and R. Musulin, "Normalized hurricane damage in the United States: 1900–2005," Natural Hazards Review, vol. 9, no. 1, pp. 29–42, 2008.

[2] P. Embrechts, C. Klüppelberg, and T. Mikosch, Modelling Extremal Events for Insurance and Finance, Springer, New York, NY, USA, 2003.

[3] C. Kleiber and S. Kotz, Statistical Size Distribution in Economics and Actuarial Sciences, John Wiley & Sons, New York, NY, USA, 2003.

[4] J. Beirlant, Y. Goegebeur, J. Segers, and J. Teugels, Statistics of Extremes: Theory and Application, John Wiley & Sons, New York, NY, USA, 2005.

[5] M. A. Beg, "Estimation of the tail probability of the truncated Pareto distribution," Journal of Information & Optimization Sciences, vol. 2, no. 2, pp. 192–198, 1981.

[6] I. B. Aban, M. M. Meerschaert, and A. K. Panorska, "Parameter estimation for the truncated Pareto distribution," Journal of the American Statistical Association, vol. 101, no. 473, pp. 270–277, 2006.

[7] B. Efron, "Bootstrap methods: another look at the jackknife," The Annals of Statistics, vol. 7, no. 1, pp. 1–26, 1979.

[8] P. Barbe and P. Bertail, The Weighted Bootstrap, Springer, New York, NY, USA, 1995.

[9] G. R. Shorack and J. A. Wellner, Empirical Processes with Applications to Statistics, John Wiley & Sons, New York, NY, USA, 1986.

[10] H. L. Koul, Weighted Empirical and Linear Models, vol. 21 of Lecture Notes-Monograph Series, Institute of Mathematical Statistics, Hayward, Calif, USA, 1992.

[11] M. L. Huang and P. H. Brill, "A distribution estimation method based on level crossings," Journal of Statistical Planning and Inference, vol. 124, no. 1, pp. 45–62, 2004.

[12] M. L. Huang, "The efficiencies of a weighted distribution function estimator," in The Proceeding of American Statistical Association, Nonparametric Statistics Section, pp. 1502–1506, 2003.

[13] B. M. Hill, "A simple general approach to inference about the tail of a distribution," The Annals of Statistics, vol. 3, no. 5, pp. 1163–1174, 1975.

[14] P. Cooke, "Statistical inference for bounds of random variables," Biometrika, vol. 66, no. 2, pp. 367–374, 1979.

[15] A. N. Kolmogorov, "Sulla determinazione empirica di una legge di distribuzione," Giornale dell'Istituto Italiano degli Attuari, vol. 4, pp. 83–91, 1933.

[16] T. W. Anderson and D. A. Darling, "Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes," The Annals of Mathematical Statistics, vol. 23, pp. 193–212, 1952.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: Research Article A Weighted Estimation for Risk Modeldownloads.hindawi.com › archive › 2013 › 829131.pdf · 2019-07-31 · Research Article A Weighted Estimation for Risk Model

ISRN Probability and Statistics 7

Table 3 Estimation of index mean median and VaR for hurricane loss data

Method 120583 Median 5 value at riskPareto

(Hill) 08126 infin 868 billion 14768 billionTPD

(Aban) 06206 2174 billion 973 billion 8515 billionTPD

(Moment) 06476 2048 billion 947 billion 8255 billionTPD

(Weighted) 06990 1937 billion 902 billion 7758 billion

1614121

080604020

181614121

080604020

181614121

2

262422

080

2 4 6 8 10Weighted Aban Beg Hill Moment

2 4 6 8 10Weighted Aban Beg Hill Moment

2 4 6 8 10Weighted Aban Beg Hill Moment

120572 = 05

120572 = 08

120572 = 18

minus02

Figure 3 Box-plot of the estimators for the shape parameter 120572 byusing 119871

1-optimal weight 120574 = 1 ] = 10 sample size 119899 = 100 and

generating timesm = 1000

where $\hat{\alpha}_{\mathrm{MLE}}$ is Aban's MLE given in (14); consider

\[
f_{\hat{\alpha}_M}(x) = \frac{\hat{\alpha}_M \gamma^{\hat{\alpha}_M} x^{-\hat{\alpha}_M - 1}}{1 - (\gamma/\nu)^{\hat{\alpha}_M}}, \qquad
F_{\hat{\alpha}_M}(x) = 1 - \frac{\gamma^{\hat{\alpha}_M} \left( x^{-\hat{\alpha}_M} - \nu^{-\hat{\alpha}_M} \right)}{1 - (\gamma/\nu)^{\hat{\alpha}_M}}, \tag{28}
\]

where $\hat{\alpha}_M$ is given in (16); consider

\[
f_{\hat{\alpha}_w}(x) = \frac{\hat{\alpha}_w \gamma^{\hat{\alpha}_w} x^{-\hat{\alpha}_w - 1}}{1 - (\gamma/\nu)^{\hat{\alpha}_w}}, \qquad
F_{\hat{\alpha}_w}(x) = 1 - \frac{\gamma^{\hat{\alpha}_w} \left( x^{-\hat{\alpha}_w} - \nu^{-\hat{\alpha}_w} \right)}{1 - (\gamma/\nu)^{\hat{\alpha}_w}}, \tag{29}
\]

where $\hat{\alpha}_w$ is given in (18), using the weight $w_{L_1\text{-opt}} = 1/\sqrt{n(n-1)}$.
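As a rough illustration of the fitted truncated Pareto model in (28) and (29), the following Python sketch evaluates the pdf, the cdf, and the quantile function for a given shape estimate, together with the weight $1/\sqrt{n(n-1)}$. The function names and the parameter values ($\alpha = 0.8$, $\gamma = 1$, $\nu = 10$, borrowed from the simulation settings of Figure 3) are illustrative assumptions rather than the authors' code, and reading the 5% value at risk as the 0.95 quantile is likewise an assumption.

```python
import math

def tp_pdf(x, alpha, gamma, nu):
    """Truncated Pareto pdf on [gamma, nu]; zero outside the support."""
    if x < gamma or x > nu:
        return 0.0
    return alpha * gamma**alpha * x**(-alpha - 1.0) / (1.0 - (gamma / nu)**alpha)

def tp_cdf(x, alpha, gamma, nu):
    """Truncated Pareto cdf, clipped to 0 below gamma and to 1 above nu."""
    if x <= gamma:
        return 0.0
    if x >= nu:
        return 1.0
    return 1.0 - gamma**alpha * (x**(-alpha) - nu**(-alpha)) / (1.0 - (gamma / nu)**alpha)

def tp_quantile(u, alpha, gamma, nu):
    """Inverse cdf: x = gamma * (1 - rho*u)^(-1/alpha), rho = 1 - (gamma/nu)^alpha."""
    rho = 1.0 - (gamma / nu)**alpha
    return gamma * (1.0 - rho * u)**(-1.0 / alpha)

def l1_opt_weight(n):
    """Weight 1/sqrt(n(n-1)) quoted in the text for the weighted estimator."""
    return 1.0 / math.sqrt(n * (n - 1))

# Hypothetical illustration values (not estimates from the paper's data).
alpha, gamma, nu = 0.8, 1.0, 10.0
var_5 = tp_quantile(0.95, alpha, gamma, nu)  # assumed reading of the 5% VaR
print(var_5, tp_cdf(var_5, alpha, gamma, nu), l1_opt_weight(49))
```

Because the support is bounded by $\nu$, every quantile is finite, which is why the truncated model yields a finite mean and value at risk even when the shape estimate is below 1.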

The results of these four methods are listed in Table 3, using the hurricane loss data, where $\gamma = X_{(\min)}$, $\nu = X_{(\max)}$, and $n = 49$.

Figure 4 is a log-log plot showing the upper tail of the hurricane loss data. In this plot, the circles represent the real data, and the straight line represents the estimated original Pareto distribution. The dashed line, dotted line, and thick solid line represent the truncated Pareto distributions estimated by Aban's, the Moment, and Huang's estimators, respectively. We draw two intuitive conclusions.

(1) The original Pareto distribution (straight line) does not fit the data well in the tail. The truncated Pareto distribution fits the data very well under all three estimation methods (Aban, Moment, and weighted). Note that the 5% value at risk estimated by the original Pareto model is 147 billion, whereas the three truncated model estimates are about 80 billion. It appears that the original Pareto model overestimates the risk. An insurance company using the Pareto model would set a high premium, with the result that many people could not afford to buy insurance.

(2) We examine the three truncated Pareto estimates. Around the tail, the weighted estimate fits the data best; that is, the $\ln(P(X > x))$ curve turns downward more quickly, following the trend of the data pattern. This is because the weighted estimate $\hat{\alpha}_w$ is the largest among the three truncated model estimates (this is consistent with the simulation results in Figure 4), and the weighted mean estimate $\hat{\mu}_w$ and the 5% value at risk are the smallest among the three methods (this is obtained by placing less weight on the extreme value, e.g., the 1926 Great Miami Hurricane's loss), so its estimated distribution is less heavy in the tail than those of the other methods.

Next, in order to confirm these conclusions, we run three goodness-of-fit tests. We then define the absolute error and the integrated error as measures of the distance from the empirical data points to the estimated Pareto curve and the truncated Pareto curves.

6.1.2. Goodness-of-Fit Tests. Our objective is to test whether the estimated distributions in (26)–(29) fit the data properly. We test the hypotheses $H_0: F(x) = F^*(x)$ against $H_1: F(x) \neq F^*(x)$, where $F(x)$ is the true unknown distribution function and $F^*(x)$ is the estimated Pareto cdf in (26) or the estimated truncated Pareto cdf in (27)–(29). In this paper, we use three EDF goodness-of-fit tests.

Figure 4: Log-log plot of the hurricane loss data, $\ln(P(X > x))$ against $\ln(x)$; legend: Data, Pareto, Aban, Moment, Weighted.

(1) The Kolmogorov-Smirnov (K-S) test [15]: the test statistic is given by

\[
T = \sup_x \left| F^*(x) - S_n(x) \right|, \quad -\infty < x < \infty, \tag{30}
\]

where $S_n(x)$ is the EDF in (8). Under $H_0$, the two-tailed $P$-value for the K-S test is as follows:

\[
P\text{-value} = 1 - \left[ 1 - T \sum_{j=0}^{[n(1-T)]} \frac{n!}{j!\,(n-j)!} \left( 1 - T - \frac{j}{n} \right)^{n-j} \left( T + \frac{j}{n} \right)^{j-1} \right]^2, \tag{31}
\]

where $[n(1-T)]$ is the integer part of $n(1-T)$.

(2) The Anderson-Darling (A-D) test [16] introduces a measure of "distance" between the empirical distribution $S_n(x)$ and the proposed cdf $F^*(x)$ by using a metric function space:

\[
W_n^2 = n \int_{-\infty}^{\infty} \left[ S_n(x) - F^*(x) \right]^2 \psi\left( F^*(x) \right) dF, \tag{32}
\]

where $\psi(u) = 1/(u(1-u))$ is a weight function with $u = F(x)$. Let $u_i = F(X_{(i)})$, $i = 1, \ldots, n$; under $H_0$, the test statistic and $P$-value are given by

\[
\begin{aligned}
W_n^2 &= -n - \frac{1}{n} \sum_{j=1}^{n} (2j - 1) \left( \log(u_j) + \log(1 - u_{n-j+1}) \right), \\
P\text{-value} &= 1 - \frac{\sqrt{2}}{z} \sum_{j=0}^{\infty} \frac{(-1)^j \, \Gamma(j + 1/2) \, (4j+1)}{j!} \, e^{-(4j+1)^2 \pi^2 / (8z)} \\
&\qquad \times \int_0^{\infty} e^{\, z/(8(w^2+1)) - (4j+1)^2 \pi^2 w^2 / (8z)} \, dw,
\end{aligned} \tag{33}
\]

Figure 5: Absolute errors $|F^*(x) - S_n(x)|$ for the hurricane loss example on the 10 largest losses, plotted against $x$; legend: Aban, Moment, Pareto, Weighted.

where $z$ is the observed value of $W_n^2$ and $\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t} \, dt$, $x > 0$, is the Gamma function.

(3) The Cramér-von Mises (C-v-M) test [16] uses $\psi(u) = 1$ in (32); thus, under $H_0$, the test statistic and $P$-value are given by

\[
\begin{aligned}
n\omega^2 &= \frac{1}{12n} + \sum_{j=1}^{n} \left( u_j - \frac{2j-1}{2n} \right)^2, \\
P\text{-value} &= 1 - \frac{1}{\pi \sqrt{z}} \sum_{j=0}^{\infty} \frac{(-1)^{2j} \, \Gamma(j + 1/2) \, (4j+1)^{1/2}}{j! \sqrt{\pi}} \, e^{-(4j+1)^2/(16z)} \\
&\qquad \times K_{1/4}\left( \frac{(4j+1)^2}{16z} \right),
\end{aligned} \tag{34}
\]

where $K_{1/4}(x)$ is the modified Bessel function of the second kind,

\[
K_{\alpha}(x) = \frac{\pi}{2} \, \frac{I_{-\alpha}(x) - I_{\alpha}(x)}{\sin(\alpha\pi)}, \qquad
I_{\alpha}(x) = \sum_{m=0}^{\infty} \frac{1}{m! \, \Gamma(m + \alpha + 1)} \left( \frac{x}{2} \right)^{2m+\alpha}. \tag{35}
\]
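The three EDF statistics above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard EDF that jumps by $1/n$ at each order statistic (the paper's definition in (8) is not reproduced here); the function name `edf_statistics` and the toy sample are hypothetical. The $P$-value series in (31), (33), and (34) are not implemented. Note that the A-D sum needs $0 < u_j < 1$, so a fitted cdf that returns exactly 0 or 1 at the sample extremes would have to be handled separately.

```python
import math

def edf_statistics(sample, cdf):
    """Return (T, W2, CvM): the K-S, A-D, and C-v-M statistics of a
    sample against a fitted cdf, using the computational forms above."""
    x = sorted(sample)
    n = len(x)
    u = [cdf(v) for v in x]  # u_j = F*(X_(j)), stored 0-based

    # K-S: the supremum is attained at, or just before, an order statistic.
    T = max(max((j + 1) / n - u[j], u[j] - j / n) for j in range(n))

    # A-D: W_n^2 = -n - (1/n) * sum (2j-1) [log u_j + log(1 - u_{n-j+1})].
    W2 = -n - sum(
        (2 * j - 1) * (math.log(u[j - 1]) + math.log(1.0 - u[n - j]))
        for j in range(1, n + 1)
    ) / n

    # C-v-M: n*omega^2 = 1/(12n) + sum (u_j - (2j-1)/(2n))^2.
    cvm = 1.0 / (12 * n) + sum(
        (u[j - 1] - (2 * j - 1) / (2.0 * n)) ** 2 for j in range(1, n + 1)
    )
    return T, W2, cvm

# Toy check against the uniform cdf on (0, 1); values are illustrative only.
print(edf_statistics([0.1, 0.4, 0.7], lambda v: v))
```

Any of the fitted cdfs in (26)–(29) can be passed as the `cdf` argument, which keeps the comparison across estimators to one function call each.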

For fitting the 49 losses in the hurricane loss data, for example, after computing estimates of $F^*(x)$ by using (26), (27), (28), and (29), we compute the absolute error (AE) in (30) and define the integrated error (IE) by

\[
\mathrm{IE} = \left[ \int_{X_{(n-r+1)}}^{X_{(n)}} \left( S_n(x) - F^*(x) \right)^2 dx \right]^{1/2}. \tag{36}
\]
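The integrated error in (36) can be approximated numerically. The sketch below is a minimal illustration, assuming the standard right-continuous EDF (equal to $k/n$ between the $k$th and $(k+1)$th order statistics) and a midpoint rule on each such interval; the function name, the `steps` parameter, and the toy data are hypothetical, and the result depends on the units in which $x$ is measured.

```python
import math

def integrated_error(sample, cdf, r, steps=200):
    """Approximate IE over the r largest observations:
    sqrt( integral from X_(n-r+1) to X_(n) of (S_n(x) - F*(x))^2 dx )."""
    x = sorted(sample)
    n = len(x)
    total = 0.0
    # Walk the intervals [X_(k), X_(k+1)] for k = n-r+1, ..., n-1 (1-based).
    for k in range(n - r + 1, n):
        a, b = x[k - 1], x[k]
        s = k / n                 # EDF value on [X_(k), X_(k+1))
        h = (b - a) / steps
        for m in range(steps):    # midpoint rule on this interval
            mid = a + (m + 0.5) * h
            total += (s - cdf(mid)) ** 2 * h
    return math.sqrt(total)

# Toy check: two points and the uniform cdf give sqrt(1/12).
print(integrated_error([0.0, 1.0], lambda v: v, r=2))
```

With $r = n$ the integral covers the whole sample range, while small $r$ isolates the fit in the upper tail, which is how the $r$ columns of the error tables are read.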

Figure 5 gives the absolute errors (AE) in (30) for the 10 largest losses and confirms that the original Pareto estimate has relatively larger errors and the weighted estimate has relatively smaller errors in the tail. Figure 5 explains the data fitting of the tail of the distributions in Figure 4.

We also compute the AE in (30) and IE in (36) to confirm those tail errors, using $r$ as the number of largest losses, $1 \le r \le n$. The AE and IE values are given in Table 4 (the smallest values are marked with $*$).

Table 4: Errors of goodness-of-fit tests, $n = 49$, for the hurricane example.

                    Absolute error (AE)              Integrated error (IE)
Method              r = 49    r = 18    r = 10       r = 49    r = 18    r = 10
Pareto (Hill)       0.1340    0.0584    0.0584       0.4844    0.3818    0.3723
TPD (Aban)          0.0948*   0.0839    0.0832       0.3114    0.2565    0.2161
TPD (Moment)        0.1053    0.0738    0.0737       0.2985*   0.2171    0.1825
TPD (Weighted)      0.1250    0.0561*   0.0561*      0.3098    0.1678*   0.1453*

Table 5: The estimation of index, mean, median, and VaR for forest fire loss data.

Method            $\hat{\alpha}$   $\hat{\mu}$      Median          5% value at risk
Pareto (Hill)     4.3044           1.096 billion    1.09 billion    1.855 billion
TPD (Aban)        0.7964           1.167 billion    1.15 billion    1.446 billion
TPD (Moment)      0.8260           1.166 billion    1.15 billion    1.446 billion
TPD (Weighted)    0.8642           1.165 billion    1.15 billion    1.446 billion

The weighted estimator has the smallest AE and IE values for the $r = 18$ and $r = 10$ largest losses, and its IE value is almost equal to the smallest IE value for $r = 49$ (all losses). We conclude statistically that the weighted estimated distribution is the best fit in the tail of the hurricane loss data.

6.2. Forest Fire Loss Data Example. Next, we look at the forest fire loss example in Section 1.1.2. The data in Table 1 contain a relatively substantial number of large forest fire losses, which convinces us to use a truncated Pareto model and compare the four estimators in (26), (27), (28), and (29). We use the 25 largest losses in this study.

Figure 6 is a log-log plot which shows the upper tail of the forest fire loss data. The circles represent the real data, and the straight line represents the estimated original Pareto distribution. The dashed line, dotted line, and thick solid line represent the truncated Pareto distributions estimated by Aban's, the Moment, and the weighted estimators, respectively. We can see that the estimated truncated Pareto distributions fit the data very well under all three estimation methods and are much better than the original Pareto distribution. Around the tail, the weighted and the Moment estimation methods perform the best. The fact that the tail of the data curves downward in Figure 6 is evidence in support of using a truncated Pareto model.

The results of these three estimators are listed in Table 5, using the forest fire loss data, where $\gamma = X_{(1)}$, $\nu = X_{(n)}$, and $n = 25$.

Note that the weighted estimate $\hat{\alpha}_w$ is the largest among the three truncated Pareto estimates. The 5% value at risk of the original Pareto estimate is the largest; this 5% VaR may be overestimated compared to the truncated Pareto models.

As in the hurricane example, we again compute the AE in (30) and IE in (36) for fitting the 25 losses of the forest fire loss data.

Figure 7 gives the absolute errors (AE) in (30) for the 10 largest losses and confirms that the original Pareto estimate has relatively larger errors and the weighted estimate has relatively smaller errors in the tail. Figure 7 explains the data fitting of the tail of the distributions in Figure 6.

Figure 6: Log-log plot for the forest fire example, $\ln(P(X > x))$ against $\ln(x)$; legend: Data, Pareto, Aban, Moment, Weighted.

We also compute the AE in (30) and IE in (36) to confirm those tail errors, using $r$ as the number of largest losses, $1 \le r \le n$. The AE and IE values are given in Table 6 (the smallest values are marked with $*$).

The weighted estimator has the smallest AE and IE values for $r = 10$ and $r = 5$, and its IE value for $r = 25$ is almost equal to the smallest IE value. We conclude statistically that the weighted estimated distribution is the best fit to the tail of the forest fire loss data.

Table 6: Errors of goodness-of-fit tests, $n = 25$, for forest fire loss data.

                    Absolute error (AE)              Integrated error (IE)
Method              r = 25    r = 10    r = 5        r = 25    r = 10    r = 5
Pareto (Hill)       0.2656    0.1288    0.1288       2.7125    1.1493    1.0985
TPD (Aban)          0.1503*   0.1195    0.1195       1.3687*   0.7911    0.5430
TPD (Moment)        0.1520    0.0982    0.0982       1.3736    0.7764    0.5430
TPD (Weighted)      0.1542    0.0964*   0.0964*      1.3796    0.7577*   0.5203*

Figure 7: Absolute errors $|F^*(x) - S_n(x)|$ for the forest fire loss example on the 10 largest losses, plotted against $x$; legend: Aban, Moment, Pareto, Weighted.

7. Conclusions

In the complicated real world, it is difficult to construct a model combining all the desired features. In general, the final model selection depends on the best fitting model. The criteria are based on goodness-of-fit tests, existence of the moments, characteristic largest values, and log-log plots. The hurricane loss data and forest fire loss data are well fitted by the truncated Pareto distribution. In summary,

(a) we recommend the truncated Pareto model as an appropriate loss distribution for analyzing huge risk loss data sets. The upper and lower limits can be set by the largest and smallest losses or other reasonable values;

(b) the estimated loss distribution provides a prediction of the next disaster's 5% value at risk. The largest loss in the data set plays an important role in the prediction. In Figures 4 and 6, the trend of the tail distribution is crucial for an insurance company setting policy, and also for inhabitants and governments making plans to minimize damage from natural disasters;

(c) the semiparametric methods (Moment and weighted) are robust, easy to use, more stable, and fit the data better than the MLE methods. In both of the foregoing examples, the estimated shape parameter is less than 1. The original Pareto model cannot use the Moment or the weighted methods in this case, because its mean is infinite when the shape parameter is at most 1. This is another advantage of using the truncated Pareto model;

(d) the statistical inference in these examples shows that the distribution curve estimated by the proposed weighted estimation method fits the tails of the data better than the Moment and MLE estimators. This is due to the fact that the weighted estimator places less weight on the extreme values and has good MSE. Based on these studies, we suggest that further studies on the usage of weights may be useful.

Appendix

Mathematical Derivations

Lemma A.1. For a truncated Pareto random variable $X$ with pdf given in (3), let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ be the order statistics. Then, for $i, j \in \{1, 2, \ldots, n\}$, $i < j$, one has

\[
E\left[ X_{(i)} \right] = \gamma \sum_{k=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \rho^k \, \frac{\Gamma(n+1) \, \Gamma(k+i)}{\Gamma(i) \, \Gamma(k+n+1)}, \tag{A.1}
\]

\[
E\left[ X_{(i)}^2 \right] = \gamma^2 \sum_{k=0}^{\infty} \binom{\frac{2}{\alpha} + k - 1}{k} \rho^k \, \frac{\Gamma(n+1) \, \Gamma(k+i)}{\Gamma(i) \, \Gamma(k+n+1)}, \tag{A.2}
\]

\[
E\left[ X_{(i)} X_{(j)} \right] = \gamma^2 \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \binom{\frac{1}{\alpha} + l - 1}{l} \rho^{k+l} \left[ \frac{\Gamma(n+1) \, \Gamma(k+i) \, \Gamma(k+l+j)}{\Gamma(i) \, \Gamma(k+j) \, \Gamma(k+l+n+1)} \right], \tag{A.3}
\]

where $\rho = (g^{\alpha} - 1)/g^{\alpha}$, $g = \nu/\gamma$, $0 < \rho < 1$, $g > 1$, $0 < \gamma \le x \le \nu < \infty$, and $\alpha > 0$.

Proof. Let the cdf of a truncated Pareto distribution in (4) be

\[
F(x) = u, \qquad x = \gamma (1 - \rho u)^{-1/\alpha}, \qquad \rho = \frac{g^{\alpha} - 1}{g^{\alpha}}, \qquad g = \frac{\nu}{\gamma}. \tag{A.4}
\]

By the theory of order statistics, we have

\[
f_i(x) = \frac{\Gamma(n+1)}{\Gamma(i) \, \Gamma(n-i+1)} \left[ F(x) \right]^{i-1} \left[ 1 - F(x) \right]^{n-i} f(x), \tag{A.5}
\]

\[
f_{ij}(x, y) = \frac{\Gamma(n+1)}{\Gamma(i) \, \Gamma(j-i) \, \Gamma(n-j+1)} \left[ F(x) \right]^{i-1} \left[ F(y) - F(x) \right]^{j-i-1} \left[ 1 - F(y) \right]^{n-j} f(x) f(y). \tag{A.6}
\]

Using the binomial formula $1/(1-x)^r = \sum_{k=0}^{\infty} \binom{r+k-1}{k} x^k$, for $m = 1, 2$, we have

\[
\begin{aligned}
E\left[ X_{(i)}^m \right] &= \int_{\gamma}^{\nu} x^m f_i(x) \, dx \\
&= \int_0^1 \gamma^m (1 - \rho u)^{-m/\alpha} \, \frac{\Gamma(n+1)}{\Gamma(i) \, \Gamma(n-i+1)} \, u^{i-1} (1-u)^{n-i} \, du \\
&= \frac{\Gamma(n+1)}{\Gamma(i) \, \Gamma(n-i+1)} \, \gamma^m \sum_{k=0}^{\infty} \binom{\frac{m}{\alpha} + k - 1}{k} \int_0^1 \rho^k u^{k+i-1} (1-u)^{n-i} \, du \\
&= \gamma^m \sum_{k=0}^{\infty} \binom{\frac{m}{\alpha} + k - 1}{k} \rho^k \, \frac{\Gamma(n+1) \, \Gamma(k+i)}{\Gamma(i) \, \Gamma(k+n+1)}.
\end{aligned} \tag{A.7}
\]

By substituting $m = 1$ and $m = 2$, respectively, we have (A.1) and (A.2). Moreover,

\[
\begin{aligned}
E\left[ X_{(i)} X_{(j)} \right] &= \int_{\gamma}^{\nu} \int_{\gamma}^{y} x \, y \, f_{ij}(x, y) \, dx \, dy \\
&= \frac{\Gamma(n+1)}{\Gamma(i) \, \Gamma(j-i) \, \Gamma(n-j+1)} \int_0^1 \int_0^h \gamma (1 - \rho u)^{-1/\alpha} \, \gamma (1 - \rho h)^{-1/\alpha} \\
&\qquad \times u^{i-1} (h - u)^{j-i-1} (1 - h)^{n-j} \, du \, dh \\
&= \frac{\Gamma(n+1)}{\Gamma(i) \, \Gamma(j-i) \, \Gamma(n-j+1)} \, \gamma^2 \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \binom{\frac{1}{\alpha} + l - 1}{l} \rho^{k+l} \\
&\qquad \times \int_0^1 \int_0^1 \left( \frac{u}{h} \right)^{k+i-1} \left( 1 - \frac{u}{h} \right)^{j-i-1} h^{k+l+j-1} (1 - h)^{n-j} \, d\!\left( \frac{u}{h} \right) dh \\
&= \gamma^2 \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \binom{\frac{1}{\alpha} + l - 1}{l} \rho^{k+l} \, \frac{\Gamma(n+1) \, \Gamma(k+i) \, \Gamma(k+l+j)}{\Gamma(i) \, \Gamma(k+j) \, \Gamma(k+l+n+1)}.
\end{aligned} \tag{A.8}
\]
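The series for the moments of a single order statistic can be checked numerically. The sketch below is a minimal illustration, assuming the quantile form $x = \gamma(1 - \rho u)^{-1/\alpha}$ with $\rho = 1 - (\gamma/\nu)^{\alpha}$ and a factor $\gamma^m$ for the $m$th moment; it compares the truncated series with a direct midpoint-rule integral against the Beta$(i, n-i+1)$ density. The function names, the truncation at `terms`, and the parameter values are hypothetical choices for illustration.

```python
import math

def order_stat_moment(i, n, m, alpha, gamma, nu, terms=2000):
    """Truncated series for E[X_(i)^m] of a truncated Pareto sample of size n."""
    rho = 1.0 - (gamma / nu) ** alpha
    r = m / alpha
    total = 0.0
    for k in range(terms):
        # log of the generalized binomial coefficient C(r + k - 1, k)
        log_binom = math.lgamma(r + k) - math.lgamma(k + 1) - math.lgamma(r)
        # log of Gamma(n+1) Gamma(k+i) / (Gamma(i) Gamma(k+n+1))
        log_ratio = (math.lgamma(n + 1) + math.lgamma(k + i)
                     - math.lgamma(i) - math.lgamma(k + n + 1))
        total += math.exp(log_binom + k * math.log(rho) + log_ratio)
    return gamma ** m * total

def order_stat_moment_numeric(i, n, m, alpha, gamma, nu, steps=20000):
    """Midpoint-rule value of the same moment, integrating over u = F(x)."""
    rho = 1.0 - (gamma / nu) ** alpha
    const = math.exp(math.lgamma(n + 1) - math.lgamma(i) - math.lgamma(n - i + 1))
    h = 1.0 / steps
    total = 0.0
    for s in range(steps):
        u = (s + 0.5) * h
        total += (1.0 - rho * u) ** (-m / alpha) * u ** (i - 1) * (1.0 - u) ** (n - i)
    return gamma ** m * const * total * h

# Illustrative parameters only (not fitted values from the paper).
print(order_stat_moment(3, 5, 1, 0.8, 1.0, 10.0),
      order_stat_moment_numeric(3, 5, 1, 0.8, 1.0, 10.0))
```

Working with `math.lgamma` keeps the Gamma ratios stable for large $k$, and the series converges geometrically because $0 < \rho < 1$.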

Proof of Theorem 3. When $\alpha > 0$, we use

\[
\begin{aligned}
E\left[ \hat{\mu}_w \right] &= \sum_{i=2}^{n-1} w \, E\left[ X_{(i)} \right] + \frac{1}{2} \left( 1 - (n-2) w \right) \left( E\left[ X_{(1)} \right] + E\left[ X_{(n)} \right] \right), \\
\mathrm{MSE}\left( \hat{\mu}_w \right) &= E\left[ \left( \hat{\mu}_w - \mu \right)^2 \right] = E\left[ \hat{\mu}_w^2 \right] - 2 \mu E\left[ \hat{\mu}_w \right] + \mu^2.
\end{aligned} \tag{A.9}
\]

And we have

\[
\begin{aligned}
E\left[ \hat{\mu}_w^2 \right] &= \sum_{i=2}^{n-1} w^2 E\left[ X_{(i)}^2 \right] + 2 \sum_{i=2}^{n-2} \sum_{j=i+1}^{n-1} w^2 E\left[ X_{(i)} X_{(j)} \right] \\
&\quad + \frac{1}{4} \left( 1 - (n-2) w \right)^2 \left( E\left[ X_{(1)}^2 \right] + E\left[ X_{(n)}^2 \right] + 2 E\left[ X_{(1)} X_{(n)} \right] \right) \\
&\quad + \sum_{i=2}^{n-1} w \left( 1 - (n-2) w \right) \left( E\left[ X_{(1)} X_{(i)} \right] + E\left[ X_{(i)} X_{(n)} \right] \right).
\end{aligned} \tag{A.10}
\]

Let

\[
\begin{aligned}
A &= \sum_{i=2}^{n-1} E\left[ X_{(i)}^2 \right] + 2 \sum_{i=2}^{n-2} \sum_{j=i+1}^{n-1} E\left[ X_{(i)} X_{(j)} \right], \\
B &= E\left[ X_{(1)}^2 \right] + E\left[ X_{(n)}^2 \right] + 2 E\left[ X_{(1)} X_{(n)} \right], \\
C &= \sum_{i=2}^{n-1} \left( E\left[ X_{(1)} X_{(i)} \right] + E\left[ X_{(i)} X_{(n)} \right] \right), \\
D &= \sum_{i=2}^{n-1} E\left[ X_{(i)} \right], \\
E &= E\left[ X_{(1)} \right] + E\left[ X_{(n)} \right].
\end{aligned} \tag{A.11}
\]

Then

\[
\begin{aligned}
\mathrm{MSE}\left( \hat{\mu}_w \right) &= A w^2 + \frac{1}{4} \left( 1 - (n-2) w \right)^2 B + w \left( 1 - (n-2) w \right) C \\
&\quad - 2 \mu \left( D w + \frac{1}{2} \left( 1 - (n-2) w \right) E \right) + \mu^2 \\
&= \left( A + \frac{1}{4} (n-2)^2 B - (n-2) C \right) w^2 \\
&\quad + \left( -\frac{1}{2} (n-2) B + C - 2 \mu D + \mu (n-2) E \right) w \\
&\quad + \frac{1}{4} B - \mu E + \mu^2.
\end{aligned} \tag{A.12}
\]

Applying Lemma A.1 then gives Theorem 3.

Lemma A.2. $\mathrm{MSE}(\hat{\mu}_w)$ in (20) is a convex function with $a > 0$ when $n > 4C/B + 2$, where $C$ and $B$ are given in (20).

Proof. Let $f(w) = \mathrm{MSE}(\hat{\mu}_w) = a w^2 + b w + c$. We have

\[
a = A + \frac{1}{4} (n-2)^2 B - (n-2) C, \tag{A.13}
\]

where

\[
\begin{aligned}
A &= E\left[ \left( \sum_{i=2}^{n-1} X_{(i)} \right)^2 \right] \ge 0, \\
B &= E\left[ \left( X_{(1)} + X_{(n)} \right)^2 \right] \ge 0, \\
C &= E\left[ \sum_{i=2}^{n-1} X_{(i)} \left( X_{(1)} + X_{(n)} \right) \right] \ge 0.
\end{aligned} \tag{A.14}
\]

Writing $a = A + (n-2) \left[ \frac{1}{4} (n-2) B - C \right]$ and noting that $n > 2$, we have

\[
\frac{1}{4} (n-2) B - C > 0, \quad \text{so that } a > 0, \quad \text{when } n > \frac{4C}{B} + 2. \tag{A.15}
\]

Proof of Corollary 5. By Theorem 3 and Lemma A.2, let

\[
f(w) = \mathrm{MSE}\left( \hat{\mu}_w \right) = a w^2 + b w + c, \tag{A.16}
\]

which is a quadratic function with first and second derivatives with respect to $w$

\[
f'(w) = 2 a w + b, \qquad f''(w) = 2a > 0. \tag{A.17}
\]

Hence $f(w)$ is a convex function with minimum value

\[
f_{\min} = a \left( -\frac{b}{2a} \right)^2 + b \left( -\frac{b}{2a} \right) + c = -\frac{b^2}{4a} + c. \tag{A.18}
\]

The maximum value of $\mathrm{EFF}(\hat{\mu}_w)$ is

\[
\mathrm{EFF}_{\max}\left( \hat{\mu}_w \right) = \frac{\alpha \gamma^{\alpha} \left( \nu^{2-\alpha} - \gamma^{2-\alpha} \right)}{(2 - \alpha) \left( 1 - (\gamma/\nu)^{\alpha} \right) n \left( -b^2/(4a) + c \right)} - \frac{\alpha^2 \gamma^{2\alpha} \left( \nu^{1-\alpha} - \gamma^{1-\alpha} \right)^2}{(1 - \alpha)^2 \left( 1 - (\gamma/\nu)^{\alpha} \right)^2 n \left( -b^2/(4a) + c \right)} \quad \text{at } w = -\frac{b}{2a}. \tag{A.19}
\]
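The minimization behind this corollary can be illustrated numerically. The sketch below is not the exact calculation of the paper: instead of evaluating $A$, $B$, $C$, $D$, and $E$ through the series of Lemma A.1, it approximates them by Monte Carlo simulation from a truncated Pareto distribution (a swapped-in technique), then forms the quadratic coefficients of (A.12) and the minimizer $w = -b/(2a)$. All function names, the seed, and the parameter values are hypothetical, and the closed-form mean used here assumes $\alpha \neq 1$.

```python
import random

def tp_sample(n, alpha, gamma, nu, rng):
    """Sorted truncated Pareto sample via the quantile transform."""
    rho = 1.0 - (gamma / nu) ** alpha
    return sorted(gamma * (1.0 - rho * rng.random()) ** (-1.0 / alpha)
                  for _ in range(n))

def tp_mean(alpha, gamma, nu):
    """Mean of the truncated Pareto distribution (valid for alpha != 1)."""
    return (alpha * gamma ** alpha * (nu ** (1 - alpha) - gamma ** (1 - alpha))
            / ((1 - alpha) * (1 - (gamma / nu) ** alpha)))

def mse_coefficients(n, alpha, gamma, nu, reps=5000, seed=0):
    """Monte Carlo estimates of (a, b, c) in MSE(w) = a w^2 + b w + c."""
    rng = random.Random(seed)
    mu = tp_mean(alpha, gamma, nu)
    A = B = C = D = E = 0.0
    for _ in range(reps):
        x = tp_sample(n, alpha, gamma, nu, rng)
        s = sum(x[1:-1])      # X_(2) + ... + X_(n-1)
        e = x[0] + x[-1]      # X_(1) + X_(n)
        A += s * s; B += e * e; C += s * e; D += s; E += e
    A, B, C, D, E = (v / reps for v in (A, B, C, D, E))
    a = A + 0.25 * (n - 2) ** 2 * B - (n - 2) * C
    b = -0.5 * (n - 2) * B + C - 2 * mu * D + mu * (n - 2) * E
    c = 0.25 * B - mu * E + mu ** 2
    return a, b, c

# Illustrative settings only; w = 1/n corresponds to the ordinary sample mean.
n = 20
a, b, c = mse_coefficients(n, 0.8, 1.0, 10.0)
w_opt = -b / (2 * a)
print(w_opt, 1.0 / n, a * w_opt ** 2 + b * w_opt + c)
```

Comparing the quadratic at `w_opt` with its value at $w = 1/n$ gives a simulated counterpart of the efficiency of the weighted estimator relative to the sample mean.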

Acknowledgments

The authors thank the referees and the editor for their comments, which helped to improve the paper. This research is supported by the Natural Sciences and Engineering Research Council of Canada.

References

[1] R. A. Pielke Jr., J. Gratz, C. W. Landsea, D. Collins, M. A. Saunders, and R. Musulin, "Normalized hurricane damage in the United States: 1900–2005," Natural Hazards Review, vol. 9, no. 1, pp. 29–42, 2008.

[2] P. Embrechts, C. Klüppelberg, and T. Mikosch, Modelling Extremal Events for Insurance and Finance, Springer, New York, NY, USA, 2003.

[3] C. Kleiber and S. Kotz, Statistical Size Distribution in Economics and Actuarial Sciences, John Wiley & Sons, New York, NY, USA, 2003.

[4] J. Beirlant, Y. Goegebeur, J. Segers, and J. Teugels, Statistics of Extremes: Theory and Application, John Wiley & Sons, New York, NY, USA, 2005.

[5] M. A. Beg, "Estimation of the tail probability of the truncated Pareto distribution," Journal of Information & Optimization Sciences, vol. 2, no. 2, pp. 192–198, 1981.

[6] I. B. Aban, M. M. Meerschaert, and A. K. Panorska, "Parameter estimation for the truncated Pareto distribution," Journal of the American Statistical Association, vol. 101, no. 473, pp. 270–277, 2006.

[7] B. Efron, "Bootstrap methods: another look at the jackknife," The Annals of Statistics, vol. 7, no. 1, pp. 1–26, 1979.

[8] P. Barbe and P. Bertail, The Weighted Bootstrap, Springer, New York, NY, USA, 1995.

[9] G. R. Shorack and J. A. Wellner, Empirical Processes with Applications to Statistics, John Wiley & Sons, New York, NY, USA, 1986.

[10] H. L. Koul, Weighted Empirical and Linear Models, vol. 21 of Lecture Notes-Monograph Series, Institute of Mathematical Statistics, Hayward, Calif, USA, 1992.

[11] M. L. Huang and P. H. Brill, "A distribution estimation method based on level crossings," Journal of Statistical Planning and Inference, vol. 124, no. 1, pp. 45–62, 2004.

[12] M. L. Huang, "The efficiencies of a weighted distribution function estimator," in The Proceeding of American Statistical Association, Nonparametric Statistics Section, pp. 1502–1506, 2003.

[13] B. M. Hill, "A simple general approach to inference about the tail of a distribution," The Annals of Statistics, vol. 3, no. 5, pp. 1163–1174, 1975.

[14] P. Cooke, "Statistical inference for bounds of random variables," Biometrika, vol. 66, no. 2, pp. 367–374, 1979.

[15] A. N. Kolmogorov, "Sulla determinazione empirica di una legge di distribuzione," Giornale dell'Istituto Italiano degli Attuari, vol. 4, pp. 83–91, 1933.

[16] T. W. Anderson and D. A. Darling, "Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes," The Annals of Mathematical Statistics, vol. 23, pp. 193–212, 1952.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: Research Article A Weighted Estimation for Risk Modeldownloads.hindawi.com › archive › 2013 › 829131.pdf · 2019-07-31 · Research Article A Weighted Estimation for Risk Model

8 ISRN Probability and Statistics

DataParetoAban

MomentWeighted

22 23 24 25 260

minus1

minus2

minus3

minus4

ln(x)

ln(P(X

gtx))

Figure 4 Log-log plot of the hurricane loss data

(1) The Kolmogorov-Smirnov (K-S) test [15] the teststatistic is given by

119879 = sup119909

1003816100381610038161003816119865lowast(119909) minus 119878

119899(119909)

1003816100381610038161003816 minusinfin lt 119909 lt infin (30)

where 119878119899(119909) is the EDF in (8) Under 119867

0 the two-tailed 119875

value for the K-S test is as follows

119875-value

= 1 minus [1 minus 119879

[119899(1minus119879)]

sum

119895=0

119899

119895 (119899 minus 119895)(1 minus 119879 minus

119895

119899)

119899minus119895

(119905 +119895

119899)

119895minus1

]

2

(31)

where [119899(1 minus 119879)] is the integer part of 119899(1 minus 119879)(2) Anderson and Darling [16] test (A-D test) introduced

a measure of ldquodistancerdquo between the empirical distribution119878119899(119909) and the proposed cdf119865lowast

(119909) by using ametric functionspace

1198822

119899= 119899int

infin

minusinfin

[119878119899(119909) minus 119865

lowast(119909)]

2

120595 (119865lowast(119909)) 119889119865 (32)

where 120595(119906) = 1(119906(1 minus 119906)) is a weight function with 119906 =

119865(119909) Let 119906119894= 119865(119883

(119894)) 119894 = 1 119899 and under 119867

0the test

statistic and 119875-value are given by

1198822

119899= minus119899 minus

1

119899

119899

sum

119895=1

(2119895 minus 1) (log (119906119895) + log (1 minus 119906

119899minus119895+1))

119875-value = 1 minusradic2

119911

infin

sum

119895=0

(minus1)

119895Γ (119895 + 12) (4119895 + 1)

119895119890minus(4119895+1)

21205872(8119911)

timesint

infin

0

119890((z8)(1199082+1)minus(4119895+1)212058721199082(8119911))

119889119908

(33)

008

006

004

002

0

Aban

Moment

Pareto

Weighted

4e + 10 6e + 10 8e + 10 1e + 10 12e + 1014e + 10

x

Figure 5 Absolute errors |119865lowast(119909)minus119878

119899(119909)| for hurricane loss example

on the 10 largest losses data

where 119911 is the observed value of 1198822

119899and Γ(119909) =

intinfin

0119905119909minus1

119890minus119905119889119905 119909 gt 0 is the Gamma function

(3) Cramer-von Mises test [16] proposed (C-v-M test)using 120595(119906) = 1 in (32) thus under 119867

0 the test statistic and

119875-value are given by

1198991205962=

1

12119899+

119899

sum

119895=1

(119906119895 minus2119895 minus 1

2119899)

2

119875-value = 1 minus1

120587radic119911

infin

sum

119895=0

(minus1)

2119895Γ (119895 + 12) (4119895 + 1)

12

119895radic120587119890minus(4119895+1)

2(16119911)

times11987014 ((4119895 + 1)

2

16119911)

(34)

where 11987014(119909) is the modified Bessel function of the second

kind

119870120572(119909) =

120587

2

119868minus120572(119909) minus 119868

120572(119909)

sin (120572120587)

119868120572(119909) =

infin

sum

119898=0

1

119898Γ (119898 + 120572 + 1)(119909

2)

2119898+120572

(35)

For fitting the 49 losses in the hurricane loss data forexample after computing estimates of 119865lowast

(119909) by using (26)(27) (28) and (29) we compute the absolute error (AE) in(30) and define the integrated error (IE) by

IE = [int119883(119899)

119883(119899minus119903+1)

(119878119899(119909) minus 119865

lowast(119909))

2

119889119909]

12

(36)

Figure 5 gives the absolute errors (AE) in (30) for the 10largest losses and confirms that the original Pareto estimatehas relatively larger errors and the weighted estimate hasrelatively smaller errors in the tail Figure 5 explains the datafitting of the tail of the distributions in Figure 4

We also compute the AE in (30) and IE in (36) to confirmthose tail errors by using 119903 as the number of the largest losses1 le 119903 le 119899 The AE and IE values are given in Table 4 (thesmallest values are bold with lowast)

ISRN Probability and Statistics 9

Table 4 Errors of goodness-of-fit tests 119899 = 49 for hurricane example

MethodGoodness-of-fit tests

Absolute error (AE) Integrated error (IE)119903 = 49 119903 = 18 119903 = 10 119903 = 49 119903 = 18 119903 = 10

Pareto(Hill) 01340 00584 00584 04844 03818 03723

TPD(Aban) 00948lowast 00839 00832 03114 02565 02161

TPD(Moment) 01053 00738 00737 02985lowast 02171 01825

TPD(Weighted) 01250 00561lowast 00561lowast 03098 01678lowast 01453lowast

Table 5The estimation of index mean median and VaR for forestfire loss data

Method 120583 Median 5 value at riskPareto

(Hill) 43044 1096 billion 109 billion 1855 billionTPD

(Aban) 07964 1167 billion 115 billion 1446 billionTPD

(Moment) 08260 1166 billion 115 billion 1446 billionTPD

(Weighted) 08642 1165 billion 115 billion 1446 billion

The weighted estimator has the smallest AE and IE valuesfor 119903 = 18 and 10 largest losses and its IE value is almostequal to the smallest IE value for 119903 = 49 all largest losses Westatistically conclude that theweighted estimated distributionis the best fit in the tail of the hurricane loss data

62 Forest Fire Loss Data Example Next we look at the forestfire loss example in Section 112 The data in Table 1 containsa relatively substantial number of large forest fire losses whichconvince us to use a truncated Paretomodel and compare thefour estimators in (26) (27) (28) and (29) We use the 25largest losses in this study

Figure 6 is a log-log plot which shows the upper tail forthe forest fire loss data The circles represent the real dataand the straight line represents the estimated original Paretodistribution The dashed line dotted line and thick solidline represent the estimated truncated Pareto distributionby using Abanrsquos Moment and the weighted estimatorsrespectively We can see that the estimated truncated Paretodistributions fit the data very well using all three estimationmethods and are much better than the original Paretodistribution Around the tail the weighted and the Momentestimation methods perform the best The fact that the tailof the data curves downward in Figure 6 is the evidence insupport of using a truncated Pareto model

The results of these three estimators are listed in Table 5by using the forest fire loss data where 120574 = 119883

(1) ] = 119883

(119899)

and 119899 = 25Note that the weighted estimate

119908is the largest among

the three truncated Pareto estimates The 5 value at risk ofthe original Pareto estimate is the largestThe 5VaRmay beoverestimated compared to the truncated Pareto models

Similarly as in the hurricane example we compute theAE in (30) and IE in (36) again for fitting the 25 losses of theforest fire loss data

Figure 7 gives the absolute errors (AE) in (30) for the 10 largest losses and confirms that the original Pareto estimate has relatively larger errors and the weighted estimate has relatively smaller errors in the tail. Figure 7 explains the data fitting of the tail of the distributions in Figure 6.

Figure 6: Log-log plot for forest fire example (horizontal axis: ln(x); vertical axis: ln(P(X > x)); curves: Data, Pareto, Aban, Moment, Weighted).

We also compute the AE in (30) and the IE in (36) to confirm those tail errors by using $r$ as the number of the largest losses, $1 \le r \le n$. The AE and IE values are given in Table 6 (the smallest values are marked with $*$).

The weighted estimator has the smallest AE and IE values for $r = 10$ and $r = 5$, and its IE value for $r = 25$ is almost equal to the smallest IE value. We statistically conclude that the weighted estimated distribution is the best fit to the tail of the forest fire loss data.
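The tail comparison can be sketched in code. The exact definitions (30) and (36) appear earlier in the paper; the sketch below only assumes that the absolute error over the $r$ largest losses is the largest value of $|F^{*}(x_{(k)}) - S_n(x_{(k)})|$ with $S_n(x_{(k)}) = k/n$, which may differ in detail from (30). The loss values, the parameters, and all function names are hypothetical.

```python
def truncated_pareto_cdf(x, gamma, nu, alpha):
    """cdf of a Pareto law truncated to [gamma, nu]."""
    return (1.0 - (gamma / x) ** alpha) / (1.0 - (gamma / nu) ** alpha)

def tail_absolute_error(sample, cdf, r):
    """Largest |cdf(x_(k)) - k/n| over the r largest order statistics."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    # k runs over the ranks n - r + 1, ..., n of the r largest observations.
    return max(abs(cdf(xs[k - 1]) - k / n) for k in range(n - r + 1, n + 1))

# Made-up losses (in billions) used only to exercise the functions.
losses = [1.00, 1.02, 1.05, 1.09, 1.14, 1.20, 1.27, 1.35, 1.44, 1.50]
gamma, nu, alpha = min(losses), max(losses), 0.8  # alpha is a hypothetical value

def fitted_cdf(x):
    return truncated_pareto_cdf(x, gamma, nu, alpha)

ae_all = tail_absolute_error(losses, fitted_cdf, len(losses))
ae_5 = tail_absolute_error(losses, fitted_cdf, 5)
ae_3 = tail_absolute_error(losses, fitted_cdf, 3)
print(ae_all, ae_5, ae_3)
```

Under this assumed definition the error cannot increase when $r$ shrinks, since the maximum is taken over a smaller set of observations. That is consistent with the pattern of the AE columns reported for both examples.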

Table 6: Errors of goodness-of-fit tests, $n = 25$, for forest fire loss data.

| Method | AE, $r = 25$ | AE, $r = 10$ | AE, $r = 5$ | IE, $r = 25$ | IE, $r = 10$ | IE, $r = 5$ |
|---|---|---|---|---|---|---|
| Pareto (Hill) | 0.2656 | 0.1288 | 0.1288 | 2.7125 | 1.1493 | 1.0985 |
| TPD (Aban) | 0.1503* | 0.1195 | 0.1195 | 1.3687* | 0.7911 | 0.5430 |
| TPD (Moment) | 0.1520 | 0.0982 | 0.0982 | 1.3736 | 0.7764 | 0.5430 |
| TPD (Weighted) | 0.1542 | 0.0964* | 0.0964* | 1.3796 | 0.7577* | 0.5203* |

AE: absolute error; IE: integrated error.

Figure 7: Absolute errors $|F^{*}(x) - S_n(x)|$ for forest fire loss example, the 10 largest losses (horizontal axis: $x$; curves: Aban, Moment, Pareto, Weighted).

7. Conclusions

In the complicated real world, it is difficult to construct a model combining all the desired features. In general, the final model selection depends on the best fitting model. The criteria are based on goodness-of-fit tests, existence of the moments, characteristic largest values, and log-log plots. The hurricane loss data and forest fire loss data are well fitted by the truncated Pareto distribution. In summary,

(a) we recommend the truncated Pareto model as a loss distribution to be used when analyzing huge risk loss data sets. The upper and lower limits can be set by the largest and smallest losses or other reasonable values;

(b) the estimated loss distribution provides a prediction of the next disaster's 5% value at risk. The largest loss in the data set plays an important role in the prediction. In Figures 4 and 6, the trend of the tail distribution is crucial for an insurance company setting policy, and also for inhabitants and governments making plans to minimize damage from natural disasters;

(c) the semiparametric methods (Moment and weighted) are robust, easy to use, more stable, and fit the data better than the MLE methods. In both of the foregoing examples, the estimated shape parameter is less than 1. In that case the Moment and weighted methods cannot be used with the original Pareto model, because its mean is infinite when the shape parameter is at most 1. This is another advantage of using the truncated Pareto model;

(d) the statistical inference in these examples shows that the distribution curve estimated by the proposed weighted estimation method fits the tails of the data better than the Moment and MLE estimators. This is because the weighted estimator places less weight on the extreme values and has good MSE. Based on these studies, we suggest that further studies on the usage of weights may be useful.
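The idea of placing less weight on the extreme values can be sketched as follows. The form used here, $\mu_w = w\sum_{i=2}^{n-1} X_{(i)} + \frac{1}{2}\left(1-(n-2)w\right)\left(X_{(1)} + X_{(n)}\right)$, is inferred from the expectation and MSE expressions in the Appendix and should be read as an assumption rather than a restatement of the definition given earlier in the paper. The function name and the data are hypothetical.

```python
def weighted_mean(sample, w):
    """Weighted mean with weight w on interior order statistics.

    The smallest and largest observations share the remaining weight
    1 - (n - 2) * w equally, so the weights always sum to one.
    """
    xs = sorted(sample)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    edge_weight = 0.5 * (1.0 - (n - 2) * w)
    return w * sum(xs[1:-1]) + edge_weight * (xs[0] + xs[-1])

# Made-up right-skewed data with one large value.
data = [1.0, 1.1, 1.2, 1.4, 3.0]
n = len(data)

sample_mean = weighted_mean(data, 1.0 / n)         # w = 1/n recovers the sample mean
downweighted = weighted_mean(data, 0.25)           # w > 1/n: less weight on extremes
interior_only = weighted_mean(data, 1.0 / (n - 2)) # extremes receive zero weight

print(sample_mean, downweighted, interior_only)
```

With $w = 1/n$ every observation has weight $1/n$; any $w > 1/n$ gives each of the two extremes a weight below $1/n$. The choice of $w$ that minimizes the MSE is the subject of Theorem 3 and Corollary 5.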

Appendix

Mathematical Derivations

Lemma A.1. For a truncated Pareto random variable $X$ with pdf given in (3), let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ be the order statistics. Then, for $i, j \in \{1, 2, \ldots, n\}$, $i < j$, one has

$$E[X_{(i)}] = \gamma \sum_{k=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \rho^{k}\, \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)}, \tag{A.1}$$

$$E[X_{(i)}^{2}] = \gamma^{2} \sum_{k=0}^{\infty} \binom{\frac{2}{\alpha} + k - 1}{k} \rho^{k}\, \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)}, \tag{A.2}$$

$$E[X_{(i)}X_{(j)}] = \gamma^{2} \sum_{k=0}^{\infty}\sum_{l=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k}\binom{\frac{1}{\alpha} + l - 1}{l} \rho^{k+l} \left[\frac{\Gamma(n+1)\,\Gamma(k+i)\,\Gamma(k+l+j)}{\Gamma(i)\,\Gamma(k+j)\,\Gamma(k+l+n+1)}\right], \tag{A.3}$$

where $\rho = (g^{\alpha} - 1)/g^{\alpha}$, $g = \nu/\gamma > 1$, $0 < \rho < 1$, $0 < \gamma \le x \le \nu < \infty$, and $\alpha > 0$.

Proof. Let the cdf of the truncated Pareto distribution in (4) be $F(x) = u$; then

$$x = \gamma (1 - \rho u)^{-1/\alpha}, \qquad \rho = \frac{g^{\alpha} - 1}{g^{\alpha}}, \qquad g = \frac{\nu}{\gamma}. \tag{A.4}$$

By the theory of order statistics, we have

$$f_{i}(x) = \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)}\,[F(x)]^{i-1}\,[1 - F(x)]^{n-i} f(x), \tag{A.5}$$

$$f_{ij}(x, y) = \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(j-i)\,\Gamma(n-j+1)}\,[F(x)]^{i-1}\,[F(y) - F(x)]^{j-i-1}\,[1 - F(y)]^{n-j} f(x)\, f(y). \tag{A.6}$$


Using the binomial formula $1/(1 - x)^{r} = \sum_{k=0}^{\infty} \binom{r+k-1}{k} x^{k}$, for $m = 1, 2$ we have

$$\begin{aligned}
E[X_{(i)}^{m}] &= \int_{\gamma}^{\nu} x^{m} f_{i}(x)\,dx \\
&= \int_{0}^{1} \gamma^{m} (1 - \rho u)^{-m/\alpha}\, \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)}\, u^{i-1}(1-u)^{n-i}\,du \\
&= \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)}\,\gamma^{m} \sum_{k=0}^{\infty} \binom{\frac{m}{\alpha} + k - 1}{k} \int_{0}^{1} \rho^{k} u^{k+i-1}(1-u)^{n-i}\,du \\
&= \gamma^{m} \sum_{k=0}^{\infty} \binom{\frac{m}{\alpha} + k - 1}{k} \rho^{k}\, \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)}.
\end{aligned} \tag{A.7}$$

By substituting $m = 1$ and $m = 2$, respectively, we have (A.1) and (A.2). Similarly, writing $u = F(x)$ and $h = F(y)$,

$$\begin{aligned}
E[X_{(i)}X_{(j)}] &= \int_{\gamma}^{\nu}\int_{\gamma}^{y} x\,y\,f_{ij}(x, y)\,dx\,dy \\
&= \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(j-i)\,\Gamma(n-j+1)} \int_{0}^{1}\int_{0}^{h} \gamma(1-\rho u)^{-1/\alpha}\,\gamma(1-\rho h)^{-1/\alpha}\, u^{i-1}(h-u)^{j-i-1}(1-h)^{n-j}\,du\,dh \\
&= \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(j-i)\,\Gamma(n-j+1)}\,\gamma^{2} \sum_{k=0}^{\infty}\sum_{l=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k}\binom{\frac{1}{\alpha} + l - 1}{l} \rho^{k+l} \\
&\qquad \times \int_{0}^{1}\int_{0}^{1} \left(\frac{u}{h}\right)^{k+i-1}\left(1 - \frac{u}{h}\right)^{j-i-1} h^{k+l+j-1}(1-h)^{n-j}\,d\!\left(\frac{u}{h}\right) dh \\
&= \gamma^{2} \sum_{k=0}^{\infty}\sum_{l=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k}\binom{\frac{1}{\alpha} + l - 1}{l} \rho^{k+l}\, \frac{\Gamma(n+1)\,\Gamma(k+i)\,\Gamma(k+l+j)}{\Gamma(i)\,\Gamma(k+j)\,\Gamma(k+l+n+1)}.
\end{aligned} \tag{A.8}$$
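The series in Lemma A.1 can be checked numerically. The sketch below truncates the series after a fixed number of terms, builds each term from the previous one by the ratio of consecutive terms (so no Gamma function is evaluated directly), and compares the result with a midpoint-rule integral of the order-statistic density in the variable $u$. It takes the prefactor of the $m$-th moment to be $\gamma^{m}$. The function names, the truncation lengths, and the parameter values are assumptions made for illustration.

```python
import math

def order_stat_moment(i, n, m, gamma, nu, alpha, terms=400):
    """E[X_(i)^m] from the single series, truncated after `terms` terms."""
    rho = 1.0 - (gamma / nu) ** alpha
    r = m / alpha
    term, total = 1.0, 1.0  # the k = 0 term equals 1
    for k in range(1, terms):
        # Ratio of term k to term k-1: binomial factor, rho, Gamma factors.
        term *= (r + k - 1) / k * rho * (k + i - 1) / (k + n)
        total += term
    return gamma ** m * total

def order_stat_moment_quadrature(i, n, m, gamma, nu, alpha, steps=20000):
    """The same moment by the midpoint rule on u = F(x) in (0, 1)."""
    rho = 1.0 - (gamma / nu) ** alpha
    c = math.exp(math.lgamma(n + 1) - math.lgamma(i) - math.lgamma(n - i + 1))
    h = 1.0 / steps
    total = 0.0
    for s in range(steps):
        u = (s + 0.5) * h
        x = gamma * (1.0 - rho * u) ** (-1.0 / alpha)
        total += x ** m * u ** (i - 1) * (1.0 - u) ** (n - i)
    return c * h * total

def order_stat_product_moment(i, j, n, gamma, nu, alpha, terms=200):
    """E[X_(i) X_(j)], i < j, from the double series, truncated in k and l."""
    rho = 1.0 - (gamma / nu) ** alpha
    r = 1.0 / alpha
    total, t_k0 = 0.0, 1.0  # t_k0 is the (k, l = 0) term
    for k in range(terms):
        if k > 0:
            t_k0 *= (r + k - 1) / k * rho * (k + i - 1) / (k + n)
        t_kl = t_k0
        total += t_kl
        for l in range(1, terms):
            t_kl *= (r + l - 1) / l * rho * (k + l + j - 1) / (k + l + n)
            total += t_kl
    return gamma ** 2 * total

# Hypothetical parameters.
gamma, nu, alpha = 1.0, 3.0, 0.8
series_value = order_stat_moment(2, 5, 1, gamma, nu, alpha)
quad_value = order_stat_moment_quadrature(2, 5, 1, gamma, nu, alpha)
print(series_value, quad_value)
```

Two identities give further checks that do not rely on quadrature: the expectations of the order statistics must sum to $nE[X]$, and for $n = 2$ the product moment $E[X_{(1)}X_{(2)}]$ must equal $(E[X])^{2}$ because the two observations are independent.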

Proof of Theorem 3. When $\alpha > 0$, we use

$$E[\mu_{w}] = \sum_{i=2}^{n-1} w\,E[X_{(i)}] + \frac{1}{2}\bigl(1 - (n-2)w\bigr)\bigl(E[X_{(1)}] + E[X_{(n)}]\bigr),$$

$$\mathrm{MSE}(\mu_{w}) = E[(\mu_{w} - \mu)^{2}] = E[\mu_{w}^{2}] - 2\mu E[\mu_{w}] + \mu^{2}. \tag{A.9}$$

And we have

$$\begin{aligned}
E[\mu_{w}^{2}] ={}& \sum_{i=2}^{n-1} w^{2} E[X_{(i)}^{2}] + 2\sum_{i=2}^{n-2}\sum_{j=i+1}^{n-1} w^{2} E[X_{(i)}X_{(j)}] \\
&+ \frac{1}{4}\bigl(1 - (n-2)w\bigr)^{2}\bigl(E[X_{(1)}^{2}] + E[X_{(n)}^{2}] + 2E[X_{(1)}X_{(n)}]\bigr) \\
&+ \sum_{i=2}^{n-1} w\bigl(1 - (n-2)w\bigr)\bigl(E[X_{(1)}X_{(i)}] + E[X_{(i)}X_{(n)}]\bigr).
\end{aligned} \tag{A.10}$$

Let

$$\begin{aligned}
A &= \sum_{i=2}^{n-1} E[X_{(i)}^{2}] + 2\sum_{i=2}^{n-2}\sum_{j=i+1}^{n-1} E[X_{(i)}X_{(j)}], \\
B &= E[X_{(1)}^{2}] + E[X_{(n)}^{2}] + 2E[X_{(1)}X_{(n)}], \\
C &= \sum_{i=2}^{n-1} \bigl(E[X_{(1)}X_{(i)}] + E[X_{(i)}X_{(n)}]\bigr), \\
D &= \sum_{i=2}^{n-1} E[X_{(i)}], \\
E &= E[X_{(1)}] + E[X_{(n)}].
\end{aligned} \tag{A.11}$$

Then

$$\begin{aligned}
\mathrm{MSE}(\mu_{w}) ={}& A w^{2} + \frac{1}{4}\bigl(1 - (n-2)w\bigr)^{2} B + w\bigl(1 - (n-2)w\bigr) C \\
&- 2\mu\Bigl(D w + \frac{1}{2}\bigl(1 - (n-2)w\bigr) E\Bigr) + \mu^{2} \\
={}& \Bigl(A + \frac{1}{4}(n-2)^{2} B - (n-2) C\Bigr) w^{2} \\
&+ \Bigl(-\frac{1}{2}(n-2) B + C - 2\mu D + \mu (n-2) E\Bigr) w \\
&+ \frac{1}{4} B - \mu E + \mu^{2}.
\end{aligned} \tag{A.12}$$

Now use Lemma A.1; then we have Theorem 3.
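The expansion of the MSE into a quadratic in $w$ is routine algebra, and it can be spot-checked numerically. The sketch below evaluates both forms of the expression for arbitrary numbers standing in for $A$, $B$, $C$, $D$, $E$, $\mu$, $n$, and $w$. These numbers are random placeholders, not moments of any distribution, and the function names are hypothetical.

```python
import random

def mse_direct(w, n, mu, A, B, C, D, E):
    """MSE written with the factor (1 - (n - 2) w) left unexpanded."""
    e = 1.0 - (n - 2) * w
    return A * w**2 + 0.25 * e**2 * B + w * e * C - 2 * mu * (D * w + 0.5 * e * E) + mu**2

def mse_coefficients(n, mu, A, B, C, D, E):
    """Coefficients (a, b, c) of the quadratic a w^2 + b w + c."""
    a = A + 0.25 * (n - 2) ** 2 * B - (n - 2) * C
    b = -0.5 * (n - 2) * B + C - 2 * mu * D + mu * (n - 2) * E
    c = 0.25 * B - mu * E + mu**2
    return a, b, c

random.seed(0)
max_rel_gap = 0.0
for _ in range(1000):
    n = random.randint(3, 50)
    w = random.uniform(0.0, 0.5)
    mu = random.uniform(0.0, 5.0)
    A, B, C, D, E = (random.uniform(0.0, 10.0) for _ in range(5))
    direct = mse_direct(w, n, mu, A, B, C, D, E)
    a, b, c = mse_coefficients(n, mu, A, B, C, D, E)
    quadratic = a * w**2 + b * w + c
    gap = abs(direct - quadratic) / max(1.0, abs(direct))
    max_rel_gap = max(max_rel_gap, gap)

print(max_rel_gap)
```

Agreement up to floating-point rounding for every draw indicates that the two forms are the same polynomial in $w$.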

Lemma A.2. $\mathrm{MSE}(\mu_{w})$ in (20) is a convex function with $a > 0$ when $n > 4C/B + 2$, where $C$ and $B$ are given in (20).

Proof. Let $f(w) = \mathrm{MSE}(\mu_{w}) = a w^{2} + b w + c$. We have

$$a = A + \frac{1}{4}(n-2)^{2} B - (n-2) C, \tag{A.13}$$

where

$$\begin{aligned}
A &= E\Biggl[\Biggl(\sum_{i=2}^{n-1} X_{(i)}\Biggr)^{2}\Biggr] \ge 0, \\
B &= E\bigl[(X_{(1)} + X_{(n)})^{2}\bigr] \ge 0, \\
C &= E\Biggl[\sum_{i=2}^{n-1} X_{(i)}\bigl(X_{(1)} + X_{(n)}\bigr)\Biggr] \ge 0.
\end{aligned} \tag{A.14}$$

Writing $a = A + (n-2)\bigl(\frac{1}{4}(n-2) B - C\bigr)$ and noting that $n > 2$, we have $\frac{1}{4}(n-2) B - C > 0$, and hence $a > 0$, when

$$n > \frac{4C}{B} + 2. \tag{A.15}$$

Proof of Corollary 5. By Theorem 3 and Lemma A.2, let

$$f(w) = \mathrm{MSE}(\mu_{w}) = a w^{2} + b w + c, \tag{A.16}$$

which is a quadratic function with first and second derivatives of $f(w)$ with respect to $w$

$$f'(w) = 2aw + b, \qquad f''(w) = 2a > 0. \tag{A.17}$$

Hence $f(w)$ is a convex function with minimum value

$$f_{\min} = a\left(-\frac{b}{2a}\right)^{2} + b\left(-\frac{b}{2a}\right) + c = -\frac{b^{2}}{4a} + c. \tag{A.18}$$

The maximum value of $\mathrm{EFF}(\mu_{w})$ is

$$\mathrm{EFF}_{\max}(\mu_{w}) = \frac{\alpha\gamma^{\alpha}\left(\nu^{2-\alpha} - \gamma^{2-\alpha}\right)}{(2-\alpha)\left(1 - (\gamma/\nu)^{\alpha}\right) n\left(-b^{2}/(4a) + c\right)} - \frac{\alpha^{2}\gamma^{2\alpha}\left(\nu^{1-\alpha} - \gamma^{1-\alpha}\right)^{2}}{(1-\alpha)^{2}\left(1 - (\gamma/\nu)^{\alpha}\right)^{2} n\left(-b^{2}/(4a) + c\right)} \quad \text{at } w = -\frac{b}{2a}. \tag{A.19}$$
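The last step can be sketched in code. The block below computes the minimizing weight and the minimum of the quadratic, and evaluates the variance of the truncated Pareto distribution from the two closed-form moments that appear in the numerators of (A.19). It reads (A.19) as $\mathrm{Var}(X)/(n f_{\min})$, which is an interpretation of the displayed formula. The function names and all numerical inputs are hypothetical, and the closed forms require $\alpha \ne 1$ and $\alpha \ne 2$.

```python
def quadratic_minimum(a, b, c):
    """Minimizer and minimum of f(w) = a w^2 + b w + c for a > 0."""
    if a <= 0:
        raise ValueError("convexity requires a > 0")
    w_star = -b / (2.0 * a)
    return w_star, c - b * b / (4.0 * a)

def truncated_pareto_variance(gamma, nu, alpha):
    """Var(X) = E[X^2] - (E[X])^2 for a Pareto law truncated to [gamma, nu]."""
    q = 1.0 - (gamma / nu) ** alpha
    second = alpha * gamma**alpha * (nu ** (2 - alpha) - gamma ** (2 - alpha)) / ((2 - alpha) * q)
    first = alpha * gamma**alpha * (nu ** (1 - alpha) - gamma ** (1 - alpha)) / ((1 - alpha) * q)
    return second - first**2

def max_efficiency(a, b, c, n, gamma, nu, alpha):
    """Var(X) / (n * f_min), the ratio read off from (A.19)."""
    _, f_min = quadratic_minimum(a, b, c)
    return truncated_pareto_variance(gamma, nu, alpha) / (n * f_min)

# Placeholder coefficients (not computed from Lemma A.1).
w_star, f_min = quadratic_minimum(2.0, -1.0, 1.0)
print(w_star, f_min)
print(truncated_pareto_variance(1.0, 3.0, 0.8))
```

In practice the coefficients $a$, $b$, and $c$ would be assembled from the order-statistic moments of Lemma A.1 through (A.11) and (A.12); the placeholder values here only exercise the formulas.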

Acknowledgments

The authors thank the referees and the editor for their comments, which helped to improve the paper. This research is supported by the Natural Sciences and Engineering Research Council of Canada.

References

[1] R. A. Pielke Jr., J. Gratz, C. W. Landsea, D. Collins, M. A. Saunders, and R. Musulin, "Normalized hurricane damage in the United States: 1900–2005," Natural Hazards Review, vol. 9, no. 1, pp. 29–42, 2008.

[2] P. Embrechts, C. Klüppelberg, and T. Mikosch, Modelling Extremal Events for Insurance and Finance, Springer, New York, NY, USA, 2003.

[3] C. Kleiber and S. Kotz, Statistical Size Distribution in Economics and Actuarial Sciences, John Wiley & Sons, New York, NY, USA, 2003.

[4] J. Beirlant, Y. Goegebeur, J. Segers, and J. Teugels, Statistics of Extremes: Theory and Application, John Wiley & Sons, New York, NY, USA, 2005.

[5] M. A. Beg, "Estimation of the tail probability of the truncated Pareto distribution," Journal of Information & Optimization Sciences, vol. 2, no. 2, pp. 192–198, 1981.

[6] I. B. Aban, M. M. Meerschaert, and A. K. Panorska, "Parameter estimation for the truncated Pareto distribution," Journal of the American Statistical Association, vol. 101, no. 473, pp. 270–277, 2006.

[7] B. Efron, "Bootstrap methods: another look at the jackknife," The Annals of Statistics, vol. 7, no. 1, pp. 1–26, 1979.

[8] P. Barbe and P. Bertail, The Weighted Bootstrap, Springer, New York, NY, USA, 1995.

[9] G. R. Shorack and J. A. Wellner, Empirical Processes with Applications to Statistics, John Wiley & Sons, New York, NY, USA, 1986.

[10] H. L. Koul, Weighted Empirical and Linear Models, vol. 21 of Lecture Notes-Monograph Series, Institute of Mathematical Statistics, Hayward, Calif, USA, 1992.

[11] M. L. Huang and P. H. Brill, "A distribution estimation method based on level crossings," Journal of Statistical Planning and Inference, vol. 124, no. 1, pp. 45–62, 2004.

[12] M. L. Huang, "The efficiencies of a weighted distribution function estimator," in The Proceeding of American Statistical Association, Nonparametric Statistics Section, pp. 1502–1506, 2003.

[13] B. M. Hill, "A simple general approach to inference about the tail of a distribution," The Annals of Statistics, vol. 3, no. 5, pp. 1163–1174, 1975.

[14] P. Cooke, "Statistical inference for bounds of random variables," Biometrika, vol. 66, no. 2, pp. 367–374, 1979.

[15] A. N. Kolmogorov, "Sulla determinazione empirica di una legge di distribuzione," Giornale dell'Istituto Italiano degli Attuari, vol. 4, pp. 83–91, 1933.

[16] T. W. Anderson and D. A. Darling, "Asymptotic theory of certain 'goodness of fit' criteria based on stochastic processes," The Annals of Mathematical Statistics, vol. 23, pp. 193–212, 1952.


Table 4: Errors of goodness-of-fit tests, $n = 49$, for hurricane example.

| Method | AE, $r = 49$ | AE, $r = 18$ | AE, $r = 10$ | IE, $r = 49$ | IE, $r = 18$ | IE, $r = 10$ |
|---|---|---|---|---|---|---|
| Pareto (Hill) | 0.1340 | 0.0584 | 0.0584 | 0.4844 | 0.3818 | 0.3723 |
| TPD (Aban) | 0.0948* | 0.0839 | 0.0832 | 0.3114 | 0.2565 | 0.2161 |
| TPD (Moment) | 0.1053 | 0.0738 | 0.0737 | 0.2985* | 0.2171 | 0.1825 |
| TPD (Weighted) | 0.1250 | 0.0561* | 0.0561* | 0.3098 | 0.1678* | 0.1453* |

AE: absolute error; IE: integrated error.

Table 5: The estimation of index, mean, median, and VaR for forest fire loss data.

| Method | $\hat{\alpha}$ | $\mu$ | Median | 5% value at risk |
|---|---|---|---|---|
| Pareto (Hill) | 4.3044 | 1.096 billion | 1.09 billion | 1.855 billion |
| TPD (Aban) | 0.7964 | 1.167 billion | 1.15 billion | 1.446 billion |
| TPD (Moment) | 0.8260 | 1.166 billion | 1.15 billion | 1.446 billion |
| TPD (Weighted) | 0.8642 | 1.165 billion | 1.15 billion | 1.446 billion |

The weighted estimator has the smallest AE and IE valuesfor 119903 = 18 and 10 largest losses and its IE value is almostequal to the smallest IE value for 119903 = 49 all largest losses Westatistically conclude that theweighted estimated distributionis the best fit in the tail of the hurricane loss data

62 Forest Fire Loss Data Example Next we look at the forestfire loss example in Section 112 The data in Table 1 containsa relatively substantial number of large forest fire losses whichconvince us to use a truncated Paretomodel and compare thefour estimators in (26) (27) (28) and (29) We use the 25largest losses in this study

Figure 6 is a log-log plot which shows the upper tail forthe forest fire loss data The circles represent the real dataand the straight line represents the estimated original Paretodistribution The dashed line dotted line and thick solidline represent the estimated truncated Pareto distributionby using Abanrsquos Moment and the weighted estimatorsrespectively We can see that the estimated truncated Paretodistributions fit the data very well using all three estimationmethods and are much better than the original Paretodistribution Around the tail the weighted and the Momentestimation methods perform the best The fact that the tailof the data curves downward in Figure 6 is the evidence insupport of using a truncated Pareto model

The results of these three estimators are listed in Table 5by using the forest fire loss data where 120574 = 119883

(1) ] = 119883

(119899)

and 119899 = 25Note that the weighted estimate

119908is the largest among

the three truncated Pareto estimates The 5 value at risk ofthe original Pareto estimate is the largestThe 5VaRmay beoverestimated compared to the truncated Pareto models

Similarly as in the hurricane example we compute theAE in (30) and IE in (36) again for fitting the 25 losses of theforest fire loss data

Figure 7 gives the absolute errors (AE) in (30) for the 10largest losses and confirms that the original Pareto estimate

ndash3

ndash25

ndash2

ndash15

ndash1

ndash05

0207206 208 2092 21 211

ln(x)

ln(P(X

gtx))

DataParetoAban

MomentWeighted

Figure 6 Log-log plot for forest fire example

has relatively larger errors and the weighted estimate hasrelatively smaller errors in the tail Figure 7 explains the datafitting of the tail of the distributions in Figure 6

We also compute the AE in (30) and IE in (36) to confirmthose tail errors by using 119903 as the number of the largest losses1 le 119903 le 119899 The IE values are given in Table 6 (the smallestvalues are bold with lowast)

The weighted estimator has the smallest AE and IE valuesfor 119903 = 10 and 5 and its IE value in 119903 = 25 is almost equalto the smallest IE value We statistically conclude that theweighted estimated distribution is the best fit to the tail of theforest fire loss data

7 Conclusions

In the complicated real world it is difficult to construct amodel combining all the desired features In general thefinal model selection depends on the best fitting model Thecriteria are based on goodness-of-fit tests existence of themoments characteristic largest values and log-log plots Thehurricane loss data and forest fire loss data are well fitted bythe truncated Pareto distribution In summary

(a) we recommend that the truncated Pareto model isappropriate as a loss distribution to be used whenanalyzing huge risk loss data setsTheupper and lower

10 ISRN Probability and Statistics

Table 6 Errors of goodness-of-fit tests 119899 = 25 for forest fire loss data

MethodGoodness-of-fit tests

Absolute error (AE) Integrated error (IE)119903 = 25 119903 = 10 119903 = 5 119903 = 25 119903 = 10 119903 = 5

Pareto(Hill) 02656 01288 01288 27125 11493 10985

TPD(Aban) 01503lowast 01195 01195 13687lowast 0 7911 05430

TPD(Moment) 01520 00982 00982 13736 0 7764 05430

TPD(Weighted) 01542 00964lowast 00964lowast 13796 07577lowast 05203lowast

Aban

Moment

Pareto

Weighted

012

01

008

006

004

002

0125e + 09 13e + 09 135e + 09 14e + 09 145e + 09

x

Figure 7 Absolute errors |119865lowast(119909)minus119878

119899(119909)| for forest fire loss example

the 10 largest losses

limits can be set by the largest and smallest losses orother reasonable values

(b) the estimated loss distribution provides a predictionof the next disasterrsquos 5 value at risk The largestloss in the data set plays an important role in theprediction In Figures 4 and 6 the trend of thetail distribution is crucial for an insurance companysetting policy also for inhabitants and governmentto make plans to minimize damage from naturaldisasters

(c) the semiparametricmethods (Moment andweighted)are robust easy to use more stable and fit the databetter than theMLEmethods In both of the foregoingtwo examples the estimated shape parameter is lessthan 1 The original Pareto model is not able to usetheMoment or the weightedmethodsThis is anotheradvantage of using the truncated Pareto model

(d) the statistical inference in these examples shows thatthe estimated distribution curve by using proposedweighted estimation method fits the tails of databetter relative to the Moment and MLE estimatorsThis is due to the fact that theweighted estimator givesless weight on the extreme values and has good MSEBased on these studies we suggest that further studieson the usage of weights may be useful

Appendix

Mathematical Derivations

Lemma A1 For a truncated Pareto random variable 119883 withpdf given in (3) for 119894 119895 isin 1 2 119899 119894 lt 119895 119883

(1)le 119883

(2)le

sdot sdot sdot le 119883(119899)

are order statistics one has

119864 [119883(119894)] = 120574

infin

sum

119896=0

(

1

120572+ 119896 minus 1

119896

)120588119896 Γ (119899 + 1) Γ (119896 + 119894)

Γ (119894) Γ (119896 + 119899 + 1) (A1)

119864 [1198832

(119894)] = 120574

infin

sum

119896=0

(

2

120572+ 119896 minus 1

119896

)120588119896 Γ (119899 + 1) Γ (119896 + 119894)

Γ (119894) Γ (119896 + 119899 + 1) (A2)

119864 [119883(119894)119883

(119895)]

= 1205742

infin

sum

119896=0

infin

sum

119897=0

(

1

120572+ 119896 minus 1

119896

)(

1

120572+ 119897 minus 1

119897

)

times120588119896+119897

[Γ (119899 + 1) Γ (119896 + 119894) Γ (119896 + 119897 + 119895)

Γ (119894) Γ (119896 + 119895) Γ (119896 + 119897 + 119899 + 1)]

(A3)

where 120588 = (119892120572minus 1)119892

120572 119892 = ]120574 0 lt 120588 lt 1 0 lt 119892 lt 10 lt 120574 le 119909 le ] lt infin and 120572 gt 0

Proof Let the cdf of a truncated Pareto distribution in (4)be

119865 (119909) = 119906 119909 = 120574(1 minus 119887119906)minus1120572

120588 =119892120572minus 1

119892120572 119892 =

]

120574

(A4)

By the theory of order statistics we have

119891119894(119909) =

Γ (119899 + 1)

Γ (119894) Γ (119899 minus 119894 + 1)[119865 (119909)]

119894minus1[1 minus 119865 (119909)]

119899minus119894119891 (119909)

(A5)

119891119894119895(119909 119910) =

Γ (119899 + 1)

Γ (119894) Γ (119895 minus 119894) Γ (119899 minus 119895 + 1)

times [119865 (119909)]119894minus1[119865 (119910) minus 119865 (119909)]

119895minusiminus1

times [1 minus 119865 (119910)]119899minus119895

119891 (119909) 119891 (119910)

(A6)

ISRN Probability and Statistics 11

Using the binomial formula 1(1 minus 119909)119903 = suminfin

119896=0(119903+119896minus1

119896) 119909

119896119898 = 1 2 we have

119864 [119883119898

(119894)] = int

]

120574

119909119898119891119894(119909) 119889119909

= int

1

0

120574(1 minus 120588119906)minus119898120572 Γ (119899 + 1)

Γ (119894) Γ (119899 minus 119894 + 1)

times (119906)119894minus1(1 minus 119906)

119899minus119894119889119906

=Γ (119899 + 1)

Γ (119894) Γ (119899 minus 119894 + 1)120574

times

infin

sum

119896=0

(

119898

120572+ 119896 minus 1

119896

)

timesint

1

0

120588119896(119906)

119896+119894minus1(1 minus 119906)

119899minus119894119889119906

= 120574

infin

sum

119896=0

(

119898

120572+ 119896 minus 1

119896

)120588119896 Γ (119899 + 1) Γ (119896 + 119894)

Γ (119894) Γ (119896 + 119899 + 1)

(A7)

By substituting 119898 = 1 and 2 respectively we have (A1) and(A2) And

119864 [119883(119894)119883

(119895)]

= int

119910

120574

int

]

120574

119909119894119909119895119891119894119895(119909 119910) 119889119909 119889119910

=Γ (119899 + 1)

Γ (119894) Γ (119895 minus 119894) Γ (119899 minus 119895 + 1)

times int

1

0

int

1

0

120574(1 minus 120588119906)minus1120572

120574(1 minus 120588ℎ)minus1120572

times(119906)119894minus1(ℎ minus 119906)

119895minus119894minus1(1 minus ℎ)

ℎminus119895119889119906 119889ℎ

=Γ (119899 + 1)

Γ (119894) Γ (119895 minus 119894) Γ (119899 minus 119895 + 1)1205742

times

infin

sum

119896=0

infin

sum

119896=0

(

1

120572+ 119896 minus 1

119896

) (

1

120572+ 119897 minus 1

119897

)

times 120588119896+119897

int

1

0

int

1

0

(119906

ℎ)

119896+119894minus1

(1 minus119906

ℎ)

119895minus119894minus1

times(ℎ)119896+119897+119895minus1

(1 minus ℎ)119899minus119895119889119906 119889

119906

=

infin

sum

119896=0

infin

sum

119897=0

(

1

120572+ 119896 minus 1

119896

)(

1

120572+ 119897 minus 1

119897

)

times120588119896+119897Γ (119899 + 1) Γ (119896 + 119894) Γ (119896 + 119897 + 119895)

Γ (119894) Γ (119896 + 119895) Γ (119896 + 119897 + 119899 + 1)

(A8)

Proof of Theorem 3 When 120572 gt 0 we use

119864 [120583119908] =

119899minus1

sum

119894=2

119864 [119883(119894)] +

1

2(1 minus (119899 minus 2)119908)

times (119864 [119883(1)] + 119864 [119883

(119899)])

MSE (120583119908) = 119864 [(120583

119908minus 120583)

2

]

= 119864 [1205832

119908] minus 2120583119864 [120583

119908] + 120583

2

(A9)

And we have

119864 [1205832

119908] =

119899minus1

sum

119894=2

1199082119864 [119883

2

(119894)] + 2

119899minus2

sum

119894=2

119899minus1

sum

119895=119894+1

1199082119864 [119883

(119894)119883

(119895)]

+1

4(1 minus (119899 minus 2)119908)

2

times (119864 [1198832

(1)] + 119864 [119883

2

(119899)] +2119864 [119883

(1)119883

(119899)])

+

119899minus1

sum

119894=2

119908 (1 minus (119899 minus 2)119908)

times (119864 [119883(1)119883

(119894)] + 119864 [119883

(119894)119883

(119899)])

(A10)

Let

119860 =

119899minus1

sum

119894=2

119864 [1198832

(119894)] + 2

119899minus2

sum

119894=2

119899minus1

sum

119895=119894+1

119864 [119883(119894)119883

(119895)]

119861 = 119864 [1198832

(1)] + 119864 [119883

2

(119899)] + 2119864 [119883

(1)119883

(119899)]

119862 =

119899minus1

sum

119894=2

(119864 [119883(1)119883

(119894)] + 119864 [119883

(119894)119883

(119899)])

119863 =

119899minus1

sum

119894=2

119864 [119883(119894)]

119864 = 119864 [119883(1)] + 119864 [119883

(119899)]

(A11)

Then

MSE (120583119908) = 119860119908

2+1

4(1 minus (119899 minus 2)119908)

2119861

+ 119908 (1 minus (119899 minus 2)119908)119862

minus 2120583 (119863119908 +1

2(1 minus (119899 minus 2)119908) 119864) + 120583

2

12 ISRN Probability and Statistics

= (119860 +1

4(119899 minus 2)

2119861 minus (119899 minus 2) 119862)119908

2

+ (minus1

2(119899 minus 2) 119861 + 119862 minus 2120583119863 + 120583 (119899 minus 2) 119864)119908

+1

4119861 minus 120583119864 + 120583

2

(A12)

Now use Lemma A1 then we haveTheorem 3

LemmaA2 119872119878119864(120583119908) in (20) is a convex functionwith 119886 gt 0

when 119899 gt 4119862119861 + 2 119862 and 119861 are given in (20)

Proof Let 119891(119908) = MSE(120583119908) = 119886119908

2+ 119887119908 + 119888 We have

119886 = 119860 +1

4(119899 minus 2)

2119861 minus (119899 minus 2) 119862 (A13)

where

119860 = 119864[

[

(

119899minus1

sum

119894=2

119883(119894))

2

]

]

ge 0

119861 = 119864 [(119883(1)+ 119883

(119899))2

] ge 0

119862 = 119864[

119899minus1

sum

119894=2

119883(119894)(119883

(1)+ 119883

(119899))] ge 0

(A14)

Since 119899 gt 2 then1

4(119899 minus 2) 119861 minus 119862 gt 0 such that 119886 gt 0 when 119899 gt

4119862

119861+ 2

(A15)

Proof of Corollary 5 ByTheorem 3 and Lemma A2 let

119891 (119908) = MSE (120583119908) = 119886119908

2+ 119887119908 + 119888 (A16)

which is a quadratic functionwith first and secondderivativesof 119891(119908) wrt 119908

1198911015840(119908) = 2119886119908 + 119887 119891

10158401015840(119908) = 2119886 gt 0 (A17)

Hence 119891(119908) is a convex function with minimum value

119891min = 119886(minus119887

2119886)

2

+ 119887(minus119887

2119886) + 119888 = minus

1198872

4119886+ 119888 (A18)

The maximum value of the EFF(120583119908) is

EFFmax (120583119908)

=

120572120574120572(]2minus120572 minus 1205742minus120572) (2 minus 120572) (1 minus (120574])

120572

)

119899 (minus1198872 (4119886) + 119888)

minus

12057221205742120572(]1minus120572 minus 1205741minus120572)

2

(1 minus 120572)2(1 minus (120574])

120572

)2

119899 (minus1198872 (4119886) + 119888)

at 119908 = minus119887

2119886

(A19)

Acknowledgments

The authors thank the referees and the editor for their com-ments which helped to improve the paper This research issupported by the Natural Sciences and Engineering ResearchCouncil of Canada

References

[1] R A Pielke Jr J Gratz C W Landsea D Collins M ASaunders and R Musulin ldquoNormalized hurricane damage inthe United States 1900ndash2005rdquo Natural Hazards Review vol 9no 1 pp 29ndash42 2008

[2] P Embrechts C Kluppelberg and T Mikosch ModellingExtremal Events for Insurance and Finance Springer New YorkNY USA 2003

Table 6: Errors of goodness-of-fit tests, $n = 25$, for forest fire loss data.

| Method | AE, $r = 25$ | AE, $r = 10$ | AE, $r = 5$ | IE, $r = 25$ | IE, $r = 10$ | IE, $r = 5$ |
|---|---|---|---|---|---|---|
| Pareto (Hill) | 0.2656 | 0.1288 | 0.1288 | 2.7125 | 1.1493 | 1.0985 |
| TPD (Aban) | 0.1503* | 0.1195 | 0.1195 | 1.3687* | 0.7911 | 0.5430 |
| TPD (Moment) | 0.1520 | 0.0982 | 0.0982 | 1.3736 | 0.7764 | 0.5430 |
| TPD (Weighted) | 0.1542 | 0.0964* | 0.0964* | 1.3796 | 0.7577* | 0.5203* |

AE: absolute error; IE: integrated error.

Figure 7: Absolute errors $|F^{*}(x) - S_{n}(x)|$ for forest fire loss example (curves: Aban, Moment, Pareto, Weighted; horizontal axis: $x$).

the 10 largest losses

limits can be set by the largest and smallest losses or other reasonable values;

(b) the estimated loss distribution provides a prediction of the next disaster's 5% value at risk. The largest loss in the data set plays an important role in the prediction. In Figures 4 and 6, the trend of the tail distribution is crucial for an insurance company setting policy, and also for inhabitants and governments making plans to minimize damage from natural disasters;

(c) the semiparametric methods (Moment and weighted) are robust, easy to use, more stable, and fit the data better than the MLE methods. In both of the foregoing examples, the estimated shape parameter is less than 1. The original Pareto model is not able to use the Moment or the weighted methods. This is another advantage of using the truncated Pareto model;

(d) the statistical inference in these examples shows that the distribution curve estimated by the proposed weighted estimation method fits the tails of the data better than the Moment and MLE estimators. This is because the weighted estimator places less weight on the extreme values and has a good MSE. Based on these studies, we suggest that further studies on the usage of weights may be useful.
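Item (b) refers to the 5% value at risk of a fitted truncated Pareto distribution. The following is a minimal sketch of that calculation, not the authors' implementation. It assumes the standard truncated Pareto cdf $F(x) = (1 - (\gamma/x)^{\alpha})/(1 - (\gamma/\nu)^{\alpha})$ on $[\gamma, \nu]$, which agrees with the inversion $x = \gamma(1 - \rho u)^{-1/\alpha}$ used in the Appendix, and it reads the 5% value at risk as the loss exceeded with probability 0.05. The function names and the parameter values are hypothetical.

```python
def truncated_pareto_cdf(x, alpha, gamma, nu):
    """Assumed cdf: F(x) = (1 - (gamma/x)^alpha) / (1 - (gamma/nu)^alpha) on [gamma, nu]."""
    if x <= gamma:
        return 0.0
    if x >= nu:
        return 1.0
    return (1.0 - (gamma / x) ** alpha) / (1.0 - (gamma / nu) ** alpha)


def truncated_pareto_quantile(p, alpha, gamma, nu):
    """Inverse cdf: x = gamma * (1 - rho * p)^(-1/alpha), rho = 1 - (gamma/nu)^alpha."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    rho = 1.0 - (gamma / nu) ** alpha
    return gamma * (1.0 - rho * p) ** (-1.0 / alpha)


# Hypothetical parameter values, chosen only for illustration (not fitted to any data).
alpha, gamma, nu = 0.8, 1.0, 100.0

# Loss level exceeded with probability 0.05, i.e. the 0.95 quantile.
var_5 = truncated_pareto_quantile(0.95, alpha, gamma, nu)
```

In practice the three parameters would be replaced by estimates from whichever method is being compared; the upper limit $\nu$ bounds the result from above.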

Appendix

Mathematical Derivations

Lemma A.1. For a truncated Pareto random variable $X$ with pdf given in (3), let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ be the order statistics. Then, for $i, j \in \{1, 2, \ldots, n\}$ with $i < j$, one has

\[
E[X_{(i)}] = \gamma \sum_{k=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \rho^{k}\, \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)}, \tag{A.1}
\]

\[
E[X_{(i)}^{2}] = \gamma^{2} \sum_{k=0}^{\infty} \binom{\frac{2}{\alpha} + k - 1}{k} \rho^{k}\, \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)}, \tag{A.2}
\]

\[
E[X_{(i)} X_{(j)}] = \gamma^{2} \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \binom{\frac{1}{\alpha} + l - 1}{l} \rho^{k+l} \left[ \frac{\Gamma(n+1)\,\Gamma(k+i)\,\Gamma(k+l+j)}{\Gamma(i)\,\Gamma(k+j)\,\Gamma(k+l+n+1)} \right], \tag{A.3}
\]

where $\rho = (g^{\alpha} - 1)/g^{\alpha}$, $g = \nu/\gamma > 1$, $0 < \rho < 1$, $0 < \gamma \le x \le \nu < \infty$, and $\alpha > 0$.

Proof. Let the cdf of the truncated Pareto distribution in (4) be $F(x) = u$; then

\[
x = \gamma (1 - \rho u)^{-1/\alpha}, \qquad \rho = \frac{g^{\alpha} - 1}{g^{\alpha}}, \qquad g = \frac{\nu}{\gamma}. \tag{A.4}
\]

By the theory of order statistics, we have

\[
f_{i}(x) = \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)}\, [F(x)]^{i-1} [1 - F(x)]^{n-i} f(x), \tag{A.5}
\]

\[
f_{ij}(x, y) = \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(j-i)\,\Gamma(n-j+1)}\, [F(x)]^{i-1} [F(y) - F(x)]^{j-i-1} [1 - F(y)]^{n-j} f(x) f(y), \qquad x < y. \tag{A.6}
\]

Using the binomial formula $1/(1 - x)^{r} = \sum_{k=0}^{\infty} \binom{r + k - 1}{k} x^{k}$, for $m = 1, 2$, we have

\[
\begin{aligned}
E[X_{(i)}^{m}] &= \int_{\gamma}^{\nu} x^{m} f_{i}(x)\, dx \\
&= \int_{0}^{1} \gamma^{m} (1 - \rho u)^{-m/\alpha}\, \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)}\, u^{i-1} (1 - u)^{n-i}\, du \\
&= \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)}\, \gamma^{m} \sum_{k=0}^{\infty} \binom{\frac{m}{\alpha} + k - 1}{k} \rho^{k} \int_{0}^{1} u^{k+i-1} (1 - u)^{n-i}\, du \\
&= \gamma^{m} \sum_{k=0}^{\infty} \binom{\frac{m}{\alpha} + k - 1}{k} \rho^{k}\, \frac{\Gamma(n+1)\,\Gamma(k+i)}{\Gamma(i)\,\Gamma(k+n+1)}.
\end{aligned}
\tag{A.7}
\]

By substituting $m = 1$ and $m = 2$, respectively, we obtain (A.1) and (A.2). Furthermore,

\[
\begin{aligned}
E[X_{(i)} X_{(j)}] &= \int_{\gamma}^{\nu} \int_{\gamma}^{y} x\, y\, f_{ij}(x, y)\, dx\, dy \\
&= \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(j-i)\,\Gamma(n-j+1)} \int_{0}^{1} \int_{0}^{h} \gamma (1 - \rho u)^{-1/\alpha}\, \gamma (1 - \rho h)^{-1/\alpha}\, u^{i-1} (h - u)^{j-i-1} (1 - h)^{n-j}\, du\, dh \\
&= \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(j-i)\,\Gamma(n-j+1)}\, \gamma^{2} \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \binom{\frac{1}{\alpha} + l - 1}{l} \rho^{k+l} \\
&\qquad \times \int_{0}^{1} \int_{0}^{1} \left( \frac{u}{h} \right)^{k+i-1} \left( 1 - \frac{u}{h} \right)^{j-i-1} h^{k+l+j-1} (1 - h)^{n-j}\, d\!\left( \frac{u}{h} \right) dh \\
&= \gamma^{2} \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \binom{\frac{1}{\alpha} + k - 1}{k} \binom{\frac{1}{\alpha} + l - 1}{l} \rho^{k+l}\, \frac{\Gamma(n+1)\,\Gamma(k+i)\,\Gamma(k+l+j)}{\Gamma(i)\,\Gamma(k+j)\,\Gamma(k+l+n+1)}.
\end{aligned}
\tag{A.8}
\]
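The series for the moments of a single order statistic can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the original derivation: it evaluates a truncated version of the series in (A.7), taking $x^{m} = \gamma^{m}(1 - \rho u)^{-m/\alpha}$, and compares it with a midpoint-rule approximation of the integral over $u \in (0, 1)$ from which the series is obtained. The function names, the truncation at 2000 terms, the number of quadrature steps, and the parameter values are all hypothetical choices.

```python
import math


def log_gen_binom(r, k):
    """log of C(r + k - 1, k) = Gamma(r + k) / (Gamma(r) * Gamma(k + 1)), for r > 0."""
    return math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)


def moment_series(i, n, alpha, gamma, nu, m=1, terms=2000):
    """Truncated series for E[X_(i)^m]; terms decay like rho^k since 0 < rho < 1."""
    rho = 1.0 - (gamma / nu) ** alpha
    r = m / alpha
    log_const = math.lgamma(n + 1) - math.lgamma(i)
    total = 0.0
    for k in range(terms):
        log_term = (log_gen_binom(r, k) + k * math.log(rho) + log_const
                    + math.lgamma(k + i) - math.lgamma(k + n + 1))
        total += math.exp(log_term)
    return gamma ** m * total


def moment_quadrature(i, n, alpha, gamma, nu, m=1, steps=100_000):
    """Midpoint rule for the integral over u in (0, 1) that precedes the series."""
    rho = 1.0 - (gamma / nu) ** alpha
    const = math.exp(math.lgamma(n + 1) - math.lgamma(i) - math.lgamma(n - i + 1))
    h = 1.0 / steps
    total = 0.0
    for s in range(steps):
        u = (s + 0.5) * h
        total += (1.0 - rho * u) ** (-m / alpha) * u ** (i - 1) * (1.0 - u) ** (n - i)
    return gamma ** m * const * total * h


# Hypothetical parameters: third order statistic out of five, shape below 1.
alpha, gamma, nu = 0.8, 2.0, 20.0
series_m1 = moment_series(3, 5, alpha, gamma, nu, m=1)
quad_m1 = moment_quadrature(3, 5, alpha, gamma, nu, m=1)
series_m2 = moment_series(3, 5, alpha, gamma, nu, m=2)
quad_m2 = moment_quadrature(3, 5, alpha, gamma, nu, m=2)
```

Working with log-gamma values avoids overflow in the gamma ratios for large $k$. Choosing $\gamma \ne 1$ makes the comparison sensitive to the power of $\gamma$ in front of the sum.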

Proof of Theorem 3. When $\alpha > 0$, we use

\[
\begin{aligned}
E[\mu_{w}] &= w \sum_{i=2}^{n-1} E[X_{(i)}] + \frac{1}{2} \bigl( 1 - (n-2) w \bigr) \bigl( E[X_{(1)}] + E[X_{(n)}] \bigr), \\
\operatorname{MSE}(\mu_{w}) &= E\bigl[ (\mu_{w} - \mu)^{2} \bigr] = E[\mu_{w}^{2}] - 2 \mu E[\mu_{w}] + \mu^{2}.
\end{aligned}
\tag{A.9}
\]

And we have

\[
\begin{aligned}
E[\mu_{w}^{2}] &= \sum_{i=2}^{n-1} w^{2} E[X_{(i)}^{2}] + 2 \sum_{i=2}^{n-2} \sum_{j=i+1}^{n-1} w^{2} E[X_{(i)} X_{(j)}] \\
&\quad + \frac{1}{4} \bigl( 1 - (n-2) w \bigr)^{2} \bigl( E[X_{(1)}^{2}] + E[X_{(n)}^{2}] + 2 E[X_{(1)} X_{(n)}] \bigr) \\
&\quad + \sum_{i=2}^{n-1} w \bigl( 1 - (n-2) w \bigr) \bigl( E[X_{(1)} X_{(i)}] + E[X_{(i)} X_{(n)}] \bigr).
\end{aligned}
\tag{A.10}
\]

Let

\[
\begin{aligned}
A &= \sum_{i=2}^{n-1} E[X_{(i)}^{2}] + 2 \sum_{i=2}^{n-2} \sum_{j=i+1}^{n-1} E[X_{(i)} X_{(j)}], \\
B &= E[X_{(1)}^{2}] + E[X_{(n)}^{2}] + 2 E[X_{(1)} X_{(n)}], \\
C &= \sum_{i=2}^{n-1} \bigl( E[X_{(1)} X_{(i)}] + E[X_{(i)} X_{(n)}] \bigr), \\
D &= \sum_{i=2}^{n-1} E[X_{(i)}], \\
E &= E[X_{(1)}] + E[X_{(n)}].
\end{aligned}
\tag{A.11}
\]

Then

\[
\begin{aligned}
\operatorname{MSE}(\mu_{w}) &= A w^{2} + \frac{1}{4} \bigl( 1 - (n-2) w \bigr)^{2} B + w \bigl( 1 - (n-2) w \bigr) C \\
&\quad - 2 \mu \Bigl( D w + \frac{1}{2} \bigl( 1 - (n-2) w \bigr) E \Bigr) + \mu^{2} \\
&= \Bigl( A + \frac{1}{4} (n-2)^{2} B - (n-2) C \Bigr) w^{2} \\
&\quad + \Bigl( -\frac{1}{2} (n-2) B + C - 2 \mu D + \mu (n-2) E \Bigr) w \\
&\quad + \frac{1}{4} B - \mu E + \mu^{2}.
\end{aligned}
\tag{A.12}
\]

Now, using Lemma A.1, we have Theorem 3.

Lemma A.2. $\operatorname{MSE}(\mu_{w})$ in (20) is a convex function with $a > 0$ when $n > 4C/B + 2$, where $C$ and $B$ are given in (20).

Proof. Let $f(w) = \operatorname{MSE}(\mu_{w}) = a w^{2} + b w + c$. We have

\[
a = A + \frac{1}{4} (n-2)^{2} B - (n-2) C, \tag{A.13}
\]

where

\[
\begin{aligned}
A &= E\Biggl[ \Biggl( \sum_{i=2}^{n-1} X_{(i)} \Biggr)^{2} \Biggr] \ge 0, \\
B &= E\bigl[ (X_{(1)} + X_{(n)})^{2} \bigr] \ge 0, \\
C &= E\Biggl[ \sum_{i=2}^{n-1} X_{(i)} (X_{(1)} + X_{(n)}) \Biggr] \ge 0.
\end{aligned}
\tag{A.14}
\]

Writing $a = A + (n-2) \bigl( \frac{1}{4} (n-2) B - C \bigr)$ and noting that $n > 2$, we have

\[
\frac{1}{4} (n-2) B - C > 0, \quad \text{such that } a > 0, \quad \text{when } n > \frac{4C}{B} + 2. \tag{A.15}
\]

Proof of Corollary 5. By Theorem 3 and Lemma A.2, let

\[
f(w) = \operatorname{MSE}(\mu_{w}) = a w^{2} + b w + c, \tag{A.16}
\]

which is a quadratic function with first and second derivatives of $f(w)$ with respect to $w$

\[
f'(w) = 2 a w + b, \qquad f''(w) = 2a > 0. \tag{A.17}
\]

Hence $f(w)$ is a convex function with minimum value

\[
f_{\min} = a \Bigl( -\frac{b}{2a} \Bigr)^{2} + b \Bigl( -\frac{b}{2a} \Bigr) + c = -\frac{b^{2}}{4a} + c. \tag{A.18}
\]

The maximum value of $\operatorname{EFF}(\mu_{w})$ is

\[
\operatorname{EFF}_{\max}(\mu_{w}) = \frac{\dfrac{\alpha \gamma^{\alpha} (\nu^{2-\alpha} - \gamma^{2-\alpha})}{(2 - \alpha) \bigl( 1 - (\gamma/\nu)^{\alpha} \bigr)}}{n \bigl( -b^{2}/(4a) + c \bigr)} - \frac{\dfrac{\alpha^{2} \gamma^{2\alpha} (\nu^{1-\alpha} - \gamma^{1-\alpha})^{2}}{(1 - \alpha)^{2} \bigl( 1 - (\gamma/\nu)^{\alpha} \bigr)^{2}}}{n \bigl( -b^{2}/(4a) + c \bigr)}, \quad \text{at } w = -\frac{b}{2a}. \tag{A.19}
\]
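The quadratic form of the MSE and the resulting optimal weight can be illustrated by simulation. The sketch below is a Monte Carlo illustration under stated assumptions, not the exact computation of Theorem 3: the quantities $A$, $B$, $C$, $D$, $E$ of (A.11) are replaced by sample averages over simulated truncated Pareto samples (drawn by inverting the cdf as in (A.4)), the coefficients $a$, $b$, $c$ are then formed as in (A.12), and the efficiency is read from (A.19) as $\sigma^{2}/(n f_{\min})$. Function names, the seed, the number of replications, and the parameter values are hypothetical.

```python
import random


def truncated_pareto_moment(m, alpha, gamma, nu):
    """E[X^m] for the truncated Pareto on [gamma, nu]; valid when alpha != m."""
    return (alpha * gamma ** alpha * (nu ** (m - alpha) - gamma ** (m - alpha))
            / ((m - alpha) * (1.0 - (gamma / nu) ** alpha)))


def sorted_sample(n, alpha, gamma, nu, rng):
    """Inverse-transform sample x = gamma * (1 - rho * u)^(-1/alpha), sorted."""
    rho = 1.0 - (gamma / nu) ** alpha
    return sorted(gamma * (1.0 - rho * rng.random()) ** (-1.0 / alpha)
                  for _ in range(n))


def mse_coefficients(n, alpha, gamma, nu, reps=20_000, seed=0):
    """Monte Carlo estimates of a, b, c in MSE(mu_w) = a w^2 + b w + c."""
    rng = random.Random(seed)
    mu = truncated_pareto_moment(1, alpha, gamma, nu)
    A = B = C = D = E = 0.0
    for _ in range(reps):
        x = sorted_sample(n, alpha, gamma, nu, rng)
        mid = sum(x[1:-1])      # X_(2) + ... + X_(n-1)
        ends = x[0] + x[-1]     # X_(1) + X_(n)
        A += mid * mid
        B += ends * ends
        C += mid * ends
        D += mid
        E += ends
    A, B, C, D, E = (v / reps for v in (A, B, C, D, E))
    a = A + 0.25 * (n - 2) ** 2 * B - (n - 2) * C
    b = -0.5 * (n - 2) * B + C - 2.0 * mu * D + mu * (n - 2) * E
    c = 0.25 * B - mu * E + mu * mu
    return a, b, c


# Hypothetical parameters (shape below 1, as in the two examples).
n, alpha, gamma, nu = 10, 0.8, 1.0, 10.0
a, b, c = mse_coefficients(n, alpha, gamma, nu)

w_opt = -b / (2.0 * a)              # minimizer of the quadratic, as in (A.19)
f_min = c - b * b / (4.0 * a)       # minimum MSE, as in (A.18)
f_equal = a / n ** 2 + b / n + c    # w = 1/n reproduces the ordinary sample mean

var_x = (truncated_pareto_moment(2, alpha, gamma, nu)
         - truncated_pareto_moment(1, alpha, gamma, nu) ** 2)
eff_max = var_x / (n * f_min)       # efficiency relative to the sample mean
```

Setting $w = 1/n$ gives every observation the same weight, so `f_equal` serves as a sanity check: up to simulation error it should be close to $\sigma^{2}/n$.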

Acknowledgments

The authors thank the referees and the editor for their comments, which helped to improve the paper. This research is supported by the Natural Sciences and Engineering Research Council of Canada.

References

[1] R. A. Pielke Jr., J. Gratz, C. W. Landsea, D. Collins, M. A. Saunders, and R. Musulin, “Normalized hurricane damage in the United States: 1900–2005,” Natural Hazards Review, vol. 9, no. 1, pp. 29–42, 2008.

[2] P. Embrechts, C. Kluppelberg, and T. Mikosch, Modelling Extremal Events for Insurance and Finance, Springer, New York, NY, USA, 2003.

[3] C. Kleiber and S. Kotz, Statistical Size Distribution in Economics and Actuarial Sciences, John Wiley & Sons, New York, NY, USA, 2003.

[4] J. Beirlant, Y. Goegebeur, I. Sergers, and J. Teugels, Statistics of Extremes: Theory and Application, John Wiley & Sons, New York, NY, USA, 2005.

[5] M. A. Beg, “Estimation of the tail probability of the truncated Pareto distribution,” Journal of Information & Optimization Sciences, vol. 2, no. 2, pp. 192–198, 1981.

[6] I. B. Aban, M. M. Meerschaert, and A. K. Panorska, “Parameter estimation for the truncated Pareto distribution,” Journal of the American Statistical Association, vol. 101, no. 473, pp. 270–277, 2006.

[7] B. Efron, “Bootstrap methods: another look at the jackknife,” The Annals of Statistics, vol. 7, no. 1, pp. 1–26, 1979.

[8] P. Barbe and P. Bertail, The Weighted Bootstrap, Springer, New York, NY, USA, 1995.

[9] G. R. Shorack and J. A. Wellner, Empirical Processes with Applications to Statistics, John Wiley & Sons, New York, NY, USA, 1986.

[10] H. L. Koul, Weighted Empirical and Linear Models, vol. 21 of Lecture Notes-Monograph Series, Institute of Mathematical Statistics, Hayward, Calif, USA, 1992.

[11] M. L. Huang and P. H. Brill, “A distribution estimation method based on level crossings,” Journal of Statistical Planning and Inference, vol. 124, no. 1, pp. 45–62, 2004.

[12] M. L. Huang, “The efficiencies of a weighted distribution function estimator,” in The Proceeding of American Statistical Association, Nonparametric Statistics Section, pp. 1502–1506, 2003.

[13] B. M. Hill, “A simple general approach to inference about the tail of a distribution,” The Annals of Statistics, vol. 3, no. 5, pp. 1163–1174, 1975.

[14] P. Cooke, “Statistical inference for bounds of random variables,” Biometrika, vol. 66, no. 2, pp. 367–374, 1979.

[15] A. N. Kolmogorov, “Sulla determinazione empirica di una legge di distribuzione,” Giornale dell'Istituto Italiano degli Attuari, vol. 4, pp. 83–91, 1933.

[16] T. W. Anderson and D. A. Darling, “Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes,” The Annals of Mathematical Statistics, vol. 23, pp. 193–212, 1952.

Page 13: Research Article A Weighted Estimation for Risk Modeldownloads.hindawi.com › archive › 2013 › 829131.pdf · 2019-07-31 · Research Article A Weighted Estimation for Risk Model

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of