A. H. El-Shaarawi National Water Research Institute and McMaster University Southern Ontario Statistics, Graduate Student Seminar Days, 2006 McMaster University May 12, 2006 Environmental Control and Statistical Science


Page 1: A. H. El-Shaarawi National Water Research Institute and McMaster University

A. H. El-Shaarawi, National Water Research Institute and McMaster University

Southern Ontario Statistics, Graduate Student Seminar Days, 2006

McMaster University

May 12, 2006

Environmental Control and Statistical Science

Page 2: A. H. El-Shaarawi National Water Research Institute and McMaster University

Outline

1. What is Statistical Science?

2. What are the Sources of its foundations?

3. What are its Tools?

4. How do you become a successful statistician?

5. Some statistical issues related to environmental protection and control

Page 3: A. H. El-Shaarawi National Water Research Institute and McMaster University

What is statistical science?

A coherent system of knowledge that has its own methods and areas of applications.

The success of the methods is measured by their universal acceptability and by the breadth of the scope of their applications.

Statistics has broad applications (to almost all human activities, including science and technology).

Environmental problems are complex and subject to many sources of uncertainty, and thus statistics will have a greater role to play in furthering the understanding of environmental problems.

The word “ENVIRONMETRICS” refers in part to Environmental Statistics.

Page 4: A. H. El-Shaarawi National Water Research Institute and McMaster University

What are the Sources of the foundations?

• Concepts and abstraction.
• Schematization == Models
• Models and reality (deficiency in theory leads to revision of models)

Page 5: A. H. El-Shaarawi National Water Research Institute and McMaster University

What are the Tools?

Philosophy “different schools of statistical inference”.

Mathematics. Science and technology.

Page 6: A. H. El-Shaarawi National Water Research Institute and McMaster University

How to become a successful statistician?

• Continue to upgrade your statistical knowledge.
• Improve your ability to perform statistical computation.
• Be knowledgeable in your area of application.
• Understand the objectives and scope of the problem in which you are involved.
• Read about the problem and discuss it with experts in relevant fields.
• Learn the art of oral and written communication. The message communicated depends on the interests of the audience for whom it is intended.

Page 7: A. H. El-Shaarawi National Water Research Institute and McMaster University

Environmental Problem

Research, Applications, Communications

Tools for:
• Data Acquisition
• Analysis & Interpretation
• Modeling
• Model Assessment

• Trend Analysis
• Regulations
• Improving Sampling Network
• Estimation of Loading
• Spatial & Temporal Change

• E Canada
• H Canada
• DFO
• INAC
• Provincial
• EPA
• International

Hazards, Exposure, Control

Page 8: A. H. El-Shaarawi National Water Research Institute and McMaster University

Data Acquisition

Data Analysis Empirical Models Process Models

Information

Prior Information

Page 9: A. H. El-Shaarawi National Water Research Institute and McMaster University

Modeling

Data

Time Space

Seasonal Trend Input-output Net-work

Error +Covariates

Page 10: A. H. El-Shaarawi National Water Research Institute and McMaster University

Measurements

Input System Output

Desirable Qualities of Measurements
• Effects Related
• Easy and Inexpensive
• Rapid
• Responsive and more Informative (high statistical power)

Page 11: A. H. El-Shaarawi National Water Research Institute and McMaster University

Burlington Beach

Page 12: A. H. El-Shaarawi National Water Research Institute and McMaster University

Applications:

1. Microbiological Regulations (Human health) Current U.S. Environmental Protection Agency (USEPA) guidelines for:

a) designated beaches specify a 30-day geometric mean and a single-sample maximum corresponding to the 75th percentile based on that 30-day mean [USEPA, 1986].

b) drinking water specify that the arithmetic mean coliform density of all standard samples examined per month shall not exceed one per 100 mL.

A recent EPA workshop to establish Recreational Water Quality Criteria (Chapel Hill, North Carolina, last February): the objective was not only to determine compliance but also to relate waterborne illness to the density of bacteriological indicators.

2. Estimation of Chemical Concentrations and Loadings (Ecosystem Health)

Page 13: A. H. El-Shaarawi National Water Research Institute and McMaster University

Designing Sampling Program for Recreational Water (EC, EPA)

Sampling Grid for bathing beach water quality

Setting the regulatory limits: select the indicators; determine the indicator-illness association; select indicator levels that correspond to an acceptable risk level.

Sampling Problems

Page 14: A. H. El-Shaarawi National Water Research Institute and McMaster University

Sampling Designs

Model based

Design based

Examples of sampling designs

1. Simple random sampling

2. Composite sampling

3. Ranked set sampling

Page 15: A. H. El-Shaarawi National Water Research Institute and McMaster University

Composite Sampling

Individual samples

Composite sample

Individual samples

Composite sample

Page 16: A. H. El-Shaarawi National Water Research Institute and McMaster University

Efficiency of Composite Sampling

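Table 1. Densities and moments of distributions frequently used in the analysis of bacteriological data

| Distribution | Density $f(x)$ | Mean | Variance | Skewness $\mu_3/\sigma^3$ | Kurtosis $\mu_4/\sigma^4$ |
|---|---|---|---|---|---|
| Normal | $\frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/(2\sigma^2)}$ | $\mu$ | $\sigma^2$ | $0$ | $3$ |
| Log-normal | $\frac{1}{x\sigma\sqrt{2\pi}}\,e^{-(\ln x-\mu)^2/(2\sigma^2)}$ | $e^{\mu+\sigma^2/2}$ | $e^{2\mu+\sigma^2}(e^{\sigma^2}-1)$ | $(e^{\sigma^2}+2)\sqrt{e^{\sigma^2}-1}$ | $e^{4\sigma^2}+2e^{3\sigma^2}+3e^{2\sigma^2}-3$ |
| Poisson | $e^{-\mu}\mu^{x}/x!$ | $\mu$ | $\mu$ | $\mu^{-1/2}$ | $3+1/\mu$ |
| Negative binomial | $\frac{\Gamma(x+k)}{\Gamma(k)\,x!}\left(\frac{\mu}{\mu+k}\right)^{x}\left(\frac{k}{\mu+k}\right)^{k}$ | $\mu$ | $\mu+\mu^2/k$ | $(1+2\mu/k)\,(\mu+\mu^2/k)^{-1/2}$ | $3+6/k+1/(\mu+\mu^2/k)$ |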

Page 17: A. H. El-Shaarawi National Water Research Institute and McMaster University

Efficiency for estimating the mean and variance of the distribution

Number of composite samples = m
Number of sub-samples in a single composite sample = k

Moments for the sample mean

$$E(\bar{x}) = \mu, \qquad \mathrm{Var}(\bar{x}) = \frac{\sigma^2}{mk},$$

$$\mu_3(\bar{x}) = \frac{\mu_3}{(mk)^2}, \qquad \mu_4(\bar{x}) = \frac{\mu_4 + 3(mk-1)\sigma^4}{(mk)^3}.$$

Properties of the estimator of the variance, $s^2$:

1. It is an unbiased estimator of $\sigma^2$ regardless of the values taken by k and m. The variance of this estimator is given by

$$\mathrm{Var}(s^2) = \frac{(m-1)\mu_4 + (2mk - 3m + 3)\sigma^4}{mk(m-1)}.$$

This expression shows that for:

a) $\mu_4/\sigma^4 < 3$, composite sampling improves the efficiency of $s^2$ as an estimator of $\sigma^2$ regardless of the value of k, and in this case the maximum efficiency is obtained for k = 1, which corresponds to discrete sampling.

b) $\mu_4/\sigma^4 = 3$, the efficiency of composite sampling depends only on m and is completely independent of k.

c) $\mu_4/\sigma^4 > 3$, composite sampling results in higher variance, and for fixed m the variance is maximized when k = 1.

It should be noted that the frequently used models to represent bacterial counts belong to case c above. This implies that the efficiency declines by composite sampling and maximum efficiency occurs when k = 1. Case b corresponds to the normal distribution, where the efficiency is completely independent of the number of discrete samples included in the composite sample.
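The variance expression for the composite-sampling variance estimator can be checked numerically. The sketch below is an illustration only and is not part of the original slides. It assumes that $s^2$ denotes k times the sample variance of the m composite measurements (the scaling that makes it unbiased for $\sigma^2$); the function names are hypothetical.

```python
import random
import statistics


def var_of_s2(mu4, sigma2, m, k):
    """Var(s^2) = {(m-1)*mu4 + (2mk - 3m + 3)*sigma^4} / {mk(m-1)}."""
    return ((m - 1) * mu4 + (2 * m * k - 3 * m + 3) * sigma2**2) / (m * k * (m - 1))


def simulated_var_of_s2(m, k, draw, reps=20000, seed=1):
    """Monte Carlo variance of k * (sample variance of m composite means).

    Assumption: each composite measurement is the average of k independent
    sub-samples drawn with `draw(rng)`.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        composites = [sum(draw(rng) for _ in range(k)) / k for _ in range(m)]
        estimates.append(k * statistics.variance(composites))
    return statistics.variance(estimates)


if __name__ == "__main__":
    normal = lambda rng: rng.gauss(0.0, 1.0)        # sigma^2 = 1, mu4 = 3
    expo = lambda rng: rng.expovariate(1.0)         # sigma^2 = 1, mu4 = 9
    for k in (1, 4):
        print("normal", k, var_of_s2(3.0, 1.0, 5, k), simulated_var_of_s2(5, k, normal))
        print("expo  ", k, var_of_s2(9.0, 1.0, 5, k), simulated_var_of_s2(5, k, expo))
```

For the normal case the formula reduces to $2\sigma^4/(m-1)$, which does not involve k, in line with case b.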

Page 18: A. H. El-Shaarawi National Water Research Institute and McMaster University

Health Survey

Summary of Mean Indicator Density and Swimming-Associated Gastroenteritis Rates From Trials of All U.S. Studies

| Location | Beach¹ | Year | E. coli density | Enterococcus density | Number of swimmers | Illnesses (swimmers) | Number of nonswimmers | Illnesses (nonswimmers) | Gastroenteritis rate per 1000 |
|---|---|---|---|---|---|---|---|---|---|
| Lake Erie | A | 1979 | 23 | 5.2 | 3020 | 17.2 | 2349 | 14.9 | 2.3 |
| Lake Erie | B | | 47 | 13 | 2056 | 19.5 | 2349 | 14.9 | 4.6 |
| Keystone Lake | E | | 138 | 38.8 | 3059 | 20.6 | 970 | 15.5 | 5.1 |
| Keystone Lake | W | | 19 | 6.8 | 2440 | 20 | 970 | 15.5 | 0.5 |
| Lake Erie | A | 1980 | 137 | 25 | 2907 | 16.5 | 2944 | 11.7 | 4.8 |
| Lake Erie | B | | 236 | 71 | 2427 | 26.4 | 2944 | 11.7 | 14.7* |
| Keystone Lake | E | | 52 | 23 | 5121 | 13.5 | 1211 | 8.1 | 5.2 |
| Keystone Lake | W | | 71 | 20 | 3562 | 11.2 | 1211 | 8.1 | 3.0 |
| Lake Erie | B | 1982 | 146 | 20 | 4374 | 24.9 | 1650 | 13.9 | 11.0* |

¹ A = Beach 7, B = Beach 11, E = Washington Irving Cove Beach, W = Salt Creek Cove-Keystone Ramp Beaches
* Indicates that the swimmer-nonswimmer illness rate difference is significant at the p = 0.05 level

Page 19: A. H. El-Shaarawi National Water Research Institute and McMaster University

The effects of exposure to contaminated water

Page 20: A. H. El-Shaarawi National Water Research Institute and McMaster University

Surface water quality criteria (CFU/100mL) proposed by EPA for primary contact recreational use

| Water | Indicator | Geometric Mean | Single Sample Maximum |
|---|---|---|---|
| Marine | Enterococci | 35 | 104 |
| Fresh | Enterococci | 33 | 61 |
| Fresh | E. coli | 126 | 235 |

Based on not less than 5 samples equally spaced over a 30-day period.

The selection of:
• Indicators
• Summary statistics, number of samples and the reporting period
• Control limits
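As an illustration of how the two-part criterion is applied to one reporting period, here is a minimal sketch. It is not from the slides: the function names are hypothetical, the default limits are the freshwater E. coli values from the criteria table (126 and 235 CFU/100 mL), and all counts are assumed to be strictly positive so that logarithms exist.

```python
import math

# Freshwater E. coli limits (CFU/100 mL) taken from the criteria table.
GEOMETRIC_MEAN_LIMIT = 126.0
SINGLE_SAMPLE_MAX = 235.0


def geometric_mean(counts):
    """Geometric mean of strictly positive counts."""
    return math.exp(sum(math.log(c) for c in counts) / len(counts))


def complies(counts, gm_limit=GEOMETRIC_MEAN_LIMIT,
             ssm_limit=SINGLE_SAMPLE_MAX, min_samples=5):
    """True when both the geometric-mean rule and the single-sample rule hold."""
    if len(counts) < min_samples:
        raise ValueError("the criterion needs at least %d samples" % min_samples)
    return geometric_mean(counts) <= gm_limit and max(counts) <= ssm_limit


if __name__ == "__main__":
    print(complies([100, 120, 90, 110, 130]))   # both rules satisfied
    print(complies([100, 120, 90, 110, 300]))   # one sample exceeds 235
```

Whether the samples are equally spaced over the 30-day period is not checked here; that part of the rule concerns the sampling design rather than the arithmetic.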

Page 21: A. H. El-Shaarawi National Water Research Institute and McMaster University

Approximate expression for probability of compliance with the regulations

Let

b and

a

)(

)(

g

where b is the geometric mean limit, a is the single sample maximum, $\phi(\cdot)$ is the pdf of the standard normal distribution and $\Phi(\cdot)$ is the CDF of the standard normal distribution.

)())((1Pr ngnob

)()1

)((2Pr

2

n

gg

ganob

Page 22: A. H. El-Shaarawi National Water Research Institute and McMaster University

Sample size n=5 and 10 # of simulations =10000
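A simulation of this kind can be sketched as follows. This is an illustration under stated assumptions, not the author's code: log counts are taken to be independent normal with mean `mu` and standard deviation `sigma`, each rule is evaluated separately, and all function names are hypothetical. Under that assumption the two pass probabilities also have exact forms, which the simulation should reproduce.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def p_pass_geometric_mean(mu, sigma, n, b):
    """P(geometric mean of n samples <= b) when log counts are N(mu, sigma^2)."""
    return PHI(math.sqrt(n) * (math.log(b) - mu) / sigma)


def p_pass_single_sample(mu, sigma, n, a):
    """P(all n samples <= a) when log counts are N(mu, sigma^2)."""
    return PHI((math.log(a) - mu) / sigma) ** n


def simulate_pass_rates(mu, sigma, n, a, b, sims=10000, seed=2006):
    """Monte Carlo pass rates for the geometric-mean rule and the single-sample rule."""
    rng = random.Random(seed)
    gm_pass = ssm_pass = 0
    for _ in range(sims):
        logs = [rng.gauss(mu, sigma) for _ in range(n)]
        gm_pass += sum(logs) / n <= math.log(b)
        ssm_pass += max(logs) <= math.log(a)
    return gm_pass / sims, ssm_pass / sims


if __name__ == "__main__":
    for n in (5, 10):
        print(n, simulate_pass_rates(math.log(60.0), 0.8, n, a=235.0, b=126.0))
        print(n, p_pass_geometric_mean(math.log(60.0), 0.8, n, 126.0),
              p_pass_single_sample(math.log(60.0), 0.8, n, 235.0))
```

The values of `mu` and `sigma` in the usage lines are arbitrary and chosen only to exercise the functions.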

Page 23: A. H. El-Shaarawi National Water Research Institute and McMaster University

Ratio of single sample rejection probability to that of the mean rule (n = 5,10 and 20)

nagprob

bXprob nn

1

)(1log

)(1

)(1log

)(

Page 24: A. H. El-Shaarawi National Water Research Institute and McMaster University

Modeling the Accumulation of Contaminants in Aquatic Environment

The fish (trout) contamination data:

1. Lake Ontario (n = 171); Lake Superior (n = 61)

2. Measurements (total PCBs in whole fish, age, weight, length, % fat)

– fish collected from several locations (representative of the population in the lake because the fish move all over the lake)

Page 25: A. H. El-Shaarawi National Water Research Institute and McMaster University

[Figure: distribution of PCB for L.ONT and L.Sup (two panels), and the Cdf of PCBs for L.Ont and L.Sup with the critical level marked.]

Page 26: A. H. El-Shaarawi National Water Research Institute and McMaster University

Let x(t) be a random variable representing the contaminant level in a fish at age t. The expected value of x(t) is frequently represented by the expression

$$\mu(t) = b\{1 - \exp(-\lambda t)\},$$

where b is the asymptotic accumulation level and λ is the growth parameter. Note that 1 – exp(-λt) is the cdf of the exponential distribution E(λ), and so an immediate generalization of this is

$$\mu(t) = b\,F(t; \lambda).$$

The expected instantaneous accumulation rate is f(t; λ)/F(t; λ).

One possible extension is to use the Weibull cdf

$$\mu(t) = b\{1 - \exp(-\lambda t^{m})\}.$$
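The accumulation curves can be written down directly. The following is a small sketch with hypothetical function names; it assumes the exponential form $b\{1-\exp(-\lambda t)\}$ and the Weibull form $b\{1-\exp(-\lambda t^{m})\}$, and the parameter values in the usage lines are arbitrary.

```python
import math


def exponential_accumulation(t, b, lam):
    """Expected contaminant level b * (1 - exp(-lam * t)) at age t."""
    return b * (1.0 - math.exp(-lam * t))


def weibull_accumulation(t, b, lam, m):
    """Weibull extension b * (1 - exp(-lam * t**m)); m = 1 gives the exponential form."""
    return b * (1.0 - math.exp(-lam * t**m))


def relative_accumulation_rate(t, lam, m=1.0):
    """f(t)/F(t) for F(t) = 1 - exp(-lam * t**m), defined for t > 0."""
    survival = math.exp(-lam * t**m)
    density = lam * m * t ** (m - 1.0) * survival
    return density / (1.0 - survival)


if __name__ == "__main__":
    for age in (3, 7, 14):
        print(age,
              exponential_accumulation(age, b=5.0, lam=0.2),
              weibull_accumulation(age, b=5.0, lam=0.2, m=1.5),
              relative_accumulation_rate(age, lam=0.2))
```

Both curves start at 0 and level off at b, so b can be read as the long-run contaminant level.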

Page 27: A. H. El-Shaarawi National Water Research Institute and McMaster University
Page 28: A. H. El-Shaarawi National Water Research Institute and McMaster University

[Figure: PCB concentration curves for fish aged 3, 7 and 14 years.]

Page 29: A. H. El-Shaarawi National Water Research Institute and McMaster University

Modeling: Consider a continuous-time system with a stochastic perturbation

$$\frac{dx}{dt} = b(x) + \sigma(x)\,\xi$$

with initial condition x(0) = x0, where b(x) is a given function of x and t, σ(x) is the amplitude of the perturbation, and ξ = dw/dt is a white noise assumed to be the time derivative of a Wiener process.

Examples for σ(x) = 0:
1. b(x) = - λx gives μ(x) = μ(0) exp(- λt) (pure decay)
2. b(x) = λ{μ(0) - μ(x)} gives μ(x) = μ(0) {1 - exp(- λt)} (Bertalanffy equation)

When σ(x) > 0, a complete description of the process requires finding the pdf f(t,x) and its moments given f(0,x).
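A simple way to explore such a perturbed system numerically is the Euler-Maruyama scheme, a standard discretisation that is not mentioned in the slides. The sketch below applies it to the pure-decay drift b(x) = -λx with constant amplitude σ and compares the simulated mean and variance with the known exact values for that case. Function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def euler_maruyama(x0, drift, amplitude, t_end, steps, rng):
    """Simulate one path of dx = drift(x) dt + amplitude(x) dW and return x(t_end)."""
    dt = t_end / steps
    x = x0
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment over one step
        x += drift(x) * dt + amplitude(x) * dw
    return x


def decay_moments(x0, lam, sigma, t):
    """Exact mean and variance for b(x) = -lam * x, constant sigma, fixed x(0) = x0."""
    mean = x0 * math.exp(-lam * t)
    var = sigma**2 * (1.0 - math.exp(-2.0 * lam * t)) / (2.0 * lam)
    return mean, var


if __name__ == "__main__":
    rng = random.Random(12)
    lam, sigma, x0, t_end = 0.5, 0.3, 2.0, 2.0
    ends = [euler_maruyama(x0, lambda x: -lam * x, lambda x: sigma, t_end, 200, rng)
            for _ in range(2000)]
    print("simulated:", statistics.mean(ends), statistics.variance(ends))
    print("exact:    ", decay_moments(x0, lam, sigma, t_end))
```

With σ = 0 the scheme reduces to the ordinary Euler method for the deterministic decay equation.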

Page 30: A. H. El-Shaarawi National Water Research Institute and McMaster University

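The density f (t,x) satisfies the Fokker-Planck equation or Kolmogorov forward equation

$$\frac{\partial f}{\partial t} = \frac{1}{2}\sum_{i,j=1}^{d}\frac{\partial^2 (a_{ij} f)}{\partial x_i\,\partial x_j} - \sum_{i=1}^{d}\frac{\partial (b_i f)}{\partial x_i}, \qquad t > 0,\; x \in R^{d},$$

where

$$a_{ij}(x) = \sum_{k=1}^{d}\sigma_{ik}(x)\,\sigma_{jk}(x).$$

When d = 1 this equation simplifies to

$$\frac{\partial f}{\partial t} = 0.5\,\frac{\partial^2 (\sigma^2 f)}{\partial x^2} - \frac{\partial (b f)}{\partial x}.$$

Multiplying by $x^n$ and integrating, we obtain the moments equation

$$\frac{dm_n}{dt} = 0.5\,n(n-1)\,E\{\sigma^2(x)\,x^{n-2}\} + n\,E\{b(x)\,x^{n-1}\}.$$

Clearly dm0/dt = 0 and dm1/dt = E(b).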

Page 31: A. H. El-Shaarawi National Water Research Institute and McMaster University

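In the first example with b(x) = -λx and σ(x) = σ, we have

$$\mu(x(t)) = \mu(x(0))\exp(-\lambda t),$$

$$\mathrm{Var}(x(t)) = \frac{\sigma^2}{2\lambda}\,(1 - e^{-2\lambda t}) + \mathrm{Var}(x(0))\,e^{-2\lambda t}.$$

In the second example with b(x) = λ{B - μ(x)} and σ(x) = σ, we have

$$\mu(t) = B\{1 - e^{-\lambda t}\},$$

$$\frac{dm_2}{dt} = \sigma^2 + 2\lambda\{B^2(1 - e^{-\lambda t}) - m_2\},$$

$$m_2(t) = \frac{\sigma^2}{2\lambda}\,(1 - e^{-2\lambda t}) + \mu^2(t).$$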
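The moment equations for the second example can be integrated numerically and compared with the closed forms. The sketch below is an illustration with hypothetical names. It assumes the drift λ(B - x), constant σ, and a process started at x(0) = 0, so that $dm_1/dt = \lambda(B - m_1)$ and $dm_2/dt = \sigma^2 + 2\lambda(B m_1 - m_2)$, and it uses a classical fourth-order Runge-Kutta step.

```python
import math


def moment_rhs(m1, m2, lam, big_b, sigma):
    """Right-hand sides of the first two moment equations for drift lam * (B - x)."""
    dm1 = lam * (big_b - m1)
    dm2 = sigma**2 + 2.0 * lam * (big_b * m1 - m2)
    return dm1, dm2


def integrate_moments(lam, big_b, sigma, t_end, steps=1000):
    """Classical RK4 integration from m1(0) = m2(0) = 0."""
    h = t_end / steps
    m1 = m2 = 0.0
    for _ in range(steps):
        k1 = moment_rhs(m1, m2, lam, big_b, sigma)
        k2 = moment_rhs(m1 + 0.5 * h * k1[0], m2 + 0.5 * h * k1[1], lam, big_b, sigma)
        k3 = moment_rhs(m1 + 0.5 * h * k2[0], m2 + 0.5 * h * k2[1], lam, big_b, sigma)
        k4 = moment_rhs(m1 + h * k3[0], m2 + h * k3[1], lam, big_b, sigma)
        m1 += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        m2 += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return m1, m2


def closed_form_moments(lam, big_b, sigma, t):
    """Closed forms: m1 = B(1 - e^{-lam t}), m2 = sigma^2 (1 - e^{-2 lam t})/(2 lam) + m1^2."""
    m1 = big_b * (1.0 - math.exp(-lam * t))
    m2 = sigma**2 * (1.0 - math.exp(-2.0 * lam * t)) / (2.0 * lam) + m1**2
    return m1, m2


if __name__ == "__main__":
    print(integrate_moments(0.4, 3.0, 0.5, 5.0))
    print(closed_form_moments(0.4, 3.0, 0.5, 5.0))
```

Agreement between the two outputs is a quick consistency check on the moment equation and its solution.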

Page 32: A. H. El-Shaarawi National Water Research Institute and McMaster University

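The Quasi Likelihood Equations

$$D^{\top} V^{-1}(y - \mu) = 0$$

and the variance of $\hat{\beta}$ is

$$\mathrm{Var}(\hat{\beta}) = (D^{\top} V^{-1} D)^{-1},$$

where $D = \partial\mu/\partial\beta$.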

Page 33: A. H. El-Shaarawi National Water Research Institute and McMaster University

Upstream-Downstream Water Quality Monitoring Human and Ecosystem Health

Regulations and Control

S0 S1 S2 . . . Sk-1 Sk

Niagara River

Overview of upstream-downstream monitoring (U-D M): Purpose, Design and Examples

Univariate Series and Ratio (Trend & Seasonality)

Bivariate Series

Page 34: A. H. El-Shaarawi National Water Research Institute and McMaster University
Page 35: A. H. El-Shaarawi National Water Research Institute and McMaster University

Fraser River (BC)

Page 36: A. H. El-Shaarawi National Water Research Institute and McMaster University

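b) Modeling: Semi-parametric mixed model. Spline regression with AR error + random effects models.

$$Y_{ijt} = P_{ijm}(t) + S_{it} + \varepsilon_{it} + Z_{it}\gamma_i + \eta_{it}$$

where

$$P_{jm} = \beta_{j0} + \beta_{j1}t + \dots + \beta_{jm}t^{m} \quad \text{for } t_{j-1} \le t \le t_j,\; j = 0, 1, \dots, J,$$

$\varepsilon \sim AR(p)$, $\gamma \sim MVN(0, V)$ and $\eta_{it} \sim N(0, \sigma_i^2)$.

The EM algorithm is used to obtain the ML estimates.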

Page 37: A. H. El-Shaarawi National Water Research Institute and McMaster University
Page 38: A. H. El-Shaarawi National Water Research Institute and McMaster University
Page 39: A. H. El-Shaarawi National Water Research Institute and McMaster University

[Figure, three panels plotted against day (0 to 300): estimated concentration of TP (mg/L) for Red Pass, Hansard, Marguerite and Hope; ratio of the TP concentration (two panels, with the labels Hope and Hansard/Red Pass).]

Page 40: A. H. El-Shaarawi National Water Research Institute and McMaster University

Features of the Data:
• Period: from 2 April 86, weekly (until March 96), then biweekly
• Missing values
• Nondetects (contaminants)
• Explanatory variables (flow & solid concentration)

Models

a) Niagara River, individual station

$Y \sim \text{Log-normal}(x^{\top}\beta, \sigma^2)$

$(Y^{\lambda} - 1)/\lambda \sim N(x^{\top}\beta, \sigma^2)$

Page 41: A. H. El-Shaarawi National Water Research Institute and McMaster University

Two Stations

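Ratio R = Y2/Y1

$R \sim \text{Log-normal}(x^{\top}\beta, \sigma^2)$

$(R^{\lambda} - 1)/\lambda \sim N(x^{\top}\beta, \sigma^2)$

Censoring Patterns for the Ratio of the Concentrations

| Censoring Status at FE | Censoring Status at NOTL | Censoring Status for the Ratio |
|---|---|---|
| A1 : Y1 > d1 | A2 : Y2 > d2 | Y2/Y1 observed |
| A1 : Y1 > d1 | B2 : Y2 ≤ d2 | Y2/Y1 ≤ d2/y1 |
| B1 : Y1 ≤ d1 | A2 : Y2 > d2 | Y2/Y1 ≥ y2/d1 |
| B1 : Y1 ≤ d1 | B2 : Y2 ≤ d2 | 0 < Y2/Y1 < ∞ |

Inference and the Profile Likelihood of λ

The profile likelihood of λ is

PL(λ) = L(λ, β*(λ), σ*(λ)),

where β*(λ) and σ*(λ) maximize the likelihood for fixed λ.

The relative profile likelihood of λ is

RPL(λ) = PL(λ)/PL(λ**),

where λ** maximizes PL(λ).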
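A relative profile likelihood of this kind can be computed on a grid. The sketch below makes several simplifying assumptions that go beyond the slides: the profiled parameter is taken to be the Box-Cox power λ, the model has an intercept only (no explanatory variables), and there is no censoring. The function names and the data are hypothetical.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform (y**lam - 1)/lam, with the log limit at lam = 0."""
    if abs(lam) < 1e-12:
        return math.log(y)
    return (y**lam - 1.0) / lam


def profile_loglik(data, lam):
    """Profile log-likelihood of lam (additive constants dropped).

    For fixed lam the mean and variance of the transformed data are replaced
    by their ML estimates; the last term is the Jacobian of the transform.
    """
    n = len(data)
    z = [box_cox(y, lam) for y in data]
    mean_z = sum(z) / n
    sigma2_hat = sum((v - mean_z) ** 2 for v in z) / n
    jacobian = (lam - 1.0) * sum(math.log(y) for y in data)
    return -0.5 * n * math.log(sigma2_hat) + jacobian


def relative_profile_likelihood(data, grid):
    """RPL on a grid: exp(profile log-likelihood minus its maximum over the grid)."""
    values = [profile_loglik(data, lam) for lam in grid]
    best = max(values)
    return [math.exp(v - best) for v in values]


if __name__ == "__main__":
    ratios = [1.2, 0.8, 2.5, 3.1, 0.6, 1.9, 4.2, 1.1]   # made-up ratios
    grid = [-1.0 + 0.1 * i for i in range(21)]
    for lam, rpl in zip(grid, relative_profile_likelihood(ratios, grid)):
        print(round(lam, 1), round(rpl, 3))
```

Values of λ whose relative profile likelihood stays above a chosen cut-off form a likelihood-based interval; the grid maximum here only approximates λ**.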

Page 42: A. H. El-Shaarawi National Water Research Institute and McMaster University

Ratio of GEV Distributions

Page 43: A. H. El-Shaarawi National Water Research Institute and McMaster University
Page 44: A. H. El-Shaarawi National Water Research Institute and McMaster University
Page 45: A. H. El-Shaarawi National Water Research Institute and McMaster University
Page 46: A. H. El-Shaarawi National Water Research Institute and McMaster University
Page 47: A. H. El-Shaarawi National Water Research Institute and McMaster University

Example is Canadian Ecological Effects Monitoring (EEM) Program for Pulp Mills

Risk Identification Risk Assessment Risk Management

Page 48: A. H. El-Shaarawi National Water Research Institute and McMaster University
Page 49: A. H. El-Shaarawi National Water Research Institute and McMaster University
Page 50: A. H. El-Shaarawi National Water Research Institute and McMaster University
Page 51: A. H. El-Shaarawi National Water Research Institute and McMaster University
Page 52: A. H. El-Shaarawi National Water Research Institute and McMaster University
Page 53: A. H. El-Shaarawi National Water Research Institute and McMaster University
Page 54: A. H. El-Shaarawi National Water Research Institute and McMaster University

Objectives of Environmental Effects Monitoring Program:
• Does effluent cause an effect in the environment?
• Is effect persistent over time?
• Does effect warrant correction?
• What are the causative stressors?

From 1992, all new effluent regulations require sites to do EEM. Pulp and Paper Pilot program.

Modelling the Toxicity of Canadian Pulp and Paper Effluent on the Reproduction and Survival of Ceriodaphnia:

Environmental Effects Monitoring

Page 55: A. H. El-Shaarawi National Water Research Institute and McMaster University
Page 56: A. H. El-Shaarawi National Water Research Institute and McMaster University
Page 57: A. H. El-Shaarawi National Water Research Institute and McMaster University

Environmental Effects Monitoring: Canadian Pulp and Paper Industry

Structure, Data and Objective

Pulp and Paper Industry

Cycles: 1, 2, 3

Mills: Mill 1, Mill 2, ..., Mill $I_i$

Tests:
1. Ceriodaphnia dubia: survival, reproduction
2. Fathead minnow: larval growth, survival
3. Selenastrum: algal growth, survival

Page 58: A. H. El-Shaarawi National Water Research Institute and McMaster University
Page 59: A. H. El-Shaarawi National Water Research Institute and McMaster University

Example of data (daphnia survival and reproduction)

Number of neonates produced per replicate and total female adult mortality

100000000000100

9000101000050

12642222510

525

210

13

610

13

10

611

10

012.5

016

12

11

15

915

12

14

10

10

6.25

023

14

16

22

10

22

11

13

15

28

3.13

1020

19

16

13

16

17

14

14

28

1.56

045

44

40

41

38

46

49

35

39

50

0

Mortality

10

987654321%Effluent

Page 60: A. H. El-Shaarawi National Water Research Institute and McMaster University

Example of reproduction data (one cycle)

[Figure: counts plotted against dilution factor (0 to 100) for Exp1, Exp2, Exp3 and All.]

Page 61: A. H. El-Shaarawi National Water Research Institute and McMaster University

Accounting for over dispersion

First Source

)(~| PoisN

)exp(z Frequently

)(~ kgamma leads to negative binomial distribution. Less frequently

),0(log~ normal , gives

0

2

)1()1(

}2/)1)((exp{)1()(

n

nnnNp

n

Some observations:
• No closed form for the likelihood is available, so simulation is the most convenient approach for the maximization of the likelihood function.
• Given $\sigma^2$ or $k$, quasi-likelihood is available for estimating the regression parameters, and this could be followed by the method of moments for estimating the dispersion parameters.
• Given the equality of the first two moments of the mixing distribution, it is easy to show that the third and fourth central moments under the gamma assumption are less than those under the lognormal model.
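The comparison of third and fourth moments can be made concrete. The sketch below is an illustration with hypothetical names. It standardises the mixing variable to mean 1 and variance v under both models and returns the skewness and kurtosis, using the standard formulas for the gamma and log-normal distributions. Because the variances are equal, comparing these standardised quantities is the same as comparing the central moments.

```python
import math


def gamma_shape_stats(v):
    """Skewness and kurtosis of a gamma variable with mean 1 and variance v."""
    k = 1.0 / v                      # shape parameter when the mean is 1
    return 2.0 / math.sqrt(k), 3.0 + 6.0 / k


def lognormal_shape_stats(v):
    """Skewness and kurtosis of a log-normal variable with mean 1 and variance v."""
    w = 1.0 + v                      # equals exp(sigma^2) on the log scale
    skewness = (w + 2.0) * math.sqrt(w - 1.0)
    kurtosis = w**4 + 2.0 * w**3 + 3.0 * w**2 - 3.0
    return skewness, kurtosis


if __name__ == "__main__":
    for v in (0.1, 0.5, 1.0, 2.0):
        print(v, gamma_shape_stats(v), lognormal_shape_stats(v))
```

For every variance tried, the gamma values are smaller than the log-normal ones, which is the ordering stated in the last observation.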

Page 62: A. H. El-Shaarawi National Water Research Institute and McMaster University

Second Source

)1(Iprob

!/)))((exp))(()1()(

var

)0,|()1()1,|()|(

)(

0

1

smmm

sSprob

NSiablesmofsumConsider

IxnNgIxnNgxnNg

ds

s

ds

mm

m

ii

)()(

)())(1()(

)1()(2

SESVar

SEmSVar

mSE

ds

ds

Animal

Alive Dead

Young No Young Young No Young

Page 63: A. H. El-Shaarawi National Water Research Institute and McMaster University

Sample mean vs geometric mean

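$X \sim N(\mu, \tau^2)$, then $Y = \exp(X)$ ~ log-normal distribution. The $r$th moment of Y is

$$E(Y^{r}) = E\{\exp(rX)\} = \exp\{r\mu + r^2\tau^2/2\}.$$

The ratio of MSE for the arithmetic mean to the geometric mean is

$$R = \frac{\exp(2\tau^2) - \exp(\tau^2)}{n\left[\exp(2\tau^2/n) - 2\exp\{(n+1)\tau^2/(2n)\} + \exp(\tau^2)\right]}.$$

[Figure: R plotted against tau^2 (0 to 3).]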
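The ratio R can be evaluated directly. The sketch below uses a hypothetical function name and assumes that both the arithmetic mean and the geometric mean are judged as estimators of the log-normal mean E(Y), with tau2 the variance on the log scale and n the sample size; that reading is consistent with the expression for R, in which μ cancels.

```python
import math


def mse_ratio(tau2, n):
    """MSE of the arithmetic mean divided by MSE of the geometric mean (tau2 > 0)."""
    numerator = (math.exp(2.0 * tau2) - math.exp(tau2)) / n
    denominator = (math.exp(2.0 * tau2 / n)
                   - 2.0 * math.exp((n + 1.0) * tau2 / (2.0 * n))
                   + math.exp(tau2))
    return numerator / denominator


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, [round(mse_ratio(t, n), 3) for t in (0.5, 1.0, 2.0, 3.0)])
```

For n = 1 the two estimators coincide and the ratio equals 1, which gives a simple check of the formula.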

Page 64: A. H. El-Shaarawi National Water Research Institute and McMaster University

The Maximum Likelihood Estimator

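Consider the model $Y_i = \exp(a_i^{\top}\beta + \varepsilon_i)$.

The MLE of

$$\theta(\lambda_0, \lambda_1) = \exp(\lambda_0 a^{\top}\beta)\exp(\lambda_1\sigma^2)$$

is

$$\hat{\theta}(\lambda_0, \lambda_1) = \exp(\lambda_0 a^{\top}\hat{\beta})\exp(\lambda_1\hat{\sigma}^2),$$

with

$$E\{\hat{\theta}(\lambda_0, \lambda_1)\} = \exp\left\{\lambda_0 a^{\top}\beta + \tfrac{1}{2}\lambda_0^2 c\,\sigma^2 - \tfrac{f}{2}\log(1 - 2\lambda_1\sigma^2/n)\right\}, \qquad (1)$$

where $f = n - p$ and $c = a^{\top}(A^{\top}A)^{-1}a$.

This expression immediately shows that $\hat{\theta}(\lambda_0, \lambda_1)$ is:

1. Asymptotically unbiased and consistent.
2. Positively biased, that is, it overestimates $\theta(\lambda_0, \lambda_1)$ for finite n.
3. Defined only for $n > 2\lambda_1\sigma^2$.

The last property particularly shows the serious limitation of the ML estimator, since its expectation does not exist when n does not satisfy (3), and so the ML estimator in this case is meaningless. For finite samples MLE > largest observation.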

Page 65: A. H. El-Shaarawi National Water Research Institute and McMaster University

Some simulation results (MLE)

Table 1: Relative bias in the ML estimator for different n and $\sigma^2$

| $\sigma^2$ \ n | 2 | 5 | 10 | 20 | 30 | 50 |
|---|---|---|---|---|---|---|
| 0.10 | 0.0006 | 0.0004 | 0.0002 | 0.0001 | 0.0001 | 0.0000 |
| 0.50 | 0.0190 | 0.0108 | 0.0058 | 0.0030 | 0.0020 | 0.0012 |
| 1.00 | 0.1014 | 0.0474 | 0.0244 | 0.0124 | 0.0083 | 0.0050 |
| 2.00 | - | 0.2481 | 0.1098 | 0.0522 | 0.0343 | 0.0203 |
| 3.00 | - | 0.8825 | 0.2905 | 0.1263 | 0.0808 | 0.0470 |
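The relative bias of the ML estimator of a log-normal mean can be evaluated in closed form for a simple case. The sketch below is an illustration under stated assumptions, not the author's computation: an independent sample of size n from a log-normal distribution, the estimator $\exp(\bar{x} + \hat{\sigma}^2/2)$ with $\hat{\sigma}^2$ the ML variance estimate (divisor n), and the chi-square moment generating function for the expectation. The function name is hypothetical. Under these assumptions the output agrees with the entries of Table 1 that were checked.

```python
import math


def mle_relative_bias(sigma2, n):
    """Relative bias E(estimate)/theta - 1 of exp(xbar + s2_ml / 2).

    Returns None when n <= sigma2, where the expectation does not exist.
    """
    if n <= sigma2:
        return None
    expected_ratio = (math.exp(sigma2 / (2.0 * n) - sigma2 / 2.0)
                      * (1.0 - sigma2 / n) ** (-(n - 1) / 2.0))
    return expected_ratio - 1.0


if __name__ == "__main__":
    for sigma2 in (0.10, 0.50, 1.00, 2.00, 3.00):
        row = [mle_relative_bias(sigma2, n) for n in (2, 5, 10, 20, 30, 50)]
        print(sigma2, ["-" if b is None else round(b, 4) for b in row])
```

The bias is always positive where it exists and shrinks as n grows, which matches the asymptotic unbiasedness and finite-sample overestimation stated for the ML estimator.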

Page 66: A. H. El-Shaarawi National Water Research Institute and McMaster University

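Table 2: Skewness

| $\sigma^2$ \ n | 2 | 5 | 8 | 12 | 20 | 30 | 50 |
|---|---|---|---|---|---|---|---|
| 0.01 | 0.030 | 0.012 | 0.008 | 0.005 | 0.003 | 0.002 | 0.001 |
| 0.10 | 0.311 | 0.129 | 0.081 | 0.054 | 0.033 | 0.022 | 0.013 |
| 0.50 | 2.936 | 0.909 | 0.560 | 0.373 | 0.224 | 0.149 | 0.090 |
| 1.00 | NA | 3.798 | 1.752 | 1.072 | 0.617 | 0.406 | 0.242 |
| 2.00 | NA | NA | 29.373 | 5.364 | 2.189 | 1.316 | 0.749 |
| 3.00 | NA | NA | NA | 109.194 | 6.910 | 3.136 | 1.588 |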

MLE has a heavy right tail distribution (skewed to the right)

Page 67: A. H. El-Shaarawi National Water Research Institute and McMaster University

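Table 3: Kurtosis

| $\sigma^2$ \ n | 2 | 5 | 8 | 12 | 20 | 30 | 50 |
|---|---|---|---|---|---|---|---|
| 0.01 | -1.959 | -1.984 | -1.990 | -1.993 | -1.996 | -1.997 | -1.998 |
| 0.10 | -1.458 | -1.807 | -1.883 | -1.924 | -1.955 | -1.970 | -1.982 |
| 0.50 | NA | 0.642 | -0.764 | -1.297 | -1.630 | -1.770 | -1.869 |
| 1.00 | NA | 83.616 | 7.089 | 1.481 | -0.576 | -1.212 | -1.594 |
| 2.00 | NA | NA | NA | 154.729 | 11.447 | 2.850 | -0.118 |
| 3.00 | NA | NA | NA | NA | 274.759 | 26.896 | 4.703 |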

MLE has heavy tails and sharp central part for kurtosis>0 while tails are lighter and the central part is flatter for kurtosis<0

Page 68: A. H. El-Shaarawi National Water Research Institute and McMaster University

UMVU Estimator

UMVU for is

0

^22^~

!2

1)exp(

j jj

jj

jb

ca

where aAAac 1)( and )2/)((/)2/)2(()(2 pnjpnpnb jjj .

j

i iji

j

j jj

jjj

bb

bij

jb

ccaVar

0

2

0

222

~

!2

1))(22exp()(

The mean square error is used to compare those estimators, that is

2^^

)()( BVarMSE and )()(~~

VarMSE .

Page 69: A. H. El-Shaarawi National Water Research Institute and McMaster University

UMVU : Closed form expression for n=2m-1

Theorem: )(^

nng the MLE based on df of )exp()( 2 h satisfies the recurrence relation

2

^

2

^

2 )(2

)(

n

nnn

nn gd

dng

where 2/2nn ns and 2

ns is the MLE of 2 based on n df.

It is easy to show that )2cosh()( 11

^

1 g and

12|)2cosh()(

1

1

12

^

12

mm

m

mmd

dg

For n =1, 3, 5 and 7 are

25.2

2/3

/)2cosh(3/)25.1)(2sinh(158/)2sinh(34/)2cosh(3

2/)2sinh()2cosh(

For sample of size 2 MLE reduces to the sample mean

Page 70: A. H. El-Shaarawi National Water Research Institute and McMaster University

UMVU: n even

For n =2m we have to start the recurrence relation

)()!(

)2/()( 0

^

022 I

ig

i

i

Where I0 (z) is the modified Bessel function of order 0 which can be written in the form

0

0 )cos(exp1

)( dzzI .

Approximation of this function will be needed to start the recurrence relation.

Page 71: A. H. El-Shaarawi National Water Research Institute and McMaster University

Modified Estimator

Modified Estimator for the Sample Mean. We consider an estimator of the form $\hat{m}_c = \exp(\bar{x} + c\,s^2)$, $0 \le c \le 0.5$. Select the $c$ that minimizes the MSE. This leads to

2

)1(

)3(611

6exp s

nn

nnxm

Note that as n ,^

m

Page 72: A. H. El-Shaarawi National Water Research Institute and McMaster University
Page 73: A. H. El-Shaarawi National Water Research Institute and McMaster University
Page 74: A. H. El-Shaarawi National Water Research Institute and McMaster University

[Figure: RMSE plotted against c (0.0 to 0.5) in four panels, for n = 5, 10, 20 and 40, with one curve for each value of s between 0.01 and 3.00.]
