
ESTIMATION

David M. Lane et al. Introduction to Statistics, pp. 329-369

[email protected] ICY0006: Lecture 8 1 / 22


Contents

1 Introduction (recollection of sampling distributions)

2 Estimation in Statistics

3 Student's t-Distribution

1 Introduction (recollection of sampling distributions)

If the population mean µ and standard deviation σ are given (1)

We have for (a) random sample(s):

Sample mean:

Normally distributed random value; mean of the sample mean: $\mu_{\bar{X}} = \mu$

Standard error $\sigma_{\bar{X}} = \sigma/\sqrt{N}$, where N is the sample size

Confidence interval:

Confidence interval for the sample mean: Limits $= \mu_{\bar{X}} \pm z\,\sigma_{\bar{X}}$, where z = 1.96 for 95% confidence and z = 2.58 for 99% confidence

Difference between means:

Difference: $\mu_{M_1-M_2} = \mu_1 - \mu_2$

Standard error: $\sigma_{M_1-M_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
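The formulas on this slide can be sketched in a few lines of Python. This is only an illustration: the function names and the population values (µ = 100, σ = 15, N = 25) are assumptions chosen for the example, not part of the lecture.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(N)."""
    return sigma / math.sqrt(n)

def z_limits(mu: float, sigma: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Limits mu +/- z * (sigma / sqrt(N)); z = 1.96 for 95%, 2.58 for 99%."""
    margin = z * standard_error(sigma, n)
    return mu - margin, mu + margin

def se_difference(sigma1: float, n1: int, sigma2: float, n2: int) -> float:
    """Standard error of the difference between two sample means."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Hypothetical population: mu = 100, sigma = 15, samples of size 25.
lower, upper = z_limits(mu=100.0, sigma=15.0, n=25)
print(f"95% limits: {lower:.2f} to {upper:.2f}")
print(f"SE of difference: {se_difference(15.0, 25, 15.0, 25):.3f}")
```

Passing `z=2.58` instead of the default gives the corresponding 99% limits.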


If the population mean µ and standard deviation σ are given (2)

We have for (a) random sample(s):

Correlation coefficient r:

The sample Pearson correlation coefficient:

$$r = r_{xy} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}$$

Transform r to the approximately normally distributed Fisher's z′:

$$z' = 0.5 \ln\frac{1+r}{1-r}$$

Standard error for z′: $\sigma_{z'} = 1/\sqrt{n-3}$

Compute a confidence interval for N(z′, σ_z′), i.e. for 95% confidence:

$$[lb_{z'},\, ub_{z'}] = [z' - 1.96\,\sigma_{z'},\; z' + 1.96\,\sigma_{z'}]$$

Convert the confidence interval back to r, i.e. lb_z′ → lb and ub_z′ → ub, using the inverse transformation:

$$r = \frac{e^{2z'} - 1}{e^{2z'} + 1}$$
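The four steps above (compute r, transform to z′, build the interval, transform back) might look as follows in Python. The function names and the sample data are hypothetical and serve only to illustrate the procedure.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Sample Pearson correlation coefficient from the raw-sum formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx**2) * math.sqrt(n * syy - sy**2)
    )

def fisher_ci(r: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Confidence interval for a correlation via Fisher's z' transformation."""
    z_prime = 0.5 * math.log((1 + r) / (1 - r))   # r -> z'
    se = 1 / math.sqrt(n - 3)                     # standard error of z'
    lb_z, ub_z = z_prime - z * se, z_prime + z * se

    def back_to_r(v: float) -> float:             # z' -> r (inverse transformation)
        return (math.exp(2 * v) - 1) / (math.exp(2 * v) + 1)

    return back_to_r(lb_z), back_to_r(ub_z)

# Hypothetical data, only to exercise the formulas.
r = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
lb, ub = fisher_ci(0.5, n=28)
print(f"r = {r:.2f}; 95% CI for r = 0.5, n = 28: [{lb:.3f}, {ub:.3f}]")
```

Note that the resulting interval is not symmetric around r, because the back-transformation is nonlinear.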


If the population proportion parameter π is given

We have for (a) random sample(s):

Sample proportions:

Mean: $\mu_p = \pi$

Standard deviation: $\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{N}}$
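As a quick numeric check of the formula for σ_p, a minimal sketch follows. The function name and the values π = 0.5, N = 100 are assumptions made for this example.

```python
import math

def proportion_sd(pi: float, n: int) -> float:
    """Standard deviation of the sample proportion: sqrt(pi * (1 - pi) / N)."""
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie between 0 and 1")
    return math.sqrt(pi * (1 - pi) / n)

# Hypothetical population proportion and sample size.
sd = proportion_sd(0.5, 100)
print(f"sigma_p = {sd:.3f}")
```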

2 Estimation in Statistics

Estimation

Estimation refers to the process by which one makes inferences about a population, based on information obtained from a sample.

An estimate of a population parameter may be expressed in two ways:

Point estimate. A point estimate of a population parameter is a single value of a statistic. For example, the sample mean X̄ is a point estimate of the population mean µ. Similarly, the sample proportion p is a point estimate of the population proportion π.

Interval estimate. An interval estimate is defined by two numbers, between which a population parameter is said to lie. For example, a < µ < b is an interval estimate of the population mean µ. It indicates that the population mean is greater than a but less than b.


Confidence Intervals

A confidence interval is used to express the precision and uncertainty associated with a particular sampling method.

A confidence interval consists of three parts:
1 A confidence level.
2 A statistic.
3 A margin of error.

The confidence level describes the likelihood (e.g. 95% or 99%) that a particular sampling method will produce a confidence interval that includes the true population parameter.

The statistic and the margin of error define an interval estimate that describes the precision of the method. The interval estimate of a confidence interval is defined by

sample statistic ± margin of error


The sampling distribution of the mean was based on knowing the standard deviation of the population.

But you almost never know the standard deviation of the population.


Degrees of Freedom

Degrees of freedom (abbreviated df or ν) is the number of values in the final calculation of a statistic that are free to vary.

Df is the number of independent pieces of information that go into the estimate of a parameter.

In general, the degrees of freedom of an estimate of a parameter are equal to the number of independent scores that go into the estimate minus the number of parameters used as intermediate steps in the estimation of the parameter itself.

For instance, the sample variance has N−1 degrees of freedom, since it is computed from N random scores minus the single parameter estimated as an intermediate step (the sample mean).


Example: a sample {A, B}

An estimate of the population mean would be M = (A+B)/2

Two estimates of variance:

Estimate1 = (A−M)² = (A−B)²/4

Estimate2 = (B−M)² = (B−A)²/4

The estimates are equal (they are not independent): knowing A and M, one can compute B.


The sample variance has N−1 degrees of freedom.

The estimate of the variance in a sample is:

$$s^2 = \frac{\sum (X - M)^2}{df} = \frac{\sum (X - M)^2}{N-1}$$

s² is called an unbiased estimate of the population variance.
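One way to see why dividing by N−1 gives an unbiased estimate is to enumerate every possible sample from a tiny population and average the estimates exactly. The sketch below assumes a hypothetical population {1, 2, 3} and samples of size 2 drawn with replacement; neither comes from the lecture.

```python
from fractions import Fraction
from itertools import product

def variance(values, ddof: int) -> Fraction:
    """Sum of squared deviations from the mean, divided by (N - ddof)."""
    n = len(values)
    m = Fraction(sum(values), n)
    return sum((x - m) ** 2 for x in values) / (n - ddof)

population = [1, 2, 3]                      # hypothetical population
pop_var = variance(population, ddof=0)      # true population variance

# All 9 equally likely samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))
mean_unbiased = sum(variance(s, ddof=1) for s in samples) / len(samples)
mean_biased = sum(variance(s, ddof=0) for s in samples) / len(samples)

print("population variance:        ", pop_var)
print("average of s^2 with N-1:    ", mean_unbiased)
print("average of estimate with N: ", mean_biased)
```

Averaged over all samples, the N−1 version matches the population variance, while dividing by N underestimates it.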


Bias

Components of total error: nonsampling bias, sampling bias, and sampling error.


Nonsampling bias

Types:

Sampling frame is not equal to the population to which you want to generalize (sampling universe)
- Sampling frame out of date
- Non-response among sampling units in sampling frame

Measurement error
- Tape incorrectly fixed to height board
- Scale consistently reads low by 0.5 kg
- Failure to remove heavy clothing before weighing
- Misleading questions
- Recall bias


Sampling bias

Selection of a nonrepresentative sample, i.e., the likelihood of selection is not equal for each sampling unit

Failure to weight the analysis of an unequal-probability sample

In sum, you have not sampled people with equal probability and you have not accounted for this in your analysis!

Examples

Non-representative sample
- Selecting youngest child in household
- Choosing households close to the road
- Using a different sampling fraction in different provinces

Failure to do statistical weighting


Sampling error

Difference between survey result and population value due to random selection of sample

Influenced by:
- Sample size
- Sampling scheme

Unlike non-sampling bias and sampling bias, it can be predicted, calculated, and accounted for.

Examples

Measures of sampling error:
- Confidence limits
- Standard error
- Coefficient of variation
- P values
- Others

Use these measures to:
- Calculate sample size prior to sampling
- Determine how sure we are of result after analysis


Summarizing bias and sampling error

Sampling error

- Difference between survey result and population value due to random selection of sample
- Greater with smaller sample sizes
- Induces lack of precision

Bias

- Difference between survey result and population value due to error in measurement, selection of non-representative sample or other factors
- Due to factors other than sample size
- Therefore, a large sample size cannot guarantee absence of bias
- Induces lack of accuracy, even with good precision
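The contrast between the two error sources can be illustrated numerically. The sketch below relies on the standard decomposition of the mean squared error of an estimator into squared bias plus variance; the scale that reads 0.5 kg low is borrowed from the measurement-error examples, while σ = 5 kg and the sample sizes are assumptions made for this illustration.

```python
import math

def total_error(bias: float, sigma: float, n: int) -> float:
    """Root-mean-square error of a sample mean: sqrt(bias^2 + SE^2)."""
    standard_error = sigma / math.sqrt(n)
    return math.sqrt(bias**2 + standard_error**2)

# Hypothetical weighing survey: the scale reads 0.5 kg low, sigma = 5 kg.
for n in (25, 400, 10_000):
    se = 5.0 / math.sqrt(n)
    print(f"N={n:>6}  standard error={se:.3f}  total error={total_error(-0.5, 5.0, n):.3f}")
```

The standard error shrinks as N grows, but the total error never drops below the 0.5 kg bias.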

3 Student's t-Distribution

Student's t-Distribution

Student's t-distribution has the probability density function given by

$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}},$$

where ν is the number of degrees of freedom and Γ is the gamma function.

It was developed by William Sealy Gosset under the pseudonym Student.

Whereas a normal distribution describes a full population, t-distributions describe samples drawn from a full population; accordingly, the t-distribution for each sample size is different, and the larger the sample, the more the distribution resembles a normal distribution.
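The density can be evaluated directly from its definition. The following sketch uses only the Python standard library; the function names are illustrative, and `math.lgamma` is used in place of Γ itself purely to avoid overflow for large ν.

```python
import math

def t_pdf(t: float, nu: float) -> float:
    """Density of Student's t-distribution with nu degrees of freedom."""
    # Gamma((nu+1)/2) / Gamma(nu/2), computed on the log scale.
    log_ratio = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    coef = math.exp(log_ratio) / math.sqrt(nu * math.pi)
    return coef * (1 + t * t / nu) ** (-(nu + 1) / 2)

def normal_pdf(t: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

# Compare the tails at t = 3 as the degrees of freedom grow.
for nu in (1, 2, 3, 30, 1000):
    print(f"nu={nu:>4}  f(3)={t_pdf(3.0, nu):.5f}")
print(f"normal   f(3)={normal_pdf(3.0):.5f}")
```

For small ν the tails are visibly heavier than those of the normal distribution, and the values approach the normal density as ν increases.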


Figure: Density of the t-distribution (red) for 1, 2, and 3 degrees of freedom compared to the standard normal distribution (blue). Previous plots shown in green.


The critical value

Recall that the critical values for the normal distribution are

Z.95 = 1.96

Z.99 = 2.58

The critical values for the t-distribution, tCL, depend on the degrees of freedom.
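Critical values are normally read from a table or obtained from a statistics library. As a rough standard-library sketch, they can also be approximated by integrating the t density numerically (Simpson's rule) and inverting the result by bisection. This numerical approach and all function names are choices made for this illustration, not the lecture's method.

```python
import math
from statistics import NormalDist

def t_pdf(t: float, nu: float) -> float:
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_ratio = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    return math.exp(log_ratio) / math.sqrt(nu * math.pi) * (1 + t * t / nu) ** (-(nu + 1) / 2)

def t_cdf(t: float, nu: float) -> float:
    """P(T <= t): 0.5 plus the integral of the density from 0 to t (Simpson's rule)."""
    n = 2 * max(50, math.ceil(abs(t) * 50))   # even number of subintervals
    h = t / n
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3

def t_critical(confidence: float, nu: float) -> float:
    """Two-sided critical value t_CL, found by bisection on the CDF."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < target:             # expand until the root is bracketed
        hi *= 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def z_critical(confidence: float) -> float:
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

print(f"Z.95 = {z_critical(0.95):.3f}, Z.99 = {z_critical(0.99):.3f}")
for df in (3, 4, 10, 30):
    print(f"df={df:>2}  t.95 = {t_critical(0.95, df):.3f}")
```

As the degrees of freedom increase, the t critical values decrease toward the normal ones.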


Confidence Interval for the Mean

We do not know µ and σ for the population. The steps of analysis:

1 Take a sample of size N (i.e. df = N−1);
2 Compute the sample mean M and variance s² (estimates of the population mean and variance), and the standard error s_M = s/√N;
3 Decide the confidence level (usually 0.95 or 0.99);
4 Compute the confidence interval:

Lower limit = M − (tCL)(s_M)

Upper limit = M + (tCL)(s_M)

Example

Suppose the five numbers 2, 3, 5, 6, and 9 are sampled from a normal distribution.

We have M = 5, s² = ∑(X−M)²/(N−1) = 7.5, and s_M = s/√N = 1.225.

For the 95% confidence level and df = 5−1 = 4, the critical value is tCL = 2.776.

The confidence interval:

Lower limit = 5 − 2.776 · 1.225 = 1.60

Upper limit = 5 + 2.776 · 1.225 = 8.40

or 5 ± 3.40.
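The worked example can be reproduced with a short Python sketch. The function name is hypothetical, and the critical value 2.776 is passed in as given in the example rather than computed.

```python
import math
import statistics

def mean_confidence_interval(sample: list[float], t_cl: float) -> tuple[float, float, float]:
    """Return (lower limit, upper limit, standard error) for the mean."""
    n = len(sample)
    m = statistics.mean(sample)          # sample mean M
    s = statistics.stdev(sample)         # square root of s^2, which divides by N-1
    s_m = s / math.sqrt(n)               # standard error of the mean
    margin = t_cl * s_m
    return m - margin, m + margin, s_m

# Data and critical value from the example (95% confidence, df = 4).
lower, upper, s_m = mean_confidence_interval([2, 3, 5, 6, 9], t_cl=2.776)
print(f"s_M = {s_m:.3f}, interval = [{lower:.2f}, {upper:.2f}]")
```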


Difference between Means (Computations for Unequal Sample Sizes)

We have the following three assumptions:

The two populations have the same variance. This assumption is called the assumption of homogeneity of variance.

The populations are normally distributed.

Each value is sampled independently from each other value.

A confidence interval on the difference between means is computed using the following formula:

Lower limit = M1 − M2 − (tCL)(S_{M1−M2})

Upper limit = M1 − M2 + (tCL)(S_{M1−M2})

where M1 − M2 is the difference between sample means, tCL is the t for the desired level of confidence, and S_{M1−M2} is the estimated standard error of the difference between sample means.

S_{M1−M2} is estimated from the "mean square error" (MSE), which equals the sum of squares error (SSE) divided by the degrees of freedom, and from the harmonic mean n_h of the two sample sizes:

$$SSE = \sum (X - M_1)^2 + \sum (X - M_2)^2$$

$$MSE = SSE/df$$

$$df = (n_1 - 1) + (n_2 - 1)$$

$$n_h = \frac{2}{\frac{1}{n_1} + \frac{1}{n_2}}$$

$$S_{M_1-M_2} = \sqrt{\frac{2\,MSE}{n_h}} = \sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$


Difference between Means (2)

Example

Two samples are given:

Sample1 = {3, 4, 5} and Sample2 = {2, 4}

We have:

M1 = 4, M2 = 3, and M1 − M2 = 1

SSE = (3−4)² + (4−4)² + (5−4)² + (2−3)² + (4−3)² = 4

df = (3−1) + (2−1) = 3

MSE = 4/3 = 1.333

$$S_{M_1-M_2} = \sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{1.333\left(\frac{1}{3} + \frac{1}{2}\right)} = \sqrt{1.11} = 1.054$$

tCL for 3 df and the 0.05 level = 3.182.

Lower limit = 1 − 3.182 · 1.054 = −2.35

Upper limit = 1 + 3.182 · 1.054 = 4.35
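The same computation can be sketched in Python. The function name is hypothetical, the harmonic mean of the sample sizes is written out explicitly, and the critical value 3.182 is supplied as given in the example.

```python
import math

def diff_means_ci(sample1: list[float], sample2: list[float], t_cl: float) -> tuple[float, float, float]:
    """Return (lower limit, upper limit, standard error) for M1 - M2."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    # Sum of squares error pooled over both samples.
    sse = sum((x - m1) ** 2 for x in sample1) + sum((x - m2) ** 2 for x in sample2)
    df = (n1 - 1) + (n2 - 1)
    mse = sse / df
    n_h = 2 / (1 / n1 + 1 / n2)          # harmonic mean of the sample sizes
    se = math.sqrt(2 * mse / n_h)        # estimated standard error of M1 - M2
    diff = m1 - m2
    return diff - t_cl * se, diff + t_cl * se, se

# Samples and critical value from the example (df = 3, 0.05 level).
lower, upper, se = diff_means_ci([3, 4, 5], [2, 4], t_cl=3.182)
print(f"S = {se:.3f}, interval = [{lower:.2f}, {upper:.2f}]")
```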
