Chapter 4

Exercise 1

A 95% CI is a range of values computed so that, over repeated random samples, it contains the population parameter with probability 0.95. Its limits are determined by the values that correspond to the 0.025 and 0.975 quantiles of the sampling distribution of the sample statistic.

Exercise 2

c is the 1-α/2 quantile of the standard normal distribution. From Table 1 or the R function qnorm:

For a CI of 0.80, the 0.90 quantile is 1.282
For a CI of 0.92, the 0.96 quantile is 1.751
For a CI of 0.98, the 0.99 quantile is 2.326
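The table lookups can be reproduced in one call. This is a minimal sketch using base R only; the variable names are illustrative.

```r
# Confidence levels from the exercise
conf_level <- c(0.80, 0.92, 0.98)

# alpha is the probability left outside the interval
alpha <- 1 - conf_level

# c is the 1 - alpha/2 quantile of the standard normal distribution
crit <- qnorm(1 - alpha / 2)

# Show the quantile level next to each critical value
print(data.frame(conf_level, quantile = 1 - alpha / 2, c = round(crit, 3)))
```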

Exercise 3

• From Table 1: c = 1.96, the 1-α/2 quantile

Exercise 4

From Table 1:

Exercise 5

μ=1200, σ=25, n=36

For a CI of 95%:

The 95% CI for μ does not contain 1200, so the claim seems unreasonable.
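The half-width of this interval follows from the values given above (σ=25, n=36). The sketch below computes it and wraps the known-σ interval in a small helper. The helper name ci_known_sigma is illustrative, and the sample mean is left as an argument because its value does not appear in this text.

```r
sigma <- 25
n <- 36

# 0.975 quantile of the standard normal, used for a 95% CI
crit <- qnorm(0.975)

# Half-width of the CI: c * sigma / sqrt(n)
half_width <- crit * sigma / sqrt(n)
print(half_width)

# Hypothetical helper: CI for mu when sigma is known.
# xbar must be supplied by the reader; it is not given in this text.
ci_known_sigma <- function(xbar, sigma, n, conf = 0.95) {
  crit <- qnorm(1 - (1 - conf) / 2)
  xbar + c(-1, 1) * crit * sigma / sqrt(n)
}
```

Any sample mean more than about 8.17 away from 1200 gives a 95% CI that excludes 1200.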

Exercise 6

Exercise 7

• Random sampling requires:

1. That all observations are sampled from the same distribution

2. That the sampled observations are independent, meaning that the probability of sampling a given observation does not alter the probability of sampling another. (Note: this is not the same as equal probability)

Exercise 8

The sampling distribution of the sample mean is centered on the population mean μ, so its mean will be 9.

The variance of the sampling distribution is given by σ²/n.

In this case:

Exercise 9

x:    1    2    3    4
P(x): 0.2  0.1  0.5  0.2

So μ = Σ x·P(x) = 2.7 and σ² = Σ (x-μ)²·P(x) = 1.01
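The population mean and variance implied by this probability function can be checked with a few lines of base R. This is a minimal sketch; the variable names are illustrative.

```r
x <- 1:4
p <- c(0.2, 0.1, 0.5, 0.2)

# A probability function must sum to 1
stopifnot(isTRUE(all.equal(sum(p), 1)))

# Population mean: sum of x * P(x)
mu <- sum(x * p)

# Population variance: sum of (x - mu)^2 * P(x)
sigma2 <- sum((x - mu)^2 * p)

print(c(mean = mu, variance = sigma2))
```

These are the values 2.7 and 1.01 used in Exercises 10 and 11.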

Exercise 10

The expected value of the sample mean equals the population mean, so if you average 1000 sample means the grand average should approximately equal μ, in this case, 2.7.

Exercise 11

Based on the same principle, the expected value of the sample variance equals the population variance, so the average of 1000 sample variances should approximately equal σ², in this case 1.01.
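Both claims (Exercises 10 and 11) can be illustrated with a small simulation from the probability function in Exercise 9. This is a sketch under assumptions: the per-sample size of 20 and the seed are arbitrary choices, not values taken from the exercises.

```r
set.seed(10)  # arbitrary seed, for reproducibility only

x <- 1:4
p <- c(0.2, 0.1, 0.5, 0.2)
n_obs <- 20  # assumed sample size; the exercises do not state one here

# 1000 random samples, one per column
draws <- replicate(1000, sample(x, n_obs, replace = TRUE, prob = p))

sample_means <- colMeans(draws)
sample_vars <- apply(draws, 2, var)

# Averages across the 1000 samples
print(mean(sample_means))  # should be close to mu = 2.7
print(mean(sample_vars))   # should be close to sigma^2 = 1.01
```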

Exercise 12

a=c(2,6,10,1,15,22,11,29), n=8
> var(a)
[1] 94.28571

The variance of the sample mean is estimated by s²/n = 94.28571/8 = 11.786

And the standard error is estimated by s/√n = √11.786 = 3.433
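The two estimates can be computed directly from the data. A minimal base R sketch, with illustrative variable names:

```r
a <- c(2, 6, 10, 1, 15, 22, 11, 29)
n <- length(a)

# Estimated variance of the sample mean (squared standard error): s^2 / n
sq_se <- var(a) / n

# Estimated standard error: s / sqrt(n)
se <- sqrt(sq_se)

print(c(squared_se = sq_se, se = se))
```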

Exercise 13

The estimate of μ in this case would be based on a single observation = 32.

With a single observation, it is not possible to estimate the standard error because the sample variance cannot be computed from one value.

As the sample size increases, the variance of the sampling distribution (the squared standard error) decreases. Note that n is in the denominator of the standard error. Lower variance in the sampling distribution means a smaller standard error, and so less error in the sample estimates.

Exercise 14

b=c(450,12,52,80,600,93,43,59,1000,102,98,43), n=12
> var(b)
[1] 93663.52

• Squared SE = s²/n = 93663.52/12 = 7805.29

Exercise 15

b=c(450,12,52,80,600,93,43,59,1000,102,98,43)
> out(b)
$out.val
[1] 450 600 1000

These outliers substantially inflate the standard error, as they inflate the variance.
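The function out is not part of base R. The sketch below applies a MAD-median rule in base R and flags the same three values; that out uses this rule with this cutoff by default is an assumption, and the function name mad_median_outliers is illustrative.

```r
b <- c(450, 12, 52, 80, 600, 93, 43, 59, 1000, 102, 98, 43)

# MAD-median rule (assumed to match the default behavior of out()):
# flag x when |x - median| / MAD exceeds the cutoff.
# mad() already rescales by 1.4826 so that it estimates sigma under normality.
mad_median_outliers <- function(x, crit = sqrt(qchisq(0.975, 1))) {
  x[abs(x - median(x)) / mad(x) > crit]
}

print(mad_median_outliers(b))

# Effect on the estimated standard error of the mean
keep <- b[!(b %in% mad_median_outliers(b))]
print(c(se_all = sd(b) / sqrt(length(b)),
        se_without_outliers = sd(keep) / sqrt(length(keep))))
```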

Exercise 16

c=c(6,3,34,21,34,65,23,54,23), n=9
> var(c)
[1] 413.9444

The squared SE is s²/n = 413.9444/9 = 45.99

Exercise 17

No. An accurate estimate of the standard error requires independence among sampled observations.

Exercise 18

The variance of the mixed normal is 10.9, so the squared standard error for a sample of 25 would be 10.9/25=0.436, compared to 1/25=0.04 under the standard normal.

This means that under small departures from normality, the squared standard error can inflate more than 10-fold (the standard error itself by a factor of about 3.3). The inflation greatly increases error, and the length of CIs.
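The figure 10.9 can be reproduced under one common definition of the mixed (contaminated) normal: with probability 0.9 sample from N(0,1), otherwise from a normal with mean 0 and standard deviation 10. That definition is an assumption here, since the text above only states the resulting variance.

```r
# Assumed mixture: 0.9 * N(0, sd = 1) + 0.1 * N(0, sd = 10)
eps <- 0.1   # contamination proportion (assumption)
k <- 10      # sd of the contaminating component (assumption)
n <- 25

# Both components have mean 0, so the mixture variance is the
# weighted average of the component variances
var_mixed <- (1 - eps) * 1^2 + eps * k^2

sq_se_mixed <- var_mixed / n   # squared SE under the mixed normal
sq_se_normal <- 1 / n          # squared SE under the standard normal

print(c(var_mixed = var_mixed,
        sq_se_mixed = sq_se_mixed,
        sq_se_normal = sq_se_normal,
        ratio_sq_se = sq_se_mixed / sq_se_normal,
        ratio_se = sqrt(sq_se_mixed / sq_se_normal)))
```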

Exercise 19

When sampling from a non-normal distribution, the sampling distribution of the mean no longer conforms to the probabilities of the normal curve. In other words, the sampling distribution is no longer normal, so the SE cannot be used to determine probabilities and CIs accurately.

Exercise 20

μ=30, σ=2, n=16, so SE=2/4=0.5. Determine Z, and consult Table 1, or use R.

For P(X̄ ≤ 29):
> pnorm(29,30,2/sqrt(16))
[1] 0.02275013

For P(X̄ > 30.5):
> pnorm(30.5,30,2/sqrt(16))
[1] 0.8413447
1-0.841=0.159

For P(29 ≤ X̄ ≤ 31):
> pnorm(31,30,2/sqrt(16))
[1] 0.9772499
0.97725-0.02275=0.9545
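"Determine Z, and consult Table 1" and the direct pnorm call are two routes to the same number. A minimal sketch of the equivalence, using the values of this exercise; the variable names are illustrative.

```r
mu <- 30
sigma <- 2
n <- 16
se <- sigma / sqrt(n)   # 0.5

# Route 1: standardize, then use the standard normal (what Table 1 tabulates)
z <- (29 - mu) / se
p_z <- pnorm(z)

# Route 2: give pnorm the mean and SE of the sampling distribution directly
p_direct <- pnorm(29, mu, se)

print(c(z = z, p_from_z = p_z, p_direct = p_direct))

# Probability that the sample mean falls between 29 and 31
p_between <- pnorm(31, mu, se) - pnorm(29, mu, se)
print(p_between)
```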

Exercise 21

μ=5, σ=5, n=25, so SE=5/5=1. Determine Z, and consult Table 1, or use R.

a. pnorm(4,5,1)
[1] 0.1586553

b. pnorm(7,5,1)
[1] 0.9772499
1-0.977=0.023

c. pnorm(3,5,1)
[1] 0.02275013
0.97725-0.02275=0.9545

Exercise 22

μ=100000, σ=10000, n=16, so SE=10000/4=2500

Z=(95000-100000)/2500=-2. From Table 1, P(Z ≤ -2)=0.0228

Using R:
> pnorm(95000,100000,10000/sqrt(16))
[1] 0.02275013

Exercise 23

μ=100000, σ=10000, n=16, so SE=10000/4=2500

Compute z scores for each value and consult Table 1. Or use R:
> pnorm(97500,100000,10000/sqrt(16))
[1] 0.1586553
> pnorm(102500,100000,10000/sqrt(16))
[1] 0.8413447

Exercise 24

μ=750, σ=100, n=9, so SE=100/3=33.333

Compute z scores for each value and consult Table 1. Or use R.
> pnorm(700,750,100/sqrt(9))
[1] 0.0668072
> pnorm(800,750,100/sqrt(9))
[1] 0.9331928
0.9332-0.0668=0.8664

Exercise 25

μ=36, σ=5, n=16, so SE=5/4

> pnorm(37,36,5/4)
[1] 0.7881446

> pnorm(33,36,5/4)
[1] 0.008197536
1-0.008=0.992

> pnorm(34,36,5/4)
[1] 0.05479929
(or use Table 1 with Z=-1.6)

> pnorm(37,36,5/4)
[1] 0.7881446
> pnorm(34,36,5/4)
[1] 0.05479929
0.788-0.055=0.733

Exercise 26

μ=25, σ=3, n=25, so SE=3/5

a. pnorm(24,25,3/5)
[1] 0.04779035
b. pnorm(26,25,3/5)
[1] 0.9522096
c. 1-0.0478=0.9522
d. 0.9522-0.0478=0.9044

Exercise 27

Heavy-tailed distributions generally yield long CIs for the mean because their large variance inflates the SE. The central limit theorem does not remedy this problem.

Exercise 28

Light-tailed, symmetric distributions provide relatively accurate probability coverage for CIs even with small sample sizes. The central limit theorem works relatively well in this case.

Exercise 29

c is the 1-α/2 quantile of a T distribution with n-1 degrees of freedom. Look up c from Table 4, the 0.975 quantile with 9 df, or use R:
> qt(0.975,9)
[1] 2.262157

a.b. c.

Exercise 30

c is the 1-α/2 quantile of a T distribution with n-1 degrees of freedom. Look up c from Table 4, the 0.99 quantile with 9 df, or use R:
> qt(0.99,9)
[1] 2.821438

a.b. c.

Exercise 31

x=c(77,87,88,114,151,210,219,246,253,262,296,299,306,376,428,515,666,1310,2611)

The R function t.test(x) returns:

	One Sample t-test

data:  x
t = 3.2848, df = 18, p-value = 0.004117
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 161.5030 734.7075
sample estimates:
mean of x
 448.1053

Exercise 32

y=c(5,12,23,24,18,9,18,11,36,15)

The R function t.test(y) returns:

	One Sample t-test

data:  y
t = 6.042, df = 9, p-value = 0.0001924
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 10.69766 23.50234
sample estimates:
mean of x
     17.1
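The interval reported by t.test can be rebuilt by hand from the pieces used in Exercises 29 and 30: the sample mean, the estimated standard error, and the 0.975 quantile of a T distribution with n-1 degrees of freedom. A minimal base R sketch, with illustrative variable names:

```r
y <- c(5, 12, 23, 24, 18, 9, 18, 11, 36, 15)
n <- length(y)

# Estimated standard error of the mean: s / sqrt(n)
se <- sd(y) / sqrt(n)

# c is the 0.975 quantile of Student's t with n - 1 df
crit <- qt(0.975, df = n - 1)

# CI: sample mean plus or minus c * SE
ci <- mean(y) + c(-1, 1) * crit * se
print(ci)

# Same interval as reported by t.test
print(t.test(y)$conf.int)
```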

Exercise 33

Heavy-tailed distributions inflate the standard error in a manner that changes the cumulative probabilities of the T distribution. In this situation, the actual quantiles of T differ from the quantiles of T under normality. The inflation of the SE, due to the higher frequency of extreme values in the tails, leads to very long CIs whose actual probability coverage far exceeds the stated coverage under normality. For example, an intended 95% CI can in reality have probability coverage above 99%.

When distributions are skewed, T becomes skewed and off-center (its mean and median are no longer 0, because of the dependence that now exists between the sample mean and SD), with quantiles that do not correspond to those in Table 4. This results in highly inaccurate probability coverage for CIs.

Exercise 34

When the variance is estimated from a sample drawn from a light-tailed, skewed distribution, the distribution of T departs markedly from Student's t (becoming skewed and no longer centered around 0), so probability coverage is no longer accurate.

Exercise 35

c corresponds to the 0.975 quantile of a T distribution with n-2g-1 df, where g=0.2n rounded down.

a. n=24: g=4, df=24-2✕4-1=15
> qt(0.975,15)
[1] 2.13145

b. n=36: g=7, df=36-2✕7-1=21
> qt(0.975,21)
[1] 2.079614

c. n=12: g=2, df=12-2✕2-1=7
> qt(0.975,7)
[1] 2.364624
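The degrees of freedom depend on rounding 0.2n down before doubling, which is easy to get wrong by hand. A short sketch of the calculation for the three sample sizes; the helper name trim_df is illustrative.

```r
# Degrees of freedom for a CI based on a trimmed mean:
# g = floor(tr * n) observations are trimmed from each tail,
# leaving n - 2g observations and n - 2g - 1 df
trim_df <- function(n, tr = 0.2) {
  g <- floor(tr * n)
  n - 2 * g - 1
}

n <- c(24, 36, 12)
df <- trim_df(n)

# Critical values for a 95% CI
print(data.frame(n = n, g = floor(0.2 * n), df = df, c = qt(0.975, df)))
```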

Exercise 36

c corresponds to the 0.99 quantile of a T distribution with n-2g-1 df, where g=0.2n rounded down.

a. qt(0.99,15)
[1] 2.60248
b. qt(0.99,21)
[1] 2.517648
c. qt(0.99,7)
[1] 2.997952

Exercise 37

x=c(77,87,88,114,151,210,219,246,253,262,296,299,306,376,428,515,666,1310,2611)

The R function trimci(x) returns
$ci
[1] 160.3913 404.9933
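trimci is not part of base R. The sketch below computes a 20% trimmed mean CI from its pieces: the trimmed mean, a standard error based on the Winsorized standard deviation, and a T quantile with n-2g-1 df. It reproduces the limits shown above for these data; that trimci is implemented this way internally is an assumption, and the name trim_ci is illustrative.

```r
trim_ci <- function(x, tr = 0.2, alpha = 0.05) {
  n <- length(x)
  g <- floor(tr * n)          # number trimmed from each tail
  xs <- sort(x)

  # Winsorize: pull the g smallest and g largest values in to the
  # nearest value that is not trimmed
  xw <- xs
  if (g > 0) {
    xw[1:g] <- xs[g + 1]
    xw[(n - g + 1):n] <- xs[n - g]
  }

  # SE of the trimmed mean: Winsorized sd / ((1 - 2 * tr) * sqrt(n))
  se <- sd(xw) / ((1 - 2 * tr) * sqrt(n))
  crit <- qt(1 - alpha / 2, df = n - 2 * g - 1)

  mean(x, trim = tr) + c(-1, 1) * crit * se
}

x <- c(77, 87, 88, 114, 151, 210, 219, 246, 253, 262, 296, 299, 306,
       376, 428, 515, 666, 1310, 2611)
print(trim_ci(x))
```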

Exercise 38

With the trimmed mean, the CI is 244.6 long.

With the mean, it is 573.2 long, which is 2.34 times longer. The mean has a larger standard error, resulting in a longer CI.

Exercise 39

m=c(56,106,174,207,219,237,313,365,458,497,515,529,557,615,625,645,973,1065,3215)

For the mean: t.test(m)
266.6441, 930.3033

For the trimmed mean: trimci(m)
293.5976, 595.9409

Checking for outliers: out(m)
$out.val
[1] 3215

The CI for the trimmed mean is far shorter than the CI for the mean because the outlier (3215) inflates the SE of the mean. With the trimmed mean, the outlier is trimmed. Other values in the data set may have a similar effect.

Exercise 40

Under normality, the sample mean has the smallest standard error, so it is the only candidate for being ideal. But as we have seen, other estimators have a smaller standard error than the mean in other situations, so no estimator is optimal across the board.

Exercise 41

No, because what often appears to be normal is not normal. In addition, there are robust estimators that compare relatively well (although not as well) to the mean under normality but perform far better in situations that depart even mildly from normality. In other words, under normality the difference is small; under non-normality it can be very large.

Exercise 42

c=c(250,220,281,247,230,209,240,160,370,274,210,204,243,251,190,200,130,150,177,475,221,350,224,163,272,236,200,171,98)

CI for the mean: t.test(c)
95 percent confidence interval:
 200.7457 257.5991

CI for the trimmed mean: trimci(c)
[1] 196.6734 244.9056

Exercise 43

An outlier analysis reveals 4 outliers:
> out(c)
$out.val
[1] 370 475 350 98

These increase the length of the CI for the mean. They are trimmed in the CI for the trimmed mean.

Exercise 44

Even if the two measures are identical, outliers can greatly lengthen the CI based on means, rendering the outcome less informative.

Exercise 45

In this case we have 16 successes in 16 trials. The R function:
> binomci(16,16,alpha=0.01)
$ci
[1] 0.7498942 1.0000000

Exercise 46

In this case we have 0 successes in 200000 trials. The R function:
> binomci(0,200000)
$ci
[1] 0.000000e+00 1.497855e-05
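binomci is not part of base R. When every trial is a success or every trial is a failure, the printed limits here and in Exercise 45 agree with simple closed forms: a lower limit of α^(1/n) for n successes, and an upper limit of 1-α^(1/n) for 0 successes. The sketch below checks this numerically; that binomci uses these formulas internally is an assumption, and the function names are illustrative.

```r
# Lower limit when all n trials are successes (upper limit is 1)
lower_all_success <- function(n, alpha = 0.05) {
  alpha^(1 / n)
}

# Upper limit when all n trials are failures (lower limit is 0).
# -expm1(u) computes 1 - exp(u) accurately when u is close to 0.
upper_no_success <- function(n, alpha = 0.05) {
  -expm1(log(alpha) / n)
}

# Exercise 45: 16 successes in 16 trials, alpha = 0.01
print(c(lower_all_success(16, alpha = 0.01), 1))

# Exercise 46: 0 successes in 200000 trials, alpha = 0.05
print(c(0, upper_no_success(200000)))
```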

Exercise 47

val=0
for(i in 1:5000) val[i]=median(rbinom(25,6,0.9))
splot(val)

This is an example of how the sampling distribution of the median can depart greatly from the expected bell curve due to tied values. Each of the 5000 samples has many tied values because there are 25 observations in every sample and only 7 possible outcomes (0 through 6). Thus values are bound to repeat.
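splot is not part of base R. A rough base R equivalent of the simulation is sketched below, tabulating and plotting the relative frequency of each observed median. The seed is an arbitrary choice for reproducibility, and the plot is only an approximation of what splot draws.

```r
set.seed(47)  # arbitrary seed, for reproducibility only

# 5000 sample medians; each sample holds 25 binomial(6, 0.9) observations
val <- replicate(5000, median(rbinom(25, 6, 0.9)))

# With n = 25 (odd), each median is one of the observed values,
# so only the integers 0 through 6 can occur
rel_freq <- table(val) / length(val)
print(rel_freq)

# Spike plot of the relative frequencies
plot(rel_freq, type = "h", xlab = "Sample median",
     ylab = "Relative frequency")
```

The table shows the sampling distribution concentrated on very few distinct values rather than spread along a smooth bell curve.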