
Chapter 4

Exercise 1

The 95% CI represents a range of values that contains a population parameter with a 0.95 probability. This range is determined by the values that correspond to the 0.025 and 0.975 quantiles of the sampling distribution of the sample statistic.

Exercise 2

c is the 1-α/2 quantile of the standard normal distribution. From Table 1 or the R function qnorm:

For a CI of 0.80, the 0.90 quantile is 1.282
For a CI of 0.92, the 0.96 quantile is 1.751
For a CI of 0.98, the 0.99 quantile is 2.326
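These quantiles can be verified directly in R:

```r
# 1 - alpha/2 quantiles of the standard normal for each confidence level
qnorm(0.90)  # c for a 0.80 CI: 1.281552
qnorm(0.96)  # c for a 0.92 CI: 1.750686
qnorm(0.99)  # c for a 0.98 CI: 2.326348
```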

Exercise 3

From Table 1, the 1-α/2 quantile is c = 1.96.

Exercise 4

From Table 1:

Exercise 5

μ=1200, σ=25, n=36

For a CI of 95%:

The 95% CI for μ does not contain 1200, so the claim seems unreasonable.
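A minimal R sketch of the computation; the sample mean xbar below is a hypothetical value for illustration only, not a figure from the exercise:

```r
# 95% CI for mu when sigma is known: xbar +/- 1.96 * sigma / sqrt(n)
sigma <- 25
n <- 36
xbar <- 1182            # hypothetical sample mean, for illustration only
se <- sigma / sqrt(n)   # standard error of the sample mean
ci <- xbar + c(-1, 1) * qnorm(0.975) * se
ci                      # check whether 1200 falls inside this interval
```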

Exercise 6

Exercise 7

Random sampling requires:

1. That all observations are sampled from the same distribution

2. That the sampled observations are independent, meaning that the probability of sampling a given observation does not alter the probability of sampling another. (Note: this is not the same as equal probability)

Exercise 8

The sampling distribution is centered around the population mean μ, so it will be 9.

The variance of the sampling distribution is given by σ²/n.

In this case:

Exercise 9

x:    1    2    3    4
P(x): 0.2  0.1  0.5  0.2

So μ = Σ x P(x) = 2.7 and σ² = Σ (x − μ)² P(x) = 1.01.
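These two quantities can be computed in R:

```r
# population mean and variance of the discrete distribution above
x <- 1:4
p <- c(0.2, 0.1, 0.5, 0.2)
mu <- sum(x * p)               # expected value: 2.7
sigma2 <- sum((x - mu)^2 * p)  # population variance: 1.01
```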

Exercise 10

The expected value of the sample mean equals the population mean, so if you average 1000 sample means the grand average should approximately equal μ, in this case, 2.7.
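This can be checked by simulation; a sketch, where the sample size of 25 and the seed are arbitrary choices, not values from the exercise:

```r
# the average of many sample means approximates the population mean
set.seed(1)
x <- 1:4
p <- c(0.2, 0.1, 0.5, 0.2)
means <- replicate(1000, mean(sample(x, 25, replace = TRUE, prob = p)))
mean(means)  # close to the population mean 2.7
```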

Exercise 11

Based on the same principle, the expected value of the sample variance equals the population variance, so the average of 1000 sample variances should approximately equal σ², in this case 1.01.

Exercise 12

a=c(2,6,10,1,15,22,11,29)
n=8
var(a)
[1] 94.28571

The variance of the sample mean is estimated by s²/n = 94.28571/8 = 11.786, and the standard error is estimated by s/√n = √11.786 = 3.433.
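These estimates, s²/n and s/√n, can be computed in R:

```r
# squared standard error and standard error of the sample mean
a <- c(2, 6, 10, 1, 15, 22, 11, 29)
n <- length(a)
se2 <- var(a) / n   # squared standard error: about 11.786
se <- sqrt(se2)     # standard error: about 3.433
```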

Exercise 13

The estimate of μ in this case would be based on a single observation = 32.

With a single observation, it is not possible to estimate the standard error because there is no variance in the sample.

As the sample size increases, the variance of the sampling distribution (the squared standard error) decreases; note that n is in the denominator of the standard error. Lower variance in the sampling distribution means a smaller standard error, and hence less error in the sample estimates.

Exercise 14

b=c(450,12,52,80,600,93,43,59,1000,102,98,43)
n=12
var(b)
[1] 93663.52

Squared SE = 93663.52/12 = 7805.29

Exercise 15

b=c(450,12,52,80,600,93,43,59,1000,102,98,43)
out(b)$out.val
[1] 450 600 1000

These outliers substantially inflate the standard error, as they inflate the variance.
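A sketch of the effect; out() is from Wilcox's WRS package, so the base-R comparison below simply removes the flagged values by hand:

```r
# compare squared standard errors with and without the flagged outliers
b <- c(450, 12, 52, 80, 600, 93, 43, 59, 1000, 102, 98, 43)
keep <- !(b %in% c(450, 600, 1000))   # values flagged as outliers above
se2_all <- var(b) / length(b)         # about 7805
se2_trim <- var(b[keep]) / sum(keep)  # far smaller
se2_all / se2_trim                    # inflation factor
```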

Exercise 16

c=c(6,3,34,21,34,65,23,54,23)
n=9
var(c)
[1] 413.9444

The squared SE is 413.9444/9 = 45.99.

Exercise 17

No. An accurate estimate of the standard error requires independence among sampled observations.

Exercise 18

The variance of the mixed normal is 10.9, so the squared standard error for a sample of 25 would be 10.9/25 = 0.436, compared to 1/25 = 0.04 under the standard normal.

This means that under small departures from normality, the squared standard error can inflate more than 10-fold. The inflation greatly increases error and the length of CIs.
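The 10.9 figure comes from the standard contaminated normal, which with probability 0.9 samples from N(0,1) and with probability 0.1 from N(0,10); its variance is 0.9(1) + 0.1(100) = 10.9. A simulation sketch:

```r
# variance of the contaminated normal: 0.9*N(0,1) + 0.1*N(0,10)
set.seed(1)
n <- 1e6
sd_vec <- ifelse(runif(n) < 0.1, 10, 1)  # 10% of draws have sd = 10
x <- rnorm(n, mean = 0, sd = sd_vec)
var(x)        # close to 10.9
var(x) / 25   # squared SE for n = 25, close to 0.436
```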

Exercise 19

When sampling from a non-normal distribution, the sampling distribution of the mean no longer conforms to the probabilities of the normal curve. In other words, the sampling distribution is no longer normal, so the SE cannot be used to determine probabilities and CIs accurately.

Exercise 20

μ=30, σ=2, n=16, so SE=2/4=0.5. Determine z and consult Table 1, or use R.

For P(X̄ < 29):
pnorm(29,30,2/sqrt(16))
[1] 0.02275013

For P(X̄ > 30.5):
pnorm(30.5,30,2/sqrt(16))
[1] 0.8413447
1 − 0.841 = 0.159

For P(29 < X̄ < 31):
pnorm(31,30,2/sqrt(16))
[1] 0.9772499
0.977 − 0.022 = 0.955

Exercise 21

μ=5, σ=5, n=25, so SE=5/5=1. Determine z and consult Table 1, or use R.

a. pnorm(4,5,1)
[1] 0.1586553

b. pnorm(7,5,1)
[1] 0.9772499
1 − 0.977 = 0.023

c. pnorm(3,5,1)
[1] 0.02275013
0.977 − 0.022 = 0.955

Exercise 22

μ=100000, σ=10000, n=16, so SE=10000/4=2500

From Table 1, P ≈ 0.0228.

Using R:
pnorm(95000,100000,10000/sqrt(16))
[1] 0.02275013

Exercise 23

μ=100000, σ=10000, n=16, so SE=10000/4=2500

Compute z scores for each value and consult Table 1, or use R:
pnorm(97500,100000,10000/sqrt(16))
[1] 0.1586553
pnorm(102500,100000,10000/sqrt(16))
[1] 0.8413447
0.841 − 0.159 = 0.683

Exercise 24

μ=750, σ=100, n=9, so SE=100/3=33.333

Compute z scores for each value and consult Table 1, or use R:
pnorm(700,750,100/sqrt(9))
[1] 0.0668072
pnorm(800,750,100/sqrt(9))
[1] 0.9331928
0.933 − 0.067 = 0.866

Exercise 25

μ=36, σ=5, n=16, so SE=5/4

pnorm(37,36,5/4)
[1] 0.7881446

pnorm(33,36,5/4)
[1] 0.008197536
1 − 0.008 = 0.992

pnorm(34,36,5/4)
[1] 0.05479929
Or use Table 1 for z = −1.6.

For P(34 < X̄ < 37):
pnorm(37,36,5/4)
[1] 0.7881446
pnorm(34,36,5/4)
[1] 0.05479929
0.788 − 0.054 = 0.734

Exercise 26

μ=25, σ=3, n=25, so SE=3/5

a. pnorm(24,25,3/5)
[1] 0.04779035

b. pnorm(26,25,3/5)
[1] 0.9522096

c. 1 − 0.0478 = 0.9522

d. 0.952 − 0.048 = 0.904

Exercise 27

Heavy-tailed distributions generally yield long CIs for the mean because their large variance inflates the SE. The central limit theorem does not remedy this problem.

Exercise 28

Light-tailed, symmetric distributions provide relatively accurate probability coverage for CIs even with small sample sizes. The central limit theorem works relatively well in this case.

Exercise 29

c is the 1-α/2 quantile of a T distribution with n-1 degrees of freedom. Look up c in Table 4, the 0.975 quantile with 9 df, or use R:
qt(0.975,9)
[1] 2.262157


Exercise 30

c is the 1-α/2 quantile of a T distribution with n-1 degrees of freedom. Look up c in Table 4, the 0.99 quantile with 9 df, or use R:
qt(0.99,9)
[1] 2.821438


Exercise 31

x=c(77,87,88,114,151,210,219,246,253,262,296,299,306,376,428,515,666,1310,2611)

The R function t.test(x) returns:

One Sample t-test

data: x
t = 3.2848, df = 18, p-value = 0.004117
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 161.5030 734.7075
sample estimates:
mean of x
 448.1053

Exercise 32

y=c(5,12,23,24,18,9,18,11,36,15)

The R function t.test(y) returns:

One Sample t-test

data: y
t = 6.042, df = 9, p-value = 0.0001924
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 10.69766 23.50234
sample estimates:
mean of x
 17.1

Exercise 33

Heavy-tailed distributions inflate the standard error in a manner that changes the cumulative probabilities of the T distribution. In this situation, the new T quantiles correspond to values different from those of T under normality. The inflation of the SE, due to the larger frequency of extreme values in the tails, leads to very long CIs that far exceed the stated probability coverage under normality. For example, the intended 95% CI will yield a range that in reality covers over 99% of the distribution.

When distributions are skewed, T becomes skewed and off-center (its mean and median are no longer 0, due to the dependency that is created between the mean and SD), with values that do not correspond to the quantiles in Table 4. This results in highly inaccurate probability coverage for CIs.

Exercise 34

When the variance is estimated from the sample in a light-tailed, skewed distribution, the distribution of T markedly departs from Student's t (becoming skewed and no longer centered around 0), so probability coverage is no longer accurate.

Exercise 35

c corresponds to the 0.975 quantile of a T distribution with n − 2g − 1 degrees of freedom, where g = 0.2n rounded down.

a. n = 24: g = 4, df = 24 − 8 − 1 = 15
qt(0.975,15)
[1] 2.13145

b. n = 36: g = 7, df = 36 − 14 − 1 = 21
qt(0.975,21)
[1] 2.079614

c. n = 12: g = 2, df = 12 − 4 − 1 = 7
qt(0.975,7)
[1] 2.364624
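The degrees-of-freedom computation can be wrapped in a small helper; the function name tdf is illustrative, not from the text:

```r
# df for the CI of a 20% trimmed mean: n - 2g - 1, with g = floor(0.2 * n)
tdf <- function(n) n - 2 * floor(0.2 * n) - 1
tdf(24)  # 15
tdf(36)  # 21
tdf(12)  # 7
qt(0.975, tdf(24))  # 2.13145
```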

Exercise 36

c corresponds to the 0.99 quantile of a T distribution with n − 2g − 1 degrees of freedom, where g = 0.2n rounded down.

a. qt(0.99,15)
[1] 2.60248

b. qt(0.99,21)
[1] 2.517648

c. qt(0.99,7)
[1] 2.997952

Exercise 37

x=c(77,87,88,114,151,210,219,246,253,262,296,299,306,376,428,515,666,1310,2611)

The R function trimci(x) returns:
$ci
[1] 160.3913 404.9933

Exercise 38

With the trimmed mean the CI is 244.6 long.

With the mean it is 573.2, which is 2.34 times longer. The mean has a larger standard error, resulting in a longer CI.
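These lengths follow from the CI endpoints reported in Exercises 31 and 37 (trimci is from Wilcox's WRS package):

```r
# lengths of the two CIs for the same data set
ci_mean <- c(161.5030, 734.7075)   # from t.test(x)
ci_trim <- c(160.3913, 404.9933)   # from trimci(x)
diff(ci_mean)                  # 573.2
diff(ci_trim)                  # 244.6
diff(ci_mean) / diff(ci_trim)  # about 2.34
```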

Exercise 39

m=c(56,106,174,207,219,237,313,365,458,497,515,529,557,615,625,645,973,1065,3215)

For the mean, t.test(m) gives the CI 266.6441, 930.3033.
For the trimmed mean, trimci(m) gives 293.5976, 595.9409.
Checking for outliers:
out(m)$out.val
[1] 3215

The CI for the trimmed mean is far shorter than the CI for the mean because the outlier (3215) inflates the SE; with the trimmed mean, it is trimmed away. Other values in the data set may have a similar effect.

Exercise 40

Under normality, the sample mean has the smallest standard error, so it is the only candidate for being ideal. But as we have seen, other estimators have a smaller standard error than the mean in other situations, so an optimal estimator does not exist across the board.

Exercise 41

No, because what often appears to be normal is not normal. In addition, there are robust estimators that compare relatively well (although not as well) to the mean under normality but perform far better in situations that depart even mildly from normality. In other words, under normality the difference is small; under non-normality it can be very large.

Exercise 42

c=c(250,220,281,247,230,209,240,160,370,274,210,204,243,251,190,200,130,150,177,475,221,350,224,163,272,236,200,171,98)

CI for the mean:
t.test(c)
95 percent confidence interval: 200.7457 257.5991

CI for the trimmed mean:
trimci(c)
[1] 196.6734 244.9056

Exercise 43

An outlier analysis reveals 4 outliers:
out(c)$out.val
[1] 370 475 350 98

These increase the length of the CI for the mean. They are trimmed in the trimmed-mean CI.

Exercise 44

Even if the two measures are identical, outliers can largely inflate the CI based on means, rendering the outcome less informative.

Exercise 45

In this case we have 16 successes in 16 trials. The R function:
binomci(16,16,alpha=0.01)
$ci
[1] 0.7498942 1.0000000

Exercise 46

In this case we have 0 successes in 200000 trials. The R function:

binomci(0,200000)
$ci
[1] 0.000000e+00 1.497855e-05

Exercise 47

val=0
for(i in 1:5000) val[i]=median(rbinom(25,6,0.9))
splot(val)

This is an example of how the sampling distribution of the median can depart sharply from the expected bell curve due to tied values. Each of the 5000 samples has many tied values because there are 25 trials in every sample and only 7 possible outcomes. Thus values are bound to repeat.
