CIs Hypo Test

DESCRIPTION

Statistical hypothesis testing - using R. (Solved and unsolved problems)

Soud Khalifa - CIs and hypothesis testing

1 Confidence Intervals

1. A species of fish has weights distributed according to a normal distribution with standard deviation 50 grams. 25 fish were caught, and the sample mean was found to be 184 grams. Find a 95% confidence interval for the mean weight of the population. What is the confidence interval width?

Solution:

For a 95% CI, we have $\alpha = 0.05$ and $\alpha/2 = 0.025$. We can use R to find $z_{\alpha/2}$:

qnorm(0.025, lower.tail = FALSE)

## [1] 1.959964

so $z_{\alpha/2} \approx 1.96$

$$P\left(-1.96 \le \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95 = P\left(-1.96 \le \frac{184-\mu}{10} \le 1.96\right) = P(184 - 1.96 \times 10 \le \mu \le 184 + 1.96 \times 10) = P(164.4 \le \mu \le 203.6)$$

The 95% confidence interval width is $2 z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 39.2$ (Note: this is the same as 203.6 − 164.4 from the interval above.)
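The whole calculation for this problem can be sketched in a few lines of R. This is only a minimal illustration using the numbers given in the problem; the variable names (`xbar`, `sigma`, `se`, `ci`, `width`) are chosen here for readability and do not come from the original solution.

```r
# Known-sigma (z) confidence interval for the mean fish weight
xbar  <- 184    # sample mean (grams)
sigma <- 50     # known population standard deviation (grams)
n     <- 25     # sample size
alpha <- 0.05   # 1 - confidence level

z  <- qnorm(alpha / 2, lower.tail = FALSE)  # critical value z_{alpha/2}
se <- sigma / sqrt(n)                       # standard error of the mean

ci    <- c(lower = xbar - z * se, upper = xbar + z * se)
width <- 2 * z * se                         # same as upper - lower

ci
width
```

Rounded to one decimal place, `ci` should reproduce the interval (164.4, 203.6) and `width` the value 39.2 obtained by hand.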

2. In a survey conducted among 100 students in a university, 67 feel that parking is a problem. Find a 95% confidence interval of the true proportion of students who feel that parking is a problem.

Solution:

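Let X be the number of students in the sample who feel that parking is a problem. X can be modelled as a binomial variable with n = 100 and unknown p, which we estimate by the sample proportion $\hat{p} = 67/100 = 0.67$. We have $E[X] = np$ and $\mathrm{Var}[X] = np(1-p)$. Since n is large, the CLT applies and the distribution of X/n is approximately normal, with mean and variance given by $E[X/n] = \frac{1}{n}E[X] = p$ and $\mathrm{Var}[X/n] = \frac{1}{n^2}\mathrm{Var}[X] = \frac{p(1-p)}{n}$. For the 95% CI centered around 67%, we use $z_{\alpha/2} = 1.96$, and the interval is given by

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = (0.67 - 0.092,\ 0.67 + 0.092) = (58\%, 76\%)$$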
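As a check on the arithmetic, the normal-approximation interval for the proportion can be computed directly in R. This is a minimal sketch; the variable names (`x`, `phat`, `se`, `margin`, `ci`) are introduced here for illustration only.

```r
# Normal-approximation (Wald) confidence interval for a proportion
x     <- 67     # students who feel parking is a problem
n     <- 100    # students surveyed
alpha <- 0.05

phat   <- x / n                               # sample proportion
z      <- qnorm(alpha / 2, lower.tail = FALSE)
se     <- sqrt(phat * (1 - phat) / n)         # estimated standard error
margin <- z * se

ci <- c(lower = phat - margin, upper = phat + margin)
margin
ci
```

The margin should come out close to 0.092, giving roughly (0.58, 0.76). Note that the built-in `prop.test(x, n)` reports a somewhat different interval, because by default it uses a score interval with a continuity correction rather than the simple formula above.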

3. Assume that the sample mean is 5, the sample standard deviation is 2, and the sample size is 20. Compute the interval for a 95% confidence level.

Solution:

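Since the sample size is small and only the sample standard deviation is available, we employ the t-distribution. The 95% confidence interval is given by $\bar{x} \pm t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}}$. Using R, we can calculate the critical value $t_{\alpha/2,\,n-1}$ as: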

n <- 20

tstat <- qt(0.025, lower.tail=FALSE, df=n-1)

tstat

## [1] 2.093024

Therefore, the interval is $5 \pm \frac{2.093 \times 2}{\sqrt{20}} \approx (5 - 0.94,\ 5 + 0.94) = (4.06, 5.94)$
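The full t-interval can also be computed in one go. The following is a minimal sketch built from the summary statistics in the problem; the names `xbar`, `s`, `tcrit`, `margin` and `ci` are illustrative and not part of the original solution.

```r
# t-based confidence interval from summary statistics
xbar  <- 5      # sample mean
s     <- 2      # sample standard deviation
n     <- 20     # sample size
alpha <- 0.05

tcrit  <- qt(alpha / 2, df = n - 1, lower.tail = FALSE)  # critical value
margin <- tcrit * s / sqrt(n)                            # half-width

ci <- c(lower = xbar - margin, upper = xbar + margin)
margin
ci
```

Carrying the unrounded critical value through the calculation gives a half-width of about 0.936, so the interval is roughly (4.06, 5.94).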

2 Hypothesis testing

1. Dr. Ruth claims that the average time working mothers spend talking to their children is 11 minutes per day. Suppose you suspect that working mothers actually spend more time talking to their children than that. (a) Set up the null and alternative hypothesis.

Suppose, to investigate the claim, you determine that in a sample of 100 working mothers, the average time spent talking to their children is 11.5 minutes, and the standard deviation is 2.3 minutes.

(b) Determine the p-value. Would you reject the null hypothesis at a 0.05 significance level?
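One possible way to approach this problem in R is sketched below. It assumes the hypotheses $H_0: \mu = 11$ versus $H_1: \mu > 11$ and, because n = 100 is large, treats the sample standard deviation as if it were the population value and uses a z-test. Both choices, and the variable names, are assumptions made for this sketch and are not taken from the worksheet.

```r
# One-sided, large-sample z-test for a mean (assumed setup)
mu0   <- 11     # mean under the null hypothesis
xbar  <- 11.5   # sample mean (minutes)
s     <- 2.3    # sample standard deviation (minutes)
n     <- 100    # sample size
alpha <- 0.05   # significance level

z       <- (xbar - mu0) / (s / sqrt(n))   # test statistic
p_value <- pnorm(z, lower.tail = FALSE)   # P(Z >= z), upper tail only

z
p_value
p_value < alpha   # TRUE means H0 is rejected at level alpha
```

Under these assumptions the test statistic is about 2.17 and the p-value about 0.015, which is below 0.05.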

2. Suppose Cavifree claims their toothpaste is recommended by 4 out of 5 dentists. You suspect the proportion of recommendations is actually less than their claim. You conduct a survey and determine that out of the 200 dentists who responded to the survey, 150 recommend Cavifree. At a significance level of 0.05, do you have enough evidence to reject their claim?
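A sketch of one way to set this up in R follows. It assumes $H_0: p = 0.8$ versus $H_1: p < 0.8$ and uses the normal approximation with the standard error evaluated under the null hypothesis. The setup and the variable names are assumptions for illustration, not the worksheet's own solution.

```r
# One-sided, large-sample z-test for a proportion (assumed setup)
p0    <- 0.8    # claimed proportion (4 out of 5)
x     <- 150    # dentists recommending the toothpaste
n     <- 200    # dentists surveyed
alpha <- 0.05   # significance level

phat    <- x / n
se0     <- sqrt(p0 * (1 - p0) / n)   # standard error under H0
z       <- (phat - p0) / se0         # test statistic
p_value <- pnorm(z)                  # P(Z <= z), lower tail only

z
p_value
p_value < alpha   # TRUE means H0 is rejected at level alpha
```

With these numbers the statistic is about −1.77 and the p-value about 0.039, so under this approximation the claim would be rejected at the 0.05 level.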

3. You want to compare the absorbency of two brands of paper towels (call them A and B). You can make this comparison by measuring the average number of ounces of water each brand absorbs before being saturated. You select a random sample of 50 towels from each brand, and measure the absorbency. For brand A, the average number of ounces of water is 3, with a standard deviation of 0.9. For brand B, the average number of ounces of water is 3.5, with a standard deviation of 1.2 ounces. Can you conclude that the two brands are equally effective at absorbing liquid or not?
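The comparison can be sketched in R as a two-sided, two-sample test of $H_0: \mu_A = \mu_B$ against $H_1: \mu_A \neq \mu_B$. The problem does not state a significance level, so 0.05 is assumed here; the sketch also assumes independent samples and relies on the large-sample normal approximation (a Welch t-test would be a reasonable alternative). All variable names are illustrative.

```r
# Two-sided, large-sample z-test for a difference in means (assumed setup)
mean_a <- 3;   sd_a <- 0.9; n_a <- 50   # brand A summary statistics
mean_b <- 3.5; sd_b <- 1.2; n_b <- 50   # brand B summary statistics
alpha  <- 0.05                          # assumed; not given in the problem

se      <- sqrt(sd_a^2 / n_a + sd_b^2 / n_b)  # SE of the difference
z       <- (mean_a - mean_b) / se             # test statistic
p_value <- 2 * pnorm(-abs(z))                 # two-sided p-value

z
p_value
p_value < alpha   # TRUE means H0 (equal absorbency) is rejected
```

Under these assumptions the statistic is about −2.36 with a two-sided p-value near 0.018, which would point to a difference in absorbency between the brands at the 0.05 level.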
