Hypothesis Testing - University of Torontoolgac/sta255_2013/notes/... · test about a population mean. State the appropriate null hypothesis and alternative hypothesis in each case

Lecture 10

Hypothesis Testing

A hypothesis is a conjecture about the distribution of some random variables, for example a claim about the value of a parameter of the statistical model.

There are two types of hypotheses:

The null hypothesis, $H_0$, is the current belief.

The alternative hypothesis, $H_a$, is your belief; it is what you want to show.

Examples: Each of the following situations requires a significance

test about a population mean. State the appropriate null hypothesis

and alternative hypothesis in each case.

(a) The mean area of the several thousand apartments in a new

development is advertised to be 1250 square feet. A tenant

group thinks that the apartments are smaller than advertised.

They hire an engineer to measure a sample of apartments to

test their suspicion.

(b) Larry's car averages 32 miles per gallon on the

highway. He now switches to a new motor oil that is

advertised as increasing gas mileage. After driving 3000

highway miles with the new oil, he wants to determine if his

gas mileage actually has increased.

(c) The diameter of a spindle in a small motor is supposed to be

5 millimeters. If the spindle is either too small or too large,

the motor will not perform properly. The manufacturer

measures the diameter in a sample of motors to determine

whether the mean diameter has moved away from the target.
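
One reading of the three situations, writing $\mu$ for the relevant population mean, gives the hypotheses below. The direction of each alternative is an interpretation of what the investigator hopes to show, not something stated explicitly in the exercise:

```latex
\begin{align*}
\text{(a)}\quad & H_0: \mu = 1250 \quad\text{vs}\quad H_a: \mu < 1250 \\
\text{(b)}\quad & H_0: \mu = 32 \quad\text{vs}\quad H_a: \mu > 32 \\
\text{(c)}\quad & H_0: \mu = 5 \quad\text{vs}\quad H_a: \mu \neq 5
\end{align*}
```

In (a) and (b) the suspicion points in one direction, so the alternative is one-sided; in (c) a deviation in either direction matters, so the alternative is two-sided.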

Guidelines for Hypothesis Testing

Hypothesis testing works like a proof by contradiction. The testing process has four steps:

Step 1: Assume $H_0$ is true.

Step 2: Use statistical theory to construct a statistic (a function of the data) that involves the value specified in $H_0$. This statistic is called the test statistic.

Step 3: Find the probability that the test statistic would take a value as extreme or more extreme than that actually observed. Think of this as the probability of getting a sample like ours, assuming $H_0$ is true.

Step 4: If the probability calculated in Step 3 is high, the sample is likely under $H_0$, and so we have no evidence against $H_0$. If the probability is low, there are two possibilities:

- we observed a very unusual event, or

- our assumption that $H_0$ is true is wrong.

Test Statistic

• The test is based on a statistic that estimates the parameter that appears in the hypotheses. Usually this is the same estimate we would use in a confidence interval for the parameter. When $H_0$ is true, we expect the estimate to take a value near the parameter value specified in $H_0$.

• Values of the estimate far from the parameter value specified by $H_0$ give evidence against $H_0$. The alternative hypothesis determines which directions count against $H_0$.

• A test statistic measures compatibility between the null hypothesis and the data.

• To assess how far the estimate is from the parameter, standardize the estimate. In many common situations the test statistic has the form

$$z = \frac{\text{estimate} - \text{hypothesized value}}{\text{standard deviation of the estimate}}$$

Example: An air freight company wishes to test whether or not the mean weight of parcels shipped on a particular route exceeds 10 pounds. A random sample of 49 shipping orders was examined and found to have an average weight of 11 pounds. Assume that the standard deviation of the weights is 2.8 pounds.

Solution:
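
A minimal sketch of the calculation, assuming the hypotheses are $H_0: \mu = 10$ vs $H_a: \mu > 10$ (one reading of "exceeds 10 pounds") and that 2.8 pounds is the known population standard deviation. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the example; the one-sided alternative is an assumed reading.
mu0 = 10.0     # hypothesized mean weight (pounds)
xbar = 11.0    # observed sample mean (pounds)
sigma = 2.8    # standard deviation of the weights, treated as known
n = 49         # number of shipping orders sampled

# Standardize: (estimate - hypothesized value) / (sd of the estimate)
z = (xbar - mu0) / (sigma / sqrt(n))

# Upper-tail P-value, because large sample means count against H0
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

A sample mean 2.5 standard deviations above the hypothesized value would be unusual if $H_0$ were true, so under these assumptions the data give strong evidence that the mean weight exceeds 10 pounds.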

Graphical Representation

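Suppose we want to test a set of hypotheses concerning a parameter $\theta$ based on a random sample $X_1, \dots, X_n$:

$H_0: \theta = \theta_0$ vs $H_a: \theta > \theta_0$ (or $\theta < \theta_0$, or $\theta \neq \theta_0$).

$\hat{\theta}$ is the estimate of our parameter $\theta$.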

The rejection region (RR) is the set of values of the test statistic for which we reject $H_0$.

The probability, computed assuming $H_0$ is true, that the test statistic falls in the rejection region (also called the critical region) is called the size of the test, or the significance level of the test, and is denoted by $\alpha$.

Example: The hourly wages in a particular industry are normally

distributed with mean $13.20 and standard deviation $2.50. A

company employs 40 workers paying them an average of $12.20

per hour. Can this company be accused of paying substandard

wages? Use .

Solution:
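
A sketch of the rejection-region approach for this example, assuming the hypotheses $H_0: \mu = 13.20$ vs $H_a: \mu < 13.20$ ("substandard" suggests a lower-tail test). Because the significance level is not reproduced in the example above, the sketch tries the three most common levels; `lower_tail_cutoff` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 13.20, 2.50   # industry mean and standard deviation
xbar, n = 12.20, 40        # company average and number of workers

# Test statistic for H0: mu = 13.20 vs Ha: mu < 13.20 (assumed reading)
z = (xbar - mu0) / (sigma / sqrt(n))

def lower_tail_cutoff(alpha):
    """Return z_alpha such that RR = {z <= z_alpha} has size alpha."""
    return NormalDist().inv_cdf(alpha)

# Decision at each of the usual significance levels
decisions = {a: z <= lower_tail_cutoff(a) for a in (0.10, 0.05, 0.01)}

print(f"z = {z:.2f}")
for alpha, reject in decisions.items():
    cutoff = lower_tail_cutoff(alpha)
    print(f"alpha = {alpha}: cutoff = {cutoff:.3f}, reject H0: {reject}")
```

The observed statistic falls below the cutoff at all three levels, so under these assumptions $H_0$ is rejected whichever of them is used.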

Decision Errors

When we perform a statistical test we hope that our decision will be correct, but sometimes it will be wrong. There are two possible errors that can be made in a hypothesis test.

Definition: The error made by rejecting the null hypothesis $H_0$ when in fact $H_0$ is true is called a type I error.

The error made by failing to reject the null hypothesis $H_0$ when in fact $H_0$ is false is called a type II error.

Note: The level of significance of the test is also the probability of a type I error, denoted by $\alpha$, i.e. $\alpha = P(\text{type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true})$.

The probability of a type II error is denoted by $\beta$.

Example: An experimenter has prepared a drug dosage level that

she claims will induce sleep for 80% of people suffering from

insomnia. In an attempt to disprove her claim, we administer her

prescribed dosage to 20 insomniacs and observe X, the number of

people for whom the drug dose induces sleep. We wish to test

vs . Assume .
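
A sketch of how $\alpha$ and $\beta$ could be computed in this setting. It assumes the test is $H_0: p = 0.8$ vs $H_a: p < 0.8$ with a rejection region of the form $\{X \le c\}$. The cutoff `cutoff = 12` and the alternative `p_alt = 0.6` are hypothetical values chosen only for illustration.

```python
from math import comb

def binom_cdf(c, n, p):
    """P(X <= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

n, p0 = 20, 0.8   # sample size and claimed success rate from the example
cutoff = 12       # hypothetical: reject H0 when X <= cutoff
p_alt = 0.6       # hypothetical alternative used to illustrate beta

alpha = binom_cdf(cutoff, n, p0)        # P(reject H0 | p = p0)
beta = 1 - binom_cdf(cutoff, n, p_alt)  # P(fail to reject H0 | p = p_alt)

print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Enlarging the rejection region (a bigger cutoff) raises $\alpha$ and lowers $\beta$, which is the usual trade-off between the two error probabilities for a fixed sample size.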

P-value

Definition: The probability, assuming $H_0$ is true, that the test statistic would take a value as extreme or more extreme than that actually observed is called the P-value of the test. The smaller the P-value, the stronger the evidence against $H_0$ provided by the data.

Guideline for how small is “small”:

P-value > 0.1 provides no evidence against $H_0$.

0.05 < P-value < 0.1 provides weak evidence against $H_0$.

0.01 < P-value < 0.05 provides moderate evidence against $H_0$.

P-value < 0.01 provides strong evidence against $H_0$.

We can compare the P-value we calculate with a fixed value that we regard as decisive. The decisive value of P is called the significance level (this is our $\alpha$). The most common values for $\alpha$ are 0.1, 0.05, and 0.01.

If the P-value is as small as or smaller than $\alpha$, we say that the data are statistically significant at level $\alpha$. In other words, the P-value is the smallest level of significance for which the null hypothesis should be rejected.

Example: 85% of the general public is right-handed. A survey of

300 chief executive officers of large corporations found that 95%

were right-handed. Is this difference in percentages statistically

significant? Use . Find the P-value for the test.

Solution:
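
A sketch of one way to carry out this test, assuming $H_0: p = 0.85$ vs $H_a: p \neq 0.85$ and the Normal approximation to the sample proportion (reasonable here because $n$ is large). The standard deviation of the estimate is computed under $H_0$.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.85     # proportion of right-handers in the general public
p_hat = 0.95  # proportion of right-handers among the surveyed CEOs
n = 300       # number of CEOs surveyed

# Standard deviation of the sample proportion, computed under H0: p = p0
se0 = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se0

# Two-sided P-value for Ha: p != p0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, P-value = {p_value:.2e}")
```

The P-value is far below 0.01, so under these assumptions the difference is statistically significant at any of the common levels.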

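Tests for a Population Mean ($\sigma$ is known)

To test $H_0: \mu = \mu_0$ based on an SRS of size $n$ from a population with known standard deviation $\sigma$, compute the test statistic

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

where $\mu_0$ is the specified value of $\mu$.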

Example: In 1999, it was reported that the mean serum cholesterol

level for female undergraduates was 168 mg/dl with a standard

deviation of 27 mg/dl. A recent study at Baylor University

investigated the lipid levels in a cohort of sedentary university

students. The mean total cholesterol level among n = 71 females

was . Is this evidence that cholesterol levels of sedentary

students differ from the previously reported average?

Solution:
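
A general sketch of the $z$ test for a mean with known $\sigma$, covering the three possible alternatives. The function name `z_test` and its `alternative` argument are illustrative. The sample mean for this example is not reproduced above, so `xbar_hypothetical` below is a made-up placeholder and the printed result is not the study's answer.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """Return (z, P-value) for H0: mu = mu0 when sigma is known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    phi = NormalDist().cdf
    if alternative == "greater":      # Ha: mu > mu0
        p = 1 - phi(z)
    elif alternative == "less":       # Ha: mu < mu0
        p = phi(z)
    elif alternative == "two-sided":  # Ha: mu != mu0
        p = 2 * (1 - phi(abs(z)))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p

# mu0, sigma and n come from the example; xbar_hypothetical is a made-up
# placeholder, not the study's reported mean.
xbar_hypothetical = 175.0
z, p = z_test(xbar_hypothetical, mu0=168, sigma=27, n=71)
print(f"z = {z:.2f}, two-sided P-value = {p:.4f}")
```

The two-sided alternative matches the question asked in the example, which is whether cholesterol levels "differ" from the reported average in either direction.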

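Tests for a Population Mean ($\sigma$ is unknown)

Recall (one-sample t CI): $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$, where $t^*$ is the critical value from the $t(n-1)$ distribution. The corresponding statistic for testing $H_0: \mu = \mu_0$ is $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$, which has the $t(n-1)$ distribution when $H_0$ is true.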

Example: Founded in 1998, Telephia provides a wide variety of information on cellular phone use. In 2006, Telephia reported that United Kingdom (U.K.) subscribers with third-generation technology (3G) phones spent an average of 8.3 hours per month listening to full-track music on their cell phones. Suppose we want to determine a 95% CI for the U.S. average and draw the following random sample of size 8 from the U.S. population of 3G subscribers:

5 6 0 4 11 9 2 3

The sample mean is $\bar{x} = 5$ and the standard deviation is $s = 3.63$, with degrees of freedom $n - 1 = 7$.
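
A short sketch of the interval calculation from the sample above. The critical value `T_STAR = 2.365` is the tabulated upper 2.5% point of the $t$ distribution with 7 degrees of freedom; it is hard-coded here rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

hours = [5, 6, 0, 4, 11, 9, 2, 3]  # the U.S. sample from the example

n = len(hours)
xbar = mean(hours)   # sample mean
s = stdev(hours)     # sample standard deviation (divides by n - 1)

# Table value: upper 2.5% point of t with n - 1 = 7 degrees of freedom
T_STAR = 2.365

margin = T_STAR * s / sqrt(n)
ci = (xbar - margin, xbar + margin)

print(f"xbar = {xbar}, s = {s:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

With only 8 observations the interval is wide, which reflects how little information such a small sample carries about the U.S. average.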

Example: Suppose that, for the U.S. data in the previous example, we want to test whether the U.S. average is different from the reported U.K. average.
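
A sketch of the one-sample $t$ test for this question, assuming $H_0: \mu = 8.3$ vs $H_a: \mu \neq 8.3$ and a 5% significance level (the level is an assumption made for illustration). As in a table-based calculation, the critical value 2.365 for 7 degrees of freedom is hard-coded.

```python
from math import sqrt
from statistics import mean, stdev

hours = [5, 6, 0, 4, 11, 9, 2, 3]  # the U.S. sample from the earlier example
mu0 = 8.3                          # reported U.K. average (hours per month)

n = len(hours)
t = (mean(hours) - mu0) / (stdev(hours) / sqrt(n))

# Two-sided test at the 5% level (an assumed level): the critical value is
# the tabulated upper 2.5% point of t with n - 1 = 7 degrees of freedom.
T_CRIT = 2.365
reject = abs(t) >= T_CRIT

print(f"t = {t:.2f}, reject H0 at the 5% level: {reject}")
```

Rejecting at the 5% level is equivalent to 8.3 falling outside the 95% confidence interval built from the same sample and the same critical value.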

Power

The ability of a test to detect that $H_0$ is false is measured by the probability that the test will reject $H_0$ when an alternative is true. The higher this probability is, the more sensitive the test is.

Definition: The probability that a fixed size $\alpha$ test will reject $H_0$ when $H_0$ is false is called the power of the test.

A powerful test has a large probability of rejecting $H_0$ when it is false.

Example: Can a 6-month exercise program increase the total body

bone mineral content (TBBMC) of young women? A team of

researchers is planning a study to examine this question. Based on

the results of a previous study, they are willing to assume that

for the percent change in TBBMC over the 6-month period.

A change in TBBMC of 1% would be considered important, and

the researchers would like to have a reasonable chance of detecting

a change this large or larger. Is 25 subjects a large enough sample

for this project?

Three steps to find the power of the test:

1. State $H_0$, $H_a$, the particular alternative we want to detect, and the significance level $\alpha$.

2. Find the values of $\bar{x}$ (or other estimates) that will lead us to reject $H_0$.

3. Calculate the probability of observing these values of $\bar{x}$ when the alternative is true.
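
The three steps can be sketched for an upper-tail $z$ test as follows. The setting loosely follows the TBBMC example ($H_0: \mu = 0$, alternative $\mu = 1$, $n = 25$), but the standard deviation and significance level are not reproduced in the example above, so `sigma=2` and `alpha=0.05` are illustrative assumptions, and `power_upper_tail` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tail(mu0, mu_alt, sigma, n, alpha):
    """Power of the level-alpha z test of H0: mu = mu0 vs Ha: mu > mu0."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)
    # Step 2: sample means at or above this cutoff lead to rejecting H0.
    xbar_cutoff = mu0 + std_normal.inv_cdf(1 - alpha) * se
    # Step 3: probability of landing in that region when mu = mu_alt.
    return 1 - NormalDist(mu_alt, se).cdf(xbar_cutoff)

# Step 1: H0: mu = 0, Ha: mu > 0, alternative mu = 1 (a 1% change), n = 25.
# sigma = 2 and alpha = 0.05 are illustrative assumptions.
power = power_upper_tail(mu0=0, mu_alt=1, sigma=2, n=25, alpha=0.05)
print(f"power = {power:.3f}")

# Power for a few other sample sizes under the same assumptions
for n in (10, 25, 50):
    print(n, round(power_upper_tail(0, 1, 2, n, 0.05), 3))
```

Under these assumed values the power with 25 subjects is roughly 0.8, and the loop shows that power rises as the sample size grows.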

How to increase the power?

Back to Error Probabilities

Example: The mean outer diameter of a skateboard bearing is

supposed to be 22.000 millimeters (mm). The outer diameters vary

Normally with standard deviation mm. When a lot of

bearings arrives, the skateboard manufacturer takes an SRS of 5

bearings from the lot and measures their outer diameters. The

manufacturer rejects the bearings if the sample mean diameter is

significantly different from 22 at the 5% significance level.

Suppose the producer and the manufacturer agree that a lot of

bearings with mean 0.015 mm away from 22 should be rejected.

Significance and Type I error: The significance level $\alpha$ of any fixed level test is the probability of a Type I error. That is, $\alpha$ is the probability that the test will reject $H_0$ when $H_0$ is in fact true.

Power and Type II error: The power of a fixed level test to detect a

particular alternative is 1 minus the probability of a Type II error

for that alternative.
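
A sketch of both error probabilities for a two-sided $z$ test in the bearing setting. The sample size, the 5% level and the 0.015 mm shift come from the example. The standard deviation is not reproduced in the example above, so `sigma=0.010` is a hypothetical value used only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_power(mu0, mu_alt, sigma, n, alpha):
    """Power of the level-alpha two-sided z test of H0: mu = mu0 at mu_alt."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)
    z_star = std_normal.inv_cdf(1 - alpha / 2)
    shift = (mu_alt - mu0) / se
    # Reject when z >= z_star or z <= -z_star; under mu_alt, z ~ N(shift, 1).
    return (1 - std_normal.cdf(z_star - shift)) + std_normal.cdf(-z_star - shift)

# n = 5, the 5% level and the 0.015 mm shift come from the example;
# sigma = 0.010 mm is a hypothetical value used only for illustration.
power = two_sided_power(mu0=22.000, mu_alt=22.015, sigma=0.010, n=5, alpha=0.05)
beta = 1 - power  # probability of a Type II error at this alternative

# Under H0 the same formula returns the Type I error probability, alpha.
type_i = two_sided_power(mu0=22.000, mu_alt=22.000, sigma=0.010, n=5, alpha=0.05)

print(f"power = {power:.3f}, beta = {beta:.3f}, alpha = {type_i:.3f}")
```

Evaluating the function at $\mu = \mu_0$ recovers $\alpha$, which illustrates that the significance level is the rejection probability when $H_0$ is true, while $1 - \text{power}$ at a specific alternative is the Type II error probability.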

Comparing Two Means

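Assume we have two populations of interest, each with unknown mean $\mu_i$ ($i = 1, 2$). Choose an SRS of size $n_1$ from one normal population having mean $\mu_1$ and standard deviation $\sigma_1$ and an independent SRS of size $n_2$ from another normal population having mean $\mu_2$ and standard deviation $\sigma_2$. The estimate of the difference in the population means, $\mu_1 - \mu_2$, is

$$\bar{x}_1 - \bar{x}_2$$

where $\bar{x}_1$ and $\bar{x}_2$ are sample means.

Distribution of $\bar{x}_1 - \bar{x}_2$: Normal with mean $\mu_1 - \mu_2$ and standard deviation

$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$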

Example: A fourth-grade class has 12 girls and 8 boys. The

children’s heights are recorded on their 10th birthdays. Based on

information from the National Health and Nutrition Examination

Survey, the heights (in inches) of 10-year-old girls are

distributed Normally with mean 56.8 and standard deviation 2.7

and the heights (in inches) of 10-year-old boys are distributed

Normally with mean 55.7 and standard deviation 3.8. Assume

that the heights of the students in the class are random samples

from the populations. What is the probability that the girls’

average height is greater than the boys’ average height?

Solution:
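
A sketch of the calculation, using the fact that the difference of two independent Normal sample means is Normal with mean $\mu_1 - \mu_2$ and standard deviation $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_girls, sd_girls, n_girls = 56.8, 2.7, 12
mu_boys, sd_boys, n_boys = 55.7, 3.8, 8

# Mean and standard deviation of (girls' mean - boys' mean)
diff_mean = mu_girls - mu_boys
diff_sd = sqrt(sd_girls**2 / n_girls + sd_boys**2 / n_boys)

# P(girls' mean - boys' mean > 0)
prob = 1 - NormalDist(diff_mean, diff_sd).cdf(0)

print(f"mean = {diff_mean:.1f}, sd = {diff_sd:.2f}, probability = {prob:.2f}")
```

Even though girls are taller on average in the population, the small class sizes leave a sizeable chance that the boys' average comes out higher.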

Here we know $\sigma_1^2$ and $\sigma_2^2$, which is quite rare.

So in general, there are two ways to compare the means of two normal populations. This is due to the fact that there are two distinct possibilities:

1. $\sigma_1^2$ and $\sigma_2^2$ are unknown and equal.

2. $\sigma_1^2$ and $\sigma_2^2$ are unknown and unequal.

Comparing Two Means: Variances Unequal

Assume $\sigma_1^2$ and $\sigma_2^2$ are unknown. We estimate them by $s_1^2$ and $s_2^2$.

Example: An educator believes that new directed reading activities

in the classroom will help elementary school pupils improve some

aspects of their reading ability. She arranges for a third-grade class

of 21 students to take part in these activities for an eight-week

period. A control classroom of 23 third-graders follows the same

curriculum without the activities. At the end of the eight weeks, all

students are given a Degree of Reading Power (DRP) test, which

measures the aspects of reading ability that the treatment is

designed to improve. The data appear in the table below:

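Two-Sample t CI:

Choose an SRS of size $n_1$ from a Normal population with unknown mean $\mu_1$ and an independent SRS of size $n_2$ from another Normal population with unknown mean $\mu_2$.

A $100(1-\alpha)\%$ CI for $\mu_1 - \mu_2$ is given by

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where $t^*$ is the value for the $t(k)$ density curve with area $1-\alpha$ between $-t^*$ and $t^*$. The value of the degrees of freedom $k$ is approximated by software, or we use the smaller of $n_1 - 1$ and $n_2 - 1$.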
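
A sketch of the quantities that go into the unpooled two-sample procedure: the standard error, the $t$ statistic, and two choices for the degrees of freedom $k$. The Welch-Satterthwaite formula used for `k_software` is a common software approximation and is an assumption here, since the notes do not say which approximation is meant. The scores are made up for illustration and are not the DRP data.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(x, y):
    """Unpooled two-sample t statistic with two choices of degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2   # s_i^2 / n_i
    se = sqrt(v1 + v2)
    t = (mean(x) - mean(y)) / se
    k_conservative = min(n1, n2) - 1
    # Welch-Satterthwaite approximation, a formula software commonly uses
    k_software = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, se, k_conservative, k_software

# Hypothetical scores, made up for illustration (not the DRP data).
treatment = [52, 61, 48, 57, 66, 59]
control = [45, 50, 41, 55, 47]
t, se, k_small, k_welch = two_sample_t(treatment, control)
print(f"t = {t:.2f}, se = {se:.2f}, k = {k_small} (conservative) or {k_welch:.1f}")
```

The interval is then the difference in sample means plus or minus $t^*$ times `se`, with $t^*$ read from a $t$ table for the chosen $k$. The conservative choice never exceeds the approximated value, so it gives a wider interval.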

Example: How much improvement?

Comparing Two Means: Variances Equal (Pooled Test)

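Suppose we have two Normal populations with the same variances:

$\sigma_1^2 = \sigma_2^2 = \sigma^2$, where $\sigma^2$ is unknown.

The pooled two-sample t procedures:

Choose an SRS of size $n_1$ from a Normal population with unknown mean $\mu_1$ and an independent SRS of size $n_2$ from another Normal population with unknown mean $\mu_2$.

A $100(1-\alpha)\%$ CI for $\mu_1 - \mu_2$ is given by

$$(\bar{x}_1 - \bar{x}_2) \pm t^* s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ is the pooled estimate of $\sigma^2$ and $t^*$ is the value for the $t(n_1 + n_2 - 2)$ density curve with area $1-\alpha$ between $-t^*$ and $t^*$.

To test the hypothesis $H_0: \mu_1 = \mu_2$, compute the pooled two-sample t statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

In terms of a random variable $T$ having the $t(n_1 + n_2 - 2)$ distribution, the P-value for a test of $H_0$ against

$H_a: \mu_1 > \mu_2$ is $P(T \geq t)$,

$H_a: \mu_1 < \mu_2$ is $P(T \leq t)$,

$H_a: \mu_1 \neq \mu_2$ is $2P(T \geq |t|)$.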
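
A sketch of the pooled statistic, with the pooled variance taken as the degrees-of-freedom-weighted average of the two sample variances. The function name `pooled_t` and the data are hypothetical; the measurements are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """Pooled two-sample t statistic for H0: mu1 = mu2, with s_p and df."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, sqrt(sp2), df

# Hypothetical measurements, made up for illustration.
group1 = [12, 15, 9, 14, 11]
group2 = [8, 10, 7, 13, 9, 6]
t, sp, df = pooled_t(group1, group2)
print(f"t = {t:.2f}, s_p = {sp:.2f}, df = {df}")
```

The P-value is then found by comparing `t` with the $t$ distribution on `df` degrees of freedom, using a table or statistical software.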

Example: Does increasing the amount of calcium in our diet reduce

blood pressure? Examination of a large sample of people revealed

a relationship between calcium intake and blood pressure, but such

observational studies do not establish causation. A randomized

comparative experiment gave one group of 10 people a calcium

supplement for 12 weeks. The control group of 11 people received

a placebo that appeared identical. The table below gives the seated systolic blood pressure for all subjects at the beginning and end of the 12-week period, in millimeters of mercury. The table also shows the decrease for each subject. An increase appears as a negative entry.

Does increased calcium reduce blood pressure?

How different are the calcium and placebo groups?
