Introduction to Inference: Tests of Significance

Proof

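[Figure: sampling distribution of x̄, axis marked at 925, 950, 975 and 1000 hours]

$\mu_{\bar{x}} = 1000, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{125}{\sqrt{25}} = 25$

$\bar{x} = 979$

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{979 - 1000}{25} = -0.84$

$P(\bar{x} \le 979) = .2005$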

Proof

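[Figure: sampling distribution of x̄, axis marked at 925, 950, 975 and 1000 hours]

$\mu_{\bar{x}} = 1000, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{125}{\sqrt{25}} = 25$

$\bar{x} = 920$

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{920 - 1000}{25} = -3.2$

$P(\bar{x} \le 920) = .0007$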
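
Both "Proof" calculations can be checked with a short sketch using Python's standard library. It assumes, as the slides do, a normal sampling distribution with μ = 1000 hours, a known σ = 125 hours and n = 25; the helper name `lower_tail_prob` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_prob(xbar, mu, sigma, n):
    """Return (z, P(X-bar <= xbar)) assuming a normal sampling distribution
    with known population standard deviation sigma (hypothetical helper)."""
    se = sigma / sqrt(n)             # standard deviation of x-bar: 125 / 5 = 25
    z = (xbar - mu) / se             # standardized sample mean
    return z, NormalDist().cdf(z)    # area to the left of z


for xbar in (979, 920):
    z, p = lower_tail_prob(xbar, mu=1000, sigma=125, n=25)
    print(f"xbar = {xbar}: z = {z:.2f}, P = {p:.4f}")
# xbar = 979: z = -0.84, P = 0.2005
# xbar = 920: z = -3.20, P = 0.0007
```

A sample mean of 979 hours is quite plausible when μ = 1000 (about a 20% chance of one this low), while 920 hours would occur by chance less than 0.1% of the time.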

Definitions

• A test of significance is a method for using sample data to decide between two competing claims about a population characteristic.

• The null hypothesis, denoted by H0, states that there is no effect or no change; it is the claim initially assumed to be true (e.g. H0: μ = 1000).

• The alternative hypothesis, denoted by Ha, is the competing claim (e.g. Ha: μ < 1000).

Note: the population characteristic could be μ or p; the number in the hypotheses (here 1000) is the hypothesized value.

Chrysler Concord

• H0: μ = 8

• Ha: μ > 8

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{8.7 - 8}{1/\sqrt{10}} \approx 2.21$
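
A minimal sketch of the same z statistic, plus the upper-tail p-value for Ha: μ > 8. It reads the slide's fraction as σ = 1 and n = 10, which is our interpretation; with those values the result agrees with the p-value of .0134 reported for this example. `upper_tail_z_test` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_test(xbar, mu0, sigma, n):
    """One-sample z test of H0: mu = mu0 vs Ha: mu > mu0, sigma assumed known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_value = 1 - NormalDist().cdf(z)   # area to the right of z
    return z, p_value


z, p = upper_tail_z_test(xbar=8.7, mu0=8, sigma=1, n=10)
print(f"z = {z:.2f}, p-value = {p:.4f}")  # z = 2.21, p-value = 0.0134
```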

Phrasing our decision

• In the justice system, what are our null and alternative hypotheses?

• H0: defendant is innocent

• Ha: defendant is guilty

• What does the jury state if the defendant wins?

• Not guilty

• Why?

Phrasing our decision

• What is the goal of the prosecutor?

• The goal is to provide evidence that the defendant is guilty.

• When does the prosecutor win?

• What is the decision with respect to:

– the null hypothesis (H0: defendant is innocent)

– the alternative hypothesis (Ha: defendant is guilty)

• We reject the null because we have the evidence to believe the alternative.

Phrasing our decision

• When does the defendant win?

• What is the decision with respect to:

– the null hypothesis (H0: defendant is innocent)

– the alternative hypothesis (Ha: defendant is guilty)

• We fail to reject the null because we do not have the evidence to believe the alternative.

Summary

• H0: defendant is innocent

• Ha: defendant is guilty

• We have the evidence:

– We reject the null because we have the evidence to believe the alternative.

• We don’t have the evidence:

– We fail to reject the null because we do not have the evidence to believe the alternative.

Chrysler Concord

• H0: μ = 8

• Ha: μ > 8

• p-value = .0134

• We reject H0: the p-value is so small that there is enough evidence to believe the mean Concord time is greater than 8 seconds.

K-mart light bulb

• H0: μ = 1000

• Ha: μ < 1000

• p-value = .1078

• We fail to reject H0: the p-value is not very small, so there is not enough evidence to believe the mean lifetime is less than 1000 hours.

Remember:

Inference procedure overview

• State the procedure

• Define any variables

• Establish the conditions (assumptions)

• Use the appropriate formula

• Draw conclusions

Test of Significance Example

• A package delivery service claims it takes an average of 24 hours to send a package from New York to San Francisco. An independent consumer agency is doing a study to test the truth of the claim. Several complaints have led the agency to suspect that the delivery time is longer than 24 hours. Assume that the delivery times are normally distributed with a standard deviation of 2 hours (for now, assume σ is known). A random sample of 25 packages has been taken, and its mean delivery time is 24.85 hours.

The thought process of a test

Test of significance

μ = true mean delivery time

H0: μ = 24

Ha: μ > 24

Given a random sample

Given a normal distribution

Safe to infer a population of at least 250 packages (at least 10 × 25, so the sampled packages can be treated as independent)

Thought process continued

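[Figure: sampling distribution of x̄ under H0, axis from 22.8 to 25.2 hours in steps of 0.4]

$\mu_{\bar{x}} = 24, \qquad \sigma_{\bar{x}} = \frac{2}{\sqrt{25}} = 0.4, \qquad \bar{x} = 24.85$

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{24.85 - 24}{0.4} = 2.125$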

Thought process continued

Let α = .05.

Test of significance: μ = true mean delivery time

H0: μ = 24, Ha: μ > 24

Given a random sample

Given a normal distribution

Assume a population of at least 250 packages

$z = \frac{24.85 - 24}{2/\sqrt{25}} = 2.125$

p-value = 1 − .9834 = .0166
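
The full delivery-time test can be sketched as one function. This assumes σ = 2 hours is known, as the example instructs, and uses x̄ = 24.85 from the previous slide; the function name `one_sample_z_test` is hypothetical. Computing Φ(2.125) directly gives a p-value of about .0168; the slide's .0166 matches a table lookup at z rounded to 2.13. Either way the p-value is below α.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sigma, n, alternative="greater"):
    """One-sample z test with known sigma; alternative is 'greater', 'less' or 'two-sided'."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)              # Ha: mu > mu0
    elif alternative == "less":
        p_value = std_normal.cdf(z)                  # Ha: mu < mu0
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))   # Ha: mu != mu0
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value


z, p = one_sample_z_test(xbar=24.85, mu0=24, sigma=2, n=25)
alpha = 0.05
print(f"z = {z:.3f}, p-value = {p:.4f}, reject H0: {p < alpha}")
# z = 2.125, p-value = 0.0168, reject H0: True
```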

Thought process continued

• Question: What can I conclude?

• If I believe the statistic is too extreme and unusual (p-value < α), I will reject the null hypothesis.

• If I believe the statistic is just normal chance variation (p-value > α), I will fail to reject the null hypothesis.
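
The decision rule can be written as a tiny function. The slide does not cover a p-value exactly equal to α; this sketch treats that case as failing to reject, which is an assumption. The return values follow the slides' wording and deliberately never say "accept H0".

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the conclusion of a significance test at level alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    if p_value < alpha:
        # The statistic is too extreme and unusual to attribute to chance.
        return "reject H0"
    # The statistic is consistent with normal chance variation.
    return "fail to reject H0"


print(decide(0.0166))  # reject H0
print(decide(0.1882))  # fail to reject H0
```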

Thought process continued

Test of significance: μ = true mean delivery time

H0: μ = 24, Ha: μ > 24

Given a random sample

Given a normal distribution

Assume a population of at least 250 packages

Let α = .05.

$z = \frac{24.85 - 24}{2/\sqrt{25}} = 2.125$, p-value = .0166

We reject H0. Since p-value < α, there is enough evidence to believe the mean delivery time is longer than 24 hours.

Second example

Test of significance: μ = true mean VVIQ score

H0: μ = 67, Ha: μ < 67

Given a random sample

Sample is large (n > 40), so the Central Limit Theorem ensures the sampling distribution of x̄ is approximately normal

Assume a population of at least 510 varsity athletes

Let α = .05.

$z = \frac{64.6 - 67}{19.38/\sqrt{51}} = -.88$, p-value = .1882

We fail to reject H0. Since p-value > α, there is not enough evidence to believe the mean VVIQ score is less than 67.
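
A sketch of the lower-tailed VVIQ test. It reads the slide's numbers as x̄ = 64.6, a standard deviation of 19.38 used in place of σ, and n = 51 (consistent with the 510-athlete population condition); that reading is our assumption.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, sd, n = 64.6, 67, 19.38, 51   # values read from the slide (assumed)
alpha = 0.05

z = (xbar - mu0) / (sd / sqrt(n))   # standardized distance below 67
p_value = NormalDist().cdf(z)       # lower tail, since Ha: mu < 67
print(f"z = {z:.2f}, p-value = {p_value:.3f}")   # z = -0.88, p-value = 0.188
print("reject H0" if p_value < alpha else "fail to reject H0")  # fail to reject H0
```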

1-proportion z-test: p = true proportion of pure short plants

H0: p = .25, Ha: p ≠ .25

Given a random sample.

np = 1064(.25) = 266 > 10 and n(1 − p) = 1064(1 − .25) = 798 > 10, so the sample size is large enough to use normality.

Safe to infer a population of at least 10,640 plants.

Let α = .05.

$z = \frac{.2603 - .25}{\sqrt{\frac{.25(1 - .25)}{1064}}} = .78$, p-value = .4361

We fail to reject H0. Since p-value > α, there is not enough evidence to believe the proportion of pure short plants is different from 25%.
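
The one-proportion test can be sketched the same way. The helper name `one_prop_z_test` and its built-in large-sample check (np and n(1 − p) both above 10, as on the slide) are our own. Using the rounded p̂ = .2603 gives z ≈ .78 and a two-sided p-value of about .438; the slide's .4361 is slightly smaller, presumably because it used the unrounded sample proportion. The conclusion is the same.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(p_hat, p0, n):
    """Two-sided one-proportion z test of H0: p = p0 vs Ha: p != p0."""
    # Large-sample condition from the slide: n*p0 and n*(1 - p0) both greater than 10.
    if not (n * p0 > 10 and n * (1 - p0) > 10):
        raise ValueError("sample too small to use the normal approximation")
    se = sqrt(p0 * (1 - p0) / n)                   # standard error computed under H0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # both tails, since Ha is p != p0
    return z, p_value


z, p = one_prop_z_test(p_hat=0.2603, p0=0.25, n=1064)
print(f"z = {z:.2f}, p-value = {p:.3f}")  # z = 0.78, p-value = 0.438
```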

Choosing a level of significance

• How plausible is H0? If H0 represents a long-held belief, strong evidence (a small α) might be needed to dissolve the belief.

• What are the consequences of rejecting H0? The choice of α will be heavily influenced by the consequences of rejecting or failing to reject.

Errors in the justice system

                Actual truth
Jury decision   Guilty              Not guilty
Guilty          Correct decision    Type I error
Not guilty      Type II error       Correct decision

“No innocent man is jailed” justice system

                Actual truth
Jury decision   Guilty              Not guilty
Guilty                              Type I error (smaller)
Not guilty      Type II error (larger)

“No guilty man goes free” justice system

                Actual truth
Jury decision   Guilty              Not guilty
Guilty                              Type I error (larger)
Not guilty      Type II error (smaller)

Errors in the justice system

                                Actual truth
Jury decision                   Guilty (Ha true)    Not guilty (H0 true)
Guilty (reject H0)              Correct decision    Type I error
Not guilty (fail to reject H0)  Type II error       Correct decision
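
The four cells of the table can be encoded as a lookup keyed on the actual truth and the decision. In this sketch the names `OUTCOMES` and `outcome` are illustrative; "H0 true" corresponds to an innocent defendant and "reject H0" to a guilty verdict.

```python
# (H0 is actually true?, test rejects H0?) -> outcome
OUTCOMES = {
    (True, True): "Type I error",       # innocent defendant found guilty
    (True, False): "Correct decision",  # innocent defendant found not guilty
    (False, True): "Correct decision",  # guilty defendant found guilty
    (False, False): "Type II error",    # guilty defendant found not guilty
}


def outcome(h0_true: bool, reject_h0: bool) -> str:
    """Classify a test decision against the (normally unknown) truth."""
    return OUTCOMES[(h0_true, reject_h0)]


print(outcome(h0_true=True, reject_h0=True))    # Type I error
print(outcome(h0_true=False, reject_h0=False))  # Type II error
```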

Type I and Type II errors

• If we believe Ha when in fact H0 is true, this is a Type I error.

• If we believe H0 when in fact Ha is true, this is a Type II error.

• Type I error: we reject H0, and it’s a mistake.

• Type II error: we fail to reject H0, and it’s a mistake.
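
A quick simulation illustrates that the probability of a Type I error equals the significance level α. This sketch assumes the delivery-time setting (σ = 2, n = 25, α = .05, Ha: μ > 24) and generates every sample with H0: μ = 24 actually true, so every rejection is a Type I error. The function name, number of repetitions and seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_i_error_rate(mu0=24.0, sigma=2.0, n=25, alpha=0.05, reps=20_000, seed=1):
    """Estimate P(reject H0 | H0 true) for an upper-tailed one-sample z test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # reject when z exceeds this (about 1.645)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]  # H0 is true by construction
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if z > z_crit:
            rejections += 1                     # each of these is a Type I error
    return rejections / reps


print(f"estimated Type I error rate: {type_i_error_rate():.3f}")  # close to 0.05
```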

Type I and Type II example

A distributor of handheld calculators receives very large shipments of calculators from a manufacturer. It is too costly and time-consuming to inspect all incoming calculators, so when each shipment arrives, a sample is selected for inspection. Information from the sample is then used to test H0: p = .02 versus Ha: p < .02, where p is the true proportion of defective calculators in the shipment. If the null hypothesis is rejected, the distributor accepts the shipment of calculators. If the null hypothesis cannot be rejected, the entire shipment of calculators is returned to the manufacturer due to inferior quality. (A shipment is defined to be of inferior quality if it contains 2% or more defectives.)

Type I and Type II example

• Type I error: We think the proportion of defective calculators is less than 2%, but it’s actually 2% (or more).

• Consequence: The distributor accepts a shipment that has too many defective calculators, a potential loss in revenue.

Type I and Type II example

• Type II error: We think the proportion of defective calculators is 2%, but it’s actually less than 2%.

• Consequence: The distributor returns the shipment, thinking there are too many defective calculators, but the shipment is OK.

Type I and Type II example

• The distributor wants to avoid a Type I error: choose α = .01.

• The calculator manufacturer wants to avoid a Type II error: choose α = .10.

Concept of Power

• Definition?

• Power is the capability of accomplishing something…

• The power of a test of significance is…

Power Example

In a power-generating plant, pressure in a certain line is supposed to maintain an average of 100 psi over any 4-hour period. If the average pressure exceeds 103 psi for a 4-hour period, serious complications can evolve. During a given 4-hour period, thirty random measurements are to be taken. The standard deviation for these measurements is 4 psi (a graph of the data is reasonably normal). Test H0: μ = 100 psi versus the alternative "new" hypothesis μ = 103 psi at the α = .01 level. Calculate the probability of a Type II error and the power of this test. In the context of the problem, explain what the power means.

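Type I error and α

[Figure: sampling distribution of x̄ under H0, axis marked at 100, 100.73, 101.46 and 102.19 psi]

$\sigma_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4}{\sqrt{30}} = .73$

For α = .01, t* = 2.462

α is the probability that we think the mean pressure is above 100 psi, but actually the mean pressure is 100 psi (or less).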

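Type I error and α

[Figure: the same sampling distribution, with the rejection cutoff marked at 101.80 psi]

$\sigma_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4}{\sqrt{30}} = .73$

For α = .01, t* = 2.462

$\bar{x} = 100 + 2.462(.73) = 101.80$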

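Type II error and β

[Figure: sampling distributions of x̄ centered at 100 psi and 103 psi, with the cutoff at 101.8 psi]

$z_{\beta} = \frac{101.8 - 103}{.73} = -1.64$

β = .0505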

Type II error and β

[Figure: the same two distributions, cutoff at 101.8 psi, β = .0505]

β is the probability that we think the mean pressure is 100 psi, but actually the mean pressure is greater than 100 psi (here, 103 psi).

Power?

[Figure: sampling distributions of x̄ centered at 100 psi and 103 psi, β = .0505]

Power = 1 − .0505 = .9495
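
The cutoff, β and power can be reproduced step by step. This sketch follows the slides' approach: the cutoff uses the t critical value 2.462 given for α = .01 (29 degrees of freedom), and β is then read from the normal curve centered at the alternative μ = 103 psi. Without the intermediate rounding to .73 and −1.64, β comes out near .050 and power near .950, matching the slides up to rounding.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_alt, s, n = 100, 103, 4, 30
t_star = 2.462                       # critical value from the slide (alpha = .01, one-sided)

se = s / sqrt(n)                     # about 0.73 psi
cutoff = mu0 + t_star * se           # reject H0 when x-bar exceeds about 101.80 psi
z_beta = (cutoff - mu_alt) / se      # about -1.646
beta = NormalDist().cdf(z_beta)      # P(fail to reject H0 | mu = 103)
power = 1 - beta                     # P(reject H0 | mu = 103)

print(f"cutoff = {cutoff:.2f}, beta = {beta:.3f}, power = {power:.3f}")
# cutoff = 101.80, beta = 0.050, power = 0.950
```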

For a sample size of 30, there is a .9495 probability that this test of significance will correctly detect that the mean pressure is above 100 psi when the true mean pressure is 103 psi.

Concept of Power

• The power of a test of significance is the probability that the null hypothesis will be correctly rejected.

• Because the true value of μ is unknown, we cannot know the actual power of the test, but we can examine "what if" scenarios (such as μ = 103 psi) to provide important information.

• Power = 1 − β

Effects on the Power of a Test

• The larger the difference between the hypothesized value and the true value of the population characteristic, the higher the power.

• The larger the significance level α, the higher the power of the test.

• The larger the sample size, the higher the power of the test.
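
All three effects can be checked numerically with a small power function for an upper-tailed test. Unlike the worked example, this sketch uses the normal critical value z* (treating σ as known) instead of t*, so its numbers differ slightly from the slides; the function name `power_upper_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_z(mu0, mu_true, sigma, n, alpha):
    """Power of the upper-tailed z test of H0: mu = mu0 when the true mean is mu_true."""
    se = sigma / sqrt(n)
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se   # rejection boundary for x-bar
    return 1 - NormalDist(mu_true, se).cdf(cutoff)        # P(x-bar > cutoff | mu_true)


base = dict(mu0=100, mu_true=103, sigma=4, n=30, alpha=0.01)
print(f"baseline:          {power_upper_z(**base):.3f}")
print(f"true mean 104 psi: {power_upper_z(**{**base, 'mu_true': 104}):.3f}")  # larger difference
print(f"alpha = .05:       {power_upper_z(**{**base, 'alpha': 0.05}):.3f}")   # larger alpha
print(f"n = 50:            {power_upper_z(**{**base, 'n': 50}):.3f}")         # larger sample
```

Each variation changes one factor while holding the others at the pressure example's values, so any increase over the baseline is attributable to that factor alone.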
