1

Schaum’s Outline Probability and Statistics

Chapter 7

HYPOTHESIS TESTING

presented by Professor Carol Dahl

Examples by Alfred Aird, Kira Jeffery, Catherine Keske, Hermann Logsend, Yris Olaya

2

Outline of Topics

Topics Covered

Statistical Decisions

Statistical Hypotheses

Null Hypotheses

Tests of Hypotheses

Type I and Type II Errors

Level of Significance

Tests Involving the Normal Distribution

One- and Two-Tailed Tests

P-Value

3

Outline of Topics (Continued)

Special Tests of Significance

Large Samples

Small Samples

Estimation Theory/Hypotheses Testing Relationship

Operating Characteristic Curves and Power of a Test

Fitting Theoretical Distributions to Sample Frequency Distributions

Chi-Square Test for Goodness of Fit

4

“The Truth Is Out There”

The Importance of Hypothesis Testing

Hypothesis testing

helps evaluate models based upon real data

enables one to build a statistical model

enhances your credibility as

analyst

economist

5

Statistical Decisions

Innocent until proven guilty principle

Want to prove someone is guilty

Assume the opposite or status quo - innocent

Ho: Innocent

H1: Guilty

Take subsample of possible information

If evidence is not consistent with innocence, reject Ho

Otherwise the person is pronounced not guilty, which is not the same as innocent

6

Statistical Decisions

Status quo innocence = null hypothesis

Evidence = sample result

Reasonable doubt = confidence level

7

Statistical Decisions

E.g. Tantalum ore deposit

feasible if quality > 0.0600 g/kg with 99% confidence

100 samples collected at random from a large deposit

Sample distribution

mean of 0.071 g/kg

variance 0.0025 (g/kg)², i.e. standard deviation 0.05 g/kg

8

Statistical Decisions

Should the deposit be developed?

Evidence = 0.071 (sample mean)

Reasonable doubt = 99%

Status quo = do not develop the deposit

Ho: µ ≤ 0.0600

H1: µ > 0.0600

9

Statistical Hypothesis

General Principles

Inferences about population using sample statistic

Prove A is true by assuming it isn’t true

Results of experiment (sample) compared with model

If the results are unlikely under the model, reject the model

If the results are explained by the model, do not reject it

10

Statistical Hypothesis

Event A fairly likely, model would be retained

Event B unlikely, model would be rejected

[Figure: density curve over z centred at 0; event A falls in the central area, event B in the tail]

11

Statistical Decisions

Should the deposit be developed?

Evidence = 0.071 (sample mean)

Reasonable doubt = 99%

Status quo = do not develop the deposit

Ho: µ = 0.0600

H1: µ > 0.0600

How likely is Ho given X̄ = 0.071?

12

Need Sampling Statistic

Need statistic with

population parameter

estimate for population parameter

its distribution

13

Need Sampling Statistic

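Population Normal - Two Choices

Small Sample (n < 30)

Known Variance: $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$

Unknown Variance: $t = \dfrac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} \sim t_{n-1}$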

14

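Need Sampling Statistic

Population Not-Normal

Large Sample

Known Variance: $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is approximately $N(0,1)$

Unknown Variance: $\dfrac{\bar{X} - \mu}{\hat{s}/\sqrt{n}}$ is approximately $N(0,1)$

Doesn’t matter whether the variance is known or not

If the population is finite and sampling is without replacement, an adjustment is needed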

15

Normal Distribution

X ~ N(0,1)

µ = 0

Within 1 SD of the mean: 68%

Within 2 SD of the mean: 95%

Within 3 SD of the mean: 99.7%

16

Statistical Decisions

Should the deposit be developed?

Evidence: 0.071 g/kg (sample mean)

0.0025 (g/kg)² (sample variance)

0.05 g/kg (sample standard deviation)

Reasonable doubt = 99%

Status quo = do not develop the deposit

Ho: µ = 0.0600

H1: µ > 0.0600

One tailed test

How likely is Ho given X̄ = 0.071?

17

Hypothesis test

Evidence: 0.071 g/kg (sample mean)

0.05 g/kg (sample standard deviation)

Reasonable doubt = 99%

Status quo = do not develop the deposit

Ho: µ = 0.0600

H1: µ > 0.0600

$$P\left(\frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} \le Z_c\right) = 0.99 = 1 - \alpha$$

18

Statistical Hypothesis

E.g. Z = (0.071 – 0.0600)/(0.05/√100) = 2.2

Z = 2.2 < Zc = 2.33

Conclusion: Don’t reject Ho, don’t develop deposit
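The tantalum decision can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `one_tailed_z_test` is a hypothetical helper introduced for illustration, and the inputs are the figures quoted on the slides (mean 0.071, standard deviation 0.05, n = 100, 99% confidence).

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_z_test(sample_mean, mu0, sd, n, confidence):
    """Upper-tailed z test of Ho: mu = mu0 against H1: mu > mu0.

    Hypothetical helper for illustration, not part of the slides.
    """
    z = (sample_mean - mu0) / (sd / sqrt(n))      # test statistic
    z_crit = NormalDist().inv_cdf(confidence)     # critical value Zc
    return z, z_crit, z > z_crit


# Figures from the tantalum example
z, z_crit, reject = one_tailed_z_test(0.071, 0.0600, 0.05, 100, 0.99)
print(f"Z = {z:.2f}, Zc = {z_crit:.2f}, reject Ho: {reject}")
```

Because Z stays below Zc, the sketch agrees with the decision not to develop the deposit.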

19

Null Hypothesis

Hypotheses cannot be proven

reject or fail to reject

based on likelihood of event occurring

null hypothesis is not accepted

20

Test of Hypotheses Maple Creek Mine and

Potaro Diamond field in Guyana

Mine potential for producing large diamonds

Experts want to know true mean carat size produced

True mean said to be 4 carats

Experts want to know if true with 95% confidence

Random sample taken

Sample mean found to be 3.6 carats

Based on sample, is 4 carats true mean for mine?

21

Tests of Hypotheses

Tests referred to as:

“Tests of Hypotheses”

“Tests of Significance”

“Rules of Decision”

22

Types of Errors

$$P\left(-1.96 \le \frac{\bar{X} - 4}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95 = 1 - \alpha$$

Ho: µ = 4 (Suppose this is true)

H1: µ ≠ 4

Two tailed test

Choose α = 0.05

Sample n = 100 (assume X is normal), σ = 1

23

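Type I error (α) – reject a true Ho

$$P\left(-1.96 \le \frac{\bar{X} - 4}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95 = 1 - \alpha$$

Ho: µ = 4 suppose true

[Figure: N(0,1) density with rejection area α/2 in each tail]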

24

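Type II Error (ß) - fail to reject a false Ho

Ho: µ = 4 not true

µ = 6 true

X̄ − 4 has mean 2, not 0

[Figure: sampling distributions centred at µ = 4 (0) and µ = 6 (2); ß is the area of the µ = 6 curve inside the non-rejection region]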

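25

Lower Type I: What happens to Type II?

Ho: µ = 4 not true

µ = 6 true

[Figure: sampling distributions centred at µ = 4 (0) and µ = 6 (2); lowering α widens the non-rejection region, so ß rises]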

26

Higher µ: What happens to Type II?

Ho: µ = 4 not true

µ = 7 true

X̄ − 4 has mean 3, not 0

[Figure: sampling distributions centred at µ = 4 (0) and µ = 7 (3); the curves overlap less, so ß falls]

27

Type I and Type II Errors

                    Ho True             Ho False
Reject Ho           Type I Error        Correct Decision
Do Not Reject Ho    Correct Decision    Type II Error

α = P(Type I Error)

ß = P(Type II Error)

Two types of errors can occur in hypothesis testing

To reduce errors, increase sample size when possible

28

To Reduce Errors

Increase sample size when possible

[Figure: Mean Sampling Distributions, Different Sample Sizes: population and n = 5, 10, 20]

29

Error Examples

Type I Error – rejecting a true null hypothesis

Convicting an innocent person

Rejecting that the true mean carat size is 4 when it is

Type II Error – not rejecting a false null hypothesis

Setting a guilty person free

Not rejecting that the mean carat size is 4 when it’s not

30

Level of Significance (α)

α = max probability we’re willing to risk Type I Error

= tail area of probability density function

If Type I Error’s “cost” high, choose α low

α defined before hypothesis test conducted

α typically defined as 0.10, 0.05 or 0.01

α = 0.10 for 90% confidence of correct test decision

α = 0.05 for 95% confidence of correct test decision

α = 0.01 for 99% confidence of correct test decision

31

Diamond Hypothesis Test Example

Ho: µ = 4

H1: µ ≠ 4

Choose α = 0.01 for 99% confidence

Sample n = 100, σ = 1

X̄ = 3.6, -Zc = -2.575, Zc = 2.575

[Figure: N(0,1) density with rejection area 0.005 in each tail, beyond -2.575 and 2.575]

32

Example Continued

$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = -2$$

z = -2 is not below -Zc = -2.575

Observed not “significantly” different from expected

Fail to reject null hypothesis

At the 1% level of significance we cannot reject that the true mean is 4 carats

33

Tests Involving the t Distribution

Billy Ray has inherited large, 25,000 acre homestead

Located on outskirts of Murfreesboro, Arkansas, near:

Crater of Diamonds State Park

Prairie Creek Volcanic Pipe

Land now used for

agricultural

recreational

No official mining has taken place

34

Case Study in Statistical Analysis Billy Ray’s Inheritance

Billy Ray must now decide upon land usage

Options:

Exploration for diamonds

Conservation

Land biodiversity and recreation

Agriculture and recreation

Land development

35

Consider Costs and Benefits of Mining

Cost and Benefits of Mining

Opportunity cost

Excessive diamond exploration damages land’s value

Exploration and Mining Costs

Benefit

Value of mineral produced

36

Consider Costs and Benefits of Mining

Cost and Benefits of Mining

Sample for geologic indicators for diamonds

kimberlite or lamproite

larger sample more likely to represent “true population”

larger sample will cost more

37

How to decide one tailed or two tailed

One tailed test

Do we change the status quo only if it’s bigger than the null?

Do we change the status quo only if it’s smaller than the null?

Two tailed test

Change the status quo if it’s bigger or if it’s smaller

38

Tests of Mean

Normal or t

population normal, known variance, small sample: Normal

population normal, unknown variance, small sample: t

large sample: Normal

39

Difference Normal and t

[Figure: normal and t densities plotted from -5 to 5]

t has “fatter” tails than the normal bell curve

40

Hypothesis and Sample

Need at least 30 g/m³ to mine

Null hypothesis Ho: µ = 30

Alternative hypothesis H1: ?

Sample data: n = 16 (holes drilled)

X close to normal

X̄ = 31 g/m³

variance of the mean (ŝ²/n) = 0.268 (g/m³)²

41

Normal or t?

One tailed

Null hypothesis Ho: µ = 30

Alternative hypothesis H1: µ > 30

Sample data: n = 16 (holes drilled)

X̄ = 31 g/m³

variance ŝ² = 4.29 (g/m³)²

standard deviation ŝ = 2.07 g/m³

small sample, estimated variance, X close to normal

not exactly t but close if X close to normal

42

Tests Involving the t Distribution

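$$t_{n-1} = \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}}$$

[Figure: t density with 16 − 1 = 15 degrees of freedom centred at 0; rejection area 5% to the right of tc = 1.75]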

43

Tests Involving the t Distribution

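$$t_{n-1} = \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} = \frac{31 - 30}{2.07/\sqrt{16}} = 1.93$$

t = 1.93 > tc = 1.75: reject Ho at the 5% level

[Figure: t density with 16 − 1 = 15 degrees of freedom centred at 0; rejection area 5% to the right of tc = 1.75]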
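The arithmetic behind the drill-hole example is short enough to script. This is a minimal sketch, assuming the slide's figures (X̄ = 31, µ = 30, ŝ = 2.07, n = 16) and taking the critical value 1.75 from the slide rather than computing it; `t_statistic` is a hypothetical helper name.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, s, n):
    """t statistic (X-bar - mu0) / (s / sqrt(n)) with n - 1 degrees of freedom."""
    return (sample_mean - mu0) / (s / sqrt(n))


T_CRIT = 1.75  # tabulated 5% one-tailed value for 15 df, as quoted on the slide

t = t_statistic(31, 30, 2.07, 16)
reject = t > T_CRIT
print(f"t = {t:.2f}, tc = {T_CRIT}, reject Ho: {reject}")
```

The statistic lands in the rejection region, so the data support a grade above 30 g/m³ at the 5% level.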

44

Wells produce oil

X = API Gravity

approximately normal with mean 37

periodically test to see if the mean has changed

if too heavy or too light, revise contract

Ho:

H1:

Sample of 9 wells, X̄ = 38, ŝ = 2

What is test statistic?

Normal or t?


45

Two tailed t test on mean

$$t_{n-1} = \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}}$$

[Figure: t density centred at 0; rejection area α/2 in each tail, beyond -tc and tc]

46

Two tailed t test on mean

Ho: µ = 37

H1: µ ≠ 37

Sample of 9 wells, X̄ = 38, ŝ = 2, α = 10%

$$t_{n-1} = \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} = \frac{38 - 37}{2/\sqrt{9}} = 1.5$$

47

P-values - one tailed test

Level of significance of the sample statistic under the null

Smallest α for which the statistic would reject the null

$$t_{16-1} = \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} = \frac{31 - 30}{2.07/\sqrt{16}} = 1.93$$

TDIST(1.93,15,1)

P = 0.04

48

P-value two tailed test

Ho: µ = 37

H1: µ ≠ 37

Sample of 9 wells, X̄ = 38, ŝ = 2, α = 10%

$$t_{n-1} = \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} = \frac{38 - 37}{2/\sqrt{9}} = 1.5$$

=TDIST(1.5,8,2) = 0.172

49

Formal Representation of p-Values

p-Value < α ⇒ Reject Ho

p-Value > α ⇒ Fail to reject Ho
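The p-values for the two t examples can be approximated without a spreadsheet. The sketch below is one possible approach, not the method used in the slides: it integrates the Student t density numerically with Simpson's rule, using only the standard library. The helper names `t_pdf` and `t_upper_tail` are hypothetical.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus Simpson's-rule integral of the pdf on [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


p_one = t_upper_tail(1.93, 15)         # one-tailed, drill-hole example
p_two = 2 * t_upper_tail(1.5, 8)       # two-tailed, oil-well example

print(f"one-tailed p = {p_one:.3f}, reject at 5%: {p_one < 0.05}")
print(f"two-tailed p = {p_two:.3f}, reject at 10%: {p_two < 0.10}")
```

The decision rule is the one stated above: reject Ho only when the p-value is below the chosen α.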

50

More tests

Survey: ranking refinery managers

Daily refinery production

Samples from two refineries: 40 and 35 observations, in 1000 b/cd

First refinery: mean = 74, stand. dev. = 8

Second refinery: mean = 78, stand. dev. = 7

Questions: difference of means?

variances?

differences of variances

Again Statistics Can Help!!!!

51

Differences of Means

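Ho: µ1 - µ2 = 0

H1: µ1 - µ2 ≠ 0

X1 and X2 normal, known variance

or large sample, known variance

α = 10%

$$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$

[Figure: N(0,1) density with rejection area 5% in each tail, beyond -Zc and Zc]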

52

Differences of Means

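Ho: µ1 - µ2 = 0

H1: µ1 - µ2 ≠ 0

n1 = 40, n2 = 35

X̄1 = 74, σ1 = 8

X̄2 = 78, σ2 = 7

$$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} = \frac{74 - 78}{\sqrt{8^2/40 + 7^2/35}} = \frac{-4}{\sqrt{3}} \approx -2.31$$

Z ≈ -2.31 < -Zc = -1.645: reject Ho at α = 10%

[Figure: N(0,1) density with rejection area 5% in each tail, beyond -Zc = -1.645 and Zc = 1.645]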
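The two-refinery comparison can be recomputed from the quoted sample figures. This is a minimal sketch, assuming the standard deviations 8 and 7 are treated as known and α = 10%; `two_sample_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Z statistic for Ho: mu1 - mu2 = 0 with known (or large-sample) variances."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error


alpha = 0.10
z = two_sample_z(74, 8, 40, 78, 7, 35)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
reject = abs(z) > z_crit
print(f"Z = {z:.3f}, Zc = {z_crit:.3f}, reject Ho: {reject}")
```

Here the standard error is the square root of 64/40 + 49/35 = 3, so the statistic is -4 divided by about 1.732.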

53

Difference of Means

X normal

Unknown but equal variances

Do above test with

$$t_{n_1+n_2-2} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1-1)\hat{s}_1^2 + (n_2-1)\hat{s}_2^2}{n_1+n_2-2}\cdot\dfrac{n_1+n_2}{n_1 n_2}}}$$

54

Variance test (χ² distribution)

$$\chi^2_{n-1} = \frac{(n-1)S^2}{\sigma^2}$$

Two tailed

[Figure: χ² density with rejection area α/2 in each tail]

55

Variance test (χ² distribution)

$$\chi^2_{n-1} = \frac{(n-1)S^2}{\sigma^2}$$

One tailed

[Figure: χ² density with rejection area α in the upper tail]

56

Hypothesis Test on Variance

Suppose best practice in refinery: σ = 6

Does refinery 2 have different variability than best practice?

Ho: σ² = 6²

H1: σ² ≠ 6²

Example: 2nd refinery, n – 1 = 34, standard deviation = 7

$$P\left(\chi^2_{c1} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{c2}\right) = 1 - \alpha$$

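57

Hypothesis Test on Variance

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \frac{(35-1)\,7^2}{6^2} = 46.278$$

Ho: σ² = 6²

H1: σ² ≠ 6²

Example: 2nd refinery, n – 1 = 34, standard deviation = 7

α = 10%, α/2 in each tail

$$P\left(\chi^2_{c1} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{c2}\right) = 1 - \alpha$$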

58

Hypothesis Test on Variance

Suppose best practice in refinery: σ = 6

Ho: σ² = 6²

H1: σ² ≠ 6²

Example: 2nd refinery, n – 1 = 34, standard deviation = 7

Critical values with α/2 = 0.05 in each tail:

chiinv(0.95,34) = 21.664, chiinv(0.05,34) = 48.602

59

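Variance test (χ² distribution)

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = 46.278$$

Two tailed

[Figure: χ² density with rejection area 0.05 in each tail, below 21.664 and above 48.602; the statistic 46.278 falls between the two critical values, so Ho is not rejected]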

60

Variance test (χ² distribution)

More variance than best practice

One tailed, α = 0.10

Ho: σ² = 6²

H1: σ² > 6²

61

Variance test (χ² distribution)

More variance than best practice

One tailed, α = 0.10

Ho: σ² = 6²

H1: σ² > 6²

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = 46.278$$

chiinv(0.10,34) = 44.903

46.278 > 44.903: reject Ho at α = 0.10
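The variance test reduces to one statistic compared against tabulated critical values. The sketch below assumes a best-practice standard deviation of 6, which is the value that reproduces the quoted statistic of 46.278, and takes the critical values from the chiinv results quoted on the slides rather than computing them. `variance_chi2` is a hypothetical helper name.

```python
def variance_chi2(n, sample_sd, sigma0):
    """Chi-square statistic (n - 1) * S^2 / sigma0^2 with n - 1 degrees of freedom."""
    return (n - 1) * sample_sd**2 / sigma0**2


# Critical values for 34 degrees of freedom, as quoted on the slides
LOWER_TWO_TAILED = 21.664   # chiinv(0.95, 34)
UPPER_TWO_TAILED = 48.602   # chiinv(0.05, 34)
UPPER_ONE_TAILED = 44.903   # chiinv(0.10, 34)

chi2 = variance_chi2(35, 7, 6)  # sigma0 = 6 is an assumption, see lead-in
reject_two_tailed = chi2 < LOWER_TWO_TAILED or chi2 > UPPER_TWO_TAILED
reject_one_tailed = chi2 > UPPER_ONE_TAILED

print(f"chi2 = {chi2:.3f}")
print(f"two-tailed (alpha = 10%) reject Ho: {reject_two_tailed}")
print(f"one-tailed (alpha = 10%) reject Ho: {reject_one_tailed}")
```

The same statistic gives different decisions in the two cases because the one-tailed test places all of α in the upper tail.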

62

Testing if Variances the Same F Distribution

2 samples of size n1 and n2

sample variances: ŝ1², ŝ2²

Ho: σ1² = σ2² => Ho: σ2²/σ1² = 1

H1: σ1² ≠ σ2² => H1: σ2²/σ1² ≠ 1

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{S_1^2}{S_2^2}\cdot\frac{\sigma_2^2}{\sigma_1^2} \text{ is } F_{n_1-1,\,n_2-1}$$

63

Testing if Variances the Same F Distribution

Ho: σ1²/σ2² = 1

H1: σ1²/σ2² ≠ 1

$$F = \frac{S_1^2}{S_2^2}$$

Two tailed

[Figure: F density with rejection area α/2 in each tail]

64

Testing if Variances the Same F Distribution

Ho: σ1²/σ2² = 1

H1: σ1²/σ2² > 1

$$F = \frac{S_1^2}{S_2^2}$$

One tailed

α = 10%

65

Example Testing if Variances the Same

2 samples of size n1 = 40 and n2 = 35

sample variances: ŝ1² = 8², ŝ2² = 7²

Ho: σ2²/σ1² = 1

H1: σ2²/σ1² ≠ 1

$$P\left(\mathrm{Finv}(0.95,39,34) \le \frac{S_1^2}{S_2^2} \le \mathrm{Finv}(0.05,39,34)\right) = 1 - 0.10$$

Non-rejection region: [0.579, 1.749]

8²/7² = 1.306

66

Testing if Variances the Same F Distribution

Ho: σ1²/σ2² = 1

H1: σ1²/σ2² ≠ 1

$$F = \frac{S_1^2}{S_2^2} = 1.306$$

Two tailed

[Figure: F density with rejection area 0.05 in each tail; Finv(0.95,39,34) = 0.579, Finv(0.05,39,34) = 1.749]

67

Testing if Variances the Same F Distribution

Ho: σ1²/σ2² = 1

H1: σ1²/σ2² > 1

$$F = \frac{S_1^2}{S_2^2} = 1.306$$

One tailed

[Figure: F density with rejection area 0.10 in the upper tail]

Finv(0.10,39,34) = 1.544
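The variance-ratio test for the two refineries can be sketched the same way. This minimal example computes the ratio of sample variances and compares it with the Finv critical values quoted on the slides (they are copied, not computed); `variance_ratio` is a hypothetical helper name.

```python
def variance_ratio(sd1, sd2):
    """F statistic S1^2 / S2^2, with (n1 - 1, n2 - 1) degrees of freedom under Ho."""
    return sd1**2 / sd2**2


# Critical values for (39, 34) degrees of freedom, as quoted on the slides
F_LOWER = 0.579      # Finv(0.95, 39, 34)
F_UPPER = 1.749      # Finv(0.05, 39, 34)
F_UPPER_ONE = 1.544  # Finv(0.10, 39, 34)

f = variance_ratio(8, 7)
reject_two_tailed = f < F_LOWER or f > F_UPPER
reject_one_tailed = f > F_UPPER_ONE

print(f"F = {f:.3f}")
print(f"two-tailed reject Ho: {reject_two_tailed}")
print(f"one-tailed reject Ho: {reject_one_tailed}")
```

With these figures the ratio stays inside both non-rejection regions, so the data do not contradict equal variances.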

68

Power of a test

Type II error:

ß = P(Fail to reject Ho | H1 is true)

Power = 1 - ß

[Figure: sampling distributions centred at µ = 4 (0) and µ = 6 (2), with ß marked]

70

Power of a test

Researcher controls level of significance, α

Increase α: what happens to ß?

71

Raise Type I (α)

What happens to Type II (ß)?

Ho: µ = 4 not true

µ = 6 true

X̄ − 4 has mean 2, not 0

[Figure: sampling distributions centred at µ = 4 (0) and µ = 6 (2), with ß marked]

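72

Higher α

What happens to Type II?

[Figure: sampling distributions centred at µ = 4 (0) and µ = 6 (2), with ß marked]

Trade-off: increasing α reduces ß, and reducing α increases ß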
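The trade-off between α and ß can be made concrete with a short calculation. The sketch below assumes the two-tailed diamond test (µ0 = 4, σ = 1, n = 100) and uses hypothetical true means of 4.2 and 4.3 carats, chosen only so that ß is not vanishingly small; they do not come from the slides. `beta_two_tailed` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def beta_two_tailed(mu0, mu_true, sigma, n, alpha):
    """P(fail to reject Ho: mu = mu0) when the true mean is mu_true."""
    std = NormalDist()
    z_c = std.inv_cdf(1 - alpha / 2)
    # Under the true mean, the z statistic is centred at `shift` instead of 0
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    return std.cdf(z_c - shift) - std.cdf(-z_c - shift)


beta_05 = beta_two_tailed(4, 4.2, 1, 100, 0.05)   # hypothetical true mean 4.2
beta_01 = beta_two_tailed(4, 4.2, 1, 100, 0.01)   # same mean, lower alpha
beta_far = beta_two_tailed(4, 4.3, 1, 100, 0.05)  # hypothetical true mean 4.3

for label, beta in [("alpha=0.05, mu=4.2", beta_05),
                    ("alpha=0.01, mu=4.2", beta_01),
                    ("alpha=0.05, mu=4.3", beta_far)]:
    print(f"{label}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Lowering α raises ß, while a true mean further from µ0 lowers it, which is the pattern the sketches on the slides illustrate.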

73

Operating Characteristic Curve

[Figure: Ho and H1 sampling distributions centred at µ = µ0 and µ = µ1 on an axis from -10 to 10, with ß marked]

Can graph ß against µ

called operating characteristic curve

useful in experimental design

74

Operating Characteristic Curve

[Figure: two panels of Ho and H1 sampling distributions on an axis from -10 to 10; ß is shown for µ = µ0 against µ = µ2 and for µ = µ0 against µ = µ1]

75

Fitting a probability distribution

Is electricity demand log-normally distributed?

Observed Mean: 18.42

Observed Variance: 43

Observations: 20

9.8261   13.2253  30.2449   9.2554
20.8787  20.2954  14.182   23.3099
35.6834  18.1785  20.275   17.2652
13.1139  24.3539  17.243   21.9764
15.9879  16.4685  12.8461  13.9045

76

Fitting a probability distribution

Does electricity demand follow a normal distribution?

9.8261   13.2253  30.2449   9.2554
20.8787  20.2954  14.182   23.3099
35.6834  18.1785  20.275   17.2652
13.1139  24.3539  17.243   21.9764
15.9879  16.4685  12.8461  13.9045

Observed Mean: 18.42

Observed Variance: 43

Observations : 20

77

You can test your model graphically:

1. Order observations from smallest Y1 to largest Yn

2. Compute cumulative frequency distribution

3. Plot ordered observations versus Pi on special probability sheet

4. If straight line within critical range, can’t reject normal

78

You can test your model graphically:

Yi Pi Yi Pi

9.26 0.05 17.27 0.55

9.83 0.10 18.18 0.60

12.85 0.15 20.28 0.65

13.11 0.20 20.30 0.70

13.23 0.25 20.88 0.75

13.90 0.30 21.98 0.80

14.18 0.35 23.31 0.85

15.99 0.40 24.35 0.90

16.47 0.45 30.24 0.95

17.24 0.50 35.68 1.00
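The ordered observations and cumulative frequencies in this table can be generated directly from the raw data. This is a minimal sketch, assuming the cumulative frequency is Pi = i/n as the tabulated values suggest; the variable names are illustrative.

```python
# Electricity demand observations from the slides
demand = [
    9.8261, 13.2253, 30.2449, 9.2554,
    20.8787, 20.2954, 14.182, 23.3099,
    35.6834, 18.1785, 20.275, 17.2652,
    13.1139, 24.3539, 17.243, 21.9764,
    15.9879, 16.4685, 12.8461, 13.9045,
]

# Step 1: order the observations from smallest to largest
ordered = sorted(demand)
n = len(ordered)

# Step 2: cumulative frequency P_i = i / n for the i-th ordered value
positions = [(y, (i + 1) / n) for i, y in enumerate(ordered)]

for y, p in positions:
    print(f"{y:6.2f}  {p:.2f}")
```

Plotting these pairs on normal probability paper (step 3) is what the graphical check relies on.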

79

[Figure: Lognormal base e Probability Plot for C1, ML Estimates - 95% CI; axes: Data (log scale, 10 to 100) and Percent (1 to 99); ML Estimates: Location 2.85735, Scale 0.334029; Goodness of Fit: AD* = 0.695]

Or use the Graph/Probability Plot … Option in Minitab

80

Statistical test of distribution

Ho: Xe ~ N(µ, σ²)

H1: Xe does not follow N(µ, σ²)

Order data

Estimate sample mean & variance

Observed Mean: 18.42

Observed Variance: 43

Observations: 20

χ² statistic: goodness of fit of model

81

Statistical test of distribution

9.26 17.27

9.83 18.18

12.85 20.28

13.11 20.30

13.23 20.88

13.90 21.98

14.18 23.31

15.99 24.35

16.47 30.24

17.24 35.68

Again order sample

Create m = 5 categories

<10

10-15

15-20

20-25

>25

82

Statistical test of distribution

9.26 17.27

9.83 18.18

12.85 20.28

13.11 20.30

13.23 20.88

13.90 21.98

14.18 23.31

15.99 24.35

16.47 30.24

17.24 35.68

Actual frequencies

<10 2

10-15 5

15-20 5

20-25 6

>25 2

83

Statistical test of distribution

Frequencies

Category   Actual   Expected
<10        2        Normdist(10,18.42,6.56,1)*20
10-15      5        (Normdist(15,18.42,6.56,1) - Normdist(10,18.42,6.56,1))*20
15-20      5        (Normdist(20,18.42,6.56,1) - Normdist(15,18.42,6.56,1))*20
20-25      6
>25        2

84

Statistical test of distribution

Frequencies

  Observed Expected

<10 2 1.99

10-15 5 4.03

15-20 5 5.88

20-25 6 4.94

>25 2 3.16

85

χ² Goodness of Fit Test

Is based on:

$$\chi^2 = \sum_{i=1}^{m} \frac{(o_i - e_i)^2}{e_i}$$

df = m – k – 1

m = number of categories

k = number of parameters replaced by estimates

oi: observed frequency, ei: expected frequency

86

Statistical test of distribution

Frequencies

Category   oi   ei
<10        2    1.99
10-15      5    4.03
15-20      5    5.88
20-25      6    4.94
>25        2    3.16

$$\chi^2 = \sum_{i=1}^{m} \frac{(o_i - e_i)^2}{e_i} = \frac{(2-1.99)^2}{1.99} + \frac{(5-4.03)^2}{4.03} + \frac{(5-5.88)^2}{5.88} + \frac{(6-4.94)^2}{4.94} + \frac{(2-3.16)^2}{3.16} \approx 1.02$$

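87

Statistical test of distribution

Ho: X ~ N(µ, σ²)

H1: X does not follow N(µ, σ²)

df = m – k – 1 = 5 – 2 – 1 = 2

$$\chi^2 = \sum_{i=1}^{m} \frac{(o_i - e_i)^2}{e_i} \approx 1.02$$

CHIINV(0.05,2) = 5.99

χ² is well below 5.99: fail to reject that demand is normally distributed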
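The goodness-of-fit calculation can be reproduced end to end. The sketch below assumes the fitted normal uses the quoted mean 18.42 and standard deviation 6.56, with the five categories split at 10, 15, 20 and 25. The closed-form critical value works only because df = 2, where the χ² upper-tail probability is exp(-x/2); for other degrees of freedom a table or statistics library would be needed.

```python
from math import log
from statistics import NormalDist

observed = [2, 5, 5, 6, 2]          # counts in <10, 10-15, 15-20, 20-25, >25
edges = [10, 15, 20, 25]            # category boundaries
fitted = NormalDist(mu=18.42, sigma=6.56)
n = sum(observed)

# Expected count per category = n * (normal probability of that interval)
cdf = [0.0] + [fitted.cdf(e) for e in edges] + [1.0]
expected = [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 2 - 1          # m - k - 1 with k = 2 estimated parameters

# For df = 2 only: P(chi2 > x) = exp(-x / 2), so the 5% critical value is -2 ln(0.05)
critical = -2 * log(0.05)
reject = chi2 > critical

print("expected:", [round(e, 2) for e in expected])
print(f"chi2 = {chi2:.2f}, df = {df}, critical = {critical:.2f}, reject Ho: {reject}")
```

The statistic comes out at roughly 1.0, far below the critical value, so normality is not rejected.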

89

Sum Up Chapter 7

Hypothesis testing

null vs alternative

null with equal sign

null often status quo

alternative often what want to prove

type I error vs type II error

α = P(type I error) called level of significance

P-values

1-ß = power of test

= probability of rejecting a false null

one tailed vs two tailed

90

Sum Up Chapter 7

Hypothesis tests

mean – Normal test

population normal, known variance

large sample

mean – t test

population normal, unknown variance,

small sample

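Normal test: $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

t test: $t_{n-1} = \dfrac{\bar{X} - \mu}{\hat{s}/\sqrt{n}}$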

91

Sum Up Chapter 7

Normal and t

92

Sum Up Chapter 7

Hypothesis tests

difference of means – Normal test

population normal, known variance

$$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$

93

Sum Up Chapter 7

Hypothesis tests

variance

$$\chi^2_{n-1} = \frac{(n-1)S^2}{\sigma^2}$$

Are variances equal

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \text{ is } F_{n_1-1,\,n_2-1}$$

94

Sum Up Chapter 7

χ² and F

95

Sum Up Chapter 7

How is random variable distributed

normal – graph cumulative frequency distribution

special paper

straight line

Statistical

$$\chi^2_{m-k-1} = \sum_{i=1}^{m} \frac{(o_i - e_i)^2}{e_i}$$

m = categories

k = estimated parameters

always 1 tailed

96

End of Chapter 7!
