Chap 9-1 Basic Statistics Fundamentals of Hypothesis Testing: One-Sample, Two-Sample Tests

Basics of statistics

Page 1: Basics of statistics

Chap 9-1

Basic Statistics

Fundamentals of Hypothesis Testing:

One-Sample, Two-Sample Tests

Page 2: Basics of statistics

What is biostatistics

Statistics is the science and art of collecting, summarizing, and analyzing data that are subject to random variation.

Biostatistics is the application of statistics and mathematical methods to the design and analysis of health, biomedical, and biological studies.

Page 3: Basics of statistics

Different Tests of Significance

1. One-sample z-test or t-test
   a. Compares one sample mean with a population mean

2. Two-sample t-test
   a. Compares one sample mean with another sample mean
      a. Independent t-test (independent samples)
      b. Dependent t-test (dependent/paired samples)

3. One-way analysis of variance (ANOVA)
   a. Compares several sample means

Page 4: Basics of statistics

How to properly use Biostatistics

Develop an underlying question of interest
Generate a hypothesis
Design a study (Protocol)
Collect Data
Analyze Data
   Descriptive statistics
   Statistical Inference

Page 5: Basics of statistics

Relationship between population and sample

(Simple random sampling)

Page 6: Basics of statistics

Sampling Techniques

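[Diagram: a population sampled by five techniques, each labeled as giving a bias-free or a biased sample]

Simple Random Sample
Systematic Sampling
Stratified Random Sample
Convenience Sampling
Cluster Sampling

Labels in the diagram: bias-free sample (three techniques), biased sample (two techniques)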

Page 7: Basics of statistics

Example

How are my 10 patients doing after I put them on an anti-hypertensive medication?

Describe the results of your 10 patients.

Page 8: Basics of statistics

Example

What is the in-hospital mortality rate after open heart surgery at SAL hospital so far this year?
   Describe the mortality.

What is the in-hospital mortality after open heart surgery likely to be this year, given results from last year?
   Estimate the probability of death for patients like those seen in the previous year.

Page 9: Basics of statistics

Misuse of statistics

About 25% of biological research is flawed because of incorrect conclusions drawn from confounded experimental designs and misuse of statistical methods

Page 10: Basics of statistics

What is a Hypothesis?

A hypothesis is a claim (assumption) about the population parameter.

Testing the difference between the value of a sample statistic and the corresponding hypothesized parameter value is called hypothesis testing.

I claim that the mean CVD in India is at least 3!

Page 11: Basics of statistics

Hypothesis Testing Process

Identify the Population

Null Hypothesis: assume the population mean age is 50 ($H_0: \mu = 50$)

Take a Sample ($\bar{X} = 20$)

Is $\bar{X} = 20$ likely if $\mu = 50$? No, not likely!

REJECT the Null Hypothesis

Page 12: Basics of statistics

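Reason for Rejecting H0

Sampling Distribution of $\bar{X}$

[Figure: sampling distribution of $\bar{X}$ centered at $\mu = 50$ if H0 is true, with the observed sample mean $\bar{X} = 20$ marked far out in the tail]

It is unlikely that we would get a sample mean of this value ...

... if in fact this were the population mean.

... Therefore, we reject the null hypothesis that $\mu = 50$.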

Page 13: Basics of statistics

Components of Biostatistics

Biostatistics
   Descriptive
   Statistical Inference
      Estimation: Confidence Intervals
      Hypothesis Testing: P-values

Page 14: Basics of statistics

Normal Distribution

A variable is said to be normally distributed or to have a normal distribution if its distribution has the shape of a normal curve.

Page 15: Basics of statistics

Normal distribution

bell-shaped

symmetrical about the mean (No skewness)

total area under curve = 1

approximately 68% of distribution is within one standard deviation of the mean

approximately 95% of distribution is within two standard deviations of the mean

approximately 99.7% of distribution is within 3 standard deviations of the mean

Mean = Median = Mode

Page 16: Basics of statistics

Empirical Rule

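About 68% of the area lies within 1 standard deviation of the mean

About 95% of the area lies within 2 standard deviations of the mean

About 99.7% of the area lies within 3 standard deviations of the mean

[Figure: normal curve with the central 68% area marked and the axis labeled at 2 and 3 standard deviations on either side of the mean]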
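The empirical rule can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `area_within` is an illustrative assumption, not part of the slides.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def area_within(k: float) -> float:
    """Area under the standard normal curve within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
```

The printed areas round to the 68%, 95% and 99.7% figures quoted in the rule.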

Page 17: Basics of statistics

Page 18: Basics of statistics

Level of Significance, $\alpha$

Is designated by $\alpha$ (level of significance)

Typical values are .01, .05, .10

Is selected by the researcher at the beginning

Provides the critical value(s) of the test

Page 19: Basics of statistics

The z-Test for Comparing Population Means

Critical values for standard normal distribution
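The table of critical values on this slide did not survive extraction. As a hedged substitute, the sketch below computes standard normal critical values for the typical significance levels with the standard-library `statistics.NormalDist`; the function name `z_critical` is an assumption made for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def z_critical(alpha: float, two_tailed: bool = False) -> float:
    """Upper critical value of the standard normal for significance level alpha."""
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return std_normal.inv_cdf(1 - tail_area)

for alpha in (0.10, 0.05, 0.01):
    one = z_critical(alpha)
    two = z_critical(alpha, two_tailed=True)
    print(f"alpha={alpha:.2f}  one-tailed: {one:.3f}  two-tailed: +/-{two:.3f}")
```

For a left-tailed test the critical value is the negative of the one-tailed value printed here.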

Page 20: Basics of statistics

Level of Significance and the Rejection Region

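$H_0: \mu \ge 3$, $H_1: \mu < 3$: rejection region of area $\alpha$ in the left tail

$H_0: \mu \le 3$, $H_1: \mu > 3$: rejection region of area $\alpha$ in the right tail

$H_0: \mu = 3$, $H_1: \mu \ne 3$: rejection regions of area $\alpha/2$ in each tail

[Figure: three normal curves centered at 0 with the critical value(s) and rejection regions marked]

I claim that the mean CVD in India is at least 3!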

Page 21: Basics of statistics

Hypothesis Testing

1. State the research question.
2. State the statistical hypothesis.
3. Set decision rule.
4. Calculate the test statistic.
5. Decide if result is significant.
6. Interpret result as it relates to your research question.

Page 22: Basics of statistics

Rejection & Nonrejection Regions

                   Two-tailed test   Left-tailed test   Right-tailed test
Sign in Ha         $\ne$             <                  >
Rejection region   Both sides        Left side          Right side

I claim that the mean CVD in India is at least 3!

Page 23: Basics of statistics

The Null Hypothesis, H0

States the assumption (numerical) to be tested

e.g.: The average number of CVD in India is at least three ($H_0: \mu \ge 3$)

Is always about a population parameter ($H_0: \mu \ge 3$), not about a sample statistic ($H_0: \bar{X} \ge 3$)

Page 24: Basics of statistics

The Null Hypothesis, H0

(continued)

Begins with the assumption that the null hypothesis is true

Similar to the notion of innocent until proven guilty

Page 25: Basics of statistics

The Alternative Hypothesis, H1

Is the opposite of the null hypothesis

e.g.: The average number of CVD in India is less than 3 ($H_1: \mu < 3$)

Never contains the “=” sign

May or may not be accepted

Page 26: Basics of statistics

General Steps in Hypothesis Testing

e.g.: Test the assumption that the true mean number of CVD in India is at least three ($\sigma$ known)

1. State the H0: $H_0: \mu \ge 3$

2. State the H1: $H_1: \mu < 3$

3. Choose $\alpha$: $\alpha = .05$

4. Choose n: $n = 100$

5. Choose test: Z test

Page 27: Basics of statistics

General Steps in Hypothesis Testing (continued)

6. Set up critical value(s): $Z = -1.645$; reject H0 if the test statistic falls below it

7. Collect data: 100 persons surveyed

8. Compute test statistic and p-value: computed test stat = -2, p-value = .0228

9. Make statistical decision: reject null hypothesis

10. Express conclusion: the true mean number of CVD is less than 3 in the human population.
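Steps 6 to 9 of this left-tailed example can be reproduced in a few lines. This is a minimal sketch that takes the test statistic of -2 and $\alpha = .05$ from the slide as given; the variable names are illustrative assumptions.

```python
from statistics import NormalDist

std_normal = NormalDist()

alpha = 0.05
z_stat = -2.0  # test statistic reported on the slide

# Left-tailed test: the rejection region lies below the alpha quantile.
critical_value = std_normal.inv_cdf(alpha)
# p-value is the area to the left of the observed statistic, P(Z <= -2).
p_value = std_normal.cdf(z_stat)
reject_h0 = z_stat < critical_value

print(f"critical value: {critical_value:.3f}")
print(f"p-value: {p_value:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Both routes agree: the statistic lies below the critical value, and the p-value is smaller than $\alpha$.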

Page 28: Basics of statistics

The z-Test for Comparing Population Means

Critical values for standard normal distribution

Page 29: Basics of statistics

p-Value Approach to Testing

Convert sample statistic (e.g. $\bar{X}$) to test statistic (e.g. Z, t or F statistic)

Obtain the p-value from a table or computer

Compare the p-value with $\alpha$
   If p-value $\ge \alpha$, do not reject H0
   If p-value $< \alpha$, reject H0

Page 30: Basics of statistics

Comparison of Critical-Value & P-Value Approaches

Step 1 (both approaches): State the null and alternative hypotheses.

Step 2 (both approaches): Decide on the significance level, $\alpha$.

Step 3 (both approaches): Compute the value of the test statistic.

Step 4
   Critical-Value Approach: Determine the critical value(s).
   P-Value Approach: Determine the P-value.

Step 5
   Critical-Value Approach: If the value of the test statistic falls in the rejection region, reject H0; otherwise, do not reject H0.
   P-Value Approach: If P < $\alpha$, reject H0; otherwise, do not reject H0.

Step 6 (both approaches): Interpret the result of the hypothesis test.

Page 31: Basics of statistics

Result Probabilities

Jury Trial (H0: Innocent)

                The Truth
Verdict         Innocent    Guilty
Innocent        Correct     Error
Guilty          Error       Correct

Hypothesis Test

                     The Truth
Decision             H0 True                    H0 False
Do Not Reject H0     $1 - \alpha$               Type II Error ($\beta$)
Reject H0            Type I Error ($\alpha$)    Power ($1 - \beta$)

Page 32: Basics of statistics

Type I & II Errors Have an Inverse Relationship

If you reduce the probability of one error, the probability of the other increases, provided everything else (such as the sample size) is unchanged.
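The inverse relationship can be illustrated for an upper-tail z test. In the sketch below the true mean is assumed to lie a fixed number of standard errors above the hypothesized mean; that shift of 2.0 and the function name `type_ii_error` are hypothetical choices made only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()

def type_ii_error(alpha: float, shift_in_se: float) -> float:
    """Beta for an upper-tail z test when the true mean lies
    shift_in_se standard errors above the hypothesized mean."""
    z_crit = std_normal.inv_cdf(1 - alpha)
    # H0 is not rejected when Z < z_crit; under the true mean Z is centered at shift_in_se.
    return std_normal.cdf(z_crit - shift_in_se)

shift = 2.0  # hypothetical true effect, in standard errors
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, shift)
    print(f"alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

With the shift and sample size held fixed, each reduction in $\alpha$ raises $\beta$ and lowers the power.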

Page 33: Basics of statistics

Critical Values Approach to Testing

Convert sample statistic (e.g.: $\bar{X}$) to test statistic (e.g.: Z, t or F statistic)

Obtain critical value(s) for a specified $\alpha$ from a table or computer
   If the test statistic falls in the critical region, reject H0
   Otherwise do not reject H0

Page 34: Basics of statistics

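One-tail Z Test for Mean ($\sigma$ Known)

Assumptions
   Population is normally distributed
   If not normal, requires large samples
   Null hypothesis has $\le$ or $\ge$ sign only

Z test statistic

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$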

Page 35: Basics of statistics

Rejection Region

$H_0: \mu \ge \mu_0$, $H_1: \mu < \mu_0$: reject H0 in the left tail. Z must be significantly below 0 to reject H0.

$H_0: \mu \le \mu_0$, $H_1: \mu > \mu_0$: reject H0 in the right tail. Small values of Z don't contradict H0. Don't reject H0!

Page 36: Basics of statistics

Example: One Tail Test

Q. Does an average box of cereal contain more than 368 grams of cereal? A random sample of 25 boxes showed $\bar{X} = 372.5$. The company has specified $\sigma$ to be 15 grams. Test at the $\alpha = 0.05$ level.

$H_0: \mu \le 368$
$H_1: \mu > 368$

Page 37: Basics of statistics

Finding Critical Value: One Tail

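What is Z given $\alpha = 0.05$?

Standardized Cumulative Normal Distribution Table (Portion)

Z      .04     .05     .06
1.6    .9495   .9505   .9515
1.7    .9591   .9599   .9608
1.8    .9671   .9678   .9686
1.9    .9738   .9744   .9750

The cumulative area .95 falls midway between the entries for Z = 1.64 (.9495) and Z = 1.65 (.9505).

[Figure: standard normal curve ($\sigma_Z = 1$) with area .95 below the critical value and $\alpha = .05$ in the upper tail]

Critical Value = 1.645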

Page 38: Basics of statistics

Example Solution: One Tail Test

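$H_0: \mu \le 368$
$H_1: \mu > 368$

$\alpha = 0.05$

n = 25

Critical Value: 1.645

Test Statistic:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{372.5 - 368}{15 / \sqrt{25}} = 1.50$$

Decision: Do Not Reject at $\alpha = .05$

Conclusion: No evidence that true mean is more than 368

[Figure: standard normal curve with the rejection region of area .05 above 1.645; the test statistic 1.50 falls below it]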
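The arithmetic of the cereal example can be checked directly. The sketch below plugs the slide's numbers (sample mean 372.5, hypothesized mean 368, $\sigma = 15$, $n = 25$) into the Z statistic; the function name `one_sample_z` is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """Z statistic for a one-sample test of the mean with known sigma."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

alpha = 0.05
z = one_sample_z(sample_mean=372.5, mu0=368, sigma=15, n=25)

# Upper-tail test (H1: mu > 368): reject only for large positive Z.
critical_value = NormalDist().inv_cdf(1 - alpha)
p_upper = 1 - NormalDist().cdf(z)
reject_h0 = z > critical_value

print(f"Z = {z:.2f}, critical value = {critical_value:.3f}, p-value = {p_upper:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The statistic of 1.50 stays below the critical value, matching the slide's decision not to reject.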

Page 39: Basics of statistics

p-Value Solution

Z value of sample statistic: 1.50

From Z table: look up 1.50 to obtain .9332

Use the alternative hypothesis to find the direction of the rejection region.

p-Value is $P(Z \ge 1.50) = 1.0000 - .9332 = 0.0668$

Page 40: Basics of statistics

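p-Value Solution (continued)

(p-Value = 0.0668) $\ge$ ($\alpha = 0.05$): Do Not Reject.

Test Statistic 1.50 is in the Do Not Reject Region

[Figure: standard normal curve with the rejection region ($\alpha = 0.05$) above 1.645 and the p-value area (0.0668) above the test statistic 1.50]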

Page 41: Basics of statistics

Example: Two-Tail Test

Q. Does an average box of cereal contain 368 grams of cereal? A random sample of 25 boxes showed $\bar{X} = 372.5$. The company has specified $\sigma$ to be 15 grams. Test at the $\alpha = 0.05$ level.

$H_0: \mu = 368$
$H_1: \mu \ne 368$

Page 42: Basics of statistics

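Example Solution: Two-Tail Test

$H_0: \mu = 368$
$H_1: \mu \ne 368$

$\alpha = 0.05$

n = 25

Critical Value: ±1.96

Test Statistic:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{372.5 - 368}{15 / \sqrt{25}} = 1.50$$

Decision: Do Not Reject at $\alpha = .05$

Conclusion: No Evidence that True Mean is Not 368

[Figure: standard normal curve with rejection regions of area .025 below -1.96 and above 1.96; the test statistic 1.50 falls between them]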

Page 43: Basics of statistics

p-Value Solution

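p-Value = 2 x 0.0668 = 0.1336

(p-Value = 0.1336) $\ge$ ($\alpha = 0.05$): Do Not Reject.

Test Statistic 1.50 is in the Do Not Reject Region

[Figure: standard normal curve with rejection regions in both tails (critical value 1.96 marked) and the test statistic 1.50 between them]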
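The doubling used for the two-tailed p-value can be sketched as follows. The Z value of 1.50 and $\alpha = 0.05$ come from the slide; the function name `two_tailed_p` is an illustrative assumption.

```python
from statistics import NormalDist

def two_tailed_p(z: float) -> float:
    """Two-tailed p-value: the area beyond |z| in both tails of the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
p_value = two_tailed_p(1.50)

print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

Because the alternative is two-sided, a deviation of 1.50 in either direction counts as equally extreme, so the one-tailed area is counted twice.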

Page 44: Basics of statistics

Connection to Confidence Intervals

For $\bar{X} = 372.5$, $\sigma = 15$ and $n = 25$, the 95% confidence interval is:

$$372.5 - 1.96 \cdot 15 / \sqrt{25} \le \mu \le 372.5 + 1.96 \cdot 15 / \sqrt{25}$$

or

$$366.62 \le \mu \le 378.38$$

If this interval contains the hypothesized mean (368), we do not reject the null hypothesis.

It does. Do not reject.
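The interval above can be recomputed as a check. This is a minimal sketch with the slide's values; the function name `z_confidence_interval` is an illustrative assumption, and the multiplier comes from `NormalDist.inv_cdf` rather than the rounded table value 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean: float, sigma: float, n: int,
                          confidence: float = 0.95) -> tuple[float, float]:
    """Confidence interval for the mean when the population sigma is known."""
    # Two-sided interval: half of (1 - confidence) goes in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(sample_mean=372.5, sigma=15, n=25)
hypothesized_mean = 368

print(f"95% CI: {low:.2f} to {high:.2f}")
print("do not reject H0" if low <= hypothesized_mean <= high else "reject H0")
```

The interval contains 368, which agrees with the two-tail test at $\alpha = 0.05$.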

Page 45: Basics of statistics

What is a t Test?

Commonly Used Definition: Comparing two means to see if they are significantly different from each other

Technical Definition: Any statistical test that uses the t family of distributions

Page 46: Basics of statistics

Independent Samples t Test

Use this test when you want to compare the means of two independent samples on a given variable

• “Independent” means that the members of one sample do not include, and are not matched with, members of the other sample

Example:

• Compare the average height of 50 randomly selected men to that of 50 randomly selected women

[Diagram: Independent Mean #1 and Independent Mean #2, compared using t test]

Page 47: Basics of statistics

Dependent Samples t Test

Used to compare the means of a single sample or of two matched or paired samples

Example:

• If a group of students took a math test in March and that same group of students took the same math test two months later in May, we could compare their average scores on the two test dates using a dependent samples t test

Page 48: Basics of statistics

Comparing the Two t Tests

Independent Samples

Tests the equality of the means from two independent groups (diagram below)

Relies on the t distribution to produce the probabilities used to test statistical significance

[Diagram: Treatment group (Person #1) compared with Control group (Person #2)]

Dependent Samples

Tests the equality of the means between related groups or of two variables within the same group (diagram below)

Relies on the t distribution to produce the probabilities used to test statistical significance

[Diagram: Before treatment (Person #1) compared with After treatment (Person #1)]
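The difference between the two tests shows up in how the t statistic is built. The sketch below uses only the Python standard library; the function names and the blood-pressure readings are hypothetical, invented for illustration, and the independent version assumes the pooled (equal-variance) form of the test.

```python
from math import sqrt
from statistics import mean, stdev, variance

def independent_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def paired_t(before, after):
    """t statistic and degrees of freedom for paired samples, from per-subject differences."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical readings for five subjects, invented for illustration only.
before = [120, 130, 125, 140, 135]
after = [115, 128, 120, 133, 131]

t_ind, df_ind = independent_t(after, before)
t_pair, df_pair = paired_t(before, after)

print(f"independent: t = {t_ind:.3f}, df = {df_ind}")
print(f"paired:      t = {t_pair:.3f}, df = {df_pair}")
```

For this made-up data the paired statistic is much larger in magnitude, because differencing within each subject removes the between-subject variability. Treating paired measurements as independent would discard that information.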

Page 49: Basics of statistics

Types

One sample: compare with population

Unpaired: compare with control

Paired: same subjects, pre-post

Z-test: large samples (> 30)

Page 50: Basics of statistics

Compare Means (or medians)

Example: Compare blood pressures of two or more groups, or compare BP of one group with a theoretical value.

1 Group:
   1. One-sample t test
   2. Wilcoxon signed-rank test

2 Groups:
   1. Unpaired t test
   2. Paired t test
   3. Mann-Whitney test
   4. Welch’s corrected t test
   5. Wilcoxon matched pairs test

Page 51: Basics of statistics

3-26 Groups:
   1. One-way ANOVA
   2. Repeated measures ANOVA
   3. Kruskal-Wallis test
   4. Friedman test
(All with post tests)

Raw data
Average data: Mean, SD, & N
Average data: Mean, SEM, & N

Page 52: Basics of statistics

Is there a difference?

between you…means,

who is meaner?

Page 53: Basics of statistics

Statistical Analysis

control group mean

treatment group mean

Is there a difference?

Slide downloaded from the Internet

Page 54: Basics of statistics

What does difference mean?

medium variability

high variability

low variability

The mean difference is the same for all three cases

Page 55: Basics of statistics

What does difference mean?

medium variability

high variability

low variability

Which one shows the greatest difference?

Page 56: Basics of statistics

t Test: $\sigma$ Unknown

Assumption
   Population is normally distributed
   If not normal, requires a large sample

T test statistic with n-1 degrees of freedom

$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$

Page 57: Basics of statistics

Example: One-Tail t Test

Does an average box of cereal contain more than 368 grams of cereal? A random sample of 36 boxes showed $\bar{X} = 372.5$ and $S = 15$. Test at the $\alpha = 0.01$ level.

$H_0: \mu \le 368$
$H_1: \mu > 368$

$\sigma$ is not given

Page 58: Basics of statistics

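Example Solution: One-Tail

$H_0: \mu \le 368$
$H_1: \mu > 368$

$\alpha = 0.01$

n = 36, df = 35

Critical Value: 2.4377

Test Statistic:

$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}} = \frac{372.5 - 368}{15 / \sqrt{36}} = 1.80$$

Decision: Do Not Reject at $\alpha = .01$

Conclusion: No evidence that true mean is more than 368

[Figure: t distribution with 35 degrees of freedom; rejection region of area .01 above 2.4377; the test statistic 1.80 falls below it]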
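The t statistic in this example can be verified with a short sketch. The Python standard library has no t-distribution quantile function, so the critical value 2.4377 is taken from the slide rather than computed; the function name `one_sample_t` is an illustrative assumption.

```python
from math import sqrt

def one_sample_t(sample_mean: float, mu0: float, s: float, n: int) -> tuple[float, int]:
    """t statistic and degrees of freedom for a one-sample test with unknown sigma."""
    t = (sample_mean - mu0) / (s / sqrt(n))
    return t, n - 1

t_stat, df = one_sample_t(sample_mean=372.5, mu0=368, s=15, n=36)

# Critical value for alpha = .01 with 35 degrees of freedom, as given on the slide.
critical_value = 2.4377
reject_h0 = t_stat > critical_value

print(f"t = {t_stat:.2f} with df = {df}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The statistic of 1.80 is below the critical value, so the null hypothesis is not rejected at the .01 level.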

Page 59: Basics of statistics

The t Table

Since it takes into account the changing shape of the distribution as n increases, there is a separate curve for each sample size (or degrees of freedom).

However, there is not enough space in the table to put all of the different probabilities corresponding to each possible t score.

The t table lists commonly used critical regions (at popular alpha levels).

Page 60: Basics of statistics

Z-distribution versus t-distribution

Page 61: Basics of statistics

The z-Test for Comparing Population Means

Critical values for standard normal distribution

Page 62: Basics of statistics

Summary

We can use the z distribution for testing hypotheses involving one or two independent samples

To use z:
   The samples are independent and normally distributed
   The sample size must be greater than 30
   Population parameters must be known

Page 63: Basics of statistics
