HS 167 9: Inference About a Proportion (Unit 9)


Page 1: Inference about a proportion (Unit 9)

Page 2: Our data analysis journey continues …

Source: Greenhalgh, T. BMJ 1997;315:364-366. Copyright © 1997 BMJ Publishing Group Ltd.

Page 3: Types of response variables

Response type and its summaries:

Quantitative: sums, averages
Categorical: counts, proportions

Prior chapters have focused on quantitative response variables. We now focus on categorical response variables.

Page 4: Binary variables

We focus on the most popular type of categorical response, the binary response (a categorical variable with two categories; also called a dichotomous variable).

Examples of binary responses:

CURRENT_SMOKER: yes/no
SEX: male/female
SURVIVED: yes/no
DISEASE_STATUS: case/non-case

One category is arbitrarily labeled a “success”. Count the number of successes in the sample, then turn the count into a proportion.

Page 5: Proportions

$$\hat{p} = \frac{\text{number of ``successes''}}{n}$$

The proportion in the sample is denoted "p-hat" ($\hat{p}$).

The proportion in the population (parameter) is denoted p.

$\hat{p}$ is to $p$ as $\bar{x}$ is to $\mu$.

Page 6: A proportion is an average of 0s and 1s

Example: n = 10; X is a binary attribute coded 1 = YES and 0 = NO; $\sum x = 2$.

Observation   X
1             0
2             1
3             0
4             0
5             0
6             0
7             0
8             0
9             0
10            1
Sum           2

$$\hat{p} = \bar{x} = \frac{\sum x}{n} = \frac{2}{10} = 0.2 = \text{proportion in sample}$$
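The identity between a proportion and a mean of 0/1 codes can be checked directly. The sketch below is only an illustration using the ten observations from this slide; the variable names are chosen here and are not part of the original material.

```python
# Binary attribute from the slide: 1 = YES, 0 = NO for ten observations.
x = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]

n = len(x)              # sample size
successes = sum(x)      # summing the 0/1 codes counts the successes
p_hat = successes / n   # identical to the arithmetic mean of x

print(f"successes = {successes}, n = {n}, p-hat = {p_hat}")
```

Because every observation is coded 0 or 1, the sum of the codes is the count of successes, so the sample mean and the sample proportion are the same number.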

Page 7: Incidence proportion and prevalence proportion

Incidence proportion (risk): the proportion that develops the condition over a specified time.

Prevalence: the proportion with the condition at a point in time.

Source www.bioteach.ubc.ca/Biomedicine/Smallpox/

Page 8: Example: “Smoking prevalence”

SRS, n = 57; determine the number of current smokers (“successes”): X = 17.

$$\hat{p} = \frac{17}{57} = .2982$$

The prevalence of smoking in the sample is 29.8%.

Calculations: carry at least 4 significant digits.
Reporting (APA 2001): convert to percent and report as xx.x%.
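The calculation and reporting conventions for the smoking example could be sketched as follows. The variable names are illustrative, and rounding to four decimal places is used here only because it yields four significant digits for this particular value.

```python
# Smoking prevalence example: 17 current smokers in a simple random sample of 57.
successes, n = 17, 57

p_hat = successes / n            # full-precision sample proportion
p_hat_calc = round(p_hat, 4)     # working value carried through calculations
report = f"{p_hat:.1%}"          # reporting format: percent with one decimal

print(p_hat_calc, report)
```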

Page 9: Inference about the proportion

How good is the sample proportion (p-hat) as an estimate of the population proportion (p)? To answer this question, consider what would happen if we took many samples of size n from the same population. This creates the sampling distribution of p-hat.

$\hat{p}$ is to $p$ as $\bar{x}$ is to $\mu$.

Page 10: Binomial Sampling Distribution

The sampling distribution of the number of successes is binomial. Binomial probabilities are difficult to calculate. However, the binomial becomes approximately Normal when n is large (central limit theorem).

The figure shows the number of smokers expected in a sample of n = 57 from a population in which p = 0.25. This distribution is binomial and also approximately Normal. We can use a Normal approximation to the binomial when n is large.

[Figure: binomial distribution for n = 57, p = 0.25. Horizontal axis: Number of Successes (0 to 55). Vertical axis: probability (0 to .14).]
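To see how close the Normal approximation is for n = 57 and p = 0.25, the exact binomial probabilities can be compared with Normal probabilities. This is a sketch rather than part of the original slides: the helper name `binomial_pmf` is made up for illustration, and the half-unit continuity correction used to turn the Normal curve into a probability for each count is an assumption added here.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """Exact binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 57, 0.25
mu = n * p                      # mean number of successes
sigma = sqrt(n * p * (1 - p))   # standard deviation of the count
normal = NormalDist(mu, sigma)

rows = []
for k in range(n + 1):
    exact = binomial_pmf(k, n, p)
    # Assumption: continuity correction, area under the Normal curve from k-0.5 to k+0.5.
    approx = normal.cdf(k + 0.5) - normal.cdf(k - 0.5)
    rows.append((k, exact, approx))

# Print a few counts around the centre of the distribution.
for k, exact, approx in rows[8:23:2]:
    print(f"k = {k:2d}  binomial = {exact:.4f}  Normal approx = {approx:.4f}")
```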

Page 11: Sampling Distribution of p-hat when n large

When n is large,

$$\hat{p} \sim N(p, SE_{\hat{p}}) \quad \text{where} \quad SE_{\hat{p}} = \sqrt{\frac{pq}{n}} \quad \text{and} \quad q = 1 - p$$
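The idea of taking many samples of size n from the same population can be imitated by simulation. The sketch below assumes a population with p = 0.25 and n = 57 (the values used elsewhere in this unit); the function name `simulate_p_hats`, the number of repetitions, and the seed are arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import mean, pstdev


def simulate_p_hats(p, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n and return each sample's p-hat."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    p_hats = []
    for _ in range(repetitions):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats


p, n = 0.25, 57
p_hats = simulate_p_hats(p, n, repetitions=5000)
theoretical_se = sqrt(p * (1 - p) / n)  # SE of p-hat = sqrt(pq/n)

print(f"mean of simulated p-hats: {mean(p_hats):.4f} (p = {p})")
print(f"SD of simulated p-hats:   {pstdev(p_hats):.4f} (sqrt(pq/n) = {theoretical_se:.4f})")
```

The simulated p-hats should centre near p with a spread close to the standard error, which is what the Normal model above describes.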

Page 12: Confidence interval for p (plus 4 method)

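Take a SRS, count the successes and failures, add two imaginary successes and two imaginary failures to the statistics, and put a tilde over these revised statistics:

$$\tilde{X} = X + 2, \quad \tilde{n} = n + 4, \quad \text{and} \quad \tilde{p} = \frac{\tilde{X}}{\tilde{n}}$$

Then calculate the CI according to this formula:

$$\tilde{p} \pm z_{1-\alpha/2} \cdot SE_{\tilde{p}} \quad \text{where} \quad SE_{\tilde{p}} = \sqrt{\frac{\tilde{p}\tilde{q}}{\tilde{n}}}$$

Example: 17 smokers in n = 57

$$\tilde{X} = X + 2 = 17 + 2 = 19 \qquad \tilde{n} = n + 4 = 57 + 4 = 61$$

$$\tilde{p} = \frac{19}{61} = .3115 \qquad \tilde{q} = 1 - .3115 = .6885$$

$$SE_{\tilde{p}} = \sqrt{\frac{\tilde{p}\tilde{q}}{\tilde{n}}} = \sqrt{\frac{(.3115)(.6885)}{61}} = .0593$$

$$z = 1.96 \text{ for 95\% confidence}$$

$$\text{95\% CI for } p = \tilde{p} \pm z \cdot SE_{\tilde{p}} = .3115 \pm (1.96)(.0593) = .3115 \pm .1162 = (.1953, .4277)$$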
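The plus-four steps above can be sketched as a small function. The name `plus_four_ci` and its `confidence` parameter are illustrative choices, not part of the original slides; the critical value comes from the standard library's `NormalDist` rather than a z table, so results can differ from the hand calculation in the last digit.

```python
from math import sqrt
from statistics import NormalDist


def plus_four_ci(successes, n, confidence=0.95):
    """Plus-four confidence interval for a proportion; returns (lower, upper)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    x_tilde = successes + 2      # add two imaginary successes
    n_tilde = n + 4              # two imaginary successes plus two imaginary failures
    p_tilde = x_tilde / n_tilde
    se = sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    margin = z * se
    return p_tilde - margin, p_tilde + margin


lower, upper = plus_four_ci(17, 57)
print(f"95% CI for p: ({lower:.4f}, {upper:.4f})")
```

For the smoking example (17 smokers in n = 57) this should agree with the interval above to about three decimal places.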

Page 13: Sample size requirements

To estimate p with margin of error d use:

$$n = \frac{z_{1-\alpha/2}^{2} \, p^{*} q^{*}}{d^{2}}$$

where z is the Z value for the given level of confidence and p* is an educated guess for the proportion you want to estimate (use p* = 0.5 to get the “safest” estimate).

Example: Redo smoking survey; now want 95% CI with margin of error ±.05; assume p* = 0.30 (best available estimate)

$$n = \frac{(1.96)^{2}(.30)(.70)}{.05^{2}} = 322.7 \rightarrow 323$$

Redo study but now want margin of error of ±.03

$$n = \frac{(1.96)^{2}(.30)(.70)}{.03^{2}} = 896.4 \rightarrow 897$$

Sample size calculations are always rounded up.
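The sample size formula lends itself to a short helper. This is a sketch only: the function name and parameter names are invented for illustration, and the critical value is taken from `NormalDist` (about 1.95996) instead of the rounded 1.96 used in the hand calculations.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(margin, p_guess=0.5, confidence=0.95):
    """Smallest n that estimates p within +/- margin at the given confidence."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_exact = z**2 * p_guess * (1 - p_guess) / margin**2
    return ceil(n_exact)  # sample sizes are always rounded up


n_05 = sample_size_for_proportion(0.05, p_guess=0.30)
n_03 = sample_size_for_proportion(0.03, p_guess=0.30)
n_safe = sample_size_for_proportion(0.05)  # p* = 0.5, the "safest" guess

print(n_05, n_03, n_safe)
```

Using p* = 0.5 maximizes the product p*q*, which is why it gives the largest (most conservative) sample size for a given margin of error.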

Page 14: Hypothesis Test

A. $H_0: p = p_0$ vs. $H_1: p \ne p_0$, where $p_0$ represents the proportion specified by the null hypothesis

B. Test statistic

$$z_{\text{stat}} = \frac{\hat{p} - p_0}{SE_{\hat{p}}} \quad \text{where} \quad SE_{\hat{p}} = \sqrt{\frac{p_0 q_0}{n}}$$

C. P-value (from z table)

D. Significance level

Illustration: Prevalence of smoking in the U.S. (p0) is 0.25. Take a SRS of n = 57 from a community and find 17 smokers. Therefore, p-hat = 17 / 57 = 0.2982. Is this significantly different from 0.25?

A. H0: p = 0.25 vs. H1: p ≠ 0.25

B. Test statistic

$$SE_{\hat{p}} = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{(.25)(.75)}{57}} = .0574$$

$$z_{\text{stat}} = \frac{\hat{p} - p_0}{SE_{\hat{p}}} = \frac{.2982 - .25}{.0574} = 0.84$$

C. P = 0.4010

D. Evidence against H0 is not significant (retain H0)
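Steps B and C of the test can be sketched in a few lines. The function name `one_sample_proportion_z_test` is hypothetical, and the two-sided P-value is computed from the standard Normal distribution in the standard library instead of a z table. Because no intermediate values are rounded, the P-value comes out near 0.400 rather than exactly the 0.4010 obtained from z = 0.84.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z_test(successes, n, p0):
    """Two-sided z test of H0: p = p0; returns (z_stat, p_value)."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)   # standard error computed under H0
    z_stat = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_value


z_stat, p_value = one_sample_proportion_z_test(17, 57, 0.25)
print(f"z = {z_stat:.2f}, two-sided P = {p_value:.4f}")
```

Note that the standard error uses p0 and q0 (the null values), not p-hat, which matches the formula in step B.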

Page 15: Conditions for Inference

Valid information

SRS

To use Normal-based methods:
For the plus-four confidence interval, n must be 10 or greater.
For the z test, $np_0q_0 \ge 5$.

Illustration: n = 57, p0 = 0.25, q0 = 0.75. Therefore, np0q0 = 57 ∙ 0.25 ∙ 0.75 = 10.7 → “OK”
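The two numeric conditions can be checked mechanically before running either procedure. The sketch below covers only the sample-size rules stated on this slide (it cannot verify valid information or simple random sampling), and the function names are made up for illustration.

```python
def plus_four_ci_ok(n):
    """Plus-four confidence interval rule from the slide: n must be 10 or greater."""
    return n >= 10


def z_test_ok(n, p0):
    """z test rule from the slide: n * p0 * q0 must be at least 5."""
    return n * p0 * (1 - p0) >= 5


n, p0 = 57, 0.25
npq = n * p0 * (1 - p0)
print(f"n*p0*q0 = {npq:.1f}; z test OK: {z_test_ok(n, p0)}; plus-four OK: {plus_four_ci_ok(n)}")
```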