Statistics for clinicians • Biostatistics course by Kevin E. Kip, Ph.D., FAHA Professor and Executive Director, Research Center University of South Florida, College of Nursing Professor, College of Public Health Department of Epidemiology and Biostatistics Associate Member, Byrd Alzheimer’s Institute Morsani College of Medicine Tampa, FL, USA 1


Statistics for clinicians

Biostatistics course by Kevin E. Kip, Ph.D., FAHA
Professor and Executive Director, Research Center
University of South Florida, College of Nursing
Professor, College of Public Health
Department of Epidemiology and Biostatistics
Associate Member, Byrd Alzheimer’s Institute
Morsani College of Medicine
Tampa, FL, USA

SECTION 2.1

Module Overview and Introduction

Probability theory and discrete and continuous sampling distributions

SECTION 2.4

Bayes Theorem

Bayes Theorem

• Procedure for updating a probability based on new information.
• The rule can be used to compute a conditional probability based on specific, available information (i.e. it links the degree of belief in a proposition before and after accounting for evidence).
• Can represent a subjective degree of belief that changes over time to account for new evidence.
• Often used in meta-analyses and synthesis of evidence.

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Bayesian Methods (example)

Question: What is the probability that Daphne’s next male child will be affected by the X-linked disorder?

[Pedigree figure with family members Aaron; Barbara and Bart; Carlos and Cathy; Desmond and Daphne; Earl and Joseph]

Naïve probabilities shown on the figure: 1/2, 1/4, 1/8, 1/16

Question: What is the probability that Daphne’s next male child will be affected by the X-linked disorder?

[Pedigree figure with family members Aaron; Barbara and Bart; Carlos and Cathy; Desmond and Daphne; Earl and Joseph]

             Cathy                    Daphne
             carrier   non-carrier    carrier   non-carrier
Prior:       1/2       1/2            1/6       5/6
Posterior:   1/3       2/3            1/21      20/21

Thus, the probability that Daphne’s next male child will be affected is: 1/2 x 1/21 = 1/42.
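The prior and posterior values in this example can be reproduced with a short calculation. The Python sketch below is illustrative only. Because the pedigree figure is not recoverable from the text, it assumes that Cathy has one unaffected son and Daphne has two unaffected sons, and that a carrier mother passes the disorder to each son with probability 1/2. The function name `posterior_carrier` is hypothetical.

```python
from fractions import Fraction

def posterior_carrier(prior, n_unaffected_sons):
    """Posterior probability that a woman is a carrier, given unaffected sons."""
    half = Fraction(1, 2)
    # Likelihood of the evidence if she is a carrier:
    # each son escapes the disorder with probability 1/2.
    like_carrier = half ** n_unaffected_sons
    # Likelihood if she is not a carrier: sons are unaffected with certainty.
    like_noncarrier = Fraction(1)
    numerator = like_carrier * prior
    evidence = numerator + like_noncarrier * (1 - prior)
    return numerator / evidence  # Bayes theorem: P(A|B) = P(B|A) P(A) / P(B)

# Assumption: Cathy has one unaffected son.
cathy_posterior = posterior_carrier(Fraction(1, 2), 1)
# Daphne inherits the allele from a carrier mother half the time.
daphne_prior = cathy_posterior * Fraction(1, 2)
# Assumption: Daphne has two unaffected sons.
daphne_posterior = posterior_carrier(daphne_prior, 2)
# A carrier's next son is affected with probability 1/2.
risk_next_son = daphne_posterior * Fraction(1, 2)

print(cathy_posterior, daphne_prior, daphne_posterior, risk_next_son)
# 1/3 1/6 1/21 1/42
```

Under these assumptions the sketch returns the same values as the slide, which suggests the posterior columns come from conditioning on the unaffected sons in each generation.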

SECTION 2.5

Binomial Distribution Model

Probability Model:
Mathematical equation or formula used to generate probabilities based on certain assumptions about the process. Very important for statistical inference.

Binomial Model:
• Two possible outcomes, often labeled as “success” or “failure”, or as “disease” or “no disease”.
• Allows computation of the probability of observing a specified number of responses (e.g. successes) when the process is repeated a specific number of times (e.g. among a set of patients).
• For the binomial model with a set number of trials:
  p = probability of success, and q = 1 – p

Binomial distribution model

$$P(x \text{ outcomes}) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$$

n = number of times process is repeated (e.g. # of patients)
x = number of successes (outcomes)
p = probability of outcome for any individual (i.e. individuals are independent)
! = factorial

Example:
Assume that a medication is effective 80% (0.80) of the time (i.e. p = 0.8).
Assume that the medication will be given to 10 patients (i.e. n = 10).
What is the probability the medication will be effective in exactly 7 patients? (x = 7)

(If we had to guess, we would think it most likely that the medication would be effective in 8 patients.)

$$P(7 \text{ successes}) = \frac{10!}{7!\,(10-7)!}\, 0.80^{7} (1-0.80)^{10-7}$$

Binomial distribution model

$$P(7 \text{ successes}) = \frac{10!}{7!\,(10-7)!}\, 0.80^{7} (1-0.80)^{10-7}$$

$$\frac{10!}{7!\,(10-7)!} = \frac{10(9)(8)(7)(6)(5)(4)(3)(2)(1)}{[7(6)(5)(4)(3)(2)(1)]\,[(3)(2)(1)]} = \frac{10(9)(8)}{3(2)} = 120$$

$$P(7 \text{ successes}) = (120)(0.80^{7})(1-0.80)^{10-7}$$

$$P(7 \text{ successes}) = (120)(0.2097)(0.008) = 0.2013$$
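The hand calculation for exactly 7 successes can be checked numerically. This is a minimal Python sketch using only the standard library; the function name `binomial_pmf` is an illustrative choice, not something taken from the course materials.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    # comb(n, x) is n! / (x! (n - x)!), the number of ways to choose x successes.
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Medication effective 80% of the time, 10 patients, exactly 7 successes.
prob_7_of_10 = binomial_pmf(7, 10, 0.80)
print(round(prob_7_of_10, 4))  # 0.2013
```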

Binomial distribution model

Many times we do not want to know the probability of an outcome in an exact number of persons, but rather in a certain number or more persons.

For example, using our previous scenario, what is the probability that the medication will be effective in at least 7 of the 10 patients (i.e. P(≥ 7 successes))? To do this, we compute the individual probabilities for 7, 8, 9, and 10 successes and add them.

P(7 successes) = 0.2013
P(8 successes) = 0.3020
P(9 successes) = 0.2684
P(10 successes) = 0.1074

P(≥ 7 successes) = 0.2013 + 0.3020 + 0.2684 + 0.1074 = 0.8791
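Summing the individual probabilities for 7 through 10 successes is easy to automate. The sketch below is one possible way to do it in Python; `binomial_at_least` is a hypothetical helper name.

```python
from math import comb

def binomial_at_least(k, n, p):
    """Probability of k or more successes in n independent trials."""
    # Add the exact-count probabilities for x = k, k + 1, ..., n.
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))

# Medication effective 80% of the time, 10 patients, at least 7 successes.
prob_at_least_7 = binomial_at_least(7, 10, 0.80)
print(round(prob_at_least_7, 4))  # 0.8791
```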

Binomial distribution model (practice)

$$P(x \text{ outcomes}) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$$

Assume that a drug is effective 90% of the time.
Assume that the medication will be given to 12 patients.
What is the probability the drug will be effective in exactly 10 patients?

Complete the formula below:

P(10 successes) = ------------------

Binomial distribution model (practice)

$$P(x \text{ outcomes}) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$$

Assume that a drug is effective 90% of the time.
Assume that the medication will be given to 12 patients.
What is the probability the drug will be effective in exactly 10 patients?

Complete the formula below:

$$P(10 \text{ successes}) = \frac{12!}{10!\,(12-10)!}\, 0.90^{10} (1-0.90)^{12-10}$$

Binomial distribution model (practice)

Complete the formula below:

$$P(10 \text{ successes}) = \frac{12!}{10!\,(12-10)!}\, 0.90^{10} (1-0.90)^{12-10}$$

$$\frac{12!}{10!\,(12-10)!} = \frac{12(11)(10)(9)(8)(7)(6)(5)(4)(3)(2)(1)}{[10(9)(8)(7)(6)(5)(4)(3)(2)(1)]\,[(2)(1)]} = \frac{12(11)}{2} = 66$$

$$P(10 \text{ successes}) = (66)(0.90^{10})(1-0.90)^{12-10}$$

$$P(10 \text{ successes}) = (66)(0.3487)(0.01) = 0.2301$$

Binomial distribution model (calculating standard deviation)

Our Example:
Assume a medication is effective 80% (0.80) of the time (i.e. p = 0.8).
Assume the medication will be given to 10 patients (i.e. n = 10).
What is the probability the medication will be effective in exactly 7 patients? (i.e. x = 7)

The expected number of outcomes of a binomial population is: µ = np

So, in our example, µ = (10 x 0.8) = 8

The standard deviation is: σ = sqrt[np(1-p)]

So, in our example:
σ = sqrt[(10 x 0.8) x (1-0.8)]
σ = sqrt[8 x 0.2]
σ = sqrt[1.6]
σ = 1.265
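The expected value and standard deviation follow directly from n and p. A minimal Python sketch, with `binomial_mean_sd` as an illustrative function name:

```python
from math import sqrt

def binomial_mean_sd(n, p):
    """Expected number of successes and its standard deviation."""
    mu = n * p                      # expected number of outcomes
    sigma = sqrt(n * p * (1 - p))   # standard deviation
    return mu, sigma

# Medication effective 80% of the time, given to 10 patients.
mu, sigma = binomial_mean_sd(10, 0.80)
print(mu, round(sigma, 3))  # 8.0 1.265
```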

Binomial distribution model (practice) (calculating standard deviation)

Example:
Assume that a medication is effective 90% of the time.
Assume that the medication will be given to 20 patients.

What is the expected number of outcomes: µ = np

So, in this example, µ = ______________

Calculate the standard deviation: σ = sqrt[np(1-p)]

So, in this example, σ = _______________________

Binomial distribution model (practice) (calculating standard deviation)

Example:
Assume that a medication is effective 90% of the time.
Assume that the medication will be given to 20 patients.

What is the expected number of outcomes: µ = np

So, in this example, µ = (20 x 0.9) = 18

Calculate the standard deviation: σ = sqrt[np(1-p)]

So, in this example:
σ = sqrt[(20 x 0.9) x (1-0.9)]
σ = sqrt[18 x 0.1]
σ = sqrt[1.8]
σ = 1.34

SECTION 2.6

Poisson Distribution Model

Poisson Distribution Model

• A discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time (i.e. X = 0, 1, 2, 3, ...)
• Usually associated with rare events
• Approximates the binomial distribution when N is large (i.e. >100) and p is small (i.e. <0.01)

Requirements for the Poisson Distribution
a) Length of time period is fixed in advance;
b) Events occur at a constant average rate;
c) Events can be counted in whole numbers;
d) Numbers of events occurring in disjoint intervals are statistically independent.

Poisson Distribution Model

Illustration:
Assume deaths from typhoid fever in a given population are Poisson distributed with a mean of 2.3 deaths per year. What is the probability distribution of deaths in this population?

[Figure: Pr(X = k) plotted against k]

Poisson Distribution Model

Poisson Formula

Suppose we conduct a Poisson study in which the average number of health events within a given time period is μ. Then, the Poisson probability is:

$$P(x; \mu) = \frac{e^{-\mu}\,\mu^{x}}{x!}$$

where x is the actual number of events that result from the study, and e is approximately equal to 2.71828.

Poisson Formula

Example:
The average number of colds for toddlers in day care is 2 per year. Knowing this, what is the estimated probability that a new toddler to day care will have exactly 3 colds during the following year?

$$P(x; \mu) = \frac{e^{-\mu}\,\mu^{x}}{x!}$$

μ = 2; since 2 colds per year, on average.
x = 3; since we want to find the likelihood that 3 colds will occur in the next year.
e = 2.71828; since e is a constant equal to approximately 2.71828.

Plug these values into the Poisson formula:

P(3; 2) = (2.71828^-2)(2^3) / 3!
P(3; 2) = (0.13534)(8) / 6
P(3; 2) = 0.180

Thus, the probability of a toddler having 3 colds in the next year is 0.180.
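The same substitution can be written as a small function. This Python sketch is only an illustration; `poisson_pmf` is a hypothetical name, and the mean must be expressed for the same time interval as the count of interest.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """Probability of exactly x events when the mean number of events is mu."""
    return exp(-mu) * mu**x / factorial(x)

# Average of 2 colds per year; probability of exactly 3 colds next year.
prob_3_colds = poisson_pmf(3, 2)
print(round(prob_3_colds, 3))  # 0.18
```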

Poisson Formula (Practice)

Assume that the average number of cases of tuberculosis within a nursing home is 4 per year. Knowing this, what is the estimated probability that exactly 4 new cases of TB will occur during the following 6 months?

$$P(x; \mu) = \frac{e^{-\mu}\,\mu^{x}}{x!}$$

μ = ________
x = ________
e = ________

Plug these values into the Poisson formula:

Poisson Formula (Practice)

Assume that the average number of cases of tuberculosis within a nursing home is 4 per year. Knowing this, what is the estimated probability that exactly 4 new cases of TB will occur during the following 6 months?

$$P(x; \mu) = \frac{e^{-\mu}\,\mu^{x}}{x!}$$

μ = 2; since 4 cases of TB per year = 2 cases of TB per 6 months, on average.
x = 4; since we want to find the likelihood that 4 cases will occur in the next 6 months.
e = 2.71828; since e is a constant equal to approximately 2.71828.

Plug these values into the Poisson formula:

P(4; 2) = (2.71828^-2)(2^4) / 4!
P(4; 2) = (0.13534)(16) / 24
P(4; 2) = 0.0902

Thus, the probability of exactly 4 new cases of TB in the next 6 months is 0.09.


http://statpages.org/ctab2x2.html

SECTION 2.7

Properties of the Normal Distribution

Normal Distribution

Appropriate for a continuous outcome if:

• Mean = median = mode
• Symmetric around the mean
• P(x > µ) = P(x < µ), where x is a continuous variable and µ is the mean
• ~68% of all values fall within 1 SD of the mean (i.e. P(µ - σ < x < µ + σ) = 0.68)
• ~95% of all values fall within 2 SD of the mean (i.e. P(µ - 2σ < x < µ + 2σ) = 0.95)
• ~99.7% of all values fall within 3 SD of the mean (i.e. P(µ - 3σ < x < µ + 3σ) = 0.997)

Normal Distribution

• ~68% of values fall within 1 SD of the mean (~34% on each side)
• (95.4% - 68.3%) / 2 ≈ 13.6% of all values fall between -1 and -2 SD, and likewise between +1 and +2 SD
• (99.7% - 95.4%) / 2 ≈ 2.2% of all values fall between -2 and -3 SD, and likewise between +2 and +3 SD
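The rounded percentages for 1, 2, and 3 SD can be checked against the standard normal distribution. The Python sketch below uses `statistics.NormalDist` from the standard library (Python 3.8 or later); `within_k_sd` is an illustrative helper name.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def within_k_sd(k):
    """Probability that a normal value lies within k SD of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))

# Share of values in the band between +1 SD and +2 SD.
band_1_to_2 = std_normal.cdf(2) - std_normal.cdf(1)
```

The results are roughly 68.3%, 95.4%, and 99.7%, and the band between 1 and 2 SD on one side holds about 13.6% of values.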

Normal Distribution (Practice)

Assume that BMI is normally distributed with µ = 29.4 and σ = 4.6.

1. Put in the appropriate values on the normal distribution curve for BMI values plus or minus 1, 2, and 3 SD from the mean.

2. What is the median BMI value? _______

3. Approximately what percentage of the population has a BMI > 34? _________

4. Approximately what percentage of the population has a BMI between 15.6 and 20.2? __________

5. What is the approximate minimum and maximum BMI in the population?

Min: _________ Max: __________

[Figure: normal curve for Body Mass Index (BMI) centered at µ = 29.4, with six blank tick labels (?) to fill in]

Normal Distribution (Practice)

Assume that BMI is normally distributed with µ = 29.4 and σ = 4.6.

1. Put in the appropriate values on the normal distribution curve for BMI values plus or minus 1, 2, and 3 SD from the mean.

2. What is the median BMI value? 29.4

3. Approximately what percentage of the population has a BMI > 34? 16.0% (i.e. 13.6% + 2.2% + 0.2%)

4. Approximately what percentage of the population has a BMI between 15.6 and 20.2? 2.2%

5. What is the approximate minimum and maximum BMI in the population?

Min: ~15 Max: ~44

[Figure: normal curve for Body Mass Index (BMI) with the following labels]

Position   µ-3σ   µ-2σ   µ-1σ   µ      µ+1σ   µ+2σ   µ+3σ
BMI        15.6   20.2   24.8   29.4   34.0   38.6   43.2

Area under the curve, left to right: 0.2% (below µ-3σ), 2.2%, 13.6%, 34%, 34%, 13.6%, 2.2%, 0.2% (above µ+3σ)

Normal Distribution

Question: What do we do if we want to calculate a probability when the value of interest is not the mean or a multiple of the standard deviation?

Answer: Compute a z-score and use a table of probabilities for a “standard” normal distribution with mean of 0 and standard deviation of 1.

[Figure: Standard Normal Distribution; µ = 0, σ = 1; horizontal axis from -3 to 3]

Normal Distribution

Standardized Score (z-score)

$$Z = \frac{x - \mu}{\sigma}$$

Where:
x is the value of interest
µ is the mean
σ is the standard deviation

Example: Body Mass Index
x is 35
µ is 29.4
σ is 4.6

$$Z = \frac{35 - 29.4}{4.6} = 1.22$$

[Figure: normal curve for Body Mass Index (BMI), µ = 29.4, σ = 4.6, with tick marks at 15.6, 20.2, 24.8, 29.4, 34.0, 38.6, 43.2 and band areas 0.2%, 2.2%, 13.6%, 34%, 34%, 13.6%, 2.2%, 0.2%; the area under the curve above BMI = 35.0 is marked P]

If x = 34, then z = 1, i.e. (34 – 29.4) / 4.6 = 1
Thus, P(z > 1) = 0.136 + 0.022 + 0.002 = 0.16

If x = 35, then z = 1.22, i.e. (35 – 29.4) / 4.6 = 1.22
Thus, P(z > 1.22) = ? Refer to Appendix Table 1.

P(z > 1.22) = 0.111

In Appendix Table 1:

The probability value of 0.889 (and 1 – 0.889 = 0.111) is determined by first looking at the left column for z to 1 decimal place, and then across the top row for z to the second decimal place.

Can also get the exact probability for z = 1.22 from http://stattrek.com/online-calculator/normal.aspx

Cumulative probability: P(Z < 1.22) = 0.889

Thus, probability: P(Z > 1.22) = (1 - 0.889) = 0.111

Standard Normal Distribution (Practice)

Body Mass Index: µ is 29.4, σ is 4.6

$$Z = \frac{x - \mu}{\sigma}$$

A. What is the probability that a person in the population will have a BMI > 37.0?

z = _______________ P(Z > ???) = __________

B. What is the probability that a person in the population will have a BMI < 27.0?

z = _______________ P(Z < ???) = __________

After calculating z for questions A and B, refer to Appendix Table 1.

[Figures: two normal curves for Body Mass Index (BMI); one marks the area under the curve (P) above 37.0, the other the area below 27.0]

Standard Normal Distribution (Practice)

Body Mass Index: µ is 29.4, σ is 4.6

$$Z = \frac{x - \mu}{\sigma}$$

After calculating z for questions A and B, refer to Appendix Table 1.

[Figures: two normal curves for Body Mass Index (BMI); one marks the area under the curve (P) above 37.0, the other the area below 27.0]

A. What is the probability that a person in the population will have a BMI > 37.0?

z = (37 – 29.4) / 4.6 = 1.65 P(Z > 1.65) = 0.0495

B. What is the probability that a person in the population will have a BMI < 27.0?

z = (27 – 29.4) / 4.6 = -0.52 P(Z < -0.52) = 0.3015
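The table lookups in answers A and B can be reproduced in code. This Python sketch uses `statistics.NormalDist` in place of Appendix Table 1 and rounds z to two decimals first, mimicking the table; the helper name `z_score` is illustrative.

```python
from statistics import NormalDist

MU, SIGMA = 29.4, 4.6          # BMI mean and standard deviation from the slide
std_normal = NormalDist()      # standard normal: mean 0, SD 1

def z_score(x, mu=MU, sigma=SIGMA):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# A. BMI > 37.0: upper-tail probability.
z_a = round(z_score(37.0), 2)          # rounded as for a table lookup
p_a = 1 - std_normal.cdf(z_a)

# B. BMI < 27.0: lower-tail (cumulative) probability.
z_b = round(z_score(27.0), 2)
p_b = std_normal.cdf(z_b)

print(z_a, round(p_a, 4))  # 1.65 0.0495
print(z_b, round(p_b, 4))  # -0.52 0.3015
```

Skipping the rounding step gives slightly different probabilities, since the table only resolves z to two decimal places.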

Percentiles from Standard Normal Distribution

Standard normal distribution can also be used to compute percentiles.

x = µ + zσ

Example: Calculate the 90th percentile for BMI (µ is 29.4, σ is 4.6):

In Appendix Table 1:

The z value for the 90th percentile is found by looking in the body of the table for a value of 0.90, or the nearest value. In this case, it is 0.8997, which corresponds to a z value of 1.28 (the actual z value is 1.282).

So, x = 29.4 + (1.282 x 4.6) = 35.3

[Figure: normal curve for Body Mass Index (BMI) centered at µ = 29.4 with the 90th percentile marked]
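Reading a percentile off the table amounts to inverting the cumulative distribution. A minimal Python sketch using `NormalDist.inv_cdf` from the standard library; the helper name `percentile` is an assumption made for illustration.

```python
from statistics import NormalDist

bmi = NormalDist(mu=29.4, sigma=4.6)   # BMI distribution from the slide

def percentile(dist, pct):
    """Value below which pct percent of the distribution falls."""
    return dist.inv_cdf(pct / 100)

z_90 = NormalDist().inv_cdf(0.90)      # z value for the 90th percentile
x_90 = percentile(bmi, 90)             # same as mu + z * sigma
print(round(z_90, 3), round(x_90, 1))  # 1.282 35.3
```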

Percentiles from Standard Normal Distribution (Practice)

x = µ + zσ

Calculate the 33rd percentile for BMI (µ is 29.4, σ is 4.6):

z = _________ so, x = _________________________

[Figure: normal curve for Body Mass Index (BMI) centered at µ = 29.4 with the 33rd percentile marked]

Percentiles from Standard Normal Distribution (Practice)

x = µ + zσ

Calculate the 33rd percentile for BMI (µ is 29.4, σ is 4.6):

z = -0.44 so, x = 29.4 + (-0.44 x 4.6) = 27.4

[Figure: normal curve for Body Mass Index (BMI) centered at µ = 29.4 with the 33rd percentile marked]

Standard Normal Distribution

$$Z = \frac{x - \mu}{\sigma}$$

Z values for common percentiles:

Percentile   Z
1st          -2.326
2.5th        -1.960
5th          -1.645
10th         -1.282
25th         -0.675
50th          0.0
75th          0.675
90th          1.282
95th          1.645
97.5th        1.960
99th          2.326

SECTION 2.8

Sampling Distributions and Central Limit Theorem

Sampling Distributions

In estimating the mean of a continuous variable in a population, the mean of a representative sample is a good estimate of the unknown population mean, but it is only an estimate.

When making estimates about population parameters based on sample statistics, it is very important to quantify the precision of the parameter estimates (e.g. standard error of the mean).

Illustration:
Assume a population of 6 measurements of self-reported pain (on a 0 to 100 scale) after total hip replacement with scores as follows:

25 50 80 85 90 100

The population mean (μ) is: ∑X / N = 71.7, and the standard deviation is: sqrt[∑(X - μ)^2 / N] = 25.9

Sampling Distributions

25 50 80 85 90 100

The population mean (μ) is: ∑X / N = 71.7, and the standard deviation is: sqrt[∑(X - μ)^2 / N] = 25.9

Suppose we did not have population data and wanted to estimate the mean from a sample, taking a sample size of 4.

There are 15 different possible samples with n=4 when sampling without replacement is used (i.e. each individual can only be sampled once in a given sample).

The probability of selecting any one of the 15 possible samples is 1/15 (see next slide).

Sample   Observations in Sample   Sample Mean (X)
1        25  50  80  85           60.0
2        25  50  80  90           61.3
3        25  50  80  100          63.8
4        25  50  85  90           62.5
5        25  50  85  100          65.0
6        25  50  90  100          66.3
7        25  80  85  90           70.0
8        25  80  85  100          72.5
9        25  80  90  100          73.8
10       25  85  90  100          75.0
11       50  80  85  90           76.3
12       50  80  85  100          78.8
13       50  80  90  100          80.0
14       50  85  90  100          81.3
15       80  85  90  100          88.8

The table represents the sampling distribution of the sample means.

So, with the original population of N=6, the population mean is: µ = ∑X / N = 71.7 with SD of 25.9.

However, the mean of the sample means, denoted as µX, is 71.7, with a standard deviation of σX = 8.5.

Note that µ = µX (71.7), yet σ (25.9) is much larger than σX (8.5).

This is because the range of the population data (25 to 100) is much larger than the range of the sample means (60 to 88.8).

These properties are formally stated in the Central Limit Theorem.
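The 15 samples and their means can be enumerated directly, which makes the comparison between σ and σX concrete. This Python sketch uses only the standard library and is an illustration rather than the course's own procedure. One observation from running it: the slide's σX = 8.5 matches the n − 1 (sample) standard deviation of the 15 means, while dividing by 15 gives about 8.2.

```python
from itertools import combinations
from statistics import mean, pstdev, stdev

population = [25, 50, 80, 85, 90, 100]   # pain scores from the slide

# All samples of size 4 drawn without replacement.
samples = list(combinations(population, 4))
sample_means = [mean(s) for s in samples]

print(len(samples))                                              # 15
print(round(mean(population), 1), round(pstdev(population), 1))  # 71.7 25.9
print(round(mean(sample_means), 1))                              # 71.7
print(round(stdev(sample_means), 1))    # 8.5 (divides by 15 - 1)
print(round(pstdev(sample_means), 1))   # 8.2 (divides by 15)
```

Either way, the spread of the sample means is far smaller than the spread of the individual scores.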

Central Limit Theorem

If we take simple random samples of size n from the population with replacement, then for large samples (n > 30), the sampling distribution of the sample means is approximately normally distributed with:

µX = µ and σX = σ / sqrt(n)

Because the distribution of the sample means is approximately normal, the normal probability model can be used to make inferences about a population mean.

The parameter σX = σ / sqrt(n), as noted above, is the standard error (meaning the standard deviation of the sample means).

For a dichotomous outcome, the theorem holds for samples that satisfy min[np, n(1-p)] > 5, where n is the sample size and p is the probability of the outcome in the population.

Central Limit Theorem (Practice)

                          Sample   Population        Sample Means
Characteristic            N        µ        σ        µX       σX
Age (in years)            60       56.2     9.1
Systolic blood pressure   60       135.6    20.8
Body mass index           60       26.3     6.4
Resting heart rate        60       71.0     8.6

µX = µ    σX = σ / sqrt(n)

Central Limit Theorem (Practice)

                          Sample   Population        Sample Means
Characteristic            N        µ        σ        µX       σX
Age (in years)            60       56.2     9.1      56.2     1.17
Systolic blood pressure   60       135.6    20.8     135.6    2.69
Body mass index           60       26.3     6.4      26.3     0.83
Resting heart rate        60       71.0     8.6      71.0     1.11

µX = µ    σX = σ / sqrt(n)

Note that µX = µ because the sample mean is an unbiased estimate of the true mean, whereas σX and σ are different quantities:

σ is an estimate of variability in the sample
σX is an estimate of precision of the mean estimate

Thus, as n increases, σX becomes smaller, whereas the estimate of σ may turn out larger or smaller.
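The σX column is just σ / sqrt(n) applied row by row. A brief Python sketch that recomputes it; the dictionary and the function name `standard_error` are illustrative.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / sqrt(n)

# Population standard deviations from the practice table (n = 60 throughout).
population_sd = {
    "Age (in years)": 9.1,
    "Systolic blood pressure": 20.8,
    "Body mass index": 6.4,
    "Resting heart rate": 8.6,
}

for name, sd in population_sd.items():
    print(name, round(standard_error(sd, 60), 2))
```

Quadrupling n halves the standard error, since n enters through its square root.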

Central Limit Theorem -- Application

Assume that in adults over age 50, HDL cholesterol has: µ = 54 and σ = 17.

Suppose a physician has 40 patients (older than 50) and wants to determine the probability that their mean HDL is >60, i.e. P(X̄ > 60), where X̄ is the sample mean.

Intuitively, this should appear very unlikely.

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{60 - 54}{17 / \sqrt{40}} = \frac{6}{2.7} = 2.22$$

From Appendix Table 1, P(z > 2.22) = (1 – 0.9868) = 0.0132 (i.e. very unlikely)
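The calculation for the mean of 40 patients can be sketched in Python as follows. The function name `prob_sample_mean_above` is hypothetical, and `statistics.NormalDist` stands in for Appendix Table 1.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_above(threshold, mu, sigma, n):
    """z-score and upper-tail probability for a sample mean of n observations."""
    se = sigma / sqrt(n)             # standard error of the mean
    z = (threshold - mu) / se
    return z, 1 - NormalDist().cdf(z)

# HDL cholesterol: mu = 54, sigma = 17, n = 40 patients, mean above 60.
z, p = prob_sample_mean_above(60, mu=54, sigma=17, n=40)
print(round(z, 2), round(p, 4))
```

Without intermediate rounding, z comes out near 2.23 and the probability near 0.0128. The slide's 2.22 and 0.0132 arise from rounding the standard error to 2.7 before dividing; the conclusion is the same.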

Central Limit Theorem – Application (Practice)

Assume that in adults over age 50, HDL cholesterol has: µ = 44 and σ = 16.

Suppose a physician has 50 patients (older than 50) and wants to determine the probability that their mean HDL is <40, i.e. P(X̄ < 40), where X̄ is the sample mean.

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

X̄ = ____ µ = ____ σ = ____ n = ____

Z = ----------------

From Appendix Table 1, P(z < ????) =

Central Limit Theorem – Application (Practice)

Assume that in adults over age 50, HDL cholesterol has: µ = 44 and σ = 16.

Suppose a physician has 50 patients (older than 50) and wants to determine the probability that their mean HDL is <40, i.e. P(X̄ < 40), where X̄ is the sample mean.

X̄ = 40 µ = 44 σ = 16 n = 50

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{40 - 44}{16 / \sqrt{50}} = \frac{-4}{2.26} = -1.77$$

From Appendix Table 1, P(z < -1.77) = 0.0384

SECTION 2.9

SPSS – Calculation of Z-Scores

SPSS – Calculation of Z-Scores

Example: Age

Analyze > Descriptive Statistics > Descriptives

Before you click OK, be sure that the box marked "Save Standardized Values as Variables" is checked.

Run a Frequency distribution for the standardized variable.

SPSS – Calculation of Z-Scores

Example: Age

GET FILE='G:\NGR 7848 2012\Datasets\baseline_random.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
DESCRIPTIVES VARIABLES=SCR_AGE
  /SAVE
  /STATISTICS=MEAN STDDEV MIN MAX.

Descriptive Statistics
                     N     Minimum   Maximum   Mean    Std. Deviation
Age (years)          503   45        74        59.16   7.409
Valid N (listwise)   503

SPSS – Calculation of Z-Scores

Example: Age

FREQUENCIES VARIABLES=ZSCR_AGE
  /FORMAT=NOTABLE
  /STATISTICS=STDDEV MEAN SKEWNESS SESKEW
  /HISTOGRAM
  /ORDER=ANALYSIS.

Statistics
Zscore: Age (years)
N   Valid                  503
    Missing                0
Mean                       0E-7
Std. Deviation             1.00000000
Skewness                   .097
Std. Error of Skewness     .109
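For readers without SPSS, the same standardization can be sketched in Python. The ages below are made-up illustrative values, not the course dataset, and the sketch assumes the sample (n − 1) standard deviation is used, which is consistent with the standardized variable having a standard deviation of exactly 1 in the output.

```python
from statistics import mean, stdev

# Hypothetical ages for illustration only (not the course dataset).
ages = [45, 52, 58, 61, 66, 74]

age_mean = mean(ages)
age_sd = stdev(ages)   # sample standard deviation (n - 1 denominator)

# Standardize: subtract the mean, divide by the standard deviation.
z_scores = [(age - age_mean) / age_sd for age in ages]

print(round(abs(mean(z_scores)), 6), round(stdev(z_scores), 6))
```

Whatever the input values, the standardized variable has mean 0 and standard deviation 1, up to floating-point error, which is what the SPSS Statistics table shows.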