SC 604 Phillips · 2016. 5. 2.

SC 604 Statistically Significant or Not Significant?

Is that the Question?

Jerry Phillips

[email protected]513-1776

Applying Statistical Research to Infusion Therapy

Statistically Significant or Not Significant?

Is that the Question?

Learning Objectives:

1. Identify what constitutes sound statistical data. (Diagnose whether results are reliable.)

2. Discuss application of data to an infusion practice. (If data are reliable, then how to interpret and apply the results.)


Is that the Question?

Other Questions Must be Asked First:

1. How was Study Designed?
2. How was Data Collected?
3. How was Data Analyzed?
4. Do Statistical Results make “Clinical Sense”?

How was Study Designed?

1. Methods
   a) Study Design:
      i. Retrospective vs. Prospective
   b) Data Collection
      i. Randomization
      ii. Representative Sample
      iii. Bias
   c) Sample Size Justification

Retrospective vs. Prospective Studies

Characteristic | Retrospective Study (Observational or Case-Control) | Prospective (Designed) Study (Cohort)
1. Outcome is measured | Before exposure | After exposure
2. Control and Results | Controls are selected on the basis of not having the outcome | Yields true incidence rates and relative risks; may uncover unanticipated associations with outcome
3. Frequency of Outcome | Rare outcomes (rare diseases) | Common outcomes
4. Cost | Inexpensive | Expensive
5. Sample Size | Smaller numbers required, or use existing database | Larger numbers required to properly detect clinical effect
6. Study Duration | Quicker to complete, or use existing database | Longer to complete
7. Bias prone to | Selection and recall/retrospective bias | Attrition and change in methods over time bias
8. Confounding Effect | Factors confounded; difficult to separate effects | Effect of Factors estimated separately

How was Data Collected?

1. Methods
   a) Data Collection
      i. Randomization
      ii. Representative Sample
      iii. Bias

Data Collection

1. Define Population
2. Randomly Select n Samples
3. Make Inferences about Population from Sample

[Figure: a Population of test units (each marked x) from which Sample 1, Sample 2, Sample 3, Sample 4, Sample 5, ..., Sample n are randomly drawn.]

Sample

A sample of n test units randomly drawn from a larger population of interest. Conclusions of study only pertain to the population from which the samples were collected.

Conclusions may NOT be extrapolated to other Populations!

Representative Sample

Benefits of Random Selection

• Avoids bias (e.g. all samples at beginning of production)

• Provides an equal opportunity for every sample to be selected.

• Provides insurance against uncontrolled factors (e.g., weather, humidity, temperature, etc.)

• Allows inference across the population of interest.

• Governmental Agencies Require it!

Types of Study Bias

1. Selection bias - e.g. study of nursing practice in U.S. is not representative of the practices in Canada.

2. Observation bias (recall and information) - e.g. on questioning, healthy people are more likely to underreport their alcohol intake than people with a disease.

3. Observation bias (interviewer) - e.g. different interviewer styles might provoke different responses to the same question.

4. Observation bias (misclassification or misdiagnosis) - tends to dilute an effect

5. Losses to follow up - e.g. ill people may not feel able to continue with a study whereas healthy people tend to complete it.

Study Design and Data Collection Hospital Survey Example

1) Define the Population of Interest:
   a) Subject: Nurses who work with Neonates
   b) Hospital Conditions:
      1) Hospital Size: Large (> 250 beds)
      2) Hospital Location: Midwest
      3) Specialty: Neonates

2) Randomly select N nurses to survey from a larger list of nurses who meet the hospital conditions requirements.

3) Survey results ONLY pertain to nurses who use neonate sets in large hospitals located in the Midwest.
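Step 2 of the survey example can be sketched in a few lines of code. This is only an illustration under stated assumptions: the list of eligible nurses, the sample size of 25, and the function name `draw_simple_random_sample` are all hypothetical and do not come from the original survey.

```python
import random

def draw_simple_random_sample(sampling_frame, n, seed=None):
    """Select n units without replacement; every unit has an equal chance."""
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame: nurses who meet the hospital conditions
# (large hospital, Midwest, neonatal specialty).
eligible_nurses = [f"nurse_{i:03d}" for i in range(1, 201)]

surveyed = draw_simple_random_sample(eligible_nurses, n=25, seed=604)
print(len(surveyed), "of", len(eligible_nurses), "eligible nurses selected")
```

Because the sample is drawn only from the list of eligible nurses, any conclusions apply only to that population.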

Study Design and Data Collection Hospital Survey Example

Famous Selection Bias

Hypothesis Terminology

• Null Hypothesis denoted H0 (“H” naught)

• H0 usually postulates the absence of an effect, such as no difference between two groups, or the absence of a relationship between a factor and an outcome.

• Sometimes referred to as the “Dull” hypothesis (nothing exciting is happening)

• H0 : Treatment Effect = 0

Hypothesis Terminology

• Alternative hypothesis denoted Ha

• Depending on objective of the study, Ha postulates the desired alternative effect, such as clinical difference between two groups, or the presence of a relationship between a factor and an outcome.

• Ha : Treatment Effect ≠ 0
• Ha : Treatment Effect > 0 (positive effect)
• Ha : Treatment Effect < 0 (negative effect)

Sample Size Justification?

Sample Size Justification should define:

a) Clinical importance/difference of study

b) Confidence Level (1-α): Chance of correctly declaring no difference exists

c) Power (1-β): Chance of correctly declaring a clinical difference exists

d) Assumptions (distribution, variability, etc.)

What is a Clinical Difference?

• Clinical difference defined as:

– Threshold (determined by medical team) at which new treatment is more efficacious than the current treatment.

– Example: Current treatment provides a 5 year survival rate of 50%

– New treatment is considered clinically efficacious if a 5 year survival rate of 60% is demonstrated.

Statistical Difference

• Statistical difference is the minimum difference one may detect between the new and current treatment given the:
  1. sample size,
  2. confidence level,
  3. power of the test, and
  4. standard deviation.

Statistical vs. Clinical Difference

Case 1: If sample size is “too small”, then study is not sensitive enough to detect desired clinical difference.

Case 2: If sample size is “too large”, then study is oversensitive and is able to detect a statistical difference smaller than the desired clinical difference.

Case 3: If sample size is “Just Right”, then study is properly powered to detect a statistical difference that is equal to the desired clinical difference.

Case 3 is the goal of a Prospective Study.
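The three cases can be made concrete with a rough sample size sketch. The slides do not give a formula, so the one used here is an assumption: the standard normal-approximation formula for a two-sided, one-sample test with known standard deviation. The function names, the clinical difference of 1.0 and the standard deviation of 2.0 are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def _z_sum(alpha, power):
    # z for the two-sided confidence level plus z for the power
    return NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)

def required_sample_size(clinical_diff, std_dev, alpha=0.05, power=0.80):
    """Smallest n whose detectable difference is at most the clinical difference."""
    return ceil((_z_sum(alpha, power) * std_dev / clinical_diff) ** 2)

def detectable_difference(n, std_dev, alpha=0.05, power=0.80):
    """Statistical difference detectable with n units at the given alpha and power."""
    return _z_sum(alpha, power) * std_dev / sqrt(n)

n_needed = required_sample_size(clinical_diff=1.0, std_dev=2.0)
for n in (8, n_needed, 200):  # too small, just right, too large
    print(n, round(detectable_difference(n, std_dev=2.0), 2))
```

With these hypothetical inputs, a sample of 8 can only detect a difference larger than the clinical difference (Case 1), a sample of 200 detects differences much smaller than it (Case 2), and the computed `n_needed` matches the two (Case 3).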

Confidence Level & Type I Error Definitions

Truth : H0 is True (Only known if entire population were sampled)
H0 : Patient is truly “Healthy” (Test Result should be Negative)

Confidence Level = Chance of correctly diagnosing patient is Healthy (True Negative)
Example: 95% chance of correctly diagnosing patient is Healthy

Type I or alpha Error = 100% - Confidence Level (%)
A truly “Healthy” Patient is incorrectly diagnosed with disease (False Positive Test Result)

Example: Type I Error = 100% - 95% Confidence Level = 5% = 5% chance of a False Positive Test Result.

Power & Type II Error Definitions

Truth : H0 is False (Only known if entire population were sampled)
Patient is truly “Unhealthy” (Test Result should be Positive)

Power = Chance of correctly diagnosing patient is Unhealthy (True Positive)
Example: 90% chance of correctly diagnosing Patient is Unhealthy

Type II or beta Error = 100% - Power (%)
A truly “Unhealthy” Patient is incorrectly diagnosed as not having disease (False Negative Test Result)

Example: Type II Error = 100% - 90% Power = 10% = 10% chance of a False Negative Test Result.

Confidence, Power, Type I and II Errors (Diagnostic Test)

Decision Based on Data | Truth : H0 is True (H0 should NOT be Rejected); H0 : Patient = “Healthy” | Truth : Ha is True (H0 should be Rejected); Ha : Patient = “Unhealthy”
Test result is Negative (H0 is NOT Rejected) | Confidence Level (1-α): True Negative (Correctly Diagnose Patient is Healthy) | Type II Error (β): False Negative (Incorrectly Diagnose Patient is Healthy)
Test result is Positive (H0 is Rejected) | Type I Error (α): False Positive (Incorrectly Diagnose Patient is Unhealthy) | Power (1-β): True Positive (Correctly Diagnose Patient is Unhealthy)

Confidence, Power, Type I and II Errors (Clinical Trial)

Decision Based on Data | Truth : H0 is True (H0 should NOT be Rejected); H0 : New Therapy = Ineffective | Truth : Ha is True (H0 should be Rejected); Ha : New Therapy = Effective
New Therapy is Ineffective (H0 is NOT Rejected) | Confidence Level (1-α): Correctly Declare New Therapy Ineffective (Back to R&D Lab) | Type II Error (β): Incorrectly Declare New Therapy Ineffective (Missed Opportunity)
New Therapy is Effective (H0 is Rejected) | Type I Error (α): Incorrectly Declare New Therapy Effective (False Claim/Advertising) | Power (1-β): Correctly Declare New Therapy Effective (Introduce New Therapy into Market!)
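A small simulation can illustrate how often each kind of decision occurs. This is a sketch under assumptions that are not in the slides: a one-sided z-test with known standard deviation, a hypothetical sample size of 25, and a hypothetical true effect of 0.5 when the therapy works. The function name `rejection_rate` is also made up for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def rejection_rate(true_effect, n, std_dev=1.0, alpha=0.05, trials=5000, seed=1):
    """Fraction of simulated trials in which H0: effect = 0 is rejected
    (one-sided z-test against Ha: effect > 0)."""
    rng = random.Random(seed)
    std_err = std_dev / sqrt(n)
    z_cutoff = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(trials):
        # Draw the trial's sample mean directly from its sampling distribution.
        observed_mean = rng.gauss(true_effect, std_err)
        if observed_mean / std_err > z_cutoff:
            rejections += 1
    return rejections / trials

# H0 is true (therapy ineffective): rejections are Type I errors.
type_i_rate = rejection_rate(true_effect=0.0, n=25)
# Ha is true (therapy effective): rejections are correct, i.e. power.
estimated_power = rejection_rate(true_effect=0.5, n=25)
print(type_i_rate, estimated_power)
```

Under these assumptions the first rate should land near α = 0.05 and the second near 0.80, and one minus the second estimates the Type II error.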

Sample Size Justification Example [1]

Outcome: Number of surviving days outside the hospital at day 28 after the patient received an antibiotic in the Emergency Room.

[Timeline: Day 0 = Pt. Rec’d Antibiotic; Discharged Day 5; end of follow-up at 28 Days.]

Outcome = 23 Days (28 - 5) if patient still living at 28 days

[1] Bas de Groot et al. “The Association Between Time to Antibiotics and Relevant Clinical Outcomes in Emergency Department Patients With Various Stages of Sepsis: A Prospective Multi-Center Study”. Critical Care. 2015;19(1).

Sample Size Justification Example

“… The expected number of surviving days outside the hospital at day 28 was 23, and was derived from the study of Houck et al. [1].”

H0: New Trt. Median = 23 surviving days outside the hospital

Ha: New Trt. Median ≠ 23 surviving days outside the hospital


“… the present study had a power of 80%, calculated a priori to detect a difference in outcome (α = 0.05) of one day between a group with time to antibiotics below or above (≠) the median time to antibiotics.”

Pop Quiz!!

Clinical Difference = One Day
Type I error = 5% (Falsely Declare Trt. Effect of 1 Day)
Confidence Level = 95% (Correctly Declare No Trt. Effect)
Power = 80% (Correctly Declare Trt. Effect of 1 Day)
Type II error = 20% (Falsely Declare No Trt. Effect, missed opportunity)


“In this calculation, the skewed distribution of the number of surviving days outside the hospital was taken into account. It was calculated that approximately 400 inclusions per PIRO category were needed.”

Where PIRO = Predisposition, Infection, Response, and Organ failure score

How was Data Analyzed?

• Prospective Study
  • Generally data analysis is dictated by design and straightforward.

• Retrospective Study
  • Analysis more complicated due to confounding and possible bias.

• When in doubt, Consult a statistician :)

Interpreting Results

Data analysis tests the Objectives/Hypotheses of interest defined in Abstract. Typically provided are:

1. P-values
2. Summary Statistics (Mean, proportion, etc.)
3. Confidence Intervals on Sample Statistics

What is a p-value?

• P stands for Probability (between 0.00 and 1.00).
• P-values can indicate how incompatible the data are with a specified statistical model or hypothesis.

What is a p-value?

• The smaller the p-value, the greater the statistical incompatibility of the data with the null hypothesis (evidence to Reject H0) , if the underlying assumptions used to calculate the p-value are true.

• Conversely, the larger the p-value, the greater the statistical compatibility of the data with the null hypothesis (No evidence to Reject H0), if the underlying assumptions used to calculate the p-value are true.

P-value Example

Historically the length of stay for patients is 10 days. It is hypothesized that a new therapy will reduce the length of stay by at least 2 days.

1) Define Hypotheses:
   H0 : New Therapy Mean = Current 10 days
   Ha : New Therapy Mean < Current 10 days (target: 8 = 10 – 2 days)

2) Determine appropriate sample size, n, based on clinical difference, confidence level and power.

3) Randomly select n test units from population.

P-value Example

4) Calculate appropriate statistic from sample of n test units: New Therapy Mean = 8 days

5) Calculate p-value = Probability of “observing” a New Therapy Mean of 8 days or less, assuming sample is randomly selected from the hypothesized normal population with mean = 10 and standard deviation of 1.0.

For this scenario the calculated p-value = 0.02.
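The p-value in step 5 can be approximated with a short calculation. This sketch rests on an assumption the slides do not state explicitly: that the standard deviation of 1.0 applies to the sample mean itself (that is, it acts as the standard error) and that the test is one-sided toward shorter stays. The function name `lower_tail_p_value` is illustrative only.

```python
from statistics import NormalDist

def lower_tail_p_value(sample_mean, null_mean, std_err):
    """P(observing a mean this low or lower) if the null hypothesis is true."""
    z = (sample_mean - null_mean) / std_err
    return NormalDist().cdf(z)

# Assumption: the stated standard deviation of 1.0 is the standard error of the mean.
for observed_mean in (8.0, 9.0, 8.4):
    p = lower_tail_p_value(observed_mean, null_mean=10.0, std_err=1.0)
    print(observed_mean, round(p, 3))
# 8.0 0.023
# 9.0 0.159
# 8.4 0.055
```

Under this reading, an observed mean of 8 days gives roughly the 0.02 quoted on the slide, and means of 9 and 8.4 days give values close to the 0.15 and 0.05 used in the later examples.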

Graphical Representation of p-value

[Figure: distribution under H0 : Pop. Mean = 10 days, with the New Therapy Mean = 8 days and the p-value marked.]

If the null hypothesis (population mean = 10 days) were true, a New Therapy Mean of 8 days or less would be observed only about 2 times in one hundred (0.02).

P-value Example

6) Make decision about H0 based on evidence collected.

• The smaller the p-value, the greater the statistical incompatibility of the data with the null hypothesis (Reject H0), if the underlying assumptions used to calculate the p-value are true.

• Conversely, the larger the p-value, the greater the statistical compatibility of the data with the null hypothesis (Fail to Reject H0), if the underlying assumptions used to calculate the p-value are true.

• Is the calculated p-value = 0.02 small or large?

What is a Small/Large p-value?

The Type I or Alpha (α) Error is used as the Cut-Off Point to determine whether the p-value is “small” or “large”.

What value of α is commonly chosen? α = 0.05

What is the corresponding confidence level? 1.00 – 0.05 = 0.95

Statistics is Easy :)

What is a Small/Large p-value?

If p-value < α, Then Reject H0 in favor of Alternative Ha

[Figure: P-value scale with a cut-off at alpha; Reject H0 below alpha, Fail to Reject H0 at or above alpha.]

If the p is Low, the NULL must GO!

Example 1 – “Small” P-value

If New Therapy Mean = 8 Days, then p-value = 0.02

Since p-value = 0.02 is less than α = 0.05, the New Therapy Mean of 8 days is incompatible with the null hypothesis that the New Therapy Mean is 10 days.

Conclusion: The New Therapy reduces the length of stay from the current 10 days (observed reduction of 2 days, matching the clinical difference), with 95% confidence.

[Figure: P-value = 0.02 falls below Alpha = 0.05.]


Example 2 – “Large” P-value

If New Therapy Mean = 9 Days, then p-value = 0.15

Since p-value = 0.15 is greater than or equal to α = 0.05, the New Therapy Mean of 9 days is compatible with the null hypothesis that the New Therapy Mean is 10 days.

Conclusion: The New Therapy is not statistically different from the current mean of 10 days, with 95% confidence.

[Figure: P-value = 0.15 falls above Alpha = 0.05.]


Example 3 - ?? P-value

If New Therapy Mean = 8.4 Days, then p-value = 0.05

Since p-value = 0.05 is greater than or equal to α = 0.05, the New Therapy Mean of 8.4 days is compatible with the null hypothesis.

Conclusion: The New Therapy Mean of 8.4 days is not statistically different from the current 10 days, with 95% confidence.

[Figure: P-value = Alpha = 0.05.]


P-Value Examples 1-3

Ex. | New Therapy Mean | H0: Mean = | Mean Difference | P-value | Alpha | Decision about H0 | New Therapy is:
1 | 8.0 | 10.0 | -2.0 | 0.02 | 0.05 | Reject | Effective
2 | 9.0 | 10.0 | -1.0 | 0.15 | 0.05 | Fail to Reject | Non-Effective
3 | 8.4 | 10.0 | -1.6 | 0.05 | 0.05 | Fail to Reject | Non-Effective

• p-value provides degree of compatibility with H0
• P-values just on either side of alpha (0.04 vs. 0.05) must be interpreted with great care!
• Results of p-value MUST NOT be interpreted in a vacuum!!
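The cut-off rule applied in Examples 1-3 is simple enough to write down directly. The sketch below only restates that rule; the function name `decide` is a made-up label.

```python
def decide(p_value, alpha=0.05):
    """Reject H0 only when the p-value is strictly below alpha."""
    return "Reject H0" if p_value < alpha else "Fail to Reject H0"

# P-values from Examples 1-3.
example_p_values = {1: 0.02, 2: 0.15, 3: 0.05}
decisions = {ex: decide(p) for ex, p in example_p_values.items()}
print(decisions)
```

The rule flips between p = 0.04 and p = 0.05, even though the evidence is nearly identical, which is why a p-value near alpha calls for careful interpretation.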

The Great p-value Controversy!

1. Typically used as “black” and “white” cut-off for significant vs. non-significant without regard to study design and sample size.

2. Some publications have BANNED the use of p-values due to “Non-Repeatable Results”.

3. The American Statistical Association (ASA) issued a statement [2]: “…Must be used in context of study design, power of test and clinical significance.”

[2] Ronald L. Wasserstein & Nicole A. Lazar (2016): The ASA's statement on p-values: context, process, and purpose, The American Statistician, DOI:10.1080/00031305.2016.1154108. To link to this article: http://dx.doi.org/10.1080/00031305.2016.1154108

What Do you think is the Value of P?

I Agree!

What Do you think is the Value of P?

Must be able to Match Statistical Difference with Clinical Difference

through proper Study Design and Sample Size !

Summary Statistics

• The following Sample Statistics should be summarized depending on the type of outcome measured:

• Central Tendency or Location:
  – Mean, Median (50th %tile)

• Dispersion or spread of data:
  – Standard deviation, Range (Max – Min),
  – Interquartile range (75th %tile - 25th %tile)

• Sample size
• Proportions or rates (numerator/denominator)
• Confidence Interval for Sample Statistics
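As a rough illustration, the summary statistics listed for this slide can be computed with a few library calls. The length-of-stay values are hypothetical, and the quartiles use the default method of Python's `statistics.quantiles`, which is one of several common conventions.

```python
from statistics import mean, median, quantiles, stdev

def summarize(values):
    """Location, spread, and sample size for one group of measurements."""
    q1, _, q3 = quantiles(values, n=4)  # 25th, 50th and 75th percentiles
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "std_dev": stdev(values),       # sample standard deviation
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }

# Hypothetical length-of-stay data, in days.
length_of_stay = [7, 8, 8, 9, 9, 10, 10, 11, 12, 16]
summary = summarize(length_of_stay)
print(summary)
```

With one long stay of 16 days in the data, the mean sits above the median, which is one reason to report both.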

Summary Statistics Table

Group | Sample Size | Mean | Std. Dev. | Confidence Interval | P-value | Decision about H0
Test | n1 | X̄1 | s1 | (LCL, UCL) | |
Control | n2 | X̄2 | s2 | (LCL, UCL) | |
Test - Control | Total | Difference | pooled | (LCL, UCL) | 0.05 | Accept or Reject

Note: Estimate of dispersion should always be provided either as standard deviation or confidence intervals!!

Confidence Interval

• The Sample Statistic of interest (e.g. Sample Mean) is estimated from the random sample of n test units.

• Confidence interval (CI) on estimated statistic provides a range of expected values if the experiment were repeated numerous times.

• CI provides the precision of the estimated statistic.

Confidence Interval

• A CI takes form:
  – Estimate ± Delta,
  – where Delta depends on the sample size, variability and confidence level.
  – Delta corresponds to the statistically detectable difference.

• Lower and Upper Conf. Limits for Sample Mean:

  $(\mathrm{LCL}, \mathrm{UCL}) = \bar{X} \pm t_{\alpha}\,(s/\sqrt{n})$

  $t_{\alpha}$ = constant based on the confidence level ≈ 2.0 for 95% confidence level.

  $s/\sqrt{n}$ = standard deviation of mean
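The confidence limits for a sample mean (estimate ± constant × s/√n) can be sketched as follows. The data are hypothetical, and the constant of 2.0 is the slide's approximation for a 95% confidence level; for a sample as small as this one, the exact t constant would be noticeably larger.

```python
from math import sqrt
from statistics import mean, stdev

def confidence_interval_for_mean(values, t_constant=2.0):
    """Return (LCL, UCL) = mean ± t_constant * s / sqrt(n)."""
    n = len(values)
    estimate = mean(values)
    delta = t_constant * stdev(values) / sqrt(n)  # s / sqrt(n) = std. dev. of the mean
    return estimate - delta, estimate + delta

# Hypothetical length-of-stay sample, in days.
days = [9, 10, 11, 10, 10]
lcl, ucl = confidence_interval_for_mean(days)
print(round(lcl, 2), round(ucl, 2))
```

Increasing the sample size or reducing the variability shrinks Delta, which narrows the interval.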

What is Confidence Level ?

• The confidence level (%) is a measure of how often the estimated limits, $\bar{X} \pm t\,(s/\sqrt{N})$, capture the true value if experiments of sample size N are repeatedly taken.

• Suppose 10 experiments with sample size of 50 each are conducted and the 90% confidence limits on the true mean are calculated.

• A 90% confidence level implies that, on average, the estimated interval will capture the true mean 9 out of 10 times.

• The true mean is only known if the entire population is sampled.

90% Confidence Level Example

[Figure: 90% Confidence Limits on True Population Mean (vertical axis, 100 to 104) plotted against Experiment Number (1 to 10), with a horizontal line at the True Mean.]

In the real world, only 1 experiment is run. (But which one?)
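The meaning of a 90% confidence level can be checked by simulation. This sketch makes several assumptions that are not in the slides: a normal population with a hypothetical true mean of 102 and standard deviation of 5, and a normal (z) constant in place of the t constant, which is a close approximation at a sample size of 50.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def coverage(true_mean=102.0, std_dev=5.0, n=50, confidence=0.90,
             experiments=2000, seed=7):
    """Fraction of repeated experiments whose interval captures the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    captured = 0
    for _ in range(experiments):
        sample = [rng.gauss(true_mean, std_dev) for _ in range(n)]
        delta = z * stdev(sample) / sqrt(n)
        if abs(mean(sample) - true_mean) <= delta:
            captured += 1
    return captured / experiments

capture_rate = coverage()
print(capture_rate)
```

Over many repeated experiments the capture rate settles near 0.90, but any single run of 10 experiments may capture the true mean 8, 9 or 10 times.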

Using Confidence Interval to Test Hypotheses

• A CI may be used to graphically test the null hypothesis and will provide the same conclusion as the p-value approach.

• Similar to game of horseshoes!

Using Confidence Interval to Test Hypotheses

• Game of Horseshoes Analogy:

  – Stake in ground is the Hypothesized value, H0: μ = 10

  – Horseshoe represents calculated CI.

  – Throwing horseshoe represents running experiment.

  – If horseshoe captures stake, then H0 is NOT Rejected.

  – If horseshoe misses stake, then H0 is Rejected.

Statistics Makes Sense

Example 1 (p-value = 0.02) Using CI to Test Hypotheses

If H0 is Not captured in the Confidence Interval, Then Reject H0 (Incompatible with H0)

[Figure: confidence interval (LCL, UCL) centered at Test Mean = 8 with Delta = 1.6 = statistical difference; H0 : μ = 10 lies outside the interval, above the UCL.]

A p-value of 0.02 < alpha = 0.05 corresponds to the 95% confidence interval not capturing the hypothesized value.

Example 2 (p-value = 0.15) Using CI to Test Hypotheses

If H0 is included in the Confidence Interval, Then Fail to Reject H0 (Compatible with H0)

[Figure: confidence interval (LCL, UCL) centered at Test Mean = 9 with Delta = 1.6; H0 : μ = 10 lies inside the interval.]

A p-value of 0.15 ≥ alpha = 0.05 corresponds to the hypothesized value lying within the 95% confidence interval.

Example 3 (p-value = 0.05) Using CI to Test Hypotheses

If H0 is included in the Confidence Interval, Then Fail to Reject H0 (Compatible with H0)

[Figure: confidence interval (LCL, UCL) centered at Test Mean = 8.4 with Delta = 1.6; H0 : μ = 10 sits exactly at the UCL.]

A p-value of 0.05 corresponds to one of the 95% confidence limits being equal to the hypothesized value (barely capturing it, so the null hypothesis is not rejected).
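The three confidence interval examples can be reproduced with simple arithmetic, using the test means and the Delta of 1.6 given on the slides. The function name `ci_decision` is illustrative, and rounding the limits to six decimals is a choice made here to avoid floating-point noise at the boundary.

```python
def ci_decision(sample_mean, delta, null_value):
    """Build (LCL, UCL) = mean ± delta and check whether it captures H0."""
    lcl = round(sample_mean - delta, 6)
    ucl = round(sample_mean + delta, 6)
    captured = lcl <= null_value <= ucl
    return (lcl, ucl), ("Fail to Reject H0" if captured else "Reject H0")

# Test means from Examples 1-3, Delta = 1.6, hypothesized mean = 10.
results = {m: ci_decision(m, delta=1.6, null_value=10.0) for m in (8.0, 9.0, 8.4)}
for test_mean, (interval, decision) in results.items():
    print(test_mean, interval, decision)
# 8.0 (6.4, 9.6) Reject H0
# 9.0 (7.4, 10.6) Fail to Reject H0
# 8.4 (6.8, 10.0) Fail to Reject H0
```

Each decision agrees with the p-value approach: the horseshoe misses the stake only in Example 1.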

Interpreting Results

1. Proportions,
2. Percentages,
3. Rates, and
4. Ratios

To properly interpret results, must know: Numerator and Denominator!

Proportions, Percentages, Rates and Ratios

Group | Numerator | Denominator | Numerator/Denominator | Range | Example
Proportion (decimal) | X = No. Incidences | N = No. Opportunities | p = x/n | (0.0, 1.0) | No. Med Errors / No. Admins. = 0.02
Percentage (%) | X = No. Incidences | N = No. Opportunities | 100(p) | (0.0, 100%) | % Med Errors = 2.0%
Rate (per unit of measure) | X = No. Incidences | Per Opportunities | X per unit of measure | > 0 | 2.4 BSI per 1000 central-line days
Ratio (proportions or rates) | Test | Control | Test/Control (unitless) | | Test/Control Survival Rate

BSI : Bloodstream Infection
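The four measures differ only in their numerator and denominator, as a short sketch shows. The counts below are hypothetical and were chosen only so that the results match the example values in the table (0.02, 2.0% and 2.4 per 1000); the control rate of 3.0 is likewise made up.

```python
def proportion(events, opportunities):
    """Events divided by opportunities, between 0.0 and 1.0."""
    return events / opportunities

def percentage(events, opportunities):
    """Proportion expressed on a 0 to 100 scale."""
    return 100 * proportion(events, opportunities)

def rate_per(events, exposure, per=1000):
    """Events per chosen unit of exposure (e.g. per 1000 central-line days)."""
    return per * events / exposure

def ratio(test_value, control_value):
    """Unitless comparison of two proportions or two rates."""
    return test_value / control_value

med_error_proportion = proportion(4, 200)   # hypothetical: 4 errors in 200 administrations
med_error_percent = percentage(4, 200)
bsi_rate = rate_per(12, 5000)               # hypothetical: 12 BSI in 5000 central-line days
bsi_ratio = ratio(bsi_rate, 3.0)            # hypothetical control rate of 3.0 per 1000
print(med_error_proportion, med_error_percent, bsi_rate, bsi_ratio)
```

The same 12 infections would give a very different rate over 500 line days, which is why the denominator must always be reported.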

Confidence Interval on Ratios

Results are often presented as ratio of two rates or proportions (Hazard Ratio, Relative Risk, Odds Ratio).

1. Ratio = Test Rate / Control Rate
2. Ratio = 1 implies Test Rate = Control Rate
3. Ratio > 1 implies Test Rate > Control Rate
4. Ratio < 1 implies Test Rate < Control Rate

Confidence Interval on Ratios

1. H0: Ratio = 1.0 vs. Ha: Ratio ≠ 1.0

2. If CI on Ratio captures hypothesized value of 1.0, then no evidence to reject H0.

3. If CI on Ratio does not capture hypothesized value of 1.0, then there is evidence to reject H0 in favor of the alternative Ratio ≠ 1.0

4. Depending on direction, Test group either increases or reduces rate (e.g. infection) relative to the Control group rate.
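A confidence interval on a ratio can be sketched for the relative risk. The slides do not say how such intervals are computed, so the method here is an assumption: the common log-scale normal approximation. The event counts are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def relative_risk_ci(test_events, test_n, control_events, control_n, confidence=0.95):
    """Relative risk with an approximate CI computed on the log scale."""
    rr = (test_events / test_n) / (control_events / control_n)
    se_log_rr = sqrt(1 / test_events - 1 / test_n
                     + 1 / control_events - 1 / control_n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lcl = exp(log(rr) - z * se_log_rr)
    ucl = exp(log(rr) + z * se_log_rr)
    return rr, lcl, ucl

# Hypothetical counts: 10 infections in 100 Test patients, 20 in 100 Control patients.
rr, lcl, ucl = relative_risk_ci(10, 100, 20, 100)
captures_one = lcl <= 1.0 <= ucl
print(round(rr, 2), round(lcl, 2), round(ucl, 2),
      "Fail to Reject H0" if captures_one else "Reject H0")
```

In this hypothetical case the observed rate is halved, yet the interval still barely captures 1.0, so there is no evidence to reject H0 at the 95% confidence level.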


Is that the Only Question?

Learning Objectives

Results are only as reliable as the foundation that they are built upon.

[Figure: layers from top to bottom: Reliable Results, Data Analysis, Data Collection, Study Design.]

I’d like to thank Loretta Dorn, CRNI, Member of the National Council on Education (NCOE), INS, for the opportunity to share with you that

Statistics is Fun :), Easy and Makes Sense!

Thank YOU for Your Participation
