SC 604: Statistically Significant or Not Significant?
Is that the Question?
Jerry Phillips
Applying Statistical Research to Infusion Therapy
Learning Objectives:
1. Identify what constitutes sound statistical data. (Diagnose whether results are reliable.)
2. Discuss application of data to an infusion practice. (If data are reliable, then how to interpret and apply the results.)
Statistically Significant or Not Significant?
Is that the Question?
Other Questions Must be Asked First:
1. How was Study Designed?
2. How was Data Collected?
3. How was Data Analyzed?
4. Do Statistical Results make “Clinical Sense”?
How was Study Designed?
1. Methods
a) Study Design:
i. Retrospective vs. Prospective
b) Data Collection
i. Randomization
ii. Representative Sample
iii. Bias
c) Sample Size Justification
Retrospective vs. Prospective Studies
Characteristic: Retrospective Study (Observational or Case-Control) vs. Prospective (Designed) Study (Cohort)
1. Outcome is measured:
Retrospective: Before exposure
Prospective: After exposure
2. Control and Results:
Retrospective: Controls are selected on the basis of not having the outcome
Prospective: Yields true incidence rates and relative risks; may uncover unanticipated associations with outcome
3. Frequency of Outcome:
Retrospective: Rare outcomes (rare diseases)
Prospective: Common outcomes
4. Cost:
Retrospective: Inexpensive
Prospective: Expensive
5. Sample Size:
Retrospective: Smaller numbers required or use existing database
Prospective: Larger numbers required to properly detect clinical effect
6. Study Duration:
Retrospective: Quicker to complete or use existing database
Prospective: Longer to complete
7. Bias prone to:
Retrospective: Selection and recall/retrospective bias
Prospective: Attrition and change in methods over time bias
8. Confounding Effect:
Retrospective: Factors confounded, difficult to separate effects
Prospective: Effect of factors estimated separately
How was Data Collected?
1. Methods
a) Data Collection
i. Randomization
ii. Representative Sample
iii. Bias
Data Collection
1. Define Population
2. Randomly Select n Samples
3. Make Inferences about Population from Sample
[Figure: a Population of units (each marked x) from which Sample 1, Sample 2, Sample 3, Sample 4, Sample 5, ..., Sample n are randomly drawn]
Sample
A sample of n test units randomly drawn from a larger population of interest. Conclusions of study only pertain to the population from which the samples were collected.
Conclusions may NOT be extrapolated to other Populations!
Representative Sample
Benefits of Random Selection
• Avoids bias (e.g. all samples at beginning of production)
• Provides an equal opportunity for every sample to be selected.
• Provides insurance against uncontrolled factors (e.g., weather, humidity, temperature, etc.)
• Allows inference across the population of interest.
• Governmental Agencies Require it!
Types of Study Bias
1. Selection bias - e.g. study of nursing practice in U.S. is not representative of the practices in Canada.
2. Observation bias (recall and information) - e.g. on questioning, healthy people are more likely to underreport their alcohol intake than people with a disease.
3. Observation bias (interviewer) - e.g. different interviewer styles might provoke different responses to the same question.
4. Observation bias (misclassification or misdiagnosis) - tends to dilute an effect
5. Losses to follow-up - e.g. ill people may not feel able to continue with a study, whereas healthy people tend to complete it.
Study Design and Data Collection Hospital Survey Example
1) Define the Population of Interest:
a) Subject: Nurses that work with Neonates
b) Hospital Conditions:
1) Hospital Size: Large (> 250 beds)
2) Hospital Location: Midwest
3) Specialty: Neonates
2) Randomly select N nurses to survey from a larger list of nurses that meet the hospital conditions requirements.
3) Survey results ONLY pertain to nurses that use neonate sets in large hospitals located in the Midwest.
Famous Selection Bias
Hypothesis Terminology
• Null Hypothesis denoted H0 (“H” naught)
• H0 usually postulates the absence of an effect, such as no difference between two groups, or the absence of a relationship between a factor and an outcome.
• Sometimes referred to as the “Dull” hypothesis (nothing exciting is happening)
• H0 : Treatment Effect = 0
• Alternative hypothesis denoted Ha
• Depending on the objective of the study, Ha postulates the desired alternative effect, such as a clinical difference between two groups, or the presence of a relationship between a factor and an outcome.
• Ha : Treatment Effect ≠ 0
• Ha : Treatment Effect > 0 (positive effect)
• Ha : Treatment Effect < 0 (negative effect)
Sample Size Justification?
Sample Size Justification should define:
a) Clinical importance/difference of study
b) Confidence Level (1-α): Chance of correctly declaring no difference exists
c) Power (1-β): Chance of correctly declaring a clinical difference exists
d) Assumptions (distribution, variability, etc.)
What is a Clinical Difference?
• Clinical difference defined as:
– Threshold (determined by medical team) at which new treatment is more efficacious than the current treatment.
– Example: Current treatment provides a 5 year survival rate of 50%
– New treatment is considered clinically efficacious if a 5 year survival rate of 60% is demonstrated.
Statistical Difference
• Statistical difference is the minimum difference one may detect between the new and current treatment given the:
1. sample size,
2. confidence level,
3. power of the test, and
4. standard deviation.
Statistical vs. Clinical Difference
Case 1: If sample size is “too small”, then the study is not sensitive enough to detect the desired clinical difference.
Case 2: If sample size is “too large”, then the study is oversensitive and is able to detect a statistical difference smaller than the desired clinical difference.
Case 3: If sample size is “Just Right”, then the study is properly powered to detect a statistical difference that is equal to the desired clinical difference.
Case 3 is the goal of a Prospective Study.
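The three cases can be illustrated numerically. The sketch below is not from the slides: it assumes a two-group comparison of means with a known, equal standard deviation and a two-sided normal approximation. The function names `detectable_difference` and `sample_size`, the standard deviation of 5, and the clinical difference of 2 are all hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def _z_sum(alpha, power):
    """z-value for the confidence level plus z-value for the power."""
    z = NormalDist()
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)

def detectable_difference(n_per_group, sd, alpha=0.05, power=0.80):
    """Smallest difference in means detectable with n_per_group per arm
    (two-sided test, normal approximation, equal known sd: an assumption)."""
    return _z_sum(alpha, power) * sd * sqrt(2 / n_per_group)

def sample_size(clinical_difference, sd, alpha=0.05, power=0.80):
    """Per-group n that makes the detectable difference match the clinical one."""
    return ceil(2 * (_z_sum(alpha, power) * sd / clinical_difference) ** 2)

SD = 5.0                   # hypothetical standard deviation
CLINICAL_DIFFERENCE = 2.0  # hypothetical clinical difference

n_just_right = sample_size(CLINICAL_DIFFERENCE, SD)
for label, n in (("too small", 20), ("just right", n_just_right), ("too large", 1000)):
    print(label, n, round(detectable_difference(n, SD), 2))
```

With these made-up inputs, the small study can only detect differences well above the clinical difference, while the large study flags differences far smaller than anyone considers clinically meaningful.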
Confidence Level & Type I Error Definitions
Truth: H0 is True (only known if the entire population were sampled)
H0: Patient is truly “Healthy” (Test Result should be Negative)
Confidence Level = Chance of correctly diagnosing that the patient is Healthy (True Negative)
Example: 95% chance of correctly diagnosing that the patient is Healthy
Type I or alpha Error = 100% - Confidence Level (%)
A truly “Healthy” patient is incorrectly diagnosed with disease (False Positive Test Result)
Example: Type I Error = 100% - 95% Confidence Level = 5% = 5% chance of a False Positive Test Result.
Power & Type II Error Definitions
Truth: H0 is False (only known if the entire population were sampled)
Patient is truly “Unhealthy” (Test Result should be Positive)
Power = Chance of correctly diagnosing that the patient is Unhealthy (True Positive)
Example: 90% chance of correctly diagnosing that the patient is Unhealthy
Type II or beta Error = 100% - Power (%)
A truly “Unhealthy” patient is incorrectly diagnosed as not having disease (False Negative Test Result)
Example: Type II Error = 100% - 90% Power = 10% = 10% chance of a False Negative Test Result.
Confidence, Power, Type I and II Errors (Diagnostic Test)
Truth, column 1: H0 is True, so H0 should NOT be Rejected (H0: Patient = “Healthy”, Test should be Negative)
Truth, column 2: Ha is True, so H0 should be Rejected (Ha: Patient = “Unhealthy”, Test should be Positive)
Decision Based on Data:
Test result is Negative (H0 is NOT Rejected)
If H0 is True: Confidence Level (1-α): True Negative (Correctly Diagnose Patient is Healthy)
If Ha is True: Type II Error (β): False Negative (Incorrectly Diagnose Patient is Healthy)
Test result is Positive (H0 is Rejected)
If H0 is True: Type I Error (α): False Positive (Incorrectly Diagnose Patient is Unhealthy)
If Ha is True: Power (1-β): True Positive (Correctly Diagnose Patient is Unhealthy)
Confidence, Power, Type I and II Errors (Clinical Trial)
Truth, column 1: H0 is True, so H0 should NOT be Rejected (H0: New Therapy = Ineffective)
Truth, column 2: Ha is True, so H0 should be Rejected (Ha: New Therapy = Effective)
Decision Based on Data:
New Therapy is declared Ineffective (H0 is NOT Rejected)
If H0 is True: Confidence Level (1-α): Correctly Declare New Therapy Ineffective (Back to R&D Lab)
If Ha is True: Type II Error (β): Incorrectly Declare New Therapy Ineffective (Missed Opportunity)
New Therapy is declared Effective (H0 is Rejected)
If H0 is True: Type I Error (α): Incorrectly Declare New Therapy Effective (False Claim/Advertising)
If Ha is True: Power (1-β): Correctly Declare New Therapy Effective (Introduce New Therapy into Market!)
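The four cells of this table can be seen in a small simulation. The sketch below is an illustration only, under assumptions that are not in the slides: a two-sided z-test with known standard deviation, a hypothetical sample size of 25, and a hypothetical true effect of 0.6 when the therapy works. The function names are made up for the example.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def reject_h0(sample, h0_mean, sd, alpha=0.05):
    """Two-sided z-test with known sd: True when H0 is rejected."""
    z = (mean(sample) - h0_mean) / (sd / sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def rejection_rate(true_effect, sd=1.0, n=25, trials=4000, seed=1):
    """Fraction of simulated trials in which H0: effect = 0 is rejected."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        sample = [rng.gauss(true_effect, sd) for _ in range(n)]
        if reject_h0(sample, h0_mean=0.0, sd=sd):
            rejected += 1
    return rejected / trials

# H0 true (therapy ineffective): every rejection is a Type I error.
type_i_error = rejection_rate(true_effect=0.0)
# Ha true (hypothetical effect of 0.6): every rejection is a correct decision.
power = rejection_rate(true_effect=0.6)

print("Type I error rate:", type_i_error, "Confidence level:", 1 - type_i_error)
print("Power:", power, "Type II error rate:", 1 - power)
```

The simulated Type I error rate should sit near the chosen α of 0.05, and power depends on the effect size, the variability, and the sample size.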
Sample Size Justification Example [1]
Outcome: Number of surviving days outside the hospital at day 28 after Emergency Room presentation of an antibiotic.
[Figure: timeline from Day 0 (patient received antibiotic) to Day 28, with discharge on Day 5]
Outcome = 28 - 5 = 23 days if the patient is still living at 28 days
[1] “The Association Between Time to Antibiotics and Relevant Clinical Outcomes in Emergency Department Patients With Various Stages of Sepsis: A Prospective Multi-Center Study”. Bas de Groot et al. Critical Care. 2015;19(1).
“… The expected number of surviving days outside the hospital at day 28 was 23, and was derived from the study of Houck et al..[1].”
H0: New Trt. Median = 23 surviving days outside the hospital
Ha: New Trt. Median ≠ 23 surviving days outside the hospital
“… the present study had a power of 80%, calculated a priori to detect a difference in outcome (α = 0.05) of one day between a group with time to antibiotics below or above (≠) the median time to antibiotics.”
Pop Quiz!!
Clinical Difference = One Day
Type I error = 5% (Falsely Declare Trt. Effect of 1 Day)
Confidence Level = 95% (Correctly Declare No Trt. Effect)
Power = 80% (Correctly Declare Trt. Effect of 1 Day)
Type II error = 20% (Falsely Declare No Trt. Effect, missed opportunity)
“In this calculation, the skewed distribution of the number of surviving days outside the hospital was taken into account. It was calculated that approximately 400 inclusions per PIRO category were needed.”
Where PIRO = Predisposition, Infection, Response, and Organ failure score
How was Data Analyzed?
• Prospective Study
• Generally, data analysis is dictated by the design and is straightforward.
• Retrospective Study
• Analysis is more complicated due to confounding and possible bias.
• When in doubt, consult a statistician ☺
Interpreting Results
Data analysis tests the Objectives/Hypotheses of interest defined in the Abstract. Typically provided are:
1. P-values
2. Summary Statistics (Mean, proportion, etc.)
3. Confidence Intervals on Sample Statistics
What is a p-value?
• P stands for Probability (between 0.00 and 1.00).
• P-values can indicate how incompatible the data are with a specified statistical model or hypothesis.
• The smaller the p-value, the greater the statistical incompatibility of the data with the null hypothesis (evidence to Reject H0) , if the underlying assumptions used to calculate the p-value are true.
• Conversely, the larger the p-value, the greater the statistical compatibility of the data with the null hypothesis (No evidence to Reject H0), if the underlying assumptions used to calculate the p-value are true.
P-value Example
Historically the length of stay for patients is 10 days. It is hypothesized that a new therapy will reduce the length of stay by at least 2 days.
1) Define Hypotheses:
H0 : New Therapy Mean = 10 days (current mean)
Ha : New Therapy Mean < 10 days, where the clinically important target is 8 (10 – 2) days or fewer
2) Determine appropriate sample size, n, based on clinical difference, confidence level and power.
3) Randomly select n test units from population.
4) Calculate appropriate statistic from sample of n test units: New Therapy Mean = 8 days
5) Calculate p-value = Probability of “observing” a New Therapy Mean of 8 days or fewer (a result at least as extreme as the one observed), assuming the sample is randomly selected from the hypothesized normal population with mean = 10 and standard deviation of 1.0.
For this scenario the calculated p-value = 0.02.
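The p-value of 0.02 can be reproduced with a short calculation. This sketch assumes the standard deviation of 1.0 refers to the distribution of the sample mean (an assumption, since the slide does not say so), and that the test is one-sided (lower tail), matching the direction of Ha. The variable names are illustrative.

```python
from statistics import NormalDist

H0_MEAN = 10.0       # hypothesized mean length of stay (days)
SD_OF_MEAN = 1.0     # assumption: standard deviation of the sample mean
observed_mean = 8.0  # New Therapy Mean from the sample

# One-sided p-value: probability of a sample mean of 8 days or fewer if H0 is true.
p_value = NormalDist(mu=H0_MEAN, sigma=SD_OF_MEAN).cdf(observed_mean)
print(round(p_value, 2))  # prints 0.02
```

Under the same assumptions, an observed mean of 9 days gives a p-value of about 0.16 and an observed mean of 8.4 days gives about 0.05, close to the values used in the examples that follow.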
Graphical Representation of p-value
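[Figure: distribution of the sample mean under H0 : Pop. Mean = 10 days, marking the New Therapy Mean = 8 days and the p-value]
If the null hypothesis (population mean = 10 days) were true, there would be about 2 chances in one hundred (0.02) of observing a New Therapy Mean of 8 days or fewer. The smaller this probability, the less “compatible” the data are with the null hypothesis.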
6) Make decision about H0 based on evidence collected.
• The smaller the p-value, the greater the statistical incompatibility of the data with the null hypothesis (Reject H0), if the underlying assumptions used to calculate the p-value are true.
• Conversely, the larger the p-value, the greater the statistical compatibility of the data with the null hypothesis (Fail to Reject H0), if the underlying assumptions used to calculate the p-value are true.
• Is the calculated p-value = 0.02 small or large?
What is a Small/Large p-value?
The Type I or Alpha (α) Error is used as the Cut-Off Point to determine whether the p-value is “small” or “large”.
What value of α is commonly chosen? α = 0.05
What is the corresponding confidence level? 1.00 – 0.05 = 0.95
Statistics is Easy ☺
If p-value < α, Then Reject H0 in favor of Alternative Ha
[Figure: p-value scale with cut-off at alpha; Reject H0 below alpha, Fail to Reject H0 at or above alpha]
If the p is Low, the NULL must GO!
Example 1 – “Small” P-value
If New Therapy Mean = 8 Days, then p-value = 0.02.
Since p-value = 0.02 is less than α = 0.05, the New Therapy Mean of 8 days is incompatible with the null hypothesis that the New Therapy Mean is 10 days.
Conclusion: Reject H0. The New Therapy statistically reduces the length of stay from the current 10 days (α = 0.05), and the observed reduction of 2 days meets the desired clinical difference.
[Figure: P-value = 0.02 falls below Alpha = 0.05, in the Reject H0 region]
Example 2 – “Large” P-value
If New Therapy Mean = 9 Days, then p-value = 0.15.
Since p-value = 0.15 is greater than or equal to α = 0.05, the New Therapy Mean of 9 days is compatible with the null hypothesis that the New Therapy Mean is 10 days.
Conclusion: Fail to Reject H0. The New Therapy Mean is not statistically different from the current mean of 10 days (α = 0.05).
[Figure: P-value = 0.15 falls above Alpha = 0.05, in the Fail to Reject H0 region]
Example 3 – ?? P-value
If New Therapy Mean = 8.4 Days, then p-value = 0.05.
Since p-value = 0.05 is greater than or equal to α = 0.05, the New Therapy Mean of 8.4 days is compatible with the null hypothesis.
Conclusion: Fail to Reject H0. The New Therapy Mean of 8.4 days is not statistically different from the current 10 days (α = 0.05).
[Figure: P-value = Alpha = 0.05, on the boundary between the Reject H0 and Fail to Reject H0 regions]
P-Value Examples 1-3
Ex. | New Therapy Mean | H0: Mean = | Mean Difference | P-value | Alpha | Decision about H0 | New Therapy is:
1 | 8.0 | 10.0 | -2.0 | 0.02 | 0.05 | Reject | Effective
2 | 9.0 | 10.0 | -1.0 | 0.15 | 0.05 | Fail to Reject | Non-Effective
3 | 8.4 | 10.0 | -1.6 | 0.05 | 0.05 | Fail to Reject | Non-Effective
• The p-value provides the degree of compatibility with H0.
• P-values just on either side of alpha (0.04 vs. 0.05) must be interpreted with great care!
• Results of a p-value MUST NOT be interpreted in a vacuum!!
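The decision column of the table follows a single rule, sketched below with the three p-values from the slides. The function name `decide` is hypothetical; the strict inequality mirrors the rule "If p-value < α, then Reject H0", which is why Example 3 (p-value equal to alpha) is not rejected.

```python
ALPHA = 0.05

def decide(p_value, alpha=ALPHA):
    """Apply the cut-off rule: reject only when the p-value is strictly below alpha."""
    return "Reject H0" if p_value < alpha else "Fail to Reject H0"

# p-values for Examples 1-3, taken from the table above
examples = {1: 0.02, 2: 0.15, 3: 0.05}
decisions = {ex: decide(p) for ex, p in examples.items()}

for ex, decision in decisions.items():
    print(f"Example {ex}: p-value = {examples[ex]:.2f} -> {decision}")
```

A p-value of 0.049 and one of 0.051 would land on opposite sides of this rule despite carrying nearly identical evidence, which is the reason for the caution about values close to alpha.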
The Great p-value Controversy!
1. Typically used as “black” and “white” cut-off for significant vs. non-significant without regard to study design and sample size.
2. Some publications have BANNED the use of p-values due to “Non-Repeatable Results”.
3. The American Statistical Association (ASA) issued a statement [2]: “…Must be used in context of study design, power of test and clinical significance.”
[2] Ronald L. Wasserstein & Nicole A. Lazar (2016): The ASA's statement on p-values: context, process, and purpose, The American Statistician, DOI:10.1080/00031305.2016.1154108. To link to this article: http://dx.doi.org/10.1080/00031305.2016.1154108
What Do you think is the Value of P?
Must be able to Match Statistical Difference with Clinical Difference
through proper Study Design and Sample Size !
Summary Statistics
• The following Sample Statistics should be summarized depending on the type of outcome measured:
• Central Tendency or Location:
– Mean, Median (50th %tile)
• Dispersion or spread of data:
– Standard deviation, Range (Max – Min),
– Interquartile range (75th %tile - 25th %tile)
• Sample size
• Proportions or rates (numerator/denominator)
• Confidence Interval for Sample Statistics
Summary Statistics Table
Group | Sample Size | Mean | Std. Dev. | Confidence Interval | P-value | Decision about H0
Test | n1 | X̄1 | s1 | (LCL, UCL) | |
Control | n2 | X̄2 | s2 | (LCL, UCL) | |
Test - Control | Total | Difference | pooled | (LCL, UCL) | 0.05 | Accept or Reject
Note: Estimate of dispersion should always be provided either as standard deviation or confidence intervals!!
Confidence Interval
• The Sample Statistic of interest (e.g. Sample Mean) is estimated from the random sample of n test units.
• The confidence interval (CI) on the estimated statistic provides a range of expected values if the experiment were repeated numerous times.
• The CI provides the precision of the estimated statistic.
• A CI takes the form:
– Estimate ± Delta,
– where Delta depends on the sample size, variability and confidence level.
– Delta corresponds to the statistically detectable difference.
• Lower and Upper Conf. Limits for Sample Mean:
$(LCL, UCL) = \bar{X} \pm t_{\alpha}\,(s/\sqrt{n})$
where $t_{\alpha}$ = constant based on the confidence level ≈ 2.0 for 95% confidence level, and $s/\sqrt{n}$ = standard deviation of the mean.
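As a worked illustration of the confidence limits for a sample mean, the sketch below uses the approximation of 2.0 for the constant at a 95% confidence level, as given on the slide. The data values and the function name `confidence_limits` are hypothetical. For small samples the exact t constant is somewhat larger than 2.0, so treat the result as approximate.

```python
from math import sqrt
from statistics import mean, stdev

def confidence_limits(sample, t_alpha=2.0):
    """Return (LCL, UCL) = sample mean ± t_alpha * s / sqrt(n)."""
    n = len(sample)
    x_bar = mean(sample)
    delta = t_alpha * stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    return x_bar - delta, x_bar + delta

# Hypothetical lengths of stay (days) for 10 patients
lengths_of_stay = [7, 9, 8, 10, 6, 8, 9, 7, 8, 8]
lcl, ucl = confidence_limits(lengths_of_stay)
print(f"Mean = {mean(lengths_of_stay):.1f}, 95% CI ≈ ({lcl:.2f}, {ucl:.2f})")
```

Doubling the sample size while holding the variability fixed shrinks Delta by a factor of about 1.4, which is how sample size controls the precision of the estimate.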
What is Confidence Level ?
• The confidence level (%) is a measure of how often the estimated limits capture the true value if experiments of sample size, N, are repeatedly taken.
• Suppose 10 experiments with sample size of 50 each are conducted and the 90% confidence limits on the true mean are calculated.
• A 90% confidence level implies that, in the long run, the estimated interval will capture the true mean about 9 out of 10 times.
• The true mean is only known if the entire population is sampled.
90% Confidence Level Example
[Figure: 90% Confidence Limits on True Population Mean, $\bar{X} \pm t\,(s/\sqrt{N})$, plotted for Experiment Numbers 1 to 10 against the True Mean; vertical axis from 100 to 104]
In the real world, only 1 experiment is run. (But which one?)
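The long-run meaning of a 90% confidence level can be checked by simulation. The sketch below repeats the experiment many times with a sample size of 50 and counts how often the interval captures the true mean. The true mean of 102, standard deviation of 3, number of repetitions, and function name are hypothetical, and a normal constant is used in place of t as an approximation for n = 50.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def coverage(true_mean=102.0, sd=3.0, n=50, level=0.90, experiments=2000, seed=7):
    """Fraction of simulated experiments whose CI captures the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # normal approximation to t
    hits = 0
    for _ in range(experiments):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        x_bar = mean(sample)
        delta = z * stdev(sample) / sqrt(n)
        if x_bar - delta <= true_mean <= x_bar + delta:
            hits += 1
    return hits / experiments

observed_coverage = coverage()
print(f"Intervals capturing the true mean: {observed_coverage:.1%}")
```

Any single experiment either captures the true mean or misses it; the 90% describes the procedure over many repetitions, not one particular interval.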
Using Confidence Interval to Test Hypotheses
• A CI may be used to graphically test the null hypothesis and will provide the same conclusion as the p-value approach.
• Similar to game of horseshoes!
• Game of Horseshoes Analogy:
– Stake in ground is the Hypothesized value, H0: μ = 10
– Horseshoe represents calculated CI.
– Throwing horseshoe represents running experiment.
– If horseshoe captures stake, then H0 is NOT Rejected.
– If horseshoe misses stake, then H0 is Rejected.
Statistics Makes Sense
Example 1 (p-value = 0.02) Using CI to Test Hypotheses
If H0 is Not captured in the Confidence Interval, Then Reject H0 (Incompatible with H0)
[Figure: confidence interval (LCL, UCL) around Test Mean = 8 with Delta = 1.6 = statistical difference; H0 : μ = 10 lies outside the interval]
A p-value of 0.02 < alpha = 0.05 corresponds to the 95% confidence interval not capturing the hypothesized value.
Example 2 (p-value = 0.15) Using CI to Test Hypotheses
If H0 is included in the Confidence Interval, Then Fail to Reject H0 (Compatible with H0)
[Figure: confidence interval (LCL, UCL) around Test Mean = 9 with Delta = 1.6; H0 : μ = 10 lies inside the interval]
A p-value of 0.15 ≥ alpha = 0.05 corresponds to the hypothesized value being embedded within the 95% confidence interval.
Example 3 (p-value = 0.05) Using CI to Test Hypotheses
If H0 is included in the Confidence Interval, Then Fail to Reject H0 (Compatible with H0)
[Figure: confidence interval (LCL, UCL) around Test Mean = 8.4 with Delta = 1.6; H0 : μ = 10 lies at the UCL]
A p-value of 0.05 corresponds to one of the 95% confidence limits being equal to the hypothesized value (barely capturing it, so the null hypothesis is not rejected).
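The three confidence-interval examples can be reproduced with the values on the slides (H0: μ = 10, Delta = 1.6). The function name `ci_decision` is hypothetical, and the limits are rounded so that the boundary case in Example 3 is not disturbed by floating-point error.

```python
H0_MEAN = 10.0
DELTA = 1.6  # statistical difference from the slides

def ci_decision(test_mean, h0=H0_MEAN, delta=DELTA):
    """Return the interval and the decision: H0 is kept if the CI captures it."""
    lcl = round(test_mean - delta, 6)
    ucl = round(test_mean + delta, 6)
    captured = lcl <= h0 <= ucl
    decision = "Fail to Reject H0" if captured else "Reject H0"
    return (lcl, ucl), decision

results = {mean_value: ci_decision(mean_value) for mean_value in (8.0, 9.0, 8.4)}
for mean_value, (interval, decision) in results.items():
    print(f"Test Mean = {mean_value}: CI = {interval} -> {decision}")
```

The decisions agree with the p-value approach in all three examples, including the boundary case where the upper limit equals the hypothesized value.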
Interpreting Results
1. Proportions,
2. Percentages,
3. Rates, and
4. Ratios
To properly interpret results, must know: Numerator and Denominator!
Proportions, Percentages, Rates and Ratios
Group | Numerator | Denominator | Numerator/Denominator | Range | Example
Proportion (decimal) | x = No. Incidences | n = No. Opportunities | p = x/n | (0.0, 1.0) | No. Med Errors / No. Admins. = 0.02
Percentage (%) | x = No. Incidences | n = No. Opportunities | 100(p) | (0.0, 100%) | % Med Errors = 2.0%
Rate (per unit of measure) | x = No. Incidences | Per Opportunities | x per unit of measure | > 0 | 2.4 BSI per 1000 central-line days
Ratio (proportions or rates) | Test | Control | Test/Control (unitless) | | Test/Control Survival Rate
BSI : Bloodstream Infection
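The example column of the table can be reproduced from a numerator and a denominator. The counts below are hypothetical, chosen only so that the results match the example values in the table (0.02, 2.0%, and 2.4 BSI per 1000 central-line days).

```python
# Hypothetical counts, chosen to match the example values in the table
med_errors, administrations = 20, 1000
bsi_events, central_line_days = 12, 5000

proportion = med_errors / administrations                  # x / n
percentage = 100 * proportion                              # 100(p)
bsi_rate_per_1000 = 1000 * bsi_events / central_line_days  # x per 1000 units

print(f"Proportion of med errors: {proportion:.2f}")
print(f"Percentage of med errors: {percentage:.1f}%")
print(f"BSI rate: {bsi_rate_per_1000:.1f} per 1000 central-line days")
```

The same 2.0% could come from 2 errors in 100 administrations or 200 in 10,000, which is why the numerator and denominator must be reported alongside the result.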
Confidence Interval on Ratios
Results are often presented as ratio of two rates or proportions (Hazard Ratio, Relative Risk, Odds Ratio).
1. Ratio = Test Rate / Control Rate
2. Ratio = 1 implies Test Rate = Control Rate
3. Ratio > 1 implies Test Rate > Control Rate
4. Ratio < 1 implies Test Rate < Control Rate
Testing a ratio with a confidence interval:
1. H0: Ratio = 1.0 vs. Ha: Ratio ≠ 1.0
2. If CI on Ratio captures hypothesized value of 1.0, then no evidence to reject H0.
3. If CI on Ratio does not capture hypothesized value of 1.0, then there is evidence to reject H0 in favor of the alternative Ratio ≠ 1.0
4. Depending on direction, Test group either increases or reduces rate (e.g. infection) relative to the Control group rate.
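A minimal sketch of this test for a relative risk is shown below. The slides do not say how the interval on a ratio is calculated; this example assumes the common large-sample method that builds the interval on the log scale, and the event counts and function name are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def relative_risk_ci(events_test, n_test, events_control, n_control, level=0.95):
    """Relative risk (Test / Control) with a log-scale large-sample CI (assumed method)."""
    rr = (events_test / n_test) / (events_control / n_control)
    se_log_rr = sqrt(1 / events_test - 1 / n_test + 1 / events_control - 1 / n_control)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return rr, exp(log(rr) - z * se_log_rr), exp(log(rr) + z * se_log_rr)

# Hypothetical infection counts: 15 of 500 in the Test group, 30 of 500 in Control
rr, lcl, ucl = relative_risk_ci(15, 500, 30, 500)
decision = "Fail to Reject H0" if lcl <= 1.0 <= ucl else "Reject H0"
print(f"Ratio = {rr:.2f}, 95% CI = ({lcl:.2f}, {ucl:.2f}) -> {decision}")
```

With these made-up counts the interval lies entirely below 1.0, so the Test group would be judged to reduce the infection rate relative to the Control group.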
Statistically Significant or Not Significant?
Is that the Only Question?
Learning Objectives
Results are only as reliable as the foundation that they are built upon.
[Figure: layers from top to bottom: Reliable Results, Data Analysis, Data Collection, Study Design]
I’d like to thank Loretta Dorn, CRNI, Member of the National Council on Education (NCOE), INS, for the opportunity to share with you that Statistics is Fun ☺, Easy and Makes Sense!
Thank YOU for Your Participation