Biostatistics and Evaluation of Evidence
April 23, 2015
Evidence-Based Medicine
"the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients."
1. clinical decisions should be based on the best-available scientific evidence
2. the clinical problem, rather than habits or protocols, should determine the type of evidence to be sought
3. identifying the best evidence means using epidemiological and biostatistical ways of thinking
4. conclusions derived from identifying and critically appraising evidence are useful only if put into action in managing patients or making health care decisions
5. performance should be constantly evaluated
Study Designs
Retrospective Studies: Chart Reviews
a. Case Report
b. Case Series (arbitrarily defined as < 10 subjects)
c. Case-Control Series
d. Retrospective Cohort Studies
e. Meta-Analyses
Prospective Studies
a. Cross-Sectional Survey
b. Prospective Cohort Study
c. Randomized Controlled Trial
Issues of concern in surgical trials
• Who should perform the surgery: experts only, or surgeons of varying ability?
• Standardization of the procedure, so that it is performed similarly by all participants and can be duplicated by others following the trial.
Outcome Studies
"a multidisciplinary field of inquiry, both basic and applied, that examines the use, costs, quality and accessibility, delivery, organization, financing and outcomes of health care services to increase knowledge and understanding of the structures, processes and effects of health services for individuals and populations."
Levels of Evidence
• At one end of the spectrum is an empirical impression that a practice makes physiologic sense and seems to work well.
• At the other end of the spectrum is evidence accumulated from multiple carefully conducted scientific experiments with consistent and reproducible results.
Levels of Evidence, as Stratified by the U.S. Preventive Services Task Force
Level of Evidence   Source of Evidence
I   At least one properly randomized controlled trial
II-1   Well-designed controlled trials without randomization
II-2   Well-designed cohort or case-control analytic study, preferably from more than one center or research group
II-3   Multiple time series with or without intervention, or possibly dramatic results from uncontrolled trials
III   Opinions from respected authorities based on clinical experience, descriptive studies and case reports, or committees of experts
Levels of Evidence cont.
Category of Recommendation Basis of Recommendation
Level A Good and consistent scientific evidence
Level B Limited or inconsistent scientific evidence
Level C Consensus and/or expert opinion
Oxford Centre for Evidence Based Medicine Levels of Evidence
Level Therapy/prevention, etiology/harm
1a Systematic review of RCTs
1b Individual RCT
1c All-or-none case series
2a Systematic review of cohort studies
2b Individual cohort studies
2c Outcomes research
3a Systematic review of case-control studies
3b Individual case-control study
4 Case series
5 Expert opinion
Grades of Recommendation, Assessment, Development, and Evaluation (GRADE)
• Quality of evidence has two definitions:
  • In the context of a systematic review, the ratings of the quality of evidence reflect the extent of our confidence that the estimates of the effect are correct.
  • In the context of making recommendations, the quality ratings reflect the extent of our confidence that the estimates of an effect are adequate to support a particular decision or recommendation.
GRADE’s Levels of Evidence
Quality level   Current definition
High   We are very confident that the true effect lies close to that of the estimate of the effect
Moderate   We are moderately confident in the effect estimate: The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different
Low   Our confidence in the effect estimate is limited: The true effect may be substantially different from the estimate of the effect
Very low   We have very little confidence in the effect estimate: The true effect is likely to be substantially different from the estimate of effect
How Data Are Expressed and Summarized
Measures of Location
• Mean
• Median
• Mode
Measures of Variability
• Variance (σ²)
• Standard Deviation (σ)
• Standard Error of the Mean (σ/√n)
• Coefficient of Variation (σ/μ)
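The measures listed above can be computed with Python's standard `statistics` module. The sketch below uses hypothetical data, and the function name `summarize` is illustrative. It uses the sample (n − 1) variance, which estimates the population σ² on the slide. The standard error is the sample SD divided by √n.

```python
import math
import statistics

def summarize(values):
    """Hypothetical helper: location and variability measures for one sample."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)               # sample standard deviation, s
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),        # most frequent value
        "variance": statistics.variance(values),  # sample variance, s**2
        "sd": sd,
        "se": sd / math.sqrt(n),                # standard error of the mean, s / sqrt(n)
        "cv": sd / mean,                        # coefficient of variation (unitless)
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]                # hypothetical measurements
summary = summarize(data)
for name, value in summary.items():
    print(f"{name:>8}: {value:.3f}")
```

Because the SE shrinks with √n, the SE describes the precision of the mean, while the SD describes the spread of individual observations.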
Significance Testing
H0: μ = μ0 vs. H1: μ ≠ μ0
Our decision \ True situation | H0 true | H1 true
Accept H0 | Correct decision | Type II error
Reject H0 | Type I error | Correct decision
α = P(reject H0 | H0 is true)
β = P(accept H0 | H1 is true)
Power = P(reject H0 | H1 is true) = 1 − β
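To make α, β and power concrete, here is a sketch for the simplest case: a two-sided one-sample z-test with σ assumed known. The function name `z_test_power` and the example numbers are hypothetical.

Under H1, the test statistic is normal with mean δ√n/σ, where δ = μ − μ0. Power is therefore the probability that this statistic falls beyond either critical value ±z(1−α/2).

```python
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power = 1 - beta for a two-sided one-sample z-test (sigma assumed known).

    delta is the assumed true difference mu - mu0.
    """
    z = NormalDist()                      # standard normal
    z_crit = z.inv_cdf(1 - alpha / 2)     # two-sided critical value
    shift = delta * n ** 0.5 / sigma      # mean of the test statistic under H1
    # Reject H0 when |Z| > z_crit, where Z ~ N(shift, 1) under H1
    return z.cdf(-z_crit - shift) + (1 - z.cdf(z_crit - shift))

power = z_test_power(delta=5, sigma=10, n=32)
print(f"power = {power:.3f}, beta = {1 - power:.3f}")
```

With δ = 0 (H0 actually true), the same expression returns α, the Type I error rate.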
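P-Value for H0: μ = μ0
[Figure: standard normal curves with the p-value shaded in the tail(s) for each alternative hypothesis]
• Upper-tailed, H1: μ > μ0: shaded area to the right of z
• Lower-tailed, H1: μ < μ0: shaded area to the left of −z
• Two-tailed, H1: μ ≠ μ0: shaded areas to the left of −z and to the right of z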
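The three tail areas can be computed directly from the standard normal CDF. This sketch uses a hypothetical function name `z_p_value`. It assumes the observed statistic is already standardized as z = (x̄ − μ0)/(σ/√n).

```python
from statistics import NormalDist

def z_p_value(z, alternative="two-sided"):
    """P-value for an observed z statistic under H0: mu = mu0."""
    phi = NormalDist().cdf
    if alternative == "greater":      # upper-tailed, H1: mu > mu0
        return 1 - phi(z)
    if alternative == "less":         # lower-tailed, H1: mu < mu0
        return phi(z)
    if alternative == "two-sided":    # two-tailed, H1: mu != mu0
        return 2 * (1 - phi(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")

for alt in ("greater", "less", "two-sided"):
    print(alt, round(z_p_value(1.96, alt), 4))
```

For the same z, the two-tailed p-value is twice the one-tailed value. This is why the alternative hypothesis should be chosen before looking at the data.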
Continuous Data – Normal Distribution
# of Groups | Test | Null Hypothesis
1 | One-sample t-test | H0: µ = µ0
1 (2 measures) | Paired t-test | H0: δ = 0
2 (equal variances) | Unpaired t-test for groups with equal σ² | H0: µ1 = µ2
2 (unequal variances) | Unpaired t-test for groups with unequal σ² | H0: µ1 = µ2
3 or more | Analysis of Variance (ANOVA) | H0: µ1 = µ2 = … = µp *
2 or more, with adjustment for prognostic factors | Analysis of Covariance (ANCOVA) | H0: µ1 = µ2 = … = µp
* H1: not all of µ1, µ2, …, µp are equal (at least one group mean differs).
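The three t-tests in the table differ mainly in how the standard error of the difference is formed. The sketch below computes each t statistic and its degrees of freedom from the standard library. The function names and data are hypothetical.

The standard library has no t-distribution CDF, so p-values would come from a t table or a package such as SciPy (`scipy.stats.ttest_rel`, and `scipy.stats.ttest_ind` with `equal_var=False` for the unequal-variance case).

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic for H0: delta = 0 (mean within-pair difference is zero)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1                       # statistic, degrees of freedom

def pooled_t(x, y):
    """Unpaired t statistic assuming equal variances (pooled variance)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2

def welch_t(x, y):
    """Unpaired t statistic for unequal variances, with Welch-Satterthwaite df."""
    vx, vy = statistics.variance(x) / len(x), statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

x = [1, 2, 3, 4, 5]                       # hypothetical group / "before" values
y = [2, 4, 6, 8, 10]                      # hypothetical group / "after" values
print(paired_t(x, y), pooled_t(x, y), welch_t(x, y))
```

For the same numbers, the paired analysis gives a larger |t| here. Pairing removes between-subject variability from the standard error, which is why matched measurements should not be analyzed as independent groups.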
Continuous Data – Non-Normal Distribution
# of Groups | Test | Null Hypothesis
1 | Sign test; Wilcoxon signed-rank test | H0: median = m0
1 (2 measures) | Sign test; Wilcoxon signed-rank test | H0: median difference = 0
2 | Mann-Whitney U test (Wilcoxon rank-sum test) | H0: m1 = m2
3 or more | Kruskal-Wallis test | H0: m1 = m2 = … = mp
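The sign test is the simplest of these nonparametric tests. It uses only the direction of each difference. Under H0 (median difference = 0), the number of positive differences follows a Binomial(n, ½) distribution.

The sketch below is a hypothetical helper that computes the exact two-sided p-value. It assumes the common convention of dropping zero differences.

```python
from math import comb

def sign_test(diffs):
    """Exact two-sided sign test of H0: median difference = 0.

    Zero differences are discarded (a common convention).
    """
    pos = sum(d > 0 for d in diffs)
    neg = sum(d < 0 for d in diffs)
    n = pos + neg
    k = min(pos, neg)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n   # P(X <= k)
    return min(1.0, 2 * tail)

# Hypothetical paired differences (after - before)
diffs = [1.2, 0.8, 2.0, -0.3, 1.5, 0.9, 1.1, 0.4]
print(sign_test(diffs))
```

The Wilcoxon signed-rank test also uses the ranks of the absolute differences, so it is usually more powerful than the sign test when the differences are roughly symmetric.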
Measures of Association
Measures | Parametric | Test | Hypothesis
Independent | Yes | Pearson’s correlation coefficient | H0: ρ = 0 (−1 ≤ ρ ≤ 1)
Independent | No | Spearman’s rank coefficient; Kendall’s τ | H0: ρ = 0 (−1 ≤ ρ ≤ 1)
Independent + Dependent | Yes | Simple linear regression: y = α + βx + ε | H0: β = 0; r² (0 ≤ r² ≤ 1)
Independent + Dependent | Yes | Multiple linear regression: y = α + β1x1 + β2x2 + … + βkxk + ε | H0: β1 = … = βk = 0
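Pearson's r, Spearman's coefficient and simple least-squares regression are closely related, as the sketch below shows. All names and data are hypothetical. Spearman's coefficient is computed here as Pearson's r applied to ranks, with tied values given their average rank. The r² returned for simple regression is the square of Pearson's r.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman_rho(x, y):
    return pearson_r(ranks(x), ranks(y))

def simple_linear_regression(x, y):
    """Least-squares fit of y = alpha + beta * x; returns (alpha, beta, r_squared)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    alpha = my - beta * mx
    return alpha, beta, pearson_r(x, y) ** 2

x = [1, 2, 3, 4]                    # hypothetical predictor
print(simple_linear_regression(x, [5, 7, 9, 11]))
print(spearman_rho(x, [1, 4, 9, 16]))   # monotone but not linear
```

Because Spearman's coefficient only looks at ranks, it equals 1 for any strictly increasing relationship, whereas Pearson's r measures linear association.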
Categorical Data
Measure | Test | Hypothesis
Contingency table | Chi-squared (χ²) (approximate); Fisher’s exact test | H0: independence
Paired observations | McNemar’s test |
With adjustment for prognostic factors | Mantel-Haenszel χ²; logistic regression |
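Fisher's exact test is preferred over the χ² approximation when expected cell counts are small. The sketch below implements the two-sided version for a 2×2 table using only `math.comb`; the function name is hypothetical.

With the row and column totals fixed, the top-left cell follows a hypergeometric distribution under H0 (independence). The two-sided p-value sums the probabilities of every table that is no more likely than the observed one. This is one common two-sided convention, and other definitions exist.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are not lost to rounding
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))

print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))   # hypothetical counts
```

For large tables, the χ² test gives a close approximation at much lower cost.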
Diagnostic Medicine
Test \ Disease | D+ | D−
T+ | A | B
T− | C | D
Sensitivity = P(T+ | D+) = A / (A + C)
Specificity = P(T− | D−) = D / (B + D)
PPV = P(D+ | T+) = A / (A + B)
NPV = P(D− | T−) = D / (C + D)
FPR = P(T+ | D−) = B / (B + D)
FNR = P(T− | D+) = C / (A + C)
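The six definitions above translate directly into code once the table cells are labeled. The sketch below uses hypothetical counts and a hypothetical function name. Sensitivity and specificity are conditioned on disease status. PPV and NPV are conditioned on the test result, so they also depend on how common the disease is among the people tested.

```python
def diagnostic_metrics(A, B, C, D):
    """2x2 table with rows T+/T- and columns D+/D-:
         T+: A (true positives)   B (false positives)
         T-: C (false negatives)  D (true negatives)
    """
    return {
        "sensitivity": A / (A + C),   # P(T+ | D+)
        "specificity": D / (B + D),   # P(T- | D-)
        "PPV": A / (A + B),           # P(D+ | T+)
        "NPV": D / (C + D),           # P(D- | T-)
        "FPR": B / (B + D),           # P(T+ | D-) = 1 - specificity
        "FNR": C / (A + C),           # P(T- | D+) = 1 - sensitivity
    }

m = diagnostic_metrics(A=90, B=40, C=10, D=860)   # hypothetical counts
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```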
Relative Risk
• Relative risk compares the risk of disease among those exposed to a certain agent with the risk of disease among those not exposed to the same agent.
Agent \ Disease | Yes | No
Yes | a | b
No | c | d
Relative Risk = [a / (a + b)] / [c / (c + d)]
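A minimal sketch of the relative-risk calculation, with hypothetical counts and a hypothetical function name. It assumes the layout above: rows are exposure to the agent, and columns are disease yes/no.

```python
def relative_risk(a, b, c, d):
    """Rows = agent (exposed yes/no), columns = disease (yes/no)."""
    risk_exposed = a / (a + b)       # incidence among exposed
    risk_unexposed = c / (c + d)     # incidence among unexposed
    return risk_exposed / risk_unexposed

rr = relative_risk(a=30, b=70, c=10, d=90)   # 0.30 / 0.10, i.e. about 3
print(round(rr, 2))
```

Because it needs incidence in both exposure groups, relative risk is usually estimated from cohort studies or trials rather than from case-control designs.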
Odds Ratio
• If P1 is the rate at which an event occurs in a population and Q1 = 1 − P1, then the odds associated with that event are P1/Q1. Similarly, the odds associated with the event in a second population are P2/Q2. The odds ratio is simply the ratio of these two odds: (P1/Q1) / (P2/Q2).
Event \ Group | Cases | Controls
Yes | a | c
No | b | d
Odds Ratio = (a/b) / (c/d) = ad/bc
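The sketch below checks that the cross-product ad/bc equals the ratio of the two odds P/Q defined above, with Q = 1 − P. The counts and function names are hypothetical. The layout follows the table: a and b are cases with and without the event, and c and d are controls with and without it.

```python
def odds(p):
    """Odds P/Q for an event with probability P, where Q = 1 - P."""
    return p / (1 - p)

def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc for the case-control table above."""
    return (a * d) / (b * c)

a, b, c, d = 20, 80, 10, 90                      # hypothetical counts
or_cross = odds_ratio(a, b, c, d)
or_from_odds = odds(a / (a + b)) / odds(c / (c + d))
print(or_cross, round(or_from_odds, 6))          # both 2.25
```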
Time to Event with Censored Data
Method | Test | Hypothesis
Actuarial or life table; Kaplan-Meier | Log-rank test; Wilcoxon test | H0: curves are equal
With adjustment for prognostic factors: Cox’s proportional hazards
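The Kaplan-Meier estimator handles censoring by removing censored subjects from the risk set without counting them as events. At each event time t, survival is multiplied by (1 − deaths/at risk).

The sketch below uses hypothetical data and a hypothetical function name. When an event and a censoring share a time, it follows the usual convention that the censored subject is still at risk at that time. Log-rank tests and Cox models are normally run with a dedicated survival-analysis package rather than written by hand.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time for each subject
    events : 1 if the event occurred at that time, 0 if censored
    Returns a list of (time, S(t)) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Count every subject whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored   # both leave the risk set after time t
    return curve

curve = kaplan_meier(times=[1, 2, 3, 4, 5], events=[1, 0, 1, 1, 0])
print(curve)
```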
Outlier
• An observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data.
A Practical Approach for Handling Outliers:
• Test for normality
• Transform the data
• Use nonparametric statistics
• Exclude the outliers
Any special handling of outliers should be reported.
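The sketch below illustrates why nonparametric statistics and transformations help with outliers. It uses hypothetical measurements in which one value (95.0) is assumed to be the suspected outlier. The mean shifts a great deal when that value is present, while the median barely moves. A log transform, which requires strictly positive data, compresses the long right tail.

```python
import math
import statistics

# Hypothetical measurements; 95.0 is the suspected outlier
values = [4.1, 3.8, 4.5, 4.0, 4.3, 3.9, 95.0]
without = [v for v in values if v != 95.0]

print("mean  :", round(statistics.mean(values), 2), "vs", round(statistics.mean(without), 2))
print("median:", statistics.median(values), "vs", statistics.median(without))

# Log transform: the range shrinks from about 91 units to about 3.2 log units
logged = [math.log(v) for v in values]
print("range :", round(max(values) - min(values), 2), "vs", round(max(logged) - min(logged), 2))
```

Whichever option is chosen, the excluded or transformed values should be reported.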
Criteria for Evaluating Studies/Papers
• Is the hypothesis clearly stated?
• Will the design of the experiment test the hypothesis?
• Is there a power analysis/sample size justification?
• If applicable, are blinding and randomization used appropriately?
• Is the proper control group used?
• Are the proper statistical tests used?
What study design is accepted as the best for establishing treatment effectiveness?
a) Retrospective case-control series
b) Prospective randomized clinical trial
c) Meta-analyses
d) Prospective cohort study
What is considered the lowest level of evidence in evidence-based medicine?
a) Single prospective randomized controlled trial
b) Systematic review of case-control studies
c) Expert opinion
d) Well-designed controlled trial without randomization (prospective cohort study)
What is a type of error associated with prospective randomized clinical trials?
a) Type I error (erroneously reject H0)
b) Type II error (erroneously accept H0)
c) Selection bias
d) Measurement bias
e) All of the above
You take adjacent pieces of colon from 10 rats. One piece is incubated with acetylcholine (ACh) and the other is incubated in buffer (control). You measure serotonin levels in the buffer bathing the mucosa before the addition of ACh (or buffer) and again ten minutes later. The change in serotonin levels in treated tissue versus controls is best compared using which statistical test? You can assume that adjacent pieces of tissue respond identically to each other.
a) Unpaired t-test
b) Pearson’s correlation coefficient
c) Paired t-test
d) Analysis of covariance
Ninety patients undergoing gastrojejunostomy have a retrocolic anastomosis. Of these, 10 develop an internal hernia with small bowel obstruction. Eight patients operated on by the same surgeon have an antecolic anastomosis, and none develop an internal hernia or small bowel obstruction over the same follow-up period. How do you compare the results?
a) Fisher’s exact test
b) Linear regression
c) McNemar test
d) Mantel-Haenszel test