Biostatistics and Evaluation of Evidence April 23, 2015


Bibliography

• The Physiologic Basis of Surgery, 4th ed. Chapter 11: Biostatistics.

• Scientific American Surgery. Competency-Based Surgical Care, Practice-Based Learning and Improvement, 4: Evidence-Based Surgery.

• Surgery: Basic Science and Clinical Evidence, 2nd ed. Chapter 2: Evidence-Based Surgery.

Evidence-Based Medicine

"the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients."

1. clinical decisions should be based on the best-available scientific evidence

2. the clinical problem, rather than habits or protocols, should determine the type of evidence to be sought

3. identifying the best evidence means using epidemiological and biostatistical ways of thinking

4. conclusions derived from identifying and critically appraising evidence are useful only if put into action in managing patients or making health care decisions

5. performance should be constantly evaluated

Study Designs

Retrospective Studies – Chart Reviews

a. Case Report

b. Case Series (arbitrarily defined as < 10 subjects)

c. Case-Control Series

d. Retrospective Cohort Studies

e. Meta Analyses

Prospective Studies

a. Cross Sectional Survey

b. Prospective Cohort Study

c. Randomized Control Trial

Issues of concern in surgical trials

• Who should perform the surgery: experts only or surgeons of varying ability.

• Standardization of the procedure so it is performed similarly by all participants and can be duplicated by others following the trial.

Outcome Studies

"a multidisciplinary field of inquiry, both basic and applied, that examines the use, costs, quality and accessibility, delivery, organization, financing and outcomes of health care services to increase knowledge and understanding of the structures, processes and effects of health services for individuals and populations."

Levels of Evidence

• At one end of the spectrum is an empirical impression that a practice makes physiologic sense and seems to work well.

• At the other end of the spectrum is evidence accumulated from multiple carefully conducted scientific experiments with consistent and reproducible results.

Levels of Evidence, as Stratified by the U.S. Preventive Services Task Force

Level of Evidence Source of Evidence

I At least one properly randomized controlled trial

II-1 Well-designed controlled trials without randomization

II-2 Well-designed cohort or case-control analytic study, preferably from more than one center or research group

II-3 Multiple time-series with or without intervention or possibly dramatic results from uncontrolled trials

III Opinions from respected authorities based on clinical experience, descriptive studies and case reports, or committees of experts

Levels of Evidence cont.

Category of Recommendation Basis of Recommendation

Level A Good and consistent scientific evidence

Level B Limited or inconsistent scientific evidence

Level C Consensus and/or expert opinion

Oxford Centre for Evidence Based Medicine Levels of Evidence

Level Therapy/prevention, etiology/harm

1a Systematic review

1b Individual RCT

1c All-or-none case series

2a Systematic review of cohort studies

2b Individual cohort studies

2c Outcomes research

3a Systematic review of case-control studies

3b Individual case-control study

4 Case series

5 Expert opinion

Grades of Recommendation, Assessment, Development, and Evaluation (GRADE)

• Quality of Evidence – 2 definitions

• In the context of a systematic review, the ratings of the quality of evidence reflect the extent of our confidence that the estimates of the effect are correct.

• In the context of making recommendations, the quality ratings reflect the extent of our confidence that the estimates of an effect are adequate to support a particular decision or recommendation.

GRADE’s Levels of Evidence

Quality level Current definition

High We are very confident that the true effect lies close to that of the estimate of the effect

Moderate We are moderately confident in the effect estimate: The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different

Low Our confidence in the effect estimate is limited: The true effect may be substantially different from the estimate of the effect

Very low We have very little confidence in the effect estimate: The true effect is likely to be substantially different from the estimate of effect

How Data Are Expressed and Summarized

Measures of Location

• Mean

• Median

• Mode

Measures of Variability

• Variance (σ²)

• Standard Deviation (σ)

• Standard Error (σ/√n)

• Coefficient of Variation (σ/μ, defined for μ ≠ 0)
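The measures listed above can be computed directly with Python's standard library. This is only a sketch: the sample values are hypothetical, and the sample estimates (n − 1 denominator) stand in for the population symbols σ² and σ on the slide.

```python
import math
import statistics

def summarize(values):
    """Return measures of location and variability for one sample."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance
        "sd": sd,
        "se": sd / math.sqrt(n),  # standard error of the mean
        "cv": sd / mean,          # coefficient of variation; undefined if mean is 0
    }

# Hypothetical data, for illustration only
sample = [2, 4, 4, 5, 7, 8]
summary = summarize(sample)
print(summary)
```

The standard error shrinks as n grows, while the standard deviation does not, so the two should not be used interchangeably when describing spread.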

Significance Testing

H0 : μ = μ0 vs. H1 : μ ≠ μ0

Our decision | True situation: H0 true | True situation: H1 true
Accept H0 | Correct decision | Type II error
Reject H0 | Type I error | Correct decision

α = P(reject H0 | H0 is true)

β = P(accept H0 | H1 is true)

Power = P(reject H0 | H1 is true) = 1 – β
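To make the link between α, β, and power concrete, the sketch below computes the power of a two-sided one-sample z-test. It assumes a known σ and a normally distributed sample mean; the function name and the numbers are hypothetical.

```python
from statistics import NormalDist

def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a two-sided z-test of H0: mu = mu0 when the true mean is mu1.

    Assumes sigma is known (an illustration, not a t-test power calculation).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under H1 the test statistic is normal with this mean and unit variance.
    shift = (mu1 - mu0) / (sigma / n ** 0.5)
    upper = 1 - std_normal.cdf(z_crit - shift)  # P(Z > z_crit | H1)
    lower = std_normal.cdf(-z_crit - shift)     # P(Z < -z_crit | H1)
    return upper + lower

# Hypothetical design: detect a 5-unit shift with sigma = 15 and n = 50
power = z_test_power(mu0=100, mu1=105, sigma=15, n=50)
beta = 1 - power
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

When mu1 equals mu0 the function returns α, which matches the definition of a Type I error rate; increasing n raises the power for a fixed difference.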

P-Value for H0 : μ = μ0

[Figure: three normal curves centered at 0 with tail areas marked at z and -z]

Upper-Tailed: H1 : μ > μ0

Lower-Tailed: H1 : μ < μ0

Two-Tailed: H1 : μ ≠ μ0
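The three alternatives correspond to three different tail areas under the standard normal curve. A minimal sketch, assuming the test statistic z is standard normal under H0 (the function name and the `alternative` labels are illustrative choices):

```python
from statistics import NormalDist

def p_value(z, alternative="two-sided"):
    """P-value for an observed z statistic under a standard normal null."""
    std_normal = NormalDist()
    if alternative == "greater":    # upper-tailed, H1: mu > mu0
        return 1 - std_normal.cdf(z)
    if alternative == "less":       # lower-tailed, H1: mu < mu0
        return std_normal.cdf(z)
    if alternative == "two-sided":  # two-tailed, H1: mu != mu0
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value(1.96, alt), 4))
```

For the same z, the two-tailed p-value is twice the smaller one-tailed value, which is why the direction of H1 should be fixed before the data are examined.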

Continuous Data – Normal Distribution

# of Groups | Test | Null Hypothesis
1 | One-Sample t-test | H0: µ = µ0
1 (2 measures) | Paired t-test | H0: δ = 0
2 (equal variances) | Unpaired t-test for groups with equal σ² | H0: µ1 = µ2
2 (unequal variances) | Unpaired t-test for groups with unequal σ² | H0: µ1 = µ2
3 or more | Analysis of Variance (ANOVA) | H0: µ1 = µ2 = … = µp *
2 or more, with adjustment for prognostic factors | Analysis of Covariance (ANCOVA) | H0: µ1 = µ2 = … = µp

* H1: at least one µi differs from the others
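As one worked instance from this family of tests, the sketch below computes the paired t statistic for H0: δ = 0, where δ is the mean within-subject difference. The data are hypothetical, and the p-value lookup is left to a t table or a statistics package because the standard library has no t distribution.

```python
import math
import statistics

def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for a paired t-test of H0: delta = 0."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [s - f for f, s in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se_diff = statistics.stdev(diffs) / math.sqrt(n)  # SE of the mean difference
    return mean_diff / se_diff, n - 1

# Hypothetical before/after measurements on the same five subjects
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 15]
t_stat, df = paired_t_statistic(before, after)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

Working on the differences is what distinguishes the paired test from the unpaired one: between-subject variability drops out, and only the variability of the differences enters the standard error.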

Continuous Data – Non-Normal Distribution

# of Groups | Test | Null Hypothesis
1 | Sign test, Wilcoxon signed rank test | H0: median = m0
1 (2 measures) | Sign test, Wilcoxon signed rank test | H0: median difference = 0
2 | Mann-Whitney U, Wilcoxon rank-sum | H0: m1 = m2
3 or more | Kruskal-Wallis test | H0: m1 = m2 = … = mp
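The sign test is the simplest of these to compute by hand. Under H0 the number of positive differences follows a Binomial(n, 1/2) distribution. The sketch below returns the exact two-sided p-value; dropping zero differences is a common convention assumed here, and the data are hypothetical.

```python
from math import comb

def sign_test_p_value(diffs):
    """Exact two-sided sign test of H0: median difference = 0."""
    nonzero = [d for d in diffs if d != 0]  # ties with zero are dropped
    n = len(nonzero)
    positives = sum(1 for d in nonzero if d > 0)
    k = min(positives, n - positives)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical paired differences: seven positive, one negative
diffs = [1.2, 0.8, -0.3, 2.1, 0.5, 1.7, 0.9, 1.1]
p = sign_test_p_value(diffs)
print(f"p = {p:.4f}")
```

Because it uses only the signs, this test ignores the size of each difference; the Wilcoxon signed rank test uses the ranks as well and is usually more powerful.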

Measures of Association

Measures | Parametric | Test | Hypothesis
Independent | Yes | Pearson’s correlation coefficient | H0: ρ = 0 (-1 ≤ ρ ≤ 1)
Independent | No | Spearman’s rank coefficient, Kendall’s τ | H0: ρ = 0 (-1 ≤ ρ ≤ 1)
Independent + Dependent | Yes | Simple linear regression: y = α + βx + ε | H0: β = 0; r² (0 ≤ r² ≤ 1)
Independent + Dependent | Yes | Multiple linear regression: y = α + β1x1 + β2x2 + … + βkxk + ε | H0: β1 = … = βk = 0
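Pearson's r and the simple linear regression coefficients come from the same sums of squares, which the sketch below makes explicit. The function name and data are hypothetical, and only point estimates are computed, not the tests of ρ = 0 or β = 0.

```python
import math

def pearson_and_regression(x, y):
    """Return r, r squared, and least-squares estimates for y = alpha + beta*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    beta = sxy / sxx                 # slope
    alpha = mean_y - beta * mean_x   # intercept
    return {"r": r, "r_squared": r * r, "alpha": alpha, "beta": beta}

# Hypothetical data lying close to a straight line
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = pearson_and_regression(x, y)
print(fit)
```

In simple linear regression r² is the square of Pearson's r, so it lies between 0 and 1 even though r itself ranges from -1 to 1.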

Categorical Data

Measure | Test | Hypothesis
Contingency table | Chi-squared (χ²) test (approximate); Fisher’s Exact test | H0: Independence
Paired observations | McNemar’s test |
With adjustment for prognostic factors | Mantel-Haenszel χ²; Logistic regression |
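For small 2×2 tables the exact test can be computed from the hypergeometric distribution. The sketch below uses one common two-sided definition, summing the probabilities of all tables no more likely than the observed one; the counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical table: 3 of 4 events in one group, 1 of 4 in the other
p = fisher_exact_two_sided(3, 1, 1, 3)
print(f"p = {p:.4f}")
```

The chi-squared test approximates this calculation and is usually reserved for tables whose expected cell counts are not small.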

Diagnostic Medicine

Test \ Disease | D+ | D-
T+ | A | B
T- | C | D

Sensitivity = P(T+ | D+) = A / (A + C)

Specificity = P(T- | D-) = D / (B + D)

PPV = P(D+ | T+) = A / (A + B)

NPV = P(D- | T-) = D / (C + D)

FPR = P(T+ | D-) = B / (B + D)

FNR = P(T- | D+) = C / (A + C)
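These six quantities all come from the same four cell counts. The sketch below uses the slide's labels (A = true positives, B = false positives, C = false negatives, D = true negatives); the counts in the example are hypothetical.

```python
def diagnostic_metrics(A, B, C, D):
    """Diagnostic test measures from a 2x2 table of test result by disease status."""
    return {
        "sensitivity": A / (A + C),  # P(T+ | D+)
        "specificity": D / (B + D),  # P(T- | D-)
        "ppv": A / (A + B),          # P(D+ | T+)
        "npv": D / (C + D),          # P(D- | T-)
        "fpr": B / (B + D),          # P(T+ | D-) = 1 - specificity
        "fnr": C / (A + C),          # P(T- | D+) = 1 - sensitivity
    }

# Hypothetical counts: 100 diseased and 200 non-diseased subjects
metrics = diagnostic_metrics(A=90, B=30, C=10, D=170)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity condition on disease status, whereas PPV and NPV condition on the test result, so the predictive values change with disease prevalence in the tested population.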

Relative Risk

• Relative risk compares the probability of disease when exposed to a certain agent with the probability of disease when not exposed to the same agent.

Relative Risk

Agent \ Disease | Yes | No
Yes | a | b
No | c | d

Relative Risk = [a / (a + b)] / [c / (c + d)]
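With the table laid out as agent (rows) by disease (columns), relative risk is conventionally the risk among the exposed, a / (a + b), divided by the risk among the unexposed, c / (c + d). A minimal sketch with hypothetical counts:

```python
def relative_risk(a, b, c, d):
    """Relative risk from a 2x2 table.

    a, b: exposed subjects with and without disease
    c, d: unexposed subjects with and without disease
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 20 of 100 exposed and 10 of 100 unexposed develop disease
rr = relative_risk(a=20, b=80, c=10, d=90)
print(f"relative risk = {rr:.2f}")
```

A value of 1 means exposure makes no difference; values above 1 indicate higher risk in the exposed group.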

Odds Ratio

• If P1 is the rate at which an event occurs in a population, then the odds associated with that event are P1/Q1, where Q1 = 1 − P1. Similarly, the odds associated with the event in a second population are P2/Q2. The odds ratio is simply the ratio of these two odds.

Odds Ratio

Event | Cases | Controls
Yes | a | c
No | b | d

Odds Ratio = (a / b) / (c / d) = ad / bc
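The two descriptions of the odds ratio, as a ratio of odds P/Q and as the cross-product ad/bc, give the same number. The sketch below checks this on hypothetical counts, using the slide's layout (a and b are cases with and without the event, c and d are controls with and without it).

```python
def odds(p):
    """Odds P/Q for an event with probability p, where Q = 1 - p."""
    return p / (1 - p)

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio ad/bc for the slide's table layout."""
    return (a * d) / (b * c)

# Hypothetical counts: event in 30 of 100 cases and 10 of 100 controls
a, b, c, d = 30, 70, 10, 90
p1 = a / (a + b)  # event rate among cases
p2 = c / (c + d)  # event rate among controls
or_from_rates = odds(p1) / odds(p2)
or_from_table = odds_ratio(a, b, c, d)
print(f"{or_from_rates:.4f} {or_from_table:.4f}")
```

When the event is rare in both groups, Q1 and Q2 are close to 1 and the odds ratio approximates the relative risk.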

Time to Event with Censored Data

Method | Test | Hypothesis
Actuarial or Life Table; Kaplan-Meier | Log-rank test; Wilcoxon test | H0: curves are equal
With adjustment for prognostic factors | Cox’s proportional hazards |
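The Kaplan-Meier method multiplies, at each observed event time, the fraction of at-risk subjects who survive that time; censored subjects leave the risk set without lowering the curve. A minimal sketch with hypothetical follow-up data (function and variable names are illustrative):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates.

    times:  follow-up time for each subject
    events: True if the event was observed, False if the subject was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored subjects both leave the risk set
    return curve

# Hypothetical data: six subjects, two censored (at times 3 and 8)
times = [2, 3, 3, 5, 8, 9]
events = [True, True, False, True, False, True]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

Subjects censored at the same time as an event are counted as still at risk for that event, which is the usual convention and the one assumed here.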

Outlier

• An observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data.

A Practical Approach for Handling Outliers:

• Test for normality

• Transform the data

• Use nonparametric statistics

• Exclude the outliers

Any special handling of outliers should be reported.

Criteria for Evaluating Studies/Papers

• Is the hypothesis clearly stated?

• Will the design of the experiment test the hypothesis?

• Is there a power analysis/sample size justification?

• If applicable, are blinding and randomization used appropriately?

• Is the proper control group used?

• Are the proper statistical tests used?

What study design is accepted as the best for establishing treatment effectiveness?

a) Retrospective case-control series

b) Prospective randomized clinical trial

c) Meta analyses

d) Prospective cohort study

What is considered the lowest level of evidence in evidence-based medicine?

a) Single prospective randomized control trial

b) Systematic review of case-control studies

c) Expert opinion

d) Well-designed control trial without randomization (prospective cohort study)

What is a type of error associated with prospective randomized clinical trials?

a) Type I error (erroneously reject H0)

b) Type II error (erroneously accept H0)

c) Selection bias

d) Measurement bias

e) All of the above

You take adjacent pieces of colon from 10 rats. One of the pieces is incubated with acetylcholine (ACh) and the other is incubated in buffer (control). Before the addition of ACh (or buffer) and ten minutes later, you measure serotonin levels in the buffer bathing the mucosa. The change in serotonin levels in treated animals versus controls is best compared using what statistical test? You can assume that the adjacent pieces of tissue respond identically to each other.

a) Unpaired t-test

b) Pearson’s correlation coefficient

c) Paired t-test

d) Analysis of covariance

Ninety patients undergoing gastrojejunostomy have a retrocolic anastomosis. Of these, 10 develop an internal hernia with small bowel obstruction. Eight patients operated on by the same surgeon have an antecolic anastomosis, and none develop internal hernia or small bowel obstruction over the same follow-up period. How do you compare the results?

a) Fisher’s Exact test

b) Linear regression

c) McNemar test

d) Mantel-Haenszel test
