
The Campbell Collaboration www.campbellcollaboration.org

C2 Training: Oslo 2009

Effect Size Calculation II: Advanced Techniques

C2 Training Materials – Oslo – May 2009 www.campbellcollaboration.org

A brief introduction to effect sizes

Meta-analysis expresses the results of each study using a quantitative index of effect size (ES).

ESs are measures of the strength or magnitude of a relationship of interest.

ESs have the advantage of being comparable (i.e., they estimate the same thing) across all of the studies and therefore can be summarized across studies in the meta-analysis.

Also, they are relatively independent of sample size.


Effect Size Basics

• Effect sizes can be expressed in many different metrics

– d, r, odds ratio, risk ratio, etc.

• So be sure to be specific about the metric!

• Effect sizes can be unstandardized or standardized

– Unstandardized = expressed in the original measurement units

– Standardized = expressed in a unit-free metric (e.g., standard deviation units for d)


Types of effect size

Most reviews use effect sizes from one of three families of effect sizes:

• the d family, including the standardized mean difference,

• the r family, including the correlation coefficient, and

• the odds ratio (OR) family, including proportions and other measures for categorical data.


Effect size computation

• Compute a measure of the “effect” of each study as our outcome

• Range of effect sizes:

– Differences between two groups on a continuous measure

– Relationship between two continuous measures

– Differences between two groups on frequency or incidence


Types of effect sizes

• Standardized mean difference

• Correlation Coefficient

• Odds Ratios


Correlational data

$$
\begin{array}{c|cc}
 & X_1 & X_2 \\ \hline
1 & X_{11} & X_{12} \\
\vdots & \vdots & \vdots \\
n & X_{n1} & X_{n2}
\end{array}
$$


Correlation Coefficient (r)

• Also a standardized effect size

• Relatively understandable to a wide range of people

• If r = 0, there is no linear relationship

• r = bivariate correlation

– two continuous variables

• r_pb = point-biserial correlation

– one continuous and one dichotomous variable

• φ = phi coefficient (pronounced “fee”)

– two dichotomous variables

$$ r = \frac{\sum z_x z_y}{n} $$
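The z-score formula for r can be checked numerically. The following is a minimal Python sketch, not taken from the slides: it assumes the z-scores are computed with the population standard deviation (divisor n), which is what makes dividing the summed cross-products by n reproduce Pearson's r. The function name `r_from_z_scores` and the sample data are illustrative.

```python
import math


def r_from_z_scores(x, y):
    """Pearson r as the mean cross-product of z-scores (population SDs, divisor n)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    z_x = [(v - mean_x) / sd_x for v in x]
    z_y = [(v - mean_y) / sd_y for v in y]
    # Sum of z_x * z_y, divided by n
    return sum(a * b for a, b in zip(z_x, z_y)) / n


print(round(r_from_z_scores([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.7746
```

If sample standard deviations (divisor n − 1) were used instead, the sum of cross-products would have to be divided by n − 1 to give the same r.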


Correlation data

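Each correlation is converted to Fisher's z-transformation, ES_Zr, for analysis:

$$ ES_r = r $$

$$ ES_{Z_r} = 0.5 \log_e\left[\frac{1 + ES_r}{1 - ES_r}\right] $$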


Standard error of z-transform

$$ SE_{Z_r} = \frac{1}{\sqrt{n - 3}} $$
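A minimal Python sketch of the Fisher z-transformation and its standard error as defined above. The function names are hypothetical; the check against `math.atanh` relies on the identity atanh(r) = 0.5 ln[(1 + r)/(1 − r)].

```python
import math


def fisher_z(r):
    """Fisher's z-transformation, 0.5 * ln((1 + r) / (1 - r)), for -1 < r < 1."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1 + r) / (1 - r))


def se_fisher_z(n):
    """Standard error of Fisher's z, 1 / sqrt(n - 3), for sample size n > 3."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    return 1 / math.sqrt(n - 3)


# Values from this section's example: r = 0.39, n = 100
print(round(fisher_z(0.39), 2), round(se_fisher_z(100), 2))  # 0.41 0.1
```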


Example

$$ ES_r = 0.39 $$

$$ ES_{Z_r} = 0.5 \log_e\left[\frac{1 + 0.39}{1 - 0.39}\right] = 0.41 $$


Standard error of z-transform

$$ SE_{Z_r} = \frac{1}{\sqrt{100 - 3}} = 0.10 $$


95% confidence interval for z

$$ 0.41 \pm 1.96 \times 0.10 = [0.21,\ 0.61] $$


To translate back to r-metric

$$ r = \frac{e^{2 ES_{Z_r}} - 1}{e^{2 ES_{Z_r}} + 1} $$


Confidence interval in r-metric

[0.21, 0.54]
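The full sequence for the r = 0.39, n = 100 example (transform, build the interval on the z scale, back-transform the limits to r) can be sketched as below. This is an illustrative sketch with a hypothetical function name; it uses 1.96 as the critical value, as the slides do, and works from unrounded intermediate values.

```python
import math


def correlation_ci(r, n, z_crit=1.96):
    """CI for r built on Fisher's z scale, then back-transformed to the r metric."""
    z = 0.5 * math.log((1 + r) / (1 - r))  # ES_Zr
    se = 1 / math.sqrt(n - 3)              # SE of ES_Zr
    z_lo, z_hi = z - z_crit * se, z + z_crit * se

    def z_to_r(v):
        # r = (e^(2z) - 1) / (e^(2z) + 1)
        return (math.exp(2 * v) - 1) / (math.exp(2 * v) + 1)

    return (z_lo, z_hi), (z_to_r(z_lo), z_to_r(z_hi))


z_ci, r_ci = correlation_ci(0.39, 100)
print([round(v, 2) for v in z_ci])  # [0.21, 0.61]
print([round(v, 2) for v in r_ci])  # [0.21, 0.54]
```

The r-metric interval is asymmetric around 0.39 because the back-transformation is nonlinear.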


Computing correlations

• Usually straightforward if correlation matrix given

• Problems arise when regression is used in the primary study

• Becker & Wang paper:

http://www.msu.edu/~mkennedy/TQQT/


Example with correlations

• Example is from:

Reynolds, A. J., Ou, S.-R., & Topitzes, J. W. (2004). Paths of effects of early childhood intervention on educational attainment and delinquency. Child Development, 75(5), 1299–1328.

• The study looked at the effects of preschool participation for 1,404 low-income children in the Chicago Longitudinal Study on several later outcomes, such as high school completion and delinquency


Example with correlations

We will compute the Fisher z-transformation for the correlation between retention and high school completion.


Example: ITBS Word analysis & H.S. Completion

• r = 0.173, N = 1,286

$$ ES_{Z_r} = 0.5 \log_e\left[\frac{1 + 0.173}{1 - 0.173}\right] = 0.1748 $$

$$ SE_{Z_r} = \frac{1}{\sqrt{1{,}286 - 3}} = \frac{1}{35.82} = 0.0279 $$

$$ 0.1748 \pm 1.96(0.0279) = (0.120,\ 0.229) $$
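Recomputing the Reynolds et al. example in Python, as a check on the arithmetic above. This is a sketch; `math.atanh` from the standard library equals Fisher's z, and 1.96 is the critical value used throughout the slides.

```python
import math

r, n = 0.173, 1286
z = math.atanh(r)          # Fisher's z: 0.5 * ln((1 + r) / (1 - r))
se = 1 / math.sqrt(n - 3)  # 1 / sqrt(1283)
lo, hi = z - 1.96 * se, z + 1.96 * se
print(f"Z = {z:.4f}, SE = {se:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
# Z = 0.1748, SE = 0.0279, 95% CI = (0.120, 0.229)
```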


Odds ratios as effect sizes

• Odds in the treatment group ÷ odds in the control group

• Odds ratios are relatively hard for people to understand

• If OR = 1, odds were equal in both groups

• OR = 2.0 is as strong as OR = .50 (but the effects are in opposite directions)

               No re-arrest   Re-arrest
  Treatment        (a)           (b)
  Control          (c)           (d)

$$ OR = \frac{a/b}{c/d} = \frac{ad}{bc} $$


Outcomes of one study

  Drummond et al. (1990)   Success   Failure   Total
  Treatment                   5        14        19
  Comparison                  6        12        18
  Total                      11        26        37


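Odds of improving, Ω_Trt

$$ \Omega_{Trt} = \frac{\mathrm{Prob}(\text{Success} \mid \text{Treatment})}{\mathrm{Prob}(\text{Failure} \mid \text{Treatment})} = \frac{\mathrm{Prob}(S \mid Trt)}{1 - \mathrm{Prob}(S \mid Trt)} $$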


Odds of improving, Ω_Trt

Estimate Ω_Trt by O_E

$$ O_E = \frac{\#\,\text{successes} / \text{total}\ \#\,\text{trt}}{\#\,\text{failures} / \text{total}\ \#\,\text{trt}} = \frac{5/19}{14/19} = \frac{5}{14} $$


Odds of improving, Ω_Cntl

Estimate Ω_Cntl by O_E'

$$ O_{E'} = \frac{\#\,\text{successes} / \text{total}\ \#\,\text{cntl}}{\#\,\text{failures} / \text{total}\ \#\,\text{cntl}} = \frac{6/18}{12/18} = \frac{6}{12} $$


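Odds ratio, ω

$$ \omega = \frac{\Omega_{Trt}}{\Omega_{Cntl}} \quad \text{estimated by} \quad o = \frac{O_E}{O_{E'}} = \frac{\#\,\text{trt success} / \#\,\text{trt failures}}{\#\,\text{cntl success} / \#\,\text{cntl failures}} $$

$$ o = \frac{\#\,\text{trt s}}{\#\,\text{trt f}} \div \frac{\#\,\text{cntl s}}{\#\,\text{cntl f}} = \frac{\#\,\text{trt s} \times \#\,\text{cntl f}}{\#\,\text{trt f} \times \#\,\text{cntl s}} $$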


Example

$$ \text{Odds ratio, } o = \frac{5 \times 12}{6 \times 14} = \frac{60}{84} = 0.71 $$
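The odds and odds-ratio steps for the Drummond et al. (1990) table can be reproduced with exact fractions. A minimal sketch: the helper name `odds` is illustrative, and `fractions.Fraction` is used only to keep the arithmetic exact until the final rounding.

```python
from fractions import Fraction


def odds(successes, failures):
    """Odds of success: (#successes / n) / (#failures / n) = #successes / #failures."""
    return Fraction(successes, failures)


odds_trt = odds(5, 14)   # O_E, treatment group
odds_cntl = odds(6, 12)  # O_E', comparison group
odds_ratio = odds_trt / odds_cntl  # (5 * 12) / (14 * 6)
print(odds_trt, odds_cntl, odds_ratio, round(float(odds_ratio), 2))  # 5/14 1/2 5/7 0.71
```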


Outcomes of one study

  Frequencies   Success   Failure
  Treatment        a         b
  Comparison       c         d


Odds ratio, o or ES_OR

$$ ES_{OR} = \frac{ad}{bc} $$


Interpretation of ES_OR

• ES_OR = 1: Treatment and Control equally effective

• ES_OR > 1: Treatment successes more likely than Control successes

• 0 < ES_OR < 1: Treatment successes less likely than Control successes


Information for a 2 x 2 table

• MST n = 92

• IT (Control) n = 84

• 26.1% of MST group re-arrested

• 71.4% of IT group re-arrested


2 x 2 Table

         Re-arrested          Not arrested
  MST    26.1% of 92 = 24     92 – 24 = 68
  IT     71.4% of 84 = 60     84 – 60 = 24


Odds ratio

$$ ES_{OR} = \frac{24 \times 24}{68 \times 60} = 0.14 $$
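When a study reports only group sizes and percentages, as here, the cell counts can be rebuilt and ES_OR computed. A sketch under the assumption that rounding each implied count to a whole person is acceptable (the reported percentages are themselves rounded); the helper name `counts_from_percentage` is hypothetical.

```python
def counts_from_percentage(n, pct_event):
    """Rebuild (events, non-events) from a group size and a reported event percentage."""
    events = round(n * pct_event / 100)  # assumption: round to a whole person
    return events, n - events


a, b = counts_from_percentage(92, 26.1)  # MST: re-arrested, not arrested
c, d = counts_from_percentage(84, 71.4)  # IT (control): re-arrested, not arrested
es_or = (a * d) / (b * c)
print((a, b), (c, d), round(es_or, 2))   # (24, 68) (60, 24) 0.14
```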


Example

• Take Table 1 from the Ogden study. Compute the odds ratio for the odds of children in MST who are in out-of-home placement versus the odds of children in usual child services who are in out-of-home placement


2 x 2 Table

                                 In home placement     Out of home placement
  MST                            90.6% of 59 = 53.45   9.4% of 59 = 5.55
  Usual child welfare services   58.1% of 37 = 21.5    41.9% of 37 = 15.5


Why Do We Need to Interpret Effect Sizes?

• The importance of some intervention effects is sometimes intuitively understood

– Change in earning power

• “College graduates will earn $XX more in their lifetimes than non-graduates.”

– Risk ratio

• “…are 1.4 times more likely to …”

– Grade level equivalency

• “students receiving the intervention scored 5.3 GLE while students not receiving the intervention scored 4.9 GLE.”

• But most are not:

– Statistically significant effect

– Correlation of +.35, d = -.15

• In most cases, we’ll be working with effects that have to be translated so people will have some idea how to interpret them


Options for Expressing Study Results in an Understandable Metric

• Statistical significance

– Sometimes naively used as a proxy for effect size

• But trivially small effects can be statistically significant

• And large effects can be statistically nonsignificant

• Remember, a p-value expresses the probability of observing a result at least as extreme as the one observed, assuming the null hypothesis is true


More on ES and Statistical Significance

• Some students learn that if a statistical test fails to reject the null, it means that the population effect is zero

– For example, that the intervention is ineffective

– This is one reason people confuse statistical significance with practical significance (as in, if it is not statistically significant it can’t be practically significant)

– However…


Point Estimation vs. Interval Estimation

• Interval estimation

– Confidence intervals tell us the likely range of population values

• If a study has a confidence interval for IQ scores ranging from .1 to 10.1 points, that is the likely range of the treatment effect suggested by this study

• Point estimation

– Point estimates (e.g., the mean) give our single best estimate of the population parameter

Point estimation and interval estimation are best kept separate.

Asserting that the treatment effect is zero because the test is not statistically significant confounds these two activities.


Counternull Value of an Effect Size

• The counternull value of an effect size points out this problem

– Assume a study finds d = +.30, p = .10

– Classic $H_0$: $Y_1 = Y_2$, or $Y_1 - Y_2 = 0$

– Counternull $H_0$: $Y_1 - Y_2 = .60$

There is exactly as much evidence supporting the “classic” null hypothesis as there is the counternull hypothesis! (The ES is not statistically different from either 0 or +.60)
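The counternull claim can be checked numerically. This sketch assumes a two-sided z test and a symmetric sampling distribution, under which the counternull equals 2 × ES − null value; it backs out the standard error implied by d = +.30 and p = .10 and shows that testing against 0 and testing against +.60 give the same p. Function and variable names are illustrative.

```python
from statistics import NormalDist


def two_sided_p(estimate, null_value, se):
    """Two-sided p-value from a z test of estimate against null_value."""
    z = abs(estimate - null_value) / se
    return 2 * (1 - NormalDist().cdf(z))


d, p, null_value = 0.30, 0.10, 0.0
# SE implied by d = .30 and a two-sided p = .10 (assumes a z test)
se = d / NormalDist().inv_cdf(1 - p / 2)
# Counternull for a symmetric sampling distribution: 2 * ES - null value
counternull = 2 * d - null_value
print(round(counternull, 2))                     # 0.6
print(round(two_sided_p(d, null_value, se), 3))  # 0.1
print(round(two_sided_p(d, counternull, se), 3)) # 0.1
```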


Proportion of Variance Explained

• Common for correlations (r²) and multiple regression (R²)

• Research suggests that neither experienced researchers nor experienced statisticians have a good feel for the practical meaning of this type of effect size (Rosenthal, 1984)

– Typically, even well-trained individuals underestimate the importance of results when stated in terms of proportion of variance explained

– Not to mention policy makers and the general public


More on Proportion of Variance Explained

• Consider a study

– Program designed to improve graduation rate among “at-risk” students

– φ = +.32, φ² = .10

• Remember, φ is a correlation between two dichotomous variables

– Using proportion of variance as the effect size, one might be tempted to label this a small or even trivial effect, as only 10% of the variance in graduation rates can be attributed to the intervention. But …


Binomial Effect Size Display

                          Graduated   Did not graduate
  Received intervention      66             34
  Control                    34             66

φ = .32


Physicians’ Aspirin Study

Fatality rates, given second heart attack

φ = .08, φ² = .006, p = .16, OR = .48, risk ratio = .51

             Fatal   Not fatal
  Aspirin      5        99
  Placebo     18       171

Subsequent heart attack rates

φ = .03, φ² = .0009, p < .0001, OR = .55, risk ratio = .55 (men who take aspirin have 45% fewer second heart attacks)

             Heart attack   No heart attack
  Aspirin        104            10,933
  Placebo        189            10,845


Computing the BESD

• For dichotomous outcomes, the BESD illustrates the change in “success rate” corresponding to particular values of r

– For example, the number of additional graduates

• Computed (simply) as

Treatment group success rate = .50 + (r/2)

Control group success rate = .50 − (r/2)
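The two BESD formulas translate directly into code. A minimal sketch with a hypothetical function name, applied to the φ = .32 graduation example; the difference between the two rates always equals r.

```python
def besd(r):
    """Binomial effect size display: (treatment, control) success rates implied by r."""
    return 0.50 + r / 2, 0.50 - r / 2


treatment_rate, control_rate = besd(0.32)
print(round(treatment_rate, 2), round(control_rate, 2))  # 0.66 0.34
```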


Risk Ratios

• Defined as:

$$ RR = \frac{\text{events in the treatment group} / \text{treatment group } n}{\text{events in the control group} / \text{control group } n} $$

• Interpreted as “the ratio of risk in the treatment group relative to the risk in the control group”

– Risk ratio for having a second heart attack was .55

• Men who take aspirin have 45% fewer second heart attacks (their risk is 55% of the placebo group’s risk)
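A minimal sketch of the risk-ratio definition applied to the aspirin counts (a, c = heart attacks; b, d = no heart attack). The function name is illustrative; the counts are those of the aspirin heart-attack table in this section.

```python
def risk_ratio(a, b, c, d):
    """RR = [a / (a + b)] / [c / (c + d)]; a, c = events, b, d = non-events."""
    return (a / (a + b)) / (c / (c + d))


rr = risk_ratio(104, 10933, 189, 10845)  # aspirin (a, b) vs. placebo (c, d)
print(round(rr, 2))           # 0.55
print(f"{1 - rr:.0%} lower")  # 45% lower
```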


Odds vs. Risk Ratios

• OR and RR are very similar when events are rare

• When events become more common, they diverge

– Study 1: OR = .40, RR = .401

– Study 2: OR = 1.50, RR = 1.25

• Generally, logged ORs have somewhat better properties for meta-analysis

– Can convert any OR to an RR for interpretation, given an assumed control-group risk

  Study 1      Event   Non-event
  Treatment      2       1000
  Control        5       1000

  Study 2      Event   Non-event
  Treatment     500       500
  Control       400       600
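The divergence between OR and RR, and the OR-to-RR conversion, can be sketched as follows. The conversion assumes a known or assumed control-group risk p0 and uses RR = OR / (1 − p0 + p0 × OR); function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """OR = (a/b) / (c/d) = ad / bc."""
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """RR = [a / (a + b)] / [c / (c + d)]."""
    return (a / (a + b)) / (c / (c + d))


def or_to_rr(odds_ratio_value, control_risk):
    """Convert an OR to an RR, given the (assumed) control-group risk p0."""
    return odds_ratio_value / (1 - control_risk + control_risk * odds_ratio_value)


# (a, b, c, d) = treatment events, treatment non-events, control events, control non-events
studies = {"Study 1": (2, 1000, 5, 1000), "Study 2": (500, 500, 400, 600)}
for name, (a, b, c, d) in studies.items():
    or_value = odds_ratio(a, b, c, d)
    p0 = c / (c + d)
    print(name, round(or_value, 3), round(risk_ratio(a, b, c, d), 3),
          round(or_to_rr(or_value, p0), 3))
# Study 1 0.4 0.401 0.401
# Study 2 1.5 1.25 1.25
```

With common events (Study 2), the OR lies further from 1 than the RR.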


Risk Difference

• Interpreted as the difference in risks between two groups

• Defined as (a ÷ (a + b)) − (c ÷ (c + d))

             Heart attack   No heart attack
  Aspirin      104 (a)        10,933 (b)
  Placebo      189 (c)        10,845 (d)

104 ÷ (104 + 10,933) − 189 ÷ (189 + 10,845) = .0094 − .0171 = −.0077 (i.e., 0.77 percentage points lower)
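The risk-difference arithmetic above, as a short Python sketch (hypothetical function name; cells labeled as in the table).

```python
def risk_difference(a, b, c, d):
    """RD = a / (a + b) - c / (c + d)."""
    return a / (a + b) - c / (c + d)


rd = risk_difference(104, 10933, 189, 10845)  # aspirin vs. placebo
print(round(rd, 4))  # -0.0077
```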


Number Needed to Treat

• Number needed to treat (NNT) is an additional way to interpret dichotomous outcomes

– How many people have to receive the intervention to produce one more positive (or one fewer negative) event?

• Defined as

1 / |risk difference|

• Here, NNT = 1/.0077 ≈ 130

– So, about 130 men who have had a heart attack need to take aspirin to prevent one additional second heart attack

– With the fictitious program designed to increase graduation rates among “at-risk” students,

RD = .66 − .34 = .32

NNT = 1/.32 = 3.125

– For every 3.125 people who participate in the program, one additional person will graduate
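NNT as the reciprocal of the absolute risk difference, applied to both examples. A sketch with a hypothetical function name; it does not round up to whole persons, matching the figures above.

```python
def nnt(risk_difference):
    """Number needed to treat: the reciprocal of the absolute risk difference."""
    if risk_difference == 0:
        raise ValueError("NNT is undefined when the risk difference is 0")
    return 1 / abs(risk_difference)


print(round(nnt(-0.0077), 1))  # 129.9 (aspirin example, about 130)
print(nnt(0.66 - 0.34))        # 3.125 (graduation example)
```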


Cohen’s Benchmarks

• Jacob Cohen (1988) proposed general definitions for interpreting effect size estimates:

             r     d-index
  Small     .10     .20
  Medium    .30     .50
  Large     .50     .80


More on Cohen

• Lipsey & Wilson (1993) analyzed 183 meta-analyses in the social sciences

– 25th percentile d = .25

– 50th percentile d = .38

– 75th percentile d = .62

• Cohen intended these to be “rules of thumb”, and emphasized that they represent average effects from across the social sciences

– Cautioned that in some areas, smallish effects may be more typical due to:

• Measurement error

• Relative weakness of interventions

– He did not intend these to stand for estimates of practical significance!


Converting Back to Original Metric

• It can sometimes be helpful to use the standardized mean difference to translate back into a metric people are more accustomed to working with

– Example

• Assume we did a research synthesis and meta-analysis of the effects of homework on achievement among HS students. Outcomes included standardized test scores such as the SAT and ACT, and chapter tests. Assume overall result was d = +.20, and that type of outcome was not a moderator of effect sizes.

– SAT average = 500, SD = 100

– ACT average = 21, SD = 5

» “The overall effect suggests, for example, that the average student doing homework would see an increase in SAT scores from 500 to 520, or in ACT scores from 21 to 22.”

– Cautions

• Comparing different constructs (e.g., math achievement vs. attendance) is difficult to impossible

• Even when tests are highly similar, if their distributions are different the comparisons can be misleading
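The SAT/ACT translation above amounts to multiplying d by the test's standard deviation and adding the result to its mean. A minimal sketch with a hypothetical helper; the means and SDs are the illustrative values from the homework example.

```python
def shifted_mean(d, mean, sd):
    """Mean score after a shift of d standard deviations on a test with this mean and SD."""
    return mean + d * sd


for test, mean, sd in [("SAT", 500, 100), ("ACT", 21, 5)]:
    print(f"{test}: {mean} -> {shifted_mean(0.20, mean, sd):g}")
# SAT: 500 -> 520
# ACT: 21 -> 22
```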


Basic Strategy for Comparing Effect Sizes

• Holding intervention constant, are there differential effects across outcomes?

– Does summer school help math more than reading?

• Holding outcome constant, are there differential effects across interventions (or intervention components)?

– Does mentoring affect graduation rates more than tutoring?


Other Considerations When Comparing Effect Sizes

• Are some important outcomes completely missing from the evidence base?

• Are some interventions or intervention components missing from the evidence base?

• Is there covariation between interventions and study methodology?

• Is there covariation between interventions and outcome choice?

– Caution about comparing different mediating variables