Page 1

Developing evidence-based products using the systematic review process

Session 4/Unit 10

Assessing study quality

Carole Torgerson

November 13th 2007

NCDDR training course for NIDRR grantees

Page 2

Assessing study quality or critical appraisal

• To investigate whether the individual studies in the review are affected by bias. Systematic errors in a study can bias its results by overestimating or underestimating effects; bias in an individual study, or in several studies, can in turn bias the results of the review.

• To make a judgement about the weight of evidence that should be placed on each study (higher weight given to studies of higher quality in a meta-analysis)

• To investigate differences in effect sizes between high and low quality studies in a meta-regression

Page 3

Coding for study quality

• Collect and record information about each study on quality of design and quality of analysis (internal validity)

• Base quality assessment judgement about each study on this information

• Use quality assessment judgements of all included studies to inform the synthesis (meta-analysis)
  » Only use findings from studies judged to be of high quality, or qualify findings
  » Look for homogeneity/heterogeneity
  » Examine differences in findings according to quality (sensitivity analysis)

Page 4

“A careful look at randomized experiments will make clear that they are not the gold standard. But then, nothing is. And the alternatives are usually worse.”

Berk RA. (2005) Journal of Experimental Criminology 1, 417-433.

Page 5

Code for design: experiments/quasi-experiments; fidelity to random allocation; concealment

• Code for whether RCT or quasi-experiment (specify) or other (specify)

• Is the method of assignment unclear?
  » Need to ascertain how assignment was undertaken and code for this; if unclear, may need to contact authors for clarification
  » Look for confusion between non-random and random assignment – the former can lead to bias.

• If RCT:
  » Need to look for and code for assignment discrepancies, e.g. failure to keep to random allocation
  » Need to code for whether or not allocation was concealed

Page 6

Which studies are RCTs?

• “We took two groups of schools – one group had high ICT use and the other low ICT use – we then took a random sample of pupils from each school and tested them”.

• “We put the students into two groups, we then randomly allocated one group to the intervention whilst the other formed the control”

• “We formed the two groups so that they were approximately balanced on gender and pre-test scores”

• “We identified 200 children with a low reading age and then randomly selected 50 to whom we gave the intervention. They were then compared to the remaining 150”.

• “Of the eight [schools] two randomly chosen schools served as a control group”

Page 7

Is it randomised?

“The groups were balanced for gender and, as far as possible, for school. Otherwise, allocation was randomised.”

Thomson et al. Br J Educ Psychology 1998;68:475-91.

Page 8

Is it randomised?

“The students were assigned to one of three groups, depending on how revisions were made: exclusively with computer word processing, exclusively with paper and pencil or a combination of the two techniques.”

Greda and Hannafin, J Educ Res 1992;85:144.

Page 9

Mixed allocation

“Students were randomly assigned to either Teen Outreach participation or the control condition either at the student level (i.e., sites had more students sign up than could be accommodated and participants and controls were selected by picking names out of a hat or choosing every other name on an alphabetized list) or less frequently at the classroom level”

Allen et al. Child Development 1997;64:729-42.

Page 10

Non-random assignment confused with random allocation

“Before mailing, recipients were randomized by rearranging them in alphabetical order according to the first name of each person. The first 250 received one scratch ticket for a lottery conducted by the Norwegian Society for the Blind, the second 250 received two such scratch tickets, and the third 250 were promised two scratch tickets if they replied within one week.”

Finsen V, Storeheier AH (2006) Scratch lottery tickets are a poor incentive to respond to mailed questionnaires. BMC Medical Research Methodology 6, 19. doi:10.1186/1471-2288-6-19.

Page 11

Misallocation issues

“23 offenders from the treatment group could not attend the CBT course and they were then placed in the control group”.

Page 12

Concealed allocation – why is it important?

» Good evidence from multiple sources shows that effect sizes in RCTs where randomisation was not independently conducted were larger than in RCTs that used independent assignment methods.

» A wealth of evidence indicates that unless random assignment was undertaken by an independent third party, subversion of the allocation may have occurred (leading to selection bias and exaggeration of any differences between the groups).

Page 13

Allocation concealment: a meta-analysis

• Schulz and colleagues took a database of 250 randomised trials in the field of pregnancy and childbirth.

• The trials were divided into 3 groups with respect to concealment:
  » Good concealment (difficult to subvert);
  » Unknown (not enough detail in the paper);
  » Poor (e.g., randomisation list on a public notice board).

• They found exaggerated effect sizes for poorly concealed compared with well concealed randomisation.

Page 14

Comparison of adequate, unclear and inadequate concealment

Allocation concealment   Effect size (OR)
Adequate                 1.0
Unclear                  0.67   (P < 0.01)
Inadequate               0.59

Schulz et al. JAMA 1995;273:408.

Page 15

Small vs large trials

• Small trials tend to give greater effect sizes than large trials: this shouldn’t happen.

• Kjaergard et al. showed this phenomenon was due to poor allocation concealment in small trials; when trials were grouped by allocation method, ‘secure’ allocation reduced the effect by 51%.

Kjaergard et al. Ann Intern Med 2001;135:982.

Page 16

Case study

• Subversion is rarely reported for individual studies.

• One study where it has been reported was for a large, multi-centred surgical trial.

• Participants were randomised at 5+ centres using sealed envelopes (sealed envelopes can be opened in advance and participants can be selected by the recruiting researcher into groups rather than by randomisation).

Page 17

Mean ages of groups

Clinician   P value     Experimental   Control
All         p < 0.01    59             63
1           p = 0.84    62             61
2           p = 0.60    43             52
3           p < 0.01    57             72
4           p < 0.001   33             69
5           p = 0.03    47             72
Others      p = 0.99    64             59

Page 18

Using telephone allocation

Clinician   P value    Experimental   Control
All         p = 0.37   59             57
1           p = 0.62   57             57
2           p = 0.24   60             51
3           NA         61             70
4           p = 0.99   63             65
5           p = 0.91   57             62
Others      p = 0.99   59             56

Page 19

Recent blocked trial

“This was a block randomised study (four patients to each block) with separate randomisation at each of the three centres. Blocks of four cards were produced, each containing two cards marked with "nurse" and two marked with "house officer." Each card was placed into an opaque envelope and the envelope sealed. The block was shuffled and, after shuffling, was placed in a box.”

Kinley et al. BMJ 2002;325:1323.

Page 20

• Block randomisation is a method of ensuring numerical balance; in this case, blocking was by centre.

• If block randomisation of 4 was used, then the numbers in each group at each centre should not differ by more than 2 participants, as the sketch below illustrates.
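To make this concrete, here is a minimal illustrative sketch (Python; not from the trial itself) of block randomisation with a block size of 4. Because every block contains two allocations to each arm, the running imbalance within a centre can never exceed 2.

```python
import random

def blocked_sequence(n_blocks):
    """Allocation list for one centre: blocks of 4, two arms per block."""
    sequence = []
    for _ in range(n_blocks):
        block = ["nurse", "nurse", "house officer", "house officer"]
        random.shuffle(block)  # shuffle within the block, like shuffling the cards
        sequence.extend(block)
    return sequence

seq = blocked_sequence(250)

# Track the running imbalance between the two arms at every point.
imbalance, worst = 0, 0
for arm in seq:
    imbalance += 1 if arm == "nurse" else -1
    worst = max(worst, abs(imbalance))
print(worst)  # never greater than 2 under this scheme
```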

Page 21

Problem?

         Southampton        Sheffield          Doncaster
         Doctor    Nurse    Doctor    Nurse    Doctor    Nurse
         500       511      308       319      118       118

Kinley et al. BMJ 2002;325:1323.

Page 22

Examples of good allocation concealment

• “Randomisation by centre was conducted by personnel who were not otherwise involved in the research project” [1]

• Distant assignment was used to: “protect overrides of group assignment by the staff, who might have a concern that some cases receive home visits regardless of the outcome of the assignment process”[2]

[1] Cohen et al. (2005) J of Speech Language and Hearing Res. 48, 715-729.

[2] Davis RG, Taylor BG. (1997) Criminology 35, 307-333.

Page 23

Assignment discrepancy

• “Pairs of students in each classroom were matched on a salient pretest variable, Rapid Letter Naming, and randomly assigned to treatment and comparison groups.”

• “The original sample – those students were tested at the beginning of Grade 1 – included 64 assigned to the SMART program and 63 assigned to the comparison group.”

Baker S, Gersten R, Keating T. (2000) When less may be more: A 2-year longitudinal evaluation of a volunteer tutoring program requiring minimal training. Reading Research Quarterly 35, 494-519.

Page 24

Change in concealed allocation

[Bar chart: percentage of trials using concealed allocation, in drug and non-drug trials, published before 1997 (<1997) and after (>1996). Drug trials: P = 0.04; non-drug trials: P = 0.70.]

NB No education trial used concealed allocation

Page 25

Example of unbalanced trial affecting results

• Trowman and colleagues undertook a systematic review to see if calcium supplements were useful for helping weight loss among overweight people.

• The meta-analysis of final weights showed a statistically significant benefit of calcium supplements. HOWEVER, a meta-analysis of baseline weights showed that most of the trials had ‘randomised’ lower weight people into the intervention group. When this was taken into account there was no longer any difference.

Page 26

Meta-analysis of baseline body weight

Trowman R et al. (2006) A systematic review of the effects of calcium supplementation on body weight. British Journal of Nutrition 95, 1033-38.

Page 27

Summary of assignment and concealment

• Code for whether RCT or quasi-experiment (specify) or other (specify)

• Increasing evidence suggests that subversion of random allocation is a problem in randomised trials. The ‘gold-standard’ method of random allocation is the use of a secure third-party method.

• Code whether or not the trial reports that an independent method of allocation was used. Poor quality trials: use sealed envelopes; do not specify allocation method; or use allocation methods within the control of the researcher (e.g., tossing a coin).

• Code for assignment discrepancies, e.g. failure to keep to random allocation

Page 28

5 minute break!

Page 29

Other design issues

• Attrition (drop-out) can introduce selection bias

• Unblinded ascertainment (outcome measurement) can lead to ascertainment bias

• Small samples can lead to Type II errors (concluding there is no difference when there is a difference)

• Multiple statistical tests can give Type I errors (concluding there is a difference when this is due to chance)

• Poor reporting of uncertainty (e.g., lack of confidence intervals).

Page 30

Coding for other design characteristics

• Code for attrition in intervention and control groups

• Code for whether or not there is ‘blinding’ of participants

• Code for whether or not there is blinded assessment of outcome

• Code for whether or not the sample size is adequate

• Code for whether the primary and secondary outcomes are pre-specified (a minimal coding record is sketched below)
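As a concrete illustration, the characteristics above could be captured in a simple coding record such as the following sketch (the field names and example values are hypothetical, not a published coding instrument):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QualityCoding:
    """Design-quality codes for one study; None means 'unclear from the report'."""
    study_id: str
    design: str                              # "RCT", "quasi-experiment", or other
    attrition_intervention: Optional[float]  # proportion lost to follow-up
    attrition_control: Optional[float]
    participants_blinded: Optional[bool]
    blinded_outcome_assessment: Optional[bool]
    sample_size_adequate: Optional[bool]
    outcomes_prespecified: Optional[bool]

# A hypothetical coded study
record = QualityCoding(
    study_id="study-001",
    design="RCT",
    attrition_intervention=0.08,
    attrition_control=0.10,
    participants_blinded=None,  # not reported
    blinded_outcome_assessment=True,
    sample_size_adequate=False,
    outcomes_prespecified=True,
)
```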

Page 31

Blinding of participants and investigators

• Participants can be blinded to:
  » Research hypotheses
  » Nature of the control or experimental condition
  » Whether or not they are taking part in a trial

• This may help to reduce bias from resentful demoralisation

• Investigators should be blinded (if possible) to follow-up tests, as this eliminates ‘ascertainment’ bias: consciously or unconsciously, investigators may ascribe a better outcome than the truth, based on knowledge of the assigned groups.

Page 32

Blinding of outcome assessment

• Code for whether or not post-tests were administered by someone unaware of the group allocation. Ascertainment bias can result when the assessor is not blind to group assignment; e.g., a homeopathy study of histamine showed an effect when researchers were not blind to the assignment but no effect when they were.

• Example of outcome assessment blinding: Study “was implemented with blind assessment of outcome by qualified speech language pathologists who were not otherwise involved in the project”

Cohen et al. (2005) J of Speech Language and Hearing Res. 48, 715-729.

Page 33

Blinded outcome assessment

[Bar chart: percentage of trials using blinded outcome assessment, in health and education trials, published before 1997 (<1997) and after (>1996). P = 0.03 and P = 0.13 for the two comparisons.]

Torgerson CJ, Torgerson DJ, Birks YF, Porthouse J. (2005) A comparison of randomised controlled trials in health and education. British Educational Research Journal,31:761-785.

Page 34

Statistical power

• Few effective educational interventions produce large effect sizes, especially when the comparator group is an ‘active’ intervention. In a tightly controlled setting, a difference of 0.5 of a standard deviation at post-test is good; smaller effect sizes (e.g. 0.25) are to be expected in field trials. To detect an effect size of 0.5 with 80% power (significance = 0.05), we need 128 participants in an individually randomised experiment, as the sketch below shows.
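A minimal sketch of where the 128 comes from, using the standard normal-approximation formula n per group = 2(z₁₋α/₂ + z₁₋β)² / d² (illustrative Python; scipy assumed available):

```python
from math import ceil
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Per-group n for a two-arm comparison of means at standardised effect d."""
    z_alpha = norm.ppf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)           # 0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

print(2 * n_per_group(0.5))   # ~126 in total; ~128 with the small-sample t correction
print(2 * n_per_group(0.25))  # ~504 in total for a field-trial effect of 0.25
```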

Page 35

Percentage of trials underpowered (n < 128)

[Bar chart: percentage of underpowered trials (n < 128), in health and education trials, published before 1997 (<1997) and after (>1996). P = 0.22 and P = 0.76 for the two comparisons.]

Torgerson CJ, Torgerson DJ, Birks YF, Porthouse J. (2005) A comparison of randomised controlled trials in health and education. British Educational Research Journal,31:761-785.

Page 36

Code for analysis issues

• Code for whether, once randomised, all participants are included within their allocated groups for analysis (i.e., was intention to treat analysis used).

• Code for whether a single analysis is pre-specified before data analysis.

Page 37

Attrition

• Attrition can lead to bias; a high quality trial will have maximal follow-up after allocation.

• It can be difficult to ascertain the amount of attrition and whether or not attrition rates are comparable between groups.

• A good trial reports low attrition with no between group differences.

• Rule of thumb: 0–5% attrition is not likely to be a problem; 6–20% is worrying; >20% suggests selection bias.

Page 38

Poorly reported attrition

• In an RCT of foster carers, extra training was given.
  » “Some carers withdrew from the study once the dates and/or location were confirmed; others withdrew once they realized that they had been allocated to the control group.” “117 participants comprised the final sample.”

• No split between groups is given except in one table, which shows 67 in the intervention group and 50 in the control group: 25% more in the intervention group. Unequal attrition is a hallmark of potential selection bias, but we cannot be sure.

Macdonald & Turner, Brit J Social Work (2005) 35, 1265.

Page 39

What is the problem here?

Random allocation: 160 children in 20 schools (8 per school), 80 in each group.

1 school (8 children) withdrew.

76 children allocated to control; 76 allocated to the intervention group.

N = 17 children replaced following discussion with teachers.

Page 40

• In this example one school withdrew and pupils were lost from both groups – unlikely to be a source of bias.

• BUT 17 pupils were withdrawn by teachers as they did not want them to have their allocated intervention and these were replaced by 17 others.

• This WILL introduce bias into the experiment. Note: such a trial should be regarded as a quasi-experiment.

Page 41

What about matched pairs?

• It is sometimes stated that selection bias due to attrition can be avoided using a matched pairs design, whereby, if one member of a pair drops out, the surviving member is removed from the analysis (1).

• We can only match on observable variables; we trust randomisation to ensure that unobserved covariates or confounders are equally distributed between groups.

• Using matched pairs won’t remove attrition bias from the unknown covariate.

(1) Farrington DP, Welsh BC. (2005) Randomized experiments in criminology: What have we learned in the last two decades? Journal of Experimental Criminology 1, 9-38.

Page 42

Pairs matched on gender

Control (unknown covariate)   Intervention (unknown covariate)
Boy (high)                    Boy (low)
Girl (high)                   Girl (high)
Girl (low)                    Girl (high)
Boy (high)                    Boy (low)
Girl (low)                    Girl (high)

3 girls and 3 highs           3 girls and 3 highs

Page 43

Drop-out of 1 girl

Control       Intervention
Boy (high)    Boy (low)
Girl (high)   Girl (high)
Girl (low)    Girl (high)
Boy (high)    Boy (low)
              Girl (high)

2 girls and 3 highs   3 girls and 3 highs

Page 44

Removing matched pair does not balance the groups!

Control       Intervention
Boy (high)    Boy (low)
Girl (high)   Girl (high)
Girl (low)    Girl (high)
Boy (high)    Boy (low)

2 girls and 3 highs   2 girls and 2 highs

Page 45

Intention to treat (ITT)

• Randomisation eliminates selection bias at baseline; after randomisation, some participants may cross over into the opposite treatment group (e.g., fail to take the allocated treatment or obtain the experimental intervention elsewhere).

• There is often a temptation by trialists to analyse the groups as treated rather than as randomised.

• This is incorrect and can introduce selection bias.

Page 46

ITT analysis: examples

• Seven participants allocated to the control condition (1.6%) received the intervention, whilst 65 allocated to the intervention failed to receive treatment (15%) (1). The authors, however, analysed by randomised group (CORRECT approach).

• “It was found in each sample that approximately 86% of the students with access to reading supports used them. Therefore, one-way ANOVAs were computed for each school sample, comparing this subsample with subjects who did not have access to reading supports.” (2) (INCORRECT)

(1) Davis RG, Taylor BG. (1997) Criminology 35, 307-333.

(2) Feldman SC, Fish MC. (1991) Journal of Educational Computing Research 7, 25-36.

Page 47

Unit of allocation

• Participants can be randomised individually (the most usual approach) or as groups.

• The latter is known as a cluster (or group, or place) randomised controlled trial.

• Often it is not possible to randomise individuals, for example:
  » Evaluating training interventions on clinicians or teachers and measuring outcomes on patients or students;
  » Spill-over or contamination of the control group.

Page 48

Clusters

• A cluster can take many forms:
  » GP practice or patients belonging to an individual practitioner;
  » Hospital ward;
  » School, class;
  » A period of time (week; day; month);
  » Geographical area (village; town; postal district).

Page 49

Code for quality of cluster trials

• Code for whether the participants were recruited before the clusters were randomised; if not, this could have led to selection bias.

• Individuals within clusters have related outcomes, and this needs to be accounted for in both the sample size calculation and the analysis. Code for the following:
  » Did the trial report its intracluster correlation coefficient (ICC)?
  » Did the analysis use some form of statistical approach to take clustering into account (e.g., cluster-level means, hierarchical linear modelling, robust standard errors)?

Page 50

What is wrong here?

• “the remaining 4 classes of fifth-grade students (n = 96) were randomly assigned, each as an intact class, to the [4] prewriting treatment groups;”

Brodney et al. J Exp Educ 1999;68,5-20.

Page 51

Insufficient cluster replication

• The key quality criterion of a cluster trial is not the number of individual participants in the study but the number of clusters.

• A cluster trial with only 1 cluster per group cannot be thought of as a trial, as it is impossible to control for cluster-level confounders. At least 4 (some say 7) clusters per group are needed to have some hope of balancing out confounders.

Page 52

Which is better?

• Cluster trial A: We randomised 10 schools with 500 children in each, 5 to the intervention and 5 to the control (i.e., 5,000 children in all);

OR

• Cluster trial B: We randomised 100 classes with 25 children in each, 50 to the control and 50 to the intervention (i.e., 2,500 children in all).

• Trial B is better, as it has 100 units of allocation rather than 10, despite having 50% fewer children; the sketch below shows why.
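Under an assumed intracluster correlation, the design effect 1 + (m − 1) × ICC deflates each trial's nominal sample size (the ICC of 0.02 below is an assumption for illustration; no value is given in the slide):

```python
def effective_sample_size(n_total, cluster_size, icc):
    """Deflate the nominal n by the design effect 1 + (m - 1) * icc."""
    design_effect = 1 + (cluster_size - 1) * icc
    return n_total / design_effect

icc = 0.02  # assumed for illustration
print(effective_sample_size(5000, 500, icc))  # Trial A: ~455 effective participants
print(effective_sample_size(2500, 25, icc))   # Trial B: ~1689 effective participants
```

With any plausible ICC, the 2,500 children in 100 classes carry far more information than the 5,000 children in 10 schools.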

Page 53

Selection bias in cluster randomised trials

• Given enough clusters, selection bias should not occur in cluster trials, as randomisation will have dealt with it.

• HOWEVER, the clusters will be balanced at the individual level ONLY if all eligible people, or a random sample, within the cluster were included in the trial.

• In some trials this does not apply as randomisation occurred BEFORE recruitment. This could have introduced selection bias.

Page 54

Reviews of cluster trials (clustering allowed for in sample size / in analysis):

• Donner et al. (1990): 16 non-therapeutic intervention trials, 1979–1989: <20% / <50%
• Simpson et al. (1995): 21 trials from American Journal of Public Health and Preventive Medicine, 1990–1993: 19% / 57%
• Isaakidis and Ioannidis (2003): 51 trials in Sub-Saharan Africa, 1973–2001 (half post 1995): 20% / 37%
• Puffer et al. (2003): 36 trials in British Medical Journal, Lancet, and New England Journal of Medicine, 1997–2002: 56% / 92%
• Eldridge et al. (Clinical Trials 2004): 152 trials in primary health care, 1997–2000: 20% / 59%

Page 55

Analysis

Many cluster randomised health care trials were improperly analysed. Most analyses used t-tests or chi-squared tests, which assume independence of observations; that assumption is violated in a cluster trial.

This leads to spurious p values and narrow CIs.

Various methods exist, e.g., multilevel models or comparing the means of clusters, which will produce correct estimates.

See a worked example at Martin Bland’s website:

http://www-users.york.ac.uk/~mb55/
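A minimal sketch of the cluster-means approach mentioned above, on simulated data (the numbers are invented for illustration): average the outcome within each cluster first, then compare one mean per cluster with an ordinary t-test, so that the unit of analysis matches the unit of allocation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

def simulate_arm(n_clusters, pupils_per_cluster, shift):
    """Outcomes for one arm, with a shared random effect per cluster."""
    cluster_effects = rng.normal(0.0, 0.5, n_clusters)
    return [rng.normal(shift + u, 1.0, pupils_per_cluster)
            for u in cluster_effects]

intervention = simulate_arm(10, 20, shift=0.3)
control = simulate_arm(10, 20, shift=0.0)

# One mean per cluster: 10 observations per arm, independence restored.
t_stat, p_value = stats.ttest_ind([c.mean() for c in intervention],
                                  [c.mean() for c in control])
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```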

Page 56

Survey of trial quality

Characteristic            Drug   Health   Education
Cluster randomised        1%     36%      18%
Sample size justified     59%    28%      0%
Concealed randomisation   40%    8%       0%
Blinded follow-up         53%    30%      14%
Use of CIs                68%    41%      1%
Low statistical power     45%    41%      85%

Torgerson CJ, Torgerson DJ, Birks YF, Porthouse J. (2005) A comparison of randomised controlled trials in health and education. British Educational Research Journal,31:761-785. (based on n = 168 trials)

Page 57

CONSORT

• Because the majority of health care trials were badly reported, a group of health care trial methodologists developed the CONSORT statement, which indicates key methodological items that must be reported in a trial report.

• This has now been adopted by all major medical journals and some psychology journals.

Page 58

The CONSORT guidelines, adapted for trials in educational research

• Was the target sample size adequately determined?
• Was intention to teach analysis used? (i.e. were all children who were randomised included in the follow-up and analysis?)
• Were the participants allocated using random number tables, coin flip, or computer generation?
• Was the randomisation process concealed from the investigators? (i.e. were the researchers who were recruiting children to the trial blind to the child’s allocation until after that child had been included in the trial?)
• Were follow-up measures administered blind? (i.e. were the researchers who administered the outcome measures blind to treatment allocation?)
• Was precision of the effect size estimated (confidence intervals)?
• Were summary data presented in sufficient detail to permit alternative analyses or replication?
• Was the discussion of the study findings consistent with the data?

Page 59

Flow Diagram

• In health care trials reported in the main medical journals authors are required to produce a CONSORT flow diagram.

• The trial by Hatcher et al. clearly shows the fate of the participants from randomisation to analysis.

Page 60

Flow Diagram

635 children in 16 schools screened using a group spelling test

118 children with poor spelling skills given individual tests of vocabulary, letter knowledge, word reading and phoneme manipulation

2 schools excluded due to insufficient numbers of poor spellers; 9 children excluded due to behaviour

84/118 children in 14 remaining schools (6 per school) selected for randomisation to interventions

1 school (6 children) withdrew from study after randomisation

20-week intervention: 39/42 children in 13 remaining schools allocated; 39/42 children included
10-week intervention: 39/42 children in 13 remaining schools allocated; 1 child left study (moved school); 38/42 children included

Hatcher et al. 2005 J Child Psych Psychiatry: online.

Page 61

Year 7 pupils (N = 155) randomised

ICT group (N = 77):
  3 left school
  75 valid pre-tests; 71 valid post-tests
  67 valid pre- and post-tests

No-ICT group (N = 78):
  1 left school
  70 valid pre-tests; 67 valid post-tests
  63 valid pre- and post-tests

Page 62

Dr Carole Torgerson

Senior Research Fellow

Institute for Effective Education

University of York

[email protected]
