Critical Appraisal

Critical appraisal of the medical literature

Partini Pudjiastuti, Sudigdo Sastroasmoro

Child Health Department, Faculty of Medicine, University of Indonesia

Population & Sample


Population is a large group of study subjects (humans, animals, tissues, blood specimens, medical records, etc.) with defined characteristics [“A population is a group of study subjects defined by the researcher as the population”]

A sample is a subset of the population that is directly investigated. The sample should be (or be assumed to be) representative of the population; otherwise all statistical analyses will be invalid

All investigations are performed on the sample, and the results are then applied to the population

Avoid using ambiguous terms

• Sample population
• Sampled population
• Populasi sampel (Indonesian for “sample population”)

Target population = domain = the population to which the results of the study will be applied. Usually characterized by demographic & clinical characteristics; e.g. normal infants, teens with epilepsy, post-menopausal women with osteoporosis.

Accessible population = subset of the target population that can be accessed by the investigator. Frame: time & place. Examples: teens with epilepsy in RSCM, 2000-2005; women with osteoporosis in RSGS, 2002.

Intended sample = subjects who meet the eligibility criteria and are selected to be included in the study.

Actual study subjects = subjects who actually completed participation in the study.

[Figure: from target population to actual study subjects]

Target population = DOMAIN (demographic, clinical)
  ↓ usually based on practical purposes
Accessible population (+ time, place)
  ↓ appropriate sampling technique
Intended sample [subjects selected for study]
  ↓ non-response, drop outs, withdrawals, loss to follow-up
Actual study subjects [subjects who completed the study]

[Figure: validity along the chain target population (TP), accessible population (AP), intended sample (IS), actual study subjects (ASS)]

• Internal validity: does ASS represent IS?
• External validity I: does IS represent AP?
• External validity II: does AP represent TP?

Sampling methods

A. Probability sampling
• Simple random sampling (random number table, computer)
• Stratified random sampling
• Systematic sampling
• Cluster sampling
• Others: two-stage cluster sampling, etc.

B. Non-probability sampling
• Consecutive sampling
• Convenience sampling
• Judgmental sampling
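As an illustration only, three of the sampling methods named on this slide can be sketched in a few lines of Python. The function names and the toy sampling frame are assumptions made for this sketch; they do not come from the slides, and real studies need a properly defined sampling frame.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Probability sampling: every subject has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Probability sampling: random start, then every k-th subject."""
    rng = random.Random(seed)
    step = len(population) // n        # sampling interval k (assumes n <= len(population))
    start = rng.randrange(step)        # random starting point within the first interval
    return [population[start + i * step] for i in range(n)]

def consecutive_sample(population, n):
    """Non-probability sampling: take subjects in order of arrival until n is reached."""
    return population[:n]

# Hypothetical sampling frame: 100 patient record numbers
frame = list(range(1, 101))
print(simple_random_sample(frame, 5, seed=42))
print(systematic_sample(frame, 5, seed=42))
print(consecutive_sample(frame, 5))
```

The two probability methods rely on a random mechanism, whereas consecutive sampling depends only on the order in which subjects present.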

Note (IMPORTANT!)

• All statistical analyses (inferences) are based on random sampling
• Whether or not a sample is representative of the population depends on whether or not it resembles the sample that random sampling would have produced

Statistical significance vs. clinical importance

• A negligible clinical difference may be statistically very significant if the number of subjects is very large, e.g. a difference in reduction of cholesterol level of 3 mg/dl, n1 = n2 = 10,000; p = 0.00002
• A large clinical difference may be statistically non-significant if the number of subjects is very small, e.g. a 30% difference in cure rate, n1 = n2 = 10; p = 0.74

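Clinical importance vs. statistical significance

[Figure: subjects with a mean cholesterol level of 300 mg/dl are randomized (R) to standard treatment (n = 10000) or new treatment (n = 10000); the means after treatment are 220 and 217 mg/dl. t = (value not shown); df = 9998; p = 0.00002. The figure contrasts the clinical and the statistical reading of this 3 mg/dl difference.]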

Clinical importance vs. statistical significance

               Cured   Died
Standard Rx    0       10 (100%)
New Rx         3       7 (70%)

Fisher exact test: p = 0.211
Absolute risk reduction = 30%

[The slide contrasts the clinical and the statistical reading of this result.]
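The p value of the small trial above can be checked with a short sketch of the two-sided Fisher exact test, written here with the standard library only. The function name is an assumption of this sketch, and the two-sided rule used (summing all tables no more probable than the observed one) is one common convention, not necessarily the one used by the authors' software.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell with fixed margins
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum the probabilities of all tables no more probable than the observed one
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# Standard Rx: 0 cured, 10 died; New Rx: 3 cured, 7 died
p_value = fisher_exact_two_sided(0, 10, 3, 7)
arr = 10 / 10 - 7 / 10   # absolute reduction in death rate
print(f"p = {p_value:.3f}, ARR = {arr:.0%}")   # p = 0.211, ARR = 30%
```

A 30% absolute risk reduction is clinically large, yet with only 10 subjects per group the test cannot distinguish it from chance.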

Correlation between abdominal circumference and total cholesterol level in middle-aged men

N = 200
r = 0.22, p = 0.031

Conclusion: There was a significant correlation between abdominal circumference and total cholesterol level in the subjects studied. Measuring abdominal circumference may predict the cholesterol level in middle-aged healthy men.
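One way to weigh this conclusion is to look at the coefficient of determination (r squared), the proportion of variance in one variable accounted for by the other. The slide does not report this figure; the minimal sketch below simply derives it from the reported r.

```python
r = 0.22                 # correlation coefficient reported on the slide
r_squared = r ** 2       # proportion of variance explained

print(f"r^2 = {r_squared:.4f} ({r_squared:.1%} of the variance)")
```

A statistically significant correlation (p = 0.031) therefore coexists with a weak relationship, which is the same contrast between statistical significance and clinical importance shown in the preceding examples.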

How important is important?

• A two percent mortality reduction is probably not important in your clinic
• In community prevention, a simple measure that reduces severe morbidity by 2% is probably important
  – Low-dose aspirin reduces cardiac events by 2% in 5 years (without aspirin 400 cardiac events per 10,000; with aspirin 200 cardiac events per 10,000)

Requires judgment
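The aspirin figures can be turned into the usual effect measures with a short sketch. The function name is hypothetical; the event counts are those quoted on the slide, and the formulas are the standard definitions of absolute risk reduction (ARR), relative risk reduction (RRR) and number needed to treat (NNT).

```python
def risk_summary(events_control, events_treated, n_per_group):
    """Return ARR, RRR and NNT for two groups of equal size."""
    cer = events_control / n_per_group   # control event rate
    eer = events_treated / n_per_group   # experimental event rate
    arr = cer - eer                      # absolute risk reduction
    rrr = arr / cer                      # relative risk reduction
    nnt = 1 / arr                        # number needed to treat
    return arr, rrr, nnt

aspirin_arr, aspirin_rrr, aspirin_nnt = risk_summary(400, 200, 10_000)
print(f"ARR = {aspirin_arr:.1%}, RRR = {aspirin_rrr:.0%}, NNT = {aspirin_nnt:.0f}")
```

The same 2% absolute reduction corresponds to a 50% relative reduction and about 50 people treated for 5 years to prevent one cardiac event; whether that is important still requires judgment.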

How to infer?

• Can the results of the study (in the sample) be applied to the accessible or target population?
• Hypothesis testing & confidence interval

Statistic and Parameter

• An observed value drawn from the sample is called a statistic (cf. statistics, the science)
• The corresponding value in the population is called a parameter
• We measure and analyze statistics, and translate them into parameters

Examples of statistics:

• Proportion, percentage, mean, median, mode, difference in proportion/mean

• OR, RR, sensitivity, specificity, kappa, LR, NNT

There are 2 ways of inferring a parameter from a statistic:

• Hypothesis testing → p value
• Estimation → confidence interval (CI)

The p value & CI convey the same concept in different ways

P value

The probability of obtaining the observed results (or more extreme ones) if H0 were true; loosely, the probability that the observed results are caused solely by chance

Group   Success     Failure     Total
C       30 (60%)    20 (40%)    50
E       40 (80%)    10 (20%)    50

X2 = (value not shown); df = 1; p = 0.0432

If drugs E and C were equally effective, we could still obtain the above result (a difference in success rate of 20%), but the probability is small (4.32%)

In other words: if drugs E and C were equally effective, the probability that the result is merely caused by chance is 4.32%

If we define in advance that p < 0.05 is significant, then the result is called statistically significant

A similar interpretation applies to ALL hypothesis testing: t-test, ANOVA, non-parametric tests, Pearson correlation, multivariate tests, etc.:

If the null hypothesis were true, the probability of obtaining the result would be ……. (for example 0.02 or 2%)

Confidence Intervals

Estimate the range of values (parameter) in the population using a statistic in the sample (as point estimate)

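[Figure: a statistic X (point estimate) observed in the sample (S) is used, via the CI, to estimate the corresponding value in the population (P). If the observed result in the sample is X, what is the figure in the population?]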

Most commonly used CI:

• CI 90% corresponds to p 0.10
• CI 95% corresponds to p 0.05
• CI 99% corresponds to p 0.01

Note:
• p value only for analytical studies
• CI for descriptive and analytical studies

How to calculate CI

General Formula:

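$CI = p \pm Z_{\alpha} \times SE$

• p = point estimate, a value drawn from the sample (a statistic)
• $Z_{\alpha}$ = standard normal deviate for $\alpha$; if $\alpha = 0.05$, $Z_{\alpha} = 1.96$ (~95% CI)
• SE = standard error of the point estimate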
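The general formula can be sketched in Python as follows. The function names are assumptions of this sketch; the standard normal deviate is taken from `statistics.NormalDist` in the standard library (Python 3.8 or later), and the interval is the normal-approximation interval described on the slide.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided standard normal deviate for a confidence level such as 0.95."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

def confidence_interval(point_estimate, se, level=0.95):
    """Point estimate plus or minus Z times the standard error."""
    z = z_for_confidence(level)
    return point_estimate - z * se, point_estimate + z * se

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} CI uses Z = {z_for_confidence(level):.3f}")
```

The three levels give Z of roughly 1.645, 1.960 and 2.576, which is why a 99% CI is wider than a 95% CI computed from the same data.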

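Example 1: CI of a proportion

100 FKUI students → 60 females (p = 0.6)
What is the proportion of females among Indonesian FK (medical faculty) students? (assuming FKUI represents FK in Indonesia)

$SE(p) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.6 \times 0.4}{100}} \approx \frac{0.5}{10} = 0.05$

$95\%\ CI = p \pm 1.96 \times SE(p) = 0.6 \pm 1.96 \times 0.05 = 0.6 \pm 0.1 = 0.5;\ 0.7$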
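Example 1 can be reproduced with a minimal sketch, assuming the normal-approximation formula SE(p) = sqrt(pq/n) used on the slide. The function name is hypothetical.

```python
from math import sqrt

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation CI for a proportion."""
    p = successes / n
    se = sqrt(p * (1 - p) / n)       # SE(p) = sqrt(pq/n)
    return p - z * se, p + z * se

prop_lower, prop_upper = proportion_ci(60, 100)
print(f"95% CI: {prop_lower:.2f} to {prop_upper:.2f}")   # 95% CI: 0.50 to 0.70
```

Without the intermediate rounding of SE to 0.05, the limits are about 0.504 and 0.696, which round to the 0.5 and 0.7 shown in the example.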

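Example 2: CI of the mean

100 newborn babies, mean BW = 3000 (SD = 400) grams; what is the 95% CI?

$SEM = \frac{SD}{\sqrt{n}} = \frac{400}{\sqrt{100}} = 40$

$95\%\ CI = \bar{x} \pm 1.96 \times SEM = 3000 \pm 1.96 \times \frac{400}{\sqrt{100}} \approx 3000 \pm 80 = (3000 - 80);\ (3000 + 80) = 2920;\ 3080$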
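Example 2 follows the same pattern, with the standard error of the mean SEM = SD / sqrt(n). The sketch below uses a hypothetical function name and Z = 1.96 as on the slide.

```python
from math import sqrt

def mean_ci(mean, sd, n, z=1.96):
    """CI of a mean using SEM = SD / sqrt(n)."""
    sem = sd / sqrt(n)
    return sem, (mean - z * sem, mean + z * sem)

bw_sem, (bw_lower, bw_upper) = mean_ci(3000, 400, 100)
print(f"SEM = {bw_sem:.0f} g, 95% CI: {bw_lower:.1f} to {bw_upper:.1f} g")
```

The unrounded limits are 2921.6 and 3078.4 grams; the example rounds 1.96 x 40 = 78.4 up to 80, giving 2920 to 3080.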

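Example 3: CI of the difference between proportions (p1 - p2)

• 50 patients with drug A, 30 cured (p1 = 0.6)
• 50 patients with drug B, 40 cured (p2 = 0.8)

$95\%\ CI(p_1 - p_2) = (p_1 - p_2) \pm 1.96 \times SE(p_1 - p_2)$

$SE(p_1 - p_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{\frac{0.6 \times 0.4}{50} + \frac{0.8 \times 0.2}{50}} = \sqrt{\frac{0.4}{50}} \approx 0.09$

Taking the absolute difference of 0.2:

$95\%\ CI = 0.2 \pm 1.96 \times 0.09 = (0.2 - 0.18);\ (0.2 + 0.18) = 0.02;\ 0.38$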
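Example 3 can be checked with a short sketch of the stated formula, 95% CI = difference ± 1.96 x SE. The function name is hypothetical, and the difference is taken as drug B minus drug A so that it is positive. Applying the factor 1.96 to SE of about 0.09 gives limits of roughly 0.02 and 0.38.

```python
from math import sqrt

def proportion_difference_ci(cured_a, n_a, cured_b, n_b, z=1.96):
    """CI for the difference between two independent proportions (B minus A)."""
    p_a, p_b = cured_a / n_a, cured_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff, se, (diff - z * se, diff + z * se)

cure_diff, cure_se, (cure_lower, cure_upper) = proportion_difference_ci(30, 50, 40, 50)
print(f"difference = {cure_diff:.2f}, SE = {cure_se:.3f}")
print(f"95% CI: {cure_lower:.2f} to {cure_upper:.2f}")
```

The interval only just excludes 0, which agrees with the borderline p value of 0.0432 quoted for the same 30/50 vs. 40/50 comparison in the p value section.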

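Example 4: CI for the difference between 2 means

Mean systolic BP:
• 50 smokers = 146.4 (SD 18.5) mmHg
• 50 non-smokers = 140.4 (SD 16.8) mmHg

$\bar{x}_1 - \bar{x}_2 = 6.0$ mmHg

$95\%\ CI(\bar{x}_1 - \bar{x}_2) = (\bar{x}_1 - \bar{x}_2) \pm 1.96 \times SE(\bar{x}_1 - \bar{x}_2)$

$SE(\bar{x}_1 - \bar{x}_2) = s \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

$s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{49 \times 18.5^2 + 49 \times 16.8^2}{98}} = 17.7$

$SE(\bar{x}_1 - \bar{x}_2) = 17.7 \times \sqrt{\frac{1}{50} + \frac{1}{50}} \approx 3.53$

$95\%\ CI = 6.0 \pm (1.96 \times 3.53) = -1.0;\ 13.0$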
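Example 4 can be reproduced with a sketch of the pooled-SD calculation, using the SDs quoted for the two groups (18.5 and 16.8 mmHg). The function name is hypothetical and Z = 1.96 is used as on the slide, rather than a t deviate.

```python
from math import sqrt

def mean_difference_ci(mean1, sd1, n1, mean2, sd2, n2, z=1.96):
    """CI for the difference between two independent means, pooled SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    s = sqrt(pooled_var)                      # pooled standard deviation
    se = s * sqrt(1 / n1 + 1 / n2)            # SE of the difference
    diff = mean1 - mean2
    return diff, s, se, (diff - z * se, diff + z * se)

bp_diff, bp_s, bp_se, (bp_lower, bp_upper) = mean_difference_ci(
    146.4, 18.5, 50, 140.4, 16.8, 50)
print(f"s = {bp_s:.1f}, SE = {bp_se:.2f}")
print(f"95% CI: {bp_lower:.1f} to {bp_upper:.1f} mmHg")
```

The interval (about -0.9 to 12.9 mmHg, or -1.0 to 13.0 after rounding) includes 0, so a 6 mmHg difference is not statistically significant at the 0.05 level with 50 subjects per group.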

Other commonly supplied CI

• Relative risk (RR)
• Odds ratio (OR)
• Sensitivity, specificity (Se, Sp)
• Likelihood ratio (LR)
• Relative risk reduction (RRR)
• Number needed to treat (NNT)

Suggested CI presentation:

• 95% CI: 1.5 to 4.5
• 95% CI: -2.5 to 4.3
• 95% CI: -12 to -6

• Not recommended: 3 ± 1.5
• Not recommended: -9 ± -3

In contrast to the CI for a proportion, a mean, or a difference between proportions/means, where the values of the CI are symmetrical around the point estimate, CIs for RR, OR, LR and NNT are asymmetrical because the calculations involve logarithms

Examples

• RR = 5.6 (95% CI 1.2; 23.7)
• OR = 12.8 (95% CI 3.6; 44.2)
• NNT = 12 (95% CI 9; 26)
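The asymmetry can be seen in a sketch of one common way to compute the CI of an odds ratio, the logarithmic (Woolf) method. The 2x2 counts below are hypothetical and chosen only for illustration; they are not the data behind the examples above, and the slides do not state which method was used there.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR and CI for the table [[a, b], [c, d]] using the log (Woolf) method."""
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of ln(OR)
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical table: exposed cases 20, exposed controls 10,
# unexposed cases 10, unexposed controls 20
or_value, or_lower, or_upper = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {or_value:.1f} (95% CI {or_lower:.2f}; {or_upper:.2f})")
```

The interval is symmetrical on the log scale, so after back-transformation the upper limit lies much further from the point estimate than the lower limit.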

If the p value is < 0.05, then the 95% CI:
• excludes 0 (for a difference), because if A = B then A - B = 0 → p > 0.05
• excludes 1 (for a ratio), because if A = B then A/B = 1 → p > 0.05

For a small number of subjects, a computer-calculated CI may not meet this rule because of the continuity correction automatically applied by the computer
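The rule can be written as a small helper that checks whether a 95% CI excludes the null value (0 for a difference, 1 for a ratio). The function name is hypothetical; the intervals are taken from the presentation examples above.

```python
def ci_excludes_null(lower, upper, null_value):
    """True if the null value lies outside the interval (p < 0.05 for a 95% CI)."""
    return not (lower <= null_value <= upper)

# Differences: the null value is 0
print(ci_excludes_null(1.5, 4.5, 0))     # True  -> statistically significant
print(ci_excludes_null(-2.5, 4.3, 0))    # False -> not significant
# Ratios: the null value is 1
print(ci_excludes_null(1.2, 23.7, 1))    # True  -> statistically significant
```

This is only the rule of thumb from the slide; as noted above, software that applies a continuity correction may occasionally give a p value and a CI that disagree.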

Concluding remarks

• In every study the sample should be (or be assumed to be) representative of the population. Otherwise all statistical calculations are not valid
• The p value (hypothesis testing) gives you the probability that the result in the sample is merely caused by chance; it does not give the magnitude and direction of the difference
• The confidence interval (estimation) indicates the estimate of the value in the population given one result in the sample; it gives the magnitude and direction of the difference
• The p value alone tends to equate statistical significance and clinical importance
• The CI avoids this confusion because it provides an estimate of clinical values and also conveys statistical significance
• Whenever applicable, supply the CI, especially for the main results of the study
• In critical appraisal of study results, the focus should be on the CI rather than on the p value

1. The ultimate goal of clinical research is the use of evidence in the source population

2. The best non-probability sampling is consecutive sampling

3. The p value refers to the probability of getting the observed result if H0 were false

4. The mean difference of 2 measurements is 20 mmHg, with 95% CI 15 to 25 mmHg. The p value should be “statistically significant”

5. Confidence intervals give more information than the p value

6. It is possible to have a study with good internal validity but poor external validity

7. If the odds ratio is 5, then the 95% CI may have values from 3 to 11

8. It is possible to have a significant difference even when the clinical difference is not important, but a clinically important difference is always statistically significant

9. An appropriate sampling method is mandatory to ensure generalization

10. Clinical epidemiology may include animal studies

11. The wider the confidence interval, the more precise the result of any study

12. Assessment of clinical importance requires judgment

13. The confidence interval of any measure must include the point estimate

14. Selection of the source population is usually based on practical reasons

15. Diagnostic test, therapy, etiology, harm, are examples of basic research

Critical appraisal (making reading more worthwhile)

What is Critical Appraisal?

1. Critical appraisal = quality assessment
2. … a process of weighing up evidence to see how useful it is in decision making
3. … a process of assessing the validity, importance, and usefulness of evidence
4. Critical appraisal is about considering, evaluating and interpreting information in a systematic and objective way

Critically appraise what you read

• Separating the wheat from the chaff
• Time is limited: you should aim to quickly stop reading the dross
• Other papers contain useful information mixed with rubbish
• Simple checklists enable the useful information to be identified

Critical appraisal – Critical thinking

Appraising (evaluating/reviewing) the available evidence to construct clinical reasoning strategies and to make decisions

Finding strengths and limitations of written ‘evidence’

You need to decide what evidence to pay attention to (what is “worthy” of your attention) versus what to ignore

Why critically appraise?

Supports sound decision making based on best available evidence

Helps us determine (three R’s):

• How rigorous a piece of research is (Valid?)

• What the results are telling us (Important?)

• How relevant it is to our patient (Applicable?)

Why do we need evidence?

Resources should be allocated to things that are EFFECTIVE

The only way of judging effectiveness is EVIDENCE

“In God we trust – all others need evidence”

Sources of Evidence

Primary sources
– Based on experiments and published research

Secondary sources
– Systematic reviews
– Clinical guidelines
– Journals of secondary publication, e.g. Evidence Based Medicine

“5S” Pyramid of Evidence Resources

Levels of evidence

1. Systematic reviews of RCTs/high quality RCTs

2. Systematic reviews of cohort studies, lower quality RCTs, outcomes research

3. Systematic reviews of case controls, case control studies

4. Case series

5. Expert opinion

See http://www.cebm.net/levels_of_evidence.asp for complete description

Types of Evidence - Question Types

Type of Question | Best Evidence

Health care interventions: treatment, prevention | Quantitative: Systematic Review of RCTs or RCT

Harm or Etiology | Quantitative: Observational Study - Cohort or Case Control

Prognosis | Quantitative: Observational Study - Cohort, Case Control

Diagnosis or Assessment | Quantitative: Comparison to Gold Standard

Economics | Quantitative: Cost-effectiveness Study

Meaning | Qualitative: case study,

Key quality parameters

Validity

Reliability

Importance

Validity: internal and external

Internal - Is the study designed in such a way that we can trust the findings?

External - Is the study designed in such a way that I can generalize the findings?

Studies with good internal validity may not have good external validity if the study subjects do not represent the population

With poor internal validity, the question of external validity is not relevant

Reliability

If the study was conducted again, would the results be the same?

Usually interpreted as the accuracy of measurement.

Importance

What was the effect size or magnitude of effect? (Would the evidence change your practice?)

Clinical vs. statistical significance.

Tools for Critical Appraisal

EBM “simplified” approach (VIA):

• V: Are the results valid?
• I: What are the results?
• A: Will the results help me in patient care?

Evidence based medicine: 5 steps

1. Formulate question
2. Efficiently track down best available evidence
3. Critically review the validity and usefulness of the evidence
4. Implement changes in clinical practice
5. Evaluate performance

Check list for medical literature (completeness)

1. Title
2. Authors
3. Abstract: structured? Informative? Abbreviations?
4. Introduction: length? Relevant references? Target population?
5. Methods:
• Design
• Eligibility (inclusion and exclusion) criteria
• Sample size, sampling method
• Randomization: technique, concealment
• Intervention: masking?
• Measurement: blinding? Primary & secondary outcome
• Definitions
• Analysis

Check list for medical literature (contd.)

6. Results
• Baseline characteristics
• Main outcome
• Secondary outcome
7. Discussion
• General
• Strength and weakness
• Conclusions
8. References
• Vancouver style
• Consistent
9. Acknowledgments
10. Ethics approval
11. Conflict of interest

What to assess? (in study of cause-effect relationship)

A. General description
– Type of design
– Target population, source population, sample
– Sampling method
– Dependent and independent variables
– Main results?

B. Internal validity, non-causal relationship
– Influence of bias
– Influence of chance
– Influence of confounders



Bias

What is a bias? A process that tends to produce results that depart systematically from the true values existing in the study population

Types of bias
1. Sample (subject selection) biases, which may result in the subjects in the sample being unrepresentative of the population in which you are interested
2. Measurement (detection) biases, which include issues related to how the outcome of interest was measured
3. Intervention (performance) biases, which involve how the treatment itself was carried out

C. Internal validity, causal relationship
• Temporality (cause precedes effect)
• Strength of association (large difference, RR, OR, etc.), small p value or narrow confidence interval
• Biological gradient (dose dependence)
• Consistency among studies (different populations/designs)
• Specificity (certain factor results in certain effect)
• Coherence (does not conflict with current knowledge)
• Biological plausibility: can be explained with current knowledge (at least in part)



D. External validity
• Applicable to study subjects
• Applicable to source population
• Applicable to target population



11 items, each with 3 sections

1. Can you find this information in the paper?
2. Is there any problem?
3. Does this problem threaten the validity?

11 items

1. What is the research question?
2. What is the study type?
3. What are the outcome factors and how are they measured?
4. What are the study factors and how are they measured?
5. What important confounders are considered?
6. What are the sampling frame and sampling method?
7. In an experiment, how were the subjects assigned to groups? In a longitudinal study, how many reached final follow-up? In a case control study, are the controls appropriate? (etc.)
8. Are statistical tests considered?
9. Are the results clinically/socially significant (important)?
10. Is the study ethical?
11. What conclusions did the authors reach?

1. What is the research question?

(Any problem?)
– Is it concerned with the impact of an intervention, causality or determining the magnitude of a health problem?

(Does this problem threaten the validity?)
– Is it a well-stated research question/hypothesis?

2. What is the study type?

(Any problem?)
– Is the study type appropriate to the research question?

(Does this problem threaten the validity?)
– If not, how useful are the results produced by this type of study?

3. What are the outcome factors and how are they measured?

(Any problem?)
– a) Are all relevant outcomes assessed?
– b) Is there measurement error?

(Does this problem threaten the validity?)
– a) How important are omitted outcomes?
– b) Is measurement error an important source of bias?

4. What are the study factors and how are they measured?

(Any problem?)
– Is there measurement error?

(Does this problem threaten the validity?)
– Is measurement error an important source of bias?

5. What important potential confounders are considered?

(Any problem?)
– Are potential confounders examined and controlled for?

(Does this problem threaten the validity?)
– Is confounding an important source of bias?

6. What are the sampling frame and sampling method?

(Any problem?)
– Is there selection bias?

(Does this problem threaten the validity?)
– Does this threaten the external validity of the study?

7. Questions of internal validity

(Any problem?)
– Experimental: how were the subjects assigned to groups?
– Longitudinal study: how many reached follow-up?
– Case control study: are the controls appropriate?
• Note: other issues of relevance to internal validity are considered under the other headings in this critical appraisal system. You can add your own questions, and also design your own questions for other study types such as cross sectional studies and systematic reviews

(Does this problem threaten the validity?)
– Does this threaten the internal validity of the study?

8. Are statistical tests considered?

(Any problem?)
– Were the tests appropriate for the data?
– Are confidence intervals given?
– Is the power given if a null result?
– In a trial, are results presented as absolute risk reduction as well as relative risk reduction?

(Does this problem threaten the validity?)
– If not, how useful are the results?

9. Are the results clinically/socially significant?

(Any problem?)
– Was the sample size adequate to detect a clinically/socially significant result?
– Are the results presented in a way to help in health policy decisions?

(Does this problem threaten the validity?)
– Is the study useful?

10. Are ethical issues considered?

(Any problem?)
– Does the paper indicate ethics approval?
– Can you identify potential ethical issues?

(Does this problem threaten the validity?)
– Are the results or their application compromised?

11. What conclusions did the authors reach about the study question?

(Any problem?)
– Do the results apply to the population in which you are interested?

(Does this problem threaten the validity?)
– Will you use the results of the study?

Appraisal Tools

Tools from the Critical Appraisal Skills Programme (CASP)
– Systematic Reviews
– Randomised Controlled Trials
– Qualitative Research Studies
– Cohort Studies
– Case-Control Studies
– Diagnostic Test Studies
– Economic Evaluation Studies

Available at: http://www.phru.nhs.uk/casp/critical_appraisal_tools.htm

Study Designs Recap

Effectiveness of Therapy | Randomised Controlled Trial

Risk Factors / Prognosis | Cohort Study

Diagnosis | Survey using gold standard

Attitudes & Beliefs | Qualitative (Interviews, Observations, etc.)

Critical appraisal

• Validity (Methods)
• Importance (Results)
• Applicability (Discussion)

Thanks
