
Session Number: T146
Session Title: CONVEYING CONFIDENT CONCLUSIONS: P-VALUES, CONFIDENCE INTERVALS AND FOREST PLOTS

PRESENTERS: Patti Ragan, PhD, MPH, PA-C Melissa Murfin, PharmD, PA-C, BCACP Elon University PAEA Oct 2014

¡  At the conclusion of this session, participants will be able to:

¡  1. Articulate the importance of correctly interpreting conclusions of research data and studies

¡  2. Define and discuss the meaning of p-values, confidence intervals and forest plots

¡  3. Differentiate between statistical and clinical significance when evaluating research outcomes

LEARNING OBJECTIVES


“THE DIFFERENCE BETWEEN SIGNIFICANT AND NOT SIGNIFICANT IS NOT ITSELF STATISTICALLY SIGNIFICANT”

Gelman A & Stern H. The American Statistician, November 2006, Vol. 60, No. 4

QUOTE OF THE DAY

¡ Clinical knowledge is gained through research trials and from experience
§ Most of what we hold to be true needs to be “tested”
§ Is a new treatment superior to a current treatment in preventing recurrence of disease?
§ Also determines what is beneficial or has a proven value (outcome)
§ Does screening mammography in women aged 40-50 who have no risk factors provide more good than harm?
§ Does it decrease overall mortality?
§ Does it decrease mortality from breast cancer?

UNDERSTANDING RESEARCH


¡ The gold standard for clinical trials is null hypothesis significance testing (NHST)

¡ Null hypothesis states there is no difference between two (or more) sample groups drawn from the population of interest

¡ Study results: researchers either “reject” or “fail to reject” the null hypothesis

HYPOTHESIS TESTING (OR THE DREADED “P” VALUE)


¡ By convention, the significance level is typically set at 0.05, a 5% (1/20) chance of error; it can also be set at 0.01, a 1% (1/100) chance of error, where “error” is the likelihood that the finding occurred by “chance”

¡ P value or probability must be < 0.05 for significance
§ If p < 0.05, reject the null hypothesis
§ If p ≥ 0.05, fail to reject the null hypothesis

¡ A significant P value does not tell you the importance of the finding

HYPOTHESIS TESTING (OR THE DREADED “P” VALUE)


¡ If they fail to reject, researchers can’t conclude there wasn’t a difference, only that they failed to find one

¡ Other factors to consider:
§ Variable of interest is normally distributed
§ Sample selection
§ Sample size
§ Magnitude of the difference (if one exists)

HYPOTHESIS TESTING (OR THE DREADED “P” VALUE)


INTERPRETING P-VALUES

Effect of Rosiglitazone (Avandia) on the Risk of MI and Death from Cardiovascular Causes. Nissen and Wolski, NEJM 356 (24): 2457, Table 4. June 14, 2007

¡ Statistical significance vs. clinical significance

§ A finding may be “statistically” significant, but may not be meaningful in a clinical context
§ Necessary, but not sufficient

§ May be an artifact due to a large sample size

STATISTICAL VS. CLINICAL SIGNIFICANCE
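The point that significance can be an artifact of a large sample size can be illustrated with a short sketch. The event rates below (10.0% vs 9.5%) and group sizes are hypothetical, and the pooled two-proportion z-test is only one of several tests that could be used; the same small difference moves from "not significant" to "significant" purely because n grows.

```python
import math

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # erfc gives the two-sided tail area of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: (events in group A, events in group B, participants per group)
# The event rates stay at 10.0% vs 9.5% in every row; only the sample size changes.
scenarios = [(100, 95, 1_000), (1_000, 950, 10_000), (10_000, 9_500, 100_000)]

for events_a, events_b, n in scenarios:
    p = two_proportion_p_value(events_a, n, events_b, n)
    verdict = "reject null" if p < 0.05 else "fail to reject null"
    print(f"n per group = {n:>7}: p = {p:.4f} ({verdict})")
```

Whether a 0.5 percentage point difference matters for a patient is a clinical judgment that the p-value cannot make.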


¡ What is SPIN?
§ “Specific reporting strategies, whatever their motive, to highlight that the experimental treatment is beneficial despite a statistically nonsignificant difference for the primary outcome.”
§ Boutron, Dutton, Ravaud, Altman. “Reporting and interpretation of randomized controlled trials with statistically nonsignificant results for primary outcomes.” JAMA 2010

*Always compare text to tables; conclusions should match the data!

“SPIN”


¡ almost reached statistical significance (p=0.06)
¡ almost significant (p=0.06)
¡ almost significant tendency (p=0.06)
¡ almost statistically significant (p=0.06)
¡ an adverse trend (p=0.10)
¡ an apparent trend (p=0.286)
¡ an associative trend (p=0.09)
¡ an elevated trend (p<0.05)
¡ an encouraging trend (p<0.1)
¡ an established trend (p<0.10)
¡ an evident trend (p=0.13)
¡ an expected trend (p=0.08)
¡ an important trend (p=0.066)
¡ an increasing trend (p<0.09)
¡ an interesting trend (p=0.1)
¡ an inverse trend toward significance (p=0.06)
¡ an observed trend (p=0.06)
¡ barely below the level of significance (p=0.06)
¡ barely escaped statistical significance (p=0.07)
¡ closely approaches the statistical significance (p=0.0669)
¡ closely approximating significance (p>0.05)
¡ fell barely short of significance (p=0.08)
¡ fell just short of statistical significance (p=0.12)
¡ fell marginally short of significance (p=0.07)
¡ fell narrowly short of significance (p=0.0623)
¡ flirting with conventional levels of significance (p>0.1)
¡ near miss of statistical significance (p>0.1)
¡ not fully significant (p=0.085)

HOW MANY WAYS CAN YOU SAY “ALMOST SIGNIFICANT”

http://mchankins.wordpress.com/2013/04/21/still-not-significant-2/

¡ Increased “publish or perish” pressure?

¡ New software makes it possible to “preview” results and stop at whatever point meets the needs of the researcher

¡ Problematic research practices that exploit “researcher degrees of freedom” (John, Loewenstein, & Prelec, 2012; Simmons, Nelson and Simonsohn 2011) to meet the level of significance?

WHY THE INCREASE IN SPIN?


¡ Confidence intervals (CI) are descriptive statistics that provide an estimate of the range of values (around the mean) that would be reasonably expected for the entire population
§ Allow readers to directly determine whether a value is significant and provide context

¡ Most often see 95% CI
§ Interpretation: If we repeated the experiment multiple times, 95% of the CIs would include the population mean
§ May also see 99% or 90% CIs

CONFIDENCE INTERVALS
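The "repeat the experiment many times" interpretation can be checked with a small simulation. This is a sketch under stated assumptions: the population mean (120), standard deviation (15), sample size, and number of repeats are invented for illustration, and the interval uses the normal approximation (mean ± 1.96 standard errors) rather than a t-based interval.

```python
import random
import statistics

def ci_95(sample):
    """Approximate 95% CI for the mean: mean +/- 1.96 * standard error."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - 1.96 * se, mean + 1.96 * se

def coverage(true_mean=120.0, sd=15.0, n=100, repeats=2000, seed=1):
    """Fraction of simulated experiments whose 95% CI contains the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        low, high = ci_95(sample)
        hits += low <= true_mean <= high
    return hits / repeats

print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

The printed share should land close to 0.95. Any single interval either contains the population mean or it does not; the 95% describes the procedure over many repetitions.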


¡ Width of CI
§ Narrower interval provides more certainty, smaller range of values
§ ______X_____
§ Wider interval provides less certainty, larger range of values
§ _________________X_______________
§ Can be due to small sample size or an unreliable mean

§ Interpretation for CI:
§ For Relative Risk or Odds Ratio: non-significant if interval includes “1”
§ For Absolute Risk Reduction (ARR) or proportion: non-significant if interval includes “0”
§ For Number needed to treat (NNT): report for both significant and non-significant values

MORE ON CONFIDENCE INTERVALS


¡  Interpretation of significance with CI:

§  For relative risks (RR) or odds ratios (OR), the value is not significant if it includes 1 in the interval

§  RR of 3.1 (95% CI of 1.5-4.9) Significant or not significant?

§  RR of 3.1 (95% CI of 0.98-5.1) Significant or not significant?

§  RR of 0.8 (95% CI of 0.66 -0.98) Significant or not significant?

§  For absolute risks or weighted mean differences, the value is not significant if it includes zero (0) in the interval

§  Point estimate of 0.34 (95% CI of 0.02 - 0.67) Significant or NS?

§  Point estimate of 1.7 (95% CI of –1.2 – 1.9) Significant or NS?

§  Point estimate of 1.7 (95% CI of 1.5 – 2.3) Significant or NS?

CONFLICTS, CONFUSION OR CONSISTENCY?
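The rule on this slide (ratios are compared against 1, differences against 0) can be written as a one-line check. The sketch below applies it to the six intervals listed above; the function name `is_significant` is made up for illustration.

```python
def is_significant(ci_low, ci_high, null_value):
    """Significant at the CI's level when the interval excludes the null value."""
    return not (ci_low <= null_value <= ci_high)

# Null value is 1 for ratios (RR, OR) and 0 for differences (absolute risk, mean difference).
examples = [
    ("RR 3.1", 1.5, 4.9, 1),
    ("RR 3.1", 0.98, 5.1, 1),
    ("RR 0.8", 0.66, 0.98, 1),
    ("Point estimate 0.34", 0.02, 0.67, 0),
    ("Point estimate 1.7", -1.2, 1.9, 0),
    ("Point estimate 1.7", 1.5, 2.3, 0),
]

for label, low, high, null in examples:
    verdict = "significant" if is_significant(low, high, null) else "not significant"
    print(f"{label} (95% CI {low} to {high}): {verdict}")
```

Note that the second RR example has the same point estimate as the first; only the interval changes the verdict.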


POINT ESTIMATE WITH CONFIDENCE INTERVAL AND P VALUE BOTH REPORTED

Effect of Rosiglitazone (Avandia) on the Risk of MI and Death from Cardiovascular Causes. Nissen and Wolski, NEJM 356 (24): 2457, Table 4. June 14, 2007

§ Since number-needed-to-treat (NNT) estimates can be relatively small and unstable, they should always be reported with a confidence interval
§ If no treatment effect, the risk reduction is zero and the NNT is infinite

§ CI help determine if a result is clinically relevant or applicable

WHAT ABOUT CONFIDENCE INTERVALS FOR NUMBER NEEDED TO TREAT?
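A minimal sketch of an NNT with its confidence interval, assuming the common approach of taking reciprocals of the limits of the absolute risk reduction (ARR) interval. The trial counts are hypothetical and the ARR interval uses a simple Wald approximation; when the ARR interval includes zero, the NNT interval runs through infinity, so the sketch returns no finite interval.

```python
import math

def arr_with_ci(events_control, n_control, events_treated, n_treated, z=1.96):
    """Absolute risk reduction (control risk minus treated risk) with a Wald 95% CI."""
    risk_c = events_control / n_control
    risk_t = events_treated / n_treated
    arr = risk_c - risk_t
    se = math.sqrt(risk_c * (1 - risk_c) / n_control + risk_t * (1 - risk_t) / n_treated)
    return arr, arr - z * se, arr + z * se

def nnt_with_ci(arr, arr_low, arr_high):
    """NNT = 1 / ARR; its CI limits are the reciprocals of the ARR limits, swapped."""
    point = math.inf if arr == 0 else 1 / arr
    if arr_low <= 0 <= arr_high:
        # ARR interval includes zero: the NNT interval passes through infinity
        return point, None
    return point, (1 / arr_high, 1 / arr_low)

# Hypothetical trial: 40/200 events on control, 20/200 events on treatment
arr, low, high = arr_with_ci(40, 200, 20, 200)
nnt, nnt_ci = nnt_with_ci(arr, low, high)
print(f"ARR = {arr:.3f} (95% CI {low:.3f} to {high:.3f})")
print(f"NNT = {nnt:.1f}, 95% CI {nnt_ci[0]:.1f} to {nnt_ci[1]:.1f}")
```

A negative NNT in this sketch would indicate harm rather than benefit.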


TYPE OF DATA PRESENTATION MAY INFLUENCE INTERPRETATION

¡ Type I error (alpha error) – finding a difference when one does not exist. Generally a more serious type of error.
§ Problem: changing therapy to one that is not more effective

¡ Type II error (beta error) – failing to find a difference when one is present. It is a reflection of power and the magnitude of the difference.
§ By convention, power is typically .80 to .90 (80-90%)
§ Goal: Having enough participants after dropouts occur to find the difference when one is present
§ Typically less serious type of error, can correct by increasing the power

TYPE I (ALPHA) AND TYPE II (BETA) ERRORS
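How alpha, power, and the magnitude of the difference drive the number of participants can be sketched with the standard normal-approximation formula for comparing two proportions. The event rates (20% vs 15%) and the 10% dropout figure are hypothetical, and real trials often use more refined methods.

```python
import math
from statistics import NormalDist

def n_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate participants per group to detect p_control vs p_treated (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # guards against Type I error
    z_beta = NormalDist().inv_cdf(power)           # guards against Type II error
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)

def inflate_for_dropout(n, dropout_rate):
    """Enroll enough participants that n remain after the expected dropouts."""
    return math.ceil(n / (1 - dropout_rate))

for power in (0.80, 0.90):
    n = n_per_group(0.20, 0.15, power=power)
    print(f"power {power:.0%}: {n} per group, {inflate_for_dropout(n, 0.10)} enrolled per group")
```

Raising the power or shrinking the difference to be detected both increase the required sample size.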


¡ Research outcomes collectively can have a different interpretation than a single clinical trial
§ Meta-analyses may show findings that, if viewed in a broader context (vs dichotomous outcomes), can demonstrate consistent findings
§ View graphically to see where findings occur and the relationship between them

¡ McCormack et al. (BMC Medical Research Methodology; reference 7) advocate that authors, publishers, etc. who write about medical interventions use “common sense and good judgment when presenting results that differ from others and not be so beholden to the magical statistical significance level of 0.05.”

CONCLUSIONS FROM MULTIPLE STUDIES CAN BE DIFFERENT


ON TO THE FOREST……


FOREST PLOT (BLOBBOGRAM)

¡ Helps to see the forest and not get caught in all the trees

¡ Shows information from the individual studies that went into a meta-analysis at a glance

¡ Shows the amount of variation between studies and an estimate of the overall result


¡ Preferred Reporting Items for Systematic Reviews and Meta-Analyses

¡ Published 2009
¡ Standards for meta-analyses reporting
¡ Recommends reporting individual study results in a forest plot

PRISMA STATEMENT

http://www.prisma-statement.org/statement.htm

FOREST PLOT (BLOBBOGRAM)

¡ The Cochrane Collaboration logo
§ Forest plot

¡ Meta-analysis of 7 trials of the effect of giving steroids to moms giving birth prematurely

¡ 2 of 7 trials showed statistically significant improvement in infant mortality

Logo used with permission of The Cochrane Collaboration http://www.cochrane.org/about-us/history/our-logo

The plot is drawn in STATA 11 (Stata Corp., College Station, TX, USA) from data presented in Ezekowitz et al. Note that studies have been sorted first by whether they addressed primary or secondary prevention and second by year of publication.

Schriger D L et al. Int. J. Epidemiol. 2010;39:421-429

Published by Oxford University Press on behalf of the International Epidemiological Association © The Author 2010; all rights reserved.

Ezekowitz, JA et al. Systematic Review: Implantable Cardioverter Defibrillators for Adults with Left Ventricular Systolic Dysfunction. Annals of Internal Medicine. 2007;147(4):251-W50.

Ezekowitz, JA et al. Systematic Review: Implantable Cardioverter Defibrillators for Adults with Left Ventricular Systolic Dysfunction. Annals of Internal Medicine. 2007;147(4):251-W50.

Title: tells comparison and outcomes

Studies reviewed


Effect of ordering on appearance of forest plot.

Schriger D L et al. Int. J. Epidemiol. 2010;39:421-429 Published by Oxford University Press on behalf of the International Epidemiological Association © The Author 2010; all rights reserved.


¡ More weight to the studies which give us more information
§ More participants
§ More events
§ More precision

¡ Weight is proportional to the precision

WEIGHTING STUDIES
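"Weight is proportional to the precision" is commonly implemented as inverse-variance weighting, where each study's weight is 1 / SE². The sketch below assumes a fixed-effect model and uses invented log relative risks and standard errors for three studies; it is not the method of any particular meta-analysis cited here.

```python
import math

def pooled_fixed_effect(estimates, standard_errors):
    """Inverse-variance pooled estimate, its standard error, and each study's weight share."""
    weights = [1 / se ** 2 for se in standard_errors]  # precision = 1 / variance
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1 / total)
    shares = [w / total for w in weights]
    return pooled, pooled_se, shares

# Hypothetical studies: log relative risks and their standard errors
log_rr = [-0.30, -0.10, -0.25]
se = [0.10, 0.30, 0.20]

pooled, pooled_se, shares = pooled_fixed_effect(log_rr, se)
for i, share in enumerate(shares, start=1):
    print(f"Study {i}: weight {share:.1%}")
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled RR = {math.exp(pooled):.2f} (95% CI {math.exp(low):.2f} to {math.exp(high):.2f})")
```

The study with the smallest standard error dominates the pooled result, which is why it is drawn as the largest box on a forest plot.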


¡  Do we need studies to be exactly the same?

¡ When can we say we are measuring the same thing?

DOES IT MAKE SENSE TO COMBINE?


¡ Variation in study results due to chance
¡ Results of each individual trial are compatible with results of the others
¡ Can be estimated with forest plot
¡ Do CIs overlap?
§ Lower CI of each trial should be below upper CI of all the rest

¡ If outliers, heterogeneous
¡ Calculated with Chi-squared test
§ Low power with few studies
§ Over powered for many studies

HETEROGENEITY

http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.1000100#pmed-1000100-box006


¡ Chi-square (χ²)
¡ Degrees of freedom (df)
§ One less than number of trials analyzed
¡ If χ² = df, not interpretable
¡ If χ² > df, heterogeneous
¡ Sometimes p-value given
§ Significance threshold may be 0.1

HETEROGENEITY


¡ I² statistic
§ Quantifies variation across studies that is not due to chance
§ Percentage of total variation in estimated effects across studies that is due to heterogeneity
§ < 25% = homogeneous
§ Low power with few studies

HETEROGENEITY
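The chi-squared heterogeneity statistic (Cochran's Q) and I² can be computed from the same inputs as the pooled estimate. This sketch assumes the usual definitions, Q = Σ w(estimate − pooled)² with inverse-variance weights and I² = 100% × (Q − df) / Q floored at zero; neither formula is spelled out on the slides, and the study values are hypothetical.

```python
def heterogeneity(estimates, standard_errors):
    """Return Cochran's Q, its degrees of freedom, and I-squared (as a percentage)."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1  # one less than the number of trials
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i_squared

# Hypothetical log relative risks: one consistent set of trials, one with outliers
consistent = heterogeneity([-0.30, -0.25, -0.28], [0.10, 0.10, 0.10])
scattered = heterogeneity([-0.50, 0.00, 0.50], [0.10, 0.10, 0.10])

for name, (q, df, i2) in (("consistent", consistent), ("scattered", scattered)):
    print(f"{name}: chi-squared = {q:.2f}, df = {df}, I-squared = {i2:.0f}%")
```

When Q is no larger than df, I² is reported as 0%, matching the rule of thumb that heterogeneity is suspected only when χ² exceeds df.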


Palmer, SC et al. Meta-analysis: Vitamin D Compounds in Chronic Kidney Disease. Ann of Intern Med. 2007;147:840-53.


¡ Lack of evidence does not mean lack of value

¡ Not every procedure must be validated with randomized clinical trials

¡ Ethics and feasibility impact quality of available studies

¡ Clinician must decide whether a statistically significant or non-significant result is clinically significant for a given patient

INTERPRETING THE FINDINGS


¡ No adequate synthesis
¡ Mainly on therapy
¡ For some Systematic Reviews it is difficult to define what the intervention really is (e.g. Non-drug interventions)

LIMITATIONS


¡  1. It is important to not only understand statistical concepts and results, but also to have the context to correctly interpret them

¡  2. P-values, confidence intervals and forest plots each can add unique information in interpreting the meaning of a research question result

¡  3. No single study definitively answers a research question; meta-analyses (forest plots) provide a “big picture” perspective by combining multiple studies

¡  4. Statistical significance does not necessarily mean a finding is clinically relevant

TAKE-HOME POINTS


LINGERING QUESTIONS?


¡  1. Coulson M, Healey M, Fidler F, Cumming G. Confidence intervals permit, but do not guarantee, better inference than statistical significance testing. Frontiers in Psychology. 2010;1(26):1-9.

¡  2. Leggett NC, Thomas NA, Loetscher T, Nicholls ME. The life of P: “Just significant” results are on the rise. The Quarterly J. of Exp. Psych. 2013;66(12):2303-2309.

¡  3. Statistical Significance. Available at: http://mchankins.wordpress.com/2013/04/21/still-not-significant-2/ Accessed on May 27, 2014.

¡  4. Gaskin CJ, Happell B. Power, effects, confidence and significance: An investigation of statistical practices in nursing research. 2013; Available at http://dx.doi.org/10.1016/j.ijnurstu.2013.09.014 Accessed May 14, 2014.

¡  5. Altman D. Why we need confidence intervals. World J. Surg. 2003;29:554-556.

¡  6. Wolfe R, Cumming G. Communicating the uncertainty in research findings: confidence intervals. J Sci Med Sport. 2004;7(2):138-143.

¡  7. McCormack J, Vandermeer B, Allan GM. How confidence intervals become confusion intervals. BMC Medical Research Methodology. 2013;13:134. Available at http://www.biomedcentral.com/1471-2288/13/134 Accessed May 14, 2014.

REFERENCES & RESOURCES


¡  8. Altman DG. Confidence intervals for the number needed to treat. BMJ. 1998;317:1309-1312.

¡  9. Ferrill MJ, Brown DA, Kyle JA. Clinical versus statistical significance: Interpreting P values and confidence intervals related to measures of association to guide decision making. J Pharm Practice. 2010;23(4):344-351. Available at http://jpp.sagepub.com/content/23/4/344 Accessed on May 14, 2014.

¡  10. Cumming G. The new statistics: why and how. Psychological Science. 2014;25(1):7-29. Available at http://pss.sagepub.com/content/25/1/7 Accessed on May 14, 2014.

¡  11. The Cochrane Collaboration. Considerations and recommendations for figures in Cochrane reviews: graphs of statistical data. 2008. http://www.cochrane.org/sites/default/files/uploads/Graph_recommendations9.pdf. Accessed February 10, 2014.

¡  12. The Cochrane Collaboration. Cochrane Collaboration logo. http://www.cochrane.org/about-us/history/our-logo. Accessed February 10, 2014.

REFERENCES & RESOURCES


¡  13. Lewis S, Clarke M. Forest plots: trying to see the wood and the trees. BMJ. 2001;322:1479-1480.

¡  14. Buitrago-Lopez A, et al. Chocolate consumption and cardiometabolic disorders: systematic review and meta-analysis. BMJ 2011;343:d4488 doi: 10.1136/bmj.d4488

¡  15. Cochrane Students’ Journal Club. Heterogeneity, I squared and subgroups. http://csjc.informer.org.in/knowledgebase/heterogeneity-i-squared-and-subgroups. Accessed February 19, 2014.

¡  16. Chlebowski RT, et al. Estrogen Plus Progestin and Breast Cancer Incidence and Mortality in Postmenopausal Women. JAMA. 2010;304(15):1684-1692.

REFERENCES & RESOURCES
