Primer on Statistics for Interventional Cardiologists Giuseppe Sangiorgi, MD Pierfrancesco Agostoni, MD Giuseppe Biondi-Zoccai, MD

What you will learn

Page 1: What you will learn

Primer on Statistics for Interventional Cardiologists

Giuseppe Sangiorgi, MD
Pierfrancesco Agostoni, MD
Giuseppe Biondi-Zoccai, MD

Page 2: What you will learn

What you will learn

• Introduction

• Basics

• Descriptive statistics

• Probability distributions

• Inferential statistics

• Finding differences in mean between two groups

• Finding differences in mean between more than 2 groups

• Linear regression and correlation for bivariate analysis

• Analysis of categorical data (contingency tables)

• Analysis of time-to-event data (survival analysis)

• Advanced statistics at a glance

• Conclusions and take home messages

Page 4: What you will learn

What you will learn

• Descriptive statistics

– frequency distributions
– contingency tables
– measures of location: mean, median, mode
– measures of dispersion: variance, standard deviation, range, interquartile range
– coefficient of variation
– graphical presentation: histogram, box-plot, scatter plot
– correlation

Page 6: What you will learn

Cardiology

Counting and displaying data

After we have collected our data, we need to display them (tables, graphics and figures).

Raw enumeration (eg lesion length by visual estimation in patients treated in Endeavor II trial: 14-27 mm)

……

Page 7: What you will learn

Cardiology

example

Tabular display

Page 8: What you will learn

Cardiology

example

DELAYED RRISC, JACC 2007

Tabular display

Page 9: What you will learn

Cardiology

example

DELAYED RRISC, JACC 2007

Tabular display

Page 10: What you will learn

Types of variables

Variables
– CATEGORY
   – nominal
   – ordinal: ordered categories, ranks
– QUANTITY
   – discrete: counting
   – continuous: measuring

Page 11: What you will learn

Cardiology

Counting and displaying data

Create a database!

Patient ID   Diabetes    AHA/ACC type   Lesion length (mm)
             (nominal)   (ordinal)      (continuous)
1            Y           A              18
2            N           B1             24
3            N           A              17
4            N           C              25
5            Y           B2             23
6            N           A              15
7            N           A              16
8            Y           B2             18
9            N           B1             21
10           Y           B2             19
11           N           B1             14
12           Y           C              22
13           N           C              27

Page 12: What you will learn

Cardiology

Frequency distribution

A frequency distribution is a list of the values that a variable takes in a sample. It is usually a list, ordered by quantity, showing the number of times each value appears

Diabetes (n=13)

Yes   5
No    8

AHA/ACC type (n=13)

A     4
B1    3
B2    3
C     3

Page 13: What you will learn

Cardiology

Frequency distribution

A frequency distribution is a list of the values that a variable takes in a sample. It is usually a list, ordered by quantity, showing the number of times each value appears

Diabetes (n=13)

Yes   5   38.5%
No    8   61.5%

AHA/ACC type (n=13)

A     4   30.8%
B1    3   23.1%
B2    3   23.1%
C     3   23.1%

This introduces the concept of percentage or rate
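The counts and percentages above can be reproduced with a few lines of code. This is only a minimal sketch: the list `lesion_type` is the AHA/ACC column of the 13-patient example database, and the helper name `frequency_table` is ours, not part of any statistical package.

```python
from collections import Counter

# AHA/ACC lesion type of the 13 example patients, in patient ID order
lesion_type = ["A", "B1", "A", "C", "B2", "A", "A",
               "B2", "B1", "B2", "B1", "C", "C"]


def frequency_table(values):
    """Return {value: (count, percent)}, percent rounded to one decimal."""
    counts = Counter(values)
    n = len(values)
    return {v: (c, round(100 * c / n, 1)) for v, c in sorted(counts.items())}


table = frequency_table(lesion_type)
for value, (count, percent) in table.items():
    print(f"{value:<3} {count:>2} {percent:>5.1f}%")
```

Each percentage is simply the count divided by the sample size (n=13), so the four rows add up to 100% apart from rounding.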

Page 14: What you will learn

Cardiology

Frequency distribution

ENDEAVOR III, JACC 2006

Page 15: What you will learn

Cardiology

Frequency distribution

This simple tabulation has drawbacks. When a variable can take continuous values instead of discrete values or when the number of possible values is too large, the table construction is cumbersome, if not impossible

Lesion length (n=13)

Length (mm)   n    %
14            1    7.7%
15            1    7.7%
16            1    7.7%
17            1    7.7%
18            2   15.4%
19            1    7.7%
21            1    7.7%
22            1    7.7%
23            1    7.7%
24            1    7.7%
25            1    7.7%
27            1    7.7%

Page 16: What you will learn

Cardiology

Frequency distribution

A slightly different tabulation scheme based on the range of values can be a solution in such cases

Lesion length (n=13)

14-20 mm   7   53.8%
21-27 mm   6   46.2%

However, better solutions are coming later…
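As an illustration of tabulating by range, the sketch below counts how many lesion lengths fall in each interval. The helper `binned_counts` is a hypothetical name introduced here, and the intervals are treated as inclusive at both ends, which is an assumption that suits whole-millimetre values.

```python
# Lesion length (mm) of the 13 example patients, in patient ID order
lesion_length = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def binned_counts(values, bins):
    """Count the values in each inclusive (low, high) interval."""
    n = len(values)
    result = {}
    for low, high in bins:
        count = sum(low <= v <= high for v in values)
        result[f"{low}-{high} mm"] = (count, round(100 * count / n, 1))
    return result


groups = binned_counts(lesion_length, [(14, 20), (21, 27)])
for label, (count, percent) in groups.items():
    print(f"{label:<9} {count:>2} {percent:>5.1f}%")
```

The choice of cut-points is arbitrary, and different cut-points can change the impression the table gives.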

Page 18: What you will learn

Cardiology

Contingency tables are used to record and analyse the relationship between two (or more) variables, most usually categorical variables

Counting and displaying data

Diabetes (n=13)

Yes   5   38.5%
No    8   61.5%

AHA/ACC type (n=13)

A     4   30.8%
B1    3   23.1%
B2    3   23.1%
C     3   23.1%

Page 19: What you will learn

Cardiology

Contingency tables are used to record and analyse the relationship between two (or more) variables, most usually categorical variables

Counting and displaying data

              AHA/ACC type
DIABETES      A     B1    B2    C     Total
no            3     3     0     2     8
yes           1     0     3     1     5
Total         4     3     3     3     13

Page 20: What you will learn

Cardiology

Contingency tables are used to record and analyse the relationship between two (or more) variables, most usually categorical variables

Counting and displaying data

                               AHA/ACC type
DIABETES                       A       B1      B2      C       Total
no      Count                  3       3       0       2       8
        % within DIABETES      37.5%   37.5%   0.0%    25.0%   100.0%
yes     Count                  1       0       3       1       5
        % within DIABETES      20.0%   0.0%    60.0%   20.0%   100.0%
Total   Count                  4       3       3       3       13
        % within DIABETES      30.8%   23.1%   23.1%   23.1%   100.0%

Is there a difference between diabetics and non-diabetics in the rate of AHA/ACC type lesions?

The answer will follow…
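In the meantime, here is a minimal sketch of how the contingency table itself can be built from the 13-patient database. The pairs in `patients` are copied from the example data; the helper `row` is a name introduced only for this illustration.

```python
from collections import Counter

# (diabetes, AHA/ACC type) of the 13 example patients, in patient ID order
patients = [
    ("yes", "A"), ("no", "B1"), ("no", "A"), ("no", "C"), ("yes", "B2"),
    ("no", "A"), ("no", "A"), ("yes", "B2"), ("no", "B1"), ("yes", "B2"),
    ("no", "B1"), ("yes", "C"), ("no", "C"),
]

cells = Counter(patients)          # one count per (diabetes, type) cell
types = ["A", "B1", "B2", "C"]


def row(diabetes):
    """Counts, within-row percentages and row total for one diabetes category."""
    counts = [cells[(diabetes, t)] for t in types]
    total = sum(counts)
    percents = [round(100 * c / total, 1) for c in counts]
    return counts, percents, total


for status in ("no", "yes"):
    counts, percents, total = row(status)
    print(status, counts, percents, total)
```

The percentages are computed within each row (within DIABETES), so each row sums to 100%. Whether the two rows really differ is a question for inferential statistics, not for this descriptive table.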

Page 22: What you will learn

Cardiology

We need to describe the kind of values that we have (eg lesion length by visual estimation in patients treated in Endeavor II trial: 14-27 mm)

Raw enumeration

……

Measures of central tendency: rationale

Page 23: What you will learn

Cardiology

Mean (arithmetic)

$$\bar{x} = \frac{\sum x}{N}$$

Characteristics:
– summarises information well
– discards a lot of information (dispersion??)

Assumptions:
– data are not skewed
   – skew distorts the mean
   – outliers make the mean very different
– measured on a measurement scale
   – cannot find the mean of a categorical measure
   – ‘average’ stent diameter may be meaningless

Page 24: What you will learn

Cardiology

Mean (arithmetic)

$$\bar{x} = \frac{\sum x}{N}$$

Lesion length (n=13)

Length (mm)   n    %
14            1    7.7%
15            1    7.7%
16            1    7.7%
17            1    7.7%
18            2   15.4%
19            1    7.7%
21            1    7.7%
22            1    7.7%
23            1    7.7%
24            1    7.7%
25            1    7.7%
27            1    7.7%

Mean = (14+15+16+17+18+18+19+21+22+23+24+25+27) / 13 = 259 / 13 = 19.92
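The same arithmetic, written out as a short sketch (the variable names are ours):

```python
# Lesion length (mm) of the 13 example patients, ordered
lesion_length = [14, 15, 16, 17, 18, 18, 19, 21, 22, 23, 24, 25, 27]

total = sum(lesion_length)          # numerator: sum of all values
n = len(lesion_length)              # denominator: number of values
mean = total / n

print(f"sum = {total}, N = {n}, mean = {mean:.2f}")
```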

Page 25: What you will learn

Cardiology

TAPAS, Lancet 2008

Mean (arithmetic)

Page 26: What you will learn

Cardiology

What is it?

– The one in the middle

– Place values in order

– Median is central

Definition:

– The value with as many observations below it as above it

Used for:

– Ordinal data

– Skewed data / outliers

Median

Page 28: What you will learn

Cardiology

Median (variable type: continuous)

As recorded                      Ordered by lesion length
Patient ID   Lesion length       Patient ID   Lesion length
1            18                  11           14
2            24                  6            15
3            17                  7            16
4            25                  3            17
5            23                  1            18
6            15                  8            18
7            16                  10           19   (median: 7th of 13 values)
8            18                  9            21
9            21                  12           22
10           19                  5            23
11           14                  2            24
12           22                  4            25
13           27                  13           27
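Placing the values in order and picking the central one can be sketched as follows. The function `median` below is written out by hand for illustration; for an even number of values it averages the two central ones, which is the usual convention but is not spelled out on the slide.

```python
# Lesion length (mm) of the 13 example patients, in patient ID order
lesion_length = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def median(values):
    """Middle value of the ordered data (mean of the two middle values if N is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sorted(lesion_length))
print("median =", median(lesion_length))
```

In practice the standard library function `statistics.median` does the same job.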

Page 29: What you will learn

Cardiology

What is it?

Definition:

– The most common value

Used (rarely) for:

– Discrete non-interval data

– E.g. stent length, stent diameter…

– MicroDriver is only available in 2.25, 2.50, 2.75: reporting the mean is meaningless

Mode

Page 30: What you will learn

Cardiology

Mode (variable type: continuous)

Patient ID   Lesion length
1            18
2            24
3            17
4            25
5            23
6            15
7            16
8            18
9            21
10           19
11           14
12           22
13           27

Lesion length (n=13)

Length (mm)   n    %
14            1    7.7%
15            1    7.7%
16            1    7.7%
17            1    7.7%
18            2   15.4%   (mode)
19            1    7.7%
21            1    7.7%
22            1    7.7%
23            1    7.7%
24            1    7.7%
25            1    7.7%
27            1    7.7%
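Finding the most common value is a one-liner with the Python standard library. This sketch uses `statistics.multimode`, which returns every value tied for the highest count, so it also copes with data that have more than one mode.

```python
from statistics import multimode

# Lesion length (mm) of the 13 example patients, in patient ID order
lesion_length = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]

modes = multimode(lesion_length)    # list of the most frequent value(s)
print("mode(s):", modes)
```

With only 13 patients the mode rests on a single repeated value, which is one reason it is rarely reported for continuous data.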

Page 31: What you will learn

Cardiology

Comparing measures of central tendency

Mean is usually best
– If it works
– Useful properties (with standard deviation [SD])
– But…

Lesion length

          Driver   Endeavor
          17       21
          19       21
          19       21
          17       21
          18       6
Mean      18       18
Median    18       21
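The Driver and Endeavor columns make the point numerically: one outlying value (6) drags the Endeavor mean down to 18 while the median stays at 21. A minimal sketch using the two columns from the slide:

```python
from statistics import mean, median

# Lesion length values from the slide
driver = [17, 19, 19, 17, 18]
endeavor = [21, 21, 21, 21, 6]      # one outlier: 6

for name, values in (("Driver", driver), ("Endeavor", endeavor)):
    print(f"{name:<9} mean = {mean(values)}  median = {median(values)}")
```

The two groups share the same mean yet look very different, which is why the median is preferred when data are skewed or contain outliers.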

Page 32: What you will learn

Cardiology

It also depends on the underlying distribution…

Symmetric? mean = median = mode

Comparing Measures of central tendency

[Figure: symmetric frequency distribution (x-axis: value; y-axis: frequency)]

Page 33: What you will learn

Cardiology

It also depends on the underlying distribution…

Asymmetric? mean ≠ median ≠ mode

[Figure: frequency distribution of the number of Endeavor implanted per patient (x-axis: 0 to 9; y-axis: frequency, 0 to 30), with mode, median and mean marked at different positions]

Comparing Measures of central tendency

Page 34: What you will learn

Cardiology

Agostoni et al, AJC 2007

Median

Page 36: What you will learn

Cardiology

Measures of dispersion: rationale

Central tendency doesn’t tell us everything
– We need to know about the spread, or dispersion, of the scores

Is there a difference? And if yes, how big is it?

We can only tell if we know data dispersion

Group      Late loss (mm)
Endeavor   0.61
Driver     1.03

ENDEAVOR II, Circulation 2006

Page 37: What you will learn

Cardiology

Measures of dispersion: examples

[Figure: frequency distributions of late loss for Driver and Endeavor (x-axis: late loss, 0 to 1.50; y-axis: frequency)]

Page 40: What you will learn

Cardiology

Shape of distribution

Gaussian, normal or “parametric” distribution

[Figure: bell-shaped frequency curve (x-axis: value; y-axis: frequency)]

Page 41: What you will learn

Cardiology

Departing from normality

Non-normal, right-skewed

[Figure: right-skewed frequency curve (x-axis: value; y-axis: frequency)]

Page 42: What you will learn

Cardiology

Departing from normality

Non-normal, left-skewed

[Figure: left-skewed frequency curve (x-axis: value; y-axis: frequency)]

Page 43: What you will learn

Cardiology

Departing from normality

Outliers

[Figure: frequency distribution with outliers (x-axis: value; y-axis: frequency, 0 to 20)]

Page 44: What you will learn

Cardiology

Measures of dispersion: types

• Standard deviation (SD)
– Used with mean
– Parametric tests

• Range
– First to last value
– Not commonly used

• Interquartile range
– Used with median
– 25th (1/4) to 75th (3/4) percentile
– Non-parametric tests

Page 45: What you will learn

Cardiology

Standard deviation

$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$$

(the quantity under the square root is the variance)

Standard deviation (SD):
– approximates population σ as N increases

Advantages:
– with mean enables powerful synthesis
   mean ± 1 SD: 68% of data
   mean ± 2 SD: 95% of data (exactly 1.96 SD)
   mean ± 3 SD: 99.7% of data (99% lies within 2.58 SD)

Disadvantages:
– is based on normal assumptions
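The percentages quoted for mean ± k SD hold only for normally distributed data, and they can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names `coverage` and `multiplier` are ours.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def coverage(k):
    """Fraction of a normal distribution lying within mean ± k SD."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


def multiplier(fraction):
    """Number of SDs either side of the mean that contains `fraction` of the data."""
    return standard_normal.inv_cdf(0.5 + fraction / 2)


for k in (1, 2, 3):
    print(f"mean ± {k} SD covers {100 * coverage(k):.1f}% of data")

for fraction in (0.95, 0.99):
    print(f"{100 * fraction:.0f}% of data lie within ± {multiplier(fraction):.2f} SD")
```

For skewed data these figures no longer apply, which is the disadvantage noted on the slide.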

Page 46: What you will learn

Cardiology

Standard deviation (variable type: continuous)

$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$$

Patient ID   Lesion length
1            18
2            24
3            17
4            25
5            23
6            15
7            16
8            18
9            21
10           19
11           14
12           22
13           27

Mean = 19.92

Variance = [(18-19.92)² + (24-19.92)² + (17-19.92)² + … + (27-19.92)²] / 12 = 16.58

SD = √16.58 = 4.07
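The calculation can be followed step by step in code. This is a sketch with our own variable names; the last line cross-checks the hand calculation against `statistics.stdev`, which also divides by N - 1.

```python
import math
import statistics

# Lesion length (mm) of the 13 example patients, in patient ID order
lesion_length = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]

n = len(lesion_length)
mean = sum(lesion_length) / n

# Sum of squared deviations from the mean
sum_of_squares = sum((x - mean) ** 2 for x in lesion_length)

variance = sum_of_squares / (n - 1)   # sample variance: divide by N - 1
sd = math.sqrt(variance)

print(f"mean = {mean:.2f}, variance = {variance:.2f}, SD = {sd:.2f}")
print("library check:", round(statistics.stdev(lesion_length), 5))
```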

Page 47: What you will learn

Cardiology

Mean ± Standard deviation

[Figure: normal frequency curve; the interval from -1 SD to +1 SD around the mean contains 68% of data]

Page 48: What you will learn

Cardiology

Mean ± Standard deviation

[Figure: normal frequency curve; the interval from -2 SD to +2 SD around the mean contains 95% of data]

Page 49: What you will learn

Cardiology

Mean ± Standard deviation

[Figure: normal frequency curve; the interval from -3 SD to +3 SD around the mean contains more than 99% of data]

Page 50: What you will learn

Cardiology

TAPAS, Lancet 2008

Standard deviation

Page 51: What you will learn

Cardiology

TAPAS, NEJM 2008

Standard deviation

Page 52: What you will learn

Cardiology

TAPAS, NEJM 2008

Why not mean ± SD?

Page 53: What you will learn

Cardiology

Testing normality assumptions

Rules of thumb

1. Refer to previous data or analyses (eg landmark articles, large databases)

2. Inspect tables and graphs (eg outliers, histograms)

3. Check rough equality of mean, median, mode

4. Perform ad hoc statistical tests
• Levene’s test for equality of variances
• Kolmogorov-Smirnov test
• …

Page 54: What you will learn

Cardiology

Range

First to last value

Lesion length (n=13)

Length (mm)   n    %
14            1    7.7%
15            1    7.7%
16            1    7.7%
17            1    7.7%
18            2   15.4%
19            1    7.7%
21            1    7.7%
22            1    7.7%
23            1    7.7%
24            1    7.7%
25            1    7.7%
27            1    7.7%

Range = 14 – 27
or
Range = 13

Page 55: What you will learn

Cardiology

Range

RRISC, JACC 2006

Page 56: What you will learn

Cardiology

Interquartile range (variable type: continuous)

Patient ID   Lesion length
11           14
6            15
7            16
                             25th percentile (1st quartile) = 16.5
3            17
1            18
8            18
10           19              Median
9            21
12           22
5            23
                             75th percentile (3rd quartile) = 23.5
2            24
4            25
13           27

25th to 75th percentile, or 1st to 3rd quartile

Interquartile range = 16.5 – 23.5
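The quartiles can be reproduced with `statistics.quantiles` from the Python standard library. The sketch assumes the "exclusive" interpolation method, which matches the 16.5 and 23.5 shown here; other software conventions for percentiles can give slightly different cut-points on a sample this small.

```python
from statistics import quantiles

# Lesion length (mm) of the 13 example patients, in patient ID order
lesion_length = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]

# n=4 splits the ordered data into quarters and returns the three cut-points
q1, q2, q3 = quantiles(lesion_length, n=4, method="exclusive")
iqr_width = q3 - q1

print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}")
print(f"interquartile range = {q1} to {q3} (width {iqr_width})")
```

The interquartile range contains the central half of the observations, so unlike the range it is not affected by one extreme value.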

Page 57: What you will learn

Cardiology

Agostoni et al, AJC 2007

Interquartile range

Page 59: What you will learn

Cardiology

Statistics: Lesion Length

N                Valid      13
                 Missing    0
Mean                        19.9231
Median                      19.0000
Mode                        18.00
Std. Deviation              4.07148
Range                       13.00
Minimum                     14.00
Maximum                     27.00
Percentiles      25         16.5000
                 50         19.0000
                 75         23.5000

Lesion Length

Valid    Frequency   Percent   Valid Percent   Cumulative Percent
14.00    1           7.7       7.7             7.7
15.00    1           7.7       7.7             15.4
16.00    1           7.7       7.7             23.1
17.00    1           7.7       7.7             30.8
18.00    2           15.4      15.4            46.2
19.00    1           7.7       7.7             53.8
21.00    1           7.7       7.7             61.5
22.00    1           7.7       7.7             69.2
23.00    1           7.7       7.7             76.9
24.00    1           7.7       7.7             84.6
25.00    1           7.7       7.7             92.3
27.00    1           7.7       7.7             100.0
Total    13          100.0     100.0

Page 60: What you will learn

Cardiology

Reporting data

If parametric: mean and standard deviation
   Mean ± SD      Age (y): 63 ± 13
   Mean (SD)      Age (y): 63 (13)

If non-parametric: median and interquartile range
   Median [IQR]   NIH vol (mm³): 1.3 [0–13.1]

Mode and range less commonly used

Page 62: What you will learn

Coefficient of Variation

• The coefficient of variation (CV) is a normalized measure of dispersion of a probability distribution. It is defined as the ratio of the standard deviation to the mean

• This is only defined for non-zero mean, and is most useful for variables that are always positive. The coefficient of variation should only be computed for continuous data

• A given standard deviation indicates a high or low degree of variability only in relation to the mean value

• It is easier to get an idea of variability in a distribution by dividing the standard deviation by the mean

Page 63: What you will learn

Coefficient of Variation

• Advantages

– The CV is a dimensionless number

– The CV is particularly useful when comparing dispersion in datasets with markedly different means, or different units of measurement

– Distributions with CV<1 are considered low-variance, while those with CV>1 are considered high-variance

• Disadvantages

– When the mean is near zero, the CV is sensitive to small changes in the mean, limiting its usefulness

– Unlike the standard deviation, it cannot be used to construct confidence intervals for the mean
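As a worked example, the CV of the lesion lengths used throughout this section can be computed as below. The function name `coefficient_of_variation` is ours, and the check for a zero mean reflects the restriction stated above.

```python
from statistics import mean, stdev

# Lesion length (mm) of the 13 example patients, in patient ID order
lesion_length = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def coefficient_of_variation(values):
    """Sample SD divided by the mean; undefined when the mean is zero."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is not defined for a zero mean")
    return stdev(values) / m


cv = coefficient_of_variation(lesion_length)
print(f"CV = {cv:.3f} ({100 * cv:.1f}%)")
```

Because the millimetres cancel out, the result is a pure number and would be the same if the lengths were recorded in centimetres.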

Page 65: What you will learn

Cardiology

Histograms

Very good for categorical variables

[Figure: histogram of DIABETES (no, yes); y-axis from 0 to 10]

ENDEAVOR II, Circulation 2006

Page 66: What you will learn

Cardiology

Histograms

Not so good for continuous variables, but…

Page 67: What you will learn

Cardiology

Histograms

example: both restenotic and non-restenotic SES

Agostoni et al, AJC 2007

Page 68: What you will learn

Cardiology

Shape of distributions

example: non-restenotic SES (shape of distribution)

Agostoni et al, AJC 2007

Page 69: What you will learn

Cardiology

Box (& whiskers) plots

Page 70: What you will learn

Cardiology

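Box (& whiskers) plots

[Figure: box plot with labelled elements, from top to bottom]
– Upper whisker: Max (Q4) or Q3 + 1.5 × IQR
– Top of box: Q3
– Line inside box: Median (Q2)
– Bottom of box: Q1
– Lower whisker: Min (Q0) or Q1 − 1.5 × IQR
– Height of box: interquartile range (IQR)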
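The numbers behind a box plot can be computed without drawing anything. The sketch below assumes the common convention in which each whisker stops at the most extreme observation still inside Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, with anything beyond shown as an outlier; the function name `box_plot_summary` and the "exclusive" quartile method are our choices, and plotting software may differ.

```python
from statistics import quantiles

# Lesion length (mm) of the 13 example patients, in patient ID order
lesion_length = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def box_plot_summary(values):
    """Quartiles, fences, whisker ends and outliers for a box plot."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4, method="exclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "low_fence": low_fence, "high_fence": high_fence,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [v for v in ordered if v < low_fence or v > high_fence],
    }


summary = box_plot_summary(lesion_length)
for name, value in summary.items():
    print(f"{name:<13} {value}")
```

For these 13 lesions no value falls outside the fences, so the whiskers simply run from the minimum to the maximum.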

Page 71: What you will learn

Cardiology

Box (& whiskers) plots

Margheri, Biondi Zoccai, et al, AJC 2008

Page 72: What you will learn

Cardiology

Scatter plots

A scatter plot is a type of display using Cartesian coordinates to display values for two variables for a set of data. The data is displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis

Usually it is done with 2 continuous variables to visually assess the degree of correlation between them

But it can be also used with one categorical variable and one continuous variable (mainly if sample size is small)

Page 73: What you will learn

Cardiology

Scatter plots

Abbate, Biondi Zoccai, et al, Circulation 2002

Page 74: What you will learn

Cardiology

Scatter plots

Mintz, et al, AJC 2005

Page 75: What you will learn

Cardiology

Agostoni, et al, IJC 2007

Scatter plots

Page 76: What you will learn

Thank you for your attention

For any correspondence: [email protected]

For further slides on these topics feel free to visit the metcardio.org website:

http://www.metcardio.org/slides.html