Primer on Statistics for Interventional Cardiologists
Giuseppe Sangiorgi, MD; Pierfrancesco Agostoni, MD; Giuseppe Biondi-Zoccai, MD
What you will learn
• Introduction
• Basics
• Descriptive statistics
• Probability distributions
• Inferential statistics
• Finding differences in mean between two groups
• Finding differences in mean between more than 2 groups
• Linear regression and correlation for bivariate analysis
• Analysis of categorical data (contingency tables)
• Analysis of time-to-event data (survival analysis)
• Advanced statistics at a glance
• Conclusions and take home messages
What you will learn
• Descriptive statistics
– frequency distributions
– contingency tables
– measures of location: mean, median, mode
– measures of dispersion: variance, standard deviation, range, interquartile range
– coefficient of variation
– graphical presentation: histogram, box-plot, scatter plot
– correlation
Cardiology
Counting and displaying data
After we have collected our data, we need to display them (tables, graphics and figures).
Raw enumeration (e.g., lesion length by visual estimation in patients treated in the Endeavor II trial: 14-27 mm)
……
Tabular display: example [figure]
Tabular display: example from DELAYED RRISC, JACC 2007 [figure]
Types of variables
• CATEGORY
– nominal
– ordinal: ordered categories, ranks
• QUANTITY
– discrete: counting
– continuous: measuring
Counting and displaying data: create a database!
Patient ID   Diabetes    AHA/ACC type   Lesion length (mm)
             (nominal)   (ordinal)      (continuous)
1            Y           A              18
2            N           B1             24
3            N           A              17
4            N           C              25
5            Y           B2             23
6            N           A              15
7            N           A              16
8            Y           B2             18
9            N           B1             21
10           Y           B2             19
11           N           B1             14
12           Y           C              22
13           N           C              27
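The database above can be kept as a simple table of records, one row per patient. A minimal Python sketch, assuming Python 3 and only the standard library; the column names (`patient_id`, `diabetes`, `aha_acc_type`, `lesion_length_mm`) and the CSV export are illustrative choices, not part of the original slides:

```python
import csv
import io

# The 13 example patients: (patient ID, diabetes Y/N, AHA/ACC lesion type, lesion length in mm)
RECORDS = [
    (1, "Y", "A", 18), (2, "N", "B1", 24), (3, "N", "A", 17),
    (4, "N", "C", 25), (5, "Y", "B2", 23), (6, "N", "A", 15),
    (7, "N", "A", 16), (8, "Y", "B2", 18), (9, "N", "B1", 21),
    (10, "Y", "B2", 19), (11, "N", "B1", 14), (12, "Y", "C", 22),
    (13, "N", "C", 27),
]

# Hypothetical column names; variable types: diabetes = nominal,
# aha_acc_type = ordinal, lesion_length_mm = continuous.
HEADER = ["patient_id", "diabetes", "aha_acc_type", "lesion_length_mm"]


def to_csv(records):
    """Return the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(HEADER)
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(RECORDS))
```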
Frequency distribution
A frequency distribution is a list of the values that a variable takes in a sample. It is usually a list, ordered by value, showing the number of times each value appears.
Frequency distribution with percentages
Diabetes (n=13)
Yes   5   38.5%
No    8   61.5%
AHA/ACC type (n=13)
A    4   30.8%
B1   3   23.1%
B2   3   23.1%
C    3   23.1%
This introduces the concept of percentage, or rate: the number of times a value appears divided by the total number of observations, times 100.
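The tabulation above (counts plus percentages) can be reproduced with a few lines of Python. A minimal sketch, assuming Python 3 and the standard library `collections.Counter`; the function name `frequency_distribution` is hypothetical, and percentages are rounded to one decimal as on the slide:

```python
from collections import Counter

# Diabetes status and AHA/ACC lesion type for patients 1-13, from the example database.
diabetes = ["Y", "N", "N", "N", "Y", "N", "N", "Y", "N", "Y", "N", "Y", "N"]
aha_acc_type = ["A", "B1", "A", "C", "B2", "A", "A", "B2", "B1", "B2", "B1", "C", "C"]


def frequency_distribution(values):
    """Return (value, count, percentage) rows, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(value, count, round(100 * count / n, 1))
            for value, count in sorted(counts.items())]


for row in frequency_distribution(diabetes):
    print(row)
for row in frequency_distribution(aha_acc_type):
    print(row)
```

Note that 4/13 rounds to 30.8%, not 30.7%.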
Frequency distribution: example from ENDEAVOR III, JACC 2006 [figure]
Frequency distribution of a continuous variable
This simple tabulation has drawbacks. When a variable can take continuous rather than discrete values, or when the number of possible values is too large, constructing the table is cumbersome, if not impossible.
Lesion length (mm), n=13
14   1   7.7%
15   1   7.7%
16   1   7.7%
17   1   7.7%
18   2   15.4%
19   1   7.7%
21   1   7.7%
22   1   7.7%
23   1   7.7%
24   1   7.7%
25   1   7.7%
27   1   7.7%
Grouped frequency distribution
A slightly different tabulation scheme, based on ranges of values, can be a solution in such cases.
Lesion length, n=13
14-20 mm   7   53.8%
21-27 mm   6   46.2%
However, better solutions will come later…
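The grouped tabulation above can be sketched in Python as follows, assuming the two inclusive class intervals used on the slide (14-20 mm and 21-27 mm); `BINS` and `grouped_frequency` are hypothetical names:

```python
# Lesion lengths (mm) for patients 1-13, from the example database.
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]

# Inclusive class intervals (low, high) in mm, as on the slide.
BINS = [(14, 20), (21, 27)]


def grouped_frequency(values, bins):
    """Return (label, count, percentage) for each inclusive interval."""
    n = len(values)
    rows = []
    for low, high in bins:
        count = sum(1 for v in values if low <= v <= high)
        rows.append((f"{low}-{high} mm", count, round(100 * count / n, 1)))
    return rows


for row in grouped_frequency(lesion_length_mm, BINS):
    print(row)
```

Inclusive integer bounds work here because the lengths are whole millimetres; for truly continuous measurements, half-open intervals such as [14, 21) avoid gaps between classes.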
Counting and displaying data: contingency tables
Contingency tables are used to record and analyse the relationship between two (or more) variables, most usually categorical variables.
Diabetes (n=13)
Yes   5   38.5%
No    8   61.5%
AHA/ACC type (n=13)
A    4   30.8%
B1   3   23.1%
B2   3   23.1%
C    3   23.1%
Cross-tabulating the two variables gives the counts:
                 AHA/ACC type
Diabetes    A    B1   B2   C    Total
No          3    3    0    2    8
Yes         1    0    3    1    5
Total       4    3    3    3    13
Adding row percentages (% within diabetes):
                             AHA/ACC type
Diabetes                     A       B1      B2      C       Total
No      Count                3       3       0       2       8
        % within diabetes    37.5%   37.5%   0.0%    25.0%   100.0%
Yes     Count                1       0       3       1       5
        % within diabetes    20.0%   0.0%    60.0%   20.0%   100.0%
Total   Count                4       3       3       3       13
        % within diabetes    30.8%   23.1%   23.1%   23.1%   100.0%
Is there a difference between diabetics and non-diabetics in the distribution of AHA/ACC lesion types?
The answer will follow…
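The contingency table and its row percentages can be sketched in Python as below. This is a descriptive cross-tabulation only (no test of association), assuming the standard library; `contingency_table` and `LESION_TYPES` are hypothetical names:

```python
from collections import Counter

# Diabetes status (Y/N) and AHA/ACC lesion type for patients 1-13, from the example database.
diabetes = ["Y", "N", "N", "N", "Y", "N", "N", "Y", "N", "Y", "N", "Y", "N"]
aha_acc_type = ["A", "B1", "A", "C", "B2", "A", "A", "B2", "B1", "B2", "B1", "C", "C"]
LESION_TYPES = ["A", "B1", "B2", "C"]


def contingency_table(row_var, col_var, col_levels):
    """Cross-tabulate two categorical variables.

    Returns {row level: (counts per column level, row total, % within row)}.
    """
    cell_counts = Counter(zip(row_var, col_var))  # pairs never observed count as 0
    table = {}
    for level in sorted(set(row_var)):
        counts = [cell_counts[(level, col)] for col in col_levels]
        total = sum(counts)
        row_pct = [round(100 * c / total, 1) for c in counts]
        table[level] = (counts, total, row_pct)
    return table


table = contingency_table(diabetes, aha_acc_type, LESION_TYPES)
for level, (counts, total, row_pct) in table.items():
    print(level, counts, total, row_pct)
```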
Measures of central tendency: rationale
We need to describe the kind of values that we have (e.g., lesion length by visual estimation in patients treated in the Endeavor II trial: 14-27 mm).
Raw enumeration: ……
Mean (arithmetic)
$\bar{x} = \frac{\sum x}{N}$
Characteristics:
- summarises information well
- discards a lot of information (e.g., about dispersion)
Assumptions:
- data are not skewed: skewness distorts the mean, and outliers make the mean very different
- data are measured on a measurement scale: the mean of a categorical measure cannot be computed, and an 'average' stent diameter may be meaningless
Mean (arithmetic): worked example
Lesion length (mm), n=13: 14, 15, 16, 17, 18, 18, 19, 21, 22, 23, 24, 25, 27
$\bar{x} = \frac{14+15+16+17+18+18+19+21+22+23+24+25+27}{13} = \frac{259}{13} = 19.92$
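The arithmetic above (sum of values divided by N) in a minimal Python sketch; `arithmetic_mean` is a hypothetical helper equivalent to the standard library's `statistics.mean` for this data:

```python
# Lesion lengths (mm) for patients 1-13, from the example database.
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def arithmetic_mean(values):
    """Sum of the values divided by their number (N)."""
    return sum(values) / len(values)


mean_length = arithmetic_mean(lesion_length_mm)
print(f"Mean = {mean_length:.2f}")  # 259 / 13
```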
Mean (arithmetic): example from TAPAS, Lancet 2008 [figure]
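Median
What is it?
– The one in the middle
– Place the values in order: the median is the central one
Definition:
– The value that splits the ordered data into two halves, with as many values above it as below it (for an even number of values, the mean of the two central values)
Used for:
– Ordinal data
– Skewed data / outliers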
Median: example (variable type: continuous)
Patient ID   Lesion length (mm)
1            18
2            24
3            17
4            25
5            23
6            15
7            16
8            18
9            21
10           19
11           14
12           22
13           27
Placing the values in order:
Patient ID   Lesion length (mm)
11           14
6            15
7            16
3            17
1            18
8            18
10           19   <- median (7th of 13 ordered values)
9            21
12           22
5            23
2            24
4            25
13           27
Median = 19 mm
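The ordering step above can be sketched in Python, assuming the usual convention that an even number of values gives the mean of the two central ones; `median` here is a hypothetical helper (the standard library's `statistics.median` behaves the same way):

```python
# Lesion lengths (mm) for patients 1-13, from the example database.
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def median(values):
    """Central value of the ordered data (mean of the two central values if N is even)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median(lesion_length_mm))  # 7th of 13 ordered values
```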
Mode
Definition:
– The most common value
Used (rarely) for:
– Discrete, non-interval data, e.g. stent length, stent diameter…
– MicroDriver is only available in 2.25, 2.50 and 2.75 mm diameters, so reporting the mean diameter is meaningless
Mode: example (variable type: continuous)
Lesion length (mm) by patient ID: 1: 18, 2: 24, 3: 17, 4: 25, 5: 23, 6: 15, 7: 16, 8: 18, 9: 21, 10: 19, 11: 14, 12: 22, 13: 27
Lesion length (mm), n=13
14   1   7.7%
15   1   7.7%
16   1   7.7%
17   1   7.7%
18   2   15.4%
19   1   7.7%
21   1   7.7%
22   1   7.7%
23   1   7.7%
24   1   7.7%
25   1   7.7%
27   1   7.7%
Mode = 18 mm (the only value that appears twice)
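A minimal Python sketch of the mode, using the standard library's `statistics.multimode` (Python 3.8+), which returns every most-common value. The stent diameters below are a hypothetical sample, invented only to illustrate why a mean diameter can be meaningless when a device comes in a few fixed sizes:

```python
from statistics import mean, multimode

# Lesion lengths (mm) for patients 1-13, from the example database.
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]
print(multimode(lesion_length_mm))  # [18]

# HYPOTHETICAL implanted diameters (mm) for a stent sold only in three sizes.
AVAILABLE_DIAMETERS_MM = {2.25, 2.50, 2.75}
diameters_mm = [2.25, 2.50, 2.50, 2.75, 2.50, 2.25]

print(multimode(diameters_mm))  # the most common size actually implanted
print(mean(diameters_mm))       # not one of the available sizes
```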
Comparing measures of central tendency
Mean is usually best
– If it works
– Useful properties (with standard deviation [SD])
– But…
Lesion length (mm)   Driver   Endeavor
                     17       21
                     19       21
                     19       21
                     17       21
                     18       6
Mean                 18       18
Median               18       21
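The Driver/Endeavor comparison above can be checked with the standard library's `statistics.mean` and `statistics.median`: both groups share the same mean, while the single short Endeavor lesion (6 mm) pulls its mean away from its median.

```python
from statistics import mean, median

# Lesion lengths (mm) from the comparison table.
driver = [17, 19, 19, 17, 18]
endeavor = [21, 21, 21, 21, 6]

for name, values in (("Driver", driver), ("Endeavor", endeavor)):
    print(f"{name}: mean = {mean(values)}, median = {median(values)}")
```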
Comparing measures of central tendency
It also depends on the underlying distribution…
Symmetric? mean = median = mode
[Figure: symmetric frequency distribution (x-axis: value; y-axis: frequency)]
Asymmetric? mean ≠ median ≠ mode
[Figure: frequency distribution of the number of Endeavor stents implanted per patient (0-9), with mode, median and mean marked at different positions]
Median: example from Agostoni et al, AJC 2007 [figure]
Measures of dispersion: rationale
Central tendency doesn't tell us everything: we also need to know about the spread, or dispersion, of the scores.
Is there a difference? And if so, how big is it? We can only tell if we know the dispersion of the data.
Group      Late loss (mm)
Endeavor   0.61
Driver     1.03
(ENDEAVOR II, Circulation 2006)
Measures of dispersion: examples
[Figure: frequency distributions of late loss (0-1.50 mm) for Driver and Endeavor]
Shape of distribution: Gaussian, normal or “parametric” distribution [figure: frequency vs value]
Departing from normality: non-normal, right-skewed distribution [figure: frequency vs value]
Departing from normality: non-normal, left-skewed distribution [figure: frequency vs value]
Departing from normality: outliers [figure: frequency vs value]
Measures of dispersion: types
• Standard deviation (SD)
– Used with the mean
– Parametric tests
• Range
– First to last value
– Not commonly used
• Interquartile range
– Used with the median
– 25th (1/4) to 75th (3/4) percentile
– Non-parametric tests
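Standard deviation and variance
$\text{Variance} = \frac{\sum (x - \bar{x})^2}{N - 1}$
$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$
Standard deviation (SD):
– approximates the population σ as N increases
Advantages:
– together with the mean, enables powerful synthesis:
mean ± 1 SD: about 68% of data
mean ± 2 SD: about 95% of data (exactly 95% for ± 1.96 SD)
mean ± 3 SD: about 99.7% of data (99% corresponds to ± 2.58 SD)
Disadvantages:
– these coverage figures rely on the assumption of a normal distribution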
Standard deviation: worked example (variable type: continuous)
$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$
Lesion length (mm) for patients 1-13: 18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27
Mean = 19.92
$\text{Variance} = \frac{(18-19.92)^2 + (24-19.92)^2 + (17-19.92)^2 + \dots + (27-19.92)^2}{12} = 16.58$
$SD = \sqrt{16.58} = 4.07$
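The worked example above (squared deviations from the mean, divided by N − 1, then the square root) as a minimal Python sketch; `sample_variance` is a hypothetical helper that matches the standard library's `statistics.variance`:

```python
import math

# Lesion lengths (mm) for patients 1-13, from the example database.
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)


variance = sample_variance(lesion_length_mm)
sd = math.sqrt(variance)
print(f"Variance = {variance:.2f}, SD = {sd:.2f}")
```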
Mean ± standard deviation
[Figures: normal frequency curve with about 68% of values within mean ± 1 SD, about 95% within mean ± 2 SD, and about 99.7% within mean ± 3 SD]
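The coverage percentages quoted for mean ± k SD can be checked against the normal distribution with the standard library's `statistics.NormalDist` (Python 3.8+). This sketch assumes exact normality; `coverage` is a hypothetical helper:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage(k):
    """Proportion of a normal distribution lying within mean ± k SD."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 1.96, 2, 2.576, 3):
    print(f"mean ± {k} SD: {100 * coverage(k):.1f}%")

# SD multipliers that give exactly 95% and 99% coverage.
print(standard_normal.inv_cdf(0.975), standard_normal.inv_cdf(0.995))
```

The ± 1.96 and ± 2.58 multipliers are the ones used for 95% and 99% intervals; ± 3 SD covers about 99.7%.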
Standard deviation: example from TAPAS, Lancet 2008 [figure]
Standard deviation: example from TAPAS, NEJM 2008 [figure]
Why not mean ± SD? Example from TAPAS, NEJM 2008 [figure]
Testing normality assumptions
Rules of thumb
1. Refer to previous data or analyses (e.g., landmark articles, large databases)
2. Inspect tables and graphs (e.g., outliers, histograms)
3. Check rough equality of mean, median and mode
4. Perform ad hoc statistical tests
• Levene's test for equality of variances
• Kolmogorov-Smirnov test
• …
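As an illustration of rule 4, the one-sample Kolmogorov-Smirnov statistic D (the largest gap between the empirical cumulative distribution and a fitted normal one) can be computed with the standard library alone. This is a sketch, not a full test: it returns D without a p-value, and because the mean and SD are estimated from the same sample, standard Kolmogorov-Smirnov critical values are too conservative (the Lilliefors correction addresses this). `ks_statistic_vs_normal` is a hypothetical name:

```python
from statistics import NormalDist, mean, stdev

# Lesion lengths (mm) for patients 1-13, from the example database.
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def ks_statistic_vs_normal(values):
    """One-sample Kolmogorov-Smirnov D against a normal with the sample mean and SD."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        f = fitted.cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


d = ks_statistic_vs_normal(lesion_length_mm)
print(f"D = {d:.3f}")
```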
Range
First to last value
Lesion length (mm), n=13: 14, 15, 16, 17, 18, 18, 19, 21, 22, 23, 24, 25, 27
Range = 14-27 mm, or
Range = 13 mm
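Both ways of reporting the range can be computed directly from the data. This minimal Python sketch uses only built-ins; `value_range` is a hypothetical name:

```python
# Lesion lengths (mm) for patients 1-13, from the example database.
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]

minimum, maximum = min(lesion_length_mm), max(lesion_length_mm)
value_range = maximum - minimum  # range reported as a single number

print(f"Range = {minimum}-{maximum} mm, or range = {value_range} mm")
```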
Range: example from RRISC, JACC 2006 [figure]
Interquartile range (variable type: continuous)
25th to 75th percentile, or 1st to 3rd quartile
Patient ID   Lesion length (mm)
11           14
6            15
7            16
                  <- 1st quartile (25th percentile) = 16.5
3            17
1            18
8            18
10           19   <- median
9            21
12           22
5            23
                  <- 3rd quartile (75th percentile) = 23.5
2            24
4            25
13           27
Interquartile range = 16.5-23.5 mm
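Quartiles can be computed with the standard library's `statistics.quantiles` (Python 3.8+). Software packages use different quartile definitions; this sketch assumes `method="exclusive"` (interpolating at positions (N + 1)p), which reproduces the 16.5 and 23.5 shown above:

```python
from statistics import quantiles

# Lesion lengths (mm) for patients 1-13, from the example database.
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]

q1, q2, q3 = quantiles(lesion_length_mm, n=4, method="exclusive")
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}")
print(f"Interquartile range = {q1}-{q3} mm (width {q3 - q1} mm)")

# The alternative 'inclusive' definition gives different quartiles for the same data.
q1_inc, _, q3_inc = quantiles(lesion_length_mm, n=4, method="inclusive")
print(q1_inc, q3_inc)
```

With `method="inclusive"` the quartiles become 17 and 23, so it is worth stating which definition was used when reporting an IQR.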
Interquartile range: example from Agostoni et al, AJC 2007 [figure]
Software output: lesion length
Statistics
N (valid)         13
N (missing)       0
Mean              19.9231
Median            19.0000
Mode              18.00
Std. deviation    4.07148
Range             13.00
Minimum           14.00
Maximum           27.00
Percentiles       25th: 16.5000   50th: 19.0000   75th: 23.5000
Frequency table
Value   Frequency   Percent   Valid percent   Cumulative percent
14.00   1           7.7       7.7             7.7
15.00   1           7.7       7.7             15.4
16.00   1           7.7       7.7             23.1
17.00   1           7.7       7.7             30.8
18.00   2           15.4      15.4            46.2
19.00   1           7.7       7.7             53.8
21.00   1           7.7       7.7             61.5
22.00   1           7.7       7.7             69.2
23.00   1           7.7       7.7             76.9
24.00   1           7.7       7.7             84.6
25.00   1           7.7       7.7             92.3
27.00   1           7.7       7.7             100.0
Total   13          100.0     100.0
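The frequency table above, including the cumulative percent column, can be rebuilt in Python. A minimal sketch assuming no missing values, so that "percent" and "valid percent" coincide; `frequency_table` is a hypothetical name:

```python
from collections import Counter

# Lesion lengths (mm) for patients 1-13, from the example database.
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def frequency_table(values):
    """Rows of (value, frequency, percent, cumulative percent), sorted by value."""
    n = len(values)
    rows = []
    cumulative = 0
    for value, count in sorted(Counter(values).items()):
        cumulative += count
        rows.append((value, count, round(100 * count / n, 1),
                     round(100 * cumulative / n, 1)))
    return rows


for row in frequency_table(lesion_length_mm):
    print(row)
```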
Reporting data
If parametric: mean and standard deviation
– Mean ± SD, e.g. Age (y): 63 ± 13
– Mean (SD), e.g. Age (y): 63 (13)
If non-parametric: median and interquartile range
– Median [IQR], e.g. NIH vol (mm³): 1.3 [0–13.1]
Mode and range are less commonly used.
Coefficient of Variation
• The coefficient of variation (CV) is a normalized measure of dispersion of a probability distribution. It is defined as the ratio of the standard deviation to the mean: $CV = \frac{SD}{\bar{x}}$
• It is only defined for a non-zero mean, and is most useful for variables that are always positive. The coefficient of variation should only be computed for continuous data.
• A given standard deviation indicates a high or low degree of variability only in relation to the mean value.
• It is therefore easier to get an idea of the variability in a distribution by dividing the standard deviation by the mean.
Coefficient of Variation
• Advantages
– The CV is a dimensionless number
– The CV is particularly useful when comparing dispersion in datasets with markedly different means or different units of measurement
– Distributions with CV < 1 are considered low-variance, while those with CV > 1 are considered high-variance
• Disadvantages
– When the mean is near zero, the CV is sensitive to small changes in the mean, limiting its usefulness
– Unlike the standard deviation, it cannot be used to construct confidence intervals for the mean
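Applying the definition CV = SD / mean to the lesion-length example gives a low-variance distribution (CV well below 1). A minimal sketch using the standard library's `statistics.mean` and `statistics.stdev` (sample SD, N − 1); `coefficient_of_variation` is a hypothetical helper that refuses a zero mean, as the definition requires:

```python
from statistics import mean, stdev

# Lesion lengths (mm) for patients 1-13, from the example database.
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (dimensionless)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


cv = coefficient_of_variation(lesion_length_mm)
print(f"CV = {cv:.3f} ({100 * cv:.1f}%)")
```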
Histograms
[Figure: histogram of diabetes (no/yes); ENDEAVOR II, Circulation 2006]
Very good for categorical variables
Histograms
Not so good for continuous variables, but…
Histograms: example from Agostoni et al, AJC 2007, including both restenotic and non-restenotic SES [figure]
Shape of distributions: example from Agostoni et al, AJC 2007, non-restenotic SES only [figure]
Box (& whiskers) plots
– Median (Q2): line inside the box
– Box: interquartile range, from Q1 to Q3
– Upper whisker: maximum (Q4), or the largest value within Q3 + 1.5 × IQR
– Lower whisker: minimum (Q0), or the smallest value within Q1 − 1.5 × IQR
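The box-plot elements above can be computed as follows. This sketch assumes Tukey's convention (whiskers end at the most extreme observations within Q1 − 1.5 × IQR and Q3 + 1.5 × IQR; anything beyond is an outlier) and the `exclusive` quartile method of `statistics.quantiles`; plotting software may use other quartile definitions. `box_plot_summary` is a hypothetical name:

```python
from statistics import quantiles

# Lesion lengths (mm) for patients 1-13, from the example database.
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def box_plot_summary(values, k=1.5):
    """Quartiles, whisker ends and outliers for a Tukey-style box plot."""
    q1, q2, q3 = quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }


print(box_plot_summary(lesion_length_mm))
# Adding a HYPOTHETICAL 60 mm lesion shows how an outlier is separated from the whisker.
print(box_plot_summary(lesion_length_mm + [60]))
```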
Box (& whiskers) plots: example from Margheri, Biondi Zoccai, et al, AJC 2008 [figure]
Scatter plots
A scatter plot is a type of display using Cartesian coordinates to show the values of two variables for a set of data. The data are displayed as a collection of points, each with the value of one variable determining its position on the horizontal axis and the value of the other variable determining its position on the vertical axis.
It is usually drawn with 2 continuous variables, to visually assess the degree of correlation between them.
It can also be used with one categorical and one continuous variable (mainly if the sample size is small).
Scatter plots: example from Abbate, Biondi Zoccai, et al, Circulation 2002 [figure]
Scatter plots: example from Mintz, et al, AJC 2005 [figure]
Scatter plots: example from Agostoni, et al, IJC 2007 [figure]
Thank you for your attention
For any correspondence: gbiondizoccai@gmail.com
For further slides on these topics feel free to visit the metcardio.org website:
http://www.metcardio.org/slides.html