Primer on Statistics for Interventional Cardiologists
Giuseppe Sangiorgi, MD
Pierfrancesco Agostoni, MD
Giuseppe Biondi-Zoccai, MD
What you will learn
• Introduction
• Basics
• Descriptive statistics
• Probability distributions
• Inferential statistics
• Finding differences in mean between two groups
• Finding differences in mean between more than 2 groups
• Linear regression and correlation for bivariate analysis
• Analysis of categorical data (contingency tables)
• Analysis of time-to-event data (survival analysis)
• Advanced statistics at a glance
• Conclusions and take home messages
Types of variables
CATEGORY
• nominal: Death (yes/no), TLR (yes/no), Radial/brachial/femoral
• ordinal (ordered categories, ranks): TIMI flow
QUANTITY
• discrete (counting): Stent diameter, Stent length
• continuous (measuring): BMI, Blood pressure, QCA data (MLD, late loss)
Types of variables: QUANTITY
• discrete (counting): Stent diameter, Stent length
• continuous (measuring): BMI, Blood pressure, QCA data (MLD, late loss)
Types of variables
• PAIRED OR REPEATED MEASURES: e.g. blood pressure measured twice in the same patients at different times
• UNPAIRED OR INDEPENDENT MEASURES: e.g. blood pressure measured in several different groups of patients only once
Parametric and non-parametric tests
Whenever normal (Gaussian) assumptions are valid, we can use PARAMETRIC tests, which are usually more sensitive and powerful
However, if an underlying normal distribution cannot be safely assumed (i.e. the distribution is non-Gaussian), NON-PARAMETRIC alternatives should be employed, as they are more robust to outliers and, for clearly non-normal data, can also be more efficient
In some cases, albeit uncommonly in clinical cardiovascular research, there are alternatives to non-parametric tests in the presence of violations of normality assumptions
Such alternatives are mathematical transformations, such as the logarithmic (Ln), the power (^x), or the square root (√) transformation
Alternatives to non-parametric tests
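The effect of these transformations can be sketched in a few lines of Python. This is only an illustration, assuming NumPy and SciPy are available: the sample is synthetic (drawn from a log-normal distribution with a fixed seed) and is not data from the studies quoted in these slides, and the cube-root exponent is an arbitrary choice for the power transformation.

```python
import numpy as np
from scipy import stats

# Synthetic, right-skewed sample (log-normal); illustration only, not study data.
rng = np.random.default_rng(0)
values = rng.lognormal(mean=0.0, sigma=1.0, size=200)

# The three families of transformation named on the slide.
transformed = {
    "raw": values,
    "log": np.log(values),        # logarithmic (Ln)
    "sqrt": np.sqrt(values),      # square root
    "power": values ** (1 / 3),   # power transformation (exponent chosen arbitrarily)
}

# Skewness close to 0 indicates a more symmetric, more nearly normal shape.
skewness = {name: float(stats.skew(x)) for name, x in transformed.items()}
for name, s in skewness.items():
    print(f"{name:>5}: skewness = {s:.2f}")
```

If a transformation brings the distribution close enough to normal, a parametric test can then be applied to the transformed values.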
Statistical tests
Are data categorical or continuous?
• Categorical data: compare proportions in groups
• Continuous data: compare means or medians in groups
How many groups?
• Two groups; normal data?
– Normal data: use t test
– Non-normal data: use Mann-Whitney U test
• More than two groups; normal data?
– Normal data: use ANOVA
– Non-normal data: use Kruskal-Wallis
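The decision path above can be written as a small function. This is a minimal sketch for unpaired groups only; the function name `choose_test` and its argument names are hypothetical and chosen just for this illustration.

```python
def choose_test(data_type: str, n_groups: int, normal: bool) -> str:
    """Return the test suggested by the flowchart for unpaired groups."""
    if data_type == "categorical":
        return "compare proportions"
    if data_type != "continuous":
        raise ValueError("data_type must be 'categorical' or 'continuous'")
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_groups == 2:
        return "t test" if normal else "Mann-Whitney U test"
    return "ANOVA" if normal else "Kruskal-Wallis test"


# Example: late loss (continuous) in two stents, normality not assumed.
print(choose_test("continuous", 2, normal=False))
```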
Statistical tests
CATEGORY
• nominal: Binomial, Chi-square test, Fisher test
• ordinal (ordered categories, ranks): Chi-square test, Sign test, K-S test, Kruskal-Wallis test, Mann-Whitney U test, Spearman rho test, Wilcoxon test
QUANTITY
• discrete (counting) and continuous (measuring): Student's t test, Mann-Whitney U test, Wilcoxon test, Analysis of Variance, Kruskal-Wallis test, Kolmogorov-Smirnov test, Linear correlation, Linear regression, Spearman rho test
Software
Freeware
• Epi-info www.cdc.gov
• RevMan www.cochrane.org
• …or just go to www.google.com and search for the test you need…
Proprietary software
• BMDP
• Minitab
• Primer
• SAS
• SPSS
• Stata
• Statistica
Questions
1) How can we compare EF in men vs. women after MI?
2) How does blood pressure change before and after therapy in a group of patients treated with β-blockers?
3) How can we test whether there is a difference in in-hospital death in patients treated with thrombolysis vs. PCI?
4) How can we compare the occurrence of stroke in AF patients treated with oral anticoagulant therapy vs. oral aspirin during long-term follow-up?
5) Can we predict discharge EF using peak CK values after MI?
What you will learn
• Finding differences in mean between two groups
– independent groups (two-sample t-test)
– dependent groups (paired t-test)
– non-parametric alternatives: Mann-Whitney U test (rank sum) and Wilcoxon test (signed rank)
Compare variables
Agostoni et al. AJC 2007
[Figure: frequency histograms of late loss (mm) for stent A and stent B]
Late loss in 2 different stents
• Continuous or categorical variable?
Compare continuous variables
• Paired or unpaired data?
Compare continuous unpaired variables
• Parametric or non-parametric test?
Stent A: Mean: 0.45 SD: 0.76; Median: 0.29 IQR: -0.09 to 0.66
Stent B: Mean: 0.55 SD: 0.76; Median: 0.41 IQR: -0.02 to 0.85
If parametric…
[Figure: two frequency distributions, each summarised by mean and SD]
Student t test for unpaired data: p=0.14
Unpaired: same variable in different patients at same time
If non-parametric…
[Figure: two frequency distributions, each summarised by median and IQR]
Mann-Whitney U test for unpaired data: p=0.03
Unpaired Student t test

Group Statistics (late loss)
typestent     N     Mean     Std. Deviation   Std. Error Mean
chyper (A)    267   0.4533   0.75892          0.04645
taxus (B)     295   0.5468   0.76173          0.04435

Independent Samples Test (late loss)
Levene's Test for Equality of Variances: F = 0.002, Sig. = 0.962
t-test for Equality of Means
                              t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference (Lower to Upper)
Equal variances assumed       -1.455   560       0.146             -0.0935           0.06423                 -0.21964 to 0.03268
Equal variances not assumed   -1.456   554.860   0.146             -0.0935           0.06422                 -0.21962 to 0.03266
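The two rows of the Independent Samples Test can be reproduced from the group summary statistics alone. The sketch below assumes SciPy is installed and uses `scipy.stats.ttest_ind_from_stats`; the means, SDs and group sizes are copied from the Group Statistics table, so small rounding differences from the SPSS output are to be expected.

```python
from scipy import stats

# Summary statistics copied from the Group Statistics table (late loss, mm).
mean_a, sd_a, n_a = 0.4533, 0.75892, 267   # stent A
mean_b, sd_b, n_b = 0.5468, 0.76173, 295   # stent B

# Student t test: equal variances assumed (pooled SD).
student = stats.ttest_ind_from_stats(mean_a, sd_a, n_a, mean_b, sd_b, n_b,
                                     equal_var=True)

# Welch t test: equal variances not assumed.
welch = stats.ttest_ind_from_stats(mean_a, sd_a, n_a, mean_b, sd_b, n_b,
                                   equal_var=False)

print(f"Student: t = {student.statistic:.3f}, p = {student.pvalue:.3f}")
print(f"Welch:   t = {welch.statistic:.3f}, p = {welch.pvalue:.3f}")
```

Because the two SDs are almost identical (Levene's test Sig. = 0.962), the pooled and Welch versions give practically the same answer here.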
Student t test
• A t-test is any statistical hypothesis test in which the test statistic has a Student's t distribution if the null hypothesis is true
• It is applied when the population is assumed to be normally distributed but the sample sizes are small enough that the statistic on which inference is based is not normally distributed because it relies on an uncertain estimate of standard deviation rather than on a precisely known value (if we knew it, we could use the Z-test)
Student t test
• It is used to test the null hypothesis that the means of two normally distributed populations are equal
• Given two data sets (each with its mean, SD and number of data points) the t test determines whether the means are distinct, provided that the underlying distributions can be assumed to be normal
• The Student t test should be used if the variances (not known) of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch's t test
Mann-Whitney rank sum U test

Ranks (late loss)
typestent     N     Mean Rank   Sum of Ranks
chyper (A)    267   266.65      71194.50
taxus (B)     295   294.94      87008.50
Total         562

Test Statistics (late loss; grouping variable: typestent)
Mann-Whitney U            35416.500
Wilcoxon W                71194.500
Z                         -2.063
Asymp. Sig. (2-tailed)    0.032
Ranking
• The basic concept of non-parametric tests is ranking
• The single values of the variable to analyze are not evaluated according to their absolute value but according to the "rank" (or position) they assume in the merged distribution of the values, from lower to higher

Driver   Endeavor
17       21
19       21
19       21
17       21
18       6

Merged values: 6-17-17-18-19-19-21-21-21-21
Ranks:         1  2  3  4  5  6  7  8  9  10
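The ranking step can be sketched with `scipy.stats.rankdata`, assuming SciPy is installed. The values are the Driver and Endeavor numbers from the slide. Note one difference from the slide, which numbers the positions 1 to 10: by default `rankdata` gives tied values the average of the positions they occupy, which is the usual convention for rank tests. The U statistics are derived by hand from the rank sums.

```python
from scipy.stats import rankdata

driver = [17, 19, 19, 17, 18]
endeavor = [21, 21, 21, 21, 6]

# Rank all values in the merged sample; ties share the average rank.
ranks = rankdata(driver + endeavor)
rank_sum_driver = float(ranks[:len(driver)].sum())
rank_sum_endeavor = float(ranks[len(driver):].sum())

# U statistic of each group: rank sum minus its minimum possible value.
n1, n2 = len(driver), len(endeavor)
u_driver = rank_sum_driver - n1 * (n1 + 1) / 2
u_endeavor = rank_sum_endeavor - n2 * (n2 + 1) / 2

print("ranks:", ranks.tolist())
print("rank sums:", rank_sum_driver, rank_sum_endeavor)
print("U:", u_driver, u_endeavor)
```

The two U values always add up to n1 × n2, so either one carries the full information for the test.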
Comparison: parametric/non-parametric tests
Robustness: non-parametric tests are much less likely than the t tests to give a spuriously significant result because of outliers; they are more robust
Efficiency: when normality holds, non-parametric tests have an efficiency of about 95% when compared to parametric tests. For distributions sufficiently far from normal and for sufficiently large sample sizes, non-parametric tests can be considerably more efficient
Compare variables
[Figure: frequency histograms of late loss (mm) for stent A and stent B]
Stent A: Mean: 0.45 SD: 0.76; Median: 0.29 IQR: -0.09 to 0.66
Stent B: Mean: 0.55 SD: 0.76; Median: 0.41 IQR: -0.02 to 0.85
Student t test for unpaired data: p=0.14
Mann-Whitney U test for unpaired data: p=0.03
Unpaired Student t test
Late loss in restenotic lesions in two different stents
Mean: 1.75 SD: 0.51 vs. Mean: 1.82 SD: 0.62
Unpaired Student t test: p=0.48

Unpaired Student t test
[Figure: frequency histograms of late loss (mm) for stent A and stent B]
Late loss in non restenotic lesions in two different stents
Mean: 0.14 SD: 0.39 vs. Mean: 0.27 SD: 0.44
Unpaired Student t test: p=0.002
Paired Student t test
[Figure: frequency vs. value distributions]
Paired: same variable in same group at different time

Paired Student t test
MAGIC, Lancet 2004
EF at baseline and FU in patients treated with BMC for MI: 48.7% (8.3) vs. 55.1% (7.4)
Significant increase in EF by paired t test, P=0.005
Only 11 patients!

Paired Student t test
Does MLD change from post-procedure to follow-up in a group of patients receiving a stent?
Paired Student t test

Paired Samples Statistics (Pair 1)
            Mean     N     Std. Deviation   Std. Error Mean
mld post    2.7770   562   0.48155          0.02031
mld fu      2.2746   562   0.85787          0.03619

Paired Samples Test (Paired Differences: mld post - mld fu)
Mean     SD        Std. Error Mean   95% CI of the Difference (Lower to Upper)   t        df    Sig. (2-tailed)
0.5024   0.76115   0.03211           0.4393 to 0.5655                            15.648   561   0.000

[Figure: MLD post, MLD fu and their difference]
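The paired t statistic and its confidence interval follow directly from the mean and SD of the paired differences. The sketch below assumes SciPy is installed and uses only the three numbers reported in the Paired Samples Test table (mean difference, SD of the differences, number of pairs); the variable names are chosen for this illustration.

```python
import math
from scipy import stats

# Values copied from the Paired Samples Test table (mld post - mld fu, mm).
mean_diff, sd_diff, n = 0.5024, 0.76115, 562

se = sd_diff / math.sqrt(n)          # standard error of the mean difference
t_stat = mean_diff / se              # paired t statistic
df = n - 1
p_value = 2 * stats.t.sf(abs(t_stat), df)

# 95% confidence interval of the mean difference.
t_crit = stats.t.ppf(0.975, df)
ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)

print(f"SE = {se:.5f}, t = {t_stat:.3f}, df = {df}, p = {p_value:.3g}")
print(f"95% CI: {ci[0]:.4f} to {ci[1]:.4f}")
```

The paired test works on the within-patient differences, which is why the SD of the differences (0.76115), and not the SDs of the two separate measurements, enters the formula.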
Wilcoxon signed rank test
Wilcoxon test: non-parametric comparison of 2 paired variables

Descriptive Statistics
            N     Minimum   Maximum   25th percentile   50th percentile (Median)   75th percentile
mld post    562   1.50      4.40      2.4400            2.7500                     3.1000
mld fu      562   0.00      4.31      1.8700            2.4000                     2.8400

Ranks (mld fu - mld post)
                                       N     Mean Rank   Sum of Ranks
Negative Ranks (mld fu < mld post)     407   322.51      131263.00
Positive Ranks (mld fu > mld post)     153   168.74      25817.00
Ties (mld fu = mld post)               2
Total                                  562

Test Statistics (Wilcoxon Signed Ranks Test, mld fu - mld post)
Z (based on positive ranks)   -13.764
Asymp. Sig. (2-tailed)        0.000
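The Z value in the Test Statistics table can be recovered from the Ranks table with the usual normal approximation for the signed rank sum. This is a sketch using only the Python standard library; it drops the two ties and applies no correction for tied absolute differences, so agreement with SPSS is expected only to rounding.

```python
import math

# Values copied from the Ranks table (mld fu - mld post).
n_negative, n_positive = 407, 153
sum_negative, sum_positive = 131263.0, 25817.0

# Pairs with a zero difference (ties) are excluded from the test.
n = n_negative + n_positive

# Under the null hypothesis the positive rank sum has this mean and SD.
expected = n * (n + 1) / 4
sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)

z = (sum_positive - expected) / sd
print(f"n = {n}, expected rank sum = {expected:.0f}, z = {z:.3f}")
```

As a consistency check, the two rank sums add up to n(n+1)/2 = 157080, the sum of the ranks 1 to 560.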
What you will learn
• Finding differences in mean between more than 2 groups
– One-way analysis of variance
– Non-parametric alternatives: Kruskal-Wallis, Friedman
Compare continuous variables
Three (or more) groups: what happens?
Always ask yourself… Paired or not? Parametric or not?
[Figure: frequency vs. value distributions]
Unpaired: same variable in >2 groups at same time

1-way ANalysis Of VAriance
CAMELOT, JAMA 2004
If ANOVA is significant, search where the difference is… POST-HOC TESTS
T test with a significance threshold that is no longer 0.05 but 0.05/n, where n is the number of tests performed; in this case 0.05/3 = 0.01666
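The corrected threshold quoted above is a Bonferroni correction and amounts to a single division. A minimal sketch follows; the function name `bonferroni_threshold` is hypothetical and used only for this illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Three groups give three pairwise comparisons, as in the slide.
threshold = bonferroni_threshold(0.05, 3)
print(f"each pairwise t test is significant only if p < {threshold:.5f}")
```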
1-way ANOVA
• As with the t-test, ANOVA is appropriate when the data are continuous, when the groups are assumed to have similar variances, and when the data are normally distributed
• ANOVA is based upon a comparison of variance attributable to the independent variable (variability between groups or conditions) relative to the variance within groups resulting from random chance. In fact, the formula involves dividing the between-group variance estimate by the within-group variance estimate
1-way ANOVA

Descriptives (blood pressure pre)
        N    Mean      Std. Deviation   Std. Error   95% CI for Mean (Lower to Upper)   Minimum   Maximum
plac    5    90.0000   2.82843          1.26491      86.4880 to 93.5120                 86.00     94.00
A       4    90.0000   4.08248          2.04124      83.5039 to 96.4961                 85.00     95.00
B       4    90.0000   1.63299          0.81650      87.4015 to 92.5985                 88.00     92.00
Total   13   90.0000   2.73861          0.75955      88.3451 to 91.6549                 85.00     95.00

ANOVA (blood pressure pre)
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   0.000            2    0.000         0.000   1.000
Within Groups    90.000           10   9.000
Total            90.000           12
1-way ANOVA

Descriptives (blood pressure post 1 month)
        N    Mean      Std. Deviation   Std. Error   95% CI for Mean (Lower to Upper)   Minimum   Maximum
plac    5    90.0000   3.00000          1.34164      86.2750 to 93.7250                 85.00     93.00
A       4    80.0000   4.08248          2.04124      73.5039 to 86.4961                 75.00     85.00
B       4    85.0000   1.63299          0.81650      82.4015 to 87.5985                 83.00     87.00
Total   13   85.3846   5.14034          1.42567      82.2783 to 88.4909                 75.00     93.00

ANOVA (blood pressure post 1 month)
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   223.077          2    111.538       11.866   0.002
Within Groups    94.000           10   9.400
Total            317.077          12
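The ANOVA table for blood pressure at 1 month can be rebuilt from the Descriptives table alone, which makes the between-group/within-group ratio concrete. The sketch below assumes SciPy is installed (used only for the F distribution); group sizes, means and SDs are copied from the Descriptives table.

```python
from scipy import stats

# (N, mean, SD) per group, copied from the Descriptives table (1 month).
groups = {
    "plac": (5, 90.0, 3.00000),
    "A": (4, 80.0, 4.08248),
    "B": (4, 85.0, 1.63299),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total

# Between-group sum of squares: spread of the group means around the grand mean.
ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
# Within-group sum of squares: spread of the patients around their own group mean.
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)
p_value = stats.f.sf(f_stat, df_between, df_within)

print(f"SS between = {ss_between:.3f}, SS within = {ss_within:.3f}")
print(f"F({df_between}, {df_within}) = {f_stat:.3f}, p = {p_value:.3f}")
```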
Post-hoc test

Multiple Comparisons (dependent variable: blood pressure post 1 month; Bonferroni)
(I) drug   (J) drug   Mean Difference (I-J)   Std. Error   Sig.    95% Confidence Interval (Lower to Upper)
plac       A          10.0000*                2.05670      0.002   4.0971 to 15.9029
plac       B          5.0000                  2.05670      0.106   -0.9029 to 10.9029
A          plac       -10.0000*               2.05670      0.002   -15.9029 to -4.0971
A          B          -5.0000                 2.16795      0.131   -11.2222 to 1.2222
B          plac       -5.0000                 2.05670      0.106   -10.9029 to 0.9029
B          A          5.0000                  2.16795      0.131   -1.2222 to 11.2222
* The mean difference is significant at the .05 level.
1-way ANOVA

Descriptives (blood pressure post 2 months)
        N    Mean    Std. Deviation   Std. Error   95% CI for Mean (Lower to Upper)   Minimum   Maximum
plac    5    90.00   2.646            1.183        86.71 to 93.29                     87        94
A       4    79.00   4.082            2.041        72.50 to 85.50                     74        84
B       4    86.00   1.633            0.816        83.40 to 88.60                     84        88
Total   13   85.38   5.455            1.513        82.09 to 88.68                     74        94

ANOVA (blood pressure post 2 months)
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   271.077          2    135.538       15.760   0.001
Within Groups    86.000           10   8.600
Total            357.077          12
Post-hoc test

Multiple Comparisons (dependent variable: blood pressure post 2 months; Bonferroni)
(I) drug   (J) drug   Mean Difference (I-J)   Std. Error   Sig.    95% Confidence Interval (Lower to Upper)
plac       A          11.00*                  1.967        0.001   5.35 to 16.65
plac       B          4.00                    1.967        0.208   -1.65 to 9.65
A          plac       -11.00*                 1.967        0.001   -16.65 to -5.35
A          B          -7.00*                  2.074        0.021   -12.95 to -1.05
B          plac       -4.00                   1.967        0.208   -9.65 to 1.65
B          A          7.00*                   2.074        0.021   1.05 to 12.95
* The mean difference is significant at the .05 level.
Non parametric test
Kruskal-Wallis test: non-parametric comparison of >2 unpaired continuous variables

Kruskal-Wallis test

Ranks (blood pressure pre)
drug    N    Mean Rank
plac    5    7.00
A       4    7.00
B       4    7.00
Total   13

Test Statistics (Kruskal-Wallis Test; grouping variable: drug; blood pressure pre)
Chi-Square    0.000
df            2
Asymp. Sig.   1.000

Kruskal-Wallis test

Ranks (blood pressure post 1 month)
drug    N    Mean Rank
plac    5    10.50
A       4    3.13
B       4    6.50
Total   13

Test Statistics (Kruskal-Wallis Test; grouping variable: drug; blood pressure post 1 month)
Chi-Square    8.339
df            2
Asymp. Sig.   0.015

Post-hoc analysis with Mann-Whitney U and Bonferroni correction
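The asymptotic significance of the Kruskal-Wallis statistic comes from a chi-square distribution with (number of groups - 1) degrees of freedom. The sketch below assumes SciPy is installed and starts from the statistic reported by SPSS for blood pressure at 1 month; it also works out the Bonferroni threshold that the pairwise Mann-Whitney U tests would then have to meet.

```python
from scipy import stats

# Kruskal-Wallis statistic reported by SPSS (blood pressure post 1 month).
h_stat, n_groups = 8.339, 3

# Asymptotic p value: chi-square with (groups - 1) degrees of freedom.
p_value = stats.chi2.sf(h_stat, df=n_groups - 1)

# Post-hoc step: one Mann-Whitney U test per pair of groups,
# each judged against a Bonferroni-corrected threshold.
n_pairs = n_groups * (n_groups - 1) // 2
posthoc_alpha = 0.05 / n_pairs

print(f"Kruskal-Wallis p = {p_value:.3f}")
print(f"{n_pairs} pairwise tests, each significant only if p < {posthoc_alpha:.4f}")
```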
Compare continuous variables
Three (or more) groups: what happens?
Always ask yourself… Paired or not? Parametric or not?
[Figure: frequency vs. value distributions]
Paired: same variable in same patients at >2 different moments

Compare continuous variables
Three (or more) paired groups
Again ask yourself… Parametric or not?
If parametric: ANOVA for repeated measures (in SPSS, in the General Linear Model)
If non-parametric: Friedman test
Friedman test

drug = plac
Ranks                          Mean Rank
blood pressure pre             2.00
blood pressure post 1 month    1.90
blood pressure post 2 months   2.10
Test Statistics (Friedman Test): N = 5, Chi-Square = 0.111, df = 2, Asymp. Sig. = 0.946

drug = A
Ranks                          Mean Rank
blood pressure pre             3.00
blood pressure post 1 month    2.00
blood pressure post 2 months   1.00
Test Statistics (Friedman Test): N = 4, Chi-Square = 8.000, df = 2, Asymp. Sig. = 0.018

drug = B
Ranks                          Mean Rank
blood pressure pre             3.00
blood pressure post 1 month    1.00
blood pressure post 2 months   2.00
Test Statistics (Friedman Test): N = 4, Chi-Square = 8.000, df = 2, Asymp. Sig. = 0.018
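The Friedman chi-square can be computed from the mean ranks and the number of patients. The sketch below assumes SciPy is installed and uses the textbook formula without a tie correction; the helper name `friedman_from_mean_ranks` is hypothetical. It reproduces the output for drugs A and B. For placebo it gives 0.100 rather than the 0.111 reported by SPSS, presumably because the SPSS value includes a correction for tied measurements.

```python
from scipy import stats


def friedman_from_mean_ranks(mean_ranks, n_subjects):
    """Friedman chi-square and asymptotic p value (no tie correction)."""
    k = len(mean_ranks)                      # number of repeated measurements
    centre = (k + 1) / 2                     # mean rank expected under H0
    chi_square = (12 * n_subjects / (k * (k + 1))
                  * sum((r - centre) ** 2 for r in mean_ranks))
    p_value = stats.chi2.sf(chi_square, df=k - 1)
    return chi_square, p_value


# Mean ranks for pre, post 1 month, post 2 months (from the Ranks tables).
chi2_a, p_a = friedman_from_mean_ranks([3.0, 2.0, 1.0], n_subjects=4)
chi2_b, p_b = friedman_from_mean_ranks([3.0, 1.0, 2.0], n_subjects=4)

print(f"drug A: chi-square = {chi2_a:.3f}, p = {p_a:.3f}")
print(f"drug B: chi-square = {chi2_b:.3f}, p = {p_b:.3f}")
```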
Descriptive Statistics (drug = plac)
                               N   25th percentile   50th percentile (Median)   75th percentile
blood pressure pre             5   88.00             90.00                      92.00
blood pressure post 1 month    5   87.50             91.00                      92.00
blood pressure post 2 months   5   88.00             89.00                      92.50

Descriptive Statistics (drug = A)
                               Maximum   25th percentile   50th percentile (Median)   75th percentile
blood pressure pre             95        86.25             90.00                      93.75
blood pressure post 1 month    85        76.25             80.00                      83.75
blood pressure post 2 months   84        75.25             79.00                      82.75

Descriptive Statistics (drug = B)
                               Maximum   25th percentile   50th percentile (Median)   75th percentile
blood pressure pre             92        88.50             90.00                      91.50
blood pressure post 1 month    87        83.50             85.00                      86.50
blood pressure post 2 months   88        84.50             86.00                      87.50
Compare continuous variables
• Actually, the analysis of variance (ANOVA) is a collection of statistical models, and their associated procedures, in which the observed variance is partitioned into components due to different explanatory variables
• There are several types of ANOVA depending on the number of treatments and the way they are applied to the subjects in the experiment:
• One-way ANOVA is used to test for differences in ≥3 independent groups
• One-way ANOVA for repeated measures is used when the subjects undergo repeated measures; the same subjects are used for each treatment
• Factorial ANOVA (2-way ANOVA) is used to study the effects of two or more treatment variables. The most commonly used type of factorial ANOVA is the 2×2 design, where there are two independent variables and each variable has two levels or distinct values
• When one wishes to test two or more independent groups while subjecting the subjects to repeated measures, one may perform a factorial mixed-design ANOVA, in which one factor is a between-subjects variable and the other is a within-subjects variable. This is a type of mixed effect model
• Multivariate ANOVA (MANOVA): more than one dependent variable
• Analysis of Covariance (ANCOVA): ANOVA and regression/correlation
2-way ANOVA
CAMELOT, JAMA 2004
A mixed-design ANOVA is used to test for differences between independent groups whilst subjecting participants to repeated measures. In a mixed-design ANOVA model, one factor is a between-subjects variable (drug) and the other is a within-subjects variable (BP)
Thank you for your attention
For any correspondence: [email protected]
For further slides on these topics feel free to visit the metcardio.org website:
http://www.metcardio.org/slides.html