R. PIZZI (1), S. SICCARDI (1), C. PEDRINAZZI (2), O. DURIN (2) and G. INAMA (2)
(1) Computer Science Department, University of Milan (Italy)
(2) Department of Cardiology, Hospital of Crema (Italy)
Data Mining Methods for the Stratification of the Arrhythmic Risk
in Young and Master Athletes
8th International Conference on APPLIED MATHEMATICS, SIMULATION, MODELLING. Florence, 2014
THE CLINICAL PROBLEM
•Sudden death in young athletes is still an open and socially relevant issue.
•It hits more than 1000 young athletes (<35) every year in Italy
•Most deaths are due to hidden heart diseases
•Cardiomyopathy is responsible for up to 30% of fatal cases
THE CLINICAL PROBLEM
Physical activity causes
• Structural remodelling of the ventricles
• Alteration of the heart loading conditions
• Predisposition to possibly fatal arrhythmias
Endurance sports (running, bicycling, etc.) may cause
• Increase of heart rate and stroke volume
• Reduced vascular resistance
• Slight increase in blood pressure
THE CLINICAL PROBLEM
Power sports (weightlifting, rowing, etc.)
• Increase of cardiac output
• Increase of heart frequency
• Increase of vascular resistance and blood pressure
• Increase of pressure load
• Volume load may lead to dilatation of the left ventricle and increased wall thickness
THE CLINICAL PROBLEM
•Few studies on cardiovascular adaptation to sport activity in master athletes
•Clinical significance for cardiac rehabilitation after myocardial infarction and surgical procedures
•Exercise has a positive effect:
– Reduces cardiovascular events
– A clinical protocol should include risk evaluation.
THE DATA
• Data collected from four groups:
– A (18 subjects): athletes <40, 5 females, 13 males
– B (19 subjects): athletes >40, 6 females, 13 males
– C (8 subjects): non-athletes <40, 3 females, 13 males
– D (7 subjects): non-athletes >40, 2 females, 5 males
THE DATA
ECG signals from treadmill exercise stress tests
• HRV (heart rate variability)
• RR intervals analysis
• Elimination of artifacts
pNNx analysis
• pNNx: percentage of successive RR intervals that differ by more than x ms
• pNN20 and pNN50 are the most significant.
• pNN20 gives the best discrimination among subjects.
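A minimal sketch of how a pNNx value can be computed from a list of RR intervals. It assumes the usual HRV definition (percentage of successive RR-interval differences larger than x ms); the function name `pnnx` and the sample intervals are hypothetical and are not taken from the study.

```python
def pnnx(rr_ms, x):
    """Percentage of successive RR-interval differences larger than x ms."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Absolute difference between each interval and the next one.
    diffs = [abs(b - a) for a, b in zip(rr_ms, rr_ms[1:])]
    return 100.0 * sum(d > x for d in diffs) / len(diffs)


# Hypothetical RR intervals in milliseconds (illustration only).
rr = [800, 830, 790, 795, 860, 850]
pnn20 = pnnx(rr, 20)
pnn50 = pnnx(rr, 50)
```

Artifact removal is assumed to have been done before this step, since a single spurious beat changes two successive differences.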
Multiscale Entropy Analysis
Sample Entropy assesses the complexity of a time series.
• It is the negative logarithm of the estimated conditional probability that two sequences of m data points that are similar (distance <r) remain similar when one more point is included.
Multiscale Entropy analysis (MSE)
• is the Sample Entropy of a time series at multiple scales, i.e. computed after averaging consecutive groups of x points.
• MSEx is the MSE with scale factor x
• Both pNNx and MSEx have been used satisfactorily in many studies on arrhythmias.
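The two steps described above (Sample Entropy, then coarse-graining by averaging groups of x points) can be sketched as follows. This is an illustrative implementation, not the authors' code: the function names, the defaults m=2 and a tolerance of 0.15 times the standard deviation of the original series, and the test series (a logistic map) are all assumptions.

```python
import math
import statistics


def coarse_grain(series, scale):
    """Average consecutive, non-overlapping groups of `scale` points."""
    n = len(series) // scale
    return [sum(series[i * scale:(i + 1) * scale]) / scale for i in range(n)]


def sample_entropy(series, m, r):
    """-ln(A/B): B counts similar template pairs of length m, A of length m+1."""
    n = len(series)

    def count_pairs(length):
        # The same number of templates (n - m) is used for both lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist < r:  # "distance < r", as stated on the slide
                    count += 1
        return count

    b = count_pairs(m)
    a = count_pairs(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined: no matches at this tolerance
    return -math.log(a / b)


def multiscale_entropy(series, scales, m=2, r_factor=0.15):
    """Sample Entropy of the coarse-grained series at each scale factor."""
    r = r_factor * statistics.pstdev(series)  # tolerance fixed from the original series
    return {s: sample_entropy(coarse_grain(series, s), m, r) for s in scales}


# Hypothetical input: a chaotic logistic-map series standing in for RR data.
x, demo = 0.4, []
for _ in range(200):
    x = 3.9 * x * (1 - x)
    demo.append(x)
mse = multiscale_entropy(demo, scales=(1, 5))
```

With this naming, `mse[1]` plays the role of MSE1 and `mse[5]` of MSE5. Real recordings need to be long enough for the largest scale, because coarse-graining at scale x leaves only 1/x of the points.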
THE DATA
Clustering
Hierarchical clustering with Ward’s method
• Agglomerative hierarchical clustering
• The pair of clusters to merge is chosen so as to minimize the Sum of Squared Errors, summed over:
• i: cluster
• k: variable
• j: observation (case)
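The formula itself did not survive in the slide text. Assuming the slide used the standard textbook form of Ward's criterion, with the indices i, j, k as listed above, the quantity being minimized would read:

```latex
\mathrm{ESS} \;=\; \sum_{i} \sum_{j} \sum_{k} \left| x_{ijk} - \bar{x}_{i \cdot k} \right|^{2}
```

Here x_{ijk} is the value of variable k for observation j in cluster i, and \bar{x}_{i \cdot k} is the mean of variable k over all observations in cluster i. Both symbols are introduced here for illustration.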
THE DATA
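• Starting from n clusters of size 1, n-1 clusters are formed (one of size 2, the others of size 1): the ESS (error sum of squares) is calculated for every candidate merge, and the pair with the smallest ESS forms the first cluster.
• Then n-2 clusters are formed, containing either two clusters of size 2 or one cluster of size 3; the ESS is calculated again, and so on.
• The algorithm stops when a single cluster of size n is formed.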
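The merging procedure above can be sketched in a few lines. This is a naive illustration that recomputes the error sum of squares for every candidate pair, not the authors' implementation; the function names and the toy points are hypothetical, and a real analysis would normally rely on a statistics library.

```python
def ess(cluster):
    """Error sum of squares of one cluster (a list of equal-length tuples)."""
    dims = len(cluster[0])
    means = [sum(p[k] for p in cluster) / len(cluster) for k in range(dims)]
    return sum((p[k] - means[k]) ** 2 for p in cluster for k in range(dims))


def ward(points, n_clusters=1):
    """Agglomerate until n_clusters remain; return the clusters and the merge log."""
    clusters = [[p] for p in points]  # start with n clusters of size 1
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Increase in total ESS caused by merging clusters a and b.
                increase = (ess(clusters[a] + clusters[b])
                            - ess(clusters[a]) - ess(clusters[b]))
                if best is None or increase < best[0]:
                    best = (increase, a, b)
        increase, a, b = best
        merges.append((clusters[a], clusters[b], increase))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return clusters, merges


# Toy one-dimensional data (illustration only).
toy = [(0.0,), (1.0,), (10.0,), (11.0,)]
groups, log = ward(toy, n_clusters=2)
```

Minimizing the increase in ESS at each step is equivalent to minimizing the total ESS after the merge, because the ESS of the untouched clusters does not change.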
Artificial Neural Network
• ANNs are effective nonlinear classifiers
• Our model: ITSOM
• A custom SOM-like network (Self-Organizing Map) that evaluates chaotic attractors within the sequence of winning neurons.
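To make "sequence of winning neurons" concrete, here is a minimal one-dimensional SOM sketch that records which neuron wins for each input. It is not the ITSOM model, whose details are not given on the slides: the network size, learning rate, neighbourhood rule and sample values are assumptions chosen only for illustration.

```python
def som_winner_sequence(samples, n_neurons=4, epochs=3, lr=0.3):
    """Train a tiny 1-D SOM and return the sequence of winning neuron indices."""
    dims = len(samples[0])
    # Deterministic initial weights spread evenly over [0, 1] (an assumption).
    weights = [[(i + 0.5) / n_neurons] * dims for i in range(n_neurons)]
    winners = []
    for _ in range(epochs):
        for x in samples:
            dists = [sum((w[k] - x[k]) ** 2 for k in range(dims)) for w in weights]
            win = dists.index(min(dists))  # best-matching neuron
            winners.append(win)
            for i in range(n_neurons):
                d = abs(i - win)
                if d <= 1:  # update the winner and its immediate neighbours
                    h = 1.0 if d == 0 else 0.5
                    for k in range(dims):
                        weights[i][k] += lr * h * (x[k] - weights[i][k])
    return winners, weights


# Hypothetical inputs scaled to [0, 1] (illustration only).
samples = [(0.1,), (0.9,), (0.1,), (0.9,)]
winners, weights = som_winner_sequence(samples)
```

The list `winners` is the kind of time series in which recurring patterns can then be sought; how ITSOM identifies attractors in it is beyond what this sketch shows.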
THE ITSOM ARTIFICIAL NEURAL NETWORK
RESULTS
Variables:
– Athlete yes/no
– Age
– Gender
– MSE1
– MSE5
– MSE20
– pNN20
RESULTS
The best clustering highlighted 6 classes:
1. Non-athletes <40, low MSE1
2. Non-athletes >40, highest MSE1, high pNN20
– 1 subject in this group is a non-athlete <40 but with high MSE1.
3. Athletes >40, low MSE1, low pNN20
– 1 subject in this group is an athlete <40 with average MSE1.
4. Athletes with high MSE1 and high pNN20, both >40 and <40
5. Athletes <40 with low MSE1 and low pNN20
6. Athletes <40, very young (mean age 19), dispersed values of MSE1 and pNN20.
RESULTS
The ANN results gave further indications.
• The ANN clearly distinguishes athletes from non-athletes (separated attractors)
• Again, sex is not discriminant (there are no attractors separated by sex)
• Age is discriminant for athletes but not for non-athletes.
RESULTS
There are correspondences between attractors and clusters:
• One attractor collects 5 of the 7 subjects of cluster 5 (see figure)
• One attractor collects the subjects of cluster 2
• One attractor collects the subjects of cluster 1.
FIRST CONCLUSIONS
• The two non-linear procedures (clustering and ANN) are mutually congruent in discriminating subjects
• Sensitive variables exist: physical activity, age, MSE1
• Sex is not a discriminating variable
• Very young athletes have dispersed values of MSE1. MSE1 seems to decrease with maturity in athletes, but rises in non-athletes >40
• Physical activity seems to preserve a low MSE1
FIRST CONCLUSIONS
From the ANN classification:
• lack of age stratification within the group of non-athletes
• athletes have cardiovascular characteristics differentiated by age.
THE NEW VARIABLES
• We wanted to assess whether other variables could identify, within the clusters, groups of subjects with common cardiovascular characteristics.
• All the subjects underwent many other blood and clinical tests:
THE NEW VARIABLES
• TDD (telediastolic diameter)
• IVS (interventricular septum)
• PW (posterior wall) thickness
• ETT (exercise tolerance test)
• EF (ejection fraction)
• maximum O2 consumption (VO2peak)
• % VO2
• O2 consumption at the anaerobic threshold (VO2AT)
• VE/VCO2 slope (indicator of ventilatory response to exercise)
• peak RER (respiratory exchange ratio)
• maximum workload
• HB (hemoglobin)
• HCT (hematocrit)
• creatinine
• cholesterol (total, HDL, LDL)
• triglycerides
• blood glucose
• BNP (brain natriuretic peptide)
• BMI (body mass index)
• DTS (Duke treadmill score)
• mean Holter HR (cardiac frequency)
• min, max and mean FC (heart rate)
• FC
DATA ANALYSIS
• To evaluate the differences in variable values among clusters we used a t-test.
• The following table shows that many statistically significant differences emerge
• Significances with p<0.001 are indicated with **
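As a sketch of the comparison described above, the code below computes a two-sample t statistic for one variable in two clusters. The slides do not say which t-test variant was used; Welch's unequal-variance form is assumed here, and the function name and sample values are hypothetical.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = statistics.variance(sample_a)  # unbiased sample variances
    vb = statistics.variance(sample_b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical values of one variable in two clusters (illustration only).
cluster_x = [1.0, 2.0, 3.0, 4.0, 5.0]
cluster_y = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof = welch_t(cluster_x, cluster_y)
```

Turning `t_stat` and `dof` into a p-value requires the Student's t distribution, which the Python standard library does not provide, so that step is left to a statistics package. With several variables tested across many cluster pairs, thresholds such as p<0.01 and p<0.001 are applied to each test separately.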
DATA ANALYSIS
• None of the variables is significantly different between clusters 1 and 2 (non-athletes)
• VO2peak, workload, FCmean, FCmin differ significantly between the non-athlete clusters and various clusters of athletes
• TDD, FCmin, DTS differ significantly between cluster 6 (very young athletes) and many other clusters
• FCmin, FCmean, VO2peak, %VO2, VO2AT differ significantly between cluster 2 (non-athletes >40) and cluster 3 (athletes >40).
THE NEW CLUSTERING
We performed a new clustering using all the variables.
The best result identifies 5 clusters:
1. 16 athletes, both <40 and >40, 15 males, 1 female
2. 12 athletes, both <40 and >40, 10 females, 1 male
3. 12 athletes, both <40 and >40, all males
4. 10 non-athletes, both <40 and >40, 9 males, 1 female
5. 11 non-athletes, both <40 and >40, both males and females.
THE NEW CLUSTERING
• The clusters discriminate perfectly athletes from non athletes.
• Although sex was not among the variables considered, the clusters discriminate very well by sex.
• The following figures show the dependence of clusters on groups of variables
THE NEW CLUSTERING
• Workload: maximum for cluster 1 (male athletes), minimum for cluster 2 (female athletes)
[Figure: age, BMI and workload (wload) for clusters 1 to 5; vertical axis from 0 to 1.2]
THE NEW CLUSTERING
• Good correlation between:
– IVS and PW in clusters 1, 2, 5
– IVS and TDD in clusters 3, 4
– PW and DTS in cluster 3
THE NEW CLUSTERING
• Good correlation between HB, HCT in clusters 1,4,5.
THE NEW CLUSTERING
• Good correlation between pNN20, MSE1, MSE5 and MSE20, especially in clusters 2 and 4.
[Figure: pNN20, MSE1, MSE5 and MSE20 by cluster; vertical axis from 0 to 1.2]
t-TEST EVALUATION
• We examined the significance of the differences in variable values among clusters.
• In the table the variables with p<0.01 are reported.
• Significances with p<0.001 are reported with *.
OBSERVATIONS
• Cluster 1 (male athletes <40) differs significantly from all the other clusters in the respiratory variables (VO) and in at least one FC variable.
• Cluster 1 differs in workload from all clusters except cluster 3 (male athletes >40).
• Differences between clusters of athletes and non-athletes frequently involve cholesterol or related variables.
IN SUMMARY
Starting from the analysis of the ECG stress test signals, we could conclude that:
• MSE1 is low in athletes both <40 and >40 and in non-athletes <40,
• MSE1 is higher in non-athletes >40.
• Thus sport seems to keep the MSE1 parameter down regardless of age.
The application of a self-organizing ANN reveals in addition that the non-athlete subjects are not separated by age, but are clearly separated from athletes.
• Sex is not a discriminating variable in either the clustering or the ANN classification.
IN SUMMARY
The analysis of all the variables together has been able to discriminate:
• depending on the VO variables
– athletes >40 from non-athletes >40
– non-athletes <40 from athletes >40
– non-athletes >40 from athletes <40
• depending on the workload
– non-athletes <40 from all athletes
• depending on DTS
– athletes <40 from athletes <<40
• depending on FC
– non-athletes >40 from athletes <40.
• No variables are significantly different between non-athletes <40 and >40.
IN SUMMARY
On the basis of the following variables we could discriminate:
• workload: between male athletes and female athletes
• VO parameters, workload, FC parameters: between male athletes and non-athletes, both male and female
• VO parameters: between male athletes and female athletes.
IN SUMMARY
• Male athletes differ from all the other groups, in particular in the VO variables and workload.
• Male athletes are distinguished from non-athletes according to the FC parameters.
CONCLUSIONS
• The study has allowed a stratification of the subjects on the basis of physical activity, age and sex
• The existence of significant differences in the cardiovascular status of these groups was shown through the variability of a set of cardiovascular parameters, in particular MSE1, pNN20, VO and FC variables.
CONCLUSIONS
• The complex interaction between these variables required the use of nonlinear analysis techniques, namely clustering and ANN.
• Taking this stratification into account, it will be possible, by following the subjects over time, to identify cardiovascular prognostic indicators that may help to prevent possibly fatal cardiac arrhythmias.