R. PIZZI 1 , S. SICCARDI 1 , C. PEDRINAZZI 2 , O. DURIN 2 and G. INAMA 2 1Computer Science Department,University of Milan (Italy) 2 Department of Cardiology, Hospital of Crema (Italy) Data Mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes


Page 1: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

R. PIZZI1, S. SICCARDI1, C. PEDRINAZZI2, O. DURIN2 and G. INAMA2

1 Computer Science Department, University of Milan (Italy)
2 Department of Cardiology, Hospital of Crema (Italy)

Data Mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

Page 2: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

8th International Conference on APPLIED MATHEMATICS, SIMULATION, MODELLING. Florence, 2014

THE CLINICAL PROBLEM

•Sudden death in young athletes is still an open and socially relevant issue.

•It hits more than 1000 young athletes (<35) every year in Italy

•Most deaths are due to hidden heart diseases

•Cardiomyopathy is responsible for up to 30% of fatal cases

Page 3: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

THE CLINICAL PROBLEM

Physical activity causes
• Structural remodelling of the ventricles
• Alteration of the heart loading conditions
• Predisposition to possibly fatal arrhythmias

Endurance sports (running, bicycling, etc.) may cause
• Increased heart rate and stroke volume
• Reduced vascular resistance
• A slight increase in blood pressure

Page 4: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

THE CLINICAL PROBLEM

Power sports (weightlifting, rowing, etc.)
• Increase of cardiac output
• Increase of heart rate
• Increase of vascular resistance and blood pressure
• Increase of pressure load

• Volume load may lead to dilatation of the left ventricle and increased wall thickness

Page 5: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

THE CLINICAL PROBLEM

•Few studies on cardiovascular adaptation to sport activity in master athletes

•Clinical significance for cardiac rehabilitation after myocardial infarction and surgical procedures

•Exercise has a positive effect:
– Reduces cardiovascular events
– A clinical protocol should include risk evaluation.

Page 6: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

THE DATA

• Data collected from four groups:

– A (18 subjects): athletes <40, 5 females, 13 males

– B (19 subjects): athletes >40, 6 females, 13 males

– C (8 subjects): non-athletes <40, 3 females, 13 males

– D (7 subjects): non-athletes >40, 2 females, 5 males

Page 7: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

THE DATA

ECG signals from treadmill exercise stress test

• HRV (heart rate variability)
• RR intervals analysis
• Elimination of artifacts

pNNx analysis

• Percentage of successive RR intervals that differ by more than x ms
• pNN20 and pNN50 are the most significant.
• pNN20 gives the best discrimination among subjects.
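A minimal sketch of the pNNx computation, using the usual definition (the share of successive RR-interval differences that exceed x ms). The function name `pnnx` and the sample RR series are hypothetical illustrations, not the authors' code or data.

```python
def pnnx(rr_ms, x):
    """Percentage of successive RR-interval differences larger than x ms."""
    # Absolute difference between each RR interval and the next one
    diffs = [abs(b - a) for a, b in zip(rr_ms, rr_ms[1:])]
    if not diffs:
        raise ValueError("need at least two RR intervals")
    return 100.0 * sum(d > x for d in diffs) / len(diffs)


# Hypothetical RR series in milliseconds (not data from the study)
rr = [800, 830, 790, 850, 845, 900]
pnn20 = pnnx(rr, 20)
pnn50 = pnnx(rr, 50)
```

Artifacts should be removed from the RR series before this step, since a single spurious beat produces two large differences.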

Page 8: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

THE DATA

Multiscale Entropy Analysis

Sample Entropy assesses the complexity of a time series.
• It is the negative logarithm of the estimated conditional probability that two sequences of m data points that are similar (distance < r) remain similar when one more point is included.

Multiscale Entropy analysis (MSE)
• is the Sample Entropy of a time series at multiple scales, i.e. computed after averaging the series over non-overlapping groups of x points.

• MSEx is the MSE with scale factor x

• Both pNNx and MSEx have been used satisfactorily in many studies on arrhythmias.
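The two definitions above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the names `sample_entropy`, `coarse_grain` and `mse`, the defaults `m=2` and `r=0.2`, and the use of an absolute tolerance `r` are all assumptions (in practice `r` is often set as a fraction of the standard deviation of the original series).

```python
import math


def sample_entropy(series, m=2, r=0.2):
    """SampEn = -ln(A / B), where B counts pairs of similar templates of
    length m and A counts pairs that stay similar at length m + 1."""
    n = len(series)

    def count_matches(length):
        # Use the same n - m starting points for both template lengths
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance between the two templates
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist < r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matches are found
    return -math.log(a / b)


def coarse_grain(series, scale):
    """Average the series over non-overlapping windows of `scale` points."""
    return [sum(series[i:i + scale]) / scale
            for i in range(0, len(series) - scale + 1, scale)]


def mse(series, scale, m=2, r=0.2):
    """Sample Entropy of the coarse-grained series (MSEx with x = scale)."""
    return sample_entropy(coarse_grain(series, scale), m, r)
```

With these names, MSE1 corresponds to `mse(rr, 1)`, MSE5 to `mse(rr, 5)` and MSE20 to `mse(rr, 20)`. A perfectly regular series gives an entropy of zero, while irregular series give larger values.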

Page 9: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

THE DATA

Clustering

Hierarchical clustering with Ward’s method
• Agglomerative hierarchical clustering
• The pair of clusters to merge is chosen by minimizing the Sum of Squared Errors (ESS, error sum of squares), with indices:
– i: cluster
– k: variable
– j: observation (case)
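With i indexing clusters, k variables and j observations, the error sum of squares minimized by Ward's method is usually written as below. This is the standard textbook form, given here as an assumption about the notation intended on the slide.

```latex
ESS = \sum_{i} \sum_{j} \sum_{k} \left( x_{ijk} - \bar{x}_{i \cdot k} \right)^{2}
```

Here $x_{ijk}$ is the value of variable $k$ for observation $j$ in cluster $i$, and $\bar{x}_{i \cdot k}$ is the mean of variable $k$ over the observations in cluster $i$.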

Page 10: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

THE DATA

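• Initially each of the n observations is a cluster of size 1. In the first step n-1 clusters are formed, one of size 2 and the rest of size 1: the error sum of squares (ESS) is calculated for every candidate pair, and the pair with the smallest ESS forms the first cluster.

• Then n-2 clusters are formed, either two of size 2 or one of size 3; the ESS is calculated again, and so on.

• The algorithm stops when a single cluster of size n is formed.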
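The merging procedure described above can be sketched in plain Python. This is a naive illustration of Ward's criterion, not the software used in the study. The function names, the `n_clusters` stopping parameter and the sample points are hypothetical.

```python
def ess(cluster):
    """Error sum of squares of one cluster (a list of equal-length tuples)."""
    dims = len(cluster[0])
    means = [sum(p[k] for p in cluster) / len(cluster) for k in range(dims)]
    return sum((p[k] - means[k]) ** 2 for p in cluster for k in range(dims))


def ward(points, n_clusters=1):
    """Start from singletons and repeatedly merge the pair of clusters
    whose merge gives the smallest increase in total ESS."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                increase = (ess(clusters[a] + clusters[b])
                            - ess(clusters[a]) - ess(clusters[b]))
                if best is None or increase < best[0]:
                    best = (increase, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return clusters


# Hypothetical 2-D observations (not data from the study)
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
groups = ward(data, n_clusters=2)
```

Stopping at `n_clusters=1` reproduces the full hierarchy described on the slide. Stopping earlier corresponds to cutting the dendrogram at a chosen number of clusters.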

Page 11: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

THE DATA

Page 12: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

THE DATA

Artificial Neural Network

• ANNs are effective non-linear classifiers
• Our model: ITSOM
• A custom SOM-like network (Self-Organizing Map) that evaluates chaotic attractors within the sequence of winning neurons.
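To make the idea of a "sequence of winning neurons" concrete, here is a minimal one-dimensional SOM that records which neuron wins for each presented input. It is a generic SOM sketch, not the authors' ITSOM: the function name, the neighbourhood rule, the learning rate and every parameter value are assumptions, and the attractor analysis itself is not shown.

```python
import random


def train_som(data, n_neurons=4, epochs=20, lr=0.3, seed=0):
    """Train a tiny 1-D SOM; return (weights, sequence of winning neurons)."""
    rng = random.Random(seed)
    dims = len(data[0])
    weights = [[rng.random() for _ in range(dims)] for _ in range(n_neurons)]
    winners = []
    for _ in range(epochs):
        for x in data:
            # The winner is the neuron whose weight vector is closest to x
            dists = [sum((w[k] - x[k]) ** 2 for k in range(dims)) for w in weights]
            win = dists.index(min(dists))
            winners.append(win)
            # Move the winner and its immediate neighbours towards the input
            for j in (win - 1, win, win + 1):
                if 0 <= j < n_neurons:
                    rate = lr if j == win else lr / 2
                    for k in range(dims):
                        weights[j][k] += rate * (x[k] - weights[j][k])
    return weights, winners


# Hypothetical normalized feature vectors (not data from the study)
samples = [(0.1, 0.2), (0.15, 0.1), (0.9, 0.8), (0.85, 0.95)]
weights, winners = train_som(samples)
```

Recurring patterns in `winners` across epochs are the kind of structure that an attractor-based analysis would then examine.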

Page 13: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

THE ITSOM ARTIFICIAL NEURAL NETWORK

Page 14: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

RESULTS

Variables:
– Athlete yes/no
– Age
– Gender
– MSE1
– MSE5
– MSE20
– pNN20

Page 15: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

RESULTS

The best clustering highlighted 6 classes:

1. Non-athletes <40, low MSE1

2. Non-athletes >40, highest MSE1, high pNN20
– 1 subject in this group is a non-athlete <40 but with high MSE1.

3. Athletes >40, low MSE1, low pNN20
– 1 subject in this group is an athlete <40 with average MSE1.

4. Athletes with high MSE1 and high pNN20, both >40 and <40

5. Athletes <40 with low MSE1 and low pNN20

6. Very young athletes <40 (mean age 19), dispersed values of MSE1 and pNN20.

Page 16: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

RESULTS

The ANN results gave further indications.

• The ANN clearly distinguishes athletes from non-athletes (separated attractors)

• Again, sex is not discriminant (there are no attractors separated by sex)

• Age is discriminant for athletes, but not for non-athletes.

Page 17: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

RESULTS

There are correspondences between attractors and clusters:

• One attractor collects 5 of 7 subjects of cluster 5 (see figure)

• One attractor collects the subjects of cluster 2

• One attractor collects the subjects of cluster 1.

Page 18: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

FIRST CONCLUSIONS

• The two non-linear procedures (clustering and ANN) are mutually congruent in discriminating subjects

• Sensitive variables exist: physical activity, age, MSE1

• Sex is not a discriminating variable

• Very young athletes have dispersed values of MSE1. MSE1 seems to decrease with maturity in athletes, but rises in non-athletes >40: physical activity seems to preserve a low MSE1

Page 19: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

FIRST CONCLUSIONS

From the ANN classification:

• lack of age stratification within the group of non-athletes

• athletes have cardiovascular characteristics differentiated by age.

Page 20: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

THE NEW VARIABLES

• We wanted to assess whether other variables can identify, within the clusters, groups of subjects with common cardiovascular characteristics.

• All the subjects underwent many other blood and clinical tests:

Page 21: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

THE NEW VARIABLES

•TDD (telediastolic diameter)
•IVS (interventricular septum)
•PW (posterior wall) thickness
•ETT (exercise tolerance test)
•EF (ejection fraction)
•maximum O2 consumption (VO2peak)
•% VO2
•O2 consumption at the anaerobic threshold (VO2AT)
•VE / VCO2 slope (indicator of ventilatory response to exercise)
•peak RER (respiratory exchange ratio)
•maximum workload
•HB (hemoglobin)
•HCT (hematocrit)
•creatinine
•cholesterol (total, HDL, LDL)
•triglycerides
•blood glucose
•BNP (brain natriuretic peptide)
•BMI (body mass index)
•DTS (Duke treadmill score)
•mean Holter HR (heart rate)
•min-max-mean FC (heart rate)
•FC

Page 22: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

DATA ANALYSIS

• To evaluate the differences in variable values among clusters, we used a t-test.

• The following table shows many statistically significant differences

• Significances with p<0.001 are indicated with **
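As an illustration of the comparison between two clusters, the sketch below computes a two-sample t statistic and its degrees of freedom. The slides do not say which t-test variant was used. Welch's unequal-variance form is assumed here, and the function name and sample values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical values of one variable in two clusters (not study data)
cluster_a = [1, 2, 3, 4, 5]
cluster_b = [2, 3, 4, 5, 6]
t_stat, dof = welch_t(cluster_a, cluster_b)
```

The p-value is then read from the Student t distribution with `dof` degrees of freedom, using a table or a statistics library. Comparing many variables across many cluster pairs inflates the chance of false positives, which is one reason to favour strict thresholds such as p<0.001.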

Page 23: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

DATA ANALYSIS

Page 24: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

DATA ANALYSIS

• None of the variables is significantly different between clusters 1 and 2 (non-athletes)

• VO2peak, workload, FCmean and FCmin differ significantly between the non-athlete clusters and various clusters of athletes

• TDD, FCmin and DTS differ significantly between cluster 6 (very young athletes) and many other clusters

• FCmin, FCmean, VO2peak, %VO2 and VO2AT differ significantly between cluster 2 (non-athletes >40) and cluster 3 (athletes >40).

Page 25: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

THE NEW CLUSTERING

We performed a new clustering using all the variables.

The best result identifies 5 clusters:

1. 16 athletes, both <40 and >40, 15 males, 1 female
2. 12 athletes, both <40 and >40, 10 females, 1 male
3. 12 athletes, both <40 and >40, all males
4. 10 non-athletes, both <40 and >40, 9 males, 1 female
5. 11 non-athletes, both <40 and >40, both males and females.

Page 26: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

THE NEW CLUSTERING

• The clusters perfectly discriminate athletes from non-athletes.

• Although sex was not among the variables considered, the clusters discriminate very well by sex.

• The following figures show the dependence of clusters on groups of variables

Page 27: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

THE NEW CLUSTERING

• Workload: maximum for cluster 1 (male athletes), minimum for cluster 2 (female athletes)

[Figure: age, BMI and workload (wload) for clusters 1 to 5; vertical axis from 0 to 1.2]

Page 28: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

THE NEW CLUSTERING

• Good correlation between:
– IVS and PW in clusters 1, 2, 5
– IVS and TDD in clusters 3, 4
– PW and DTS in cluster 3

Page 29: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

THE NEW CLUSTERING

• Good correlation between HB, HCT in clusters 1,4,5.

Page 30: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

THE NEW CLUSTERING

• Good correlation between pNN20, MSE1, MSE5 and MSE20, especially in clusters 2 and 4.

[Figure: pNN20, MSE1, MSE5 and MSE20 by cluster; vertical axis from 0 to 1.2]

Page 31: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

t-TEST EVALUATION

• We examined the significance of the differences in variable values among clusters.

• In the table the variables with p<0.01 are reported.

• Significances with p<0.001 are reported with *.

Page 32: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

t-TEST EVALUATION

Page 33: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

OBSERVATIONS

• Cluster 1 (male athletes <40) differs significantly from all the other clusters in the respiratory variables (VO) and in at least one FC variable.

• Cluster 1 differs in workload from all clusters except cluster 3 (male athletes >40).

• Differences between clusters of athletes and non-athletes frequently involve cholesterol or related variables.

Page 34: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

IN SUMMARY

Starting from the analysis of the ECG stress test signals, we could conclude that:

• MSE1 is low in athletes both <40 and >40 and in non-athletes <40.
• MSE1 is higher in non-athletes >40.
• Thus sport seems to keep the MSE1 parameter down regardless of age.

The application of a self-organizing ANN reveals in addition that the non-athlete subjects are not separated by age, but are clearly separated from athletes.

• Sex is not a discriminating variable in either the clustering or the ANN classification.

Page 35: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

IN SUMMARY

The analysis of all the variables together has been able to discriminate:

• depending on the VO variables
– athletes >40 from non-athletes >40
– non-athletes <40 from athletes >40
– non-athletes >40 from athletes <40

• depending on the workload
– non-athletes <40 from all athletes

• depending on DTS
– athletes <40 from athletes <<40

• depending on FC
– non-athletes >40 from athletes <40.

• No variables are significantly different between non-athletes <40 and >40.

Page 36: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

IN SUMMARY

On the basis of the following variables we could discriminate:

• workload: between male athletes and female athletes

• VO parameters, workload, FC parameters : between male athletes and non-athletes, both male and female

• VO parameters: between male athletes and female athletes.

Page 37: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

IN SUMMARY

• Male athletes differ from all the other groups, in particular in VO variables and workload.

• Male athletes are distinguished from non-athletes according to the FC parameters.

Page 38: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

CONCLUSIONS

• The study has allowed a stratification of the subjects on the basis of physical activity, age and sex.

• The existence of significant differences in the cardiovascular status of these groups was shown through the variability of a set of cardiovascular parameters, in particular MSE1, pNN20, VO and FC variables.

Page 39: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

CONCLUSIONS

• The complex interaction between these variables required the use of nonlinear analysis techniques, namely clustering and ANN.

• Taking this stratification into account, it will be possible, by following the subjects over time, to identify cardiovascular prognostic indicators that may help to prevent possibly fatal cardiac arrhythmias.

Page 40: Data mining Methods for the Stratification of the Arrhythmic Risk in Young and Master Athletes

REFERENCES

• O. Durin, C. Pedrinazzi, G. Donato, R. Pizzi, G. Inama, Usefulness of nonlinear analysis of ECG signals for prediction of inducibility of sustained ventricular tachycardia by programmed ventricular stimulation in patients with complex spontaneous ventricular arrhythmias, Annals of Noninvasive Electrocardiology, Vol. 13, No. 3, 2008, pp. 219-227.

• R. Pizzi, G. Inama, O. Durin, C. Pedrinazzi, Non-invasive assessment of risk for severe tachyarrhythmias by means of non-linear analysis techniques, Chaos and Complexity Letters, Vol. 3, No. 3, 2007, pp. 229-250.

• G. Inama, C. Pedrinazzi, O. Durin, M. Nanetti, R. Pizzi, Microvolt T-wave alternans for risk stratification in athletes with ventricular arrhythmias: correlation with programmed ventricular stimulation, Annals of Noninvasive Electrocardiology, Vol. 13, No. 1, 2008, pp. 14-21.

• R. Pizzi, O. Durin, G. Inama, Non-invasive assessment of risk for severe tachyarrhythmias by means of non-linear analysis techniques, in: Developments in Chaos and Complexity Research, Nova Science NY, 2008.

• G. Inama, C. Pedrinazzi, O. Durin, M. Nanetti, G. Donato, R. Pizzi, Ventricular arrhythmias in competitive athletes: risk stratification with T-wave alternans, Heart International, Vol. 3, No. 1, 2007, pp. 58-67.

• R. Pizzi, S. Siccardi, C. Pedrinazzi, O. Durin, G. Inama, Cardiovascular Modifications and Stratification of the Arrhythmic Risk in Young and Master Athletes, Am. J. Biomed Eng, Vol. 4, No. 3, 2014, pp. 60-67.