+Predicting Sepsis Patient Resilience
Jing Tong
Insight Data Science Program
+Predicting sepsis resilience
Sepsis is a whole-body inflammatory response to an infection.
+Predicting sepsis resilience
Sepsis patients: Survivors, Non-survivors
Treatments: Standardized treatment, More aggressive treatment
+Data
Dataset1(Training)
Dataset2(Validation)
Dataset3(Validation)
+Data
Dataset 1
Each sample has a microarray gene expression profile (24840 transcripts, 12293 genes).
Sepsis group     #Samples
Survivors        96
Non-survivors    31
Public dataset from one research paper (integrated by Sage) with 163 samples and 53 patients.
+Randomly selected 2 genes
+Dataset1
[Workflow diagram]
Dataset1 split: 2/3 Training, 1/3 Validating (x 5)
Training data: 5-Fold Cross Validation (4/5 Training, 1/5 Testing, x 5)
Each round: Top 500 Genes (p-value < 0.001) from ANOVA
Result: 212 Consensus Genes
+Feature selection
Randomly selected 2 genes
Top 2 genes in 212 genes
212 consensus genes found.
+Prediction model
Algorithm: Support Vector Machine (SVM)
Features: 212 consensus gene expressions
Testing on 1/3 validation dataset
Measurement    Result
Accuracy       0.93
Precision      0.97
Recall         0.94
F1-score       0.95
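The train-then-test step on this slide can be sketched in plain Python. The slides do not state the SVM kernel, library, or which class counts as positive, so everything below is an assumption for illustration: a linear SVM trained by subgradient descent on the regularized hinge loss, two synthetic features standing in for the 212 gene expressions, and a 2/3 training, 1/3 testing split. The function names are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=100, seed=0):
    """Subgradient descent on the L2-regularized hinge loss; labels are -1 or +1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # The L2 penalty shrinks the weights a little on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Samples inside the margin (or misclassified) move the boundary.
            if y[i] * score < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict(w, b, X):
    """Predict +1 or -1 from the sign of the decision function."""
    return [1 if sum(wj * xj for wj, xj in zip(w, row)) + b >= 0 else -1 for row in X]


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Synthetic stand-in data (assumption): 60 samples, 2 features, classes +1 / -1.
rng = random.Random(42)
labels = [1, -1] * 30
X = [[rng.gauss(3.0 * lab, 1.0), rng.gauss(3.0 * lab, 1.0)] for lab in labels]

split = 40  # 2/3 training, 1/3 testing
w, b = train_linear_svm(X[:split], labels[:split])
metrics = classification_metrics(labels[split:], predict(w, b, X[split:]))
print(metrics)
```

The four measurements are computed only on the held-out third, which mirrors how the table above is described.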
+Validation on other datasets
Dataset2: Microarray
Dataset3: RNAseq
Dataset     Survivors    Non-survivors
Dataset2    51           26
Dataset3    78           28
A separate SVM model is built for each dataset, using the 212 consensus genes.
Measurement    Dataset2    Dataset3
Accuracy       0.73        0.69
Precision      0.81        0.83
Recall         0.76        0.73
F1-score       0.79        0.78
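Reusing the consensus genes on another dataset first requires cutting that dataset's expression matrix down to those genes. A minimal sketch of that step is below, assuming each dataset provides a list of gene names and one row of values per sample. The helper name `select_genes`, the gene `GENE_X`, and all expression values are hypothetical; only the three consensus gene names come from the slides. Genes absent from a dataset are reported instead of silently dropped, since a microarray and an RNAseq dataset may not cover the same genes.

```python
def select_genes(gene_names, expression_rows, wanted):
    """Keep only the columns for the wanted genes, in the order of `wanted`."""
    column_of = {name: j for j, name in enumerate(gene_names)}
    present = [g for g in wanted if g in column_of]
    missing = [g for g in wanted if g not in column_of]
    reduced = [[row[column_of[g]] for g in present] for row in expression_rows]
    return present, missing, reduced


# Three of the consensus genes named in the slides.
consensus = ["ALOX15", "PRKCD", "STAT3"]

# Hypothetical validation dataset: gene order differs and PRKCD is absent.
dataset_genes = ["STAT3", "GENE_X", "ALOX15"]
dataset_rows = [[7.1, 2.0, 5.3], [6.4, 2.2, 4.8]]

present, missing, reduced = select_genes(dataset_genes, dataset_rows, consensus)
print(present, missing, reduced)
```

The reduced matrix can then be passed to whatever classifier is trained for that dataset.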
+What biological processes are involved in these biomarkers?
http://amp.pharm.mssm.edu/Enrichr/enrich
Gene list enrichment analysis tool: "Enrichr"
Gene name    Protein encoded     Biological process
ALOX15       Lipoxygenase        Inhibits diverse inflammatory diseases, including sepsis
PRKCD        Protein kinase C    Involved in B lymphocyte signaling
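Gene list enrichment asks whether a gene list overlaps an annotated gene set more than chance would predict. One common statistic for this is the hypergeometric tail probability, sketched below. The slides do not say which score from the tool was used, so this is an illustration of the general idea and not a description of the tool's output. The background size (12293 genes) and list size (212 genes) come from the slides; the gene set size and overlap are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_set, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random
    from n_background genes, n_in_set of which belong to the gene set."""
    total = comb(n_background, n_selected)
    upper = min(n_in_set, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# 12293 background genes and 212 selected genes are from the slides;
# a gene set of 100 genes with 8 of them in the list is a made-up example.
p = enrichment_p_value(12293, 100, 212, 8)
print(p)
```

A small p-value suggests that the biological process annotated by the gene set is over-represented among the selected genes.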
+Blog post
sepsis-re.com
+About me
Computational Biology, Ph.D.
+
Thanks for your attention!
+Dataset1
Each sample has a gene expression profile (24840 transcripts, 12293 genes).
Group               #Samples
Sepsis survivors    96
Non-survivors       31
Healthy control     36
Public dataset from one research paper (integrated by Sage) with 163 samples and 53 patients.
+Dataset – Feature Engineering
212 consensus genes found.
[Figure: gene expressions, Survivors vs. Non-Survivors]
+Dataset – Feature Engineering
[Figure: top 50 (of 212) gene expressions, down-regulated, Survivors vs. Non-Survivors]
+Dataset – Feature Engineering
[Figure: top 50 (of 212) gene expressions, up-regulated, Survivors vs. Non-Survivors]
+Dataset – GDS4971
Biomarker selections:
1) 2/3 training data, 1/3 validation data
2) Among training data, 5-fold cross validation
3) Each CV round, use ANOVA to find top 500 (p-value < 0.001) genes.
4) Find the final consensus genes as potential biomarkers.
Group               #Samples
Sepsis survivors    96
Non-survivors       31
Healthy control     36
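The biomarker selection steps on this slide can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the original pipeline: genes are ranked by the one-way ANOVA F-statistic only, the p-value < 0.001 cutoff is left out (a statistics library such as SciPy could supply the p-value), "consensus" is taken to mean genes selected in every fold, and the data are synthetic. All function names are hypothetical.

```python
import random
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic for a single gene across the label groups."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        return 0.0  # not enough groups or samples to compare
    grand_mean = mean(values)
    group_means = {label: mean(g) for label, g in groups.items()}
    ss_between = sum(len(g) * (group_means[lab] - grand_mean) ** 2
                     for lab, g in groups.items())
    ss_within = sum((v - group_means[lab]) ** 2
                    for lab, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_genes(X, y, rows, top_k):
    """Indices of the top_k genes by F-statistic, using only the given samples."""
    labels = [y[i] for i in rows]
    scores = [(anova_f([X[i][g] for i in rows], labels), g)
              for g in range(len(X[0]))]
    scores.sort(reverse=True)
    return {g for _, g in scores[:top_k]}


def consensus_genes(X, y, n_folds=5, top_k=500, seed=0):
    """Genes that land in the top_k list in every cross-validation round."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    consensus = None
    for f in range(n_folds):
        # Select genes on the 4/5 training part only; fold f is held out.
        train_rows = [i for j, fold in enumerate(folds) if j != f for i in fold]
        selected = top_genes(X, y, train_rows, top_k)
        consensus = selected if consensus is None else consensus & selected
    return sorted(consensus)


# Synthetic training data (assumption): 40 samples, 20 genes,
# with only genes 0 and 1 differing between the two classes.
rng = random.Random(1)
y = [0] * 20 + [1] * 20
X = [[rng.gauss(5.0 * label if g < 2 else 0.0, 1.0) for g in range(20)]
     for label in y]

genes = consensus_genes(X, y, n_folds=5, top_k=2)
print(genes)
```

In this sketch the function receives only the 2/3 training portion, so the 1/3 validation samples never influence which genes are selected.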
+What are these biomarkers?
+What biological process is involved in these biomarkers?
http://amp.pharm.mssm.edu/Enrichr/enrich
Genes: ALOX15, PRKCD, STAT3, ADRM1, ITGB2, ARHGAP1, NEDD9, AQP3, RXRA, ITGAD, RASSF5, CCR9, ULK1, CSK, JAK1
+What biological processes are involved in these biomarkers?
Gene name    Protein encoded                 Biological process
ALOX15       Lipoxygenase                    Its metabolites act to inhibit diverse inflammatory diseases, including sepsis
PRKCD        Protein kinase C                Involved in B cell signaling (B cells are lymphocytes in the humoral immunity of the adaptive immune system)
STAT3        Transcription factor            Associated with recurrent infections
ADRM1        Adhesion-regulating molecule    Mediates lymphocyte adhesion to endothelial cells
ITGB2        Integrin                        Participates in cell adhesion and cell-surface-mediated signalling