Multiple Examples of tumor tissue (public data from Whitehead/MIT)
SVM Classification of Multiple Tumor Types
DNA Microarray Data
Oracle Data Mining
Actual\Predicted BR PR LU CO LY BL ML UT LE RE PA OV MS BR
(nonzero cell counts per actual class, left to right; column alignment not preserved)
BREAST-BR 1 1
PROSTATE-PR 1 1
LUNG-LU 1 2
COLON-CO 3
LYMPHOMA-LY 6
BLADDER-BL 1 2
MELANOMA-ML 1 1
UTERUS-UT 2
LEUKEMIA-LE 1 5
RENAL-RE 3
PANCREAS-PA 1 2
OVARY-OV 1 2
MESOTHELIOMA-MS 3
BRAIN-BR 4
78.25% accuracy
Green=Correct Red=Errors
We load expression data for multiple cancer types into the Oracle database: 16,063 genes for 144 cancer patients.
We then mine the data with Support Vector Machines (SVM) and build the confusion matrix.
Oracle Data Mining’s SVM models classify the multi-class tumor data with 78.25% accuracy.
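To make the accuracy figure concrete, here is a minimal Python sketch of how a confusion matrix and its accuracy are derived from actual and predicted class labels. The labels and counts below are hypothetical and chosen only for illustration; they are not the tumor data from the slides, and this is not Oracle Data Mining code.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def accuracy(matrix):
    """Fraction of cases on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical labels for illustration only (not the data from the slides).
labels = ["LY", "LE", "CO"]
actual = ["LY", "LY", "LY", "LE", "LE", "CO", "CO", "CO"]
predicted = ["LY", "LY", "LE", "LE", "LE", "CO", "LY", "CO"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
print(f"accuracy = {accuracy(matrix):.2%}")
```

Diagonal cells correspond to the correct predictions in the slide's matrix and off-diagonal cells to the errors.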
Identify Biomarkers for DLBC Lymphoma Treatment Outcome
Attribute Importance identifies genes correlated with lymphoma.
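The idea of ranking genes by their association with outcome can be sketched in Python. This sketch uses a simple signal-to-noise score (difference of class means divided by the sum of class standard deviations) as a stand-in; it is not Oracle's Attribute Importance algorithm. The gene names and expression values are hypothetical.

```python
from statistics import mean, stdev

def signal_to_noise(values, outcomes):
    """|mean(cured) - mean(fatal)| / (sd(cured) + sd(fatal)) for one gene."""
    cured = [v for v, o in zip(values, outcomes) if o == "cured"]
    fatal = [v for v, o in zip(values, outcomes) if o == "fatal"]
    # Assumes each group has at least two samples and nonzero spread.
    return abs(mean(cured) - mean(fatal)) / (stdev(cured) + stdev(fatal))

def rank_genes(expression, outcomes):
    """Return (gene, score) pairs, highest score first."""
    scores = {g: signal_to_noise(v, outcomes) for g, v in expression.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical data: one outcome per patient, one value per patient per gene.
outcomes = ["cured", "cured", "cured", "fatal", "fatal", "fatal"]
expression = {
    "GENE_A": [1.0, 1.2, 0.8, 3.0, 3.2, 2.8],
    "GENE_B": [2.0, 2.5, 1.5, 2.1, 2.4, 1.6],
}

ranking = rank_genes(expression, outcomes)
for gene, score in ranking:
    print(f"{gene}: {score:.3f}")
```

Genes at the top of such a ranking are candidate markers for follow-up.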
Find a Cure for Lymphoma
Literature search on Lymphoma
Set up a project workspace
Set up a meeting
Check lab protocols
Store cell histology images
Analyze gene expression results
Study the markers
Find a lead
Study the Markers
Statistical analysis
Protein sequence analysis (Swissprot)
BLAST Search
Protein secondary structure study
Search of genes and genetic disorders (OMIM)
Pathway modeling
Data Analysis with JDeveloper
PKC Distribution Difference
Statistical Analysis
Create an External Table to read data from lymphoma.txt.
Calculate Mean and Standard Deviation
The t-test shows that the PKC expression levels in cured and fatal patients are significantly different.
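The statistical steps above (mean, standard deviation, t-test) can be sketched in Python. The slides do not say which t-test variant was used; this sketch assumes Welch's unequal-variance form. The expression values are hypothetical, not taken from lymphoma.txt.

```python
from math import sqrt
from statistics import mean, stdev, variance

def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical PKC expression levels for the two outcome groups.
cured = [2.0, 4.0, 6.0]
fatal = [5.0, 7.0, 9.0]

for name, group in (("cured", cured), ("fatal", fatal)):
    print(f"{name}: mean={mean(group):.2f} sd={stdev(group):.2f}")

t, df = welch_t(cured, fatal)
print(f"t={t:.4f} df={df:.1f}")
```

Judging significance still requires comparing t against the t distribution with df degrees of freedom, which this sketch does not do.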
Protein sequence analysis
Load SwissProt into Oracle XML DB
Load SwissProt into XML DB to learn more about expressed genes of interest
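As a rough illustration of looking up protein entries for a gene of interest in XML, here is a small Python sketch using the standard library parser rather than Oracle XML DB. The element names, accessions, and sequences are hypothetical and heavily simplified; the real SwissProt XML follows its own schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified entries for illustration only.
SAMPLE = """
<entries>
  <entry>
    <accession>X00001</accession>
    <name>EXAMPLE1_HUMAN</name>
    <gene>GENE1</gene>
    <sequence>MADVFPG</sequence>
  </entry>
  <entry>
    <accession>X00002</accession>
    <name>EXAMPLE2_HUMAN</name>
    <gene>GENE2</gene>
    <sequence>MKLSTQW</sequence>
  </entry>
</entries>
"""

def find_by_gene(xml_text, gene):
    """Return one dict per <entry> whose <gene> text equals the given gene."""
    root = ET.fromstring(xml_text.strip())
    return [
        {child.tag: (child.text or "").strip() for child in entry}
        for entry in root.iter("entry")
        if entry.findtext("gene") == gene
    ]

hits = find_by_gene(SAMPLE, "GENE2")
for hit in hits:
    print(hit["accession"], hit["name"], hit["sequence"])
```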
Transfer the SwissProt data and schema into Oracle XML DB over FTP
Access the XML schema with XML Spy, an XML editor that connects to the database over WebDAV
Register the XML Schema
Once the schema is registered, XML DB automatically generates the tables
Describe the Table Generated