Classification of microarray gene expression data using
support vector machines (SVM)
A presentation on the topic for the CIS 595 Bioinformatics course
by Despina Kontos
Spring 2003 – Temple University
Overview…
• What are microarray gene expression data?
• What are Support Vector Machines?
• How can we use them to analyze these gene expression data?
CLASSIFICATION EXPERIMENTS!!!
Microarrays…
• What are they anyway??
Gene expression levels measured in tissues or cells under varying environmental conditions
Microarrays…
• From a machine learning point of view…

                Genes
Experiment   g-1   g-2   ……   g-n
ex-1
ex-2
…….
…….
ex-m
Tissue classification
Function classification
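
A minimal sketch of how the same matrix serves both tasks, assuming rows are experiments and columns are genes as in the table above. All values and variable names are made up for illustration.

```python
# Hypothetical toy expression matrix: rows are experiments (ex-1..ex-3),
# columns are genes (g-1..g-4). The numbers are invented.
expression = [
    [0.2, 1.5, -0.7, 0.0],   # ex-1
    [0.1, 1.2, -0.9, 0.3],   # ex-2
    [-1.0, 0.4, 0.8, 0.5],   # ex-3
]

# Tissue classification: one sample per experiment (row),
# one feature per gene.
tissue_samples = expression

# Function classification: one sample per gene (column),
# one feature per experiment, i.e. the transposed matrix.
gene_samples = [list(column) for column in zip(*expression)]

print(len(tissue_samples), len(tissue_samples[0]))  # 3 samples, 4 features
print(len(gene_samples), len(gene_samples[0]))      # 4 samples, 3 features
```

The only difference between the two tasks in this sketch is which axis of the matrix is treated as the list of samples.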
Support Vector Machines (SVM)
• Linear classifiers
• Attempt to avoid overfitting by finding the optimal hyperplane that separates the data
HOW???
By maximizing the margin.
Support vectors: the training points that lie closest to the separating hyperplane
Introduced by V. Vapnik and co-workers in 1995
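
To make "maximizing the margin" concrete, here is a small sketch that compares two candidate hyperplanes on toy data and keeps the one whose closest point is farther away. The data, the two candidates, and the helper name geometric_margin are assumptions for illustration. A real SVM solves an optimization problem rather than comparing a fixed list of candidates.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance y * (w.x + b) / ||w|| over all points."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Toy, linearly separable data (invented).
points = [(1.0, 1.0), (2.0, 2.0), (4.0, 4.0), (5.0, 5.0)]
labels = [-1, -1, +1, +1]

# Two hypothetical separating hyperplanes, each given as (w, b).
candidates = {
    "A": ((1.0, 1.0), -6.0),   # x1 + x2 = 6, midway between the classes
    "B": ((1.0, 1.0), -5.0),   # x1 + x2 = 5, closer to the negative class
}

margins = {
    name: geometric_margin(w, b, points, labels)
    for name, (w, b) in candidates.items()
}
best = max(margins, key=margins.get)
print(margins)
print("larger margin:", best)
```

Both candidates classify every point correctly. Hyperplane A is preferred because its nearest points, (2, 2) and (4, 4), are farther from it; those nearest points play the role of the support vectors.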
Support Vector Machines (SVM)
• And what about datasets that are not linearly separable??
Map the data into a higher-dimensional space and perform linear classification there (theorem!!)
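
A toy sketch of the mapping idea, under assumed data: four 1-D points that no single threshold can separate become linearly separable after the hypothetical map phi(x) = (x, x^2). The map and the separating rule are chosen by hand for illustration.

```python
# Invented 1-D data: the outer points are positive, the inner ones negative.
xs = [-2.0, -1.0, 1.0, 2.0]
labels = [+1, -1, -1, +1]

def separable_by_threshold(values, labels):
    """True if one threshold puts all +1 on one side and all -1 on the other."""
    pos = [v for v, y in zip(values, labels) if y > 0]
    neg = [v for v, y in zip(values, labels) if y < 0]
    return max(pos) < min(neg) or max(neg) < min(pos)

def phi(x):
    """Hypothetical feature map from 1-D to 2-D."""
    return (x, x * x)

mapped = [phi(x) for x in xs]

# In the mapped space the linear rule 0*x + 1*x^2 - 2.5 > 0 separates the classes.
w, b = (0.0, 1.0), -2.5
predictions = [1 if w[0] * p[0] + w[1] * p[1] + b > 0 else -1 for p in mapped]

print(separable_by_threshold(xs, labels))  # False
print(predictions == labels)               # True
```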
Support Vector Machines (SVM)
We need ONLY the support vectors for computations!!
We can use KERNEL functions to avoid computations in the higher-dimensional space
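
A small check of the kernel idea, assuming the degree-2 polynomial kernel K(x, z) = (x . z)^2 on 2-D inputs (one common choice, not necessarily the kernel used in the experiments). The kernel value computed in the original space equals the dot product after the explicit map phi(x) = (x1^2, sqrt(2) x1 x2, x2^2), so the mapped vectors never need to be built.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map matching the degree-2 polynomial kernel in 2-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly2_kernel(x, z):
    """K(x, z) = (x . z)^2, computed entirely in the original 2-D space."""
    return dot(x, z) ** 2

x = (1.0, 2.0)
z = (3.0, -1.0)

explicit = dot(phi(x), phi(z))   # dot product in the 3-D mapped space
implicit = poly2_kernel(x, z)    # same value, no mapping needed

print(explicit, implicit)
```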
Some mathematical formulations…
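
As a sketch, the standard textbook SVM formulation is given below. The notation (training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, weights $w$, offset $b$, multipliers $\alpha_i$, kernel $K$) is an assumption and may differ from the notation used in the talk.

```latex
% Hard-margin primal: the margin is 2 / \|w\|, so maximizing it
% is equivalent to minimizing \|w\|^2.
\min_{w,\,b} \ \tfrac{1}{2}\,\|w\|^{2}
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, m

% Dual with a kernel K replacing the dot product:
\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i
 - \tfrac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m}
   \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad \text{subject to} \quad
\alpha_i \ge 0, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0

% Decision function: only points with \alpha_i > 0
% (the support vectors, SV) contribute.
f(x) = \operatorname{sign}\Big( \sum_{i \in SV} \alpha_i \, y_i \, K(x_i, x) + b \Big)
```

The last line is why only the support vectors are needed for computations, and the appearance of $K$ in place of a dot product is where the kernel function enters.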
Some experiments…
M.P.S. Brown, W.N. Grundy, D. Lin, N. Cristianini, C.W. Sugnet, T.S. Furey, M. Ares Jr. and D. Haussler, "Knowledge-based analysis of microarray gene expression data by using support vector machines", Proc. Natl. Acad. Sci. USA, 97(1), pp. 262-267, 2000.
Classification of gene function from microarray data using SVM
2,467 genes
79 DNA hybridization experiments
6 gene function families
SVM provided the best classification among the methods compared!!!
                Genes
Experiment   g-1   g-2   ……   g-n
ex-1
ex-2
…….
…….
ex-m
             F1    F2    F3   ...
Function Classification
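
One way to set up function classification with a binary classifier such as an SVM is to train one "member vs. non-member" problem per family. The sketch below only builds the labels; the gene names, family names F1..F3, and the helper one_vs_rest_labels are placeholders, not data from the paper.

```python
# Hypothetical gene -> family annotations (None means no listed family).
gene_family = {
    "g-1": "F1",
    "g-2": "F2",
    "g-3": "F3",
    "g-4": "F1",
    "g-5": None,
}

def one_vs_rest_labels(gene_family, family):
    """Label each gene +1 if it belongs to `family`, otherwise -1."""
    return {
        gene: (+1 if fam == family else -1)
        for gene, fam in gene_family.items()
    }

# One binary labeling per family; each would feed a separate binary classifier.
labels_per_family = {
    family: one_vs_rest_labels(gene_family, family)
    for family in ("F1", "F2", "F3")
}

print(labels_per_family["F1"])
```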
More experiments…
T. Furey, N. Cristianini, N. Duffy, D. Bednarski, M. Schummer and D. Haussler, "Support Vector Machine Classification and Validation of Cancer Tissue Samples Using Microarray Expression Data", Bioinformatics, 2000.
Gene expression data on tissue
97,802 DNA clones
31 tissue samples
                Genes
Experiment   g-1   g-2   ……   g-n     Class
ex-1                                  Cancer
ex-2                                  Not Cancer
…….                                   ...
…….                                   ...
ex-m                                  Cancer
Tissue types: cancer ovarian, normal ovarian, normal non-ovarian
Tissue Classification
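
With only 31 tissue samples, a common way to estimate accuracy is leave-one-out validation: train on 30 samples, test on the one held out, and repeat for every sample. Whether this matches the exact protocol of the paper is an assumption here. The sketch shows only the splitting logic, and the helper name is hypothetical.

```python
def leave_one_out_splits(n_samples):
    """Yield (train_indices, test_index) pairs, holding out one sample at a time."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, held_out

splits = list(leave_one_out_splits(31))

print(len(splits))        # 31 train/test rounds
print(len(splits[0][0]))  # 30 training samples in each round
```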
Conclusions
• Microarray gene expression data are a very useful form of biological information (..expensive to obtain!!)
• SVMs are a new and very promising classification approach
• A lot of research is still to be done on biological information processing using techniques developed in fields such as Machine Learning, Data Mining, etc.
Additional resources..
E. Osuna, R. Freund, and F. Girosi. Support vector machines: Training and applications. A.I. Memo, MIT A.I. Lab, 1996.
N. Cristianini. ICML'01 tutorial, 2001.
http://www.kernel-machines.org/
http://research.microsoft.com/users/jplatt/svm.html
http://www.isis.ecs.soton.ac.uk/resources/svminfo/
THANK YOU!!!!!