
Final Project: Transcription Factor DNA Binding Prediction


Transcription Factor-DNA binding prediction

Tahmina Ahmed, Prosunjit Biswas, Iffat Sharmin Chowdhury, Badri Sampath


Motivation

• Label unlabeled DNA sequences using a model built from labeled DNA sequences, and in doing so gain experience with real-world machine learning problems.


Approaches

• K-mer based
- Fixed-length K-mer
- K-mer with mismatches
- Using regular expressions

• PWM based
- MEME and MAST

• Combined model
- Unite both models
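As a rough illustration of the first two K-mer variants, the sketch below counts fixed-length k-mers and counts windows that match a k-mer within a given number of substitutions. The function names, the ACGT alphabet, and the use of Hamming distance to define a mismatch are assumptions made for this example; the slides do not say how the features were actually computed.

```python
from itertools import product


def kmer_counts(seq, k):
    """Count every overlapping window of length k in seq."""
    counts = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_count(seq, kmer, max_mismatches=1):
    """Count windows of seq within max_mismatches substitutions of kmer."""
    k = len(kmer)
    return sum(
        hamming(seq[i:i + k], kmer) <= max_mismatches
        for i in range(len(seq) - k + 1)
    )


def feature_vector(seq, k, alphabet="ACGT"):
    """One count per possible k-mer, in lexicographic order (assumed layout)."""
    counts = kmer_counts(seq, k)
    return [counts.get("".join(p), 0) for p in product(alphabet, repeat=k)]
```

A vector like the one returned by `feature_vector` could be written out as one row of attributes per sequence for a classifier.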


K-mer Approach Based on Regular Expression

Motivation

2-mers are the k-mers that occur most often in the sequences, so this approach concentrates on 2-mers.

Strategy

- For any two 2-mers X and Y, generate the regular expressions X(.*)Y and Y(.*)X.

- Use these regular expressions as candidate attributes.
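The strategy above can be sketched as follows. This is a minimal example, assuming an ACGT alphabet, pairs of distinct 2-mers, and a binary attribute that records whether a pattern matches anywhere in the sequence; the helper names are hypothetical and the slides do not specify these details.

```python
import re
from itertools import combinations, product

ALPHABET = "ACGT"  # assumed DNA alphabet


def two_mers():
    """All 16 possible 2-mers over the alphabet."""
    return ["".join(p) for p in product(ALPHABET, repeat=2)]


def candidate_patterns():
    """For each pair of distinct 2-mers X, Y emit X(.*)Y and Y(.*)X."""
    patterns = []
    for x, y in combinations(two_mers(), 2):
        patterns.append(f"{x}(.*){y}")
        patterns.append(f"{y}(.*){x}")
    return patterns


def regex_features(seq, patterns):
    """Binary attribute per pattern: 1 if it matches somewhere in seq."""
    return [1 if re.search(p, seq) else 0 for p in patterns]
```

With 16 possible 2-mers, distinct pairs give 240 candidate patterns, which is why some attribute selection step is likely needed before training.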


Classifier Selection

Fig : 9 classifiers applied to the TF data sets

Algorithms are numbered as follows -

(1) Logistic (2) SMO (3) NaiveBayes (4) BayesianLogisticRegression (5) Kstar (6) Bagging (7) LogitBoost (8) RandomForest (9) J48

Summary -

* 9 classifiers were applied to 10 data sets; 3 of them are shown

* choosing a single best classifier is not a trivial task

* the same classifier behaves differently on different data sets


Change in Accuracy due to Different Classifiers

Fig : The performance of different types of classifiers (Logistic, J48, RandomForest, NaiveBayes) on the TF_3 data set

Fig : The performance of different types of classifiers (Logistic, J48, RandomForest, NaiveBayes) on the TF_5 data set

Summary -

* the choice of classifier has a large effect on accuracy

* classifiers have to be chosen carefully


Change in Accuracy due to Different K-mer Length

Fig : The performance of different K-mer lengths (4-mer, 5-mer, 6-mer) on the TF_3 data set

Summary -

* K-mer length also affects accuracy

* finding the single best length is not trivial


Attribute Space Selection

Fig : The performance for different numbers of selected k-mers on the TF_4 data set

Summary -

* the number of attributes considered also affects accuracy

* accuracy increases as more attributes are considered, but beyond a saturation point it decreases


PWM based Analysis on Accuracy (TF_1 data set)

Fig : J48, minW 6 - maxW 15, no. of sites 10

Fig : J48, minW 6 - maxW 15, no. of motifs 5

Summary -

* accuracy increases with more motifs when the no. of sites is fixed

* accuracy increases with more sites when the no. of motifs is fixed

* what happens when we increase both?


PWM based Analysis

Fig : Accuracy as the no. of motifs and no. of sites vary

* 1st bar: no. of sites

* 2nd bar: no. of motifs

* 3rd bar: accuracy

* accuracy decreases when we increase both the no. of motifs and the no. of sites.
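For readers unfamiliar with position weight matrices, the sketch below shows the general idea behind PWM-based features: build a log-odds matrix from aligned binding sites and use the best window score of a sequence as an attribute. It is not the MEME or MAST algorithm; the pseudocount, the uniform background of 0.25, and the function names are assumptions chosen for illustration.

```python
import math

ALPHABET = "ACGT"  # assumed DNA alphabet


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM from equal-length aligned sites (assumed uniform background)."""
    width = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pwm


def best_score(seq, pwm):
    """Highest log-odds score over all windows of seq.

    Assumes seq contains only A, C, G, T and is at least as long as the motif.
    """
    width = len(pwm)
    return max(
        sum(pwm[j][seq[i + j]] for j in range(width))
        for i in range(len(seq) - width + 1)
    )
```

One score per motif gives one attribute per motif, so increasing the number of motifs directly increases the size of the attribute space.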


Extra Work for TF_20

Fig : Flow diagram of building the new model for TF_20

Summary -

* we have done some extra work for TF_20

Flow diagram boxes: K-mer + PWM; Sequences identified differently; Sequences identified by both models; Biased 2-mer Model; Newly Labeled Sequences; The New Model for TF_20
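The arrows of the flow diagram are not recoverable from the text, so the following is only one possible reading of its first step: compare the labels given by the K-mer and PWM models and separate the sequences on which they agree from those on which they differ. The function name and the dictionary-based representation are hypothetical.

```python
def split_by_agreement(kmer_labels, pwm_labels):
    """Partition sequence ids by whether the two models give the same label.

    Both arguments map a sequence id to a predicted label (assumed layout).
    Returns (agreed, disputed): a dict of id -> shared label, and a list of
    ids on which the models differ or the PWM model gave no label.
    """
    agreed, disputed = {}, []
    for seq_id, label in kmer_labels.items():
        if seq_id in pwm_labels and pwm_labels[seq_id] == label:
            agreed[seq_id] = label
        else:
            disputed.append(seq_id)
    return agreed, disputed
```

Under this reading, the agreed sequences could serve as newly labeled data, while the disputed ones would need a further decision step such as the biased 2-mer model named in the diagram.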


AUC based on the Feedback (bonus model)

Fig : AUC of 10 data sets based on last submission

* accuracy improved over the first submission

* PWM did not give good results
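For reference, AUC can be computed directly from labels and prediction scores as the fraction of positive-negative pairs that are ranked correctly, with ties counted as half. The sketch below is a generic implementation of that definition, assuming labels of 1 and 0; it is not the evaluation code used for the submissions.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 1 (positive) or 0 (negative); scores: matching floats.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of sequences, which is acceptable for small data sets; a rank-based formula would be preferable for large ones.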


Participation

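Badri Sampath
- Background Study: DNA, RNA, protein, motif
- Working with Tools: AlignAce, MEME, MAST
- Working with Models: PWM
- Parameter Tuning: K-mer
- Automation: Arff Writer, MAST output writer

Iffat Sharmin Chowdhury
- Background Study: Protein, Motif, Transcription
- Working with Tools: Weka, AlignAce, ScanAce
- Working with Models: K-mer
- Parameter Tuning: PWM
- Automation: Script for FASTA, Weka

Prosunjit Biswas
- Background Study: DNA, Transcription, K-mer
- Working with Tools: MEME, MAST
- Working with Models: K-mer
- Parameter Tuning: PWM
- Automation: Script for RE, for new model

Tahmina Ahmed
- Background Study: MEME, MAST, PWM
- Working with Tools: MEME, MAST, Weka
- Working with Models: PWM
- Parameter Tuning: K-mer
- Automation: Script for MEME, MAST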


Acknowledgment


Questions?