06/27/22 Changhui (Charles) Yan 1 Gene Finding Changhui (Charles) Yan

6/10/2015 Changhui (Charles) Yan 1 Gene Finding Changhui (Charles) Yan

Gene Finding

Changhui (Charles) Yan

Gene Finding

Genomes of many organisms have been sequenced

Genome

Completely Sequenced Genomes

Gene Finding

More than 60 eukaryotic genome sequencing projects are underway

Human Genome Project (HGP)

- To determine the sequences of the 3 billion bases that make up human DNA
  - 99% of the human DNA sequence finished to 99.99% accuracy (April 2003)
- To identify the approximately 100,000 genes in human DNA (the estimate had been revised to 20,000-25,000 by Oct 2004)
  - 15,000 full-length human genes identified (March 2003)
- To store this information in databases
- To develop tools for data analysis

Gene Finding

Genomes of many organisms have been sequenced

We need to decipher the raw sequences: Where are the genes? What do they encode? How are the genes regulated?

Gene Finding

- Homology-based methods, also called 'extrinsic methods'
  - It seems that only approximately half of the genes can be found by homology to other known genes (although this percentage is increasing as more genomes are sequenced).
- Gene prediction methods, also called 'intrinsic methods' (http://www.nslij-genetics.org/gene/)

Machine Learning Approach
- Split the data into a training set and a test set
- Use the training set to train a classifier
- Test the classifier on the test set
- The classifier can then be applied to novel data

[Figure: Training data → Machine learning algorithm → Classifier; Test data → Evaluation of classifier; Novel data → Prediction]
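
The four steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the classifier used in these slides: the features (base composition), the nearest-centroid classifier, the helper names (`base_composition`, `split_data`, `train_nearest_centroid`, `accuracy`), and the toy sequences are all hypothetical choices made for the example.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, int]  # (DNA sequence, class label)

def base_composition(seq: str) -> List[float]:
    """Hypothetical feature vector: fraction of a, c, g, t in the sequence."""
    seq = seq.lower()
    n = len(seq) or 1
    return [seq.count(base) / n for base in "acgt"]

def split_data(data: Sequence[Example], test_fraction: float = 0.3,
               seed: int = 0) -> Tuple[List[Example], List[Example]]:
    """Shuffle, then hold out test_fraction of the examples as the test set."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train_nearest_centroid(train: Sequence[Example]) -> Callable[[str], int]:
    """Learn the mean feature vector (centroid) of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for seq, label in train:
        feats = base_composition(seq)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

    def classify(seq: str) -> int:
        # Predict the class whose centroid is nearest (squared Euclidean distance).
        feats = base_composition(seq)
        return min(centroids, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(feats, centroids[lab])))

    return classify

def accuracy(classify: Callable[[str], int], test: Sequence[Example]) -> float:
    """Fraction of test examples whose predicted label matches the true label."""
    return sum(classify(seq) == label for seq, label in test) / len(test)

# Toy data, made up for illustration: GC-rich sequences labelled 1, AT-rich labelled 0.
data = [("gcgcgcatgc", 1), ("gcggcgcgca", 1), ("ccgcgcggcg", 1), ("gcgcatgcgc", 1),
        ("atatatattt", 0), ("ttaatataat", 0), ("aatattatta", 0), ("tatatgatat", 0)]
train, test = split_data(data)            # 1. split into training and test sets
classify = train_nearest_centroid(train)  # 2. train a classifier on the training set
print(accuracy(classify, test))           # 3. evaluate it on the held-out test set
print(classify("gcgcgcgcgc"))             # 4. apply it to novel data
```

With this toy data the held-out accuracy is 1.0 because the two classes differ sharply in GC content; real coding and non-coding sequences are not separated this easily.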

Data, examples, classes, classifier

ccgctttttgccagcataacggtgtcga, 1
accacgttttttgccagcatttgccagca, 0
atcatcacgatcacgaacatcaccacg, 0
…
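
Before a learning algorithm can use examples like these, each sequence has to be turned into features. A minimal sketch, assuming the records are plain `sequence, label` text and using overlapping k-mer counts (k = 3 here) as one possible, hypothetical feature set; `parse_examples` and `kmer_counts` are illustrative names, not part of the original material.

```python
from itertools import product
from typing import Dict, List, Tuple

def parse_examples(lines: List[str]) -> List[Tuple[str, int]]:
    """Parse 'sequence, label' records into (sequence, class) pairs."""
    examples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        seq, label = line.split(",")
        examples.append((seq.strip().lower(), int(label)))
    return examples

def kmer_counts(seq: str, k: int = 3) -> Dict[str, int]:
    """Count every overlapping k-mer over the alphabet {a, c, g, t}."""
    counts = {"".join(p): 0 for p in product("acgt", repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing ambiguous bases such as 'n'
            counts[kmer] += 1
    return counts

records = [
    "ccgctttttgccagcataacggtgtcga, 1",
    "accacgttttttgccagcatttgccagca, 0",
    "atcatcacgatcacgaacatcaccacg, 0",
]
examples = parse_examples(records)
features = [kmer_counts(seq) for seq, _ in examples]
```

Each sequence becomes a 64-dimensional vector (one entry per possible trinucleotide), so sequences of different lengths can be handled by the same classifier.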

N-fold cross-validation

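In N-fold cross-validation the data are divided into N parts. In each of N rounds one part serves as the test set and the remaining N-1 parts form the training set, so every example is tested exactly once.

[Figure: 3-fold cross-validation on the E. coli K12 genome (4,639,675 bp). Rounds 1, 2, and 3 each use a different part of the genome as the test set and the rest as the training set.]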
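
The round structure of N-fold cross-validation can be sketched as follows. This assumes, as the figure appears to show, that each fold is a contiguous block (for a genome, a contiguous region); `fold_bounds` and `cross_validation_splits` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def fold_bounds(length: int, n: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) test region for each of the n rounds."""
    if not 2 <= n <= length:
        raise ValueError("need 2 <= n <= length")
    cuts = [length * i // n for i in range(n + 1)]
    return [(cuts[i], cuts[i + 1]) for i in range(n)]

def cross_validation_splits(data: Sequence[T], n: int) -> List[Tuple[List[T], List[T]]]:
    """One (training set, test set) pair per round; each item is tested exactly once."""
    splits = []
    for start, end in fold_bounds(len(data), n):
        test = list(data[start:end])
        train = list(data[:start]) + list(data[end:])
        splits.append((train, test))
    return splits

# Test regions for 3-fold cross-validation over a 4,639,675 bp genome.
for round_no, (start, end) in enumerate(fold_bounds(4_639_675, 3), start=1):
    print(f"Round {round_no}: test on [{start:,}, {end:,}), train on the rest")
```

Contiguous folds keep neighboring, possibly overlapping windows from the same region out of both the training and the test set at once; shuffling before splitting is the usual alternative when examples are independent.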

Machine Learning Approach

[Figure: Training data → Machine learning algorithm → Classifier; Test data → Evaluation of classifier; Novel data → Prediction]
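
For the "Evaluation of classifier" step, a common starting point is the confusion matrix of a binary classifier. The sketch below reflects an assumption about which metrics to report (treating label 1 as the positive class); `evaluate` is a hypothetical helper name. Note that some gene-prediction papers use "specificity" for TP/(TP+FP), which is called precision here.

```python
from typing import Dict, Sequence

def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for binary labels, with 1 as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (zero denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # fraction of true positives recovered
        "specificity": ratio(tn, tn + fp),  # fraction of true negatives recovered
        "precision": ratio(tp, tp + fp),    # fraction of positive calls that are correct
    }

metrics = evaluate([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1, 1, 1])
print(metrics)
```

For the example call this gives accuracy 0.625, sensitivity 0.75, specificity 0.5 and precision 0.6.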

Gene-finders

Prokaryotes vs. Eukaryotes

Prokaryotes are organisms without a cell nucleus. Most prokaryotes are bacteria. Prokaryotes can be divided into Bacteria and Archaea (archaebacteria).

Eukaryotes are organisms whose cells have a membrane-bound nucleus.

Prokaryotes vs. Eukaryotes

Prokaryotic genomes are relatively simple: they consist of coding regions (genes) and non-coding regions.

Eukaryotic genomes are more complicated (for example, eukaryotic genes are often split into exons and introns).
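
Because a prokaryotic genome is, to a first approximation, coding regions interspersed with non-coding regions, the most naive intrinsic approach is to scan for open reading frames (ORFs): an ATG start codon followed in the same frame by a stop codon. The sketch below is only that naive baseline; practical gene finders typically add statistical models of coding sequence to score candidates. It assumes the standard stop codons TAA, TAG and TGA, scans the forward strand only (the reverse complement would need a second pass), and uses an arbitrary minimum length; `find_orfs` and `min_codons` are hypothetical names.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

def find_orfs(dna: str, min_codons: int = 100) -> List[Tuple[int, int, int]]:
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and half-open; `end` includes the stop codon.
    An ORF is kept if it has at least `min_codons` codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG in the current open frame
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

toy = "CCATGAAATTTGGGTAACC"
print(find_orfs(toy, min_codons=3))  # [(2, 17, 2)]
```

Random sequence contains many short spurious ORFs, which is why a minimum length is imposed; 100 codons is an illustrative default, not a value taken from the slides.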

Eukaryotic genes