Why Manual Genome Annotation?

Even the best gene predictors and genome annotation pipelines rarely exceed accuracies of 80% at the exon level, meaning that most gene annotations contain at least one mis-annotated exon. (Yandell and Ence, 2012, Nature Reviews Genetics)

Automated annotation is often not good enough for genes you really care about!


Yandell and Ence, 2012, Nature Reviews Genetics
http://www.yandell-lab.org/publications/pdf/euk_genome_annotation_review.pdf


Different lines of evidence go into modern gene annotation pipelines:

1. Computational prediction (Open Reading Frames, etc.)
2. Evidence-based prediction (ESTs, RNA-seq, etc.)
3. Homology-based prediction (BLAST, etc.)

Synthesized into a consensus gene annotation, which still may be wrong!


Bees (Order Hymenoptera, Family Apidae)

Western Honey Bee (Apis mellifera)

Common Eastern Bumble Bee (Bombus impatiens)

Buff-Tailed Bumble Bee (Bombus terrestris)

Dwarf Asian Honey Bee (Apis florea)


Cytochrome P450 monooxygenase enzymes

NADPH + H+ + O2 + R-H → NADP+ + H2O + R-OH

Classification (example: CYP3A4*15A-B):

CYP       cytochrome P450
3         family (> 40% amino acid sequence homology)
A         sub-family (> 55% amino acid sequence homology)
4         isoenzyme
*15 A-B   allele
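The naming scheme above can be taken apart mechanically. The following is a minimal sketch, assuming names follow the simple CYP + family number + sub-family letters + isoenzyme number pattern with an optional `*allele` suffix; the `CypName` type and `parse_cyp_name` function are hypothetical helpers, not part of any standard tool.

```python
import re
from typing import NamedTuple, Optional


class CypName(NamedTuple):
    """Parts of a P450 name (hypothetical helper type for illustration)."""
    family: int
    subfamily: str
    isoenzyme: int
    allele: Optional[str]


# CYP, family digits, sub-family letters, isoenzyme digits, optional *allele
_CYP_PATTERN = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)(?:\*(\S+))?$")


def parse_cyp_name(name: str) -> CypName:
    """Split a name such as 'CYP3A4*15A-B' or 'CYP6AS3' into its parts."""
    compact = name.replace(" ", "").upper()
    match = _CYP_PATTERN.match(compact)
    if match is None:
        raise ValueError(f"not a recognisable CYP name: {name!r}")
    family, subfamily, isoenzyme, allele = match.groups()
    return CypName(int(family), subfamily, int(isoenzyme), allele)


if __name__ == "__main__":
    for example in ("CYP 3 A 4 *15 A-B", "CYP6AS3", "CYP4G11"):
        print(example, "->", parse_cyp_name(example))
```

Names that do not fit this simple pattern raise a `ValueError` rather than being guessed at.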


Chemical signalling??? (pheromone synthesis and breakdown)

Detoxication (toxin and pesticide metabolism)

Hormone synthesis (highly conserved orthologs) + Detoxication


Organism                  P450s   Food / environment
Nasonia vitripennis       92      fly pupae
Apis mellifera            46      nectar and pollen / homeostatic nest
Anopheles gambiae         106     blood and detritus / standing water
Drosophila melanogaster   85      rotting fruit
Tribolium castaneum       131     seeds


Organism                  P450s   Mito   CYP2   CYP3   CYP4
Drosophila melanogaster   85      11     6      36     32
Apis mellifera            46      6      8      28     4
Nasonia vitripennis       87      6      7      45     29
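As a quick sanity check on the table above, the four clan counts in each row should add up to the P450 total. This small sketch uses only the numbers from the slide; the dictionary layout and the function name `clan_fractions` are illustrative choices.

```python
# Counts copied from the slide: total P450s and per-clan counts.
P450_COUNTS = {
    "Drosophila melanogaster": {"total": 85, "Mito": 11, "CYP2": 6, "CYP3": 36, "CYP4": 32},
    "Apis mellifera": {"total": 46, "Mito": 6, "CYP2": 8, "CYP3": 28, "CYP4": 4},
    "Nasonia vitripennis": {"total": 87, "Mito": 6, "CYP2": 7, "CYP3": 45, "CYP4": 29},
}

CLANS = ("Mito", "CYP2", "CYP3", "CYP4")


def clan_fractions(row: dict) -> dict:
    """Return each clan's share of the organism's P450s."""
    return {clan: row[clan] / row["total"] for clan in CLANS}


if __name__ == "__main__":
    for organism, row in P450_COUNTS.items():
        clan_sum = sum(row[clan] for clan in CLANS)
        shares = ", ".join(f"{c} {f:.0%}" for c, f in clan_fractions(row).items())
        print(f"{organism}: clans sum to {clan_sum} of {row['total']} ({shares})")
```

The shares make the honey bee pattern easy to see: few P450s overall and a very small CYP4 clan.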


Repeats


Intron splice sites are highly conserved


P450s: ~500 amino acids (1500 nucleotides)

Highly conserved heme-binding site (cysteine)
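One way to look for the heme-binding cysteine in a candidate protein is a pattern search. The sketch below assumes the commonly cited simplified consensus F-x-x-G-x-R-x-C-x-G, which is not stated on the slide and is looser in real proteins; the function name and the toy peptide are made up for illustration.

```python
import re

# Assumed simplified consensus around the heme-binding cysteine:
# F-x-x-G-x-R-x-C-x-G, where x is any residue. Real P450s vary.
HEME_MOTIF = re.compile(r"F..G.R.C.G")


def heme_cysteine_positions(protein: str) -> list:
    """Return 1-based residue numbers of the cysteine in each motif hit."""
    protein = protein.upper()
    # The cysteine is the 8th residue of the 10-residue motif.
    return [hit.start() + 8 for hit in HEME_MOTIF.finditer(protein)]


if __name__ == "__main__":
    toy_peptide = "MKTAAAFGAGPRNCIGQ"  # invented sequence, not a real P450
    print(heme_cysteine_positions(toy_peptide))
```

A gene model of roughly 500 amino acids with no hit near its C-terminal end is worth a second look, although a miss with this simplified pattern does not prove the model is wrong.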


Basic Annotation Rules

            Amino acid   Nucleotide
CDS Start   M            ATG
CDS Stop    *            TAA / TAG / TGA

Translation Frames

Frame 1
Frame 2
Frame 3
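The rules above (start at ATG, stop at TAA/TAG/TGA, three forward frames) can be sketched in a few lines. This is a minimal illustration using the standard genetic code; the function names and the toy sequence are assumptions, and only the forward strand is handled.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 1) -> str:
    """Translate one forward frame (1, 2 or 3); unknown codons become 'X'."""
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    dna = dna.upper()
    start = frame - 1
    codons = (dna[i:i + 3] for i in range(start, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(dna: str) -> dict:
    """Return the translation of all three forward frames."""
    return {frame: translate(dna, frame) for frame in (1, 2, 3)}


if __name__ == "__main__":
    toy = "ATGGAATAA"  # invented: ATG (M), GAA (E), TAA (stop)
    for frame, protein in three_frame_translation(toy).items():
        print(f"Frame {frame}: {protein}")
```

A good CDS start shows an M in the frame that stays open up to the first `*`.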


http://en.wikipedia.org/wiki/File:Exon_and_Intron_classes.png

http://doc.goldenhelix.com/SVS/latest/_images/splice_site_diagram.png

Intron splice sites

GT-AG
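The GT-AG rule suggests a simple scan for candidate introns: every GT is a possible donor and every downstream AG a possible acceptor. The sketch below is deliberately naive; the function name, the toy sequence, and the `min_len` cut-off are assumptions, and real candidates still need to be checked against reading frame and evidence tracks.

```python
def candidate_introns(dna: str, min_len: int = 10) -> list:
    """List (start, end) slices that begin with GT and end with AG.

    Coordinates are 0-based with an exclusive end, so dna[start:end]
    is the candidate intron. min_len is an arbitrary illustrative cut-off.
    """
    dna = dna.upper()
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    acceptor_ends = [i + 2 for i in range(len(dna) - 1) if dna[i:i + 2] == "AG"]
    return [
        (donor, end)
        for donor in donors
        for end in acceptor_ends
        if end - donor >= min_len
    ]


if __name__ == "__main__":
    # Invented toy sequence: exon "ATGG", intron "GTAAGTTTTCAG", exon "AACTG"
    toy = "ATGG" + "GTAAGTTTTCAG" + "AACTG"
    for start, end in candidate_introns(toy):
        print(start, end, toy[start:end])
```

On real genomic sequence this returns many false candidates, which is why the splice sites are confirmed by eye against the translation.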


Find: "(\w)"

Replace: "\1 "
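The same find-and-replace can be run in Python. This sketch assumes the purpose is to put a space after every letter of a sequence so that it is easier to line up by eye; the function name and the toy peptide are illustrative.

```python
import re


def space_out(sequence: str) -> str:
    """Insert a space after every word character, like Find (\\w) / Replace \\1<space>."""
    return re.sub(r"(\w)", r"\1 ", sequence)


if __name__ == "__main__":
    print(repr(space_out("MEK")))
```

Note that the result ends with a trailing space, exactly as the editor replacement would.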


‘GT’ intron donor site


‘AG’ intron acceptor site


‘GT’ intron donor site

1 nucleotide “G” for next codon = Phase 1 intron


‘AG’ intron acceptor site

2 nucleotides “AA” before the first full codon

Combine with the “G” left over before the donor site

Make the codon “GAA” for glutamic acid (E)
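The split-codon bookkeeping can be checked by joining the exons and translating. The sketch below uses invented toy exons that mimic the situation on the slides (a lone "G" before the donor site, "AA" after the acceptor site); the function names are hypothetical and the first exon is assumed to start at the ATG.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def intron_phase(upstream_coding_length: int) -> int:
    """Phase = number of coding nucleotides left over before the intron (0, 1 or 2)."""
    return upstream_coding_length % 3


def splice_and_translate(exons: list) -> str:
    """Join coding exons in order and translate from the first base."""
    cds = "".join(exons).upper()
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )


if __name__ == "__main__":
    exon_1 = "ATGG"      # invented: ATG plus one leftover "G"
    exon_2 = "AACTGTAA"  # invented: "AA" completes GAA, then CTG, TAA
    print("phase:", intron_phase(len(exon_1)))
    print("protein:", splice_and_translate([exon_1, exon_2]))
```

If the chosen donor and acceptor sites are right, the joined sequence stays in frame and the split codon reads as a sensible amino acid.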


This start looks good!


Jamboree!

Search for paralogs using one of these genes from Apis mellifera in the protein database on GenBank (e.g. CYP9R1 AND Apis mellifera):

CYP9R1
CYP6AS3
CYP6BD1
CYP6AQ1
CYP4G11

Use BLASTP to find predicted paralogs in the NCBI “nr” database. Select one of the following bees for the Organism:

Apis florea
Bombus impatiens
Bombus terrestris
Megachile rotundata

Copy and paste verified amino acid sequences (FASTA formatted) into a text file:


Add comments to the header and include a gene identifier
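For anyone who prefers to build the file with a script, a FASTA record with a gene identifier and a comment in the header could be written like this. It is only a sketch: the function name, the example identifier, the comment text, and the 60-character line width are all assumptions, not a required format.

```python
def format_fasta_record(gene_id: str, comment: str, sequence: str, width: int = 60) -> str:
    """Build one FASTA record: '>gene_id comment' followed by wrapped sequence lines."""
    if not gene_id or any(ch.isspace() for ch in gene_id):
        raise ValueError("gene_id must be non-empty and contain no whitespace")
    cleaned = "".join(sequence.split()).upper()  # drop spaces and line breaks
    header = f">{gene_id} {comment}".rstrip()
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return "\n".join([header, *lines]) + "\n"


if __name__ == "__main__":
    record = format_fasta_record(
        gene_id="CYP9R1_like",  # hypothetical identifier
        comment="Bombus impatiens; start and splice sites checked by hand",
        sequence="MKT AAA\nFGAGPRNCIGQ",  # invented toy peptide
    )
    print(record, end="")
```

Keeping the identifier as the first word of the header means most tools will pick it up as the sequence name.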

Send to me at: [email protected]

Thanks!!