Genome Annotation: A Protein-centric Perspective




Protein data contributing to genome annotation

● Gene structure prediction
● Gene function prediction


UniProt

● Collaboration between EBI, SIB and PIR
● Funded mainly by NIH
● Based on the original work on PIR, Swiss-Prot and TrEMBL


UniProt Goals

● High level of annotation
● Minimal redundancy
● High level of integration with other databases
● Complete and up-to-date


UniProt Non-Redundancy Concepts

UniProt Archive (UniParc): All sequences that are 100% identical over their entire length are merged into a single entry, regardless of species. UniParc represents each protein sequence once and only once, assigning it a unique identifier. UniParc cross-references the accession numbers of the source databases.

UniProt Knowledgebase: Aims to describe in a single record all protein products derived from a certain gene (or genes, if the translation from different genes in a genome leads to indistinguishable proteins) from a certain species.

UniProt Nref: Merges sequences automatically across different species.
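The UniParc rule above (one entry per 100% identical sequence, with cross-references back to the source records) can be illustrated with a minimal sketch. This is not UniParc's actual implementation: the `ARCH000001`-style identifiers, the record layout, and the accessions `ACC_A`, `ACC_B`, `ACC_C` are all hypothetical.

```python
def build_archive(records):
    """Merge source records that share an identical full-length sequence.

    records: iterable of (source_db, accession, sequence) tuples.
    Returns one entry per distinct sequence, in first-seen order.
    Species is deliberately ignored, as in the rule described above.
    """
    index = {}  # sequence -> archive entry
    for source_db, accession, sequence in records:
        key = sequence.upper()
        entry = index.get(key)
        if entry is None:
            # Hypothetical identifier scheme, for illustration only.
            entry = {
                "id": f"ARCH{len(index) + 1:06d}",
                "sequence": key,
                "xrefs": [],
            }
            index[key] = entry
        # Every source record is kept as a cross-reference.
        entry["xrefs"].append((source_db, accession))
    return list(index.values())


# Hypothetical source records: two identical sequences and one longer one.
records = [
    ("Swiss-Prot", "ACC_A", "MKTAYIAK"),
    ("TrEMBL", "ACC_B", "MKTAYIAK"),
    ("RefSeq", "ACC_C", "MKTAYIAKQ"),
]
archive = build_archive(records)
for entry in archive:
    print(entry["id"], entry["sequence"], entry["xrefs"])
```

Note that only exact, full-length identity triggers a merge here; a sequence that differs by one residue gets its own entry.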


UniParc 2.2, July 2004

3,913,916 unique sequences from 10,422,131 source records

Source databases are DDBJ/EMBL/GenBank, UniProt/Swiss-Prot, UniProt/TrEMBL, PIR-PSD, Ensembl, International Protein Index (IPI), PDB, RefSeq, FlyBase, WormBase, H-Inv, TROME, European Patent Office, United States Patent and Trademark Office and Japan Patent Office


UniProt Protein DAS Reference Server

● Aristotle – Data Source for the Reference Server
● Creating a Plugin for Thomas Down's DAZZLE Servlet


DAS Infrastructure - Overview

[Diagram. Components shown: EBI, UniProt, InterPro, Aristotle, one ProteinDAS Reference Server and two ProteinDAS Annotation Servers. Labels: "Download every 2 weeks" and "Reference and Annotation from the EBI".]

DAS Client – Connects to the Reference Server and zero or more Annotation Servers.

● Merge duplicate features?
● Resolve version differences?


Creating a Plugin for DAZZLE / Aristotle

Involved linking the Aristotle Java API to the BioJava and DAZZLE Java APIs

Issues with enabling a useful entry_points command

Creation of an 'artificial' hierarchy of entry points, based upon sequence length
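One way to picture such an artificial hierarchy is to group proteins into length bins, so that a client browses a handful of bins rather than one flat list of every sequence. The slides do not describe the scheme actually used; the bin width of 100 residues, the `len_…` naming, and the accessions below are assumptions made only for this sketch.

```python
def length_bin(length, width=100):
    """Return the name of the length bin a sequence falls into.

    The 'len_<start>-<end>' naming is hypothetical.
    """
    start = (length // width) * width
    return f"len_{start}-{start + width - 1}"


def build_hierarchy(seq_lengths, width=100):
    """Group accessions under top-level entry points defined by length.

    seq_lengths: mapping of accession -> sequence length.
    Returns a mapping of bin name -> sorted list of accessions.
    """
    hierarchy = {}
    for accession, length in seq_lengths.items():
        hierarchy.setdefault(length_bin(length, width), []).append(accession)
    return {name: sorted(members) for name, members in hierarchy.items()}


# Hypothetical accessions and lengths.
hierarchy = build_hierarchy({"ACC_A": 50, "ACC_B": 250, "ACC_C": 299})
for name, members in hierarchy.items():
    print(name, members)
```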


Possible Approach to Implementing Local Annotation Servers

Use GFF Format as a simple and accessible primary data source

Problem with this: GFF files are not suitable for very large numbers of records, so...

Load the GFF data into a relational database (sticking with SQL-92 to stay as cross-platform as possible)

Use a standard plugin that will allow the 'GFF' data to be read from the relational database.

From the point of view of the data curators, this process should be transparent, i.e. they should be able to work with GFF files and not need to worry about the database structure.
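The approach above can be sketched as a small loader that reads tab-separated GFF lines into a single table and answers per-sequence queries. This is a minimal illustration, not the plugin from the presentation: the table and column names are hypothetical, SQLite stands in for whichever relational database is used, and the column types are restricted to plain `VARCHAR`, `CHAR` and `INTEGER` in the spirit of SQL-92.

```python
import sqlite3

GFF_COLUMNS = 9  # seqname, source, feature, start, end, score, strand, frame, attributes


def parse_gff_line(line):
    """Return a 9-tuple for a GFF record, or None for comments and blank lines."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 tab-separated fields: {line!r}")
    fields += [""] * (GFF_COLUMNS - len(fields))  # the attributes column is optional
    seqname, source, feature, start, end, score, strand, frame, attributes = fields[:9]
    return (seqname, source, feature, int(start), int(end),
            score, strand, frame, attributes)


def load_gff(conn, lines):
    """Create the (hypothetical) feature table and load GFF lines into it."""
    conn.execute(
        """CREATE TABLE gff_feature (
               seqname       VARCHAR(64) NOT NULL,
               source        VARCHAR(64),
               feature       VARCHAR(64),
               feature_start INTEGER NOT NULL,
               feature_end   INTEGER NOT NULL,
               score         VARCHAR(16),
               strand        CHAR(1),
               frame         CHAR(1),
               attributes    VARCHAR(255)
           )"""
    )
    rows = [row for row in map(parse_gff_line, lines) if row is not None]
    conn.executemany("INSERT INTO gff_feature VALUES (?,?,?,?,?,?,?,?,?)", rows)
    conn.commit()
    return len(rows)


def features_for(conn, seqname):
    """Return (feature, start, end) rows for one sequence, ordered by start."""
    cursor = conn.execute(
        "SELECT feature, feature_start, feature_end FROM gff_feature "
        "WHERE seqname = ? ORDER BY feature_start",
        (seqname,),
    )
    return cursor.fetchall()


# Hypothetical curator-supplied GFF content.
example_lines = [
    "##gff-version 2\n",
    "SEQ1\tcurator\tdomain\t10\t50\t.\t.\t.\tNote example\n",
    "SEQ1\tcurator\tsite\t5\t5\t.\t.\t.\t\n",
]
conn = sqlite3.connect(":memory:")
loaded = load_gff(conn, example_lines)
print(loaded, features_for(conn, "SEQ1"))
```

Curators only ever touch the GFF lines; the table layout stays an internal detail of the loader, which is the transparency the slide asks for.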


UniProt Protein DAS Server

External page: http://www.ebi.ac.uk/uniprot-das/

DAS server package download page: http://www.ebi.ac.uk/uniprot-das/gffDasApp.html

The UniProt DAS Server itself: http://www.ebi.ac.uk/das-srv/uniprot/das
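As a rough sketch of how a client addresses a DAS server, the helper below builds request URLs in the DAS 1.x style of `<root>/<data source>/<command>?segment=<id>:<start>,<stop>`. Only the server root comes from the slide; the data source name `example_dsn` and the accession `P00000` are hypothetical placeholders, and the sketch makes no network request, so it does not depend on the server still being available.

```python
from urllib.parse import urlencode

# Server root as given on the slide.
DAS_ROOT = "http://www.ebi.ac.uk/das-srv/uniprot/das"


def das_url(dsn, command, segment=None, start=None, stop=None):
    """Build a DAS-style request URL.

    dsn:     data source name on the server (hypothetical in this sketch).
    command: a DAS command such as 'entry_points', 'sequence' or 'features'.
    segment: optional reference sequence identifier, with an optional range.
    """
    url = f"{DAS_ROOT}/{dsn}/{command}"
    if segment is not None:
        if start is not None and stop is not None:
            segment = f"{segment}:{start},{stop}"
        # Keep ':' and ',' literal so the segment stays readable.
        url += "?" + urlencode({"segment": segment}, safe=":,")
    return url


print(das_url("example_dsn", "entry_points"))
print(das_url("example_dsn", "features", "P00000", 1, 100))
```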