Fortaleza 31.VII.2006 UniProtKB: Questions and answers UniProtKB/Swiss-Prot: Questions, Answers and a few Tips

Everything you always wanted to know about UniProtKB/Swiss-Prot…

and others were not afraid to ask!

Two main contact points:

[email protected]@uniprot.org

Some have problems finding a protein…

Troubles finding a protein…

I cannot find the IgG protein from Lama pacas in your server.

“Lama pacas” = Lama guanicoe pacos (Alpaca) (Lama pacos)

40 entries in UniProtKB (5 Swiss-Prot, 35 TrEMBL), but no IgG;
98 entries in the EMBL database, no IgG.

In addition:
Ig are not annotated in UniProtKB/Swiss-Prot (currently many Ig sequences are stored only in UniParc);
Lama pacos is not an annotation priority.

UniProtKB/Swiss-Prot annotation priorities (see poster SP106)

Model-organism oriented annotation

1. Complete microbial proteomes and plastid-encoded proteins (HAMAP) (SP131 & 132)
2. Human proteins and their orthologs in other mammals (HPI) (SP129)
3. Plant proteins (A. thaliana and rice) (PPAP) (SP133)
4. Fungal proteomes (FPAP) (SP134)
5. Proteomes of representative subsets of viral strains (SP135)
6. Toxins and anti-microbial peptides (ToxProt) (SP139)
7. Drosophila proteome (SP137)
8. C. elegans proteome (SP138)
9. Xenopus proteome (SP136)
…

Priorities shared by all organisms

1. Post-Translational Modifications (PTMs) (SP126)
2. 3D structures (SP128)
3. Protein-protein interactions (SP)
…

Troubles finding a protein…

Dear Folks, I cannot find an entry for human apolipoprotein B100 in Swiss-Prot/TrEMBL. Am I doing something wrong?

In the annotation process, we try to add all synonyms found for a given protein/gene in the literature and other databases.

In the future, our search engines will cope with dashes, Roman/Arabic figures, etc.
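
Until a search engine copes with such variants, the same idea can be tried on the query side. The sketch below is only an illustration, not how any UniProt search engine works: the function name `normalize_name` and the small Roman numeral table are hypothetical. It reduces a protein name to a comparison key that ignores case, dashes, letter/digit spacing and the Roman versus Arabic style of small numbers.

```python
import re

# Hypothetical lookup table: only the small Roman numerals seen in protein names.
ROMAN = {"i": "1", "ii": "2", "iii": "3", "iv": "4", "v": "5",
         "vi": "6", "vii": "7", "viii": "8", "ix": "9", "x": "10"}


def normalize_name(name):
    """Return a comparison key for a protein name.

    The key ignores case, dashes, spacing between letters and digits,
    and whether small numbers are written in Roman or Arabic figures.
    """
    text = name.lower()
    # Treat ASCII and Unicode dashes like spaces: "beta-2" -> "beta 2"
    text = re.sub(r"[-\u2010-\u2015]", " ", text)
    # Separate letters from digits: "b100" -> "b 100"
    text = re.sub(r"(?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])", " ", text)
    # Map stand-alone Roman numerals to Arabic figures: "viii" -> "8"
    tokens = [ROMAN.get(token, token) for token in text.split()]
    return " ".join(tokens)


def same_protein_name(query, synonym):
    """True when two spellings reduce to the same comparison key."""
    return normalize_name(query) == normalize_name(synonym)
```

With this key, "apolipoprotein B100" and "Apolipoprotein B-100" compare as equal. Single-letter tokens such as "x" are also mapped to figures, which is harmless as long as both sides of the comparison are normalized the same way.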

Troubles finding a protein…

I am trying to locate the entry for the human beta-2 adrenoreceptor protein, but I don't seem to get any entries. Can you help me to locate this entry, please?

The missing synonym was added.

Troubles finding a protein…

I could not find the information of protein gi/34906958

From the NCBI documentation:

=> 1. restricted to GenBank (not agreed upon with EMBL and DDBJ)
2. not stable identifiers

Of note, cross-references to RefSeq will soon be available from UniProtKB.

http://www.pir.uniprot.org/search/idmapping.shtml

… eventually they find it!

help #12995
This is my new question: Do all the Swiss-Prot proteins of human and Arabidopsis have CDS nucleotide sequences in database? What should I do to get them?

[Figure: From EMBL to UniProtKB/TrEMBL (labels: CDS, Ref.)]

In the current UniProt release (8.4 – 25-Jul-2006), there are 8’133 UniProtKB/Swiss-Prot entries without cross-references to EMBL/GenBank/DDBJ (out of a total of 230’133 entries – 3.5%).
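
For readers who want to check this on their own download, here is a minimal sketch that assumes the entries are available as flat-file text, with `AC` lines for accessions, `DR   EMBL;` lines for nucleotide cross-references and `//` closing each entry. The function names and the sample entries are hypothetical placeholders, not real database records. The nucleotide accessions it collects are the starting point for fetching the CDS from EMBL/GenBank/DDBJ.

```python
def embl_xrefs(flat_text):
    """Map each entry's primary accession to its EMBL/GenBank/DDBJ accessions."""
    result = {}
    accession, xrefs = None, []
    for line in flat_text.splitlines():
        if line.startswith("AC   ") and accession is None:
            # The first accession on the first AC line is the primary one
            accession = line[5:].split(";")[0].strip()
        elif line.startswith("DR   EMBL;"):
            # Layout assumed: DR   EMBL; <nucleotide acc>; <protein id>; ...
            xrefs.append(line.split(";")[1].strip())
        elif line.startswith("//"):
            # End of entry: store what was collected and reset
            if accession is not None:
                result[accession] = xrefs
            accession, xrefs = None, []
    return result


def fraction_without_embl(xrefs_by_entry):
    """Fraction of entries that carry no EMBL cross-reference at all."""
    if not xrefs_by_entry:
        return 0.0
    missing = sum(1 for xrefs in xrefs_by_entry.values() if not xrefs)
    return missing / len(xrefs_by_entry)


# Placeholder entries for illustration only (not real records)
SAMPLE = """\
ID   EXMPL1_HUMAN   STANDARD;      PRT;   100 AA.
AC   X00001; X00009;
DR   EMBL; Z00001; ZAA00001.1; -; mRNA.
DR   EMBL; Z00002; ZAA00002.1; -; Genomic_DNA.
DR   PIR; A00001; A00001.
//
ID   EXMPL2_HUMAN   STANDARD;      PRT;    50 AA.
AC   X00002;
DR   PIR; A00002; A00002.
//
"""

if __name__ == "__main__":
    found = embl_xrefs(SAMPLE)
    print(found)
    print(fraction_without_embl(found))
```

The same ratio applied to the figures on the slide is 8133 / 230133, which rounds to 3.5%.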

http://www.ebi.ac.uk/swissprot/Submissions/submissions.html

help #12995
This is my new question: Do all the Swiss-Prot proteins of human and Arabidopsis have CDS nucleotide sequences in database? What should I do to get them?

I found that the UNIPROT entry for human MAPKAKK3 is still a TREMBL entry (since 1996) and could not be found in SWISSPROT. Is there a specific reason why certain entries do not enter the SWISSPROT section and get a 'correct UNIPROT ID'?

MAPKAKK3 is not a valid gene name; the corresponding TrEMBL entry was not found and could not be annotated.

Please use the update request form (or cite accession numbers)!

Is there a specific reason why certain entries do not enter the SWISSPROT section?

UniProtKB: From TrEMBL to Swiss-Prot

and ~60 superannotators at SIB and EBI, supported by a dedicated programming team

UniProtKB: From TrEMBL to Swiss-Prot

Sequence merge & analysis
High performance bioinformatics tools

Sequence annotation

Same gene ?
Fragment ?
Sequencing errors ?
Polymorphisms ?
Alternative splicing ?
Alternative initiation ?
Usage of an alternative promoter ?
RNA editing ?
Selenocysteine ?

1 gene / 1 species = 1 Swiss-Prot entry

-> Annotation and documentation of all the differences

UniProtKB: From TrEMBL to Swiss-Prot

Sequence merge & analysis
High performance bioinformatics tools

Annotation and sequence check
Literature information (>1’700 journals cited)
Databases and external scientific expertise

In order to avoid redundancy, once manually annotated and integrated into Swiss-Prot, the entry is deleted from TrEMBL.

Dear Curator, I am the main author of the paper describing two new phosphorylation sites for human growth hormone (P01241) published in Proteomics 4:587-598(2004). One of two phosphorylation sites, ser 176 described by us in the paper is not listed in the expasy web site. If the curator simply missed the site, please make the necessary update. If ser 176 was not included in the table feature for other reasons, please let us know.

www.expasy.org

The reference has been added…

… and the modifications described

Searching UniProtKB/Swiss-Prot

I wish to retrieve separately all the bacteria and viruses protein sequences with virulence factors, but what I manage to get when I type "virulence" as a keyword are all the protein sequences with virulence as a keyword. Are the sequences I got here only from bacteria and viruses? Do any other organisms have these virulence factors? How could I specify the sequences, based on viral and bacterial virulence factors? I would really appreciate it if you could help me. Thank you.

Currently: Sequence Retrieval System (SRS)
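
The logic of such a combined query (keyword AND taxonomy) can also be sketched outside SRS. The example below is an assumption-laden illustration, not an SRS query: it supposes the entries are at hand as flat-file text, that the keyword appears on `KW` lines and that the first term of the `OC` lineage is "Bacteria" or "Viruses". The function names and sample entries are hypothetical placeholders.

```python
def split_entries(flat_text):
    """Yield the list of lines of each entry; entries end with a '//' line."""
    current = []
    for line in flat_text.splitlines():
        if line.startswith("//"):
            if current:
                yield current
            current = []
        else:
            current.append(line)


def field(lines, code):
    """Join the content of all lines carrying the given two-letter line code."""
    prefix = code + "   "
    return " ".join(l[5:].strip() for l in lines if l.startswith(prefix))


def virulence_by_group(flat_text, groups=("Bacteria", "Viruses")):
    """Sort entry names with the 'Virulence' keyword by top-level taxon."""
    hits = {group: [] for group in groups}
    hits["Other"] = []
    for lines in split_entries(flat_text):
        keywords = [k.strip().rstrip(".") for k in field(lines, "KW").split(";")]
        if "Virulence" not in keywords:
            continue
        # The first term of the OC lineage is the top-level taxon
        top = field(lines, "OC").split(";")[0].strip().rstrip(".")
        entry_name = field(lines, "ID").split()[0]
        hits[top if top in groups else "Other"].append(entry_name)
    return hits


# Placeholder entries for illustration only (not real records)
SAMPLE = """\
ID   BACT1_EXMPL    STANDARD;      PRT;   100 AA.
OC   Bacteria; Proteobacteria; Gammaproteobacteria.
KW   Toxin; Virulence.
//
ID   VIR1_EXMPL     STANDARD;      PRT;   200 AA.
OC   Viruses; dsDNA viruses, no RNA stage.
KW   Virulence.
//
ID   FUNG1_EXMPL    STANDARD;      PRT;   300 AA.
OC   Eukaryota; Fungi.
KW   Virulence; Secreted.
//
ID   BACT2_EXMPL    STANDARD;      PRT;   150 AA.
OC   Bacteria; Firmicutes.
KW   Hydrolase.
//
"""

if __name__ == "__main__":
    print(virulence_by_group(SAMPLE))
```

The "Other" bucket answers the second half of the question: any entry listed there carries the keyword but belongs to neither bacteria nor viruses.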

(PR#6943)
Dear Sir/Madame, I have a question concerning selection of data from UniProt protein database. I wonder if there are any examples of two or more protein entries, which concern exactly the same protein of two or more individuals representing the same species. In other words, I would like to know, if each protein of a given species is represented by exactly one amino acids sequence. If there are some proteins of a given species which are represented by more than one amino acids sequence, which line of the entry should I use to group such entries together?

UniProtKB/Swiss-Prot is non-redundant:

One Swiss-Prot entry

All protein products encoded by one gene in one species (including fragments, variations/polymorphisms, splice variants, sequencing errors…)
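
Where entries have not yet been merged this way, a rough grouping by gene and species can be tried on the flat-file text. The sketch below is an assumption, not the curators' merging procedure: it takes the gene from the `Name=` token of the `GN` line and the species from the `OS` line, and groups primary accessions on that pair. The function name and the sample entries are hypothetical placeholders.

```python
import re
from collections import defaultdict


def group_by_gene_and_species(flat_text):
    """Group primary accessions by (gene name, organism)."""
    groups = defaultdict(list)
    accession = gene = organism = None
    for line in flat_text.splitlines():
        if line.startswith("AC   ") and accession is None:
            accession = line[5:].split(";")[0].strip()
        elif line.startswith("GN   ") and gene is None:
            # Assumes the 'Name=...;' token layout of the GN line
            match = re.search(r"Name=([^;]+);", line)
            if match:
                gene = match.group(1).strip()
        elif line.startswith("OS   ") and organism is None:
            organism = line[5:].strip().rstrip(".")
        elif line.startswith("//"):
            # End of entry: file it under its (gene, organism) pair and reset
            if accession is not None:
                groups[(gene, organism)].append(accession)
            accession = gene = organism = None
    return dict(groups)


# Placeholder entries for illustration only (not real records)
SAMPLE = """\
AC   X00001;
GN   Name=GENE1; Synonyms=G1;
OS   Homo sapiens (Human).
//
AC   X00002;
GN   Name=GENE1;
OS   Homo sapiens (Human).
//
AC   X00003;
GN   Name=GENE1;
OS   Mus musculus (Mouse).
//
"""

if __name__ == "__main__":
    for key, accessions in group_by_gene_and_species(SAMPLE).items():
        print(key, accessions)
```

Entries without a gene name end up under a key whose gene is `None`, so they need a different criterion, such as sequence similarity.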

Increase in complexity

Genome: ~ 25’000 human genes (with polymorphisms)
-> alternative promoter usage, alternative splicing, mRNA editing, etc.
Transcriptome: ~ 100’000 human transcripts
-> Post-translational modifications (PTMs)
Proteome: ~ 1’000’000 human proteins

- 13 sequences (complete or partial)
- derived from mRNA (n=6) or genomic DNA (n=7)

Multiple alignment of the C-terminus of available GCR sequences

Annotation of the sequence differences

Alternative splicing ?
Polymorphism ?
Disease mutation ?
Sequencing error (frameshift) ?
Sequencing error (conflict) ?
RNA editing ?

Multiple alignment of the C-terminus of the available GCR sequences

Where to find the annotation about alternative splicing in UniProtKB/Swiss-Prot ?

View « by default » on the ExPASy server

Identifier & accession nr. (ID, AC, DT)
Protein and gene names, Taxonomy (DE, GN, OC, OS, OG)
References (RN, RP, RC, RX, RA, RL)
Comments (CC)
Cross-references (DR)
Keywords (KW)
Sequence description (Feature Table)
Sequence (SQ)
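
The same sections can be picked out of the flat-file text by line code. The sketch below is illustrative only and rests on assumptions: it maps the two-letter codes listed above to section names, and it looks for alternative splicing in the comments block under the `-!- ALTERNATIVE PRODUCTS:` topic. The function names and the sample entry are hypothetical placeholders, not a real record.

```python
from collections import defaultdict

# Line codes grouped as in the list above
SECTIONS = {
    "ID": "Identifier & accession nr.", "AC": "Identifier & accession nr.",
    "DT": "Identifier & accession nr.",
    "DE": "Names and taxonomy", "GN": "Names and taxonomy",
    "OS": "Names and taxonomy", "OG": "Names and taxonomy",
    "OC": "Names and taxonomy",
    "RN": "References", "RP": "References", "RC": "References",
    "RX": "References", "RA": "References", "RL": "References",
    "CC": "Comments", "DR": "Cross-references", "KW": "Keywords",
    "FT": "Feature Table", "SQ": "Sequence",
}

TOPIC = "-!- ALTERNATIVE PRODUCTS:"


def lines_by_section(entry_text):
    """Collect the content of each line under the section of its line code."""
    sections = defaultdict(list)
    for line in entry_text.splitlines():
        code = line[:2]
        if code in SECTIONS:
            sections[SECTIONS[code]].append(line[5:].rstrip())
    return dict(sections)


def alternative_products(entry_text):
    """Return the text of the ALTERNATIVE PRODUCTS comment, or None."""
    collected, inside = [], False
    for line in entry_text.splitlines():
        if not line.startswith("CC   "):
            inside = False
            continue
        body = line[5:]
        if body.startswith("-!- "):
            # A new comment topic starts here; keep only the one we want
            inside = body.startswith(TOPIC)
            body = body[len(TOPIC):] if inside else body
        if inside and body.strip():
            collected.append(body.strip())
    return " ".join(collected) or None


# Placeholder entry for illustration only (not a real record)
SAMPLE = """\
ID   EXMPL_HUMAN    STANDARD;      PRT;   100 AA.
AC   X00000;
CC   -!- FUNCTION: Placeholder function text.
CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative splicing; Named isoforms=2;
CC       Name=Alpha;
CC         IsoId=X00000-1; Sequence=Displayed;
CC       Name=Beta;
CC         IsoId=X00000-2; Sequence=VSP_000000;
CC   -!- SUBUNIT: Placeholder subunit text.
KW   Alternative splicing.
//
"""

if __name__ == "__main__":
    print(alternative_products(SAMPLE))
```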

P04150 (GCR_HUMAN)

All the alternative sequences are available for Blast searches and protein identification tools (on the ExPASy server).

Currently in UniProtKB/Swiss-Prot, for Homo sapiens,

14’445 entries (~ as many genes)
7’975 alternative splicing isoforms

-> 22’420 human sequences described

not taking into account other diversity generating events…

How to download the sequences ?

Dear Sirs,
We need deacetylase for the following purposes:
1. Deacetylation of fiber obtained from chitin.
2. Chitin deacetylation for obtaining chitosan oligosaccharides.
Evidently, it will be different types of deacetylase, because in case of the fiber decrease of molecular weight is not allowed, while in case of chitin deacetylation it is allowable and even desirable for oligomerisation of the product during deacetylation.
We ask you to send us the example Deacetylase for chitin and its price.

Dear,
At this moment I am looking for: bovine TGF beta1
I saw in web that you have this product with part# P18341
Could you inform me the price and delivery time?

And if bioinformatics is not funded properly, we could start a new business…