Tutorial 9

Protein and Function Databases

- UniProt (Swiss-Prot/TrEMBL)
- PROSITE
- Pfam
- Gene Ontology
- DAVID


Glossary

Domain: A structural unit which can be found in multiple protein contexts.

Repeat: A short unit which is unstable in isolation but forms a stable structure when multiple copies are present.

Family: A collection of related proteins.

UniProt

The Universal Protein Resource (UniProt) is a central repository of protein sequences, functional annotation, classification and cross-references.

It was created by joining the information contained in Swiss-Prot (manually reviewed entries) and TrEMBL (automatically annotated, unreviewed entries).

http://www.uniprot.org/

Protein search

Reviewed protein

UniProt input

UniProt output

Protein status

Accession number

Organism

Length

Sequence download
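The fields listed above (protein status, accession number, organism, length) can also be read from a downloaded FASTA record. The sketch below assumes the header layout used by UniProt FASTA downloads (`>db|Accession|EntryName ProteinName OS=Organism OX=TaxID ...`, where `sp` marks a reviewed Swiss-Prot entry and `tr` an unreviewed TrEMBL entry). The function and class names are illustrative, and the sequence in the example is deliberately shortened, so treat it as a placeholder rather than the full protein.

```python
import re
from typing import NamedTuple, Optional

# Assumed UniProt FASTA header layout:
# >db|Accession|EntryName ProteinName OS=Organism OX=TaxID [GN=Gene] PE=n SV=n
HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)\s+"
    r"(?P<protein_name>.*?)\s+OS=(?P<organism>.*?)\s+OX=(?P<taxon_id>\d+)"
)


class UniProtRecord(NamedTuple):
    reviewed: bool      # True for Swiss-Prot ("sp"), False for TrEMBL ("tr")
    accession: str
    entry_name: str
    protein_name: str
    organism: str
    length: int         # number of residues in the sequence lines


def parse_fasta_record(text: str) -> Optional[UniProtRecord]:
    """Parse one FASTA record; return None if the header is not recognised."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        return None
    match = HEADER_RE.match(lines[0])
    if match is None:
        return None
    sequence = "".join(lines[1:])
    return UniProtRecord(
        reviewed=match["db"] == "sp",
        accession=match["accession"],
        entry_name=match["entry_name"],
        protein_name=match["protein_name"],
        organism=match["organism"],
        length=len(sequence),
    )


# Illustrative record: the sequence is truncated on purpose (placeholder).
EXAMPLE = """>sp|P37840|SYUA_HUMAN Alpha-synuclein OS=Homo sapiens OX=9606 GN=SNCA PE=1 SV=1
MDVFMKGLSKAKEGVVAAAEKTKQGV
"""

record = parse_fasta_record(EXAMPLE)
print(record)
```

The same function returns `None` for headers that do not follow this layout, which is a simple way to notice records that came from a different database.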

General information

annotations

Information for one protein

GO annotation (MF, BP, CC)

General keywords

Alternative splicing isoforms

Features in the sequence

Sequences

References

Alignment for two or more proteins

MSA

Blast

Pfam

• http://pfam.sanger.ac.uk/

• Pfam is a database of multiple alignments of protein domains or conserved protein regions.

What kind of domains can we find in Pfam?

Trusted Domains

Repeats

Fragment Domains

Nested Domains

Disulfide bonds

Important residues (e.g. active sites)

Transmembrane domains


Low complexity regions

Coiled coils: two or three alpha helices that wind around each other.

Context domains: domains that, despite not scoring above the family threshold, are expected to be real based on the other domains found in the protein.

Signal peptides: indicate a protein that will be secreted.

Pfam input

Domains

Domain range and score

Description

Structure info

Gene Ontology

Links

PROSITE

• http://www.expasy.org/tools/scanprosite

• PROSITE is a database of protein domains and motifs that can be searched by either regular expression patterns or sequence profiles.
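To make the "regular expression patterns" concrete, here is a minimal sketch that translates a PROSITE-style pattern into a Python regular expression and scans a sequence with it. It assumes the usual PROSITE pattern conventions (elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the sequence ends) and does not cover every corner of the syntax. The function names and the test sequence are made up for illustration; this is not how ScanProsite itself is implemented.

```python
import re
from typing import List, Tuple

# One pattern element: a core token plus an optional repeat such as (3) or (2,4).
ELEMENT_RE = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""   # N-terminal anchor
        suffix = "$" if element.endswith(">") else ""     # C-terminal anchor
        element = element.strip("<>")
        match = ELEMENT_RE.fullmatch(element)
        if match is None:
            raise ValueError(f"cannot parse element: {element!r}")
        core, low, high = match.groups()
        if core == "x":                                   # any residue
            token = "."
        elif core.startswith("[") and core.endswith("]"):  # allowed residues
            token = core
        elif core.startswith("{") and core.endswith("}"):  # forbidden residues
            token = "[^" + core[1:-1] + "]"
        elif len(core) == 1 and core.isalpha():            # one fixed residue
            token = core
        else:
            raise ValueError(f"unsupported element: {element!r}")
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + token + suffix)
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for all hits, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# N-glycosylation site pattern; the sequence is a made-up example.
print(prosite_to_regex("N-{P}-[ST]-{P}."))            # N[^P][ST][^P]
print(find_sites("N-{P}-[ST]-{P}.", "MKNGSAANPTLNKTA"))
# [(3, 'NGSA'), (12, 'NKTA')]
```

The lookahead in `find_sites` is a design choice so that overlapping sites are all reported; a plain `finditer` would skip a site that starts inside a previous match.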

Search Results

Domains architecture

Gene Ontology (GO)

• GO is a database of biological processes, molecular functions and cellular components.

• GO does not contain sequence information or gene and protein descriptions.

• GO is linked to gene and protein databases.

• The GO database is structured hierarchically: loosely a tree, more precisely a directed acyclic graph in which a term may have more than one parent.

http://www.geneontology.org/

Search by AmiGO

Three principal branches

http://www.geneontology.org/amigo/

GO structure is a Directed Acyclic Graph
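The practical difference between a tree and a directed acyclic graph is that a term can have several parents, so annotating a protein with one term implies all of that term's ancestors along every path. The sketch below shows this with a toy graph; the term names are placeholders, not real GO terms, and `ancestors` is an illustrative helper rather than part of any GO library.

```python
from typing import Dict, List, Set

# Toy child -> parents map (placeholder names, NOT real GO terms).
# "child" has two parents, which a strict tree would not allow.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "parent_a": ["root"],
    "parent_b": ["root"],
    "child": ["parent_a", "parent_b"],
    "grandchild": ["child"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect every term reachable by following parent links upwards."""
    seen: Set[str] = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:      # a term may be reached via several paths
            seen.add(current)
            stack.extend(parents[current])
    return seen


print(sorted(ancestors("grandchild", PARENTS)))
# ['child', 'parent_a', 'parent_b', 'root']
```

Because the graph is acyclic, the traversal always terminates; the `seen` set only prevents visiting shared ancestors such as `root` more than once.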

GO sources (evidence codes)

• ISS: Inferred from Sequence/Structural Similarity

• IDA: Inferred from Direct Assay

• IPI: Inferred from Physical Interaction

• TAS: Traceable Author Statement

• NAS: Non-traceable Author Statement

• IMP: Inferred from Mutant Phenotype

• IGI: Inferred from Genetic Interaction

• IEP: Inferred from Expression Pattern

• IC: Inferred by Curator

• ND: No Data available

• IEA: Inferred from Electronic Annotation
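Evidence codes are commonly used to filter annotations, for example to set aside purely electronic annotations (IEA) before an analysis. The following is a minimal sketch of such a filter using the codes listed above; the annotation tuples (gene, term, code) are invented placeholders, and the default set of excluded codes is an assumption for illustration, not a recommendation from GO.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Evidence codes as listed in this tutorial.
EVIDENCE_CODES = {
    "ISS": "Inferred from Sequence/Structural Similarity",
    "IDA": "Inferred from Direct Assay",
    "IPI": "Inferred from Physical Interaction",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
    "IC": "Inferred by Curator",
    "ND": "No Data available",
    "IEA": "Inferred from Electronic Annotation",
}

Annotation = Tuple[str, str, str]  # (gene, term, evidence code)


def filter_annotations(
    annotations: Iterable[Annotation],
    excluded: FrozenSet[str] = frozenset({"IEA", "ND"}),  # assumed default
) -> List[Annotation]:
    """Keep annotations whose evidence code is known and not excluded."""
    kept = []
    for gene, term, code in annotations:
        if code not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {code}")
        if code not in excluded:
            kept.append((gene, term, code))
    return kept


# Placeholder annotations for illustration only.
example = [
    ("GENE1", "term_x", "IDA"),
    ("GENE1", "term_y", "IEA"),
    ("GENE2", "term_x", "ISS"),
    ("GENE3", "term_z", "ND"),
]
print(filter_annotations(example))
# [('GENE1', 'term_x', 'IDA'), ('GENE2', 'term_x', 'ISS')]
```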

Results for alpha-synuclein

DAVID Functional Annotation Bioinformatics Microarray Analysis

 

• Identify enriched biological themes, particularly GO terms

• Discover enriched, functionally related gene/protein groups

• Cluster redundant annotation terms

• Explore gene names in batch
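"Enriched" here means that a term is annotated to more genes in the uploaded list than would be expected by chance given the background. DAVID reports a modified Fisher exact p-value (the EASE score); the sketch below is the plain one-sided hypergeometric test, shown only to illustrate the idea, and is not DAVID's exact computation. All counts in the example are invented, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def enrichment_p_value(list_hits: int, list_size: int,
                       pop_hits: int, pop_size: int) -> float:
    """P(X >= list_hits) when drawing list_size genes from a background of
    pop_size genes, of which pop_hits carry the annotation term."""
    max_hits = min(list_size, pop_hits)
    total = comb(pop_size, list_size)
    # Sum the hypergeometric probabilities of seeing at least list_hits hits.
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


# Invented counts: 8 of 50 list genes carry the term,
# against 100 of 10000 genes in the background.
p = enrichment_p_value(list_hits=8, list_size=50, pop_hits=100, pop_size=10000)
print(f"p = {p:.3g}")
```

When many terms are tested at once, the resulting p-values need a multiple-testing correction before they are interpreted.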

ID conversion

annotation

classification

Functional annotation

Upload

Annotation options