Bioinformatics: Structural and functional prediction. Master in Molecular Biotechnology 2009-10


Outline

• Introduction
• Biological Databases
• Sequence Comparison
• 3D Structure visualization
• Functional Prediction
• Structural Prediction

http://mmb.pcb.ub.es/MBIOTEC/

Material and Evaluation

Exercises and slides: Campus Virtual, http://mmb.pcb.ub.es/MBIOTEC

Evaluation: practical test on Campus Virtual.

Bioinformatics: intuitive definition

Computational tools that can suggest solutions to biological problems.

“You really understand a system when you are able to represent it using a mathematical equation.” (Lord Kelvin)

“Living organisms are the most perverse of chemical systems.” (Coulson)

February 2001: Public Consortium; Celera Genomics

November 2001: Ohio State University

Sequencing, parallel/combinatorial synthesis, HT screening, separation, purification, crystallization...

DATA → INFORMATION

Bioinformatics

• Genome projects
• Functional genomics
• Structural genomics
• Proteomics, systems biology
• Molecular recognition
• …

Genome projects

Massive sequencing

Genome determinations

Genome annotation
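Gene finding is one of the first steps of genome annotation. As a minimal illustration only (real annotation pipelines use statistical gene models, both strands and homology evidence), the sketch below scans the three forward reading frames for open reading frames: an ATG start followed by in-frame codons up to the first TAA, TAG or TGA stop. The function name `find_orfs` and the `min_codons` threshold are hypothetical choices for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return (start, end, orf) for ORFs on the forward strand only.

    start/end are 0-based and end is exclusive, so the stop codon is included.
    min_codons counts codons including the stop. Nested ATGs inside an ORF are
    not reported, and the reverse complement is not scanned.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon from the start until an in-frame stop.
                for j in range(i, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j  # resume scanning after this ORF's stop codon
                        break
                else:
                    break  # no stop codon before the end: give up on this frame
            i += 3
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGGG"))
```

On the toy string above, the ORF starts at index 2 in the third frame and ends with the TAG stop.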

Genomics and disease

http://www.ncbi.nlm.nih.gov/genomes/static/gpstat.html (+ 2124 viruses)

http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9606&build=previous

Functional genomics

DNA chips...

Image processing, data mining

Expression profiles

Statistical analysis

Clustering, machine learning methods, ontology

http://www.ncbi.nlm.nih.gov/geo/
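Clustering groups genes whose expression profiles (such as those deposited in GEO) rise and fall together across conditions. One common choice among several is to use 1 minus the Pearson correlation as the distance and merge clusters agglomeratively. The sketch below does average-linkage hierarchical clustering in plain Python until `k` clusters remain; the function names and the gene names and values in the usage lines are invented for illustration, and real analyses would work on normalized data with a dedicated library.

```python
from itertools import combinations
from statistics import mean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cluster_profiles(profiles: dict[str, list[float]], k: int) -> list[set[str]]:
    """Average-linkage agglomerative clustering with distance = 1 - Pearson r."""
    names = list(profiles)
    dist = {
        frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }
    clusters = [{n} for n in names]
    while len(clusters) > k:
        # Merge the pair of clusters with the smallest average pairwise distance.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = mean(dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so index i is still valid
    return clusters


if __name__ == "__main__":
    demo = {"geneA": [1, 2, 3, 4], "geneB": [2, 4, 6, 8],
            "geneC": [4, 3, 2, 1], "geneD": [8, 6, 4, 2]}
    print(cluster_profiles(demo, 2))
```

Correlation-based distance groups geneA with geneB and geneC with geneD, even though their absolute levels differ, which is usually what is wanted for co-expression.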

Structural genomics

X-ray, NMR

Homology

3D structure

Structure selection

3D structure

Structure-function, new biomolecules

Structure-function analysis

Molecular modeling

Rosalind Franklin

Diffraction map of B-DNA

COX-2, ADA

XO, FKBP

ATP (Mg) - ACV

Dynamic properties.

Molecular recognition requires structural adjustment
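The size of a structural adjustment, for example between the free and bound conformations of a protein, is often summarized as the root-mean-square deviation over matched atoms: RMSD = sqrt((1/N) Σ |a_i − b_i|²). The minimal sketch below assumes the two coordinate sets are already matched atom by atom and superposed (in practice a rigid-body fit such as the Kabsch algorithm is applied first); the coordinates in the usage note are invented.

```python
import math


def rmsd(coords_a: list[tuple[float, float, float]],
         coords_b: list[tuple[float, float, float]]) -> float:
    """RMSD between two matched, already superposed sets of atomic coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

For example, if one of two atoms moves by 2 Å and the other stays put, the RMSD is sqrt(4 / 2) = sqrt(2), about 1.41 Å.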

Proteomics

Proteome, metabolome

Systems biology

HUMAN PLASMA

http://www.imb-jena.de/jcb/ppi/

Barabasi et al. (and others), since 1999

Pazos et al., EMBO Reports 2003
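Interaction networks like those studied by Barabási and others are commonly summarized by their degree distribution: how many proteins have exactly k interaction partners. In networks described as scale-free, a few hubs have many partners while most proteins have few. The sketch below computes degrees and the degree distribution from an undirected edge list; the protein names P1 to P5 and the edges are hypothetical.

```python
from collections import Counter


def degree_counts(edges: list[tuple[str, str]]) -> Counter:
    """Number of distinct interaction partners per protein (undirected network).

    Duplicate edges (in either direction) and self-interactions are ignored.
    """
    unique = {tuple(sorted(edge)) for edge in edges if edge[0] != edge[1]}
    degree: Counter = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1
    return degree


def degree_distribution(degree: Counter) -> dict[int, int]:
    """Map each degree k to the number of proteins with exactly k partners."""
    return dict(sorted(Counter(degree.values()).items()))


# Hypothetical toy network in which P1 behaves as a hub.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P1", "P5"),
         ("P2", "P3"), ("P2", "P1")]
deg = degree_counts(edges)
print(deg.most_common(1))  # the best-connected protein and its degree
print(degree_distribution(deg))
```

The repeated ("P2", "P1") edge is counted once, so P1 ends up with four partners.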

Bioinformatics & prediction

Most bioinformatics tools try to predict the function or structure of macromolecules.

Sequence information is the primary entry point.

Evolutionary pressure maintains conservation, and conservation increases in this order: DNA sequence < protein sequence < protein 3D structure.
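One reason DNA is less conserved than protein is the redundancy of the genetic code: synonymous changes, often at the third codon position, alter the DNA but not the protein. The sketch below builds the standard genetic code (NCBI translation table 1), translates two coding sequences and compares percent identity at both levels. The two sequences are hypothetical and were chosen to differ only at third codon positions.

```python
from itertools import product

# NCBI standard genetic code (translation table 1). Codons are enumerated in
# TCAG order (TTT, TTC, TTA, TTG, TCT, ...), matching the amino-acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon; '*' marks stop codons.

    Incomplete trailing codons are ignored; ambiguous bases (e.g. N) raise KeyError.
    """
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identity(a: str, b: str) -> float:
    """Percent identity of two already aligned sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned and of equal, non-zero length")
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical coding sequences that differ only at third codon positions.
dna1 = "CTGGCTAAAGGT"
dna2 = "CTCGCAAAGGGC"
print(identity(dna1, dna2))                        # DNA level: 8 of 12 positions identical
print(identity(translate(dna1), translate(dna2)))  # protein level: both translate to "LAKG"
```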

Prediction: possible scenarios

1. Homology can be recognized using sequence comparison tools or protein family databases (BLAST, Clustal, Pfam, ...). Structural and functional predictions are feasible.

2. Homology exists but cannot be recognized easily (PSI-BLAST, threading). Low-resolution fold predictions are possible. No functional information.

3. No homology. 1D predictions, sequence motifs. Limited functional prediction. Ab initio prediction.
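Scenario 1 relies on sequence comparison. Below is a minimal sketch of global pairwise alignment by dynamic programming (Needleman-Wunsch) with a simple match/mismatch/linear-gap scheme, followed by a rough mapping of percent identity onto the three scenarios. The scores (+1, −1, −2) and the cut-offs (40% and 20%, loosely echoing the protein "twilight zone", often quoted as roughly 20-35% identity) are assumptions for illustration only; tools such as BLAST use substitution matrices (e.g. BLOSUM62), affine gap penalties and statistical significance, which is what should be relied on in practice.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical non-gap columns over alignment length (gap columns count as differences)."""
    if not aln_a:
        return 0.0
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100 * same / len(aln_a)


def scenario(pid: float) -> int:
    """Map percent identity to the three scenarios; the cut-offs are illustrative only."""
    if pid >= 40:
        return 1  # homology recognizable by plain sequence comparison
    if pid >= 20:
        return 2  # twilight zone: try profile methods (PSI-BLAST) or threading
    return 3      # no detectable homology: 1D predictions, motifs, ab initio
```

For example, aligning "AAAA" with "AAA" gives score 1 and the alignment "AAAA" over "-AAA", i.e. 75% identity. Percent identity alone is a crude criterion; alignment length and statistical significance matter as well.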

Reminder

Bioinformatics “suggests” answers; experimental proof is still necessary.

Bioinformatics can “save work”: hypotheses can be tested “in silico”.

Bioinformatics can perform experiments that are impossible in the lab.

However, never trust bioinformatics

Biological databases

DNA sequence, protein sequence, 3D structure, molecular recognition

In real life, however…

>gi|261252063|ref|NZ_ACZV01000005.1| Vibrio orientalis CIP 102891 VIA.Contig80, whole genome shotgun sequence
ACGCGTTAAGTAGACCGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAATGAATTGACGGGGGCCCGC
ACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTACTCTTGACATCCAGAGA
AGCCGGAAGAGATTCTGGTGTGCCTTCGGGAACTCTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTG
TTGTGAAATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTGTTTGCCAGCGAGTAATGTCGG
GAACTCCAGGGAGACTGCCGGTGATAAACCGGAGGAAGGTGGGGACGACGTCAAGTCATCATGGCCCTTA
CGAGTAGGGCTACACACGTGCTACAATGGCGCATACAGAGGGCAGCCAACTTGCGAAAGTGAGCGAATCC
CAAAAAGTGCGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCG
TGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGG
CTGCAAAAGAAGTAGGTAGTTTAACCTTCGGGAGAACGCTTACCACTTTGTGGTTCATGACTGGGGTGAA
GTCGTAACAAGGTAGCCCTAGGGGAACCTGGGGCTGGATCACCTCCTTATACGATGATTACTCACGATGA
GTGTCCACACAGATTGATATGTCTTTATTAGAGCTTTGAGGGGCTATAGCTCAGCTGGGAGAGCGCTTCG
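The sequence above is a FASTA record: a header line starting with ">" followed by the sequence, usually wrapped over several lines. A minimal parser in plain Python is sketched below, together with GC content as an example of a quick statistic derived from a raw sequence. The function names are illustrative; established libraries (e.g. Biopython) provide more robust parsers.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence.

    Blank lines are skipped; if a header repeats, the later record wins.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases A, C, G and T."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return 100 * (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0
```

For instance, `parse_fasta(">seq1 demo\nACGT\nGGCC\n")` returns `{"seq1 demo": "ACGTGGCC"}`, whose GC content is 75%.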

ATOM     95  CE2 TRP   115      28.381   8.071  33.915  1.00 10.00
ATOM     96  CE3 TRP   115      27.500   9.825  32.526  1.00 10.00
ATOM     97  CZ2 TRP   115      27.750   7.155  33.103  1.00 10.00
ATOM     98  CZ3 TRP   115      26.888   8.895  31.705  1.00 10.00
ATOM     99  CH2 TRP   115      27.053   7.584  32.002  1.00 10.00
ATOM    100  N   ASP   116      26.290  11.255  36.778  1.00 10.00
ATOM    101  CA  ASP   116      25.763  10.825  38.096  1.00 10.00
ATOM    102  C   ASP   116      24.689  11.802  38.607  1.00 10.00
ATOM    103  O   ASP   116      24.564  12.103  39.797  1.00 10.00
ATOM    104  CB  ASP   116      26.872  10.617  39.142  1.00 50.00
ATOM    105  CG  ASP   116      26.368  10.397  40.557  1.00 50.00
ATOM    106  OD1 ASP   116      25.812   9.294  40.721  1.00 50.00
ATOM    107  OD2 ASP   116      26.590  11.276  41.416  1.00 50.00
ATOM    108  N   PHE   117      23.915  12.348  37.709  1.00 10.00
ATOM    109  CA  PHE   117      22.766  13.148  38.156  1.00 10.00
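Coordinate records like these follow the fixed-column PDB format: atom serial number in columns 7-11, atom name 13-16, residue name 18-20, chain identifier 22, residue number 23-26, x, y and z in 31-38, 39-46 and 47-54, occupancy 55-60 and temperature (B) factor 61-66. The minimal sketch below slices those columns out of one well-formed, column-aligned ATOM line and computes an interatomic distance. The `Atom` type and function names are illustrative; for real files a dedicated parser such as Biopython's `Bio.PDB` is the safer choice.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-column PDB ATOM/HETATM record (1-based columns -> 0-based slices)."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21:22].strip(),  # empty string when no chain identifier is given
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance (in angstroms) between two atoms."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
```

Applied to the two carboxylate oxygens OD1 and OD2 of Asp 116 in the excerpt, the distance comes out at about 2.2 Å.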


The amount of data is huge

http://www3.ebi.ac.uk/Services/DBStats/

Biological databases

Primary: information comes from experiments; the database only organizes and provides the data. Ex. GenBank, EMBL

Derived: annotated a posteriori. Data is revised and corrected, and information from the literature is added. Ex. SWISS-PROT

Reusable experimental data: GEO, SRA

Computationally derived: Ex. PFAM

Specific databases

Molecular Database Collection 2009 update

Search strategies

Direct access to each database: usually more elaborate information.

Global retrieval: Sequence Retrieval System (SRS), NCBI Entrez. Automated and uniform; allows several (or all) databases to be searched simultaneously.

Programmatic access: bioXXX, Web services, Taverna.
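Programmatic access can be as simple as an HTTP request. The sketch below builds an NCBI E-utilities `efetch` URL that returns a nucleotide record as FASTA text, using only the Python standard library. The `db`, `id`, `rettype` and `retmode` parameters are documented E-utilities parameters; the function names are illustrative. NCBI asks clients to respect request-rate limits and to identify themselves (for example with `tool` and `email` parameters), so the current E-utilities documentation should be checked before relying on this.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, db: str = "nuccore", rettype: str = "fasta") -> str:
    """Build an E-utilities efetch URL for a single record, returned as plain text."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


def fetch(accession: str, **kwargs: str) -> str:
    """Download the record (requires network access)."""
    with urlopen(efetch_url(accession, **kwargs), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Accession taken from the Vibrio orientalis FASTA header shown earlier.
    print(efetch_url("NZ_ACZV01000005.1"))
```

Keeping URL construction separate from downloading makes the request easy to inspect or log before anything is sent over the network.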

Origin of information

Individual research: good quality but very limited amount.

Massive sequencing projects (EST, HTS, genome projects): large amount of data, quality not assured, frequent updates.

Main sequence repositories

DNA: EMBL, GenBank, DDBJ

Protein: Swiss-Prot/TrEMBL, PIR


Trusted annotation

Translation from DNA

http://www.expasy.org

Cross links

Most database files contain links to other databases:
• DNA sequence to protein sequence
• Sequence to 3D structure
• Sequence to bibliographic data
• ...

Warnings

Prediction methods can fail, and sometimes their accuracy is not available.

Predictions are always built on what is already known.

Databases can contain incorrect data.

Avoid overvaluing results.