Outline
Introduction
Biological Databases
Sequence Comparison
3D Structure Visualization
Functional Prediction
Structural Prediction
http://mmb.pcb.ub.es/MBIOTEC/
Material and Evaluation
Exercises and slides: Campus Virtual, http://mmb.pcb.ub.es/MBIOTEC
Evaluation: practical test on Campus Virtual.
Bioinformatics: intuitive definition
Computational tools that can suggest solutions to biological problems
You really understand a system when you are able to represent it using a mathematical equation
Lord Kelvin
Living organisms are the most perverse of chemical systems
Coulson
Bioinformatics
Genome projects
Functional genomics
Structural genomics
Proteomics, systems biology
Molecular recognition
…
Genome projects
Massive sequencing
Genome Determinations
Genome Annotation
Genomics and disease
Functional genomics
DNA chips ...
Image processing, data mining
Expression profiles
Statistical analysis
Structural genomics
X-ray, NMR
Homology
3D structure
Structure selection
3D structure
Structure-function, new biomolecules
Structure-function analysis
Molecular modeling
Bioinformatics & prediction
The most widely used bioinformatics tools try to predict the function or structure of macromolecules
Sequence information is the primary entry point
Evolutionary pressure ensures conservation, which increases in the order DNA sequence < protein sequence < protein 3D structure
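The conservation ordering can be illustrated with a toy example. The sketch below is not from the slides: the two DNA sequences, the partial codon table, and the helper names `translate` and `identity` are all made up for illustration. It shows two coding sequences that differ only at synonymous positions, so their protein products are identical even though the DNA is not.

```python
# Partial standard codon table: only the codons used in this toy example.
CODON_TABLE = {
    "ATG": "M",
    "GCT": "A", "GCC": "A",
    "CTG": "L", "TTA": "L",
    "AAA": "K", "AAG": "K",
    "GGT": "G", "GGC": "G",
}


def translate(dna):
    """Translate a coding DNA sequence codon by codon (no stop handling)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical sequences that differ only at synonymous codon positions.
dna_a = "ATGGCTCTGAAAGGT"
dna_b = "ATGGCCTTAAAGGGC"

prot_a = translate(dna_a)
prot_b = translate(dna_b)

dna_identity = identity(dna_a, dna_b)
protein_identity = identity(prot_a, prot_b)
print(f"DNA identity:     {dna_identity:.2f}")
print(f"Protein identity: {protein_identity:.2f}")
```

In this constructed case the protein identity is higher than the DNA identity, which is the first inequality in the ordering above. Real comparisons need an alignment step first; this sketch assumes the sequences are already aligned and of equal length.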
Prediction: possible scenarios
1. Homology can be recognized using sequence comparison tools or protein family databases (BLAST, Clustal, Pfam, ...). Structural and functional predictions are feasible.
2. Homology exists but cannot be recognized easily (PSI-BLAST, threading). Low-resolution fold predictions are possible. No functional information.
3. No homology. 1D predictions, sequence motifs, limited functional prediction, ab initio prediction.
Reminder
Bioinformatics “suggests” answers; experimental proof is still necessary
Bioinformatics can “save work”. Hypotheses can be tested “in silico”
Bioinformatics can do impossible experiments
However, never trust bioinformatics
In real life, however ...
>gi|261252063|ref|NZ_ACZV01000005.1| Vibrio orientalis CIP 102891 VIA.Contig80, whole genome shotgun sequence
ACGCGTTAAGTAGACCGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAATGAATTGACGGGGGCCCGC
ACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTACTCTTGACATCCAGAGA
AGCCGGAAGAGATTCTGGTGTGCCTTCGGGAACTCTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTG
TTGTGAAATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTGTTTGCCAGCGAGTAATGTCGG
GAACTCCAGGGAGACTGCCGGTGATAAACCGGAGGAAGGTGGGGACGACGTCAAGTCATCATGGCCCTTA
CGAGTAGGGCTACACACGTGCTACAATGGCGCATACAGAGGGCAGCCAACTTGCGAAAGTGAGCGAATCC
CAAAAAGTGCGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCG
TGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGG
CTGCAAAAGAAGTAGGTAGTTTAACCTTCGGGAGAACGCTTACCACTTTGTGGTTCATGACTGGGGTGAA
GTCGTAACAAGGTAGCCCTAGGGGAACCTGGGGCTGGATCACCTCCTTATACGATGATTACTCACGATGA
GTGTCCACACAGATTGATATGTCTTTATTAGAGCTTTGAGGGGCTATAGCTCAGCTGGGAGAGCGCTTCG
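As a minimal sketch of how such a FASTA record might be read by a program, the following Python splits the header from the sequence and computes the GC content. The function names `parse_fasta` and `gc_content` are illustrative, not part of any library, and only the first sequence line of the record above is used to keep the example short.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Header and first sequence line copied from the record shown above.
example = """>gi|261252063|ref|NZ_ACZV01000005.1| Vibrio orientalis CIP 102891 VIA.Contig80, whole genome shotgun sequence
ACGCGTTAAGTAGACCGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAATGAATTGACGGGGGCCCGC
"""

records = parse_fasta(example)
header, sequence = records[0]
# In this header style the fields are separated by "|"; the fourth is the accession.
accession = header.split("|")[3]
print(accession, len(sequence), round(gc_content(sequence), 2))
```

For real work, an established parser such as the one in Biopython is usually a safer choice than a hand-written one.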
ATOM 95 CE2 TRP 115 28.381 8.071 33.915 1.00 10.00
ATOM 96 CE3 TRP 115 27.500 9.825 32.526 1.00 10.00
ATOM 97 CZ2 TRP 115 27.750 7.155 33.103 1.00 10.00
ATOM 98 CZ3 TRP 115 26.888 8.895 31.705 1.00 10.00
ATOM 99 CH2 TRP 115 27.053 7.584 32.002 1.00 10.00
ATOM 100 N ASP 116 26.290 11.255 36.778 1.00 10.00
ATOM 101 CA ASP 116 25.763 10.825 38.096 1.00 10.00
ATOM 102 C ASP 116 24.689 11.802 38.607 1.00 10.00
ATOM 103 O ASP 116 24.564 12.103 39.797 1.00 10.00
ATOM 104 CB ASP 116 26.872 10.617 39.142 1.00 50.00
ATOM 105 CG ASP 116 26.368 10.397 40.557 1.00 50.00
ATOM 106 OD1 ASP 116 25.812 9.294 40.721 1.00 50.00
ATOM 107 OD2 ASP 116 26.590 11.276 41.416 1.00 50.00
ATOM 108 N PHE 117 23.915 12.348 37.709 1.00 10.00
ATOM 109 CA PHE 117 22.766 13.148 38.156 1.00 10.00
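The ATOM records above carry, in order, the atom serial number, atom name, residue name, residue number, x/y/z coordinates, occupancy and B-factor. A minimal sketch of reading them follows. It assumes whitespace-separated fields exactly as in this excerpt (ten fields, no chain identifier); real PDB files use fixed column positions, so this is not a general parser. The `Atom` class and `parse_atom_line` are illustrative names.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom_line(line):
    """Parse one whitespace-separated ATOM record as laid out in the excerpt."""
    fields = line.split()
    if len(fields) != 10 or fields[0] != "ATOM":
        raise ValueError(f"unexpected record layout: {line!r}")
    return Atom(
        int(fields[1]), fields[2], fields[3], int(fields[4]),
        *map(float, fields[5:10]),
    )


# Two records copied from the excerpt: backbone N and CA of ASP 116.
n_atom = parse_atom_line("ATOM 100 N ASP 116 26.290 11.255 36.778 1.00 10.00")
ca_atom = parse_atom_line("ATOM 101 CA ASP 116 25.763 10.825 38.096 1.00 10.00")

# Euclidean distance between the two atoms, in the file's units (angstroms).
n_ca_distance = math.dist(
    (n_atom.x, n_atom.y, n_atom.z),
    (ca_atom.x, ca_atom.y, ca_atom.z),
)
print(f"N-CA distance in ASP 116: {n_ca_distance:.2f}")
```

The computed distance is in the range expected for a covalent N-CA bond, which is a quick sanity check that the coordinates were read in the right order.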
DNA sequence
Protein sequence
3D structure
Molecular recognition
Biological databases
Primary: information comes from experiment. The database only organizes and provides the data. Ex. GenBank, EMBL
Derived: annotated a posteriori. Data is revised and corrected; information from the literature is added. Ex. SWISS-PROT
Reusable experimental data: GEO, SRA
Computationally derived: Ex. PFAM
Specific
Molecular Database Collection 2009 update
Search strategies
Direct access to the database: usually gives more elaborate information
Global retrieval: Sequence Retrieval System (SRS), NCBI Entrez. Automated, uniform. Allows checking several (or all) databases simultaneously
Program access (bioXXX, Web services, Taverna)
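As one possible illustration of program access, the sketch below builds a request URL for the NCBI E-utilities `efetch` service, which is the programmatic interface to Entrez. It only constructs the URL and does not contact the server; the helper name `build_efetch_url` is made up for this example, and the accession is the one from the FASTA record shown earlier in these slides.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build an efetch URL that would return one record as plain text."""
    params = {
        "db": db,            # Entrez database to query
        "id": accession,     # accession or UID of the record
        "rettype": rettype,  # requested record format
        "retmode": "text",   # plain-text output
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


url = build_efetch_url("NZ_ACZV01000005.1")
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client. Libraries in the Bio* family wrap this kind of request, so in practice the URL rarely needs to be built by hand.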
Origin of information
Individual research: good quality but very limited amount
Massive sequencing projects (EST, HTS, genome projects): large amount of data. Quality not assured. Frequent updates
Cross links
Most database files contain links to other databases:
DNA sequence to protein sequence
Sequence to 3D structure
Sequence to bibliographic data
....