BIO-TRAC 25 (Proteomics: Principles and Methods)
October 10, 2003
NIH, Bethesda, MD
Zhang-Zhi Hu, M.D.
Senior Bioinformatics Scientist, Protein Information Resource
National Biomedical Research Foundation, GUMC
Tutorial: Bioinformatics Resources
2
What is Bioinformatics?
NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2002) - Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
Bioinformatics is the application of information technology to the analysis, organization and distribution of biological data in order to answer complex biological questions.
3
Bioinformatics Resources
The Molecular Biology Database Collection: An Online Compilation of Relevant Database Resources
  2003 update: http://www3.oup.co.uk/nar/database/
Nucleic Acids Research Database Issues (published annually in January)
  (2003 - http://nar.oupjournals.org/content/vol31/issue1/)
DBcat: A Catalog of > 500 Biological Databases
  http://www.infobiogen.fr/services/dbcat/
4
Molecular Biology Database Collection
(http://nar.oupjournals.org/cgi/content/full/31/1/1#GKG120TB1)
5
The Molecular Biology Database Collection: 2003 update (Baxevanis, A.D.)
An online resource of 386 key databases in 18 categories
  Major sequence repositories
  Comparative Genomics
  Gene Expression
  Gene Identification and Structure
  Genetic and Physical Maps
  Genomic Databases
  Intermolecular Interactions
  Metabolic Pathways and Cellular Regulation
  Mutation Databases
  Pathology
  Protein Sequence Motifs
  Proteome Resources
  Retrieval Systems and Database Structure
  RNA Sequences
  Structure
  Transgenics
  Varied Biomedical Content
6
Overview
Protein Sequence Analysis
  I. Sequence Similarity Search and Alignment
  II. Family Classification Methods
  III. Structure Prediction Methods
Molecular Biology Databases
  IV. Protein Family Databases
  V. Database of Protein Functions
  VI. Databases of Protein Structures
Proteomic Resources
  VII. 2D-gel databases
  VIII. Proteomic analyses
7
I. Sequence Similarity Search
Find a protein sequence: text search
Based on Pair-Wise Comparisons
  BLOSUM scoring matrix
  PAM scoring matrix
Dynamic Programming Algorithms
  Global Similarity: Needleman-Wunsch (GAP/BestFit)
  Local Similarity: Smith-Waterman (SSEARCH)
Heuristic Algorithms (Sequence Database Searching)
  FASTA: Based on K-Tuples (2-Amino Acid)
  BLAST: Triples of Conserved Amino Acids
  Gapped-BLAST: Allow Gaps in Segment Pairs (NREF)
  PHI-BLAST: Pattern-Hit Initiated Search (NCBI)
  PSI-BLAST: Iterative Search (NCBI)
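The heuristic searches listed above start from short words shared by the query and a database sequence (2-residue k-tuples for FASTA, 3-residue words for BLAST). The following is a minimal sketch of that seeding step only, assuming exact word matches; the real programs also score similar (non-identical) words and then extend the hits. The function names are hypothetical and belong to no existing tool.

```python
from collections import defaultdict


def kmer_index(seq, k):
    """Map each length-k word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def shared_words(query, subject, k):
    """Return (query_pos, subject_pos) pairs where an identical k-word occurs."""
    index = kmer_index(subject, k)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


def best_diagonal(hits):
    """Return (offset, count) for the diagonal (subject_pos - query_pos) with most hits."""
    counts = defaultdict(int)
    for i, j in hits:
        counts[j - i] += 1
    if not counts:
        return None
    return max(counts.items(), key=lambda item: item[1])


if __name__ == "__main__":
    query, subject = "MKTAYIAK", "GGMKTAYIAKQQ"
    for k in (2, 3):  # k=2 as in FASTA protein searches, k=3 as in BLAST
        hits = shared_words(query, subject, k)
        print(k, len(hits), best_diagonal(hits))
```

Many hits on one diagonal suggest an ungapped region of similarity worth extending, which is why the sketch counts hits per diagonal.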
8
Sequence Search by Text or Unique ID
Entrez (http://www.ncbi.nlm.nih.gov/Entrez/)
(http://pir.georgetown.edu/pirwww/search/textsearch.html)
9
Pair-Wise Comparisons
Scoring matrix
Global and local similarity: Dynamic Programming (Needleman-Wunsch, Smith-Waterman)
(http://www.ebi.ac.uk/emboss/align/)
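The difference between global (Needleman-Wunsch) and local (Smith-Waterman) dynamic programming can be sketched in a few lines. This sketch assumes a simple match/mismatch score and a linear gap penalty instead of a BLOSUM or PAM matrix, and it returns only the optimal score, not the alignment; the parameter values are illustrative assumptions.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]  # first row: leading gaps in a
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # first column: leading gaps in b
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                            prev[j] + gap,     # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal local alignment score: best-scoring pair of subsequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere (local similarity).
            cell = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATTACA"))
    print(smith_waterman_score("AAAGATTACATTT", "CCGATTACAGG"))
```

Replacing the match/mismatch test with a lookup in a substitution matrix would be the natural next step toward what the alignment servers do.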
10
FASTA Search
(http://www.ebi.ac.uk/fasta33/)
(http://pir.georgetown.edu/pirwww/search/fasta.html)
11
Gapped-BLAST Search
(http://pir.georgetown.edu/pirwww/search/pirnref.shtml)
(http://www.ncbi.nlm.nih.gov/BLAST/)
A BLAST Result
13
PSI-BLAST Iterative Search
(http://www.ncbi.nlm.nih.gov/BLAST/)
14
PSI-BLAST
15
II. Family Classification Methods
Multiple Sequence Alignment and Phylogenetic Analysis
  ClustalW Multiple Sequence Alignment
  Alignment Editor & Phylogenetic Trees
Searches Based on Family Information
  PROSITE Pattern Search
  Motif and Profile Search
  Hidden Markov Models (HMMs)
16
Multiple Sequence Alignment
ClustalW (http://pir.georgetown.edu/pirwww/search/multaln.html)
17
Alignment Editor (Jalview)
(http://www.ebi.ac.uk/clustalw/)
18
Alignment Editor (GeneDoc)
(http://www.psc.edu/biomed/genedoc/)
19
Phylogenetic Analysis
Tree Programs: (http://evolution.genetics.washington.edu/phylip.html)
Tree Searches: (http://pauling.mbu.iisc.ernet.in/~pali/index.html)
20
Phylogenetic Trees (IGFBP Superfamily)
(Radial Tree)
(Phylogram)
21
PROSITE Pattern Search
(http://pir.georgetown.edu/pirwww/search/patmatch.html)
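A PROSITE pattern search can be approximated by translating the pattern into a regular expression. The sketch below covers the common syntax elements (`x`, `[..]`, `{..}`, repeat counts and the `<` / `>` anchors) and is not the matcher used by the servers above; rarer constructs, such as an anchor inside square brackets, are not handled. The function names are hypothetical. The example pattern `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site pattern, and the test sequence is made up.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}.' into a Python regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        start = end = ""
        if element.startswith("<"):   # pattern anchored at the N-terminus
            start, element = "^", element[1:]
        if element.endswith(">"):     # pattern anchored at the C-terminus
            end, element = "$", element[:-1]
        m = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError("unsupported pattern element: " + element)
        core, low, high = m.groups()
        if core == "x":               # x = any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # {..} = any residue except these
        repeat = ""
        if low is not None:           # (n) or (n,m) repetition
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(start + core + repeat + end)
    return "".join(parts)


def find_pattern(pattern, sequence):
    """Return (0-based start, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(find_pattern("N-{P}-[ST]-{P}.", "MNGTAANPSA"))
```

The lookahead in `find_pattern` is a design choice so that overlapping occurrences are reported, which a plain `finditer` would skip.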
22
Profile Search
(http://bmerc-www.bu.edu/bioinformatics/profile_request.html)
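A profile captures position-specific residue preferences from a family alignment and is then slid along a query sequence. The sketch below builds a simple log-odds position-specific scoring matrix; it assumes an ungapped alignment, a uniform background frequency and a fixed pseudocount, all of which are simplifications and not the method of the profile server above. The toy alignment and query are invented for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one dict per column mapping residue -> log2(observed / background)."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for col in range(len(alignment[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in alignment:
            if seq[col] in counts:
                counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2(counts[aa] / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def best_window(pssm, seq):
    """Slide the profile along seq; return (start, score) of the best window."""
    width = len(pssm)
    best = None
    for start in range(len(seq) - width + 1):
        score = sum(pssm[i].get(seq[start + i], 0.0) for i in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best


if __name__ == "__main__":
    family = ["HEAG", "HEAG", "HDAG", "HEVG"]  # hypothetical aligned motif
    profile = build_pssm(family)
    print(best_window(profile, "MKKHEAGLL"))
```

Unlike a fixed pattern, the profile still gives a graded score to a window such as `HDVG` that no family member contains exactly.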
23
Hidden Markov Model Search
(http://www.sanger.ac.uk/Software/Pfam/search.shtml)
(http://smart.embl-heidelberg.de)
24
III. Structural Prediction Methods
Signal Peptide: SIGFIND, SignalP
Transmembrane Helix: TMHMM, TMAP
2D Prediction (α-helix, β-sheet, Coiled-coils): PHD, JPred
3D Modeling: Homology Modeling (Modeller, SWISS-MODEL), Threading, Ab-initio Prediction
25
Structure Prediction: A Guide
(http://speedy.embl-heidelberg.de/gtsp/flowchart2.html)
26
Protein Prediction Server
(http://www.cbs.dtu.dk/services/)
27
Signal Peptide Prediction
(http://www.stepc.gr/~synaptic/sigfind.html)
(http://www.cbs.dtu.dk/services/SignalP-2.0)
28
Transmembrane Helix
(http://www.cbs.dtu.dk/services/TMHMM/)
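TMHMM uses a hidden Markov model, which is beyond a short example. As a much simpler stand-in, the classical Kyte-Doolittle hydropathy plot flags stretches whose average hydropathy over a sliding window is high. The sketch below assumes the commonly quoted window of 19 residues and a threshold of 1.6; it is a rough screen under those assumptions, not a reimplementation of any server listed here, and the test sequence is artificial.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Average hydropathy of each window; unknown residues count as 0."""
    return [
        sum(KD.get(aa, 0.0) for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """0-based start positions of windows at or above the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, value in enumerate(profile) if value >= threshold]


if __name__ == "__main__":
    # Artificial sequence: a 19-residue leucine stretch flanked by charged residues.
    seq = "D" * 10 + "L" * 19 + "K" * 10
    print(candidate_tm_windows(seq))
```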
29
Protein Structure Prediction
(http://cmgm.stanford.edu/WWW/www_predict.html)
(http://restools.sdsc.edu/biotools/biotools9.html)
30
Structure Prediction Server
(http://cubic.bioc.columbia.edu/predictprotein/)
(http://www.compbio.dundee.ac.uk/WWW_Servers/JPred/jpred.html)
31
3D-Modelling
(http://www.salilab.org/modeller/modeller.html)
(http://www.expasy.ch/swissmod/SWISS-MODEL.html)
32
IV. Protein Family Databases
Whole Proteins
  PIR: Superfamilies and Families
  COG (Clusters of Orthologous Groups) of Complete Genomes
  ProtoNet: Automated Hierarchical Classification of Proteins
Protein Domains
  Pfam: Alignments and HMM Models of Protein Domains
  SMART: Protein Domain Families
Protein Motifs
  PROSITE: Protein Patterns and Profiles
  BLOCKS: Protein Sequence Motifs and Alignments
  PRINTS: Protein Sequence Motifs and Signatures
Integrated Family Databases
  iProClass: Superfamilies/Families, Domains, Motifs, Rich Links
  InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART
33
Protein Clustering
(http://www.ncbi.nlm.nih.gov/COG/)
34
Protein Domains
Pfam (http://www.sanger.ac.uk/Software/Pfam/)
SMART (http://smart.embl-heidelberg.de/smart/show_motifs.pl)
35
Protein Motifs
PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles. (http://www.expasy.ch/prosite/)
36
Integrated Family Classification
InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF. (http://www.ebi.ac.uk/interpro/search.html)
37
V. Databases of Protein Functions
Metabolic Pathways, Enzymes, and Compounds
  Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
  KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
  LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
  EcoCyc: Encyclopedia of E. coli Genes and Metabolism
  MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
  WIT: Functional Curation and Metabolic Models
  BRENDA: Enzyme Database
  UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
  Klotho: Collection and Categorization of Biological Compounds
Cellular Regulation and Gene Networks
  EpoDB: Genes Expressed during Human Erythropoiesis
  BIND: Descriptions of interactions, molecular complexes and pathways
  DIP: Catalogs experimentally determined interactions between proteins
  RegulonDB: Escherichia coli Pathways and Regulation
38
KEGG Metabolic & Regulatory Pathways
(http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00590+874)
KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html)
39
BioCyc (EcoCyc/MetaCyc Metabolic Pathways)
The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)
40
Protein-Protein Interactions: DIP
(http://dip.doe-mbi.ucla.edu/)
41
Protein-Protein Interactions: BIND
(http://www.bind.ca/)
42
BioCarta Cellular Pathways
(http://www.biocarta.com/index.asp)
43
VI. Databases of Protein Structures
Protein Structure and Classification
  PDB: Structure Determined by X-ray Crystallography and NMR
  CATH: Hierarchical Classification of Protein Domain Structures
  SCOP: Familial and Structural Protein Relationships
  FSSP: Protein Fold Family Database
Protein Sequence-Structure Relationship
  PIR-NRL3D: Protein Sequence-Structure Database
  PIR-RESID: Protein Structure/Post-Translational Modifications
  HSSP: Families and Alignments of Structurally-Conserved Regions
44
PDB Structure Data
(http://www.rcsb.org/pdb/)
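PDB entries are distributed as fixed-column text records, so coordinates can be read by slicing each `ATOM` line at the column positions defined by the PDB file format. The sketch below reads only a handful of fields; the sample record is made up for illustration and the function names are hypothetical. For real work an established parser is a safer choice.

```python
def parse_atom_line(line):
    """Extract selected fields from one fixed-column ATOM/HETATM record."""
    return {
        "serial": int(line[6:11]),          # columns 7-11
        "name": line[12:16].strip(),        # columns 13-16
        "res_name": line[17:20].strip(),    # columns 18-20
        "chain": line[21],                  # column 22
        "res_seq": int(line[22:26]),        # columns 23-26
        "x": float(line[30:38]),            # columns 31-38
        "y": float(line[38:46]),            # columns 39-46
        "z": float(line[46:54]),            # columns 47-54
        "element": line[76:78].strip(),     # columns 77-78 (may be absent)
    }


def parse_atoms(text):
    """Parse every ATOM or HETATM record found in a PDB-format string."""
    return [
        parse_atom_line(line)
        for line in text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    ]


# Illustrative record (not taken from a specific PDB entry).
SAMPLE = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"

if __name__ == "__main__":
    print(parse_atom_line(SAMPLE))
```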
45
PDBsum: Summary and Analysis
(http://www.biochem.ucl.ac.uk/bsm/pdbsum)
46
Protein Structural Classification
CATH: Hierarchical domain classification of protein structures (http://www.biochem.ucl.ac.uk/bsm/cath_new/)
47
Protein Structural Classification
(http://scop.mrc-lmb.cam.ac.uk/scop/)
The SCOP database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in the PDB.
48
VII. Proteomic Resources
GELBANK (http://gelbank.anl.gov): 2D-gel patterns from completed genomes; SWISS-2DPAGE (http://www.expasy.org/ch2d/)
PEP: Predictions for Entire Proteomes (http://cubic.bioc.columbia.edu/pep/): Summarized analyses of protein sequences
Proteome BioKnowledge Library (http://www.proteome.com): Detailed information on human, mouse and rat proteomes
Proteome Analysis Database (http://www.ebi.ac.uk/proteome/): Online application of InterPro and CluSTr for the functional classification of proteins in whole genomes
Expression Profiling databases:
  GNF (http://expression.gnf.org/cgi-bin/index.cgi): human and mouse transcriptome
  SMD (http://genome-www5.stanford.edu/MicroArray/SMD/): Stanford microarray data analysis
  EBI Microarray Informatics (http://www.ebi.ac.uk/microarray/index.html): managing, storing and analyzing microarray data
49
2D-Gel Image Databases (1)
(http://gelbank.anl.gov/2dgels/index.asp)
50
2D-Gel Image Databases (2)
(http://us.expasy.org/ch2d/2d-index.html)
(http://us.expasy.org/cgi-bin/nice2dpage.pl?P06493)
51
VIII. Proteome Analysis
(http://www.ebi.ac.uk/proteome)
52
Expression Profiling
Human and Mouse Transcriptome
(http://expression.gnf.org/cgi-bin/index.cgi)
(http://genome-www.stanford.edu/serum/)
53
Lab: Visit selected websites and analyze some protein sequences of your own choice.
- A list of the Bioinformatics Resources in this tutorial is available at: http://pir.georgetown.edu/~huz/bioinfo_resource.html
Try some of the following sequences for analysis:
1) well characterized proteins: PIR:A26366(CYP17), JS0747(Sp1)
2) less characterized proteins: PIR:A59000(MATER), TrEMBL:Q9QY16(GRTH)
3) hypothetical proteins: PIR:T12515, T00338, T47130; SWISS-PROT:Q9BWT7
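When working through the lab list with scripts, it helps to split entries such as `PIR:A26366(CYP17)` into database, accession and protein name. The sketch below does this with one regular expression; it assumes that an accession without a database prefix belongs to the database named most recently on the same line, which is a reading of the slide and not something it states. The function name is hypothetical.

```python
import re

# Optional "DB:" prefix, an accession, and an optional "(name)" suffix.
ENTRY = re.compile(
    r"(?:(?P<db>[A-Za-z-]+):)?(?P<acc>[A-Z0-9]+)\s*(?:\((?P<name>[^)]+)\))?"
)


def parse_entries(text):
    """Return (database, accession, name) tuples; name is None when absent."""
    entries = []
    current_db = None
    for match in ENTRY.finditer(text):
        if match.group("db"):
            current_db = match.group("db")
        # Assumption: an unprefixed accession inherits the previous database.
        entries.append((current_db, match.group("acc"), match.group("name")))
    return entries


if __name__ == "__main__":
    for line in (
        "PIR:A26366(CYP17), JS0747(Sp1)",
        "PIR:A59000(MATER) TrEMBL:Q9QY16(GRTH)",
        "PIR:T12515, T00338 , T47130 SWISS-PROT:Q9BWT7",
    ):
        print(parse_entries(line))
```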