Sequence Based Analysis Tutorial
March 26, 2004
NIH Proteomics Workshop
Lai-Su L. Yeh, Ph.D.
Protein Science Team Lead
Protein Information Resource at Georgetown University Medical Center

Retrieval, Sequence Search & Classification Methods
• Retrieve protein info by text / UID
• Sequence Similarity Search
  • BLAST, FASTA, Dynamic Programming
• Family Classification
  • Patterns, Profiles, Hidden Markov Models, Sequence Alignments, Neural Networks
• Integrated Search and Classification System

Sequence Similarity Search
• Based on Pair-Wise Comparisons
• Dynamic Programming Algorithms
  • Global Similarity: Needleman-Wunsch
  • Local Similarity: Smith-Waterman
• Heuristic Algorithms
  • FASTA: Based on K-Tuples (2-Amino Acid)
  • BLAST: Triples of Conserved Amino Acids
  • Gapped-BLAST: Allow Gaps in Segment Pairs
  • PHI-BLAST: Pattern-Hit Initiated Search
  • PSI-BLAST: Position-Specific Iterated Search
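
To make the dynamic programming idea concrete, here is a minimal sketch of Smith-Waterman local alignment. It assumes a flat match/mismatch score and a linear gap penalty (the values 2, -1, -2 are illustrative choices, not from the slides); a real search would use a substitution matrix such as PAM250 or BLOSUM 62.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Scoring values are illustrative assumptions, not a published scheme.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below 0, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("MKHEAGLL", "PPHEAGWW")
```

Needleman-Wunsch differs mainly in dropping the floor at 0 and tracing back from the bottom-right corner, so the whole of both sequences is aligned.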

Sequence Similarity Search
• Similarity Search Parameters
  • Scoring Matrices – Based on Conserved Amino Acid Substitution
    • Dayhoff Mutation Matrix, e.g., PAM250 (~20% Identity)
    • Henikoff Matrix from Ungapped Alignments, e.g., BLOSUM 62
  • Gap Penalty
• Search Time Comparisons
  • Smith-Waterman: 10 Min
  • FASTA: 2 Min
  • BLAST: 20 Sec

Feature Representation
• Features: Residue Physicochemical Properties, Context (Local & Global) Features, Evolutionary Features
• Alternative Alphabets: Classification of Amino Acids To Capture Different Features of Amino Acid Residues
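
As a rough illustration of an alternative alphabet, the sketch below recodes a protein sequence into four physicochemical classes. The grouping and the class symbols (`h`, `p`, `+`, `-`) are assumptions made for this example; other groupings are equally defensible.

```python
# One illustrative grouping of the 20 standard amino acids (an assumption,
# not a grouping prescribed by the slides).
GROUPS = {
    "h": "AVLIMFWC",  # hydrophobic
    "p": "STNQYGP",   # polar, uncharged
    "+": "KRH",       # positively charged
    "-": "DE",        # negatively charged
}

# Invert the table: residue -> class symbol.
REDUCED = {aa: code for code, members in GROUPS.items() for aa in members}


def recode(seq, unknown="x"):
    """Translate a sequence into the reduced alphabet; unknown residues become 'x'."""
    return "".join(REDUCED.get(aa, unknown) for aa in seq.upper())


reduced = recode("MKDES")
```

Sequences that differ residue by residue can look identical in the reduced alphabet, which is the point: the recoding keeps one feature and discards the rest.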

Substitution Matrix
• Likelihood of One Amino Acid Mutated into Another Over Evolutionary Time
• Negative Score: Unlikely to Happen (e.g., Gly/Trp, -7)
• Positive Score: Conservative Substitution (e.g., Lys/Arg, +3)
• High Score for Identical Matches: Rare Amino Acids (e.g., Trp, Cys)
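
The sign of a matrix entry follows from a log-odds ratio: how often a pair is observed in related sequences versus how often it would pair by chance. The sketch below shows the arithmetic only. All frequencies are hypothetical numbers chosen for illustration, and the scale factor of 2 (half-bit units) is an assumption; published matrices use their own scaling conventions.

```python
import math


def log_odds(q_ij, p_i, p_j, scale=2.0):
    """Rounded log-odds score: scale * log2(observed pair frequency / chance frequency)."""
    return round(scale * math.log2(q_ij / (p_i * p_j)))


# Hypothetical frequencies, for illustration only.
favoured = log_odds(0.02, 0.1, 0.1)        # seen twice as often as chance: positive
disfavoured = log_odds(0.0025, 0.1, 0.1)   # seen a quarter as often as chance: negative

# An identical match between rare residues scores higher than one between
# common residues, because the chance term p_i * p_j is much smaller.
rare_identity = log_odds(0.006, 0.01, 0.01)
common_identity = log_odds(0.02, 0.08, 0.08)
```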

BLAST
• BLAST (Basic Local Alignment Search Tool)
  • To search a sequence against the database
  • Extremely fast
  • Robust
  • Most widely used
• It finds very short segment pairs between the query and sequence in the database
• These segments are then extended in both directions until the maximum possible score of this particular segment is reached
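
The seed-and-extend idea described above can be sketched as follows. This is a simplified illustration, not the BLAST implementation: it assumes exact 3-residue word matches as seeds, a flat match/mismatch score, and a hypothetical `drop` cutoff that stops extension once the running score falls too far below the best score seen.

```python
def seed_hits(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where an identical k-residue word occurs."""
    index = {}
    for i in range(len(query) - k + 1):
        index.setdefault(query[i:i + k], []).append(i)
    hits = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits


def _extend_one_way(query, subject, qi, sj, step, match, mismatch, drop):
    """Walk in one direction; return (best score gained, residues kept)."""
    best = run = 0
    best_len = length = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        run += match if query[qi] == subject[sj] else mismatch
        length += 1
        if run > best:
            best, best_len = run, length
        elif best - run > drop:
            break  # score has fallen too far below the best seen
        qi += step
        sj += step
    return best, best_len


def extend(query, subject, i, j, k=3, match=1, mismatch=-1, drop=2):
    """Ungapped extension of a seed at (i, j); returns (score, query_start, query_end)."""
    right, r_len = _extend_one_way(query, subject, i + k, j + k, +1, match, mismatch, drop)
    left, l_len = _extend_one_way(query, subject, i - 1, j - 1, -1, match, mismatch, drop)
    return k * match + left + right, i - l_len, i + k + r_len


query, subject = "MKTAYIAKQR", "GGTAYIAKWW"
hits = seed_hits(query, subject)
score, start, end = extend(query, subject, *hits[0])
```

Here `query[start:end]` is the extended segment. Gapped-BLAST adds a further step that allows gaps when joining such segments.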

BLAST Search
• From BLAST Search Interface
• Table-Format Result with BLAST Output and SSEARCH (Smith-Waterman) Pair-Wise Alignment

BLAST/SSEARCH Results
• SSEARCH Alignment
• BLAST Alignment

Family Classification Methods
• Based on Family Information
• ClustalW Multiple Sequence Alignment
• ProSite Pattern Search
• Profile Search
• Hidden Markov Models (HMMs)
• Neural Networks
• Integrated Analysis

Multiple Sequence Alignment
• ClustalW
• Progressive Pairwise Approach
  • Based on Exhaustive Pairwise Alignments
• Neighbor Joining
  • Joining Order Corresponding to a Tree
• Alignment Varies
  • Dependent on Joining Order
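
To show how a joining order falls out of pairwise distances, here is a compact sketch of neighbor joining that returns only the order of joins (branch lengths are omitted). The five-sequence distance matrix is hypothetical, and ties are broken by taking the first pair encountered, which is an arbitrary choice of this sketch.

```python
def neighbor_joining_order(names, dist):
    """Return the sequence of joins chosen by neighbor joining.

    dist maps frozenset({x, y}) -> distance for every pair of names.
    """
    nodes = list(names)
    d = dict(dist)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each node to all others.
        r = {x: sum(d[frozenset((x, y))] for y in nodes if y != x) for x in nodes}
        best = None
        for ai in range(n):
            for bi in range(ai + 1, n):
                a, b = nodes[ai], nodes[bi]
                q = (n - 2) * d[frozenset((a, b))] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"({a},{b})"  # label for the new internal node
        for k in nodes:
            if k not in (a, b):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((a, k))] + d[frozenset((b, k))] - d[frozenset((a, b))]
                )
        nodes = [k for k in nodes if k not in (a, b)] + [u]
        joins.append((a, b))
    joins.append((nodes[0], nodes[1]))
    return joins


# Hypothetical pairwise distances between five sequences.
pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
dist = {frozenset(p): v for p, v in pairs.items()}
joins = neighbor_joining_order(["a", "b", "c", "d", "e"], dist)
```

A progressive aligner would then align sequences and sub-alignments in the order given by `joins`, which is why a different joining order can produce a different final alignment.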

How do you build a tree?
• Pick sequences to align
• Align them
• Verify the alignment
• Keep the parts that are aligned correctly
• Build and evaluate a phylogenetic tree

Multiple Alignment and Tree
• From Text/Sequence Search Result or ClustalW Alignment Interface

Motif Patterns (Regular Expressions)
• Signature Patterns for Functional Motifs
• ProClass Motif Alignments

PIR Pattern Search
• From Text/Sequence Search Result or Pattern Search Interface
• One Query Sequence Against PROSITE Pattern Database
• One Query Pattern (PROSITE or User-Defined) Against Sequence DB

Pattern Search Result (I)
• One Query Sequence Against PROSITE Pattern Database

Pattern Search Result (II)
• One Query Pattern Against Sequence Database

Profile Method
• Profile: A Table of Scores to Express Family Consensus Derived from Multiple Sequence Alignments
  • Num of Rows = Num of Aligned Positions
  • Each row contains a score for the alignment with each possible residue.
• Profile Searching
  • Summation of Scores for Each Amino Acid Residue along Query Sequence
  • Higher Match Values at Conserved Positions
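
A minimal sketch of the profile idea: one row per aligned position, one score per residue in each row, and a query window scored by summing the matching entries. It assumes an ungapped alignment, a uniform background frequency, and a pseudocount of 1; these are simplifications chosen for this example, and the four aligned sequences are made up.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one dict per aligned column mapping residue -> log-odds score."""
    background = 1.0 / len(ALPHABET)  # uniform background (an assumption)
    profile = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def score_window(profile, window):
    """Sum the per-position scores for a window as long as the profile."""
    return sum(row[aa] for row, aa in zip(profile, window))


def best_match(profile, query):
    """Slide the profile along the query; return (best score, start position)."""
    width = len(profile)
    return max((score_window(profile, query[i:i + width]), i)
               for i in range(len(query) - width + 1))


profile = build_profile(["GKST", "GKTT", "GRST", "GKSS"])
score, start = best_match(profile, "AAGKSTAA")
```

The fully conserved first column gives G a higher match value than the partly conserved second column gives K, which is the behaviour the slide describes.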

PIR HMM Domain/Motif Search
• From Text/Sequence Search Result or HMM Search Interface
• HMMER Model Building & Sequence Search
• Search One Query Protein Against All HMMs
• Search One HMM Against Sequence DB

HMM Search Result (I)
• One Query Protein Against All Pfam HMMs

HMM Search Result (II)
• Search User-Built HMM Against Protein Sequence DB
• Input Sequences (Optional Residue Ranges) -> Multiple Sequence Alignment -> Model Building -> HMM Search
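
Outside the web interface, the model building and search steps of this workflow correspond to the HMMER command-line programs `hmmbuild` and `hmmsearch`. The sketch below only assembles and optionally runs those two commands from Python; it assumes HMMER is installed locally and that the multiple sequence alignment already exists. All file names are hypothetical placeholders.

```python
import shutil
import subprocess


def hmmer_commands(msa_path, hmm_path, seqdb_path):
    """Return the two command lines: build a model from an alignment, then search with it."""
    return [
        ["hmmbuild", hmm_path, msa_path],     # alignment -> profile HMM
        ["hmmsearch", hmm_path, seqdb_path],  # profile HMM -> hits in the sequence DB
    ]


def run_pipeline(msa_path, hmm_path, seqdb_path):
    """Run both steps, failing early if HMMER is not available on PATH."""
    for tool in ("hmmbuild", "hmmsearch"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH; is HMMER installed?")
    for cmd in hmmer_commands(msa_path, hmm_path, seqdb_path):
        subprocess.run(cmd, check=True)


# Hypothetical file names, for illustration only.
commands = hmmer_commands("family.aln", "family.hmm", "seqdb.fasta")
```

Older HMMER releases also expected a calibration step between building and searching; check the documentation for the installed version.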

Secondary Structure Features
• α Helix: Patterns of Hydrophobic Residue Conservation Showing I, I+3, I+4, I+7 Pattern Are Highly Indicative of an α Helix (Amphipathic)
• β Strands That Are Half Buried in the Protein Core Will Tend to Have Hydrophobic Residues at Positions I, I+2, I+4, I+6
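
The two spacing rules above can be checked mechanically. The sketch below scans a single sequence for start positions where every offset in the pattern lands on a hydrophobic residue. Two simplifications are assumed: the set of residues treated as hydrophobic is an illustrative choice, and a single sequence stands in for the conservation pattern one would normally read from an alignment.

```python
HYDROPHOBIC = set("AVLIMFWC")  # illustrative choice of hydrophobic residues

HELIX_OFFSETS = (0, 3, 4, 7)   # i, i+3, i+4, i+7
STRAND_OFFSETS = (0, 2, 4, 6)  # i, i+2, i+4, i+6


def pattern_starts(seq, offsets):
    """Return positions i where seq[i + o] is hydrophobic for every offset o."""
    span = max(offsets)
    return [i for i in range(len(seq) - span)
            if all(seq[i + o] in HYDROPHOBIC for o in offsets)]


helix_like = pattern_starts("LKKLLKKL", HELIX_OFFSETS)
strand_like = pattern_starts("VKVKVKV", STRAND_OFFSETS)
```

A hit is only suggestive; the slide's claim is about tendencies, not a rule that classifies every segment.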

Integrated Bioinformatics System for Function and Pathway Discovery
• Data Integration
• Associative Analysis
Integrated Bioinformatics System (diagram components):
• User Input (Local Data, Search Criteria, Report Format)
• Input (Gene/Protein Expression Data)
• Graphical User Interface (Browsing, Querying, Navigation)
• Data Mining Tools (Retrieval, Visualization, Analysis, Correlation)
• Sequence Analysis Pipeline (Family Classification & Feature Identification)
• Data Warehouse (Gene, Protein, Family, Function, Structure, Pathway, Interaction)
• Output (Analysis Results, Biological Interpretation)

Analytical Pipeline
Diagram labels:
• Query Sequence; PIR-NREF; iProClass
• Top-Matched Superfamilies/Domains
• BLAST Search; HMM Domain Search
• Predicted Superfamilies/Domains/Motifs/Sites/Signal Peptides/TMHs
• SSEARCH; CLUSTALW
• Superfamily/Domain/Motif Alignments
• Family Relationships & Functional Features
• Family Classification & Functional Analysis
• HMM Motif Search; Pattern Search; SignalP/TMHMM

Integrated Bioinformatics System
• Global Bioinformatics Analysis of 1000’s of Genes and Proteins
• Pathway Discovery, Target Identification
Diagram labels:
• Gene Expression Data; Proteomic Data
• Clustering
• Expression Pattern
• Visualization & Statistical Analysis
• Clustered Matrix; Pathway Map; Process Hierarchy; Clustered Graph
• Gene/Peptide-Protein Mapping
• Pathway Discovery (Browsing, Sorting, Visualization & Statistical Analysis)
• Functional Analysis (Sequence Analysis & Information Retrieval)
• Integrated Protein Knowledge System
• Comprehensive Protein Information Matrix
• Protein List

Lab Section
• Peptide Search & Results
• Blast Similarity Search
• Blast Search Results
• Pair-Wise Alignment
• Multiple Sequence Alignment
• Pattern Search Results
• HMM Domain Search Result
• Building HMM Profile
• Using HMM Profile for Searching
• Rabbit Alpha Crystallin A Chain: An iProClass View of the entry