Identify conserved motifs - patterns (PROSITE) Profiles ...bio.biomedicine.gu.se/courses/bio2/lecture2.pdf · hth14 LVLHDIAEAVGMHESTISRVTT hth15 LNLRIVADAIKMHESTVSRVTS hth16 MTRGDIGNYLGLTVETISRLLG

Multiple alignments - applications

Identify conserved motifs - patterns (PROSITE)
Profiles (Pfam)
Phylogenetic studies
Prediction of protein secondary structure
Experimental: design of probes

Example of an experimental application of multiple sequence alignment (MSA): probes may be designed for PCR to amplify a region of DNA.

GCCCAAATTACGGGGACAACGGATCTTGGGTTATC......CGGGACGGG
GCCCAAATTACGGGGTACTTACGCGGGGACTTTAT......CGGGACGGG
GCCCAAATTACGGGGACGGACTTAGC...............CGGGACGGG
GCCCAAATTACGGGGCGAGTCTATCTTTTACTTATCTTT..CGGGACGGG
GCCCAAATTACGGGGCGGACTTTACTTATCTTTTTCTTT..CGGGACGGG
GCCCAAATTACGGGGACGGACGGCGATCGAGCGATCG....CGGGACGGG
GCCCAAATTACGGGGACGACGTACGTGAGCC..........CGGGACGGG
GCCCAAATTACGGGGACAATTTATCTATCTTTATC......CGGGACGGG
GCCCAAATTACGGGGACAACGATCGTGACTGACTG......CGGGACGGG
GCCCAAATTACGGGGACAATACGGGACTTATCGGGCTTCC.CGGGACGGG
GCCCAAATTACGGGGCGGAGCGGAGCGAGCGGGACGGGCG.CGGGACGGG
GCCCAAATTACGGGGACGAGCGGCATCTACTTCGCGCTA..CGGGACGGG
GCCCAAATTACGGGGAAAACAATTCTATCTTTATCGCAAAACGGGACGGG
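The conserved flanks in an alignment like this are the natural candidates for probe sites. As a minimal sketch (the function name `conserved_blocks`, the `min_len` cutoff, and the use of only the first three rows are illustrative assumptions, not part of the lecture), runs of fully conserved columns can be located like this:

```python
def conserved_blocks(alignment, min_len=8, gap_chars=".-"):
    """Return (start, block) for runs of columns identical in every row.

    Columns containing a gap character never count as conserved.
    `start` is a 0-based column index.
    """
    length = len(alignment[0])
    assert all(len(row) == length for row in alignment), "rows must be equal length"
    blocks, start = [], None
    for col in range(length + 1):          # one extra step flushes the last run
        conserved = False
        if col < length:
            column = {row[col] for row in alignment}
            conserved = len(column) == 1 and column.isdisjoint(gap_chars)
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_len:
                blocks.append((start, alignment[0][start:col]))
            start = None
    return blocks

# First three rows of the alignment shown above.
rows = [
    "GCCCAAATTACGGGGACAACGGATCTTGGGTTATC......CGGGACGGG",
    "GCCCAAATTACGGGGTACTTACGCGGGGACTTTAT......CGGGACGGG",
    "GCCCAAATTACGGGGACGGACTTAGC...............CGGGACGGG",
]
blocks = conserved_blocks(rows)
print(blocks)
```

The two blocks it reports are the left and right flanks; real primer design would additionally consider melting temperature, strand orientation, and degeneracy, which this sketch ignores.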

Multiple sequence alignment

PILEUP

PileUp does a series of progressive, pairwise alignments between sequences and clusters of sequences to generate the final multiple alignment. A cluster consists of two or more already-aligned sequences.

PileUp begins by doing pairwise alignments that score the similarity between every possible pair of sequences. These similarity scores are used to create a clustering order that can be represented as a dendrogram. The clustering strategy represented by the dendrogram is called UPGMA, which stands for unweighted pair-group method using arithmetic averages (Sneath, P.H.A. and Sokal, R.R. (1973) in Numerical Taxonomy (pp. 230-234), W.H. Freeman and Company, San Francisco, California, USA).

The dendrogram shows the order of the pairwise alignments of sequences and clusters of sequences that together generate the final alignment. For example:

[Figure: dendrogram of five sequences, Seq1 through Seq5]

PileUp aligns the two most related sequences to each other in order to produce the first cluster. It then aligns the next most related sequence to this cluster or the next two most-related sequences to each other in order to produce another cluster. A series of such pairwise alignments that includes increasingly dissimilar sequences and clusters of sequences at each iteration produces the final alignment.

In the above example, Seq1 and Seq2 are aligned first. Next, Seq3 and Seq4 are aligned. The cluster of Seq1-aligned-to-Seq2 is then aligned to the cluster of Seq3-aligned-to-Seq4. Finally, Seq5 is aligned to the cluster that now contains Seq1 through Seq4 to generate the final alignment of Seq1 through Seq5.

Each pairwise alignment in PileUp uses the method of Needleman and Wunsch (Journal of Molecular Biology 48: 443-453 (1970)), extended for use with clusters of aligned sequences rather than only individual sequences. For a pairwise alignment of individual sequences, the comparison score between any two sequence symbols is found in a scoring matrix. For a pairwise alignment of clusters of sequences, the comparison score between any two positions in those clusters is simply the arithmetic average of the scores for all possible symbol comparisons at those positions. When gaps are inserted into a cluster to produce an alignment, they are inserted at the same position in all of the sequences of the cluster.
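The averaging rule for cluster positions can be illustrated with a short sketch. Here `simple_score` is a hypothetical stand-in for a real scoring-matrix lookup, and the handling of gap symbols inside a column is deliberately left out; this is not PileUp's actual code.

```python
from itertools import product

def simple_score(a, b, match=1.0, mismatch=0.0):
    """Stand-in for a scoring-matrix lookup (values are illustrative)."""
    return match if a == b else mismatch

def cluster_position_score(column_x, column_y, score=simple_score):
    """Arithmetic average of the scores for all symbol pairs, one symbol
    taken from a column of cluster X and one from a column of cluster Y."""
    pairs = list(product(column_x, column_y))
    return sum(score(a, b) for a, b in pairs) / len(pairs)

# Cluster X has A and A at this position; cluster Y has A and G.
# Pairs: A-A, A-G, A-A, A-G, so the average is (1 + 0 + 1 + 0) / 4.
example = cluster_position_score("AA", "AG")
print(example)  # 0.5
```

With single-sequence "clusters" the function reduces to an ordinary scoring-matrix lookup, which is why the same dynamic programming recursion can be reused at every step.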

CLUSTAL

ClustalW creates a multiple alignment using methods reminiscent of those of PileUp. One difference between the programs: during the multiple alignment, terminal gaps are penalised in Clustal but not in PileUp. This makes the PileUp alignments better when the sequences are of very different lengths (it has no effect if there are no large terminal gaps).

CLUSTALW (W = weighting: different weights are given to sequences, and to parameters at different positions in the alignment) ftp://ftp.sunet.se/pub/molbio/align/clustal

See documentation in file clustalv.doc

CLUSTAL W (1.7) Multiple Sequence Alignments

Sequence format is Pearson
Sequence 1: Seq1 26 aa
Sequence 2: Seq2 33 aa
Sequence 3: Seq3 30 aa
Sequence 4: Seq4 31 aa
Sequence 5: Seq5 26 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score: 61
Sequences (1:3) Aligned. Score: 50
Sequences (1:4) Aligned. Score: 15
Sequences (1:5) Aligned. Score: 11
Sequences (2:3) Aligned. Score: 46
Sequences (2:4) Aligned. Score: 29
Sequences (2:5) Aligned. Score: 38
Sequences (3:4) Aligned. Score: 40
Sequences (3:5) Aligned. Score: 34
Sequences (4:5) Aligned. Score: 61
Guide tree file created: [t.dnd]
Start of Multiple Alignment
There are 4 groups
Aligning...
Group 1: Sequences: 2 Score:371
Group 2: Sequences: 3 Score:180
Group 3: Sequences: 2 Score:351
Group 4: Sequences: 5 Score:136
Alignment Score 209
CLUSTAL-Alignment file created [t.aln]

Examples of Clustal parameters:

Pairwise
- gap opening
- gap extension
- substitution matrix

Multiple alignment
- gap opening
- gap extension
- substitution matrix
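To make the roles of the gap opening and gap extension parameters concrete, here is a minimal sketch of an affine gap penalty. The function name, the convention used (opening charged once, extension charged for every gap position), and the numeric values are illustrative assumptions; they are not Clustal's exact internal formula or its defaults.

```python
def affine_gap_penalty(gap_length, gap_open, gap_extend):
    """Cost of a single gap under an assumed affine convention:
    the opening cost is paid once, the extension cost per gap position."""
    if gap_length <= 0:
        return 0.0
    return gap_open + gap_extend * gap_length

# Hypothetical parameter values, chosen only for illustration.
one_long_gap = affine_gap_penalty(6, gap_open=10.0, gap_extend=0.5)
three_short_gaps = 3 * affine_gap_penalty(2, gap_open=10.0, gap_extend=0.5)
print(one_long_gap, three_short_gaps)  # 13.0 33.0
```

Both cases remove six positions in total, but a high opening cost relative to the extension cost makes one long gap much cheaper than several short ones, which is the behaviour these two parameters are meant to control.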

Multiple alignment software

Pileup (GCG)

Clustalw / Clustalx

MSA (program that in principle finds the true optimal multiple alignment by the dynamic programming method)

T-coffee

Multiple alignment editors/viewers

SeqLab (GCG)
MACAW (search for motifs, blocks)
Jalview
CINEMA
Genedoc
Boxshade
Mview

Clustalx

njplot

Multiple sequence alignment formatting Jalview

Multiple sequence alignment formatting Boxshade

hth01 TPRQKVAIIYDVGVSTLYKRFP
hth02 IPRKQVAIIYDVAVSTLYKKFP
hth03 HPRQQLAIIFGIGVSTLYRYFP
hth04 GSKTKLAQAAGIRLASLYSWKG
hth05 TTFKQIALESGLSTGTISSFIN
hth06 IPYQEFAKLIGKSTGAVRRMID
hth07 VTLQQFAELEGVSERTAYRWTT
hth08 FTYNQYAQMMNISRENAYGVLA
hth09 LGASHISKTMNIARSTYVKVIN
hth10 TGATEIAHQLSIARSTVYKILE
hth11 ISISAIAREFNTTRQTILRVKA
hth12 GNISALADAENISRKIITRCIN
hth13 MVLADIAQAVEMHESTISRVTT
hth14 LVLHDIAEAVGMHESTISRVTT
hth15 LNLRIVADAIKMHESTVSRVTS
hth16 MTRGDIGNYLGLTVETISRLLG
hth17 LSLSALSRQFGYAPTTLANALE
hth18 MSLAELGRSNGLSSSTLKNALD
hth19 FDIASVAQHVCLSPSRLSHLFR
hth20 LRIDEVARHVCLSPSRLAHLFR
hth21 VTLEALADQVAMSPFHLHRLFK
hth22 VLYPDIAKKFNTTASRVERAIR

Multiple sequence alignment formatting Mview

Multiple alignments - applications

Identify conserved motifs - patterns (PROSITE)
Profiles (Pfam)
Phylogenetic studies
Prediction of protein secondary structure
Experimental: design of probes

Profiles: Example with HTH (helix turn helix) motif

TPRQKVAIIY DVGVSTLYKR FP
IPRKQVAIIY DVAVSTLYKK FP
HPRQQLAIIF GIGVSTLYRY FP
GSKTKLAQAA GIRLASLYSW KG
TTFKQIALES GLSTGTISSF IN
IPYQEFAKLI GKSTGAVRRM ID
VTLQQFAELE GVSERTAYRW TT
FTYNQYAQMM NISRENAYGV LA
LGASHISKTM NIARSTYVKV IN
TGATEIAHQL SIARSTVYKI LE
ISISAIAREF NTTRQTILRV KA
GNISALADAE NISRKIITRC IN
MVLADIAQAV EMHESTISRV TT
LVLHDIAEAV GMHESTISRV TT
LNLRIVADAI KMHESTVSRV TS
MTRGDIGNYL GLTVETISRL LG
LSLSALSRQF GYAPTTLANA LE
MSLAELGRSN GLSSSTLKNA LD
FDIASVAQHV CLSPSRLSHL FR
LRIDEVARHV CLSPSRLAHL FR
VTLEALADQV AMSPFHLHRL FK
VLYPDIAKKF NTTASRVERA IR

    A    B    C    D    E    F    G    H    I    K    L    M    N    P    Q    R    S    T    V    W    X    Y    Z
  -25  -89  -44  -89  -74   -6  -79  -65   31  -63   33   31  -66  -79  -57  -66  -41   -9   28  -62  -40  -27  -74
  -11  -23  -62  -23  -27  -81  -29  -51  -56  -27  -59  -41    0    3  -27  -39   19   29  -37  -97  -34  -72  -27
  -25 -102  -52 -102  -68   -7 -103  -53   18  -38   36   14  -72  -89  -46  -24  -50  -37    3  -62  -42   -3  -68
    2   -9  -67   -9   13  -82  -28  -18  -78   10  -69  -38    6  -27   26   -3   29    0  -57  -90  -25  -57   13
   -7   28  -82   28   51  -85  -48  -16  -76   11  -74  -42   -8  -38   44  -10    7  -26  -55  -97  -23  -58   51
  -32 -116  -40 -116  -97   14 -130  -93   96  -87   72   39 -105 -101  -86  -95  -71  -32   77  -78  -49  -13  -97
  198 -100  -12 -100  -54 -112   17 -106  -66  -54  -66  -60  -94  -57  -54  -57   60   -3  -15 -161  -49 -112  -54
  -40   -3  -89   -3   22  -78  -70  -14  -60   21  -54  -28   -3  -53   31   20  -18  -33  -52  -95  -29  -52   22
    6  -49  -52  -49  -12  -46  -64  -20  -24  -21  -19   -8  -39  -54    0  -27   -7  -18  -13  -76  -29  -27  -12
  -25  -75  -52  -75  -50   19  -88  -50   24  -56   11   17  -64  -81  -47  -68  -40  -28   24  -52  -36    8  -50
  -17   -1  -65   -1  -30 -100   78  -41 -111  -36 -113  -82   40  -65  -38  -46   12  -39  -91  -97  -43  -89  -30
  -28 -100  -38 -100  -78  -10 -114  -82   65  -58   60   54  -81  -82  -56  -67  -50  -13   56  -71  -41  -27  -78
   30  -27  -47  -27  -16  -70   -3   -7  -71  -18  -70  -45    8  -44  -15  -25   68   22  -58  -94  -27  -55  -16
  -14  -39  -78  -39    4  -84  -70  -45  -46   -4  -48  -29  -36   -1   -5    0  -19   -6  -25 -102  -34  -65    4
   19  -10  -61  -10    6  -74   -4  -38  -83    1  -79  -44   16  -46    3  -25   93   15  -73 -102  -26  -70    6
   -4  -46  -55  -46  -36  -85  -77  -56  -50  -28  -49  -41    4  -52  -31  -15   30  142  -20  -95  -33  -76  -36
  -17 -121  -34 -121  -94   -8 -124  -98   84  -79   84   43 -103  -96  -77  -85  -63  -28   68  -85  -48  -27  -94
  -10  -55  -56  -55  -26   -9  -59    6  -45  -23  -42  -30  -26  -64  -16  -34   11  -15  -38  -40  -28   45  -26
  -36  -49 -105  -49    3 -102  -51   14 -109   66  -83  -46   23  -65   24  110   -9  -33 -102 -110  -35  -64    3
   -8  -99  -26  -99  -69  -14  -87  -73   24  -52   21   17  -86  -79  -53  -61  -48  -25   30  -35  -41  -16  -69
  -38  -93  -50  -93  -79   37 -109  -72   32  -60   35   14  -76  -94  -71  -68  -44   -7   16  -50  -43   -5  -79
  -15    0  -79    0    4  -92  -21  -31  -84   -2  -81  -56   16   -8   -6   -4    9    2  -65 -101  -31  -75    4

Profile based on HTH motif alignment
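The general idea of turning an alignment into a position-specific scoring table can be sketched as follows. This is a simple log-odds profile with a uniform background and a flat pseudocount, both of which are assumptions made for illustration; it is not the method used to compute the table above, and the numbers it produces will differ from it. The three input rows are hth13 to hth15 from the HTH alignment.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def log_odds_profile(alignment, pseudocount=1.0):
    """One dict per alignment column: residue -> log2(observed / background).

    Uses a uniform background (1/20) and a flat pseudocount (assumptions).
    """
    n = len(alignment)
    background = 1.0 / len(AMINO_ACIDS)
    total = n + pseudocount * len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def score_sequence(profile, seq):
    """Ungapped score of a sequence of the same length as the profile."""
    return sum(col[aa] for col, aa in zip(profile, seq))

hth = [
    "MVLADIAQAVEMHESTISRVTT",  # hth13
    "LVLHDIAEAVGMHESTISRVTT",  # hth14
    "LNLRIVADAIKMHESTVSRVTS",  # hth15
]
profile = log_odds_profile(hth)
print(len(profile), max(profile[2], key=profile[2].get))
```

A sequence that resembles the family scores higher than an unrelated one, for example `score_sequence(profile, hth[0])` compared with `score_sequence(profile, "A" * 22)`.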

Profilemake: Makes a profile from a multiple sequence alignment.

Profilesearch: Compares a profile to a sequence database. Finds the sequences that best fit the profile.

Profilescan: Compares a sequence to a library of profiles. Finds the profile that best fits the sequence.

Profilegap: Compares a sequence and a profile, producing a sequence-profile alignment.

Profilesegments: Aligns the profile with the sequences found by Profilesearch.

MEME Finds conserved motifs in a group of unaligned sequences. MEME saves these motifs as a set of profiles. You can search a database of sequences with these profiles using the MotifSearch program.

PSIBLAST

PSI-BLAST is an important tool to identify remote protein similarity. It proceeds by way of the following steps:

(1) PSI-BLAST takes as input a single protein sequence and compares it to a protein database, using the gapped BLAST program.

(2) The program constructs a multiple alignment, and then a profile, from any significant local alignments found. The original query sequence serves as a template for the multiple alignment and profile, whose lengths are identical to that of the query.

(3) The profile is compared to the protein database, again seeking local alignments. After a few minor modifications, the BLAST algorithm can be used for this directly.

(4) PSI-BLAST estimates the statistical significance of the local alignments found. Because profile substitution scores are constructed to a fixed scale, and gap scores remain independent of position, the statistical theory and parameters for gapped BLAST alignments remain applicable to profile alignments.

(5) Finally, PSI-BLAST iterates, by returning to step (2), an arbitrary number of times or until convergence.

Profile-alignment statistics allow PSI-BLAST to proceed as a natural extension of BLAST; the results produced in iterative search steps are comparable to those produced from the first pass.

Advantage : Unlike most profile-based search methods, PSI-BLAST runs as one program, starting with a single protein sequence, and the intermediate steps of multiple alignment and profile construction are invisible to the user.
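The search, build profile, search again loop can be illustrated with a deliberately tiny toy. Everything here is an assumption made for illustration: sequences are ungapped and of equal length, the "profile" is just column-wise residue frequencies, and a fixed score threshold replaces E-value statistics. It shows only the iteration and convergence logic, not the BLAST algorithm itself, and the names `build_profile`, `profile_score`, and `iterative_search` are hypothetical.

```python
from collections import Counter

def build_profile(members):
    """Column-wise residue frequencies of equal-length sequences."""
    n = len(members)
    return [{aa: c / n for aa, c in Counter(col).items()}
            for col in zip(*members)]

def profile_score(profile, seq):
    """Sum over columns of the frequency of the sequence's residue."""
    return sum(col.get(aa, 0.0) for col, aa in zip(profile, seq))

def iterative_search(query, database, threshold, max_rounds=10):
    """Repeat: build profile from query + hits, rescan the database.
    Stops when the hit set no longer changes (convergence)."""
    hits = set()
    for round_no in range(1, max_rounds + 1):
        profile = build_profile([query] + sorted(hits))
        new_hits = {s for s in database if profile_score(profile, s) >= threshold}
        if new_hits == hits:
            return hits, round_no
        hits = new_hits
    return hits, max_rounds

# Made-up four-residue "proteins": ACGF is too distant from the query to be
# found in round 1, but is picked up once ACDF has broadened the profile.
hits, rounds = iterative_search("ACDE", ["ACDF", "ACGF", "WWWW"], threshold=2.4)
print(sorted(hits), rounds)  # ['ACDF', 'ACGF'] 3
```

The distant sequence enters in the second round because the profile built from the first-round hit tolerates variation at the last position, which is the mechanism that lets an iterated profile search reach similarities a single pairwise search misses.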

Psiblast tutorial http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/psi1.html

Evidence of Med4/Trap36 homology.
PSI-BLAST using human Trap36 as query sequence

Results from round 2
                                                                 Score     E
Sequences producing significant alignments:                      (bits)  Value

Sequences used in model and found again:

ref|NP_054885.1| HSPC126 protein; p36 TRAP/SMCC/PC2 subunit [Hom... 367 e-101
gb|AAF37289.1|AF230381_1 (AF230381) p36 TRAP/SMCC/PC2 subunit [H... 366 e-100
ref|XP_007213.1| HSPC126 protein [Homo sapiens] 354 7e-97
gb|AAF50591.1| (AE003559) CG8609 gene product [Drosophila melano... 283 1e-75
pir||T27901 hypothetical protein ZK546.13 - Caenorhabditis elega... 223 2e-57

Sequences not found previously or not previously below threshold:

gb|AAD45920.1|AF162224_1 (AF162224) angiopoietin-related protein... 33 2.8
ref|NP_014817.1| Stoichiometric member of mediator complex; Med4... 33 2.8
emb|CAB64662.1| (AJ249991) myosin heavy chain [Mytilus galloprov... 33 3.7
gb|AAF62395.1| (AF183909) myosin heavy chain cardiac muscle spec... 33 4.8
gb|AAF62394.1| (AF183909) myosin heavy chain cardiac muscle spec... 33 4.8
gb|AAF62392.1|AF183909_2 (AF183909) myosin heavy chain catch (sm... 33 4.8
gb|AAF62391.1|AF183909_1 (AF183909) myosin heavy chain striated ... 33 4.8
pir||T22890 hypothetical protein F58A3.1c - Caenorhabditis elega... 33 4.8
pir||T22888 hypothetical protein F58A3.1b - Caenorhabditis elega... 33 4.8
gb|AAB38367.1| (U80221) F58A3.1b [Caenorhabditis elegans] 33 4.8

Gustafsson CM, Samuelsson T. Mol Microbiol. 2001 41:1-8.

An evolutionarily conserved core of mediator subunits

Sequence logo

Multiple alignments - applications

Identify conserved motifs - patterns (PROSITE)
Profiles (Pfam)
Phylogenetic studies
Prediction of protein secondary structure
Experimental: design of probes

Number of OTUs   Number of rooted trees   Number of unrooted trees
       2                          1                          1
       3                          3                          1
       4                         15                          3
       5                        105                         15
       6                        945                        105
       7                     10,395                        945
       8                    135,135                     10,395
       9                  2,027,025                    135,135
      10                 34,459,425                  2,027,025

Number of unrooted trees for n OTUs:

$$N = \frac{(2n-5)!}{2^{\,n-3}\,(n-3)!}$$
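The tree counts can be checked directly from the formula. The sketch below uses the formula above for unrooted trees; the rooted count relies on the standard identity that the number of rooted trees for n OTUs equals the number of unrooted trees for n + 1 OTUs. The function names are illustrative.

```python
from math import factorial

def unrooted_trees(n):
    """Number of unrooted bifurcating trees for n OTUs: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 3:
        return 1  # by convention, a single tree for fewer than three OTUs
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

def rooted_trees(n):
    """Number of rooted bifurcating trees for n OTUs: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        return 1
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

for n in range(2, 11):
    print(n, rooted_trees(n), unrooted_trees(n))
```

Already at 10 OTUs there are more than 34 million rooted trees, which is why exhaustive tree search is only feasible for small data sets.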

Distance matrix methods

UPGMA Example

(Unweighted pair group method with arithmetic mean)

      A   B   C   D   E
  B   2
  C   4   4
  D   6   6   6
  E   6   6   6   4
  F   8   8   8   8   8

http://www.icp.ucl.ac.be/~opperd/private/upgma.html

UPGMA Example (cont’d)

 A  B  C  D  E

 B  2

 C  4  4

 D  6  6  6

 E  6  6  6  4

 F  8  8  8  8  8

 A,B  C  D  E

 C  4

 D  6  6

 E  6  6  4

 F  8  8  8  8

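$$D_{(i,j),k} = \frac{n_i}{n_i + n_j}\, D_{i,k} + \frac{n_j}{n_i + n_j}\, D_{j,k}$$

where n_i and n_j are the numbers of OTUs in clusters i and j.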

D(A,B),C = (DAC + DBC) / 2 = 4
D(A,B),D = (DAD + DBD) / 2 = 6
D(A,B),E = (DAE + DBE) / 2 = 6
D(A,B),F = (DAF + DBF) / 2 = 8

http://www.icp.ucl.ac.be/~opperd/private/upgma.html

UPGMA Example (cont’d)

 A,B  C  D,E

 C  4

 D,E  6  6

 F  8  8  8

 AB,C  D,E

 D,E  6

 F  8  8

 ABC,DE

 F  8

http://www.icp.ucl.ac.be/~opperd/private/upgma.html
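The clustering in this example can be reproduced with a short sketch. The distance between two clusters is computed as the mean over all pairs of their members, which is equivalent to the size-weighted averaging used in UPGMA. The function and variable names are illustrative, and ties are broken by whichever pair is found first, so the order of the two joins at distance 4 may differ from the order shown in the matrices.

```python
from itertools import combinations

def upgma(labels, dist):
    """Return the list of merges as (cluster_a, cluster_b, distance).

    `dist` maps frozenset({x, y}) to the distance between OTUs x and y.
    Clusters are tuples of OTU labels.
    """
    clusters = [(name,) for name in labels]

    def d(ci, cj):
        # Mean distance over all OTU pairs, one from each cluster.
        total = sum(dist[frozenset((a, b))] for a in ci for b in cj)
        return total / (len(ci) * len(cj))

    merges = []
    while len(clusters) > 1:
        ci, cj = min(combinations(clusters, 2), key=lambda pair: d(*pair))
        merges.append((ci, cj, d(ci, cj)))
        clusters = [c for c in clusters if c not in (ci, cj)] + [ci + cj]
    return merges

# Lower-triangular distance matrix from the example (columns A..E).
labels = "ABCDEF"
lower = {"B": [2], "C": [4, 4], "D": [6, 6, 6],
         "E": [6, 6, 6, 4], "F": [8, 8, 8, 8, 8]}
dist = {frozenset((row, labels[k])): v
        for row, values in lower.items() for k, v in enumerate(values)}

merges = upgma(labels, dist)
for a, b, distance in merges:
    print("".join(a), "+", "".join(b), "at distance", distance)
```

Each branch height in the resulting dendrogram is half the distance at which the corresponding merge happens, so F joins the other five OTUs at height 4.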

UPGMA and the effect of unequal rates of evolution

Maximum parsimony and Informative sites

Site       1 2 3 4 5 6 7 8 9
Sequence   -----------------
1          A A G A G T G C A
2          A G C C G T G C G
3          A G A T A T C C A
4          A G A G A T C C G
                   *   *   *
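A site is parsimony-informative when at least two different character states each occur in at least two sequences. As a minimal sketch (the function name is illustrative), the informative sites of the four sequences above can be found like this:

```python
from collections import Counter

def informative_sites(sequences):
    """Return 1-based positions where at least two different states
    each occur in at least two sequences."""
    sites = []
    for pos, column in enumerate(zip(*sequences), start=1):
        counts = Counter(column)
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            sites.append(pos)
    return sites

seqs = ["AAGAGTGCA",   # sequence 1
        "AGCCGTGCG",   # sequence 2
        "AGATATCCA",   # sequence 3
        "AGAGATCCG"]   # sequence 4
sites = informative_sites(seqs)
print(sites)  # [5, 7, 9]
```

Invariant sites (1, 6, 8) and sites where only one state is shared (2, 3, 4) require the same number of changes on every tree, so they cannot discriminate between topologies.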

Examples of software in phylogenetic analysis

PHYLIP (Phylogeny Inference Package)
http://evolution.genetics.washington.edu/phylip.html
Examples:
DNAPARS / PROTPARS - maximum parsimony
NEIGHBOR - neighbor-joining or UPGMA

PAUP (Phylogenetic Analysis Using Parsimony)
(GCG version: PaupSearch)