Phylogeny 2009

  • 8/2/2019 Phylogeny 2009

    1/73

    Dr.G.P.L.Jayasree

    PHYLOGENETIC ANALYSIS AND PHYLOGENETIC TREES


    TREE OF LIFE


    Evolution

    What we can see are the present-day species

    Offspring look like their parents


    Biological Foundations

    Evolution is driven by:

    Inheritance

    Variation: mutations and recombination (genotype, phenotype)

    Nature selects: survival of the fittest

    All organisms share a common ancestry


    Terminology

    Phylogeny: the evolutionary relationships among organisms, based on a common ancestor

    Phylogenetics: area of research concerned with finding the genetic relationships between species (Greek: phylon = race and genetic = birth)


    Phylogeny

    [Figure: tree with leaves Orangutan, Gorilla, Chimpanzee, Human. From the Tree of Life website, University of Arizona]


    Applications of phylogenetic trees

    Evolution studies

    Systematic biology

    Medical research and epidemiology

    Ecology


    Construction of phylogenetic trees

    Classic phylogenetic analysis uses morphological features

      Anatomy, size, number of legs, beak shape

    Modern phylogenetic analysis uses molecular information

      Genetic material (DNA and protein sequences)


    Advantages of molecular phylogenetic analysis

    Analogous features (shared function, but NOT common ancestry) can be misleading

    DNA sequences are simpler to model: there are only four states (A, C, G, T)

    DNA samples for sequence analysis are easy to prepare


    Phylogenetic Trees

    A graph representing the evolutionary history of a sequence

    Relationship of one sequence to other sequences

    Dissect the order of appearance of insertions, deletions, and mutations

    Predict function, observe changes

    [Figure: simple tree with leaves A, B, C, D]


    Parts of a phylogenetic tree

    Node

    Root

    Outgroup

    Ingroup

    Branch


    Tree Shapes

    Rooted and unrooted

    Branches intersect at nodes

    Leaves are the topmost (terminal) branches

    [Figure: rooted and unrooted trees with leaves A, B, C, D]


    Tree Characteristics

    Tree Properties

      Clade: all the descendants of a common ancestor represented by a node

      Distance: number of changes that have taken place along a branch

    Tree Types

      Cladogram: shows the branching order of nodes

      Phylogram: shows branching order and distances

    [Figure: phylogram with leaves A, B, C, D and branch lengths .035, .009, .057, .044, .012, .016]


    Three types of trees

    [Figure: the same four taxa (Taxon A, B, C, D) drawn three ways. Cladogram: the horizontal axis has no meaning. Phylogram: the horizontal axis is genetic change, with branch lengths 1, 1, 1, 6, 3, 5. Ultrametric tree: the horizontal axis is time.]

    All show the same evolutionary relationships, or branching orders, between the taxa.


    Phylogenetic trees diagram the evolutionary relationships between the taxa

    [Figure: tree with leaves Taxon A, Taxon B, Taxon C, Taxon E, Taxon D]

    ((A,(B,C)),(D,E)) = the above phylogeny as nested parentheses

    No meaning to the spacing between the taxa, or to the order in which they appear from top to bottom.

    This dimension (the axis along which the branches run) either can have no scale (for cladograms), can be proportional to genetic distance or amount of change (for phylograms or additive trees), or can be proportional to time (for ultrametric trees or true evolutionary trees).

    The tree and the nested parentheses both say that B and C are more closely related to each other than either is to A, and that A, B, and C form a clade that is a sister group to the clade composed of D and E. If the tree has a time scale, then D and E are the most closely related.
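The nested-parentheses notation can be read mechanically. The sketch below is a minimal illustration, assuming a topology-only string with no branch lengths or trailing semicolon; the function names `parse_nested` and `clades` are hypothetical helpers, not part of any phylogenetics package. It lists the clade (set of leaves) under every internal node.

```python
def parse_nested(text):
    """Parse a topology-only string such as "((A,(B,C)),(D,E))" into nested tuples."""
    pos = 0

    def node():
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1                          # consume "("
            children = [node()]
            while pos < len(text) and text[pos] == ",":
                pos += 1                      # consume ","
                children.append(node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1                          # consume ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        if start == pos:
            raise ValueError("empty taxon name")
        return text[start:pos]

    tree = node()
    if pos != len(text):
        raise ValueError("trailing characters")
    return tree


def clades(tree):
    """Return (leaves under this node, leaf sets of every internal node in the subtree)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found.extend(child_found)
    found.append(leaves)
    return leaves, found


tree = parse_nested("((A,(B,C)),(D,E))")
_, groups = clades(tree)
for group in groups:
    print(sorted(group))
```

The printed groups include B with C, then A with B and C, and D with E, which are the clades named in the text.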


    The goal of phylogeny inference is to resolve the branching orders of lineages in evolutionary trees:

    [Figure: three trees on taxa A, B, C, D, E. Completely unresolved or "star" phylogeny (a polytomy or multifurcation); partially resolved phylogeny; fully resolved, bifurcating phylogeny (every internal node is a bifurcation).]


    C-B Stewart, NHGRI lecture, 12/5/00

    There are three possible unrooted trees for four taxa (A, B, C, D)

    Tree 1: A and B on one side of the internal branch, C and D on the other

    Tree 2: A and C on one side, B and D on the other

    Tree 3: A and D on one side, B and C on the other

    Phylogenetic tree building (or inference) methods are aimed at discovering which of the possible unrooted trees is "correct".

    We would like this to be the "true" biological tree, that is, one that accurately represents the evolutionary history of the taxa. However, we must settle for discovering the computationally correct or optimal tree for the phylogenetic method of choice.


    The number of unrooted trees increases in a greater than exponential manner with the number of taxa

    # Taxa (N)    # Unrooted trees
    3             1
    4             3
    5             15
    6             105
    7             945
    8             10,395
    9             135,135
    10            2,027,025
    ...
    30            8.69 x 10^36

    Number of unrooted trees for n taxa: $(2n-5)! / \left(2^{n-3}\,(n-3)!\right)$

    [Figure: example unrooted trees for 3 taxa (A, B, C), 4 taxa (A, B, C, D), 5 taxa (A to E), and 6 taxa (A to F)]
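The counting formula is easy to evaluate directly. This is a small sketch using only the standard library; the function name `unrooted_tree_count` is a hypothetical helper chosen for illustration.

```python
from math import factorial


def unrooted_tree_count(n):
    """Number of unrooted bifurcating trees for n taxa: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 3:
        raise ValueError("the formula is defined for n >= 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (3, 4, 5, 6, 7, 8, 9, 10, 30):
    print(n, unrooted_tree_count(n))
```

Integer arithmetic keeps the result exact even for 30 taxa, where the count has 37 digits.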


    [Figure: an unrooted tree on taxa A, B, C, and D with the root position marked, and the rooted tree obtained by rooting it there]

    Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.


    An unrooted, four-taxon tree theoretically can be rooted in five different places to produce five different rooted trees

    [Figure: unrooted tree 1 (A and B on one side, C and D on the other) with root positions 1 to 5 marked on its five branches, and rooted trees 1a to 1e obtained by rooting at each position]

    These trees show five different evolutionary relationships among the taxa!


    There are two major ways to root trees:

    By outgroup:

    Uses taxa (the "outgroup") that are known to fall outside of the group of interest (the "ingroup"). Requires some prior knowledge about the relationships among the taxa. The outgroup can either be species (e.g., birds to root a mammalian tree) or previous gene duplicates (e.g., α-globins to root β-globins).

    By midpoint or distance:

    Roots the tree at the midway point between the two most distant taxa in the tree, as determined by branch lengths. Assumes that the taxa are evolving in a clock-like manner. This assumption is built into some of the distance-based tree building methods.

    [Figure: tree with leaves A, B, C, D, one of them marked as the outgroup, and branch lengths 10, 2, 3, 5, 2]

    d(A,D) = 10 + 3 + 5 = 18

    Midpoint = 18 / 2 = 9


    Similarity vs. Evolutionary Relationship:

    Similarity and relationship are not the same thing, even though evolutionary relationship is inferred from certain types of similarity.

    Similar: having likeness or resemblance (an observation)

    Related: genetically connected (an historical fact)

    Two taxa can be most similar without being most closely related:

    [Figure: tree with leaves Taxon A, Taxon B, Taxon C, Taxon D and branch lengths 1, 1, 1, 6, 3, 5]

    C is more similar in sequence to A (d = 3) than to B (d = 7), but C and B are most closely related (that is, C and B shared a common ancestor more recently than either did with A).


    Procedure: Steps of a molecular phylogenetic analysis

    1. Decide what sequences to examine

    2. Determine the evolutionary distances between the sequences and build a distance matrix

    3. Phylogenetic tree construction


    Decide what to examine

    Choose homologous sequences in different species

    Homologous sequences must, by definition, be derived from a common ancestral sequence

    Homology is not similarity


    Methods for Determining Trees

    Sequence Based methods:

    Maximum Parsimony

    Maximum Likelihood

    Distance Based methods:

    UPGMA

    Neighbor Joining


    Distance Methods

    Distance is expressed as the fraction of sites that differ between two sequences in an alignment

    Sequences with the smallest number of changes (shortest distance) are treated as the most closely related taxa


    Determine the evolutionary distances and build distance matrix

    For molecular data, evolutionary distances can be the observed number of nucleotide differences between the pairs of species.

    Distance matrix: simply a table showing the evolutionary distances between all pairs of sequences in the dataset


    Determine the evolutionary distances and build distance matrix: a simple example

    1. AGGCCATGAATTAAGAATAA
    2. AGCCCATGGATAAAGAGTAA
    3. AGGACATGAATTAAGAATAA
    4. AAGCCAAGAATTACGAATAA

    Distance Matrix

    In this example the evolutionary distance is expressed as the number of nucleotide differences for each sequence pair, divided by the sequence length. For example, sequences 1 and 2 are 20 nucleotides in length and have four differences, corresponding to an evolutionary distance of 4/20 = 0.2.


    1. AGGCCATGAATTAAGAATAA
    2. AGCCCATGGATAAAGAGTAA
    3. AGGACATGAATTAAGAATAA
    4. AAGCCAAGAATTACGAATAA

          1      2      3      4
    1    ---    0.2    0.05   0.15
    2    ---    ---    0.25   0.35
    3    ---    ---    ---    0.2
    4    ---    ---    ---    ---
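The matrix entries can be recomputed from the four sequences. The sketch below is a minimal illustration that counts mismatched positions and divides by the alignment length; `p_distance` is a hypothetical helper name, and no correction for multiple substitutions at the same site is applied.

```python
from itertools import combinations

# The four aligned example sequences, keyed by their number.
sequences = {
    "1": "AGGCCATGAATTAAGAATAA",
    "2": "AGCCCATGGATAAAGAGTAA",
    "3": "AGGACATGAATTAAGAATAA",
    "4": "AAGCCAAGAATTACGAATAA",
}


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Upper triangle of the distance matrix, keyed by (row, column).
matrix = {
    (i, j): p_distance(sequences[i], sequences[j])
    for i, j in combinations(sequences, 2)
}

for (i, j), d in matrix.items():
    print(i, j, d)
```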


    Additive Distance Matrices


    Distance-based Phylogenetic Methods


    Distance Methods - UPGMA

    UPGMA (Unweighted Pair-Group Method with Arithmetic mean)

    Sequentially find the pair of taxa with the smallest distance between them, and define the branching point as the midpoint of the two

    Assumes the tree is additive and that the rate of change is constant in all of the branches

    [Figure: tree built by joining A and B at height $D_{AB}/2$, then adding C at height $D_{(AB)C}/2$, then adding D at height $D_{(ABC)D}/2$]


    Distance Methods - NJ

    Neighbor-Joining (NJ): useful when there are different rates of evolution within a tree

    Each possible pair-wise alignment is examined. Calculate distance from each sequence to every other sequence

    Choose the pair with the lowest corrected distance value (the pairwise distance adjusted for how far each of the two taxa is from all the others) and join them to produce the minimal length tree

    Update the distance matrix so that the joined node is substituted for the two original taxa, and then repeat the process

    [Figure: star tree on taxa A to H progressively resolved by joining neighbors, with the new internal nodes numbered 1, 2, 3]
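In the standard formulation of neighbor-joining, the pair to join is not simply the pair with the smallest raw distance. It is the pair that minimizes a rate-corrected criterion, commonly written Q(i,j) = (n-2) d(i,j) - sum_k d(i,k) - sum_k d(j,k). The sketch below shows only that selection step, under the assumption of a symmetric distance table; the five-taxon distances and the name `nj_pick_pair` are made up for illustration and do not come from the slides.

```python
def nj_pick_pair(names, dist):
    """Return the pair minimizing Q(i,j) = (n-2)*d(i,j) - sum_k d(i,k) - sum_k d(j,k)."""
    n = len(names)
    totals = {a: sum(dist[a][b] for b in names if b != a) for a in names}
    best_pair, best_q = None, None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            q = (n - 2) * dist[a][b] - totals[a] - totals[b]
            if best_q is None or q < best_q:
                best_pair, best_q = (a, b), q
    return best_pair, best_q


# Hypothetical symmetric distance table for five taxa.
names = ["a", "b", "c", "d", "e"]
rows = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {a: dict(zip(names, row)) for a, row in zip(names, rows)}

pair, q = nj_pick_pair(names, dist)
print(pair, q)
```

In this made-up table the smallest raw distance is between d and e, yet the criterion selects a and b, which is why neighbor-joining tolerates unequal rates better than simply joining the closest pair.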


    DNA Sequence Evolution

    [Figure: a sequence evolving along a tree over time. -3 mil yrs: AAGACTT. -2 mil yrs: AAGGCCT, TGGACTT. -1 mil yrs: AGGGCAT, TAGCCCT, AGCACTT. Today: AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT.]


    Phylogeny Problem

    [Figure: five present-day sequences (TAGCCCA, TAGACTT, TGCACAA, TGCGCTT, AGGGCAT) and a tree with leaves U, V, W, X, Y to be inferred from them]


    UPGMA (Unweighted pair group method with arithmetic mean)


    UPGMA (Unweighted Pair Group Method with Arithmetic Mean)

    Assumes that the rate of change along the branches of the tree is constant and that distances are ultrametric: $d_{AC} \le \max(d_{AB}, d_{BC})$ for every three taxa A, B, C
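The ultrametric condition can be checked on any distance table by testing every ordered triple of taxa. This is a minimal sketch; the helper names `symmetric` and `is_ultrametric` and the two small distance tables are hypothetical examples, not data from the slides.

```python
from itertools import permutations


def symmetric(pairs):
    """Expand {(a, b): d} into a table that can be read as dist[a][b] or dist[b][a]."""
    dist = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


def is_ultrametric(names, dist, tol=1e-9):
    """True if d(a,c) <= max(d(a,b), d(b,c)) holds for every ordered triple."""
    return all(
        dist[a][c] <= max(dist[a][b], dist[b][c]) + tol
        for a, b, c in permutations(names, 3)
    )


names = ["A", "B", "C"]
clock_like = symmetric({("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6})
unequal_rates = symmetric({("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 7})

print(is_ultrametric(names, clock_like))
print(is_ultrametric(names, unequal_rates))
```

When the check fails, the constant-rate assumption behind UPGMA does not hold for the data, and the heights it assigns to nodes should be treated with caution.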


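    There are 12 changes between sequences O and P, so o + p = 12 and o = p = 12/2 = 6

    Likewise m + n = 15, so m = n = 15/2 = 7.5

    To calculate q we need the average distance between the sequences of one pair and the sequences of the other pair

    (MO + MP + NO + NP)/4 = (26 + 28 + 29 + 31)/4 = 28.5

    Half of this average, 28.5/2 = 14.25, is the height of the node that joins the two pairs, so

    q1 + 7.5 = 14.25, q1 = 6.75

    q2 + 6.0 = 14.25, q2 = 8.25

    Therefore, q = q1 + q2 = 15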
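The same arithmetic can be reproduced with a short UPGMA sketch. It assumes the six pairwise distances quoted in the worked example (MN = 15, OP = 12, MO = 26, MP = 28, NO = 29, NP = 31) and averages over all leaf pairs between two clusters; the function name `upgma` and its return format are illustrative choices, not a reference implementation.

```python
def upgma(dist):
    """dist maps frozenset({a, b}) to the distance between leaves a and b.

    Returns the merges in order as (left, right, node_height), where clusters
    are labeled by nested tuples of leaf names.
    """
    leaves = sorted({name for pair in dist for name in pair})
    clusters = {name: [name] for name in leaves}   # label -> member leaves
    merges = []

    def average(c1, c2):
        # Unweighted mean over every leaf pair spanning the two clusters.
        total = sum(dist[frozenset((x, y))] for x in clusters[c1] for y in clusters[c2])
        return total / (len(clusters[c1]) * len(clusters[c2]))

    while len(clusters) > 1:
        labels = list(clusters)
        pairs = [
            (average(a, b), a, b)
            for i, a in enumerate(labels)
            for b in labels[i + 1:]
        ]
        d, a, b = min(pairs, key=lambda item: item[0])
        merges.append((a, b, d / 2))               # node sits at half the distance
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
    return merges


raw = {("M", "N"): 15, ("O", "P"): 12, ("M", "O"): 26,
       ("M", "P"): 28, ("N", "O"): 29, ("N", "P"): 31}
dist = {frozenset(pair): value for pair, value in raw.items()}

merges = upgma(dist)
for left, right, height in merges:
    print(left, right, height)
```

The node heights come out as 6.0, 7.5, and 14.25; subtracting the first two from the last gives the internal branch lengths 8.25 and 6.75.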


    Maximum parsimony

    Parsimony is a special case of likelihood

    The tree with the smallest number of mutations is the maximum parsimony tree.

    The best tree is the one where minimal changes take place.


    Maximum parsimony (example)

    Input: Four sequences

    ACT

    ACA

    GTT

    GTA

    Question: which of the three trees has the best MP score?


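    Maximum Parsimony

    [Figure: candidate trees for the four sequences ACT, ACA, GTT, GTA, drawn with internal-node labels (ACA, GTT) and per-branch change counts. The panels are labeled MP score = 5, MP score = 11, and MP score = 10, and one tree is marked "Optimal MP tree".]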


    Maximum Parsimony Example

    1  AAGAGTGCA
    2  AGCCGTGCG
    3  AGATATCCA
    4  AGAGATCCG

    Four sequences, nine sites, three possible unrooted trees


    Maximum Parsimony Example

    Possible Trees I:

    [Figure: unrooted tree with leaves (1) AAGAGTGCA, (2) AGCCGTGCG, (3) AGATATCCA, (4) AGAGATCCG and internal sequences AGCCGTGCG and AGAGATCCG; changes per branch: 4, 0, 4, 2, 0]

    Number of Mutations: 10


    Maximum Parsimony Example

    Possible Trees II:

    [Figure: unrooted tree with leaves (1) AAGAGTGCA, (2) AGCCGTGCG, (3) AGATATCCA, (4) AGAGATCCG and internal sequences AGGAGTGCA and AGAGGTCCG; changes per branch: 1, 4, 5, 3, 1]

    Number of Mutations: 14


    Maximum Parsimony Example

    Possible Trees III:

    [Figure: unrooted tree with leaves (1) AAGAGTGCA, (2) AGCCGTGCG, (3) AGATATCCA, (4) AGAGATCCG and internal sequences AGGAGTGCA and AGATGTCCG; changes per branch: 1, 5, 5, 3, 2]

    Number of Mutations: 16

    Tree I has the topology with the fewest mutations and thus is the most parsimonious tree.


    Maximum Parsimony Example

    Some sites are informative, others are not

    Informative site: there are at least two different kinds of nucleotides at the site, each of which is represented in at least two of the sequences under study.

    Only informative sites are considered
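The definition of an informative site translates directly into a column test. The sketch below applies it to the four nine-site sequences of this example; `informative_sites` is a hypothetical helper name and positions are reported 1-based.

```python
from collections import Counter


def informative_sites(alignment):
    """Return 1-based positions with at least two states that each occur at least twice."""
    sites = []
    for index, column in enumerate(zip(*alignment), start=1):
        counts = Counter(column)
        shared = [base for base, count in counts.items() if count >= 2]
        if len(shared) >= 2:
            sites.append(index)
    return sites


alignment = [
    "AAGAGTGCA",  # sequence 1
    "AGCCGTGCG",  # sequence 2
    "AGATATCCA",  # sequence 3
    "AGAGATCCG",  # sequence 4
]

print(informative_sites(alignment))
```

A site where only one sequence differs, or where every sequence differs, requires the same number of changes on every tree, so it cannot favor one topology over another.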


    Maximum Parsimony Example

    1  AAGAGTGCA
    2  AGCCGTGCG
    3  AGATATCCA
    4  AGAGATCCG

    Three informative columns (sites 5, 7, and 9)


    Maximum Parsimony Example

    Informative columns only:

    1  GGA
    2  GGG
    3  ACA
    4  ACG

    Tree 1: 4
    Tree 2: 5
    Tree 3: 6

    [Figure: for each informative column (Column 1, Column 2, Column 3), the three possible unrooted trees on sequences 1 to 4, with a mark on each branch where a substitution is required]


    Maximum Parsimony Problems

    Small Parsimony Problem:

      Given the phylogeny topology, compute the internal nodes to minimize the total number of mutations;

      Used to evaluate the phylogeny;

      Polynomial time solvable.

    Large Parsimony Problem:

      Given that we have a way of determining the score of a given phylogeny, search through all possible phylogenies to find the best one;

      Proved to be NP-complete.


    Fitch's Algorithm for the Small Parsimony Problem

    Consider each site separately;

    Dynamic programming style;

    Constructs a set of possible states (possible nucleotides) for each internal node;

    Start at the leaves of the phylogeny. Each leaf is labeled with the singleton set containing the nucleotide at that particular site.

    Traverse in a post-order manner (all of the children of the current node have been visited before the current node).
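The traversal described above can be sketched in a few lines. The code uses the usual Fitch rule for an internal node: take the intersection of the children's state sets if it is non-empty, otherwise take their union and count one change. It assumes a binary tree written as nested 2-tuples of sequence names, rooted arbitrarily on the internal branch (the change count does not depend on where the root is placed). The names `fitch_site` and `parsimony_score` are hypothetical, and the data are the three informative columns of the four-sequence example.

```python
def fitch_site(tree, states):
    """Post-order pass for one site: return (possible states, changes needed)."""
    if isinstance(tree, str):                      # leaf: singleton set, no changes
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:                                     # children agree on some state
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, sequences):
    """Sum the per-site change counts over all sites."""
    length = len(next(iter(sequences.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in sequences.items()})[1]
        for i in range(length)
    )


sequences = {"1": "GGA", "2": "GGG", "3": "ACA", "4": "ACG"}
trees = {
    "(1,2),(3,4)": (("1", "2"), ("3", "4")),
    "(1,3),(2,4)": (("1", "3"), ("2", "4")),
    "(1,4),(2,3)": (("1", "4"), ("2", "3")),
}

scores = {label: parsimony_score(tree, sequences) for label, tree in trees.items()}
for label, score in scores.items():
    print(label, score)
```

Each site costs time proportional to the number of nodes, which is why scoring a given topology is the easy part and the search over topologies is the hard part.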


    Parsimony Example

    Score is evaluated on each position independently. Scores are then summed over all positions.

    [Figure caption: solved independently for each position]


    Maximum Parsimony

    1 2 3 4 5 6 7 8 9 10

    Species 1 - A G G G T A A C T G

    Species 2 - A C G A T T A T T A

    Species 3 - A T A A T T G T C T

    Species 4 - A A T G T T G T C G

    How many possible unrooted trees?


    Maximum Parsimony

    How many possible unrooted trees?

    [Figure: the three possible unrooted trees for species 1 to 4]


    Maximum Parsimony

    Site 1 (A, A, A, A): 0 changes on each of the three trees

    [Figure: the three possible unrooted trees for species 1 to 4]


    Maximum Parsimony

    Site 2 (G, C, T, A): 3 changes on each of the three trees

    Scores so far (sites 1 to 2): 0 3 on every tree

    [Figure: the three possible unrooted trees for species 1 to 4]


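    Maximum Parsimony

    Site 2: 1 - G, 2 - C, 3 - T, 4 - A

    [Figure: the three possible unrooted trees with the site 2 nucleotides at the leaves and inferred nucleotides at the internal nodes; each tree requires 3 changes]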


    Maximum Parsimony

    Site 3 (G, G, A, T): 2 changes on each of the three trees

    Scores so far (sites 1 to 3): 0 3 2 on every tree

    [Figure: the three possible unrooted trees for species 1 to 4]


    Maximum Parsimony

    Site 4 (G, A, A, G): 2 changes on two of the trees, 1 change on the third

    Scores so far (sites 1 to 4), one row per tree:

    0 3 2 2
    0 3 2 1
    0 3 2 2

    [Figure: the three possible unrooted trees for species 1 to 4]


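    Maximum Parsimony

    Site 4: 1 - G, 2 - A, 3 - A, 4 - G

    [Figure: the three possible unrooted trees with the site 4 nucleotides at the leaves and inferred nucleotides at the internal nodes; two trees require 2 changes and the tree that groups species 1 with species 4 requires 1]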


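    Maximum Parsimony

    Per-site scores for the three trees, with totals:

    Tree           1  2  3  4  5  6  7  8  9  10   Total
    (1,2),(3,4)    0  3  2  2  0  1  1  1  1  2    13
    (1,4),(2,3)    0  3  2  1  0  1  2  1  2  2    14
    (1,3),(2,4)    0  3  2  2  0  1  2  1  2  2    15

    [Figure: the three possible unrooted trees for species 1 to 4]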


    How is phylogenetic analysis done?

    Start with a simple data set of 6 nucleotides from 5 species:

    A  ACGTAA
    B  CCTTAA
    C  CGTCAA
    D  CGTCCG
    E  CGTCCG

    Look at a single character


    Phylogenetic Analysis cont.

    Look at position 1: species A has an A where everyone else has a C

    Look at position 3: species A has a G where everyone else has a T

    Look at position 2: species A and B have a C where everyone else has a G

    Continue on with every position
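Looking at one position at a time amounts to grouping the species by the nucleotide they carry there. The sketch below does this for the five six-nucleotide sequences of the example data set; `character_states` is a hypothetical helper name.

```python
from collections import defaultdict

data = {
    "A": "ACGTAA",
    "B": "CCTTAA",
    "C": "CGTCAA",
    "D": "CGTCCG",
    "E": "CGTCCG",
}


def character_states(data):
    """For each position, map every observed nucleotide to the species that carry it."""
    length = len(next(iter(data.values())))
    table = []
    for i in range(length):
        groups = defaultdict(list)
        for species, seq in data.items():
            groups[seq[i]].append(species)
        table.append(dict(groups))
    return table


table = character_states(data)
for position, groups in enumerate(table, start=1):
    print(position, groups)
```

Positions where a state is shared by two or more species but not by all of them (such as position 2) are the ones that group species together.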


    Phylo. Analysis cont.

    Then you would map these changes to a tree

    Either the tree is known or this data set is used to infer a tree

    Usually an outgroup is included so as to give an idea of what the ancestral state is for each character

    This is very easy with few characters and species, but many studies include 100 or more species with over 10,000 nucleotides from each species

    A computer program is needed in most studies with large data sets


    Resulting Phylogeny

    The phylogeny might look like this if we mapped all the changes from our data set:

    [Figure: resulting tree with the changes mapped onto its branches]


    Comparative Protein Analysis

    Identify proteins within an organism that are related to each other and across different species

    Generate an evolutionary history of related genes

    Locate insertions, deletions, and substitutions that have occurred during evolution

    [Figure: the words CREATE, CREASE, RELAPSE, and GREASER shown as sequences descending over time from an ancestor]


    Multiple Sequence Alignments

    Place residues in columns that are derived from a common ancestral residue

    MSA can reveal sequence patterns

      Demonstration of homology between >2 sequences

      Identification of functionally important sites

      Protein function prediction

      Structure prediction

      Search for weak but significant similarities in databases

      Design PCR primers for related gene identification

      Genome sequencing: contig assembly

    Unaligned sequences (SeqA to SeqD):

    CREATE
    CREASE
    GREASER
    RELAPSE

    Aligned, columns 1 to 9:

    123456789
    CREAT--E-
    CREAS--E-
    -RELAPSE-
    GREAS--ER


    Comparison of Methods

    Neighbor-joining/UPGMA

      Uses only pairwise distances

      Minimizes distance between nearest neighbors

      Very fast

      Easily trapped in local optima

      Good for generating tentative tree

    Maximum parsimony

      Uses only shared derived characters

      Minimizes total distance

      Slow

      Assumptions fail when evolution is rapid

      Best option when tractable (

    Maximum likelihood

      Uses all data

      Maximizes tree likelihood given specific parameter values

      Very slow

      Highly dependent on assumed evolution model


    1. Find Related Sequences

    BLAST

    www.ncbi.nih.gov/BLAST

    MLEICLKLVGCKSKKGLSSSSSCYLEEALQRPVASDFEPQGLSEAARWNSKENLLAGPSENDPNLFVALY

    DFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAPKRNKPTVYGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLLYMATQISSAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKQGVRGAVSTLLQAPELPTKTRTSRRAAEHRDTTDVPEMPHSKGQGESDPLDHEPAVSPLLPRKERGPPEGGLNEDERLLPKDKKTNLFSALIKKKKKTAPTPPKRSSSFREMDGQPERRGAGEEEGRDISNGALAFTPLDTADPAKSPKPSNGAGVPNGALRESGGSGFRSPHLWKKSSTLTSSRLATGEEEGGGSSSKRFLRSCSASCVPHGAKDTEWRSVTLPRDLQSTGRQFDSSTFGGHKSEKPALPRKRAGENRSDQVTRGTVTPPPRLVKKNEEAADEVFKDIMESSPGSSPPNLTPKPLRRQVTVAPASGLPHKEEAGKGSALGTPAAAEPVTPTSKAGSGAPGGTSKGPAEESRVRRHKHSSESPGRDKGKLSRLKPAPPPPPAASAGKAGGKPSQSPSQEAAGEAVLGAKTKATSLVDAVNSDAAKPSQPGEGLKKPVLPATPKPQSAKPSGTPISPAPVPSTLPSASSALAGDQPSSTAFIPLISTRVSLRKTRQPPERIASGAITKGVVLDSTEALCLAISRNSEQMASHSAVLEAGKNLYTFCVSYVDSIQQMRNKFAFREAINKLENNLRELQICPATAGSGPAATQDFSKLLSSVKEISDIVQR


    2. Compile & Align Sequences

    Compile sequences into FASTA format

    Align

      ClustalW

      ClustalX

    >Human
    MPALGYKFSTW
    >Mouse
    MDGSTDYGILQINS
    >Rat
    MKKP..
    >Murine_Leukemia_Virus
    MTSR.
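FASTA is simple enough to read with a few lines of code: a line starting with ">" names a record, and the lines that follow hold its sequence. The sketch below is a minimal reader, assuming plain text input; `parse_fasta` is a hypothetical helper name, and only the two records whose sequences appear in full above are used.

```python
def parse_fasta(text):
    """Return {record name: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]            # header line: start a new record
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[name] += line      # sequence may span several lines
    return records


example = """>Human
MPALGYKFSTW
>Mouse
MDGSTDYGILQINS
"""

records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence), sequence)
```

Alignment programs such as ClustalW take a file in this format as input, so checking that every record parses and has a sensible length is a cheap sanity test before aligning.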


    3. Analysing the aligned sequence matrix

    PHYLIP

    POY

    PAUP, GCG

    And many more... (274 software packages described at one website)


    PHYLIP (Phylogeny Inference Package)

    Available free for Windows/MacOS/Linux systems

    Parsimony, distance matrix and likelihood methods (bootstrapping and consensus trees)

    Data can be molecular sequences, gene frequencies, restriction sites and fragments, distance matrices and discrete characters

    http://evolution.genetics.washington.edu/phylip.html