GENE TREES
Abhita Chugh
Phylogenetic tree
• An evolutionary tree showing the relationships among various entities that are believed to share a common ancestor
Species tree
• A phylogenetic tree showing the relationship among various species that are believed to have a common ancestor
(Figure: a species tree showing the evolutionary history of a set of species; its internal nodes are speciation nodes.)
Gene tree
• A phylogenetic tree that depicts how a single gene has evolved in a group of related species
• In this talk, a gene "evolves" only through duplication or loss
• Can be constructed over the topology of a species tree
(Figure: a gene tree showing the evolutionary history of a single gene; its internal nodes are speciation nodes and duplication nodes.)
Some definitions: Homologs
• Homolog: A gene related to a second gene by descent from a common ancestral DNA sequence
• Two types:
(i) Orthologs
(ii) Paralogs
Orthologs
• Genes in different species that evolved from a common ancestral gene by speciation
• Usually retain the same function
(Figure: within the primates, a speciation event separates the human and chimp genes.)
Paralogs
• Genes related by duplication within a genome
• Often evolve new functions
(Figure: primates (chimp, human) and rodents (rat, mouse).)
Why are Gene Trees interesting?
• Determine the evolutionary history of a gene family
• Infer gene duplications and losses
• Estimate bounds on when these events occurred
• Determine whether a given pair of homologs is orthologous or paralogous
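Once the internal nodes of a gene tree are labelled as speciation or duplication nodes, the last bullet becomes a lookup: following the definitions above, two genes are orthologs if they diverged at a speciation node and paralogs if they diverged at a duplication node. The sketch below assumes a hypothetical encoding (a gene-tree node is a `(tag, left, right)` tuple with tag `"spec"` or `"dup"`, a leaf is a gene label) and a made-up example tree; the function names are illustrative, not from the talk.

```python
def leaves(t):
    """Return the set of gene labels under a gene-tree node."""
    if isinstance(t, str):
        return {t}
    _, left, right = t
    return leaves(left) | leaves(right)


def relationship(gene_tree, a, b):
    """Classify two distinct genes by the node where their lineages split.

    Walks down from the root while both genes lie in the same child;
    the node where they separate is their last common ancestor.
    """
    node = gene_tree
    while True:
        tag, left, right = node
        if {a, b} <= leaves(left):
            node = left
        elif {a, b} <= leaves(right):
            node = right
        else:
            return "orthologs" if tag == "spec" else "paralogs"


# Hypothetical gene tree: A1/A2 and B1/B2 each arose by a duplication.
gt = ("spec", ("dup", "A1", "A2"), ("spec", "C1", ("dup", "B1", "B2")))
print(relationship(gt, "A1", "C1"))  # orthologs
print(relationship(gt, "B1", "B2"))  # paralogs
```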
A gene tree can be constructed over a species tree topology
(Figure: "Primates intelligence")
No, seriously...
Gene Tree Reconstruction
• Problem: Given a set of sequences from a gene family, find the tree that best explains the data
• Two models:
  – Micro-evolutionary: considers sequence evolution only
  – Macro-evolutionary: considers duplications and losses only; useful but rarely used
Macro-evolutionary Problem
Reconstruction algorithm
• Only macro-evolutionary events are considered
• i: number of gene copies a node inherits from its parent
• j: number of gene copies a node sends to its children
• i and j range from 1 to m, where m is the maximum number of copies of the gene in any species
Reconstruction algorithm
• Exactly one gene copy enters the root
• For each node v, the dynamic program computes the minimum duplication/loss (D/L) score of the subtree rooted at v, for all possible values of i and j
Step 1: Annotates minimum cost tables for all nodes
• cost[i, j] = cost at a node if it inherits i genes and sends j genes
• cost[i] = minimum cost at a node if it inherits i genes
  = min over all j of cost[i, j]
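The two definitions above, plus the bound m, can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not the talk's code: `max_multiplicity` and `min_over_j` are hypothetical names, a cost[i, j] table is assumed to be a dict keyed by `(i, j)`, the species names are placeholders, and the sample values are the cost[i, j] entries of the internal node in the worked example that follows.

```python
def max_multiplicity(copies):
    """m: the largest number of copies of the gene in any one species."""
    return max(copies.values())


def min_over_j(pair_costs, m):
    """cost[i] = minimum of cost[i, j] over j = 1..m, for each i = 1..m."""
    return {i: min(pair_costs[(i, j)] for j in range(1, m + 1))
            for i in range(1, m + 1)}


m = max_multiplicity({"A": 2, "B": 2, "C": 1})  # placeholder species
pair_costs = {(1, 1): 1, (1, 2): 2, (2, 1): 2, (2, 2): 1}
print(min_over_j(pair_costs, m))  # {1: 1, 2: 1}
```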
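Worked example (m = 2): two leaf species carry 2 copies of the gene and one carries 1 copy. A leaf with c copies has cost[i] = |i - c|.
• 2-copy leaf: cost[1] = 1, cost[2] = 0
• 2-copy leaf: cost[1] = 1, cost[2] = 0
• 1-copy leaf: cost[1] = 0, cost[2] = 1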
Cost of an internal node: cost[i, j] = |i - j| (duplications or losses at the node) + cost[j] of the left subtree + cost[j] of the right subtree, since each child inherits the j copies
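Internal node (left child: the 1-copy leaf; right child: a 2-copy leaf):
• cost[1, 1] = 0 + 0 + 1 = 1
• cost[1, 2] = 1 + 1 + 0 = 2
• cost[2, 1] = 1 + 0 + 1 = 2
• cost[2, 2] = 0 + 1 + 0 = 1
• cost[1] = 1, cost[2] = 1
Root (left child: the other 2-copy leaf; right child: the internal node; only i = 1 is allowed):
• cost[1, 1] = 0 + 1 + 1 = 2
• cost[1, 2] = 1 + 0 + 1 = 2
• cost[1] = 2, the minimum D/L score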
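Step 1 can be sketched as a bottom-up recursion that reproduces the numbers above. Assumptions, none of them from the slides: the species tree is encoded as nested tuples, the placeholder leaves A and B carry 2 copies and C carries 1 (arranged to match the term order above), a node that inherits i copies and sends j copies is charged |i - j|, and `annotate` is a hypothetical name.

```python
def annotate(tree, copies, m, tables):
    """Step 1: store {i: cost[i]} for every node of `tree` in `tables`.

    tree   -- species tree as nested tuples; a leaf is a species name
    copies -- observed number of gene copies in each leaf species
    m      -- maximum multiplicity, so i and j both range over 1..m
    Returns the cost table of `tree` itself.
    """
    span = range(1, m + 1)
    if isinstance(tree, str):
        # Assumed leaf rule: end with exactly copies[tree] genes.
        table = {i: abs(i - copies[tree]) for i in span}
    else:
        left, right = tree
        lt = annotate(left, copies, m, tables)
        rt = annotate(right, copies, m, tables)
        # cost[i, j] = |i - j| at this node + cost[j] of each child
        pair = {(i, j): abs(i - j) + lt[j] + rt[j] for i in span for j in span}
        table = {i: min(pair[(i, j)] for j in span) for i in span}
    tables[tree] = table
    return table


tree = ("A", ("C", "B"))           # placeholder species names
copies = {"A": 2, "B": 2, "C": 1}  # A and B carry 2 copies, C carries 1
tables = {}
root = annotate(tree, copies, max(copies.values()), tables)
print(tables[("C", "B")])  # {1: 1, 2: 1}
print(root[1])             # 2, the minimum D/L score
```

The root's table also holds cost[2] = 1, which is ignored because exactly one copy enters the root. Note that cost[1, 1] and cost[1, 2] tie at the root, so more than one optimal history exists.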
Step 2: Enumerate all histories from the cost tables
• Maintain three variables for each node
• dups = optimal number of duplicated genes
• losses = optimal number of lost genes
• out = optimal number of genes to pass to its children
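• Root: out = 1, dups = 0, losses = 0
• 2-copy leaf under the root: dups = 1, losses = 0
• Internal node: out = 1, dups = 0, losses = 0
• 1-copy leaf: dups = 0, losses = 0
• 2-copy leaf under the internal node: dups = 1, losses = 0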
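Step 2 can be sketched as a traceback over the Step 1 tables that yields every optimal history rather than just one. This is a minimal sketch under assumptions: the species tree is nested tuples with placeholder leaves A, B (2 copies) and C (1 copy), the per-node charge is |i - j|, a history is a dict mapping each node to `(out, dups, losses)`, and `annotate`/`histories` are hypothetical names.

```python
from itertools import product


def annotate(tree, copies, m, tables):
    """Step 1: store {i: cost[i]} for every node in tables; return the root's."""
    span = range(1, m + 1)
    if isinstance(tree, str):
        table = {i: abs(i - copies[tree]) for i in span}
    else:
        lt = annotate(tree[0], copies, m, tables)
        rt = annotate(tree[1], copies, m, tables)
        table = {i: min(abs(i - j) + lt[j] + rt[j] for j in span) for i in span}
    tables[tree] = table
    return table


def histories(tree, i, copies, m, tables):
    """Step 2: yield every optimal history of `tree` when it inherits i copies.

    A node that inherits i copies and sends j has max(j - i, 0)
    duplications and max(i - j, 0) losses; a leaf "sends" its
    observed copy count.
    """
    if isinstance(tree, str):
        j = copies[tree]
        yield {tree: (j, max(j - i, 0), max(i - j, 0))}
        return
    left, right = tree
    for j in range(1, m + 1):
        # Keep only the choices of j that achieve the optimal cost[i].
        if abs(i - j) + tables[left][j] + tables[right][j] != tables[tree][i]:
            continue
        here = {tree: (j, max(j - i, 0), max(i - j, 0))}
        for lh, rh in product(histories(left, j, copies, m, tables),
                              histories(right, j, copies, m, tables)):
            yield {**here, **lh, **rh}


tree = ("A", ("C", "B"))           # placeholder species names
copies = {"A": 2, "B": 2, "C": 1}
m = max(copies.values())
tables = {}
annotate(tree, copies, m, tables)
all_histories = list(histories(tree, 1, copies, m, tables))  # root inherits 1
for h in all_histories:
    print(h)
```

On this example the sketch yields two histories, each with a total D/L score of 2. The first (out = 1 at the root and at the internal node, one duplication in each 2-copy leaf) is the history listed above; the second passes 2 copies out of the root (one duplication there) and loses one copy in the 1-copy leaf.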
Step 3: Build a gene tree to represent the history
• From Step 2: one duplication in humans and one duplication in frogs
• Build the gene tree from this information and the topology of the species tree
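One way to turn a history into a gene tree is sketched below in Python. Assumptions, none taken from the slides: gene-tree nodes are tuples tagged `"spec"` (speciation) or `"dup"` (duplication), a lost copy is the string `"lost"`, gene labels such as A1 and A2 are generated from placeholder species names, and when a node duplicates, the last incoming copy is the one that duplicates (the slides do not say which copy does; any choice gives the same counts). A and B stand for the two duplicating species (humans and frogs above), C for the single-copy species, and `build` is a hypothetical name.

```python
def build(node, i, out, copies):
    """Return the gene subtrees for the i copies entering `node`.

    out[node] is the number of copies an internal node passes to its
    children (from Step 2); a leaf keeps its observed count copies[node].
    """
    if isinstance(node, str):
        j = copies[node]
        kept = [f"{node}{k + 1}" for k in range(j)]  # e.g. A1, A2
    else:
        j = out[node]
        left, right = node
        ls = build(left, j, out, copies)
        rs = build(right, j, out, copies)
        # Each of the j copies splits at this speciation node.
        kept = [("spec", a, b) for a, b in zip(ls, rs)]
    if j >= i:
        # Assumption: the last incoming copy absorbs all j - i duplications.
        tail = kept[i - 1]
        for extra in kept[i:]:
            tail = ("dup", tail, extra)
        return kept[: i - 1] + [tail]
    # j < i: the surplus incoming copies are lost.
    return kept + ["lost"] * (i - j)


species = ("A", ("C", "B"))          # placeholder species tree
out = {species: 1, ("C", "B"): 1}   # the Step 2 history: out = 1 at both
copies = {"A": 2, "B": 2, "C": 1}
gene_tree = build(species, 1, out, copies)[0]
print(gene_tree)
# ('spec', ('dup', 'A1', 'A2'), ('spec', 'C1', ('dup', 'B1', 'B2')))
```

The result has one duplication node above each of the two 2-copy leaves and speciation nodes wherever the species tree branches, matching the history from Step 2.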
Hybrid Model