1
Paralogs
Inbal Yanover
Reading Group in Computational Molecular Biology
2
• Orthologs: Homologous sequences are orthologous if they were separated by a speciation event
• Paralogs: paralogous if they were separated by a gene duplication event
Homologs
3
Genomic duplication
Can involve:
• Individual genes
• Genomic segments
• Whole genome duplication (WGD)
Gene duplication has a major role in evolution.
4
Whole genome duplication
• Large scale adaptation
• Polyploidy instability
• Back to stability:
– gene loss
– mutation
– genomic rearrangements
5
Fate of duplicated genes
Find specialized ‘niche’:
• Localization
• Temporal expression
• Expression level
Another classification:
• Sub-functionalization
• Neo-functionalization (lowest probability)
• Non-functionalization (70%)
6
Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae
Kellis M, Birren BW, Lander ES.
Nature. Apr 2004.
First article
7
• The S. cerevisiae genome arose from an ancient whole-genome duplication that occurred after its divergence from K. waltii
• Analyzing post duplication divergence of paralogs
Main ideas
8
• After duplication, usually, one paralog would be lost (random local deletions)
• Both copies will be retained only if they acquire distinct functions
• Eventually: a few paralog genes in the same order and same orientation
• Those regions should be short since chromosomal rearrangements will disrupt gene order over time
Expected signature for genome duplication:
9
Model for WGD followed by massive gene loss
Common ancestor
10
Proving existence of an ancient WGD
• Look for a species (Y) in the lineage of S.cerevisiae (S).
• Y and S should have 1:2 mapping and:
– Nearly every region in Y would correspond to 2 regions in S (‘sister regions’).
– Each sister region in S would contain an ordered subsequence of the genes in Y.
– Each sister region in S would contain ~half of Y’s genes.
– Together, the two sister regions account for nearly all of Y’s genes.
– Every region of S would correspond to one region in Y.
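The criteria above can be sketched as a small check on one Y region and its two candidate sister regions. This is only an illustration, assuming the S genes have already been translated to the identifiers of their Y matches; the function names and the gene labels `g1`..`g6` are hypothetical.

```python
def is_ordered_subsequence(sub, full):
    """Return True if the genes in `sub` appear in `full` in the same relative order."""
    remaining = iter(full)
    # `gene in remaining` consumes the iterator, so order is enforced.
    return all(gene in remaining for gene in sub)


def check_one_to_two(y_region, sister_a, sister_b):
    """Check the expected 1:2 signature for one Y region.

    Returns (ordered, coverage): whether both sister regions are ordered
    subsequences of the Y region, and the fraction of Y genes found in
    at least one sister region.
    """
    ordered = (is_ordered_subsequence(sister_a, y_region)
               and is_ordered_subsequence(sister_b, y_region))
    covered = (set(sister_a) | set(sister_b)) & set(y_region)
    coverage = len(covered) / len(y_region)
    return ordered, coverage


# Toy example: g4 was retained in both sister regions, every other gene in one.
y_region = ["g1", "g2", "g3", "g4", "g5", "g6"]
sister_a = ["g1", "g3", "g4"]
sister_b = ["g2", "g4", "g5", "g6"]
ordered, coverage = check_one_to_two(y_region, sister_a, sister_b)
```

In this toy case both sister regions keep the Y gene order and together account for every Y gene.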
11
Y = K. waltii
• Sequenced and assembled into 8 complete chromosomes (16 in S. cerevisiae).
• 5,230 likely protein-coding genes (5,714 genes in S. cerevisiae).
• 7% of its genes show no protein similarity to S. cerevisiae.
• Identifying orthologous regions:
– Matching genes (based on protein similarity).
– Regions with numerous matching genes in the same order.
• Most local regions in K. waltii mapped to two regions in S. cerevisiae.
• Each of those regions matched a subset of the K. waltii genes.
12
Quantify observations
DCS – Doubly Conserved Synteny:
maximal regions in K. waltii that map across their entire length to two distinct regions in S. cerevisiae.
13
Gene and region correspondence
14
Results
• 253 DCS blocks containing most of both genomes. (75% of K. waltii genes and 81% of S. cerevisiae genes)
• DCS blocks tile 85% of each K. waltii chromosome -> as expected in WGD
• Typical DCS block:
– 27 genes.
– Separated by small segments (~3 genes) that match one conserved region in S. cerevisiae.
15
Duplicate mapping of centromeres
Note: no paralogs here!
16
• Using the DCS blocks: define 253 sister regions in S. cerevisiae.
• Many of those could not be recognized without K. waltii mediation.
Duplicated blocks in S. cerevisiae
17
Duplicated blocks in S. cerevisiae
18
Zooming in on one sister region
19
Conclusion
WGD event occurred in the Saccharomyces lineage after the divergence from K. waltii.
20
Pattern of gene loss
• Number of chromosomes was doubled.
• Despite WGD, current S. cerevisiae genome:
– 13% larger than K. waltii genome.
– 10% more genes.
• Gene loss:
– Large segmental deletions <-> individual gene deletions.
– Balanced between two paralogs <-> acting primarily on one of them.
• Analysis of DCS blocks shows:
– Average size of lost segment: 2 genes.
– Average balance: 43%-57%.
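A minimal sketch of how a loss balance could be computed for one DCS block. It reads "balance" as the share of gene losses falling on each sister region, which is an interpretation rather than something the slides state; the function name and the `"A"` / `"B"` / `"both"` labels are hypothetical.

```python
def loss_balance(retained):
    """Share of gene losses that fell on sister region A and on sister region B.

    `retained` has one entry per ancestral gene: "A" if only the copy in
    sister region A survives, "B" if only the copy in B survives, and
    "both" if the gene is still present in both regions.
    """
    lost_from_a = retained.count("B")  # only B kept, so the A copy was lost
    lost_from_b = retained.count("A")  # only A kept, so the B copy was lost
    total_lost = lost_from_a + lost_from_b
    if total_lost == 0:
        return None  # nothing was lost in this block
    return lost_from_a / total_lost, lost_from_b / total_lost


# Toy block of seven ancestral genes.
block = ["A", "B", "B", "both", "A", "B", "B"]
share_a, share_b = loss_balance(block)
```

Here four of the six losses fall on region A and two on region B, so the block is unbalanced towards A.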
21
Two models – what happens after duplication event
• One copy preserves original function while the other one is free to diverge (Ohno)
• Both copies would diverge more rapidly and acquire new functions
22
Study the evolution of the 457 gene pairs that arose by WGD:
• Use synteny to distinguish them from pairs which arose by local duplication events.
• Compute divergence rates for them at both the amino acid and nucleotide levels, using sequences of K. waltii, S. cerevisiae and S. bayanus.
Evolutionary analysis
23
Results
• 17% of gene pairs (76 of 457) showed accelerated protein evolution relative to K. waltii.
• In 95% of them, accelerated evolution was confined to only one paralog
• Supports Ohno’s model: one paralog retains ancestral function, the other one gains a derived function
24
• In 115 gene pairs, one paralog has evolved >50% faster than the other.
• Often, derived paralogs are specialized in:
– Cellular localization (Acc1 - Hfa1)
– Temporal expression (Skt5 - Shc1)
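The ">50% faster" criterion can be written as a simple comparison of two divergence rates. The sketch below is illustrative only: the function name, the `factor` parameter and the example rates are assumptions, not values from the paper.

```python
def derived_paralog(rate_a, rate_b, factor=1.5):
    """Return "a" or "b" for the paralog that evolved more than `factor`
    times as fast as its partner (the derived copy), or None if the two
    rates are comparable.
    """
    if rate_a > factor * rate_b:
        return "a"
    if rate_b > factor * rate_a:
        return "b"
    return None


# Hypothetical divergence rates relative to the outgroup.
clear_case = derived_paralog(0.30, 0.10)   # copy "a" is much faster
similar_case = derived_paralog(0.10, 0.12)  # neither copy stands out
```

The slower copy of a pair flagged this way would be treated as the ancestral paralog.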
Ancestral <-> derived paralogs
25
Ancestral <-> derived paralogs, cont.
• Functional distinction confirmed with knockout experiments (in rich medium) of all 115 genes:
– Deletion of ancestral paralog was lethal in 18%.
– Deletion of derived paralog was never lethal.
• Explanation:
– Derived paralog is not essential under these conditions.
– Ancestral paralog compensates (but not vice versa).
26
• 60 of the 457 pairs (13%) showed decelerated protein evolution.
• Including highly constrained proteins:
– Ribosomal proteins (25)
– Histone proteins (2)
– Translation factors (4)
• In 90% of them both paralogs were very similar (98% amino acid identity versus 55% for all pairs)
More results
27
However…
• ~70% of the gene pairs had neither accelerated protein evolution nor decelerated evolution (321/457)
• Possible explanations:
– Too strict criteria.
– Divergence in regulatory regions will not be seen here.
– Sometimes it’s nice to have two copies.
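As a quick arithmetic check, the three categories reported for the 457 WGD pairs can be verified to partition the set and to match the quoted percentages. The counts are taken from the slides; the dictionary keys are just labels.

```python
TOTAL_PAIRS = 457
counts = {"accelerated": 76, "decelerated": 60, "neither": 321}

# The three categories should account for every WGD pair.
all_pairs_accounted_for = sum(counts.values()) == TOTAL_PAIRS

# Percentages rounded to whole numbers, as quoted on the slides.
percentages = {name: round(100 * n / TOTAL_PAIRS) for name, n in counts.items()}
```

The rounded values come out as 17%, 13% and 70%.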
28
Summary
• S. cerevisiae arose from an ancient WGD.
– Massive loss of ~90% of duplicated genes in small deletions.
– Preserving at least one copy of each ancestral gene.
• Divergence of paralogs:
– Accelerated evolution (17%)
– Derived genes tend to be specialized in function, expression level and localization.
– Derived genes tend to lose essential aspects of their ancestral function.
29
Second article
Transcription control reprogramming in genetic backup circuits.
Kafri R, Bar-Even A, Pilpel Y.
Nat Genet. Mar 2005.
30
Introduction
• Severe mutations often don’t result in abnormal phenotype
• Partially ascribed to redundant paralogs, that provide backup to each other in case of mutation
• Suggested mechanism: transcriptional reprogramming
31
Definitions
• Working on S. cerevisiae.
• Paralog pairs defined by BLASTing their DNA sequences.
• Dispensable genes = non essential.
32
Expression parameters
• For each pair of paralogs:
– Calculate 40 correlation coefficients of mRNA expression.
– Define mean expression similarity as the mean of these coefficients.
– Define partial co-regulation (PCoR) as their standard deviation.
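The two expression parameters can be sketched directly from their definitions. Whether the population or the sample standard deviation is intended is not stated here, so the use of `pstdev` below is an assumption, and the example coefficients are invented.

```python
from statistics import mean, pstdev


def expression_parameters(correlations):
    """Return (mean expression similarity, PCoR) for one paralog pair.

    `correlations` holds the pair's expression correlation coefficients
    (40 per pair in the setup described above). PCoR is taken as the
    population standard deviation, which is an assumption.
    """
    return mean(correlations), pstdev(correlations)


# Invented pair: strongly correlated in half of the cases, weakly in the rest.
coefficients = [0.9] * 20 + [0.1] * 20
similarity, pcor = expression_parameters(coefficients)
```

A pair like this one has an intermediate mean similarity but a large spread, which is the situation PCoR is meant to capture: co-regulated under some conditions and not under others.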
33
Summary of observations
                  Co-expressed   Expressed differently
Remote paralog         -                   +
Close paralog          +                   -
(+ : backup enabled)
34
Close paralogs
• Backup increases with co-expression.
• Similar sequences:– Similar expression– Enable backup
35
Remote paralogs
• Backup is optimal in non-co-expressed pairs.
• Co-expression (little backup) may reflect:
– interaction
– sub-functionalization
36
Suggestion for backup mechanism
• A, B - genes which are expressed differently.
• Upon mutation in A: expression of gene B is reprogrammed.
• Result: wild type expression profile of A.
37
Experimental verification: reprogramming in Acs1/Acs2
[Figure: Acs1 and Acs2 expression in glucose; panel labels: Wild-type, Acs1, Acs2]
38
What is the mechanism enabling this change?
• Suggestion: backup occurs among paralogs with partial co-regulation.
• This enables switching from a different expression profile to a similar one.
• Observation: PCoR predicts backup.
39
Partial motif content overlap is optimal for backup
O = |m1 ∩ m2| / |m1 ∪ m2|
[Figure: proportion of dispensable genes (backup measure, y-axis, 0.6 to 1) versus motif content overlap O (x-axis, 0 to 0.6)]
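The overlap O is the Jaccard index of the two motif sets, so it can be computed in a few lines. The function name and the motif labels are placeholders; returning 0.0 for two empty motif sets is a convention chosen here, not something the slides specify.

```python
def motif_overlap(m1, m2):
    """Motif content overlap O = |m1 ∩ m2| / |m1 ∪ m2| for two paralogs."""
    m1, m2 = set(m1), set(m2)
    union = m1 | m2
    if not union:
        return 0.0  # convention for two promoters with no known motifs
    return len(m1 & m2) / len(union)


# Placeholder motif names: two shared motifs, one unique motif per gene.
overlap = motif_overlap({"motif_A", "motif_B", "motif_C"},
                        {"motif_B", "motif_C", "motif_D"})
```

Two shared motifs out of four distinct motifs give O = 0.5, a partial overlap.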
40
Suggestion
• Unique motifs -> different expression level.
• Shared motifs -> enable responding to the same conditions.
Hypothesis: PCoR underlies reprogramming and backup.
41
In high-PCoR paralogs, one gene is upregulated when the other is deleted
[Figure: fold change (y-axis, 0 to 10) versus partial co-regulation, i.e. predicted backup capacity (x-axis bins: <0.35, 0.35 – 0.45, >0.45). Data: Hughes et al., Cell 2000]
42
What controls reprogramming?
• Kinetic model:
[Diagram: kinetic model with components T, E1, E2, G1, G2, M1, M2]
G1, G2 – paralog genes.
E1, E2 – their products.
T – a TF that is generated by M1 and has a binding site in both genes.
43
Conclusions
• In remote paralogs: genes which are expressed differently but have partially common regulation tend to back each other up.
• In close paralogs: backup increases with co-expression.
44
Third article
Gene regulatory network growth by duplication
Teichmann SA, Babu MM.
Nat Genet. May 2004.
45
• What is the role of gene duplication in regulatory network evolution?
• Determine the extent to which duplicated genes inherit interactions from their ancestors.
• Describe possible mechanisms which lead to the formation of a new interaction.
Main questions
46
• Transcription factor
• DNA binding site
• Target gene (or transcription unit)
Complex network:
• One gene is regulated by a few transcription factors.
• One transcription factor controls more than one gene.
[Diagram labels: Transcription factor, Target gene]
Basic unit of gene regulation
47
Research subjects
E. coli and yeast known regulatory networks:
> 100 transcription factors regulate several hundred genes.
Gene regulatory network in yeast:
477 proteins (109 TFs + 368 TGs), 901 interactions
Gene regulatory network in E. coli:
795 proteins (121 TFs + 674 TGs), 1423 interactions
48
• Duplication event:
– Inherit regulatory interaction
– Lose regulatory interaction
• Also, a new interaction may arise.
Duplication (reminder)
49
• Homology is detected through structural protein homology, which detects more distant relationships than sequence comparison.
• > 65% of the genes are the result of gene duplication.
• Same domain architecture -> common ancestor.
Homology detecting
50
Duplication of transcription factor
[Diagram: duplication of TF; labels: Transcription factor, Target gene, Inheritance, Loss and gain]
51
Duplication of transcription factor (TF)
• At first, the new TF regulates the same target gene.
• Divergence:
– Regulate the same gene but respond to a different signal.
– Recognize a new binding site.
• More than 2/3 of TFs in E. coli and yeast have at least one interaction in common with their duplicates: 128 interactions in E. coli (10%), 188 interactions in yeast (22%).
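A minimal sketch of how inherited interactions between a TF and its duplicate could be identified, assuming the network is stored as a mapping from each TF to the set of its target genes. All names (`TF_old`, `TF_new`, `gene_x`, ...) are hypothetical.

```python
def shared_targets(network, tf_a, tf_b):
    """Target genes regulated by both TFs (candidate inherited interactions)."""
    return network.get(tf_a, set()) & network.get(tf_b, set())


# Toy network: each TF maps to the set of genes it regulates.
network = {
    "TF_old": {"gene_x", "gene_y"},
    "TF_new": {"gene_y", "gene_z"},  # duplicate of TF_old
}
common = shared_targets(network, "TF_old", "TF_new")
has_inherited_interaction = bool(common)
```

In this toy case the duplicates still share `gene_y`, while `gene_x` and `gene_z` represent interactions lost or gained after the duplication.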
52
• Both homologs are involved in drug response.
• They respond to different signals.
[Diagram labels: Pdr1, Pdr3 (TFs); Flr1 (target gene)]
Example: Duplication of TF in yeast
53
Duplication of target gene and its upstream region
[Diagram labels: Transcription factor, Target gene, Loss and gain, Inheritance]
54
Duplication of target gene (TG) and its upstream region
• First, both genes are regulated by the same TF.
• Divergence:
– Change coding sequence but stay under the same TF control.
– Change upstream region as well, resulting in recognition by a different TF.
• 272 interactions in E. coli (22%), 166 interactions in yeast (20%).
55
The BioA and BioBFCD operons are regulated by the BirA TF.
BioA and BioF are homologous enzymes in the biotin biosynthesis pathway.
Example: Duplication of TG in E. coli
[Diagram labels: BirA (TF); BioA, BioF (target genes)]
56
Duplication of transcription factor (TF) and its target gene (TG) around the same time
[Diagram: duplication of TF + TG; labels: gain, gain]
57
Duplication of transcription factor (TF) and its target gene (TG) around the same time
• Can happen if both were adjacent on the chromosome.
• New TF regulates only the new TG, while old TF regulates old TG.
• Divergence of TF or TG can result in additional interactions.
• 74 interactions in E. coli (6%), 31 interactions in yeast (4%).
58
Example: Duplication of both TF and its TG in E. coli
[Diagram labels: AraC, RhaR (TFs); AraBAD, RhaBAD (target operons)]
60
Some more numbers
• Duplication and inheritance:
        E. coli   Yeast
TF:     10%       22%
TG:     22%       20%
Both:   6%        4%
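Summing the three rows gives the overall share of interactions that were inherited after a duplication in each organism. The percentages are the ones in the table; the dictionary layout is only for illustration.

```python
# Percentage of interactions inherited after each kind of duplication.
inherited = {
    "E. coli": {"TF": 10, "TG": 22, "both": 6},
    "yeast": {"TF": 22, "TG": 20, "both": 4},
}

# Total inherited share per organism, in percent.
total_inherited = {organism: sum(parts.values())
                   for organism, parts in inherited.items()}
```

The totals are 38% for E. coli and 46% for yeast, i.e. roughly 40 to 45% of all interactions in each network.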
61
• Gene regulatory networks in E. coli and yeast:
The number of TGs per TF obeys a power law.
• Do TF with many TG have many homologous genes as their target?
No.
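One crude way to look for power-law behaviour in the number of TGs per TF is to fit a straight line to the degree distribution on log-log axes. This least-squares fit is a rough diagnostic chosen here for illustration, not the method used in the paper; the function names and the synthetic distribution are assumptions.

```python
import math
from collections import Counter


def degree_distribution(network):
    """Count how many TFs regulate k target genes, for each k > 0."""
    return Counter(len(targets) for targets in network.values() if targets)


def loglog_slope(distribution):
    """Least-squares slope of log(count) against log(k).

    Needs at least two distinct values of k. A roughly straight line
    with a negative slope is what a power law would produce.
    """
    xs = [math.log(k) for k in distribution]
    ys = [math.log(distribution[k]) for k in distribution]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic distribution following count(k) = 64 / k**2 exactly.
synthetic = {1: 64, 2: 16, 4: 4, 8: 1}
slope = loglog_slope(synthetic)

# Degree distribution of a toy network (TF -> set of target genes).
toy_network = {"tf1": {"a", "b"}, "tf2": {"a"}, "tf3": {"c"}, "tf4": set()}
toy_distribution = degree_distribution(toy_network)
```

For the synthetic data the fitted slope is close to -2, the exponent used to generate it. Real degree distributions are noisy, so a fit like this is only a first look.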
Are duplication patterns linked to topology of networks?
62
Conclusions
• In both E. coli and yeast ~90% of the interactions evolved by duplication:
– Half of them: duplication + inheritance of interaction
– Other half: duplication + gain of new interactions.
63
The End
Of the semester…