1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology


Page 1: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

1

Paralogs

Inbal Yanover

Reading Group in Computational Molecular Biology

Page 2: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

2

Homologs

• Orthologs: homologous sequences are orthologous if they were separated by a speciation event.

• Paralogs: homologous sequences are paralogous if they were separated by a gene duplication event.

Page 3: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

3

Genomic duplication

Can involve:
• Individual genes
• Genomic segments
• Whole genome duplication (WGD)

Gene duplication has a major role in evolution.

Page 4: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

4

Whole genome duplication

• Large scale adaptation

• Polyploidy instability

• Back to stability:
  – gene loss
  – mutation
  – genomic rearrangements

Page 5: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

5

Fate of duplicated genes

Find specialized ‘niche’:
• Localization
• Temporal expression
• Expression level

Another classification:
• Sub-functionalization
• Neo-functionalization (lowest probability)
• Non-functionalization (70%)

Page 6: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

6

First article

Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae

Kellis M, Birren BW, Lander ES.

Nature. Apr 2004.

Page 7: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

7

Main ideas

• The S. cerevisiae genome arose from an ancient whole-genome duplication, shown by comparison with K. waltii, which diverged before the duplication

• Analyzing post-duplication divergence of paralogs

Page 8: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

8

Expected signature for genome duplication:

• After duplication, one paralog is usually lost (random local deletions)

• Both copies will be retained only if they acquire distinct functions

• Eventually: only a few paralogous genes remain, in the same order and the same orientation

• Those regions should be short, since chromosomal rearrangements will disrupt gene order over time

Page 9: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

9

Model for WGD followed by massive gene loss

Common ancestor

Page 10: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

10

Proving existence of an ancient WGD

• Look for a related species (Y) that diverged from the S. cerevisiae (S) lineage before the duplication.

• Y and S should have a 1:2 mapping:
  – Nearly every region in Y would correspond to 2 regions in S (‘sister regions’).
  – Each sister region in S would contain an ordered subsequence of the genes in Y.
  – Each sister region in S would contain ~half of Y’s genes.
  – Together, the two sister regions account for nearly all of Y’s genes.
  – Every region of S would correspond to one region in Y.

Page 11: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

11

Y = K. waltii

• Sequenced and assembled into 8 complete chromosomes (16 in S. cerevisiae).

• 5,230 likely protein-coding genes (5,714 genes in S. cerevisiae).

• 7% of its genes show no protein similarity to S. cerevisiae.

• Identifying orthologous regions:
  – Matching genes (based on protein similarity).
  – Regions with numerous matching genes in the same order.

• Most local regions in K. waltii mapped to two regions in S. cerevisiae.

• Each of those regions matched a subset of the K. waltii genes.

Page 12: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

12

Quantify observations

DCS – Doubly Conserved Synteny:

maximal regions in K. waltii that map across their entire length to two distinct regions in S. cerevisiae.
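One way to picture the DCS definition is to walk along a K. waltii chromosome and extend a block for as long as the genes keep mapping to the same two S. cerevisiae regions. The sketch below is a greedy toy version only: it assumes each K. waltii gene has already been assigned the set of S. cerevisiae regions it matches, it ignores gene order inside the S. cerevisiae regions, and the gene names, region labels and the `dcs_blocks` helper are all hypothetical rather than taken from the paper.

```python
def dcs_blocks(gene_hits):
    """Greedy toy version of DCS block detection.

    gene_hits: list of (gene, set of S. cerevisiae regions it matches),
    given in K. waltii chromosome order. All names are hypothetical.
    """
    blocks, current, regions = [], [], set()
    for gene, hits in gene_hits:
        if len(regions | hits) > 2:
            # A third region would be needed, so close the current block.
            if len(regions) == 2:
                blocks.append((current, regions))
            current, regions = [], set()
        current.append(gene)
        regions = regions | hits
    # Keep the last block only if it maps to exactly two regions.
    if len(regions) == 2:
        blocks.append((current, regions))
    return blocks


example = [
    ("kw1", {"S_a"}), ("kw2", {"S_b"}), ("kw3", {"S_a", "S_b"}),
    ("kw4", {"S_c"}), ("kw5", {"S_d"}), ("kw6", {"S_c"}),
]
blocks = dcs_blocks(example)
for genes, regions in blocks:
    print(genes, sorted(regions))
```

In this toy input the first three genes form one block mapped to regions S_a and S_b, and the last three form a second block mapped to S_c and S_d.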

Page 13: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

13

Gene and region correspondence

Page 14: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

14

Results

• 253 DCS blocks containing most of both genomes (75% of K. waltii genes and 81% of S. cerevisiae genes).

• DCS blocks tile 85% of each K. waltii chromosome -> as expected in WGD.

• Typical DCS block:
  – 27 genes.
  – Separated by small segments (~3 genes) that match one conserved region in S. cerevisiae.

Page 15: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

15

Duplicate mapping of centromeres

Note: no paralogs here!

Page 16: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

16

Duplicated blocks in S. cerevisiae

• Using the DCS blocks: define 253 sister regions in S. cerevisiae.

• Many of those could not be recognized without K. waltii mediation.

Page 17: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

17

Duplicated blocks in S. cerevisiae

Page 18: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

18

Zooming in on one sister region

Page 19: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

19

Conclusion

WGD event occurred in the Saccharomyces lineage after the divergence from K. waltii.

Page 20: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

20

Pattern of gene loss

• The number of chromosomes was doubled.

• Despite the WGD, the current S. cerevisiae genome:
  – is only 13% larger than the K. waltii genome.
  – has only 10% more genes.

• Gene loss:
  – Large segmental deletions versus individual gene deletions.
  – Balanced between the two paralogs versus acting primarily on one of them.

• Analysis of DCS blocks shows:
  – average size of lost segment: 2 genes.
  – average balance: 43%-57%.
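The balance figure can be read as the split of gene losses between the two sister regions of a DCS block. A minimal sketch, assuming each ancestral gene in a block is labelled by where a copy survives ("A", "B" or "both"); the labels, the `loss_balance` helper and the toy block are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter


def loss_balance(retained):
    """Split of gene losses between two sister regions.

    retained: one label per ancestral gene: 'A', 'B' or 'both'.
    A gene kept only in A was lost from B, and vice versa.
    Returns (smaller share, larger share), or None if nothing was lost.
    """
    counts = Counter(retained)
    lost_from_a = counts["B"]
    lost_from_b = counts["A"]
    total = lost_from_a + lost_from_b
    if total == 0:
        return None
    smaller, larger = sorted((lost_from_a, lost_from_b))
    return smaller / total, larger / total


# Toy block: 4 genes kept only in A, 3 kept only in B, 1 kept in both.
toy_block = ["A"] * 4 + ["B"] * 3 + ["both"]
low, high = loss_balance(toy_block)
print(f"{low:.0%} - {high:.0%}")
```

With 3 losses from one region and 4 from the other, the split is 3/7 versus 4/7, which is the kind of mildly unbalanced pattern the slide reports on average.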

Page 21: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

21

Two models – what happens after duplication event

• One copy preserves original function while the other one is free to diverge (Ohno)

• Both copies would diverge more rapidly and acquire new functions

Page 22: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

22

Evolutionary analysis

Study the evolution of the 457 gene pairs that arose by WGD:

• Use synteny to distinguish them from pairs that arose by local duplication events.

• Compute divergence rates for them (both amino acid and nucleotide), using sequences of K. waltii, S. cerevisiae and S. bayanus.

Page 23: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

23

Results

• 17% of gene pairs (76 of 457) showed accelerated protein evolution relative to K. waltii.

• In 95% of them, accelerated evolution was confined to only one paralog

• Supports Ohno’s model: one paralog retains ancestral function, the other one gains a derived function

Page 24: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

24

Ancestral <-> derived paralogs

• 115 gene pairs in which one paralog has evolved >50% faster than the other.

• Often, derived paralogs are specialized in:
  – Cellular localization (Acc1 - Hfa1)
  – Temporal expression (Skt5 - Shc1)
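The ">50% faster" criterion can be sketched as a simple ratio test on the two paralogs' divergence rates. This assumes a rate has already been estimated for each copy; the `classify_pair` helper and the numbers in the example are illustrative placeholders, not values from the paper.

```python
def classify_pair(rate_a, rate_b, threshold=1.5):
    """Label the faster copy as derived if it evolved >50% faster.

    Returns (derived, ancestral) as 'a'/'b' labels, or None when
    neither copy is more than 50% faster than the other.
    """
    if rate_a > threshold * rate_b:
        return ("a", "b")
    if rate_b > threshold * rate_a:
        return ("b", "a")
    return None


print(classify_pair(0.30, 0.10))  # copy a is the derived paralog
print(classify_pair(0.10, 0.12))  # no clear ancestral/derived split
```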

Page 25: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

25

Ancestral <-> derived paralogs, cont.

• Functional distinction confirmed with knockout experiments (in rich medium) for all 115 gene pairs:
  – Deletion of the ancestral paralog was lethal in 18%.
  – Deletion of the derived paralog was never lethal.

• Explanation:
  – The derived paralog is not essential under these conditions.
  – The ancestral paralog can compensate for the derived one (but not vice versa).

Page 26: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

26

More results

• 60 of the 457 pairs (13%) showed decelerated protein evolution.

• Including highly constrained proteins:
  – Ribosomal proteins (25)
  – Histone proteins (2)
  – Translation factors (4)

• In 90% of them, both paralogs were very similar (98% amino acid identity versus 55% for all pairs).

Page 27: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

27

However…

• ~70% of the gene pairs (321/457) showed neither accelerated nor decelerated protein evolution.

• Possible explanations:
  – Criteria too strict.
  – Divergence in regulatory regions will not be seen here.
  – Sometimes it’s nice to have two copies.

Page 28: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

28

Summary

• S. cerevisiae arose from an ancient WGD.
  – Massive loss of ~90% of duplicated genes in small deletions.
  – Preserving at least one copy of each ancestral gene.

• Divergence of paralogs:
  – Accelerated evolution (17%).
  – Derived genes tend to be specialized in function, expression level and localization.
  – Derived genes tend to lose essential aspects of their ancestral function.

Page 29: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

29

Second article

Transcription control reprogramming in genetic backup circuits.

Kafri R, Bar-Even A, Pilpel Y.

Nat Genet. Mar 2005.

Page 30: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

30

Introduction

• Severe mutations often don’t result in an abnormal phenotype

• This is partially ascribed to redundant paralogs that back each other up in case of mutation

• Suggested mechanism: transcriptional reprogramming

Page 31: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

31

Definitions

• Working on S. cerevisiae.

• Paralog pairs defined by BLASTing their DNA sequences.

• Dispensable genes = non essential.

Page 32: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

32

Expression parameters

• For each pair of paralogs:
  – Calculate 40 correlation coefficients of mRNA expression.
  – Mean expression similarity := the mean of these coefficients.
  – Partial co-regulation (PCoR) := the standard deviation of these coefficients.
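The two expression parameters are just the mean and the spread of a paralog pair's correlation coefficients. A minimal sketch, assuming the coefficients are already computed; the short list stands in for the 40 values per pair, and the use of the population standard deviation (`statistics.pstdev`) rather than the sample version is an assumption, since the slides do not say which was used.

```python
import statistics


def expression_parameters(correlations):
    """Return (mean expression similarity, PCoR) for one paralog pair.

    correlations: correlation coefficients of mRNA expression.
    PCoR is taken here as the population standard deviation.
    """
    mean_similarity = statistics.mean(correlations)
    pcor = statistics.pstdev(correlations)
    return mean_similarity, pcor


# Hypothetical pair: strongly correlated in some data sets, not in others.
mean_similarity, pcor = expression_parameters([0.9, 0.8, -0.1, 0.0])
print(mean_similarity, pcor)
```

A pair that is co-expressed everywhere has a high mean and a PCoR near zero; a pair that is co-expressed only under some conditions has an intermediate mean and a high PCoR.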

Page 33: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

33

Summary of observations

                  Co-expressed   Expressed differently
Remote paralog         -                  +
Close paralog          +                  -

+ : backup enabled

Page 34: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

34

Close paralogs

• Backup increases with co-expression.

• Similar sequences:
  – Similar expression
  – Enable backup


Page 35: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

35

Remote paralogs


• Backup is optimal in non-co-expressed pairs.

• Co-expression (little backup):
  – interaction
  – sub-functionalization

Page 36: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

36

Suggestion for backup mechanism

• A, B - genes which are expressed differently.

• Upon mutation in A: expression of gene B is reprogrammed.

• Result: B takes on the wild-type expression profile of A.

Page 37: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

37

Experimental verification: reprogramming in Acs1/Acs2

[Figure: panels labelled Glucose and Wild-type, each showing Acs1 and Acs2]

Page 38: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

38

What is the mechanism enabling this change?

• Suggestion: backup occurs among paralogs with partial co-regulation.

• This enables switching from a different expression profile to a similar one.

• Observation: PCoR predicts backup.

Page 39: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

39

Partial motif content overlap is optimal for backup

$O = \frac{|m_1 \cap m_2|}{|m_1 \cup m_2|}$

[Figure: proportion of dispensable genes (backup measure, y-axis from 0.6 to 1) plotted against motif content overlap O (x-axis from 0 to 0.6)]
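The overlap O is the size of the intersection of the two motif sets divided by the size of their union. A small sketch, assuming m1 and m2 are the sets of regulatory motifs found upstream of the two paralogs; the motif names and the `motif_overlap` helper are placeholders.

```python
def motif_overlap(m1, m2):
    """Motif content overlap O = |m1 & m2| / |m1 | m2|."""
    m1, m2 = set(m1), set(m2)
    union = m1 | m2
    if not union:
        # No motifs at all: define the overlap as zero.
        return 0.0
    return len(m1 & m2) / len(union)


# One shared motif out of four distinct motifs in total.
overlap = motif_overlap({"motifA", "motifB", "motifC"}, {"motifA", "motifD"})
print(overlap)
```

An intermediate value like this one is what "partial" overlap means here: some motifs are shared and some are unique to each paralog.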

Page 40: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

40

suggestion

• Unique motifs -> different expression level.

• Shared motifs -> enable responding to the same conditions.

Hypothesis: PCoR underlies reprogramming and backup.

Page 41: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

41

In high PCoR paralogs one gene is upregulated when other is deleted

[Figure: fold change (y-axis, 0 to 10) for three bins of partial co-regulation (predicted backup capacity): <0.35, 0.35 – 0.45, >0.45. Data: Hughes et al. Cell 2000]

Page 42: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

42

What controls reprogramming?

• Kinetic model:

[Diagram: kinetic model with nodes T, E1, E2, G1, G2, M1, M2]

G1, G2 – paralog genes.

E1, E2 – their products.

T – a TF that is generated by M1 and has binding sites in both genes.

Page 43: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

43

Conclusions

• In remote paralogs: genes that are expressed differently but have partial common regulation tend to back each other up.

• In close paralogs: backup increases with co-expression.

Page 44: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

44

Third article

Gene regulatory network growth by duplication

Teichmann SA, Babu MM.

Nat Genet. May, 2004

Page 45: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

45

Main questions

• What is the role of gene duplication in regulatory network evolution?

• Determine the extent to which duplicated genes inherit interactions from their ancestors.

• Describe possible mechanisms that lead to the formation of a new interaction.

Page 46: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

46

Basic unit of gene regulation

• Transcription factor
• DNA binding site
• Target gene (or transcription unit)

Complex network:
• One gene is regulated by a few transcription factors.
• One transcription factor controls more than one gene.

Page 47: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

47

Research subjects

Known regulatory networks of E. coli and yeast:

> 100 transcription factors regulate several hundred genes.

Gene regulatory network in yeast: 477 proteins (109 TFs + 368 TGs), 901 interactions

Gene regulatory network in E. coli: 795 proteins (121 TFs + 674 TGs), 1423 interactions

Page 48: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

48

Duplication (reminder)

• Duplication event:
  – Inherit regulatory interaction
  – Lose regulatory interaction

• Also, a new interaction may arise.

Page 49: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

49

Homology detection

• Based on structural protein homology.

• Detects more distant relationships than sequence comparison.

• > 65% of the genes are the result of gene duplication.

• Same domain architecture -> common ancestor.

Page 50: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

50

Duplication of transcription factor

[Diagram: duplication of a TF that regulates a target gene, followed by either inheritance of the interaction or loss and gain]

Page 51: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

51

Duplication of transcription factor (TF)

• At first, the new TF regulates the same target gene.

• Divergence:
  – Regulate the same gene but respond to a different signal.
  – Recognize a new binding site.

• More than 2/3 of TFs in E. coli and yeast have at least one interaction in common with their duplicates (128 interactions in E. coli (10%), 188 interactions in yeast (22%)).

Page 52: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

52

Example: Duplication of TF in yeast

• Both homologs are involved in drug response.

• They respond to different signals.

[Diagram: Pdr1 and Pdr3, with target Flr1]

Page 53: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

53

Duplication of target gene and its upstream region

Transcription factor

Target gene

Loss and gain Inheritance

Page 54: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

54

Duplication of target gene (TG) and its upstream region

• First, both genes are regulated by the same TF.

• Divergence:
  – Change the coding sequence but stay under the same TF control.
  – Change the upstream region as well, resulting in recognition by a different TF.

• 272 interactions in E. coli (22%), 166 interactions in yeast (20%).

Page 55: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

55

Example: Duplication of TG in E. coli

The BioA and BioBFCD operons are regulated by the BirA TF.

BioA and BioF are homologous enzymes in the biotin biosynthesis pathway.

[Diagram: BirA, with targets BioA and BioF]

Page 56: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

56

Duplication of transcription factor (TF) and its target gene (TG) around the same time

[Diagram: duplication of TF + TG, with two interactions labelled "gain"]

Page 57: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

57

Duplication of transcription factor (TF) and its target gene (TG) around the same time

• Can happen if both were adjacent on the chromosome.

• New TF regulates only the new TG, while old TF regulates old TG.

• Divergence of TF or TG can result in additional interactions.

• 74 interactions in E. coli (6%), 31 interactions in yeast (4%).

Page 58: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

58

Example: Duplication of both TF and its TG in E. coli

[Diagram: AraC with target AraBAD; RhaR with target RhaBAD]

Page 59: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

60

Some more numbers

• Duplication and inheritance:

          E. coli   yeast
TF:       10%       22%
TG:       22%       20%
Both:      6%        4%
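The three rows correspond to three ways a pair of interactions can look inherited: homologous TFs sharing a target, one TF regulating homologous targets, or homologous TFs regulating homologous targets. A toy sketch of that classification, assuming a list of (TF, TG) edges and a homology-family lookup; the `classify_shared` helper and the family labels are hypothetical, the example genes are the ones named on the example slides (pooled from both organisms into one toy network), and the function counts pairs of interactions rather than reproducing the percentages above.

```python
from itertools import combinations


def classify_shared(edges, family):
    """Count pairs of interactions that look inherited after duplication.

    edges: list of (TF, TG) pairs.
    family: maps a gene to a homology family label (hypothetical labels).
    """
    def homologs(x, y):
        fx = family.get(x)
        return x != y and fx is not None and fx == family.get(y)

    counts = {"TF": 0, "TG": 0, "both": 0}
    for (tf1, tg1), (tf2, tg2) in combinations(edges, 2):
        if homologs(tf1, tf2) and tg1 == tg2:
            counts["TF"] += 1      # duplicated TFs share a target
        elif tf1 == tf2 and homologs(tg1, tg2):
            counts["TG"] += 1      # duplicated targets share a TF
        elif homologs(tf1, tf2) and homologs(tg1, tg2):
            counts["both"] += 1    # TF and target duplicated together
    return counts


edges = [
    ("Pdr1", "Flr1"), ("Pdr3", "Flr1"),
    ("BirA", "BioA"), ("BirA", "BioF"),
    ("AraC", "AraBAD"), ("RhaR", "RhaBAD"),
]
family = {
    "Pdr1": "fam1", "Pdr3": "fam1",
    "BioA": "fam2", "BioF": "fam2",
    "AraC": "fam3", "RhaR": "fam3",
    "AraBAD": "fam4", "RhaBAD": "fam4",
}
counts = classify_shared(edges, family)
print(counts)
```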

Page 60: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

61

Are duplication patterns linked to topology of networks?

• Gene regulatory networks in E. coli and yeast: the number of TGs per TF obeys a power law.

• Do TFs with many TGs have many homologous genes as their targets? No.
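The quantity behind the power-law statement is the number of target genes per TF and how often each count occurs. A minimal sketch, assuming the network is a list of unique (TF, TG) edges; the toy edge list and helper names are hypothetical, and no power-law fit is attempted here.

```python
from collections import Counter


def targets_per_tf(edges):
    """Number of target genes regulated by each TF."""
    return Counter(tf for tf, _ in edges)


def degree_histogram(edges):
    """How many TFs have exactly k targets, for each observed k."""
    return Counter(targets_per_tf(edges).values())


toy_edges = [
    ("tf1", "g1"), ("tf1", "g2"), ("tf1", "g3"), ("tf1", "g4"),
    ("tf2", "g1"),
    ("tf3", "g5"),
]
histogram = degree_histogram(toy_edges)
print(dict(histogram))
```

In a power-law network most TFs sit at the low end of this histogram and a few hubs control many targets, as tf1 does in the toy example.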

Page 61: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

62

Conclusions

• In both E. coli and yeast ~90% of the interactions evolved by duplication:

– Half of them: duplication + inheritance of interaction

– Other half: duplication + gain of new interactions.

Page 62: 1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

63

The End

Of the semester…