1 Can junk DNA be exapted? Dan Graur Bat Sheva Workshop


2

Can straw (junk DNA) be spun into gold (genes)?

3

Exaptation

4

15 February 2001

5

The human genome is disappointing:

• It is small
• It is empty
• It is unoriginal
• It is repetitive

6

K-value paradox: Complexity does not correlate with chromosome number.

Homo sapiens: 46 chromosomes
Lysandra atlantica: 250 chromosomes
Ophioglossum reticulatum: 1,260 chromosomes

7

C-value paradox: Complexity does not correlate with genome size.

Homo sapiens: 3.4 × 10⁹ bp
Amoeba dubia: 6.7 × 10¹¹ bp

8

N-value paradox: Complexity does not correlate with gene number.

~31,000 genes; ~26,000 genes; ~50,000 genes

9

Intergenic regions (junk)
Introns (junk)
Exons (1.5%)

The genome is empty.

10

The genome contains a large number of genetic “corpses” (pseudogenes).

11

L-gulono-γ-lactone oxidase deficiency

12

From 23 genes per million base pairs on chromosome 19 (3%) to only 5 genes per million base pairs on chromosome 13 (0.7%).

There are gene-dense (urban centers) and gene-poor (deserts) chromosomes.

13

How can we be sure that the genome is empty?

Isn’t it possible that the emptiness is a mere artifact of our ignorance?

14

959 cells / 1,031 cells: 19,000 genes
~10⁸ cells: 13,600 genes

15

The gene number game: GeneSweep©

July 2000: 165 bets; mean 61,710; lowest 27,462; highest 153,478
July 2001: 281 bets; median 61,302; lowest 27,462; highest 212,278

16

Humans are not at all original in comparison with other vertebrates.

17

Mouse-human synteny. Human chromosomes can be cut into ~150 pieces, then shuffled into a reasonable approximation of the mouse genome.

18

Two solutions to the N-value paradox:

* What looks empty isn’t.

* What looks functional is more so.

19

Junk DNA

Junk can sometimes be useful:

• spare parts (modules)

• motif donors (exon shuffling)

• molds (gene conversion)

20

Eukaryotic genes (exons & introns) → Splicing → Translation

21

Alternative splicing: One gene, several proteins!

(Figure labels: alternative splicing; mature splice variant I; mature splice variant II)

22

Types of alternative splicing

23

Cassette exon or internal-exon skipping

24

Deduction of internal-exon skipping through mRNA sequence alignment
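The deduction can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the study: it assumes each mRNA has already been aligned to the genome and reduced to the ordered list of genomic exon numbers it covers (a hypothetical input format). An exon is called skipped when one transcript contains it and another joins its two neighbors directly; skipping of several consecutive exons is not detected.

```python
from typing import List, Set, Tuple


def skipped_exons(transcripts: List[List[int]]) -> Set[int]:
    """Return genomic exon numbers that behave as cassette (skipped) exons.

    Each transcript is the ordered list of genomic exon numbers its mRNA
    aligns to (hypothetical input format). Exon k is called skipped if some
    transcript contains k and some transcript joins k-1 directly to k+1.
    """
    included: Set[int] = {e for t in transcripts for e in t}
    # Exon-exon junctions seen in the alignments, as (upstream, downstream).
    junctions: Set[Tuple[int, int]] = {
        (a, b) for t in transcripts for a, b in zip(t, t[1:])
    }
    return {k for k in included if (k - 1, k + 1) in junctions}


print(skipped_exons([[1, 2, 3, 4], [1, 3, 4], [1, 2, 3, 4]]))  # {2}
```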

25

Large-scale multiple alignment of expressed sequences

Databases: tens of thousands of mRNAs; millions of ESTs

From large-scale alignments, it is estimated that 40–60% of all human genes undergo alternative splicing.

26

GenCarta (Compugen): Alignment of expressed sequences to genomic sequences

27

Alternative splicing:

Alternative splicing may be unconditional, i.e., two or more mRNA variants are produced in all tissues expressing the gene.

Alternative splicing may be conditional, i.e., tissue-specific, developmental-stage-specific, or physiological-state-specific.

28

Initial goal: Identifying sequence elements that regulate alternative splicing

• Compile a database of skipped exons.
• Compile a database of constitutive exons.
• Characterize diagnostic features of alternative splicing versus constitutive splicing.

29

Initial results

• 4,151 constitutive exons.
• 1,182 alternative exons.
• A motif-searching program was run on each set.
• A strong motif, found in some of the alternative exons, was not found in the constitutive ones.

The motif turned out to be part of an Alu element.

30

Exaptation case report:

Alus

31

Alu elements

• Length = ~300 bp
• Repetitive: > 1,000,000 copies in the human genome
• Constitute >10% of the human genome
• Found mostly in intergenic regions and introns
• Propagate in the genome through retroposition (RNA intermediates)

32

Repetitive DNA: interspersed or in tandem

Alus are like that! (interspersed)

33

Evolution of Alu elements

34

Master-gene model for Alu proliferation in the genome

• Master gene A
• Replicatively incompetent progeny
• Progeny undergo multiple independent mutations
• Mutation renders A non-functional and creates new master gene B
• Mutation renders B non-functional and creates new master gene C
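A toy simulation may make the model's logic concrete. This is a sketch under stated assumptions, not a published implementation: only the current master gene retroposes, each progeny copy picks up random point mutations, and one mutation in the master retires it and yields the next master. The sequence, copy numbers, and function names are all hypothetical.

```python
import random
from typing import Dict, List

BASES = "ACGT"


def point_mutation(seq: str, rng: random.Random) -> str:
    """Return a copy of seq with one random position changed to a different base."""
    pos = rng.randrange(len(seq))
    new_base = rng.choice([b for b in BASES if b != seq[pos]])
    return seq[:pos] + new_base + seq[pos + 1:]


def master_gene_model(ancestor: str, n_masters: int, copies_per_master: int,
                      mutations_per_copy: int, seed: int = 0) -> Dict[str, List[str]]:
    """Toy master-gene model: one active master at a time, one subfamily per master.

    Progeny are treated as replicatively incompetent and mutate independently.
    A mutation in the master retires it and creates the next master, so later
    subfamilies inherit earlier diagnostic changes (barring repeat hits).
    """
    rng = random.Random(seed)
    master = ancestor
    subfamilies: Dict[str, List[str]] = {}
    for i in range(n_masters):
        progeny = []
        for _ in range(copies_per_master):
            copy = master
            for _ in range(mutations_per_copy):
                copy = point_mutation(copy, rng)
            progeny.append(copy)
        subfamilies[chr(ord("A") + i)] = progeny
        # Diagnostic mutation: renders this master non-functional, creates the next.
        master = point_mutation(master, rng)
    return subfamilies


families = master_gene_model("ACGT" * 25, n_masters=3,
                             copies_per_master=4, mutations_per_copy=2)
for name, copies in families.items():
    print(name, len(copies))
```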

35

Alu elements can be divided into subfamilies

The subfamilies are distinguished by ~16 diagnostic positions.
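Assigning a copy to a subfamily by its diagnostic positions amounts to picking the best-matching consensus. The sketch below assumes copies are already aligned to a common coordinate system; the subfamily names, positions, and bases in `DIAGNOSTICS` are invented for illustration (real subfamilies differ at ~16 positions, not two).

```python
from typing import Dict

# Hypothetical diagnostic table: subfamily -> {alignment position: diagnostic base}.
# Names, positions, and bases are made up for this example.
DIAGNOSTICS: Dict[str, Dict[int, str]] = {
    "sub1": {3: "A", 7: "C"},
    "sub2": {3: "G", 7: "C"},
    "sub3": {3: "G", 7: "T"},
}


def assign_subfamily(copy: str, diagnostics: Dict[str, Dict[int, str]]) -> str:
    """Return the subfamily whose diagnostic bases the aligned copy matches best.

    Ties go to the subfamily listed first, because max() keeps the first maximum.
    """
    def matches(name: str) -> int:
        return sum(copy[pos] == base for pos, base in diagnostics[name].items())
    return max(diagnostics, key=matches)


print(assign_subfamily("TTTGTTTTTT", DIAGNOSTICS))  # sub3
print(assign_subfamily("TTTATTTCTT", DIAGNOSTICS))  # sub1
```

Note how the made-up table is nested (sub2 shares one diagnostic base with each neighbor), which is the pattern the master-gene model predicts for successive subfamilies.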

36

Signals of splicing

(Figure: exon 1, intron, exon 2; donor site CAG/GTRAGT; branch point A (A-OH) forming the lariat; pyrimidine tract YYYYYYYYY; acceptor site NCAG/G)

37

Because mRNAs and Alus are frequently reverse transcribed and incorporated into the genome, pyrimidine tracts are ubiquitous.

The complementary strand of poly(A) is poly(T), i.e., a pyrimidine tract.
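The strand argument can be checked mechanically: reverse-complementing a poly(A) stretch gives poly(T), which consists only of pyrimidines. The helper names below are illustrative, not from any particular library.

```python
# Maps each base to its Watson-Crick complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_pyrimidine_tract(seq: str) -> bool:
    """True if seq is non-empty and every base is a pyrimidine (C or T)."""
    return len(seq) > 0 and all(b in "CTct" for b in seq)


tail = "A" * 20  # a poly(A) stretch as it appears on one strand
print(reverse_complement(tail))                       # TTTTTTTTTTTTTTTTTTTT
print(is_pyrimidine_tract(reverse_complement(tail)))  # True
```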

38

Our findings

Out of 1,182 alternatively spliced cassette exons, 62 have a significant hit to an Alu sequence.

Out of 4,151 constitutively spliced exons, none has a significant hit to an Alu sequence.

All Alu-containing exons are alternatively spliced.
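How unlikely is a 62-versus-0 split by chance? One way to quantify it, offered as an illustration rather than as an analysis reported in the talk, is a one-sided Fisher's exact test. Because no Alu hits fall among the constitutive exons, the p-value reduces to a single hypergeometric term. The sketch treats exons as independent draws, which is an assumption.

```python
from math import comb


def all_in_group_pvalue(hits: int, group: int, total: int) -> float:
    """P(all `hits` marked items fall inside a group of size `group`) when the
    marked items are placed at random among `total` items (hypergeometric).

    With zero hits outside the group, this is also the one-sided Fisher's exact
    test p-value, because the observed table is the most extreme possible.
    """
    return comb(group, hits) / comb(total, hits)


alternative, constitutive, alu_exons = 1182, 4151, 62
p = all_in_group_pvalue(alu_exons, alternative, alternative + constitutive)
print(f"{p:.1e}")
```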

39

Retention Ratio

Retention ratio = number of mRNA molecules containing the alternatively spliced exon divided by the total number of mRNA molecules.

Retention ratio for Alu-containing exons was ~10%.

Retention ratio for alternatively spliced exons that do not contain Alu was ~45%.
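The definition translates directly into code. The sketch assumes (our assumption, not stated on the slide) that "total number of mRNA molecules" means transcripts spanning the exon's flanks, so inclusion or skipping can actually be scored; the EST counts in the example are hypothetical.

```python
def retention_ratio(with_exon: int, without_exon: int) -> float:
    """Retention ratio: mRNAs containing the alternative exon / all mRNAs counted.

    Assumes both counts come from transcripts that span the exon's flanking
    exons, so that presence or absence of the exon can be scored.
    """
    total = with_exon + without_exon
    if total == 0:
        raise ValueError("no informative transcripts")
    return with_exon / total


# Hypothetical EST counts for two exons
print(retention_ratio(3, 27))   # 0.1
print(retention_ratio(9, 11))   # 0.45
```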

40

Alu elements: Definitions

+ strand: carries the A-rich stretches (aaaaa, aaaaaaaaaa)
– strand: carries the complementary T-rich stretches (tttttttttttttttt, ttttttt)

41

The minus strand of Alu elements contains “near” splice sites

The minus strand of Alu contains ~3 sites that resemble the acceptor recognition site:
Consensus acceptor site: YYYYYYNCAG/R
Alu-J (127-114): TTTTTTGtAG/A

The minus strand of Alu contains ~9 sites that resemble the consensus donor site:
Consensus donor site: CAG/GTRAGT
Alu-J (25-17): CAG/GTGtGA
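One way to make “near” concrete is to count mismatches against the IUPAC consensus (Y = C/T, R = A/G, N = any base). The mismatch threshold in `near_sites` is our assumption, not the criterion used in the study, and the function names are illustrative. Applied to the two Alu-J sites above, the acceptor-like site differs from the consensus at one position and the donor-like site at two.

```python
from typing import List, Tuple

# IUPAC nucleotide codes used in the consensus sequences above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}


def mismatches(site: str, consensus: str) -> int:
    """Count positions where `site` is not allowed by the IUPAC `consensus`.

    The '/' marking the exon-intron boundary is ignored in both strings.
    """
    site = site.replace("/", "").upper()
    consensus = consensus.replace("/", "").upper()
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(base not in IUPAC[code] for base, code in zip(site, consensus))


def near_sites(seq: str, consensus: str, max_mismatches: int = 1) -> List[Tuple[int, str]]:
    """Return (start, window) for every window of seq within max_mismatches."""
    k = len(consensus.replace("/", ""))
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)
            if mismatches(seq[i:i + k], consensus) <= max_mismatches]


print(mismatches("TTTTTTGtAG/A", "YYYYYYNCAG/R"))  # 1
print(mismatches("CAG/GTGtGA", "CAG/GTRAGT"))      # 2
```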

42

The plus strand of Alu elements does not contain “near” acceptor splice sites

43

Exonization of a minus strand

(all is Alu)

(Figure labels: Alu exon, acceptor, donor)

44

Exonization of a plus strand

(3′ of Alu is “in”)

(Figure labels: Alu exon, acceptor, donor)

45

Alus within alternatively spliced exons

Alu position in exon        – strand   + strand
Alu occupies entire exon        50          1
3′ portion of exon               1          6
5′ portion of exon               3          1
Middle of exon                   0          0

46

Proposed model for Alu exonization

(Figure labels: Exon, Exon)

47

Proposed model for Alu exonization

(Figure labels: Exon, Exon)

48

Does Exonization Represent Functionalization?

1. Alus are only found in alternative exons. Two possible explanations:
– Alu-containing constitutive exons cannot be created by mutation.
– Alu-containing constitutive exons are deleterious and, therefore, selected against.

Constitutive Alu-containing exons are known, and they are invariably deleterious.

49

Does Exonization Represent Functionalization?

2. Alus are only found in alternative exons with low retention ratios.

Highly expressed alternative Alu-containing exons are deleterious.

50

Does Exonization Represent Functionalization?

3. Eighty-four percent of all Alu-containing exons cause frameshifts or premature termination.

Alu-containing exons are unlikely to contribute to the proteome.
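The two failure modes counted in point 3 correspond to two simple checks on an exon inserted into an open reading frame. A minimal sketch, assuming the exon is inserted between two complete codons (phase 0); codons spanning the splice junctions are ignored, and `exon_effect` is a hypothetical helper, not the study's classifier.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard (nuclear) genetic code


def exon_effect(exon: str) -> str:
    """Classify what inserting `exon` does to an open reading frame.

    Assumes phase-0 insertion, so the exon's codons start at position 0;
    codons that span splice junctions are not examined.
    """
    exon = exon.upper()
    if len(exon) % 3 != 0:
        return "frameshift"
    codons = (exon[i:i + 3] for i in range(0, len(exon), 3))
    if any(codon in STOP_CODONS for codon in codons):
        return "premature termination"
    return "in frame, no stop"


print(exon_effect("ATGAAAGCC"))  # in frame, no stop
print(exon_effect("ATGTAAGCC"))  # premature termination
print(exon_effect("ATGAAAG"))    # frameshift
```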

51

Does Exonization Represent Functionalization?

4. There are reasons to believe that many identifications of alternative splicing are spurious.

The contribution of alternative splicing to the proteomic repertoire may be vastly overestimated.

52

Conclusion?

Alu elements increase the coding and regulatory versatility of the transcriptome while maintaining the intactness of the genomic repertoire.

53

Conclusion

No exaptation

54

Exaptation case report:

numts*

*pronounced “new mights”

55

Numts (nuclear mitochondrial DNA sequences) are a type of promiscuous DNA, i.e., nuclear sequences of organelle (e.g., mitochondrial) origin.

56

Numts: Evolution’s misplaced witnesses

57

The transfer of functional genes from the mitochondria to the nucleus is thought to have stopped in evolution after the emergence of animals (~1,000 MYA).

58

The reason is thought to be the differences between the nuclear and mitochondrial genetic codes.
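The differences can be made concrete. The four codons below are those whose meaning differs between the standard code and the vertebrate mitochondrial code (NCBI translation tables 1 and 2). The sketch lists every codon of a mitochondrial reading frame that the nuclear code would read differently; a gene can be translated “without incident” only if the list is empty. The function name is illustrative; a terminal AGA/AGG stop would also be flagged, and incomplete mitochondrial stop codons are not handled.

```python
from typing import List, Tuple

# Codons read differently by the standard (nuclear) code and the vertebrate
# mitochondrial code (NCBI translation tables 1 and 2).
NUCLEAR = {"AGA": "Arg", "AGG": "Arg", "ATA": "Ile", "TGA": "Stop"}
MITO = {"AGA": "Stop", "AGG": "Stop", "ATA": "Met", "TGA": "Trp"}


def reassigned_codons(orf: str) -> List[Tuple[int, str, str, str]]:
    """Return (codon index, codon, mito meaning, nuclear meaning) for each codon
    of a mitochondrial ORF that the nuclear code would translate differently."""
    orf = orf.upper()
    hits = []
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        if codon in NUCLEAR:
            hits.append((i // 3, codon, MITO[codon], NUCLEAR[codon]))
    return hits


print(reassigned_codons("ATGTGAAAA"))  # [(1, 'TGA', 'Trp', 'Stop')]
```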

59

The transfer of nonfunctional pieces of mitochondrial genetic information continues to this day.

60

Numts have been found so far in 83 eukaryotic species.

61

Most species whose genomes have been completely sequenced contain very few numts:

• Saccharomyces cerevisiae: 17 numts
• Caenorhabditis elegans: 3 numts
• Drosophila melanogaster: 3 numts
• Plasmodium falciparum: 3 numts

62

In the human genome we find ~1,000 numts

total length = 831 kb

~0.02% of the nuclear genome
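A quick consistency check of the two figures on this slide, assuming the 3.4 × 10⁹ bp human genome size quoted on the C-value slide:

```python
numt_total_bp = 831_000   # total numt length from this slide (831 kb)
genome_bp = 3.4e9         # human genome size from the C-value slide
fraction = numt_total_bp / genome_bp
print(f"{fraction:.4%}")  # 0.0244%
```

The result, about 0.024%, agrees with the rounded ~0.02% on the slide.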

63

We found 82 numts larger than 1,000 bp in the human genome.

64

Numts were found on all chromosomes. Numts larger than 1,000 bp were found on 21 chromosomes.

65

The newest numt was found on chromosome 6.

Length = ~6,000 bp (35% of the human mitochondrial genome)

Similarity = 98.2% DNA identity.

The longest numt was found on chromosome 5.

Length = ~16,000 bp (an entire mitochondrial genome)

Similarity = 88.8% DNA identity.

66

The largest documented nonhuman numt is a 7.9-kb fragment in the nuclear genome of the domestic cat.

67

The 82 numts contain a total of 362 complete mitochondrial genes (of which 108 are protein-coding genes).

68

With the exception of the D-loop, which is variable and difficult to detect by similarity, all other regions of the mtDNA are represented in numts at frequencies that do not deviate significantly from the random expectation.
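A test of “no deviation from the random expectation” can be sketched as a chi-square goodness-of-fit statistic. The null model here is our assumption: every mtDNA base is equally likely to be transferred, so expected counts are proportional to region length. The region names and counts are hypothetical, and the slide does not say which test was actually used. The statistic would then be compared with a chi-square distribution with (number of regions − 1) degrees of freedom, for example via `scipy.stats.chisquare`.

```python
from typing import Dict


def chi_square_gof(observed: Dict[str, int], region_lengths: Dict[str, int]) -> float:
    """Pearson chi-square statistic for numt coverage of mtDNA regions.

    Null hypothesis (assumed): every mtDNA base is equally likely to end up
    in a numt, so expected counts are proportional to region length.
    """
    n = sum(observed.values())
    total_len = sum(region_lengths.values())
    stat = 0.0
    for region, length in region_lengths.items():
        expected = n * length / total_len
        stat += (observed.get(region, 0) - expected) ** 2 / expected
    return stat


# Hypothetical regions and counts, for illustration only
lengths = {"region1": 1000, "region2": 2000, "region3": 1000}
print(chi_square_gof({"region1": 10, "region2": 20, "region3": 10}, lengths))  # 0.0
```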

69

Only 4 numts retained an intact reading frame. They are annotated as putative protein-coding genes.

70

In all cases the gene is NADH dehydrogenase subunit 4L (ND4L).

71

ND4L is also the only mitochondrial gene that can be translated “without incident” by the nuclear genetic code.

72

Conclusion

No exaptation