1

Genome evolution

Lukas Schärer

Evolutionary Biology

Zoological Institute

University of Basel

26.10.2011 Advanced-level Evolutionary Biology HS 11

2

from Maynard Smith 1998

3

Summary: Genome evolution

• genome size

• gene and genome duplication

• evolution of chromosome form

• repeated DNA

• gene clusters

• tandemly repeated genes

• short tandem repeats in DNA

• middle repetitive dispersed DNA

• highly repetitive DNA

4

Genome size

• range within the eukaryotes (600’000x)

• smallest is 0.0023 pg in the parasitic microsporidium, Encephalitozoon intestinalis

• largest is 1’400 pg in the free-living amoeba, Chaos chaos

• range within the metazoa (6’650x)

• smallest is 0.02 pg in the plant-parasitic nematode, Pratylenchus coffeae

• largest is 133 pg in the marbled lungfish, Protopterus aethiopicus

data from www.genomesize.com
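The fold-ranges quoted above can be checked directly from the listed genome sizes. The following is only a small sketch; the variable and function names are illustrative and not part of the original slides.

```python
# Genome sizes in picograms (pg), as quoted on the slide above.
eukaryote_min_pg = 0.0023   # Encephalitozoon intestinalis
eukaryote_max_pg = 1400.0   # Chaos chaos
metazoan_min_pg = 0.02      # Pratylenchus coffeae
metazoan_max_pg = 133.0     # Protopterus aethiopicus


def fold_range(smallest_pg: float, largest_pg: float) -> float:
    """Return how many times larger the largest genome is than the smallest."""
    return largest_pg / smallest_pg


eukaryote_fold = fold_range(eukaryote_min_pg, eukaryote_max_pg)
metazoan_fold = fold_range(metazoan_min_pg, metazoan_max_pg)

print(f"eukaryotes: about {eukaryote_fold:,.0f}x")
print(f"metazoa:    about {metazoan_fold:,.0f}x")
```

The eukaryote ratio comes out slightly above the rounded 600’000x given on the slide, and the metazoan ratio matches the quoted 6’650x.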

5

Genome size

• how much information is this?

• 1 pg of DNA is about 10^9 base pairs

• one A4 page contains about 2500 characters

• so 1 pg of DNA corresponds to about 400’000 pages

• if an average book has 400 pages, 1 pg of DNA corresponds to about 1000 books

• so the smallest genome corresponds to about two books!

• and the largest genome corresponds to about one million books (or 1/3 of the complete holdings of the UB Basel)

Gregory 2004
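The book analogy above is plain arithmetic, so it can be reproduced in a few lines. This sketch assumes one printed character per base pair and uses the slide's rounded figures; the constant and function names are illustrative.

```python
BASE_PAIRS_PER_PG = 1e9        # slide's approximation: 1 pg of DNA is about 10^9 bp
CHARACTERS_PER_PAGE = 2500     # one A4 page
PAGES_PER_BOOK = 400           # an average book


def books_for_genome(genome_size_pg: float) -> float:
    """Convert a genome size in pg to 'books', one character per base pair."""
    base_pairs = genome_size_pg * BASE_PAIRS_PER_PG
    pages = base_pairs / CHARACTERS_PER_PAGE
    return pages / PAGES_PER_BOOK


for name, size_pg in [("1 pg of DNA", 1.0),
                      ("smallest eukaryote genome (0.0023 pg)", 0.0023),
                      ("largest eukaryote genome (1400 pg)", 1400.0)]:
    print(f"{name}: about {books_for_genome(size_pg):,.1f} books")
```

With these inputs the smallest genome gives roughly 2.3 books and the largest roughly 1.4 million books, which the slide rounds to "about two" and "about one million".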

6

Genome size

• there is currently data for ~5000 species (www.genomesize.com)

7

Gene and genome duplication

• there are two different ways in which the amount of DNA could in theory increase:

• de novo synthesis of sequences that are not homologous to any pre-existing DNA

• there is no evidence for this process

• duplication of pre-existing DNA

• of whole genes, genome regions or whole genomes

• new ‘random’ sequences could stem from a duplication event followed by frame-shift mutations

8

Gene and genome duplication

• there are three main mechanisms that actually occur:

• unequal crossing over

• generally leads to tandem repetitions of genes

• transposition

• generally leads to dispersed copies of genes

• polyploidisation

• whole genome duplication

9

Gene and genome duplication

from Maynard Smith 1998

• unequal crossing over

• unequal crossing over can lead to gene duplication

• once genes are duplicated, unequal crossing over will be more likely, leading to tandem repetition of genes

• unequal crossing over also facilitates chromosomal rearrangements
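How a misaligned exchange produces one duplicated and one deleted product can be sketched with chromosomes represented as lists of gene labels. This is a toy model with hypothetical gene names and breakpoints, not a description of the molecular mechanism.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Recombine two chromosomes (lists of gene labels) at given breakpoints.

    The left part of chrom_a (up to break_a) is joined to the right part of
    chrom_b (from break_b), and vice versa. If break_a == break_b the exchange
    is equal; otherwise one product gains genes and the other loses them.
    """
    recombinant_1 = chrom_a[:break_a] + chrom_b[break_b:]
    recombinant_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return recombinant_1, recombinant_2


# Two homologous chromosomes, each carrying a single copy of gene "B".
homolog_1 = ["A", "B", "C"]
homolog_2 = ["A", "B", "C"]

# Misaligned exchange: break after "B" on one homolog, before "B" on the other.
duplicated, deleted = unequal_crossover(homolog_1, homolog_2, break_a=2, break_b=1)
print(duplicated)  # ['A', 'B', 'B', 'C']  (tandem duplication of B)
print(deleted)     # ['A', 'C']            (B deleted)
```

Once two tandem copies of "B" exist, there are more ways for the homologs to pair out of register, which is the sense in which further unequal crossing over becomes more likely.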

10

Gene and genome duplication

• there are four possible fates of duplicated loci:

• one copy is inactivated and slowly degrades

• the sequence homology in such pseudogenes will slowly degrade

• the two loci diverge, while maintaining similar functions

• the sequence homology in such genes of a gene family may be maintained for a long time

• the loci diverge and acquire different functions while still maintaining somewhat similar protein 3D structures

• some limited sequence homology may be actively retained

• similarity in 3D structure may allow homology to be identified, but convergence is more difficult to exclude

• one copy acquires a new function after a frame-shift mutation

• no similarity in protein 3D structure or amino acid sequence, and very little sequence homology due to positive selection on the new function
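Why a frame-shift erases amino acid similarity can be illustrated by translating a short sequence before and after a single-base insertion. The coding sequence below is hypothetical and chosen only for illustration; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence, ignoring any incomplete final codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


original = "ATGGCTCAAGGTCTGCAACGT"                # hypothetical coding sequence
frameshifted = original[:3] + "C" + original[3:]  # one base inserted after ATG

print(translate(original))      # MAQGLQR
print(translate(frameshifted))  # MRSRSAT
```

The two DNA sequences differ by a single base, yet every amino acid after the insertion point changes, which is why the fourth fate leaves no recognisable protein similarity.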

11

Evolution of chromosome form

• the number of chromosomes can vary dramatically between species

• the lowest is n=1 in the nematode Parascaris (about 2 pg)

• the highest is n=127 in the hermit crab Eupagurus (? pg)

• in some taxa chromosome number is very uniform (e.g. n=13 in most dragonflies or n=18 in most snakes)

• in other taxa it varies dramatically

• 2n=7 in males and 2n=6 in females of Muntiacus muntjac vaginalis (about 2.5 pg)

• 2n=46 in Muntiacus reevesi (about 3 pg)

12

Evolution of chromosome form

from Maynard Smith 1998

• once genes occur in repeated copies, unequal crossing over can lead to changes in chromosome form

• all the known mechanisms require at least two chromosome breaks

13

Evolution of chromosome form

14

Evolution of chromosome form

• polytene chromosome map of Anopheles gambiae

from vectorbase.org

15

Evolution of chromosome form

• effects of changes in chromosome structure

• heterozygotes for the new structures have reduced fertility

• half of the gametes are aneuploid

from Maynard Smith 1998

16

Repeated DNA

• much of the DNA in the genome is present in many copies

• gene clusters

• many proteins are present in an individual in several different forms, coded for by distinct but similar genes (gene families)

• gene clusters allow diversification of gene function

• tandemly repeated genes

• have identical function and can be present in several hundred copies

• allow high-volume production

• tandemly repeated DNA

• short DNA motif repeats that may not have a clear function

• represent highly useful genetic markers

• middle repetitive dispersed DNA

• sequences of hundreds to thousands of nucleotides, present in tens or hundreds of copies per genome, dispersed throughout the genome, often with only one copy per site

• highly repetitive DNA

• short sequences, each present in very large numbers (can be >10^6), arranged in tandemly repeated blocks

17

Gene clusters

• the haemoglobin gene family

• the haemoglobin protein is tetrameric

• in embryos two ε- and two ζ-peptides

• in the foetus two α- and two γ-peptides

• in adults two α- and two β-peptides

• this structure makes it possible to accommodate the different oxygen requirements of the different life stages

• all peptides have a similar gene structure, amino acid sequence and 3D structure so they must have originated from one gene

• there is an α- and a β-cluster, which are on different chromosomes

18

Gene clusters

• all these haemoglobin variants have the same gene structure

• the positions of the three exons and two introns are conserved in all haemoglobin genes

• introns are spliced out before the mRNA is translated

• what are introns?

• ancient transposable elements?

• remember the self-splicing group I introns?

• or is each exon an ancestral mini-gene?

• if so, then the different exons should tend to code for different functional domains of the protein

from Maynard Smith 1998

19

Gene clusters

• in the cluster there are

• different functional genes

• different pseudogenes with the correct exon/intron structure (but with e.g. frame-shift mutations)

• outside the cluster there are

• pseudogenes which may have been carried out by jumping transposable elements

• exons and introns each make up about 8% of the cluster

• the remaining 84% is of unknown function

• but its rate of divergence is only 20% of that of the intron sequences and silent sites

from Maynard Smith 1998

20

Tandemly repeated genes

• tandem repetition of the ribosomal gene unit

• this unit is tandemly repeated about 200 times on both the X and Y chromosomes

• in Drosophila melanogaster each unit contains a region of internal repetition of a sequence of about 250 bp, which is specifically cut by the endonuclease AluI

• other Drosophila species also have such a repeat unit, but it is insensitive to AluI

• how can this AluI-sensitive internal repetition have spread to all 200 ribosomal gene unit repeats of D. melanogaster?
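What "sensitive to AluI" means in practice can be sketched as a search for the enzyme's recognition sequence, AGCT, which AluI cuts between G and C. The two repeat sequences below are hypothetical and serve only to contrast a repeat that carries the site with one that does not.

```python
ALUI_SITE = "AGCT"   # AluI recognition sequence; it cuts between G and C
CUT_OFFSET = 2       # number of bases of the site to the left of the cut


def alui_fragments(dna: str) -> list[str]:
    """Split a DNA string at every AluI site (AG^CT), returning the fragments."""
    fragments = []
    start = 0
    position = dna.find(ALUI_SITE)
    while position != -1:
        cut = position + CUT_OFFSET
        fragments.append(dna[start:cut])
        start = cut
        position = dna.find(ALUI_SITE, position + 1)
    fragments.append(dna[start:])
    return fragments


sensitive_repeat = "TTGAAGCTCCGATAGCTTAA"    # hypothetical repeat with two sites
insensitive_repeat = "TTGAAGTTCCGATAGGTTAA"  # hypothetical variant without sites

print(alui_fragments(sensitive_repeat))    # ['TTGAAG', 'CTCCGATAG', 'CTTAA']
print(alui_fragments(insensitive_repeat))  # ['TTGAAGTTCCGATAGGTTAA']
```

A single base change inside the four-base site is enough to make a repeat insensitive, so the puzzle on the slide is how one variant came to occupy all copies of the array.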

21

Tandemly repeated genes

• it has been suggested that the ribosomal gene cluster is maintained by a phenomenon called concerted evolution

• unequal crossing over could lead to concerted evolution, but this requires selection against copy numbers that are too low or too high

• this mechanism is probably too slow, and it could only explain tandemly repeated copies, not copies dispersed throughout the genome

• 200 gene copies in Drosophila; very large population size

from Maynard Smith 1998

22

Tandemly repeated genes

• gene conversion could also lead to concerted evolution

• but it would have to be biased towards one of the alleles

• it is unclear if this is important in this case
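The role of bias in gene conversion can be explored with a toy simulation of a single array of gene copies. This is a deliberately simplified sketch and not the model in Maynard Smith (1998): the array size default, the `bias` parameter, and the pairwise conversion rule are all assumptions made for illustration.

```python
import random


def simulate_gene_conversion(n_copies=200, bias=1.0, seed=1, max_events=1_000_000):
    """Spread of one variant through a gene array by gene conversion.

    Starts with a single variant copy among n_copies. In each event a random
    pair of copies is drawn; if they differ, one is converted to match the
    other. `bias` is the probability that the variant acts as the template.
    Returns (outcome, events), where outcome is 'fixed', 'lost' or 'unresolved'.
    """
    rng = random.Random(seed)
    n_variant = 1
    for event in range(1, max_events + 1):
        first, second = rng.sample(range(n_copies), 2)
        # Copies are exchangeable, so treat indices below n_variant as variants.
        if (first < n_variant) != (second < n_variant):
            n_variant += 1 if rng.random() < bias else -1
        if n_variant == n_copies:
            return "fixed", event
        if n_variant == 0:
            return "lost", event
    return "unresolved", max_events


for bias in (0.5, 0.6, 1.0):
    outcome, events = simulate_gene_conversion(bias=bias)
    print(f"bias={bias}: {outcome} after {events} conversion events")
```

With `bias=1.0` the variant can only gain copies and always reaches all 200; with unbiased conversion (`bias=0.5`) a new variant starting from one copy is usually lost, which is the intuition behind the requirement that conversion be biased.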

23

Short tandem repeats in DNA

• there are many stretches of DNA where short motifs are repeated a variable number of times

• microsatellites (about 2-10 bp) and minisatellites (about 10-100 bp)

• e.g. CAGCAGCAGCAGCAGCAGCAG or (CAG)7 is a trinucleotide repeat

• these are highly useful, codominant and probably often neutral genetic markers, which are used in paternity analysis, population genetics, gene mapping etc.

• the evolution of variability in short tandem repeat number is not entirely clear

• in short motifs the DNA polymerase appears to make mistakes, which leads to so-called stutter bands
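A simple way to locate perfect microsatellites such as (CAG)7 is a regular expression with a back-reference. The sketch below is only one possible approach; the function name, the `min_repeats` threshold, and the test sequence are illustrative assumptions, and real tools also handle imperfect repeats.

```python
import re


def find_microsatellites(dna: str, min_repeats: int = 4):
    """Return (start, motif, copies) for perfect tandem repeats of 2-10 bp motifs."""
    # Group 1 is the shortest motif that works; \1{n,} demands further copies.
    pattern = re.compile(r"([ACGT]{2,10}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(dna):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# Hypothetical sequence containing (CAG)7 and (AC)5 between unique flanks.
sequence = "TTGACC" + "CAG" * 7 + "TTAGT" + "AC" * 5 + "GGT"
print(find_microsatellites(sequence))  # [(6, 'CAG', 7), (32, 'AC', 5)]
```

The unique flanking sequences on either side of each hit are what primers would be designed against when such loci are used as genetic markers.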

24

Short tandem repeats in DNA

• searching for microsatellite loci in the early Macrostomum lignano genome assembly and in expressed sequence tags (ESTs)

• the production of ESTs yields thousands of relatively short (about 1000 bp) DNA sequences

25

Short tandem repeats in DNA

• example of a manual search in the early Macrostomum lignano genome assembly

• http://www.macgenome.org/

• example of an automated search in the Macrostomum lignano ESTs

• http://flatworm.uibk.ac.at/macest/putative_ssrs.php

• https://tandem.bu.edu/cgi-bin/trdb/trdb.exe?taskid=1

26

Middle repetitive dispersed DNA

• about 15% of the genomic DNA of Drosophila melanogaster consists of moderately repetitive DNA sequences (around 100 copies) that are hundreds to thousands of bp long

• about half of this consists of about 30 families of copia-like elements

27

Middle repetitive dispersed DNA

• characteristics of the copia element

• there are 30-50 copies of copia in a haploid genome, but different sites are occupied in different flies

• in total there are about 200 sites of possible integration

• in cell culture up to 150 sites can be occupied

• suggests that the copia elements can move to new sites

• mechanism unknown but may resemble that of retroviruses

• suggests that the total number of copies per genome is regulated

• copia is transcribed and may limit its own expression (remember the P element?)

• new insertions may cause mutations

• copia is probably a parasitic DNA

28

Highly repetitive DNA

• some relatively short repeats (a few to a few hundred bp) can be tandemly repeated from thousands to millions of times

• e.g. in the kangaroo rat, Dipodomys ordii, more than 50% of the DNA consists of repeats of AAG (2.4·10^9x), TTAGGG (2.2·10^9x) and ACACAGCGGG (1.2·10^9x) (Salser et al. 1976)

• but the genome size in this species is apparently only about 4 pg of DNA?

• these repeats need not always be perfect

• often these sequences do not show any clear features

• they are highly problematic for whole genome sequencing
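The discrepancy raised in the question above can be made explicit by adding up the repeat copy numbers as quoted. This sketch takes the figures at face value and uses the approximation from earlier in these slides that 1 pg of DNA is about 10^9 base pairs; the variable names are illustrative.

```python
BASE_PAIRS_PER_PG = 1e9  # approximation used earlier in these slides

# (motif, number of copies) as quoted on the slide from Salser et al. 1976
repeats = [("AAG", 2.4e9), ("TTAGGG", 2.2e9), ("ACACAGCGGG", 1.2e9)]

total_bp = sum(len(motif) * copies for motif, copies in repeats)
total_pg = total_bp / BASE_PAIRS_PER_PG
genome_pg = 4.0  # approximate genome size quoted on the slide

print(f"repeats alone: {total_bp:.3g} bp, about {total_pg:.1f} pg")
print(f"quoted genome size: about {genome_pg} pg")
print(f"ratio: {total_pg / genome_pg:.1f}x the quoted genome size")
```

Taken literally, the three repeat families alone would amount to roughly 32 pg, about eight times the quoted genome size, so the copy numbers and the genome size cannot both be read as stated.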

29

Summary: Genome evolution

• genome size

• gene and genome duplication

• evolution of chromosome form

• repeated DNA

• gene clusters

• tandemly repeated genes

• short tandem repeats in DNA

• middle repetitive dispersed DNA

• highly repetitive DNA

30

Literature

• Mandatory Reading

• Chapter 11, The Evolution of the Eukaryotic Genome, of Maynard Smith (1998). Evolutionary Genetics. 2nd Edition. Oxford University Press.

• Suggested Reading

• none

• Books

• none