Genome organisation and evolution
Level 3 Molecular Evolution and Bioinformatics
Jim Provan
Page and Holmes: Sections 3.1.4/5 and 3.3
The eukaryotic genome
• Coding DNA
  – Single-copy protein-coding genes
  – Multigene families
    – Dispersed
    – Tandemly repeated
  – Regulatory sequences
• Non-coding DNA
  – Tandemly repeated DNA
    – Satellite DNA
    – Minisatellites
    – Microsatellites
  – Transposable elements and retroviruses
  – Spacer DNA
The C-value paradox
• The amount of DNA per haploid genome is known as the C-value
• Contrary to expectation, the amount of DNA is not correlated with complexity:
  – The protist Amoeba dubia has about 200 times more DNA (670,000,000 kbp) than humans (3,300,000 kbp)
  – Cannot be explained by differences in gene number
[Figure: genome size chart (axis ticks 0 to 12) for Mycoplasma pneumoniae, Escherichia coli, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus, Xenopus laevis, Homo sapiens, Pisum sativum, Lilium longiflorum, Protopterus aethiopicus and Amoeba dubia]
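The "about 200 times" comparison on the C-value slide can be checked with a line of arithmetic. The sketch below uses only the two genome sizes quoted there; the dictionary and function names are illustrative.

```python
# C-values quoted on the slide, in kilobase pairs (kbp) per haploid genome.
C_VALUES_KBP = {
    "Amoeba dubia": 670_000_000,
    "Homo sapiens": 3_300_000,
}


def fold_difference(larger_kbp, smaller_kbp):
    """Return how many times larger one genome is than another."""
    return larger_kbp / smaller_kbp


ratio = fold_difference(C_VALUES_KBP["Amoeba dubia"], C_VALUES_KBP["Homo sapiens"])
print(f"Amoeba dubia has about {ratio:.0f} times more DNA than humans")
```

The quotient is roughly 203, which the slide rounds to 200.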
The structure of genes
• There are many forms of genes:
  – Those which produce a protein, a tRNA or an rRNA are referred to as structural genes
  – Those which control how and when genes are expressed are called regulatory genes
  – Some housekeeping genes need to be expressed in all tissues, e.g. those involved in protein synthesis
  – Other, tissue-specific genes are only expressed in a particular cell or tissue type, e.g. the insulin gene is only expressed in the pancreatic β-cells
• Whatever their function, all genes contain a coding region which specifies a polypeptide or an RNA molecule
Regulation of gene expression
• Coding regions of genes are usually flanked by regulatory regions which control gene expression through transcription and translation
• Upstream promoter regions:
  – In bacteria, there is a Pribnow box (TATAAT) about 10 bp upstream from where transcription starts, the ‘-35 site’ (TTGACA) about 35 bp upstream and the Shine-Dalgarno box (AGGAGG) about 7 bp before the initiation codon
  – In eukaryotes, as well as the TATA box, some promoter regions contain a CAAT box about 40 bp before the initiation codon and a GC box (GGGCGG) about 110 bp upstream
• Downstream elements such as the polyadenylation signal (AATAAA) signify the end of transcription and increase the stability of RNA transcripts
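The bacterial promoter motifs named above are short exact consensus strings, so locating them in a sequence is a simple substring search. The sketch below is a minimal illustration: the upstream sequence is made up for the example, and real promoters usually match the consensus only approximately, so exact matching like this would miss most of them.

```python
def find_motif(seq, motif):
    """Return the 0-based start positions of every exact occurrence of motif."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Step forward by one so that overlapping occurrences are also found.
        start = seq.find(motif, start + 1)
    return positions


# Hypothetical upstream region: -35 site, a 17 bp spacer, then the Pribnow box.
upstream = "GCGC" + "TTGACA" + "CGTACGTACGTACGTAC" + "TATAAT" + "GCGCGC"

minus_35 = find_motif(upstream, "TTGACA")
pribnow = find_motif(upstream, "TATAAT")

# Distance between the end of the -35 site and the start of the Pribnow box.
spacer_length = pribnow[0] - (minus_35[0] + len("TTGACA"))
print(minus_35, pribnow, spacer_length)
```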
Structure of a typical gene: alcohol dehydrogenase (Adh)
[Figure: eukaryote gene, 5’ to 3’: promoter region (TATA box; CAAT box in mammals; GC box, GGGCGG), initiation codon, Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, Intron 3, Exon 4, stop codon, polyadenylation signal (AATAAA)]
[Figure: prokaryote gene, 5’ to 3’: promoter region (-35 site, TTGACA; Pribnow box, TATAAT; Shine-Dalgarno box, AGGAGG), initiation codon, stop codon]
Introns
• Occur frequently within eukaryotic genomes and make up most of the length of very long genes
• Number, size and organisation of introns varies:
  – Histone genes have no introns; the chicken pro-α2-collagen gene has over fifty
  – SV40 virus contains an intron of 31 bp; the human dystrophin gene has an intron of over 210,000 bp
  – Some introns have genes contained within them: the Adh gene in Drosophila is located within an intron of the outspread gene
• Strong conservation of intron-exon boundaries: introns nearly always begin with GT and end with AG
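The GT...AG boundary rule can be expressed as a small check, and removing introns from a gene sequence is then a matter of joining the pieces between them. This is a minimal sketch: the toy gene and its coordinates are invented for the example, and the functions are illustrative rather than a model of the splicing machinery.

```python
def follows_gt_ag_rule(intron):
    """Return True if the intron begins with GT and ends with AG."""
    s = intron.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


def splice(gene, introns):
    """Join the exons of gene by removing the given introns.

    introns is a list of (start, end) pairs: 0-based, half-open,
    non-overlapping and sorted by start position.
    """
    pieces = []
    previous_end = 0
    for start, end in introns:
        if not follows_gt_ag_rule(gene[start:end]):
            raise ValueError(f"intron at {start}-{end} breaks the GT-AG rule")
        pieces.append(gene[previous_end:start])
        previous_end = end
    pieces.append(gene[previous_end:])
    return "".join(pieces)


# Hypothetical two-exon gene with one 12 bp intron.
exon_1, intron_1, exon_2 = "ATGGCC", "GTAAGTTTTCAG", "TTCTAA"
toy_gene = exon_1 + intron_1 + exon_2
spliced = splice(toy_gene, [(6, 18)])
print(spliced)
```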
Types of introns
• Most introns in eukaryotes are spliceosomal introns (‘nuclear introns’), so called because they are spliced by a spliceosome of proteins and RNA
• Some introns can splice without the aid of proteins (“self-splicing introns”):
  – One class, group I introns, are sometimes mobile because they encode proteins such as DNA endonucleases. They are found in mitochondrial and chloroplast genomes, rRNA genes of some eukaryotes and in T4 bacteriophage
  – Group II introns are found in organelles and their bacterial ancestors and contain reverse transcriptase-like sequences
  – Group III introns are found in a few protists and are similar to group II introns with the central portion removed
The evolution of introns
• There are two competing hypotheses for the evolution of spliceosomal introns:
  – The introns-early hypothesis, proposed by Walter Gilbert, suggests that introns mark the boundaries between ancient genes which encoded distinct proteins
    – Throughout evolution these once-independent proteins have been put together in new combinations to produce more complex proteins by exon shuffling
  – An alternative hypothesis (introns-late) suggests that introns only invaded eukaryote genomes fairly recently
The evolution of introns (continued)
• A crucial prediction of the introns-early hypothesis is that spliceosomal introns delineate structural or functional units within proteins:
  – Introns are found in the same places in all known globin genes, including myoglobin and plant leghaemoglobins
  – More frequently, however, introns do not appear to separate functionally distinct parts of proteins
• Another problem with the introns-early hypothesis is the absence of spliceosomal introns from Archaea and Bacteria:
  – Massive intron loss has been postulated but does not explain why they are found in nuclear copies of organelle genes but not in the genes of the organelles or their precursors
  – Exon shuffling has probably been a factor in later eukaryotes
Multigene families
• Many genes are found not as individual copies but as part of multigene families, larger families of related genes:
  – Important evolutionary innovation: proteins with similar function can be arranged so that they are regulated efficiently
  – Vertebrates have a variety of multipolypeptide globin genes, produced by gene duplication, which are adapted to the varying oxygen requirements of different developmental stages
• Not all genes are functional:
  – Pseudogenes arise through gene duplications but acquire mutations since only one copy is required
  – Processed pseudogenes, which lack promoters and introns, have been produced by reverse transcription of mRNA
Multigene families (continued)
[Figure: gene family members labelled Embryonic, Foetal, Pseudogene and Adult, plotted against a time axis of 0, 100 and 200 millions of years ago]
Evolution of multigene families
• Most obvious way in which gene number can change between species is through gene duplication:
  – Can arise through unequal crossing-over
  – May occur by duplication of entire genomes (polyploidy):
    – Common in plants: around 50% of angiosperms are polyploid
    – Xenopus laevis is tetraploid: normal meiosis is possible
    – Other members of the genus Xenopus have chromosome numbers ranging from 20 to 108
• Another mechanism of gene duplication is transposition
• Fate of new gene depends on function: redundancy vs. natural selection
• Genes can also acquire new functions without duplication, e.g. ε-crystallin and LDH
Gene duplication in the Hox gene family
• Homeotic genes control the development of body plan in animals
• In both vertebrate Hox and invertebrate HOM genes, there is a highly conserved motif known as the homeobox, which encodes a protein domain (the homeodomain)
• Mutations in Hox/HOM genes can drastically affect the organisation of body parts
• Although Hox/HOM genes are related, their organisation differs between organisms:
  – In vertebrates, there are multiple clusters of Hox genes: the mouse has four clusters, each located on a different chromosome and covering over 100 kb
  – HOM genes in Drosophila are found in two clusters, Antennapedia and Bithorax, on the same chromosome
  – In amphioxus (a group of marine invertebrates which are the closest relatives of the vertebrates) there is a single cluster of at least 10 Hox genes, each of which is homologous to a different Hox gene in vertebrates: the origin of vertebrates coincided with a series of gene duplications
• Example of a dispersed gene family in vertebrates
Gene duplication in the Hox gene family (continued)
[Figure: Hox/HOM cluster organisation: Drosophila (lab, pb, Dfd, Scr, Antp, Ubx, AbdA, AbdB), hypothetical common ancestor, amphioxus, and gene duplications giving four clusters]
Tandem arrays
• Tandem arrays contain multiple copies of genes with the same function
• Good example is the rDNA array:
  [Figure: rDNA repeat unit: NTS, ETS, 18S, ITS1, 5.8S, ITS2, 28S]
  – Large quantities of rRNA required
  – Genes and spacers co-transcribed and separated by non-transcribed spacer
  – Variation in size of arrays:
    – 1 copy in Tetrahymena
    – 19,300 copies in Amphiuma
Evolution of rDNA arrays
• Because they contain both highly conserved (18S) and highly variable (NTS) regions, rDNA sequences have been used frequently in molecular systematics
• Despite this, they do not evolve in a simple manner:
  – Although there is a high degree of sequence similarity within species, there is great divergence between them
  – Due to unequal crossing-over and gene conversion, concerted evolution can take place, which allows genes to evolve together by spreading mutations throughout members of the array
  – This makes phylogenetic analysis difficult since it is not easy to discern which genes are truly homologous
  – Often leads to “mosaics” of sequences, each with a different phylogenetic history
Non-coding repetitive DNA

Class                   Copy number                    Organisation
Satellite DNA           Highly repetitive (>10^4)      Tandemly repeated
Mini-/microsatellite    Moderately repetitive          Tandemly repeated
Transposable elements   Moderately/highly repetitive   Dispersed
Tandemly repeated DNA
• Much of the non-coding repetitive DNA in eukaryotes consists of tandem repeats of short sequence motifs:
  – Satellite DNA is located mainly in the heterochromatin and consists of motifs up to 40 kb in length:
    – The α-satellite DNA of primates is based on a 171 bp motif repeated for hundreds of kilobases
    – Over 60% of the genome of Drosophila nasutoides is satellite DNA
  – Minisatellites and microsatellites consist of shorter motifs duplicated through unequal crossing-over and DNA slippage:
    – Minisatellite motifs are 11 to 60 bp in length and contain a G-rich “core” sequence
    – Microsatellites are shorter, generally dinucleotide repeats
    – Both exhibit extremely high mutation rates and multiple alleles are usually found in populations
    – Used in population genetics / forensics
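Because a dinucleotide microsatellite is just a two-base unit repeated in tandem, candidate loci can be picked out of a sequence with a regular expression. The sketch below is one way to do that with Python's standard re module; the five-unit threshold and the test sequence are arbitrary choices for illustration, not values from the slides.

```python
import re


def find_dinucleotide_repeats(seq, min_units=5):
    """Return (start, unit, number_of_units) for each dinucleotide repeat run.

    The unit must consist of two different bases, so single-base runs such
    as AAAAAA are not reported as dinucleotide repeats.
    """
    # Group 1 is the repeat unit; (?!\2) forces its second base to differ
    # from its first; \1{n,} requires the unit to recur at least n times.
    pattern = re.compile(
        r"(([ACGT])(?!\2)[ACGT])\1{" + str(min_units - 1) + ",}"
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // 2))
    return hits


# Hypothetical sequence containing a (CA)8 microsatellite.
example = "TTG" + "CA" * 8 + "GGTC"
print(find_dinucleotide_repeats(example))
```

Alleles at a locus like this differ in the number of units, so comparing the third element of each tuple between individuals is the basic idea behind microsatellite genotyping.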
Transposable elements
• Transposable elements increase copy number by moving around the genome making additional copies:
  – Around 50% of the maize genome may be transposable elements
  – 10-20% of the Drosophila genome
• Three groups of transposable elements:
  – Class I (retroelements) transpose through an intermediate RNA stage via reverse transcriptase, cf. retroviruses
  – Class II (DNA elements) transpose directly from DNA to DNA
  – Little is known about miniature inverted-repeat transposable elements (MITEs): around 100 to 400 bp in length and transpose by as yet unknown means
Transposable elements (continued)
[Figure: structures of transposable elements. Class I transposable elements (retroelements): retrotransposons (reverse transcriptase flanked by LTRs) and retroposons (reverse transcriptase followed by AAAAAA). Class II transposable elements (DNA elements): Ac-like elements (transposase; labels: short repeat, terminal repeat). Miniature inverted-repeat transposable elements (MITEs), e.g. Tourist and Stowaway]
Retroelements
• Two subgroups:
  – Retrotransposons contain long terminal repeats (LTRs) at both ends: an example is the copia element, which is found 20 to 60 times in the genome of D. melanogaster
  – Retroposons have no LTR and have a poly-A tail:
    – Long interspersed nuclear elements (LINEs) are 6 to 8 kb in length and present in thousands of copies: the L1 family is present in 590,000 copies in the human genome (17% of total)
    – Short interspersed nuclear elements (SINEs) do not produce reverse transcriptase and so are not considered true retroelements: they vary in size from 130 to 300 bp and have copy numbers from 50,000 to over 1,000,000
    – Originally derived from RNA transcripts
• Endogenous retroviruses are proviruses which have been integrated into the germ-line of eukaryotes
Class II (DNA) elements
• Possess terminal repeats but, unlike those of retrotransposons, these are short (generally < 100 bp) and usually inverted
• Encode a transposase protein
• Best known types:
  – Mariner elements in animals
  – Hobo and P elements in Drosophila:
    – P elements can move between species and affect host phenotype
    – Increased infertility due to chromosome breakage (hybrid dysgenesis) occurs in D. melanogaster. P elements are not found in closely related species (D. simulans, D. sechellia, D. mauritiana) but are found in more distantly related species, e.g. the D. willistoni group: transferred after D. melanogaster split from sibling species
    – Insertion can have a “knock-out” effect on phenotype, e.g. the white gene in flies lacking red eye pigment
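The short inverted terminal repeats of Class II elements have a simple sequence signature: the start of the element reads the same as the reverse complement of its end. The sketch below measures how far that match extends. The element sequence is invented for the example, and the function requires a perfect match, whereas real terminal repeats are often imperfect.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(element, max_len=100):
    """Return the length of the longest perfect terminal inverted repeat.

    A repeat of length k exists when the first k bases equal the reverse
    complement of the last k bases. max_len caps the search, following the
    slide's note that these repeats are generally under 100 bp.
    """
    longest = 0
    limit = min(max_len, len(element) // 2)
    for k in range(1, limit + 1):
        if element[:k] != reverse_complement(element[-k:]):
            break  # a mismatch at length k rules out every longer repeat
        longest = k
    return longest


# Hypothetical element: an 8 bp repeat, an internal region, then the same
# repeat in inverted orientation.
left_repeat = "CAGTGCTA"
toy_element = left_repeat + "AAAAGGGG" + reverse_complement(left_repeat)
print(terminal_inverted_repeat_length(toy_element))
```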