
HGP ENG 2014 web - Ústav lékařské biochemie 1.LF UK, ulbld.lf1.cuni.cz/file/1870/hgp-eng-2014-web.pdf


Sequencing Genomes

Human Genome Project: History, results and impact

MUDr. Jan Pláteník, PhD.

(December 2014)

Beginnings of sequencing

• 1965: Sequence of a yeast tRNA (80 bp) determined
• 1977: Sanger’s and Maxam & Gilbert’s techniques invented
• 1981: Sequence of human mitochondrial DNA (16.5 kbp)
• 1983: Sequence of bacteriophage T7 (40 kbp)
• 1984: Epstein-Barr virus (170 kbp)

Homo sapiens

• 1985-1990: Discussion on human genome sequencing
  • “dangerous” - “meaningless” - “impossible to do”
• 1988-1990: Foundation of the HUMAN GENOME PROJECT
• International collaboration: HUGO (Human Genome Organisation)
• Aims:
  - genetic map of the human genome
  - physical map: marker every 100 kbp
  - sequencing of model organisms (E. coli, S. cerevisiae, C. elegans, Drosophila, mouse)
  - find all human genes (estimated 60,000-80,000)
  - sequence the whole human genome (estimated 4000 Mbp) by 2005

Other genomes

• July 1995: Haemophilus influenzae (1.8 Mbp) ... first genome of an independent organism
• October 1996: Saccharomyces cerevisiae (12 Mbp) ... first eukaryote
• December 1998: Caenorhabditis elegans (100 Mbp) ... first metazoan

May 1998:

• Craig Venter launches the private biotechnology company CELERA GENOMICS, Inc. and announces the intention to sequence the whole human genome in just 3 years and for 300 million USD using the whole-genome shotgun approach.
• The publicly funded HGP at that time: approx. 4 % of the genome sequenced

March 2000:

• Celera Genomics & academic collaborators publish a draft genome of Drosophila melanogaster (approx. 2/3 of 180 Mbp)
• ... whole-genome shotgun is feasible for large genomes as well
• ... Human genome: competition between the Human Genome Project and Celera Genomics

International Human Genome Sequencing Consortium (Human Genome Project, HGP)

• Open to co-operation from any country
• 20 laboratories from USA, Great Britain, Japan, France, Germany and China
• About 2800 workers, main coordinator: Francis Collins, NIH
• Publicly funded (about 3 billion USD)
• Approach: clone-by-clone
• Results: obligation to upload to the internet within 24 hours (the Bermuda rule).

Clone-by-clone

genomic DNA
fragments of approx. 150,000 bp
cloning in BAC (bacterial artificial chromosome)
clones positioned in the genome using physical maps (STS - sequence tagged site, fingerprint - cleavage by restriction enzymes)
digestion of every clone to short fragments of approx. 500 bp
sequencing
assembly of each clone sequence by computer

Celera Genomics, Inc.

• Private biotechnology company, based in Rockville, Maryland, USA. President: Craig Venter.
• Investments into automation and computer processing, a few dozen employees
• Approach: whole-genome shotgun + utilised publicly shared data from HGP.
• Results: raw data temporarily available at the company website, but all other updates and annotations for commercial purposes.

Whole-genome shotgun

genomic DNA
fragments of 2, 10, 50 kbp
cloned in plasmids in E. coli
sequencing
sequence assembly using sophisticated computer algorithms
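
The assembly step can be illustrated with a toy greedy overlap merger. This is only a minimal sketch under strong assumptions, not the algorithm used by Celera or the HGP: the function names `overlap` and `greedy_assemble`, the `min_len` threshold and the example reads are hypothetical, and the reads are assumed to be error-free, from one strand, and free of repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    # Drop exact duplicates and reads fully contained in another read.
    contigs = list(dict.fromkeys(reads))
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical 25 bp "genome" cut into four overlapping reads.
genome = "ATGGCGTACGTTAGCCTAGGATCCA"
reads = [genome[12:22], genome[0:10], genome[18:25], genome[6:16]]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGCCTAGGATCCA']
```

Greedy merging of this kind breaks down on repetitive DNA, which is one reason the real task needed far more elaborate algorithms.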

February 2001:

• International Human Genome Sequencing Consortium publishes a draft of the human genome in Nature (Feb. 15th 2001)
  • Draft: 90 % of euchromatin (2.95 Gbp, whole genome 3.2 Gbp). 25 % definitive.
• Celera Genomics, Inc. publishes the human genome sequence in Science (Feb. 16th 2001)
  • Sequence of euchromatin (2.91 Gbp)

Advance in sequencing

1985: 500 bp per lab per day
- still the Sanger dideoxynucleotide technique, but
- capillary electrophoresis instead of gel
- fluorescence markers instead of radioactivity
- full automation & robotisation
- computer power
2000: 175,000 bp/day (Celera)
1000 bp/sec. (HGP)
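
The figures above mix two units (bp per day and bp per second), so a small unit conversion helps to compare them. The sketch below is only illustrative arithmetic: the function names are hypothetical, the 3.2 Gbp genome size is taken from the February 2001 slide, and the "single pass" estimate deliberately ignores the redundant coverage that real projects need.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bp_per_day(bp_per_second: float) -> float:
    """Convert a throughput in bp/second to bp/day."""
    return bp_per_second * SECONDS_PER_DAY


def days_for_single_pass(genome_bp: float, bp_per_second: float) -> float:
    """Days needed to read `genome_bp` once (1x, no redundancy) at a constant rate."""
    return genome_bp / bp_per_day(bp_per_second)


hgp_2000 = bp_per_day(1000)          # the HGP rate quoted for 2000
print(hgp_2000)                      # 86400000
print(hgp_2000 / 500)                # 172800.0 times the 1985 rate of 500 bp/day
print(round(days_for_single_pass(3_200_000_000, 1000), 1))  # 37.0
```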

Sequencing continues...

• Human genome now: Definitive version announced 14/4/2003 ... 50 years after the DNA double helix. The reference sequence is still being updated.
• Fugu rubripes: draft of genome in August 2002
• Mouse:
  • Celera Genomics: draft in June 2001
  • Mouse Genome Sequencing Consortium: Nature, December 2002
• Laboratory rat: draft in March 2004
• Chimpanzee: September 2005
• ... and many other genomes: malaria (the causative agent Plasmodium falciparum and the vector Anopheles gambiae), zebrafish, rice, dog, cattle, sheep, pig, chicken, honeybee, mammoth etc.

Public databases of DNA/RNA sequences

• GenBank, National Center for Biotechnology Information (NCBI), Bethesda, Maryland, USA
• EMBL-Bank, EMBL's European Bioinformatics Institute, Hinxton, UK
• DNA Data Bank of Japan, National Institute of Genetics, Mishima, Japan
• 22/8/2005: content of all three databases exceeded 100,000,000,000 base pairs (100 Gb) ... from genes/genomes of 165,000 species of organisms

Research in the “postgenomic” age

• New approaches to study genes & proteins:
  • GENOMICS ... analysis of the whole genome and its expression
  • PROTEOMICS ... analysis of the whole proteome, i.e. all proteins in a given tissue or organism
  • BIOINFORMATICS ... processing, analysis and interpretation of large data sets (nucleic acid or protein sequences, gene arrays, 3D protein structures etc.). Experiments in silico
• Rapid development of new technologies:
  • e.g. DNA Microarray - expression of thousands of genes can be studied simultaneously

DNA Microarray (“DNA chip”)

Single Nucleotide Polymorphism (SNP)

Occurs on average in one base per 1000 bp, i.e. in 0.1 % of the human genome
About 10 million SNPs with frequency > 1 %
Coding/non-coding
Protein structure changed/unchanged

A G A G T T C T G C T C G
A G G G T T C T G C G C G
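
The two example sequences above can be compared position by position to locate the variant sites. This is a minimal sketch that assumes the two sequences are already aligned and of equal length; the function name `differing_positions` and the variable names are hypothetical.

```python
def differing_positions(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in seq_a, base in seq_b) for every mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


# The two 13-base sequences from the slide, with spaces removed.
allele_1 = "AGAGTTCTGCTCG"
allele_2 = "AGGGTTCTGCGCG"
print(differing_positions(allele_1, allele_2))  # [(3, 'A', 'G'), (11, 'T', 'G')]
```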

International HapMap Project

• Further international collaboration, 2002-2009
• Genotyping and sequencing of DNA from 270 people from four different populations (USA, Nigeria, Japan, China)
• Aims at finding
  • all important human SNPs (about 10,000,000)
  • their stable combinations (haplotypes)
  • a tag SNP for each haplotype
• Data publicly available for further exploration

Human genetic variation

• Two unrelated humans have 99.5 % of the genome identical
  • Single Nucleotide Polymorphisms: 0.1 %
  • Copy number variation (insertions, deletions, duplications): 0.4 %
• Variable number tandem repeats (... DNA fingerprinting in forensics)
• Epigenetics (methylation)

Second-generation sequencers:

E.g. Illumina Co., December 2008:

• One run (3 days) of the Genome Analyzer made by Illumina Inc. = 60 years of work of the ABI 3730xl (used by Celera Genomics)
• Cost of sequencing one human genome: 40,000-50,000 USD
• First individual human genomes sequenced:
  • 2007: Craig Venter, James Watson - both genomes published on the internet

... and third-generation sequencers

Graph: Nature 458, 719-724 (2009).

Obtained from http://genome.wellcome.ac.uk

Next-Generation Sequencing (NGS)

Current technology, e.g. Illumina HiSeq 2500:
Sequencing by synthesis (SBS)
Whole human genome, 30x coverage, takes 27 hours, costs <5000 USD

www.illumina.com


(for Illumina technology, Wikimedia Commons)

Archon X Prize for Genomics

$ 10,000,000

Announced in 2006. For the first team that succeeds in sequencing 100 individual human genomes within 30 days, at a specified quality and at a cost below $1,000 per genome.

Prize cancelled 22/8/2013: “Outpaced by innovation”

Human Genome Project: Results

The Human Genome

Haploid genome: 3 billion base pairs divided into 23 chromosomes
• 1 meter of DNA if extended
• 750 MB (1 CD)
• 2 million standard printed pages (50 letters/line, 30 lines/page)

Fig. from Bolzer et al. 2005, PLoS Biol. 3(5): e157 DOI: 10.1371/journal.pbio.0030157
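
The size estimates for the haploid genome can be reproduced with simple arithmetic. The sketch below rests on two assumptions that are not stated on the slide: each base is stored in 2 bits (four possible letters), and the DNA is in the B-form with a rise of about 0.34 nm per base pair. All names are hypothetical.

```python
HAPLOID_GENOME_BP = 3_000_000_000
BITS_PER_BASE = 2            # assumption: A, C, G, T encoded in 2 bits each
LETTERS_PER_PAGE = 50 * 30   # 50 letters/line, 30 lines/page (from the slide)
RISE_PER_BP_NM = 0.34        # assumption: B-DNA, about 0.34 nm per base pair


def storage_megabytes(bp: int) -> float:
    """Storage needed for `bp` bases, in megabytes (10**6 bytes)."""
    return bp * BITS_PER_BASE / 8 / 1_000_000


def printed_pages(bp: int) -> int:
    """Number of full printed pages at one letter per base."""
    return bp // LETTERS_PER_PAGE


def length_metres(bp: int) -> float:
    """Length of the fully extended double helix, in metres."""
    return bp * RISE_PER_BP_NM * 1e-9


print(storage_megabytes(HAPLOID_GENOME_BP))        # 750.0
print(printed_pages(HAPLOID_GENOME_BP))            # 2000000
print(round(length_metres(HAPLOID_GENOME_BP), 2))  # 1.02
```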

DNA in cell nucleus

The nucleus of a typical human cell has a diameter of 5-8 µm and contains 2 m of DNA.
Comparable to a tennis ball into which 20 km of thin thread has been neatly packed.

Classification of eukaryotic genomic DNA:

• Degree of condensation:
  • Euchromatin
  • Heterochromatin (approx. 10 %, not sequenced!)
• Repetitivity:
  • Highly repetitive
  • Moderately repetitive
  • Non-repetitive (single-copy)
• Function:
  • Structural (centromeres, telomeres)
  • Coding protein
  • Transcribed to noncoding RNA (introns, rRNA, tRNA, miRNA etc.)
  • Transposons
  • Regulatory sequences
  • Junk...?

Experiments with denaturation & reassociation of DNA:

Rapid reassociation (10-15 %):
- highly repetitive DNA
Intermediate reassociation (25-40 %):
- moderately repetitive DNA
Slow reassociation (50-60 %):
- non-repetitive (single copy) DNA

Fig: Lodish, H. et al.: Molecular Cell Biology (3rd ed.), W.H. Freeman, New York 1995.

Classification of eukaryotic genomic DNA:

• Highly repetitive (simple-sequence DNA):
  • All heterochromatin (centromeres, telomeres, 8 % of genome, as yet unsequenced)
  • Minisatellites (3 % of euchromatin)
• Moderately repetitive:
  • Tandemly repeated genes for rRNA, tRNA and histones (multiple identical copies to achieve high transcription efficiency, e.g. rRNA genes in eukaryotes >100 copies)
  • Transposons
• Non-repetitive:
  • Protein genes
  • Genes for noncoding RNA
  • Regulatory sequences

Eukaryotic GENE

Fig: Murray, R.K. et al.: Harperova biochemie, Appleton & Lange 1993, in Czech H&H 2002.

Genes are not placed evenly in the genome

• Big differences among chromosomes:
  • chromosome 1: 2968 genes
  • chromosome Y: 231 genes
• Regions rich in genes (“cities”) - more C and G
• Regions poor in genes (“deserts”) - more A and T, up to 3 Mb!
• CpG islands - “barriers between cities and deserts” ... regulation of gene activity

• Solitary gene:
  • present as a single copy in the whole haploid genome (about half of genes)
• Tandemly repeated genes for rRNA, tRNA, histones
• Gene family:
  • cluster of related genes that originated in evolution from a single ancestor, with gradual diversification of sequence and function
• Pseudogene:
  • gene in which mutations accumulated to such an extent that it cannot be transcribed (“molecular fossil”)
• Processed pseudogene:
  • originated from reverse transcription of mRNA and integration into the genome

Genes in the human genome

• Coding genes: 20,364
• Small noncoding genes: 9,673
  • (up to 200 bp, rRNA, miRNA, ncRNA, snRNA, snoRNA ...)
• Long noncoding genes: 14,817
  • (over 200 bp, various noncoding RNA)
• Pseudogenes: 14,415
• Gene transcripts: 196,345

Ensembl release 78, Dec. 2014 (www.ensembl.org)

Protein genes in the human genome

approx. 20,400

About 25 % of the genome is transcribed to pre-mRNA,
of this only 5 % are exons
... Human EXOME: approx. 1.5 % of the genome

Number of genes does not reflect organism complexity?!

Sacch. cerevisiae       6,000 genes
C. elegans             18,000 genes
Drosophila             13,000 genes
Arabidopsis thaliana   26,000 genes

Comparison of human/mouse genome with genomes of lower organisms (C. elegans, Drosophila):

• low gene density, longer introns

Figure from: Lodish, H. et al.: Molecular Cell Biology (5th ed.), W.H. Freeman, New York 2004.

How to find genes in genomes:

• Bacteria, yeast:
  • open reading frames (ORFs)
• Higher organisms:
  • hybridisation/comparison with cDNA or EST (expressed sequence tag = part of cDNA)
  • by similarity with other known genes
  • prediction of recognition sites for splicing
  • comparison with genomes of other organisms
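
The open-reading-frame approach mentioned for bacteria and yeast can be sketched as a scan for an ATG start codon followed, in the same reading frame, by a stop codon. This is a simplified illustration only: it searches just the three forward-strand frames, uses the standard stop codons TAA, TAG and TGA, and the function name `find_orfs`, the `min_codons` threshold and the test sequence are hypothetical. Real gene finders also scan the reverse strand and apply much longer length cut-offs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Find ORFs on the forward strand.

    Returns (start, end, sequence) with 0-based start and exclusive end;
    the sequence runs from ATG through the stop codon. `min_codons` counts
    the codons before the stop, including the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical sequence: one ORF (ATG AAA TTT GGG TAA) starting at index 2.
print(find_orfs("CCATGAAATTTGGGTAACC"))  # [(2, 17, 'ATGAAATTTGGGTAA')]
```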

Comparison of human/mouse genome with genomes of lower organisms (C. elegans, Drosophila):

• expansion of gene families / new families related to:
  • blood clotting
  • acquired (specific) immunity
  • nervous system
  • intra- and intercellular communication
  • regulation of gene expression
  • programmed cell death (apoptosis)

• only about 7 % of protein domains entirely new in vertebrates, but
  • expansion of protein families
  • new combinations of domains; and proteins more complex (more domains per protein)
  • more proteins from one gene - alternative splicing in up to 95 % of genes

Susumu Ohno, 1972

• Because of the mutation load, the human haploid genome cannot afford to keep more than about 30,000 gene loci.
• Most DNA is redundant ... junk!

http://www.junkdna.com/ohno.html

Mobile DNA elements (transposons)

Autonomous DNA sequences, capable of copying themselves, represent 44 % of the genome

• DNA transposons
• Retrotransposons
  • Virus-like
  • Non-viral: Long (LINEs), Short (SINEs)

Mobile elements (transposons):

Fig.: Lodish, H. et al.: Molecular Cell Biology (5th ed.), W.H. Freeman, New York 2004.

DNA transposons

2-3 kb (or shorter), encode transposase, cut & paste in genome without RNA intermediate

Fig: Lodish, H. et al.: Molecular Cell Biology (5th ed.), W.H. Freeman, New York 2004.

Mobile (parasitic) elements in mammalian genome:

• DNA transposons
  • 2-3 kb (or shorter), encode transposase, cut & paste or copy & paste in genome without RNA intermediate
• Virus-like retrotransposons
  • 6-11 kb (or shorter), retroviruses without the gene for the protein envelope (env)
• LINEs (long interspersed repeats)
  • 6-8 kb, e.g. L1, encode 2 proteins (one is reverse transcriptase)
• SINEs (short interspersed repeats)
  • 100-300 bp, e.g. Alu, code for no protein, proliferation depends on LINEs, origin: small noncoding cellular RNA

Census of parasitic elements in the human genome:

LINEs:              850,000 copies    21 % of genome
SINEs:            1,500,000 copies    13 % of genome
Retrovirus-like:    450,000 copies     8 % of genome
DNA transposons:    300,000 copies     3 % of genome

• Mostly mutated and/or incomplete copies, only a small part (<0.05 %) still active:
  • LINEs: 80-100 L1
  • SINEs: 2000-3000 Alu, <100 SVA
  • Retrovirus-like: ? (HERV-K ... really extinct?)
  • DNA transposons: 0
• Mouse genome contains many more functional transposons (...why?)

Significance of transposons in the human genome

• Transposition in germinal cells is a rare event (approx. 1 new insertion per 20 live births, mostly Alu)
• Still a significant source of human genetic variability
• Can inactivate genes - documented as a rare cause of inherited diseases
• In somatic cells can result in mosaicism
  • role of L1 in neurogenesis?

• Transposons facilitate recombination ... driving force of evolution!

Fig.: Lodish, H. et al.: Molecular Cell Biology (5th ed.), W.H. Freeman, New York 2004.

Non-classified “spacer” DNA: non-repetitive, noncoding, >1/2 of the genome ... likely also dead transposons, too mutated to be recognizable

Project ENCODE, 2012: no junk DNA!

• Up to 80 % of the genome has a biological function
• Up to 75 % of the genome is transcribed to RNA at least at some time and somewhere
• Despite the fact that at best only 20 % of the genome is under evolutionary constraint

...?????...

Human Genome Project: Impact

Benefits of genome sequencing

• Facilitates research into the molecular basis of diseases
• Study of human evolution and migration
• What the genome determines (“nature vs. nurture”) and how genetic variation causes differences among people
• Genomic medicine, pharmacogenomics, personalized medicine...

Genomic medicine

• 1) Diagnostics at the gene level
  • Rare monogenic diseases
  • Shift to earlier diagnostics
    • Possibility of diagnosis before disease appears
    • Newborn screening
    • Noninvasive prenatal testing
    • Preconception carrier testing, preimplantation genetic analysis in IVF
  • Genomic-based analysis of tumors enables effective targeted therapies
  • In common complex diseases with polygenic predispositions (diabetes, coronary disease etc.) still difficult

Genomic medicine

• 2) Pharmacogenomics
  • Targeted therapy of tumors directed by genetic analysis
    • E.g.: antibody against HER-2 only in breast tumors that express this protein
  • Genomic-based tests predict drug efficacy, occurrence of adverse side effects, or help to optimize dosage.
    • E.g.: treatment of chronic hepatitis C, HIV, possibly dosage of warfarin

... personalized medicine

Genomic medicine

• 3) Microorganisms:
  • Pathogenic:
    • Rapid diagnostics of infectious disease by pathogen sequencing - especially relevant in tracing new epidemic outbreaks (SARS, MRSA...)
  • Nonpathogenic: Human Microbiome
    • E.g. human gut bacteria - metabolic activity comparable to the liver, individually different spectrum, relationships to inflammatory bowel disease, atherosclerosis, obesity...

Personal Genomics: 23andMe

• Saliva sample sent by DHL, genotyping of approx. 700,000 SNPs
• DNA relatives
• Ancestry:
  • Ancestry Composition
  • Paternal (Y chromosome haplogroup)
  • Maternal (mitochondrial DNA haplogroup)
  • Per cent Neanderthal DNA
• Health:
  • Disease risk: 122 (31 high confidence)
  • Drug response: 25 (12 high confidence)
  • Inherited conditions: 53 (all high confidence)
  • Traits: 61 (13 high confidence)

Why does analysis of SNPs not say more?

• Common SNPs are not sufficient - it is necessary to find individual (rare) polymorphisms
• SNPs are not the main source of human genetic variability - duplications/deletions and insertions of transposons are more significant
• A trait controlled by a single gene is probably a rather uncommon condition - the phenotype is the result of an interplay of numerous genes
• Expression of genes (how the genome is used) is what decides
  • Polymorphisms in noncoding regulatory DNA
  • Epigenetics (DNA methylation etc.) - also heritable!

Ethical, legislative and social issues

• Gene privacy:
  • who has the right to know someone else’s genetic information and how it can be used, worries about discrimination by employers, health insurance companies...
• Gene testing
• Gene therapy
• Designer babies
• Behavioral genetics:
  • how genes determine human behaviour, possible fall into genetic determinism and loss of responsibility for one’s own behaviour
• GMO
• Gene patenting

References:

Alberts, B. et al.: Essential Cell Biology, Garland Publishing, Inc., New York 1998.
Lodish, H. et al.: Molecular Cell Biology, W.H. Freeman, New York 1995, 2004 (“Darnell”).
Nature 2001: 409 (6822, 15.2.2001); pp. 813-958.
Science 2001: 291 (5507, 16.2.2001); pp. 1177-1351.
Trends in Genetics 2007: 23, pp. 183-191.
Nature 2009: 458, pp. 719-724.
FEBS Letters 2011: 585; pp. 1589-1594.
Lecture by dr. M. Lebl (Illumina Co.), 1.LF UK, 1.12.2008.
Science Translational Medicine 2013: 5, 189sr4.
PNAS 2014: 111, pp. 6131-6138
http://www.ncbi.nlm.nih.gov
http://genomics.energy.gov
http://en.wikipedia.org
http://www.ensembl.org
http://hapmap.ncbi.nlm.nih.gov
http://www.illumina.com
http(s)://www.23andme.com
Fig. “Human and DNA Shadow”: Courtesy of U.S. Department of Energy's Joint Genome Institute, Walnut Creek, CA, http://www.jgi.doe.gov.