CHAPTER 12C – DNA SEQUENCING
OPEN GENETICS LECTURES – FALL 2015

Figure 1. Output from an automated Sanger DNA sequencer. (Original-Harrington-CC BY-NC 3.0)

INTRODUCTION

DNA sequencing determines the order of nucleotide bases in a DNA molecule. The molecule could be as small as a single restriction fragment or a gene, or as large as an organism's entire genome. Most DNA sequencing at the University of Alberta is done by the Molecular Biology Service Unit (MBSU). They use three machines: (1) an Applied Biosystems ABI 3730, (2) an Illumina MiSeq, and (3) an Illumina NextSeq 500. Each has its own advantages and purposes. The 3730 uses an older technology called automated Sanger sequencing, while the Illumina machines perform next-generation DNA sequencing. This chapter will cover how these machines work and what they are used for.

1. AUTOMATED SANGER DNA SEQUENCING

1.1. HISTORICAL CONTEXT

DNA sequencing has had a long history. Beginning in the 1970s there have been many methods and improvements. Some dates that stand out are:

• 1977 - Frederick Sanger invents a popular method, later called manual Sanger sequencing.

• 1986 - Leroy Hood improves upon this method to invent automated Sanger sequencing.

• 1987 - Applied Biosystems begins selling a machine to perform automated Sanger sequencing, their ABI 370.

• 1995 to 2003 - Using ABI 370s, ABI 377s, and similar machines, scientists in the US, UK, and other countries sequenced the human genome.

• 2002 - Applied Biosystems begins selling the ABI 3730 (Figure 2), which became the most popular way to do automated Sanger sequencing and remains so to this day.

Figure 2. The Applied Biosystems ABI 3730 in the MBSU (Molecular Biology Service Unit, Biological Sciences Department, U. of Alberta). (Original-Harrington-CC BY-NC 3.0)




Figure 3. An automated Sanger sequencing reaction. The regular dNTPs are shown here as black while the ddNTPs are in colour. (Original-Deyholos-CC BY-NC 3.0)

1.2. HOW AUTOMATED SANGER DNA SEQUENCING WORKS

Recall that DNA Polymerases incorporate nucleotides (dNTPs) into a growing strand of DNA, based on the sequence of a template strand. DNA Polymerases add a new base only to the 3'-OH group of an existing strand of DNA; this is why primers are required in natural DNA synthesis and in techniques such as PCR. Automated Sanger sequencing relies on the random incorporation of modified nucleotides called dideoxynucleotides (ddNTPs, Figure 4). These lack a 3'-OH group and therefore cannot serve as an attachment site for the addition of the next nucleotide. After a ddNTP is incorporated into a strand of DNA, no further elongation can occur. The ddNTPs are labelled with one of four fluorescent dyes, each specific for one of the four nucleotide bases (Figure 5).

To sequence a DNA fragment, you need many copies of that fragment (Figure 3). Unlike PCR, automated Sanger sequencing does not amplify the target sequence, and only one primer is used. This primer is hybridized to the denatured template DNA and determines where on the template strand the sequencing reaction will begin. A mixture of regular dNTPs, fluorescently-labelled ddNTPs, and DNA Polymerase is added to a tube containing the primer-template hybrid. The DNA Polymerase will then synthesize a new strand of DNA until a fluorescently-labelled ddNTP is incorporated, at which point extension is terminated. Because the reaction contains millions of template molecules, a sufficient number of shorter molecules is synthesized, each ending in a fluorescent label that corresponds to the last base incorporated.

Figure 4. ddNTPs are synthetic DNA nucleotides that lack a 3' hydroxyl group. If a DNA Polymerase uses one it can't continue. (Original-Harrington-CC BY-NC 3.0)

Figure 5. Each ddNTP used in a sequencing reaction is attached to a different fluorescent molecule using a long chain of carbons. Note that black ink is used to represent yellow fluorescence. (Original-Harrington-CC BY-NC 3.0)

Figure 6. Fluorescently labeled products can be separated by capillary electrophoresis, generating a chromatogram from which the sequence can be read. (Wikimedia-Abizar-PD)

The newly synthesized strands can be denatured from the template, and then separated electrophoretically based on their length (number of bases). The ABI machine is used for this electrophoresis step. While the original ABI 370 used a slab gel similar to the ones used in undergraduate labs, the newer ABI 3730 uses capillary tube electrophoresis (Figure 6). In this machine each sample travels through its own tube. Near the end of the tube is a laser, which excites any fluorescent dyes moving past, and a detector that collects any emitted light. As each DNA molecule moves past the laser/detector it emits a specific colour. Because there will be several molecules with the same length and same colour, the result appears as a peak of colour. A computer monitoring the results can add the sequence information to the colours, since red = T and so on. In this way the DNA sequence can be read simply from the order of the colours in successive peaks.
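The chain-termination logic described above can be imitated in a few lines of Python. This is only a toy sketch under stated assumptions: the strand `GATTACAGGC`, the 5% chance of picking a ddNTP at each step, and every dye colour except red = T are made up for illustration and are not taken from the ABI chemistry.

```python
import random

# Assumed dye colours: the text only states red = T; the others are placeholders.
DYE_COLOUR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOUR_TO_BASE = {colour: base for base, colour in DYE_COLOUR.items()}


def sanger_fragments(new_strand, n_templates=5000, ddntp_fraction=0.05, seed=1):
    """Simulate chain termination; return (length, dye colour) for each stopped strand."""
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_templates):
        for position, base in enumerate(new_strand, start=1):
            # At each step the polymerase occasionally picks a ddNTP instead of a dNTP.
            if rng.random() < ddntp_fraction:
                fragments.append((position, DYE_COLOUR[base]))
                break  # no 3'-OH group, so this strand cannot be extended further
    return fragments


def read_sequence(fragments):
    """Order fragments by length, as electrophoresis does, and read one colour per length."""
    colour_at_length = {}
    for length, colour in fragments:
        colour_at_length[length] = colour
    return "".join(COLOUR_TO_BASE[colour_at_length[n]] for n in sorted(colour_at_length))


synthesized = "GATTACAGGC"  # hypothetical strand extended from the primer
print(read_sequence(sanger_fragments(synthesized)))
```

With thousands of simulated templates, every possible fragment length is represented many times, which is why the sequence can be read off one colour per length, shortest fragment first.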

The results from a sequencing reaction are presented as a chromatogram. While Figure 6 only shows 9 peaks, a successful sequencing reaction will generate about 700 nucleotides worth of data. The figure shows the results from a single tube, but in fact there can be 48 or 96 tubes in total. Thus in a single 'run' an ABI 3730 machine can sequence up to about 67,000 bp of DNA.
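The per-run figure follows directly from the two numbers in the text. A short worked calculation, assuming every capillary gives a full 700-base read (the function name is just for illustration):

```python
READ_LENGTH_BP = 700  # approximate bases per successful reaction, from the text


def run_capacity_bp(capillaries, read_length_bp=READ_LENGTH_BP):
    """Bases produced in one run if every capillary yields a full-length read."""
    return capillaries * read_length_bp


for capillaries in (48, 96):
    print(capillaries, "capillaries:", run_capacity_bp(capillaries), "bp")
```

For 96 capillaries this gives 67,200 bp, which the text rounds to 67,000 bp.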

1.3. USING AUTOMATED SANGER DNA SEQUENCING TO SEQUENCE A PLASMID

Making a new recombinant plasmid takes time and money. You will want to confirm that it has the DNA sequence it should before you use it for important experiments. A simple way to find out is to sequence it. Let's say you have put a 3.0 kb insert into a pBluescript II plasmid and right now the recombinant plasmid is in E. coli cells (Figure 7). The first step is to isolate plasmid DNA from some of the cells with a mini-prep protocol. This will be the template DNA. The primer will be an oligonucleotide complementary to the pBluescript II vector sequence adjacent to the insert. The sequencing reaction will tell you the sequence of the insert DNA within the plasmid.

1.4. USING AUTOMATED SANGER DNA SEQUENCING TO SEQUENCE A GENE

If you suspect that an organism has a mutation in a specific gene you can use automated Sanger sequencing to find out (Figure 8). Let's say you have a mouse strain and you think it has a mutation in a gene you are studying. As before, the first step is to isolate DNA. However, we can't sequence this DNA directly. Amongst all of the genomic DNA there just aren't enough copies of the gene to serve as the template DNA. To overcome this limitation a PCR reaction is used to amplify the gene sequence in question. Then we sequence the PCR product. Depending upon how large the gene is, it may take several PCR products and several sequencing reactions to get the whole sequence.
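Once the PCR product has been sequenced, finding a point mutation amounts to comparing the new sequence with a reference, base by base. A minimal sketch, assuming both sequences are already aligned and of equal length; the two sequences below are invented for illustration:

```python
def find_point_mutations(reference, sample):
    """Return (position, reference base, sample base) for each mismatch; positions are 1-based."""
    if len(reference) != len(sample):
        raise ValueError("this sketch only compares sequences of equal length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


wild_type = "ATGGCCTTAGACGT"  # hypothetical reference sequence
mutant = "ATGGCCTAAGACGT"     # hypothetical read from the mouse strain
print(find_point_mutations(wild_type, mutant))  # [(8, 'T', 'A')]
```

Insertions and deletions shift the reading position, so real comparisons use an alignment step first; that step is left out here.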

Figure 7. Using automated Sanger sequencing to sequence a plasmid. (Original-Harrington-CC BY-NC 3.0)

Figure 8. Using automated Sanger sequencing to sequence a gene. The site of a point mutation is shown here as 'm'. (Original-Harrington-CC BY-NC 3.0)


2. NEXT-GENERATION DNA SEQUENCING

2.1. HISTORICAL CONTEXT

Sequencing a single gene or plasmid with an ABI 3730 is quick and inexpensive. But sequencing a whole genome this way would be very slow and very expensive. There are two reasons for this.

The first is that automated Sanger sequencing requires many copies of the template DNA. A sample of purified plasmid DNA or purified PCR product has millions of copies of the target region. But a sample of genomic DNA has only a few copies of a specific target region. For many years the only way to sequence an organism was to isolate its genomic DNA, break the DNA into large pieces, and then clone these pieces into BAC (bacterial artificial chromosome) vectors. The BAC clones would then have to be sequenced one by one. Most of the 13 years and millions of dollars it took to sequence the human genome was spent making and organizing these BAC clones.

The second limitation of automated Sanger sequencing is that each reaction can only generate 700 nucleotides worth of data. It took literally millions of independent sequencing reactions to sequence the human genome.
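A rough count shows where "millions of reactions" comes from. The sketch below assumes a round figure of 3 billion bp for one copy of the human genome (an assumption, not a number from this chapter) and treats every reaction as a perfect, non-overlapping 700-base read:

```python
GENOME_BP = 3_000_000_000  # assumed round figure for one copy of the human genome
READ_BP = 700              # per-reaction yield quoted in the text


def reactions_needed(genome_bp=GENOME_BP, read_bp=READ_BP, coverage=1):
    """Number of reactions needed to read every base `coverage` times (ceiling division)."""
    return (genome_bp * coverage + read_bp - 1) // read_bp


print(reactions_needed())            # a single pass: 4285715
print(reactions_needed(coverage=5))  # hypothetical 5-fold redundancy: 21428572
```

Even a single pass needs over four million reactions, and reads must overlap to be joined together, so the real number is higher still.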

Beginning in the late 1990s scientists realized that there was a need for a machine that could sequence genomic DNA directly and with a single reaction. In several instances a technology was invented in a university lab, developed in a small biotechnology company, and then purchased by a larger biotech company. An example of this is:

• 1996 - Swedish scientists invent a completely new way to sequence DNA called pyrosequencing. It is clever but very labour intensive.

• 2000 - An American inventor and entrepreneur, Jonathan Rothberg, refines their technique into automated pyrosequencing.

• 2004 - His company, 454 Life Sciences, markets the first so-called next-generation sequencing machine.

• 2007 - The largest biotech company in the world, Roche, buys 454 Life Sciences.

• 2015 - 454, now a subsidiary of Roche, continues to develop and sell next-generation machines.

In 2015 there are several choices for next-generation sequencing. For a few hundred thousand dollars you can purchase a GS FLX (made by 454/Roche), an ABI 5500 (Applied Biosystems), or an Ion Proton (Life Technologies). Each is a fancy-looking machine that uses a unique and proprietary technology.

2.2. HOW NEXT-GENERATION DNA SEQUENCING WORKS

As mentioned in the introduction to this chapter, the MBSU recently purchased two next-generation machines: an Illumina MiSeq and an Illumina NextSeq 500 (Figure 9).

Both use a similar workflow (Figure 10). The scientist has to isolate genomic DNA from an organism (step 1) and then use a kit to break it into small fragments (step 2). The scientist then loads the fragments into the machine and turns it on. Once inside, the DNA fragments are isolated from each other (step 3), amplified in place (step 4), and finally sequenced (step 5). The technology is called sequencing by synthesis. Illumina has made animated movies of what happens within their machines:

www.youtube.com/watch?v=HMyCqWhwB8E

Figure 9. The Illumina MiSeq in the MBSU (Biological Sciences Department, U. of Alberta). The door has been opened to show where the DNA sample and reaction mixtures are loaded. (Original-Harrington-CC BY-NC 3.0)


Figure 10. Next-generation DNA sequencing using Illumina's sequencing by synthesis technology. Steps 1 and 2 are done by the researcher and steps 3, 4, and 5 occur within the machine. (Original-Harrington-CC BY-NC 3.0)

Table 1. Comparison between different sequencing machines.

Machine        | ABI 3730               | Illumina MiSeq          | Illumina NextSeq 500
DNA            | plasmid or PCR product | genomic DNA             | genomic DNA
Technology     | automated Sanger       | sequencing by synthesis | sequencing by synthesis
Data generated | 700 bp                 | 540 Mb to 15 Gb         | 16 Gb to 120 Gb
Price          | $4.75                  | $1,250-$1,850           | $2,050-$5,150

The output is just raw sequence data; there are no chromatograms. Powerful software is needed for sequence assembly, the process of joining these small pieces of sequence data into a continuous sequence (Figure 11). Ultimately there will be one sequence for each of the organism's chromosomes.
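The idea behind sequence assembly can be illustrated with a toy greedy approach: find the two pieces whose ends overlap the most, join them, and repeat. This is a minimal sketch, not how production assemblers work; it assumes error-free reads that all come from the same strand, and the three reads are invented for illustration.

```python
def overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap(left, right, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to join; remaining pieces stay separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATGGCGTAC", "GTACCTGAA", "TGAAGGCTT"]  # hypothetical short reads
print(greedy_assemble(reads))  # ['ATGGCGTACCTGAAGGCTT']
```

Real genomes contain repeated sequences and reads contain errors, which is why the text says powerful software is needed for this step.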

2.3. COMPARISON BETWEEN DNA SEQUENCING METHODS

Scientists all over the world now have a choice between automated Sanger sequencing and next-generation sequencing. For example, at the University of Alberta your choices at the MBSU are shown in Table 1.

Recall that DNA is measured in base pairs, where:

• 1 kilobase (kb) = 1 000 base pairs (bp)

• 1 Megabase (Mb) = 1 000 000 bp

• 1 Gigabase (Gb) = 1 000 000 000 bp

Let's say you wanted to sequence a 2,000 bp long PCR product. You could do this with three sequencing reactions in the ABI 3730 or a single run in the Illumina MiSeq. The first method would cost about $15 and the second would cost $1,250. Even though one machine is a decade older, it is still the way to go! If you did use the MiSeq you'd end up sequencing the same PCR product over and over. It wouldn't produce any more data.
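The comparison above can be reproduced from the Table 1 prices. A small worked calculation, assuming full 700-base reads with no overlap between them (the function and constant names are for illustration only):

```python
SANGER_READ_BP = 700    # bases per ABI 3730 reaction (Table 1)
SANGER_PRICE = 4.75     # dollars per reaction (Table 1)
MISEQ_MIN_PRICE = 1250  # cheapest MiSeq run in dollars (Table 1)


def sanger_cost(target_bp):
    """Return (number of reactions, total price) to read `target_bp` bases once."""
    reactions = -(-target_bp // SANGER_READ_BP)  # ceiling division
    return reactions, reactions * SANGER_PRICE


reactions, cost = sanger_cost(2000)
print(reactions, cost)                  # 3 14.25
print(round(MISEQ_MIN_PRICE / cost))    # the MiSeq run costs roughly 88 times more
```

Three reactions come to $14.25, which the text rounds to $15.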

Figure 11. Assembling the DNA sequence of a chromosome from many smaller sequences. (NHGRI-Darryl Leja-PD)


On the other hand, let's say you wanted to sequence your own DNA. Even if you don't consider the time and cost of making the BAC clones, it would still cost millions of dollars to do all of the sequencing reactions in the ABI 3730. Conversely, the MBSU could use their NextSeq 500 and have everything done in two days for about $4000. Each of your 46 chromosomes would be sequenced about 30 times. A more expensive machine, the Illumina HiSeq, can sequence human DNA for about $1000 a person.
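The same back-of-the-envelope arithmetic applies to a whole genome. The sketch below assumes a round 3 billion bp for one copy of the human genome, which is not a figure given in this chapter, and uses the Table 1 price for a single pass of non-overlapping Sanger reads:

```python
GENOME_BP = 3_000_000_000  # assumed round figure for one copy of the human genome
SANGER_READ_BP = 700       # bases per ABI 3730 reaction (Table 1)
SANGER_PRICE = 4.75        # dollars per reaction (Table 1)


def sanger_genome_cost(genome_bp=GENOME_BP):
    """Price of reading every base once with non-overlapping Sanger reactions."""
    reactions = -(-genome_bp // SANGER_READ_BP)  # ceiling division
    return reactions * SANGER_PRICE


def data_needed_gb(coverage, genome_bp=GENOME_BP):
    """Gigabases of raw data needed to read each base `coverage` times."""
    return coverage * genome_bp / 1_000_000_000


print(sanger_genome_cost())  # 20357146.25
print(data_needed_gb(30))    # 90.0
```

Under these assumptions a single Sanger pass already costs about $20 million, while 30-fold coverage needs roughly 90 Gb of data, which falls inside the 16 Gb to 120 Gb range listed for the NextSeq 500 in Table 1.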

2.4. USING NEXT-GENERATION DNA SEQUENCING TO SEQUENCE HUMANS

Even though we know the average human DNA sequence, each of us is unique. There are two reasons why human DNA continues to be sequenced (Table 2).

Table 2. Using next-generation sequencing to sequence humans.

Use of next-generation sequencing | Description
Personalized genomics | If we sequence a person's DNA it can reveal information about their susceptibility to disease and their response to various medical treatments.
Tumour cell sequencing | If a person has cancer it is now possible to sequence individual cancer cells. This has revolutionized how physicians help their patients. Instead of treatments based upon the location of tumours, treatments can now be designed around the genetic defects that lead to the cells becoming cancerous in the first place.

2.5. USING NEXT-GENERATION DNA SEQUENCING TO SEQUENCE OTHER ORGANISMS

Next-generation sequencing has made it feasible to sequence anything. Here are just a few examples (Table 3).

Table 3. Using next-generation sequencing to sequence other organisms.

Use of next-generation sequencing | Description
De novo sequencing | This is when an organism is sequenced for the first time. For example, in 2014 researchers in Sierra Leone sequenced 99 Ebola virus genomes from 78 patients. They identified changes in the virus that caused the recent outbreak.
Metagenomics | This is when the entire collection of DNA in an environment is sequenced to determine which species are present. This technique has been used to show that a person's gut microbes vary with their diet.
RNA-Seq | This is when RNAs from a tissue or organ are isolated, copied into DNA molecules, and then sequenced. This reveals which genes were active in the cell, tissue, or organ.


SUMMARY:

• Automated Sanger sequencing became commonplace in 1987 and is still used today. It is used to sequence plasmids and PCR products. The most popular machines are Applied Biosystems' ABI 3730s.

• Next-generation sequencing began in 2004 as a better way to sequence whole genomes. There are several competing technologies, for example Illumina's MiSeq machine and its sequencing by synthesis technology.

• Sequencing centres offer both types of sequencing today.

• Sequencing any DNA molecule, large or small, is now fast and inexpensive.

KEY TERMS:

automated Sanger sequencing
Applied Biosystems ABI 3730
DNA Polymerases
primer
regular dNTPs
fluorescently-labelled ddNTPs
capillary tube electrophoresis
chromatogram
next-generation sequencing
Illumina MiSeq
Illumina NextSeq 500
sequencing by synthesis
sequence assembly
personalized genomics
tumour cell sequencing
de novo sequencing
metagenomics
RNA-Seq


STUDY QUESTIONS:

1) What would the chromatogram look like if you set up an automated Sanger sequencing reaction with only template, primers, polymerase, and fluorescent ddNTPs?

2) How could you use DNA sequencing to identify new species of marine microorganisms?

3) An alternative name for automated Sanger sequencing is dye-terminator sequencing. Why is this term appropriate?

4) Ten years ago it would have cost $100,000,000 to sequence your DNA. Today it would cost as little as $1,000. Why did the cost go down so much?

5) Why haven't next-generation machines completely replaced the first generation of automated DNA sequencers?

6) True or false: Automated pyrosequencing and sequencing by synthesis are both considered next-generation DNA sequencing technologies.
