CHAPTER NINETEEN

Single-Molecule Sequencing: Sequence Methods to Enable Accurate Quantitation

Christopher Hart, Doron Lipson, Fatih Ozsolak, Tal Raz, Kathleen Steinmann, John Thompson, and Patrice M. Milos

Helicos BioSciences Corporation, One Kendall Square, Cambridge, Massachusetts, USA

Methods in Enzymology, Volume 472. © 2010 Elsevier Inc. All rights reserved. ISSN 0076-6879, DOI: 10.1016/S0076-6879(10)72002-4

Contents
1. Introduction
2. Basic Principles of Single-Molecule Sequencing
3. Preparation of Genomic DNA for Single-Molecule Sequencing
   3.1. DNA fragmentation and quantitation
   3.2. Poly-A tailing
   3.3. 3′ end blocking
4. Bacterial Genome Sequencing
   4.1. Preparation and sequencing of bacterial DNA
   4.2. Assessment of coverage and lack of bias
5. Human Genome Sequencing and Quantitation
   5.1. Copy number variation
6. Chromatin Immunoprecipitation Studies
   6.1. Preparation of ChIP DNA
   6.2. ChIP DNA poly-A tailing
   6.3. ChIP DNA 3′ blocking
7. Digital Gene Expression for Transcriptome Quantitation
   7.1. Methodology for single-molecule sequencing digital gene expression
   7.2. Demonstration of DGE counting reproducibility
8. Summary
Acknowledgments
References



Abstract

Helicos® Single-Molecule Sequencing provides a unique view of genome biology through direct sequencing of cellular and extracellular nucleic acids in an unbiased manner, providing both quantitation and sequence information. Using a simple sample preparation, involving no ligation or amplification, genomic DNA is sheared, tailed with poly-A, and hybridized to the flow-cell surface containing oligo-dT for initiating sequencing-by-synthesis. RNA measurements involving direct RNA hybridization to the flow cell allow for the direct sequencing and quantitation of RNA molecules. From these methods, a diverse array of applications has now been successfully demonstrated with the Helicos® Genetic Analysis System, including human genome sequencing for accurate variant detection, ChIP Seq studies involving picogram quantities of DNA obtained from small cell numbers, copy number variation studies from both fresh tumor tissue and formalin-fixed paraffin-embedded tissue and archival tissue samples, small RNA studies leading to the identification of new classes of RNAs, and the direct capture and sequencing of nucleic acids from as few as 400 cells, with our end goal being single-cell measurements. Helicos methods provide an important opportunity to researchers, including genomic scientists, translational researchers, and diagnostic experts, to benefit from biological measurements at the single-molecule level. This chapter will describe the various methods available to researchers.

1. Introduction

The revolution in genomic sequencing that is currently occurring in the scientific community is heralding an exciting era of biology where experiments can be performed at a scale that fully elucidates the genome, its corresponding architecture, and the resulting transcriptome (all RNA molecules transcribed from the genome), revealing amazing new findings (Kahvejian et al., 2008). This revolution is continuing as we move into the era of single-molecule sequencing where, for the first time, we are sequencing and measuring the actual molecules present in cells and tissues. This new era offers the promise of a better understanding of the fundamental basis of health and disease.

Helicos single-molecule sequencing offers the opportunity to examine billions of DNA or RNA molecules in a highly parallel fashion, scalable to sequencing of an entire human genome (Harris et al., 2008; Ozsolak et al., 2009; Pushkarev et al., 2009). While other technologies may offer similar approaches, the simplicity and the scalability of single-molecule sequencing set it distinctly apart from next-generation sequencing technologies. Further, these same principles contribute directly to the absolute quantitative nature of the technology. By eliminating cumbersome sample preparation steps, including complex ligations and polymerase chain reactions for amplification, single-molecule sequencing offers both sequence information and reliable quantitation for many different applications. Often referred to as "third-generation" sequencing (Hayden, 2009), the methods involved in single-molecule sequencing demonstrate these unique principles.


This chapter describes the methodological details for a variety of genomic applications used by the research and translational biology communities, including preparation of genomic DNA for complete genomic sequencing, copy number variation detection, and chromatin immunoprecipitation (ChIP) studies. Quantitative aspects of single-molecule measurements for RNA are also described for methods associated with digital gene expression.

2. Basic Principles of Single-Molecule Sequencing

Helicos single-molecule sequencing utilizes sequencing-by-synthesis methodology, involving individual nucleic acid molecules that are initially fragmented in the case of genomic DNA, melted into single strands of DNA, and poly-A tailed. These DNA molecules are then captured as individual strands of DNA through deposition onto a glass Helicos® Flow Cell (Fig. 19.1B) surface coated with oligo-dT-50 oligonucleotides. The captured templates are then incubated with dTTP and polymerase to fill in any remaining nucleotides complementary to the poly-A tail. Following the fill, nucleic acid templates are locked in place by the addition of fluorescently labeled dCTP, dGTP, and dATP Virtual Terminator™ nucleotides, which incorporate as a single complementary nucleotide and prohibit subsequent extension prior to terminator cleavage. This "fill and lock" step ensures that each template becomes available for the sequencing-by-synthesis reaction (Bowers et al., 2009; Harris et al., 2008).

Following the fill and lock step, sequencing-by-synthesis is initiated through the addition of fluorescently labeled Virtual Terminator™ nucleotides added one at a time. Nucleotide incorporation occurs at the complementary position in the individual growing strands of DNA, using a DNA polymerase. After incorporation, unincorporated nucleotides are rinsed through the flow cell. The flow-cell surface is then illuminated with a laser and incorporation is detected by the fluorescent emission of light. The HeliScope Sequencer captures the images via a CCD camera and records which strands have incorporated a nucleotide, together with positional and cycle information, to ensure conversion of the image to the individual DNA molecules as well as the A, C, G, or T nucleotide sequence information. After the images are captured, the terminator moiety is cleaved from the incorporated nucleotide, allowing subsequent addition of the next complementary nucleotide.

Once the thousands of images, which correspond to all the channels of the flow cell, have been recorded, the fluorescent label is cleaved from the nucleotide, allowing the instrument to continue incorporation of the next nucleotide in the addition cycle. In a standard run, the HeliScope Sequencer completes 120 cycles of individual nucleotide additions. A representative visual image taken from the HeliScope Sequencer is shown in Fig. 19.2. At the end of the run, real-time image processing has converted all the images into a complete sequence file, recording both the DNA strand position and nucleotide string addition; scientists are then able to download the sequence file and begin the alignment to appropriate reference genomic or transcriptomic sequences. To date, numerous genomes have been sequenced using the HeliScope Sequencer, including genomes from M13 virus (Harris et al., 2008), bacterial species, yeast, and Caenorhabditis elegans (Bowers et al., 2009), culminating in the world's first sequencing of a human genome using single-molecule sequencing (Pushkarev et al., 2009). The following sections describe the basic methodologies required to prepare genomic templates for single-molecule DNA and cDNA sequencing.

Figure 19.1 (A) The HeliScope® Single-Molecule Sequencer. A powerful genetic analyzer that performs single-molecule sequencing chemistry and captures images of single-molecule incorporation of fluorescently labeled nucleotides, producing accurate sequences of billions of templates at a time. (B) The Helicos® Flow Cell. Specifically designed for the sequencing chemistry used with the Sequencer, two flow cells per run, each with 25 channels, enable a multitude of applications, all benefiting from Helicos proprietary chemistry.
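The cyclic chemistry described in this section can be illustrated with a toy simulation. This is a minimal sketch and not the HeliScope software: the nucleotide addition order (`CTAG`), the function name `simulate_sbs`, and the example templates are assumptions chosen only for illustration. It models one property stated above, namely that a terminator-bearing nucleotide allows at most one incorporation per strand per cycle.

```python
def simulate_sbs(read_sequence, n_cycles=120, order="CTAG"):
    """Toy model of terminator-based sequencing-by-synthesis.

    read_sequence: bases of the growing strand, 5'->3' (what the read will say).
    order: ASSUMED cyclic nucleotide addition order (hypothetical).
    Returns a list of (cycle_number, base) incorporation events.
    """
    events = []
    pos = 0  # next position of the growing strand to be filled
    for cycle in range(1, n_cycles + 1):
        if pos >= len(read_sequence):
            break  # template fully read
        offered = order[(cycle - 1) % len(order)]
        # The terminator blocks further extension, so at most one base
        # is added per cycle, even inside a homopolymer run.
        if read_sequence[pos] == offered:
            events.append((cycle, offered))
            pos += 1
    return events


events = simulate_sbs("CAGT", n_cycles=12)
read = "".join(base for _, base in events)
print(events, read)
```

Because each strand advances by at most one base per cycle, the number of bases read in a fixed number of cycles depends on the template sequence, which is why a 120-cycle run yields reads much shorter than 120 nt.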

3. Preparation of Genomic DNA for Single-Molecule Sequencing

The basic principles involved in the preparation of genomic DNA for subsequent sequencing-by-synthesis involve DNA fragmentation and quantitation, poly-A tailing, and 3′ end blocking to ensure that sequence obtained from the 3′ end of surface-bound oligo-dT is not contaminated with sequence from the 3′ end of the hybridized DNA strands. Figure 19.3 outlines the process described in detail below.

Figure 19.2 Helicos Flow Cell Image and Virtual Terminator® Nucleotide Incorporation. A true image derived from a section of the sequencing flow cell showing a close-up image of single molecules of DNA which have incorporated fluorescent Virtual Terminator® nucleotides. The right insert shows a close-up view of the single molecules and the definition of the nucleotide incorporated at the positions 1, 2, or 3 during the cyclic addition of nucleotides. Insert (nucleotide incorporated at positions 1, 2, 3; "-" indicates no incorporation): Cycle 1: -, -, G; Cycle 2: C, C, -; Cycle 3: A, A, A; Cycle 4: -, -, T; Cycle 5: C, -, -; Cycle X: G, G, -.

3.1. DNA fragmentation and quantitation

When quantities are not limiting, 1–3 µg of genomic DNA is typically used for single-molecule DNA sequencing of whole genomes, although much smaller quantities are also possible (see the ChIP DNA sequencing section below). When the amount of DNA is low, care should be taken to use low-loss tubes and pipette tips. Addition of any type of carrier nucleic acid should be done cautiously as it could become a significant contaminant in sequencing.

3.1.1. DNA shearing

1. Prepare 1–3 µg of genomic DNA in a final volume of 120 µl of 10 mM Tris, 1 mM EDTA (1× TE).
2. Any method of DNA shearing can be used; however, if complete coverage is desired, the method chosen should cleave the DNA randomly and provide a 3′ hydroxyl end for subsequent tailing. In the current protocol, ultrasonic shearing of the DNA is achieved using the Covaris S2 instrument, resulting in fragmentation suitable for sequencing of the entire genomic sample. Covaris has optimized conditions that yield fragments ranging in length, at present, from 100 to 3000 base pairs (bp) so that the researcher can select the desired fragment size (Covaris, Woburn, MA; http://www.covarisinc.com). For typical genomic DNA sequencing using single reads, DNA is fragmented to an average size of 200–300 bp. For paired read sequencing, in which two or more regions of the same DNA fragment are sequenced, fragmentation to 1500 bp is optimal to provide spacer gaps ranging from 100 to 700 nucleotides in length.
3. Conditions vary for the various fragmentation sizes. For example, to shear DNA to 200 bp, the DNA is sheared in Covaris microTubes using 3 cycles of 60 s, 10% duty cycle, intensity 5, and 200 cycles per burst.
4. Transfer the DNA to a clean 1.5-ml microtube. At this point, the DNA sample can be stored at −20 °C.

Figure 19.3 Depiction of DNA sequencing methodology. Illustration of the basic sample preparation steps of genomic DNA for single-molecule sequencing. Steps shown: shearing; denaturation and terminal transferase + dATP (poly-A tailing); terminal transferase + ddATP (3′ end blocking); true single molecule sequencing (tSMS)™ analysis.

3.1.2. DNA size selection

The DNA sample is subsequently cleaned using Agencourt AMPure® beads to remove small nucleic acids, nucleotides, and salts that may be present in the sheared sample.

1. Adjust the DNA volume to 100 µl.
2. Warm AMPure Bead solution to room temperature (RT). Vortex to resuspend.
3. Transfer DNA sample to 1.5-ml tube and add water to bring each sample to 100 µl. Vortex the beads again and add 300 µl AMPure Bead slurry.
4. Incubate at RT for 30 min. Shake the tube every 10 min.
5. Briefly centrifuge at low speed, capture beads on Dynal® magnetic stand for 5 min, and carefully aspirate supernatant.
6. Wash beads twice with 700 µl freshly prepared 70% (v/v) ethanol.
7. Briefly centrifuge, place on magnet, remove ethanol, and dry pellet completely at RT for 5–7 min. Cracks will form when the pellet is dried sufficiently.
8. To elute the sheared DNA from the AMPure beads, add 20 µl of water, pipette the beads and water up and down 20 times, and place the tube back on the Dynal magnet.
9. Collect the 20 µl volume and transfer to a new 1.5-ml tube.
10. Repeat this process again to remove any remaining DNA on the AMPure beads. DNA will now be in the 40 µl volume.

3.1.3. Concentration estimation of 3′ ends for subsequent poly-A tailing

1. To tail the 3′ ends of the genomic DNA effectively, one must determine the approximate concentration of 3′ ends. This requires the average fragment size of the sheared DNA, which is determined by running a 2-µl DNA aliquot on a 4–20% gradient Tris Borate EDTA (TBE) polyacrylamide gel.
2. DNA standards of 1 kb and 25 bp ladders are included for size comparison.
3. To estimate the size of the sheared product, compare the middle of the DNA smear to the size standards. An example gel is shown in Fig. 19.4.
4. Determine the double-stranded DNA concentration using a NanoDrop 1000 or 8000 spectrophotometer. Calculate the pmoles of ends in the sample using the following formula.

$$\frac{\text{pmol 3' termini}}{\mu\text{l}} = X\,\frac{\text{ng DNA}}{\mu\text{l}} \times \frac{10^{3}\ \text{pg}}{\text{ng}} \times \frac{1\ \text{pmol}}{660\ \text{pg}} \times \frac{1}{\text{average fragment size (bp, from gel)}} \times 2\,\frac{\text{3' termini}}{\text{dsDNA molecule}}$$

Figure 19.4 Example of gel sizing for sheared DNA. A 4–20% TBE gradient gel is used to assess the successful fragmentation of genomic DNA for subsequent poly-A tailing. Size standards of 1 kilobase ladder and 25 bp ladder are used to estimate average fragmentation length. Compare the average size in the middle of the sample smear to the size standards. Lanes: 1, 1 kb DNA ladder; 2, sheared sample (middle of smear ~200 bp); 3, 25 bp DNA ladder.
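The 3′ end calculation in step 4 can be checked with a short script. This is a minimal sketch of the formula given in this section; the function name and the example concentration of 25 ng/µl are hypothetical values chosen for illustration, not measurements from the chapter.

```python
def pmol_3prime_ends_per_ul(ng_per_ul, avg_fragment_bp, pg_per_pmol_per_bp=660.0):
    """Concentration of 3' termini (pmol/ul) for sheared double-stranded DNA.

    ng_per_ul: dsDNA concentration from spectrophotometry.
    avg_fragment_bp: average fragment size estimated from the gel.
    pg_per_pmol_per_bp: 660 pg per pmol of base pairs, as in the formula.
    """
    pg_per_ul = ng_per_ul * 1e3                     # ng -> pg
    pmol_bp_per_ul = pg_per_ul / pg_per_pmol_per_bp  # pmol of base pairs
    pmol_molecules_per_ul = pmol_bp_per_ul / avg_fragment_bp
    return pmol_molecules_per_ul * 2                # two 3' ends per duplex


# Hypothetical example: 25 ng/ul of DNA sheared to ~200 bp.
conc = pmol_3prime_ends_per_ul(25.0, 200)
volume_for_3_pmol = 3.0 / conc  # ul needed for a 3.0-pmol tailing reaction
print(f"{conc:.3f} pmol 3' ends/ul; {volume_for_3_pmol:.2f} ul for 3 pmol")
```

The second value gives the sample volume that would supply the 3.0 pmol of ends assumed by the tailing reaction in Section 3.2.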


3.2. Poly-A tailing

The DNA fragments must be modified at their 3′ ends with a poly-A tail to allow for efficient hybridization onto the oligonucleotide-coated Helicos Flow Cell. Conditions are optimized to allow the addition of 90–200 dA residues to the single-stranded DNA molecules.

1. Prepare a sample DNA Tailing Mix assuming a 3.0-pmole sample reaction. For one reaction: 4 µl 10× Terminal Transferase buffer, 4 µl 2.5 mM CoCl2, 2 µl Terminal Transferase Enzyme (20 U/µl), 3.9 µl Helicos-supplied Poly-A Tailing dATP, and 1.1 µl deionized water (dH2O). Please note: For the Tailing Control Tube, adjust the Poly-A Tailing dATP to 1.3 µl and the dH2O to 3.7 µl.
2. Place the 3.0 pmole sheared DNA sample into a 200-µl PCR tube.
3. At the same time, prepare a separate 200-µl PCR tube with tailing control sample, which consists of 0.8 pmoles of your DNA sample and 0.2 pmoles of tailing oligo control supplied by Helicos to monitor efficiency of tailing.
4. Denature the DNA by placing the sheared DNA and tailing oligo control tubes in a PCR thermocycler at 95 °C for 5 min. Snap cool by placing tubes in an aluminum block prechilled on an ice slurry for 2 min to prevent reannealing of the denatured single-stranded DNA.
5. Add 15 µl of Sample Tailing Mix or Control Tailing Mix to each DNA tube. Mix by pipetting up and down 10 times. Collect liquid contents by centrifuging briefly.
6. Place the tubes in the thermocycler using the following conditions: 37 °C for 60 min, 70 °C for 10 min, maintain at 4 °C until ready to proceed to next step.
7. Success of tailing is determined by monitoring the oligo control tailing. Twenty microliters of the oligo control is run on a 4–20% polyacrylamide gel in TBE alongside 100 and 25 bp ladders. An example of successfully sheared and tailed DNA is shown in Fig. 19.4.
8. Control-tailed oligos should migrate anywhere between 250 and 600 bp, indicating the sample is properly poly-A tailed with a desired tail length of between 90 and 200 dA.
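When several samples are tailed in parallel, the per-reaction volumes in step 1 can be scaled into a master mix. The sketch below uses the component volumes listed above; the helper name `master_mix` and the 10% pipetting overage are assumptions for illustration and are not part of the published protocol.

```python
# Per-reaction volumes (ul) taken from step 1 of the tailing protocol.
SAMPLE_MIX_UL = {
    "10x Terminal Transferase buffer": 4.0,
    "2.5 mM CoCl2": 4.0,
    "Terminal Transferase (20 U/ul)": 2.0,
    "Poly-A Tailing dATP": 3.9,
    "dH2O": 1.1,
}
# The control tube differs only in dATP and water volumes.
CONTROL_MIX_UL = {**SAMPLE_MIX_UL, "Poly-A Tailing dATP": 1.3, "dH2O": 3.7}


def master_mix(per_reaction, n_reactions, overage=0.10):
    """Scale per-reaction volumes; `overage` is an ASSUMED pipetting excess."""
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction.items()}


# Both mixes should total the 15 ul added to each tube in step 5.
sample_total = sum(SAMPLE_MIX_UL.values())
control_total = sum(CONTROL_MIX_UL.values())
mix_for_4 = master_mix(SAMPLE_MIX_UL, n_reactions=4)
print(sample_total, control_total, mix_for_4)
```

Both totals come to 15 µl, which matches the volume of tailing mix added per tube in step 5.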

3.3. 3′ end blocking

During flow-cell hybridization, the poly-A tail on the DNA sequencing templates may align imperfectly to the oligo-dT surface on the Helicos Flow Cell. This may result in the generation of a recessed 3′ end that can serve as a substrate for the sequencing-by-synthesis reaction. To prevent the incorporation of fluorescent Virtual Terminator nucleotides at that end of the duplex, the 3′ ends of sheared DNA molecules are modified with a dideoxy terminator, using the following protocol.

1. Following the poly-A tailing, heat denature the DNA at 95 °C for 5 min in the thermocycler. Immediately remove and snap cool for a minimum of 2 min by placing in the ice-cooled aluminum block.
2. Add 0.3 µl of 500 µM Biotin ddATP to each tube.
3. Add 2 µl Terminal Transferase (20 U/µl) to each tube. Mix thoroughly by pipetting up and down 10 times.
4. Collect contents by brief centrifugation.
5. Return to the thermocycler and run the following conditions: 37 °C for 60 min, 70 °C for 10 min, maintain at 4 °C until ready to proceed to next step.

Samples are now ready for hybridization to the Helicos Flow Cell for subsequent sequencing-by-synthesis. DNA concentrations in the range of 150–300 pM are utilized for each Helicos Flow Cell channel, typically in a 20-µl loading volume.
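To give a sense of scale for the loading range quoted above, the following sketch converts a concentration and volume into femtomoles and template molecules per channel. It is plain unit arithmetic; the function name is hypothetical, and the result says nothing about how many of those molecules are captured or yield usable reads.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def load_amount(conc_pM, volume_ul):
    """Return (fmol, molecules) contained in volume_ul at conc_pM."""
    moles = conc_pM * 1e-12 * volume_ul * 1e-6  # pM -> mol/l, ul -> l
    return moles * 1e15, moles * AVOGADRO


for conc in (150, 300):
    fmol, molecules = load_amount(conc, 20)
    print(f"{conc} pM in 20 ul: {fmol:.1f} fmol, {molecules:.2e} molecules")
```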

4. Bacterial Genome Sequencing

Helicos BioSciences has applied the above DNA sample preparation methodology to the sequencing of three bacterial genomes to demonstrate the principles of single-molecule sequencing: the simplicity of the sample preparation, the lack of an amplification requirement, and the corresponding lack of G + C biases (Dohm et al., 2008), as well as the evenness of coverage across a broad range of bacterial genomes, including Escherichia coli K12 MG1655, Staphylococcus aureus USA 3000, and Rhodobacter sphaeroides 2.4.1. The percentage of guanine and cytosine nucleotides (%GC) of the genomes of these organisms represents the entire range of %GC content of kilobase-sized windows found in the human genome (Table 19.1). They have therefore been employed as reference genomes to test the ability of sequencing platforms to sequence the human genome. Achieving accurate and even coverage across these bacterial genomes demonstrates an absence of sequence content bias, which thus provides both sequence information and quantitative accuracy.

4.1. Preparation and sequencing of bacterial DNA

1. Shear and prepare 1 µg of bacterial DNA obtained from each species to 250–300 bp as described in Section 3.
2. Following sample preparation, load 150–300 pM of each bacterial DNA into individual flow-cell channels in a volume of 20 µl and sequence-by-synthesis for 120 nucleotide cycle additions via the HeliScope Sequencer in an 8-day run in which both flow cells are utilized.
3. Align the single-molecule sequence reads obtained at run completion to the corresponding bacterial reference genomes using the Helicos IndexDP Genomic aligner (available at Helicos HeliSphere Technology Center http://open.helicosbio.com/mwiki/index.php/Main_Page).
4. The resulting throughput yields 12–20 million aligned reads per flow-cell channel or, given the two flow cells totaling 50 channels per run, 0.6–1 billion alignable reads per run.
5. A single channel provides upward of 80–120× coverage for these bacterial genomes, depending on the genome size, and represents some 3–4× more coverage than is required for accurate consensus calling.

Table 19.1 Reference bacterial genomes containing diverse genome sequence content

Organism          Genome size (Mb)   GC content (%)
E. coli           4.6                50.8
R. sphaeroides    4.3                68.8
S. aureus         2.8                32.7

Genomic size and G + C content of bacterial genomes used for demonstration of high-throughput sequencing methodologies due to diverse genomic content.
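The throughput and coverage figures in steps 4 and 5 can be approximated with simple arithmetic. This is a rough sketch under stated assumptions: the genome sizes come from Table 19.1, while the 33-nt mean read length is borrowed from the human genome run described in Section 5 and is only an assumption for the bacterial runs.

```python
GENOMES_MB = {"E. coli": 4.6, "R. sphaeroides": 4.3, "S. aureus": 2.8}
ASSUMED_READ_LEN_NT = 33  # assumption: average read length reported for the human run


def fold_coverage(n_reads, read_len_nt, genome_mb):
    """Mean fold coverage = total aligned bases / genome length."""
    return n_reads * read_len_nt / (genome_mb * 1e6)


def reads_per_run(reads_per_channel, channels=50):
    """Two flow cells of 25 channels each give 50 channels per run."""
    return reads_per_channel * channels


for name, size_mb in GENOMES_MB.items():
    low = fold_coverage(12e6, ASSUMED_READ_LEN_NT, size_mb)
    high = fold_coverage(20e6, ASSUMED_READ_LEN_NT, size_mb)
    print(f"{name}: roughly {low:.0f}x to {high:.0f}x per channel")

print(reads_per_run(12e6), reads_per_run(20e6))
```

Under these assumptions the E. coli estimate starts in the mid-80s fold range for one channel, which is broadly in line with the "upward of 80–120×" quoted in step 5.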

Figure 19.5 shows the alignment view of reads and coverage within a selected region of the E. coli genome, which allows one to compare the sequence reads mapped to a 5-kilobase-pair (kbp) window against the background of varying GC content in this same region. Coverage of sequence reads remains evenly distributed. Figure 19.5 also shows the read alignment, demonstrating the accuracy of the sequence information obtained.

4.2. Assessment of coverage and lack of bias

The ability to achieve consistent coverage across these genomes, with special emphasis on regions of highly varying GC content, is a hallmark of single-molecule sequencing. To demonstrate consistent coverage across the genomic regions of the three bacteria, we have plotted in Fig. 19.6 the coverage depth of single-molecule sequence reads binned across the bacterial genomic sequence and similarly plotted their known GC content in the same windows alongside the reference genomes. The HeliScope Sequencer produces even coverage across the entire span of genomic sequence content within a genome, even in the case of very G + C rich (R. sphaeroides) and highly A + T rich (S. aureus) genomes.

Figure 19.5 Detailed view of reads and coverage at an arbitrary position within the E. coli genome. Top panel shows sequence read coverage (red smooth line) across a 5 kb region of the E. coli genome along with the regional GC content (green jagged line). In each case positional statistics are derived from a sampling of the 500 bp upstream and 500 bp downstream regions. Bottom panel shows sequence reads as they aligned to the genome within the region demarcated by the vertical black lines in the top panel. (Axes: genome position, 1,161,000–1,165,000; coverage; GC content, 0.0–1.0.)

5. Human Genome Sequencing and Quantitation

Whole genome sequencing has been successfully achieved by scientists at Stanford University using Helicos single-molecule sequencing methods and the HeliScope Sequencer. Pushkarev et al. (2009) utilized 200 pM of poly-A tailed human genomic DNA per Helicos Flow Cell channel and loaded some 170 channels with the genomic DNA. The researchers obtained 148 gigabases of raw sequence with an average read length of 33 nt to achieve, on average, 28× coverage of a human genome. Some 90% of the human genome was covered using this initial genome sequencing methodology. Sequence variants were identified as described in Pushkarev et al. (2009), which included data on copy number variation found within the human genome sequence.

Figure 19.6 Comparison of genomic sequence coverage across differing G + C content within bacterial genomes. (A) E. coli (51% G + C). (B) S. aureus (32% G + C). (C) R. sphaeroides (70% G + C). Single-molecule DNA sequencing provides minimal sequence bias across diverse genomic content. Sequence reads were mapped to each genome, and the number of reads mapping in discrete bins of the genome is plotted (red line) against the bins expected if the mapping were perfect. Obtaining nearly identical signals demonstrates the unique ability to sequence across diverse GC- and AT-rich regions. The analysis utilized a 200-bp sliding window; the local GC content and observed mean sequencing coverage were tabulated. Windows were then aggregated into GC-content bins ranging from 0 to 1 with a step size of 0.1. Plotted are the mean coverage (red; right y-axis) of the windows within each aggregated GC-content bin and the number of windows in each bin (black; left y-axis). A distinguishing feature of the Helicos SMS approach appears to be the minimal shifts in coverage across the vast majority of sequence contexts. (Axes in each panel: fraction GC, 0.0–1.0; number of windows; mean coverage.)
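The windowed analysis described in the Figure 19.6 legend can be sketched as follows. This is an illustrative reimplementation, not the authors' code: the function name, the non-overlapping window step, and the input format (a reference string plus a per-base depth list) are assumptions. Integer arithmetic is used for binning to avoid floating-point edge effects at bin boundaries.

```python
from collections import defaultdict


def gc_coverage_profile(seq, depth, window=200, step=200, n_bins=10):
    """Aggregate mean coverage by GC-content bin.

    seq: reference sequence; depth: per-base coverage, same length as seq.
    step: ASSUMED window step (the legend only says 'sliding window').
    Returns {bin_index: (number_of_windows, mean_coverage)}, where bin i
    covers GC fractions [i/n_bins, (i+1)/n_bins); GC = 1.0 goes in the last bin.
    """
    if len(seq) != len(depth):
        raise ValueError("seq and depth must have the same length")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window].upper()
        gc_count = win.count("G") + win.count("C")
        bin_index = min(gc_count * n_bins // window, n_bins - 1)
        sums[bin_index] += sum(depth[start:start + window]) / window
        counts[bin_index] += 1
    return {b: (counts[b], sums[b] / counts[b]) for b in sorted(counts)}


# Tiny synthetic example: one AT-only window and one GC-only window.
toy_seq = "AT" * 100 + "GC" * 100
toy_depth = [10] * 200 + [20] * 200
profile = gc_coverage_profile(toy_seq, toy_depth)
print(profile)
```

In an unbiased data set the mean coverage would be expected to stay roughly flat across bins; the synthetic depths here differ on purpose so that the two bins are easy to tell apart.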


5.1. Copy number variation

Copy number variation studies provide a methodology for detecting amplification and deletion of genomic regions across the human genome and often represent critically important knowledge of mutational events occurring in cancer genomes. Given the demonstration of evenness of sequence coverage in the bacterial strains representing the diversity of sequence found in the human genome, the use of single-molecule sequencing with the HeliScope Sequencer for an assessment of copy number variation represents an important, cost-effective method.

1. When available, prepare 1–2 µg of genomic DNA as described in Section 3. Less material may be utilized if sample is limited. This material may be obtained as genomic DNA prepared from tissue, blood, and formalin-fixed paraffin-embedded (FFPE) genomic DNA.
2. In the case of FFPE DNA, visualize the isolated DNA on a 1% agarose gel to determine the size of the genomic DNA. It is possible that, depending on the fixation of the tissue from which the DNA was obtained, the DNA still consists of high-molecular weight DNA and, if above 2–3 kbp, will require additional shearing as described in Section 3.1.1.
3. If the FFPE DNA falls below the size range of 2–3 kbp, proceed directly to Section 3.1.2 (DNA size selection) to ensure removal of small molecular weight DNA that can interfere with DNA sequence yields.
4. Following preparation of poly-dA tailed genomic DNA, load 150–300 pM of genomic DNA on each Helicos Flow Cell channel for the HeliScope Sequencer.
5. Depending on the desired level of resolution required for localization of the regions of amplification and duplication, a decision will be required regarding the depth of sequence coverage desired. At present performance, one channel of the HeliScope Sequencer provides 0.2–0.3× coverage of the human genome. This allows you to group sequence reads by using "bins," which can be between 10- and 50-kilobase-sized segments of the human genome. This resolution allows sufficient coverage for detection of amplification and duplication events, including loss of heterozygosity and two- to threefold amplification across the entire human genome.
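The read-binning idea in step 5 can be sketched in a few lines. This is a simplified illustration, not the analysis used for the figures in this chapter: it assumes aligned read start positions on a single chromosome, and the normalization to the median bin count is an assumed, generic choice for expressing relative copy number.

```python
from collections import Counter
from statistics import median


def bin_counts(positions, bin_size=10_000):
    """Count aligned reads per fixed-size genomic bin (single chromosome)."""
    return Counter(pos // bin_size for pos in positions)


def copy_ratio(counts, n_bins):
    """Read count per bin relative to the median bin (ASSUMED normalization)."""
    values = [counts.get(i, 0) for i in range(n_bins)]
    baseline = median(values)
    if baseline == 0:
        raise ValueError("median bin count is zero; use larger bins or more reads")
    return [v / baseline for v in values]


# Synthetic example: three ordinary 10-kb bins and one with threefold more reads.
toy_positions = [100] * 4 + [10_100] * 4 + [20_100] * 4 + [30_100] * 12
counts = bin_counts(toy_positions)
ratios = copy_ratio(counts, n_bins=4)
print(ratios)
```

Larger bins trade resolution for more reads per bin, which is the decision between coverage depth and localization that step 5 describes.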

Figure 19.7 summarizes the copy number variation data obtained from a human cancer cell line in which approximately 100 million sequence reads were mapped to the genome at a read bin size of 1 kbp intervals. These data are compared to existing comparative genomic hybridization data using an array technology. Peaks of amplification are easily detected, and the peak intensities reflect the extent of amplification. We refer also to the copy number variation data obtained from the first single-molecule human genome sequence (Pushkarev et al., 2009). To further demonstrate the power of single-molecule sequencing technology, data used for the comprehensive view shown in Fig. 19.7 are replotted as individual channels of HeliScope Sequencer data and displayed in a 14-kbp region of the genome with 1-kbp smoothing of the read peaks (Fig. 19.8). These data reflect the consistency as well as the resolution achieved in single channels, allowing one to detect a region of five- to sevenfold amplification in this region.

Figure 19.7 Detection of genomic amplification comparing Helicos single-molecule sequencing to comparative genomic hybridization. Genomic DNA from a breast cancer cell line was isolated, sheared, tailed, and sequenced according to the methods described in Section 5.1. Following sequencing, sequence reads were aligned to the human genome and binned into genomic bins of 10 kb, and the bins are plotted along a ~30-megabase region of chromosome 20 (top panel). Regions of amplification are clearly detected in well-described regions of Chr 20 previously identified using CGH arrays (bottom panel). (CGH data: courtesy of Genome Institute of Singapore.) (Axes: position, ~30 Mb; coverage.)

6. Chromatin Immunoprecipitation Studies

Helicos single-molecule sequencing technology is also ideally suited to another area of genomic science where accurate quantitation is key: ChIP studies (Goren et al., 2010). This method requires no ligation, amplification, or complicated cleanup steps, all of which have the potential to induce sample loss and bias. The Helicos ChIP Seq methodology consists of a 1-h 3′ poly-A tailing step followed by a 1-h 3′ dideoxy-blocking step. Recommended starting material consists of 6–9 ng ChIP DNA (average fragment size 400–500 bp), although as little as 1–3 ng DNA prepared using this same method can be successfully employed. Typical yields obtained with the recommended 6–9 ng ChIP DNA from mouse or human studies allow one to load 3–6 Helicos Flow Cell channels with a yield of 7–12 million aligned sequence reads per channel.

[Figure 19.8 plot: x-axis ticks 0–14,000; y-axis ticks 0.0–2.0.]

Figure 19.8 Visualization of genomic amplification obtained from single HeliScope Sequencer channels. Genomic reads used in Fig. 19.7 are shown aligned to a segment of Chr 20 defining the boundary of amplification. Ten independent HeliScope Sequencer channels demonstrate the ability of a single channel of HeliScope Sequencer reads to clearly define the boundary of genomic amplification.

6.1. Preparation of ChIP DNA

1. The quantity of ChIP DNA should first be determined with the Quant-iT™ PicoGreen dsDNA Reagent Kit (Invitrogen).

2. Samples should be free of RNA contamination, and the use of the Qiagen Reaction Cleanup Kit (Qiagen) is recommended.

3. The micrococcal nuclease treatment used for fragmentation in some selected ChIP methods will generate phosphate groups on the 3′ ends and thus will require end repair prior to initiating the ChIP tailing protocol.

4. One must also consider the alternative types of shearing used for fragmentation to ensure the 3′ ends are amenable to direct poly-A tailing. Recommendations vary with shearing devices, and one must check with the manufacturer on their advice for subsequent end repair.


6.2. ChIP DNA poly-A tailing

1. Prepare a mix of 2 µl 10× Terminal Transferase buffer (NEB), 2 µl 2.5 mM CoCl2, and ChIP DNA and nuclease-free water (10.8 µl combined) in a 200 µl PCR tube.

2. Place the mixture in a thermocycler and heat to 95 °C for 5 min to denature the DNA.

3. Remove the tube from the thermocycler and quickly chill in an aluminum block held in an ice and water slurry to prevent renaturation.

4. Prepare a mix of 1 µl Terminal Transferase (1:4 diluted, 5 U/µl; NEB), 4 µl 50 mM dATP, and 0.2 µl BSA (NEB).

5. Add the 5.2 µl mix to the denatured DNA on ice to bring the total volume to 20 µl.

6. Place the tube in the thermocycler and run the following program: 37 °C for 1 h, 70 °C for 10 min, then maintain at 4 °C until ready to proceed to the next step.

6.3. ChIP DNA 3′ blocking

1. Denature the 20 µl poly-A tailed ChIP DNA at 95 °C for 5 min in the thermocycler, followed by immediate transfer to a prechilled aluminum block kept in an ice and water slurry.

2. Prepare a 10 µl mixture of 1 µl 10× Terminal Transferase buffer (NEB), 1 µl 2.5 mM CoCl2, 1 µl Terminal Transferase (1:4 diluted, 5 U/µl), 0.5 µl 200 mM Biotin-ddATP, and 6.5 µl nuclease-free water.

3. Add the 10 µl mixture to the denatured, poly-adenylated ChIP DNA mixture for a final volume of 30 µl.

4. Place the tube in a thermocycler and run the following program: 37 °C for 1 h, 70 °C for 20 min, followed by 4 °C until ready to proceed to the next step.

5. Add 2 pmol of a 50–80 nucleotide carrier oligonucleotide to the above terminal transferase reaction to minimize ChIP DNA loss during the sample loading steps. Since it does not contain a poly-A tail, the oligonucleotide will not hybridize to the Helicos Flow Cell.

6. Hybridize the ChIP DNA sample to the Helicos Flow Cell and sequence.
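The component volumes listed in Sections 6.2 and 6.3 can be checked against the stated totals with a short script. This is only a bookkeeping sketch: it assumes all volumes are in microliters and that the 10.8 figure in Section 6.2 is the combined volume of ChIP DNA and nuclease-free water. The dictionary and function names are illustrative.

```python
import math

# Volumes in microliters, transcribed from Sections 6.2 and 6.3.
# Assumption: 10.8 is the combined ChIP DNA + nuclease-free water volume.
TAILING_BASE = {"10x TdT buffer": 2.0, "2.5 mM CoCl2": 2.0, "ChIP DNA + water": 10.8}
TAILING_ENZYME_MIX = {"TdT (1:4 diluted)": 1.0, "dATP": 4.0, "BSA": 0.2}
BLOCKING_MIX = {
    "10x TdT buffer": 1.0,
    "2.5 mM CoCl2": 1.0,
    "TdT (1:4 diluted)": 1.0,
    "Biotin-ddATP": 0.5,
    "water": 6.5,
}


def total_volume(*mixes):
    """Sum the component volumes of one or more mixes."""
    return sum(sum(mix.values()) for mix in mixes)


def matches(expected, *mixes):
    """True if the summed volume equals the expected total (float tolerant)."""
    return math.isclose(total_volume(*mixes), expected, abs_tol=1e-9)


if __name__ == "__main__":
    print("enzyme mix is 5.2:", matches(5.2, TAILING_ENZYME_MIX))
    print("tailing reaction is 20:", matches(20.0, TAILING_BASE, TAILING_ENZYME_MIX))
    print("blocking mix is 10:", matches(10.0, BLOCKING_MIX))
    print("blocked reaction is 30:",
          matches(30.0, TAILING_BASE, TAILING_ENZYME_MIX, BLOCKING_MIX))
```

A check of this kind is mainly useful when the protocol is scaled or a component is substituted, since a silent volume mismatch changes the buffer and cofactor concentrations.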

7. Digital Gene Expression for Transcriptome Quantitation

Full transcriptome sequencing using high-throughput sequencing platforms (RNA Seq) has increased the sensitivity and accuracy of gene expression analysis. However, RNA Seq carries an inherent bias because longer transcripts yield more reads, which reduces the sensitivity for quantification of shorter transcripts (Oshlack and Wakefield, 2009). Further, assessing expression levels requires prior knowledge of transcript length for count normalization, which is not always a reasonable demand, for example where alternative splicing variants may be present. Single-molecule sequencing digital gene expression (smsDGE) addresses these difficulties and provides a route to quantitative analyses.

smsDGE differs from RNA Seq in that only a single sequence read is generated per transcript molecule, regardless of its length. This permits short transcripts to be detected with the same sensitivity as long ones. Thus, whereas it would require 50 million RNA Seq reads to quantify 95% of the human transcriptome, with smsDGE 10 million reads will suffice (Lipson et al., 2009).
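The length bias can be illustrated with an idealized calculation. In the sketch below, fragment-based RNA Seq is assumed to yield reads in proportion to molecule count times transcript length, while a one-read-per-molecule method yields reads in proportion to molecule count alone. Both models and all names are simplifying assumptions for illustration, not a description of any particular analysis software.

```python
def read_share_length_weighted(transcripts):
    """Expected read fraction when reads scale with molecules x length.

    transcripts maps a name to (molecule_count, length_nt).
    """
    weights = {name: n * length for name, (n, length) in transcripts.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def read_share_per_molecule(transcripts):
    """Expected read fraction when each molecule yields a single read."""
    total = sum(n for n, _ in transcripts.values())
    return {name: n / total for name, (n, _) in transcripts.items()}


if __name__ == "__main__":
    # Two hypothetical transcripts at equal abundance but different length.
    example = {"short": (100, 500), "long": (100, 4500)}
    print(read_share_length_weighted(example))
    print(read_share_per_molecule(example))
```

With equal abundance, the length-weighted model assigns the 500-nt transcript one tenth of the reads, whereas the per-molecule model assigns it one half, which is why fewer total reads are needed to count short transcripts reliably.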

7.1. Methodology for single-molecule sequencing digital gene expression

Sample preparation for smsDGE is minimal, requiring neither PCR amplification nor ligation. A summary of the method is shown in Fig. 19.9. Single-stranded cDNA is made directly from total RNA or poly-A+ RNA using poly-U primed reverse transcription. The RNA is then digested away using RNase, and a poly-A tail is added to the cDNA’s 3′ end using terminal transferase. The sample can then be hybridized to the HeliScope flow-cell surface and sequenced (Lipson et al., 2009).

7.1.1. Single-stranded cDNA preparation

7.1.1.1. cDNA synthesis

1. Thaw RNA on ice (1–8 µg total RNA or 100–200 ng poly-A+ RNA), preferably in 8 µl volume.

2. For each sample, prepare Master Mix A stock of 1 µl poly-U primer dTU25V (50 mM) and 1 µl dNTP nucleotide mix. Keep on ice. Prepare Master Mix B from the Invitrogen SuperScript III kit as follows: 2 µl 10× Reverse Transcriptase buffer, 4 µl 25 mM MgCl2, 2 µl 0.1 mM DTT, 1 µl RNaseOUT™, and 1 µl SuperScript III Reverse Transcriptase.

3. Aliquot 2 µl Master Mix A into a PCR tube.

4. Pipette 8 µl of RNA sample into the PCR tube. Mix thoroughly by pipetting up and down.

5. Incubate the RNA at 65 °C for 5 min. Snap cool by placing in an aluminum block held in an ice water bath.

6. Add 10 µl Master Mix B (Reverse Transcriptase enzyme and buffer) to each tube. Mix well and spin down.

7. Place PCR tubes in the thermocycler. Incubate at 40 °C for 5 min, 55 °C for 50 min, and 85 °C for 5 min, then maintain at 4 °C until ready to proceed to the next step.
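When several samples are processed together, the per-sample master mix volumes above can be scaled with a small helper. This is a convenience sketch: the volumes are taken as microliters per sample, the component labels are shortened, and the 10% pipetting overage is an assumed default that is not part of the protocol.

```python
# Per-sample volumes in microliters, transcribed from step 2.
MASTER_MIX_A = {"poly-U primer dTU25V": 1.0, "dNTP mix": 1.0}
MASTER_MIX_B = {
    "10x RT buffer": 2.0,
    "25 mM MgCl2": 4.0,
    "DTT": 2.0,
    "RNaseOUT": 1.0,
    "SuperScript III RT": 1.0,
}


def scale_master_mix(per_sample, n_samples, overage=0.10):
    """Scale per-sample volumes to n_samples plus a fractional overage.

    overage=0.10 (10% extra) is an assumed allowance for pipetting loss.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    factor = n_samples * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_sample.items()}


if __name__ == "__main__":
    for name, vol in scale_master_mix(MASTER_MIX_B, n_samples=8).items():
        print(f"{name}: {vol} ul")
```

The per-sample totals (2 µl of Master Mix A and 10 µl of Master Mix B) agree with the volumes dispensed in steps 3 and 6.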


[Figure 19.9 schematic: mRNA with poly-A tail → reverse transcriptase + poly(U) primer → first-strand cDNA → RNA/primer digestion → terminal transferase + dATP → terminal transferase + ddATP → True Single Molecule Sequencing (tSMS)™ analysis.]

Figure 19.9 Overview of method utilized for single-molecule sequencing digital gene expression. Principles employed in the single-molecule sequencing digital gene expression methodology described in Section 7.


7.1.1.2. RNA digestion

1. Add 1 µl RNase H (2 U/µl) to the cDNA synthesis reaction. Mix well and incubate at 37 °C for 15 min, then maintain at 4 °C until ready to proceed to the next step.

2. Add 1 µl USER™ enzyme (1 U/µl). Mix well and incubate at 37 °C for 15 min, then maintain at 4 °C until ready to proceed to the next step.

3. Add 1 µl RNase I (50 U/µl). Mix well and incubate at 37 °C for 15 min, then maintain at 4 °C until ready to proceed to the next step.

7.1.1.3. cDNA sample cleanup

1. Warm AMPure Bead solution to RT. Vortex to resuspend.

2. Transfer cDNA sample to a 1.5-ml tube and add water to bring each sample to 50 µl. Vortex the beads again and add 65 µl AMPure Bead slurry.

3. Incubate at RT for 30 min. Shake tube every 10 min.

4. Briefly centrifuge at low speed, capture beads on Dynal magnetic stand for 5 min, and carefully aspirate supernatant.

5. Wash beads twice with 200 µl freshly prepared 70% (v/v) ethanol.

6. Briefly centrifuge, place on magnet, remove ethanol, and dry pellet completely at RT for 5–7 min.

7. Elute the cDNA sample from the beads with 20 µl distilled water twice.

8. Repeat entire cDNA sample cleanup once more. Final product will be in 40 µl volume.

7.1.1.4. cDNA quantification

1. Determine the concentration and yield for each cDNA sample preparation using a small-volume spectrophotometer. If the sample concentration is likely below 2 ng/µl, use the Quant-iT™ OliGreen® ssDNA Reagent Kit and obtain spectrofluorometer reads accordingly.

2. Store samples at −20 °C to continue sample preparation the next day if desired.

7.1.2. Poly-A tailing of the cDNA

7.1.2.1. Poly-A tailing reaction

1. Obtain control oligonucleotides from the Helicos® Digital Gene Expression Assay Reagent Kit.

2. Place 20–60 ng of cDNA into a PCR tube. Add water to bring each to 28 µl.

3. Add 7.5 µl of Helicos® Control Oligonucleotide. Mix well and store on ice.

4. Incubate at 95 °C for 5 min. Snap cool on ice. Briefly centrifuge.

5. Prepare poly-A Tailing Mix of 5 µl 10× Terminal Transferase buffer, 5 µl CoCl2, 2.5 µl Helicos® poly-A Tailing dATP, and 1.5 µl Terminal Transferase. Mix well.

6. Add 14 µl of poly-A Tailing Mix to the cDNA and pipette up and down.

7. Incubate at 42 °C for 60 min, 70 °C for 10 min, and maintain at 4 °C until ready to proceed to the next step.

7.1.2.2. Determining the success of the tailing reaction

1. Success of the tailing reaction is determined by monitoring the oligo control tailing. Run an aliquot of the control oligonucleotide without poly-A tail addition and the control poly-A tailed oligonucleotide alongside your cDNA tailing reaction on a 4–20% gradient polyacrylamide gel in 1× TBE, together with a 25-bp ladder.

2. Since the cDNA molecules will be of a very broad size range, assess the length of the tail added to the control oligonucleotide as a measure of the tail added to the cDNA molecules.

3. Control-tailed oligos should migrate anywhere between 225 and 450 bp of the 25-bp ladder to ensure a proper poly-A tail with a desired length between 90 and 140 dA.

7.1.3. cDNA blocking

7.1.3.1. cDNA blocking reaction

1. Denature the cDNA sample by incubating at 95 °C for 5 min, then snap cool on ice.

2. Add 0.3 µl biotin-ddATP and 1.5 µl of Terminal Transferase enzyme. Mix well and spin down.

3. Incubate at 37 °C for 30 min, then 70 °C for 10 min, and maintain at 4 °C until ready to proceed to the next step.

7.1.3.2. Poly-A tailing control oligonucleotide digestion

1. Add 1 µl USER Enzyme (1 U/µl) to the cDNA sample. Mix well and spin down.

2. Incubate at 37 °C for 30 min, then maintain at 4 °C until ready to proceed to the next step.

7.1.3.3. Sample cleanup

1. Transfer cDNA from the digestion step above to a 1.5-ml tube. Add water to bring the volume to 60 µl.

2. Mix cDNA with 60 µl AMPure Bead slurry and incubate at RT for 30 min. Shake every 10 min.

3. Capture the beads on Dynal magnetic stand for 5 min and carefully aspirate supernatant.

4. Wash beads twice with 200 µl freshly prepared 70% (v/v) ethanol.

5. Dry pellet completely at RT for 5–7 min.

6. Elute cDNA sample from beads with 20 µl distilled water twice.

Hybridize 150–300 pmol smsDGE cDNA in 20 µl volume to the Helicos Flow Cell and sequence.

7.2. Demonstration of DGE counting reproducibility

To assess smsDGE reproducibility, we independently prepared three samples from the same brain RNA (poly-A RNA; Ambion, Austin, TX) and sequenced each sample in a single HeliScope flow-cell channel. The three channels yielded 15, 14, and 12 million transcriptome-aligned reads. Transcript abundance ranged from 0 to 370,000 transcripts per million (tpm), with the highest seen for mitochondrial transcripts (chromosome M). Of the 28,800 transcripts included in our reference (UCSC genome database), 18,700 were present at a level higher than 1 tpm (>12 mapped reads). Transcript count reproducibility between samples was high (r = 0.99), with coefficient of variation (%CV) ranging from 4% at 100 tpm to 20% at 1 tpm (Fig. 19.10).
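The quantities reported above can be computed along the following lines. This is a hedged sketch rather than the authors' pipeline: the function names are hypothetical, tpm is taken as a transcript's read count divided by the channel's total aligned reads and multiplied by one million, and %CV is computed with the sample standard deviation, which the text does not specify.

```python
from statistics import mean, stdev


def to_tpm(counts):
    """Convert per-transcript read counts to transcripts per million (tpm)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no aligned reads")
    return {name: c * 1_000_000 / total for name, c in counts.items()}


def reads_at_tpm(tpm, total_reads):
    """Number of mapped reads corresponding to a given tpm in one channel."""
    return tpm * total_reads / 1_000_000


def percent_cv(values):
    """Coefficient of variation in percent across replicate measurements.

    Uses the sample standard deviation (n - 1 denominator); this choice
    is an assumption of the sketch.
    """
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CV is undefined")
    return 100.0 * stdev(values) / m


if __name__ == "__main__":
    # 1 tpm in a 12-million-read channel corresponds to 12 mapped reads.
    print(reads_at_tpm(1, 12_000_000))
    # Toy replicate tpm values for one transcript across three channels.
    print(percent_cv([90, 100, 110]))
```

Because counting noise grows relative to the mean as counts fall, a higher %CV at 1 tpm than at 100 tpm is the expected pattern for count data.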

8. Summary

Methods for single-molecule sequence analysis of nucleic acids provide a diverse repertoire for quantitative and qualitative investigation of the genome and transcriptome. As such, we have attempted to describe many of the simple sample preparation methods offered to the research community. We will continue to optimize our sample preparation protocols to allow preparation and sequencing from picogram quantities of nucleic acid (Ozsolak et al., 2010), which is important for maximizing researchers’ ability to perform biological experiments with limiting amounts of biological sample. These methods will serve as the starting point for the next edition of methods for single-molecule sequencing.

ACKNOWLEDGMENTS

Special thanks to the many individuals who have contributed to the success of Helicos BioSciences technology, for their scientific excellence and their passion to develop a remarkable new technology and all its broad applications.


[Figure 19.10 plots: (A) x-axis, Brain sample 1 (tpm, single channel, 15 million reads); y-axis, Brain sample 2 (tpm, single channel, 14.5 million reads); both axes logarithmic, 1–1,000,000. (B) x-axis, Average transcript abundance (tpm, 12–15M reads/channel), logarithmic, 0.1–10,000; y-axis, Coefficient of variation, 0–90.]

Figure 19.10 Demonstration of transcript counting reproducibility obtained using smsDGE methods with human brain RNA. smsDGE transcript quantification of independently prepared human brain samples. (A) Transcript count comparison between two samples run on a single flow-cell channel each. Each point represents a single transcript (r = 0.99). (B) Coefficient of variation (%CV) across transcript abundance levels between three samples at 12, 14, and 15 million transcriptome-aligned reads per channel.


REFERENCES

Bowers, J., Mitchell, M., Beer, E., Buzby, P. R., Causey, M., Efcavitch, J. W., Jarosz, M., Krzymanska-Olejnik, E., Kung, L., Lipson, D., Lowman, G. M., Marappan, S., et al. (2009). Virtual terminator nucleotides for next generation DNA sequencing. Nat. Methods 6, 593–595.

Dohm, J. C., Lottaz, C., Borodina, T., and Himmelbauer, H. (2008). Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res. 36(16), e105.

Goren, A., Ozsolak, F., Shoresh, N., Ku, M., Adli, M., Hart, C., Gymrek, M., Zuk, O., Regev, A., Milos, P. M., and Bernstein, B. E. (2010). Chromatin profiling by directly sequencing small quantities of immunoprecipitated DNA. Nat. Methods 7(1), 47–49.

Harris, T. D., Buzby, P. R., Babcock, H., Beer, E., Bowers, J., Braslavsky, I., Causey, M., Colonell, J., Dimeo, J., Efcavitch, J. W., Giladi, E., Gill, J., et al. (2008). Single-molecule DNA sequencing of a viral genome. Science 320(5872), 106–109.

Hayden, E. (2009). Genome sequencing: The third generation. Nature 457, 768–769.

Kahvejian, A., Quackenbush, J., and Thompson, J. F. (2008). What would you do if you could sequence everything? Nat. Biotechnol. 26, 1125–1133.

Lipson, D., Raz, T., Kieu, A., Jones, D. R., Giladi, E., Thayer, E., Thompson, J. F., Letovsky, S., Milos, P., and Causey, M. (2009). Quantification of the yeast transcriptome by single-molecule sequencing. Nat. Biotechnol. 27, 652–658.

Oshlack, A., and Wakefield, M. J. (2009). Transcript length bias in RNA-seq data confounds systems biology. Biol. Direct 4, 14.

Ozsolak, F., Platt, A., Jones, D., Reifenberger, J., Sass, L. E., McInerney, P., Thompson, J. F., Bowers, J., Jarosz, M., and Milos, P. (2009). Direct RNA sequencing. Nature 461, 814–818.

Ozsolak, F., Goren, A., Gymrek, M. A., Guttman, M., Regev, A., Bernstein, B. E., and Milos, P. M. (2010). Digital transcriptome profiling from attomole-level RNA samples. Genome Res. [Epub ahead of print].

Pushkarev, D., Neff, N. F., and Quake, S. R. (2009). Single-molecule sequencing of an individual human genome. Nat. Biotechnol. 27, 847–850.