
DNA SEQUENCING FORGES AHEAD


16WWW.CEN-ONLINE.ORG DECEMBER 14, 2009

THE FIRST human genome sequence cost more than $2 billion and took about a decade to complete. Achieving the milestone of the billionth base sequenced in the 3 billion-base-pair human genome was a cause for celebration.

Since the end of the Human Genome Project in 2003, cost and time barriers in sequencing methods have been shattered multiple times, and they keep getting lower. Technology has now advanced to the point that sequencing a genome can take less than a month and, with some platforms, less than a week. Sequencing a billion bases is now the work of a single day. And the cost? Depending on the platform and whom you ask, reagents to sequence a billion bases now run as little as $5,000, although instrumentation, facility, and personnel costs must be added to that.

The technological improvements and plummeting costs of sequencing are paving the way for whole-genome sequencing for research purposes and, ultimately, affordable sequencing of personal genomes. Scientists interested in genome sequencing can now choose from a number of sequencing technologies that ease sample preparation and allow longer stretches of DNA to be read at a time. And there are more technologies to come.

The Human Genome Project obtained the first human sequence with the Sanger method of DNA sequencing. This laborious method involves copying a DNA template in the presence of dye-labeled modified nucleotides that terminate DNA-strand elongation when they are incorporated. Because these modified nucleotides can be incorporated at any point in the strands, the sequencing reaction results in a mixture of DNA strands of different lengths, each with its final base labeled with a fluorescent dye. These strands are separated by size with capillary electrophoresis. Lining the strands up in order creates a ladder that allows the sequence to be read easily.

Although the Sanger method remains the gold standard for DNA sequencing in terms of accuracy, it is being supplanted by more efficient methods.
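The ladder-reading idea behind the Sanger method can be illustrated with a minimal Python sketch. It is a toy model, not a description of any real instrument: the function names, the termination probability, and the shortcut of treating the example string as the strand being synthesized (rather than its complement) are all assumptions made for illustration.

```python
import random

def sanger_fragments(strand, copies=2000, stop_prob=0.05, seed=0):
    """Simulate chain-terminated copies as (length, labeled_final_base) pairs.

    Toy assumption: `strand` is the sequence being synthesized, and every
    position has the same chance of receiving a terminating nucleotide.
    """
    rng = random.Random(seed)
    fragments = set()
    for _ in range(copies):
        for length, base in enumerate(strand, start=1):
            # A dye-labeled terminator may be incorporated at any position;
            # it ends elongation and marks the fragment's final base.
            if rng.random() < stop_prob:
                fragments.add((length, base))
                break
    return fragments

def read_ladder(fragments):
    """Sort fragments by size (the electrophoresis step) and read the dye labels."""
    return "".join(base for _, base in sorted(fragments))

strand = "GATTACAGGCTA"
print(read_ladder(sanger_fragments(strand)))
```

With enough copies, every fragment length is represented at least once, so reading the labels from shortest to longest fragment recovers the sequence.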

Many of the so-called second-generation sequencing methods use an approach in which DNA is sequenced by replicating a template and noting the identity of each base after its incorporation into the growing strand. Most commercially available technologies can accurately “read” a relatively small number of bases, generally up to 35 to 100, at one time.

Just like Sanger sequencing, most current methods require extensive sample preparation that involves making and amplifying a library of fragments from genomic DNA. But unlike Sanger sequencing, no lengthy separation step is required, and the methods are massively parallel. A genome sequence is assembled by aligning millions of these fragments against the reference sequence from the Human Genome Project. Because of the short read lengths and because of potential errors in base identification, each base must be identified many times. The higher this “fold coverage” (often about 30×), the higher the confidence in putative deviations, such as single-nucleotide polymorphisms, from the reference sequence.
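To make the link between fold coverage and confidence concrete, here is a small Python sketch that piles already-aligned reads onto a reference and reports positions where a well-covered majority base differs from it. The function names, thresholds, and toy reads are hypothetical, and the sketch assumes gap-free alignments with known start positions; production variant callers are far more elaborate.

```python
from collections import Counter

def pileup(reference, aligned_reads):
    """Count the bases observed at each reference position.

    aligned_reads: list of (start, sequence) pairs, 0-based and gap-free
    (a simplifying assumption; the alignment itself is not shown).
    """
    columns = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            columns[start + offset][base] += 1
    return columns

def call_variants(reference, columns, min_depth=3, min_fraction=0.75):
    """Report (position, reference_base, observed_base, depth) for confident mismatches."""
    variants = []
    for pos, counts in enumerate(columns):
        depth = sum(counts.values())  # fold coverage at this position
        if depth < min_depth:
            continue  # too few reads to trust a deviation
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            variants.append((pos, reference[pos], base, depth))
    return variants

reference = "ACGTACGTAC"
reads = [(0, "ACGTTC"), (2, "GTTCGT"), (3, "TTCGTA"), (4, "TCGTAC"), (1, "CGTACG")]
print(call_variants(reference, pileup(reference, reads)))
```

In this toy data set, position 4 is covered five times, and four of the five reads disagree with the reference, so it is reported; positions seen only once or twice are skipped regardless of what they show.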

Generally, bases are added one at a time and allowed to react. Excess reagent is rinsed out, and the device is then imaged to see where a base was incorporated. When a base is incorporated into a growing DNA strand, it gives off a detectable signal. The nature of that signal is the primary difference between versions of the method.

The first of these methods for whole-genome sequencing was commercialized by 454 Life Sciences, in Branford, Conn. In that company’s version, called pyrosequencing, the nucleotides release a flash of chemiluminescent light when they are incorporated into the growing strand. 454’s technology achieves longer read lengths than other available methods, currently about 500 bases, according to Michael Egholm, vice president of research and development. “We’re working on doubling the read length,” he says.

Another pioneer of second-generation DNA sequencing is San Diego-based Illumina. In Illumina’s technology, synthesis occurs on an array of randomly addressable clusters, each one with a different portion of the DNA sequence attached to it. All four bases can be added simultaneously because each base has a different fluorescent label. This reduces the number of cycles needed for a given read length and speeds up the run time.
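A rough way to see why adding all four labeled bases at once cuts the cycle count is to compare it with flowing one nucleotide type at a time. The Python sketch below is a simplified model under stated assumptions: the flow order "TACG" is hypothetical, a single-nucleotide flow is assumed to extend the strand through any run of identical bases, and the four-color scheme is assumed to read exactly one base per cycle.

```python
def flows_needed(sequence, flow_order="TACG"):
    """Count single-nucleotide flows needed to synthesize `sequence`.

    Assumption: each flow offers one nucleotide type and extends the strand
    through every consecutive matching base.
    """
    if set(sequence) - set(flow_order):
        raise ValueError("sequence contains a base missing from flow_order")
    flows = 0
    i = 0
    while i < len(sequence):
        base = flow_order[flows % len(flow_order)]
        flows += 1
        while i < len(sequence) and sequence[i] == base:
            i += 1  # incorporate the matching run
    return flows

def four_color_cycles(sequence):
    """With four distinguishable labels added together, one cycle reads one base."""
    return len(sequence)

seq = "GATTACA"
print(flows_needed(seq), four_color_cycles(seq))  # 14 flows versus 7 cycles
```

The exact ratio depends on the sequence and the flow order; the point is only that a one-type-at-a-time scheme spends flows on nucleotides that do not match the next template position.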

Illumina has increased its read lengths from about 35 bases to 100 bases while reducing the time required for each cycle. “The time the instrument runs is the product of the number of cycles and the time for each cycle,” says Jay T. Flatley, president and chief executive officer at Illumina. “As our read lengths have gone up, we’ve reduced the cycle time enough that the overall run time hasn’t gone up very much.”

BRIGHT LIGHTS In Helicos Biosciences’ method, individual DNA strands extracted from cells are sequenced by iterative cycles of base addition. (Credit: Helicos Biosciences)

COVER STORY

Next-generation technologies LOWER COST of genome sequencing

CELIA HENRY ARNAUD, C&EN WASHINGTON
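Flatley’s relationship, run time equals number of cycles times time per cycle, is easy to check with numbers. The cycle counts below come from the read lengths quoted in the article (one cycle per base is assumed), but the per-cycle times are purely hypothetical values chosen for illustration, not Illumina specifications.

```python
def run_time_hours(cycles, minutes_per_cycle):
    """Run time is the product of the cycle count and the time per cycle."""
    return cycles * minutes_per_cycle / 60

# Hypothetical cycle times: 60 minutes for the older chemistry, 24 for the newer.
older = run_time_hours(cycles=35, minutes_per_cycle=60)
newer = run_time_hours(cycles=100, minutes_per_cycle=24)
print(older, newer)  # 35.0 40.0
```

Under these made-up figures, nearly tripling the read length lengthens the run by only about 14%, which is the kind of trade-off the quote describes.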

Helicos Biosciences, in Cambridge, Mass., also uses sequencing by synthesis, but the firm sequences single molecules rather than making many copies.

“We’re the only company that actually sequences the molecules of DNA that come out of your cell” instead of cloned and amplified DNA, says Patrice Milos, chief scientific officer at Helicos. “Because we’re sequencing the DNA molecules that come from your cells, there are no libraries created, there’s no ligation, there’s no sample amplification.” The sample preparation is therefore easier, and potential sources of bias are eliminated.

DNA EXTRACTED from the cells is broken into pieces, and the end of each fragment is tagged with a “polyA tail,” a string of adenosine nucleotides. The polyA tail allows the DNA to be attached to the surface of a flow cell. The technology now requires the amount of DNA in 100 to 200 cells to sequence an entire genome, but the company aspires to single-cell analysis, Milos says.

Pacific Biosciences’ technology puts a different twist on single-molecule sequencing by synthesis. Most of these methods treat the polymerase enzyme as just another reagent to be added and rinsed away with each cycle of base addition.

“Our view is that this enzyme is really a sequencing instrument in and of itself, and what a horrible shame to throw it away after every base you sequence,” says Stephen Turner, chief technology officer at PacBio, in Menlo Park, Calif. “If we free it up to do what nature has programmed it over billions of years of evolution to do, we can get the extraordinary features that it has of extreme frugality and high speed.”

A crucial component of PacBio’s technology is that fluorescent dyes are attached to the terminal phosphate of each nucleotide instead of the base, Turner says. Attaching the dye to the phosphate means that it will be spontaneously removed as part of the nucleotide incorporation, eliminating the need for a separate step to cleave off the dye.

PacBio’s sequencing chip is made of an array of zero-mode waveguides. Each waveguide is a hole tens of nanometers in diameter fabricated in a metal film on a silicon dioxide substrate. The 20-zeptoliter detection volume is small enough to detect the fluorescence from each incorporated nucleotide against a background of all the other labeled nucleotides floating in solution. When the polymerase cleaves off the label, the dye diffuses out of the detection volume.
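A quick calculation shows why a 20-zeptoliter volume keeps the background low: the average number of free molecules inside it is simply concentration times volume times Avogadro’s number. The sketch below assumes a 1 µM concentration of labeled nucleotides, which is a hypothetical figure chosen for illustration and not one given by PacBio.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
ZEPTOLITER = 1e-21        # liters

def molecules_in_volume(concentration_molar, volume_liters):
    """Average number of molecules present in a volume at a given molarity."""
    return concentration_molar * volume_liters * AVOGADRO

# Assumed (hypothetical) concentration: 1 micromolar labeled nucleotide.
n = molecules_in_volume(1e-6, 20 * ZEPTOLITER)
print(round(n, 4))  # about 0.012 molecules on average
```

At that assumed concentration, a free labeled nucleotide would occupy the detection volume only about 1% of the time, so a nucleotide held in place by the polymerase during incorporation stands out against the background.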

SMALL SCIENCE A DNA sample is pipetted onto a silicon wafer chip containing BioNanomatrix’ nanodevices.


The firm’s technology enables long read lengths of thousands of bases. The read length is limited by the dye photochemistry, which can lead to damage that shuts down the enzyme, and by the processivity of the polymerase, or how long it remains associated with the template strand. The enzyme can incorporate about 100,000 bases before it dissociates from the template, although PacBio plans to start with much shorter read lengths of up to a few thousand bases. PacBio is set to release its instrument in the second half of 2010, Turner says.

Other companies, such as Applied Biosystems (a division of Life Technologies), Dover-Polonator (a division of Danaher), and Complete Genomics, sequence DNA using variants of a method called sequencing by ligation. In Complete Genomics’ method, fluorescently labeled probe molecules are used to read seven to 10 bases adjacent to an adapter site inserted into a DNA sample. An anchor hybridizes to the adapter site, and the probe molecules compete for ligation to that anchor. A probe succeeds only if it is complementary to the sequence adjacent to the adapter site. A different pool of probes is used to read each base position. The base in that particular location is identified by its fluorescent label. Repeated cycles of ligation build the sequence.

One of the strengths of this sequencing-by-ligation method is that the identification of each base is independent of the identification of its neighbors. “In sequencing by synthesis, if one base is not incorporated, you cannot incorporate the next base. Our approach doesn’t depend on that,” says Radoje Drmanac, chief scientific officer at Complete Genomics, in Mountain View, Calif. “We sequence each base independent of the others. Once we read a base, we reset the DNA back to the original state and start from scratch. Every cycle starts from the same clean conditions.

“This unchained DNA sequencing technique helps improve the quality of the base reads and lower the reagent costs,” Drmanac says. “Each base is equally as good as the first, because they’re independently read,” he says. “We use very low concentrations and small volumes of the labeled reagents, which are the expensive reagents used in every current sequencing technology.”
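Drmanac’s contrast between chained and unchained reads can be put in numbers with a deliberately simple model. Assume, hypothetically, that each chemistry step succeeds on a fraction p of the template copies. In a chained method a copy must succeed at every earlier step to stay in phase, so the usable fraction decays as p to the power of the cycle number; in an unchained method every read starts from a reset template, so the fraction stays at p. The value of p below is invented for illustration.

```python
def chained_in_phase_fraction(step_efficiency, cycle):
    """Fraction of copies still in phase after `cycle` dependent steps."""
    return step_efficiency ** cycle

def unchained_in_phase_fraction(step_efficiency, cycle):
    """Each read starts from a reset template, so the cycle number does not matter."""
    return step_efficiency

p = 0.99  # hypothetical per-step efficiency
for cycle in (1, 10, 35, 70):
    print(cycle,
          round(chained_in_phase_fraction(p, cycle), 3),
          unchained_in_phase_fraction(p, cycle))
```

Even with 99% efficiency per step, this model leaves only about half of the copies in phase after 70 chained cycles, while the unchained figure does not change with position.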

Applied Biosystems uses chained sequencing by probes that identify two bases at once, and each base is identified by two different probes. This di-base sequencing approach increases the accuracy of the company’s DNA sequences to 99.5%, according to Shaf Yousaf, president of the MCB Genomic Analysis Division at Life Technologies, in Foster City, Calif.
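One way to picture how interrogating each base twice helps accuracy is a two-base “color” code, in which every adjacent pair of bases maps to one of four signals. The sketch below uses an XOR-based mapping as an illustrative assumption; the article does not describe the company’s actual encoding, and the function names are hypothetical.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = {code: base for base, code in BASE_CODE.items()}

def encode_dibase(sequence):
    """Return the first base plus one color (0-3) per adjacent pair of bases."""
    colors = [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(sequence, sequence[1:])]
    return sequence[0], colors

def decode_dibase(first_base, colors):
    """Rebuild the sequence from the known first base and the color calls."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)

first, colors = encode_dibase("ATGGC")
print(colors)                        # [3, 1, 0, 3]
print(decode_dibase(first, colors))  # ATGGC
print(encode_dibase("ATCGC")[1])     # [3, 2, 3, 3]
```

Because every interior base contributes to two colors, a genuine single-base change (here G to C at the third position) alters two adjacent colors, whereas an isolated measurement error alters only one. That difference is what lets such a scheme flag many miscalls.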

Other sequencing technologies on the horizon require little or no sample preparation, and their developers hope they also achieve longer read lengths.

Companies such as Cambridge, Mass.-based ZS Genetics and San Francisco-based Halcyon Molecular are using electron microscopy as a way to sequence DNA. The companies replicate DNA by using nucleotides modified with heavy-atom labels that can be seen with electron microscopy. “The labels have some real constraints,” says William R. Glover III, CEO of ZS Genetics. “They have to be heavy enough to be detectable, and they have to be light enough not to cause problems with the enzyme. Within that range, they have to be different enough that they can be distinct.”

In these techniques, researchers label and then stretch double-stranded DNA on a surface and take a picture of it with an electron microscope. The position of labels along the DNA molecule reveals the base sequence. “We have molecules that are 10,000 and 20,000 base-pairs long,” Glover says. “We think our read lengths will allow for a coverage ratio of 5 to 8×, which we think will be a huge advantage in reassembly and informatics costs.”
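The informatics advantage Glover mentions follows from simple arithmetic: the number of reads to assemble is roughly genome size times coverage divided by read length. The sketch below uses figures quoted in the article (a 3 billion-base-pair genome, 30× coverage with 100-base reads, 8× coverage with 20,000-base reads) and assumes uniform read lengths, which is a simplification.

```python
GENOME_SIZE = 3_000_000_000  # base pairs, as quoted for the human genome

def reads_needed(coverage, read_length, genome_size=GENOME_SIZE):
    """Reads required to reach a given fold coverage (rounded up)."""
    total_bases = genome_size * coverage
    return -(-total_bases // read_length)  # ceiling division on integers

short_reads = reads_needed(coverage=30, read_length=100)
long_reads = reads_needed(coverage=8, read_length=20_000)
print(short_reads, long_reads)  # 900000000 1200000
```

Under these assumptions, the long-read approach would need 750 times fewer fragments to place, which is the sense in which reassembly becomes cheaper.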

Another technology with potentially long read lengths uses nanopores made from the protein α-hemolysin as the sequencing device. Developed in the lab of Hagan Bayley at the University of Oxford, the nanopores are being commercially developed by Oxford Nanopore Technologies and will be marketed by Illumina.

DNA sequence can be read by the nanopore in one of two ways. In both methods, the bases are discriminated by the change they induce in the amplitude of the current carried by aqueous ions passing through the pore. In one method, an intact DNA strand is threaded through the pore and the bases are identified as they pass a reading head. The challenge with this method is slowing the DNA enough to differentiate the bases as the molecule passes through the pore.

A NEW ‘MOORE’S LAW’ Improvements in DNA sequencing are driving down the cost of whole genomes. [Chart: base pairs sequenced per dollar (logarithmic scale, 0.01 to 10,000,000) versus year (1980 to 2010). Labeled points: 1995, gel electrophoresis, $3 billion; 2005, capillary electrophoresis, $50 million; 2009, polymerization, $50,000, and ligation, $1,000–$5,000; 2011, $100.] NOTE: Dollar figures refer to reagent costs. SOURCE: George Church, Harvard University

“Over the past year, we’ve made tremendous progress in identifying bases in intact strands of DNA,” Bayley says. Until the problem with DNA speed is solved, Bayley won’t know how long a read length is possible. “There doesn’t seem to be any real reason why you couldn’t feed through tens or even hundreds of kilobases,” he says.

TIES THAT BIND In Complete Genomics’ sequencing-by-ligation method, a fluorescently labeled probe binds to the target genomic DNA. DNA ligase connects the probe to the anchor DNA sequence bound to a synthetic adapter sequence.

Oxford Nanopore is focusing on the second method, in which an enzyme attached to the pore cleaves bases from the DNA strand so they traverse the pore one at a time. The bases bind to a cyclodextrin molecule attached to the inside of the pore. “The enzyme is going to work at a reasonably slow rate,” thus controlling the problem of excessive DNA speed, says Bayley, who founded the company and serves on its board.
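In both nanopore approaches, a base is identified by how much it changes the ionic current through the pore. A minimal way to sketch that discrimination step in Python is nearest-level classification: each measured amplitude is assigned to the base whose characteristic level is closest. The current levels below are invented for illustration only; real values depend on the pore, the adapter, and the experimental conditions.

```python
# Hypothetical characteristic currents in picoamperes (not measured values).
HYPOTHETICAL_LEVELS_PA = {"A": 30.0, "C": 34.0, "G": 26.0, "T": 38.0}

def call_base(current_pa, levels=HYPOTHETICAL_LEVELS_PA):
    """Assign a measured current to the base with the nearest characteristic level."""
    return min(levels, key=lambda base: abs(levels[base] - current_pa))

def call_sequence(currents_pa):
    """Call one base per current measurement, in the order the bases arrived."""
    return "".join(call_base(current) for current in currents_pa)

print(call_sequence([25.5, 30.4, 37.2, 38.9, 29.1, 33.6, 30.8]))  # GATTACA
```

The sketch also hints at why translocation speed matters: the classification needs one clean measurement per base, which is only possible if each base stays in the pore long enough to be measured.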

THE PHILADELPHIA-BASED company BioNanomatrix is also working with long DNA strands rather than short fragments. “We are reducing the complexity of the genome analysis problem from one of a 10,000-piece puzzle in assembly and analysis to something more akin to a toddler puzzle of a few pieces,” says Michael Boyce-Jacino, company CEO.

BioNanomatrix’ technology provides access to a source of genetic diversity that is not easily accessed with short-read technologies. These are structural changes in which whole chunks of DNA sequence appear in different places on chromosomes. People are “99.9% identical in DNA sequence, but we’re as little as 85% identical in the organization of our genomes,” Boyce-Jacino says. “We have the same base layout, but the organization is indeed personal.”

Boyce-Jacino compares BioNanomatrix’ technology with Google Maps, in which a user can zoom to different levels of detail. “Currently, DNA sequencing is like going house to house on the map. The problem is that you don’t know if you’re on Elm Street in Ohio or Elm Street in New Jersey. In the genome, because ‘Elm Street’ might be duplicated, you need to be able to zoom out a little bit and understand where that sequence goes to make sure you know where you are in the genome.”

The company’s technology uses a nanofabricated device to separate double-stranded DNA containing 100,000 to 200,000 base pairs into individual lanes. Unique blocks of DNA that are seven bases long are labeled. The location of those blocks within the genome creates a bar code for individual genomes. BioNanomatrix is collaborating with Complete Genomics to use the latter company’s probe ligation approach to sequence short stretches of DNA within the longer stretches. The goal of that long-term project is to reduce the reagent cost of human DNA sequencing to $100 or less per genome.

All of these improvements are accelerating DNA sequencing and lowering its cost. The $1,000 genome is within sight, and the time has come to think about what happens when that barrier has been smashed.

“When we started talking about the $1,000 genome, I don’t think we realized how fast we were going to get there,” says Richard P. Lifton, a genetics professor at Yale University. “We now need to think about when we’re going to get to the $10 or $100 genome. We clearly are headed on a trajectory to be able to do that. We need to think about how we want to deploy the technology.” ■
