
ii VOLUME 24 NUMBER 9 SEPTEMBER 2006 NATURE BIOTECHNOLOGY

A note on nomenclature

Laura H Reid & Janet A Warrington

The following glossary defines key terms and concepts that are used throughout the Microarray Quality Control (MAQC) consortium manuscripts. Wherever possible, the definitions are based on the Clinical and Laboratory Standards Institute harmonized terminology database (http://www.clsi.org).

Detection call. A qualitative value that suggests a level of confidence in the signal calculated for that probe. In the MAQC study, the detection calls were binary and reduced to either ‘0’ for ‘not detected’ or ‘1’ for ‘detected’. For some platforms, the detection call reflects the quality of the nucleic acid spot on the microarray, similar to ‘Flag/No Flag’ scores. On other platforms, the detection call reflects the abundance of the target transcript or the concordance of results between multiple probes in a probe set, similar to ‘Absent/Present’ calls. Although the final detection call is qualitative, it is usually based on quantitative assessments and complex statistics.

External RNA control. An RNA species added to a biological sample during processing for the purpose of assessing technical performance of a gene expression assay. Different external RNA controls may be used to monitor different processes. In microarray research, external RNA controls are added either to a total RNA sample (to assess the enzymatic processes involved and the hybridization step) or to the labeled cRNA (to assess hybridization efficiencies only).

Gene. An expanded definition of this term was adopted by the MAQC consortium to denote both a DNA segment and the collection of RNA transcripts derived from it. In the DNA usage, a gene is a locatable region of genomic sequence, corresponding to

a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions. In the RNA usage, a gene often refers to the targets measured in a gene expression assay.

Probe. A discrete piece of nucleic acid used to identify specific DNA or RNA molecules bearing the complementary sequence. Some microarray platforms rely on a single oligonucleotide probe to assay an RNA target; others combine data from multiple probes, arranged in a probe set, when calculating expression values for a target. Bead-based assays attach oligonucleotide probes to a microscopic bead surface. PCR-based assays use a pair of oligonucleotide primers (also referred to here as probes) to identify and amplify their intended RNA target, and in some cases, an oligonucleotide detection probe is hybridized to the amplified target.

Repeatability. The ability to provide closely similar results from replicate samples processed in parallel at the same test site using the same gene expression assay.

Reproducibility. The ability to provide closely similar results from replicate samples processed with different microarray platforms or at different test sites using the same gene expression assay.

Signal. The quantitative expression value for each probe derived from a hybridization image after preprocessing steps, such as background subtraction and summarizing of data from multiple probes, as well as normalization procedures that remove systematic artifacts. Signals are not the raw fluorescence or chemiluminescence intensities captured in a pixelated microarray image.
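The chain described in this definition — background subtraction, summarization across a probe set and normalization — can be sketched in a few lines of Python. This is an illustrative toy pipeline only, not the algorithm of any MAQC platform: the function name, the median summarizer and the target mean of 8 log2 units are arbitrary choices made for the example.

```python
import numpy as np

def signal_from_intensities(raw, background, probe_sets):
    """Illustrative signal computation: background subtraction,
    probe-set summarization and a simple global normalization.
    `raw` and `background` are 1-D arrays of per-probe intensities;
    `probe_sets` maps a probe-set ID to a list of probe indices.
    (Hypothetical sketch; real platforms use far more elaborate methods.)"""
    # Subtract background, flooring at 1 so the later log2 is defined
    corrected = np.maximum(raw - background, 1.0)
    # Summarize the multiple probes in each probe set with the median
    summarized = {ps: float(np.median(corrected[idx]))
                  for ps, idx in probe_sets.items()}
    # Normalize: shift log2 signals so their mean hits a fixed target (8)
    logs = np.log2(np.array(list(summarized.values())))
    shift = 8.0 - logs.mean()
    return {ps: float(v + shift) for ps, v in zip(summarized, logs)}
```

For example, two probe sets built from five probes would yield one normalized log2 signal per probe set, with the higher-intensity set keeping the higher signal after normalization.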

Target. Nucleic acid whose identity and/or abundance is revealed during the assay. The gene expression assays in the MAQC study have RNA targets. Multiple RNA targets can be transcribed from a single gene and individual transcripts can be alternatively spliced into multiple targets with different functions and expression patterns. Thus, a gene expression assay designed for one target may actually detect multiple RNA transcripts.

Laura H. Reid, Expression Analysis, Inc., 2605 Meridian Parkway, Durham, North Carolina 27713, USA. Janet A. Warrington, Affymetrix, Inc., 3420 Central Expressway, Santa Clara, CA 95051, USA.

GLOSSARY

© 2006 Nature Publishing Group http://www.nature.com/naturebiotechnology


NATURE BIOTECHNOLOGY VOLUME 24 NUMBER 9 SEPTEMBER 2006 vii

Next month in
• Activated sludge metagenomics
• Genome of a bioplastic producer
• Knock-ins for knockout anti-inflammatory mAbs
• Arrested protein chip fabrication
• High-definition microarray for DNA binding site searches

IN THIS ISSUE

MicroArray Quality Control project

Since 2004, when the US Food and Drug Administration (FDA; Rockville, MD) started accepting voluntary genomic data submissions, the number and scope of DNA microarray–based expression data analyses filed as accompanying (but non-binding) information with new drug applications has been steadily increasing. And although the potential value of the information contained in these submissions is undisputed, no clear guidelines and standards have as yet been established for their use as part of a regulatory decision-making process [Foreword, p. 1103; Commentary, p. 1105].

But human healthcare is not the only area where microarrays represent a promising technology; environmental monitoring of pollutants through toxicogenomics, for instance, could also greatly benefit from their adoption. In a similar manner to their use in drug monitoring, microarrays could also be applied to detect early, subchronic exposure to pollutants using model systems or, at the very least, to characterize some of the underlying molecular mechanisms in toxicity [Commentary, p. 1108]. The practical challenges in implementing microarrays for the above applications will not be trivial, however. To translate the outcome of microarray analyses into the clinical and regulatory realms, many questions regarding sensitivity, reproducibility and ultimately biological significance remain to be answered [Commentary, p. 1112].

It is in this context that the MicroArray Quality Control (MAQC) project was conceived by a group of regulatory, academic and industrial partners to comprehensively tackle some of the technical issues surrounding the robustness and comparability of some of the most widely used microarray platforms. Starting with two well-defined, commercially available RNA samples, this consortium has carried out a side-by-side evaluation of seven different platforms with the aim of establishing a series of metrics that would facilitate future standardization approaches [Article, p. 1151]. To validate that microarray data are comparable to data obtained from other, more traditional gene expression assays, the MAQC data set was also assessed against three quantitative molecular assays for measuring gene transcription; and it turns out that the overlap is encouragingly high [Analysis, p. 1115].

Another important question to be addressed by the MAQC consortium was the use of RNA aliquots, external to the actual samples, that can serve as internal, technical controls for evaluating the level of performance at different steps of the experimental protocol, from reverse transcription to labeling of the samples [Analysis, p. 1132]. If adopted widely by the community, these and similar external RNA controls could provide researchers with a qualitative assessment of their assay’s performance. In a separate experiment, the consortium also put the quantification capabilities of the different platforms to test. Using a series of titration samples, good concordance of predicted and actual measurements was reported across platforms [Analysis, p. 1123].

In the early days of microarrays, two-color detection protocols were often preferred to those using one-color labeling of RNA because they could compensate for some of the imperfections and inaccuracies in microarray probe spotting. However, with improvements in microarray manufacture, the performance of one-color versus two-color platforms is becoming a central question for high-volume data generation with microarrays, in that robust and reliable single-color protocols would greatly facilitate implementation, and reduce the cost, of analyses [Analysis, p. 1140]. In a final report, the MAQC group applies their approach to real-world toxicogenomic analysis of rats exposed to three plant-derived carcinogenic compounds, aristolochic acid, riddelliine and comfrey. Again, the results across platforms showed high accuracy, reproducibility and biological relevance [Article, p. 1162]. AM & GTO

In This Issue written by Michael Francisco, Peter Hare, Sabine Louët, Andrew Marshall, Gaspar Taroncher-Oldenburg & Jan-Willem Theunissen.

Patent roundup

• Timothy Caulfield and colleagues report that policy makers may respond more to media controversies than systematic data on gene patenting. [Patent Article, p. 1091] MF

• A US federal appeals court ruled on August 3 that Cambridge-based Transkaryotic Therapies (TKT), acquired last year by Shire, has infringed two patents held by Amgen for the production of erythropoietin. [News in Brief, p. 1048] SL

• Recent patent applications in tissue engineering. [New Patents, p. 1095] MF



NATURE BIOTECHNOLOGY VOLUME 24 NUMBER 9 SEPTEMBER 2006 1039

Making the most of microarrays

A major, multicenter study of microarray performance is a first step in translating the technology from bench to bedside.

No technology embodies the rise of ‘omic’ science more than the DNA microarray. First reduced to practice in the early 1990s, it has since undergone numerous iterations, adaptations and refinements to achieve its present status as the platform of choice for massively parallel gene expression profiling. Today, several thousand papers describing data from microarrays are published each year. Sales of arrayers, array scanners and microarray kits to the academic and industrial R&D community represent a multi-billion-dollar business. The microarray has even made its first forays into the clinic, with the US Food and Drug Administration’s approval of the ‘AmpliChip’ to help physicians tailor patient dosages of drugs that are metabolized differentially by cytochrome P450 enzyme variants.

And yet doubts linger about the reproducibility of microarray experiments at different sites, the comparability of results on different platforms and even the variability of microarray results in the same laboratory. After 15 years of research and development, broad consensus is still lacking concerning best practice not only for experimental design and sample preparation, but also for data acquisition, statistical analysis and interpretation.

Though problematic for bench research, lack of resolution of these issues continues to even more seriously hamper translation of microarray technology into the regulatory and clinical settings. Indeed, several regulatory authorities have been wrestling with the problem of how and when (and indeed whether) to implement microarray expression profiling data as part of their decision-making processes. The move in the past two years to accept voluntary genomic data submissions by regulatory agencies overseeing human and environmental safety was the first in a long series of steps that will be needed.

One of the next steps can be found in this issue, which presents the first formal results of the MicroArray Quality Control (MAQC) Consortium—an unprecedented, community-wide effort, spearheaded by FDA scientists, that seeks to experimentally address the key issues surrounding the reliability of DNA microarray data. MAQC brings together more than a hundred researchers at 51 academic, government and commercial institutions to assess the performance of seven microarray platforms in profiling the expression of two commercially available RNA sample types. Results are compared not only at different locations and between different microarray formats but also in relation to three more traditional quantitative gene expression assays.

Although the direct comparison of microarray platforms and the establishment of common controls for microarray experiments is nothing new—several cross-format studies have already been published, and other groups, such as the External RNA Controls Consortium (ERCC), are developing standardized RNA controls—it is the size and comprehensiveness of the data set generated by the MAQC effort that is unique. In the main study, ~60 hybridizations were carried out on each of the seven platforms; >1,300 microarrays were used during the entire project.

MAQC’s main conclusions confirm that, with careful experimental design and appropriate data transformation and analysis, microarray data can indeed be reproducible and comparable among different formats and laboratories, irrespective of sample labeling format. The data also demonstrate that fold change results from microarray experiments correlate closely with results from assays like quantitative reverse transcription PCR.

The levels of variation observed between microarray runs by MAQC were relatively low and largely attributable to cross-platform differences in probe binding to alternatively spliced transcripts or to transcripts that show a high degree of cross-hybridization to probes other than their own. Thus, although factors as diverse as day-to-day fluctuations in atmospheric ozone levels (which affect cyanine 5 fluorescence), nuclease levels in sample tissues and the quality of microarray production between batches have all been cited as influencing array performance, on the basis of the data presented here, experimental variability appears manageable.

Another clear finding is that the days of the simple two-sample t-test as a means of ranking differentially expressed genes are surely numbered. A key take-home message is that statistical analysis in regulatory submissions and clinical diagnostics is likely to be different from that used in basic research and discovery. In the case of the MAQC study—where the goal was to optimize intra- and inter-platform reproducibility—the approach was to limit the number of transcripts identified and to sort differentially expressed genes using fold-change ranking with a nonstringent P-value cutoff. But for experiments that seek to identify differentially expressed transcripts at or near the lower limits of detection, this tradeoff between reproducibility on the one hand and precision and sensitivity on the other is likely to shift, and a different type of statistical analysis will be required. There is no one-size-fits-all statistical solution.
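The ranking strategy described here — ordering genes by fold change while applying only a loose significance filter — can be sketched as follows. This is a hypothetical illustration, not the consortium's actual pipeline: the t-statistic cutoff stands in for the nonstringent P-value threshold, and the data layout is invented for the example.

```python
import math
from statistics import mean, stdev

def rank_genes(group_a, group_b, t_cutoff=2.0):
    """Rank genes by absolute log2 fold change, keeping only those whose
    two-sample t statistic clears a nonstringent cutoff (a stand-in here
    for the loose P-value filter described in the text).
    `group_a` / `group_b`: dict mapping gene -> list of log2 expression
    replicates (at least two replicates per group)."""
    ranked = []
    for gene in group_a:
        a, b = group_a[gene], group_b[gene]
        fold = mean(a) - mean(b)  # difference of log2 means = log2 fold change
        se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
        t = fold / se if se > 0 else 0.0
        if abs(t) >= t_cutoff:    # nonstringent significance filter
            ranked.append((gene, fold))
    # Fold-change magnitude, not the t statistic, is the ranking criterion
    return sorted(ranked, key=lambda g: abs(g[1]), reverse=True)
```

Note the design point the editorial makes: the significance test only gates entry to the list, while the ordering itself comes from fold change, which is what favors cross-site and cross-platform reproducibility.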

Overall, the MAQC study represents a landmark in DNA microarray research because it provides the community with a thoroughly characterized reference data set against which new refinements in platforms and probe sets can be compared. It complements other initiatives, such as the ERCC, in providing the community with two commercially available human reference RNA samples that can be used to calibrate arrays in ongoing quality control and performance validation efforts. It can be used as the foundation for combining other microarray studies, thereby realizing the true cumulative potential of microarray data, which will undoubtedly lead to new insights. And from a clinical perspective, it validates the DNA microarray as a tool that is sufficiently robust and reliable to be embraced for use on hard-to-obtain human tissue samples.

Clearly, microarrays have a long way to go before they can be used to support regulatory decision-making or accurate and consistent prediction of patient outcomes in the clinic. But the MAQC study has given us a solid foundation from which to build.

EDITORIAL


1040 VOLUME 24 NUMBER 9 SEPTEMBER 2006 NATURE BIOTECHNOLOGY

Can Europe accelerate out of trouble?

Europe should seriously consider the ‘accelerator’ concept to foster the sustainability of its biotech companies.

Many of Europe’s biotech firms appear permanently stuck in a state of arrested development. Indeed, compared with their counterparts over the ocean, European startup companies continue to find it hard to achieve the size and stature requisite for commercial success. A recent report reveals that compared with the United States, Europe has an awful lot of small companies that, on average, grow much slower than their US counterparts. But help may be at hand in the form of a new incubator concept pioneered on the West Coast of the United States.

According to the latest report from EuropaBio and consultants Critical I, Biotech in Europe: 2006 Comparative Study, two-thirds of European companies have under 20 employees, whereas two-thirds of US companies employ more than 20 people. One would expect that companies established for, say, 2 years or less would be small, and the report confirms this, both for Europe and for the United States. In the United States, however, the initial phase of company growth is rapid: by the time US companies are 5 years old, 75% of them have more than 20 employees; in Europe, by contrast, companies employing ‘less than 20 employees’ are the largest group, right up until the firms are beyond 15 years old.

One reason that US biotech fares better is that US entrepreneurs and investors continue to look for ways of growing companies more efficiently. One of the models that is growing in popularity is the accelerator.

Like incubators, accelerators provide customizable laboratory and business space for young companies. Unlike incubators, which bring small chunks of fluffy capital, cramped facilities and low-grade access to a centralized team of distracted and generically qualified management mentors, accelerators provide a combination of concentrated capital overlaid with specific and committed technical, clinical or market expertise. The availability of greater amounts of seed and startup cash (on the order of ~$4 million per company) certainly reduces one of the major risks that young companies face, and by favoring companies that are past the point of discovery, accelerators certainly cut out a large chunk of technology risk. However, accelerators endeavor to take risk reduction even further.

Consider, for instance, the eponymous Seattle, Washington–based Accelerator, started in 2003. Leroy Hood of the Institute for Systems Biology is Accelerator’s president (p. 1055), and Amgen’s venture fund is a founding partner. That gives companies backed by Accelerator (five, so far) instant access to world-class understanding of technology and market issues. Through its founders and management, Accelerator has close ties to several of the Pacific Northwest’s (and America’s) leading venture capital firms, such as MPM, Versant and ARCH.

Although Accelerator backs companies addressing various slow steps in the healthcare product development process, other accelerators focus on particular areas of clinical practice. One of the most highly focused is the Hackensack, New Jersey, firm Advanced Technologies, which has started or re-started six companies that are each developing medical devices for interventional cardiology products. The team running Advanced Technologies includes seasoned investors, cardiologists and clinicians, all of whom have clear roles to play in speeding up the development, clinical adoption and commercialization of cardiovascular devices and hence in providing expedited investment and business exits.

More accelerators are on the way. A consortium of large pharmaceutical firms is said to be considering creating one in the Cambridge, Massachusetts biotech cluster. And another may be built in the San Diego biotech cluster.

Oddly, just as accelerators are finding new ways to make the milieu for new US firms more encouraging and less risky, the opposite may be true in Europe. In the United States and the more advanced parts of Europe, the rate of formation of new companies has slowed in recent years. Consequently, a large proportion of the new European foundlings are arising in nations or regions that are themselves new to biotech. Often, there are precious few biotech-relevant resources in these locations, beyond a bit of seed money: there are no substantial finance streams, no management skills, no biotech-experienced support infrastructure of lawyers, accountants and consultants.

Such environments are precisely the opposite of accelerators, and are likely to have precisely the opposite effect. Global competition and technology supersession mean that biotech firms need to have a ‘Red Queen’ mentality. But trying to ‘run as fast as you can just to stay still’ is difficult if you are wading through mud.

The lesson for companies in nations with new, fledgling biotech sectors is that they need to reach out beyond national borders to management and financiers in other, more established biotech clusters. It’s important to work with these experienced executives and investors because they are familiar with the idiosyncrasies and protracted timelines of life science ventures and they have the requisite historical and international perspective to place new biotech platforms or products in their proper global competitive context.

In this respect, the Accelerator model looks particularly interesting. Given the difficulty of pooling investors and management expertise and the relative scarcity of truly globally competitive ventures emerging at the national level, perhaps a pan-European accelerator could be an effective approach. Certainly, if European centers of scientific excellence don’t want much of their first-class intellectual property to be hamstrung by underfunding, naive management and unsupportive surroundings, they should seriously consider the concept.

Europe doesn’t need more biotech ventures; it needs more successful ones. And starting biotech accelerators would be one means of bringing together the sort of expertise and funding that could increase the chances that that would happen.

EDITORIAL


NATURE BIOTECHNOLOGY VOLUME 24 NUMBER 9 SEPTEMBER 2006 1041

Companies eye slice of age-related macular degeneration market

Genentech’s new antibody therapy Lucentis (ranibizumab), approved in June, has the potential to dominate the market for a common eye disease. In a few years, however, it may face direct competition from Avastin (bevacizumab), a sister drug made by the same company. Meanwhile, OSI Pharmaceuticals and QLT, which already have drugs approved for the same eye disease, are trying to consolidate their market positions. As yet, several other potential competitors with drugs in phase 2 development for the same indication have not demonstrated any advantages over Lucentis.

On June 30 the US Food and Drug Administration (FDA) announced it had approved Lucentis, a treatment for wet age-related macular degeneration (AMD). The drug is administered as an injection into the eye and is a humanized antibody FabV2 fragment that targets vascular endothelial growth factor (VEGF), a protein associated with growth and leakage of blood vessels that causes vision to decline.

Lucentis, which is made by S. San Francisco-based Genentech, is the first approved treatment to restore sight in a significant percentage of patients afflicted with the disease. AMD is a major cause of blindness in people over 50 years old. Until now, drugs approved for AMD could only slow the progression of the disease, rather than reverse it. But in phase 3 clinical studies of Lucentis, vision improved in more than one third of the individuals who took it.

Experts say Lucentis will likely steal a significant market share from the two AMD treatments on the market: Pfizer/OSI Pharmaceuticals’ aptamer Macugen (pegaptanib) and Novartis/QLT’s small molecule Visudyne (verteporfin). Lucentis may also discourage companies from developing new therapies. “The success of Lucentis has raised the bar so much that it makes it difficult to come up with a drug that’s better,” says Julia Haller, a professor of ophthalmology at Johns Hopkins University in Baltimore.

But physicians are discovering on their own that Genentech’s approved cancer drug, Avastin, an anti-angiogenic antibody that binds VEGF, may work for AMD just as well and just as safely

An eye surgeon performs microsurgery on a patient with age-related macular degeneration. New drugs, such as Lucentis, could reduce the need for such procedures.

Scripps Howard Photo Service/John Rottet/Raleigh News & Observer/NewsCom

Table 1  Drugs currently in development for the treatment of wet AMD

Product | Company | Mechanism of action | Phase
Evizon (squalamine) | Genaera (Plymouth Meeting, Pennsylvania) | Anti-angiogenic; inhibits VEGF, PDGFβ, thrombin & bFGF intracellular pathways | 3
PTK787 (vatalanib) | Novartis (Basel)/Schering (Berlin) | Small-molecule VEGFR kinase inhibitor | 3 (cancer); 2 (AMD)
Retaane (anecortave acetate) | Alcon (Fort Worth, Texas) | Small-molecule angiostatic cortisene; inhibits angiogenesis induced by basic fibroblast growth factor, VEGF and other known stimulators | 3 (application withdrawn from EMEA after it asked for more data; awaiting FDA approval)
AG-13958 | Pfizer (New York) | Inhibits tyrosine kinases, including VEGF | 2
CAND5 | Acuity Pharmaceuticals (Philadelphia, Pennsylvania) | Gene-silencing siRNA therapy that reduces production of VEGF | 2
Combretastatin A4 Prodrug (combretastatin) | OXiGENE (Watertown, Massachusetts) | Tubulin inhibitor; disrupts the structure of endothelial cells lining the tumor vasculature to stop flow of blood and nutrients to tumor | 2
VEGF Trap | Regeneron Pharmaceuticals (Tarrytown, New York) | Recombinant decoy receptor fusion protein that binds to all forms of VEGF-A and placental growth factor | 2

Source: Evaluate Pharma (http://www.evaluatepharma.com/) and company websites and information. PDGFβ, platelet-derived growth factor β; VEGFR, VEGF receptor. EW

ALSO IN THIS SECTION
New clinical trials policy at FDA, p. 1043
Amgen’s TPO mimic faces stiff competition, p. 1044
BioXell: an Italian biotech success story?, p. 1045
News in brief, p. 1048
Profile: Abe Abuchowsky, p. 1050

NEWS


1042 VOLUME 24 NUMBER 9 SEPTEMBER 2006 NATURE BIOTECHNOLOGY

Box 1 Avastin could become Lucentis’ greatest competitor

With little immediate competition from candidates in the development pipeline, Lucentis’ greatest competitor may be its sister drug, Avastin. The drug stems from the same murine monoclonal antibody as Lucentis. Avastin, however, is a full-length antibody, whereas Lucentis is an antibody fragment. Avastin is also designed as an intravenous drug and has a longer half-life than Lucentis. Genentech scientists say these components make Lucentis better tailored for the eye, with less chance of inflammation and better binding with VEGF.

When a standard vial of Avastin is split into eye-sized doses, the drug costs less than $50 per injection compared to the nearly $2,000 per dose estimated for Lucentis. In anecdotal reports and small independent studies of Avastin used off-label for AMD, the drug appears safe and effective, and word has spread in the ophthalmology community. “Reports have been small and anecdotal,” says Jeffrey Heier, a vitreoretinal specialist at Ophthalmic Consultants of Boston. “But all of us have used it on enough patients to know that the results are real.”

Genentech, however, decided in the late 1990s to stop pursuing Avastin as an AMD drug. In response, some clinicians are trying to organize their own large-scale study so the doctors will have more concrete data on which to base their prescription choices. EW

as Lucentis—and for a fraction of the price. Industry insiders say that if independent investigators complete enough safety and efficacy studies, off-label Avastin prescribed for AMD may become Lucentis’ greatest competitor (Box 1).

However, analysts such as Joshua Schimmer, at SG Cowen in New York, believe Lucentis will likely capture 65% of the $1-billion US market. Indeed, Lucentis’ arrival on the market will be a major blow to Macugen, which was developed by OSI Pharmaceuticals and is marketed by Pfizer. Researchers say the difference between the drugs may be the amount of VEGF each drug inhibits. Whereas Lucentis can bind and inhibit all the active molecular forms of VEGF, Macugen binds to only one form of VEGF, called VEGF-165. “Macugen is effective but not as effective as Lucentis,” says Tony Adamis, chief scientific officer of OSI. “Lucentis’ data is so impressive…even I have to admit that.”

In August, OSI announced that, due to competition, it would suspend or curtail all R&D for eye diseases, except on Macugen. OSI’s Adamis says he believes Macugen could find a spot on the market as a maintenance drug—something individuals can take after they’ve reaped the benefits of Lucentis. Based on its interpretation of Genentech’s data, OSI theorizes that patients could take Lucentis for their first three doses and then switch to Macugen, which costs about half as much and, on average, is injected less frequently. The company has set up a clinical trial to test this proposal.

Many experts, however, say OSI’s theory is based on little or no data and is a last-ditch attempt to salvage its product. “They are trying to use a marketing ploy to feed off ophthalmologists who are not in the know,” says Peter Campochiaro, a professor of ophthalmology at the Wilmer Eye Institute at Johns Hopkins. “They’ve been pretty shameless.”

Lucentis’ other approved predecessor, Visudyne, is used in a treatment called photodynamic therapy, in which the drug is injected into the bloodstream and activated in the eye by a light beam. Like Macugen, it can slow the progression of AMD but usually cannot reverse it. “It’s hard to see it playing a meaningful role going forward,” says Schimmer. Visudyne may be useful for individuals who cannot endure an injection in the eye, he says. It may also be used in combination with Lucentis, although data so far have not supported this treatment regimen, he says.

As Lucentis edges out drugs already on the market, it may also douse enthusiasm and funding for some early-stage AMD candidates. Just two months after Genentech announced in July 2005 its phase 3 results for Lucentis, Alnylam Pharmaceuticals, a biotech

Lucentis timeline: the evolution of two anti-VEGF drugs under one roof.

1989 • Napoleone Ferrara at Genentech discovers and clones VEGF

1993 • Ferrara and colleagues publish preclinical data showing that an anti-VEGF antibody can suppress tumor growth and angiogenesis—the formation of new blood vessels (Nature 362, 841–844, 1993)

1994 • Studies suggest VEGF may have a role in ocular diseases (NEJM 331, 1480–1487, 1994; Am. J. Ophthalmol. 118, 445–450, 1994)

1996 • Adamis and other researchers at Massachusetts Eye and Ear Infirmary in Boston discover that a mouse monoclonal antibody against VEGF can be injected into monkey eyes to prevent blood vessels from growing (Arch. Ophthalmol. 114, 66–71, 1996). The cross-species experiment didn’t cause inflammation, suggesting that a humanized version may not cause inflammation if injected into human eyes.

1996 • Genentech humanizes the anti-VEGF antibody

1997 • Phase 1 trials begin for Avastin, a full-length monoclonal antibody targeting VEGF

1997 • Genentech compares full-length anti-VEGF antibodies with antibody fragments (Fab) and finds that the fragments better penetrate the retina (Toxicol. Pathol. 27, 536–544, 1999). The findings compel the company to steer Avastin down a cancer pipeline and develop a new therapy—Lucentis—for the eye. Researchers later suggest that the study was flawed. “While the Fab appeared to penetrate better than the full-length antibody, the study was flawed due to the fact that the two molecules recognized different antigens: the Fab was directed against VEGF, and the full-length antibody was directed against an antigen expressed within the inner retina known as HER2,” writes Philip Rosenfeld, an ophthalmologist at the University of Miami’s Bascom Palmer Eye Institute, in a 2006 issue of Ophthalmology.

1999 • Phase 1a trial begins for Lucentis, an antibody fragment targeting VEGF, made from the same murine monoclonal antibody as Avastin

2004 • FDA approves Avastin for metastatic cancer of the colon or rectum

2005 • Stephan Michels and his colleagues suggest in a small study that Avastin is safe and can improve macular anatomy and vision in people with wet AMD (Ophthalmology 112, 1035–1047, 2005)

2005 • Small studies and anecdotal reports conducted by clinicians support Michels’ findings

2006 • Ziad Bashshur and colleagues at the American University of Beirut Medical Center in Lebanon publish the first prospective study of Avastin for AMD (Am. J. Ophthalmol. 142, 1–9, 2006). Conducted in Lebanon on 17 human subjects, the study found marked improvement in nearly every eye studied, with no side effects.

2006 • FDA approves Lucentis for wet AMD

2006 • Clinicians vow to conduct a large-scale US clinical study of Avastin EW

company in Boston, announced that it would halt development of its AMD drug because of competition.

But others persevere. Among the most promising candidates in the development pipeline, some experts say, is the VEGF Trap by Regeneron Pharmaceuticals. Scientists believe the drug works by binding VEGF more effectively, thereby blocking it from activating VEGF receptors. A phase 1 study showed that a single injection lasted at least six weeks.

Some companies are exploring drug candidates that can be delivered systemically, or into the bloodstream (Table 1).

Although Lucentis is one step ahead of competitors, it has some drawbacks, and industry insiders say there is still some room on the market for new products. Eye injections are rough on patients and carry risk. Lucentis must be injected into the eye every month for the first four months, and then at varying frequency afterward. A drug that lasts longer or can be administered less invasively and less frequently than Lucentis has potential.

Emily Waltz, New York




New clinical trials policy at FDA

In a bid to speed drug development, the US Food and Drug Administration (FDA) is encouraging drug companies to design clinical trials with flexible enrollment, dosing and other parameters. Called ‘adaptive design,’ the approach promises quicker results with smaller trials but also carries risks of manipulation, according to observers.

At a July meeting in Washington, DC, FDA deputy commissioner for medical and scientific affairs Scott Gottlieb laid out the agency’s plan to develop five guidance papers over the next several years. Although not binding, the documents will help drug companies design and implement adaptive trials that the FDA considers up to snuff. “We have a dilemma. [Trial] costs are spiraling upwards, trials are getting bigger, patient resources are shrinking, there are a lot of drugs in the pipeline, and it’s getting harder to measure endpoints. The old paradigm just isn’t working,” says Brian Schwartz, senior vice president for research at Ziopharm Oncology, of New York.

In a typical clinical trial, parameters such as drug dosages and the number of patients in each arm are predetermined and immutable. Adaptive trials, in contrast, allow tweaking of dosages, patient pool sizes and so on in response to incoming data. Proponents describe adaptive trials as iterative, with each new round of parameters informed by lessons learned on the fly. “It’s more of a seamless approach,” says Gottlieb. He also says that adaptive trials will more quickly rule out unsafe or ineffective drug candidates. “The ability to fail faster is an important advance,” but “adaptive procedures are more complicated to design and analyze, and in some settings more difficult to implement.”

In addition to these challenges, the FDA could have trouble getting buy-in on the concept, says Mark Senak, a consultant at Fleishman-Hillard who runs the ‘Eye on FDA’ blog. “The agency and industry will have a tough time selling the concept to policy makers and to a public that is already skeptical of clinical trial design and safety,” he says.

Already, though, industry is embracing the concept. Wyeth recently hired a new vice president for adaptive trials, and Robert O’Neill, director of the office of biostatistics at the FDA’s Center for Drug Evaluation and Research, says that each of the FDA’s drug evaluation branches has received adaptive trial proposals. “The FDA is very interested in the concept,” says Mark Chang, a biostatistician at Millennium Pharmaceuticals, Cambridge, Massachusetts. “They’ve begun working closely with industry on adaptive trial designs, and they’re encouraging companies to approach them early in the process.”

Although the concept is widely embraced—few would argue against speeding up phase 2 and 3 clinical trials, which can drag on for five or more years—the mechanics of adaptive trials present thorny statistical challenges, says Chang. And Schwartz says that companies interested in adaptive trials tend to underestimate the difficulty of collecting real-time data. “By the time they look at the first 300 patients, there might be 900 patients in the trial,” he says.

Companies need to develop simulations to test adaptive scenarios, says Chang. In his models, the two most common variations involve ongoing assessment of sample size and enrichment of the treatment arm with patients most likely to benefit. For instance, Chang will model a range of patients’ responses to a drug, a key factor in sizing trials—smaller variations require smaller sample sizes. Enrichment scenarios, by comparison, often call for first discovering biomarkers in the best responders and then adding more of those patients to the protocol. Chang and Gottlieb also envision ‘pivotal’ trials that combine phase 2 dosing and phase 3 effectiveness studies. “You can run a lot more doses, maybe five instead of two,” says Chang.
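The link between response variability and trial size that Chang describes can be sketched in a few lines of code. The snippet below is a minimal, hypothetical illustration—not any company’s actual model: it uses the textbook normal-approximation sample-size formula for comparing two means, n per arm ≈ 2(z_α/2 + z_β)²σ²/δ², and shows how re-estimating variability σ at an interim look shrinks (or grows) the required enrollment. All names and numbers are illustrative.

```python
import math
import random
import statistics

def required_n(sigma, delta):
    """Per-arm sample size for a two-arm trial detecting a mean
    difference `delta` when responses have s.d. `sigma`, using the
    normal approximation with two-sided alpha = 0.05 (z = 1.96)
    and 80% power (z = 0.84)."""
    z_a, z_b = 1.96, 0.84
    return math.ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)

# Planning stage: a conservative guess for response variability.
planned = required_n(sigma=2.0, delta=1.0)

# Interim look: re-estimate variability from accumulating patient
# responses (simulated here with a true spread smaller than the guess).
random.seed(1)
interim_responses = [random.gauss(0, 1.2) for _ in range(50)]
adapted = required_n(sigma=statistics.stdev(interim_responses), delta=1.0)

# Smaller observed variation -> smaller required trial.
print(planned, adapted)
```

In a real adaptive design the re-estimation rule, stopping boundaries and alpha-spending adjustments would be prespecified and far more involved; the sketch only captures the directional effect Chang points to.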

Bayer Healthcare, based in Leverkusen, Germany, adopted an adaptive approach for its phase 2 trial of a new cancer drug. Without knowing which types of cancer Nexavar (sorafenib tosylate) would fight best, the company enrolled patients suffering from a range of advanced cancers. “We knew pretty quickly, within ten or so patients, that kidney cancer was the best responder,” says Schwartz, who helped run the trial before joining Ziopharm earlier this summer. Bayer then designed a traditional phase 3 trial for renal cell cancer.

This approach illustrates another impetus for adaptive trials: many new cancer drugs stop a tumor from spreading but don’t necessarily shrink it. “The traditional endpoint of tumor shrinkage just doesn’t make sense anymore,” says Schwartz. He cautions, however, that committees for evaluating and modifying trials on the fly need to be “completely independent” from sponsors. “That’s the only way to maintain integrity. Industry can’t be within an arm’s length” of the evaluation committee, meaning “companies will have to give up some control,” says Schwartz. He urges the FDA to “very explicitly” spell out the role of the new committees.

Most large trials already deploy a data safety and monitoring board empowered to end trials if wide benefits or severe adverse events appear early. However, these traditional committees simply collate data against predetermined stopping points; the new committees will have much more power.

Gottlieb says the FDA will issue two guidance papers in January 2007, with three more to follow. The first will provide guidelines for evaluating multiple trial endpoints; the second will outline how to enrich trials with patients most likely to benefit. “This is a wonderful opportunity,” comments Schwartz. “We want to get drugs to patients quickly, and it’s frustrating to look back at some of our trials and see that if we had changed this or that we could’ve had the drug to patients six months earlier.”

Brian Vastag, Washington, DC

Bayer Healthcare was one of the first companies to follow an adaptive clinical trial protocol, determining for which cancer its drug Nexavar would be most potent.

AP Photo/Hermann J. Knippertz




Amgen’s TPO mimic faces stiff competition

With seemingly novel science, unsatisfactory existing treatment options and a sizeable potential patient population, it’s hard to see how Amgen’s phase 3 platelet growth factor candidate to treat blood disorders could go wrong. But, with several small molecules in the pipeline, it could face stiff competition in the platelet growth factor market.

Amgen, headquartered in Thousand Oaks, California, has a first-in-class treatment for a deficient platelet count slated to wrap up phase 3 trials by the end of this year and hit the market in 2007. If approved for several related blood disorders, the drug—known as AMG 531—could be a treatment option for more than a half-million people in the US and Europe. And for many indications marked by reduced platelet counts, such as immune thrombocytopenic purpura, an illness causing abnormal bleeding, and chemotherapy-induced low platelet count, the treatment options are currently meager and include steroids as well as the infusion of platelets.

AMG 531 is the result of more than a decade of effort to create protein-based thrombopoietin-stimulating agents to increase platelet counts. After thrombopoietin (TPO) was first discovered in 1994, recombinant versions of the human protein (rTPO) had the catastrophic side effect of inducing the production of antibodies that cross-reacted with the subject’s own TPO, leading to low platelet counts in normal subjects. Several companies, including Amgen, Genentech, Pfizer, Johnson & Johnson and Schering Plough, subsequently abandoned their rTPO efforts once it was clear that the approach had the unintended effect of lowering rather than boosting platelet count.

Around this time, the peptide company Affymax of Palo Alto, California, published research describing a peptide that functioned as an erythropoietin (EPO) mimetic by binding to and stimulating the EPO receptor. This never bore fruit, because it was less effective than EPO, so Affymax “then published on a similar strategy to identify a TPO mimetic peptide, which was close to TPO in specific activity,” remembers Kenneth Kaushansky, chair of the department of medicine at the University of California, San Diego. “Hence was born a peptide approach to stimulating the TPO receptor,” he adds. “Others thought that screening large libraries of small organic molecules could also net mimetics, and that is where several other small-molecule mimics have come from.”

In the wake of the rTPO debacle, research has advanced along these two strategic paths—peptide and small-molecule development—to create TPO mimetics. By the late 1990s researchers were quite successful at identifying a number of small molecules and peptides that bound the TPO receptor. That’s when the peptide part of AMG 531 was identified in Amgen’s laboratory; it increases platelet production by binding to the TPO receptor and stimulating megakaryocytes, large cells in the bone marrow from which pieces break off to form platelets.

Once Amgen researchers identified an effective peptide that did not seem likely to trigger an antibody response, they needed to improve the life span of the peptide in the bloodstream. To create the AMG 531 peptibody, Amgen combined its preselected peptide with a carrier molecule that extends the life of the drug in the patient’s circulatory system, according to Roy Baynes, Amgen’s vice president of oncology and supportive care. If it is approved by the US Food and Drug Administration, AMG 531 will be the first drug known as a peptibody to make it to market.

Still, there are at least a half-dozen small-molecule and small-protein platelet-stimulating agent projects currently at various stages of clinical development to treat diseases marked by platelet deficiency, according to the life sciences clinical trial research firm La Merie, located in Barcelona, Spain. AMG 531 is among the most clinically advanced treatments; but although it does not cross-react like rTPO, AMG 531 is still a relatively inconvenient treatment requiring weekly intravenous doses.

By contrast, eltrombopag, developed by GlaxoSmithKline (GSK) in London and also in phase 3 trials, is a small-molecule treatment for patients with low platelet counts. Eltrombopag may be among the first small molecules to modulate protein-to-protein interactions, a particularly hard target for this platform, according to the market research firm Decision Resources, based in Waltham, Massachusetts. This scientific advance translates into a market advantage: oral administration via a tablet.

Mark Schoenebaum, a research analyst at the investment bank Bear Stearns in New York City who follows Amgen, thinks this could be a big obstacle for AMG 531. He expects the candidate, if approved, to peak at a mere $300 million in sales. “It’s not thought to be a big drug,” asserts Schoenebaum. “It’s going to face serious competition from a pill from GSK if they are both approved. Since that’s an oral pill, it is cheaper to manufacture and more convenient.”

The initial indication targeted by both Amgen and GSK is immune thrombocytopenic purpura, a condition common in HIV-infected people, in which the body produces antibodies against platelets in the blood. But the next indications targeted for approval are likely to include a whole range of conditions characterized by low platelet counts, including chemotherapy-induced thrombocytopenia. In the chemotherapy market, where Amgen already has several major products, including the anemia treatment Epogen (erythropoietin), which requires regular intravenous infusions, AMG 531 may still have an edge.

But neither Amgen nor GSK is likely to have the last word. “There are 20 or 30 companies that are quietly working on small molecules with much better pharmacologic properties than the GSK molecule,” concludes Kuter.

Stacy Lawrence, San Francisco

A new generation of platelet growth factors could succeed where recombinant thrombopoietin has failed.




BioXell: an Italian biotech success story?

Although BioXell’s successful initial public offering (IPO) on the Swiss Exchange SWX on June 22, which grossed CHF 57.8 ($46.9) million, can be considered a relatively unremarkable event for a company at its stage of development, it could have a wider significance for the Italian biotech sector. Indeed, Italian biotech has so far struggled to convert the country’s strength in life sciences research into a thriving commercial industry. Other, less heralded developments, including a series of regional initiatives and the entry of new investors into the sector, also provide some grounds for optimism on the part of the industry’s supporters, but the sector still faces considerable financial and cultural constraints that could choke further development.

BioXell’s decision to seek a listing in Zürich rather than its hometown of Milan underlines the lack of a fully fledged investment infrastructure for Italian biotech. In a similar vein, Villa Guardia-based Gentium, a 2001 spinout of Crinos Industria Farmacobiologica, raised cash in recent, successive offerings on the American Stock Exchange and on Nasdaq in New York City, whereas the Italian founders of NiCox opted to establish that company as a French entity, located in Sophia Antipolis and quoted on the Euronext exchange in Paris.

A Milan IPO “wasn’t really considered,” says BioXell CEO Francesco Sinigaglia, whereas the Zürich exchange is home to several biotech successes and has the support of investors who understand the sector. Even so, the share offering, which was launched shortly after the general decline in global stock markets in early June, was priced at the bottom of the indicative range of CHF 44–CHF 48 ($35.5–$38.8) that the company published, and investors took up the minimum number of shares on offer. However, the share price has held up since the IPO, hovering close to the initial offering price for the first six weeks of trading.

The BioXell success remains a largely isolated one in the Italian landscape. Despite its prominence in fields such as oncology, immunology and neuroscience, Italy has been Europe’s most egregious underperformer in biotech during the past decade. Italy came bottom of a league table of 14 western European states that measured each country’s gross domestic product against its total number of biotech companies, according to the 2006 Ernst & Young biotech report “Beyond Borders.”

An absence of risk capital, deficits in areas such as patenting and technology transfer, a historic inattention to the sector on the part of government and a general lack of interest in commercial biotech on the part of academic scientists have all contributed to this state of underdevelopment. “There is still very modest entrepreneurship in the biotech sector and not many structured and savvy intermediaries. Deal flow is not significant compared to other EU countries of similar size,” says Joël Besse, senior partner with Atlas Venture in London, who participated in investments in two Italian biotechs: Milan-based Novuspharma and Bresso-headquartered Newron Pharmaceuticals.

These two firms, along with BioSearch Italia and Milan-based BioXell, were all established as either spinouts from, or management buyouts of, international pharma R&D centers that had been located in the country. Only BioXell and Newron Pharmaceuticals remain independent. BioSearch Italia merged with Versicor to form Vicuron Pharmaceuticals, an

Francesco Sinigaglia, BioXell’s CEO, is at the helm of one of Italian biotech’s success stories.




anti-infectives specialist located in King of Prussia, Pennsylvania. Pfizer then acquired Vicuron for ~$1.9 billion in cash in September 2005. Novuspharma was acquired in January 2004 by Cell Therapeutics, of Seattle, in a stock-based deal initially valued at $236 million.

It is difficult to predict whether other biotechs will follow the example of the likes of BioXell. The gap between these companies, all of which were established with relatively broad clinical development pipelines, seasoned management and access to international venture capital finance, and the rest of Italy’s fragmented and, for the most part, undercapitalized biotech industry has been considerable. The great challenge for the sector has been how to close that gap. Regional authorities, notably in Lombardy and Piedmont, where the bulk of Italy’s 160 biopharmaceutical firms are based, are actively involved in promoting biotech through funding technology transfer agencies, incubators and seed funds (Box 1). Newer initiatives have sprung up in Tuscany and in Sardinia too.

“Clearly the lack of Italian specialist venture capital funds is a problem,” says Sinigaglia. However, individual companies are pursuing alternative funding models. Some, most notably MolMed, have managed to raise cash directly from financial institutions and private investors. MolMed is located in the San Raffaele Science Park, adjacent to the San Raffaele University Hospital and the eponymous Scientific Institute, the country’s largest private clinical research center. It has so far secured some €60 ($77) million by this route and may undertake an IPO during the first half of 2007. “We have a broad pipeline and we think we would be ready in the near future,” says Marina Del Bue, general manager at Milan-based MolMed, which is developing cell-based therapies and biotech drugs for cancer.

Elsewhere, investors in Genextra, a holding company with a controlling interest in four companies, agreed this summer to double their commitment, to €60 ($77) million, following their participation in a $41-million investment round in Intercept Pharmaceuticals, a company headquartered in New York City but based on research into the bile acid–activated nuclear receptor farnesoid X performed at the University of Perugia. “Although it is supplying mentoring and administrative support, Genextra is neither an incubator nor an investor. We are not an investment fund. We are a biotechnology group,” says Paolo Fundaro, Genextra chief financial officer.

Milan-based Genextra has high visibility in Italy because its backers include leading entrepreneurs and industrialists, such as its founder, telecoms entrepreneur Francesco Micheli; Marco Tronchetti Provera, chairman of Pirelli & Telecom Italia; FIAT chairman Luca Cordero di Montezemolo; and Diego Della Valle, CEO and chairman of the luxury shoemaker Tod’s. The model is borrowed directly from that of another Micheli-led enterprise, the internet and telecoms group eBiscom, now FastWeb, which raised $1.5 billion at the beginning of the decade. Its progress, along with that of BioXell—now the country’s flagship biotech firm—could help to shape investor sentiment toward the sector.

Assobiotec, in Milan, which represents the industry, thinks the country’s new national government can help as well. One measure, says Assobiotec president Roberto Gradnik, would be to create a national agency for innovation that would support technology transfer and partnering. “At the moment, if anybody, such as a private investor, is interested in investing in biotechnology, they don’t know where to go,” he says. Risk-averse Italian investment funds might engage with the sector if a ‘guarantee fund’ were put in place—a sort of voluntary insurance scheme that would allow venture capital funds to offset their investment losses against profits on more successful ventures.

Assobiotec is also trying to persuade the government to adapt to the Italian tax code the ‘Young Innovative Company’ concept, originally developed in France to provide tax breaks and other fiscal supports to research-intensive startup companies. Italy had a change of government in May. In the new cabinet, led by Prime Minister Romano Prodi, responsibility for innovation policy was transferred from the research ministry to the industry ministry, headed by Pier Luigi Bersani. Gradnik interprets this as a positive move. But, says Sinigaglia, a real shift away from manufacturing and toward a knowledge-based economy still needs to happen. “We need to see government commit to that switch.”

Cormac Sheridan, Dublin

Box 1 Italian biotech park taps into traditional industries

Italians are often praised for making up for the deficiencies of their country—burdened by bureaucracy and a lack of flexibility—with individual creativity. The Canavese Bioindustry Park may be proof that there is some truth in this cliché. The park is located near the northern city of Turin, and its creation was supported in the 1990s by the Piedmont region with the aim of reinforcing the high-tech dimension of the local economy after a major crisis. As a result, the park’s shareholders are 70% public and 30% private.

Since 2004, seed capital for startups has been available thanks to a financing model based on the business angel concept, devised to bypass the lack of interest from venture capital investors in early-stage projects. “We collect money from wealthy people with no experience in biotech, such as local small entrepreneurs in the textile or mechanical sector, lawyers or accountants. Before meeting us they never thought of becoming business angels,” says Silvano Fumero, who conceived the park when he was still head of R&D at Serono.

Thirty people gave a total of €3 ($3.8) million, each contributing a small sum and becoming founding members of a seed capital society called Eporgen Venture. “We are hopeful in a couple of years the most promising newborn companies may attract investments from [the] biggest players, maybe one [of] the international venture capitalists we have involved in the selection of projects,” explains the park’s project manager, Fabrizio Conicella. The birth rate is unusually high by Italian standards, with five new startups born last year and the intention of starting another five companies by mid-2007.

The initial success of the project is already a blow against the cultural and political foot-dragging of the country, but is it replicable? “We are actually examining the way to implement a similar model in the Rhône-Alpes region [of France] but it’s not so easy,” says Valérie Ayache, managing director of Adebag, the biotech association near Grenoble. She points out that the motivations of people investing in Eporgen are very much tied to the history of the territory, the charisma and experience of the project’s founders, and the very integrated model they have created between the park and Eporgen. Other Italian regions are trying to learn a lesson from the Canavese experience, too: until now biotech has played a minor role in the national business angels network (Iban), but its secretary general, Tomaso Marzotto Caotorta, thinks it’s time to create a club of senior managers scouting Italian life sciences institutes for innovative ideas.

Anna Meldolesi, Rome




200% ethanol boost from Oz sugar

University of Queensland (UQ), Australia, molecular geneticist Robert Birch got more than he bargained for when he introduced a bacterial gene into sugarcane to convert sucrose into its high-value isomer, isomaltulose. In fact, the gene encoding sucrose isomerase, cloned from Pantoea dispersa, a harmless colonist of the crop’s leaves, delivered twice as much sugar as Birch expected. Some of the transgenic plants were producing isomaltulose at up to 110% of the normal concentration of sucrose. Others produced little or no isomaltulose but yielded up to 100% more sucrose. In the past 50 years, breeders had been unable to improve sugarcane’s yield by even 1%. Last August, CSR, Australia’s biggest sugar refiner, and its commercial partner, UQ’s commercial arm UniQuest, received an AUD$5 ($3.8) million federal research grant, under AusIndustry’s Renewable Energy Development Initiative, to develop Birch’s high-yield sugarcane, dubbed ‘SugarBoost’, as a source of the ‘green’ fuel ethanol. The partners recently planted the first, small-scale, contained field trial of the transgenic sugarcane, with approval from the Office of the Gene Technology Regulator. Queensland University of Technology molecular geneticist James Dale, also founder and CEO of Brisbane-based ‘biopharming’ company Farmacule, describes the development as “huge” in terms of its significance to Australia’s nascent ethanol industry. Indeed, Australia has been battling to keep its sugar industry alive in the face of cheap sugar from Brazil. Dale adds that it might eventually be possible to engineer similar yield increases in other ethanol feedstock crops, like sugar beet and maize. GON

Senate compromise on SBIR reform

A bill has been approved by the US Senate committee on small business and entrepreneurship that would allow companies that are primarily owned by venture capitalists (VCs) to obtain small business innovation research (SBIR) grants. Since 2003, companies whose majority investors are VCs have been ineligible for SBIR funds. Still, companies with some VC investment have been able to access the grants, according to a General Accounting Office (GAO) report released in the first half of this year. In 2004, about 22% of National Institutes of Health SBIR grants, or $127 million, went to companies in which VCs held a minority stake, the report found. The Small Business Administration Reauthorization bill includes an amendment that would commit one-quarter of SBIR funds to companies that are majority-backed by VCs. “We’re supportive of the compromise and we look forward to working with the Senate,” says Alan Eisenberg, the executive vice president for capital formation and business development at the Washington, DC-based Biotechnology Industry Organization. The bill now goes to the full Senate for consideration. StL

Europe backs ES cells

After a heated debate, the EU voted in late July to continue funding embryonic stem (ES) cell research, but with narrower criteria. The EU council agreed to continue to support research on ES cells, but not their procurement, which often requires destruction of the embryo. Several countries with strict laws on stem cell research, notably Germany, attempted to block the decision. The funding is part of the EU’s €72.7 ($93) billion research budget for 2007–2013. The vote came just days after US President George W. Bush blocked the passage of a bill that would have allowed federal money to fund similar stem cell research in the US. The bill would have supported use of embryos destined for disposal at in vitro fertilization clinics. The contrasting decisions of the EU and US may give European biotechs an edge in recruiting scientists, experts say. “This is a missed opportunity for the US to assert leadership in the field,” says Michael Werner, president of the Werner Group, a Washington, DC-based biotech research consulting firm, and former chief of policy at Biotechnology Industry Organization, also in Washington, DC. “The EU is taking advantage of that.” EW

News in Brief written by Alla Katsnelson, Kim Griggs, Stacy Lawrence, Linda Nordling, Graeme O’Neill, Peter Vermij & Emily Waltz

Bioengineered scents available soon from New Zealand

The Horticultural and Food Research Institute of New Zealand—known as HortResearch—has filed patent applications for the use of the genes that produce the scent of green apples and red roses. Auckland-based HortResearch examined its databases of fruit genes and compounds to find the genes that encode the enzymes alpha-farnesene synthase (green apple scent) and germacrene D synthase (rose scent). HortResearch used its flavor compounds databases to build maps of hypothetical pathways of how the resulting compounds are synthesized in the fruit or flower. These hypothetical pathways then allowed the scientists to postulate what types of enzyme might catalyze each step. The scientists then looked in their gene databases for genes that encode enzymes that can perform these steps. Likely genes were tested in Escherichia coli to see if they did produce those compounds and then in model plants. To manufacture the enzymes, HortResearch uses biofermentation. “What we are suggesting is that you could actually use real enzymes from the plant,” says HortResearch scientist Richard Newcomb, “and it’s even more ‘nature identical’.” Steve Meller, head of Global Biosciences at Procter & Gamble, located in Cincinnati, Ohio, believes that a technological process that could cost-effectively produce the flavors and perfumes manufacturers need would be a benefit. He adds: “The really desirable odorants out there are those that are much more complex, so I think that’s really where the hurdle is going to be.” HortResearch’s work of producing flavors and fragrances is the flipside of the work done by Californian company Senomyx, which focuses on the receptors that enable humans to perceive taste (Nat. Biotechnol. 22, 1203–1205, 2004). KG


UK panel urges tightening of phase 1 rules

British pharmaceutical industry organizations say they are generally pleased with draft recommendations by a scientific expert panel for tighter rules governing phase 1 trials of “novel and potentially higher risk drugs…such as monoclonal antibodies.” The UK government convened the panel earlier this year when six volunteers experienced very serious adverse effects from TGN1412, a T cell–targeting ‘super monoclonal antibody’ with agonist activity developed by TeGenero of Würzburg, Germany (Nat. Biotechnol. 24, 475–476, 2006). In an interim report released on July 25, 2006, the panel said that in higher risk studies “the first dose in man should be given to one person only, leaving sufficient time for any adverse reaction to develop before further administration or administration to additional people.” The experts urged drug developers to inform regulators earlier about elevated risks and suggested enrolling people with the targeted disease rather than healthy volunteers into phase 1 trials of higher risk drug candidates, “particularly if the drug is expected to affect the immune system.” The recommendations generally echoed those published a day earlier by a joint task force of the Association of the British Pharmaceutical Industry and the UK BioIndustry Association, both based in London, including the proposal to set starting doses in first-in-man trials of biologicals below a point at which no biological effect is expected. The industry task force, however, limited some of its advice to “novel agents stimulating the immune system,” excluding from extra scrutiny agents with inhibitory effects. Such agents, the task force writes, “are widely used” and “rarely have acute adverse effects.” The expert panel is due to issue its final report in November. PV

GM sorghum stalled in SA

In July, the South African government rejected an application to conduct field trials of genetically modified (GM) sorghum on its soil—research that received $16.9 million from the Bill and Melinda Gates Foundation. The decision, which received a lot of media interest throughout Africa, was based on a judgment that the containment level proposed was too low for a native African plant. Similar concerns about contamination of native plants have been raised in Mexico in the past, as the country tried to develop GM corn (Nat. Biotechnol. 23, 6, 2005). Gatsha Mazithulela, executive director of the biosciences arm of the Council for Scientific and Industrial Research (CSIR), located in Pretoria, says the rejection, far from destroying the public image of biotech, actually could inspire confidence. “It’s giving a clear message that the South African GM [organisms] legislation is working and if you don’t submit the right application you won’t go through,” he says. “The issue here is that sorghum’s center of origin is Africa and that’s why there’s a cautious approach,” explains Jocelyn Webster, executive director of AfricaBio, a nongovernmental organization supporting research, development and application of biotech in Africa, adding, “There’s more information required by the applicants, and I suspect that there will be the usual process followed by the regulators.” Meanwhile, researchers working on the sorghum project are hopeful that a second application, which proposes higher containment levels, will be accepted before the end of this month. LN

TKT infringes Amgen EPO patents

A US federal appeals court ruled on August 3 that Cambridge-based company Transkaryotic Therapies (TKT), acquired last year by Shire, has infringed two patents held by Amgen for the production of erythropoietin. The ruling effectively bars sales of TKT/Shire’s EPO product, Dynepo, in the US until the patents expire in 2015. However, the court also ruled one of Amgen’s patents invalid and sent another claim back for review. Although both sides won two battles, notes Kevin Noonan, partner at the law firm McDonnell Boehnen Hulbert & Berghoff in Chicago, Illinois, Amgen won the war. “In the grand scheme, the patentee only has to win one” to demand an injunction against the competitor. But such rulings give “some certainty to how these claims can be interpreted,” he added, and can be a “spur for other [companies] to figure out how to get around them.” An additional suit is pending against Swiss company Roche, which has plans to sell its new product CERA, a long-acting EPO, in the US. The ruling “could widen the window of opportunity for Roche to craft an infringement defense for CERA that capitalizes on these new interpretations,” writes David Witzke, biotech analyst at Banc of America in New York, in a research note. Amgen’s patents expired in Europe in 2004, and European sales of Dynepo are slated to launch this year. CERA was submitted to the European Medicines Agency for approval in April. AK

New products table

Lucentis (ranibizumab injection) Genentech (S. San Francisco, California)

On June 30, the US Food and Drug Administration approved Lucentis, a recombinant humanized antibody fragment, for the treatment of neovascular wet age-related macular degeneration (AMD). An estimated 1.7 million people in the US suffer from severe AMD, the leading cause of blindness in the elderly. The ‘wet’ form of the condition is caused by growth of abnormal blood vessels that leak fluid and blood, leading to retinal scarring. Lucentis inhibits the activity of the angiogenesis protein human vascular endothelial growth factor A (VEGF-A). In trials, Lucentis maintained vision or partially restored lost vision in most wet AMD patients. Recommended treatment is intravitreal injection administered once a month.

Elaprase (idursulfase) Shire Pharmaceutical Group (Basingstoke, UK)

On July 24, the FDA approved Elaprase, an enzyme replacement therapy for the treatment of Hunter syndrome. Also called mucopolysaccharidosis II, Hunter syndrome is a life-threatening X-linked recessive condition resulting from absence or insufficiency of iduronate-2-sulfatase, causing the accumulation of cellular waste products in tissues and organs. The condition affects about 1 in 65,000 to 132,000 births. Elaprase, the first-ever treatment for Hunter syndrome, is administered in weekly infusions. It is also under review by the EMEA in Europe.

AK


Drug delivery is a dicey business, and Abe Abuchowski was one of the first to make it work. At the dawn of the biotech industry, proteins’ promise as therapeutics was undisputed. But taken from animal or recombinant bacterial sources, their therapeutic potential was often undone by high immunogenicity. Short circulating life posed an additional problem—frequent doses were required to maintain therapeutic levels, again increasing the likelihood of an immune response.

As luck would have it, when Abuchowski began his doctoral research in biochemistry in 1971, his thesis advisor at Rutgers University, Frank Davis, put him to work on this very problem. A few years back, Davis had happened upon a paper suggesting that poly(ethylene glycol) (PEG), a polymer widely used in foods and cosmetics, could provide a solution. Initial studies indeed showed that “hanging a bit of PEG” onto a protein reduced immunogenicity and improved circulating life, recalls Davis, and along with two colleagues he patented a technique for PEG-protein delivery.

Within a few years Abuchowski and his colleagues hit the jackpot when looking for a general method for attaching PEG to a protein: a formulation of PEGylated bovine serum albumin. This was the first protein molecule created that was neither immunogenic nor antigenic. “It was a real Eureka moment,” says Abuchowski. “Even after we did it we couldn’t believe it, quite honestly.” More importantly, the researchers went on to show in mice that a PEGylated protein could cure a previously untreatable enzyme deficiency.

At a time when researchers were just beginning to venture into the commercial side of discovery, Abuchowski was happy to take the leap. “I think Abe very quickly saw the business applications,” says Davis, who is now retired. In 1982, the duo formed Enzon Corporation in New Jersey to bring PEG-based treatments to the clinic. In 1990 the company’s first product, PEGylated adenosine deaminase enzyme (ADA), known as Adagen, gained US Food and Drug Administration (FDA) approval—making Enzon the fifth company to have a biotech drug approved. Inherited absence of ADA had recently been found to cause one type of severe combined immunodeficiency disorder. Without PEG, ADA has no therapeutic effect. Four years later, the company received approval for Oncaspar, PEGylated L-asparaginase for acute lymphoblastic leukemia. “A company doesn’t exist to do research, but to get products on the market,” says Abuchowski.

The decision to go after two products with almost no market was a deliberate one. “I think Enzon was pretty smart,” notes Roger Harrison, an associate at Plexus Ventures, a global pharma consultancy based in Maple Glen, Pennsylvania, and an independent consultant specializing in drug delivery. “There’s an established belief that anything you do [to a protein] will create a problem with the FDA,” he says. But both Adagen and Oncaspar minimized this added uncertainty because both were made possible by the technology, and both approval processes could be expedited by orphan drug status. Even with Enzon’s irrefutable clinical data on Adagen, Abuchowski notes, “up until the day of [FDA] approval, I probably had half of Enzon management betting against me.” Ultimately, getting the two products out in quick succession essentially proved the technology.

Meanwhile, big pharma was beginning to appreciate PEG’s potential. Enzon signed a deal with Schering-Plough to develop a PEGylated version of alpha-interferon (PegIntron) for treating hepatitis C. But as Enzon’s management waited to see whether the project would succeed, resources dwindled, stock price fell and disagreement began to brew. A messy restructuring ensued, its outcome being a much-diminished R&D program and Abuchowski’s departure—not just from Enzon but, for a time, from biotech.

PegIntron’s approval in 2001 pushed Enzon into profitability, and also marked the first time that a second-generation protein proved superior to the first generation thanks to a biotech improvement. Within a year, it had captured about 65% of the market share of a protein that had already been on the market for over a decade. “Had Enzon had the ability to prepare their own proteins and chosen more of a therapeutics model than a drug delivery model, they could have done alpha-interferon on their own,” notes Robert Shorr, who served as vice president of research and development from 1991 to 1997. Shorr is now CEO of Cornerstone Pharmaceuticals in New York. He also serves as a scientific advisor to Abuchowski’s new company, Prolong Pharmaceuticals in Monmouth Junction, New Jersey.

Abe Abuchowski
As CEO of one of the first companies to make protein delivery into a profitable business, Abe Abuchowski knows what it takes to bring a new technology to market. Although his technology—PEGylation—is now considered an industry gold standard, its three-decade development history illustrates the often rocky path to commercial success for platforms.

By all indications, Abuchowski is not about to make the same mistake twice. “Enzon was a company that developed the technology and introduced it,” he says. “Prolong is a product company.” Part of the plan is to realize some of the projects that languished in Enzon’s deep-freezer. But with several biotech drugs coming off patent and manufacturing costs falling, Prolong is also looking to Asia. For Abuchowski, one of the lessons of Enzon was that the sooner you get to revenue, the more freedom you have to decide where to go next. “There’s an alignment in philosophy between Abe and many players in Asia,” notes Gurinder Shahi, director of the Global Biobusiness Initiative at the University of Southern California in Los Angeles. Unlike in the United States, “in Asia there is no risk capital, so companies are forced to use a quick-to-revenue strategy and use that revenue to make a product.” Because the technology is now well established, it creates a proprietary dimension to generics.

In the five years since the technology’s acceptance became official with the approval of PegIntron, about ten other PEG products have come to market, with several more in trials. Yet other technologies are emerging. Even for old products, notes Walter Blatter, CEO of ImmunoGen in Cambridge, Massachusetts, protein modifications other than PEG, such as the two additional N-glycosylation sites in Amgen’s long-acting erythropoietin Aranesp, can increase the circulation life of a molecule. Whether PEG has the capacity to surpass its primary role as a second-generation modification remains to be seen, says Samuel Zalipsky, associate director of protein and linker chemistry at ALZA in Mountain View, California. He concludes: “It’s still got some life in it, though it’s not the only game in town as it was in the past.”

Alla Katsnelson, New York



Biotech R&D goes further afield
Stacey Lawrence

The governments in New Zealand, Korea and Canada are placing big bets on their biotech sectors. But the United States continues to dwarf other countries in terms of total investment; last year, the US public sector spent $30 billion on the life sciences, with the private sector contributing $18 billion. Biotech firms increasingly dominate deal-making compared with big pharma, which now contributes just under half of the funding for all deals; the value and number of these biotech partnerships remained buoyant.

US life sciences federal research funding
The life sciences continued to account for 54% of federal R&D funding for the third year in a row. [Chart: life sciences funding ($ billions) and life sciences funding as a percentage of total federal funding, 1983–2004.] Source: National Science Foundation

Biopharmaceutical research and product alliances
Windhover’s data reports $12 billion in R&D deals; Burrill pegs the figure at a more outstanding $17 billion. [Chart: total deal value ($ billions) and number of deals, 2001–2005.] Windhover data include any deal involving a biotech firm and use only the first indication figures (if provided), whereas Burrill looks at all potential products and focuses on the research money going to biotech firms. Source: Windhover, Burrill & Company

Top 20 pharma as portion of R&D deals
The last three years have seen a rapid decline in the share of biotech R&D deals conducted by big pharma. [Chart: top 20 pharma–biotech deals as a percentage of total, by deal number and deal value, 2001–2005.] Source: Windhover, Burrill & Company, Nature Biotechnology

Number of partnership deals by stage
Very late stage and discovery deals actually declined last year, whereas all other categories held steady or increased. [Chart: number of deals by stage—discovery, preclinical, phase 1, phase 2, phase 3, filed for approval, approved, marketed—for 2002–2005.] Source: Windhover, Burrill & Company

R&D spend on biotech from business in OECD countries
Iceland, Denmark and New Zealand are pouring anywhere from one-quarter to one-half of all their business R&D money into biotech. [Chart: biotech R&D spending by companies ($ millions) and biotech R&D as a percentage of all business R&D spending, by country.] Based on earliest year available, 2002, 2003 or 2004. Source: Organisation for Economic Co-operation and Development

International biotech public sector R&D spending
New Zealand, Korea, and Canada have devoted the largest share of their public research funding to biotech. [Chart: biotech public R&D spending ($ millions) and public biotech R&D as a percentage of total, by country.] Includes government and higher education biotech R&D spending. Based on earliest year available, 2002, 2003 or 2004. Source: Organisation for Economic Co-operation and Development


Systems Biology, Incorporated?
As the first ‘systems biology’ companies achieve some measure of success, the question remains whether systems biology can provide a viable business model. Karl Thiel investigates.

In June, privately held VLST Corp. in Seattle announced that it had raised $55 million in a Series B venture financing round. At the time, that was reported to be the 16th largest venture capital deal of the year across all industries1—no small feat for an upstart biotech when even established players in the field were finding it tough to curry favor with investors. But the deal was particularly remarkable for another reason.

VLST (which derives its name from Viral Logic Systems Technology) is the first company to graduate from Seattle’s Accelerator Corp., a venture-backed life sciences incubator formed in 2003 by a group of venture investors—MPM Capital, Arch Ventures, Versant Ventures, Alexandria Real Estate Equities and later OVP Venture Partners and Amgen Ventures—in conjunction with the Institute for Systems Biology (ISB), of Seattle. As such, the recent financing seemingly marks a victory for the young ISB and a new milestone in the commercialization of systems biology—a group of marquee venture capitalists (VCs) putting major money into a very early-stage platform technology company at a time when most venture capitalists are avoiding biotech altogether or are trying to reduce risk with various accelerated commercialization strategies2,3. Is systems biology finally coming of age?

Systems biology catches on

Certainly, systems biology as a discipline has gained in popularity over the past several years. Since the ISB was founded in 2000 by Leroy Hood, close to a dozen independent systems biology institutes have been created around the world, and many more universities have created systems biology departments. But the commercial success of the discipline has thus far been ambiguous.

Systems biology ideally seeks to understand complex biological systems in their entirety by integrating all levels of functional information into a cohesive model. That stands in contrast to the reductionist approaches that became standard in the twentieth century, with biologists teasing out functional information on organisms one gene or one protein at a time.

Strategies for systems biology vary, but generally come down to some combination of bottom-up data collection (for instance, amassing comprehensive information on an organism’s genome, proteome, “transcriptome,” “metabolome,” “interactome,” “transportome” and any other “-omic” approach, at all possible levels of complexity) and top-down computational modeling and simulation, in which known functions and behaviors of biological components are described mathematically and linked into complex models that allow for the dynamic interaction of large numbers of variables. Hood insists that both approaches are necessary, and that true systems biology requires, whenever possible, a global attitude toward data collection.

Some companies are indeed using both top-down and bottom-up approaches to discover new knowledge, but many have focused on modeling and simulation, gathering their basic data from scientific literature or collaborative partners and tacitly accepting the greater rates of error or uncertainty that go with incomplete understanding of an organism’s constituent parts and how they interact.

But there are more than technical challenges to pursuing a systems approach to biology. Because it requires understanding of biological function from phenotype down to the molecular or even atomic level, a systems approach requires biologists, chemists and physicists of many stripes. And because of the intense data collection, processing, modeling, and simulation required, systems biology also requires computer scientists, mathematicians, software engineers and other people not usually found in university biology departments. Hood left the University of Washington in Seattle to found ISB because he believed that he couldn’t effectively build the necessary cross-disciplinary teams in a university environment, nor find the financial backing necessary to create the required infrastructure. And he believed that systems biology would produce an enormous amount of valuable intellectual property that could be better managed outside a university setting.

That would seem to make systems biology better suited for a private, commercial enterprise. But there are challenges here, too. Theoretically, a systems approach to understanding and treating human disease should identify the best means of therapeutic intervention. But others will still need to translate that information into an actual therapy. Therefore, systems biology sounds like just one more ‘tool’ strategy for drug discovery and development—a platform that will feed new targets, or new interventional strategies, to drug makers. And that’s exactly what VCs don’t want to hear right now.

“That’s the noninvestable model,” says Carl Weissman, president of Accelerator and a venture partner at MPM Capital of Cambridge, Massachusetts. “That’s what VCs are not interested in—people who are trying to build up some sort of a stacked royalty and services business.”

That would seem to put the young companies trying to commercialize systems biology into a tough position. How do they create a winning

[Photo: The Institute for Systems Biology (ISB) brings together researchers from diverse backgrounds and provides a place for interdisciplinary work outside the usual academic setting.]


business model in a field that is still struggling to define itself?

Enter Accelerator

Unlike most incubators, Accelerator was formed specifically to nurture startups that would benefit from an affiliation with ISB, and to provide management support and equity-based backing along with the more typical facilities and infrastructure addressed by many life sciences incubators.

Indeed, says Weissman, Accelerator was initially conceived as a vehicle to specifically nurture ideas spun out of the ISB, as an early-stage testing ground where for a relatively small investment—usually in the $2 million to $5 million range—new concepts could either prove worthy of significant further investment or be shut down without major loss. That’s still the idea, but Accelerator’s reach has widened.

VLST, for instance, did not come out of ISB, but rather was founded by Craig Smith and Steve Wiley, two scientists who most recently came from Immunex/Amgen of Thousand Oaks, California. Smith codiscovered the rheumatoid arthritis drug Enbrel (etanercept) while at Immunex.

The company’s platform technology is based on using virulence factors found in various viral genomes as a guide to drug targets for autoimmune and chronic inflammatory diseases. According to VLST president and CEO Martin Simonetti, Smith hypothesized that many viruses rely on these secreted proteins to slow or evade the immune system and thus gain a foothold in their host. For instance, he says, Smith found that many viruses encode a protein that “looks a lot like” the p75 tumor necrosis factor (TNF) receptor—a recombinant form of which ultimately became Enbrel. At the same time, the p55 TNF receptor, which some researchers investigated as a potential drug, did not prove effective. After retrospective analysis, Simonetti says, viral genomes may explain why: “We couldn’t find any viruses that coded for p55.”

The idea that virulence factors could lead to what he calls “prevalidated” targets was retrospectively validated not only with Enbrel, but with other targets like interleukin (IL)-1 and CD30, he says. “If you knew what the virus was telling you, you would have saved yourself a lot of time and money in the clinic.” The company plans to use a bioinformatics approach to identify virulence factors in viruses, then use proteomics to identify the specific target, and then finally to create therapeutics to mimic the behavior of the virulence factors.

The $55-million Series B round is a big step up from the approximately $4.5 million the company initially got from Accelerator. “When I first joined, we weren’t going to raise anywhere near that kind of money,” acknowledges Simonetti. “But when we sat back and thought about it, the real transforming event is the proof-of-concept phase 2 clinical trial.” Thus, the Series B round, divided into three tranches, is intended to take the company through a phase 2a trial, at which point a successful outcome should make further financing relatively easy. VLST is, in short, aiming to be a fully integrated drug company and to simply bypass the whole ‘tool’ conundrum.

Ceci n’est pas Systems Biology

The only problem is, despite its affiliation with ISB, VLST is not really a systems biology company by most measures. It is not seeking a systems-level integration of the human immune system to better understand targeted diseases, but rather it is using the adaptive evolution of viruses as a shortcut guide through the darkness to better targets.

Hood takes it a step further and asserts that none of the companies at Accelerator—including one called Homestead spun out of his own lab—are really pursuing systems biology. “It’s still too soon,” he says. Homestead “is using systems thinking to identify biomarkers in the blood that may be useful in diagnostics. I think that kind of company has a chance of making a real contribution, but it’s not really a systems biology company—it just defines one aspect of a systems approach.”

The same goes for two other companies—MacroGenics of Rockville, Maryland and NanoString of Seattle—that were spun out of ISB but are not part of Accelerator, and indeed, by Hood’s standards, for most other companies claiming to be working in the space. “Any company that claims to be in systems biology is doing it on a very marginal basis,” he says. “Because we’re just now developing the necessary tools.”

But for those companies that are at least basing their businesses on modeling and simulation of complex systems, the problem remains—how do they successfully turn an essentially tool-oriented platform into a growth opportunity? For some companies, the answer has been to go after as much capital as possible and try to build a fully integrated drug company—an approach that requires finding willing VCs with long time horizons, a steep challenge these days. And one company initially pursuing this path, BG Medicine of Waltham, Massachusetts, actually switched from a drug discovery model to a service model4 in 2005. For others, slow growth and modest capital budgets have been the key.

One of the earliest simulation and modeling companies to begin operations was Foster City, California’s Entelos. Founded in 1996, it has created a series of ‘PhysioLabs,’ dynamic models of various disease states that integrate information down to individual protein interactions based on information derived from published literature, with behaviors represented as differential equations and linked into a simulated patient. Different ‘virtual patients’ can then be created to represent either known variations in—or uncertainty about—the underlying parameters. Groups of diverse ‘patients’ are then used to simulate various interventional outcomes.

Entelos has certainly had some tangible suc-cess both in terms of partnerships and in the fact that after raising about $50 million in venture capital, it went public on London’s Alternative Investments Market (AIM) in April, raising $20 million in its initial public offering (IPO).

But the company’s path has not always been smooth. Its approximately $78-million market capitalization upon IPO was only slightly more than the roughly $70 million it has raised in private and public rounds, suggesting that the market does not yet see a great deal of surplus value in what the company has created with its capital.

Part of that could come down to the revenue model, which has thus far been mostly based on some form of fee-for-service compensation. But the company is now expanding its deal structures to take a greater stake in some of its projects. In February 2005, Entelos announced it had expanded a collaboration on rheumatoid arthritis therapies with Organon of Oss, The Netherlands, into a codevelopment, comarket-ing deal that gives it a bigger piece of the possible upside.

“Royalty deals are great, and we all want to get them,” says Entelos CEO James Karis, “but get-ting single digits on something that’s ten years out—I’m not sure that’s got a whole lot of value. But when you get to collaborate with someone and have the opportunity to codevelop and potentially comarket a drug, that’s a different level of value added.”

“We also in some cases own other aspects of the biology that come out of our relationships,” says chief technical officer Alex Bangs, not-ing that the company has filed for patents on potential drug targets it has identified through its simulations, which it could choose to later out-license or even develop.

The urge to move from service to products motivates more than just Entelos. San Diego’s Genomatica, a company that has used a systems approach in modeling microbes and mammalian cells primarily to help clients improve the production efficiency of chemicals and recombinant proteins, has like many systems biology companies raised very little venture capital (Table 1). After an initial $3.5-million round from Iceland Genomic Ventures in 2000, Genomatica has relied on organic growth and government funding to move its business forward.

NEWS FEATURE © 2006 Nature Publishing Group http://www.nature.com/naturebiotechnology

NATURE BIOTECHNOLOGY VOLUME 24 NUMBER 9 SEPTEMBER 2006 1057

But now, the company is migrating from what was essentially fee-for-service consulting work to product ownership. “I think the business plan has to be built around chemical and biological products,” says Christophe Schilling, Genomatica’s president and chief scientific officer. That means negotiating royalties from the sales of drugs and biologics that Genomatica helps clients produce, but could also mean outright ownership of some future projects.

Schilling acknowledges that when the company was starting out a few years ago, the technology was at an early stage and needed further development and validation before it would “present the kind of business case where we would want to raise tens of millions of dollars.” So the company instead opted for a business plan that didn’t require much capital and relied mostly on slow, organic growth. But now Schilling believes that Genomatica’s technology has proven its value and that he can offer investors a compelling growth opportunity.

“We’re at a point today where that scenario has definitely changed,” he says. Although he believes the company could be successful on a smaller scale with a low-capital, slow-growth approach, Schilling now sees reason to accelerate the process.

Still, not all companies want to create products. Gene Network Sciences of Ithaca, New York, is using a modeling approach to reverse-engineer experimental data into integrated models of complex biological networks, explains CEO Colin Hill. “We are definitely a tool company, a platform company. We are not trying to make drugs ourselves,” he says.

But he, too, feels the pressure to move towards a product focus. “We’ve had various people discuss that with us—’Why don’t you become a drug maker?’ And I honestly think that’s a horrible idea, at least right now,” he says. “Many young companies even now get forced into doing that,” he says, when they have no particular competitive advantage—indeed, many considerable disadvantages of scale—in drug development.

“Until our technology really demonstrates a huge, huge improvement in drug development success rates, I don’t see why it makes sense for a platform company to switch—unless they have no choice because the investors are pushing them to do that,” he asserts. Gene Network, even though it has raised almost $12 million from government grants and angel backers, does not have any institutional backing.

The companies that have combined a computer-based simulation and modeling approach to systems biology with internal drug discovery programs have, not surprisingly, raised more venture capital than their in silico-only counterparts, but it remains to be seen whether the advantages that a systems approach to biology bring are enough to overcome the challenges of establishing a new drug development organization.

At the same time, there’s something to be said for keeping companies small. Gerry Langeler, a general partner at Seattle’s OVP Venture Partners—one of the backers of VLST, Accelerator and ISB-spinout NanoString—echoes MPM’s Weissman when he says his firm is “relatively uninterested” in platform companies with variations on fee-for-service models, which he sees as being unlikely to reach a scale that will make institutional backing a worthwhile investment.

But that doesn’t mean companies in the space can’t succeed with that model. “I think there is sometimes a mistaken belief that unless you scale to a very large size, you haven’t been successful,” says Langeler. “But if you can build a $20-million-a-year company that’s throwing off $3 million to $4 million a year in profit, hats off to you. Don’t take my money, keep it for yourself! There’s a lot to be said for the modest-sized company that may never be the big home run but can be a very successful enterprise for the entrepreneurs and maybe a few small angel backers.”

That approach may also help young companies still seeking to prove the value of their platforms mature into something that can sustain large capital investment. “In this market, where the venture community is largely trying to de-risk their opportunities and look at tried-and-true products, it’s hard to imagine how you get many systems biology companies funded,” acknowledges Weissman. Tomorrow’s systems biology successes may have to work outside the system.

Karl Thiel, Portland, Oregon

1. Cook, J. Biotech startup VLST gets $55 million. Seattle Post-Intelligencer (June 16, 2006).

2. Lawrence, S. Bioentrepreneur, published online 22 December 2005 (doi:10.1038/bioent897).

3. Thiel, K. Nat. Biotechnol. 22, 1087–1092 (2004).

4. Hendrickson, D. Mass High Tech, published online 9 September 2005 (http://masshightech.bizjournals.com/masshightech/stories/2005/09/12/story5.html).

Table 1 Selected systems biology companies

Company/founded | Focus | Equity financing background

Entelos/1996 (LSE: ENTL) (Foster City, California) | Complex human disease models, PhysioLabs. | About $50 million in Series A–D rounds; $20-million IPO on London’s AIM market in 2006.

Genstruct/2001 (Cambridge, Massachusetts) | Drug discovery based on iterative wet lab data collection and in silico modeling. | Raised $6.5-million Series A round in 2003; none since.

Genomatica/2000 (San Diego) | SimPheny client-server application builds predictive models of organisms based on cellular metabolism. | Initial $3.5 million venture round from Iceland Genomic Ventures; none since.

GeneGo/2000 (St. Joseph, Michigan) | MetaCore platform integrates and visualizes cellular function data into complex models. | $1.4 million from Michigan Life Sciences Corridor; no institutional backing.

Ariadne Genomics/2002 (Rockville, Maryland) | “Natural language processing” and statistical algorithms. PathAssist software for visualization and analysis of regulatory pathways. | Various grants and government funding; no reported venture backing.

Gene Network Sciences/2000 (Ithaca, New York) | VisualCell data integration tool. Licensees include ISB. Builds predictive models of cells. | Some angel investor backing.

Ingenuity Systems/1998 (Mountain View, California) | Pathway analysis software. | Venture backers include Affymetrix as well as institutional VC firms.

BG Medicine/2000 (Waltham, Massachusetts) (Founded as Beyond Genomics) | Focus on systems pharmacology, including biomarkers for liver toxicity. Wet lab as well as in silico work. Switched from drug discovery to service focus in 2005. | Over $26 million in institutional and strategic funding.

BioSeek/2002 (Burlingame, California) | Human disease models, used for partners and internal discovery. | $8.4-million Series A (2002); $19 million in total private equity.

Target Discovery/2002 (Palo Alto, California) | Diagnostics based on protein isoforms. | $7-million Series A (2002–2003); none since.


Clinical trial data: to disclose or not to disclose?
Clinical trial databases are sprouting like weeds, but do they provide the information the public needs? Aaron Bouchie investigates.

On August 3, US Senators Enzi (R-WY) and Kennedy (D-MA) introduced legislation, which, if enacted, could help bolster the public’s confidence in the drug industry and the government agency that regulates it. The bill, called The Enhancing Drug Safety and Innovation Act of 2006 (S. 3807), calls for the establishment of a mandatory clinical trials registry and results database. In requiring that outcomes be included, such a registry differs significantly from the existing government database, ClinicalTrials.gov, which mainly lists ongoing clinical trials1. The Enzi-Kennedy bill is a response to increasing public distrust of the drug industry and its oversight by the US Food and Drug Administration (FDA) in the wake of recent high-profile drug safety debacles (Box 1).

But the bill has sparked controversy. Public advocacy groups say it does not go far enough, whereas critics from industry say that releasing clinical trial data is unnecessary and may actually stifle innovation. As outlined in the Enzi-Kennedy bill, however, greater transparency of clinical trial data appears to offer little threat to drug developers, as the most sensitive business information—that on early-stage, exploratory trials—will remain in companies’ hands.

The current approach
The FDA Modernization Act of 1997 (FDAMA) required the US Department of Health and Human Services to set up a registry of clinical trials “of experimental treatments for serious or life-threatening diseases or conditions2.” To achieve this, the National Library of Medicine (NLM) launched ClinicalTrials.gov in 2002, the primary purpose of which is to help patients and physicians find information on nearby clinical trials. According to the NLM website, the registry currently contains ~31,700 clinical studies in over 130 countries.

Many believe that, although ClinicalTrials.gov is a good start, it could (and should) do more to benefit the medical community. In September 2004, the International Committee of Medical Journal Editors (ICMJE, a small working group of journal editors) called for the registry to include all trials that test for efficacy (excluding only those early-stage trials that test for safety), not just those that are experimental and for life-threatening diseases3. All 11 ICMJE member journals now require a trial to be registered at or before the onset of patient enrollment in order to be considered for publication. This is no idle threat considering heavyweights such as The Lancet, The New England Journal of Medicine and the Journal of the American Medical Association are members.

Most pharmaceutical and biotech companies have complied with the ICMJE’s policy, not just so they can get published in reputable journals, but also because excluding early-stage trials means that sensitive business information is not revealed. In explaining this exclusion, Alan Goldhammer, vice president of regulatory affairs at the industry group Pharmaceutical Research and Manufacturers of America (PhRMA) in Washington, DC, points out that phase 1 trials are exploratory, or “hypothesis-generating,” and the drugs being tested in this early stage are still far from regulatory approval. “If you’re breaking ground in a new therapeutic area, then listing phase 1 trials would be telling the competition what you’re doing,” says Goldhammer (Box 2).

For this reason, many in the drug industry believe the recommendations published in May 2006 by the Geneva-based World Health Organization (WHO)—which state that all clinical studies should be registered, including phase 1 trials—would stifle innovation4. Furthermore, as most phase 1 trials are small and use healthy volunteers, sick patients wouldn’t need to know about them. “It is unclear how disclosing active phase 1 trials would benefit patients,” says Goldhammer.

Following PhRMA’s lead, the Biotechnology Industry Organization (BIO) in Washington, DC, encourages all of its members to register all “hypothesis-testing” trials (that is, late-stage, some phase 2 and all phase 3) on ClinicalTrials.gov. Most drug developers interviewed for this article agree with this strategy, although some still submit only those that are required by law.

One notable exception is GlaxoSmithKline (GSK; Brentford, UK), which was sued in June 2004 by New York Attorney General Eliot Spitzer for suppressing negative results from clinical trials of the antidepressant drug Paxil (paroxetine hydrochloride) in adolescents. GSK makes public all of its active clinical trials, including phase 1, on ClinicalTrials.gov. “We have decided to include all phase 1 trials in the public registry to support the movement of transparency, which was led by WHO and ICMJE,” explains Rick Koenig, GSK’s vice president of R&D communications.

GSK’s policy, though admirable, is not required by law, nor will it be if the Enzi-Kennedy bill passes. Although the new bill would require the registration of all late-stage trials, not just those that are experimental and for life-threatening diseases, it does not go so far as to require companies to report early-stage trials. The senators clearly have listened to industry’s interests by not including phase 1 trials in the bill’s registry requirements.

The call for greater transparency
A few months after Spitzer sued GSK, Merck of Whitehouse Station, New Jersey, voluntarily withdrew its anti-inflammatory drug Vioxx (rofecoxib) from the shelves because it increased the risk of heart trouble. Although they created bad publicity for the pharmaceutical industry, these two events also highlighted apparent deficiencies in the FDA’s post-marketing surveillance processes. The industry responded by launching a number of databases of clinical data on their marketed products in hopes of improving its public image (Table 1 and Supplementary Table online).

Clinical data can also be submitted to ClinicalTrials.gov, for example, by linking to a published journal article, although such disclosure is not required by law. The Enzi-Kennedy bill, if passed, would mandate disclosure of results from some late phase 2 trials and all phase 3 and 4 trials. Under the legislation, failure to comply has dire consequences—it could hold up drug approval or the release of funds to trials funded by federal agencies.

Senators Mike Enzi (left) and Edward Kennedy introduced legislation in early August that would mandate the public disclosure of late-stage clinical trial data. (Newscom/UPI Photo/Kevin Dietsch)

Even so, some patient advocates believe this bill does not go far enough, and that complete data transparency as soon as possible after the trial is completed is necessary to benefit patients. For example, a patient looking to enroll in a trial should be able to base the decision on existing clinical data for all products that are in trials, argues Sidney Wolfe, director of Public Citizen’s Health Research Group, a nonprofit watchdog based in Washington, DC. Wolfe also objects to a provision that allows a company to delay making trial results public for up to two years if it is applying for marketing approval or attempting to publish the data. “There needs to be the shortest amount of time possible between a trial ending and the data being made public,” says Wolfe.

Art Caplan, director of the University of Pennsylvania Medical Center’s Center for Bioethics in Philadelphia, says that such a database should be a requirement, not an option. He believes that companies have an obligation to make the data public. Patients entering clinical trials are promised that the results will be made known to help advance medicine, but companies often renege on that promise—especially if the results are negative. “This [database] fulfills companies’ promises to patients,” says Caplan.

But not everyone agrees. Henry Miller, fellow at the Hoover Institution and the Competitive Enterprise Institute in Palo Alto, California, thinks that concerns about drug companies obscuring negative results are exaggerated. “Except for offering a bonanza to plaintiffs’ attorneys trolling for business, the benefit of a publicly available database of clinical trial results would be minimal,” says Miller.

Nonetheless, some companies are already complying with the provisions in the bill through a voluntary database set up by PhRMA shortly after the Vioxx withdrawal. PhRMA recommends that its members make public “the results of all hypothesis-testing clinical trials…regardless of outcome.” Many big pharma companies, such as Lilly in Indianapolis, Indiana, Roche in Basel, Switzerland, GSK and AstraZeneca in London, publish such data on their websites as well (Table 1).

Another provision of the Enzi-Kennedy bill is the requirement for summaries along with the raw data. Such summaries are important, according to Greg Simon, president of the biomedical think tank FasterCures of Washington, DC, who says he is “more worried about burying the public in data sets and statistics.”

Debra Aronson, director of BIO’s bioethics committee, believes that data is best presented to patients through peer-reviewed journal articles, but if not there, then the results and a summary should be verifiable before going into a public database. “I think there should be a peer-review process for such a database. I know some don’t like that answer, but that would be best,” says Aronson.

Share and share alike
Most agree that making phase 1 data public would not help patients. As around 80% of drugs fail at this stage, and for many drugs, safety data are obtained by giving the drug to healthy volunteers, such data would not benefit the public. Merrill Goozner of the Washington, DC’s Center for Science in the Public Interest says that in some cases, such as hormone replacement therapies, efficacy might be seen in phase 1 trials. In these cases, Goozner believes that the data should be made public, but these instances are very rare.

Box 1 Enzi-Kennedy bill basics

In addition to the clinical trials registry and results database, the Enzi-Kennedy bill has three other elements:

It outlines a plan to improve post-market monitoring of drugs by the FDA and companies. Before a drug can be approved, a company will be required to submit a risk evaluation and management strategy (REMS) that will help the FDA respond to risks identified after a product reaches market. Noncompliance results in fines of up to $250,000 per violation. According to FasterCures’ Simon, the FDA should receive additional appropriations to take on this added authority, rather than relying on user fees.

It creates the Reagan-Udall Institute for Applied Biomedical Research, a public-private partnership that would foster the creation of a new generation of predictive tools to speed product development and increase safety. This institute would identify and coordinate research priorities and distribute grants. “The FDA analyzes drugs using technologies that are 20 years old,” according to Peter Pitts, president of the nonprofit Center for Medicine in the Public Interest. Thus, the institute would “help the FDA move to the edge of 21st century medicine.”

It increases transparency and predictability in the FDA’s process for screening advisory committee members for potential financial conflicts of interest. Last month, the FDA announced it was looking into how to improve the process. C-Path’s Woosley says that it would be difficult to get balanced opinions without bringing in people with industry experience. “If you want an expert opinion, then you want that expert opinion, no matter where the person works,” he says. AB

Box 2 What is competitive business information in clinical trials?

The typical clinical trial scenario—phase 1 for safety, phase 2 for toxicology and dosage determination and phase 3 for efficacy—has evolved over the years. Now, press releases come out daily that describe a drug in a phase 1/2 trial, or phase 2b, or some other name that further breaks down the stage of clinical development. When determining which trials harbor competitive business information, it may be more useful to think in terms of two categories: hypothesis generating (also called exploratory) and hypothesis testing (also called confirmatory or pivotal).

When a company is testing a drug, it performs lots of clinical trials to try out different delivery methods, indications and patient subpopulations. According to the Hoover Institution’s Miller, “At last count, on average the results of more than 70 clinical trials are submitted by a corporate sponsor to support a submission to the FDA for approval to market a new drug, but generally only two or three of these are ‘pivotal’ trials that provide the required definitive evidence of safety and efficacy.” In other words, the pivotal trials, which would most likely be phase 3 or late-stage phase 2, would be hypothesis testing. The other 67 or so trials, mainly phase 1 and early phase 2, would be hypothesis generating, and data from these trials would not have to be made public under the Enzi-Kennedy bill.

PhRMA’s Goldhammer says that companies aren’t as concerned about disclosing data from hypothesis-testing trials because they are “closer to the finish line.” At that point, the timeline to approval is not as long as when exploratory phase 1 trials are being done, so disclosure will have less impact on the company’s pipeline. Goldhammer also notes that companies have a fiduciary duty to investors and the Securities and Exchange Commission to disclose results of late-stage trials, because they can have a greater impact on the success of a company than phase 1 trials. AB

The WHO cited a more recent clinical trial disaster (Würzburg, Germany-based TeGenero’s phase 1 antibody trial, in which six healthy volunteers experienced severe immune reactions that have left some with lasting medical problems) as another reason why the public’s trust in the drug industry is waning and why full transparency is necessary to regain that trust.

Goozner believes the issue with phase 1 safety is an issue of communication among companies, more than an issue of making data public. The biotech and pharma industries could benefit from sharing such data for all phase 1 failures, even when the effects are not as drastic as those seen in the TeGenero study, by eliminating the duplication of dead-end studies. By sharing some details of their failures, companies would help the knowledge base of medicine grow much faster, and the drug industry as a whole would become more efficient. “Competition in business is understandable, but science doesn’t work that way. Failures advance the field,” says Goozner.

Ray Woosley, head of the Critical Path Institute (C-Path) in Tucson, Arizona, agrees that companies need to learn from each other’s mistakes. “C-Path was created to do just that,” he explains. Woosley points to the institute’s Predictive Safety Testing Consortium as an example of such collaboration. Although the consortium was just launched in March, already 14 companies are working on a ‘precompetitive’ way of developing better preclinical safety tests. And getting companies to collaborate in a similar way on phase 1 data would be the next step, says Woosley. “If companies find the current efforts good, then I will approach pharma about clinical data,” he says.

Caplan believes mishaps could be avoided by the FDA if it were given a little more money to be more vigilant. “The FDA by law gets phase 1 safety stuff and they should be much more aggressive about sharing it with others even if corporate or researcher secrets are jeopardized,” says Caplan. “In terms of human subjects, they [the companies] should understand the point of the study is to generate safety information and how that will be shared with the FDA—that is the goal of phase 1 studies—not [general] knowledge.”

BIO’s Aronson worries that such data sharing of phase 1 trials could harm biotech firms, however. “Biotechs rely on venture capital money, and venture capitalists are investing in intellectual property. It would be hard to get investors if all your development ideas were shared with your competitors,” she says. There is always a balance to be kept between the need to share information so that others can use it and learn from it and the need to keep some information protected so that the idea can be developed into an innovative therapeutic, she adds. But who determines which data are shared to help progress the field and which are kept protected? “Establishing that balance is sometimes difficult and often will depend on the timing of disclosures,” says Aronson.

Post-marketing blues
The decision to deemphasize disclosure of early-phase trial results in the Enzi-Kennedy bill not only mollifies company and investor concerns about competitiveness, but also may result in efforts being focused on what many see as the more serious problem. “The weakest part of regulatory oversight is once products get on the market,” according to Caplan. He cites the safety problems of Merck’s Vioxx and of a cardiac pacemaker from Minneapolis-based device manufacturer Guidant as examples of the FDA’s lack of teeth. “Anyone who thinks the current system is working is dreaming,” says Caplan.

In this respect, there already may be a solution in the wings. Woosley thinks the Agency for Healthcare Research and Quality (AHRQ), in Rockville, Maryland, would be ideal to fix the problem. AHRQ gets about $300 million a year to fund 11 Centers for Education and Research on Therapeutics (CERTs), which are congressionally mandated to perform post-marketing studies on drugs, such as head-to-head comparisons that companies tend to avoid. Data from post-marketing studies are not currently gathered and made public, according to Woosley, and CERTs could play a role in helping patients understand their therapies. “The FDA is a passive system, driven by what people bring to it,” he says. Right now only the FDA and companies are educating the public about drugs. Vioxx and Paxil have shown that this system isn’t nearly enough. “What is missing is a learned intermediary,” explains Woosley.

Aaron Bouchie, New York City

1. ClinicalTrials.gov. http://www.clinicaltrials.gov

2. FDA Modernization Act of 1997. http://www.fda.gov/cber/fdama.htm

3. De Angelis, C. et al. Ann. Intern. Med. 141, 477–478 (2004).

4. Sim, I. et al. Lancet 367, 1631–1633 (2006).

Table 1 Selected clinical trial results databases

Database/launch date | Organization | Description

ClinicalTrials.gov/2002 (http://clinicaltrials.gov/) | National Library of Medicine (NLM) | Mandatory registry of trials “of experimental treatments for serious or life-threatening diseases or conditions.” Companies can register other trials and submit results, but that is not required by law.

Clinical Study Results/2004 (www.clinicalstudyresults.org) | Pharmaceutical Research and Manufacturers of America (PhRMA, Washington, DC) | Voluntary results database of hypothesis-testing clinical trials regardless of outcome. Contains information on trials of over 200 drugs from about 50 companies.

SearchClinicalTrials.org/projected launch end of 2006 (www.searchclinicaltrials.org) | The Center for Information & Study on Clinical Research Participation (CISCRP, Dedham, MA, USA) | Provides access to multiple registries. CISCRP is a nonprofit with support from individuals, government and research institutions, foundations and corporations.

Eli Lilly and Company Clinical Trial Registry/2004 (www.lillytrials.com) | Eli Lilly and Company (Indianapolis, IN, USA) | Registers all its phase 2, 3 and 4 clinical trials at initiation and results of phase 1, 2 and 3 trials for all commercially marketed products when the drug is available for patient use. Posts any significant safety findings as soon as possible.

Clinical Trial Protocol Registry and Results Database/2005 (www.roche-trials.com) | Roche (Basel) | Registers all its phase 2, 3 and 4 clinical trials that are ongoing and data from phase 2, 3 and 4 ‘confirmatory’ trials.

AstraZeneca Clinical Trials/2005 (www.astrazenecaclinicaltrials.com) | AstraZeneca (London) | Registers all its ongoing hypothesis-testing trials and results from all hypothesis-testing trials for its marketed products.

GlaxoSmithKline Clinical Trials Register/2004 (http://ctr.gsk.co.uk/welcome.asp) | GlaxoSmithKline (Brentford, UK) | Holds data and summaries from all its clinical trials, including phase 1, for all marketed products. Results are posted for nonmarketed products if GSK sees a safety problem related to mechanism of action.

Sources: Organization websites and organization spokespersons.

NEWS FEATURE

©2006 Nature Publishing Group http://www.nature.com/naturebiotechnology


NATURE BIOTECHNOLOGY VOLUME 24 NUMBER 9 SEPTEMBER 2006 1061

BUILDING A BUSINESS

The promise of the East: India and China as R&D options

Simon Goodall, Bart Janssens, Kim Wagner, John Wong, Wendy Woods & Michael Yeh

The East provides increasing opportunities for biotech companies seeking to optimize product development and accelerate time to market. But any undertaking in China or India requires close scrutiny of the risks.

Small to medium-sized enterprises (SMEs) in the biotech sector face a long, arduous journey toward successful commercialization of early-stage products. To get their products to market more efficiently and to realize their true commercial potential, biotechs are looking for new resources to tap for a productivity boost, and for new markets for their products.

If pursued wisely, one of the most promising and practicable solutions is the sourcing of selected tasks to Asia, particularly to India and China. Both countries have already attracted considerable investment and involvement from pharma multinational corporations and could provide smaller biotechs with comparable opportunities. Consider some of the potential advantages: a huge and inexpensive talent pool (each country annually produces more than three times as many chemistry graduates as the US does), including an increasing number of Western-trained returnees; a vast patient population available for clinical trials; strong government support for biotech, both through investment (as in science parks) and through policies (such as tax concessions); and increasing private-sector funding and involvement. By making shrewd use of these attributes and actively working to manage the risks, your biotech could conduct operations in a leaner, more cost-effective and perhaps faster way.

There are dangers, however, and a considered approach remains the watchword. First, any involvement in the region should be undertaken as part of a global R&D strategy, not as ad hoc and opportunistic forays. Then, you need to think of a regional strategy, not a country-specific strategy: the opportunity is a matter of China and India, not China or India. You need to consider that the offshoring process, though designed to ease the challenges and expenses of R&D, labors under its own set of complexities and inefficiencies. Although some opportunities will likely suit your company, others, equally appealing, might not, so you need to make precise evaluations each time. And even the surest opportunity involves possible risk—most notably the risk to intellectual property (IP) and the chance of delays through red tape. Biotech SMEs may have more to lose with offshoring than a large pharmaceutical concern does, as they may lack the scale to tolerate IP theft or the failure of an outsourcing venture. They can also ill afford the diversion of internal resources to find the right set of sourcing partners or opportunities. The potential benefits do look increasingly viable, but at this point they remain more potential than proven.

If you are seriously considering outsourcing to India or China, you need to start moving toward an integrated and effective strategy. There are three key issues to consider when doing so: your motivation in investing, the location of investment and the risks inherent in the activity.

Motivations for investment

The four likeliest motives for offshoring work to China and India are saving on R&D costs,

Simon Goodall is at The Boston Consulting Group, 355 South Grand Avenue, Suite 3200, 32nd Floor, Los Angeles, California 90071, USA; Bart Janssens is at The Boston Consulting Group, 14th Floor, Nariman Bhavan, 227 Nariman Point, Mumbai 400 021, India; Kim Wagner is at The Boston Consulting Group, 430 Park Avenue, New York, New York 10022, USA; John Wong is at The Boston Consulting Group, 34th Floor, Shell Tower, Times Square, Causeway Bay, Hong Kong, China; and Wendy Woods and Michael Yeh are at The Boston Consulting Group, Exchange Place, 31st Floor, Boston, Massachusetts 02109, USA. e-mail: [email protected]

[Figure 1 Indian and Chinese partnering opportunities along the R&D value chain. The figure maps common and less common service offerings in India and China against key activities and technologies at each stage of the value chain: target identification, target validation, compound generation and assay development, screening, lead optimization, preclinical and clinical (phases 1–4). HTS/UHTS, high-throughput screening/ultra-high-throughput screening; SAR, structure–activity relationship; PKDM, pharmacokinetics and drug metabolism.]





reducing capacity bottlenecks, accessing talent and increasing market access. When broaching a strategy, you first need to clarify what weight to give to each of these motives. And then you need to assess how the available Indian and Chinese opportunities measure up in each case.

The advantages of cost cutting go without saying, but offsetting them are the dangers inherent in any form of outsourcing: the possible need for greater supervision, and the potential for slower and lower-quality output. Reducing capacity bottlenecks is particularly advantageous for resource-strapped firms; by offshoring lower-priority projects, they can concentrate on higher priorities. Similarly, accessing talent to fill gaps as needed should give biotechs the freedom to concentrate on their core strengths. As for increased market access, the advantages again go without saying. Although the market is currently modest, its potential is very sizeable.

Locating the investment

Although India and China both offer outsourcing opportunities across all phases of the innovation value chain, the capabilities are uneven, and some of the more complex activities remain out of reach (Fig. 1). But don’t make any assumptions: new skills and resources keep coming online. A year ago, you would have scoured both countries in vain for preclinical services of US Food and Drug Administration/good laboratory practice (GLP) quality; today, Bridge Pharmaceuticals in Beijing, or CDRI in Lucknow, India, will be happy to oblige. And if you need target discovery or validation, you could try various providers in Zhangjiang Life Science Park near Shanghai, or Triesta Sciences in Bangalore.

Though less advanced overall than vendors in the developed world, Asian vendors have a clear advantage when it comes to price, offering cost savings of at least 60% in many areas, such as basic chemistry or clinical trials. Just make sure each time that those cost savings aren’t going to be canceled out by extra administrative expenses on your side, or lower productivity on the provider’s. With the right provider, you should be able to ease some of your pipeline bottlenecks and capacity constraints at a stroke.

Which country to choose for any particular activity or project? And which to give greater emphasis to when devising a strategy? As things stand today, India’s greatest value is in giving you quick access to specific drug-development resources, so it might prove the better bet if your priorities are shorter time frames, easy setup, rapid results and very high cost savings. China’s main attraction is in potentially strengthening your foothold in its huge and fast-growing biopharma market, so if you have a particularly commercial agenda—developing government contacts, for example, with an eye to increasing market access—you would probably opt for China. And if you have a longer time frame, you might also favor China, and pursue lengthier projects there through an alliance partner, perhaps one of the prestigious government-funded research institutes.

But a fully rounded strategy will leverage the assets of both countries, rather than just one of them, taking full advantage of their differences. In capabilities, China is considerably ahead in biology, though still at a modest level compared with developed-nation standards. Chinese scientists participated in the Human Genome Project, and have made some notable advances in gene therapy and stem cell work. In 2003, Shenzhen-based SiBiono GeneTech was granted the world’s first license for a gene therapy medication. In chemistry, on the other hand, India arguably has a solid lead, with some vertically integrated suppliers now able to offer end-to-end services.

As for clinical trials, India once again is quicker off the mark, with contract research organizations typically able to secure approvals and get launched within 3 to 4 months, against a norm of 9 to 12 months in China. India also possesses superior strengths in information technology–dependent areas, most notably bio-statistics and clinical trials data management.

There are also some broader considerations. India has the unquantifiable benefit of very high proficiency in the English language. And arguably, its managerial and scientific/educational culture is more Westernized than China’s—more open to breaking with tradition and more innovation minded. That said, Chinese scientists with advanced training from Western institutions are returning at ever-increasing rates, often to take management positions at Chinese biopharma companies.

What’s more, China has the distinctive strategic benefit of increased commercial potential for biotech products themselves (see Box 1). Companies that invest in China stand to enhance their commercial prospects by impressing doctors, key opinion leaders and officialdom. By raising technology standards in the country, R&D investors will earn government goodwill that could raise their chances of expedited approvals and easier market access.

The risks, singly and jointly

On the downside, there are risk factors specific to each of the two countries. If operations are ever disrupted by workforce disputes or animal-rights activists, that would be in India; if by government interference, that would more likely be in China. The infrastructure is also far more reliable in China; India still suffers from interrupted power supplies, antiquated ports and inadequate highways in many regions. China’s GLP standards are still evolving, and lag behind those of India—with few labs in either country being internationally GLP-approved. And the bureaucratic hurdles differ: the Indian authorities grant approvals for clinical trials far faster than their Chinese counterparts. But at the preclinical stage, Indian regulations are particularly stringent, making it difficult for

Box 1 The region’s market for biotech products

The markets for biotech products in China and India are quite different from those in the developed world—with a far lower proportion of consumers who can pay even a tiny fraction of Western prices. But given the high rate of growth of the region, especially within the middle class, the opportunity may eventually be a lucrative one, especially in China.

China’s overall pharmaceutical market is already 2–3 times more valuable than India’s, and will remain so. It should rise from $12 billion in 2005 to a predicted $37 billion in 2015 (graduating to become the world’s fifth most valuable market en route), against India’s $5.3 billion and $16 billion, in 2005 and 2015, respectively. What’s more, the proportion of generics (currently over 70% by value in both markets) versus branded drugs is declining more steadily in China than it is in India. And the price realization, though lower than that of developed nations, is considerably higher in China than in India. In each country, the target market for high-priced biotech drugs is probably no more than 5% of the population—those with private health insurance. Still, that’s 5% of a billion-strong population in each country. It all adds up: sales of biotech products in China reached $2.5 billion in 2005. Drugs that qualify as blockbusters in the United States can quickly reach annual sales of $50 to $100 million in China. GlaxoSmithKline’s (Brentford, UK) Heptodin (lamivudine) reached $80 million in annual sales in China within five years of launch.

That said, the commercial factor is less a current consideration than a future one. Biotechs about to launch new products, at least for the next few years, may best be advised to outlicense them to established pharma companies with proper scale in China or India. M.Y.





laboratories to source genetically modified animals and to import and export human tissue or blood samples.

Viewed more broadly, the main risks apply to both countries: red tape and insecure IP. In each case, the two governments have taken corrective steps, easing the bureaucratic constraints and tightening the IP statutes. How these measures translate into reality isn’t yet clear. There are cultural and human factors at work, not just regulatory ones. Western ideas of urgency and privacy may take some time to permeate. Although laws that approach Western standards now exist, their enforcement in the realm of biopharma, especially in biologicals, has not yet been established (see Box 2 for further details).

Biotechs can reduce their IP risk in both India and China through proactive management. First, you should carefully weigh the critical value of the IP against the perceived benefits of entering India or China, and refrain from any project with an unfavorable balance. When selecting a partner or vendor, you should make all necessary due diligence evaluations of the candidates on your shortlist. In particular, check on their IP-protection measures—physical, electronic and other. One biotech, for instance, disables its printer drivers and tracks all data downloads. Some local vendors literally erect ‘Chinese walls’—separate rooms and facilities for client activities—and even withhold the client’s name from the workforce.

And when negotiating contractual arrangements, you should ensure that legal recourse, both local and abroad, is properly secured. Vendors such as Beijing-based Bridge Pharmaceuticals and Aurigene in Bangalore maintain US-based operations in part to give assurance that they comply with all US IP regulations—and to give customers the option of pursuing US-based litigation if they don’t.

Even if not offshoring work to India or China, biotechs might still consider it prudent to protect their most valuable and vulnerable IP in these countries. By licensing IP to Chinese or Indian companies, they stand a better chance of preempting patent infringement, or of being represented by a party with a ‘home court’ advantage in case of litigation.

Choosing a sourcing model

Let’s assume that after weighing the risks and potential benefits scrupulously, you’ve decided to take the plunge, or at least to test the water. You now need to choose an optimal business model. There are three basic models—outsourcing, partnership and captive investment—offering different degrees of flexibility and control. For biotech SMEs, the starting point would generally be the outsourcing model: hands-off and low-commitment, and therefore involving minimal supervision and easy entry and exit. Of course, it also involves minimal control over output and IP, and for those dual reasons the projects outsourced would tend to be low-complexity work of less strategic import.

Once your company has gained confidence and has decided upon a longer-term commitment to the region or to a particular vendor, you may choose to advance to a partnership model, assigning projects of higher complexity or greater breadth to a Chinese or Indian provider, with more of your own participation in supervising, training and monitoring. This would afford you greater control over quality and should improve communication and trust. However, by moving your partner up the learning curve, you risk finding that they use the enhanced know-how of their workforce to serve potential competitors of yours.

The most committed model, captive investment—where a company acquires and operates its own R&D base in China or India—is unlikely to be adopted by smaller or cash-constrained biotechs. It certainly affords increased control and IP security, but at the cost of a heavy investment of time and resources. It also means a host of new responsibilities. There is no longer a streetwise local intermediary to deal with red tape or make good any unexpected infrastructure gaps. One biotech that set up a captive base in China admits ruefully that it has had to manufacture its own rodent cages.

Finding the right partner

To match corporate investors with the right vendor or collaborator, both India and China have quasi-official dating agencies. In China, you would approach the administration in any of the biotech parks, and they would recommend a suitable match from the list of firms based there. In India, you would approach the Ministry of Science and Technology’s Department of Biotechnology or the Council for Scientific and Industrial Research, and they would fix you up with a potentially ideal partner.

But it’s worth ranging far wider than these sources. After all, finding the right partner will make a big difference to your offshoring experience, so don’t stint on the time and effort invested. In both countries, develop ‘guan xi’—good relations with influential people—to get the best advice and also some help in sealing the deal. Investors and providers are heavily networked, and you should link in to these

Box 2 IP developments in India and China

Among executives contemplating offshoring, IP protection remains a key concern, especially for discovery work. The main IP laws in both India and China are new and relatively untested, so caution is appropriate.

After major changes in India’s IP laws in April 2005 that shifted from process to product protection, India now appears to have a reassuringly tough set of IP standards. Strong trade secret laws and the new Contract Act, based closely on IP statutes in the UK, protect a company against risks related to information leakage or employee switching. In addition, they allow companies to pursue litigation in Western courts against Indian companies for IP breaches. Another source of comfort is the presence of R.A. Mashelkar, director general of the Council for Scientific and Industrial Research. Mashelkar is a leading proponent of biotech partnerships and a global authority on IP protection in developing nations, serving as vice chair of the Commission on Intellectual Property Rights, Innovation and Public Health for the World Health Organization (Geneva). Although India’s new IP laws have appeared to work well in other industries, such as business process outsourcing, which handle sensitive company data, it remains to be seen if they will work as well for biotech and for biological products. After all, the Indian pharma industry as a whole does have a tradition of patent challenges and deep reverse-engineering skills.

China too has a strong set of IP protection laws in place, though perhaps not quite as strong as India’s overall, and perhaps not quite as strong for biologicals as for chemical molecules. Enforcement has been an ongoing issue, and the judicial protection of IP still has to prove itself. But since its accession to the World Trade Organization in 2001, the country has been subject to the Agreement of Trade-Related Aspects of Intellectual Property Rights, so the government is under pressure to enforce international standards. Its previous efforts to change underlying attitudes toward IP protection were not unqualified successes: the patent process remains awkward, Chinese courts continue to struggle with IP cases and protection is not always applied equally across domestic and foreign parties. M.Y.





networks right up to the last minute, as the landscape changes quickly.

At this early stage, biotechs can afford to be cautious and methodical in their approach, as limited vendor capacity is not currently an issue. Over time, vendor capacity should grow to keep pace with demand, with perhaps more of a focus toward smaller biotechs as the sourcing market develops. That said, the earlier you take the plunge, the sooner you can reap the cost savings and the better your chances of accessing proven and established vendors.

Looking ahead

The virtue of the sourcing option goes beyond cost and time efficiencies. Biotech talent and drive are increasingly abundant in China and India, and innovative ideas, which can’t be far off, will be equally amenable to tapping. After all, the governments of the two countries aren’t investing in biotech to create sourcing opportunities but to establish vigorous high-tech industries of their own. Specific areas of China and India represent rapidly growing clusters of biopharma expertise and may ultimately be as important to biotech as San Diego or the Bay Area are. You only need to look at all the innovation emerging from the Taiwanese computer industry to see the parallels with Indian and Chinese biotech, and the pattern of success that the countries are sure to emulate. Small Western biotechs with large ambitions and a taste for adventure can get in at the ground floor and harness Asian innovation, rather than simply offshoring their own.

One other possibility that India and China are opening up is a new model of biotech product development (and perhaps of manufacturing, too). Call it the ‘modular model’: a kind of decentralized R&D system in which different aspects of R&D are distributed globally and conducted almost autonomously in different locations.

But you don’t have to look that far ahead. The opportunities in China and India are rapidly developing, with key pieces falling into place. Weigh the options carefully, delve into the realities and risks of operating within the two countries and decide whether you want to enter. If you do, devise a precise and methodical strategy, find the right partners and implement the strategy with full commitment. With the right strategy, you stand to give your biotech SME a productivity boost and a handsome competitive advantage.

ACKNOWLEDGMENTS

While most of the material in this article derives from client work, it is backed by a detailed survey conducted by the Boston Consulting Group in 2005 and 2006, collating the views and experiences of executives at over 90 vendors in China and India, of officers at several government research institutes in the two countries, and of senior executives at over ten biopharma MNCs operating there. A report summarizing the findings from this study (Looking Eastward: Tapping China and India To Reinvigorate the Global Biopharmaceutical Industry, August 2006), along with other publications on the opportunities for biopharma R&D in India and China, can be found at http://www.bcg.com

This story was reprinted with some modification from the Building a Business section of the Bioentrepreneur web portal (http://www.nature.com/bioent), 25 July 2006, doi:10.1038/bioent910.




How to stay out of a BIND

To the editor:
Your very sympathetic editorial in the February issue (Nat. Biotechnol. 23, 215, 2006) regarding the demise of the Biomolecular Interaction Network Database (BIND) assigns the blame for this resource’s passing to “...bureaucratic delays [and] government fiscal nitpicking....” and calls on science funding agencies to provide more long-term funding for databases. Worthy as your crusade to better direct my tax dollars may be, I don’t find BIND to be a particularly suitable poster child for the effort.

According to your account, BIND, via the Blueprint Initiative, burned through $25 million in about two years. Even in Canadian dollars that burn rate is nothing short of shocking, especially given BIND’s relatively modest scope, and the ease with which its data were to be ‘scraped’ from a relatively small number of scientific publications (I have quite a bit of professional experience in this domain, so I say this with some insight.) Personally, I admire Genome Canada’s decision to stop the bleeding.

I’m sure there were, and are, those who have found BIND useful. Whether or not it was another $20.8 million worth of ‘useful’ or a total of $46 million worth of useful, given all the other worthy scientific uses to which that sum could be put, was the question, and Genome Canada decided this in the negative, citing concerns regarding management, budget justification and financial plan—concerns your editorial brushed aside without comment.

A happy consequence of Genome Canada’s decision is that BIND is now where many such efforts belong. . . in private hands (albeit under the same management), where the rigors of the marketplace can impose upon its owners some deep regard for efficiency and utility. If BIND is truly valuable, then

Christopher Hogue can charge users a modest access fee; perhaps research funding agencies will view their grantees’ carefully justified requests for these small sums with favor. He

may then use such hard-won revenues prudently to sustain and improve the product. If, on the other hand, BIND isn’t a particularly important resource, then users won’t be willing to pay, and it will pass on. This is as it should be.

Much the same may be said for the Alliance for Cellular Signaling’s Molecule Pages, which never really amounted to much (numerically, at least). Now under Nature Publishing Group’s cost- and profit-conscious guidance they will, no doubt, either flourish or fold.

Rather than arguing for the importance of long-term database funding by granting agencies, BIND’s saga in fact argues for greater caution and more demanding oversight when these agencies elect to fund a database’s initial development. Realistic plans for long-term sustainability must be demanded, as must some basic enterprise management ability on the grant recipient’s part. Such expectations are anything but fiscal nitpicking; they are a fiduciary responsibility. I have no bone to pick with researchers who bemoan the intermingling of capitalism and scientific research (if, in this Bayh-Dole era, there’s anyone left who can still do so with a straight face). But those who feel this way should be prepared to make every precious tax dollar go as far as it possibly can. Those who fail at this should be quicker to blame themselves, and slower to blame ‘bureaucrats’.

William B Busa

Busa Consulting, 201 Johns Schools Road, Renfrew, Pennsylvania 16053, USA. e-mail: [email protected]

The dog as a cancer model

To the editor:
The dog has long been used as a model in drug discovery and development research because of its similarities to human anatomy and physiology, particularly with respect to the cardiovascular, urogenital, nervous and musculoskeletal systems. Compared with other animal models, it may also prove invaluable in research and development on cancer drugs, because dogs naturally develop cancers that share many characteristics with human malignancies. The completion of a high (7.5×) coverage canine genome1 now paves the way for the development of critical resources that will allow the integration of naturally occurring canine cancers within the mainstream of cancer research. To initiate and facilitate collaborative efforts and leverage the opportunities provided by the dog in cancer research, scientific and clinical leaders from both human and veterinary oncology have come together to form a multidisciplinary consortium, the Canine Comparative Oncology and Genomics Consortium (CCOGC).

Cancers in pet dogs are characterized by tumor growth over long periods of time in the setting of an intact immune system, inter-individual and intra-tumoral heterogeneity, the development of recurrent or resistant disease, and metastasis to relevant distant sites. In these ways, dog cancers capture the ‘essence’ of the problem of human cancer in a manner not possible with other animal model systems. Compared with other large animals commonly used in biomedical research, such as pigs and nonhuman primates, an additional advantage offered by pet dogs is that they are cared for into the ages

CORRESPONDENCE



commonly associated with the highest risk for cancer. This risk, coupled with their large population size (>70 million in the United States), results in a cancer rate sufficient to power clinical trials, including assessment of new drugs. Using crude estimates of cancer incidence, in the United States alone, there are ~4 million new cancer diagnoses made each year in dogs2. Examples of these cancers include non-Hodgkin lymphoma, osteosarcoma, melanoma, prostate carcinoma, lung carcinoma, head and neck carcinoma, mammary carcinoma and soft-tissue sarcoma. For many of these cancers, strong similarities to human cancers are seen, including histological appearance, tumor genetics, biological behavior and response to conventional therapies. The compressed course of cancer progression seen in dogs allows timely assessment of new cancer therapies.

With the recent release of the canine genome sequence, the dog is now also amenable to comparative genomic analysis. Indeed, preliminary assessment of the canine genome suggests that the dog and human lineages are more similar than the human and rodent lineages in terms of both nucleotide divergence and rearrangements. The CCOGC initially plans to take advantage of these opportunities through the following actions:

• Develop a robust and well-annotated biospecimen repository of canine cancers and tissues—funding of a large, accessible biospecimen repository is difficult using existing resources.

• Improve opportunities to link the efforts of veterinary and comparative oncologists with the work of basic cancer researchers and clinicians.

• Initiate non-clinical trials, integrated into the development path of new cancer drugs, using pet dogs with cancer. Mechanisms for review of these non-clinical trials by regulatory bodies should be developed such that information from these studies, where appropriate, may help to focus the scope of early human clinical trials.

To date, non-clinical studies in dogs with cancer have answered questions that would have been difficult or impossible to answer in either mice or humans. The lack of gold-standard veterinary treatments also provides the opportunity for the early and humane evaluation of new therapies for dogs with cancer. Following institutional review of trials, pet owners would be given the option to enter their dogs into clinical trials and in so doing receive access to novel cutting-edge treatment options for cancer, many of which are less toxic than conventional treatment options currently available. Accordingly, studies in pet dogs offer opportunities in both human and animal healthcare.

First, pet dog trials will help better define the safety and activity of new anticancer agents. They may also assist in the identification of relevant biomarkers associated with response or exposure to these drugs. Furthermore, these studies may allow rational development of combination strategies that will improve the success of these new drugs in human clinical trials. These data may be useful before the filing of an investigational new drug application (IND) with the US Food and Drug Administration (FDA; Rockville, MD) and as a means to optimize the development of anticancer agents currently in early human trials.

Second, data generated through such studies may inform the development of new cancer treatments for animals. Research and development of new anticancer treatments is increasingly recognized as an area of need in the field of animal health. In this way, pet dogs with cancer will be directly helped through access to these new drugs; results may be translated and extended to the development of better cancer drugs for humans and other pet dogs.

A window of opportunity now exists. With the realization of the need for more useful animal models in human cancer drug development, the organization of a number of consortia and collective groups, the completion of the canine genome sequence, the increasing availability of dog-specific biological reagents and investigative methodologies (e.g., antibodies specific for dog proteins or dog-specific oligonucleotide arrays) and the interest of the animal health biotech and drug industry, the CCOGC hopes to further stimulate efforts to fully exploit the many advantages of the dog in cancer drug research.

Chand Khanna1, Kerstin Lindblad-Toh2, David Vail3, Cheryl London4, Philip Bergman5, Lisa Barber6, Matthew Breen7, Barbara Kitchell8, Elizabeth McNeil9, Jaime F Modiano10, Steven Niemi11, Kenine E Comstock12, Elaine Ostrander13, Susan Westmoreland11 & Stephen Withrow3

1Comparative Oncology Program, Center for Cancer Research, National Cancer Institute, 9610 Medical Center Drive, Room 315, Rockville, Maryland 20815, USA. 2Broad Institute of Harvard and Massachusetts Institute of Technology, 320 Charles Street, Cambridge, Massachusetts 02141, USA. 3Animal Cancer Center, Colorado State University, Fort Collins, Colorado 80523, USA. 4Department of Veterinary Biosciences, The Ohio State University, Columbus, Ohio 43210, USA. 5The Animal Medical Center, New York, New York 10021, USA. 6Department of Clinical Sciences, Tufts University School of Veterinary Medicine, North Grafton, Massachusetts 01536, USA. 7Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, North Carolina 27606, USA. 8Center for Comparative Oncology, Michigan State University, East Lansing, Michigan 48824, USA. 9Department of Veterinary Clinical Sciences, University of Minnesota, St. Paul, Minnesota 55108, USA. 10Integrated Department of Immunology and AMC Cancer Research Center, University of Colorado at Denver and Health Sciences Center, Denver, Colorado 80214, USA. 11Center for Comparative Medicine, Massachusetts General Hospital, Charlestown, Massachusetts 02129, USA. 12University of Michigan, 5111 Cancer Center, Ann Arbor, Michigan 48109, USA. 13National Human Genome Research Institute, National Institutes of Health, 50 South Drive, MSC 8000, Building 50, Bethesda, Maryland 20892-8000, USA. e-mail: [email protected] or [email protected]

1. Lindblad-Toh, K. et al. Nature 438, 803–819 (2005).

2. Vail, D.M. & MacEwen, E.G. Cancer Invest. 18, 781–792 (2000).

The 2.4-billion-bp (7.5× coverage) sequence of a female boxer dog (pictured) published in December 2005 (ref. 1), together with the poodle sequence released in 2003, should facilitate the use of dogs in cancer studies.


GM sterile mosquitoes—a cautionary note

To the editor:
The article in your November issue by Andrea Crisanti and colleagues (Nat. Biotechnol. 23, 1414–1417, 2005) reported the development of a transgenic strain of Anopheles stephensi, an Asian malaria vector, that the authors suggested may be useful as a sexing strain in a sterile insect technique (SIT) program against this vector. The SIT relies on the release of massive numbers of sterilized male mosquitoes to reduce the reproductive capacity of wild populations that transmit malaria1–4. Sterile females can still transmit disease, hence the need for efficient sex separation systems. It is beyond doubt, therefore, that this new methodology addresses an important need of mosquito SIT programs currently under development.

As can be seen from the data, the use of this sexing method under experimental small-scale conditions was successful, but we do wish to respond to the suggestion that this methodology, and even this strain, can immediately be transferred to a large-scale SIT program. On the basis of our experience with the development of comparable systems in other species, we expect that strain evaluation(s) will have to be extremely thorough and carried out under appropriate conditions before it will be possible to judge whether a strain or particular sexing procedure is suitable for use in mosquito control programs integrating the SIT. These strains will have to be reared at high levels of production and for an extended period of time before sufficiently reliable and realistic data on the overall fitness, the accuracy and efficiency of the sexing procedure and the stability of the sexing system become available. In addition, the field performance of these strains will need to be evaluated. All these data will be used by decision makers to weigh any potential negative characteristics of the strain(s) against the benefits they provide, and only then can a judgment on the suitability of a particular strain(s) for inclusion in an SIT program be made.

Radiation-induced sterility provides some level of risk mitigation when transgenic insects are released, and this approach has been proposed for a first evaluation of the use of this technology5. In operational programs, where insect competitiveness is a key factor for success, there is currently a trend to reduce the radiation dose to a level that maximizes the sterility induction in the wild population. In the case of transgenic strains, however, this level will depend on regulatory requirements and the type of strain that is being released; for example, what type of transgene is used, in combination with what operational strategy (that is, eradication versus suppression). It is conceivable that a lower dose chosen for a conventional strain would not be appropriate for a transgenic strain.

More troubling is the perception, fuelled by comments made in the press, that any efficient transgenic sexing strain can be easily incorporated into a mosquito SIT program without much further consideration. Although release of sterile males for mosquito control has been practiced in the past, direct inclusion of modern biotechnological approaches such as transgenesis should not be taken for granted. This technology needs to be considered systemically and holistically and be integrated into a broader social context6—a notion that larger development agencies like the World Bank (Washington, DC) have recognized for many years, but which still appears to have eluded some scientists and funding bodies. Mosquito genetic control specialists have been discussing the merits and limitations of modern biotech for over five years from a molecular genetic7, ecological8 and transitional9 perspective. It is evident from these discussions that only when the benefits are judged to outweigh the publicly perceived risk of the technology10 will the release of genetically modified (GM) mosquitoes become a reality. Thus, it is imperative that important stakeholders, in particular end-beneficiaries, participate in the scientific and development process11. If not, millions will be poured into technologies that are not acceptable or feasible, betraying those most in need.

A participative, iterative and strategic approach to malaria control is necessary to cope with intrinsic uncertainties of the interventions and changes of the ‘environment’12. The inclusion of ethical, legal and social aspects in this debate has been rudimentary at best. Although it is argued that the use of GM insects for disease control is still in its infancy, we contend that several negative developments in the field of GM organisms may seriously impede the future applicability of this approach. Given the intricacies of stakeholder management, even in developed parts of the world13, we propose a three-pronged strategy to anticipate potential antagonism.

The first and most critical step will be to gain public support. The establishment of trust through openness and direct involvement of stakeholders, including public authorities and the press, in the decision-making process will be critical. Failure in this regard could result in the polarization of viewpoints and scaremongering; indeed, in India in the 1970s, claims of biological warfare in the press led to the abandonment of a World Health Organisation (Geneva, Switzerland)-funded mosquito genetic control program just two days before the start of releases and after several years of research and development14. To prevent history from repeating itself, the establishment of equitable partnerships with scientists in disease-endemic countries, combined with the transfer of ‘problem ownership’, is necessary. Scientific funding agencies should appreciate the complexity of such issues and the resulting need to communicate through other means in addition to the peer-review process, as rationality and reductionism are embedded in the scientific method and culture15 and are not necessarily the perspectives required to tackle complexity. This would lead to a research agenda also driven by developing nations.

A second need relates to oversight. The search for potential field sites to release transgenic mosquitoes is currently proceeding, backed by hastily established and cosmetic partnerships with scientists and institutions in situ. It follows that in the absence of any governance over this process and research progression in years to come, serious problems may develop. Although


developing countries are actively developing policy to engage with GM crops, there is indeed very little going on in terms of GM insects, which, for the record, will ignore national boundaries. An international entity with broad, adaptive and adequate representation is therefore urgently called for. Given the right mandate, it could safeguard against uncontrolled expansion of activities while serving as a shield against antagonistic influences through active stakeholder engagement.

Finally, following the foregoing multiple-perspective debates on GM mosquitoes, we propose the rapid initiation of an international gathering to start addressing the complexity of ethical, legal and social aspects of GM mosquitoes for disease control, a process that should already have taken place16,17. We conclude that contrary to there being a ‘green light for mosquito control,’ as announced in your journal18, research on SIT using transgenic insects has, for now at least, stalled at a yellow light.

Bart G J Knols1, Rebecca C Hood-Nowotny1, Hervé Bossin1, Gerald Franz1, Alan Robinson1, Wolfgang R Mukabana2 & Samuel K Kemboi2

1Entomology Unit, FAO/IAEA Agriculture and Biotechnology Laboratory, A-2444 Seibersdorf, Austria. 2University of Nairobi, P.O. Box 29053, Nairobi, Kenya. e-mail: [email protected]

1. Dyck, A.V., Hendrichs, J. & Robinson, A.S. (eds.) The Sterile Insect Technique: Principles and Practice in Area-Wide Integrated Pest Management (Springer, Heidelberg, 2005).

2. Catteruccia, F. et al. Science 299, 1225–1227 (2003).

3. Andreasen, M. & Curtis, C.F. Med. Vet. Entomol. 19, 238–244 (2005).

4. Franz, G. Genetica 116, 73–84 (2002).

5. Benedict, M. & Robinson, A.S. Trends Parasitol. 19, 349–355 (2003).

6. Scott, T.W., Takken, W., Knols, B.G.J. & Boëte, C. Science 298, 117–119 (2002).

7. Alphey, L. et al. Science 298, 119–121 (2002).

8. Takken, W. & Scott, T.W. (eds.) Ecological Aspects for Application of Genetically Modified Mosquitoes (Kluwer Academic Publishers, Dordrecht, The Netherlands, 2003) <http://library.wur.nl/frontis/malaria/>.

9. Knols, B.G.J. & Louis, C. (eds.) Bridging Laboratory and Field Research for Genetic Control of Disease Vectors (Springer, Berlin, 2005) <http://library.wur.nl/frontis/disease_vectors/>.

10. The Royal Society. Risk Analysis, Perception and Management. Report of the Royal Society Study Group (The Royal Society, London, 1992).

11. Wynne, B. Global Environ. Change 2, 111–127 (1992).

12. Rondinelli, D. Development Projects as Policy Experiments (Routledge, London & New York, 1993).

13. Lusk, J.L. & Rozan, A. Trends Biotechnol. 23, 386–387 (2005).

14. World Health Organisation. WHO Chronicle 30, 131–139 (1976).

15. Ison, R.L. Rangeland J. 15, 154–166 (1993).

16. Macer, D. Ethical, Legal and Social Issues of Genetically Modified Disease Vectors in Public Health. TDR/STR/SEB/ST/03.1 (World Health Organisation, Geneva, Switzerland, 2003).

17. Touré, Y.T. & Knols, B.G.J. in Genetically Modified Mosquitoes for Malaria Control (Boëte, C., ed.) (Landes Bioscience, Georgetown, Texas, USA, in the press).

18. Atkinson, P. Nat. Biotechnol. 23, 1371–1372 (2005).

Peter Atkinson responds:

Knols et al. draw attention to two important points: that any new genetic strain developed for use in the sterile insect technique must undergo rigorous testing to ensure that it meets the necessary quality control standards required for the successful application of this technique; and that there must be full consultation with the public, stakeholders and any other interested parties before transgenic strains can be released. These self-evident facts are not in dispute; rather, the advance reported by Crisanti and colleagues in Nature Biotechnology illustrates that recombinant techniques are now generating genetic strains that may be appropriate for assessment and, pending the outcome, deployment in insect genetic control programs. The application of these developments does need to be openly discussed in the type of forum outlined by Knols et al. and, toward this goal, preliminary workshops on this topic have already been convened1.

1. Takken, W. & Scott, T.W. (eds.) Ecological Aspects for Application of Genetically Modified Mosquitoes. Reports from a Workshop held at Wageningen University and Research Center, June 2002 (Kluwer Academic Publishers, Dordrecht, The Netherlands, 2003).

Sequencing errors or SNPs at splice-acceptor guanines in dbSNP?

To the editor:
Single-nucleotide polymorphisms (SNPs) are the most frequent type of human genetic variation. They are the major basis of our phenotypic individuality, particularly with respect to heritable differences in disease susceptibility. Large collections of mapped SNPs, public and private, are powerful tools for genetic studies1. The most comprehensive public SNP database, dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP), currently contains more than 12 million human SNPs (version 126). This wealth of data is extensively used by a broad community, including clinical, experimental and computational scientists, for both locus-specific and genome-wide studies. Therefore, the quality and completeness of dbSNP is of paramount importance and a recent meta-analysis of four confirmation studies estimated a false-positive rate of ~15–17%2.

As we have an interest in alternative splicing in general3 and with respect to diseases in particular4, we searched dbSNP for human variations in a nine-nucleotide context (three exon and six intron positions) of all splice-donor/acceptor sites of mRNA RefSeqs. Contrary to our expectation for the highly conserved intron positions +1, +2 (donor) and –2, –1 (acceptor), the acceptor G at –1 showed a variability comparable to that of the random position –4 (Fig. 1a). As the disruption of the G at –1 normally results in the loss of the acceptor site5, we questioned whether this surprising variability could be compensated for by any of the known biological processes (for example, RNA editing) or is an indication of a yet unknown biological phenomenon. As we could not find a plausible explanation for our observation, and before we considered undertaking a challenging, lengthy and potentially fruitless search for an unknown biological mechanism, we decided next to evaluate the possibility that false-positive entries in dbSNP account for the inexplicable variability of position –1.
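To make the window concrete, the tally over the nine-nucleotide acceptor context can be sketched as follows. The coordinate convention and all positions here are our own simplification for illustration; a real analysis would parse RefSeq exon annotations and dbSNP mappings.

```python
# Sketch: tally SNPs by position relative to a splice-acceptor site.
# Convention from the text: six intron positions (-6..-1, with the conserved
# AG at -2/-1) and three exon positions (+1..+3). Coordinates are hypothetical.

def acceptor_relative_position(snp_pos, exon_start):
    """Map a genomic position to its acceptor-relative position, or None.

    exon_start is the genomic coordinate of the first exonic base (+1);
    the intronic -1 position is therefore exon_start - 1.
    """
    offset = snp_pos - exon_start
    if 0 <= offset <= 2:       # exonic +1..+3
        return offset + 1
    if -6 <= offset <= -1:     # intronic -6..-1
        return offset
    return None

def tally_acceptor_snps(snp_positions, exon_start):
    """Count SNPs falling inside the nine-nucleotide acceptor window."""
    counts = {}
    for pos in snp_positions:
        rel = acceptor_relative_position(pos, exon_start)
        if rel is not None:
            counts[rel] = counts.get(rel, 0) + 1
    return counts
```

Comparing the count at −1 with that at a "random" position such as −4, as done in the study, then amounts to comparing two entries of the returned dictionary.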

To this end, we first used the dbSNP validation status description and classified the RefSNPs (dbSNP entries) into three categories: (C1) validated by frequency or genotype data from HapMap6 or any other submitter; (C2) validated by independent submissions, observation of the minor allele in at least two chromosomes, or submitter confirmation; and (C3) single submission without confirmation. Conspicuously, position –1 showed the highest fraction in C3 (305 of 364, 84%; Fig. 1b). As experimental verification of RefSNPs depends on the availability of appropriate population samples and assays, it was not feasible for us to carry out such a study on a large scale. Therefore, we switched to a verification procedure making use of the electropherograms derived from automatic fluorescence-based DNA sequencing instruments (traces).
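The three-way classification can be sketched in a few lines. The tag strings below mimic dbSNP-style validation labels but are our own assumptions for illustration, not the exact vocabulary of the database.

```python
# Sketch: sort RefSNPs into the three confidence categories described above.
# Tag names ('by-frequency', 'by-hapmap', 'by-2hit-2allele', 'by-cluster',
# 'by-submitter') are illustrative stand-ins for dbSNP validation labels.

C1_TAGS = {"by-frequency", "by-hapmap"}                      # frequency/genotype data
C2_TAGS = {"by-cluster", "by-2hit-2allele", "by-submitter"}  # independent support

def classify_refsnp(validation_tags):
    """Return 'C1', 'C2' or 'C3' for a RefSNP's set of validation tags."""
    tags = set(validation_tags)
    if tags & C1_TAGS:
        return "C1"
    if tags & C2_TAGS:
        return "C2"
    return "C3"  # single submission without confirmation
```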


Currently, 76% of all RefSNPs are supplied with trace references and for nearly 60% these data are accessible via the US National Center for Biotechnology Information (NCBI) Trace Archive (http://www.ncbi.nlm.nih.gov/Traces; Supplementary Notes). We manually examined the available traces for RefSNPs at acceptor positions –2, –1 and +1 and collected false-positive entries, which we classified as sequencing errors (wrong base calling due to a low signal-to-noise ratio) and database errors (identity of the genomic RefSeq and the trace-supported RefSNP allele, or ambiguous alignment in microsatellites). Sequencing errors were mainly detected among C3 RefSNPs that are solely based on single-pass trace data. Database errors occurred in both C2 and C3 RefSNPs, independently of their trace coverage (single trace, multiple traces of the same strand, traces from both strands; Supplementary Notes online).

The astonishing error rate of 93% among 181 RefSNPs with trace data at acceptor position –1 was exclusively caused by the well-known suppression of G after A incorporation using thermostable, genetically engineered DNA polymerases in dye terminator sequencing reactions7 (Fig. 1c). Naturally, this problem occurs at acceptor sites only in forward (5′-to-3′) traces because the AG is CT in the reverse sequencing direction. Moreover, the ‘G after A’ problem is further enhanced by the polypyrimidine tract preceding the acceptor AG in the splice consensus8. Homopolymer stretches of T and C are known to cause problems with sequence accuracy as a result of polymerase slippage9, thus leading to elevated error rates not only at position –1 but also at –2 and +1.
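The strand asymmetry described above is easy to check programmatically: the artifact context exists only where a G directly follows an A in the forward read. A minimal sketch (the read sequence is invented and trace parsing is omitted):

```python
# Sketch: flag a G call as a possible 'suppressed G after A' artifact.
# Per the text, the suppression affects forward (5'-to-3') reads where the
# G directly follows an A; in the reverse read the acceptor AG appears as
# CT, so the artifact cannot arise there.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def g_after_a_suspect(read, i):
    """True if position i of a read is a G immediately preceded by an A."""
    return i > 0 and read[i] == "G" and read[i - 1] == "A"
```

Applied to a toy forward read `"TTTCCAGGT"` (a short polypyrimidine tract followed by the acceptor AG), the −1 G at index 6 is flagged, while the corresponding position in the reverse-complemented read is a C and is not.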

Altogether, we estimated false-positive rates at acceptor positions –2, –1 and +1 of 17%, 82% and 11%, respectively (Supplementary Tables 1–3 online). Excluding the estimated false-positive rates, no significant difference in the variability between acceptor positions –1 and –2 remains. Thus, we conclude that a systematic sequencing error (‘suppressed G after A’) and not a previously unknown biological phenomenon causes the high frequency of RefSNPs in splice-acceptor position –1.

Sensitized by this analysis, we then asked to what extent dbSNP contains sequencing errors in general. First, a scan of all RefSNPs for the sequence confidence of the allele alternative to the genomic RefSeq confirmed our initial observation that false positives are very likely enriched among C3 entries (18% with a Phred confidence value <30, corresponding to an error probability of more than 1 in 1,000 (ref. 10)) and will be equally rare among C1 and C2 entries (Fig. 1d; Supplementary Notes online). Moreover, the ‘suppressed G after A’ problem is not restricted to acceptor sites: among all G/H (genomic RefSeq allele/non-RefSeq allele, where H stands for A, C or T) C3 RefSNPs with traces, the fraction of low-confidence entries among A(G/H) variations is twice as large as for the remaining contexts (Fig. 1e; Supplementary Notes online).
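Phred values relate to base-calling error probabilities by Q = −10·log10(P) (ref. 10), so the Phred < 30 cutoff used here marks calls with an error probability above 1 in 1,000. A minimal sketch of the conversion:

```python
import math

# Phred confidence Q and base-calling error probability P are related by
# Q = -10 * log10(P); Q = 30 therefore corresponds to 1 error in 1,000 calls.

def phred_to_error_prob(q):
    """Error probability implied by a Phred quality value."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred quality value implied by an error probability."""
    return -10 * math.log10(p)
```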

For a final estimate of sequencing errors in dbSNP, we selected a set of 10,000 random SNPs and manually examined representative trace sets for all possible N(N/N)N contexts (where N is any nucleotide). Along with the expected A(G/H)N, the C(A/Y)C and G(A/C)C contexts also showed false-positive rates >10%. Altogether, we estimated that there were about 256,000 sequencing and 124,000 database errors, representing 3.2% and 1.5% of all RefSNPs, respectively. Among sequencing errors, the vast majority (85%) are caused by the ‘suppressed G after A’ problem. Most interestingly, some of the false RefSNPs were investigated in the HapMap project6 (Supplementary Tables 1–3 online) and, as expected, did not show any variation in any of the genotyped populations.

The described error rates in dbSNP might both introduce serious biases in large-scale bioinformatic studies and misdirect experimental efforts, particularly if a special sequence context such as acceptor AG is considered. Therefore,

Figure 1 RefSNPs and sequence confidence. (a) Apparent hypervariability at splice-acceptor Gs. (b) Classification of RefSNPs at the splice acceptors according to their validation status. (c) Electropherograms (traces) illustrating the ‘G after A’ problem at splice-acceptor sites in the 5′-to-3′ sequencing direction. (d,e) Sequence confidence (Phred) values of trace-data-supported RefSNPs classified according to their validation status (d) and of G/H RefSNPs classified according to the 5′ nucleotide (e); numbers in d and e are expressed as percentages.


we emphatically recommend that all users of dbSNP refer to the ‘validation status’ tag and use a simple SNP classification scheme, as described above, that aims at extracting RefSNPs with lower error rates. According to our classification, dbSNP (version 124) contains 2,077,680, 2,946,840 and 3,470,166 entries in C1, C2 and C3, respectively. To investigate the differences between these three classes, we extracted the available confidence information. C1 and C2 RefSNPs have higher average values (both 51.4) than SNPs in C3 (43.2; Supplementary Notes online). Furthermore, about 87% in C1 and C2 have confidence values of at least 40, in contrast to only 63% in C3 (Fig. 1d). As a low confidence value indicates a potential sequencing error, we recommend that bioinformatics and/or experimental efforts either use only C1 and C2 RefSNPs or exclude from C3 all dbSNP entries with Phred <40 (ref. 11).
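The recommended filtering rule reduces to a one-line predicate; a sketch, using the C1–C3 category labels defined earlier:

```python
# Sketch of the filtering rule recommended above: use only C1/C2 RefSNPs,
# or additionally admit C3 entries whose Phred confidence is at least 40.

def keep_refsnp(category, phred):
    """Return True if a RefSNP passes the recommended filter."""
    return category in ("C1", "C2") or (category == "C3" and phred >= 40)
```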

Note: Supplementary information is available on the Nature Biotechnology website.

Matthias Platzer1, Michael Hiller2, Karol Szafranski1, Niels Jahn1, Jochen Hampe3, Stefan Schreiber3, Rolf Backofen2 & Klaus Huse1

1Genome Analysis, Leibniz Institute for Age Research–Fritz Lipmann Institute, Beutenbergstr. 11, 07745 Jena, Germany. 2Institute of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany. 3Institute for Clinical Molecular Biology, Christian-Albrechts-University Kiel, Schittenhelmstr. 12, 24105 Kiel, Germany. e-mail: [email protected]

1. Kruglyak, L. Nat. Genet. 17, 21–24 (1997).

2. Mitchell, A.A., Zwick, M.E., Chakravarti, A. & Cutler, D.J. Bioinformatics 20, 1022–1032 (2004).

3. Hiller, M. et al. Nat. Genet. 36, 1255–1257 (2004).

4. Valentonyte, R. et al. Nat. Genet. 37, 357–364 (2005).

5. Krawczak, M., Reiss, J. & Cooper, D.N. Hum. Genet. 90, 41–54 (1992).

6. International HapMap Consortium. Nature 426, 789–796 (2003).

7. Korch, C. & Drabkin, H. Genome Res. 9, 588–595 (1999).

8. Stephens, R.M. & Schneider, T.D. J. Mol. Biol. 228, 1124–1136 (1992).

9. Kotlyar, A.B., Borovok, N., Molotsky, T., Fadeev, L. & Gozin, M. Nucleic Acids Res. 33, 525–535 (2005).

10. Ewing, B. & Green, P. Genome Res. 8, 186–194 (1998).

11. Hiller, M. et al. Am. J. Hum. Genet. 78, 291–302 (2006).

Data integration gets ‘Sloppy’

To the editor:
Data integration in life sciences currently faces a conundrum1–4. On the one hand, the diversity of data is increasing as explosively as its volume. This makes it imperative that some degree of data formatting standardization is agreed upon by the diverse community generating and using that data. On the other hand, the value of individual data sets can only be appreciated when enough of those distinct pieces of the systemic puzzle are put together. Therefore, it is also imperative that standard formats not be enforced so strictly as to be an obstacle to reporting the very novel data that brings value to the targeted systemic integration. We present here a prototype application, termed Simple Sloppy Semantic Database (S3DB), that provides a bridge between loosely structured raw data annotated using personal ontologies and a globally referenceable semantic representation indexed to controlled vocabularies. Wide adoption of this database formalism has the potential to facilitate and optimize data management in a range of research fields, from molecular epidemiology to basic biology.

For most types of biological data, the agreed-upon communal format has a complexity that is far from trivial and requires specialized converters that were not available when the analytical method was first developed. For example, an agreed-upon Minimum Information about Microarray Experiments (MIAME) standard was defined in 2001 (ref. 5), but the jury is still out for much older and widely used techniques such as gel-based proteomics (for example, see ref. 6). Even when, after much consultation, a community standard emerges, the rigidity of minimal descriptions eventually becomes insufficient for stand-alone reposition7. Like many others before us, we have reached the conclusion that complementary efforts in proteomics8, transcriptomics9 and genomics10 can only be integrated in a common representation within a semantic framework2,11. We have specifically argued2 for the need to migrate to RDF (Resource Description Framework) from the more widely used XML (Extensible Markup Language) hierarchies or relational structures, a view also espoused by the World Wide Web Consortium Life Sciences interest group (http://www.w3.org/2001/sw/hcls/). However, that formalism is cumbersome for configuring information management systems and trades human intuitiveness for machine-processing expressiveness. This combination of implementation and interface challenges typically loses the very contribution that is needed to put the systemic puzzle together: that of the ‘biology domain’ expert.

Figure 1 Example of a S3DB application. The indexing scheme is described by the table in the upper left, where the connecting lines identify the three clauses, (a)–(c), verified by the validation engine for a new statement. Three snapshots of the S3DB application for the example discussed in the text are displayed: directed graph depiction of the rules (1), validation log for submission of a literal (nuclear data element such as ‘30 years’) (2) and validation log for the association of two resources (3).


A bridge between poorly structured raw data annotated to personal ontologies and a globally referenceable semantic representation indexed to controlled vocabularies is thus needed. Such a bridge should raise no obstacles to data submission and should instead allow incremental editing of the underlying data model without compromising the data already submitted. It should also be deployable as a web-server application such that collaborating users can share a common repository independently of their location. Finally, it should allow referencing of external controlled vocabularies and should be exportable as RDF2. With this in mind, we have developed a prototype application, S3DB, that incorporates all these characteristics.

The proposed implementation of editable semantic reposition relies on a relational backbone made of three tables: rules, statements and resources. The concept driving the configuration of S3DB is purely semantic and relies solely on documenting an entity relationship (ER) model12 of the type [Subject][Property][Object]. The solution that enables editable data model reposition consists of an indexing scheme where the permanence of the indexes, rather than the permanence of the element names, allows renaming and reassociation without loss of content. Although the relational backbone of S3DB consists of three tables, they do not establish a relational model. Instead, their interoperation relies on a validation engine that checks for syntactic and semantic consistency. Data are submitted as statements made of five-element vectors, [Subject][UID][Property][Object][Value], that are verified as follows: (i) the triple [Subject][Property][Object] exists in the rules table; (ii) the resource unique index (UID) pair [Subject][UID] exists in the resources table; and (iii) if [Object] is a resource (if it is declared as having a UID in rules), then [Value] has to be a valid UID for that resource (i.e., the pair [Object][Value] exists in the resources table).

This solution mimics locally the sort of evolution of the data model that we expect to achieve for global representations using RDF2. Its workings are illustrated with an example in Figure 1. To assign an age to a patient, the first step is to add that property to the domain of discourse (for example, to include an entry in the rules table stating that people have age as a demographic property; see popup window inset). Subsequently, a statement can be made for an existing patient, UID 115, saying that she is 30 years of age (a literal value), and then that a sample, UID 308, was collected. Because not just all resources but also all rules and statements are uniquely indexed (UID), their contents can subsequently be edited to rename resources and rewire relations. The result is a conveyor belt of successive editing into more structured and global formats. The S3DB prototype was developed with open-source languages and is made freely available with open source for unrestricted use and modification (http://www.s3db.org).
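The rename-without-loss idea in the example above can be sketched in a few lines (hypothetical minimal structures; the real tables carry more fields):

```python
# Because every resource, rule and statement carries a permanent UID,
# statements reference indexes rather than names; renaming an element
# therefore updates every statement that points to it without editing
# the statements themselves. (Hypothetical minimal structures.)

resources = {115: "Patient", 308: "Sample"}    # uid -> current name
rules = {7: (115, "demographics", "age")}      # rule_uid -> (subject_uid, property, object)
statements = {42: (7, 115, 30)}                # stmt_uid -> (rule_uid, subject_uid, value)

def render(stmt_uid):
    """Spell out a statement using the *current* names of its elements."""
    rule_uid, subject_uid, value = statements[stmt_uid]
    _, prop, obj = rules[rule_uid]
    return f"{resources[subject_uid]} {prop} {obj} = {value}"

print(render(42))  # Patient demographics age = 30

# Rename the resource and the rule's object; statement 42 is untouched,
# yet it now reads with the updated vocabulary.
resources[115] = "Subject"
rules[7] = (115, "demographics", "age_years")
print(render(42))  # Subject demographics age_years = 30
```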

ACKNOWLEDGEMENTS
This work was partially funded by the National Heart, Lung and Blood Institute of the US National Institutes of Health, under contract no. N01-HV-28181, and the PREVIS project (Pneumococcal Resistance Epidemicity and Virulence, An International Study), contract number LSHM-CT-2003-503413 from the European Commission.

Jonas S Almeida1*,3, Chuming Chen2, Robert Gorlitsky2, Romesh Stanislaus2, Marta Aires-de-Sousa3, Pedro Eleutério3, João Carriço3, António Maretzek3, Andreas Bohn3, Allen Chang1, Fan Zhang4, Rahul Mitra4,5, Gordon B Mills4, Xiaoshu Wang2 & Helena F Deus3

1Department of Biostatistics and Applied Mathematics, University of Texas, 1515 Holcombe Blvd., Box 0447, Houston, Texas 77030-4009, USA. 2Department of Biostatistics, Bioinformatics & Epidemiology, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, South Carolina 29425, USA. 3Instituto de Tecnologia Química e Biológica da Universidade Nova de Lisboa (ITQB/UNL), Av. da República (EAN), 2781-901 Oeiras, Portugal. 4Kleberg Center for Molecular Markers, Department of Molecular Therapeutics, University of Texas M. D. Anderson Cancer Center, 1515 Holcombe Blvd., Box 0317, Houston, Texas 77030-4009, USA. e-mail: [email protected]

1. Hey, T. & Trefethen, A.E. Science 308, 817–821 (2005).

2. Wang, X., Gorlitsky, R. & Almeida, J.S. Nat. Biotechnol. 23, 1099–1103 (2005).

3. Buetow, K.H. Science 308, 821–824 (2005).

4. Foster, I. Science 308, 814–817 (2005).

5. Brazma, A. et al. Nat. Genet. 29, 365–371 (2001).

6. Stanislaus, R. et al. BMC Bioinformatics 5, 9 (2004).

7. Shields, R. Trends Genet. 22, 65–66 (2006).

8. Stanislaus, R. et al. Bioinformatics 21, 1754–1757 (2005).

9. Almeida, J.S. et al. Compar. Func. Genomics 6, 132–137 (2005).

10. McKillen, D.J. et al. BMC Genomics 6, 34 (2005).

11. Neumann, E. Sci. STKE 2005, pe22 (2005).

12. Chen, P.P.S. Assoc. Comput. Machinery Trans. Database Syst. 1, 9–36 (1976).

Replacing cRNA targets with cDNA reduces microarray cross-hybridization

To the editor: Gene-expression microarrays are designed to measure relative concentrations of transcripts through the specific hybridization of an immobilized DNA probe to its complementary target. This technology is viable to the extent that a single, rather permissive hybridization condition allows most probes to bind specifically to their targets. Despite efforts to maximize stringency, a significant hybridization signal can still be detected on various oligonucleotide-based platforms, even when there are a few mismatches between probe and target1–3. Furthermore, several groups have detected widespread cross-hybridization in microarray measurements4,5, and on the order of 10% of the probes on a common oligonucleotide array platform were predicted to be susceptible to cross-hybridization5. Efforts to optimize probe length found that longer probes enjoy stronger signal intensity but also suffer from increased propensity toward cross-hybridization6. Therefore, nonspecific binding remains a significant source of measurement error and may be the reason why quantitative reverse transcription (qRT)-PCR fails to confirm about 10–20% of difference calls made by microarray analysis (reviewed in ref. 7). Here, we report that a high level of promiscuity in DNA-RNA hybridization underlies widespread cross-hybridization in microarrays. This cross-hybridization can be reduced by using cDNA targets in place of cRNA.

From its inception, microarray technology took advantage of either of two types of biochemical entities as the labeled target: cRNA8 or cDNA9. Although in many respects these two types of labeled target are considered equivalent for the purpose of microarray analysis, the use of cRNA has held an important methodological advantage. Because RNA polymerase does not require a primer, it was rather straightforward to design a near-linear target amplification method, which has been extremely useful in experiments with small amounts of starting RNA8, including most clinical studies.

CORRESPONDENCE © 2006 Nature Publishing Group http://www.nature.com/naturebiotechnology

1072 VOLUME 24 NUMBER 9 SEPTEMBER 2006 NATURE BIOTECHNOLOGY

Even so, there are hints in the literature indicating that DNA-RNA hybridization might be less specific than DNA-DNA hybridization. One such relevant observation is that in free solution DNA-RNA hybrids, certain mismatch ‘wobble’ base pairs are more stable than complementary base pairs10. This effect is, however, absent in DNA-DNA hybrids11,12, indicating that targets made of DNA might be a more specific alternative to standard RNA targets. Although free solution studies cannot be automatically extrapolated to the microarray process (microarray hybridizations are not in free solution, are not in equilibrium13 and are burdened with labeling moieties), we postulated that using cDNA instead of cRNA targets may reduce cross-hybridization on microarrays. Further stimulus for testing this hypothesis was provided by the recent development of a novel target-preparation method using a chimeric DNA-RNA primer and isothermal DNA amplification to generate cDNA that is subsequently labeled for use on standard oligo microarrays14. This method provides a near-linear amplification method with labeled cDNA as the end product.

To compare the extent of cross-hybridization using cRNA and cDNA targets, we used a baseline RNA specimen from the human T-cell leukemia–derived Jurkat cell line with or without an added set of ‘spikes’ comprising hemoglobin transcripts HbA1, HbA2 and HbB. The comparative microarray analysis of these two samples is expected to identify all probes affected by cross-hybridization with the spikes. Therefore, the two RNA samples were processed in triplicate, either with a standard one-round T7 in vitro transcription protocol to produce cRNA target or with the linear isothermal protocol15 to produce cDNA target. Each labeled sample was hybridized to Affymetrix (Santa Clara, CA, USA) U133A 2.0 microarrays containing 22,277 probe sets, nine of which were intended to detect the spike-in transcripts (for further details, see Supplementary Methods online; raw data from the 12 arrays is available from GEO, accession no. GSE4532).

These nine probe sets were indeed the most differentially expressed, regardless of the type of target used (Fig. 1a,b). Along with these true changes, there were other probe sets that seemed to have lesser, but still substantial, differential expression. Using simple criteria, we detected 791 false changes using cRNA and 19 false changes using cDNA (Fig. 1a,b).
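The "simple criteria" (spelled out in the Figure 1 legend: present calls in at least three of six samples, a twofold change in mean expression and Student's t-test P < 0.05) can be sketched as follows. The function names, the pooled-variance form of the t-test and the synthetic replicate values are our assumptions for illustration; the authors do not specify their exact implementation:

```python
import math
from statistics import mean

# Two-sided critical value of Student's t for P = 0.05 with 4 degrees of
# freedom (two groups of three replicates each).
T_CRIT = 2.776

def t_statistic(a, b):
    """Pooled-variance Student's t statistic for two independent samples."""
    na, nb = len(a), len(b)
    va = sum((x - mean(a)) ** 2 for x in a) / (na - 1)
    vb = sum((x - mean(b)) ** 2 for x in b) / (nb - 1)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

def is_changed(log2_baseline, log2_spiked, present_calls):
    """Apply the three criteria to one probe set (log2-scale signals)."""
    if present_calls < 3:                                   # present in >= 3 of 6 samples
        return False
    if abs(mean(log2_spiked) - mean(log2_baseline)) < 1.0:  # 1.0 on log2 scale == twofold
        return False
    return abs(t_statistic(log2_baseline, log2_spiked)) > T_CRIT  # ~ P < 0.05

# A spiked hemoglobin probe set: large, consistent increase -> changed
print(is_changed([6.1, 6.0, 6.2], [10.3, 10.1, 10.4], present_calls=6))  # True
# An unaffected probe set: small, noisy difference -> not changed
print(is_changed([8.0, 8.2, 8.1], [8.3, 8.1, 8.2], present_calls=6))     # False
```

Counting the probe sets that pass these criteria in the spiked-versus-baseline comparison, minus the nine true spike-in probe sets, yields false-change totals of the kind quoted above.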

Each Affymetrix probe set comprises multiple individual probes, each with a different sequence. As expected, we found that all probes with a sequence exactly matching those of the spikes showed increased signal with either type of target. Additionally, many off-target probes showed increased or decreased signal, and this effect was substantially greater with cRNA than with cDNA (Fig. 1c,d). In general, the fold change of a probe corresponded to the maximum sequence similarity between that probe and any of the three spikes (Fig. 1c,d). Thus, it seems that cDNA target is better able to discriminate between the correct probe and similar but incorrect probes.

It is unlikely that the increased specificity of cDNA targets over cRNA targets comes at the price of decreased sensitivity. First, the percentage of probe sets called ‘present’ was very similar in the baseline sample (cRNA range, 60.5–61.2; cDNA range, 60.4–61.4) and greater with cDNA than with cRNA after addition of spikes (cRNA range, 51.0–53.0; cDNA range, 59.3–60.2). Also, in a recent paper, Barker et al.16 compared the same cDNA- and cRNA-based protocols for their ability to identify differentially expressed genes in the same pair of RNA aliquots. To validate differential gene-expression calls, they ran independent qRT-PCR-based measurements on 106 genes whose expression varied over several orders of magnitude. Across the entire concentration range of independently quantified genes, measurements using cDNA targets were better correlated with qRT-PCR than were measurements using cRNA targets16.

Figure 1 Comparison of microarray data from Jurkat-cell RNA with and without spiked transcripts. (a,b) At the probe-set level, cRNA target (a) detected more false changes than did cDNA target (b). Here, a probe set was considered changed if it had present calls in at least three of six samples, a twofold change in mean expression and Student’s t-test P < 0.05. (c,d) At the individual probe level, cRNA target (c) was more likely than cDNA target (d) to hybridize with off-target probes with some complementarity, as determined with BLAST. For reference, the score for an alignment with no mismatches is approximately equal to twice the number of identical bases. [Panels show scatter plots of log2 MAS5 signal (a,b) and log2 probe intensity (c,d), spikes added versus no spikes, for cRNA and cDNA targets; points are keyed by change status (true/false/no change) in a,b and by BLAST score in c,d.]

To assess microarray specificity, we chose to compare two RNA samples
that differ only by a set of three high-concentration spikes. It seems unlikely that a physiologically relevant experiment would involve gene-level changes of this magnitude, so our results are likely to indicate an upper limit of the effects of cross-hybridization. However, in an experiment in which many genes change levels, we expect that cross-hybridization could affect a larger number of probes, albeit to a lesser degree.

As indicated by our results, the use of multiple probes against the same transcript in Affymetrix gene chips may partially compensate for the effect of a few nonspecific probes within a given probe set. Nonetheless, there is a general tendency even on this platform to reduce the number of probes per transcript. For example, the recently introduced exon arrays from Affymetrix contain only four probes per exon. Interestingly, DNA rather than RNA target is used in this type of microarray, which may explain why four probes per probe set are sufficient. Other widely used microarray platforms, such as those from Agilent (Palo Alto, CA, USA), GE Healthcare (Little Chalfont, UK), Illumina (San Diego, CA, USA) and Applied Biosystems (Foster City, CA, USA), generally use a single probe per transcript; therefore, when a probe is prone to cross-hybridization, its false signals cannot be corrected by other probes. We thus conclude that replacing cRNA with cDNA offers a simple way to eliminate a significant portion of undesirable nonspecific signals.

Genomic tiling arrays have recently been used to determine the true extent of the transcribed portion of the human genome (reviewed in ref. 17). Because of the lack of appropriate controls for false-positive detection17, these experiments are particularly sensitive to errors caused by cross-hybridization. In light of our results, it is therefore not surprising that a side-by-side analysis of the same transcript pool with RNA- and cDNA-based targets revealed only a 35% overlap between the positive probe pairs identified by the two methods18. Consequently, it is rather reassuring that most tiling-array experiments searching for previously unidentified transcripts have used labeled cDNA as target18.

Although the results presented here demonstrate the increased promiscuity of RNA-DNA hybridization relative to DNA-DNA hybridization, it will take further experimentation to see whether the same difference holds in vivo. An affirmative result may have significant implications for the design of DNA-based antisense therapy, RNA-mediated chromatin remodeling and DNA methylation, which may be initiated by RNA-DNA hybridization19.

Note: Supplementary information is available on the Nature Biotechnology website.

Aron C Eklund1,2, Leah R Turner3, Pengchin Chen3, Roderick V Jensen4, Gianfranco deFeo3, Anne R Kopf-Sill3 & Zoltan Szallasi1

1Children’s Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology (CHIP@HST), Harvard Medical School, Boston, Massachusetts 02115, USA. 2Center for Neurologic Diseases, Brigham and Women’s Hospital, 65 Landsdowne St., Cambridge, Massachusetts 02139, USA. 3NuGEN Technologies, Inc., 821 Industrial Rd, Unit A, San Carlos, California 94070, USA. 4Department of Physics, University of Massachusetts Boston, Boston, Massachusetts 02125, USA. e-mail: [email protected]

1. Ramakrishnan, R. et al. Nucleic Acids Res. 30, e30 (2002).

2. Chudin, E. et al. Genome Biol. 3, RESEARCH0005 (2002).

3. Hughes, T.R. et al. Nat. Biotechnol. 19, 342–347 (2001).

4. Zhang, J., Finney, R.P., Clifford, R.J., Derr, L.K. & Buetow, K.H. Genomics 85, 297–308 (2005).

5. Wu, C., Carta, R. & Zhang, L. Nucleic Acids Res. 33, e84 (2005).

6. Relogio, A., Schwager, C., Richter, A., Ansorge, W. & Valcarcel, J. Nucleic Acids Res. 30, e51 (2002).

7. Draghici, S., Khatri, P., Eklund, A.C. & Szallasi, Z. Trends Genet. 22, 101–109 (2006).

8. Lockhart, D.J. et al. Nat. Biotechnol. 14, 1675–1680 (1996).

9. Schena, M., Shalon, D., Davis, R.W. & Brown, P.O. Science 270, 467–470 (1995).

10. Sugimoto, N., Nakano, M. & Nakano, S. Biochemistry 39, 11270–11281 (2000).

11. Allawi, H.T. & SantaLucia, J., Jr. Biochemistry 37, 2170–2179 (1998).

12. Allawi, H.T. & SantaLucia, J., Jr. Biochemistry 37, 9435–9444 (1998).

13. Sekar, M.M., Bloch, W. & St John, P.M. Nucleic Acids Res. 33, 366–375 (2005).

14. Kurn, N. et al. Clin. Chem. 51, 1973–1981 (2005).

15. Dafforn, A. et al. Biotechniques 37, 854–857 (2004).

16. Barker, C.S. et al. BMC Genomics 6, 57 (2005).

17. Johnson, J.M., Edwards, S., Shoemaker, D. & Schadt, E.E. Trends Genet. 21, 93–102 (2005).

18. Kampa, D. et al. Genome Res. 14, 331–342 (2004).

19. Matzke, M.A. & Birchler, J.A. Nat. Rev. Genet. 6, 24–35 (2005).


Why spurning food biotech has become a liability

Henry I Miller, Gregory Conko & Drew L Kershen

By rejecting gene-spliced ingredients in their products, some major food companies may be making foods that are less safe and wholesome for consumers—and that expose them to litigation.

In the late 1990s, a singular phenomenon swept the world. One after another, food and beverage companies capitulated to strident xenogenophobic voices that called for elimination of gene-spliced ingredients from their product lines. In the United States, fast food giant McDonald’s (Chicago) banned transgenic ingredients from its menu, food manufacturers Heinz (Pittsburgh) and Gerber (Fremont, MI, USA) dropped them from their baby food lines, and Frito-Lay (Atlanta) told its growers to stop planting corn containing Bacillus thuringiensis (Bt) toxin or risk exclusion from its snacks business. Elsewhere, brewers Kirin (Shinkawa, Japan) and Carlsberg (Valby, Denmark) eliminated gene-spliced ingredients from their beers.

These actions were rationalized variously as “protecting stakeholder interests,” “ensuring human safety” and “safeguarding the environment.” Ironically (and also surprisingly in these litigious times), in their eagerness to avoid biotech and the mainstream media’s “if it bleeds, it leads” coverage of the outlandish accusations and speculations of anti-biotech activists, these companies have exposed themselves to richly deserved legal jeopardy.

Toxic food
Every year, scores of packaged food products are recalled from the US market because of the presence of (all-natural) contaminants like insect parts, toxic molds, bacteria and viruses. Because farming takes place out of doors and in dirt, such contamination is a fact of life. Over the centuries, the main culprits in mass food poisoning have often been mycotoxins, such as ergotamine from ergot (Claviceps purpurea) or fumonisin from Fusarium spp., resulting from the fungal contamination of unprocessed crops. This process is exacerbated when insects attack food crops, opening wounds in the plant cuticle and epidermis that provide an opportunity for pathogen invasion. Once the molds get a foothold, poor storage conditions also promote their post-harvest growth on grain.

Fumonisin and some other mycotoxins are highly toxic, causing fatal diseases in livestock that eat infected corn and esophageal cancer in humans. Fumonisin also interferes with the cellular uptake of folic acid, a vitamin that is known to reduce the risk of neural tube defects in developing fetuses. Because fumonisin prevents the folic acid from being absorbed by cells, the toxin can, in effect, induce functional folic acid deficiency—and thereby cause neural tube defects such as spina bifida—even when the diet contains what otherwise would be sufficient amounts of folic acid.

Henry I. Miller is a fellow at the Hoover Institution, Stanford University, Stanford, California 94305-6010, USA. Gregory Conko is at the Competitive Enterprise Institute, 1001 Connecticut Avenue NW, Washington, DC 20036, USA. Barron’s has named their book, The Frankenfood Myth: How Protest and Politics Threaten the Biotech Revolution, one of the 25 Best Books of 2004. Drew L. Kershen is at the University of Oklahoma College of Law, Norman, Oklahoma 73019-5081, USA. e-mail: [email protected]

What’s on the menu? Consumers could react litigiously if baby food giants like Gerber or Heinz continue to prioritize the perceived risks of gene-spliced foods over the clear and present dangers of food allergies and mycotoxins. (©LWA-Dann Tardif/CORBIS)

COMMENTARY © 2006 Nature Publishing Group http://www.nature.com/naturebiotechnology

Regulatory agencies, such as the US Food and Drug Administration and the UK Food Standards Agency, are acutely aware of the danger of mycotoxins. They have established recommended maximum fumonisin levels in food and feed products made from corn. Although highly processed cornstarch and corn oil are unlikely to be contaminated with fumonisin, unprocessed or lightly processed corn (e.g., corn meal) can have fumonisin levels that exceed recommended levels. In 2003, the UK Food Standards Agency tested six organic corn meal products and 20 conventional corn meal products for fumonisin contamination. All six organic corn meals had elevated levels—from nine to forty times greater than the recommended levels for human health—and they were voluntarily withdrawn from grocery stores.

A role for biotech
The conventional way to combat mycotoxins is simply to test unprocessed and processed grains and discard those found to be contaminated—an approach that is both wasteful and dubious. But modern technology—specifically, products derived from recombinant DNA technology (also known as food biotech, gene-splicing or genetic modification)—offers a way to prevent the problem. Contrary to the claims of biotech critics, who single out such crops as posing the risk of introducing new allergens, toxins or other nasty substances into the food supply (none of which has actually been proven to have occurred), such products would offer the food industry a proven and practical means of tackling the fungal contamination at its source.

An excellent example is corn crafted by splicing into commercial corn varieties a gene (or genes) encoding natural toxins from the bacterium B. thuringiensis. The Bt gene expresses a protein that is toxic to corn-boring insects but is harmless to birds, fish and mammals, including humans. As the Bt corn fends off insect pests, it also reduces the levels of the mold Fusarium, thereby reducing the levels of fumonisin. Thus, switching to the gene-spliced, insect-resistant corn for food processing would lower the levels of fumonisin—as well as the concentration of insect parts—likely to be found in the final product. Indeed, researchers at Iowa State University in Ames and the US Department of Agriculture found that in Bt corn the level of fumonisin is reduced by as much as 80% compared to conventional corn1,2.

Thus, on the basis of both theory and empirical knowledge, there should be potent incentives—legal, commercial and ethical—to use such gene-spliced grains more widely. One would expect public and private sector advocates of public health to demand that such improved varieties be cultivated and used for food—not unlike requirements for drinking water to be chlorinated and fluoridated. Food producers who wish to offer the safest and best products to their customers—to say nothing of being offered the opportunity to advertise ‘new and improved!’—should be competing to get gene-spliced products into the marketplace.

Alas, none of this has come to pass. Activists have mounted vocal and intractable opposition to food biotech, in spite of demonstrated, significant benefits, including reduced use of chemical pesticides, less runoff of chemicals into waterways, greater use of farming practices that prevent soil erosion, higher profits for farmers and less fungal contamination. Inexplicably, government oversight has also been an obstacle, subjecting the testing and commercialization of gene-spliced crops to unscientific and draconian regulations that have vastly increased testing and development costs and limited the use and diffusion of food biotech.

The result is jeopardy for everyone involved in food production and consumption: consumers are subjected to avoidable, and often undetected, health risks, and food producers have placed themselves in legal jeopardy. The first point is obvious, the latter less so, but it makes a fascinating story: agricultural processors and food companies may face at least two kinds of civil liability for their refusal to purchase and use fungus-resistant, gene-spliced plant varieties, as well as other superior products.

(Baby) food for thought
In 1999, the Gerber foods company succumbed to activists’ pressure, announcing that its baby food products would no longer contain any gene-spliced ingredients. Indeed, Gerber went farther and promised it would attempt to shift to organic ingredients that are grown without synthetic pesticides or fertilizers. Because corn starch and corn sweeteners are often used in a range of foods, this meant wholesale changes to Gerber’s entire product line.

As noted above, not only is gene-spliced corn likely to have lower levels of fumonisin than conventional varieties, but organic corn is likely to have the highest levels, because it suffers greater insect predation due to less effective pest controls. If a mother some day discovers that her ‘Gerber baby’ has developed liver or esophageal cancer, or a neural tube defect such as spina bifida, she might have a valid legal claim against Gerber3. On the child’s behalf, a plaintiff’s lawyer can allege strict products liability based on mycotoxin contamination in the baby food as the causal agent of the cancer or neural tube defects. The contamination would be considered a manufacturing defect under products liability law because the baby food did not meet its intended product specifications or level of safety. Gerber could be found liable “even though all possible care was exercised in the preparation and marketing of the product,” simply because the contamination occurred.

The plaintiff’s lawyer could also allege a design defect in the baby food, because Gerber knew of the existence of a less risky design—namely, the use of gene-spliced varieties that are less prone to Fusarium and fumonisin contamination—but deliberately chose not to use it. Instead, Gerber chose to use non-gene-spliced, organic food ingredients, knowing that the foreseeable risks of harm posed by them could have been reduced or avoided by adopting a reasonable alternative design—that is, by using gene-spliced Bt corn, which is known to have a lower risk of mycotoxin contamination.

Gerber might answer this design defect claim by contending that it was only responding to consumer demand, but that alone would not be dispositive. Products liability law subjects defenses in design defect cases to a risk-utility balancing in which consumer expectations are only one of several factors used to determine whether the product design (e.g., the use of only non-gene-spliced ingredients) is reasonably safe. A jury might conclude that whatever consumer demand there may be for nonbiotech ingredients does not outweigh Gerber’s failure to use a technology that is known to lower the health risks to consumers.

Even if Gerber were able to defend itself from the design defect claim, the company might still be liable because it failed to provide adequate instructions or warnings about the potential risks of non-gene-spliced ingredients. For example, Gerber could have labeled its non-gene-spliced baby food with a statement such as: “This product does not contain gene-spliced ingredients. Consequently, this product has a very slight additional risk of mycotoxin contamination. Mycotoxins can cause serious diseases, such as liver and esophageal cancer and birth defects.”

Hypoallergenic foods
Whatever the risk of toxic or carcinogenic fumonisin levels in nonbiotech corn may be (probably low in industrialized countries, where food producers generally are cautious about such contamination), a more likely scenario is potential legal liability when a food product causes an allergic reaction4.

Between 6% and 8% of children and between 1% and 2% of adults are allergic to one or another food ingredient, and an estimated 150 US citizens die each year from exposure to food allergens5. Allergies to proteins from peanuts, soybeans and wheat, for example, are quite common and can be severe. Although only about 1% of the population is allergic to peanuts, some individuals are so highly sensitive that exposure causes anaphylactic shock, killing dozens of people every year in North America6.

Protecting those with true food allergies is a daunting task. Farmers, food shippers and processors, wholesalers and retailers, and even restaurants must maintain meticulous records and labels and ensure against cross-contamination. Still, in a country where about a billion meals are eaten every day, missteps are inevitable. Dozens of processed food items must be recalled every year due to accidental contamination or inaccurate labeling.

Fortunately, biotech researchers are well along in the development of crops in which the genes encoding allergenic proteins have been silenced or removed. According to University of California, Berkeley, biochemist Bob Buchanan, hypoallergenic varieties of wheat could be ready for commercialization within a decade, and nuts soon thereafter (R. Buchanan, personal communication; ref. 7). Once these products are commercially available, agricultural processors and food companies that refuse to use these safer food sources will open themselves to products-liability, design-defect lawsuits4.

Property damages and personal injury
Potato farming is a growth industry, primarily due to the vast consumption of french fries at fast-food restaurants. However, growing potatoes is not easy, because they are preyed upon by a wide range of voracious and difficult-to-control pests, such as the Colorado potato beetle, virus-spreading aphids, nematodes, potato blight and others.

To combat these pests and diseases, potato growers use an assortment of fungicides (to control blight), insecticides (to kill aphids and the Colorado potato beetle) and fumigants (to control soil nematodes). Although some of these chemicals are quite hazardous to farm workers, forgoing them could jeopardize the sustainability and profitability of the entire potato industry. Standard application of synthetic pesticides enhances yields more than 50% over organic potato production, which prohibits most synthetic inputs.

Consider a specific example. Many growers use methamidophos, a toxic organophosphate nerve poison, for aphid control. Although methamidophos is a US Environmental Protection Agency–approved pesticide, the agency is currently reevaluating the use of organophosphates and could ultimately prohibit or greatly restrict the use of this entire class of pesticides. As an alternative to these chemicals, Monsanto (St. Louis) developed a potato dubbed NewLeaf that contains a Bt gene to control the Colorado potato beetle. The ORF-1 (open reading frame 1) and ORF-2 regions from potato leafroll luteovirus (PLRV) were later added to confer resistance to PLRV infection spread by the aphids. The resulting NewLeaf-Plus potato, which received US regulatory approval for food/feed use and environmental release in 1998, is resistant to these two scourges of potato plants and allows growers who adopt it to reduce their use of chemical controls and increase yields.

Farmers who planted NewLeaf and NewLeaf-Plus became convinced that these varieties were the most environmentally sound and economically efficient way to grow potatoes, but after five years of excellent results they encountered an unexpected snag. Under pressure from anti-biotech organizations, McDonald’s, Burger King (Miami) and other restaurant chains informed their potato suppliers that they would no longer accept gene-spliced potato varieties for their french fries. As a result, potato processors such as J.R. Simplot (Boise, ID, USA) inserted a nonbiotech potato clause into their farmer-processor contracts and informed farmers that they would no longer buy gene-spliced potatoes. In spite of its substantial environmental, occupational and economic benefits, NewLeaf became a sort of contractual poison pill and is no longer grown commercially.

Now, assume that a farmer who is required by contractual arrangement to plant nonbiotech potatoes sprays his potato crop with methamidophos (the organophosphate nerve poison) and that the pesticide drifts into a nearby stream and onto nearby farm laborers. As a result, thousands of fish die in the stream and the laborers report to hospital emergency rooms complaining of neurological symptoms.

This hypothetical scenario is, in fact, not at all far-fetched. Fish kills attributed to pesticide runoff from potato fields are commonplace. In the potato-growing region of Prince Edward Island, Canada, for example, a dozen such incidents occurred in one thirteen-month period alone, between July 1999 and August 2000 (ref. 8). According to the United Nations’ Food and Agriculture Organization (Rome), “normal” use of the pesticides parathion and methamidophos is responsible for some 7,500 pesticide poisoning cases in China each year.

In our hypothetical scenario, the state environmental agency might bring an administrative action for civil damages to recover the cost of the fish kill, and a plaintiff’s lawyer could file a class-action suit on behalf of the farm laborers for personal injury damages.

Who’s legally responsible?
Several possible circumstances could enable the farmer’s defense lawyer to shift culpability for the alleged damages to the contracting processor and the fast-food restaurants that are the ultimate purchasers of the potatoes4. These circumstances include: the farmer’s having planted Bt potatoes for the previous several years; his contractual obligation to the potato processor and its fast-food retail buyers to provide only nonbiotech varieties; and his demonstrated preference for planting gene-spliced Bt potatoes, were it not for the contractual proscription. If these conditions could be proven, the lawyer defending the farmer could name the contracting processor and the fast-food restaurants as cross-defendants, claiming either contribution in tort law or indemnification in contract law for any damages legally imposed upon the farmer client.

The farmer’s defense could be that those companies bear the ultimate responsibility for the damages because they compelled the farmer to engage in higher-risk production practices than he would otherwise have chosen. The companies chose to impose cultivation of a non-gene-spliced variety upon the farmer although they knew that to avoid severe yield losses, he would need to use organophosphate pesticides. Thus, the defense could argue that the farmer should have a legal right to pass any damages (arising from contractually imposed production practices) back to the processor and the fast-food chains.

Food giants—watch out!
Companies that insist upon farmers’ using production techniques that involve foreseeable harms to the environment and humans may be held legally accountable for that decision. If agricultural processors and food companies manage to avoid legal liability for their insistence on nonbiotech crops, they will be ‘guilty’ at least of externalizing their environmental costs onto the farmers, the environment and society at large.

1. Munkvold, G.P., Hellmich, R.L. & Rice, L.G. Plant Dis. 83, 130–138 (1999).
2. Dowd, P. J. Econ. Entomol. 93, 1669–1679 (2000).
3. Kershen, D.L. Food Drug Law J. 61, 197–236 (2006).
4. Kershen, D.L. Oklahoma Law Rev. 53, 631–652 (2000).
5. Sicherer, S., Munoz-Furlong, A., Wesley Burks, A. & Sampson, H. J. Allergy Clin. Immunol. 103, 559–562 (1999).
6. Bock, S.A., Munoz-Furlong, A. & Sampson, H.A. J. Allergy Clin. Immunol. 107, 191–193 (2001).
7. Weise, E. Biotechnology appears to be withering as a food source. USA Today February 2 (2005), p. 8D.
8. Nickerson, C. Potatoes, pesticides divide island. The Boston Globe August 30 (2000), p. A1.

COMMENTARY © 2006 Nature Publishing Group http://www.nature.com/naturebiotechnology

1078 VOLUME 24 NUMBER 9 SEPTEMBER 2006 NATURE BIOTECHNOLOGY

The breeder’s dilemma—yield or nutrition?
Cindy E Morris & David C Sands

The emphasis of traditional crop production on yield is counterproductive for human nutrition.

Plant breeders, challenged to create more nutritious crops, face seemingly radical choices that constitute a ‘breeder’s dilemma’. In the search for higher yields and lower farming costs, have breeders inadvertently selected for crops with reduced nutritional quality? To create foods that keep pace with our growing understanding of what constitutes healthy diets, plant breeders may need to make a significant shift away from traditional selection criteria. Subsidizing crop nutritional value rather than yield could be an important and economical driver for this shift in perspective.

How healthy is food?
The wide variety and availability of DNA and proteomic tests for human health and disease treatment are among the principal technological consequences of the Human Genome Project. This is leading to a growing understanding of the molecular basis of human health and of genetic predisposition to diseases, such as obesity, type 2 diabetes mellitus, cardiovascular disease and colorectal and other cancers. The pivotal role that diet plays in both the cause and the remediation of these and other health problems is also becoming increasingly clear. Our challenge is to narrow the growing gap between what we should eat to maintain optimal health and the nutritional quality of the staple foods in modern diets.

Plants are a fundamental constituent of the human diet, either as direct sources of nutrients or indirectly as feed for animals. Modern plant breeding has been historically oriented toward high agronomic yield, easy and consistent processing, and disease and pest resistance. This strategy may have unwittingly led to the proliferation of foods that are at the root of certain dietary problems.

The biochemical quality of certain staple plant foods—and not simply the quantities consumed—might be a predisposing factor for obesity and cardiovascular disease. Furthermore, some plants, although efficient as feeds for animal production, may adversely affect the nutritional qualities of animal-based foods. For example, they might not provide sources of certain types of polyunsaturated fatty acids. Creating staple foods that are more nutritious might require selecting crop cultivars that are lower yielding, more sensitive to pests, possess unusual flavors or other uncommon properties, or otherwise do not meet the traditional criteria of plant breeders. Creation of oil crops and animal feeds that enhance the health-promoting quality of animal-derived foods might involve some concerted genetic modification of our current crops or even replacement of traditional canola-, soy-, wheat- and corn-based products with new crops.

Problems with staples
Wheat breeding, for example, has been historically oriented toward increasing yield and the amounts of amylopectin, gluten and protein. Amylopectin and gluten contents ensure baking and processing qualities. After cooking, however, amylopectin (branched starch) is more readily digestible than amylose (straight starch). Results of feeding trials suggest that quickly digested starch, such as amylopectin, promotes the development of insulin resistance in rats. The relatively slow time course of this condition resembles the normal development of insulin resistance in humans1. Insulin resistance is the leading risk factor for type 2 diabetes mellitus and is aggravated by obesity2. In contrast, consumption of high-amylose foods normalizes the insulin response of hyperinsulinemic human subjects. This has potential benefit for diabetics3.

Gluten, the major storage protein in wheat (and similar proteins in barley and rye), causes an autoimmune response that damages the small intestine of certain genetically predisposed individuals. The damaged mucosal lining of the small intestine leads to chronic malnutrition whose symptoms are impaired physical health and emotional state. This genetic disorder is known as celiac disease. According to the US National Institutes of Health Consensus Development Conference Statement on Celiac

Cindy E. Morris is at the Unité de Pathologie Végétale, INRA-Avignon, Montfavet 84140, France. She and David C. Sands are at the Department of Plant Sciences and Plant Pathology, 119 Plant Biosciences Building, Montana State University, Bozeman, Montana 59717-3150, USA. e-mail: [email protected]

For staple crops, dietary problems, such as intolerance to gluten in wheat or immune reactions to the Bet v 6 minor allergen in strawberries, and improved dietary value could be addressed if plant breeding programs were to broaden from their narrow focus on agronomic traits, such as increased yield.

James F. Quinn/Chicago Tribune/Newscom


Disease of June 2004 (ref. 4), this disease has been underdiagnosed by the medical community and may affect as many as 0.5–1% of people in the United States and Europe.

Our staple crops may also have inherent deficiencies that may contribute to emerging dietary problems. Corn is a case in point. About 60% of corn seed proteins consist of prolamins (or zeins) that are almost completely devoid of the essential amino acids lysine and tryptophan. Attempts to select corn lines with enhanced lysine and tryptophan have invariably led to reductions in zein content. The resulting corn lines had soft chalky endosperm and consequently also suffered increased mechanical damage during harvest. They were also more susceptible to diseases and lower yielding, and thus have never attracted significant commercial interest5.

Corn-based diets (animal or human) require lysine and tryptophan supplementation for adequate protein synthesis. Tryptophan is also the precursor for the synthesis of some neurotransmitters and for niacin6. Historically, the nutritional deficiency pellagra developed where corn was an important dietary staple and where protein intake was low. It is caused by niacin deficiency due to the absence of its precursor, tryptophan, from the diet. Symptoms are severe dermatitis, diarrhea, dementia and eventually death7. Pellagra is rather uncommon today in all but the poorest regions of the world. But in those parts of the world where corn is still an important component of the diet, there may be other consequences of low tryptophan consumption that we are ignoring. The neurotransmitter serotonin, synthesized in the brain from tryptophan, is responsible for feelings of well-being, calmness, personal security, relaxation, confidence and concentration; it is a key player in overall mood, in aggressiveness8 and in the development of depression. Could consumption of tryptophan-rich foods play a role in reducing the prevalence of depression and aggression in society?

Lipids also play an important role in human health, both as a source of calories and as a source of essential fatty acids. The ratio of omega-3 to omega-6 polyunsaturated fatty acids in our modern diet has escalated from an optimal ratio of 1:1 or 1:2 to a current ratio of 1:25 to 1:30. This nonoptimal ratio results from high consumption of red meats rather than cold-water fish, a lack of plant sources of long-chain omega-3 fatty acids in our diets and the use of animal feeds rich in omega-6 versus omega-3 fatty acids. Development of cooking oils has led to widespread availability of mono- and polyunsaturated oils, such as canola and soy, that have largely replaced saturated fats. Unfortunately, these oils are relatively low in long-chain omega-3 fatty acids and high in omega-6 fatty acids. This skewed ratio is a key factor in the prevalence of cardiovascular diseases and inflammatory/autoimmune diseases9,10. Fish oils, high in long-chain omega-3 fatty acids, cannot replace plant-derived oils for widespread use in cooking and feeds. There is a need for crops that can provide abundant quantities of these fatty acids.

It is axiomatic that one of the aims of plant production is the reduction of crop loss from predation and disease. But we wonder whether, in our efforts to boost yields to feed growing populations, we haven’t overlooked what pests could be telling us about the nutritional value of crops. Plants and bacteria are the only organisms that can synthesize the full complement of protein amino acids. Animals must consume certain preformed amino acids. The ten essential amino acids are the same for all animals (with a few exceptions, such as aphids and termites, which harbor bacterial symbionts that can make essential amino acids and furnish them to their hosts11,12). Thus, plant-derived food that is nutritionally good for insects, rodents, deer and nematodes, for example, is fundamentally good food for humans.

Likewise, many of the compounds that plants produce to inhibit herbivory, such as alkaloids, cyanogenic glycosides, glucosinolates and terpenoids, have wide-spectrum activities across the animal kingdom. Furthermore, most animals are equipped with taste receptors and internal feedback mechanisms that allow them to sort through stimuli to obtain necessary nutrients and avoid toxins. Thus, if crops are bred for undesirability to insects, could this mean something about their quality as food for humans? Corn and wheat are both deficient in lysine and methionine, and there is some effort to increase the content of these amino acids in these crops. Could this lead to increased desirability to pests? We are not proposing that plant breeding ignore or enhance the susceptibility of crops to pests. Rather, we are pointing out that this is part of the breeder’s dilemma in selecting more nutritious crops.

Breeding for yield, fruit size and shelf life has also inadvertently led to changes in the flavor qualities of fruits and vegetables. Tomatoes13 and strawberries14 are well-studied examples. Many of the plant volatiles that contribute to flavor are derived from essential long-chain polyunsaturated fatty acids or essential amino acids. In tomato, virtually all of the major volatiles are linked to compounds that are essential nutrients13. With the exception of volatiles that originate from lycopene (which remain high in tomatoes because of selection for red-colored fruits), flavor components related to essential nutrients have been diminished through breeding13.

Shifting goals for breeders?
What are the technological routes to developing highly nutritious foods, particularly from staple crops? Clearly, genetic and metabolic engineering are likely to be very effective (and in some cases the only possible) routes to modify our current staple crops by insertion of specific genes, gene silencing and immunomodulation. Gene insertion has been used to create rice high in provitamin A15, and gene silencing has led to slight increases in the lysine and tryptophan contents of wheat5. Recently, heavy-chain antibodies from llamas have been used to immunomodulate starch branching enzyme A in potatoes, leading to higher amylose content of tubers16.

We also need to seriously consider alternative plant species as candidates for major staple crops. To create gluten-free grain crops or enhance omega-3 production in plants, should breeders focus on genetic modifications of wheat, canola or soy, or could alternative plant species more efficiently and effectively lead to solutions for the associated dietary problems?

To avoid damage to more nutritious crops by pests, breeders might need to design multiline or composite crops. Each component of these crops might lack a specific nutrient, but the composite would have all of the essential amino acids, for example, in optimal quantities. This might avoid enhancing their desirability to pests.

The route to more nutritious crops might also involve the full range of insights arising from genomics, proteomics and metabolomics as applied to humans, nematodes, insects, plants and other organisms, and what these approaches are telling us about the biochemical differences between health and disease. Furthermore, to create widespread acceptance of nutritious crops, breeders may need to bravely address the biology of food addiction and satiation, two very real drivers of food preferences.

Breeders have always confronted dilemmas; it is their stock in trade. The breeder’s dilemma we describe arises from the confrontation between the emerging data-driven insights into the physiology of human health and traditional agricultural practices and economics of food production. Wholesale creation of highly nutritious crops may threaten current commodity crops. Solving the breeder’s dilemma requires a radical shift in perspective.

Improving our diet is an investment in human capital. It will have important positive spillovers for education and behavior; ultimately, it could improve the quality of life. Can we eventually consider that gains in work time and higher learning performance, for example, are part of the economic results of plant breeding programs? Can an economic system that subsidizes yield be converted to one that, by subsidizing crops with high nutritional quality, concomitantly reduces other costs to society related to human health?

1. Byrnes, S.E., Miller, J.C. & Denyer, G.S. J. Nutr. 125, 1430–1437 (1995).
2. Hirosumi, J. et al. Nature 420, 333–336 (2002).
3. Behall, K.M. & Howe, J.C. Am. J. Clin. Nutr. 61, 334–340 (1995).
4. <http://consensus.nih.gov/2004/2004CeliacDisease118html.htm>
5. Huang, S. et al. J. Agric. Food Chem. 52, 1958–1964 (2004).
6. Heine, W., Radke, M. & Wutzke, K.D. Amino Acids 9, 191–205 (1995).
7. Breoton, B.P. Nutr. Anthropol. 22, 2–9 (1994).
8. Young, S.N. & Leyton, M. Pharm. Biochem. Behavior 71, 857–865 (2002).
9. Kris-Etherton, P.M., Harris, W.S. & Appel, L.J. Circulation 106, 2747–2757 (2002).
10. Simopoulos, A.P. J. Am. Coll. Nutr. 21, 495–505 (2002).
11. Douglas, A.E. Annu. Rev. Entomol. 43, 17–37 (1998).
12. Douglas, A.E., Minto, L.B. & Wilkinson, T.L. J. Exp. Biol. 204, 349–358 (2001).
13. Goff, S.A. & Klee, H.J. Science 311, 815–819 (2006).
14. Aharoni, A. et al. Plant Cell 16, 3110–3131 (2004).
15. Potrykus, I. Plant Physiol. 125, 1157–1161 (2001).
16. Jobling, S.A. et al. Nat. Biotechnol. 21, 77–80 (2003).


The protection racket
Tom Jacobs

Biotech stocks can be white-knuckle volatile, but there is a way you can profit from, or insure against, such rapid and dramatic price changes. Through put and call options, you may not only make more profits, but also buy and sell ‘protection’ for your investments (with no need for ‘knuckle’ sandwiches from your neighborhood mobster). Despite their potential benefits, options are definitely not for beginners, but everyone—especially biotech investors—should know about these valuable tools.

Calls and puts as profit makers
Call and put options resemble futures contracts for oil, wheat, coffee or pork bellies, except that the buyer has the right, but not the obligation, to buy or sell the underlying asset by a certain time (‘expiration’) at a certain price (the ‘strike price’). You can buy or sell options just like stock trades through your broker, paying a commission and, as a buyer, a charge called an options premium.

You buy options for a larger potential gain than if you owned the stock. For (hypothetical) biotech Gene Genie, shares sell for $10. You might pay a $1 premium for a ‘call’ option on the shares, with a strike price of $10 and January 2008 expiration. Let’s say shares rise to $15 some time before expiration. The call option premium rises $4 to $5 (it won’t always move dollar-for-dollar with the stock, but that’s a more complex topic), and if you sold, you would pocket a terrific net gain of $4 ($5 minus your $1 premium), or four times your money. Meanwhile, the stock owner profited only 50%. These powerful gains are why few people actually hold the options to expiration and buy the stock, but there is a red flag warning: if Gene Genie shares finish at or below the $10 strike, the option expires worthless and you lose 100% of your investment (and below $11 you still lose money), while the stock owner loses only a part.

Using the same principle, if you think the Gene Genie stock price will go down from its price of $10, you can pay a $1 premium to buy a ‘put’ option at a $10-a-share strike price for January 2008 expiration. You then profit if the stock falls below that price at or before expiration, just as if you had shorted the stock itself.
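The call and put arithmetic above can be captured in a few lines. The sketch below uses hypothetical function names of our own; it computes profit at expiration only, ignoring commissions and the time value that makes premiums move less than dollar-for-dollar before expiration:

```python
def long_call_profit(spot, strike, premium):
    # A call at expiration is worth max(spot - strike, 0);
    # profit is that value minus the premium paid.
    return max(spot - strike, 0.0) - premium

def long_put_profit(spot, strike, premium):
    # A put at expiration is worth max(strike - spot, 0).
    return max(strike - spot, 0.0) - premium

# Gene Genie example: $10 strike, $1 premium.
print(long_call_profit(15, 10, 1))  # stock at $15: $4 profit on $1 at risk
print(long_call_profit(10, 10, 1))  # stock at the strike: entire $1 premium lost
print(long_put_profit(8, 10, 1))    # stock falls to $8: $1 profit on the put
```

At $15 the call returns four times the $1 at risk while the stockholder earns 50%, exactly the asymmetry the text describes; at or below the strike, the option buyer loses everything.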

Calls and puts as insurance
The reverse is how you insure against disaster. If you bought Gene Genie at $10, you could buy a $10 put as protection, profiting on the put option if the stock drops, even though you would lose money on the stock itself. If you are shorting Gene Genie at $10, you buy the $10 call as protection. Then, if the stock rises and your short loses, you compensate with some profit because the call option increases in value.

True, this strategy would reduce potential profits by the amount of option premium you pay for the insurance. On the other hand, it also downsizes your risk because you will make some profit on the option if you suffer losses on the stock.
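The insurance logic can be sketched the same way. Assuming the same hypothetical Gene Genie numbers ($10 entry, $10 strike, $1 premium) and valuing everything at expiration, the combined stock-plus-put position below shows how the put caps the downside while trimming the upside by the premium paid:

```python
def protected_stock_profit(spot, entry, strike, premium):
    # P&L of owning the stock plus a protective put:
    # stock gain/loss, plus the put's expiration value, minus the premium.
    stock_pl = spot - entry
    put_pl = max(strike - spot, 0.0) - premium
    return stock_pl + put_pl

# Bought Gene Genie at $10 and a $10 put for $1:
print(protected_stock_profit(5, 10, 10, 1))   # crash to $5: loss capped at the $1 premium
print(protected_stock_profit(15, 10, 10, 1))  # rally to $15: $4 profit, upside trimmed by $1
```

However far the stock falls, the total loss never exceeds the premium, which is the insurance the text describes.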

A seller or a buyer be?
So far, we have discussed only buying options, but for every buyer, there is a seller. This is the cool part: if you sell (or write) the insurance, you pocket the premium today and may never have to pay out on the ‘claim.’

Remember, if you buy Gene Genie calls or puts, you pay the option premium to a seller on the open market in a normal trade via your broker. A seller pockets the cash immediately. If Gene Genie is not above $10 (in the case of calls) or below $10 (in the case of puts) at January 2008 expiration, your option expires worthless to you, but the seller keeps the premium. Because most option contracts expire worthless or close to it, the option seller has a statistical advantage. Indeed, the larger your investing account, the more you can supplement your income by selling options selectively. Thus, my business partner and friend Jeff Fischer has supplemented his income for years by selling puts on two biotechs: Millennium Pharmaceuticals (Cambridge, MA, USA; Nasdaq:MLNM) whenever shares dropped to the mid-single digits, and Northfield Laboratories (Northbrook, IL, USA; Nasdaq:NFLD) at prices below $10. (This is not a recommendation, but an illustration.)
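The seller’s side of that trade is the mirror image of the buyer’s. A minimal sketch of a put writer’s profit at expiration, again with hypothetical numbers rather than any actual trade mentioned above:

```python
def short_put_profit(spot, strike, premium):
    # The put writer pockets the premium up front; if the stock
    # finishes below the strike, the writer must buy at the strike,
    # a loss of (strike - spot) set against the premium kept.
    return premium - max(strike - spot, 0.0)

print(short_put_profit(12, 10, 1))  # expires worthless: writer keeps the full $1 premium
print(short_put_profit(7, 10, 1))   # assigned at $10 with stock at $7: net -$2
```

The writer’s gain is capped at the premium, while the potential loss grows as the stock falls, which is why the next section’s warnings matter.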

Risks and benefits
Options involve significant risks and restrictions. First, when you sell a put or a call, you agree to buy or sell the shares at a given price. If the option works against you and for some reason you have to buy or sell, you can lose a lot of money. The topics of ‘covered’ or ‘naked’ puts and calls require much more space than is available here, but you should know what your potential losses are with each option. They can cripple you if you aren’t prepared.

Second, not all stocks have options, so you can’t profit with, insure or sell insurance for all holdings, and the options that do exist often cover only short periods before expiration, not over a year away as in our Gene Genie example. You can find these and other options data via online sources such as ‘Yahoo! Finance’ or online brokers, including Chicago-based OptionsXpress, which specializes in options. Lastly, most retirement accounts prohibit options except selling covered calls, so you are restricted to a taxable brokerage account for most options activity.

These caveats aside, the volatility of the biotech sector makes it fertile ground for options investors. Volatility injects uncertainty, prompting the options seller (rather like an insurance company) to charge a higher premium. That’s why options premiums on average are much higher for a speculative biotech stock than for, say, Pfizer (New York, NY, USA), Microsoft (Redmond, WA, USA) or other large companies (hurricane insurance is, after all, more expensive in Miami than in Toronto). When biotechs move dramatically, they can provide excellent options opportunities.

Tom Jacobs is co-founder of Complete Growth Investor, http://www.completegrowth.com, a stock service for individual investors. Tom was neither long nor short either shares or options of the companies mentioned here at the time of writing. Options can involve significant risk and should be studied carefully before investing.

INVESTOR’S LAB

Oversight of US genetic testing laboratories
Kathy L Hudson, Juli A Murphy, David J Kaufman, Gail H Javitt, Sara H Katsanis & Joan Scott

Despite the boom in genetic tests available in US laboratories, oversight remains patchy. A survey of laboratory directors suggests that mandatory proficiency testing would result in fewer errors.

Today, genetic tests for close to 1,000 diseases are clinically available, with hundreds more under development1. Results from these tests can lead to profound, life-changing decisions, such as whether to undergo prophylactic mastectomy, terminate a pregnancy or take a particular drug or dosage of a drug. An incorrect test result can lead to misdiagnosis and inappropriate or delayed treatment; therefore, it is imperative that results from genetic tests be accurate and reliable.

To explore whether creation of a genetic testing specialty with specific proficiency testing (PT) standards could improve the quality of genetic testing, we have examined not only the relationship between participation in PT for genetic testing and laboratory quality but also the attitudes of laboratory directors toward current genetic testing regulation, the value of a genetic testing specialty and the value of PT in ensuring quality testing. The data from our survey clearly demonstrate that participation in PT correlates with test quality. What’s more, most laboratory directors support moves to create formal registration under a genetic testing specialty for centers that carry out such analyses.

The testing landscape
Over the past three decades, genetic testing has played an increasingly important role in clinical medicine. The first genetic test, for the prenatal diagnosis of sickle cell disease, was developed in 1978 and signaled the birth of modern clinical molecular genetics2. What began as a handful of academic laboratories performing genetic testing for rare and often debilitating diseases has grown into a multimillion-dollar commercial industry3. Fueled by information gained from the Human Genome Project, new genetic tests are quickly transitioning from the research bench to clinical practice (Fig. 1).

Currently, a patchwork of oversight mechanisms is in place to help ensure the quality of genetic testing. Only a few genetic tests—those marketed by companies as ‘test kits’—require FDA premarket review. Most tests are developed in-house by clinical laboratories (so-called home brews) and are not subject to government review before they are made clinically available.

In 1988, the US Congress enacted the Clinical Laboratory Improvement Amendments (CLIA) in response to reports of rampant errors and poor-quality laboratory testing services, particularly with regard to Pap smear results. Any laboratory performing testing on human specimens and reporting patient-specific results must be certified under the provisions of CLIA and adhere to general requirements for quality control (QC) standards, personnel qualification and documentation/validation of test procedures4. Research laboratories are exempt only if they “do not report patient-specific results for the diagnosis, prevention, or treatment of any disease or impairment of, or the assessment of the health of individual patients” (Box 1).

Laboratories performing tests categorized as high complexity under CLIA must enroll in the appropriate specialty area, if one is available. Specialty areas provide more detailed requirements than the general CLIA regulations. In particular, many specialties require enrollment in a CLIA-approved PT program. However, a specialty area for molecular and biochemical genetic testing has not yet been created, so there are no specific QC, personnel or PT

Kathy L. Hudson, Juli A. Murphy, David J. Kaufman, Gail H. Javitt and Joan Scott are at the Genetics and Public Policy Center, Berman Bioethics Institute, Johns Hopkins University, 1717 Massachusetts Avenue, NW, Suite 530, Washington, DC, 20036, USA and Sara H. Katsanis is at the DNA Diagnostic Laboratory, Johns Hopkins Hospital, 600 N. Wolfe St., Baltimore, MD 21287, USA. e-mail: [email protected]

Genetic testing without quality control may be cause for concern. (Source: Genetics and Public Policy Center, Washington, DC.)

FEATURE

standards required by CLIA for these kinds of tests. In the absence of a formal PT program, CLIA states that “a laboratory must establish and maintain the accuracy of its testing procedures” and “have a system for verifying the accuracy of its test results at least twice a year.” Thus, CLIA does not require genetic testing laboratories to enroll in a formal PT program, although some accrediting entities do (e.g., New York State requires laboratories located in or doing business in New York State to participate in PT programs if they are available). Moreover, formal PT programs are available for only a small fraction of the genetic tests offered today. When a laboratory cannot or chooses not to enroll in a formal PT program, it can perform PT by exchanging samples with other laboratories performing similar testing, retesting archived specimens or splitting samples and comparing results.

Few empirical data exist on genetic testing laboratory errors and testing quality, and no data have been made available that directly assess the relationship between the extent of participation in formal and informal PT programs and the types or frequency of genetic testing errors. A review of the literature from both genetic5,6 and nongenetic7–10 testing laboratories finds that although error rates can vary widely from study to study, the distribution of errors across the pre-analytic, analytic and post-analytic phases of testing remains remarkably consistent for all types of clinical laboratory testing, including genetic testing. The majority of reported laboratory errors occur in either the pre-analytic phase (e.g., mislabeling specimens, incorrect test ordering) or the post-analytic phase of testing (e.g., transcription or interpretation errors). Analytic errors, which are the types of errors that CLIA was intended to address, are estimated to account for 4–32% of all laboratory errors7. In a 1999 survey of 42 molecular genetic testing laboratories, analytic errors accounted for only 6.1% of all reported problems5.

Another survey of 245 molecular genetic testing laboratories found that participation in PT was a leading indicator of higher quality assurance scores6. Quality assurance scores were assigned based on the number of American College of Medical Genetics (ACMG; Bethesda, MD, USA) standards for proper procedures met by a laboratory; the study did not assess laboratory errors. The study’s conclusions were based only on the potential for laboratory errors to occur.

In the survey of laboratory directors presented below, we study the quality of genetic testing laboratories, as measured by the level of participation in PT programs, the number of PT deficiencies, the number of incorrect test reports issued and the percentage of laboratory directors who cite an analytic error as the laboratory’s most common problem. In addition, we document attitudes of laboratory directors toward current CLIA regulation, the value of a genetic testing specialty and the value of PT in ensuring quality testing.

Survey results

Overall, 190 laboratory directors responded to our survey (see Box 2 for methodology). They provided information on CLIA certification and specialty, their use of formal or informal PT methods, the effect of PT on laboratory quality, the overall number and type of incorrect test results and their enthusiasm for more stringent oversight of the testing sector.

Demographics. Of the 190 respondents, 55% worked in laboratories that perform only clinical testing, 42% in laboratories that offered both clinical and research testing, and 3% in laboratories that perform only research testing, but provided test results to patients and

[Figure 1 is a line graph. x axis: Year (1993–2005); y axis: Diseases for which testing is available (0–1,300).]

Figure 1 Growth of genetic testing, including both clinical and research testing. (Source: GeneTests database 2005, http://www.genetests.org/.)

Box 1 How CLIA works

CLIA defines a clinical laboratory as “a facility for the biological, microbiological, serological, chemical, immunohematological, biophysical, cytological, pathological, or other examination of materials derived from the human body for the purpose of providing information for the diagnosis, prevention, or treatment of any disease or impairment of, or assessment of the health of a human being.” (United States Code, Title 42 Section 263a.)

Under CLIA, laboratory tests are categorized based on their degree of complexity. Tests are graded based on seven criteria: (i) knowledge, (ii) training and experience, (iii) reagents and materials preparation, (iv) characteristics of operational steps, (v) calibration, quality control and PT materials, (vi) test system troubleshooting and equipment maintenance and (vii) interpretation and judgment. Tests requiring higher skills and knowledge to perform and interpret, such as tests for HIV, other infectious diseases, or molecular diagnostics, are categorized as “moderate” or “high complexity” tests. For these tests CLIA develops specialty areas (e.g., virology, toxicology) that provide additional QC, personnel and other standards specific to that type of testing.

CLIA also requires laboratories performing moderate or high complexity testing to enroll in approved PT programs for each specialty in which the laboratory is certified, to provide an independent, external assessment of how well a laboratory is able to perform that type of testing (commonly referred to as a formal PT program). Laboratories enrolled in these formal PT programs periodically receive blinded specimens from the program to be tested in the same manner as samples received from patients. The PT program determines how often a laboratory obtains and reports correct results on these tests, which helps laboratories identify procedural problems and take corrective actions.

Proficiency test results are graded as either satisfactory or unsatisfactory depending on how many deficiencies (errors) are detected. Unsatisfactory performance is reported to the CLIA-accrediting organization, and laboratories that consistently perform poorly risk losing their accreditation and CLIA certification.

FEATURE

© 2006 Nature Publishing Group http://www.nature.com/naturebiotechnology


NATURE BIOTECHNOLOGY VOLUME 24 NUMBER 9 SEPTEMBER 2006 1085

providers. When respondents worked in a setting that offered research testing, we asked them to consider only the research testing they did that resulted in a report back to the patient or provider.

Nearly one in four respondents (23%) was the director of a commercial or independent laboratory, half were at a university or medical school laboratory and 22% were in other hospitals. More than half of the directors (58%) were PhDs, whereas 22% held an MD or DO degree, 18% were MD/PhDs and the remainder (2%) held another degree. Most directors (77%) reported that their laboratories perform molecular genetic tests, whereas 5% reported that their laboratories perform biochemical genetic tests and 17% reported performing both types. The number of distinct tests offered, the estimated yearly volume of tests performed and other characteristics are found in Table 1.

CLIA certification and specialty. Laboratory directors were asked, “By which organizations is your laboratory accredited or licensed as a molecular or biochemical diagnostic laboratory?” A laboratory was considered CLIA certified if it was accredited by either CLIA or one of three ‘deemed’ accrediting organizations (the Joint Commission on Accreditation of Healthcare Organizations, the College of American Pathologists Laboratory

In the absence of a comprehensive directory of US genetic testing laboratory directors, our search strategy for potential participants was designed to cast a wide net and capture as many genetic testing laboratory directors as possible.

Survey design. A list of 680 potential participants was compiled using the current GeneTests Clinic Directory1 (n = 226), the Association for Molecular Pathology membership directory15 (n = 274), New York State Department of Health’s online directory of certified biochemical16 and molecular genetic17 testing laboratories (n = 120), laboratories participating in the 2005 National Tay-Sachs and Allied Diseases’ Quality Control Program18 (n = 79), the Canavan Foundation’s laboratory directory19 (n = 91), Washington G-2 Reports’ 2005 Lab Outreach Buyer’s Guide: Providers of Laboratory Outreach Products and Services20 (n = 57) and Veteran Administration hospital laboratories selected from the Veterans Administration website21 (n = 8), as well as potential participants from laboratories identified in Google searches using a variety of search terms (n = 9). Many potential participants appeared on more than one of these lists.

All 680 potential participants were mailed an initial invitation to participate in an online survey of genetic testing laboratory directors. This was followed several days later by an e-mail invitation. To be eligible to complete the survey, a potential participant had to identify himself or herself as the director of a molecular or biochemical testing laboratory that reports test results to patients or providers. Potential participants were excluded if they were not laboratory directors, were directors of laboratories that did not provide results to patients or providers or were directors of laboratories that test only for paternity, identity, ancestry, cytogenetics, infectious diseases, tissue typing or newborn screening. Up to eight periodic mail, e-mail and phone call reminders were made to nonresponders over a three-month period.

Of 680 potential participants, 404 responded. Of these, 199 respondents were ineligible based on the above criteria and were not offered the survey, whereas 190 were eligible and completed the survey. Fifteen additional eligible laboratory directors began the survey but did not complete it, and were excluded from analyses. No response was received from the remaining 276 potential participants. To calculate a response rate among eligible laboratory directors, we estimated the total number of eligible laboratory directors in our list of 680 potential participants by extrapolating the proportion of respondents who were eligible to the 276 nonrespondents22 (Supplementary Methods online). In this way, we estimated that 345 of our potential participants had been eligible for the survey, for a valid response rate of 190/345, or 55%.
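The eligibility-adjusted response-rate calculation described above can be sketched as follows; the counts come from the text, while the variable names are ours:

```python
# Sketch of the response-rate estimate: the eligibility rate observed among
# respondents is extrapolated to the 276 nonrespondents to estimate the total
# number of eligible laboratory directors among the 680 potential participants.
responded = 404
ineligible = 199
eligible_complete = 190      # completed the survey
eligible_incomplete = 15     # started but did not complete (excluded from analyses)
no_response = 680 - responded  # 276 never replied

eligible_among_respondents = eligible_complete + eligible_incomplete  # 205
eligibility_rate = eligible_among_respondents / responded             # ~0.51

# Assume nonrespondents were eligible at the same rate as respondents.
estimated_eligible = eligible_among_respondents + eligibility_rate * no_response
response_rate = eligible_complete / estimated_eligible

print(round(estimated_eligible))   # ~345
print(f"{response_rate:.0%}")      # ~55%
```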

A 65-question survey, qualified by the Johns Hopkins University Institutional Review Board as exempt (Application no. NA_00001533), was developed to collect data on the current laboratory practices and opinions of molecular and biochemical genetic testing laboratory directors in the United States. A small pretest was conducted with directors of six genetic testing laboratories, and feedback was incorporated into the final survey instrument.

The survey collected data about the laboratory setting, types of testing performed (molecular or biochemical or both; research or clinical or both), the qualifications of the laboratory director, laboratory accreditation and certification, test volume and menu, quality control practices, the nature and frequency of laboratory errors, and PT practices.

Knowledge Networks, a survey research firm in Menlo Park, California, administered the Web-based instrument. The data provided to the Genetics and Public Policy Center were anonymized with respect to respondents’ identifying information. Potential participants were told that data collected from the survey would be reported only in aggregate, and that analyses would not identify any particular laboratory or director. An incentive in the form of a $25 donation to one of four organizations (College of American Pathologists Foundation, American College of Medical Genetics Foundation, American Red Cross or America’s Second Harvest) was offered in exchange for a laboratory director’s participation.

Survey analyses. Analyses included examination of the relationship between laboratory characteristics and the level of participation in both formal and informal PT programs, the number of deficiencies reported in formal PT programs, the number of incorrect test reports and the types of errors observed. Data on annual laboratory test volume were collected by asking respondents to choose a range corresponding to the number of biochemical genetic tests and the number of molecular genetic tests the laboratory performs in a year (ranges provided for both questions were 0, 1–249, 250–999, 1,000–4,999, 5,000–9,999, 10,000–14,999, ≥15,000). To create an estimate of the total annual genetic test volume for a given laboratory, we added the midpoint of the range for the number of molecular tests to the midpoint of the range for the number of biochemical tests. These sums fell into four clusters, resulting in categories of 1–1,999, 2,000–5,999, 6,000–14,999 and ≥15,000 total tests performed annually. Observations based on these ranges should be interpreted with the understanding that they are estimates of laboratory volume. To assess the relationship between survey variables, we implemented general linear, Poisson and logistic regression models using SAS version 9.1. Key variables used in regression are listed in column 1 of Table 1.
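As a rough sketch, the volume-estimation procedure described above (midpoints of the two reported ranges, summed and binned) could look like this. The function names are ours, and the value assigned to the open-ended ≥15,000 range is an assumption (its lower bound), since an open range has no true midpoint:

```python
# Midpoint of each reported test-volume range; the >=15,000 entry uses the
# lower bound as a stand-in value (assumption; an open range has no midpoint).
RANGE_MIDPOINTS = {
    "0": 0, "1-249": 125, "250-999": 624.5, "1000-4999": 2999.5,
    "5000-9999": 7499.5, "10000-14999": 12499.5, ">=15000": 15000,
}

def estimated_total_volume(molecular_range, biochemical_range):
    """Sum the midpoints of the two reported ranges."""
    return RANGE_MIDPOINTS[molecular_range] + RANGE_MIDPOINTS[biochemical_range]

def volume_category(total):
    """Bin the summed midpoints into the four clusters used in Table 1."""
    if total < 2000: return "1-1,999"
    if total < 6000: return "2,000-5,999"
    if total < 15000: return "6,000-14,999"
    return ">=15,000"

# A lab reporting 1,000-4,999 molecular and 250-999 biochemical tests:
print(volume_category(estimated_total_volume("1000-4999", "250-999")))
```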

Box 2 Survey methodology


Accreditation Program, the Commission on Office Laboratory Accreditation), or held a New York State clinical laboratory permit. Ninety-five percent of respondents indicated that their laboratory was CLIA certified (Table 1). All of the laboratories that were not CLIA certified were low volume laboratories that process <2,000 tests yearly; 86% of these low volume laboratories were CLIA certified, compared to 100% of laboratories that perform ≥2,000 tests annually (p = 0.0001). Additionally, certification rates increased significantly as the menu of different tests offered increased (p = 0.006). The majority of laboratories that performed only research testing and reported patient-specific results were not CLIA certified.

Nearly all CLIA-certified laboratories (97%) were certified for high complexity testing. However, 16% reported no specialty area certification. Approximately a third of laboratories with the highest test volumes (35%) and largest test menus (29%) reported having no specialty certification (Table 1). Among CLIA-certified laboratories, 41% were certified in a single specialty area, and 43% listed multiple specialties. The most common specialty certifications were pathology (48%), chemistry (46%) and clinical cytogenetics (41%).

Participation in formal PT. All respondents were asked, “Does your laboratory participate in a formal external proficiency testing program?” Two-thirds of directors said their laboratory participated in “all available formal external proficiency testing programs,” whereas 17% said, “Yes, for some formal, external proficiency testing programs.” Sixteen percent indicated they do not participate in any formal PT programs. Significantly more laboratory directors at university sites (66%, p = 0.03) and other hospitals (82%, p = 0.01) than commercial laboratory directors (56%) reported using all of the formal external PT programs available to them, after excluding directors who said no formal programs were available for the tests they offer (n = 19).

The 43 directors who responded either that their laboratory did not participate in formal external PT programs or that their laboratory participated in only some formal programs were asked to select up to five possible reasons for their nonparticipation. Sixty-three percent of respondents indicated they did not participate because of “the lack of availability of formal testing programs.” Another 17% responded that internal PT is adequate. Very few laboratory directors responded that “formal external proficiency testing is too expensive” (7%) or “formal external proficiency testing does not provide timely feedback” (2%). Twenty-four percent selected “another reason” in response to this question and were provided the opportunity to type in their response. Other reasons provided were that the laboratory was for research or teaching purposes only, that the diseases tested for were too rare, that a formal testing program for rare diseases was being established or that other informal means of PT were used. Some respondents indicated that they would participate if PT programs were available.

Use of informal PT methods. For tests where no formal external proficiency test is available, CLIA requires that the laboratory “have a system for verifying the accuracy of the test result at least twice a year.” All respondents were asked, “When a formal external proficiency testing program is not available, does your laboratory perform proficiency testing using some other

Table 1 Extent of CLIA certification, specialty certification and proficiency testing among laboratories

The last five columns give the percent of tests subjected to proficiency testing (formal or informal)*. “No spec. cert.” is the percent of CLIA-certified labs with no specialty certification.

Type of laboratory                   N    %   CLIA cert. (%)  No spec. cert. (%)  0–24%  25–74%  75–99%  100%  <100%
All respondents                     190  100       95               16               8      8      18     65     35
Clinical or research testing
  Clinical only                     104   55       98               13               6      7      19     68     32
  Clinical and research              80   42       98               19               9     10      19     63     37
  Research only                       6    3       17              100              33     17       0     50     50
Setting
  Commercial                         43   23       98               25               7     12      12     70     30
  Univ./medical school              101   53       92               17              10      7      23     60     40
  Other hospital                     46   24      100                7               4      9      15     72     28
Director’s education (one missing)
  MD or DO                           41   22       98                6              10     10      20     61     39
  PhD                               119   58       95               23               7      9      17     66     34
  MD/PhD                             34   18       97                7               6      6      21     68     32
  Other                               5    3       80                0              20      0      20     60     40
Estimate of annual test volume (one missing)
  1–1,999                            65   34       86               13              15      8       9     68     32
  2,000–5,999                        71   38      100               18               4      7      25     63     36
  6,000–14,999                       35   19      100                9               6      6      17     71     29
  15,000+                            18   10      100               35               0     22      22     56     44
Number of distinct tests offered (one missing)
  1–4                                45   24       87                7              13      4       4     78     21
  5–19                               77   41       96                8               7      8      21     65     35
  20+                                67   35      100               29               6     12      25     57     43
Molecular or biochemical testing
  Molecular                         147   77       94               20               7      6      17     70     30
  Biochemical                        10    5      100               20              30     20      10     40     60
  Both                               33   17      100                0               6     15      27     52     48

*Row totals may not add up to 100% due to rounding.


mechanism?” A majority of respondents said “yes” for all (77%) or some (15%) tests, whereas 8% said “no.” Respondents whose laboratories offer 1–4 different tests were twice as likely as those offering a larger menu of tests to say that they used no additional informal PT methods (16% versus 8%, p = 0.02). Half of the laboratories that perform only research testing used no informal PT methods.

Respondents (n = 42) whose laboratories did not always perform informal PT on tests when no external program was available were asked, “Which of the following, if any, are reasons your lab does not perform proficiency testing using some other mechanism when a formal program does not exist?” The most common response (53%) was “We use competency testing to document our laboratory proficiency.” Forty percent answered, “We are the sole source of the test”; 21% said, “Our test volume is too low to justify developing a proficiency testing program”; and 3% said, “Proficiency testing is not necessary for the types of tests we perform.”

Overall extent of PT use. We also asked respondents, “For what percentage of the genetic tests offered by your laboratory do you conduct some sort of proficiency testing?” More than one-third of respondents (35%) offered some genetic tests for which they perform no PT at all, including 8% who conducted either formal or informal PT on less than a quarter of the tests they offer (Table 1). Three percent conduct no PT for any of their tests. Nearly two-thirds of participants (65%) said that their laboratory performs either formal or informal PT on every test offered (Table 1).

After adjusting for key variables, laboratories that perform only molecular genetic tests were significantly more likely to complete either formal or informal PT on all their tests, compared to directors of laboratories using any biochemical genetic tests (70% versus 49%, p = 0.006). Additionally, the smaller the menu of tests offered, the more likely laboratories were to perform some type of PT on all of their tests (p = 0.02). No significant differences in any of the other key variables modeled were noted with respect to the extent of PT employed.

Influence of PT on laboratory test quality. A laboratory participating in a formal external proficiency program is given a deficiency if the

laboratory is unable to ascertain and report the correct test results in a timely manner. Among laboratories that participate in formal external proficiency programs (n = 159), 78% reported that their laboratory had no deficiencies over the past two years, 16% reported one deficiency during that period and 7% reported two or more. Table 2 shows that as the percentage of tests on which formal or informal PT is done in a laboratory increased, the number of formal deficiencies decreased. In addition, laboratories that do not perform formal or informal PT on all of their tests were eight times as likely to report multiple deficiencies (16% versus 2%, p = 0.001).

After adjusting for key variables, the percentage of tests on which formal or informal PT is done was the strongest predictor of the number of formal PT deficiencies reported over the past two years (p = 0.004); that is, the number of deficiencies decreased with increasing use of PT. After adjusting for extent of PT participation, only annual test volume was significantly related to the number of reported PT deficiencies. Laboratories that performed >2,000 tests annually reported significantly

Table 2 Frequency of proficiency test deficiencies and incorrect test reports issued

Columns 3–5 answer “How many times in the past 2 years has your lab been found to be deficient in any way on a formal external proficiency test?”; columns 6–8 answer “What is your best estimate of how many incorrect test reports were issued by your lab in the past 2 years?”

Type of laboratory                   N    %   Never  1 time  2+ times   None  1–3   4+
All respondents                     190  100    78     16       7        28    37    35
Clinical or research testing
  Clinical only                     104   55    79     17       3        23    38    39
  Clinical and research              80   42    75     13      12        29    38    32
  Research only                       6    3     —      —       —        83    17     0
Setting
  Commercial                         43   23    77     17       6        28    25    48
  Univ./medical school              101   53    76     18       6        27    44    29
  Other hospital                     46   24    83     10       7        30    34    36
Estimate of annual test volume
  1–1,999                            65   34    86     12       2        41    44    15
  2,000–5,999                        71   38    73     18      10        25    45    30
  6,000–14,999                       35   19    82     15       3        18    21    61
  15,000+                            18   10    71     18      12        12    12    77
Number of distinct tests offered
  1–4                                45   24    79     14       7        48    35    18
  5–19                               77   41    79     16       6        25    44    32
  20+                                67   35    76     16       7        19    32    49
Percent of tests subjected to proficiency testing (formal or informal)
  0–24%                              15    8    50a    33a     17a       54    31    15
  25–74%                             16    8    67      8      25        27    33    40
  75–99%                             35   18    70     18      12        15    42    42
  100%                              124   65    84     14       2        28    37    35
  <100%                              66   35    67     16      16        26    38    36
Number of PT deficiencies in the past 2 years (35 missing)
  None                              121   78     —      —       —        28    36    35
  1                                  24   15     —      —       —         9    48    43
  2+                                 10    7     —      —       —         0    40    60

Row totals may not add up to 100% because of rounding. aThis category excludes those performing no PT, because they cannot have PT errors.


more PT deficiencies than low volume laboratories. Findings did not differ when the six directors whose laboratories perform only research testing were excluded.

Incorrect test results. All respondents were asked to provide their best estimate of how many incorrect test reports were issued to patients or providers by their laboratory over the past two years (Table 2). Among respondents (n = 177), 28% said no incorrect test reports had been issued by their laboratory during that period, 37% reported between one and three incorrect reports and 35% reported four or more incorrect reports. The average number of incorrect reports over the past two years was 5.1.

Not surprisingly, the number of incorrect test reports increased significantly with the volume of testing (p < 0.0001). However, adjusting for key variables, the number of incorrect test reports detected also increased significantly with the number of deficient proficiency tests in the same period. A 20% increase in the number of incorrect test reports is associated with each additional PT deficiency (p = 0.03). This finding did not differ when laboratories that perform only research testing were excluded.
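The reported effect (a 20% increase in incorrect reports per additional PT deficiency) corresponds to an incidence rate ratio of about 1.2 from a Poisson model. A minimal sketch of how such a coefficient translates into expected counts, with an illustrative baseline that is not from the survey:

```python
import math

# A Poisson regression coefficient of log(1.2) per additional PT deficiency
# means each deficiency multiplies the expected count of incorrect reports
# by an incidence rate ratio (IRR) of 1.2, i.e., a 20% increase.
irr = 1.20
beta = math.log(irr)  # coefficient on the deficiency count

def expected_reports(baseline, deficiencies):
    """Expected incorrect-report count: baseline scaled by IRR^deficiencies."""
    return baseline * math.exp(beta * deficiencies)

baseline = 4.0  # hypothetical expected reports for a lab with no deficiencies
for d in range(4):
    print(d, round(expected_reports(baseline, d), 2))
```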

Types of laboratory errors reported. All respondents were provided a list of seventeen types of laboratory errors and asked to indicate which had been observed in their laboratory over the last two years. Respondents were then asked to select the most common type of error seen in their laboratory. These were grouped into pre-analytic, analytic and post-analytic errors (Table 3).

The most commonly observed errors occurred during the pre-analytic phase of testing; 45% of the most common errors were pre-analytic, 30% were analytic and 24% were post-analytic. Adjusting for key variables, the strongest predictor of whether the most commonly observed error occurred during the analytic phase of testing was annual testing volume. Directors of lower-volume laboratories were more likely than those of higher-volume laboratories to identify an analytic error as the most common error (p = 0.03). The second strongest predictor of whether a laboratory’s most common error was analytic was the percentage of tests on which formal or informal PT is performed (Table 4). The odds that the most common error was analytic increased 40% with each decrease in level of PT completed (p = 0.06, Table 4). When analysis was restricted to laboratories that complete ≥2,000 tests annually, those that do not perform some form of PT on all of their genetic tests were significantly more likely than those who complete PT on all tests to identify an analytic error as the most common type (p = 0.02). This finding did not differ when the laboratories that perform only research testing were excluded.

Laboratory directors’ attitudes. A majority of respondents (73%) agreed or strongly agreed that “CLIA should create a genetic testing specialty for molecular and biochemical tests.” Directors of laboratories that perform testing for both clinical and research purposes showed somewhat greater approval for a new specialty (79%) than directors of laboratories that provide only clinical genetic testing (66%, p = 0.07). There was no difference in support for a new CLIA specialty based on setting (commercial, academic, other), test volume or the type of testing performed.

Sixty percent of respondents found PT to be “very useful” to “improve the quality of genetic testing performed by the laboratory industry,” and another 32% said PT was “somewhat useful.” The perceived value of PT was similarly high in both clinical and research laboratories, in laboratories that do and do not perform biochemical testing, and across laboratory settings and annual test volumes. Respondents whose laboratories conducted some type of PT on fewer than half their tests also showed high support for PT in general: 47% said it was very useful and 40% said it was somewhat useful (p = 0.35).

Discussion

Results of this survey indicate that participation in PT—whether through a formal program or through other measures—has a clear association with laboratory quality as measured by the number of reported deficiencies and the frequency of reported analytic errors. In this survey, the number of reported deficiencies decreased as the percentage of tests for which any PT was performed increased. In addition, the number of incorrect test reports

Table 3 Type and frequency of laboratory errors

“Detected” is the percent of directors who reported detecting this type of error during the past two years; “Most common” is the percent who named it in answer to “Which was the most common type of error over the past 2 years?”

Test phase       Error                                            Detected (%)  Most common (%)
Pre-analytic     Referrer ordered incorrect test                       74             27
                 Referrer labeled specimen incorrectly                 68             10
                 Contamination before receipt by laboratory            19              4
                 Transcription error at specimen receipt               32              2
                 Sample switch at specimen receipt                     16              2
                 Error in written protocol                              7              1
                 Patient’s transfusion not reported by referrer        13              0
                 Total pre-analytic                                                   45
Analytic         Faulty reagent                                        52             13
                 Equipment failure                                     52             11
                 Human error in data analysis                          44              3
                 Contamination during specimen testing                 18              2
                 Sample switch during specimen testing                 27              1
                 Total analytic                                                       30
Post-analytic    Typographical error on test report                    55             17
                 Data transcription error                              42              5
                 Misinterpretation of data                             19              1
                 Wrong results reported to patient/provider            20              1
                 Software error in data analysis                        8              0
                 Total post-analytic                                                  24
Other            Other                                                  4              1


Since the mid-1990s a number of federal government advisory groups have questioned the adequacy of US regulatory oversight of both genetic tests and the laboratories performing them. In 2000, the US Centers for Disease Control recommended that the Centers for Medicare and Medicaid Services (CMS), the agency that oversees CLIA, create a genetic testing specialty area under CLIA11. Nearly three out of four respondents to this survey approved of such a measure. To date, CMS has not issued a rule for the creation of a genetic testing specialty. Although the US Department of Health and Human Services placed the issuance of a proposed rule for a genetic testing specialty on its regulatory agenda12 in April, with a target publication date of November 2006, more recent statements by CMS officials indicate the agency believes a specialty is not needed.

In enacting CLIA, the US Congress stated that PT “should be the central element in determining a laboratory’s competence, as it provides a measure of actual performance on laboratory test procedures rather than only gauging the potential for accurate outcomes”13. The importance of PT in evaluating and monitoring laboratory quality is underscored by the fact that errors can be difficult to detect, and self-reported error rates may not accurately reflect the actual occurrence of errors in the laboratory or the quality of the laboratory. A laboratory may be making errors but not have mechanisms in place to detect them, whereas another laboratory may rarely make errors but detect them more often as a result of redundant checks and balances that have been instituted in the laboratory. Thus, PT is a useful and objective means of evaluating a laboratory’s ability to get the correct test result and to identify potential sources of error.

Creation of a genetic testing specialty under CLIA by CMS is a prerequisite to mandating enrollment in specified, CLIA-approved PT programs for genetic testing laboratories. In the absence of CLIA-approved PT programs, laboratories have adopted different practices with regard to PT. Some laboratories enroll in all available formal PT programs, whereas

increased 20% with each additional reported deficiency. Furthermore, laboratories that perform PT on a lower percentage of tests were more likely to report that their most common error occurred during the analytic phase of testing, which is the phase of testing that PT is intended to evaluate.

A limitation of our study stems from the fact that there are no comprehensive baseline data describing the numbers, types and sizes of genetic testing laboratories in the United States that would allow us to determine whether the study sample is representative. Therefore, the extrapolation of the results to the universe of US genetic testing laboratories should be made with some caution. Respondents may over-represent laboratory directors with strong opinions, or under-represent those reluctant to share information about their attitudes or practices. In addition, because we collected data regarding the annual volume of tests and the size of the test menu as ranges (e.g., 250–999 test requisitions per year), we could not completely account for the effect of differences in volume and menu size on respondents’ answers to other questions.

The significant rates of nonparticipation in PT reported by directors of laboratories of all sizes demonstrate that merely being certified under CLIA is insufficient to ensure quality: nearly a third of respondents reported that their laboratories perform PT for only some of the tests they offer. Mandating participation in PT (formal or informal) would increase the number of laboratories performing PT and thereby enhance the quality of genetic testing.

Genetic testing has become an increasingly integral component in the diagnosis, treatment, management and prevention of numerous diseases and conditions. Information gained from genetic test results can have a significant impact on medical decision making. Incorrect test results stemming from laboratory errors can lead to misdiagnosis, inappropriate and/or delayed treatment, anxiety and, in rare cases, even death. Thus, it is critical that mechanisms are in place to detect and reduce laboratory errors and to ensure that the laboratories performing genetic testing are of high quality.

others do not. When a formal external PT program is not available, some laboratories seek to comply with CLIA’s general requirement to ensure accuracy through alternative PT methods, whereas others do not. Lack of availability of formal PT programs was a key reason cited by respondents for failure to perform PT. In the absence of formal PT programs, some laboratory directors use competency testing as a means to assess proficiency. However, competency testing is not a comparable substitute, as it assesses an individual laboratory employee’s performance and not the actual ability of a laboratory to get the correct test result.

In a recent US Senate hearing, CMS stated that genetic tests are adequately covered by other specialties14. However, the survey data show that 16% of CLIA-certified laboratories are not certified in any specialty, including one-third of high volume laboratories. Furthermore, the most common specialty certifications held by genetic testing laboratories have questionable relevance to establishing quality for genetic testing.

Establishing additional formal PT programs for genetic testing laboratories and requiring enrollment as a condition of CLIA certification would require additional resources. Even so, more than nine out of ten laboratory directors surveyed regard PT as useful for improving the quality of the genetic testing performed by the laboratory industry, and almost no one said cost is a driver of nonparticipation in programs. Furthermore, a majority of laboratory directors support creation of a genetic testing specialty under CLIA. Given these observations, and the demonstrated association between PT and laboratory quality, we conclude that the creation of a genetic testing specialty and the associated requirement to enroll in specified CLIA-approved PT programs would improve the quality of genetic testing laboratories.

Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS
The Genetics and Public Policy Center is supported at Johns Hopkins University by The Pew Charitable Trusts. The opinions expressed in this report are those of the authors and do not necessarily reflect the views of The Pew Charitable Trusts. The authors are grateful to Linda Bradley, Michele Caggana, Wayne Grody and Michele Schoonmaker for their helpful review of an earlier draft of this manuscript, and to GeneTests for providing their Clinic Directory.

1. GeneTests. http://www.genetests.org/
2. Kan, Y.W. et al. Polymorphism of DNA sequence adjacent to human β-globin structural gene: relationship to sickle mutation. Proc. Natl. Acad. Sci. USA 75, 5631–5635 (1978).
3. Frost & Sullivan. U.S. Genetic Diagnostics Markets, F463–552 (2005).
4. United States Code, Title 42, Section 263(a).
5. Hofgartner, W.T. & Tait, J.T. Frequency of problems during clinical molecular genetic testing. Am. J. Clin. Pathol. 112, 14–21 (1999).

Table 4  Relationship between extent of proficiency testing and type of most common error

Percent of tests on which    Type of most common error (%)
some PT is done              Pre-analytic    Analytic    Post-analytic
0–24                               29            50            21
25–74                              53            33            13
75–99                              37            34            29
100                                48            26            26
All respondents                    45            30            24

Row totals may not add up to 100% because of rounding.

FEATURE
© 2006 Nature Publishing Group http://www.nature.com/naturebiotechnology

1090 VOLUME 24 NUMBER 9 SEPTEMBER 2006 NATURE BIOTECHNOLOGY

6. McGovern, M.M. et al. Quality assurance in molecular genetic testing laboratories. J. Am. Med. Assoc. 281, 835–840 (1999).

7. Bonini, P. et al. Errors in laboratory medicine. Clin. Chem. 48, 691–698 (2002).

8. Witte, D.L. et al. Errors, mistakes, blunders, outliers, or unacceptable results: how many? Clin. Chem. 43, 1352–1356 (1997).

9. Howanitz, P.J. Errors in laboratory medicine: practical lessons to improve patient safety. Arch. Pathol. Lab. Med. 129, 1252–1261 (2005).

10. Hollensead, S.C. et al. Errors in pathology and laboratory medicine: consequences and prevention. J. Surg. Oncol. 88, 161–181 (2004).

11. Federal Register, vol. 65, p. 25,928, May 4, 2000.
12. Federal Register, vol. 71, p. 22,595, April 24, 2006.
13. H.R. Rep. No. 100-899 (1988).

14. At Home DNA Tests: Marketing Scam or Medical Breakthrough? (Testimony of Thomas Hamilton, Director, Survey and Certification Group, Centers for Medicare and Medicaid Services) Before the Senate Special Committee on Aging, 109th Cong. (2006).

15. Association for Molecular Pathology (AMP). Membership Directory (AMP, Rockville, MD, 2005).

16. New York State Department of Health. Database of clinical laboratories currently holding a New York State Department of Health permit in the specified category of testing (2005). Genetic testing/biochemistry. http://www.wadsworth.org/labcert/clep/CategoryPermitLinks/CategoryListing.
17. New York State Department of Health. Database of clinical laboratories currently holding a New York State Department of Health permit in the specified category of testing (2005). Genetic testing/molecular. http://www.wadsworth.org/labcert/clep/CategoryPermitLinks/CategoryListing.htm
18. National Tay Sachs and Allied Diseases Association. 2005 Directory: NTSAD Quality Control Program Participating Laboratories (2005). http://www.ntsad.org/pages%5Cqclabs2005.htm

19. Canavan Foundation. Canavan Foundation Directory of Testing Centers (2005). http://www.canavanfoundation.org/screening.php

20. Washington G-2 Reports. Lab Outreach Buyer’s Guide: Providers of Laboratory Outreach Products and Services (Washington G-2 Publications, New York, 2005).

21. U.S. Department of Veterans Affairs. Facilities Locator and Directory (2005). http://www1.va.gov/directory/guide/rpt_fac_list.cfm

22. American Association for Public Opinion Research. Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys, edn. 4 (AAPOR, Lenexa, Kansas, 2006).



Evidence and anecdotes: an analysis of human gene patenting controversies

Timothy Caulfield, Robert M Cook-Deegan, F Scott Kieff & John P Walsh

When it comes to gene patenting, policy makers may be responding more to high-profile media controversies than to systematic data about the issues.

Gene patenting has attracted intense scrutiny for decades, raising a host of ethical, legal and economic concerns. Much of the policy debate has focused on seemingly quantifiable and practical concerns about the effect of patents on access to useful technologies in the contexts of both research and the clinic. Here, we summarize the dominant policy concerns and the events that have motivated these debates. We then reflect on what the evidence now says about the major concerns articulated in policy reports. We conclude by discussing what might explain some of the disparity between the empirical evidence and the policy focus.

Although policymakers and advisory groups have long recognized the moral and ethical concerns associated with human gene patents1–3, such concerns have only rarely led to concrete proposals for reform4. A systematic review of the content and timing of major policy documents highlights the fact that policy activity has been largely stimulated by a convergence of a general social unease, the emergence of preliminary data and literature on the possible adverse practical ramifications of gene patents, and several high-profile patent protection controversies.

The timing of the policy activity reflects this tendency. The recommendations for diagnostic-use licensing, for example, followed the international controversy associated with Myriad Genetics’ decision to enforce the patents over the BRCA1 and BRCA2 mutations5 (Fig. 1). There have been other gene patenting controversies, such as the furor over patents related to Canavan disease, or the attempt by the US National Institutes of Health in the early 1990s to patent over 7,000 expressed sequence tags (ESTs)6. The mid-1990s was also a period of rapid (roughly 50% per annum) growth in DNA-related patents in the United States7. Internationally, however, the Myriad controversy coincides with the most policy activity. Indeed, as Figure 2 (and Supplementary Data online) shows, the Myriad Genetics–BRCA1/2 story is, by far, the most referenced patent controversy in the policy documents we reviewed.

These controversial gene-patenting stories raised several concerns in the academic and policy literature. A prominent concern was that of a “tragedy of the anticommons,” or the possibility that the large number of patents on genes and their diverse set of owners would make it difficult to acquire the rights to all necessary research inputs, which could, in turn, result in the underuse of valuable technologies8. Second is the longstanding concern that the owners of patents on fundamental technologies will exercise their rights to exclude in ways that will prevent others from developing or accessing the technology9–11. The Myriad case was held out as an example and as a harbinger of the coming problems associated with human gene patents5. Such restrictions on access to patented genes were viewed as especially pernicious given a belief that such patents could not be invented around, because of the unique role that genes play in biological processes.

A closely related concern was that the strong commercial incentive built into recent policy changes, and the associated pro-commercial milieu in universities, were undermining the norms of open science12,13, leading researchers to be more secretive about their ongoing research, to delay publication of results, and to be less likely to share research materials or data. These behaviors, it was held, would retard the progress of science and technology.

Starting around 2001, this literature, together with the Myriad Genetics controversy and similar ones, began to stimulate significant policy activity (Fig. 1). In Canada, an Ontario government report recommended a variety of reforms, including strengthening the research exemption and revising the compulsory licensing provisions in the Patent Act to create an exemption for genetic diagnostic and screening tests14. The UK’s Nuffield Council on Bioethics made similar recommendations2. In the United States, the National Academy of Sciences issued two reports7,15, both of which recommended a research exemption as a means of dealing with the anticommons and restricted access problems. These reports were clearly influenced by emerging empirical evidence about the effects of gene patents on genetic testing services16,17 and the Myriad controversy (the production of the Ontario report immediately followed the eruption of controversy over Myriad’s patent in Canada, and the Nuffield Council and the National Academy’s 2005 report both used Myriad as a case study)18.

Reflecting on the evidence
With the passage of time and the accumulation of more data, we can now reflect on what the available data do and do not say about the anecdotes, theories and initial evidence that spurred so much policy activity. Indeed, the policy

Timothy Caulfield is at the Health Law Institute, University of Alberta, Canada; Robert M. Cook-Deegan is at the IGSP Center for Genome Ethics, Law & Policy, Sanford Institute of Public Policy and Duke University School of Medicine, Durham, North Carolina, USA; F. Scott Kieff is at the Washington University School of Law and the Hoover Institution, Stanford University, Stanford, California, USA; and John P. Walsh is at the School of Public Policy, Georgia Institute of Technology, Atlanta, Georgia, USA. e-mail: [email protected]



debates around these concerns have both led to and been informed by a number of empirical studies designed to find out where and to what extent each of these concerns is manifest in the practice of biomedical research.

The results of these empirical efforts have been fairly consistent. First, the effects predicted by the anticommons problem are not borne out in the available data. The effects are much less prevalent than would be expected if its hypothesized mechanisms were in fact operating. The data do show a large number of patents associated with genes. A recent study found that nearly 20% of human genes were associated with at least one US patent, and many had multiple patents19. Another study estimated that in the United States over 3,000 new DNA-related patents have issued every year since 1998, and more than 40,000 such patents have been granted7. But despite the large number of patents and the numerous, heterogeneous actors—including large pharmaceutical firms, biotech startups, universities and governments—studies that have examined the incidence of anticommons problems find them relatively uncommon20–24. These studies span both academics and industry, and include data from the United States, Germany, Australia and Japan.

Studies on access to upstream research tools find that although some researchers or firms are denied access to a particular technology, others do have access to the same technology, suggesting that the resulting limitations have more to do with a willingness to accept the market price and access terms25,26. Similarly, among academic biomedical researchers in the United States, only 1% report having had to delay a project and none having abandoned a project as a result of others’ patents, suggesting that neither anticommons nor restrictions on access were seriously limiting academic research21—despite the fact that these researchers operate in a patent-dense environment, without the benefit of a clear research exemption.

One important exception is in the area of gene patents that cover a diagnostic test. Here, there are more instances of researchers and firms claiming that the patent owner is asserting exclusivity or license terms that are widely viewed as inappropriate, thus lending some empirical evidence to support the concerns highlighted by the Myriad Genetics story. For example, 30% of clinical labs report not developing or abandoning testing for the HFE gene after the patent issued17. In addition, 25% of labs had abandoned one or more genetic tests as a result of patents, with Myriad’s patents among the most frequently mentioned27. Such unlicensed lab testing, from the perspective of the patent owner, competes with its commercial activity, and hence it is not surprising to find owners asserting their rights.

There is also substantial empirical evidence that university researchers are becoming more secretive and less willing to share research results or materials28–32. The causes of this


Figure 1 Timeline of gene patenting cases, decisions and studies, and corresponding significant policy activity (refs. 45–56).



secrecy, however, are still in dispute. In particular, we cannot determine the impact of patents themselves on secrecy, in part because many studies of academic secrecy28,32 use composite measures and, as a result, it is difficult to tease out specific causes thereof. Some studies find that patents per se have little effect on discussion of ongoing research or on sharing of research materials21,29. In contrast, several studies have found that commercial activity, as well as scientific competition and the cost and effort involved in sharing, all have negative effects on open science21,28,32.

Industry funding is also often associated with delayed publication29,33,34. This failure to share research materials seems to have a negative impact on research. For example, Walsh et al. find that 19% of recent requests were not fulfilled (and that failures to supply materials are increasing), and that at least 8% of respondents had a project delayed owing to an inability to get timely access to research materials (compared to 1% who were delayed by an inability to get a patent license)21. Finally, some studies show reduced citations of publications once a corresponding patent is granted35,36. However, the causes and implications of such a relationship are unclear. In particular, is this a result of a change in research practices or simply of citation practices (that is, an unwillingness to announce infringement in print)? Even if it is the former, does this simply reflect changing incentives causing a shift by researchers (especially industry researchers) toward less encumbered research areas? The overall social welfare implications of this redirection are also uncertain, as there is both the potential loss of fewer people working on a problem, and a potential gain of a more diverse research portfolio36,37.

Analyzing the concerns, evidence and anecdotes
The survey of policy reports reveals that the Myriad Genetics controversy was used as a primary tool for justifying patent reform—thus highlighting the potential of a single high-profile controversy to mobilize both governmental and non-governmental policy makers. In Belgium, for instance, the controversy directly incited the adoption of a research exemption38. There were certainly other gene patenting controversies that might have been used in a similar fashion, but it was the Myriad case that emerged as emblematic of the fear that patents on human genetic material would have an adverse impact on access to useful technologies, both for research and for clinical use. This is most likely because the controversy, more than any other, resonated so well with the theoretical concerns that existed in the literature. In addition, the clinical consequences were easy to understand and highly visible breast cancer constituencies were engaged.

Although the available evidence suggests that the concerns associated with the Myriad case have merit in the context of diagnostic tests, the data are hardly definitive, and empirical research suggests that data about diagnostics cannot be generalized to other uses. Furthermore, five years later, there have been few similar gene patent controversies. One possibility is that the Myriad story has become a cautionary tale for the holders of similar gene patents, guiding them toward more constructive patent enforcement strategies.

The evidence regarding the anticommons and restricted access concerns is clearer. The empirical research suggests that the fears of widespread anticommons effects that block the use of upstream discoveries have largely not materialized. The reasons for this are numerous and are often straightforward matters of basic economics39. In addition to licensing being widely available40, researchers make use of a variety of strategies to develop working solutions to the problem of access, including inventing around, going offshore, challenging questionable patents and using technology without a license. Though it has been suggested that this latter strategy is an inappropriate and unstable policy15,41, it is important to remember that the stability of this unlicensed use is supported by a combination of the difficulty of enforcing patents owing to the secrecy of research programs, costs of lost goodwill among researchers, costs of litigation, the relatively small damages to be collected from blocking research use, and the interest of the patent owner in allowing research advances in most cases. An anticommons or restricted access–type failure requires not that any one strategy be unavailable, but that the entire suite be simultaneously ineffective, which may explain why, empirically, such failures are much less common than was first posited.

Finally, the data concerning the increasing secrecy of university researchers seem to indicate that there may be a conflation of patenting and commercial and/or scientific competition as the cause of this trend. It appears that academic researchers are becoming more secretive, but that is not shown to be attributable to the patenting process, suggesting that the solution might not reside in modifying patent policy. Some have suggested tempering the commercial orientation of faculty and facilitating the flow of research materials42,43. Another approach might be recognizing the inherently competitive nature of the academic process44 and developing additional and improved mechanisms for exchange among its members.

Conclusions
Looking back on years of policy debates and the associated empirical work on gene patents, what lessons can be drawn? First, although there may have been good reasons for concern, the feared problems have not widely manifested. And the problems that the data do reveal may have less to do with patents than with commercial concerns, scientific competition and frictions in sharing physical materials. Second, despite the growing acknowledgment of this empirical work, there is still a tendency to recommend policy interventions, usually


Figure 2 Explicit references to controversial biotechnology patents and firms in major policy documents after 2002.



including a ‘research exemption.’ Yet, given the research noted above, a strengthened research exemption seems unlikely to address the anticommons or restricted access problems, especially in diagnostic testing. And such reforms need to be sensitive to the incentives that patents can provide for developing and distributing research technologies.

The combination of a lack of empirical evidence of problems and a mismatch between the problems and proposed solutions may explain why there has been little actual policy change. In addition, our review of the lively policy debate and the limited empirical support for the claims that are driving that debate suggest that policymakers may be responding more to a high-profile anecdote or arguments with high face validity than they are to systematic data on the issues. However, we must acknowledge that one effect of these various high-profile policy debates may have been to sensitize both administrative and funding agencies (for example, the US Patent and Trademark Office and National Institutes of Health) and patent holders to the possible adverse consequences of the overly liberal granting of patents and overly restrictive licensing practices. Whether this swing of the pendulum will help, hurt or have no effect on innovation and the progress of science remains an open question. Thus, further research on the exact mechanisms underlying these effects, as well as their net impacts, should be encouraged.

Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS
We would like to thank Lori Sheremeta, Richard Gold, Michael Sharp, C.J. Murdoch and Robyn Hyde-Lay for their invaluable research assistance; Genome Alberta, AHFMR, the Stem Cell Network and AFMNet for funding support; and the US National Human Genome Research Institute and Department of Energy (R.C.-D.). We would also like to thank all of the participants of the Genome Alberta Banff Patenting Workshop (May 2006) for insightful comments.

1. Danish Council of Ethics. Patenting Human Genes and Stem Cells (Danish Council of Ethics, Copenhagen, 2004).

2. The Nuffield Council on Bioethics. The Ethics of Patenting DNA: A Discussion Paper (Nuffield Council on Bioethics, London, 2002).

3. Resnik, D.B. J. Law Med. Ethics 29, 152–165 (2001).

4. House of Commons. Standing Committee on Health. Assisted Human Reproduction: Building Families (Government of Canada, Ottawa, 2001).

5. Williams-Jones, B. Health Law J. 10, 123–146 (2002).

6. Kevles, D. & Berkowitz, A. Brooklyn Law Rev. 67, 233–248 (2001).

7. National Academy of Sciences. Reaping the Benefits of Genomic and Proteomic Research: Intellectual Property Rights, Innovation, and Public Health (National Academies Press, Washington, DC, 2005).

8. Heller, M. & Eisenberg, R. Science 280, 698–701 (1998).

9. Merges, R.P. & Nelson, R.R. Columbia Law Rev. 90, 839–916 (1990).

10. Scotchmer, S. J. Econ. Perspect. 5, 29–41 (1991).
11. Caulfield, T. Community Genet. 8, 223–227 (2005).
12. Nelson, R.R. Res. Policy 33, 455–471 (2004).
13. David, P.A. J. Theoret. Institutional Econ. 160, 1–26 (2004).
14. Ontario Ministry of Health. Genetics, Testing and Gene Patenting: Charting New Territory in Healthcare (Government of Ontario, Toronto, 2002).

15. National Academy of Sciences. A Patent System for the 21st Century (National Academies Press, Washington, DC, 2004).

16. Cho, M. Am. Assoc. Clin. Chem. Newslett. 47–53 (1998).

17. Merz, J.F., Kriss, A.G., Leonard, D.G.B. & Cho, M.K. Nature 415, 577–579 (2002).

18. Benzie, R. The National Post A:15 (September 20, 2001).

19. Jensen, K. & Murray, F. Science 310, 239–240 (2005).

20. Walsh, J.P., Cohen, W.M. & Arora, A. Science 299, 1021 (2003).

21. Walsh, J.P., Cho, C. & Cohen, W.M. Science 309, 2002–2003 (2005).

22. Nicol, D. & Nielsen, J. Patents and medical biotechnology: An empirical analysis of issues facing the Australian industry—Occasional Paper No. 6 (Centre for Law & Genetics, Sandy Bay, Australia, 2003).

23. Nagaoka, S. Presentation to OECD Conference on Research Use of Patented Inventions (Madrid, May 18–19, 2006). <http://www.oecd.org/dataoecd/20/54/36816178.pdf>

24. Straus, J. Presentation to the BMBF & OECD Workshop on Genetic Inventions, Intellectual Property Rights and Licensing Practices (Berlin, January 24–25, 2002). <http://www.oecd.org/dataoecd/36/22/1817995.pdf>

25. Cohen, J. Science 285, 28 (1999).
26. Walsh, J.P., Cohen, W.M. & Arora, A. Patenting and licensing of research tools and biomedical innovation. in Cohen, W.M. & Merrill, S. (eds.) Patents in the Knowledge-Based Economy (National Academies Press, Washington, DC, 2003).

27. Cho, M.K. et al. J. Mol. Diagn. 5, 3–8 (2003).
28. Campbell, E.G. et al. J. Am. Med. Assoc. 287, 473–480 (2002).
29. Walsh, J.P. & Hong, W. Nature 422, 801–802 (2003).
30. Grushcow, J. J. Legal Studies 33, 59–84 (2004).
31. Vogeli, C. et al. Acad. Med. 81, 128–136 (2006).
32. Blumenthal, D. et al. Acad. Med. 81, 137–145 (2006).

33. Cohen, W.M., Florida, R. & Goe, R. University-industry research centers in the United States. Report to the Ford Foundation (Carnegie Mellon University, Pittsburgh, 1994).

34. Bekelman, J.E., Li, Y. & Gross, G.P. J. Am. Med. Assoc. 289, 454–465 (2003).

35. Stern, S. & Murray, F.E. Do formal intellectual property rights hinder the free flow of scientific knowledge? An empirical test of the anticommons hypothesis: NBER Working Paper No. W11465 (2005).

36. Agrawal, A. & Henderson, R. Management Science 48, 44–60 (2002).

37. Dasgupta, P. & Maskin, E. Econ. J. 97, 581–595 (1987).

38. Van Overwalle, G. & Van Zimmeren, E. Chizaiken Forum 64, 42–49 (2006).

39. Kieff, F.S. Northwestern Univ. Law Rev. 95, 691–706 (2001).

40. Pressman, L. et al. Nat. Biotechnol. 24, 31–39 (2006).

41. Eisenberg, R. Science 299, 1018–1019 (2003).
42. Rohrbaugh, M.I. Fed. Regist. 70, 18413–18415 (2005).
43. Grimm, D. Science 312, 1862–1866 (2006).
44. Ravetz, J.R. Scientific Knowledge and Its Social Problems (Oxford Univ. Press, New York, 1973).
45. Canadian Biotechnology Advisory Committee. Human Genetic Materials, Intellectual Property and the Health Sector (CBAC, Ottawa, 2006).

46. World Health Organization. Public Health Innovation and Intellectual Property Rights (WHO Press, Geneva, 2006).

47. Australian Government Advisory Committee on Intellectual Property. Patents and Experimental Use (ACIP, Sydney, 2005).

48. Canadian Biotechnology Advisory Committee Expert Working Party on Human Genetic Materials. Human Genetics Materials: Making Canada’s Intellectual Property Regime Work for the Health of Canadians (CBAC, Ottawa, 2005).

49. National Research Council Committee on Intellectual Property Rights in Genomic and Protein Research and Innovation. Reaping the Benefits of Genomic and Proteomic Research: Intellectual Property Rights, Innovation and Public Health (National Academies Press, Washington, DC, 2005).

50. World Health Organization. Genetics, Genomics and the Patenting of DNA: Review of Potential Implications for Health in Developing Countries (WHO Press, Geneva, 2005).

51. Australian Law Reform Commission. Report 99—Genes and Ingenuity: Gene Patenting and Human Health (SOS Printing Group, Sydney, 2004).

52. Federal Trade Commission. To Promote Innovation: The Proper Balance of Competition and Patent Law and Policy (FTC, Washington, DC, 2003).

53. The Royal Society. Keeping Science Open: The Effects of Intellectual Property Policy on the Conduct of Science (TRS, London, 2003).

54. Public Health Genetics Unit. Intellectual Property Rights and Genetics (PHGU, Cambridge, 2003).

55. Organization for Economic Co-operation and Development. Genetic Inventions, Intellectual Property Rights & Licensing Practices (OECD Publications, Paris, 2002).

56. Canadian Biotechnology Advisory Committee. Patenting of Higher Life Forms and Related Issues (CBAC, Ottawa, 2002).



PATENTS

Recent patent applications in tissue engineering

WO 2006067080. Magnetic pole matrices useful for tissue engineering and for targeting systemic therapy for cardiovascular disease; the matrices provide a source of strong localized magnetic field gradient for targeted drug delivery and distribute the magnetic nanoparticles locally and uniformly on the artificial surface. Assignees: Steinbeis Center for Heart and Circulation Research (Rostock, Germany); Li W; Ma N; Steinhoff G; Steinhoff K. Inventors: Li W, Ma N, Nan M, Steinhoff G, Steinhoff K, Wenzhong L. Priority application date: 12/22/2004; publication date: 6/29/2006.

WO 2006068972. A device for the repair and regeneration of tissue such as skin, bone and cartilage, comprising a support scaffold layer and a cell sheet layer; provides cell viability and cell distribution equivalent to seeding by cell suspension. Assignees: Ethicon (Somerville, NJ, USA); Gosiewska A; Seyda A. Inventors: Buensuceso CS, Colter DC, Geesin JC, Gosiewska A, Scopelianos AG, Seyda A, Sridevi D. Priority application date: 12/21/2004; publication date: 6/29/2006.

US 20060128012. A structure for growing isolated differentiable human mesenchymal cells for use in tissue engineering; includes a three-dimensional matrix of fibers that forms a scaffold for growing the isolated differentiable human mesenchymal cells. Assignees: Arinzeh T; Jaffe M; Shanmugasundaram S. Inventors: Arinzeh T, Jaffe M, Shanmugasundaram S. Priority application date: 12/3/2004; publication date: 6/15/2006.

WO 2006059953. A new cell culture medium comprising tumor growth factor-β1, useful for developing research/drug screening kits, liver tissue engineering and biosensors for detecting chemical/biological warfare agents. Assignee: National University of Singapore. Inventors: Chia SM, Yu H. Priority application date: 11/30/2004; publication date: 6/8/2006.

WO 2006055261. Preparation of a biocompatible and biodegradable polyurethane foam, comprising mixing at least one biocompatible polyol, water, at least one stabilizer and at least one cell opener to form a resin mix; useful as scaffolds for bone tissue engineering. Assignee: Carnegie Mellon University (Pittsburgh). Inventors: Didier J, Guelcher SA, Hollinger JO, Patel V. Priority application date: 11/5/2004; publication date: 5/26/2006.

WO 2006046490. Cellular-tissue microchips comprising multiple cell-retaining cavities for forming cellular tissues of uniform configuration and size over a prolonged period of time; useful in, for example, tissue engineering. Assignee: Kitakyushu Foundation for the Advancement of Industry, Science and Technology (Kitakyushu, Japan). Inventors: Fukuda J, Nakazawa K. Priority application date: 10/29/2004; publication date: 5/4/2006.

WO 2006044832. An implant comprising several parallel layers spaced apart by several members, and having a uniform thickness and several openings to permit fluid flow through the implant; useful in a scaffold for tissue engineering applications. Assignee: Cleveland Clinic Foundation (Cleveland). Inventors: Fleischman AJ, Mata A, Muschler GF, Roy S. Priority application date: 10/15/2004; publication date: 4/27/2006.

CN 1746295. A tissue engineering cartilage based on filled stem cells from placenta, with a short cell culture time and easy quality control; useful for functional repair of joint cartilage injuries. Assignee: School of Basic Medicine, Military Medical University (Xi'an, China). Inventors: Dong L, Duan C, Guo X, Jiang H, Li J, Wang C, Zhou X. Priority application date: 9/9/2004; publication date: 3/15/2006.

US 20050276791. A polymer scaffold for, for example, tissue engineering applications such as wound healing and tissue regeneration, comprising polymer layer(s) having uniform structural features with predetermined geometries. Assignee: Ohio State University (Columbus, OH, USA). Inventors: Ferrell N, Hansford DJ, Yang S. Priority application date: 2/22/2005; publication date: 12/15/2005.

US 20050272153. A three-dimensional tissue scaffold implant for supporting tissue on-growth, comprising a lattice having a matrix of interconnected pores; an inert, biocompatible material covering the surfaces; and at least one growth factor covering the material; useful for bone regeneration. Assignees: Bunger C; Li H; Xuenong Z. Inventors: Bunger C, Li H, Xuenong Z. Priority application date: 1/27/2004; publication date: 12/8/2005.

KR 2005039960. A foam dressing material using chitosan, with increased stretch capacity and tensile strength; useful for wound dressing and other tissue engineering structures. Assignee: Hyosung Corp. (Seoul). Inventors: Choi YB, Kim DS, Kim SK. Priority application date: 10/27/2003; publication date: 5/3/2005.

Source: Thomson Scientific Search Service (formerly Derwent). The status of each application is slightly different from country to country. For further details, contact Thomson Scientific, 1725 Duke Street, Suite 250, Alexandria, Virginia 22314, USA. Tel: 1 (800) DERWENT (http://www.thomson.com/scientific).



Artificial sperm and epigenetic reprogramming
Diana Lucifero & Wolf Reik

Sperm cells derived from embryonic stem cells give rise to mice with genetic imprinting defects.

Of the many differentiated cell types that have been derived in vitro from mouse embryonic stem (ES) cells, among the most intriguing are cells resembling male and female gametes1–4. Such ES cell–derived germ cells have been shown to undergo meiosis to form haploid gametes that can support early development, but their capacity to support postnatal development remained untested. In a recent report in Developmental Cell, Nayernia et al.5 address this question by demonstrating that ES cell–derived male gametes can give rise to viable offspring.

ES cells are derived from the inner cell mass of the pre-implantation embryo at the blastocyst stage of early development. As pluripotent cells, they are able to generate cells of every embryonic lineage6. When introduced back into embryos, genetically modified ES cells can be used to generate knockout mice because they give rise to functional germ cells that further differentiate into male and female gametes (sperm and oocytes). ES cells can also be differentiated in vitro into various specialized cell types, which are of interest both for biological studies and for cell therapies. Eventually, the generation of gametes from human ES cells may be relevant to assisted reproductive technologies.

Nayernia et al. provide the first demonstration that ES cell–derived gametes can lead to the birth of viable mice. Reporter gene–based systems have previously been used to select for a population of stem cells that express germ cell–specific genes such as Oct4 (also known as Pou5f1) or others1–3, and thereby to enrich the population of cells with the most promise of becoming gametes in vitro. Nayernia et al. took this approach further by using a clever two-step selection system in combination with special culture conditions (Fig. 1).

First, they created an ES cell line to select for spermatogonial stem cells by introducing a promoter (Stra8) active in early male germ cells linked to a marker gene encoding enhanced green fluorescent protein (eGFP). The selected cell population already had the characteristics of male germ cells ready to enter the initial stages of meiosis. Using these enriched cells, they then performed another round of selection by introducing the promoter of a gene expressed in more mature haploid male germ cells (Prm1) linked to another fluorescent marker gene, dsRED. After inducing differentiation by retinoic acid treatment in the predominantly green fluorescent spermatogonial stem cells, they obtained red fluorescent cells, some of which appeared motile. The appearance of red fluorescent cells suggested that the emerging haploid cells had undergone the final stages of spermatogenesis. The shape of the resulting sperm was, however, abnormal. Approximately one-third of the cell population obtained by selection with Stra8-eGFP turned into haploid cells when induced to differentiate.

Next, the authors investigated the functionality of their in vitro–derived germ cells. Presumably because the sperm were not normal enough to fertilize oocytes by themselves, the authors injected them into oocytes using an assisted reproductive technology called intracytoplasmic sperm injection. A proportion of the fertilized zygotes developed into normal-looking pre-implantation embryos. Out of 65 pre-implantation embryos, 7 developed into live mice carrying the Prm1-dsRED transgene. However, most newborn transgenic mice were either smaller or larger than control mice, and died between five days and five months after birth. Thus, the in vitro–derived sperm did not give rise to normal mice.

The high rate of abnormal development in these manipulated gametes is likely to be epigenetic, rather than genetic, in nature. Such epigenetic changes may involve DNA modifications (such as methylation) and/or chromatin modifications that in turn regulate gene expression essential for normal development. In this context, imprinted genes, a subset of mammalian genes that are methylated in either the male or female germ line and hence expressed only from one of the parental chromosomes in the offspring7, are of particular interest. These genes tend to affect fetal growth, and their parent-specific DNA methylation marks need to be erased in early germ cells (primordial germ cells) and re-established according to the sex of the gamete at later stages of gametogenesis, in this case spermatogenesis (Fig. 1).

An important question is whether this epigenetic reprogramming occurs normally as ES cells undergo germ cell development in vitro; if it does not, the resulting sperm would have abnormal patterns of imprinted methylation. Nayernia et al. examined the DNA methylation state of three imprinted genes with well-defined differentially methylated regions. Overall, the results suggest that some reprogramming did occur in the culture system, but that not all methylation marks were properly erased and re-established. Indeed, the mice that were born from ES cell–derived male germ cells also had variable imprinted methylation profiles. Together, these findings suggest that the poor viability and growth abnormalities of the mice derived from ES cell–derived sperm may be at least partly explained by incomplete epigenetic reprogramming.

Follow-up experiments that rigorously investigate the methylation and expression status of a wider panel of imprinted genes in the ES cell–derived germ cells at multiple time points during the differentiation process will be important to help clarify the extent to which reprogramming occurs normally in such engineered germ cells. A similar, more detailed approach should also be taken with

offspring from the ES cell–derived sperm to reveal whether the phenotypic abnormalities observed are directly linked to imprinting defects.

Diana Lucifero and Wolf Reik are in the Laboratory of Developmental Genetics and Imprinting, The Babraham Institute, Cambridge CB2 4AT, UK. e-mail: [email protected], [email protected]

The demonstration that sperm derived from ES cells can give rise to viable offspring is an important step, but the finding that these mice are abnormal and have imprinting defects suggests that clinical applications of this procedure are still far off. Clearly, ES cell–derived sperm can be used as a model system for investigating both normal and pathological germ cell development, especially in the case of human germ cells, where ethical considerations preclude access to the fetal material that would be needed for such analysis. However, the relevance of this technology for infertility treatments and as a source of gametes is worthy of continued investigation. The production of oocytes from human ES cells would be of particular value for the derivation of stem cells by somatic cell nuclear transfer.

The study by Nayernia et al. raises many interesting questions for future experiments. How extensive is epigenetic reprogramming, and at what time points does it occur in the culture system as compared with the in vivo situation? Reprogramming of sequences other than imprinted genes would also be important to study. Are other epigenetic marks, such as chromatin signatures, also affected? Can the normal reprogramming of imprints be encouraged in vitro? As the players involved in reprogramming are still not known, could ES cell–derived sperm be used as an experimental system to dissect out which genes and factors are essential for normal reprogramming? Do the ES cells used by Nayernia et al. show normal methylation profiles on imprinted differentially methylated regions to begin with?

The overall efficiency of isolating sperm from ES cells is also a concern, with only 3% of oocytes microinjected with in vitro–derived male gametes giving rise to viable adult transgenic mice. Can this yield be improved? Are the embryos that fail to develop to term also dying because of aberrant imprint reprogramming? Can a similar two-step selection approach be used for female germ cells, or are


functional oocytes inherently more difficult to obtain? Answers to these questions will enhance insight into the biology of reproduction, and may improve regenerative medicine and assisted reproductive technologies.

1. Hubner, K. et al. Science 300, 1251–1256 (2003).
2. Geijsen, N. et al. Nature 427, 148–154 (2004).
3. Toyooka, Y., Tsunekawa, N., Akasu, R. & Noce, T. Proc. Natl. Acad. Sci. USA 100, 11457–11462 (2003).
4. Clark, A.T. et al. Hum. Mol. Genet. 13, 727–739 (2004).
5. Nayernia, K. et al. Dev. Cell 11, 125–132 (2006).
6. Solter, D. Nat. Rev. Genet. 7, 319–327 (2006).
7. Reik, W. & Walter, J. Nat. Rev. Genet. 2, 21–32 (2001).

Figure 1 Offspring from ES cell–derived male germ cells. (a) Simplified version of the normal course of events during murine spermatogenesis and the normal timing of imprint erasure, establishment and maintenance. (b) The approach of Nayernia et al. to create ES cell–derived male germ cells for derivation of viable offspring. The stages of normal male germ cell development and those described by Nayernia et al. were matched for simplicity; this rough depiction may not be biologically accurate. The in vitro derivation of sperm results in aberrant imprint erasure, establishment and/or maintenance. PGCs, primordial germ cells; ICSI, intracytoplasmic sperm injection.

The worm turns for antimicrobial discovery
Amit P Bhavsar & Eric D Brown

High-throughput screening using an in vivo infection model identifies nontraditional antimicrobials.

Amit P. Bhavsar and Eric D. Brown are in the Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada. e-mail: [email protected]

Pathogenic bacteria have evolved resistance to all the major antibiotics used to defeat them, partly in response to the use of 'broad-spectrum' antibiotics. This has led to calls for new therapeutic approaches that do not contribute to the spread of antibiotic resistance1. One such strategy would be to target the virulence mechanisms of pathogenic bacteria so as to disrupt their ability to infect the host but not their viability1. An important advance in this direction was recently reported by Ausubel and colleagues2 in the Proceedings of the National Academy of Sciences USA. They describe an anti-infective screen based upon rescue of


the nematode Caenorhabditis elegans from a persistent infection by the human pathogen Enterococcus faecalis. Screening of over 7,000 purified compounds or natural extracts for their ability to promote survival of the infected worms revealed 16 synthetic compounds and 9 natural extracts, many of which appear to either activate host immunity or attenuate pathogen virulence.

High-throughput bacterial virulence screens have previously been used to discover inhibitors of virulence for the pathogenic organisms Vibrio cholerae3, Yersinia pseudotuberculosis4 and enteropathogenic Escherichia coli5. However, in each of these cases, the bacteria were screened in the absence of a host organism. Moreover, as the V. cholerae and Y. pseudotuberculosis assays screened for decreased expression of known virulence genes, they relied on detailed knowledge of the pathways these organisms use for infection. In contrast, the E. coli screen relied on recognizing compounds that impaired the general bacterial virulence mechanism of protein secretion. Interestingly, the inhibitor of V. cholerae virulence substantially reduced in vivo colonization in an infant mouse model3, and the inhibitor of Y. pseudotuberculosis virulence was partially effective in a HeLa cell infection model6.

What makes the method of Ausubel and colleagues remarkable is that it marries high-throughput screening with an in vivo infection model to identify molecules that are true anti-infectives. This simple, elegant approach involves infecting nematodes with E. faecalis by allowing them to ingest the bacteria. Once a persistent infection occurs in their intestines, the worms are systematically exposed to test compounds added to liquid medium in multiwell plates. Several days later, worm survival is scored manually and compared with survival in control media.

The 25 compounds that promoted worm survival undoubtedly act through very different mechanisms. Notably, the failure of some of these small molecules to affect bacterial survival in vitro suggests that they would have been overlooked in a more conventional screen for antibiotic activity. Whereas some of the compounds appear to act on E. faecalis directly, others seem to modulate worm responses to the bacteria (Fig. 1). Certain members of the latter class of compounds even allow worms to tolerate colonization by the pathogen. Other molecules had in vivo activities at lower concentrations than the minimum levels needed to inhibit in vitro growth, suggesting multiple modes of action.

Antimicrobials that act directly on the bacteria could have at least two outcomes with respect to bacterial persistence. Inhibition of virulence could prevent the pathogens from persisting in the worm intestine, thereby allowing the immune system to clear the infection and decrease bacterial numbers. Alternatively, a compound might have little effect on persistence but render the bacteria avirulent and therefore harmless. This would manifest as a healthy worm that remained colonized by the pathogen. These two outcomes might also occur if the compound modulated the worm immune system so that it could now readily clear the persistent infection or tolerate bacterial colonization. Whereas the former mechanism may involve upregulating nematode immunity, the mechanisms underlying increased host tolerance are unclear and very intriguing.

Of course, the next challenge is to identify the mechanisms of action for each of the compounds that promote worm survival. In cases where the compound alters pathogen growth, the target might be identified genetically, for example through multicopy suppression, where target overproduction facilitates target discovery7. Another approach, recently described for the identification of FabF, a bacterial enzyme involved in fatty acid biosynthesis, as the target of the antibiotic platensimycin8, exploits target depletion through antisense RNA inhibition of target production. Mutagenesis can also yield target information, as demonstrated by the identification of ToxT,


Figure 1 Compounds that promote survival of Caenorhabditis elegans after persistent infection with Enterococcus faecalis can either prevent persistence of the pathogen in the intestine or enable the host to tolerate chronic innocuous intestinal colonization. Both protective outcomes may arise from either inhibiting bacterial virulence (red) or stimulating host defenses (blue). A striking difference in the appearance of live worms (sinusoidal posture) and worms that have succumbed to the infection (straight, rigid appearance resulting from proliferation of the pathogen) facilitates high-throughput screening for antimicrobials.


a key DNA-binding protein that allows transcription of virulence factors, as the target of virstatin, a V. cholerae virulence inhibitor3. Although identifying mechanisms of action that target host responses will be challenging, a recent high-throughput screen of C. elegans to identify growth-altering compounds9 seems noteworthy. In this work, a library of 180,000 randomly mutated C. elegans strains was used to identify a calcium channel subunit as the target of nemadipine-A.

Although the in vivo virulence screen designed by Ausubel and colleagues shows great potential, it may require further optimization to become compatible with industrial-scale high-throughput screening. A promising development in this regard is the use of automated inoculation and imaging of nematodes by Kwok et al.9. The easily scored morphological differences between live and dead worms observed in screens for nontraditional antimicrobials

should be amenable to such automation. Finally, the suitability of C. elegans as a model infection system deserves reconsideration. As pointed out by the authors, C. elegans can feed on several human pathogens, including the troublesome Gram-negative pathogen Pseudomonas aeruginosa. However, although this model is responsive to E. faecalis, it is not susceptible to infection by other human pathogens, such as Enterococcus faecium.

1. Brown, E.D. & Wright, G.D. Chem. Rev. 105, 759–774 (2005).
2. Moy, T.I. et al. Proc. Natl. Acad. Sci. USA 103, 10414–10419 (2006).
3. Hung, D.T., Shakhnovich, E.A., Pierson, E. & Mekalanos, J.J. Science 310, 670–674 (2005).
4. Kauppi, A.M. et al. Chem. Biol. 10, 241–249 (2003).
5. Gauthier, A. et al. Antimicrob. Agents Chemother. 49, 4101–4109 (2005).
6. Nordfelth, R. et al. Infect. Immun. 73, 3104–3114 (2005).
7. Li, X. et al. Chem. Biol. 11, 1423–1430 (2004).
8. Wang, J. et al. Nature 441, 358–361 (2006).
9. Kwok, T.C.Y. et al. Nature 441, 91–95 (2006).

The sweet side of biomarker discovery
Carlos J Bosques, S Raguram & Ram Sasisekharan

Glycomics offers exciting possibilities for discovering serum biomarkers.

Glycans are involved in each and every aspect of tumor progression, from cellular proliferation to angiogenesis and metastasis1,2, and in theory could be used to diagnose, predict susceptibility to and monitor the progression of cancer. Despite the potential of glycans as diagnostic and prognostic biomarkers1, however, progress in developing reliable clinical tools has been slow. In two recent articles in Molecular and Cellular Proteomics3 and the Journal of Proteome Research4, researchers led by Carlito Lebrilla and Suzanne Miyamoto describe a promising strategy for discovering glycan biomarkers based on analyzing total glycan profiles rather than the sugars on particular glycoproteins.

Protein glycosylation is one of the most common post-translational modifications in humans. In fact, most secreted proteins are glycosylated, including important tumor biomarkers such as prostate-specific antigen and the ovarian cancer marker CA125. Many publications have suggested the use of glycans for cancer diagnostics1. Glycans might reflect pathologies in instances when reliable changes in protein profiles cannot be identified. Present at the cell surface and in the extracellular matrix, glycans are critically important in the remodeling of the microenvironment during tumorigenesis. Alterations to the normal function of the glycosylation machinery are increasingly recognized as a consistent indication of malignant transformation and oncogenesis.

The extreme sensitivity of glycosylating enzymes to pathological changes is reflected by large alterations in the distribution of glycoforms presented on glycoproteins. For example, upregulation of N-acetylglucosaminyltransferase V and sialyltransferases (leading to increased β-1,6-GlcNAc branching and sialylation of N-linked glycans, respectively) is a major hallmark of cancer progression5. Specifically, increased branching of N-linked glycans has been associated with invasion, angiogenesis and metastasis6, and increased sialylation on the cell surface can, for example, promote cell detachment from primary tumors through charge repulsion, thereby inducing tumor proliferation2. Increased branching of O-linked glycans by core 2 β-1,6-N-acetylglucosaminyltransferase has also been associated with tumorigenesis7.

With at least two notable exceptions8,9, earlier studies have aimed to use carbohydrates from a specific glycoprotein as the diagnostic fingerprint. Although this approach has shed light on the functions of particular glycoproteins in many diseases, it has been difficult to correlate disease progression with specific glycosylation patterns on the protein of interest. Owing to the dynamic nature of these co- and post-translational modifications and their pleiotropic regulation, a broad overview of the total pathological changes to glycans in a tissue or body fluid ('glycomics') may be more informative than characterizing the glycans on particular proteins.

In the new papers, the groups of Lebrilla and Miyamoto, focusing on O-linked oligosaccharide markers in ovarian and breast cancer models, rapidly profiled the total glycans released from glycoproteins in the culture media of various cancer cell lines or in the sera of diseased mice or patients. Their approach differs from earlier studies focused on single glycoproteins in that the O-linked glycans of all serum glycoproteins are released by β-elimination and analyzed. After a simple purification through graphitized carbon solid-phase extraction cartridges, fractions of glycan mixtures are directly analyzed by matrix-assisted laser desorption/ionization Fourier transform ion cyclotron resonance mass spectrometry (Fig. 1). Because no high-performance liquid chromatography separation is required, the procedure is fairly fast. Another advantage is high sensitivity, as only 50-nl samples are needed. Furthermore, as serum albumin is not glycosylated, this approach also overcomes many of the problems faced by proteomics in removing serum albumin and other abundant proteins before analysis.

Profiling of the glycans in the media of four breast cancer cell lines revealed that similar breast cancer cell lines had essentially the same oligosaccharide profile, whereas a precancerous ductal carcinoma cell line and a noncancer cell line showed different glycan profiles. This suggests that differences between glycans in different cell lines may reflect distinct stages of breast cancer. Remarkably, glycans that appeared as the disease advanced were also found in the breast cancer cell lines, but not in control cells. Furthermore, a mouse model of breast cancer revealed significant changes in glycan profiles during disease progression. Although results in mice may not adequately

represent human cancers, an advantage of these model systems in evaluating the potential of biomarkers is that matching samples can be obtained from genetically identical animals.

Carlos J. Bosques, S. Raguram and Ram Sasisekharan are in the Biological Engineering Division, Harvard-MIT Division of Health Sciences and Technology, Center for Environmental Health Sciences, Center for Biomedical Engineering, Massachusetts Institute of Technology, 15-561, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA. e-mail: [email protected]

The authors also compared the serum glycoprofiles from a limited number of cancer patients and healthy individuals. Although the carbohydrates of the graphitized carbon cartridge fractions from both sets of samples were similar, glycans from cancer samples eluted at higher acetonitrile concentrations than those from noncancer samples. This result could arise from different isomeric structures. However, it is also possible that other features, such as differences in relative glycan amounts, could affect the elution profiles. For example, as glycans are synthesized through an interconnected (and very sensitive) circuit of enzymes, pathophysiological alterations could easily alter the expression ratios of glycans by converting some glycan precursors to glycan products.

Owing to its low detection limits, efficiency and rapid characterization capabilities, mass spectrometry remains one of the most popular techniques for biomarker discovery. As demonstrated in the papers by Lebrilla and Miyamoto, matrix-assisted laser desorption/ionization mass spectrometry provides a good alternative for efficiently analyzing complex glycan profiles from cell lines or serum for the purpose of discovering new biomarkers. However, the glycan structural analyses that are necessary and sufficient will likely vary with the particular application. Furthermore, if specific glycan signatures are to serve as clinically accepted biomarkers, these analytical techniques must be optimized for high reproducibility and sensitivity so that large numbers of human samples can be analyzed with statistical significance. Such studies will generate vast and complicated data sets, making data analysis a possible bottleneck. Therefore, better glycan-based bioinformatics tools must also be developed (Fig. 1)10. Although there are currently initiatives to use mass spectrometry in the clinic for diagnostic purposes, other techniques, such as lectin and glycan arrays, will also be indispensable in transforming glycan biomarkers into reliable clinical tests.


Glycans have great potential as cancer biomarkers because of their involvement in all stages of tumor progression1,2,11. If we can capture this vast information content in an efficient and meaningful manner and develop appropriate technologies for clinical translation, these biomolecules could become an alternative or a complement to DNA and proteins in the difficult endeavor of early cancer diagnosis.

1. Dube, D.H. & Bertozzi, C.R. Nat. Rev. Drug Discov. 4, 477–488 (2005).
2. Fuster, M.M. & Esko, J.D. Nat. Rev. Cancer 5, 526–542 (2005).
3. Kirmiz, C. et al. Mol. Cell. Proteomics 10.1074/mcp.M600171-MCP200 (2006).
4. An, H.J. et al. J. Proteome Res. 5, 1626–1635 (2006).
5. Dennis, J.W., Granovsky, M. & Warren, C.E. Biochim. Biophys. Acta 1473, 21–34 (1999).
6. Dennis, J.W. et al. Science 236, 582–585 (1987).
7. Shimodaira, K. et al. Cancer Res. 57, 5201–5206 (1997).
8. Callewaert, N. et al. Nat. Med. 10, 429–434 (2004).
9. Butler, M. et al. Glycobiology 13, 601–622 (2003).
10. Raman, R. et al. Nat. Methods 2, 817–824 (2005).
11. http://grants.nih.gov/grants/guide/rfa-files/RFA-CA-07-020.html

Figure 1 Global serum glycomics for biomarker discovery. In contrast to most glycan analysis studies that focus on particular glycoproteins, total glycans from all serum glycoproteins are cleaved and preconcentrated before global glycoprofiling using matrix-assisted laser desorption/ionization mass spectrometry. Spectra from the sera of cancer patients and healthy subjects are analyzed to identify cancer-associated glycan signatures. Efficient bioinformatics tools with feature recognition capabilities will likely be indispensable in identifying cancer-associated glycomic signatures and translating them into noninvasive diagnostic tests.


Compound relationships
Determining the mode of action of new compounds can take years, but now Parsons et al. describe a way to map out the functions of bioactive compounds rapidly. Extending previous work on lethal genetic profiles in yeast deletion mutants, the researchers used Ron Davis' barcode microarray approach to test a set of ~5,000 deletion mutant strains for sensitivity to a range of compounds connected to human therapeutics: small molecules and natural products, both purified and in crude extracts, including 23 FDA-approved drugs. By analyzing the data with probabilistic sparse matrix factorization, they were able to create multifactorial groupings (as opposed to hierarchical clustering, which confines a compound to a single group). Some obvious groups emerged, but some unexpected relationships were also revealed. For example, the profile of the cytotoxic anti-HIV compound papuamide B (pap B) resembled that of peptides that disrupt membranes. Molecular and biochemical analysis of pap B-resistant mutants showed that pap B's target is phosphatidylserine, an important component of the yeast cell membrane. This result explains how pap B interferes with HIV infection, and illustrates the power of the barcode approach in elucidating a drug's mode of action. (Cell 126, 611–625, 2006) LD

MicroRNAs suppress angiogenesis inhibitors

MicroRNAs have been associated with the regulation of several cellular processes, including differentiation, cell proliferation and apoptosis. Now a role for a cluster of these short (~22 nt) non-coding RNAs has been elucidated in the control of endogenous angiogenesis inhibitors in colon cancer. After studying the role of MYC proto-oncogene overexpression in tumor angiogenesis and in the growth of colonocytes containing mutations in the KRAS proto-oncogene and the TP53 tumor suppressor gene implanted into mice, Dews et al. demonstrate that levels of the microRNA polycistron miR-17-92 (which is transcriptionally activated by Myc and amplified in B-cell lymphomas) are elevated in Myc-overexpressing colon cancer cells. They then go on to show that some of these microRNAs directly downregulate mRNAs encoding the anti-angiogenic proteins thrombospondin-1 (Tsp1) and connective tissue growth factor (CTGF). Direct overexpression of the miR-17-92 locus in the absence of oncogenic Myc partially restores Myc-dependent phenotypes by downregulating Tsp1 and CTGF mRNAs and increasing tumor angiogenesis; conversely, administration of a mixture of antisense 2′-O-methyl oligoribonucleotides (antagomirs) specific for microRNAs derived from the miR-17-92 locus re-establishes Tsp1 and CTGF protein expression. The study suggests putative therapeutic potential for antagomirs directed against these microRNAs in colon cancer. (Nat. Genet. 38, 1060–1065, 2006) JWT

Submersible rice

Seasonal flooding destroys an estimated $1 billion worth of rice crops each year in south and southeast Asia. But now researchers have identified a gene that allows some cultivars to survive submergence for up to two weeks. The gene, Sub1A, is part of the 182-kilobase Sub1 (Submergence 1) quantitative trait locus, which comprises three ethylene-response factors (ERFs) and ten other genes unrelated to tolerance. Sub1A is overexpressed when plants are submersed. One allele, Sub1A-1, found only in tolerant cultivars, carries a single nucleotide polymorphism in a mitogen-activated protein kinase site, which may explain the effectiveness of the Sub1A ERF, as phosphorylation can affect ERF-DNA binding. Intolerant japonica rice plants transformed with a copy of the allele survived 11 days of submergence, though they were somewhat smaller than normal japonica plants. To further engineer a flood-hardy rice without interfering with the variety's desirable traits, Xu et al. introgressed the Sub1 genes into the widely grown Swarna Indian rice variety and used marker-assisted selection to pick out the progeny plants with the fewest chromosomal segments from the Sub1-source plant. The resultant Swarna-Sub1 lines survive extended submergence but show yield and plant height comparable to Swarna's. The introduction of flood-resistant varieties is an important step in protecting farmers from flood-related losses. (Nature 442, 705–708, 2006) TM

Antibody inflammatory sweet spot

Although antibodies are best known for their ability to target foreign antigens and trigger inflammation, administration of IgG mixtures pooled from thousands of donors can relieve certain inflammatory autoimmune disorders. Ravetch and colleagues appear to have resolved this paradox by demonstrating that in mice, two terminal sialic acid moieties on a glycan linked to the IgG Fc domain account for the antibody's anti-inflammatory effects. Removal of the sugar residues promotes inflammation. Despite differences between mouse and human IgG subclasses and Fc receptors, this suggests that antibodies developed to treat autoimmune diseases, such as arthritis, lupus and asthma, should be optimally sialylated, whereas sialylation of cytotoxic antibodies designed to counteract diseases such as cancer should be restricted. The glycosyltransferase and glycosidase activities in antibody-producing cells that presumably limit inflammation to periods of infection may be promising alternative therapeutic targets. (Science 313, 670–673, 2006) PH

PALM images stack up

Despite significant progress in recent years, 'super-resolution' techniques for overcoming the diffraction-limited resolution of far-field optical microscopy have yet to achieve macromolecular resolution. Using a new super-resolution approach and total internal reflection fluorescence microscopy, Betzig et al. have succeeded in imaging proteins labeled with a photoactivatable fluorescent protein at a resolution below 10 nm. Photoactivated localization microscopy (PALM) resolves the signals of multiple proteins within a diffraction-limited region by an iterative process of photoactivation, localization and bleaching. Each cycle reveals the location of a small fraction of all the labeled proteins in the sample. A stack of ~10^4–10^5 images is then combined to generate a single image showing the positions of ~10^5–10^6 individual proteins. As the authors demonstrate, the labeled proteins can be visualized in their cellular context by superimposing a PALM image and a transmission electron microscope image. Compared with transmission electron microscopy of immunolabeled samples, PALM shows the labeled proteins at a much higher density (up to 10^5 per µm^2). The method can be applied to cryosections of pelleted cells or to fixed whole cells. (Sciencexpress, 10 August 2006, doi:10.1126/science.1127344) KA
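As a rough illustration of the localization step described above (a toy sketch, not the authors' actual fitting pipeline), the code below simulates one activation cycle per emitter and localizes each diffraction-limited spot by its intensity-weighted centroid, recovering positions to a small fraction of a pixel. All parameters (grid size, PSF width, photon count, emitter positions) are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

def centroid_localize(frame):
    """Intensity-weighted centroid of one sparse frame: a crude stand-in
    for the Gaussian fitting used to localize single fluorophores."""
    total = frame.sum()
    ys, xs = np.indices(frame.shape)
    return (ys * frame).sum() / total, (xs * frame).sum() / total

# True emitter positions on a 32x32 pixel grid (one pixel >> final precision).
emitters = [(8.3, 20.1), (22.7, 5.5), (15.0, 15.0)]

localized = []
for y0, x0 in emitters:            # one activation/bleach cycle per emitter
    ys, xs = np.indices((32, 32))
    # Diffraction-limited spot: Gaussian PSF, sigma ~2.5 px, plus shot noise.
    spot = 1000 * np.exp(-((ys - y0) ** 2 + (xs - x0) ** 2) / (2 * 2.5 ** 2))
    frame = rng.poisson(spot).astype(float)
    localized.append(centroid_localize(frame))

for (y0, x0), (y, x) in zip(emitters, localized):
    print(f"true ({y0:5.2f},{x0:5.2f})  localized ({y:5.2f},{x:5.2f})")
```

Accumulating many such sub-pixel localizations into one rendered image is what lets the summed PALM image beat the diffraction limit of any single frame.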

Research Highlights written by Kathy Aschheim, Laura DeFrancesco, Peter Hare, Teresa Moogan & Jan-Willem Theunissen

RESEARCH HIGHLIGHTS


Empowering microarrays in the regulatory setting

The mission of the US Food and Drug Administration (FDA) is to protect and promote the public health. One of the ways we carry out this mission is by advancing innovations that make medicines and foods safer, more effective or more affordable. Currently, almost nine out of ten investigational pharmaceuticals fail during clinical development. These failures are thought to be a consequence of both the variability among patients caused by intrinsic and extrinsic factors, and the inability to predict the effects of molecular entities in people based on in vitro and animal studies.

There is a critical need for methodologies that can describe altered gene expression and cellular protein profiles in terms of their early metabolic consequences and relate these to developing, established or regressing pathologies in humans. Because of this need, drug companies are pursuing active R&D projects to develop reliable biomarkers of efficacy and toxicity using various technologies, often supported by DNA microarray data. Biomarkers can be defined as measurable characteristics that reflect physiological, pharmacological, toxicological or disease processes in humans or animals.

The FDA is an active participant, with the regulated industry and the scientific community, in promoting innovative tools that will advance its mission. This participation is reflected in several significant documents, including the FDA white paper, The Critical Path to New Medical Products1 (which identifies pharmacogenomics as crucial to advancing medical product development and personalized medicine), Draft Guidance on Pharmacogenetic Tests and Genetic Tests for Heritable Markers2 and the Guidance for Industry: Pharmacogenomic Data Submissions3.

This last document recognizes that at present, most pharmacogenomic data are of an exploratory or research nature, and FDA regulations do not require that these data be submitted to an investigational new drug application or that complete reports be submitted to a new drug application or biologics licensing application. However, to be prepared to appropriately evaluate anticipated future submissions, FDA and industry scientists need to develop an understanding of relevant scientific issues, including the following: the types of genetic loci or gene expression profiles being explored for pharmacogenomic testing; the test systems and techniques being employed; the problems encountered in applying pharmacogenomic tests to drug development and to clinical outcomes; and the ability to transmit, store and process large amounts of complex pharmacogenomic data streams with retention of fidelity. As described in the Guidance for Industry: Pharmacogenomic Data Submissions3, the FDA is asking sponsors conducting such programs to consider providing pharmacogenomic data to the agency, voluntarily, when such data are not otherwise required by regulation.

DNA microarray technology, a tool that can evaluate simultaneously the relative expression of thousands of genes, has developed rapidly and has been suggested as the presently preferred technology to identify early biomarkers of toxicity and disease. The outcome of microarray studies can be affected by many technical, instrumental, computational and interpretative factors. Indeed, a major criticism voiced about microarray studies has been the lack of reproducibility and accuracy of the derived data.

To address this concern, the microarray community and regulatory agencies have developed a consortium to establish a set of quality assurance and quality control criteria to assess and assure data quality, to identify critical factors affecting data quality and to optimize and standardize microarray procedures so that biological interpretation and decision making are not based on unreliable data. These fundamental issues are addressed by the MicroArray Quality Control (MAQC) project.

The MAQC project aims to establish quality control metrics and thresholds for objectively assessing the performance achievable by different microarray platforms and for evaluating the merits and limitations of various data analysis methods. It is anticipated that the MAQC project will help improve microarray technology and foster its appropriate application in discovery, development and review of FDA-regulated products.

The results of these efforts are published in the compendium of papers that follows. It is anticipated that the efforts made by the contributors from diverse sectors of the scientific and regulatory communities will serve as a solid foundation on which to build a consensus on the use of microarray data in a regulatory setting. The development and validation of microarray quality control procedures will also serve as a foundation for integrating proteomics and metabonomics to accomplish an applied systems biology approach to elucidate complex disease pathways and to identify and validate therapeutic targets and disease biomarkers. Understanding of disease pathways will accelerate drug discovery and make clinical trials more informative by providing pharmacodynamic and safety information earlier in the drug development process. Ultimately, exploitation of microarray-based biomarkers will help bring about the transition from population-based medical treatment to true personalized medicine.

Daniel A. Casciano & Janet Woodcock

1. http://www.fda.gov/oc/initiatives/criticalpath/
2. http://www.fda.gov/cdrh/oivd/guidance/1549.pdf
3. http://www.fda.gov/cder/guidance/6400fnl.pdf

Daniel A. Casciano is professor at the University of Arkansas for Medical Sciences, 47 Marcella Drive, Little Rock, Arkansas 72223, USA, and former director of the US Food and Drug Administration's National Center for Toxicological Research in Jefferson, Arkansas, USA. Janet Woodcock is Deputy Commissioner for Operations, US Food and Drug Administration, Rockville, Maryland 20857, USA. e-mail: [email protected]

FOREWORD


Impact of microarray data quality on genomic data submissions to the FDA

Felix W Frueh

How can microarray data best be exploited and integrated into the regulatory decision-making process?

Five years ago, the completion of the sequencing of the human genome was announced1,2, triggering many comments about the value of this knowledge for new approaches and insights into drug development. However, although genomics is used in an increasing number of drug development programs, the genomics-led 'revolution' in drug development has not happened yet. This can be attributed to a variety of reasons; one reason is the lack of a thorough evaluation of the quality of novel technologies such as DNA microarrays, as well as the manner in which the results of such experiments are analyzed and interpreted.

To investigate the challenges presented to regulators by microarray data, the US Food and Drug Administration (FDA) spearheaded the formation of the MicroArray Quality Control (MAQC) consortium, which brings together researchers from government, industry and academia to assess the key factors contributing to the variability and reproducibility of microarray data. Ultimately, the data from this initiative will help determine a new set of standards and guidelines for the use of DNA microarray data.

Genomic data matures

Several factors have encouraged the adoption and integration of genomic data in drug development and regulatory assessment, including a better understanding of disease pathophysiology and the targeting of drug molecules to their sites of action. However, there are challenges to further expansion of genomics use; one key issue frequently discussed is that genomic science has evolved more quickly than the technologies suitable for generating consistent, high-quality genomic data. Before 2004, genomic information was largely absent from the investigational new drug submissions or new drug applications received by the FDA; today, that situation is changing (Fig. 1). This more than likely reflects the timelines associated with the drug development process overall and the integration of genomics within that process. It is therefore logical that by this time, we should be starting to see an increase in submissions to the FDA containing genomic information; indeed, the number of data submissions containing genomic information is increasing significantly (Fig. 1).

On the basis of the 20 voluntary genomic data submissions received by the FDA so far, it appears that the technologies for generating genomic data have only recently become a commodity of broader application. Recently, the integration of large-scale screening approaches (e.g., gene expression profiling or whole-genome single-nucleotide polymorphism (SNP) scans) has been observed in different stages of drug discovery and now also in drug development. Consequently, at this point, the generation and exploitation of genomic data from such large-scale efforts in modern drug development requires a regulatory environment adequately equipped to review such data.

The agency responds

Shortly after the human genome sequence was announced, a seminal paper by the FDA's Lesko and Woodcock3 was published highlighting

Felix W. Frueh is at the US Food and Drug Administration, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, 10903 New Hampshire Avenue, Silver Spring, Maryland 20993, USA. e-mail: [email protected]

[Figure 1 plot: number of submissions (consults and VGDS) per quarter, Q1 2004 through Q2 2006]

Figure 1 Increase in formal requests (consults) for genomic data review (data submitted as part of regular INDs, NDAs or BLAs) to the Office of Clinical Pharmacology, and voluntary genomic data submissions (VGDS) to the FDA, since 2004. IND, investigational new drug; NDA, new drug application; BLA, biologic license application.

COMMENTARY


the importance of new guidance for regulatory submissions containing genomic information. This 'call to arms' was followed by a series of workshops on pharmacogenomics organized by the FDA, the Drug Information Association (Horsham, PA, USA) and the Pharmaceutical Researchers and Manufacturers of America (PhRMA; Washington, DC, USA), which led to the development of a guidance document and ultimately facilitated a new type of voluntary data submission process: the voluntary genomic data submission (VGDS). This process allowed for a new, informal interaction between sponsors of voluntary submissions and regulators to discuss the science of novel, exploratory uses of pharmacogenomics. The Guidance for Industry: Pharmacogenomic Data Submissions4, released as a final guidance document in 2005, was accompanied by two additional documents explaining the newly created VGDS path and the function and responsibilities of a newly created FDA-wide Interdisciplinary Pharmacogenomic Review Group (IPRG), respectively.

At the same time, the FDA launched a new website (http://www.fda.gov/cder/genomics), which serves as a portal for regulatory information in the area of genomics. Together, these new regulatory resources allow and promote the submission of exploratory, cutting-edge genomic data to the FDA. This exploratory information is not used by regulators or industry as part of regulatory decision making, a critical point, as it is understood that many of the data sets generated with this new technology are not yet sufficiently mature to contribute to critical regulatory decisions that have a wide-ranging impact on entire drug development programs. Nonetheless, these data are of value to regulators in understanding the changes underway in the processes, approaches and direction of drug research and development programs. It is also important to note that the Guidance for Industry: Pharmacogenomic Data Submissions4 is not a guidance about 'voluntary' submissions alone; instead, in very general terms, it explains what types of genomic data need to be submitted to the FDA and when, and what types of data can be submitted on a voluntary basis.

Voluntary genomic data submission

The VGDS program creates a forum for scientific data exchange and discussions with the FDA outside of the regular review process. The VGDS program is used for a variety of strategic purposes and continues to evolve. For example, sponsors submit data on a voluntary basis to discuss the potential impact of using this information in the drug development program: this leads to questions such as 'how can we test the hypothesis and how can it be validated' or 'will this approach provide us with a clinically useful answer,' but also to such questions as 'how do we best analyze the data' or 'what is the most suitable approach for a biological (that is, mechanistic) interpretation of the data?'

To date, the FDA/IPRG has received and reviewed ~20 voluntary genomic data submissions. These submissions varied significantly in content and focus (Table 1), and a large number contained microarray gene expression data. Even though most of the microarray data were generated using a photolithographically synthesized oligonucleotide chip platform (Affymetrix; Santa Clara, CA, USA), the heterogeneity of the data submissions was surprising, illustrating to the agency two key problems: first, the need for standardization in data generation, normalization and submission; and second, the need for measures of data quality.

Although VGDS data allow the FDA to gain insight into specific drug development programs and the genomic data used within them, the data often do not allow a systematic assessment of quality measures. Consequently, these VGDS data are ideal for creating snapshots of the state of the art in industry's generation and use of genomic data, but they may or may not be consistent with more general quality standards. This poses a challenge to the interpretation of the data themselves and the conclusions drawn from such data sets. Even so, our experience with reviewing microarray data sets has already given us invaluable information that has helped us both to design the experiments and strategies needed to create such standards and to point to the most critical aspects of data analysis and interpretation.

The genesis of MAQC

Together with a variety of other motivating factors outlined elsewhere in this issue of Nature Biotechnology, reviewers in the IPRG created a list of issues that need to be addressed for microarray data to become acceptable for regulatory review. For example, data normalization was identified as a major source of differences when comparing results and data interpretations performed by sponsors and FDA reviewers. The use of different data analysis protocols could explain differences, as could the use of different data interpretation tools, such as software for pathway analyses. In other words, the VGDS process, and the data received in voluntary submissions, have helped to identify the impact of different data analysis strategies, but the data themselves cannot be used to address and solve the issues. To do this, a broader, well-defined and generalizable process needs to be used, such as the MAQC, which allows the systematic exploration of all sources of variability, the assessment of the importance of each factor (e.g., how much a difference in data normalization contributes to the overall variability in the data) and, ultimately, the determination of a set of standards and best practices to be followed.
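To make the normalization point concrete, the sketch below (a hypothetical mock-up, not MAQC or VGDS data) applies two common normalizations, global median scaling and quantile normalization, to the same raw two-array data set. The estimated fold change for a spiked gene comes out differently under the two pipelines, which is exactly the kind of analysis-dependent variability described above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Mock raw intensities: 200 genes on 2 arrays; array 2 scanned ~1.6x brighter.
base = rng.lognormal(mean=6, sigma=1, size=200)
base[0] = np.median(base)                    # put the gene of interest mid-range
noise = rng.lognormal(mean=0, sigma=0.1, size=(200, 2))
raw = base[:, None] * noise * np.array([1.0, 1.6])
raw[0, 1] *= 2.0                             # gene 0 is truly ~2-fold up on array 2

def median_scale(x):
    """Divide each array by its median intensity (global scaling)."""
    return x / np.median(x, axis=0, keepdims=True)

def quantile_norm(x):
    """Force both arrays onto the same (mean) intensity distribution."""
    ranks = x.argsort(axis=0).argsort(axis=0)
    mean_dist = np.sort(x, axis=0).mean(axis=1)
    return mean_dist[ranks]

fcs = {}
for name, norm in [("median scaling", median_scale),
                   ("quantile", quantile_norm)]:
    z = norm(raw)
    fcs[name] = float(np.log2(z[0, 1] / z[0, 0]))
    print(f"{name:15s} log2 fold change for gene 0: {fcs[name]:+.2f}")
```

Both pipelines detect the up-regulation, but the reported fold changes differ; across thousands of genes and a significance cutoff, such differences translate into different gene lists.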

It is reasonable to expect, however, that different parameters (technical as well as practical) may continue to hamper the implementation of 'best practices' in the future under certain circumstances. We are aware that the studies conducted in the MAQC occur in an 'optimized' environment that may not always be possible to replicate because of limitations of infrastructure, slower turnaround time of sample processing or other restrictions that real-life settings bear. Regardless, it is critically important to identify and evaluate the importance of each individual step in the generation, processing and interpretation of microarray data, to be able to assess the extent to which these steps may contribute to data variability.

For this reason, and because a regulatory agency must not only be able to understand the steps that led to the generation of the data that are submitted but should also be able to set this information into the context of other, similar data to assess data consistency and overall quality, it is important to know what could be considered a 'gold standard.' Ultimately,

Table 1 Focus areas of voluntary genomic data submissions as of February 2006

Therapeutic areas          Scientific field
Cancer (multiple types)    Biomarkers
Alzheimer disease          Genotyping devices
Hypertension               Microarrays
Hypoglycemia               Analysis software, databases
Depression                 Metabolic pathways
Obesity                    Enrichment design
Rheumatoid arthritis       Registry design
All                        Toxicology
All                        Biostatistics



this knowledge will help the agency to better understand and interpret data that have been generated under less ideal settings, because the sources of the largest data variability or uncertainty in data interpretation are known and can be addressed more adequately. It has been suggested that, for example, GLP (Good Laboratory Practices, 21 CFR part 58)5 could be used to clarify some of the issues around microarray standards and data variability. The requirements of 21 CFR part 58 apply to nonclinical studies submitted to support safety findings, including nonclinical pharmacogenomic studies intended to support regulatory decision making. Given the exploratory nature of many of the microarray studies, it seems unreasonable to expect these types of studies to comply with part 58 with the full rigor of those standards. At the same time, it may not be feasible to conduct separate, long-term, non-GLP preclinical studies: sampling of tissues from GLP studies is a valuable means of conducting additional exploratory, investigational studies. Although the removal of tissue samples and the reason for removal (e.g., exploratory mechanistic study, tissue banking) should be specified in the protocol, the removal of specimens for investigational purposes from a study does not invalidate the GLP status of the main toxicology study, if otherwise acceptable (see ref. 4, section IV.D. for more details).

The ultimate goal: standards

Data generated during modern drug development are becoming increasingly complex, and large data sets, such as microarray data, need to be handled and processed in an efficient and coherent fashion. The FDA has started with the implementation of new data standards, such as the ones recommended by the Clinical Data Interchange Standards Consortium for new regulatory submissions. To date, these standards are available for data sets such as pharmacokinetics/pharmacodynamics; they are not yet available for genomic data submissions. Although we feel that it is too early for definitive recommendations on how, and in what format, to submit genomic information, it is advisable to work toward such standards, even at this early stage. Lessons learned from the VGDS program have already been helpful in explaining and recommending aspects of submitting genomic data, and the agency feels that efforts such as the MAQC will further help it create 'best practices' and recommendations detailing the preferred formats and extent of genomic data submissions that should accompany regulatory filings with the FDA.

From the analysis of approximately ten voluntary data submissions that contained microarray data, the agency has found that the results are heavily dependent on the quality of the starting material used for a microarray experiment, the data analysis protocol and the biological pathway analysis tools available to interpret lists of statistically significant genes. Sample storage and preparation are critical for the reproducibility of these data. Poor sample quality can prevent data interpretation from being conclusive.

A second critical factor is the data analysis protocol. Different sets of gene expression signatures with different biological contexts can be generated from the same raw data by different data analysis protocols. Different biological contexts can also be generated from the same gene expression signature by different biological pathway analysis tools. The biological interpretation of the data is common currency for VGDS discussions and regulatory review between the sponsors and the FDA. The uniqueness of a list of genes in a signature is not in and of itself the goal of exploratory biomarker investigations submitted as part of a VGDS. It could, however, be important in the selection of signatures for validation studies.
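A small simulation (hypothetical numbers, not VGDS data) illustrates how two defensible protocols diverge on identical raw data: ranking genes by fold change versus ranking by a Welch t-statistic produces overlapping but non-identical 50-gene signatures.

```python
import numpy as np

rng = np.random.default_rng(3)

# Mock log2 expression: 1000 genes, 5 control and 5 treated arrays,
# with the first 50 genes genuinely induced by varying amounts.
ctrl = rng.normal(8.0, 1.0, size=(1000, 5))
trt = rng.normal(8.0, 1.0, size=(1000, 5))
trt[:50] += rng.uniform(0.5, 2.0, size=(50, 1))

diff = trt.mean(axis=1) - ctrl.mean(axis=1)          # mean log2 fold change
se = np.sqrt(trt.var(axis=1, ddof=1) / 5 + ctrl.var(axis=1, ddof=1) / 5)
tstat = diff / se                                    # Welch t-statistic

sig_fc = set(np.argsort(-np.abs(diff))[:50])     # protocol A: top 50 by |FC|
sig_t = set(np.argsort(-np.abs(tstat))[:50])     # protocol B: top 50 by |t|

print(f"signature overlap: {len(sig_fc & sig_t)} of 50 genes")
```

Each downstream pathway tool then receives a different gene list, compounding the divergence in biological interpretation.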

Consequently, for the FDA to interpret microarray data that are submitted for regulatory purposes, it is critical for sponsors of genomic data submissions to include a precise description of the steps involved before the actual array experiment, including the method of sample collection, storage, RNA extraction and labeling, as well as the data analysis protocol and biological pathway interpretation tools applied to these data.

Much has been published about the concordance, or lack thereof, of data generated on different gene expression analytical platforms6,7. Although the MAQC addresses this issue with the goals of establishing quality parameters for microarray experimentation and of identifying sources of variability, the agreement (overlap) of different platforms in real-life settings might actually be less important, especially in situations where a gene signature is to be identified that might be narrowed down to a handful of key genes. In these cases, it is likely that the assay itself will be moved onto a different platform (e.g., from a high-density microarray platform onto low-density arrays or quantitative PCR) that would require new and independent validation. For this scenario, it is first important to identify the particular subset of genes that is predictive of a given state (e.g., disease, treatment effect), which in fact may or may not resemble the actual full set of genes altered in expression in that state. One particular (e.g., high-density) microarray platform can be used to screen for and identify these 'predictive' genes without the need to obtain a full 'representative' picture of the transcriptome, with the intention of producing a signature set of genes that can be used in downstream applications, such as clinical trials and clinical practice.

Consequently, it is sufficient and reasonable to expect that only a subset of the transcriptome will be analyzed for this purpose. This approach is not unlike the use of haplotypes and 'tag-SNPs' when genotyping (rather than gene expression profiling) is performed to characterize a particular disease or disease state. Additional sources of discordance among platforms include, for example, the fact that different locations in the same gene are often used as probes on different platforms, which may either result in different reported intensities (that are therefore interpreted as different fold changes) or result in the analysis of particular splice variants of a gene.
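The splice-variant effect can be seen with simple arithmetic. In this invented two-isoform example, a probe in a constitutive exon reports the combined abundance of both isoforms, while a probe in an exon unique to the uninduced isoform reports no change at all:

```python
import math

# Hypothetical gene with two isoforms; only isoform 1 is induced by treatment.
control = {"iso1": 100.0, "iso2": 100.0}   # arbitrary expression units
treated = {"iso1": 300.0, "iso2": 100.0}

# Platform A's probe sits in a constitutive exon and sees both isoforms.
fc_a = math.log2((treated["iso1"] + treated["iso2"]) /
                 (control["iso1"] + control["iso2"]))
# Platform B's probe sits in an exon present only in isoform 2.
fc_b = math.log2(treated["iso2"] / control["iso2"])

print(f"platform A probe log2 fold change: {fc_a:.2f}")  # 2-fold: 1.00
print(f"platform B probe log2 fold change: {fc_b:.2f}")  # no change: 0.00
```

Both platforms are measuring the same gene correctly; they simply interrogate different transcripts of it, so their fold changes legitimately disagree.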

Conclusions

Despite the limitations outlined above, we believe that microarray platforms are suitable tools to produce high-quality and reliable data that will prove useful in drug development and regulatory decision making. Understanding the limitations and assessing the variability is imperative, however. As those in regulatory agencies are exposed to, and expected to adequately analyze, microarray data, we need a better understanding of the technology and agreement on standards and formats for the submission of data and the interpretation of the results.

The MAQC provides an excellent and unprecedented resource for determining 'best microarray practices,' including the use of reference material, data assembly and formats. This and other efforts, such as the VGDS program at the FDA, are instrumental to the efficient and effective use of microarray data in the regulatory review process. Given the increasing number of genomic data submissions to the agency, these initiatives are happening at just the right time.

Disclaimer: The views expressed in this article are those of the author and not necessarily those of the US Food and Drug Administration.

1. International Human Genome Sequencing Consortium. Nature 409, 860–921 (2001).
2. Venter, J.C. et al. Science 291, 1304–1351 (2001).
3. Lesko, L.J. & Woodcock, J. Pharmacogenomics J. 2, 20–24 (2002).
4. http://www.fda.gov/cder/guidance/6400fnl.pdf
5. http://www.cfsan.fda.gov/~dms/opa-pt58.html
6. Shi, L. et al. Expert Rev. Mol. Diagn. 4, 761–777 (2004).
7. Shi, L. et al. BMC Bioinformatics 6, Suppl. 2, S12 (2005).



A framework for the use of genomics data at the EPA

David J Dix, Kathryn Gallagher, William H Benson, Brenda L Groskinsky, J Thomas McClintock, Kerry L Dearfield & William H Farland

The US Environmental Protection Agency is developing a new guidance that outlines best practice in the submission, quality assurance, analysis and management of genomics data for environmental applications.

Four years ago, the US Environmental Protection Agency's (EPA) paper Potential Implications of Genomics for Regulatory and Risk Assessment Applications at EPA1 identified four areas of oversight likely to be influenced by genomics data. These were the prioritization of contaminants and contaminated sites, environmental monitoring, reporting provisions and risk assessment. The paper also identified a critical need for analysis and acceptance criteria for genomics information in scientific and regulatory applications. As a response to these challenges, the Genomics Technical Framework and Training Workgroup was formed and is currently developing an Interim Guidance for Microarray-Based Assays: Regulatory and Risk Assessment Applications at EPA. This guidance will address genomics data submission, quality assurance, analysis and management in the context of current possible applications by the EPA and the broader academic and industrial community. The guidance will also identify future actions that are needed to incorporate genomics information more fully into the EPA's risk assessments and regulatory decision making.

The growing impact of genomics

Toxicogenomics is the examination of changes in gene expression, protein and metabolite profiles within cells and tissues, complementary to more traditional toxicological methods. Genomics tools provide detailed molecular data about the underlying biochemical mechanisms of disease or toxicity (that is, disease etiology and biochemical pathways) and could represent sensitive measures of exposure, new approaches for detecting effects of such exposures or methods for predicting genetic susceptibilities to particular stressors in the environment. Thus, genomics, proteomics and metabonomics/metabolomics can provide useful weight-of-evidence data along the source-to-outcome continuum, when appropriate bioinformatic and computational methods are available for integrating molecular, chemical and toxicological information (Fig. 1).

Identification of changes in gene expression using DNA microarrays (a collection of microscopic DNA spots attached to a solid surface) is becoming an important genomics tool for understanding toxicological processes, and informing hazard identification and mode-of-action analysis. In fact, microarray data have already been encountered in agency program offices. For example, a pesticide registrant cited a published genomics article2 as part of the mode-of-action data package submitted for a product registration to the EPA’s Office of Pesticide Programs. It is not unreasonable to expect similar submissions to be made by other pesticide registrants, or other stakeholders, in support of mode-of-action analyses. Thus, there is an important need for the agency to be proactive and develop processes and policies to address how genomics data will be used in agency decision making. The EPA anticipates the development of increasing volumes of microarray data by environmental researchers, and as a part of the regulatory process. To ensure optimal use of these data, the EPA is developing science policy and guidance to address the submission, analysis and storage of microarray data (Table 1). The first step in this process was the Interim Policy on Genomics3.

The Interim Policy on Genomics

With the advent and growth of genomics data, a major consideration for the EPA was what to do with information currently being generated by genomics technologies and available to the agency. Although it was clear that genomics data are already available, much of these data have not been correlated with frank adverse effects, such as cancer or reproductive impairment.

Therefore, in June 2002, the EPA issued an interim policy position—the Interim Policy on Genomics—which provides guidance concerning how and when genomics information should be used to assess the risks of environmental contaminants under the various regulatory programs implemented by the agency3. This policy encourages and supports continued genomics research for understanding the molecular basis of toxicity and for developing indicators of exposure, effects and susceptibility; however, the interim policy clearly states that genomics data alone are currently insufficient as a basis for risk assessment and management decisions. The interim policy does state that genomics data may be useful in a weight-of-evidence approach for human health and ecological health risk assessments and can be used in concert with all the other information the EPA considers for a particular assessment or decision.

David J. Dix is in the Office of Research and Development, US Environmental Protection Agency, National Center for Computational Toxicology (D343-03), Research Triangle Park, North Carolina 27711, USA; Kerry Dearfield is at the US Department of Agriculture, Food Safety and Inspection Service and, together with Kathryn Gallagher and William H. Farland, is in the Office of the Science Advisor (8105R), US Environmental Protection Agency, Ariel Rios Building, 1200 Pennsylvania Avenue, NW, Washington, DC 20460, USA; William H. Benson is in the Office of Research and Development, US Environmental Protection Agency, National Health and Environmental Effects Research Laboratory, Gulf Ecology Division, Sabine Island Drive, Gulf Breeze, Florida 32561, USA; Brenda L. Groskinsky is at the US Environmental Protection Agency, Region 7, 901 North 5th Street, Kansas City, Kansas 66101, USA; and J. Thomas McClintock is in the Office of Prevention, Pesticides and Toxic Substances, US Environmental Protection Agency, MC 7101M, 1200 Pennsylvania Avenue, NW, Washington, DC 20460, USA. e-mail: [email protected].

Implications for regulatory oversight

After the release of the Interim Policy on Genomics, the EPA’s Science Policy Council created an intra-agency Genomics Task Force and charged it with examining the broader implications genomics is likely to have on EPA programs and policies. In 2004, this Genomics Task Force developed a genomics white paper1 that identified four areas likely to be influenced both by genomics information within the EPA and by the submission of such information to the EPA: (i) prioritization of contaminants and contaminated sites, (ii) monitoring, (iii) reporting provisions and (iv) risk assessment.

The first of these four regulatory applications relates to the prioritizations done by many agency programs to help focus resources on greater hazards or risks. Genomics data can be used as part of the body of information considered in the EPA’s prioritization efforts, including testing to more fully investigate a hazard and making predictions based on this testing. Examples include the EPA’s voluntary high production volume (HPV) program, in which chemicals manufactured in large amounts are identified and their hazards characterized according to chemical category. Here, genomics may be part of a suite of tools to help confirm category groupings of HPV chemicals and identify which chemicals (or groups of chemicals) may present greater hazard or risk.

The second regulatory area relates to monitoring activities at the EPA, generally for compliance and assessment purposes. Monitoring activities include the following: (i) chemical and physical analyses of air, water, soil and sediment; (ii) toxicity testing of various environmental media or chemicals; (iii) analysis of plant, animal and human tissue residues for various chemicals or their breakdown products; (iv) ecological community structure analyses; and (v) microbial community and pathogenic microorganism analyses of air, water, soil and sediment. These activities may be one of the first nearer-term applications of genomics data by the EPA. For example, genomics methods are being applied to microbial source tracking to help identify nonpoint sources responsible for the fecal pollution of water systems4.

As a third regulatory application of genomics, the white paper designated reporting provisions under EPA statutes—for example, Toxic Substances Control Act (TSCA) section 8(e) and Federal Insecticide, Fungicide, and Rodenticide Act (FIFRA) section 6(a)(2). Clearly, to have an effect on reporting provisions, the linkage of genomic changes to adverse effects or response pathways must be established and addressed. As the predictability and validity of genomics methods increase, the EPA will reevaluate this stance on reporting provisions.

The fourth and final area likely to be influenced by genomics information is risk assessment. Genomics data, along with conventional toxicological data, may help identify which molecular events are crucial to the biological processes that represent a mode of action of a chemical or stressor. Comparative genomics might aid in the interpretation of the human relevance of animal toxicity findings and assist in assessing impacts on susceptible populations and life stages. For example, genomics data may facilitate elucidation of the possible key events in the modes of carcinogenic action, such as mutagenicity, mitogenesis, inhibition of cell death, cytotoxicity with regenerative cell proliferation and immune suppression. A mode of action comprising the same set of key events may apply to many different compounds. Thus, mRNA transcript, protein or metabolite profiles may be developed that can advance the screening of individual chemicals and allow faster, more accurate categorization into defined classes according to their mode of action. Better understanding of toxicity pathways may also provide insights into chemical interactions, and possibly improve mixtures and cumulative risk assessments.

The genomics white paper1 not only identifies these four regulatory and risk assessment applications of genomics data, but also highlights some challenges and needs for the EPA5. These include research needs to link genomic responses to adverse effects and support proper interpretation of genomics data, development of acceptance criteria for genomics data submissions to the EPA, management and storage of the large amount of genomics information that the EPA is projected to handle, and training of EPA risk assessors and managers so that they can interpret and understand genomics data.

To address some of these needs, the EPA has formed the Genomics Technical Framework and Training Workgroup. This ‘workgroup’ has facilitated coordination efforts across the agency as well as with other federal agencies (e.g., the US Food and Drug Administration (FDA) and the US Department of Agriculture) and continues science policy development efforts for the use of genomics data in regulatory and risk assessment applications (Table 1).

An interim guidance for microarray-based assays

The Genomics Workgroup considered all of the ‘omics’ technologies and applications and decided that an interim guidance on the use of data generated by DNA microarray technology would be the most beneficial at this time to the agency and its academic and industrial communities. Thus, a document is currently under development that describes (i) the data that should be submitted to the EPA for microarray studies, (ii) a performance approach to quality assessment parameters, (iii) microarray data analysis approaches and (iv) data management and storage issues for microarray data submitted to, or used by, the EPA.

Figure 1 Genomics, proteomics and metabonomics/metabolomics can provide useful weight-of-evidence data along the source-to-outcome continuum when appropriate bioinformatic and computational methods are applied toward integrating molecular, chemical and toxicological information. The source-to-outcome continuum captures the entire paradigm from the source of environmental contaminants and stressors, through to exposure, effects and ultimate outcomes on human health and ecological populations. [Diagram: source/stressor formation, environmental concentration, exposure, dose, biological event and effect/outcome, with genomics/proteomics/metabonomics linked to the continuum through computational methods/bioinformatics.]

The purpose of the Interim Guidance for Microarray-Based Assays will be to provide information to the community and other interested parties regarding the submission of DNA microarray data to the EPA, and to provide guidance for reviewers in evaluating and using such data or information. The interim guidance is intended to be used by EPA Program and Regional Offices to determine the applicability of specific genomics information to the evaluation of specific hazards or risks. It is important to note that microarray technology is rapidly changing, such that methodologies for generating genomics data and ensuring their quality will also likely change. Even so, the need to ensure consistency and quality in generating, analyzing and using the data will remain. Thus, as the science develops, the EPA expects to revisit and revise the interim guidance.

With respect to quality assurance, the interim guidance will not prescribe specific methods to be used in microarray experiments beyond compliance with MIAME (minimum information about a microarray experiment) standards. Indeed, a slightly modified version of the MIAME standards6 is proposed in the interim guidance as a data submission template for the EPA, recognizing that this submission template will be subject to change as the technology evolves. In addition, the interim guidance will provide information regarding submission of microarray data to the EPA to facilitate appropriate review and consistent evaluation.
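A MIAME-style submission template is, at bottom, a checklist of required sections. The sketch below shows one way such a checklist could be validated programmatically; the section names are hypothetical placeholders loosely following the MIAME headings, not the EPA's actual template fields, which remain under development.

```python
# Hypothetical MIAME-style required sections; the real template would come
# from the MIAME specification and the EPA's interim guidance.
REQUIRED_FIELDS = [
    "experiment_design",   # aim, experimental factors, quality controls
    "array_design",        # platform, probe annotation source
    "samples",             # organism, tissue, treatment, extraction protocol
    "hybridizations",      # labeling and hybridization conditions
    "measurements",        # raw intensities and processed signals
    "normalization",       # method and parameters used
]

def missing_fields(submission: dict) -> list[str]:
    """Return the checklist sections absent (or empty) in a submission record."""
    return [f for f in REQUIRED_FIELDS if not submission.get(f)]

record = {"experiment_design": "dose-response study", "array_design": "two-color cDNA"}
print(missing_fields(record))  # → ['samples', 'hybridizations', 'measurements', 'normalization']
```

A reviewer-side tool of this shape would let incomplete submissions be flagged automatically before any scientific evaluation begins.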

The interim guidance will likely assert that a systematic approach to data analysis is necessary for the use of genomics data in risk assessments. As an interim solution, it will put forward a genomics ‘data evaluation record’ template as a tool for systematic extraction and organization of data from genomics studies. The transfer of these evaluations, and the underlying genomics data, into searchable, electronic databases will be essential to make these complex data truly useful in risk assessments. Furthermore, development of EPA databases containing gene expression profiles for a wide variety of compounds will facilitate creation of the statistical and computational methods for predicting the toxic potential of a chemical.

Additional initiatives

In concert with other federal agencies, the EPA has also begun to investigate and evaluate the currently available computational tools for genomics data analysis. The agency has been testing the functionality of toxicogenomics data management and analysis solutions and how these solutions can be applied toward the EPA’s efforts. For example, the FDA’s National Center for Toxicological Research’s ArrayTrack database7 is being tested.

The EPA has also been collaborating with the FDA, the US National Institutes of Health, the National Institute of Standards and Technology and other stakeholders on the MicroArray Quality Control (MAQC) project to establish protocols for genomics data analysis. The first results of this initiative are presented in this special issue of Nature Biotechnology.

Furthermore, the agency has participated in US National Academy of Sciences (Washington, DC, USA) workshops and International Life Sciences Institute (Washington, DC, USA) projects on the application of genomics to toxicology and risk assessment. Building on these previous efforts, the interim guidance suggests continued exploration of genomics tools appropriate to the application of genomics data in risk assessments and regulatory decision making. Ultimately, the EPA will need quantitative and predictive modeling tools, which will likely require the development of new algorithms and models.

These tools will need to provide reliable and repeatable genomics data analysis and the consistent and necessary information for EPA risk assessments and decision-making processes. The scientific, mathematical and statistical methods that are used for these models and analyses will need to be validated and standardized. Because of the large volumes of genomics and associated toxicological data projected, it is essential that the EPA consider the development of a complete data management solution. A preliminary outline of the functional requirements of such a solution is provided in the interim guidance. In addition, this data management solution would need to address requirements unique to scientifically based risk assessments, confidential and proprietary data security, public access and other aspects of regulatory application.

The Genomics Workgroup has noted that consistency, scientific and operational robustness, common but controlled access, and availability in a scalable environment are also part of these data management requirements. Although the EPA has begun to use and develop bioinformatics research approaches, both intramurally (e.g., in the National Center for Computational Toxicology; http://www.epa.gov/comptox) and extramurally (e.g., STAR-funded Environmental Bioinformatics Centers, including the Research Center for Environmental Bioinformatics and Computational Toxicology at the University of Medicine & Dentistry of New Jersey, Piscataway, and the Carolina Environmental Bioinformatics Research Center at the University of North Carolina, Chapel Hill; http://www.epa.gov/comptox/award_biocenters.html), an agency-wide data management solution integrating genomics, toxicological and other key data required for regulatory applications is not yet realized.

The interim guidance will conclude with the Genomics Workgroup’s recommendations for follow-up activities. These will include the following: first, further development of genomics training materials and modules, to be offered throughout the agency to risk assessors and decision makers, who will be faced with the challenge of interpreting and applying genomics information in regulatory and prioritization processes; second, continued collaboration of EPA personnel with staff from other federal agencies and stakeholders in the development of tools for the analysis of genomics data; third, application of this guidance to a series of case studies to evaluate its utility in risk assessment and regulatory applications; and fourth, the updating of this guidance as needed and as the technology evolves.

Table 1 US EPA development of science policy for the use of genomics data in regulatory and risk assessment applications

2002. Interim Policy on Genomics. Purpose: defined EPA’s initial approach to using genomics information in risk assessment and decision making. http://www.epa.gov/osa/spc/genomics.htm

2004. Potential Implications of Genomics for Regulatory and Risk Assessment Applications at EPA. Purpose: identified the impact genomics is likely to have on (i) prioritization of contaminants and contaminated sites, (ii) monitoring, (iii) reporting provisions and (iv) risk assessment. http://www.epa.gov/osa/genomics.htm

External review pending. Interim Guidance for Microarray-Based Assays: Regulatory and Risk Assessment Applications at EPA. Purpose: describes (i) microarray data submission to the agency, (ii) quality assessment parameters, (iii) data management, analysis and evaluation and (iv) training needs for risk assessors and decision makers. http://www.epa.gov/osa/index.htm

Conclusions

The advent of genomics and the burgeoning amount of genomics-related data present opportunities and challenges to the EPA in fulfilling its regulatory and risk-assessment responsibilities. To meet these opportunities and challenges, the EPA has initiated a series of activities to properly address the use of genomics information and has reorganized its research activities into a coordinated Computational Toxicology Program. To clarify how genomics information will be considered going forward at the EPA, the agency has provided guidance in its Interim Policy on Genomics3. The EPA has also developed a genomics white paper1 outlining implications and applications of genomics at the EPA.

The final part of the EPA’s response has been to generate a technical framework that focuses on a specific genomics technology, DNA microarrays. This framework will be outlined in the forthcoming Interim Guidance for Microarray-Based Assays. It represents a first step for the EPA in developing formats, methodologies and consistent approaches for dealing with genomics information submitted to the agency.

ACKNOWLEDGMENTS
This perspective is based on the efforts of the dedicated EPA staff within the Office of the Science Advisor, the Science Policy Council, the Agency-wide Genomics Task Force and the subsequent Genomics Technical Framework Workgroups.

Disclaimer: This work was reviewed by EPA and approved for publication but does not necessarily reflect official agency policy.

1. US Environmental Protection Agency. Potential Implications of Genomics for Regulatory and Risk Assessment Applications at EPA. Science Policy Council. EPA Publication No. EPA 100/B-04/002 (EPA, Washington, DC, 2004). http://www.epa.gov/osa/genomics.htm
2. Genter, M.B., Burman, D.M., Vijayakumar, S., Ebert, C.L. & Aronow, B.J. Physiol. Genomics 12, 35–45 (2002).
3. US Environmental Protection Agency, Science Policy Council. Interim Policy on Genomics (EPA, Washington, DC, 2002). http://www.epa.gov/osa/spc/genomics.htm
4. US Environmental Protection Agency. Microbial Source Tracking Guide Document (EPA, Washington, DC, June 2005). http://www.epa.gov/ORD/NRMRL/pubs/600r05064/600r05064.htm
5. Dearfield, K.L., Benson, W.H., Gallagher, K. & Johnson, J. in Genetics and Environmental Policy: Ethical, Legal and Regulatory Perspectives (ed. Marchant, G.), in press (Johns Hopkins University Press, Baltimore, 2007).
6. http://www.mged.org/Workgroups/MIAME/miame.html
7. http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack/


Data quality in genomics and microarrays

Hanlee Ji & Ronald W Davis

Objective quality control indices are needed to facilitate clinical implementation of DNA microarrays used in transcriptional profiling as well as other types of genomic analysis.

DNA microarrays are increasingly used for investigating gene expression in human diseases with the hope of identifying signatures that correlate with specific clinical outcomes. The discovery of these signatures offers the tantalizing possibility that they could be translated into fully fledged clinical diagnostic tests. Significant hurdles exist, however, in transitioning microarray technology and gene expression analysis into the complicated realm of the clinic. Namely, the quality of genomic gene expression data, a measure of its general reproducibility and, ultimately, its true biological relevance, requires significant improvement1. For example, comparing gene expression studies using different microarray formats is fraught with difficulty, even under circumstances in which the same type of tissue is analyzed2. A recent prominent example illustrates a case where different clinical conclusions were derived from the same gene expression data set3.

Currently, few if any objective metrics or established quality control standards are used to evaluate the quality of microarray studies. Often, the assessment of microarray data quality requires running replicates and making intra-sample comparisons to determine reproducibility. Using replicate arrays is an expensive strategy and cannot be routinely applied where quantities of precious biological samples, such as tumor biopsies, are limited. The majority of clinically related studies do not have replicates, leaving genomic data purveyors little in the way of guidance to determine the overall quality of submitted microarray data. Two major efforts currently under way, however, offer an opportunity to improve genomic data quality for gene expression.
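The replicate comparisons described above typically reduce to simple agreement statistics. As a minimal sketch (with invented log2 signal values, not MAQC data), the Pearson correlation between two replicate hybridizations of the same sample can serve as a crude reproducibility index:

```python
import math

def pearson(x, y):
    """Pearson correlation between two replicate signal vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy log2 signals for the same sample hybridized to two arrays.
rep1 = [8.1, 9.7, 5.2, 11.3, 6.8]
rep2 = [8.0, 9.9, 5.5, 11.1, 6.6]

print(f"replicate correlation r = {pearson(rep1, rep2):.3f}")
```

In practice a laboratory would set an acceptance threshold on such a statistic (and on probe-level coefficients of variation) rather than inspecting replicates by eye; the threshold itself would have to come from community standards of the kind discussed here.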

Looking at gene expression data quality

Several studies have addressed the issues of genomic data quality in the realm of gene expression analysis through comparison of different formats of microarrays4–8. To date, the MicroArray Quality Control (MAQC) project—the first results of which are presented in this issue—and the External RNA Controls Consortium (ERCC) are the most comprehensive efforts in assessing and comparing gene expression data derived from common samples among different microarray platforms9,10. Both projects are focused on the analysis of highly calibrated reference RNA pools with the potential for wide distribution to the research community. Analysis of the MAQC and ERCC RNA sets has resulted in extensive gene expression data sets with validation across many microarray platforms and systems (e.g., by quantitative reverse transcription (qRT)-PCR)9,10. The public release of these results should spawn new applications to evaluate gene expression data quality.

A vital part of the MAQC project has been the identification of common transcripts that are mutually represented among the various microarray platforms included in the analysis9. This aspect will enormously facilitate cross-platform comparisons of gene expression and open the door to robust meta-analyses of clinical gene expression studies.
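In practice, identifying mutually represented transcripts amounts to intersecting each platform's probe annotation under a common gene identifier. A toy sketch (the platform names and gene sets are invented; real mappings come from each vendor's annotation files keyed to a shared identifier such as RefSeq):

```python
# Hypothetical gene content of three microarray platforms.
platform_genes = {
    "platform_A": {"TP53", "EGFR", "MYC", "BRCA1", "GAPDH"},
    "platform_B": {"TP53", "EGFR", "MYC", "ACTB", "GAPDH"},
    "platform_C": {"TP53", "EGFR", "BRCA1", "ACTB", "GAPDH"},
}

# Genes measurable on every platform: the basis for cross-platform comparison.
common = set.intersection(*platform_genes.values())
print(sorted(common))  # → ['EGFR', 'GAPDH', 'TP53']
```

Only signals for this common set can be meaningfully compared or pooled across platforms; genes outside it must be excluded from any meta-analysis.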

The completion of these projects also provides an opportunity to advocate for the adoption of genomic data quality control processes in clinically oriented studies. It is critical that there be wide acceptance of some type of quality control standards at the planning stages of clinically oriented projects. Accurate and routinely reproducible data will improve the clinical validity of molecular signatures and speed their transition into the clinical setting. For translational research, adoption of quality control will be faster if quality standards are easily accessible to research groups of any size.

We anticipate that establishing quality control standards for genomic data will substantially reduce genomic analysis costs by eliminating the need for replicate experiments and improve the design and implementation of large translational studies involving hundreds if not thousands of patient samples. Another benefit is that genomic data quality standards will facilitate future technology development. When established standards exist, it is much easier to conduct proof-of-principle studies using new systems.

Moving beyond RNA

There is a general recognition that quality control standards for transcriptional profiling experiments are an absolute necessity given the complexities of working with RNA, the wide variety of methodologies and the different microarray platforms. We suggest that it is equally important to establish such standards for all other microarray formats, including array comparative genomic hybridization (CGH) and genotyping. When microarrays are used, the analysis of DNA has major advantages over RNA in terms of its physical and biochemical properties. Even so, many of the inherent issues of microarray performance and reliability in analyzing RNA are just as relevant.

One could imagine a future consortium, similar to the MAQC and ERCC, developing a universal set of standardized DNA references with known genotypes and gene-copy alterations for use in high-throughput genotyping, sequencing and gene-copy microarray technologies. Many highly characterized DNA samples already exist, the larger hurdle being one of establishing a consensus about the samples to be included. A set of DNA references would enable quality control assessment, facilitate data set comparisons among different microarray platforms and provide a valuable resource for validating new genomic technologies.

Hanlee Ji is in the Department of Medicine, Division of Oncology and Ronald W. Davis is in the Department of Biochemistry and Department of Genetics at Stanford University School of Medicine, 269 Campus Drive, CCSR 1115, Stanford, California 94305-5151, USA. e-mail: [email protected]

Controls to assess genomic data quality

Numerous studies have identified sources of inter- and intralaboratory error and variability in microarray experimental results6,7,11. They include variation in tissue processing, RNA extraction, inherent biological differences in normal tissue and microarray assay protocols11. The MAQC’s and ERCC’s RNA pools of highly characterized transcripts could be incorporated into the microarray workflow process. For example, an individual site could analyze the RNA pools characterized by the MAQC project to make performance comparisons. Leveraging the MAQC data sets will prompt the development of methods to increase the confidence that differential expression of specific genes will be reproducible. For example, we and others (H.J. and R.W.D., unpublished data; Lin, G., He, G., Shi, L. & Zhong, S., personal communication) are currently developing algorithmic methods that use the MAQC data set to account for interlaboratory variation in the discovery of differentially expressed genes.

Another strategy for monitoring genomic data quality would rely on highly characterized external controls or RNA pools at every step of a microarray experimental protocol12. We and others have suggested the incorporation of a universal set of nucleic acid controls tailored to measure performance for individual steps of microarray analysis. Several RNA transcripts or pools, PCR products, oligonucleotides or other external ‘spiked’ controls would be added at every individual step of the microarray analysis process. For example, one could imagine multiple ‘spiked’ external control sequences that would directly measure the quality and subsequent level of degradation of nucleic acid extracted from processed tissues; other controls would assess an assay’s enzyme quality and some would be specific for the hybridization process. These external controls would be assessed via microarray hybridization. To facilitate the development of external controls, one could design synthetic sequences as probes to avoid problems of cross-hybridization and reduce the interfering aspects of nucleic acid secondary structure. Incorporating synthetic sequence probes and targets would be quite similar to the development of oligonucleotide barcodes in microarrays, which has proven to be quite robust13. Another application that would improve genomic data quality is the inclusion of universal external controls at different concentrations for normalization in individual microarray experiments. As the final step of a quality control assessment process, a formal report of quality control performance would be incorporated in the resulting data file output.
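One way such concentration-graded external controls could be exploited is a dose-response check: regress observed log signal on known log spike amount and verify that the slope is close to 1 (a linear response). A minimal sketch with invented numbers; the helper and data are illustrative, not an established protocol:

```python
def ols_slope_intercept(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical spike-in amounts (log2 units) and observed log2 signals.
log_conc = [0.0, 1.0, 2.0, 3.0, 4.0]
log_signal = [5.1, 6.0, 7.1, 7.9, 9.0]

slope, intercept = ols_slope_intercept(log_conc, log_signal)
print(f"dose-response slope = {slope:.2f}")  # a slope near 1 indicates a linear response
```

A quality control report of the kind proposed above could simply record this slope (and the fit residuals) for each batch of spiked controls, flagging arrays whose response deviates from linearity.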

Building in a quality control assessment and an incorporated report of quality metrics would be enormously useful in a variety of settings (Fig. 1). We offer some hypothetical examples: a genomic data quality report would assist the individual researcher in measuring the performance of an experiment ‘on the fly’; it would provide journal editors with external criteria for judging the quality of submitted data sets; and embedded quality metrics in genomic data reports would substantially quicken the complicated task of regulatory agency analysis and review. A universal set of quality control reagents for genomic data quality assessment also has the potential to decrease costs. Individual researchers could assess their genomic data quality at the very beginning of a project and avoid costly mistakes. Like its MAQC and ERCC predecessors, any future effort would require general agreement and coordination among the research community, government agencies, microarray manufacturers and producers of biological reagents. We believe this is a realistic goal.

Conclusions
The completion of the MAQC and ERCC collaborative projects sets the foundation for future consortia working towards universal genomic data quality control standards. These projects herald a movement in the genomics community to improve the reliability of microarray technologies in both basic and clinical research. Perhaps our greatest aspiration is that through these efforts we will improve genomic data quality sufficiently to spur rapid development of the next generation of genomic diagnostics and thus have a positive impact on the provision of human healthcare.

1. Steinmetz, L.M. & Davis, R.W. Nat. Rev. Genet. 5, 190–201 (2004).
2. Tan, P.K. et al. Nucleic Acids Res. 31, 5676–5684 (2003).
3. Tibshirani, R. N. Engl. J. Med. 352, 1496–1497 (2005).
4. Jarvinen, A.K. et al. Genomics 83, 1164–1168 (2004).
5. Bammler, T. et al. Nat. Methods 2, 351–356 (2005).
6. Irizarry, R.A. et al. Nat. Methods 2, 345–350 (2005).
7. Larkin, J.E., Frank, B.C., Gavras, H., Sultana, R. & Quackenbush, J. Nat. Methods 2, 337–344 (2005).
8. Shi, L. et al. BMC Bioinformatics 6 Suppl 2, S12 (2005).
9. MAQC Consortium. Nat. Biotechnol. 24, 1151–1161 (2006).
10. Baker, S.C. et al. Nat. Methods 2, 731–734 (2005).
11. Cobb, J.P. et al. Proc. Natl. Acad. Sci. USA 102, 4801–4806 (2005).
12. van Bakel, H. & Holstege, F.C. EMBO Rep. 5, 964–969 (2004).
13. Eason, R.G. et al. Proc. Natl. Acad. Sci. USA 101, 11046–11051 (2004).

[Figure 1 schematic: clinical sample preparation → array processing → interrogation. Sources of variation at each step: RNA extraction (protocol, sample amount, sample quality, degradation, contamination); enzymatic preparation (temperature, time of incubation, enzyme quality); quality control checks (temperature, enzyme quality, length of incubation, quality of product); hybridization and staining (temperature, time of incubation, properties of staining, array quality); washing (buffer salt concentration, temperature, array quality); scanning (photomultiplier tube (PMT) settings, saturation of signal, array quality).]

Figure 1 DNA microarray analysis of human tissues involves multiple steps and protocols. As a result, these assays are susceptible to variance throughout the process. Improved methods are needed for monitoring experimental variation during this workflow. (Pictures used with permission of Affymetrix.)

COMMENTARY © 2006 Nature Publishing Group http://www.nature.com/naturebiotechnology

NATURE BIOTECHNOLOGY VOLUME 24 NUMBER 9 SEPTEMBER 2006 1115

Evaluation of DNA microarray results with quantitative gene expression platforms

Roger D Canales1,10, Yuling Luo2,10, James C Willey3,10, Bradley Austermiller3, Catalin C Barbacioru1, Cecilie Boysen4, Kathryn Hunkapiller1, Roderick V Jensen5, Charles R Knight6, Kathleen Y Lee1, Yunqing Ma2, Botoul Maqsodi2, Adam Papallo5, Elizabeth Herness Peters6, Karen Poulter1, Patricia L Ruppel7, Raymond R Samaha1, Leming Shi8, Wen Yang2, Lu Zhang1 & Federico M Goodsaid9

We have evaluated the performance characteristics of three quantitative gene expression technologies and correlated their expression measurements with those of five commercial microarray platforms, based on the MicroArray Quality Control (MAQC) data set. The limit of detection, assay range, precision, accuracy and fold-change correlations were assessed for 997 TaqMan Gene Expression Assays, 205 Standardized RT (Sta)RT-PCR assays and 244 QuantiGene assays. We observed high correlation between quantitative gene expression values and microarray platform results and found few discordant measurements among all platforms. The main cause of variability was differences in probe sequence and thus target location. A second source of variability was the limited and variable sensitivity of the different microarray platforms for detecting weakly expressed genes, which affected interplatform and intersite reproducibility of differentially expressed genes. From this analysis, we conclude that the MAQC microarray data set has been validated by alternative quantitative gene expression platforms, thus supporting the use of microarray platforms for the quantitative characterization of gene expression. (TaqMan is a registered trademark of Roche Molecular Systems, Inc.)

To evaluate performance characteristics of gene expression measurement technologies and the data they generate, one must identify alternative quantitative platforms that can be used as references. The MAQC consortium used the TaqMan assays, Standardized (Sta)RT-PCR and

QuantiGene platforms for this purpose because these platforms had been shown to have high assay specificity and detection sensitivity, broad linear dynamic range and high signal-to-analyte response1–4. The platforms were used to evaluate some of these performance characteristics in each commercial whole genome microarray platform investigated in the MAQC study. In addition, we report the fold-change correlation of each alternative quantitative platform relative to these microarray platforms. We observed high correlations between the quantitative platform measurements and the data derived from the microarrays and were also able to identify the sources of variability among microarray platforms relative to the quantitative platforms.

Here we define validation as a measure of the concordance and discordance of the microarray data with the quantitative reference platforms selected: we used the results of the quantitative platforms as a reference against which to evaluate the microarray platforms. We have thus not attempted to establish a ‘gold standard’ for expression measurements but a solid reference point to allow data validation.

Quantitative, real-time PCR has been developed over the last decade to specifically measure template molecule numbers4,5. The development of fluorogenic probes6 enabled accurate quantification of PCR products through measurement of a fluorescence signal during the exponential amplification phase. TaqMan Gene Expression Assays are based on the use of the 5′ nuclease activity of Taq polymerase to hydrolyze a target-specific, dual-labeled, fluorogenic hybridization probe during the extension phase7. The number of template transcript molecules in a sample is determined by recording the amplification cycle in the exponential phase (cycle threshold, or CT) at which the fluorescence signal can be detected above background fluorescence. Thus, the starting number of template transcript molecules is inversely related to CT: the more template transcript molecules at the beginning, the lower the CT7,8. TaqMan assays have been used in recent studies to validate microarray data9–11.
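The inverse relationship between starting template and CT can be made concrete with a short sketch. Assuming ideal PCR efficiency (product doubles each cycle), starting quantity scales as 2^(−CT), so the relative quantity of two samples follows from their CT difference. The function name and example values below are illustrative, not taken from the study.

```python
def relative_quantity(ct_sample: float, ct_reference: float) -> float:
    """Relative starting template amount of a sample versus a reference,
    assuming ideal (twofold-per-cycle) PCR amplification efficiency."""
    # A lower CT means more starting template: each cycle doubles the
    # product, so starting quantity scales as 2 ** (-CT).
    return 2.0 ** (ct_reference - ct_sample)

# A sample crossing threshold 3 cycles earlier has ~8x more template.
print(relative_quantity(ct_sample=22.0, ct_reference=25.0))  # 8.0
```

In practice, efficiency is rarely exactly 2.0 per cycle, which is one reason calibrated standards or normalizer genes (such as POLR2A in Table 1) are used.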

StaRT-PCR4,12 is a competitive PCR-based platform that enables endpoint quantification of PCR products. After RNA is converted to cDNA, the cDNA is added to a standardized mixture of internal standard (SMIS) competitive templates, aliquoted into microplate wells containing gene-specific PCR primers and amplified for 35 cycles. The individual endpoint StaRT-PCR products are then separated by size and quantified by high-throughput microfluidic electrophoresis. StaRT-PCR has also been used in studies to validate microarray data1 and has been used to generate potential biomarkers for disease stratification13,14.

1Applied Biosystems, 850 Lincoln Centre Dr., Foster City, California 94404, USA. 2Panomics, Inc., 6519 Dumbarton Circle, Fremont, California 94555, USA. 3University of Toledo, Toledo, Ohio 43614, USA. 4ViaLogy Corp., 2400 Lincoln Avenue, Altadena, California 91001, USA. 5University of Massachusetts-Boston, 100 Morrissey Blvd., Boston, Massachusetts 02125, USA. 6Gene Express, Inc., 975 Research Drive, Toledo, Ohio 43614, USA. 7Innovative Analytics, 7107 Elm Valley Dr., Kalamazoo, Michigan 49009, USA. 8National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Rd., Jefferson, Arkansas 72079, USA. 9Center for Drug Evaluation and Research, US Food and Drug Administration, 10903 New Hampshire Ave., Silver Spring, Maryland 20993, USA. 10These authors contributed equally to this work. Correspondence should be addressed to F.M.G. ([email protected]).

Published online 8 September 2006; doi:10.1038/nbt1236

ANALYSIS


The QuantiGene Reagent System15 detects DNA and RNA directly without a reverse transcription step. It is a sandwich nucleic acid hybridization platform in which targets are captured through cooperative hybridization of multiple probes16. This complex is detected through signal amplification by a branched DNA amplifier and chemiluminescence signal generation. The QuantiGene assay has been used in US Food and Drug Administration–approved clinical diagnostic products for quantitative viral load determination of HIV, hepatitis C virus and hepatitis B virus, with detection sensitivity of <50 transcript molecules17–19. Because the QuantiGene assay can measure gene expression either by measuring RNA directly without a reverse transcription step, or by measuring cDNA without PCR amplification, it provides an independent method of measurement relative to the quantitative reverse transcription (RT)-PCR and microarray platforms.

Application of these quantitative platforms in the MAQC project increased confidence in the concordance observed between the microarray platforms. In addition, the results obtained from using these platforms allowed us to explore the sources of variability among microarray platforms. With this comprehensive evaluation, we demonstrate the value of alternative quantitative platforms as tools for the independent validation of microarray data and the resolution of discordant results.

RESULTS
Assay performance of three alternative quantitative platforms
The MAQC consortium selected a list of 1,297 genes to evaluate and compare the performance of microarray and alternative quantitative platforms and to identify and analyze discordant results. TaqMan assays, StaRT-PCR and QuantiGene assays were performed on 997, 205 and 244 of the 1,297 genes, respectively. Gene lists used for analysis of selected performance metrics for the quantitative platforms, and for analysis of concordance between the quantitative platforms and microarrays, are shown in Supplementary Table 1 online.

Four RNA samples A, B, C and D, provided by the MAQC consortium, were analyzed20. TaqMan assays were done in quadruplicate, and StaRT-PCR assays in triplicate, on cDNA generated from 10 ng total RNA (Supplementary Methods online). Both the TaqMan assays and StaRT-PCR were based on cDNA from a single reverse transcription reaction. QuantiGene assays were performed in triplicate directly from 500 ng of total RNA (Table 1). The performance metrics presented are not directly comparable because each platform assayed a different gene set and had different assay ranges of measurements and signal-to-analyte response.

Detection sensitivity
TaqMan assay quantification is directly related to CT. A gene is not detectable when the average CT > 35 cycles. By this definition, 857 genes (86%) were detectable in both A and B. The StaRT-PCR detection limit is defined as ten transcript molecules. By this definition, 193 genes (94%) were detectable in both A and B. For QuantiGene, the detection limit is defined as a signal three standard deviations (s.d.) above the background. By this standard, 223 genes (91.4%) were detectable in both A and B.
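The three detection rules above can be expressed as simple predicates. This is a minimal sketch using the thresholds stated in the text; the function names and example input values are hypothetical.

```python
import statistics

def taqman_detected(ct_replicates):
    # Detectable when the average CT is at most 35 cycles.
    return statistics.mean(ct_replicates) <= 35.0

def start_pcr_detected(molecules):
    # The StaRT-PCR detection limit is ten transcript molecules.
    return molecules >= 10

def quantigene_detected(signal, bg_mean, bg_sd):
    # Detectable when the signal exceeds background by three s.d.
    return signal > bg_mean + 3.0 * bg_sd

print(taqman_detected([34.2, 34.8, 35.1, 34.6]))         # True
print(start_pcr_detected(7))                              # False
print(quantigene_detected(0.9, bg_mean=0.2, bg_sd=0.1))   # True
```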

Assay range
The assay range represents the difference in signals, measured on a log10 scale, between genes with the highest and the lowest expression. The assay range for TaqMan assays was 8.1, with CT values ranging from 8 (>10^8 transcript molecules) for 18S rRNA to 35 (~5 transcript molecules) for low expressors. For StaRT-PCR, the assay range was 6.8, with normalized transcripts ranging from 6.4 × 10^7 transcript molecules for 18S rRNA to 10 transcript molecules for low expressors. For QuantiGene, the assay range was 4.1, with the highest detectable signal of 599 relative luminescence units (RLU) for LDHA and the lowest detectable signal of 0.045 RLU for SPARCL1.
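Expressed as code, the assay range is simply the log10 ratio of the highest to the lowest detectable signal; the QuantiGene numbers below are taken from the text, and the function name is ours.

```python
import math

def assay_range_log10(highest_signal: float, lowest_signal: float) -> float:
    # Assay range: log10 ratio of highest to lowest detectable signal.
    return math.log10(highest_signal / lowest_signal)

# QuantiGene example from the text: 599 RLU (LDHA) down to 0.045 RLU (SPARCL1).
print(round(assay_range_log10(599, 0.045), 1))  # 4.1
```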

Precision
The precision of the three alternative quantitative platforms was measured by coefficient of variation (CV) (Fig. 1 and Table 1) or s.d. (Supplementary Fig. 1 online). There were interplatform differences in the number of transcript molecules (RNA or cDNA) loaded into each assay. Because of differences in the amount of sample loaded (Table 1), a majority of the genes measured with QuantiGene contained >6,000 transcript molecules in the assay, whereas a majority of those measured by TaqMan assays and StaRT-PCR had less. These two platforms were used to assess the previously reported stochastic process involved in the relationship between transcript molecules loaded and CV21. A clear trend of increased CV with decreasing abundance of transcripts was observed for TaqMan assays and StaRT-PCR when <6,000 transcript

Table 1 Summary of platform performance metrics

TAQ (TaqMan; 997 genes tested)
  Sample processing: cDNA from 10 ng total RNA, one RT reaction; four replicates of cDNA; normalized against POLR2A.
  Detection sensitivity(a): both A & B above LOD, 857 (86%); both A & B below LOD, 38 (3.8%).
  Dynamic range(b) (log10): 8.1. Precision(c) (median CV): all data, 3.46; >6,000 molecules, 2.42.
  Accuracy(d) (median): linearity(e) R2 = 0.950; RA (% median)(f) = 3.6; RA (% variance)(g) = 9.4.

GEX (StaRT-PCR; 205 genes tested)
  Sample processing: cDNA from 10 ng total RNA, one RT reaction; three replicates of cDNA; normalized against beta-actin.
  Detection sensitivity(a): both A & B above LOD, 193 (94%); both A & B below LOD, 4 (2.0%).
  Dynamic range(b) (log10): 6.8. Precision(c) (median CV): all data, 6.26; >6,000 molecules, 3.82.
  Accuracy(d) (median): linearity(e) R2 = 0.96(h); RA (% median)(f) = 0.4(h); RA (% variance)(g) = 21.1(h).

QGN (QuantiGene; 244 genes tested)
  Sample processing: 500 ng total RNA; three replicates of RNA directly; original data.
  Detection sensitivity(a): both A & B above LOD, 223 (91%); both A & B below LOD, 5 (2.0%).
  Dynamic range(b) (log10): 4.1. Precision(c) (median CV): all data, 2.16; >6,000 molecules, 2.12.
  Accuracy(d) (median): linearity(e) R2 = 0.994; RA (% median)(f) = 1.0; RA (% variance)(g) = 5.0.

(a) Detection sensitivity: the number (percent) of detectable or undetectable genes in both samples A and B, based on each platform’s detection limit. (b) Assay range: based on the ratio of highest detectable signal to lowest detectable signal over all genes and samples measured in each platform. (c) Precision: based on the median CV measured either (i) in all genes and all samples in each platform or (ii) in samples with 6,000 transcript molecules or above. (d) Based on the formulas C = 0.25A + 0.75B and D = 0.75A + 0.25B for TaqMan assays and QuantiGene, and C = 0.88A + 0.12B and D = 0.45A + 0.55B for StaRT-PCR. (e) Linearity: based on the median R2 of the linear fit of assay signal from samples A, B, C and D for all detectable genes with greater than twofold difference between A and B; 829, 125 and 223 genes were analyzed for TaqMan, StaRT-PCR and QuantiGene, respectively. (f) RA score (% median): the RA (relative accuracy) score for samples C and D for a gene is defined as (C−C′)/C′ and (D−D′)/D′, representing the percent difference of the experimental value from the expected; the median % RA score for samples C and D combined is presented; only genes detectable in both A and B were analyzed for each platform. (g) RA score (% variance): the median of the absolute RA scores for samples C and D combined. (h) Based on a recalibrated data set (Supplementary Methods).


molecules (below dashed line in Fig. 1) were loaded, as also specified in Table 1. For the TaqMan and StaRT-PCR platforms, each cDNA sample was split for replicate measurements, so the precision measurement did not include the reverse transcription reaction. For the QuantiGene platform, replication encompassed the entire process from total RNA to chemiluminescent detection.

Relative accuracy
Relative accuracy was defined as the proximity of observed expression values for C and D to the predicted values based on measured expression values for A and B. Error handling for all platforms was on a linear scale, with the exception of TaqMan assays, in which errors increased exponentially because CT is transformed to number of molecules. The percent difference between the predicted signals C′ and D′ and the actual assay signals C and D could be used as an indication of relative assay accuracy (RA). An RA score ∆C and ∆D for a target gene was defined as (C–C′)/C′ and (D–D′)/D′, respectively. The distribution of percent difference from expected (RA score) for each gene is presented in a box plot for each platform (Fig. 2 and Table 1). The median percent difference from expected for both C and D was 3.6, 0.4 and 1.0 for TaqMan assays, StaRT-PCR and QuantiGene, respectively; all are closely centered around zero. The median distribution of the absolute value of RA scores (|∆C| and |∆D|) indicates the variance of the percent difference between the predicted signals C′ and D′ and the actual assay signals C and D. For TaqMan assays, the median variance value for 856 genes for both C and D was 9.4; for StaRT-PCR (193 genes) it was 21.1; and for QuantiGene (223 genes) it was 5.0. The data for the QuantiGene platform are notable given that these values encompass the system-wide accuracy of the platform.
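The RA computation can be sketched as follows, using the sample-mixing formulas given in Table 1. The signal values in the example are invented for illustration; only the mixing coefficients and the percent-difference definition come from the text.

```python
def predicted_cd(a, b, c_coeffs=(0.25, 0.75), d_coeffs=(0.75, 0.25)):
    """Expected C and D signals from measured A and B signals.
    Default mixing coefficients are those used for TaqMan and QuantiGene;
    StaRT-PCR used C = 0.88A + 0.12B and D = 0.45A + 0.55B."""
    c_pred = c_coeffs[0] * a + c_coeffs[1] * b
    d_pred = d_coeffs[0] * a + d_coeffs[1] * b
    return c_pred, d_pred

def ra_score(observed, predicted):
    # Percent difference of the observed signal from the expected one.
    return 100.0 * (observed - predicted) / predicted

c_pred, d_pred = predicted_cd(a=1000.0, b=200.0)
print(c_pred, d_pred)            # 400.0 800.0
print(ra_score(380.0, c_pred))   # -5.0
```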

Fold-change correlation
To evaluate the concordance of fold changes between the alternative quantitative platforms, we performed regression analysis of fold differences in sample A compared to sample B. This analysis was performed using pair-wise common gene sets between platforms because the overlap among the three platforms was limited to 48 genes (Fig. 3). The R2 and slope for TaqMan assays versus StaRT-PCR (92 common genes) were 0.88 and 0.93, respectively; for QuantiGene versus TaqMan assays (193 common genes), 0.81 and 0.78, respectively; and for QuantiGene versus StaRT-PCR (55 common genes), 0.85 and 0.77, respectively. Although linear regression analysis indicates good fold-change correlation across the three platforms, the respective slopes indicate compression or expansion effects between the platforms.
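A regression of this kind reduces to an ordinary least-squares fit of one platform's log2 fold changes against another's. The sketch below uses only the standard approach with invented fold-change values; a fitted slope below 1, as in the QuantiGene comparisons above, would indicate compression.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = a + b*x, returning (a, b, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx           # slope
    a = my - b * mx         # intercept
    r2 = sxy ** 2 / (sxx * syy)
    return a, b, r2

# Hypothetical log2(B/A) fold changes for genes shared by two platforms;
# platform2's muted responses at the extremes mimic signal compression.
platform1 = [-4.0, -2.0, 0.0, 1.0, 3.0, 5.0]
platform2 = [-3.2, -1.7, 0.1, 0.8, 2.4, 3.9]
a, b, r2 = linear_fit(platform1, platform2)
print(f"slope={b:.2f}, R2={r2:.3f}")
```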

Concordance of microarrays with alternative quantitative platforms
We used the results of the alternative quantitative platforms as a reference to evaluate concordance with microarray platforms. For cross-platform comparison to microarrays, we evaluated four parameters (Figs. 4 and 5): (i) detection sensitivity, the ability of the microarrays to detect genes that were called ‘present’ by each alternative quantitative platform; (ii) the fold-change correlation between microarrays and each alternative quantitative platform; (iii) true positive rate (TPR), the concordance of genes called statistically differentially expressed by the TaqMan assay that are also called statistically differentially expressed in the microarrays; (iv) false discovery rate (FDR), the concordance of genes differentially expressed in microarrays that are not differentially expressed in the TaqMan assay. TaqMan assays were evaluated for all parameters, whereas StaRT-PCR and QuantiGene were evaluated only

Figure 1 Effect of the number of transcript molecules on assay precision. The measured (StaRT-PCR) or estimated (TaqMan assays and QuantiGene) number of transcript molecules loaded into an assay for each gene in sample A or B was plotted against its CV. The data for the three platforms were transformed to be on the same x-axis scale as described in Methods. The vertical dashed line is at ~6,000 transcript molecules; blue symbols, assays detecting <6,000 transcript molecules; orange, assays detecting >6,000 transcript molecules; green, assays below the limit of detection. LOD, limit of detection.

Figure 2 Analysis of assay accuracy. The values measured for C and D were compared to the values expected (% difference) based on measured A and B values. Formulas used to calculate expected C and D are provided in the text. Box plot components are: horizontal line, median; box, interquartile range; whiskers, 1.5× interquartile range; black squares, outliers. TAQ, TaqMan assays; GEX, StaRT-PCR assays; QGN, QuantiGene assays; LOD, limit of detection. The number of genes for each platform is shown. Panels show samples C and D for genes with A & B above the LOD (top row) and above 6,000 transcript molecules (bottom row).


for parameters i and ii because fewer genes were assayed for these platforms. Detailed site-by-site analysis of genes is provided for StaRT-PCR and QuantiGene in Supplementary Table 2 online and for TaqMan assays in Supplementary Figure 2 online.

Detection sensitivity analysis was done for each alternative quantitative platform using the genes common to that platform and each of the microarray platforms. For this reason, assay ranges and expression characteristics of gene sets differed. There were 845, 157 and 197 genes determined to be present in sample A by TaqMan assays, StaRT-PCR and QuantiGene, respectively. At the lower ranges of gene expression, for each microarray, the fraction of genes detected decreased relative to each of the alternative quantitative platforms (Fig. 4a–c). In addition, detection sensitivities relative to each alternative quantitative platform varied among the microarray platforms.

A fold-change comparison between each alternative quantitative platform and each microarray platform was also performed using LOWESS smoothing (Fig. 4d–f, ref. 22), which does not assume a linear relationship of fold-change values between platforms. We used a total of 392, 101 and 83 genes that were present in samples A and B at each site measured by each microarray platform and shared with TaqMan assays, StaRT-PCR and QuantiGene, respectively, for this comparison. Although excellent fold-change correlations were observed, varying degrees of compression of signal-to-analyte response relative to the alternative quantitative platforms were also found. These data are consistent with the analysis presented elsewhere in this issue20. An additional analysis was done to show that compression effects are detectable for both low and high expressors (Supplementary Fig. 3 online).

Traditionally, analysis of accuracy is carried out by analyzing the true positive rate (TPR) and false discovery rate (FDR). In this case, the actual rates were unknown. For this reason, we compared the microarray platforms to TaqMan, which became the reference platform. Using TaqMan assay calls as the reference, we constructed contingency tables against microarray platforms, in which concordance was determined and both the P-value significance of the t-test and the fold-change directionality (up- or downregulation) were taken into consideration. Specifically, true positives (TP) are genes differentially expressed (significant P value for the t-test) in both TaqMan and microarray platforms with fold change in the same direction; true negatives (TN) are genes not differentially expressed in either platform; false positives (FP) consist of two sets of genes: (i) genes not differentially expressed in TaqMan but differentially expressed in microarrays, or (ii) genes differentially expressed in both platforms with fold change in the opposite direction; false negatives (FN) are genes differentially expressed for TaqMan but not for microarrays.
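The contingency classification can be written as a small function. This is a sketch of the rules as stated; the dictionary fields and significance threshold are illustrative, and the study's actual FDR-controlled t-test procedure is not reproduced here.

```python
def classify(gene, alpha=0.05):
    """Classify one gene's microarray call against the TaqMan reference.
    `gene` carries a t-test P value and a log2 fold change for each
    platform; the field names used here are hypothetical."""
    taq_de = gene["taq_p"] < alpha     # differentially expressed on TaqMan
    arr_de = gene["array_p"] < alpha   # differentially expressed on array
    same_dir = gene["taq_fc"] * gene["array_fc"] > 0
    if taq_de and arr_de:
        # Significant on both, but opposite direction counts as FP.
        return "TP" if same_dir else "FP"
    if arr_de:
        return "FP"   # array-only call
    if taq_de:
        return "FN"   # missed by the array
    return "TN"

print(classify({"taq_p": 0.001, "array_p": 0.003,
                "taq_fc": 2.1, "array_fc": 1.8}))  # TP
```

TPR and FDR then follow as TP/(TP+FN) and FP/(TP+FP) over all classified genes.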

For TPR analysis in TaqMan assays, microarrays were compared to genes considered differentially regulated at fold-change cut-offs of 0, 1.5 and 2.0 (Fig. 5a–c, Supplementary Table 3 online). For microarrays, differential expression was measured using a t-test and controlling for FDR at a 5% level23 for genes present in either sample A or B. For approximately half of the assay range assessed by TaqMan assays, there were consistent TPR values across array platforms. However, it is apparent that at low expression, detection percentages were directly proportional to TPR. As a result, there was also variation (up to 20%) in TPRs between array platforms (Fig. 5a, Supplementary Table 3 online). FDR analysis (Fig. 5d–f, Supplementary Table 3 online) using TaqMan assays as a reference also showed consistent FDRs for genes expressed at medium and high levels for the microarray platforms. As expected, alternative quantitative platforms showed ~5% discordance with arrays, in agreement with the FDR cut-off used for defining differential expression in microarrays. However, genes expressed at low levels showed a variable and inverse relationship to FDR values (Fig. 5d, Supplementary Table 3 online). These results support the idea that differential expression measurement depends on the detection limit of each microarray platform.

Discordant gene analysis
Alternative quantitative platforms can also be used to resolve discordance among the microarray platforms, because specific assays can easily be designed to identify the source of the discordance by probing different regions. Analysis of extremely discordant results among the 997 genes shared by microarray platforms and TaqMan assays identified 9 genes (~1%) that exhibit twofold or greater changes in opposite directions on different platforms with P < 0.0001 (Supplementary Table 4 online). Some of these genes, such as POMC, LTA and EPHA7 (Supplementary Fig. 4 online), were considered low expressors by TaqMan assays (CT values > 32) and, as expected, were undetected on a majority of the microarray platforms. However, some genes appeared to exhibit true discordance, of which three (ELAVL1, IGFBP5, ABCD1) were selected for further analysis by the three alternative quantitative platforms. To investigate the nature of the discordance, we designed probes against different regions of the three genes. For IGFBP5 and ABCD1, alternative quantitative platform probes indicate consistently lower expression in sample A along the length of the transcripts (Fig. 6, Supplementary Table 5 online). These results suggest that discordance between the platforms in some cases is likely to be a result of cross-hybridization of microarray probes with other sequences. For ELAVL1, alternative quantitative platform probes were able to evaluate differential expression characteristics of the 5′ and 3′ ends of the gene. This result is consistent with a mapping

Figure 3 Correlation of fold change between alternative quantitative platforms. The sample B over sample A (B/A) fold changes (log2) for each gene common between two platforms were subjected to bivariate analysis. (a) TaqMan assays versus StaRT-PCR. (b) QuantiGene versus TaqMan assays. (c) QuantiGene versus StaRT-PCR. The dashed line on each graph represents the ideal slope of 1.0. The solid lines represent a linear regression fit. The overlapping gene list among the alternative quantitative platforms is represented in the Venn diagram. Linear fit: TaqMan assay versus StaRT-PCR, Y = –0.03647 + 0.9347X, R2 = 0.879; QuantiGene versus TaqMan assay, Y = 0.14 + 0.7825X, R2 = 0.8118; QuantiGene versus StaRT-PCR, Y = 0.4095 + 0.7707X, R2 = 0.8497.


study showing that ELAVL1 has two alternative polyadenylation sites (unpublished observations). We also investigated some genes (DPYD, PTGS2, FURIN) that were discordant between the alternative quantitative platforms. The discordant DPYD results were determined to be a result of probing different sequence locations in the gene. When probes from each alternative quantitative platform were designed to interrogate similar sequences, expression characteristics along the length of the gene were found to be in concordance. Although more 5′ probes appeared to have discrepancies in directionality of expression, these differences were found to be statistically insignificant (P > 0.01). Multiple probe locations for PTGS2 generated expression differences in the same direction of change across all three platforms. The only gene that remained discordant after using multiple probe designs for each of the three platforms was FURIN. For this gene, both TaqMan assays and StaRT-PCR detected differential expression with probes specific to the 5′ end of the gene. Although all platforms interrogate this region of the gene, the smaller probes (TaqMan assays, bases 25–95; StaRT-PCR, bases 22–182) may be detecting a splice variant not detected by probes interrogating a longer region of the gene (QuantiGene, bases 1–501). Thus, by designing probes against different regions of a gene, alternative quantitative platforms can confirm location-specific expression characteristics of genes and aid in the resolution of discordant gene expression data.

DISCUSSION
We have assessed three quantitative gene expression measurement technologies for their performance metrics, correlated the results obtained with them to DNA microarray data and then used them as a means to identify sources of discordance among microarray platforms. Our results show a good correlation between quantitative platform measurements and microarray data, regardless of whether RNA or cDNA levels were measured. A primary focus of this study was to identify possible sources of discordance. On the basis of the data reported here, we have identified specific reasons that partially explain why, as previously reported22, groups of genes detected as differentially expressed on a particular microarray platform are occasionally not reproducible across microarray platforms.

Whereas alternative quantitative platforms could detect over 85% of the genes shared across alternative quantitative and array platforms in this study, microarray platforms were less sensitive in the detection of lower expressed genes in this set (Fig. 4a–c, Supplementary Table 2 and Supplementary Fig. 2 online). In addition, relative to the alternative quantitative platforms, detection levels varied by as much as 60% among microarray platforms for lower expressed genes in this set. Since significant differential expression in microarrays is largely dependent on the ability to reliably detect expression, intersite and interplatform variation can lead to discordant results in the gene lists.

Using TaqMan assays as a reference, TPR and FDR for the various microarray platforms differed across the assay range (Fig. 5a,d, Supplementary Table 3 online). TPR was directly correlated to percent of detectable genes whereas FDR was inversely correlated,

indicating that although this metric reflects the ability of each platform to detect expression, it may also be subject to the stringency defined by the array manufacturer in applying detection calls. The consequence of these varying stringencies is that whereas a relaxed stringency in detection calls can lead to better detection and differential expression concordance, it also yields a higher percentage of false positives. Supplementary Figure 2 online verifies that the discordance in differential expression is related to the intersite and interplatform variation in detection.

Using StaRT-PCR or QuantiGene as references and more stringent criteria, in which a fold-change cutoff of 2.0 was applied to genes considered present in at least three out of five replicates in both the A and B samples, did not eliminate intersite or interplatform variation in detection of differentially expressed genes (Supplementary Table 2 online). This variation is almost exclusively confined to genes expressed at low levels. Even with these more stringent selection criteria, intersite variation in detection resulted in intersite and interplatform variation in the lists of differentially expressed genes.
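The replicate-based present-call filter and fold-change cutoff described above can be sketched as a small decision function. This is an illustrative sketch with hypothetical data, not the consortium's actual analysis code; the function name and inputs are assumptions.

```python
# Sketch of the detection/fold-change filter described above (hypothetical data).
# A gene is kept if it is called present in at least 3 of 5 replicates in BOTH
# samples and its absolute log2 fold change is at least 1 (i.e., a 2.0-fold cutoff).

def passes_filter(calls_a, calls_b, log2_fc, min_present=3, fc_cutoff=1.0):
    """calls_a/calls_b: lists of 0/1 detection calls for the five replicates."""
    detected = sum(calls_a) >= min_present and sum(calls_b) >= min_present
    return detected and abs(log2_fc) >= fc_cutoff

# Present in 4/5 and 3/5 replicates, ~2.5-fold change (log2 ~ 1.32): passes.
print(passes_filter([1, 1, 1, 1, 0], [1, 0, 1, 1, 0], 1.32))  # True
# Only 2/5 replicates detected in sample A: fails regardless of fold change.
print(passes_filter([1, 1, 0, 0, 0], [1, 1, 1, 1, 1], 2.0))   # False
```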

Another source of discordance among differentially expressed genes in this study was interplatform variation in compression. Using the alternative quantitative platforms as a reference, interplatform variation in signal-to-analyte response was observed (Fig. 4d–f), and it was particularly large among genes expressed in the high or low range (Supplementary Fig. 3 online). This platform-dependent compression was associated with discordance in differentially expressed genes (Supplementary Table 2 online).

Whereas these results identified specific causes of discordance in lists of detected and/or differentially expressed genes, we found excellent fold-change correlation between each quantitative platform and each microarray platform for those genes that were detected by the microarray platforms (Fig. 4d–f). Of the 845 genes detected in the microarray

[Figure 4 appears here: panels a–f comparing each microarray platform (ABI, AFX, AG1, GEH, ILM) with TaqMan, StaRT-PCR and QuantiGene; y axes, percent detection (a–c) and log2 fold-change correlation (d–f); x axis, average signal.]

Figure 4 Performance of microarray platforms relative to alternative quantitative platforms. (a–c) Sensitivity of detection. Each microarray platform was compared with TaqMan (a), StaRT-PCR (b) or QuantiGene (c) for the ability to detect genes expressed in sample A. Genes were analyzed based on present-call criteria of being present in 3/5 replicates at one of the three microarray sites and in the majority of replicates for each alternative quantitative platform (at least 3/4 for TaqMan, 2/3 for StaRT-PCR and QuantiGene). Genes detected by each alternative quantitative platform were sorted according to their signals (scaling as described in Fig. 1), and the percent of genes detected by both microarray and alternative quantitative platforms in bins of 30 consecutive genes (y axis) was plotted against the average signal of those genes measured by the alternative quantitative platform (x axis). (d–f) Correlation of fold change measured by each microarray platform compared with TaqMan (d), StaRT-PCR (e) or QuantiGene (f): pairwise sample A to sample B fold-change comparison, measured by each alternative quantitative platform (x axis) versus each microarray platform (y axis). For each microarray platform, only genes present in both samples at each site were called present. Each line represents the Lowess smoothing curve. The number of genes involved in each analysis varies with the platforms compared.
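The binning procedure in the caption (sort genes by reference-platform signal, take bins of 30 consecutive genes, plot percent detected against the bin's average signal) can be sketched as follows. The data are randomly generated stand-ins, not the study's measurements.

```python
# Sketch of the Figure 4a-c binning: genes detected by the reference platform
# are sorted by reference signal, split into bins of 30 consecutive genes, and
# the percent also detected on the microarray is paired with the bin's average
# reference signal. All values here are hypothetical.
import random

random.seed(0)
# (reference signal, detected-on-array flag) for 120 hypothetical genes
genes = sorted(
    [(random.uniform(5, 20), random.random() < 0.8) for _ in range(120)],
    key=lambda g: g[0],
)

bin_size = 30
points = []
for i in range(0, len(genes), bin_size):
    chunk = genes[i:i + bin_size]
    avg_signal = sum(s for s, _ in chunk) / len(chunk)
    pct_detected = 100.0 * sum(d for _, d in chunk) / len(chunk)
    points.append((avg_signal, pct_detected))

for x, y in points:
    print(f"avg signal {x:.2f} -> {y:.1f}% detected")
```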

ANALYSIS | © 2006 Nature Publishing Group http://www.nature.com/naturebiotechnology

1120 VOLUME 24 NUMBER 9 SEPTEMBER 2006 NATURE BIOTECHNOLOGY

manufacturers in August 2005. This selection ensured that the genes would cover the entire intensity and fold-change ranges and include any bias due to RefSeq itself. To aid in the titration study, we included a subset of ~100 genes selected on the basis of tissue specificity (A versus B). To address cross-platform data inconsistency, we also included another subset that showed the largest variability in log2 fold change across platforms in the Pilot-I Study. Platform vendors were queried about their 'favorite' genes (e.g., the CYP family, PPARA and the HDAC family), and a small number of these were included. Consideration was also given to the inclusion of genes that were available on the QuantiGene and StaRT-PCR platforms. The final list was therefore not completely unbiased.

Gene list for the MAQC study by alternative quantitative platforms. TaqMan assays: 1,000 TaqMan gene expression assays used in the study that match the MAQC gene list. These 1,000 assays were selected from >200,000 available human TaqMan assays (>20,000 NCBI genes) and covered 997 genes (3 genes had more than one assay). StaRT-PCR: 103 genes matching the MAQC gene list were selected from the nearly 800 genes for which StaRT-PCR reagents are already available. All genes that overlap with those measured by TaqMan assays and QuantiGene were included, as well as an additional 102 genes, for a total of 205. QuantiGene: we selected 245 QuantiGene assays (covering 244 genes) that matched the MAQC gene list from the nearly 2,600 genes for which QuantiGene probe sets are already available. All genes that overlap with those measured by TaqMan assays and StaRT-PCR were included. Fifty-five genes were common to all three alternative quantitative platforms.

TaqMan assays. RNA samples: total RNA samples A (Universal Human Reference RNA (UHRR), Stratagene), B (brain, Ambion), C (3 UHRR:1 brain) and D (1 UHRR:3 brain), as described earlier, were used for all TaqMan assays. These samples received no additional treatment before cDNA preparation. cDNA preparation: cDNA was prepared from total RNA samples A, B, C and D using the Applied Biosystems cDNA Archive Kit and random primers. Multiple reactions containing 10 µg total RNA per 100 µl reaction volume were run for each sample following the manufacturer's recommendations. Individual reactions were pooled by sample and used for TaqMan assay analysis. TaqMan assays: each TaqMan Gene Expression Assay consists of two sequence-specific PCR primers and a FAM-labeled MGB (minor groove binder) TaqMan probe. Primer and probe design is described in Supplementary Methods. Each TaqMan assay was run in four replicates for each RNA sample; 10 ng total cDNA (as total input RNA) in a 10 µl final volume was used for each replicate assay. Assays were run with 2× Universal Master Mix without uracil-N-glycosylase on the Applied Biosystems 7900 Fast Real-Time PCR System using universal cycling conditions (10 min at

platforms and commonly mapped to one or more of the alternative quantitative platforms, only 9 (1%) were 'extremely' discordant. A major factor contributing to these infrequent discordant results is differences in probe location. Assays designed against different locations of the discordant genes in this study demonstrated the utility of the alternative quantitative platforms (Fig. 6) for independently validating gene expression measurements from array platforms.

This analysis was also useful in the study of discordance observed between the alternative quantitative platforms. For example, the discordant expression results for FURIN observed across the alternative quantitative platforms are consistent with a difference in probe location. The limited common gene list precluded a detailed analysis of the discordance caused by low-expression genes among the alternative quantitative platforms. In addition, another potential source of discordance may be the difference between measuring mRNA directly and measuring cDNA, which was not analyzed here.

In summary, analysis of the MAQC samples by three alternative quantitative platforms revealed excellent fold-change correlation with microarray platform data while enabling identification of possible sources of intersite and interplatform discordance in lists of genes measured as differentially expressed. Advantages of the alternative quantitative platforms were partially due to assay specificity, a lower detection threshold and an expanded assay range. Another advantage was the ease with which they interrogated specific gene locations, owing to their flexible assay design. Further, analysis by these alternative quantitative technologies contributed to the characterization of the MAQC samples and confirmed their value in guiding optimization of gene expression methods.

METHODS
Sample definition. Sample A was Universal Human Reference RNA (Stratagene) and sample B was human brain total RNA (Ambion). Concentrations of A and B were normalized based on total RNA as measured by OD260. C was a 3:1 volumetric mixture of A and B, and D was a 1:3 volumetric mixture of A and B.

Selection of genes for validation by alternative quantitative platforms. A list of 1,297 RefSeqs was selected by the MAQC consortium. Over 90% of these genes were selected from a subset of 9,442 RefSeqs common to the four platforms (Affymetrix, Agilent, GE Healthcare and Illumina) used in the MAQC Pilot-I Study (RNA Sample Pilot), based on annotation information provided by

Figure 5 Assessment of true positive rates and false discovery rates using TaqMan assays. (a–c) True positive rate (TPR) assessment using TaqMan assays. All genes common to the TaqMan assays and the microarray platforms were used for the TPR analysis. TPR was defined as the percentage of genes differentially expressed in sample A compared with sample B detected by each microarray platform out of those detected by the TaqMan assay data taken as truth [TPR = TP/(TP + FN)], where TP is true positives and FN is false negatives in the microarray. Differential expression was detected by t-test, with the false discovery rate (FDR) controlled at the 5% level and fold-change filters of 0 (a), 1.5 (b) and 2.0 (c). For the TaqMan assays, genes were ordered according to the average signals of A and B, and for bins of 50 consecutive genes we compared the significant-difference calls between each microarray platform and the TaqMan assays. Concordance of differential

expression was assessed for each platform. (d–f) False discovery rate (FDR) assessment using TaqMan assays. All genes common to the TaqMan assays and the microarray platforms were used for the FDR analysis. FDR was defined as FP/(TP + FP), where FP is false positives in the microarrays. The FDR represents the percentage of differentially expressed genes detected only by the microarray platforms out of all genes differentially expressed in the microarray platforms. Notice that the FDR (relative to TaqMan assays) is slightly larger than 5%, which is expected from the Benjamini-Hochberg (BH) adjustment for multiple testing. Differential expression was detected by t-test (FDR at 5%), with fold-change filters of 0 (d), 1.5 (e) and 2.0 (f).
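The TPR and FDR definitions above, with TaqMan calls taken as truth, amount to simple set arithmetic on the differentially expressed gene lists. The gene sets below are hypothetical placeholders.

```python
# TPR = TP / (TP + FN); FDR = FP / (TP + FP), with the TaqMan list as truth.
# Gene identifiers here are hypothetical.

taqman_de = {"A", "B", "C", "D", "E"}      # differentially expressed per TaqMan
array_de = {"B", "C", "D", "F"}            # differentially expressed per array

tp = len(taqman_de & array_de)             # true positives: 3
fn = len(taqman_de - array_de)             # false negatives: 2
fp = len(array_de - taqman_de)             # false positives: 1

tpr = tp / (tp + fn)                       # 3/5 = 0.6
fdr = fp / (tp + fp)                       # 1/4 = 0.25
print(f"TPR={tpr:.2f} FDR={fdr:.2f}")      # TPR=0.60 FDR=0.25
```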

[Figure 5 appears here: TPR (a–c, top row) and FDR (d–f, bottom row) for each microarray platform (ABI, AFX, AG1, GEH, ILM) versus TaqMan assays, with no fold-change cutoff and cutoffs of 1.5 and 2.0; x axis, average signal.]



95 °C; 15 s at 95 °C, 1 min at 60 °C; 40 cycles). The assays and samples were analyzed across a total of 44 384-well plates. Robotic methods (Biomek FX) were used for plate setup, and each sample and assay replicate was tracked on a per-well, per-plate basis. Data normalization: in QRT-PCR, an endogenous control gene is used to normalize data and control for variability between samples as well as plate, instrument and pipetting differences. POLR2A was chosen as the reference gene because its CT value was within the range of most of the genes in the study and showed the least variation across the samples (Supplementary Fig. 5a,b online). Each replicate CT was normalized to the average CT of POLR2A on a per-plate basis by subtracting the average CT of POLR2A from each replicate to give the ∆CT, which is equivalent to the log2 difference between the endogenous control and the target gene. Data analysis and filtering: the ∆CT of each replicate for each of the 1,000 assays was presented in the final data set as the normalized data. When TaqMan gene expression assays are run on a 7900HT system in a 10 µl reaction volume, a raw CT value of 34 represents approximately ten transcript molecules (assuming 100% amplification efficiency). At a copy number of less than five, stochastic effects dominate and the data generated are less reliable. Thus, a raw CT of 35 was set as the limit of detection in this study: individual replicates that gave CT values >35 were considered not detected and flagged as not expressed (A, absent); replicates with CT < 35 were considered detectable and identified as expressed (P, present). A CT > 32 and < 35 (~5–40 transcript molecules) was considered to indicate a low-expressing gene. For the ∆CT calculations we used a CT of 35 for any replicate with CT > 35. Fold-change calculation: the log2 fold change between two samples was calculated using the ∆∆CT method23: the average ∆CT of sample A was subtracted from that of sample B.
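The ∆CT normalization and ∆∆CT fold-change steps above can be sketched numerically. This is an illustrative sketch with made-up CT values; the function names are assumptions, and the sign convention follows the sentence above (∆CT of sample A subtracted from that of sample B).

```python
# Sketch of the steps described above: deltaCT = CT(target) - mean CT(POLR2A)
# on the same plate; CT values above the detection limit of 35 are clamped to
# 35; ddCT = mean deltaCT(B) - mean deltaCT(A). All CT values are hypothetical.

CT_LIMIT = 35.0

def delta_ct(target_cts, polr2a_cts):
    """Normalize each replicate CT against the plate's average POLR2A CT."""
    ref = sum(polr2a_cts) / len(polr2a_cts)
    return [min(ct, CT_LIMIT) - ref for ct in target_cts]

def delta_delta_ct(dct_a, dct_b):
    """Average deltaCT of sample A subtracted from that of sample B."""
    return sum(dct_b) / len(dct_b) - sum(dct_a) / len(dct_a)

dct_a = delta_ct([28.0, 28.2, 27.8, 28.0], [22.0, 22.0, 22.0, 22.0])  # sample A
dct_b = delta_ct([30.0, 30.2, 29.8, 30.0], [22.0, 22.0, 22.0, 22.0])  # sample B
# ~2.0 here: CT is ~2 cycles lower in A, i.e., the target is roughly
# 4-fold more abundant in sample A than in sample B.
print(delta_delta_ct(dct_a, dct_b))
```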

StaRT-PCR. StaRT-PCR assays were performed according to the procedures previously described in detail4,12. Reverse transcription: for each of the four MAQC samples, two 20 µg aliquots of RNA were reverse transcribed. Each reverse-transcription reaction took place in a 90 µl volume containing Moloney Murine Leukemia Virus (MMLV) reverse transcriptase (1,500 units), MMLV RT 5× first-strand buffer (final concentrations 50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3 mM MgCl2) (both from Invitrogen), oligo dT primers (1.5 µg), RNasin (70 units) and deoxynucleotide triphosphates (dNTPs) (10 mM) (all from Promega). Calibration of cDNA: after reverse transcription, the two 90 µl cDNA products for each sample were combined into a single 180 µl volume. Each sample was then calibrated. A 2 µl aliquot of undiluted, tenfold-diluted or 100-fold-diluted cDNA from each sample was PCR-amplified in the presence of 2 µl of SMIS. Each µl of SMIS contains 600,000 molecules of ACTB internal standard (IS). It was determined that for each MAQC cDNA sample, a 50-fold dilution would result in approximate equivalence between ACTB NT and IS PCR products when equivalent volumes of each were included in the PCR reaction. After 50-fold dilution, there were 4,500 µl of each cDNA sample. It was then confirmed for each sample that the amount of ACTB cDNA in 1 µl was approximately in balance with the 600,000 ACTB internal standard molecules in 1 µl of SMIS. The amount of RNA that contributed to each µl of each 50-fold diluted working solution was 4 ng. StaRT-PCR reaction conditions: for each StaRT-PCR reaction, a 20 µl reaction volume was prepared containing 2 µl of the calibrated cDNA sample, 2 µl of SMIS, 0.5 units of Taq polymerase, 2.2 µl of buffer, 0.6 µl of MgCl2, 1 µl of each primer, 0.45 µl of dNTPs and 10.65 µl of water. Range-finding step: the expression level of each gene in each sample was initially unknown.
Thus, to ensure that each measurement was within the range of quantification (NT/IS > 1/10 and < 10/1), a range-finding measurement was conducted for each gene in each sample with E SMIS. Each µl of E SMIS contains 600 molecules of the target-gene IS and 600,000 molecules of ACTB IS. After PCR amplification and electrophoretic separation of the PCR products, the SEM Center software determined whether the NT/IS ratio of the PCR products was acceptable or, if not, predicted which SMIS should be used for quantification. This prediction was 95% accurate. Quantification: each 20 µl reaction volume contained 2 µl of the calibrated cDNA sample and 2 µl of the appropriate SMIS (that is, A–F) predicted to be correct in the range-finding step. Triplicate measurements were made of each gene in each sample. The fold-change calculation for each gene was based on the ratio of the gene transcript in sample B over that in sample A.
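The quantification-range criterion above (1/10 < NT/IS < 10/1) is a simple ratio check; a minimal sketch, with hypothetical molecule counts and an assumed function name:

```python
# Sketch of the StaRT-PCR range-finding criterion: a measurement is within the
# range of quantification when the native template (NT) to internal standard
# (IS) ratio satisfies 1/10 < NT/IS < 10/1. Counts below are hypothetical.

def in_quantification_range(nt, internal_standard):
    ratio = nt / internal_standard
    return 0.1 < ratio < 10.0

print(in_quantification_range(1800, 600))   # True  (ratio 3.0)
print(in_quantification_range(50, 600))     # False (NT far below the standard)
```

If the ratio falls outside this window, a different SMIS (with a different target-gene IS concentration) would be chosen for the quantification step, as described above.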

QuantiGene. Assay procedure: the QuantiGene assays were performed according to the procedure of the QuantiGene Reagent System (Panomics), which was previously described in detail24,25. Briefly, 10 µl of starting total RNA (500 ng)

from sample A, B, C or D was mixed with 40 µl of Lysis Mixture (Panomics), 40 µl of Capture Buffer (Panomics) and 10 µl of target gene-specific probe set (CE (capture extender), 1.65 fmol/µl; LE (label extender), 6.6 fmol/µl; BL (blocker), 3.3 fmol/µl). Each sample mixture was then dispensed into an individual well of a Capture Plate (Panomics). The Capture Plate was sealed with foil tape and incubated at 53 °C for 16–20 h. The hybridization mixture was removed and the wells were washed 3× with 250 µl of wash buffer (0.1× SSC, 0.03% lithium lauryl sulfate). Residual wash buffer was removed by centrifuging the inverted Capture Plate at 1,000g. Signals for the bound target mRNA were developed by sequential hybridization with branched DNA (bDNA) amplifier and alkaline phosphatase-conjugated label probe, at 46 °C for 1 h each. Two washes with wash buffer were used to remove unbound material after each hybridization step. Dioxetane substrate was added to the wells and incubated at 46 °C for 30 min. Luminescence from each well was measured using an Lmax microtiter plate luminometer (Molecular Devices). Three replicate assays measuring RNA directly (independent sampling, n = 3) were performed for all described experiments. Genomic DNA contamination in the RNA sample, if any, does not affect the QuantiGene assay, because it remains double-stranded throughout the entire procedure and thus cannot hybridize to the probe sets at the temperatures used in the assay. Data analysis and filtering: the QuantiGene assays of 244 genes were performed on MAQC samples A, B, C and D. For all samples, background signals were determined in the absence of RNA samples and subtracted from signals obtained in the presence of RNA samples. Because the QuantiGene assay measures RNA directly, no data normalization against a reference gene is required in the data analysis. The presence or absence call is determined by the limit of detection (LOD) of the assay, where LOD = background + 3 s.d.
of the background. If at least two samples out of A, B, C and D have signals below the LOD for a gene, we call that gene absent. To determine gene expression fold change in sample A versus sample B,

[Figure 6 appears here: log2 fold change (y axis) versus gene coordinates (x axis) for six discordant genes, one per panel: ABCD1, DPYD, ELAVL1, FURIN, IGFBP5 and PTGS2; legend: TaqMan assays, StaRT-PCR assays, QuantiGene assays, array platforms.]

Figure 6 Resolution of fold-change discrepancy results. Fold changes were calculated for sample B versus sample A on all platforms. Each panel shows the expression characteristics of a discordant gene across the transcript length. The y axis is log2 fold change; the x axis represents transcript length starting from the 5′ end of the transcript. The gray bar graphically illustrates the transcript and the red vertical lines represent the exon-exon junctions. Colored bars represent the expression value of each probe along the length of the transcript; the length of each colored bar represents the region interrogated by the probe for each platform. Two probes for FURIN (bases 1–501 and bases 217–2133) produced indistinguishable fold-change values in the QuantiGene assay.



we calculated the log2 fold change as log2 FC = log2(SA/SB), where SA represents the signal for a target gene in sample A and SB represents the signal for the target gene in sample B. A gene was considered for fold-change analysis if the signal in both sample A and sample B passed the LOD. Relative accuracy calculation: relative accuracy measures the proximity of the observed expression values for C and D to the values predicted from the measured expression values for A and B. Concentrations of samples A and B were each quantified and normalized on the basis of total RNA (OD260). They were then mixed on a volumetric basis to yield sample C (0.75A/0.25B) and sample D (0.25A/0.75B). If the signal for the target mRNA is within the linear dynamic range of the assay, then the predicted assay signals for samples C and D can be calculated as C′ = 0.75A + 0.25B and D′ = 0.25A + 0.75B. TaqMan assay and QuantiGene sample input was based on total RNA; for this reason the predicted values of C and D can be calculated directly from the volumetric proportions of A and B using these formulas. With StaRT-PCR, as with the microarrays, each measurement was normalized to mRNA instead of the starting total RNA. As described in refs. 26 and 27, if the fraction of mRNA is higher in sample A than in sample B, the predicted C and D values will differ from the formulas provided above. Based on analysis of optimal linearity among the MAQC samples for the StaRT-PCR data, the most likely formulas were determined to be C = 0.88A + 0.12B and D = 0.45A + 0.55B. A data set recalibrated on the basis of these assumed formulas (Supplementary Methods) was used to assess relative accuracy for StaRT-PCR.
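The titration predictions above are linear mixes of the A and B signals. A minimal sketch with hypothetical signal values (the function name and inputs are assumptions; for the StaRT-PCR case only the C fraction is illustrated, since its C and D fractions are not symmetric):

```python
# Sketch of the predicted titration signals described above: with input based
# on total RNA, C' = 0.75*A + 0.25*B and D' = 0.25*A + 0.75*B in linear
# (non-log) signal space. Signal values below are hypothetical.

def predict_cd(sig_a, sig_b, frac_a=0.75):
    """Predicted C and D signals for a symmetric mix with fraction frac_a of A in C."""
    c = frac_a * sig_a + (1.0 - frac_a) * sig_b
    d = (1.0 - frac_a) * sig_a + frac_a * sig_b
    return c, d

a, b = 1000.0, 200.0
c_pred, d_pred = predict_cd(a, b)
print(c_pred, d_pred)  # 800.0 400.0

# StaRT-PCR input was mRNA-normalized, so its best-fit fractions differed
# (C ~ 0.88A + 0.12B per the text); illustrating the C prediction only:
c_start = 0.88 * a + 0.12 * b
print(round(c_start, 1))  # 904.0
```

Relative accuracy would then compare these predicted values against the observed C and D signals.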

Multi-platform data transformation for Figure 1. For StaRT-PCR, 6,000 transcript molecules were represented by a value of 6,000, or log2(6,000) = 12.55 on the log2 scale. For TaqMan assays, the CT values were first transformed from a decreasing copy-number scale to an increasing copy-number scale. This was accomplished by taking the absolute value of the difference between every TaqMan assay CT value and the lowest TaqMan assay CT value (40). This rescaling preserves the assay range measured by the TaqMan assays in log2 space. Given that a TaqMan assay CT value of 35 is estimated to correspond to 5 transcript molecules, the extrapolated CT equivalent for 6,000 transcript molecules is ~24.78. This value on the transformed scale corresponds to |24.78 − 40|, or 15.22. To scale this to the StaRT-PCR value for 6,000 transcript molecules, a rescaling value of 2.66025 was applied to all values; this factor was calculated by taking the difference between the prescaling TaqMan value corresponding to 6,000 transcript molecules (15.22) and the StaRT-PCR value corresponding to 6,000 transcript molecules (12.55). The same transformation was applied to the QuantiGene values, resulting in a rescaling factor of 13.55; this factor was generated with the estimate that 6,000 transcript molecules correspond to 0.5 RLU, or –1.0 on a log2 scale. These transformations result in all platforms having a post-scaling value of 12.55 on a log2 scale for an approximate threshold of 6,000 transcript molecules.
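The TaqMan rescaling arithmetic above can be checked numerically. This sketch assumes the rescaling factor is subtracted from the flipped CT values (consistent with the numbers in the text); the constants are taken directly from the text.

```python
# Sketch of the cross-platform rescaling described above: TaqMan CT values are
# flipped onto an increasing scale as |CT - 40|, then shifted by the stated
# factor of 2.66025 so that ~6,000 transcript molecules lands near
# log2(6000) ~ 12.55, matching the StaRT-PCR scale.
import math

CT_MAX = 40.0
TAQMAN_SHIFT = 2.66025  # factor stated in the text (15.22 - 12.55, unrounded)

def taqman_to_common_scale(ct):
    """Flip a TaqMan CT onto the increasing common log2 scale."""
    return abs(ct - CT_MAX) - TAQMAN_SHIFT

start_pcr_6000 = math.log2(6000)            # StaRT-PCR value for 6,000 molecules
taqman_6000 = taqman_to_common_scale(24.78)  # CT equivalent of 6,000 molecules
# 12.55 12.56: the small gap reflects the rounded intermediates in the text.
print(round(start_pcr_6000, 2), round(taqman_6000, 2))
```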

Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS
We would like to acknowledge the contribution to this manuscript from the following members of the MAQC team: Shawn B. Baker, Anne Bergstrom Lucas, Jim Collins, Eugene Chudin, Stephanie Fulmer-Smentek, Damir Herman, Richard Shippy, Chunlin Xiao and Necip Mehmet.

DISCLAIMER
This work includes contributions from, and was reviewed by, the FDA. The FDA has approved this work for publication, but it does not necessarily reflect official Agency policy. Certain commercial materials and equipment are identified in order to adequately specify experimental procedures. In no case does such identification imply recommendation or endorsement by the FDA, nor does it imply that the items identified are necessarily the best available for the purpose.

COMPETING INTERESTS STATEMENT
The authors declare competing financial interests (see the Nature Biotechnology website for details).

Published online at http://www.nature.com/naturebiotechnology/
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Vondracek, M. et al. Transcript profiling of enzymes involved in detoxification of xenobiotics and reactive oxygen in human normal and simian virus 40 T antigen-immortalized oral keratinocytes. Int. J. Cancer 99, 776–782 (2002).

2. Urdea, M. et al. Branched DNA amplification multimers for the sensitive, direct detection of human hepatitis virus. Nucleic Acids Symp. Ser. 24, 197–200 (1991).

3. Gleaves, C.A. et al. Multicenter evaluation of the Bayer VERSANT HIV-1 RNA 3.0 assay: analytical and clinical performance. J. Clin. Virol. 25, 205–216 (2002).

4. Bustin, S.A. (ed.). A-Z of Quantitative PCR. (International University Line Biotechnology Series, La Jolla, California, USA, 2004).

5. Wong, M.L. & Medrano, J.F. Real-time PCR for mRNA quantitation. Biotechniques 39, 75–85 (2005).

6. Lee, L.G., Connell, C.R. & Bloch, W. Allelic discrimination by nick-translation PCR with fluorogenic probes. Nucleic Acids Res. 21, 3761–3766 (1993).

7. Heid, C.A., Stevens, J., Livak, K.J. & Williams, P.M. Real time quantitative PCR. Genome Res. 6, 986–994 (1996).

8. Gibson, U.E., Heid, C.A. & Williams, P.M. A novel method for real time quantitative RT-PCR. Genome Res. 6, 995–1001 (1996).

9. Qin, L.X. et al. Evaluation of methods for oligonucleotide array data via quantitative real-time PCR. BMC Bioinformatics 7, 23 (2006).

10. Kuo, W.P. et al. A sequence-oriented comparison of gene expression measurements across different hybridization-based technologies. Nat. Biotechnol. 24, 832–840 (2006).

11. Wang, Y. et al. Large scale real-time PCR validation on gene expression measurements from two commercial long-oligonucleotide microarrays. BMC Genomics 7, 59 (2006).

12. Willey, J.C. et al. Standardized RT-PCR and the standardized expression measurement center. Methods Mol. Biol. 258, 13–41 (2004).

13. Rots, M.G. et al. mRNA expression levels of methotrexate resistance-related proteins in childhood leukemia as determined by a standardized competitive template-based RT-PCR method. Leukemia 14, 2166–2175 (2000).

14. Mullins, D.N. et al. CEBPG transcription factor correlates with antioxidant and DNA repair genes in normal bronchial epithelial cells but not in individuals with bronchogenic carcinoma. BMC Cancer 5, 141 (2005).

15. Flagella, M. et al. A multiplex branched DNA assay for parallel quantitative gene expression profiling. Anal. Biochem. 352, 50–60 (2006).

16. Yao, J.D. et al. Multicenter Evaluation of the VERSANT Hepatitis B Virus DNA 3.0 Assay. J. Clin. Microbiol. 42, 800–806 (2004).

17. Elbeik, T. et al. Multicenter Evaluation of the Performance Characteristics of the Bayer VERSANT HCV RNA 3.0 Assay (bDNA). J. Clin. Microbiol. 42, 563–569 (2004).

18. Stenman, J. & Orpana, A. Accuracy in amplification. Nat. Biotechnol. 19, 1011–1012 (2001).

19. Cleveland, W. Robust locally weighted regression and smoothing scatter plots. J. Am. Stat. Assoc. 74, 829–836 (1979).

20. MAQC Consortium. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol. 24, 1151–1161 (2006).

21. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B. Met. 57, 289–300 (1995).

22. Shippy, R. et al. Performance evaluation of commercial short-oligonucleotide microarrays and the impact of noise in making cross-platform correlations. BMC Genomics 5, 61 (2004).

23. Livak, K.J. & Schmittgen, T.D. Analysis of relative gene expression data using real-time quantitative PCR and the 2-∆∆CT Method. Methods 25, 402–408 (2001).

24. Kern, D. et al. An enhanced-sensitivity branched-DNA assay for quantification of human immunodeficiency virus type 1 RNA in plasma. J. Clin. Microbiol. 34, 3196–3202 (1996).

25. Wang, J. et al. Regulation of insulin preRNA splicing by glucose. Proc. Natl Acad. Sci. USA 94, 4360–4365 (1997).

26. Shippy, R. et al. Using RNA sample titrations to assess microarray platform performance and normalization techniques. Nat. Biotechnol. 24, 1123–1131 (2006).

27. Tong, W. et al. Evaluation of external RNA controls for the assessment of microarray performance. Nat. Biotechnol. 24, 1132–1139 (2006).



Using RNA sample titrations to assess microarray platform performance and normalization techniques
Richard Shippy1, Stephanie Fulmer-Smentek2, Roderick V Jensen3, Wendell D Jones4, Paul K Wolber2, Charles D Johnson5, P Scott Pine6, Cecilie Boysen7, Xu Guo8, Eugene Chudin9, Yongming Andrew Sun10, James C Willey11, Jean Thierry-Mieg12, Danielle Thierry-Mieg12, Robert A Setterquist13, Mike Wilson5, Anne Bergstrom Lucas2, Natalia Novoradovskaya14, Adam Papallo3, Yaron Turpaz8, Shawn C Baker9, Janet A Warrington8, Leming Shi15 & Damir Herman12

We have assessed the utility of RNA titration samples for evaluating microarray platform performance and the impact of different normalization methods on the results obtained. As part of the MicroArray Quality Control project, we investigated the performance of five commercial microarray platforms using two independent RNA samples and two titration mixtures of these samples. Focusing on 12,091 genes common across all platforms, we determined the ability of each platform to detect the correct titration response across the samples. Global deviations from the response predicted by the titration ratios were observed. These differences could be explained by variations in relative amounts of messenger RNA as a fraction of total RNA between the two independent samples. Overall, both the qualitative and quantitative correspondence across platforms was high. In summary, titration samples may be regarded as a valuable tool, not only for assessing microarray platform performance and different analysis methods, but also for determining some underlying biological features of the samples.

Microarrays are widely used to simultaneously measure the levels of thousands of RNA targets in a biological sample. Despite their widespread use, many in the community are concerned about the comparability of results obtained using different microarray platforms and thus the biological relevance of the qualitative and quantitative results obtained. Microarray platform performance has been evaluated before on the criteria of sensitivity, specificity, dynamic range, precision and accuracy1–12. As part of the MicroArray Quality Control (MAQC) project, similar assessments have also been reported13,14. Other studies have used defined mixtures of RNA samples (titration samples) for interplatform2,15 and interlaboratory15 comparisons. Here we have investigated an alternative performance metric: the ability of different microarray platforms to accurately detect a signal trend produced by mixing samples (the titration trend), and the effects of normalization and other data analysis practices on this performance characteristic. Gene expression levels were measured for two pure samples and two mixtures using five different commercial whole-genome platforms at three different test sites per platform. The five commercially available whole-genome platforms tested were Applied Biosystems (ABI), Affymetrix (AFX), Agilent Technologies (AG1), GE Healthcare (GEH) and Illumina (ILM). The level of accurate titration response was quantified by determining the number of probes for which the average signal response in the titration samples was consistent with the response in the independent, reference RNA samples. We analyzed every platform at each site, and here we present comparisons of the various platforms using various data processing and normalization techniques.

To assess the titration response of as many genes as possible, an a priori expectation of differential expression of many transcripts was necessary. On the basis of results from pilot titration studies (data not shown), we elected to use two independent samples (A, Stratagene Universal RNA, and B, Ambion Human Brain RNA) that showed large, statistically significant differences in expression for a large number of transcripts to generate the two titration samples (C and D, consisting of 3:1 and 1:3 ratios of A to B, respectively; see Fig. 1). We defined the series of mean signals generated by a gene on a microarray platform across these samples as its titration response. For these analyses, we assumed

1GE Healthcare, 7700 S. River Pkwy., Suite #2603, Tempe, Arizona 85284, USA. 2Agilent Technologies, Inc., 5301 Stevens Creek Blvd., Santa Clara, California 95051, USA. 3University of Massachusetts-Boston, 100 Morrissey Blvd., Boston, Massachusetts 02125, USA. 4Expression Analysis, Inc., 2605 Meridian Pkwy., Durham, North Carolina 27713, USA. 5Asuragen, Inc., 2150 Woodward, Austin, Texas 78744, USA. 6Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland 20993, USA. 7ViaLogy, 2400 Lincoln Ave, Altadena, California 91001, USA. 8Affymetrix, Inc., 3420 Central Expressway, Santa Clara, California 95051, USA. 9Illumina, Inc., 9885 Towne Centre Dr., San Diego, California 92121, USA. 10Applied Biosystems, 850 Lincoln Centre Dr., Foster City, California 94404, USA. 11University of Toledo, Toledo, Ohio 43606, USA. 12National Center for Biotechnology Information, Bethesda, Maryland 20894, USA. 13Applied Biosystems, 2150 Woodward, Austin, Texas 78744, USA. 14Stratagene, 11011 N. Torrey Pines Rd., La Jolla, California 92037, USA. 15National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Rd., Jefferson, Arkansas 72079, USA. Correspondence should be addressed to R.S. ([email protected]).

Published online 8 September 2006; doi:10.1038/nbt1241

ANALYSIS

© 2006 Nature Publishing Group  http://www.nature.com/naturebiotechnology

1124 VOLUME 24 NUMBER 9 SEPTEMBER 2006 NATURE BIOTECHNOLOGY

that the expression measurement of a transcript in a titration sample follows a linear titration relationship: the signal of any given transcript in the two titration samples should be a linear combination of the signals produced by the two independent samples. From the signal intensities in the microarray titration experiments, we obtained the percentage of genes on each platform that showed a monotonic titration response and analyzed that percentage as a function of the magnitude of differential expression between A and B or as a function of the signal intensity.
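The linear titration relationship above implies expected mixture signals C = 0.75A + 0.25B and D = 0.25A + 0.75B for each gene, and hence a predictable signal ordering. A minimal sketch of this expectation and the monotonicity test (the signal values are hypothetical):

```python
def expected_titration(a, b):
    """Expected signals in C (3:1 A:B) and D (1:3 A:B) under the
    linear titration relationship."""
    c = 0.75 * a + 0.25 * b
    d = 0.25 * a + 0.75 * b
    return c, d

def is_monotonic(a, c, d, b):
    """True if the four signals follow the titration order expected
    from the direction of change between A and B."""
    return (a > c > d > b) or (b > d > c > a)

# Hypothetical mean signals for one gene in samples A and B
a, b = 800.0, 200.0
c, d = expected_titration(a, b)   # c = 650.0, d = 350.0
assert is_monotonic(a, c, d, b)   # A > C > D > B holds
```

A gene whose measured means violate this ordering is counted as not titrating in the analyses below.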

Many normalization methods have been developed and are commonly used for different microarray platforms16–24, including those methods that have been recommended by the array manufacturers for the MAQC project13 (see Methods). Differences in these methods significantly influence several aspects of microarray performance, including precision and sensitivity9,16–20,23,24. However, no clear consensus exists in the microarray community as to which method is best under a given set of circumstances. The optimal normalization or scaling methods for a given dataset may depend both on the experiment and on many attributes of that microarray dataset, including signal distribution and noise characteristics25. The experimental design used here is valuable for assessing the influence of different data processing techniques on the self-consistency of microarray data with regard to titration response. In addition, the different data processing techniques were also analyzed with respect to their impact on the statistical power of these platforms to distinguish between the independent and titration samples. The titration analysis presented here was applied to all commercial whole-genome microarray platforms tested in the MAQC project13, using various data processing techniques, to evaluate the self-consistency and statistical power of the resulting data.
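As one concrete example of the normalization methods compared in this study, quantile normalization forces every array to share the same empirical signal distribution. A minimal sketch (this simple version breaks ties by rank order rather than averaging them, unlike some standard implementations):

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize a genes-by-arrays signal matrix so that every
    array (column) shares the same empirical distribution."""
    # Rank of each value within its column
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    # Reference distribution: mean of each sorted quantile across arrays
    mean_quantiles = np.sort(x, axis=0).mean(axis=1)
    # Replace each value by the reference value at its rank
    return mean_quantiles[ranks]

x = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]])
xn = quantile_normalize(x)
# After normalization, every column contains the identical set of values
```

Scaling methods, by contrast, adjust each array by a single factor (e.g., to a common median or 75th percentile), leaving the shape of each array's distribution unchanged.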

When assessing accuracy in experimental systems, the goal is to compare observed results to the expected ‘true’ values of the system. For most experiments measuring gene expression, the ‘true’ values are either unknown or difficult to measure independently. However, the titration response results presented here can provide some quantitative information about the relative accuracy of measurements of differential gene expression. Monotonicity in the titration response indicates a self-consistent relationship among the expression measurements from the four samples. Because many inferences drawn from microarray experiments depend as much or more on the direction of expression changes as on their magnitudes, the consistency with which microarray assays determine direction of change is an important performance characteristic. The main advantages of our method are that titration responses can be assessed on a large scale, independent of a designated reference platform, and that it does not require substantial assumptions to be made about the data2,25.

RESULTS

The experimental design of the main MAQC study is described in detail elsewhere13. Briefly, two independent RNA samples were chosen for study and used to generate two titration samples. The gene-expression profiles of these samples, all split from a single pool, were measured on ten gene-expression measurement platforms. For each of the five whole-genome microarray platforms examined in this study, the samples were analyzed at three different test sites, each with ≤5 replicate assays per sample, for a total of 293 microarray hybridizations at 15 different sites. Data from all platforms were then processed using the recommended method from each array manufacturer, as represented in the main MAQC paper13, as well as one or more alternative normalization methods.

Using probe sequence information, we identified 12,091 genes that were uniquely targeted by at least one probe for all five commercial whole-genome microarray platforms. For each platform, only the probe closest to the 3′ end of the gene was considered13. We chose to exclude genes that were not detected across all samples and focused on genes whose signals were above the noise level and therefore more reliable10. Each manufacturer provided quantitative detection calls characterizing the probability that a gene was detected in a given replicate13. For most analyses, only genes detected in at least three replicates for a given sample and site were considered. This detection-call protocol is the same as described in the main MAQC paper13.
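The detection-call filter described above can be sketched as follows, assuming a binary gene-by-replicate call matrix for one sample at one site (the example calls and the five-replicate layout are hypothetical):

```python
import numpy as np

# Binary detection calls (1 = detected) for three genes across five
# replicate assays; a gene passes the filter if it is detected in at
# least three replicates for the given sample and site.
calls = np.array([
    [1, 1, 1, 0, 1],   # detected in 4/5 replicates -> keep
    [1, 0, 0, 1, 0],   # detected in 2/5 replicates -> drop
    [1, 1, 1, 1, 1],   # detected in 5/5 replicates -> keep
])
detected = calls.sum(axis=1) >= 3
print(detected)   # [ True False  True]
```

In the actual analysis this filter is applied per sample and per site, and a gene is retained only when it passes in all four samples.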

Measuring titration response as a function of fold change

The chief advantage of an experiment that evaluates gene expression in a series of known mixtures of two samples is that the rank order of measured expression levels of any given gene across the series can be predicted from the relative expression levels in the two original samples. For the series described in this paper, if the true expression level (Ai) of any gene i in sample A is greater than the true expression level (Bi) of the same gene i in sample B, then Ai > Ci > Di > Bi, where Ci and Di are the true expression levels of gene i in samples C and D. If Bi > Ai, then Bi > Di > Ci > Ai. In our case, if we postulate Ai > Bi on the basis of the observed sample mean Āi being significantly larger (P < 0.001) than the observed sample mean B̄i, then we expect Āi > C̄i > D̄i > B̄i. Finally, if Ai ≈ Bi, then the order of observed means will be nearly random.

In Figure 2, the percentage of genes in a 100-gene moving window that produce the expected titration response for each site and platform is plotted as a function of the average Āi/B̄i ratio of those 100 genes, when Āi > B̄i (left side of graph), or of the B̄i/Āi ratio, when B̄i > Āi (right side of graph). The x-axis origin of these graphs is at Āi/B̄i = B̄i/Āi = 1, the ratio at which the titration response changes direction. The overall shapes of all of the curves are similar: as expected from theory, they rise from a value near zero at Āi/B̄i = B̄i/Āi = 1 to an asymptote of 100% at larger values of Āi/B̄i or B̄i/Āi. Figure 2 also illustrates how alternative normalization methods (for AFX, alternative data reduction methods of the individual features) affect the quantitative outcome. For example, the data from the different test sites for AG1 show distinct behaviors under the standard normalization, but exhibit much more similar titration behaviors when normalized using the alternative method. In addition, for the AFX data, GCRMA processing26 (a modified version

[Figure 1 schematic: independent samples A and B; titration samples C (75% A + 25% B) and D (25% A + 75% B).]

Figure 1 RNA samples. We used expression measurements from two independent total RNA samples, A and B, and mixtures of these two samples at defined ratios of 3:1 (C) and 1:3 (D). The titration mixtures were generated once for all experiments, with samples A and B at equal total RNA concentrations as determined by A260.


of robust multichip analysis (RMA) processing that models intensity of probe-level data as a function of GC content) results in titration curves with a broader spread than those produced by probe logarithmic intensity error (PLIER)21 or RMA18. It should be noted that the different data processing techniques also yield different numbers of genes showing significant deviations in expression values between samples A and B (Fig. 2 and Table 1), which can also influence titration performance. The most striking differences resulting from normalization techniques are seen with the ILM data, where the alternative method, invariant scaling, resulted in many fewer significant genes on the left side of the panel as well as lower percentages of genes that titrate at lower fold changes.

The quantitative differences between the various curves shown in Figure 2 are listed in Table 1, which presents the ratios at which 50%, 75% or 90% of the detected genes show a monotonic titration response. The performances observed for different sites and platforms were similar but not identical (Table 1). Many different platforms and sites identified the correct ordering of the titration samples for more than 90% of genes with a twofold difference between A and B (Table 1, rows 14 and 17), which suggests that the DNA microarrays can reliably distinguish very small fold differences in the mixture samples. The differences resulting from alternative normalization techniques are also apparent in the results presented in Figure 2 and Table 1.
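The 100-gene moving-window calculation underlying Figure 2 can be sketched roughly as follows, using synthetic fold-change ratios and titration flags rather than real MAQC data (the toy probability model linking fold change to titration success is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
ratio = np.sort(rng.uniform(1.0, 4.0, n))   # sorted A/B fold changes

# Synthetic flag for whether each gene showed A > C > D > B; in this toy
# model, larger fold changes titrate more reliably.
titrates = rng.random(n) < (1 - 1 / ratio**2)

window = 100
# Percentage of titrating genes in each 100-gene moving window (y-axis)
pct = np.array([100 * titrates[i:i + window].mean()
                for i in range(n - window + 1)])
# Average fold-change ratio of each window (x-axis)
x = np.array([ratio[i:i + window].mean()
              for i in range(n - window + 1)])
# pct rises from near zero at ratios near 1 toward 100% at larger ratios
```

The same computation is run separately per site and per monotonic direction, and the resulting curves are smoothed before plotting.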

Measuring titration response as a function of signal intensity

To further explore the impact of different normalization techniques, we assessed titration response as a function of signal intensity. In Figure 3, we plot the fraction of genes that titrate relative to the total number of genes in the given intensity range, as a function of the lowest signal in the monotonic titration trend. That is, for the monotonic trend Āi > C̄i > D̄i > B̄i, we plotted this fraction against the signal intensity B̄i (solid lines), whereas for the opposite trend B̄i > D̄i > C̄i > Āi, we used the intensity Āi (dashed lines). We observed that, in general, the fraction of genes that titrate is inversely proportional to the signal intensity. The signal plotted on the x-axis is the lowest signal in the series; therefore, when this signal is low, the probes are more likely to show the expected titration response, as the fold differences will tend to be larger. When the magnitude of this lowest signal increases, the possible fold difference between A and B will decrease.

Differences in distribution among platforms and normalization methods are evident. For ABI, the fraction of genes that titrate follows the same trend as for the other platforms when A > B (Fig. 3, solid lines),

[Figure 2: twelve panels (ABI - quantile; ABI - scaling; AG1 - median scaling; AG1 - 75th percentile scaling; AFX - PLIER; AFX - MAS5; AFX - RMA; AFX - GCRMA; GEH - median scaling; GEH - quantile; ILM - quantile; ILM - invariant scaling), each plotting the percentage of genes that titrate (y-axis, 0–100%) against the average linear A/B or B/A ratio (x-axis, 1.0–4.0); the per-site gene counts shown in each panel for the two monotonic directions are given in Table 1, rows 8 and 9.]

Figure 2 Percentage of genes showing the monotonic titration responses Āi > C̄i > D̄i > B̄i and B̄i > D̄i > C̄i > Āi plotted against the linear Āi/B̄i and B̄i/Āi ratios, respectively, for each commercial whole-genome microarray platform, using various normalization methods. All graphs were generated from the set of 12,091 genes common across whole-genome platforms, with outlier arrays excluded per manufacturer’s recommendations13. Genes detected across all four samples per site that were also significantly differentially expressed (P < 0.001) in independent samples A and B were used in the calculations (Table 1, rows 4 and 5). A two-sample t-test, with equal variance, was performed within each site on log2 expression values. For each platform, a 100-probe moving window, based on sorted Āi/B̄i ratios (left side of plot) or B̄i/Āi ratios (right side of plot), was used to calculate the percentage of self-consistent monotonic titration response genes (y-axis) as a function of the corresponding moving average of Āi/B̄i or B̄i/Āi ratios (x-axis) within each site. Graphs are plotted with a scale break between –1 and 1, with reassignment of the x-axis for clarity. Each graph contains six series of data points (three sites in two monotonic directions), which were smoothed using a distance-weighted least-squares method. Blue, site 1; red, site 2; gray, site 3. Total numbers of genes showing the monotonic trend for each site are indicated in each graph, for both directions (Āi > C̄i > D̄i > B̄i for Āi/B̄i ratios >1 and B̄i > D̄i > C̄i > Āi for B̄i/Āi ratios >1), and are also listed in Table 1 (rows 8 and 9). The normalization methods highlighted in yellow for each platform represent the manufacturer’s recommended method used in the MAQC main paper13.


Table 1 Gene counts for AFX and ABI (top) and AG1, GEH and ILM (bottom) for each normalization method

Quantile (ABI) | Scaling (ABI) | PLIER (AFX) | MAS 5.0 (AFX) | RMA (AFX) | GCRMA (AFX)

Row Condition ABI_1 ABI_2 ABI_3 ABI_1 ABI_2 ABI_3 AFX_1 AFX_2 AFX_3 AFX_1 AFX_2 AFX_3 AFX_1 AFX_2 AFX_3 AFX_1 AFX_2 AFX_3

1 Detected in A · B · C · D 8,049 7,863 8,550 8,049 7,863 8,550 7,359 7,006 7,424 7,359 7,006 7,424 7,359 7,006 7,424 7,359 7,006 7,424

2 A > B 4,284 4,191 4,509 4,308 4,219 4,424 4,423 4,291 4,557 4,244 4,040 4,267 4,414 4,192 4,440 4,356 4,125 4,376

3 B > A 3,765 3,672 4,041 3,741 3,644 4,126 2,936 2,715 2,867 3,115 2,966 3,157 2,945 2,814 2,984 3,003 2,881 3,048

4 A > B and P < 0.001 3,144 2,298 3,046 3,143 2,376 3,037 3,723 3,632 3,848 2,982 2,934 3,168 3,559 3,491 3,670 3,420 3,273 3,490

5 B > A and P < 0.001 2,572 1,886 2,436 2,571 1,930 2,494 2,356 2,176 2,306 2,074 1,999 2,182 2,272 2,274 2,372 2,224 2,172 2,303

6 A > C > D > B 3,063 2,924 3,159 3,296 3,104 3,256 3,042 3,751 3,616 2,493 3,111 3,258 2,862 3,462 3,479 2,708 3,297 3,407

7 B > D > C > A 2,471 2,424 2,622 2,670 2,487 2,772 1,924 2,154 2,222 1,873 2,089 2,170 1,858 2,100 2,087 1,829 2,071 2,075

8 A > C > D > B and P < 0.001 2,806 2,169 2,740 2,960 2,285 2,807 2,938 3,520 3,517 2,290 2,772 2,966 2,772 3,305 3,365 2,581 3,092 3,227

9 B > D > C > A and P < 0.001 2,240 1,803 2,198 2,355 1,844 2,312 1,869 2,038 2,132 1,696 1,834 1,951 1,781 2,020 2,015 1,720 1,931 1,956

10 (A > C > D > B) / (A > B) 0.71 0.70 0.70 0.77 0.74 0.74 0.69 0.87 0.79 0.59 0.77 0.76 0.65 0.83 0.78 0.62 0.80 0.78

11 (B > D > C > A) / (B > A) 0.66 0.66 0.65 0.71 0.68 0.67 0.66 0.79 0.78 0.60 0.70 0.69 0.63 0.75 0.70 0.61 0.72 0.68

12 50% titrate when A/B = 1.35 1.35 1.36 1.28 1.32 1.32 1.30 1.13 1.20 1.52 1.28 1.30 1.40 1.18 1.25 1.60 1.28 1.32

13 75% titrate when A/B = 1.58 1.65 1.65 1.45 1.60 1.60 1.65 1.20 1.30 1.98 1.45 1.50 1.70 1.32 1.42 2.05 1.47 1.58

14 90% titrate when A/B = 1.80 1.98 1.99 1.68 1.90 1.94 2.10 1.30 1.52 3.00 1.67 1.78 2.10 1.42 1.61 2.80 1.68 1.85

15 50% titrate when B/A = 1.43 1.42 1.45 1.34 1.35 1.40 1.39 1.20 1.22 1.53 1.30 1.36 1.44 1.22 1.30 1.63 1.35 1.47

16 75% titrate when B/A = 1.77 1.80 1.88 1.60 1.75 1.83 1.68 1.37 1.38 1.82 1.45 1.52 1.75 1.40 1.50 2.22 1.65 1.80

17 90% titrate when B/A = 2.08 2.23 2.40 1.85 2.12 2.30 2.05 1.49 1.50 2.50 1.75 1.87 2.15 1.58 1.68 2.90 2.10 2.30

18 A/B > 2.00 1,794 1,664 1,830 1,813 1,718 1,808 1,703 1,602 1,832 1,759 1,548 1,756 1,693 1,468 1,702 2,178 2,062 2,255

19 B/A > 2.00 1,636 1,562 1,745 1,634 1,548 1,793 1,171 1,028 1,136 1,360 1,202 1,346 1,172 1,017 1,141 1,462 1,378 1,501

20 A/B > 2.00 (P < 0.001) 1,772 1,558 1,802 1,793 1,626 1,782 1,703 1,602 1,832 1,732 1,542 1,748 1,693 1,468 1,700 2,168 2,049 2,233

21 B/A > 2.00 (P < 0.001) 1,613 1,423 1,672 1,612 1,435 1,716 1,171 1,028 1,136 1,350 1,195 1,335 1,171 1,017 1,141 1,447 1,365 1,487

Median scaling (AG1) | 75th percentile scaling (AG1) | Median scaling (GEH) | Quantile (GEH) | Quantile (ILM) | Invariant scaling (ILM)

Row Condition AG1_1 AG1_2 AG1_3 AG1_1 AG1_2 AG1_3 GEH_1 GEH_2 GEH_3 GEH_1 GEH_2 GEH_3 ILM_1 ILM_2 ILM_3 ILM_1 ILM_2 ILM_3

1 Detected in A · B · C · D 8,322 8,468 9,121 8,322 8,468 9,121 10,416 10,505 10,289 10,416 10,505 10,289 7,995 7,761 7,555 7,995 7,761 7,555

2 A > B 5,046 4,922 5,051 4,624 4,705 5,027 6,324 6,537 6,161 6,173 6,275 6,123 4,505 4,349 4,221 3,670 3,512 3,009

3 B > A 3,276 3,546 4,070 3,698 3,763 4,094 4,092 3,968 4,128 4,243 4,230 4,166 3,490 3,412 3,334 4,325 4,249 4,546

4 A > B and P < 0.001 3,711 3,763 3,710 3,443 3,624 3,807 3,998 4,753 4,393 4,042 4,582 4,512 3,657 3,289 2,808 2,868 2,479 1,769

5 B > A and P < 0.001 2,057 2,439 2,839 2,447 2,707 2,958 2,238 2,352 2,632 2,409 2,586 2,772 2,713 2,473 2,051 3,384 3,068 2,960

6 A > C > D > B 4,249 3,714 2,923 3,430 3,218 3,460 4,413 4,314 4,381 4,637 4,308 4,917 3,204 3,170 2,924 2,097 1,945 1,989

7 B > D > C > A 2,304 2,357 2,848 2,384 2,377 2,703 2,167 2,230 2,258 2,718 2,653 2,833 2,198 2,153 2,059 3,426 3,221 3,697

8 A > C > D > B and P < 0.001 3,654 3,435 2,697 3,138 3,048 3,254 3,809 4,063 4,034 3,902 3,977 4,352 3,128 3,002 2,543 1,981 1,755 1,542

9 B > D > C > A and P < 0.001 1,977 2,168 2,589 2,164 2,256 2,538 1,918 2,008 2,091 2,251 2,326 2,496 2,136 2,038 1,792 3,152 2,882 2,900

10 (A > C > D > B) / (A > B) 0.84 0.75 0.58 0.74 0.68 0.69 0.70 0.66 0.71 0.75 0.69 0.80 0.71 0.73 0.69 0.57 0.55 0.66

11 (B > D > C > A) / (B > A) 0.70 0.66 0.70 0.64 0.63 0.66 0.53 0.56 0.55 0.64 0.63 0.68 0.63 0.63 0.62 0.79 0.76 0.81

12 50% titrate when A/B = 1.24 1.35 1.60 1.38 1.48 1.43 1.34 1.45 1.40 1.25 1.38 1.25 1.32 1.30 1.34 1.52 1.55 1.32

13 75% titrate when A/B = 1.39 1.66 2.15 1.53 1.75 1.70 1.50 1.70 1.53 1.40 1.62 1.38 1.50 1.49 1.54 2.08 2.08 1.65

14 90% titrate when A/B = 1.55 2.09 3.20 1.68 2.02 2.02 1.65 1.95 1.66 1.60 1.95 1.55 1.65 1.70 1.72 2.72 2.80 2.15

15 50% titrate when B/A = 1.39 1.45 1.40 1.52 1.57 1.48 1.46 1.44 1.51 1.30 1.35 1.30 1.44 1.45 1.41 1.26 1.30 1.25

16 75% titrate when B/A = 1.76 1.87 1.70 1.90 1.92 1.87 1.65 1.65 1.70 1.50 1.58 1.50 1.74 1.81 1.69 1.42 1.47 1.47

17 90% titrate when B/A = 2.30 2.60 2.05 2.50 2.35 2.33 1.87 1.85 1.88 1.72 1.80 1.72 2.00 2.14 1.93 1.65 1.70 1.75

18 A/B > 2.00 2,570 2,435 2,284 2,179 2,236 2,262 2,363 2,772 2,640 2,216 2,522 2,570 1,620 1,602 1,446 1,377 1,298 1,063

19 B/A > 2.00 1,556 1,714 1,901 1,790 1,843 1,916 1,351 1,351 1,453 1,373 1,432 1,451 1,382 1,371 1,254 2,008 1,969 2,227

20 A/B > 2.00 (P < 0.001) 2,504 2,393 2,249 2,136 2,197 2,227 2,339 2,757 2,616 2,200 2,508 2,545 1,620 1,602 1,430 1,377 1,290 1,045

21 B/A > 2.00 (P < 0.001) 1,458 1,673 1,883 1,672 1,802 1,901 1,340 1,347 1,443 1,356 1,427 1,437 1,382 1,365 1,238 2,004 1,942 2,146

Row 1 lists the number of genes detected in all four samples for each platform, separated by site. Rows 2 and 3 represent the number of concordantly detected genes for Ā > B̄ and B̄ > Ā, respectively. The sum of rows 2 and 3 for each column is identical to the gene count in row 1. Rows 4 and 5 represent the number of concordantly detected, statistically significant (P < 0.001) genes for Ā > B̄ and B̄ > Ā. Rows 6 and 7 represent the number of detected genes that show the monotonic titration trends Ā > C̄ > D̄ > B̄ and B̄ > D̄ > C̄ > Ā. Rows 8 and 9 represent the number of statistically significant (P < 0.001), concordantly detected genes that show the monotonic titration trends Ā > C̄ > D̄ > B̄ and B̄ > D̄ > C̄ > Ā. The statistical test used was a two-sample t-test, using equal variance, calculated within each site and comparing log2 expression values between the independent samples A and B. The gene counts in rows 8 and 9 are also indicated in Figure 2 for each monotonic direction. Rows 10 and 11 translate the previous rows into percentages of genes showing the monotonic titration trend. Rows 12–17 summarize Figure 2 for three specific y-axis values (50%, 75% and 90% of genes titrate at the listed average fold changes). Rows 18 and 19 show the numbers of genes for which Ā/B̄ > 2 and B̄/Ā > 2. Rows 20 and 21 show the numbers of statistically significant (P < 0.001) genes used to create the box plots in Figure 4. Columns highlighted in blue, for each platform, represent the manufacturer’s recommended normalization methods used in the main MAQC paper13. More detailed gene counts with cross-site intersections can be found in Supplementary Table 1 online.


but when B > A (dotted lines), these data show a sudden increase in that fraction at high intensity. This effect, although still present, is much less distinct for the scaled than for the quantile-normalized data. We saw improved reproducibility among sites and concordance between the two titration trends with AG1 75th percentile scaling relative to median scaling. For the AFX-PLIER data, the signal range across which a titration response is elicited is smaller than for the other platforms and normalization methods, possibly owing to the variance stabilization used in the PLIER method. In all cases, the AFX data show lower percentages for site 1, as in Figure 2. For the GEH data, median normalization results in a very clear distinction between the two different titration patterns; this distinction is moderated by quantile normalization. The data for the ILM rank invariant scaling indicate a larger number of genes showing the titration response B̄i > D̄i > C̄i > Āi than showing the opposite trend, a result not seen for any other platform or normalization method. Unlike in Figure 2, the percentage of titrating genes never reaches 100% because, at all signal ranges, some genes show only very small differences in expression across the samples and are more likely to yield a near-random ordering in their titration responses.

Analysis of titration mixtures

An underlying assumption for this study was that the proportions of each mRNA in the mixture samples (C and D) from each of the original samples (A and B) are equivalent to the mixing proportions of the total RNA. For this assumption to be true, the fractions of each mRNA in the total RNA samples A and B had to be the same and had to be processed by the various biochemical systems with equal efficiencies. Using mathematical modeling, we investigated whether we could derive the relative mRNA contents of the two independent samples using the microarray data from the independent and titration samples (see Methods). Such modeling defines the true fractions of mRNA derived from sample A in titration samples C and D as αC and αD, and the true fractions of mRNA derived from sample B in titration samples C and D as βC and βD (see Box 1 and Supplementary Fig. 5). Figure 4 shows the results of this modeling for all the platforms and normalization methods, with the y-axes representing the estimates of βC (bottom) and βD (top). The lower charts show median values of βC centered on 0.18 but usually larger for Āi > B̄i (left) than for B̄i > Āi (right), and the upper charts show median values of βD centered on 0.67. These deviations from the expected values of 0.25 and 0.75 based on the 3:1 mixtures of total RNA suggest that the mRNA concentrations of the A and B samples were not identical. From these results, we estimate the mRNA concentration in the B sample to be approximately two-thirds of the concentration in the A sample (see Box 1). An empirical evaluation of mRNA content in samples A and B is consistent with our estimates of 3% and 2%, respectively (see Methods).
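The reported medians follow arithmetically from unequal mRNA fractions. If mA and mB are the mRNA fractions of total RNA in samples A and B, the true mRNA fraction contributed by B to a mixture made with total-RNA proportion p of A and (1 − p) of B is β = (1 − p)·mB / (p·mA + (1 − p)·mB). A quick check using the empirical estimates mA ≈ 3% and mB ≈ 2%:

```python
def beta(p_a, m_a, m_b):
    """mRNA fraction contributed by sample B to a mixture made with
    total-RNA proportion p_a of A and (1 - p_a) of B."""
    return (1 - p_a) * m_b / (p_a * m_a + (1 - p_a) * m_b)

m_a, m_b = 0.03, 0.02            # empirical mRNA fractions of total RNA
beta_c = beta(0.75, m_a, m_b)    # sample C (3:1 A:B by total RNA)
beta_d = beta(0.25, m_a, m_b)    # sample D (1:3 A:B by total RNA)
print(round(beta_c, 2), round(beta_d, 2))   # 0.18 0.67
```

With equal mRNA fractions (mA = mB), the same formula returns the nominal values of 0.25 and 0.75, so the observed shift to 0.18 and 0.67 is exactly what a B-sample mRNA content of about two-thirds of A's predicts.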

The values calculated from the different platforms and normalization methods are generally similar, with two clear exceptions. For ILM, invariant scaling results in much lower estimates for βC and βD than the other platforms and normalization methods when A > B (left side) but not when B > A. This difference is consistent with the results noted for the titration response (Figs. 2 and 3). For ABI, the estimates of βC and βD are consistent with the other platforms when A > B but lower than the other platforms when B > A. This result was seen with both normalization methods, although to different extents, and may be related to the differences noted in Figure 3. The deviations for βC and βD are particularly noteworthy because of the relatively small errors of the ABI data in this analysis.

The individual microarray measurements for the titration coefficients shown in Figure 4 indicate that normalization and data-processing

[Figure 3: twelve panels (ABI - quantile; ABI - scaling; AG1 - median scaling; AG1 - 75th percentile scaling; AFX - PLIER; AFX - MAS5; AFX - RMA; AFX - GCRMA; GEH - median scaling; GEH - quantile; ILM - quantile; ILM - invariant scaling), each plotting the fraction of genes that titrate (y-axis, 0–0.6) against the average log2 signal (x-axis, –5 to 20), with solid lines for A > C > D > B and dashed lines for B > D > C > A at sites 1–3.]

Figure 3 Impact of normalization on the distributions of titrating genes as a function of signal intensity. Fractions of genes showing the monotonic titration responses Āi > C̄i > D̄i > B̄i and B̄i > D̄i > C̄i > Āi are plotted against B̄i (solid line) and Āi (dashed line), respectively. Histograms in each panel represent data from a different platform and normalization technique, separated by site and direction. Blue, site 1; red, site 2; gray, site 3. The data for these graphs were generated from the set of 12,091 genes common across the platforms that were significantly differentially expressed (P < 0.001) in samples A and B and detected in all four samples (Table 1, rows 4 and 5). All data are plotted on the same scale: the x-axis is normalized signal in log2 units and the y-axis shows the fraction of titrating probes relative to the total number of probes in the given intensity range. Bin centers are 0.5 apart on the log2 scale. To avoid spurious oscillations in the lowest and highest signal intensities, we plotted only bins with more than ten genes. Differences between normalization techniques are demonstrated by the differing signal ranges within a platform for the monotonic titration response. The normalization methods highlighted in yellow for each platform represent the manufacturer’s recommended method used in the MAQC main paper13.


differences are not the primary cause of the deviations from the theoretical values. Differences in mRNA abundance contribute to these deviations and may not be circumvented by normalization alone. Additionally, further analysis of microarray measurements from these titration mixtures may provide higher-resolution observations of the global tendency (Fig. 4) of estimates of βC and βD to be larger for A > B than for B > A (see Supplementary Fig. 1 online).

Effects of outlier data

During execution and analysis of the MAQC study, the consortium identified one outlier site and multiple outlier arrays on the basis of objective criteria of data quality13. In some cases, we evaluated the effects of not censoring such data from the analysis. The results (data not shown) were as expected: inclusion of low-quality data degraded both intra- and intermethod reproducibility. This result, although predictable, is nonetheless noteworthy because microarray experiments are expensive and are sometimes used to analyze samples that are available in very limited quantities. Discarding low-quality microarray data is thus painful. It is therefore important that the community develop shared standards of microarray data quality that allow use and interpretation of less-than-perfect data while preventing overinterpretation. The well-characterized RNA samples and all of the data (including outliers) produced by the MAQC study are a good start on the road to such data-quality standards. In particular, the titration experimental design used in this work may prove to be an important tool for developing such standards, as the experiments can be interpreted using a small number of plausible assumptions.

DISCUSSION
The MAQC titration study was conceived as an experiment that could be implemented across several platforms with a minimum of assumptions. One of the initial goals of the titration study was to assess relative accuracy by comparing observed expression in the titration samples with the expression expected on the basis of the known mixing ratios

of the two independent samples. This analysis proved to be more complex than originally anticipated, largely owing to the effects of different mRNA fractions in the two independent samples. However, the qualitative expectation of a particular signal ordering is still valid and provides a sensitive tool for differentiating microarray platform performance and normalization methods. As the measurement of titration response illustrates, different platforms and data analysis methods have slightly different performance optima: design and processing choices that increase the number of detected genes also tend to increase noise in the titration series. In addition to differences in the number of genes analyzed, the variations seen in Figure 2 and Table 1 can also result from differences in expression-ratio compression (leading to different ratios observed for any given gene) as well as levels of noise in each measurement. In general, the behaviors of various sites and platforms are quite similar.

The analysis of the titration mixtures reveals some interesting observations about the data. These results show asymmetry in the titration responses (Figs. 2 and 3) and in the estimates of the true fractions of mRNA in the titration samples (Fig. 4). This asymmetry may be caused in part by additional differences in the normalization of the A and B samples (Supplementary Fig. 1), may relate to greater difficulty in distinguishing A and C at low signal or may be a consequence of nonlinearity in the signal response relative to the concentration amounts (Supplementary Fig. 2 online). In addition, the results presented here demonstrate that the mRNA content of the two independent samples is not equal. This conclusion is supported by additional lines of evidence. First, an apparent power analysis27–30 (Supplementary Figs. 3 and 4 online) is asymmetric between the sample pairings (A, C) and (B, D). This asymmetry is probably the result of the A sample being more similar to C than B is to D. Second, the slopes of the linear trends for the titration sample/independent sample ratios (Supplementary Fig. 1) suggest that the ratio of sample A to B in sample C differs from the value expected from the total RNA ratios. Third, external spike-in RNA controls were included for several platforms; these controls were amplified and labeled along with the sample RNA and indicate that the A sample contains a higher percentage

Box 1 Modeling of titration mixtures
Ideally, the mRNA expression levels of each gene in samples C and D can be expressed mathematically as

C = αCA + βCB and D = αDA + βDB,

where A and B are the measured mRNA abundances of the gene in samples A and B, respectively, and αC, βC, αD and βD

are the mixture coefficients. If we impose the requirement that

αC + βC =1 and

αD + βD = 1 (if A = B, then C = A = B = D),

then elementary algebra can be used to derive simple formulas for βC and βD:

βC = (C – A)/(B – A) and

βD = (D – A)/(B – A).

If the mRNA fractions in samples A and B are identical and the normalization of samples A, B, C and D is exactly the same, then the measured fractions should be centered on the ideal mixture fractions of βC = 0.25 and βD = 0.75 (implying αC = 0.75 and αD = 0.25). However, different mRNA concentrations in the A and B samples and differences in the normalization of the four samples

for different platforms, sites and normalization methods can lead to deviations from these expected values (Fig. 4). For example, if the mRNA fractions for the A and B samples (termed a and b, respectively) are unequal (a ≠ b), then

C = ((0.75a)A + (0.25b)B)/(0.75a + 0.25b) and

D = ((0.25a)A + (0.75b)B)/(0.25a + 0.75b).

We can then express the true ratio of the B to A mRNA fractions as

b/a = 3βC/(1 – βC) = βD/(3(1 – βD))

(see Supplementary Fig. 5). Using the empirical measurements of βC and βD, we can then estimate these true mRNA fractions. For example, if the B fraction of sample C is βC ≈ 0.18, as indicated by microarray median values in Figure 4 (bottom), then we can deduce that the true ratio of mRNA fractions b/a is approximately 2:3. Moreover, these results predict that

βD = 9βC/(1 + 8βC) ≈ 0.67,

which is consistent with the empirical microarray results in Figure 4 (top).
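The Box 1 algebra can be checked numerically. The following is a minimal sketch, taking the a = 3% and b = 2% mRNA fractions quoted in the Figure 4 caption as the assumed inputs; the function names are ours:

```python
def beta_C(a, b):
    # Measured B-fraction in sample C (a 3:1 A:B mix by total RNA),
    # given mRNA/total-RNA fractions a and b for samples A and B
    return 0.25 * b / (0.75 * a + 0.25 * b)

def beta_D(a, b):
    # Measured B-fraction in sample D (a 1:3 A:B mix by total RNA)
    return 0.75 * b / (0.25 * a + 0.75 * b)

a, b = 0.03, 0.02  # assumed mRNA fractions for samples A and B (Fig. 4 caption)
bC, bD = beta_C(a, b), beta_D(a, b)
print(round(bC, 2), round(bD, 2))  # prints the 0.18 and 0.67 values of Figure 4

# Consistency checks against the Box 1 relations
assert abs(3 * bC / (1 - bC) - b / a) < 1e-9        # b/a = 3*betaC/(1 - betaC)
assert abs(bD / (3 * (1 - bD)) - b / a) < 1e-9      # b/a = betaD/(3(1 - betaD))
assert abs(9 * bC / (1 + 8 * bC) - bD) < 1e-9       # betaD = 9*betaC/(1 + 8*betaC)
```

With βC ≈ 0.18 this reproduces b/a ≈ 2:3 and βD ≈ 0.67, matching the text.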


of mRNA relative to the B sample31. Finally, a preliminary empirical analysis of mRNA content in the A and B samples (see Methods) confirmed that the mRNA content differs between the samples.

The discovery of a difference in the mRNA content of samples A and B has important implications for the future use of these commercially available samples in method calibration, proficiency testing and other activities requiring well-characterized, complex RNA. As a result of the MAQC study, these samples are probably the best-characterized complex RNA preparations available. The RNA-measurement community should complete the characterization of these samples by more accurately measuring the fraction of mRNA in each preparation, so that the scientific community can make better use of this resource.

The utility of the titration samples for assessing normalization and data preprocessing methods can be seen throughout the analyses presented here. Notably, for all platforms except AFX and ILM, the performance of the MAQC ‘standard’ normalization or data preprocessing method was slightly inferior to that of the secondary method, especially in the apparent power analysis (Supplementary Fig. 3). This result highlights the observation noted throughout this study that data processing methods determined to be optimal under one set of circumstances may not always prove appropriate under all conditions, particularly if primary assumptions underlying those data processing methods are violated.

A great strength of the design presented here is that, despite the added complexities of varying mRNA content, the qualitative expectation of a particular signal ordering is still valid, provided that the different data sets are properly scaled relative to one another. Therefore, this design is very valuable for assessing microarray performance. Specifically, as we have shown here, the titration response can be used to distinguish between normalization methods that are sensitive to changes in mRNA fraction and methods that are robust despite such changes. One observation of this study is that the robustness of a normalization

method depends in part on the subset of data used to determine the scaling constant or function. Our results indicate a path toward objective optimization of this normalization set. The differences in gene expression among samples may be greater and the variability across replicates may be smaller in this study than in typical biological experiments; nonetheless, the lessons learned regarding the use of titration mixtures to evaluate the performance and normalization of large-scale gene-expression measurements may have widespread application in more realistic settings. In addition, the wide range of gene expression in these samples probably served to amplify data processing–derived differences that would have been more difficult to detect in analyses of more closely matched samples.

Finally, it should be noted that the majority of genes considered here yielded very similar behavior across all platforms, in spite of the complications noted in this manuscript. Therefore, these results should be considered a testament to the underlying strength of all of the methods examined. Improvement of mRNA quantification methods remains an important objective, and the MAQC study has produced samples and data that will aid the community in making such improvements. The concordance of data presented here demonstrates that the methods used are sound and, when properly implemented and interpreted, can be used to measure expression levels of thousands of RNA targets simultaneously.

METHODS
Preparation of the RNA sample titrations. RNA samples are described in detail in the main MAQC paper13. Briefly, two commercially available total RNA solutions and their 3:1 and 1:3 mixtures were chosen at the outset by the members of the MAQC project. For simplicity, these samples were designated as A, B, C and D. A and B are independent total RNA samples: A is derived from a collection of ten human cell lines and B from human brain tissue. Sample A is sold commercially under the name Universal Human Reference RNA (Catalog number 740000, Stratagene). Sample B is sold commercially under the name FirstChoice Human Brain Reference RNA (Catalog number 6050, Ambion).

[Figure 4: box plots of βC = (C – A)/(B – A) and βD = (D – A)/(B – A), on axes from 0 to 1.0, for panels A > B and B > A, across the ABI, GEH, AFX, ILM and AG1 platforms under the Scaling, Quantile, 75th percentile, PLIER, MAS 5.0, RMA and GCRMA normalization methods at sites 1–3; see the caption below.]

Figure 4 Titration-response concordance for each commercial whole-genome microarray platform, using different normalization methods, with data from each platform separated by site and fold-change direction. Data shown are from the 12,091 genes common across whole-genome platforms. Box plots were generated in cases where a gene was detected across all samples per site and had a statistically significant (P < 0.001) A/B ratio >2 in the direction indicated. A two-sample t-test, with equal variance, was performed within each site on log2 expression values. Data for each site were split by direction of fold change: left, genes where A/B > 2; right, genes where B/A > 2 (all differences significant, P < 0.001, for both directions). Number of genes used for each box plot is indicated by individual site counts in Table 1 (rows 20 and 21). Each box represents the interquartile range, with median marked by a horizontal black line and 10th and 90th percentiles marked by the outer whiskers. Blue, site 1; red, site 2; gray, site 3. The horizontal dashed black lines represent expected values assuming 3% and 2% mRNA abundance levels for samples A and B, respectively. In other words, when the mRNA/total RNA fraction in A is equal to 3% and in B is equal to 2%, then βC = (C – A)/(B – A) = 0.18 (bottom two charts) and βD = (D – A)/(B – A) = 0.67 (top two charts). Refer to Box 1 for further details. Normalization methods highlighted in yellow for each platform represent the manufacturer’s recommended method used in the MAQC main paper13.


RNA titration samples were generated once for all MAQC experiments (Fig. 1), with samples A and B at equal concentrations as measured by A260. Sample C was made by mixing sample A with sample B at a volumetric ratio of 75:25, and sample D was made by mixing sample A with sample B at a volumetric ratio of 25:75.

Normalization methods used in this study. For ABI, we used quantile normalization17 independently for each test site and 90% trim mean scaling. For trim mean scaling, the highest 5% and lowest 5% of signals are removed, and the remaining 90% of signals are used to calculate the mean. The mean of each array is scaled to the same level, and the resulting scaling factor for each array is applied to its signals. Trim mean scaling was calculated independently for each test site.
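The trim mean scaling just described can be sketched as follows, assuming an arrays × probes signal matrix. The names and the grand-mean target are our choices; the text does not specify the common level to which arrays are scaled:

```python
import numpy as np

def trim_mean_scale(signals, trim=0.05, target=None):
    """90% trim mean scaling, as described for the ABI data (a sketch).

    signals: 2-D array, arrays x probes. The top and bottom `trim` fraction
    of each array's signals are excluded; each array is then multiplied by a
    factor bringing its trimmed mean to a common level.
    """
    signals = np.asarray(signals, dtype=float)
    lo = np.quantile(signals, trim, axis=1, keepdims=True)
    hi = np.quantile(signals, 1 - trim, axis=1, keepdims=True)
    kept = np.where((signals >= lo) & (signals <= hi), signals, np.nan)
    trimmed_means = np.nanmean(kept, axis=1)
    if target is None:
        target = trimmed_means.mean()  # assumed target: the grand trimmed mean
    factors = target / trimmed_means
    return signals * factors[:, None], factors
```

After scaling, every array shares the same trimmed mean, so the per-array factors directly expose global intensity differences between arrays.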

For AG1, the data were transformed so that signal values below 5 were set to 5. After this transformation, each measurement was divided by the median of all detected measurements in that sample (for median scaling) or by the 75th percentile of all measurements in that sample (for 75th percentile scaling).
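The AG1 transformation and scaling can be sketched as below. The floor value of 5 and the median/75th-percentile choices come from the text; everything else, including the optional detected-measurement mask, is illustrative:

```python
import numpy as np

def percentile_scale(signals, floor=5.0, q=0.75, detected=None):
    """Sketch of the AG1 scaling described above.

    Signals below `floor` are set to `floor`. Each array (row) is then
    divided by the median of its detected measurements (q=0.5 with a
    boolean `detected` mask) or by the 75th percentile of all its
    measurements (q=0.75, the default here).
    """
    x = np.maximum(np.asarray(signals, dtype=float), floor)
    if detected is not None and q == 0.5:
        denom = np.array([np.median(row[mask])
                          for row, mask in zip(x, detected)])
    else:
        denom = np.quantile(x, q, axis=1)
    return x / denom[:, None]
```

By construction, the chosen percentile of every scaled array equals 1, which is what makes arrays comparable afterward.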

For AFX data, we used PLIER21, MAS 5.0, RMA18 and GCRMA27 for data preprocessing and normalization. The PLIER method produces a summary value for a probe set by accounting for experimentally observed patterns in feature behavior and handling error appropriately at low and high abundance. PLIER accounts for the systematic differences between features by means of parameters termed feature responses, using one such parameter per feature (or pair of fea-tures, when using mismatch (MM) probes to estimate cross-hybridization signal intensities for background). Feature responses represent the relative differences in intensity between features hybridizing to a common target. PLIER produces a probe-set signal by using these feature responses to interpret intensity data, apply-ing dynamic weighting by empirical feature performance and handling error appropriately across low and high abundances. Feature responses are calculated using experimental data across multiple arrays. PLIER also uses an error model that assumes error is proportional to the observed intensity rather than to the background-subtracted intensity. This ensures that the error model can adjust appropriately for relatively low and high abundances of target nucleic acids. Here, PLIER was run with the default options (quantile normalization and PM-MM) with the addition of a 16 offset to each expression value13.

The AFX MAS 5.0 algorithm is a method for calculating probe-set signal values. The MAS 5.0 algorithm is implemented on a chip-by-chip basis and is not applied across an entire set of chips. The signal value is calculated from the background-adjusted PM and MM values of the probes in the set using a robust biweight estimator. Here, MAS 5.0 is implemented with default options, and global scaling (96% trim mean) is used for normalization.
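The robust biweight summary at the heart of MAS 5.0 can be illustrated with a one-step Tukey biweight mean. This is a sketch, not the vendor's implementation; the constants c = 5 and epsilon = 1e-4 are commonly cited defaults and should be treated as assumptions here:

```python
import numpy as np

def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight location estimate (sketch of the robust
    estimator MAS 5.0 uses to summarize background-adjusted probe values).
    """
    v = np.asarray(values, dtype=float)
    m = np.median(v)
    s = np.median(np.abs(v - m))                 # median absolute deviation
    u = (v - m) / (c * s + epsilon)              # scaled distances from center
    w = np.where(np.abs(u) < 1, (1 - u**2)**2, 0.0)  # bisquare weights
    return np.sum(w * v) / np.sum(w)             # weighted mean; outliers get ~0 weight
```

Probes far from the median receive zero weight, which is why a single aberrant probe barely moves the probe-set signal.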

RMA18 fits a robust linear model to the probe-level data and conducts a multichip analysis. The algorithm includes a model-based background correction, quantile normalization and an iterative median-polishing procedure to generate a single expression value for each probe set. GCRMA substantially refines the RMA algorithm by replacing the model for background correction with a more sophisticated computation that uses each probe’s sequence information to adjust the measured intensity for the effects of nonspecific binding, according to the different bond strengths of the two types of base pairs. It also takes into account the optical noise present in data acquisition. Both RMA and GCRMA were implemented using the ArrayAssist Lite package with default settings (Affymetrix; http://www.affymetrix.com/products/software/specific/arrayassist_lite.affx).

For GEH data, we compared median scaling and quantile normalization. For the median-scaling approach, each measurement was divided by the median of all measurements within each array. Therefore, the median signal is scaled to 1 for each array. The quantile normalization approach16 was applied to log2-transformed expression values across all samples and replicates within each site.
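Quantile normalization16 as applied here can be sketched in a few lines: each array's sorted values are replaced by the mean of the sorted values across arrays, so every array ends up with the same signal distribution. An illustrative sketch on an arrays × probes matrix:

```python
import numpy as np

def quantile_normalize(x):
    """Quantile normalization of an arrays x probes matrix (a sketch).

    Each array's k-th smallest value is replaced by the mean of the k-th
    smallest values across all arrays, preserving within-array ranks while
    forcing a common distribution.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=1)
    ranks = np.argsort(order, axis=1)            # rank of each probe per array
    mean_quantiles = np.sort(x, axis=1).mean(axis=0)
    return mean_quantiles[ranks]
```

Note that ranks within each array are untouched, so probe orderings (and hence detection of monotonic titration response) survive normalization.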

For ILM data, we compared quantile normalization16 with the addition of an offset of 15 counts to each probe signal13 and normalization by a robust least-squares fit of rank-invariant genes. For the latter normalization method, array data corresponding to sample A were averaged and used as a reference at each site independently. Signals from each array in the experiment were compared to the reference, and probes with relative rank changes of less than 5% (only probes ranked between the 50th and 90th percentiles were included) were considered to be rank invariant. Normalization coefficients were computed with iteratively reweighted linear least squares using the Tukey bisquare weight function. Background signal, estimated as the mean signal of negative controls, was subtracted before normalization. Each ILM array contains approximately 1,600

negative control probes, which are thermodynamically equivalent to regular probes but do not have specific targets in the transcriptome. Gene signals were ranked relative to signals of negative controls, and the detection flag was set to present if gene signal exceeded 99% of signals of negative controls.
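The detection rule just described (present if a gene's signal exceeds 99% of negative-control signals) reduces to a percentile threshold. A sketch, with all names illustrative:

```python
import numpy as np

def detection_calls(gene_signals, negative_controls, percentile=99):
    """Binary detection calls as described for the ILM platform (sketch).

    A gene is called present ('1') when its signal exceeds the given
    percentile of the negative-control signal distribution on that array,
    and absent ('0') otherwise.
    """
    threshold = np.percentile(negative_controls, percentile)
    return (np.asarray(gene_signals) > threshold).astype(int)
```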

Purification of mRNA to empirically determine abundance in samples A and B. In a follow-up experiment, mRNA was isolated from 100 µg of samples A and B total RNA in duplicate using the Absolutely mRNA purification kit (Stratagene) according to the manufacturer’s protocol. Briefly, 50 µl of mRNA oligo (dT) magnetic particles were combined with 100 µl of total RNA and washed four times, and mRNA was eluted with 100 µl elution buffer. mRNA quantity and quality were evaluated by ND-1000 NanoDrop spectrophotometer (NanoDrop Technologies) and Agilent 2100 Bioanalyzer with RNA 6000 Nano LabChip Kit (Agilent Technologies). This empirical evaluation of mRNA content in each 100 ng of total RNA produced an average yield of 2.870 ± 0.095 ng for sample A and 2.003 ± 0.124 ng for sample B (mean ± s.d.).

Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS
This study used a number of computing resources, including the high-performance computational capabilities of the Biowulf PC/Linux cluster at the US National Institutes of Health in Bethesda, Maryland (http://biowulf.nih.gov). This research was supported in part by the Intramural Research Program of the US National Institutes of Health, National Library of Medicine.

DISCLAIMER
This work includes contributions from, and was reviewed by, the FDA and the NIH. This work has been approved for publication by these agencies, but it does not necessarily reflect official agency policy. Certain commercial materials and equipment are identified in order to adequately specify experimental procedures. In no case does such identification imply recommendation or endorsement by the FDA or the NIH, nor does it imply that the items identified are necessarily the best available for the purpose.

COMPETING INTERESTS STATEMENT
The following authors declare competing financial interests (see the Nature Biotechnology website for details).

Published online at http://www.nature.com/nbt/
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Barczak, A. et al. Spotted long oligonucleotide arrays for human gene expression analysis. Genome Res. 13, 1775–1785 (2003).

2. Barnes, M., Freudenberg, J., Thompson, S., Aronow, B. & Pavlidis, P. Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms. Nucleic Acids Res. 33, 5914–5923 (2005).

3. Dobbin, K.K. et al. Interlaboratory comparability study of cancer gene expression analysis using oligonucleotide microarrays. Clin. Cancer Res. 11, 565–572 (2005).

4. Dorris, D.R. et al. Oligodeoxyribonucleotide probe accessibility on a three-dimensional DNA microarray surface and the effect of hybridization time on the accuracy of expres-sion ratios. BMC Biotechnol. 3, 6 (2003).

5. Hughes, T.R. et al. Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat. Biotechnol. 19, 342–347 (2001).

6. Irizarry, R.A. et al. Multiple-laboratory comparison of microarray platforms. Nat. Methods 2, 345–350 (2005).

7. Larkin, J.E., Frank, B.C., Gavras, H., Sultana, R. & Quackenbush, J. Independence and reproducibility across microarray platforms. Nat. Methods 2, 337–344 (2005).

8. Li, J., Pankratz, M. & Johnson, J.A. Differential gene expression patterns revealed by oligonucleotide versus long cDNA arrays. Toxicol. Sci. 69, 383–390 (2002).

9. Naef, F., Socci, N.D. & Magnasco, M. A study of accuracy and precision in oligo-nucleotide arrays: extracting more signal at large concentrations. Bioinformatics 19, 178–184 (2003).

10. Shippy, R. et al. Performance evaluation of commercial short-oligonucleotide microar-rays and the impact of noise in making cross-platform correlations. BMC Genomics 5, 61 (2004).

11. Yuen, T., Wurmbach, E., Pfeffer, R.L., Ebersole, B.J. & Sealfon, S.C. Accuracy and calibration of commercial oligonucleotide and custom cDNA microarrays. Nucleic Acids Res. 30, e48 (2002).

12. Chudin, E. et al. Assessment of the relationship between signal intensities and tran-script concentration for Affymetrix GeneChip arrays. Genome Biol. 3, RESEARCH0005 (2002).

13. MAQC Consortium. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol. 24, 1151–1161 (2006).


14. Shi, L. et al. Cross-platform comparability of microarray technology: intra-platform con-sistency and appropriate data analysis procedures are essential. BMC Bioinformatics 6 (Suppl.) S12 (2005).

15. Thompson, K.L. et al. Use of a mixed tissue RNA design for performance assessments on multiple microarray formats. Nucleic Acids Res. 33, e187 (2005).

16. Bolstad, B.M., Irizarry, R.A., Astrand, M. & Speed, T.P. A comparison of normaliza-tion methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185–193 (2003).

17. Irizarry, R.A. et al. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 31, e15 (2003).

18. Irizarry, R.A. et al. Exploration, normalization, and summaries of high density oligo-nucleotide array probe level data. Biostatistics 4, 249–264 (2003).

19. Irizarry, R.A., Wu, Z. & Jaffee, H.A. Comparison of Affymetrix GeneChip expression measures. Bioinformatics 22, 789–794 (2006).

20. Parrish, R.S. & Spencer, H.J. III. Effect of normalization on significance testing for oligonucleotide microarrays. J. Biopharm. Stat. 14, 575–589 (2004).

21. Guide to probe logarithmic intensity error (PLIER) estimation. Affymetrix Technical Note <http://www.affymetrix.com/support/technical/technotes/plier_technote.pdf>

22. Statistical algorithms description document. Affymetrix <http://www.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf>

23. Cope, L.M., Irizarry, R.A., Jaffee, H.A., Wu, Z. & Speed, T.P. A benchmark for Affymetrix GeneChip expression measures. Bioinformatics 20, 323–331 (2004).

24. Wu, Z. & Irizarry, R.A. Stochastic models inspired by hybridization theory for short oligonucleotide arrays. J. Comput. Biol. 12, 882–893 (2005).

25. Sendera, T.J. et al. Expression profiling with oligonucleotide arrays: technologies and applications for neurobiology. Neurochem. Res. 27, 1005–1026 (2002).

26. Wu, Z., Irizarry, R.A., Gentleman, R., Martinez Murillo, F. & Spencer, F. A model-based background adjustment for oligonucleotide expression arrays. J. Am. Stat. Assoc. 99, 909–917 (2004).

27. Seo, J., Gordish-Dressman, H. & Hoffman, E.P. An interactive power analysis tool for microarray hypothesis testing and generation. Bioinformatics 22, 808–814 (2006).

28. Hwang, D., Schmitt, W.A. & Stephanopoulos, G. Determination of minimum sample size and discriminatory expression patterns in microarray data. Bioinformatics 18, 1184–1193 (2002).

29. Tibshirani, R. A simple method for assessing sample sizes in microarray experiments. BMC Bioinformatics 7, 106 (2006).

30. Page, G.P. et al. The PowerAtlas: a power and sample size atlas for microarray experi-mental design and research. BMC Bioinformatics 7, 84 (2006).

31. Tong, W. et al. Evaluation of external RNA controls for the assessment of microarray performance. Nat. Biotechnol. 24, 1132–1139 (2006).


Evaluation of external RNA controls for the assessment of microarray performance

Weida Tong1, Anne Bergstrom Lucas2, Richard Shippy3, Xiaohui Fan1,4, Hong Fang5, Huixiao Hong5, Michael S Orr6, Tzu-Ming Chu7, Xu Guo8, Patrick J Collins2, Yongming Andrew Sun9, Sue-Jane Wang6, Wenjun Bao7, Russell D Wolfinger7, Svetlana Shchegrova2, Lei Guo1, Janet A Warrington8 & Leming Shi1

External RNA controls (ERCs), although important for microarray assay performance assessment, have yet to be fully implemented in the research community. As part of the MicroArray Quality Control (MAQC) study, two types of ERCs were implemented and evaluated: one was added to the total RNA in the samples before amplification and labeling; the other was added to the copyRNAs (cRNAs) before hybridization. ERC concentration-response curves were used across multiple commercial microarray platforms to identify problematic assays and potential sources of variation in the analytical process. In addition, the behavior of different ERC types was investigated, resulting in several important observations, such as the sample-dependent attributes of performance and the potential for using these control RNAs in a combinatorial fashion. This multiplatform investigation of the behavior and utility of ERCs provides a basis for articulating specific recommendations for their future use in evaluating assay performance across multiple platforms.

ERCs are synthetic or naturally occurring RNA species that are added to an RNA sample for the purpose of quality control of the assay. Most commercial microarray platforms contain probes specifically designed for interrogating ERC transcripts. These probes have been extensively prototyped and optimized for performance on each microarray platform. To provide an enhanced assessment of the analytical performance of the system during data collection, a variety of ERCs can be added to the sample in a range of concentrations spanning high to low abundance by

evaluating assay performance across the expected range of concentrations in the sample1. A well-constructed concentration-response series of ERCs is useful in many ways for assessing assay performance. Depending on the point in the assay at which the ERCs are added, they can be used to identify potentially failed steps during the assay process. Recognizing the potential importance of ERCs for analytical performance assessment, the External RNA Control Consortium (ERCC) was established in 2003 with the objective of developing a set of ERC transcripts that could be used with various gene expression profiling technologies, including microarray platforms2.

ERCs can also be useful for evaluating different data analysis methods3. The cRNA data set from Affymetrix, known as the Latin square data set (http://www.affymetrix.com/support/technical/sample_data/datasets.affx), consists of data from 42 cRNAs, which were prelabeled and added to a hybridization solution at various known concentrations. A similar data set is also provided by GeneLogic (http://www.genelogic.com/newsroom/studies/index.cfm). Both data sets are freely available and have been widely used in the research community for comparative performance analysis of GeneChip-specific normalization and gene selection methods4–7. Recently, Choe et al.8,9 demonstrated the value of using a large number of cRNA transcripts at concentration ratios varying from one- to fourfold to compare the performance of different data analysis scenarios.

The MAQC study10 provides a rich data resource for investigating various issues associated with DNA microarray platforms, including the performance of ERCs across various platforms. In this project, the probes for the ERC transcripts (Supplementary Methods online) are unique non-mammalian sequences selected to minimize cross-hybridization with transcripts from mammalian species such as human, mouse and rat. Seven microarray platforms were evaluated, and ERCs were used in the following platforms: Applied Biosystems Genome Survey Microarray, Affymetrix GeneChip, both Agilent’s One-Color and Two-Color platforms, GE Healthcare CodeLink and Eppendorf (data not shown). With these data sets, the following questions were asked: (i) Do the ERCs behave in the expected manner? (ii) Can outlying assays be identified using ERCs? (iii) Can ERCs assess the accuracy of ratios between different samples? (iv) Can ERCs provide information other than assay quality? (v) How does the choice of normalization and data processing methods affect the ERC data?

RESULTS
The utility and performance behavior of ERCs were investigated using two independent sets of data: the MAQC data set10 and rat toxicogenomics

1National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Rd., Jefferson, Arkansas 72079, USA. 2Agilent Technologies, Inc., 5301 Stevens Creek Blvd., Santa Clara, California 95051, USA. 3GE Healthcare, 7700 S. River Pkwy., Suite #2603, Tempe, Arizona 85284, USA. 4Pharmaceutical Informatics Institute, Zhejiang University, Hangzhou 310027, China. 5Z-Tech Corporation, National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Rd., Jefferson, Arkansas 72079, USA. 6Center for Drug Evaluation and Research, US Food and Drug Administration, 10903 New Hampshire Ave., Silver Spring, Maryland 20993, USA. 7SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513, USA. 8Affymetrix, Inc., 3420 Central Expressway, Santa Clara, California 95051, USA. 9Applied Biosystems, 850 Lincoln Centre Dr., Foster City, California 94404, USA. Correspondence should be addressed to W.T. ([email protected]).

Published online 8 September 2006; doi:10.1038/nbt1237


(TGx) data set11. Because the results in this paper are derived from two independent experiments, the following nomenclature is used to provide clarity.

The subset of the MAQC data set used for the present analysis corresponds to four genome-wide commercial microarray platforms: Affymetrix GeneChip (AFX), Applied Biosystems Genome Survey Microarray (ABI) and Agilent One-Color (AG1) and Agilent Two-Color (AGL) microarrays. Data were generated for each of these platforms by three different test sites with five technical replicates for each of the four RNA samples (A, B, C and D10,12). Each data set is denoted by platform_site_replicate; for example, AG1_2_A1 denotes Agilent One-Color platform, test site 2, sample A and replicate 1.
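The platform_site_replicate convention can be unpacked mechanically. A small illustrative helper (the naming convention is from the text; the function itself is not):

```python
def parse_maqc_id(data_id):
    """Split an identifier such as 'AG1_2_A1' into platform, test site,
    sample letter and replicate number. Hypothetical helper for working
    with the MAQC naming scheme described above.
    """
    platform, site, rep = data_id.split("_")
    return {"platform": platform, "site": int(site),
            "sample": rep[0], "replicate": int(rep[1:])}
```

For example, `parse_maqc_id("AG1_2_A1")` yields platform AG1, site 2, sample A, replicate 1.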

The rat TGx data set, denoted by platform_RAT, contains data from Affymetrix (AFX_Rat), Agilent One-Color microarray (AG1_Rat), Applied Biosystems (ABI_Rat) and GE Healthcare (GEH_Rat). This experiment was performed at one test site with six biological replicates for each of six different treatments. The site nomenclature mentioned above is therefore not applicable, but a distinction between samples is still necessary; it is provided in Methods and within the figures.

Two types of ERCs were investigated. One type is added to the total RNA (called tERC hereafter) before initiating the cDNA synthesis and in vitro transcription steps of the RNA labeling procedure. When added in this manner, the tERC generally assesses the efficiency of the target preparation as well as the performance of the hybridization and scanner. The other type of ERC is added to the cRNA (called cERC hereafter) immediately before hybridization, which allows assessment of the assay performance from the hybridization onward. Applied Biosystems and Affymetrix platforms used both types of ERCs in their respective protocols, whereas Agilent used tERC and GE Healthcare used cERC only (Fig. 1).

The concentration-response behavior of both tERCs and cERCs was evaluated using linear regression analysis to identify microarray assays that show outlier behavior. This approach is attractive because the analysis is self-contained within each microarray and therefore does not require replicates to assess outliers. The behavior of both ERC types was investigated further to determine whether additional ERC-specific analysis methods could be useful for analytical performance assessment.
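Such a per-array fit can be sketched as follows. This is a minimal illustration rather than the study's exact pipeline, and the log-log formulation is an assumption about implementation detail:

```python
import numpy as np

def fit_concentration_response(conc, signal):
    """Fit log10(signal) against log10(spiked concentration) for the ERC
    probes of a single microarray and return (slope, intercept, r_squared).
    The fit is self-contained within one array, so no replicates are
    needed to assess outlier behavior."""
    x = np.log10(np.asarray(conc, dtype=float))
    y = np.log10(np.asarray(signal, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r_squared = 1.0 - residuals.var() / y.var()
    return slope, intercept, r_squared

# A perfectly proportional response yields slope 1 and R^2 of 1.
slope, intercept, r2 = fit_concentration_response(
    [1e-6, 1e-5, 1e-4, 1e-3], [10, 100, 1000, 10000])
```

An assay whose slope, intercept or R2 deviates markedly from the other arrays at the same site is then a candidate outlier.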

External RNA control concentration-response curves
The ERC transcripts span a range of concentrations in the Affymetrix, Agilent and GE Healthcare microarray platforms, making them suitable for concentration-response analyses. The Agilent One-Color platform has ten tERCs that span six logs of concentration and interrogate the lower and upper limits of assay signal detection (Supplementary Table 1 online). The Affymetrix platform has four tERCs that span one and a half logs of concentration, and the GE Healthcare platform has six ERCs that span three logs of concentration. For the Applied Biosystems microarray platform, ERC controls are spiked at a single fixed concentration, rendering them unsuitable for a concentration-response analysis.

Figure 2 depicts the concentration-response curves for AG1, AFX and GEH_Rat. In general, all platforms exhibited accurate concentration-response patterns. In addition, performance differences are observed for tERCs relative to cERCs, as seen in the AFX data, where the tERCs show decreased linear correlations compared with the cERC plots (Fig. 2, comparing the second and third rows of graphs for the AFX platform). This result is expected, as the tERCs are introduced earlier in the assay process and are subject to multiple sources of variation during sample amplification and labeling, more closely approximating the full analytical manipulation. In contrast, the cERCs are added just before hybridization, and their more stable performance reflects the fewer sample manipulations that occur after these controls are added.

Two assays generated at AG1 site 2 (AG1_2_D2 and AG1_2_A3) have noticeably higher signals for tERCs at the lowest concentrations, indicating potential assay outliers. However, the specific problematic step cannot be identified for these two data sets because tERC behavior reflects the performance of multiple steps of the experiment. The benefit of using both tERCs and cERCs is demonstrated with the AFX platform, where the combination was used to elucidate procedural problems in the assay. In this example the AFX cERC performance is stable and consistent across all three test sites, but the tERCs at site 1 have lower y-intercepts than at the other two sites, indicating that the target preparation yield or labeling efficiency at site 1 differed from the other sites (Fig. 2).

Concentration-response curves in one-color microarray assays
In addition to visually inspecting the concentration-response curves to interrogate performance over the dynamic range of an assay, we calculated linear regression statistics, including R2 correlations and slopes, for the linear portion of the curves to identify outliers. Figure 3 (Supplementary Table 2 online) plots the linear regression slope versus the R2 correlation for AG1, AFX and GEH_Rat. Three outlying assays were identified at AG1 site 2 (Fig. 3a): AG1_2_D1 has a normal R2 with a low slope, whereas AG1_2_D2 has a normal slope with a low R2, and AG1_2_A3 has both a low slope and a low R2.
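A hypothetical screen pairing these two regression outputs might look like the following; the cutoff values and the fifth and sixth assay entries are illustrative assumptions, not the study's tolerances or data:

```python
import statistics

def screen_assays(stats, slope_tol=0.15, r2_floor=0.90):
    """Flag assays whose slope deviates from the site median by more than
    slope_tol, or whose R^2 falls below r2_floor.
    stats: {assay_name: (slope, r2)}; returns sorted flagged names."""
    med_slope = statistics.median(s for s, _ in stats.values())
    return sorted(name for name, (s, r2) in stats.items()
                  if abs(s - med_slope) > slope_tol or r2 < r2_floor)

# Toy values mirroring the three failure modes described in the text:
flagged = screen_assays({
    "AG1_2_D1": (0.70, 0.97),   # low slope, normal R^2
    "AG1_2_D2": (0.95, 0.80),   # normal slope, low R^2
    "AG1_2_A3": (0.65, 0.78),   # both low
    "AG1_2_A1": (0.96, 0.98),   # typical assay (illustrative)
    "AG1_2_B1": (0.97, 0.99),   # typical assay (illustrative)
})
```

The screen flags the three assays with the failure patterns described above while passing the two typical ones.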

Figure 1 Overview of external RNA controls (ERCs) implemented in Affymetrix, Agilent, Applied Biosystems and GE Healthcare platforms. Two types of ERCs are implemented in these four commercial microarray platforms. The first type of ERC is added to the total RNA (tERC) before initiating the cDNA synthesis and IVT (in vitro transcription) steps of the RNA labeling procedure. The second type of ERC is added to the cRNA (cERC) just before the cRNA is placed into the hybridization mixture. The Applied Biosystems and Affymetrix platforms use both types of ERCs in their respective protocols, whereas Agilent uses tERCs and GE Healthcare uses cERCs in this study. Controls shown are: Agilent, ten in vitro synthesized, polyadenylated transcripts for both one- and two-color arrays; Affymetrix, four poly-A controls and four hybridization controls; Applied Biosystems, three IVT controls, three RT controls and three hybridization controls; GE Healthcare, six positive controls. RT, reverse transcription.

An assay with a concentration-response slope of one indicates no compression of the signal, because the observed values change one-for-one with the expected values across the regression fit. By inspecting the slopes in Figure 3, different degrees of compression in gene expression data are observed among the three platforms. The AG1 platform has very little compression, with a slope close to 1 for tERCs. However, ERC data for the AFX and GEH_Rat experiments appear compressed to a similar extent, with slopes detectably <1. The effect of normalization methods on the compression was also investigated. For AFX, the PLIER13, MAS514, RMA6, GCRMA15 and dChip16 algorithms were used, whereas median scaling and quantile normalization were applied for both AG1 and GEH. For AFX, dChip compresses the gene expression data more than the other methods, whereas GCRMA shows little compression or even a small degree of expansion (Fig. 3b)15. For AG1, quantile normalization tends to separate samples A and C from D and B by slope, in accordance with the mRNA abundance of the samples, as shown in the 'Performance of external RNA controls' section (Fig. 3a). This sample-dependent behavior associated with quantile normalization is also observed for AFX (PLIER, RMA and GCRMA) (Fig. 3b) and GEH_Rat (Fig. 3c).
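The quantile normalization applied to AG1 and GEH (and used inside several of the AFX algorithms) forces every array onto a common empirical distribution, which is what produces this sample-dependent behavior; a minimal sketch:

```python
import numpy as np

def quantile_normalize(mat):
    """Quantile-normalize a probes x arrays matrix: each array's values
    are replaced by the row-wise mean of the sorted columns, so every
    array ends up with an identical signal distribution."""
    mat = np.asarray(mat, dtype=float)
    order = np.argsort(mat, axis=0)          # sort order per array
    ranks = np.argsort(order, axis=0)        # rank of each probe per array
    target = np.sort(mat, axis=0).mean(axis=1)  # common target distribution
    return target[ranks]
```

Because ranks, not raw values, determine the output, ERC probes whose relative rank differs between samples are moved away from their original signals, consistent with the Discussion's explanation of the increased slope variability.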

External RNA controls in two-color microarray assays
Agilent was the only two-color platform to use ERCs in the MAQC study. Agilent formulates two-color ERCs into two different mixtures that span 2.3 logs of concentration and are mixed at different concentrations to give the following expected ratios: 1:10, 1:3, 1:1, 3:1 and 10:1 (Supplementary Table 3 online). This type of ERC formulation adds an additional dimension to the typical one-color concentration-response analysis: not only should the tERCs generate signals proportional to their concentrations within each sample, but the two-color assays should also generate observed ratios equal to the expected ratios (or log10 ratios) when the data sets are dye normalized and analyzed. This accuracy assessment is contained within each probe interrogating a specific ERC transcript.
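Under this design, the expected log10 ratio for each transcript follows directly from its concentration in the two mixtures; a sketch in which the concentration vectors encode only the nominal 1:10 through 10:1 ratios, not the actual mixture formulations:

```python
import math

def expected_log10_ratios(cy5_conc, cy3_conc):
    """Expected log10(Cy5/Cy3) per ERC transcript, given the transcript's
    concentration in each of the two spike mixtures."""
    return [math.log10(a / b) for a, b in zip(cy5_conc, cy3_conc)]

# Nominal design ratios 1:10, 1:3, 1:1, 3:1 and 10:1 (arbitrary units):
ratios = expected_log10_ratios([1, 1, 1, 3, 10], [10, 3, 1, 1, 1])
```

The resulting expected values span -1 to +1 on the log10 scale, the x-axis range plotted in Figure 4.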

The observed versus expected ERC log10 ratio plots for AGL are presented in Figure 4. Two outlying assays at AGL site 1 showed major assay performance failures, generating log10 ratios close to zero across the assay: AG1_1_01 had to be rescanned three weeks after the initial experiment and had faded significantly, and AG1_1_85 had the same ERC control mixture added to both samples, resulting in log10 ratios of 0 for all ten ERC transcripts. Two assays at AGL site 2 were also determined to be outliers. These outliers were found to have increased within-feature noise, which might result from sample contamination from the reagents used to purify the labeled cRNA. A similar observation is obtained when comparing the linear regression correlation coefficients from the observed versus expected ratios, where outliers are defined as assays whose R2 for the linear fit falls more than two s.d. below the mean for that site (Supplementary Fig. 1 online).

Figure 2 Concentration-response curves for ERCs on the Agilent, Affymetrix and GE Healthcare microarray platforms. Each concentration-response curve is generated from an individual microarray data set and plots normalized signal intensity (y-axis) as a function of the concentration of either the tERC (spiked poly-A molar ratio) or the cERC (spiked concentration in pM) (x-axis). The amount of cERC added to the hybridization mixture is expressed as a molar concentration based on the mass of the cERC transcript added to a specific volume of the hybridization mixture. The assumptions used to calculate the poly-A molar ratio for the different tERCs were that the average percentage of mRNA in total RNA is 2%, the average transcript length is 2,000 bases and the average molecular weight of a single base is 330 g/mol. The cERC concentrations and tERC poly-A molar ratios used for this figure are summarized in Supplementary Table 1 online. The Agilent platform is presented in the first row, where the seven of the ten tERCs with the highest concentrations are plotted to better compare scales with the other platforms (the full concentration-response curve is presented in Supplementary Fig. 9 online). The Affymetrix platform is presented in the second and third rows and illustrates the combinatorial approach of using both tERCs (second row) and cERCs (third row). The GE Healthcare platform is presented in the fourth row, illustrating the cERC concentration-response from the rat toxicogenomics study. This figure illustrates the different approaches each manufacturer employs for tERCs, cERCs or both when assessing assay quality using ERCs. Two microarrays from AG1 site 2 (AG1_2_D2 and AG1_2_A3) exhibit higher than expected signals for the tERCs with the lowest concentrations, indicating that these could be outlying assays. AA, aristolochic acid; RDL, riddelliine; CFY, comfrey. 'L' indicates samples isolated from livers and 'K' samples isolated from kidneys of treated rats. CTR, control (liver or kidney from untreated rats).
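The tERC poly-A molar-ratio conversion described in the Figure 2 legend can be sketched from its stated assumptions (2% mRNA in total RNA, 2,000-base mean transcript length, 330 g/mol per base); the example spike mass, transcript length and total-RNA input below are illustrative placeholders, not values from the study:

```python
# Assumptions stated in the Figure 2 legend:
MRNA_FRACTION = 0.02     # mRNA fraction of total RNA
MEAN_MRNA_LEN = 2000     # mean mRNA length, bases
MW_PER_BASE = 330.0      # mean molecular weight, g/mol per base

def terc_polya_molar_ratio(spike_mass_g, spike_len_bases, total_rna_g):
    """tERC/poly-A molar ratio for a spike of the given mass and length
    added to the given mass of total RNA."""
    moles_spike = spike_mass_g / (spike_len_bases * MW_PER_BASE)
    moles_mrna = (MRNA_FRACTION * total_rna_g) / (MEAN_MRNA_LEN * MW_PER_BASE)
    return moles_spike / moles_mrna

# Illustrative: 1 pg of a 1,000-base transcript in 1 ug of total RNA
# gives a molar ratio of 1e-4, i.e., 1/10,000.
ratio = terc_polya_molar_ratio(1e-12, 1000, 1e-6)
```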

In the MAQC study, the two-color microarray assays used only samples A and B, with a dye-swap experimental design. The y-intercept was >0 (shifted up) for Cy5(B)/Cy3(A) at all three sites and <0 (shifted down) for Cy5(A)/Cy3(B) at all three sites (Fig. 4). This shift indicates differences in mRNA abundance between sample A and sample B, which are analyzed further in the following section.

Performance of external RNA controls
The MAQC data sets were generated from four RNA samples with an incremental increase of brain total RNA across the samples: A (0%), C (25%), D (75%) and B (100%). Because the relative mRNA abundance is not expected to be the same between Stratagene Universal Human Reference RNA (UHRR, or sample A) and Ambion Human Brain Reference RNA (brain, or sample B), the effect of mRNA abundance on ERC behavior was investigated in terms of signal intensity, with the objective of developing other ERC-specific analysis methods for assay assessment.

Figure 3 Concentration-response linear regression results for the Agilent, Affymetrix and GE Healthcare microarray platforms. (a–c) The R2 correlation coefficients (y-axis) versus slope (x-axis) from a regression analysis based on the linear portion of the concentration-response curves for AG1 (a), AFX (b) and GEH_Rat (c). Data used in creating this figure are in Supplementary Table 2 online. Abbreviations are as defined in Figure 2. For AG1, two types of data normalization are presented for both the MAQC and TGx data sets: raw/median scaling and quantile normalization. For AFX, five types of data normalization are presented for both the MAQC and TGx data sets: PLIER, MAS5, dChip, RMA and GCRMA. For GEH_Rat, the raw/median data are presented for the TGx data set. This analysis indicates that (i) a degree of compression in signal is evident, with slope <1, for the Affymetrix and GE Healthcare platforms, (ii) the quantile normalization method causes the data to separate by sample type and (iii) three outlying assays are identified at AG1 site 2.

The tERC signal intensity increases in proportion to increasing concentrations of brain mRNA in the sample mixture, whereas the signal intensity from the biological probes exhibits the reverse trend (Supplementary Fig. 2 online). The general trend was conserved across the different normalization methods when PLIER, MAS5, RMA and dChip were examined for AFX and when median scaling and quantile normalization were applied to AG1 (Supplementary Fig. 3 online). This behavior was more pronounced when the median tERC signal intensity was divided by the corresponding median biological probe intensity and plotted against the percentage of brain RNA in the biological target sample, as depicted in Figure 5 (Supplementary Table 4 online): a positive linear correlation was observed across three different one-color platforms (ABI, AFX and AG1), with slopes >0 and high correlation coefficients (R2 > 0.8). The two titration points (sample C and sample D) were plotted according to the amount of brain RNA implied by the volumetric mixing of samples A and B, where C = 75%A + 25%B and D = 25%A + 75%B (Fig. 5). This placement is accurate only if the percentage of mRNA is equal between sample A and sample B. However, the Agilent two-color tERC data indicate that the percentage of mRNA was higher in sample A than in sample B (Fig. 4 and Supplementary Fig. 4 online). If we assume that sample A has 1.5-fold more mRNA than sample B12, the percentage of brain mRNA becomes 18% for sample C and 67% for sample D. When these values are used on the x-axis of Supplementary Fig. 5 online, the correlation coefficients improve for all samples at all sites on the three microarray platforms, further supporting the hypothesis that the samples have different percentages of mRNA.
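The corrected percentages follow from simple mixing arithmetic; a sketch assuming, per the text, that sample A carries 1.5-fold more mRNA per unit volume than sample B:

```python
def brain_mrna_fraction(vol_brain, a_over_b=1.5):
    """Fraction of the mRNA pool contributed by brain (sample B) in a
    volumetric mix of UHRR (sample A) and brain, when A carries
    a_over_b times as much mRNA per unit volume as B."""
    vol_a = 1.0 - vol_brain
    return vol_brain / (vol_a * a_over_b + vol_brain)

# Sample C is 25% brain by volume; sample D is 75% brain by volume.
pct_c = round(brain_mrna_fraction(0.25) * 100)  # 18
pct_d = round(brain_mrna_fraction(0.75) * 100)  # 67
```

With equal mRNA content (a_over_b = 1), the function reduces to the volumetric percentages of 25% and 75%.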

The effect of the mRNA abundance differences between the four samples on cERC signal intensities was also investigated. Unlike tERC signal intensities, the cERC signal intensities across the four RNA samples for ABI and AFX exhibited no significant difference (Supplementary Fig. 6 online), indicating that the cERCs added before hybridization are unaffected by the differences in the relative abundance of the sample mRNA tested in this set of experiments. This observation is also not affected by the choice of normalization (Supplementary Fig. 7 online). This result further supports the hypothesis that the differences between the biological samples occur at an earlier stage of target preparation.

Additional analyses using external RNA controls
For most assays identified as problematic, one or several ERCs behave differently from the others, which should be captured by an intensity-based unsupervised analysis, such as principal component analysis (PCA)17 or hierarchical cluster analysis (HCA)17. PCA based on tERC signal intensity identified AG1_2_D2, AG1_1_A1 and AG1_3_B3 as outliers, consistent with the PCA plot based on the entire microarray (Fig. 6a). Agilent's Feature Extraction QC Report uses a different algorithm: the fit to the linear portion of the concentration-response curve is performed on a log-log plot after a parameterized sigmoidal curve fit of the data. The R2 correlations and slopes from the AG1 QC report are shown in Figure 6b. This type of sigmoidal curve fitting ignores the differences seen in the tERCs outside the linear range and identifies a different set of outlying assays than the analysis in Figure 3a, but the same assays as the PCA (Fig. 6a). Results similar to those in Figure 6a are also observed using HCA (Supplementary Fig. 8 online). These analyses, together with the approaches based on the concentration-response curve (Figs. 2 and 3), demonstrate the value of combining various ERC-specific approaches to enhance the capability of assay assessment.
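An intensity-based unsupervised screen of this kind can be sketched as a PCA via SVD over an ERC signal matrix (arrays as rows, ERC probes as columns); outlying assays then appear as points far from the main cluster in the score plot. The toy matrix below is illustrative:

```python
import numpy as np

def pca_scores(x, n_components=2):
    """Scores of the rows (arrays) on the first principal components,
    computed by SVD of the mean-centered matrix (arrays x ERC probes)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=0)                       # center each probe
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Toy matrix: three arrays measured on two ERC probes; the middle array
# sits at the centroid, so its first-component score is ~0.
scores = pca_scores([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], n_components=1)
```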

DISCUSSION
A number of microarray manufacturers use ERCs to assess the technical performance of their gene expression assays. This study investigated the utility of ERCs, with emphasis on cERCs and tERCs, for assay assessment across five commercial microarray platforms using the MAQC data set10 and a rat toxicogenomic data set11.

This study explores several different uses of ERCs for assay assessment. First, the observed ERC signal intensities were examined against the expected concentrations to visually detect potential outlying assays, which tend to deviate from the expected concentration-response trend. Second, the concentration-response curves were modeled to identify potential outlying assays using the output variables of a linear regression analysis. These two approaches take advantage of a characteristic unique to ERCs: they are spiked across a wide range of concentrations. However, for some platforms, such as Applied Biosystems, ERCs are spiked in at a constant concentration, requiring analysis methods other than concentration-response analysis. Thus, PCA and HCA were conducted on the ERC signal intensities, and the ERC-identified outlying data sets are consistent with the analysis results based on the biological whole-microarray data. These approaches are complementary and can be used in conjunction to enhance the discrimination of outlier identification.


Figure 4 Expected versus observed log10 ratio comparison for Agilent Two-Color ERC data. The expected log10 ratios on the x-axis were based on the quantity of each tERC transcript spiked into the total RNA (Supplementary Table 3 online). The dye-normalized signal ratios obtained from the AGL Feature Extraction software are plotted as observed log10 ratios on the y-axis. These are grouped by site and ordered by sample combination. In the two-color assay, four pairs of RNA samples were generated using only samples A and B. The samples are named AA, BB, AB and BA, where the letters represent the RNA sample type, with the first letter denoting the sample labeled with Cy5 and the second the sample labeled with Cy3. Four outlying assays are highlighted in red, two from site 1 and two from site 2.


ERCs added at different steps of the assay offer quality control for the corresponding steps of the assay process. cERCs are tolerant of differences in the mRNA abundance of the total RNA samples and provide the advantage of assessing assay performance independent of total RNA sample complexity (Fig. 2). A limitation of cERCs is their inability to detect variability that may occur during target preparation. Because tERCs are added at a very early stage of the assay process, they can reveal failures during sample collection, storage, labeling and amplification as well as hybridization, scanning and data collection. As poor target quality is a common reason for aberrant assay results, there is value in using tERCs to assess target preparation independently, while using cERCs to differentiate post-labeling sources of variation. Therefore, these two types of ERCs are most valuable when used in combination. This utility was demonstrated through the analysis of the AFX site 1 data, where the combination of tERC and cERC information helped determine that sample amplification and labeling yields differed from the other sites, which underlies the spread in the variability data.

Our key findings can be summarized as follows. The cERCs exhibit stable and consistent performance across both samples and sites. tERC signal intensities increased, and biological probe signal intensities decreased, in proportion to increasing amounts of brain RNA in the samples. When the tERC is added to total RNA samples, the tERC transcripts end up at different relative proportions to the pool of biological RNA transcripts. As the abundance of mRNA is relatively higher in sample A than in brain (sample B), the median signal of the biological probes was higher in sample A than in sample B, whereas the median tERC signal had the inverse relationship. We further determined that different levels of compression in gene expression exist across commercial platforms, indicating that care must be taken when making absolute fold-change assessments in a cross-platform comparison. Finally, we determined that quantile-based normalization approaches, such as those used in PLIER, RMA and GCRMA for Affymetrix and those applied to the Agilent One-Color and GE Healthcare platforms, reveal variability in the concentration-response slope estimates. This increase in variability may result from the difference in percentage mRNA between samples A and B. Although the median-normalized signals of the tERCs and cERCs are relatively consistent, their relative ranks within samples A and B are different. Quantile normalization forces the distributions of all data sets to be identical, moving the signals for the tERCs and cERCs away from their original raw expression values.

Figure 5 Illustration of the sample-dependent behavior of tERC signal across the MAQC samples. The ratio of the median tERC signal to the median biological signal is plotted against the percentage of brain RNA in the different samples (0%, 25%, 75% and 100% for A, C, D and B, respectively). In all nine groupings (three sites for each of three platforms), the slope was greater than zero with high correlation coefficients (R2 ranging from 0.82 to 0.98), indicating that the tERC signal intensity depends on the mRNA abundance or biological differences of the samples. Data used in creating this figure, along with the statistical assessment, are summarized in Supplementary Table 4 online.

Figure 6 Alternative analysis using ERCs. (a) Principal component analysis (PCA) of the Agilent tERC signal intensities compared with PCA of the Agilent biological signal intensities. The graphs are colored by sample and shaped by site (site 1, triangle; site 2, square; site 3, circle). The same three assays (AG1_1_A1, AG1_2_D2 and AG1_3_B3) are potential outliers based on their shift in both the tERC and the biological signal. (b) Similar to Figure 3a, except that the parameterized sigmoidal curve-fitted linear regression data from the Agilent QC Report concentration-response curves were used to compare R2 correlation data (y-axis) and slope data (x-axis). The same three outlying assays identified in the PCA appear as potential outliers in this analysis (circled in red), demonstrating agreement in outlier identification between two fairly different analyses.

Because no single common standard set of external RNA controls with an extended concentration range and a Latin square design is in place for use across platforms in the microarray community, it is not yet possible to run the ideal set of external controls for a study of this nature1. Thus, the intent of this study was to identify key attributes of ERC performance that should be considered for designing better ERCs and associated analysis approaches in the future, which is one of the many important ERCC endeavors1. Based on the findings of this study, several points of consideration are summarized in Box 1.

METHODS
MAQC and TGx data sets. Two types of data sets are considered in this study; both were generated within the MAQC project. The difference between the two data sets is the nature of the RNA samples used for generating the gene expression data. The MAQC data set used two calibrated RNA samples (A, Stratagene Universal Human Reference RNA; B, Ambion Human Brain Reference RNA) and their two mixtures (C, 75% A/25% B; D, 25% A/75% B). Applied Biosystems (ABI), Affymetrix GeneChip (AFX) and Agilent One-Color (AG1) data were generated using these four RNA samples. Each platform comprises a total of 60 microarrays: five technical replicates for each of the four samples (A, B, C and D) at each test site (20 microarrays per site), with data from three test sites. In addition, Agilent Two-Color platform data (AGL) were also generated, but using only samples A and B. For AGL, four sets of assays were conducted with five replicates per set: two dye-swap experiments using brain-Cy5/UHRR-Cy3 (sample BA) and UHRR-Cy5/brain-Cy3 (sample AB), along with two types of self-self hybridizations, brain-Cy5/brain-Cy3 (sample BB) and UHRR-Cy5/UHRR-Cy3 (sample AA), for a total of 20 assays. The toxicogenomics (TGx) data set used RNA samples from rats in a TGx study. The detailed experimental protocol is described elsewhere11. Briefly, six-week-old Big Blue rats were treated with three compounds for 12 weeks and then killed. The compounds were aristolochic acid, a potent nephrotoxin and carcinogen present in plants used in herbal medicines; riddelliine, a carcinogenic pyrrolizidine alkaloid that contaminates various plants; and comfrey, a plant consumed by humans that is a rat liver carcinogen. RNA samples were isolated from the livers of rats treated with each of the three compounds, along with a liver control. In addition, RNA samples were isolated from kidneys after aristolochic acid treatment, along with a kidney control. Thus, there were a total of six types of rat RNA samples (four from liver and two from kidney). Six biological replicates (rats) were generated for each of the six RNA sample types. The gene expression data were generated on four microarray platforms: Applied Biosystems (ABI_Rat), Affymetrix GeneChip (AFX_Rat), Agilent One-Color microarray (AG1_Rat) and GE Healthcare CodeLink (GEH_Rat). For each platform, 36 microarrays were generated, six for each of the six groups.

Applied Biosystems external RNA controls. The Applied Biosystems arrays contain a suite of controls (>1,592 control probes) that can be used to check the quality of many aspects of an expression profiling experiment. These controls include the following: blank features, control ladders, hybridization controls, in vitro transcription (IVT) labeling controls, reverse transcription labeling controls, negative controls, spatial calibration controls and manufacturing quality controls. Among these, we used only the IVT and reverse transcription labeling controls and the hybridization controls, which are spiked at a single fixed concentration. For the hybridization controls, three unlabeled probes are spotted on the microarray: HYB_Control_1_Cp (60 replicates), HYB_Control_2_Cp (60 replicates) and HYB_Control_3_Cp (115 replicates). The hybridization cERCs consist of three digoxigenin-labeled 60-mer oligonucleotide control targets supplied with the chemiluminescence detection kit: HYB_Control_1_Ct, HYB_Control_2_Ct and HYB_Control_3_Ct. The digoxigenin-labeled oligonucleotide targets (cDNA or cRNA) are added to the hybridization mixture; presence of signal indicates that hybridization occurred, and signal strength indicates hybridization stringency. IVT controls consist of three synthetic double-stranded cDNAs with a T7 promoter and bacterial control gene sequences: bioB (1,000-nt ds-cDNA), bioC (750-nt ds-cDNA) and bioD (600-nt ds-cDNA). Five probes targeting different regions of each of the three bacterial control genes (bioB, bioC and bioD) were used, giving 15 probes, each spotted eight times. Reverse transcription controls consist of three synthetic mRNAs with bacterial control gene sequences: lys (1,000-nt mRNA with poly(A) tail), phe (1,400-nt mRNA with poly(A) tail) and dap (1,900-nt mRNA with poly(A) tail). The synthetic mRNAs are added to the reverse transcription reaction with the RNA sample when using the reverse transcription labeling kit or the RT-IVT labeling kit. There are five control probes for each reverse transcription control gene, targeting different regions of the gene, and each probe is spotted eight times, for a total of 120 reverse transcription control probes. More detail on these controls can be found at http://docs.appliedbiosystems.com/pebiodocs/00113259.pdf and http://docs.appliedbiosystems.com/pebiodocs/04338853.pdf.

Affymetrix external RNA controls. ERCs on GeneChip eukaryotic microarrays include poly-A controls (lys, phe, thr and dap) and hybridization controls (bioB, bioC, bioD and cre). Poly-A controls are Bacillus subtilis genes that are modified by the addition of poly-A tails and then cloned into pBluescript vectors. The GeneChip Poly-A RNA Control Kit (P/N 900433) contains a presynthesized mixture of lys, phe, thr and dap. These poly A–tailed sense RNA samples can be spiked into isolated RNA samples as controls for the labeling and hybridization processes. Hybridization controls consist of bioB, bioC, bioD and cre. BioB, bioC and bioD represent genes in the biotin synthesis pathway of Escherichia coli; cre is the recombinase gene from bacteriophage P1. The GeneChip Eukaryotic Hybridization Control Kit (P/N 900299 and 900362) contains a mixture of biotin-labeled cRNA transcripts of bioB, bioC, bioD and cre. They can be spiked into the hybridization mixture, independent of RNA sample preparation, and used to evaluate sample hybridization efficiency. More detail can be found in the GeneChip Expression Analysis Technical Manual (http://www.affymetrix.com/support/technical/manual/expression_manual.affx) and GeneChip Expression Analysis Data Analysis Fundamentals (http://www.affymetrix.com/support/downloads/manuals/data_analysis_fundamentals_manual.pdf).

Agilent external RNA controls. The Agilent One-Color ERC Kit contains a mixture of ten in vitro synthesized, polyadenylated transcripts derived from the Adenovirus E1A gene. These transcripts are premixed at concentrations that span six logs and differ by one-log or half-log increments (Supplementary Table 1 online). The ERC mixture is added to the total RNA, amplified and labeled with Cy3 dye. When the ERCs are used in processing Agilent One-Color microarray assays, the Agilent Feature Extraction (version 8.5) QC Report contains a number of tables and graphs providing information on system performance. These include an indication of the linear portion of the dynamic range of the microarray experiment, the high and low detection limits of the experiment, and the reproducibility of the controls, with coefficient of variation (CV) percentage calculations across the replicate probes for each of the ten ERCs. For more details, see http://www.chem.agilent.com/scripts/literaturePDF.asp?iWHID=42629. The Agilent Two-Color ERC Kit contains the same ten tERC transcripts as used in the Agilent One-Color platform. Each transcript is premixed into two different ERC mixtures at known concentrations such that the ten transcripts are present in mass equivalents extending across 2.3 logs of concentration and represent ratios spanning from 1:10 to 10:1 (Supplementary Table 3 online). These two mixtures are spiked into

Box 1 Recommendations for the implementation of external RNA controls

• One key benefit of external RNA controls (ERCs) is the ability to get a qualitative assessment of assay performance. This benefit will be more fully realized when an extensive set of ERCs is available.

• A comprehensive study is needed for modeling concentration-response behavior based on large data sets to determine the tolerance ranges for linear fit, slope and y-intercept for assay assessment, specifically in the context of false positives and false negatives.

• The development of ERC-specific analysis approaches is encouraged.

• ERCs that are added at both the total RNA level and cRNA level are valuable as they enable failure analysis for different steps of the assay. Using both types of ERCs in the same assay is beneficial for monitoring quality at multiple steps in the process.

ANALYSIS © 2006 Nature Publishing Group http://www.nature.com/naturebiotechnology


NATURE BIOTECHNOLOGY VOLUME 24 NUMBER 9 SEPTEMBER 2006 1139

either the Cy3 or Cy5 labeling reactions and colabeled with the total RNA. The Agilent Feature Extraction (version 8.5) QC Report contains a number of tables and graphs providing information on system performance. These include a measure of the expected versus observed log ratios that provide an indication of system accuracy, as well as a determination of the reproducibility of the controls with CV percentage calculations across the replicate probes for each of the ten ERCs. For more details, see http://www.chem.agilent.com/scripts/literaturePDF.asp?iWHID=40485.

GE Healthcare external RNA controls. Each CodeLink Whole Genome bioarray, from GE Healthcare, contains a set of positive-control probes designed against six E. coli genes. For each of the six bacterial genes there are five unique probe sequences, represented with 8× redundancy per rat bioarray. Therefore, there are a total of 240 positive-control probes within each bioarray, which are used to assess microarray quality by reporting dynamic range and sensitivity. Each of the six bacterial transcripts is supplied individually as poly-A(+) mRNA, ranging in size from 1,000 to 1,300 ribonucleotides. These control RNAs can be spiked at different concentrations into the total RNA starting material or labeled individually with biotin and spiked into the cRNA before hybridization. The cRNA spiking method, as used in this study, is the manufacturer's recommendation for independently measuring bioarray quality because effects due to sample integrity and purity are circumvented. The positive-control poly-A(+) mRNAs supplied with the CodeLink Expression Assay Reagent Kit are araB, entF, fixB, hisB, gnd and leuB. These transcripts are reverse transcribed and amplified individually, incorporating biotin, and arranged in a dilution series from 50 fM to 50 pM, in fourfold concentration increments. The final concentrations of the biotinylated spikes in the hybridization solution are araB (51.2 pM), entF (12.8 pM), fixB (3.2 pM), hisB (0.80 pM), gnd (0.20 pM) and leuB (50.0 fM). For more details, see http://www4.amershambiosciences.com/APTRIX/upp00919.nsf/Content/WD%3AExternal+RNA+co%28274354027-B500%29?OpenDocument&hometitle=WebDocs.
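The fourfold dilution series above can be checked arithmetically. A minimal sketch, with variable names of our own choosing (not GE Healthcare nomenclature):

```python
# Fourfold spike-in series from 50 fM (0.05 pM) upward, six steps,
# matching the araB...leuB hybridization concentrations listed above.
concs_pM = [0.05 * 4 ** i for i in range(6)]
# -> [0.05, 0.2, 0.8, 3.2, 12.8, 51.2]
```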

Microarray data preprocessing and normalization. Data preprocessing and normalization were performed in ArrayTrack, an FDA microarray data management, analysis and interpretation software package18,19. For the Affymetrix GeneChip, five different sets of normalized data were used: PLIER, MAS5, dChip, RMA and GCRMA. Present and Absent calls were generated for each probe set. For the Agilent One-Color microarray, the raw data (gProcessedSignal data), median-scaled data and quantile-normalized data were used. Negative values and ERCs were not included in the normalization. For the Two-Color microarray, only the dye-normalized log-ratio data were used, without any further normalization. For the Applied Biosystems microarray, signal intensity is associated with two measurements, the signal/noise ratio and the detection call (or flag). Spots with a signal/noise ratio >3 and a flag value <8,191 were considered Present. For GE Healthcare CodeLink, the raw data and quantile-normalized data were used.
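Quantile normalization, used above for the Agilent one-color and CodeLink data, forces every array to share one empirical intensity distribution. The sketch below is a generic textbook implementation for illustration only; it is not the exact routine used in ArrayTrack or by any vendor, and the toy matrix is made up:

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize the columns of a (probes x arrays) matrix.

    Each array's sorted values are replaced by the mean of the sorted
    values across all arrays, so every column ends up with the same
    empirical distribution.
    """
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # rank of each value per array
    mean_sorted = np.sort(x, axis=0).mean(axis=1)      # shared reference distribution
    return mean_sorted[ranks]

# Toy example: 4 probes measured on 3 arrays.
arrays = np.array([[5., 4., 3.],
                   [2., 1., 4.],
                   [3., 4., 6.],
                   [4., 2., 8.]])
normed = quantile_normalize(arrays)
```

After normalization, sorting any column yields the same reference distribution, which is the defining property of the method.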

Concentration-response curve analysis. An ERC commonly has multiple replicates placed at different positions on a microarray. In the concentration-response analysis, the ERC signal is the mean intensity over the replicates for AG1 and AGL. For the Affymetrix, Applied Biosystems and GE Healthcare platforms, an ERC gene consists of multiple probes targeting different regions of the ERC gene. Thus, the ERC signal is calculated by first averaging the signals from the different probes of the same gene and then taking the mean over the multiple replicates.
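The two-step averaging described above can be sketched as follows; the probe names and intensities are hypothetical:

```python
import numpy as np

def erc_signal(probe_replicates):
    """Gene-level ERC signal: average replicates within each probe,
    then average the per-probe means, as described in the text.

    probe_replicates: dict mapping probe id -> list of replicate intensities.
    """
    probe_means = [np.mean(v) for v in probe_replicates.values()]
    return float(np.mean(probe_means))

# Hypothetical example: two probes for one ERC gene, two replicates each.
sig = erc_signal({"bioB_probe1": [100, 110], "bioB_probe2": [90, 95]})
```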

The concentration-response curve shown in Figure 2 was generated by plotting the concentration of either tERC (spiked poly-A molar ratio) or cERC (spiked concentration in pM) on the x-axis against signal intensity on the y-axis. The amount of cERC added to the hybridization mixture can be expressed in molar amounts based on the mass of the cERC transcript added to a specific volume of the hybridization mixture. Determining the final molar amount of tERCs in the final hybridization mixture is more difficult. One method, recommended by the ERCC1, is to express the ERC as a mass fraction of the total RNA used in the experiment. A second method is to use a number of assumptions to determine the poly-A mass ratios. The assumptions used for this paper are that the average percentage of mRNA in total RNA is 2%, the average transcript length is 2,000 bases and the average molecular weight of a single base is 330 g/mol. Using these assumptions and the known length of the individual

tERCs, the poly-A mass ratio for the different tERCs was calculated. Both cERC concentration and tERC poly-A molar ratio used for analysis are summarized in Supplementary Table 1.
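The conversion implied by those assumptions can be sketched as below. The formula is our reconstruction from the stated assumptions (2% mRNA, 2,000-base average transcript, 330 g/mol per base), not a formula taken verbatim from the paper:

```python
# Our reconstruction of the tERC poly-A ratio conversion, using only the
# assumptions stated in the text.
MRNA_FRACTION = 0.02   # assumed fraction of total RNA that is mRNA
AVG_LEN = 2000         # assumed average transcript length, bases
BASE_MW = 330          # assumed average molecular weight per base, g/mol

def polyA_molar_ratio(erc_mass_fraction, erc_length):
    """Moles of spiked ERC per mole of endogenous poly-A mRNA.

    erc_mass_fraction: spiked ERC mass / total RNA mass
    erc_length: ERC transcript length in bases
    """
    erc_moles = erc_mass_fraction / (erc_length * BASE_MW)
    mrna_moles = MRNA_FRACTION / (AVG_LEN * BASE_MW)
    return erc_moles / mrna_moles
```

Under these assumptions, an ERC of average length spiked at 2% of total RNA mass would be equimolar with the endogenous poly-A pool; shorter ERCs give proportionally higher molar ratios for the same mass fraction.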

The linear regression analysis of the concentration-response curve was based on the linear portion of the curve (Fig. 3), which was generated in JMP Genomics (http://www.jmp.com/). All ERCs were used in the analysis for both AFX and GEH_Rat, but only six of the ten tERCs were used for AG1, after removing the top tERC, which lies in the signal-saturation range, and the three bottom tERCs, which lie at the noise level. Agilent's Feature Extraction QC Report uses a similar algorithm for the same analysis. In this method, the fit to the linear portion of the concentration-response curve is performed on a log-log plot, after a parameterized sigmoidal curve fit of the data.
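A log-log fit over the linear range can be sketched as follows. The concentration and signal values here are invented for illustration; the actual analysis was performed in JMP Genomics, not with this code:

```python
import numpy as np

# Hypothetical linear-range data: fourfold concentration steps (pM) and
# signals that rise roughly fourfold per step.
conc = np.array([0.2, 0.8, 3.2, 12.8, 51.2])
signal = np.array([30., 120., 480., 1900., 7600.])

# Ordinary least-squares fit on the log-log scale.
slope, intercept = np.polyfit(np.log10(conc), np.log10(signal), 1)
# For an assay responding proportionally to concentration, the
# log-log slope should be near 1.
```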

Note: Supplementary information is available on the Nature Biotechnology website.

DISCLAIMER
This work includes contributions from, and was reviewed by, the FDA. The FDA has approved this work for publication, but it does not necessarily reflect official Agency policy. Certain commercial materials and equipment are identified in order to adequately specify experimental procedures. In no case does such identification imply recommendation or endorsement by the FDA, nor does it imply that the items identified are necessarily the best available for the purpose.

COMPETING INTERESTS STATEMENT
The authors declare competing financial interests (see the Nature Biotechnology website for details).

Published online at http://www.nature.com/naturebiotechnology/
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. ERCC. Proposed methods for testing and selecting the ERCC external RNA controls. BMC Genomics 6, 150 (2005).

2. ERCC. The External RNA Controls Consortium: a progress report. Nat. Methods 2, 731–734 (2005).

3. Hill, A.A. et al. Evaluation of normalization procedures for oligonucleotide array data based on spiked cRNA controls. Genome Biol. 2, RESEARCH0055 (2001).

4. Rajagopalan, D. A comparison of statistical methods for analysis of high density oligonucleotide array data. Bioinformatics 19, 1469–1476 (2003).

5. Irizarry, R.A. et al. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 31, e15 (2003).

6. Irizarry, R.A. et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249–264 (2003).

7. Freudenberg, J., Boriss, H. & Hasenclever, D. Comparison of preprocessing procedures for oligonucleotide microarrays by parametric bootstrap simulation of spike-in experiments. Methods Inf. Med. 43, 434–438 (2004).

8. Choe, S.E., Boutros, M., Michelson, A.M., Church, G.M. & Halfon, M.S. Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome Biol. 6, R16 (2005).

9. Dabney, A.R. & Storey, J.D. A reanalysis of a published Affymetrix GeneChip control dataset. Genome Biol. 7, 401 (2006).

10. MAQC Consortium. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol. 24, 1151–1161 (2006).

11. Guo, L. et al. Rat toxicogenomic study reveals analytical consistency across microarray platforms. Nat. Biotechnol. 24, 1162–1169 (2006).

12. Shippy, R. et al. Using RNA sample titrations to assess microarray platform performance and normalization techniques. Nat. Biotechnol. 24, 1123–1131 (2006).

13. “Guide to Probe Logarithmic Intensity Error (PLIER) Estimation”, Affymetrix Technical Note, http://www.affymetrix.com/support/technical/technotes/plier_technote.pdf

14. Microarray Suite User’s Guide, Version 5.0, http://www.affymetrix.com/support/technical/manuals.affx

15. Wu, Z., Irizarry, R.A., Gentleman, R., Murillo, F.M. & Spencer, F. A model-based background adjustment for oligonucleotide expression arrays. J. Am. Stat. Assoc. 99, 909–917 (2004).

16. Li, C. & Wong, W. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc. Natl. Acad. Sci. USA 98, 31–36 (2001).

17. Fang, H., Xie, Q., Boneva, R., Fostel, J., Perkins, R. & Tong, W. Gene expression profile exploration of a large dataset on chronic fatigue syndrome. Pharmacogenomics 7, 429–440 (2006).

18. Tong, W. et al. ArrayTrack: supporting toxicogenomic research at the US Food and Drug Administration National Center for Toxicological Research. Environ. Health Perspect. 111, 1819–1826 (2003).

19. Tong, W. et al. Development of public toxicogenomics software for microarray data management and analysis. Mutat. Res. 549, 241–253 (2004).


Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) project

Tucker A Patterson1, Edward K Lobenhofer2, Stephanie B Fulmer-Smentek3, Patrick J Collins3, Tzu-Ming Chu4, Wenjun Bao4, Hong Fang5, Ernest S Kawasaki6, Janet Hager7, Irina R Tikhonova7, Stephen J Walker8, Liang Zhang9, Patrick Hurban2, Francoise de Longueville10, James C Fuscoe1, Weida Tong1, Leming Shi1 & Russell D Wolfinger4

Microarray-based expression profiling experiments typically use either a one-color or a two-color design to measure mRNA abundance. The validity of each approach has been amply demonstrated. Here we provide a simultaneous comparison of results from one- and two-color labeling designs, using two independent RNA samples from the MicroArray Quality Control (MAQC) project, tested on each of three different microarray platforms. The data were evaluated in terms of reproducibility, specificity, sensitivity and accuracy to determine if the two approaches provide comparable results. For each of the three microarray platforms tested, the results show good agreement with high correlation coefficients and high concordance of differentially expressed gene lists within each platform. Cumulatively, these comparisons indicate that data quality is essentially equivalent between the one- and two-color approaches and strongly suggest that this variable need not be a primary factor in decisions regarding experimental microarray design.

Although microarray technology has now been available for more than ten years1–3, many fundamental questions remain about essentially every aspect of its use, including experimental design, data acquisition, data analysis and data interpretation. One of the first decisions encountered

when planning a microarray experiment is whether to use a one-color or two-color approach. A one-color procedure involves the hybridization of a single sample to each microarray after it has been labeled with a single fluorophore (such as phycoerythrin, cyanine-3 (Cy3) or cyanine-5 (Cy5)), whereas in a two-color procedure, two samples (e.g., experimental and control) are labeled with different fluorophores (usually Cy3 and Cy5 dyes) and hybridized together on a single microarray.

There are advantages and disadvantages associated with each experimental approach. Although the two-color design was initially developed to reduce error associated with variability in microarray manufacturing, the availability of high-quality commercial microarrays has decreased the variability due to microarray production and thereby improved the consistency of microarray results at both the signal and ratio level. In two-color designs, the hybridization of two samples to the same microarray allows a direct comparison, minimizing variability due to processing multiple microarrays per assay. This reduced variability theoretically results in increased sensitivity and accuracy in determining levels of differential expression between sample pairs. More complex hybridization schemes are also an option when using two-color platforms, including hybridization with common reference samples or the use of loop designs4. Although dye-specific biases can substantially affect results when experiments are performed using two-color designs, these biases can be mitigated by performing dye-reversed replicates (dye swaps or fluorophore reversals). Such technical replication adds to experimental costs, but can enhance both accuracy and sensitivity in measuring differential expression. The primary advantages of one-color designs are experimental design simplicity and flexibility. Hybridization of a single sample per microarray facilitates comparisons across microarrays and between groups of samples. Data inconsistency across assays due to multiple sources of variability, including microarray fabrication and processing, can be reduced for one-color microarrays by performing sufficient biological and technical replicate assays.

Several groups have reported an inability to generate reproducible data across laboratories and across platforms5,6. More recent studies have demonstrated that under properly controlled conditions both inter- and intralaboratory comparisons show relatively good agreement7–10. Although a few recent studies have made one-color to two-color comparisons across different platforms11–14, this manuscript describes a

1 National Center for Toxicological Research, US Food & Drug Administration, 3900 NCTR Rd., Jefferson, Arkansas 72079, USA. 2Cogenics, A Division of Clinical Data, 100 Perimeter Park Drive, Suite C, Morrisville, North Carolina 27560, USA. 3Integrated Biology Solutions, Agilent Technologies, 5301 Stevens Creek Blvd., Santa Clara, California 95052-8059, USA. 4SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513, USA. 5Division of Bioinformatics, Z-Tech Corporation at NCTR/FDA, 3900 NCTR Rd., Jefferson, Arkansas 72079, USA. 6NCI Advanced Technology Center, 8717 Grovemont Circle, Bethesda, Maryland 20892-4605, USA. 7Yale University, W.M. Keck Biotechnology Resource Laboratory, Microarray Resource, 300 George St., Suite 2110, New Haven, Connecticut 06511, USA. 8Department of Physiology & Pharmacology, Wake Forest University School of Medicine, 115 S. Chestnut St., Winston-Salem, North Carolina 27101, USA. 9CapitalBio Corporation, 18 Life Science Parkway, Changping District, Beijing 102206, P.R. China. 10Gene Expression Chips, Eppendorf Array Technologies (EAT), 20, rue du seminaire, 5000 Namur, Belgium. Correspondence should be addressed to T.A.P. ([email protected]).

Published online 8 September 2006; doi:10.1038/nbt1242


comprehensive study comparing one-color to two-color designs within the same platform and across multiple test sites. An advantage of this type of comparison is that results can be easily compared within a platform because the same microarray (thus identical probes), sample labeling protocols and detection technologies are used for both the one- and two-color designs. In this study we have used three different microarray platforms with the intent of focusing on the experimental design variable, rather than specific attributes of a given platform. Although comparison across platforms is possible, the purpose of this study is to compare results within and across design schemes for each platform. Differential expression profiles from a pair of total RNA samples (Stratagene Universal Human Reference total RNA and Ambion Human Brain Reference total RNA) were generated using both one-color and two-color assays on different microarray platforms (Agilent, CapitalBio and TeleChem). These data were used to evaluate the reproducibility, specificity and sensitivity of differential expression measurements between one- and two-color experimental designs within each platform. These analyses attempt to answer a fundamental question in microarray assay experimental design: are there significant differences between the results obtained with a one-color approach versus a two-color approach?

RESULTS
All data sets from the three platforms and five test sites (three sites for the Agilent platform and one site each for CapitalBio and TeleChem) were generated using the recommended protocols and methods of the respective manufacturers (including amplification, labeling, hybridization, image analysis and data preprocessing and filtering). The same lots of two distinct RNA samples (Stratagene Universal Human Reference total RNA and Ambion Human Brain Reference total RNA) were used

for all data sets. For each of the Agilent and CapitalBio sites, 20 microarrays (10 two-color and 10 one-color) were used. For the TeleChem site, 30 microarrays (20 two-color and 10 one-color) were used. Across all five sites, a total of 110 microarrays were hybridized (60 two-color and 50 one-color), which assayed a total of 170 samples (see the Methods section for additional experimental design details). After data preprocessing and filtering, the numbers of probes used in subsequent analyses for the Agilent, CapitalBio and TeleChem platforms were 19,802, 11,735 and 12,453, respectively.

Reproducibility
To examine reproducibility within platforms, we calculated Pearson correlations on log2-scaled data for all pair-wise combinations of microarrays within a given sample, and then averaged across combinations of specific microarrays to enable different comparisons regarding technical or platform variability. Table 1 presents average intra- and intersite correlations of intensities or ratios within one- and two-color designs for each platform. Scatter plots representing a subset of the comparisons are illustrated in Supplementary Figure 1 online. For the two-color designs, intensity reproducibility was calculated both within and across the two different dyes to assess the impact of the dye on the resulting measurement. For the within-dye calculations, the technical replicates of samples labeled with the same dye across the microarrays were considered, and for the across-dyes calculations, all of the replicates for a given sample when labeled with either dye were evaluated. The ratio results were separated according to whether the values used were calculated from within or across dye-swap configurations.
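The reproducibility metric described above can be sketched as the mean Pearson correlation over all pair-wise replicate combinations, computed on log2 intensities. This is a generic illustration with made-up data, not the exact analysis code used in the study:

```python
import numpy as np
from itertools import combinations

def mean_pairwise_correlation(log2_arrays):
    """Average Pearson correlation over all pairs of replicate arrays.

    log2_arrays: (probes x replicates) matrix of log2 intensities.
    """
    n = log2_arrays.shape[1]
    rs = [np.corrcoef(log2_arrays[:, i], log2_arrays[:, j])[0, 1]
          for i, j in combinations(range(n), 2)]
    return float(np.mean(rs))
```

Grouping the pairs by dye or by site before averaging yields the within-dye, across-dye, intrasite and intersite entries of Table 1.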

Most of the average correlations are well above 0.9, indicating high reproducibility. As expected, the correlations decline when computed

Table 1 Averages and standard deviations of Pearson correlations for both one-color and two-color data from each of the three platforms

Platform               Comparison                           One-color, avg (s.d.)   Two-color, avg (s.d.)
Agilent (three sites)  Intrasite Within Dye/A               0.992 (0.005)           0.990 (0.013)
                       Intrasite Within Dye/B               0.993 (0.004)           0.980 (0.038)
                       Intrasite Within Dye Swap (Ratio)    n/a                     0.980 (0.032)
                       Intrasite Across Dye/A               n/a                     0.984 (0.015)
                       Intrasite Across Dye/B               n/a                     0.977 (0.029)
                       Intrasite Across Dye Swap (Ratio)    n/a                     0.950 (0.019)
                       Intersite Intra Dye/A                0.959 (0.018)           0.982 (0.015)
                       Intersite Intra Dye/B                0.965 (0.015)           0.970 (0.038)
                       Intersite Within Dye Swap (Ratio)    n/a                     0.968 (0.031)
                       Intersite Across Dye/A               n/a                     0.977 (0.016)
                       Intersite Across Dye/B               n/a                     0.966 (0.033)
                       Intersite Across Dye Swap (Ratio)    n/a                     0.950 (0.023)
CapitalBio (one site)  Intrasite Within Dye/A               0.959 (0.010)           0.913 (0.073)
                       Intrasite Within Dye/B               0.975 (0.006)           0.912 (0.078)
                       Intrasite Within Dye Swap (Ratio)    n/a                     0.955 (0.038)
                       Intrasite Across Dye/A               n/a                     0.916 (0.074)
                       Intrasite Across Dye/B               n/a                     0.918 (0.075)
                       Intrasite Across Dye Swap (Ratio)    n/a                     0.950 (0.038)
TeleChem (one site)    Intrasite Within Dye/A               0.931 (0.018)           0.902 (0.042)
                       Intrasite Within Dye/B               0.885 (0.023)           0.910 (0.041)
                       Intrasite Within Dye Swap (Ratio)    n/a                     0.805 (0.072)
                       Intrasite Across Dye/A               n/a                     0.887 (0.032)
                       Intrasite Across Dye/B               n/a                     0.884 (0.043)
                       Intrasite Across Dye Swap (Ratio)    n/a                     0.543 (0.106)

Correlations are computed from log2 normalized intensity values except for rows containing (Ratio), in which case they are computed from log2 normalized ratios.


across known sources of variability (dye and site). Interestingly, log2 ratios appeared to be slightly less reproducible than log2 intensities for Agilent and TeleChem, but more reproducible for CapitalBio. This result could be driven by a larger microarray-to-microarray variability for CapitalBio, or by the performance of a manual channel balancing while scanning two-color, but not one-color, CapitalBio microarrays. The overall lower correlation values for TeleChem appear to be driven by a nonlinear dye bias (data not shown). The intersite, one-color results for the Agilent sites are presented elsewhere15 and reveal that the Agilent data are very consistent between sites.

To determine if the one-color and two-color designs are revealing the same biology, we compared the reproducibility of the lists of genes identified as differentially expressed by each approach within each platform. Common gene lists were generated comparing the number of differentially expressed genes for one-color and two-color data within each platform (Table 2). Comparisons are given for combinations of two P values (P < 0.05 and P < 0.01) and three fold-change (FC) thresholds (FC > 1.5, FC > 2.0 and FC > 4.0), with differentially expressed genes identified using a one-sample t-test of the sample B to sample A (B/A) ratio

data including five replicates for each site. Concordances of differentially expressed genes are consistently >80% for all three Agilent sites, regardless of the P-value or fold-change criteria used. Similarly, the CapitalBio concordances are consistently ~70%. The TeleChem concordances are less consistent across P values and fold changes and are generally lower than those for the CapitalBio and Agilent data, which is in agreement with the lower overall correlation values for this platform.
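The gene-list concordance computation can be sketched as below: select genes passing both the P-value and fold-change cutoffs in each design, then express the overlap as a percentage. The thresholding helper and the choice of denominator are our assumptions for illustration, not necessarily the exact convention used in Table 2:

```python
import math

def deg_set(pvals, log2fc, p_cut=0.05, fc_cut=1.5):
    """Indices of genes passing both a P-value and a fold-change cutoff."""
    lfc = math.log2(fc_cut)
    return {g for g, (p, f) in enumerate(zip(pvals, log2fc))
            if p < p_cut and abs(f) > lfc}

def concordance(set_a, set_b):
    """Overlap as a fraction of the smaller list (one common convention)."""
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))
```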

Specificity and sensitivity
In addition to evaluating the reproducibility of the data from the one- and two-color assays, we also considered the sensitivity and specificity. Specificity defines the ability of an assay to determine differences only when they truly exist (that is, the true-negative rate). Sensitivity is the power to detect true differences (that is, the true-positive rate). Both of these measures make a tacit assumption that the truth is binary, which in this case means the mRNA levels derived from a gene are either the same for samples A and B or they are different. The actual truth is that they are likely to be always different, but this difference is small enough relative to technical noise that a substantial fraction


Figure 1 Volcano plots depicting estimated fold change (log2, x-axis) and statistical significance (–log10 P value, y-axis). Columns correspond to results from ANOVA model 1 (one-color intensity), model 2 (two-color intensity) and model 3 (two-color ratio). Rows correspond to manufacturers. (a–c) Agilent. (d–f) CapitalBio. (g–i) TeleChem. Each point represents a gene, and colors correspond to ranges of negative log10 P and log2 fold-change values. Red: 20 < –log10 P < 50 and 3 < log2 fold < 9 or –9 < log2 fold < –3; blue: 10 < –log10 P < 50 and 2 < log2 fold < 3 or –3 < log2 fold < –2; yellow: 4 < –log10 P < 50 and 1 < log2 fold < 2 or –2 < log2 fold < –1; pink: 10 < –log10 P < 20 and 3 < log2 fold or log2 fold < –3; light blue: 4 < –log10 P < 10 and 2 < log2 fold or log2 fold < –2; light green: 2 < –log10 P < 4 and 1 < log2 fold or log2 fold < –1; gray: –log10 P < 2 or log2 fold < 1 and log2 fold > –1.


of mRNA levels can be considered to be the same. When the binary truth is known, the trade-off between sensitivity and specificity is typically portrayed using a receiver operating characteristic (ROC) plot. However, here the truth is unknown with respect to A versus B gene expression, as is the case with most gene expression profiling experiments. Therefore, relative specificity and sensitivity are compared in terms of distributions of statistical modeling results.

By using a P-value criterion to declare genes differentially expressed, the false-positive rate (and hence the specificity) can be controlled at the desired level. The accuracy of this control depends, at least in part, on the standard t-test assumptions, which can be shown to be approximately valid for these data. Once specificity is bounded, the total number of differentially expressed genes can be compared as a measure of sensitivity.

To more rigorously assess sensitivity in this fashion, we fit and compared results from three different gene-by-gene ANOVA models (see Methods for details):

Model 1: log2(Intensity) = Mean + Sample + Site + Error
Model 2: log2(Intensity) = Mean + Sample + Dye + Sample*Dye + Site + Microarray + Error
Model 3: log2(Ratio) = Mean + Dye + Site + Error

Model 1 is applied to one-color data, model 2 is applied directly to two-color intensity data without forming ratios, and model 3 is applied to ratios formed from two-color data.

Direct modeling of intensities in models 1 and 2 enables a straightforward comparison between results for the one- and two-color data. Furthermore, the results from models 2 and 3 are quite similar, and so model 2 provides a bridge between models 1 and 3 that can be used for comparisons with ratio results that are commonly computed with two-color data.
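For a single gene, model 1 above can be fit by ordinary least squares on a design matrix with indicator columns for Sample and Site. The toy intensities below are invented (two samples, two sites, two replicates each); the actual study fit these models in dedicated statistical software:

```python
import numpy as np

# Hypothetical log2 intensities for one gene:
# site 1: sample A, A, B, B; site 2: sample A, A, B, B.
log2_y = np.array([8.0, 8.1, 10.9, 11.0,
                   8.2, 8.3, 11.1, 11.2])
sample_b = np.array([0, 0, 1, 1, 0, 0, 1, 1])  # indicator for sample B
site_2   = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # indicator for site 2

# Model 1: log2(Intensity) = Mean + Sample + Site + Error
X = np.column_stack([np.ones(8), sample_b, site_2])
beta, *_ = np.linalg.lstsq(X, log2_y, rcond=None)

log2_fold_change = beta[1]  # Sample effect = estimated log2(B/A)
```

Repeating this fit gene by gene yields the fold-change estimates and, with the residual variance, the t-statistics and P values summarized in the volcano plots.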

Before discussing primary results from these models, it should be noted that there is an imbalance in the number of samples hybridized for the one-color and two-color designs, which improves the sensitivity of the two-color results. More specifically, for each of the Agilent and CapitalBio sites, there are ten one-color microarrays and ten two-color microarrays; hence, there are twice as many samples hybridized on the two-color microarrays. That is, the one-color results effectively have half as much data, as only one sample was hybridized to each microarray.

This degree of imbalance is even greater for the TeleChem platform, for which 20 two-color and only 10 one-color hybridizations were processed, resulting in four times as much two-color data. Subsequent results should be interpreted with this in mind.

The three models were fit to the preprocessed Agilent, CapitalBio and TeleChem data, and several output summary statistics were collected for each gene. Volcano plots (Fig. 1) compare the estimated log2 fold change (x-axes) against its statistical significance (y-axes). Large numbers of genes are identified as differentially expressed in the analyses of data from all three platforms, as is expected when comparing a brain sample to a tissue-pool sample. All of the volcano plots have a visually similar distribution and range for the statistical significance values (y-axes) within each platform, except for model 1 for the TeleChem data (Fig. 1g), which has a substantially smaller range that (as noted above) may be due to differences in the total number of microarrays processed for each approach. For all three platforms there is a tendency for the one-color data to exhibit larger fold changes but smaller significance scores (that is, the volcano plots are shorter and wider for one-color as compared to two-color data).

Figure 2 provides a more detailed depiction of the results from models 1, 2 and 3. Estimated log2 fold changes are compared in a scatter plot matrix for one-color intensities (model 1), two-color intensities (model 2) and two-color ratios (model 3) for the Agilent, CapitalBio and TeleChem data. The estimated fold changes are very consistent, especially between the two two-color methods (far right column). The fold changes estimated from the one-color data tend to be larger than those estimated by either model for the two-color data, as indicated by the slopes shown in Figure 2.

The scatter plots in Figure 3 display negative log10 P-value comparisons from the Agilent, CapitalBio and TeleChem data. Larger negative log10 P values mean more significant results. Therefore, when the negative log10 P values from different methods are compared graphically on different axes and the majority of the data points lie above the 45° reference line, it suggests that the method depicted on the y-axis is more sensitive than that depicted on the x-axis (or vice versa if the majority of points lie below the reference line). The scatter plots for the Agilent data suggest

Table 2  Common gene list results for one- versus two-color microarray data, based on numbers of differentially expressed genes

                               P < 0.05                                      P < 0.01
Test site    Fold change    One color   Two color   Common genes^a      One color   Two color   Common genes^a
Agilent 1    FC > 1.5         13,043      12,709    11,053 (86%)          11,771      12,506    10,175 (84%)
             FC > 2            9,701       8,812     7,767 (84%)           9,273       8,678     7,467 (83%)
             FC > 4            3,998       3,494     3,055 (82%)           3,979       3,447     3,029 (82%)
Agilent 2    FC > 1.5         13,308      12,345    10,992 (86%)          12,673      11,410     9,940 (83%)
             FC > 2            9,792       8,686     7,712 (83%)           9,526       8,043     7,071 (80%)
             FC > 4            4,077       3,623     3,104 (81%)           4,042       3,261     2,886 (79%)
Agilent 3    FC > 1.5         12,968      12,545    11,192 (88%)          12,537      12,056    10,580 (86%)
             FC > 2            9,363       8,720     7,721 (85%)           9,266       8,373     7,397 (84%)
             FC > 4            3,728       3,596     3,058 (84%)           3,716       3,399     2,987 (84%)
CapitalBio   FC > 1.5          7,344       6,336     5,129 (75%)           6,238       6,098     4,529 (73%)
             FC > 2            5,383       4,154     3,426 (72%)           5,004       4,078     3,203 (71%)
             FC > 4            2,207       1,599     1,283 (67%)           2,081       1,580     1,187 (65%)
TeleChem     FC > 1.5          2,883       3,306     1,491 (48%)           1,079       3,305       760 (35%)
             FC > 2            2,220       1,133       659 (39%)             997       1,133       458 (43%)
             FC > 4              645         178       148 (36%)             475         178       140 (43%)

Values are presented using two different statistical criteria (P < 0.05 or P < 0.01) and three different fold-change criteria (FC > 1.5, 2 or 4). ^aThe values in parentheses give the percentage of common genes: the number of genes identified as differentially expressed by both the one- and two-color approaches, divided by the average of the numbers of differentially expressed genes from the two approaches.
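The common-gene percentages in Table 2 can be reproduced by dividing the common count by the average size of the two gene lists; this reading of the footnote matches the tabulated values (e.g., 86% for Agilent 1, FC > 1.5, P < 0.05). The sketch below uses synthetic gene sets constructed to have exactly the Agilent 1 sizes and overlap.

```python
def common_gene_stats(one_color, two_color):
    """Number and percentage of genes called differentially expressed by
    both the one-color and two-color approaches. The percentage divides
    the common count by the average of the two list sizes, i.e.,
    2 * |common| / (|one| + |two|), which reproduces Table 2."""
    common = one_color & two_color
    pct = 200.0 * len(common) / (len(one_color) + len(two_color))
    return len(common), round(pct)

# Synthetic sets with the Agilent 1 (FC > 1.5, P < 0.05) sizes and overlap.
one = set(range(13043))
two = set(range(1990, 1990 + 12709))   # overlaps `one` in 11,053 genes
n_common, pct = common_gene_stats(one, two)
```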

ANALYSIS  © 2006 Nature Publishing Group  http://www.nature.com/naturebiotechnology

1144 VOLUME 24 NUMBER 9 SEPTEMBER 2006 NATURE BIOTECHNOLOGY

that the two-color intensity-based analysis (model 2) has more power (sensitivity) than both the one-color intensity-based (model 1) and two-color ratio-based analyses (model 3). The one-color analysis appears to have slightly more power than the two-color ratio analysis in the lower portion of the significance range, whereas the two-color ratio has more power in the upper range. When Figure 1a,c is also considered, the one-color data tend to exhibit larger fold changes, which explains why more differentially expressed genes were observed for the one-color data (Table 2). Figure 3 (row 1) shows that although the power between these two methods is similar, the relationship between them is nonlinear.

For the CapitalBio data in Figure 3, both two-color models produce very similar results, and both appear to have more power (sensitivity) than the one-color intensity analyses. For the TeleChem data, the difference is even more striking. As detailed above, these observed differences may be due to the differences in the amount of data for each approach, as twice as much data were obtained from a two-color assay compared to a one-color assay. Because of this inequity in the data, the power comparisons shown here are not a completely fair assessment of the sensitivity of one- versus two-color procedures, although they do help to demonstrate the effectiveness of increasing sample sizes without also increasing the number of microarrays used.

For example, from Table 2, when identical thresholds for significance are used, in most instances two-color ratio data produce fewer differentially expressed genes than one-color data, which indicates either that one-color platforms are more sensitive in identifying differentially expressed genes or that the fold changes reported by the one-color platform are less compressed than the two-color fold changes. The data modeled here suggest that the latter explanation is more likely.

For two-color experimental designs, specificity can also be addressed by analysis of self-self hybridizations. In experimental designs that include a dye swap, such as this one, systematic errors are reduced by inclusion of the dye-flip control. One can, therefore, assess the false-positive rate

[Figure 2 graphic: a 3 × 3 scatter plot matrix of log2 fold changes, with one-color intensity, two-color intensity and two-color ratio estimates on the axes (each spanning –8 to 8); inset correlations range from R = 0.73 to R = 1.00 and slopes from S = 1.00 to S = 1.25.]

Figure 2 Comparison of log2 fold-change estimate results from three different modeling approaches for the three different platforms. (a–c) Agilent. (d–f) CapitalBio. (g–i) TeleChem. Columns correspond to log2 fold-change comparisons of one-color intensity versus two-color intensity, one-color intensity versus two-color ratio and two-color intensity versus two-color ratio. Each gray point represents a feature on the microarray. The red lines are 45° reference lines and the contours represent density levels for the points. Statistics for correlation (R) and slope (S) are inset in each graph.


from self-self designs if one half of the self-self comparisons have their polarity reversed before calculation of significance. This analysis was performed for one of the Agilent test sites, for both pairs of self-self experiments. In this analysis, four of the self-self hybridizations were combined with two randomly chosen microarrays whose polarity was reversed. For the A sample, 98 of 41,000 genes were detected as significantly differentially expressed (that is, false positives; P < 0.01). For the B sample, 61 of 41,000 genes were detected as significantly differentially expressed (P < 0.01).
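The polarity-reversal idea can be sketched as follows. This is a conceptual illustration on simulated noise, not the study's actual analysis: the array counts match the text (six self-self arrays, two flipped; 41,000 probes), but the noise level and the use of a plain two-sample t-test are my assumptions.

```python
import numpy as np
from scipy import stats

def self_self_false_positives(log_ratios, flip_idx, alpha=0.01):
    """Sketch of the polarity-reversal check: reverse the sign of the
    log ratios on some self-self arrays, then test each gene as if the
    flipped and unflipped arrays were two real sample groups. Any gene
    called significant is a false positive by construction."""
    data = log_ratios.copy()
    data[:, flip_idx] *= -1.0
    flipped = np.zeros(data.shape[1], dtype=bool)
    flipped[flip_idx] = True
    _, p = stats.ttest_ind(data[:, flipped], data[:, ~flipped], axis=1)
    return int(np.sum(p < alpha))

# 41,000 probes, six self-self arrays, two of them polarity-reversed,
# with pure measurement noise (no true differential expression).
rng = np.random.default_rng(1)
ratios = rng.normal(0.0, 0.1, size=(41000, 6))
n_fp = self_self_false_positives(ratios, flip_idx=[0, 1])
```

With well-behaved noise the count lands near the nominal rate (about 1% of 41,000, i.e., roughly 400 genes); counts like the observed 98 and 61 of 41,000 are therefore well below nominal.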

To further address the question of which design (one-color or two-color) provides greater sensitivity, we examined correlations of one-color and two-color data for one of the Agilent test sites without any filtering based on detection calls (see Supplementary Fig. 2 online). Fold-change values correlated well between the two approaches across the entire intensity range, indicating that the approaches have similar levels of sensitivity. Furthermore, when thresholds for differential expression were applied (P < 0.01 and FC > 1.5), there was a 69% overlap of the genes identified by both approaches. Each approach uniquely identified 13–18% of the total number of differentially expressed genes, and only a very small subset of the genes were found to be anticorrelated (18, or 0.09%).

Accuracy
Whereas specificity and sensitivity refer to a dichotomized version of the truth, a more direct assessment of the accuracy of the platforms can be obtained when the truth is quantitative. Again, the true quantitative differences between the mRNA levels of samples A and B for each gene are unknown, but a well-accepted surrogate can be obtained from orthogonal quantitative technologies (e.g., TaqMan assays).

As detailed above, when data from one of the Agilent test sites were analyzed, ~31% of the total number of differentially expressed genes detected by one approach was not also identified by the other. To discern whether these discordant data points are false positives from one or the other approach, we compared both to results generated using TaqMan assays. Genes were selected for measurement in these samples by TaqMan assays as part of the main MAQC study15. Most of the genes assayed by TaqMan were randomly selected from a set of RefSeq genes common to four commercial microarray platforms (Affymetrix, Agilent, GE Healthcare and Illumina). More details on the selection of these genes can be found elsewhere16. Figure 4 illustrates the comparison of the one-color, two-color and TaqMan assay data, colored based on the significance (P < 0.01 and FC > 1.5) of the ratio between the B and A samples for the three different sets of data (one-color, two-color and TaqMan assays). Data shown represent either all probes with mapped TaqMan data (Fig. 4a, N = 906) or only probes that were mapped as persistently detected in the Agilent one- and two-color experiments (filtered as described in Methods) and detected in at least three of four replicates for both samples in the TaqMan assay data (Fig. 4b, N = 519). The results show a good overall correlation between the TaqMan assay data and both the one-color and two-color data. The 18 probes that were anticorrelated between one- and two-color data were not in the subset of genes assessed with TaqMan assays in this study. However, for those genes identified as discordant between the Agilent one- and two-color data, some were verified with TaqMan assays for each platform. A slightly higher percentage of probes found to be significant only in the two-color design were verified with TaqMan assays (51 of 85, or 60%, for one-color, versus 39 of 55, or 71%, for two-color; Fig. 4a), indicating that both approaches have similar levels of accuracy.

DISCUSSION
Every aspect of microarray experimentation, including RNA isolation and purification, labeling and amplification, microarray fabrication, hybridization, data acquisition, analysis and statistical methods, has seen major advancements in the last several years. With the variety of platform choices available that have benefited from these advancements, a natural question arises regarding the characteristics of data generated from one-color and two-color assays. Results presented here describe a comprehensive study comparing one-color to two-color assays within three different platforms, and across multiple test sites for one of the platforms, using two distinct RNA samples. Differential expression data from a pair of total RNA samples (Stratagene Universal Human Reference total RNA and Ambion Human Brain Reference total RNA) were generated using both one-color and two-color assays on different microarray platforms (Agilent, CapitalBio and TeleChem) and used to evaluate the relative reproducibility, specificity, sensitivity and accuracy of the two approaches.

One of the strengths of this analysis is that the comparison of the one-color and two-color assays does not depend on interplatform analysis, thus avoiding many of the complications inherent to such a comparison (including probe sequence issues as well as differences in target labeling and detection technology). In addition, the filtered gene lists used for the analysis presented here are consistent between the two design schemes on each platform but differ between platforms, which would further complicate interplatform comparisons.

Overall, the results from one- and two-color assays compare well, which aligns with expectations generated by numerous independent successes of one- and two-color microarray applications. Here we provide a statistical validation of this expectation. Reproducibility between the one-color and two-color assays is quite similar for each platform, as demonstrated by the consistency of Pearson correlation values. When ratios are generated from the two distinct RNA samples, the differentially expressed gene lists are highly consistent across one- and two-color data when using widely accepted P-value and fold-change thresholds for significance. Just as important, the stability of the differentially expressed gene lists is consistent within individual platforms. Correlation coefficients in Table 1 are higher for the Agilent data, leading to greater overall concordance, but for all the platforms the one-color and two-color data are comparable when assessing concordance using differentially expressed gene lists.

Three ANOVA models are defined to provide a statistical framework for comparison of relative intraplatform specificity and sensitivity. Model 1 applies to one-color log2 intensities and model 3 to two-color log2 ratios. Model 2 handles two-color log2 intensities and serves as a bridge between models 1 and 3. The use of these models avoids the problem of arbitrarily defining ratios for the one-color data and enables adjustment for all known sources of variability. In addition, model 2 is shown to have slightly more sensitivity than model 3 for the Agilent data. Modeling two-color intensities directly, as in model 2, is not common practice, but it offers several advantages, including the ability to study sample-dye interactions. Overall, the relative specificity and sensitivity of the three platforms as determined by the three models is very similar between one- and two-color assays within each platform (Figs. 1–3). The results suggest that the two-color assays have a slight advantage with regard to power (sensitivity) and the detection of small fold changes (Figs. 1 and 3), especially when considering an equal number of microarrays. The one-color data do appear to be less compressed than the two-color data, as indicated by the slopes shown in Figures 2 and 4, which should be considered when using filtering rules that apply directly to estimated fold changes.
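One way to picture intensity modeling in the spirit of model 2 is a per-gene linear model on two-color log2 intensities with sample and dye effects. The sketch below is only a minimal illustration: the effect coding, array count and noise level are my assumptions, and the actual MAQC models also include terms (e.g., array and site effects) omitted here.

```python
import numpy as np

def fit_gene_model(y, sample, dye):
    """Least-squares fit of log2 intensity = mean + sample + dye + error
    for one gene. With samples and dyes coded as -0.5/+0.5, the sample
    coefficient is directly the estimated log2 fold change (B - A)."""
    X = np.column_stack([np.ones_like(y), sample, dye])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

# Four two-color arrays with a dye swap -> eight intensities per gene.
# Arrays 1-2: A on Cy3, B on Cy5; arrays 3-4: the dye-flipped pairing.
sample = np.array([-0.5, 0.5, -0.5, 0.5, 0.5, -0.5, 0.5, -0.5])
dye = np.array([-0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5])
rng = np.random.default_rng(2)
y = 8.0 + 1.0 * sample + 0.3 * dye + rng.normal(0.0, 0.05, size=8)
log2_fc = fit_gene_model(y, sample, dye)
```

Because the dye swap makes the sample and dye columns orthogonal, the dye effect (here 0.3) does not bias the estimated log2 fold change (here approximately 1.0).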

In addressing the accuracy of the one-color and two-color assays using data from the Agilent platform, the results also show a good overall correlation with the TaqMan assay data. In some cases the TaqMan assay data have better agreement with the one-color data and in others with the two-color data. In many cases the differential expression results were consistent in direction between the one-color and two-color assays, but failed to meet the applied fold-change or significance criteria. In those cases when genes are reported as significantly differentially expressed by TaqMan assays, but not by either the one-color or two-color microarray assays, the differences may be attributable to the fact that the technologies are targeting and measuring different regions of a particular gene and/or splice variant. Also, most of the genes reported as significantly differentially expressed in the TaqMan assay, but not in the microarray data, are below the detection level of the microarray assay (Fig. 4) and may be indicative of the higher sensitivity of the PCR-based method. Finally, the significance of the microarray and TaqMan assays is not directly comparable, as a different level of replication was undertaken for the TaqMan assay data16.

In summary, by presenting the experimental design and performance advantages of both modes, this study provides researchers with insight and guidance for selecting the approach (one- or two-color) best suited to their research needs. When assessing the reproducibility of the biology across the two approaches by comparing the concordance of differentially expressed gene lists, performance was approximately equal (Table 2 and Fig. 4). Cumulatively, these results indicate that data generated from both one- and two-color assays are approximately equivalent and provide similar levels of biological insight. It should be noted that these results may not apply to microarray platforms for which manufacturing variability is high (such as may occur with some suboptimal, in-house, robotically spotted arrays with poor quality control). All microarrays used in this study were obtained and processed at approximately the same time. Although multiple manufacturing lots of microarrays were used in all three platforms, no effort was made to control which manufacturing lots were grouped together in the study. Hence, the magnitude of the variance of the one-color and two-color results may differ from those presented here if the data were specifically generated and assessed as individual groups across multiple manufacturing lots. In essence, the variability due to manufacturing lot has not been addressed in this study, since the array populations for each

[Figure 3 graphic: a 3 × 3 scatter plot matrix of negative log10 P values, with one-color intensity, two-color intensity and two-color ratio models on the axes; inset correlations range from R = 0.41 to R = 0.99 and slopes from S = 0.16 to S = 0.99.]
Figure 3 Comparison of negative log10 P-value estimate results from three different modeling approaches for the three different platforms. (a–c) Agilent. (d–f) CapitalBio. (g–i) TeleChem. Columns correspond to negative log10 P-value estimates of one-color intensity versus two-color intensity, one-color intensity versus two-color ratio and two-color intensity versus two-color ratio. Each gray point represents a feature on the microarray. The red lines are 45° reference lines and the contours represent density levels for the points. Statistics for correlation (R) and slope (S) are inset in each graph.


platform were heterogeneous in terms of batch specificity. Ultimately the decision to use either a one-color or two-color approach will be determined by cost, experimental design considerations and personal preference.

METHODS
Hybridization. Three independent test sites were used for the Agilent platform and one test site each was used for the CapitalBio and TeleChem platforms (five test sites in total). All test sites received the same lot numbers of two different total RNA samples: Stratagene Universal Human Reference total RNA (SUHRR, sample A) and Ambion Human Brain Reference total RNA (AHBRR, sample B). The hybridization-dye pairings and RNA descriptions were as follows: two-color hybridization: a, SUHRR-Cy3 versus SUHRR-Cy5; b, AHBRR-Cy3 versus AHBRR-Cy5; c, SUHRR-Cy3 versus AHBRR-Cy5; d, AHBRR-Cy3 versus SUHRR-Cy5; one-color hybridization: e, SUHRR-Cy3; f, AHBRR-Cy3.

The two-color self-self hybridizations (codes a and b) provide information about the reproducibility and specificity of the two-color hybridizations, but are not used for most of the analyses described in this paper, both because of space constraints and to balance the comparisons between the one- and two-color results within a platform more evenly. However, they are included in the available data set.

For each of the Agilent and CapitalBio sites, 5 microarrays were used for each of the RNA codes c, d, e and f, for a total of 20 microarrays (10 two-color and 10 one-color) at each of these sites. For the TeleChem site, 10 microarrays were used for RNA codes c and d, and 5 microarrays for codes e and f, for a total of 30 microarrays (20 two-color and 10 one-color). Across all five sites, a total of 110 microarrays were hybridized (60 two-color and 50 one-color), which assayed a total of 170 samples.

RNA quantification and purity assessment. RNA samples were quantified using a NanoDrop ND-1000 UV-VIS spectrophotometer. Each test site performed three replicate measurements of 1.5 µl for each sample and reported the values as average ± s.d.

RNA intactness assessment. SUHRR and AHBRR (200 ng) were run on the Agilent Bioanalyzer 2100 in triplicate (all samples on one chip) by each test site. rRNA ratio (28S/18S) and RNA Integrity Numbers (RIN) are reported as average ± s.d. Acceptable values were defined as: A260/A280 ratio in the range of 1.8–2.2, rRNA ratio (28S/18S) > 0.9 and RIN value > 8.0.
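The acceptance criteria above translate directly into a simple check; the function name and interface here are illustrative, not part of the study's software.

```python
def rna_qc_pass(a260_a280, rrna_ratio_28s_18s, rin):
    """Acceptance criteria from the Methods: A260/A280 ratio between
    1.8 and 2.2, 28S/18S rRNA ratio above 0.9, and RIN above 8.0."""
    return (1.8 <= a260_a280 <= 2.2
            and rrna_ratio_28s_18s > 0.9
            and rin > 8.0)

ok = rna_qc_pass(2.0, 1.6, 9.1)          # passes all three criteria
degraded = rna_qc_pass(2.0, 1.6, 6.5)    # RIN too low -> fails
```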

Labeling and hybridizations on the Agilent platform. Five hundred nanograms of total RNA was converted into labeled cRNA with nucleotides coupled to a fluorescent dye (either Cy3 or Cy5) using the Low RNA Input Fluorescent Linear Amplification Kit (version 4.0 protocol) (Agilent Technologies). The quality and quantity of the resulting labeled cRNA were assessed using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies) and an Agilent 2100 Bioanalyzer. Individually labeled cRNAs were not pooled before hybridization. Equal amounts (1.5 µg) of Cy3- and Cy5-labeled cRNA from two different samples (for the two-color protocol), or a single Cy3-labeled cRNA (for the one-color protocol), were hybridized (see hybridization configurations above) to Agilent Human Whole Genome Oligo Microarrays (G4112A) for 17 h at 65 °C. The hybridized microarrays were then washed using the manufacturer's recommended conditions and scanned using an Agilent G2565BA scanner. Data were extracted from the scanned image using Agilent Technologies' Feature Extraction software version 8.5 (FE8.5). All data columns present in the extracted data files are described in detail in the Agilent G2567AA FE8.5 Software Reference Guide (http://www.chem.agilent.com/scripts/LiteraturePDF.asp?iWHID=41954).

[Figure 4 graphic: pairwise scatter plots of one-color, two-color and TaqMan B/A log2 ratios (axes spanning –10 to 10), with orthogonal-fit correlations ranging from R = 0.876 to R = 0.968 and slopes (m) from 0.733 to 0.924.]
Figure 4 Comparison of Agilent one-color and two-color data with TaqMan assay data. The figure illustrates the comparison of the one-color, two-color and TaqMan assay data, and is colored based on the significance of the ratio between B and A samples for the three different sets of data as illustrated. Significance was based on a P < 0.01 and a 1.5 fold change. Data shown represent either of two possibilities. (a) All probes with TaqMan mapped data (N = 906). (b) Only probes that were mapped as persistently detected in Agilent one- and two-color experiments (filtering as described in Methods) and that were detected in at least three of four replicates for both samples in the TaqMan assay data (N = 519). The numbers in gray refer to the number of genes that are not detected as significantly differentially expressed (based on given FC and P-value criteria) by any of the three assays. Lines shown represent the orthogonal fit to the data with slope (m) and correlation (R) as shown in the inset.
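The orthogonal fits referred to in the Figure 4 legend minimize perpendicular distance to the line rather than vertical distance. A minimal total-least-squares sketch (on simulated compressed ratios, not the study's probes) is:

```python
import numpy as np

def orthogonal_fit(x, y):
    """Slope of the orthogonal (total least squares) line through the
    centered data, taken from the leading eigenvector of the sample
    covariance matrix, together with the Pearson correlation R."""
    cov = np.cov(x, y)
    eigvals, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, np.argmax(eigvals)]   # direction of largest variance
    slope = v[1] / v[0]
    r = np.corrcoef(x, y)[0, 1]
    return slope, r

# Simulated compressed ratios: y is x shrunk toward zero plus noise,
# mimicking fold-change compression (slope below 1).
rng = np.random.default_rng(3)
x = rng.normal(0.0, 2.0, size=500)
y = 0.8 * x + rng.normal(0.0, 0.1, size=500)
m, r = orthogonal_fit(x, y)
```

Unlike ordinary least squares, this fit treats both axes symmetrically, which is appropriate when both measurements (e.g., microarray and TaqMan log2 ratios) carry error.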


Labeling and hybridizations on the CapitalBio platform. The human genome-wide long-oligonucleotide microarray was constructed in-house at CapitalBio Corporation. Briefly, 5′-amino-modified 70-mer probes representing 21,329 H. sapiens genes from the Human Genome Oligo Set Version 2.1 (Qiagen), together with internal and external controls, were printed on aminosilane-coated glass slides using a SmartArray microarrayer (CapitalBio Corp.).

Fluorescently labeled DNA (Cy3- and Cy5-dCTP) was produced through Eberwine's linear RNA amplification method and a subsequent enzymatic reaction. This procedure has been previously described in detail17. Briefly, double-stranded cDNA containing the T7 RNA polymerase promoter sequence was synthesized from 5 µg of total RNA using the Reverse Transcription System, RNase H, DNA polymerase I and T4 DNA polymerase, according to the manufacturer's recommended protocol (Promega). The resulting labeled DNA (labeled control and test samples) was quantitatively adjusted based on the efficiency of Cy-dye incorporation and mixed into 80 µl of hybridization solution (3× SSC, 0.2% SDS, 25% formamide and 5× Denhardt's). Individually labeled samples were not pooled before hybridization. Hybridization on a microarray (see hybridization configurations above) was performed under a LifterSlip (Erie Company). The hybridization chamber was laid on a three-phase Tiling Agitator (CapitalBio Corp.) to facilitate microfluidic circulation under the coverslip. The microarray was hybridized at 42 °C overnight and washed with two consecutive washing solutions (0.2% SDS, 2× SSC at 42 °C for 5 min, and 0.2× SSC for 5 min at 22 °C) before scanning with a confocal LuxScan scanner (CapitalBio Corp.). For two-color microarrays, the scanning settings for the Cy3 and Cy5 channels were manually balanced by visual inspection of the external control spots. The data from the obtained images were extracted with SpotData software (CapitalBio Corp.).

Labeling and hybridizations on the TeleChem H25K platform. Two micrograms of each sample was amplified using a Genisphere SenseAmp Plus Amplification kit (generating amplified, poly(A)-tailed senseRNA), according to the manufacturer's recommended protocol. The resulting tailed senseRNA was reverse transcribed with amino-allyl indirect labeling using a SuperScript Indirect cDNA Labeling Kit (Invitrogen) with slight modifications. Each first-strand cDNA generation reaction used 5 µg of senseRNA with SuperScript II and aa-dUTP at 42 °C for 2 h. cDNA was purified using a MinElute PCR Purification Kit and conjugated with a monofunctional Cy3 or Cy5 dye aliquot (GE Healthcare) for 1 h at 22 °C in the dark. Dye-conjugated cDNA was purified with a MinElute PCR Purification Kit. Dye:base labeling efficiency was determined at this point for all dye-conjugated cDNA.

Hybridization was done manually in TeleChem hybridization cassettes using LifterSlips (Erie Company). Samples were labeled independently and not pooled before hybridization. In one-color experiments, Cy3-labeled cDNA samples were denatured independently and one sample was applied to each microarray. For two-color experiments, paired Cy3- and Cy5-labeled cDNA samples were combined and denatured before being applied to individual microarrays (see hybridization configurations above). Hybridization mixes (55 µl total volume) consisted of 38.5 µl labeled cDNA, 5.5 µl 2% SDS, 7.0 µl 20× SSC, 3.0 µl poly(dA) (5 µg/µl) and 1.0 µl Cot-1 DNA (1 µg/µl). Hybridization cassettes and slides were preheated to 55 °C before samples were added, and 3× SSC was added to the humidity grooves in the cassette. Samples were applied to the microarrays and hybridized for 16 h at 55 °C in a water bath. After hybridization, slides were washed (10 min, 2× SSC/0.1% SDS at 42 °C; 10 min, 0.2× SSC/0.1% SDS at 42 °C; twice for 10 min, 0.2× SSC at 22 °C) before centrifugation in 50-ml conical tubes at 201g for 5 min to dry. Scanning was performed on Axon 4200A or 4200B instruments at a PMT setting yielding 1% or fewer saturated spots.

Agilent data preprocessing, normalization and filtering. For one-color experiments, gProcessedSignal values from Agilent's Feature Extraction software were used as input into experimental analyses. This ProcessedSignal is generated after background subtraction and includes correction for multiplicative surface trends. Features were marked as Absent (A) when the processed signal intensity was less than twofold the value of the processed signal error (these features were transformed by setting their processed intensity value to that of the processed signal error). Features were marked as Marginal (M) when the measured intensity was at a saturated value or when there was a substantial amount of variation in the signal intensity within the pixels of a particular feature. Features not considered Absent or Marginal were marked Present (P).
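These flagging rules can be paraphrased as code. This is only a sketch of the logic described above: the precedence between the Absent and Marginal checks is my assumption, and Agilent's actual feature-level logic has more cases than shown here.

```python
def detection_call(signal, signal_error, saturated=False, nonuniform=False):
    """One-color flagging rules paraphrased from the Methods:
    Absent (A) if the processed signal is below twice its error, with
    the signal floored to the error value; Marginal (M) if the feature
    is saturated or its pixels vary substantially; otherwise Present (P).
    Returns (call, adjusted_signal)."""
    if signal < 2.0 * signal_error:
        return "A", signal_error
    if saturated or nonuniform:
        return "M", signal
    return "P", signal

call_low = detection_call(60.0, 40.0)                     # below 2x error
call_sat = detection_call(65000.0, 40.0, saturated=True)  # saturated pixels
call_ok = detection_call(500.0, 40.0)                     # clean feature
```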

For the two-color microarrays, raw data signals were preprocessed in a similar fashion to those for one-color microarrays, but without a surface-trend correction and with additional preprocessing to adjust for possible dye bias within a microarray. Data used in the two-color analyses were either the red and green ProcessedSignal values or the LogRatio values from Agilent's Feature Extraction software. Dye normalization of two-color Agilent microarrays includes both linear scaling and Lowess normalization to a rank-invariant set of microarray features. For some of the analyses (see Table 2, Fig. 4 and Supplementary Fig. 2), LogRatio values, which are calculated from the ProcessedSignals by Agilent's Feature Extraction software, were used. When LogRatio was used for the two-color data, the sign of the LogRatio was changed for half of the RNA comparisons to accommodate the dye swap.

Generation of a filtered feature list for Agilent one- and two-color data was conducted as follows: (i) Agilent flagging rules were applied, setting all Absent and Marginal features to missing. (ii) To derive a reliable common gene set across both one- and two-color data, features with Present calls on fewer than 50% of all microarrays were filtered out. (iii) Features with fewer than five Present calls in each sample group (A or B) across sites for one-color data, or fewer than five Present calls across sites for two-color data, were also filtered out. (iv) This filtering resulted in a final common gene set of 19,802 genes, from a total of 41,000 noncontrol probes on the microarray, that was used for much of the statistical analysis presented. For the analyses presented in Figure 4 and Supplementary Figure 2, all 41,000 noncontrol probes were included.
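Steps (ii) and (iii) amount to a present-call filter over a detection-call matrix; a hedged sketch (the per-group threshold of five Present calls follows one reading of step iii, and the function and matrix layout are illustrative):

```python
import numpy as np

def filter_features(present, sample_group, min_frac=0.5, min_calls=5):
    """Keep a feature only if it is Present on at least half of all
    microarrays and has at least five Present calls within each sample
    group (A and B).

    present: boolean (n_features, n_arrays) matrix of Present calls.
    sample_group: length-n_arrays sequence of 'A'/'B' labels.
    """
    sample_group = np.asarray(sample_group)
    keep = present.mean(axis=1) >= min_frac
    for group in ("A", "B"):
        keep &= present[:, sample_group == group].sum(axis=1) >= min_calls
    return keep

groups = ["A"] * 6 + ["B"] * 6
present = np.array([
    [True] * 12,                                  # Present everywhere -> kept
    [True] * 6 + [False] * 6,                     # Present only in A -> dropped
    [True] * 5 + [False] + [True] * 5 + [False],  # 5 calls per group -> kept
])
keep = filter_features(present, groups)
```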

Further details on the data processing steps used to generate the Agilent one- and two-color output columns can be found in the Agilent G2567AA FE8.5 Software Reference Guide (http://www.chem.agilent.com/scripts/LiteraturePDF.asp?iWHID=41954).

Data were median normalized for the statistical analyses in Figures 1–3 and Supplementary Figure 1 through JMP Genomics software (http://www.jmp.com/). For the remainder of the analyses, normalization of the Agilent one-color data was performed in GeneSpring GX as follows: (i) Values below 5.0 were set to 5.0. (ii) Each measurement was divided by the 50th percentile of all measurements in that sample. The percentile was calculated using only genes marked present.
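The two GeneSpring GX steps described above (floor low values at 5.0, then divide by the per-chip 50th percentile of present-flagged measurements) can be sketched as follows. This is a minimal illustration, not Agilent’s implementation; the helper name `normalize_one_color` and the flag representation are invented for the example.

```python
def normalize_one_color(signals, present_flags, floor=5.0):
    """Floor each signal at `floor`, then divide every floored signal by the
    50th percentile (median) of the floored signals flagged Present."""
    floored = [max(s, floor) for s in signals]
    present = sorted(v for v, p in zip(floored, present_flags) if p)
    if not present:
        raise ValueError("no Present-flagged features on this array")
    mid = len(present) // 2
    median = (present[mid] if len(present) % 2
              else 0.5 * (present[mid - 1] + present[mid]))
    return [v / median for v in floored]
```

With signals `[2.0, 10.0, 20.0, 40.0]` and only the last three flagged present, the 2.0 is floored to 5.0 and all values are divided by the present-only median of 20.0.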

For analyses presented in Figure 4 and Supplementary Figure 2 only, specific samples were normalized to one another. All samples were normalized against the median of the control samples (A): each measurement for each gene in those specific samples was divided by the median of that gene’s measurements in the corresponding control samples.

CapitalBio data preprocessing, normalization and filtering. All one-color and two-color images were analyzed using SpotData software (CapitalBio Corp.) and raw data were provided in the form of tab-delimited text files for each microarray. A spot-exclusion method was adopted to filter faint spots18,19. The average log2 intensity of each gene across all replicates of both samples (A and B) was calculated and sorted. Genes with average intensity in the lowest 50% were excluded from further analysis. A subset of 11,735 genes from a total of 23,231 spots (including controls) remained for analysis.
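The spot-exclusion step described above can be sketched as a small helper; this is an illustration only, assuming intensities are stored per gene across all replicates of samples A and B (the function name and data layout are hypothetical, not CapitalBio’s code).

```python
import math

def filter_faint_spots(intensity_matrix, fraction=0.5):
    """intensity_matrix: {gene_id: [replicate intensities across A and B]}.
    Drop the `fraction` of genes with the lowest average log2 intensity
    and return the set of genes retained for analysis."""
    avg_log2 = {g: sum(math.log2(v) for v in vals) / len(vals)
                for g, vals in intensity_matrix.items()}
    ranked = sorted(avg_log2, key=avg_log2.get)  # faintest genes first
    n_drop = int(len(ranked) * fraction)
    return set(ranked[n_drop:])
```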

Local median and background subtraction was applied for one-color and two-color intensity. For two-color data, an additional linear Lowess normalization was applied to the background-subtracted data: each channel was scaled to a median intensity of 100 and then Lowess normalized. For one-color data, each microarray was scaled to a median intensity of 1,000.

TeleChem data preprocessing, normalization and filtering. All one-color and two-color images were analyzed using Axon GenePix Pro 5.0 software, and raw data were provided in the form of one tab-delimited text (.gpr) file per microarray. Features automatically marked as Absent (A) had a numerical value of –75 and corresponded to features in the Axon (.gal) file with the ID ‘empty’. Features marked Not Found (NF) had a numerical value of –50 and were defined as features with fewer than 6 pixels, features whose diameter was greater than the lesser of three nominal diameters set in Block Properties of the (.gal) file or the diameter that would cause overlap with an adjacent feature of nominal diameter, or features found at a position that would overlap an adjacent feature. Features marked Bad (B) had a numerical value of –100 and were defined by visual inspection during spot finding as having major noise associated with either the spot or background signal. All probes with a value less than 0 on at least one microarray were removed across all microarrays. Features marked Present (P) had a numerical value of 0 and were considered acceptable for further analysis. The common filtered genes between one- and two-color microarrays were retained. The subset was based on the list of 12,453 genes from a total of 27,648 spots (including controls) on the microarray. Analysis was based on intensity values: (F532_Median)-B532 intensity was used for one- and two-color data, in addition to (F635_Median)-B635 for two-color data. Lowess normalization of intensities was applied within individual two-color microarrays and median normalization across the microarrays.

NATURE BIOTECHNOLOGY VOLUME 24 NUMBER 9 SEPTEMBER 2006 1149

The aforementioned preprocessing and normalization methods for all three platforms followed manufacturers’ recommendations to reflect what will most likely occur in common practice. The methods differ somewhat across platforms but are consistent within platforms in order to make intraplatform one- and two-color comparisons fair.

The primary difference in normalization techniques between the three platforms is found in the TeleChem two-color data. For all the other data analyses, median scaling was applied to the data before Lowess normalization; with the TeleChem two-color data, this order was reversed. To compare all the data using the same normalization workflow, we applied median scaling to the TeleChem two-color data before Lowess normalization and compared the result to the original normalization process (Lowess before median scaling). This comparison is shown in Supplementary Figure 3 online. These additional data confirm that the minor differences in normalization procedure have very little impact on the data.

Outlier assays. For the Agilent data set, microarrays identified as outliers based on single-microarray quality metrics (AG1_1_A1, AG1_2_A3, AG1_3_B3, AGL_1_B5, AGL_1_D1, AGL_2_A1, AGL_2_C4) were not removed from the majority of the analyses presented here. The analysis presented in Figure 4 and Supplementary Figure 2 did exclude outlier microarrays.

Generation of common differentially expressed gene lists. Data used for the generation of the common differentially expressed gene lists (Table 2) were from the genes that passed data preprocessing and filtering criteria for each platform and included 19,802 genes for Agilent, 11,735 genes for CapitalBio and 12,453 genes for TeleChem. Data normalization for the Agilent data was performed as described above. For both CapitalBio and TeleChem, ArrayTrack20 median scaling was used for one-color data and Linear & Lowess for two-color data (default median target intensity = 1,000). Significant differentially expressed genes were identified with a one-sample t-test of whether the log2 (B/A) ratio of five replicates differed from 0. For two-color data, the dye-swap results were averaged before performing the t-test. For both one-color and two-color data, all combinations of P values of 0.05 and 0.01 and fold changes of 1.5, 2.0 and 4.0 were calculated to determine the percentage of common differentially expressed genes. The percentage of common genes was calculated by dividing the number of genes identified as differentially expressed in both one- and two-color approaches by the total number of differentially expressed genes from both approaches combined. The common manufacturer ID was used to identify the common genes from the gene lists.
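The selection and overlap calculation described above can be sketched without external dependencies as follows. The critical t values are hardcoded for the small degrees of freedom involved, and all names are illustrative rather than the consortium’s actual code.

```python
import math
import statistics

# Two-sided critical t values for small df, hardcoded to keep the
# sketch dependency-free (a stats library would normally supply these).
T_CRIT = {(4, 0.05): 2.776, (4, 0.01): 4.604,
          (3, 0.05): 3.182, (3, 0.01): 5.841}

def is_differentially_expressed(log2_ratios, alpha=0.05, fold=2.0):
    """One-sample t-test of whether the mean log2(B/A) ratio differs
    from 0, combined with a fold-change threshold."""
    n = len(log2_ratios)
    mean = statistics.mean(log2_ratios)
    se = statistics.stdev(log2_ratios) / math.sqrt(n)
    t = mean / se
    return abs(t) >= T_CRIT[(n - 1, alpha)] and abs(mean) >= math.log2(fold)

def percent_common(genes_one_color, genes_two_color):
    """Intersection of the one- and two-color gene lists divided by
    their combined total, as a percentage."""
    common = genes_one_color & genes_two_color
    union = genes_one_color | genes_two_color
    return 100.0 * len(common) / len(union) if union else 0.0
```

For example, five replicate log2 ratios clustered tightly around 2.0 pass both the P = 0.05 and twofold criteria, while ratios scattered around 0 fail.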

ANOVA models. Several analyses are based on fitting three different models to the preprocessed and normalized data:

Model 1: log2(Intensity) = Mean + Sample + Site + Error
Model 2: log2(Intensity) = Mean + Sample + Dye + Sample*Dye + Site + Microarray + Error
Model 3: log2(Ratio) = Mean + Dye + Site + Error
Separate models are fitted to the data from each feature within each platform.

Model 1 is used for the one-color data and models 2 and 3 are used for the two-color data. In these models, Intensity refers to a particular intensity value for one gene; Ratio refers to a particular ratio value for one gene; Mean indicates an overall mean value, which corresponds to mean log2(Intensity) for models 1 and 2, and mean log2(Ratio) for model 3; Sample indicates whether the intensity measurement is from sample A or B (this term is not needed in model 3 because ratios between A and B are being modeled); Site indicates the site (included for Agilent data only because CapitalBio and TeleChem data only had one site); Dye indicates the dye effect in model 2 and the dye-swap configuration in model 3; Sample*Dye refers to an interaction effect between samples and dyes; Microarray indicates the microarray from which the data were measured; Error indicates random error, which is assumed to be normally distributed with mean zero and variance specific for each gene.

Along with the Error term, the Site and Microarray effects are also assumed to be normally distributed with mean zero and constant variance. This reflects the assumption that the Site and Microarray effects are drawn from a normal population. They are so-called random effects, and estimates of their variances are known as variance components. All other effects are assumed to be fixed; that is, they have a finite number of levels, the mean value of which is estimated during the model fitting process.
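For a balanced one-way layout such as model 1’s Site effect, a variance component can be estimated by the classical method of moments: the between-site mean square minus the within-site mean square, divided by the number of replicates per site. The sketch below illustrates the idea for a single gene; it is not the mixed-model software used in the study, and the function name is hypothetical.

```python
import statistics

def site_variance_component(site_values):
    """site_values: one list of replicate log2 intensities per site
    (balanced design). Returns the method-of-moments estimate
    (MS_between - MS_within) / n, floored at zero."""
    k = len(site_values)               # number of sites
    n = len(site_values[0])            # replicates per site
    site_means = [statistics.mean(v) for v in site_values]
    ms_between = n * statistics.variance(site_means)
    ms_within = statistics.mean([statistics.variance(v) for v in site_values])
    return max(0.0, (ms_between - ms_within) / n)
```

When every replicate within a site is identical but sites differ, all variance is attributed to Site; when sites behave identically, the estimate is floored at zero.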

Model 2 is obviously the most complex of the three models but is easily fitted to two-color data using standard mixed-models software. The random Microarray effect is critical, as it models the correlation between pairs of intensities observed on the same microarray. This model enables a more refined analysis of two-color results than that from model 3 by including estimates of overall mean intensity and the Sample*Dye interaction. Model 2 and its variants have been used successfully for the past five years in a variety of microarray applications21–23.

For each feature on each platform, an estimate of the log2 fold change between samples A and B is computed in models 1 and 2 as the difference between the two levels of the estimated Sample effect. The ANOVA model output also includes a standard error and degrees of freedom for this difference, from which a –log10 P value is computed using a t-distribution. For model 3, the estimate of the Mean effect represents the estimated log2-scaled fold change (B/A) because the ratios were computed by dividing the B-intensity by the A-intensity, and a –log10 P value is computed in the same way as in models 1 and 2. Statistical results for all three models are based on mixed model theory21–24.
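For a single gene, the fold-change estimate in models 1 and 2 reduces to a difference of group means with a pooled standard error and its degrees of freedom; a –log10 P value would then follow by evaluating a t survival function at the resulting statistic (not implemented in this stdlib-only sketch, whose names are hypothetical).

```python
import math
import statistics

def log2_fold_change(log2_a, log2_b):
    """Estimate log2(B/A) as the difference of group means, with its
    pooled standard error, t statistic and degrees of freedom."""
    na, nb = len(log2_a), len(log2_b)
    diff = statistics.mean(log2_b) - statistics.mean(log2_a)
    sp2 = (((na - 1) * statistics.variance(log2_a)
            + (nb - 1) * statistics.variance(log2_b)) / (na + nb - 2))
    se = math.sqrt(sp2 * (1 / na + 1 / nb))
    t = diff / se
    return diff, se, t, na + nb - 2
```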

Comparison of Agilent one-color and two-color data with TaqMan assay data. One-color data were normalized in Agilent GeneSpring GX as described above including the normalization of specific samples to each other (Fig. 4). Two-color data were analyzed using the following scheme.

The processed signal data from the Agilent Feature Extraction software were loaded into Agilent’s GeneSpring GX software. To account for dye swap, we reversed the signal channel and control channel measurements for all dye-swapped microarrays. Each gene’s measured intensity was divided by its control channel value in each sample.

TaqMan assay data were generated as part of the MAQC study as described elsewhere16. TaqMan assay data were imported into Agilent’s GeneSpring GX from the data file provided by the MAQC after splitting it into individual files for each sample. For the TaqMan assay comparisons, the mapping from the final 12,091 genes was used for cross comparison between the Agilent probes and TaqMan assays15. The processed (‘intensity like’) TaqMan assay data were imported into GeneSpring GX based on the mapping, and ratios were calculated as follows: each measurement for each gene in those specific samples was divided by the median of that gene’s measurements in the corresponding control (A, UHRR) samples.

P values were calculated for the Agilent and TaqMan assay data using a one-sample t-test with the appropriate number of replicates (four or five for the microarray assays, depending on the comparison, and four for the TaqMan assays), with the mean intensity value (as calculated above) compared to 1.

Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS
The authors thank the MicroArray Quality Control (MAQC) consortium for generating the large data sets used in this study. E.K.L. and P.H. acknowledge the Advanced Technology Program of the National Institute of Standards and Technology, whose generous support provided partial funding of this research (70NANB2H3009).

DISCLAIMER
This work includes contributions from, and was reviewed by, the FDA and the NIH. This work has been approved for publication by these agencies, but it does not necessarily reflect official agency policy. Certain commercial materials and equipment are identified in order to adequately specify experimental procedures. In no case does such identification imply recommendation or endorsement by the FDA or the NIH, nor does it imply that the items identified are necessarily the best available for the purpose.


COMPETING INTERESTS STATEMENT
The authors declare competing financial interests (see the Nature Biotechnology website for details).

Published online at http://www.nature.com/naturebiotechnology/
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Fodor, S.P. et al. Light-directed, spatially addressable parallel chemical synthesis. Science 251, 767–773 (1991).

2. Fodor, S.P. et al. Multiplexed biochemical assays with biological chips. Nature 364, 555–556 (1993).

3. Schena, M. et al. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467–470 (1995).

4. Churchill, G.A. Fundamentals of experimental design for cDNA microarrays. Nat. Genet. 32, Suppl. 490–494 (2002).

5. Li, J., Pankratz, M. & Johnson, J. Differential gene expression patterns revealed by oligonucleotide versus long cDNA arrays. Toxicol. Sci. 69, 383–390 (2002).

6. Tan, P. et al. Evaluation of gene expression measurements from commercial platforms. Nucleic Acids Res. 31, 5676–5684 (2003).

7. Dobbin, K.K. et al. Interlaboratory comparability study of cancer gene expression analysis using oligonucleotide microarrays. Clin. Cancer Res. 11, 565–572 (2005).

8. Irizarry, R.A. et al. Multiple-laboratory comparison of microarray platforms. Nat. Methods 2, 345–349 (2005).

9. Larkin, J.E., Frank, B.C., Gavras, H., Sultana, R. & Quackenbush, J. Independence and reproducibility across microarray platforms. Nat. Methods 2, 337–343 (2005).

10. Kuo, W.P. et al. A sequence-oriented comparison of gene expression measurements across different hybridization-based technologies. Nat. Biotechnol. 24, 832–840 (2006).

11. Järvinen, A-K. et al. Are data from different gene expression microarray platforms comparable? Genomics 83, 1164–1168 (2004).

12. de Reynies, A. et al. Comparison of the latest commercial short and long oligonucleotide microarray technologies. BMC Genomics 7, 51 (2006).

13. Wang, Y. et al. Large scale real-time PCR validation on gene expression measurements from two commercial long-oligonucleotide microarrays. BMC Genomics 7, 59 (2006).

14. Bammler, T. et al. Standardizing global gene expression analysis between laboratories and across platforms. Nat. Methods 2, 351–356 (2005).

15. MAQC Consortium. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol. 24, 1151–1161 (2006).

16. Canales, R.D. et al. Evaluation of DNA microarray results with quantitative gene expression platforms. Nat. Biotechnol. 24, 1115–1122 (2006).

17. Guo, Y. et al. Genomic analysis of anti-hepatitis B virus (HBV) activity by small interfer-ing RNA and lamivudine in stable HBV-producing cells. J. Virol. 79, 14392–14403 (2005).

18. Barczak, A. Spotted long oligonucleotide arrays for human gene expression analysis. Genome Res. 13, 1775–1785 (2003).

19. Shi, L. et al. Cross-platform comparability of microarray technology: intra-platform consistency and appropriate data analysis procedures are essential. BMC Bioinformatics 6 Suppl. 2, S12 (2005).

20. Tong, W. et al. Development of public toxicogenomics software for microarray data management and analysis. Mutat. Res. 549, 241–253 (2004).

21. Wolfinger, R.D. et al. Assessing gene significance from cDNA microarray data via mixed models. J. Comput. Biol. 8, 625–637 (2001).

22. Jin, W., Riley, R., Wolfinger, R.D., White, K.P., Passador-Gurgel, G. & Gibson, G. Contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster. Nat. Genet. 29, 389–395 (2001).

23. Chu, T-M., Deng, S., Wolfinger, R.D., Paules, R.S. & Hamadeh, H.K. Cross-site comparison of gene expression data reveals high similarity. Environ. Health Perspect. 112, 449–455 (2004).

24. Chu, T-M., Deng, S. & Wolfinger, R.D. Modeling Affymetrix data at the probe level. in DNA microarray and statistical genomics techniques: design, analysis, and interpretation of experiments (eds. Edwards, J.W., Beasley, T.M., Page, G.P. and Allison, D.B.) 197–222 (Chapman & Hall/CRC, Taylor & Francis Group, Boca Raton, FL, 2006).


The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements
MAQC Consortium*

Over the last decade, the introduction of microarray technology has had a profound impact on gene expression research. The publication of studies with dissimilar or altogether contradictory results, obtained using different microarray platforms to analyze identical RNA samples, has raised concerns about the reliability of this technology. The MicroArray Quality Control (MAQC) project was initiated to address these concerns, as well as other performance and data analysis issues. Expression data on four titration pools from two distinct reference RNA samples were generated at multiple test sites using a variety of microarray-based and alternative technology platforms. Here we describe the experimental design and probe mapping efforts behind the MAQC project. We show intraplatform consistency across test sites as well as a high level of interplatform concordance in terms of genes identified as differentially expressed. This study provides a resource that represents an important first step toward establishing a framework for the use of microarrays in clinical and regulatory settings.

Recently, pharmacogenomics and toxicogenomics have been identified both by the US Food and Drug Administration (FDA) and the US Environmental Protection Agency (EPA) as key opportunities in advancing personalized medicine1,2 and environmental risk assessment3. These agencies have issued guidance documents to encourage scientific progress and to facilitate the use of these data in drug development, medical diagnostics and risk assessment (http://www.fda.gov/oc/initiatives/criticalpath/; http://www.fda.gov/cder/guidance/6400fnl.pdf; http://www.fda.gov/cdrh/oivd/guidance/1549.pdf; http://www.epa.gov/osa/genomics.htm). However, although DNA microarrays represent one of the core technologies for this purpose, concerns have been raised regarding the reliability and consistency, and hence potential application, of microarray technology in clinical and regulatory settings. For example, a widely cited study reported little overlap among lists of differentially expressed genes derived from three commercial microarray platforms when the same set of RNA samples was analyzed4. Similarly low levels of overlap have been reported in other interplatform and/or cross-laboratory microarray studies5–8.

Although similar results continue to appear in peer-reviewed journals9,10, raising doubts about the repeatability, reproducibility and comparability of microarray technology11–13, several studies have also been recently published showing increased reproducibility of microarray data generated at different test sites and/or using different platforms14–18. It follows that before this technology can be applied in clinical practice and regulatory decision making, microarray standards, quality measures and consensus on data analysis methods need to be developed2,19–21.

Here we describe the MAQC project, a community-wide effort initiated and led by FDA scientists involving 137 participants from 51 organizations. In this project, gene expression levels were measured from two high-quality, distinct RNA samples in four titration pools on seven microarray platforms in addition to three alternative expression methodologies. Each microarray platform was deployed at three independent test sites and five replicates were assayed at each site. This experimental design and the resulting data set provide a unique opportunity to assess the repeatability of gene expression microarray data within a specific site, the reproducibility across multiple sites and the comparability across multiple platforms. Objective assessment of these technical metrics is an important step towards understanding the appropriate use of microarray technology in clinical and regulatory settings. This study also addresses many other needs of the scientific community pertaining to the use and analysis of microarray data (see MAQC goals in Supplementary Data online).

The MAQC project has generated a rich data set that, when appropriately analyzed, reveals promising results regarding the consistency of microarray data between laboratories and across platforms. In this article, we detail the study design, describe its implementation and summarize the key findings of the MAQC main study. The accompanying set of articles22–26 provides additional analyses and related data sets. Although the sample types used in this study are not directly representative of a relevant biological study, the study provides technical insights into the capabilities and limitations of microarray technology. Similar levels of concordance in cross-laboratory and interplatform comparisons have been independently reported using a toxicogenomics study26.

Received 6 June; accepted 31 July; published online 8 September 2006; doi:10.1038/nbt1239

*A list of authors and their affiliations appears at the end of the paper. Correspondence and requests for materials should be addressed to L.S. ([email protected]).


RESULTS

Experimental design

The MAQC project (http://www.fda.gov/nctr/science/centers/toxicoinformatics/maqc/) repeatedly assayed four pools comprised of two RNA sample types on a variety of gene expression platforms and at multiple test sites. The two RNA sample types used were a Universal Human Reference RNA (UHRR) from Stratagene and a Human Brain Reference RNA (HBRR) from Ambion. The four pools included the two reference RNA samples as well as two mixtures of the original samples: Sample A, 100% UHRR; Sample B, 100% HBRR; Sample C, 75% UHRR:25% HBRR; and Sample D, 25% UHRR:75% HBRR. This combination of biologically different RNA sources and known titration differences provides a method for assessing the relative accuracy of each platform based on the differentially expressed genes detected. A unique feature of the MAQC project is that both sample type A and sample type B are commercially available to the community for a few years to come in the exact batches as those used by the MAQC project.

Six commercially available microarray platforms were tested: Applied Biosystems (ABI); Affymetrix (AFX); Agilent Technologies (AGL for two-color and AG1 for one-color); GE Healthcare (GEH); Illumina (ILM) and Eppendorf (EPP). In addition, scientists at the National Cancer Institute (NCI) generated spotted microarrays using oligonucleotides obtained from Operon. The RNA sample types were also tested on three alternative gene expression platforms: TaqMan Gene Expression Assays from Applied Biosystems (TAQ; TaqMan is a registered trademark of Roche Molecular Systems, Inc.); StaRT-PCR from Gene Express (GEX) and QuantiGene assays from Panomics (QGN).

Each microarray platform provider selected three sites for testing. In most cases, five replicate assays for each of the four sample types were processed at each of the test sites. Six of the microarray providers used one-color protocols where one labeled RNA sample was hybridized to each microarray (Table 1). The Agilent two-color and NCI microarrays were tested using a two-color protocol so that two differently labeled RNA samples were simultaneously hybridized to the same microarray. The Eppendorf assay contained two identical microarrays on one glass slide, which were independently hybridized to two samples. Although only a single fluorescent dye was used, the Eppendorf data are presented in a ratio format.

Each microarray provider used its own software to generate a quantitative signal value and a qualitative detection call for each probe on the microarray. This attention to the qualitative calls of each platform resulted in our using a potentially different number of genes in each calculation. It also had an impact on data analysis, because some, but not all, of the platforms removed suspect or low-intensity data. In addition, 11 hybridizations were removed from further analysis due to quality issues. Table 1 notes the final number of hybridizations used in the final data analysis for each microarray platform. Further details are presented in Methods and Tables S1–S4 in Supplementary Data online. Pre-hybridization and post-hybridization quality information of samples is available as Supplementary Table 1 online.

A direct comparison of results across platforms was challenging because of inherent differences in protocols, number of data points per platform and data preprocessing methods. Whenever possible, all platforms were included in any comparisons, but occasionally results from one or two platforms were excluded from an analysis because the data comparison was untenable and forced contrivance that was ultimately uninformative. Although some data from the alternative platforms are presented in this article, a more thorough discussion is included elsewhere22.

Probe mapping

Microarray experiments generally rely on a hybridization intensity measurement for an individual probe to infer a transcript abundance level for a specific gene. This relationship raises several difficult issues, including: which gene corresponds to which probe, and how sensitive and specific is the probe. Previous publications have suggested that some of the variability in cross-platform studies was due to annotation problems that made it difficult to reconcile which genes were measured by specific probes27–30. Despite the fact that the human genome sequence is complete, the final list of actual genes has yet to be determined. All identifiers are moving targets, and even the NCBI hand-curated reference sequences are often modified. Another issue is that a gene expression assay designed to measure a given RNA target may unknowingly detect multiple alternatively spliced transcripts, which may have different functions and expression patterns. Thus, the number of genes or transcripts

Table 1 Gene expression platforms and data analyzed in the MAQC main study

Manufacturer | Code | Protocol | Platform | Number of probes^a | Number of test sites | Number of samples | Number of replicates | Total number of microarrays^b
Applied Biosystems | ABI | One-color microarray | Human Genome Survey Microarray v2.0 | 32,878 | 3 | 4 | 5 | 58
Affymetrix | AFX | One-color microarray | HG-U133 Plus 2.0 GeneChip | 54,675 | 3 | 4 | 5 | 60
Agilent | AGL | Two-color microarray^c | Whole Human Genome Oligo Microarray, G4112A | 43,931 | 3 | 2 | 10 | 56
Agilent | AG1 | One-color microarray | Whole Human Genome Oligo Microarray, G4112A | 43,931 | 3 | 4 | 5 | 56
Eppendorf | EPP | One-color microarray | DualChip Microarray | 294 | 3 | 4 | 5 | 60
GE Healthcare | GEH | One-color microarray | CodeLink Human Whole Genome, 300026 | 54,359 | 3 | 4 | 5 | 60
Illumina | ILM | One-color microarray | Human-6 BeadChip, 48K v1.0 | 47,293 | 3 | 4 | 5 | 59
NCI_Operon | NCI | Two-color microarray | Operon Human Oligo Set v3 | 37,632 | 2 | 4 | 5 | 33
Applied Biosystems | TAQ | TaqMan assays | >200,000 assays available | 1,004 | 1 | 4 | 4 | N/A
Panomics | QGN | QuantiGene assays | ~2,600 assays available | 245 | 1 | 4 | 3 | N/A
Gene Express | GEX | StaRT-PCR assays | ~1,000 assays available | 207 | 1 | 4 | 3 | N/A
Total | | | | | | | | 442

^a A global definition of probes is used to include individual probes, probe sets or primer pairs depending on the gene expression platform. The numbers listed in this table are derived from product literature and may include some platform duplication. Alternative figures for the number of probes analyzed are provided as Table S5 in Supplementary Data online. ^b Maximum number of microarrays per one-color protocol is 60 (3 sites × 4 sample types × 5 replicates). As described in the text, replacement hybridizations but not outlier hybridizations are included in the main study data analysis. Only data from 386 microarrays were analyzed in this article. Additional data sets are described in Table S4 in Supplementary Data online. ^c Although not presented in this paper, the Agilent two-color data (56 microarrays) are discussed elsewhere24. In the remaining figures, test sites and sample types are referenced using the following nomenclature: ‘‘platform code_test site_sample ID’’. Sample A = 100% UHRR; Sample B = 100% HBRR; Sample C = 75% UHRR:25% HBRR; and Sample D = 25% UHRR:75% HBRR.


detected with a gene expression platform is inherently difficult to define and quantify.

A unique advantage of the MAQC project is that most of the sequence information for the probes used in each gene expression technology was provided by the manufacturers. We mapped the probes (see Supplementary Methods online and Supplementary Notes online) to the RefSeq human mRNA database31 (http://www.ncbi.nlm.nih.gov/RefSeq) and to the AceView database32 (http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly), a less curated but more comprehensive database, which includes all the RefSeq, GenBank and dbEST human cDNA sequences. Although the total number of probes varied across platforms, the six high-density microarray platforms assayed similar numbers of Entrez genes (15,429–16,990) and had similar percentages of probes (68–84%) that aligned to AceView transcripts (see Table S5 in Supplementary Data online). We found that 23,971 of the 24,157 RefSeq NM accessions from the March 8, 2006 release were assayed by at least one platform (Supplementary Table 2 online) and that 15,615 accessions were assayed by all high-density microarray platforms used in the MAQC study. Because of alternative splicing, each platform mapped to roughly four RefSeq transcripts per three Entrez genes.

To simplify the interplatform comparison, we condensed the complex probe-target relationships to a ‘one-probe-to-one-gene’ list. The 15,615 RefSeq entries on all of the high-density microarray platforms represented 12,091 Entrez genes. For each gene, we selected a single RefSeq entry (Supplementary Table 4 online), primarily the one annotated by TaqMan assays, or secondarily the one targeted by the majority of platforms. When a platform contained multiple probes matching the same RefSeq entry, only the probe closest to the 3′ end was included in the common set (Supplementary Table 3 online). In this way, we selected for each high-density platform 12,091 probes matching a common set of 12,091 reference sequences from 12,091 different genes (Supplementary Table 5 online).
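The per-platform reduction to one probe per gene, keeping the probe closest to the 3′ end, can be sketched as below; the tuple layout and function name are hypothetical, chosen only to illustrate the selection rule.

```python
def select_common_probes(probes):
    """probes: iterable of (platform, gene_id, probe_id, distance_to_3prime).
    For each (platform, gene) pair, keep the probe with the smallest
    distance to the 3' end, mimicking the one-probe-to-one-gene reduction."""
    best = {}
    for platform, gene, probe, dist in probes:
        key = (platform, gene)
        if key not in best or dist < best[key][1]:
            best[key] = (probe, dist)
    return {key: probe for key, (probe, _) in best.items()}
```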

Intraplatform data repeatability and reproducibility

We examined microarray data for consistency within each platform by reviewing both the intrasite repeatability and the intersite reproducibility at two levels: the quantitative signal values and the qualitative detection calls. Only genes that were detected in at least three of the five sample replicates (or generally detected genes) were included in most of these calculations. This filter accounts for the different manner in which the microarray platforms identified genes below their quality thresholds, and directs our research away from the less confident, noisy results. The number of generally detected genes for each sample type at each site varied from 8,000 to 12,000 for the high-density microarray platforms, but was relatively consistent between test sites using the same platform (Fig. 1).
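The generally detected filter (present in at least three of the five replicates) can be expressed as a small helper; the names and 0/1 call encoding are hypothetical, matching the binary detection calls defined in the glossary.

```python
def generally_detected(detection_calls, min_detected=3):
    """detection_calls: {gene_id: list of 0/1 detection calls across the
    five sample replicates}. Return the genes called present in at least
    `min_detected` replicates."""
    return {gene for gene, calls in detection_calls.items()
            if sum(calls) >= min_detected}
```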

The coefficient of variation (CV) of the quantitative signal values between the intrasite replicates was calculated using the generally detected subset from the 12,091 common genes for each sample type at every test site. The distribution of the replicate CV measures across the set of detected genes is displayed in a series of box and whiskers plots in Figure 1. Most of the one-color microarray platforms and test sites demonstrated similar replicate CV median values of 5–15%, although the distributions of replicate CV results differed between platforms. For the two-color NCI microarrays, the replicate CVs were calculated using the Cy3/Cy5 ratios. (Sample type A was used as the Cy5 reference in all NCI hybridizations.) These values were only slightly larger than the one-color signals for the same sample type.
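As a sketch of the replicate CV computation (the toy signal values are hypothetical; the MAQC analysis used each platform's own preprocessed signals):

```python
import numpy as np

def replicate_cv(signals):
    """Per-gene coefficient of variation (%) across replicate signal values.
    `signals` is a (genes x replicates) array of preprocessed signals."""
    signals = np.asarray(signals, dtype=float)
    return 100.0 * signals.std(axis=1, ddof=1) / signals.mean(axis=1)

# Toy example: 3 generally detected genes, 5 replicates each
sig = np.array([[100, 105,  95, 102,  98],
                [ 10,  12,   9,  11,  10],
                [500, 480, 510, 495, 505]])
cv = replicate_cv(sig)
print(np.median(cv))  # the per-site median replicate CV reported in Figure 1
```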

We next examined the total CV of the quantitative signal, which included both the intrasite repeatability as well as variation due to intersite differences. By definition, the total CV measure (n ≤ 15) will be larger than the replicate CV measures (n ≤ 5). Median values for the total CV distribution and the average of three replicate CV medians for each platform are presented in Figure 2. Overall, the total CV median was very consistent across all platforms, ranging from 10% to just over 20% and not dramatically higher than the replicate CV median values. In general, the total CV median was up to twice as large as the replicate CV median, but this result is not

Figure 1 Repeatability of expression signal within test sites. For the one-color platforms, the CV of the expression signal values between site replicates of the same sample type was calculated for all generally detected genes. The distributions of these replicate CVs are presented in a series of twelve box and whiskers plots for each microarray platform: one for each of the four sample types at the three test sites. The plots are highlighted to distinguish the sample replicates: sample A (white), sample B (light blue), sample C (light purple) and sample D (dark blue). The twelve plots showing results from the platforms with three test sites are presented in the following order from left to right: A1, A2, A3, B1, B2, B3, C1, C2, C3, D1, D2 and D3. For the two-color NCI platform, the CV of the expression Cy3/Cy5 ratios between site replicates of the same sample type was similarly calculated. The distributions of these replicate CVs are presented in a series of eight box and whiskers plots from the two NCI test sites in the following order from left to right: A1, A2, B1, B2, C1, C2, D1 and D2. The median (gap), interquartile range as well as the 10th and 90th percentile values are indicated in each plot. Only genes from the 12,091 common set that were detected in at least three of the replicates were included in the box plots and CV calculations. This number varies by platform/sample/test site and is noted as the line plot with the secondary axis and as Table S6 in Supplementary Data online. The platforms and sample types are labeled according to the nomenclature presented in Table 1.



unexpected and simply implies that site-related effects should be taken into account when combining data from multiple sites using the same platform.
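The relationship between the replicate CV (replicates within one site) and the total CV (replicates pooled across sites) can be illustrated with simulated data; the per-site scale offsets below are assumptions chosen only to mimic a site effect:

```python
import numpy as np

def cv_percent(x, axis=1):
    """Per-gene CV (%) along the replicate axis."""
    x = np.asarray(x, dtype=float)
    return 100.0 * x.std(axis=axis, ddof=1) / x.mean(axis=axis)

rng = np.random.default_rng(0)
# 1,000 genes, 5 replicates at each of 3 sites; sites differ by a scale factor
true = rng.lognormal(5, 1, size=(1000, 1))
sites = [true * rng.normal(loc, 0.1 * loc, size=(1000, 5))
         for loc in (1.0, 1.15, 0.95)]

replicate_medians = [np.median(cv_percent(s)) for s in sites]
total = np.median(cv_percent(np.hstack(sites)))  # pooled, n = 15 per gene
print(np.mean(replicate_medians), total)  # total CV exceeds the replicate CV
```

The pooled total CV absorbs the between-site scale differences, which is why it is larger even though each site is internally repeatable.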

To assess variation in the qualitative measures, the percentage of the 12,091 common genes with concordant detection calls between replicates of the same sample type was calculated for each of the four sample types on each platform (Fig. 3). These figures include either all sample replicates at a single site (n ≤ 5) or all sample replicates across the test sites (n ≤ 15). Most one-color test sites demonstrated 80–95% concordance in the qualitative calls for the sample replicates within their facility. The value dropped to 70–85% concordance for the reproducibility of the qualitative calls across all three test sites. It is not surprising that platforms with more detected calls (Fig. 1) generally had higher concordance percentages. For example, the NCI microarrays detected almost all of the 12,091 common genes and had concordance percentages near 100% between test sites. Microarray platforms that had lower numbers of detected genes generally had reduced concordance percentages. Interestingly, the GE Healthcare platform had both a large number of genes detected (~11,000 per hybridization) and approximately 85% concordance between test sites.
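A minimal sketch of this concordance metric, the percentage of genes whose binary detection calls agree perfectly across all replicates; the toy call matrix is hypothetical:

```python
import numpy as np

def call_concordance(calls):
    """Percent of genes whose binary detection calls agree perfectly
    across all replicates (all 'detected' or all 'not detected')."""
    calls = np.asarray(calls)
    concordant = np.all(calls == calls[:, :1], axis=1)
    return 100.0 * concordant.mean()

calls = np.array([[1, 1, 1, 1, 1],   # concordant (detected)
                  [0, 0, 0, 0, 0],   # concordant (not detected)
                  [1, 0, 1, 1, 1],   # discordant
                  [0, 0, 1, 0, 0]])  # discordant
print(call_concordance(calls))  # 50.0
```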

Interplatform data comparability

Expression values generated on different platforms cannot be directly compared because unique labeling methods and probe sequences will result in variable signals for probes that hybridize to the same target. Instead, the relative expression between a pair of sample types should be maintained across platforms. For this reason, we examined the microarray data for comparability between platforms by reviewing sample type B relative to sample type A expression values with three different metrics: differential gene list overlap, log ratio compression and log ratio rank correlation. For log ratio compression and rank correlation, only generally detected genes from the common 12,091 gene list were included in the analysis. For the gene list overlap, all 12,091 common genes were considered.

A list of differentially expressed genes was generated for each test site and compared to lists from other test sites using the same platform and those using a different platform. A percent score was calculated to indicate the number of genes in common between each pair of test site lists. The percentage of overlap for each comparison is displayed in Figure 4. Note the graphic comparisons are asymmetrical, indicating the analysis is performed in two directions. That is, the percentage of test site Y genes on the list from test site X can be different from the

Figure 3 Concordance of detection calls within and between test sites. For the 12,091 common genes, detection calls within each platform were categorized as either ‘detected’ or ‘not detected.’ For each sample type within each platform, the percentage of genes with calls that were perfectly concordant as ‘detected’ within the replicates for a given site is plotted as blue dots, and the corresponding percentage of genes with calls perfectly concordant as ‘detected’ across all sites is plotted as the blue bars. The total percentage of genes with perfectly concordant calls (detected and not detected) within a site is plotted as the yellow dots, and the corresponding percentage of genes with calls perfectly concordant across all sites is plotted as the top of the yellow bars. The bars are split between perfectly detected genes (blue portion) and perfectly not detected genes (yellow portion) across all test sites. It is not expected that detected genes are concordant across sample types. The number of perfectly detected genes for each test site is provided as Table S6 in Supplementary Data online. As described in the main text, the stringency with which individual platforms determine that the data for a gene is sufficiently reliable to be called detected has different manufacturer defaults, leading to altered concordance percentages. Changes in the settings for sensitivity/specificity may shift the proportion of the bar assigned to each detection category. Because reliability depends on platform-specific details, detected calls do not correspond directly to relative abundance and may vary between platforms. Note: as some platforms have removed outlier hybridizations, the number of replicates within (n ≤ 5) and between sites (n ≤ 15) varies for determining concordance.


Figure 2 Signal variation within and between test sites. For each of the four sample types, the replicate CV of signal within a test site (blue bar) and the total CV of signal across and within sites (red bar) are presented. As in Figure 1, genes detected in at least three of the replicates of a sample type at a single test site are included in the replicate CV calculation. Genes present in the intersection of these gene lists are included in the total CV calculation. (These gene lists are therefore slightly different from those in Figure 1.) The number of such genes within each platform and sample type is noted by blue dots connected by lines and is read on the secondary axis. It is also reported as Table S6 in Supplementary Data online. Intrasite normalization was performed according to default settings for each manufacturer, and intersite normalization was performed by scaling between sites (see main text). The NCI platform is omitted because data from only two test sites were available in the main study, so intersite reproducibility measures may not be representative. The platforms and sample types are labeled according to the nomenclature presented in Table 1.



percentage of test site X genes on the test site Y list. For all but the NCI test sites, the gene list overlap is at least 60% for each test site comparison (both directions), with many site pairings achieving 80% or more between platforms and 90% within platforms. Typically, the genes that the NCI microarray platform identified as differentially expressed were also identified on the other platforms, suggesting a low false positive rate for this platform. However, the converse was not necessarily true, most likely due to more log ratio compression observed in the NCI platform and the use of a stringent P-value threshold.
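The directional overlap score behind Figure 4 can be sketched as follows (the gene names are hypothetical). The asymmetry arises because the denominator is the size of site Y's list:

```python
def overlap_percent(list_y, list_x):
    """Percent of test site Y's differentially expressed genes that also
    appear on test site X's list. Directional, hence the asymmetry."""
    list_y, list_x = set(list_y), set(list_x)
    return 100.0 * len(list_y & list_x) / len(list_y)

site_x = {"g1", "g2", "g3", "g4"}
site_y = {"g2", "g3", "g5"}
print(overlap_percent(site_y, site_x))  # 2 of 3 shared -> ~66.7
print(overlap_percent(site_x, site_y))  # 2 of 4 shared -> 50.0
```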

Each microarray platform has a defined background correction method and dynamic range of signal detection, which can lead to over- or underestimates of log ratios and fold changes in expression between sample types. To examine the level of compression or


Figure 4 Agreement of gene lists. This graph indicates the concordance of genes identified as differentially expressed for pairs of test sites, labeled as X and Y. A list of differentially expressed genes between sample type A replicates versus sample type B replicates was generated for each test site (using the 12,091 common genes with ≥ twofold change and P < 0.001 thresholds) and compared for commonality to other test sites. The size of these gene lists is reported as Table S7 in Supplementary Data online. No filtering related to the qualitative detection call was performed. The color of the square in the matrix reflects the percent overlap of genes on the list for the test site Y (listed in row) that are also present on the list for the test site X (listed in column). A light-colored square indicates a high percent overlap between the gene lists at both test sites. A dark-colored square indicates a low percent overlap, suggesting that most genes identified in site Y were not identified in site X. Numerical values for the percent overlap are presented as Table S9 in Supplementary Data online. Note: the graph is asymmetric and not complementary. Only the six high-density microarray platforms are presented. As described in the text, data from some platforms were omitted from these calculations because of quality issues. The platforms and sample types are labeled according to the nomenclature presented in Table 1. The _1, _2 and _3 suffixes refer to test site location.


Figure 5 Agreement of log ratios across platforms and test sites. (a) Log ratio compression/expansion. This graph indicates the percent difference from equivalency between platform/sites (corresponding to a slope value of 1 for the best fitted line using orthogonal regression) of the log ratio differential expression using A and B replicates. A dark spot implies equivalency (slope = 1, percent difference = 0). A positive percent difference in slope from the ideal line (aqua) indicates compression of log signal for test site Y relative to test site X. A negative percent difference from the ideal line (magenta) indicates expansion. Read as "What is the difference from equivalence in slope (m = 1) for the test site Y versus test site X?" Only genes detected by both test sites in at least three replicates of sample type A and three replicates of sample type B are included in the calculation, and the number for each pair is reported as Table S8 in Supplementary Data online. Numerical values for the percent difference are presented as Table S10 in Supplementary Data online. Note: the graph is asymmetric, but approximately complementary. As described in the text, data from some platforms were omitted from these calculations due to quality issues. The platforms and sample types are labeled according to the nomenclature presented in Table 1. The _1, _2 and _3 suffixes refer to test site location. (b) Rank correlation of log ratios. This graph indicates the correlation of the log ratio differential expression values (using A versus B replicates) when we examine their rank. Large positive log ratio values would be ranked high and large negative log ratio values would be ranked low. Read as "What is the correlation of the rank log ratio values between the test site Y and the test site X?" Only genes generally detected in both sample types A and B and by both test sites are included in the calculation, and the number for each pair is reported as Table S8 in Supplementary Data online. Numerical values for the rank correlation are presented as Table S11 in Supplementary Data online. Note: the graph is symmetric. As described in the text, data from some platforms were omitted from these calculations due to quality issues. The platforms and sample types are labeled according to the nomenclature presented in Table 1. The _1, _2 and _3 suffixes refer to test site location.


expansion in log ratios, we determined the best fitted line for the log ratio estimates between pairs of test sites. The percent difference of the slope for each comparison is displayed in Figure 5a. An ideal slope of 1 would result in a percent difference of 0; negative or positive percent differences in the slope from the ideal line indicate compression or expansion of the log ratios in one test site relative to the other. For each commercial one-color platform, good agreement was observed between its three test sites. Most of the interplatform test site comparisons also showed little compression or expansion. Test site 1 for the NCI microarrays produced consistently different results from the other test sites, both within and between platforms.
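Orthogonal regression treats both sites' log ratios as noisy, fitting the first principal axis of the point cloud rather than minimizing vertical residuals. One common formulation uses the SVD of the centered data; the sketch below (simulated data, with a 0.8 compression factor assumed purely for illustration) computes the slope and the percent difference from equivalence:

```python
import numpy as np

def orthogonal_slope(x, y):
    """Slope of the best-fit line by orthogonal (total least squares)
    regression: the first principal axis of the centered (x, y) cloud."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    data = np.column_stack([x - x.mean(), y - y.mean()])
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    vx, vy = vt[0]           # first principal direction
    return vy / vx

rng = np.random.default_rng(1)
lr_x = rng.normal(0, 2, 2000)                  # log ratios at test site X
lr_y = 0.8 * lr_x + rng.normal(0, 0.2, 2000)   # compressed at test site Y
slope = orthogonal_slope(lr_x, lr_y)
pct_diff = 100.0 * (1.0 - slope)               # > 0 implies compression at Y
print(round(slope, 2), round(pct_diff, 1))
```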

The comparability of results across platforms was also examined using a rank correlation metric. Log ratios for the differential expression observed between sample B replicates and sample A replicates were calculated for the generally detected common genes and then compared between test sites and across platforms. The rank correlations of the log ratios are displayed visually in Figure 5b. Good agreement was observed between all sites, even those using different microarray platforms. In fact, the median rank correlation was 0.87 and the smallest rank correlation value was 0.69 between the microarray platforms.
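A rank correlation (Spearman's) is insensitive to monotone distortions such as log ratio compression, which is why it can remain high even when slopes differ between sites. A minimal numpy-only sketch (simulated data; tie handling omitted for brevity):

```python
import numpy as np

def rank_correlation(a, b):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    ra = np.argsort(np.argsort(a))  # 0-based ranks; no tie correction
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

rng = np.random.default_rng(2)
lr_x = rng.normal(0, 2, 1000)                  # site X log(B/A) values
lr_y = 0.7 * lr_x + rng.normal(0, 0.5, 1000)   # site Y: compressed plus noise
r = rank_correlation(lr_x, lr_y)
print(round(r, 2))  # remains high despite the compression of lr_y
```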

Assessing relative accuracy

The relative accuracy of the microarray platforms can be assessed using either the titrated mixtures of the RNA samples23 or gene abundance measurements collected with alternative technologies22. Figure 5, as well as Tables S12 and S13 in Supplementary Data online, illustrates the relative rank correlation and compression/expansion values for log (B/A) between microarray-based and alternative gene expression technologies. Further comparisons between each microarray platform relative to the TaqMan assays are presented as scatter plots in Figure 6.

The log ratios of sample type B to sample type A expression detected on the TaqMan assays were compared to the log ratios obtained for the same genes on the microarray assays. Only genes that were generally detected in both sample A and B replicates on the TaqMan assays and on the microarray were included in this analysis. The relative accuracy of each high-density platform to the TaqMan assay data was generally higher for those microarray platforms with fewer genes detected, as indicated by the number and magnitude of deviations from the ideal 45° line indicated in Figure 5a and Figure 6.

Correlation with alternative platforms

Similarly, the Affymetrix, Agilent and Illumina platforms displayed high correlation values of 0.90 or higher with TaqMan assays based on comparisons of ~450–550 genes, whereas the GE Healthcare and NCI platforms had a reduced average correlation of 0.84, but included almost 30% more genes in the data comparisons. These additional genes were not identified as ‘not detected’ during the data review process, but may represent less confident results due to lower signals exhibiting greater variance. Thus, much of the difference in comparability metrics may be a reflection of the algorithm used to assign detection calls. Similar correlation values for the microarray platforms were observed relative to each of the other alternative platforms, StaRT-PCR and QuantiGene22.

DISCUSSION

The results of the MAQC project provide a framework for assessing the potential of microarray technologies as a tool to provide reliable gene expression data for clinical and regulatory purposes. All one-color microarray platforms had a median CV of 5–15% for the quantitative signal (Fig. 1) and a concordance rate of 80–95% for the qualitative detection call (Fig. 3) between sample replicates. This variation increased when data from different test sites using

[Figure 6, panels a–g: scatter plots of each platform's log ratio (ABI, AFX, AG1, EPP, NCI, ILM and GEH) versus the TaqMan log ratio, with both axes spanning −12 to 12. In-panel annotations of per-site gene counts and correlations: Site1: n = 528, r = 0.86; Site2: n = 523, r = 0.85; Site3: n = 567, r = 0.84 | Site1: n = 469, r = 0.92; Site2: n = 451, r = 0.92; Site3: n = 472, r = 0.93 | Site1: n = 516, r = 0.91; Site2: n = 505, r = 0.91; Site3: n = 484, r = 0.91 | Site1: n = 670, r = 0.84; Site2: n = 680, r = 0.86; Site3: n = 660, r = 0.84 | Site1: n = 532, r = 0.90; Site2: n = 547, r = 0.92; Site3: n = 595, r = 0.90 | Site1: n = 769, r = 0.82; Site2: n = 740, r = 0.83 | Site1: n = 53, r = 0.91; Site2: n = 64, r = 0.79; Site3: n = 84, r = 0.83.]

Figure 6 Correlation between microarray and TaqMan data. The scatter plots compare the log ratio differential expression values (using A versus B replicates) from each microarray platform relative to values obtained by TaqMan assays. Each point represents a gene that was measured on both the microarray and TaqMan assays. The spot coloring indicates whether the data were generated in test site 1 (black), test site 2 (blue) or test site 3 (red) for the microarray platform. Only genes that were generally detected in sample type A replicates and sample type B replicates were used in the comparisons. The exact number of probes analyzed for each test site and its correlation to TaqMan assays are listed in the bottom right corner of each plot. As described in the text, data from some platforms were omitted from these calculations because of quality issues. The platforms and sample types are labeled according to the nomenclature presented in Table 1. The line shown is the ideal 45° line.


the same platform were included (Figs. 2 and 3). However, lists of differentially expressed genes averaged ~89% overlap between test sites using the same platform and ~74% overlap across one-color microarray platforms (Fig. 4). Importantly, the ranks of log ratios were highly correlated among the microarrays (minimum R = 0.69; Fig. 5b), indicating that all platforms were detecting similar changes in gene abundance. These results indicate that, for these sample types and these laboratories, microarray results were generally repeatable within a test site, reproducible between test sites and comparable across platforms, even when the platforms used probes with sequence differences as well as unique protocols for labeling and expression detection.

Within the MAQC study, there were notable differences in various dimensions of performance between microarray platforms. Some platforms had better intrasite repeatability overall (e.g., Illumina), better intersite reproducibility (e.g., Affymetrix) or more consistency in the detection calls (e.g., GE Healthcare). Likewise, some platforms were more comparable to TaqMan assays (e.g., Applied Biosystems and Agilent one-color), whereas others demonstrated signal compression (e.g., NCI_Operon). Some of these differences were manifest in the apparent power analyses (see Figure S1 in Supplementary Data online), as test sites with smaller CV values (Fig. 1) typically had more power to discriminate differences between groups, as would be expected. Other differences might have been related to the platform's signal-to-analyte response characteristics22. It is important to note that 11 (2.4%) of the 453 microarray hybridizations were removed from the analysis due to quality issues (listed as Table S1 in Supplementary Data online). The relative performance of some platforms might have been altered if this data filter had not been applied.

Each microarray platform has made different trade-offs with respect to repeatability, sensitivity, specificity and ratio compression. One interesting result was that platforms with divergent approaches to measuring expression often generated comparable results. For example, data from Affymetrix test sites, which use multiple short oligonucleotide probes per target with perfect match and mismatch sequences, and Illumina test sites, which use plasma-etched silicon wafers containing beads with long oligonucleotide probes, were remarkably similar in the numbers of genes detected and in the detection call consistency, gene list overlap and ratio compression analyses. In other words, the expression patterns generated were reflective of biology regardless of the differences in technology.

Some of the results were affected by differences in data analysis and detection call algorithms. This effect is most noticeable in the fold-change compression observed in the two-color results from the NCI microarrays, which generally included low intensity probes, resulting in an over 95% detection call rate. The comparability of the NCI microarrays relative to the other platforms improves when background is based on ‘alien’ or negative control sequences. This alternative method reduces the detection call rate to 60–70%, while generally increasing the absolute fold changes in up- and downregulated genes (E.S.K., unpublished data). Interestingly, the NCI platform had lower intrasite repeatability (Fig. 1), but demonstrated comparable rankings in log ratios when compared to the other platforms (Fig. 5b).

Additional analyses of the MAQC data are provided in the accompanying articles. For example, the microarray platforms detected known differences in gene abundance between defined RNA mixtures23 and generated differential expression results that were comparable with other gene expression platforms22,24. The comparability of the gene expression results increased when the microarrays and other methodologies analyzed overlapping sequences from the same gene22. Furthermore, external RNA controls included in some microarray platforms were useful predictors of technical performance25.

Direct comparison of different microarray platforms is neither a new nor an original idea in the realm of high-throughput biology. However, the data set generated by the MAQC project is unique in both its size and content. The main study compares seven different microarray platforms and includes ~60 hybridizations per platform using well-characterized, commercially available RNA sample types. Including the reagents used in the two pilot studies and the toxicogenomics validation study26, 1,327 microarrays have been used for this project (see Table S4 in Supplementary Data online). Moreover, the availability of the probe sequences in the MAQC project enabled us to approach the interplatform comparisons with greater scientific rigor. We performed detailed probe mapping to confirm identity and reveal potential sequence- or target-based differences between the gene expression platforms. This analysis confirmed that the great majority of probes were very carefully chosen and of high quality.

Most of the results in this report are based on a set of 12,091 common genes that are represented on six high-density microarray platforms, but which generally use different probe sequences for detection. Our probe selection procedure may have introduced a bias into the study because the imposed criteria neither reflect the platform design philosophies nor account for the very rich underlying biology. More than one probe per target can be a highly desirable feature on microarray platforms because a single probe may not capture all tissue-specific effects. We also found a number of probes that were not gene specific, suggesting a strategy of targeting multigene families.

The MAQC data set captures intrasite, intersite and interplatform differences. However, it does not address protocol, time or other technical variables within a test site because all test sites used the same protocol and generated replicate data at approximately the same time (except as noted under data filtering). The effect and levels of these sources of variation have been described in other studies15,33. Furthermore, our analysis does not include performance metrics based on ‘biology’ (e.g., Gene Ontology terms or pathways)26. Though a relatively high level of concordance of differentially expressed gene lists was observed in this study, it is possible that a higher level of agreement would be detected using these other methods of gene list concordance34, or that a lower level would be observed with sample types that were more realistically similar.

It should be noted that the results presented in this paper in terms of log ratios and overlap of lists of differentially expressed genes were derived from comparing sample types A and B, which exhibited the greatest differences among the four sample types used in the MAQC project. In practical applications, the expected differences between sample types (e.g., treated versus control animals) are usually much smaller than those seen between sample types A and B. Therefore, the comparability of microarray data reported in this paper does not necessarily mean that the same level of consistency would be achieved in toxicogenomic or pharmacogenomic applications. This difference can be seen from the relatively lower power and smaller overlap of gene lists (see Figures S1 and S2 in Supplementary Data online) when comparing sample types C and D, where the maximum fold change is three.

The MAQC data set can be used to compare normalization methods23 and data analysis algorithms26 (see Figure S2 in Supplementary Data online), similar to a currently available website (http://affycomp.biostat.jhsph.edu), which illustrates the impact of different data analysis methods on expression results30,34. It is our hope that


future studies will add to the MAQC data set. For example, microarray providers could submit gene expression results from new microarrays with updated probe content and then use the MAQC data set to confirm consistency with older versions of the microarray. In an effort to represent all platforms equally and to present results in a timely manner, this publication analyzed only 386 microarray hybridizations from 20 test sites. However, additional data sets from the MAQC main study are available (listed as Tables S1–S4 in Supplementary Data online). Although most sites generated quality results, some differences were detected between test sites using the same platform. Thus, microarray studies need unified metrics and standards, which can be used to identify suboptimal results and monitor performance in microarray facilities.

Previous reports have relied heavily on the statistical significance (P value) rather than on the actual measured quantity of differential expression (fold change or ratio) in identifying differentially expressed genes. This strict reliance on P values alone has resulted in the apparent lack of agreement between sites and microarray platforms20,26. Our results from analyzing the MAQC human data sets (see Figure S2 in Supplementary Data online) and the rat toxicogenomics data set26 indicate that a straightforward approach of fold-change ranking plus a nonstringent P cutoff can be successful in identifying reproducible gene lists, whereas ranking and selecting differentially expressed genes solely by the t-test statistic predestines poor concordance in results, in particular for shorter gene lists, because of the relatively unstable nature of the variance (noise) estimate in the t-statistic measure. More robust methods, such as ranking using the test statistic from the Significance Analysis of Microarrays (SAM)35, did not generate more reproducible results compared to fold-change ranking in our cross-laboratory and interplatform comparisons. Our results are consistent with previously published data20. Furthermore, the impact of normalization methods on the reproducibility of gene lists becomes minimal when the fold change, instead of the P value, is used as the ranking criterion for gene selection24,26.
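The two selection strategies contrasted above can be made concrete in a few lines of Python. This is an illustrative sketch, not the consortium's analysis code; the function names, dictionary inputs and default cutoffs are hypothetical, chosen only to spell out the fold-change-ranking-plus-nonstringent-P rule and the list-overlap comparison.

```python
def select_genes_fold_change(log_ratios, p_values, p_cutoff=0.05, n_top=100):
    """Rank genes by absolute log ratio (fold change) after a nonstringent
    P-value filter, in contrast to ranking purely by the t-statistic.

    log_ratios, p_values: dicts keyed by gene ID (hypothetical inputs).
    Returns the top n_top gene IDs, largest fold changes first.
    """
    passed = [g for g in log_ratios if p_values[g] < p_cutoff]
    passed.sort(key=lambda g: abs(log_ratios[g]), reverse=True)
    return passed[:n_top]

def percent_overlap(list_x, list_y):
    """Overlap of two gene lists, expressed relative to list_y."""
    return 100.0 * len(set(list_x) & set(list_y)) / len(list_y)
```

Comparing the overlap of lists produced at two sites under each ranking rule reproduces the qualitative point: fold-change-ranked lists agree better across sites than purely t-statistic-ranked lists of the same length.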

Two initiatives for microarray reference materials are currently in progress. A group led by FDA's Center for Drug Evaluation and Research (CDER) developed two mixed-tissue RNA pools with known differences in tissue-selective genes that can be used as rat reference materials36, whereas the External RNA Controls Consortium (ERCC) is testing polyadenylated transcripts that can be added to each RNA sample before processing to monitor the technical performance of the assay37. The MAQC project complements these efforts by establishing several commercially available human reference RNA samples, and an accompanying large data set, which can be used by the scientific community to compare results generated in their own laboratories for quality control and performance validation efforts. In fact, the commercial availability of the MAQC reference sample types allowed several laboratories to generate and submit additional gene expression data to the MAQC project after the official deadline (listed as Table S4 in Supplementary Data online).

Repeated intersite comparisons, such as proficiency testing, are required three times a year for many Clinical Laboratory Improvement Amendments (CLIA) assays and may also be useful in microarray facilities to monitor the comparability and consistency of data sets generated over time38. For example, a proficiency testing program evaluated the performance over a 9-month period of 18 different laboratories by repeatedly hybridizing three replicates of the same two RNA sample types to Affymetrix microarrays (L.H.R. and W.D.J., unpublished results). This study revealed the range of quality metrics and the impact of protocol differences on the microarray results. The MAQC human reference RNA sample types could be used in this kind of intersite proficiency testing program.

In summary, the technical performance of microarrays as assessed in the MAQC project supports their continued use for gene expression profiling in basic and applied research and may lead to their use as a clinical diagnostic tool as well. International organizations such as the ERCC37, the Microarray Gene Expression Data Society39 and this MAQC project are providing the microarray community with standardization of data reporting, common analysis tools and useful controls that can help provide confidence in the consistency and reliability of these gene expression platforms.

METHODS

Probe mapping. Affymetrix, Agilent, GE Healthcare, Illumina and Operon (oligonucleotides used by the NCI) provide publicly available probe sequences for their microarray platforms in a spreadsheet format (websites listed in Supplementary Data online). The probe sequences for Applied Biosystems microarrays can be individually obtained through the Panther database (http://www.pantherdb.org) and the sequences of the intended regions for QuantiGene (Panomics) assays are available upon request. Probe sequences for Eppendorf microarrays are not yet publicly available, but were provided to the MAQC project for confidential analysis. Gene Express provided annotation and approximate forward and reverse primer locations for the StaRT-PCR assays, which were sufficient to localize the intended target. For TaqMan assays, Applied Biosystems provided Assay ID, amplicon size, assay location on the RefSeq and a context sequence (exact 25-nt sequence that includes the TaqMan assay detection probe). The MAQC probe mapping (Supplementary Methods online and Supplementary Notes online) used the March 8, 2006 RefSeq release containing 24,000 curated accessions, to which we subjectively added 157 entries that were recently either withdrawn or retired from the NCBI curation. AceView comparisons were based on the August 2005 database32.

An exact match of the sequence of the probe to the database entry was required. Probes matching only the reverse strand of a transcript were excluded, as were probes matching more than one gene. An exact match of 80% of the probes within a probe set (usually 9 probes out of 11) was required for Affymetrix. The results based on these stringent criteria are provided as Supplementary Tables 2–5 online and summarized as Table S5 in Supplementary Data online. The counts for the StaRT-PCR and TaqMan assays were based on the annotation provided by Gene Express and Applied Biosystems. In the AceView analysis, the mapping was tolerant to low levels of noncentral mismatches, but applied a stringent gene-specific filter so that probes that potentially cross-hybridize were removed even if they had a single exact match.
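The 80% probe-set criterion can be stated compactly. The helper below is a hypothetical sketch of just that rule, not the MAQC mapping pipeline; it assumes the per-probe exact-match, sense-strand and uniqueness filters have already been applied upstream.

```python
def probe_set_mapped(probe_hits, min_fraction=0.8):
    """Apply the probe-set rule used for Affymetrix: the set maps to a gene
    only if at least min_fraction of its probes passed the exact-match
    filters (e.g., 9 of 11 probes, since 0.8 * 11 = 8.8).

    probe_hits: list of booleans, one per probe in the set, True when that
    probe had an exact, unique, sense-strand match.
    """
    return sum(probe_hits) >= min_fraction * len(probe_hits)
```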

RNA preparation. The total RNA sources were tested and selected based on the results of 160 microarrays from Pilot Project I (data not shown). The Universal Human Reference RNA (catalog no. 740000) and Human Brain Reference RNA (catalog no. 6050) were generously donated by Stratagene and Ambion, respectively. The four titration mixtures of the samples were selected based on the results of 254 microarrays from Pilot Project II (data not shown) and prepared as described elsewhere23. The titration pools were mixed at the same time at one site using a documented protocol (MAQC_RNA_Preparation_SOP.doc) available at the MAQC website (http://www.fda.gov/nctr/science/centers/toxicoinformatics/maqc/). Each test site received 50-µg aliquots of the four sample types and confirmed the RNA quality using a Bioanalyzer (Agilent) before initiating target preparation.

Target preparation and quality assessments. Every test site was provided with instructions (MAQC_Sample_Processing_Overview_SOP.doc) on the processing of RNA samples, conducting quality assessment of RNA reference samples, target preparations and replication guidelines, standardized nomenclature for referencing samples and a template for reporting quality assessment data (MAQC_RNA_Quality_Report_Template.xls). The gene expression vendors generously provided all reagents to the test sites. Each microarray test site assessed cRNA yields using a spectrophotometer and determined the median transcript sizes using a Bioanalyzer (Agilent). Pre-hybridization and post-hybridization quality metrics are presented as Supplementary Table 1 online.


Some statistically significant differences were observed in these quality metrics between sites (data not shown).

Affymetrix, Agilent, Applied Biosystems and Eppendorf test sites added platform-specific external RNA controls to the samples before processing25. Data were submitted to the FDA's National Center for Toxicological Research (FDA/NCTR) directly from each test site and distributed to the eleven official analysis sites for review. Lists of the gene expression test sites and data analysis centers are available as Tables S1 and S2 in Supplementary Data online. All test sites for one vendor used the same target preparation protocols and processed all replicates at approximately the same time, with two exceptions: (i) microarray slides at the NCI test sites were scanned at 100% laser power, but the photomultiplier setting varied from slide to slide and (ii) some outlier hybridizations were repeated at a later date as described below. Exact protocols for sample processing are available at the MAQC website (http://www.fda.gov/nctr/science/centers/toxicoinformatics/maqc/) and are briefly described in Supplementary Data online.

Data filters. Outlier hybridizations were repeated or removed from the analysis after the original data submission deadline in October 2005. One site each for the NCI and GE Healthcare platforms repeated all sample types in the MAQC study (NCI_2 and GEH_2) due to protocol issues. One Illumina site (ILM_2) repeated two samples in the MAQC study due to low cRNA yield, and another Illumina site (ILM_1) did not hybridize one sample replicate for the same reason. Data quality from 11 hybridizations at seven test sites (ABI_2, ABI_3, AG1_1, AG1_2, AG1_3, AGL_1 and AGL_2) was not satisfactory. More details are provided as Table S3 in Supplementary Data online.

Data processing. The platform-specific methods used for background subtraction, data normalization and the optional incorporation of offset values are described in Supplementary Data online. Each test site submitted its data (including image files) to the FDA/NCTR. All data were imported into the ArrayTrack database system40,41 and preprocessed and normalized according to the manufacturer's suggested procedures. Each gene was reviewed for quality and marked with a detection call, using the manufacturer's protocol. Data in a uniform format were distributed to all test sites and official data analysis sites for independent study.

Data analysis. Data analyses were performed on either all of the 12,091 common genes or a subset of this group based on the qualitative detection call reported for each hybridization. The size of these subsets in each of the test sites for each sample type is reported as Table S6 in Supplementary Data online.

Signal repeatability and reproducibility. The coefficient of variation (CV) of the signal or Cy3/Cy5 values (not log transformed) between the intrasite replicates (n ≤ 5) was calculated for genes that were detected in at least three replicates of the same sample type within a test site. The distributions of these replicate CV values are displayed in Figure 1. The replicate CV medians from three test sites are included in Figure 2. A total CV (Fig. 2) of the signal values was calculated for all replicates across three test sites (n ≤ 15) using the intersection of the generally detected gene lists (that is, genes detected in at least three replicates at all three sites). A global scaling normalization is inherently applied to data from the GE Healthcare and Agilent platforms, but is not part of data extraction and normalization on the Applied Biosystems, Affymetrix (using PLIER+16) and Illumina platforms. To account for these differences, Applied Biosystems, Affymetrix and Illumina provided scaling factors for each test site that were included when measuring the total CV.
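The replicate CV metric behind Figures 1 and 2 reduces to a short calculation. The helpers below are an illustrative sketch (the names are ours, not from the MAQC code), assuming the detection filter (gene detected in at least three replicates) has already been applied and that signals are untransformed.

```python
from statistics import mean, median, stdev

def replicate_cv(signals):
    """Coefficient of variation (%) of untransformed signal values across
    intrasite replicates of one gene: 100 * sample SD / mean."""
    return 100.0 * stdev(signals) / mean(signals)

def median_replicate_cv(signal_table):
    """Median replicate CV over genes, as summarized per site in Figure 2.

    signal_table: dict mapping gene ID -> list of replicate signals
    (hypothetical input layout)."""
    return median(replicate_cv(v) for v in signal_table.values())
```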

Concordance of detection call. Analyses were performed on all 12,091 common genes using the feature quality metrics provided by the manufacturers. All calls were resolved to a Detected or Not Detected status. Details on each platform's method of determining qualitative calls are provided in Supplementary Data online. In general, the results are provided regarding the consistency of the resolved detection calls. If the call was missing because the microarray was absent, then the detection value was not considered. Otherwise, the qualitative call was considered, including those cases where the signal value was missing.
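One possible way to express the call-resolution rule in code. The set of platform-specific flag values below is purely hypothetical (each vendor's actual flags are described in the Supplementary Data); only the drop-if-absent/keep-otherwise logic mirrors the text.

```python
def resolve_call(call, array_present=True):
    """Resolve a platform-specific quality flag to 'Detected'/'Not Detected'.

    Calls missing because the feature was absent from the microarray are
    dropped (None) and excluded from concordance counts; otherwise the
    qualitative call is kept, even when the signal value itself is missing.
    The flag vocabulary here ('P', 'Present', ...) is an assumption for
    illustration, not any vendor's documented scheme.
    """
    if not array_present or call is None:
        return None
    return 'Detected' if call in {'P', 'Present', 1, '1'} else 'Not Detected'
```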

Gene list agreement. A list of differentially expressed genes was identified for each test site using the usual two-group t-test that assumes equal variances between groups, resulting in a pooled estimate of variance. This calculation is based on log signal. The criteria were P < 0.001 and a mean difference greater than or equal to twofold. No filtering related to gene detection was performed. For each pair of test sites, the number of genes in both lists was identified. Percent overlap (Fig. 4) was calculated as the number of genes in common divided by the number of genes on the list from one test site. For example, the agreement score for test site Y relative to test site X equals the number of genes on both lists divided by the number of genes on the test site Y list.
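The selection rule can be made explicit with a small sketch. The pooled two-group t-statistic is standard, and on the log2 scale the twofold criterion corresponds to a mean difference of at least 1.0. Function names are illustrative, and the P value is taken as a given input rather than computed (the t-distribution CDF is outside the Python standard library).

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Two-group t-statistic with a pooled (equal-variance) estimate,
    computed on log signals as in the gene-list analysis."""
    na, nb = len(group_a), len(group_b)
    sp2 = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_b) - mean(group_a)) / sqrt(sp2 * (1 / na + 1 / nb))

def differentially_expressed(group_a, group_b, p_value, min_log2_diff=1.0):
    """Selection rule: P < 0.001 and a mean log2 difference of at least
    min_log2_diff (1.0 log2 unit = twofold)."""
    return p_value < 0.001 and abs(mean(group_b) - mean(group_a)) >= min_log2_diff
```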

Log ratio comparability. The log ratio of each gene is defined as the average of log signal for all sample B replicates minus the average of log signal of all sample A replicates. (This value is equivalent to the log of the ratio of the geometric average of signal for all sample B replicates to the geometric average of signal for all sample A replicates.) Only genes that were detected in at least three sample A replicates and in at least three sample B replicates for both test sites were included. To detect compression or expansion (Fig. 5a), the slope (m) was calculated for each pair of test sites using orthogonal regression, owing to the potential measurement error at both sites. This analysis is based on the formula y = mx + b, where y is the log ratio from test site Y and x is the log ratio from test site X. As the ideal slope is 1, the percent difference from ideal is simply m − 1. Comparability between a pair of test sites was also examined using Spearman rank correlations of the log ratios (Fig. 5b). This value compares the relative position of a gene in the test site X rank order of the log ratio (fold change) values against its position in the test site Y rank order. Scatter plots of the log ratios from all sites against the log ratios generated with the TaqMan assays are presented in Figure 6.
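Both Figure 5 quantities are simple to compute. The sketch below uses the closed-form slope of orthogonal (total least squares) regression under the assumption of equal error variances at the two sites, and a tie-free Spearman correlation; it is illustrative only, not the MAQC analysis code, and real data would need tie handling.

```python
from math import sqrt
from statistics import mean

def orthogonal_slope(x, y):
    """Slope m of the orthogonal regression y = m*x + b, allowing for
    measurement error in both sites' log ratios.  With equal error
    variances: m = (Syy - Sxx + sqrt((Syy - Sxx)**2 + 4*Sxy**2)) / (2*Sxy)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return (syy - sxx + sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)

def spearman(x, y):
    """Spearman rank correlation of the two sites' log ratios, assuming no
    tied values: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

A perfectly comparable pair of sites would give a slope of 1 (percent difference m − 1 = 0) and a Spearman correlation of 1.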

Accession numbers. All data are available through GEO (series accession number GSE5350), ArrayExpress (accession number E-TABM-132), ArrayTrack (http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack/) and the MAQC website (http://www.fda.gov/nctr/science/centers/toxicoinformatics/maqc/).

Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS
All MAQC participants freely donated their time and reagents for the completion and analysis of the MAQC project. Participants from the National Institutes of Health (NIH) were supported by the Intramural Research Program of NIH, Bethesda, Maryland. D.H. thanks Ian Korf for BLAST discussions. This study utilized a number of computing resources, including the high-performance computational capabilities of the Biowulf PC/Linux cluster at the NIH (http://biowulf.nih.gov/) as well as resources at the analysis sites.

DISCLAIMER
This work includes contributions from, and was reviewed by, the FDA, the EPA and the NIH. This work has been approved for publication by these agencies, but it does not necessarily reflect official agency policy. Certain commercial materials and equipment are identified in order to adequately specify experimental procedures. In no case does such identification imply recommendation or endorsement by the FDA, the EPA or the NIH, nor does it imply that the items identified are necessarily the best available for the purpose.

COMPETING INTERESTS STATEMENT
The authors declare competing financial interests (see the Nature Biotechnology website for details).

Published online at http://www.nature.com/naturebiotechnology/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Lesko, L.J. & Woodcock, J. Translation of pharmacogenomics and pharmacogenetics: a regulatory perspective. Nat. Rev. Drug Discov. 3, 763–769 (2004).
2. Frueh, F.W. Impact of microarray data quality on genomic data submissions to the FDA. Nat. Biotechnol. 24, 1105–1107 (2006).
3. Dix, D.J. et al. A framework for the use of genomics data at the EPA. Nat. Biotechnol. 24, 1108–1111 (2006).
4. Tan, P.K. et al. Evaluation of gene expression measurements from commercial microarray platforms. Nucleic Acids Res. 31, 5676–5684 (2003).
5. Ramalho-Santos, M., Yoon, S., Matsuzaki, Y., Mulligan, R.C. & Melton, D.A. "Stemness": transcriptional profiling of embryonic and adult stem cells. Science 298, 597–600 (2002).
6. Ivanova, N.B. et al. A stem cell molecular signature. Science 298, 601–604 (2002).
7. Miller, R.M. et al. Dysregulation of gene expression in the 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine-lesioned mouse substantia nigra. J. Neurosci. 24, 7445–7454 (2004).
8. Fortunel, N.O. et al. Comment on "'Stemness': transcriptional profiling of embryonic and adult stem cells" and "a stem cell molecular signature". Science 302, 393; author reply 393 (2003).
9. Miklos, G.L. & Maleszka, R. Microarray reality checks in the context of a complex disease. Nat. Biotechnol. 22, 615–621 (2004).
10. Frantz, S. An array of problems. Nat. Rev. Drug Discov. 4, 362–363 (2005).
11. Marshall, E. Getting the noise out of gene arrays. Science 306, 630–631 (2004).
12. Michiels, S., Koscielny, S. & Hill, C. Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet 365, 488–492 (2005).
13. Ein-Dor, L., Zuk, O. & Domany, E. Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc. Natl. Acad. Sci. USA 103, 5923–5928 (2006).
14. Petersen, D. et al. Three microarray platforms: an analysis of their concordance in profiling gene expression. BMC Genomics 6, 63 (2005).
15. Dobbin, K.K. et al. Interlaboratory comparability study of cancer gene expression analysis using oligonucleotide microarrays. Clin. Cancer Res. 11, 565–572 (2005).
16. Irizarry, R.A. et al. Multiple-laboratory comparison of microarray platforms. Nat. Methods 2, 345–350 (2005).
17. Larkin, J.E., Frank, B.C., Gavras, H., Sultana, R. & Quackenbush, J. Independence and reproducibility across microarray platforms. Nat. Methods 2, 337–344 (2005).
18. Kuo, W.P. et al. A sequence-oriented comparison of gene expression measurements across different hybridization-based technologies. Nat. Biotechnol. 24, 832–840 (2006).
19. Shi, L. et al. QA/QC: challenges and pitfalls facing the microarray community and regulatory agencies. Expert Rev. Mol. Diagn. 4, 761–777 (2004).
20. Shi, L. et al. Cross-platform comparability of microarray technology: intra-platform consistency and appropriate data analysis procedures are essential. BMC Bioinformatics 6 (Suppl. 2), S12 (2005).
21. Ji, H. & Davis, R.W. Data quality in genomics and microarrays. Nat. Biotechnol. 24, 1112–1113 (2006).
22. Canales, R.D. et al. Evaluation of DNA microarray results with quantitative gene expression platforms. Nat. Biotechnol. 24, 1115–1122 (2006).
23. Shippy, R. et al. Using RNA sample titrations to assess microarray platform performance and normalization techniques. Nat. Biotechnol. 24, 1123–1131 (2006).
24. Patterson, T.A. et al. Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) project. Nat. Biotechnol. 24, 1140–1150 (2006).
25. Tong, W. et al. Evaluation of external RNA controls for the assessment of microarray performance. Nat. Biotechnol. 24, 1132–1139 (2006).
26. Guo, L. et al. Rat toxicogenomic study reveals analytical consistency across microarray platforms. Nat. Biotechnol. 24, 1162–1169 (2006).
27. Mecham, B.H. et al. Sequence-matched probes produce increased cross-platform consistency and more reproducible biological results in microarray-based gene expression measurements. Nucleic Acids Res. 32, e74 (2004).
28. Carter, S.L., Eklund, A.C., Mecham, B.H., Kohane, I.S. & Szallasi, Z. Redefinition of Affymetrix probe sets by sequence overlap with cDNA microarray probes reduces cross-platform inconsistencies in cancer-associated gene expression measurements. BMC Bioinformatics 6, 107 (2005).
29. Draghici, S., Khatri, P., Eklund, A.C. & Szallasi, Z. Reliability and reproducibility issues in DNA microarray measurements. Trends Genet. 22, 101–109 (2006).
30. Irizarry, R.A., Wu, Z. & Jaffee, H.A. Comparison of Affymetrix GeneChip expression measures. Bioinformatics 22, 789–794 (2006).
31. Pruitt, K.D., Tatusova, T. & Maglott, D.R. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 33, D501–D504 (2005).
32. Thierry-Mieg, D. & Thierry-Mieg, J. AceView: a comprehensive cDNA-supported gene and transcripts annotation. Genome Biol. 7 (Suppl. 1), S12 (2006).
33. Bammler, T. et al. Standardizing global gene expression analysis between laboratories and across platforms. Nat. Methods 2, 351–356 (2005).
34. Harr, B. & Schlotterer, C. Comparison of algorithms for the analysis of Affymetrix microarray data as evaluated by co-expression of genes in known operons. Nucleic Acids Res. 34, e8 (2006).
35. Tusher, V.G., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. USA 98, 5116–5121 (2001).
36. Thompson, K.L. et al. Use of a mixed tissue RNA design for performance assessments on multiple microarray formats. Nucleic Acids Res. 33, e187 (2005).
37. Baker, S.C. et al. The External RNA Controls Consortium: a progress report. Nat. Methods 2, 731–734 (2005).
38. Reid, L.H. The value of a proficiency testing program to monitor performance in microarray laboratories. Pharm. Discov. 5, 20–25 (2005).
39. Ball, C.A. et al. Standards for microarray data. Science 298, 539 (2002).
40. Tong, W. et al. ArrayTrack: supporting toxicogenomic research at the U.S. Food and Drug Administration National Center for Toxicological Research. Environ. Health Perspect. 111, 1819–1826 (2003).
41. Tong, W. et al. Development of public toxicogenomics software for microarray data management and analysis. Mutat. Res. 549, 241–253 (2004).

AUTHORS

The following authors contributed to project leadership:
Leming Shi1, Laura H Reid2, Wendell D Jones2, Richard Shippy3, Janet A Warrington4, Shawn C Baker5, Patrick J Collins6, Francoise de Longueville7, Ernest S Kawasaki8, Kathleen Y Lee9, Yuling Luo10, Yongming Andrew Sun9, James C Willey11, Robert A Setterquist12, Gavin M Fischer13, Weida Tong1, Yvonne P Dragan1, David J Dix14, Felix W Frueh15, Federico M Goodsaid15, Damir Herman16, Roderick V Jensen17, Charles D Johnson18, Edward K Lobenhofer19, Raj K Puri20, Uwe Scherf21, Jean Thierry-Mieg16, Charles Wang22, Mike Wilson12,18, Paul K Wolber6, Lu Zhang9,23 and William Slikker, Jr1

Project leader: Leming Shi1

Manuscript preparation team leader: Laura H Reid2

MAQC Consortium:
Leming Shi1, Laura H Reid2, Wendell D Jones2, Richard Shippy3, Janet A Warrington4, Shawn C Baker5, Patrick J Collins6, Francoise de Longueville7, Ernest S Kawasaki8, Kathleen Y Lee9, Yuling Luo10, Yongming Andrew Sun9, James C Willey11, Robert A Setterquist12, Gavin M Fischer13, Weida Tong1, Yvonne P Dragan1, David J Dix14, Felix W Frueh15, Federico M Goodsaid15, Damir Herman16, Roderick V Jensen17, Charles D Johnson18, Edward K Lobenhofer19, Raj K Puri20, Uwe Scherf21, Jean Thierry-Mieg16, Charles Wang22, Mike Wilson12,18, Paul K Wolber6, Lu Zhang9,23, Shashi Amur15, Wenjun Bao24, Catalin C Barbacioru9, Anne Bergstrom Lucas6, Vincent Bertholet7, Cecilie Boysen25, Bud Bromley25, Donna Brown26, Alan Brunner3, Roger Canales9, Xiaoxi Megan Cao27, Thomas A Cebula28, James J Chen1, Jing Cheng29, Tzu-Ming Chu24, Eugene Chudin5, John Corson6, J Christopher Corton14, Lisa J Croner30, Christopher Davies4, Timothy S Davison18, Glenda Delenstarr6, Xutao Deng22, David Dorris12, Aron C Eklund17, Xiao-hui Fan1, Hong Fang27, Stephanie Fulmer-Smentek6, James C Fuscoe1, Kathryn Gallagher31, Weigong Ge1, Lei Guo1, Xu Guo4, Janet Hager32, Paul K Haje33, Jing Han20, Tao Han1, Heather C Harbottle34, Stephen C Harris1, Eli Hatchwell35, Craig A Hauser36, Susan Hester14, Huixiao Hong27, Patrick Hurban19, Scott A Jackson28, Hanlee Ji37, Charles R Knight38, Winston P Kuo39, J Eugene LeClerc28, Shawn Levy40, Quan-Zhen Li41, Chunmei Liu4, Ying Liu42, Michael J Lombardi17, Yunqing Ma10, Scott R Magnuson43, Botoul Maqsodi10, Tim McDaniel4, Nan Mei1, Ola Myklebost44, Baitang Ning1, Natalia Novoradovskaya13, Michael S Orr15, Terry W Osborn38, Adam Papallo17, Tucker A Patterson1, Roger G Perkins27, Elizabeth H Peters38, Ron Peterson45, Kenneth L Philips19, P Scott Pine15, Lajos Pusztai46, Feng Qian27, Hongzu Ren14, Mitch Rosen14, Barry A Rosenzweig15, Raymond R Samaha9, Mark Schena33, Gary P Schroth23, Svetlana Shchegrova6, Dave D Smith47, Frank Staedtler45, Zhenqiang Su1, Hongmei Sun27, Zoltan Szallasi48, Zivana Tezak21, Danielle Thierry-Mieg16, Karol L Thompson15, Irina Tikhonova32, Yaron Turpaz4, Beena Vallanat14, Christophe Van7,


Stephen J Walker49, Sue Jane Wang15, Yonghong Wang8, Russ Wolfinger24, Alex Wong6, Jie Wu27, Chunlin Xiao9, Qian Xie27, Jun Xu22, Wen Yang10, Liang Zhang29, Sheng Zhong50, Yaping Zong51, William Slikker, Jr1

Scientific management (National Center for Toxicological Research, US Food and Drug Administration): Leming Shi, Weida Tong, Yvonne P. Dragan, William Slikker, Jr.

Affiliations:

1National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas 72079, USA; 2Expression Analysis, Inc., 2605 Meridian Parkway, Durham, North Carolina 27713, USA; 3GE Healthcare, 7700 S. River Parkway, Suite 2603, Tempe, Arizona 85284, USA; 4Affymetrix, Inc., 3420 Central Expressway, Santa Clara, California 95051, USA; 5Illumina, Inc., 9885 Towne Centre Drive, San Diego, California 92121, USA; 6Agilent Technologies, Inc., 5301 Stevens Creek Blvd., Santa Clara, California 95051, USA; 7Eppendorf Array Technologies, rue du Seminaire 20a, 5000 Namur, Belgium; 8NCI Advanced Technology Center, 8717 Grovemont Circle, Bethesda, Maryland 20892, USA; 9Applied Biosystems, 850 Lincoln Centre Drive, Foster City, California 94404, USA; 10Panomics, Inc., 6519 Dumbarton Circle, Fremont, California 94555, USA; 11Medical University of Ohio, 3000 Arlington Avenue, Toledo, Ohio 43614, USA; 12Ambion, An Applied Biosystems Business, 2130 Woodward Street, Austin, Texas 78744, USA; 13Stratagene Corp., 11011 North Torrey Pines Road, La Jolla, California 92130, USA; 14Office of Research and Development, US Environmental Protection Agency, 109 TW Alexander Drive, Research Triangle Park, North Carolina 27711, USA; 15Center for Drug Evaluation and Research, US Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, Maryland 20993, USA; 16National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, Maryland 20894, USA; 17University of Massachusetts-Boston, 100 Morrissey Boulevard, Boston, Massachusetts 02125, USA; 18Asuragen, Inc., 2150 Woodward, Austin, Texas 78744, USA; 19Cogenics, A Division of Clinical Data, Inc., 100 Perimeter Park Drive, Suite C, Morrisville, North Carolina 27560, USA; 20Center for Biologics Evaluation and Research, US Food and Drug Administration, 29 Lincoln Drive, Bethesda, Maryland 20892, USA; 21Center for Devices and Radiological Health, US Food and Drug Administration, 2098 Gaither Road, Rockville, Maryland 20850, USA; 22UCLA David Geffen School of Medicine, Transcriptional Genomics Core, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Los Angeles, California 90048, USA; 23Solexa, Inc., 25861 Industrial Boulevard, Hayward, California 94545, USA; 24SAS Institute, Inc., 100 SAS Campus Drive, Cary, North Carolina 27513, USA; 25Vialogy Corp., 2400 Lincoln Avenue, Altadena, California 91001, USA; 26Operon Biotechnologies, 2211 Seminole Drive, Huntsville, Alabama 35805, USA; 27Z-Tech Corp., 3900 NCTR Road, Jefferson, Arkansas 72079, USA; 28Center for Food Safety and Applied Nutrition, US Food and Drug Administration, 8401 Muirkirk Road, Laurel, Maryland 20708, USA; 29CapitalBio Corp., 18 Life Science Parkway, Changping District, Beijing 102206, China; 30Biogen Idec, 5200 Research Place, San Diego, California 92122, USA; 31US Environmental Protection Agency, Office of the Science Advisor, 1200 Pennsylvania Avenue, NW, Washington, DC 20460, USA; 32Yale University, W.M. Keck Biotechnology Resource Laboratory, Microarray Resource, 300 George Street, New Haven, Connecticut 06511, USA; 33TeleChem ArrayIt, 524 E. Weddell Drive, Sunnyvale, California 94089, USA; 34Center for Veterinary Medicine, US Food and Drug Administration, 8401 Muirkirk Road, Laurel, Maryland 20708, USA; 35Cold Spring Harbor Laboratory, 500 Sunnyside Boulevard, Woodbury, New York 11797, USA; 36Burnham Institute, 10901 North Torrey Pines Road, La Jolla, California 92037, USA; 37Stanford University School of Medicine, 318 Campus Drive, Stanford, California 94305, USA; 38Gene Express, Inc., 975 Research Drive, Toledo, Ohio 43614, USA; 39Harvard School of Dental Medicine, Department of Developmental Biology, 188 Longwood Avenue, Boston, Massachusetts 02115, USA; 40Vanderbilt University, 465 21st Avenue South, Nashville, Tennessee 37232, USA; 41University of Texas Southwestern Medical Center, 6000 Harry Hines Boulevard/ND6.504, Dallas, Texas 75390, USA; 42University of Texas at Dallas, Department of Computer Science, MS EC31, Richardson, Texas 75083, USA; 43GenUs BioSystems, Inc., 1808 Janke Drive Unit M, Northbrook, Illinois 60062, USA; 44Norwegian Microarray Consortium, Rikshospitalet-Radiumhospitalet Health Centre, Montebello, N0310 Oslo, Norway; 45Novartis, 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA; 46MD Anderson Cancer Center, Breast Medical Oncology Department-Unit 1354, 1155 Pressler Street, Houston, Texas 77230, USA; 47Luminex Corp., 12212 Technology Boulevard, Austin, Texas 78727, USA; 48Harvard Medical School, Children's Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology (CHIP@HST), Boston, Massachusetts 02115, USA; 49Wake Forest University School of Medicine, Department of Physiology and Pharmacology, Medical Center Boulevard, Winston-Salem, North Carolina 27157, USA; 50University of Illinois at Urbana-Champaign, Department of Bioengineering, 1304 W. Springfield Avenue, Urbana, Illinois 61801, USA; 51Full Moon Biosystems, Inc., 754 N. Pastoria Avenue, Sunnyvale, California 94085, USA.


Rat toxicogenomic study reveals analytical consistency across microarray platforms

Lei Guo1, Edward K Lobenhofer2, Charles Wang3, Richard Shippy4, Stephen C Harris1, Lu Zhang5, Nan Mei1, Tao Chen1, Damir Herman6, Federico M Goodsaid7, Patrick Hurban2, Kenneth L Phillips2, Jun Xu3, Xutao Deng3, Yongming Andrew Sun8, Weida Tong1, Yvonne P Dragan1 & Leming Shi1

To validate and extend the findings of the MicroArray Quality Control (MAQC) project, a biologically relevant toxicogenomics data set was generated using 36 RNA samples from rats treated with three chemicals (aristolochic acid, riddelliine and comfrey) and each sample was hybridized to four microarray platforms. The MAQC project assessed concordance in intersite and cross-platform comparisons and the impact of gene selection methods on the reproducibility of profiling data in terms of differentially expressed genes using distinct reference RNA samples. The real-world toxicogenomic data set reported here showed high concordance in intersite and cross-platform comparisons. Further, gene lists generated by fold-change ranking were more reproducible than those obtained by t-test P value or Significance Analysis of Microarrays. Finally, gene lists generated by fold-change ranking with a nonstringent P-value cutoff showed increased consistency in Gene Ontology terms and pathways, and hence the biological impact of chemical exposure could be reliably deduced from all platforms analyzed.

To validate and extend the findings of the MAQC project1, described elsewhere in this issue, we generated a toxicogenomics data set using a rat chemical exposure study. One of the objectives of the MAQC project was to assess the reproducibility of gene expression profiling data across laboratories and platforms. Analysis of the MAQC data set shows the high reproducibility of microarray data under well-controlled conditions and further indicates that the criteria used to define differentially expressed genes can have a dramatic impact on the overlap of the resulting gene lists. In particular, lists of differentially expressed genes generated using fold change, rather than t-test P value, for gene selection have been previously proposed to be more reproducible1,2. The two RNA samples used in the MAQC project were reference samples with no explicit biological connection: the Stratagene Universal Human Reference RNA (comprised of RNA from ten different cell lines) and Ambion Human Brain Reference RNA1. The availability of these data provides an invaluable resource for benchmarking laboratory performance and for testing and validating new procedures, equipment and reagents, for example. Although data from these reference samples address technical performance and reproducibility of results from microarray technology, they cannot address whether microarray data from different laboratories or platforms would result in the same biological interpretation of real-world samples. We therefore sought to apply the findings of the MAQC study to a set of experimental toxicogenomic data to validate the approach.

Several recent publications have investigated the genotoxicity of three botanical carcinogens: aristolochic acid, riddelliine and comfrey3–6. In the present study, 36 RNAs were isolated from the kidney and/or liver of rats exposed to one of these compounds or from a control group. To corroborate the findings of the MAQC project and to determine whether the same biological interpretations would result from cross-platform comparisons, we hybridized these samples to four commercially available platforms (Affymetrix, Agilent, Applied Biosystems and GE Healthcare). To address intersite performance, we used the Affymetrix platform at two different test sites.

The results from this study are consistent with those of the MAQC project in that good concordance is found between data generated at different sites, as well as from different platforms. Furthermore, when fold-change ranking is used as the primary criterion for selecting differentially expressed genes, the overlap between gene lists from different laboratories using either the same or different platforms is high. In contrast, when a t-statistic (P-value) ranking is used as the primary criterion, the cross-site or cross-platform overlap is substantially lower1,2. The selection criteria for differential expression can thus affect both the apparent reproducibility of microarray data, as well as

Received 5 June; accepted 18 July; published online 8 September 2006; doi:10.1038/nbt1238

1National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas 72079, USA; 2Cogenics, A Division of Clinical Data, 100 Perimeter Park Drive, Suite C, Morrisville, North Carolina 27560, USA; 3UCLA David Geffen School of Medicine, Transcriptional Genomics Core, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Los Angeles, California 90048, USA; 4GE Healthcare, 7700 S. River Parkway, Suite #2603, Tempe, Arizona 85284, USA; 5Solexa, 25861 Industrial Boulevard, Hayward, California 94545, USA; 6National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, Maryland 20894, USA; 7Center for Drug Evaluation and Research, US Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, Maryland 20993, USA; 8Applied Biosystems, 850 Lincoln Centre Drive, Foster City, California 94404, USA. Correspondence should be addressed to L.G. ([email protected]) or L.S. ([email protected]).


the biological interpretation of the data. By using fold-change ranking plus a nonstringent P-value cutoff, the overlap of differentially expressed gene lists is increased, leading to improved agreement of the biological interpretation of the data in terms of enriched Gene Ontology (GO) nodes and pathways. Furthermore, data generated by this approach led to novel biological findings concerning chemical exposure. These findings are reproducible across laboratories and platforms when the preferred gene selection criteria are used. Together, these results further support the findings of the MAQC project, highlight the importance of appropriate data analysis procedures and demonstrate that microarray data generated from different platforms not only result in similar biological interpretation, but also reveal novel findings.

RESULTS

RNA was isolated from the target organs of rats exposed to aristolochic acid, riddelliine or comfrey, from studies that have been detailed previously3–6. In total there were six treatment/tissue groups: kidney from aristolochic acid–treated rats, kidney from vehicle control, liver from aristolochic acid–treated rats, liver from riddelliine-treated rats, liver from comfrey-treated rats and liver from vehicle control. Within each treatment/tissue group there were six biological replicates. Aliquots of these samples were prepared and distributed to each of the test sites for gene expression profiling using microarrays from four different platforms. Laboratory procedures were identical to those in the MAQC project1. Unless otherwise stated, the platform manufacturer's recommendations were used for data processing.

Hierarchical clustering analysis

To assess the overall reproducibility of microarray data from the four platforms, we performed hierarchical clustering analyses for each platform. Within each platform, samples were largely clustered first by tissue type and then by treatment (Fig. 1). Within each platform there are individual samples that did not cluster with the other members of their respective treatment/tissue group; however, the only sample that was consistently different across all platforms was sample no. 4 from the aristolochic acid–treated kidney

Figure 1 Hierarchical clustering of platform-specific microarray data separates samples by tissue and treatment. For each platform, the log2 intensity data from all 36 microarrays, after filtering for genes flagged as below the detection level, were hierarchically clustered using an average linkage algorithm and Euclidean distance as the distance metric. (a) Data from the Applied Biosystems platform (ABI). (b) Affymetrix site 1 (AFX). (c) Agilent (AG1). (d) GE Healthcare (GEH). The sample labels are colored based on treatment/tissue group. Black, control kidney; purple, aristolochic acid–treated kidney; blue, control liver; red, aristolochic acid–treated liver; orange, riddelliine-treated liver; green, comfrey-treated liver.
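The clustering recipe given in the Figure 1 legend (average linkage, Euclidean distance, on per-platform log2 intensity matrices) can be reproduced with standard tools. Below is a minimal sketch in Python using SciPy; the randomly generated matrix stands in for a platform's real detection-filtered data, and the array sizes are illustrative assumptions, not the actual probe counts:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Stand-in for one platform's log2 intensity matrix: 36 samples x N genes.
# (The real input would be the normalized, detection-filtered array data.)
rng = np.random.default_rng(0)
log2_intensity = rng.normal(loc=8.0, scale=2.0, size=(36, 500))

# Average-linkage hierarchical clustering with Euclidean distance,
# as described in the Figure 1 legend.
tree = linkage(log2_intensity, method="average", metric="euclidean")

# Cutting the tree into two clusters would correspond to the top-level
# (tissue) split of the dendrogram.
labels = fcluster(tree, t=2, criterion="maxclust")
```

The `linkage` output encodes the full dendrogram; plotting it with `scipy.cluster.hierarchy.dendrogram` would give a tree analogous to the panels of Figure 1.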

Table 1 Average Pearson correlation coefficients of log2-normalized intensity data for each treatment/tissue group

Test site                 No. of probe(set)s  Aristolochic acid kidney^a  Control kidney  Aristolochic acid liver  Comfrey liver  Control liver  Riddelliine liver
Applied Biosystems (ABI)  26,857              0.9586 (0.9623)             0.9742          0.9636                   0.9737         0.9634         0.9705
Affymetrix no. 1 (AFX)    31,099              0.9748 (0.9828)             0.9881          0.9871                   0.9861         0.9876         0.9867
Affymetrix no. 2 (AFX2)   31,099              0.9736 (0.9818)             0.9879          0.9860                   0.9827         0.9862         0.9836
Agilent (AG1)             41,071              0.9610 (0.9711)             0.9701          0.9642                   0.9659         0.9740         0.9675
GE Healthcare (GEH)       35,129              0.9697 (0.9739)             0.9761          0.9690                   0.9690         0.9687         0.9734

^a Numbers in parentheses represent data after excluding sample no. 4.


group. Similar results were obtained using principal components analysis (Supplementary Fig. 1 online). DNA adduct data indicate that this sample had 50% fewer DNA adducts compared to the other five animals in the same treatment group (Mei, N. et al., unpublished data), suggesting that the consistent failure of this sample to cluster with its treatment/tissue group may be biologically based. It was also determined that aristolochic acid–treated liver samples showed a relatively small difference in expression profiles when compared to their tissue-matched control group. This result was reproduced across all platforms and is consistent with previous observations that kidney, not liver, is the target organ of aristolochic acid–mediated carcinogenesis7.

The reproducibility of the microarray data was further explored by calculating the Pearson correlation coefficients of the log2 intensity data for all pair-wise sample comparisons within a treatment/tissue group for each platform. Table 1 shows the average correlation of biological replicates within each treatment/tissue group for each platform and further demonstrates the high degree of similarity of these data. Because of the presence of an animal that had a diminished treatment response, the aristolochic acid–treated kidney group had a significantly lower correlation, as expected, compared to other groups (e.g., P = 0.0024, two-sided, paired t-test compared to the control kidney group). Removal of sample no. 4 from the aristolochic acid–treated kidney group resulted in a less significant difference (P = 0.085, two-sided, paired t-test compared to the control kidney group). These data, coupled with the DNA adduct data, consistently indicate that sample no. 4 from the aristolochic acid–treated kidney group has a different response relative to the other group members. Therefore, for the assessment of cross-platform data consistency, the data from this sample have been excluded.
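The within-group reproducibility metric of Table 1 is the average of all pairwise Pearson correlations between replicate profiles. A hedged sketch of that computation (the helper name and the toy data are invented for illustration):

```python
import numpy as np
from itertools import combinations

def mean_pairwise_pearson(group):
    """Average Pearson r over all pairs of replicate profiles.

    group: array of shape (replicates, genes) of log2 intensities.
    """
    rs = [np.corrcoef(group[i], group[j])[0, 1]
          for i, j in combinations(range(len(group)), 2)]
    return float(np.mean(rs))

# Toy example: three replicate profiles sharing a common signal
# plus small independent measurement noise.
rng = np.random.default_rng(1)
base = rng.normal(8.0, 2.0, size=1000)
group = np.stack([base + rng.normal(0, 0.2, size=1000) for _ in range(3)])
r = mean_pairwise_pearson(group)
# Replicates this similar should correlate strongly (r close to 1),
# mirroring the >0.95 averages reported in Table 1.
```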

Overlap of differentially expressed gene lists across sites

One of the fundamental goals of a gene expression profiling experiment is to identify those genes that are differentially expressed within the system being studied. There are a large number of methods for selecting such genes, and ultimately, the genes that are identified have a fundamental impact on the biological interpretation of the data. Therefore, this toxicogenomics study was used to validate the findings in regard to gene selection methods by employing different selection criteria and determining the percentage of overlap between different laboratories or platforms1,2. The overlap across the two sites that generated data using the Affymetrix platform is high (85–90%) when the genes (from a few up to ~2,000) are selected by rank ordering the genes based on fold change (Supplementary Fig. 2 online). As more genes are considered differentially expressed (that is, moving to the right on the x-axis) the percentage of overlap begins to decline because of the inclusion of more genes demonstrating smaller fold changes, which are less likely to be reproducible across sites. There is a small decrease in the overlap when a P-value cutoff of 0.01 or 0.05 is applied to the fold change–based,

Figure 2 Hierarchical clustering of all individual sample data from all microarray platforms separated by tissue and treatment. Within each platform/site, a fold change was calculated and log2 transformed for all 5,112 common genes that did not have any missing values (n = 4,609) for each of the 24 treated individual samples compared to a tissue-matched control. These values were then hierarchically clustered using the Euclidean distance metric and average linkage. Each row represents the results from an individual treated animal assayed on a particular platform. Each row is labeled with a platform designation first, followed by the organ assayed for kidney samples, and then the treatment and unique animal identifier (1–6). ABI, Applied Biosystems platform; AFX, Affymetrix site 1; AFX2, Affymetrix site 2; AG1, Agilent; and GEH, GE Healthcare. K, kidney; AA, aristolochic acid; RDL, riddelliine; and CFY, comfrey. The yellow boxes highlight areas in which replicates of the same sample across multiple platforms and/or sites have clustered together.


gene-selection methods. This results from the P-value threshold altering the composition of the total list of genes such that each test site has a different list of genes to begin with in the gene selection process, thereby increasing the intersite inconsistency.

Supplementary Figure 2 also illustrates the overlap when genes are selected based on P-value rank ordering alone or with a fold-change criterion of 2.0 or 1.4. For P value–based gene-selection methods, the overlap gradually increases as the number of differentially expressed genes increases. An increase in the overlap is also observed when a fold-change cutoff of 1.4 or 2.0 is applied in conjunction with the P-value criterion. This is understandable since the larger fold changes are more easily reproduced than smaller ones.
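The overlap metric used throughout these comparisons can be sketched as follows: rank genes by the chosen score at each site, take the top n from each, and report the percentage of genes common to both lists. The function and data below are illustrative assumptions, not the study's actual pipeline:

```python
import numpy as np

def top_n_overlap(scores_a, scores_b, n, largest=True):
    """Percentage overlap of the top-n genes from two rankings.

    scores_a, scores_b: per-gene scores from two sites, e.g. |log2 fold
    change| (largest=True) or P values (largest=False selects smallest).
    """
    order_a, order_b = np.argsort(scores_a), np.argsort(scores_b)
    if largest:
        order_a, order_b = order_a[::-1], order_b[::-1]
    top_a, top_b = set(order_a[:n]), set(order_b[:n])
    return 100.0 * len(top_a & top_b) / n

# Toy example: two sites measuring the same underlying fold changes
# with small independent noise, as in an intersite comparison.
rng = np.random.default_rng(2)
true_lfc = rng.normal(0, 1, size=2000)
site1 = np.abs(true_lfc + rng.normal(0, 0.1, size=2000))
site2 = np.abs(true_lfc + rng.normal(0, 0.1, size=2000))
overlap = top_n_overlap(site1, site2, n=100)
# With noise small relative to the fold-change spread, the top-100
# lists agree for most genes, as in Supplementary Figure 2.
```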

The impact of different normalization methods on the overlap of gene lists was also assessed by comparing the overlap of gene lists derived from two normalization methods using the same gene selection method on the same sample pair comparison from data generated at the same test site (Supplementary Fig. 3 online). When P value is used as the criterion for gene selection, the overlap from different normalization methods is relatively low. However, when genes are ranked and selected based on fold change, with or without a P-value cutoff, the overlap between different normalization methods is very high (>90%). Furthermore, global scaling methods do not alter the rank order of genes based on fold change (hence the gene lists); therefore, the overlap between raw, mean- or median-scaled data is 100% when using the fold change for ranking and selecting genes. However, these scaling factors can affect the magnitude of the fold changes and the P values and thus will only affect the gene list when a P-value criterion is involved in gene selection. Our results are consistent with those reported elsewhere8.
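The rank-invariance point can be checked directly in the simplest case, a single global scale factor applied to all arrays: the factor cancels in the gene-wise ratio, so the fold-change ranking (and hence the gene list) is unchanged. The sketch below uses invented lognormal data and illustrates only this cancellation, not the study's actual per-array scaling procedures:

```python
import numpy as np

rng = np.random.default_rng(3)
treated = rng.lognormal(mean=8, sigma=1, size=(6, 300))  # 6 replicates x 300 genes
control = rng.lognormal(mean=8, sigma=1, size=(6, 300))

def fold_change_rank(t, c):
    """Rank genes by mean(treated)/mean(control), largest first."""
    fc = t.mean(axis=0) / c.mean(axis=0)
    return np.argsort(fc)[::-1]

base_rank = fold_change_rank(treated, control)
scaled_rank = fold_change_rank(2.7 * treated, 2.7 * control)

# The global factor 2.7 cancels in the ratio, so the ranked gene list
# is identical; a P-value ranking on the rescaled raw values need not be.
assert np.array_equal(base_rank, scaled_rank)
```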

In addition to the standard t-test, numerous different statistical tests have been used for the identification of differentially expressed genes9. One commonly used method is Significance Analysis of Microarrays (SAM)10. Supplementary Figure 4 online illustrates the intersite concordance results of differentially expressed genes selected based on fold-change ranking, SAM, t-test and random selection when the data from the comfrey-treated liver samples are compared to their corresponding controls. The site-site concordance based on SAM was clearly improved over that based on a simple t-test, but did not achieve the same level of concordance as that reached based on fold-change ranking. Similar results were obtained when other sample pairs or cross-platform data were analyzed in the same manner (data not shown). Cumulatively, these results illustrate that fold change–based

Figure 3 Intralaboratory overlap of differentially expressed gene lists generated using different selection criteria. For each platform, the liver control and comfrey treatment groups were equally and randomly divided into two experiments and the differentially expressed genes were identified independently from the two experiments using different gene selection criteria. Differentially expressed genes were selected from a subset of genes that are detectable by both experiments. The x-axis represents the number of genes selected as differentially expressed, and the y-axis represents the overlap (%) of two gene lists for a given number of differentially expressed genes. Each line on the graph represents the intralaboratory overlap of differentially expressed gene lists based on one of six different gene ranking/selection methods. Red, fold-change rank ordering only; orange, P-value rank ordering only; light green, fold-change rank ordering and P < 0.01; blue, fold-change rank ordering and P < 0.05; teal, P-value rank ordering and fold change >1.4; and purple, P-value rank ordering and fold change >2.0. (a) Applied Biosystems (ABI). (b) Affymetrix site 1 (AFX). (c) Affymetrix site 2 (AFX2). (d) Agilent (AG1). (e) GE Healthcare (GEH).


selection methods usually offer a higher level of consistency of lists of differentially expressed genes.

Overlap of differentially expressed gene lists across platforms

To assess the reproducibility of data across multiple microarray platforms, we identified the list of genes that was measured by all four of the microarray platforms using the March 2006 version of the RefSeq database and the methods described by the MAQC project1. This resulted in the identification of 5,112 common genes, which were used in all subsequent cross-platform comparisons. Consistent with results from intersite comparisons (Supplementary Fig. 2 online), the cross-platform data comparisons reveal the same trends. Specifically, the percentage of overlap for differentially expressed gene lists is highest when fold change–based gene selection methods are used (Supplementary Fig. 5a online). Not surprisingly, the cross-platform overlap is higher (~80%) in all instances when genes that are not reproducibly detected on the microarrays are omitted (e.g., those probes that are flagged as 'not present') (Supplementary Fig. 5b online). These results, combined with intersite results, further corroborate the findings of the MAQC project that fold change–based selection criteria for differentially expressed genes generate more reproducible results1,2. No measure of sensitivity or specificity of the approach was included in the analysis.

Within each platform/site, the fold change was calculated for all 5,112 common genes that did not have any missing values (n = 4,609) for each of the 24 treated individual samples compared to a tissue-matched control, and these values were then hierarchically clustered (Fig. 2). The resulting dendrogram illustrates that the samples are separated by tissue and then by treatment. Each of the four major branches of the dendrogram contains all of the biological replicate data for a given treatment/tissue group regardless of the site or platform that was used to generate the data. Within each of these branches, the platform, as opposed to the biological replicate, is the next major division. There are a few notable exceptions to this observation. When the same platform is performed at different test sites, the replicates of the same sample assayed at different sites cluster more closely together. In a few instances, the results from multiple different platforms for the same biological sample cluster together (e.g., aristolochic acid–treated liver sample no. 5). Because no gene selection criteria were used to generate this visualization, these results further indicate that interlaboratory and cross-platform data are highly reproducible.

Agreement of biological interpretation with GO and pathways

Typically, a microarray-based experiment is performed in a single laboratory using a single platform. Furthermore, it is relatively common to use three biological replicates in a toxicogenomic study when multiple groups of samples are involved. To explore whether or not a similar biological response was obtained when comparing results within a given laboratory, we generated data from six biological replicates. The control and treatment groups were then equally and randomly divided into two artificial experiments. Consistent with the interlaboratory and cross-platform results, the overlap of differentially expressed genes using different gene selection criteria from the intralaboratory results revealed the same trend, namely that fold change–based selection criteria generate more reproducible results (Fig. 3). For each of the ABI, AFX and AFX2 intralaboratory comparisons, the overlap of gene lists was almost identical with or without a P cutoff (<0.05) for up to ~1,000 genes selected as differentially expressed; for AG1 and GEH, the use of a P cutoff (<0.05) slightly increased the overlap of gene lists. However, the use of a more stringent P cutoff (<0.01) decreased the overlap of gene lists. These intralaboratory comparison results are consistent with those of interlaboratory comparisons (Supplementary Fig. 2 online). Therefore, a modest P cutoff (<0.05) appeared to be reasonable for data sets of this small sample size (3). Furthermore, the use of a fold-change threshold increased the overlap of gene lists derived from P-value ranking; a more stringent fold-change threshold leads to higher overlap of gene lists (Fig. 3 and Supplementary Fig. 2 online).

The differences in overlap of gene lists based on selection criteria were further investigated by assessing the impact on the associated GO terms. From each artificial experiment, the top 200 genes based on either a fold-change (with P < 0.05 cutoff) or P-value ranking were selected. The P value from the Fisher's exact test was calculated for each GO term associated with these genes. For each artificial experiment, the GO terms were then rank-ordered based on the P value. The overlap between the two artificial experiments was determined by dividing the number of GO terms commonly meeting a P-value ranking criterion in both of the artificial experiments by the total number of GO terms meeting the P-value criterion for either experiment. Figure 4 illustrates the percentage of overlapping GO terms plotted against a defined number of the highest ranking GO terms from both experiments. Clearly, the overlap of GO terms was much higher when genes are selected by fold change compared

Figure 4 Intralaboratory overlap of enriched GO terms. The control and treatment groups were equally and randomly divided into two experiments. From each experiment, the top 200 genes based on either a fold-change (blue line) or P-value (pink line) ranking were selected. The GO terms associated with these genes were then rank ordered, and the overlap between the two experiments was identified and graphed to compare the percentage of overlap (y-axis) against the total number of GO terms present in both experiments. The results depicted are derived from the comfrey-treated comparisons for each platform, but similar results were generated with the other treatment comparisons. (a) Applied Biosystems (ABI). (b) Affymetrix site 1 (AFX). (c) Affymetrix site 2 (AFX2). (d) Agilent (AG1). (e) GE Healthcare (GEH).


to those selected by P value. Similar results were obtained when the gene lists are mapped to KEGG pathways (Fig. 5) or other pathway databases (e.g., Ingenuity) (data not shown). These results clearly show that common biological responses are evident when genes are selected by criteria that lead to reproducible gene lists. Nonoverlapping lists of differentially expressed genes generally lead to inconsistent biological interpretation of microarray results in terms of GO terms and pathways.
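The per-term enrichment P values discussed here come from a 2x2 Fisher's exact test comparing a GO term's representation in a selected gene list against a background list. The GoMiner implementation itself is not reproduced; the sketch below shows the generic over-representation test with invented counts:

```python
from scipy.stats import fisher_exact

def go_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    k: selected genes annotated with the GO term
    n: size of the selected list (e.g., the top 200 genes)
    K: background genes annotated with the term
    N: background size (e.g., the 5,112 common genes)
    """
    table = [[k, n - k],
             [K - k, (N - n) - (K - k)]]
    _, p = fisher_exact(table, alternative="greater")
    return p

# Toy counts: 12 of 200 selected genes carry a term found on
# 60 of 5,112 background genes (expected by chance: ~2.3),
# so the term comes out strongly enriched.
p = go_enrichment_p(k=12, n=200, K=60, N=5112)
```

Ranking terms by this P value in each artificial experiment, and then intersecting the top-ranked terms, yields the overlap curves of Figures 4 and 5.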

Agreement of biological response

To further explore the agreement of biological response across the microarray platforms, we combined data from the cross-platform common gene list (5,112 genes) and from the six comfrey-treated liver samples and compared them to the data from the six control liver samples for each platform. A t-test was performed and genes with P < 0.05 were identified. This filtered gene set was then rank ordered by fold change, and for each platform the top 250 up- and downregulated genes were selected, generating a list of the top 500 differentially expressed genes for each of the five platform/site combinations (the overlap in genes between the gene lists for any two platforms is >70%).

A GO enrichment analysis was performed for each platform by comparing the content of the top 500 differentially expressed genes to the content of the 5,112 common gene list using a Fisher's exact test in GoMiner11,12, resulting in an enrichment P for each GO term. A comparison of P values across platforms identified 101 nodes that were significantly over- or underenriched (P < 0.05) in at least four of five platforms, with nearly 60% of these terms being significant in all five platforms. Inspection of these enriched categories confirmed that the different microarray platforms were reporting the same biological responses in these samples, and also provided novel insight into the effects of comfrey exposure.

Comfrey is a perennial plant that has been widely used for over 2,000 years as an herbal medicine for a wide variety of ailments. However, comfrey has been shown to be both genotoxic and hepatotoxic13. The exact molecular mechanism underlying these toxicities is not fully understood, but is known to be associated with the pyrrolizidine alkaloids present in comfrey, which can be metabolically activated and bind to DNA6,14. Considering that there are >350 different pyrrolizidine alkaloids found in over 6,000 different species15, it has been suggested that pyrrolizidine alkaloids are "probably the most common poisonous plant constituents that poison livestock, wildlife, and humans, worldwide14." Examination of the 101 significant GO terms revealed at least two that were noteworthy: copper ion homeostasis (GO:0006878) and vitamin A metabolism (GO:0006776). Dietary or medicinal exposure to several pyrrolizidine alkaloid–containing plants has been shown to result in decreased levels of vitamin A in the liver and increased liver levels of copper16–18, but there is no indication that these effects have been observed in response to comfrey exposure. These results suggest that comfrey influences copper and vitamin A levels similarly to other pyrrolizidine alkaloid–containing plants. Furthermore, these data are the first indication that changes in liver vitamin A and copper levels in response to pyrrolizidine alkaloid exposure are transcriptionally regulated. Interestingly, only four genes associated with copper ion homeostasis are present in the common gene list, and in all instances each platform identified two of these genes as significantly upregulated: amyloid beta (A4) precursor protein (APP) and prion protein (PRNP). Previously, both of these genes were shown to bind copper and were shown to be upregulated in response to chronic copper exposure19–21. Cumulatively, these findings indicate that comfrey, like several other pyrrolizidine alkaloid–containing plants, may affect liver levels of vitamin A and copper. Importantly, these data demonstrate that different microarray platforms can consistently report novel biological findings at the level of biological processes and of individual genes.

DISCUSSION

In this study, a data set was created that could validate and extend the findings of the MAQC project by focusing on a biologically relevant set of samples. Specifically, a large toxicogenomics data set was generated using 36 RNA samples from rats treated with three chemicals and four commercial microarray platforms to investigate the agreement of intersite and cross-platform gene lists.

When a few or up to 2,000 genes are selected as differentially expressed from different sites using the same microarray platform, the percentage of overlap is ~85% based on a fold-change criterion for gene ranking and selection (Supplementary Fig. 2 online). A lower percentage of overlap is observed using P as the criterion for gene ranking and selection, in particular when fewer genes are selected as differentially expressed. This same trend was also observed when gene selection methods were compared across platforms using the subset of 5,112 common genes (Supplementary Fig. 5 online). In addition,

Figure 5 Intralaboratory overlap of differentially enriched KEGG pathways. The control and treatment groups were equally and randomly divided into two experiments. From each experiment, the top 200 genes based on either a fold change (blue line) or P-value (pink line) ranking were selected. The KEGG pathways associated with these genes were then rank ordered and the overlap between the two experiments was identified and graphed to compare the percentage of overlap (y-axis) against the total number of KEGG pathways present in both experiments. The results depicted are derived from the comfrey-treated comparisons for each platform, but similar results were generated with the other treatment comparisons. (a) Applied Biosystems (ABI). (b) Affymetrix site 1 (AFX). (c) Affymetrix site 2 (AFX2). (d) Agilent (AG1). (e) GE Healthcare (GEH).

NATURE BIOTECHNOLOGY VOLUME 24 NUMBER 9 SEPTEMBER 2006 1167


the concordance offered by the widely used SAM approach did not achieve the same high level of concordance generated by fold-change ranking (Supplementary Fig. 4 online). These results are also consistent with those based on MAQC human samples and highlight the problems with commonly used gene selection methods that are based solely on t-test P values1,2. As expected, the degree of overlap of gene lists directly affects the ability to consistently identify the same biological response in regard to GO terms (Fig. 4) and KEGG pathways (Fig. 5). Therefore, to ensure reproducible biological interpretation of microarray results, it is important that the criteria for generating lists of differentially expressed genes are selected properly.
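The intersite comparison described above reduces to computing the percentage of overlapping genes between two ranked lists. A minimal sketch of that computation (the gene names, fold changes and list size here are hypothetical, and this is an illustration rather than the consortium's actual pipeline):

```python
def percent_overlap(ranked_a, ranked_b, n):
    """Percentage of the top-n genes shared by two ranked gene lists."""
    top_a, top_b = set(ranked_a[:n]), set(ranked_b[:n])
    return 100.0 * len(top_a & top_b) / n

def rank_by_fold_change(fold_changes):
    """Rank genes by absolute log2 fold change, largest first."""
    return sorted(fold_changes, key=lambda g: abs(fold_changes[g]), reverse=True)

# Hypothetical per-gene log2 fold changes reported by two test sites.
site1 = {"g1": 3.2, "g2": -2.8, "g3": 0.1, "g4": 1.9, "g5": -0.2}
site2 = {"g1": 3.0, "g2": -2.5, "g3": 1.8, "g4": 0.2, "g5": -0.1}

overlap = percent_overlap(rank_by_fold_change(site1),
                          rank_by_fold_change(site2), n=3)
```

With these toy values the two sites share two of their top three genes, an overlap of about 67%.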

The lack of overlap of lists of differentially expressed genes selected using a P-value criterion may be explained by the fact that fold change is calculated by comparing signal intensity for a given gene as directly measured using a microarray, whereas the P-value calculation incorporates the signal-to-noise ratio. Therefore, if the signal intensity for the gene is more reproducible across laboratories or platforms than the associated noise level, this would result in the finding that fold change–based gene-selection methods are more reproducible. However, the impact of the proposed analysis method on two other parameters, sensitivity and specificity, will also have to be assessed before any final conclusions can be drawn regarding the generalizability of this approach.
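The signal-versus-noise distinction can be made concrete with a toy per-gene computation. In this hedged sketch the log2 signals are invented, and ranking by |t| stands in for ranking by the two-sided t-test P value (equivalent when group sizes are fixed); a gene with a tiny fold change but even tinier noise can outrank a gene with a large fold change:

```python
import math
from statistics import mean, variance

def log2_fold_change(treated, control):
    """Difference of mean log2 signals (i.e., log2 of the expression ratio)."""
    return mean(treated) - mean(control)

def t_statistic(treated, control):
    """Equal-variance two-sample t statistic; larger |t| means smaller P."""
    n1, n2 = len(treated), len(control)
    pooled = ((n1 - 1) * variance(treated) +
              (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / math.sqrt(pooled * (1/n1 + 1/n2))

# Hypothetical log2 signals, three replicates per group.
# gene_a: large fold change, moderate noise; gene_b: tiny fold change, tiny noise.
gene_a = ([10.0, 10.4, 9.6], [7.9, 8.3, 7.8])
gene_b = ([8.050, 8.051, 8.049], [8.000, 8.001, 7.999])

fc_a, fc_b = log2_fold_change(*gene_a), log2_fold_change(*gene_b)
t_a, t_b = t_statistic(*gene_a), t_statistic(*gene_b)
```

Here gene_a ranks first by fold change while gene_b ranks first by |t| (hence by P value), which is exactly the divergence between the two selection criteria discussed above.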

Sample size is another important factor that impacts the concordance of lists of differentially expressed genes. It is interesting to compare the results of Figure 3 (AFX and AFX2) with those of Supplementary Figure 2b online, in which, for the same microarray platform, one can observe an overall increased level in the overlap of differentially expressed genes when six replicates from different laboratories are compared as opposed to three replicates from within the same laboratory. This increase is observed despite the potential for interlaboratory variation, which would affect the six-replicate comparisons but not comparisons of three replicates within a laboratory. This demonstrates the relationship between increases in statistical power and the resulting gain in reliable detection of differential expression that occurs with increased sample sizes. It is worth noting that differences between individual biological replicates also contribute to the relatively lower overlap observed in Figure 3.

To illustrate the importance of using gene selection criteria that maximize overlap of gene lists, we first filtered the data (comfrey compared to control) using a relatively nonstringent P cutoff (<0.05) and then rank ordered the remaining genes by fold change. By selecting the top 250 up- and downregulated genes from each platform and performing a GO enrichment analysis, not only was the cross-platform reproducibility of GO terms demonstrated, but a novel biological finding was also revealed on all platforms and at all sites. Specifically, comfrey, like several other pyrrolizidine alkaloid–containing plants, affects liver levels of vitamin A and copper; furthermore, these changes are, at least in part, transcriptionally regulated.
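The filter-then-rank procedure just described, a nonstringent P cutoff followed by fold-change ranking of the survivors, can be sketched as follows (gene names, fold changes and P values are hypothetical; `select_genes` is an illustrative helper, not code from the study):

```python
def select_genes(stats, p_cutoff=0.05, n_top=250):
    """Keep genes passing a nonstringent P cutoff, then rank the survivors
    by log2 fold change and return the top up- and downregulated genes.

    `stats` maps gene -> (log2_fold_change, p_value)."""
    passed = {g: fc for g, (fc, p) in stats.items() if p < p_cutoff}
    up = sorted((g for g in passed if passed[g] > 0),
                key=lambda g: passed[g], reverse=True)
    down = sorted((g for g in passed if passed[g] < 0),
                  key=lambda g: passed[g])
    return up[:n_top], down[:n_top]

# Hypothetical comfrey-vs.-control statistics for five genes.
stats = {
    "App":  ( 1.8, 0.003),
    "Prnp": ( 1.2, 0.010),
    "g3":   (-2.1, 0.020),
    "g4":   ( 0.9, 0.300),   # fails the P < 0.05 filter
    "g5":   (-0.4, 0.040),
}
up, down = select_genes(stats, n_top=2)
```

In this toy case the gene that fails the P filter is excluded regardless of its fold change, and the remaining genes are ordered purely by fold-change magnitude within each direction.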

Microarray technology has had a profound impact on biological research, partially owing to its ability to identify differentially expressed genes that may be used to develop potential biomarkers, elucidate molecular mechanisms and group similar samples based on gene signatures. Therefore, the reproducibility and reliability of the data from a study and the choice of methods that lead to the identification of concordant lists of differentially expressed genes are critical for biological interpretation. Concerns have been raised regarding the reliability of microarray results due to the apparent lack of overlap of the lists of differentially expressed genes22–28. The results from this study suggest that the disappointingly low concordance reported in some earlier publications can be attributed in large part to the practice of deriving differentially expressed gene lists based on the ranking of genes solely by a statistical significance measure. Furthermore, these results demonstrate that microarray data generated from different platforms can not only result in a similar biological interpretation, but also reveal novel findings.

METHODS

Microarray processing. Details of the in vivo portion of this study have been described previously3–6. Briefly, groups of six 6-week-old Big Blue rats were gavaged with riddelliine (1 mg/kg body weight) or aristolochic acid (10 mg/kg body weight) five times a week for 12 weeks, or Big Blue rats were fed a diet of 8% comfrey roots for 12 weeks. The animals were sacrificed after 12 weeks of treatment, and the tissues were isolated, frozen quickly in liquid nitrogen and stored at −80 °C. RNA was isolated from tissues of rats that had been exposed to aristolochic acid (liver and kidney), riddelliine (liver), comfrey (liver) or a control group (liver and kidney). There were six biological replicates for each treatment/tissue group, for a total of 36 samples. The samples were randomly labeled and each test site was provided an aliquot of each sample. To avoid potential confounding factors in experimental implementation, the identity of the RNA samples was kept unknown to the test sites before data were submitted to FDA/NCTR. The sample ID, RNA Integrity Number, OD ratio, microarray ID and data file names are provided in Supplementary Table 1 online.

Each of the RNA samples was labeled and hybridized to a microarray from one of four commercial platforms: Affymetrix (Rat Genome 230 2.0), Agilent (Whole Rat Genome Oligo Microarray, G4131A), Applied Biosystems (Rat Genome Survey Microarray) and GE Healthcare (Rat Whole Genome Bioarray, 300031). Except for Affymetrix, which was performed at two independent test sites, each platform was used at a single test site with 36 microarrays using biological replicate RNA samples. The labeling and hybridizations were performed according to the manufacturer's recommendations using methods detailed in the MAQC project1.

Data analysis. Unless otherwise stated, the manufacturer's recommended normalization methods were used: quantile normalization for Applied Biosystems, PLIER with an offset value of 16 for Affymetrix and median scaling for both Agilent and GE Healthcare1. To assess the impact of normalization methods on microarray results, we compared a limited number of commonly used normalization methods: raw, mean, median and quantile (Supplementary Fig. 3 online). The toxicogenomics data set generated in this study has also been used for the evaluation of microarray assay performance based on external RNA controls29.

Six different gene selection methods were used: (i) fold-change rank ordering only; (ii) fold-change rank ordering and P < 0.01; (iii) fold-change rank ordering and P < 0.05; (iv) t-test P value (assuming equal variance) rank ordering only; (v) P-value rank ordering and fold change >1.4; (vi) P-value rank ordering and fold change >2.0. The percentage of overlapping genes from these differentially expressed gene lists was then calculated in the same way as described elsewhere1.
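As an illustration of the second family of criteria, method (v), P-value rank ordering with a fold-change filter, might be sketched as below (gene names and values are hypothetical, and fold changes are taken as magnitudes so a single threshold covers both directions):

```python
def p_rank_with_fc_filter(stats, fc_min=1.4, n_top=200):
    """Gene selection method (v): keep genes whose fold-change magnitude
    exceeds fc_min, then rank by t-test P value, smallest first.

    `stats` maps gene -> (fold_change_magnitude, p_value)."""
    passed = {g: p for g, (fc, p) in stats.items() if fc > fc_min}
    return sorted(passed, key=passed.get)[:n_top]

# Hypothetical per-gene statistics.
stats = {"g1": (2.5, 0.04), "g2": (1.6, 0.001),
         "g3": (1.2, 0.0001), "g4": (3.0, 0.2)}
selected = p_rank_with_fc_filter(stats, n_top=3)
```

Note that g3, despite having the smallest P value, is excluded by the fold-change filter; the survivors are then ordered purely by P value.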

ArrayTrack30 was used for GO and KEGG pathway mapping, whereas GO enrichment analyses were performed using High-Throughput GoMiner11,12.

Cross-platform sequence mapping to RefSeq. Probe sequences from each microarray platform were mapped onto the NCBI-curated rat RefSeq database from March 2006. The same mapping criteria as reported for the main MAQC study were used1. The primary mapping criterion is a perfect match between a probe sequence and the target transcript sequence: a probe perfectly matches a transcript provided that a completely homologous sequence of length equal to the probe length is found anywhere on the transcript. The only exception to this rule is for the Affymetrix platform, in which a ProbeSet is considered a perfect match to a transcript as long as 80% of the probes within the ProbeSet (usually nine out of 11) perfectly match the same transcript. To simplify the cross-platform data analysis, a mapping table was generated with one probe per gene. Consistent with the MAQC main study1, if more than one probe from a platform perfectly matched the same gene, the probe closest to the 3′ UTR was considered, resulting in 5,204 common non-model RefSeq mRNAs (NMs) mapped across 5,112 common genes (Supplementary Table 2 online).
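The perfect-match criterion and the Affymetrix 80% ProbeSet exception can both be sketched as simple substring tests. The sequences below are made up and far shorter than real probes; this is a schematic of the rule, not the mapping software used in the study:

```python
def probe_matches(probe, transcript):
    """Primary criterion: the probe sequence occurs verbatim anywhere
    in the transcript sequence (a perfect match of full probe length)."""
    return probe in transcript

def probeset_matches(probes, transcript, frac=0.8):
    """Affymetrix exception: a ProbeSet matches when at least `frac` of its
    probes (e.g., 9 of 11) perfectly match the same transcript."""
    hits = sum(probe_matches(p, transcript) for p in probes)
    return hits >= frac * len(probes)

# Toy transcript and ProbeSet: 4 of the 5 probes occur in the transcript,
# so the 80% threshold is met.
transcript = "ACGTACGTTGCAACGTTAGCGGATCC"
probeset = ["ACGTACGT", "TGCAACGT", "GGATCC", "ACGTTAGC", "TTTTTT"]
matched = probeset_matches(probeset, transcript)
```

A real implementation would additionally handle strand orientation and, for the one-probe-per-gene table, pick the matching probe closest to the 3′ UTR; both are omitted here for brevity.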


Accession numbers. All data are available through GEO (series accession number GSE5350), ArrayExpress (accession number E-TABM-132), ArrayTrack (http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack/) and the MAQC web site (http://www.fda.gov/nctr/science/centers/toxicoinformatics/maqc/).

Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS
E.K.L., K.L.P. and P.H. acknowledge Agilent Technologies, Inc. and Affymetrix, Inc. for their material contributions to this work, thank John Pufky, Stephen Burgin and Jennifer Troehler for their outstanding technical assistance, and gratefully acknowledge the Advanced Technology Program of the National Institute of Standards and Technology, whose generous support provided partial funding of this research (70NANB2H3009). C.W. acknowledges Affymetrix, Inc. for material contributions to this work. R.S. acknowledges the technical support of Alan Brunner for generating GE Healthcare microarray data. L.G. and L.S. thank X. Megan Cao, Stacey Dial, Carrie Moland and Feng Qian for their superb technical assistance.

DISCLAIMER
This work includes contributions from, and was reviewed by, the FDA and the NIH. Certain commercial materials and equipment are identified in order to adequately specify experimental procedures. In no case does such identification imply recommendation or endorsement by the FDA or the NIH, nor does it imply that the items identified are necessarily the best available for the purpose.

COMPETING INTERESTS STATEMENT
The authors declare competing financial interests (see the Nature Biotechnology website for details).

Published online at http://www.nature.com/naturebiotechnology/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. MAQC Consortium. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol. 24, 1151–1161 (2006).

2. Shi, L. et al. Cross-platform comparability of microarray technology: intra-platform consistency and appropriate data analysis procedures are essential. BMC Bioinformatics 6 (Suppl 2), S12 (2005).

3. Chen, L., Mei, N., Yao, L. & Chen, T. Mutations induced by carcinogenic doses of aristolochic acid in kidney of Big Blue transgenic rats. Toxicol. Lett. 165, 250–256 (2006).

4. Mei, N., Chou, M.W., Fu, P.P., Heflich, R.H. & Chen, T. Differential mutagenicity of riddelliine in liver endothelial and parenchymal cells of transgenic Big Blue rats. Cancer Lett. 215, 151–158 (2004).

5. Mei, N., Heflich, R.H., Chou, M.W. & Chen, T. Mutations induced by the carcinogenic pyrrolizidine alkaloid riddelliine in the liver cII gene of transgenic Big Blue rats. Chem. Res. Toxicol. 17, 814–818 (2004).

6. Mei, N., Guo, L., Fu, P.P., Heflich, R.H. & Chen, T. Mutagenicity of comfrey (Symphytum officinale) in rat liver. Br. J. Cancer 92, 873–875 (2005).

7. Arlt, V.M., Stiborova, M. & Schmeiser, H.H. Aristolochic acid as a probable human cancer hazard in herbal remedies: a review. Mutagenesis 17, 265–277 (2002).

8. Patterson, T.A. et al. Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) project. Nat. Biotechnol. 24, 1140–1150 (2006).

9. Allison, D.B., Cui, X., Page, G.P. & Sabripour, M. Microarray data analysis: from disarray to consolidation and consensus. Nat. Rev. Genet. 7, 55–65 (2006).

10. Tusher, V.G., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. USA 98, 5116–5121 (2001).

11. Zeeberg, B.R. et al. GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol. 4, R28 (2003).

12. Zeeberg, B.R. et al. High-Throughput GoMiner, an 'industrial-strength' integrative gene ontology tool for interpretation of multiple-microarray experiments, with application to studies of Common Variable Immune Deficiency (CVID). BMC Bioinformatics 6, 168 (2005).

13. Stickel, F. & Seitz, H.K. The efficacy and safety of comfrey. Public Health Nutr. 3, 501–508 (2000).

14. Fu, P.P., Xia, Q., Lin, G. & Chou, M.W. Pyrrolizidine alkaloids–genotoxicity, metabolism enzymes, metabolic activation, and mechanisms. Drug Metab. Rev. 36, 1–55 (2004).

15. Betz, J.M., Eppley, R.M., Taylor, W.C. & Andrzejewski, D. Determination of pyrrolizidine alkaloids in commercial comfrey products (Symphytum sp.). J. Pharm. Sci. 83, 649–653 (1994).

16. Cheeke, P.R. Toxicity and metabolism of pyrrolizidine alkaloids. J. Anim. Sci. 66, 2343–2350 (1988).

17. Huan, J. et al. Dietary pyrrolizidine (Senecio) alkaloids and tissue distribution of copper and vitamin A in broiler chickens. Toxicol. Lett. 62, 139–153 (1992).

18. Moghaddam, M.F. & Cheeke, P.R. Effects of dietary pyrrolizidine (Senecio) alkaloids on vitamin A metabolism in rats. Toxicol. Lett. 45, 149–156 (1989).

19. Armendariz, A.D., Gonzalez, M., Loguinov, A.V. & Vulpe, C.D. Gene expression profiling in chronic copper overload reveals upregulation of Prnp and App. Physiol. Genomics 20, 45–54 (2004).

20. Hesse, L., Beher, D., Masters, C.L. & Multhaup, G. The beta A4 amyloid precursor protein binding to copper. FEBS Lett. 349, 109–116 (1994).

21. Varela-Nallar, L., Toledo, E.M., Chacon, M.A. & Inestrosa, N.C. The functional links between prion protein and copper. Biol. Res. 39, 39–44 (2006).

22. Tan, P.K. et al. Evaluation of gene expression measurements from commercial microarray platforms. Nucleic Acids Res. 31, 5676–5684 (2003).

23. Ramalho-Santos, M., Yoon, S., Matsuzaki, Y., Mulligan, R.C. & Melton, D.A. "Stemness": transcriptional profiling of embryonic and adult stem cells. Science 298, 597–600 (2002).

24. Ivanova, N.B. et al. A stem cell molecular signature. Science 298, 601–604 (2002).

25. Fortunel, N.O. et al. Comment on "'Stemness': transcriptional profiling of embryonic and adult stem cells" and "a stem cell molecular signature". Science 302, 393; author reply 393 (2003).

26. Marshall, E. Getting the noise out of gene arrays. Science 306, 630–631 (2004).

27. Miller, R.M. et al. Dysregulation of gene expression in the 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine-lesioned mouse substantia nigra. J. Neurosci. 24, 7445–7454 (2004).

28. Frantz, S. An array of problems. Nat. Rev. Drug Discov. 4, 362–363 (2005).

29. Tong, W. et al. Evaluation of external RNA controls for the assessment of microarray performance. Nat. Biotechnol. 24, 1132–1139 (2006).

30. Tong, W. et al. ArrayTrack–supporting toxicogenomic research at the US Food and Drug Administration National Center for Toxicological Research. Environ. Health Perspect. 111, 1819–1826 (2003).


Erratum: Alfimeprase to succeed Genentech's alteplase?
Brian Vastag
Nat. Biotechnol. 24, 875–876 (2006)

In the 6th paragraph, the statement “in 1996, Genentech launched Alteplase” is incorrect. Genentech received FDA approval of Alteplase for heart attack in 1987. In 1996, it received approval for Alteplase’s second indication, stroke.

Erratum: Diversifying chemical arrays
Laura DeFrancesco
Nat. Biotechnol. 24, 799 (2006)

In the print version of the article, the author of the featured article is incorrectly identified as Brandord et al. The author’s name is Bradner.

Corrigendum: All in the RNA family
Beverly L. Davidson
Nat. Biotechnol. 24, 951–952 (2006)

In the fifth paragraph, the abbreviation for prostate-specific membrane antigen (PSMA) was mistakenly written several times as PMSA. This error also appears in Figure 1.

Corrigendum: Engineering and characterization of a superfolder green fluorescent protein
Jean-Denis Pédelacq, Stéphanie Cabantous, Timothy Tran, Thomas C Terwilliger & Geoffrey S Waldo
Nat. Biotechnol. 24, 79–88 (2006)

In the legend for Figure 4b and in the last line of paragraph 6 in Methods, "number of moles" should be moles. Also in Methods, paragraph 3, "superfolder GFP (27.747 kDa/mole)..." should read "superfolder GFP (27,747 g/mole)" and "folding reporter GFP (27.742 kDa/mole)..." should read "...folding reporter GFP (27,742 g/mole)." The error has been corrected in the PDF version of the article.

Retraction: Identification of genes that function in the TNF-α-mediated apoptotic pathway using randomized hybrid ribozyme libraries
Hiroaki Kawasaki, Reiko Onuki, Eigo Suyama & Kazunari Taira
Nat. Biotechnol. 20, 376–380 (2002)

Although the gene discovery technology described in this paper has been demonstrated to have practical utility by several independent researchers, the first author of the paper failed to maintain a proper data notebook to support the results presented. As this constitutes nonadherence to the ethical standards in scientific research, and in accordance with the recommendations from the National Institute of Advanced Industrial Science & Technology (AIST), R. Onuki, E. Suyama and K. Taira respectfully retract this paper. H. Kawasaki declines to associate himself with this retraction and maintains that all the data contained in the paper are valid.



Five attributes of a successful manager in a research organization
Grace H W Wong

What does it take to make the transition from scientist to manager?

Little in the education, training or background of scientists prepares them for management. Good managers tend to be good with people; they look at the larger picture, are good at motivating their team, adapt to unanticipated business events, are comfortable working to budgets and are able to assess and respond to risk. In contrast, researchers at the bench spend most of their time focusing on narrow scientific questions, designing experiments and budgeting resources needed for those experiments. One would think that these two professions—business management and scientific research—were mutually exclusive, and in the vast majority of cases, one would be right. But a talented few have been successful in making the transition from the bench to the boardroom. I asked several of these individuals to identify the key attributes of their success and the factors that influenced their transition from academia into management (Box 1).

The industry research manager
It's difficult enough managing a team of people in any business. One must manage budgets, prioritize time, delegate tasks, motivate a team and provide clear leadership. But managers in a research-intensive organization, such as a biotech or pharmaceutical company, also have to contend with several additional challenges. First and foremost, a research manager's team (that is, scientists) comprises probably one of the least manageable groups of people on the planet. As Bob Ruffalo, president of R&D at Wyeth (Madison, NJ, USA), succinctly puts it: "Although I enjoy heading a group of scientists, they are, by their nature, very difficult to manage, and they are not always comfortable with change."

Second, the team's business goal—discovering (if not developing and marketing) drugs—is an endeavor with one of the highest rates of failure and attrition of any industry. This means that a manager is often faced with the decision to close or shelve projects—projects that their teams often are invested in intellectually—and the highs do not necessarily outweigh the disappointments.

Third, the drug sector is so incredibly diverse that expertise may not be transferable. For example, a manager at a small startup venture faces different challenges from one heading a large team at a multinational pharma company. Business and management skills acquired in a small-to-medium enterprise (SME) environment, where money and resources are at a premium, may be less relevant to teams in big pharma, and vice versa.

The jobs themselves are intensive and balance many different skill sets. Although most research managers remain married to the science—devoting as much time as possible to planning experiments, carrying out secondary analysis of data and reviewing key and pertinent literature—they spend an equal amount of their time performing managerial activities (personnel and site-wide meetings), prioritizing workloads for a particular day and attending various conferences. Many of the managers interviewed emphasized the importance of remaining close to their teams, and of visiting the laboratories under their supervision to demonstrate interest and support and to answer any questions.

Five key attributes
Given the demands and responsibilities of a manager in a drug research group—whatever its size or focus—it's no surprise that it takes a talented person to succeed. Exceptional individuals may each have a unique way of tackling their jobs, but on the basis of feedback from our respondents, most successful R&D managers share several common attributes.

Determination. In a sector where progress often appears to take the form of two steps forward and one step back, several executives regard staying power, quiet determination and persistence in the face of adversity as key characteristics. Lex Van der Ploeg, site head for Merck Research Laboratories in Boston, Massachusetts, believes that research managers need "motivation and drive" and a resolve not to "get discouraged by failures."

Wyeth's Ruffalo agrees, saying that you need to "learn how to cope with disappointment. The thing that frustrates me most is the enormous risk that we face in drug discovery and development. Most people do not understand the kinds of risk... the pharmaceutical industry is an extremely hard place to work."

Drive and diligence. Long hours and hard work are a given in biotech and pharma research groups. A typical day for Van der Ploeg starts between 4 and 5 a.m. "I get up early, walk the dog and then head to work. Because the days are generally packed with meetings and events, I make sure to reserve space and time for reflection. Evenings are spent with my family or with late-day work events. After 9 p.m., I do a bit more work. This daily cycle runs six days per week."

Grace H. W. Wong is chief scientific officer at ActoKine Therapeutics and president of Student Vision.
e-mail: [email protected]

Wyeth’s Bob Ruffalo: “Scientists are, by their nature, very difficult to manage.”

CAREERS AND RECRUITMENT


Robert Lewis, former senior vice president at Aventis and former chief scientific officer at Seattle-based Cell Therapeutics, also puts in long hours. At Aventis his daily routine included "at least two group scientific meetings on basic or early development projects (1–2 1/2 hours each), a group site management meeting (human resources, budget, site services, etc.) (1/2–1 1/2 hours), up to three one-on-one meetings with colleagues/scientists from the organization (1/2–1 hour each), time for reading enclosed materials on e-mail and answering/initiating correspondence (1 1/2–2 hours), with the remainder of the time spent reading scientific journal articles. This, without lunch, amounted to an 11- to 12-hour day."

William Shek, senior scientific director at Charles River Laboratories (Wilmington, MA, USA), spends the majority of his waking hours at his company. His working day largely involves resolving "technical and management problems." But once the laboratory has emptied in the late afternoon to early evening, he uses the time to "concentrate on writing and computer programming, which has become an avocation and occupation of mine because of the critical role of information management in the laboratory." He usually goes home "around midnight."

Because of these long and intense workdays, Lewis is keenly aware of the need for managers to develop time management skills.

William R. Shek, senior scientific director, research animal diagnostic services, Charles River Laboratories, Wilmington, MA, USA. "I decided to become a veterinarian when I was just 13. Consequently, I went to a high school with a special program in agriculture and spent my summers working on dairy farms in upstate New York. At that time, farm experience was a requirement for entry into vet school. I graduated from high school and went on to attend the College of Agriculture at Cornell University, where I majored in biology. After three years as an undergraduate, including a semester at Tel Aviv University, I was accepted to the Cornell New York State College of Veterinary Medicine, where I matriculated in the fall of 1974. During the summer of 1975, I started graduate research in microbiology at the veterinary school. I graduated from there in 1977, and went on to complete MSc and PhD degrees in 1979 and 1982, respectively. Although I had been offered a position as assistant professor at Cornell's State Veterinary Diagnostic Laboratory, I decided after 12 years it was time to move on. And so I accepted a job as director of microbiology and immunology at Charles River Laboratories, where I began work in the spring of 1982 and have been employed ever since."

Gary Peltz, head of genetics and genomics, Roche Palo Alto, Palo Alto, CA, USA. “I was an MD/PhD student at Stanford University who did a residency in internal medicine and a fellowship in rheumatology at the University of California, San Francisco. Although I had planned to go into academic medicine, I changed course when I looked at several academic positions in the early 1990s. The very low level of research funding was very discouraging and was coupled with very demanding clinical obligations placed on junior faculty. This made it very difficult to engage in the type of cutting-edge research that I wanted to pursue. Therefore, my first job was at Syntex Research, which subsequently became part of Roche.”

Scott Wadsworth, research fellow, medical devices group, Center for Biomaterials & Advanced Technologies, Johnson & Johnson, Somerville, NJ, USA. "I had an MSc in agricultural biochemistry/marine sciences and wanted to be a marine biologist. When I realized only a few jobs were available, I rethought my options and spent two years in a rheumatology laboratory at Children's Hospital of Philadelphia, which inspired me to obtain a PhD in immunology at the University of Pennsylvania. Between 1985 and 1989, I held postdoctoral/staff fellow positions at the National Institute of Allergy and Infectious Diseases, studying the role of integrins in T cell development and function. And from there I joined J&J as a senior scientist. From 1995, I was biology leader for J&J's p38 kinase inhibitor program, anti-inflammatory drug discovery, putting four compounds into preclinical development. Since 2002, I have worked on various drug-device combination products, resulting in four prototypes handed off to operating companies for preclinical development. Currently at the Center for Biomaterials & Advanced Technologies I am continuing work on discovery/development of novel drug/biologic-device combination products for indications in orthopedics, postsurgical adhesion, postoperative ileus and drug-eluting stents."

Martin Wasserman, former Pfizer, GlaxoSmithKline, Bristol-Myers Squibb, Roche, Aventis and AtheroGenics (Alpharetta, GA, USA) executive. “I began my career with an undergraduate degree in pharmacy and spent five years as a registered pharmacist in a drugstore. I decided to matriculate to The University of Texas Medical Branch in Galveston to pursue a PhD degree in pharmacology and toxicology, which I received in 1972. I was immediately recruited by The Upjohn Company in Kalamazoo, Michigan (now Pfizer), where I spent over nine years as a bench researcher in the hypersensitivity diseases research department. I was then recruited by SmithKline & French (now GlaxoSmithKline) to head the pharmacology department. When SK&F merged with Beecham in the late 1980s, my position was eliminated and I sought a position with Bristol-Myers Squibb as their first director of human pharmacology (a newly created position in clinical research) where my group performed creative phase 1 studies. After spending over three years at BMS, I was sought and hired by Hoffmann-La Roche as director of bronchopulmonary pharmacology in research, from which four years later I was recruited by Marion Merrell-Dow to become the group director of three departments (immunology, metabolic diseases and respiratory research). Soon after, MMD became Hoechst Marion Roussel and later, Aventis, and then Sanofi-Aventis, where my title was vice president and senior distinguished scientist in the respiratory and rheumatoid arthritis disease group and the acting head of oncology. After seven years, an interesting opportunity arose at a small startup biotech company, AtheroGenics. I became senior vice president of discovery research and chief scientific officer. After four and a half years, I chose to officially retire after 35 years in the drug industry and relocate to be closer to my children in California. 
Now settled there, I am commencing a campaign to explore opportunities to consult with the industry, academia or institutions.”

Box 1 In their own words

Long days are the norm, according to Aventis’ Robert Lewis.

CAREERS AND RECRUITMENT © 2006 Nature Publishing Group http://www.nature.com/naturebiotechnology


NATURE BIOTECHNOLOGY VOLUME 24 NUMBER 9 SEPTEMBER 2006 1173

“A manager should ask him/herself how able he/she is to thoughtfully delegate and monitor tasks without micromanaging, on one hand, or being blindly dependent upon others, on the other hand,” he says. “It is important for a scientific manager to be able to modulate the pace of his/her day and not to be constantly overworked. There is no reward for burnout!”

Passion. Given the workload and day-to-day frustrations associated with working in the drug industry, a common motivation among the interviewees was altruism: to help reduce human suffering through the discovery of new medicines. Franz Hefti, formerly at Genentech and Merck, and now executive vice president at Rinat Neuroscience (S. San Francisco, CA, USA), says: “It has always been my dream and goal to bring better medication to people who suffer from diseases of the nervous system. The ability to help [do] this is [my] dominant motivational force.” Van der Ploeg is also upbeat about the drug discovery endeavor: “I am in this business because of great science, stimulating and excellent colleagues, and motivated teams.” At Roche Palo Alto in California, head of genetics and genomics Gary Peltz emphasizes the research challenge: “I enjoy solving scientific problems that can impact human health. I am particularly fortunate to work with a motivated and talented multidisciplinary team of scientists (in genetics, statistics, computation, genomics and biology) that can undertake high risk/high reward projects.”

Several other managers also emphasize the rewards of participating in the research endeavor. “I enjoy the collegial nature of scientific/biomedical pursuit, the people-to-people interactions and achieving global recognition for my work,” says Martin Wasserman, a former manager at five pharma companies who recently retired from his post as chief scientific officer at AtheroGenics (Alpharetta, GA, USA). Elsewhere, Aventis’ Lewis praises “the people, individually, the science (as a continuous learning experience) and the opportunity to drive many new and potentially productive ideas into actual experiments that challenge hypotheses.” Reinhard Ebner, principal scientist at Avalon Pharmaceuticals (Germantown, MD, USA), feels that a manager’s capacity to play an instrumental, or even leading, role in a team that finds the answer to a complex, long sought-after problem can be an “indescribably rewarding experience, second only to the involvement in an initiative that succeeds in making a concrete contribution to the development of a solution for a previously unmet medical need.”

And it’s not only the altruistic side of the drug discovery business that galvanizes people. For Scott Wadsworth, research fellow at Johnson & Johnson (New Brunswick, NJ, USA), it’s “the independence and entrepreneurial spirit that exists, despite being part of a huge corporation. I like the opportunity to have an impact in a large corporation.”

Broad experience. Given the diverse responsibilities and skills required in a research manager position, it helps to be well read and to develop as broad a scientific and business knowledge as possible. Wadsworth says it is important to “diversify your experience,” adding, “Make sure you have demonstrated significant, quantifiable, reproducible successes in your early career. Network as much as possible within your company and outside. Gain as much exposure outside your company as possible, via speaking engagements, chairing meetings, publishing, etc. Gain as much

Box 2 Starting out

The research managers interviewed for this article had several pieces of advice for those thinking of moving from the bench into research management at a company. Wyeth’s Bob Ruffalo exhorts fledgling managers to “work very hard, publish extensively and remember that discovering and developing new drugs is one of the most noble professions, which patients depend upon us to do.”

But what practical steps can you take to increase your chances of making the transition? Roche’s Gary Peltz says that when he visits universities and meets with graduate students and postdocs, he is invariably surprised to find one universally asked question: “What was it like in industry?” “They were more concerned with my answer to that question than discussing their science,” he says. “It was clear that virtually all academic programs offer very little career counseling or direction for trainees, which is a major deficiency.” Rinat’s Franz Hefti agrees: “It’s important to understand the differences between academic and industrial research. The goal of academic research is to understand nature; the goal of biopharmaceutical research is to find effective treatment for human diseases. Academic research favors an individualistic approach that emphasizes the contribution of an individual; industrial research favors teamwork and emphasizes the common goal.”

Peltz’s pragmatic suggestions for students: “First, inquire about and explore a number of options before choosing a career path. Second, realize that there is a wide range of options within industry. Just as the experience at Stanford is very different from that at a local community college, the cultures and experiences in small startup companies differ from those in large pharma companies. Lastly, I strongly suggest that students read Tom Friedman’s book, The World Is Flat. Things are changing within the pharma industry; the pace of change is going to accelerate, and you’d better be prepared for it.”

Martin Wasserman advises those interested in research management careers to “consider an undergraduate degree in pharmacy, which permits exposure to most of the biomedical disciplines, unlike a pre-med degree,” adding, “Where possible, take courses in biotechnology.” Charles River’s William Shek also notes, “Some of my colleagues have gotten MBA degrees and gone on to senior management.”

Wasserman also stresses the importance of attending job fairs and local and national meetings for exposure and appointments. “Try to network with recruiting firms,” he says, “and consider investing in society memberships; invest in the FASEB [the Federation of American Societies for Experimental Biology] directory of members.” Finally, he advises, “Learn who the executives are and try to set up appointments with them.”

Rinat’s Franz Hefti: Helping people is the dominant motivational force.

Martin Wasserman advises scientists interested in management to network.

Reinhard Ebner of Avalon says the smaller the company, the more demanding and wide-ranging the management problems.




management experience as possible, by leading project teams, mentoring postdocs, hosting interns, etc. Do all that and management positions will come naturally.”

Charles River’s Shek also emphasizes the importance of broad horizons. “I have found it to be particularly important to acquire knowledge and skills beyond my field of scientific specialization in the areas of quality control, project management and bioinformatics,” he says. He adds, “I have had the resources to do many interesting things, to expand my knowledge and skills and to collaborate with highly intelligent and talented colleagues at and/or outside [the company]. As a member of the research models and services division of Charles River, I have participated in a wide array of projects involving diverse disciplines including genetics, diagnostics, engineering, bioinformatics and so forth.”

Flexibility, inspiration and leadership. There is no doubt that the pharma industry is currently undergoing a difficult period in terms of sustaining growth, meeting investor expectations and managing public perception. Working to improve the poor productivity and high attrition of drug pipelines is a key goal for many research managers. Roche’s Peltz puts the problem like this: “My major challenge is maintaining momentum and progress within a constantly changing environment that has an increasingly near-term outlook. This makes it more difficult to maintain cohesion among the large number of individuals performing

the work, and with the stakeholders concerned about the outcome. Discovery science is a lot like cooking; if you open the oven door too often, the cake will not rise.”

There is a burgeoning demand for experienced research managers in biotech companies; many of these are being recruited from pharma. But as Avalon’s Ebner points out, having worked consecutively for an established large company, a growing medium-sized one and an entirely new biotech startup, the decision paths in small and large enterprises are very different. “The younger and more unfinished an institution, the more demanding, wide-ranging and intensive the management problems. This is most acute in the startup setting, which is a bit like starting a family restaurant, where everyone has to help out on every front.”

The increasingly cross-disciplinary nature of research and the need to collaborate intramurally and extramurally also create management headaches. “Some of the bigger and most difficult, yet most important, questions can only be answered by the coordinated studies of many investigators, often from different institutions and countries. This has been true for many fields of discovery for a while, but is now increasingly apparent in the biological sciences. Making the best use of combined efforts almost always requires a great deal of organizational, communicative, planning and even diplomatic skills,” says Ebner.

A research manager’s job also offers the opportunity to mentor and reward excellence and achievement within a team. Lewis says that in senior management positions he has particularly enjoyed the chance to do “things that seriously affect the quality-of-life for employees in a positive way; this means that the ‘power’ of a senior job is most useful when it is used to (appropriately) enrich the lives of the most junior colleagues.”

Conclusions

There are several keys to success in making the transition from the bench to the boardroom (Box 2). Research managers need determination, diligence and passion to do work that matters and that makes a difference. They must also possess the experience to lead and the ability to inspire their team. The most effective managers are patient, have a sense of humor, respect their colleagues and are willing to subordinate their ego for the benefit of the organization.

If you love being a scientist but crave the financial and professional benefits of management, heading a research group as a department or division leader at a company offers several opportunities. In this role, you will have greater supervisory and budget management responsibilities, and the compensation that comes with them. A person who is a great bench scientist will never be happy being a mediocre manager, but a great scientist who has the ability and desire to move into management has a whole new set of opportunities to achieve important and satisfying results.




Genomic Vision (Paris) has appointed founder Aaron Bensimon as president and CEO. He will be assisted by Daniel Nerson, who has been named chief operating officer. Dr. Bensimon has been head of the genome stability unit at the Institut Pasteur since 1994, where he developed molecular combing technology and its use in the precise study of genomes. The technology has resulted in 13 patents granted to the Institut Pasteur, for which Genomic Vision has an exclusive license.

Novavax (Malvern, PA, USA) has announced the appointment of Jeffrey Church as vice president, chief financial officer and treasurer. He joins the company from GenVec, where he served as CFO, treasurer and corporate secretary since 1988.

Bernhard R. M. Ehmer has been appointed to the supervisory board of Hybrigenics (Paris) as a non-executive independent director. Dr. Ehmer is currently CEO of BioPheresis Technologies, and previously served at Merck KGaA in several capacities, most recently as vice president for corporate strategic planning and alliance management.

Sylvie Gregoire has been appointed executive chair of the board of directors at IDM Pharma (Irvine, CA, USA). She has been a board member since August 2005. Dr. Gregoire previously served

as president and CEO of GlycoFi, and currently serves on the boards of Cubist Pharmaceuticals and Caprion Pharmaceuticals.

Algeta (Oslo, Norway) has appointed Johan Harmenberg as chief medical officer and Michael Dornish as chief scientific officer. Before joining Algeta, Dr. Harmenberg spent nine years at Medivir as vice president of development. Dr. Dornish has nearly 25 years’ research experience in the life sciences industry, most recently as vice president, R&D at FMC Biopolymer. Dr. Dornish replaces Algeta co-founder Roy Larsen, who has decided to pursue other interests and opportunities but will continue as a consultant to the company.

Peter Hnik has been named chief medical officer at iCo Therapeutics (Vancouver, BC, Canada). Dr. Hnik most recently served as associate director of clinical research with QLT, playing a critical role in designing and directing Visudyne clinical trials in AMD and diabetic retinopathy.

Celera Genomics Group (Rockville, MD, USA) has named Joel Jung vice president of finance. Mr. Jung has held several executive and senior positions with Chiron, including most recently vice president and treasurer.

Rosemary Mazanet has joined the board of directors of Cellumen (Pittsburgh, PA, USA).

Dr. Mazanet is presently CEO of Breakthrough Therapeutics and acting CEO of Access Pharmaceuticals. Previously, she has served as the CSO and general partner of Oracle Partners, and before that was director of clinical research at Amgen.

James A. Ratigan has joined Nitric BioTherapeutics (Philadelphia, PA, USA), formerly known as Theranox, as CFO. He previously served as executive vice president and CFO of Orapharma, where he raised private capital for the startup, directed its IPO in 2000 and helped orchestrate its sale to Johnson & Johnson.

ArQule (Woburn, MA, USA) has named Nigel J. Rulewski as chief medical officer. Dr. Rulewski brings to ArQule more than two decades of experience in R&D, regulatory affairs and commercialization, having previously served as senior vice president of BioAccelerate and vice president, medical affairs and chief medical officer at Astra USA.

ADVENTRX Pharmaceuticals (San Diego, CA, USA) has announced that Joachim P. H. Schupp has been appointed to the newly created position of vice president of medical affairs. Dr. Schupp served most recently as vice president of clinical business solutions and clinical data services at ProSanos.

Steve Toon has joined the board of Simcyp (Sheffield, UK), which offers in silico simulation and prediction of pharmacokinetics and drug-drug interactions in virtual patient populations. Dr. Toon has over 20 years’ experience in the pharmaceutical industry, previously serving as CEO of Medeval.

AAIPharma (Wilmington, NC, USA) has named Martin Tyson to the position of senior vice president, information systems and technology. Mr. Tyson was most recently senior vice president and chief information officer for Quintiles Transnational. The company also announced the appointment of Ninad Deshpanday to the newly created position of vice president of pharmaceutical business development. Previously, Dr. Deshpanday was vice president of drug product development for Synta Pharmaceuticals.

Archemix (Cambridge, MA, USA) has announced the appointment of Robert Schaub as vice president of preclinical discovery. Dr. Schaub comes to Archemix after 16 years with Genetics Institute and Wyeth Pharmaceuticals, most recently as the assistant vice president for cardiovascular and metabolic diseases. “Archemix’s aptamer technology has the potential to bring forth an entirely new class of therapeutics for the treatment of acute and

chronic diseases,” says Dr. Schaub. “I look forward to leveraging my experience with both biotherapeutics and small-molecule drug candidates to this new class of therapeutics.”

In addition, Archemix has elevated Page Bouchard to lead its research and preclinical development group. Dr. Bouchard joined Archemix in November 2004 as senior vice president of preclinical drug discovery and development. “Page possesses the unique combination of scientific experience and leadership skills necessary to guide our rapidly expanding pipeline of therapeutic aptamers through research and preclinical development,” says Archemix president and CEO Errol De Souza. “We are privileged to have a leader of his caliber directing our R&D efforts.”

PEOPLE