

Molecular Medicine. DOI: http://dx.doi.org/10.1016/B978-0-12-381451-7.00004-9. © 2012 Elsevier Inc. All rights reserved.

CHAPTER 4

Omics

OUTLINE

Introduction
DNA Sequencing
    Technology
    Bioinformatics Support
    Research Applications
    Clinical Applications
DNA Microarrays
    Technology
    Gene Expression
    SNP Microarray
    Array-Based Comparative Genomic Hybridization (aCGH)
    Bioinformatics
    Research Applications
    Clinical Applications
Other Omics
    Proteomics
    Metabolomics
    Phenomics
    Metagenomics
Systems Biology
    Clinical Applications
Overview
References

INTRODUCTION

The Human Genome Project has generated more questions than answers, but there is little doubt that it has led to many new technological developments. The ability to study all or most genes, mRNA transcripts, proteins and a range of cellular products was introduced as the emergence of omics in Chapter 1. This chapter will expand on omics, and the way it is driving research discoveries and clinical care through molecular medicine.

DNA SEQUENCING

Technology

Arguably one of the most significant recent technological developments has been DNA



sequencing, which is now faster, cheaper and easier to do [1,2]. A chronology of the development of DNA sequencing is given in Table 4.1. It shows a slow start before 1977, followed by a period of reliance on Sanger sequencing, which achieved its full potential once it became fully automated. A new round of innovations followed the completion of the Human Genome Project in 2000, leading to the development of next generation (NG) DNA sequencing.

Although Sanger enzyme-based DNA sequencing was initially less popular than the Maxam and Gilbert chemical method, it soon became the preferred technique because it needed fewer toxic materials and, once the DNA cloning step was no longer required, it became faster and easier to use. Improvements over the original chain termination method and the availability of capillary electrophoresis (Box 4.1) ensured that it is now a routine part of clinical diagnostic work. However, it remains expensive, and considerable work is needed to annotate the DNA variants found (Chapter 3).

The next major development was an increase in throughput, made possible by dramatically increasing the number of sequences generated. Initially called massively parallel sequencing, this is now known as NG DNA sequencing (Table 4.2). The megabases (Mb) of DNA sequence that were generated by the

TABLE 4.1 Landmarks in the development of DNA sequencing.

Date Event

1953 Structure of DNA shown to be a double-stranded helix.

1972 Recombinant DNA technologies allow DNA to be cloned.

1977 Sequencing methods developed by A Maxam and W Gilbert using chemical degradation and F Sanger using enzymatic synthesis. Sanger and Gilbert awarded Nobel Prize for this achievement.

Late 1980s First semi-automated sequencing platforms developed commercially. Sequencing lengths generated measured in kilobases (Kb).

1995 First complete bacterial sequence described for H. influenzae by J Venter and colleagues. Now sequencing options expand from Kb to Mb (megabase).

2000 First sequence of human haploid genome announced. It took until 2003 for an annotated, more accurate version to be published. Cost around $3 billion.

2004 The US National Human Genome Research Institute funds work to reduce the cost of a whole genome sequence to $1 000 in 10 years.

2004–2005 The move from megabase (Mb) to gigabase (Gb) of sequencing length comes with the realization that Sanger sequencing has reached its limitation and new approaches based on massively parallel sequencing start to emerge.

2007 Complete human diploid genome sequences publicly announced for J Watson and J Venter with the former costing about $2 million.

2009 First human genome sequence using single molecule sequencing technique.

2010 Single molecule sequencing (third generation) has the potential to increase sequence data generated from Tb (terabyte) to Pb (petabyte). Advantages include: faster, occurs in real time, longer read lengths and easier detection of heterozygous changes. Claims are made that a whole genome sequence will cost $100 and take 15 minutes to do within 5 years.

2010 First publication of a human metagenome – an additional layer of complexity for bioinformatics.


Sanger method expanded up to the gigabase (Gb) and terabase (Tb) range. In NG DNA sequencing, individual sequencing fragments are very small (around 100 bp), so it became necessary to have multiple coverage, with a de novo whole genome sequence typically requiring ×30 coverage to ensure that most (but not necessarily all) regions of DNA were adequately represented. For re-sequencing or targeted sequencing applications, a less dense coverage was acceptable. Another popular strategy was exome sequencing, which was considerably easier and faster to do as only known exons (including their exon-intron boundaries) were included in the cloning/amplification/sequencing steps (Figure 4.1).
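The depth-of-coverage requirement described above follows from simple arithmetic: average depth equals total bases sequenced divided by genome size (the Lander–Waterman c = NL/G). A minimal sketch, with read counts and genome size chosen for illustration rather than taken from the text:

```python
import math

def expected_coverage(num_reads: int, read_length_bp: int, genome_size_bp: float) -> float:
    """Average depth of coverage: c = N * L / G (Lander-Waterman)."""
    return num_reads * read_length_bp / genome_size_bp

def reads_needed(target_coverage: float, read_length_bp: int, genome_size_bp: float) -> int:
    """Number of reads required to reach a target average depth."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)

# Illustrative figures: 100 bp NG reads against a ~3.2 Gb human genome.
HUMAN_GENOME_BP = 3.2e9
print(expected_coverage(960_000_000, 100, HUMAN_GENOME_BP))  # 30.0
print(reads_needed(30, 100, HUMAN_GENOME_BP))                # 960000000
```

Note that average depth does not guarantee uniform coverage, which is why some regions are still missed even at high depth.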

The four components to NG DNA sequencing are shown in Figure 4.2. NG DNA sequencing utilizes some conventional methodologies, such as sequencing by synthesis (DNA polymerase) or ligation (DNA ligase), and these remain expensive. A significant time and cost limitation is the DNA or RNA preparation steps that require cloning or PCR. However, more novel approaches are being developed, including sequencing in real time. One that is particularly interesting is single molecule sequencing, because it allows the initial cloning or amplification step in NG DNA sequencing to be bypassed, thereby saving time and money. Presently, single molecule (also called third generation) DNA sequencing is in the roll-out phase and there are only a limited number of research publications available that describe its utility and applications (Table 4.2).

Bioinformatics Support

Compared to Sanger sequencing, the data output from NG DNA sequencing is a significant bioinformatics challenge, with support required in: (1) Production bioinformatics to process raw sequence data generated, including quality assurance steps to remove suboptimal

BOX 4.1

DEVELOPMENTS IN DNA ELECTROPHORESIS

After PCR the next step is usually electrophoresis to assess PCR products or separate them into fragments. DNA is negatively charged and so migrates towards the positive electrode (the anode). Sizing of separated DNA is undertaken by comparing against standard markers. Mobility shifts can then be identified and measured. The separation of DNA is possible with slab gels made from agarose or polyacrylamide (for smaller fragments). Slab gels have inherent problems when precise fragment sizing is needed for clinical diagnosis or forensic cases:

1. Variable texture that can influence electrical conductance, leading to inconsistency in fragment size calling, and
2. Automation is difficult.

For clinical and forensic DNA testing, slab gels have been replaced by capillary gels. These are commercially produced and involve a very fine capillary packed with gel. Capillary gel electrophoresis has revolutionized DNA electrophoresis because it is fast, reproducible, automated, and quality assurance measures can be implemented. Sizing is undertaken by computer software, which removes another source of human error.
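Fragment sizing against standard markers, as described in Box 4.1, amounts to interpolating on a standard curve: migration varies roughly linearly with the logarithm of fragment size. A toy sketch of the software step (the ladder values below are invented for illustration):

```python
import math

# Hypothetical size standard: (migration units, fragment size in bp).
# Larger fragments migrate less far, roughly linearly in log10(bp).
LADDER = [(10.0, 1000), (20.0, 500), (30.0, 250), (40.0, 125)]

def estimate_size(migration: float) -> float:
    """Interpolate log10(size) linearly between flanking ladder points."""
    pts = sorted(LADDER)  # ascending migration distance
    for (m1, s1), (m2, s2) in zip(pts, pts[1:]):
        if m1 <= migration <= m2:
            frac = (migration - m1) / (m2 - m1)
            log_size = math.log10(s1) + frac * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("migration outside the range of the size standard")

print(estimate_size(15.0))  # ~707 bp, geometric mean of the 1000/500 bp rungs
```

Real capillary instruments fit more sophisticated curves (e.g. local Southern), but the principle of calling sizes off co-run standards is the same.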


TABLE 4.2 Comparisons between DNA sequencing methods [1,2].

First generation (Sanger) DNA sequencing: In 1977 chemicals and radioactivity were used for sequencing but these were soon replaced by enzymatic methods. Until PCR became available DNA needed to be cloned to generate multiple copies of single fragments. Sequencing reagents were then incorporated into the PCR itself in what was known as dye termination (Sanger) sequencing. For Sanger sequencing about 100 DNA fragments yielding ~1 Kb of read length are sequenced in parallel. The introduction of capillary electrophoresis allowed greater automation, better QC and more accurate sizing. Multiple samples (96) could be analyzed simultaneously. This type of sequencing remains the gold standard in relation to error and reproducibility.

Second Generation (Massively parallel or Next Generation) DNA sequencing: From 2005, Mb to Gb of DNA sequence could be generated through massively parallel sequencing of millions of short (50–150 bp) fragments. Although this terminology best describes the new development, it was soon overtaken by the preferred Next Generation (NG) DNA sequencing. For this development, it was necessary to fragment DNA and then prepare pure cloned DNA using various types of PCR, including emulsion or bridge PCR. The actual sequencing utilized the traditional Sanger synthesis approach or other methods such as ligation. This was the first major step towards the $1 000 whole genome sequence. It is still expensive compared to Sanger sequencing, although the dollar per base cost is very cheap. A downside of NG DNA sequencing is the relatively small fragments sequenced, although each year the read length increases. The size limitation is overcome by the depth of sequencing (for example ×30) through the generation of massive (parallel) amounts of overlapping sequences that are then able to be placed in the appropriate part of the jigsaw puzzle through bioinformatics. One ongoing concern is that small fragments might give a distorted view of the genome, and so NG DNA sequencing remains under evaluation for clinical diagnostic work.

Third Generation (single molecule) DNA sequencing: This started around 2007 and is work in progress. Advantages include bypassing the initial library/cloning/PCR DNA preparation step and going directly to sequencing of single molecules. This became possible as miniaturization allowed a single DNA molecule to be sequenced in real time. Read lengths are predicted to be longer and output is said to be 1 000 times that of NG DNA sequencing. The expected commercial competition will ensure hardware costs continue to fall. The use of single-stranded DNA without the requirement for cloning or PCR is attractive from a clinical diagnostic perspective because it avoids the inherent errors that occur with amplified DNA. Informatics implications are more complex as Tb to Pb of data are generated. Computer storage capacity and analytic software will remain significant limitations.

[Figure 4.1 diagram: Sanger sequencing, targeted NG sequencing, exome NG sequencing, or whole genome sequencing, applied to germline or somatic DNA in research, clinical diagnostic or direct-to-consumer settings.]

FIGURE 4.1 The evolution of DNA sequencing. Traditional Sanger sequencing is moving towards Next Generation (NG or massively parallel) DNA sequencing, with a whole genome sequence the ultimate goal. In the meantime, there are intermediate applications proving popular until the costs or the bioinformatics infrastructure for whole genome sequencing are addressed. They include: (1) Targeted sequencing (or re-sequencing), which allows the study of many genes in one sequencing run. An example would be to study all known breast cancer related genes (around 20 genes in 2011) rather than the limited BRCA1 and BRCA2 genes to progress further down the path of personalized medicine. (2) Exome sequencing (all exons in the human genome), which has enabled the discovery of new genes for Mendelian disorders. The different DNA sequencing options should also be considered in the context of germline DNA versus somatic cell DNA, and how they were provided (research, diagnostic or direct-to-consumer).


sequence. Data are then ready for the customer, who will have specific requirements for analysis, and (2) Analytic bioinformatics, which is the next step in the process and dependent on the research aims.

Approximately half the costs of NG sequencing are in the bioinformatics component, i.e. software costs and skilled scientists' time. In most cases it is the bioinformatics that is limiting, as software available for conventional DNA sequencing, with its focus on long read lengths, does not work well with the shorter read lengths and large data sets generated with NG DNA sequencing. Thus, new software and algorithms are being developed.
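To illustrate why short reads call for different algorithms, here is a toy read mapper of the kind short-read aligners build on: index the reference by k-mers, seed each read at its first k-mer, then verify the full match. Real aligners handle mismatches, gaps and base qualities and are far more sophisticated; this sketch (with an invented reference sequence) shows only the seed-and-verify idea:

```python
from collections import defaultdict

def build_index(reference: str, k: int = 8) -> dict:
    """Index every k-mer in the reference by its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def map_read(read: str, reference: str, index: dict, k: int = 8) -> list:
    """Seed on the read's first k-mer, then verify an exact full-length match."""
    hits = []
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits

reference = "ACGTACGTTTGACCAGGTACGTTTGACCA"
idx = build_index(reference, k=8)
print(map_read("ACGTTTGACC", reference, idx, k=8))  # [4, 18]
```

Short reads often map to multiple positions, as here; resolving such ambiguity (and scaling the index to Gb-sized references) is exactly where the new algorithms mentioned above are needed.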

NG DNA sequencing is evolving rapidly into third generation platforms, making a $1 000 (or cheaper) whole genome sequence possible in the not too distant future (Box 4.2). As the data generated in sequencing expand to Pb (petabyte), resources may become rationalized, with fewer but larger centralized laboratories performing the actual sequencing and analytic bioinformatics being conducted in-house. If this happens, more bioinformatics capacity will be required. Cloud computing may solve some of these issues, particularly storage, but there will be concerns around privacy and security, as the legal oversight will be dependent on where the computing facility is located.

Research Applications

One concern of the Human Genome Project was the bypassing of hypothesis-driven research, with blockbuster type projects relying on a mass of data to produce something useful. NG DNA sequencing will further promote this approach. Nevertheless, impressive research findings have already emerged and it is increasingly difficult to criticize a strategy that may be the only way to answer difficult questions. NG DNA sequencing in medical research has been used for:

• Cataloging and understanding diversity in humans, animals and other organisms.
• Revisiting the pathogenesis of complex diseases.
• Replacing or improving GWAS.
• Providing an alternative approach to transcriptomics.
• Drug development through identification of novel targets.

Although discussion and interest tends to focus on whole genome sequencing, a related strategy that is often preferred is whole exome sequencing, because it is cheaper, easier to do and has smaller bioinformatics requirements. This approach captures only a small proportion of the genome (the protein-coding genes), which

FIGURE 4.2 Four components to NG DNA sequencing. (1) DNA (or RNA) preparation steps, preparation of libraries and fragment amplification by PCR. These are time consuming and costly steps likely to be replaced by more direct access to DNA through single molecule sequencing in third generation platforms. (2) The DNA sequencing methodologies usually involve a stepwise chemical synthesis step. This represents a target for cheaper costs and improved efficiency in third generation platforms. Two bioinformatics steps follow: (3) Data processing and (4) Data analysis. These remain potential road blocks to the $1 000 genome having clinical utility because the cost will not be in the actual sequencing but the bioinformatics. As already noted, third generation platforms will reduce the sequencing costs but will complicate the bioinformatics because of the larger data sets (Tb to Pb) compared to Gb to Tb with NG DNA sequencing.

[Figure 4.2 diagram: (1) DNA/RNA, libraries, PCR; (2) generating DNA sequence; (3) production of sequence data; (4) analysis of sequence data.]


is a limitation since it will miss regulatory sequences and copy number variations (CNVs).
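The scale of that trade-off is easy to quantify: the protein-coding exome is commonly estimated at roughly 30 Mb, about 1% of a ~3.2 Gb genome, which is why exome sequencing needs so much less raw sequence. A back-of-envelope sketch (sizes approximate, not taken from the text):

```python
GENOME_BP = 3.2e9  # approximate haploid human genome size
EXOME_BP = 30e6    # commonly cited total exon content (~30 Mb)

exome_fraction = EXOME_BP / GENOME_BP
print(f"exome is ~{exome_fraction:.1%} of the genome")  # ~0.9%

# Raw bases needed at x30 average depth for each strategy (before overheads
# such as capture inefficiency and duplicate reads):
print(f"genome at x30: {30 * GENOME_BP / 1e9:.0f} Gb of sequence")  # 96 Gb
print(f"exome  at x30: {30 * EXOME_BP / 1e9:.1f} Gb of sequence")   # 0.9 Gb
```

The roughly hundredfold difference in raw sequence is what drives the cost and bioinformatics savings, at the price of missing the non-coding regions noted above.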

Whatever approach is used, the sensitivity and specificity of NG DNA sequencing is still being defined, particularly for clinical applications. This is made more difficult as new platforms continue to emerge on a regular basis. Quality assurance issues with sequencing are important in research and more so in clinical testing.

Clinical Applications

The many research applications of NG DNA sequencing had placed little pressure on industry to consider how this technology might be

BOX 4.2

ARCHON GENOMICS X PRIZE

As well as the goal of a $1 000 whole genome sequence, another incentive for progress was announced in 2006. This was the Archon Genomics X Prize, worth $10 million, to be given to the team that could sequence:

1. 100 human diploid genomes;
2. In 10 days;
3. For $10 000 per genome;
4. With 1 error in every 10⁵ bases sequenced, and
5. The sequence must accurately cover at least 98% of each genome [3].

The X Prize Foundation is described as an educational non-profit organization whose goal is to create radical breakthroughs for the benefit of humanity. The reasons given for selecting whole genome sequencing were to include in mutation detection a more comprehensive profile of an individual's DNA mutations, including those that might be missed because they are in regulatory regions or repetitive sequences, and to catalog mutations occurring exclusively in somatic cells. Getting a more comprehensive profile of an individual's genomic makeup was expected to assist the pursuit of personalized medicine for pharmacogenetics and preventive medicine by screening for mutations before disease was established. Ultimately the personalized medicine approach would mean lower health costs. As of late 2011, the Archon Genomics X Prize had not been won and the cost for sequencing a whole genome was considerably less than $10 000. Therefore, a new initiative was announced, revitalizing the Prize and making it more focused and relevant, with the $10 million reward remaining. Now the purpose was to sequence 100 human genomes from centenarians, and so it was dubbed 100 over 100. Since centenarians represent a rare and extreme human model for studying aging, it is hoped that the whole genome sequencing approach might shed further insight into the genetic basis of aging, as well as providing an incentive for improved technology. It is interesting to compare the standards expected in 2006 with those in 2011, which required:

1. A whole, medical-grade, genome sequence;
2. 100 human haploid genomes;
3. Completed within 30 days (the longer time frame was considered necessary after consultation with industry);
4. Total cost of $1 000 per genome;
5. Accuracy of 1 error per 10⁶ bases, and
6. 98% completeness, including identification of insertions, deletions and rearrangements.

The competition was scheduled to run over a month from 3 January to 3 February 2013.
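The accuracy criteria in Box 4.2 translate into concrete error budgets per genome. A sketch of the arithmetic, assuming an approximate 3 Gb haploid genome (the genome size is my illustrative figure, not part of the Prize rules):

```python
GENOME_BP = 3.0e9  # approximate haploid human genome size (assumed)

def allowed_errors(error_rate_per_bp: float, completeness: float,
                   genome_bp: float = GENOME_BP) -> float:
    """Errors permitted across the covered portion of one genome."""
    return error_rate_per_bp * completeness * genome_bp

# 2006 criteria: 1 error per 10^5 bases over at least 98% of the genome.
print(allowed_errors(1e-5, 0.98))  # ~29 400 errors per genome
# 2011 criteria: tenfold stricter at 1 error per 10^6 bases.
print(allowed_errors(1e-6, 0.98))  # ~2 940 errors per genome
```

Even the stricter 2011 standard tolerates thousands of errors per genome, which puts the clinical validation requirements discussed later in perspective.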


used in the clinic. However, in 2010 new platforms emerged, designed for the clinical diagnostic laboratory. Some applications for patient care include:

• Somatic cell cancer DNA testing.
• Targeted gene DNA testing.
• Diagnosis of difficult cases.
• Clinical screening of asymptomatic individuals.
• Reproductive screening.

Discussions of the role of NG DNA sequencing in clinical care will generate many different views. Some clinicians are emphatic that the technology should not be the driver, and it is still too early to move this broad, catch-all, sequencing strategy from research into the clinic. Others express the view that NG DNA sequencing has the potential to revolutionize the way medicine is practiced, particularly in terms of personalizing decision making. Two key points in the debate are technological/quality issues, and the way this type of sequencing is delivered in clinical care. Unsurprisingly, the direct-to-consumer market has taken on NG DNA sequencing and is moving forward with attractive offers underpinned by broad disclaimers, encouraging individuals to purchase their whole genome sequences (Chapter 5).

Overall, the technological aspects of NG DNA sequencing for clinical care are less of an issue, although concerns around quality remain to be addressed. There is a general view that results are not given to the patient until they are validated against the gold standard of Sanger sequencing or are confirmed using a different NG DNA sequencing platform. Outstanding technological issues will be addressed as the analytic platforms evolve. Bioinformatic tools are important because they allow results to be filtered, so even if a whole genome or exome sequence is obtained, it is possible to remove or hide data or genes or segments of the genome that are irrelevant to the clinical problem under consideration. This becomes a form of targeted DNA sequencing and helps reduce the number of unwanted incidental findings that will invariably emerge with NG DNA sequencing.
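That kind of panel-style filtering can be sketched as a simple set membership test; the variant records and gene panel below are invented for illustration, not real clinical data:

```python
# Hypothetical variant calls from a genome/exome: (gene, variant) pairs.
variants = [
    ("LDLR", "c.681C>G"),
    ("BRCA1", "c.68_69delAG"),
    ("CYP2C9", "*3 allele"),
    ("TTN", "c.2926G>A"),
]

# Panel for a lipid-disorder query; genes outside it stay masked,
# reducing unwanted incidental findings.
LIPID_PANEL = {"LDLR", "APOB", "PCSK9"}

def filter_to_panel(calls, panel):
    """Report only variants falling in genes on the requested panel."""
    return [(gene, change) for gene, change in calls if gene in panel]

print(filter_to_panel(variants, LIPID_PANEL))  # [('LDLR', 'c.681C>G')]
```

The same stored data can later be re-queried with a different panel (e.g. a pharmacogenetic one), which is the essence of the once-in-a-lifetime sequence idea discussed below.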

In contrast to the above, there is less consensus on how NG DNA sequencing will be delivered as a clinical service. The first two of the five clinical applications described above are moving forward. Somatic cell DNA testing for cancer is being developed through work like that of the International Cancer Genome Consortium (Chapter 7). Although guided by various research protocols, the results obtained are being used, often on an ad hoc basis, for decisions on patient care. The second application, involving targeted sequencing, is also progressing. For this, a number of genes relevant to a patient's clinical disorder can be studied simultaneously, rather than being sequenced separately. This could potentially be affordable (and so improve access), have a reduced turnaround time and give a better overview of the health problem.

An example of targeted NG DNA sequencing is an individual with hypercholesterolemia, who could have the LDLR and other genes involved in lipid metabolism sequenced to confirm the diagnosis of familial hypercholesterolemia as the underlying cause of elevated cholesterol levels. The DNA mutation can then be used for testing asymptomatic family members (predictive DNA testing). Sequencing applications can be taken further if the patient is treated with a cholesterol lowering agent such as a statin, since it becomes possible to check for the presence of genes with pharmacogenetic relevance (Table 3.8). This comprehensive DNA-based care could be undertaken by cloning or amplifying by PCR the target genes and then NG DNA sequencing. Alternatively, a whole genome or exome strategy can be followed, filtering out what is not needed.

NG DNA sequencing for diagnosing a difficult clinical problem is acceptable, particularly if there is a significant health risk and


conventional approaches have failed to find the cause. Some examples of this approach are starting to emerge, which illustrate how this technology can be life saving (see Overview below).

What is certain is that as more sequencing is done, more variants of unknown significance (VUS, Chapter 3) will be found, and these will place an increasing burden on the laboratory and the clinician. The patient and family may be given a list of DNA changes that are yet to be classified in terms of an illness and, more problematic, DNA mutations that are associated with known diseases. Thus, germline NG DNA sequencing to screen healthy individuals is potentially a concern because of the likelihood that variants with pathogenic potential will be found incidentally. Some of the earliest whole genome sequences of high profile individuals, including the Nobel Laureate James Watson and the genomics researcher J Craig Venter, have already demonstrated that each individual can have over 100 of these changes, including some that are purported to be lethal, with no apparent effects on health. Which sequence changes have real consequences, and which are artifacts of the technology, remain to be determined.

Case Study

Some insight into how personalized medicine will be developed through whole genome sequencing is starting to emerge. An example of this is a clinical risk assessment based on a 40 year old male's family history and his whole genome sequence [4]. Genomic risk factors were estimated from:

1. Variants in genes causing Mendelian genetic disorders;
2. Novel mutations detected during the study;
3. Variants implicated in genes influencing drug metabolism, i.e. pharmacogenetic tests, and
4. SNPs associated with complex genetic disorders.

This study highlights a new paradigm for medical care based on comprehensive but as yet incomplete knowledge. In particular, the use of SNPs in complex genetic disease is contentious, as many findings come from association studies and so the clinical utility for an individual's health is difficult to assess. It is sobering but not surprising to note that one variant found in this patient was reported to cause late onset hypertrophic cardiomyopathy, and then subsequently shown to be a benign polymorphism. This emphasizes the need for: (1) Mutation databases and their careful and methodological curating to ensure that data entered are correct, and (2) A more rigorous approach when analyzing variants if only in silico data are used.

The way in which the results were given to the patient is also noteworthy. Included were three tables with a long list of genes and variants associated with disease but with different degrees of significance, using headings such as unknown importance or potentially important. The results were depicted in a complex conditional dependency diagram, highlighting risks of various diseases that had at least a 10% post-test risk probability. A finding that should be followed with further evaluation to assess clinical utility was the comment that "…63 clinically relevant previously described pharmacogenetic variants … in genes that are important for drug response…" [4].
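A post-test risk of the kind used for that 10% reporting threshold is obtained by standard Bayesian updating: a pre-test probability is combined with the likelihood ratio attached to a finding. A sketch of the calculation (the pre-test risk and likelihood ratio here are illustrative, not taken from the study):

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Update disease probability via the odds form of Bayes' theorem:
    post-test odds = pre-test odds * likelihood ratio."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Illustrative only: 5% pre-test risk and a risk-allele likelihood ratio of 2.4.
p = post_test_probability(0.05, 2.4)
print(f"post-test risk: {p:.1%}")                      # post-test risk: 11.2%
print("report" if p >= 0.10 else "below 10% threshold")  # report
```

The fragility of such estimates when the likelihood ratio comes from association studies is exactly the contentious point raised in the text.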

New Clinical Paradigm?

Will NG DNA sequencing change the way DNA genetic testing is undertaken in the clinic? Presently, DNA sequencing using the traditional Sanger approach to detect mutations in the BRCA1 and BRCA2 genes costs over $2 000. Yet the goal for NG DNA sequencing is to sequence the whole genome for around $1 000 (and exome sequencing already costs less than this). An obvious goal would be a once-in-a-lifetime whole genome sequence with the data stored in an electronic health record. Appropriate filtering then allows relevant genes to be interrogated each time they might provide useful clinical information, e.g. prior to taking medication, or in testing for health issues like diabetes,


heart disease and, in an aging community, dementia. The same DNA sequence might also help at the time of death as a component of the traditional postmortem (Chapter 9). Since a whole genome sequence need only be done once to look for germline mutations, it would be very cost effective compared to the current piecemeal approach that relies on sequencing single genes. The economic benefits will not be missed by those holding the health dollars.

Some issues that will influence how effectively NG DNA sequencing progresses into the clinic include:

1. The accuracy of NG DNA sequencing compared to the gold standard Sanger sequencing. A UK study, describing exome sequencing to detect mutations in TP53, BRCA1 and BRCA2 genes in breast cancer, suggests that overall reagent costs and analysis times were reduced, and the sensitivity and specificity of Sanger sequencing could be achieved by obtaining ×50 coverage with NG DNA sequencing [5];

2. How to ensure secure storage of the large data sets generated and the protection of privacy? Fortunately, various professional organizations have already started to deal with the relevant ELSI (Chapter 10). Perhaps only a temporary solution is needed if costs fall below $1 000 (and it has been suggested that they could fall as low as $100), since storage and privacy issues might be addressed by repeating the whole genome sequence each time it is needed;

3. The best way to evaluate the clinical utility of this approach. It has been proposed that only NG DNA sequencing that leads to actionable clinical decisions should be undertaken. This is sensible although it will depend on how actionable is defined, and

4. Educational and workforce issues need to be addressed, in particular the training of scientists and clinicians in the interpretation of DNA variants.

DNA MICROARRAYS

This section deals with the transcriptome and ways in which it may be studied. DNA microarrays (DNA chips) are 2D grids containing ordered, high density arrangements of nucleic acid spots. Each spot (up to ~10² to 10⁶ spots in any one array) represents a DNA probe that is attached to an inert surface such as a glass slide or a silicon wafer. Target DNA or cDNA can be hybridized to the probes. Microarrays allow a snapshot to be taken of gene or cellular activity in the cell. They also provide a composite picture of multiple DNA markers such as SNPs or CNVs. This information can be compared between controls (normal cells, tissue or study cohorts) and patients, to identify significant differences. High throughput screening of gene expression can reveal molecular signatures of what is occurring at the cellular level. This knowledge can be exploited in the clinic for diagnostic purposes, or in research to understand disease initiation and progression.

Technology

Microarray probes are either double-stranded (ds) DNA or oligonucleotides. The probes can be printed using similar technology to ink jet printers. dsDNA probes are larger than oligonucleotide ones and so have higher sensitivity, although the specificity may be lower. Since oligonucleotide probes are smaller, they allow a larger number of spots per microarray. Printed microarrays can be developed in-house for particular purposes, and the array density is typically around 10 000 to 30 000 [6]. Commercially available in situ synthesized microarrays allow a much higher density of spots (around a million) because the oligonucleotide probes are synthesized directly onto the surface of the microarray. An example of this is the Affymetrix Genome Wide Human SNP Array 6.0, which has 1.8 × 10⁶ spots (genetic


markers) for detecting SNPs and CNVs. The costs of commercial microarrays are falling, but the trade-off is that they cannot always be individualized for particular experiments. Importantly, only known genes, variants or mRNA species are detectable.

Target DNA that is hybridized to the microarray can be labelled with fluorescent dyes, which allow multiple colors to be detected with a laser. Gene expression can then be studied in different cells or tissues and comparisons made. An accepted cut-off for gene expression in microarrays is greater than two-fold (an up-regulated gene) or less than 0.5-fold (a down-regulated gene). It should be noted that expression microarrays are only screens: they identify likely changes in the transcriptome. Results need to be confirmed by more specific measures, such as real time Q-PCR (Table 3.3).
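The two-fold cut-off can be expressed as a small classifier over expression ratios; the genes and ratios below are invented for illustration, and, as noted above, such calls are only a screen pending confirmation by Q-PCR:

```python
def classify_expression(ratio: float, up_cutoff: float = 2.0,
                        down_cutoff: float = 0.5) -> str:
    """Apply the conventional microarray fold-change cut-offs:
    >2-fold = up-regulated, <0.5-fold = down-regulated."""
    if ratio > up_cutoff:
        return "up-regulated"
    if ratio < down_cutoff:
        return "down-regulated"
    return "unchanged"

# Hypothetical tumor-vs-normal expression ratios per gene.
ratios = {"MYC": 3.5, "TP53": 0.4, "GAPDH": 1.1}
calls = {gene: classify_expression(r) for gene, r in ratios.items()}
print(calls)  # {'MYC': 'up-regulated', 'TP53': 'down-regulated', 'GAPDH': 'unchanged'}
```

In practice, analysis pipelines also apply statistical tests across replicate arrays rather than thresholding single ratios, but the fold-change cut-off remains the usual first filter.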

New bioinformatics tools were required to address the needs of microarrays. These included the design of probes for the hybridization conditions required, and the analysis of complex data sets. Analysis includes the comparison of the various hybridization signals to ensure quality and consistency between experiments, as well as inter-laboratory variability. The ability to assess the intensity of the signal generated is basic to determining whether a gene is up- or down-regulated. Additional flexibility became possible when multiple colors were used in labeling genes. There are different types of gene microarrays, allowing measurement of: (1) Gene expression; (2) DNA marker profiles, and (3) CNVs.

Gene Expression

The expression microarray, which allows the transcriptome (all the RNA species in a given cell) to be studied and compared with the transcriptome in another cell, has proven successful in both research and clinical service.

In this type of analysis it is possible to measure any number of mRNA species. For example, what is the difference at the genomic level between a cell line that is growing normally and the same cell line that has become cancerous? A way in which to make this comparison is to hybridize the mRNAs from the two different cell lines against a microarray which has genes of relevance to carcinogenesis (Figure 4.3). Differences in expression might help explain the biology of tumors, or detect tumor-specific targets for better diagnostics or new drug development. Commercially produced microarrays are now available covering a wide range of genes (TP53, CYP450), genetic pathways (apoptosis) or organisms (E. coli gene array).

More objective predictors to guide treatment and prognosis would be invaluable for managing many diseases, particularly cancers. An example of what might be possible is the clinically-based microarray test MammaPrint®, approved by the FDA for breast cancer prognosis (Box 4.3). The MammaPrint® test also highlights a number of problems:

1. Costs must be reasonable to allow greater access for patients;

2. Work practices must change to ensure availability of fresh tissues (to isolate mRNA) rather than the traditional formalin preserved material or blocks;

3. Clinical utility needs to be evaluated, and

4. Regulators must decide what is the appropriate oversight mechanism for this type of test.

In clinical medicine, microarrays might be useful for:

• Diagnostic confirmation and disease classification.

• Personalized treatment selection through analysis of the individual’s germline DNA and somatic cell DNA in tumor tissue.

• Better prognostic indicators derived from tumor DNA.


SNP Microarray

As discussed previously (Chapter 2), genome wide association studies (GWAS) have significantly advanced the potential to detect genetic markers or genes implicated in complex genetic disorders. This was possible because:

1. Larger cohorts were tested;

2. The genome could be divided into haplotype blocks, thereby needing fewer SNPs, and

3. Multiplexing SNPs became easier and cheaper with microarrays.

The Affymetrix SNP array was mentioned earlier. It was developed to enable applicability across many populations. It contains coverage redundancy to optimize the detection rate, as it is difficult to ensure uniform hybridization conditions across all SNP probes. Alternative products are bead arrays such as Illumina’s BeadChips. These can be customized for a particular need or available off the shelf; for example, there is a panel that contains SNPs from 400 genes implicated in cancer. The Illumina company has also introduced flexibility in its analytic platforms, allowing both NG DNA sequencing and SNP genotyping to be undertaken with the same instrument. Apart from SNP detection, the commercial arrays enable CNVs to be detected.
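The genotyping step on such SNP arrays can be caricatured as clustering the two allele signals. The sketch below is purely illustrative; the theta statistic, fixed thresholds and function name are simplifications and not the actual Affymetrix or Illumina calling algorithms, which fit per-SNP cluster models:

```python
def call_genotype(a_signal, b_signal, het_band=(0.35, 0.65)):
    """Toy genotype call from normalized allele A and B intensities.

    theta = B / (A + B); values near 0 suggest homozygous AA, values
    near 1 suggest BB, and intermediate values suggest heterozygous AB.
    Real platforms use trained cluster models, not fixed cut-offs."""
    theta = b_signal / (a_signal + b_signal)
    if theta < het_band[0]:
        return "AA"
    if theta > het_band[1]:
        return "BB"
    return "AB"
```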

An obvious clinical application for microarrays is mutation detection, as this would allow known mutations to be printed on a chip. A number have been produced, such as the Roche AmpliChip® CYP450 for drugs metabolized by CYP2D6 and CYP2C19. Despite its attractiveness, the microarray approach to DNA genetic testing has not been popular, perhaps because of the costs of chips, and because methods based on hybridization are not ideal for the close to 100% detection rate needed in clinical work (compared to a lesser requirement in research). Another important consideration is that the underlying mutations for most genetic diseases are very heterogeneous, with ones specific to families often predominating. These private mutations would not be detected through

FIGURE 4.3 Comparing gene expression in normal versus cancer tissue with a DNA microarray. A microarray can identify important genes in a cancer tissue. Total mRNA from both normal and cancer tissue is made into cDNA. The normal tissue cDNAs are labeled with a green dye (Cy3) and the cancer tissue cDNAs with Cy5 (red color). The cDNAs are mixed in equal proportions, and hybridized to the microarray which has spotted onto it DNA probes for genes with relevance to cancer. Following hybridization, the excess cDNAs are washed off, and the microarray plate is scanned with a laser to detect four possible colors: (1) Red – cancer tissue genes are expressing; (2) Green – normal tissue genes are expressing; (3) Yellow – genes from both cancer and normal tissue are expressing, because red + green = yellow, and (4) Black – no marked genes are expressing. Using appropriate software and the results from control DNA samples, it is possible to measure the intensity of each red and green color to estimate the level of the gene being expressed, as well as the global gene expression profiles.



microarrays that include only known mutations. For detecting novel mutations, DNA sequencing is needed. Therefore, it is likely in the longer term that increasingly cheaper costs for DNA sequencing will mean many of the microarray-based applications are replaced by NG DNA sequencing.

Array-Based Comparative Genomic Hybridization (aCGH)

In earlier editions of Molecular Medicine, there was discussion about a new development in cytogenetics called FISH (Fluorescence In Situ Hybridization). FISH utilized DNA probes that

An example of how microarray-based tests might impact clinical decision-making is illustrated in research findings first published in 2002. This work was initiated because breast cancer patients with the same disease staging have different outcomes and survival rates. Conventional prognostic indicators rely on lymph node status, histological grade and immunophenotyping of the tumor. Treatment options for early stage breast cancer after the tumor is removed vary from doing nothing to adjuvant chemotherapy or anti-estrogen agents such as Tamoxifen, both of which have significant side effects. It is difficult for patients to decide what to do, particularly when it is known that a large number of women will not relapse. In developing a microarray for breast cancer, the researchers at the Netherlands Cancer Institute in Amsterdam first took mRNA from 78 primary breast tumors obtained from women under 55 years old, who were lymph node negative. Of these, 34 patients subsequently developed metastases within five years, and 44 remained disease free after five years. mRNA from tumors was initially hybridized against 25 000 human genes. It was shown that prognostic information was captured predominantly by 70 genes whose biological function spanned many potential pathways in breast cancer development, including

cell cycle, DNA replication, growth, proliferation, transformation and apoptosis. The 70 genes were spotted onto another microarray and make up the MammaPrint® test. The RNA profile was considered a more powerful predictor of outcome than standard measures and has been approved by the FDA (more on this in Chapter 7). Clinical trials are now underway to determine the test’s clinical utility. One study is MINDACT, which started in 2007 and closed in mid-2011 when it had recruited over 6 000 patients. Validation data are eagerly awaited of the claim that the tumor’s microarray profile can predict early stage breast cancer patients who will do well without chemotherapy [7]. The test requires fresh tumor tissue from which to extract mRNA. This does not fit into the traditional work flow, which utilizes paraffin-embedded tissue, so significant clinical benefits will need to be demonstrated before changes in practice result. The same company that produced the above breast cancer genomic screen is working up a similar one for colon cancer. This is ColoPrint®, which involves an 18-gene signature. It is targeted to stage II cancers, where following resection of the tumor there is uncertainty about the value of adjuvant chemotherapy (a position similar to early stage breast cancer), as many patients are cured by surgery alone.

BOX 4.3

PERSONALIZING TREATMENT THROUGH MICROARRAYS.


hybridized to metaphase or interphase nuclei and allowed chromosomal location as well as gene copy number to be detected. Cytogenetic-based techniques were able to detect chromosomal abnormalities at the 5–10 Mb level of resolution, but dividing cells were necessary. FISH could detect deletions and duplications not previously seen with cytogenetics, at a resolution around 2 Mb for metaphase FISH and even better for interphase FISH. However, FISH was technically demanding, required special equipment and was limited to chromosomal regions detected by the DNA probes. FISH is still useful, but it is likely to be replaced by aCGH (also called chromosomal microarray or molecular karyotyping).

aCGH uses DNA rather than chromosomal preparations. aCGH probes (oligonucleotides or cloned segments of DNA) are tiled on microscope slides and hybridized against patient and control DNA (Figure 4.4). aCGH kits are commercially available and provide various levels of cover across the genome depending on the number of probes used. aCGH is attractive for clinical practice because of:

1. Ease of use;
2. Higher detection rate;
3. Faster turnaround time, and
4. Automation.

aCGH is useful when investigating possible chromosomal imbalances or CNVs leading to birth defects and developmental disorders, including intellectual impairment. It is considered by some to be the first tier diagnostic test in these circumstances [8]. This approach is proving popular in prenatal testing and in mutation detection when CNV is the underlying abnormality. Nevertheless, some problems with aCGH need resolution, including:

1. The significance of some CNVs detected, a problem comparable to that of DNA variants of unknown significance. Centralized databases, including the scientific literature, help here, as does the study of parents to determine if changes found are de novo or inherited. In the USA and Europe there are emerging clinical and laboratory practice guidelines to address this issue [8];

2. Quality assurance. This is being resolved as home-brew kits are replaced by commercial ones, and

3. Evaluation for clinical utility.

A critical step in the development of aCGH is evaluation. Challenges ahead are illustrated in a 2009 health technology report on aCGH used for patients with developmental delay/mental retardation or autism spectrum disorder [9]. Two quotes from this report are noteworthy:

The results of neither conventional cytogenetic evaluation nor aCGH evaluation have been systematically studied for impact on patient outcomes other than diagnostic yield, which is an intermediate outcome. Impact of testing on the kinds of outcomes that matter to the patient and family has been directly addressed in very few studies. Thus, it is not possible to draw evidence-based conclusions regarding the clinical utility of aCGH genetic evaluation. The same may also be said of conventional cytogenetic evaluation.

Expert consensus and clinical guidelines state that genetic information is of value because it establishes a causal explanation that is helpful to families. It is suggested that such genetic information avoids additional consultations and various types of diagnostic tests, assists with early and improved access to community services that may ameliorate or improve behavioral and cognitive outcomes, provides estimates of recurrence rates to better guide reproductive decision-making, and enables an understanding of prognosis and future needs. However, little evidence supports these outcomes.

Although only DNA microarrays have been described, there are microarrays for proteins, carbohydrates and other potential biomarkers. As well as 2D microarrays, it is possible to have 3D suspension arrays.

BIOINFORMATICS

Bioinformatics describes the application of computational tools and analysis to capture,


FIGURE 4.4 Array-based Comparative Genomic Hybridization (aCGH). An example of a duplication and deletion on chromosome 16p. The patient’s DNA is labeled with green fluorescent dye and the normal control DNA has a red dye. The two DNA samples are allowed to hybridize onto slides coated with DNA probes, usually oligonucleotides. Probes can represent regions in the genome known to have CNVs causing disease, or there can be probes scattered across the whole genome. Different aCGHs are available depending on what is needed. Where there is no quantitative difference between the patient and the control, both green and red colors will appear around the baseline (center of figure; 0 along the top axis, which runs from −4 to +4). Where there are duplications/deletions in the patient’s DNA the green/red will predominate. Top: The green intensity is about +0.5 while red is −0.5, i.e. there is an excess of green, which indicates a duplication at the site of these probes. Bottom: A relative deficiency of green, which is around −1 (patient) against +1 for control (red) DNA, means a deletion at this locus. aCGH provided by Dr Melody Caramins, South East Area Laboratory Services, Prince of Wales Hospital, Sydney, Australia.
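The duplication/deletion calls illustrated in Figure 4.4 amount to a rule applied to per-probe log2(patient/control) signal ratios. A minimal sketch with purely illustrative thresholds; real aCGH pipelines smooth ratios across neighboring probes and use validated, platform-specific cut-offs:

```python
import math

def call_copy_number(patient_signal, control_signal,
                     dup_threshold=0.4, del_threshold=-0.6):
    """Call a single probe from fluorescence signals.

    A log2 ratio near 0 means equal copy number; an excess of patient
    (green) signal suggests a duplication, a deficiency suggests a
    deletion. Thresholds here are illustrative only."""
    log2_ratio = math.log2(patient_signal / control_signal)
    if log2_ratio >= dup_threshold:
        return "duplication"
    if log2_ratio <= del_threshold:
        return "deletion"
    return "normal"
```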


store and interpret biological data. It intersects a number of disciplines, including biology, medicine, computer science, information technology and mathematics. There are many related terms used interchangeably with bioinformatics, including informatics, computational biology, medical informatics, eHealth and health information technology. In this chapter bioinformatics will be used as a broad descriptor. A new term has also emerged: in silico (computer-based) analysis, which complements the more traditional in vivo and in vitro approaches to studying gene function.

In modern biological research, bioinformatics is essential for managing and analyzing data. The computer also increasingly impacts on medical practice, through the availability of sophisticated databases, accessible to patients, the community and health professionals over the Internet. Computers can potentially assist in clinical decision making. The importance of bioinformatics has closely paralleled the growth of molecular medicine and the recent evolution of omics. As the omics analytical platforms have become more automated, the role and input of the laboratory scientist or pathologist is diminishing, while the role of the bioinformatician is growing, as well as becoming a limitation to progress. For the full translation of molecular medicine discoveries into clinical healthcare delivery it will be necessary to build a sophisticated bioinformatics infrastructure, while at the same time ensuring that health professionals and the community are sufficiently educated to utilize these resources.

Research Applications

Two key catalysts for major developments in bioinformatics were the Internet’s arrival [10], and the Human Genome Project (Chapter 1). The importance of bioinformatics in molecular medicine became apparent in the 1980s, when DNA sequencing data began to accumulate. These data had to be stored, and the traditional paper methods were inadequate for the amount generated. The solution was to deposit the sequences electronically into various databases such as GenBank and EMBL. Information about proteins was placed in databases including PIR (Protein Information Resource) and PDB (Protein Data Bank) (Table 4.3).

As well as expanding the storage capacity through better computer hardware, new software programs were required to analyze the data. Since protein-coding genes occupy only a small proportion (1–2%) of the total genome, and are discontinuous with exons interspersed within introns, an initial focus for bioinformatics was predicting the location of protein-coding regions in the genome [11]. Another was the analysis of DNA sequence from newly discovered genes, to predict their function. For this, the DNA sequence was compared with other sequences to look for homology (similarity). Software programs, such as FASTA (an abbreviation of FAST-All), allowed comparisons with other sequences in the databases. Finding some homology to another gene would help in trying to understand function. Finding no homology made it more problematic for the researcher to predict function.
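The homology idea can be illustrated with a toy percent-identity calculation. This is not how FASTA or BLAST work internally (they use fast alignment heuristics, substitution matrices and significance statistics), but it shows the kind of sequence comparison being made:

```python
def percent_identity(seq_a, seq_b):
    """Percent identity between two equal-length aligned DNA sequences.

    A toy stand-in for homology scoring: real search tools first find
    local alignments, then score them and report statistical
    significance (E-values)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)
```

For example, `percent_identity("GATTACA", "GATTTCA")` counts six matching bases out of seven positions.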

As the Human Genome Project progressed, an increasing number of model organisms and plants were sequenced and compared through bioinformatic in silico approaches. More sophisticated software had to be developed to cope with the increasing complexity in data analysis. A program called BLASTN (Basic Local Alignment Search Tool – Nucleotide) provided more rapid and better information about DNA sequences and gene characterization.

Further challenges have emerged, as studies of gene expression generated data from potentially thousands of genes using microarrays. The earlier requirement for bioinformatics to provide understanding of relatively straightforward one-dimensional objects, such as a DNA sequence, has changed significantly, to cope with information related to networks and the relationships between genes (systems biology).


TABLE 4.3 Some useful clinical laboratory or research bioinformatics sites. Note: All web-based references accessed on 16 Feb 2012.

Name URL and Comments

NCBI (National Center for Biotechnology Information)

www.ncbi.nlm.nih.gov Repository for many bioinformatics tools and databases including GenBank; RefSeq; Entrez; BLAST; FASTA; dbSNP, dbGaP; PubMed; OMIM; peptidome; DCODE.

EMBL nucleotide sequence databases

www.ebi.ac.uk/embl/ Europe’s primary DNA, RNA nucleotide sequence resource. Data are exchanged on a daily basis with two other similar databases (see GenBank, DDBJ).

DDBJ–DNA databank of Japan

www.ddbj.nig.ac.jp/ DDBJ is a member of the International Nucleotide Sequence Databases developed and maintained collaboratively between DDBJ, EMBL and GenBank for over 18 years. These three databases are synchronized and so contain the same data but differ in the way the data are displayed.

Ensembl www.ensembl.org/index.html Joint UK, European Bioinformatics Institute (EBI) initiative. This database has many complete and up to date annotated entries on selected eukaryotic genomes.

UCSC Genome Bioinformatics

http://genome.ucsc.edu/ A commonly used genome browser.

UniProt www.uniprot.org/ A curated protein sequence database providing a high level of annotation (e.g. description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.

Protein Data Bank (PDB) www.pdb.org/pdb/ Contains information about experimentally determined structures of proteins, nucleic acids and complex assemblies.

PIR – Protein Information Resource

http://pir.georgetown.edu/ Integrated protein informatics resource for genomic, proteomic and systems biology research.

International Society for Computational Biology

www.iscb.org/ Involved in policy, giving members access to publications and meetings, and functions as a portal for information on training, education and employment.

International HapMap Project

http://hapmap.ncbi.nlm.nih.gov/ A resource to find genes associated with human disease and pharmacogenetics.

Database of genomic variants

http://projects.tcag.ca/variation/ A curated catalog of structural variation in the human genome.

Vega (Vertebrate Genome Annotation)

http://vega.sanger.ac.uk/ A central repository for high quality annotations of vertebrate finished genome sequence.

1 000 Genomes www.1000genomes.org/ A comprehensive catalog of human genetic variation.

Rfam www.sanger.ac.uk/resources/databases/rfam.html Information about RNA families.

Pfam www.sanger.ac.uk/resources/databases/pfam.html Information on classifying proteins.

miRBase www.mirbase.org/ Searchable database of published miRNA sequences and annotation.

KEGG (Kyoto Encyclopedia of Genes and Genomes Databases)

www.genome.jp/kegg/ Contains descriptions of cellular pathways, e.g. metabolic pathways and disease related pathways.

COSMIC www.sanger.ac.uk/genetics/CGP/cosmic/ A catalog of somatic mutations in cancer.



A quantum leap in bioinformatics computing power, as well as analytic software, became necessary with the emergence of NG DNA sequencing.

Hardware Developments

A number of adaptations have been made to meet the hardware (computer power) challenge of bioinformatics. One was the development of computer grids, by linking computer and database resources across widely distributed scientific communities. With this type of computer power, homology searches that used to take days to weeks to complete could be finished in seconds to hours. Supercomputers are computers with the fastest calculation speeds, and are at the frontline of processing capacity. Some supercomputers can reach high speeds because they have been designed for one purpose. Presently, a major limitation to increasing computational speed is secondary heating. This remains a challenge for the computing industry. Cloud computing describes Internet-based computing using shared resources and software. Access is available on demand and payment is made to cover the capital expenditure (hardware, software) and services.

Human Microbiome Project

http://commonfund.nih.gov/hmp/ NIH sponsored program to characterize the microbiota in different sites of the human body in both health and disease.

Zebra fish model organism database

http://zfin.org/cgi-bin/webdriver?MIval=aa-ZDB_home.apg Provides access to a variety of resources for those working with this model animal.

FlyBase http://flybase.org/ A database of Drosophila genes and genomes.

Caenorhabditis Genome www.sanger.ac.uk/Projects/C_elegans/ A database of Caenorhabditis genome sequencing projects.

GeneCards www.genecards.org/ Searchable, integrated, database of human genes that provides concise genomic, transcriptomic, genetic, proteomic, functional and disease related information on all known and predicted human genes.

Cytochrome P450 home page

www.cypalleles.ki.se Useful site to observe the considerable heterogeneity with DNA changes in the P450 genes.

International Cancer Genome Consortium

www.icgc.org/ International project to map 50 different tumors that have clinical and societal importance across the globe.

Human Genome Variation Society

www.hgvs.org/mutnomen/ Determines the official nomenclature for describing DNA variants and mutations.

Mutation surveyor www.softgenetics.com/mutationSurveyor.html Allows changes in a DNA sequence to be detected by comparing to a reference sequence.

Alamut http://www.interactive-biosoftware.com/software/alamut/overview Provides useful algorithms to interrogate DNA sequence changes and highlights relevant literature as well as recommended HGVS nomenclature.

Human Gene Mutation Database

www.hgmd.cf.ac.uk/ac/index.php International database containing genetic mutations across a wide range of diseases. This is available free, with a professional version containing a larger number of entries also accessible but for a subscription fee.

Human Variome Project www.humanvariomeproject.org/ The goal of this international project is to capture and catalog all human genetic variations which are country specific or gene/disease specific.



Clinical Applications

In a rapidly moving field such as genomics, information needs to be regularly updated. The bulk of data now being generated means the Internet is the only route for accessing databases and linking relevant information to publications in journals, providing the health practitioner (and often patients and families) with up to date and comprehensive information. In terms of genetic disorders, one of the most extensive and useful databases is OMIM – Online Mendelian Inheritance in Man – which is regularly updated. For each clinical condition described, it provides links to relevant publications as well as the related DNA or protein data, in a historically formatted summary. This and other useful databases are listed in Tables 4.3 and 4.4.

In Silico Analysis of DNA Variants

An example of how bioinformatics and molecular medicine have impacted on the delivery of clinical genetic services is the use of sophisticated software to interrogate DNA sequence data. There are three key resources available to laboratory health professionals to assess the clinical significance of DNA variants:

1. DNA mutation databases and the scientific literature;

2. In silico approaches utilizing software, and

3. In vitro or in vivo experimentation.

As already noted in Chapter 3, the increasing volume of DNA sequencing data that are now being generated makes it impractical to undertake the third option, and so in silico analysis, coupled with information derived from DNA mutation databases and the literature, becomes the default approach. DNA mutation databases have also proved to be key resources for depositing new DNA mutations, as these are no longer accepted for publication in journals. An example of such a database is the Human Gene Mutation Database (Table 4.3).

Although DNA mutation databases are important resources, they are also a trap for the inexperienced, because variants in databases are not necessarily true mutations and each has to be judged carefully based on the evidence provided. Particularly difficult to evaluate are variants involving intronic changes that are potential splicing mutations. Missense changes can be interrogated using a number of well-established software algorithms that consider conservation, homology and the potential for altering protein structure or conformation. Nevertheless, even these can be difficult to confirm as true mutations. Ultimately, this uncertainty has to be transmitted in the genetic counseling. Effective interactions between the laboratory health professional and the clinical health professional are essential to ensure results of DNA tests are fully understood by the patient (and their family members). As will be highlighted in Chapter 5, the direct-to-consumer model for DNA testing bypasses this link. Examples of some software programs used to gauge the clinical significance of DNA variants are given in Box 4.4, and an overview of the steps necessary to evaluate the significance of a DNA variant is found in [12].
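The weighing of evidence described above can be caricatured as a toy rule combiner. Every input and cut-off here is a hypothetical simplification chosen for illustration; real variant classification draws on far richer evidence and always requires expert review:

```python
def triage_variant(in_mutation_db, conserved_position, damaging_prediction):
    """Crude in silico triage of a missense variant.

    Inputs are hypothetical boolean summaries of the evidence sources
    named in the text: mutation databases and the literature,
    cross-species conservation, and protein-effect prediction
    software. Entries in mutation databases are not necessarily true
    mutations, so no single line of evidence is treated as decisive."""
    score = sum([in_mutation_db, conserved_position, damaging_prediction])
    if score >= 2:
        return "likely pathogenic - review evidence carefully"
    if score == 1:
        return "uncertain significance"
    return "likely benign"
```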

eHealth

There are many components of eHealth, including electronic health records (EHR), decision support systems, eConsulting and telemedicine [13,14]. Computers now comprise an integral component of most clinical practices. Electronically recorded patient information provides the start for computer-generated prescriptions that have many benefits, including links to software that highlights risks from drugs or drug-drug combinations. Just as bioinformatics is playing an increasingly important role in the research applications of molecular medicine, so will eHealth initiatives set the pace for the translation of molecular medicine into clinical practice.


Software programs facilitate drawing pedigrees and obtaining family histories (Table 4.4). More relevant for taking genomic discoveries into the clinic will be the availability of computer-generated algorithms for decision making. The provision of in silico tools to make clinical practice easier is just as important as formal educational activities. For example, the National Cancer Institute in the USA has developed an Internet-based program which allows the physician or counselor to input clinical information relevant to breast cancer risk, including family history and previous breast pathology. This information is then returned to the health professional in the form of a relative risk. The program also has succinct information about various options available for the at-risk patient (Table 4.4). Some current and future applications of eHealth are described in [13,14].

Professional genetic counseling services are faced with increasing demands and more complex clinical scenarios. This trend will continue as new genes and genetic risks are defined in the complex genetic disorders (Chapter 2). The

Using various software programs, DNA variants can be interrogated in silico to assist in their detection and interpretation. One example is Mutation Surveyor®, which can identify where variants are present in Sanger sequencing. The claimed detection sensitivity is 5% of the primary peak, with an accuracy above 99% (when used to analyze both the forward and reverse sequencing strands). The software compares the patient’s DNA sequence with a reference one, then identifies changes, producing various characteristics including quality scores. The latter are important because no software (and the same would apply to the naked eye) is infallible, and poor quality sequences and/or artifacts such as dye blobs can lead to errors. For this reason laboratory staff will always visually confirm any changes reported. For DNA sequencing, the quality score is called Phred, and is based on parameters taken from the DNA sequence peak shape and resolution. Because it is a logarithmic scale, a Phred quality score of 10 implies the base call accuracy to be around 90%, while a score of 20 means 99% accuracy. Once a variant is identified, it is assessed for function, which can be aided by Alamut, a decision support program. This software takes the

variant and compares it to other databases including Ensembl, UCSC Genome Bioinformatics, Swiss-Prot, dbSNP and PubMed. In terms of missense changes, the software considers conservation of the nucleotide and amino acid across many species during evolution; physicochemical differences between the wild type amino acid and the mutated one; and whether the change occurs in a protein domain. On the basis of this, it makes a prediction about the likely pathogenicity of the variant. Although this type of software has helped in the interpretation of DNA variants, it is still ultimately the responsibility of the laboratory scientist or pathologist to make the final call on the DNA variant’s significance. This is not always an easy task, and increasingly the laboratory DNA sequencing component progresses rapidly while the assessment of the result becomes the limitation in turnaround time. It is sobering to note that a formal health technology assessment of Alamut came up with positive recommendations about its clinical utility, but noted that mistakes will occur if the primary information in the databases interrogated by this software is not correct, or is written in a confusing format. Links to the above software programs may be found in Table 4.3.

BOX 4.4

IN SILICO SOFTWARE AND DNA SEQUENCING.
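The Phred relationship described in the box is logarithmic: a score Q corresponds to an error probability of 10^(-Q/10), so accuracy is 1 minus that. A minimal sketch of the conversion (the function name is illustrative, not part of any sequencing software):

```python
def phred_accuracy(q: float) -> float:
    """Base-call accuracy implied by a Phred quality score.

    Phred is logarithmic: Q = -10 * log10(P_error), so the
    error probability is 10 ** (-Q / 10) and accuracy is 1 - P_error.
    """
    return 1.0 - 10 ** (-q / 10.0)

# Q10 -> ~90% accuracy, Q20 -> 99%, Q30 -> 99.9%
for q in (10, 20, 30):
    print(q, phred_accuracy(q))
```

This is why a modest-looking jump from Q10 to Q20 represents a ten-fold drop in the expected error rate.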


The community is also more knowledgeable about genes and genetics as a result of the many media reports or access to the Internet. This means the level of detail requested by patients and families will challenge health professionals. The same standard of counseling services must be provided to those living in rural or remote regions. In this environment, traditional one-to-one, face-to-face counseling may not be feasible. One way to address these expectations is through computer-based education and telehealth initiatives.

As will be discussed in Chapter 5, the Internet is used to deliver DNA tests directly to consumers. Now direct-to-consumer counseling services are being advertised through the Internet or are available by telephone (Table 4.4). Significant concerns have been expressed about the bypassing of health professionals in the marketing of DNA tests. Nevertheless, there

TABLE 4.4 Some clinically relevant resources available online. Note: all web-based references accessed on 16 Feb 2012.

Online Mendelian Inheritance in Man (OMIM) (www.ncbi.nlm.nih.gov/omim): A must for any clinician dealing with genetic diseases. Reputable and regularly updated. Links to DNA and protein information and databases.

National Cancer Institute's (NCI) Information Service (www.cancer.gov/aboutnci/cis/page1/print?page=&keyword): Evidence-based summaries providing the genetic basis for various cancers.

NCI's Breast cancer risk assessment tool (www.cancer.gov/bcrisktool/): Interactive tool for health professionals to measure a woman's risk of invasive breast cancer.

Canadian Diabetes Association Website (www.diabetes.ca): Wide-ranging information for patients and health professionals dealing with diabetes.

National Centre for Biotechnology Information (NCBI) (www.ncbi.nlm.nih.gov/About/primer): A science primer providing useful summaries of many topics in genomics. NCBI also hosts PubMed and OMIM.

NCBI's GeneClinics (www.ncbi.nlm.nih.gov/sites/GeneTests/?db=GeneTests): Provides information for diagnosis, management and counseling for genetic disorders.

Pharmacogenomics Knowledge Base (PharmGKB) (www.pharmgkb.org/): Comprehensive database of information about pharmacogenetics/pharmacogenomics, including a list of drugs with genetic information available.

Gene Therapy Clinical Trial Site (www.wiley.co.uk/genmed/clinical/): Lists gene therapy studies undertaken worldwide.

Internet genetic counseling service (www.informeddna.com/): Advertises through the Internet for genetic counseling to be delivered by telephone.

DECIPHER (http://decipher.sanger.ac.uk/): A database of phenotypes associated with genetic disorders caused by chromosomal abnormalities. This is increasingly a challenge as techniques such as aCGH identify many new submicroscopic changes.

BioInform – Genomeweb (www.genomeweb.com/newsletter/bioinform/): Started as a bioinformatics news service but now deals with broader issues.

Medline Plus® (www.nlm.nih.gov/medlineplus/ency/article/001657.htm): Health information from the US National Library of Medicine and NIH.

CSHL Dolan DNA Learning Center (www.dnalc.org/resources/animations/): A series of around 30 animations on many molecular medicine topics. Many other educational resources also available.

Family health program (https://familyhistory.hhs.gov): Family history program.


are also lessons to be learnt, in particular how more effective use can be made of the electronic media in delivering clinical services. While the Internet is essential for educating patients, families and the community at large, the risk of cyberchondriasis increases as information previously found only in specialized medical journals or books is now readily available to all.

Another new paradigm is the online doctor-patient consultation or eConsultation. Apart from privacy and confidentiality issues related to Internet traffic, this approach has many advantages for the patient and the physician when it comes to simple problems including repeat prescriptions or communicating the results of tests. However, there are medico-legal issues to be overcome, since electronic communication can make it more difficult to assess how well a patient has understood the information provided, or the physician may not have a complete picture of the clinical problem from an email.

As access to the Internet increases, there will be more pressure for eConsultations to become a part of clinical practice. In response, professional bodies such as the American Medical Association and the American Medical Informatics Association have developed guidelines on how electronic communication should be used. Recently, a review compared the use of emails between physicians and patients in 2008 versus 2005. It found that overall there has been little change and, perhaps surprisingly, there seemed to be less interest in taking up this method of communication by physicians in 2008 [15]. Of concern was an apparent decrease in adherence to best practice guidelines. These trends would seem inconsistent with the rapid developments that are occurring in personalized medicine.

OTHER OMICS

Although the focus of Molecular Medicine is predominantly the genome, transcriptome and the epigenome, the contributions from other omics provide a more complete picture. As shown in Table 1.12, the list of omics has expanded dramatically. For the purpose of Molecular Medicine, some of the more conventional omics are described below, although it is exciting to think about the prospects for new approaches such as venomics (Box 4.5) or the concept of cocainomics. The emergence of omics has made it essential to understand how genes interact in complex biological models. This is now possible through systems biology.

Proteomics

Proteomics is the analysis of the total proteins (proteome) expressed by a cell, tissue, biological fluid or organism. Important distinctions between genomics and proteomics include:

1. Proteomic biomarkers are present in biological fluids such as plasma, serum and urine, as well as in cells and tissues;

2. The proteome is not static, but constantly changing in response to both endogenous and exogenous stimuli;

3. The proteome will differ in different cells and tissues;

4. Added complexity results from protein conformation and post-translational modifications, and

5. There is no technique comparable to PCR that allows minute amounts of a protein to be amplified for ease of assay.

The above points show that the proteome more closely resembles the transcriptome than the genome.

The surprising observation that the human genome has far fewer genes than originally anticipated (from 100 000 at the beginning of the Human Genome Project to the contemporary view of around 20 000) remains to be explained. Earlier it was believed that the most direct way to understand our complex proteome (millions of proteins versus tens of thousands of genes) was to characterize genes, and from this


understand the proteins. Methods to discover and sequence genes made this achievable.

This idea now needs to be re-assessed, because the protein-coding DNA (about 1–2% of the genome) does not explain sufficient variability, or even the human phenome, and there must be something else occurring at the level of the genome/transcriptome/epigenome to account for the comparable number of genes across both vertebrates and invertebrates (Table 1.7). Hence, effort is increasingly being directed back to the study of proteins. Proteomics has also been revitalized by important technological developments, particularly the evolution of 2-dimensional protein gel electrophoresis into the higher resolution liquid chromatography and mass spectrometry.

Technology

Although the term proteomics was coined in the mid-1990s, a limitation to its development was the difficulty in sequencing a protein. This became even more apparent when DNA sequencing methods improved as the Human Genome Project progressed. Today, advances in mass spectrometry (MS) combined with liquid chromatography (LC) have underpinned important developments in proteomics, metabolomics and lipidomics [17]. Generally, two methods are used to identify proteins: (1) proteins in a complex mix are digested into peptides, separated by chromatography and analyzed, or (2) protein mixtures are first separated and then analyzed without any prior digestion. In both cases the analysis is undertaken with mass spectrometry.

In mass spectrometry, the mass-to-charge ratio (m/z) of gas phase ions is measured. From this a mass spectrum is developed to identify a substance. Typically, in a strategy called shotgun proteomics, a protein (or even a number of proteins) is digested into peptides and separation is undertaken by passage through a liquid chromatography (LC) column, before the product is introduced into the mass spectrometer (hence

New paradigms for drug discovery are needed, and one approach is the identification of novel peptides. What better place to look than the diverse venoms found in many invertebrates and vertebrates? Apart from snakes and some spiders, venoms have been ignored or have proven too difficult to study because of the minute amounts present. It is thought that there are about 41 000 species of spiders, which could provide over 12 million biologically active peptides. Currently, only about 600 peptides have been described [16]. The potential of omics-based technologies to study minute quantities of venom provides new opportunities. It would be possible to combine both proteomic and genomic strategies to identify many more targets, from the small polyamines found in some spiders to the complex and large proteins found in other venoms. The familiar approach where data (DNA or peptide sequences) are compared against various databases to help in identification will be less helpful in venomics, because many of the peptides in venoms are unique. In these circumstances, the entire sequence has to be obtained de novo, and then the challenge is to determine the various conformations, including disulphide linkages, that are important for functionality. Interesting times are ahead and no doubt more opportunities will arise for bioinformatics-based modeling to assist in determining function.

BOX 4.5

VENOMICS.


LC-MS). The peptides are next ionized and vaporized. Ionization can occur by techniques such as electrospray (ES) or matrix-assisted laser desorption/ionization (MALDI). Ionized peptides in a high vacuum system are then exposed to a laser beam. The laser blasts off the ionized peptides and they fly down a vacuum tube towards an oppositely charged electrode. There are various ways to measure the m/z, a popular one being TOF (time of flight), hence MALDI-TOF. It is also possible to refine the analysis further through tandem MS (MS/MS). This serial analysis allows some of the peptides from the first mass scan to be rescanned. Mass spectrometers now enable the mass of peptides (or metabolites) to be determined rapidly and accurately. The result is a spectrum based on the various m/z ratios generated, with the height of each peak in that spectrum approximating the abundance of that particle.
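The m/z ratio described above follows directly from the peptide's neutral mass and the number of protons it carries: m/z = (M + z × m_proton)/z, with the proton mass about 1.00728 Da. A small sketch of the arithmetic (the 1000.5 Da peptide mass is a made-up example value):

```python
PROTON_MASS = 1.007276  # Da; one proton is added per unit of charge

def mz(neutral_mass: float, charge: int) -> float:
    """m/z observed for a peptide carrying `charge` extra protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge

# A hypothetical peptide of neutral monoisotopic mass 1000.5 Da,
# observed as singly and doubly charged ions:
print(round(mz(1000.5, 1), 4))  # 1001.5073
print(round(mz(1000.5, 2), 4))  # 501.2573
```

This is why the same peptide appears at several different positions in a spectrum, one per charge state, which instruments and software use to infer the neutral mass.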

Bioinformatics-based algorithms then take the MS data and allow peptides to be identified through comparisons with known peptides in the databases. Once high throughput methods became available to characterize proteins accurately, it was necessary to develop databases comparable to the ones used to store DNA data. Despite these developments, the proteomic databases remain inferior to the genomic ones because they are limited by substrate access, since proteins need to be isolated from relevant tissues (in contrast to germline DNA, which is identical in all tissues).
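The database comparison step amounts to asking whether a measured mass falls within a small tolerance (often expressed in parts per million) of any theoretical peptide mass. A minimal sketch, with a toy two-entry "database" and hypothetical names and masses (real search engines score whole fragment spectra, not single masses):

```python
def ppm_error(measured: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

def match_peptides(measured_masses, database, tol_ppm=10.0):
    """Return (measured mass, name) pairs where the theoretical mass
    lies within tol_ppm of a measured mass."""
    hits = []
    for m in measured_masses:
        for name, theo in database.items():
            if abs(ppm_error(m, theo)) <= tol_ppm:
                hits.append((m, name))
    return hits

# Toy database of theoretical peptide masses (values are illustrative only)
db = {"PEPTIDE_A": 1479.7954, "PEPTIDE_B": 2211.1040}
print(match_peptides([1479.7960, 1500.0], db))  # only PEPTIDE_A matches
```

Tightening `tol_ppm` reduces false matches at the cost of missing peptides whose masses were measured slightly off, which mirrors the sensitivity/specificity trade-off in real search software.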

Bioinformatic analysis of amino acid sequences and protein function prediction follows along the lines described above for DNA, although it is more complex [11]. The amino acid sequence of the protein determines its ultimate conformation and so its biological function. However, the protein's final shape can be influenced by other variables, particularly the physicochemical environment in which the amino acids or protein exist and the structural and functional contexts for the amino acids or protein. This means that predicting protein shape from its linear amino acid sequence is presently not possible in silico. Protein shape can be looked at in terms of known protein structures that have previously been determined through X-ray crystallography or nuclear magnetic resonance, using a resource such as the PDB database (Table 4.3). Software programs including FASTA and BLASTP are used to perform these comparisons. In trying to predict protein function, use can be made of evolutionary relationships to proteins whose structure has already been determined.
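Tools such as BLASTP report, among other statistics, the percent identity of an alignment, which is one simple proxy for the evolutionary conservation mentioned above. As a minimal sketch (not the BLASTP algorithm itself, which performs heuristic local alignment and substitution-matrix scoring), identity between two already-aligned sequences can be counted column by column; the sequences below are invented for illustration:

```python
def percent_identity(a: str, b: str) -> float:
    """Column-wise identity of two pre-aligned, equal-length sequences.

    Gap characters ('-') never count as matches. Real tools first
    compute the alignment; this sketch assumes it is already done.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)

# Two hypothetical aligned protein fragments differing at one column
print(percent_identity("MKT-AILV", "MKTSAILV"))  # 87.5
```

High identity to a protein of known structure is what licenses the inference, made in the text, from a solved structure to the function of an uncharacterized relative.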

Applications

Biomarker discovery: A biomarker is a biological measure such as a compound (usually a protein) that can be used to improve diagnosis, detect risk, or follow disease progress or the effects of a treatment. Considerable effort has gone into biomarker discovery in diseases such as Alzheimer disease and Parkinson disease. Although these two neurodegenerative disorders have distinct phenotypes, they have overlapping features [18]. Apart from attempts to find biomarkers based on medical imaging, a lot of work has gone into the examination of body fluids, particularly cerebrospinal fluid, to identify protein and other biomarkers.

This field is still evolving and shares some similarities with gene association studies: biomarkers can be identified, but determining their functional significance is the challenge and limitation. As in cancer, the progression of neurodegenerative disorders is complicated by coexisting secondary changes, such as inflammation, cell death and perhaps regeneration. Unlike genomic markers, protein biomarkers measured in different tissues or fluids will give different results. The changes found are dynamic and easily influenced by environmental factors, so it is not surprising that proteomic profiles are often not reproducible between studies. Nevertheless, the potential of MS-based strategies to identify and quantify biomarkers will add to the vast quantities of data being generated.


Protein microarrays: These generally rely on the capture of peptides or proteins using antibody immunoassays. Commercial kits are now available and provide functional analysis in areas such as inflammation, signal transduction, phosphorylation and so on [19]. Claims are made that combinations of protein biomarkers can be used to distinguish cancer from other conditions, and it is inevitable that a contentious screening marker such as PSA (prostate specific antigen) will be replaced by biomarkers with greater specificity and sensitivity.

Drug development: Proteomics is an important entry into drug discovery and development as ultimately it is the protein that is the effector in disease. Applications for proteomics in drug discovery include:

1. Interrogating databases, as these contain many peptides and proteins that can help to identify novel targets, or to model different structures, protein-protein interactions and post-translational modifications;

2. Utilizing biomarkers to assist in all stages in drug development including the monitoring of efficacy and toxicity, and

3. Producing cheaper or novel drugs.

For example, knowledge of protein structure can be used to make synthetic (cheaper) products, exemplified by the antimalarial drug artemisinin, or novel therapeutics (Box 4.6).

Interactome

Related to the proteome is the interactome, which describes all the protein-to-protein interactions within a cell, tissue, fluid or organism. It is usually expressed as a directed graph and is an attempt at a systems biology approach (see below). This can be illustrated with the mature red blood cell, which does not have a nucleus and so has a relatively simple proteome and interactome because there is little mRNA. Apart from carrying oxygen, the red blood cell has to cross narrow capillaries by changing its shape, and must also cope with hypertonic conditions. The earliest investigations of its proteome took place in 2002 using 2D electrophoresis and MALDI-TOF, and identified 102 proteins. Today, the numbers have dramatically increased to around 1 989 proteins involving 15 major red blood cell pathways and 50 major networks. The interactome identified has confirmed and demonstrated the key functions of the red blood cell and how they are maintained, including:

1. Surviving oxidative stress because of the constant exposure to high oxygen levels;

2. Requiring the cytoskeleton to unfold, and

3. Apoptosis pathways important for the red blood cell's aging process [20].
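The "directed graph" representation of an interactome mentioned above can be sketched very simply: nodes are proteins, and each edge points from a protein to one it acts on. The example below uses a handful of invented edges loosely themed on red blood cell biology (it is not a real curated network), and shows how node degree picks out candidate hubs:

```python
from collections import defaultdict

# Toy directed interactome: each edge points from a protein to a partner
# it acts on. Edges are illustrative only, not real curated interactions.
edges = [
    ("HBB", "HBA1"), ("SPTA1", "SPTB"), ("ANK1", "SPTB"),
    ("CAT", "SOD1"), ("SOD1", "HBB"),
]

out_degree = defaultdict(int)  # how many partners a protein acts on
in_degree = defaultdict(int)   # how many proteins act on it
for src, dst in edges:
    out_degree[src] += 1
    in_degree[dst] += 1

# Proteins acted on by more than one partner are candidate hubs
hubs = [p for p, d in in_degree.items() if d > 1]
print(hubs)  # here only SPTB, which receives two edges
```

Real interactome analyses run the same kind of graph computation at the scale of thousands of proteins, where highly connected hubs often correspond to functionally critical components.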

Metabolomics

Metabolomics refers to the total number of small molecular mass organic compounds found in or produced by cells, tissues, fluids or an organism. Polymerized structures such as proteins and nucleic acids are excluded. Molecules that make up the metabolome are called metabolites [21]. The closely related term metabonomics is included under this definition (see Table 1.12). The human endogenous metabolome is estimated to contain a few thousand species.

Investigating the metabolome utilizes similar approaches to those described for proteomics, although it is complicated by significant dynamic changes. For example, measuring the metabolome requires consideration of environmental factors such as drugs, dietary compounds and even pollutants [21]. This potential for background noise is an additional challenge for experimental design and bioinformatic analysis.

Mass spectrometry has previously been described as a core technology for proteomics and metabolomics. However, for the latter any one single approach is usually insufficient. Another technology used to measure the metabolome is NMR spectroscopy (NMR – nuclear


magnetic resonance). NMR detects nuclear spin, which is found in atoms with an odd mass number, e.g. ¹H or ³¹P [21]; more precisely, nuclear spin is detectable in atoms containing an odd number of protons and/or neutrons in the nucleus. Using NMR spectroscopy, metabolites can be identified by the chemical shift in their resonance frequencies. As with MS, the position of a peak identifies the product, while its height gives an indication of quantity. Generally, this approach has poor sensitivity.

Artemisinin exemplifies how an expensive natural product can be synthesized more cheaply. It is isolated from the plant Artemisia annua, and in combination with other antimalarials it is used to treat multi-drug resistant malaria. However, it is expensive to isolate and there are uncertainties associated with growing this plant. These constraints make it unattainable in the developing countries where it is most needed. A synthetic precursor product was made in 2006 using a rDNA approach (Chapter 8), but this was not sufficiently active and needed changes to its structure. Now, with funding from the Bill and Melinda Gates Foundation and involvement of the biopharmaceutical company Sanofi-Aventis, researchers from the University of California are attempting to make a synthetic product that will cost around $1 per dose. It will be reliably produced and not subject to weather and other conditions that impact on the native plant that is the current source of this product.

The next two examples involve targeted therapies, where drug use is limited to patients who satisfy specific requirement(s) based on protein or DNA tests from tumor tissue.

Imatinib (Gleevec®) is a small molecule specifically developed to inhibit tyrosine kinase (TK) activity. It was originally produced in response to the bcr/abl translocation in chronic myeloid leukemia, which generates a fusion gene with unregulated TK activity. Imatinib binds close to the ATP binding site specific to the bcr-abl product and so inhibits its TK activity. More recently this drug has been approved for use in gastrointestinal stromal tumors, because these are associated with activating mutations in the KIT gene (a receptor TK). The successful introduction of imatinib has led to a number of other TK inhibitors being developed, including gefitinib, nilotinib and dasatinib. Although they all work through the same inhibition of ATP binding, the second generation products differ in their targeted kinases. In some cases, the newer products are now preferred as a front line treatment. TK inhibitors have been shown to be effective in a number of cancers, and they are now being trialed in non-malignant diseases, including pulmonary hypertension, rheumatoid arthritis and other conditions.

Trastuzumab (Herceptin®) is a humanized monoclonal antibody against the human epidermal growth factor receptor type 2 (HER2). Following discovery of the HER2 gene and its related protein, it was shown that this biomarker (amplification of the gene or overexpression of its protein product) could identify a subgroup of breast cancer patients with a poor prognosis. Hence a targeted therapy was developed for patients with metastatic breast cancer who were unresponsive to conventional therapies. It is associated with significant side effects and so is preferentially used in patients who are most likely to respond – i.e. those with HER2 overexpressing breast cancer.

BOX 4.6

DRUGS DEVELOPED THROUGH MOLECULAR TECHNOLOGIES.


Another technique used in metabolomics is gas chromatography linked to mass spectrometry (GC-MS). Here the sample (containing volatile, non-polar metabolites) is vaporized and passed through a chromatograph in the gas phase, before being analyzed by MS. More recently, the LC-MS approach described earlier has become the preferred approach for investigating the metabolome.

The metabolome is dependent on the genome, the transcriptome and the proteome, as well as the environment; hence it provides additional information that may be useful for biomarker development or for understanding physiologic and disease pathways. Examples of how metabolites are being studied to explain drug toxicity (hepatic and renal), as well as to identify biomarkers in a range of human disorders, are given in [21].

Phenomics

The phenome is the entire set of phenotypes in a cell, tissue, organ, organism or species. It is derived by systematic measurement of phenotypic contributors, including qualitative and quantitative traits, allowing it to be defined on a much broader, whole-body scale. As the accuracy of genomic-based measurements improves, more attention is being paid to the phenotype, which remains the critical variable in any genetics or genomics study. Confounding factors in genetic studies include pleiotropy, penetrance, epistasis, and allelic and locus heterogeneity. These effects should be considered in designing research protocols but cannot be avoided. In contrast, errors in the phenotype occurring because of phenocopies can be avoided, or their effects lessened, by more careful assessment of the phenotype [22]. An example is the genetic disorder thalassemia and acquired iron deficiency: both have similar phenotypes in terms of the hematologic profile but are usually distinguishable with care.
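The thalassemia/iron-deficiency phenocopy mentioned above is a case where a simple quantitative rule helps separate look-alike phenotypes. One widely cited (though imperfect) screening heuristic is the Mentzer index, MCV divided by the red cell count, with values below about 13 favoring thalassemia trait and above 13 favoring iron deficiency. A hedged sketch, using made-up but typical-looking blood count values:

```python
def mentzer_index(mcv_fl: float, rbc_millions_per_ul: float) -> float:
    """Mentzer index = MCV (fL) / RBC count (millions/uL).

    Values below ~13 favor thalassemia trait; above ~13 favor iron
    deficiency. A screening heuristic only, not a diagnostic rule.
    """
    return mcv_fl / rbc_millions_per_ul

# Microcytosis with a high red cell count (typical of thalassemia trait)
print(round(mentzer_index(62.0, 5.8), 1))  # ~10.7, below the ~13 cut-off
# Microcytosis with a low-normal red cell count (typical of iron deficiency)
print(round(mentzer_index(72.0, 4.0), 1))  # 18.0, above the cut-off
```

This illustrates the text's point: a phenocopy error is avoidable precisely because a more careful, quantitative look at the phenotype pulls the two conditions apart.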

Human Models

The concept of deep-phenotyping is used to explain ways in which the human phenome might be generated [22]. For this, it is necessary to document more comprehensive clinical and investigative parameters, with preference for the generation of quantitative data. A heat map can be generated to allow statistical assessment of what might be overlapping syndromes (Figure 4.5). The human genome, with its 3 billion bases each drawn from four possible nucleotides, is relatively straightforward compared

FIGURE 4.5 A heat map to define a human phenome. The heat map is generated by placing a phenotype class or disease along one axis (X in this case) and phenotypic characteristics on the Y axis. In this example, A to E represent 5 phenotypically similar disorders, while the numbers 1 to 8 are characteristics derived from the phenotypes in these disorders. A two color heat map is shown with red ↑ intensity/prevalence of the characteristic compared to a reference range; blue ↓ intensity/prevalence of the characteristic; white absent characteristic. A pink or light blue color would suggest a less conclusive phenotype. Based on the patterns shown, it would appear that disorders A and C are similar; A and D share some similarities, while A and E, and to a lesser extent A and B, are different. This more rigorous assessment of phenotype would help in selecting subjects for a case control association study or better define the underlying disorders. See [22] for examples using one and two color heat maps.
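The visual comparison done with a heat map can be made quantitative by correlating the characteristic profiles of two disorders: similar disorders (like A and C in Figure 4.5) give a correlation near +1, dissimilar ones a low or negative value. A sketch with a toy phenome matrix whose intensity values are invented for illustration:

```python
# Toy phenome matrix: each disorder maps to intensities of four
# characteristics relative to a reference. Values are illustrative only.
profiles = {
    "A": [2.0, 1.5, 0.2, 1.8],
    "C": [2.1, 1.4, 0.3, 1.7],  # profile resembling A
    "E": [0.1, 0.2, 2.5, 0.4],  # profile unlike A
}

def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(round(pearson(profiles["A"], profiles["C"]), 3))  # close to +1
print(round(pearson(profiles["A"], profiles["E"]), 3))  # strongly negative
```

Clustering disorders by such correlations is essentially what a two-color heat map shows at a glance, and the same scores could be used to select phenotypically homogeneous subjects for a case control study.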



to the human phenome, which apart from its potential complexity, will contain components yet to be defined. A Human Phenome Project akin to the Human Genome Project would be significantly more complex because of the intrinsic difficulty in determining, both qualitatively and quantitatively, what components should be included. In the meantime there have been many initiatives to catalog human phenotypes and phenomes, including the publication of personal genomes from members of the public as well as celebrities. Considerable progress has been made in understanding the phenome through animal studies.

Animal Models

Unlike humans, animals can be manipulated experimentally and bred under specific conditions. Some animal models of disease arise spontaneously, but a more useful approach is to produce the required phenotype experimentally, which allows the natural history of a disorder to be followed over many generations while various interventions are tried.

Traditional animal models: For many years, inbred strains of animals, particularly the laboratory mouse, have been important tools for studying a wide range of human disorders. Inbred mice are produced by repeated sister-brother matings over about 20 generations. The end result is a syngeneic mouse, which will be identical (i.e. homozygous) at every genetic locus, both within itself and relative to other mice of the same strain. Another type of inbred mouse is the congenic one. Although derived from one strain, selective breeding allows this animal to have genetic material from a second strain at a single locus. Naturally-derived animal models provide considerable information, but they have limitations; for instance, the mutation may not be representative of that found in the human disorder. Importantly, there are many diseases for which a suitable animal model does not exist.
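The "about 20 generations" figure for producing an inbred strain can be checked with the classical population-genetics recurrence for full-sib mating, H_t = H_{t-1}/2 + H_{t-2}/4, which gives the expected fraction of loci still heterozygous each generation. A sketch (the fully heterozygous starting condition is a simplifying assumption):

```python
def sibmating_heterozygosity(generations: int) -> float:
    """Expected fraction of loci still heterozygous after repeated
    full-sib matings, via the classical recurrence
    H_t = H_{t-1}/2 + H_{t-2}/4.

    Assumes (for simplicity) a fully heterozygous starting population.
    """
    h_prev2, h_prev1 = 1.0, 1.0
    for _ in range(generations):
        h_prev2, h_prev1 = h_prev1, h_prev1 / 2 + h_prev2 / 4
    return h_prev1

# After 20 generations only ~1.4% of loci remain heterozygous,
# i.e. the strain is essentially homozygous throughout.
print(round(sibmating_heterozygosity(20), 4))  # ~0.0137
```

Heterozygosity shrinks by roughly 19% per generation under sib mating, which is why around 20 generations suffice to call a strain syngeneic.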

Transgenic mouse: Recombinant DNA (rDNA) methods provide a way to create new animal models or manipulate existing ones to test the function of genes (Box 4.7). The rDNA approaches can be divided into two strategies: reverse or genotype driven animal models, and forward or phenotype driven models. The reverse strategy is essentially the transgenic animal, i.e. manipulating a specific gene in a mouse will provide information about a disease. The gene driven strategies require a priori knowledge of likely gene function. In contrast, the forward strategy makes no prior assumptions and focuses on the disease (phenotype); from this, knowledge of the underlying genomic changes can be gained. An example of the forward approach is the ENU mouse.

ENU mouse: ENU (N-ethyl-N-nitrosourea) is a potent germline mutagen that is used to generate single nucleotide mutations in DNA. Using this chemical, it is possible to create random mutations in mouse DNA, and then observe the resulting phenotypes. Those which resemble human diseases are studied to identify the relevant gene. From this, the human homolog can be isolated. Difficulties with this model include a preference for ENU-induced mutations to occur at A-T base pairs, so that mutations at G-C sites are under-represented. Because there is no prior information, detecting the various phenotypic changes, particularly subtle ones, is challenging [24].

Zebrafish: Danio rerio is an attractive model organism because of its small size, short life cycle and ease of growth. It is easier to work with in terms of gene identification, since its genome is half the size of the human or mouse genome. It is a particularly good model for studying development because the embryos are transparent and develop outside the mother's body, so they can be studied in real time. In the zebrafish, antisense approaches to gene manipulation have been used successfully to knock out genes and then observe the effects on the phenotype (Chapter 8). Zebrafish can be used to evaluate drug toxicity by direct release of the drug into the fish tank and observation of the effects.


Transgenic mice have become an invaluable resource for understanding human disease. Three types are available:

1. The conventional transgenic mouse is produced by microinjection of DNA into the pronucleus of a fertilized oocyte, which is then inserted into a pseudopregnant foster mother. In this model, the injected transgene is randomly inserted into the genome. Despite this, it can still function and its expression will produce a new phenotype. Foreign DNA that has become integrated into the germline of what is now a chimeric mouse enables the gene to be transmitted to progeny. Appropriate matings will produce homozygotes containing the transgene.

2. Embryonic stem cells also allow a gene to be targeted to its appropriate locus and to replace its normal wild-type counterpart by homologous recombination, i.e. integration into the genome is no longer random. Gene function can be inhibited (knock-out mouse) or the effect of a specific gene or gene mutation can be observed (knock-in mouse) (Figure 4.6) (see Chapter 8 for discussion of homologous recombination).

3. The two types of transgenics so far described represent an all-or-nothing effect, and there is widespread expression of the transgene in many tissues. Therefore, it is difficult to investigate subtle phenotypic changes or distinguish primary from secondary effects. The uncontrolled expression of the transgene

FIGURE 4.6 Embryonic stem (ES) cells for in vivo expression of recombinant DNA. This method produces transgenic mice which are used to test the function of genes in vivo. (1) ES cells are transfected with foreign DNA. ES cells will take up DNA into different random sites in the mouse genome. In a very rare instance, the integration will have occurred into the correct site in the genome by homologous recombination. (2) Colonies of ES cells are grown. (3) DNA is isolated from pools of colonies. (4) The colony which has DNA integrated into the correct position in the genome by homologous recombination can be identified by PCR (marked in red here). (5) ES cells with the homologously recombined DNA are injected into mouse blastocysts. (6) Using different colored mice as sources of ES cells (e.g. white mouse) and blastocysts (e.g. black mouse) will enable chimeric (white and black) mice to be distinguished. If the transgene has also integrated into the germline, it will be possible to obtain a homozygous animal by breeding [23] (Chapter 8 has further discussion on ES cells).

BOX 4.7

T R A N S G E N I C M O U S E M O D E L S .

Page 29: Molecular Medicine || Omics

4. OmICs

MOLECULAR MEDICINE

145

during embryonic development could also be lethal if it is not normally expressing at this time. To improve on these limitations, it is now possible to make a conditional knock-out mouse, which means that the inserted gene can be switched on or off conditional to a specific stimulus. One approach to make a conditional transgenic mouse utilizes what

is called the Cre-lox system (Figure 4.7). A summary of gene targeting, homologous recombination and the Cre-lox system is found in the citation for the 2007 Nobel Prize in Physiology or Medicine awarded to M Capecchi, M Evans and O Smithies for their work in homologous recombination and transgenic mice [23].

BOX 4.7 (cont’d )

FIGURE 4.7 Cre-lox system to generate a conditional transgenic mouse. Cre (causes recombination) recombinase enables recombinations to be made where there are recombinase recognition sites called loxP (locus of recombination). (1) The floxed transgenic (flanked by lox) is produced by the usual embryonic stem cell homologous recombination approach, but in this case the gene of interest is constructed so that it is flanked by loxP sites. Mice with this transgene are bred to homozygosity, but have no phenotypic changes because the Cre recombinase is needed. (2) To introduce the Cre recombinase requires breeding to a Cre-expressing transgenic mouse. This transgenic has Cre under the control of a promoter which can be tissue or time specific. For example, using the cardiac myosin promoter will mean the gene will express only in cardiac tissue. By introducing into the promoter an element requiring a drug such as tetracycline, it becomes possible to turn on the Cre gene only when there is exposure to tetracycline. (3) Offspring of the Cre/floxed mating, on exposure to tetracycline, will allow targeted recombination to occur and so inhibit gene function (i.e. a knock-out). Because this is tissue or time specific, it allows some control of the transgene expression and avoids the potential for lethality [23].
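The conditional logic of the Cre-lox system described in Figure 4.7 can be caricatured in a few lines of code. This is a toy truth-table sketch, not a biological simulation; the tissue, promoter and drug conditions below follow the text's cardiac/tetracycline example but are otherwise illustrative assumptions.

```python
# Toy model of conditional (Cre-lox) gene inactivation.
# Assumption: Cre is driven by a cardiac promoter that also
# requires tetracycline exposure, as in the example above.

def target_gene_active(floxed: bool, cre_present: bool,
                       tissue: str, tetracycline: bool) -> bool:
    """Return True if the floxed target gene is still functional."""
    cre_expressed = cre_present and tissue == "heart" and tetracycline
    excised = floxed and cre_expressed      # loxP-flanked gene removed
    return not excised

# The knock-out occurs only where and when Cre is switched on:
assert target_gene_active(floxed=True, cre_present=True,
                          tissue="liver", tetracycline=True)   # wrong tissue
assert target_gene_active(floxed=True, cre_present=True,
                          tissue="heart", tetracycline=False)  # no drug
assert not target_gene_active(floxed=True, cre_present=True,
                              tissue="heart", tetracycline=True)
```

The point of the sketch is that the knock-out requires the conjunction of the floxed allele, the Cre transgene, the right tissue and the external stimulus, which is exactly what gives the experimenter spatial and temporal control.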

[Figure 4.7 diagram: a floxed transgenic crossed with a liver Cre transgenic yields a liver-specific mutant responsive to an external stimulant.]


toxic effects in embryos or adult fish. For drug discovery, mutant zebrafish can be exposed to various compounds and disease-suppressing effects sought as markers for novel drugs. Zebrafish mutants produced by ENU have also proven to be useful models for human disorders (Table 4.3) [25].

Metagenomics

The human microbiota refers to the community of microbes that lives in symbiosis with its host. The set of genes encoded by the microbiota is called the microbiome. Humans have four major microbiomes – gut, skin, oral cavity and reproductive tract. Metagenomics refers to the sequencing of uncultured microorganisms in various environmental niches to provide a snapshot of the microbial populations, thereby allowing their biodiversity to be studied.

The nonpathogenic human gut bacterial flora has been described as the third major genome of mammals after nuclear and mitochondrial DNA, with the difference being that it can change. The human gastrointestinal tract has a diverse bacterial flora in terms of both number and species. It is the site for important mutually beneficial interactions including digestion and immunity. The numbers quoted for the gut flora are impressive – 500 different species, diversity greater than that found in the skin, oral cavity or reproductive tract, and a cumulative microbiome that is 100 times larger than the mammalian nuclear genome [26]. Since many of the gut flora cannot be cultured, the only option for identifying new species and cataloging those present is NG DNA sequencing.

Although the gut microbiome is important for normal health, it is also implicated in inflammatory bowel diseases such as Crohn disease, ulcerative colitis and irritable bowel syndrome. Differences in the microbiomes for these conditions could indicate a breakdown in the tolerance normally existing between microbes and the gut mucosa, leading to inflammation [26]. Animal studies have also shown that the gut microbiome might contribute to obesity, thereby broadening the concept that obesity is a product of nutritional and genetic factors (Chapter 6). The efficacy of complementary medicines, such as taking probiotics to enhance the beneficial bacteria in the gut, can now be better assessed by NG DNA sequencing approaches.

Human Microbiome Project

The goals of the NIH-sponsored Human Microbiome Project read like a mini Human Genome Mapping Project:

1. Determine if individuals share a common human microbiome;

2. Understand if changes in the human microbiome can be correlated with human health;

3. Develop new technologies and bioinformatics tools; and

4. Address ELSI raised by human microbiome research (Table 4.3).

The Human Microbiome Project utilizes two strategies developed through metagenomics. DNA present in a particular environment is isolated using degenerate PCR primers to amplify all 16S or 18S ribosomal RNA (rRNA) species, representing prokaryotes and eukaryotes respectively. Since these RNA species contain highly conserved regions, an overview of what is present can be obtained. Alternatively, DNA or RNA is prepared from the pool of microorganisms, subcloned, amplified, and then NG DNA sequencing is used to give an overview of what is present. Both approaches rely on final identification through in silico comparisons with protein, DNA and RNA sequences already in the databases.
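The rRNA-based strategy amounts to matching each amplified sequence against reference 16S/18S records and reporting the closest hit. A minimal sketch of that idea follows, using simple per-base identity over a handful of invented stand-in sequences; real pipelines use curated rRNA databases and proper alignment tools, which are beyond this toy.

```python
# Toy taxonomic assignment: score each read against reference
# rRNA sequences by fractional identity and keep the best hit.
# The reference sequences are invented stand-ins, not real 16S data.

REFERENCES = {
    "Escherichia-like":   "ACGTACGTTGCA",
    "Bacteroides-like":   "ACGGACGTAGCA",
    "Lactobacillus-like": "TTGTACGTTGGA",
}

def identity(a: str, b: str) -> float:
    """Fraction of matching positions over the shared length."""
    n = min(len(a), len(b))
    return sum(a[i] == b[i] for i in range(n)) / n

def classify(read: str):
    """Return (best_taxon, identity) for a read."""
    return max(((t, identity(read, s)) for t, s in REFERENCES.items()),
               key=lambda pair: pair[1])

taxon, score = classify("ACGTACGTTGCA")
print(taxon, round(score, 2))  # Escherichia-like 1.0 (exact match)
```

In practice the conserved regions of the rRNA gene anchor the primers, while the variable regions between them provide the discriminating positions that make this kind of best-hit comparison informative.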

The challenges for bioinformatics in metagenomics are significant [27]. Sequencing a single organism was only achieved in 1995 (Table 4.1), but today it is relatively easy to provide a complete picture of any organism's genomic structure with the assistance of bioinformatics. Metagenomic approaches, on the other hand, are considerably more difficult because there will be a mixture of sequences representing many organisms, and the sequences themselves will be relatively short because they have been generated by NG DNA sequencing. Thus, there is a growing demand for better software, and for the skills to process and then analyze the data from metagenomics studies.
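One reason mixed short reads are hard to analyze is that they must first be binned by likely organism of origin. A common composition-based trick is to compare k-mer profiles, since genomes differ in their oligonucleotide usage. The sketch below is a hedged illustration: the choice of k = 3, the L1 distance and the toy sequences are all arbitrary assumptions, not a production binning method.

```python
from collections import Counter

def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Normalized k-mer frequencies of a sequence."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return Counter({kmer: c / total for kmer, c in counts.items()})

def distance(p: Counter, q: Counter) -> float:
    """L1 distance between two k-mer profiles (missing keys count as 0)."""
    keys = set(p) | set(q)
    return sum(abs(p[kmer] - q[kmer]) for kmer in keys)

# Reads from compositionally similar genomes have closer profiles:
a = kmer_profile("ACGACGACGACGACG")
b = kmer_profile("ACGACGTCGACGACG")
c = kmer_profile("TTTTGGGGTTTTGGGG")
print(distance(a, b) < distance(a, c))  # True
```

Grouping reads this way before assembly reduces the chance of chimeric contigs that splice together fragments from different organisms.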

More recently the viral metagenome (virome) has been studied in different environments. This work is technically more challenging because there is no reference point equivalent to the ubiquitous 16S rRNA genes found in prokaryotes. Nevertheless, different viromes are being characterized to identify the pathogens present. Interesting results are already emerging, with over 50% of the DNA or RNA sequences unknown. Ultimately, it is expected that new insights into virus-host interactions will become possible. For example, knowledge of viral ecology could be used for monitoring emerging infections or assessing water quality [28].

SYSTEMS BIOLOGY

Systems biology is the computational reconstruction of biological systems [29]. It is based on an interdisciplinary approach that involves holistic rather than reductionist strategies to understand complex interactions in biological systems. In this way quantitative models can be developed to predict function and behavior in a system. In biology the drivers for systems biology include omics-based data sets that have been integrated through advanced computer science and computational analyses (Figure 4.8). The ultimate output would be the production of a virtual cell.

Genomics and proteomics data sets are found in the literature and in many databases, notably EMBL, GenBank, DDBJ and Ensembl (nucleotides), and UniProtKB/Swiss-Prot and the Protein Data Bank (proteins) (Table 4.3), while Medline and PubMed offer computerized access to the scientific literature [29]. Having mined these resources, the data need to be analyzed for function by homology searching (DNA and protein) or, in the case of proteins, by identifying particular domains. Predicting protein structure is more difficult as most structures remain unknown; inference may only be possible. Each of the data sets (for example genome, transcriptome, proteome, metabolome and phenome) provides information and allows the construction of networks, but none gives the complete picture on its own. Merging all the information together and developing integrated models requires additional bioinformatic input. This is needed to assemble the data sets into some form of network that is consistent with the model under study, and then to convert the network into a computational model that can be tested in silico against specified biological parameters. Ultimately, it will be necessary to validate the model through in vivo studies. Successful applications of systems biology require multidisciplinary contributions, particularly from biology, mathematics, engineering and physics.
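The merging step described above can be sketched as a toy graph built from pairwise associations, with candidate functional modules read off as connected components. The node names and links below are invented for illustration; real integration layers draw on curated interaction databases.

```python
from collections import defaultdict

# Hypothetical pairwise associations mined from different omics layers.
edges = [
    ("GENE1", "PROT1"),   # transcript-to-protein link
    ("PROT1", "PROT2"),   # protein-protein interaction
    ("PROT2", "METAB1"),  # enzyme-to-metabolite link
    ("GENE9", "PROT9"),   # an unrelated module
]

graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

def module(start: str) -> set:
    """All nodes reachable from start: one candidate functional module."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node] - seen)
    return seen

print(sorted(module("GENE1")))  # ['GENE1', 'METAB1', 'PROT1', 'PROT2']
```

No single omics layer contains this module; it only emerges once the gene, protein and metabolite links are joined, which is the essence of the network-assembly step in systems biology.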

It has been suggested that there are two approaches in systems biology: (1) top down – by computer modeling and simulation; and (2) bottom up – integrating all clinical, laboratory and imaging data. The latter would have particular relevance to the clinic.

Clinical Applications

In medical practice, an approach comparable to systems biology is already followed, since clinical, family, laboratory and imaging data sets are all considered in decision making. However, this is ad hoc, not validated and derived informally. From being theoretical constructs, research-based systems biology strategies can now be simulated in silico, becoming more robust and reproducible as evidence is accumulated. Today, there is growing interest in developing a more systematic approach, underpinned by bioinformatics, in concepts such as systems pharmacology and systems pathology. The former seeks to develop a whole-organism understanding of drug action. To do so requires a thorough understanding of the drug's potential effects, generated by input from clinical markers, animal models, and the effects of the drug on cells, tissues and organs. Interacting networks can then be modeled in silico, and all data are used to better understand the effects of drugs on an individual, including drug-drug interactions.
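As a much-simplified instance of the in silico modeling described here, the sketch below uses a one-compartment pharmacokinetic model in which a hypothetical second drug halves the elimination rate constant of the first. All rate constants and the interaction itself are invented for illustration; this is not a validated pharmacological model.

```python
# One-compartment PK: dC/dt = dose_rate - k_el * C, integrated by Euler.
# A hypothetical interacting drug halves the elimination rate constant.

def simulate(k_el: float, dose_rate: float = 1.0,
             hours: float = 24.0, dt: float = 0.01) -> float:
    """Return drug concentration after `hours` of constant infusion."""
    c, t = 0.0, 0.0
    while t < hours:
        c += (dose_rate - k_el * c) * dt   # Euler step
        t += dt
    return c

alone = simulate(k_el=0.4)
with_inhibitor = simulate(k_el=0.2)   # clearance halved by the interaction
print(with_inhibitor > alone)          # True: drug exposure rises
```

Even this caricature captures why an interaction matters clinically: steady-state concentration is dose rate divided by the elimination constant, so halving clearance roughly doubles exposure, and a network of such models is what a systems pharmacology platform would explore before any trial.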

Clinical trials will then be required to test any relevant observations. In some circumstances it may not be possible to recruit the numbers needed for a statistically significant clinical trial (for example, for drug-drug interactions) and in silico modeling may be the only option. Other advantages of a systems pharmacology approach would be the generation of decision-making software tools for the clinician, and the identification of potential new targets for drug development.

Systems pathology follows along similar lines and provides a more global approach to managing complex systems such as cancer. Examples where this would be helpful are: (1) PSA (prostate specific antigen) screening for detecting early prostate cancer, and for predicting the outcome of treatment; and (2) early stage cancer, when the primary tumor is removed but the patient is left with a dilemma in terms of what adjuvant therapies (if any) are needed to reduce the risk of relapse.

FIGURE 4.8 A representation of systems biology. Left of arrow: Symbols represent individual data or data sets generated through omics. However, isolated data sets per se do not identify the complex interactions that might be occurring. Information may only be meaningful if it can be linked together. Right of arrow: Systems biology utilizes computer-based algorithms to join related data sets in terms of metabolic pathways or function. This produces a better understanding of the 3-dimensional picture in the cell or tissue.

Traditional surgical, biochemical, molecular, imaging and pathological markers for predicting outcomes are still limited in their utility. Systems pathology implies that a more global assessment of markers and their interactions will allow various biological networks or dynamics to be found. Following validation, bioinformatics-based algorithms can be developed to identify treatment options personalized to the tumor or the patient.
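A minimal sketch of such a bioinformatics-based algorithm is a score that combines heterogeneous markers into a single risk estimate. The markers, weights and logistic form below are assumptions chosen for illustration; a real systems pathology tool would be trained and validated on outcome data before clinical use.

```python
import math

# Hypothetical weights for a logistic risk score combining markers.
WEIGHTS = {"psa": 0.8, "tumor_grade": 1.2, "gene_signature": 1.5}
INTERCEPT = -4.0

def relapse_risk(markers: dict) -> float:
    """Logistic combination of marker values into a risk between 0 and 1."""
    z = INTERCEPT + sum(WEIGHTS[name] * value
                        for name, value in markers.items())
    return 1.0 / (1.0 + math.exp(-z))

low = relapse_risk({"psa": 1.0, "tumor_grade": 1.0, "gene_signature": 0.0})
high = relapse_risk({"psa": 4.0, "tumor_grade": 3.0, "gene_signature": 1.0})
print(high > low)  # the combined markers separate the two patients
```

The design point is that no single marker decides the outcome; it is the weighted combination, validated against follow-up data, that would guide whether adjuvant therapy is warranted.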

Some successes are emerging:

1. In hereditary ataxias, seemingly unrelated findings (derived from known abnormal proteins secondary to gene mutations, complex protein-protein interaction networks and related pathways) have been connected, showing that these neurological disorders are likely to result from RNA splicing defects that promote the death of Purkinje cells.

2. Parkinson disease involves at least six genes and many different pathways in its pathogenesis. There was no unifying hypothesis of how these interacted to cause brain damage until a more global picture, based on genomic and proteomic data, identified mitochondrial pathways as being important [30].

OVERVIEW

A number of concepts have been described in Chapters 1 to 4, each having fine distinctions in terminology – such as molecular medicine, genomic medicine and personalized medicine. An attempt at connecting them is made in Figure 4.9. Whatever the distinctions, a common thread linking them is technology, which remains an important driver for new discoveries. In this environment, a robust mechanism to evaluate clinical utility or effectiveness is essential.

Traditionally, new drugs or diagnostic tests are assessed within a population. Evidence-based medicine (EBM) approaches, such as randomized clinical trials (RCTs), allow the evaluation of product safety. However, most RCTs (and the same applies to GWAS) measure efficacy as an outcome – i.e. does something work or not. RCTs are generally conducted under ideal conditions, so that the strict requirements set by regulators can be met. As we learn more about human variation, particularly at the DNA level, and about differences in susceptibility to disease, it is evident that population stratification within RCTs might provide more reliable data. The ultimate in stratification is represented by the individual in his or her own environment, which is likely to be less than ideal. The RCT is difficult in this respect, and newer approaches are needed, particularly for molecular medicine, which will invariably involve gene plus environment (G x E) effects.

Comparative Effectiveness Research (CER) is an additional evaluative approach. It was given a boost in the USA with a new Act in 2009 providing $1.1 billion to fund its implementation. CER involves a direct comparison of existing health interventions (DNA genetic tests or genetic therapies in the present context), and the examination of outcomes in a real-life environment with effectiveness as the end point – i.e. does an intervention do what it claims to do in ordinary circumstances [31]. Gathering data for CER can be via traditional RCTs and systematic reviews, as well as other means. An important medical intervention is the NG DNA whole genome or exome sequence. But does it have clinical utility? Case reports in the rare genetic disorders would suggest that NG DNA sequencing is clinically effective (Box 4.8). However, these disorders are rare and the numbers are not there for an RCT. A CER approach might be a better way to make an assessment.
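The point that "the numbers are not there for an RCT" in rare disorders can be made concrete with a standard two-proportion sample-size approximation. The effect sizes, alpha = 0.05 and power = 0.8 below are illustrative choices, not figures from the text.

```python
import math

def n_per_arm(p1: float, p2: float) -> int:
    """Approximate patients per arm to detect outcome rates p1 vs p2
    (two-sided alpha = 0.05, power = 0.8, normal approximation)."""
    z_a, z_b = 1.96, 0.84   # normal quantiles for alpha/2 and power
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

# Even a very large treatment effect (30% vs 70% response) needs
# about two dozen patients per arm; subtler effects need far more --
# numbers many rare Mendelian disorders can never recruit.
print(n_per_arm(0.30, 0.70))
```

Shrinking the effect to, say, 50% versus 60% pushes the requirement into the hundreds per arm, which is why CER and case-based evidence become the practical alternatives for rare diseases.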


FIGURE 4.9 Relationship between molecular, genomic and personalized medicines. Molecular medicine describes the use of DNA (RNA) based knowledge to inform clinical practice, although the impact of other omics must be considered. Genomic medicine is a recent term for what is essentially the same activity, although the name implies a more restricted focus on DNA. Populations are traditionally used to assess new therapies or models of care. Underpinning this is evidence-based medicine via randomized clinical trials (RCTs). Outcomes produce a one-size-fits-all view, which is a very different philosophy to personalized medicine. The latter is reached via population stratification, and the evidence comes from the traditional RCT as well as other methodologies such as comparative effectiveness research (CER). Drivers for molecular medicine are technology and industry, with the immediate goal being whole genome sequencing (WGS). Outcomes include a range of DNA genetic and genomic tests and a renewed drug development pipeline through pharmacogenomics. Success will depend on an appropriate business model that is attractive to those who hold the health dollars; interest and understanding by health professionals; and an educated and engaged community.



There is a place for both RCTs and CER in molecular medicine, but flexibility is needed so that either or both may be appropriate depending on the potential utility of a discovery.

References

[1] Next steps in the sequence: the implications of whole genome sequencing for health in the UK. PHG Foundation 2011. www.phgfoundation.org/reports/10364/

[2] Mardis ER. A decade's perspective on DNA sequencing technology. Nature 2011;470:198–203.

[3] Archon X Prize. http://genomics.xprize.org/

[4] Ashley EA, Butte AJ, Wheeler MT, et al. Clinical assessment incorporating a personal genome. Lancet 2010;375:1525–35.

[5] Morgan JE, Carr IM, Sheridan E, et al. Genetic diagnosis of familial breast cancer using clonal sequencing. Human Mutation 2010;31:484–91.

[6] Miller MB, Tang Y-W. Basic concepts of microarrays and potential applications in clinical microbiology. Clinical Microbiology Reviews 2009;22:611–33.

[7] Cardoso F, Van’t Veer L, Rutgers E, et al. Clinical application of the 70-gene profile: The MINDACT trial. Journal of Clinical Oncology 2008;26:729–35.

[8] Miller DT, Adam MP, Aradhya S, et al. Consensus statement: Chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. American Journal of Human Genetics 2010;86:749–64.

[9] Formal HTA on aCGH for the genetic evaluation of patients with developmental delay/mental retardation or autism spectrum disorder. http://www.bcbs.com/blueresources/tec/vols/23/acgh-genetic-evaluation.html

[10] Origin of the Internet from the Internet Society. http://www.isoc.org/internet/history/brief.shtml

[11] Tramontano A. Bioinformatics. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2009.

[12] Calo V, Bruno L, La Paglia L, et al. The clinical significance of unknown sequence variants in BRCA genes. Cancers 2010;2:1644–60.

[13] Blumenthal D, Glaser JP. Information technology comes to medicine. New England Journal of Medicine 2007;356:2527–34.

Two recent success stories demonstrate how whole genome sequencing or exome sequencing can provide invaluable input into the diagnosis and treatment of rare genetic disorders.

The first involves a severely affected male child aged 15 months with an acute colitis resembling Crohn disease. Known causes were sought using conventional investigations, including DNA sequencing of potential candidate genes. All failed to give an answer until a whole exome sequencing strategy was used. It identified a hemizygous missense change in the gene XIAP (X-linked inhibitor of apoptosis). This gene plays a key role in the pro-inflammatory pathway, and the change represents a novel mechanism for developing Crohn disease. On the basis of this, an allogeneic hematopoietic cell transplant was performed and the child's gastrointestinal disease resolved [32].

The second case involved two non-identical twins aged 14 years. They had been diagnosed at age 5 as having DRD (dopamine-responsive dystonia) and were treated with L-dopa. However, their condition deteriorated and whole genome sequencing was undertaken. This showed two mutations (a missense change and a premature stop codon) in the SPR gene, which had previously been associated with DRD. As a result of this observation, the L-dopa treatment was supplemented with 5-hydroxytryptophan, which bypassed the SPR gene defect. This led to clinical improvement in both twins [33].

BOX 4.8  THE EFFECTIVENESS OF NG DNA SEQUENCING IN MANAGING RARE DISEASES


[14] Ullman-Cullere MH, Mathew JP. Emerging landscape of genomics in the electronic health record for personalized medicine. Human Mutation 2011;32:512–6.

[15] Menachemi N, Prickett CT, Brooks RG. The use of physician-patient email: a follow-up examination of adoption and best-practice adherence 2005–2008. Journal of Medical Internet Research 2011;13:e23.

[16] Escoubas P, King GF. Venomics as a drug discovery platform. Expert Review of Proteomics 2009;6:221–4.

[17] Griffiths WJ, Wang Y. Mass spectrometry: from proteomics to metabolomics and lipidomics. Chemical Society Reviews 2009;38:1882–96.

[18] Shi M, Caudle WM, Zhang J. Biomarker discovery in neurodegenerative diseases: a proteomic approach. Neurobiology of Disease 2009;35:157–64.

[19] Yu X, Schneiderhan-Marra N, Joos TO. Protein microarrays for personalized medicine. Clinical Chemistry 2010;56:376–87.

[20] D’Alessandro A, Righetti PG, Zolla L. The red blood cell proteome and interactome: an update. Journal of Proteome Research 2010;9:144–63.

[21] Roux A, Lison D, Junot C, Heilier J-F. Applications of liquid chromatography coupled to mass spectrometry-based metabolomics in clinical chemistry and toxicology: A review. Clinical Biochemistry 2011;44:119–35.

[22] Lanktree MB, Hassell RG, Lahiry P, Hegele RA. Phenomics: expanding the role of clinical evaluation in genomic studies. Journal of Investigative Medicine 2010;58:700–6.

[23] 2007 Citation for the Nobel Prize in Physiology or Medicine. http://nobelprize.org/nobel_prizes/medicine/laureates/2007/advanced.html

[24] Acevedo-Arozena A, Wells S, Potter P, et al. ENU mutagenesis, a way forward to understand gene function. Annual Review of Genomics and Human Genetics 2008;9:49–69.

[25] Lieschke GJ, Currie PD. Animal models of human disease: zebrafish swim into view. Nature Reviews Genetics 2007;8:353–67.

[26] Carroll IM, Threadgill DW, Threadgill DS. The gastrointestinal microbiome: a malleable, third genome of mammals. Mammalian Genome 2009;20:395–403.

[27] Wooley JC, Godzik A, Friedberg I. A primer on metagenomics. PLOS Computational Biology 2010;6:e1000677.

[28] Rosario K, Breitbart M. Exploring the viral world through metagenomics. Current Opinion in Virology 2011;1:1–9.

[29] Kersey P, Apweiler R. Linking publication, gene and protein data. Nature Cell Biology 2006;8:1183–9.

[30] Villoslada P, Steinman L, Baranzini SE. Systems biology and its application to the understanding of neurological diseases. Annals of Neurology 2009;65:124–39.

[31] Khoury MJ, Rich EC, Randhawa G, Teutsch SM, Niederhuber J. Comparative effectiveness research and genomic medicine: An evolving partnership for 21st century medicine. Genetics in Medicine 2009;11:707–11.

[32] Worthey EA, Mayer AN, Syverson GD, et al. Making a definitive diagnosis: successful clinical application of whole exome sequencing in a child with intractable inflammatory bowel disease. Genetics in Medicine 2011;13:255–62.

[33] Bainbridge MN, Wiszniewski W, Murdock DR, et al. Whole genome sequencing for optimized patient management. Science Translational Medicine 2011;3:87re3.

Note: All web-based references accessed on 16 Feb 2012.