5
Stoffel RH, Randall RR, Premont RT, et al: 1994. Palmitoylation of G protein– coupled receptor kinase, GRK6. Lipid modification diversity in the GRK family. J Biol Chem 269:27791–27794. Takagi C, Urasawa K, Yoshida I, et al: 1999. Enhanced GRK5 expression in the hearts of cardiomyopathic hamsters, J2N-k. Biochem Biophys Res Commun 262:206 –210. Trivedi M & Lokhandwala MF: 2005. Rosigli- tazone restores renal D1A receptor-Gs pro- tein coupling by reducing receptor hyper- phosphorylation in obese rats. Am J Physiol Renal Physiol 289:F298 –F304. Umrani DN, Banday AA, Hussain T, & Lokhandwala MF: 2002. Rosiglitazone treatment restores renal dopamine recep- tor function in obese Zucker rats. Hyper- tension 40:880 – 885. Usui I, Imamura T, Babendure JL, et al: 2005. G protein– coupled receptor kinase 2 medi- ates endothelin-1–induced insulin resis- tance via the inhibition of both Galphaq/11 and insulin receptor substrate-1 pathways in 3T3-L1 adipocytes. Mol Endocrinol 19: 2760 –2768. Usui I, Imamura T, Satoh H, et al: 2004. GRK2 is an endogenous protein inhibitor of the insulin signaling pathway for glucose transport stimulation. EMBO J 23:2821– 2829. Vinge LE, von Lueder TG, Aasum E, et al: 2008. Cardiac-restricted expression of the carboxyl-terminal fragment of GRK3 un- covers distinct functions of GRK3 in regu- lation of cardiac contractility and growth: GRK3 controls cardiac alpha1-adrenergic receptor responsiveness. J Biol Chem 283: 10601–10610. Völkers M, Weidenhammer C, Herzog N, et al: 2011. The inotropic peptide ARKct improves AR responsiveness in normal and failing cardiomyocytes through G()- mediated L-type calcium current disinhibi- tion. Circ Res 108:27–39. Watanabe H, Xu J, Bengra C, et al: 2002. Desensitization of human renal D1 dopa- mine receptors by G protein– coupled re- ceptor kinase 4. Kidney Int 62:790 –798. White DC, Hata JA, Shah AS, et al: 2000. Preservation of myocardial beta-adrener- gic receptor signaling delays the develop- ment of heart failure after myocardial infarction. Proc Natl Acad Sci U S A 97:5428 –5433. Zhou RH, Pesant S, Cohn HI, et al: 2009. Negative regulation of VEGF signaling in human coronary artery endothelial cells by G protein– coupled receptor kinase 5. Clin Transl Sci 2:57– 61. PII S1050-1738(12)00272-1 TCM Challenges in Medical Applications of Whole Exome/Genome Sequencing Discoveries Ali J. Marian* Despite the well-documented influence of genetics on susceptibility to cardiovascular diseases, delineation of the full spectrum of the risk alleles had to await the development of modern next-generation sequencing technologies. The techniques provide unbiased approaches for identifica- tion of the DNA sequence variants (DSVs) in the entire genome (whole genome sequencing [WGS]) or the protein-coding exons (whole exome sequencing [WES]). Each genome contains approximately 4 million DSVs and each exome approximately 13,000 single nucleotide variants. The challenge facing researchers and clinicians alike is to decipher the biolog- ical and clinical significance of these variants and harness the information for the practice of medicine. The common DSVs typically exert modest effect sizes, as evidenced by the results of genome-wide association studies, and hence have modest or negligible clinical implications. The focus is on the rare variants with large effect sizes, which are expected to have stronger clinical implications, as in single gene disorders with Mendelian patterns of inheritance. However, the clinical implications of the rare variants for common complex cardiovascular diseases remain to be established. The most important contribution of WES or WGS is in delineation of the novel molecular pathways involved in the pathogenesis of the phenotype, which would be expected to provide for preventive and therapeutic opportunities. (Trends Cardiovasc Med 2012;22:219 –223) © 2012 Elsevier Inc. All rights reserved. • Introduction The advent of massively parallel nucleic acid sequencing or next-generation se- quencing (NGS) technologies has had an enabling impact in delineating the ge- netic makeup of each individual. NGS has raised considerable interest in po- tential applications of the information embedded in the DNA sequence of each genome in the practice of medicine. The NGS platforms have afforded the oppor- tunity to sequence the entire genome of an individual (whole genome sequenc- ing [WGS]), which is composed of 3.2 billion nucleotides, within a week. Like- wise, it has enabled sequencing of the entire protein-coding region, referred to as exome, which is composed of approx- imately 180,000 exons in approximately 23,500 genes and 30 million nucleotides, in dozens of individuals within a few days (whole exome sequencing [WES]). The enormity of this progress is remark- Ali J. Marian is at the Center for Cardiovas- cular Genetics, Brown Foundation Institute of Molecular Medicine, The University of Texas Health Science Center and Texas Heart Institute, Houston, TX 77030, USA. *Address correspondence to: Ali J. Marian, MD, Center for Cardiovascular Genetics, Brown Foundation Institute of Molecular Medicine, The University of Texas Health Sciences Center, Texas Heart Institute, 6770 Bertner St, Suite C900A, Houston, TX 77030, USA. Tel.: (1) 713 500 2350; fax: (1) 713 500 2320; e-mail: [email protected]. Published online 24 August, 2012. © 2012 Elsevier Inc. All rights reserved. 1050-1738/$-see front matter TCM Vol. 22, No. 8, 2012 219

Challenges in Medical Applications of Whole Exome/Genome Sequencing Discoveries

  • Upload
    ali-j

  • View
    217

  • Download
    3

Embed Size (px)

Citation preview

Page 1: Challenges in Medical Applications of Whole Exome/Genome Sequencing Discoveries

rights reserved.

Stoffel RH, Randall RR, Premont RT, et al:1994. Palmitoylation of G protein–coupledreceptor kinase, GRK6. Lipid modificationdiversity in the GRK family. J Biol Chem269:27791–27794.

Takagi C, Urasawa K, Yoshida I, et al: 1999.Enhanced GRK5 expression in the hearts ofcardiomyopathic hamsters, J2N-k. BiochemBiophys Res Commun 262:206–210.

Trivedi M & Lokhandwala MF: 2005. Rosigli-tazone restores renal D1A receptor-Gs pro-tein coupling by reducing receptor hyper-phosphorylation in obese rats. Am J PhysiolRenal Physiol 289:F298–F304.

Umrani DN, Banday AA, Hussain T, &Lokhandwala MF: 2002. Rosiglitazonetreatment restores renal dopamine recep-tor function in obese Zucker rats. Hyper-tension 40:880 – 885.

Usui I, Imamura T, Babendure JL, et al: 2005.G protein–coupled receptor kinase 2 medi-ates endothelin-1–induced insulin resis-tance via the inhibition of both Galphaq/11and insulin receptor substrate-1 pathwaysin 3T3-L1 adipocytes. Mol Endocrinol 19:2760–2768.

Usui I, Imamura T, Satoh H, et al: 2004. GRK2is an endogenous protein inhibitor of theinsulin signaling pathway for glucosetransport stimulation. EMBO J 23:2821–2829.

Vinge LE, von Lueder TG, Aasum E, et al:2008. Cardiac-restricted expression of thecarboxyl-terminal fragment of GRK3 un-covers distinct functions of GRK3 in regu-lation of cardiac contractility and growth:GRK3 controls cardiac alpha1-adrenergicreceptor responsiveness. J Biol Chem 283:10601–10610.

Völkers M, Weidenhammer C, Herzog N, etal: 2011. The inotropic peptide �ARKctimproves �AR responsiveness in normaland failing cardiomyocytes through G(��)-mediated L-type calcium current disinhibi-tion. Circ Res 108:27–39.

Watanabe H, Xu J, Bengra C, et al: 2002.Desensitization of human renal D1 dopa-mine receptors by G protein– coupled re-ceptor kinase 4. Kidney Int 62:790 –798.

White DC, Hata JA, Shah AS, et al: 2000.Preservation of myocardial beta-adrener-gic receptor signaling delays the develop-ment of heart failure after myocardialinfarction. Proc Natl Acad Sci U S A97:5428 –5433.

Zhou RH, Pesant S, Cohn HI, et al: 2009.Negative regulation of VEGF signaling inhuman coronary artery endothelial cells byG protein–coupled receptor kinase 5. ClinTransl Sci 2:57–61.

PII S1050-1738(12)00272-1 TCM

TCM Vol. 22, No. 8, 2012

Challenges in Medical Applications ofWhole Exome/Genome SequencingDiscoveriesAli J. Marian*

Despite the well-documented influence of genetics on susceptibility tocardiovascular diseases, delineation of the full spectrum of the risk alleleshad to await the development of modern next-generation sequencingtechnologies. The techniques provide unbiased approaches for identifica-tion of the DNA sequence variants (DSVs) in the entire genome (wholegenome sequencing [WGS]) or the protein-coding exons (whole exomesequencing [WES]). Each genome contains approximately 4 million DSVsand each exome approximately 13,000 single nucleotide variants. Thechallenge facing researchers and clinicians alike is to decipher the biolog-ical and clinical significance of these variants and harness the informationfor the practice of medicine. The common DSVs typically exert modesteffect sizes, as evidenced by the results of genome-wide association studies,and hence have modest or negligible clinical implications. The focus is onthe rare variants with large effect sizes, which are expected to have strongerclinical implications, as in single gene disorders with Mendelian patternsof inheritance. However, the clinical implications of the rare variants forcommon complex cardiovascular diseases remain to be established. Themost important contribution of WES or WGS is in delineation of the novelmolecular pathways involved in the pathogenesis of the phenotype, whichwould be expected to provide for preventive and therapeutic opportunities.(Trends Cardiovasc Med 2012;22:219–223) © 2012 Elsevier Inc. All

• Introduction

The advent of massively parallel nucleicacid sequencing or next-generation se-quencing (NGS) technologies has had an

Ali J. Marian is at the Center for Cardiovas-cular Genetics, Brown Foundation Instituteof Molecular Medicine, The University ofTexas Health Science Center and Texas HeartInstitute, Houston, TX 77030, USA.

*Address correspondence to: Ali J. Marian,MD, Center for Cardiovascular Genetics,Brown Foundation Institute of MolecularMedicine, The University of Texas HealthSciences Center, Texas Heart Institute, 6770Bertner St, Suite C900A, Houston, TX 77030,USA. Tel.: (�1) 713 500 2350; fax: (�1) 713500 2320; e-mail: [email protected] online 24 August, 2012.

© 2012 Elsevier Inc. All rights reserved.

1050-1738/$-see front matter

enabling impact in delineating the ge-netic makeup of each individual. NGShas raised considerable interest in po-tential applications of the informationembedded in the DNA sequence of eachgenome in the practice of medicine. TheNGS platforms have afforded the oppor-tunity to sequence the entire genome ofan individual (whole genome sequenc-ing [WGS]), which is composed of 3.2billion nucleotides, within a week. Like-wise, it has enabled sequencing of theentire protein-coding region, referred toas exome, which is composed of approx-imately 180,000 exons in approximately23,500 genes and 30 million nucleotides,in dozens of individuals within a fewdays (whole exome sequencing [WES]).

The enormity of this progress is remark-

219

Page 2: Challenges in Medical Applications of Whole Exome/Genome Sequencing Discoveries

able: Whereas the draft sequence of thehuman genome (the Human GenomeProject), which was completed about adecade ago, took about 11 years and costapproximately $3 billion (Lander et al.2001), the modern NGS instruments cangenerate up to 600-gigabase output persequencing run in less than 2 weeks andat the cost of several thousand dollars.The output is sufficient to sequence sev-eral human genomes and several dozenexomes at an excellent average coverageper read. The exciting advances alongwith improvements in bioinformaticstools, which provide for the analysis of amassive amount of sequence reads gen-erated by the sequencing instruments,have led to the dawning of the genetic-based practice of medicine. Despitethese advances, however, a number ofimportant obstacles have to be resolvedbefore WGS or WES is routinely utilizedin the practice of medicine.

• Genetic Etiology of Susceptibilityto Cardiovascular Diseases

Clinical phenotypes are the conse-quences of complex, nonlinear, and of-ten stochastic interactions among vari-ous etiological determinants, includinggenetics factors. It is probably accurateto state that there is no phenotype thatdoes not have a genetic etiological com-ponent. It is equally important to realizethat there is no phenotype—Mendelianor otherwise—that is not influenced bythe nongenetic factors. The magnitudeof contribution of the genetic factors tothe clinical phenotypes typically followsa gradient that ranges from minimal tolarge (Marian 2009). It is best illustratedin familial segregation, which is indicativeof the heritable component of the pheno-type. Heritability, which is the fraction ofobserved variability in the phenotype thatis attributed to genetic differences amongthe population with the trait, is best esti-mated through twin and family studies.It is strongest for Mendelian diseases,wherein the presence of the risk allele issufficient to cause the phenotype, eventhough the phenotype is also influencedby a number of other genetic and nonge-netic factors. In complex traits, however,the risk allele is neither sufficient nor nec-essary for the disease to manifest. Never-theless, heritability estimates of the com-plex cardiovascular diseases, estimatedfrom monozygotic and dizygotic twins

studies, are quite variable and typically

220

range from 30% to 80% (Marian 2012).The well-documented heritability of thecardiovascular phenotypes is the basis forintense interest in genome-wide associa-tion studies (GWAS), WGS, or WES todelineate the genetic basis for susceptibil-ity to disease, response to therapy, and theclinical outcomes. GWAS, which by de-sign analyze the common variants, haveled to identification of more than 200susceptibility loci for cardiovascular dis-eases (http://www.genome.gov/gwastudies).However, alleles identified throughGWAS account for only a small fractionof heritability of the complex cardiovas-cular diseases. Hence, the emphasis hasshifted toward identification and analy-sis of the rare variants through WES/WGS approaches, which might havelarger effect sizes.

• Abundance of DNA SequenceVariants in the Human Genome

The human genome is composed of 3.2billion pairs of nucleotides, of whichapproximately 4 million nucleotides arepolymorphic in a given individual—thatis, have a different nucleotide at a givenposition than the reference sequence.However, in a population of approxi-mately 10,000-15,000 individuals, ap-proximately 1 in every 20 nucleotides ispolymorphic (�5% of the population ge-nome) and the vast majority of the poly-morphic alleles are exceedingly rare(Keinan and Clark 2012, Nelson et al.2012). With the exception of monozy-gotic twins, no two individuals are ge-netically (ie, at the DNA sequence level)identical. On average, each human ge-nome contains approximately 4 millionDNA sequence variants (DSVs), of whichapproximately 3.5 million are single nu-cleotide variants (SNVs) (Levy et al.2007, Ng et al. 2008, Pennisi 2010, Wanget al. 2008, Wheeler et al. 2008). Inaddition, the genome contains thou-sands of small insertions/deletions andlarge segments of DNA duplications, in-sertions, deletions, and rearrangements,which are collectively referred to asstructural variations (SVs) (Kidd et al.2008, Korbel et al. 2007). SVs that in-crease or decrease the two copies of thegenes or chromosomal segments are re-ferred to as copy number variants(CNVs). The vast majority of the DSVs inthe human genome are SNVs, and aconsiderable number of them are unique

to an individual (Table 1). Almost half of

the genes in each individual genome arepolymorphic—that is, the two copies ofthe genes are not precisely identical atthe DNA sequence level (Levy et al.2007). In addition, each genome/exomecontains approximately 13,000 nonsyn-onymous variants (nsSNVs) that — bydefinition — affect the amino acid se-quence in the encoded proteins andhence, might exert biological effects(Levy et al. 2007, Ng et al. 2008, Wang etal. 2008, Wheeler et al. 2008). The com-monly used bioinformatics tools for pre-dicting the functional consequences ofthe DSVs—outside of nonsense, frame-shift, and splice junction variants—of-ten do not offer concordant results (Ten-nessen et al. 2012). On average, 2.3% of13,595 SNVs identified in each exomeare predicted to affect protein functionby multiple bioinformatics tools (Ten-nessen et al. 2012). Thus, each exomecarries several hundred to several thou-sand SNVs that are considered poten-tially damaging. Likewise, each genomecontains approximately 100-120 loss-of-function (LoF) variants, of which ap-proximately 20 are homozygous andhence inactivate the corresponding gene(MacArthur et al. 2012). Approximately25-35 heterozygous and 2 or 3 homozy-gous variants affect stop codons andlead to inactivation of one or two copiesof the corresponding gene. Approxi-mately 50-100 variants in each genomeare known to be associated with inher-ited disorders, and approximately 30variants in each genome are de novo (ie,absent in the parents). The plethora ofDSVs, including putatively functionalDSVs, highlights the complexity of thegenetic diversity of humans, which isaccentuated by the recent acceleratedexpansion of the human population andintroduction of a very large number ofnew alleles in each generation. Thenewly introduced alleles are, as ex-pected, rare, often private, and expectedto be more deleterious because of anadequate filtering by natural selection.This complexity complicates clinical ap-plications of the DNA sequencing data.

• Medical DNA Sequencing

The strength of the NGS in offering anunbiased approach to identification ofDSVs in an individual genome or exome,in terms of medical applications of thefindings, has to be considered in the

context of the enormous genetic diver-

TCM Vol. 22, No. 8, 2012

Page 3: Challenges in Medical Applications of Whole Exome/Genome Sequencing Discoveries

sity of humans and the presence of avery large number of DSVs in each ge-nome or exome. Further complicatingthe clinical implications is the unknowncontributions of these variants to thephenotype. In considering the clinicalutility of NGS data, the effect sizes of theDSVs, which follow a gradient, have ma-jor implications (Marian 2009). By andlarge, rare DSVs exert larger effect sizesand hence are more likely to be patho-genic, as in diseases with Mendelianpatterns of inheritance. In contrast,common DSVs typically exert modesteffect sizes and hence have modest, ifany, clinical utility at an individual level,even though their population-attribut-able risk might be greater, simply be-cause of their abundance (Marian andBelmont 2011). For common complexdiseases, WES/WGS, in the best-casescenario, might lead to identification ofa clinically meaningful risk allele for atleast one disease (Roberts et al. 2012).However, the majority of individuals arelikely to have negative WES/WGS re-sults, but the negative predictive value ofsuch tests would be relatively small forcommon complex diseases (Roberts etal. 2012). Overall, the direct clinical im-pact of NGS is expected to be greater forsingle gene diseases, whereas the indi-rect impact—through delineation of theresponsible mechanisms for the patho-genesis of the phenotype—is equally im-portant for single gene and complexpolygenic traits. The complexity of ap-plying the NGS data to medical practicehas two major components: the imper-

Table 1. DNA sequence variants in

Nucleotides (base pairs)Protein-coding genesNo. of exonsSize of exome (base pairs)DSVsSNPsSVs/CNVsnsSNPsnsSNP potentially damagingLoF variantsHomozygous LoF variantsVariants known to be associated with inStop codon variantsHomozygous stop codon variantsDe novo variants

CNVs, copy number variants; DSVs, DNA senonsynonymous single nucleotide polymorphSVs, structural variants.

fectness of the NGS technologies and

TCM Vol. 22, No. 8, 2012

difficulty in identifying the clinically sig-nificant alleles.

Technical Challenges

At the technical level, the error rate ofallele calling (ie, false positives) is prob-ably the most important aspect of NGSbecause even a small error rate leads toan inflated number of false positives inthe genome simply because of the size ofthe human genome. The lowest sequenc-ing error rate of 10–4 per nucleotide withthe best available platforms is sufficientto introduce a large number of falsepositive calls. In the best circumstances,the false positive rate is approximately5%, which means that of approximately4 million variants detected in each ge-nome, 200,000 allele calls would be er-roneous. This would complicate clinicalimplications of the discoveries. Perhapsthe simplest way to reduce the numberof false positives in WES/WGS is toincrease the number of reads per eachnucleotide or read (ie, the coverage).Typically, there is an inverse relation-ship between the average coverage rateof reads and the number of false-positivealleles, albeit within a limit, becausesome errors might simply get amplifiedwith increasing average. Likewise, cer-tain genomic regions might be moreprone to false-positive calls; hence, re-sequencing alone might not be sufficientto eliminate such calls. In addition, in-creasing the coverage or re-sequencing,at least with the current instruments,adds to the cost of DNA sequencing as

e human genome

3.2 � 109

23,500180,00030 � 106

4 � 106

3.5 � 106

104-105

10,000-13,000100s-1,000s

12020

ited diseases 50-10025-352-330

nce variants; LoF, loss of function; nsSNPs,s; SNPs, single nucleotide polymorphisms;

well as the complexity of managing the

massive amount of data and the bioin-formatics analysis. Nevertheless, a cov-erage rate that offers the maximum con-fidence in accurate allele calling isessential in medical sequencing. Giventhe size of the whole genome, a highermean nucleotide coverage rate in WGSwould be necessary to reduce the tech-nical difficulties in accurate allele call-ing. The mean coverage rate for WES istypically much higher because of thesmaller size of the exome as opposed tothe genome. Also relevant to medicalsequencing is inadequate capture of1000-2000 genes for variant detectionbecause the current WES offers ade-quate capture for approximately 80-90%of the exons. Hence, this imperfect sen-sitivity of the NGS is another source forpotential missed diagnosis (Kiezun et al.2012).

Regarding the population genetic as-pects, the issue of discerning the false-positive calls is further complicated byfact that approximately 75% of alleles inthe population genome are singleton ordoubleton, which are subject to a highfalse-positive rate (Keinan and Clark2012, Nelson et al. 2012). Eliminatingsingleton and doubleton is expected toreduce the number of false-positive callsand yet it could also eliminate poten-tially important causal variants withlarge effect sizes. In addition, the false-positive rate or the confidence in accu-rate allele calling is not evenly distrib-uted to eliminate the false positives byre-sequencing the whole exome or thegenome. Thus, false-positive allele callsare expected to remain a major limita-tion in medical DNA sequencing, neces-sitating an alternative method for valida-tion of the variants identified by WES orWGS in individuals.

Challenges in Linking the DSVs toClinical Phenotypes

Perhaps an even larger challenge thanthe problem of false positives is the dif-ficulty in identifying the true causal al-lele from the vast number of relativelyinnocuous alleles because no gene orprotein is perfect and each carries anumber of nonpathogenic variants, in-cluding non-SNVs. Evidently, the largerthe gene, the greater the likelihood ofidentifying nsSNVs in the gene, some ofwhich might be the causal variants. For

th

her

queism

example, TTN gene encoding sarcomere

221

Page 4: Challenges in Medical Applications of Whole Exome/Genome Sequencing Discoveries

a2cietoitmlcfcstssm

rsaagtns

bgrdulcwi2gcgi

gbancwacpspocTvpr(maaeamtve

fctovmstatdsvcoantcafioppnpa

W

protein titin is one of the largest genes inthe genome. It carries a very large num-ber of nsSNVs, of which only a fractionmight be true causal variants despite thewell-established causal role of TTN vari-nts in cardiomyopathies (Herman et al.012). In a recent study, Seidman andolleagues considered only the truncat-ng variants—that is, those that short-ned the length of the titin protein—ashe causal variants for dilated cardiomy-pathy and did not include the nsSNVsn the initial analysis, partly because ofheir abundance in this large gene (Her-

an et al. 2012). Moreover, the chal-enge is to identify the clinically signifi-ant alleles from those that might beunctional but clinically not signifi-ant. In population genetics, varioustrategies might be employed to reducehe chance of a random association,uch as setting a proper threshold fortatistical significance, corrections for

rge

iant

Lar

of th

e Va

ri

Likely disease-

Disease-

causing

gnifi

canc

e

Disease-associated

diseasecausing

Clin

ical

Sig

Fnw

CSm

all

Population FrequRare

Figure 1. Plot showing population frequency of thclinical significance (effect size) of the variants. Rargreater clinical significance than common alleles, wsignificance. Each genome contains a few disease-genetic evidence of causality, typically through studgenome also contains a number of rare variantdisease-causing variants” because the genetic evideThis group typically includes rare variants that are iwith the phenotype cannot be robustly establishedpenetrance. Disease-associated variants are relativstudies, such as GWAS, have been associated withcause the disease nor necessary but, rather, increasehundred to several thousand nonsynonymous putaencompass approximately 4 million DSVs that neithwith a phenotype.

ultiple hypothesis testing by Bonfer- e

222

oni’s methods, or permutation analy-is (Kiezun et al. 2012). Such measurespply to population data and do notpply to a single individual. Thus, for aiven individual, one can only estimatehe risk based on the population ge-etic data; hence, this might not beufficiently accurate.

To better appreciate the clinical andiological significance of the DSVs in theenome, the variants may be catego-ized, in order of strength of the evi-ence for causality, into five classes (Fig-re 1) as follows: (1) disease-causing, (2)

ikely disease-causing, (3) disease-asso-iated, (4) functional but not associatedith a disease, and (5) unknown biolog-

cal function (Marian and Belmont011). The clinical significance follows aradient, being the highest for disease-ausing variants (category 1) and negli-ible for variants with unknown biolog-cal function (category 5). As would be

Unknown function and not associated with a disease

tional but ssociated a disease

y of the VariantCommon

IndividualGenome

SVs, abundance in an individual genome, and theleles impart larger effect sizes and generally have ah have modest effect sizes and typically no clinicaling variants, a category for which there is strongin large pedigrees with single gene disorders. Eachith large effect sizes that are considered “likelyfor their causal role is not sufficiently conclusive.

tified in affected individuals, wherein cosegregationcause of small size of the families or incompletemore common and are those that in large-scalephenotype. However, they are neither sufficient torisk of the disease. Each genome/exome has severaly functional variants. The most abundant variantsave a known biological function nor are associated

xpected, the frequency of the alleles, in r

eneral, follows the opposite gradient,eing rare for disease-causing variantsnd common for the variants that areot know to carry biological signifi-ance. Thus, the practical approachould be to focus on those variants thatlready have been implicated as theausal variants in the pathogenesis ofhenotype, such as nonsense or mis-ense mutations previously identified inatients with cardiovascular phenotypesr in a gene previously shown to be aausal gene for a Mendelian disorder.here are probably a handful of suchariants in each genome, which couldrovide for early detection of those atisk prior to development of the diseasepreclinical). The genetic information

ight be exploited for close monitoringnd follow-up of the mutation carriersnd even interventions to prevent thevolving phenotype. Nevertheless, as forny medical diagnostic test, the resultsust not be overinterpreted because of

he presence of considerable phenotypeariability as well as plasticity (Klassent al. 2011).

It would be expected that putativelyunctional nsSNVs in each genome alsoontribute to phenotypic expression ofhe diseases. However, the clinical utilityf these relatively common functionalariants in a given individual is relativelyodest and is negligible in most in-

tances. Accordingly, medical applica-ions of the NGS are largely restricted topproximately 100 variants that are ei-her known to be causal for inheritediseases or impart major effects on genetructure and function, such as the LoFariants. Astute clinical phenotypicharacterization and periodic follow-upf those who carry these typically rarend functionally important variants isecessary. Likewise, close monitoring ofhose who do not exhibit any discerniblelinical phenotype is necessary to detectny evolving disease early and prior toull-blown manifestations. The goal is tontervene early to prevent developmentf the disease. Nevertheless, despite thelausibility of the genetic-based medicalractice, the potential utility of the ge-etic information on early diagnosis andrevention of cardiovascular diseaseswaits to be tested and validated.

Concluding Remarks

ES affords an unbiased discovery of

uncot aith

enc

e De alhic

causies

s wnceden

beelythethe

tiveler h

are and common variants in the pro-

TCM Vol. 22, No. 8, 2012

Page 5: Challenges in Medical Applications of Whole Exome/Genome Sequencing Discoveries

tein-coding regions of the genomes andhence has the potential for clinical ap-plications. However, the imperfectnessof variant detection by NGS, which re-duces the sensitivity of the approach to80%-90% and the specificity to 90%-95%, poses major challenges. Althoughthe sensitivity and specificity of NGS invariant detection and calling are withinthe realm of clinical testing, the poten-tial implications of a false or a misseddiagnosis are greater—both from a med-ical standpoint and because of the psy-chological stress that such misdiagnosismight generate. In addition, clinical ap-plication of the information content ofDSVs is partially restricted by the phe-notypic variability and plasticity and,hence, the lack of a one-to-one correla-tion. Thus, experienced clinicians whoare trained in medical genetics shouldcarefully discuss the results of WES withthose involved in order to reduce and, itis hoped, eliminate the potential for pro-viding false information.

In addition to direct medical implica-tions, WES/WGS is also a robust ap-proach for various clinical applications,including identification of the causalgenes for cardiovascular disorders, par-ticularly single gene disorders (Hermanet al. 2012); genetic testing in probandsand family members with cardiovascu-lar diseases with a Mendelian pattern ofinheritance (Meder et al. 2011); and indefining the genetic architecture of adisease in a given individual/family. Es-tablishing the causal role of the rarevariants for complex phenotypes in pop-ulations is quite challenging because ofthe need for a very large sample size, theundefined effect size of the rare alleles,the multidirectionality of the effects,population heterogeneity, and so on (Ki-ezun et al. 2012). Perhaps the most im-

portant contribution of WES/WGS is in

TCM Vol. 22, No. 8, 2012

providing insight into the molecularpathogenesis of the phenotype throughdelineating its genetic etiology. The lat-ter, namely elucidation of the moleculargenetic basis of cardiovascular diseases,has the potential to provide for newtherapeutic targets to prevent the dis-ease and reverse the established pheno-type.

Finally, it is important to recognizethat clinical phenotypes are the out-comes of complex intertwined, stochas-tic, and typically nonlinear interactionsamong genetic, genomics, and environ-mental factors. Hence, information em-bedded in the genome/exome should beutilized wisely to fulfill the promise of“primum non nocere.”

References

Herman DS, Lam L, Taylor MR, et al: 2012.Truncations of titin causing dilated cardio-myopathy. N Engl J Med 366:619–628.

Keinan A & Clark AG: 2012. Recent explosivehuman population growth has resulted inan excess of rare genetic variants. Science336:740–743.

Kidd JM, Cooper GM, Donahue WF, et al:2008. Mapping and sequencing of struc-tural variation from eight human genomes.Nature 453:56–64.

Kiezun A, Garimella K, Do R, et al: 2012.Exome sequencing and the genetic basis ofcomplex traits. Nat Genet 44:623–630.

Klassen T, Davis C, Goldman A, et al: 2011.Exome sequencing of ion channel genesreveals complex profiles confounding per-sonal risk assessment in epilepsy. Cell 145:1036–1048.

Korbel JO, Urban AE, Affourtit JP, et al: 2007.Paired-end mapping reveals extensivestructural variation in the human genome.Science 318:420–426.

Lander ES, Linton LM, Birren B, et al: 2001.Initial sequencing and analysis of the hu-

man genome. Nature 409:860–921.

Levy S, Sutton G, Ng PC, et al: 2007. Thediploid genome sequence of an individualhuman. PLoS Biol 5:e254.

MacArthur DG, Balasubramanian S, Frank-ish A, et al: 2012. A systematic survey ofloss-of-function variants in human protein-coding genes. Science 335:823–828.

Marian AJ: 2009. Nature’s genetic gradientsand the clinical phenotype. Circ CardiovascGenet 2:537–539.

Marian AJ: 2012. Elements of “missing heri-tability”. Curr Opin Cardiol 27:197–201.

Marian AJ & Belmont J: 2011. Strategic ap-proaches to unraveling genetic causes ofcardiovascular diseases. Circ Res 108:1252–1269.

Meder B, Haas J, Keller A, et al: 2011.Targeted next-generation sequencing forthe molecular genetic diagnostics of car-diomyopathies. Circ Cardiovasc Genet4:110 –122.

Nelson MR, Wegmann D, Ehm MG, et al:2012. An abundance of rare functional vari-ants in 202 drug target genes sequenced in14,002 people. Science 337:100–104.

Ng PC, Levy S, Huang J, et al: 2008. Geneticvariation in an individual human exome.PLoS Genet 4:e1000160.

Pennisi E: 2010. 1000 Genomes Project givesnew map of genetic diversity. Science 330:574–575.

Roberts NJ, Vogelstein JT, Parmigiani G, etal: 2012. The predictive capacity of per-sonal genome sequencing. Sci Transl Med4:133ra158.

Tennessen JA, Bigham AW, O’Connor TD, etal: 2012. Evolution and functional impactof rare coding variation from deep sequenc-ing of human exomes. Science 337:64–69.

Wang J, Wang W, Li R, et al: 2008. Thediploid genome sequence of an Asian indi-vidual. Nature 456:60–65.

Wheeler DA, Srinivasan M, Egholm M, et al:2008. The complete genome of an individ-ual by massively parallel DNA sequencing.Nature 452:872–876.

PII S1050-1738(12)00273-3 TCM

223