Child Development and Structural Variation in the Human Genome

Ying Zhang
Yale University

Rajini Haraksingh and Fabian Grubert
Stanford University

Alexej Abyzov, Mark Gerstein, and Sherman Weissman
Yale University

Alexander E. Urban
Stanford University

Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects of structural variation on normal child development, but such effects could be of considerable significance. This review provides an overview of the phenomenon of structural variation in the human genome sequence, describing the novel genomics technologies that are revolutionizing the way structural variation is studied and giving examples of genomic structural variations that affect child development.

Child development is shaped by the interplay of genetic and environmental forces. The extent to which each of these forces affects developmental outcomes has been the subject of intense debate and research for decades (Stiles, 2011). Structural aberrations of the genome, such as large deletions, duplications, inversions, or translocations, have been known for many years to be potential causes of disturbances in the normal trajectory of child development (Francke, 1978). However, until recently, it was thought that there is only a relatively minor amount of difference in the genomic sequence between healthy individuals and almost none on the level of structural variation. Therefore, structural variation would not have been expected to play a significant role in normal child development. For example, when the first human genome sequence was released (Lander et al., 2001), one plausible assumption was that less than 0.1% of the genomic sequence was different between two normal individuals and that most of this difference would be caused by single nucleotide polymorphisms (SNPs; Sachidanandam et al., 2001; Taylor, Choi, Foster, & Chanock, 2001). Nevertheless, we now know that structural variation is actually common in every human genome and that many genomic structural variants (SVs) can be associated with atypical outcomes for child development. These two phenomena, structural variation in every human genome and SVs in association with atypical child development, when taken together, form the foundation for the expectation that genomic structural variation will emerge as a significant factor in shaping the phenotypes of child development even within the normal range.

Here we will describe the early fruits of a revolution in genome analysis technology that over just the last few years has led to the accumulation of discoveries and insights that point to genomic structural variation’s extensive involvement in development (1000 Genomes Consortium, 2010; Gonzaga-Jauregui, Lupski, & Gibbs, 2012). It is now clear that individual human genomes contain a significant number of variants in their sequences, and that a considerable share of these variants are SVs (i.e., encompassing more than just a very small number of nucleotides). For many of those sequence variants, in particular for the relatively large number of smaller to medium-sized variation events, the phenotypic consequences are not yet known. Next we give examples of genomic structural variation that have been shown to be associated with childhood development in often quite drastic and negative ways. We think that such examples provide an outlook on what may be uncovered as the human genomics revolution continues to unfold: that normal childhood development is influenced in subtle, yet important, ways by the often smaller, but frequent SVs that are present in various combinations in the genomes of all individuals. In many of the disease phenotypes associated with structural variation that are being discovered at a rapid pace (Glessner, Connolly, & Hakonarson, 2012; Stankiewicz & Lupski, 2010), we observe the deleterious influences of typically large and rare SVs. These include pervasive and detrimental effects on morphogenetic processes of critical organ systems, such as the heart and its outflow tracts, as well as on systems with central importance to the various ways in which the individual interacts with its environment, such as the immune system, the brain, facial features, and the speech apparatus. In organ systems such as these, many of the smaller and more common SVs that are present in every human genome may eventually be found to have phenotypic effects that are less drastic than those associated with large and rarer SVs. These smaller but more frequent SVs may exert a more (or entirely) benign, modulating influence that could still potentially constitute one of the fundamental elements of the genetic contribution to variation in child development.

We thank Dr. George Mias for his helpful comments on the manuscript.

Correspondence concerning this article should be addressed to Alexander Eckehart Urban, Department of Psychiatry and Behavioral Sciences, and Department of Genetics, Stanford University School of Medicine, 1050A Arastradero Road, Room A233A, Palo Alto, CA, 94305. Electronic mail may be sent to aeurban@stanford.edu.

© 2013 The Authors. Child Development © 2013 Society for Research in Child Development, Inc. All rights reserved. 0009-3920/2013/8401-0004. DOI: 10.1111/cdev.12051

Child Development, January/February 2013, Volume 84, Number 1, Pages 34–48

Structural Variation in the Human Genome

A haploid human genome, which has a gross architecture that is fixed between individuals, consists of approximately 3 billion base pairs (3 Gbp) of DNA sequence. All the somatic cells in the human body contain twice that amount of DNA, the diploid human genome. The total diploid genomic DNA is partitioned into two copies of each of 22 autosomes (nonsex chromosomes) and the two sex chromosomes. Furthermore, the corresponding base pairs of each genomic sequence are also generally conserved between individual genomes, especially in functional regions. These functional regions of the genome include those stretches of DNA that will be transcribed into RNA, as well as regulatory DNA sequences that will not but that nevertheless serve critical functions in the complex system of control of genetic information. The regulatory genomic DNA sequence elements are stretches of DNA sequences in the genome that are recognized by specialized regulatory proteins that bind to the regulatory sequence element with sometimes very high specificity to its DNA sequence and then control and modulate the activity of genes and other transcribed regions of the genome that can be found located at various but often short distances from the regulatory element itself (Tom Strachan, 2004).

The relative differences in the sequence of DNA nucleotides, base pairs, or in the arrangement of blocks of DNA sequence between different genomes constitute genomic variation, a normal aspect of the nature of the human genome. The concept of genomic variation is only meaningful in the context of a comparison between distinct genomes. From a technical point of view, it usually means determining the differences between the genome to be analyzed and the sequence information of the Human Reference Genome (Lander et al., 2001). Genomic sequence variation is composed of different classes of DNA sequence variation events. These classes are characterized by the size of the variant, which refers to the contiguous number of base pairs affected between the start- and endpoints of a given sequence variation event. A further important consideration when classifying genomic sequence variation is whether DNA sequence was gained, lost, or rearranged and what the position of the variant is relative to the rest of the genomic sequence, for example, whether a duplicated stretch of DNA sequence was inserted at a position far distant from its original copy as opposed to right next to it (Stankiewicz & Lupski, 2010).

Size Distribution and Definition of Structural Variation

There is a size continuum of genomic DNA sequence variation ranging from single base pair events, such as single nucleotide polymorphisms and point mutations, through small InDels (Insertions and Deletions) to large or very large Copy Number Variants (CNVs) and SVs (Figure 1). Single nucleotide polymorphisms are particular single nucleotides in the genomic sequence that are very frequently observed to be variable in the normal population (The International Hapmap Consortium, 2005). InDels are insertions or deletions of a few to a few dozen base pairs (or to several hundred base pairs, or even up to 1,000 base pairs; the definitions are still in flux here). CNVs/SVs begin in size where InDels end and consist of changes in the genomic sequence that can encompass hundreds to hundreds of thousands, and in rare instances millions, of base pairs. CNVs refer to such medium- to large-sized deletions or duplications of genomic sequence, whereas SVs are understood to include CNVs as well as similarly medium- to large-sized copy number neutral events, such as sequence inversions (where a stretch of DNA sequence has been “flipped around” between two endpoints while no net gain or loss in DNA sequence has occurred) and balanced chromosomal translocations (where two chromosomes have exchanged stretches of DNA sequence, again with no net gain or loss of sequence to the individual genome in question; Alkan, Coe, & Eichler, 2011). In summary, although some details of the naming conventions are still being debated, consensus is emerging that structural variation generally refers to deletions, duplications, or insertions as well as copy number neutral inversions and translocations of stretches of genomic DNA sequence of around (or exactly) 1 kilobase-pair (kbp) in length and larger, but still considerably smaller than whole chromosomes (see Figure 1; Stankiewicz & Lupski, 2010).
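The size classes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1,000 bp InDel/SV boundary is one of several cutoffs in use (the text notes that the definitions are still in flux), and the `Variant` record, the `classify` function, and the label strings are hypothetical names introduced here, not part of any published tool.

```python
from dataclasses import dataclass

# Assumed cutoff: SVs start at around 1 kbp; the InDel/SV boundary is still debated.
INDEL_MAX_BP = 1_000


@dataclass
class Variant:
    kind: str       # "substitution", "insertion", "deletion", "duplication",
                    # "inversion", or "translocation"
    length_bp: int  # contiguous number of base pairs affected


def classify(v: Variant) -> str:
    """Assign a variant to a size class along the continuum described in the text."""
    if v.kind == "substitution" and v.length_bp == 1:
        return "SNP/point mutation"
    if v.length_bp < INDEL_MAX_BP:
        if v.kind in ("insertion", "deletion"):
            return "InDel"
        return "small variant (other)"
    # From here on the event is at least ~1 kbp long, i.e., an SV.
    if v.kind in ("deletion", "duplication"):
        return "SV (CNV)"                   # copy number changes
    if v.kind in ("inversion", "translocation"):
        return "SV (copy-number neutral)"   # no net gain or loss of sequence
    return "SV"                             # e.g., insertion of novel sequence


print(classify(Variant("deletion", 50)))          # InDel
print(classify(Variant("duplication", 250_000)))  # SV (CNV)
print(classify(Variant("inversion", 5_000)))      # SV (copy-number neutral)
```

Keeping the cutoff in a single constant makes it easy to rerun the classification under a different convention for the InDel/SV boundary.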

SVs of the insertion type are classified depending on the relative position and nature of the events. They can be insertions of novel DNA sequence, in the sense that the sequence in question did not previously exist in the human genome (but came, e.g., from viral genomes), or repeat events (duplications of stretches of DNA sequence that already existed in the genome before the structural variation event). Repeat insertions can be variable in number, be in tandem to each other, or be dispersed over the chromosome from which they originated or even over other chromosomes in the genome.

Distribution of Structural Variation Across the Human Genome

Structural variation in the genome is a major source of genomic sequence variation between genomes of individuals. Based on what is currently known about variation in the human genome, it can be assumed that the sum of base pairs that can be affected by structural variation in a typical human genome can surpass the number of base pairs that are contributed to the total amount of genomic variation in the form of SNPs (of which each genome typically contains more than 3 million). In fact, a large proportion of base pairs that vary between individual genomes lie in regions where structural variation is inherently common (1000 Genomes Consortium, 2010; Conrad et al., 2010; Mills et al., 2011). Over the last 8 years, after the initial discovery of the prevalence of CNVs in the normal human genome (Iafrate et al., 2004; Sebat et al., 2004), there have been an impressive number of projects aiming to map structural variation genome-wide. These projects have led to the discovery of novel variable regions and the refining of known variable loci (Conrad et al., 2010; Korbel et al., 2007; Mills et al., 2006; Redon et al., 2006; Tuzun et al., 2005). The latest and so far most comprehensive such effort is being undertaken by the structural variation group of the 1000 Genomes Consortium (2010; Mills et al., 2011). The recently completed pilot phase of this effort produced a map of SVs at base pair resolution in the human genome in several different populations from around the world through analyzing the genomes of approximately 180 healthy individuals. These SVs are cataloged in public databases such as the Database of Genomic Variants (DGV; http://projects.tcag.ca/variation) and the Human Structural Variation Database (dbVar; http://www.ncbi.nlm.nih.gov/dbvar). The curation of these databases has been challenging, as the data that are reported in them come from mapping efforts that use a multitude of technologies with varying boundary definitions, DNA quality, reference samples, and terminology (Ionita-Laza, Rogers, Lange, Raby, & Lee, 2009). This somewhat unsettled situation is also a function of how recent an event the discovery of widespread structural variation in the human genome still is. But generally accepted standards, conventions, and terminologies are now emerging, and these large-scale mapping and cataloging efforts will be of great utility for the biomedical research community.

Figure 1. The different types of sequence variation in the human genome across the size spectrum and the degree to which each available analytical technology can access types of sequence variants across the spectrum of variant size. The various technologies are described further in the main text (2G-Seq stands for second-generation sequencing).

To date, around 28,000 unique SVs have been discovered and cataloged (Mills et al., 2011) and most of these can be assumed to be to a certain degree common in the human population. For example, in an individual human genome we can now expect to find more than 2,000 of these SVs when carrying out a complete genome sequencing analysis and requiring that at least two independent computational methods (i.e., two of the three analytical approaches shown in Figure 2) detect a given SV, to have a high degree of confidence in the results of the analysis.
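The requirement that at least two independent methods detect a given SV can be sketched as a simple consensus filter. This is a minimal illustration and not the procedure used in the cited studies: the `Call` record, the 50% reciprocal-overlap rule for deciding that two calls describe the same event, and the example coordinates are all assumptions made here.

```python
from typing import List, NamedTuple


class Call(NamedTuple):
    chrom: str
    start: int    # 0-based start coordinate
    end: int      # end coordinate, must be greater than start
    sv_type: str  # e.g., "deletion", "duplication"
    method: str   # e.g., "PEM", "SRA", "RDA"


def reciprocal_overlap(a: Call, b: Call) -> float:
    """Fraction of the longer call that is shared with the other (0.0 if none)."""
    if a.chrom != b.chrom or a.sv_type != b.sv_type:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def supported_calls(calls: List[Call], min_methods: int = 2,
                    min_overlap: float = 0.5) -> List[Call]:
    """Keep calls that are backed by at least `min_methods` distinct methods."""
    kept = []
    for c in calls:
        methods = {c.method}
        for other in calls:
            if other is not c and reciprocal_overlap(c, other) >= min_overlap:
                methods.add(other.method)
        if len(methods) >= min_methods:
            kept.append(c)
    return kept


calls = [
    Call("chr1", 1_000, 5_000, "deletion", "PEM"),
    Call("chr1", 1_100, 5_100, "deletion", "RDA"),
    Call("chr2", 200, 3_000, "duplication", "SRA"),  # seen by one method only
]
for c in supported_calls(calls):
    print(c)
```

In this toy input only the two overlapping chr1 deletion calls survive; the chr2 duplication is dropped because a single method reported it.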

The distribution of structural variation in the human genome is nonrandom (Figure 3) and biased against functional regions such as genes or regulatory sequence elements. SVs have been shown to be enriched in pericentromeric and subtelomeric regions (i.e., around the centromeres and toward the ends, respectively, of the chromosomes). Furthermore, SVs overlap frequently with regions of segmental duplication. Segmental duplications are defined as regions in the human genome that are at least several thousand base pairs in size and have greater than 95% sequence homology to a region elsewhere in the genome. Such segmental duplications are themselves ancient SVs that have become fixed in the genomic sequence (Kim et al., 2008; Marques-Bonet & Eichler, 2009).

Recurrent SV rearrangements of more than 100 kbp in size often occur in mutational hotspots. These stretches of unique sequence are in many cases flanked by large (larger than 10 kbp and sometimes even larger than 100 kbp) segmental duplications that have been found to be well-suited as substrates for nonallelic homologous recombination, which is in turn one of the mechanisms giving rise to structural variation (see below; Itsara et al., 2009; Mefford & Eichler, 2009).

SVs are less likely to overlap with genes and regulatory sequence elements than they are with regions of the genome with no known function. At the same time, there are certain classes of genes that seem to be more often impacted by SVs than others. Interestingly enough, those gene classes that show greater affinity to (or tolerance of) structural variation are classes of genes that are involved in how we perceive and interact successfully with our environment (i.e., olfactory receptors, immune and inflammatory response genes, cell signaling molecules and ion channels; Korbel et al., 2007; Korbel et al., 2008; Mills et al., 2011). Conversely, the positioning of SVs is negatively correlated with genes that occupy central nodes and connection points in molecular networks of interaction and control. Such highly dosage sensitive genes are underrepresented in SV regions (Korbel et al., 2008; Schuster-Bockler, Conrad, & Bateman, 2010).

Structural variation is widely assumed to be both a driving force of evolution as well as the mark of ancient evolutionary processes. SVs may act as substrates for further genomic rearrangements by recombination-based mechanisms (see also below). At the same time they also consist of those variants that were allowed to persist at polymorphic frequencies (i.e., they are common in the normal population) under evolutionary forces. Again, as already alluded to above, segmental duplications are the genomic remnants of ancient duplication events (Kim et al., 2008; Marques-Bonet & Eichler, 2009).

Figure 2. The three approaches to analyzing 2nd-generation, high-output DNA sequencing reads to detect structural genomic variation. Paired-end DNA sequencing output can be analyzed for paired-end mapping (PEM) and then reanalyzed using split-read analysis (SRA) and read-depth analysis (RDA).

Figure 3. Distribution of structural variation in two normal human genomes and genomic regions where structural variation has been shown to lead to developmental aberrations. The map of normal structural variation shows structural variation (SV) events sized 3 kbp and larger and is based on the analyses done in Korbel et al. (2007). Note that SV is frequent even in normal human genomes and that its distribution is uneven. The highlighted regions have been selected as examples from the most prominent disease-associated genomic SV regions. ASD = autism spectrum disorder; CMT1A = Charcot-Marie-Tooth disease type 1A; MR = mental retardation; CHD = congenital heart disease.

Mechanisms of SV Formation

SVs are thought to form by recombination, replication, or retrotransposition events. There are four major molecular mechanisms by which SVs are currently thought to occur in the genome: (a) nonallelic homologous recombination (NAHR), (b) nonhomologous end joining (NHEJ), (c) fork stalling and template switching (FoSTeS), and (d) L1-mediated retrotransposition (Korbel et al., 2007; Zhang, Carvalho, & Lupski, 2009; Zhang, Khajavi, et al., 2009). These different mechanisms form different types of SVs. NAHR and FoSTeS can form deletions, duplications, and inversions. NHEJ can form deletions and duplications, but not inversions. Retrotransposition can only form novel insertions. Finally, FoSTeS can form complex patterns of deletions, duplications, and inversions at once at a single locus. All four major mechanisms are further outlined next.

1. NAHR is a recombination event that occurs when highly homologous but nonallelic sequences align and undergo crossing over during the events that lead up to cell division in the germline. Repeat sequence stretches on the same chromosome in direct orientation to each other can recombine to produce deletion and/or duplication events while those in opposite orientation to each other can recombine to produce inversions. NAHR by repeat sequences from different chromosomes that are homologous to each other, even though they are not alleles of the same locus, leads to translocations. In addition, NAHR can occur in meiosis where SVs can be inherited. Finally, it can also occur in mitosis causing mosaic populations of cells carrying the SV.

2. NHEJ occurs when cells repair double-stranded breaks in their DNA caused by external damaging influences such as radiation or, in the case of immune system cells, by biological V(D)J recombination, a phenomenon of fundamental importance enabling the immune system to properly differentiate between self and non-self (Schatz & Ji, 2011). NHEJ does not require long stretches of sequence homology, but the exact molecular mechanisms involved are still not well characterized. Rearrangements occur when DNA double-stranded breaks are repaired incorrectly by NHEJ. Often, breakpoints of NHEJ events fall in repetitive sequence elements of the genome such as Long Terminal Repeats, Long Interspersed Elements (LINEs), and Short Interspersed Elements such as Alu sequences and Mammalian Interspersed Repeats (MIR). Many such events are found to occur in regions of the genome with architectural features that promote DNA double-stranded breaks (Korbel et al., 2007).

3. FoSTeS is a DNA replication-based mechanism that can account for complex genomic rearrangements. In this case, a DNA replication fork can stall and the lagging strand can disengage from the original fork and switch to another replication fork, usually in close three-dimensional proximity to the original. Here, the lagging strand can restart DNA synthesis by priming the new replication fork from the short stretches of homologous DNA sequence (sequence microhomology) between the original and new template. The direction of fork progression and whether the lagging or leading strand of the new fork was used as the template both affect the orientation of the erroneously incorporated fragment from the new fork to its original position. In addition, whether the new fork lies downstream or upstream of the original fork determines whether the template switch causes a deletion or a duplication of genomic sequence. This sequence of events (stalling, disengaging, invading, and synthesizing) may occur multiple times in series, giving rise to complex rearrangements (Zhang, Carvalho, & Lupski, 2009; Zhang, Khajavi, et al., 2009).

4. L1-mediated retrotransposition is the final potential mechanism of SV formation involving the activity of full-length L1 retrotransposons (also known as LINE) in the genome. While sequences of mobile or transposable elements comprise around half of the total human genomic sequence, most of those formerly mobile sequence elements are now degraded and no longer able to cause transposition. However, there are about 100 full-length LINE copies that are potentially active in the human genome (Iskow et al., 2010). Those full-length LINEs are DNA sequences approximately 6 kbp in length that carry all the genes that are necessary for the LINE to retrotranspose itself. Retrotransposition occurs via an RNA intermediate when the LINE sequence is transcribed from the genomic DNA. This RNA intermediate is then reverse-transcribed into DNA and inserted elsewhere into the genomic sequence. The reverse transcription and insertion are thought to occur by a mechanism known as target primed reverse transcription. This results in an insertion that is flanked by duplicated target sequences. Also, LINE sequences are frequently associated with other, non-LINE, SV sequence variation events, where they form one or both of the endpoints of the (often much larger) non-LINE SV, which was presumably caused by an NAHR or NHEJ event based on the LINE sequence (Kim et al., 2008).
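The relationship between the four mechanisms and the SV types they can produce, as stated in the text above, can be written down as a small lookup table. The sketch below is purely a restatement of those statements; the dictionary layout, the key strings, and the `candidate_mechanisms` helper are hypothetical names chosen for illustration.

```python
# SV types each mechanism can produce, as described in the text above.
MECHANISM_OUTCOMES = {
    "NAHR": {"deletion", "duplication", "inversion", "translocation"},
    "NHEJ": {"deletion", "duplication"},  # not inversions
    "FoSTeS": {"deletion", "duplication", "inversion", "complex rearrangement"},
    "L1 retrotransposition": {"insertion"},  # novel insertions only
}


def candidate_mechanisms(sv_type: str) -> list:
    """Return the mechanisms that could, in principle, give rise to an SV type."""
    return sorted(m for m, outcomes in MECHANISM_OUTCOMES.items()
                  if sv_type in outcomes)


print(candidate_mechanisms("inversion"))  # ['FoSTeS', 'NAHR']
print(candidate_mechanisms("insertion"))  # ['L1 retrotransposition']
```

A lookup of this kind only narrows the candidates; assigning a mechanism to an observed SV in practice relies on breakpoint features such as flanking homology, microhomology, or duplicated target sequences.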

Functional or Phenotypic Consequences of SV

SVs, just as genomic sequence variants in general, may or may not have functional consequences, depending on the genetic sequence environment in the neighborhood of the SV. SVs should be expected to be often functional in a benign sense, affecting phenotypes that vary normally in human populations, or in a pathological sense, or not functional at all, but arising by chance and persisting in the genome due to lack of selective pressure.

There are several mechanisms by which SVs may have a functional impact on phenotype. SVs are capable of reorganizing functional elements of the genomic DNA sequence. The simplest mechanistic model for SVs affecting phenotype is by altering gene dosage (i.e., when whole genes are deleted or duplicated causing a decrease or increase in transcription and subsequently altering protein levels in the cell). One of the few already known examples for a benign effect of SV on phenotypic variability, which corroborates this model, is a study that showed that amylase gene copy number is positively correlated with the starch content of the diets of several different populations around the world (Perry et al., 2007). The hypothesis resulting from this finding is that populations whose diets have a higher starch content need more amylase protein (and hence have more functional copies of the gene) than populations with low-starch diets.

Structural variants can also alter the coding complement of a gene by affecting only a subset of exons (Korbel et al., 2007; Mills et al., 2011). There is mounting speculation that such a type of “incomplete-gene” SV is more likely to be of a detrimental nature than a full-gene SV, presumably because the aberrant protein structures caused by incomplete-gene SVs might be more difficult to compensate for than mere changes in protein level.

Another model by which SVs could plausibly affect phenotype is by a deletion of the normal allele unmasking a recessive allele, thus causing expression of the recessive phenotype; however, this model is mostly speculation at this point.

SVs can furthermore affect gene expression when they occur in the regulatory sequence elements of a gene. The specific effect of the SV on gene expression will be determined by which type of regulatory element is affected (i.e., activator or repressor) and by the way in which it is affected (i.e., deletion or duplication of parts or the entire regulatory sequence element or insertions of different DNA sequence into the regulatory element that alters its structure; Kasowski et al., 2010).

Finally, the presence of an SV is not always sufficient for a phenotypic outcome, as the phenotypic outcome may also depend on the presence of a distinct environmental factor. For example, in the case of SV associated with susceptibility to HIV infection, the phenotypic consequence of the SV is only revealed when the cells with differing copy numbers of a stretch of sequence on chromosome 17 containing the gene CCL3L1 are challenged with the relevant environmental factor, HIV, while otherwise the SV seems to have no effect on the cells (Gonzalez et al., 2005).

Technologies for Mapping Structural Variation in the Human Genome

Traditionally, very large SVs, that is, chromosomal aberrations of typically several hundred kilobase-pair or even several megabase-pair in size, were detected by microscope-based methods such as karyotyping or FISH (fluorescence in situ hybridization), and also by microarrays constructed from bacterial artificial chromosomes (Redon et al., 2006). On the other end of the size spectrum of sequence variation in the genome are the single base pair sequence changes, such as point mutations or SNPs, or very small sequence changes such as InDels of just several or a few dozen base pairs. Those small sequence variants could be detected by Sanger-type sequencing and PCR, but only if the locus where a given very small variant would typically occur was previously and precisely known (Figure 1).

Although microscope-based techniques will remain relevant for some applications, there is a technological revolution afoot that is rapidly and profoundly changing the way genomes can be screened for sequence variation in general and SV in particular. There are now two main technological platforms for detecting and mapping SVs in the human genome: high-density oligonucleotide microarray chips and next-generation, high-speed, high-output DNA sequencing instruments (Figure 1). The term next-generation refers to the current DNA sequencing instruments, which represented a significant jump in DNA sequence output relative to their predecessors. More precisely, DNA sequencing technology is currently in its second generation, with the automatic Sanger capillary DNA sequencing instruments that formed the backbone of the Human Genome Project (Lander et al., 2001) constituting the first generation.

The high-density oligonucleotide microarrays can be employed for array Comparative Genome Hybridization (aCGH) as well as for SNP genotyping assays, which can then subsequently be reanalyzed to determine SV content. Array CGH relies on observing differential hybridization intensities of test and reference samples on oligonucleotide probes on a microarray to infer deletions or duplications of genomic regions in one sample compared to the other. Not all SVs, but rather only CNVs such as deletions and duplications, can be detected by array-based methods, and the resolution at which CNV breakpoints are determined is dependent on the tiling density of the probes (i.e., the number of genome-sequence-specific DNA oligomers on the microarray that are representing a given stretch of genomic DNA sequence; Conrad et al., 2010; Urban et al., 2006).

The SNP genotyping assays detect CNVs by two measurements: (a) the log R ratio, which refers to the log of the total signal intensity at the probes for the A and B alleles of a SNP in the test sample divided by that in a reference sample, and which gives the copy number in a similar way to array CGH, and (b) the B allele frequency, which refers to the contribution of the B allele to the total genotype at a given SNP and is based on the signal intensities at the oligomer probes on the array that are representing the A and B allele of a given SNP (McCarroll et al., 2008). Stretches of loss of heterozygosity indicate deletions, while the presence of more than three genotypes indicates duplications (a test sample with normal copy number of two at a given SNP may have one of three genotypes: AA, AB, or BB).
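The two measurements can be illustrated with a minimal sketch. It assumes a base-2 logarithm and a simplified B allele frequency computed directly from raw intensities; production genotyping software normalizes intensities against reference genotype clusters, which is not reproduced here. The function names and the example intensities are hypothetical.

```python
import math


def log_r_ratio(a_test: float, b_test: float, a_ref: float, b_ref: float) -> float:
    """Log (base 2, by assumption) of total test intensity over total reference intensity."""
    return math.log2((a_test + b_test) / (a_ref + b_ref))


def b_allele_frequency(a_intensity: float, b_intensity: float) -> float:
    """Simplified share of the B allele signal in the total signal at one SNP."""
    return b_intensity / (a_intensity + b_intensity)


# A heterozygous deletion halves the total signal, so the log R ratio drops to -1.
print(log_r_ratio(500, 500, 1000, 1000))
# A duplication can produce a genotype such as AAB, with a B allele share near 1/3,
# which is not one of the three values expected at copy number two (0, 0.5, 1).
print(b_allele_frequency(2000, 1000))
```

In this simplified picture, a run of SNPs with a negative log R ratio and no heterozygous B allele frequencies would point toward a deletion, whereas a positive log R ratio with B allele frequencies near 1/3 or 2/3 would point toward a duplication.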

Microarray platforms for CNV detection are now available that offer chips that carry several hundred thousand to several million specific oligomers and that sometimes combine array CGH and SNP analysis on a single array chip. These array-based approaches are now very stable and reliable while offering ease of use, good throughput of sample numbers, and cost efficiency, and can therefore be expected to remain in use for some time to come (Beaudet, 2013; Haraksingh, Abyzov, Gerstein, Urban, & Snyder, 2011). At the same time, it is clear that the future of genome analysis in general and SV analysis in particular belongs to the high-output, high-speed DNA sequencing technologies that are just now becoming fully available. Already it is possible to sequence the equivalent of a human genome at deep sequence coverage (i.e., sequencing the 3 Gbp of haploid DNA sequence at least 30 times over to compensate for sequencing errors and unevenness in sequence distribution) in about 1 week, for less than $4,000. This is possible with the currently available versions of the recently developed high-output DNA sequencing instruments (“next-generation sequencing,” second-generation sequencing), while many leading genome technology companies are competing to develop even more powerful platforms (third-generation sequencing).
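The arithmetic behind deep sequence coverage can be made explicit with a short sketch. It assumes a haploid human genome of roughly 3 billion base pairs and, purely for illustration, a read length of 100 base pairs; the function names are hypothetical.

```python
def required_bases(genome_size_bp: int, coverage: int) -> int:
    """Total number of sequenced bases needed to cover the genome `coverage` times over."""
    return genome_size_bp * coverage


def reads_needed(genome_size_bp: int, coverage: int, read_length_bp: int) -> int:
    """Number of reads of a fixed length needed to reach the coverage (rounded up)."""
    total = required_bases(genome_size_bp, coverage)
    return -(-total // read_length_bp)  # integer ceiling division


GENOME_SIZE_BP = 3_000_000_000  # assumed approximate haploid genome size
print(required_bases(GENOME_SIZE_BP, 30))     # 90 billion bases
print(reads_needed(GENOME_SIZE_BP, 30, 100))  # 900 million reads of 100 bp
```

Under these assumptions, 30-fold coverage corresponds to roughly 90 billion sequenced bases, that is, on the order of hundreds of millions of short reads.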

Analysis of Second-Generation DNA Sequencing Data

The massive DNA sequencing output of the current second generation of “next-generation” DNA sequencing instruments is composed of hundreds of millions of short (50 to several hundred base pairs) DNA sequence snippets (sequence reads), totaling billions of base pairs of sequence per run of the instrument. This output is easily three or four orders of magnitude greater than that of the first-generation DNA sequencing machines of just 4–5 years ago. The technical principle that is used to achieve such DNA sequence outputs is that of massively parallel sequencing, where the genome to be studied is fragmented before applying it to the DNA sequencer. Then, within the reaction chamber of the sequencing instrument, tens or hundreds of millions of individual relatively short sequencing reactions take place in parallel on the surface of a DNA chip. Each of these sequencing reactions yields an independent DNA sequence read, and the many millions of sequence reads being produced are then used in a series of computational analyses (1000 Genomes Consortium, 2010). This computational processing of the raw sequence data output coming from the DNA sequencers is not a trivial undertaking. There are in principle three analytical approaches available when aiming to use second-generation DNA sequencing data for SV detection and analysis (Figure 2). All of them are based on the analysis of the raw DNA sequencing reads after they have been computationally mapped onto the sequence of the public Human Reference Genome. The three approaches are: Paired-End Mapping (PEM), Read-Depth Analysis (RDA), and Split-Read Analysis (SRA; Figure 2).

PEM (Alkan et al., 2009; Brunetti-Pierri et al., 2008; Chen et al., 2009; Kidd et al., 2008; Korbel et al., 2007; Korbel et al., 2009; Ritz, Bashir, & Raphael, 2010) employs information from a pair of DNA sequencing reads (i.e., the relatively short stretches of DNA sequence that are the output format of the high-output DNA sequencing instruments) that come from the two ends of one DNA fragment of an experimentally selected length (typically 300 bp to 3 kbp). Each member of the pair of sequencing reads is mapped onto the human reference genome sequence and the distance at which they map apart from each other is measured. If the genome of the subject does not differ in this locus relative to the reference sequence, then the two ends will map at a distance from each other that is equal to the experimentally selected length of the input DNA fragment. If the subject has a deletion between the two ends, then the distance between the two end sequences will be less in the subject than in the reference genome, so the ends map farther apart on the reference than the fragment length predicts. If the subject has an insertion between the two end reads, then the distance between the ends will be greater in the subject than in the reference, so the ends map closer together than expected; longer insertions can be detected by observing only one end of the pair, or by observing reads that span the insertion breakpoint. Finally, inversions can be detected from fragments with one end in the inversion, which will map with reversed orientation compared to the reference genome.
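The logic of paired-end mapping can be summarized in a minimal sketch. It is not the algorithm of any of the cited tools: the function name, the single tolerance parameter, and the boolean orientation flag are hypothetical simplifications, and real callers cluster many supporting read pairs before reporting an SV.

```python
def classify_pair(mapped_distance: int, expected_length: int,
                  tolerance: int, orientation_ok: bool = True) -> str:
    """Classify one read pair by comparing its mapped span on the reference
    with the experimentally selected fragment length."""
    if not orientation_ok:
        # One end maps with reversed orientation: consistent with an inversion.
        return "inversion"
    if mapped_distance > expected_length + tolerance:
        # Ends map farther apart on the reference: sequence is missing in the subject.
        return "deletion"
    if mapped_distance < expected_length - tolerance:
        # Ends map closer together on the reference: extra sequence in the subject.
        return "insertion"
    return "concordant"


# Hypothetical 500 bp fragments with a tolerance of 50 bp.
print(classify_pair(510, 500, 50))    # concordant
print(classify_pair(2500, 500, 50))   # deletion
print(classify_pair(200, 500, 50))    # insertion
```

The tolerance stands in for the natural spread of fragment lengths in a sequencing library; in practice it would be estimated from the observed distribution of concordant pairs.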

SRA (Abyzov & Gerstein, 2011; Mills et al., 2006; Ye, Schulz, Long, Apweiler, & Ning, 2009; Zhang et al., 2011) utilizes information from partial, that is, split, mappings of a single read. SRA makes use of DNA sequencing reads that fall on the junction point of the two breakpoints caused by the occurrence of an SV. If the split-read falls on this position in such a way that there is enough sequence information available on both sides of the junction to allow for unambiguous independent mapping of each side of the split-read onto the reference genome, then SRA will produce an SV call that has immediate nucleotide-level (i.e., maximal) resolution. More so than PEM and RDA, SRA is dependent on long read length and deep genomic coverage of the sequencing data, but at the same time it immediately produces SV predictions at the highest resolution. This can be of great benefit, for example, if an SV has occurred within the sequence of a gene and it is of interest to learn exactly which parts of the gene have been preserved and which have not.
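The split-read principle can be illustrated for the simplest case, a deletion, with a toy sketch. It assumes exact, error-free matching of both parts of the read and a hypothetical minimum anchor length; the cited tools use gapped alignment and handle sequencing errors, repeats, and other SV types, none of which is attempted here.

```python
from typing import Optional, Tuple


def find_deletion_split(reference: str, read: str,
                        min_anchor: int = 8) -> Optional[Tuple[int, int]]:
    """Return the deleted interval (0-based, half-open) implied by a split read,
    or None if no split with two unique anchors is found."""
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = reference.find(left)
        if left_pos == -1 or reference.find(left, left_pos + 1) != -1:
            continue  # left part is absent or maps ambiguously
        right_pos = reference.find(right)
        if right_pos == -1 or reference.find(right, right_pos + 1) != -1:
            continue  # right part is absent or maps ambiguously
        left_end = left_pos + len(left)
        if right_pos > left_end:
            # The reference bases between the two anchors are missing from the read.
            return left_end, right_pos
    return None


# Toy reference: 12 bp flank, 12 bp segment to be deleted, 12 bp flank.
reference = "ACGTTGCAAGCT" + "GGGGCCCCGGGG" + "TATACGATCATG"
read = "GTTGCAAGCT" + "TATACGATCA"  # spans the junction left by the deletion
print(find_deletion_split(reference, read))  # (12, 24)
```

The requirement that both anchors map uniquely mirrors the condition stated above that enough sequence must be available on both sides of the junction; it also shows why longer reads make split-read analysis more effective.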

RDA (Abyzov & Gerstein, 2011; Abyzov, Urban, Snyder, & Gerstein, 2011; Alkan et al., 2009; Chiang et al., 2009; McCarthy et al., 2009) is based on a statistical analysis of read mapping density per given stretch of the human genome and detects deletions and duplications by identifying regions where the density significantly deviates from the genome average. DNA sequencing reads are mapped onto the reference genome and the mapped reads are then counted in bins along the genomic sequence. Deletions and duplications in the genome of the subject relative to the reference genome will change the count in a given bin to below or above, respectively, the genomic average.
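A minimal sketch of the binning step follows. It replaces the statistical analysis used by the cited methods with fixed ratio thresholds relative to the median bin count; the thresholds (0.75 and 1.25), the use of the median, and the function names are illustrative assumptions only.

```python
from statistics import median
from typing import List, Sequence, Tuple


def bin_read_counts(read_starts: Sequence[int], genome_length: int,
                    bin_size: int) -> List[int]:
    """Count mapped reads per fixed-size bin along the reference sequence."""
    n_bins = -(-genome_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for start in read_starts:
        counts[start // bin_size] += 1
    return counts


def call_cnv_bins(counts: Sequence[int], low: float = 0.75,
                  high: float = 1.25) -> List[Tuple[int, str]]:
    """Flag bins whose count falls clearly below or above the typical count."""
    typical = median(counts)
    if typical == 0:
        return []
    calls = []
    for index, count in enumerate(counts):
        ratio = count / typical
        if ratio < low:
            calls.append((index, "deletion"))
        elif ratio > high:
            calls.append((index, "duplication"))
    return calls


print(bin_read_counts([0, 5, 10, 15, 25], genome_length=30, bin_size=10))
print(call_cnv_bins([100, 100, 50, 100, 150, 100]))
```

With a diploid genome, a heterozygous deletion is expected to roughly halve the read count of a bin and a single-copy duplication to raise it by about half, which is what the two flagged bins in the example mimic.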

A recent study coming from the 1000 Genomes Project applied multiple methods from each approach to various long and short, paired and unpaired, high-output sequencing data, and revealed substantial complementarity of the approaches (Mills et al., 2011). This complementarity was observed in the types and sizes of discovered SVs, their sequence content, as well as the sequence content of the regions surrounding SVs. For example, SVs that arise as a result of retrotransposon activity are well detected by paired-end and split-read approaches, but are rarely found by the read-depth approach. Conversely, read-depth approaches are computationally very robust and efficient and can be run at high processing speeds while giving reliable SV information even if less than deep sequencing data are at hand. Therefore, to obtain the complete picture of SVs in a human genome it is best to combine multiple different computational methods and approaches for SV discovery (Lam et al., 2012).

It is becoming very obvious that second-generation DNA sequencing-based genome analysis approaches will soon become the vehicle of choice as they allow us to cover the entire spectrum of sequence variation in the human genome, in terms of both sizes and types of sequence variants, at high resolution and based on a straightforward and high-throughput experimental principle.

Structural Variation and Associated Aberrant Child Development

Examples of genomic SV being associated with child developmental disorders are rapidly increasing in number. In early studies, most of the causative SVs were identified in single-chromosome disorders like the partial trisomies of chromosome 21 that form a subgroup of cases with Down syndrome (Korbel et al., 2009), which can relatively easily be detected by traditional cytogenetic techniques. However, over the past few years, with the rapid advances in analytical technology, several landmark findings were reported that identified relatively rare, de novo SVs in complex human disorders, including in common conditions such as autism and autism spectrum disorders (ASD), some of which we describe here as examples for the association between SV and disturbed child development.

An increasing number of studies have shown the causative effects of SV in child developmental disorders. Although less frequent than SNPs when counted as single events, SVs may have more functional effects in genes and thus on phenotypes due to the sometimes very large extent of genomic sequence, and consequently the large number of genes and other functional elements, affected by them. As already mentioned above, recent analyses reported a total of about 28,000 SVs from the examination of about 180 human genomes, out of which at least 1,700 SVs affected coding sequences (Mills et al., 2011). Probably a total of several percent of each human genome sequence is variable relative to the reference sequence as a result of SV (Alkan et al., 2011; Redon et al., 2006). It is predictable that in the near future, with the broader usage of the newly available analytical techniques, an even larger number of SVs will be cataloged in a large number of individual human genome sequences.

As we have discussed, SVs can take the shape of deletions, insertions, duplications, inversions, and translocations, but whether an SV will trigger or mediate a diseased state depends on whether dosage-sensitive genes or regulatory regions are affected by those genome sequence alterations. In the following we describe several examples of the detrimental effects of SVs on child development.

Deletion SVs and Associated Child Development Disorders

Williams syndrome and Velocardiofacial syndrome (VCFS) serve as examples for this category. These syndromes are caused by large deletion SVs on different, disease-specific chromosomes, with the heterozygous deletion SV spanning from 1 to 4 million base pairs, each encompassing multiple genes and their regulatory regions, and typically being flanked by regions of Segmental Duplication that are suspected to play a role in the molecular mechanisms leading to the formation of the deletion (i.e., NAHR).

Children with Williams syndrome have a distinctive pattern of facial dysmorphisms, connective tissue abnormalities, aortic stenosis, mental retardation, speech delay, and a characteristic neurobehavioral phenotype, including being indiscriminately friendly to strangers (hypersociability), fearing loud sounds, and being particularly interested in music. Most of the patients have a 1.5–1.8 Mbp deletion on chromosome 7q11.23, with around 28 genes falling into this region. All children with this syndrome have a confirmed phenotype–genotype association between the elastin gene and the defects in vascular and connective tissues (Ewart et al., 1993; Merla, Brunetti-Pierri, Micale, & Fusco, 2010).

VCFS (also DiGeorge syndrome or 22q11 Deletion syndrome) is associated with multiple congenital abnormalities, including typical facial features and a cleft palate, malformations of the heart and outflow tract, immune and endocrine disorders, borderline learning disabilities, and a very high rate of psychiatric disorders, most notably early-onset schizophrenia and related psychoses and disorders of the autism spectrum. Most children with this syndrome have a heterozygous 3 Mbp deletion on chromosome 22q11, which encompasses about 50 genes, with some of the children having a shorter deletion region (1.5 Mbp) and even fewer patients having a variety of different, shorter atypical deletions in the 22q11 region. Several of the genes within the deletion region have attracted considerable attention, as a result of their known or inferred function in developmental processes, or because of known or suspected key roles in nervous system function (Gothelf et al., 2005; McDonald-McGinn & Sullivan, 2011; Shprintzen, 2008). For example, one T-box transcription factor gene, TBX1, has been associated with the heart malformation phenotype in a mouse model of the disease (Jerome & Papaioannou, 2001). Another example of a gene with an interesting functionality that lies within the 22q11 deletion region is Catechol-O-methyltransferase (COMT), which codes for an enzyme that metabolizes the dopamine in neurons and is important for the balance of excitatory and inhibitory signals in the brain (Vorstman et al., 2009). But all in all, it is fair to say that the genetic and molecular etiology of VCFS is far from clear. For example, individuals with this syndrome have a rate of schizophrenia that is about 30 times higher than in the general population. Furthermore, individuals with this syndrome have a rate of ASD that is elevated about 20-fold. But these elevated rates do not match with the size or type of the particular genomic deletion SV in region 22q11 in a given individual with VCFS. In other words, although most individuals carry a seemingly identical 3 Mbp deletion encompassing about 50 genes, only 30% of them develop schizophrenia and 70% do not. This perplexing problem can hopefully be resolved by applying the new high-output second-generation sequencing-based technologies as discussed earlier, with the aim of resolving and cataloging all possible genetic modifiers of the main 3 Mbp deletion. Possible genetic modifiers could exist in the form of differences in the exact position of the deletion endpoints or as small SVs and other sequence variants within the boundaries of the main deletion, but on the other, non-deletion-carrying copy of chromosome 22. In addition, any other SVs and other sequence variants in general, anywhere in the patient’s genome, could have a further modifying effect on the individual’s phenotype and combination of symptoms.

Duplication SV and Associated Child Development Disorders

The deletions described earlier are often the results of NAHR, often arising in the SD regions that are flanking the SV-prone region. The same mechanism of NAHR is hypothesized to be able to give rise to sequence duplication variants, often in the same genomic location in which deletion variants have been known to occur. Although it is more common to see an association between a deletion and a developmental disorder, duplications of genomic sequence associated with a detrimental effect on phenotype are now more often observed, a result of the advent of more powerful and easily available genomic analysis technologies.

A developmental disorder that was shown early on to be associated with a structural sequence duplication event is Charcot-Marie-Tooth disease type 1A (CMT1A), which comprises symptoms such as distal muscle atrophy, sensory loss, and slow nerve conduction velocity (Lupski, 2009; Roa, Garcia, & Lupski, 1991). The associated SV is the duplication of a 1.1 Mbp genome fragment on chromosome 17p12 that carries the dosage-sensitive gene PMP22.

It is appearing more and more likely that quite frequently deletions and duplications are actually caused by a single rearrangement event due to the shared mechanism of NAHR. In fact, it seems that unequal meiotic crossover generates the duplication and deletion of the 1.5 Mb segment on chromosome 17p12, which is associated with CMT1A (duplication) or another developmental disorder, hereditary neuropathy with liability to pressure palsies (HNPP; deletion; Chance et al., 1994). Although both CMT1A and HNPP belong to the demyelinating peripheral neuropathies, the clinical features are quite different.

Complex Child Developmental Disorders Associated With Deletion or Duplication SV

Here we describe ASD and attention deficit hyperactivity disorder (ADHD) as examples to show how current cutting-edge technologies change the landscape of what we understand about complex child developmental disorders. They also serve as examples for the interesting observation that sometimes SVs in the form of either deletions or duplications can give rise to phenotype changes within the same spectrum.

ASDs are a group of neurodevelopmental disorders characterized by impairment in language and deficits in social-emotional functioning, along with stereotyped behaviors and restricted interests. The ASD category includes autistic disorder, Asperger syndrome, and pervasive developmental disorder not otherwise specified (or atypical autism). The estimated prevalence of ASD is at least 0.6% in the United States, with an occurrence ratio of 4:1 between males and females (Newschaffer et al., 2007). Twin and family studies showed that ASDs are among the most heritable complex disorders. Some studies have reported a concordance of 80% or more for monozygotic twins compared to 5% for dizygotic pairs, and a 25- to 40-fold increase in risk in ASD families compared to overall populations (Chakrabarti & Fombonne, 2005). Recent work has indicated that the effect of environmental factors may have been underestimated, but the existence of a genetic component of considerable magnitude in ASD is not in doubt (Hallmayer et al., 2011). Despite the strong evidence for a genetic involvement in ASD, only a few of the relevant genetic factors have been identified with certainty. Linkage studies have implicated several disease-associated regions including 2q, 7q, 15q, 16p, 17q, 19p, and Xq and a few candidate genes including MET, SLC6A4, RELN (reelin), PTEN, TSC1, neuroligins and their binding partners (Klauck, 2006; Losh, Sullivan, Trembath, & Piven, 2008; Veenstra-Vanderweele, Christian, & Cook, 2004). Mutations in those genes that can cause monogenic disorders such as fragile X syndrome and Rett syndrome have also been reported to contribute to the etiology of ASD (Laumonnier et al., 2004; Zhang et al., 2002).

After the discovery of pervasive genomic structural variation and with the rapid progress in the technology for structural variation analysis, many studies have now investigated the role of SVs in ASD. They have found that genomic structural variation in general is a factor of considerable interest in the molecular etiology of ASD and furthermore that several specific SVs (of which we will name only a few examples here) are particularly well associated with the disease. Sebat et al. (2007) conducted array CGH analysis and detected an increased rate of de novo copy number SVs in ASD. De novo copy number SVs are deletion or duplication SVs that are present in the patients but not in their parents. In 264 families, 17 de novo SVs were revealed and confirmed to be present in 16 individuals, predominantly in patients (14 of 195), with only 2 (of 196) being in controls. Such a strong association (p = .0005) suggests a significant contribution of these SVs to the disease. Using Affymetrix 10 K SNP arrays, the Autism Genome Project Consortium has genotyped and carried out linkage analysis on individuals coming from 1,181 ASD families with at least two affected individuals in each one (Pinto et al., 2010). A total of 254 copy number SVs were identified with high confidence in 196 ASD patients from 173 families, among which 10 families carry de novo SVs. In one family two ASD siblings bear an identical 300 kbp deletion on chromosome 2p16, causing the elimination of the neurexin 1 gene (NRXN1), which is a partner for neuroligins and plays a critical role in synaptogenesis. A 933 kbp de novo duplication on 17p12 was detected in two pairs of ASD siblings and in one ASD individual from three families; we note that the duplication of this region also contributes to the etiology of CMT1A (see above).
A genome-wide association study was conducted in 751 multiplex families from the Autism Genetic Resource Exchange and a certain number of de novo SVs were confirmed by evaluation of clinical data from the Children’s Hospital of Boston and results from a population study in Iceland (Weiss et al., 2008). This revealed a recurrent deletion SV on chromosome 16p11.2 (a locus for which a reciprocal duplication SV is known to exist as well) that is tightly associated with ASD and appears to account for 1% of ASD cases studied. This association between the 16p11.2 deletion SV and ASD was also reported by Marshall et al. (2008). To pinpoint copy number SVs that carry high susceptibility to ASDs, another whole-genome scan was conducted in 859 ASD cases and 1,409 healthy controls using Illumina HumanHap550 BeadChip SNP arrays (Glessner et al., 2009). The candidate SVs were retested in another cohort consisting of 1,336 ASD cases and 1,110 controls. This study has identified two groups of genes involved in neuronal cell adhesion (NLGN1 and ASTN2) and ubiquitin pathways (UBE3A, PARK2, RFWD2, and FBXO40) that may confer increased susceptibility to ASD. More recently several expansive and impressive studies have confirmed, expanded, and refined these findings about an important role for genomic SVs, inherited as well as de novo, in the disorders of the autism spectrum (Levy et al., 2011; Pinto et al., 2010; Sanders et al., 2011).
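The notion of a de novo SV used in these studies, a deletion or duplication present in a child but in neither parent, can be expressed as a simple filter over SV calls from a family trio. The sketch below is illustrative only: the `SVCall` record, the 50% reciprocal-overlap criterion for treating two calls as the same event, and the example coordinates are assumptions, not the procedure of any of the cited studies.

```python
from typing import Iterable, List, NamedTuple


class SVCall(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # "deletion" or "duplication"


def reciprocal_overlap(a: SVCall, b: SVCall) -> float:
    """Smaller of the two fractions of each call covered by the other."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def de_novo_calls(child: Iterable[SVCall], mother: Iterable[SVCall],
                  father: Iterable[SVCall], min_overlap: float = 0.5) -> List[SVCall]:
    """Keep child calls that match no parental call of the same type."""
    parental = list(mother) + list(father)
    return [call for call in child
            if all(reciprocal_overlap(call, p) < min_overlap for p in parental)]


child = [SVCall("chr16", 100, 200, "deletion"), SVCall("chr2", 1000, 2000, "duplication")]
mother = [SVCall("chr2", 1100, 2100, "duplication")]
print(de_novo_calls(child, mother, father=[]))  # only the chr16 deletion remains
```

In practice, candidate de novo calls obtained this way would still require independent experimental confirmation, since a call missed in a parent would otherwise be mistaken for a new event.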

ADHD is characterized by age-inappropriate inattentive, hyperactive, and impulsive symptoms (American Psychiatric Association, 2000). Elia et al. (2010) evaluated the effects of SVs in ADHD using the Illumina Infinium II HumanHap550 BeadChip. They identified 222 inherited SVs that were present in ADHD patients but not in 2,026 controls. The ADHD-specific SVs were enriched with genes important for neurodevelopment and neurological functions, with some of them being candidate genes for other neurological disorders including ASD, mental retardation, and TS. A second study conducted by Williams et al. (2010) also revealed significantly higher rates (p = 8.9 × 10⁻⁵) of large (> 500 kb) rare (< 1%) SVs in 366 ADHD children compared to 1,047 ethnically matched controls. Interestingly, chromosome 16p13.11 duplications were enriched in the ADHD group, and furthermore SVs identified as enriched in the ADHD children group were in loci that had also been reported for ASD and schizophrenia. In addition, a very recent study reported by the same group also found a significant association of duplications at 15q13.3 with ADHD, which was confirmed by the analysis of an additional 2,242 ADHD case subjects and 8,552 comparison controls from four unrelated cohorts (Williams et al., 2012). In another very recent study, this one again by Elia et al. (2012), 1,013 ADHD cases and 4,105 healthy children were analyzed for disease-associated CNVs using 550 k SNPs. It was found that there is a significant enrichment of CNVs in the ADHD patients in the various genes of the metabotropic glutamate receptors (GRM1 through GRM8) as well as in genes that interact with the GRM genes, with about 10% of the cases showing CNV enrichment in this GRM-interaction network.

Inversion of Genomic Sequence and Associated Child Development Disorders

As another type of genomic structural variation, inversion of stretches of genomic sequence is also involved in child developmental disorders. Hunter syndrome is an X-linked genetic disorder caused by insufficiency of the enzyme iduronate-2-sulfatase (IDS), which leads to the accumulation of glycosaminoglycans (GAGs) in the body, especially of heparan sulfate and dermatan sulfate in the lysosomes, and their excretion in the urine. A wide range of phenotypes was observed in children with this syndrome, with severe forms including pronounced mental retardation, short stature, deafness, spasticity, and progressive damage to multiple organs, including liver and brain, usually causing the death of children with this syndrome before adulthood (Scriver, Beaudet, Sly, & Valle, 1989). Individuals affected with a mild form of this syndrome may only exhibit mild or no mental retardation and can live through adulthood. Different mutations and deletions were revealed in about 20% of the patients, whereas in 13% of the children with Hunter syndrome, there is a structural sequence inversion between the IDS gene and a region called 2nd IDS locus (IDS-2) located within 90 kb telomeric of the IDS gene, which disrupts intron 7 of IDS and causes the deficiency of this gene (Bondeson et al., 1995).

In addition to directly causing some disorders, genomic sequence inversions can be a risk factor for certain developmental diseases: in these cases the inversions do not have effects in parents, but increase the risk of their offspring having a disease-associated SV (another example of SVs giving rise to more SVs). One example of this is Sotos syndrome, an autosomal dominant disorder characterized by physical overgrowth and distinctive craniofacial dysmorphic features, accompanied by mild to severe mental retardation and motor, cognitive, and social developmental delay (Kurotaki et al., 2003). A much higher percentage (52%) of Japanese Sotos syndrome patients carry a deletion SV on chromosome 5q35 than non-Japanese patients (6%), and fathers of these patients predominantly carry a 1.9-Mb inversion variant on chromosome 5q35 that predisposes to the disease in their offspring. This has highlighted the importance of detecting inversion variants even in the healthy population to predict the risk of diseases in the next generation.

Conclusion

Novel experimental technologies, in particular high-output next-generation (currently second-generation) DNA sequencing, have made it possible to detect comprehensively, and with a high degree of accuracy, copy number variation and more generally structural variation in the human genomic DNA sequence. It has become obvious that such structural variation is widespread even in the normal human genome and that it has to be taken into account when working to understand the relation between genotype and phenotype. Similarly, the effects of structural variation must be incorporated in our efforts to understand the molecular etiologies of disease phenotypes, including those that are of central importance in child development. Further investigations need to be undertaken to promote the discovery and cataloging of all common and rare SVs. Efforts to associate phenotypes, especially nondisease phenotypes, with SVs have only just commenced. Furthermore, we need to address the intricate effects of combinations of several SVs and non-SV sequence variants within a single genome, each exerting only a minor effect and maybe only when present in a particular genomic background. The greatest challenge will be to understand complex genotypes made up from both common and rare variants leading to complex phenotypes. Given the advanced technological toolbox at our disposal for both experimental and computational analysis, we find that we are better positioned than ever to undertake comprehensive studies of the effects of SVs in complex disorders. This should eventually profoundly deepen our understanding of the variation in child development in disease and in health.

References

1000 Genomes Consortium. (2010). A map of human genome variation from population-scale sequencing. Nature, 467, 1061–1073.

Abyzov, A., & Gerstein, M. (2011). AGE: Defining breakpoints of genomic structural variants at single-nucleotide resolution, through optimal alignments with gap excision. Bioinformatics, 27, 595–603.

Abyzov, A., Urban, A. E., Snyder, M., & Gerstein, M. (2011). CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Research, 21, 974–984.

Alkan, C., Coe, B. P., & Eichler, E. E. (2011). Genome structural variation discovery and genotyping. Nature Reviews Genetics, 12, 363–376.

Alkan, C., Kidd, J. M., Marques-Bonet, T., Aksay, G., Antonacci, F., Hormozdiari, F., et al. (2009). Personalized copy number and segmental duplication maps using next-generation sequencing. Nature Genetics, 41, 1061–1067.

American Psychiatric Association (Ed.). (2000). Diagnosticand statistical manual of mental disorders DSM-IV-TR(Text Revision, 4th ed.). Arlington, VA: Author.

Beaudet, A. L. (2013). The utility of chromosomal micro-array analysis in developmental and behavioral pediat-rics. Child Development, 84, 121–132.

Bondeson, M. L., Dahl, N., Malmgren, H., Kleijer, W. J.,Tonnesen, T., Carlberg, B. M., et al. (1995). Inversion ofthe IDS gene resulting from recombination with IDS-related sequences is a common cause of the Huntersyndrome. Human Molecular Genetics, 4, 615–621.

Brunetti-Pierri, N., Berg, J. S., Scaglia, F., Belmont, J.,Bacino, C. A., Sahoo, T., et al. (2008). Recurrent recipro-cal 1q21.1 deletions and duplications associated withmicrocephaly or macrocephaly and developmental andbehavioral abnormalities. Nature Genetics, 40, 1466–1471.

Chakrabarti, S., & Fombonne, E. (2005). Pervasive devel-opmental disorders in preschool children: Confirmationof high prevalence. American Journal of Psychiatry, 162,1133–1141.

Chance, P. F., Abbas, N., Lensch, M. W., Pentao, L., Roa,B. B., Patel, P. I., et al. (1994). Two autosomal dominantneuropathies result from reciprocal DNA duplication/deletion of a region on chromosome 17. Human Molecu-lar Genetics, 3, 223–228.

Chen, K., Wallis, J. W., McLellan, M. D., Larson, D. E.,Kalicki, J. M., Pohl, C. S., et al. (2009). BreakDancer: Analgorithm for high-resolution mapping of genomicstructural variation. Nature Methods, 6, 677–681.

Chiang, D. Y., Getz, G., Jaffe, D. B., O’Kelly, M. J., Zhao,X., Carter, S. L., et al. (2009). High-resolution mappingof copy-number alterations with massively parallelsequencing. Nature Methods, 6, 99–103.

Conrad, D., Pinto, D., Redon, R., Feuk, L., Gokcumen, O.,Zhang, Y., et al. (2010). Origins and functional impactof copy number variation in the human genome. Nat-ure, 464, 704–712.

Elia, J., Gai, X., Xie, H. M., Perin, J. C., Geiger, E., Gless-ner, J. T., et al. (2010). Rare structural variants found inattention-deficit hyperactivity disorder are preferen-tially associated with neurodevelopmental genes. Molec-ular Psychiatry, 15(6), 637–646.

Elia, J., Glessner, J. T., Wang, K., Takahashi, N., Shtir, C.J., Hadley, D., et al. (2012). Genome-wide copy numbervariation study associates metabotropic glutamatereceptor gene networks with attention deficit hyperac-tivity disorder. Nature Genetics, 44, 78–84.

Ewart, A. K., Morris, C. A., Atkinson, D., Jin, W., Sternes, K., Spallone, P., et al. (1993). Hemizygosity at the elastin locus in a developmental disorder, Williams syndrome. Nature Genetics, 5, 11–16.

Francke, U. (1978). Clinical syndromes associated with partial duplications of chromosomes 2 and 3: dup(2p), dup(2q), dup(3p), dup(3q). Birth Defects Original Article Series, 14, 191–217.

Glessner, J. T., Connolly, J. J., & Hakonarson, H. (2012). Rare genomic deletions and duplications and their role in neurodevelopmental disorders. Current Topics in Behavioral Neurosciences, 12, 345–360.

Glessner, J. T., Wang, K., Cai, G., Korvatska, O., Kim, C. E., Wood, S., et al. (2009). Autism genome-wide copy number variation reveals ubiquitin and neuronal genes. Nature, 459, 569–573.

Gonzaga-Jauregui, C., Lupski, J. R., & Gibbs, R. A. (2012). Human genome sequencing in health and disease. Annual Review of Medicine, 63, 35–61.

Gonzalez, E., Kulkarni, H., Bolivar, H., Mangano, A., Sanchez, R., Catano, G., et al. (2005). The influence of CCL3L1 gene-containing segmental duplications on HIV-1/AIDS susceptibility. Science, 307, 1434–1440.

Gothelf, D., Eliez, S., Thompson, T., Hinard, C., Penniman, L., Feinstein, C., et al. (2005). COMT genotype predicts longitudinal cognitive decline and psychosis in 22q11.2 deletion syndrome. Nature Neuroscience, 8, 1500–1502.

Hallmayer, J., Cleveland, S., Torres, A., Phillips, J., Cohen, B., Torigoe, T., et al. (2011). Genetic heritability and shared environmental factors among twin pairs with autism. Archives of General Psychiatry, 68, 1095–1102.

Haraksingh, R. R., Abyzov, A., Gerstein, M., Urban, A. E., & Snyder, M. (2011). Genome-wide mapping of copy number variation in humans: Comparative analysis of high resolution array platforms. PLoS ONE, 6, e27859.

Iafrate, A. J., Feuk, L., Rivera, M. N., Listewnik, M. L., Donahoe, P. K., Qi, Y., et al. (2004). Detection of large-scale variation in the human genome. Nature Genetics, 36, 949–951.

The International HapMap Consortium. (2005). A haplotype map of the human genome. Nature, 437, 1299–1320.

Ionita-Laza, I., Rogers, A. J., Lange, C., Raby, B. A., & Lee, C. (2009). Genetic association analysis of copy-number variation (CNV) in human disease pathogenesis. Genomics, 93, 22–26.

Iskow, R. C., McCabe, M. T., Mills, R. E., Torene, S., Pittard, W. S., Neuwald, A. F., et al. (2010). Natural mutagenesis of human genomes by endogenous retrotransposons. Cell, 141, 1253–1261.

Itsara, A., Cooper, G. M., Baker, C., Girirajan, S., Li, J., Absher, D., et al. (2009). Population analysis of large copy number variants and hotspots of human genetic disease. American Journal of Human Genetics, 84, 148–161.

Jerome, L. A., & Papaioannou, V. E. (2001). DiGeorge syndrome phenotype in mice mutant for the T-box gene, Tbx1. Nature Genetics, 27, 286–291.

Kasowski, M., Grubert, F., Heffelfinger, C., Hariharan, M., Asabere, A., Waszak, S. M., et al. (2010). Variation in transcription factor binding among humans. Science, 328, 232–235.

Kidd, J. M., Cooper, G. M., Donahue, W. F., Hayden, H. S., Sampas, N., Graves, T., et al. (2008). Mapping and sequencing of structural variation from eight human genomes. Nature, 453, 56–64.

Kim, P. M., Lam, H. Y., Urban, A. E., Korbel, J. O., Affourtit, J., Grubert, F., et al. (2008). Analysis of copy number variants and segmental duplications in the human genome: Evidence for a change in the process of formation in recent evolutionary history. Genome Research, 18, 1865–1874.

Klauck, S. M. (2006). Genetics of autism spectrum disorder. European Journal of Human Genetics, 14, 714–720.

Korbel, J. O., Abyzov, A., Mu, X. J., Carriero, N., Cayting, P., Zhang, Z., et al. (2009). PEMer: A computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data. Genome Biology, 10, R23.

Korbel, J. O., Kim, P. M., Chen, X., Urban, A. E., Weissman, S., Snyder, M., et al. (2008). The current excitement about copy-number variation: How it relates to gene duplications and protein families. Current Opinion in Structural Biology, 18, 366–374.

Korbel, J. O., Tirosh-Wagner, T., Urban, A. E., Chen, X. N., Kasowski, M., Dai, L., et al. (2009). The genetic architecture of Down syndrome phenotypes revealed by high-resolution analysis of human segmental trisomies. Proceedings of the National Academy of Sciences of the United States of America, 106, 12031–12036.

Korbel, J. O., Urban, A. E., Affourtit, J. P., Godwin, B., Grubert, F., Simons, J. F., et al. (2007). Paired-end mapping reveals extensive structural variation in the human genome. Science, 318, 420–426.

Kurotaki, N., Harada, N., Shimokawa, O., Miyake, N., Kawame, H., Uetake, K., et al. (2003). Fifty microdeletions among 112 cases of Sotos syndrome: Low copy repeats possibly mediate the common deletion. Human Mutation, 22, 378–387.

Lam, H. Y., Pan, C., Clark, M. J., Lacroute, P., Chen, R., Haraksingh, R., et al. (2012). Detecting and annotating genetic variations using the HugeSeq pipeline. Nature Biotechnology, 30, 226–229.

Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., Baldwin, J., et al. (2001). Initial sequencing and analysis of the human genome. Nature, 409, 860–921.

Laumonnier, F., Bonnet-Brilhault, F., Gomot, M., Blanc, R., David, A., Moizard, M. P., et al. (2004). X-linked mental retardation and autism are associated with a mutation in the NLGN4 gene, a member of the neuroligin family. American Journal of Human Genetics, 74, 552–557.

Levy, D., Ronemus, M., Yamrom, B., Lee, Y. H., Leotta, A., Kendall, J., et al. (2011). Rare de novo and transmitted copy-number variation in autistic spectrum disorders. Neuron, 70, 886–897.

Losh, M., Sullivan, P. F., Trembath, D., & Piven, J. (2008). Current developments in the genetics of autism: From phenome to genome. Journal of Neuropathology and Experimental Neurology, 67, 829–837.

Lupski, J. R. (2009). Genomic disorders ten years on. Genome Medicine, 1, 42.

Marques-Bonet, T., & Eichler, E. E. (2009). The evolution of human segmental duplications and the core duplicon hypothesis. Cold Spring Harbor Symposia on Quantitative Biology, 74, 355–362.

Marshall, C. R., Noor, A., Vincent, J. B., Lionel, A. C., Feuk, L., Skaug, J., et al. (2008). Structural variation of chromosomes in autism spectrum disorder. American Journal of Human Genetics, 82, 477–488.

McCarroll, S. A., Kuruvilla, F. G., Korn, J. M., Cawley, S., Nemesh, J., Wysoker, A., et al. (2008). Integrated detection and population-genetic analysis of SNPs and copy number variation. Nature Genetics, 40, 1166–1174.

McCarthy, S. E., Makarov, V., Kirov, G., Addington, A. M., McClellan, J., Yoon, S., et al. (2009). Microduplications of 16p11.2 are associated with schizophrenia. Nature Genetics, 41, 1223–1227.

McDonald-McGinn, D. M., & Sullivan, K. E. (2011). Chromosome 22q11.2 deletion syndrome (DiGeorge syndrome/velocardiofacial syndrome). Medicine (Baltimore), 90, 1–18.

Mefford, H. C., & Eichler, E. E. (2009). Duplication hotspots, rare genomic disorders, and common disease. Current Opinion in Genetics & Development, 19, 196–204.

Merla, G., Brunetti-Pierri, N., Micale, L., & Fusco, C. (2010). Copy number variants at Williams-Beuren syndrome 7q11.23 region. Human Genetics, 128, 3–26.

Mills, R. E., Luttig, C. T., Larkins, C. E., Beauchamp, A., Tsui, C., Pittard, W. S., et al. (2006). An initial map of insertion and deletion (INDEL) variation in the human genome. Genome Research, 16, 1182–1190.

Mills, R. E., Walter, K., Stewart, C., Handsaker, R. E., Chen, K., Alkan, C., et al. (2011). Mapping copy number variation by population-scale genome sequencing. Nature, 470, 59–65.

Newschaffer, C. J., Croen, L. A., Daniels, J., Giarelli, E., Grether, J. K., Levy, S. E., et al. (2007). The epidemiology of autism spectrum disorders. Annual Review of Public Health, 28, 235–258.

Perry, G. H., Dominy, N. J., Claw, K. G., Lee, A. S., Fiegler, H., Redon, R., et al. (2007). Diet and the evolution of human amylase gene copy number variation. Nature Genetics, 39, 1256–1260.

Pinto, D., Pagnamenta, A. T., Klei, L., Anney, R., Merico, D., Regan, R., et al. (2010). Functional impact of global rare copy number variation in autism spectrum disorders. Nature, 466, 368–372.

Redon, R., Ishikawa, S., Fitch, K. R., Feuk, L., Perry, G. H., Andrews, T. D., et al. (2006). Global variation in copy number in the human genome. Nature, 444, 444–454.

Ritz, A., Bashir, A., & Raphael, B. J. (2010). Structural variation analysis with strobe reads. Bioinformatics, 26, 1291–1298.

Roa, B. B., Garcia, C. A., & Lupski, J. R. (1991). Charcot-Marie-Tooth disease type 1A: Molecular mechanisms of gene dosage and point mutation underlying a common inherited peripheral neuropathy. International Journal of Neurology, 25–26, 97–107.

Sachidanandam, R., Weissman, D., Schmidt, S. C., Kakol, J. M., Stein, L. D., Marth, G., et al. (2001). A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature, 409, 928–933.

Sanders, S. J., Ercan-Sencicek, A. G., Hus, V., Luo, R., Murtha, M. T., Moreno-De-Luca, D., et al. (2011). Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism. Neuron, 70, 863–885.

Schatz, D. G., & Ji, Y. (2011). Recombination centres and the orchestration of V(D)J recombination. Nature Reviews Immunology, 11, 251–263.

Schuster-Bockler, B., Conrad, D., & Bateman, A. (2010). Dosage sensitivity shapes the evolution of copy-number varied regions. PLoS ONE, 5, e9474.

Scriver, C. R., Beaudet, A. L., Sly, W. S., & Valle, D. (Eds.). (1989). The metabolic basis of inherited disease. New York: McGraw-Hill.

Sebat, J., Lakshmi, B., Malhotra, D., Troge, J., Lese-Martin, C., Walsh, T., et al. (2007). Strong association of de novo copy number mutations with autism. Science, 316, 445–449.

Sebat, J., Lakshmi, B., Troge, J., Alexander, J., Young, J., Lundin, P., et al. (2004). Large-scale copy number polymorphism in the human genome. Science, 305, 525–528.

Shprintzen, R. J. (2008). Velo-cardio-facial syndrome: 30 years of study. Developmental Disabilities Research Reviews, 14, 3–10.

Stankiewicz, P., & Lupski, J. R. (2010). Structural variation in the human genome and its role in disease. Annual Review of Medicine, 61, 437–455.

Stiles, J. (2011). Brain development and the nature versus nurture debate. Progress in Brain Research, 189, 3–22.

Taylor, J. G., Choi, E. H., Foster, C. B., & Chanock, S. J. (2001). Using genetic variation to study human disease. Trends in Molecular Medicine, 7, 507–512.

Tom Strachan, A. P. R. (Ed.). (2004). Human molecular genetics 3. New York: Garland.

Tuzun, E., Sharp, A. J., Bailey, J. A., Kaul, R., Morrison, V. A., Pertz, L. M., et al. (2005). Fine-scale structural variation of the human genome. Nature Genetics, 37, 727–732.

Urban, A. E., Korbel, J. O., Selzer, R., Richmond, T., Hacker, A., Popescu, G. V., et al. (2006). High-resolution mapping of DNA copy alterations in human chromosome 22 using high-density tiling oligonucleotide arrays. Proceedings of the National Academy of Sciences of the United States of America, 103, 4534–4539.

Veenstra-Vanderweele, J., Christian, S. L., & Cook, E. H., Jr. (2004). Autism as a paradigmatic complex genetic disorder. Annual Review of Genomics and Human Genetics, 5, 379–405.

Vorstman, J. A., Turetsky, B. I., Sijmens-Morcus, M. E., de Sain, M. G., Dorland, B., Sprong, M., et al. (2009). Proline affects brain function in 22q11DS children with the low activity COMT 158 allele. Neuropsychopharmacology, 34, 739–746.

Weiss, L. A., Shen, Y., Korn, J. M., Arking, D. E., Miller, D. T., Fossdal, R., et al. (2008). Association between microdeletion and microduplication at 16p11.2 and autism. New England Journal of Medicine, 358, 667–675.

Williams, N. M., Franke, B., Mick, E., Anney, R. J., Freitag, C. M., Gill, M., et al. (2012). Genome-wide analysis of copy number variants in attention deficit hyperactivity disorder: The role of rare variants and duplications at 15q13.3. American Journal of Psychiatry, 169, 195–204.

Williams, N. M., Zaharieva, I., Martin, A., Langley, K., Mantripragada, K., Fossdal, R., et al. (2010). Rare chromosomal deletions and duplications in attention-deficit hyperactivity disorder: A genome-wide analysis. Lancet, 376, 1401–1408.

Ye, K., Schulz, M. H., Long, Q., Apweiler, R., & Ning, Z. (2009). Pindel: A pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics, 25, 2865–2871.

Zhang, F., Carvalho, C. M., & Lupski, J. R. (2009). Complex human chromosomal and genomic rearrangements. Trends in Genetics, 25, 298–307.

Zhang, F., Khajavi, M., Connolly, A. M., Towne, C. F., Batish, S. D., & Lupski, J. R. (2009). The DNA replication FoSTeS/MMBIR mechanism can generate genomic, genic and exonic complex rearrangements in humans. Nature Genetics, 41, 849–853.

Zhang, H., Liu, X., Zhang, C., Mundo, E., Macciardi, F., Grayson, D. R., et al. (2002). Reelin gene alleles and susceptibility to autism spectrum disorders. Molecular Psychiatry, 7, 1012–1017.

Zhang, Z. D., Du, J., Lam, H., Abyzov, A., Urban, A. E., Snyder, M., et al. (2011). Identification of genomic indels and structural variations using split reads. BMC Genomics, 12, 375.
