Mario Cáceres
Genomic structural variation
The “new” genomic variation
Through structural changes, DNA sequence differs across individuals much more than researchers had suspected
A huge amount of structural variation has been discovered in the genomes of all species studied, and it could have a much bigger biological impact than SNPs
Overview
1. Types of structural variants (SVs)
2. Methods for detecting SVs
3. Indels and CNVs
4. Inversions
5. Mechanisms of generation
6. Functional effects and examples
What are structural variants
Genomic alterations other than nucleotide substitutions that change the organization of the DNA molecule
Traditionally, detection was limited by the resolution of the techniques, and SVs were considered to be quite large
Currently, SVs include changes down to the base level, although they are typically >1 kb (no clear size limit)
Types of structural variants
Reference: A B C D
Insertion: A B C E D
Deletion: A B C
Inversion: A BC D (segment BC inverted)
Copy Number Variation ("CNVs"): A B C C C D
and translocations...
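As a rough illustration of the variant types above, the sketch below models a chromosome as a list of labeled segments and applies each rearrangement to the reference A B C D. The function names and the "-" prefix used to mark an inverted segment are assumptions made for this example, not notation from the slides.

```python
# Toy model (illustrative assumption): a chromosome is a list of segment labels.
REFERENCE = ["A", "B", "C", "D"]


def flip(segment):
    """Toggle a '-' prefix that marks reversed orientation (hypothetical notation)."""
    return segment[1:] if segment.startswith("-") else "-" + segment


def insertion(segments, index, new_segment):
    """Insert a new segment before position `index`."""
    return segments[:index] + [new_segment] + segments[index:]


def deletion(segments, start, end):
    """Remove the segments in the half-open range [start, end)."""
    return segments[:start] + segments[end:]


def inversion(segments, start, end):
    """Reverse the order and orientation of the segments in [start, end)."""
    inverted = [flip(s) for s in reversed(segments[start:end])]
    return segments[:start] + inverted + segments[end:]


def copy_number_gain(segments, index, copies):
    """Replace the segment at `index` with `copies` tandem copies of itself."""
    return segments[:index] + [segments[index]] * copies + segments[index + 1:]


if __name__ == "__main__":
    print("Insertion:", insertion(REFERENCE, 3, "E"))
    print("Deletion: ", deletion(REFERENCE, 3, 4))
    print("Inversion:", inversion(REFERENCE, 1, 3))
    print("CNV gain: ", copy_number_gain(REFERENCE, 2, 3))
```

Translocations would need a second chromosome list and are left out to keep the sketch short.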
Methods for detecting SVs
New genomic techniques have changed the focus from large to small events and from locus-specific to global studies:
• Cytogenetic techniques (regular karyotyping, FISH, fiber FISH)
• Array comparative genomic hybridization (aCGH) (BAC arrays, high-density oligo arrays)
• SNP array genotyping (Affymetrix, Illumina)
• Paired-end mapping (Sanger or next-generation sequencing)
[Figure axes: throughput (− to +) and resolution (− to +)]
Cytogenetic techniques
[Figure panels: Karyotyping, FISH, Chromosome painting, Fiber FISH]
Array CGH
The ratio of fluorescence intensity between the test and the reference DNA indicates the difference in copy number at a particular location in the genome
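A minimal sketch of the ratio idea: intensities are usually compared on a log2 scale, so that equal copy number gives 0, a gain gives a positive value and a loss a negative one. The function names and the ±0.3 thresholds are assumptions chosen for illustration, not values from the slides; real analyses normalize the data and segment neighboring probes before calling.

```python
import math


def log2_ratio(test_intensity, reference_intensity):
    """Log2 of the test/reference fluorescence ratio for one probe."""
    if test_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(test_intensity / reference_intensity)


def call_probe(ratio, gain_threshold=0.3, loss_threshold=-0.3):
    """Label a probe as gain, loss, or no change (thresholds are assumed values)."""
    if ratio >= gain_threshold:
        return "gain"
    if ratio <= loss_threshold:
        return "loss"
    return "no change"


if __name__ == "__main__":
    # Hypothetical intensities: 3 copies in the test vs. 2 in the reference
    # would ideally give a ratio of 1.5.
    ratio = log2_ratio(1500.0, 1000.0)
    print(round(ratio, 3), call_probe(ratio))
```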
Characteristics of aCGH arrays
BAC arrays:
• 1 BAC (100-200 kb) per genomic region (whole-genome tiling path)
• Good genomic coverage
• Low resolution (>20-50 kb)
High-density oligonucleotide arrays:
• Variable oligo spacing (~2-10 kb on Agilent 244K or 1M arrays)
• Targeted or whole-genome approach
• Coverage depending on probe density
• High resolution (0.2-10 kb)
Paired-end mapping (PEM) method
High-throughput way of detecting all kinds of medium- and large-sized SVs in an individual:
1. Generation of a DNA library of fragments of defined size from the sample of interest
2. Sequencing of both ends of a large number of fragments
3. Mapping of both ends to the reference genome and computational prediction of SVs
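Paired-end mapping (PEM) methods
[Figure: PEM span distribution and the signatures of each SV type]
• Normally mapped: span within the expected distribution
• PEM span > cutoff: deletion (D)
• PEM span < cutoff: insertion (I)
• Opposite end orientation: inversion
• Short-fragment PEM: Low coverage/High resolution (500 bp - 3 kb)
• Long-fragment PEM: High coverage/Low resolution (10 kb - 40 kb)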
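The PEM signatures can be sketched as a simple classifier for one mapped pair. `MappedPair`, `classify_pair` and the cutoff values are hypothetical names and numbers chosen for this example. The sketch assumes a library in which the two ends of a normal pair map to opposite strands, and real analyses typically require several supporting pairs before predicting an SV.

```python
from dataclasses import dataclass


@dataclass
class MappedPair:
    left_pos: int       # reference coordinate of the leftmost end
    right_pos: int      # reference coordinate of the rightmost end
    left_strand: str    # "+" or "-"
    right_strand: str   # "+" or "-"


def classify_pair(pair, min_span, max_span):
    """Classify one pair by orientation first, then by mapped span."""
    # Assumption: in a normal pair the two ends map to opposite strands,
    # so two ends on the same strand suggest an inversion breakpoint.
    if pair.left_strand == pair.right_strand:
        return "inversion"
    span = pair.right_pos - pair.left_pos
    if span > max_span:
        # Ends map farther apart than the fragment size: sequence missing in the sample.
        return "deletion"
    if span < min_span:
        # Ends map closer together than the fragment size: extra sequence in the sample.
        return "insertion"
    return "normal"


if __name__ == "__main__":
    # Hypothetical cutoffs for a ~3 kb fragment library.
    pairs = [
        MappedPair(1000, 4000, "+", "-"),
        MappedPair(1000, 9000, "+", "-"),
        MappedPair(1000, 1500, "+", "-"),
        MappedPair(1000, 4000, "+", "+"),
    ]
    for p in pairs:
        print(p, classify_pair(p, min_span=2000, max_span=4500))
```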
SV detection methods comparison
Feuk et al. Nature Reviews Genetics 7, 85–97 (February 2006) | doi:10.1038/nrg1767
Importance of structural variants
http://projects.tcag.ca/variation/
• Total entries: 89,427
• CNVs: 57,829
• Inversions: 850
• InDels (100 bp - 1 kb): 30,748
• Total CNV loci: 14,478
• Articles cited: 38
(March 25, 2010)
[Figure legend: segmental duplications, inversion breakpoints, copy number variants]
Timescale of CNV studies
(Beckmann et al., Nat. Rev. Genet. 2007)
CNV discovery is not that recent!
Initial large-scale CNV studies
• Analysis of 20 individuals using low-density arrays
• 76 unique CNV regions of >100 kb
• ~11 CNVs differing between pairs of individuals

• Analysis of 55 individuals using BAC arrays (1 Mb resolution)
• 224 unique CNV regions
• 12.4 CNVs differing between pairs of individuals
Global analysis of human CNVs
• CNV map by WGTP BAC arrays and Affymetrix arrays
• 270 individuals of European, African and Asian ancestry
• 1447 CNV regions covering 360 Mb (12% of the genome)

• Comprehensive CNV map by tiling oligonucleotide arrays (42 million probes)
• 450 individuals of European, African and Asian ancestry
• 11700 CNVs of >0.5 kb assayed
Comprehensive human CNV map
[Figure: Overview of experimental strategy for CNV discovery and genotyping]
(Conrad et al., Nature 2009)
On average:
• 1098 validated CNVs differ between two individual genomes
• Detected CNVs ranged from 443 bp to 1.28 Mb (median size 2.9 kb)
• Validated CNVs cover a total of 112.7 Mb (3.7% of the genome)
Population distribution of CNVs
CNVs across 29 human populations
CNV-based phylogenetic tree
CNVs in other species
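Human polymorphic inversions
Chrom. region | Size | Freq. | Detection method | Phenotypic effect | Origin | Reference
17q21.31-32 | 900 Kb | 17% | PFGE/genot. | Increased fertility | NAHR | Stefansson 2005
15q11.2-13.1 | 6.0 Mb | 4% | FISH/PFGE | unknown | - | Gimelli 2003
16p12.2 | 305 Kb | - | Comp. map. | unknown | - | Martin 2004
4p16.1-16.2 | 5.1 Mb | 9% | FISH | unknown | NAHR? | Giglio 2002
8p23.1 | 4.6 Mb | 15% | FISH | unknown | (unassigned) | Giglio 2002
10p11.21-q21.1 | 22.6 Mb | - | Cloning | unknown | (unassigned) | Gilling 2006
7q11.23 | 2.2 Mb | 5% | FISH/PFGE | Williams susceptibility? | (unassigned) | Osborne 2001
Xq28 | 49 Kb | 8-72% | Southern | unknown | (unassigned) | Small 1997
Note: the Origin entries for the last four rows read DSB?, NAHR?, NAHR, NAHR, but their row assignment cannot be recovered from this copy.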
Insights from genomic studies
New genomic techniques have uncovered an unprecedented degree of all types of structural variation, including inversions:
• Feuk et al., PLOS Genet. 2005:
  - 1576 putative inversions between human and chimp genomes (23/27 experimentally validated)
  - Three of those were polymorphic in 10 European individuals
• Korbel et al., Science 2007:
  - Paired-end mapping of ~3 kb fragments in 2 individuals of European and African ancestry
  - 122 inversion breakpoints by genomic comparison with the human reference sequence
• Tuzun et al., Nat. Genet. 2005; Kidd et al., Nature 2008:
  - Fosmid paired-end mapping in 9 individuals of different geographic origins
  - 224 inversions validated by FISH, fingerprint analysis or breakpoint sequencing
• Levy et al., PLOS Biol. 2007:
  - 90 inversions identified by direct comparison of the C. Venter and the HG18 reference genome sequence assemblies
• And...:
  - Several individual genome sequences, 1000 Genomes Project, etc.
Current inversion data
Total of 850 inversions in the Database of Genomic Variants (but many redundant!)
[Figure: Chromosomal distribution. Frequency of inversions per chromosome (chr1-chr22, chrX, chrY), observed vs. expected distribution]
[Figure: Size distribution of inversions and CNVs]
INVFEST project (INVersion Functional & Evolutionary Studies)
[Diagram: large-scale genomic information, traditional studies on inversions, and the latest data on predicted human inversions feed into the study of the functional and evolutionary impact of inversions]
Objectives and main questions
Data sources:
• Human genome sequences
• Structural variation data
• Functional annotation
• Gene-expression levels
• HapMap data
• Other species genomes
Objectives:
• Catalogue of human inversions
• Evolutionary history of inversions
• Functional consequences of inversions
• Effect on nucleotide variation
Main scientific contributions
• Biological consequences of structural variants.
• Evaluate/Develop tools for study of inversions.
• Complete characterization of human inversions.
• Determine inversion genetic effects and adaptive value.
Catalogue of human inversions
Combination of predictions from different studies resulted in 364 independent inversions in humans (354 new):
• Levy et al. 2007 (90 indep. inv.)
• Korbel et al. 2007 (91 indep. inv.)
• Kidd et al. 2008 (245 indep. inv.)
[Venn diagram of the overlap between the three studies; region counts as extracted: 49, 17(1), 11, 186(1), 203(3), 56]
(Martínez and Cáceres, unpub. data)
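The slides do not state how predictions from the three studies were combined into independent inversions. One simple possibility, shown here purely as an assumption, is to treat predictions that overlap on the same chromosome as the same inversion and merge them. The function name and the coordinates in the example are made up for illustration.

```python
def merge_predictions(predictions):
    """Merge overlapping (chrom, start, end, study) predictions into clusters.

    Assumed criterion: two predictions on the same chromosome describe the
    same inversion if their intervals overlap.
    """
    clusters = []
    for chrom, start, end, study in sorted(predictions):
        last = clusters[-1] if clusters else None
        if last and last["chrom"] == chrom and start <= last["end"]:
            # Overlaps the current cluster: extend it and record the study.
            last["end"] = max(last["end"], end)
            last["studies"].add(study)
        else:
            clusters.append(
                {"chrom": chrom, "start": start, "end": end, "studies": {study}}
            )
    return clusters


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    example = [
        ("chr1", 100, 500, "Levy"),
        ("chr1", 450, 900, "Kidd"),
        ("chr1", 2000, 2500, "Korbel"),
        ("chr2", 100, 200, "Kidd"),
    ]
    for cluster in merge_predictions(example):
        print(cluster)
```

A stricter rule, such as requiring a minimum reciprocal overlap or matching breakpoints, would produce more clusters that are also smaller.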
Inversions in other species
Drosophila: Inversions have been studied for more than 80 years, and thousands of inversions have been described as fixed or polymorphic differences
Few inversions are known in other organisms:
Mechanisms of generation of SVs
SVs are typically generated during DNA break-induced repair, recombination or replication by different possible mechanisms:
• Non-allelic homologous recombination (NAHR) (duplications, deletions, inversions, translocations)
• Non-homologous end joining (NHEJ) and microhomology-mediated end joining (MMEJ) (deletions, inversions?)
• Transposition of transposable elements (insertions, deletions)
• Fork Stalling and Template Switching (FoSTeS)
Non-allelic homologous recombination
Intra- or interchromosomal recombination between copies of a sequence in different genomic positions:
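A minimal sketch of how the relative orientation of the two recombining copies determines the outcome within one chromosome: copies in direct orientation yield a deletion of the intervening sequence, while copies in inverted orientation yield an inversion. The (name, strand) representation and the function name are assumptions made for illustration, and interchromosomal events are not modeled.

```python
def nahr_within_chromosome(segments, i, j):
    """Recombine the repeat copies at indices i < j of a list of (name, strand) tuples."""
    if i >= j or segments[i][0] != segments[j][0]:
        raise ValueError("i and j must index two copies of the same repeat, with i < j")
    strand_i, strand_j = segments[i][1], segments[j][1]
    if strand_i == strand_j:
        # Direct repeats: the intervening sequence and one repeat copy are lost.
        return segments[:i + 1] + segments[j + 1:]
    # Inverted repeats: the intervening sequence is reversed and changes strand.
    flipped = [(name, "-" if strand == "+" else "+")
               for name, strand in reversed(segments[i + 1:j])]
    return segments[:i + 1] + flipped + segments[j:]


if __name__ == "__main__":
    direct = [("A", "+"), ("R", "+"), ("B", "+"), ("C", "+"), ("R", "+"), ("D", "+")]
    inverted = [("A", "+"), ("R", "+"), ("B", "+"), ("C", "+"), ("R", "-"), ("D", "+")]
    print("Direct repeats:  ", nahr_within_chromosome(direct, 1, 4))
    print("Inverted repeats:", nahr_within_chromosome(inverted, 1, 4))
```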
Non-homologous end joining
Mechanism of repair of double-strand
breaks that typically utilizes short
homologous DNA sequences
(microhomology) to guide repair:
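Microhomology at a deletion breakpoint can be measured directly from the reference sequence: it is the run of bases at the start of the deleted segment that is identical to the bases immediately after the deletion. The sketch below counts that run on the downstream side only. The function name and the example sequence are invented for illustration.

```python
def microhomology_length(reference, del_start, del_end):
    """Count bases shared by the start of the deleted segment [del_start, del_end)
    and the sequence immediately after it (downstream side only)."""
    length = 0
    while (del_start + length < del_end
           and del_end + length < len(reference)
           and reference[del_start + length] == reference[del_end + length]):
        length += 1
    return length


if __name__ == "__main__":
    # Invented sequence: GGG | ACTCATTTTT (deleted) | ACT GGG
    reference = "GGGACTCATTTTTACTGGG"
    del_start, del_end = 3, 13
    print("Deleted segment:", reference[del_start:del_end])
    print("Sequence after repair:", reference[:del_start] + reference[del_end:])
    print("Microhomology (bp):", microhomology_length(reference, del_start, del_end))
```

Because of the shared bases, the breakpoint position is ambiguous within the microhomology: shifting both coordinates by up to that many bases yields the same repaired sequence.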
SV breakpoints examples
Inferred mechanisms of SVs
Kidd et al., Nature 2008 (Fosmid PEM)
Lam et al., Nature Biotech. 2010 (different methods)
Biological relevance of SVs
Currently identified structural variants comprise many more bases of the human genome than SNPs and could have a much bigger biological impact
Functional effects of SVs
Altered gene dosage and expression (CNVs)
Disruption of genes or regulatory elements (insertions, deletions, inversions)
Gene fusion (deletions, inversions)
Change in the exon-intron structure (insertions, deletions, CNVs, inversions)
Modification of gene regulatory regions (insertions, deletions, CNVs, inversions)
Indirect effects through increased susceptibility to genomic rearrangements (CNVs, inversions)
Functional consequences of SVs
Distribution of SVs tends to be biased against genes and other functional elements
Genomic location of SVs
(Conrad et al., Nature 2010)
Functional impact of CNVs by type, frequency and population
Distribution of breakpoints of 114 precisely located inversions
[Bar chart: frequency of inversions by breakpoint location]
• Intergenic: 76 inv. (66.7%)
• Intronic: 26 inv. (22.8%)
• 1 mRNA: 7 inv.
• 2 mRNA: 5 inv.
• 1 or 2 mRNAs combined: 12 inversions (10.5%)
(Martínez and Cáceres, unpub. data)
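The percentages in the breakpoint distribution follow directly from the counts over the 114 inversions. A quick check, assuming the category-to-count assignment Intergenic 76, Intronic 26, 1 mRNA 7 and 2 mRNA 5 (read off the chart labels in their listed order):

```python
# Counts per breakpoint category (assignment of counts to labels is an assumption).
counts = {"Intergenic": 76, "Intronic": 26, "1 mRNA": 7, "2 mRNA": 5}

total = sum(counts.values())
percentages = {category: round(100 * n / total, 1) for category, n in counts.items()}

# Inversions with breakpoints inside one or two mRNAs, taken together.
mrna_count = counts["1 mRNA"] + counts["2 mRNA"]
mrna_percentage = round(100 * mrna_count / total, 1)

if __name__ == "__main__":
    print("Total inversions:", total)
    for category, pct in percentages.items():
        print(f"{category}: {counts[category]} inv. ({pct}%)")
    print(f"Within mRNAs: {mrna_count} inv. ({mrna_percentage}%)")
```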
Genomic location of SVs
Functional categories enriched in CNVs
• Cell adhesion
• Sensory perception
• Synapse organization
• Synaptic transmission
• Nervous system development
• Defense response
• Immune response
CNVs and complex diseases
Inversions and genomic disorders
CNV example #1: CCL3L1
CCL3L1: chemokine (C-C motif) ligand 3-like 1
(González et al., Science. 2005)
CNV example #2: Amylase gene
• Japanese ind., high starch (14 copies)
• Biaka ind., low starch (6 copies)
• Chimpanzee, low starch (2 copies)
(Perry et al., Nat. Genet. 2007)
Inversion example #1: Chr17q inversion
• Inversion originated through NAHR between 200-500 kb SD blocks
• Found mainly in European populations at ~20% frequency
• Possibly positively selected through increased fertility of carrier females
(Stefansson et al., Nat. Genet. 2005)
Inversion example #2: ChrX inversion
[Figure: ChrX inversion of 37.4 Kb, with the FLNA and EMD genes labeled in both arrangements]
(Small et al., Nat. Genet. 1997)