
Genomic structural variation - UAB Barcelona, bioinformatica.uab.cat/.../mcaceres_structuralvariation.pdf (2010-06-01)


Mario Cáceres

Genomic structural variation

The “new” genomic variation

Through structural changes, DNA sequence differs across individuals much more than researchers had suspected

A huge amount of structural variation has been discovered in the genomes of all species studied so far, and it could have a much bigger biological impact than SNPs


Overview

1. Types of structural variants (SVs)

2. Methods for detecting SVs

3. Indels and CNVs

4. Inversions

5. Mechanisms of generation

6. Functional effects and examples

What are structural variants?

Genomic alterations other than nucleotide substitutions that change the organization of the DNA molecule

Traditionally, detection was limited by the resolution of the available techniques, so SVs were considered to be quite large

They currently include changes at the base-pair level, although they are typically >1 kb (there is no clear size limit)


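Types of structural variants

[Figure: schematic comparing a reference segment order (A B C D) with an insertion (new segment E), a deletion, an inversion (a reversed segment) and a copy number variation, or CNV (segment C present in several copies: A B C C C D)]

...and translocations.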
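As a rough illustration of the SV classes above, a chromosome can be modelled as an ordered list of named segments, each on the "+" or "-" strand. This toy model and all function names in it are hypothetical teaching devices, not a method from the lecture; real SVs are defined at the sequence level rather than on labelled blocks.

```python
# Toy model (hypothetical): a chromosome is an ordered list of
# (segment_name, strand) tuples, where strand is "+" or "-".
Segment = tuple[str, str]

REFERENCE: list[Segment] = [("A", "+"), ("B", "+"), ("C", "+"), ("D", "+")]
FLIP = {"+": "-", "-": "+"}


def insertion(genome: list[Segment], index: int, name: str) -> list[Segment]:
    """Insert a new segment before position `index`."""
    return genome[:index] + [(name, "+")] + genome[index:]


def deletion(genome: list[Segment], index: int) -> list[Segment]:
    """Remove the segment at position `index`."""
    return genome[:index] + genome[index + 1:]


def inversion(genome: list[Segment], start: int, end: int) -> list[Segment]:
    """Reverse the order and flip the strand of segments start..end-1."""
    inverted = [(name, FLIP[strand]) for name, strand in reversed(genome[start:end])]
    return genome[:start] + inverted + genome[end:]


def copy_number_gain(genome: list[Segment], index: int, extra: int) -> list[Segment]:
    """Add `extra` tandem copies of the segment at `index` (a CNV gain)."""
    return genome[:index + 1] + [genome[index]] * extra + genome[index + 1:]


def show(genome: list[Segment]) -> str:
    """Render segments as text, marking inverted ones with an apostrophe."""
    return " ".join(name if strand == "+" else name + "'" for name, strand in genome)


print(show(REFERENCE))                          # A B C D
print(show(insertion(REFERENCE, 3, "E")))       # A B C E D
print(show(deletion(REFERENCE, 2)))             # A B D
print(show(inversion(REFERENCE, 2, 3)))         # A B C' D
print(show(copy_number_gain(REFERENCE, 2, 2)))  # A B C C C D
```

The choice of segment C for the deletion and inversion is arbitrary; inverting several segments at once (for example `inversion(REFERENCE, 1, 3)`) also reverses their order.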

Methods for detecting SVs

New genomic techniques have shifted the focus from large to small events and from locus-specific to genome-wide studies. Listed in order of increasing throughput and resolution:

• Cytogenetic techniques (regular karyotyping, FISH, fiber FISH)

• Array comparative genomic hybridization (aCGH) (BAC arrays, high-density oligonucleotide arrays)

• SNP array genotyping (Affymetrix, Illumina)

• Pair-end mapping (Sanger or next-generation sequencing)


Cytogenetic techniques

Karyotyping, FISH, chromosome painting, fiber FISH

Array CGH

The ratio of the fluorescence intensities of the test and reference DNA indicates the difference in copy number at a particular location in the genome
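Assuming a diploid reference and equal labelling efficiency, a one-copy gain in the test DNA gives a ratio of 3/2 (log2 ≈ 0.58) and a heterozygous deletion gives 1/2 (log2 = -1). The sketch below turns probe intensities into log2 ratios and calls each probe with a hypothetical fixed threshold of ±0.3; the intensities and function names are made up, and real aCGH pipelines also normalise intensities and segment runs of neighbouring probes rather than calling probes one at a time.

```python
import math


def log2_ratio(test_intensity: float, reference_intensity: float) -> float:
    """log2 of the test/reference fluorescence ratio for one probe."""
    return math.log2(test_intensity / reference_intensity)


def estimated_copies(test_intensity: float, reference_intensity: float,
                     reference_copies: int = 2) -> float:
    """Copy number implied by the ratio, assuming a diploid reference."""
    return reference_copies * test_intensity / reference_intensity


def call_probe(ratio: float, threshold: float = 0.3) -> str:
    """Classify a log2 ratio; the +/-0.3 threshold is a hypothetical choice."""
    if ratio >= threshold:
        return "gain"
    if ratio <= -threshold:
        return "loss"
    return "no change"


# Hypothetical intensities (test, reference) for three probes.
probes = {
    "probe_1": (1500.0, 1000.0),
    "probe_2": (500.0, 1000.0),
    "probe_3": (1050.0, 1000.0),
}
for name, (test, ref) in probes.items():
    r = log2_ratio(test, ref)
    print(f"{name}: log2 ratio {r:+.2f}, ~{estimated_copies(test, ref):.1f} copies, {call_probe(r)}")
```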


Characteristics of aCGH arrays

BAC arrays:

• 1 BAC (100-200 kb) per genomic region (whole-genome tiling path)

• Good genomic coverage

• Low resolution (>20-50 kb)

High-density oligonucleotide arrays:

• Variable oligo spacing (~2-10 kb on Agilent 244K or 1M arrays)

• Targeted or whole-genome approach

• Coverage depending on probe density

• High resolution (0.2-10 kb)

Pair-end mapping (PEM) method

A high-throughput way of detecting all kinds of medium- and large-sized SVs in an individual:

1. Generation of a DNA library of fragments of defined size from the sample of interest

2. Sequencing of both ends of a large number of fragments

3. Mapping of both ends to the reference genome and computational prediction of SVs


Pair-end mapping (PEM) methods

[Figure: distribution of PEM spans; read pairs are classified as normally mapped, span > cutoff (deletion, D), span < cutoff (insertion, I) or opposite end orientation]

Short-fragment PEM: low coverage/high resolution (500 bp - 3 kb)

Long-fragment PEM: high coverage/low resolution (10-40 kb)
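The classification rules in the figure can be sketched for a single read pair. This assumes a library in which concordant pairs map with the leftmost end on "+" and the rightmost end on "-"; ends mapping to the same strand (opposite end orientation) point to an inversion. The `ReadPair` record, the span limits and all names are hypothetical, and span is simplified to the distance between the mapped positions of the two ends.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Mapped chromosome, position and strand of both ends (hypothetical record)."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, min_span: int, max_span: int) -> str:
    """Classify one read pair by mapped span and end orientation.

    Assumes concordant pairs map as leftmost "+" / rightmost "-" with a span
    between min_span and max_span.
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion"  # both ends on the same strand
    if (left_strand, right_strand) != ("+", "-"):
        return "unexpected orientation"
    span = right_pos - left_pos
    if span > max_span:
        return "deletion"   # reference contains extra sequence between the ends
    if span < min_span:
        return "insertion"  # sample contains extra sequence between the ends
    return "normal"


# Hypothetical ~3 kb library with an accepted span of 2,500-3,500 bp.
pairs = [
    ReadPair("chr1", 10_000, "+", "chr1", 13_000, "-"),
    ReadPair("chr1", 10_000, "+", "chr1", 18_000, "-"),
    ReadPair("chr1", 10_000, "+", "chr1", 11_000, "-"),
    ReadPair("chr1", 10_000, "+", "chr1", 13_000, "+"),
]
for p in pairs:
    print(classify_pair(p, 2_500, 3_500))
```

In practice a single discordant pair is not enough evidence; SV predictions are usually made from clusters of pairs that support the same event.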

SV detection methods comparison

Feuk et al. Nature Reviews Genetics 7, 85–97 (February 2006) | doi:10.1038/nrg1767


Importance of structural variants

Database of Genomic Variants (http://projects.tcag.ca/variation/), as of March 25, 2010:

• Total entries: 89,427

• CNVs: 57,829

• Inversions: 850

• InDels (100 bp - 1 kb): 30,748

• Total CNV loci: 14,478

• Articles cited: 38

[Figure legend: segmental duplications, inversion breakpoints, copy number variants]

Timescale of CNV studies

(Beckmann et al., Nat. Rev. Genet. 2007)

CNV discovery is not that recent!


Initial large-scale CNV studies

• Analysis of 20 individuals using low-density arrays

• 76 unique CNV regions of >100 kb

• ~11 CNVs differing between a pair of individuals

• Analysis of 55 individuals using BAC arrays (1 Mb resolution)

• 224 unique CNV regions

• 12.4 CNVs differing between a pair of individuals

Global analysis of human CNVs

• CNV map by WGTP (whole-genome tiling path) BAC arrays and Affymetrix arrays

• 270 individuals of European, African and Asian ancestry

• 1447 CNV regions covering 360 Mb (12% of the genome)

• Comprehensive CNV map by tiling oligonucleotide arrays (42 million probes)

• 450 individuals of European, African and Asian ancestry

• 11,700 CNVs of >0.5 kb assayed


Comprehensive human CNV map

[Figure: overview of the experimental strategy for CNV discovery and genotyping]

(Conrad et al., Nature 2009)

• On average, 1098 validated CNVs differ between two individual genomes

• Detected CNVs ranged from 443 bp to 1.28 Mb (median size 2.9 kb)

• Validated CNVs cover a total of 112.7 Mb (3.7% of the genome)

Population distribution of CNVs

[Figure panels: CNVs across 29 human populations; CNV-based phylogenetic tree]


CNVs in other species



Human polymorphic inversions

Chrom. region | Size | Freq. | Detection method | Phenotypic effect | Origin | Reference
17q21.31-32 | 900 kb | 17% | PFGE/genotyping | Increased fertility | NAHR | Stefansson 2005
15q11.2-13.1 | 6.0 Mb | 4% | FISH/PFGE | Unknown | - | Gimelli 2003
16p12.2 | 305 kb | - | Comp. map. | Unknown | - | Martin 2004
4p16.1-16.2 | 5.1 Mb | 9% | FISH | Unknown | NAHR? | Giglio 2002
8p23.1 | 4.6 Mb | 15% | FISH | Unknown | * | Giglio 2002
10p11.21-q21.1 | 22.6 Mb | - | Cloning | Unknown | * | Gilling 2006
7q11.23 | 2.2 Mb | 5% | FISH/PFGE | Williams syndrome susceptibility? | * | Osborne 2001
Xq28 | 49 kb | 8-72% | Southern blot | Unknown | * | Small 1997

* The source lists the origins DSB?, NAHR?, NAHR and NAHR for these four inversions without a recoverable row assignment.


Insights from genomic studies

New genomic techniques have uncovered an unprecedented degree of all types of structural variation, including inversions:

• Feuk et al., PLOS Genet. 2005:
  - 1576 putative inversions between the human and chimpanzee genomes (23/27 experimentally validated)
  - Three of these were polymorphic in 10 European individuals

• Korbel et al., Science 2007:
  - Pair-end mapping of ~3 kb fragments in 2 individuals of European and African ancestry
  - 122 inversion breakpoints identified by comparison with the human reference sequence

• Tuzun et al., Nat. Genet. 2005; Kidd et al., Nature 2008:
  - Fosmid pair-end mapping in 9 individuals of different geographic origins
  - 224 inversions validated by FISH, fingerprint analysis or breakpoint sequencing

• Levy et al., PLOS Biol. 2007:
  - 90 inversions identified by direct comparison of the J. Craig Venter genome assembly and the HG18 reference assembly

• And more: several individual genome sequences, the 1000 Genomes Project, etc.

Current inversion data

A total of 850 inversions in the Database of Genomic Variants (but many are redundant!)

[Figure panels: chromosomal distribution (frequency of inversions on chr1-chr22, chrX and chrY; observed vs. expected distribution) and size distribution (inversions vs. CNVs)]
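The slide does not say how the expected chromosomal distribution was computed. One simple null model, assumed here, places events on each chromosome in proportion to its length; observed and expected counts can then be compared with a Pearson chi-square statistic (k chromosomes give k - 1 degrees of freedom). The lengths and counts below are made-up toy values, and the function names are hypothetical.

```python
def expected_by_length(total_events: int, lengths: dict[str, int]) -> dict[str, float]:
    """Expected events per chromosome if events fall in proportion to length."""
    genome_size = sum(lengths.values())
    return {chrom: total_events * length / genome_size for chrom, length in lengths.items()}


def chi_square(observed: dict[str, int], expected: dict[str, float]) -> float:
    """Pearson chi-square statistic comparing observed and expected counts."""
    return sum((observed.get(chrom, 0) - e) ** 2 / e for chrom, e in expected.items())


# Hypothetical toy data: chromosome lengths (Mb) and inversion counts.
lengths = {"chrA": 200, "chrB": 100, "chrC": 100}
observed = {"chrA": 10, "chrB": 10, "chrC": 0}

expected = expected_by_length(sum(observed.values()), lengths)
print(expected)                        # {'chrA': 10.0, 'chrB': 5.0, 'chrC': 5.0}
print(chi_square(observed, expected))  # 10.0
```

A length-proportional model ignores features such as segmental duplication density, which can also shape where inversions arise.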


Functional and evolutionary impact of inversions

[Figure: large-scale genomic information; traditional studies on inversions; latest data on human predicted inversions]

INVFEST project (INVersion Functional & Evolutionary Studies)

Objectives and main questions

Data sources: human genome sequences, structural variation data, functional annotation, gene-expression levels, HapMap data, other species' genomes

1. Catalogue of human inversions

2. Evolutionary history of inversions

3. Functional consequences of inversions

4. Effect on nucleotide variation


Main scientific contributions

• Biological consequences of structural variants

• Evaluate/develop tools for the study of inversions

• Complete characterization of human inversions

• Determine inversion genetic effects and adaptive value

Catalogue of human inversions

Combination of predictions from different studies resulted in 364 independent inversions in humans (354 new):

• Levy et al. 2007 (90 independent inversions)

• Korbel et al. 2007 (91 independent inversions)

• Kidd et al. 2008 (245 independent inversions)

[Figure: Venn diagram of the overlap among the inversion predictions of the three studies]

(Martínez and Cáceres, unpub. data)
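The source does not say how overlapping predictions were merged into independent inversions. A common convention for this kind of task, used here purely as an assumption, treats two predictions as the same event when their reciprocal overlap (the smaller of the two fractions of each interval covered by the overlap) is at least 50%. The greedy clustering below compares each prediction only with the first member of each cluster; the coordinates and names are hypothetical.

```python
Prediction = tuple[str, int, int]  # (chromosome, start, end)


def reciprocal_overlap(a: Prediction, b: Prediction) -> float:
    """Smaller of the two fractions of each interval covered by their overlap."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def merge_predictions(predictions: list[Prediction],
                      min_overlap: float = 0.5) -> list[list[Prediction]]:
    """Greedily group predictions that overlap the first member of a cluster."""
    clusters: list[list[Prediction]] = []
    for pred in sorted(predictions):
        for cluster in clusters:
            if reciprocal_overlap(cluster[0], pred) >= min_overlap:
                cluster.append(pred)
                break
        else:
            clusters.append([pred])  # no match: start a new independent event
    return clusters


# Hypothetical predictions pooled from three studies.
pooled = [
    ("chr1", 1_000, 5_000),    # study 1
    ("chr1", 1_200, 5_100),    # study 2, same event as above
    ("chr1", 20_000, 30_000),  # study 3
    ("chr2", 1_000, 5_000),    # study 3
]
clusters = merge_predictions(pooled)
print(len(clusters))  # 3
```

Because PEM-based predictions often have imprecise breakpoints, a real merge may need to compare breakpoint intervals rather than single coordinates.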


Inversions in other species

In Drosophila, inversions have been studied for more than 80 years, and thousands of inversions have been described as fixed or polymorphic differences

Few inversions are known in other organisms.

Mechanisms of generation of SVs

SVs are typically generated during DNA break-induced repair, recombination or replication, by several possible mechanisms:

• Non-allelic homologous recombination (NAHR) (duplications, deletions, inversions, translocations)

• Non-homologous end joining (NHEJ) and microhomology-mediated end joining (MMEJ) (deletions, inversions?)

• Transposition of transposable elements (insertions, deletions)

• Fork stalling and template switching (FoSTeS)


Non-allelic homologous recombination

Intra- or interchromosomal recombination between copies of a sequence located at different genomic positions.
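The outcome of NAHR depends on the relative orientation of the two repeat copies: between directly oriented repeats on the same chromosome it yields a reciprocal deletion and duplication, between inverted repeats it inverts the intervening segment, and between repeats on different chromosomes it can produce translocations. The toy segment model below (hypothetical names, not from the lecture) sketches the two intrachromosomal cases.

```python
Segment = tuple[str, str]  # (name, strand "+" or "-"); hypothetical toy model
FLIP = {"+": "-", "-": "+"}


def nahr_products(genome: list[Segment], i: int, j: int) -> dict[str, list[Segment]]:
    """Toy NAHR between two copies of the same repeat at positions i < j on one chromosome."""
    (name_i, strand_i), (name_j, strand_j) = genome[i], genome[j]
    if name_i != name_j or not i < j:
        raise ValueError("expected two copies of the same repeat with i < j")
    if strand_i == strand_j:
        # Directly oriented repeats: unequal crossing-over gives a reciprocal
        # deletion (one repeat copy left) and duplication (three copies).
        return {
            "deletion": genome[: i + 1] + genome[j + 1:],
            "duplication": genome[: j + 1] + genome[i + 1:],
        }
    # Inverted repeats: the segment between the two copies is inverted.
    middle = [(name, FLIP[strand]) for name, strand in reversed(genome[i + 1 : j])]
    return {"inversion": genome[: i + 1] + middle + genome[j:]}


def show(genome: list[Segment]) -> str:
    """Render segments as text, marking minus-strand ones with an apostrophe."""
    return " ".join(name if strand == "+" else name + "'" for name, strand in genome)


direct = [("A", "+"), ("R", "+"), ("B", "+"), ("R", "+"), ("C", "+")]
inverted = [("A", "+"), ("R", "+"), ("B", "+"), ("R", "-"), ("C", "+")]

for label, genome in [("direct", direct), ("inverted", inverted)]:
    for outcome, product in nahr_products(genome, 1, 3).items():
        print(label, outcome, show(product))
```

The model works at the level of whole segments, so it ignores the hybrid repeat copies that real crossovers leave at the breakpoints.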

Non-homologous end joining

Mechanism for repairing double-strand breaks that typically uses short homologous DNA sequences (microhomology) to guide repair.


SV breakpoints examples

Inferred mechanisms of SVs

Kidd et al., Nature 2008 (fosmid PEM)

Lam et al., Nature Biotech. 2010 (different methods)


Biological relevance of SVs

Currently identified structural variants comprise many more bases of the human genome than SNPs and could have a much bigger biological impact

Functional effects of SVs

• Altered gene dosage and expression (CNVs)

• Disruption of genes or regulatory elements (insertions, deletions, inversions)

• Gene fusion (deletions, inversions)

• Change in the exon-intron structure (insertions, deletions, CNVs, inversions)

• Modification of gene regulatory regions (insertions, deletions, CNVs, inversions)

• Indirect effects through increased susceptibility to genomic rearrangements (CNVs, inversions)


Functional consequences of SVs

The distribution of SVs tends to be biased against genes and other functional elements

Genomic location of SVs

(Conrad et al., Nature 2010)

Functional impact of CNVs by type, frequency and population


Genomic location of SVs

Distribution of breakpoints of 114 precisely located inversions:

• Intergenic: 76 inversions (66.7%)

• Intronic: 26 inversions (22.8%)

• 1 mRNA: 7 inversions; 2 mRNAs: 5 inversions (12 inversions in total, 10.5%)

(Martínez and Cáceres, unpub. data)
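The percentages for the breakpoint distribution of the 114 precisely located inversions follow directly from the category counts; this quick check recomputes them, grouping the 1-mRNA and 2-mRNA classes together (the dictionary labels are ours).

```python
# Breakpoint categories of the 114 precisely located inversions (counts from the slide).
counts = {"intergenic": 76, "intronic": 26, "1 mRNA": 7, "2 mRNAs": 5}

total = sum(counts.values())
mrna = counts["1 mRNA"] + counts["2 mRNAs"]
print(total)  # 114
for label, n in [("intergenic", counts["intergenic"]),
                 ("intronic", counts["intronic"]),
                 ("mRNA", mrna)]:
    print(f"{label}: {n} ({100 * n / total:.1f}%)")
# intergenic: 76 (66.7%)
# intronic: 26 (22.8%)
# mRNA: 12 (10.5%)
```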

Functional categories enriched in CNVs

• Cell adhesion

• Sensory perception

• Synapse organization

• Synaptic transmission

• Nervous system development

• Defense response

• Immune response

CNVs and complex diseases


Inversions and genomic disorders

CNV example #1: CCL3L1

CCL3L1: chemokine (C-C motif) ligand 3-like 1

(González et al., Science 2005)


CNV example #2: Amylase gene

• Chimpanzee (low-starch diet): 2 copies

• Biaka individual (low-starch diet): 6 copies

• Japanese individual (high-starch diet): 14 copies

(Perry et al., Nat. Genet. 2007)

Inversion example #1: Chr17q inversion

• Inversion originated through NAHR between 200-500 kb segmental duplication (SD) blocks

• Found mainly in European populations at ~20% frequency

• Possibly positively selected through increased fertility of carrier females

(Stefansson et al., Nat. Genet. 2005)


Inversion example #2: ChrX inversion

[Figure: FLNA and EMD genes in the two arrangements of the Xq28 inversion; region labelled 37.4 kb]

(Small et al., Nat. Genet. 1997)