Creating a Consensus Set of Structural Variants
Jayne Hehir-Kwa on behalf of the GoNL SV Group



What do we mean by structural variation?

Insertion

Events larger than 20bp


Different Approaches for NGS data


The dataset: GoNL study design

500 bp
90 bp
Coverage
Base: 12x
Physical: 30x

[Boomsma et al, 2013, EJHG]

                  1000 G                GoNL
DNA source        Cell lines            Blood
Coverage          3-4x                  >12x
Data generation   Mult. platforms       BGI/Illumina
Population        Multiple, unrelated   Dutch only, trios, twins
Phenotype info    None                  Multiple


SV detection strategy GoNL

Multiple …
• Algorithms
• Approaches
  – Single sample
  – Trio aware
  – Population aware


Call Set Overview

Algorithm      Approach                  Priority   Dels / Dups / Ins / Trans / Inv
Pindel         Split Read                2          Tandem dups; N/A x1
UG             Split Read                1          N/A x3
123SV          Read Pair                 5          Eversions
Breakdancer    Read Pair                 6          N/A x1
DWACSeq        Read Depth                8          N/A x3
CNVnator       Read Depth                9          N/A x3
Gstrip         Read Depth / Read Pair    7          N/A x4
Clever         Split Read / Read Pair    3          N/A x4
SOAP* deNovo   De Novo Assembly          4          N/A x2
Façade*        Read Depth                10         N/A x4

* Family based
Last updated 22nd Jan 2013


Why use so many different algorithms?

[Figure: Deletions; panels: Breakdancer, GASV, 123SV, DWACSeq_Con, DWACSeq_Pem, Pindel]

E. Wubbo & K. Ye


When are two deletions the same?


When are deletions the same?

• Experimental noise blurs breakpoints between samples

• Different algorithms have different sensitivity
  – Lag and delay in breakpoint definition

• Different approaches are biased by different genome structure
  – Simple Repeats, Segmental Duplications …

• Different algorithms report events differently
  – Left align
  – Last ‘normal’ base, first deleted base


Strategies for merging regions

Median of regions
Minimum overlap, maximum overlap…

Over merging -> Breakpoint blurring
Under merging -> Over segmentation
Is the end result a unique set of genomic regions? (i.e. overlap)

Ignore differences in breakpoint accuracy between algorithms


Algorithm Aware Merging with Reciprocal overlap

0.7

Ignore Breakpoints

Evidence of region is used for subdomain, but not for defining start and end of subdomain
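A minimal sketch of how algorithm-aware merging with a 0.7 reciprocal overlap could work. It is not the GoNL pipeline: the `Call` record, the greedy strategy and the function names are assumptions made for illustration. The highest-priority call in a region acts as its anchor and fixes the start and end; lower-priority calls only add supporting evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    chrom: str
    start: int     # 0-based, inclusive
    end: int       # exclusive
    tool: str
    priority: int  # lower number = more trusted breakpoints (assumed convention)


def reciprocal_overlap(a, b):
    """Fraction of the longer call that is covered by the intersection."""
    if a.chrom != b.chrom:
        return 0.0
    inter = min(a.end, b.end) - max(a.start, b.start)
    if inter <= 0:
        return 0.0
    return min(inter / (a.end - a.start), inter / (b.end - b.start))


def merge_calls(calls, min_ro=0.7):
    """Greedy merge: visit calls from best to worst priority.

    A call joins the first region whose anchor it overlaps reciprocally by
    at least min_ro; otherwise it founds a new region and becomes its anchor.
    Returns a list of (anchor, members) tuples.
    """
    regions = []
    for call in sorted(calls, key=lambda c: c.priority):
        for anchor, members in regions:
            if reciprocal_overlap(anchor, call) >= min_ro:
                members.append(call)  # evidence only, breakpoints unchanged
                break
        else:
            regions.append((call, [call]))
    return regions


calls = [
    Call("1", 1000, 2000, "Breakdancer", 6),
    Call("1", 1010, 1990, "Pindel", 2),
    Call("1", 5000, 5100, "Pindel", 2),
]
regions = merge_calls(calls)
```

In this toy input the Breakdancer call supports the first Pindel region, but the region keeps the Pindel coordinates.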


Filtering to create a confident set

397,979 merged deletion events

1. >= 2 algorithms         -> 23,195 events
2. > 2 trios               -> 15,734 events
3. Inherited               -> 13,248 events
4. Different approaches    -> 9,379 events
5. No alpha-satellites     -> 9,187 merged deletion events
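The five filter steps can be sketched as an ordered cascade that records how many events survive each step. The `MergedEvent` fields and the function name are hypothetical; only the filter criteria and their order come from the slide.

```python
from dataclasses import dataclass


@dataclass
class MergedEvent:
    algorithms: set        # tools that reported the event
    approaches: set        # e.g. {"split read", "read pair"}
    n_trios: int           # number of trios carrying the event
    inherited: bool        # seen in a parent and the child
    alpha_satellite: bool  # overlaps an annotated alpha-satellite


# Filters in the order given on the slide.
FILTERS = [
    (">= 2 algorithms", lambda e: len(e.algorithms) >= 2),
    ("> 2 trios", lambda e: e.n_trios > 2),
    ("Inherited", lambda e: e.inherited),
    ("Different approaches", lambda e: len(e.approaches) >= 2),
    ("No alpha-satellites", lambda e: not e.alpha_satellite),
]


def apply_filters(events):
    """Apply each filter in turn; return survivors and per-step counts."""
    counts = []
    for name, keep in FILTERS:
        events = [e for e in events if keep(e)]
        counts.append((name, len(events)))
    return events, counts


toy = [
    MergedEvent({"Pindel", "123SV"}, {"split read", "read pair"}, 5, True, False),
    MergedEvent({"Pindel"}, {"split read"}, 9, True, False),
    MergedEvent({"123SV", "Breakdancer"}, {"read pair"}, 4, True, False),
    MergedEvent({"Pindel", "Clever"}, {"split read", "read pair"}, 3, True, True),
]
kept, counts = apply_filters(toy)
```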


Assessing the quality of the consensus

Are the results of the merge reproducible?
Does the consensus make sense?

Wet lab
• Validation

Dry lab
• Annotation
• Size and frequency distributions


How does filtering affect the size?

[Figure labels: SINEs, LINEs]


The full frequency range of events detected

Many deletion events are rare

[Figure: number of events (y axis, 0 to 14,000) against number of trios (x axis); annotation: Common CNV >2 trios]


Allele Frequency Distribution

[Figure: Filtered and Unfiltered sets; x axis: Number of Trios]


1KG vs GoNL

K. Ye


Which gene components are affected?

Ref Seq Gene       Observed
Intergenic         5990
Intronic           2331
Exonic             167*
Splice Acceptor    2
UTR                888


Genetic deletion load per individual

                                   Size 20 – 100bp       Size 100bp+
Total deleted bases                175,000bp (±8,027)    6,384,000bp
Nr Del Events                      5,195 (±231)          3,299 (±153)
Intergenic                         3,108 (±140)          2,088 (±106)
Exonic                             10 (±2)               36 (±5)
Nr exons affected                  10 (±2)               91 (±19)
Predicted Loss of Function         5 (±2)                12 (±2)
Nr Genes with OMIM Disease Terms   2 (±1)                3 (±2)


Results 100+ validations GoNL main

• 48 assays
• Size range: 100bp-150kb
• All assays show a band on gel
• Sanger sequencing confirmed breakpoint for 46/48 calls
• 2 assays did not show a breakpoint in the Sanger read (F4 and G5)

F4

G5

W. Kloosterman


What about the rest?


Window merge 20 – 100bp

Approximately 10% of all calls are not correctly left aligned

Subdomain

Premise: What is the chance that 2 independent deletions will occur within a window of X basepairs?

Where X = 0, 1, … bp (we choose X = 1)

The same filter steps as for 100bp+ deletions = 21,048 del events
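A window merge with X = 1 might be sketched as follows. The slide does not say exactly how the window is applied, so treating two calls as the same event when both their start and end differ by at most X bp is an assumption, as are the tuple layout and function name. The loop is quadratic in the worst case and is meant only to show the idea.

```python
def window_merge(calls, window=1):
    """Group small-deletion calls whose coordinates agree within `window` bp.

    calls: iterable of (chrom, start, end) tuples.
    A call joins the first group whose most recent member is on the same
    chromosome and has start and end both within `window` bp; otherwise it
    opens a new group.
    """
    groups = []
    for call in sorted(calls):
        for group in groups:
            last = group[-1]
            if (call[0] == last[0]
                    and abs(call[1] - last[1]) <= window
                    and abs(call[2] - last[2]) <= window):
                group.append(call)
                break
        else:
            groups.append([call])
    return groups


calls = [("1", 100, 150), ("1", 100, 190), ("1", 101, 151), ("2", 100, 150)]
groups = window_merge(calls, window=1)
```

With X = 1 the calls at 100-150 and 101-151 collapse into one event, which is the kind of off-by-one disagreement expected when a call is not left aligned.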


Final results Sanger 20 – 100bp dels

• 1 failed alignment
• 2 repetitive regions
• 8 unclear partly due to poor sequence or due to repetitive nature of region
• 2 contained no deletion
• 84 variants confirmed

[Bar chart: counts (axis 0 to 90) per category: Y, Repeats, Unclear, no del, no alignment]

W. Kloosterman


Merging of Rearrangements

[Figure: cases listed by Case ID, Breakpoint 1, Breakpoint 2, Breakpoint 3]

Algorithm aware / priority used for defining consensus regions


Structural Variation merging cont’d

Pindel

123SV

Breakdancer

Cluster if both breakpoints within fragment size (500 bp) from

< 500 bp    < 500 bp

V. Guryev
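The clustering rule above (both breakpoints within the 500 bp fragment size) can be sketched as single-linkage clustering with a small union-find. The tuple layout and function name are hypothetical, and single linkage is an assumption: the slide does not say how chains of nearby calls are handled.

```python
from itertools import combinations


def cluster_breakpoints(calls, max_dist=500):
    """Cluster rearrangement calls whose two breakpoints both lie close together.

    calls: list of (chrom1, pos1, chrom2, pos2, tool) tuples.
    Two calls are linked when they join the same chromosomes and both
    breakpoints differ by less than max_dist bp.
    """
    parent = list(range(len(calls)))

    def find(i):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(calls)), 2):
        a, b = calls[i], calls[j]
        if (a[0] == b[0] and a[2] == b[2]
                and abs(a[1] - b[1]) < max_dist
                and abs(a[3] - b[3]) < max_dist):
            parent[find(i)] = find(j)

    clusters = {}
    for i, call in enumerate(calls):
        clusters.setdefault(find(i), []).append(call)
    return list(clusters.values())


calls = [
    ("1", 10_000, "5", 70_000, "Pindel"),
    ("1", 10_300, "5", 70_250, "123SV"),
    ("1", 10_200, "5", 90_000, "Breakdancer"),
]
clusters = cluster_breakpoints(calls)
```

The third call shares its first breakpoint with the others but not its second, so it stays in a cluster of its own.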


Call Set Overview

Algorithm      Approach                  Priority   Dels / Dups / Ins / Trans / Inv
Pindel         Split Read                2          Tandem dups; N/A x1
UG             Split Read                1          N/A x3
123SV          Read Pair                 5          Eversions
Breakdancer    Read Pair                 6          N/A x1
DWACSeq        Read Depth                8          N/A x3
CNVnator       Read Depth                9          N/A x3
Gstrip         Read Depth / Read Pair    7          N/A x4
Clever         Split Read / Read Pair    3          N/A x4
SOAP* deNovo   De Novo Assembly          4          N/A x2
Façade*        Read Depth                10         N/A x4

* Family based
Last updated 22nd Jan 2013


Results for merging rearrangements

Translocations                          60
Inversions                              90
Eversions                               146
Insertions                              2,242
Large duplications (100bp and above)    1,047

V. Guryev


Does it make sense?

• Data in public data sources is often limited by
  – detection method
  – sample size

• Deciding what “is the same event” is difficult

• 85% of breakpoints are non-coding

To be continued … Validation currently ongoing


Conclusions

• Different methods of merging regions use
  – Reciprocal overlap
  – Window
  – Breakpoint clustering

• All methods are ‘algorithm aware’

• The consensus only makes sense if it produces results which can be (wet) experimentally validated


Acknowledgements

GoNL SV Team
Victor Guryev UMCG
Wigard Kloosterman UMCU
Laurent C. Francioli UMCU
Jayne Y. Hehir-Kwa UMCN
Tobias Marschall CWI
Alexander Schoenhuth CWI
Matthijs Moed LUMC
Eric-Wubbo Lameijer LUMC
Abdel Abdellaoui VU
Slavik Koval EMC
Joep de Ligt UMCN
Najaf Amin EMC
Freerk van Dijk UMCG
Lennart Karssen EMC
Hailiang Mei LUMC
Kai Ye LUMC

University of Washington
Fereydoun Hormozdiari
Evan E. Eichler

GoNL steering committee
Paul de Bakker UMCU
Dorret Boomsma VU
Cornelia van Duijn EMC
Gert-Jan van Ommen LUMC
Eline Slagboom LUMC
Morris Swertz UMCG
Cisca Wijmenga UMCG

ERIBA / RuG
Rene Wardenaar

BGI Shenzhen
Jun Wang


Acknowledgements


Other cut offs?

Nr trios           1 or more   2 or more   3 or more
Total              23195       17985       15734
Not Inherited      6029        3621        2486
Inherited          17166       14364       13248
Single             2483        2282        3869
Multiple           14,683      12,082      9379
Alpha Satellites               353         192
Final                          11727       9187


Next steps

1. As genotype data was incomplete, I suggest re-genotyping it by mapping the data against a reference assembly expanded with references for SV Alt alleles

2. Medium-sized deletions 20-100bp (may need a different approach)

3. Validation of new segments


The consensus deletion list

Data Set        N        Min   Max     Median   Stdev   >1Mb
1000 Genomes    13,722   103   887kb   29kb     28kb    0
GoNL            53,844   2     21Mb*   50bp     150kb   40

* Centromere Chr1

Data Set        Intergenic      Exonic
1000 Genomes    63% (8,635)     5% (733)
GoNL            94% (75,893)    6% (4,454)

Common deletions vary greatly in size and are mostly intergenic


OMIM Disease Terms

OMIM_DISEASE
Hypoaldosteronism, congenital, due to CMO II deficiency, 610600 (3);
Laron dwarfism, 262500 (3); Short stature, 604271 (3);
Fetal hemoglobin quantitative trait locus 1, 141749 (3)
{Macular degeneration, age-related, reduced risk of}, 603075 (3);
{Hypersensitivity syndrome, carbamazepine-induced, susceptibility
{HIV infection, resistance to}, 609423 (2)
{Hypersensitivity syndrome, carbamazepine-induced, susceptibility
{Macular degeneration, age-related, reduced risk of}, 603075 (3);
Myopathy, distal 2, 606070 (3)
{Pulmonary fibrosis, idiopathic, susceptibility to}, 178500 (3)
CR1 deficiency (1); {?SLE susceptibility} (1); [Blood group, Knops
Thrombocytopenic purpura, autoimmune, 188030 (1)
[Blood group, Ii], 110800 (3); Adult i phenotype with congenital
Prostate cancer, hereditary, 176807 (3); Barrett
Ceroid lipofuscinosis, neuronal, 3, 204200 (3)
Reticular dysgenesis, 267500 (3)
Spermatogenic failure 9, 613958 (3)
Esophageal squamous cell carcinoma, 133239 (3)
Immunodeficiency due to CASP8 deficiency, 607271 (3);


Experimental Setup

• 96 candidate variants from consensus 20-100bp deletion set
• Selection and PCR primer design by Victor/Jayne
• PCR performed on sample A105c
• Sanger sequencing in Forward and Reverse direction
• Alignment of F and R traces to reference sequence
• Manual assessment of alignments
• MiSeq sequencing of PCR amplicons
• Mapping to reference genome with deletion alleles included


Examples

Deletion 50 bp

Unclear what happened due to repetitive nature of genomic region


Primer design and verification

Primer design is done for all SVs

Suggestion for validation:

1. All translocations and inversions (150)

2. All eversions (142)

3. 48 insertions (random selection from 2,242), one sample (A105c?)

4. 48 large deletions (not from main-paper set), one sample (A105c?)


Translocations, intra and inter

Filter Step: Window 100bp / Window 500bp / Window 1kb

Consensus regions: 12,239 / 22,242
2 or more algorithms: 280 / 266 / 362
Different approaches: 0 / 44 / 73
2 or more trios: 40 / 68
Inherited: 31 (8 inter, 23 intra) / 58 (20 inter, 38 intra)


SV merging

3. Select most precise coordinates:
   If Pindel is in cluster, then Pindel
   If no Pindel, but Clever, then Clever
   If no: Assembly
   Still not: GenomeSTRIP
   No? 123SV
   And finally: BreakDancer

!!! If multiple calls from the same tool cluster together, pick the call that is seen in the bigger number of samples

4. Get Read Depth data on top of that
   Using best coordinates from step 3 and coordinates of DWAC-Seq, CNVnator and Façade, and requiring 80% reciprocal overlap.

5. Remove all clusters that are not supported by multiple SV calling approaches:
   Pindel only, Assembly-only, 123SV only, BreakDancer only
   I left in Clever-only (RP+SR) and GenomeSTRIP-only clusters (RD+RP)
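Steps 3 and 5 above can be sketched as a priority lookup plus an approach check. The dictionary keys, the `n_samples` field, the approach codes (including `AS` for assembly) and both function names are assumptions for illustration; the tool order and the tie-break on sample count follow the slide.

```python
# Tool order for choosing coordinates, most precise first (from step 3).
COORD_PRIORITY = ["Pindel", "Clever", "Assembly", "GenomeSTRIP", "123SV", "BreakDancer"]

# Detection approaches per tool: SR split read, RP read pair,
# RD read depth, AS assembly (codes are illustrative labels).
APPROACHES = {
    "Pindel": {"SR"},
    "Clever": {"RP", "SR"},
    "Assembly": {"AS"},
    "GenomeSTRIP": {"RD", "RP"},
    "123SV": {"RP"},
    "BreakDancer": {"RP"},
    "DWAC-Seq": {"RD"},
    "CNVnator": {"RD"},
    "Façade": {"RD"},
}


def best_coordinates(cluster):
    """Return the call whose coordinates represent the cluster (step 3).

    cluster: list of dicts with keys "tool", "start", "end", "n_samples".
    Among several calls from the chosen tool, the one seen in the most
    samples wins.
    """
    for tool in COORD_PRIORITY:
        same_tool = [c for c in cluster if c["tool"] == tool]
        if same_tool:
            return max(same_tool, key=lambda c: c["n_samples"])
    return None  # only read-depth calls: no fine coordinates available


def multi_approach(cluster):
    """True if the cluster is backed by at least two approaches (step 5)."""
    seen = set()
    for call in cluster:
        seen |= APPROACHES[call["tool"]]
    return len(seen) >= 2


cluster = [
    {"tool": "Clever", "start": 1000, "end": 1500, "n_samples": 3},
    {"tool": "Pindel", "start": 1004, "end": 1498, "n_samples": 2},
    {"tool": "Pindel", "start": 1005, "end": 1499, "n_samples": 5},
]
best = best_coordinates(cluster)
```

Because Clever and GenomeSTRIP each combine two approaches, a cluster containing only one of them passes `multi_approach`, which matches the exception noted in step 5.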


20 – 100bp filtering

• >= 2 algorithms
• >= 3 trios
• >= 1 inherited
• No alpha satellites

• 21048 events


Per Individual - deletions

                                   20 – 100bp            100bp+
Total bp del                       175,188bp (±8,027)    6,384,084bp*
Nr exons affected                  10 (±2)               91 (±19)
Nr Del Events                      5195 (±231)           3299 (±153)
Intergenic                         3108 (±140)           2088 (±106)
Intronic                           1528 (±73)            880 (±37)
UTRs                               542 (±28)             294 (±16)
Exonic                             10 (±2)               36 (±5)
SA                                 4 (±1)                1 (±1)
SD                                 1 (±1)                0
LOF                                5 (±2)                12 (±2)
Nr Genes with OMIM Disease Terms   2 (±1)                3 (±2)

*No single event larger than 0.5Mb, no annotated alpha regions


How many are supported by different tools?

Tool Support dels Used for fine-coords

Pindel 10,791 10,791

Clever 13,671 5,557

Assembly 3,954 689

GenomeSTRiP 13,587 3,327

123SV 17,305 176

BreakDancer 18,889 35

DWAC-Seq 2,287 0

CNVnator 2,376 0

Façade 594 0

TOTAL: 20,575


“Backwards compatibility” and sequence content inside deleted segments

9,186 deletions to be reported in GoNL main paper

100% of them are in the new set!

[Figure labels: AluY/J, AluYa5, AluYb8, L1, SVA-F/E, SINEs, LINEs, Simple]

Victor


Deletions in 9.2k set: characterization

Purpose: to understand bias of SV distribution and selective pressures

How:
1. Generate 100 (1000) sets where the positions of deletions are permuted (shuffled)
2. Annotate these sets for overlap with genes / pathways
3. Check what is over- (under-) represented in the GoNL deletion set compared to randomized sets

Constraints:
4. NGS-accessibility: shuffled deletions should be flanked by sequences to which we can unambiguously map reads (next to the deletion, but within 500 bp, we observe reads with mapq>30 in one or more GoNL individuals)
5. Preserve chromosomal distribution of SVs: only local shuffling, at least 1 kb, but not further than 2 Mb from original deletion.
6. Try to keep balance between mechanisms of SV formation – e.g. if many deletions are due to a SINE element in original set, they should contribute an equal amount to a shuffled set.
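A minimal sketch of the local shuffling in constraint 5, with a pluggable check standing in for the NGS-accessibility test in constraint 4. The function name, the retry limit and the `accessible` callback are hypothetical, and constraint 6 (balancing formation mechanisms) is not modelled here.

```python
import random


def shuffle_position(start, end, chrom_len, rng,
                     min_shift=1_000, max_shift=2_000_000,
                     accessible=lambda s, e: True, max_tries=100):
    """Move one deletion to a nearby random position on the same chromosome.

    The deletion keeps its length and is shifted by at least min_shift and
    at most max_shift bp in either direction. `accessible` is a placeholder
    for a mappability check on the flanks. Returns (new_start, new_end),
    or None if no valid position is found within max_tries.
    """
    length = end - start
    for _ in range(max_tries):
        shift = rng.randint(min_shift, max_shift) * rng.choice((-1, 1))
        new_start = start + shift
        new_end = new_start + length
        if new_start < 0 or new_end > chrom_len:
            continue  # fell off the chromosome, try again
        if accessible(new_start, new_end):
            return new_start, new_end
    return None


rng = random.Random(0)  # fixed seed so a permuted set can be regenerated
shuffled = shuffle_position(5_000_000, 5_000_300, 10_000_000, rng)
```

Repeating this for every deletion gives one permuted set; 100 or 1000 such sets give the expected mean and standard deviation per annotation class.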


Deletions in 9.2k set: repeats

Observed = observed in GoNL 9.2k dels (>50% overlap)

Repeat type             Count       Bases         Avg length   Expected   Observed   %
Tandem (TRF)            1,394,795   116,376,854   83.4         150        1498       0.11
AluYa5                  3,942       1,167,380     296.1        1          755        19.15
AluY                    136,368     39,539,498    289.9        82         576        0.42
AluYb8                  2,707       824,019       304.4        <1         443        16.36
Dust (low complexity)   2,938,002   98,602,586    33.6         61         390        0.01
L1HS                    1,521       3,302,507     2,171.3      2          214        14.07
AluSx                   334,049     97,304,743    291.3        148        213        0.06
AluYb9                  518         141,456       273.1        <1         81         15.64
SVA_F                   989         766,159       774.7        1          61         6.17
AluSg                   81,044      23,558,081    290.7        40         61         0.08


Results from Shuffle analysis 100bp+

                   Observed   Expected (mean)   Expected (stdev)
Intronic           2331       2302              36
Exonic             167        732               24                 1.8x10^-120
UTR                888        975               28                 1.1x10^-3
OMIM Disease (1)   19         137               11                 2.4x10^-27
LOF (2)            48         97                8                  2.6x10^-9

1. Event must be exonic and have an associated OMIM disease term
2. 1st exon and / or more than 50% of a gene(s) exons are affected
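The slide does not say how the unlabeled fourth column was computed. One plausible reading, shown here purely as an assumption, is a z-score of the observed count against the shuffled sets followed by a one-sided normal tail probability. Because the means and standard deviations on the slide are rounded, this sketch only roughly tracks the values in that column.

```python
import math


def z_score(observed, expected_mean, expected_sd):
    """Number of standard deviations the observation lies from the shuffled mean."""
    return (observed - expected_mean) / expected_sd


def one_sided_p(z):
    """Normal-approximation tail probability P(Z >= |z|)."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))


# Observed, expected mean and expected stdev for the 100bp+ set.
rows = {
    "Intronic": (2331, 2302, 36),
    "Exonic": (167, 732, 24),
    "UTR": (888, 975, 28),
    "OMIM Disease": (19, 137, 11),
    "LOF": (48, 97, 8),
}

for name, (obs, mean, sd) in rows.items():
    z = z_score(obs, mean, sd)
    print(f"{name:14s} z = {z:7.2f}  p = {one_sided_p(z):.2g}")
```

Exonic deletions sit roughly 23.5 standard deviations below the shuffled expectation, which is the depletion the table highlights.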


Results from Shuffle analysis 20 - 100bp

                   Observed   Expected (mean)   Expected (stdev)
Intronic           5607       5704              57                 0.056
Exonic             58         270               15                 5.9x10^-44
UTR                1957       2040              40                 0.018
OMIM Disease (1)   15         51                7                  1.6x10^-7
LOF (2)            18         20                4

1. Event must be exonic and have an associated OMIM disease term
2. 1st exon and / or more than 50% of a gene(s) exons are affected


• 65167 candidate regions

Nr Algorithms Nr Events

1 41245

2 17104

3 6197

4 621


Verification of 20 to 100 bp dels by Sanger sequencing


Final results MiSeq

Ref_reads   Alt_reads   Conclusion Mi-Seq
0           81          AA
59          145         RA
2           62          AA
109         142         RA
141         206         RA
118         138         RA
20          36          RA
0           50          AA
64          92          RA
0           97          AA
0           51          AA
4           50          AA
1           49          AA
0           32          AA

Summary stats
AA        35
RA        55
RR        2
Cov<10    4
Total     96

Overall (MiSeq + Sanger): 3/96 dels were not confirmed
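The slide does not state how read counts were turned into AA, RA and RR. The sketch below uses assumed cut-offs: a minimum coverage of 10 reads (suggested by the Cov<10 category) and an alternative-allele fraction band of 0.1 to 0.9 for heterozygous calls. These thresholds are hypothetical; they happen to reproduce the conclusions of the fourteen rows listed above.

```python
def genotype(ref_reads, alt_reads, min_cov=10, het_low=0.1, het_high=0.9):
    """Call a deletion genotype from reference and alternative read counts.

    Returns "Cov<10" style label when coverage is too low, otherwise
    "RR", "RA" or "AA" based on the fraction of reads supporting the
    deletion allele. All thresholds are illustrative assumptions.
    """
    total = ref_reads + alt_reads
    if total < min_cov:
        return f"Cov<{min_cov}"
    alt_fraction = alt_reads / total
    if alt_fraction < het_low:
        return "RR"
    if alt_fraction > het_high:
        return "AA"
    return "RA"


# (Ref_reads, Alt_reads, conclusion) rows as listed on the slide.
rows = [
    (0, 81, "AA"), (59, 145, "RA"), (2, 62, "AA"), (109, 142, "RA"),
    (141, 206, "RA"), (118, 138, "RA"), (20, 36, "RA"), (0, 50, "AA"),
    (64, 92, "RA"), (0, 97, "AA"), (0, 51, "AA"), (4, 50, "AA"),
    (1, 49, "AA"), (0, 32, "AA"),
]
calls = [genotype(ref, alt) for ref, alt, _ in rows]
```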


How is GoNL different to the 1000Genomes?

                  1000 Genomes                   GoNL
Source of DNA     Cell lines                     Blood
Coverage          3-4x                           ~12x
Data generation   Multiple centers & platforms   BGI/Illumina
Population        Multiple populations           Netherlands
Family            Unrelated                      Trio/quartet: MZ or DZ
Phenotype         None                           Multiple


5 calls > 1Mb
1092 > 10kb


1. Algorithm

• 397,979 calls in total

Nr Algorithms Calls

1 374,783

2 10,303

3 4,034

4 3,598

5 3,085

6 1,561

7 468

8 133

9 13

23,195 calls from 2 or more algorithms


2. Nr Trios & 3. Inheritance

Nr Trios    N
1           5210
2           2251
>2          15734

Inherited   N
No          2486
Yes         13248


4. Different Approaches

• Single Method = 3869
• Multiple Method = 9379

5. Alpha Satellites
• = 192

• Final Set = 9187 deletions


Events per algorithm

UG 7729

PINDEL 25529

CLEVER 41415

ASSEMBLY 6210

123SV 15229

Breakdancer 352

GSTRIP 2

DWACSeq 62

CNVnator 0

façade 0


Size Density < 1kb