
Structural Variation Detection


DESCRIPTION

Journal club slides for "Detection of structural DNA variation from next generation sequencing data: a review of informatic approaches" and a description of the software pipeline digit


Page 1: Structural Variation Detection

Review Paper digit

Structural Variation Detection

Page 2: Structural Variation Detection

Table of contents

• Detection of structural DNA variation from next generation sequencing data: a review of informatic approaches

• The software pipeline digit

Page 3: Structural Variation Detection

Detection of structural DNA variation from next generation sequencing data: a review of informatic approaches

Authors: Haley J. Abel (1), Eric J. Duncavage (2)

(1) Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA

(2) Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA

Page 4: Structural Variation Detection

Definition

Structural DNA variation is generally defined as variation in a DNA region larger than 1 kb and includes several classes such as translocations, inversions, insertions/deletions and copy number variations (CNVs).

Page 5: Structural Variation Detection

Methods

• Cytogenetics: unbiased, BUT limited resolution/sensitivity (350-500 band level)

• FISH (fluorescence in situ hybridization): increased resolution, ability to test fixed interphase cells, faster turnaround time, greater sensitivity, BUT evaluation of multiple loci requires multiple probes/assays ⇒ increasing complexity

• Microarrays: especially reliable for CNV and loss of heterozygosity, BUT unable to detect balanced translocations

• Next Generation Sequencing: ability to detect the full range of genetic variation ⇒ potential to streamline testing by using a single analysis platform, BUT dependent on coverage ⇒ susceptible to GC bias

Page 6: Structural Variation Detection

NGS - Methods

• Depth of coverage analysis

• Discordant read pair analysis

• Split read analysis

Page 7: Structural Variation Detection

Tools

Page 8: Structural Variation Detection

Translocation and Inversion Detection

Page 9: Structural Variation Detection

Translocation and Inversion Detection

Discordant pair analysis:

• sensitive, but low breakpoint resolution and low specificity

• repetitive regions are a source of false positives and also drive real translocations, which makes true events difficult to separate from false positives

• many methods use heuristic cut-offs to improve specificity:
    • VariationHunter and Hydra consider multiple high-scoring mappings if available
    • GASVPro tries to improve specificity by combining discordant pair and coverage analysis

Split read analysis: excellent breakpoint resolution (up to single-base resolution), but requires much higher coverage.
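As a minimal sketch of discordant pair analysis, the snippet below labels a read pair as concordant or as evidence for one class of event. The `ReadPair` record, the expected insert-size window and the category names are illustrative assumptions, and the orientation rule assumes a forward/reverse paired-end library; none of this is the interface of the tools named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical minimal record for one aligned read pair."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, min_insert: int = 200, max_insert: int = 600) -> str:
    """Label a read pair by the kind of event it could support.

    ASSUMPTION: a concordant pair maps to one chromosome with the leftmost
    read on "+" and the rightmost read on "-", separated by a distance
    inside [min_insert, max_insert].
    """
    if pair.chrom1 != pair.chrom2:
        return "translocation"
    # Order the two reads by mapping position.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion"
    if (left_strand, right_strand) == ("-", "+"):
        return "everted"  # outward-facing pair, often seen at tandem duplications
    span = right_pos - left_pos
    if span > max_insert:
        return "deletion"
    if span < min_insert:
        return "insertion"
    return "concordant"
```

A single discordant pair is weak evidence on its own; callers of this kind usually require several pairs supporting the same event before reporting it.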

Page 10: Structural Variation Detection

Copy Number Variation Detection

Page 11: Structural Variation Detection

Copy Number Variation Detection

Discordant pair analysis:

• performs best on large deletions; struggles with duplications

• cannot detect large insertions with the usual strategy due to pairs not spanning the duplication

• Pindel pieces translocation calls together via a pattern growth algorithm to find large insertions

Page 12: Structural Variation Detection

Copy Number Variation Detection

Depth of coverage analysis:

• DNA
    • the main problem is accounting for factors that modify read depth, like GC bias
    • event-wise testing (EWT) algorithms rely purely on deviations in coverage from the sample's mean depth; GC content is addressed by analysing the genome bin-wise
    • SegSeq, CNVnator, CNAseg, CNV-seq compare the same region across multiple samples (control samples); these methods also make use of bins/partitions and rely on coverage ratios, which permit finer CNV mapping

• Exome
    • target-capture data increases GC bias
    • small size of targets makes paired normals or population controls a requirement
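The bin-wise idea behind depth of coverage analysis can be sketched as follows: read depth per bin is first divided by the median depth of bins with similar GC content, and the result is then compared against a control as a log2 ratio. The function names, the GC stratum width and the pseudocount are assumptions for illustration; this is not the algorithm of any tool listed above.

```python
import math
from collections import defaultdict
from statistics import median


def gc_normalise(depths, gc_fractions, gc_step=0.05):
    """Divide each bin's depth by the median depth of bins with similar GC.

    ASSUMPTION: bins are grouped into GC strata of width gc_step.
    """
    strata = defaultdict(list)
    for depth, gc in zip(depths, gc_fractions):
        strata[round(gc / gc_step)].append(depth)
    medians = {key: median(values) for key, values in strata.items()}
    normalised = []
    for depth, gc in zip(depths, gc_fractions):
        reference = medians[round(gc / gc_step)]
        normalised.append(depth / reference if reference > 0 else float("nan"))
    return normalised


def log2_ratios(sample, control, pseudocount=0.01):
    """Per-bin log2(sample / control); the pseudocount avoids log(0)."""
    return [
        math.log2((s + pseudocount) / (c + pseudocount))
        for s, c in zip(sample, control)
    ]
```

Runs of consecutive bins with a ratio well below zero would then be candidates for a loss, and runs well above zero for a gain; how such runs are segmented is what distinguishes the individual tools.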

Page 13: Structural Variation Detection

Copy Number Variation Detection

• Exome methods calculate local CNVs first and then merge them together with various strategies:
    • CONTRA: uses circular binary segmentation for merging
    • CoNVEX: denoises coverage ratios with a discrete wavelet transform and then uses a Hidden Markov Model to identify gains and losses
    • ExomeCNV: models B-allele frequencies to detect loss of heterozygosity

• Some methods try to find sporadic CNVs in population exome data by normalizing read count with principal component analysis

Page 14: Structural Variation Detection

Insertion and Deletion Detection

Page 15: Structural Variation Detection

Insertion and Deletion Detection

• Alignment based:
    • offered by many packages: SAMtools, GATK, VarScan
    • usually rely on probabilistic models to make indel calls
    • Dindel and Stampy rely on these methods but employ filters to differentiate common errors from true indels
    • all of these methods require considerable validation
    • insertion detection is limited to 15% of total read length

• Split read based:
    • suitable for medium-sized indels
    • high false-positive rate, because no probabilistic models discriminate between alignment errors and true events

Page 16: Structural Variation Detection

Conclusion

• There is currently no single informatic method capable of identifying the full range of structural DNA variation.

• Multiple complementary tools are required for robust variant detection.

• Since methods can perform differently based on assay design, extensive validation is required for clinical use.

Page 17: Structural Variation Detection

digit - A tool for detection and identification of genomic inter-chromosomal translocations

Authors: Richard Meier (1,4), Stefan Graw (1,4), Julian R Molina (3), Peter Beyerlein (1), Devin Koestler (2), Jeremy Chien (4)

(1) Technical University of Applied Sciences Wildau, 15745 Wildau, Germany

(2) Department of Biostatistics, University of Kansas Medical Center, Kansas City, KS 66160

(3) Department of Medical Oncology, Mayo Clinic, Rochester, MN 55905

(4) Department of Cancer Biology, University of Kansas Medical Center, Kansas City, KS 66160

Page 18: Structural Variation Detection

Goals of the project

• Interchromosomal translocation detection utilizing mate-pair sequencing data

• Handle artifacts and robustly remove false positive calls

• Investigate translocation profiles of populations / trait associated groups

Page 19: Structural Variation Detection

Mate-pair sequencing

[Figure: mate-pair library preparation. Step labels: genome / chromosome, fragmentation, circularisation, fragmentation, adapter ligation, sequencing. Element labels: template, terminal fragment, read1, read2.]

Page 20: Structural Variation Detection

digit overview

[Figure: digit pipeline overview. Boxes, in the order they appear on the slide:]

• preprocessed read pairs (read_1 on chromosome_1, read_2 on chromosome_2)

• retain discordantly mapping read pairs

• find read pair clusters (cluster_A, cluster_B, ...)

• calculate MVMs for each pair and filter out low value pairs (inset: MVM density plot of concordant pairs and discordant pairs, with a threshold separating rejected from approved pairs)

• recluster remaining read pairs

• compare samples (sample_1, sample_4, sample_5, sample_9) and search for group associations (labels: discordant read pair cluster, group associated super cluster)

• called translocations, for example:
    chr14:1573290-158941 & chr22:2732247-2735312
    chr2:11002738-11002738 & chr3:3763766-3766175
    chr11:1573290-158941 & chr17:1147275-11149839
    chr5:25819112-25821940 & chr9:5151006-5154147
    ...
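The "find read pair clusters" step can be pictured with a simple gap-based grouping: discordant pairs that join the same two chromosomes and whose ends lie close together on both sides are collected into one cluster. This is a generic sketch, not digit's actual clustering; the tuple layout, `max_gap` and `min_support` are hypothetical, and pairs are assumed to be stored with a consistent chromosome order.

```python
from itertools import groupby


def split_by_gap(items, key, max_gap):
    """Split items into runs whose consecutive sorted key values differ by at most max_gap."""
    runs, current = [], []
    for item in sorted(items, key=key):
        if current and key(item) - key(current[-1]) > max_gap:
            runs.append(current)
            current = []
        current.append(item)
    if current:
        runs.append(current)
    return runs


def cluster_pairs(pairs, max_gap=3000, min_support=2):
    """Cluster (chrom1, pos1, chrom2, pos2) tuples whose two ends both lie close together.

    ASSUMPTION: max_gap and min_support are illustrative values only.
    """
    def chrom_key(pair):
        return (pair[0], pair[2])

    clusters = []
    for _, group in groupby(sorted(pairs, key=chrom_key), key=chrom_key):
        # First group by the position of end 1, then refine by end 2.
        for run1 in split_by_gap(list(group), key=lambda p: p[1], max_gap=max_gap):
            for run2 in split_by_gap(run1, key=lambda p: p[3], max_gap=max_gap):
                if len(run2) >= min_support:
                    clusters.append(run2)
    return clusters
```

Running the same function again on the pairs that survive a filtering step corresponds to the "recluster remaining read pairs" box, since removing pairs can drop a cluster below the support cutoff.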

Page 21: Structural Variation Detection

Mapping validity measure (MVM)

[Figure: the two reads of a pair shown against the sequences of a region on chromosome A and a region on chromosome B. Labels: "mapper assigns read to region" (once per read), "2kb", "but".]

• The two reads of a read pair are remapped to both regions the mapping software originally assigned them to.

• If a read maps equally well to both regions, it is impossible to resolve the read pair's origin and it is rejected.

• The MVM judges how ambiguous the mappability of a read pair is.

• The MVM distribution of concordant (well-behaved) read pairs in a sample is used as an internal standard to determine a filtering threshold.
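How a threshold could be taken from the concordant pairs is sketched below. The slides do not state how the MVM is computed or how the cutoff is derived, so both the percentile rule and the `fraction` parameter are assumptions; the only property taken from the slides is that pairs with low MVM values are filtered out.

```python
import math


def percentile(values, fraction):
    """Nearest-rank percentile of a non-empty list (fraction in [0, 1])."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(fraction * len(ordered)) - 1))
    return ordered[index]


def filter_by_mvm(concordant_mvms, discordant_pairs, fraction=0.05):
    """Keep discordant pairs whose MVM reaches a cutoff learned from concordant pairs.

    ASSUMPTION: the cutoff is a low percentile of the concordant MVM
    distribution; the real rule used by digit is not given in the slides.
    discordant_pairs is a list of (pair_name, mvm) tuples.
    """
    threshold = percentile(concordant_mvms, fraction)
    approved = [(name, mvm) for name, mvm in discordant_pairs if mvm >= threshold]
    return threshold, approved
```

Because the cutoff is computed per sample from that sample's own concordant pairs, it adapts to differences in library quality without an external reference.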

Page 22: Structural Variation Detection

Simulated data

Page 23: Structural Variation Detection

Real data

Samples achieved a good separation between ambiguous and distinct read pairs via MVM thresholds across the board.

[Figure: MVM density plots, one panel per sample (x-axis from 1.0 to 2.5, y-axis: Density; legend: concordant, discordant, threshold). Panels:
sample LU526 (N = 749, Bandwidth = 0.02034),
sample LU748 (N = 461, Bandwidth = 0.04017),
sample LU271 (N = 641, Bandwidth = 0.01287),
sample LU820 (N = 534, Bandwidth = 0.02189),
sample LU1160 (N = 268, Bandwidth = 0.05798),
sample LU1184 (N = 370, Bandwidth = 0.04009),
sample LU1434 (N = 391, Bandwidth = 0.02477),
sample LU1466 (N = 585, Bandwidth = 0.01317).]

Page 24: Structural Variation Detection

Real data

• We processed 20 patient samples from a non-cancer background and 35 patient samples with a lung cancer background.

• After comparing the two populations we retrieved 218 sample-specific events, 160 of which were from cancer.

• 328 translocation calls were shared between 2 or more samples.

• 16 translocations were shared between cancer samples exclusively.

• 13 translocations shared between cancer and normal samples were labeled potentially disease relevant.
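The bookkeeping behind counts of this kind can be sketched as a tally of which samples, and which groups, each call appears in. The function, its category names and the toy identifiers in the usage note are hypothetical, and the sketch assumes that equivalent calls from different samples have already been matched to one identifier, which in practice requires tolerance around breakpoints.

```python
from collections import defaultdict


def summarise_calls(calls_by_sample, group_by_sample):
    """Sort translocation calls into sample-specific and shared categories.

    calls_by_sample: dict mapping sample name -> set of call identifiers.
    group_by_sample: dict mapping sample name -> group label.
    ASSUMPTION: identical identifiers denote the same translocation.
    """
    samples_by_call = defaultdict(set)
    for sample, calls in calls_by_sample.items():
        for call in calls:
            samples_by_call[call].add(sample)

    summary = {"sample_specific": [], "shared_one_group": [], "shared_across_groups": []}
    for call in sorted(samples_by_call):
        samples = samples_by_call[call]
        groups = {group_by_sample[sample] for sample in samples}
        if len(samples) == 1:
            summary["sample_specific"].append(call)
        elif len(groups) == 1:
            summary["shared_one_group"].append(call)
        else:
            summary["shared_across_groups"].append(call)
    return summary
```

For example, with toy input `{"c1": {"t1", "t2"}, "c2": {"t2", "t3"}, "n1": {"t3", "t4"}}` and groups `{"c1": "cancer", "c2": "cancer", "n1": "normal"}`, call `t2` is shared within the cancer group only, while `t3` is shared across groups.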

Page 25: Structural Variation Detection

Translocations exclusively found in cancer

Page 26: Structural Variation Detection

Translocations enriched in cancer

Page 27: Structural Variation Detection

Conclusion

• The method successfully reduces the false positive rate.

• Group comparison and population analysis work, but will require more samples to make reliable judgements in the future.

• Comparisons with other tools are currently running.

• Combining strategies from different tools might be worth looking into in future projects.

Page 28: Structural Variation Detection

Questions?