Internship Presentation Halina Krzystek MPS Candidate in Biomedical and Health Informatics

Part I: An Evaluation of Copy Number Variant Calling Algorithms for a Clinical Genomics Pipeline Using Exome Sequencing

NCGENES2: North Carolina Clinical Genomic Evaluation by Next Generation Sequencing Phase 2

• The NCGENES2 project aims to generate evidence for the use of exome sequencing as a first-line diagnostic tool.

• It seeks to expand the diagnostic yield of its exome sequencing by identifying additional variants (CNVs) that may contribute to a clinical phenotype.

My Project Goals

(1) Explore the CNV calling tools available for exome sequencing in a literature review

(2) Evaluate their appropriateness for the NCGENES2 pipeline, and then

(3) Compare the tools’ performance on data from the 1000 Genomes Project

Copy Number Variants (CNVs)

• Variations from the normal copy number of 2 for a diploid organism

Rice and McLysaght, 2017

Pros of Exome Sequencing for CNV Detection

• Decreasing cost of sequencing

• Previous microarray-based methods typically had a resolution of 400 kb

• Exome sequencing (ES) could replace current methods that employ microarrays + sequencing

• ES could have a finer granularity in CNV detection, identifying new CNVs as small as one or two exons

• Shortcomings: low sensitivity and a high number of false positives

The Read-Depth Method

Fromer and Purcell, 2014

• Normalization

  • Of raw read count

  • To remove bias such as GC content, mappability, and capture

• Segmentation

  • Segment the CNV calls into chromosomal regions that may span several exons

• CNV calling
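
The three read-depth steps can be illustrated with a small Python sketch. It is a toy under stated assumptions, not how ExomeDepth, CODEX, or CN.Mops work internally: normalization is reduced to library-size scaling plus a comparison against a reference-panel median, segmentation is a simple merge of adjacent exons that share a state (a stand-in for HMM or CBS), and the function names and the log2 cut-offs of -0.4 and 0.3 are hypothetical.

```python
from math import log2
from statistics import median


def normalize(counts):
    """Scale raw per-exon read counts by the sample total (library size)."""
    total = sum(counts)
    return [c / total for c in counts]


def log_ratios(sample, references, pseudo=1e-9):
    """log2 ratio of the sample's normalized depth to the per-exon reference median."""
    sample_n = normalize(sample)
    refs_n = [normalize(r) for r in references]
    ratios = []
    for i, depth in enumerate(sample_n):
        ref_depth = median(r[i] for r in refs_n)
        ratios.append(log2((depth + pseudo) / (ref_depth + pseudo)))
    return ratios


def call_state(ratio, del_cut=-0.4, dup_cut=0.3):
    """Label one exon; the cut-offs are illustrative assumptions."""
    if ratio <= del_cut:
        return "DEL"
    if ratio >= dup_cut:
        return "DUP"
    return "NORMAL"


def segment_and_call(ratios):
    """Merge consecutive exons with the same state into (first, last, state) segments."""
    segments = []
    for i, ratio in enumerate(ratios):
        state = call_state(ratio)
        if segments and segments[-1][2] == state:
            segments[-1] = (segments[-1][0], i, state)
        else:
            segments.append((i, i, state))
    # Only non-normal segments are reported as CNV calls.
    return [s for s in segments if s[2] != "NORMAL"]


# Hypothetical data: ten exons, with exons 3 and 4 at half depth in the test sample.
references = [[100] * 10 for _ in range(3)]
sample = [100, 100, 100, 50, 50, 100, 100, 100, 100, 100]
print(segment_and_call(log_ratios(sample, references)))
```

In this toy input the two half-depth exons are merged into a single deletion segment spanning exon indices 3 to 4.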

Methods

• ExomeDepth, CoNIFER, XHMM, CN.Mops and CODEX were appropriate for detecting CNVs in germline samples

• However, XHMM and CoNIFER require a large number of samples for their normalization steps

• Thus, ExomeDepth, CN.Mops, and CODEX were selected for evaluation

• All tools were run with default parameter values for comparison

• The tools were run through the Ximmer framework

• ExomeDepth:

  • Normalization: Robust Beta-Binomial distribution

  • Segmentation: Hidden Markov Model on log ratio of read counts

• CODEX:

  • Normalization: Poisson latent model

  • Segmentation: Exon-level threshold to a Circular Binary Segmentation (CBS) algorithm

• CN.Mops:

  • Normalization: Mixture of Poissons models

  • Segmentation: Based on models’ I/NI calls, joining segments with similar I/NI calls

Results

(1) CNV Size and Distribution

(2) Concordance between 3 Callers

(3) Concordance with 1000 Genomes GS Call Set

1. CNV Size and Distribution

• CN.Mops could not identify CNVs shorter than 1 kb

• ExomeDepth had the greatest range in CNV size

• ExomeDepth identified CNVs as small as one exon and CODEX identified CNVs as small as three exons

• CN.Mops and CODEX identified larger CNVs

2. Concordance Between Callers

IGV

3. Concordance with GS call set

• Only 180 of the 1,213 (15%) target exons were identified by all three callers.

• ExomeDepth achieved the highest sensitivity, 40.73%, with a precision of 5.3%

• CODEX had a sensitivity of 30.67%, precision of 2.4%

• CN.Mops had a sensitivity of 29.60%, precision of 1.1%

All 3 callers: 15% sensitivity, 8% precision, 2,097 FP

Two or more callers: 37.5% sensitivity, 14,288 FP

ExomeDepth + 1 or more callers: 35.0% sensitivity, 6,189 FP
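
The three consensus rules compared on these slides can be expressed as set operations over the exons each caller flags. The sketch below is illustrative only: the function name, the dictionary keys, and the exon identifiers are hypothetical, and it assumes each caller's output has already been reduced to a set of exon IDs.

```python
def combine_calls(exomedepth, codex, cnmops):
    """Apply the three consensus rules to per-caller sets of flagged exon IDs."""
    return {
        # Exons flagged by every caller.
        "all_three": exomedepth & codex & cnmops,
        # Exons flagged by at least two of the three callers.
        "two_or_more": (exomedepth & codex) | (exomedepth & cnmops) | (codex & cnmops),
        # Exons flagged by ExomeDepth and confirmed by at least one other caller.
        "exomedepth_plus_one": exomedepth & (codex | cnmops),
    }


# Hypothetical exon IDs, not taken from the evaluation data.
rules = combine_calls(
    exomedepth={"e1", "e2", "e3", "e4"},
    codex={"e2", "e3", "e5"},
    cnmops={"e3", "e5", "e6"},
)
for name, exons in rules.items():
    print(name, sorted(exons))
```

By construction, "ExomeDepth + 1 or more" is a subset of "two or more": it drops the exons supported only by CODEX and CN.Mops, which is consistent with its lower false-positive count on the slides.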

Summary Statistics

Caller or rule                   Sensitivity   Precision   False Positives
ExomeDepth alone                 40.7%         5.3%        8,880
CODEX alone                      30.7%         2.4%        14,528
CN.Mops alone                    29.6%         1.1%        31,733
All 3 callers                    14.8%         8%          2,097
2 or more callers                37.5%         3.1%        14,288
ExomeDepth + 1 or more callers   35.0%         6.4%        6,189
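
As a worked check of how such figures relate, the sketch below recomputes the "All 3 callers" row from numbers stated on the slides: 180 of 1,213 target exons detected and 2,097 false positives. It assumes the standard definitions, sensitivity = TP / (TP + FN) and precision = TP / (TP + FP), with true and false positives counted in the same units; the slides do not state the exact formulas used, and the function names are hypothetical.

```python
def sensitivity(true_positives, truth_total):
    """Fraction of truth-set exons that were detected: TP / (TP + FN)."""
    return true_positives / truth_total


def precision(true_positives, false_positives):
    """Fraction of calls that are in the truth set: TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)


# "All 3 callers" row: 180 of 1,213 target exons detected, 2,097 false positives.
tp, truth_total, fp = 180, 1213, 2097
print(f"sensitivity = {100 * sensitivity(tp, truth_total):.1f}%")
print(f"precision   = {100 * precision(tp, fp):.1f}%")
```

Under these assumptions the result is about 14.8% sensitivity and 7.9% precision, which matches the table row after rounding.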

631 missed exons are largely on the X chromosome

Corresponding 127 CNVs’ Size

[Figure legend: Duplications, Deletions, CN > 4]

Takeaways and Future Directions

(1) The best rule is ExomeDepth + one or more callers

(2) Can machine learning be applied to CNV calling?

Part II: The Application of Machine Learning Clustering on MicroRNAs as a Quality Analysis and Control Tool for Large Cancer Genomics Projects

microRNAs

• miRNAs are small non-coding RNAs that can regulate genes

• Identified as significant in multiple cancers: breast, ovarian, glioblastoma, leukemia

Machine Learning: Clustering

• Clustering is a type of unsupervised machine learning, one that requires no labeled training set

• Groups together samples by similarity in traits

• In our case, samples are grouped by similarity in microRNA expression

• Two widely used methods: k-means and hierarchical clustering
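
To make the idea concrete, here is a minimal pure-Python sketch of agglomerative hierarchical clustering. It is not the analysis behind the figures that follow: the choice of average linkage with Euclidean distance, the function names, and the four toy expression profiles are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, profiles):
    """Mean Euclidean distance between all cross-cluster pairs of samples."""
    return sum(dist(profiles[i], profiles[j]) for i in a for j in b) / (len(a) * len(b))


def hierarchical_clusters(profiles, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], profiles),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected by the deletion
    return sorted(sorted(c) for c in clusters)


# Hypothetical expression values for three miRNAs in four samples.
profiles = [
    [8.0, 1.0, 5.0],  # sample 0
    [7.5, 1.2, 5.1],  # sample 1
    [2.0, 6.0, 5.0],  # sample 2
    [2.2, 6.3, 4.8],  # sample 3
]
print(hierarchical_clusters(profiles, n_clusters=2))
```

With these toy profiles, samples 0 and 1 end up in one cluster and samples 2 and 3 in the other. No labels are supplied at any point, which is what makes the method unsupervised.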

Results of Hierarchical Clustering

Lung Adenocarcinoma Clustering: Pilot

Lung Squamous Cell Carcinoma: Pilot

Normal Adjacent Samples: Pilot

Acute Myeloid Leukemia Clustering

Glioblastoma Clustering

Observations

Acute Myeloid Leukemia and Glioblastoma samples cluster well in retrospective data

Lung Adenocarcinoma and Lung Squamous Cell Carcinoma can cluster separately

Normal Adjacent samples cluster separately from tumors

Lessons Learned from Internship

• Forensic bioinformatics

• Power and limitations of sequencing technology in healthcare

• How tech can be leveraged to improve health outcomes

REFERENCES

• Eisenmann, D. M., microRNAs (June 25, 2005), WormBook, ed. The C. elegans Research Community, WormBook, doi/10.1895/wormbook.1.7.1, http://www.wormbook.org.

• Rice, A., McLysaght, A. Dosage sensitivity is a major determinant of human copy number variant pathogenicity. Nat Commun 8, 14366 (2017). https://doi.org/10.1038/ncomms14366

• Zhao, M., Wang, Q., Wang, Q. et al. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives. BMC Bioinformatics 14, S1 (2013). https://doi.org/10.1186/1471-2105-14-S11-S1