Development of a bioinformatics tool for the automated generation of a report of the somatic mutations found
in a Normal/Tumor cancer experiment
Isaac Noguera Guixà
Universitat Autònoma de Barcelona
15th of July, 2014
Project tutor: Dr. Raúl Tonda. Data analysis team, Centre Nacional d’Anàlisi Genòmica (CNAG), PCB
Academic tutor: Dr. Miguel Perez-Enciso. Centre for Research in Agricultural Genomics (CRAG), UAB
Course 2013 - 2014
Master’s Thesis
Table of contents
Introduction
◦ Cancer genetics
◦ Cancer in Bioinformatics
Objectives
Material and methods
Results
Conclusions
Introduction
[Figure: Loss of normal growth control. Labels: normal cell; cell damage (no repair); cell suicide (apoptosis); 1st, 2nd and 3rd mutation; uncontrolled growth.]
Yulug, I. (2006). Molecular basis of cancer [PowerPoint slides]. Retrieved from http://www.hugointernational.org/resources/Isik_Yulug_Molecular_Basis_of_cancer_bilingual.ppt
Introduction
Cancer in Bioinformatics
[Figure: Normal/Tumor experiment. Labels: normal sample; tumor sample; read mapping and variant calling.]
Lopez-Bigas, N. (2011). Identification of cancer drivers across tumor types [PowerPoint slides]. Retrieved from http://es.slideshare.net/nurialopezbigas/identification-of-cancer-drivers-across-tumor-types#
A variant is determined by the joint status in tumor-normal sequence pairs
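One simplified way to read "joint status" is to compare the genotypes called in the two samples. The Python sketch below is only an illustration: the function name and the category labels are hypothetical, and the tool presented here filters on a Fisher test p-value rather than on this rule.

```python
def classify_variant(normal_gt: str, tumor_gt: str) -> str:
    """Classify a site from the genotypes of the normal and tumor samples.

    Genotypes are VCF-style strings such as "0/0", "0/1" or "1/1".
    The labels returned here are illustrative, not the tool's own.
    """
    def has_alt(gt: str) -> bool:
        # Treat phased ("|") and unphased ("/") genotypes alike.
        alleles = gt.replace("|", "/").split("/")
        return any(a not in ("0", ".") for a in alleles)

    normal_alt = has_alt(normal_gt)
    tumor_alt = has_alt(tumor_gt)
    if tumor_alt and not normal_alt:
        return "somatic"        # alternate allele seen only in the tumor
    if tumor_alt and normal_alt:
        return "germline"       # alternate allele seen in both samples
    if normal_alt:
        return "loss in tumor"  # alternate allele seen only in the normal
    return "reference"


print(classify_variant("0/0", "0/1"))
```

A real caller would also weigh read depth and genotype likelihoods, which is why a statistical test on the pair is preferred over a rule on the genotype strings alone.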
Variant call format (vcf)
Introduction
Cancer in Bioinformatics
Normal/Tumor experiment
(Danecek, P. et al., 2011)
Objectives
Main objective
Develop an automated tool to produce a report of the somatic variants found in a Normal/Tumor experiment
→ Process the output of CNAG’s variant calling pipeline
→ Filter the somatic variants from it and extract relevant statistics from them
→ Identify those variants that are already known and annotated in cancer somatic mutations databases
→ Transform the obtained data into tables and plots to include in the report
→ Fill a report template, kept independent of the main script's code, with the processed data
→ Generate the report document in a printable format such as Portable Document Format (PDF)
→ Execute all these steps sequentially and automatically
Additional objective
Incorporate the developed tool as an additional step in the variant calling pipeline of CNAG’s Data Analysis team
Material and methods
Basis of the developed tool:
Main script
→ Perl script
→ Template module
→ Input data processing
→ Output data generation
Template document
→ Template Toolkit script
→ LaTeX code with R and Template Toolkit code embedded
Material and methods
Designed pipeline:
Input data: CNAG’s vcf, Template Toolkit document
→ Data processing: COSMICdb annotation, somatic variants filtering, output data storing/generation
→ Template processing → Noweb document
→ R Sweave → LaTeX document
→ pdflatex → Pdf document
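The last two stages of the designed pipeline (R Sweave, then pdflatex) can be sketched as a pair of external commands. This is a minimal Python illustration, not the tool's Perl code: the function name and the file name `report.Rnw` are hypothetical, and the sketch only builds the command lists without running them.

```python
from pathlib import Path


def report_commands(noweb_file: str) -> list[list[str]]:
    """Return the commands that turn a Noweb document into a PDF.

    Assumes both programs write their output to the current directory,
    so the LaTeX file is named after the Noweb file's base name.
    """
    rnw = Path(noweb_file)
    tex = Path(rnw.name).with_suffix(".tex")
    return [
        ["R", "CMD", "Sweave", str(rnw)],  # Noweb (.Rnw) -> LaTeX (.tex)
        ["pdflatex", str(tex)],            # LaTeX (.tex) -> PDF
    ]


for command in report_commands("report.Rnw"):
    print(" ".join(command))
```

Each list could be passed to `subprocess.run(command, check=True)` so that a failure in Sweave stops the pipeline before pdflatex is attempted.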
##INFO=<ID=FP,Number=1,Type=Float,Description="Fisher test P-value for somatic comparison.">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
Chr1 883814 . A G 18.1 mrd10 DP=36;UPSTREAM(MODIFIER||||NOC2L|processed_transcript|CODING|ENST00000496938|);FP=0.00604 GT:PL:DP 0/0:0,96,255:32 0/1:51,0,26:3
Chr20 126154 dbSNPBuildID=137;GMAF=0.1648 T A 64.7 mrp0.05 INDEL;EFF=FRAME_SHIFT(HIGH||||DEFB126|protein_coding|CODING|ENST00000382398|exon_20_126056_126392;FP=1 GT:PL:DP 1/1:255,255,0:274 0/1:253,0,45:26
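To make the role of the FP field concrete, the following Python sketch parses the two data lines of the excerpt and keeps the records whose Fisher p-value does not exceed a threshold. It is an illustration under stated assumptions, not the tool's Perl code: the function names are hypothetical, the "keep when FP <= threshold" rule is an assumed reading of the filtering step, and the excerpt is split on whitespace although real VCF files are tab-delimited.

```python
EXAMPLE_LINES = [
    "Chr1 883814 . A G 18.1 mrd10 "
    "DP=36;UPSTREAM(MODIFIER||||NOC2L|processed_transcript|CODING|ENST00000496938|);FP=0.00604 "
    "GT:PL:DP 0/0:0,96,255:32 0/1:51,0,26:3",
    "Chr20 126154 dbSNPBuildID=137;GMAF=0.1648 T A 64.7 mrp0.05 "
    "INDEL;EFF=FRAME_SHIFT(HIGH||||DEFB126|protein_coding|CODING|ENST00000382398|exon_20_126056_126392;FP=1 "
    "GT:PL:DP 1/1:255,255,0:274 0/1:253,0,45:26",
]


def parse_vcf_line(line: str) -> dict:
    """Parse one data line with a NORMAL and a TUMOR sample column."""
    fields = line.split()
    chrom, pos, _id, ref, alt, _qual, flt, info, fmt, normal, tumor = fields[:11]

    # INFO is a ';'-separated list of KEY=VALUE pairs and bare flags.
    info_map = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        info_map[key] = value if sep else True

    # FORMAT names the ':'-separated values in each sample column.
    keys = fmt.split(":")
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "filter": flt,
        "fp": float(info_map["FP"]) if "FP" in info_map else None,
        "normal": dict(zip(keys, normal.split(":"))),
        "tumor": dict(zip(keys, tumor.split(":"))),
    }


def filter_by_pvalue(records: list, threshold: float) -> list:
    """Keep records whose FP value is present and <= threshold (assumed rule)."""
    return [r for r in records if r["fp"] is not None and r["fp"] <= threshold]


records = [parse_vcf_line(line) for line in EXAMPLE_LINES]
for record in filter_by_pvalue(records, 0.05):
    print(record["chrom"], record["pos"], record["fp"])
```

With a threshold of 0.05 only the Chr1 record passes, while a threshold of 1 keeps both, which is consistent with 1 appearing among the default p-values as a "no filtering" level.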
Results
Script's usage description...
usage: main.pl -f file [-template file] [-p value] [-s value] [-project "string"] [-cnv "string"] [-methods] [-cosmic file] [-h]
-h                 this (help) message
-f file            variant call format file (.vcf) to be analyzed
-template file     Template Toolkit file (.tt) to be used as a template. If not defined, the default ("reporttemplate.tt") is used
-p value           add extra p-values to the default p-values (1, 0.05 and 0.001) that will be used for the somatic variants filtering
-s value           somatic variants will be filtered only for the p-values defined by this option
-cosmic file       COSMIC database file for SNPSift annotation (default "CosmicCodingMuts_v68")
-cnv "string"      path where the script will look for the Control-FREEC output. If it is found, it will be added to the report
-project "string"  add the name of the project to the report title page
-methods           print the methods appendix in the report (if not defined, it will not be printed)
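The interplay between -p and -s can be sketched as follows. This Python illustration is not the script's Perl code: the function name is hypothetical, the comma-separated value format is taken from the example invocation ('1,0.001'), and giving -s precedence over -p when both are supplied is an assumption.

```python
DEFAULT_P_VALUES = (1.0, 0.05, 0.001)


def resolve_p_values(extra=None, only=None):
    """Return the p-value thresholds to filter on, largest first.

    extra: comma-separated values added to the defaults (like -p).
    only:  comma-separated values replacing the defaults (like -s).
    """
    def parse(text):
        return [float(token) for token in text.split(",") if token.strip()]

    if only:
        values = parse(only)  # assumed: -s overrides defaults and -p
    else:
        values = list(DEFAULT_P_VALUES) + (parse(extra) if extra else [])
    # Drop duplicates and sort from the loosest to the strictest threshold.
    return sorted(set(values), reverse=True)


print(resolve_p_values())
print(resolve_p_values(extra="0.01"))
print(resolve_p_values(only="1,0.001"))
```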
Results
$ perl main.pl -f PatientX.vcf -s '1,0.001' -cnv "/Project/Production/DAT/CNV/" -project "FAMCOLON" -methods
Conclusions
1) We developed a functional tool that automatically generates a report document for the somatic variants found in a Normal/Tumor experiment.
2) The content of the report is acceptable, but it can be improved.
3) The tool has been successfully tested and is already implemented as the last step of CNAG’s variant calling pipeline.
4) The template document is independent of the main script. Together with the main script's configurable parameters, this makes the tool highly customizable.
5) Computational resources are not a limitation: the execution time and memory usage required by the tool do not seem to be a limiting factor for its use.
Ultimate aim of the tool: to ease the transfer of information from basic research to clinical diagnostics.
Thank you for your attention