Sample to Insight
BIOBASE Training
Human Gene Mutation Database (HGMD®)
The only comprehensive source of data on human inherited disease-associated mutations
A comprehensive source of mutation data
• Focus on peer-reviewed scientific literature
• Experimental results are extracted by highly trained genetic experts
• Content is updated 4x per year
More than 170,000 curated mutations
HGMD® Professional Spring 2015.2 Release

Mutation Type                       Number of Entries
Micro Lesions:
  Missense / Nonsense               94860
  Splicing                          15476
  Regulatory                        3242
  Small Deletions                   25454
  Small Insertions                  10617
  Small Indels                      2436
Gross Lesions:
  Repeat Variations                 476
  Gross Insertions / Duplication    3086
  Complex Rearrangements            1638
  Gross Deletions                   12833
Total                               170118
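As a quick consistency check, the per-type counts from the release table can be summed and compared with the stated total. This is only a sketch: the dictionary keys are labels copied from the slide, not identifiers from any HGMD file.

```python
# Entry counts copied from the HGMD Professional 2015.2 release table.
micro_lesions = {
    "Missense / Nonsense": 94860,
    "Splicing": 15476,
    "Regulatory": 3242,
    "Small Deletions": 25454,
    "Small Insertions": 10617,
    "Small Indels": 2436,
}
gross_lesions = {
    "Repeat Variations": 476,
    "Gross Insertions / Duplication": 3086,
    "Complex Rearrangements": 1638,
    "Gross Deletions": 12833,
}

micro_total = sum(micro_lesions.values())
gross_total = sum(gross_lesions.values())
total = micro_total + gross_total
print(micro_total, gross_total, total)  # 152085 18033 170118
```

The sum matches the "Total" row, so the two groups together account for every entry in the table.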
HGMD® advantages
HGMD® is the industry standard for:
• Identifying the known genetic causes of a given inherited disease
• Understanding the mutational spectrum of a particular gene
• Verifying novel mutations
• Assessing individual disease risk
• Reducing time for literature review relating to a given inherited disease
LRRK2
Mutation report for CM074929
Categorization of mutations & polymorphisms
DM = Disease causing (pathological) mutation
DM? = Likely disease causing (likely pathological) mutation
DP = Disease associated polymorphism
DFP = Disease associated polymorphism with additional supporting functional evidence
FP = Polymorphism affecting the structure, function or expression of a gene but with no disease association reported yet
PGMD™
• Comprehensive pharmacogenomic database
• PGx/ADME panels
• FDA and EMA approved drugs containing PGx labels
• Associations from 6500+ publications from 500+ journals studying >1400 drugs
PGMD: PharmacoGenomic Mutation Database
Facilitates mapping of variants onto the genome at position or genotype level
Associations from 6500+ publications from 500+ journals studying >1400 drugs
Genotype/haplotype specific findings (example genotype: A/C)
• Median dose requirement of warfarin in patients with CYP2C9*1/CYP2C9*3 haplotype is 2.6 mg
Statistical significance
• p-value: .001
• Relative Risk, Hazard Ratio, 95% Confidence Interval when available
Study details (all studies are in vivo)
• 22 cases with A/C genotype, 159 subjects studied, Design: Clinical Trial
• Population: European Continental Ancestry Group, Age: 24-95, Treatment: all patients are treated with 0.5 mg to 10 mg/day of warfarin
Types of evidence
Linkage Disequilibrium
• HapMap D', LOD, and R² scores
• Computed for all PGMD sites
• Includes between non-PGMD sites
Allele frequencies
Major sources include EVS, 1000 Genomes, and HapMap
Delivery models
• Online: PGMD Web Interface
• Subject-specific annotation via Genome Trax
• Download: MySQL database, TSV, BED, GFF
• Custom pipeline integration
Genome Trax™
NGS analysis pipeline
Genome Trax™
Candidate Genes
Disease causing variants
Regulatory variants
Over 190 million annotations total
Track                                           Release 2015.1
HGMD® inherited disease mutations               146,581
HGMD® imputed mutations                         14,570
Pharmacogenomic Variants                        806,806
GWAS Catalogue                                  18,735
COSMIC somatic disease mutations                2,626,811
ClinVar                                         127,638
TRANSFAC® experimentally verified TFBS          15,330
ChIP-seq Transcription Factor Binding Sites     9,178,528
Predicted TF@DNase I hypersensitivity sites     10,732,462
miRNA gene sites                                2,735
PTMs (Post-Translational Modifications)         35,079
PROTEOME™ disease genes                         14,905
PROTEOME™ Drug target genes                     2,976
PROTEOME™ Pathway genes                         2,057
HGMD® disease genes                             27,257
SIFT & PolyPhen predictions, conservation       88,986,833
EVS allele frequencies                          3,663,071
Allele frequency from 1000 Genomes              12,330,177
dbSNP common SNPs                               13,604,359
dbSNP                                           60,879,061
Function prediction & frequency
Use it as you like it
• Download: flat files, MySQL dump
• Use with genome browsers, Excel, tools, scripts, ANNOVAR, CLC bio Workbenches, Alamut, Cartagenia…
HGMD – inherited mutations
HGMD imputed
HGMD: CAC (Histidine) changing to CAA (Glutamine) is causative for disease X
CAC > CAG leads to the same Histidine to Glutamine change but would not be a match for the mutation
The HGMD equivalent track covers such cases
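The codon example can be checked with a few lines of Python. This is a minimal sketch that hard-codes only the four codons involved (taken from the standard genetic code); the function name `protein_change` is a hypothetical helper, not part of HGMD or Genome Trax.

```python
# Minimal slice of the standard genetic code: only the codons used here.
CODON_TO_AA = {
    "CAC": "His",
    "CAT": "His",
    "CAA": "Gln",
    "CAG": "Gln",
}

def protein_change(ref_codon, alt_codon):
    """Return the amino-acid change as a (reference, alternate) tuple."""
    return CODON_TO_AA[ref_codon], CODON_TO_AA[alt_codon]

reported = protein_change("CAC", "CAA")  # change recorded as disease causing
observed = protein_change("CAC", "CAG")  # different nucleotide change in a sample

same_nucleotide_change = "CAA" == "CAG"
same_protein_change = reported == observed
print(reported, observed, same_nucleotide_change, same_protein_change)
```

The two variants differ at the nucleotide level but produce the same amino-acid substitution, which is the situation a nucleotide-exact lookup would miss.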
ClinVar Variants
Version: ClinVar-2015-02
Track Description: This track contains data from ClinVar, a public archive of reports on the relationships between human variations and phenotypes, with supporting evidence. ClinVar thus facilitates access to and communication about the relationships asserted between human variation and observed health status, and how interpretation of variation may change over time. ClinVar collects reports of variants found in patient samples, assertions made regarding their clinical significance, information about the submitter, and other supporting data. The alleles described in the submissions are mapped to reference sequences and reported according to the HGVS standard.
Benefit: This data set contains experimentally observed, clinically significant variants that are reviewed by experts.
Filename: clinvar
Link-out base URL: http://preview.ncbi.nlm.nih.gov/clinvar/$$
Links to: An individual variant report on the ClinVar site at NCBI.
Accession: ClinVar ID.
Feature: HGVS description and the phenotype, e.g. NT_011109.15:g.14128514A>G:Diaphyseal dysplasia;
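The link-out base URLs in these track descriptions end in `$$`. Assuming that `$$` is a placeholder for the record's accession (the slides do not say so explicitly), a link could be built as in the sketch below. The helper name `build_linkout` and the ID `12345` are made up for illustration.

```python
def build_linkout(base_url, accession):
    """Replace the '$$' placeholder in a link-out base URL with an accession."""
    if "$$" not in base_url:
        raise ValueError("base URL has no '$$' placeholder")
    return base_url.replace("$$", str(accession))

# Base URL copied from the ClinVar track description; 12345 is a made-up ID.
clinvar_base = "http://preview.ncbi.nlm.nih.gov/clinvar/$$"
url = build_linkout(clinvar_base, 12345)
print(url)
```

The same helper would apply to the other tracks that list a link-out base URL, under the same assumption about the placeholder.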
COSMIC somatic disease mutations
Version: v71
Track Description: This track contains data from the Catalogue of Somatic Mutations in Cancer (COSMIC). COSMIC contains somatic mutation information relating to human cancers. The mutation data and associated information are extracted from the primary literature and entered into the COSMIC database. To provide a consistent view of the data, a histology and tissue ontology has been created and all mutations are mapped to a single version of each gene. A central aim of COSMIC is to provide somatic mutation frequencies. This track contains SNPs, insertions, and deletions from COSMIC. We include COSMIC mutations for which a chromosomal position can be determined; approximately 75% of mutations have a position.
Benefit: These somatic mutations complement the set of germline mutations from HGMD to allow for a more comprehensive assessment of prior knowledge about observed mutations.
Filename: cosmic
Link-out base URL: http://www.sanger.ac.uk/perl/genetics/CGP/cosmic?action=mut_summary&id=$$
Links to: An individual mutation report on the COSMIC site at the Wellcome Trust Sanger Institute.
Accession: COSMIC Mutation ID.
Feature: The histology and mutational change, e.g. "carcinoma:c.775G>T".
EVS Exome Variations
Version: ESP6500
Track Description: The EVS annotation source contains exome sequencing variants retrieved from the Exome Variant Server (EVS) for the NHLBI Exome Sequencing Project (ESP). The EVS data release (ESP6500) comprises 2203 African-American and 4300 European-American unrelated individuals, totaling 6503 samples (13,006 chromosomes). All data were simultaneously analyzed for exome variants at the University of Michigan (Abecasis Laboratory). The analysis methods are explained in detail at http://evs.gs.washington.edu/EVS/
Benefit: EVS provides population-based genotypes, allele counts, and MAF scores for the variations observed in exome regions.
Filename: evs
Accession: A unique number identifying the EVS record, e.g. EVS2265387
Feature: rsID and HGNC symbol of the gene, e.g. "rs138751118:C4orf21".
Orphanet (Beta)
Version: 02/18/2015
Track Description: Orphanet is the reference portal for information on rare diseases and orphan drugs, for all audiences. Orphanet's aim is to help improve the diagnosis, care, and treatment of patients with rare diseases.
Benefit: Allows you to associate known patterns of inheritance (dominant, recessive) with rare diseases and the genes implicated in them. Together with the observed zygosity and the disease-causing mutations in HGMD, this can help you focus only on dominant disease-causing variants, or on recessive disease-causing variants that are homozygous in the patient sample.
Filename: Orpha
Accession: The numerical part of the 'Orpha number', for example 79314 for the 'Orpha number' ORPHA79314
GWAS Catalogue
Version: 02/17/2015
Track Description: This track contains data from the GWAS Catalogue. These are literature-derived disease associations for polymorphisms from GWAS studies that assayed at least 100,000 single nucleotide polymorphisms; the associations listed are limited to those with p-values < 1.0 x 10^-5. The dataset provides odds ratios for common variants that can be used to calculate increased or decreased risk for the disease. A detailed description of the methods used to assemble the dataset can be found in Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, and Manolio TA. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA. May 27, 2009, available at http://www.genome.gov/pages/about/od/newsandfeatures/pnasgwasonlinecatalog.pdf, and at the GWAS Catalogue at www.genome.gov/gwastudies.
Benefit: These disease association data are manually curated, experimentally determined associations from the scientific literature, mapped to coordinates. They allow you to identify common SNPs that influence the risk for common diseases.
Filename: gwas
Link-out base URL: http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=$$
Links to: dbSNP record. As the GWAS catalog does not provide reports for the individual SNPs, we link to dbSNP instead.
Accession: dbSNP rsid
Feature: The disease, risk allele, and odds ratio or beta (denoted by OR or beta), e.g. “Ovarian_cancer; rs2363956-T;1.1OR”
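A script consuming this track might split the Feature string into its parts. The sketch below assumes every feature follows the single example given (three semicolon-separated fields, an `rsid-allele` pair, and a number suffixed with `OR` or `beta`); real records may deviate, and `parse_gwas_feature` is a hypothetical helper.

```python
import re

def parse_gwas_feature(feature):
    """Split 'disease; rsid-allele;<value>OR|beta' into named parts."""
    disease, snp_allele, effect = [part.strip() for part in feature.split(";")]
    rsid, _, risk_allele = snp_allele.partition("-")
    match = re.fullmatch(r"([0-9.]+)(OR|beta)", effect)
    if match is None:
        raise ValueError(f"unrecognised effect field: {effect!r}")
    return {
        "disease": disease,
        "rsid": rsid,
        "risk_allele": risk_allele,
        "effect_size": float(match.group(1)),
        "effect_type": match.group(2),
    }

# Example string taken from the track description.
parsed = parse_gwas_feature("Ovarian_cancer; rs2363956-T;1.1OR")
print(parsed)
```

Raising on an unrecognised effect field, rather than guessing, keeps malformed records visible instead of silently dropping the effect size.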
dbNSFP Nonsynonymous functional predictions
Version: v2.9
Track Description: This track contains data from dbNSFP (Database for Non-synonymous SNPs Functional Predictions). dbNSFP is an integrated database of functional predictions from multiple algorithms for the comprehensive collection of human non-synonymous SNPs (NSs). It compiles prediction scores from four new and popular algorithms (SIFT, Polyphen2, LRT, and MutationTaster), along with a conservation score (PhyloP) and other related information, for every potential NS in the human genome. More details about the prediction methods are available at http://www.ncbi.nlm.nih.gov/pubmed/21520341
Benefit: This track also provides a calculated consensus prediction based on the results from the different prediction algorithms in dbNSFP. Each NS is classified according to its deleterious tendency ("Probably Deleterious", "Unknown", "Probably Harmless", "Harmless").
Filename: dbnsfp
Accession: Gene ID, e.g. "85440"
Feature: Amino acid reference base > amino acid alternate base: consensus prediction; e.g. > N: Probably Deleterious 50%.
TRANSFAC – gene regulation
PROTEOME – candidate genes
PROTEOME – disease genes & drugs
Trio dataset from clinical practice

Bloom Syndrome                      Our Patient
Autosomal recessive                 Compound heterozygote
Short stature                       Short stature
Facial anomalies                    Facial anomalies
Skin hypo- and hyperpigmentation    Skin hypo- and hyperpigmentation
Feeding difficulties                Feeding difficulties
Mild intellectual disability        Severe intellectual disability
Cancer predisposition               Cancer predisposition
Frequent childhood infections       No frequent infections

After 20 years, following Genome Trax trio analysis, the patient could finally be diagnosed with BLOOM SYNDROME
Stand-alone Application
ANNOVAR Introduction
Database preparation
ANNOVAR requires the annotation databases to be saved on local disk before it can annotate genetic variants.
A simple command downloads a database directly from the internet (from the UCSC browser, the 1000 Genomes Project, or the ANNOVAR website):
annotate_variation.pl -downdb [optional arguments] <table-name> <output-directory-name>
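To make the shape of the download command concrete, the sketch below assembles it as an argument list in Python without running it. The table name `refGene`, the directory `humandb/`, and the optional `-buildver hg19` are illustrative choices that do not appear on the slide, and `downdb_command` is a hypothetical helper.

```python
import shlex

def downdb_command(table, out_dir, optional_args=()):
    """Assemble an annotate_variation.pl -downdb call as an argument list."""
    return ["annotate_variation.pl", "-downdb", *optional_args, table, out_dir]

# Illustrative values only; adjust the table, build, and directory as needed.
cmd = downdb_command("refGene", "humandb/", ["-buildver", "hg19"])
print(shlex.join(cmd))
```

An argument list like this can be handed to `subprocess.run(cmd, check=True)` once ANNOVAR is installed and on the PATH; the list form avoids shell quoting problems.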
Database preparation
Gene annotation databases
• gene / refgene / refGene
• knowngene / knownGene
• ensgene / ensGene
Region annotation databases
• cytoBand
• tfbsConsSites
• genomicSuperDups
• omimGene
Filter databases
• 1000g2012apr
• snp137
• snp135
Database download
Input files
ANNOVAR takes text-based input files, where each line corresponds to one variant.
On each line, the first five space- or tab-delimited columns represent the chromosome, start position, end position, reference nucleotides, and observed nucleotides.
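The five-column layout can be read with a short parser. This is a sketch, not ANNOVAR's own code: the `Variant` type and `parse_input_line` are hypothetical names, and the example coordinates are made up.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    chrom: str
    start: int
    end: int
    ref: str
    obs: str

def parse_input_line(line):
    """Parse the first five space- or tab-delimited columns of one line."""
    fields = line.split()  # splits on any run of spaces or tabs
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    chrom, start, end, ref, obs = fields[:5]
    return Variant(chrom, int(start), int(end), ref, obs)

# Made-up single-nucleotide variant; extra trailing columns are ignored.
variant = parse_input_line("17\t7577121 7577121 G A optional-comment")
print(variant)
```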
Profiling Breast Cancer variants – Input file
Isolate tumor-specific variants by removing the germline variants.
The resulting file of filtered variants is used as input for gene-based annotation, which classifies variants as exonic, intronic, intergenic, or located in other regions.
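One simple way to remove germline variants is a set difference on the five identifying columns. The slides do not say how the filtering was actually done, so the sketch below is only one possible approach; the function names and the variants are made up.

```python
def variant_key(line):
    """Identify a variant by its first five whitespace-delimited columns."""
    return tuple(line.split()[:5])

def tumor_specific(tumor_lines, germline_lines):
    """Keep tumor variants whose key is absent from the germline sample."""
    germline_keys = {variant_key(line) for line in germline_lines if line.strip()}
    return [
        line for line in tumor_lines
        if line.strip() and variant_key(line) not in germline_keys
    ]

# Made-up variants, for illustration only.
tumor = ["17 7577121 7577121 G A", "13 32906729 32906729 A C", "1 1000 1000 C T"]
germline = ["1\t1000\t1000\tC\tT"]
somatic = tumor_specific(tumor, germline)
print(somatic)
```

Keying on the parsed columns rather than the raw line means a variant still matches when one file uses tabs and the other uses spaces.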
Profiling Breast Cancer variants
The result file can be searched for specific high-risk genes such as TP53, BRCA1, and BRCA2.
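Searching the result file for high-risk genes can be scripted. The sketch below assumes a tab-delimited layout in which the first column is the region type and the second is the gene name, optionally followed by details in parentheses; that layout is an assumption here and should be checked against the actual output file. The rows are made up.

```python
import re

HIGH_RISK_GENES = {"TP53", "BRCA1", "BRCA2"}

def genes_in_field(gene_field):
    """Extract gene symbols, dropping any parenthesised details."""
    without_details = re.sub(r"\([^)]*\)", "", gene_field)
    return {name.strip() for name in re.split(r"[,;]", without_details) if name.strip()}

def high_risk_rows(lines, genes=HIGH_RISK_GENES):
    """Return the rows whose gene column names at least one gene of interest."""
    hits = []
    for line in lines:
        columns = line.rstrip("\n").split("\t")
        if len(columns) >= 2 and genes_in_field(columns[1]) & set(genes):
            hits.append(line)
    return hits

# Made-up annotated rows: region, gene, then the original input columns.
rows = [
    "exonic\tTP53\t17\t7577121\t7577121\tG\tA",
    "intronic\tNADK\t1\t1000\t1000\tC\tT",
    "exonic\tBRCA1\t17\t41245466\t41245466\tG\tA",
]
hits = high_risk_rows(rows)
print(hits)
```

Matching whole gene symbols, rather than substrings, avoids false hits on genes whose names merely contain "TP53" or "BRCA1".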