Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Exome Sequencing:
A new diagnostic paradigm in renal disease?
Dr Daniel Gale
MRC Clinician Scientist and Consultant Nephrologist
Outline
• Monogenic kidney disease
• Sequencing technology
– Whole exome sequencing
– Whole genome sequencing
• Whole Exome Sequencing in familial kidney
disease
Monogenic kidney disease
• Over 200 genes known to be associated with
monogenic renal disorders
Devuyst et al Lancet 2014
Cystic kidney diseases
● Ciliopathies
● Tubulointerstitial kidney disease
Causes of ESKD in US children
Cystic/hereditary/ congenital diseases
36%
Glomerulonephritis (GN) 32%
Etiology uncertain 8%
Missing 6%
Other 6%
Interstitial nephritis/ pyelonephritis
5%
Hypertensive/large vessel disease
4% Neoplasms 2%
Diabetes 1%
Polycystic kidneys (dominant) 1%
Polycystic kidneys (recessive) 7% Medullary cystic disease/
nephronophthisis 5%
Hereditary nephritis, Alport's syndrome 6%
Cystinosis 2%
Congenital nephrotic syndrome 6%
Drash syndrome, mesangial sclerosis 1% Congenital ureteropelvic junction
obstruction 2% Congenital ureterovesical junction
obstruction 2%
Other congenital obstructive uropathy 23%
Renal hypoplasia, dysplasia, 31%
Prune belly syndrome 4% Other (congenital malformation
syndromes) 9%
Causes of ESKD in UK adults
Data not available
10%
Uncertain aetiology
19%
Diabetes 22%
Glomerulo-nephritis
11%
Hypertension 5%
Other 14%
Polycystic Kidney
Disease 6%
Pyelonephritis 7%
Renovascular disease
6%
Byrne et al UK Renal Registry 2009
Known genetic
kidney diseases
Known
unknowns
Unknown
unknowns
Family history is a major risk factor for ESKD
0
2
4
6
8
10
12
14
16
18
Ag
e a
dju
ste
d H
aza
rd R
ati
o
Ethnicity Co-morbidity Smoking
Data from >1.5 million UK residents, Hippisley-Cox and Coupland, BMC Family Practice 2010
Monogenic contribution to kidney disease
• Monogenic disease known to account for
substantial proportion of kidney failure of known
cause
• Kidney disease is frequently familial
• Is monogenic disease responsible for
undiagnosed kidney disease?
• Ability to make a secure diagnosis non-invasively – Discloses cause of disease
– Determine prognosis, select treatment
– Risk of disease recurrence post transplantation
• Predictive testing informs decision making – Suitability for donation (risk to
donor or their other family members)
– Reproductive decisions
• Identify homogeneous cohorts of
patients
– Starting point for adequately
powered clinical trials in rare
diseases
• Identifying and studying the gene
reveals biology underlying the
disease
– Rational drug development
– Application to common (sporadic)
diseases
Utility of molecular genetics:
Clinical and Scientific
Molecular genetics approaches
• Sanger sequencing and Next Generation
Sequencing
• Both can be used to sequence specified candidate
gene or region
• Next generation sequencing also applied to:
– Candidate genes or panel of genes
– Whole exome sequencing (WES)
– Whole genome sequencing (WGS)
The genome and the exome
• <100 million bases (2%) encode the ~20,000 genes
expressed in humans (the exome)
Exon 1
Transcription
Translation
DNA
RNA
Protein
Exon 2 Exon 3 Exon 4
Intergenic
Intron
• The human genome contains
3 billion DNA bases
• 98% lie outside coding regions
‘exons’ of genes
Coding
regions
Known disease genes
Non-coding regions
(intra and inter-genic)
• Target gene amplified
– PCR of each exon individually
• Synthesis with premature chain termination using fluorescent
ddNTPs followed by electrophoresis to determine sequence
• High quality robust data requiring only modest infrastructure
• Rapidly becomes prohibitively expensive if multiple targets
sequenced – typical cost $600-2000 per gene
• Unable to detect copy number variations (CNVs)
Sanger sequencing
www.wikipedia.org
Next Generation (massively parallel)
Sequencing
Ross and Cronin, Am J Clin Path 2011
Shear DNA Anchor
Amplify
Denature
Synthesize with
fluorescent
reversible
terminators
Image
Attach
adapters
Example NGS data
Sequencing cost per genome
Hayden, E. C. Nature. 2014; 507: pp294–295
Targeted Next Generation Sequencing
Ross and Cronin, Am J Clin Path 2011
Anchor
Amplify
Denature
Synthesize with
fluorescent
reversible
terminators
Image
Attach
adapters
Shear and
capture
targeted
region
PCR
targeted
region
Enrichment (PCR or Capture)
• Massively reduces sequencing needed
– Cheaper
– Simplifies data analysis and storage
• Introduces additional costs and noise
– Validation/optimisation of customised amplification or
capture array
– Read depth at a particular locus affected by
amplification/capture efficiency as well as copy number
– Genomic breakpoints flanking CNVs usually lie outside
coding regions
Whole Exome Sequencing (WES)
• >95% of known protein-coding genes sequenced
• Single nucleotide and short (sub-exon sized)
insertions and deletion variants detected reliably
– Copy number can be inferred by number of reads but
performance worse than panels or WGS
• Multiple commercial providers and significant
demand
– Cost $500-800 – similar to candidate gene or gene
panel sequencing
• Data generated: gigabytes per patient
Whole Exome Sequencing data
• Vast majority of genes (>90%) not associated with
monogenic disease
– Sequence variants within them often uninterpretable
• Prior probability any one variant identified
responsible for disease is extremely low
– Need robust data to support pathogenicity
• Allows detection of “Incidental findings”
Incidental findings
• Findings in genes unrelated to the reason for test
– Prior probability of pathogenicity very low so only applies
to known or clearly pathogenic variants
• Some incidental findings are medically actionable
– Intervention in an asymptomatic patient has health benefit
– Example would breast screening or prophylactic
mastectomy where a BRCA1 mutation is identified
– Present in up to 1-2% of population (Dorschner et al AJHG 2013)
• How this is approached depends on:
– Context (research or clinical test, healthcare service etc)
– Consent obtained from patient (NB children)
Whole Genome Sequencing
• No capture step
– More sensitive detection of
duplications/deletions/rearrangements (CNVs)
• Disease-causing mutations disproportionately
present in coding regions (exome)
– Bioinformatics tools to interpret effect of non-coding
variants on organism are crude and unreliable
• 50-100-fold more sequencing needed
– Cost ~$5000
– Handling very large amounts of data has substantial
infrastructure implications (terabytes per individual)
Coding
regions
Known disease genes
Non-coding regions
(intra and inter-genic)
WES in familial kidney disease – London
experience
• Patients who reported 1 or more relative with unexplained
kidney disease were recruited
– Known genetic diseases excluded (eg ADPKD, Alports etc)
• 71 families included – most from London but also from
collaborators in UK and abroad
– Very wide distribution of ethnic origins (24 countries)
– Family size ranged from 2-16 individuals
– Some families had only a single living affected member
• Exome sequenced index case from each family
Exome sequencing data filtering
• ~25,000 exonic/flanking variants detected in each
person
– Intronic and intergenic variants removed
• Synonymous coding variants removed
• Non-synonymous coding and splicing variants
filtered to remove common variants
– Present in >0.5% of 1000 genomes database
• ~400-1400 rare coding variants remain depending
on ancestry
Fewer rare variants in
patients with European
ancestry (p<10-9)
Geography and frequency of rare variants
• UK Caucasians are well-represented in 1000
genomes database
– Lower frequency polymorphisms in other populations
occur at <0.5% in 1000 genomes database so not
filtered
• Overall genetic diversity greater in African
populations
– Relevant to rare variant identification
Exome sequencing in 71 families identified
causative mutation in 28%
Causative variant
identified; 20
No Causative variant;
51
Assigning pathogenicity
• Mutation previously
described in association with
phenotype (14)
• Very likely damaging (1)
– eg frameshift, splicing or
nonsense where such
mutations in that gene known
to cause disease
• Similar mechanism to known
pathogenic mutations (5)
– Glycine substitution in
collagenous domain of
Collagen 4
• All cosegregated with
disease
Pathogenic mutations in renal disease genes
• 3 families with X-linked Alport Syndrome had not received a
firm clinical/histological diagnosis
• Phenotype associated with some genes (eg COL4A3,
LMX1B, NPHS2) probably broader than recognised
Gene Families
COL4A3 6
COL4A4 1
UMOD 2
INF2 2
MYH9 1
LMX1B 1
PAX2 2
Gene Families
SLC12A3 1*
NPHS2 1*
*Consanguinity reported
Dominant Recessive
Gene Families
COL4A5 3
X-linked
Homozygous NPHS2 (Podocin) V180M
previously identified in children with SRNS
• 51 year old Lebanese man with CKD
• Brother required a kidney transplant in late 20s
but died shortly afterwards
• Presented with proteinuria aged 30
– Biopsy showed ‘mesangial proliferative GN’ with IgM
deposition and sclerosis
– Repeat biopsy 15 years later showed FSGS
– Never developed nephrotic syndrome
• Parents were probably first cousins
Heterozygous LMX1B mutation R246Q in man
with biopsy diagnosis of ‘Alport Syndrome’
• Proteinuria noted in childhood (around age 10)
– Biopsy showed basement membrane abnormalities “typical of Alport
Syndrome”
– Father reached ESKD in his 50s (no biopsy)
• Rapid deterioration to ESKD in late 30s
• National level triathlon competitor
• Nail Patella Syndrome without nail or patella abnormalities!
Families not diagnosed by WES
• Non-Mendelian disease in proband or whole family
– Is this more common in non-Europeans (NB APOL1)?
• Failure to detect causative variant
– ~5-10% of annotated exome not covered
– CNVs not reliably detected at this read depth
– Non-exonic causative variants responsible for up to 20%
Mendelian disease
• Failure to recognise detected causative variant
– Mutation in gene not previously associated with disease
Kirby et al 2013: MCKD1 caused by C-insertion in >80% GC MUC1
tandem repeat
CCGCCCCCCAAGCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCA
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCACCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCGGGCCCCGGGCTCCACCCCGGCCCCGG
GCTCCACCGCCCCCCCAGCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGG
GCTCCACCGCCCCCCCAGCCCATGGTGTCACCTCGGCCCCGGACAACAGGCCCGCCTTGG
CCGCCCCCCAAGCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCA
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCCA
GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCA
GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCA
GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCA
GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCA
GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCA
GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCA
GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCA
GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCA
GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCA
GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCA
GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCA
GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCA
GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCA
GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCA
GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCA
GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCA
GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCACCCCCA
GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCGGGCCCCGGGCTCCACCCCGGCCCCG
GGCTCCACCGCCCCCCCAGCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCG
GGCTCCACCGCCCCCCCAGCCCATGGTGTCACCTCGGCCCCGGACAACAGGCCCGCCTTG
G
Wild type Mutant
~20-125
repeats in
controls
MUC1 VNTR not covered by NGS (WES/WGS)
24 out of 71 (34%) of families solved
Causative variant
identified; 20
MUC1 positive; 4
No causative
variant identified;
47
Familial kidney disease, mutations and
catchment population
Causative mutation identified in 46% Europeans
compared with 17% non-Europeans (p=0.01)
887 39
18
372
26
5 392
6 1
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Receiving RRT in N London
Families investigated Mutation identified
European
Asian/ Mediterranean
African/ Caribbean
Geographical distribution of ancestry and
mutations
Mutation identified in 43% of European
but only 24% of non-European families
4
30 UMOD
PAX2, COL4A3
COL4A3
MYH9
INF2
4
9
3xCOL4A5, 2xCOL4A3
INF2, SLC12A3, UMOD
LMX1B, MUC1x4
COL4A4
Europe
Cyprus
Asia/Mediterranean
Sub-Saharan Africa
3 2
2
COL4A3
COL4A3
NPHS2
3
COL4A3
Ancestry
PAX2 2 4
Application to non-familial disease
• Some monogenic diseases frequently present in
absence of family history (atypical HUS)
• Seems likely that a proportion of patients with
unexplained kidney failure and no family history
will have identifiable genetic defects
– This remains to be demonstrated empirically
• CNVs frequently detectable in children with
developmental delay and chronic kidney disease
– Verbitsky et al JCI 2015; DDD study Nature 2015
Conclusions
• WES provides additional diagnostic power
compared with standard clinical/biopsy assessment
– In patients with a family history WES revealed diagnosis in
a significant proportion (28%) previously undiagnosed
– Spectrum disease associated with some genes is broader
than conventionally recognised
• Key limitation is ability to interpret variants identified
– Will improve as more pathogenic variants are deposited in
databases and bioinformatics/experimental tools improve
• As sequencing costs drop and bioinformatics
infrastructure improve WGS may supplant WES
– WGS/WES has significant blind spots (eg MUC1 VNTR)
Patients, their families
and clinicians
UCL Centre for Nephrology
• Fiona Lin, Nadia Khan, Tom
Connor, Anna Ferlin
Northern Cyprus
• Guy Neild
• Deren Oygur
Other colleagues from
Cyprus, Hull, Portsmouth,
Sheffield, Oxford
Acknowledgements
Kings College London
• Michael Simpson
Cambridge University
• Patrick Maxwell
Wake Forest Medical Center
• Tony Bleyer
Candidate/Panel Whole Exome Whole Genome
Technical Enrichment step Enrichment step No enrichment step
Small dataset (MB) Larger dataset (GB) Enormous dataset (TB)
Sensitivity Only specific
region/genes targeted Known exons
targeted Includes coding and
non-coding genome
Good CNV detection
(read depth)
May need updating as
new genes discovered
Poor CNV detection
Does not target all
disease-causing
variants
Better CNV detection
Will miss some variants
(i.e. not whole genome)
Interpretation
of rare
variants
High prior probability
of pathogenicity Very low prior
probability of
pathogenicity
Very low prior
probability of
pathogenicity
Difficult to predict effect
of non-coding variants
No incidental findings Incidental findings Incidental findings
Cost <$1000 <$1000 >$5000