Exome Sequencing: A new diagnostic paradigm in renal …Renal hypoplasia, dysplasia, 31% Prune belly syndrome 4% Other (congenital malformation ... Mendelian disease •Failure to

Exome Sequencing:

A new diagnostic paradigm in renal disease?

Dr Daniel Gale

MRC Clinician Scientist and Consultant Nephrologist

[email protected]

Outline

• Monogenic kidney disease

• Sequencing technology

– Whole exome sequencing

– Whole genome sequencing

• Whole Exome Sequencing in familial kidney

disease

Monogenic kidney disease

• Over 200 genes known to be associated with

monogenic renal disorders

Devuyst et al Lancet 2014

Cystic kidney diseases

● Ciliopathies

● Tubulointerstitial kidney disease

Causes of ESKD in US children

Cystic/hereditary/ congenital diseases

36%

Glomerulonephritis (GN) 32%

Etiology uncertain 8%

Missing 6%

Other 6%

Interstitial nephritis/ pyelonephritis

5%

Hypertensive/large vessel disease

4% Neoplasms 2%

Diabetes 1%

Polycystic kidneys (dominant) 1%

Polycystic kidneys (recessive) 7% Medullary cystic disease/

nephronophthisis 5%

Hereditary nephritis, Alport's syndrome 6%

Cystinosis 2%

Congenital nephrotic syndrome 6%

Drash syndrome, mesangial sclerosis 1% Congenital ureteropelvic junction

obstruction 2% Congenital ureterovesical junction

obstruction 2%

Other congenital obstructive uropathy 23%

Renal hypoplasia, dysplasia, 31%

Prune belly syndrome 4% Other (congenital malformation

syndromes) 9%

Causes of ESKD in UK adults

Data not available

10%

Uncertain aetiology

19%

Diabetes 22%

Glomerulo-nephritis

11%

Hypertension 5%

Other 14%

Polycystic Kidney

Disease 6%

Pyelonephritis 7%

Renovascular disease

6%

Byrne et al UK Renal Registry 2009

Known genetic

kidney diseases

Known

unknowns

Unknown

unknowns

Family history is a major risk factor for ESKD

0

2

4

6

8

10

12

14

16

18

Ag

e a

dju

ste

d H

aza

rd R

ati

o

Ethnicity Co-morbidity Smoking

Data from >1.5 million UK residents, Hippisley-Cox and Coupland, BMC Family Practice 2010

Monogenic contribution to kidney disease

• Monogenic disease known to account for

substantial proportion of kidney failure of known

cause

• Kidney disease is frequently familial

• Is monogenic disease responsible for

undiagnosed kidney disease?

• Ability to make a secure diagnosis non-invasively – Discloses cause of disease

– Determine prognosis, select treatment

– Risk of disease recurrence post transplantation

• Predictive testing informs decision making – Suitability for donation (risk to

donor or their other family members)

– Reproductive decisions

• Identify homogeneous cohorts of

patients

– Starting point for adequately

powered clinical trials in rare

diseases

• Identifying and studying the gene

reveals biology underlying the

disease

– Rational drug development

– Application to common (sporadic)

diseases

Utility of molecular genetics:

Clinical and Scientific

Molecular genetics approaches

• Sanger sequencing and Next Generation

Sequencing

• Both can be used to sequence specified candidate

gene or region

• Next generation sequencing also applied to:

– Candidate genes or panel of genes

– Whole exome sequencing (WES)

– Whole genome sequencing (WGS)

The genome and the exome

• <100 million bases (2%) encode the ~20,000 genes

expressed in humans (the exome)

Exon 1

Transcription

Translation

DNA

RNA

Protein

Exon 2 Exon 3 Exon 4

Intergenic

Intron

• The human genome contains

3 billion DNA bases

• 98% lie outside coding regions

‘exons’ of genes

Coding

regions

Known disease genes

Non-coding regions

(intra and inter-genic)

• Target gene amplified

– PCR of each exon individually

• Synthesis with premature chain termination using fluorescent

ddNTPs followed by electrophoresis to determine sequence

• High quality robust data requiring only modest infrastructure

• Rapidly becomes prohibitively expensive if multiple targets

sequenced – typical cost $600-2000 per gene

• Unable to detect copy number variations (CNVs)

Sanger sequencing

www.wikipedia.org

Next Generation (massively parallel)

Sequencing

Ross and Cronin, Am J Clin Path 2011

Shear DNA Anchor

Amplify

Denature

Synthesize with

fluorescent

reversible

terminators

Image

Attach

adapters

Example NGS data

Sequencing cost per genome

Hayden, E. C. Nature. 2014; 507: pp294–295

Targeted Next Generation Sequencing

Ross and Cronin, Am J Clin Path 2011

Anchor

Amplify

Denature

Synthesize with

fluorescent

reversible

terminators

Image

Attach

adapters

Shear and

capture

targeted

region

PCR

targeted

region

Enrichment (PCR or Capture)

• Massively reduces sequencing needed

– Cheaper

– Simplifies data analysis and storage

• Introduces additional costs and noise

– Validation/optimisation of customised amplification or

capture array

– Read depth at a particular locus affected by

amplification/capture efficiency as well as copy number

– Genomic breakpoints flanking CNVs usually lie outside

coding regions

Whole Exome Sequencing (WES)

• >95% of known protein-coding genes sequenced

• Single nucleotide and short (sub-exon sized)

insertions and deletion variants detected reliably

– Copy number can be inferred by number of reads but

performance worse than panels or WGS

• Multiple commercial providers and significant

demand

– Cost $500-800 – similar to candidate gene or gene

panel sequencing

• Data generated: gigabytes per patient

Whole Exome Sequencing data

• Vast majority of genes (>90%) not associated with

monogenic disease

– Sequence variants within them often uninterpretable

• Prior probability any one variant identified

responsible for disease is extremely low

– Need robust data to support pathogenicity

• Allows detection of “Incidental findings”

Incidental findings

• Findings in genes unrelated to the reason for test

– Prior probability of pathogenicity very low so only applies

to known or clearly pathogenic variants

• Some incidental findings are medically actionable

– Intervention in an asymptomatic patient has health benefit

– Example would breast screening or prophylactic

mastectomy where a BRCA1 mutation is identified

– Present in up to 1-2% of population (Dorschner et al AJHG 2013)

• How this is approached depends on:

– Context (research or clinical test, healthcare service etc)

– Consent obtained from patient (NB children)

Whole Genome Sequencing

• No capture step

– More sensitive detection of

duplications/deletions/rearrangements (CNVs)

• Disease-causing mutations disproportionately

present in coding regions (exome)

– Bioinformatics tools to interpret effect of non-coding

variants on organism are crude and unreliable

• 50-100-fold more sequencing needed

– Cost ~$5000

– Handling very large amounts of data has substantial

infrastructure implications (terabytes per individual)

Coding

regions

Known disease genes

Non-coding regions

(intra and inter-genic)

WES in familial kidney disease – London

experience

• Patients who reported 1 or more relative with unexplained

kidney disease were recruited

– Known genetic diseases excluded (eg ADPKD, Alports etc)

• 71 families included – most from London but also from

collaborators in UK and abroad

– Very wide distribution of ethnic origins (24 countries)

– Family size ranged from 2-16 individuals

– Some families had only a single living affected member

• Exome sequenced index case from each family

Exome sequencing data filtering

• ~25,000 exonic/flanking variants detected in each

person

– Intronic and intergenic variants removed

• Synonymous coding variants removed

• Non-synonymous coding and splicing variants

filtered to remove common variants

– Present in >0.5% of 1000 genomes database

• ~400-1400 rare coding variants remain depending

on ancestry

Fewer rare variants in

patients with European

ancestry (p<10-9)

Geography and frequency of rare variants

• UK Caucasians are well-represented in 1000

genomes database

– Lower frequency polymorphisms in other populations

occur at <0.5% in 1000 genomes database so not

filtered

• Overall genetic diversity greater in African

populations

– Relevant to rare variant identification

Exome sequencing in 71 families identified

causative mutation in 28%

Causative variant

identified; 20

No Causative variant;

51

Assigning pathogenicity

• Mutation previously

described in association with

phenotype (14)

• Very likely damaging (1)

– eg frameshift, splicing or

nonsense where such

mutations in that gene known

to cause disease

• Similar mechanism to known

pathogenic mutations (5)

– Glycine substitution in

collagenous domain of

Collagen 4

• All cosegregated with

disease

Pathogenic mutations in renal disease genes

• 3 families with X-linked Alport Syndrome had not received a

firm clinical/histological diagnosis

• Phenotype associated with some genes (eg COL4A3,

LMX1B, NPHS2) probably broader than recognised

Gene Families

COL4A3 6

COL4A4 1

UMOD 2

INF2 2

MYH9 1

LMX1B 1

PAX2 2

Gene Families

SLC12A3 1*

NPHS2 1*

*Consanguinity reported

Dominant Recessive

Gene Families

COL4A5 3

X-linked

Homozygous NPHS2 (Podocin) V180M

previously identified in children with SRNS

• 51 year old Lebanese man with CKD

• Brother required a kidney transplant in late 20s

but died shortly afterwards

• Presented with proteinuria aged 30

– Biopsy showed ‘mesangial proliferative GN’ with IgM

deposition and sclerosis

– Repeat biopsy 15 years later showed FSGS

– Never developed nephrotic syndrome

• Parents were probably first cousins

Heterozygous LMX1B mutation R246Q in man

with biopsy diagnosis of ‘Alport Syndrome’

• Proteinuria noted in childhood (around age 10)

– Biopsy showed basement membrane abnormalities “typical of Alport

Syndrome”

– Father reached ESKD in his 50s (no biopsy)

• Rapid deterioration to ESKD in late 30s

• National level triathlon competitor

• Nail Patella Syndrome without nail or patella abnormalities!

Families not diagnosed by WES

• Non-Mendelian disease in proband or whole family

– Is this more common in non-Europeans (NB APOL1)?

• Failure to detect causative variant

– ~5-10% of annotated exome not covered

– CNVs not reliably detected at this read depth

– Non-exonic causative variants responsible for up to 20%

Mendelian disease

• Failure to recognise detected causative variant

– Mutation in gene not previously associated with disease

Kirby et al 2013: MCKD1 caused by C-insertion in >80% GC MUC1

tandem repeat

CCGCCCCCCAAGCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCA

CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCAG
































CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCACCCCCAG

CCCACGGTGTCACCTCGGCCCCGGACACCAGGCGGGCCCCGGGCTCCACCCCGGCCCCGG

GCTCCACCGCCCCCCCAGCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGG

GCTCCACCGCCCCCCCAGCCCATGGTGTCACCTCGGCCCCGGACAACAGGCCCGCCTTGG

CCGCCCCCCAAGCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCA
















CCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCCA

GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCCCCCCCA
















GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCGGGCTCCACCGCACCCCCA

GCCCACGGTGTCACCTCGGCCCCGGACACCAGGCGGGCCCCGGGCTCCACCCCGGCCCCG

GGCTCCACCGCCCCCCCAGCCCACGGTGTCACCTCGGCCCCGGACACCAGGCCGGCCCCG

GGCTCCACCGCCCCCCCAGCCCATGGTGTCACCTCGGCCCCGGACAACAGGCCCGCCTTG

G

Wild type Mutant

~20-125

repeats in

controls

MUC1 VNTR not covered by NGS (WES/WGS)

24 out of 71 (34%) of families solved

Causative variant

identified; 20

MUC1 positive; 4

No causative

variant identified;

47

Familial kidney disease, mutations and

catchment population

Causative mutation identified in 46% Europeans

compared with 17% non-Europeans (p=0.01)

887 39

18

372

26

5 392

6 1

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Receiving RRT in N London

Families investigated Mutation identified

European

Asian/ Mediterranean

African/ Caribbean

Geographical distribution of ancestry and

mutations

Mutation identified in 43% of European

but only 24% of non-European families

4

30 UMOD

PAX2, COL4A3

COL4A3

MYH9

INF2

4

9

3xCOL4A5, 2xCOL4A3

INF2, SLC12A3, UMOD

LMX1B, MUC1x4

COL4A4

Europe

Cyprus

Asia/Mediterranean

Sub-Saharan Africa

3 2

2

COL4A3

COL4A3

NPHS2

3

COL4A3

Ancestry

PAX2 2 4

Application to non-familial disease

• Some monogenic diseases frequently present in

absence of family history (atypical HUS)

• Seems likely that a proportion of patients with

unexplained kidney failure and no family history

will have identifiable genetic defects

– This remains to be demonstrated empirically

• CNVs frequently detectable in children with

developmental delay and chronic kidney disease

– Verbitsky et al JCI 2015; DDD study Nature 2015

Conclusions

• WES provides additional diagnostic power

compared with standard clinical/biopsy assessment

– In patients with a family history WES revealed diagnosis in

a significant proportion (28%) previously undiagnosed

– Spectrum disease associated with some genes is broader

than conventionally recognised

• Key limitation is ability to interpret variants identified

– Will improve as more pathogenic variants are deposited in

databases and bioinformatics/experimental tools improve

• As sequencing costs drop and bioinformatics

infrastructure improve WGS may supplant WES

– WGS/WES has significant blind spots (eg MUC1 VNTR)

Patients, their families

and clinicians

UCL Centre for Nephrology

• Fiona Lin, Nadia Khan, Tom

Connor, Anna Ferlin

Northern Cyprus

• Guy Neild

• Deren Oygur

Other colleagues from

Cyprus, Hull, Portsmouth,

Sheffield, Oxford

Acknowledgements

Kings College London

• Michael Simpson

Cambridge University

• Patrick Maxwell

Wake Forest Medical Center

• Tony Bleyer

[email protected]

Candidate/Panel Whole Exome Whole Genome

Technical Enrichment step Enrichment step No enrichment step

Small dataset (MB) Larger dataset (GB) Enormous dataset (TB)

Sensitivity Only specific

region/genes targeted Known exons

targeted Includes coding and

non-coding genome

Good CNV detection

(read depth)

May need updating as

new genes discovered

Poor CNV detection

Does not target all

disease-causing

variants

Better CNV detection

Will miss some variants

(i.e. not whole genome)

Interpretation

of rare

variants

High prior probability

of pathogenicity Very low prior

probability of

pathogenicity

Very low prior

probability of

pathogenicity

Difficult to predict effect

of non-coding variants

No incidental findings Incidental findings Incidental findings

Cost <$1000 <$1000 >$5000

Documents

Exome Sequencing: A new diagnostic paradigm in renal …Renal hypoplasia, dysplasia, 31% Prune belly syndrome 4% Other (congenital malformation ... Mendelian disease •Failure to