Hunting Disease Genes Richard A. Spritz, M.D. April 13, 2015 richard.spritz@ucdenver.edu 303-724-3107 HMGP


Hunting Disease Genes

Richard A. Spritz, M.D.

April 13, 2015

richard.spritz@ucdenver.edu


Why Find Disease Genes?

Accelerated by finding the disease gene


Why Find Disease Genes?

•Virtually all diseases result from a combination of genes and environmental factors

•We have no systematic ways to discover environmental risk factors

•We do have systematic ways to discover disease genes

•Discovery of disease genes will provide clues to pathogenic mechanisms, new approaches to treatment, inference of environmental risk factors, and ultimately disease prevention

•Personalized medicine (= “Precision Medicine”): The Holy Grail


Personalized/Precision Medicine Paradigm

• Discover risk genes for common diseases, specific risk variants, high-risk combinations

• Carry out accurate DNA-based predictive diagnostics of disease susceptibilities based on individualized genetic risks

• Apply optimized individualized treatment or prevention based on genetic diagnosis of disease susceptibilities and pharmacogenetic analysis of optimized drug efficacy/specificity

• This is why there was a Human Genome Project


Personalized/Precision Medicine Paradigm--Problems

•For most common complex traits, individual genes/variants confer low odds ratio
   OR = (Risk of disease having a given gene variant) / (Risk of disease not having variant)
   Population/study wide; no meaning at level of individual

•We do not yet know how to do “combinatorial” complex trait risk prediction
   Genetic risk scores

•For most complex diseases it has been hard to account for much of the ‘heritability’ of the trait
   $H^2 = \mathrm{Var}(G) / \mathrm{Var}(P)$

•Low positive predictive value of genetic tests for complex traits
   significant non-genetic component
   late onset
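The ratio written on this slide (risk with the variant divided by risk without it) is, strictly speaking, a relative risk; the odds ratio used in case-control work is the ratio of odds, and the two are close when the disease is uncommon. A minimal sketch of both, using made-up counts that are purely illustrative (the function names are hypothetical, not from the lecture):

```python
def relative_risk(case_var, ctrl_var, case_novar, ctrl_novar):
    """Risk of disease with the variant divided by risk without it."""
    risk_with = case_var / (case_var + ctrl_var)
    risk_without = case_novar / (case_novar + ctrl_novar)
    return risk_with / risk_without


def odds_ratio(case_var, ctrl_var, case_novar, ctrl_novar):
    """Odds of disease with the variant divided by odds without it."""
    return (case_var * ctrl_novar) / (ctrl_var * case_novar)


# Hypothetical counts: 1000 variant carriers (30 affected)
# and 1000 non-carriers (20 affected).
rr = relative_risk(30, 970, 20, 980)
or_ = odds_ratio(30, 970, 20, 980)
print(f"relative risk = {rr:.3f}, odds ratio = {or_:.3f}")
```

Both numbers describe the study population as a whole, which is the slide's point: neither says whether a given carrier will develop the disease.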


Hunting for Disease Genes

1. In a “Mendelian”, single-gene trait, one gene is sufficient to cause (most of) the disease phenotype

2. In a polygenic/multifactorial, “complex” trait, no one gene is sufficient to cause the disease phenotype


How Do You Find Disease Genes?

I. Hypothesis-driven approaches
   Candidate gene association
   Candidate gene sequencing

II. Hypothesis-free approaches
   Genomewide linkage
   Genomewide association
   (Genomewide expression)
   Genomewide sequencing
      Exome
      Full-genome


Disease Gene Identification—“Functional Cloning” vs. “Positional Cloning”


Positional Cloning: Determine a Disease Gene’s Genomic Position, and then Identify the Gene

Obviated by Human Genome Project


Gene Mapping Technology: Polymorphic DNA Markers

• You can only track/measure differences between people and through families

• Polymorphic DNA markers constitute any scorable differences at known genomic positions

• Surrogates for disease mutations; some polymorphisms cause disease; most don’t

• Most commonly used marker types:
   – microsatellites
   – single-nucleotide polymorphisms (SNPs)
   – copy-number variations (CNVs)


The First Goal of the HGP was to Assemble a High-Density Genome Map of Polymorphic Markers


How Do You Find Disease Genes?

I. Hypothesis-driven approaches
   Candidate gene association
   Candidate gene sequencing

II. Hypothesis-free approaches
   Genomewide linkage
   Genomewide association
   (Genomewide expression)
   Genomewide sequencing
      Exome
      Full-genome

Most hypotheses wrong!


Genetic Linkage Studies

•Studies families

•Search for regions of genome that are systematically co-inherited along with disease on passage through families

•Requires families with multiple affected relatives (multiplex families)

•Best at detecting genes with Mendelian effects (uncommon alleles with strong effects)

•Unit of genetic linkage is LOD (“Log of the Odds”) score (>3)


Principle of genetic linkage—Loci close by on a chromosome tend not to be separated by recombination vs. loci far apart

                                     Loci on the same chromosome        Loci on different
                                     Very close   Nearby   Far apart    chromosomes

Freq. of crossover between 2 loci    Rare         Some     Frequent     -

Linkage                              Tight        Some     Absent       Absent

Recombination                        0%           1-49%    50%          50%

• Unit of genetic “distance” is centiMorgan (cM) = 1% recombination/meiosis; ~ 1 Mb


Genetic Linkage Analysis

• Statistical measure is LOD (log of odds) score

• Significance level:
   LOD >3.0 for Mendelian trait
   LOD >3.3 for Polygenic trait

$$\mathrm{LOD} = \log_{10}\frac{\text{Likelihood of data if loci linked at recombination fraction } \theta}{\text{Likelihood of data if loci unlinked}}$$
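As a worked illustration of the LOD ratio, here is a minimal sketch for the simplest textbook case: fully informative, phase-known meioses, where each meiosis is scored as recombinant or non-recombinant. That simplification, the function name and the example counts are assumptions made for illustration; real linkage software handles unknown phase, missing genotypes and penetrance.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD for phase-known meioses at recombination fraction theta (0 to 0.5)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    nonrecombinants = meioses - recombinants
    # Likelihood if the loci are linked at recombination fraction theta
    likelihood_linked = theta ** recombinants * (1.0 - theta) ** nonrecombinants
    # Likelihood if the loci are unlinked (theta = 0.5)
    likelihood_unlinked = 0.5 ** meioses
    if likelihood_linked == 0.0:
        return float("-inf")  # data impossible at this theta
    return math.log10(likelihood_linked / likelihood_unlinked)


# Ten informative meioses with no recombinants, evaluated at theta = 0
print(round(lod_score(0, 10, 0.0), 2))
```

Under these assumptions, ten non-recombinant meioses give a LOD just above 3, which is why about ten informative meioses is the practical minimum for declaring linkage of a Mendelian trait.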


Restriction Fragment Length Polymorphism (RFLP)

EcoRI

Allele 1 AGAGCCTCAACTTGAATTCGTTTAGTAA

Allele 2 AGAGCCTCAACTTGAATTTGTTTAGTAA

Restriction enzyme EcoRI cuts at sequence

5’-GAATTC-3’

• Allele 1 has an EcoRI cut site; Allele 2 does not

• This RFLP is assaying a SNP
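The two alleles above can be checked with a few lines of code. This is a minimal sketch that scans one strand for the EcoRI recognition sequence and reports fragment lengths after a complete digest; the function name is hypothetical, and a real assay would read fragment sizes from a gel or blot rather than from the sequence.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence; the enzyme cuts after the G


def ecori_fragments(sequence):
    """Return fragment lengths after complete EcoRI digestion of one strand."""
    sequence = sequence.upper()
    fragments = []
    start = 0
    pos = sequence.find(ECORI_SITE)
    while pos != -1:
        cut = pos + 1  # cut falls between G and AATTC
        fragments.append(cut - start)
        start = cut
        pos = sequence.find(ECORI_SITE, pos + 1)
    fragments.append(len(sequence) - start)
    return fragments


allele_1 = "AGAGCCTCAACTTGAATTCGTTTAGTAA"
allele_2 = "AGAGCCTCAACTTGAATTTGTTTAGTAA"

for name, seq in (("Allele 1", allele_1), ("Allele 2", allele_2)):
    print(name, ecori_fragments(seq))
```

Allele 1 yields two fragments and Allele 2 stays as one, so the fragment-length difference reports the single C/T difference inside the site.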


“Genetic linkage analysis”

Co-segregation of disease gene in “multiplex families” with alleles of polymorphic DNA “markers” (initially RFLPs)


“Microsatellites” (SSLPs; STRPs, SSRs) [multi-allelic; ~ 1/30,000 bp; mostly used for linkage analysis, forensics]

ggctgcacacacacacacacacacacacatgctt

ggctgcacacacacacacacacacacatgctt

ggctgcacacacacacacacacacatgctt

ggctgcacacacacacacacacatgctt

ggctgcacacacacacacacatgctt
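Microsatellite alleles like those listed above differ only in how many times the CA unit is repeated. A minimal sketch of scoring an allele by its repeat count follows; the flanking sequence matches the slide, but the repeat numbers are hypothetical and the function name is an assumption for illustration.

```python
import re


def ca_repeat_count(sequence):
    """Number of repeat units in the longest uninterrupted (CA)n run."""
    runs = re.findall(r"(?:ca)+", sequence.lower())
    return max((len(run) // 2 for run in runs), default=0)


# Hypothetical alleles built from the slide's flanking sequence
# and an assumed number of CA repeats.
alleles = ["ggctg" + "ca" * n + "tgctt" for n in (12, 11, 10, 9, 8)]

for allele in alleles:
    print(ca_repeat_count(allele), allele)
```

Because many repeat lengths coexist in a population, one such marker can distinguish several parental chromosomes at once, which is what makes these markers informative for linkage.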


Can follow “segregation” of ancestral “haplotypes” of linked marker alleles along a chromosome through families


Recombination events prune marker haplotypes, defining “genetic interval” that must contain the disease gene


Single-Nucleotide Polymorphisms (SNPs) [bi-allelic; ~1/50-300 bp; mostly used for association analysis]

SNP1 Allele 1 CCGAGATCCAGAAATCCTGAACATAA

SNP1 Allele 2 CTGAGATCCAGAAATCCTGAACATAA

SNP2 Allele 1 CCGAGATCCAGAAATCCTGAACATAA

SNP2 Allele 2 CCGAGATCCAGAAAGCCTGAACATAA

• Occurrence/allele frequencies differ in different ethnic groups/populations

• Can be in genes (~4,000,000) or not (~8,000,000), can result in amino acid substitutions or not

• Each occurs in local context (haplotype) of surrounding SNPs (in example above, SNP2 is on background of SNP1 C allele)


Haplotype Map of Human Genome: International HapMap Project

•Recombination breaks macro-patterns of polymorphic genotypes on the same chromosome into haplotypes

•Recombination is not truly random, so genotypes of very closely spaced polymorphisms on the same chromosome cluster into ~10-50 kb haplotype blocks in which SNP alleles are in linkage disequilibrium (marker alleles within blocks tend to be co-inherited, because recombination within blocks is uncommon)

•Blocks are smaller in African than in Caucasian or Asian populations because the African population is more ancient

•HapMap genotyped SNPs in different populations to characterize haplotype block distributions
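Linkage disequilibrium between two SNPs is commonly summarized by D and r², computed from haplotype and allele frequencies. A minimal sketch with hypothetical frequencies (the function name and numbers are illustrative, not HapMap data):

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for alleles A and B at two bi-allelic loci.

    p_ab: frequency of the haplotype carrying both A and B
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b
    r_squared = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, r_squared


# Hypothetical cases: A and B always on the same haplotype, then independent
print(linkage_disequilibrium(0.5, 0.5, 0.5))    # complete LD
print(linkage_disequilibrium(0.25, 0.5, 0.5))   # no LD
```

Within a haplotype block r² between neighbouring SNPs tends to be high, so genotyping one SNP largely predicts the others; this is what lets a limited set of SNPs tag most common variation.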


Copy-Number Variants (CNVs) [bi-allelic]

Basically are common genomic deletions, hundreds to tens of thousands of nucleotides in size

May be detected by LD with local SNP patterns:

Allele     --1---1---1----1---2----1----2----1-----1----2----1----1----1----1---
Allele     --2---2---2----1---1----2----2----1-----1----2----2----2----1----2---
CNV Allele --1---1--[ ]--1----2---

• Tens of thousands known

• Like SNPs, occurrence/allele frequencies differ in different ethnic groups/populations

• Individually most are rare (< 1%), collectively common

• Can be in genes or not, can include genes

• NOT commonly definitively causal for human disease


1000 Genomes Project, UK10K Project

International projects to sequence 1000/10000 genomes from different ethnic groups

• Catalog human genetic variations (particularly SNPs, indels)
   – ~60,000,000 SNPs now known
   – Essential for sequence-based analysis of rare variants that may be causal for common diseases


How Do You Find Disease Genes?

I. Hypothesis-driven approaches
   Candidate gene association
   Candidate gene sequencing

II. Hypothesis-free approaches
   Genomewide linkage
   Genomewide association
   (Genomewide expression)
   Genomewide sequencing
      Exome
      Full-genome

Most hypotheses wrong!


Common, Complex Diseases

• Asthma
• Autism
• Obesity
• Preterm birth
• Cleft lip/palate
• IBD
• Diabetes
• Cancers
• Common traits like height


Common, Complex Diseases: Utility of Experimental Approaches

[Figure: risk allele frequency (rare to common) plotted against effect size (OR, small to large), with regions labeled GWAS, Linkage, and Re-Sequencing]


How Do You Find Disease Genes?

I. Hypothesis-driven approaches
   Candidate gene association
   Candidate gene sequencing

II. Hypothesis-free approaches
   Genomewide linkage
   Genomewide association
   (Genomewide expression)
   Genomewide sequencing
      Exome
      Full-genome


Hypothesis-Driven Approaches

Candidate genes

Depends on:
   biological hypothesis (biological candidate)
   positional hypothesis / information (positional candidate)

Sometimes successful in Mendelian disorders

Low yield in polygenic, multifactorial (“complex”) disorders: pathogenic sequence variants not obvious, often present in normal individuals

Most hypotheses wrong!


Candidate Gene Association Study

Concept: Causal disease variation in a gene suggested by known biology is ‘tagged’ by nearby polymorphic DNA markers; test for co-occurrence.

Because: DNA sequence variations very close together on the same piece of DNA will tend to not be separated by recombination over long periods, and so will be non-randomly co-inherited even on a population basis (“linkage disequilibrium”).

Most hypotheses wrong!


Candidate Gene Association Studies

• Compares SNP allele frequencies in cases versus controls (“case-control” study design)
• Easy statistics (Fisher exact test, Chi-square)
• Must Bonferroni correct for multiple-testing
• Must ethnically match cases and controls
• Easy, cheap
• Most powerful for common risk alleles
• Can detect common alleles with small allele-specific effects (i.e. “complex”, polygenic traits)
• Most common published type of “genetic study”

Most hypotheses wrong! Most (~96%) such published studies wrong!!
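The “easy statistics” of a case-control allele comparison can be sketched in a few lines: a Pearson chi-square on a 2x2 table of allele counts, followed by a Bonferroni check. The counts, the number of tests and the function names below are hypothetical, and the P value uses the identity that a 1-df chi-square upper tail equals erfc(sqrt(chi2/2)); the sketch assumes no row or column total is zero.

```python
import math


def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df) and P value for a 2x2 table of allele counts."""
    n = case_a + case_b + ctrl_a + ctrl_b
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    denominator = ((case_a + case_b) * (ctrl_a + ctrl_b)
                   * (case_a + ctrl_a) * (case_b + ctrl_b))
    chi2 = numerator / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail, 1 degree of freedom
    return chi2, p_value


def bonferroni_significant(p_value, n_tests, alpha=0.05):
    """True if p_value survives Bonferroni correction for n_tests tests."""
    return p_value < alpha / n_tests


# Hypothetical allele counts: 500 cases and 500 controls (1000 alleles each)
chi2, p = allelic_chi_square(240, 760, 200, 800)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
print("significant as a single test:", bonferroni_significant(p, 1))
print("significant after 20 tests:  ", bonferroni_significant(p, 20))
```

With these made-up counts the result passes P < 0.05 as a single test but not after correction for 20 tests, which illustrates why an uncorrected “positive” candidate-gene result is weak evidence.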


Three Fatal Flaws in Gene-by-Gene Case-Control Design

• Must apply multiple-testing correction; true denominator often not known

• Must ethnically match cases & controls; otherwise, differences in allele frequencies may reflect different genetic backgrounds of cases vs. controls

• Positive studies result in publication bias


“Population stratification” and false-positive case-control genetic association studies

[Figure: Population 1 and Population 2 combine into an Admixed Study Population 1/2, from which Cases (Disease) and Controls are drawn for “Prof. Wizard’s Case-Control Study”; blue/green just indicates overall genetic background. Caption in figure: “Eureka!”]
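The false positive pictured here can be reproduced with simple arithmetic. In this minimal sketch the variant has no effect on disease within either population, yet pooling the two populations produces an apparent association because both carrier frequency and disease prevalence differ between them. All numbers and function names are hypothetical.

```python
def odds_ratio(case_carrier, case_non, ctrl_carrier, ctrl_non):
    """Odds ratio from a 2x2 table of carrier status by case/control status."""
    return (case_carrier * ctrl_non) / (case_non * ctrl_carrier)


def expected_counts(n_people, prevalence, carrier_freq):
    """Expected 2x2 counts when carrier status has NO effect on disease."""
    cases = n_people * prevalence
    controls = n_people - cases
    return (cases * carrier_freq, cases * (1 - carrier_freq),
            controls * carrier_freq, controls * (1 - carrier_freq))


# Hypothetical populations: different carrier frequencies AND different prevalence
pop1 = expected_counts(1000, prevalence=0.10, carrier_freq=0.2)
pop2 = expected_counts(1000, prevalence=0.30, carrier_freq=0.8)
pooled = tuple(a + b for a, b in zip(pop1, pop2))

print("OR within population 1:", round(odds_ratio(*pop1), 3))
print("OR within population 2:", round(odds_ratio(*pop2), 3))
print("OR in pooled sample:   ", round(odds_ratio(*pooled), 3))
```

The within-population odds ratios are 1, while the pooled odds ratio is about 2; the “association” reflects genetic background, not the variant.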


Hypothesis-Free Approaches: Genome-Wide Association Studies (GWAS)

Relatively recent approach (>300 published):

•Genotype hundreds of thousands to millions of SNPs across genome using microarrays; extremely expensive

•Case-control or family-based (trio) design

•Requires no hypotheses about pathogenesis; can discover new genes

•Can discover common alleles with small effects

•Can provide very fine localization


How Do You Find Disease Genes?

I. Hypothesis-driven approaches
   Candidate gene association
   Candidate gene sequencing

II. Hypothesis-free approaches
   Genomewide linkage
   Genomewide association
   (Genomewide expression)
   Genomewide sequencing
      Exome
      Full-genome


Hypothesis-free approaches: Genome-wide association studies (GWAS)

• Study self-contained; can apply appropriate multiple testing correction
   - “Genomewide significance” P < 5 × 10⁻⁸

• Still requires ethnic matching of cases and controls
   - Can correct for population stratification by “Principal components” analysis
   - Can correct for residual “Genomic inflation factor” by “genomic control”

• Can discover new, unknown genes; power similar to candidate gene case-control study

• Case-control “associations” require independent confirmation


The Genomewide Association Study (GWAS)

Manolio TA. N Engl J Med 2010;363:166-176.


Meta-Analysis of Multiple Genomewide Association Studies


Genome-Wide Association Studies: “Manhattan plot”

Per-SNP -log(P values) across genome for association of SNP allele freq. differences between patients with generalized vitiligo versus controls (all Caucasian)


Genome-Wide Association Studies

• Very large number of SNPs tested (500,000 – 2,000,000) presents huge multiple-testing problem; requires at least ~1000 cases and ~1000 controls

• Many SNPs in linkage disequilibrium (i.e. correlated); simple Bonferroni correction too strict (assumes independence)

• “Significant” associations require confirmation by independent follow-up association study of specific SNPs to reduce multiple-testing complexity
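The size of the multiple-testing problem is easy to see numerically. A minimal sketch of the Bonferroni arithmetic follows; the function name is hypothetical, and the conventional genomewide significance level of P < 5 × 10⁻⁸ corresponds to roughly one million independent tests at an overall alpha of 0.05.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test P value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


for n_snps in (500_000, 1_000_000, 2_000_000):
    print(f"{n_snps:>9,} SNPs -> per-SNP threshold {bonferroni_threshold(n_snps):.1e}")
```

Because SNPs in linkage disequilibrium are correlated, the effective number of independent tests is smaller than the number of SNPs genotyped, which is why a straight Bonferroni correction on the raw SNP count is conservative.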


Personalized Medicine: The case of the ‘missing heritability’

• Disease risk genes found by GWAS account for only a small fraction of genetic risk
   > Type 1 diabetes: ~100 genes, ~70% of genetic risk; 50% of risk due to HLA class II

• Are there a virtually unlimited number of additional genes, each conferring small additional risk?
   > Maybe

• Have we under-estimated fraction of genetic risk already accounted for?
   > Maybe. GWAS misses rare risk alleles

• Have we over-estimated total genetic component of risk?
   > Maybe, but not ten-fold


Hypotheses of Common, “Complex” Disease

• Common disease, common variant hypothesis (Reich & Lander, 2001)

versus

• Rare variant hypothesis (Pritchard, 2001; Pritchard and Cox, 2002)


Complex Diseases: Utility of Experimental Approaches

[Figure: risk allele frequency (rare to common) plotted against effect size (OR, small to large), with regions labeled GWAS, Linkage, and Re-Sequencing]


Combined hypothesis-based and hypothesis-free approaches

Deep re-sequencing

• High-throughput DNA sequencing

• Biological candidate genes

• GWAS signals (specific genes or genes within regions)

• Must distinguish potentially causal variants from non-pathological variation (1000 Genomes Project data will help)

• Prioritize for follow-up functional analyses


How Do You Find Disease Genes?

I. Hypothesis-driven approaches
   Candidate gene association
   Candidate gene sequencing

II. Hypothesis-free approaches
   Genomewide linkage
   Genomewide association
   (Genomewide expression)
   Genomewide sequencing
      Exome
      Full-genome


Hypothesis-free approach

Exome/Genome sequencing

• High-throughput DNA sequencing
   - Genome
   - Exome (1% of genome)

• Must distinguish potentially causal variants from non-pathological variation (1000 Genomes Project data will help)
   - Predict based on Mendelian inheritance
   - Compare across unrelated families

• Prioritize for follow-up functional analyses


Exome Sequencing in Mendelian Diseases: Method

Exome = Gene coding regions; ~ 3 Mb (1% of genome)


How Do You Find Disease Genes?
Exome/Genome Sequencing in Mendelian Diseases

There is a lot of genomic ‘noise’

There is a lot of noise!!


Variant Filtering in Exome/Genome Sequencing

• Missense (non-synonymous) substitutions
   - Most rare (<1%) missense may be deleterious

• Nonsense, frameshift mutations

• Splice junction mutations

• Exonic splice enhancer mutations

• INDELs, CNVs, translocations

• Regulatory Feature variants
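A first-pass filter over the variant classes listed above can be sketched as follows. This is an illustration under stated assumptions, not the lecture's pipeline: the `Variant` record, the consequence labels, the example gene names and the 1% frequency cutoff applied to every class are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    consequence: str        # e.g. "missense", "synonymous", "nonsense"
    population_freq: float  # allele frequency in a reference panel


# Hypothetical labels for the protein-altering classes named on the slide
CANDIDATE_CONSEQUENCES = {"missense", "nonsense", "frameshift", "splice_junction"}


def filter_candidates(variants, max_freq=0.01):
    """Keep rare variants whose predicted consequence could plausibly be causal."""
    return [v for v in variants
            if v.consequence in CANDIDATE_CONSEQUENCES
            and v.population_freq < max_freq]


# Made-up example calls from one exome
calls = [
    Variant("GENE_A", "missense", 0.0004),
    Variant("GENE_B", "missense", 0.2300),    # common, filtered out
    Variant("GENE_C", "synonymous", 0.0001),  # silent, filtered out
    Variant("GENE_D", "frameshift", 0.0),
]

for v in filter_candidates(calls):
    print(v.gene, v.consequence, v.population_freq)
```

In practice the surviving list is narrowed further by checking the expected Mendelian inheritance pattern and by looking for the same gene in unrelated affected families.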


How Do You Find Disease Genes?
Exome/Genome Sequencing in Mendelian Diseases

Filtering Schemes


How Do You Find Disease Genes?
Exome/Genome Sequencing in Mendelian Diseases

Exome sequencing is rapidly becoming a fairly routine clinical test, costing ~$1000, ordered in lieu of tens of thousands of dollars worth of functional clinical tests in a patient one believes might have a genetic (principally single-gene Mendelian) cause for their disorder.

Who will do the interpretation of the data, how will “variants of unknown significance” (VUS) be addressed, and what will that cost?
