Page 1

SNP Discovery and Analysis: Application to Association Studies

Mark J. Rieder, PhD
Dana Crawford, PhD
Deborah Nickerson, PhD

SeattleSNPs PGA
July 19-20, 2005

Page 2

Practical Aspects of SNP Association Studies

1. SNP Discovery: Where do I find SNPs to use in my association studies? (e.g., databases, direct resequencing)

2. SNP Selection: How do I choose SNPs that are informative? (i.e., assessing SNP correlation - linkage disequilibrium)

3. SNP Associations: What analyses can I perform after genotyping these SNPs? (e.g., single-SNP data, haplotype data)

4. SNP Replication/Function: How is function predicted or assessed? (e.g., nonsynonymous SNPs, conserved non-coding sequences (CNS), transcription factor binding sites, gene expression)

Page 3

SeattleSNPs Program for Genomic Applications: Overview

Aim 1: To establish a variation discovery resource capable of comprehensive resequencing of candidate genes related to heart, lung, blood, and sleep (HLBS) disorders.

Biological Focus: Inflammation
Genes and Pathways: Coagulation, Complement, Cytokines, Interacting Partners

Page 4

SNPs in Candidate Genes

Average gene size: 26.5 kb
Comparing 2 haploid genomes: ~1 SNP in 1,200 bp

~130 SNPs per gene (1 per ~200 bp): ~15,000,000 SNPs genome-wide

~44 SNPs with MAF > 0.05 per gene (1 per ~600 bp): ~6,000,000 SNPs genome-wide

SeattleSNPs

Page 5

SeattleSNPs PGA: Candidate Gene SNP Resource

• 4.9 Mb in 47 individuals = 230 Mb total sequence
• Define sequence diversity - catalogue all SNPs
• Select "optimal" tagSNP sets
• Determine haplotype structure
• Provide necessary baseline data for association studies

Page 6

Warfarin Pharmacogenetics

1. Background
   • Warfarin characteristics
   • Pharmacokinetics/Pharmacodynamics
   • Discovery of VKORC1

2. VKORC1 - SNP Discovery

3. VKORC1 - SNP Selection (tagSNPs)

4. VKORC1 - SNP Testing
   • SNP/Haplotype Inference
   • Haplotype Inference, Testing

5. VKORC1 - SNP Replication/Function

Page 7

Pharmacogenomics as a Model for Association Studies

Reduce variability and identify outliers. Prospective testing.

Personalized Medicine

• Clear genotype-phenotype link: intervention, variable response (pharmacokinetics - 5x variation)

• Quantitative intervention and response: drug dose, response time, metabolism rate, etc.

• Target/metabolism of drug generally known: gene target that can be tested directly with response

Page 8

Warfarin Background

• Commonly prescribed oral anticoagulant

• In 2003, 21.2 million prescriptions were written for warfarin (Coumadin)

• Prescribed following MI, atrial fibrillation, stroke, venous thrombosis, prosthetic heart valve replacement, and major surgery

• Difficult to determine effective dosage
  - Narrow therapeutic range
  - Monitoring of prothrombin time (INR): 2.0 - 3.0
  - Large inter-individual variation

Page 9

Warfarin dose distribution

[Figure: histogram of warfarin dose (mg/d, x-axis 0-16) against number of patients (y-axis 0-50). European-American patients, n = 186; average dose 5.2 mg/d; 30x dose variability.]

Sources of variability:
• Patient/Clinical/Environmental Factors
• Pharmacokinetic/Pharmacodynamic - Genetic

Page 10

Warfarin inhibits the vitamin K cycle

[Diagram labels: Vitamin K-dependent clotting factors (FII, FVII, FIX, FX, Protein C/S/Z); Epoxide Reductase; γ-Carboxylase (GGCX); Warfarin; Inactivation; CYP2C9; Pharmacokinetic]

Page 11

Warfarin Metabolism (Pharmacokinetics)

Major pathway for termination of pharmacologic effect is through metabolism of S-warfarin in the liver by CYP2C9

• CYP2C9 SNPs alter warfarin metabolism:
  CYP2C9*1 (WT) - normal
  CYP2C9*2 (Arg144Cys) - low/intermediate
  CYP2C9*3 (Ile359Leu) - low

• CYP2C9 alleles occur at a significant minor allele frequency:
  European:          *2 - 10.7%   *3 - 8.5%
  Asian:             *2 - 0%      *3 - 1-2%
  African-American:  *2 - 2.9%    *3 - 0.8%

Page 12

Effect of CYP2C9 Genotype on Anticoagulation-Related Outcomes (Higashi et al., JAMA 2002)

WARFARIN MAINTENANCE DOSE

[Figure: warfarin maintenance dose (mg warfarin/day, y-axis 0-9) by CYP2C9 genotype.]

Genotype   *1/*1   *1/*2   *2/*2   *1/*3   *2/*3   *3/*3
N            127      28       4      18       3       5

- Variant alleles have significant clinical impact
- Still large variability in warfarin dose (15-fold) in *1/*1 "controls"?

TIME TO STABLE ANTICOAGULATION

*2 or *3 carriers take longer to reach stable anticoagulation

CYP2C9-WT: ~90 days

CYP2C9-Variant: ~180 days
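The genotype counts on this slide can be turned into allele frequencies by counting each allele once per chromosome. The sketch below is only an illustration of that arithmetic; the dictionary layout and the function name `allele_frequencies` are assumptions made for this example, not something from the talk.

```python
from collections import Counter

# Genotype counts as listed in the N row of the maintenance-dose plot.
GENOTYPE_COUNTS = {
    ("*1", "*1"): 127,
    ("*1", "*2"): 28,
    ("*2", "*2"): 4,
    ("*1", "*3"): 18,
    ("*2", "*3"): 3,
    ("*3", "*3"): 5,
}

def allele_frequencies(genotype_counts):
    """Count each allele once per chromosome and divide by total chromosomes."""
    allele_counts = Counter()
    for (allele1, allele2), n in genotype_counts.items():
        allele_counts[allele1] += n
        allele_counts[allele2] += n
    total = sum(allele_counts.values())
    return {allele: count / total for allele, count in allele_counts.items()}

freqs = allele_frequencies(GENOTYPE_COUNTS)
for allele in sorted(freqs):
    print(f"CYP2C9{allele}: {freqs[allele]:.3f}")
```

With these counts the *2 and *3 frequencies come out at roughly 10.5% and 8.4%, in line with the European frequencies quoted for CYP2C9 on the previous slide.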

Page 13

Analysis of Independent Predictors of Warfarin Dose

Variable                                Change in Warfarin Dose   P value
Target INR, per 0.5 increase            21%                       <0.0005
BMI, per SD                             14%                       <0.0001
Ethnicity (African-American, [Asian])   13%, [10-15%]             0.003
Age, per decade                         13%                       <0.0001
Gender, Female                          12%                       <0.0001
Drugs (Amiodarone)                      24%                       0.007
CYP2C9*2, per allele                    19%                       <0.0001
CYP2C9*3, per allele                    30%                       <0.0001

Adapted from Gage et al., Thromb Haemost, 2004

~30% of the variability in warfarin dose is explained by these factors

What other candidate genes are influencing warfarin dosing?

Page 14

Warfarin acts as a vitamin K antagonist

[Diagram labels: Vitamin K-dependent clotting factors (FII, FVII, FIX, FX, Protein C/S/Z); Epoxide Reductase; γ-Carboxylase (GGCX); Warfarin; Inactivation; CYP2C9; Pharmacodynamic]

Page 15

New Target Protein for Warfarin

[Diagram labels: Epoxide Reductase (VKORC1), 5 kb - chr 16; γ-Carboxylase (GGCX); Clotting Factors (FII, FVII, FIX, FX, Protein C/S/Z)]

Rost et al. & Li et al., Nature (2004)

Page 16

Warfarin Resistance VKORC1 Polymorphisms

• Rare non-synonymous mutations in VKORC1 are causative for warfarin resistance (15-35 mg/d)
• NO non-synonymous mutations found in 'control' chromosomes (n = ~400)

Rost et al., Nature (2004)

Page 17

Inter-Individual Variability in Warfarin Dose: Genetic Liabilities

[Figure: frequency distribution of warfarin maintenance dose (mg/day; axis marks at 0.5, 5, and 15). SENSITIVITY: CYP2C9 coding SNPs - *3/*3. RESISTANCE: VKORC1 nonsynonymous coding SNPs. Center of the distribution: common VKORC1 non-coding SNPs?]

Page 18

SNP Discovery: Resequencing VKORC1

• PCR amplicons --> Resequencing of the complete genomic region

• 5 kb upstream and each of the 3 exons and intronic segments; ~11 kb

• SeattleSNPs PGA - pga.gs.washington.edu (24 African-Americans / 23 Europeans)

• Warfarin-treated clinical patients (UWMC): 186 European

• Other populations: 96 European, 96 African-American, 120 Asian

Page 19

SNP Discovery: Resequencing Results

Summary of PGA samples (European, n = 23)
Total: 13 SNPs identified; 10 common / 3 rare (<5% MAF)

Clinical samples (European patients, n = 186)
Total: 28 SNPs identified; 10 common / 18 rare (<5% MAF)

15 - intronic/regulatory
7 - promoter SNPs
2 - 3' UTR SNPs
3 - synonymous SNPs
1 - nonsynonymous
    - single heterozygous individual
    - highest warfarin dose = 15.5 mg/d

How does the comprehensive SNP discovery compare to what was known for this gene?

Page 20

SNP Discovery: dbSNP database

dbSNP - NCBI SNP database

Page 21

SeattleSNPs Resequencing 28 SNPs --> 15 SNPs gene region

10 dbSNPs • 8/10 confirmations

• 3 frequency/genotype data

• 7 new dbSNP entries generated by SeattleSNPs resequencing

• 8 dbSNPs/15 SNPs (~50%)

SNP Discovery: dbSNP database (VKORC1)

Page 22

SNP Discovery: dbSNP database

Nickerson and Kruglyak, Nature Genetics, 2001

Mar 2005 - 5.0 million (validated - 1/600 bp)

5.0/10.0 = 50% of all common SNPs (validated)!

Page 23

SNP discovery is dependent on your sample population size

[Figure: fraction of SNPs discovered (y-axis, 0.0-1.0) against minor allele frequency (MAF; x-axis, 0.0-0.5), with one curve per number of chromosomes sampled: 2, 8, 16, 24, 48, 96.]

Example of 2 chromosomes differing at one site:
GTTACGCCAATACAGGATCCAGGAGATTACC
GTTACGCCAATACAGCATCCAGGAGATTACC
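The shape of these curves can be approximated with a simple sampling argument: a biallelic SNP is discovered when both alleles appear among the sampled chromosomes. The sketch below assumes independent draws of chromosomes and ignores sequencing error; the function name `discovery_probability` is hypothetical, and the talk does not state how its curves were generated.

```python
def discovery_probability(maf: float, n_chromosomes: int) -> float:
    """Chance that both alleles of a biallelic SNP appear among n sampled chromosomes."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("maf must be between 0 and 0.5")
    if n_chromosomes < 1:
        raise ValueError("n_chromosomes must be at least 1")
    only_major = (1.0 - maf) ** n_chromosomes  # every draw is the major allele
    only_minor = maf ** n_chromosomes          # every draw is the minor allele
    return 1.0 - only_major - only_minor

# Same chromosome counts as the curves on the slide.
for n in (2, 8, 16, 24, 48, 96):
    row = ", ".join(
        f"MAF {maf:.2f}: {discovery_probability(maf, n):.2f}"
        for maf in (0.01, 0.05, 0.10, 0.50)
    )
    print(f"{n:>2} chromosomes -> {row}")
```

Under this model, two chromosomes find a SNP with MAF 0.5 only half the time, while larger samples are needed before rare SNPs are reliably seen.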

Page 24

SNP Discovery: dbSNP database

Rarer and population-specific SNPs are found by resequencing

[Figure: two distributions by minor allele frequency (MAF), one for dbSNP (Perlegen/HapMap) and one for SeattleSNPs, with levels marked at 25%, 50%, and 75%.]

Page 25

dbSNP: Increasing numbers of SNPs now have genotype data

[Figure: SNPs (millions, y-axis 0-6) by dbSNP release (Jan-03 through Mar-05) for two series: validated SNPs and SNPs with genotypes. Annotations: Perlegen data; HapMap Phase II, Perlegen.]

Page 26

Current State of dbSNP

[Figure: SNPs (millions, y-axis 0-12) by dbSNP release (Jan-03 through Mar-05) for three series: total reference SNPs, validated SNPs, genotyped SNPs.]

Many SNPs left to validate and characterize.

Page 27

Development of a genome-wide SNP map: How many SNPs?

Nickerson and Kruglyak, Nature Genetics, 2001

~10 million common SNPs (>1-5% MAF) - 1/300 bp

Mar 2005 - 5.0 million (validated - 1/600 bp)

5.0/10.0 = 50% of all common SNPs validated!

Coming Soon! 5.0 million validated SNPs with genotypes!

Page 28

SNP Discovery: dbSNP database

dbSNP Issues:

Not a comprehensive catalog (50% of SNPs)

Is the data confirmed? (50% are validated)

Information about allele frequency/population (50%)

No information about SNP correlations (linkage disequilibrium) or genotyping efficiency

Page 29

SNP Selection: Using Linkage Disequilibrium

• Common SNPs
• VKORC1 - 28 total - 10 SNPs > 10% MAF

• Evaluate linkage disequilibrium (non-random association of genotype data)

Does common variation in VKORC1 have a role in determining warfarin dose?

[Figure: frequency distribution of warfarin dose (mg/d).]

Page 30

SNP Selection: Using Linkage Disequilibrium

Site 1 alleles: C : 50%, T : 50%
Site 2 alleles: A : 50%, G : 50%

[Diagram: maternal and paternal chromosomes, each carrying one allele at Site 1 (C or T) and one at Site 2 (A or G).]

Possible 2-site comb.   Expected Freq.      Observed Freq.
C A                     0.5 x 0.5 = 0.25    0.50 *
C G                     0.5 x 0.5 = 0.25    0.01
T A                     0.5 x 0.5 = 0.25    0.01
T G                     0.5 x 0.5 = 0.25    0.48 *

* Sites correlated
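The comparison of expected and observed two-site frequencies on this slide can be reproduced in a few lines. This is a minimal sketch: the names `OBSERVED`, `allele_frequency`, and `disequilibrium_table` are made up for the example, and allele frequencies are derived from the observed haplotype frequencies rather than taken as exactly 50%.

```python
# Observed two-site haplotype frequencies, copied from the table on this slide.
OBSERVED = {
    ("C", "A"): 0.50,
    ("C", "G"): 0.01,
    ("T", "A"): 0.01,
    ("T", "G"): 0.48,
}

def allele_frequency(observed, site, allele):
    """Sum haplotype frequencies carrying `allele` at `site` (0 or 1)."""
    return sum(freq for hap, freq in observed.items() if hap[site] == allele)

def disequilibrium_table(observed):
    """Return (haplotype, expected, observed, D) for every two-site haplotype."""
    rows = []
    for hap, obs in observed.items():
        expected = (allele_frequency(observed, 0, hap[0])
                    * allele_frequency(observed, 1, hap[1]))
        rows.append((hap, expected, obs, obs - expected))
    return rows

for hap, expected, obs, d in disequilibrium_table(OBSERVED):
    print(f"{hap[0]}-{hap[1]}: expected {expected:.4f}, observed {obs:.2f}, D = {d:+.4f}")
```

The expected values come out near, not exactly at, 0.25, because the observed haplotypes imply allele frequencies of 0.51 and 0.49 rather than the rounded 50% shown on the slide. The large positive D for C-A and T-G is what marks the two sites as correlated.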

Page 31

SNP Selection: Using Linkage Disequilibrium

• SNP discovery data (i.e., population of samples with genotypes)
• Find all correlated SNPs to minimize the total number of SNPs
• Maintains genetic information (correlations) for that locus

LD_Select - SNP tagging/binning algorithm - based on LD (r2), not haplotypes

Carlson et al., AJHG (2004)
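The binning idea can be sketched as a greedy loop over pairwise r2 values. What follows is a simplified illustration, not the published LD_Select implementation: the function name `greedy_ld_bins`, the 0.8 threshold, the SNP labels, and the r2 values in the example are all assumptions made for this sketch.

```python
def greedy_ld_bins(snps, r2, threshold=0.8):
    """Group SNPs into bins using pairwise r^2.

    snps: list of SNP identifiers.
    r2: dict mapping frozenset({snp_a, snp_b}) to the pairwise r^2.
    Returns a list of (tag_snp, members) tuples.
    """
    def linked(a, b):
        return r2.get(frozenset((a, b)), 0.0) >= threshold

    remaining = list(snps)
    bins = []
    while remaining:
        # Pick the SNP that exceeds the threshold with the most unbinned SNPs.
        tag = max(remaining,
                  key=lambda s: sum(linked(s, t) for t in remaining if t != s))
        members = [tag] + [t for t in remaining if t != tag and linked(tag, t)]
        bins.append((tag, members))
        remaining = [s for s in remaining if s not in members]
    return bins

# Hypothetical SNP labels and r^2 values, for illustration only.
example_r2 = {
    frozenset(("s1", "s2")): 0.95,
    frozenset(("s1", "s3")): 0.90,
    frozenset(("s2", "s3")): 0.85,
    frozenset(("s1", "s4")): 0.10,
}
bins = greedy_ld_bins(["s1", "s2", "s3", "s4"], example_r2)
print(bins)
```

Genotyping one tagSNP per bin then stands in for every SNP in that bin; a SNP correlated with nothing else ends up in a bin of its own.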

Page 32

SNP Selection: VG/LD_Select on the Web

pga.gs.washington.edu/VG2

Page 33

SNP Selection: tagSNP Data

Page 34

SNP Selection: VKORC1 tagSNPs

Page 35

SNP Testing: VKORC1 tagSNPs

Five Bins to Test
1. 381, 3673, 6484, 6853, 7566
2. 2653, 6009
3. 861
4. 5808
5. 9041

Bin 1 - p < 0.001
Bin 2 - p < 0.02
Bin 3 - p < 0.01
Bin 4 - p < 0.001
Bin 5 - p < 0.001

[Figure: e.g. Bin 1 - SNP 381, by genotype: C/C, C/T, T/T]

SNP x SNP interactions - haplotype analysis?

Page 36

VKORC1 Summary: SNP Discovery/SNP Selection

1. VKORC1 candidate gene for warfarin dose response

2. SNP discovery performed using PCR/resequencing to catalog common SNPs
   • 28 SNPs found
   • 10 common SNPs

3. SNP discovery using dbSNP
   • 8/10 dbSNPs confirmed
   • 7 new SNPs added

4. SNP Selection using linkage disequilibrium
   • 10 common SNPs (> 10% MAF)
   • 5 informative SNPs for genotyping

Page 37

Haplotypes in Genetic Association Studies

Two main approaches with haplotypes:

Haplotypes Pick tagSNPs Genotype samples

Pick tagSNPs Infer haplotypes Test for association

Page 38

Haplotypes in Genetic Association Studies

1. How can you get haplotypes?

2. What information do you get from haplotypes?

3. How do you use haplotypes to find tagSNPs?

4. How do you use haplotypes to test for associations?

Page 39

Haplotypes – The Definition

“…a unique combination of genetic markers present in a chromosome.” pg 57 in Hartl & Clark, 1997

Page 40

Constructing Haplotypes

[Figure: three experimental routes to haplotypes, illustrated with two SNPs, SNP 1 (C/T) and SNP 2 (A/G): collecting pedigrees, somatic cell hybrids (human/rodent hybrid), and allele-specific PCR.]

Page 41

Constructing Haplotypes

Examples of Haplotype Inference Software:

EM Algorithm
  Haploview http://www.broad.mit.edu/mpg/haploview/index.php
  Arlequin http://lgb.unige.ch/arlequin/

PHASE v2.1 http://www.stat.washington.edu/stephens/software.html

HAPLOTYPER http://www.people.fas.harvard.edu/~junliu/Haplo/docMain.htm

Page 42

Haplotypes in SeattleSNPs

• >200 genes re-sequenced in inflammation response

• 2 populations: European- and African-Americans

• PHASEv2.0 results posted on website

• Interactive tool (VH1) to visualize and sort haplotypes

http://pga.gs.washington.edu

Pages 43-53

Haplotypes in SeattleSNPs (figure slides)

Page 54

Haplotypes in Genetic Association Studies

Two main approaches with haplotypes:

Haplotypes Pick tagSNPs Genotype samples

Pick tagSNPs Infer haplotypes Test for association

Recombination
Natural selection
Population history
Population demography

Haplotype block definition

Page 55

Measuring Pair-wise SNP Correlations

• SNP correlation described by linkage disequilibrium (LD)

• Pair-wise measures of LD: D´ and r2

$D = p_{AB} - p_A p_B$; $D' = D / D_{\max}$ (related to recombination)

$r^2 = \dfrac{D^2}{f(A_1)\, f(A_2)\, f(B_1)\, f(B_2)}$ (related to power)
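Both measures follow directly from one haplotype frequency and the two allele frequencies. The sketch below applies the definitions above; the function name `ld_measures` is hypothetical, and D´ is reported as an absolute value, which is a common convention but an assumption here. The example numbers are the C-A haplotype from the two-site table earlier in the deck.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float):
    """Return (D, |D'|, r^2) for alleles A and B at two biallelic sites."""
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # f(A1) f(A2) f(B1) f(B2): both allele frequencies at both sites.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# C-A haplotype observed at 0.50; C and A alleles each at 0.51.
d, d_prime, r2 = ld_measures(0.50, 0.51, 0.51)
print(f"D = {d:.4f}, D' = {d_prime:.2f}, r2 = {r2:.2f}")
```

For that example D is about 0.24, D´ about 0.96, and r2 about 0.92, so the two sites are strongly but not perfectly correlated.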

Page 56

Using LD and Haplotypes to Pick tagSNPs

• r2 is inversely related to power: sample size scales with 1/r2

  1,000 cases, 1,000 controls at r2 = 1.0
  1,250 cases, 1,250 controls at r2 = 0.80

  Example: LDSelect

• D´ is related to recombination history

  D´ = 1: no recombination
  D´ < 1: historical recombination

  Example: Haplotype "blocks"
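The 1/r2 rule on this slide is a one-line calculation. A minimal sketch, assuming the sample size needed when testing the causal SNP directly is already known; the function name `required_sample_size` is made up for this example.

```python
import math

def required_sample_size(n_direct: int, r2: float) -> int:
    """Scale a sample size by 1/r^2 when a tagSNP stands in for the tested SNP."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be in (0, 1]")
    # Small tolerance so floating-point noise does not round an exact result up.
    return math.ceil(n_direct / r2 - 1e-9)

for r2 in (1.0, 0.8, 0.5):
    n = required_sample_size(1000, r2)
    print(f"r2 = {r2:.2f}: {n} cases and {n} controls")
```

At r2 = 0.80 this reproduces the 1,250 cases and 1,250 controls shown on the slide.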

Page 57

Haplotype "Blocks"

Strong LD
Few haplotypes
Represent most chromosomes

Daly et al., Nat. Genet. (2001)

Page 58

Block Definitions

Daly et al., Nat. Genet. (2001)

D´ [Gabriel et al., Science (2002)]

Page 59

Block Definitions

Four-gamete test:

Possible two-SNP haplotypes: A B, a b, A b, a B

<4 haplotypes observed (e.g., only A B and a b), D´ = 1: block

4 haplotypes observed, D´ < 1: boundary
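The four-gamete test reduces to counting the distinct two-site combinations seen in a set of phased haplotypes. The sketch below assumes biallelic sites and haplotypes given as strings; the function name `four_gamete_test` and the toy haplotypes are illustrative only.

```python
def four_gamete_test(haplotypes, i, j):
    """Return True if all four two-site gametes are observed at sites i and j."""
    gametes = {(hap[i], hap[j]) for hap in haplotypes}
    return len(gametes) == 4

# Toy phased haplotypes over two sites (uppercase/lowercase mark the two alleles).
block_like = ["AB", "ab", "AB", "ab"]
with_recombinants = ["AB", "ab", "Ab", "aB"]

print(four_gamete_test(block_like, 0, 1))         # fewer than 4 gametes: block
print(four_gamete_test(with_recombinants, 0, 1))  # all 4 gametes: boundary
```

Seeing all four gametes is taken as evidence of historical recombination between the two sites, which is why it marks a block boundary.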

Page 60

Haplotype Blocks and tagSNPs

Identifying blocks and tagSNPs:

• Manually

• Algorithms
  - Haploview

Page 61

Haplotype Blocks and tagSNPs

IL1B:19 SNPs (MAF >5%)

4 “common” haplotypes

tagSNPs

Page 62

Haplotype Blocks and tagSNPs

Identifying blocks and tagSNPs:

• Manually

• Algorithms
  - Haploview

Pages 63-64

LD and tagSNPs using Haploview

VKORC1, European-Americans

PHASE v2.1 data

Pages 65-68

Minimal set of tagSNPs based on r2

Pages 69-70

Where to Find Tagging Software

HaploBlockFinder http://cgi.uc.edu/cgi-bin/kzhang/haploBlockFinder.cgi

LDSelect http://droog.gs.washington.edu/ldSelect.html

SNPtagger http://www.well.ox.ac.uk/~xiayi/haplotype/index.html

TagIT http://popgen.biol.ucl.ac.uk/software.html

tagSNPs http://www-rcf.usc.edu/~stram/tagSNPs.html

Haploview http://www.broad.mit.edu/personal/jcbarret/haplo/

Page 71

Haplotypes, TagSNPs, and Caveats

• Haplotypes are inferred

• Block-like structure assumed for some software

• Different block definitions

• Block boundaries sensitive to marker density

• Genotype savings may not be great (recombination)

Page 72

Haplotypes in Genetic Association Studies

Two main approaches with haplotypes:

Haplotypes Pick tagSNPs Genotype samples

Pick tagSNPs Infer haplotypes Test for association

Genetic diversity of sample
Multi-SNP analysis

Page 73

Multi-SNP testing: Haplotypes

Five tagSNPs (10 total SNPs)

186 warfarin patients (European)
PHASE v2.1

9 haplotypes / 5 common (>5%)

Page 74

Multi-SNP testing: Haplotypes

Test for association between haplotype and warfarin dose using multiple linear regression

Adjusted for all significant covariates: age, sex, amiodarone, CYP2C9 genotype

Page 75

Multi-SNP testing: Haplotypes

VKORC1 haplotypes cluster into divergent clades

[Figure: haplotype tree with two clades, A and B. Branches are labeled with SNP positions: (381, 3673, 6484, 6853, 7566), 5808, 9041, 861.]

Haplotypes shown:
CCGATCTCTG - H1
CCGAGCTCTG - H2
TCGGTCCGCA - H7
TAGGTCCGCA - H8
TACGTTCGCG - H9

Patients can be assigned a clade diplotype, e.g.:
Patient 1 - H1/H2 = A/A
Patient 2 - H1/H7 = A/B
Patient 3 - H7/H9 = B/B

Explore the evolutionary relationship across haplotypes
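Assigning a clade diplotype is a lookup from each haplotype to its clade. The sketch below follows the three patient examples on this slide. The placement of H8 in clade B is an assumption inferred from its position in the figure, and the names `HAPLOTYPE_TO_CLADE` and `clade_diplotype` are hypothetical.

```python
# Clade membership taken from the patient examples on the slide.
# H8 -> "B" is an assumption: the slide's examples only cover H1, H2, H7, and H9.
HAPLOTYPE_TO_CLADE = {"H1": "A", "H2": "A", "H7": "B", "H8": "B", "H9": "B"}

def clade_diplotype(hap1, hap2, mapping=HAPLOTYPE_TO_CLADE):
    """Map a pair of haplotype labels to a clade diplotype such as 'A/B'."""
    # Haplotypes outside the mapping are reported as 'Other'.
    clades = sorted(mapping.get(hap, "Other") for hap in (hap1, hap2))
    return "/".join(clades)

for patient, (hap1, hap2) in enumerate([("H1", "H2"), ("H1", "H7"), ("H7", "H9")], start=1):
    print(f"Patient {patient} - {hap1}/{hap2} = {clade_diplotype(hap1, hap2)}")
```

Collapsing haplotypes to clades this way reduces many rare diplotypes to three groups (A/A, A/B, B/B) that can be compared for warfarin dose.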

Page 76

VKORC1 clade diplotypes show a strong association with warfarin dose

[Figure: warfarin dose (low to high) by clade diplotype (A/A, A/B, B/B) in three panels: all patients (n = 181), 2C9 WT patients (n = 124), 2C9 VAR patients (n = 57). Significance markers (*, †) appear on the plot.]

Independent of INR levels across all groups

Page 77

Multi-SNP testing: Haplotypes

• European - mean ~ 5 mg/d

• African-American - higher ~ 6.0-7.0 mg/d

• Asian - lower ~ 3.0-3.5 mg/d

Hypothesis: VKORC1 haplotypes contribute to racial variability in warfarin dosing.

• "Control" populations: 120 Europeans, 96 African-Americans, 120 Asian

Page 78

Multi-SNP testing: Haplotypes

Explore the evolutionary relationship across populations

Clade A = Low
Clade B = High

European (CEPH) clade distribution: A (37%), B (58%)

African-American clade distribution (high dose phenotype): A (14%), B (47%), Other (39%)

Asian (Han) clade distribution (low dose phenotype): A (89%), B (11%)

Page 79

Common Errors in Association Studies
Bell and Cardon (2001)

• Small sample size

• Subgroup analysis and multiple testing

• Random error

• Poorly matched control group

• Failure to attempt study replication

• Failure to detect LD with adjacent loci

• Overinterpreting results and positive publication bias

• Unwarranted 'candidate gene' declaration after identifying association in arbitrary genetic region

e.g., second case/control study; gene expression studies

Page 80

SNP Replication: VKORC1

[Figure: warfarin dose by clade diplotype (AA, AB, BB) for all patients, 2C9 WT patients, and 2C9 VAR patients, shown for two cohorts, with significance markers (*, †).]

Univ. of Washington, n = 185

21% variance in dose explained

Washington University, n = 386
Brian Gage, Howard McLeod, Charles Eby

Page 81

SNP Function: VKORC1 Expression

mechanism

No nonsynonymous SNPs

Several SNPs are present in evolutionarily conserved non-coding regions

- mRNA expression in human liver cell lines

Page 82

SNP Function: VKORC1 Expression

Expression in human liver tissue (n = 53) shows a graded change in expression.

Page 83

VKORC1 SNP alters liver-specific binding site

Page 84

• Databases and resources available for SNP discovery

• Software for tagSNP selection available

• Both single and multi-SNP analysis are useful

• Replication required by several journals

SNP Discovery and Analysis Application to Association Studies

Summary

Page 85

SeattleSNPs Genotyping Service

• Free genotyping (BeadArray or SNPlex)

• Emphasis on young investigators

• Research related to heart, lung, blood, or sleep disorders

• Moderate to large population samples

• Apply at pga.gs.washington.edu

• Due: October 15th, 2005

Page 86

SNP Typing Formats

Scale: Low - Microtiter Plates - Fluorescence

e.g. Taqman - Good for a few markers - lots of samples - PCR prior to genotyping

Scale: Medium - Size Analysis by Electrophoresis

e.g. SNPlex - Intermediate multiplexing reduces costs - Genotype directly on genomic DNA - new paradigm for high throughput

Scale: High - Arrays - Custom or Universal

e.g. Illumina, ParAllele, Affymetrix - Highly multiplexed - 1,500 SNPs and beyond (500K+)

Page 87

Taqman

Genotyping with fluorescence-based homogeneous assays (single-tube assay) = 1 SNP/tube

Page 88

SNP Typing Formats


Page 89

Technological Leap - No advance PCR

Universal PCR after preparing multiple regions for analysis.

Several approaches use specific primers on genomic DNA followed by PCR of the ligated products, with different strategies and different readouts.

SNPlex, Illumina, ParAllele

Also, reduced representation (Affymetrix): cut with a restriction enzyme, ligate linkers, amplify from the linkers, then read out by chip hybridization.

Page 90

[Figure: workflow step 9, characterize on capillary sequencer; detection panel labeled SNP 1 and SNP 2]

Page 91

SNP Typing Formats


Page 92

Multiplexed Genotyping - Universal Tag Readouts

[Figure labels: Locus 1 Specific Sequence with Tag1 sequence and cTag1 sequence; Locus 2 Specific Sequence with Tag2 sequence and cTag2 sequence; Substrate (Bead or Chip); Tag 1 to Tag 4; Chip Array; Bead Array; C T A G]

• Multiplex ~1,000 SNPs

• Not dependent on primary PCR

• Illumina, ParAllele, Affymetrix

Page 93

Illumina Platform

96 Multi-array Matrix matches standard microtiter plates

~1,500 SNPs typed per matrix for 96 samples

Page 94

Affymetrix’s 100K Chip

http://www.affymetrix.com/products/arrays/specific/100k.affx

Optimized for 250-2000bp

Page 95

High Throughput Chip Formats

Page 96

Defining the scale of the genotyping project is key to selecting an approach. Estimated costs for 1000 individuals:

• 5 to 10 SNPs in a candidate gene - many approaches (expensive, ~$0.60 per SNP/genotype): $6,000

• 48 (to 96) SNPs in a handful of candidate genes (~$0.25 per SNP/genotype): $12,000

• 384 to 1,536 SNPs (~$0.15 - $0.08 per SNP/genotype): $57,600-122,880

• 10,000 cSNPs - defined format (~$0.05 per SNP/genotype): $500,000

• 100,000 genic SNPs - defined format (~$0.005 per SNP/genotype): $500,000

• 500,000 SNPs - defined format (~$0.004 per SNP/genotype): $2,000,000
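The totals listed above follow from number of SNPs × cost per genotype × number of individuals. The sketch below reproduces them, assuming the first two totals use 10 and 48 SNPs respectively and that the costs are in US dollars; the function name `genotyping_cost` is hypothetical.

```python
def genotyping_cost(n_snps, cost_per_genotype, n_individuals=1000):
    """Total project cost: SNPs x cost per SNP/genotype x individuals."""
    # Round to cents to hide floating-point noise in the product.
    return round(n_snps * cost_per_genotype * n_individuals, 2)


# (label, number of SNPs, approximate cost per SNP/genotype) from the slide.
tiers = [
    ("10 SNPs in a candidate gene", 10, 0.60),
    ("48 SNPs in a handful of candidate genes", 48, 0.25),
    ("384 SNPs", 384, 0.15),
    ("1,536 SNPs", 1536, 0.08),
    ("10,000 cSNPs", 10_000, 0.05),
    ("100,000 genic SNPs", 100_000, 0.005),
    ("500,000 SNPs", 500_000, 0.004),
]

for label, n_snps, unit_cost in tiers:
    total = genotyping_cost(n_snps, unit_cost)
    print(f"{label}: ${total:,.0f} for 1000 individuals")
```

The per-genotype cost falls by more than 100-fold across the tiers, but the total still rises because the SNP count grows faster than the unit cost drops.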

Page 97

Acknowledgements

Allan Rettie, Medicinal Chemistry
Alex Reiner
Dave Veenstra
Dave Blough
Ken Thummel

Noel Hastings
Maggie Ahearn

Josh Smith
Chris Baier
Peggy Dyer-Robertson

Washington University: Brian Gage, Howard McLeod, Charles Eby

Joyce You - Hong Kong

Page 98