Genome-wide association studies Misha Kapushesky Slides: Johan Rung, EBI St. Petersburg Russia 2010

Page 1: 20100515 bioinformatics kapushesky_lecture06

Genome-wide association studies

Misha Kapushesky

Slides: Johan Rung, EBI

St. Petersburg, Russia, 2010

Page 2: 20100515 bioinformatics kapushesky_lecture06

Overview

• Methods for genome-wide association studies

• Montreal GWAS for Type 2 Diabetes

• GWAS results - context and caveats

Page 3: 20100515 bioinformatics kapushesky_lecture06

Study coverage

• Associating phenotype/disease state to genetic variation

• Cost per genotype has decreased

• Instead of a candidate gene approach, just scan the entire genome

• SNP microarrays covering up to 5M SNPs on one chip

• Increased sample sizes

Page 4: 20100515 bioinformatics kapushesky_lecture06

Recombination

Page 5: 20100515 bioinformatics kapushesky_lecture06

Linkage disequilibrium

Two markers on the genome are inherited together more often than would be expected by chance

This leads to high correlation between nearby markers within the same haplotype block
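The strength of linkage disequilibrium between two markers is commonly summarized by D' and r², the two measures that appear later in these slides. The sketch below computes both from haplotype and allele frequencies using the standard textbook definitions; the function name and argument names are illustrative and not taken from the slides.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic markers.

    p_ab : frequency of the haplotype carrying allele A at marker 1
           and allele B at marker 2
    p_a  : frequency of allele A at marker 1
    p_b  : frequency of allele B at marker 2
    """
    # D is the excess of the observed haplotype frequency over the
    # frequency expected if the two markers were independent.
    d = p_ab - p_a * p_b

    # D' rescales |D| by its largest possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the alleles at the two markers.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

Two markers that always travel together (for example `ld_measures(0.5, 0.5, 0.5)`) give D' and r² of 1, while independent markers give 0 for both.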

Page 6: 20100515 bioinformatics kapushesky_lecture06

Haplotypes and genotype tagging

Page 7: 20100515 bioinformatics kapushesky_lecture06

Association studies

• Linkage disequilibrium enables association studies through detection by proxy - not every variant needs to be typed

Page 8: 20100515 bioinformatics kapushesky_lecture06

Study power

[Figure: panels A and B, comparing cases and controls]

Page 9: 20100515 bioinformatics kapushesky_lecture06

Study power

• The power of a study is the probability that it correctly detects a true association (a true positive)

• To calculate this, you need:

• risk model

• genotype relative risk

• allele frequency

• number of cases and controls

• population penetrance

• acceptable rate of false positives
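As a rough illustration of how these ingredients combine, the sketch below approximates the power of a two-sided allelic case-control test with a normal approximation. It is not the calculation used in the study: it assumes a multiplicative risk model, a rare disease (so that control allele frequencies are close to population frequencies, which is why penetrance does not appear explicitly), and the function and parameter names are hypothetical.

```python
from statistics import NormalDist


def allelic_test_power(risk_allele_freq, allelic_rr, n_cases, n_controls, alpha):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    # Control allele frequency, taken equal to the population frequency
    # (rare-disease assumption).
    p0 = risk_allele_freq
    # Expected risk-allele frequency in cases under a multiplicative model.
    p1 = allelic_rr * p0 / (allelic_rr * p0 + (1 - p0))
    # Each individual contributes two alleles to the comparison.
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    # Critical value for the acceptable false-positive rate alpha.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability that the test statistic exceeds the critical value
    # (the negligible opposite tail is ignored).
    return NormalDist().cdf(abs(p1 - p0) / se - z_crit)
```

Calling it with the same allele frequency and relative risk but larger case and control counts shows how power grows with sample size, which is the motivation for the meta-analysis and multi-stage designs discussed later.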

Page 10: 20100515 bioinformatics kapushesky_lecture06

How many SNPs should be tested? Studies of small regions revealed linkage disequilibrium blocks in which common SNPs are highly correlated (usually <10,000–30,000 base pairs in African populations or 30,000–50,000 base pairs in the newer European and Asian populations) (22). This motivated the HapMap Project (www.hapmap.org [12]), which has validated approximately 4 million SNPs, including 2.8 million of the estimated 10 million common SNPs in major world populations, while creating competition among biotechnology companies to develop high-throughput genotyping technologies. Sequencing and genotyping studies showed that sets of 500,000 (European populations) to 1,000,000 (African populations) SNPs could "tag" (serve as proxies for) approximately 80% of common SNPs (23).

Page 11: 20100515 bioinformatics kapushesky_lecture06

Quality controls

• Call rates for samples and SNPs

• Exclusion of low frequency SNPs

• Exclusion of SNPs out of Hardy-Weinberg Equilibrium

• Clean (or take into account) population stratification
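The first two filters can be sketched per SNP as follows. Genotypes are assumed to be coded as the count of one allele (0, 1 or 2) with `None` for a failed call; this coding, the function name and the default thresholds are assumptions for illustration (the 95% call rate mirrors the sample-level cut-off shown in the QC table later in the slides, the MAF cut-off is arbitrary).

```python
def snp_qc(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Return (call_rate, maf, passed) for one SNP.

    genotypes: list of allele counts (0, 1 or 2), None for a missing call.
    """
    called = [g for g in genotypes if g is not None]
    # Call rate: fraction of samples with a successful genotype call.
    call_rate = len(called) / len(genotypes)
    if not called:
        return call_rate, 0.0, False
    # Frequency of the counted allele, then fold to the minor allele.
    freq = sum(called) / (2 * len(called))
    maf = min(freq, 1 - freq)
    passed = call_rate >= min_call_rate and maf >= min_maf
    return call_rate, maf, passed
```

The same call-rate idea applies per sample by iterating over the SNPs of one individual instead of the individuals of one SNP.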

Page 12: 20100515 bioinformatics kapushesky_lecture06

Hardy-Weinberg Equilibrium

• If the alleles A and B have frequencies p and q, you would expect the following genotype frequencies:

AA: p²

AB: 2pq

BB: q²

Page 13: 20100515 bioinformatics kapushesky_lecture06

Hardy-Weinberg Equilibrium

• When observed genotype frequencies deviate from the ones expected under HWE, this is indicative of

• population stratification

• different mutation rates between males and females

• different fitness between alleles

• genotype calling problems

• true association at the locus
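A deviation from HWE is usually quantified by comparing observed genotype counts with the counts expected from the estimated allele frequencies. The sketch below uses a χ² goodness-of-fit test with one degree of freedom; exact tests are often preferred for small counts, and the function name is illustrative only.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    # Estimate allele frequencies p (allele A) and q (allele B) from the counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    # Expected counts under HWE: n*p^2, n*2pq, n*q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail probability of a chi-square distribution with 1 df.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value
```

Counts of 25/50/25 match the HWE expectation exactly, whereas a sample with no heterozygotes at all (50/0/50) yields a very large statistic and would be excluded by a HWE filter.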

Page 14: 20100515 bioinformatics kapushesky_lecture06

Binary or real-valued phenotypes

• Binary traits are typically disease state labels (case or control)

• Real-valued traits are quantitatively measured phenotypes

• blood sugar

• lipids

• height

• BMI

• gene expression

Page 15: 20100515 bioinformatics kapushesky_lecture06

Molecular vs disease phenotypes

• Disease phenotypes are the result of combinations of molecular phenotypes in the body

• Progression with time

• Precision of phenotype measurement

Page 16: 20100515 bioinformatics kapushesky_lecture06

Molecular vs disease phenotypes

• Many physiological phenotypes involved in disease dynamics

Page 17: 20100515 bioinformatics kapushesky_lecture06

Molecular vs disease phenotypes

Molecular phenotypes can give more precise information about disease state

Page 18: 20100515 bioinformatics kapushesky_lecture06

Association statistics

• Association statistics for binary traits are most often based on a χ² statistic computed from the genotype count table, or on a logistic regression model

• The χ² statistic measures the departure from independence between disease state and genotype

Page 19: 20100515 bioinformatics kapushesky_lecture06

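Association statistics

            aa    aA    AA    Sum
Cases       r0    r1    r2    R
Controls    s0    s1    s2    S
Count       n0    n1    n2    N

• For aa in cases, you would expect

$$\mathrm{E}[r_0] = N \cdot \frac{R}{N} \cdot \frac{n_0}{N} = \frac{R\,n_0}{N}$$

• The sum over all cells of the squared differences between observed and expected counts, each divided by the expected count, is χ²-distributed (with 2 degrees of freedom for this 2 x 3 table)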
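The genotype-table test can be written out directly from the notation of the count table. This is a minimal sketch rather than the analysis code used in the study; the function name is illustrative, and the closed-form p-value is valid only for two degrees of freedom, that is when all three genotype columns are populated.

```python
from math import exp


def genotype_chi_square(cases, controls):
    """Chi-square test of independence on a 2 x 3 genotype count table.

    cases    = (r0, r1, r2)  counts of aa, aA, AA among cases
    controls = (s0, s1, s2)  counts of aa, aA, AA among controls
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    chi2 = 0.0
    for r, s in zip(cases, controls):
        n = r + s  # column total n0, n1 or n2
        if n == 0:
            continue  # an empty genotype column contributes nothing
        # Expected counts under independence: row total * column total / N.
        exp_r = R * n / N
        exp_s = S * n / N
        chi2 += (r - exp_r) ** 2 / exp_r + (s - exp_s) ** 2 / exp_s
    # Upper tail of a chi-square distribution with 2 df has the closed form exp(-x/2).
    p_value = exp(-chi2 / 2)
    return chi2, p_value
```

For example, 30/50/20 genotypes in cases against 20/50/30 in controls gives expected counts of 25/50/25 in both rows and a statistic of 4.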

Page 20: 20100515 bioinformatics kapushesky_lecture06

Regression

• For real-valued phenotypes, use linear regression

• For binary phenotypes, use logistic regression
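For a quantitative trait, the simplest version is an ordinary least-squares fit of the phenotype on the genotype coded additively as 0, 1 or 2 copies of one allele. The sketch below shows only that single-SNP fit without covariates; the additive coding and the function name are assumptions for illustration, and the logistic case for binary traits is not shown.

```python
def additive_linear_regression(genotypes, phenotypes):
    """Least-squares fit of phenotype = intercept + beta * genotype.

    genotypes : allele counts (0, 1 or 2) per individual
    phenotypes: quantitative trait values per individual
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    # Sum of squares of the genotype and cross-product with the phenotype.
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    beta = sxy / sxx  # per-allele effect on the trait
    intercept = mean_y - beta * mean_g
    return intercept, beta
```

Here `beta` is the estimated change in the trait per additional copy of the counted allele; a real analysis would also report its standard error and adjust for covariates such as age and sex.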

Page 21: 20100515 bioinformatics kapushesky_lecture06

Population stratification

• Population stratification occurs when groups or subpopulations within your sample are more closely related than would be expected by chance

• This introduces correlations and inflates association statistics (giving spuriously small p-values), and needs to be corrected for

Page 22: 20100515 bioinformatics kapushesky_lecture06

Genomic control
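The slide carries only the title. In its usual formulation (assumed here, not taken from the slide), genomic control estimates an inflation factor λ as the median of the observed 1-df χ² statistics divided by the median expected under the null, and then divides every statistic by λ. A minimal sketch with illustrative names:

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (lambda, corrected statistics) for a list of 1-df chi-square values."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    # By convention the statistics are never inflated, so lambda is floored at 1.
    lam = max(lam, 1.0)
    return lam, [x / lam for x in chi2_stats]
```

A λ close to 1 suggests little stratification; values well above 1 indicate that the test statistics are systematically inflated across the genome.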

Page 23: 20100515 bioinformatics kapushesky_lecture06

Eigenstrat

Page 24: 20100515 bioinformatics kapushesky_lecture06

Imputation

• Using a reference population (like HapMap or 1000 Genomes), we can infer the genotypes of SNPs that were not typed

• IMPUTE and MACH are commonly used

• Yields probabilistic genotypes that need special treatment
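One common way to handle probabilistic genotypes is to collapse the three genotype probabilities into an expected allele count (a "dosage") that can be used in a regression in place of a hard 0/1/2 call. The sketch below assumes the probabilities are given in the order AA, AB, BB and counts the B allele; this ordering and the function name are illustrative and do not describe the output format of any particular tool.

```python
def expected_dosage(p_aa, p_ab, p_bb):
    """Expected number of B alleles given imputed genotype probabilities."""
    total = p_aa + p_ab + p_bb
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    # 0 copies with probability p_aa, 1 copy with p_ab, 2 copies with p_bb.
    return p_ab + 2 * p_bb
```

Using the dosage keeps some of the imputation uncertainty in the analysis, whereas picking the most likely genotype discards it.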

Page 25: 20100515 bioinformatics kapushesky_lecture06

Imputation

Wu et al, Nat. Genet. 41, 991-995, 2009

Page 26: 20100515 bioinformatics kapushesky_lecture06

Montreal GWAS

Page 27: 20100515 bioinformatics kapushesky_lecture06

Type 2 diabetes

• Blood glucose levels are regulated by insulin release

• Increased blood glucose levels trigger the release of insulin, which signals muscle cells to take up glucose

• Through β-cell dysfunction or insulin resistance, insulin regulation is impaired, leading to increased glucose levels and eventually type 2 diabetes

Page 28: 20100515 bioinformatics kapushesky_lecture06

Type 2 diabetes

Page 29: 20100515 bioinformatics kapushesky_lecture06

Genetics of type 2 diabetes

• Before GWAS, T2D genetics was studied with linkage studies and candidate gene approaches

• Results were obtained in particular for MODY variants, which are caused by disruptions of single genes

• Genome-wide association studies and SNP arrays made it possible to study complex diseases

• Five large GWAS for T2D in 2007

• DIAGRAM meta-analysis in 2008

Page 30: 20100515 bioinformatics kapushesky_lecture06

Montreal GWAS

• Part of a larger T2D project at McGill and Genome Quebec

• After initial planning for candidate gene genotyping, we switched to a GWAS strategy

Page 31: 20100515 bioinformatics kapushesky_lecture06

Multi-stage GWAS

• Two main strategies for increasing study power

• Meta-analyses increase effective sample size by combining results from different studies

• Multi-stage approaches scan the whole genome with relatively low power, followed by focusing in on the hits with higher power

• Maximizing power in a single study in a cost-effective way
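To make the meta-analysis idea concrete, the sketch below combines per-study effect estimates for one SNP with inverse-variance (fixed-effects) weighting, a standard way of pooling results. It is an illustration under the assumption that each study reports an effect size and standard error for the same allele; it is not the method used in the Montreal study, and the function name is hypothetical.

```python
from math import erfc, sqrt


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance weighted combination of per-study effect estimates."""
    # More precise studies (smaller standard error) receive larger weights.
    weights = [1 / se ** 2 for se in standard_errors]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return pooled_beta, pooled_se, p_value
```

Two equally sized studies with the same estimate leave the pooled effect unchanged but shrink its standard error by a factor of √2, which is the gain in effective sample size the slide refers to.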

Page 32: 20100515 bioinformatics kapushesky_lecture06

Multi-stage GWAS

Page 33: 20100515 bioinformatics kapushesky_lecture06

Study design

CASE-CONTROL T2D ASSOCIATION

Stage 1: Genome-wide scan - 392,365 SNPs
French (N=1,376): 679 cases, 697 controls

Fast-track confirmation - 57 SNPs
French (N=5,511): 2,617 cases, 2,894 controls
Previously published, Nature, Feb 2007

Fasting glucose, normoglycemic individuals
Stage 1: French (N=654)
Stage 2: rs560887 (N=9,353)
Previously published, Science, May 2007

Focused Stage 2 - 16,273 SNPs
French (N=4,977): 2,245 cases, 2,732 controls

Focused Stage 3 - 28 SNPs
Danish (N=7,698): 3,334 cases, 4,364 controls

QT ASSOCIATION IN POPULATIONS

Stage 4: population effect study - 1 SNP (rs2943641)
Population-based study samples: French (N=3,351), Finnish (N=5,183), Danish (N=5,824)

Page 34: 20100515 bioinformatics kapushesky_lecture06

Stage 1 samples

• French individuals: 690 cases, 670 controls

• Criteria for cases:

• T2D

• First degree relative with T2D

• Non-obese (BMI < 31 kg/m²; 25.8 ± 2.8 kg/m²)

• Controls from DESIR, a prospective French cohort

• Normal glucose tolerance for the 9 years of the study

Page 35: 20100515 bioinformatics kapushesky_lecture06

Stage 1 SNPs

• Tested on Illumina Human1 (100k) and HumanHap300 (300k)

• 392,935 unique SNPs from the combined arrays

Page 36: 20100515 bioinformatics kapushesky_lecture06

Stage 1 results

Page 37: 20100515 bioinformatics kapushesky_lecture06

Fast-track validation

• The top 57 SNPs were fast-tracked and tested on a Sequenom panel in 2,617 cases and 2,894 controls

• Relaxed criteria for cases

• BMI < 35 kg/m² (28.9 ± 3.7 kg/m²)

• Sladek et al., Nature 445, 881-885, 2007

Page 38: 20100515 bioinformatics kapushesky_lecture06

Results

SNP          Chr   Position     pMAX           Closest gene
rs7903146    10    114748339    1.5 x 10^-34   TCF7L2
rs13266634   8     118253964    6.1 x 10^-8    SLC30A8
rs1111875    10    94452862     3.0 x 10^-6    HHEX
rs7923837    10    94471897     7.5 x 10^-6    HHEX
rs7480010    11    42203294     1.1 x 10^-4    LOC387761
rs3740878    11    44214378     1.2 x 10^-4    EXT2
rs11037909   11    44212190     1.8 x 10^-4    EXT2
rs1113132    11    44209979     3.3 x 10^-4    EXT2

Page 39: 20100515 bioinformatics kapushesky_lecture06

SLC30A8

Chimienti et al. Biometals 18:313

Page 40: 20100515 bioinformatics kapushesky_lecture06

HHEX

[Figure: -log10(p) association signals and pairwise D' linkage disequilibrium map for the genotyped SNPs across the KIF11 / HHEX / IDE region]

Page 41: 20100515 bioinformatics kapushesky_lecture06

HHEX controls pancreatic development

Habener Endocrinology 146:1025

Hex homeobox gene-dependent tissue positioning is required for organogenesis of the ventral pancreas. Bort (2004)

Heart induction by Wnt antagonists depends on the homeodomain transcription factor Hex. Foley (2005)

The homeobox gene Hex is required in definitive endodermal tissues for normal forebrain, liver and thyroid formation. Martinez Barbera (2000)

Page 42: 20100515 bioinformatics kapushesky_lecture06

Stage 2

• Top 5% of GWAS hits were selected for design of a focused Stage 2

• Control for population bias with EIGENSTRAT

• iSelect array with 16,405 SNPs, tested on 2,245 cases, 2,732 controls (French)

• Analysis with EIGENSTRAT and selection of 28 SNPs for a focused Stage 3

Page 43: 20100515 bioinformatics kapushesky_lecture06

QC

Exclusion criterion          Samples
Call rate < 95%              27
Continental stratification   296
Sex mismatch                 64
Related individuals          70
Total                        457

Chromosome   SNPs     Failed HWE   Failed MAF   Successful
TOTAL        16,360   48           43           16,273

Page 44: 20100515 bioinformatics kapushesky_lecture06

EIGENSTRAT correction

[Figure: two panels - filters for MAF, HWE, call rate; filters for MAF, HWE, call rate and r²]

Page 45: 20100515 bioinformatics kapushesky_lecture06

Results - stage 1 vs stage 2

Page 46: 20100515 bioinformatics kapushesky_lecture06

Results - taking out known loci

Page 47: 20100515 bioinformatics kapushesky_lecture06
Page 48: 20100515 bioinformatics kapushesky_lecture06

Stage 3

• The top 28 SNPs were tested using a Sequenom panel in ~7,700 Danish cases and controls

• We confirm association of TCF7L2, WFS1, CDKAL1 and find one new association: rs2943641 near IRS1

Page 49: 20100515 bioinformatics kapushesky_lecture06

rs2943641

• We studied the effect of variation in rs2943641 on T2D risk and metabolic phenotypes in general populations:

• DESIR: 3,351 French adults

• Inter99: 5,824 Danish adults

• NFBC 1986: 5,183 Finnish adolescents

Page 50: 20100515 bioinformatics kapushesky_lecture06

Metabolic traits

• A variety of indexes to capture β-cell function and insulin resistance

• HOMA-B and HOMA-IR based on fasting levels of glucose and insulin

• For Inter99, we had access to OGTT data and could calculate other measures of insulin response

• time course data

• AUC

• corrected insulin response (CIR)

• disposition indexes
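The two HOMA indexes are simple functions of fasting glucose and fasting insulin. The sketch below uses the widely quoted original HOMA approximations; the slides do not state which variant was used, and the conversion of insulin from pmol/l to µU/ml with a factor of 6 is an assumption, as are the function names.

```python
# Assumed unit conversion: 1 µU/ml of insulin is taken as 6 pmol/l.
PMOL_PER_MICRO_UNIT = 6.0


def homa_ir(glucose_mmol_l, insulin_pmol_l):
    """HOMA index of insulin resistance from fasting glucose and insulin."""
    insulin_uU_ml = insulin_pmol_l / PMOL_PER_MICRO_UNIT
    return glucose_mmol_l * insulin_uU_ml / 22.5


def homa_b(glucose_mmol_l, insulin_pmol_l):
    """HOMA index of beta-cell function (%); requires glucose above 3.5 mmol/l."""
    insulin_uU_ml = insulin_pmol_l / PMOL_PER_MICRO_UNIT
    return 20 * insulin_uU_ml / (glucose_mmol_l - 3.5)
```

Both indexes rise with fasting insulin, which is why an allele associated with higher fasting insulin at similar glucose levels shows up in HOMA-IR and HOMA-B as well.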

Page 51: 20100515 bioinformatics kapushesky_lecture06

Oral Glucose Tolerance Test

Page 52: 20100515 bioinformatics kapushesky_lecture06

Metabolic traits 1

Genotype columns (C/C, C/T, T/T) refer to rs2943641.

Metabolic trait                  Cohort      C/C            C/T            T/T            P add    P dom    P rec
Age                              NFBC 1986   16             16             16
                                 DESIR       47.1 ± 9.8     47.5 ± 9.9     47.6 ± 10.1
                                 INTER99     44.9 ± 7.9     45.4 ± 7.8     45.2 ± 7.6
Sex                              NFBC 1986   1062/1092      1153/1208      322/346
                                 DESIR       645/728        728/812        216/222
                                 INTER99     776/942        974/1070       307/354
BMI (kg/m²)                      NFBC 1986   21.3 ± 3.8     21.3 ± 3.7     21.1 ± 3.5     0.24     0.43     0.21
                                 DESIR       24.5 ± 3.7     24.4 ± 3.5     24.4 ± 3.4     0.55     0.63     0.61
                                 INTER99     25.6 ± 3.9     25.4 ± 4.1     25.7 ± 4.2     0.57     0.094    0.24
Fasting plasma glucose (mmol/l)  NFBC 1986   5.13 ± 0.41    5.14 ± 0.40    5.13 ± 0.41    0.77     0.62     0.90
                                 DESIR       5.21 ± 0.44    5.20 ± 0.42    5.18 ± 0.43    0.05     0.32     0.07
                                 INTER99     5.31 ± 0.40    5.31 ± 0.41    5.33 ± 0.39    0.66     0.93     0.32
Fasting serum insulin (pmol/l)   NFBC 1986   78.7 ± 48.6    76.8 ± 44.5    71.7 ± 32.1    0.001    0.03     0.0009
                                 DESIR       50.6 ± 32.9    48.4 ± 29.7    49.1 ± 29.1    0.05     0.003    0.76
                                 INTER99     38.8 ± 24.7    36.4 ± 21.9    37.6 ± 23.3    0.018    0.0043   0.49

Page 53: 20100515 bioinformatics kapushesky_lecture06

Metabolic traits 2

Metabolic trait             Cohort      C/C             C/T             T/T             P add          P dom          P rec
HOMA-B                      NFBC 1986   141 ± 95.1      136 ± 80.1      131 ± 91.6      0.006          0.05           0.009
                            DESIR       109 ± 87.0      103 ± 64.8      108 ± 92.2      0.16           0.006          0.24
                            INTER99     75.2 ± 65.6     68.3 ± 42.2     71.0 ± 49.9     0.005          0.0011         0.32
HOMA-IR                     NFBC 1986   2.52 ± 1.63     2.47 ± 1.58     2.29 ± 1.06     0.007          0.07           0.005
                            DESIR       1.95 ± 1.35     1.86 ± 1.20     1.88 ± 1.17     0.03           0.004          0.95
                            INTER99     1.54 ± 1.00     1.44 ± 0.89     1.49 ± 0.95     0.026          0.0058         0.59
Insulin 30’                 INTER99     300 ± 183       277 ± 172       281 ± 169       0.0019         8.1 x 10^-4    0.14
Insulin 120’                INTER99     176 ± 138       163 ± 127       162 ± 124       0.0059         0.011          0.057
AUC insulin                 INTER99     22000 ± 13800   20300 ± 12900   20500 ± 12700   6.9 x 10^-4    2.2 x 10^-4    0.12
Glucose 30’                 INTER99     8.19 ± 1.53     8.17 ± 1.56     8.22 ± 1.50     0.72           0.34           0.55
Glucose 120’                INTER99     5.51 ± 1.11     5.51 ± 1.11     5.47 ± 1.15     0.54           0.99           0.23
AUC glucose                 INTER99     182 ± 101       181 ± 102       180 ± 99.5      0.44           0.48           0.59
AUC insulin / AUC glucose   INTER99     32.5 ± 17.4     30.1 ± 16.2     30.6 ± 16.1     6.0 x 10^-4    1.6 x 10^-4    0.13
CIR                         INTER99     1140 ± 4210     1000 ± 1130     1000 ± 1060     0.045          0.066          0.17
ISI                         INTER99     0.151 ± 0.095   0.16 ± 0.098    0.156 ± 0.096   0.026          0.0058         0.59
Disp. Index (CIR * ISI)     INTER99     180 ± 1610      147 ± 220       143 ± 174       0.73           1.0            0.50

Page 54: 20100515 bioinformatics kapushesky_lecture06

IRS1 locus - rs2943641

Page 55: 20100515 bioinformatics kapushesky_lecture06

IRS1

• G972R is a missense polymorphism in IRS1 that is known to impair insulin signalling (rs1801278) (Almind 1993)

• G972R associated to insulin resistance and insulin release (Clausen 1995, Sesti 2001)

• In mice, IRS1 disruption causes disrupted insulin action, both in target tissues and in β-cells (Nandi 2004)

• Also linked to insulin resistance, glucose intolerance, islet hyperplasia (Tamemoto 1994, Araki 1994, Terauchi 1997, Withers 1998)

• G972R not conclusively associated to T2D (Florez 2004, Florez 2007, Jellema 2003, Zeggini 2004)

• We detect no epistasis between rs2943641 and G972R in DESIR or NFBC, only nominal significance in Inter99

• Evidence for link between rs2943641 and IRS1?

Page 56: 20100515 bioinformatics kapushesky_lecture06

rs2943641 - IRS1 protein association

Page 57: 20100515 bioinformatics kapushesky_lecture06

rs2943641 - IRS1 protein association

Trait                                         rs2943641 CC    rs2943641 CT    rs2943641 TT    PAdd    PDom   PRec
n (male/female)                               74 (35/39)      88 (51/37)      28 (10/18)
Age (years)                                   42.5 ± 17.1     43.5 ± 16.9     43.2 ± 17.6
BMI (kg/m²)                                   25.0 ± 3.8      24.9 ± 3.9      25.3 ± 4.1      0.3     0.7    0.2
Rd insulin clamp (mg/kgFFM/min)               10.4 ± 3.5      11.0 ± 3.2      11.7 ± 3.7      0.2     0.2    0.4
Di (x 10^-7)                                  1.7 ± 1.1       1.8 ± 1.3       1.8 ± 1.1       0.8     0.8    0.9
IRS-1 protein basal (AU)                      296.7 ± 167.7   314.0 ± 155.1   413.1 ± 227.6   0.03    0.3    0.009
IRS-1 protein insulin (AU)                    276.6 ± 143.6   280.9 ± 156.4   313.3 ± 147.9   0.3     0.7    0.2
IRS-1-associated PI3K activity basal (AU)     25.0 ± 12.6     26.6 ± 15.4     30.1 ± 17.2     0.3     0.4    0.4
IRS-1-associated PI3K activity insulin (AU)   47.1 ± 29.9     56.6 ± 32.1     72.2 ± 41.3     0.001   0.02   0.002

Page 58: 20100515 bioinformatics kapushesky_lecture06

Conclusions

• The multi-stage study detected T2D risk loci that were later confirmed in other cohorts (SLC30A8, HHEX)

• Variation in rs2943641 is associated with

• T2D risk

• increased insulin levels

• impaired insulin sensitivity

• IRS1 protein levels

• IRS1 activity in the insulin signaling pathway

• The study provided a "full story" from GWAS scan to functional evidence, thanks to rich phenotyping

Page 59: 20100515 bioinformatics kapushesky_lecture06

Paper

Rung et al., Nature Genetics, 41, 1110-1115, 2009

Page 60: 20100515 bioinformatics kapushesky_lecture06

Acknowledgements

Johan Rung

Rob Sladek

Philippe Froguel

Oluf Pedersen

Constantin Polychronakos

Ghislain Rocheleau

Alexander Mazur

Lishuang Shen

David Serre

Philippe Boutin

Daniel Vincent

Alexandre Belisle

Samy Hadjadj

Beverley Balkau

Barbara Heude

Guillaume Charpentier

Tom Hudson

Sebastien Brunet

François Bacot

Rosalie Frechette

Valérie Catudal

Philippe Laflamme

Stephane Cauchi

Christian Dina

David Meyre

Christine Cavalcanti-Proença

Anders Albrechtsen

Torben Hansen

Knut Borch-Johnsen

Torsten Lauritzen

Marjo-Riitta Järvelin

Jaana Laitinen

Emmanuelle Durand

Paul Elliott

Michel Marre

Alexander Montpetit

Charlotta Pisinger

Barry Posner

Anneli Pouta

Marc Prentki

Rasmus Ribel-Madsen

Aimo Ruokonen

Anelli Sandbaek

Jean Tichet

Martine Vaxillaire

Jorgen Wojtaszewski

Allan Vaag

Page 61: 20100515 bioinformatics kapushesky_lecture06

GWAS into context

Complexity of interactions in biological systems...

Page 62: 20100515 bioinformatics kapushesky_lecture06

Complexity

...a lot of complexity

Page 63: 20100515 bioinformatics kapushesky_lecture06

[Figure: diagram with nodes labelled A to G]

Page 64: 20100515 bioinformatics kapushesky_lecture06

Redundancy

Page 65: 20100515 bioinformatics kapushesky_lecture06

Network structure

• Biological networks have a scale-free structure

Log(#edges)

Log(# genes)

Most genes have few connections

Few genes have many connections

Page 66: 20100515 bioinformatics kapushesky_lecture06

Signal propagation

• The structure of biological networks results in robustness against random errors

• Most mutations, even knockouts, can go unnoticed because of redundancy and network wiring

• Low probability to knock out a hub

Page 67: 20100515 bioinformatics kapushesky_lecture06

Common diseases

• Which is more common - disease caused by many variants with small effects, or by a few rare variants with strong effects?

• GWAS so far have by necessity focused on common variants

• Many known rare variants associated with common diseases - or phenotypes that may contribute and progress to disease

Page 68: 20100515 bioinformatics kapushesky_lecture06

Common disease / common variant

• The hypothesis that most common diseases are caused by a large number of variants, common in a general population, but each adding just a small risk

• GWAS results find many loci for common complex diseases, with small risk

• But... GWAS detected loci so far only explain a very small fraction of the observed variation

Page 69: 20100515 bioinformatics kapushesky_lecture06

Rare variants

• With improved and lower cost sequencing, we can address rare variants

• Not just SNPs

• Utility of “extreme cohorts”

• Ex. “A new highly penetrant form of obesity due to deletions on chromosome 16p11.2” (Nature Feb 4, 2010)

Page 70: 20100515 bioinformatics kapushesky_lecture06

Polygenic contributions

• Groups of non-genomewide significant SNPs proven to be associated with phenotype

• Individual SNPs cannot be singled out, just the "group action"

• Supports the idea of many weak variants responsible for effect

• Ex. “Common polygenic variation contributes to risk of schizophrenia and bipolar disorder” (Nature 460, 748-752)

Page 71: 20100515 bioinformatics kapushesky_lecture06

Meta-analysis caveats

• Meta-analysis on heterogeneous data

• Phenotypes

• Quality control

• Platforms

• Genotype calling

• Analysis

Page 72: 20100515 bioinformatics kapushesky_lecture06

Future directions for GWAS

• Sequencing is cheaper and yielding higher quality data

• Better basis for studying and detecting rare variants and their effect on diseases or phenotypes

• Copy number variants

• Genetic interactions, GxE interactions

• More samples => higher power

Page 73: 20100515 bioinformatics kapushesky_lecture06

Future directions for GWAS

• Complex phenotypes

• Association of genetic loci to

• genome-wide expression levels

• protein levels

• metabolite levels

Page 74: 20100515 bioinformatics kapushesky_lecture06

Future directions for GWAS

• More data shared => better quality of results

• As in other branches of science, data sharing, transparency and openness should be promoted

Page 75: 20100515 bioinformatics kapushesky_lecture06

Resources

• Analysis software packages
• PLINK - http://pngu.mgh.harvard.edu/~purcell/plink/
• *Abel - http://mga.bionet.nsc.ru/~yurii/ABEL/
• MERLIN - http://www.sph.umich.edu/csg/abecasis/merlin/

• Imputations
• IMPUTE - http://mathgen.stats.ox.ac.uk/impute/impute.html
• MACH - http://www.sph.umich.edu/csg/abecasis/MACH/

• Population structure
• Eigenstrat - http://genepath.med.harvard.edu/~reich/Software.htm
• EMMA(X) - http://genetics.cs.ucla.edu/emmax/index.html

• Meta-analysis
• METAL - http://www.sph.umich.edu/csg/abecasis/METAL/
• GWAMA - http://www.well.ox.ac.uk/gwama/index.shtml

• Data
• EGA - http://www.ebi.ac.uk/ega/
• dbGAP - http://www.ncbi.nlm.nih.gov/gap