Bernard Keavney Institute of Human Genetics University of Newcastle, UK. Recent developments in genetic epidemiology relevant to PURE


Objectives

• Brief revision of some genetic “basics”

• Developments 2003-2005 in genetic markers and genotyping technology

• Ethnicity, genetic variation and disease

• The potential impact of rare variants on common diseases: epidemiological and technological challenges.

Genetic contribution to cardiovascular diseases

[Figure: spectrum of genetic contribution running from genes to environment: monogenic (disease genes; e.g. HCM, LQTS), oligogenic (large-effect susceptibility genes), polygenic (small-effect susceptibility genes) and non-genetic. Conditions placed along the spectrum: congenital HD, hypertension, T II DM, atherosclerosis.]

Common variants which affect human diseases

• HLA: Autoimmunity and infection

• APOE4: Alzheimer’s, CHD, lipids

• FV Leiden: Venous thrombosis

• PPARG: Type II Diabetes

• KCNJ11: Type II Diabetes

• PTPN22: RhA, Type I Diabetes

• Insulin: Type I Diabetes

• NOD2: Crohn’s disease

• CF-H: Age-related MD

• RET: Hirschsprung disease

Candidate gene association studies: a uniquely non-replicable area of science

• Six of 166 replicated in >75% of studies (4%)*

• Study sizes too small

• Statistical significance levels not stringent enough

• Meta-analyses: problem of publication bias

• Most conducted in urban Western Caucasian populations

• Minimal environmental heterogeneity within individual studies

• Minimal amount of “gene space” tested

*Hirschhorn et al. Genet. Med. 2002

Genome figures

• The human genome: 3,200,000,000 base pairs

• 5% gene coding regions (1% expressed sequence)

• Noncoding regulatory elements are situated near genes

• 20,000 genes

• Any two genomes: 99.9% identical

• 3.2M differences between any two individuals

• 11,000,000 sites vary in at least 1% of the world’s population (Polymorphisms)

• Every site compatible with life has been mutated several times in this generation alone
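The figures on this slide follow from one another by simple arithmetic. The sketch below only reproduces that arithmetic from the numbers quoted above; the function and variable names are illustrative and not part of the original talk.

```python
# Arithmetic behind the "genome figures" slide.
# All input numbers are taken from the slide; names are illustrative only.

GENOME_BP = 3_200_000_000      # base pairs in the human genome
DIFF_PER_THOUSAND = 1          # 99.9% identical -> 1 difference per 1,000 bp
POLYMORPHIC_SITES = 11_000_000 # sites varying in at least 1% of people


def expected_differences(genome_bp: int, diff_per_thousand: int) -> int:
    """Expected number of differing sites between two genomes."""
    return genome_bp * diff_per_thousand // 1000


def percent_of_genome(genome_bp: int, percent: int) -> int:
    """Number of base pairs making up `percent` of the genome."""
    return genome_bp * percent // 100


differences = expected_differences(GENOME_BP, DIFF_PER_THOUSAND)
coding_bp = percent_of_genome(GENOME_BP, 5)
expressed_bp = percent_of_genome(GENOME_BP, 1)
mean_spacing_bp = GENOME_BP / POLYMORPHIC_SITES

print(f"Differences between two genomes: {differences:,}")
print(f"Gene coding regions (5%): {coding_bp:,} bp")
print(f"Expressed sequence (1%): {expressed_bp:,} bp")
print(f"Mean spacing of polymorphic sites: {mean_spacing_bp:.0f} bp")
```

So 99.9% identity across 3.2 billion base pairs corresponds to the 3.2M differences quoted in the next bullet.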

Single nucleotide polymorphisms (SNPs): the mapping tool for association studies

CAACTGTGTAGGTTGAG

CAACTGTGTTGGTTGAG

Between 2000 and 2005, 10 million SNPs were identified.

For mapping, focus hitherto on common SNPs (MAF > 0.05):

• Ancient

• Power to detect a given effect is greater

• 90% of human variation is due to common alleles

• Most common variants are found in all world populations

• Technology to find rare variants has not been available thus far

Expect one common SNP every ~600 bp

Total of 7M genomewide…

…Which ones to type? And how many?

Coding (amino acid change): minority

Noncoding: some regulatory

[Figure: SNPs in dbSNP 2000-2005]

The degree of association between a disease allele and a marker allele determines power

[Figure: Marker SNP, Causal SNP, Disease. Testing two associations in one. Example chromosomes carrying alleles D or H at Locus 1 and alleles A or B at Locus 2: D-A, H-B, D-B, H-A.]

The arrangement of two or more alleles on a chromosome is called a haplotype
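The "degree of association" between a causal allele and a marker allele is usually summarised by the linkage disequilibrium measures D, D' and r². The sketch below computes them from four haplotype frequencies using the D/H and A/B allele labels from the slide. The frequencies themselves are hypothetical, chosen only to illustrate the calculation, and the function name is not from the talk. The 1/r² sample-size inflation in the last line is the standard rule of thumb for typing a marker instead of the causal SNP.

```python
# Minimal sketch: linkage disequilibrium between a causal locus (alleles D/H)
# and a marker locus (alleles A/B), from haplotype frequencies.
# The frequencies below are HYPOTHETICAL illustration values.

def ld_measures(hap_freq: dict) -> tuple:
    """Return (D, D_prime, r_squared) for haplotypes keyed 'DA','DB','HA','HB'."""
    p_d = hap_freq["DA"] + hap_freq["DB"]   # frequency of allele D at locus 1
    p_a = hap_freq["DA"] + hap_freq["HA"]   # frequency of allele A at locus 2
    d = hap_freq["DA"] - p_d * p_a          # raw disequilibrium coefficient
    # Largest |D| possible given the allele frequencies (both loci polymorphic)
    if d >= 0:
        d_max = min(p_d * (1 - p_a), (1 - p_d) * p_a)
    else:
        d_max = min(p_d * p_a, (1 - p_d) * (1 - p_a))
    d_prime = abs(d) / d_max
    r_squared = d * d / (p_d * (1 - p_d) * p_a * (1 - p_a))
    return d, d_prime, r_squared


hypothetical = {"DA": 0.4, "HB": 0.4, "DB": 0.1, "HA": 0.1}
d, d_prime, r_squared = ld_measures(hypothetical)

print(f"D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r_squared:.2f}")
# Rule of thumb: typing the marker instead of the causal SNP requires
# roughly 1 / r^2 times as many samples for the same power.
print(f"Sample size inflation: {1 / r_squared:.2f}x")
```

With these illustrative frequencies r² is 0.36, so a study typing only the marker would need nearly three times the sample size to match the power of typing the causal SNP directly.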


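[Figure: chromosomes carrying a marker allele (M) and a disease allele (D), shown in the founding generation and again after n generations of recombination]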

Chromosomes are mosaics reflecting ancestral haplotypes
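The erosion of marker-disease association "after n generations" can be sketched with the standard population-genetic result that disequilibrium decays by a factor (1 - θ) per generation, where θ is the recombination fraction between the two loci. The values of θ and n below are illustrative assumptions, not figures from the talk.

```python
import math

# Minimal sketch of linkage disequilibrium decay under recombination:
# D_n = D_0 * (1 - theta)^n, with theta the recombination fraction per
# generation between marker and disease locus. Example values are assumptions.

def ld_after_generations(d0: float, theta: float, n: int) -> float:
    """Disequilibrium remaining after n generations."""
    return d0 * (1.0 - theta) ** n


def generations_to_halve(theta: float) -> float:
    """Generations needed for disequilibrium to fall to half its start value."""
    return math.log(0.5) / math.log(1.0 - theta)


theta = 0.01  # assumed: 1% recombination per generation between the loci
for n in (10, 100, 1000):
    remaining = ld_after_generations(1.0, theta, n)
    print(f"after {n:>4} generations: {remaining:.4f} of initial D remains")

print(f"half-life: {generations_to_halve(theta):.1f} generations")
```

Closely spaced loci (small θ) keep their association for many generations, which is why short ancestral segments around an old disease allele still carry the original marker alleles.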

ACE gene diagram

Position of 10 polymorphisms typed at the ACE locus

2^10 (1,024) haplotypes could be generated from these genotypes

T A T A T C G I A 3

T A T A T T G I A 3

T A T A T C A I A 3

C C C T C C A D G 2

C C C T C C G D G 2

T A C A T C A D G 2

T A T A T C A D G 2

[Figure: ACE haplotype tree showing Clade A, Clade B and Clade C]

Keavney et al 1998
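The contrast on this slide is between the number of haplotypes that 10 two-allele polymorphisms could in principle generate and the seven listed. The sketch below simply counts the possibilities; alleles are coded 0/1 for illustration and the function names are not from the talk.

```python
from itertools import product

# Minimal sketch: how many haplotypes can n biallelic polymorphisms form?
# Alleles are coded "0"/"1" purely for illustration.

def possible_haplotypes(n_sites: int) -> int:
    """Each site has 2 alleles, so n sites give 2^n possible haplotypes."""
    return 2 ** n_sites


def enumerate_haplotypes(n_sites: int) -> list:
    """Explicitly list every possible haplotype as a string of 0s and 1s."""
    return ["".join(alleles) for alleles in product("01", repeat=n_sites)]


n_sites = 10       # polymorphisms typed at the ACE locus
listed = 7         # haplotypes listed on the slide

total = possible_haplotypes(n_sites)
print(f"{n_sites} biallelic sites -> {total} possible haplotypes")
print(f"haplotypes listed on the slide: {listed} ({listed / total:.1%} of the possible set)")
```

That so few of the possible combinations are listed reflects the strong linkage disequilibrium across the locus.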

Oct 2005: Characterisation of most of the common genetic variation present genomewide in four world populations

HapMap project

• Phase I: 1 common SNP (MAF>0.05) every 5 Kb in 269 DNA samples (1 million SNPs)

• Yoruba from Ibadan, Nigeria

• European ancestry from Utah, US

• Han Chinese from Beijing

• Japanese from Tokyo

• 10 x 500Kb regions

• Resequenced in 48 individuals

• All SNPs genotyped in 269 samples

• Phase II: 4 million common SNPs

• Goal: to assess feasibility of whole-genome association studies and provide the “road map” of SNPs to type

HapMap phase I data

[Figure: recombination rates, haplotype lengths and gene location on chromosome 9q13]

The POMC gene

[Figure: gene structure from the 5’ end: Exon 1 (85 bp), Intron 1 (3709 bp), Exon 2 (151 bp), Intron 2 (2887 bp), Exon 3 (833 bp). Polymorphisms marked: RsaI, C1032G, C8246T.]

There are no common polymorphisms in the translated sequence

Baker et al. Diabetes 2005

POMC C8246T genotype

[Figure: adjusted standardised WHR (y-axis, -0.2 to 0.5) by C8246T genotype (T/T, C/T, C/C); means (95% CIs); P<0.0001]

WHR adjusted for age, sex, smoking, alcohol, exercise, with or without BMI

Difference 0.2 SD per allele

P=0.003 for C1032G; p=NS for RsaI

N=1426

Baker et al. Diabetes 2005

Genome-wide association studies are feasible: HapMap data

Chip-based genotyping now makes it possible to type 500,000 SNPs in a single individual.

Chip-based WGA study using 116,204 SNPs identified the role of Factor H in AMD (Klein et al. April 2005)

The within-population component of genetic variation accounts for most of human genetic diversity

Rosenberg et al. Science 2003

• 1052 individuals from 52 populations; 377 autosomal microsatellites

• 47% of 4199 alleles present in all regions

• 7% alleles region-specific; median q=0.01

Few SNPs rare in one panel are common in another

HapMap 2005

Ioannidis et al. Nat Genet. 2004

Heterogeneity of allele frequencies and disease O.R.s in meta-analyses of 43 gene-disease associations

I2=75% shown by red line
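The I² statistic referred to here expresses how much of the variation between study results reflects real heterogeneity rather than chance. A minimal sketch of the usual calculation from Cochran's Q follows; the per-study log odds ratios and variances are hypothetical values, not data from Ioannidis et al.

```python
# Minimal sketch: Cochran's Q and the I^2 heterogeneity statistic for a
# fixed-effect meta-analysis. The study estimates are HYPOTHETICAL.

def cochran_q(effects: list, variances: list) -> float:
    """Weighted sum of squared deviations from the pooled estimate."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(q: float, n_studies: int) -> float:
    """I^2 = 100 * (Q - df) / Q, truncated at zero, with df = n_studies - 1."""
    df = n_studies - 1
    if q <= df:
        return 0.0
    return 100.0 * (q - df) / q


# Hypothetical log odds ratios and their variances from four studies
log_odds_ratios = [0.10, 0.45, -0.05, 0.60]
variances = [0.01, 0.02, 0.015, 0.03]

q = cochran_q(log_odds_ratios, variances)
print(f"Q = {q:.2f} on {len(log_odds_ratios) - 1} df")
print(f"I^2 = {i_squared(q, len(log_odds_ratios)):.0f}%")
```

An I² of 75%, the threshold marked on the slide, would mean that three quarters of the observed between-study variation is attributed to heterogeneity rather than sampling error.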

Disease-causing variants: common or rare alleles?

With a few exceptions (e.g. ACE I/D and plasma ACE) this is empirically confirmed

Leptin gene polymorphisms and cardiovascular risk

20 Kb shown

All common haplotypes at LEP are captured by these markers

C538T is a rare allele (q<0.01)

Gaukrodger et al. 2005

LEP C538T polymorphism, arterial stiffness and carotid IMT

Trait                  Parameter        Estimate (SE)   95% CI
Pulse pressure         Displacement*    1.00 (0.31)     0.39 – 1.61
                       Polygenic h2$    0.24 (0.06)     0.12 – 0.36
Mean IMT               Displacement     0.90 (0.36)     0.19 – 1.61
                       Polygenic h2     0.20 (0.07)     0.06 – 0.34
Residual correlation                    0.13 (0.04)     0.04 – 0.21

Gaukrodger et al. JMG 2005

Rare alleles with large effect contribute to HDL cholesterol variation in the “normal range”

[Figure: coding regions of APOA1, ABCA1 and LCAT sequenced in 128 individuals with high HDLC (>95%) and 128 with low HDLC (<5%)]

          Low HDLC   High HDLC
Var +     21         3
Var -     107        125

• Variants affected function

• Replicated in 2nd population

• No association between HDLC and common variants in these genes

• 1/6 of those with HDLC <5% had a mutation

• These would be missed by a “common variant only” strategy

Cohen et al. Science 2004
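The strength of the enrichment in the 2x2 table can be checked directly from the four counts on the slide. The sketch below computes the odds ratio and a one-sided Fisher exact p-value using only the standard library; it is an illustrative re-calculation, not necessarily the analysis used by Cohen et al.

```python
from math import comb

# Minimal sketch: odds ratio and one-sided Fisher exact test for the
# 2x2 table of variant carriers by HDL cholesterol group.
# Counts are taken from the slide; the choice of test is an assumption.

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(at least `a` carriers fall in the first group), margins fixed."""
    carriers = a + b              # row total: variant carriers
    group1 = a + c                # column total: first group
    total = a + b + c + d
    upper = min(carriers, group1)
    favourable = sum(
        comb(carriers, k) * comb(total - carriers, group1 - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(total, group1)


# Var+ / Var- counts in the low-HDLC and high-HDLC groups
var_pos_low, var_pos_high = 21, 3
var_neg_low, var_neg_high = 107, 125

ratio = odds_ratio(var_pos_low, var_pos_high, var_neg_low, var_neg_high)
p_value = fisher_one_sided(var_pos_low, var_pos_high, var_neg_low, var_neg_high)
carrier_fraction = var_pos_low / (var_pos_low + var_neg_low)

print(f"Odds ratio: {ratio:.2f}")
print(f"One-sided Fisher exact p: {p_value:.2e}")
print(f"Carriers among low HDLC: {carrier_fraction:.1%}")
```

The carrier fraction in the low-HDLC group, 21 of 128, is the "1/6" quoted in the bullets above.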

High-throughput sequencing technologies from September 2005 issues of “Science” and “Nature”

Conclusions

• Technological progress is very rapid: prospect of WGA scans on large numbers of samples in near future

• Many studies (e.g. UK Biobank) focus on gene-environment interaction, but environmental heterogeneity within them is often minimal

• There remains a pressing need to describe and validate genetic associations with CVD in populations other than US and Western European Caucasians
