Genetics for Imagers: How Geneticists Model

Quantitative Phenotypes

Nelson Freimer

UCLA Center for Neurobehavioral Genetics

What makes a genetic association significant?

Outline

• The problem of achieving validated findings in psychiatric genetics

• Approaches to genetic mapping and statistical significance

- linkage analysis (+ examples)

- association analysis (+ examples)

Psychiatric genetics: The brains of the family

10 July 2008 | Nature 454, 154-157 (2008)

Does the difficulty in finding the genes responsible for mental illness reflect the complexity of the genetics or the poor definitions of psychiatric disorders?

“The studies so far are statistically underpowered. We need bigger studies.” - Jonathan Flint

“Geneticists know nothing about psychiatric disease.” - Daniel Weinberger

WHAT IS THE PROBLEM?

• Psychiatric disorders are highly heritable

• No psychiatric susceptibility genes known

• Studies so far are underpowered

– Phenotypes are of uncertain validity

– Samples are too small and markers too few

– Signal to noise ratio is too low (etiological heterogeneity: genetic and non-genetic)

“We are just too ignorant of the underlying neurobiology to make guesses about candidate genes.” - Steven Hyman

This is why geneticists have turned to genome-wide mapping

Genome-wide mapping and allelic architecture

Allelic architecture and genetic mapping approaches

[Figure: grid of effect size (small, large) against disease gene allele frequency (rare, <1%; common, >5%), with regions labeled LINKAGE, ASSOCIATION (family-based or case-control), NOT FOUND TO DATE, and COPY NUMBER VARIANTS.]

[Figure: a disease gene IBD region on a founder chromosome and the shared IBD region in present-day affected individuals. IBD = Identical By Descent.]

The Principle of Genetic Linkage

If genes are located on different chromosomes they show independent assortment.

compute this probability.

However, genes on the same chromosome, especially if they are close to each other, tend to be passed on to their offspring in the same configuration as on the parental chromosomes.

Genetic markers: SNPs

Detecting Genetic Linkage: Linkage Analysis vs Association Analysis

• Linkage Analysis

– Using pedigree samples, search for regions of the genome where affected individuals share alleles more than you would expect

• Association Analysis

– Compare allele frequency distributions in cases and controls

• For quantitative traits can apply similar principles

[Figure: example G/T genotypes (G,T and T,T) in two panels, one labeled Association Analysis and one labeled Linkage Analysis.]

When are two genetic loci significantly linked?

Stringent significance thresholds based on…

• Low prior probability of linkage between any two loci

– Considered when there were few markers

• Multiple tests involved in genotyping studies

– Considered after there were many markers

• Both considerations yielded ~ same threshold:

LOD score (log base 10 of the likelihood ratio) >~ 3 (i.e. p < 10^-4)
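To make the LOD score concrete, here is a minimal sketch under a simplifying assumption that is not on the slides: every meiosis is phase-known and fully informative, so the likelihood depends only on the recombination fraction theta. The function name `lod_score` and its arguments are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int) -> float:
    """Maximum LOD score for phase-known, fully informative meioses.

    Assumed model (illustrative only): likelihood under linkage is
    theta^k * (1 - theta)^(n - k); under no linkage theta is 0.5.
    """
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses and meioses > 0")
    # Maximum-likelihood recombination fraction, capped at 0.5 (no linkage).
    theta = min(recombinants / meioses, 0.5)
    non_recombinants = meioses - recombinants
    # Build log10 likelihood term by term, skipping zero counts to avoid log10(0).
    log_linked = 0.0
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    if non_recombinants:
        log_linked += non_recombinants * math.log10(1 - theta)
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


# Ten meioses with no recombinants give a LOD of 10 * log10(2), just above 3.
print(lod_score(recombinants=0, meioses=10))
```

Under this assumed model, about ten informative non-recombinant meioses are needed before the evidence reaches the conventional threshold of 3.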

• Prior probability of linkage between a given locus and a random genome location: 0.02

• To obtain a posterior probability of linkage of >0.95 (i.e. <0.05 false positive linkages), apply Bayes theorem:

Pr(Linkage | Data) = Pr(Data | Linkage) Pr(Linkage) / [Pr(Data | Linkage) Pr(Linkage) + Pr(Data | NoLinkage) Pr(NoLinkage)]

• Solving for the likelihood ratio Pr(Data | Linkage) / Pr(Data | NoLinkage)…

– ratio must be roughly >1,000, i.e. LOD >3
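The Bayesian argument can be checked numerically. The sketch below rewrites Bayes theorem in odds form (posterior odds = prior odds x likelihood ratio) and solves for the likelihood ratio, using the prior of 0.02 and the target posterior of 0.95 quoted above. The function name is hypothetical.

```python
import math


def required_likelihood_ratio(prior: float, posterior: float) -> float:
    """Likelihood ratio needed to move a prior probability to a target posterior."""
    prior_odds = prior / (1 - prior)
    posterior_odds = posterior / (1 - posterior)
    # Bayes theorem in odds form: posterior odds = prior odds * likelihood ratio.
    return posterior_odds / prior_odds


lr = required_likelihood_ratio(prior=0.02, posterior=0.95)
lod = math.log10(lr)
print(f"likelihood ratio > {lr:.0f}, LOD > {lod:.2f}")
```

The exact value is about 931 (LOD about 2.97), which rounds to the familiar threshold of 1,000 (LOD 3).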

Controlling for multiple testing in linkage

• With complete genome marker sets, the prior probability that some marker is linked is 1

• ~500 fully informative, independent markers cover linkage in all regions of the genome

• To control, at the 0.05 level, the global hypothesis of no linkage anywhere in the genome: 0.05/500 = 10^-4 for each test, i.e. LOD >3

• Suggestive linkage: a LOD score or p value expected to occur once by chance in a whole genome scan.

LOD >2.2, p < 7.4 x 10^-4

• Significant linkage: a LOD score or p value expected to occur by chance 0.05 times in a whole genome scan.

LOD >3.6, p < 2.2 x 10^-5

• Highly significant linkage: a LOD score or p value expected to occur by chance 0.001 times in a whole genome scan.

LOD >5.4, p < 3 x 10^-7

• Confirmed linkage: a significant linkage observed in one study is confirmed by finding a LOD score or p value expected to occur 0.01 times by chance in a specific search of the candidate region.

Significance thresholds for linkage (Lander and Kruglyak, 1996)
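The paired LOD and p values above can be related with a short calculation. The sketch assumes the usual large-sample approximation, which the slides do not state: 2 ln(10) x LOD is treated as a chi-square statistic with 1 degree of freedom, and the p value is one-sided. The function name is hypothetical, and only the standard library is used.

```python
import math


def lod_to_pointwise_p(lod: float) -> float:
    """Approximate pointwise p value for a LOD score.

    Assumption: 2 * ln(10) * LOD follows a chi-square distribution with
    1 degree of freedom under no linkage, and the test is one-sided.
    """
    chi2 = 2 * math.log(10) * lod
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)); halve for one side.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2))


for lod in (2.2, 3.0, 3.6, 5.4):
    print(f"LOD {lod}: p ~ {lod_to_pointwise_p(lod):.1e}")
```

Under these assumptions the results land close to the tabulated thresholds; small differences are expected from rounding.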

An example of linkage to a quantitative neurobehavioral trait

Monoamine Neurotransmitters

• Norepinephrine and epinephrine: attention, blood pressure

• Histamine: gastric acid release, immune response

• Dopamine: reward

• Serotonin: appetite, mood, gastrointestinal motility

From David Krantz

Catecholamine Synthesis and Degradation

Genome-wide linkage analysis of HVA in a vervet monkey pedigree

Vervet research colony pedigree

Heritability of Monoamine Metabolites in vervet monkeys

[Figure: bar chart of monoamine metabolites (5-HIAA, HVA, MHPG) showing proportion of variance, on an axis from 0 to 0.8, for two components: h2 (genetic) and c2 (maternal).]

HVA level in Vervets on Chromosome 10

Linkage analysis in extended pedigrees may be powerful for structural MRI phenotypes

Brain MRIs in the VRC

357 Vervets scanned

Mobile Siemens Symphony 1.5 Tesla scanner

Genetic association analysis

Linkage analysis is not very powerful for mapping complex traits (with many alleles of small effect)

Disease gene discovery methods

[Figure: grid of effect size (small, large) against disease gene allele frequency (rare, <1%; common, >5%), with regions labeled LINKAGE, ASSOCIATION (family-based or case-control), NOT FOUND TO DATE, and COPY NUMBER VARIANTS.]

Significance thresholds for association

Consider a simple Bayesian argument:

- Prior probability that a random gene is associated with the trait: ~1/30,000, assuming 30,000 genes/genome

- Likelihood ratio should be > 550,000 for association to be significant (posterior probability >0.95)

- With χ2 test, p < 2.6 x 10^-7

A more complete evaluation of significance

Posterior odds = Prior odds x Power (for true association) / Significance

• Strength of evidence depends on likely number of true associations and power to detect them

• These depend on effect sizes and sample sizes

• Less well-powered studies need more stringent thresholds to control false-positive rate

See Wacholder et al., J. National Cancer Institute 2004
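The posterior-odds relation can be explored with a few lines of code. In this sketch the prior of 1/30,000 and the thresholds 0.05 and 2.6 x 10^-7 come from the slides, while the power values (0.8 and 0.2) are hypothetical numbers chosen only for illustration. Function names are also hypothetical.

```python
def posterior_odds(prior_prob: float, power: float, alpha: float) -> float:
    """Posterior odds of a true association given a significant result.

    Implements: posterior odds = prior odds * power / significance level.
    """
    prior_odds = prior_prob / (1 - prior_prob)
    return prior_odds * power / alpha


def odds_to_probability(odds: float) -> float:
    """Convert odds to a probability."""
    return odds / (1 + odds)


prior = 1 / 30_000  # prior from the slides: one gene in ~30,000
for alpha in (0.05, 2.6e-7):
    for power in (0.8, 0.2):  # hypothetical power values
        prob = odds_to_probability(posterior_odds(prior, power, alpha))
        print(f"alpha={alpha:g}, power={power}: P(true association) ~ {prob:.4f}")
```

With this prior, a result at p < 0.05 leaves the association very unlikely to be real even at high power, and at any fixed threshold the lower-powered study yields the weaker posterior.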

Genome wide association thresholds

• Controlling for multiple testing. E.g. Bonferroni: 0.05 / (No. of SNPs x No. of traits)

E.g. for a single trait with 10^6 SNPs, p < 5 x 10^-8

• However, it is more complicated…

– SNPs are not all independent (LD)

– LD varies across the genome and populations

– traits are not all independent

• False discovery rate (FDR) increasingly used (proportion of false positives among all positives)

…if 1 out of 20 hits is false, that is not so bad
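The two approaches to multiple testing can be sketched as follows. The Bonferroni function reproduces the arithmetic above. The slides do not name an FDR procedure, so the Benjamini-Hochberg step-up method is used here as an assumption, and the p values in the example are made up for illustration.

```python
def bonferroni_threshold(alpha: float, n_snps: int, n_traits: int = 1) -> float:
    """Per-test threshold controlling the family-wise error rate at alpha."""
    return alpha / (n_snps * n_traits)


def benjamini_hochberg(p_values: list[float], q: float = 0.05) -> list[int]:
    """Indices of tests declared significant at FDR level q (step-up procedure)."""
    m = len(p_values)
    # Rank tests from smallest to largest p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p value falls under its threshold q * rank / m.
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


print(bonferroni_threshold(0.05, n_snps=10**6))  # about 5e-08

# Made-up p values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example_p, q=0.05))
```

Neither function accounts for linkage disequilibrium between SNPs or correlation between traits, which is the complication the slide raises.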

Evaluating association in neurobehavioral genetics studies

Serotonin Transporter Promoter Polymorphism Association Studies as of 2002

Number of studies by phenotype and result:

Phenotype                 P<.05   P>.05
Schizophrenia             2       7
BP/mood disorder          8       13
OCD                       2       2
Personality traits        12      10
Drug response             3       0
Suicide                   4       1
Anorexia                  0       2
Late Onset Alzheimer’s    2       2
Smoking related           4       1
Alcohol related           5       2
Autism                    2       2
Fibromyalgia              1       0
Panic disorder            0       3

Association of Anxiety-Related Traits with Polymorphism in the Serotonin Transporter Gene Regulatory Region

Lesch et al. Science. 1996;274(5292):1527-31.

• Two samples (N = 221, N = 284)

• Association with P ~ 0.02

In large samples: No association of 5HTTLPR with temperament

Example from Northern Finland Birth Cohort, N ~ 4000

Influence of Life Stress on Depression: Moderation by a Polymorphism in the 5-HTT Gene

Caspi et al. Science 301: 386-389 (2003)

Interaction Between the Serotonin Transporter Gene (5-HTTLPR), Stressful Life Events, and Risk of Depression: A Meta-analysis

Risch et al. JAMA. 2009;301(23):2462-2471.

Logistic Regression Analyses of Risk of Depression for 14 Studies

Genomewide association analysis

Progress in identifying gene variants for common traits

[Figure: timeline, 2000 to 2007, of loci identified for common traits. Traits shown: cholesterol, obesity, myocardial infarction, QT interval, atrial fibrillation, type 2 diabetes, prostate cancer, breast cancer, colon cancer, height, age-related macular degeneration, Crohn's disease, type 1 diabetes, systemic lupus erythematosus, asthma, restless leg syndrome, gallstone disease, multiple sclerosis, rheumatoid arthritis, glaucoma. Loci labeled include KCNJ11, IBD5, NOD2, CTLA4, PTPN22, CFH, TCF7L2, IL23R and FTO, among many others.]

Slide from David Altshuler

HDL Association at 16q22.1

HDL Association near LIPC

A success story in neuropsychiatry

Genome-wide association in narcolepsy in Japan (222 cases vs 389 controls)

[Figure: Manhattan plot of -log10(P value), axis marked from 2 to 8, across chromosomes 1 to 22, with the HLA signal labeled.]

From Emmanuel Mignot

Narcolepsy is strongly associated with the T-cell receptor alpha locus

J. Hallmayer et al. Nature Genetics 41, 708-711 (2009)

~2000 cases in GWAS + ~2000 cases in replication

Analysis of rs1154155 Genotypes in Three Replication Cohorts and Combined

Genotype counts are cases/controls; odds ratios (OR) are shown with their intervals in parentheses.

Ethnicity          AA       AC       CC       OR(AC)             OR(CC)              OR(C)
African Americans  90/117   23/20    0/1      1.50 (0.74, 3.04)  0.00 (0.00, 22.90)  1.31 (0.68, 2.52)
Asians             86/161   296/318  167/120  1.74 (1.27, 2.39)  2.61 (1.81, 3.76)   1.54 (1.30, 1.83)
Caucasians         201/259  132/83   10/6     2.05 (1.45, 2.89)  2.15 (0.70, 6.77)   1.80 (1.35, 2.41)
Replication (MH)                              1.83 (1.48, 2.27)  2.50 (1.80, 3.48)   1.59 (1.38, 1.83) *
All Samples (MH)                              1.94 (1.68, 2.25)  2.55 (1.92, 3.38)   1.69 (1.52, 1.88) **

* χ2 = 42.9, P = 5.9 x 10^-11
** χ2 = 94.2, P = 2.8 x 10^-22
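As a rough check on how the genotype odds ratios relate to the counts, the sketch below computes each per-cohort odds ratio against the AA genotype as the reference group. This is the plain cross-product estimate, which may not be the exact method used in the paper, and the function name is hypothetical. The counts are copied from the table for rs1154155.

```python
def odds_ratio(case_geno: int, ctrl_geno: int, case_ref: int, ctrl_ref: int) -> float:
    """Cross-product odds ratio for a genotype relative to a reference genotype."""
    return (case_geno * ctrl_ref) / (ctrl_geno * case_ref)


# (cases, controls) per genotype, copied from the rs1154155 table.
cohorts = {
    "African Americans": {"AA": (90, 117), "AC": (23, 20), "CC": (0, 1)},
    "Asians": {"AA": (86, 161), "AC": (296, 318), "CC": (167, 120)},
    "Caucasians": {"AA": (201, 259), "AC": (132, 83), "CC": (10, 6)},
}

for name, counts in cohorts.items():
    ref_cases, ref_ctrls = counts["AA"]
    or_ac = odds_ratio(*counts["AC"], ref_cases, ref_ctrls)
    or_cc = odds_ratio(*counts["CC"], ref_cases, ref_ctrls)
    print(f"{name}: OR(AC) ~ {or_ac:.2f}, OR(CC) ~ {or_cc:.2f}")
```

The point estimates agree with the tabulated per-cohort odds ratios to within rounding. The pooled (MH) rows and the intervals are not reproduced here.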

Strong genome-wide evidence

Known genes and environment explain little of trait variance

Sequencing: the currently unexplored middle of the allelic spectrum

Whole genome sequencing is coming soon…

But we don’t have very good models for it yet

Summary

• The allelic spectrum of complex traits determines the appropriate genetic mapping approach

• Genetic linkage and association studies require stringent statistical thresholds

• Findings from single candidate gene studies have a very low probability of being true positives

• Genome-wide linkage and association studies are beginning to bear fruit for neurobehavioral traits

• Whole-genome sequencing is just around the corner
