Lecture 25: Association Genetics
November 30, 2012
Announcements
Final exam on Monday, Dec 10 at 11 am, in 3306 LSB
2010 exam and study sheets posted on website
Exam is mostly non-cumulative
Review session on Friday, Dec. 7
Extra credit lab next Wednesday: up to 10 points
Extra credit report due at final exam
Last Time
Quantitative traits
Genetic basis
Heritability
Linking phenotype to genotype
QTL analysis introduction
Limitations of QTL
Today
Association genetics
Effects of population structure
Transmission Disequilibrium Tests
Quantitative Trait Locus Mapping
[Figure: height plotted against genotype (BB, Bb, bb)]
modified from D. Neale
[Figure: cross between Parent 1 and Parent 2 (haplotypes ABC and abc), followed by F1 x F1; progeny carry recombinant haplotypes (e.g., ABc, aBc, Abc) and genotypes BB, Bb, or bb at the marker locus]
QTL for aggressive behavior in mice
X chromosome
Monoamine Oxidase A (MAOA)
Brodkin et al. 2002
http://people.bu.edu/jcherry/webpage/pheromone.htm
[Figure: cross diagram with parental haplotypes ABC and abc, the F1, and recombinant progeny haplotypes (ABc, aBc, Abc)]
Monoamine Oxidase A (MAOA)
Selectively degrades serotonin, norepinephrine, and dopamine
Located near QTL for aggressive behavior on the X chromosome
Levels of expression affected by a VNTR (minisatellite) locus in the promoter region
Sabol et al. 1998
MAOA and childhood maltreatment
Caspi et al. 2002
Genotype-by-Environment interaction
QTL Limitations
Biased toward detection of large-effect loci
Need very large pedigrees to do this properly
Limited genetic base: QTL may only apply to the two individuals in the cross!
Genotype x Environment interactions rampant: some QTL only appear in certain environments
Huge regions of genome underlie QTL, usually hundreds of genes
How to distinguish among candidates?
Linkage Disequilibrium and Quantitative Trait Mapping
Linkage and quantitative trait locus (QTL) analysis
Need a pedigree and moderate number of molecular markers
Very large regions of chromosomes represented by markers
Association Studies with Natural Populations
No pedigree required
Need large numbers of genetic markers
Small chromosomal segments can be localized
Many more markers are required than in traditional QTL analysis
Cardon and Bell 2001, Nat. Rev. Genet. 2: 91-99
Association Mapping
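[Figure: a mutation (*) arises on an ancestral chromosome with haplotype TG; after recombination through evolutionary history, present-day chromosomes in a natural population carry *TG, *TA, CG, CA, *TG, and CA]
Slide courtesy of Dave Neale
[Figure: height plotted against genotype (CC, CT, TT)]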
Candidate Gene Associations vs. Whole Genome Scans
If LD is high and haplotype blocks are conserved, entire genome can be efficiently scanned for associations with phenotypes
Simplest for case-control studies (e.g., disease, gender)
If LD is low, candidate genes are usually identified a priori, and a limited number are scanned for associations
Biased by existing knowledge
Use "Candidate Regions" from high LD populations, assess candidate genes in low LD populations
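Whether LD is "high" or "low" between two sites can be quantified from haplotype counts. The following is a minimal sketch, assuming the common r² measure of LD and using made-up two-site haplotypes; the function name `ld_r2` and the example data are illustrative only, and both sites are assumed to be polymorphic.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic sites, from 2-character haplotype strings."""
    n = len(haplotypes)
    # Use the alleles on the first haplotype as the reference alleles.
    ref_a, ref_b = haplotypes[0][0], haplotypes[0][1]
    p_a = sum(h[0] == ref_a for h in haplotypes) / n
    p_b = sum(h[1] == ref_b for h in haplotypes) / n
    p_ab = sum(h == ref_a + ref_b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical samples: complete LD versus no LD.
r2_high = ld_r2(["TG", "TG", "TG", "CA", "CA", "CA"])
r2_low = ld_r2(["TG", "TA", "CG", "CA"])
print(r2_high, r2_low)
```

When r² is near 1, one marker tags its neighbor and a sparse genome scan is informative; when r² is near 0, each candidate site has to be genotyped directly.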
[Figure: linkage map (labeled I) with marker names and map positions from 0.0 to 262.9; the QTL is labeled ABOVE:BELOW COARSE ROOT]
QTL Candidate Region
Candidate Gene Identification
Human HapMap Project and Whole Genome Scans
LD structure of human Chromosome 19 (www.hapmap.org)
1 common SNP genotyped every 5 kb for 269 individuals
9.2 million SNPs in total
Take advantage of haplotype blocks to efficiently scan genome
NATURE|Vol 437|27 October 2005
Next-Generation Sequencing and Whole Genome Scans
The $1000 genome is on the horizon
Current cost with Illumina HiSeq 2000 is about $2000 for 10X depth
The 1000 genomes project has sequenced thousands of human genomes at low depth
Can detect most polymorphisms with frequency >0.01
True whole genome association studies now possible at a very large scale
http://www.1000genomes.org/
Identifying genetic mechanisms of simple vs. complex diseases
Simple (Mendelian) diseases: Caused by a single major gene
High heritability; often can be recognized in pedigrees
Example: Huntington’s, Achondroplasia, Cystic fibrosis, Sickle Cell Anemia
Tools: Linkage analysis, positional cloning
Over 2900 disease-causing genes have been identified thus far: Human Gene Mutation Database: www.hgmd.cf.ac.uk
Complex (non-Mendelian) diseases: Caused by the interaction between environmental factors and multiple genes with minor effects
Interactions between genes, Low heritability
Example: Heart disease, Type II diabetes, Cancer, Asthma
Tools: Association mapping, SNPs !!
Over 35,000 SNP associations have been identified thus far:
http://www.snpedia.com
Slide adapted from Kermit Ritland
Complicating factor: Trait Heterogeneity
Same phenotype has multiple underlying genetic mechanisms
Slide adapted from Kermit Ritland
Case-Control Example: Diabetes
Knowler et al. (1988) collected data on 4920 individuals from Pima and Papago Native American populations in the southwestern United States
High rate of Type II diabetes in these populations
Found significant associations with Immunoglobulin G marker (Gm)
Does this indicate underlying mechanisms of disease?
Knowler et al. (1988) Am. J. Hum. Genet. 43: 520
Case-control test for association (case = diabetic, control = not diabetic)
Question: Is the Gm haplotype associated with risk of Type 2 diabetes?

                  Type 2 diabetes
Gm haplotype    present   absent   Total
present               8       29      37
absent               92       71     163
Total               100      100     200

(1) Test for an association, with a, b, c, d the cell counts read row by row and N the total:
$\chi^2_1 = \frac{(ad - bc)^2 N}{(a+c)(b+d)(a+b)(c+d)} = \frac{[(8)(71) - (29)(92)]^2 (200)}{(100)(100)(37)(163)} = 14.62$
(2) Chi-square is significant. Therefore presence of the Gm haplotype seems to confer reduced occurrence of diabetes.
Slide adapted from Kermit Ritland
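The chi-square calculation for the Gm haplotype table can be checked with a short script. This is a minimal sketch using only the standard library; the function names are illustrative, and the p-value uses the identity that the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(sqrt(x/2)).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = (a * d - b * c) ** 2 * n
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    return numerator / denominator

def p_value_1df(chi2):
    """Upper-tail probability for a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Rows: Gm haplotype present / absent.
# Columns: Type 2 diabetes present / absent.
chi2 = chi_square_2x2(8, 29, 92, 71)
p = p_value_1df(chi2)
print(round(chi2, 2), p)
```

The statistic rounds to 14.62, matching the slide, and the p-value is well below 0.001.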
Case-control test for association (continued)
Question: Is the Gm haplotype actually associated with risk of Type 2 diabetes?
The real story: Stratify by American Indian heritage
0 = little or no Indian heritage; 8 = complete Indian heritage

Index of Indian heritage   Gm haplotype   Percent with diabetes
0                          Present        17.8
                           Absent         19.9
4                          Present        28.3
                           Absent         28.8
8                          Present        35.9
                           Absent         39.3

Conclusion: The Gm haplotype is NOT a risk factor for Type 2 diabetes, but is a marker of American Indian heritage
Slide adapted from Kermit Ritland
Assume populations are historically isolated
One has higher disease frequency by chance
Unlinked loci are differentiated between populations also
Unlinked loci show disease association when populations are lumped together
Population structure and spurious association
[Figure: two populations separated by a gene flow barrier, one with low disease frequency and one with high disease frequency; symbols mark alleles at a neutral locus and alleles causing susceptibility to disease]
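The spurious association described on this slide can be reproduced with expected counts. The sketch below assumes two hypothetical populations of 1000 individuals each, in which a neutral marker and the disease are independent within each population; all frequencies are invented for illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    return (a * d - b * c) ** 2 * n / denominator

def expected_table(n, marker_freq, disease_freq):
    """Expected counts when marker and disease are independent."""
    a = n * marker_freq * disease_freq              # marker present, diseased
    b = n * marker_freq * (1 - disease_freq)        # marker present, healthy
    c = n * (1 - marker_freq) * disease_freq        # marker absent, diseased
    d = n * (1 - marker_freq) * (1 - disease_freq)  # marker absent, healthy
    return a, b, c, d

# Hypothetical populations: the marker is common where disease is rare.
pop_low = expected_table(1000, marker_freq=0.8, disease_freq=0.1)
pop_high = expected_table(1000, marker_freq=0.2, disease_freq=0.4)
lumped = tuple(x + y for x, y in zip(pop_low, pop_high))

chi2_low = chi_square_2x2(*pop_low)
chi2_high = chi_square_2x2(*pop_high)
chi2_lumped = chi_square_2x2(*lumped)
print(chi2_low, chi2_high, chi2_lumped)
```

Within each population the statistic is essentially zero, but lumping the two populations gives a large chi-square even though the marker has no effect on disease.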
Association Study Limitations
Population structure: differences between cases and controls
Genetic heterogeneity underlying trait
Random error/false positives
Inadequate genome coverage
Poorly-estimated linkage disequilibrium
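Transmission Disequilibrium Test (TDT) (Spielman et al. 1993)
[Figure: pedigrees with genotypes Mm, Mm, mm, Mm, mm, mm]
a = number of times M is transmitted
b = number of times M is not transmitted
Test statistic: $\chi^2 = \frac{(a - b)^2}{a + b}$
Approximately distributed as $\chi^2$ with 1 degree of freedom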
Slide adapted from Kermit Ritland
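The TDT statistic is simple enough to compute by hand, but a short sketch makes the inputs explicit. The transmission counts below are hypothetical, and the counts are assumed to come from heterozygous (Mm) parents of affected offspring.

```python
import math

def tdt(a, b):
    """TDT statistic and p-value.

    a = number of times M is transmitted to affected offspring
    b = number of times M is not transmitted
    """
    stat = (a - b) ** 2 / (a + b)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p

# Hypothetical counts: M transmitted 60 times, not transmitted 40 times.
stat, p = tdt(60, 40)
print(stat, round(p, 4))
```

Under Mendelian expectations a and b should be about equal, so a large imbalance between them is evidence for both linkage and association.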
Compare diseased offspring genotypes to parental genotypes to test if loci violate Mendelian expectations
Controls for population structure
Compared with “standard” association tests:
Still need to have tight LD, so need many markers
Is not affected by population stratification
Only detects signal if there is both linkage and association; does not depend on mode of inheritance
Uses only affected progeny (and parental genotypes), so method is efficient
Transmission Disequilibrium Test (TDT)
Association Tests and Population Structure
Transmission disequilibrium tests have limited power and range of application
sample size limitations
restricted allelic diversity
“Genomic Control” uses random markers throughout genome to control for false associations
“Mixed Model” approach allows incorporation of known relatedness and population structure simultaneously
Cardon and Bell 2001 Nature Reviews Genetics 2:91
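A rough sketch of the genomic control idea: chi-square statistics from random markers estimate how much population structure inflates the test, and the candidate statistic is scaled down accordingly. The inflation factor here is taken as the median of the random-marker statistics divided by the median of a chi-square distribution with 1 degree of freedom (about 0.455), bounded below at 1. The random-marker statistics are invented for illustration.

```python
import statistics

CHI2_1DF_MEDIAN = 0.454936  # median of a chi-square distribution with 1 df

def genomic_control(random_marker_stats, candidate_stat):
    """Return the inflation factor and the corrected candidate statistic."""
    inflation = statistics.median(random_marker_stats) / CHI2_1DF_MEDIAN
    inflation = max(inflation, 1.0)  # never inflate the candidate statistic
    return inflation, candidate_stat / inflation

# Hypothetical chi-square values at random (presumably unlinked) markers.
random_stats = [0.3, 0.6, 0.91, 1.4, 2.8]
inflation, corrected = genomic_control(random_stats, candidate_stat=14.62)
print(inflation, corrected)
```

In practice many more random markers would be needed for a stable estimate of the inflation factor.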
ANOVA/Regression Model
$f(y_i) = \beta x_i + \epsilon_i$
$f$: (monotonic) transformation
$y_i$: phenotype (response variable) of individual i
$\beta$: effect size (regression coefficient)
$x_i$: coded genotype (feature) of individual i
$\epsilon_i$: error (residual)
$p(\beta = 0)$: p-value under the null hypothesis of no effect
Goal: Find the effect size that best explains all (potentially transformed) phenotypes as a linear function of the genotypes, and estimate the probability (p-value) of the data being consistent with the null hypothesis (i.e., no effect)
http://www2.unil.ch/cbg/index.php?title=Genome_Wide_Association_Studies
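As an illustration of the regression approach, here is a minimal single-SNP least-squares sketch. It assumes genotypes coded as 0, 1, or 2 copies of an allele, an intercept term, and untransformed phenotypes; the data are invented. It reports a t-statistic for the effect size rather than a p-value, since the t distribution is not in the Python standard library.

```python
import math

def single_snp_regression(genotypes, phenotypes):
    """Ordinary least squares of phenotype on coded genotype."""
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(genotypes, phenotypes))
    beta = sxy / sxx                    # effect size
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2
              for x, y in zip(genotypes, phenotypes))
    se_beta = math.sqrt(rss / (n - 2) / sxx)  # standard error of beta
    return beta, intercept, beta / se_beta

# Hypothetical data: six individuals, genotype coded as allele count.
genotypes = [0, 0, 1, 1, 2, 2]
phenotypes = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
beta, intercept, t_stat = single_snp_regression(genotypes, phenotypes)
print(beta, intercept, t_stat)
```

A large t-statistic corresponds to a small p-value for the null hypothesis that the effect size is zero.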
Mixed Model
Terms in the model:
phenotype (response variable) of individual i
effect of target SNP
effects of background SNPs
Population effect (e.g., admixture coefficient from Structure or values of Principal Components)
Family effect (kinship coefficient)
Implemented in the Tassel program (Wednesday in lab)
Commercial Services for Human Genome-Wide SNP Characterization
NATURE|Vol 437|27 October 2005
Assay 1.2 million “tag SNPs” scattered across genome using Illumina BeadArray technology
Ancestry analyses and disease/behavioral susceptibility