SNPs and Diseases
Molecular School of Medicine
Thursday, November 15th, 2018
Carolina Medina Gomez PhD
Topic outline
- Quick look at the pioneers: HapMap
- 1000 Genomes Project
- Description
- Diversity panel
- The HRC consortium
- Local panels
Learning Aims
- Acquire awareness of the implications of population diversity
- Comprehend the utility of large haplotype reference panels and large biobank data
- Use this knowledge for the mapping of complex traits
AIM
Perform a comprehensive sampling of common genetic variation that
may form the basis of phenotypic differences in humans
The HapMap Project
YRI
CEU
CHB+JPT
A second generation human haplotype map of over 3.1 million SNPs 2007, Nature 449: 851-861.
Study Population
• Ongoing population-based longitudinal birth cohort study. Parents from over 100 different countries (N ~ 5,732 with GWAS data)
The HapMap Project
YRI
CEU
CHB+JPT
HapMap II r 22: Build 36 - 2007
270 samples
PCs
HapMap III r22 Build 36 - 2010
1,184 Samples – DEPICT, LDSC
Name  Population                                                                            # of samples
ASW   African ancestry in Southwest USA                                                      53
CEU   Utah residents with Northern and Western European ancestry from the CEPH collection   112
CHB   Han Chinese in Beijing, China                                                         137
CHD   Chinese in Metropolitan Denver, Colorado                                              109
GIH   Gujarati Indians in Houston, Texas                                                    101
JPT   Japanese in Tokyo, Japan                                                              113
LWK   Luhya in Webuye, Kenya                                                                110
MEX   Mexican ancestry in Los Angeles, California                                            58
MKK   Maasai in Kinyawa, Kenya                                                              156
TSI   Toscani in Italia                                                                     102
YRI   Yoruba in Ibadan, Nigeria                                                             147
Integrating common and rare genetic variation in diverse populations 2010, Nature 467: 52-58.
Phase 1 – 2010
AIM
Catalogue human genetic variation by sequencing the whole genomes of 1,092
individuals from 14 worldwide populations. Discover human genetic
variation of all types (95% of variation > 1% frequency) at the population
level
The 1000 Genomes Project – Build 37
The 1000 Genomes Project Consortium 2010, Nature 467: 1061-1073.
Phase 3 – 2015
AIM
Catalogue human genetic variation by sequencing the whole genomes of 2,504
individuals from 26 worldwide populations. Discover human genetic
variation of all types (99% of variation > 1% frequency) at the population
level
The 1000 Genomes Project – Build 37
A global reference for human genetic variation 2015, Nature 526: 68-.
The American Journal of Human Genetics 96, 37–53, January 8, 2015
Phased design
Generation R HapMap Imputation
3,021,329 SNPs
2,671,724 MAF>0.01 r2>=0.3
Generation R 1KG Imputation
47,072,644 SNPs
11,361,791 MAF>0.01 r2>=0.3
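The post-imputation filter quoted above (MAF > 0.01 and imputation r2 >= 0.3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Generation R pipeline: the `Variant` record, its field names and the three example variants are hypothetical; only the two thresholds and the SNP counts come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical record; the field names are assumptions for illustration.
    rsid: str
    maf: float  # minor allele frequency
    r2: float   # imputation quality (estimated r2)


def passes_qc(v, min_maf=0.01, min_r2=0.3):
    """Keep a variant only if MAF > min_maf and imputation r2 >= min_r2."""
    return v.maf > min_maf and v.r2 >= min_r2


def retained_fraction(n_pass, n_total):
    """Share of imputed variants that survive the filter."""
    return n_pass / n_total


# Made-up variants: one passes, one is too rare, one is poorly imputed.
variants = [
    Variant("rsA", 0.25, 0.98),
    Variant("rsB", 0.004, 0.85),
    Variant("rsC", 0.12, 0.21),
]
kept = [v.rsid for v in variants if passes_qc(v)]
print(kept)

# Counts taken from the slide.
hapmap_fraction = retained_fraction(2_671_724, 3_021_329)
kg_fraction = retained_fraction(11_361_791, 47_072_644)
print(f"HapMap: {hapmap_fraction:.1%}  1KG: {kg_fraction:.1%}")
```

With the slide's counts, roughly 88% of HapMap-imputed SNPs but only about 24% of 1KG-imputed SNPs pass the filter, yet the 1KG set still retains more than four times as many variants in absolute terms.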
2012
2017
Neither of the two variants was present in HapMap or tagged by HapMap variants (one
is common)
Das et al. Annual Reviews 2018.
Genotype Imputation from Large Reference Panels
Imputation Servers
Phase 1 – 2016
AIM
To bring together as many whole-genome sequencing data sets as
possible. This reference panel consists of 64,976 haplotypes at
39,235,157 SNPs.
The Haplotype reference consortium – Build 37
A reference panel of 64,976 haplotypes for genotype imputation. Nature Genetics 48 10
Phase 1.1 – 2016
34% increase r2 at 0.1%
r2 at ~0.4%
Larger panels, better results
Reference panel size (samples):
1KG: ~2,500
HRC: ~32,000
TOPMed: ~64,000
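One way to see why panel size matters for rare variants is to count how many copies of a minor allele a panel is expected to carry. The sketch below assumes diploid samples (two haplotypes each) and uses the 1KG Phase 3 sample count and the HRC haplotype count quoted in these slides; the function names are illustrative only.

```python
def samples_from_haplotypes(n_haplotypes):
    """Each diploid individual contributes two haplotypes."""
    return n_haplotypes // 2


def expected_minor_copies(n_samples, maf):
    """Expected number of minor-allele copies among 2 * n_samples haplotypes."""
    return 2 * n_samples * maf


hrc_samples = samples_from_haplotypes(64_976)
panels = {"1KG": 2_504, "HRC": hrc_samples}

# Expected copies of an allele with MAF = 0.1% in each panel.
copies = {name: expected_minor_copies(n, 0.001) for name, n in panels.items()}
for name, n in panels.items():
    print(f"{name}: {n} samples, ~{copies[name]:.0f} copies at MAF 0.1%")
```

Under these assumptions a variant at MAF 0.1% is carried on about 5 haplotypes in 1KG but about 65 in HRC, which gives the imputation algorithm far more haplotype context to work with.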
Das et al. Annual Reviews 2018.
2017
Canela-Xandri et al. 2018
GWAS hits increase with sample size, with no sign of
saturation
The Million Veteran Program began collecting data in 2011 and has the goal of reaching 1 million participants by 2020 or 2021.
Now… imagine if we combine data
Predictions for next GIANT freeze: 1.5 million
200,000 KP member volunteers in Northern California
~700,000 volunteers worldwide
200,000 volunteers
100,000 volunteers
up to 500,000
Mexico City Prospective Study; Metabolomics; 150,000 participants
Analytical Issues!
Current challenges
Perception challenge:
Are we using the correct multiple-testing threshold, or should we change
it now that we include more rare variants, which constitute independent tests (low LD)?
Methodological challenge:
Is it necessary to correct further for population stratification
(e.g. by implementing mixed models) to avoid false-positive signals?
Computational challenge:
Can we store and analyze the data with our current computational power?
Follow-up challenge:
Can we correctly identify variants/genes for follow-up studies?
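The multiple-testing question can be made concrete with a Bonferroni calculation. The sketch below assumes a family-wise alpha of 0.05 (an assumption, not stated in the slides) and shows how many independent tests are implied by the conventional 5 x 10^-8 threshold and by the stricter P <= 6.6 x 10^-9 threshold quoted elsewhere in these slides.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold for n_tests independent tests."""
    return alpha / n_tests


def implied_tests(threshold, alpha=0.05):
    """Number of independent tests a given per-test threshold corresponds to."""
    return alpha / threshold


# Conventional genome-wide threshold: about 1 million independent common variants.
conventional = bonferroni_threshold(1_000_000)

# Stricter threshold quoted in the slides (alpha = 0.05 is assumed here).
implied = implied_tests(6.6e-9)

print(f"threshold for 1M tests: {conventional:.1e}")
print(f"tests implied by 6.6e-9: {implied:,.0f}")
```

Under this assumption the stricter threshold corresponds to roughly 7.6 million independent tests, which is the kind of increase expected once low-frequency variants in weak LD are added to the analysis.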
http://geneatlas.roslin.ed.ac.uk/
Cryptic familial relations and population structure
http://www.nealelab.is/uk-biobank
Imputed data: http://geneatlas.roslin.ed.ac.uk/
http://www.nealelab.is/uk-biobank
Genotyped data: https://biobankengine.stanford.edu/
The discovery of genetic variants associated with a trait
or disease is determined by different parameters
We are surpassing the 1M barrier
New imputation panels allow us to
explore variants with MAF ~0.1%
More variants, more opportunities
to tag the causal variant
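How well one variant tags another is usually measured by the LD statistic r2, computed from haplotype frequencies as r2 = D^2 / (pA(1 - pA) pB(1 - pB)) with D = pAB - pA pB. The sketch below is a minimal illustration on made-up haplotypes; the function name and the toy data are hypothetical.

```python
def ld_r2(hap_pairs):
    """r2 between two biallelic sites.

    hap_pairs: list of (a, b) tuples, one per haplotype, where a and b are
    0/1 allele codes at the candidate tag SNP and the causal variant.
    """
    n = len(hap_pairs)
    p_a = sum(a for a, _ in hap_pairs) / n
    p_b = sum(b for _, b in hap_pairs) / n
    p_ab = sum(1 for a, b in hap_pairs if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined when a site is monomorphic")
    return d * d / denom


# Toy data: 10 haplotypes each (made up for illustration).
perfect_tag = [(1, 1)] * 3 + [(0, 0)] * 7
poor_tag = [(1, 1)] * 1 + [(1, 0)] * 2 + [(0, 1)] * 2 + [(0, 0)] * 5

r2_perfect = ld_r2(perfect_tag)
r2_poor = ld_r2(poor_tag)
print(f"perfect tag r2 = {r2_perfect:.3f}, poor tag r2 = {r2_poor:.4f}")
```

A denser set of imputed variants raises the chance that at least one of them has high r2 with the causal variant, or is the causal variant itself.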
The new GWAS era is a treasure trove for making
new fundamental discoveries in human genetics.
10 Years of GWAS Discovery: Biology, Function, and Translation. AJHG July. 2017
Resolution magnification
P <= 6.6 x 10^-9
Identification of 153 new loci associated with heel bone mineral density and functional involvement of GPC6 in osteoporosis.
Nature Genetics.
Original 2x2 scenario of genetic architecture needs to be
redefined under the scope of the 1000G and other projects
Take home messages
- Understanding of human genome diversity is key for the design of
genetic studies
- Larger and more comprehensive panels provide the best
performance and yield in terms of quality and MAF coverage, resulting in
greater power (even more so in combination with mega-biobanks!)
- Most of the novel variants to be discovered are of "low to rare" allele
frequency, highly population specific and enriched for functional aspects
- Upscaling of technology, either through interfacing with -omic data or
through experimental perturbations, is necessary for making new
fundamental discoveries in human genetics