Conifer Translational Genomics Network Coordinated Agricultural … · 2019-01-21 · Conifer...

Conifer Translational Genomics Network

Coordinated Agricultural Project

www.pinegenome.org/ctgn

Genomics in Tree Breeding and

Forest Ecosystem Management

Module 11 – Association Genetics

Nicholas Wheeler & David Harry – Oregon State University

What is association genetics?

Association genetics is the process of identifying alleles that are

disproportionately represented among individuals with different

phenotypes. It is a population-based survey used to identify

relationships between genetic markers and phenotypic traits

– Two approaches for grouping individuals

– By phenotype (e.g. healthy vs. disease)

– By marker genotype (similar to approach used in QTL studies)

– Two approaches for selecting markers for evaluation

– Candidate gene

– Whole genome

Association genetics: conceptual example

Figure Credit: Nicholas Wheeler, Oregon State University

Comparing the approaches

Criteria Family-based QTL Mapping Population-based Association Mapping

Number of markers Relatively few (50 – 100’s) Many (100’s – 1000’s)

Populations Few parents or grandparents with many offspring (>500)

Many individuals with unknown or mixed relationships. If pedigreed, family sizes are typically small (10’s) relative to sampled population (>500)

QTL analysis Easy or complex. Sophisticated tools minimize ghost QTL and increase mapping precision

Easy or complex. Sophisticated tools reduce risk of false positives

Detection depends on QTL segregation in offspring, and marker-trait linkage within-family(s)

QTL segregation in population, and marker-trait LD in mapping population

Mapping precision Poor (0.1 to 15 cM). QTL regions may contain many positional candidate genes

Can be excellent (10’s to 1000’s kb). Depends on population LD

Variation detected Subset (only the portion segregating in sampled pedigrees)

Larger subset. Theoretically all variation segregating in targeted regions of genome

Extrapolation to other families or populations

Poor. (Other families not segregating QTL, changes in marker phase, etc)

Good to excellent. (Although not all QTL will be segregate in all population / pedigree subsamples)

Essential elements of association genetics

Appropriate populations

– Detection

– Verification

Good phenotypic data

Good genotypic data

– Markers (SNPs): Number determined by experimental approach

– Quality of SNP calls

– Missing data

Appropriate analytical approach to detect significant associations

Flowchart of a gene association study

Figure Credit: Modified from Flint-Garcia et al., 2005

An association mapping population with

known kinship

– 32 parents

• 64 families

– ~1400 clones

Figure Credits: Cooperative Forest Genetics Research Program, University of Florida

Phenotyping: Precision, accuracy, and more

Figure Credits: Gary Peter, University of Florida

Genotyping: Potential genomic targets

Figure Credit: Nicholas Wheeler and David Harry, Oregon State University

Whole genome or candidate gene? Let’s

look again at how this works

Figure Credit: Reprinted from Current Opinion in Plant Biology, Vol 5, Rafalski, Applications of single nucleotide polymorphisms in

crop genetics, pages 94-100, 2002, with permission from Elsevier

Local distribution of SNPs and genes

Figure Credit: Reprinted by permission from Macmillan Publishers Ltd: Nature Reviews Genetics, Jorgenson and Witte, 2006

Candidate genes for novel (your) species

Availability of candidate genes

– Positional candidates

– Functional studies

– Model organisms

– Genes identified in other forest trees

Candidate genes for association studies

Functional candidates By homology to genes in other species

By direct evidence in forest trees

Expression candidates Microarray analyses

Proteomics

Metabolomics

e fh a nd

rib os o m a l

r n a p olym e r as e

r na /dn a b in din g

s tru c tu r e r e co g nit ion

tr an sc r iptio n fac to r

u biq u it in

atp syn th a se

un kn o w n

o xid as e

h ea t sh o ck pr o te in

he lica s e

h isto ne

2 %h ydr o las e

p e ptid as e

ph o sp h ata se

1 %r ed u cta se

k in a se

tub u lin

tr a ns latio n in it ia t io n

z inc f in g er p ro tein

2 %aq u ap o rin

Positional candidates QTL analyses in pedigrees

P-UBC_BC_350_10000.0P-PmIFG_1275_a/Thaumatin-like_precursor8.7P-PmIFG_1173_a8.9F-PmIFG_1548_a(fr8-1)16.3M-PmIFG_1514_c19.8P-UBC_BC_570_42523.7M-estPmIFG_Pt9022+/2/translation-factor26.9P-OSU_BC_570_36030.8B-PmIFG_1474_a31.7M-PmIFG_1474_b35.7F-PmIFG_0339_a(fr8-2)40.0P-PmIFG_0320_a43.3P-PmIFG_1005_e(fr17-2)43.8M-UBC_BC_245_75044.6P-PmIFG_1427_a45.4F-PmIFG_1567_a(fr8-3)47.1P-PmIFG_1570_a50.0P-PmIFG_0005_a(fr17-1)50.3P-PmIFG_1308_a51.3F-PtIFG_2885_a/255.6P-PmIFG_1145_a(fr8-4)61.1P-PmIFG_1601_a64.1M-IFG_OP_K14_82565.2M-estPtIFG_8510/1173.0M-PmIFG_0005_b78.5P-UBC_BC_446_60081.0P-PmIFG_0315_a(fr15-1)91.4B-estPmaLU_SB49/395.6P-PmIFG_1060_a97.8M-IFG_OP_H09_065098.9F-PmIFG_1123_a(fr15-2)104.3M-UBC_BC_506_800115.4M-PmIFG_1591_a122.4

Antifreeze protein

Linkage group

P-UBC_BC_350_10000.0P-PmIFG_1275_a/Thaumatin-like_precursor8.7P-PmIFG_1173_a8.9F-PmIFG_1548_a(fr8-1)16.3M-PmIFG_1514_c19.8P-UBC_BC_570_42523.7M-estPmIFG_Pt9022+/2/translation-factor26.9P-OSU_BC_570_36030.8B-PmIFG_1474_a31.7M-PmIFG_1474_b35.7F-PmIFG_0339_a(fr8-2)40.0P-PmIFG_0320_a43.3P-PmIFG_1005_e(fr17-2)43.8M-UBC_BC_245_75044.6P-PmIFG_1427_a45.4F-PmIFG_1567_a(fr8-3)47.1P-PmIFG_1570_a50.0P-PmIFG_0005_a(fr17-1)50.3P-PmIFG_1308_a51.3F-PtIFG_2885_a/255.6P-PmIFG_1145_a(fr8-4)61.1P-PmIFG_1601_a64.1M-IFG_OP_K14_82565.2M-estPtIFG_8510/1173.0M-PmIFG_0005_b78.5P-UBC_BC_446_60081.0P-PmIFG_0315_a(fr15-1)91.4B-estPmaLU_SB49/395.6P-PmIFG_1060_a97.8M-IFG_OP_H09_065098.9F-PmIFG_1123_a(fr15-2)104.3M-UBC_BC_506_800115.4M-PmIFG_1591_a122.4

Antifreeze protein

Linkage group

Figure Credits: Kostya Krutovsky, Texas A&M University

Potential genotyping pitfalls

Quality of genotype data

– Contract labs, automated base calls

Minor allele frequency

– Use minimum threshold, e.g. MAF ≥ 0.05 or MAF ≥ 0.10

– Rare alleles can cause spurious associations due to small samples

(recall that D’ is unstable with rare alleles)

Missing data !!!

– Alternative methods for imputing missing data

Statistical tests for marker/trait associations

SNP by trait association testing is, at its core, a simple test of

correlation/regression between traits

In reality such cases rarely exist and more sophisticated

approaches are required. These may take the form of mixed models

that account for potential covariates and other sources of variance

The principle covariates of concern are population structure and

kinship or relatedness, both of which may result in LD between a

marker and a QTN that is not predictive for the population as a

Causes of population structure

Geography

– Adaptation to local conditions (selection)

Non-random mating

– Isolation / bottlenecks (drift)

– Assortative mating

– Geographic isolation

Population admixture (migration)

Co-ancestry

Case-control and population structure

Figure Credit: Reprinted by permission from Macmillan Publishers Ltd: Nature Genetics Marchini et al., 2004.

Accommodating population structure

Avoid the problem by avoiding admixted populations or working with

populations of very well defined co-ancestry

Use statistical tools to make appropriate adjustments

Detecting and accounting for population

structure

Family based methods

Population based methods

– Genomic control (GC)

– Structured association (SA)

– Multivariate

Mixed model analyses (test for association)

Family based approaches

Avoid unknown population structure by following marker-trait inheritance in families (known parent-offspring relationships)

Common approaches include – Transmission disequilibrium test (TDT) for binary traits

– Quantitative transmission disequilibrium test (QTDT) for quantitative traits

– Both methods build upon Mendelian inheritance of markers within families

Test procedure – Group individuals by phenotype

– Look for markers with significant allele frequency differences between groups

For a binary trait such as disease, use families with affected offspring

Constraints – Family structures must be known (e.g. pedigree)

– Limited samples

Population based: Genomic control

Because of shared ancestry, population structure should translate

into an increased level of genetic similarity distributed throughout

the genome of related individuals

By way of contrast, the expectation for a causal association would

be a gene specific effect

Genomic control (GC) process

– Neutral markers (e.g. 10-100 SSRs) are used to estimate the overall

level of genetic similarity within a sampled population

– In turn, this proportional increase in similarity is used as an inflation

factor, sometimes called , used to adjust significance probabilities (p-

values)

– For example,

– Typical values of are in the range of ~0.02-0.10

p-value(adj) = p-value(unadj) /(1+ )

Structured association

The general idea behind structured association (SA) is that cryptic

population history (or admixture) causes increased genetic similarity

within groups

The challenge is to determine how many groups (K) are

represented, and then to quantify group affinities for each individual

Correction factors are applied separately to each individual, based

upon the inferred group affinities

SA is computationally demanding

Multivariate methods

Multivariate methods build upon co-variances among marker

genotypes

Multivariate methods such as PCA offer several advantages over

Downstream analysis of SA and PCA data are similar

Mixed model approaches

Mixed models test for association by taking into account factors

such as kinship and population structure, provided by other means

Provides good control of both type 1 (false positive associations)

and type 2 (false negative associations) errors

Figure Credit: Fikret Isik, North Carolina State University

Trait L1 L2 SNP1 P1 P2 G1 G2 G3 G4

y1 = 1 0 1 1 0 1 0 0 0 e1

y2 = 1 0 1 0 1 0 0 1 0 e2

y3 = 1 0 1 0 1 0 0 1 0 u1 e3

y4 = 1 0 b1 0 0 1 v1 0 0 1 0 u2 e4

y5 = 0 1 b2 0 1 0 v2 0 1 0 0 u3 e5

y6 = 0 1 0 0 1 0 0 0 1 u4 e6

y7 = 0 1 1 1 0 0 1 0 0 e7

y8 = 0 1 1 0 1 0 0 0 1 e8

yi = Xβ + Sα + Qv + Zu + ei

y3 = b1 + a1 + v2 + u3 + e3

SNP ID

+* + *

Location ID

= Measured trait

Population ID

= Fixed effects (BLUE = Best Linear Unbiased Esitimates)

= Random effects (BLUP = Best Linear Unbiased Predictions)

Genotype ID

a1 + *

eZuQvSXyi :model mixed Tassel1 2 3 4 5 6 7 8

Significant associations for diabetes

distributed across the human genome

Figure Credit: Reprinted by permission from Macmillan Publishers Ltd: Nature Reviews Genetics McCarthy et al., 2008.

Association genetics: Concluding comments

Advantages

– Populations

– Mapping precision

– Scope of inference

Drawbacks

– Resources required

– Confounding effects

– Repeatability

References cited

Bradbury, P. J., Z. Zhang, D. E. Kroon, T. M. Casstevens, Y. Ramdos, and E. S. Buckler. 2007. TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23: 2633-2635. (Available online at: http://dx.doi.org/10.1093/bioinformatics/btm308) (verified 2 June 2011).

Flint-Garcia, S. A., A. C. Thuillet, J. M. Yu, G. Pressoir, S. M. Romero, S. E. Mitchell, J. Doebley, S. Kresovich, M. M. Goodman, and E. S. Buckler. 2005. Maize association population: A high-resolution platform for quantitative trait locus dissection. Plant Journal 44: 1054-1064. (Available online at: http://dx.doi.org/10.1111/j.1365-313X.2005.02591.x) (verified 2 June 2011).

Jorgenson, E. and J. Witte. 2006. A gene-centric approach to genome-wide association studies. Nature Reviews Genetics 7: 885-891. (Available online at: http://dx.doi.org/10.1038/nrg1962) (verified 2 June 2011).

Marchini, J., L. R. Cardon, M. S. Phillips, and P. Donnelly. 2004. The effects of human population structure on large genetic association studies. Nature Genetics 36: 512-517. (Available online at: http://dx.doi.org/10.1038/ng1337) (verified 2 June 2011).

McCarthy, M. I., G. R. Abecasis, L. R. Cardon, D. B. Goldstein, J. Little, J.P.A. Iaonnidis, and J. N. Hirschhorn. 2008. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nature Reviews Genetics 9: 356-369. (Available online at: http://dx.doi.org/10.1038/nrg2344) (verified 2 June 2011).

References cited

Neale, D. B., and O. Savolainen. 2004. Association genetics of complex traits in conifers. Trends in Plant Science 9: 325-330. (Available online at: http://dx.doi.org/10.1016/j.tplants.2004.05.006) (verified 2 June 2011).

Pearson, T. A., and T. A. Manolio. 2008. How to interpret a genome-wide association study. Journal of the American Medical Association 299: 1335-1344. (Available online at: http://dx.doi.org/10.1001/jama.299.11.1335) (verified 2 June 2011).

Rafalski, A. 2002. Applications of single nucleotide polymorphisms in crop genetics. Current Opinion in Plant Biology 5: 94-100.

Wheeler, N. C., K. D. Jermstad, K. Krutovsky, S. N. Aitken, G. T. Howe, J. Krakowski, and D. B. Neale. 2005. Mapping of quantitative trait loci controlling adaptive traits in coastal Doublas-fir. IV. Cold-hardiness QTL verification and candidate gene mapping. Molecular Breeding 15: 145-156. (Available online at: http://dx.doi.org/10.1007/s11032-004-3978-9) (verified 2 June 2011).

Yu, J., G. Pressoir, W. H. Briggs, I. V. Bi, M. Yamasaki, J. F. Doebley, M. D. McMullen, et al. 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics 38: 203-208. (Available online at: http://dx.doi.org/10.1038/ng1702) (verified 2 June 2011).

Thank You. Conifer Translational Genomics Network

Coordinated Agricultural Project

Conifer Translational Genomics Network Coordinated Agricultural … · 2019-01-21 · Conifer...

Documents

Translational Genomics Research Institute Individualized

Translational Genomics Research Institute

Challenges in translating genomics technologies to …crdi.ie/uploads/files/Translational/Pierre_Meulien.pdfChallenges in translating genomics technologies to the clinic Irish Biomarkers

Conifer Translational Genomics Network Coordinated Agricultural …pbgworks.org/sites/pbgworks.org/files/QTL_Discovery.pdf · · 2011-08-08Conifer Translational Genomics Network

Research in Focus: Translational Biological and Chemical Genomics

Planning for Gene Regulatory Network Intervention Daniel Bryce Arizona State University Seungchan Kim Arizona State University & Translational Genomics

ain Grain Legumes 37-41 Mortimer Street, London W1T …oar.icrisat.org/8584/1/CriticalRewsinPlantSciences_34_169-194_2015.pdf · Translational Genomics in Agriculture: Some Examples

Translational Genomics for Crop Breeding,€¦ · Chapter 1 Translational Genomics for Crop Breeding: Abiotic Stress Tolerance, Yield, and Quality, An Introduction 1 Rajeev K. Varshney

Personal Genomics, Personalized Medicine, YOU · Personal Genomics, Personalized Medicine, & YOU Carrie Iwema, PhD, MLS 21st May 2012 AAAS/Science Translational Medicine panel discussion;

Conifer Translational Genomics Network Coordinated Agricultural Project Genomics in Tree Breeding and Forest Ecosystem Management

Genomics and bioinformatics resources for translational ... · Genomics and bioinformatics resources for translational science in Rosaceae Sook Jung • Dorrie Main Received: 11 February

ACCELERATING CLINICAL AND TRANSLATIONAL RESEARCH Metabolomics/Proteomics and Genomics at IUB Indiana CTSI – Purdue Retreat Monday,

Conifer Ele

ClinSeq: Piloting Large-Scale Medical Sequencing for Translational Genomics Research Leslie Biesecker, M.D. National Human Genome Research Institute, NIH

ACCELERATING CLINICAL AND TRANSLATIONAL RESEARCH Supercores – Genomics & Metabolomics/Proteomics Indiana CTSI – Purdue Retreat MONDAY,

Conifer Translational Genomics Network Coordinated ... Genetics.pdf · Population genetics Population genetics is the study of genetic differences within and among populations of

Translational Genomics for Crop Breeding,download.e-bookshelf.de/download/0003/9792/41/L-G-0003979241... · important book, Translational Genomics for Crop Breeding, Vol. 1: Biotic

Clinical Genomics – infrastructure for translational research at Karolinska Institutet & Science for Life Laboratory

ConiferQuarterlyPhoto by Tom Cox. Fall 2014 Conifer Quarterly 1 The Conifer Quarterly is the publication of the American Conifer Society The purposes of the American Conifer Society

Conifer Translational Genomics Network Coordinated ... and Genomes.pdf · Quantitative genetics describes variation in traits influenced by multiple genes (continuous rather than