Upload
hollye
View
37
Download
0
Tags:
Embed Size (px)
DESCRIPTION
Simulating and Modeling Genetically Informative Data. Matthew C. Keller Sarah E. Medland. Outline. The usefulness of simulation in behavioral genetics Using GeneEvolve to simulate genetically informative data Practical simulating different designs Classical Twin Design (CTD) - PowerPoint PPT Presentation
Citation preview
Simulating and Modeling Genetically Informative Data
Matthew C. Keller
Sarah E. Medland
The usefulness of simulation in behavioral The usefulness of simulation in behavioral geneticsgenetics
Using GeneEvolve to simulate genetically Using GeneEvolve to simulate genetically informative datainformative data
Practical simulating different designsPractical simulating different designs1.1. Classical Twin Design (CTD)Classical Twin Design (CTD)
2.2. Nuclear Twin Family Design (NTFD)Nuclear Twin Family Design (NTFD)
Outline
Independent check of models. Especially Independent check of models. Especially important for complex (e.g., extended twin family) important for complex (e.g., extended twin family) models. models.
Model verificationModel verification: Check that your models work as they are : Check that your models work as they are supposed to.supposed to.
Sensitivity analysisSensitivity analysis: Check the effect on parameter estimates : Check the effect on parameter estimates when assumptions are violated (e.g., different modes of when assumptions are violated (e.g., different modes of assortative mating, vertical transmission, or genetic action). assortative mating, vertical transmission, or genetic action).
Method for predicting complex dynamics in Method for predicting complex dynamics in population geneticspopulation genetics
Simulation provides knowledge about processes that are difficult/impossible to
figure out analytically
Using complex models without independent verification (e.g., simulation) is like…
QuickTime™ and a decompressor
are needed to see this picture.
Process of model verification
1.1. Simulate a dataset that has parameters that your Simulate a dataset that has parameters that your model can estimate.model can estimate.
2.2. Run your model on the simulated datasetRun your model on the simulated dataset
3.3. Obtain and store parameter estimatesObtain and store parameter estimates
4.4. Repeat steps 1-3 many (e.g., 1000) times Repeat steps 1-3 many (e.g., 1000) times
Results of model verification
If the mean parameter estimate = the simulated If the mean parameter estimate = the simulated parameter estimate, the estimate is parameter estimate, the estimate is unbiased. unbiased. If If your model has no mistakes, parameters should your model has no mistakes, parameters should generally be unbiased (there are exceptions)generally be unbiased (there are exceptions)
The standard deviation of an estimates The standard deviation of an estimates corresponds to its corresponds to its standard errorstandard error and its and its distribution to its distribution to its sampling distributionsampling distribution
You can also easily study the You can also easily study the multivariate multivariate sampling distribution and statisticssampling distribution and statistics. E.g., how . E.g., how correlated parameters are.correlated parameters are.
Process of sensitivity analysis
1.1. Simulate a dataset that has one or more parameters Simulate a dataset that has one or more parameters that your model that your model cannotcannot estimate. estimate.
2.2. Run your model on the simulated datasetRun your model on the simulated dataset
3.3. Obtain and store parameter estimatesObtain and store parameter estimates
4.4. Repeat steps 1-3 many (e.g., 1000) times Repeat steps 1-3 many (e.g., 1000) times
Results of sensitivity analysis
Because we are simulating Because we are simulating violations of violations of assumptionsassumptions, we expect parameters to be biased, we expect parameters to be biased . . The question becomes: The question becomes: howhow biased? I.e., how big biased? I.e., how big of a deal are these violations? We should be able of a deal are these violations? We should be able to quantify the answers to these questions.to quantify the answers to these questions.
Reality: A=.4, D=.15, S=.15
A,D, & F estimates are highly correlated in Stealth & Cascade models
Simulation is not a panacea Simulation can be said to provide “knowledge Simulation can be said to provide “knowledge
without understanding.” It is a helpful tool for without understanding.” It is a helpful tool for understanding, but doesn’t provide understanding understanding, but doesn’t provide understanding in and of itself.in and of itself.
Simulations themselves rely on assumptions about Simulations themselves rely on assumptions about how processes work. If these are wrong, our how processes work. If these are wrong, our simulation results may not reflect reality. simulation results may not reflect reality.
Simulation program: GeneEvolve
Implemented in R, open-source, user modifiableImplemented in R, open-source, user modifiable User specifies 31 basic parameters up front (and User specifies 31 basic parameters up front (and
17 advanced ones); no need to alter script after 17 advanced ones); no need to alter script after that.that.
Fast Fast (on AMD Opteron 3.2GHz dual 64 bit processor, 2GB RAM; OS= RHEL AS4)(on AMD Opteron 3.2GHz dual 64 bit processor, 2GB RAM; OS= RHEL AS4) 10 genes, N=20,000 takes ~ 20 seconds/gen10 genes, N=20,000 takes ~ 20 seconds/gen
GeneEvolve 0.73
Download: www.matthewckeller.com
User specifies:User specifies: population size, # generations for population to population size, # generations for population to
evolve, threshold effects, mechanisms of evolve, threshold effects, mechanisms of assortative mating, vertical transmission, etc.assortative mating, vertical transmission, etc.
3 types of genetic effects3 types of genetic effects 5 types of environmental effects5 types of environmental effects 13 types of moderator/covariate effects13 types of moderator/covariate effects
How GeneEvolve works:
Download: www.matthewckeller.com
Diagram of GeneEvolve Model
PMa
Sa
Dd
Ee s
A
q
x
w
f
F
PFa
Sa
D d
Ees
A
q
x
w
f
F
mm
PT1
Sa
Dd
Ee s
A
f
F
PT2
Sa
D d
Ees
A
f
F
mm
zd
zs
µ
AxAAxA
AxAAxA
aa aa
aaaa
zaa
11 age
AL AS
P
βAL
1 1
βAS
βageβ01
A
P
βA+ βint(age)
β0
1
Diagram: GeneEvolve Age-by-A Interactions
r
Purcell Model: Our Model:
open Purcell-vs-Ours.pdf; Purcell-vs-OursCorrelation.pdf
At adulthood, ~ x% find mates s.t. phenotypic At adulthood, ~ x% find mates s.t. phenotypic correlation b/w mating phenotypes = AM:correlation b/w mating phenotypes = AM:
Pairs have children :Pairs have children : Rate Rate determined by user-specified population growthdetermined by user-specified population growth
Process iterated Process iterated nn times times
How GeneEvolve works (cont):
Download: www.matthewckeller.com
After After n n iterations, population split into two:iterations, population split into two: Parents of spousesParents of spouses Parents of twinsParents of twins
Parents of twins have offspring (MZ/DZ twins & Parents of twins have offspring (MZ/DZ twins & their sibs)their sibs)
Twins mate with spousal population & have Twins mate with spousal population & have offspringoffspring
How GeneEvolve works (cont):
Download: www.matthewckeller.com
3 generations of phenotypic data written out (one 3 generations of phenotypic data written out (one row per family), potentially across repeated row per family), potentially across repeated measuresmeasures
This data (& subsets of it) can be entered into This data (& subsets of it) can be entered into structural models for model verification and structural models for model verification and sensitivity analysissensitivity analysis
A summary PDF at end showsA summary PDF at end shows: : Basic simulation statisticsBasic simulation statistics Changes in variance components across timeChanges in variance components across time Correlations between 10 relative typesCorrelations between 10 relative types
What you get:
Download: www.matthewckeller.com
SEM is great because…SEM is great because… Directs focus to effect sizes, not “significance” Directs focus to effect sizes, not “significance” Forces consideration of causes and consequencesForces consideration of causes and consequences Explicit disclosure of assumptionsExplicit disclosure of assumptions
Potential weakness…Potential weakness… Parameter reification: “Using the CTD we found that 50% Parameter reification: “Using the CTD we found that 50%
of variation is due to A and 20% to C.” of variation is due to A and 20% to C.”
Structural Equation Modeling (SEM) in BG
SEM is great because…SEM is great because… Directs focus to effect sizes, not “significance” Directs focus to effect sizes, not “significance” Forces consideration of causes and consequencesForces consideration of causes and consequences Explicit disclosure of assumptionsExplicit disclosure of assumptions
Potential weakness…Potential weakness… Parameter reification: “Using the CTD we found that 50% Parameter reification: “Using the CTD we found that 50%
of variation is due to A and 20% to C.” of variation is due to A and 20% to C.”
Structural Equation Modeling (SEM) in BG
NO! Only true under strong assumptions that probably aren’t
met (e.g., D=0) and usually go untested. To the degree
assumptions wrong, estimates are biased.
PT1
Ca
Dd
Ee c
A
PT2
Ca
D d
Eec
A1/.25
1
Classical Twin Design (CTD)
PT1
Ca
Dd
Ee c
A
PT2
Ca
D d
Eec
A1/.25
1
Classical Twin Design (CTD) Assumption biased up biased downAssumption biased up biased down
Either D or C is zero A C & DEither D or C is zero A C & D
No assortative mating C DNo assortative mating C D
No A-C covariance C D & ANo A-C covariance C D & A
Adding parents gets us around all these assumptions
Assumption biased up biased downAssumption biased up biased downEither D or C is zeroEither D or C is zero
No assortative matingNo assortative mating
No A-C covarianceNo A-C covariance
PMa
Ca
Dd
Ee c
A
qw
PFa
Ca
D d
Eec
A
qw
m
m
PT1
Ca
Dd
Ee c
A
PT2
Ca
Dd
Eec
A
mm
1/.25
µ
We don’t have to
make these
x x
With parents, we can break “C” up into:
S = env. factors shared only between sibs
F = familial env factors passed from parents to offspring
F
SC
PT1
SaDd
Ee s
A
f
F
PT2
S aD d
Ees
A
f
F1/.25
1
Parents also allow differentiation of S & F
PT1
Ca
Dd
Ee c
A
PT2
Ca
D d
Eec
A1/.25
1
Nuclear Twin Family Design (NTFD)
Assumptions:Assumptions: Only can estimate 3 of 4: A, D, S, and F (bias is variable)Only can estimate 3 of 4: A, D, S, and F (bias is variable) Assortative mating due to primary phenotypic assortment (bias is variable)Assortative mating due to primary phenotypic assortment (bias is variable)
Note: m estimated and
f fixed to 1
PMa
Sa
Dd
Ee s
A
q
x
w
f
F
PFa
Sa
D d
Ees
A
q
x
w
f
F
mm
PT1
Sa
Dd
Ee s
A
f
F
PT2
Sa
D d
Ees
A
f
F
mm
zd
zs
µ
Stealth
Include twins and their sibs, parents, spouses, and Include twins and their sibs, parents, spouses, and offspring…offspring… Gives 17 unique covariances (MZ, DZ, Sib, P-O, Spousal, Gives 17 unique covariances (MZ, DZ, Sib, P-O, Spousal,
MZ avunc, DZ avunc, MZ cous, DZ cous, GP-GO, and 7 in-MZ avunc, DZ avunc, MZ cous, DZ cous, GP-GO, and 7 in-laws) laws)
88 covariances with sex effects88 covariances with sex effects
can be estimated simultaneously
= env. factors shared only between twins
PT1
Sa
Dd
Ee s
A
f
F
PT2
Sa
D d
Ees
A
f
F1/.25
1
Additional obs. covs with Stealth allow estimation of A, S, D, F, T
Tt
d
T t
1/0
T
(Remember: we’re not just estimating more effects. More importantly, we’re
reducing the bias in estimated effects!)
FS DA T
Stealth
PMa
Sa
Dd
T
E
t
e s
A
q
x
w
f
F
PFa
Sa
D d
T
E
t
es
A
q
x
w
f
F
mm
PT1
Sa
Dd
T
E
t
e s
A
f
F
PMa
Sa
Dd
T
E
t
e s
A
q
x
w
f
F
PT2
Sa
D d
T
E
t
es
A
f
F
PFa
Sa
D d
T
E
t
es
A
q
x
w
f
F
mm
PCh
Sa
D d
T
E
t
es
A
f
F
mm
mm
PCh
Sa
Dd
T
E
t
e s
A
f
F
1/0
1/.25
1
µ
µ
µ
Stealth
Assumption biased up biased downAssumption biased up biased downPrimary assortative mating A, D, or F A, D, or FPrimary assortative mating A, D, or F A, D, or F
No epistasis A, D SNo epistasis A, D S
No AxAge D, S ANo AxAge D, S A
Stealth
Assumption biased up biased downAssumption biased up biased downPrimary assortative mating A, D, or F A, D, or FPrimary assortative mating A, D, or F A, D, or F
No epistasis A, D SNo epistasis A, D S
No AxAge D, S ANo AxAge D, S A
Primary AM: mates choose each other based on Primary AM: mates choose each other based on phenotypic similarityphenotypic similarity
Social homogamy: mates choose each other due to Social homogamy: mates choose each other due to environmental similarity (e.g., religion)environmental similarity (e.g., religion)
Convergence: mates become more similar to each Convergence: mates become more similar to each other (e.g., becoming more conservative when other (e.g., becoming more conservative when dating a conservative)dating a conservative)
PMa
Sa
Dd
T
E
t
e
PMa
s
e
a
A
q
x
wf
f
~t~
~
~~d~s~
F
µ
PFa
Sa
D d
T
E
t
e
PFa
s
e
a
A
q
x
wf
f
~t~
~
~~
d~ s~
F
mm
PT1
Sa
Dd
T
E
t
e
PT1
s
e
a
Af
f
~
t~
~
~~
d~s~F
PMa
Sa
Dd
T
E
t
e
PSp
s
e
a
A
q
x
wf
f
~t~
~
~~d~s~
F
PT2
Sa
D d
T
E
t
e
PT2
s
e
a
Af
f
~
t~
~
~~
d~s~ F
PFa
Sa
D d
T
E
t
e
PSp
s
e
a
A
q
x
wf
f
~t~
~
~~
d~ s~
F
mm
PCh
Sa
D d
T
E
t
es
A
f
F
µ µ
mm
mm
PCh
Sa
Dd
T
E
t
e s
A
f
F
1/0
1/.25
1
Cascade
Reality: A=.5, D=.2
Reality: A=.5, S=.2
Reality: A=.4, D=.15, S=.15
Reality: A=.35, D=.15, F=.2, S=.15, T=.15, AM=.3
Reality: A=.45, D=.15, F=.25, AM=.3 (Soc Hom)
Reality: A=.4, A*A=.15, S=.15
Reality: A=.4, A*Age=.15, S=.15
All models require assumptions. Generally, more All models require assumptions. Generally, more assumptions = more biased estimatesassumptions = more biased estimates
For the first time, we have demonstrated For the first time, we have demonstrated independent assessments of the NTFD, independent assessments of the NTFD, StealthStealth, , and and CascadeCascade models models These complicated models work as designed!These complicated models work as designed! In all models, but especially the CTD, please In all models, but especially the CTD, please
don’t REIFY A, C, & D!don’t REIFY A, C, & D!
Conclusions
Those who conceived of these models originally:Those who conceived of these models originally: Jinks, Fulker, Eaves, Cloninger, Reich, Rice, Jinks, Fulker, Eaves, Cloninger, Reich, Rice,
Heath, Neale, Maes, etc.Heath, Neale, Maes, etc. And to Nick Martin: for his energy and And to Nick Martin: for his energy and
enthusiasm, and for encouraging us to do this to enthusiasm, and for encouraging us to do this to begin withbegin with
Acknowledgments
Check bias & identification:Check bias & identification: Feed PE parameters you are modeling, simulate data, Feed PE parameters you are modeling, simulate data,
& see if your model recovers the parameters& see if your model recovers the parameters Check model’s sensitivity to assumptions:Check model’s sensitivity to assumptions:
Simulate violations of assumptions & note its effects Simulate violations of assumptions & note its effects on estimateson estimates
Estimate power & multivariate sampling dist’s of Estimate power & multivariate sampling dist’s of estimates under very general conditions:estimates under very general conditions:
Run PE multiple times given whatever condition you Run PE multiple times given whatever condition you wantwant
Why use it? Modeling aid
Download: www.matthewckeller.com
Find changes in variance parameters & relative Find changes in variance parameters & relative covariances under different modes of AM, VT, & covariances under different modes of AM, VT, & genetic effects:genetic effects:
Simulate random genetic drift by varying Simulate random genetic drift by varying population sizepopulation size
Introduce selection (coming) to test theories on Introduce selection (coming) to test theories on maintenance of genetic variationmaintenance of genetic variation
Why use it? Predictor of population / evolutionary genetics dynamics
Download: www.matthewckeller.com