Application of Integrated Nested Laplace Approximation to Survival Models as used in
Animal Breeding
Sara Martino (1), Gregor Gorjanc (2), & Ingelin Steinsland (1)
(1) Norwegian University of Science and Technology, (2) University of Ljubljana
Workshop in Bayesian Inference for Latent Gaussian Models ... Zürich, Switzerland
February 2011
Outline
1. Animal breeding
2. Longevity of cows
3. INLA framework
4. Application
Animal breeding
- Animal breeding = mixture(animal science, genetics, statistics, ...)
- Many species (cattle, chicken, pig, sheep, goat, horse, dog, salmon, shrimp, honeybee, ...)
- Many (complex) traits:
  - production (milk, meat, eggs, ...)
  - reproduction (no. of offspring, insemination success, ...)
  - conformation (body height, width, ...)
  - health & longevity
  - ...
- Genetic evaluation: inference of unobserved/latent genotypic value in order to enhance selective breeding
Genetic evaluation
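Phenotype decomposition (Fisher, 1918)
\[ P = \mu + G + E + G \times E, \qquad G = A + D + I, \qquad P \approx \mu + A + E \]
Pedigree based (mixed) model (Henderson, 1949+)
\[ \mathbf{y} \mid \mathbf{b}, \mathbf{a}, \sigma^2_e \sim N\left(\mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{a},\ \mathbf{I}\sigma^2_e\right) \]
\[ \mathbf{a} \mid \mathbf{A}, \sigma^2_a \sim N\left(\mathbf{0},\ \mathbf{A}\sigma^2_a\right) \]
Selected candidates will breed the next (better) generation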
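For reference, under the Gaussian mixed model above, estimates of b and predictions of a are commonly obtained from Henderson's mixed model equations. This is a standard result that is not shown on the slide; the symbol \(\lambda\) is notation introduced here.

```latex
\[
\begin{bmatrix}
\mathbf{X}^T\mathbf{X} & \mathbf{X}^T\mathbf{Z} \\
\mathbf{Z}^T\mathbf{X} & \mathbf{Z}^T\mathbf{Z} + \mathbf{A}^{-1}\lambda
\end{bmatrix}
\begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{a}} \end{bmatrix}
=
\begin{bmatrix} \mathbf{X}^T\mathbf{y} \\ \mathbf{Z}^T\mathbf{y} \end{bmatrix},
\qquad \lambda = \frac{\sigma^2_e}{\sigma^2_a}
\]
```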
Longevity of cows
- Breeders want high producing & robust cows
- Involuntary culling due to:
  - fertility problems
  - health status
  - ...
- Robust animals have better longevity = length of productive life (an indirect measure of ability to cope with production environment)
- Improved longevity has economic impact:
  - lower replacement costs
  - lower veterinary costs
  - more animals producing at mature level
  - greater selection response
Statistical analysis of cow longevity
- Some cows are alive at the time of analysis: censored data -> survival analysis
- Need to take into account:
  - time varying covariates
  - genetic (frailty) effect
  - large/huge datasets!!!
- Available software: SurvivalKit (open source, FORTRAN)
  - Weibull model, grouped data model, Cox model
  - empirical Bayes approach (Ducrocq and Casella, 1996)
    - joint posterior mode of hyperparameters
    - effect solutions
  - now at version 6 (Ducrocq et al., 2010)
INLA framework
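Let \((t_i, \delta_i)\) be the observed time and censoring indicator. The Weibull regression model is defined as:
\[ h_i(t) = h_0(t) \exp(\eta_i) = \alpha t^{\alpha - 1} \exp(\eta_i), \]
where \(\eta_i = \mathbf{x}_i^T \mathbf{b}\) is the linear predictor.
This can be cast into the INLA framework (Martino et al., 2010):
- Hyperparameters \(\boldsymbol{\theta} = (\alpha)\)
- Latent Gaussian field \(\mathbf{l} = (\boldsymbol{\eta}, \mathbf{b})\)
- Likelihood \(\pi(\text{data} \mid \mathbf{l}, \boldsymbol{\theta}) = \prod_i \pi(\text{data}_i \mid \eta_i, \boldsymbol{\theta})\)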
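As a minimal sketch of what each likelihood term looks like under the Weibull hazard above: with right censoring, observation i contributes \(h_i(t_i)^{\delta_i} \exp(-t_i^{\alpha} \exp(\eta_i))\). The helper `weibull_loglik` and the data values below are hypothetical and are not part of INLA; the cross-check uses base R's Weibull functions, whose scale parameter corresponds to \(\exp(-\eta_i / \alpha)\) in this parameterisation.

```r
# Log-likelihood of right-censored Weibull data under the hazard
# h_i(t) = alpha * t^(alpha - 1) * exp(eta_i).
# `weibull_loglik` is a hypothetical helper, not an INLA function.
weibull_loglik <- function(time, event, eta, alpha) {
  log_hazard <- log(alpha) + (alpha - 1) * log(time) + eta
  cum_hazard <- time^alpha * exp(eta)
  sum(event * log_hazard - cum_hazard)
}

# Made-up data for illustration: three animals, the second one censored
time  <- c(0.5, 1.2, 2.0)
event <- c(1, 0, 1)
eta   <- c(-0.3, 0.1, 0.4)
alpha <- 1.8

ll <- weibull_loglik(time, event, eta, alpha)

# Cross-check with base R: density for events, survival for censored times
scale <- exp(-eta / alpha)
ll_check <- sum(ifelse(
  event == 1,
  dweibull(time, shape = alpha, scale = scale, log = TRUE),
  pweibull(time, shape = alpha, scale = scale, lower.tail = FALSE, log.p = TRUE)
))
```

The two quantities `ll` and `ll_check` should agree up to floating point error.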
INLA framework - frailty terms
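As long as we keep \(\eta_i\) Gaussian we can add "complicated" terms without changing the model structure:
\[ \eta_i = \mathbf{x}_i^T \mathbf{b} + \mathbf{w}_i^T \mathbf{h} + a_i \]
\[ \mathbf{b} \mid \sigma^2_b \sim N\left(\mathbf{0},\ \mathbf{I}\sigma^2_b\right) \]
\[ \mathbf{h} \mid \mathbf{H}, \sigma^2_h \sim N\left(\mathbf{0},\ \mathbf{H}\sigma^2_h\right) \]
\[ \mathbf{a} \mid \mathbf{A}, \sigma^2_a \sim N\left(\mathbf{0},\ \mathbf{A}\sigma^2_a\right) \]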
INLA framework - time varying covariates
[Figure: hazard h(t) against time t, annotated with the events disease, recovery, increased herd size, and drying off]
We assume piecewise-constant time varying covariates. These can be included via data augmentation noting that:
\[ \int_0^t h(u)\,du = \int_0^{t_1} h(u)\,du + \int_{t_1}^{t_2} h(u)\,du + \cdots + \int_{t_k}^{t} h(u)\,du \]
where in each interval the time varying covariate is constant.
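The step from this decomposition to data augmentation can be spelled out as follows. This is a sketch in notation introduced here: \(S(t) = \exp(-\int_0^t h(u)\,du)\) is the survival function, \(t_0 = 0\) and \(t_{k+1} = t\).

```latex
\[
h(t)^{\delta}\, S(t)
  = h(t)^{\delta} \prod_{j=0}^{k} \exp\left(-\int_{t_j}^{t_{j+1}} h(u)\,du\right)
  = h(t)^{\delta} \prod_{j=0}^{k} \frac{S(t_{j+1})}{S(t_j)}
\]
```

Each factor \(S(t_{j+1})/S(t_j)\) is the likelihood of a record that enters at \(t_j\) (left truncation) and is censored at \(t_{j+1}\); only the last record carries the actual event indicator \(\delta\). One animal can therefore be represented by several elementary records, each with constant covariates.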
Application - data
- A subset of data from national genetic evaluation of cow longevity in Slovenia (a small country!!!)
- 20,330 cows (daughters of 194 bulls) from 770 herds
- Data
  - age at first calving
  - stage of lactation within parity (time varying)
  - year (time varying)
  - herd size change (time varying)
  - herd (time varying)
  - cow
  - pedigree
Data preparation
- 20,330 cows -> 189,504 elementary records
        id start stop event ageN parSta year herdS herd fid
---
 1 1627208     0   60     0   30      1    1     4    1   1
 2 1627208    60  150     0   30      2    1     4    1   1
 3 1627208   150  194     0   30      3    1     4    1   1
 4 1627208   194  270     0   30      3    2     4    1   1
 5 1627208   270  347     0   30      4    2     4    1   1
...
16 1627208   925 1012     0   30     13    4     7    1   1
17 1627208  1012 1018     0   30     14    4     7    1   1
---
18 1628324     0   60     0   26      1    1     7    2   3
19 1628324    60  150     0   26      2    1     7    2   3
20 1628324   150  309     1   26      3    1     7    2   3
...
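The expansion into elementary records can be sketched in base R as below. The helper `split_record` and its arguments are hypothetical (they are not the code used to prepare the data), and the change points are read off the table above for cow 1628324.

```r
# Hypothetical helper: expand one cow's record into elementary records,
# one row per interval on which all covariates are constant.
# `cuts` holds the days at which any time varying covariate changes.
split_record <- function(id, end, event, cuts) {
  cuts  <- sort(cuts[cuts > 0 & cuts < end])
  start <- c(0, cuts)
  stop  <- c(cuts, end)
  data.frame(
    id    = id,
    start = start,
    stop  = stop,
    # only the last interval can end with the observed event
    event = c(rep(0, length(cuts)), event)
  )
}

# Cow 1628324: covariates change at days 60 and 150,
# and the event (event = 1) is recorded at day 309
rec <- split_record(id = 1628324, end = 309, event = 1, cuts = c(60, 150))
print(rec)
```

The covariate values valid in each interval (ageN, parSta, year, and so on) would then be attached to the corresponding row.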
Model
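\[ \mathbf{y} \mid \boldsymbol{\eta}, \alpha \sim \text{Weibull}(\boldsymbol{\eta}, \alpha), \qquad \log(\boldsymbol{\eta}) = \mathbf{X}\mathbf{b} + \mathbf{W}\mathbf{h} + \mathbf{Z}\mathbf{a} \]
\[ \mathbf{b} \mid \sigma^2_b \sim N\left(\mathbf{0},\ \mathbf{I}\sigma^2_b\right) \quad \text{(covariates)} \]
\[ \mathbf{h} \mid \sigma^2_h \sim N\left(\mathbf{0},\ \mathbf{I}\sigma^2_h\right) \quad \text{(herd)} \]
\[ \mathbf{a} \mid \mathbf{A}, \sigma^2_a \sim N\left(\mathbf{0},\ \mathbf{A}\sigma^2_a\right) \quad \text{(breeding value)} \]
Hyperparameters \(\boldsymbol{\theta} = (\alpha, \sigma^2_h, \sigma^2_a)\)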
Caution: we need enough phenotypes and a structured pedigree for proper decoupling of genetic and environmental components!!! This matters in particular with non-Gaussian likelihoods.
Population structure in cattle
- Only a few bulls are used due to artificial insemination
- We get large groups of daughters (half-sisters) -> progeny test
- Cows have few daughters
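Animal & Sire model
Animal model: 20,330 cows, 194 bulls (sires)
Sire model: 0 cows, 194 bulls (sires)
[Figure: graphs of the animal model (left) and the sire model (right). In both, the effects b_j (j = 1 : nB, through X_{i,j}), h_j (j = 1 : nH, through W_{i,j}, with variance σ²_h) and the genetic effects (through Z_{i,k}) enter the linear predictor η_i, which together with α gives y_i (i = 1 : nY). In the animal model, a_k (k = 1 : nI) depends on the parental values a_f(k) and a_m(k) with weights 1/2 and 1/2, on D_{k,k}, and on σ²_a. In the sire model, the sire effects a_f(k) (k = 1 : nI) have identity structure I and variance σ²_s.]
\[ \sigma^2_s = \tfrac{1}{4}\sigma^2_a \]
\[ \mathbf{A}^{-1} = \left(\mathbf{T}^{-1}\right)^T \mathbf{D}^{-1} \mathbf{T}^{-1}, \qquad \mathbf{T}^{-1} = \mathbf{I} - \tfrac{1}{2}\mathbf{P} \]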
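The factorisation of the inverse relationship matrix can be illustrated on a toy pedigree in base R. This is a sketch under stated assumptions: the three-animal pedigree is made up, inbreeding is ignored, and the diagonal of D uses the usual Mendelian sampling variances (1, 3/4 or 1/2 for zero, one or two known parents), which the slide does not spell out.

```r
# Toy pedigree (made up for illustration): animals 1 and 2 are unrelated
# founders, animal 3 is their offspring. Inbreeding is ignored.
sire <- c(NA, NA, 1)
dam  <- c(NA, NA, 2)
n    <- length(sire)

# P[k, j] = 1 if animal j is a parent of animal k
P <- matrix(0, n, n)
for (k in seq_len(n)) {
  if (!is.na(sire[k])) P[k, sire[k]] <- 1
  if (!is.na(dam[k]))  P[k, dam[k]]  <- 1
}

# T^{-1} = I - 1/2 P
Tinv <- diag(n) - 0.5 * P

# Mendelian sampling variances without inbreeding:
# 1 (no known parent), 3/4 (one known parent), 1/2 (both parents known)
n_parents <- (!is.na(sire)) + (!is.na(dam))
D <- diag(1 - 0.25 * n_parents)

# A^{-1} = (T^{-1})^T D^{-1} T^{-1}
Ainv <- t(Tinv) %*% solve(D) %*% Tinv

# Inverting back gives the relationship matrix itself
A <- solve(Ainv)
print(A)
```

The point of the factorisation is that A^{-1} is sparse and can be built directly from the pedigree, without ever forming or inverting A; the call to `solve(Ainv)` here only serves as a check on the toy example.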
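INLA call from R (~30 min)
## Scale times
## (assumption: the slide's `max` stands for the largest stop time;
## a bare `max` is R's built-in function and cannot be used as a divisor)
maxTime <- max(myData$stop)
myData$start <- myData$start / maxTime
myData$stop <- myData$stop / maxTime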
## Specify model
model <- inla.surv(stop, event, start) ~
ageN0 + parSta + year + herdS +
f(herd, model="iid") +
f(fid, model="iid")
## Run INLA
fit <- inla(formula=model, family="weibull",
data=myData)
## For comparison with SurvivalKit
## f(..., prior="logiflat")
## control.data=list(prior="logiflat")
## To fix alpha
## control.data=list(initial=log(2), fixed=TRUE)
Results - covariates
[Figure: four panels. Risk ratio by year; risk ratio by herd size change; risk ratio by stage of lactation within parity; hazard against length of productive life (days).]
Results - hyperparameters (SKit - lines)
[Figure: four panels, one each for α, σ²_h, σ²_s and σ²_a]
\[ h^2 = \frac{4\sigma^2_s}{\sigma^2_s + \sigma^2_h + 1} = 0.26 \]
Too large?
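The heritability formula for the sire model is plain arithmetic on the variance components, as the sketch below shows. The function name `heritability` and the input values are hypothetical and serve only to illustrate the calculation; they are not the estimates from this analysis.

```r
# Heritability for the sire model as defined on the slide:
# h2 = 4 * sigma2_s / (sigma2_s + sigma2_h + 1)
# The factor 4 reflects sigma2_s = 1/4 * sigma2_a.
heritability <- function(sigma2_s, sigma2_h) {
  4 * sigma2_s / (sigma2_s + sigma2_h + 1)
}

# Hypothetical variance values, chosen only to illustrate the arithmetic
h2 <- heritability(sigma2_s = 0.1, sigma2_h = 0.1)
print(h2)
```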
Conclusion
- A long history of using latent Gaussian models in animal breeding
- Extended INLA R-package to work with time varying covariates
- SurvivalKit & INLA give very similar point estimates
- Application (work in progress) -> continue with relationship matrices and larger dataset
References
- Ducrocq V. and Casella G. (1996) A Bayesian analysis of mixed survival models. Genet. Sel. Evol., 28:505-529
- Ducrocq V., Sölkner J. and Mészáros G. (2010) Survival Kit v6 - a Software Package for Survival Analysis. In: 9th World Cong. Genet. Appl. Livest. Prod., Leipzig, Germany
- Martino S., Akerkar R. and Rue H. (2010) Approximate Bayesian Inference for Survival Models. Scan. J. Stat.