Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
98 Twin Research Volume 7 Number 1 pp 98plusmn 113
We investigate whether relative contributions ofgenetic and shared environmental factors are
associated with an increased risk in melanoma Datafrom the Queensland Familial Melanoma Projectcomprising 15907 subjects arising from 1912 fami-lies were analyzed to estimate the additive geneticcommon and unique environmental contributions tovariation in the age at onset of melanoma Two com-plementary approaches for analyz ing correlatedtime-to-onset family data were considered the gener-alized estimating equations (GEE) method in whichone can estimate relationship-specific dependencesimultaneously with regression coefficients thatdescribe the average population response to chang-ing covariates and a subject-specific Bayesian mixedmodel in which heterogeneity in regression para-meters is explici tly modeled and the differentcomponents of variation may be estimated directlyThe proportional hazards and Weibull models wereutilized as both produce natural frameworks for esti-mating relative risks while adjusting for simultaneouseffects of other covariates A simple Markov ChainMonte Carlo method for covariate imputat ion ofmissing data was used and the actual implementationof the Bayesian model was based on Gibbs samplingusing the free ware package BUGS In addition wealso used a Bayesian model to investigate the relativecontribution of genetic and environmental effects onthe expression of naevi and freckles which areknown risk factors for melanoma
Melanoma is a complex chronic disease the incidenceof which has more than doubled over the past 20 years(MacLennan et al 1992) The diseasersquos complexity canbe in part explained in terms of the joint effects ofgenotype and environment and censorship due to thelate onset of the disease in most patients
Studies have revealed a number of possible under-lying factors that may contribute to the risk ofmelanoma (English et al 1997 Siskind et al 2002Swerdlow amp Green 1987) Such factors haveincluded sun exposure skin hair and eye colourdegree of freckling number of naevi place of birthand ethnic origin It is also thought that certain genesare responsible for a personrsquos susceptibility to thedisease (Aitken et al 1998) and that a number of theabove mentioned risk factors may themselves begenetically influenced
A preliminary segregation analysis conducted byAitken et al (1998) investigated whether the familialclustering of cutaneous melanoma was consistentwith Mendelian inheritance of a major autosomalgene Analyses were performed with the SAGE statis-tical package using the maximum likelihood REGTLprogram for a binary trait The hypothesis of co-dom-inant Mendelian inheritance gave a significantlybetter fit to the data than either dominant or recessiveMendelian inheritance Overall both Mendelianinheritance of a single major gene and purely environ-mental transmission were rejected However therewas strong evidence of familial dependence inmelanoma occurrence
Despite these findings the etiology of melanoma isstill not well understood For example it is unclearwhy some people are more susceptible to the diseasethan others and for susceptible cases it is unclearwhether certain risk factors play a role in the progres-sion of the disease or if genetic factors are a majorsource of influence
Analysis of Melanoma Onset Assessing Familial Aggregation by Using Estimating Equations and Fitting Variance Components via Bayesian Random Effects Models
Kim-Anh Do1 Joanne F Aitken2 Adegravele C Green3 and Nicholas G Martin3
1 Department of BiostatisticsThe University of Texas MD Anderson Cancer CenterHoustonTexas USA2 Queensland Cancer Fund Epidemiology Unit BrisbaneAustralia3 Epidemiology and Population Health Unit Queensland Institute of Medical Research BrisbaneAustralia
Address for correspondence Dr K-A Do Department of BiostatisticsThe University of Texas MD Anderson Cancer Center 1515 HolcombeBlvd Houston Texas 77030 USA Email kimmdandersonorg
Research into chronic diseases such as melanomahas largely been achieved through family studieswhere the age of onset of the disease is modelledwithin a survival analysis framework An example ofthis type of analysis is the work by Abel and Bonney(1990) who developed a model accounting for age ofonset where the hazard function was expressed interms of a major gene effect and residual familydependence using the regressive approach described inBonney (1986)
Clayton (1978) and Vaupel (1979) introduced theso-called ldquofrailtyrdquo model which has been extended bymany others (Andersen et al 1993 Clayton ampCuzick 1985 Gauderman amp Thomas 1994Hougaard 1986 Li amp Thompson 1997 Li ampWijsman 1998 Nielsen et al 1992 Siegmund ampMcKnight 1998) A variant of these approaches usesestimating equations which accommodates correlatedage of onset outcomes and is known to be robust andcomputationally efficient (Hsu amp Prentice 1996 Hsuamp Zhao 1996) A common theme is the investigationof possible underlying genetic and environmentalfactors that may influence the age of onset of thedisease of interest
Recently the focus of familial studies involvingcensored data has moved towards the development ofBayesian methods using packages such as BUGS andWinBUGS (Gilks et al 1994 Spiegelhalter et al1996a 1996b 2003) the Genetic Analysis Package(GAP 1996) and MIXD (Olshen amp Wijsman 1996Thompson 1994) Examples of this include theimplementation of Markov Chain Monte Carlo(MCMC) methods for linkage analysis (Heath 1997Kong et al 1992 Lange amp Sobel 1991) the estima-tion of parameters in a mixed model with andwithout covariates (Guo amp Thompson 1991Thomas 1992) estimation of a gene-smoking interac-tion and covariate imputation (Gauderman et al1997) combined linkage and segregation analysis(Guo amp Thompson 1992 Faucett et al 1993) anddevelopment of mixed models for large complex pedi-grees (Guo amp Thompson 1994) Recent work by Doet al (2000) demonstrates an application of Bayesianmethodology to menopausal age in twins using a gen-eral ised l inear mixed model (GLMM) In thisapplication they investigated the contribution ofcovariates and any underlying genetic and environ-mental factors to explain variation in menopausalage A similar investigation by Scurrah et al (2000)fitted a Bayesian model to survival data In this appli-cation the authors used a Bayesian piece-wiseexponential model to explore the time to onset of res-piratory disease given known risk factors andpossible familial effects Both papers highlight theflexible nature of the Bayesian approach which canbe seen through the inclusion of priors and the inte-gration of fixed and random effects
The Bayesian approach to fitting a genetic modelin a GLMM framework was examined in detail inKuhnert and Do (2003) In this paper a simulationstudy investigated the flexible nature of the Bayesianmodel and its ability to incorporate genetic compo-nents through random effects This was comparedwith standard maximum likelihood methods for esti-mating genetic components in the model Resultsfrom a simulation study indicated a consistent advan-tage in using the Bayesian method to detect a correctmodel under particular scenarios of additive geneticsand common environmental effects Moreover forbinary data there was difficulty in detecting thecorrect model under low and moderate levels of heri-tability Results however were improved for ordinaldata under similar scenarios
We present an alternative model using Bayesianmethodology which takes into account the complexfeatures inherent with melanoma using a large datasetcomprising 1449 families The model does not assumeproportional hazards but a multiplicative modelwhere the Weibull distribution is used to model the ageat onset of melanoma This is fit in a Bayesian frame-work which incorporates fixed and random effects toestimate possible risk factors covariates and anyunderlying genetic factors Another approach toaccommodate the correlated age-at-onset outcomes rig-orously is to use the estimating equations for assessingfamilial aggregation of age-at-onset (Hsu amp Prentice1996 Hsu amp Zhao 1996) It has two desirable fea-tures (i) robustness mdash no higher-order distributionalassumptions are required beyond pairwise ones and(ii) computational efficiency
In this article our interest lies primarily withinvestigating whether certain risk factors are able toexplain a considerable proportion of the familialdependence and if including them into the modelreduces the residual variation and results in increasedpower for detecting a major gene effect To ourknowledge this study will be the first in melanomaresearch that extends current methodology and incor-porates covariate and genetic effects simultaneouslywith age of onset using Bayesian methodology onfamily data
Material and MethodsThe Data on the Age at Onset of Melanoma and Potential Risk Factors
We analysed data from the Queensland FamilialMelanoma Project Family ascertainment and datacollection have been described in detail (Aitken et al1996) Assessing standard melanoma risk factorsinclude counts of naevi on the arms and back demo-graphic and medical details lifetime residence andfamily history of melanoma and other cancers Brieflywe ascertained all 12016 first incident cases of cuta-neous melanoma (invasive and in situ) diagnosed inQueensland residents between 1982 and 1990 and
99Twin Research February 2004
Analysis of Melanoma Onset
reported to the Queensland Cancer Registry or foundby comparing cancer registrations for 1984 and 1987with records of pathology laboratories throughoutQueensland It is estimated that registry records areapproximately 95 complete for the study periodDoctorsrsquo permission was obtained to approach10407 cases of whom 7784 (75) returned a brieffamily history questionnaire stating whether any oftheir first-degree relatives (parents siblings children)had had a diagnosis of melanoma A total of 2920probands was sampled from these respondentsincluding all who had claimed a positive familyhistory (n = 1529) and an approximate 20 randomsample of the remainder (n = 1391) Probands weresent a detailed family history questionnaire askingfor the names and addresses of all first-degree rela-tives relativesrsquo vital status dates of birth and agesand whether any of these relatives had had amelanoma diagnosed by a doctor To avoid bias indetermining the mode of inheritance second andlower degree relatives were enrolled in the studyaccording to a sequential sampling scheme (Canningsamp Thompson 1977) First degree relatives of all rela-tives with confirmed melanoma were ascertainedthrough the detailed family history questionnairedescribed above which was mailed to all confirmedpositive relatives In total 15989 relatives belongingto 1912 separate families were reported by 2118(73) probands or other positive relatives A total of1044 relatives for whom date of birth was unknownwere excluded leaving 14945 relatives for analysisThere were 188 families independently ascertainedthrough two or more probands To avoid ascertain-ment bias these families were included in the datasetseparately for each proband in the family
Medical confirmation and dates of diagnosis weresought for the relatives reported by probands or otherrelatives to have had melanoma After eliminating187 of subjects who refused access to their medicalrecords or those with lost records or those with falsepositive reports (basal or squamous cell carcinomasolar keratoses or benign naevi) medical confirma-tion of melanoma as the diagnosis was obtained for482 of the original number of relatives Only themedically verified cases among relatives were classi-fied as true events all other relatives were treated asunaffected (censored at last date of contact)
Risk factor questionnaires were subsequentlymailed to all living relatives aged between 18 and 75years ascertained through the sequential samplingprocedure Other relatives provided proxy reportsThe combined number of proxy-reports and self-reports was 9746 relatives for whom standard riskfactor information was available
For the Bayesian analysis we focused on familiesthat included at least one parent and at least onechild where each member in the family should haveinformation on age at diagnosis or age at last follow-up and with maximum one missing covariate
The demographic covariates and hypothesizedmelanoma risk factors included gender birth yearplace of birth ability to tan (very brown moderatetan slight tan no tan) propensity to burn (neverburn and always tan sometimes burn and usually tanusually burn and sometimes tan always burn andnever tan) number of sunburns (0 1 2ndash5 gt 6) skincolor (olivedark medium fairpale) hair color(black lightdark brown fairblonde lightdark red)eye color (brown greenhazel bluegrey) total freck-ling in summer (0 1ndash100 gt 100) number of naevi(none few moderate number very many) andnumerous measures of cumulative lifetime exposuresto sun and ultraviolet rays
Preliminary Exploratory Analysis
As a preliminary analysis we ignored correlationswithin families and applied a combination of paramet-ric and non-parametric survival analysis techniques asexploratory tools to identify possible risk factors formelanoma Once these fixed effects were identified weconsidered incorporating these into a subsequent gen-eralized estimating equation model or a Bayesianmodel with random effects that could account forwithin-family correlations The aim was to quantify thegenetic and familial associations in the presence ofobserved covariate effects
Manipulation of the entire melanoma datasetresulted in a subset of 9669 observations with a rangeof explanatory variables that described phenotypiccharacteristics for each individual along with somedemographic details such as birth year and genderThe response variable was the time to diagnosis (orage at the last follow-up) with the proportion of cen-sored cases being approximately 76 The medianage at onset of melanoma was 43 The correlationestimates for age of onset for different relationshippairs with both affected members were 067 055and 039 for sibndashsib parentndashchild and secondlowerorder pairs respectively
The first stage of modelling involved fitting uni-variable proportional hazard models to assess eachvariablersquos individual effect on the time of onset ofmelanoma The SAS (Allison 1995) package wasused to fit proportional hazards models of the follow-ing form
h(t x) = Y (x b)h0(t)
where Y represents a log-linear function ebrsquox of theexplanatory variables x and corresponding coeffi-cients b and h0(t) represents the baseline hazard attime t
Significant variables associated with the age atonset of melanoma consisted of eye hair and skincolour freckling number of moles skin type abilityto burn ability to tan previous skin cancers ultravio-let exposure between the ages of 5 and 12 years
100 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
101Twin Research February 2004
Analysis of Melanoma Onset
cumulative sun exposure up to the age of 19 yearsand birth year
The second stage fitted multivariable proportionalhazard models to those explanatory variables thatwere significant at the univariable stage Table 1 dis-plays the results from the final model which onlyincluded significant variables (p-value lt 05) It isworth noting that a similar result could be obtainedusing an automated stepwise procedure
Results from this analysis highlight some interest-ing but quite obvious risk factors noted in previousanalyses (Aitken et al 1996 1998) For example
An increase of 1 year in birth year induces 17increase in risk of earlier melanoma onset
People with neither freckles nor naevi have thelowest risk of melanoma onset
The risk of earlier melanoma onset is increased byup to 37 for blue eyed people and even further(46) for green eyed people when compared toindividuals with brown eyes
ldquoRed Headsrdquo have an increased risk of earliermelanoma onset (46) when compared to indi-viduals with black hair However no significantincrease was noted for individuals with fair orlight red hair
A personrsquos tendency to burn easily increases therisk of earlier melanoma onset in some cases byup to 100 compared to those that never burn
However striking this last statement is issues of con-founding must also be considered The most obviousillustration of this is the confounding that occurredbetween mole count and freckling This is seenthrough close inspection of the parameter estimateswhich changed in magnitude when mole count wasadded to the model after adjusting for freckling(Results not displayed here)
To reduce the dimension of the problem furtherand avoid some of these confounding issues a sur-vival tree was constructed using RPART (RecursivePartitioning and Regression Trees) see Therneau andAtkinson (1997) Survival trees are a special case of
Table 1
Results from Fitting a Multivariable Proportional Hazards Model to the Melanoma Data Based on Univariable Results The Results Reported in thisTable Are the Parameter Estimates b Their Standard Errors se(b) the Relative Risk eb and the p-value for Each Estimate
Variable b se(b) eb p-value
Birth Year 016 lt 001 117 lt 05
Eye Colour (Baseline Brown)
BlueGrey 031 007 136 lt 05
GreenHazel 038 007 146 lt05
Hair Colour (Baseline Black)
Light RedGinger 017 015 119 27
Dark RedAuburn 038 015 146 lt 05
FairBlonde 006 012 106 62
Light Brown 014 012 115 22
Dark Brown 002 012 102 87
Skin Type (Baseline never burn)
Always burn 069 016 1968 lt 05
Usually burn 045 015 157 lt 05
Sometimes burn 031 015 136 lt 05
Freckling (Baseline none)
1 to 100 017 006 118 lt 05
gt 100 009 008 110 23
Mole Count (Baseline none)
Few 029 007 134 lt 05
Moderate 079 008 220 lt 05
Many 112 010 308 lt 05
Number of Sunburns (Baseline none)
One ndash007 011 093 49
2 to 5 ndash006 009 094 50
gt 6 017 009 119 07
Cumulative Sun Exposure ( lt 5 yrs) 004 001 104 lt 05
UV Exposure (5ndash12 yrs) 00003 lt 00001 1 lt 05
102 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
decision trees that were incorporated into RPART byTherneau (1997) using the ideas put forth by LeBlancand Crowley (1992) for survival data with censoringThese authors showed through simulation studies thatsurvival trees could outperform standard parametricmethods such as proportional hazards modelling particularly in situations where the underlying distrib-ution was not exponential Along with Breiman et al(1984) they showed decision trees to be usefulexploratory tools for identifying important variablesinteractions and outliers
The methodology for survival trees begins by iden-tifying optimal splits using the log rank statistic toseparate the data into homogeneous groups Eachsplit is comprised of a parent node and two daughternodes which are linked to the parent by branchesFigure 1 is an example of a survival tree produced onthe melanoma data Once a large tree is grown withmany terminal nodes that contain very few observa-tions a pruning procedure is introduced to identify asequence of sub-trees The technique of pruninginvolves snipping back splits of the tree one at a timeuntil only the root node remains Cross-validation isthen introduced to aid in the selection of the optimalmodel One nice feature of survival trees is the use ofKaplan-Meier curves to provide information aboutsurvival rates at each terminal node of the tree Thisapproach may allow for better interpretation of theterminal nodes and highlight different scenarioswhich yield similar survival rates as can be seen inFigure 1
The variables identified from the multivariablemodel were used as input into RPART Variables suchas the sun and UV exposures were omitted from themodelling stage since nearly half of the data for eachof these variables were missing (cumulative sun expo-sure 438 UV exposure 404) A large survivaltree was produced in SPLUS and using cross-valida-tion a model splitting solely on birth year wasidentified This survival tree yielded the minimumcross-validated error rate (093) suggesting that birthyear was an important indicator for melanomaSelecting a slightly larger but more informative model(error = 094) resulted in a survival tree consisting ofnine terminal nodes This revealed splits on birth yearmole count and freckling Al l other variablesappeared to be either competing at each node oracting as a surrogate variable for one of these primarysplits Skin type in particular arose as an importantsurrogate for freckling
In Figure 1 we aim to provide a more visual inter-pretation to the results from the survival analysisEach split is shown at the top of each node and canbe assumed to be the split that directs observationstowards the left side of the tree For example aftersplitting on birth year at 1933 a split to the left cor-responds to an individual with few moles A split tothe right indicates an individual with many moles andso on The terminal nodes in this figure are displayed
as Kaplan-Meier plots showing the survival curvesalong with the number of observations n and the riskratio calculated with reference to the baseline groupof 9669 individuals The reference survival curve isshown at the root node along with a bar which showsthe percentage of events in the entire dataset Gridsare placed on each survival plot at equal intervals of20 years of age These grids have been plotted to sim-plify the interpretation An indicator to the left of theplot displays the risk ratio Ratios above one areshown by an ldquoup arrowrdquo while a decrease in risk isillustrated by a ldquodown arrowrdquo From this model wecan see that there are a few scenarios that indicatehigh risk for earlier onset of melanoma These scenar-ios may be described as
individuals born after 1966 with many moles (RR = 113)
individuals born between 1947 and 1966 withmany moles (RR = 45)
individuals born between 1933 and 1953 with fewmoles but many freckles (RR = 24)
individuals born between 1923 and 1933 withmany moles (RR = 141)
It is obvious from these results that birth year has asubstantial impact on the age-at-onset of melanomaOnce this is taken into account mole count and freck-ling only provide a small contribution to the risk
Family History of Melanoma
As described earlier in the Data section family historywas collected regarding first-degree (siblings andparents) and second-degree relatives From this pedi-gree structure other higher-order types of relative pairscould also be formed Some of these relatives were dis-eased with melanoma resulting in pairs of relativeswho both may be diseased (++) both not diseased (mdash)and one diseased while the other was not diseased (+ndash)Table 2 lists the concordant and discordant pairs ofspecific relationships sibndashsib parentndashchild andsecond-degreelower order relative pairs The second-degree relative pairs include grandparentndashgrandchildand auntndashniece while the lower order relative pairsinclude the in-law pairs From Table 2 the percent-ages of both diseased pairs are 06 09 and 04among sibndashsib parentndashchild and secondlower orderpairs respectively Crude estimates of correlationcoefficients can be calculated from these percentageswithout accounting for ages at onset among these rel-atives However the risk of developing melanomamay depend on the subjectrsquos age Hence adjusting forthe age at onset is essential in quantifying the correla-tion of age at onset between pairs of relatives In asubsequent section we describe how this can be donerigorously via the generalized estimating equationsapproach On average melanomas were diagnosedslightly earlier in relatives (475 years) than inprobands (502 years) Among relatives melanomaswere diagnosed at younger ages in later generations
103Twin Research February 2004
Analysis of Melanoma Onset
RR
= 1
00
Bas
elin
e
Fig
ure
1
Surv
ival
tree
for t
he m
elan
oma
data
con
sist
ing
of 9
669
indi
vidu
als
from
191
2 Qu
eens
land
fam
ilies
Ter
min
al n
odes
are
rect
angu
lar a
nd d
ispl
ay th
e Ka
plan
-Mei
er c
urve
for t
he c
orre
spon
ding
sub
grou
p
104 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
To account for the different ages at censoring in eachgeneration due to termination of the study or deathfrom causes other than melanoma we examined thedisease-free survival distribution for each generationusing the standard failure-time analysis technique(Figure 2) The median age at diagnosis of melanomawas 64 among parents of probands 50 among sib-lings of probands and 33 among children ofprobands The disease-free survival functions differedsignificantly between the children generation fromearlier generations (log-rank test with p lt 01) butthere was no significant differences in age-at-onset of
melanoma between the siblings and parents ofprobands (log rank test with p gt 9)
Preliminary Segregation Analysis
A preliminary segregation analysis was conducted(Aitken et al 1998) to investigate whether the famil-ial clustering of cutaneous melanoma is consistentwith Mendelian inheritance of a major autosomalgene Analyses were performed with the SAGE (1992)statistical package using the maximum likelihoodREGTL program for a binary trait with a variable ageof onset The hypothesis of co-dominant Mendelianinheritance gave a significantly better fit to the datathan either dominant or recessive Mendelian inheri-tance Overall both Mendelian inheritance of a singlemajor gene and purely environmental transmissionwere rejected However there was strong evidence offamilial dependence in melanoma occurrence
The inclusion of risk factors in the models mayreveal whether all or a combination of these explainsthe familial dependence that was demonstrated by thesegregation analysis If a major gene exists that oper-ates independently on these covariates includingthem in the models may reduce residual variation andincrease the power of the analysis to detect a majorgene effect
Estimating Equations Approach
Let y = (dki tki Zki) denote the data collected for the ith
member in the kth family (k = 1hellipK and i = 12)where dki = 0 if the observation is censored tki is eitherthe recorded age at diagnosis of melanoma or the ageat the most recent follow-up for unaffected peopleand Zki is a vector of measured covariates We assumethat censoring time age at diagnosis of melanomaand the covariates are independently distributedThese assumptions can be relaxed in more generalmodels subject to identification constraints Thehazard rate for melanoma is the instantaneous proba-bility that melanoma is diagnosed immediately aftertime t given that the person is unaffected at time tThe hazard rate under the Cox proportional hazardsmodel (Cox 1972) is given by
l(tki) = l0(tki) exp(brsquoZki)
Table 2
Concordant and Discordant Pairs of Relatives in 1912 Families from the Queensland Familial Melanoma Project Probands Were Not Included forthe Calculation of Concordance
Sibndashsib Parentndashchild SecondOthers Total
++ 49 15 41 105
+ndash 763 536 1200 2499
mdash 7011 1078 9817 17906
Total 7823 1629 11058 20510
Figure 2
Cumulative melanoma-free survival among first-degree relatives ofconfirmed cutaneous melanoma cases (probands) diagnosed inQueensland Australia 1982ndash1987 according to the relativersquos rela-tionship to the proband
where l0() is the baseline hazard function and b is avector of regression coefficients
For a specific pair of relatives we follow Clayton(1978) in modeling the bivariate survivor function
F(tk1 tk2) = (F1(tk1)ndashq + F2(tk2)
ndashq ndash1)ndash1q
where F1 and F2 are univariable survivor functions qis a scalar parameter that measures the degree ofdependence between the relativesrsquo times at onsetindependence being implied by q = 0 and positiveassociation by q gt 0 The Clayton model allows nega-tive dependencies and has the property that failuretimes are absolutely continuous for q gt ndash05 In addi-tion the cross-ratio (or odds-ratio) function asstudied by Oakes (1989) is
c(tk1 tk2) = l(tk1 frac12 Tk2 = tk2) l(tk1frac12Tk2 sup3 tk2) =1 + q
This is equivalent to assuming that the odds-ratio isinvariant over the grid region that supports the dataHeuristically the parameter 1 + q is an odds-ratiothat depends on the degree of dependence betweenthe onset ages of the two relatives If genetic factorsdo influence the age at onset of melanoma we wouldexpect to see a higher concordance in the age of onsetin first degree relatives who on average share halftheir genes in common in comparison to seconddegree relatives Under the current model this trans-lates as qfirst-degree gt qsecond-degree
We may use a standard method to estimate withinpair correlations for 2 acute 2 tables from odds ratiosEstimates of relation-pair correlations psibndashsib pparentndashchildand psecond-order are recovered from using the relation-ships for example r sibndashsib = min(1ln(1 + q sibndashsib))Testing for the presence of genetic factors underlyingthe age at diagnosis of melanoma is equivalent totesting H0 p first-order = psecond-order We may test thishypothesis using a z-transform (Kendall 1979 p 315) of the point estimates of the correlation coef-ficients Let n1 and n2 denote the number of first orderand second order relatives let z1and z2 denote thetransformed statistics of r1and r2 the correlation esti-mates for first-order and second-order relative pairsrespectively Specifically we reject H0 when ED gtZ1ndasha where
E = E(z1 ndash z2) = mdash12mdash log 31mdash11
+ndash r
r
1
1mdash2 1mdash11
+ndash r
r2
2
mdash24D2 = V(z1 ndash z2) = mdash
(n1
1ndash 3)mdash+ mdash
(n2
1ndash 3)mdash
and Z1ndasha is the standard normal deviate correspondingto the one-sided a significance level
This approach has the advantage of providing atest for the presence of genetic effects through a singleparameter (q) However it is limited in its ability to
attribute the phenotypic variance to specific effects(eg additive gene action)
Mathematical details of the GEE model and theiterative procedure to estimate the regression coeffi-cients b and specific degrees of dependence q for thedifferent types of relative pairs have been summarizedpreviously in Do et al (2000)
MCMC Analysis Using BUGSThe Bayesian Paradigm and Gibbs Sampling
Markov Chain Monte Carlo (MCMC) is an alterna-tive Bayesian approach that provides estimates oflikelihoods and associated parameter values whenexact computation is infeasible (Hastings 1970Metropolis et al 1953) MCMC methods can beused to draw samples from the underlying joint distri-bution of major genotypes and polygenic valuesconditional on the observed data From thesesamples desired parameters and likelihoods can beestimated without the need to resort to exact compu-tation MCMC methods have been used for linkageanalysis (Kong et al 1992 Lange amp Sobel 1991)for estimation of parameters in the mixed model withand without covariates (Guo amp Thompson 1991Thomas 1992) for estimation of genendashsmoking inter-action and covariate imputation (Gauderman et al1997) for performing combined linkage and segrega-tion analysis (Faucett et al 1993 Guo amp Thompson1992) and for mixed models of large complex pedi-grees (Guo amp Thompson 1994)
In a general setting let y be the observed data andq be everything not observed including parametersand latent variables The implementation of Bayesianmethods using realistic models and priors is com-puter-intensive and relies heavily on cunningcomputational tools to approximate integrals Theproblem in general terms is to obtain the expectedvalue of a function of interest s () under the posteriordensity p(qfrac12x)
E[s(q)] =
which cannot generally be found analytically Onemethod to carry out the integration on the RHS is toperform simulation of exact Bayesian posterior distri-butions using Markov chain Monte Carlo techniquessuch as Gibbs sampling The Gibbs sampler (Gemanamp Geman 1984) is the most popular algorithm usedin MCMC applications to correlated data Gibbssampling was introduced to the main statistical com-munity by Gelfand and Smith (1990) and has sincebeen applied to an even wider array of problems TheGibbs sampler is easy to implement because it onlydepends on the local neighborhood structure In thecontext of pedigree analysis (Olshen amp Wijsman1996) the basic procedure is a sequential updating ofmissing and latent data including the underlying andunobserved major genotypes polygenic effects and
EQ s(q) p(q) p(xfrac12q) dqmdash mdash mdash
EQ p(q) p(xfrac12q) dq
105Twin Research February 2004
Analysis of Melanoma Onset
106 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
environmental effects Values for the missing or latentdata are sampled from the local conditional distribu-tion a function of the observed individual data thecurrent sampled values of other missinglatent data forthis particular individual such as polygenic and environ-mental effects and the values for the sampled geneticeffects in the immediate neighbors of an individualGibbs sampling basically consists of three main steps
Step 1 Setting initial values for unobserved quanti-ties (parameters and latent variables)
Step 2 For each parameter or latent variable qj sample from its ldquofull conditional distributionrdquogiven the current values of all other quantities inthe model
Step 3 Examine sampled values of parameters andlatent variables to monitor convergence and toprovide summary measures
Some of the most recent and popular packages thatimplement Gibbs sampling for analysis of pedigreedata include BUGS (Gilks et al 1994 Spiegelhalter et al 1996ab) Genetic Analysis Package (GAP1996) and MIXD (Olshen amp Wijsman 1996Thompson 1994) We have used BUGS mainlybecause of its flexibility in programming hierarchicalmodels besides being a freeware product
The Model
The aim here is to model the correlation structurewithin the family structure to satisfy the fundamentaladditive genetic model (Crow amp Kimura 1970Falconer 1990 Kempthorne 1960) as followsConsider a nuclear family structure consisting of 4members father mother and two children denoted byF M S1 S2 respectively Using similar notation as inBurton et al (1999) a conventional mixed linearmodel consisting of fixed and random effects may bewritten in the form
Qij = bTz + Aij +Cij + CSij+ Eij
where Qij is the observed value of a normally distrib-uted continuous trait for the jth individual in the ith
nuclear family zij is a vector of observed covariatesrepresenting fixed effects and b is a correspondingvector of unknown fixed regression coefficients Aij Cij and CSij
A denote random effects that repre-sent additive polygenic common family environmentand common sibling environment effects respectivelyThe variation in an individual response is representedby a composite covariance matrix VT and is the sumof an additive genetic covariance matrix VA acommon family environment matrix VC a sharedsibling environment matrix VCs and residual environ-mental effects The different variance components are
The overall total covariance matrix is
The components of variance s 2A s 2
C s 2Cs need not be
positive as long as VT is positive definite A negativevalue for s 2
Cs simply implies that the realized covari-ance between sib lings is less than the realized
107Twin Research February 2004
Analysis of Melanoma Onset
covariance between parent and a child A negativevalue for s 2
Cs may suggest dominance To optimizeconvergence in BUGS Model (1) may be reparameter-ized as
where Fi Gi H i are independent additive randomeffects or latent variables and are the residual errorterms for parents and children respectively If we model
Fi ~ N(0 mdash12mdash s 2
A +s 2C)
Gi ~ N(0 mdash12mdash s 2
A) Hi ~ N(0 s 2Cs)
then the basic genetics covariance model (additivegenetic common environment and unique environ-ment for this particular four-member family structureis satisfied
In survival models unobserved or unmeasuredexplanatory variables some of which may be geneticare often referred to as frailties The frailties takevalues restricted to the posit ive line and may beassumed to act multiplicatively on the hazardExtending the above model to correlated family datawith time-to-onset endpoint a multiplicative individ-ual heterogeneity or frailty term representing thelatent genetic and common environment variablesmay be modeled as random effects simultaneouslywith the effects associated with observed covariatesConsider right censored time to onset of melanomadata (Tij dij zij)1 j n from n relative pairs hereT i j denotes the true age at onset of the jth familymember or the censored time depending on whetherdij =1 or 0 respectively and z denotes a p acute 1 vector ofcovariates A Weibull distribution may be used tomodel time to failure as
f(ti zi) = ebrsquoz igt igndash1 exp(ndashebrsquoz it g
i )
where b is a vector of unknown regression coefficientsand g is the shape parameter of the Weibull distribu-tion This leads to a baseline hazard of the form
l0(ti) = gt igndash1
Re-parameterize by letting mi = ebrsquoz i the conditionaldistribution of ti given mI is then Weibull (gmi) We for-mulated a mixed model to represent the conditionaldistribution of tij given covariate effects random addi-tive genetic and common environment effects as
tijfrac12mij ~ Weibull (g mij) i = 1hellipn j = 12
where
The regression coefficients and the precision of therandom effects (tG tF tH) were given ldquonon-informa-tiverdquo Normal and Gamma priors respectively The shapeparameter g of the time to onset of melanoma distribu-tion was also given a non-informative Gamma priorwhich was slowly decreasing on the positive real line
The overall model (including random effects) maybe described in a Bayesian graph (Figure 3) whichsimplifies sampling from full conditional distributionsby exploiting part ial independence propert ies(Spiegelhalter et al 1996a) In this graph eachrandom quantity is represented by a node which maybe connected by directed or undirected linksConditional independence assumptions are repre-sented by the absence of such links
We implemented the Gibbs sampler using theBUGS program (Gilks et al 1994) (code in theAppendix) Imputation of missing data was handlednaturally in the Gibbs sampling framework by treatingmissing values as additional unknown quantities andrandomly sampling values from their full conditionaldistributions We chose simple prior distributions forimputation since the number of missing values forcovariates was not large (complete for birth year 16missing for naevi and 21 missing for freckles) andthere was no indication of non-random missingness inour data Therefore imputation for missing naevi andfreckles covariates were based on Bernouilli prior dis-tributions with respective parameter values estimatedfrom the complete observations We performed aninitial 10000 burn-in iterations followed by an addi-tional 20000 Parameter estimates were the mean andstandard deviation (SD) of all post convergence Gibbssamples with a thinning interval between 20 and 50credible intervals were computed as the lower andupper a2 percentiles from the last 20000 iterationsConvergence to the posterior distribution was con-firmed by using the different criteria provided by theadd-on CODA package including those of Gelmanand Rubin (1992) Geweke (1992) and Raftery andLewis (1992a 1992b)
ResultsGEE Approach
An inspection of residual plots following preliminarymodel-fitting provided no evidence for the failure ofproportional hazards assumption and did not detectinfluential observations We then proceeded to applythe GEE approach that could estimate regressioncoefficients while incorporating a dependence struc-ture between relative pairs The results are displayed
108 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
in Table 3 which suggest that later birth year havingat least a moderate number of naevi and freckles wereall simultaneously associated with later age at onsetof melanoma Table 3 also presents the estimatedodds-ratios for quantifying the correlation betweenpaired relatives of specific relationships
The odds ratio for sibndashsib pairs is 2973 which issignificantly different from 1 (p lt 001) The odds ratiofor parentndashchild pairs is 1650 which is slightly greaterthan 1 indicating a mild dependency between thesepairs although they are not quite statistically signifi-cant The odds ratio for second-degree and higherrelative pairs is 1155 indicating no dependence at allbetween these pairs This pattern of familial aggrega-tion is compatible with dominance variance as well asadditive genetic variance (Falconer 1990)
Bayesian Approach
We re-analyzed the age-at-onset of melanoma by usingGibbs sampling to impute missing covariates and toestimate subject-specific covariate effects randomadditive genetic common family environment andshared sibling environment effects on the log scale Theestimated shape parameter g was 43 with a 95 credi-ble interval (CI) of (42 46) The results aresummarized in Table 4 (Model A) The residual plotsdid not indicate a gross departure from the underlyingWeibull model and revealed no influential observa-tions We checked the sensitivity of the analyses to
initial parameter values by re-running the Gibbssampler five more times using different starting valuesThe resulting estimates did not differ by more than 5from the values reported here The mean estimate for(s 2
A) was 0452 with 95 CI = (0348 0566) for (s 2C)
was ndash0053 with 95 CI = (ndash0120 0019) and for(s 2
Cs) was 0467 with 95 CI = (0393 0545)A small negative value for the common family
effect suggests that there may be a dominant effect orthat the current model is not quite appropriate forexample that there are systematic effects or correla-tion structures that have not been accounted for Theresults here indicate that additive genetics seem toimpact equally on the variation of the age at onset ofmelanoma Further exploration for alternative modelsis a focus of our future research
In addition we investigated the relative contribu-tion of genetic and environmental effects on theexpression of naevi (Model B) and freckles (ModelC) which are known risk factors for melanoma Theexpressions of naevi and freckles were coded asbinary variables (none or few moles versus moderateor many moles and no freckles versus one or morefreckles) A hierarchical Bayesian binomial model wasfitted to estimate the random variance componentsThe results in Table 4 indicate that a common familyenvironment effect contributed the most to theexpression of naevi (s 2
C = 0704) (relative to the con-tributions of additive genetic effect (s 2
A = 0142) and
Individual j in family i
Figure 3
Graphical model of covariate and random family effects for an individual in a nuclear family structure tij represents the observed failure time forthe jth individual in the ith family with tcenij being an indicator variable of censoring status Full arrows indicate stochastic links to which a probabil-ity is attached broken arrows denote deterministic relationships bs are regression coefficients T is the precision of the prior distribution andequals the inverse of the variance Fi Gi Hi are random effects modeled as Fi ~ N(012 s 2
A + s 2C) Gi ~ N(0 12 s 2
A) and Hi ~ N(0 s 2Cs) Rectangles rep-
resent actual data values for the covariates g and mij are shape and scale parameters for the underlying Weibull distribution
109Twin Research February 2004
Analysis of Melanoma Onset
of shared sibling effect(s 2Cs = 0142) both of which
were non negligible In contrast variation in theexpression of freckles was largely explained by addi-tive genetic and shared family effects(s 2
A = 2050 s 2C =
2600) compared to a relatively small shared siblingeffect(s 2
C = 0115)
Discussion
We applied two methods mdash generalized estimatingequat ion and Bayesian analys is mdash to the geneticanalysis of age at onset of melanoma based on anuclear family structure Under both approaches theresults suggest that additive genetic factors played an
Table 4
Gibbs Sampling Approach Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families
Weibull Model A Mean effects mdash Response variable is Age-at-onset
Covariate RR = eb Coefficient b Robust se(b) 95 CI of bYear of birth 1378 0321 00027 (03160326)
Naevi 1126 0119 00021 (00580185)
Freckling 1017 0017 01400 (ndash00050085)
Weibull Model B Variance components mdash Response variable is Age-at-onset
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0452 0054 (03480566)
s 2C ndash0053 0027 (ndash01200019)
s 2Cs 0467 0040 (03930545)
g 43 0104 (4246)
Binomial Model Variance components mdash Response variable is Naevi
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0142 0149 (00020498)
s 2C 0704 0156 (04031010)
s 2Cs 0195 0157 (000250553)
Binomial Model Variance components mdash Response variable is Freckling
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 2050 0779 (08353570)
s 2C 2600 0418 (17803460)
s 2Cs 0115 0088 (00110312)
Note Indicates Significance at the 005 level) Naevi Is a Binary Variable with Baseline 0 = No or Few Moles Freckling Is Coded as a Binary Variable with Baseline0 = No Freckles
Table 3
GEE Approach Estimated Regression Coefficients in the Proportional Hazard Model and Estimated Odds Ratios for Quantifying FamilialAggregation in Age at Onset of Melanoma in Queensland Families ( Indicates Significance at the 05 Significance Level)
A Mean effects
Covariate RR = eb Coefficient b Robust se(b) Z-statistic
Year of birth 1142 0132 0051 2588
Naevi (Baseline = No or few moles) 1765 0568 0073 7781
Freckling (Baseline = No freckles) 1160 0148 0049 3020
B Pattterns of familial aggregation
Relationship 1 + q se(q) Z-statistic
Sibndashsib 2973 06217 317
Parentndashchild 1650 0434 150
SecondOthers 1155 0270 047
important role in the age at onset of melanoma butthat shared sibling environmental factors were notnegligible We focused attention on these approachesbecause they are more appropriate for modeling cor-related age at onset data and they allow the inclusionof covariates in the analyses Under both approachesthere were suggestions that earlier melanoma onsetwas influenced by later birth year having a moderatenumber of naevi and being freckly The principal dif-ference between the two approaches i s in theinterpretation of the regression coefficients The GEEmethod uses a marginal approach resulting in regres-sion coefficients that describe the average populationresponse to changing covariates whereas the Bayesianapproach produces subject-specific coefficients A sec-ondary distinction is in the nature of the within-pairdependence The GEE model only describes acommon covariance among specific relative pairswhereas the Bayesian approach can explicitly describethe source of this covariance A third advantage of theBayesian method is its flexibility in incorporatingprior information if available for the covariates orlatent effects by modifying their prior distributionsFurther the Bayesian method would permit a moreaccurate decomposition of the genetic variance intoadditive and dominant components thus providingthe means for a direct assessment of the no-domi-nance assumption Finally it is also interesting torecord the amount of CPU time required for eachmethod 20 seconds for the GEE approach andapproximately 3 hours (for binary traits) and 12hours (for age at onset outcome) to run BUGS on asingle-user Intel Pentium III 600 MHz personal com-puter with the Linux Mandrake 71 operating systemThe amount of human and financial resources dedi-cated to collecting maintaining and updating themelanoma family database is extremely high there-fore the extra CPU time requirement by the MCMCmethod is well worth the additional genetic informa-tion and flexibility that it provides A BUGS programis included in the Appendix
Acknowledgments
Data collection was supported by grants from theAustralian National Health and Medical ResearchCouncil (NHMRC) ID 930223 and ID 961061 USNational Cancer Institute (CA88363) Programmingresources were provided by start-up funds from theDepartment of Biostatistics at the M D AndersonCancer Center We acknowledge assistance withBUGS programming from Sang-Joon Lee
ReferencesAbel L amp Bonney G E (1990) A time-dependent
logistic hazard function for modeling variable age ofonset in the analysis of familial diseases GeneticEpidemiology 12 391ndash407
Aitken J F Bailey-Wilson J Green A C MacLennanR amp Martin N G (1998) Segregation analysis ofcutaneous melanoma in Queensland Genetic Epi-demiology 15 391ndash401
Aitken J F Green A C MacLennan R Youl P ampMartin N G (1996) The Queensland familialmelanoma project Study design and characteristics ofparticipants Melanoma Research 6 155ndash165
Allison P D (1995) Survival analysis using the SASsystem mdash A practical guide Cary NC SAS InstituteInc
Andersen P K Borgan O Gill R D amp Keiding N(1993) Statistical models based on count ingprocesses New York Springer-Verlag
Bernardo J M Berber J O David A P amp Smith F FM (Eds) (1992) Bayesian statistics 4 OxfordClarendon Press
Bonney G E (1986) Regressive logistic model for famil-ial disease and other binary traits Biometrics 42611ndash625
Breiman L Friedman J H Olshen R A amp Stone CJ (1984) Classif ication and regression treesWadsworth California
Cannings C amp Thompson E A (1977) Ascertainmentin the sequential sampling of pedigrees ClinicalGenetics 12 208ndash212
Clayton D G (1978) A model for association in bivari-ate life-tables and its application in epidemiologicalstudies of chronic disease incidence Biometrika 65141ndash151
Clayton D G amp Cuziak J (1985) Multivariate gener-alizations of the proportional hazards model (withdiscussion) Journal of the Royal Statistical SocietySeries A 148 82ndash117
Cox D R (1972) Regression models and life tables(with discussion) Journal of the Royal StatisticalSociety Series B 34 187ndash202
Do K-A Broom B M Kuhnert P Duffy D LTodorov A A Treloar S A amp Martin N G(2000) Genetic analysis of the age at menopause byusing estimating equations and Bayesian randomeffects models Statistics in Medicine 19 1217ndash1235
English D R Armstrong B K Kricker A et al(1997) Sunlight and cancer Cancer Causes Control8 271ndash283
Falconer D S (1990) Introduction to quantitativegenetics (3rd ed) New York Longman Group Ltd
Faucett C Gauderman W J Thomas D Ziogas Aamp Sobel E (1993) Combined segregation andlinkage analysis of late-onset Alzheimerrsquos disease inDuke famil ies us ing Gibbs sampling GeneticEpidemiology 10 489ndash494
GAP (1996) Genetic Analysis Package Release 10Pasadena CA Epicenter Software
110 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
111Twin Research February 2004
Analysis of Melanoma Onset
Gauderman W J Morrison J L Carpenter C L ampThomas D C (1997) Analysis of gene-smokinginteraction in lung cancer Genetic Epidemiology 14199ndash214
Gauderman W J amp Thomas D C (1994) Censoredsurvival models for genetic epidemiology A Gibbssampling approach Genetic Epidemiology 11171ndash188
Gelfand A E amp Smith A F M (1990) Sampling-based approaches to calculating marginal densitiesJournal of the American Statistical Association 85398ndash409
Gelman A amp Rubin D B (1992) Inference from itera-tive simulation using mult iple sequences (withdiscussion) Statistical Science 7 457ndash511
Geman S amp Geman D (1984) Stochastic relaxationGibbs distributions and the Bayesian restoration ofimages IEEE Trans Pattern Anal Mach Intelligence6 721ndash741
Geweke J (1992) Evaluating the accuracy of sampling-based approaches to calculating posterior momentsIn J M Bernardo J O Berber A P David amp F FM Smith (Eds) Bayesian statistics 4 OxfordClarendon Press
Gilks W R Thomas A amp Spiegelhalter D J (1994)A language and program for complex Bayesian mod-eling Statistica 43 169ndash178
Guo S W amp Thompson E A (1991) Monte CarloEM for the estimation of multiple random effectsmodels on large pedigrees IMA Journal ofMathematics Applied in Medicine and Biology 8171ndash189
Guo S W amp Thompson E A (1992) A Monte Carlomethod for combined segregation and linkage analy-sis American Journal of Human Genetics 511111ndash1126
Guo S W amp Thompson E A (1994) Monte Carloestimation of mixed models for large complex pedi-grees Biometrics 50 417ndash432
Hastings W K (1970) Monte Carlo sampling methodsusing Markov chains and their applicationsBiometrika 57 97ndash109
Heath A C (1997) Markov Chain Monte Carlo segre-gation and linkage analysis for oligogenic modelsAmerican Journal of Human Genetics 61 748ndash760
Hougaard P (1986) A class of multivariate failure timedistributions Biometrika 73 671ndash678
Hsu L amp Prentice R L (1996) On assessing thestrength of dependency between failure time variatesBiometrika 83 491ndash506
Hsu L amp Zhao L P (1996) Assessing familial aggre-gation of age at onset by using estimating equationswith application to breast cancer American Journalof Human Genetics 58 1057ndash1071
Kempthorne O (1960) Biometrical genetics New YorkPergammon Press
Kendall M amp Stuart A (1979) The advanced theoryof statistics (Vol 2 4th ed) Oxford OxfordUniversity Press
Kong K Frigge M Cox N amp Wong W H (1992)Linkage analysis with adjustments for covariates Amethod combining peeling with Gibbs samplingCytogenetics and Cell Genetics 59 208ndash210
Kuhnert P amp Do K-A (2003) Fitting genetic models totwin data with binary and ordered categoricalresponses A comparison of structural equation mod-eling and Bayesian hierarchical models BehaviorGenetics 33 439ndash452
Lange K amp Sobel E (1991) A random walk methodfor computing genetic location scores AmericanJournal of Human Genetics 49 1320ndash1334
LeBlanc M amp Crowley J (1992) Relative risk trees forcensored survival data Biometrics 48 411ndash425
Li H amp Thompson E A (1997) Semiparametric esti-mation of major gene and random environmentaleffects for age of onset Biometrics 53 282ndash293
Li H amp Wijsman E A (1998) Semiparametric estima-tion of major gene effects for age of onset GeneticEpidemiology 15 279ndash298
MacLennan R Green A C McLeod G R amp MartinN G (1992) Increasing incidence of cutaneousmelanoma in Queensland Australia Journal of theNational Cancer Institute 84 1424ndash1432
Metropolis N Rosenbluth A W Rosenbluth M NTeller A H amp Teller E (1953) Equation of statecalculations by fast computing machines Journal ofChemical Physics 21 1087ndash1092
Nielsen G G Gill R D Andersen P K amp SorensonT I A (1992) A counting process approach tomaximum likelihood estimation in frailty modelsScandinavian Journal of Statistics 19 25ndash43
Oakes D (1989) Bivariate survival models induced byfrailties Journal of American Statistical Association84 487ndash493
Raftery A L Lewis S (1992a) Comment One longrun with diagnostics Implementation strategies forMarkov Chain Monte Carlo Statistical Science 7493ndash497
Raftery A L amp Lewis S (1992b) How many interac-tions in Gibbs sampler (pp 763ndash774) In J MBernardo J O Berber A P David amp F F M Smith(Eds) Bayesian statistics 4 Oxford Clarendon Press
SAGE (1992) Statistical Analysis for GeneticEpidemiology Release 21 Cleveland OH Departmentof Epidemiology and Biostatistics RammelkampCenter for Education and Research MetroHealthCampus Case Western Reserve University
Scurrah K J Palmer L J amp Burton P R (2000)Variance components analysis for pedigree-based cen-sored survival data unusing generalized linear mixedmodels (GLMMS) and Gibbs sampling in BUGSGenetic Epidemiology 19 127ndash148
112 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
Siegmund K D amp McKnight B (1998) Modelinghazard functions in families Genetic Epidemiology15 147ndash171
Spiegelhalter D J Thomas A amp Best N G (1996a)Computation on Bayesian graphical models (pp407ndash425) Oxford Oxford University Press
Spiegelhalter D J Thomas A Best N G amp Gilks W(1996b) BUGS 06 ndashBayesian InferenceUsing GibbsSampling manual Cambridge MRC Biostatistics Unit
Spiegelhalter D J Thomas A Best N G Lunn D(2003) WinBUGS (Bayesian Inference Using GibbsSampling) Windows version 14 user manualCambridge MRC Biostatistics Unit
Siskind V Aitken J Green A amp Martin N (2002)Sun exposure and interaction with family history inrisk of melanoma Queensland Australia Inter-national Journal of Cancer 97 90ndash95
Swerdlow A amp Green A (1987) Melanocytic naeviand melanoma An epidemiological perspectiveBritish Journal of Dermatology 117 137ndash146
Therneau T M amp Atkinson E (1997) An introductionto recursive partitioning using the RPART routines(Technical report) Rochester Mayo Foundation
Thomas D (1992) Fitting genetic data using Gibbs sam-pling An application to nevus counts in 38 Utahkindreds Cytogenetics and Cell Genetics 59 228ndash230
Thompson E A (1994) Monte Carlo programs for pedigree analysis (Technical Report 267)Washington Dept of Statist ics University ofWashington
Vaupel J M Manton K G amp Stallard E (1979) Theimpact of heterogeneity in individual frailty on thedynamics of mortality Demography 16 439ndash454
113Twin Research February 2004
Analysis of Melanoma Onset
Appendix A A BUGS Program to Implement the Bayesian MCMC approach in Modeling an ACEModel for Age at onset of Melanoma with Covariate Effects and Random GeneticsEnvironmental Effects
Research into chronic diseases such as melanomahas largely been achieved through family studieswhere the age of onset of the disease is modelledwithin a survival analysis framework An example ofthis type of analysis is the work by Abel and Bonney(1990) who developed a model accounting for age ofonset where the hazard function was expressed interms of a major gene effect and residual familydependence using the regressive approach described inBonney (1986)
Clayton (1978) and Vaupel (1979) introduced theso-called ldquofrailtyrdquo model which has been extended bymany others (Andersen et al 1993 Clayton ampCuzick 1985 Gauderman amp Thomas 1994Hougaard 1986 Li amp Thompson 1997 Li ampWijsman 1998 Nielsen et al 1992 Siegmund ampMcKnight 1998) A variant of these approaches usesestimating equations which accommodates correlatedage of onset outcomes and is known to be robust andcomputationally efficient (Hsu amp Prentice 1996 Hsuamp Zhao 1996) A common theme is the investigationof possible underlying genetic and environmentalfactors that may influence the age of onset of thedisease of interest
Recently the focus of familial studies involvingcensored data has moved towards the development ofBayesian methods using packages such as BUGS andWinBUGS (Gilks et al 1994 Spiegelhalter et al1996a 1996b 2003) the Genetic Analysis Package(GAP 1996) and MIXD (Olshen amp Wijsman 1996Thompson 1994) Examples of this include theimplementation of Markov Chain Monte Carlo(MCMC) methods for linkage analysis (Heath 1997Kong et al 1992 Lange amp Sobel 1991) the estima-tion of parameters in a mixed model with andwithout covariates (Guo amp Thompson 1991Thomas 1992) estimation of a gene-smoking interac-tion and covariate imputation (Gauderman et al1997) combined linkage and segregation analysis(Guo amp Thompson 1992 Faucett et al 1993) anddevelopment of mixed models for large complex pedi-grees (Guo amp Thompson 1994) Recent work by Doet al (2000) demonstrates an application of Bayesianmethodology to menopausal age in twins using a gen-eral ised l inear mixed model (GLMM) In thisapplication they investigated the contribution ofcovariates and any underlying genetic and environ-mental factors to explain variation in menopausalage A similar investigation by Scurrah et al (2000)fitted a Bayesian model to survival data In this appli-cation the authors used a Bayesian piece-wiseexponential model to explore the time to onset of res-piratory disease given known risk factors andpossible familial effects Both papers highlight theflexible nature of the Bayesian approach which canbe seen through the inclusion of priors and the inte-gration of fixed and random effects
The Bayesian approach to fitting a genetic modelin a GLMM framework was examined in detail inKuhnert and Do (2003) In this paper a simulationstudy investigated the flexible nature of the Bayesianmodel and its ability to incorporate genetic compo-nents through random effects This was comparedwith standard maximum likelihood methods for esti-mating genetic components in the model Resultsfrom a simulation study indicated a consistent advan-tage in using the Bayesian method to detect a correctmodel under particular scenarios of additive geneticsand common environmental effects Moreover forbinary data there was difficulty in detecting thecorrect model under low and moderate levels of heri-tability Results however were improved for ordinaldata under similar scenarios
We present an alternative model using Bayesianmethodology which takes into account the complexfeatures inherent with melanoma using a large datasetcomprising 1449 families The model does not assumeproportional hazards but a multiplicative modelwhere the Weibull distribution is used to model the ageat onset of melanoma This is fit in a Bayesian frame-work which incorporates fixed and random effects toestimate possible risk factors covariates and anyunderlying genetic factors Another approach toaccommodate the correlated age-at-onset outcomes rig-orously is to use the estimating equations for assessingfamilial aggregation of age-at-onset (Hsu amp Prentice1996 Hsu amp Zhao 1996) It has two desirable fea-tures (i) robustness mdash no higher-order distributionalassumptions are required beyond pairwise ones and(ii) computational efficiency
In this article our interest lies primarily withinvestigating whether certain risk factors are able toexplain a considerable proportion of the familialdependence and if including them into the modelreduces the residual variation and results in increasedpower for detecting a major gene effect To ourknowledge this study will be the first in melanomaresearch that extends current methodology and incor-porates covariate and genetic effects simultaneouslywith age of onset using Bayesian methodology onfamily data
Material and MethodsThe Data on the Age at Onset of Melanoma and Potential Risk Factors
We analysed data from the Queensland FamilialMelanoma Project Family ascertainment and datacollection have been described in detail (Aitken et al1996) Assessing standard melanoma risk factorsinclude counts of naevi on the arms and back demo-graphic and medical details lifetime residence andfamily history of melanoma and other cancers Brieflywe ascertained all 12016 first incident cases of cuta-neous melanoma (invasive and in situ) diagnosed inQueensland residents between 1982 and 1990 and
99Twin Research February 2004
Analysis of Melanoma Onset
reported to the Queensland Cancer Registry or foundby comparing cancer registrations for 1984 and 1987with records of pathology laboratories throughoutQueensland It is estimated that registry records areapproximately 95 complete for the study periodDoctorsrsquo permission was obtained to approach10407 cases of whom 7784 (75) returned a brieffamily history questionnaire stating whether any oftheir first-degree relatives (parents siblings children)had had a diagnosis of melanoma A total of 2920probands was sampled from these respondentsincluding all who had claimed a positive familyhistory (n = 1529) and an approximate 20 randomsample of the remainder (n = 1391) Probands weresent a detailed family history questionnaire askingfor the names and addresses of all first-degree rela-tives relativesrsquo vital status dates of birth and agesand whether any of these relatives had had amelanoma diagnosed by a doctor To avoid bias indetermining the mode of inheritance second andlower degree relatives were enrolled in the studyaccording to a sequential sampling scheme (Canningsamp Thompson 1977) First degree relatives of all rela-tives with confirmed melanoma were ascertainedthrough the detailed family history questionnairedescribed above which was mailed to all confirmedpositive relatives In total 15989 relatives belongingto 1912 separate families were reported by 2118(73) probands or other positive relatives A total of1044 relatives for whom date of birth was unknownwere excluded leaving 14945 relatives for analysisThere were 188 families independently ascertainedthrough two or more probands To avoid ascertain-ment bias these families were included in the datasetseparately for each proband in the family
Medical confirmation and dates of diagnosis weresought for the relatives reported by probands or otherrelatives to have had melanoma After eliminating187 of subjects who refused access to their medicalrecords or those with lost records or those with falsepositive reports (basal or squamous cell carcinomasolar keratoses or benign naevi) medical confirma-tion of melanoma as the diagnosis was obtained for482 of the original number of relatives Only themedically verified cases among relatives were classi-fied as true events all other relatives were treated asunaffected (censored at last date of contact)
Risk factor questionnaires were subsequentlymailed to all living relatives aged between 18 and 75years ascertained through the sequential samplingprocedure Other relatives provided proxy reportsThe combined number of proxy-reports and self-reports was 9746 relatives for whom standard riskfactor information was available
For the Bayesian analysis we focused on familiesthat included at least one parent and at least onechild where each member in the family should haveinformation on age at diagnosis or age at last follow-up and with maximum one missing covariate
The demographic covariates and hypothesizedmelanoma risk factors included gender birth yearplace of birth ability to tan (very brown moderatetan slight tan no tan) propensity to burn (neverburn and always tan sometimes burn and usually tanusually burn and sometimes tan always burn andnever tan) number of sunburns (0 1 2ndash5 gt 6) skincolor (olivedark medium fairpale) hair color(black lightdark brown fairblonde lightdark red)eye color (brown greenhazel bluegrey) total freck-ling in summer (0 1ndash100 gt 100) number of naevi(none few moderate number very many) andnumerous measures of cumulative lifetime exposuresto sun and ultraviolet rays
Preliminary Exploratory Analysis
As a preliminary analysis we ignored correlationswithin families and applied a combination of paramet-ric and non-parametric survival analysis techniques asexploratory tools to identify possible risk factors formelanoma Once these fixed effects were identified weconsidered incorporating these into a subsequent gen-eralized estimating equation model or a Bayesianmodel with random effects that could account forwithin-family correlations The aim was to quantify thegenetic and familial associations in the presence ofobserved covariate effects
Manipulation of the entire melanoma datasetresulted in a subset of 9669 observations with a rangeof explanatory variables that described phenotypiccharacteristics for each individual along with somedemographic details such as birth year and genderThe response variable was the time to diagnosis (orage at the last follow-up) with the proportion of cen-sored cases being approximately 76 The medianage at onset of melanoma was 43 The correlationestimates for age of onset for different relationshippairs with both affected members were 067 055and 039 for sibndashsib parentndashchild and secondlowerorder pairs respectively
The first stage of modelling involved fitting uni-variable proportional hazard models to assess eachvariablersquos individual effect on the time of onset ofmelanoma The SAS (Allison 1995) package wasused to fit proportional hazards models of the follow-ing form
h(t x) = Y (x b)h0(t)
where Y represents a log-linear function ebrsquox of theexplanatory variables x and corresponding coeffi-cients b and h0(t) represents the baseline hazard attime t
Significant variables associated with the age atonset of melanoma consisted of eye hair and skincolour freckling number of moles skin type abilityto burn ability to tan previous skin cancers ultravio-let exposure between the ages of 5 and 12 years
100 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
101Twin Research February 2004
Analysis of Melanoma Onset
cumulative sun exposure up to the age of 19 yearsand birth year
The second stage fitted multivariable proportionalhazard models to those explanatory variables thatwere significant at the univariable stage Table 1 dis-plays the results from the final model which onlyincluded significant variables (p-value lt 05) It isworth noting that a similar result could be obtainedusing an automated stepwise procedure
Results from this analysis highlight some interest-ing but quite obvious risk factors noted in previousanalyses (Aitken et al 1996 1998) For example
An increase of 1 year in birth year induces 17increase in risk of earlier melanoma onset
People with neither freckles nor naevi have thelowest risk of melanoma onset
The risk of earlier melanoma onset is increased byup to 37 for blue eyed people and even further(46) for green eyed people when compared toindividuals with brown eyes
ldquoRed Headsrdquo have an increased risk of earliermelanoma onset (46) when compared to indi-viduals with black hair However no significantincrease was noted for individuals with fair orlight red hair
A personrsquos tendency to burn easily increases therisk of earlier melanoma onset in some cases byup to 100 compared to those that never burn
However striking this last statement is issues of con-founding must also be considered The most obviousillustration of this is the confounding that occurredbetween mole count and freckling This is seenthrough close inspection of the parameter estimateswhich changed in magnitude when mole count wasadded to the model after adjusting for freckling(Results not displayed here)
To reduce the dimension of the problem furtherand avoid some of these confounding issues a sur-vival tree was constructed using RPART (RecursivePartitioning and Regression Trees) see Therneau andAtkinson (1997) Survival trees are a special case of
Table 1
Results from Fitting a Multivariable Proportional Hazards Model to the Melanoma Data Based on Univariable Results The Results Reported in thisTable Are the Parameter Estimates b Their Standard Errors se(b) the Relative Risk eb and the p-value for Each Estimate
Variable b se(b) eb p-value
Birth Year 016 lt 001 117 lt 05
Eye Colour (Baseline Brown)
BlueGrey 031 007 136 lt 05
GreenHazel 038 007 146 lt05
Hair Colour (Baseline Black)
Light RedGinger 017 015 119 27
Dark RedAuburn 038 015 146 lt 05
FairBlonde 006 012 106 62
Light Brown 014 012 115 22
Dark Brown 002 012 102 87
Skin Type (Baseline never burn)
Always burn 069 016 1968 lt 05
Usually burn 045 015 157 lt 05
Sometimes burn 031 015 136 lt 05
Freckling (Baseline none)
1 to 100 017 006 118 lt 05
gt 100 009 008 110 23
Mole Count (Baseline none)
Few 029 007 134 lt 05
Moderate 079 008 220 lt 05
Many 112 010 308 lt 05
Number of Sunburns (Baseline none)
One ndash007 011 093 49
2 to 5 ndash006 009 094 50
gt 6 017 009 119 07
Cumulative Sun Exposure ( lt 5 yrs) 004 001 104 lt 05
UV Exposure (5ndash12 yrs) 00003 lt 00001 1 lt 05
102 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
decision trees that were incorporated into RPART byTherneau (1997) using the ideas put forth by LeBlancand Crowley (1992) for survival data with censoringThese authors showed through simulation studies thatsurvival trees could outperform standard parametricmethods such as proportional hazards modelling particularly in situations where the underlying distrib-ution was not exponential Along with Breiman et al(1984) they showed decision trees to be usefulexploratory tools for identifying important variablesinteractions and outliers
The methodology for survival trees begins by iden-tifying optimal splits using the log rank statistic toseparate the data into homogeneous groups Eachsplit is comprised of a parent node and two daughternodes which are linked to the parent by branchesFigure 1 is an example of a survival tree produced onthe melanoma data Once a large tree is grown withmany terminal nodes that contain very few observa-tions a pruning procedure is introduced to identify asequence of sub-trees The technique of pruninginvolves snipping back splits of the tree one at a timeuntil only the root node remains Cross-validation isthen introduced to aid in the selection of the optimalmodel One nice feature of survival trees is the use ofKaplan-Meier curves to provide information aboutsurvival rates at each terminal node of the tree Thisapproach may allow for better interpretation of theterminal nodes and highlight different scenarioswhich yield similar survival rates as can be seen inFigure 1
The variables identified from the multivariablemodel were used as input into RPART Variables suchas the sun and UV exposures were omitted from themodelling stage since nearly half of the data for eachof these variables were missing (cumulative sun expo-sure 438 UV exposure 404) A large survivaltree was produced in SPLUS and using cross-valida-tion a model splitting solely on birth year wasidentified This survival tree yielded the minimumcross-validated error rate (093) suggesting that birthyear was an important indicator for melanomaSelecting a slightly larger but more informative model(error = 094) resulted in a survival tree consisting ofnine terminal nodes This revealed splits on birth yearmole count and freckling Al l other variablesappeared to be either competing at each node oracting as a surrogate variable for one of these primarysplits Skin type in particular arose as an importantsurrogate for freckling
In Figure 1 we aim to provide a more visual inter-pretation to the results from the survival analysisEach split is shown at the top of each node and canbe assumed to be the split that directs observationstowards the left side of the tree For example aftersplitting on birth year at 1933 a split to the left cor-responds to an individual with few moles A split tothe right indicates an individual with many moles andso on The terminal nodes in this figure are displayed
as Kaplan-Meier plots showing the survival curvesalong with the number of observations n and the riskratio calculated with reference to the baseline groupof 9669 individuals The reference survival curve isshown at the root node along with a bar which showsthe percentage of events in the entire dataset Gridsare placed on each survival plot at equal intervals of20 years of age These grids have been plotted to sim-plify the interpretation An indicator to the left of theplot displays the risk ratio Ratios above one areshown by an ldquoup arrowrdquo while a decrease in risk isillustrated by a ldquodown arrowrdquo From this model wecan see that there are a few scenarios that indicatehigh risk for earlier onset of melanoma These scenar-ios may be described as
individuals born after 1966 with many moles (RR = 113)
individuals born between 1947 and 1966 withmany moles (RR = 45)
individuals born between 1933 and 1953 with fewmoles but many freckles (RR = 24)
individuals born between 1923 and 1933 withmany moles (RR = 141)
It is obvious from these results that birth year has asubstantial impact on the age-at-onset of melanomaOnce this is taken into account mole count and freck-ling only provide a small contribution to the risk
Family History of Melanoma
As described earlier in the Data section family historywas collected regarding first-degree (siblings andparents) and second-degree relatives From this pedi-gree structure other higher-order types of relative pairscould also be formed Some of these relatives were dis-eased with melanoma resulting in pairs of relativeswho both may be diseased (++) both not diseased (mdash)and one diseased while the other was not diseased (+ndash)Table 2 lists the concordant and discordant pairs ofspecific relationships sibndashsib parentndashchild andsecond-degreelower order relative pairs The second-degree relative pairs include grandparentndashgrandchildand auntndashniece while the lower order relative pairsinclude the in-law pairs From Table 2 the percent-ages of both diseased pairs are 06 09 and 04among sibndashsib parentndashchild and secondlower orderpairs respectively Crude estimates of correlationcoefficients can be calculated from these percentageswithout accounting for ages at onset among these rel-atives However the risk of developing melanomamay depend on the subjectrsquos age Hence adjusting forthe age at onset is essential in quantifying the correla-tion of age at onset between pairs of relatives In asubsequent section we describe how this can be donerigorously via the generalized estimating equationsapproach On average melanomas were diagnosedslightly earlier in relatives (475 years) than inprobands (502 years) Among relatives melanomaswere diagnosed at younger ages in later generations
103Twin Research February 2004
Analysis of Melanoma Onset
RR
= 1
00
Bas
elin
e
Fig
ure
1
Surv
ival
tree
for t
he m
elan
oma
data
con
sist
ing
of 9
669
indi
vidu
als
from
191
2 Qu
eens
land
fam
ilies
Ter
min
al n
odes
are
rect
angu
lar a
nd d
ispl
ay th
e Ka
plan
-Mei
er c
urve
for t
he c
orre
spon
ding
sub
grou
p
104 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
To account for the different ages at censoring in eachgeneration due to termination of the study or deathfrom causes other than melanoma we examined thedisease-free survival distribution for each generationusing the standard failure-time analysis technique(Figure 2) The median age at diagnosis of melanomawas 64 among parents of probands 50 among sib-lings of probands and 33 among children ofprobands The disease-free survival functions differedsignificantly between the children generation fromearlier generations (log-rank test with p lt 01) butthere was no significant differences in age-at-onset of
melanoma between the siblings and parents ofprobands (log rank test with p gt 9)
Preliminary Segregation Analysis
A preliminary segregation analysis was conducted(Aitken et al 1998) to investigate whether the famil-ial clustering of cutaneous melanoma is consistentwith Mendelian inheritance of a major autosomalgene Analyses were performed with the SAGE (1992)statistical package using the maximum likelihoodREGTL program for a binary trait with a variable ageof onset The hypothesis of co-dominant Mendelianinheritance gave a significantly better fit to the datathan either dominant or recessive Mendelian inheri-tance Overall both Mendelian inheritance of a singlemajor gene and purely environmental transmissionwere rejected However there was strong evidence offamilial dependence in melanoma occurrence
The inclusion of risk factors in the models mayreveal whether all or a combination of these explainsthe familial dependence that was demonstrated by thesegregation analysis If a major gene exists that oper-ates independently on these covariates includingthem in the models may reduce residual variation andincrease the power of the analysis to detect a majorgene effect
Estimating Equations Approach
Let y = (dki tki Zki) denote the data collected for the ith
member in the kth family (k = 1hellipK and i = 12)where dki = 0 if the observation is censored tki is eitherthe recorded age at diagnosis of melanoma or the ageat the most recent follow-up for unaffected peopleand Zki is a vector of measured covariates We assumethat censoring time age at diagnosis of melanomaand the covariates are independently distributedThese assumptions can be relaxed in more generalmodels subject to identification constraints Thehazard rate for melanoma is the instantaneous proba-bility that melanoma is diagnosed immediately aftertime t given that the person is unaffected at time tThe hazard rate under the Cox proportional hazardsmodel (Cox 1972) is given by
l(tki) = l0(tki) exp(brsquoZki)
Table 2
Concordant and Discordant Pairs of Relatives in 1912 Families from the Queensland Familial Melanoma Project Probands Were Not Included forthe Calculation of Concordance
Sibndashsib Parentndashchild SecondOthers Total
++ 49 15 41 105
+ndash 763 536 1200 2499
mdash 7011 1078 9817 17906
Total 7823 1629 11058 20510
Figure 2
Cumulative melanoma-free survival among first-degree relatives ofconfirmed cutaneous melanoma cases (probands) diagnosed inQueensland Australia 1982ndash1987 according to the relativersquos rela-tionship to the proband
where l0() is the baseline hazard function and b is avector of regression coefficients
For a specific pair of relatives we follow Clayton(1978) in modeling the bivariate survivor function
F(tk1 tk2) = (F1(tk1)ndashq + F2(tk2)
ndashq ndash1)ndash1q
where F1 and F2 are univariable survivor functions qis a scalar parameter that measures the degree ofdependence between the relativesrsquo times at onsetindependence being implied by q = 0 and positiveassociation by q gt 0 The Clayton model allows nega-tive dependencies and has the property that failuretimes are absolutely continuous for q gt ndash05 In addi-tion the cross-ratio (or odds-ratio) function asstudied by Oakes (1989) is
c(tk1 tk2) = l(tk1 frac12 Tk2 = tk2) l(tk1frac12Tk2 sup3 tk2) =1 + q
This is equivalent to assuming that the odds-ratio isinvariant over the grid region that supports the dataHeuristically the parameter 1 + q is an odds-ratiothat depends on the degree of dependence betweenthe onset ages of the two relatives If genetic factorsdo influence the age at onset of melanoma we wouldexpect to see a higher concordance in the age of onsetin first degree relatives who on average share halftheir genes in common in comparison to seconddegree relatives Under the current model this trans-lates as qfirst-degree gt qsecond-degree
We may use a standard method to estimate withinpair correlations for 2 acute 2 tables from odds ratiosEstimates of relation-pair correlations psibndashsib pparentndashchildand psecond-order are recovered from using the relation-ships for example r sibndashsib = min(1ln(1 + q sibndashsib))Testing for the presence of genetic factors underlyingthe age at diagnosis of melanoma is equivalent totesting H0 p first-order = psecond-order We may test thishypothesis using a z-transform (Kendall 1979 p 315) of the point estimates of the correlation coef-ficients Let n1 and n2 denote the number of first orderand second order relatives let z1and z2 denote thetransformed statistics of r1and r2 the correlation esti-mates for first-order and second-order relative pairsrespectively Specifically we reject H0 when ED gtZ1ndasha where
E = E(z1 ndash z2) = mdash12mdash log 31mdash11
+ndash r
r
1
1mdash2 1mdash11
+ndash r
r2
2
mdash24D2 = V(z1 ndash z2) = mdash
(n1
1ndash 3)mdash+ mdash
(n2
1ndash 3)mdash
and Z1ndasha is the standard normal deviate correspondingto the one-sided a significance level
This approach has the advantage of providing atest for the presence of genetic effects through a singleparameter (q) However it is limited in its ability to
attribute the phenotypic variance to specific effects(eg additive gene action)
Mathematical details of the GEE model and theiterative procedure to estimate the regression coeffi-cients b and specific degrees of dependence q for thedifferent types of relative pairs have been summarizedpreviously in Do et al (2000)
MCMC Analysis Using BUGSThe Bayesian Paradigm and Gibbs Sampling
Markov Chain Monte Carlo (MCMC) is an alterna-tive Bayesian approach that provides estimates oflikelihoods and associated parameter values whenexact computation is infeasible (Hastings 1970Metropolis et al 1953) MCMC methods can beused to draw samples from the underlying joint distri-bution of major genotypes and polygenic valuesconditional on the observed data From thesesamples desired parameters and likelihoods can beestimated without the need to resort to exact compu-tation MCMC methods have been used for linkageanalysis (Kong et al 1992 Lange amp Sobel 1991)for estimation of parameters in the mixed model withand without covariates (Guo amp Thompson 1991Thomas 1992) for estimation of genendashsmoking inter-action and covariate imputation (Gauderman et al1997) for performing combined linkage and segrega-tion analysis (Faucett et al 1993 Guo amp Thompson1992) and for mixed models of large complex pedi-grees (Guo amp Thompson 1994)
In a general setting let y be the observed data andq be everything not observed including parametersand latent variables The implementation of Bayesianmethods using realistic models and priors is com-puter-intensive and relies heavily on cunningcomputational tools to approximate integrals Theproblem in general terms is to obtain the expectedvalue of a function of interest s () under the posteriordensity p(qfrac12x)
E[s(q)] =
which cannot generally be found analytically Onemethod to carry out the integration on the RHS is toperform simulation of exact Bayesian posterior distri-butions using Markov chain Monte Carlo techniquessuch as Gibbs sampling The Gibbs sampler (Gemanamp Geman 1984) is the most popular algorithm usedin MCMC applications to correlated data Gibbssampling was introduced to the main statistical com-munity by Gelfand and Smith (1990) and has sincebeen applied to an even wider array of problems TheGibbs sampler is easy to implement because it onlydepends on the local neighborhood structure In thecontext of pedigree analysis (Olshen amp Wijsman1996) the basic procedure is a sequential updating ofmissing and latent data including the underlying andunobserved major genotypes polygenic effects and
EQ s(q) p(q) p(xfrac12q) dqmdash mdash mdash
EQ p(q) p(xfrac12q) dq
105Twin Research February 2004
Analysis of Melanoma Onset
106 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
environmental effects Values for the missing or latentdata are sampled from the local conditional distribu-tion a function of the observed individual data thecurrent sampled values of other missinglatent data forthis particular individual such as polygenic and environ-mental effects and the values for the sampled geneticeffects in the immediate neighbors of an individualGibbs sampling basically consists of three main steps
Step 1 Setting initial values for unobserved quanti-ties (parameters and latent variables)
Step 2 For each parameter or latent variable qj sample from its ldquofull conditional distributionrdquogiven the current values of all other quantities inthe model
Step 3 Examine sampled values of parameters andlatent variables to monitor convergence and toprovide summary measures
Some of the most recent and popular packages thatimplement Gibbs sampling for analysis of pedigreedata include BUGS (Gilks et al 1994 Spiegelhalter et al 1996ab) Genetic Analysis Package (GAP1996) and MIXD (Olshen amp Wijsman 1996Thompson 1994) We have used BUGS mainlybecause of its flexibility in programming hierarchicalmodels besides being a freeware product
The Model
The aim here is to model the correlation structurewithin the family structure to satisfy the fundamentaladditive genetic model (Crow amp Kimura 1970Falconer 1990 Kempthorne 1960) as followsConsider a nuclear family structure consisting of 4members father mother and two children denoted byF M S1 S2 respectively Using similar notation as inBurton et al (1999) a conventional mixed linearmodel consisting of fixed and random effects may bewritten in the form
Qij = bTz + Aij +Cij + CSij+ Eij
where Qij is the observed value of a normally distrib-uted continuous trait for the jth individual in the ith
nuclear family zij is a vector of observed covariatesrepresenting fixed effects and b is a correspondingvector of unknown fixed regression coefficients Aij Cij and CSij
A denote random effects that repre-sent additive polygenic common family environmentand common sibling environment effects respectivelyThe variation in an individual response is representedby a composite covariance matrix VT and is the sumof an additive genetic covariance matrix VA acommon family environment matrix VC a sharedsibling environment matrix VCs and residual environ-mental effects The different variance components are
The overall total covariance matrix is
The components of variance s 2A s 2
C s 2Cs need not be
positive as long as VT is positive definite A negativevalue for s 2
Cs simply implies that the realized covari-ance between sib lings is less than the realized
107Twin Research February 2004
Analysis of Melanoma Onset
covariance between parent and a child A negativevalue for s 2
Cs may suggest dominance To optimizeconvergence in BUGS Model (1) may be reparameter-ized as
where Fi Gi H i are independent additive randomeffects or latent variables and are the residual errorterms for parents and children respectively If we model
Fi ~ N(0 mdash12mdash s 2
A +s 2C)
Gi ~ N(0 mdash12mdash s 2
A) Hi ~ N(0 s 2Cs)
then the basic genetics covariance model (additivegenetic common environment and unique environ-ment for this particular four-member family structureis satisfied
In survival models unobserved or unmeasuredexplanatory variables some of which may be geneticare often referred to as frailties The frailties takevalues restricted to the posit ive line and may beassumed to act multiplicatively on the hazardExtending the above model to correlated family datawith time-to-onset endpoint a multiplicative individ-ual heterogeneity or frailty term representing thelatent genetic and common environment variablesmay be modeled as random effects simultaneouslywith the effects associated with observed covariatesConsider right censored time to onset of melanomadata (Tij dij zij)1 j n from n relative pairs hereT i j denotes the true age at onset of the jth familymember or the censored time depending on whetherdij =1 or 0 respectively and z denotes a p acute 1 vector ofcovariates A Weibull distribution may be used tomodel time to failure as
f(ti zi) = ebrsquoz igt igndash1 exp(ndashebrsquoz it g
i )
where b is a vector of unknown regression coefficientsand g is the shape parameter of the Weibull distribu-tion This leads to a baseline hazard of the form
l0(ti) = gt igndash1
Re-parameterize by letting mi = ebrsquoz i the conditionaldistribution of ti given mI is then Weibull (gmi) We for-mulated a mixed model to represent the conditionaldistribution of tij given covariate effects random addi-tive genetic and common environment effects as
tijfrac12mij ~ Weibull (g mij) i = 1hellipn j = 12
where
The regression coefficients and the precision of therandom effects (tG tF tH) were given ldquonon-informa-tiverdquo Normal and Gamma priors respectively The shapeparameter g of the time to onset of melanoma distribu-tion was also given a non-informative Gamma priorwhich was slowly decreasing on the positive real line
The overall model (including random effects) maybe described in a Bayesian graph (Figure 3) whichsimplifies sampling from full conditional distributionsby exploiting part ial independence propert ies(Spiegelhalter et al 1996a) In this graph eachrandom quantity is represented by a node which maybe connected by directed or undirected linksConditional independence assumptions are repre-sented by the absence of such links
We implemented the Gibbs sampler using theBUGS program (Gilks et al 1994) (code in theAppendix) Imputation of missing data was handlednaturally in the Gibbs sampling framework by treatingmissing values as additional unknown quantities andrandomly sampling values from their full conditionaldistributions We chose simple prior distributions forimputation since the number of missing values forcovariates was not large (complete for birth year 16missing for naevi and 21 missing for freckles) andthere was no indication of non-random missingness inour data Therefore imputation for missing naevi andfreckles covariates were based on Bernouilli prior dis-tributions with respective parameter values estimatedfrom the complete observations We performed aninitial 10000 burn-in iterations followed by an addi-tional 20000 Parameter estimates were the mean andstandard deviation (SD) of all post convergence Gibbssamples with a thinning interval between 20 and 50credible intervals were computed as the lower andupper a2 percentiles from the last 20000 iterationsConvergence to the posterior distribution was con-firmed by using the different criteria provided by theadd-on CODA package including those of Gelmanand Rubin (1992) Geweke (1992) and Raftery andLewis (1992a 1992b)
ResultsGEE Approach
An inspection of residual plots following preliminarymodel-fitting provided no evidence for the failure ofproportional hazards assumption and did not detectinfluential observations We then proceeded to applythe GEE approach that could estimate regressioncoefficients while incorporating a dependence struc-ture between relative pairs The results are displayed
108 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
in Table 3 which suggest that later birth year havingat least a moderate number of naevi and freckles wereall simultaneously associated with later age at onsetof melanoma Table 3 also presents the estimatedodds-ratios for quantifying the correlation betweenpaired relatives of specific relationships
The odds ratio for sibndashsib pairs is 2973 which issignificantly different from 1 (p lt 001) The odds ratiofor parentndashchild pairs is 1650 which is slightly greaterthan 1 indicating a mild dependency between thesepairs although they are not quite statistically signifi-cant The odds ratio for second-degree and higherrelative pairs is 1155 indicating no dependence at allbetween these pairs This pattern of familial aggrega-tion is compatible with dominance variance as well asadditive genetic variance (Falconer 1990)
Bayesian Approach
We re-analyzed the age-at-onset of melanoma by usingGibbs sampling to impute missing covariates and toestimate subject-specific covariate effects randomadditive genetic common family environment andshared sibling environment effects on the log scale Theestimated shape parameter g was 43 with a 95 credi-ble interval (CI) of (42 46) The results aresummarized in Table 4 (Model A) The residual plotsdid not indicate a gross departure from the underlyingWeibull model and revealed no influential observa-tions We checked the sensitivity of the analyses to
initial parameter values by re-running the Gibbssampler five more times using different starting valuesThe resulting estimates did not differ by more than 5from the values reported here The mean estimate for(s 2
A) was 0452 with 95 CI = (0348 0566) for (s 2C)
was ndash0053 with 95 CI = (ndash0120 0019) and for(s 2
Cs) was 0467 with 95 CI = (0393 0545)A small negative value for the common family
effect suggests that there may be a dominant effect orthat the current model is not quite appropriate forexample that there are systematic effects or correla-tion structures that have not been accounted for Theresults here indicate that additive genetics seem toimpact equally on the variation of the age at onset ofmelanoma Further exploration for alternative modelsis a focus of our future research
In addition we investigated the relative contribu-tion of genetic and environmental effects on theexpression of naevi (Model B) and freckles (ModelC) which are known risk factors for melanoma Theexpressions of naevi and freckles were coded asbinary variables (none or few moles versus moderateor many moles and no freckles versus one or morefreckles) A hierarchical Bayesian binomial model wasfitted to estimate the random variance componentsThe results in Table 4 indicate that a common familyenvironment effect contributed the most to theexpression of naevi (s 2
C = 0704) (relative to the con-tributions of additive genetic effect (s 2
A = 0142) and
Individual j in family i
Figure 3
Graphical model of covariate and random family effects for an individual in a nuclear family structure tij represents the observed failure time forthe jth individual in the ith family with tcenij being an indicator variable of censoring status Full arrows indicate stochastic links to which a probabil-ity is attached broken arrows denote deterministic relationships bs are regression coefficients T is the precision of the prior distribution andequals the inverse of the variance Fi Gi Hi are random effects modeled as Fi ~ N(012 s 2
A + s 2C) Gi ~ N(0 12 s 2
A) and Hi ~ N(0 s 2Cs) Rectangles rep-
resent actual data values for the covariates g and mij are shape and scale parameters for the underlying Weibull distribution
109Twin Research February 2004
Analysis of Melanoma Onset
of shared sibling effect(s 2Cs = 0142) both of which
were non negligible In contrast variation in theexpression of freckles was largely explained by addi-tive genetic and shared family effects(s 2
A = 2050 s 2C =
2600) compared to a relatively small shared siblingeffect(s 2
C = 0115)
Discussion
We applied two methods mdash generalized estimatingequat ion and Bayesian analys is mdash to the geneticanalysis of age at onset of melanoma based on anuclear family structure Under both approaches theresults suggest that additive genetic factors played an
Table 4
Gibbs Sampling Approach Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families
Weibull Model A Mean effects mdash Response variable is Age-at-onset
Covariate RR = eb Coefficient b Robust se(b) 95 CI of bYear of birth 1378 0321 00027 (03160326)
Naevi 1126 0119 00021 (00580185)
Freckling 1017 0017 01400 (ndash00050085)
Weibull Model B Variance components mdash Response variable is Age-at-onset
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0452 0054 (03480566)
s 2C ndash0053 0027 (ndash01200019)
s 2Cs 0467 0040 (03930545)
g 43 0104 (4246)
Binomial Model Variance components mdash Response variable is Naevi
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0142 0149 (00020498)
s 2C 0704 0156 (04031010)
s 2Cs 0195 0157 (000250553)
Binomial Model Variance components mdash Response variable is Freckling
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 2050 0779 (08353570)
s 2C 2600 0418 (17803460)
s 2Cs 0115 0088 (00110312)
Note Indicates Significance at the 005 level) Naevi Is a Binary Variable with Baseline 0 = No or Few Moles Freckling Is Coded as a Binary Variable with Baseline0 = No Freckles
Table 3
GEE Approach Estimated Regression Coefficients in the Proportional Hazard Model and Estimated Odds Ratios for Quantifying FamilialAggregation in Age at Onset of Melanoma in Queensland Families ( Indicates Significance at the 05 Significance Level)
A Mean effects
Covariate RR = eb Coefficient b Robust se(b) Z-statistic
Year of birth 1142 0132 0051 2588
Naevi (Baseline = No or few moles) 1765 0568 0073 7781
Freckling (Baseline = No freckles) 1160 0148 0049 3020
B Pattterns of familial aggregation
Relationship 1 + q se(q) Z-statistic
Sibndashsib 2973 06217 317
Parentndashchild 1650 0434 150
SecondOthers 1155 0270 047
important role in the age at onset of melanoma butthat shared sibling environmental factors were notnegligible We focused attention on these approachesbecause they are more appropriate for modeling cor-related age at onset data and they allow the inclusionof covariates in the analyses Under both approachesthere were suggestions that earlier melanoma onsetwas influenced by later birth year having a moderatenumber of naevi and being freckly The principal dif-ference between the two approaches i s in theinterpretation of the regression coefficients The GEEmethod uses a marginal approach resulting in regres-sion coefficients that describe the average populationresponse to changing covariates whereas the Bayesianapproach produces subject-specific coefficients A sec-ondary distinction is in the nature of the within-pairdependence The GEE model only describes acommon covariance among specific relative pairswhereas the Bayesian approach can explicitly describethe source of this covariance A third advantage of theBayesian method is its flexibility in incorporatingprior information if available for the covariates orlatent effects by modifying their prior distributionsFurther the Bayesian method would permit a moreaccurate decomposition of the genetic variance intoadditive and dominant components thus providingthe means for a direct assessment of the no-domi-nance assumption Finally it is also interesting torecord the amount of CPU time required for eachmethod 20 seconds for the GEE approach andapproximately 3 hours (for binary traits) and 12hours (for age at onset outcome) to run BUGS on asingle-user Intel Pentium III 600 MHz personal com-puter with the Linux Mandrake 71 operating systemThe amount of human and financial resources dedi-cated to collecting maintaining and updating themelanoma family database is extremely high there-fore the extra CPU time requirement by the MCMCmethod is well worth the additional genetic informa-tion and flexibility that it provides A BUGS programis included in the Appendix
Acknowledgments
Data collection was supported by grants from theAustralian National Health and Medical ResearchCouncil (NHMRC) ID 930223 and ID 961061 USNational Cancer Institute (CA88363) Programmingresources were provided by start-up funds from theDepartment of Biostatistics at the M D AndersonCancer Center We acknowledge assistance withBUGS programming from Sang-Joon Lee
ReferencesAbel L amp Bonney G E (1990) A time-dependent
logistic hazard function for modeling variable age ofonset in the analysis of familial diseases GeneticEpidemiology 12 391ndash407
Aitken J F Bailey-Wilson J Green A C MacLennanR amp Martin N G (1998) Segregation analysis ofcutaneous melanoma in Queensland Genetic Epi-demiology 15 391ndash401
Aitken J F Green A C MacLennan R Youl P ampMartin N G (1996) The Queensland familialmelanoma project Study design and characteristics ofparticipants Melanoma Research 6 155ndash165
Allison P D (1995) Survival analysis using the SASsystem mdash A practical guide Cary NC SAS InstituteInc
Andersen P K Borgan O Gill R D amp Keiding N(1993) Statistical models based on count ingprocesses New York Springer-Verlag
Bernardo J M Berber J O David A P amp Smith F FM (Eds) (1992) Bayesian statistics 4 OxfordClarendon Press
Bonney G E (1986) Regressive logistic model for famil-ial disease and other binary traits Biometrics 42611ndash625
Breiman L Friedman J H Olshen R A amp Stone CJ (1984) Classif ication and regression treesWadsworth California
Cannings C amp Thompson E A (1977) Ascertainmentin the sequential sampling of pedigrees ClinicalGenetics 12 208ndash212
Clayton D G (1978) A model for association in bivari-ate life-tables and its application in epidemiologicalstudies of chronic disease incidence Biometrika 65141ndash151
Clayton D G amp Cuziak J (1985) Multivariate gener-alizations of the proportional hazards model (withdiscussion) Journal of the Royal Statistical SocietySeries A 148 82ndash117
Cox D R (1972) Regression models and life tables(with discussion) Journal of the Royal StatisticalSociety Series B 34 187ndash202
Do K-A Broom B M Kuhnert P Duffy D LTodorov A A Treloar S A amp Martin N G(2000) Genetic analysis of the age at menopause byusing estimating equations and Bayesian randomeffects models Statistics in Medicine 19 1217ndash1235
English D R Armstrong B K Kricker A et al(1997) Sunlight and cancer Cancer Causes Control8 271ndash283
Falconer D S (1990) Introduction to quantitativegenetics (3rd ed) New York Longman Group Ltd
Faucett C Gauderman W J Thomas D Ziogas Aamp Sobel E (1993) Combined segregation andlinkage analysis of late-onset Alzheimerrsquos disease inDuke famil ies us ing Gibbs sampling GeneticEpidemiology 10 489ndash494
GAP (1996) Genetic Analysis Package Release 10Pasadena CA Epicenter Software
110 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
111Twin Research February 2004
Analysis of Melanoma Onset
Gauderman W J Morrison J L Carpenter C L ampThomas D C (1997) Analysis of gene-smokinginteraction in lung cancer Genetic Epidemiology 14199ndash214
Gauderman W J amp Thomas D C (1994) Censoredsurvival models for genetic epidemiology A Gibbssampling approach Genetic Epidemiology 11171ndash188
Gelfand A E amp Smith A F M (1990) Sampling-based approaches to calculating marginal densitiesJournal of the American Statistical Association 85398ndash409
Gelman A amp Rubin D B (1992) Inference from itera-tive simulation using mult iple sequences (withdiscussion) Statistical Science 7 457ndash511
Geman S amp Geman D (1984) Stochastic relaxationGibbs distributions and the Bayesian restoration ofimages IEEE Trans Pattern Anal Mach Intelligence6 721ndash741
Geweke J (1992) Evaluating the accuracy of sampling-based approaches to calculating posterior momentsIn J M Bernardo J O Berber A P David amp F FM Smith (Eds) Bayesian statistics 4 OxfordClarendon Press
Gilks W R Thomas A amp Spiegelhalter D J (1994)A language and program for complex Bayesian mod-eling Statistica 43 169ndash178
Guo S W amp Thompson E A (1991) Monte CarloEM for the estimation of multiple random effectsmodels on large pedigrees IMA Journal ofMathematics Applied in Medicine and Biology 8171ndash189
Guo S W amp Thompson E A (1992) A Monte Carlomethod for combined segregation and linkage analy-sis American Journal of Human Genetics 511111ndash1126
Guo S W amp Thompson E A (1994) Monte Carloestimation of mixed models for large complex pedi-grees Biometrics 50 417ndash432
Hastings W K (1970) Monte Carlo sampling methodsusing Markov chains and their applicationsBiometrika 57 97ndash109
Heath A C (1997) Markov Chain Monte Carlo segre-gation and linkage analysis for oligogenic modelsAmerican Journal of Human Genetics 61 748ndash760
Hougaard P (1986) A class of multivariate failure timedistributions Biometrika 73 671ndash678
Hsu L amp Prentice R L (1996) On assessing thestrength of dependency between failure time variatesBiometrika 83 491ndash506
Hsu L amp Zhao L P (1996) Assessing familial aggre-gation of age at onset by using estimating equationswith application to breast cancer American Journalof Human Genetics 58 1057ndash1071
Kempthorne O (1960) Biometrical genetics New YorkPergammon Press
Kendall M amp Stuart A (1979) The advanced theoryof statistics (Vol 2 4th ed) Oxford OxfordUniversity Press
Kong K Frigge M Cox N amp Wong W H (1992)Linkage analysis with adjustments for covariates Amethod combining peeling with Gibbs samplingCytogenetics and Cell Genetics 59 208ndash210
Kuhnert P amp Do K-A (2003) Fitting genetic models totwin data with binary and ordered categoricalresponses A comparison of structural equation mod-eling and Bayesian hierarchical models BehaviorGenetics 33 439ndash452
Lange K amp Sobel E (1991) A random walk methodfor computing genetic location scores AmericanJournal of Human Genetics 49 1320ndash1334
LeBlanc M amp Crowley J (1992) Relative risk trees forcensored survival data Biometrics 48 411ndash425
Li H amp Thompson E A (1997) Semiparametric esti-mation of major gene and random environmentaleffects for age of onset Biometrics 53 282ndash293
Li H amp Wijsman E A (1998) Semiparametric estima-tion of major gene effects for age of onset GeneticEpidemiology 15 279ndash298
MacLennan R Green A C McLeod G R amp MartinN G (1992) Increasing incidence of cutaneousmelanoma in Queensland Australia Journal of theNational Cancer Institute 84 1424ndash1432
Metropolis N Rosenbluth A W Rosenbluth M NTeller A H amp Teller E (1953) Equation of statecalculations by fast computing machines Journal ofChemical Physics 21 1087ndash1092
Nielsen G G Gill R D Andersen P K amp SorensonT I A (1992) A counting process approach tomaximum likelihood estimation in frailty modelsScandinavian Journal of Statistics 19 25ndash43
Oakes D (1989) Bivariate survival models induced byfrailties Journal of American Statistical Association84 487ndash493
Raftery A L Lewis S (1992a) Comment One longrun with diagnostics Implementation strategies forMarkov Chain Monte Carlo Statistical Science 7493ndash497
Raftery A L amp Lewis S (1992b) How many interac-tions in Gibbs sampler (pp 763ndash774) In J MBernardo J O Berber A P David amp F F M Smith(Eds) Bayesian statistics 4 Oxford Clarendon Press
SAGE (1992) Statistical Analysis for GeneticEpidemiology Release 21 Cleveland OH Departmentof Epidemiology and Biostatistics RammelkampCenter for Education and Research MetroHealthCampus Case Western Reserve University
Scurrah K J Palmer L J amp Burton P R (2000)Variance components analysis for pedigree-based cen-sored survival data unusing generalized linear mixedmodels (GLMMS) and Gibbs sampling in BUGSGenetic Epidemiology 19 127ndash148
112 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
Siegmund K D amp McKnight B (1998) Modelinghazard functions in families Genetic Epidemiology15 147ndash171
Spiegelhalter D J Thomas A amp Best N G (1996a)Computation on Bayesian graphical models (pp407ndash425) Oxford Oxford University Press
Spiegelhalter D J Thomas A Best N G amp Gilks W(1996b) BUGS 06 ndashBayesian InferenceUsing GibbsSampling manual Cambridge MRC Biostatistics Unit
Spiegelhalter D J Thomas A Best N G Lunn D(2003) WinBUGS (Bayesian Inference Using GibbsSampling) Windows version 14 user manualCambridge MRC Biostatistics Unit
Siskind V Aitken J Green A amp Martin N (2002)Sun exposure and interaction with family history inrisk of melanoma Queensland Australia Inter-national Journal of Cancer 97 90ndash95
Swerdlow A amp Green A (1987) Melanocytic naeviand melanoma An epidemiological perspectiveBritish Journal of Dermatology 117 137ndash146
Therneau T M amp Atkinson E (1997) An introductionto recursive partitioning using the RPART routines(Technical report) Rochester Mayo Foundation
Thomas D (1992) Fitting genetic data using Gibbs sam-pling An application to nevus counts in 38 Utahkindreds Cytogenetics and Cell Genetics 59 228ndash230
Thompson E A (1994) Monte Carlo programs for pedigree analysis (Technical Report 267)Washington Dept of Statist ics University ofWashington
Vaupel J M Manton K G amp Stallard E (1979) Theimpact of heterogeneity in individual frailty on thedynamics of mortality Demography 16 439ndash454
113Twin Research February 2004
Analysis of Melanoma Onset
Appendix A A BUGS Program to Implement the Bayesian MCMC approach in Modeling an ACEModel for Age at onset of Melanoma with Covariate Effects and Random GeneticsEnvironmental Effects
reported to the Queensland Cancer Registry or foundby comparing cancer registrations for 1984 and 1987with records of pathology laboratories throughoutQueensland It is estimated that registry records areapproximately 95 complete for the study periodDoctorsrsquo permission was obtained to approach10407 cases of whom 7784 (75) returned a brieffamily history questionnaire stating whether any oftheir first-degree relatives (parents siblings children)had had a diagnosis of melanoma A total of 2920probands was sampled from these respondentsincluding all who had claimed a positive familyhistory (n = 1529) and an approximate 20 randomsample of the remainder (n = 1391) Probands weresent a detailed family history questionnaire askingfor the names and addresses of all first-degree rela-tives relativesrsquo vital status dates of birth and agesand whether any of these relatives had had amelanoma diagnosed by a doctor To avoid bias indetermining the mode of inheritance second andlower degree relatives were enrolled in the studyaccording to a sequential sampling scheme (Canningsamp Thompson 1977) First degree relatives of all rela-tives with confirmed melanoma were ascertainedthrough the detailed family history questionnairedescribed above which was mailed to all confirmedpositive relatives In total 15989 relatives belongingto 1912 separate families were reported by 2118(73) probands or other positive relatives A total of1044 relatives for whom date of birth was unknownwere excluded leaving 14945 relatives for analysisThere were 188 families independently ascertainedthrough two or more probands To avoid ascertain-ment bias these families were included in the datasetseparately for each proband in the family
Medical confirmation and dates of diagnosis weresought for the relatives reported by probands or otherrelatives to have had melanoma After eliminating187 of subjects who refused access to their medicalrecords or those with lost records or those with falsepositive reports (basal or squamous cell carcinomasolar keratoses or benign naevi) medical confirma-tion of melanoma as the diagnosis was obtained for482 of the original number of relatives Only themedically verified cases among relatives were classi-fied as true events all other relatives were treated asunaffected (censored at last date of contact)
Risk factor questionnaires were subsequentlymailed to all living relatives aged between 18 and 75years ascertained through the sequential samplingprocedure Other relatives provided proxy reportsThe combined number of proxy-reports and self-reports was 9746 relatives for whom standard riskfactor information was available
For the Bayesian analysis we focused on familiesthat included at least one parent and at least onechild where each member in the family should haveinformation on age at diagnosis or age at last follow-up and with maximum one missing covariate
The demographic covariates and hypothesizedmelanoma risk factors included gender birth yearplace of birth ability to tan (very brown moderatetan slight tan no tan) propensity to burn (neverburn and always tan sometimes burn and usually tanusually burn and sometimes tan always burn andnever tan) number of sunburns (0 1 2ndash5 gt 6) skincolor (olivedark medium fairpale) hair color(black lightdark brown fairblonde lightdark red)eye color (brown greenhazel bluegrey) total freck-ling in summer (0 1ndash100 gt 100) number of naevi(none few moderate number very many) andnumerous measures of cumulative lifetime exposuresto sun and ultraviolet rays
Preliminary Exploratory Analysis
As a preliminary analysis we ignored correlationswithin families and applied a combination of paramet-ric and non-parametric survival analysis techniques asexploratory tools to identify possible risk factors formelanoma Once these fixed effects were identified weconsidered incorporating these into a subsequent gen-eralized estimating equation model or a Bayesianmodel with random effects that could account forwithin-family correlations The aim was to quantify thegenetic and familial associations in the presence ofobserved covariate effects
Manipulation of the entire melanoma datasetresulted in a subset of 9669 observations with a rangeof explanatory variables that described phenotypiccharacteristics for each individual along with somedemographic details such as birth year and genderThe response variable was the time to diagnosis (orage at the last follow-up) with the proportion of cen-sored cases being approximately 76 The medianage at onset of melanoma was 43 The correlationestimates for age of onset for different relationshippairs with both affected members were 067 055and 039 for sibndashsib parentndashchild and secondlowerorder pairs respectively
The first stage of modelling involved fitting uni-variable proportional hazard models to assess eachvariablersquos individual effect on the time of onset ofmelanoma The SAS (Allison 1995) package wasused to fit proportional hazards models of the follow-ing form
h(t x) = Y (x b)h0(t)
where Y represents a log-linear function ebrsquox of theexplanatory variables x and corresponding coeffi-cients b and h0(t) represents the baseline hazard attime t
Significant variables associated with the age atonset of melanoma consisted of eye hair and skincolour freckling number of moles skin type abilityto burn ability to tan previous skin cancers ultravio-let exposure between the ages of 5 and 12 years
100 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
101Twin Research February 2004
Analysis of Melanoma Onset
cumulative sun exposure up to the age of 19 yearsand birth year
The second stage fitted multivariable proportionalhazard models to those explanatory variables thatwere significant at the univariable stage Table 1 dis-plays the results from the final model which onlyincluded significant variables (p-value lt 05) It isworth noting that a similar result could be obtainedusing an automated stepwise procedure
Results from this analysis highlight some interest-ing but quite obvious risk factors noted in previousanalyses (Aitken et al 1996 1998) For example
An increase of 1 year in birth year induces 17increase in risk of earlier melanoma onset
People with neither freckles nor naevi have thelowest risk of melanoma onset
The risk of earlier melanoma onset is increased byup to 37 for blue eyed people and even further(46) for green eyed people when compared toindividuals with brown eyes
ldquoRed Headsrdquo have an increased risk of earliermelanoma onset (46) when compared to indi-viduals with black hair However no significantincrease was noted for individuals with fair orlight red hair
A personrsquos tendency to burn easily increases therisk of earlier melanoma onset in some cases byup to 100 compared to those that never burn
However striking this last statement is issues of con-founding must also be considered The most obviousillustration of this is the confounding that occurredbetween mole count and freckling This is seenthrough close inspection of the parameter estimateswhich changed in magnitude when mole count wasadded to the model after adjusting for freckling(Results not displayed here)
To reduce the dimension of the problem furtherand avoid some of these confounding issues a sur-vival tree was constructed using RPART (RecursivePartitioning and Regression Trees) see Therneau andAtkinson (1997) Survival trees are a special case of
Table 1
Results from Fitting a Multivariable Proportional Hazards Model to the Melanoma Data Based on Univariable Results The Results Reported in thisTable Are the Parameter Estimates b Their Standard Errors se(b) the Relative Risk eb and the p-value for Each Estimate
Variable b se(b) eb p-value
Birth Year 016 lt 001 117 lt 05
Eye Colour (Baseline Brown)
BlueGrey 031 007 136 lt 05
GreenHazel 038 007 146 lt05
Hair Colour (Baseline Black)
Light RedGinger 017 015 119 27
Dark RedAuburn 038 015 146 lt 05
FairBlonde 006 012 106 62
Light Brown 014 012 115 22
Dark Brown 002 012 102 87
Skin Type (Baseline never burn)
Always burn 069 016 1968 lt 05
Usually burn 045 015 157 lt 05
Sometimes burn 031 015 136 lt 05
Freckling (Baseline none)
1 to 100 017 006 118 lt 05
gt 100 009 008 110 23
Mole Count (Baseline none)
Few 029 007 134 lt 05
Moderate 079 008 220 lt 05
Many 112 010 308 lt 05
Number of Sunburns (Baseline none)
One ndash007 011 093 49
2 to 5 ndash006 009 094 50
gt 6 017 009 119 07
Cumulative Sun Exposure ( lt 5 yrs) 004 001 104 lt 05
UV Exposure (5ndash12 yrs) 00003 lt 00001 1 lt 05
102 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
decision trees that were incorporated into RPART byTherneau (1997) using the ideas put forth by LeBlancand Crowley (1992) for survival data with censoringThese authors showed through simulation studies thatsurvival trees could outperform standard parametricmethods such as proportional hazards modelling particularly in situations where the underlying distrib-ution was not exponential Along with Breiman et al(1984) they showed decision trees to be usefulexploratory tools for identifying important variablesinteractions and outliers
The methodology for survival trees begins by iden-tifying optimal splits using the log rank statistic toseparate the data into homogeneous groups Eachsplit is comprised of a parent node and two daughternodes which are linked to the parent by branchesFigure 1 is an example of a survival tree produced onthe melanoma data Once a large tree is grown withmany terminal nodes that contain very few observa-tions a pruning procedure is introduced to identify asequence of sub-trees The technique of pruninginvolves snipping back splits of the tree one at a timeuntil only the root node remains Cross-validation isthen introduced to aid in the selection of the optimalmodel One nice feature of survival trees is the use ofKaplan-Meier curves to provide information aboutsurvival rates at each terminal node of the tree Thisapproach may allow for better interpretation of theterminal nodes and highlight different scenarioswhich yield similar survival rates as can be seen inFigure 1
The variables identified from the multivariablemodel were used as input into RPART Variables suchas the sun and UV exposures were omitted from themodelling stage since nearly half of the data for eachof these variables were missing (cumulative sun expo-sure 438 UV exposure 404) A large survivaltree was produced in SPLUS and using cross-valida-tion a model splitting solely on birth year wasidentified This survival tree yielded the minimumcross-validated error rate (093) suggesting that birthyear was an important indicator for melanomaSelecting a slightly larger but more informative model(error = 094) resulted in a survival tree consisting ofnine terminal nodes This revealed splits on birth yearmole count and freckling Al l other variablesappeared to be either competing at each node oracting as a surrogate variable for one of these primarysplits Skin type in particular arose as an importantsurrogate for freckling
In Figure 1 we aim to provide a more visual inter-pretation to the results from the survival analysisEach split is shown at the top of each node and canbe assumed to be the split that directs observationstowards the left side of the tree For example aftersplitting on birth year at 1933 a split to the left cor-responds to an individual with few moles A split tothe right indicates an individual with many moles andso on The terminal nodes in this figure are displayed
as Kaplan-Meier plots showing the survival curvesalong with the number of observations n and the riskratio calculated with reference to the baseline groupof 9669 individuals The reference survival curve isshown at the root node along with a bar which showsthe percentage of events in the entire dataset Gridsare placed on each survival plot at equal intervals of20 years of age These grids have been plotted to sim-plify the interpretation An indicator to the left of theplot displays the risk ratio Ratios above one areshown by an ldquoup arrowrdquo while a decrease in risk isillustrated by a ldquodown arrowrdquo From this model wecan see that there are a few scenarios that indicatehigh risk for earlier onset of melanoma These scenar-ios may be described as
individuals born after 1966 with many moles (RR = 113)
individuals born between 1947 and 1966 withmany moles (RR = 45)
individuals born between 1933 and 1953 with fewmoles but many freckles (RR = 24)
individuals born between 1923 and 1933 withmany moles (RR = 141)
It is obvious from these results that birth year has asubstantial impact on the age-at-onset of melanomaOnce this is taken into account mole count and freck-ling only provide a small contribution to the risk
Family History of Melanoma
As described earlier in the Data section family historywas collected regarding first-degree (siblings andparents) and second-degree relatives From this pedi-gree structure other higher-order types of relative pairscould also be formed Some of these relatives were dis-eased with melanoma resulting in pairs of relativeswho both may be diseased (++) both not diseased (mdash)and one diseased while the other was not diseased (+ndash)Table 2 lists the concordant and discordant pairs ofspecific relationships sibndashsib parentndashchild andsecond-degreelower order relative pairs The second-degree relative pairs include grandparentndashgrandchildand auntndashniece while the lower order relative pairsinclude the in-law pairs From Table 2 the percent-ages of both diseased pairs are 06 09 and 04among sibndashsib parentndashchild and secondlower orderpairs respectively Crude estimates of correlationcoefficients can be calculated from these percentageswithout accounting for ages at onset among these rel-atives However the risk of developing melanomamay depend on the subjectrsquos age Hence adjusting forthe age at onset is essential in quantifying the correla-tion of age at onset between pairs of relatives In asubsequent section we describe how this can be donerigorously via the generalized estimating equationsapproach On average melanomas were diagnosedslightly earlier in relatives (475 years) than inprobands (502 years) Among relatives melanomaswere diagnosed at younger ages in later generations
103Twin Research February 2004
Analysis of Melanoma Onset
RR
= 1
00
Bas
elin
e
Fig
ure
1
Surv
ival
tree
for t
he m
elan
oma
data
con
sist
ing
of 9
669
indi
vidu
als
from
191
2 Qu
eens
land
fam
ilies
Ter
min
al n
odes
are
rect
angu
lar a
nd d
ispl
ay th
e Ka
plan
-Mei
er c
urve
for t
he c
orre
spon
ding
sub
grou
p
104 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
To account for the different ages at censoring in eachgeneration due to termination of the study or deathfrom causes other than melanoma we examined thedisease-free survival distribution for each generationusing the standard failure-time analysis technique(Figure 2) The median age at diagnosis of melanomawas 64 among parents of probands 50 among sib-lings of probands and 33 among children ofprobands The disease-free survival functions differedsignificantly between the children generation fromearlier generations (log-rank test with p lt 01) butthere was no significant differences in age-at-onset of
melanoma between the siblings and parents ofprobands (log rank test with p gt 9)
Preliminary Segregation Analysis
A preliminary segregation analysis was conducted(Aitken et al 1998) to investigate whether the famil-ial clustering of cutaneous melanoma is consistentwith Mendelian inheritance of a major autosomalgene Analyses were performed with the SAGE (1992)statistical package using the maximum likelihoodREGTL program for a binary trait with a variable ageof onset The hypothesis of co-dominant Mendelianinheritance gave a significantly better fit to the datathan either dominant or recessive Mendelian inheri-tance Overall both Mendelian inheritance of a singlemajor gene and purely environmental transmissionwere rejected However there was strong evidence offamilial dependence in melanoma occurrence
The inclusion of risk factors in the models mayreveal whether all or a combination of these explainsthe familial dependence that was demonstrated by thesegregation analysis If a major gene exists that oper-ates independently on these covariates includingthem in the models may reduce residual variation andincrease the power of the analysis to detect a majorgene effect
Estimating Equations Approach
Let y = (dki tki Zki) denote the data collected for the ith
member in the kth family (k = 1hellipK and i = 12)where dki = 0 if the observation is censored tki is eitherthe recorded age at diagnosis of melanoma or the ageat the most recent follow-up for unaffected peopleand Zki is a vector of measured covariates We assumethat censoring time age at diagnosis of melanomaand the covariates are independently distributedThese assumptions can be relaxed in more generalmodels subject to identification constraints Thehazard rate for melanoma is the instantaneous proba-bility that melanoma is diagnosed immediately aftertime t given that the person is unaffected at time tThe hazard rate under the Cox proportional hazardsmodel (Cox 1972) is given by
l(tki) = l0(tki) exp(brsquoZki)
Table 2
Concordant and Discordant Pairs of Relatives in 1912 Families from the Queensland Familial Melanoma Project Probands Were Not Included forthe Calculation of Concordance
Sibndashsib Parentndashchild SecondOthers Total
++ 49 15 41 105
+ndash 763 536 1200 2499
mdash 7011 1078 9817 17906
Total 7823 1629 11058 20510
Figure 2
Cumulative melanoma-free survival among first-degree relatives ofconfirmed cutaneous melanoma cases (probands) diagnosed inQueensland Australia 1982ndash1987 according to the relativersquos rela-tionship to the proband
where l0() is the baseline hazard function and b is avector of regression coefficients
For a specific pair of relatives we follow Clayton(1978) in modeling the bivariate survivor function
F(tk1 tk2) = (F1(tk1)ndashq + F2(tk2)
ndashq ndash1)ndash1q
where F1 and F2 are univariable survivor functions qis a scalar parameter that measures the degree ofdependence between the relativesrsquo times at onsetindependence being implied by q = 0 and positiveassociation by q gt 0 The Clayton model allows nega-tive dependencies and has the property that failuretimes are absolutely continuous for q gt ndash05 In addi-tion the cross-ratio (or odds-ratio) function asstudied by Oakes (1989) is
c(tk1 tk2) = l(tk1 frac12 Tk2 = tk2) l(tk1frac12Tk2 sup3 tk2) =1 + q
This is equivalent to assuming that the odds-ratio isinvariant over the grid region that supports the dataHeuristically the parameter 1 + q is an odds-ratiothat depends on the degree of dependence betweenthe onset ages of the two relatives If genetic factorsdo influence the age at onset of melanoma we wouldexpect to see a higher concordance in the age of onsetin first degree relatives who on average share halftheir genes in common in comparison to seconddegree relatives Under the current model this trans-lates as qfirst-degree gt qsecond-degree
We may use a standard method to estimate withinpair correlations for 2 acute 2 tables from odds ratiosEstimates of relation-pair correlations psibndashsib pparentndashchildand psecond-order are recovered from using the relation-ships for example r sibndashsib = min(1ln(1 + q sibndashsib))Testing for the presence of genetic factors underlyingthe age at diagnosis of melanoma is equivalent totesting H0 p first-order = psecond-order We may test thishypothesis using a z-transform (Kendall 1979 p 315) of the point estimates of the correlation coef-ficients Let n1 and n2 denote the number of first orderand second order relatives let z1and z2 denote thetransformed statistics of r1and r2 the correlation esti-mates for first-order and second-order relative pairsrespectively Specifically we reject H0 when ED gtZ1ndasha where
E = E(z1 ndash z2) = mdash12mdash log 31mdash11
+ndash r
r
1
1mdash2 1mdash11
+ndash r
r2
2
mdash24D2 = V(z1 ndash z2) = mdash
(n1
1ndash 3)mdash+ mdash
(n2
1ndash 3)mdash
and Z1ndasha is the standard normal deviate correspondingto the one-sided a significance level
This approach has the advantage of providing atest for the presence of genetic effects through a singleparameter (q) However it is limited in its ability to
attribute the phenotypic variance to specific effects(eg additive gene action)
Mathematical details of the GEE model and theiterative procedure to estimate the regression coeffi-cients b and specific degrees of dependence q for thedifferent types of relative pairs have been summarizedpreviously in Do et al (2000)
MCMC Analysis Using BUGSThe Bayesian Paradigm and Gibbs Sampling
Markov Chain Monte Carlo (MCMC) is an alterna-tive Bayesian approach that provides estimates oflikelihoods and associated parameter values whenexact computation is infeasible (Hastings 1970Metropolis et al 1953) MCMC methods can beused to draw samples from the underlying joint distri-bution of major genotypes and polygenic valuesconditional on the observed data From thesesamples desired parameters and likelihoods can beestimated without the need to resort to exact compu-tation MCMC methods have been used for linkageanalysis (Kong et al 1992 Lange amp Sobel 1991)for estimation of parameters in the mixed model withand without covariates (Guo amp Thompson 1991Thomas 1992) for estimation of genendashsmoking inter-action and covariate imputation (Gauderman et al1997) for performing combined linkage and segrega-tion analysis (Faucett et al 1993 Guo amp Thompson1992) and for mixed models of large complex pedi-grees (Guo amp Thompson 1994)
In a general setting let y be the observed data andq be everything not observed including parametersand latent variables The implementation of Bayesianmethods using realistic models and priors is com-puter-intensive and relies heavily on cunningcomputational tools to approximate integrals Theproblem in general terms is to obtain the expectedvalue of a function of interest s () under the posteriordensity p(qfrac12x)
E[s(q)] =
which cannot generally be found analytically Onemethod to carry out the integration on the RHS is toperform simulation of exact Bayesian posterior distri-butions using Markov chain Monte Carlo techniquessuch as Gibbs sampling The Gibbs sampler (Gemanamp Geman 1984) is the most popular algorithm usedin MCMC applications to correlated data Gibbssampling was introduced to the main statistical com-munity by Gelfand and Smith (1990) and has sincebeen applied to an even wider array of problems TheGibbs sampler is easy to implement because it onlydepends on the local neighborhood structure In thecontext of pedigree analysis (Olshen amp Wijsman1996) the basic procedure is a sequential updating ofmissing and latent data including the underlying andunobserved major genotypes polygenic effects and
EQ s(q) p(q) p(xfrac12q) dqmdash mdash mdash
EQ p(q) p(xfrac12q) dq
105Twin Research February 2004
Analysis of Melanoma Onset
106 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
environmental effects Values for the missing or latentdata are sampled from the local conditional distribu-tion a function of the observed individual data thecurrent sampled values of other missinglatent data forthis particular individual such as polygenic and environ-mental effects and the values for the sampled geneticeffects in the immediate neighbors of an individualGibbs sampling basically consists of three main steps
Step 1 Setting initial values for unobserved quanti-ties (parameters and latent variables)
Step 2 For each parameter or latent variable qj sample from its ldquofull conditional distributionrdquogiven the current values of all other quantities inthe model
Step 3 Examine sampled values of parameters andlatent variables to monitor convergence and toprovide summary measures
Some of the most recent and popular packages thatimplement Gibbs sampling for analysis of pedigreedata include BUGS (Gilks et al 1994 Spiegelhalter et al 1996ab) Genetic Analysis Package (GAP1996) and MIXD (Olshen amp Wijsman 1996Thompson 1994) We have used BUGS mainlybecause of its flexibility in programming hierarchicalmodels besides being a freeware product
The Model
The aim here is to model the correlation structurewithin the family structure to satisfy the fundamentaladditive genetic model (Crow amp Kimura 1970Falconer 1990 Kempthorne 1960) as followsConsider a nuclear family structure consisting of 4members father mother and two children denoted byF M S1 S2 respectively Using similar notation as inBurton et al (1999) a conventional mixed linearmodel consisting of fixed and random effects may bewritten in the form
Qij = bTz + Aij +Cij + CSij+ Eij
where Qij is the observed value of a normally distrib-uted continuous trait for the jth individual in the ith
nuclear family zij is a vector of observed covariatesrepresenting fixed effects and b is a correspondingvector of unknown fixed regression coefficients Aij Cij and CSij
A denote random effects that repre-sent additive polygenic common family environmentand common sibling environment effects respectivelyThe variation in an individual response is representedby a composite covariance matrix VT and is the sumof an additive genetic covariance matrix VA acommon family environment matrix VC a sharedsibling environment matrix VCs and residual environ-mental effects The different variance components are
The overall total covariance matrix is
The components of variance s 2A s 2
C s 2Cs need not be
positive as long as VT is positive definite A negativevalue for s 2
Cs simply implies that the realized covari-ance between sib lings is less than the realized
107Twin Research February 2004
Analysis of Melanoma Onset
covariance between parent and a child A negativevalue for s 2
Cs may suggest dominance To optimizeconvergence in BUGS Model (1) may be reparameter-ized as
where Fi Gi H i are independent additive randomeffects or latent variables and are the residual errorterms for parents and children respectively If we model
Fi ~ N(0 mdash12mdash s 2
A +s 2C)
Gi ~ N(0 mdash12mdash s 2
A) Hi ~ N(0 s 2Cs)
then the basic genetics covariance model (additivegenetic common environment and unique environ-ment for this particular four-member family structureis satisfied
In survival models unobserved or unmeasuredexplanatory variables some of which may be geneticare often referred to as frailties The frailties takevalues restricted to the posit ive line and may beassumed to act multiplicatively on the hazardExtending the above model to correlated family datawith time-to-onset endpoint a multiplicative individ-ual heterogeneity or frailty term representing thelatent genetic and common environment variablesmay be modeled as random effects simultaneouslywith the effects associated with observed covariatesConsider right censored time to onset of melanomadata (Tij dij zij)1 j n from n relative pairs hereT i j denotes the true age at onset of the jth familymember or the censored time depending on whetherdij =1 or 0 respectively and z denotes a p acute 1 vector ofcovariates A Weibull distribution may be used tomodel time to failure as
f(ti zi) = ebrsquoz igt igndash1 exp(ndashebrsquoz it g
i )
where b is a vector of unknown regression coefficientsand g is the shape parameter of the Weibull distribu-tion This leads to a baseline hazard of the form
l0(ti) = gt igndash1
Re-parameterize by letting mi = ebrsquoz i the conditionaldistribution of ti given mI is then Weibull (gmi) We for-mulated a mixed model to represent the conditionaldistribution of tij given covariate effects random addi-tive genetic and common environment effects as
tijfrac12mij ~ Weibull (g mij) i = 1hellipn j = 12
where
The regression coefficients and the precision of therandom effects (tG tF tH) were given ldquonon-informa-tiverdquo Normal and Gamma priors respectively The shapeparameter g of the time to onset of melanoma distribu-tion was also given a non-informative Gamma priorwhich was slowly decreasing on the positive real line
The overall model (including random effects) maybe described in a Bayesian graph (Figure 3) whichsimplifies sampling from full conditional distributionsby exploiting part ial independence propert ies(Spiegelhalter et al 1996a) In this graph eachrandom quantity is represented by a node which maybe connected by directed or undirected linksConditional independence assumptions are repre-sented by the absence of such links
We implemented the Gibbs sampler using theBUGS program (Gilks et al 1994) (code in theAppendix) Imputation of missing data was handlednaturally in the Gibbs sampling framework by treatingmissing values as additional unknown quantities andrandomly sampling values from their full conditionaldistributions We chose simple prior distributions forimputation since the number of missing values forcovariates was not large (complete for birth year 16missing for naevi and 21 missing for freckles) andthere was no indication of non-random missingness inour data Therefore imputation for missing naevi andfreckles covariates were based on Bernouilli prior dis-tributions with respective parameter values estimatedfrom the complete observations We performed aninitial 10000 burn-in iterations followed by an addi-tional 20000 Parameter estimates were the mean andstandard deviation (SD) of all post convergence Gibbssamples with a thinning interval between 20 and 50credible intervals were computed as the lower andupper a2 percentiles from the last 20000 iterationsConvergence to the posterior distribution was con-firmed by using the different criteria provided by theadd-on CODA package including those of Gelmanand Rubin (1992) Geweke (1992) and Raftery andLewis (1992a 1992b)
ResultsGEE Approach
An inspection of residual plots following preliminarymodel-fitting provided no evidence for the failure ofproportional hazards assumption and did not detectinfluential observations We then proceeded to applythe GEE approach that could estimate regressioncoefficients while incorporating a dependence struc-ture between relative pairs The results are displayed
108 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
in Table 3 which suggest that later birth year havingat least a moderate number of naevi and freckles wereall simultaneously associated with later age at onsetof melanoma Table 3 also presents the estimatedodds-ratios for quantifying the correlation betweenpaired relatives of specific relationships
The odds ratio for sibndashsib pairs is 2973 which issignificantly different from 1 (p lt 001) The odds ratiofor parentndashchild pairs is 1650 which is slightly greaterthan 1 indicating a mild dependency between thesepairs although they are not quite statistically signifi-cant The odds ratio for second-degree and higherrelative pairs is 1155 indicating no dependence at allbetween these pairs This pattern of familial aggrega-tion is compatible with dominance variance as well asadditive genetic variance (Falconer 1990)
Bayesian Approach
We re-analyzed the age-at-onset of melanoma by usingGibbs sampling to impute missing covariates and toestimate subject-specific covariate effects randomadditive genetic common family environment andshared sibling environment effects on the log scale Theestimated shape parameter g was 43 with a 95 credi-ble interval (CI) of (42 46) The results aresummarized in Table 4 (Model A) The residual plotsdid not indicate a gross departure from the underlyingWeibull model and revealed no influential observa-tions We checked the sensitivity of the analyses to
initial parameter values by re-running the Gibbssampler five more times using different starting valuesThe resulting estimates did not differ by more than 5from the values reported here The mean estimate for(s 2
A) was 0452 with 95 CI = (0348 0566) for (s 2C)
was ndash0053 with 95 CI = (ndash0120 0019) and for(s 2
Cs) was 0467 with 95 CI = (0393 0545)A small negative value for the common family
effect suggests that there may be a dominant effect orthat the current model is not quite appropriate forexample that there are systematic effects or correla-tion structures that have not been accounted for Theresults here indicate that additive genetics seem toimpact equally on the variation of the age at onset ofmelanoma Further exploration for alternative modelsis a focus of our future research
In addition we investigated the relative contribu-tion of genetic and environmental effects on theexpression of naevi (Model B) and freckles (ModelC) which are known risk factors for melanoma Theexpressions of naevi and freckles were coded asbinary variables (none or few moles versus moderateor many moles and no freckles versus one or morefreckles) A hierarchical Bayesian binomial model wasfitted to estimate the random variance componentsThe results in Table 4 indicate that a common familyenvironment effect contributed the most to theexpression of naevi (s 2
C = 0704) (relative to the con-tributions of additive genetic effect (s 2
A = 0142) and
Individual j in family i
Figure 3
Graphical model of covariate and random family effects for an individual in a nuclear family structure tij represents the observed failure time forthe jth individual in the ith family with tcenij being an indicator variable of censoring status Full arrows indicate stochastic links to which a probabil-ity is attached broken arrows denote deterministic relationships bs are regression coefficients T is the precision of the prior distribution andequals the inverse of the variance Fi Gi Hi are random effects modeled as Fi ~ N(012 s 2
A + s 2C) Gi ~ N(0 12 s 2
A) and Hi ~ N(0 s 2Cs) Rectangles rep-
resent actual data values for the covariates g and mij are shape and scale parameters for the underlying Weibull distribution
109Twin Research February 2004
Analysis of Melanoma Onset
of shared sibling effect(s 2Cs = 0142) both of which
were non negligible In contrast variation in theexpression of freckles was largely explained by addi-tive genetic and shared family effects(s 2
A = 2050 s 2C =
2600) compared to a relatively small shared siblingeffect(s 2
C = 0115)
Discussion
We applied two methods mdash generalized estimatingequat ion and Bayesian analys is mdash to the geneticanalysis of age at onset of melanoma based on anuclear family structure Under both approaches theresults suggest that additive genetic factors played an
Table 4
Gibbs Sampling Approach Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families
Weibull Model A Mean effects mdash Response variable is Age-at-onset
Covariate RR = eb Coefficient b Robust se(b) 95 CI of bYear of birth 1378 0321 00027 (03160326)
Naevi 1126 0119 00021 (00580185)
Freckling 1017 0017 01400 (ndash00050085)
Weibull Model B Variance components mdash Response variable is Age-at-onset
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0452 0054 (03480566)
s 2C ndash0053 0027 (ndash01200019)
s 2Cs 0467 0040 (03930545)
g 43 0104 (4246)
Binomial Model Variance components mdash Response variable is Naevi
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0142 0149 (00020498)
s 2C 0704 0156 (04031010)
s 2Cs 0195 0157 (000250553)
Binomial Model Variance components mdash Response variable is Freckling
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 2050 0779 (08353570)
s 2C 2600 0418 (17803460)
s 2Cs 0115 0088 (00110312)
Note Indicates Significance at the 005 level) Naevi Is a Binary Variable with Baseline 0 = No or Few Moles Freckling Is Coded as a Binary Variable with Baseline0 = No Freckles
Table 3
GEE Approach Estimated Regression Coefficients in the Proportional Hazard Model and Estimated Odds Ratios for Quantifying FamilialAggregation in Age at Onset of Melanoma in Queensland Families ( Indicates Significance at the 05 Significance Level)
A Mean effects
Covariate RR = eb Coefficient b Robust se(b) Z-statistic
Year of birth 1142 0132 0051 2588
Naevi (Baseline = No or few moles) 1765 0568 0073 7781
Freckling (Baseline = No freckles) 1160 0148 0049 3020
B Pattterns of familial aggregation
Relationship 1 + q se(q) Z-statistic
Sibndashsib 2973 06217 317
Parentndashchild 1650 0434 150
SecondOthers 1155 0270 047
important role in the age at onset of melanoma butthat shared sibling environmental factors were notnegligible We focused attention on these approachesbecause they are more appropriate for modeling cor-related age at onset data and they allow the inclusionof covariates in the analyses Under both approachesthere were suggestions that earlier melanoma onsetwas influenced by later birth year having a moderatenumber of naevi and being freckly The principal dif-ference between the two approaches i s in theinterpretation of the regression coefficients The GEEmethod uses a marginal approach resulting in regres-sion coefficients that describe the average populationresponse to changing covariates whereas the Bayesianapproach produces subject-specific coefficients A sec-ondary distinction is in the nature of the within-pairdependence The GEE model only describes acommon covariance among specific relative pairswhereas the Bayesian approach can explicitly describethe source of this covariance A third advantage of theBayesian method is its flexibility in incorporatingprior information if available for the covariates orlatent effects by modifying their prior distributionsFurther the Bayesian method would permit a moreaccurate decomposition of the genetic variance intoadditive and dominant components thus providingthe means for a direct assessment of the no-domi-nance assumption Finally it is also interesting torecord the amount of CPU time required for eachmethod 20 seconds for the GEE approach andapproximately 3 hours (for binary traits) and 12hours (for age at onset outcome) to run BUGS on asingle-user Intel Pentium III 600 MHz personal com-puter with the Linux Mandrake 71 operating systemThe amount of human and financial resources dedi-cated to collecting maintaining and updating themelanoma family database is extremely high there-fore the extra CPU time requirement by the MCMCmethod is well worth the additional genetic informa-tion and flexibility that it provides A BUGS programis included in the Appendix
Acknowledgments
Data collection was supported by grants from theAustralian National Health and Medical ResearchCouncil (NHMRC) ID 930223 and ID 961061 USNational Cancer Institute (CA88363) Programmingresources were provided by start-up funds from theDepartment of Biostatistics at the M D AndersonCancer Center We acknowledge assistance withBUGS programming from Sang-Joon Lee
ReferencesAbel L amp Bonney G E (1990) A time-dependent
logistic hazard function for modeling variable age ofonset in the analysis of familial diseases GeneticEpidemiology 12 391ndash407
Aitken J F Bailey-Wilson J Green A C MacLennanR amp Martin N G (1998) Segregation analysis ofcutaneous melanoma in Queensland Genetic Epi-demiology 15 391ndash401
Aitken J F Green A C MacLennan R Youl P ampMartin N G (1996) The Queensland familialmelanoma project Study design and characteristics ofparticipants Melanoma Research 6 155ndash165
Allison P D (1995) Survival analysis using the SASsystem mdash A practical guide Cary NC SAS InstituteInc
Andersen P K Borgan O Gill R D amp Keiding N(1993) Statistical models based on count ingprocesses New York Springer-Verlag
Bernardo J M Berber J O David A P amp Smith F FM (Eds) (1992) Bayesian statistics 4 OxfordClarendon Press
Bonney G E (1986) Regressive logistic model for famil-ial disease and other binary traits Biometrics 42611ndash625
Breiman L Friedman J H Olshen R A amp Stone CJ (1984) Classif ication and regression treesWadsworth California
Cannings C amp Thompson E A (1977) Ascertainmentin the sequential sampling of pedigrees ClinicalGenetics 12 208ndash212
Clayton D G (1978) A model for association in bivari-ate life-tables and its application in epidemiologicalstudies of chronic disease incidence Biometrika 65141ndash151
Clayton D G amp Cuziak J (1985) Multivariate gener-alizations of the proportional hazards model (withdiscussion) Journal of the Royal Statistical SocietySeries A 148 82ndash117
Cox D R (1972) Regression models and life tables(with discussion) Journal of the Royal StatisticalSociety Series B 34 187ndash202
Do K-A Broom B M Kuhnert P Duffy D LTodorov A A Treloar S A amp Martin N G(2000) Genetic analysis of the age at menopause byusing estimating equations and Bayesian randomeffects models Statistics in Medicine 19 1217ndash1235
English D R Armstrong B K Kricker A et al(1997) Sunlight and cancer Cancer Causes Control8 271ndash283
Falconer D S (1990) Introduction to quantitativegenetics (3rd ed) New York Longman Group Ltd
Faucett C Gauderman W J Thomas D Ziogas Aamp Sobel E (1993) Combined segregation andlinkage analysis of late-onset Alzheimerrsquos disease inDuke famil ies us ing Gibbs sampling GeneticEpidemiology 10 489ndash494
GAP (1996) Genetic Analysis Package Release 10Pasadena CA Epicenter Software
110 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
111Twin Research February 2004
Analysis of Melanoma Onset
Gauderman W J Morrison J L Carpenter C L ampThomas D C (1997) Analysis of gene-smokinginteraction in lung cancer Genetic Epidemiology 14199ndash214
Gauderman W J amp Thomas D C (1994) Censoredsurvival models for genetic epidemiology A Gibbssampling approach Genetic Epidemiology 11171ndash188
Gelfand A E amp Smith A F M (1990) Sampling-based approaches to calculating marginal densitiesJournal of the American Statistical Association 85398ndash409
Gelman A amp Rubin D B (1992) Inference from itera-tive simulation using mult iple sequences (withdiscussion) Statistical Science 7 457ndash511
Geman S amp Geman D (1984) Stochastic relaxationGibbs distributions and the Bayesian restoration ofimages IEEE Trans Pattern Anal Mach Intelligence6 721ndash741
Geweke J (1992) Evaluating the accuracy of sampling-based approaches to calculating posterior momentsIn J M Bernardo J O Berber A P David amp F FM Smith (Eds) Bayesian statistics 4 OxfordClarendon Press
Gilks W R Thomas A amp Spiegelhalter D J (1994)A language and program for complex Bayesian mod-eling Statistica 43 169ndash178
Guo S W amp Thompson E A (1991) Monte CarloEM for the estimation of multiple random effectsmodels on large pedigrees IMA Journal ofMathematics Applied in Medicine and Biology 8171ndash189
Guo S W amp Thompson E A (1992) A Monte Carlomethod for combined segregation and linkage analy-sis American Journal of Human Genetics 511111ndash1126
Guo S W amp Thompson E A (1994) Monte Carloestimation of mixed models for large complex pedi-grees Biometrics 50 417ndash432
Hastings W K (1970) Monte Carlo sampling methodsusing Markov chains and their applicationsBiometrika 57 97ndash109
Heath A C (1997) Markov Chain Monte Carlo segre-gation and linkage analysis for oligogenic modelsAmerican Journal of Human Genetics 61 748ndash760
Hougaard P (1986) A class of multivariate failure timedistributions Biometrika 73 671ndash678
Hsu L amp Prentice R L (1996) On assessing thestrength of dependency between failure time variatesBiometrika 83 491ndash506
Hsu L amp Zhao L P (1996) Assessing familial aggre-gation of age at onset by using estimating equationswith application to breast cancer American Journalof Human Genetics 58 1057ndash1071
Kempthorne O (1960) Biometrical genetics New YorkPergammon Press
Kendall M amp Stuart A (1979) The advanced theoryof statistics (Vol 2 4th ed) Oxford OxfordUniversity Press
Kong K Frigge M Cox N amp Wong W H (1992)Linkage analysis with adjustments for covariates Amethod combining peeling with Gibbs samplingCytogenetics and Cell Genetics 59 208ndash210
Kuhnert P amp Do K-A (2003) Fitting genetic models totwin data with binary and ordered categoricalresponses A comparison of structural equation mod-eling and Bayesian hierarchical models BehaviorGenetics 33 439ndash452
Lange K amp Sobel E (1991) A random walk methodfor computing genetic location scores AmericanJournal of Human Genetics 49 1320ndash1334
LeBlanc M amp Crowley J (1992) Relative risk trees forcensored survival data Biometrics 48 411ndash425
Li H amp Thompson E A (1997) Semiparametric esti-mation of major gene and random environmentaleffects for age of onset Biometrics 53 282ndash293
Li H amp Wijsman E A (1998) Semiparametric estima-tion of major gene effects for age of onset GeneticEpidemiology 15 279ndash298
MacLennan R Green A C McLeod G R amp MartinN G (1992) Increasing incidence of cutaneousmelanoma in Queensland Australia Journal of theNational Cancer Institute 84 1424ndash1432
Metropolis N Rosenbluth A W Rosenbluth M NTeller A H amp Teller E (1953) Equation of statecalculations by fast computing machines Journal ofChemical Physics 21 1087ndash1092
Nielsen G G Gill R D Andersen P K amp SorensonT I A (1992) A counting process approach tomaximum likelihood estimation in frailty modelsScandinavian Journal of Statistics 19 25ndash43
Oakes D (1989) Bivariate survival models induced byfrailties Journal of American Statistical Association84 487ndash493
Raftery A L Lewis S (1992a) Comment One longrun with diagnostics Implementation strategies forMarkov Chain Monte Carlo Statistical Science 7493ndash497
Raftery A L amp Lewis S (1992b) How many interac-tions in Gibbs sampler (pp 763ndash774) In J MBernardo J O Berber A P David amp F F M Smith(Eds) Bayesian statistics 4 Oxford Clarendon Press
SAGE (1992) Statistical Analysis for GeneticEpidemiology Release 21 Cleveland OH Departmentof Epidemiology and Biostatistics RammelkampCenter for Education and Research MetroHealthCampus Case Western Reserve University
Scurrah K J Palmer L J amp Burton P R (2000)Variance components analysis for pedigree-based cen-sored survival data unusing generalized linear mixedmodels (GLMMS) and Gibbs sampling in BUGSGenetic Epidemiology 19 127ndash148
112 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
Siegmund K D amp McKnight B (1998) Modelinghazard functions in families Genetic Epidemiology15 147ndash171
Spiegelhalter D J Thomas A amp Best N G (1996a)Computation on Bayesian graphical models (pp407ndash425) Oxford Oxford University Press
Spiegelhalter D J Thomas A Best N G amp Gilks W(1996b) BUGS 06 ndashBayesian InferenceUsing GibbsSampling manual Cambridge MRC Biostatistics Unit
Spiegelhalter D J Thomas A Best N G Lunn D(2003) WinBUGS (Bayesian Inference Using GibbsSampling) Windows version 14 user manualCambridge MRC Biostatistics Unit
Siskind V Aitken J Green A amp Martin N (2002)Sun exposure and interaction with family history inrisk of melanoma Queensland Australia Inter-national Journal of Cancer 97 90ndash95
Swerdlow A amp Green A (1987) Melanocytic naeviand melanoma An epidemiological perspectiveBritish Journal of Dermatology 117 137ndash146
Therneau T M amp Atkinson E (1997) An introductionto recursive partitioning using the RPART routines(Technical report) Rochester Mayo Foundation
Thomas D (1992) Fitting genetic data using Gibbs sam-pling An application to nevus counts in 38 Utahkindreds Cytogenetics and Cell Genetics 59 228ndash230
Thompson E A (1994) Monte Carlo programs for pedigree analysis (Technical Report 267)Washington Dept of Statist ics University ofWashington
Vaupel J M Manton K G amp Stallard E (1979) Theimpact of heterogeneity in individual frailty on thedynamics of mortality Demography 16 439ndash454
113Twin Research February 2004
Analysis of Melanoma Onset
Appendix A A BUGS Program to Implement the Bayesian MCMC approach in Modeling an ACEModel for Age at onset of Melanoma with Covariate Effects and Random GeneticsEnvironmental Effects
101Twin Research February 2004
Analysis of Melanoma Onset
cumulative sun exposure up to the age of 19 yearsand birth year
The second stage fitted multivariable proportionalhazard models to those explanatory variables thatwere significant at the univariable stage Table 1 dis-plays the results from the final model which onlyincluded significant variables (p-value lt 05) It isworth noting that a similar result could be obtainedusing an automated stepwise procedure
Results from this analysis highlight some interest-ing but quite obvious risk factors noted in previousanalyses (Aitken et al 1996 1998) For example
An increase of 1 year in birth year induces 17increase in risk of earlier melanoma onset
People with neither freckles nor naevi have thelowest risk of melanoma onset
The risk of earlier melanoma onset is increased byup to 37 for blue eyed people and even further(46) for green eyed people when compared toindividuals with brown eyes
ldquoRed Headsrdquo have an increased risk of earliermelanoma onset (46) when compared to indi-viduals with black hair However no significantincrease was noted for individuals with fair orlight red hair
A personrsquos tendency to burn easily increases therisk of earlier melanoma onset in some cases byup to 100 compared to those that never burn
However striking this last statement is issues of con-founding must also be considered The most obviousillustration of this is the confounding that occurredbetween mole count and freckling This is seenthrough close inspection of the parameter estimateswhich changed in magnitude when mole count wasadded to the model after adjusting for freckling(Results not displayed here)
To reduce the dimension of the problem furtherand avoid some of these confounding issues a sur-vival tree was constructed using RPART (RecursivePartitioning and Regression Trees) see Therneau andAtkinson (1997) Survival trees are a special case of
Table 1
Results from Fitting a Multivariable Proportional Hazards Model to the Melanoma Data Based on Univariable Results The Results Reported in thisTable Are the Parameter Estimates b Their Standard Errors se(b) the Relative Risk eb and the p-value for Each Estimate
Variable b se(b) eb p-value
Birth Year 016 lt 001 117 lt 05
Eye Colour (Baseline Brown)
BlueGrey 031 007 136 lt 05
GreenHazel 038 007 146 lt05
Hair Colour (Baseline Black)
Light RedGinger 017 015 119 27
Dark RedAuburn 038 015 146 lt 05
FairBlonde 006 012 106 62
Light Brown 014 012 115 22
Dark Brown 002 012 102 87
Skin Type (Baseline never burn)
Always burn 069 016 1968 lt 05
Usually burn 045 015 157 lt 05
Sometimes burn 031 015 136 lt 05
Freckling (Baseline none)
1 to 100 017 006 118 lt 05
gt 100 009 008 110 23
Mole Count (Baseline none)
Few 029 007 134 lt 05
Moderate 079 008 220 lt 05
Many 112 010 308 lt 05
Number of Sunburns (Baseline none)
One ndash007 011 093 49
2 to 5 ndash006 009 094 50
gt 6 017 009 119 07
Cumulative Sun Exposure ( lt 5 yrs) 004 001 104 lt 05
UV Exposure (5ndash12 yrs) 00003 lt 00001 1 lt 05
102 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
decision trees that were incorporated into RPART byTherneau (1997) using the ideas put forth by LeBlancand Crowley (1992) for survival data with censoringThese authors showed through simulation studies thatsurvival trees could outperform standard parametricmethods such as proportional hazards modelling particularly in situations where the underlying distrib-ution was not exponential Along with Breiman et al(1984) they showed decision trees to be usefulexploratory tools for identifying important variablesinteractions and outliers
The methodology for survival trees begins by iden-tifying optimal splits using the log rank statistic toseparate the data into homogeneous groups Eachsplit is comprised of a parent node and two daughternodes which are linked to the parent by branchesFigure 1 is an example of a survival tree produced onthe melanoma data Once a large tree is grown withmany terminal nodes that contain very few observa-tions a pruning procedure is introduced to identify asequence of sub-trees The technique of pruninginvolves snipping back splits of the tree one at a timeuntil only the root node remains Cross-validation isthen introduced to aid in the selection of the optimalmodel One nice feature of survival trees is the use ofKaplan-Meier curves to provide information aboutsurvival rates at each terminal node of the tree Thisapproach may allow for better interpretation of theterminal nodes and highlight different scenarioswhich yield similar survival rates as can be seen inFigure 1
The variables identified from the multivariablemodel were used as input into RPART Variables suchas the sun and UV exposures were omitted from themodelling stage since nearly half of the data for eachof these variables were missing (cumulative sun expo-sure 438 UV exposure 404) A large survivaltree was produced in SPLUS and using cross-valida-tion a model splitting solely on birth year wasidentified This survival tree yielded the minimumcross-validated error rate (093) suggesting that birthyear was an important indicator for melanomaSelecting a slightly larger but more informative model(error = 094) resulted in a survival tree consisting ofnine terminal nodes This revealed splits on birth yearmole count and freckling Al l other variablesappeared to be either competing at each node oracting as a surrogate variable for one of these primarysplits Skin type in particular arose as an importantsurrogate for freckling
In Figure 1 we aim to provide a more visual inter-pretation to the results from the survival analysisEach split is shown at the top of each node and canbe assumed to be the split that directs observationstowards the left side of the tree For example aftersplitting on birth year at 1933 a split to the left cor-responds to an individual with few moles A split tothe right indicates an individual with many moles andso on The terminal nodes in this figure are displayed
as Kaplan-Meier plots showing the survival curvesalong with the number of observations n and the riskratio calculated with reference to the baseline groupof 9669 individuals The reference survival curve isshown at the root node along with a bar which showsthe percentage of events in the entire dataset Gridsare placed on each survival plot at equal intervals of20 years of age These grids have been plotted to sim-plify the interpretation An indicator to the left of theplot displays the risk ratio Ratios above one areshown by an ldquoup arrowrdquo while a decrease in risk isillustrated by a ldquodown arrowrdquo From this model wecan see that there are a few scenarios that indicatehigh risk for earlier onset of melanoma These scenar-ios may be described as
individuals born after 1966 with many moles (RR = 113)
individuals born between 1947 and 1966 withmany moles (RR = 45)
individuals born between 1933 and 1953 with fewmoles but many freckles (RR = 24)
individuals born between 1923 and 1933 withmany moles (RR = 141)
It is obvious from these results that birth year has asubstantial impact on the age-at-onset of melanomaOnce this is taken into account mole count and freck-ling only provide a small contribution to the risk
Family History of Melanoma
As described earlier in the Data section family historywas collected regarding first-degree (siblings andparents) and second-degree relatives From this pedi-gree structure other higher-order types of relative pairscould also be formed Some of these relatives were dis-eased with melanoma resulting in pairs of relativeswho both may be diseased (++) both not diseased (mdash)and one diseased while the other was not diseased (+ndash)Table 2 lists the concordant and discordant pairs ofspecific relationships sibndashsib parentndashchild andsecond-degreelower order relative pairs The second-degree relative pairs include grandparentndashgrandchildand auntndashniece while the lower order relative pairsinclude the in-law pairs From Table 2 the percent-ages of both diseased pairs are 06 09 and 04among sibndashsib parentndashchild and secondlower orderpairs respectively Crude estimates of correlationcoefficients can be calculated from these percentageswithout accounting for ages at onset among these rel-atives However the risk of developing melanomamay depend on the subjectrsquos age Hence adjusting forthe age at onset is essential in quantifying the correla-tion of age at onset between pairs of relatives In asubsequent section we describe how this can be donerigorously via the generalized estimating equationsapproach On average melanomas were diagnosedslightly earlier in relatives (475 years) than inprobands (502 years) Among relatives melanomaswere diagnosed at younger ages in later generations
103Twin Research February 2004
Analysis of Melanoma Onset
RR
= 1
00
Bas
elin
e
Fig
ure
1
Surv
ival
tree
for t
he m
elan
oma
data
con
sist
ing
of 9
669
indi
vidu
als
from
191
2 Qu
eens
land
fam
ilies
Ter
min
al n
odes
are
rect
angu
lar a
nd d
ispl
ay th
e Ka
plan
-Mei
er c
urve
for t
he c
orre
spon
ding
sub
grou
p
104 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
To account for the different ages at censoring in eachgeneration due to termination of the study or deathfrom causes other than melanoma we examined thedisease-free survival distribution for each generationusing the standard failure-time analysis technique(Figure 2) The median age at diagnosis of melanomawas 64 among parents of probands 50 among sib-lings of probands and 33 among children ofprobands The disease-free survival functions differedsignificantly between the children generation fromearlier generations (log-rank test with p lt 01) butthere was no significant differences in age-at-onset of
melanoma between the siblings and parents ofprobands (log rank test with p gt 9)
Preliminary Segregation Analysis
A preliminary segregation analysis was conducted(Aitken et al 1998) to investigate whether the famil-ial clustering of cutaneous melanoma is consistentwith Mendelian inheritance of a major autosomalgene Analyses were performed with the SAGE (1992)statistical package using the maximum likelihoodREGTL program for a binary trait with a variable ageof onset The hypothesis of co-dominant Mendelianinheritance gave a significantly better fit to the datathan either dominant or recessive Mendelian inheri-tance Overall both Mendelian inheritance of a singlemajor gene and purely environmental transmissionwere rejected However there was strong evidence offamilial dependence in melanoma occurrence
The inclusion of risk factors in the models mayreveal whether all or a combination of these explainsthe familial dependence that was demonstrated by thesegregation analysis If a major gene exists that oper-ates independently on these covariates includingthem in the models may reduce residual variation andincrease the power of the analysis to detect a majorgene effect
Estimating Equations Approach
Let y = (dki tki Zki) denote the data collected for the ith
member in the kth family (k = 1hellipK and i = 12)where dki = 0 if the observation is censored tki is eitherthe recorded age at diagnosis of melanoma or the ageat the most recent follow-up for unaffected peopleand Zki is a vector of measured covariates We assumethat censoring time age at diagnosis of melanomaand the covariates are independently distributedThese assumptions can be relaxed in more generalmodels subject to identification constraints Thehazard rate for melanoma is the instantaneous proba-bility that melanoma is diagnosed immediately aftertime t given that the person is unaffected at time tThe hazard rate under the Cox proportional hazardsmodel (Cox 1972) is given by
l(tki) = l0(tki) exp(brsquoZki)
Table 2
Concordant and Discordant Pairs of Relatives in 1912 Families from the Queensland Familial Melanoma Project Probands Were Not Included forthe Calculation of Concordance
Sibndashsib Parentndashchild SecondOthers Total
++ 49 15 41 105
+ndash 763 536 1200 2499
mdash 7011 1078 9817 17906
Total 7823 1629 11058 20510
Figure 2
Cumulative melanoma-free survival among first-degree relatives ofconfirmed cutaneous melanoma cases (probands) diagnosed inQueensland Australia 1982ndash1987 according to the relativersquos rela-tionship to the proband
where l0() is the baseline hazard function and b is avector of regression coefficients
For a specific pair of relatives we follow Clayton(1978) in modeling the bivariate survivor function
F(tk1 tk2) = (F1(tk1)ndashq + F2(tk2)
ndashq ndash1)ndash1q
where F1 and F2 are univariable survivor functions qis a scalar parameter that measures the degree ofdependence between the relativesrsquo times at onsetindependence being implied by q = 0 and positiveassociation by q gt 0 The Clayton model allows nega-tive dependencies and has the property that failuretimes are absolutely continuous for q gt ndash05 In addi-tion the cross-ratio (or odds-ratio) function asstudied by Oakes (1989) is
c(tk1 tk2) = l(tk1 frac12 Tk2 = tk2) l(tk1frac12Tk2 sup3 tk2) =1 + q
This is equivalent to assuming that the odds-ratio isinvariant over the grid region that supports the dataHeuristically the parameter 1 + q is an odds-ratiothat depends on the degree of dependence betweenthe onset ages of the two relatives If genetic factorsdo influence the age at onset of melanoma we wouldexpect to see a higher concordance in the age of onsetin first degree relatives who on average share halftheir genes in common in comparison to seconddegree relatives Under the current model this trans-lates as qfirst-degree gt qsecond-degree
We may use a standard method to estimate withinpair correlations for 2 acute 2 tables from odds ratiosEstimates of relation-pair correlations psibndashsib pparentndashchildand psecond-order are recovered from using the relation-ships for example r sibndashsib = min(1ln(1 + q sibndashsib))Testing for the presence of genetic factors underlyingthe age at diagnosis of melanoma is equivalent totesting H0 p first-order = psecond-order We may test thishypothesis using a z-transform (Kendall 1979 p 315) of the point estimates of the correlation coef-ficients Let n1 and n2 denote the number of first orderand second order relatives let z1and z2 denote thetransformed statistics of r1and r2 the correlation esti-mates for first-order and second-order relative pairsrespectively Specifically we reject H0 when ED gtZ1ndasha where
E = E(z1 ndash z2) = mdash12mdash log 31mdash11
+ndash r
r
1
1mdash2 1mdash11
+ndash r
r2
2
mdash24D2 = V(z1 ndash z2) = mdash
(n1
1ndash 3)mdash+ mdash
(n2
1ndash 3)mdash
and Z1ndasha is the standard normal deviate correspondingto the one-sided a significance level
This approach has the advantage of providing atest for the presence of genetic effects through a singleparameter (q) However it is limited in its ability to
attribute the phenotypic variance to specific effects(eg additive gene action)
Mathematical details of the GEE model and theiterative procedure to estimate the regression coeffi-cients b and specific degrees of dependence q for thedifferent types of relative pairs have been summarizedpreviously in Do et al (2000)
MCMC Analysis Using BUGSThe Bayesian Paradigm and Gibbs Sampling
Markov Chain Monte Carlo (MCMC) is an alterna-tive Bayesian approach that provides estimates oflikelihoods and associated parameter values whenexact computation is infeasible (Hastings 1970Metropolis et al 1953) MCMC methods can beused to draw samples from the underlying joint distri-bution of major genotypes and polygenic valuesconditional on the observed data From thesesamples desired parameters and likelihoods can beestimated without the need to resort to exact compu-tation MCMC methods have been used for linkageanalysis (Kong et al 1992 Lange amp Sobel 1991)for estimation of parameters in the mixed model withand without covariates (Guo amp Thompson 1991Thomas 1992) for estimation of genendashsmoking inter-action and covariate imputation (Gauderman et al1997) for performing combined linkage and segrega-tion analysis (Faucett et al 1993 Guo amp Thompson1992) and for mixed models of large complex pedi-grees (Guo amp Thompson 1994)
In a general setting let y be the observed data andq be everything not observed including parametersand latent variables The implementation of Bayesianmethods using realistic models and priors is com-puter-intensive and relies heavily on cunningcomputational tools to approximate integrals Theproblem in general terms is to obtain the expectedvalue of a function of interest s () under the posteriordensity p(qfrac12x)
E[s(q)] =
which cannot generally be found analytically Onemethod to carry out the integration on the RHS is toperform simulation of exact Bayesian posterior distri-butions using Markov chain Monte Carlo techniquessuch as Gibbs sampling The Gibbs sampler (Gemanamp Geman 1984) is the most popular algorithm usedin MCMC applications to correlated data Gibbssampling was introduced to the main statistical com-munity by Gelfand and Smith (1990) and has sincebeen applied to an even wider array of problems TheGibbs sampler is easy to implement because it onlydepends on the local neighborhood structure In thecontext of pedigree analysis (Olshen amp Wijsman1996) the basic procedure is a sequential updating ofmissing and latent data including the underlying andunobserved major genotypes polygenic effects and
EQ s(q) p(q) p(xfrac12q) dqmdash mdash mdash
EQ p(q) p(xfrac12q) dq
105Twin Research February 2004
Analysis of Melanoma Onset
106 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
environmental effects Values for the missing or latentdata are sampled from the local conditional distribu-tion a function of the observed individual data thecurrent sampled values of other missinglatent data forthis particular individual such as polygenic and environ-mental effects and the values for the sampled geneticeffects in the immediate neighbors of an individualGibbs sampling basically consists of three main steps
Step 1 Setting initial values for unobserved quanti-ties (parameters and latent variables)
Step 2 For each parameter or latent variable qj sample from its ldquofull conditional distributionrdquogiven the current values of all other quantities inthe model
Step 3 Examine sampled values of parameters andlatent variables to monitor convergence and toprovide summary measures
Some of the most recent and popular packages thatimplement Gibbs sampling for analysis of pedigreedata include BUGS (Gilks et al 1994 Spiegelhalter et al 1996ab) Genetic Analysis Package (GAP1996) and MIXD (Olshen amp Wijsman 1996Thompson 1994) We have used BUGS mainlybecause of its flexibility in programming hierarchicalmodels besides being a freeware product
The Model
The aim here is to model the correlation structurewithin the family structure to satisfy the fundamentaladditive genetic model (Crow amp Kimura 1970Falconer 1990 Kempthorne 1960) as followsConsider a nuclear family structure consisting of 4members father mother and two children denoted byF M S1 S2 respectively Using similar notation as inBurton et al (1999) a conventional mixed linearmodel consisting of fixed and random effects may bewritten in the form
Qij = bTz + Aij +Cij + CSij+ Eij
where Qij is the observed value of a normally distrib-uted continuous trait for the jth individual in the ith
nuclear family zij is a vector of observed covariatesrepresenting fixed effects and b is a correspondingvector of unknown fixed regression coefficients Aij Cij and CSij
A denote random effects that repre-sent additive polygenic common family environmentand common sibling environment effects respectivelyThe variation in an individual response is representedby a composite covariance matrix VT and is the sumof an additive genetic covariance matrix VA acommon family environment matrix VC a sharedsibling environment matrix VCs and residual environ-mental effects The different variance components are
The overall total covariance matrix is
The components of variance s 2A s 2
C s 2Cs need not be
positive as long as VT is positive definite A negativevalue for s 2
Cs simply implies that the realized covari-ance between sib lings is less than the realized
107Twin Research February 2004
Analysis of Melanoma Onset
covariance between parent and a child A negativevalue for s 2
Cs may suggest dominance To optimizeconvergence in BUGS Model (1) may be reparameter-ized as
where Fi Gi H i are independent additive randomeffects or latent variables and are the residual errorterms for parents and children respectively If we model
Fi ~ N(0 mdash12mdash s 2
A +s 2C)
Gi ~ N(0 mdash12mdash s 2
A) Hi ~ N(0 s 2Cs)
then the basic genetics covariance model (additivegenetic common environment and unique environ-ment for this particular four-member family structureis satisfied
In survival models unobserved or unmeasuredexplanatory variables some of which may be geneticare often referred to as frailties The frailties takevalues restricted to the posit ive line and may beassumed to act multiplicatively on the hazardExtending the above model to correlated family datawith time-to-onset endpoint a multiplicative individ-ual heterogeneity or frailty term representing thelatent genetic and common environment variablesmay be modeled as random effects simultaneouslywith the effects associated with observed covariatesConsider right censored time to onset of melanomadata (Tij dij zij)1 j n from n relative pairs hereT i j denotes the true age at onset of the jth familymember or the censored time depending on whetherdij =1 or 0 respectively and z denotes a p acute 1 vector ofcovariates A Weibull distribution may be used tomodel time to failure as
f(ti zi) = ebrsquoz igt igndash1 exp(ndashebrsquoz it g
i )
where b is a vector of unknown regression coefficientsand g is the shape parameter of the Weibull distribu-tion This leads to a baseline hazard of the form
l0(ti) = gt igndash1
Re-parameterize by letting mi = ebrsquoz i the conditionaldistribution of ti given mI is then Weibull (gmi) We for-mulated a mixed model to represent the conditionaldistribution of tij given covariate effects random addi-tive genetic and common environment effects as
tijfrac12mij ~ Weibull (g mij) i = 1hellipn j = 12
where
The regression coefficients and the precision of therandom effects (tG tF tH) were given ldquonon-informa-tiverdquo Normal and Gamma priors respectively The shapeparameter g of the time to onset of melanoma distribu-tion was also given a non-informative Gamma priorwhich was slowly decreasing on the positive real line
The overall model (including random effects) maybe described in a Bayesian graph (Figure 3) whichsimplifies sampling from full conditional distributionsby exploiting part ial independence propert ies(Spiegelhalter et al 1996a) In this graph eachrandom quantity is represented by a node which maybe connected by directed or undirected linksConditional independence assumptions are repre-sented by the absence of such links
We implemented the Gibbs sampler using theBUGS program (Gilks et al 1994) (code in theAppendix) Imputation of missing data was handlednaturally in the Gibbs sampling framework by treatingmissing values as additional unknown quantities andrandomly sampling values from their full conditionaldistributions We chose simple prior distributions forimputation since the number of missing values forcovariates was not large (complete for birth year 16missing for naevi and 21 missing for freckles) andthere was no indication of non-random missingness inour data Therefore imputation for missing naevi andfreckles covariates were based on Bernouilli prior dis-tributions with respective parameter values estimatedfrom the complete observations We performed aninitial 10000 burn-in iterations followed by an addi-tional 20000 Parameter estimates were the mean andstandard deviation (SD) of all post convergence Gibbssamples with a thinning interval between 20 and 50credible intervals were computed as the lower andupper a2 percentiles from the last 20000 iterationsConvergence to the posterior distribution was con-firmed by using the different criteria provided by theadd-on CODA package including those of Gelmanand Rubin (1992) Geweke (1992) and Raftery andLewis (1992a 1992b)
ResultsGEE Approach
An inspection of residual plots following preliminarymodel-fitting provided no evidence for the failure ofproportional hazards assumption and did not detectinfluential observations We then proceeded to applythe GEE approach that could estimate regressioncoefficients while incorporating a dependence struc-ture between relative pairs The results are displayed
108 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
in Table 3 which suggest that later birth year havingat least a moderate number of naevi and freckles wereall simultaneously associated with later age at onsetof melanoma Table 3 also presents the estimatedodds-ratios for quantifying the correlation betweenpaired relatives of specific relationships
The odds ratio for sibndashsib pairs is 2973 which issignificantly different from 1 (p lt 001) The odds ratiofor parentndashchild pairs is 1650 which is slightly greaterthan 1 indicating a mild dependency between thesepairs although they are not quite statistically signifi-cant The odds ratio for second-degree and higherrelative pairs is 1155 indicating no dependence at allbetween these pairs This pattern of familial aggrega-tion is compatible with dominance variance as well asadditive genetic variance (Falconer 1990)
Bayesian Approach
We re-analyzed the age-at-onset of melanoma by usingGibbs sampling to impute missing covariates and toestimate subject-specific covariate effects randomadditive genetic common family environment andshared sibling environment effects on the log scale Theestimated shape parameter g was 43 with a 95 credi-ble interval (CI) of (42 46) The results aresummarized in Table 4 (Model A) The residual plotsdid not indicate a gross departure from the underlyingWeibull model and revealed no influential observa-tions We checked the sensitivity of the analyses to
initial parameter values by re-running the Gibbssampler five more times using different starting valuesThe resulting estimates did not differ by more than 5from the values reported here The mean estimate for(s 2
A) was 0452 with 95 CI = (0348 0566) for (s 2C)
was ndash0053 with 95 CI = (ndash0120 0019) and for(s 2
Cs) was 0467 with 95 CI = (0393 0545)A small negative value for the common family
effect suggests that there may be a dominant effect orthat the current model is not quite appropriate forexample that there are systematic effects or correla-tion structures that have not been accounted for Theresults here indicate that additive genetics seem toimpact equally on the variation of the age at onset ofmelanoma Further exploration for alternative modelsis a focus of our future research
In addition we investigated the relative contribu-tion of genetic and environmental effects on theexpression of naevi (Model B) and freckles (ModelC) which are known risk factors for melanoma Theexpressions of naevi and freckles were coded asbinary variables (none or few moles versus moderateor many moles and no freckles versus one or morefreckles) A hierarchical Bayesian binomial model wasfitted to estimate the random variance componentsThe results in Table 4 indicate that a common familyenvironment effect contributed the most to theexpression of naevi (s 2
C = 0704) (relative to the con-tributions of additive genetic effect (s 2
A = 0142) and
Individual j in family i
Figure 3
Graphical model of covariate and random family effects for an individual in a nuclear family structure tij represents the observed failure time forthe jth individual in the ith family with tcenij being an indicator variable of censoring status Full arrows indicate stochastic links to which a probabil-ity is attached broken arrows denote deterministic relationships bs are regression coefficients T is the precision of the prior distribution andequals the inverse of the variance Fi Gi Hi are random effects modeled as Fi ~ N(012 s 2
A + s 2C) Gi ~ N(0 12 s 2
A) and Hi ~ N(0 s 2Cs) Rectangles rep-
resent actual data values for the covariates g and mij are shape and scale parameters for the underlying Weibull distribution
109Twin Research February 2004
Analysis of Melanoma Onset
of shared sibling effect(s 2Cs = 0142) both of which
were non negligible In contrast variation in theexpression of freckles was largely explained by addi-tive genetic and shared family effects(s 2
A = 2050 s 2C =
2600) compared to a relatively small shared siblingeffect(s 2
C = 0115)
Discussion
We applied two methods mdash generalized estimatingequat ion and Bayesian analys is mdash to the geneticanalysis of age at onset of melanoma based on anuclear family structure Under both approaches theresults suggest that additive genetic factors played an
Table 4
Gibbs Sampling Approach Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families
Weibull Model A Mean effects mdash Response variable is Age-at-onset
Covariate RR = eb Coefficient b Robust se(b) 95 CI of bYear of birth 1378 0321 00027 (03160326)
Naevi 1126 0119 00021 (00580185)
Freckling 1017 0017 01400 (ndash00050085)
Weibull Model B Variance components mdash Response variable is Age-at-onset
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0452 0054 (03480566)
s 2C ndash0053 0027 (ndash01200019)
s 2Cs 0467 0040 (03930545)
g 43 0104 (4246)
Binomial Model Variance components mdash Response variable is Naevi
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0142 0149 (00020498)
s 2C 0704 0156 (04031010)
s 2Cs 0195 0157 (000250553)
Binomial Model Variance components mdash Response variable is Freckling
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 2050 0779 (08353570)
s 2C 2600 0418 (17803460)
s 2Cs 0115 0088 (00110312)
Note Indicates Significance at the 005 level) Naevi Is a Binary Variable with Baseline 0 = No or Few Moles Freckling Is Coded as a Binary Variable with Baseline0 = No Freckles
Table 3
GEE Approach Estimated Regression Coefficients in the Proportional Hazard Model and Estimated Odds Ratios for Quantifying FamilialAggregation in Age at Onset of Melanoma in Queensland Families ( Indicates Significance at the 05 Significance Level)
A Mean effects
Covariate RR = eb Coefficient b Robust se(b) Z-statistic
Year of birth 1142 0132 0051 2588
Naevi (Baseline = No or few moles) 1765 0568 0073 7781
Freckling (Baseline = No freckles) 1160 0148 0049 3020
B Pattterns of familial aggregation
Relationship 1 + q se(q) Z-statistic
Sibndashsib 2973 06217 317
Parentndashchild 1650 0434 150
SecondOthers 1155 0270 047
important role in the age at onset of melanoma butthat shared sibling environmental factors were notnegligible We focused attention on these approachesbecause they are more appropriate for modeling cor-related age at onset data and they allow the inclusionof covariates in the analyses Under both approachesthere were suggestions that earlier melanoma onsetwas influenced by later birth year having a moderatenumber of naevi and being freckly The principal dif-ference between the two approaches i s in theinterpretation of the regression coefficients The GEEmethod uses a marginal approach resulting in regres-sion coefficients that describe the average populationresponse to changing covariates whereas the Bayesianapproach produces subject-specific coefficients A sec-ondary distinction is in the nature of the within-pairdependence The GEE model only describes acommon covariance among specific relative pairswhereas the Bayesian approach can explicitly describethe source of this covariance A third advantage of theBayesian method is its flexibility in incorporatingprior information if available for the covariates orlatent effects by modifying their prior distributionsFurther the Bayesian method would permit a moreaccurate decomposition of the genetic variance intoadditive and dominant components thus providingthe means for a direct assessment of the no-domi-nance assumption Finally it is also interesting torecord the amount of CPU time required for eachmethod 20 seconds for the GEE approach andapproximately 3 hours (for binary traits) and 12hours (for age at onset outcome) to run BUGS on asingle-user Intel Pentium III 600 MHz personal com-puter with the Linux Mandrake 71 operating systemThe amount of human and financial resources dedi-cated to collecting maintaining and updating themelanoma family database is extremely high there-fore the extra CPU time requirement by the MCMCmethod is well worth the additional genetic informa-tion and flexibility that it provides A BUGS programis included in the Appendix
Acknowledgments
Data collection was supported by grants from theAustralian National Health and Medical ResearchCouncil (NHMRC) ID 930223 and ID 961061 USNational Cancer Institute (CA88363) Programmingresources were provided by start-up funds from theDepartment of Biostatistics at the M D AndersonCancer Center We acknowledge assistance withBUGS programming from Sang-Joon Lee
ReferencesAbel L amp Bonney G E (1990) A time-dependent
logistic hazard function for modeling variable age ofonset in the analysis of familial diseases GeneticEpidemiology 12 391ndash407
Aitken J F Bailey-Wilson J Green A C MacLennanR amp Martin N G (1998) Segregation analysis ofcutaneous melanoma in Queensland Genetic Epi-demiology 15 391ndash401
Aitken J F Green A C MacLennan R Youl P ampMartin N G (1996) The Queensland familialmelanoma project Study design and characteristics ofparticipants Melanoma Research 6 155ndash165
Allison P D (1995) Survival analysis using the SASsystem mdash A practical guide Cary NC SAS InstituteInc
Andersen P K Borgan O Gill R D amp Keiding N(1993) Statistical models based on count ingprocesses New York Springer-Verlag
Bernardo J M Berber J O David A P amp Smith F FM (Eds) (1992) Bayesian statistics 4 OxfordClarendon Press
Bonney G E (1986) Regressive logistic model for famil-ial disease and other binary traits Biometrics 42611ndash625
Breiman L Friedman J H Olshen R A amp Stone CJ (1984) Classif ication and regression treesWadsworth California
Cannings C amp Thompson E A (1977) Ascertainmentin the sequential sampling of pedigrees ClinicalGenetics 12 208ndash212
Clayton D G (1978) A model for association in bivari-ate life-tables and its application in epidemiologicalstudies of chronic disease incidence Biometrika 65141ndash151
Clayton D G amp Cuziak J (1985) Multivariate gener-alizations of the proportional hazards model (withdiscussion) Journal of the Royal Statistical SocietySeries A 148 82ndash117
Cox D R (1972) Regression models and life tables(with discussion) Journal of the Royal StatisticalSociety Series B 34 187ndash202
Do K-A Broom B M Kuhnert P Duffy D LTodorov A A Treloar S A amp Martin N G(2000) Genetic analysis of the age at menopause byusing estimating equations and Bayesian randomeffects models Statistics in Medicine 19 1217ndash1235
English D R Armstrong B K Kricker A et al(1997) Sunlight and cancer Cancer Causes Control8 271ndash283
Falconer D S (1990) Introduction to quantitativegenetics (3rd ed) New York Longman Group Ltd
Faucett C Gauderman W J Thomas D Ziogas Aamp Sobel E (1993) Combined segregation andlinkage analysis of late-onset Alzheimerrsquos disease inDuke famil ies us ing Gibbs sampling GeneticEpidemiology 10 489ndash494
GAP (1996) Genetic Analysis Package Release 10Pasadena CA Epicenter Software
110 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
111Twin Research February 2004
Analysis of Melanoma Onset
Gauderman W J Morrison J L Carpenter C L ampThomas D C (1997) Analysis of gene-smokinginteraction in lung cancer Genetic Epidemiology 14199ndash214
Gauderman W J amp Thomas D C (1994) Censoredsurvival models for genetic epidemiology A Gibbssampling approach Genetic Epidemiology 11171ndash188
Gelfand A E amp Smith A F M (1990) Sampling-based approaches to calculating marginal densitiesJournal of the American Statistical Association 85398ndash409
Gelman A amp Rubin D B (1992) Inference from itera-tive simulation using mult iple sequences (withdiscussion) Statistical Science 7 457ndash511
Geman S amp Geman D (1984) Stochastic relaxationGibbs distributions and the Bayesian restoration ofimages IEEE Trans Pattern Anal Mach Intelligence6 721ndash741
Geweke J (1992) Evaluating the accuracy of sampling-based approaches to calculating posterior momentsIn J M Bernardo J O Berber A P David amp F FM Smith (Eds) Bayesian statistics 4 OxfordClarendon Press
Gilks W R Thomas A amp Spiegelhalter D J (1994)A language and program for complex Bayesian mod-eling Statistica 43 169ndash178
Guo S W amp Thompson E A (1991) Monte CarloEM for the estimation of multiple random effectsmodels on large pedigrees IMA Journal ofMathematics Applied in Medicine and Biology 8171ndash189
Guo S W amp Thompson E A (1992) A Monte Carlomethod for combined segregation and linkage analy-sis American Journal of Human Genetics 511111ndash1126
Guo S W amp Thompson E A (1994) Monte Carloestimation of mixed models for large complex pedi-grees Biometrics 50 417ndash432
Hastings W K (1970) Monte Carlo sampling methodsusing Markov chains and their applicationsBiometrika 57 97ndash109
Heath A C (1997) Markov Chain Monte Carlo segre-gation and linkage analysis for oligogenic modelsAmerican Journal of Human Genetics 61 748ndash760
Hougaard P (1986) A class of multivariate failure timedistributions Biometrika 73 671ndash678
Hsu L amp Prentice R L (1996) On assessing thestrength of dependency between failure time variatesBiometrika 83 491ndash506
Hsu L amp Zhao L P (1996) Assessing familial aggre-gation of age at onset by using estimating equationswith application to breast cancer American Journalof Human Genetics 58 1057ndash1071
Kempthorne O (1960) Biometrical genetics New YorkPergammon Press
Kendall M amp Stuart A (1979) The advanced theoryof statistics (Vol 2 4th ed) Oxford OxfordUniversity Press
Kong K Frigge M Cox N amp Wong W H (1992)Linkage analysis with adjustments for covariates Amethod combining peeling with Gibbs samplingCytogenetics and Cell Genetics 59 208ndash210
Kuhnert P amp Do K-A (2003) Fitting genetic models totwin data with binary and ordered categoricalresponses A comparison of structural equation mod-eling and Bayesian hierarchical models BehaviorGenetics 33 439ndash452
Lange K amp Sobel E (1991) A random walk methodfor computing genetic location scores AmericanJournal of Human Genetics 49 1320ndash1334
LeBlanc M amp Crowley J (1992) Relative risk trees forcensored survival data Biometrics 48 411ndash425
Li H amp Thompson E A (1997) Semiparametric esti-mation of major gene and random environmentaleffects for age of onset Biometrics 53 282ndash293
Li H amp Wijsman E A (1998) Semiparametric estima-tion of major gene effects for age of onset GeneticEpidemiology 15 279ndash298
MacLennan R Green A C McLeod G R amp MartinN G (1992) Increasing incidence of cutaneousmelanoma in Queensland Australia Journal of theNational Cancer Institute 84 1424ndash1432
Metropolis N Rosenbluth A W Rosenbluth M NTeller A H amp Teller E (1953) Equation of statecalculations by fast computing machines Journal ofChemical Physics 21 1087ndash1092
Nielsen G G Gill R D Andersen P K amp SorensonT I A (1992) A counting process approach tomaximum likelihood estimation in frailty modelsScandinavian Journal of Statistics 19 25ndash43
Oakes D (1989) Bivariate survival models induced byfrailties Journal of American Statistical Association84 487ndash493
Raftery A L Lewis S (1992a) Comment One longrun with diagnostics Implementation strategies forMarkov Chain Monte Carlo Statistical Science 7493ndash497
Raftery A L amp Lewis S (1992b) How many interac-tions in Gibbs sampler (pp 763ndash774) In J MBernardo J O Berber A P David amp F F M Smith(Eds) Bayesian statistics 4 Oxford Clarendon Press
SAGE (1992) Statistical Analysis for GeneticEpidemiology Release 21 Cleveland OH Departmentof Epidemiology and Biostatistics RammelkampCenter for Education and Research MetroHealthCampus Case Western Reserve University
Scurrah K J Palmer L J amp Burton P R (2000)Variance components analysis for pedigree-based cen-sored survival data unusing generalized linear mixedmodels (GLMMS) and Gibbs sampling in BUGSGenetic Epidemiology 19 127ndash148
112 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
Siegmund K D amp McKnight B (1998) Modelinghazard functions in families Genetic Epidemiology15 147ndash171
Spiegelhalter D J Thomas A amp Best N G (1996a)Computation on Bayesian graphical models (pp407ndash425) Oxford Oxford University Press
Spiegelhalter D J Thomas A Best N G amp Gilks W(1996b) BUGS 06 ndashBayesian InferenceUsing GibbsSampling manual Cambridge MRC Biostatistics Unit
Spiegelhalter D J Thomas A Best N G Lunn D(2003) WinBUGS (Bayesian Inference Using GibbsSampling) Windows version 14 user manualCambridge MRC Biostatistics Unit
Siskind V Aitken J Green A amp Martin N (2002)Sun exposure and interaction with family history inrisk of melanoma Queensland Australia Inter-national Journal of Cancer 97 90ndash95
Swerdlow A amp Green A (1987) Melanocytic naeviand melanoma An epidemiological perspectiveBritish Journal of Dermatology 117 137ndash146
Therneau T M amp Atkinson E (1997) An introductionto recursive partitioning using the RPART routines(Technical report) Rochester Mayo Foundation
Thomas D (1992) Fitting genetic data using Gibbs sam-pling An application to nevus counts in 38 Utahkindreds Cytogenetics and Cell Genetics 59 228ndash230
Thompson E A (1994) Monte Carlo programs for pedigree analysis (Technical Report 267)Washington Dept of Statist ics University ofWashington
Vaupel J M Manton K G amp Stallard E (1979) Theimpact of heterogeneity in individual frailty on thedynamics of mortality Demography 16 439ndash454
113Twin Research February 2004
Analysis of Melanoma Onset
Appendix A A BUGS Program to Implement the Bayesian MCMC approach in Modeling an ACEModel for Age at onset of Melanoma with Covariate Effects and Random GeneticsEnvironmental Effects
102 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
decision trees that were incorporated into RPART byTherneau (1997) using the ideas put forth by LeBlancand Crowley (1992) for survival data with censoringThese authors showed through simulation studies thatsurvival trees could outperform standard parametricmethods such as proportional hazards modelling particularly in situations where the underlying distrib-ution was not exponential Along with Breiman et al(1984) they showed decision trees to be usefulexploratory tools for identifying important variablesinteractions and outliers
The methodology for survival trees begins by iden-tifying optimal splits using the log rank statistic toseparate the data into homogeneous groups Eachsplit is comprised of a parent node and two daughternodes which are linked to the parent by branchesFigure 1 is an example of a survival tree produced onthe melanoma data Once a large tree is grown withmany terminal nodes that contain very few observa-tions a pruning procedure is introduced to identify asequence of sub-trees The technique of pruninginvolves snipping back splits of the tree one at a timeuntil only the root node remains Cross-validation isthen introduced to aid in the selection of the optimalmodel One nice feature of survival trees is the use ofKaplan-Meier curves to provide information aboutsurvival rates at each terminal node of the tree Thisapproach may allow for better interpretation of theterminal nodes and highlight different scenarioswhich yield similar survival rates as can be seen inFigure 1
The variables identified from the multivariablemodel were used as input into RPART Variables suchas the sun and UV exposures were omitted from themodelling stage since nearly half of the data for eachof these variables were missing (cumulative sun expo-sure 438 UV exposure 404) A large survivaltree was produced in SPLUS and using cross-valida-tion a model splitting solely on birth year wasidentified This survival tree yielded the minimumcross-validated error rate (093) suggesting that birthyear was an important indicator for melanomaSelecting a slightly larger but more informative model(error = 094) resulted in a survival tree consisting ofnine terminal nodes This revealed splits on birth yearmole count and freckling Al l other variablesappeared to be either competing at each node oracting as a surrogate variable for one of these primarysplits Skin type in particular arose as an importantsurrogate for freckling
In Figure 1 we aim to provide a more visual inter-pretation to the results from the survival analysisEach split is shown at the top of each node and canbe assumed to be the split that directs observationstowards the left side of the tree For example aftersplitting on birth year at 1933 a split to the left cor-responds to an individual with few moles A split tothe right indicates an individual with many moles andso on The terminal nodes in this figure are displayed
as Kaplan-Meier plots showing the survival curvesalong with the number of observations n and the riskratio calculated with reference to the baseline groupof 9669 individuals The reference survival curve isshown at the root node along with a bar which showsthe percentage of events in the entire dataset Gridsare placed on each survival plot at equal intervals of20 years of age These grids have been plotted to sim-plify the interpretation An indicator to the left of theplot displays the risk ratio Ratios above one areshown by an ldquoup arrowrdquo while a decrease in risk isillustrated by a ldquodown arrowrdquo From this model wecan see that there are a few scenarios that indicatehigh risk for earlier onset of melanoma These scenar-ios may be described as
individuals born after 1966 with many moles (RR = 113)
individuals born between 1947 and 1966 withmany moles (RR = 45)
individuals born between 1933 and 1953 with fewmoles but many freckles (RR = 24)
individuals born between 1923 and 1933 withmany moles (RR = 141)
It is obvious from these results that birth year has asubstantial impact on the age-at-onset of melanomaOnce this is taken into account mole count and freck-ling only provide a small contribution to the risk
Family History of Melanoma
As described earlier in the Data section family historywas collected regarding first-degree (siblings andparents) and second-degree relatives From this pedi-gree structure other higher-order types of relative pairscould also be formed Some of these relatives were dis-eased with melanoma resulting in pairs of relativeswho both may be diseased (++) both not diseased (mdash)and one diseased while the other was not diseased (+ndash)Table 2 lists the concordant and discordant pairs ofspecific relationships sibndashsib parentndashchild andsecond-degreelower order relative pairs The second-degree relative pairs include grandparentndashgrandchildand auntndashniece while the lower order relative pairsinclude the in-law pairs From Table 2 the percent-ages of both diseased pairs are 06 09 and 04among sibndashsib parentndashchild and secondlower orderpairs respectively Crude estimates of correlationcoefficients can be calculated from these percentageswithout accounting for ages at onset among these rel-atives However the risk of developing melanomamay depend on the subjectrsquos age Hence adjusting forthe age at onset is essential in quantifying the correla-tion of age at onset between pairs of relatives In asubsequent section we describe how this can be donerigorously via the generalized estimating equationsapproach On average melanomas were diagnosedslightly earlier in relatives (475 years) than inprobands (502 years) Among relatives melanomaswere diagnosed at younger ages in later generations
103Twin Research February 2004
Analysis of Melanoma Onset
RR
= 1
00
Bas
elin
e
Fig
ure
1
Surv
ival
tree
for t
he m
elan
oma
data
con
sist
ing
of 9
669
indi
vidu
als
from
191
2 Qu
eens
land
fam
ilies
Ter
min
al n
odes
are
rect
angu
lar a
nd d
ispl
ay th
e Ka
plan
-Mei
er c
urve
for t
he c
orre
spon
ding
sub
grou
p
104 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
To account for the different ages at censoring in eachgeneration due to termination of the study or deathfrom causes other than melanoma we examined thedisease-free survival distribution for each generationusing the standard failure-time analysis technique(Figure 2) The median age at diagnosis of melanomawas 64 among parents of probands 50 among sib-lings of probands and 33 among children ofprobands The disease-free survival functions differedsignificantly between the children generation fromearlier generations (log-rank test with p lt 01) butthere was no significant differences in age-at-onset of
melanoma between the siblings and parents ofprobands (log rank test with p gt 9)
Preliminary Segregation Analysis
A preliminary segregation analysis was conducted(Aitken et al 1998) to investigate whether the famil-ial clustering of cutaneous melanoma is consistentwith Mendelian inheritance of a major autosomalgene Analyses were performed with the SAGE (1992)statistical package using the maximum likelihoodREGTL program for a binary trait with a variable ageof onset The hypothesis of co-dominant Mendelianinheritance gave a significantly better fit to the datathan either dominant or recessive Mendelian inheri-tance Overall both Mendelian inheritance of a singlemajor gene and purely environmental transmissionwere rejected However there was strong evidence offamilial dependence in melanoma occurrence
The inclusion of risk factors in the models mayreveal whether all or a combination of these explainsthe familial dependence that was demonstrated by thesegregation analysis If a major gene exists that oper-ates independently on these covariates includingthem in the models may reduce residual variation andincrease the power of the analysis to detect a majorgene effect
Estimating Equations Approach
Let y = (dki tki Zki) denote the data collected for the ith
member in the kth family (k = 1hellipK and i = 12)where dki = 0 if the observation is censored tki is eitherthe recorded age at diagnosis of melanoma or the ageat the most recent follow-up for unaffected peopleand Zki is a vector of measured covariates We assumethat censoring time age at diagnosis of melanomaand the covariates are independently distributedThese assumptions can be relaxed in more generalmodels subject to identification constraints Thehazard rate for melanoma is the instantaneous proba-bility that melanoma is diagnosed immediately aftertime t given that the person is unaffected at time tThe hazard rate under the Cox proportional hazardsmodel (Cox 1972) is given by
l(tki) = l0(tki) exp(brsquoZki)
Table 2
Concordant and Discordant Pairs of Relatives in 1912 Families from the Queensland Familial Melanoma Project Probands Were Not Included forthe Calculation of Concordance
Sibndashsib Parentndashchild SecondOthers Total
++ 49 15 41 105
+ndash 763 536 1200 2499
mdash 7011 1078 9817 17906
Total 7823 1629 11058 20510
Figure 2
Cumulative melanoma-free survival among first-degree relatives ofconfirmed cutaneous melanoma cases (probands) diagnosed inQueensland Australia 1982ndash1987 according to the relativersquos rela-tionship to the proband
where l0() is the baseline hazard function and b is avector of regression coefficients
For a specific pair of relatives we follow Clayton(1978) in modeling the bivariate survivor function
F(tk1 tk2) = (F1(tk1)ndashq + F2(tk2)
ndashq ndash1)ndash1q
where F1 and F2 are univariable survivor functions qis a scalar parameter that measures the degree ofdependence between the relativesrsquo times at onsetindependence being implied by q = 0 and positiveassociation by q gt 0 The Clayton model allows nega-tive dependencies and has the property that failuretimes are absolutely continuous for q gt ndash05 In addi-tion the cross-ratio (or odds-ratio) function asstudied by Oakes (1989) is
c(tk1 tk2) = l(tk1 frac12 Tk2 = tk2) l(tk1frac12Tk2 sup3 tk2) =1 + q
This is equivalent to assuming that the odds-ratio isinvariant over the grid region that supports the dataHeuristically the parameter 1 + q is an odds-ratiothat depends on the degree of dependence betweenthe onset ages of the two relatives If genetic factorsdo influence the age at onset of melanoma we wouldexpect to see a higher concordance in the age of onsetin first degree relatives who on average share halftheir genes in common in comparison to seconddegree relatives Under the current model this trans-lates as qfirst-degree gt qsecond-degree
We may use a standard method to estimate withinpair correlations for 2 acute 2 tables from odds ratiosEstimates of relation-pair correlations psibndashsib pparentndashchildand psecond-order are recovered from using the relation-ships for example r sibndashsib = min(1ln(1 + q sibndashsib))Testing for the presence of genetic factors underlyingthe age at diagnosis of melanoma is equivalent totesting H0 p first-order = psecond-order We may test thishypothesis using a z-transform (Kendall 1979 p 315) of the point estimates of the correlation coef-ficients Let n1 and n2 denote the number of first orderand second order relatives let z1and z2 denote thetransformed statistics of r1and r2 the correlation esti-mates for first-order and second-order relative pairsrespectively Specifically we reject H0 when ED gtZ1ndasha where
E = E(z1 ndash z2) = mdash12mdash log 31mdash11
+ndash r
r
1
1mdash2 1mdash11
+ndash r
r2
2
mdash24D2 = V(z1 ndash z2) = mdash
(n1
1ndash 3)mdash+ mdash
(n2
1ndash 3)mdash
and Z1ndasha is the standard normal deviate correspondingto the one-sided a significance level
This approach has the advantage of providing atest for the presence of genetic effects through a singleparameter (q) However it is limited in its ability to
attribute the phenotypic variance to specific effects(eg additive gene action)
Mathematical details of the GEE model and theiterative procedure to estimate the regression coeffi-cients b and specific degrees of dependence q for thedifferent types of relative pairs have been summarizedpreviously in Do et al (2000)
MCMC Analysis Using BUGSThe Bayesian Paradigm and Gibbs Sampling
Markov Chain Monte Carlo (MCMC) is an alterna-tive Bayesian approach that provides estimates oflikelihoods and associated parameter values whenexact computation is infeasible (Hastings 1970Metropolis et al 1953) MCMC methods can beused to draw samples from the underlying joint distri-bution of major genotypes and polygenic valuesconditional on the observed data From thesesamples desired parameters and likelihoods can beestimated without the need to resort to exact compu-tation MCMC methods have been used for linkageanalysis (Kong et al 1992 Lange amp Sobel 1991)for estimation of parameters in the mixed model withand without covariates (Guo amp Thompson 1991Thomas 1992) for estimation of genendashsmoking inter-action and covariate imputation (Gauderman et al1997) for performing combined linkage and segrega-tion analysis (Faucett et al 1993 Guo amp Thompson1992) and for mixed models of large complex pedi-grees (Guo amp Thompson 1994)
In a general setting let y be the observed data andq be everything not observed including parametersand latent variables The implementation of Bayesianmethods using realistic models and priors is com-puter-intensive and relies heavily on cunningcomputational tools to approximate integrals Theproblem in general terms is to obtain the expectedvalue of a function of interest s () under the posteriordensity p(qfrac12x)
E[s(q)] =
which cannot generally be found analytically Onemethod to carry out the integration on the RHS is toperform simulation of exact Bayesian posterior distri-butions using Markov chain Monte Carlo techniquessuch as Gibbs sampling The Gibbs sampler (Gemanamp Geman 1984) is the most popular algorithm usedin MCMC applications to correlated data Gibbssampling was introduced to the main statistical com-munity by Gelfand and Smith (1990) and has sincebeen applied to an even wider array of problems TheGibbs sampler is easy to implement because it onlydepends on the local neighborhood structure In thecontext of pedigree analysis (Olshen amp Wijsman1996) the basic procedure is a sequential updating ofmissing and latent data including the underlying andunobserved major genotypes polygenic effects and
EQ s(q) p(q) p(xfrac12q) dqmdash mdash mdash
EQ p(q) p(xfrac12q) dq
105Twin Research February 2004
Analysis of Melanoma Onset
106 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
environmental effects Values for the missing or latentdata are sampled from the local conditional distribu-tion a function of the observed individual data thecurrent sampled values of other missinglatent data forthis particular individual such as polygenic and environ-mental effects and the values for the sampled geneticeffects in the immediate neighbors of an individualGibbs sampling basically consists of three main steps
Step 1 Setting initial values for unobserved quanti-ties (parameters and latent variables)
Step 2 For each parameter or latent variable qj sample from its ldquofull conditional distributionrdquogiven the current values of all other quantities inthe model
Step 3 Examine sampled values of parameters andlatent variables to monitor convergence and toprovide summary measures
Some of the most recent and popular packages thatimplement Gibbs sampling for analysis of pedigreedata include BUGS (Gilks et al 1994 Spiegelhalter et al 1996ab) Genetic Analysis Package (GAP1996) and MIXD (Olshen amp Wijsman 1996Thompson 1994) We have used BUGS mainlybecause of its flexibility in programming hierarchicalmodels besides being a freeware product
The Model
The aim here is to model the correlation structurewithin the family structure to satisfy the fundamentaladditive genetic model (Crow amp Kimura 1970Falconer 1990 Kempthorne 1960) as followsConsider a nuclear family structure consisting of 4members father mother and two children denoted byF M S1 S2 respectively Using similar notation as inBurton et al (1999) a conventional mixed linearmodel consisting of fixed and random effects may bewritten in the form
Qij = bTz + Aij +Cij + CSij+ Eij
where Qij is the observed value of a normally distrib-uted continuous trait for the jth individual in the ith
nuclear family zij is a vector of observed covariatesrepresenting fixed effects and b is a correspondingvector of unknown fixed regression coefficients Aij Cij and CSij
A denote random effects that repre-sent additive polygenic common family environmentand common sibling environment effects respectivelyThe variation in an individual response is representedby a composite covariance matrix VT and is the sumof an additive genetic covariance matrix VA acommon family environment matrix VC a sharedsibling environment matrix VCs and residual environ-mental effects The different variance components are
The overall total covariance matrix is
The components of variance s 2A s 2
C s 2Cs need not be
positive as long as VT is positive definite A negativevalue for s 2
Cs simply implies that the realized covari-ance between sib lings is less than the realized
107Twin Research February 2004
Analysis of Melanoma Onset
covariance between parent and a child A negativevalue for s 2
Cs may suggest dominance To optimizeconvergence in BUGS Model (1) may be reparameter-ized as
where Fi Gi H i are independent additive randomeffects or latent variables and are the residual errorterms for parents and children respectively If we model
Fi ~ N(0 mdash12mdash s 2
A +s 2C)
Gi ~ N(0 mdash12mdash s 2
A) Hi ~ N(0 s 2Cs)
then the basic genetics covariance model (additivegenetic common environment and unique environ-ment for this particular four-member family structureis satisfied
In survival models unobserved or unmeasuredexplanatory variables some of which may be geneticare often referred to as frailties The frailties takevalues restricted to the posit ive line and may beassumed to act multiplicatively on the hazardExtending the above model to correlated family datawith time-to-onset endpoint a multiplicative individ-ual heterogeneity or frailty term representing thelatent genetic and common environment variablesmay be modeled as random effects simultaneouslywith the effects associated with observed covariatesConsider right censored time to onset of melanomadata (Tij dij zij)1 j n from n relative pairs hereT i j denotes the true age at onset of the jth familymember or the censored time depending on whetherdij =1 or 0 respectively and z denotes a p acute 1 vector ofcovariates A Weibull distribution may be used tomodel time to failure as
f(ti zi) = ebrsquoz igt igndash1 exp(ndashebrsquoz it g
i )
where b is a vector of unknown regression coefficientsand g is the shape parameter of the Weibull distribu-tion This leads to a baseline hazard of the form
l0(ti) = gt igndash1
Re-parameterize by letting mi = ebrsquoz i the conditionaldistribution of ti given mI is then Weibull (gmi) We for-mulated a mixed model to represent the conditionaldistribution of tij given covariate effects random addi-tive genetic and common environment effects as
tijfrac12mij ~ Weibull (g mij) i = 1hellipn j = 12
where
The regression coefficients and the precision of therandom effects (tG tF tH) were given ldquonon-informa-tiverdquo Normal and Gamma priors respectively The shapeparameter g of the time to onset of melanoma distribu-tion was also given a non-informative Gamma priorwhich was slowly decreasing on the positive real line
The overall model (including random effects) maybe described in a Bayesian graph (Figure 3) whichsimplifies sampling from full conditional distributionsby exploiting part ial independence propert ies(Spiegelhalter et al 1996a) In this graph eachrandom quantity is represented by a node which maybe connected by directed or undirected linksConditional independence assumptions are repre-sented by the absence of such links
We implemented the Gibbs sampler using theBUGS program (Gilks et al 1994) (code in theAppendix) Imputation of missing data was handlednaturally in the Gibbs sampling framework by treatingmissing values as additional unknown quantities andrandomly sampling values from their full conditionaldistributions We chose simple prior distributions forimputation since the number of missing values forcovariates was not large (complete for birth year 16missing for naevi and 21 missing for freckles) andthere was no indication of non-random missingness inour data Therefore imputation for missing naevi andfreckles covariates were based on Bernouilli prior dis-tributions with respective parameter values estimatedfrom the complete observations We performed aninitial 10000 burn-in iterations followed by an addi-tional 20000 Parameter estimates were the mean andstandard deviation (SD) of all post convergence Gibbssamples with a thinning interval between 20 and 50credible intervals were computed as the lower andupper a2 percentiles from the last 20000 iterationsConvergence to the posterior distribution was con-firmed by using the different criteria provided by theadd-on CODA package including those of Gelmanand Rubin (1992) Geweke (1992) and Raftery andLewis (1992a 1992b)
ResultsGEE Approach
An inspection of residual plots following preliminarymodel-fitting provided no evidence for the failure ofproportional hazards assumption and did not detectinfluential observations We then proceeded to applythe GEE approach that could estimate regressioncoefficients while incorporating a dependence struc-ture between relative pairs The results are displayed
108 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
in Table 3 which suggest that later birth year havingat least a moderate number of naevi and freckles wereall simultaneously associated with later age at onsetof melanoma Table 3 also presents the estimatedodds-ratios for quantifying the correlation betweenpaired relatives of specific relationships
The odds ratio for sibndashsib pairs is 2973 which issignificantly different from 1 (p lt 001) The odds ratiofor parentndashchild pairs is 1650 which is slightly greaterthan 1 indicating a mild dependency between thesepairs although they are not quite statistically signifi-cant The odds ratio for second-degree and higherrelative pairs is 1155 indicating no dependence at allbetween these pairs This pattern of familial aggrega-tion is compatible with dominance variance as well asadditive genetic variance (Falconer 1990)
Bayesian Approach
We re-analyzed the age-at-onset of melanoma by usingGibbs sampling to impute missing covariates and toestimate subject-specific covariate effects randomadditive genetic common family environment andshared sibling environment effects on the log scale Theestimated shape parameter g was 43 with a 95 credi-ble interval (CI) of (42 46) The results aresummarized in Table 4 (Model A) The residual plotsdid not indicate a gross departure from the underlyingWeibull model and revealed no influential observa-tions We checked the sensitivity of the analyses to
initial parameter values by re-running the Gibbssampler five more times using different starting valuesThe resulting estimates did not differ by more than 5from the values reported here The mean estimate for(s 2
A) was 0452 with 95 CI = (0348 0566) for (s 2C)
was ndash0053 with 95 CI = (ndash0120 0019) and for(s 2
Cs) was 0467 with 95 CI = (0393 0545)A small negative value for the common family
effect suggests that there may be a dominant effect orthat the current model is not quite appropriate forexample that there are systematic effects or correla-tion structures that have not been accounted for Theresults here indicate that additive genetics seem toimpact equally on the variation of the age at onset ofmelanoma Further exploration for alternative modelsis a focus of our future research
In addition we investigated the relative contribu-tion of genetic and environmental effects on theexpression of naevi (Model B) and freckles (ModelC) which are known risk factors for melanoma Theexpressions of naevi and freckles were coded asbinary variables (none or few moles versus moderateor many moles and no freckles versus one or morefreckles) A hierarchical Bayesian binomial model wasfitted to estimate the random variance componentsThe results in Table 4 indicate that a common familyenvironment effect contributed the most to theexpression of naevi (s 2
C = 0704) (relative to the con-tributions of additive genetic effect (s 2
A = 0142) and
Individual j in family i
Figure 3
Graphical model of covariate and random family effects for an individual in a nuclear family structure tij represents the observed failure time forthe jth individual in the ith family with tcenij being an indicator variable of censoring status Full arrows indicate stochastic links to which a probabil-ity is attached broken arrows denote deterministic relationships bs are regression coefficients T is the precision of the prior distribution andequals the inverse of the variance Fi Gi Hi are random effects modeled as Fi ~ N(012 s 2
A + s 2C) Gi ~ N(0 12 s 2
A) and Hi ~ N(0 s 2Cs) Rectangles rep-
resent actual data values for the covariates g and mij are shape and scale parameters for the underlying Weibull distribution
109Twin Research February 2004
Analysis of Melanoma Onset
of shared sibling effect(s 2Cs = 0142) both of which
were non negligible In contrast variation in theexpression of freckles was largely explained by addi-tive genetic and shared family effects(s 2
A = 2050 s 2C =
2600) compared to a relatively small shared siblingeffect(s 2
C = 0115)
Discussion
We applied two methods mdash generalized estimatingequat ion and Bayesian analys is mdash to the geneticanalysis of age at onset of melanoma based on anuclear family structure Under both approaches theresults suggest that additive genetic factors played an
Table 4
Gibbs Sampling Approach Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families
Weibull Model A Mean effects mdash Response variable is Age-at-onset
Covariate RR = eb Coefficient b Robust se(b) 95 CI of bYear of birth 1378 0321 00027 (03160326)
Naevi 1126 0119 00021 (00580185)
Freckling 1017 0017 01400 (ndash00050085)
Weibull Model B Variance components mdash Response variable is Age-at-onset
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0452 0054 (03480566)
s 2C ndash0053 0027 (ndash01200019)
s 2Cs 0467 0040 (03930545)
g 43 0104 (4246)
Binomial Model Variance components mdash Response variable is Naevi
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0142 0149 (00020498)
s 2C 0704 0156 (04031010)
s 2Cs 0195 0157 (000250553)
Binomial Model Variance components mdash Response variable is Freckling
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 2050 0779 (08353570)
s 2C 2600 0418 (17803460)
s 2Cs 0115 0088 (00110312)
Note Indicates Significance at the 005 level) Naevi Is a Binary Variable with Baseline 0 = No or Few Moles Freckling Is Coded as a Binary Variable with Baseline0 = No Freckles
Table 3
GEE Approach Estimated Regression Coefficients in the Proportional Hazard Model and Estimated Odds Ratios for Quantifying FamilialAggregation in Age at Onset of Melanoma in Queensland Families ( Indicates Significance at the 05 Significance Level)
A Mean effects
Covariate RR = eb Coefficient b Robust se(b) Z-statistic
Year of birth 1142 0132 0051 2588
Naevi (Baseline = No or few moles) 1765 0568 0073 7781
Freckling (Baseline = No freckles) 1160 0148 0049 3020
B Pattterns of familial aggregation
Relationship 1 + q se(q) Z-statistic
Sibndashsib 2973 06217 317
Parentndashchild 1650 0434 150
SecondOthers 1155 0270 047
important role in the age at onset of melanoma butthat shared sibling environmental factors were notnegligible We focused attention on these approachesbecause they are more appropriate for modeling cor-related age at onset data and they allow the inclusionof covariates in the analyses Under both approachesthere were suggestions that earlier melanoma onsetwas influenced by later birth year having a moderatenumber of naevi and being freckly The principal dif-ference between the two approaches i s in theinterpretation of the regression coefficients The GEEmethod uses a marginal approach resulting in regres-sion coefficients that describe the average populationresponse to changing covariates whereas the Bayesianapproach produces subject-specific coefficients A sec-ondary distinction is in the nature of the within-pairdependence The GEE model only describes acommon covariance among specific relative pairswhereas the Bayesian approach can explicitly describethe source of this covariance A third advantage of theBayesian method is its flexibility in incorporatingprior information if available for the covariates orlatent effects by modifying their prior distributionsFurther the Bayesian method would permit a moreaccurate decomposition of the genetic variance intoadditive and dominant components thus providingthe means for a direct assessment of the no-domi-nance assumption Finally it is also interesting torecord the amount of CPU time required for eachmethod 20 seconds for the GEE approach andapproximately 3 hours (for binary traits) and 12hours (for age at onset outcome) to run BUGS on asingle-user Intel Pentium III 600 MHz personal com-puter with the Linux Mandrake 71 operating systemThe amount of human and financial resources dedi-cated to collecting maintaining and updating themelanoma family database is extremely high there-fore the extra CPU time requirement by the MCMCmethod is well worth the additional genetic informa-tion and flexibility that it provides A BUGS programis included in the Appendix
Acknowledgments
Data collection was supported by grants from theAustralian National Health and Medical ResearchCouncil (NHMRC) ID 930223 and ID 961061 USNational Cancer Institute (CA88363) Programmingresources were provided by start-up funds from theDepartment of Biostatistics at the M D AndersonCancer Center We acknowledge assistance withBUGS programming from Sang-Joon Lee
ReferencesAbel L amp Bonney G E (1990) A time-dependent
logistic hazard function for modeling variable age ofonset in the analysis of familial diseases GeneticEpidemiology 12 391ndash407
Aitken J F Bailey-Wilson J Green A C MacLennanR amp Martin N G (1998) Segregation analysis ofcutaneous melanoma in Queensland Genetic Epi-demiology 15 391ndash401
Aitken J F Green A C MacLennan R Youl P ampMartin N G (1996) The Queensland familialmelanoma project Study design and characteristics ofparticipants Melanoma Research 6 155ndash165
Allison P D (1995) Survival analysis using the SASsystem mdash A practical guide Cary NC SAS InstituteInc
Andersen P K Borgan O Gill R D amp Keiding N(1993) Statistical models based on count ingprocesses New York Springer-Verlag
Bernardo J M Berber J O David A P amp Smith F FM (Eds) (1992) Bayesian statistics 4 OxfordClarendon Press
Bonney G E (1986) Regressive logistic model for famil-ial disease and other binary traits Biometrics 42611ndash625
Breiman L Friedman J H Olshen R A amp Stone CJ (1984) Classif ication and regression treesWadsworth California
Cannings C amp Thompson E A (1977) Ascertainmentin the sequential sampling of pedigrees ClinicalGenetics 12 208ndash212
Clayton D G (1978) A model for association in bivari-ate life-tables and its application in epidemiologicalstudies of chronic disease incidence Biometrika 65141ndash151
Clayton D G amp Cuziak J (1985) Multivariate gener-alizations of the proportional hazards model (withdiscussion) Journal of the Royal Statistical SocietySeries A 148 82ndash117
Cox D R (1972) Regression models and life tables(with discussion) Journal of the Royal StatisticalSociety Series B 34 187ndash202
Do K-A Broom B M Kuhnert P Duffy D LTodorov A A Treloar S A amp Martin N G(2000) Genetic analysis of the age at menopause byusing estimating equations and Bayesian randomeffects models Statistics in Medicine 19 1217ndash1235
English D R Armstrong B K Kricker A et al(1997) Sunlight and cancer Cancer Causes Control8 271ndash283
Falconer D S (1990) Introduction to quantitativegenetics (3rd ed) New York Longman Group Ltd
Faucett C Gauderman W J Thomas D Ziogas Aamp Sobel E (1993) Combined segregation andlinkage analysis of late-onset Alzheimerrsquos disease inDuke famil ies us ing Gibbs sampling GeneticEpidemiology 10 489ndash494
GAP (1996) Genetic Analysis Package Release 10Pasadena CA Epicenter Software
110 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
111Twin Research February 2004
Analysis of Melanoma Onset
Gauderman W J Morrison J L Carpenter C L ampThomas D C (1997) Analysis of gene-smokinginteraction in lung cancer Genetic Epidemiology 14199ndash214
Gauderman W J amp Thomas D C (1994) Censoredsurvival models for genetic epidemiology A Gibbssampling approach Genetic Epidemiology 11171ndash188
Gelfand A E amp Smith A F M (1990) Sampling-based approaches to calculating marginal densitiesJournal of the American Statistical Association 85398ndash409
Gelman A amp Rubin D B (1992) Inference from itera-tive simulation using mult iple sequences (withdiscussion) Statistical Science 7 457ndash511
Geman S amp Geman D (1984) Stochastic relaxationGibbs distributions and the Bayesian restoration ofimages IEEE Trans Pattern Anal Mach Intelligence6 721ndash741
Geweke J (1992) Evaluating the accuracy of sampling-based approaches to calculating posterior momentsIn J M Bernardo J O Berber A P David amp F FM Smith (Eds) Bayesian statistics 4 OxfordClarendon Press
Gilks W R Thomas A amp Spiegelhalter D J (1994)A language and program for complex Bayesian mod-eling Statistica 43 169ndash178
Guo S W amp Thompson E A (1991) Monte CarloEM for the estimation of multiple random effectsmodels on large pedigrees IMA Journal ofMathematics Applied in Medicine and Biology 8171ndash189
Guo S W amp Thompson E A (1992) A Monte Carlomethod for combined segregation and linkage analy-sis American Journal of Human Genetics 511111ndash1126
Guo S W amp Thompson E A (1994) Monte Carloestimation of mixed models for large complex pedi-grees Biometrics 50 417ndash432
Hastings W K (1970) Monte Carlo sampling methodsusing Markov chains and their applicationsBiometrika 57 97ndash109
Heath A C (1997) Markov Chain Monte Carlo segre-gation and linkage analysis for oligogenic modelsAmerican Journal of Human Genetics 61 748ndash760
Hougaard P (1986) A class of multivariate failure timedistributions Biometrika 73 671ndash678
Hsu L amp Prentice R L (1996) On assessing thestrength of dependency between failure time variatesBiometrika 83 491ndash506
Hsu L amp Zhao L P (1996) Assessing familial aggre-gation of age at onset by using estimating equationswith application to breast cancer American Journalof Human Genetics 58 1057ndash1071
Kempthorne O (1960) Biometrical genetics New YorkPergammon Press
Kendall M amp Stuart A (1979) The advanced theoryof statistics (Vol 2 4th ed) Oxford OxfordUniversity Press
Kong K Frigge M Cox N amp Wong W H (1992)Linkage analysis with adjustments for covariates Amethod combining peeling with Gibbs samplingCytogenetics and Cell Genetics 59 208ndash210
Kuhnert P amp Do K-A (2003) Fitting genetic models totwin data with binary and ordered categoricalresponses A comparison of structural equation mod-eling and Bayesian hierarchical models BehaviorGenetics 33 439ndash452
Lange K amp Sobel E (1991) A random walk methodfor computing genetic location scores AmericanJournal of Human Genetics 49 1320ndash1334
LeBlanc M amp Crowley J (1992) Relative risk trees forcensored survival data Biometrics 48 411ndash425
Li H amp Thompson E A (1997) Semiparametric esti-mation of major gene and random environmentaleffects for age of onset Biometrics 53 282ndash293
Li H amp Wijsman E A (1998) Semiparametric estima-tion of major gene effects for age of onset GeneticEpidemiology 15 279ndash298
MacLennan R Green A C McLeod G R amp MartinN G (1992) Increasing incidence of cutaneousmelanoma in Queensland Australia Journal of theNational Cancer Institute 84 1424ndash1432
Metropolis N Rosenbluth A W Rosenbluth M NTeller A H amp Teller E (1953) Equation of statecalculations by fast computing machines Journal ofChemical Physics 21 1087ndash1092
Nielsen G G Gill R D Andersen P K amp SorensonT I A (1992) A counting process approach tomaximum likelihood estimation in frailty modelsScandinavian Journal of Statistics 19 25ndash43
Oakes D (1989) Bivariate survival models induced byfrailties Journal of American Statistical Association84 487ndash493
Raftery A L Lewis S (1992a) Comment One longrun with diagnostics Implementation strategies forMarkov Chain Monte Carlo Statistical Science 7493ndash497
Raftery A L amp Lewis S (1992b) How many interac-tions in Gibbs sampler (pp 763ndash774) In J MBernardo J O Berber A P David amp F F M Smith(Eds) Bayesian statistics 4 Oxford Clarendon Press
SAGE (1992) Statistical Analysis for GeneticEpidemiology Release 21 Cleveland OH Departmentof Epidemiology and Biostatistics RammelkampCenter for Education and Research MetroHealthCampus Case Western Reserve University
Scurrah K J Palmer L J amp Burton P R (2000)Variance components analysis for pedigree-based cen-sored survival data unusing generalized linear mixedmodels (GLMMS) and Gibbs sampling in BUGSGenetic Epidemiology 19 127ndash148
112 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
Siegmund K D amp McKnight B (1998) Modelinghazard functions in families Genetic Epidemiology15 147ndash171
Spiegelhalter D J Thomas A amp Best N G (1996a)Computation on Bayesian graphical models (pp407ndash425) Oxford Oxford University Press
Spiegelhalter D J Thomas A Best N G amp Gilks W(1996b) BUGS 06 ndashBayesian InferenceUsing GibbsSampling manual Cambridge MRC Biostatistics Unit
Spiegelhalter D J Thomas A Best N G Lunn D(2003) WinBUGS (Bayesian Inference Using GibbsSampling) Windows version 14 user manualCambridge MRC Biostatistics Unit
Siskind V Aitken J Green A amp Martin N (2002)Sun exposure and interaction with family history inrisk of melanoma Queensland Australia Inter-national Journal of Cancer 97 90ndash95
Swerdlow A amp Green A (1987) Melanocytic naeviand melanoma An epidemiological perspectiveBritish Journal of Dermatology 117 137ndash146
Therneau T M amp Atkinson E (1997) An introductionto recursive partitioning using the RPART routines(Technical report) Rochester Mayo Foundation
Thomas D (1992) Fitting genetic data using Gibbs sam-pling An application to nevus counts in 38 Utahkindreds Cytogenetics and Cell Genetics 59 228ndash230
Thompson E A (1994) Monte Carlo programs for pedigree analysis (Technical Report 267)Washington Dept of Statist ics University ofWashington
Vaupel J M Manton K G amp Stallard E (1979) Theimpact of heterogeneity in individual frailty on thedynamics of mortality Demography 16 439ndash454
113Twin Research February 2004
Analysis of Melanoma Onset
Appendix A A BUGS Program to Implement the Bayesian MCMC approach in Modeling an ACEModel for Age at onset of Melanoma with Covariate Effects and Random GeneticsEnvironmental Effects
103Twin Research February 2004
Analysis of Melanoma Onset
RR
= 1
00
Bas
elin
e
Fig
ure
1
Surv
ival
tree
for t
he m
elan
oma
data
con
sist
ing
of 9
669
indi
vidu
als
from
191
2 Qu
eens
land
fam
ilies
Ter
min
al n
odes
are
rect
angu
lar a
nd d
ispl
ay th
e Ka
plan
-Mei
er c
urve
for t
he c
orre
spon
ding
sub
grou
p
104 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
To account for the different ages at censoring in eachgeneration due to termination of the study or deathfrom causes other than melanoma we examined thedisease-free survival distribution for each generationusing the standard failure-time analysis technique(Figure 2) The median age at diagnosis of melanomawas 64 among parents of probands 50 among sib-lings of probands and 33 among children ofprobands The disease-free survival functions differedsignificantly between the children generation fromearlier generations (log-rank test with p lt 01) butthere was no significant differences in age-at-onset of
melanoma between the siblings and parents ofprobands (log rank test with p gt 9)
Preliminary Segregation Analysis
A preliminary segregation analysis was conducted(Aitken et al 1998) to investigate whether the famil-ial clustering of cutaneous melanoma is consistentwith Mendelian inheritance of a major autosomalgene Analyses were performed with the SAGE (1992)statistical package using the maximum likelihoodREGTL program for a binary trait with a variable ageof onset The hypothesis of co-dominant Mendelianinheritance gave a significantly better fit to the datathan either dominant or recessive Mendelian inheri-tance Overall both Mendelian inheritance of a singlemajor gene and purely environmental transmissionwere rejected However there was strong evidence offamilial dependence in melanoma occurrence
The inclusion of risk factors in the models mayreveal whether all or a combination of these explainsthe familial dependence that was demonstrated by thesegregation analysis If a major gene exists that oper-ates independently on these covariates includingthem in the models may reduce residual variation andincrease the power of the analysis to detect a majorgene effect
Estimating Equations Approach
Let y = (dki tki Zki) denote the data collected for the ith
member in the kth family (k = 1hellipK and i = 12)where dki = 0 if the observation is censored tki is eitherthe recorded age at diagnosis of melanoma or the ageat the most recent follow-up for unaffected peopleand Zki is a vector of measured covariates We assumethat censoring time age at diagnosis of melanomaand the covariates are independently distributedThese assumptions can be relaxed in more generalmodels subject to identification constraints Thehazard rate for melanoma is the instantaneous proba-bility that melanoma is diagnosed immediately aftertime t given that the person is unaffected at time tThe hazard rate under the Cox proportional hazardsmodel (Cox 1972) is given by
l(tki) = l0(tki) exp(brsquoZki)
Table 2
Concordant and Discordant Pairs of Relatives in 1912 Families from the Queensland Familial Melanoma Project Probands Were Not Included forthe Calculation of Concordance
Sibndashsib Parentndashchild SecondOthers Total
++ 49 15 41 105
+ndash 763 536 1200 2499
mdash 7011 1078 9817 17906
Total 7823 1629 11058 20510
Figure 2
Cumulative melanoma-free survival among first-degree relatives ofconfirmed cutaneous melanoma cases (probands) diagnosed inQueensland Australia 1982ndash1987 according to the relativersquos rela-tionship to the proband
where l0() is the baseline hazard function and b is avector of regression coefficients
For a specific pair of relatives we follow Clayton(1978) in modeling the bivariate survivor function
F(tk1 tk2) = (F1(tk1)ndashq + F2(tk2)
ndashq ndash1)ndash1q
where F1 and F2 are univariable survivor functions qis a scalar parameter that measures the degree ofdependence between the relativesrsquo times at onsetindependence being implied by q = 0 and positiveassociation by q gt 0 The Clayton model allows nega-tive dependencies and has the property that failuretimes are absolutely continuous for q gt ndash05 In addi-tion the cross-ratio (or odds-ratio) function asstudied by Oakes (1989) is
c(tk1 tk2) = l(tk1 frac12 Tk2 = tk2) l(tk1frac12Tk2 sup3 tk2) =1 + q
This is equivalent to assuming that the odds-ratio isinvariant over the grid region that supports the dataHeuristically the parameter 1 + q is an odds-ratiothat depends on the degree of dependence betweenthe onset ages of the two relatives If genetic factorsdo influence the age at onset of melanoma we wouldexpect to see a higher concordance in the age of onsetin first degree relatives who on average share halftheir genes in common in comparison to seconddegree relatives Under the current model this trans-lates as qfirst-degree gt qsecond-degree
We may use a standard method to estimate withinpair correlations for 2 acute 2 tables from odds ratiosEstimates of relation-pair correlations psibndashsib pparentndashchildand psecond-order are recovered from using the relation-ships for example r sibndashsib = min(1ln(1 + q sibndashsib))Testing for the presence of genetic factors underlyingthe age at diagnosis of melanoma is equivalent totesting H0 p first-order = psecond-order We may test thishypothesis using a z-transform (Kendall 1979 p 315) of the point estimates of the correlation coef-ficients Let n1 and n2 denote the number of first orderand second order relatives let z1and z2 denote thetransformed statistics of r1and r2 the correlation esti-mates for first-order and second-order relative pairsrespectively Specifically we reject H0 when ED gtZ1ndasha where
E = E(z1 ndash z2) = mdash12mdash log 31mdash11
+ndash r
r
1
1mdash2 1mdash11
+ndash r
r2
2
mdash24D2 = V(z1 ndash z2) = mdash
(n1
1ndash 3)mdash+ mdash
(n2
1ndash 3)mdash
and Z1ndasha is the standard normal deviate correspondingto the one-sided a significance level
This approach has the advantage of providing atest for the presence of genetic effects through a singleparameter (q) However it is limited in its ability to
attribute the phenotypic variance to specific effects(eg additive gene action)
Mathematical details of the GEE model and theiterative procedure to estimate the regression coeffi-cients b and specific degrees of dependence q for thedifferent types of relative pairs have been summarizedpreviously in Do et al (2000)
MCMC Analysis Using BUGSThe Bayesian Paradigm and Gibbs Sampling
Markov Chain Monte Carlo (MCMC) is an alterna-tive Bayesian approach that provides estimates oflikelihoods and associated parameter values whenexact computation is infeasible (Hastings 1970Metropolis et al 1953) MCMC methods can beused to draw samples from the underlying joint distri-bution of major genotypes and polygenic valuesconditional on the observed data From thesesamples desired parameters and likelihoods can beestimated without the need to resort to exact compu-tation MCMC methods have been used for linkageanalysis (Kong et al 1992 Lange amp Sobel 1991)for estimation of parameters in the mixed model withand without covariates (Guo amp Thompson 1991Thomas 1992) for estimation of genendashsmoking inter-action and covariate imputation (Gauderman et al1997) for performing combined linkage and segrega-tion analysis (Faucett et al 1993 Guo amp Thompson1992) and for mixed models of large complex pedi-grees (Guo amp Thompson 1994)
In a general setting let y be the observed data andq be everything not observed including parametersand latent variables The implementation of Bayesianmethods using realistic models and priors is com-puter-intensive and relies heavily on cunningcomputational tools to approximate integrals Theproblem in general terms is to obtain the expectedvalue of a function of interest s () under the posteriordensity p(qfrac12x)
E[s(q)] =
which cannot generally be found analytically Onemethod to carry out the integration on the RHS is toperform simulation of exact Bayesian posterior distri-butions using Markov chain Monte Carlo techniquessuch as Gibbs sampling The Gibbs sampler (Gemanamp Geman 1984) is the most popular algorithm usedin MCMC applications to correlated data Gibbssampling was introduced to the main statistical com-munity by Gelfand and Smith (1990) and has sincebeen applied to an even wider array of problems TheGibbs sampler is easy to implement because it onlydepends on the local neighborhood structure In thecontext of pedigree analysis (Olshen amp Wijsman1996) the basic procedure is a sequential updating ofmissing and latent data including the underlying andunobserved major genotypes polygenic effects and
EQ s(q) p(q) p(xfrac12q) dqmdash mdash mdash
EQ p(q) p(xfrac12q) dq
105Twin Research February 2004
Analysis of Melanoma Onset
106 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
environmental effects Values for the missing or latentdata are sampled from the local conditional distribu-tion a function of the observed individual data thecurrent sampled values of other missinglatent data forthis particular individual such as polygenic and environ-mental effects and the values for the sampled geneticeffects in the immediate neighbors of an individualGibbs sampling basically consists of three main steps
Step 1 Setting initial values for unobserved quanti-ties (parameters and latent variables)
Step 2 For each parameter or latent variable qj sample from its ldquofull conditional distributionrdquogiven the current values of all other quantities inthe model
Step 3 Examine sampled values of parameters andlatent variables to monitor convergence and toprovide summary measures
Some of the most recent and popular packages thatimplement Gibbs sampling for analysis of pedigreedata include BUGS (Gilks et al 1994 Spiegelhalter et al 1996ab) Genetic Analysis Package (GAP1996) and MIXD (Olshen amp Wijsman 1996Thompson 1994) We have used BUGS mainlybecause of its flexibility in programming hierarchicalmodels besides being a freeware product
The Model
The aim here is to model the correlation structurewithin the family structure to satisfy the fundamentaladditive genetic model (Crow amp Kimura 1970Falconer 1990 Kempthorne 1960) as followsConsider a nuclear family structure consisting of 4members father mother and two children denoted byF M S1 S2 respectively Using similar notation as inBurton et al (1999) a conventional mixed linearmodel consisting of fixed and random effects may bewritten in the form
Qij = bTz + Aij +Cij + CSij+ Eij
where Qij is the observed value of a normally distrib-uted continuous trait for the jth individual in the ith
nuclear family zij is a vector of observed covariatesrepresenting fixed effects and b is a correspondingvector of unknown fixed regression coefficients Aij Cij and CSij
A denote random effects that repre-sent additive polygenic common family environmentand common sibling environment effects respectivelyThe variation in an individual response is representedby a composite covariance matrix VT and is the sumof an additive genetic covariance matrix VA acommon family environment matrix VC a sharedsibling environment matrix VCs and residual environ-mental effects The different variance components are
The overall total covariance matrix is
The components of variance s 2A s 2
C s 2Cs need not be
positive as long as VT is positive definite A negativevalue for s 2
Cs simply implies that the realized covari-ance between sib lings is less than the realized
107Twin Research February 2004
Analysis of Melanoma Onset
covariance between parent and a child A negativevalue for s 2
Cs may suggest dominance To optimizeconvergence in BUGS Model (1) may be reparameter-ized as
where Fi Gi H i are independent additive randomeffects or latent variables and are the residual errorterms for parents and children respectively If we model
Fi ~ N(0 mdash12mdash s 2
A +s 2C)
Gi ~ N(0 mdash12mdash s 2
A) Hi ~ N(0 s 2Cs)
then the basic genetics covariance model (additivegenetic common environment and unique environ-ment for this particular four-member family structureis satisfied
In survival models unobserved or unmeasuredexplanatory variables some of which may be geneticare often referred to as frailties The frailties takevalues restricted to the posit ive line and may beassumed to act multiplicatively on the hazardExtending the above model to correlated family datawith time-to-onset endpoint a multiplicative individ-ual heterogeneity or frailty term representing thelatent genetic and common environment variablesmay be modeled as random effects simultaneouslywith the effects associated with observed covariatesConsider right censored time to onset of melanomadata (Tij dij zij)1 j n from n relative pairs hereT i j denotes the true age at onset of the jth familymember or the censored time depending on whetherdij =1 or 0 respectively and z denotes a p acute 1 vector ofcovariates A Weibull distribution may be used tomodel time to failure as
f(ti zi) = ebrsquoz igt igndash1 exp(ndashebrsquoz it g
i )
where b is a vector of unknown regression coefficientsand g is the shape parameter of the Weibull distribu-tion This leads to a baseline hazard of the form
l0(ti) = gt igndash1
Re-parameterize by letting mi = ebrsquoz i the conditionaldistribution of ti given mI is then Weibull (gmi) We for-mulated a mixed model to represent the conditionaldistribution of tij given covariate effects random addi-tive genetic and common environment effects as
tijfrac12mij ~ Weibull (g mij) i = 1hellipn j = 12
where
The regression coefficients and the precision of therandom effects (tG tF tH) were given ldquonon-informa-tiverdquo Normal and Gamma priors respectively The shapeparameter g of the time to onset of melanoma distribu-tion was also given a non-informative Gamma priorwhich was slowly decreasing on the positive real line
The overall model (including random effects) maybe described in a Bayesian graph (Figure 3) whichsimplifies sampling from full conditional distributionsby exploiting part ial independence propert ies(Spiegelhalter et al 1996a) In this graph eachrandom quantity is represented by a node which maybe connected by directed or undirected linksConditional independence assumptions are repre-sented by the absence of such links
We implemented the Gibbs sampler using theBUGS program (Gilks et al 1994) (code in theAppendix) Imputation of missing data was handlednaturally in the Gibbs sampling framework by treatingmissing values as additional unknown quantities andrandomly sampling values from their full conditionaldistributions We chose simple prior distributions forimputation since the number of missing values forcovariates was not large (complete for birth year 16missing for naevi and 21 missing for freckles) andthere was no indication of non-random missingness inour data Therefore imputation for missing naevi andfreckles covariates were based on Bernouilli prior dis-tributions with respective parameter values estimatedfrom the complete observations We performed aninitial 10000 burn-in iterations followed by an addi-tional 20000 Parameter estimates were the mean andstandard deviation (SD) of all post convergence Gibbssamples with a thinning interval between 20 and 50credible intervals were computed as the lower andupper a2 percentiles from the last 20000 iterationsConvergence to the posterior distribution was con-firmed by using the different criteria provided by theadd-on CODA package including those of Gelmanand Rubin (1992) Geweke (1992) and Raftery andLewis (1992a 1992b)
ResultsGEE Approach
An inspection of residual plots following preliminarymodel-fitting provided no evidence for the failure ofproportional hazards assumption and did not detectinfluential observations We then proceeded to applythe GEE approach that could estimate regressioncoefficients while incorporating a dependence struc-ture between relative pairs The results are displayed
108 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
in Table 3 which suggest that later birth year havingat least a moderate number of naevi and freckles wereall simultaneously associated with later age at onsetof melanoma Table 3 also presents the estimatedodds-ratios for quantifying the correlation betweenpaired relatives of specific relationships
The odds ratio for sibndashsib pairs is 2973 which issignificantly different from 1 (p lt 001) The odds ratiofor parentndashchild pairs is 1650 which is slightly greaterthan 1 indicating a mild dependency between thesepairs although they are not quite statistically signifi-cant The odds ratio for second-degree and higherrelative pairs is 1155 indicating no dependence at allbetween these pairs This pattern of familial aggrega-tion is compatible with dominance variance as well asadditive genetic variance (Falconer 1990)
Bayesian Approach
We re-analyzed the age-at-onset of melanoma by usingGibbs sampling to impute missing covariates and toestimate subject-specific covariate effects randomadditive genetic common family environment andshared sibling environment effects on the log scale Theestimated shape parameter g was 43 with a 95 credi-ble interval (CI) of (42 46) The results aresummarized in Table 4 (Model A) The residual plotsdid not indicate a gross departure from the underlyingWeibull model and revealed no influential observa-tions We checked the sensitivity of the analyses to
initial parameter values by re-running the Gibbssampler five more times using different starting valuesThe resulting estimates did not differ by more than 5from the values reported here The mean estimate for(s 2
A) was 0452 with 95 CI = (0348 0566) for (s 2C)
was ndash0053 with 95 CI = (ndash0120 0019) and for(s 2
Cs) was 0467 with 95 CI = (0393 0545)A small negative value for the common family
effect suggests that there may be a dominant effect orthat the current model is not quite appropriate forexample that there are systematic effects or correla-tion structures that have not been accounted for Theresults here indicate that additive genetics seem toimpact equally on the variation of the age at onset ofmelanoma Further exploration for alternative modelsis a focus of our future research
In addition we investigated the relative contribu-tion of genetic and environmental effects on theexpression of naevi (Model B) and freckles (ModelC) which are known risk factors for melanoma Theexpressions of naevi and freckles were coded asbinary variables (none or few moles versus moderateor many moles and no freckles versus one or morefreckles) A hierarchical Bayesian binomial model wasfitted to estimate the random variance componentsThe results in Table 4 indicate that a common familyenvironment effect contributed the most to theexpression of naevi (s 2
C = 0704) (relative to the con-tributions of additive genetic effect (s 2
A = 0142) and
Individual j in family i
Figure 3
Graphical model of covariate and random family effects for an individual in a nuclear family structure tij represents the observed failure time forthe jth individual in the ith family with tcenij being an indicator variable of censoring status Full arrows indicate stochastic links to which a probabil-ity is attached broken arrows denote deterministic relationships bs are regression coefficients T is the precision of the prior distribution andequals the inverse of the variance Fi Gi Hi are random effects modeled as Fi ~ N(012 s 2
A + s 2C) Gi ~ N(0 12 s 2
A) and Hi ~ N(0 s 2Cs) Rectangles rep-
resent actual data values for the covariates g and mij are shape and scale parameters for the underlying Weibull distribution
109Twin Research February 2004
Analysis of Melanoma Onset
of shared sibling effect(s 2Cs = 0142) both of which
were non negligible In contrast variation in theexpression of freckles was largely explained by addi-tive genetic and shared family effects(s 2
A = 2050 s 2C =
2600) compared to a relatively small shared siblingeffect(s 2
C = 0115)
Discussion
We applied two methods mdash generalized estimatingequat ion and Bayesian analys is mdash to the geneticanalysis of age at onset of melanoma based on anuclear family structure Under both approaches theresults suggest that additive genetic factors played an
Table 4
Gibbs Sampling Approach Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families
Weibull Model A Mean effects mdash Response variable is Age-at-onset
Covariate RR = eb Coefficient b Robust se(b) 95 CI of bYear of birth 1378 0321 00027 (03160326)
Naevi 1126 0119 00021 (00580185)
Freckling 1017 0017 01400 (ndash00050085)
Weibull Model B Variance components mdash Response variable is Age-at-onset
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0452 0054 (03480566)
s 2C ndash0053 0027 (ndash01200019)
s 2Cs 0467 0040 (03930545)
g 43 0104 (4246)
Binomial Model Variance components mdash Response variable is Naevi
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0142 0149 (00020498)
s 2C 0704 0156 (04031010)
s 2Cs 0195 0157 (000250553)
Binomial Model Variance components mdash Response variable is Freckling
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 2050 0779 (08353570)
s 2C 2600 0418 (17803460)
s 2Cs 0115 0088 (00110312)
Note Indicates Significance at the 005 level) Naevi Is a Binary Variable with Baseline 0 = No or Few Moles Freckling Is Coded as a Binary Variable with Baseline0 = No Freckles
Table 3
GEE Approach Estimated Regression Coefficients in the Proportional Hazard Model and Estimated Odds Ratios for Quantifying FamilialAggregation in Age at Onset of Melanoma in Queensland Families ( Indicates Significance at the 05 Significance Level)
A Mean effects
Covariate RR = eb Coefficient b Robust se(b) Z-statistic
Year of birth 1142 0132 0051 2588
Naevi (Baseline = No or few moles) 1765 0568 0073 7781
Freckling (Baseline = No freckles) 1160 0148 0049 3020
B Pattterns of familial aggregation
Relationship 1 + q se(q) Z-statistic
Sibndashsib 2973 06217 317
Parentndashchild 1650 0434 150
SecondOthers 1155 0270 047
important role in the age at onset of melanoma butthat shared sibling environmental factors were notnegligible We focused attention on these approachesbecause they are more appropriate for modeling cor-related age at onset data and they allow the inclusionof covariates in the analyses Under both approachesthere were suggestions that earlier melanoma onsetwas influenced by later birth year having a moderatenumber of naevi and being freckly The principal dif-ference between the two approaches i s in theinterpretation of the regression coefficients The GEEmethod uses a marginal approach resulting in regres-sion coefficients that describe the average populationresponse to changing covariates whereas the Bayesianapproach produces subject-specific coefficients A sec-ondary distinction is in the nature of the within-pairdependence The GEE model only describes acommon covariance among specific relative pairswhereas the Bayesian approach can explicitly describethe source of this covariance A third advantage of theBayesian method is its flexibility in incorporatingprior information if available for the covariates orlatent effects by modifying their prior distributionsFurther the Bayesian method would permit a moreaccurate decomposition of the genetic variance intoadditive and dominant components thus providingthe means for a direct assessment of the no-domi-nance assumption Finally it is also interesting torecord the amount of CPU time required for eachmethod 20 seconds for the GEE approach andapproximately 3 hours (for binary traits) and 12hours (for age at onset outcome) to run BUGS on asingle-user Intel Pentium III 600 MHz personal com-puter with the Linux Mandrake 71 operating systemThe amount of human and financial resources dedi-cated to collecting maintaining and updating themelanoma family database is extremely high there-fore the extra CPU time requirement by the MCMCmethod is well worth the additional genetic informa-tion and flexibility that it provides A BUGS programis included in the Appendix
Acknowledgments
Data collection was supported by grants from theAustralian National Health and Medical ResearchCouncil (NHMRC) ID 930223 and ID 961061 USNational Cancer Institute (CA88363) Programmingresources were provided by start-up funds from theDepartment of Biostatistics at the M D AndersonCancer Center We acknowledge assistance withBUGS programming from Sang-Joon Lee
ReferencesAbel L amp Bonney G E (1990) A time-dependent
logistic hazard function for modeling variable age ofonset in the analysis of familial diseases GeneticEpidemiology 12 391ndash407
Aitken J F Bailey-Wilson J Green A C MacLennanR amp Martin N G (1998) Segregation analysis ofcutaneous melanoma in Queensland Genetic Epi-demiology 15 391ndash401
Aitken J F Green A C MacLennan R Youl P ampMartin N G (1996) The Queensland familialmelanoma project Study design and characteristics ofparticipants Melanoma Research 6 155ndash165
Allison P D (1995) Survival analysis using the SASsystem mdash A practical guide Cary NC SAS InstituteInc
Andersen P K Borgan O Gill R D amp Keiding N(1993) Statistical models based on count ingprocesses New York Springer-Verlag
Bernardo J M Berber J O David A P amp Smith F FM (Eds) (1992) Bayesian statistics 4 OxfordClarendon Press
Bonney G E (1986) Regressive logistic model for famil-ial disease and other binary traits Biometrics 42611ndash625
Breiman L Friedman J H Olshen R A amp Stone CJ (1984) Classif ication and regression treesWadsworth California
Cannings C amp Thompson E A (1977) Ascertainmentin the sequential sampling of pedigrees ClinicalGenetics 12 208ndash212
Clayton D G (1978) A model for association in bivari-ate life-tables and its application in epidemiologicalstudies of chronic disease incidence Biometrika 65141ndash151
Clayton D G amp Cuziak J (1985) Multivariate gener-alizations of the proportional hazards model (withdiscussion) Journal of the Royal Statistical SocietySeries A 148 82ndash117
Cox D R (1972) Regression models and life tables(with discussion) Journal of the Royal StatisticalSociety Series B 34 187ndash202
Do K-A Broom B M Kuhnert P Duffy D LTodorov A A Treloar S A amp Martin N G(2000) Genetic analysis of the age at menopause byusing estimating equations and Bayesian randomeffects models Statistics in Medicine 19 1217ndash1235
English D R Armstrong B K Kricker A et al(1997) Sunlight and cancer Cancer Causes Control8 271ndash283
Falconer D S (1990) Introduction to quantitativegenetics (3rd ed) New York Longman Group Ltd
Faucett C Gauderman W J Thomas D Ziogas Aamp Sobel E (1993) Combined segregation andlinkage analysis of late-onset Alzheimerrsquos disease inDuke famil ies us ing Gibbs sampling GeneticEpidemiology 10 489ndash494
GAP (1996) Genetic Analysis Package Release 10Pasadena CA Epicenter Software
110 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
111Twin Research February 2004
Analysis of Melanoma Onset
Gauderman W J Morrison J L Carpenter C L ampThomas D C (1997) Analysis of gene-smokinginteraction in lung cancer Genetic Epidemiology 14199ndash214
Gauderman W J amp Thomas D C (1994) Censoredsurvival models for genetic epidemiology A Gibbssampling approach Genetic Epidemiology 11171ndash188
Gelfand A E amp Smith A F M (1990) Sampling-based approaches to calculating marginal densitiesJournal of the American Statistical Association 85398ndash409
Gelman A amp Rubin D B (1992) Inference from itera-tive simulation using mult iple sequences (withdiscussion) Statistical Science 7 457ndash511
Geman S amp Geman D (1984) Stochastic relaxationGibbs distributions and the Bayesian restoration ofimages IEEE Trans Pattern Anal Mach Intelligence6 721ndash741
Geweke J (1992) Evaluating the accuracy of sampling-based approaches to calculating posterior momentsIn J M Bernardo J O Berber A P David amp F FM Smith (Eds) Bayesian statistics 4 OxfordClarendon Press
Gilks W R Thomas A amp Spiegelhalter D J (1994)A language and program for complex Bayesian mod-eling Statistica 43 169ndash178
Guo S W amp Thompson E A (1991) Monte CarloEM for the estimation of multiple random effectsmodels on large pedigrees IMA Journal ofMathematics Applied in Medicine and Biology 8171ndash189
Guo S W amp Thompson E A (1992) A Monte Carlomethod for combined segregation and linkage analy-sis American Journal of Human Genetics 511111ndash1126
Guo S W amp Thompson E A (1994) Monte Carloestimation of mixed models for large complex pedi-grees Biometrics 50 417ndash432
Hastings W K (1970) Monte Carlo sampling methodsusing Markov chains and their applicationsBiometrika 57 97ndash109
Heath A C (1997) Markov Chain Monte Carlo segre-gation and linkage analysis for oligogenic modelsAmerican Journal of Human Genetics 61 748ndash760
Hougaard P (1986) A class of multivariate failure timedistributions Biometrika 73 671ndash678
Hsu L amp Prentice R L (1996) On assessing thestrength of dependency between failure time variatesBiometrika 83 491ndash506
Hsu L amp Zhao L P (1996) Assessing familial aggre-gation of age at onset by using estimating equationswith application to breast cancer American Journalof Human Genetics 58 1057ndash1071
Kempthorne O (1960) Biometrical genetics New YorkPergammon Press
Kendall M amp Stuart A (1979) The advanced theoryof statistics (Vol 2 4th ed) Oxford OxfordUniversity Press
Kong K Frigge M Cox N amp Wong W H (1992)Linkage analysis with adjustments for covariates Amethod combining peeling with Gibbs samplingCytogenetics and Cell Genetics 59 208ndash210
Kuhnert P amp Do K-A (2003) Fitting genetic models totwin data with binary and ordered categoricalresponses A comparison of structural equation mod-eling and Bayesian hierarchical models BehaviorGenetics 33 439ndash452
Lange K amp Sobel E (1991) A random walk methodfor computing genetic location scores AmericanJournal of Human Genetics 49 1320ndash1334
LeBlanc M amp Crowley J (1992) Relative risk trees forcensored survival data Biometrics 48 411ndash425
Li H amp Thompson E A (1997) Semiparametric esti-mation of major gene and random environmentaleffects for age of onset Biometrics 53 282ndash293
Li H amp Wijsman E A (1998) Semiparametric estima-tion of major gene effects for age of onset GeneticEpidemiology 15 279ndash298
MacLennan R Green A C McLeod G R amp MartinN G (1992) Increasing incidence of cutaneousmelanoma in Queensland Australia Journal of theNational Cancer Institute 84 1424ndash1432
Metropolis N Rosenbluth A W Rosenbluth M NTeller A H amp Teller E (1953) Equation of statecalculations by fast computing machines Journal ofChemical Physics 21 1087ndash1092
Nielsen G G Gill R D Andersen P K amp SorensonT I A (1992) A counting process approach tomaximum likelihood estimation in frailty modelsScandinavian Journal of Statistics 19 25ndash43
Oakes D (1989) Bivariate survival models induced byfrailties Journal of American Statistical Association84 487ndash493
Raftery A L Lewis S (1992a) Comment One longrun with diagnostics Implementation strategies forMarkov Chain Monte Carlo Statistical Science 7493ndash497
Raftery A L amp Lewis S (1992b) How many interac-tions in Gibbs sampler (pp 763ndash774) In J MBernardo J O Berber A P David amp F F M Smith(Eds) Bayesian statistics 4 Oxford Clarendon Press
SAGE (1992) Statistical Analysis for GeneticEpidemiology Release 21 Cleveland OH Departmentof Epidemiology and Biostatistics RammelkampCenter for Education and Research MetroHealthCampus Case Western Reserve University
Scurrah K J Palmer L J amp Burton P R (2000)Variance components analysis for pedigree-based cen-sored survival data unusing generalized linear mixedmodels (GLMMS) and Gibbs sampling in BUGSGenetic Epidemiology 19 127ndash148
112 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
Siegmund K D amp McKnight B (1998) Modelinghazard functions in families Genetic Epidemiology15 147ndash171
Spiegelhalter D J Thomas A amp Best N G (1996a)Computation on Bayesian graphical models (pp407ndash425) Oxford Oxford University Press
Spiegelhalter D J Thomas A Best N G amp Gilks W(1996b) BUGS 06 ndashBayesian InferenceUsing GibbsSampling manual Cambridge MRC Biostatistics Unit
Spiegelhalter D J Thomas A Best N G Lunn D(2003) WinBUGS (Bayesian Inference Using GibbsSampling) Windows version 14 user manualCambridge MRC Biostatistics Unit
Siskind V Aitken J Green A amp Martin N (2002)Sun exposure and interaction with family history inrisk of melanoma Queensland Australia Inter-national Journal of Cancer 97 90ndash95
Swerdlow A amp Green A (1987) Melanocytic naeviand melanoma An epidemiological perspectiveBritish Journal of Dermatology 117 137ndash146
Therneau T M amp Atkinson E (1997) An introductionto recursive partitioning using the RPART routines(Technical report) Rochester Mayo Foundation
Thomas D (1992) Fitting genetic data using Gibbs sam-pling An application to nevus counts in 38 Utahkindreds Cytogenetics and Cell Genetics 59 228ndash230
Thompson E A (1994) Monte Carlo programs for pedigree analysis (Technical Report 267)Washington Dept of Statist ics University ofWashington
Vaupel J M Manton K G amp Stallard E (1979) Theimpact of heterogeneity in individual frailty on thedynamics of mortality Demography 16 439ndash454
113Twin Research February 2004
Analysis of Melanoma Onset
Appendix A A BUGS Program to Implement the Bayesian MCMC approach in Modeling an ACEModel for Age at onset of Melanoma with Covariate Effects and Random GeneticsEnvironmental Effects
104 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
To account for the different ages at censoring in eachgeneration due to termination of the study or deathfrom causes other than melanoma we examined thedisease-free survival distribution for each generationusing the standard failure-time analysis technique(Figure 2) The median age at diagnosis of melanomawas 64 among parents of probands 50 among sib-lings of probands and 33 among children ofprobands The disease-free survival functions differedsignificantly between the children generation fromearlier generations (log-rank test with p lt 01) butthere was no significant differences in age-at-onset of
melanoma between the siblings and parents ofprobands (log rank test with p gt 9)
Preliminary Segregation Analysis
A preliminary segregation analysis was conducted(Aitken et al 1998) to investigate whether the famil-ial clustering of cutaneous melanoma is consistentwith Mendelian inheritance of a major autosomalgene Analyses were performed with the SAGE (1992)statistical package using the maximum likelihoodREGTL program for a binary trait with a variable ageof onset The hypothesis of co-dominant Mendelianinheritance gave a significantly better fit to the datathan either dominant or recessive Mendelian inheri-tance Overall both Mendelian inheritance of a singlemajor gene and purely environmental transmissionwere rejected However there was strong evidence offamilial dependence in melanoma occurrence
The inclusion of risk factors in the models mayreveal whether all or a combination of these explainsthe familial dependence that was demonstrated by thesegregation analysis If a major gene exists that oper-ates independently on these covariates includingthem in the models may reduce residual variation andincrease the power of the analysis to detect a majorgene effect
Estimating Equations Approach
Let y = (dki tki Zki) denote the data collected for the ith
member in the kth family (k = 1hellipK and i = 12)where dki = 0 if the observation is censored tki is eitherthe recorded age at diagnosis of melanoma or the ageat the most recent follow-up for unaffected peopleand Zki is a vector of measured covariates We assumethat censoring time age at diagnosis of melanomaand the covariates are independently distributedThese assumptions can be relaxed in more generalmodels subject to identification constraints Thehazard rate for melanoma is the instantaneous proba-bility that melanoma is diagnosed immediately aftertime t given that the person is unaffected at time tThe hazard rate under the Cox proportional hazardsmodel (Cox 1972) is given by
l(tki) = l0(tki) exp(brsquoZki)
Table 2
Concordant and Discordant Pairs of Relatives in 1912 Families from the Queensland Familial Melanoma Project Probands Were Not Included forthe Calculation of Concordance
Sibndashsib Parentndashchild SecondOthers Total
++ 49 15 41 105
+ndash 763 536 1200 2499
mdash 7011 1078 9817 17906
Total 7823 1629 11058 20510
Figure 2
Cumulative melanoma-free survival among first-degree relatives ofconfirmed cutaneous melanoma cases (probands) diagnosed inQueensland Australia 1982ndash1987 according to the relativersquos rela-tionship to the proband
where l0() is the baseline hazard function and b is avector of regression coefficients
For a specific pair of relatives we follow Clayton(1978) in modeling the bivariate survivor function
F(tk1 tk2) = (F1(tk1)ndashq + F2(tk2)
ndashq ndash1)ndash1q
where F1 and F2 are univariable survivor functions qis a scalar parameter that measures the degree ofdependence between the relativesrsquo times at onsetindependence being implied by q = 0 and positiveassociation by q gt 0 The Clayton model allows nega-tive dependencies and has the property that failuretimes are absolutely continuous for q gt ndash05 In addi-tion the cross-ratio (or odds-ratio) function asstudied by Oakes (1989) is
c(tk1 tk2) = l(tk1 frac12 Tk2 = tk2) l(tk1frac12Tk2 sup3 tk2) =1 + q
This is equivalent to assuming that the odds-ratio isinvariant over the grid region that supports the dataHeuristically the parameter 1 + q is an odds-ratiothat depends on the degree of dependence betweenthe onset ages of the two relatives If genetic factorsdo influence the age at onset of melanoma we wouldexpect to see a higher concordance in the age of onsetin first degree relatives who on average share halftheir genes in common in comparison to seconddegree relatives Under the current model this trans-lates as qfirst-degree gt qsecond-degree
We may use a standard method to estimate withinpair correlations for 2 acute 2 tables from odds ratiosEstimates of relation-pair correlations psibndashsib pparentndashchildand psecond-order are recovered from using the relation-ships for example r sibndashsib = min(1ln(1 + q sibndashsib))Testing for the presence of genetic factors underlyingthe age at diagnosis of melanoma is equivalent totesting H0 p first-order = psecond-order We may test thishypothesis using a z-transform (Kendall 1979 p 315) of the point estimates of the correlation coef-ficients Let n1 and n2 denote the number of first orderand second order relatives let z1and z2 denote thetransformed statistics of r1and r2 the correlation esti-mates for first-order and second-order relative pairsrespectively Specifically we reject H0 when ED gtZ1ndasha where
E = E(z1 ndash z2) = mdash12mdash log 31mdash11
+ndash r
r
1
1mdash2 1mdash11
+ndash r
r2
2
mdash24D2 = V(z1 ndash z2) = mdash
(n1
1ndash 3)mdash+ mdash
(n2
1ndash 3)mdash
and Z1ndasha is the standard normal deviate correspondingto the one-sided a significance level
This approach has the advantage of providing atest for the presence of genetic effects through a singleparameter (q) However it is limited in its ability to
attribute the phenotypic variance to specific effects(eg additive gene action)
Mathematical details of the GEE model and theiterative procedure to estimate the regression coeffi-cients b and specific degrees of dependence q for thedifferent types of relative pairs have been summarizedpreviously in Do et al (2000)
MCMC Analysis Using BUGSThe Bayesian Paradigm and Gibbs Sampling
Markov Chain Monte Carlo (MCMC) is an alterna-tive Bayesian approach that provides estimates oflikelihoods and associated parameter values whenexact computation is infeasible (Hastings 1970Metropolis et al 1953) MCMC methods can beused to draw samples from the underlying joint distri-bution of major genotypes and polygenic valuesconditional on the observed data From thesesamples desired parameters and likelihoods can beestimated without the need to resort to exact compu-tation MCMC methods have been used for linkageanalysis (Kong et al 1992 Lange amp Sobel 1991)for estimation of parameters in the mixed model withand without covariates (Guo amp Thompson 1991Thomas 1992) for estimation of genendashsmoking inter-action and covariate imputation (Gauderman et al1997) for performing combined linkage and segrega-tion analysis (Faucett et al 1993 Guo amp Thompson1992) and for mixed models of large complex pedi-grees (Guo amp Thompson 1994)
In a general setting let y be the observed data andq be everything not observed including parametersand latent variables The implementation of Bayesianmethods using realistic models and priors is com-puter-intensive and relies heavily on cunningcomputational tools to approximate integrals Theproblem in general terms is to obtain the expectedvalue of a function of interest s () under the posteriordensity p(qfrac12x)
E[s(q)] =
which cannot generally be found analytically Onemethod to carry out the integration on the RHS is toperform simulation of exact Bayesian posterior distri-butions using Markov chain Monte Carlo techniquessuch as Gibbs sampling The Gibbs sampler (Gemanamp Geman 1984) is the most popular algorithm usedin MCMC applications to correlated data Gibbssampling was introduced to the main statistical com-munity by Gelfand and Smith (1990) and has sincebeen applied to an even wider array of problems TheGibbs sampler is easy to implement because it onlydepends on the local neighborhood structure In thecontext of pedigree analysis (Olshen amp Wijsman1996) the basic procedure is a sequential updating ofmissing and latent data including the underlying andunobserved major genotypes polygenic effects and
EQ s(q) p(q) p(xfrac12q) dqmdash mdash mdash
EQ p(q) p(xfrac12q) dq
105Twin Research February 2004
Analysis of Melanoma Onset
106 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
environmental effects Values for the missing or latentdata are sampled from the local conditional distribu-tion a function of the observed individual data thecurrent sampled values of other missinglatent data forthis particular individual such as polygenic and environ-mental effects and the values for the sampled geneticeffects in the immediate neighbors of an individualGibbs sampling basically consists of three main steps
Step 1 Setting initial values for unobserved quanti-ties (parameters and latent variables)
Step 2 For each parameter or latent variable qj sample from its ldquofull conditional distributionrdquogiven the current values of all other quantities inthe model
Step 3 Examine sampled values of parameters andlatent variables to monitor convergence and toprovide summary measures
Some of the most recent and popular packages thatimplement Gibbs sampling for analysis of pedigreedata include BUGS (Gilks et al 1994 Spiegelhalter et al 1996ab) Genetic Analysis Package (GAP1996) and MIXD (Olshen amp Wijsman 1996Thompson 1994) We have used BUGS mainlybecause of its flexibility in programming hierarchicalmodels besides being a freeware product
The Model
The aim here is to model the correlation structurewithin the family structure to satisfy the fundamentaladditive genetic model (Crow amp Kimura 1970Falconer 1990 Kempthorne 1960) as followsConsider a nuclear family structure consisting of 4members father mother and two children denoted byF M S1 S2 respectively Using similar notation as inBurton et al (1999) a conventional mixed linearmodel consisting of fixed and random effects may bewritten in the form
Qij = bTz + Aij +Cij + CSij+ Eij
where Qij is the observed value of a normally distrib-uted continuous trait for the jth individual in the ith
nuclear family zij is a vector of observed covariatesrepresenting fixed effects and b is a correspondingvector of unknown fixed regression coefficients Aij Cij and CSij
A denote random effects that repre-sent additive polygenic common family environmentand common sibling environment effects respectivelyThe variation in an individual response is representedby a composite covariance matrix VT and is the sumof an additive genetic covariance matrix VA acommon family environment matrix VC a sharedsibling environment matrix VCs and residual environ-mental effects The different variance components are
The overall total covariance matrix is
The components of variance s 2A s 2
C s 2Cs need not be
positive as long as VT is positive definite A negativevalue for s 2
Cs simply implies that the realized covari-ance between sib lings is less than the realized
107Twin Research February 2004
Analysis of Melanoma Onset
covariance between parent and a child A negativevalue for s 2
Cs may suggest dominance To optimizeconvergence in BUGS Model (1) may be reparameter-ized as
where Fi Gi H i are independent additive randomeffects or latent variables and are the residual errorterms for parents and children respectively If we model
Fi ~ N(0 mdash12mdash s 2
A +s 2C)
Gi ~ N(0 mdash12mdash s 2
A) Hi ~ N(0 s 2Cs)
then the basic genetics covariance model (additivegenetic common environment and unique environ-ment for this particular four-member family structureis satisfied
In survival models unobserved or unmeasuredexplanatory variables some of which may be geneticare often referred to as frailties The frailties takevalues restricted to the posit ive line and may beassumed to act multiplicatively on the hazardExtending the above model to correlated family datawith time-to-onset endpoint a multiplicative individ-ual heterogeneity or frailty term representing thelatent genetic and common environment variablesmay be modeled as random effects simultaneouslywith the effects associated with observed covariatesConsider right censored time to onset of melanomadata (Tij dij zij)1 j n from n relative pairs hereT i j denotes the true age at onset of the jth familymember or the censored time depending on whetherdij =1 or 0 respectively and z denotes a p acute 1 vector ofcovariates A Weibull distribution may be used tomodel time to failure as
f(ti zi) = ebrsquoz igt igndash1 exp(ndashebrsquoz it g
i )
where b is a vector of unknown regression coefficientsand g is the shape parameter of the Weibull distribu-tion This leads to a baseline hazard of the form
l0(ti) = gt igndash1
Re-parameterize by letting mi = ebrsquoz i the conditionaldistribution of ti given mI is then Weibull (gmi) We for-mulated a mixed model to represent the conditionaldistribution of tij given covariate effects random addi-tive genetic and common environment effects as
tijfrac12mij ~ Weibull (g mij) i = 1hellipn j = 12
where
The regression coefficients and the precision of therandom effects (tG tF tH) were given ldquonon-informa-tiverdquo Normal and Gamma priors respectively The shapeparameter g of the time to onset of melanoma distribu-tion was also given a non-informative Gamma priorwhich was slowly decreasing on the positive real line
The overall model (including random effects) maybe described in a Bayesian graph (Figure 3) whichsimplifies sampling from full conditional distributionsby exploiting part ial independence propert ies(Spiegelhalter et al 1996a) In this graph eachrandom quantity is represented by a node which maybe connected by directed or undirected linksConditional independence assumptions are repre-sented by the absence of such links
We implemented the Gibbs sampler using theBUGS program (Gilks et al 1994) (code in theAppendix) Imputation of missing data was handlednaturally in the Gibbs sampling framework by treatingmissing values as additional unknown quantities andrandomly sampling values from their full conditionaldistributions We chose simple prior distributions forimputation since the number of missing values forcovariates was not large (complete for birth year 16missing for naevi and 21 missing for freckles) andthere was no indication of non-random missingness inour data Therefore imputation for missing naevi andfreckles covariates were based on Bernouilli prior dis-tributions with respective parameter values estimatedfrom the complete observations We performed aninitial 10000 burn-in iterations followed by an addi-tional 20000 Parameter estimates were the mean andstandard deviation (SD) of all post convergence Gibbssamples with a thinning interval between 20 and 50credible intervals were computed as the lower andupper a2 percentiles from the last 20000 iterationsConvergence to the posterior distribution was con-firmed by using the different criteria provided by theadd-on CODA package including those of Gelmanand Rubin (1992) Geweke (1992) and Raftery andLewis (1992a 1992b)
ResultsGEE Approach
An inspection of residual plots following preliminarymodel-fitting provided no evidence for the failure ofproportional hazards assumption and did not detectinfluential observations We then proceeded to applythe GEE approach that could estimate regressioncoefficients while incorporating a dependence struc-ture between relative pairs The results are displayed
108 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
in Table 3 which suggest that later birth year havingat least a moderate number of naevi and freckles wereall simultaneously associated with later age at onsetof melanoma Table 3 also presents the estimatedodds-ratios for quantifying the correlation betweenpaired relatives of specific relationships
The odds ratio for sibndashsib pairs is 2973 which issignificantly different from 1 (p lt 001) The odds ratiofor parentndashchild pairs is 1650 which is slightly greaterthan 1 indicating a mild dependency between thesepairs although they are not quite statistically signifi-cant The odds ratio for second-degree and higherrelative pairs is 1155 indicating no dependence at allbetween these pairs This pattern of familial aggrega-tion is compatible with dominance variance as well asadditive genetic variance (Falconer 1990)
Bayesian Approach
We re-analyzed the age-at-onset of melanoma by usingGibbs sampling to impute missing covariates and toestimate subject-specific covariate effects randomadditive genetic common family environment andshared sibling environment effects on the log scale Theestimated shape parameter g was 43 with a 95 credi-ble interval (CI) of (42 46) The results aresummarized in Table 4 (Model A) The residual plotsdid not indicate a gross departure from the underlyingWeibull model and revealed no influential observa-tions We checked the sensitivity of the analyses to
initial parameter values by re-running the Gibbssampler five more times using different starting valuesThe resulting estimates did not differ by more than 5from the values reported here The mean estimate for(s 2
A) was 0452 with 95 CI = (0348 0566) for (s 2C)
was ndash0053 with 95 CI = (ndash0120 0019) and for(s 2
Cs) was 0467 with 95 CI = (0393 0545)A small negative value for the common family
effect suggests that there may be a dominant effect orthat the current model is not quite appropriate forexample that there are systematic effects or correla-tion structures that have not been accounted for Theresults here indicate that additive genetics seem toimpact equally on the variation of the age at onset ofmelanoma Further exploration for alternative modelsis a focus of our future research
In addition we investigated the relative contribu-tion of genetic and environmental effects on theexpression of naevi (Model B) and freckles (ModelC) which are known risk factors for melanoma Theexpressions of naevi and freckles were coded asbinary variables (none or few moles versus moderateor many moles and no freckles versus one or morefreckles) A hierarchical Bayesian binomial model wasfitted to estimate the random variance componentsThe results in Table 4 indicate that a common familyenvironment effect contributed the most to theexpression of naevi (s 2
C = 0704) (relative to the con-tributions of additive genetic effect (s 2
A = 0142) and
Individual j in family i
Figure 3
Graphical model of covariate and random family effects for an individual in a nuclear family structure tij represents the observed failure time forthe jth individual in the ith family with tcenij being an indicator variable of censoring status Full arrows indicate stochastic links to which a probabil-ity is attached broken arrows denote deterministic relationships bs are regression coefficients T is the precision of the prior distribution andequals the inverse of the variance Fi Gi Hi are random effects modeled as Fi ~ N(012 s 2
A + s 2C) Gi ~ N(0 12 s 2
A) and Hi ~ N(0 s 2Cs) Rectangles rep-
resent actual data values for the covariates g and mij are shape and scale parameters for the underlying Weibull distribution
109Twin Research February 2004
Analysis of Melanoma Onset
of shared sibling effect(s 2Cs = 0142) both of which
were non negligible In contrast variation in theexpression of freckles was largely explained by addi-tive genetic and shared family effects(s 2
A = 2050 s 2C =
2600) compared to a relatively small shared siblingeffect(s 2
C = 0115)
Discussion
We applied two methods mdash generalized estimatingequat ion and Bayesian analys is mdash to the geneticanalysis of age at onset of melanoma based on anuclear family structure Under both approaches theresults suggest that additive genetic factors played an
Table 4
Gibbs Sampling Approach Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families
Weibull Model A Mean effects mdash Response variable is Age-at-onset
Covariate RR = eb Coefficient b Robust se(b) 95 CI of bYear of birth 1378 0321 00027 (03160326)
Naevi 1126 0119 00021 (00580185)
Freckling 1017 0017 01400 (ndash00050085)
Weibull Model B Variance components mdash Response variable is Age-at-onset
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0452 0054 (03480566)
s 2C ndash0053 0027 (ndash01200019)
s 2Cs 0467 0040 (03930545)
g 43 0104 (4246)
Binomial Model Variance components mdash Response variable is Naevi
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0142 0149 (00020498)
s 2C 0704 0156 (04031010)
s 2Cs 0195 0157 (000250553)
Binomial Model Variance components mdash Response variable is Freckling
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 2050 0779 (08353570)
s 2C 2600 0418 (17803460)
s 2Cs 0115 0088 (00110312)
Note Indicates Significance at the 005 level) Naevi Is a Binary Variable with Baseline 0 = No or Few Moles Freckling Is Coded as a Binary Variable with Baseline0 = No Freckles
Table 3
GEE Approach Estimated Regression Coefficients in the Proportional Hazard Model and Estimated Odds Ratios for Quantifying FamilialAggregation in Age at Onset of Melanoma in Queensland Families ( Indicates Significance at the 05 Significance Level)
A Mean effects
Covariate RR = eb Coefficient b Robust se(b) Z-statistic
Year of birth 1142 0132 0051 2588
Naevi (Baseline = No or few moles) 1765 0568 0073 7781
Freckling (Baseline = No freckles) 1160 0148 0049 3020
B Pattterns of familial aggregation
Relationship 1 + q se(q) Z-statistic
Sibndashsib 2973 06217 317
Parentndashchild 1650 0434 150
SecondOthers 1155 0270 047
important role in the age at onset of melanoma butthat shared sibling environmental factors were notnegligible We focused attention on these approachesbecause they are more appropriate for modeling cor-related age at onset data and they allow the inclusionof covariates in the analyses Under both approachesthere were suggestions that earlier melanoma onsetwas influenced by later birth year having a moderatenumber of naevi and being freckly The principal dif-ference between the two approaches i s in theinterpretation of the regression coefficients The GEEmethod uses a marginal approach resulting in regres-sion coefficients that describe the average populationresponse to changing covariates whereas the Bayesianapproach produces subject-specific coefficients A sec-ondary distinction is in the nature of the within-pairdependence The GEE model only describes acommon covariance among specific relative pairswhereas the Bayesian approach can explicitly describethe source of this covariance A third advantage of theBayesian method is its flexibility in incorporatingprior information if available for the covariates orlatent effects by modifying their prior distributionsFurther the Bayesian method would permit a moreaccurate decomposition of the genetic variance intoadditive and dominant components thus providingthe means for a direct assessment of the no-domi-nance assumption Finally it is also interesting torecord the amount of CPU time required for eachmethod 20 seconds for the GEE approach andapproximately 3 hours (for binary traits) and 12hours (for age at onset outcome) to run BUGS on asingle-user Intel Pentium III 600 MHz personal com-puter with the Linux Mandrake 71 operating systemThe amount of human and financial resources dedi-cated to collecting maintaining and updating themelanoma family database is extremely high there-fore the extra CPU time requirement by the MCMCmethod is well worth the additional genetic informa-tion and flexibility that it provides A BUGS programis included in the Appendix
Acknowledgments
Data collection was supported by grants from theAustralian National Health and Medical ResearchCouncil (NHMRC) ID 930223 and ID 961061 USNational Cancer Institute (CA88363) Programmingresources were provided by start-up funds from theDepartment of Biostatistics at the M D AndersonCancer Center We acknowledge assistance withBUGS programming from Sang-Joon Lee
ReferencesAbel L amp Bonney G E (1990) A time-dependent
logistic hazard function for modeling variable age ofonset in the analysis of familial diseases GeneticEpidemiology 12 391ndash407
Aitken J F Bailey-Wilson J Green A C MacLennanR amp Martin N G (1998) Segregation analysis ofcutaneous melanoma in Queensland Genetic Epi-demiology 15 391ndash401
Aitken J F Green A C MacLennan R Youl P ampMartin N G (1996) The Queensland familialmelanoma project Study design and characteristics ofparticipants Melanoma Research 6 155ndash165
Allison P D (1995) Survival analysis using the SASsystem mdash A practical guide Cary NC SAS InstituteInc
Andersen P K Borgan O Gill R D amp Keiding N(1993) Statistical models based on count ingprocesses New York Springer-Verlag
Bernardo J M Berber J O David A P amp Smith F FM (Eds) (1992) Bayesian statistics 4 OxfordClarendon Press
Bonney G E (1986) Regressive logistic model for famil-ial disease and other binary traits Biometrics 42611ndash625
Breiman L Friedman J H Olshen R A amp Stone CJ (1984) Classif ication and regression treesWadsworth California
Cannings C amp Thompson E A (1977) Ascertainmentin the sequential sampling of pedigrees ClinicalGenetics 12 208ndash212
Clayton D G (1978) A model for association in bivari-ate life-tables and its application in epidemiologicalstudies of chronic disease incidence Biometrika 65141ndash151
Clayton D G amp Cuziak J (1985) Multivariate gener-alizations of the proportional hazards model (withdiscussion) Journal of the Royal Statistical SocietySeries A 148 82ndash117
Cox D R (1972) Regression models and life tables(with discussion) Journal of the Royal StatisticalSociety Series B 34 187ndash202
Do K-A Broom B M Kuhnert P Duffy D LTodorov A A Treloar S A amp Martin N G(2000) Genetic analysis of the age at menopause byusing estimating equations and Bayesian randomeffects models Statistics in Medicine 19 1217ndash1235
English D R Armstrong B K Kricker A et al(1997) Sunlight and cancer Cancer Causes Control8 271ndash283
Falconer D S (1990) Introduction to quantitativegenetics (3rd ed) New York Longman Group Ltd
Faucett C Gauderman W J Thomas D Ziogas Aamp Sobel E (1993) Combined segregation andlinkage analysis of late-onset Alzheimerrsquos disease inDuke famil ies us ing Gibbs sampling GeneticEpidemiology 10 489ndash494
GAP (1996) Genetic Analysis Package Release 10Pasadena CA Epicenter Software
110 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
111Twin Research February 2004
Analysis of Melanoma Onset
Gauderman W J Morrison J L Carpenter C L ampThomas D C (1997) Analysis of gene-smokinginteraction in lung cancer Genetic Epidemiology 14199ndash214
Gauderman W J amp Thomas D C (1994) Censoredsurvival models for genetic epidemiology A Gibbssampling approach Genetic Epidemiology 11171ndash188
Gelfand A E amp Smith A F M (1990) Sampling-based approaches to calculating marginal densitiesJournal of the American Statistical Association 85398ndash409
Gelman A amp Rubin D B (1992) Inference from itera-tive simulation using mult iple sequences (withdiscussion) Statistical Science 7 457ndash511
Geman S amp Geman D (1984) Stochastic relaxationGibbs distributions and the Bayesian restoration ofimages IEEE Trans Pattern Anal Mach Intelligence6 721ndash741
Geweke J (1992) Evaluating the accuracy of sampling-based approaches to calculating posterior momentsIn J M Bernardo J O Berber A P David amp F FM Smith (Eds) Bayesian statistics 4 OxfordClarendon Press
Gilks W R Thomas A amp Spiegelhalter D J (1994)A language and program for complex Bayesian mod-eling Statistica 43 169ndash178
Guo S W amp Thompson E A (1991) Monte CarloEM for the estimation of multiple random effectsmodels on large pedigrees IMA Journal ofMathematics Applied in Medicine and Biology 8171ndash189
Guo S W amp Thompson E A (1992) A Monte Carlomethod for combined segregation and linkage analy-sis American Journal of Human Genetics 511111ndash1126
Guo S W amp Thompson E A (1994) Monte Carloestimation of mixed models for large complex pedi-grees Biometrics 50 417ndash432
Hastings W K (1970) Monte Carlo sampling methodsusing Markov chains and their applicationsBiometrika 57 97ndash109
Heath A C (1997) Markov Chain Monte Carlo segre-gation and linkage analysis for oligogenic modelsAmerican Journal of Human Genetics 61 748ndash760
Hougaard P (1986) A class of multivariate failure timedistributions Biometrika 73 671ndash678
Hsu L amp Prentice R L (1996) On assessing thestrength of dependency between failure time variatesBiometrika 83 491ndash506
Hsu L amp Zhao L P (1996) Assessing familial aggre-gation of age at onset by using estimating equationswith application to breast cancer American Journalof Human Genetics 58 1057ndash1071
Kempthorne O (1960) Biometrical genetics New YorkPergammon Press
Kendall M amp Stuart A (1979) The advanced theoryof statistics (Vol 2 4th ed) Oxford OxfordUniversity Press
Kong K Frigge M Cox N amp Wong W H (1992)Linkage analysis with adjustments for covariates Amethod combining peeling with Gibbs samplingCytogenetics and Cell Genetics 59 208ndash210
Kuhnert P amp Do K-A (2003) Fitting genetic models totwin data with binary and ordered categoricalresponses A comparison of structural equation mod-eling and Bayesian hierarchical models BehaviorGenetics 33 439ndash452
Lange K amp Sobel E (1991) A random walk methodfor computing genetic location scores AmericanJournal of Human Genetics 49 1320ndash1334
LeBlanc M amp Crowley J (1992) Relative risk trees forcensored survival data Biometrics 48 411ndash425
Li H amp Thompson E A (1997) Semiparametric esti-mation of major gene and random environmentaleffects for age of onset Biometrics 53 282ndash293
Li H amp Wijsman E A (1998) Semiparametric estima-tion of major gene effects for age of onset GeneticEpidemiology 15 279ndash298
MacLennan R Green A C McLeod G R amp MartinN G (1992) Increasing incidence of cutaneousmelanoma in Queensland Australia Journal of theNational Cancer Institute 84 1424ndash1432
Metropolis N Rosenbluth A W Rosenbluth M NTeller A H amp Teller E (1953) Equation of statecalculations by fast computing machines Journal ofChemical Physics 21 1087ndash1092
Nielsen G G Gill R D Andersen P K amp SorensonT I A (1992) A counting process approach tomaximum likelihood estimation in frailty modelsScandinavian Journal of Statistics 19 25ndash43
Oakes D (1989) Bivariate survival models induced byfrailties Journal of American Statistical Association84 487ndash493
Raftery A L Lewis S (1992a) Comment One longrun with diagnostics Implementation strategies forMarkov Chain Monte Carlo Statistical Science 7493ndash497
Raftery A L amp Lewis S (1992b) How many interac-tions in Gibbs sampler (pp 763ndash774) In J MBernardo J O Berber A P David amp F F M Smith(Eds) Bayesian statistics 4 Oxford Clarendon Press
SAGE (1992) Statistical Analysis for GeneticEpidemiology Release 21 Cleveland OH Departmentof Epidemiology and Biostatistics RammelkampCenter for Education and Research MetroHealthCampus Case Western Reserve University
Scurrah K J Palmer L J amp Burton P R (2000)Variance components analysis for pedigree-based cen-sored survival data unusing generalized linear mixedmodels (GLMMS) and Gibbs sampling in BUGSGenetic Epidemiology 19 127ndash148
112 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
Siegmund K D amp McKnight B (1998) Modelinghazard functions in families Genetic Epidemiology15 147ndash171
Spiegelhalter D J Thomas A amp Best N G (1996a)Computation on Bayesian graphical models (pp407ndash425) Oxford Oxford University Press
Spiegelhalter D J Thomas A Best N G amp Gilks W(1996b) BUGS 06 ndashBayesian InferenceUsing GibbsSampling manual Cambridge MRC Biostatistics Unit
Spiegelhalter D J Thomas A Best N G Lunn D(2003) WinBUGS (Bayesian Inference Using GibbsSampling) Windows version 14 user manualCambridge MRC Biostatistics Unit
Siskind V Aitken J Green A amp Martin N (2002)Sun exposure and interaction with family history inrisk of melanoma Queensland Australia Inter-national Journal of Cancer 97 90ndash95
Swerdlow A amp Green A (1987) Melanocytic naeviand melanoma An epidemiological perspectiveBritish Journal of Dermatology 117 137ndash146
Therneau T M amp Atkinson E (1997) An introductionto recursive partitioning using the RPART routines(Technical report) Rochester Mayo Foundation
Thomas D (1992) Fitting genetic data using Gibbs sam-pling An application to nevus counts in 38 Utahkindreds Cytogenetics and Cell Genetics 59 228ndash230
Thompson E A (1994) Monte Carlo programs for pedigree analysis (Technical Report 267)Washington Dept of Statist ics University ofWashington
Vaupel J M Manton K G amp Stallard E (1979) Theimpact of heterogeneity in individual frailty on thedynamics of mortality Demography 16 439ndash454
113Twin Research February 2004
Analysis of Melanoma Onset
Appendix A A BUGS Program to Implement the Bayesian MCMC approach in Modeling an ACEModel for Age at onset of Melanoma with Covariate Effects and Random GeneticsEnvironmental Effects
where l0() is the baseline hazard function and b is avector of regression coefficients
For a specific pair of relatives we follow Clayton(1978) in modeling the bivariate survivor function
F(tk1 tk2) = (F1(tk1)ndashq + F2(tk2)
ndashq ndash1)ndash1q
where F1 and F2 are univariable survivor functions qis a scalar parameter that measures the degree ofdependence between the relativesrsquo times at onsetindependence being implied by q = 0 and positiveassociation by q gt 0 The Clayton model allows nega-tive dependencies and has the property that failuretimes are absolutely continuous for q gt ndash05 In addi-tion the cross-ratio (or odds-ratio) function asstudied by Oakes (1989) is
c(tk1 tk2) = l(tk1 frac12 Tk2 = tk2) l(tk1frac12Tk2 sup3 tk2) =1 + q
This is equivalent to assuming that the odds-ratio isinvariant over the grid region that supports the dataHeuristically the parameter 1 + q is an odds-ratiothat depends on the degree of dependence betweenthe onset ages of the two relatives If genetic factorsdo influence the age at onset of melanoma we wouldexpect to see a higher concordance in the age of onsetin first degree relatives who on average share halftheir genes in common in comparison to seconddegree relatives Under the current model this trans-lates as qfirst-degree gt qsecond-degree
We may use a standard method to estimate withinpair correlations for 2 acute 2 tables from odds ratiosEstimates of relation-pair correlations psibndashsib pparentndashchildand psecond-order are recovered from using the relation-ships for example r sibndashsib = min(1ln(1 + q sibndashsib))Testing for the presence of genetic factors underlyingthe age at diagnosis of melanoma is equivalent totesting H0 p first-order = psecond-order We may test thishypothesis using a z-transform (Kendall 1979 p 315) of the point estimates of the correlation coef-ficients Let n1 and n2 denote the number of first orderand second order relatives let z1and z2 denote thetransformed statistics of r1and r2 the correlation esti-mates for first-order and second-order relative pairsrespectively Specifically we reject H0 when ED gtZ1ndasha where
E = E(z1 ndash z2) = mdash12mdash log 31mdash11
+ndash r
r
1
1mdash2 1mdash11
+ndash r
r2
2
mdash24D2 = V(z1 ndash z2) = mdash
(n1
1ndash 3)mdash+ mdash
(n2
1ndash 3)mdash
and Z1ndasha is the standard normal deviate correspondingto the one-sided a significance level
This approach has the advantage of providing atest for the presence of genetic effects through a singleparameter (q) However it is limited in its ability to
attribute the phenotypic variance to specific effects(eg additive gene action)
Mathematical details of the GEE model and theiterative procedure to estimate the regression coeffi-cients b and specific degrees of dependence q for thedifferent types of relative pairs have been summarizedpreviously in Do et al (2000)
MCMC Analysis Using BUGSThe Bayesian Paradigm and Gibbs Sampling
Markov Chain Monte Carlo (MCMC) is an alterna-tive Bayesian approach that provides estimates oflikelihoods and associated parameter values whenexact computation is infeasible (Hastings 1970Metropolis et al 1953) MCMC methods can beused to draw samples from the underlying joint distri-bution of major genotypes and polygenic valuesconditional on the observed data From thesesamples desired parameters and likelihoods can beestimated without the need to resort to exact compu-tation MCMC methods have been used for linkageanalysis (Kong et al 1992 Lange amp Sobel 1991)for estimation of parameters in the mixed model withand without covariates (Guo amp Thompson 1991Thomas 1992) for estimation of genendashsmoking inter-action and covariate imputation (Gauderman et al1997) for performing combined linkage and segrega-tion analysis (Faucett et al 1993 Guo amp Thompson1992) and for mixed models of large complex pedi-grees (Guo amp Thompson 1994)
In a general setting let y be the observed data andq be everything not observed including parametersand latent variables The implementation of Bayesianmethods using realistic models and priors is com-puter-intensive and relies heavily on cunningcomputational tools to approximate integrals Theproblem in general terms is to obtain the expectedvalue of a function of interest s () under the posteriordensity p(qfrac12x)
E[s(q)] =
which cannot generally be found analytically Onemethod to carry out the integration on the RHS is toperform simulation of exact Bayesian posterior distri-butions using Markov chain Monte Carlo techniquessuch as Gibbs sampling The Gibbs sampler (Gemanamp Geman 1984) is the most popular algorithm usedin MCMC applications to correlated data Gibbssampling was introduced to the main statistical com-munity by Gelfand and Smith (1990) and has sincebeen applied to an even wider array of problems TheGibbs sampler is easy to implement because it onlydepends on the local neighborhood structure In thecontext of pedigree analysis (Olshen amp Wijsman1996) the basic procedure is a sequential updating ofmissing and latent data including the underlying andunobserved major genotypes polygenic effects and
EQ s(q) p(q) p(xfrac12q) dqmdash mdash mdash
EQ p(q) p(xfrac12q) dq
105Twin Research February 2004
Analysis of Melanoma Onset
106 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
environmental effects Values for the missing or latentdata are sampled from the local conditional distribu-tion a function of the observed individual data thecurrent sampled values of other missinglatent data forthis particular individual such as polygenic and environ-mental effects and the values for the sampled geneticeffects in the immediate neighbors of an individualGibbs sampling basically consists of three main steps
Step 1 Setting initial values for unobserved quanti-ties (parameters and latent variables)
Step 2 For each parameter or latent variable qj sample from its ldquofull conditional distributionrdquogiven the current values of all other quantities inthe model
Step 3 Examine sampled values of parameters andlatent variables to monitor convergence and toprovide summary measures
Some of the most recent and popular packages thatimplement Gibbs sampling for analysis of pedigreedata include BUGS (Gilks et al 1994 Spiegelhalter et al 1996ab) Genetic Analysis Package (GAP1996) and MIXD (Olshen amp Wijsman 1996Thompson 1994) We have used BUGS mainlybecause of its flexibility in programming hierarchicalmodels besides being a freeware product
The Model
The aim here is to model the correlation structurewithin the family structure to satisfy the fundamentaladditive genetic model (Crow amp Kimura 1970Falconer 1990 Kempthorne 1960) as followsConsider a nuclear family structure consisting of 4members father mother and two children denoted byF M S1 S2 respectively Using similar notation as inBurton et al (1999) a conventional mixed linearmodel consisting of fixed and random effects may bewritten in the form
Qij = bTz + Aij +Cij + CSij+ Eij
where Qij is the observed value of a normally distrib-uted continuous trait for the jth individual in the ith
nuclear family zij is a vector of observed covariatesrepresenting fixed effects and b is a correspondingvector of unknown fixed regression coefficients Aij Cij and CSij
A denote random effects that repre-sent additive polygenic common family environmentand common sibling environment effects respectivelyThe variation in an individual response is representedby a composite covariance matrix VT and is the sumof an additive genetic covariance matrix VA acommon family environment matrix VC a sharedsibling environment matrix VCs and residual environ-mental effects The different variance components are
The overall total covariance matrix is
The components of variance s 2A s 2
C s 2Cs need not be
positive as long as VT is positive definite A negativevalue for s 2
Cs simply implies that the realized covari-ance between sib lings is less than the realized
107Twin Research February 2004
Analysis of Melanoma Onset
covariance between parent and a child A negativevalue for s 2
Cs may suggest dominance To optimizeconvergence in BUGS Model (1) may be reparameter-ized as
where Fi Gi H i are independent additive randomeffects or latent variables and are the residual errorterms for parents and children respectively If we model
Fi ~ N(0 mdash12mdash s 2
A +s 2C)
Gi ~ N(0 mdash12mdash s 2
A) Hi ~ N(0 s 2Cs)
then the basic genetics covariance model (additivegenetic common environment and unique environ-ment for this particular four-member family structureis satisfied
In survival models unobserved or unmeasuredexplanatory variables some of which may be geneticare often referred to as frailties The frailties takevalues restricted to the posit ive line and may beassumed to act multiplicatively on the hazardExtending the above model to correlated family datawith time-to-onset endpoint a multiplicative individ-ual heterogeneity or frailty term representing thelatent genetic and common environment variablesmay be modeled as random effects simultaneouslywith the effects associated with observed covariatesConsider right censored time to onset of melanomadata (Tij dij zij)1 j n from n relative pairs hereT i j denotes the true age at onset of the jth familymember or the censored time depending on whetherdij =1 or 0 respectively and z denotes a p acute 1 vector ofcovariates A Weibull distribution may be used tomodel time to failure as
f(ti zi) = ebrsquoz igt igndash1 exp(ndashebrsquoz it g
i )
where b is a vector of unknown regression coefficientsand g is the shape parameter of the Weibull distribu-tion This leads to a baseline hazard of the form
l0(ti) = gt igndash1
Re-parameterize by letting mi = ebrsquoz i the conditionaldistribution of ti given mI is then Weibull (gmi) We for-mulated a mixed model to represent the conditionaldistribution of tij given covariate effects random addi-tive genetic and common environment effects as
tijfrac12mij ~ Weibull (g mij) i = 1hellipn j = 12
where
The regression coefficients and the precision of therandom effects (tG tF tH) were given ldquonon-informa-tiverdquo Normal and Gamma priors respectively The shapeparameter g of the time to onset of melanoma distribu-tion was also given a non-informative Gamma priorwhich was slowly decreasing on the positive real line
The overall model (including random effects) maybe described in a Bayesian graph (Figure 3) whichsimplifies sampling from full conditional distributionsby exploiting part ial independence propert ies(Spiegelhalter et al 1996a) In this graph eachrandom quantity is represented by a node which maybe connected by directed or undirected linksConditional independence assumptions are repre-sented by the absence of such links
We implemented the Gibbs sampler using theBUGS program (Gilks et al 1994) (code in theAppendix) Imputation of missing data was handlednaturally in the Gibbs sampling framework by treatingmissing values as additional unknown quantities andrandomly sampling values from their full conditionaldistributions We chose simple prior distributions forimputation since the number of missing values forcovariates was not large (complete for birth year 16missing for naevi and 21 missing for freckles) andthere was no indication of non-random missingness inour data Therefore imputation for missing naevi andfreckles covariates were based on Bernouilli prior dis-tributions with respective parameter values estimatedfrom the complete observations We performed aninitial 10000 burn-in iterations followed by an addi-tional 20000 Parameter estimates were the mean andstandard deviation (SD) of all post convergence Gibbssamples with a thinning interval between 20 and 50credible intervals were computed as the lower andupper a2 percentiles from the last 20000 iterationsConvergence to the posterior distribution was con-firmed by using the different criteria provided by theadd-on CODA package including those of Gelmanand Rubin (1992) Geweke (1992) and Raftery andLewis (1992a 1992b)
ResultsGEE Approach
An inspection of residual plots following preliminarymodel-fitting provided no evidence for the failure ofproportional hazards assumption and did not detectinfluential observations We then proceeded to applythe GEE approach that could estimate regressioncoefficients while incorporating a dependence struc-ture between relative pairs The results are displayed
108 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
in Table 3 which suggest that later birth year havingat least a moderate number of naevi and freckles wereall simultaneously associated with later age at onsetof melanoma Table 3 also presents the estimatedodds-ratios for quantifying the correlation betweenpaired relatives of specific relationships
The odds ratio for sibndashsib pairs is 2973 which issignificantly different from 1 (p lt 001) The odds ratiofor parentndashchild pairs is 1650 which is slightly greaterthan 1 indicating a mild dependency between thesepairs although they are not quite statistically signifi-cant The odds ratio for second-degree and higherrelative pairs is 1155 indicating no dependence at allbetween these pairs This pattern of familial aggrega-tion is compatible with dominance variance as well asadditive genetic variance (Falconer 1990)
Bayesian Approach
We re-analyzed the age-at-onset of melanoma by usingGibbs sampling to impute missing covariates and toestimate subject-specific covariate effects randomadditive genetic common family environment andshared sibling environment effects on the log scale Theestimated shape parameter g was 43 with a 95 credi-ble interval (CI) of (42 46) The results aresummarized in Table 4 (Model A) The residual plotsdid not indicate a gross departure from the underlyingWeibull model and revealed no influential observa-tions We checked the sensitivity of the analyses to
initial parameter values by re-running the Gibbssampler five more times using different starting valuesThe resulting estimates did not differ by more than 5from the values reported here The mean estimate for(s 2
A) was 0452 with 95 CI = (0348 0566) for (s 2C)
was ndash0053 with 95 CI = (ndash0120 0019) and for(s 2
Cs) was 0467 with 95 CI = (0393 0545)A small negative value for the common family
effect suggests that there may be a dominant effect orthat the current model is not quite appropriate forexample that there are systematic effects or correla-tion structures that have not been accounted for Theresults here indicate that additive genetics seem toimpact equally on the variation of the age at onset ofmelanoma Further exploration for alternative modelsis a focus of our future research
In addition we investigated the relative contribu-tion of genetic and environmental effects on theexpression of naevi (Model B) and freckles (ModelC) which are known risk factors for melanoma Theexpressions of naevi and freckles were coded asbinary variables (none or few moles versus moderateor many moles and no freckles versus one or morefreckles) A hierarchical Bayesian binomial model wasfitted to estimate the random variance componentsThe results in Table 4 indicate that a common familyenvironment effect contributed the most to theexpression of naevi (s 2
C = 0704) (relative to the con-tributions of additive genetic effect (s 2
A = 0142) and
Individual j in family i
Figure 3
Graphical model of covariate and random family effects for an individual in a nuclear family structure tij represents the observed failure time forthe jth individual in the ith family with tcenij being an indicator variable of censoring status Full arrows indicate stochastic links to which a probabil-ity is attached broken arrows denote deterministic relationships bs are regression coefficients T is the precision of the prior distribution andequals the inverse of the variance Fi Gi Hi are random effects modeled as Fi ~ N(012 s 2
A + s 2C) Gi ~ N(0 12 s 2
A) and Hi ~ N(0 s 2Cs) Rectangles rep-
resent actual data values for the covariates g and mij are shape and scale parameters for the underlying Weibull distribution
109Twin Research February 2004
Analysis of Melanoma Onset
of shared sibling effect(s 2Cs = 0142) both of which
were non negligible In contrast variation in theexpression of freckles was largely explained by addi-tive genetic and shared family effects(s 2
A = 2050 s 2C =
2600) compared to a relatively small shared siblingeffect(s 2
C = 0115)
Discussion
We applied two methods mdash generalized estimatingequat ion and Bayesian analys is mdash to the geneticanalysis of age at onset of melanoma based on anuclear family structure Under both approaches theresults suggest that additive genetic factors played an
Table 4
Gibbs Sampling Approach Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families
Weibull Model A Mean effects mdash Response variable is Age-at-onset
Covariate RR = eb Coefficient b Robust se(b) 95 CI of bYear of birth 1378 0321 00027 (03160326)
Naevi 1126 0119 00021 (00580185)
Freckling 1017 0017 01400 (ndash00050085)
Weibull Model B Variance components mdash Response variable is Age-at-onset
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0452 0054 (03480566)
s 2C ndash0053 0027 (ndash01200019)
s 2Cs 0467 0040 (03930545)
g 43 0104 (4246)
Binomial Model Variance components mdash Response variable is Naevi
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0142 0149 (00020498)
s 2C 0704 0156 (04031010)
s 2Cs 0195 0157 (000250553)
Binomial Model Variance components mdash Response variable is Freckling
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 2050 0779 (08353570)
s 2C 2600 0418 (17803460)
s 2Cs 0115 0088 (00110312)
Note Indicates Significance at the 005 level) Naevi Is a Binary Variable with Baseline 0 = No or Few Moles Freckling Is Coded as a Binary Variable with Baseline0 = No Freckles
Table 3
GEE Approach Estimated Regression Coefficients in the Proportional Hazard Model and Estimated Odds Ratios for Quantifying FamilialAggregation in Age at Onset of Melanoma in Queensland Families ( Indicates Significance at the 05 Significance Level)
A Mean effects
Covariate RR = eb Coefficient b Robust se(b) Z-statistic
Year of birth 1142 0132 0051 2588
Naevi (Baseline = No or few moles) 1765 0568 0073 7781
Freckling (Baseline = No freckles) 1160 0148 0049 3020
B Pattterns of familial aggregation
Relationship 1 + q se(q) Z-statistic
Sibndashsib 2973 06217 317
Parentndashchild 1650 0434 150
SecondOthers 1155 0270 047
important role in the age at onset of melanoma butthat shared sibling environmental factors were notnegligible We focused attention on these approachesbecause they are more appropriate for modeling cor-related age at onset data and they allow the inclusionof covariates in the analyses Under both approachesthere were suggestions that earlier melanoma onsetwas influenced by later birth year having a moderatenumber of naevi and being freckly The principal dif-ference between the two approaches i s in theinterpretation of the regression coefficients The GEEmethod uses a marginal approach resulting in regres-sion coefficients that describe the average populationresponse to changing covariates whereas the Bayesianapproach produces subject-specific coefficients A sec-ondary distinction is in the nature of the within-pairdependence The GEE model only describes acommon covariance among specific relative pairswhereas the Bayesian approach can explicitly describethe source of this covariance A third advantage of theBayesian method is its flexibility in incorporatingprior information if available for the covariates orlatent effects by modifying their prior distributionsFurther the Bayesian method would permit a moreaccurate decomposition of the genetic variance intoadditive and dominant components thus providingthe means for a direct assessment of the no-domi-nance assumption Finally it is also interesting torecord the amount of CPU time required for eachmethod 20 seconds for the GEE approach andapproximately 3 hours (for binary traits) and 12hours (for age at onset outcome) to run BUGS on asingle-user Intel Pentium III 600 MHz personal com-puter with the Linux Mandrake 71 operating systemThe amount of human and financial resources dedi-cated to collecting maintaining and updating themelanoma family database is extremely high there-fore the extra CPU time requirement by the MCMCmethod is well worth the additional genetic informa-tion and flexibility that it provides A BUGS programis included in the Appendix
Acknowledgments
Data collection was supported by grants from theAustralian National Health and Medical ResearchCouncil (NHMRC) ID 930223 and ID 961061 USNational Cancer Institute (CA88363) Programmingresources were provided by start-up funds from theDepartment of Biostatistics at the M D AndersonCancer Center We acknowledge assistance withBUGS programming from Sang-Joon Lee
ReferencesAbel L amp Bonney G E (1990) A time-dependent
logistic hazard function for modeling variable age ofonset in the analysis of familial diseases GeneticEpidemiology 12 391ndash407
Aitken J F Bailey-Wilson J Green A C MacLennanR amp Martin N G (1998) Segregation analysis ofcutaneous melanoma in Queensland Genetic Epi-demiology 15 391ndash401
Aitken J F Green A C MacLennan R Youl P ampMartin N G (1996) The Queensland familialmelanoma project Study design and characteristics ofparticipants Melanoma Research 6 155ndash165
Allison P D (1995) Survival analysis using the SASsystem mdash A practical guide Cary NC SAS InstituteInc
Andersen P K Borgan O Gill R D amp Keiding N(1993) Statistical models based on count ingprocesses New York Springer-Verlag
Bernardo J M Berber J O David A P amp Smith F FM (Eds) (1992) Bayesian statistics 4 OxfordClarendon Press
Bonney G E (1986) Regressive logistic model for famil-ial disease and other binary traits Biometrics 42611ndash625
Breiman L Friedman J H Olshen R A amp Stone CJ (1984) Classif ication and regression treesWadsworth California
Cannings C amp Thompson E A (1977) Ascertainmentin the sequential sampling of pedigrees ClinicalGenetics 12 208ndash212
Clayton D G (1978) A model for association in bivari-ate life-tables and its application in epidemiologicalstudies of chronic disease incidence Biometrika 65141ndash151
Clayton D G amp Cuziak J (1985) Multivariate gener-alizations of the proportional hazards model (withdiscussion) Journal of the Royal Statistical SocietySeries A 148 82ndash117
Cox D R (1972) Regression models and life tables(with discussion) Journal of the Royal StatisticalSociety Series B 34 187ndash202
Do K-A Broom B M Kuhnert P Duffy D LTodorov A A Treloar S A amp Martin N G(2000) Genetic analysis of the age at menopause byusing estimating equations and Bayesian randomeffects models Statistics in Medicine 19 1217ndash1235
English D R Armstrong B K Kricker A et al(1997) Sunlight and cancer Cancer Causes Control8 271ndash283
Falconer D S (1990) Introduction to quantitativegenetics (3rd ed) New York Longman Group Ltd
Faucett C Gauderman W J Thomas D Ziogas Aamp Sobel E (1993) Combined segregation andlinkage analysis of late-onset Alzheimerrsquos disease inDuke famil ies us ing Gibbs sampling GeneticEpidemiology 10 489ndash494
GAP (1996) Genetic Analysis Package Release 10Pasadena CA Epicenter Software
110 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
111Twin Research February 2004
Analysis of Melanoma Onset
Gauderman W J Morrison J L Carpenter C L ampThomas D C (1997) Analysis of gene-smokinginteraction in lung cancer Genetic Epidemiology 14199ndash214
Gauderman W J amp Thomas D C (1994) Censoredsurvival models for genetic epidemiology A Gibbssampling approach Genetic Epidemiology 11171ndash188
Gelfand A E amp Smith A F M (1990) Sampling-based approaches to calculating marginal densitiesJournal of the American Statistical Association 85398ndash409
Gelman A amp Rubin D B (1992) Inference from itera-tive simulation using mult iple sequences (withdiscussion) Statistical Science 7 457ndash511
Geman S amp Geman D (1984) Stochastic relaxationGibbs distributions and the Bayesian restoration ofimages IEEE Trans Pattern Anal Mach Intelligence6 721ndash741
Geweke J (1992) Evaluating the accuracy of sampling-based approaches to calculating posterior momentsIn J M Bernardo J O Berber A P David amp F FM Smith (Eds) Bayesian statistics 4 OxfordClarendon Press
Gilks W R Thomas A amp Spiegelhalter D J (1994)A language and program for complex Bayesian mod-eling Statistica 43 169ndash178
Guo S W amp Thompson E A (1991) Monte CarloEM for the estimation of multiple random effectsmodels on large pedigrees IMA Journal ofMathematics Applied in Medicine and Biology 8171ndash189
Guo S W amp Thompson E A (1992) A Monte Carlomethod for combined segregation and linkage analy-sis American Journal of Human Genetics 511111ndash1126
Guo S W amp Thompson E A (1994) Monte Carloestimation of mixed models for large complex pedi-grees Biometrics 50 417ndash432
Hastings W K (1970) Monte Carlo sampling methodsusing Markov chains and their applicationsBiometrika 57 97ndash109
Heath A C (1997) Markov Chain Monte Carlo segre-gation and linkage analysis for oligogenic modelsAmerican Journal of Human Genetics 61 748ndash760
Hougaard P (1986) A class of multivariate failure timedistributions Biometrika 73 671ndash678
Hsu L amp Prentice R L (1996) On assessing thestrength of dependency between failure time variatesBiometrika 83 491ndash506
Hsu L amp Zhao L P (1996) Assessing familial aggre-gation of age at onset by using estimating equationswith application to breast cancer American Journalof Human Genetics 58 1057ndash1071
Kempthorne O (1960) Biometrical genetics New YorkPergammon Press
Kendall M amp Stuart A (1979) The advanced theoryof statistics (Vol 2 4th ed) Oxford OxfordUniversity Press
Kong K Frigge M Cox N amp Wong W H (1992)Linkage analysis with adjustments for covariates Amethod combining peeling with Gibbs samplingCytogenetics and Cell Genetics 59 208ndash210
Kuhnert P amp Do K-A (2003) Fitting genetic models totwin data with binary and ordered categoricalresponses A comparison of structural equation mod-eling and Bayesian hierarchical models BehaviorGenetics 33 439ndash452
Lange K amp Sobel E (1991) A random walk methodfor computing genetic location scores AmericanJournal of Human Genetics 49 1320ndash1334
LeBlanc M amp Crowley J (1992) Relative risk trees forcensored survival data Biometrics 48 411ndash425
Li H amp Thompson E A (1997) Semiparametric esti-mation of major gene and random environmentaleffects for age of onset Biometrics 53 282ndash293
Li H amp Wijsman E A (1998) Semiparametric estima-tion of major gene effects for age of onset GeneticEpidemiology 15 279ndash298
MacLennan R Green A C McLeod G R amp MartinN G (1992) Increasing incidence of cutaneousmelanoma in Queensland Australia Journal of theNational Cancer Institute 84 1424ndash1432
Metropolis N Rosenbluth A W Rosenbluth M NTeller A H amp Teller E (1953) Equation of statecalculations by fast computing machines Journal ofChemical Physics 21 1087ndash1092
Nielsen G G Gill R D Andersen P K amp SorensonT I A (1992) A counting process approach tomaximum likelihood estimation in frailty modelsScandinavian Journal of Statistics 19 25ndash43
Oakes D (1989) Bivariate survival models induced byfrailties Journal of American Statistical Association84 487ndash493
Raftery A L Lewis S (1992a) Comment One longrun with diagnostics Implementation strategies forMarkov Chain Monte Carlo Statistical Science 7493ndash497
Raftery A L amp Lewis S (1992b) How many interac-tions in Gibbs sampler (pp 763ndash774) In J MBernardo J O Berber A P David amp F F M Smith(Eds) Bayesian statistics 4 Oxford Clarendon Press
SAGE (1992) Statistical Analysis for GeneticEpidemiology Release 21 Cleveland OH Departmentof Epidemiology and Biostatistics RammelkampCenter for Education and Research MetroHealthCampus Case Western Reserve University
Scurrah K J Palmer L J amp Burton P R (2000)Variance components analysis for pedigree-based cen-sored survival data unusing generalized linear mixedmodels (GLMMS) and Gibbs sampling in BUGSGenetic Epidemiology 19 127ndash148
112 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
Siegmund K D amp McKnight B (1998) Modelinghazard functions in families Genetic Epidemiology15 147ndash171
Spiegelhalter D J Thomas A amp Best N G (1996a)Computation on Bayesian graphical models (pp407ndash425) Oxford Oxford University Press
Spiegelhalter D J Thomas A Best N G amp Gilks W(1996b) BUGS 06 ndashBayesian InferenceUsing GibbsSampling manual Cambridge MRC Biostatistics Unit
Spiegelhalter D J Thomas A Best N G Lunn D(2003) WinBUGS (Bayesian Inference Using GibbsSampling) Windows version 14 user manualCambridge MRC Biostatistics Unit
Siskind V Aitken J Green A amp Martin N (2002)Sun exposure and interaction with family history inrisk of melanoma Queensland Australia Inter-national Journal of Cancer 97 90ndash95
Swerdlow A amp Green A (1987) Melanocytic naeviand melanoma An epidemiological perspectiveBritish Journal of Dermatology 117 137ndash146
Therneau T M amp Atkinson E (1997) An introductionto recursive partitioning using the RPART routines(Technical report) Rochester Mayo Foundation
Thomas D (1992) Fitting genetic data using Gibbs sam-pling An application to nevus counts in 38 Utahkindreds Cytogenetics and Cell Genetics 59 228ndash230
Thompson E A (1994) Monte Carlo programs for pedigree analysis (Technical Report 267)Washington Dept of Statist ics University ofWashington
Vaupel J M Manton K G amp Stallard E (1979) Theimpact of heterogeneity in individual frailty on thedynamics of mortality Demography 16 439ndash454
113Twin Research February 2004
Analysis of Melanoma Onset
Appendix A A BUGS Program to Implement the Bayesian MCMC approach in Modeling an ACEModel for Age at onset of Melanoma with Covariate Effects and Random GeneticsEnvironmental Effects
106 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
environmental effects Values for the missing or latentdata are sampled from the local conditional distribu-tion a function of the observed individual data thecurrent sampled values of other missinglatent data forthis particular individual such as polygenic and environ-mental effects and the values for the sampled geneticeffects in the immediate neighbors of an individualGibbs sampling basically consists of three main steps
Step 1 Setting initial values for unobserved quanti-ties (parameters and latent variables)
Step 2 For each parameter or latent variable qj sample from its ldquofull conditional distributionrdquogiven the current values of all other quantities inthe model
Step 3 Examine sampled values of parameters andlatent variables to monitor convergence and toprovide summary measures
Some of the most recent and popular packages thatimplement Gibbs sampling for analysis of pedigreedata include BUGS (Gilks et al 1994 Spiegelhalter et al 1996ab) Genetic Analysis Package (GAP1996) and MIXD (Olshen amp Wijsman 1996Thompson 1994) We have used BUGS mainlybecause of its flexibility in programming hierarchicalmodels besides being a freeware product
The Model
The aim here is to model the correlation structurewithin the family structure to satisfy the fundamentaladditive genetic model (Crow amp Kimura 1970Falconer 1990 Kempthorne 1960) as followsConsider a nuclear family structure consisting of 4members father mother and two children denoted byF M S1 S2 respectively Using similar notation as inBurton et al (1999) a conventional mixed linearmodel consisting of fixed and random effects may bewritten in the form
Qij = bTz + Aij +Cij + CSij+ Eij
where Qij is the observed value of a normally distrib-uted continuous trait for the jth individual in the ith
nuclear family zij is a vector of observed covariatesrepresenting fixed effects and b is a correspondingvector of unknown fixed regression coefficients Aij Cij and CSij
A denote random effects that repre-sent additive polygenic common family environmentand common sibling environment effects respectivelyThe variation in an individual response is representedby a composite covariance matrix VT and is the sumof an additive genetic covariance matrix VA acommon family environment matrix VC a sharedsibling environment matrix VCs and residual environ-mental effects The different variance components are
The overall total covariance matrix is
The components of variance s 2A s 2
C s 2Cs need not be
positive as long as VT is positive definite A negativevalue for s 2
Cs simply implies that the realized covari-ance between sib lings is less than the realized
107Twin Research February 2004
Analysis of Melanoma Onset
covariance between parent and a child A negativevalue for s 2
Cs may suggest dominance To optimizeconvergence in BUGS Model (1) may be reparameter-ized as
where Fi Gi H i are independent additive randomeffects or latent variables and are the residual errorterms for parents and children respectively If we model
Fi ~ N(0 mdash12mdash s 2
A +s 2C)
Gi ~ N(0 mdash12mdash s 2
A) Hi ~ N(0 s 2Cs)
then the basic genetics covariance model (additivegenetic common environment and unique environ-ment for this particular four-member family structureis satisfied
In survival models unobserved or unmeasuredexplanatory variables some of which may be geneticare often referred to as frailties The frailties takevalues restricted to the posit ive line and may beassumed to act multiplicatively on the hazardExtending the above model to correlated family datawith time-to-onset endpoint a multiplicative individ-ual heterogeneity or frailty term representing thelatent genetic and common environment variablesmay be modeled as random effects simultaneouslywith the effects associated with observed covariatesConsider right censored time to onset of melanomadata (Tij dij zij)1 j n from n relative pairs hereT i j denotes the true age at onset of the jth familymember or the censored time depending on whetherdij =1 or 0 respectively and z denotes a p acute 1 vector ofcovariates A Weibull distribution may be used tomodel time to failure as
f(ti zi) = ebrsquoz igt igndash1 exp(ndashebrsquoz it g
i )
where b is a vector of unknown regression coefficientsand g is the shape parameter of the Weibull distribu-tion This leads to a baseline hazard of the form
l0(ti) = gt igndash1
Re-parameterize by letting mi = ebrsquoz i the conditionaldistribution of ti given mI is then Weibull (gmi) We for-mulated a mixed model to represent the conditionaldistribution of tij given covariate effects random addi-tive genetic and common environment effects as
tijfrac12mij ~ Weibull (g mij) i = 1hellipn j = 12
where
The regression coefficients and the precision of therandom effects (tG tF tH) were given ldquonon-informa-tiverdquo Normal and Gamma priors respectively The shapeparameter g of the time to onset of melanoma distribu-tion was also given a non-informative Gamma priorwhich was slowly decreasing on the positive real line
The overall model (including random effects) maybe described in a Bayesian graph (Figure 3) whichsimplifies sampling from full conditional distributionsby exploiting part ial independence propert ies(Spiegelhalter et al 1996a) In this graph eachrandom quantity is represented by a node which maybe connected by directed or undirected linksConditional independence assumptions are repre-sented by the absence of such links
We implemented the Gibbs sampler using theBUGS program (Gilks et al 1994) (code in theAppendix) Imputation of missing data was handlednaturally in the Gibbs sampling framework by treatingmissing values as additional unknown quantities andrandomly sampling values from their full conditionaldistributions We chose simple prior distributions forimputation since the number of missing values forcovariates was not large (complete for birth year 16missing for naevi and 21 missing for freckles) andthere was no indication of non-random missingness inour data Therefore imputation for missing naevi andfreckles covariates were based on Bernouilli prior dis-tributions with respective parameter values estimatedfrom the complete observations We performed aninitial 10000 burn-in iterations followed by an addi-tional 20000 Parameter estimates were the mean andstandard deviation (SD) of all post convergence Gibbssamples with a thinning interval between 20 and 50credible intervals were computed as the lower andupper a2 percentiles from the last 20000 iterationsConvergence to the posterior distribution was con-firmed by using the different criteria provided by theadd-on CODA package including those of Gelmanand Rubin (1992) Geweke (1992) and Raftery andLewis (1992a 1992b)
ResultsGEE Approach
An inspection of residual plots following preliminarymodel-fitting provided no evidence for the failure ofproportional hazards assumption and did not detectinfluential observations We then proceeded to applythe GEE approach that could estimate regressioncoefficients while incorporating a dependence struc-ture between relative pairs The results are displayed
108 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
in Table 3 which suggest that later birth year havingat least a moderate number of naevi and freckles wereall simultaneously associated with later age at onsetof melanoma Table 3 also presents the estimatedodds-ratios for quantifying the correlation betweenpaired relatives of specific relationships
The odds ratio for sibndashsib pairs is 2973 which issignificantly different from 1 (p lt 001) The odds ratiofor parentndashchild pairs is 1650 which is slightly greaterthan 1 indicating a mild dependency between thesepairs although they are not quite statistically signifi-cant The odds ratio for second-degree and higherrelative pairs is 1155 indicating no dependence at allbetween these pairs This pattern of familial aggrega-tion is compatible with dominance variance as well asadditive genetic variance (Falconer 1990)
Bayesian Approach
We re-analyzed the age-at-onset of melanoma by usingGibbs sampling to impute missing covariates and toestimate subject-specific covariate effects randomadditive genetic common family environment andshared sibling environment effects on the log scale Theestimated shape parameter g was 43 with a 95 credi-ble interval (CI) of (42 46) The results aresummarized in Table 4 (Model A) The residual plotsdid not indicate a gross departure from the underlyingWeibull model and revealed no influential observa-tions We checked the sensitivity of the analyses to
initial parameter values by re-running the Gibbssampler five more times using different starting valuesThe resulting estimates did not differ by more than 5from the values reported here The mean estimate for(s 2
A) was 0452 with 95 CI = (0348 0566) for (s 2C)
was ndash0053 with 95 CI = (ndash0120 0019) and for(s 2
Cs) was 0467 with 95 CI = (0393 0545)A small negative value for the common family
effect suggests that there may be a dominant effect orthat the current model is not quite appropriate forexample that there are systematic effects or correla-tion structures that have not been accounted for Theresults here indicate that additive genetics seem toimpact equally on the variation of the age at onset ofmelanoma Further exploration for alternative modelsis a focus of our future research
In addition we investigated the relative contribu-tion of genetic and environmental effects on theexpression of naevi (Model B) and freckles (ModelC) which are known risk factors for melanoma Theexpressions of naevi and freckles were coded asbinary variables (none or few moles versus moderateor many moles and no freckles versus one or morefreckles) A hierarchical Bayesian binomial model wasfitted to estimate the random variance componentsThe results in Table 4 indicate that a common familyenvironment effect contributed the most to theexpression of naevi (s 2
C = 0704) (relative to the con-tributions of additive genetic effect (s 2
A = 0142) and
Individual j in family i
Figure 3
Graphical model of covariate and random family effects for an individual in a nuclear family structure tij represents the observed failure time forthe jth individual in the ith family with tcenij being an indicator variable of censoring status Full arrows indicate stochastic links to which a probabil-ity is attached broken arrows denote deterministic relationships bs are regression coefficients T is the precision of the prior distribution andequals the inverse of the variance Fi Gi Hi are random effects modeled as Fi ~ N(012 s 2
A + s 2C) Gi ~ N(0 12 s 2
A) and Hi ~ N(0 s 2Cs) Rectangles rep-
resent actual data values for the covariates g and mij are shape and scale parameters for the underlying Weibull distribution
109Twin Research February 2004
Analysis of Melanoma Onset
of shared sibling effect(s 2Cs = 0142) both of which
were non negligible In contrast variation in theexpression of freckles was largely explained by addi-tive genetic and shared family effects(s 2
A = 2050 s 2C =
2600) compared to a relatively small shared siblingeffect(s 2
C = 0115)
Discussion
We applied two methods mdash generalized estimatingequat ion and Bayesian analys is mdash to the geneticanalysis of age at onset of melanoma based on anuclear family structure Under both approaches theresults suggest that additive genetic factors played an
Table 4
Gibbs Sampling Approach Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families
Weibull Model A Mean effects mdash Response variable is Age-at-onset
Covariate RR = eb Coefficient b Robust se(b) 95 CI of bYear of birth 1378 0321 00027 (03160326)
Naevi 1126 0119 00021 (00580185)
Freckling 1017 0017 01400 (ndash00050085)
Weibull Model B Variance components mdash Response variable is Age-at-onset
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0452 0054 (03480566)
s 2C ndash0053 0027 (ndash01200019)
s 2Cs 0467 0040 (03930545)
g 43 0104 (4246)
Binomial Model Variance components mdash Response variable is Naevi
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0142 0149 (00020498)
s 2C 0704 0156 (04031010)
s 2Cs 0195 0157 (000250553)
Binomial Model Variance components mdash Response variable is Freckling
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 2050 0779 (08353570)
s 2C 2600 0418 (17803460)
s 2Cs 0115 0088 (00110312)
Note Indicates Significance at the 005 level) Naevi Is a Binary Variable with Baseline 0 = No or Few Moles Freckling Is Coded as a Binary Variable with Baseline0 = No Freckles
Table 3
GEE Approach Estimated Regression Coefficients in the Proportional Hazard Model and Estimated Odds Ratios for Quantifying FamilialAggregation in Age at Onset of Melanoma in Queensland Families ( Indicates Significance at the 05 Significance Level)
A Mean effects
Covariate RR = eb Coefficient b Robust se(b) Z-statistic
Year of birth 1142 0132 0051 2588
Naevi (Baseline = No or few moles) 1765 0568 0073 7781
Freckling (Baseline = No freckles) 1160 0148 0049 3020
B Pattterns of familial aggregation
Relationship 1 + q se(q) Z-statistic
Sibndashsib 2973 06217 317
Parentndashchild 1650 0434 150
SecondOthers 1155 0270 047
important role in the age at onset of melanoma butthat shared sibling environmental factors were notnegligible We focused attention on these approachesbecause they are more appropriate for modeling cor-related age at onset data and they allow the inclusionof covariates in the analyses Under both approachesthere were suggestions that earlier melanoma onsetwas influenced by later birth year having a moderatenumber of naevi and being freckly The principal dif-ference between the two approaches i s in theinterpretation of the regression coefficients The GEEmethod uses a marginal approach resulting in regres-sion coefficients that describe the average populationresponse to changing covariates whereas the Bayesianapproach produces subject-specific coefficients A sec-ondary distinction is in the nature of the within-pairdependence The GEE model only describes acommon covariance among specific relative pairswhereas the Bayesian approach can explicitly describethe source of this covariance A third advantage of theBayesian method is its flexibility in incorporatingprior information if available for the covariates orlatent effects by modifying their prior distributionsFurther the Bayesian method would permit a moreaccurate decomposition of the genetic variance intoadditive and dominant components thus providingthe means for a direct assessment of the no-domi-nance assumption Finally it is also interesting torecord the amount of CPU time required for eachmethod 20 seconds for the GEE approach andapproximately 3 hours (for binary traits) and 12hours (for age at onset outcome) to run BUGS on asingle-user Intel Pentium III 600 MHz personal com-puter with the Linux Mandrake 71 operating systemThe amount of human and financial resources dedi-cated to collecting maintaining and updating themelanoma family database is extremely high there-fore the extra CPU time requirement by the MCMCmethod is well worth the additional genetic informa-tion and flexibility that it provides A BUGS programis included in the Appendix
Acknowledgments
Data collection was supported by grants from theAustralian National Health and Medical ResearchCouncil (NHMRC) ID 930223 and ID 961061 USNational Cancer Institute (CA88363) Programmingresources were provided by start-up funds from theDepartment of Biostatistics at the M D AndersonCancer Center We acknowledge assistance withBUGS programming from Sang-Joon Lee
ReferencesAbel L amp Bonney G E (1990) A time-dependent
logistic hazard function for modeling variable age ofonset in the analysis of familial diseases GeneticEpidemiology 12 391ndash407
Aitken J F Bailey-Wilson J Green A C MacLennanR amp Martin N G (1998) Segregation analysis ofcutaneous melanoma in Queensland Genetic Epi-demiology 15 391ndash401
Aitken J F Green A C MacLennan R Youl P ampMartin N G (1996) The Queensland familialmelanoma project Study design and characteristics ofparticipants Melanoma Research 6 155ndash165
Allison P D (1995) Survival analysis using the SASsystem mdash A practical guide Cary NC SAS InstituteInc
Andersen P K Borgan O Gill R D amp Keiding N(1993) Statistical models based on count ingprocesses New York Springer-Verlag
Bernardo J M Berber J O David A P amp Smith F FM (Eds) (1992) Bayesian statistics 4 OxfordClarendon Press
Bonney G E (1986) Regressive logistic model for famil-ial disease and other binary traits Biometrics 42611ndash625
Breiman L Friedman J H Olshen R A amp Stone CJ (1984) Classif ication and regression treesWadsworth California
Cannings C amp Thompson E A (1977) Ascertainmentin the sequential sampling of pedigrees ClinicalGenetics 12 208ndash212
Clayton D G (1978) A model for association in bivari-ate life-tables and its application in epidemiologicalstudies of chronic disease incidence Biometrika 65141ndash151
Clayton D G amp Cuziak J (1985) Multivariate gener-alizations of the proportional hazards model (withdiscussion) Journal of the Royal Statistical SocietySeries A 148 82ndash117
Cox D R (1972) Regression models and life tables(with discussion) Journal of the Royal StatisticalSociety Series B 34 187ndash202
Do K-A Broom B M Kuhnert P Duffy D LTodorov A A Treloar S A amp Martin N G(2000) Genetic analysis of the age at menopause byusing estimating equations and Bayesian randomeffects models Statistics in Medicine 19 1217ndash1235
English D R Armstrong B K Kricker A et al(1997) Sunlight and cancer Cancer Causes Control8 271ndash283
Falconer D S (1990) Introduction to quantitativegenetics (3rd ed) New York Longman Group Ltd
Faucett C Gauderman W J Thomas D Ziogas Aamp Sobel E (1993) Combined segregation andlinkage analysis of late-onset Alzheimerrsquos disease inDuke famil ies us ing Gibbs sampling GeneticEpidemiology 10 489ndash494
GAP (1996) Genetic Analysis Package Release 10Pasadena CA Epicenter Software
110 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
111Twin Research February 2004
Analysis of Melanoma Onset
Gauderman W J Morrison J L Carpenter C L ampThomas D C (1997) Analysis of gene-smokinginteraction in lung cancer Genetic Epidemiology 14199ndash214
Gauderman W J amp Thomas D C (1994) Censoredsurvival models for genetic epidemiology A Gibbssampling approach Genetic Epidemiology 11171ndash188
Gelfand A E amp Smith A F M (1990) Sampling-based approaches to calculating marginal densitiesJournal of the American Statistical Association 85398ndash409
Gelman A amp Rubin D B (1992) Inference from itera-tive simulation using mult iple sequences (withdiscussion) Statistical Science 7 457ndash511
Geman S amp Geman D (1984) Stochastic relaxationGibbs distributions and the Bayesian restoration ofimages IEEE Trans Pattern Anal Mach Intelligence6 721ndash741
Geweke J (1992) Evaluating the accuracy of sampling-based approaches to calculating posterior momentsIn J M Bernardo J O Berber A P David amp F FM Smith (Eds) Bayesian statistics 4 OxfordClarendon Press
Gilks W R Thomas A amp Spiegelhalter D J (1994)A language and program for complex Bayesian mod-eling Statistica 43 169ndash178
Guo S W amp Thompson E A (1991) Monte CarloEM for the estimation of multiple random effectsmodels on large pedigrees IMA Journal ofMathematics Applied in Medicine and Biology 8171ndash189
Guo S W amp Thompson E A (1992) A Monte Carlomethod for combined segregation and linkage analy-sis American Journal of Human Genetics 511111ndash1126
Guo S W amp Thompson E A (1994) Monte Carloestimation of mixed models for large complex pedi-grees Biometrics 50 417ndash432
Hastings W K (1970) Monte Carlo sampling methodsusing Markov chains and their applicationsBiometrika 57 97ndash109
Heath A C (1997) Markov Chain Monte Carlo segre-gation and linkage analysis for oligogenic modelsAmerican Journal of Human Genetics 61 748ndash760
Hougaard P (1986) A class of multivariate failure timedistributions Biometrika 73 671ndash678
Hsu L amp Prentice R L (1996) On assessing thestrength of dependency between failure time variatesBiometrika 83 491ndash506
Hsu L amp Zhao L P (1996) Assessing familial aggre-gation of age at onset by using estimating equationswith application to breast cancer American Journalof Human Genetics 58 1057ndash1071
Kempthorne O (1960) Biometrical genetics New YorkPergammon Press
Kendall M amp Stuart A (1979) The advanced theoryof statistics (Vol 2 4th ed) Oxford OxfordUniversity Press
Kong K Frigge M Cox N amp Wong W H (1992)Linkage analysis with adjustments for covariates Amethod combining peeling with Gibbs samplingCytogenetics and Cell Genetics 59 208ndash210
Kuhnert P amp Do K-A (2003) Fitting genetic models totwin data with binary and ordered categoricalresponses A comparison of structural equation mod-eling and Bayesian hierarchical models BehaviorGenetics 33 439ndash452
Lange K amp Sobel E (1991) A random walk methodfor computing genetic location scores AmericanJournal of Human Genetics 49 1320ndash1334
LeBlanc M amp Crowley J (1992) Relative risk trees forcensored survival data Biometrics 48 411ndash425
Li H amp Thompson E A (1997) Semiparametric esti-mation of major gene and random environmentaleffects for age of onset Biometrics 53 282ndash293
Li H amp Wijsman E A (1998) Semiparametric estima-tion of major gene effects for age of onset GeneticEpidemiology 15 279ndash298
MacLennan R Green A C McLeod G R amp MartinN G (1992) Increasing incidence of cutaneousmelanoma in Queensland Australia Journal of theNational Cancer Institute 84 1424ndash1432
Metropolis N Rosenbluth A W Rosenbluth M NTeller A H amp Teller E (1953) Equation of statecalculations by fast computing machines Journal ofChemical Physics 21 1087ndash1092
Nielsen G G Gill R D Andersen P K amp SorensonT I A (1992) A counting process approach tomaximum likelihood estimation in frailty modelsScandinavian Journal of Statistics 19 25ndash43
Oakes D (1989) Bivariate survival models induced byfrailties Journal of American Statistical Association84 487ndash493
Raftery A L Lewis S (1992a) Comment One longrun with diagnostics Implementation strategies forMarkov Chain Monte Carlo Statistical Science 7493ndash497
Raftery A L amp Lewis S (1992b) How many interac-tions in Gibbs sampler (pp 763ndash774) In J MBernardo J O Berber A P David amp F F M Smith(Eds) Bayesian statistics 4 Oxford Clarendon Press
SAGE (1992) Statistical Analysis for GeneticEpidemiology Release 21 Cleveland OH Departmentof Epidemiology and Biostatistics RammelkampCenter for Education and Research MetroHealthCampus Case Western Reserve University
Scurrah K J Palmer L J amp Burton P R (2000)Variance components analysis for pedigree-based cen-sored survival data unusing generalized linear mixedmodels (GLMMS) and Gibbs sampling in BUGSGenetic Epidemiology 19 127ndash148
112 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
Siegmund K D amp McKnight B (1998) Modelinghazard functions in families Genetic Epidemiology15 147ndash171
Spiegelhalter D J Thomas A amp Best N G (1996a)Computation on Bayesian graphical models (pp407ndash425) Oxford Oxford University Press
Spiegelhalter D J Thomas A Best N G amp Gilks W(1996b) BUGS 06 ndashBayesian InferenceUsing GibbsSampling manual Cambridge MRC Biostatistics Unit
Spiegelhalter D J Thomas A Best N G Lunn D(2003) WinBUGS (Bayesian Inference Using GibbsSampling) Windows version 14 user manualCambridge MRC Biostatistics Unit
Siskind V Aitken J Green A amp Martin N (2002)Sun exposure and interaction with family history inrisk of melanoma Queensland Australia Inter-national Journal of Cancer 97 90ndash95
Swerdlow A amp Green A (1987) Melanocytic naeviand melanoma An epidemiological perspectiveBritish Journal of Dermatology 117 137ndash146
Therneau T M amp Atkinson E (1997) An introductionto recursive partitioning using the RPART routines(Technical report) Rochester Mayo Foundation
Thomas D (1992) Fitting genetic data using Gibbs sam-pling An application to nevus counts in 38 Utahkindreds Cytogenetics and Cell Genetics 59 228ndash230
Thompson E A (1994) Monte Carlo programs for pedigree analysis (Technical Report 267)Washington Dept of Statist ics University ofWashington
Vaupel J M Manton K G amp Stallard E (1979) Theimpact of heterogeneity in individual frailty on thedynamics of mortality Demography 16 439ndash454
113Twin Research February 2004
Analysis of Melanoma Onset
Appendix A A BUGS Program to Implement the Bayesian MCMC approach in Modeling an ACEModel for Age at onset of Melanoma with Covariate Effects and Random GeneticsEnvironmental Effects
107Twin Research February 2004
Analysis of Melanoma Onset
covariance between parent and a child A negativevalue for s 2
Cs may suggest dominance To optimizeconvergence in BUGS Model (1) may be reparameter-ized as
where Fi Gi H i are independent additive randomeffects or latent variables and are the residual errorterms for parents and children respectively If we model
Fi ~ N(0 mdash12mdash s 2
A +s 2C)
Gi ~ N(0 mdash12mdash s 2
A) Hi ~ N(0 s 2Cs)
then the basic genetics covariance model (additivegenetic common environment and unique environ-ment for this particular four-member family structureis satisfied
In survival models unobserved or unmeasuredexplanatory variables some of which may be geneticare often referred to as frailties The frailties takevalues restricted to the posit ive line and may beassumed to act multiplicatively on the hazardExtending the above model to correlated family datawith time-to-onset endpoint a multiplicative individ-ual heterogeneity or frailty term representing thelatent genetic and common environment variablesmay be modeled as random effects simultaneouslywith the effects associated with observed covariatesConsider right censored time to onset of melanomadata (Tij dij zij)1 j n from n relative pairs hereT i j denotes the true age at onset of the jth familymember or the censored time depending on whetherdij =1 or 0 respectively and z denotes a p acute 1 vector ofcovariates A Weibull distribution may be used tomodel time to failure as
f(ti zi) = ebrsquoz igt igndash1 exp(ndashebrsquoz it g
i )
where b is a vector of unknown regression coefficientsand g is the shape parameter of the Weibull distribu-tion This leads to a baseline hazard of the form
l0(ti) = gt igndash1
Re-parameterize by letting mi = ebrsquoz i the conditionaldistribution of ti given mI is then Weibull (gmi) We for-mulated a mixed model to represent the conditionaldistribution of tij given covariate effects random addi-tive genetic and common environment effects as
tijfrac12mij ~ Weibull (g mij) i = 1hellipn j = 12
where
The regression coefficients and the precision of therandom effects (tG tF tH) were given ldquonon-informa-tiverdquo Normal and Gamma priors respectively The shapeparameter g of the time to onset of melanoma distribu-tion was also given a non-informative Gamma priorwhich was slowly decreasing on the positive real line
The overall model (including random effects) maybe described in a Bayesian graph (Figure 3) whichsimplifies sampling from full conditional distributionsby exploiting part ial independence propert ies(Spiegelhalter et al 1996a) In this graph eachrandom quantity is represented by a node which maybe connected by directed or undirected linksConditional independence assumptions are repre-sented by the absence of such links
We implemented the Gibbs sampler using theBUGS program (Gilks et al 1994) (code in theAppendix) Imputation of missing data was handlednaturally in the Gibbs sampling framework by treatingmissing values as additional unknown quantities andrandomly sampling values from their full conditionaldistributions We chose simple prior distributions forimputation since the number of missing values forcovariates was not large (complete for birth year 16missing for naevi and 21 missing for freckles) andthere was no indication of non-random missingness inour data Therefore imputation for missing naevi andfreckles covariates were based on Bernouilli prior dis-tributions with respective parameter values estimatedfrom the complete observations We performed aninitial 10000 burn-in iterations followed by an addi-tional 20000 Parameter estimates were the mean andstandard deviation (SD) of all post convergence Gibbssamples with a thinning interval between 20 and 50credible intervals were computed as the lower andupper a2 percentiles from the last 20000 iterationsConvergence to the posterior distribution was con-firmed by using the different criteria provided by theadd-on CODA package including those of Gelmanand Rubin (1992) Geweke (1992) and Raftery andLewis (1992a 1992b)
ResultsGEE Approach
An inspection of residual plots following preliminarymodel-fitting provided no evidence for the failure ofproportional hazards assumption and did not detectinfluential observations We then proceeded to applythe GEE approach that could estimate regressioncoefficients while incorporating a dependence struc-ture between relative pairs The results are displayed
108 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
in Table 3 which suggest that later birth year havingat least a moderate number of naevi and freckles wereall simultaneously associated with later age at onsetof melanoma Table 3 also presents the estimatedodds-ratios for quantifying the correlation betweenpaired relatives of specific relationships
The odds ratio for sibndashsib pairs is 2973 which issignificantly different from 1 (p lt 001) The odds ratiofor parentndashchild pairs is 1650 which is slightly greaterthan 1 indicating a mild dependency between thesepairs although they are not quite statistically signifi-cant The odds ratio for second-degree and higherrelative pairs is 1155 indicating no dependence at allbetween these pairs This pattern of familial aggrega-tion is compatible with dominance variance as well asadditive genetic variance (Falconer 1990)
Bayesian Approach
We re-analyzed the age-at-onset of melanoma by usingGibbs sampling to impute missing covariates and toestimate subject-specific covariate effects randomadditive genetic common family environment andshared sibling environment effects on the log scale Theestimated shape parameter g was 43 with a 95 credi-ble interval (CI) of (42 46) The results aresummarized in Table 4 (Model A) The residual plotsdid not indicate a gross departure from the underlyingWeibull model and revealed no influential observa-tions We checked the sensitivity of the analyses to
initial parameter values by re-running the Gibbssampler five more times using different starting valuesThe resulting estimates did not differ by more than 5from the values reported here The mean estimate for(s 2
A) was 0452 with 95 CI = (0348 0566) for (s 2C)
was ndash0053 with 95 CI = (ndash0120 0019) and for(s 2
Cs) was 0467 with 95 CI = (0393 0545)A small negative value for the common family
effect suggests that there may be a dominant effect orthat the current model is not quite appropriate forexample that there are systematic effects or correla-tion structures that have not been accounted for Theresults here indicate that additive genetics seem toimpact equally on the variation of the age at onset ofmelanoma Further exploration for alternative modelsis a focus of our future research
In addition we investigated the relative contribu-tion of genetic and environmental effects on theexpression of naevi (Model B) and freckles (ModelC) which are known risk factors for melanoma Theexpressions of naevi and freckles were coded asbinary variables (none or few moles versus moderateor many moles and no freckles versus one or morefreckles) A hierarchical Bayesian binomial model wasfitted to estimate the random variance componentsThe results in Table 4 indicate that a common familyenvironment effect contributed the most to theexpression of naevi (s 2
C = 0704) (relative to the con-tributions of additive genetic effect (s 2
A = 0142) and
Individual j in family i
Figure 3
Graphical model of covariate and random family effects for an individual in a nuclear family structure tij represents the observed failure time forthe jth individual in the ith family with tcenij being an indicator variable of censoring status Full arrows indicate stochastic links to which a probabil-ity is attached broken arrows denote deterministic relationships bs are regression coefficients T is the precision of the prior distribution andequals the inverse of the variance Fi Gi Hi are random effects modeled as Fi ~ N(012 s 2
A + s 2C) Gi ~ N(0 12 s 2
A) and Hi ~ N(0 s 2Cs) Rectangles rep-
resent actual data values for the covariates g and mij are shape and scale parameters for the underlying Weibull distribution
109Twin Research February 2004
Analysis of Melanoma Onset
of shared sibling effect(s 2Cs = 0142) both of which
were non negligible In contrast variation in theexpression of freckles was largely explained by addi-tive genetic and shared family effects(s 2
A = 2050 s 2C =
2600) compared to a relatively small shared siblingeffect(s 2
C = 0115)
Discussion
We applied two methods mdash generalized estimatingequat ion and Bayesian analys is mdash to the geneticanalysis of age at onset of melanoma based on anuclear family structure Under both approaches theresults suggest that additive genetic factors played an
Table 4
Gibbs Sampling Approach Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families
Weibull Model A Mean effects mdash Response variable is Age-at-onset
Covariate RR = eb Coefficient b Robust se(b) 95 CI of bYear of birth 1378 0321 00027 (03160326)
Naevi 1126 0119 00021 (00580185)
Freckling 1017 0017 01400 (ndash00050085)
Weibull Model B Variance components mdash Response variable is Age-at-onset
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0452 0054 (03480566)
s 2C ndash0053 0027 (ndash01200019)
s 2Cs 0467 0040 (03930545)
g 43 0104 (4246)
Binomial Model Variance components mdash Response variable is Naevi
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0142 0149 (00020498)
s 2C 0704 0156 (04031010)
s 2Cs 0195 0157 (000250553)
Binomial Model Variance components mdash Response variable is Freckling
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 2050 0779 (08353570)
s 2C 2600 0418 (17803460)
s 2Cs 0115 0088 (00110312)
Note Indicates Significance at the 005 level) Naevi Is a Binary Variable with Baseline 0 = No or Few Moles Freckling Is Coded as a Binary Variable with Baseline0 = No Freckles
Table 3
GEE Approach Estimated Regression Coefficients in the Proportional Hazard Model and Estimated Odds Ratios for Quantifying FamilialAggregation in Age at Onset of Melanoma in Queensland Families ( Indicates Significance at the 05 Significance Level)
A Mean effects
Covariate RR = eb Coefficient b Robust se(b) Z-statistic
Year of birth 1142 0132 0051 2588
Naevi (Baseline = No or few moles) 1765 0568 0073 7781
Freckling (Baseline = No freckles) 1160 0148 0049 3020
B Pattterns of familial aggregation
Relationship 1 + q se(q) Z-statistic
Sibndashsib 2973 06217 317
Parentndashchild 1650 0434 150
SecondOthers 1155 0270 047
important role in the age at onset of melanoma butthat shared sibling environmental factors were notnegligible We focused attention on these approachesbecause they are more appropriate for modeling cor-related age at onset data and they allow the inclusionof covariates in the analyses Under both approachesthere were suggestions that earlier melanoma onsetwas influenced by later birth year having a moderatenumber of naevi and being freckly The principal dif-ference between the two approaches i s in theinterpretation of the regression coefficients The GEEmethod uses a marginal approach resulting in regres-sion coefficients that describe the average populationresponse to changing covariates whereas the Bayesianapproach produces subject-specific coefficients A sec-ondary distinction is in the nature of the within-pairdependence The GEE model only describes acommon covariance among specific relative pairswhereas the Bayesian approach can explicitly describethe source of this covariance A third advantage of theBayesian method is its flexibility in incorporatingprior information if available for the covariates orlatent effects by modifying their prior distributionsFurther the Bayesian method would permit a moreaccurate decomposition of the genetic variance intoadditive and dominant components thus providingthe means for a direct assessment of the no-domi-nance assumption Finally it is also interesting torecord the amount of CPU time required for eachmethod 20 seconds for the GEE approach andapproximately 3 hours (for binary traits) and 12hours (for age at onset outcome) to run BUGS on asingle-user Intel Pentium III 600 MHz personal com-puter with the Linux Mandrake 71 operating systemThe amount of human and financial resources dedi-cated to collecting maintaining and updating themelanoma family database is extremely high there-fore the extra CPU time requirement by the MCMCmethod is well worth the additional genetic informa-tion and flexibility that it provides A BUGS programis included in the Appendix
Acknowledgments
Data collection was supported by grants from theAustralian National Health and Medical ResearchCouncil (NHMRC) ID 930223 and ID 961061 USNational Cancer Institute (CA88363) Programmingresources were provided by start-up funds from theDepartment of Biostatistics at the M D AndersonCancer Center We acknowledge assistance withBUGS programming from Sang-Joon Lee
ReferencesAbel L amp Bonney G E (1990) A time-dependent
logistic hazard function for modeling variable age ofonset in the analysis of familial diseases GeneticEpidemiology 12 391ndash407
Aitken J F Bailey-Wilson J Green A C MacLennanR amp Martin N G (1998) Segregation analysis ofcutaneous melanoma in Queensland Genetic Epi-demiology 15 391ndash401
Aitken J F Green A C MacLennan R Youl P ampMartin N G (1996) The Queensland familialmelanoma project Study design and characteristics ofparticipants Melanoma Research 6 155ndash165
Allison P D (1995) Survival analysis using the SASsystem mdash A practical guide Cary NC SAS InstituteInc
Andersen P K Borgan O Gill R D amp Keiding N(1993) Statistical models based on count ingprocesses New York Springer-Verlag
Bernardo J M Berber J O David A P amp Smith F FM (Eds) (1992) Bayesian statistics 4 OxfordClarendon Press
Bonney G E (1986) Regressive logistic model for famil-ial disease and other binary traits Biometrics 42611ndash625
Breiman L Friedman J H Olshen R A amp Stone CJ (1984) Classif ication and regression treesWadsworth California
Cannings C amp Thompson E A (1977) Ascertainmentin the sequential sampling of pedigrees ClinicalGenetics 12 208ndash212
Clayton D G (1978) A model for association in bivari-ate life-tables and its application in epidemiologicalstudies of chronic disease incidence Biometrika 65141ndash151
Clayton D G amp Cuziak J (1985) Multivariate gener-alizations of the proportional hazards model (withdiscussion) Journal of the Royal Statistical SocietySeries A 148 82ndash117
Cox D R (1972) Regression models and life tables(with discussion) Journal of the Royal StatisticalSociety Series B 34 187ndash202
Do K-A Broom B M Kuhnert P Duffy D LTodorov A A Treloar S A amp Martin N G(2000) Genetic analysis of the age at menopause byusing estimating equations and Bayesian randomeffects models Statistics in Medicine 19 1217ndash1235
English D R Armstrong B K Kricker A et al(1997) Sunlight and cancer Cancer Causes Control8 271ndash283
Falconer D S (1990) Introduction to quantitativegenetics (3rd ed) New York Longman Group Ltd
Faucett C Gauderman W J Thomas D Ziogas Aamp Sobel E (1993) Combined segregation andlinkage analysis of late-onset Alzheimerrsquos disease inDuke famil ies us ing Gibbs sampling GeneticEpidemiology 10 489ndash494
GAP (1996) Genetic Analysis Package Release 10Pasadena CA Epicenter Software
110 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
111Twin Research February 2004
Analysis of Melanoma Onset
Gauderman W J Morrison J L Carpenter C L ampThomas D C (1997) Analysis of gene-smokinginteraction in lung cancer Genetic Epidemiology 14199ndash214
Gauderman W J amp Thomas D C (1994) Censoredsurvival models for genetic epidemiology A Gibbssampling approach Genetic Epidemiology 11171ndash188
Gelfand A E amp Smith A F M (1990) Sampling-based approaches to calculating marginal densitiesJournal of the American Statistical Association 85398ndash409
Gelman A amp Rubin D B (1992) Inference from itera-tive simulation using mult iple sequences (withdiscussion) Statistical Science 7 457ndash511
Geman S amp Geman D (1984) Stochastic relaxationGibbs distributions and the Bayesian restoration ofimages IEEE Trans Pattern Anal Mach Intelligence6 721ndash741
Geweke J (1992) Evaluating the accuracy of sampling-based approaches to calculating posterior momentsIn J M Bernardo J O Berber A P David amp F FM Smith (Eds) Bayesian statistics 4 OxfordClarendon Press
Gilks W R Thomas A amp Spiegelhalter D J (1994)A language and program for complex Bayesian mod-eling Statistica 43 169ndash178
Guo S W amp Thompson E A (1991) Monte CarloEM for the estimation of multiple random effectsmodels on large pedigrees IMA Journal ofMathematics Applied in Medicine and Biology 8171ndash189
Guo S W amp Thompson E A (1992) A Monte Carlomethod for combined segregation and linkage analy-sis American Journal of Human Genetics 511111ndash1126
Guo S W amp Thompson E A (1994) Monte Carloestimation of mixed models for large complex pedi-grees Biometrics 50 417ndash432
Hastings W K (1970) Monte Carlo sampling methodsusing Markov chains and their applicationsBiometrika 57 97ndash109
Heath A C (1997) Markov Chain Monte Carlo segre-gation and linkage analysis for oligogenic modelsAmerican Journal of Human Genetics 61 748ndash760
Hougaard P (1986) A class of multivariate failure timedistributions Biometrika 73 671ndash678
Hsu L amp Prentice R L (1996) On assessing thestrength of dependency between failure time variatesBiometrika 83 491ndash506
Hsu L amp Zhao L P (1996) Assessing familial aggre-gation of age at onset by using estimating equationswith application to breast cancer American Journalof Human Genetics 58 1057ndash1071
Kempthorne O (1960) Biometrical genetics New YorkPergammon Press
Kendall M amp Stuart A (1979) The advanced theoryof statistics (Vol 2 4th ed) Oxford OxfordUniversity Press
Kong K Frigge M Cox N amp Wong W H (1992)Linkage analysis with adjustments for covariates Amethod combining peeling with Gibbs samplingCytogenetics and Cell Genetics 59 208ndash210
Kuhnert P amp Do K-A (2003) Fitting genetic models totwin data with binary and ordered categoricalresponses A comparison of structural equation mod-eling and Bayesian hierarchical models BehaviorGenetics 33 439ndash452
Lange K amp Sobel E (1991) A random walk methodfor computing genetic location scores AmericanJournal of Human Genetics 49 1320ndash1334
LeBlanc M amp Crowley J (1992) Relative risk trees forcensored survival data Biometrics 48 411ndash425
Li H amp Thompson E A (1997) Semiparametric esti-mation of major gene and random environmentaleffects for age of onset Biometrics 53 282ndash293
Li H amp Wijsman E A (1998) Semiparametric estima-tion of major gene effects for age of onset GeneticEpidemiology 15 279ndash298
MacLennan R Green A C McLeod G R amp MartinN G (1992) Increasing incidence of cutaneousmelanoma in Queensland Australia Journal of theNational Cancer Institute 84 1424ndash1432
Metropolis N Rosenbluth A W Rosenbluth M NTeller A H amp Teller E (1953) Equation of statecalculations by fast computing machines Journal ofChemical Physics 21 1087ndash1092
Nielsen G G Gill R D Andersen P K amp SorensonT I A (1992) A counting process approach tomaximum likelihood estimation in frailty modelsScandinavian Journal of Statistics 19 25ndash43
Oakes D (1989) Bivariate survival models induced byfrailties Journal of American Statistical Association84 487ndash493
Raftery A L Lewis S (1992a) Comment One longrun with diagnostics Implementation strategies forMarkov Chain Monte Carlo Statistical Science 7493ndash497
Raftery A L amp Lewis S (1992b) How many interac-tions in Gibbs sampler (pp 763ndash774) In J MBernardo J O Berber A P David amp F F M Smith(Eds) Bayesian statistics 4 Oxford Clarendon Press
SAGE (1992) Statistical Analysis for GeneticEpidemiology Release 21 Cleveland OH Departmentof Epidemiology and Biostatistics RammelkampCenter for Education and Research MetroHealthCampus Case Western Reserve University
Scurrah K J Palmer L J amp Burton P R (2000)Variance components analysis for pedigree-based cen-sored survival data unusing generalized linear mixedmodels (GLMMS) and Gibbs sampling in BUGSGenetic Epidemiology 19 127ndash148
112 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
Siegmund K D amp McKnight B (1998) Modelinghazard functions in families Genetic Epidemiology15 147ndash171
Spiegelhalter D J Thomas A amp Best N G (1996a)Computation on Bayesian graphical models (pp407ndash425) Oxford Oxford University Press
Spiegelhalter D J Thomas A Best N G amp Gilks W(1996b) BUGS 06 ndashBayesian InferenceUsing GibbsSampling manual Cambridge MRC Biostatistics Unit
Spiegelhalter D J Thomas A Best N G Lunn D(2003) WinBUGS (Bayesian Inference Using GibbsSampling) Windows version 14 user manualCambridge MRC Biostatistics Unit
Siskind V Aitken J Green A amp Martin N (2002)Sun exposure and interaction with family history inrisk of melanoma Queensland Australia Inter-national Journal of Cancer 97 90ndash95
Swerdlow A amp Green A (1987) Melanocytic naeviand melanoma An epidemiological perspectiveBritish Journal of Dermatology 117 137ndash146
Therneau T M amp Atkinson E (1997) An introductionto recursive partitioning using the RPART routines(Technical report) Rochester Mayo Foundation
Thomas D (1992) Fitting genetic data using Gibbs sam-pling An application to nevus counts in 38 Utahkindreds Cytogenetics and Cell Genetics 59 228ndash230
Thompson E A (1994) Monte Carlo programs for pedigree analysis (Technical Report 267)Washington Dept of Statist ics University ofWashington
Vaupel J M Manton K G amp Stallard E (1979) Theimpact of heterogeneity in individual frailty on thedynamics of mortality Demography 16 439ndash454
113Twin Research February 2004
Analysis of Melanoma Onset
Appendix A A BUGS Program to Implement the Bayesian MCMC approach in Modeling an ACEModel for Age at onset of Melanoma with Covariate Effects and Random GeneticsEnvironmental Effects
108 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
in Table 3 which suggest that later birth year havingat least a moderate number of naevi and freckles wereall simultaneously associated with later age at onsetof melanoma Table 3 also presents the estimatedodds-ratios for quantifying the correlation betweenpaired relatives of specific relationships
The odds ratio for sibndashsib pairs is 2973 which issignificantly different from 1 (p lt 001) The odds ratiofor parentndashchild pairs is 1650 which is slightly greaterthan 1 indicating a mild dependency between thesepairs although they are not quite statistically signifi-cant The odds ratio for second-degree and higherrelative pairs is 1155 indicating no dependence at allbetween these pairs This pattern of familial aggrega-tion is compatible with dominance variance as well asadditive genetic variance (Falconer 1990)
Bayesian Approach
We re-analyzed the age-at-onset of melanoma by usingGibbs sampling to impute missing covariates and toestimate subject-specific covariate effects randomadditive genetic common family environment andshared sibling environment effects on the log scale Theestimated shape parameter g was 43 with a 95 credi-ble interval (CI) of (42 46) The results aresummarized in Table 4 (Model A) The residual plotsdid not indicate a gross departure from the underlyingWeibull model and revealed no influential observa-tions We checked the sensitivity of the analyses to
initial parameter values by re-running the Gibbssampler five more times using different starting valuesThe resulting estimates did not differ by more than 5from the values reported here The mean estimate for(s 2
A) was 0452 with 95 CI = (0348 0566) for (s 2C)
was ndash0053 with 95 CI = (ndash0120 0019) and for(s 2
Cs) was 0467 with 95 CI = (0393 0545)A small negative value for the common family
effect suggests that there may be a dominant effect orthat the current model is not quite appropriate forexample that there are systematic effects or correla-tion structures that have not been accounted for Theresults here indicate that additive genetics seem toimpact equally on the variation of the age at onset ofmelanoma Further exploration for alternative modelsis a focus of our future research
In addition we investigated the relative contribu-tion of genetic and environmental effects on theexpression of naevi (Model B) and freckles (ModelC) which are known risk factors for melanoma Theexpressions of naevi and freckles were coded asbinary variables (none or few moles versus moderateor many moles and no freckles versus one or morefreckles) A hierarchical Bayesian binomial model wasfitted to estimate the random variance componentsThe results in Table 4 indicate that a common familyenvironment effect contributed the most to theexpression of naevi (s 2
C = 0704) (relative to the con-tributions of additive genetic effect (s 2
A = 0142) and
Individual j in family i
Figure 3
Graphical model of covariate and random family effects for an individual in a nuclear family structure tij represents the observed failure time forthe jth individual in the ith family with tcenij being an indicator variable of censoring status Full arrows indicate stochastic links to which a probabil-ity is attached broken arrows denote deterministic relationships bs are regression coefficients T is the precision of the prior distribution andequals the inverse of the variance Fi Gi Hi are random effects modeled as Fi ~ N(012 s 2
A + s 2C) Gi ~ N(0 12 s 2
A) and Hi ~ N(0 s 2Cs) Rectangles rep-
resent actual data values for the covariates g and mij are shape and scale parameters for the underlying Weibull distribution
109Twin Research February 2004
Analysis of Melanoma Onset
of shared sibling effect(s 2Cs = 0142) both of which
were non negligible In contrast variation in theexpression of freckles was largely explained by addi-tive genetic and shared family effects(s 2
A = 2050 s 2C =
2600) compared to a relatively small shared siblingeffect(s 2
C = 0115)
Discussion
We applied two methods mdash generalized estimatingequat ion and Bayesian analys is mdash to the geneticanalysis of age at onset of melanoma based on anuclear family structure Under both approaches theresults suggest that additive genetic factors played an
Table 4
Gibbs Sampling Approach Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families
Weibull Model A Mean effects mdash Response variable is Age-at-onset
Covariate RR = eb Coefficient b Robust se(b) 95 CI of bYear of birth 1378 0321 00027 (03160326)
Naevi 1126 0119 00021 (00580185)
Freckling 1017 0017 01400 (ndash00050085)
Weibull Model B Variance components mdash Response variable is Age-at-onset
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0452 0054 (03480566)
s 2C ndash0053 0027 (ndash01200019)
s 2Cs 0467 0040 (03930545)
g 43 0104 (4246)
Binomial Model Variance components mdash Response variable is Naevi
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0142 0149 (00020498)
s 2C 0704 0156 (04031010)
s 2Cs 0195 0157 (000250553)
Binomial Model Variance components mdash Response variable is Freckling
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 2050 0779 (08353570)
s 2C 2600 0418 (17803460)
s 2Cs 0115 0088 (00110312)
Note Indicates Significance at the 005 level) Naevi Is a Binary Variable with Baseline 0 = No or Few Moles Freckling Is Coded as a Binary Variable with Baseline0 = No Freckles
Table 3
GEE Approach Estimated Regression Coefficients in the Proportional Hazard Model and Estimated Odds Ratios for Quantifying FamilialAggregation in Age at Onset of Melanoma in Queensland Families ( Indicates Significance at the 05 Significance Level)
A Mean effects
Covariate RR = eb Coefficient b Robust se(b) Z-statistic
Year of birth 1142 0132 0051 2588
Naevi (Baseline = No or few moles) 1765 0568 0073 7781
Freckling (Baseline = No freckles) 1160 0148 0049 3020
B Pattterns of familial aggregation
Relationship 1 + q se(q) Z-statistic
Sibndashsib 2973 06217 317
Parentndashchild 1650 0434 150
SecondOthers 1155 0270 047
important role in the age at onset of melanoma butthat shared sibling environmental factors were notnegligible We focused attention on these approachesbecause they are more appropriate for modeling cor-related age at onset data and they allow the inclusionof covariates in the analyses Under both approachesthere were suggestions that earlier melanoma onsetwas influenced by later birth year having a moderatenumber of naevi and being freckly The principal dif-ference between the two approaches i s in theinterpretation of the regression coefficients The GEEmethod uses a marginal approach resulting in regres-sion coefficients that describe the average populationresponse to changing covariates whereas the Bayesianapproach produces subject-specific coefficients A sec-ondary distinction is in the nature of the within-pairdependence The GEE model only describes acommon covariance among specific relative pairswhereas the Bayesian approach can explicitly describethe source of this covariance A third advantage of theBayesian method is its flexibility in incorporatingprior information if available for the covariates orlatent effects by modifying their prior distributionsFurther the Bayesian method would permit a moreaccurate decomposition of the genetic variance intoadditive and dominant components thus providingthe means for a direct assessment of the no-domi-nance assumption Finally it is also interesting torecord the amount of CPU time required for eachmethod 20 seconds for the GEE approach andapproximately 3 hours (for binary traits) and 12hours (for age at onset outcome) to run BUGS on asingle-user Intel Pentium III 600 MHz personal com-puter with the Linux Mandrake 71 operating systemThe amount of human and financial resources dedi-cated to collecting maintaining and updating themelanoma family database is extremely high there-fore the extra CPU time requirement by the MCMCmethod is well worth the additional genetic informa-tion and flexibility that it provides A BUGS programis included in the Appendix
Acknowledgments
Data collection was supported by grants from theAustralian National Health and Medical ResearchCouncil (NHMRC) ID 930223 and ID 961061 USNational Cancer Institute (CA88363) Programmingresources were provided by start-up funds from theDepartment of Biostatistics at the M D AndersonCancer Center We acknowledge assistance withBUGS programming from Sang-Joon Lee
ReferencesAbel L amp Bonney G E (1990) A time-dependent
logistic hazard function for modeling variable age ofonset in the analysis of familial diseases GeneticEpidemiology 12 391ndash407
Aitken J F Bailey-Wilson J Green A C MacLennanR amp Martin N G (1998) Segregation analysis ofcutaneous melanoma in Queensland Genetic Epi-demiology 15 391ndash401
Aitken J F Green A C MacLennan R Youl P ampMartin N G (1996) The Queensland familialmelanoma project Study design and characteristics ofparticipants Melanoma Research 6 155ndash165
Allison P D (1995) Survival analysis using the SASsystem mdash A practical guide Cary NC SAS InstituteInc
Andersen P K Borgan O Gill R D amp Keiding N(1993) Statistical models based on count ingprocesses New York Springer-Verlag
Bernardo J M Berber J O David A P amp Smith F FM (Eds) (1992) Bayesian statistics 4 OxfordClarendon Press
Bonney G E (1986) Regressive logistic model for famil-ial disease and other binary traits Biometrics 42611ndash625
Breiman L Friedman J H Olshen R A amp Stone CJ (1984) Classif ication and regression treesWadsworth California
Cannings C amp Thompson E A (1977) Ascertainmentin the sequential sampling of pedigrees ClinicalGenetics 12 208ndash212
Clayton D G (1978) A model for association in bivari-ate life-tables and its application in epidemiologicalstudies of chronic disease incidence Biometrika 65141ndash151
Clayton D G amp Cuziak J (1985) Multivariate gener-alizations of the proportional hazards model (withdiscussion) Journal of the Royal Statistical SocietySeries A 148 82ndash117
Cox D R (1972) Regression models and life tables(with discussion) Journal of the Royal StatisticalSociety Series B 34 187ndash202
Do K-A Broom B M Kuhnert P Duffy D LTodorov A A Treloar S A amp Martin N G(2000) Genetic analysis of the age at menopause byusing estimating equations and Bayesian randomeffects models Statistics in Medicine 19 1217ndash1235
English D R Armstrong B K Kricker A et al(1997) Sunlight and cancer Cancer Causes Control8 271ndash283
Falconer D S (1990) Introduction to quantitativegenetics (3rd ed) New York Longman Group Ltd
Faucett C Gauderman W J Thomas D Ziogas Aamp Sobel E (1993) Combined segregation andlinkage analysis of late-onset Alzheimerrsquos disease inDuke famil ies us ing Gibbs sampling GeneticEpidemiology 10 489ndash494
GAP (1996) Genetic Analysis Package Release 10Pasadena CA Epicenter Software
110 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
111Twin Research February 2004
Analysis of Melanoma Onset
Gauderman W J Morrison J L Carpenter C L ampThomas D C (1997) Analysis of gene-smokinginteraction in lung cancer Genetic Epidemiology 14199ndash214
Gauderman W J amp Thomas D C (1994) Censoredsurvival models for genetic epidemiology A Gibbssampling approach Genetic Epidemiology 11171ndash188
Gelfand A E amp Smith A F M (1990) Sampling-based approaches to calculating marginal densitiesJournal of the American Statistical Association 85398ndash409
Gelman A amp Rubin D B (1992) Inference from itera-tive simulation using mult iple sequences (withdiscussion) Statistical Science 7 457ndash511
Geman S amp Geman D (1984) Stochastic relaxationGibbs distributions and the Bayesian restoration ofimages IEEE Trans Pattern Anal Mach Intelligence6 721ndash741
Geweke J (1992) Evaluating the accuracy of sampling-based approaches to calculating posterior momentsIn J M Bernardo J O Berber A P David amp F FM Smith (Eds) Bayesian statistics 4 OxfordClarendon Press
Gilks W R Thomas A amp Spiegelhalter D J (1994)A language and program for complex Bayesian mod-eling Statistica 43 169ndash178
Guo S W amp Thompson E A (1991) Monte CarloEM for the estimation of multiple random effectsmodels on large pedigrees IMA Journal ofMathematics Applied in Medicine and Biology 8171ndash189
Guo S W amp Thompson E A (1992) A Monte Carlomethod for combined segregation and linkage analy-sis American Journal of Human Genetics 511111ndash1126
Guo S W amp Thompson E A (1994) Monte Carloestimation of mixed models for large complex pedi-grees Biometrics 50 417ndash432
Hastings W K (1970) Monte Carlo sampling methodsusing Markov chains and their applicationsBiometrika 57 97ndash109
Heath A C (1997) Markov Chain Monte Carlo segre-gation and linkage analysis for oligogenic modelsAmerican Journal of Human Genetics 61 748ndash760
Hougaard P (1986) A class of multivariate failure timedistributions Biometrika 73 671ndash678
Hsu L amp Prentice R L (1996) On assessing thestrength of dependency between failure time variatesBiometrika 83 491ndash506
Hsu L amp Zhao L P (1996) Assessing familial aggre-gation of age at onset by using estimating equationswith application to breast cancer American Journalof Human Genetics 58 1057ndash1071
Kempthorne O (1960) Biometrical genetics New YorkPergammon Press
Kendall M amp Stuart A (1979) The advanced theoryof statistics (Vol 2 4th ed) Oxford OxfordUniversity Press
Kong K Frigge M Cox N amp Wong W H (1992)Linkage analysis with adjustments for covariates Amethod combining peeling with Gibbs samplingCytogenetics and Cell Genetics 59 208ndash210
Kuhnert P amp Do K-A (2003) Fitting genetic models totwin data with binary and ordered categoricalresponses A comparison of structural equation mod-eling and Bayesian hierarchical models BehaviorGenetics 33 439ndash452
Lange K amp Sobel E (1991) A random walk methodfor computing genetic location scores AmericanJournal of Human Genetics 49 1320ndash1334
LeBlanc M amp Crowley J (1992) Relative risk trees forcensored survival data Biometrics 48 411ndash425
Li H amp Thompson E A (1997) Semiparametric esti-mation of major gene and random environmentaleffects for age of onset Biometrics 53 282ndash293
Li H amp Wijsman E A (1998) Semiparametric estima-tion of major gene effects for age of onset GeneticEpidemiology 15 279ndash298
MacLennan R Green A C McLeod G R amp MartinN G (1992) Increasing incidence of cutaneousmelanoma in Queensland Australia Journal of theNational Cancer Institute 84 1424ndash1432
Metropolis N Rosenbluth A W Rosenbluth M NTeller A H amp Teller E (1953) Equation of statecalculations by fast computing machines Journal ofChemical Physics 21 1087ndash1092
Nielsen G G Gill R D Andersen P K amp SorensonT I A (1992) A counting process approach tomaximum likelihood estimation in frailty modelsScandinavian Journal of Statistics 19 25ndash43
Oakes D (1989) Bivariate survival models induced byfrailties Journal of American Statistical Association84 487ndash493
Raftery A L Lewis S (1992a) Comment One longrun with diagnostics Implementation strategies forMarkov Chain Monte Carlo Statistical Science 7493ndash497
Raftery A L amp Lewis S (1992b) How many interac-tions in Gibbs sampler (pp 763ndash774) In J MBernardo J O Berber A P David amp F F M Smith(Eds) Bayesian statistics 4 Oxford Clarendon Press
SAGE (1992) Statistical Analysis for GeneticEpidemiology Release 21 Cleveland OH Departmentof Epidemiology and Biostatistics RammelkampCenter for Education and Research MetroHealthCampus Case Western Reserve University
Scurrah K J Palmer L J amp Burton P R (2000)Variance components analysis for pedigree-based cen-sored survival data unusing generalized linear mixedmodels (GLMMS) and Gibbs sampling in BUGSGenetic Epidemiology 19 127ndash148
112 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
Siegmund K D amp McKnight B (1998) Modelinghazard functions in families Genetic Epidemiology15 147ndash171
Spiegelhalter D J Thomas A amp Best N G (1996a)Computation on Bayesian graphical models (pp407ndash425) Oxford Oxford University Press
Spiegelhalter D J Thomas A Best N G amp Gilks W(1996b) BUGS 06 ndashBayesian InferenceUsing GibbsSampling manual Cambridge MRC Biostatistics Unit
Spiegelhalter D J Thomas A Best N G Lunn D(2003) WinBUGS (Bayesian Inference Using GibbsSampling) Windows version 14 user manualCambridge MRC Biostatistics Unit
Siskind V Aitken J Green A amp Martin N (2002)Sun exposure and interaction with family history inrisk of melanoma Queensland Australia Inter-national Journal of Cancer 97 90ndash95
Swerdlow A amp Green A (1987) Melanocytic naeviand melanoma An epidemiological perspectiveBritish Journal of Dermatology 117 137ndash146
Therneau T M amp Atkinson E (1997) An introductionto recursive partitioning using the RPART routines(Technical report) Rochester Mayo Foundation
Thomas D (1992) Fitting genetic data using Gibbs sam-pling An application to nevus counts in 38 Utahkindreds Cytogenetics and Cell Genetics 59 228ndash230
Thompson E A (1994) Monte Carlo programs for pedigree analysis (Technical Report 267)Washington Dept of Statist ics University ofWashington
Vaupel J M Manton K G amp Stallard E (1979) Theimpact of heterogeneity in individual frailty on thedynamics of mortality Demography 16 439ndash454
113Twin Research February 2004
Analysis of Melanoma Onset
Appendix A A BUGS Program to Implement the Bayesian MCMC approach in Modeling an ACEModel for Age at onset of Melanoma with Covariate Effects and Random GeneticsEnvironmental Effects
109Twin Research February 2004
Analysis of Melanoma Onset
of shared sibling effect(s 2Cs = 0142) both of which
were non negligible In contrast variation in theexpression of freckles was largely explained by addi-tive genetic and shared family effects(s 2
A = 2050 s 2C =
2600) compared to a relatively small shared siblingeffect(s 2
C = 0115)
Discussion
We applied two methods mdash generalized estimatingequat ion and Bayesian analys is mdash to the geneticanalysis of age at onset of melanoma based on anuclear family structure Under both approaches theresults suggest that additive genetic factors played an
Table 4
Gibbs Sampling Approach Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families
Weibull Model A Mean effects mdash Response variable is Age-at-onset
Covariate RR = eb Coefficient b Robust se(b) 95 CI of bYear of birth 1378 0321 00027 (03160326)
Naevi 1126 0119 00021 (00580185)
Freckling 1017 0017 01400 (ndash00050085)
Weibull Model B Variance components mdash Response variable is Age-at-onset
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0452 0054 (03480566)
s 2C ndash0053 0027 (ndash01200019)
s 2Cs 0467 0040 (03930545)
g 43 0104 (4246)
Binomial Model Variance components mdash Response variable is Naevi
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 0142 0149 (00020498)
s 2C 0704 0156 (04031010)
s 2Cs 0195 0157 (000250553)
Binomial Model Variance components mdash Response variable is Freckling
Latent effect Mean from 5000 iterations se(s2) 95 CI of s2
s 2A 2050 0779 (08353570)
s 2C 2600 0418 (17803460)
s 2Cs 0115 0088 (00110312)
Note Indicates Significance at the 005 level) Naevi Is a Binary Variable with Baseline 0 = No or Few Moles Freckling Is Coded as a Binary Variable with Baseline0 = No Freckles
Table 3
GEE Approach Estimated Regression Coefficients in the Proportional Hazard Model and Estimated Odds Ratios for Quantifying FamilialAggregation in Age at Onset of Melanoma in Queensland Families ( Indicates Significance at the 05 Significance Level)
A Mean effects
Covariate RR = eb Coefficient b Robust se(b) Z-statistic
Year of birth 1142 0132 0051 2588
Naevi (Baseline = No or few moles) 1765 0568 0073 7781
Freckling (Baseline = No freckles) 1160 0148 0049 3020
B Pattterns of familial aggregation
Relationship 1 + q se(q) Z-statistic
Sibndashsib 2973 06217 317
Parentndashchild 1650 0434 150
SecondOthers 1155 0270 047
important role in the age at onset of melanoma butthat shared sibling environmental factors were notnegligible We focused attention on these approachesbecause they are more appropriate for modeling cor-related age at onset data and they allow the inclusionof covariates in the analyses Under both approachesthere were suggestions that earlier melanoma onsetwas influenced by later birth year having a moderatenumber of naevi and being freckly The principal dif-ference between the two approaches i s in theinterpretation of the regression coefficients The GEEmethod uses a marginal approach resulting in regres-sion coefficients that describe the average populationresponse to changing covariates whereas the Bayesianapproach produces subject-specific coefficients A sec-ondary distinction is in the nature of the within-pairdependence The GEE model only describes acommon covariance among specific relative pairswhereas the Bayesian approach can explicitly describethe source of this covariance A third advantage of theBayesian method is its flexibility in incorporatingprior information if available for the covariates orlatent effects by modifying their prior distributionsFurther the Bayesian method would permit a moreaccurate decomposition of the genetic variance intoadditive and dominant components thus providingthe means for a direct assessment of the no-domi-nance assumption Finally it is also interesting torecord the amount of CPU time required for eachmethod 20 seconds for the GEE approach andapproximately 3 hours (for binary traits) and 12hours (for age at onset outcome) to run BUGS on asingle-user Intel Pentium III 600 MHz personal com-puter with the Linux Mandrake 71 operating systemThe amount of human and financial resources dedi-cated to collecting maintaining and updating themelanoma family database is extremely high there-fore the extra CPU time requirement by the MCMCmethod is well worth the additional genetic informa-tion and flexibility that it provides A BUGS programis included in the Appendix
Acknowledgments
Data collection was supported by grants from theAustralian National Health and Medical ResearchCouncil (NHMRC) ID 930223 and ID 961061 USNational Cancer Institute (CA88363) Programmingresources were provided by start-up funds from theDepartment of Biostatistics at the M D AndersonCancer Center We acknowledge assistance withBUGS programming from Sang-Joon Lee
ReferencesAbel L amp Bonney G E (1990) A time-dependent
logistic hazard function for modeling variable age ofonset in the analysis of familial diseases GeneticEpidemiology 12 391ndash407
Aitken J F Bailey-Wilson J Green A C MacLennanR amp Martin N G (1998) Segregation analysis ofcutaneous melanoma in Queensland Genetic Epi-demiology 15 391ndash401
Aitken J F Green A C MacLennan R Youl P ampMartin N G (1996) The Queensland familialmelanoma project Study design and characteristics ofparticipants Melanoma Research 6 155ndash165
Allison P D (1995) Survival analysis using the SASsystem mdash A practical guide Cary NC SAS InstituteInc
Andersen P K Borgan O Gill R D amp Keiding N(1993) Statistical models based on count ingprocesses New York Springer-Verlag
Bernardo J M Berber J O David A P amp Smith F FM (Eds) (1992) Bayesian statistics 4 OxfordClarendon Press
Bonney G E (1986) Regressive logistic model for famil-ial disease and other binary traits Biometrics 42611ndash625
Breiman L Friedman J H Olshen R A amp Stone CJ (1984) Classif ication and regression treesWadsworth California
Cannings C amp Thompson E A (1977) Ascertainmentin the sequential sampling of pedigrees ClinicalGenetics 12 208ndash212
Clayton D G (1978) A model for association in bivari-ate life-tables and its application in epidemiologicalstudies of chronic disease incidence Biometrika 65141ndash151
Clayton D G amp Cuziak J (1985) Multivariate gener-alizations of the proportional hazards model (withdiscussion) Journal of the Royal Statistical SocietySeries A 148 82ndash117
Cox D R (1972) Regression models and life tables(with discussion) Journal of the Royal StatisticalSociety Series B 34 187ndash202
Do K-A Broom B M Kuhnert P Duffy D LTodorov A A Treloar S A amp Martin N G(2000) Genetic analysis of the age at menopause byusing estimating equations and Bayesian randomeffects models Statistics in Medicine 19 1217ndash1235
English D R Armstrong B K Kricker A et al(1997) Sunlight and cancer Cancer Causes Control8 271ndash283
Falconer D S (1990) Introduction to quantitativegenetics (3rd ed) New York Longman Group Ltd
Faucett C Gauderman W J Thomas D Ziogas Aamp Sobel E (1993) Combined segregation andlinkage analysis of late-onset Alzheimerrsquos disease inDuke famil ies us ing Gibbs sampling GeneticEpidemiology 10 489ndash494
GAP (1996) Genetic Analysis Package Release 10Pasadena CA Epicenter Software
110 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
111Twin Research February 2004
Analysis of Melanoma Onset
Gauderman W J Morrison J L Carpenter C L ampThomas D C (1997) Analysis of gene-smokinginteraction in lung cancer Genetic Epidemiology 14199ndash214
Gauderman W J amp Thomas D C (1994) Censoredsurvival models for genetic epidemiology A Gibbssampling approach Genetic Epidemiology 11171ndash188
Gelfand A E amp Smith A F M (1990) Sampling-based approaches to calculating marginal densitiesJournal of the American Statistical Association 85398ndash409
Gelman A amp Rubin D B (1992) Inference from itera-tive simulation using mult iple sequences (withdiscussion) Statistical Science 7 457ndash511
Geman S amp Geman D (1984) Stochastic relaxationGibbs distributions and the Bayesian restoration ofimages IEEE Trans Pattern Anal Mach Intelligence6 721ndash741
Geweke J (1992) Evaluating the accuracy of sampling-based approaches to calculating posterior momentsIn J M Bernardo J O Berber A P David amp F FM Smith (Eds) Bayesian statistics 4 OxfordClarendon Press
Gilks W R Thomas A amp Spiegelhalter D J (1994)A language and program for complex Bayesian mod-eling Statistica 43 169ndash178
Guo S W amp Thompson E A (1991) Monte CarloEM for the estimation of multiple random effectsmodels on large pedigrees IMA Journal ofMathematics Applied in Medicine and Biology 8171ndash189
Guo S W amp Thompson E A (1992) A Monte Carlomethod for combined segregation and linkage analy-sis American Journal of Human Genetics 511111ndash1126
Guo S W amp Thompson E A (1994) Monte Carloestimation of mixed models for large complex pedi-grees Biometrics 50 417ndash432
Hastings W K (1970) Monte Carlo sampling methodsusing Markov chains and their applicationsBiometrika 57 97ndash109
Heath A C (1997) Markov Chain Monte Carlo segre-gation and linkage analysis for oligogenic modelsAmerican Journal of Human Genetics 61 748ndash760
Hougaard P (1986) A class of multivariate failure timedistributions Biometrika 73 671ndash678
Hsu L amp Prentice R L (1996) On assessing thestrength of dependency between failure time variatesBiometrika 83 491ndash506
Hsu L amp Zhao L P (1996) Assessing familial aggre-gation of age at onset by using estimating equationswith application to breast cancer American Journalof Human Genetics 58 1057ndash1071
Kempthorne O (1960) Biometrical genetics New YorkPergammon Press
Kendall M amp Stuart A (1979) The advanced theoryof statistics (Vol 2 4th ed) Oxford OxfordUniversity Press
Kong K Frigge M Cox N amp Wong W H (1992)Linkage analysis with adjustments for covariates Amethod combining peeling with Gibbs samplingCytogenetics and Cell Genetics 59 208ndash210
Kuhnert P amp Do K-A (2003) Fitting genetic models totwin data with binary and ordered categoricalresponses A comparison of structural equation mod-eling and Bayesian hierarchical models BehaviorGenetics 33 439ndash452
Lange K amp Sobel E (1991) A random walk methodfor computing genetic location scores AmericanJournal of Human Genetics 49 1320ndash1334
LeBlanc M amp Crowley J (1992) Relative risk trees forcensored survival data Biometrics 48 411ndash425
Li H amp Thompson E A (1997) Semiparametric esti-mation of major gene and random environmentaleffects for age of onset Biometrics 53 282ndash293
Li H amp Wijsman E A (1998) Semiparametric estima-tion of major gene effects for age of onset GeneticEpidemiology 15 279ndash298
MacLennan R Green A C McLeod G R amp MartinN G (1992) Increasing incidence of cutaneousmelanoma in Queensland Australia Journal of theNational Cancer Institute 84 1424ndash1432
Metropolis N Rosenbluth A W Rosenbluth M NTeller A H amp Teller E (1953) Equation of statecalculations by fast computing machines Journal ofChemical Physics 21 1087ndash1092
Nielsen G G Gill R D Andersen P K amp SorensonT I A (1992) A counting process approach tomaximum likelihood estimation in frailty modelsScandinavian Journal of Statistics 19 25ndash43
Oakes D (1989) Bivariate survival models induced byfrailties Journal of American Statistical Association84 487ndash493
Raftery A L Lewis S (1992a) Comment One longrun with diagnostics Implementation strategies forMarkov Chain Monte Carlo Statistical Science 7493ndash497
Raftery A L amp Lewis S (1992b) How many interac-tions in Gibbs sampler (pp 763ndash774) In J MBernardo J O Berber A P David amp F F M Smith(Eds) Bayesian statistics 4 Oxford Clarendon Press
SAGE (1992) Statistical Analysis for GeneticEpidemiology Release 21 Cleveland OH Departmentof Epidemiology and Biostatistics RammelkampCenter for Education and Research MetroHealthCampus Case Western Reserve University
Scurrah K J Palmer L J amp Burton P R (2000)Variance components analysis for pedigree-based cen-sored survival data unusing generalized linear mixedmodels (GLMMS) and Gibbs sampling in BUGSGenetic Epidemiology 19 127ndash148
112 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
Siegmund K D amp McKnight B (1998) Modelinghazard functions in families Genetic Epidemiology15 147ndash171
Spiegelhalter D J Thomas A amp Best N G (1996a)Computation on Bayesian graphical models (pp407ndash425) Oxford Oxford University Press
Spiegelhalter D J Thomas A Best N G amp Gilks W(1996b) BUGS 06 ndashBayesian InferenceUsing GibbsSampling manual Cambridge MRC Biostatistics Unit
Spiegelhalter D J Thomas A Best N G Lunn D(2003) WinBUGS (Bayesian Inference Using GibbsSampling) Windows version 14 user manualCambridge MRC Biostatistics Unit
Siskind V Aitken J Green A amp Martin N (2002)Sun exposure and interaction with family history inrisk of melanoma Queensland Australia Inter-national Journal of Cancer 97 90ndash95
Swerdlow A amp Green A (1987) Melanocytic naeviand melanoma An epidemiological perspectiveBritish Journal of Dermatology 117 137ndash146
Therneau T M amp Atkinson E (1997) An introductionto recursive partitioning using the RPART routines(Technical report) Rochester Mayo Foundation
Thomas D (1992) Fitting genetic data using Gibbs sam-pling An application to nevus counts in 38 Utahkindreds Cytogenetics and Cell Genetics 59 228ndash230
Thompson E A (1994) Monte Carlo programs for pedigree analysis (Technical Report 267)Washington Dept of Statist ics University ofWashington
Vaupel J M Manton K G amp Stallard E (1979) Theimpact of heterogeneity in individual frailty on thedynamics of mortality Demography 16 439ndash454
113Twin Research February 2004
Analysis of Melanoma Onset
Appendix A A BUGS Program to Implement the Bayesian MCMC approach in Modeling an ACEModel for Age at onset of Melanoma with Covariate Effects and Random GeneticsEnvironmental Effects
important role in the age at onset of melanoma butthat shared sibling environmental factors were notnegligible We focused attention on these approachesbecause they are more appropriate for modeling cor-related age at onset data and they allow the inclusionof covariates in the analyses Under both approachesthere were suggestions that earlier melanoma onsetwas influenced by later birth year having a moderatenumber of naevi and being freckly The principal dif-ference between the two approaches i s in theinterpretation of the regression coefficients The GEEmethod uses a marginal approach resulting in regres-sion coefficients that describe the average populationresponse to changing covariates whereas the Bayesianapproach produces subject-specific coefficients A sec-ondary distinction is in the nature of the within-pairdependence The GEE model only describes acommon covariance among specific relative pairswhereas the Bayesian approach can explicitly describethe source of this covariance A third advantage of theBayesian method is its flexibility in incorporatingprior information if available for the covariates orlatent effects by modifying their prior distributionsFurther the Bayesian method would permit a moreaccurate decomposition of the genetic variance intoadditive and dominant components thus providingthe means for a direct assessment of the no-domi-nance assumption Finally it is also interesting torecord the amount of CPU time required for eachmethod 20 seconds for the GEE approach andapproximately 3 hours (for binary traits) and 12hours (for age at onset outcome) to run BUGS on asingle-user Intel Pentium III 600 MHz personal com-puter with the Linux Mandrake 71 operating systemThe amount of human and financial resources dedi-cated to collecting maintaining and updating themelanoma family database is extremely high there-fore the extra CPU time requirement by the MCMCmethod is well worth the additional genetic informa-tion and flexibility that it provides A BUGS programis included in the Appendix
Acknowledgments
Data collection was supported by grants from theAustralian National Health and Medical ResearchCouncil (NHMRC) ID 930223 and ID 961061 USNational Cancer Institute (CA88363) Programmingresources were provided by start-up funds from theDepartment of Biostatistics at the M D AndersonCancer Center We acknowledge assistance withBUGS programming from Sang-Joon Lee
ReferencesAbel L amp Bonney G E (1990) A time-dependent
logistic hazard function for modeling variable age ofonset in the analysis of familial diseases GeneticEpidemiology 12 391ndash407
Aitken J F Bailey-Wilson J Green A C MacLennanR amp Martin N G (1998) Segregation analysis ofcutaneous melanoma in Queensland Genetic Epi-demiology 15 391ndash401
Aitken J F Green A C MacLennan R Youl P ampMartin N G (1996) The Queensland familialmelanoma project Study design and characteristics ofparticipants Melanoma Research 6 155ndash165
Allison P D (1995) Survival analysis using the SASsystem mdash A practical guide Cary NC SAS InstituteInc
Andersen P K Borgan O Gill R D amp Keiding N(1993) Statistical models based on count ingprocesses New York Springer-Verlag
Bernardo J M Berber J O David A P amp Smith F FM (Eds) (1992) Bayesian statistics 4 OxfordClarendon Press
Bonney G E (1986) Regressive logistic model for famil-ial disease and other binary traits Biometrics 42611ndash625
Breiman L Friedman J H Olshen R A amp Stone CJ (1984) Classif ication and regression treesWadsworth California
Cannings C amp Thompson E A (1977) Ascertainmentin the sequential sampling of pedigrees ClinicalGenetics 12 208ndash212
Clayton D G (1978) A model for association in bivari-ate life-tables and its application in epidemiologicalstudies of chronic disease incidence Biometrika 65141ndash151
Clayton D G amp Cuziak J (1985) Multivariate gener-alizations of the proportional hazards model (withdiscussion) Journal of the Royal Statistical SocietySeries A 148 82ndash117
Cox D R (1972) Regression models and life tables(with discussion) Journal of the Royal StatisticalSociety Series B 34 187ndash202
Do K-A Broom B M Kuhnert P Duffy D LTodorov A A Treloar S A amp Martin N G(2000) Genetic analysis of the age at menopause byusing estimating equations and Bayesian randomeffects models Statistics in Medicine 19 1217ndash1235
English D R Armstrong B K Kricker A et al(1997) Sunlight and cancer Cancer Causes Control8 271ndash283
Falconer D S (1990) Introduction to quantitativegenetics (3rd ed) New York Longman Group Ltd
Faucett C Gauderman W J Thomas D Ziogas Aamp Sobel E (1993) Combined segregation andlinkage analysis of late-onset Alzheimerrsquos disease inDuke famil ies us ing Gibbs sampling GeneticEpidemiology 10 489ndash494
GAP (1996) Genetic Analysis Package Release 10Pasadena CA Epicenter Software
110 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
111Twin Research February 2004
Analysis of Melanoma Onset
Gauderman W J Morrison J L Carpenter C L ampThomas D C (1997) Analysis of gene-smokinginteraction in lung cancer Genetic Epidemiology 14199ndash214
Gauderman W J amp Thomas D C (1994) Censoredsurvival models for genetic epidemiology A Gibbssampling approach Genetic Epidemiology 11171ndash188
Gelfand A E amp Smith A F M (1990) Sampling-based approaches to calculating marginal densitiesJournal of the American Statistical Association 85398ndash409
Gelman A amp Rubin D B (1992) Inference from itera-tive simulation using mult iple sequences (withdiscussion) Statistical Science 7 457ndash511
Geman S amp Geman D (1984) Stochastic relaxationGibbs distributions and the Bayesian restoration ofimages IEEE Trans Pattern Anal Mach Intelligence6 721ndash741
Geweke J (1992) Evaluating the accuracy of sampling-based approaches to calculating posterior momentsIn J M Bernardo J O Berber A P David amp F FM Smith (Eds) Bayesian statistics 4 OxfordClarendon Press
Gilks W R Thomas A amp Spiegelhalter D J (1994)A language and program for complex Bayesian mod-eling Statistica 43 169ndash178
Guo S W amp Thompson E A (1991) Monte CarloEM for the estimation of multiple random effectsmodels on large pedigrees IMA Journal ofMathematics Applied in Medicine and Biology 8171ndash189
Guo S W amp Thompson E A (1992) A Monte Carlomethod for combined segregation and linkage analy-sis American Journal of Human Genetics 511111ndash1126
Guo S W amp Thompson E A (1994) Monte Carloestimation of mixed models for large complex pedi-grees Biometrics 50 417ndash432
Hastings W K (1970) Monte Carlo sampling methodsusing Markov chains and their applicationsBiometrika 57 97ndash109
Heath A C (1997) Markov Chain Monte Carlo segre-gation and linkage analysis for oligogenic modelsAmerican Journal of Human Genetics 61 748ndash760
Hougaard P (1986) A class of multivariate failure timedistributions Biometrika 73 671ndash678
Hsu L amp Prentice R L (1996) On assessing thestrength of dependency between failure time variatesBiometrika 83 491ndash506
Hsu L amp Zhao L P (1996) Assessing familial aggre-gation of age at onset by using estimating equationswith application to breast cancer American Journalof Human Genetics 58 1057ndash1071
Kempthorne O (1960) Biometrical genetics New YorkPergammon Press
Kendall M amp Stuart A (1979) The advanced theoryof statistics (Vol 2 4th ed) Oxford OxfordUniversity Press
Kong K Frigge M Cox N amp Wong W H (1992)Linkage analysis with adjustments for covariates Amethod combining peeling with Gibbs samplingCytogenetics and Cell Genetics 59 208ndash210
Kuhnert P amp Do K-A (2003) Fitting genetic models totwin data with binary and ordered categoricalresponses A comparison of structural equation mod-eling and Bayesian hierarchical models BehaviorGenetics 33 439ndash452
Lange K amp Sobel E (1991) A random walk methodfor computing genetic location scores AmericanJournal of Human Genetics 49 1320ndash1334
LeBlanc M amp Crowley J (1992) Relative risk trees forcensored survival data Biometrics 48 411ndash425
Li H amp Thompson E A (1997) Semiparametric esti-mation of major gene and random environmentaleffects for age of onset Biometrics 53 282ndash293
Li H amp Wijsman E A (1998) Semiparametric estima-tion of major gene effects for age of onset GeneticEpidemiology 15 279ndash298
MacLennan R Green A C McLeod G R amp MartinN G (1992) Increasing incidence of cutaneousmelanoma in Queensland Australia Journal of theNational Cancer Institute 84 1424ndash1432
Metropolis N Rosenbluth A W Rosenbluth M NTeller A H amp Teller E (1953) Equation of statecalculations by fast computing machines Journal ofChemical Physics 21 1087ndash1092
Nielsen G G Gill R D Andersen P K amp SorensonT I A (1992) A counting process approach tomaximum likelihood estimation in frailty modelsScandinavian Journal of Statistics 19 25ndash43
Oakes D (1989) Bivariate survival models induced byfrailties Journal of American Statistical Association84 487ndash493
Raftery A L Lewis S (1992a) Comment One longrun with diagnostics Implementation strategies forMarkov Chain Monte Carlo Statistical Science 7493ndash497
Raftery A L amp Lewis S (1992b) How many interac-tions in Gibbs sampler (pp 763ndash774) In J MBernardo J O Berber A P David amp F F M Smith(Eds) Bayesian statistics 4 Oxford Clarendon Press
SAGE (1992) Statistical Analysis for GeneticEpidemiology Release 21 Cleveland OH Departmentof Epidemiology and Biostatistics RammelkampCenter for Education and Research MetroHealthCampus Case Western Reserve University
Scurrah K J Palmer L J amp Burton P R (2000)Variance components analysis for pedigree-based cen-sored survival data unusing generalized linear mixedmodels (GLMMS) and Gibbs sampling in BUGSGenetic Epidemiology 19 127ndash148
112 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
Siegmund K D amp McKnight B (1998) Modelinghazard functions in families Genetic Epidemiology15 147ndash171
Spiegelhalter D J Thomas A amp Best N G (1996a)Computation on Bayesian graphical models (pp407ndash425) Oxford Oxford University Press
Spiegelhalter D J Thomas A Best N G amp Gilks W(1996b) BUGS 06 ndashBayesian InferenceUsing GibbsSampling manual Cambridge MRC Biostatistics Unit
Spiegelhalter D J Thomas A Best N G Lunn D(2003) WinBUGS (Bayesian Inference Using GibbsSampling) Windows version 14 user manualCambridge MRC Biostatistics Unit
Siskind V Aitken J Green A amp Martin N (2002)Sun exposure and interaction with family history inrisk of melanoma Queensland Australia Inter-national Journal of Cancer 97 90ndash95
Swerdlow A amp Green A (1987) Melanocytic naeviand melanoma An epidemiological perspectiveBritish Journal of Dermatology 117 137ndash146
Therneau T M amp Atkinson E (1997) An introductionto recursive partitioning using the RPART routines(Technical report) Rochester Mayo Foundation
Thomas D (1992) Fitting genetic data using Gibbs sam-pling An application to nevus counts in 38 Utahkindreds Cytogenetics and Cell Genetics 59 228ndash230
Thompson E A (1994) Monte Carlo programs for pedigree analysis (Technical Report 267)Washington Dept of Statist ics University ofWashington
Vaupel J M Manton K G amp Stallard E (1979) Theimpact of heterogeneity in individual frailty on thedynamics of mortality Demography 16 439ndash454
113Twin Research February 2004
Analysis of Melanoma Onset
Appendix A A BUGS Program to Implement the Bayesian MCMC approach in Modeling an ACEModel for Age at onset of Melanoma with Covariate Effects and Random GeneticsEnvironmental Effects
111Twin Research February 2004
Analysis of Melanoma Onset
Gauderman W J Morrison J L Carpenter C L ampThomas D C (1997) Analysis of gene-smokinginteraction in lung cancer Genetic Epidemiology 14199ndash214
Gauderman W J amp Thomas D C (1994) Censoredsurvival models for genetic epidemiology A Gibbssampling approach Genetic Epidemiology 11171ndash188
Gelfand A E amp Smith A F M (1990) Sampling-based approaches to calculating marginal densitiesJournal of the American Statistical Association 85398ndash409
Gelman A amp Rubin D B (1992) Inference from itera-tive simulation using mult iple sequences (withdiscussion) Statistical Science 7 457ndash511
Geman S amp Geman D (1984) Stochastic relaxationGibbs distributions and the Bayesian restoration ofimages IEEE Trans Pattern Anal Mach Intelligence6 721ndash741
Geweke J (1992) Evaluating the accuracy of sampling-based approaches to calculating posterior momentsIn J M Bernardo J O Berber A P David amp F FM Smith (Eds) Bayesian statistics 4 OxfordClarendon Press
Gilks W R Thomas A amp Spiegelhalter D J (1994)A language and program for complex Bayesian mod-eling Statistica 43 169ndash178
Guo S W amp Thompson E A (1991) Monte CarloEM for the estimation of multiple random effectsmodels on large pedigrees IMA Journal ofMathematics Applied in Medicine and Biology 8171ndash189
Guo S W amp Thompson E A (1992) A Monte Carlomethod for combined segregation and linkage analy-sis American Journal of Human Genetics 511111ndash1126
Guo S W amp Thompson E A (1994) Monte Carloestimation of mixed models for large complex pedi-grees Biometrics 50 417ndash432
Hastings W K (1970) Monte Carlo sampling methodsusing Markov chains and their applicationsBiometrika 57 97ndash109
Heath A C (1997) Markov Chain Monte Carlo segre-gation and linkage analysis for oligogenic modelsAmerican Journal of Human Genetics 61 748ndash760
Hougaard P (1986) A class of multivariate failure timedistributions Biometrika 73 671ndash678
Hsu L amp Prentice R L (1996) On assessing thestrength of dependency between failure time variatesBiometrika 83 491ndash506
Hsu L amp Zhao L P (1996) Assessing familial aggre-gation of age at onset by using estimating equationswith application to breast cancer American Journalof Human Genetics 58 1057ndash1071
Kempthorne O (1960) Biometrical genetics New YorkPergammon Press
Kendall M amp Stuart A (1979) The advanced theoryof statistics (Vol 2 4th ed) Oxford OxfordUniversity Press
Kong K Frigge M Cox N amp Wong W H (1992)Linkage analysis with adjustments for covariates Amethod combining peeling with Gibbs samplingCytogenetics and Cell Genetics 59 208ndash210
Kuhnert P amp Do K-A (2003) Fitting genetic models totwin data with binary and ordered categoricalresponses A comparison of structural equation mod-eling and Bayesian hierarchical models BehaviorGenetics 33 439ndash452
Lange K amp Sobel E (1991) A random walk methodfor computing genetic location scores AmericanJournal of Human Genetics 49 1320ndash1334
LeBlanc M amp Crowley J (1992) Relative risk trees forcensored survival data Biometrics 48 411ndash425
Li H amp Thompson E A (1997) Semiparametric esti-mation of major gene and random environmentaleffects for age of onset Biometrics 53 282ndash293
Li H amp Wijsman E A (1998) Semiparametric estima-tion of major gene effects for age of onset GeneticEpidemiology 15 279ndash298
MacLennan R Green A C McLeod G R amp MartinN G (1992) Increasing incidence of cutaneousmelanoma in Queensland Australia Journal of theNational Cancer Institute 84 1424ndash1432
Metropolis N Rosenbluth A W Rosenbluth M NTeller A H amp Teller E (1953) Equation of statecalculations by fast computing machines Journal ofChemical Physics 21 1087ndash1092
Nielsen G G Gill R D Andersen P K amp SorensonT I A (1992) A counting process approach tomaximum likelihood estimation in frailty modelsScandinavian Journal of Statistics 19 25ndash43
Oakes D (1989) Bivariate survival models induced byfrailties Journal of American Statistical Association84 487ndash493
Raftery A L Lewis S (1992a) Comment One longrun with diagnostics Implementation strategies forMarkov Chain Monte Carlo Statistical Science 7493ndash497
Raftery A L amp Lewis S (1992b) How many interac-tions in Gibbs sampler (pp 763ndash774) In J MBernardo J O Berber A P David amp F F M Smith(Eds) Bayesian statistics 4 Oxford Clarendon Press
SAGE (1992) Statistical Analysis for GeneticEpidemiology Release 21 Cleveland OH Departmentof Epidemiology and Biostatistics RammelkampCenter for Education and Research MetroHealthCampus Case Western Reserve University
Scurrah K J Palmer L J amp Burton P R (2000)Variance components analysis for pedigree-based cen-sored survival data unusing generalized linear mixedmodels (GLMMS) and Gibbs sampling in BUGSGenetic Epidemiology 19 127ndash148
112 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
Siegmund K D amp McKnight B (1998) Modelinghazard functions in families Genetic Epidemiology15 147ndash171
Spiegelhalter D J Thomas A amp Best N G (1996a)Computation on Bayesian graphical models (pp407ndash425) Oxford Oxford University Press
Spiegelhalter D J Thomas A Best N G amp Gilks W(1996b) BUGS 06 ndashBayesian InferenceUsing GibbsSampling manual Cambridge MRC Biostatistics Unit
Spiegelhalter D J Thomas A Best N G Lunn D(2003) WinBUGS (Bayesian Inference Using GibbsSampling) Windows version 14 user manualCambridge MRC Biostatistics Unit
Siskind V Aitken J Green A amp Martin N (2002)Sun exposure and interaction with family history inrisk of melanoma Queensland Australia Inter-national Journal of Cancer 97 90ndash95
Swerdlow A amp Green A (1987) Melanocytic naeviand melanoma An epidemiological perspectiveBritish Journal of Dermatology 117 137ndash146
Therneau T M amp Atkinson E (1997) An introductionto recursive partitioning using the RPART routines(Technical report) Rochester Mayo Foundation
Thomas D (1992) Fitting genetic data using Gibbs sam-pling An application to nevus counts in 38 Utahkindreds Cytogenetics and Cell Genetics 59 228ndash230
Thompson E A (1994) Monte Carlo programs for pedigree analysis (Technical Report 267)Washington Dept of Statist ics University ofWashington
Vaupel J M Manton K G amp Stallard E (1979) Theimpact of heterogeneity in individual frailty on thedynamics of mortality Demography 16 439ndash454
113Twin Research February 2004
Analysis of Melanoma Onset
Appendix A A BUGS Program to Implement the Bayesian MCMC approach in Modeling an ACEModel for Age at onset of Melanoma with Covariate Effects and Random GeneticsEnvironmental Effects
112 Twin Research February 2004
Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin
Siegmund K D amp McKnight B (1998) Modelinghazard functions in families Genetic Epidemiology15 147ndash171
Spiegelhalter D J Thomas A amp Best N G (1996a)Computation on Bayesian graphical models (pp407ndash425) Oxford Oxford University Press
Spiegelhalter D J Thomas A Best N G amp Gilks W(1996b) BUGS 06 ndashBayesian InferenceUsing GibbsSampling manual Cambridge MRC Biostatistics Unit
Spiegelhalter D J Thomas A Best N G Lunn D(2003) WinBUGS (Bayesian Inference Using GibbsSampling) Windows version 14 user manualCambridge MRC Biostatistics Unit
Siskind V Aitken J Green A amp Martin N (2002)Sun exposure and interaction with family history inrisk of melanoma Queensland Australia Inter-national Journal of Cancer 97 90ndash95
Swerdlow A amp Green A (1987) Melanocytic naeviand melanoma An epidemiological perspectiveBritish Journal of Dermatology 117 137ndash146
Therneau T M amp Atkinson E (1997) An introductionto recursive partitioning using the RPART routines(Technical report) Rochester Mayo Foundation
Thomas D (1992) Fitting genetic data using Gibbs sam-pling An application to nevus counts in 38 Utahkindreds Cytogenetics and Cell Genetics 59 228ndash230
Thompson E A (1994) Monte Carlo programs for pedigree analysis (Technical Report 267)Washington Dept of Statist ics University ofWashington
Vaupel J M Manton K G amp Stallard E (1979) Theimpact of heterogeneity in individual frailty on thedynamics of mortality Demography 16 439ndash454
113Twin Research February 2004
Analysis of Melanoma Onset
Appendix A A BUGS Program to Implement the Bayesian MCMC approach in Modeling an ACEModel for Age at onset of Melanoma with Covariate Effects and Random GeneticsEnvironmental Effects
113Twin Research February 2004
Analysis of Melanoma Onset
Appendix A A BUGS Program to Implement the Bayesian MCMC approach in Modeling an ACEModel for Age at onset of Melanoma with Covariate Effects and Random GeneticsEnvironmental Effects