Twin Research Volume 7 Number 1 pp. 98–113, February 2004

Analysis of Melanoma Onset: Assessing Familial Aggregation by Using Estimating Equations and Fitting Variance Components via Bayesian Random Effects Models

Kim-Anh Do¹, Joanne F. Aitken², Adèle C. Green³, and Nicholas G. Martin³

¹ Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA
² Queensland Cancer Fund Epidemiology Unit, Brisbane, Australia
³ Epidemiology and Population Health Unit, Queensland Institute of Medical Research, Brisbane, Australia

Address for correspondence: Dr. K-A Do, Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, Texas 77030, USA. Email: kim@mdanderson.org

We investigate whether relative contributions of genetic and shared environmental factors are associated with an increased risk in melanoma. Data from the Queensland Familial Melanoma Project comprising 15,907 subjects arising from 1912 families were analyzed to estimate the additive genetic, common and unique environmental contributions to variation in the age at onset of melanoma. Two complementary approaches for analyzing correlated time-to-onset family data were considered: the generalized estimating equations (GEE) method, in which one can estimate relationship-specific dependence simultaneously with regression coefficients that describe the average population response to changing covariates; and a subject-specific Bayesian mixed model, in which heterogeneity in regression parameters is explicitly modeled and the different components of variation may be estimated directly. The proportional hazards and Weibull models were utilized, as both produce natural frameworks for estimating relative risks while adjusting for simultaneous effects of other covariates. A simple Markov Chain Monte Carlo method for covariate imputation of missing data was used, and the actual implementation of the Bayesian model was based on Gibbs sampling using the freeware package BUGS. In addition, we also used a Bayesian model to investigate the relative contribution of genetic and environmental effects on the expression of naevi and freckles, which are known risk factors for melanoma.

Melanoma is a complex chronic disease, the incidence of which has more than doubled over the past 20 years (MacLennan et al., 1992). The disease's complexity can be, in part, explained in terms of the joint effects of genotype and environment, and censorship due to the late onset of the disease in most patients.

Studies have revealed a number of possible underlying factors that may contribute to the risk of melanoma (English et al., 1997; Siskind et al., 2002; Swerdlow & Green, 1987). Such factors have included sun exposure, skin, hair and eye colour, degree of freckling, number of naevi, place of birth, and ethnic origin. It is also thought that certain genes are responsible for a person's susceptibility to the disease (Aitken et al., 1998) and that a number of the above mentioned risk factors may themselves be genetically influenced.

A preliminary segregation analysis conducted by Aitken et al. (1998) investigated whether the familial clustering of cutaneous melanoma was consistent with Mendelian inheritance of a major autosomal gene. Analyses were performed with the SAGE statistical package using the maximum likelihood REGTL program for a binary trait. The hypothesis of co-dominant Mendelian inheritance gave a significantly better fit to the data than either dominant or recessive Mendelian inheritance. Overall, both Mendelian inheritance of a single major gene and purely environmental transmission were rejected. However, there was strong evidence of familial dependence in melanoma occurrence.

Despite these findings, the etiology of melanoma is still not well understood. For example, it is unclear why some people are more susceptible to the disease than others and, for susceptible cases, it is unclear whether certain risk factors play a role in the progression of the disease or if genetic factors are a major source of influence.



Research into chronic diseases such as melanoma has largely been achieved through family studies, where the age of onset of the disease is modelled within a survival analysis framework. An example of this type of analysis is the work by Abel and Bonney (1990), who developed a model accounting for age of onset where the hazard function was expressed in terms of a major gene effect and residual family dependence, using the regressive approach described in Bonney (1986).

Clayton (1978) and Vaupel (1979) introduced the so-called "frailty" model, which has been extended by many others (Andersen et al., 1993; Clayton & Cuzick, 1985; Gauderman & Thomas, 1994; Hougaard, 1986; Li & Thompson, 1997; Li & Wijsman, 1998; Nielsen et al., 1992; Siegmund & McKnight, 1998). A variant of these approaches uses estimating equations, which accommodate correlated age of onset outcomes and are known to be robust and computationally efficient (Hsu & Prentice, 1996; Hsu & Zhao, 1996). A common theme is the investigation of possible underlying genetic and environmental factors that may influence the age of onset of the disease of interest.

Recently, the focus of familial studies involving censored data has moved towards the development of Bayesian methods using packages such as BUGS and WinBUGS (Gilks et al., 1994; Spiegelhalter et al., 1996a, 1996b, 2003), the Genetic Analysis Package (GAP, 1996) and MIXD (Olshen & Wijsman, 1996; Thompson, 1994). Examples of this include the implementation of Markov Chain Monte Carlo (MCMC) methods for linkage analysis (Heath, 1997; Kong et al., 1992; Lange & Sobel, 1991), the estimation of parameters in a mixed model with and without covariates (Guo & Thompson, 1991; Thomas, 1992), estimation of a gene-smoking interaction and covariate imputation (Gauderman et al., 1997), combined linkage and segregation analysis (Guo & Thompson, 1992; Faucett et al., 1993) and development of mixed models for large complex pedigrees (Guo & Thompson, 1994). Recent work by Do et al. (2000) demonstrates an application of Bayesian methodology to menopausal age in twins using a generalised linear mixed model (GLMM). In this application, they investigated the contribution of covariates and any underlying genetic and environmental factors to explain variation in menopausal age. A similar investigation by Scurrah et al. (2000) fitted a Bayesian model to survival data. In this application, the authors used a Bayesian piece-wise exponential model to explore the time to onset of respiratory disease, given known risk factors and possible familial effects. Both papers highlight the flexible nature of the Bayesian approach, which can be seen through the inclusion of priors and the integration of fixed and random effects.

The Bayesian approach to fitting a genetic model in a GLMM framework was examined in detail in Kuhnert and Do (2003). In this paper, a simulation study investigated the flexible nature of the Bayesian model and its ability to incorporate genetic components through random effects. This was compared with standard maximum likelihood methods for estimating genetic components in the model. Results from a simulation study indicated a consistent advantage in using the Bayesian method to detect a correct model under particular scenarios of additive genetics and common environmental effects. Moreover, for binary data there was difficulty in detecting the correct model under low and moderate levels of heritability. Results, however, were improved for ordinal data under similar scenarios.

We present an alternative model using Bayesian methodology which takes into account the complex features inherent with melanoma, using a large dataset comprising 1449 families. The model does not assume proportional hazards but a multiplicative model where the Weibull distribution is used to model the age at onset of melanoma. This is fit in a Bayesian framework which incorporates fixed and random effects to estimate possible risk factors, covariates and any underlying genetic factors. Another approach to accommodate the correlated age-at-onset outcomes rigorously is to use the estimating equations for assessing familial aggregation of age-at-onset (Hsu & Prentice, 1996; Hsu & Zhao, 1996). It has two desirable features: (i) robustness, in that no higher-order distributional assumptions are required beyond pairwise ones; and (ii) computational efficiency.

In this article, our interest lies primarily with investigating whether certain risk factors are able to explain a considerable proportion of the familial dependence, and whether including them in the model reduces the residual variation and results in increased power for detecting a major gene effect. To our knowledge, this study will be the first in melanoma research that extends current methodology and incorporates covariate and genetic effects simultaneously with age of onset using Bayesian methodology on family data.

Material and Methods

The Data on the Age at Onset of Melanoma and Potential Risk Factors

We analysed data from the Queensland Familial Melanoma Project. Family ascertainment and data collection have been described in detail (Aitken et al., 1996). The standard melanoma risk factors assessed include counts of naevi on the arms and back, demographic and medical details, lifetime residence, and family history of melanoma and other cancers. Briefly, we ascertained all 12,016 first incident cases of cutaneous melanoma (invasive and in situ) diagnosed in Queensland residents between 1982 and 1990 and reported to the Queensland Cancer Registry, or found by comparing cancer registrations for 1984 and 1987 with records of pathology laboratories throughout Queensland. It is estimated that registry records are approximately 95% complete for the study period. Doctors' permission was obtained to approach 10,407 cases, of whom 7784 (75%) returned a brief family history questionnaire stating whether any of their first-degree relatives (parents, siblings, children) had had a diagnosis of melanoma. A total of 2920 probands was sampled from these respondents, including all who had claimed a positive family history (n = 1529) and an approximate 20% random sample of the remainder (n = 1391). Probands were sent a detailed family history questionnaire asking for the names and addresses of all first-degree relatives, relatives' vital status, dates of birth and ages, and whether any of these relatives had had a melanoma diagnosed by a doctor. To avoid bias in determining the mode of inheritance, second and lower degree relatives were enrolled in the study according to a sequential sampling scheme (Cannings & Thompson, 1977). First degree relatives of all relatives with confirmed melanoma were ascertained through the detailed family history questionnaire described above, which was mailed to all confirmed positive relatives. In total, 15,989 relatives belonging to 1912 separate families were reported by 2118 (73%) probands or other positive relatives. A total of 1044 relatives for whom date of birth was unknown were excluded, leaving 14,945 relatives for analysis. There were 188 families independently ascertained through two or more probands. To avoid ascertainment bias, these families were included in the dataset separately for each proband in the family.
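The counts quoted in the ascertainment description can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the figures stated in the text; the variable names are illustrative and not part of the original study.

```python
# Counts quoted in the ascertainment description (variable names are illustrative).
cases_approached = 10407
questionnaires_returned = 7784
probands_sampled = 2920
positive_family_history = 1529
random_sample_of_remainder = 1391
reporting_probands = 2118
relatives_reported = 15989
relatives_missing_birth_date = 1044

# Response proportions, as fractions of the relevant denominators.
return_rate = questionnaires_returned / cases_approached
reporting_rate = reporting_probands / probands_sampled

# Relatives retained after excluding those with unknown date of birth.
relatives_analysed = relatives_reported - relatives_missing_birth_date

print(f"brief questionnaire return rate: {100 * return_rate:.1f}%")
print(f"probands reporting relatives:    {100 * reporting_rate:.1f}%")
print(f"relatives available for analysis: {relatives_analysed}")
```

The two proportions round to the 75% and 73% given in the text, the two proband subgroups add up to the 2920 sampled, and the subtraction reproduces the 14,945 relatives retained for analysis.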

Medical confirmation and dates of diagnosis were sought for the relatives reported by probands or other relatives to have had melanoma. After eliminating 18.7% of subjects who refused access to their medical records, or those with lost records, or those with false positive reports (basal or squamous cell carcinoma, solar keratoses or benign naevi), medical confirmation of melanoma as the diagnosis was obtained for 48.2% of the original number of relatives. Only the medically verified cases among relatives were classified as true events; all other relatives were treated as unaffected (censored at last date of contact).

Risk factor questionnaires were subsequently mailed to all living relatives aged between 18 and 75 years ascertained through the sequential sampling procedure. Other relatives provided proxy reports. The combined number of proxy-reports and self-reports was 9746 relatives for whom standard risk factor information was available.

For the Bayesian analysis, we focused on families that included at least one parent and at least one child, where each member in the family should have information on age at diagnosis or age at last follow-up, and with a maximum of one missing covariate.

The demographic covariates and hypothesized melanoma risk factors included gender, birth year, place of birth, ability to tan (very brown, moderate tan, slight tan, no tan), propensity to burn (never burn and always tan, sometimes burn and usually tan, usually burn and sometimes tan, always burn and never tan), number of sunburns (0, 1, 2–5, > 6), skin color (olive/dark, medium, fair/pale), hair color (black, light/dark brown, fair/blonde, light/dark red), eye color (brown, green/hazel, blue/grey), total freckling in summer (0, 1–100, > 100), number of naevi (none, few, moderate number, very many), and numerous measures of cumulative lifetime exposures to sun and ultraviolet rays.

Preliminary Exploratory Analysis

As a preliminary analysis, we ignored correlations within families and applied a combination of parametric and non-parametric survival analysis techniques as exploratory tools to identify possible risk factors for melanoma. Once these fixed effects were identified, we considered incorporating these into a subsequent generalized estimating equation model or a Bayesian model with random effects that could account for within-family correlations. The aim was to quantify the genetic and familial associations in the presence of observed covariate effects.

Manipulation of the entire melanoma dataset resulted in a subset of 9669 observations with a range of explanatory variables that described phenotypic characteristics for each individual, along with some demographic details such as birth year and gender. The response variable was the time to diagnosis (or age at the last follow-up), with the proportion of censored cases being approximately 76%. The median age at onset of melanoma was 43. The correlation estimates for age of onset for different relationship pairs with both affected members were 0.67, 0.55 and 0.39 for sib–sib, parent–child and second/lower order pairs, respectively.

The first stage of modelling involved fitting univariable proportional hazard models to assess each variable's individual effect on the time of onset of melanoma. The SAS (Allison, 1995) package was used to fit proportional hazards models of the following form:

$$h(t, x) = Y(x, b)\, h_0(t)$$

where $Y$ represents a log-linear function $e^{b'x}$ of the explanatory variables $x$ and corresponding coefficients $b$, and $h_0(t)$ represents the baseline hazard at time $t$.
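To make the role of the log-linear term concrete, the following minimal sketch evaluates a proportional hazards function and the relative risk between two covariate profiles. It is an illustration only: the baseline hazard, the single binary covariate and the coefficient value 0.31 are assumptions chosen for the example, not fitted quantities from the study.

```python
import math


def linear_predictor(b, x):
    """Inner product b'x of coefficients and covariates."""
    return sum(bj * xj for bj, xj in zip(b, x))


def hazard(t, x, b, baseline_hazard):
    """Proportional hazards form: h(t, x) = exp(b'x) * h0(t)."""
    return math.exp(linear_predictor(b, x)) * baseline_hazard(t)


def relative_risk(b, x_new, x_ref):
    """Hazard ratio between two covariate profiles; h0(t) cancels out."""
    return math.exp(linear_predictor(b, x_new) - linear_predictor(b, x_ref))


# Hypothetical baseline hazard that rises with age (an assumption for illustration).
def h0(t):
    return 1e-4 * t


b = [0.31]        # hypothetical coefficient for one binary covariate
exposed, reference = [1], [0]

rr = relative_risk(b, exposed, reference)
ratio_at_40 = hazard(40, exposed, b, h0) / hazard(40, reference, b, h0)
print(f"relative risk: {rr:.2f}")
```

Because the baseline hazard cancels in the ratio, the relative risk is the same at every age, which is the defining property of the proportional hazards form.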

Significant variables associated with the age at onset of melanoma consisted of eye, hair and skin colour, freckling, number of moles, skin type, ability to burn, ability to tan, previous skin cancers, ultraviolet exposure between the ages of 5 and 12 years, cumulative sun exposure up to the age of 19 years, and birth year.

The second stage fitted multivariable proportional hazard models to those explanatory variables that were significant at the univariable stage. Table 1 displays the results from the final model, which only included significant variables (p-value < .05). It is worth noting that a similar result could be obtained using an automated stepwise procedure.

Results from this analysis highlight some interesting but quite obvious risk factors noted in previous analyses (Aitken et al., 1996, 1998). For example:

• An increase of 1 year in birth year induces a 17% increase in risk of earlier melanoma onset.

• People with neither freckles nor naevi have the lowest risk of melanoma onset.

• The risk of earlier melanoma onset is increased by up to 37% for blue eyed people and even further (46%) for green eyed people, when compared to individuals with brown eyes.

• "Red Heads" have an increased risk of earlier melanoma onset (46%) when compared to individuals with black hair. However, no significant increase was noted for individuals with fair or light red hair.

• A person's tendency to burn easily increases the risk of earlier melanoma onset, in some cases by up to 100%, compared to those that never burn.

However striking this last statement is, issues of confounding must also be considered. The most obvious illustration of this is the confounding that occurred between mole count and freckling. This is seen through close inspection of the parameter estimates, which changed in magnitude when mole count was added to the model after adjusting for freckling. (Results not displayed here.)

Table 1

Results from Fitting a Multivariable Proportional Hazards Model to the Melanoma Data Based on Univariable Results. The Results Reported in this Table Are the Parameter Estimates b, Their Standard Errors se(b), the Relative Risk e^b, and the p-value for Each Estimate.

Variable                               b         se(b)       e^b      p-value
Birth Year                             0.16      < 0.01      1.17     < .05
Eye Colour (Baseline: Brown)
  Blue/Grey                            0.31      0.07        1.36     < .05
  Green/Hazel                          0.38      0.07        1.46     < .05
Hair Colour (Baseline: Black)
  Light Red/Ginger                     0.17      0.15        1.19     .27
  Dark Red/Auburn                      0.38      0.15        1.46     < .05
  Fair/Blonde                          0.06      0.12        1.06     .62
  Light Brown                          0.14      0.12        1.15     .22
  Dark Brown                           0.02      0.12        1.02     .87
Skin Type (Baseline: never burn)
  Always burn                          0.69      0.16        1.968    < .05
  Usually burn                         0.45      0.15        1.57     < .05
  Sometimes burn                       0.31      0.15        1.36     < .05
Freckling (Baseline: none)
  1 to 100                             0.17      0.06        1.18     < .05
  > 100                                0.09      0.08        1.10     .23
Mole Count (Baseline: none)
  Few                                  0.29      0.07        1.34     < .05
  Moderate                             0.79      0.08        2.20     < .05
  Many                                 1.12      0.10        3.08     < .05
Number of Sunburns (Baseline: none)
  One                                  –0.07     0.11        0.93     .49
  2 to 5                               –0.06     0.09        0.94     .50
  > 6                                  0.17      0.09        1.19     .07
Cumulative Sun Exposure (< 5 yrs)      0.04      0.01        1.04     < .05
UV Exposure (5–12 yrs)                 0.0003    < 0.0001    1        < .05

To reduce the dimension of the problem further and avoid some of these confounding issues, a survival tree was constructed using RPART (Recursive Partitioning and Regression Trees); see Therneau and Atkinson (1997). Survival trees are a special case of decision trees that were incorporated into RPART by Therneau (1997) using the ideas put forth by LeBlanc and Crowley (1992) for survival data with censoring. These authors showed through simulation studies that survival trees could outperform standard parametric methods such as proportional hazards modelling, particularly in situations where the underlying distribution was not exponential. Along with Breiman et al. (1984), they showed decision trees to be useful exploratory tools for identifying important variables, interactions and outliers.

The methodology for survival trees begins by identifying optimal splits using the log rank statistic to separate the data into homogeneous groups. Each split is comprised of a parent node and two daughter nodes, which are linked to the parent by branches. Figure 1 is an example of a survival tree produced on the melanoma data. Once a large tree is grown, with many terminal nodes that contain very few observations, a pruning procedure is introduced to identify a sequence of sub-trees. The technique of pruning involves snipping back splits of the tree one at a time until only the root node remains. Cross-validation is then introduced to aid in the selection of the optimal model. One nice feature of survival trees is the use of Kaplan-Meier curves to provide information about survival rates at each terminal node of the tree. This approach may allow for better interpretation of the terminal nodes and highlight different scenarios which yield similar survival rates, as can be seen in Figure 1.
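The Kaplan-Meier curve displayed at each terminal node can be computed directly from the ages and censoring indicators of the individuals in that node. The sketch below is a plain product-limit estimator written for illustration; it is not the RPART implementation, and the toy ages and event indicators are invented for the example.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survivor function.

    times  : observed ages (age at onset, or age at last follow-up if censored)
    events : 1 if onset was observed at that age, 0 if the record is censored
    Returns a list of (age, estimated survival) at each distinct event age.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather all records tied at age t (events and censorings).
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Everyone observed at age t leaves the risk set afterwards.
        n_at_risk -= n_leaving
    return curve


# Toy node with five individuals (invented values, for illustration only).
ages = [30, 40, 40, 50, 60]
onset = [1, 0, 1, 1, 0]
km_curve = kaplan_meier(ages, onset)
for age, s in km_curve:
    print(f"age {age}: S(t) = {s:.2f}")
```

Records censored at the same age as an event are treated as still at risk at that age, which is the usual convention for the product-limit estimator.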

The variables identified from the multivariable model were used as input into RPART. Variables such as the sun and UV exposures were omitted from the modelling stage, since nearly half of the data for each of these variables were missing (cumulative sun exposure: 43.8%; UV exposure: 40.4%). A large survival tree was produced in S-PLUS and, using cross-validation, a model splitting solely on birth year was identified. This survival tree yielded the minimum cross-validated error rate (0.93), suggesting that birth year was an important indicator for melanoma. Selecting a slightly larger but more informative model (error = 0.94) resulted in a survival tree consisting of nine terminal nodes. This revealed splits on birth year, mole count and freckling. All other variables appeared to be either competing at each node or acting as a surrogate variable for one of these primary splits. Skin type, in particular, arose as an important surrogate for freckling.

In Figure 1, we aim to provide a more visual interpretation of the results from the survival analysis. Each split is shown at the top of each node and can be assumed to be the split that directs observations towards the left side of the tree. For example, after splitting on birth year at 1933, a split to the left corresponds to an individual with few moles. A split to the right indicates an individual with many moles, and so on. The terminal nodes in this figure are displayed as Kaplan-Meier plots showing the survival curves, along with the number of observations, n, and the risk ratio calculated with reference to the baseline group of 9669 individuals. The reference survival curve is shown at the root node, along with a bar which shows the percentage of events in the entire dataset. Grids are placed on each survival plot at equal intervals of 20 years of age. These grids have been plotted to simplify the interpretation. An indicator to the left of the plot displays the risk ratio. Ratios above one are shown by an "up arrow", while a decrease in risk is illustrated by a "down arrow". From this model, we can see that there are a few scenarios that indicate high risk for earlier onset of melanoma. These scenarios may be described as:

• individuals born after 1966 with many moles (RR = 113)

• individuals born between 1947 and 1966 with many moles (RR = 45)

• individuals born between 1933 and 1953 with few moles but many freckles (RR = 24)

• individuals born between 1923 and 1933 with many moles (RR = 141)

It is obvious from these results that birth year has a substantial impact on the age-at-onset of melanoma. Once this is taken into account, mole count and freckling only provide a small contribution to the risk.

Family History of Melanoma

As described earlier in the Data section, family history was collected regarding first-degree (siblings and parents) and second-degree relatives. From this pedigree structure, other higher-order types of relative pairs could also be formed. Some of these relatives were diseased with melanoma, resulting in pairs of relatives who both may be diseased (++), both not diseased (– –), and one diseased while the other was not diseased (+–). Table 2 lists the concordant and discordant pairs of specific relationships: sib–sib, parent–child, and second-degree/lower order relative pairs. The second-degree relative pairs include grandparent–grandchild and aunt–niece, while the lower order relative pairs include the in-law pairs. From Table 2, the percentages of both diseased pairs are 0.6%, 0.9% and 0.4% among sib–sib, parent–child and second/lower order pairs, respectively. Crude estimates of correlation coefficients can be calculated from these percentages without accounting for ages at onset among these relatives. However, the risk of developing melanoma may depend on the subject's age. Hence, adjusting for the age at onset is essential in quantifying the correlation of age at onset between pairs of relatives. In a subsequent section, we describe how this can be done rigorously via the generalized estimating equations approach. On average, melanomas were diagnosed slightly earlier in relatives (47.5 years) than in probands (50.2 years). Among relatives, melanomas were diagnosed at younger ages in later generations.
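The percentages of doubly affected pairs quoted above follow directly from the pair counts reported in Table 2. A short sketch of that arithmetic, using the Table 2 counts (the dictionary layout and names are illustrative only):

```python
# Pair counts from Table 2: both affected (++), one affected (+-), neither (--).
pair_counts = {
    "sib-sib": {"++": 49, "+-": 763, "--": 7011},
    "parent-child": {"++": 15, "+-": 536, "--": 1078},
    "second/others": {"++": 41, "+-": 1200, "--": 9817},
}

concordant_pct = {}
for relationship, counts in pair_counts.items():
    total_pairs = sum(counts.values())
    # Share of pairs in which both relatives had melanoma.
    concordant_pct[relationship] = 100 * counts["++"] / total_pairs
    print(f"{relationship}: {concordant_pct[relationship]:.1f}% of {total_pairs} pairs")
```

These are crude proportions only; as the text notes, they do not adjust for age at onset or for censoring.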


Figure 1

Survival tree for the melanoma data consisting of 9669 individuals from 1912 Queensland families. Terminal nodes are rectangular and display the Kaplan-Meier curve for the corresponding subgroup. (Root node label: RR = 1.00, baseline.)


To account for the different ages at censoring in each generation due to termination of the study or death from causes other than melanoma, we examined the disease-free survival distribution for each generation using the standard failure-time analysis technique (Figure 2). The median age at diagnosis of melanoma was 64 among parents of probands, 50 among siblings of probands, and 33 among children of probands. The disease-free survival functions differed significantly between the children's generation and earlier generations (log-rank test with p < .01), but there were no significant differences in age-at-onset of melanoma between the siblings and parents of probands (log-rank test with p > .9).

Preliminary Segregation Analysis

A preliminary segregation analysis was conducted (Aitken et al., 1998) to investigate whether the familial clustering of cutaneous melanoma is consistent with Mendelian inheritance of a major autosomal gene. Analyses were performed with the SAGE (1992) statistical package using the maximum likelihood REGTL program for a binary trait with a variable age of onset. The hypothesis of co-dominant Mendelian inheritance gave a significantly better fit to the data than either dominant or recessive Mendelian inheritance. Overall, both Mendelian inheritance of a single major gene and purely environmental transmission were rejected. However, there was strong evidence of familial dependence in melanoma occurrence.

The inclusion of risk factors in the models may reveal whether all or a combination of these explains the familial dependence that was demonstrated by the segregation analysis. If a major gene exists that operates independently of these covariates, including them in the models may reduce residual variation and increase the power of the analysis to detect a major gene effect.

Estimating Equations Approach

Let y = (dki tki Zki) denote the data collected for the ith

member in the kth family (k = 1hellipK and i = 12)where dki = 0 if the observation is censored tki is eitherthe recorded age at diagnosis of melanoma or the ageat the most recent follow-up for unaffected peopleand Zki is a vector of measured covariates We assumethat censoring time age at diagnosis of melanomaand the covariates are independently distributedThese assumptions can be relaxed in more generalmodels subject to identification constraints Thehazard rate for melanoma is the instantaneous proba-bility that melanoma is diagnosed immediately aftertime t given that the person is unaffected at time tThe hazard rate under the Cox proportional hazardsmodel (Cox 1972) is given by

l(tki) = l0(tki) exp(brsquoZki)

Table 2

Concordant and Discordant Pairs of Relatives in 1912 Families from the Queensland Familial Melanoma Project Probands Were Not Included forthe Calculation of Concordance

Sibndashsib Parentndashchild SecondOthers Total

++ 49 15 41 105

+ndash 763 536 1200 2499

mdash 7011 1078 9817 17906

Total 7823 1629 11058 20510

Figure 2

Cumulative melanoma-free survival among first-degree relatives ofconfirmed cutaneous melanoma cases (probands) diagnosed inQueensland Australia 1982ndash1987 according to the relativersquos rela-tionship to the proband

where $\lambda_0(\cdot)$ is the baseline hazard function and $\beta$ is a vector of regression coefficients.

For a specific pair of relatives, we follow Clayton (1978) in modeling the bivariate survivor function

$$F(t_{k1}, t_{k2}) = \left( F_1(t_{k1})^{-\theta} + F_2(t_{k2})^{-\theta} - 1 \right)^{-1/\theta}$$

where $F_1$ and $F_2$ are univariate survivor functions and $\theta$ is a scalar parameter that measures the degree of dependence between the relatives' times at onset, independence being implied by $\theta = 0$ and positive association by $\theta > 0$. The Clayton model allows negative dependencies and has the property that failure times are absolutely continuous for $\theta > -0.5$. In addition, the cross-ratio (or odds-ratio) function, as studied by Oakes (1989), is

$$c(t_{k1}, t_{k2}) = \frac{\lambda(t_{k1} \mid T_{k2} = t_{k2})}{\lambda(t_{k1} \mid T_{k2} \ge t_{k2})} = 1 + \theta$$

This is equivalent to assuming that the odds-ratio is invariant over the grid region that supports the data. Heuristically, the parameter $1 + \theta$ is an odds-ratio that depends on the degree of dependence between the onset ages of the two relatives. If genetic factors do influence the age at onset of melanoma, we would expect to see a higher concordance in the age of onset in first-degree relatives, who on average share half their genes in common, in comparison to second-degree relatives. Under the current model, this translates as $\theta_{\text{first-degree}} > \theta_{\text{second-degree}}$.
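The constant cross-ratio property can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the derivatives are taken by finite differences on the scale of the marginal survival probabilities $u = F_1(t_{k1})$ and $v = F_2(t_{k2})$ (the marginal derivative factors cancel in the ratio), and it assumes $\theta > 0$.

```python
def clayton_survivor(u, v, theta):
    """Clayton joint survivor function for marginal survival probabilities u and v."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def cross_ratio(u, v, theta, h=1e-4):
    """Finite-difference estimate of F * F_12 / (F_1 * F_2), the cross-ratio."""
    f = clayton_survivor
    f0 = f(u, v, theta)
    # Central differences for the two first partial derivatives.
    d1 = (f(u + h, v, theta) - f(u - h, v, theta)) / (2 * h)
    d2 = (f(u, v + h, theta) - f(u, v - h, theta)) / (2 * h)
    # Central difference for the mixed second partial derivative.
    d12 = (f(u + h, v + h, theta) - f(u + h, v - h, theta)
           - f(u - h, v + h, theta) + f(u - h, v - h, theta)) / (4 * h * h)
    return f0 * d12 / (d1 * d2)


# The estimate should stay close to 1 + theta wherever it is evaluated.
for u, v in [(0.9, 0.7), (0.8, 0.6), (0.5, 0.5)]:
    print(u, v, round(cross_ratio(u, v, theta=0.65), 4))
```

The same value at every grid point is what allows a single parameter per relationship type to summarize the dependence.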

We may use a standard method to estimate within-pair correlations for 2 × 2 tables from odds ratios. Estimates of relation-pair correlations $\rho_{\text{sib–sib}}$, $\rho_{\text{parent–child}}$ and $\rho_{\text{second-order}}$ are recovered using the relationships, for example, $r_{\text{sib–sib}} = \min(1, \ln(1 + \theta_{\text{sib–sib}}))$. Testing for the presence of genetic factors underlying the age at diagnosis of melanoma is equivalent to testing $H_0: \rho_{\text{first-order}} = \rho_{\text{second-order}}$. We may test this hypothesis using a z-transform (Kendall & Stuart, 1979, p. 315) of the point estimates of the correlation coefficients. Let $n_1$ and $n_2$ denote the number of first-order and second-order relatives, and let $z_1$ and $z_2$ denote the transformed statistics of $r_1$ and $r_2$, the correlation estimates for first-order and second-order relative pairs, respectively. Specifically, we reject $H_0$ when $E/D > Z_{1-\alpha}$, where

$$E = E(z_1 - z_2) = \frac{1}{2} \log\left[\frac{1 + r_1}{1 - r_1}\right] - \frac{1}{2} \log\left[\frac{1 + r_2}{1 - r_2}\right]$$

$$D^2 = V(z_1 - z_2) = \frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}$$

and $Z_{1-\alpha}$ is the standard normal deviate corresponding to the one-sided $\alpha$ significance level.
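As a minimal sketch of this test, assuming the formulas above, the following Python code converts odds ratios to correlations and compares two of them. The function names are hypothetical, the odds ratios echo the parent–child and second-degree estimates in Table 3, and the pair counts of 500 are invented purely for illustration.

```python
import math


def correlation_from_odds_ratio(odds_ratio):
    """r = min(1, ln(1 + theta)), where odds_ratio = 1 + theta."""
    return min(1.0, math.log(odds_ratio))


def fisher_z(r):
    """z-transform 0.5 * log((1 + r) / (1 - r)); undefined at |r| = 1."""
    if abs(r) >= 1.0:
        raise ValueError("z-transform is undefined for |r| >= 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def compare_correlations(r1, n1, r2, n2):
    """Return (E, D, E / D) for testing H0: rho1 = rho2 against rho1 > rho2."""
    e = fisher_z(r1) - fisher_z(r2)
    d = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return e, d, e / d


def one_sided_p_value(z):
    """Upper-tail probability of a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical pair counts; odds ratios taken from Table 3 for illustration.
r_first = correlation_from_odds_ratio(1.650)
r_second = correlation_from_odds_ratio(1.155)
e, d, stat = compare_correlations(r_first, 500, r_second, 500)
print(round(stat, 3), round(one_sided_p_value(stat), 4))
```

Note that the truncation at 1 makes the z-transform undefined whenever $\ln(1 + \theta) \ge 1$, so the sketch raises an error in that case rather than returning an infinite statistic.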

This approach has the advantage of providing a test for the presence of genetic effects through a single parameter ($\theta$). However, it is limited in its ability to attribute the phenotypic variance to specific effects (e.g., additive gene action).

Mathematical details of the GEE model and the iterative procedure to estimate the regression coefficients $\beta$ and specific degrees of dependence $\theta$ for the different types of relative pairs have been summarized previously in Do et al. (2000).

MCMC Analysis Using BUGS

The Bayesian Paradigm and Gibbs Sampling

Markov Chain Monte Carlo (MCMC) is an alternative Bayesian approach that provides estimates of likelihoods and associated parameter values when exact computation is infeasible (Hastings, 1970; Metropolis et al., 1953). MCMC methods can be used to draw samples from the underlying joint distribution of major genotypes and polygenic values, conditional on the observed data. From these samples, desired parameters and likelihoods can be estimated without the need to resort to exact computation. MCMC methods have been used for linkage analysis (Kong et al., 1992; Lange & Sobel, 1991), for estimation of parameters in the mixed model with and without covariates (Guo & Thompson, 1991; Thomas, 1992), for estimation of gene–smoking interaction and covariate imputation (Gauderman et al., 1997), for performing combined linkage and segregation analysis (Faucett et al., 1993; Guo & Thompson, 1992), and for mixed models of large complex pedigrees (Guo & Thompson, 1994).

In a general setting, let $y$ be the observed data and $\theta$ be everything not observed, including parameters and latent variables. The implementation of Bayesian methods using realistic models and priors is computer-intensive and relies heavily on cunning computational tools to approximate integrals. The problem, in general terms, is to obtain the expected value of a function of interest $s(\cdot)$ under the posterior density $p(\theta \mid y)$,

$$E[s(\theta)] = \frac{\int_\Theta s(\theta)\, p(\theta)\, p(y \mid \theta)\, d\theta}{\int_\Theta p(\theta)\, p(y \mid \theta)\, d\theta}$$

which cannot generally be found analytically. One method to carry out the integration on the right-hand side is to perform simulation of exact Bayesian posterior distributions using Markov chain Monte Carlo techniques such as Gibbs sampling. The Gibbs sampler (Geman & Geman, 1984) is the most popular algorithm used in MCMC applications to correlated data. Gibbs sampling was introduced to the main statistical community by Gelfand and Smith (1990) and has since been applied to an even wider array of problems. The Gibbs sampler is easy to implement because it only depends on the local neighborhood structure. In the context of pedigree analysis (Olshen & Wijsman, 1996), the basic procedure is a sequential updating of missing and latent data, including the underlying and unobserved major genotypes, polygenic effects, and environmental effects. Values for the missing or latent data are sampled from the local conditional distribution, a function of the observed individual data, the current sampled values of other missing or latent data for this particular individual (such as polygenic and environmental effects), and the values for the sampled genetic effects in the immediate neighbors of an individual. Gibbs sampling basically consists of three main steps:

Step 1. Set initial values for unobserved quantities (parameters and latent variables).

Step 2. For each parameter or latent variable $\theta_j$, sample from its "full conditional distribution" given the current values of all other quantities in the model.

Step 3. Examine sampled values of parameters and latent variables to monitor convergence and to provide summary measures.
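The three steps can be illustrated on a deliberately simple model that is not the pedigree model of this paper: normally distributed data with unknown mean and precision, a normal prior on the mean and a gamma prior on the precision. Everything in this sketch (function name, prior settings, simulated data) is an assumption chosen for illustration.

```python
import random
import statistics


def gibbs_normal(y, n_iter=6000, burn_in=1000, seed=1,
                 mu0=0.0, tau0=1e-6, a=0.001, b=0.001):
    """Gibbs sampler for y_i ~ N(mu, 1/tau), mu ~ N(mu0, 1/tau0), tau ~ Gamma(a, rate=b)."""
    rng = random.Random(seed)
    n, total = len(y), sum(y)
    # Step 1: initial values for the unobserved quantities.
    mu, tau = 0.0, 1.0
    draws = []
    for it in range(n_iter):
        # Step 2a: the full conditional of mu given tau and y is normal.
        prec = tau0 + n * tau
        mean = (tau0 * mu0 + tau * total) / prec
        mu = rng.gauss(mean, prec ** -0.5)
        # Step 2b: the full conditional of tau given mu and y is gamma
        # (gammavariate takes shape and scale, so scale = 1 / rate).
        rate = b + 0.5 * sum((v - mu) ** 2 for v in y)
        tau = rng.gammavariate(a + 0.5 * n, 1.0 / rate)
        if it >= burn_in:
            draws.append((mu, tau))
    return draws


# Step 3: summarize the retained draws (simulated data, for illustration only).
data_rng = random.Random(0)
y = [data_rng.gauss(50.0, 10.0) for _ in range(200)]
draws = gibbs_normal(y)
post_mu = statistics.mean([m for m, _ in draws])
post_sd = statistics.mean([t ** -0.5 for _, t in draws])
print(round(post_mu, 2), round(post_sd, 2))
```

In a pedigree setting the same loop runs over many more unknowns (genotypes, polygenic values, missing covariates), which is the bookkeeping that packages such as BUGS automate.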

Some of the most recent and popular packages that implement Gibbs sampling for analysis of pedigree data include BUGS (Gilks et al., 1994; Spiegelhalter et al., 1996a, 1996b), the Genetic Analysis Package (GAP, 1996), and MIXD (Olshen & Wijsman, 1996; Thompson, 1994). We used BUGS mainly because of its flexibility in programming hierarchical models, and because it is freeware.

The Model

The aim here is to model the correlation structure within the family so as to satisfy the fundamental additive genetic model (Crow & Kimura, 1970; Falconer, 1990; Kempthorne, 1960), as follows. Consider a nuclear family structure consisting of four members: father, mother and two children, denoted by F, M, S1, S2, respectively. Using similar notation as in Burton et al. (1999), a conventional mixed linear model consisting of fixed and random effects may be written in the form

$$Q_{ij} = \beta^T z_{ij} + A_{ij} + C_{ij} + CS_{ij} + E_{ij} \qquad (1)$$

where $Q_{ij}$ is the observed value of a normally distributed continuous trait for the $j$th individual in the $i$th nuclear family, $z_{ij}$ is a vector of observed covariates representing fixed effects, and $\beta$ is a corresponding vector of unknown fixed regression coefficients. $A_{ij}$, $C_{ij}$ and $CS_{ij}$ denote random effects that represent additive polygenic, common family environment and common sibling environment effects, respectively. The variation in an individual response is represented by a composite covariance matrix $V_T$, which is the sum of an additive genetic covariance matrix $V_A$, a common family environment matrix $V_C$, a shared sibling environment matrix $V_{Cs}$, and residual environmental effects. The different variance components are

The overall total covariance matrix is

The components of variance $\sigma^2_A$, $\sigma^2_C$, $\sigma^2_{Cs}$ need not be positive as long as $V_T$ is positive definite. A negative value for $\sigma^2_{Cs}$ simply implies that the realized covariance between siblings is less than the realized covariance between a parent and a child. A negative value for $\sigma^2_{Cs}$ may suggest dominance. To optimize convergence in BUGS, Model (1) may be reparameterized as

where $F_i$, $G_i$, $H_i$ are independent additive random effects or latent variables, and the remaining terms are the residual error terms for parents and children, respectively. If we model

$$F_i \sim N\left(0, \tfrac{1}{2}\sigma^2_A + \sigma^2_C\right), \quad G_i \sim N\left(0, \tfrac{1}{2}\sigma^2_A\right), \quad H_i \sim N\left(0, \sigma^2_{Cs}\right),$$

then the basic genetic covariance model (additive genetic, common environment and unique environment) for this particular four-member family structure is satisfied.
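The claim that these three latent effects reproduce the additive genetic covariance structure can be checked numerically. The sketch below rests on assumptions that are not stated in the text: the loadings (father $F_i + G_i$, mother $F_i - G_i$, each child $F_i + H_i$) follow the style of parameterization used by Burton et al. (1999), the residual variances are $\sigma^2_E$ for parents and $\sigma^2_E + \tfrac{1}{2}\sigma^2_A$ for children, and the numerical values are arbitrary.

```python
def family_covariance(s2a, s2c, s2cs, s2e):
    """Total covariance V_T for (father, mother, sib 1, sib 2) under the additive model."""
    kin = [[1.0, 0.0, 0.5, 0.5],   # expected additive genetic sharing
           [0.0, 1.0, 0.5, 0.5],
           [0.5, 0.5, 1.0, 0.5],
           [0.5, 0.5, 0.5, 1.0]]
    sib = [0, 0, 1, 1]             # indicator: family member is a child
    return [[s2a * kin[i][j] + s2c + s2cs * sib[i] * sib[j]
             + (s2e if i == j else 0.0)
             for j in range(4)] for i in range(4)]


def reparameterized_covariance(s2a, s2c, s2cs, s2e):
    """Covariance implied by latent effects F, G, H plus member-specific residuals."""
    var_fgh = [0.5 * s2a + s2c, 0.5 * s2a, s2cs]
    # Assumed loadings on (F, G, H): father F + G, mother F - G, each child F + H.
    load = [[1, 1, 0], [1, -1, 0], [1, 0, 1], [1, 0, 1]]
    # Assumed residual variances: parents s2e, children s2e + 0.5 * s2a.
    resid = [s2e, s2e, s2e + 0.5 * s2a, s2e + 0.5 * s2a]
    return [[sum(load[i][k] * var_fgh[k] * load[j][k] for k in range(3))
             + (resid[i] if i == j else 0.0)
             for j in range(4)] for i in range(4)]


# Arbitrary illustrative variance components.
v_direct = family_covariance(0.45, 0.10, 0.47, 1.0)
v_latent = reparameterized_covariance(0.45, 0.10, 0.47, 1.0)
max_gap = max(abs(v_direct[i][j] - v_latent[i][j])
              for i in range(4) for j in range(4))
print(max_gap)
```

Under these assumptions the spouse covariance is $\sigma^2_C$, the parent–child covariance is $\tfrac{1}{2}\sigma^2_A + \sigma^2_C$, and the sib–sib covariance adds $\sigma^2_{Cs}$, which is why a negative $\sigma^2_{Cs}$ corresponds to siblings being less alike than parent and child.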

In survival models, unobserved or unmeasured explanatory variables, some of which may be genetic, are often referred to as frailties. The frailties take values restricted to the positive line and may be assumed to act multiplicatively on the hazard. Extending the above model to correlated family data with a time-to-onset endpoint, a multiplicative individual heterogeneity or frailty term, representing the latent genetic and common environment variables, may be modeled as random effects simultaneously with the effects associated with observed covariates. Consider right-censored time to onset of melanoma data $(T_{ij}, \delta_{ij}, z_{ij})$, $1 \le i \le n$, $j = 1, 2$, from $n$ relative pairs; here $T_{ij}$ denotes the true age at onset of the $j$th family member or the censored time, depending on whether $\delta_{ij} = 1$ or 0, respectively, and $z$ denotes a $p \times 1$ vector of covariates. A Weibull distribution may be used to model time to failure as

$$f(t_i, z_i) = e^{\beta' z_i}\, \gamma\, t_i^{\gamma - 1} \exp\left(-e^{\beta' z_i}\, t_i^{\gamma}\right)$$

where $\beta$ is a vector of unknown regression coefficients and $\gamma$ is the shape parameter of the Weibull distribution. This leads to a baseline hazard of the form

$$\lambda_0(t_i) = \gamma\, t_i^{\gamma - 1}$$
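A minimal sketch of these Weibull quantities, writing $\mu = e^{\beta' z}$ for the covariate multiplier, is given below. The function names are hypothetical; the log-likelihood shows how right censoring enters, with events contributing the log density and censored times contributing the log survivor function.

```python
import math


def weibull_density(t, mu, gamma):
    """f(t) = mu * gamma * t**(gamma - 1) * exp(-mu * t**gamma)."""
    return mu * gamma * t ** (gamma - 1.0) * math.exp(-mu * t ** gamma)


def weibull_survivor(t, mu, gamma):
    """S(t) = exp(-mu * t**gamma)."""
    return math.exp(-mu * t ** gamma)


def weibull_hazard(t, mu, gamma):
    """lambda(t) = mu * gamma * t**(gamma - 1): the baseline hazard scaled by mu."""
    return mu * gamma * t ** (gamma - 1.0)


def log_likelihood(times, events, covariates, beta, gamma):
    """Right-censored log-likelihood: log f for events, log S for censored times."""
    total = 0.0
    for t, d, z in zip(times, events, covariates):
        mu = math.exp(sum(b * x for b, x in zip(beta, z)))
        # log f = log hazard + log S, and log S = -mu * t**gamma.
        total += d * math.log(weibull_hazard(t, mu, gamma)) - mu * t ** gamma
    return total


# Small made-up example: two subjects, one event and one censored time.
print(log_likelihood([0.8, 1.1], [1, 0], [[1.0], [0.0]], beta=[0.3], gamma=2.0))
```

Because the covariates multiply the hazard, $e^{\beta}$ for a binary covariate is directly interpretable as a relative risk.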

Re-parameterizing by letting $\mu_i = e^{\beta' z_i}$, the conditional distribution of $t_i$ given $\mu_i$ is then Weibull($\gamma$, $\mu_i$). We formulated a mixed model to represent the conditional distribution of $t_{ij}$, given covariate effects and random additive genetic and common environment effects, as

$$t_{ij} \mid \mu_{ij} \sim \text{Weibull}(\gamma, \mu_{ij}), \quad i = 1, \ldots, n; \; j = 1, 2$$

where

The regression coefficients and the precisions of the random effects ($\tau_G$, $\tau_F$, $\tau_H$) were given "non-informative" Normal and Gamma priors, respectively. The shape parameter $\gamma$ of the time to onset of melanoma distribution was also given a non-informative Gamma prior, which was slowly decreasing on the positive real line.

The overall model (including random effects) may be described in a Bayesian graph (Figure 3), which simplifies sampling from full conditional distributions by exploiting partial independence properties (Spiegelhalter et al., 1996a). In this graph, each random quantity is represented by a node, which may be connected by directed or undirected links. Conditional independence assumptions are represented by the absence of such links.

We implemented the Gibbs sampler using the BUGS program (Gilks et al., 1994) (code in the Appendix). Imputation of missing data was handled naturally in the Gibbs sampling framework by treating missing values as additional unknown quantities and randomly sampling values from their full conditional distributions. We chose simple prior distributions for imputation, since the number of missing values for covariates was not large (complete for birth year, 16 missing for naevi and 21 missing for freckles) and there was no indication of non-random missingness in our data. Therefore, imputation for the missing naevi and freckles covariates was based on Bernoulli prior distributions, with respective parameter values estimated from the complete observations. We performed an initial 10,000 burn-in iterations, followed by an additional 20,000. Parameter estimates were the mean and standard deviation (SD) of all post-convergence Gibbs samples with a thinning interval between 20 and 50; credible intervals were computed as the lower and upper $\alpha/2$ percentiles from the last 20,000 iterations. Convergence to the posterior distribution was confirmed by using the different criteria provided by the add-on CODA package, including those of Gelman and Rubin (1992), Geweke (1992), and Raftery and Lewis (1992a, 1992b).
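The summary step described here can be sketched as follows. This is a simplified illustration rather than the CODA procedure: the function name and the interpolation rule for percentiles are assumptions, and the same thinned sample is used for the mean, SD and interval.

```python
import random
import statistics


def summarize_chain(chain, burn_in, thin, alpha=0.05):
    """Posterior mean, SD and (alpha/2, 1 - alpha/2) percentile interval
    computed after discarding burn-in and thinning the remaining draws."""
    kept = sorted(chain[burn_in::thin])
    if len(kept) < 2:
        raise ValueError("too few retained samples")

    def percentile(p):
        # Linear interpolation between adjacent order statistics.
        pos = p * (len(kept) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(kept) - 1)
        return kept[lo] + (pos - lo) * (kept[hi] - kept[lo])

    return {
        "mean": statistics.mean(kept),
        "sd": statistics.stdev(kept),
        "lower": percentile(alpha / 2.0),
        "upper": percentile(1.0 - alpha / 2.0),
    }


# Stand-in chain of 30,000 draws; a real chain would come from the sampler.
rng = random.Random(42)
chain = [rng.gauss(0.45, 0.05) for _ in range(30000)]
summary = summarize_chain(chain, burn_in=10000, thin=20)
print({k: round(v, 3) for k, v in summary.items()})
```

Thinning reduces autocorrelation among retained draws at the cost of a smaller sample, so the interval endpoints become noisier as the thinning interval grows.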

Results

GEE Approach

An inspection of residual plots following preliminary model-fitting provided no evidence for failure of the proportional hazards assumption and did not detect influential observations. We then proceeded to apply the GEE approach, which can estimate regression coefficients while incorporating a dependence structure between relative pairs. The results are displayed in Table 3; they suggest that later birth year, having at least a moderate number of naevi, and freckling were all simultaneously associated with a higher hazard, that is, an earlier age at onset of melanoma. Table 3 also presents the estimated odds-ratios for quantifying the correlation between paired relatives of specific relationships.
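The relative risks and Z-statistics in panel A of Table 3 follow directly from the coefficients and their robust standard errors. The short sketch below recomputes them; the helper name is hypothetical and the one-sided p-value is added only as an illustration of how the Z-statistic would be referred to the standard normal distribution.

```python
import math

# (coefficient, robust standard error) as reported in Table 3, panel A.
table3 = {
    "Year of birth": (0.132, 0.051),
    "Naevi": (0.568, 0.073),
    "Freckling": (0.148, 0.049),
}


def wald_summary(beta, se):
    """Relative risk exp(beta), Z = beta / se, and one-sided upper-tail p-value."""
    z = beta / se
    p = 0.5 * math.erfc(z / math.sqrt(2.0))
    return math.exp(beta), z, p


for name, (beta, se) in table3.items():
    rr, z, p = wald_summary(beta, se)
    print(f"{name}: RR={rr:.3f}, Z={z:.3f}, one-sided p={p:.4f}")
```

A relative risk above 1 means a higher hazard of diagnosis at every age, which corresponds to earlier onset on average.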

The odds ratio for sib–sib pairs is 2.973, which is significantly different from 1 (p < 0.01). The odds ratio for parent–child pairs is 1.650, which is greater than 1 and indicates a mild dependency between these pairs, although it does not quite reach statistical significance. The odds ratio for second-degree and higher relative pairs is 1.155, which is not significantly different from 1 and indicates little or no dependence between these pairs. This pattern of familial aggregation is compatible with dominance variance as well as additive genetic variance (Falconer, 1990).

Bayesian Approach

We re-analyzed the age at onset of melanoma by using Gibbs sampling to impute missing covariates and to estimate subject-specific covariate effects and random additive genetic, common family environment and shared sibling environment effects on the log scale. The estimated shape parameter $\gamma$ was 4.3, with a 95% credible interval (CI) of (4.2, 4.6). The results are summarized in Table 4 (Model A). The residual plots did not indicate a gross departure from the underlying Weibull model and revealed no influential observations. We checked the sensitivity of the analyses to initial parameter values by re-running the Gibbs sampler five more times using different starting values. The resulting estimates did not differ by more than 5% from the values reported here. The mean estimate for $\sigma^2_A$ was 0.452 with 95% CI = (0.348, 0.566), for $\sigma^2_C$ was –0.053 with 95% CI = (–0.120, 0.019), and for $\sigma^2_{Cs}$ was 0.467 with 95% CI = (0.393, 0.545).

A small negative value for the common family effect suggests that there may be a dominance effect, or that the current model is not quite appropriate; for example, there may be systematic effects or correlation structures that have not been accounted for. The results here indicate that additive genetic and shared sibling environment effects seem to contribute about equally to the variation in the age at onset of melanoma. Further exploration of alternative models is a focus of our future research.

In addition, we investigated the relative contribution of genetic and environmental effects to the expression of naevi (Model B) and freckles (Model C), which are known risk factors for melanoma. The expressions of naevi and freckles were coded as binary variables (none or few moles versus moderate or many moles, and no freckles versus one or more freckles). A hierarchical Bayesian binomial model was fitted to estimate the random variance components. The results in Table 4 indicate that a common family environment effect contributed the most to the expression of naevi ($\sigma^2_C = 0.704$), relative to the contributions of the additive genetic effect ($\sigma^2_A = 0.142$) and of the shared sibling effect ($\sigma^2_{Cs} = 0.195$), both of which were non-negligible. In contrast, variation in the expression of freckles was largely explained by additive genetic and shared family effects ($\sigma^2_A = 2.050$, $\sigma^2_C = 2.600$), compared to a relatively small shared sibling effect ($\sigma^2_{Cs} = 0.115$).

Figure 3

Graphical model of covariate and random family effects for an individual (individual $j$ in family $i$) in a nuclear family structure. $t_{ij}$ represents the observed failure time for the $j$th individual in the $i$th family, with $tcen_{ij}$ being an indicator variable of censoring status. Full arrows indicate stochastic links to which a probability is attached; broken arrows denote deterministic relationships. The $\beta$s are regression coefficients; T is the precision of the prior distribution and equals the inverse of the variance. $F_i$, $G_i$, $H_i$ are random effects modeled as $F_i \sim N(0, \tfrac{1}{2}\sigma^2_A + \sigma^2_C)$, $G_i \sim N(0, \tfrac{1}{2}\sigma^2_A)$ and $H_i \sim N(0, \sigma^2_{Cs})$. Rectangles represent actual data values for the covariates; $\gamma$ and $\mu_{ij}$ are shape and scale parameters for the underlying Weibull distribution.

Discussion

We applied two methods, generalized estimating equations and Bayesian analysis, to the genetic analysis of age at onset of melanoma based on a nuclear family structure. Under both approaches, the results suggest that additive genetic factors played an

Table 4

Gibbs Sampling Approach: Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families

Weibull Model. A: Mean effects. Response variable is age at onset.

Covariate       RR = e^β   Coefficient β   Robust se(β)   95% CI of β
Year of birth   1.378      0.321           0.0027         (0.316, 0.326)
Naevi           1.126      0.119           0.0021         (0.058, 0.185)
Freckling       1.017      0.017           0.1400         (-0.005, 0.085)

Weibull Model. B: Variance components. Response variable is age at onset.

Latent effect   Mean from 5000 iterations   se(σ²)   95% CI of σ²
σ²_A            0.452                       0.054    (0.348, 0.566)
σ²_C            -0.053                      0.027    (-0.120, 0.019)
σ²_Cs           0.467                       0.040    (0.393, 0.545)
γ               4.3                         0.104    (4.2, 4.6)

Binomial Model. Variance components. Response variable is naevi.

Latent effect   Mean from 5000 iterations   se(σ²)   95% CI of σ²
σ²_A            0.142                       0.149    (0.002, 0.498)
σ²_C            0.704                       0.156    (0.403, 1.010)
σ²_Cs           0.195                       0.157    (0.0025, 0.553)

Binomial Model. Variance components. Response variable is freckling.

Latent effect   Mean from 5000 iterations   se(σ²)   95% CI of σ²
σ²_A            2.050                       0.779    (0.835, 3.570)
σ²_C            2.600                       0.418    (1.780, 3.460)
σ²_Cs           0.115                       0.088    (0.011, 0.312)

Note: * indicates significance at the 0.05 level. Naevi is a binary variable with baseline 0 = no or few moles. Freckling is coded as a binary variable with baseline 0 = no freckles.

Table 3

GEE Approach: Estimated Regression Coefficients in the Proportional Hazard Model and Estimated Odds Ratios for Quantifying Familial Aggregation in Age at Onset of Melanoma in Queensland Families (* Indicates Significance at the .05 Significance Level)

A. Mean effects

Covariate                             RR = e^β   Coefficient β   Robust se(β)   Z-statistic
Year of birth                         1.142      0.132           0.051          2.588
Naevi (Baseline = No or few moles)    1.765      0.568           0.073          7.781
Freckling (Baseline = No freckles)    1.160      0.148           0.049          3.020

B. Patterns of familial aggregation

Relationship     1 + θ    se(θ)    Z-statistic
Sib–sib          2.973    0.6217   3.17
Parent–child     1.650    0.434    1.50
Second/Others    1.155    0.270    0.47

important role in the age at onset of melanoma, but that shared sibling environmental factors were not negligible. We focused attention on these approaches because they are more appropriate for modeling correlated age at onset data and they allow the inclusion of covariates in the analyses. Under both approaches, there were suggestions that earlier melanoma onset was influenced by later birth year, having a moderate number of naevi, and being freckly. The principal difference between the two approaches is in the interpretation of the regression coefficients. The GEE method uses a marginal approach, resulting in regression coefficients that describe the average population response to changing covariates, whereas the Bayesian approach produces subject-specific coefficients. A secondary distinction is in the nature of the within-pair dependence. The GEE model only describes a common covariance among specific relative pairs, whereas the Bayesian approach can explicitly describe the source of this covariance. A third advantage of the Bayesian method is its flexibility in incorporating prior information, if available, for the covariates or latent effects by modifying their prior distributions. Further, the Bayesian method would permit a more accurate decomposition of the genetic variance into additive and dominance components, thus providing the means for a direct assessment of the no-dominance assumption. Finally, it is also interesting to record the amount of CPU time required for each method: 20 seconds for the GEE approach, and approximately 3 hours (for binary traits) and 12 hours (for the age at onset outcome) to run BUGS on a single-user Intel Pentium III 600 MHz personal computer with the Linux Mandrake 7.1 operating system. The amount of human and financial resources dedicated to collecting, maintaining and updating the melanoma family database is extremely high; therefore, the extra CPU time required by the MCMC method is well worth the additional genetic information and flexibility that it provides. A BUGS program is included in the Appendix.

Acknowledgments

Data collection was supported by grants from the Australian National Health and Medical Research Council (NHMRC; ID 930223 and ID 961061) and the US National Cancer Institute (CA88363). Programming resources were provided by start-up funds from the Department of Biostatistics at the M. D. Anderson Cancer Center. We acknowledge assistance with BUGS programming from Sang-Joon Lee.

References

Abel, L., & Bonney, G. E. (1990). A time-dependent logistic hazard function for modeling variable age of onset in the analysis of familial diseases. Genetic Epidemiology, 12, 391–407.

Aitken, J. F., Bailey-Wilson, J., Green, A. C., MacLennan, R., & Martin, N. G. (1998). Segregation analysis of cutaneous melanoma in Queensland. Genetic Epidemiology, 15, 391–401.

Aitken, J. F., Green, A. C., MacLennan, R., Youl, P., & Martin, N. G. (1996). The Queensland familial melanoma project: Study design and characteristics of participants. Melanoma Research, 6, 155–165.

Allison, P. D. (1995). Survival analysis using the SAS system: A practical guide. Cary, NC: SAS Institute Inc.

Andersen, P. K., Borgan, O., Gill, R. D., & Keiding, N. (1993). Statistical models based on counting processes. New York: Springer-Verlag.

Bernardo, J. M., Berger, J. O., Dawid, A. P., & Smith, A. F. M. (Eds.). (1992). Bayesian statistics 4. Oxford: Clarendon Press.

Bonney, G. E. (1986). Regressive logistic model for familial disease and other binary traits. Biometrics, 42, 611–625.

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. California: Wadsworth.

Cannings, C., & Thompson, E. A. (1977). Ascertainment in the sequential sampling of pedigrees. Clinical Genetics, 12, 208–212.

Clayton, D. G. (1978). A model for association in bivariate life-tables and its application in epidemiological studies of chronic disease incidence. Biometrika, 65, 141–151.

Clayton, D. G., & Cuzick, J. (1985). Multivariate generalizations of the proportional hazards model (with discussion). Journal of the Royal Statistical Society, Series A, 148, 82–117.

Cox, D. R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34, 187–202.

Do, K.-A., Broom, B. M., Kuhnert, P., Duffy, D. L., Todorov, A. A., Treloar, S. A., & Martin, N. G. (2000). Genetic analysis of the age at menopause by using estimating equations and Bayesian random effects models. Statistics in Medicine, 19, 1217–1235.

English, D. R., Armstrong, B. K., Kricker, A., et al. (1997). Sunlight and cancer. Cancer Causes and Control, 8, 271–283.

Falconer, D. S. (1990). Introduction to quantitative genetics (3rd ed.). New York: Longman Group Ltd.

Faucett, C., Gauderman, W. J., Thomas, D., Ziogas, A., & Sobel, E. (1993). Combined segregation and linkage analysis of late-onset Alzheimer's disease in Duke families using Gibbs sampling. Genetic Epidemiology, 10, 489–494.

GAP. (1996). Genetic Analysis Package, Release 1.0. Pasadena, CA: Epicenter Software.


Gauderman, W. J., Morrison, J. L., Carpenter, C. L., & Thomas, D. C. (1997). Analysis of gene-smoking interaction in lung cancer. Genetic Epidemiology, 14, 199–214.

Gauderman, W. J., & Thomas, D. C. (1994). Censored survival models for genetic epidemiology: A Gibbs sampling approach. Genetic Epidemiology, 11, 171–188.

Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7, 457–511.

Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.

Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4. Oxford: Clarendon Press.

Gilks, W. R., Thomas, A., & Spiegelhalter, D. J. (1994). A language and program for complex Bayesian modeling. The Statistician, 43, 169–178.

Guo, S. W., & Thompson, E. A. (1991). Monte Carlo EM for the estimation of multiple random effects models on large pedigrees. IMA Journal of Mathematics Applied in Medicine and Biology, 8, 171–189.

Guo, S. W., & Thompson, E. A. (1992). A Monte Carlo method for combined segregation and linkage analysis. American Journal of Human Genetics, 51, 1111–1126.

Guo, S. W., & Thompson, E. A. (1994). Monte Carlo estimation of mixed models for large complex pedigrees. Biometrics, 50, 417–432.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97–109.

Heath, S. C. (1997). Markov Chain Monte Carlo segregation and linkage analysis for oligogenic models. American Journal of Human Genetics, 61, 748–760.

Hougaard, P. (1986). A class of multivariate failure time distributions. Biometrika, 73, 671–678.

Hsu, L., & Prentice, R. L. (1996). On assessing the strength of dependency between failure time variates. Biometrika, 83, 491–506.

Hsu, L., & Zhao, L. P. (1996). Assessing familial aggregation of age at onset by using estimating equations, with application to breast cancer. American Journal of Human Genetics, 58, 1057–1071.

Kempthorne, O. (1960). Biometrical genetics. New York: Pergamon Press.

Kendall, M., & Stuart, A. (1979). The advanced theory of statistics (Vol. 2, 4th ed.). Oxford: Oxford University Press.

Kong, K., Frigge, M., Cox, N., & Wong, W. H. (1992). Linkage analysis with adjustments for covariates: A method combining peeling with Gibbs sampling. Cytogenetics and Cell Genetics, 59, 208–210.

Kuhnert, P., & Do, K.-A. (2003). Fitting genetic models to twin data with binary and ordered categorical responses: A comparison of structural equation modeling and Bayesian hierarchical models. Behavior Genetics, 33, 439–452.

Lange, K., & Sobel, E. (1991). A random walk method for computing genetic location scores. American Journal of Human Genetics, 49, 1320–1334.

LeBlanc, M., & Crowley, J. (1992). Relative risk trees for censored survival data. Biometrics, 48, 411–425.

Li, H., & Thompson, E. A. (1997). Semiparametric estimation of major gene and random environmental effects for age of onset. Biometrics, 53, 282–293.

Li, H., & Wijsman, E. A. (1998). Semiparametric estimation of major gene effects for age of onset. Genetic Epidemiology, 15, 279–298.

MacLennan, R., Green, A. C., McLeod, G. R., & Martin, N. G. (1992). Increasing incidence of cutaneous melanoma in Queensland, Australia. Journal of the National Cancer Institute, 84, 1424–1432.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087–1092.

Nielsen, G. G., Gill, R. D., Andersen, P. K., & Sørensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics, 19, 25–43.

Oakes, D. (1989). Bivariate survival models induced by frailties. Journal of the American Statistical Association, 84, 487–493.

Raftery, A. E., & Lewis, S. (1992a). Comment: One long run with diagnostics: Implementation strategies for Markov Chain Monte Carlo. Statistical Science, 7, 493–497.

Raftery, A. E., & Lewis, S. (1992b). How many iterations in the Gibbs sampler? (pp. 763–774). In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4. Oxford: Clarendon Press.

SAGE. (1992). Statistical Analysis for Genetic Epidemiology, Release 2.1. Cleveland, OH: Department of Epidemiology and Biostatistics, Rammelkamp Center for Education and Research, MetroHealth Campus, Case Western Reserve University.

Scurrah, K. J., Palmer, L. J., & Burton, P. R. (2000). Variance components analysis for pedigree-based censored survival data using generalized linear mixed models (GLMMs) and Gibbs sampling in BUGS. Genetic Epidemiology, 19, 127–148.


Siegmund, K. D., & McKnight, B. (1998). Modeling hazard functions in families. Genetic Epidemiology, 15, 147–171.

Spiegelhalter, D. J., Thomas, A., & Best, N. G. (1996a). Computation on Bayesian graphical models (pp. 407–425). Oxford: Oxford University Press.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Gilks, W. (1996b). BUGS 0.6: Bayesian Inference Using Gibbs Sampling manual. Cambridge: MRC Biostatistics Unit.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Lunn, D. (2003). WinBUGS (Bayesian Inference Using Gibbs Sampling) Windows version 1.4 user manual. Cambridge: MRC Biostatistics Unit.

Siskind, V., Aitken, J., Green, A., & Martin, N. (2002). Sun exposure and interaction with family history in risk of melanoma, Queensland, Australia. International Journal of Cancer, 97, 90–95.

Swerdlow, A., & Green, A. (1987). Melanocytic naevi and melanoma: An epidemiological perspective. British Journal of Dermatology, 117, 137–146.

Therneau, T. M., & Atkinson, E. (1997). An introduction to recursive partitioning using the RPART routines (Technical report). Rochester: Mayo Foundation.

Thomas, D. (1992). Fitting genetic data using Gibbs sampling: An application to nevus counts in 38 Utah kindreds. Cytogenetics and Cell Genetics, 59, 228–230.

Thompson, E. A. (1994). Monte Carlo programs for pedigree analysis (Technical Report 267). Washington: Department of Statistics, University of Washington.

Vaupel, J. W., Manton, K. G., & Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16, 439–454.


Appendix A. A BUGS Program to Implement the Bayesian MCMC Approach in Modeling an ACE Model for Age at Onset of Melanoma with Covariate Effects and Random Genetic/Environmental Effects


Research into chronic diseases such as melanomahas largely been achieved through family studieswhere the age of onset of the disease is modelledwithin a survival analysis framework An example ofthis type of analysis is the work by Abel and Bonney(1990) who developed a model accounting for age ofonset where the hazard function was expressed interms of a major gene effect and residual familydependence using the regressive approach described inBonney (1986)

Clayton (1978) and Vaupel (1979) introduced theso-called ldquofrailtyrdquo model which has been extended bymany others (Andersen et al 1993 Clayton ampCuzick 1985 Gauderman amp Thomas 1994Hougaard 1986 Li amp Thompson 1997 Li ampWijsman 1998 Nielsen et al 1992 Siegmund ampMcKnight 1998) A variant of these approaches usesestimating equations which accommodates correlatedage of onset outcomes and is known to be robust andcomputationally efficient (Hsu amp Prentice 1996 Hsuamp Zhao 1996) A common theme is the investigationof possible underlying genetic and environmentalfactors that may influence the age of onset of thedisease of interest

Recently the focus of familial studies involvingcensored data has moved towards the development ofBayesian methods using packages such as BUGS andWinBUGS (Gilks et al 1994 Spiegelhalter et al1996a 1996b 2003) the Genetic Analysis Package(GAP 1996) and MIXD (Olshen amp Wijsman 1996Thompson 1994) Examples of this include theimplementation of Markov Chain Monte Carlo(MCMC) methods for linkage analysis (Heath 1997Kong et al 1992 Lange amp Sobel 1991) the estima-tion of parameters in a mixed model with andwithout covariates (Guo amp Thompson 1991Thomas 1992) estimation of a gene-smoking interac-tion and covariate imputation (Gauderman et al1997) combined linkage and segregation analysis(Guo amp Thompson 1992 Faucett et al 1993) anddevelopment of mixed models for large complex pedi-grees (Guo amp Thompson 1994) Recent work by Doet al (2000) demonstrates an application of Bayesianmethodology to menopausal age in twins using a gen-eral ised l inear mixed model (GLMM) In thisapplication they investigated the contribution ofcovariates and any underlying genetic and environ-mental factors to explain variation in menopausalage A similar investigation by Scurrah et al (2000)fitted a Bayesian model to survival data In this appli-cation the authors used a Bayesian piece-wiseexponential model to explore the time to onset of res-piratory disease given known risk factors andpossible familial effects Both papers highlight theflexible nature of the Bayesian approach which canbe seen through the inclusion of priors and the inte-gration of fixed and random effects

The Bayesian approach to fitting a genetic model in a GLMM framework was examined in detail in Kuhnert and Do (2003). In this paper, a simulation study investigated the flexible nature of the Bayesian model and its ability to incorporate genetic components through random effects. This was compared with standard maximum likelihood methods for estimating genetic components in the model. Results from the simulation study indicated a consistent advantage in using the Bayesian method to detect a correct model under particular scenarios of additive genetics and common environmental effects. Moreover, for binary data there was difficulty in detecting the correct model under low and moderate levels of heritability. Results, however, were improved for ordinal data under similar scenarios.

We present an alternative model using Bayesian methodology, which takes into account the complex features inherent with melanoma, using a large dataset comprising 1449 families. The model does not assume proportional hazards but a multiplicative model where the Weibull distribution is used to model the age at onset of melanoma. This is fit in a Bayesian framework which incorporates fixed and random effects to estimate possible risk factors, covariates and any underlying genetic factors. Another approach to accommodate the correlated age-at-onset outcomes rigorously is to use the estimating equations for assessing familial aggregation of age-at-onset (Hsu & Prentice, 1996; Hsu & Zhao, 1996). It has two desirable features: (i) robustness, in that no higher-order distributional assumptions are required beyond pairwise ones, and (ii) computational efficiency.

In this article, our interest lies primarily with investigating whether certain risk factors are able to explain a considerable proportion of the familial dependence, and if including them in the model reduces the residual variation and results in increased power for detecting a major gene effect. To our knowledge, this study will be the first in melanoma research that extends current methodology and incorporates covariate and genetic effects simultaneously with age of onset using Bayesian methodology on family data.

Material and Methods

The Data on the Age at Onset of Melanoma and Potential Risk Factors

We analysed data from the Queensland Familial Melanoma Project. Family ascertainment and data collection have been described in detail (Aitken et al., 1996). Assessment of standard melanoma risk factors included counts of naevi on the arms and back, demographic and medical details, lifetime residence, and family history of melanoma and other cancers. Briefly, we ascertained all 12,016 first incident cases of cutaneous melanoma (invasive and in situ) diagnosed in Queensland residents between 1982 and 1990 and reported to the Queensland Cancer Registry, or found by comparing cancer registrations for 1984 and 1987 with records of pathology laboratories throughout Queensland. It is estimated that registry records are approximately 95% complete for the study period. Doctors' permission was obtained to approach 10,407 cases, of whom 7784 (75%) returned a brief family history questionnaire stating whether any of their first-degree relatives (parents, siblings, children) had had a diagnosis of melanoma. A total of 2920 probands was sampled from these respondents, including all who had claimed a positive family history (n = 1529) and an approximate 20% random sample of the remainder (n = 1391). Probands were sent a detailed family history questionnaire asking for the names and addresses of all first-degree relatives, relatives' vital status, dates of birth and ages, and whether any of these relatives had had a melanoma diagnosed by a doctor. To avoid bias in determining the mode of inheritance, second and lower degree relatives were enrolled in the study according to a sequential sampling scheme (Cannings & Thompson, 1977). First-degree relatives of all relatives with confirmed melanoma were ascertained through the detailed family history questionnaire described above, which was mailed to all confirmed positive relatives. In total, 15,989 relatives belonging to 1912 separate families were reported by 2118 (73%) probands or other positive relatives. A total of 1044 relatives for whom date of birth was unknown were excluded, leaving 14,945 relatives for analysis. There were 188 families independently ascertained through two or more probands. To avoid ascertainment bias, these families were included in the dataset separately for each proband in the family.

Medical confirmation and dates of diagnosis were sought for the relatives reported by probands or other relatives to have had melanoma. After eliminating 18.7% of subjects who refused access to their medical records, or those with lost records, or those with false positive reports (basal or squamous cell carcinoma, solar keratoses or benign naevi), medical confirmation of melanoma as the diagnosis was obtained for 48.2% of the original number of relatives. Only the medically verified cases among relatives were classified as true events; all other relatives were treated as unaffected (censored at last date of contact).

Risk factor questionnaires were subsequently mailed to all living relatives aged between 18 and 75 years ascertained through the sequential sampling procedure. Other relatives provided proxy reports. The combined number of proxy-reports and self-reports was 9746 relatives for whom standard risk factor information was available.

For the Bayesian analysis, we focused on families that included at least one parent and at least one child, where each member in the family had information on age at diagnosis or age at last follow-up, and at most one missing covariate.

The demographic covariates and hypothesized melanoma risk factors included gender, birth year, place of birth, ability to tan (very brown, moderate tan, slight tan, no tan), propensity to burn (never burn and always tan, sometimes burn and usually tan, usually burn and sometimes tan, always burn and never tan), number of sunburns (0, 1, 2–5, > 6), skin color (olive/dark, medium, fair/pale), hair color (black, light/dark brown, fair/blonde, light/dark red), eye color (brown, green/hazel, blue/grey), total freckling in summer (0, 1–100, > 100), number of naevi (none, few, moderate number, very many), and numerous measures of cumulative lifetime exposures to sun and ultraviolet rays.

Preliminary Exploratory Analysis

As a preliminary analysis, we ignored correlations within families and applied a combination of parametric and non-parametric survival analysis techniques as exploratory tools to identify possible risk factors for melanoma. Once these fixed effects were identified, we considered incorporating them into a subsequent generalized estimating equation model or a Bayesian model with random effects that could account for within-family correlations. The aim was to quantify the genetic and familial associations in the presence of observed covariate effects.

Manipulation of the entire melanoma dataset resulted in a subset of 9669 observations with a range of explanatory variables that described phenotypic characteristics for each individual, along with some demographic details such as birth year and gender. The response variable was the time to diagnosis (or age at the last follow-up), with the proportion of censored cases being approximately 76%. The median age at onset of melanoma was 43. The correlation estimates for age of onset for different relationship pairs with both affected members were 0.67, 0.55 and 0.39 for sib–sib, parent–child and second/lower order pairs, respectively.

The first stage of modelling involved fitting univariable proportional hazard models to assess each variable's individual effect on the time of onset of melanoma. The SAS (Allison, 1995) package was used to fit proportional hazards models of the following form:

$$h(t; x) = \Psi(x; \beta)\, h_0(t)$$

where $\Psi$ represents a log-linear function $e^{\beta' x}$ of the explanatory variables $x$ and corresponding coefficients $\beta$, and $h_0(t)$ represents the baseline hazard at time $t$.
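As a minimal sketch of how this model is read, the snippet below evaluates the hazard for a given covariate vector and converts a coefficient into a relative risk $e^{\beta}$. The function names, the constant baseline hazard and the numerical values are illustrative assumptions and are not taken from the SAS analysis.

```python
import math

def hazard(t, x, beta, baseline_hazard):
    """Proportional hazards: h(t; x) = exp(beta'x) * h0(t)."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return math.exp(linear_predictor) * baseline_hazard(t)

def relative_risk(b):
    """Hazard ratio for a one-unit increase in a covariate with coefficient b."""
    return math.exp(b)

# Hypothetical constant baseline hazard, used only to illustrate the ratio.
h0 = lambda t: 0.01

# The ratio of hazards for two covariate patterns does not depend on t.
ratio_at_30 = hazard(30, [1.0], [0.31], h0) / hazard(30, [0.0], [0.31], h0)
ratio_at_60 = hazard(60, [1.0], [0.31], h0) / hazard(60, [0.0], [0.31], h0)
```

Because the baseline hazard cancels in the ratio, a coefficient can be interpreted as a relative risk without specifying $h_0(t)$.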

Significant variables associated with the age at onset of melanoma consisted of eye, hair and skin colour, freckling, number of moles, skin type, ability to burn, ability to tan, previous skin cancers, ultraviolet exposure between the ages of 5 and 12 years, cumulative sun exposure up to the age of 19 years, and birth year.

The second stage fitted multivariable proportional hazard models to those explanatory variables that were significant at the univariable stage. Table 1 displays the results from the final model, which only included significant variables (p-value < .05). It is worth noting that a similar result could be obtained using an automated stepwise procedure.

Results from this analysis highlight some interesting but quite obvious risk factors noted in previous analyses (Aitken et al., 1996, 1998). For example:

• An increase of 1 year in birth year induces a 17% increase in risk of earlier melanoma onset.

• People with neither freckles nor naevi have the lowest risk of melanoma onset.

• The risk of earlier melanoma onset is increased by up to 37% for blue eyed people, and even further (46%) for green eyed people, when compared to individuals with brown eyes.

• "Red Heads" have an increased risk of earlier melanoma onset (46%) when compared to individuals with black hair. However, no significant increase was noted for individuals with fair or light red hair.

• A person's tendency to burn easily increases the risk of earlier melanoma onset, in some cases by up to 100%, compared to those that never burn.

However striking this last statement is, issues of confounding must also be considered. The most obvious illustration of this is the confounding that occurred between mole count and freckling. This is seen through close inspection of the parameter estimates, which changed in magnitude when mole count was added to the model after adjusting for freckling. (Results not displayed here.)

To reduce the dimension of the problem further and avoid some of these confounding issues, a survival tree was constructed using RPART (Recursive Partitioning and Regression Trees); see Therneau and Atkinson (1997). Survival trees are a special case of decision trees that were incorporated into RPART by Therneau (1997) using the ideas put forth by LeBlanc and Crowley (1992) for survival data with censoring. These authors showed through simulation studies that survival trees could outperform standard parametric methods such as proportional hazards modelling, particularly in situations where the underlying distribution was not exponential. Along with Breiman et al. (1984), they showed decision trees to be useful exploratory tools for identifying important variables, interactions and outliers.

Table 1

Results from Fitting a Multivariable Proportional Hazards Model to the Melanoma Data Based on Univariable Results. The Results Reported in this Table Are the Parameter Estimates $\beta$, Their Standard Errors se($\beta$), the Relative Risk $e^{\beta}$ and the p-value for Each Estimate

| Variable | $\beta$ | se($\beta$) | $e^{\beta}$ | p-value |
|---|---|---|---|---|
| Birth Year | 0.16 | < 0.01 | 1.17 | < .05 |
| Eye Colour (Baseline: Brown) | | | | |
| Blue/Grey | 0.31 | 0.07 | 1.36 | < .05 |
| Green/Hazel | 0.38 | 0.07 | 1.46 | < .05 |
| Hair Colour (Baseline: Black) | | | | |
| Light Red/Ginger | 0.17 | 0.15 | 1.19 | .27 |
| Dark Red/Auburn | 0.38 | 0.15 | 1.46 | < .05 |
| Fair/Blonde | 0.06 | 0.12 | 1.06 | .62 |
| Light Brown | 0.14 | 0.12 | 1.15 | .22 |
| Dark Brown | 0.02 | 0.12 | 1.02 | .87 |
| Skin Type (Baseline: never burn) | | | | |
| Always burn | 0.69 | 0.16 | 1.968 | < .05 |
| Usually burn | 0.45 | 0.15 | 1.57 | < .05 |
| Sometimes burn | 0.31 | 0.15 | 1.36 | < .05 |
| Freckling (Baseline: none) | | | | |
| 1 to 100 | 0.17 | 0.06 | 1.18 | < .05 |
| > 100 | 0.09 | 0.08 | 1.10 | .23 |
| Mole Count (Baseline: none) | | | | |
| Few | 0.29 | 0.07 | 1.34 | < .05 |
| Moderate | 0.79 | 0.08 | 2.20 | < .05 |
| Many | 1.12 | 0.10 | 3.08 | < .05 |
| Number of Sunburns (Baseline: none) | | | | |
| One | –0.07 | 0.11 | 0.93 | .49 |
| 2 to 5 | –0.06 | 0.09 | 0.94 | .50 |
| > 6 | 0.17 | 0.09 | 1.19 | .07 |
| Cumulative Sun Exposure (< 5 yrs) | 0.04 | 0.01 | 1.04 | < .05 |
| UV Exposure (5–12 yrs) | 0.0003 | < 0.0001 | 1 | < .05 |

The methodology for survival trees begins by identifying optimal splits using the log rank statistic to separate the data into homogeneous groups. Each split is comprised of a parent node and two daughter nodes, which are linked to the parent by branches. Figure 1 is an example of a survival tree produced on the melanoma data. Once a large tree is grown with many terminal nodes that contain very few observations, a pruning procedure is introduced to identify a sequence of sub-trees. The technique of pruning involves snipping back splits of the tree one at a time until only the root node remains. Cross-validation is then introduced to aid in the selection of the optimal model. One nice feature of survival trees is the use of Kaplan-Meier curves to provide information about survival rates at each terminal node of the tree. This approach may allow for better interpretation of the terminal nodes and highlight different scenarios which yield similar survival rates, as can be seen in Figure 1.
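The splitting step can be sketched as follows. This is an illustrative two-group log rank statistic and an exhaustive search over cut points of a single covariate, written under the assumption of right-censored data; it is not the RPART implementation, and the function names are hypothetical.

```python
import numpy as np

def logrank_statistic(time, event, group):
    """Two-sample log rank chi-square statistic (1 df) for right-censored data.

    time  : follow-up times
    event : 1 if the event was observed, 0 if censored
    group : 1 for the left daughter node, 0 for the right daughter node
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    group = np.asarray(group, dtype=int)
    observed_minus_expected = 0.0
    variance = 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()                      # total at risk just before t
        n1 = (at_risk & (group == 1)).sum()    # at risk in group 1
        failing = (time == t) & (event == 1)
        d = failing.sum()                      # events at t
        d1 = (failing & (group == 1)).sum()    # events at t in group 1
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0

def best_split(x, time, event):
    """Return the cut point c (split on x <= c) with the largest log rank statistic."""
    x = np.asarray(x)
    best_cut, best_stat = None, -1.0
    for c in np.unique(x)[:-1]:
        stat = logrank_statistic(time, event, (x <= c).astype(int))
        if stat > best_stat:
            best_cut, best_stat = c, stat
    return best_cut, best_stat
```

A tree-growing routine would apply `best_split` to every candidate covariate at a node, keep the best one, and recurse on the two daughter nodes before pruning.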

The variables identified from the multivariable model were used as input into RPART. Variables such as the sun and UV exposures were omitted from the modelling stage, since nearly half of the data for each of these variables were missing (cumulative sun exposure, 43.8%; UV exposure, 40.4%). A large survival tree was produced in S-PLUS and, using cross-validation, a model splitting solely on birth year was identified. This survival tree yielded the minimum cross-validated error rate (0.93), suggesting that birth year was an important indicator for melanoma. Selecting a slightly larger but more informative model (error = 0.94) resulted in a survival tree consisting of nine terminal nodes. This revealed splits on birth year, mole count and freckling. All other variables appeared to be either competing at each node or acting as a surrogate variable for one of these primary splits. Skin type, in particular, arose as an important surrogate for freckling.

In Figure 1, we aim to provide a more visual interpretation of the results from the survival analysis. Each split is shown at the top of each node and can be assumed to be the split that directs observations towards the left side of the tree. For example, after splitting on birth year at 1933, a split to the left corresponds to an individual with few moles. A split to the right indicates an individual with many moles, and so on. The terminal nodes in this figure are displayed as Kaplan-Meier plots showing the survival curves along with the number of observations n and the risk ratio, calculated with reference to the baseline group of 9669 individuals. The reference survival curve is shown at the root node, along with a bar which shows the percentage of events in the entire dataset. Grids are placed on each survival plot at equal intervals of 20 years of age. These grids have been plotted to simplify the interpretation. An indicator to the left of the plot displays the risk ratio. Ratios above one are shown by an "up arrow", while a decrease in risk is illustrated by a "down arrow". From this model, we can see that there are a few scenarios that indicate high risk for earlier onset of melanoma. These scenarios may be described as:

• individuals born after 1966 with many moles (RR = 113)

• individuals born between 1947 and 1966 with many moles (RR = 45)

• individuals born between 1933 and 1953 with few moles but many freckles (RR = 24)

• individuals born between 1923 and 1933 with many moles (RR = 141)

It is obvious from these results that birth year has a substantial impact on the age-at-onset of melanoma. Once this is taken into account, mole count and freckling only provide a small contribution to the risk.

Family History of Melanoma

As described earlier in the Data section, family history was collected regarding first-degree (siblings and parents) and second-degree relatives. From this pedigree structure, other higher-order types of relative pairs could also be formed. Some of these relatives were diseased with melanoma, resulting in pairs of relatives who both may be diseased (++), both not diseased (– –), and one diseased while the other was not diseased (+–). Table 2 lists the concordant and discordant pairs of specific relationships: sib–sib, parent–child and second-degree/lower order relative pairs. The second-degree relative pairs include grandparent–grandchild and aunt–niece, while the lower order relative pairs include the in-law pairs. From Table 2, the percentages of both diseased pairs are 0.6%, 0.9% and 0.4% among sib–sib, parent–child and second/lower order pairs, respectively. Crude estimates of correlation coefficients can be calculated from these percentages without accounting for ages at onset among these relatives. However, the risk of developing melanoma may depend on the subject's age. Hence, adjusting for the age at onset is essential in quantifying the correlation of age at onset between pairs of relatives. In a subsequent section we describe how this can be done rigorously via the generalized estimating equations approach. On average, melanomas were diagnosed slightly earlier in relatives (47.5 years) than in probands (50.2 years). Among relatives, melanomas were diagnosed at younger ages in later generations.
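The crude percentages quoted above follow directly from the pair counts in Table 2. The short sketch below reproduces that arithmetic; the dictionary layout and function name are illustrative choices.

```python
# Pair counts from Table 2 (probands excluded).
PAIR_COUNTS = {
    "sib-sib":      {"++": 49, "+-": 763,  "--": 7011},
    "parent-child": {"++": 15, "+-": 536,  "--": 1078},
    "second/other": {"++": 41, "+-": 1200, "--": 9817},
}

def percent_both_affected(counts):
    """Percentage of relative pairs in which both members are affected."""
    return 100.0 * counts["++"] / sum(counts.values())

crude = {rel: round(percent_both_affected(c), 1) for rel, c in PAIR_COUNTS.items()}
```

These figures ignore age at onset and censoring, which is why the age-adjusted estimating equations approach is needed.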


Figure 1

Survival tree for the melanoma data consisting of 9669 individuals from 1912 Queensland families. Terminal nodes are rectangular and display the Kaplan-Meier curve for the corresponding subgroup. The root node is the baseline (RR = 1.00).


To account for the different ages at censoring in each generation, due to termination of the study or death from causes other than melanoma, we examined the disease-free survival distribution for each generation using the standard failure-time analysis technique (Figure 2). The median age at diagnosis of melanoma was 64 among parents of probands, 50 among siblings of probands, and 33 among children of probands. The disease-free survival function of the children's generation differed significantly from those of earlier generations (log-rank test with p < .01), but there were no significant differences in age-at-onset of melanoma between the siblings and parents of probands (log-rank test with p > .9).

Preliminary Segregation Analysis

A preliminary segregation analysis was conducted (Aitken et al., 1998) to investigate whether the familial clustering of cutaneous melanoma is consistent with Mendelian inheritance of a major autosomal gene. Analyses were performed with the SAGE (1992) statistical package, using the maximum likelihood REGTL program for a binary trait with a variable age of onset. The hypothesis of co-dominant Mendelian inheritance gave a significantly better fit to the data than either dominant or recessive Mendelian inheritance. Overall, both Mendelian inheritance of a single major gene and purely environmental transmission were rejected. However, there was strong evidence of familial dependence in melanoma occurrence.

The inclusion of risk factors in the models may reveal whether all or a combination of these explains the familial dependence that was demonstrated by the segregation analysis. If a major gene exists that operates independently of these covariates, including them in the models may reduce residual variation and increase the power of the analysis to detect a major gene effect.

Estimating Equations Approach

Let $y = (\delta_{ki}, t_{ki}, Z_{ki})$ denote the data collected for the $i$th member in the $k$th family ($k = 1, \ldots, K$ and $i = 1, 2$), where $\delta_{ki} = 0$ if the observation is censored (and 1 otherwise), $t_{ki}$ is either the recorded age at diagnosis of melanoma or the age at the most recent follow-up for unaffected people, and $Z_{ki}$ is a vector of measured covariates. We assume that censoring time, age at diagnosis of melanoma, and the covariates are independently distributed. These assumptions can be relaxed in more general models, subject to identification constraints. The hazard rate for melanoma is the instantaneous probability that melanoma is diagnosed immediately after time $t$, given that the person is unaffected at time $t$. The hazard rate under the Cox proportional hazards model (Cox, 1972) is given by

$$\lambda(t_{ki}) = \lambda_0(t_{ki}) \exp(\beta' Z_{ki})$$

Table 2

Concordant and Discordant Pairs of Relatives in 1912 Families from the Queensland Familial Melanoma Project. Probands Were Not Included for the Calculation of Concordance

| | Sib–sib | Parent–child | Second/Others | Total |
|---|---|---|---|---|
| ++ | 49 | 15 | 41 | 105 |
| +– | 763 | 536 | 1200 | 2499 |
| – – | 7011 | 1078 | 9817 | 17906 |
| Total | 7823 | 1629 | 11058 | 20510 |

Figure 2

Cumulative melanoma-free survival among first-degree relatives of confirmed cutaneous melanoma cases (probands) diagnosed in Queensland, Australia, 1982–1987, according to the relative's relationship to the proband.

where $\lambda_0(\cdot)$ is the baseline hazard function and $\beta$ is a vector of regression coefficients.

For a specific pair of relatives, we follow Clayton (1978) in modeling the bivariate survivor function:

$$F(t_{k1}, t_{k2}) = \left( F_1(t_{k1})^{-\theta} + F_2(t_{k2})^{-\theta} - 1 \right)^{-1/\theta}$$

where $F_1$ and $F_2$ are univariable survivor functions, and $\theta$ is a scalar parameter that measures the degree of dependence between the relatives' times at onset, independence being implied by $\theta = 0$ and positive association by $\theta > 0$. The Clayton model allows negative dependencies and has the property that failure times are absolutely continuous for $\theta > -0.5$. In addition, the cross-ratio (or odds-ratio) function, as studied by Oakes (1989), is

$$c(t_{k1}, t_{k2}) = \frac{\lambda(t_{k1} \mid T_{k2} = t_{k2})}{\lambda(t_{k1} \mid T_{k2} \ge t_{k2})} = 1 + \theta$$

This is equivalent to assuming that the odds-ratio is invariant over the grid region that supports the data. Heuristically, the parameter $1 + \theta$ is an odds-ratio that depends on the degree of dependence between the onset ages of the two relatives. If genetic factors do influence the age at onset of melanoma, we would expect to see a higher concordance in the age of onset in first-degree relatives, who on average share half their genes in common, in comparison to second-degree relatives. Under the current model, this translates as $\theta_{\text{first-degree}} > \theta_{\text{second-degree}}$.
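The constant cross-ratio property can be checked numerically. The sketch below evaluates the Clayton survivor function and approximates the cross-ratio $F \cdot \partial^2 F / \partial t_1 \partial t_2 \,/\, (\partial F/\partial t_1 \cdot \partial F/\partial t_2)$ by finite differences. The exponential marginals and their rates are assumptions made purely for illustration; the property itself does not depend on the choice of marginals.

```python
import math

def clayton_survivor(s1, s2, theta):
    """Joint survivor probability from marginal survivor probabilities s1, s2."""
    if theta == 0:
        return s1 * s2  # limiting case: independence
    return (s1 ** (-theta) + s2 ** (-theta) - 1.0) ** (-1.0 / theta)

def cross_ratio(t1, t2, theta, rate1=0.02, rate2=0.03, h=1e-3):
    """Finite-difference cross-ratio at (t1, t2), assuming exponential marginals."""
    def F(a, b):
        return clayton_survivor(math.exp(-rate1 * a), math.exp(-rate2 * b), theta)
    d1 = (F(t1 + h, t2) - F(t1 - h, t2)) / (2 * h)
    d2 = (F(t1, t2 + h) - F(t1, t2 - h)) / (2 * h)
    d12 = (F(t1 + h, t2 + h) - F(t1 + h, t2 - h)
           - F(t1 - h, t2 + h) + F(t1 - h, t2 - h)) / (4 * h * h)
    return F(t1, t2) * d12 / (d1 * d2)
```

Evaluating `cross_ratio` at different ages with the same `theta` should return values close to `1 + theta`, which is the invariance the text relies on.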

We may use a standard method to estimate within-pair correlations for 2 × 2 tables from odds ratios. Estimates of the relation-pair correlations $\rho_{\text{sib-sib}}$, $\rho_{\text{parent-child}}$ and $\rho_{\text{second-order}}$ are recovered using relationships such as, for example, $\rho_{\text{sib-sib}} = \min(1, \ln(1 + \theta_{\text{sib-sib}}))$. Testing for the presence of genetic factors underlying the age at diagnosis of melanoma is equivalent to testing $H_0: \rho_{\text{first-order}} = \rho_{\text{second-order}}$. We may test this hypothesis using a z-transform (Kendall, 1979, p. 315) of the point estimates of the correlation coefficients. Let $n_1$ and $n_2$ denote the number of first-order and second-order relatives, and let $z_1$ and $z_2$ denote the transformed statistics of $r_1$ and $r_2$, the correlation estimates for first-order and second-order relative pairs, respectively. Specifically, we reject $H_0$ when $E/D > Z_{1-\alpha}$, where

$$E = E(z_1 - z_2) = \frac{1}{2}\log\left(\frac{1 + r_1}{1 - r_1}\right) - \frac{1}{2}\log\left(\frac{1 + r_2}{1 - r_2}\right),$$

$$D^2 = V(z_1 - z_2) = \frac{1}{n_1 - 3} + \frac{1}{n_2 - 3},$$

and $Z_{1-\alpha}$ is the standard normal deviate corresponding to the one-sided $\alpha$ significance level.
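The test above can be written in a few lines. The sketch below is an illustrative implementation of the z-transform comparison; the function names and the example inputs are hypothetical and are not values from the study.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    """z-transform of a correlation estimate r, with -1 < r < 1."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

def compare_correlations(r1, n1, r2, n2, alpha=0.05):
    """One-sided test of H0: rho1 = rho2 against rho1 > rho2.

    Returns (E / D, critical value Z_{1-alpha}, reject H0?).
    """
    e = fisher_z(r1) - fisher_z(r2)
    d = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    statistic = e / d
    critical = NormalDist().inv_cdf(1.0 - alpha)
    return statistic, critical, statistic > critical

# Hypothetical inputs, for illustration only.
stat, crit, reject = compare_correlations(r1=0.6, n1=103, r2=0.2, n2=103)
```

The transform requires correlation estimates strictly inside (−1, 1), so an estimate truncated at 1 would need separate handling.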

This approach has the advantage of providing a test for the presence of genetic effects through a single parameter ($\theta$). However, it is limited in its ability to attribute the phenotypic variance to specific effects (e.g., additive gene action).

Mathematical details of the GEE model and the iterative procedure to estimate the regression coefficients $\beta$ and specific degrees of dependence $\theta$ for the different types of relative pairs have been summarized previously in Do et al. (2000).

MCMC Analysis Using BUGS

The Bayesian Paradigm and Gibbs Sampling

Markov Chain Monte Carlo (MCMC) is an alternative Bayesian approach that provides estimates of likelihoods and associated parameter values when exact computation is infeasible (Hastings, 1970; Metropolis et al., 1953). MCMC methods can be used to draw samples from the underlying joint distribution of major genotypes and polygenic values, conditional on the observed data. From these samples, desired parameters and likelihoods can be estimated without the need to resort to exact computation. MCMC methods have been used for linkage analysis (Kong et al., 1992; Lange & Sobel, 1991), for estimation of parameters in the mixed model with and without covariates (Guo & Thompson, 1991; Thomas, 1992), for estimation of gene–smoking interaction and covariate imputation (Gauderman et al., 1997), for performing combined linkage and segregation analysis (Faucett et al., 1993; Guo & Thompson, 1992), and for mixed models of large complex pedigrees (Guo & Thompson, 1994).

In a general setting, let $x$ be the observed data and $\theta$ be everything not observed, including parameters and latent variables. The implementation of Bayesian methods using realistic models and priors is computer-intensive and relies heavily on cunning computational tools to approximate integrals. The problem, in general terms, is to obtain the expected value of a function of interest $s(\theta)$ under the posterior density $p(\theta \mid x)$,

$$E[s(\theta)] = \frac{\int_{\Theta} s(\theta)\, p(\theta)\, p(x \mid \theta)\, d\theta}{\int_{\Theta} p(\theta)\, p(x \mid \theta)\, d\theta},$$

which cannot generally be found analytically. One method to carry out the integration on the RHS is to perform simulation of exact Bayesian posterior distributions using Markov chain Monte Carlo techniques such as Gibbs sampling. The Gibbs sampler (Geman & Geman, 1984) is the most popular algorithm used in MCMC applications to correlated data. Gibbs sampling was introduced to the main statistical community by Gelfand and Smith (1990) and has since been applied to an even wider array of problems. The Gibbs sampler is easy to implement because it only depends on the local neighborhood structure. In the context of pedigree analysis (Olshen & Wijsman, 1996), the basic procedure is a sequential updating of missing and latent data, including the underlying and unobserved major genotypes, polygenic effects and environmental effects. Values for the missing or latent data are sampled from the local conditional distribution, a function of the observed individual data, the current sampled values of other missing/latent data for this particular individual, such as polygenic and environmental effects, and the values for the sampled genetic effects in the immediate neighbors of an individual. Gibbs sampling basically consists of three main steps:

Step 1: Set initial values for unobserved quantities (parameters and latent variables).

Step 2: For each parameter or latent variable $\theta_j$, sample from its "full conditional distribution" given the current values of all other quantities in the model.

Step 3: Examine sampled values of parameters and latent variables to monitor convergence and to provide summary measures.
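The three steps can be illustrated on a deliberately simple model. The sketch below is a toy Gibbs sampler for the mean and precision of normally distributed data with conjugate priors; it is not the pedigree model of this paper, and the prior settings, iteration counts and function name are assumptions chosen only for illustration.

```python
import numpy as np

def gibbs_normal(y, n_iter=3000, burn_in=1000, seed=0):
    """Toy Gibbs sampler for y_i ~ N(mu, 1/tau).

    Assumed priors: mu ~ N(m0, 1/p0) and tau ~ Gamma(a, rate=b), all vague.
    Returns posterior means and standard deviations of (mu, tau).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    m0, p0, a, b = 0.0, 1e-6, 1e-3, 1e-3

    # Step 1: initial values for the unobserved quantities.
    mu, tau = 0.0, 1.0
    draws = np.empty((n_iter, 2))

    for k in range(n_iter):
        # Step 2a: full conditional of mu given tau is normal.
        precision = n * tau + p0
        mu = rng.normal((tau * y.sum() + p0 * m0) / precision,
                        1.0 / np.sqrt(precision))
        # Step 2b: full conditional of tau given mu is gamma
        # (numpy parameterizes the gamma by shape and scale = 1 / rate).
        rate = b + 0.5 * np.sum((y - mu) ** 2)
        tau = rng.gamma(a + n / 2.0, 1.0 / rate)
        draws[k] = mu, tau

    # Step 3: discard burn-in and summarize the retained samples.
    kept = draws[burn_in:]
    return kept.mean(axis=0), kept.std(axis=0)
```

In a pedigree application the same loop structure applies, with the full conditionals for genotypes, polygenic effects and missing covariates taking the place of the two updates shown here.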

Some of the most recent and popular packages that implement Gibbs sampling for analysis of pedigree data include BUGS (Gilks et al., 1994; Spiegelhalter et al., 1996a, b), the Genetic Analysis Package (GAP, 1996) and MIXD (Olshen & Wijsman, 1996; Thompson, 1994). We have used BUGS mainly because of its flexibility in programming hierarchical models, besides being a freeware product.

The Model

The aim here is to model the correlation structure within the family structure to satisfy the fundamental additive genetic model (Crow & Kimura, 1970; Falconer, 1990; Kempthorne, 1960), as follows. Consider a nuclear family structure consisting of 4 members (father, mother and two children), denoted by F, M, S1, S2, respectively. Using similar notation as in Burton et al. (1999), a conventional mixed linear model consisting of fixed and random effects may be written in the form

$$Q_{ij} = \beta^{T} z_{ij} + A_{ij} + C_{ij} + C^{S}_{ij} + E_{ij} \qquad (1)$$

where $Q_{ij}$ is the observed value of a normally distributed continuous trait for the $j$th individual in the $i$th nuclear family, $z_{ij}$ is a vector of observed covariates representing fixed effects, and $\beta$ is a corresponding vector of unknown fixed regression coefficients. $A_{ij}$, $C_{ij}$ and $C^{S}_{ij}$ denote random effects that represent additive polygenic, common family environment and common sibling environment effects, respectively. The variation in an individual response is represented by a composite covariance matrix $V_T$ and is the sum of an additive genetic covariance matrix $V_A$, a common family environment matrix $V_C$, a shared sibling environment matrix $V_{Cs}$ and residual environmental effects. The different variance components are

The overall total covariance matrix is

The components of variance $\sigma^2_A$, $\sigma^2_C$, $\sigma^2_{Cs}$ need not be positive as long as $V_T$ is positive definite. A negative value for $\sigma^2_{Cs}$ simply implies that the realized covariance between siblings is less than the realized covariance between parent and a child. A negative value for $\sigma^2_{Cs}$ may suggest dominance. To optimize convergence in BUGS, Model (1) may be reparameterized as

where $F_i$, $G_i$, $H_i$ are independent additive random effects or latent variables, and the remaining terms are the residual error terms for parents and children, respectively. If we model

$$F_i \sim N\left(0, \tfrac{1}{2}\sigma^2_A + \sigma^2_C\right), \quad G_i \sim N\left(0, \tfrac{1}{2}\sigma^2_A\right), \quad H_i \sim N\left(0, \sigma^2_{Cs}\right),$$

then the basic genetics covariance model (additive genetic, common environment and unique environment) for this particular four-member family structure is satisfied.
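One way to see how three independent latent effects can reproduce the additive covariance structure is sketched below. Both ingredients are assumptions and are not taken from the text: the target matrix is the standard additive model for (F, M, S1, S2), with unrelated parents and relatives sharing half their additive genetic variance, and the loadings are one hypothetical construction that is consistent with the stated variances of $F_i$, $G_i$ and $H_i$. It should not be read as the authors' reparameterized equation.

```python
import numpy as np

def target_covariance(s2_a, s2_c, s2_cs, s2_e):
    """Assumed standard additive covariance for (father, mother, sib1, sib2)."""
    relatedness = np.array([[1.0, 0.0, 0.5, 0.5],
                            [0.0, 1.0, 0.5, 0.5],
                            [0.5, 0.5, 1.0, 0.5],
                            [0.5, 0.5, 0.5, 1.0]])
    family_env = np.ones((4, 4))        # shared by all four members
    sibling_env = np.zeros((4, 4))
    sibling_env[2:, 2:] = 1.0           # shared by the two siblings only
    return (s2_a * relatedness + s2_c * family_env
            + s2_cs * sibling_env + s2_e * np.eye(4))

def implied_covariance(s2_a, s2_c, s2_cs, s2_e):
    """Covariance implied by a hypothetical latent-effect construction.

    Assumed loadings: father = F + G, mother = F - G, each sib = F + H,
    with residual variance s2_e for parents and s2_a / 2 + s2_e for children.
    """
    loadings = np.array([[1.0,  1.0, 0.0],
                         [1.0, -1.0, 0.0],
                         [1.0,  0.0, 1.0],
                         [1.0,  0.0, 1.0]])
    latent_var = np.diag([0.5 * s2_a + s2_c, 0.5 * s2_a, s2_cs])
    residual = np.diag([s2_e, s2_e, 0.5 * s2_a + s2_e, 0.5 * s2_a + s2_e])
    return loadings @ latent_var @ loadings.T + residual
```

Under these assumptions the two matrices agree for any choice of variance components, which is the sense in which independent latent effects can stand in for the correlated genetic and environmental terms.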

In survival models, unobserved or unmeasured explanatory variables, some of which may be genetic, are often referred to as frailties. The frailties take values restricted to the positive line and may be assumed to act multiplicatively on the hazard. Extending the above model to correlated family data with a time-to-onset endpoint, a multiplicative individual heterogeneity or frailty term representing the latent genetic and common environment variables may be modeled as random effects simultaneously with the effects associated with observed covariates. Consider right censored time to onset of melanoma data $(T_{ij}, \delta_{ij}, z_{ij})$, $1 \le j \le n$, from $n$ relative pairs; here $T_{ij}$ denotes the true age at onset of the $j$th family member or the censored time, depending on whether $\delta_{ij} = 1$ or 0, respectively, and $z$ denotes a $p \times 1$ vector of covariates. A Weibull distribution may be used to model time to failure as

$$f(t_i; z_i) = e^{\beta' z_i}\, \gamma\, t_i^{\gamma - 1} \exp\left(-e^{\beta' z_i}\, t_i^{\gamma}\right)$$

where $\beta$ is a vector of unknown regression coefficients and $\gamma$ is the shape parameter of the Weibull distribution. This leads to a baseline hazard of the form

$$\lambda_0(t_i) = \gamma\, t_i^{\gamma - 1}$$
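For right-censored data, this density and hazard combine into a simple log-likelihood: affected individuals contribute the log hazard, and every individual contributes minus the cumulative hazard $e^{\beta' z_i} t_i^{\gamma}$. The sketch below assumes independent observations and no random effects; the function name and argument layout are illustrative.

```python
import numpy as np

def weibull_loglik(t, delta, z, beta, gamma):
    """Censored Weibull log-likelihood under the parameterization above.

    t     : ages at onset or censoring
    delta : 1 if onset was observed, 0 if censored
    z     : (n, p) covariate matrix
    beta  : length-p coefficient vector
    gamma : Weibull shape parameter (> 0)
    """
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta, dtype=float)
    scale = np.exp(np.asarray(z, dtype=float) @ np.asarray(beta, dtype=float))
    log_hazard = np.log(scale) + np.log(gamma) + (gamma - 1.0) * np.log(t)
    cumulative_hazard = scale * t ** gamma
    return float(np.sum(delta * log_hazard - cumulative_hazard))
```

A censored individual contributes only the log survivor term, which is how unaffected relatives still inform the estimates.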

Re-parameterize by letting $\mu_i = e^{\beta' z_i}$; the conditional distribution of $t_i$ given $\mu_i$ is then Weibull($\gamma$, $\mu_i$). We formulated a mixed model to represent the conditional distribution of $t_{ij}$, given covariate effects, random additive genetic and common environment effects, as

$$t_{ij} \mid \mu_{ij} \sim \text{Weibull}(\gamma, \mu_{ij}), \quad i = 1, \ldots, n, \; j = 1, 2,$$

where

The regression coefficients and the precisions of the random effects ($\tau_G$, $\tau_F$, $\tau_H$) were given "non-informative" Normal and Gamma priors, respectively. The shape parameter $\gamma$ of the time to onset of melanoma distribution was also given a non-informative Gamma prior, which was slowly decreasing on the positive real line.

The overall model (including random effects) may be described in a Bayesian graph (Figure 3), which simplifies sampling from full conditional distributions by exploiting partial independence properties (Spiegelhalter et al., 1996a). In this graph, each random quantity is represented by a node, which may be connected by directed or undirected links. Conditional independence assumptions are represented by the absence of such links.

We implemented the Gibbs sampler using the BUGS program (Gilks et al., 1994) (code in the Appendix). Imputation of missing data was handled naturally in the Gibbs sampling framework by treating missing values as additional unknown quantities and randomly sampling values from their full conditional distributions. We chose simple prior distributions for imputation since the number of missing values for covariates was not large (complete for birth year, 16 missing for naevi and 21 missing for freckles) and there was no indication of non-random missingness in our data. Therefore, imputation for the missing naevi and freckles covariates was based on Bernoulli prior distributions, with respective parameter values estimated from the complete observations. We performed an initial 10,000 burn-in iterations followed by an additional 20,000. Parameter estimates were the mean and standard deviation (SD) of all post-convergence Gibbs samples with a thinning interval between 20 and 50; credible intervals were computed as the lower and upper $\alpha/2$ percentiles from the last 20,000 iterations. Convergence to the posterior distribution was confirmed by using the different criteria provided by the add-on CODA package, including those of Gelman and Rubin (1992), Geweke (1992), and Raftery and Lewis (1992a, 1992b).
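The summary step described above can be sketched as follows. This is an illustrative post-processing routine for a single monitored parameter, not the BUGS or CODA code used in the analysis; the function name is hypothetical, and the defaults simply mirror the burn-in and thinning values quoted in the text.

```python
import numpy as np

def summarize_chain(samples, burn_in=10_000, thin=20, alpha=0.05):
    """Posterior mean, SD and (1 - alpha) credible interval from one chain.

    samples : 1-D sequence of sampled values for a single parameter
    burn_in : number of initial iterations to discard
    thin    : keep every `thin`-th retained iteration
    """
    kept = np.asarray(samples, dtype=float)[burn_in::thin]
    lower, upper = np.percentile(kept, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return {
        "mean": float(kept.mean()),
        "sd": float(kept.std(ddof=1)),
        "ci": (float(lower), float(upper)),
        "n_kept": int(kept.size),
    }
```

With 30,000 total iterations, a burn-in of 10,000 and a thinning interval of 20, the summary is based on 1000 retained draws.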

Results

GEE Approach

An inspection of residual plots following preliminary model-fitting provided no evidence of failure of the proportional hazards assumption and did not detect influential observations. We then applied the GEE approach, which estimates regression coefficients while incorporating a dependence structure between relative pairs. The results are displayed in Table 3 and suggest that later birth year and having at least a moderate number of naevi and freckles were all simultaneously associated with earlier age at onset of melanoma. Table 3 also presents the estimated odds ratios quantifying the correlation between paired relatives of specific relationships.

The odds ratio for sib–sib pairs is 2.973, which is significantly different from 1 ($p < .01$). The odds ratio for parent–child pairs is 1.650, which is slightly greater than 1, indicating a mild dependency between these pairs, although it is not statistically significant. The odds ratio for second-degree and higher relative pairs is 1.155, indicating little or no dependence between these pairs. This pattern of familial aggregation is compatible with dominance variance as well as additive genetic variance (Falconer, 1990).
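As a rough check on how such odds ratios are judged against 1, the following Python sketch computes a Wald-type Z-statistic and a two-sided normal p-value. It assumes that the Z-statistic is simply the excess of the odds ratio over 1 (that is, $\theta$) divided by its standard error; the function name `wald_test` is hypothetical, and the inputs are the sib–sib and parent–child entries of Table 3.

```python
import math

def wald_test(odds_ratio, se, null_value=1.0):
    """Z = (estimate - null) / se, with a two-sided normal p-value."""
    z = (odds_ratio - null_value) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided

# Values as reported in Table 3 (1 + theta and se(theta)).
z_sib, p_sib = wald_test(2.973, 0.6217)   # sib-sib pairs
z_pc, p_pc = wald_test(1.650, 0.434)      # parent-child pairs
print(round(z_sib, 2), round(z_pc, 2))
```

Under this assumption the computed statistics agree with the tabulated Z-statistics for these two relationships to two decimal places.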

Bayesian Approach

We re-analyzed the age at onset of melanoma by using Gibbs sampling to impute missing covariates and to estimate subject-specific covariate effects and random additive genetic, common family environment, and shared sibling environment effects on the log scale. The estimated shape parameter $\gamma$ was 4.3, with a 95% credible interval (CI) of (4.2, 4.6). The results are summarized in Table 4 (Model A). The residual plots did not indicate a gross departure from the underlying Weibull model and revealed no influential observations. We checked the sensitivity of the analyses to initial parameter values by re-running the Gibbs sampler five more times using different starting values. The resulting estimates did not differ by more than 5% from the values reported here. The mean estimate for $\sigma^2_A$ was 0.452 with 95% CI = (0.348, 0.566), for $\sigma^2_C$ was −0.053 with 95% CI = (−0.120, 0.019), and for $\sigma^2_{Cs}$ was 0.467 with 95% CI = (0.393, 0.545).

A small negative value for the common family effect suggests that there may be a dominance effect or that the current model is not quite appropriate; for example, there may be systematic effects or correlation structures that have not been accounted for. The results here indicate that additive genetic effects and shared sibling environment seem to impact about equally on the variation in the age at onset of melanoma. Further exploration of alternative models is a focus of our future research.

In addition, we investigated the relative contribution of genetic and environmental effects on the expression of naevi (Model B) and freckles (Model C), which are known risk factors for melanoma. The expressions of naevi and freckles were coded as binary variables (none or few moles versus moderate or many moles, and no freckles versus one or more freckles). A hierarchical Bayesian binomial model was fitted to estimate the random variance components. The results in Table 4 indicate that a common family environment effect contributed the most to the expression of naevi ($\sigma^2_C = 0.704$) relative to the contributions of the additive genetic effect ($\sigma^2_A = 0.142$) and of the shared sibling effect ($\sigma^2_{Cs} = 0.142$), both of which were non-negligible. In contrast, variation in the expression of freckles was largely explained by additive genetic and shared family effects ($\sigma^2_A = 2.050$, $\sigma^2_C = 2.600$) compared to a relatively small shared sibling effect ($\sigma^2_{Cs} = 0.115$).

Figure 3
Graphical model of covariate and random family effects for an individual in a nuclear family structure. $t_{ij}$ represents the observed failure time for the $j$th individual in the $i$th family, with $t^{\mathrm{cen}}_{ij}$ being an indicator variable of censoring status. Full arrows indicate stochastic links to which a probability is attached; broken arrows denote deterministic relationships. The $\beta$s are regression coefficients; $\tau$ is the precision of the prior distribution and equals the inverse of the variance. $F_i$, $G_i$, $H_i$ are random effects modeled as $F_i \sim N(0, \tfrac{1}{2}\sigma^2_A + \sigma^2_C)$, $G_i \sim N(0, \tfrac{1}{2}\sigma^2_A)$, and $H_i \sim N(0, \sigma^2_{Cs})$. Rectangles represent actual data values for the covariates; $\gamma$ and $\mu_{ij}$ are shape and scale parameters for the underlying Weibull distribution.

Table 3
GEE Approach: Estimated Regression Coefficients in the Proportional Hazards Model and Estimated Odds Ratios for Quantifying Familial Aggregation in Age at Onset of Melanoma in Queensland Families (* Indicates Significance at the .05 Significance Level)

A. Mean effects

| Covariate | RR = $e^{\beta}$ | Coefficient $\beta$ | Robust se($\beta$) | Z-statistic |
|---|---|---|---|---|
| Year of birth | 1.142 | 0.132 | 0.051 | 2.588 |
| Naevi (Baseline = No or few moles) | 1.765 | 0.568 | 0.073 | 7.781 |
| Freckling (Baseline = No freckles) | 1.160 | 0.148 | 0.049 | 3.020 |

B. Patterns of familial aggregation

| Relationship | $1 + \theta$ | se($\theta$) | Z-statistic |
|---|---|---|---|
| Sib–sib | 2.973 | 0.6217 | 3.17 |
| Parent–child | 1.650 | 0.434 | 1.50 |
| Second/Others | 1.155 | 0.270 | 0.47 |

Table 4
Gibbs Sampling Approach: Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families

Weibull Model A: Mean effects. Response variable is age at onset.

| Covariate | RR = $e^{\beta}$ | Coefficient $\beta$ | Robust se($\beta$) | 95% CI of $\beta$ |
|---|---|---|---|---|
| Year of birth | 1.378 | 0.321 | 0.0027 | (0.316, 0.326) |
| Naevi | 1.126 | 0.119 | 0.0021 | (0.058, 0.185) |
| Freckling | 1.017 | 0.017 | 0.1400 | (−0.005, 0.085) |

Weibull Model B: Variance components. Response variable is age at onset.

| Latent effect | Mean from 5000 iterations | se($\sigma^2$) | 95% CI of $\sigma^2$ |
|---|---|---|---|
| $\sigma^2_A$ | 0.452 | 0.054 | (0.348, 0.566) |
| $\sigma^2_C$ | −0.053 | 0.027 | (−0.120, 0.019) |
| $\sigma^2_{Cs}$ | 0.467 | 0.040 | (0.393, 0.545) |
| $\gamma$ | 4.3 | 0.104 | (4.2, 4.6) |

Binomial Model: Variance components. Response variable is naevi.

| Latent effect | Mean from 5000 iterations | se($\sigma^2$) | 95% CI of $\sigma^2$ |
|---|---|---|---|
| $\sigma^2_A$ | 0.142 | 0.149 | (0.002, 0.498) |
| $\sigma^2_C$ | 0.704 | 0.156 | (0.403, 1.010) |
| $\sigma^2_{Cs}$ | 0.195 | 0.157 | (0.0025, 0.553) |

Binomial Model: Variance components. Response variable is freckling.

| Latent effect | Mean from 5000 iterations | se($\sigma^2$) | 95% CI of $\sigma^2$ |
|---|---|---|---|
| $\sigma^2_A$ | 2.050 | 0.779 | (0.835, 3.570) |
| $\sigma^2_C$ | 2.600 | 0.418 | (1.780, 3.460) |
| $\sigma^2_{Cs}$ | 0.115 | 0.088 | (0.011, 0.312) |

Note: * indicates significance at the 0.05 level. Naevi is a binary variable with baseline 0 = no or few moles. Freckling is coded as a binary variable with baseline 0 = no freckles.

Discussion

We applied two methods, generalized estimating equations and Bayesian analysis, to the genetic analysis of age at onset of melanoma based on a nuclear family structure. Under both approaches, the results suggest that additive genetic factors played an important role in the age at onset of melanoma but that shared sibling environmental factors were not negligible. We focused attention on these approaches because they are more appropriate for modeling correlated age-at-onset data and they allow the inclusion of covariates in the analyses. Under both approaches there were suggestions that earlier melanoma onset was influenced by later birth year, having a moderate number of naevi, and having freckles.

The principal difference between the two approaches is in the interpretation of the regression coefficients. The GEE method uses a marginal approach, resulting in regression coefficients that describe the average population response to changing covariates, whereas the Bayesian approach produces subject-specific coefficients. A secondary distinction is in the nature of the within-pair dependence. The GEE model only describes a common covariance among specific relative pairs, whereas the Bayesian approach can explicitly describe the source of this covariance. A third advantage of the Bayesian method is its flexibility in incorporating prior information, if available, for the covariates or latent effects by modifying their prior distributions. Further, the Bayesian method would permit a more accurate decomposition of the genetic variance into additive and dominance components, thus providing the means for a direct assessment of the no-dominance assumption.

Finally, it is also interesting to record the amount of CPU time required for each method: 20 seconds for the GEE approach, and approximately 3 hours (for binary traits) and 12 hours (for the age-at-onset outcome) to run BUGS on a single-user Intel Pentium III 600 MHz personal computer with the Linux Mandrake 7.1 operating system. The amount of human and financial resources dedicated to collecting, maintaining and updating the melanoma family database is extremely high; therefore, the extra CPU time required by the MCMC method is well worth the additional genetic information and flexibility that it provides. A BUGS program is included in the Appendix.

Acknowledgments

Data collection was supported by grants from the Australian National Health and Medical Research Council (NHMRC; ID 930223 and ID 961061) and the US National Cancer Institute (CA88363). Programming resources were provided by start-up funds from the Department of Biostatistics at the M. D. Anderson Cancer Center. We acknowledge assistance with BUGS programming from Sang-Joon Lee.

References

Abel, L., & Bonney, G. E. (1990). A time-dependent logistic hazard function for modeling variable age of onset in the analysis of familial diseases. Genetic Epidemiology, 12, 391–407.

Aitken, J. F., Bailey-Wilson, J., Green, A. C., MacLennan, R., & Martin, N. G. (1998). Segregation analysis of cutaneous melanoma in Queensland. Genetic Epidemiology, 15, 391–401.

Aitken, J. F., Green, A. C., MacLennan, R., Youl, P., & Martin, N. G. (1996). The Queensland familial melanoma project: Study design and characteristics of participants. Melanoma Research, 6, 155–165.

Allison, P. D. (1995). Survival analysis using the SAS system: A practical guide. Cary, NC: SAS Institute Inc.

Andersen, P. K., Borgan, O., Gill, R. D., & Keiding, N. (1993). Statistical models based on counting processes. New York: Springer-Verlag.

Bernardo, J. M., Berger, J. O., Dawid, A. P., & Smith, A. F. M. (Eds.). (1992). Bayesian statistics 4. Oxford: Clarendon Press.

Bonney, G. E. (1986). Regressive logistic model for familial disease and other binary traits. Biometrics, 42, 611–625.

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Wadsworth, California.

Cannings, C., & Thompson, E. A. (1977). Ascertainment in the sequential sampling of pedigrees. Clinical Genetics, 12, 208–212.

Clayton, D. G. (1978). A model for association in bivariate life-tables and its application in epidemiological studies of chronic disease incidence. Biometrika, 65, 141–151.

Clayton, D. G., & Cuzick, J. (1985). Multivariate generalizations of the proportional hazards model (with discussion). Journal of the Royal Statistical Society, Series A, 148, 82–117.

Cox, D. R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34, 187–202.

Do, K.-A., Broom, B. M., Kuhnert, P., Duffy, D. L., Todorov, A. A., Treloar, S. A., & Martin, N. G. (2000). Genetic analysis of the age at menopause by using estimating equations and Bayesian random effects models. Statistics in Medicine, 19, 1217–1235.

English, D. R., Armstrong, B. K., Kricker, A., et al. (1997). Sunlight and cancer. Cancer Causes Control, 8, 271–283.

Falconer, D. S. (1990). Introduction to quantitative genetics (3rd ed.). New York: Longman Group Ltd.

Faucett, C., Gauderman, W. J., Thomas, D., Ziogas, A., & Sobel, E. (1993). Combined segregation and linkage analysis of late-onset Alzheimer's disease in Duke families using Gibbs sampling. Genetic Epidemiology, 10, 489–494.

GAP. (1996). Genetic Analysis Package, Release 1.0. Pasadena, CA: Epicenter Software.


Gauderman, W. J., Morrison, J. L., Carpenter, C. L., & Thomas, D. C. (1997). Analysis of gene-smoking interaction in lung cancer. Genetic Epidemiology, 14, 199–214.

Gauderman, W. J., & Thomas, D. C. (1994). Censored survival models for genetic epidemiology: A Gibbs sampling approach. Genetic Epidemiology, 11, 171–188.

Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7, 457–511.

Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intelligence, 6, 721–741.

Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4. Oxford: Clarendon Press.

Gilks, W. R., Thomas, A., & Spiegelhalter, D. J. (1994). A language and program for complex Bayesian modeling. The Statistician, 43, 169–178.

Guo, S. W., & Thompson, E. A. (1991). Monte Carlo EM for the estimation of multiple random effects models on large pedigrees. IMA Journal of Mathematics Applied in Medicine and Biology, 8, 171–189.

Guo, S. W., & Thompson, E. A. (1992). A Monte Carlo method for combined segregation and linkage analysis. American Journal of Human Genetics, 51, 1111–1126.

Guo, S. W., & Thompson, E. A. (1994). Monte Carlo estimation of mixed models for large complex pedigrees. Biometrics, 50, 417–432.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97–109.

Heath, A. C. (1997). Markov Chain Monte Carlo segregation and linkage analysis for oligogenic models. American Journal of Human Genetics, 61, 748–760.

Hougaard, P. (1986). A class of multivariate failure time distributions. Biometrika, 73, 671–678.

Hsu, L., & Prentice, R. L. (1996). On assessing the strength of dependency between failure time variates. Biometrika, 83, 491–506.

Hsu, L., & Zhao, L. P. (1996). Assessing familial aggregation of age at onset by using estimating equations, with application to breast cancer. American Journal of Human Genetics, 58, 1057–1071.

Kempthorne, O. (1960). Biometrical genetics. New York: Pergamon Press.

Kendall, M., & Stuart, A. (1979). The advanced theory of statistics (Vol. 2, 4th ed.). Oxford: Oxford University Press.

Kong, A., Frigge, M., Cox, N., & Wong, W. H. (1992). Linkage analysis with adjustments for covariates: A method combining peeling with Gibbs sampling. Cytogenetics and Cell Genetics, 59, 208–210.

Kuhnert, P., & Do, K.-A. (2003). Fitting genetic models to twin data with binary and ordered categorical responses: A comparison of structural equation modeling and Bayesian hierarchical models. Behavior Genetics, 33, 439–452.

Lange, K., & Sobel, E. (1991). A random walk method for computing genetic location scores. American Journal of Human Genetics, 49, 1320–1334.

LeBlanc, M., & Crowley, J. (1992). Relative risk trees for censored survival data. Biometrics, 48, 411–425.

Li, H., & Thompson, E. A. (1997). Semiparametric estimation of major gene and random environmental effects for age of onset. Biometrics, 53, 282–293.

Li, H., & Wijsman, E. A. (1998). Semiparametric estimation of major gene effects for age of onset. Genetic Epidemiology, 15, 279–298.

MacLennan, R., Green, A. C., McLeod, G. R., & Martin, N. G. (1992). Increasing incidence of cutaneous melanoma in Queensland, Australia. Journal of the National Cancer Institute, 84, 1424–1432.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087–1092.

Nielsen, G. G., Gill, R. D., Andersen, P. K., & Sørensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics, 19, 25–43.

Oakes, D. (1989). Bivariate survival models induced by frailties. Journal of the American Statistical Association, 84, 487–493.

Raftery, A. E., & Lewis, S. (1992a). Comment: One long run with diagnostics: Implementation strategies for Markov Chain Monte Carlo. Statistical Science, 7, 493–497.

Raftery, A. E., & Lewis, S. (1992b). How many iterations in the Gibbs sampler? In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4 (pp. 763–774). Oxford: Clarendon Press.

SAGE. (1992). Statistical Analysis for Genetic Epidemiology, Release 2.1. Cleveland, OH: Department of Epidemiology and Biostatistics, Rammelkamp Center for Education and Research, MetroHealth Campus, Case Western Reserve University.

Scurrah, K. J., Palmer, L. J., & Burton, P. R. (2000). Variance components analysis for pedigree-based censored survival data using generalized linear mixed models (GLMMs) and Gibbs sampling in BUGS. Genetic Epidemiology, 19, 127–148.


Siegmund, K. D., & McKnight, B. (1998). Modeling hazard functions in families. Genetic Epidemiology, 15, 147–171.

Siskind, V., Aitken, J., Green, A., & Martin, N. (2002). Sun exposure and interaction with family history in risk of melanoma, Queensland, Australia. International Journal of Cancer, 97, 90–95.

Spiegelhalter, D. J., Thomas, A., & Best, N. G. (1996a). Computation on Bayesian graphical models (pp. 407–425). Oxford: Oxford University Press.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Gilks, W. (1996b). BUGS 0.6: Bayesian Inference Using Gibbs Sampling manual. Cambridge: MRC Biostatistics Unit.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Lunn, D. (2003). WinBUGS (Bayesian Inference Using Gibbs Sampling), Windows version 1.4 user manual. Cambridge: MRC Biostatistics Unit.

Swerdlow, A., & Green, A. (1987). Melanocytic naevi and melanoma: An epidemiological perspective. British Journal of Dermatology, 117, 137–146.

Therneau, T. M., & Atkinson, E. (1997). An introduction to recursive partitioning using the RPART routines (Technical report). Rochester: Mayo Foundation.

Thomas, D. (1992). Fitting genetic data using Gibbs sampling: An application to nevus counts in 38 Utah kindreds. Cytogenetics and Cell Genetics, 59, 228–230.

Thompson, E. A. (1994). Monte Carlo programs for pedigree analysis (Technical Report 267). Washington: Dept. of Statistics, University of Washington.

Vaupel, J. W., Manton, K. G., & Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16, 439–454.


Appendix A: A BUGS Program to Implement the Bayesian MCMC Approach in Modeling an ACE Model for Age at Onset of Melanoma with Covariate Effects and Random Genetic/Environmental Effects


reported to the Queensland Cancer Registry or found by comparing cancer registrations for 1984 and 1987 with records of pathology laboratories throughout Queensland. It is estimated that registry records are approximately 95% complete for the study period. Doctors' permission was obtained to approach 10,407 cases, of whom 7784 (75%) returned a brief family history questionnaire stating whether any of their first-degree relatives (parents, siblings, children) had had a diagnosis of melanoma. A total of 2920 probands was sampled from these respondents, including all who had claimed a positive family history (n = 1529) and an approximate 20% random sample of the remainder (n = 1391). Probands were sent a detailed family history questionnaire asking for the names and addresses of all first-degree relatives, relatives' vital status, dates of birth and ages, and whether any of these relatives had had a melanoma diagnosed by a doctor. To avoid bias in determining the mode of inheritance, second and lower degree relatives were enrolled in the study according to a sequential sampling scheme (Cannings & Thompson, 1977). First-degree relatives of all relatives with confirmed melanoma were ascertained through the detailed family history questionnaire described above, which was mailed to all confirmed positive relatives. In total, 15,989 relatives belonging to 1912 separate families were reported by 2118 (73%) probands or other positive relatives. A total of 1044 relatives for whom date of birth was unknown were excluded, leaving 14,945 relatives for analysis. There were 188 families independently ascertained through two or more probands. To avoid ascertainment bias, these families were included in the dataset separately for each proband in the family.

Medical confirmation and dates of diagnosis were sought for the relatives reported by probands or other relatives to have had melanoma. After eliminating 187 of subjects who refused access to their medical records, or those with lost records, or those with false-positive reports (basal or squamous cell carcinoma, solar keratoses or benign naevi), medical confirmation of melanoma as the diagnosis was obtained for 482 of the original number of relatives. Only the medically verified cases among relatives were classified as true events; all other relatives were treated as unaffected (censored at last date of contact).

Risk factor questionnaires were subsequently mailed to all living relatives aged between 18 and 75 years ascertained through the sequential sampling procedure. Other relatives provided proxy reports. Combining proxy reports and self-reports, standard risk factor information was available for 9746 relatives.

For the Bayesian analysis, we focused on families that included at least one parent and at least one child, where each member of the family had information on age at diagnosis or age at last follow-up and at most one missing covariate.

The demographic covariates and hypothesized melanoma risk factors included gender, birth year, place of birth, ability to tan (very brown, moderate tan, slight tan, no tan), propensity to burn (never burn and always tan, sometimes burn and usually tan, usually burn and sometimes tan, always burn and never tan), number of sunburns (0, 1, 2–5, > 6), skin color (olive/dark, medium, fair/pale), hair color (black, light/dark brown, fair/blonde, light/dark red), eye color (brown, green/hazel, blue/grey), total freckling in summer (0, 1–100, > 100), number of naevi (none, few, moderate number, very many), and numerous measures of cumulative lifetime exposures to sun and ultraviolet rays.

Preliminary Exploratory Analysis

As a preliminary analysis, we ignored correlations within families and applied a combination of parametric and non-parametric survival analysis techniques as exploratory tools to identify possible risk factors for melanoma. Once these fixed effects were identified, we considered incorporating them into a subsequent generalized estimating equation model or a Bayesian model with random effects that could account for within-family correlations. The aim was to quantify the genetic and familial associations in the presence of observed covariate effects.

Manipulation of the entire melanoma dataset resulted in a subset of 9669 observations with a range of explanatory variables that described phenotypic characteristics for each individual, along with some demographic details such as birth year and gender. The response variable was the time to diagnosis (or age at the last follow-up), with the proportion of censored cases being approximately 76%. The median age at onset of melanoma was 43. The correlation estimates for age of onset for different relationship pairs with both members affected were 0.67, 0.55, and 0.39 for sib–sib, parent–child and second/lower order pairs, respectively.

The first stage of modelling involved fitting univariable proportional hazards models to assess each variable's individual effect on the time of onset of melanoma. The SAS package (Allison, 1995) was used to fit proportional hazards models of the following form:

$$h(t; x) = \Psi(x, \beta)\, h_0(t),$$

where $\Psi$ represents a log-linear function $e^{\beta' x}$ of the explanatory variables $x$ and corresponding coefficients $\beta$, and $h_0(t)$ represents the baseline hazard at time $t$.
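To make the log-linear form concrete, the short Python sketch below evaluates the relative risk $e^{\beta' x}$ for a given covariate pattern; the baseline hazard $h_0(t)$ cancels when comparing to the reference group. The function name and covariate labels are hypothetical, and the two coefficients (0.31 for blue/grey eyes, 1.12 for many moles) are taken from Table 1 purely as an illustration.

```python
import math

def relative_risk(coefficients, covariates):
    """exp(beta' x): hazard relative to the baseline group at any time t."""
    linear_predictor = sum(coefficients[name] * value
                           for name, value in covariates.items())
    return math.exp(linear_predictor)

# Illustrative coefficients from Table 1 (indicator coding assumed).
beta = {"eyes_blue_grey": 0.31, "moles_many": 1.12}

rr_eyes = relative_risk(beta, {"eyes_blue_grey": 1, "moles_many": 0})
rr_both = relative_risk(beta, {"eyes_blue_grey": 1, "moles_many": 1})
print(round(rr_eyes, 2), round(rr_both, 2))
```

Because the model is multiplicative on the hazard scale, the relative risk for a person with both characteristics is the product of the two individual relative risks.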

Significant variables associated with the age at onset of melanoma consisted of eye, hair and skin colour, freckling, number of moles, skin type, ability to burn, ability to tan, previous skin cancers, ultraviolet exposure between the ages of 5 and 12 years, cumulative sun exposure up to the age of 19 years, and birth year.

The second stage fitted multivariable proportional hazards models to those explanatory variables that were significant at the univariable stage. Table 1 displays the results from the final model, which only included significant variables (p-value < .05). It is worth noting that a similar result could be obtained using an automated stepwise procedure.

Results from this analysis highlight some interesting but quite obvious risk factors noted in previous analyses (Aitken et al., 1996, 1998). For example:

• An increase of 1 year in birth year induces a 17% increase in the risk of earlier melanoma onset.

• People with neither freckles nor naevi have the lowest risk of melanoma onset.

• The risk of earlier melanoma onset is increased by up to 37% for blue-eyed people, and even further (46%) for green-eyed people, when compared to individuals with brown eyes.

• "Red heads" have an increased risk of earlier melanoma onset (46%) when compared to individuals with black hair. However, no significant increase was noted for individuals with fair or light red hair.

• A person's tendency to burn easily increases the risk of earlier melanoma onset, in some cases by up to 100% compared to those who never burn.

However striking this last statement is, issues of confounding must also be considered. The most obvious illustration of this is the confounding that occurred between mole count and freckling. This is seen through close inspection of the parameter estimates, which changed in magnitude when mole count was added to the model after adjusting for freckling (results not displayed here).

Table 1
Results from Fitting a Multivariable Proportional Hazards Model to the Melanoma Data Based on Univariable Results. The Results Reported in this Table Are the Parameter Estimates $\beta$, Their Standard Errors se($\beta$), the Relative Risk $e^{\beta}$, and the p-value for Each Estimate

| Variable | $\beta$ | se($\beta$) | $e^{\beta}$ | p-value |
|---|---|---|---|---|
| Birth Year | 0.16 | < 0.01 | 1.17 | < .05 |
| Eye Colour (Baseline: Brown) | | | | |
| Blue/Grey | 0.31 | 0.07 | 1.36 | < .05 |
| Green/Hazel | 0.38 | 0.07 | 1.46 | < .05 |
| Hair Colour (Baseline: Black) | | | | |
| Light Red/Ginger | 0.17 | 0.15 | 1.19 | .27 |
| Dark Red/Auburn | 0.38 | 0.15 | 1.46 | < .05 |
| Fair/Blonde | 0.06 | 0.12 | 1.06 | .62 |
| Light Brown | 0.14 | 0.12 | 1.15 | .22 |
| Dark Brown | 0.02 | 0.12 | 1.02 | .87 |
| Skin Type (Baseline: never burn) | | | | |
| Always burn | 0.69 | 0.16 | 1.968 | < .05 |
| Usually burn | 0.45 | 0.15 | 1.57 | < .05 |
| Sometimes burn | 0.31 | 0.15 | 1.36 | < .05 |
| Freckling (Baseline: none) | | | | |
| 1 to 100 | 0.17 | 0.06 | 1.18 | < .05 |
| > 100 | 0.09 | 0.08 | 1.10 | .23 |
| Mole Count (Baseline: none) | | | | |
| Few | 0.29 | 0.07 | 1.34 | < .05 |
| Moderate | 0.79 | 0.08 | 2.20 | < .05 |
| Many | 1.12 | 0.10 | 3.08 | < .05 |
| Number of Sunburns (Baseline: none) | | | | |
| One | −0.07 | 0.11 | 0.93 | .49 |
| 2 to 5 | −0.06 | 0.09 | 0.94 | .50 |
| > 6 | 0.17 | 0.09 | 1.19 | .07 |
| Cumulative Sun Exposure (< 5 yrs) | 0.04 | 0.01 | 1.04 | < .05 |
| UV Exposure (5–12 yrs) | 0.0003 | < 0.0001 | 1 | < .05 |

To reduce the dimension of the problem further and avoid some of these confounding issues, a survival tree was constructed using RPART (Recursive Partitioning and Regression Trees; see Therneau and Atkinson, 1997). Survival trees are a special case of decision trees that were incorporated into RPART by Therneau (1997) using the ideas put forth by LeBlanc and Crowley (1992) for survival data with censoring. These authors showed through simulation studies that survival trees could outperform standard parametric methods such as proportional hazards modelling, particularly in situations where the underlying distribution was not exponential. Along with Breiman et al. (1984), they showed decision trees to be useful exploratory tools for identifying important variables, interactions and outliers.

The methodology for survival trees begins by identifying optimal splits, using the log-rank statistic to separate the data into homogeneous groups. Each split comprises a parent node and two daughter nodes, which are linked to the parent by branches. Figure 1 is an example of a survival tree produced on the melanoma data. Once a large tree is grown, with many terminal nodes that contain very few observations, a pruning procedure is introduced to identify a sequence of sub-trees. The technique of pruning involves snipping back splits of the tree one at a time until only the root node remains. Cross-validation is then introduced to aid in the selection of the optimal model. One nice feature of survival trees is the use of Kaplan-Meier curves to provide information about survival rates at each terminal node of the tree. This approach may allow for better interpretation of the terminal nodes and highlight different scenarios which yield similar survival rates, as can be seen in Figure 1.
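The Kaplan-Meier curve shown at each terminal node can be computed with the standard product-limit estimator. The Python sketch below is a minimal illustration of that estimator, not the RPART implementation; the function name `kaplan_meier` and the small data set are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (event time, survival probability).

    times  -- age at onset or at censoring for each subject
    events -- 1 if onset was observed, 0 if the subject was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Gather all subjects whose time equals t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_removed  # events and censored subjects leave the risk set
    return curve

# Hypothetical ages: onset at 30, 45 and 50; one subject censored at 40.
curve = kaplan_meier([30, 40, 45, 50], [1, 0, 1, 1])
print(curve)
```

Censored subjects contribute to the risk set up to their censoring age but never trigger a drop in the curve, which is how the heavy censoring in family data of this kind is accommodated.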

The variables identified from the multivariable model were used as input into RPART. Variables such as the sun and UV exposures were omitted from the modelling stage since nearly half of the data for each of these variables were missing (cumulative sun exposure: 43.8%; UV exposure: 40.4%). A large survival tree was produced in S-PLUS and, using cross-validation, a model splitting solely on birth year was identified. This survival tree yielded the minimum cross-validated error rate (0.93), suggesting that birth year was an important indicator for melanoma. Selecting a slightly larger but more informative model (error = 0.94) resulted in a survival tree consisting of nine terminal nodes. This revealed splits on birth year, mole count and freckling. All other variables appeared to be either competing at each node or acting as a surrogate variable for one of these primary splits. Skin type in particular arose as an important surrogate for freckling.

In Figure 1, we aim to provide a more visual interpretation of the results from the survival analysis. Each split is shown at the top of each node and can be assumed to be the split that directs observations towards the left side of the tree. For example, after splitting on birth year at 1933, a split to the left corresponds to an individual with few moles. A split to the right indicates an individual with many moles, and so on. The terminal nodes in this figure are displayed as Kaplan-Meier plots showing the survival curves along with the number of observations, n, and the risk ratio calculated with reference to the baseline group of 9669 individuals. The reference survival curve is shown at the root node along with a bar which shows the percentage of events in the entire dataset. Grids are placed on each survival plot at equal intervals of 20 years of age. These grids have been plotted to simplify the interpretation. An indicator to the left of the plot displays the risk ratio. Ratios above one are shown by an "up arrow", while a decrease in risk is illustrated by a "down arrow". From this model we can see that there are a few scenarios that indicate high risk for earlier onset of melanoma. These scenarios may be described as:

individuals born after 1966 with many moles (RR = 113)

individuals born between 1947 and 1966 withmany moles (RR = 45)

individuals born between 1933 and 1953 with fewmoles but many freckles (RR = 24)

individuals born between 1923 and 1933 withmany moles (RR = 141)

It is obvious from these results that birth year has a substantial impact on the age at onset of melanoma. Once this is taken into account, mole count and freckling provide only a small contribution to the risk.

Family History of Melanoma

As described earlier in the Data section, family history was collected regarding first-degree (siblings and parents) and second-degree relatives. From this pedigree structure, other higher-order types of relative pairs could also be formed. Some of these relatives were diseased with melanoma, resulting in pairs of relatives who may both be diseased (++), both not diseased (−−), or one diseased while the other was not (+−). Table 2 lists the concordant and discordant pairs of specific relationships: sib–sib, parent–child, and second-degree/lower order relative pairs. The second-degree relative pairs include grandparent–grandchild and aunt–niece, while the lower order relative pairs include the in-law pairs. From Table 2, the percentages of both-diseased pairs are 0.6%, 0.9% and 0.4% among sib–sib, parent–child and second/lower order pairs, respectively. Crude estimates of correlation coefficients can be calculated from these percentages without accounting for ages at onset among these relatives. However, the risk of developing melanoma may depend on the subject's age. Hence, adjusting for the age at onset is essential in quantifying the correlation of age at onset between pairs of relatives. In a subsequent section, we describe how this can be done rigorously via the generalized estimating equations approach. On average, melanomas were diagnosed slightly earlier in relatives (47.5 years) than in probands (50.2 years). Among relatives, melanomas were diagnosed at younger ages in later generations.

103Twin Research February 2004

Analysis of Melanoma Onset

Figure 1

Survival tree for the melanoma data, consisting of 9669 individuals from 1912 Queensland families. Terminal nodes are rectangular and display the Kaplan-Meier curve for the corresponding subgroup. Baseline: RR = 1.00.

104 Twin Research February 2004

Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin

To account for the different ages at censoring in each generation due to termination of the study or death from causes other than melanoma, we examined the disease-free survival distribution for each generation using the standard failure-time analysis technique (Figure 2). The median age at diagnosis of melanoma was 64 among parents of probands, 50 among siblings of probands, and 33 among children of probands. The disease-free survival function of the children's generation differed significantly from those of earlier generations (log-rank test, p < .01), but there was no significant difference in age at onset of melanoma between the siblings and parents of probands (log-rank test, p > .9).
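As an illustration of the failure-time technique behind Figure 2, the sketch below computes a Kaplan-Meier disease-free survival curve and reads off the median age at diagnosis. The ages and event indicators are hypothetical toy values, not study data, and the function names are illustrative.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (t, S(t)) at each distinct event time.

    times  -- age at diagnosis or at censoring
    events -- 1 if melanoma was diagnosed at that age, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        diagnosed = 0
        removed = 0
        # Gather everyone whose follow-up ends at age t.
        while i < len(data) and data[i][0] == t:
            diagnosed += data[i][1]
            removed += 1
            i += 1
        if diagnosed > 0:
            surv *= 1.0 - diagnosed / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def median_age(curve):
    """Smallest age at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached within follow-up

# Hypothetical toy data: eight relatives, two of them censored.
ages = [30, 35, 40, 45, 50, 55, 60, 65]
events = [1, 0, 1, 1, 0, 1, 1, 1]
km = kaplan_meier(ages, events)
km_median = median_age(km)
```

Censored relatives leave the risk set without lowering the curve, which is how the method accounts for the different ages at censoring in each generation.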

Preliminary Segregation Analysis

A preliminary segregation analysis was conducted (Aitken et al., 1998) to investigate whether the familial clustering of cutaneous melanoma is consistent with Mendelian inheritance of a major autosomal gene. Analyses were performed with the SAGE (1992) statistical package using the maximum likelihood REGTL program for a binary trait with a variable age of onset. The hypothesis of co-dominant Mendelian inheritance gave a significantly better fit to the data than either dominant or recessive Mendelian inheritance. Overall, both Mendelian inheritance of a single major gene and purely environmental transmission were rejected. However, there was strong evidence of familial dependence in melanoma occurrence.

The inclusion of risk factors in the models may reveal whether all or a combination of these explains the familial dependence that was demonstrated by the segregation analysis. If a major gene exists that operates independently of these covariates, including them in the models may reduce residual variation and increase the power of the analysis to detect a major gene effect.

Table 2

Concordant and Discordant Pairs of Relatives in 1912 Families from the Queensland Familial Melanoma Project. Probands Were Not Included for the Calculation of Concordance.

         Sib–sib   Parent–child   Second/Others   Total
++       49        15             41              105
+−       763       536            1200            2499
−−       7011      1078           9817            17906
Total    7823      1629           11058           20510

Figure 2

Cumulative melanoma-free survival among first-degree relatives of confirmed cutaneous melanoma cases (probands) diagnosed in Queensland, Australia, 1982–1987, according to the relative's relationship to the proband.

Estimating Equations Approach

Let $y = (\delta_{ki}, t_{ki}, Z_{ki})$ denote the data collected for the $i$th member in the $k$th family ($k = 1, \ldots, K$ and $i = 1, 2$), where $\delta_{ki} = 0$ if the observation is censored, $t_{ki}$ is either the recorded age at diagnosis of melanoma or the age at the most recent follow-up for unaffected people, and $Z_{ki}$ is a vector of measured covariates. We assume that censoring time, age at diagnosis of melanoma, and the covariates are independently distributed. These assumptions can be relaxed in more general models, subject to identification constraints. The hazard rate for melanoma is the instantaneous probability that melanoma is diagnosed immediately after time $t$, given that the person is unaffected at time $t$. The hazard rate under the Cox proportional hazards model (Cox, 1972) is given by

$$\lambda(t_{ki}) = \lambda_0(t_{ki}) \exp(\beta' Z_{ki})$$

where $\lambda_0(\cdot)$ is the baseline hazard function and $\beta$ is a vector of regression coefficients.
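Under this model the baseline hazard cancels when two covariate profiles are compared, so the relative risk is simply the exponential of the difference in linear predictors. A minimal sketch follows, using the naevi and freckling coefficients as read from Table 3 (0.568 and 0.148); the function name and the dictionary-based covariate coding are illustrative assumptions.

```python
import math

def relative_risk(beta, z_a, z_b):
    """Hazard ratio lambda(t | z_a) / lambda(t | z_b) under proportional hazards.

    The baseline hazard lambda_0(t) cancels, so the ratio is constant in t.
    """
    return math.exp(sum(beta[name] * (z_a[name] - z_b[name]) for name in beta))

# Coefficients as read from Table 3 (binary covariates only).
beta = {"naevi": 0.568, "freckling": 0.148}

baseline = {"naevi": 0, "freckling": 0}
rr_naevi = relative_risk(beta, {"naevi": 1, "freckling": 0}, baseline)
rr_freckling = relative_risk(beta, {"naevi": 0, "freckling": 1}, baseline)
rr_both = relative_risk(beta, {"naevi": 1, "freckling": 1}, baseline)

print(f"naevi: {rr_naevi:.3f}, freckling: {rr_freckling:.3f}, both: {rr_both:.3f}")
```

Because the model is multiplicative on the hazard scale, the relative risk for a person with both risk factors is the product of the two single-factor relative risks.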

For a specific pair of relatives, we follow Clayton (1978) in modeling the bivariate survivor function

$$F(t_{k1}, t_{k2}) = \left( F_1(t_{k1})^{-\theta} + F_2(t_{k2})^{-\theta} - 1 \right)^{-1/\theta}$$

where $F_1$ and $F_2$ are univariate survivor functions and $\theta$ is a scalar parameter that measures the degree of dependence between the relatives' times at onset, independence being implied by $\theta = 0$ and positive association by $\theta > 0$. The Clayton model allows negative dependencies and has the property that failure times are absolutely continuous for $\theta > -0.5$. In addition, the cross-ratio (or odds-ratio) function, as studied by Oakes (1989), is

$$c(t_{k1}, t_{k2}) = \frac{\lambda(t_{k1} \mid T_{k2} = t_{k2})}{\lambda(t_{k1} \mid T_{k2} \ge t_{k2})} = 1 + \theta$$

This is equivalent to assuming that the odds ratio is invariant over the grid region that supports the data. Heuristically, the parameter $1 + \theta$ is an odds ratio that depends on the degree of dependence between the onset ages of the two relatives. If genetic factors do influence the age at onset of melanoma, we would expect to see a higher concordance in the age of onset in first-degree relatives, who on average share half their genes in common, in comparison to second-degree relatives. Under the current model, this translates as $\theta_{\text{first-degree}} > \theta_{\text{second-degree}}$.
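The constant cross-ratio property can be checked numerically. The sketch below evaluates the Clayton survivor function on the scale of the marginal survivor probabilities $u = F_1(t_{k1})$ and $v = F_2(t_{k2})$ and approximates the cross-ratio $F \, F_{uv} / (F_u F_v)$ by central differences. By the chain rule this equals the cross-ratio in the original time scale. The function names, step size and test points are illustrative assumptions.

```python
def clayton_survivor(u, v, theta):
    """Joint survivor probability from marginal survivor probabilities u and v."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def cross_ratio(u, v, theta, h=1e-4):
    """Numerical cross-ratio F * F_uv / (F_u * F_v) via central differences."""
    def f(a, b):
        return clayton_survivor(a, b, theta)
    f_u = (f(u + h, v) - f(u - h, v)) / (2 * h)
    f_v = (f(u, v + h) - f(u, v - h)) / (2 * h)
    f_uv = (f(u + h, v + h) - f(u + h, v - h)
            - f(u - h, v + h) + f(u - h, v - h)) / (4 * h * h)
    return f(u, v) * f_uv / (f_u * f_v)

theta = 1.973  # illustrative value, giving an odds ratio 1 + theta = 2.973
ratio_a = cross_ratio(0.7, 0.5, theta)
ratio_b = cross_ratio(0.9, 0.3, theta)
near_independence = clayton_survivor(0.7, 0.5, 1e-6)  # close to 0.7 * 0.5

print(ratio_a, ratio_b, near_independence)
```

The two evaluation points give the same ratio up to numerical error, and the joint survivor probability approaches the product of the marginals as $\theta$ approaches zero.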

We may use a standard method to estimate within-pair correlations for 2 × 2 tables from odds ratios. Estimates of the relation-pair correlations $\rho_{\text{sib–sib}}$, $\rho_{\text{parent–child}}$ and $\rho_{\text{second-order}}$ are recovered by using relationships such as $r_{\text{sib–sib}} = \min(1, \ln(1 + \theta_{\text{sib–sib}}))$. Testing for the presence of genetic factors underlying the age at diagnosis of melanoma is equivalent to testing $H_0: \rho_{\text{first-order}} = \rho_{\text{second-order}}$. We may test this hypothesis using a z-transform (Kendall & Stuart, 1979, p. 315) of the point estimates of the correlation coefficients. Let $n_1$ and $n_2$ denote the numbers of first-order and second-order relatives, and let $z_1$ and $z_2$ denote the transformed statistics of $r_1$ and $r_2$, the correlation estimates for first-order and second-order relative pairs, respectively. Specifically, we reject $H_0$ when $E/D > Z_{1-\alpha}$, where

$$E = E(z_1 - z_2) = \frac{1}{2} \log \left[ \frac{(1 + r_1)/(1 - r_1)}{(1 + r_2)/(1 - r_2)} \right]$$

$$D^2 = V(z_1 - z_2) = \frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}$$

and $Z_{1-\alpha}$ is the standard normal deviate corresponding to the one-sided $\alpha$ significance level.
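The test can be sketched in a few lines. The correlations and sample sizes below are hypothetical values chosen only to exercise the formulas, and the function names are illustrative. The transform requires correlations strictly inside (−1, 1), so an estimate truncated at 1 would need separate handling.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    """z-transform of a correlation coefficient, defined for -1 < r < 1."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

def compare_correlations(r1, n1, r2, n2, alpha=0.05):
    """One-sided test of H0: rho1 = rho2 against rho1 > rho2.

    Returns the statistic E / D, the critical value Z_{1-alpha},
    and whether H0 is rejected.
    """
    e = fisher_z(r1) - fisher_z(r2)
    d = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    critical = NormalDist().inv_cdf(1.0 - alpha)
    statistic = e / d
    return statistic, critical, statistic > critical

# Hypothetical inputs: first-order pairs (r1, n1), second-order pairs (r2, n2).
statistic, critical, reject = compare_correlations(0.30, 500, 0.10, 800)
print(f"E/D = {statistic:.3f}, critical value = {critical:.3f}, reject H0: {reject}")
```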

This approach has the advantage of providing a test for the presence of genetic effects through a single parameter ($\theta$). However, it is limited in its ability to attribute the phenotypic variance to specific effects (e.g., additive gene action).

Mathematical details of the GEE model and the iterative procedure to estimate the regression coefficients $\beta$ and specific degrees of dependence $\theta$ for the different types of relative pairs have been summarized previously in Do et al. (2000).

MCMC Analysis Using BUGS

The Bayesian Paradigm and Gibbs Sampling

Markov Chain Monte Carlo (MCMC) is an alternative Bayesian approach that provides estimates of likelihoods and associated parameter values when exact computation is infeasible (Hastings, 1970; Metropolis et al., 1953). MCMC methods can be used to draw samples from the underlying joint distribution of major genotypes and polygenic values, conditional on the observed data. From these samples, desired parameters and likelihoods can be estimated without the need to resort to exact computation. MCMC methods have been used for linkage analysis (Kong et al., 1992; Lange & Sobel, 1991), for estimation of parameters in the mixed model with and without covariates (Guo & Thompson, 1991; Thomas, 1992), for estimation of gene–smoking interaction and covariate imputation (Gauderman et al., 1997), for performing combined linkage and segregation analysis (Faucett et al., 1993; Guo & Thompson, 1992), and for mixed models of large complex pedigrees (Guo & Thompson, 1994).

In a general setting, let $x$ be the observed data and $\theta$ be everything not observed, including parameters and latent variables. The implementation of Bayesian methods using realistic models and priors is computer-intensive and relies heavily on cunning computational tools to approximate integrals. The problem, in general terms, is to obtain the expected value of a function of interest $s(\cdot)$ under the posterior density $p(\theta \mid x)$,

$$E[s(\theta)] = \frac{\int_{\Theta} s(\theta)\, p(\theta)\, p(x \mid \theta)\, d\theta}{\int_{\Theta} p(\theta)\, p(x \mid \theta)\, d\theta}$$

which cannot generally be found analytically. One method to carry out the integration on the right-hand side is to simulate from the exact Bayesian posterior distribution using Markov chain Monte Carlo techniques such as Gibbs sampling. The Gibbs sampler (Geman & Geman, 1984) is the most popular algorithm used in MCMC applications to correlated data. Gibbs sampling was introduced to the main statistical community by Gelfand and Smith (1990) and has since been applied to an even wider array of problems. The Gibbs sampler is easy to implement because it only depends on the local neighborhood structure. In the context of pedigree analysis (Olshen & Wijsman, 1996), the basic procedure is a sequential updating of missing and latent data, including the underlying and unobserved major genotypes, polygenic effects and environmental effects. Values for the missing or latent data are sampled from the local conditional distribution, a function of the observed individual data, the current sampled values of other missing/latent data for this particular individual, such as polygenic and environmental effects, and the values for the sampled genetic effects in the immediate neighbors of an individual. Gibbs sampling basically consists of three main steps:

Step 1. Set initial values for unobserved quantities (parameters and latent variables).

Step 2. For each parameter or latent variable $\theta_j$, sample from its "full conditional distribution" given the current values of all other quantities in the model.

Step 3. Examine sampled values of parameters and latent variables to monitor convergence and to provide summary measures.
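The three steps can be illustrated on a deliberately simple problem, far smaller than the pedigree model: normally distributed data with unknown mean and precision, a vague Normal prior on the mean and a vague Gamma prior on the precision. The data values, prior settings and function name below are hypothetical, and the sketch is written in Python rather than BUGS.

```python
import random

def gibbs_normal(y, n_iter=6000, burn_in=1000, seed=0):
    """Toy Gibbs sampler for y_i ~ Normal(mu, 1 / tau)."""
    rng = random.Random(seed)
    n, total = len(y), sum(y)
    mu0, tau0 = 0.0, 1e-6   # vague Normal prior on mu (mean, precision)
    a, b = 0.001, 0.001     # vague Gamma prior on tau (shape, rate)

    # Step 1: initial values for the unobserved quantities.
    mu, tau = 0.0, 1.0
    draws = []
    for it in range(n_iter):
        # Step 2a: sample mu from its full conditional given tau.
        precision = n * tau + tau0
        mean = (tau * total + tau0 * mu0) / precision
        mu = rng.gauss(mean, precision ** -0.5)
        # Step 2b: sample tau from its full conditional given mu.
        ss = sum((v - mu) ** 2 for v in y)
        # random.gammavariate takes (shape, scale); scale = 1 / rate.
        tau = rng.gammavariate(a + n / 2.0, 1.0 / (b + ss / 2.0))
        if it >= burn_in:
            draws.append((mu, tau))
    return draws

# Hypothetical data.
y = [4.1, 5.3, 6.2, 4.8, 5.9, 5.1, 4.4, 5.6, 6.0, 4.9]
draws = gibbs_normal(y)

# Step 3: summarize the retained draws.
posterior_mean_mu = sum(d[0] for d in draws) / len(draws)
print(f"posterior mean of mu: {posterior_mean_mu:.2f}")
```

With priors this vague, the posterior mean of the mean parameter should sit close to the sample mean of the data, which gives a quick sanity check on the sampler.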

Some of the most recent and popular packages that implement Gibbs sampling for analysis of pedigree data include BUGS (Gilks et al., 1994; Spiegelhalter et al., 1996a, 1996b), Genetic Analysis Package (GAP, 1996) and MIXD (Olshen & Wijsman, 1996; Thompson, 1994). We have used BUGS mainly because of its flexibility in programming hierarchical models, besides being a freeware product.

The Model

The aim here is to model the correlation structure within the family to satisfy the fundamental additive genetic model (Crow & Kimura, 1970; Falconer, 1990; Kempthorne, 1960) as follows. Consider a nuclear family structure consisting of four members, father, mother and two children, denoted by F, M, S1, S2, respectively. Using similar notation to Burton et al. (1999), a conventional mixed linear model consisting of fixed and random effects may be written in the form

$$Q_{ij} = \beta^{T} z_{ij} + A_{ij} + C_{ij} + C^{S}_{ij} + E_{ij} \qquad (1)$$

where $Q_{ij}$ is the observed value of a normally distributed continuous trait for the $j$th individual in the $i$th nuclear family, $z_{ij}$ is a vector of observed covariates representing fixed effects, and $\beta$ is a corresponding vector of unknown fixed regression coefficients. $A_{ij}$, $C_{ij}$ and $C^{S}_{ij}$ denote random effects that represent additive polygenic, common family environment, and common sibling environment effects, respectively. The variation in an individual response is represented by a composite covariance matrix $V_T$, which is the sum of an additive genetic covariance matrix $V_A$, a common family environment matrix $V_C$, a shared sibling environment matrix $V_{Cs}$, and residual environmental effects. The different variance components are

The overall total covariance matrix is

The components of variance $\sigma^2_A$, $\sigma^2_C$, $\sigma^2_{Cs}$ need not be positive as long as $V_T$ is positive definite. A negative value for $\sigma^2_{Cs}$ simply implies that the realized covariance between siblings is less than the realized covariance between a parent and a child. A negative value for $\sigma^2_{Cs}$ may suggest dominance. To optimize convergence in BUGS, Model (1) may be reparameterized as

where $F_i$, $G_i$, $H_i$ are independent additive random effects or latent variables, and the remaining terms are the residual errors for parents and children, respectively. If we model

$$F_i \sim N\left(0, \tfrac{1}{2}\sigma^2_A + \sigma^2_C\right), \qquad G_i \sim N\left(0, \tfrac{1}{2}\sigma^2_A\right), \qquad H_i \sim N\left(0, \sigma^2_{Cs}\right)$$

then the basic genetic covariance model (additive genetic, common environment and unique environment) for this particular four-member family structure is satisfied.
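To make the target covariance structure concrete, the sketch below assembles a four-member covariance matrix under the usual textbook expectations for an additive model: spouses share only the family environment, a parent and child share half the additive genetic variance plus the family environment, and siblings additionally share the sibling environment. These expectations, the common total variance on the diagonal, and the residual variance of 1.0 are assumptions made for illustration and are not taken from the paper's reparameterized equations. The variance components are the Model A estimates reported in the Results.

```python
import math

def family_covariance(var_a, var_c, var_cs, var_e):
    """Covariance matrix for (father, mother, sib 1, sib 2) under assumed expectations."""
    total = var_a + var_c + var_cs + var_e   # assumed common total variance
    spouses = var_c                          # share family environment only
    parent_child = 0.5 * var_a + var_c       # share half their genes on average
    sibs = 0.5 * var_a + var_c + var_cs      # plus the shared sibling environment
    return [
        [total, spouses, parent_child, parent_child],
        [spouses, total, parent_child, parent_child],
        [parent_child, parent_child, total, sibs],
        [parent_child, parent_child, sibs, total],
    ]

def is_positive_definite(m):
    """Attempt a Cholesky factorization; it succeeds only for positive definite input."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = m[i][i] - s
                if d <= 0.0:
                    return False
                low[i][j] = math.sqrt(d)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return True

# Reported estimates with a hypothetical residual variance of 1.0.
v_total = family_covariance(0.452, -0.053, 0.467, 1.0)
print(is_positive_definite(v_total))

# A negative sibling component (hypothetical values) lowers the sib-sib
# covariance below the parent-child covariance.
v_negative_cs = family_covariance(0.452, 0.100, -0.050, 1.0)
```

The first matrix shows that a slightly negative common-environment component can still yield a positive definite total covariance matrix, which is the condition stated in the text.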

In survival models, unobserved or unmeasured explanatory variables, some of which may be genetic, are often referred to as frailties. The frailties take values restricted to the positive line and may be assumed to act multiplicatively on the hazard. Extending the above model to correlated family data with a time-to-onset endpoint, a multiplicative individual heterogeneity or frailty term representing the latent genetic and common environment variables may be modeled as random effects simultaneously with the effects associated with observed covariates. Consider right-censored time to onset of melanoma data $(T_{ij}, \delta_{ij}, z_{ij})$, $1 \le i \le n$, from $n$ relative pairs; here $T_{ij}$ denotes the true age at onset of the $j$th family member or the censored time, depending on whether $\delta_{ij} = 1$ or 0, respectively, and $z$ denotes a $p \times 1$ vector of covariates. A Weibull distribution may be used to model time to failure as

$$f(t_i \mid z_i) = e^{\beta' z_i}\, \gamma\, t_i^{\gamma - 1} \exp\left( -e^{\beta' z_i}\, t_i^{\gamma} \right)$$

where $\beta$ is a vector of unknown regression coefficients and $\gamma$ is the shape parameter of the Weibull distribution. This leads to a baseline hazard of the form

$$\lambda_0(t_i) = \gamma\, t_i^{\gamma - 1}$$
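The relationship between this density, its survivor function and the hazard can be checked numerically. In the sketch below the shape value 4.3 is the estimate reported in the Results and 0.119 is the naevi coefficient as read from Table 4. The intercept of −18 is a hypothetical value chosen only so that ages in years give sensible probabilities, and the function names are illustrative.

```python
import math

def weibull_hazard(t, gamma, mu):
    """Hazard mu * gamma * t^(gamma - 1), with mu = exp(beta' z)."""
    return mu * gamma * t ** (gamma - 1.0)

def weibull_survival(t, gamma, mu):
    """Survivor function exp(-mu * t^gamma)."""
    return math.exp(-mu * t ** gamma)

def weibull_density(t, gamma, mu):
    """Density = hazard * survivor function, matching f(t | z) in the text."""
    return weibull_hazard(t, gamma, mu) * weibull_survival(t, gamma, mu)

gamma = 4.3          # shape estimate reported in the Results
intercept = -18.0    # hypothetical intercept, for illustration only
beta_naevi = 0.119   # naevi coefficient as read from Table 4

mu_without = math.exp(intercept)
mu_with = math.exp(intercept + beta_naevi)

age = 50.0
hazard_ratio = weibull_hazard(age, gamma, mu_with) / weibull_hazard(age, gamma, mu_without)

# Check that the density is minus the derivative of the survivor function.
h = 1e-3
numeric_density = (weibull_survival(age - h, gamma, mu_without)
                   - weibull_survival(age + h, gamma, mu_without)) / (2 * h)
analytic_density = weibull_density(age, gamma, mu_without)
print(hazard_ratio, analytic_density, numeric_density)
```

The hazard ratio does not depend on the age at which it is evaluated, which is why the Weibull model provides relative risks in the same way as the proportional hazards model.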

Re-parameterize by letting $\mu_i = e^{\beta' z_i}$; the conditional distribution of $t_i$ given $\mu_i$ is then Weibull($\gamma$, $\mu_i$). We formulated a mixed model to represent the conditional distribution of $t_{ij}$ given covariate effects and random additive genetic and common environment effects as

$$t_{ij} \mid \mu_{ij} \sim \text{Weibull}(\gamma, \mu_{ij}), \qquad i = 1, \ldots, n, \quad j = 1, 2$$

where

The regression coefficients and the precisions of the random effects ($\tau_G$, $\tau_F$, $\tau_H$) were given "non-informative" Normal and Gamma priors, respectively. The shape parameter $\gamma$ of the time to onset of melanoma distribution was also given a non-informative Gamma prior, which was slowly decreasing on the positive real line.

The overall model (including random effects) may be described in a Bayesian graph (Figure 3), which simplifies sampling from full conditional distributions by exploiting partial independence properties (Spiegelhalter et al., 1996a). In this graph, each random quantity is represented by a node, which may be connected by directed or undirected links. Conditional independence assumptions are represented by the absence of such links.

We implemented the Gibbs sampler using the BUGS program (Gilks et al., 1994) (code in the Appendix). Imputation of missing data was handled naturally in the Gibbs sampling framework by treating missing values as additional unknown quantities and randomly sampling values from their full conditional distributions. We chose simple prior distributions for imputation since the number of missing values for covariates was not large (complete for birth year, 16 missing for naevi and 21 missing for freckles) and there was no indication of non-random missingness in our data. Therefore, imputation of missing naevi and freckles covariates was based on Bernoulli prior distributions with respective parameter values estimated from the complete observations. We performed an initial 10,000 burn-in iterations followed by an additional 20,000. Parameter estimates were the mean and standard deviation (SD) of all post-convergence Gibbs samples with a thinning interval between 20 and 50; credible intervals were computed as the lower and upper $\alpha/2$ percentiles from the last 20,000 iterations. Convergence to the posterior distribution was confirmed by using the different criteria provided by the add-on CODA package, including those of Gelman and Rubin (1992), Geweke (1992), and Raftery and Lewis (1992a, 1992b).
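The post-processing described here (discarding burn-in, thinning, then reporting the mean, SD and percentile-based credible interval) can be sketched as follows. The chain is simulated from a Normal distribution purely as a stand-in for sampler output, the percentile rule is a simple nearest-rank choice, and the function name is illustrative.

```python
import random
import statistics

def summarize_chain(chain, burn_in, thin, alpha=0.05):
    """Mean, SD and (alpha/2, 1 - alpha/2) percentile interval after burn-in and thinning."""
    kept = chain[burn_in::thin]
    ordered = sorted(kept)
    last = len(ordered) - 1
    lower = ordered[round((alpha / 2.0) * last)]
    upper = ordered[round((1.0 - alpha / 2.0) * last)]
    return {
        "n": len(kept),
        "mean": statistics.mean(kept),
        "sd": statistics.stdev(kept),
        "ci": (lower, upper),
    }

# Stand-in for sampler output: 10,000 burn-in draws plus 20,000 further draws.
rng = random.Random(2004)
chain = [rng.gauss(0.452, 0.054) for _ in range(30000)]

summary = summarize_chain(chain, burn_in=10000, thin=20)
print(summary)
```

Thinning reduces autocorrelation between retained draws at the cost of a smaller sample, so the thinning interval is a trade-off rather than a fixed rule.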

Results

GEE Approach

An inspection of residual plots following preliminary model-fitting provided no evidence for the failure of the proportional hazards assumption and did not detect influential observations. We then proceeded to apply the GEE approach, which could estimate regression coefficients while incorporating a dependence structure between relative pairs. The results are displayed in Table 3, which suggests that later birth year, having at least a moderate number of naevi, and freckles were all simultaneously associated with earlier age at onset of melanoma. Table 3 also presents the estimated odds ratios for quantifying the correlation between paired relatives of specific relationships.

The odds ratio for sib–sib pairs is 2.973, which is significantly different from 1 (p < .001). The odds ratio for parent–child pairs is 1.650, which is greater than 1, indicating a mild dependency between these pairs, although it is not quite statistically significant. The odds ratio for second-degree and higher relative pairs is 1.155, which does not differ significantly from 1 and so gives no evidence of dependence between these pairs. This pattern of familial aggregation is compatible with dominance variance as well as additive genetic variance (Falconer, 1990).
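The significance statements above can be approximately reproduced from the estimates and standard errors in Table 3, assuming the Z-statistic is the estimated dependence parameter (the odds ratio minus 1) divided by its standard error and that the test is one-sided. That reading of the table is an assumption. It reproduces the tabulated Z-statistics for the sib–sib and parent–child rows, while the second/others row computes to about 0.57 rather than the tabulated 0.47.

```python
from statistics import NormalDist

# (odds ratio 1 + theta, standard error of theta), as read from Table 3.
estimates = {
    "sib-sib": (2.973, 0.6217),
    "parent-child": (1.650, 0.434),
    "second/others": (1.155, 0.270),
}

def wald_test(odds_ratio, se):
    """Z-statistic for H0: odds ratio = 1, with a one-sided p-value."""
    z = (odds_ratio - 1.0) / se
    p_one_sided = 1.0 - NormalDist().cdf(z)
    return z, p_one_sided

results = {rel: wald_test(oratio, se) for rel, (oratio, se) in estimates.items()}
for rel, (z, p) in results.items():
    print(f"{rel}: Z = {z:.2f}, one-sided p = {p:.4f}")
```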

Bayesian Approach

We re-analyzed the age at onset of melanoma by using Gibbs sampling to impute missing covariates and to estimate subject-specific covariate effects and random additive genetic, common family environment and shared sibling environment effects on the log scale. The estimated shape parameter $\gamma$ was 4.3, with a 95% credible interval (CI) of (4.2, 4.6). The results are summarized in Table 4 (Model A). The residual plots did not indicate a gross departure from the underlying Weibull model and revealed no influential observations. We checked the sensitivity of the analyses to initial parameter values by re-running the Gibbs sampler five more times using different starting values. The resulting estimates did not differ by more than 5% from the values reported here. The mean estimate for $\sigma^2_A$ was 0.452 with 95% CI = (0.348, 0.566), for $\sigma^2_C$ was −0.053 with 95% CI = (−0.120, 0.019), and for $\sigma^2_{Cs}$ was 0.467 with 95% CI = (0.393, 0.545).

A small negative value for the common family effect suggests that there may be a dominant effect or that the current model is not quite appropriate, for example, that there are systematic effects or correlation structures that have not been accounted for. The results here indicate that additive genetic and shared sibling environment effects seem to contribute about equally to the variation of the age at onset of melanoma. Further exploration of alternative models is a focus of our future research.

In addition, we investigated the relative contribution of genetic and environmental effects on the expression of naevi (Model B) and freckles (Model C), which are known risk factors for melanoma. The expressions of naevi and freckles were coded as binary variables (none or few moles versus moderate or many moles, and no freckles versus one or more freckles). A hierarchical Bayesian binomial model was fitted to estimate the random variance components. The results in Table 4 indicate that a common family environment effect contributed the most to the expression of naevi ($\sigma^2_C$ = 0.704), relative to the contributions of the additive genetic effect ($\sigma^2_A$ = 0.142) and of the shared sibling effect ($\sigma^2_{Cs}$ = 0.195), both of which were non-negligible. In contrast, variation in the expression of freckles was largely explained by additive genetic and shared family effects ($\sigma^2_A$ = 2.050, $\sigma^2_C$ = 2.600) compared to a relatively small shared sibling effect ($\sigma^2_{Cs}$ = 0.115).

Figure 3

Graphical model of covariate and random family effects for an individual (individual $j$ in family $i$) in a nuclear family structure. $t_{ij}$ represents the observed failure time for the $j$th individual in the $i$th family, with $tcen_{ij}$ being an indicator variable of censoring status. Full arrows indicate stochastic links to which a probability is attached; broken arrows denote deterministic relationships. The $\beta$s are regression coefficients; $\tau$ is the precision of the prior distribution and equals the inverse of the variance. $F_i$, $G_i$, $H_i$ are random effects modeled as $F_i \sim N(0, \tfrac{1}{2}\sigma^2_A + \sigma^2_C)$, $G_i \sim N(0, \tfrac{1}{2}\sigma^2_A)$ and $H_i \sim N(0, \sigma^2_{Cs})$. Rectangles represent actual data values for the covariates; $\gamma$ and $\mu_{ij}$ are shape and scale parameters for the underlying Weibull distribution.

Table 3

GEE Approach: Estimated Regression Coefficients in the Proportional Hazard Model and Estimated Odds Ratios for Quantifying Familial Aggregation in Age at Onset of Melanoma in Queensland Families (* Indicates Significance at the .05 Significance Level)

A. Mean effects

Covariate                              RR = e^β   Coefficient β   Robust se(β)   Z-statistic
Year of birth                          1.142      0.132           0.051          2.588
Naevi (Baseline = No or few moles)     1.765      0.568           0.073          7.781
Freckling (Baseline = No freckles)     1.160      0.148           0.049          3.020

B. Patterns of familial aggregation

Relationship     1 + θ    se(θ)     Z-statistic
Sib–sib          2.973    0.6217    3.17
Parent–child     1.650    0.434     1.50
Second/Others    1.155    0.270     0.47

Table 4

Gibbs Sampling Approach: Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families

Weibull Model, A. Mean effects (response variable: age at onset)

Covariate        RR = e^β   Coefficient β   Robust se(β)   95% CI of β
Year of birth    1.378      0.321           0.0027         (0.316, 0.326)
Naevi            1.126      0.119           0.0021         (0.058, 0.185)
Freckling        1.017      0.017           0.1400         (−0.005, 0.085)

Weibull Model, B. Variance components (response variable: age at onset)

Latent effect    Mean from 5000 iterations   se(σ²)   95% CI of σ²
σ²_A             0.452                       0.054    (0.348, 0.566)
σ²_C             −0.053                      0.027    (−0.120, 0.019)
σ²_Cs            0.467                       0.040    (0.393, 0.545)
γ                4.3                         0.104    (4.2, 4.6)

Binomial Model, Variance components (response variable: naevi)

Latent effect    Mean from 5000 iterations   se(σ²)   95% CI of σ²
σ²_A             0.142                       0.149    (0.002, 0.498)
σ²_C             0.704                       0.156    (0.403, 1.010)
σ²_Cs            0.195                       0.157    (0.0025, 0.553)

Binomial Model, Variance components (response variable: freckling)

Latent effect    Mean from 5000 iterations   se(σ²)   95% CI of σ²
σ²_A             2.050                       0.779    (0.835, 3.570)
σ²_C             2.600                       0.418    (1.780, 3.460)
σ²_Cs            0.115                       0.088    (0.011, 0.312)

Note: * indicates significance at the .05 level. Naevi is a binary variable with baseline 0 = no or few moles. Freckling is coded as a binary variable with baseline 0 = no freckles.

Discussion

We applied two methods, generalized estimating equations and Bayesian analysis, to the genetic analysis of age at onset of melanoma based on a nuclear family structure. Under both approaches, the results suggest that additive genetic factors played an important role in the age at onset of melanoma but that shared sibling environmental factors were not negligible. We focused attention on these approaches because they are more appropriate for modeling correlated age at onset data and they allow the inclusion of covariates in the analyses. Under both approaches, there were suggestions that earlier melanoma onset was influenced by later birth year, having a moderate number of naevi, and being freckly. The principal difference between the two approaches is in the interpretation of the regression coefficients. The GEE method uses a marginal approach, resulting in regression coefficients that describe the average population response to changing covariates, whereas the Bayesian approach produces subject-specific coefficients. A secondary distinction is in the nature of the within-pair dependence. The GEE model only describes a common covariance among specific relative pairs, whereas the Bayesian approach can explicitly describe the source of this covariance. A third advantage of the Bayesian method is its flexibility in incorporating prior information, if available, for the covariates or latent effects by modifying their prior distributions. Further, the Bayesian method would permit a more accurate decomposition of the genetic variance into additive and dominant components, thus providing the means for a direct assessment of the no-dominance assumption. Finally, it is also interesting to record the amount of CPU time required for each method: 20 seconds for the GEE approach, and approximately 3 hours (for binary traits) and 12 hours (for age at onset outcome) to run BUGS on a single-user Intel Pentium III 600 MHz personal computer with the Linux Mandrake 7.1 operating system. The amount of human and financial resources dedicated to collecting, maintaining and updating the melanoma family database is extremely high; therefore, the extra CPU time required by the MCMC method is well worth the additional genetic information and flexibility that it provides. A BUGS program is included in the Appendix.

Acknowledgments

Data collection was supported by grants from the Australian National Health and Medical Research Council (NHMRC), ID 930223 and ID 961061, and the US National Cancer Institute (CA88363). Programming resources were provided by start-up funds from the Department of Biostatistics at the M. D. Anderson Cancer Center. We acknowledge assistance with BUGS programming from Sang-Joon Lee.

References

Abel, L., & Bonney, G. E. (1990). A time-dependent logistic hazard function for modeling variable age of onset in the analysis of familial diseases. Genetic Epidemiology, 12, 391–407.

Aitken, J. F., Bailey-Wilson, J., Green, A. C., MacLennan, R., & Martin, N. G. (1998). Segregation analysis of cutaneous melanoma in Queensland. Genetic Epidemiology, 15, 391–401.

Aitken, J. F., Green, A. C., MacLennan, R., Youl, P., & Martin, N. G. (1996). The Queensland familial melanoma project: Study design and characteristics of participants. Melanoma Research, 6, 155–165.

Allison, P. D. (1995). Survival analysis using the SAS system: A practical guide. Cary, NC: SAS Institute Inc.

Andersen, P. K., Borgan, O., Gill, R. D., & Keiding, N. (1993). Statistical models based on counting processes. New York: Springer-Verlag.

Bernardo, J. M., Berger, J. O., Dawid, A. P., & Smith, A. F. M. (Eds.). (1992). Bayesian statistics 4. Oxford: Clarendon Press.

Bonney, G. E. (1986). Regressive logistic model for familial disease and other binary traits. Biometrics, 42, 611–625.

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. California: Wadsworth.

Cannings, C., & Thompson, E. A. (1977). Ascertainment in the sequential sampling of pedigrees. Clinical Genetics, 12, 208–212.

Clayton, D. G. (1978). A model for association in bivariate life-tables and its application in epidemiological studies of chronic disease incidence. Biometrika, 65, 141–151.

Clayton, D. G., & Cuzick, J. (1985). Multivariate generalizations of the proportional hazards model (with discussion). Journal of the Royal Statistical Society, Series A, 148, 82–117.

Cox, D. R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34, 187–202.

Do, K.-A., Broom, B. M., Kuhnert, P., Duffy, D. L., Todorov, A. A., Treloar, S. A., & Martin, N. G. (2000). Genetic analysis of the age at menopause by using estimating equations and Bayesian random effects models. Statistics in Medicine, 19, 1217–1235.

English, D. R., Armstrong, B. K., Kricker, A., et al. (1997). Sunlight and cancer. Cancer Causes Control, 8, 271–283.

Falconer, D. S. (1990). Introduction to quantitative genetics (3rd ed.). New York: Longman Group Ltd.

Faucett, C., Gauderman, W. J., Thomas, D., Ziogas, A., & Sobel, E. (1993). Combined segregation and linkage analysis of late-onset Alzheimer's disease in Duke families using Gibbs sampling. Genetic Epidemiology, 10, 489–494.

GAP. (1996). Genetic Analysis Package, Release 1.0. Pasadena, CA: Epicenter Software.


Gauderman, W. J., Morrison, J. L., Carpenter, C. L., & Thomas, D. C. (1997). Analysis of gene-smoking interaction in lung cancer. Genetic Epidemiology, 14, 199–214.

Gauderman, W. J., & Thomas, D. C. (1994). Censored survival models for genetic epidemiology: A Gibbs sampling approach. Genetic Epidemiology, 11, 171–188.

Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7, 457–511.

Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intelligence, 6, 721–741.

Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4. Oxford: Clarendon Press.

Gilks, W. R., Thomas, A., & Spiegelhalter, D. J. (1994). A language and program for complex Bayesian modeling. The Statistician, 43, 169–178.

Guo, S. W., & Thompson, E. A. (1991). Monte Carlo EM for the estimation of multiple random effects models on large pedigrees. IMA Journal of Mathematics Applied in Medicine and Biology, 8, 171–189.

Guo, S. W., & Thompson, E. A. (1992). A Monte Carlo method for combined segregation and linkage analysis. American Journal of Human Genetics, 51, 1111–1126.

Guo, S. W., & Thompson, E. A. (1994). Monte Carlo estimation of mixed models for large complex pedigrees. Biometrics, 50, 417–432.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97–109.

Heath, A. C. (1997). Markov Chain Monte Carlo segregation and linkage analysis for oligogenic models. American Journal of Human Genetics, 61, 748–760.

Hougaard, P. (1986). A class of multivariate failure time distributions. Biometrika, 73, 671–678.

Hsu, L., & Prentice, R. L. (1996). On assessing the strength of dependency between failure time variates. Biometrika, 83, 491–506.

Hsu, L., & Zhao, L. P. (1996). Assessing familial aggregation of age at onset by using estimating equations with application to breast cancer. American Journal of Human Genetics, 58, 1057–1071.

Kempthorne, O. (1960). Biometrical genetics. New York: Pergamon Press.

Kendall, M., & Stuart, A. (1979). The advanced theory of statistics (Vol. 2, 4th ed.). Oxford: Oxford University Press.

Kong, K., Frigge, M., Cox, N., & Wong, W. H. (1992). Linkage analysis with adjustments for covariates: A method combining peeling with Gibbs sampling. Cytogenetics and Cell Genetics, 59, 208–210.

Kuhnert, P., & Do, K.-A. (2003). Fitting genetic models to twin data with binary and ordered categorical responses: A comparison of structural equation modeling and Bayesian hierarchical models. Behavior Genetics, 33, 439–452.

Lange, K., & Sobel, E. (1991). A random walk method for computing genetic location scores. American Journal of Human Genetics, 49, 1320–1334.

LeBlanc, M., & Crowley, J. (1992). Relative risk trees for censored survival data. Biometrics, 48, 411–425.

Li, H., & Thompson, E. A. (1997). Semiparametric estimation of major gene and random environmental effects for age of onset. Biometrics, 53, 282–293.

Li, H., & Wijsman, E. A. (1998). Semiparametric estimation of major gene effects for age of onset. Genetic Epidemiology, 15, 279–298.

MacLennan, R., Green, A. C., McLeod, G. R., & Martin, N. G. (1992). Increasing incidence of cutaneous melanoma in Queensland, Australia. Journal of the National Cancer Institute, 84, 1424–1432.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087–1092.

Nielsen, G. G., Gill, R. D., Andersen, P. K., & Sorenson, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics, 19, 25–43.

Oakes, D. (1989). Bivariate survival models induced by frailties. Journal of American Statistical Association, 84, 487–493.

Raftery, A. E., & Lewis, S. (1992a). Comment: One long run with diagnostics: Implementation strategies for Markov Chain Monte Carlo. Statistical Science, 7, 493–497.

Raftery, A. E., & Lewis, S. (1992b). How many iterations in the Gibbs sampler? In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4 (pp. 763–774). Oxford: Clarendon Press.

SAGE. (1992). Statistical Analysis for Genetic Epidemiology, Release 2.1. Cleveland, OH: Department of Epidemiology and Biostatistics, Rammelkamp Center for Education and Research, MetroHealth Campus, Case Western Reserve University.

Scurrah, K. J., Palmer, L. J., & Burton, P. R. (2000). Variance components analysis for pedigree-based censored survival data using generalized linear mixed models (GLMMs) and Gibbs sampling in BUGS. Genetic Epidemiology, 19, 127–148.


Siegmund, K. D., & McKnight, B. (1998). Modeling hazard functions in families. Genetic Epidemiology, 15, 147–171.

Spiegelhalter, D. J., Thomas, A., & Best, N. G. (1996a). Computation on Bayesian graphical models (pp. 407–425). Oxford: Oxford University Press.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Gilks, W. (1996b). BUGS 0.6: Bayesian Inference Using Gibbs Sampling manual. Cambridge: MRC Biostatistics Unit.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Lunn, D. (2003). WinBUGS (Bayesian Inference Using Gibbs Sampling), Windows version 1.4 user manual. Cambridge: MRC Biostatistics Unit.

Siskind, V., Aitken, J., Green, A., & Martin, N. (2002). Sun exposure and interaction with family history in risk of melanoma, Queensland, Australia. International Journal of Cancer, 97, 90–95.

Swerdlow, A., & Green, A. (1987). Melanocytic naevi and melanoma: An epidemiological perspective. British Journal of Dermatology, 117, 137–146.

Therneau, T. M., & Atkinson, E. (1997). An introduction to recursive partitioning using the RPART routines (Technical report). Rochester: Mayo Foundation.

Thomas, D. (1992). Fitting genetic data using Gibbs sampling: An application to nevus counts in 38 Utah kindreds. Cytogenetics and Cell Genetics, 59, 228–230.

Thompson, E. A. (1994). Monte Carlo programs for pedigree analysis (Technical Report 267). Washington: Dept. of Statistics, University of Washington.

Vaupel, J. M., Manton, K. G., & Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16, 439–454.


Appendix A: A BUGS Program to Implement the Bayesian MCMC Approach in Modeling an ACE Model for Age at Onset of Melanoma with Covariate Effects and Random Genetic/Environmental Effects


cumulative sun exposure up to the age of 19 years, and birth year.

The second stage fitted multivariable proportional hazards models to those explanatory variables that were significant at the univariable stage. Table 1 displays the results from the final model, which only included significant variables (p-value < .05). It is worth noting that a similar result could be obtained using an automated stepwise procedure.

Results from this analysis highlight some interesting but quite obvious risk factors noted in previous analyses (Aitken et al., 1996, 1998). For example:

• An increase of 1 year in birth year induces a 17% increase in risk of earlier melanoma onset.

• People with neither freckles nor naevi have the lowest risk of melanoma onset.

• The risk of earlier melanoma onset is increased by up to 37% for blue-eyed people and even further (46%) for green-eyed people when compared to individuals with brown eyes.

• "Red heads" have an increased risk of earlier melanoma onset (46%) when compared to individuals with black hair. However, no significant increase was noted for individuals with fair or light red hair.

• A person's tendency to burn easily increases the risk of earlier melanoma onset, in some cases by up to 100%, compared to those that never burn.

However striking this last statement is, issues of confounding must also be considered. The most obvious illustration of this is the confounding that occurred between mole count and freckling. This is seen through close inspection of the parameter estimates, which changed in magnitude when mole count was added to the model after adjusting for freckling (results not displayed here).

To reduce the dimension of the problem further and avoid some of these confounding issues, a survival tree was constructed using RPART (Recursive Partitioning and Regression Trees); see Therneau and Atkinson (1997). Survival trees are a special case of

Table 1

Results from Fitting a Multivariable Proportional Hazards Model to the Melanoma Data Based on Univariable Results. The Results Reported in this Table Are the Parameter Estimates $\beta$, Their Standard Errors se($\beta$), the Relative Risk $e^{\beta}$, and the p-value for Each Estimate

Variable                               $\beta$    se($\beta$)   $e^{\beta}$   p-value

Birth Year                             0.16       < 0.01        1.17          < .05

Eye Colour (Baseline: Brown)
  Blue/Grey                            0.31       0.07          1.36          < .05
  Green/Hazel                          0.38       0.07          1.46          < .05

Hair Colour (Baseline: Black)
  Light Red/Ginger                     0.17       0.15          1.19          .27
  Dark Red/Auburn                      0.38       0.15          1.46          < .05
  Fair/Blonde                          0.06       0.12          1.06          .62
  Light Brown                          0.14       0.12          1.15          .22
  Dark Brown                           0.02       0.12          1.02          .87

Skin Type (Baseline: never burn)
  Always burn                          0.69       0.16          1.968         < .05
  Usually burn                         0.45       0.15          1.57          < .05
  Sometimes burn                       0.31       0.15          1.36          < .05

Freckling (Baseline: none)
  1 to 100                             0.17       0.06          1.18          < .05
  > 100                                0.09       0.08          1.10          .23

Mole Count (Baseline: none)
  Few                                  0.29       0.07          1.34          < .05
  Moderate                             0.79       0.08          2.20          < .05
  Many                                 1.12       0.10          3.08          < .05

Number of Sunburns (Baseline: none)
  One                                  −0.07      0.11          0.93          .49
  2 to 5                               −0.06      0.09          0.94          .50
  > 6                                  0.17       0.09          1.19          .07

Cumulative Sun Exposure (< 5 yrs)      0.04       0.01          1.04          < .05

UV Exposure (5–12 yrs)                 0.0003     < 0.0001      1             < .05


decision trees that were incorporated into RPART by Therneau (1997) using the ideas put forth by LeBlanc and Crowley (1992) for survival data with censoring. These authors showed through simulation studies that survival trees could outperform standard parametric methods such as proportional hazards modelling, particularly in situations where the underlying distribution was not exponential. Along with Breiman et al. (1984), they showed decision trees to be useful exploratory tools for identifying important variables, interactions and outliers.

The methodology for survival trees begins by identifying optimal splits using the log-rank statistic to separate the data into homogeneous groups. Each split is comprised of a parent node and two daughter nodes, which are linked to the parent by branches. Figure 1 is an example of a survival tree produced on the melanoma data. Once a large tree is grown, with many terminal nodes that contain very few observations, a pruning procedure is introduced to identify a sequence of sub-trees. The technique of pruning involves snipping back splits of the tree one at a time until only the root node remains. Cross-validation is then introduced to aid in the selection of the optimal model. One nice feature of survival trees is the use of Kaplan-Meier curves to provide information about survival rates at each terminal node of the tree. This approach may allow for better interpretation of the terminal nodes and highlight different scenarios which yield similar survival rates, as can be seen in Figure 1.
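The Kaplan-Meier curve displayed at each terminal node can be computed directly from the ages and censoring indicators of the subjects in that node. The following is a minimal sketch in plain Python, not the RPART or S-PLUS routine used in the analysis; the function name `kaplan_meier` and the small data set are hypothetical and serve only to illustrate the product-limit calculation.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- observed ages (age at onset or age at censoring)
    events -- 1 if onset was observed at that age, 0 if censored
    Returns a list of (age, estimated survival) at each distinct onset age.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        onsets = 0
        leaving = 0
        # Group all subjects who share the same observed age.
        while i < len(data) and data[i][0] == t:
            onsets += data[i][1]
            leaving += 1
            i += 1
        if onsets > 0:
            survival *= 1.0 - onsets / n_at_risk
            curve.append((t, survival))
        # Both onsets and censored subjects leave the risk set.
        n_at_risk -= leaving
    return curve


# Hypothetical node: five subjects, two of them censored.
ages = [35, 42, 42, 50, 61]
observed = [1, 1, 0, 1, 0]
for age, surv in kaplan_meier(ages, observed):
    print(f"age {age}: melanoma-free proportion {surv:.2f}")
```

Censored subjects contribute to the risk set up to their censoring age but never trigger a drop in the curve, which is why the estimate remains usable when most family members are unaffected at last follow-up.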

The variables identified from the multivariable model were used as input into RPART. Variables such as the sun and UV exposures were omitted from the modelling stage, since nearly half of the data for each of these variables were missing (cumulative sun exposure: 43.8%, UV exposure: 40.4%). A large survival tree was produced in S-PLUS and, using cross-validation, a model splitting solely on birth year was identified. This survival tree yielded the minimum cross-validated error rate (0.93), suggesting that birth year was an important indicator for melanoma. Selecting a slightly larger but more informative model (error = 0.94) resulted in a survival tree consisting of nine terminal nodes. This revealed splits on birth year, mole count and freckling. All other variables appeared to be either competing at each node or acting as a surrogate variable for one of these primary splits. Skin type, in particular, arose as an important surrogate for freckling.

In Figure 1, we aim to provide a more visual interpretation of the results from the survival analysis. Each split is shown at the top of each node and can be assumed to be the split that directs observations towards the left side of the tree. For example, after splitting on birth year at 1933, a split to the left corresponds to an individual with few moles. A split to the right indicates an individual with many moles, and so on. The terminal nodes in this figure are displayed as Kaplan-Meier plots showing the survival curves, along with the number of observations, n, and the risk ratio calculated with reference to the baseline group of 9669 individuals. The reference survival curve is shown at the root node, along with a bar which shows the percentage of events in the entire dataset. Grids are placed on each survival plot at equal intervals of 20 years of age. These grids have been plotted to simplify the interpretation. An indicator to the left of the plot displays the risk ratio. Ratios above one are shown by an "up arrow", while a decrease in risk is illustrated by a "down arrow". From this model we can see that there are a few scenarios that indicate high risk for earlier onset of melanoma. These scenarios may be described as:

• individuals born after 1966 with many moles (RR = 113)

• individuals born between 1947 and 1966 with many moles (RR = 45)

• individuals born between 1933 and 1953 with few moles but many freckles (RR = 24)

• individuals born between 1923 and 1933 with many moles (RR = 141)

It is obvious from these results that birth year has a substantial impact on the age at onset of melanoma. Once this is taken into account, mole count and freckling only provide a small contribution to the risk.

Family History of Melanoma

As described earlier in the Data section, family history was collected regarding first-degree (siblings and parents) and second-degree relatives. From this pedigree structure, other higher-order types of relative pairs could also be formed. Some of these relatives were diseased with melanoma, resulting in pairs of relatives who both may be diseased (++), both not diseased (−−), and one diseased while the other was not diseased (+−). Table 2 lists the concordant and discordant pairs of specific relationships: sib–sib, parent–child, and second-degree/lower order relative pairs. The second-degree relative pairs include grandparent–grandchild and aunt–niece, while the lower order relative pairs include the in-law pairs. From Table 2, the percentages of both-diseased pairs are 0.6%, 0.9% and 0.4% among sib–sib, parent–child and second/lower order pairs, respectively. Crude estimates of correlation coefficients can be calculated from these percentages without accounting for ages at onset among these relatives. However, the risk of developing melanoma may depend on the subject's age. Hence, adjusting for the age at onset is essential in quantifying the correlation of age at onset between pairs of relatives. In a subsequent section, we describe how this can be done rigorously via the generalized estimating equations approach. On average, melanomas were diagnosed slightly earlier in relatives (47.5 years) than in probands (50.2 years). Among relatives, melanomas were diagnosed at younger ages in later generations.


Figure 1

Survival tree for the melanoma data consisting of 9669 individuals from 1912 Queensland families. Terminal nodes are rectangular and display the Kaplan-Meier curve for the corresponding subgroup. (Root node: RR = 1.00, baseline.)


To account for the different ages at censoring in each generation due to termination of the study or death from causes other than melanoma, we examined the disease-free survival distribution for each generation using the standard failure-time analysis technique (Figure 2). The median age at diagnosis of melanoma was 64 among parents of probands, 50 among siblings of probands, and 33 among children of probands. The disease-free survival functions of the children's generation differed significantly from those of earlier generations (log-rank test with p < .01), but there were no significant differences in age at onset of melanoma between the siblings and parents of probands (log-rank test with p > .9).

Preliminary Segregation Analysis

A preliminary segregation analysis was conducted (Aitken et al., 1998) to investigate whether the familial clustering of cutaneous melanoma is consistent with Mendelian inheritance of a major autosomal gene. Analyses were performed with the SAGE (1992) statistical package using the maximum likelihood REGTL program for a binary trait with a variable age of onset. The hypothesis of co-dominant Mendelian inheritance gave a significantly better fit to the data than either dominant or recessive Mendelian inheritance. Overall, both Mendelian inheritance of a single major gene and purely environmental transmission were rejected. However, there was strong evidence of familial dependence in melanoma occurrence.

The inclusion of risk factors in the models may reveal whether all or a combination of these explains the familial dependence that was demonstrated by the segregation analysis. If a major gene exists that operates independently of these covariates, including them in the models may reduce residual variation and increase the power of the analysis to detect a major gene effect.

Estimating Equations Approach

Let $y = (\delta_{ki}, t_{ki}, Z_{ki})$ denote the data collected for the $i$th member in the $k$th family ($k = 1, \ldots, K$ and $i = 1, 2$), where $\delta_{ki} = 0$ if the observation is censored, $t_{ki}$ is either the recorded age at diagnosis of melanoma or the age at the most recent follow-up for unaffected people, and $Z_{ki}$ is a vector of measured covariates. We assume that censoring time, age at diagnosis of melanoma, and the covariates are independently distributed. These assumptions can be relaxed in more general models subject to identification constraints. The hazard rate for melanoma is the instantaneous probability that melanoma is diagnosed immediately after time $t$, given that the person is unaffected at time $t$. The hazard rate under the Cox proportional hazards model (Cox, 1972) is given by

$$\lambda(t_{ki}) = \lambda_0(t_{ki}) \exp(\beta' Z_{ki})$$

Table 2

Concordant and Discordant Pairs of Relatives in 1912 Families from the Queensland Familial Melanoma Project. Probands Were Not Included for the Calculation of Concordance

          Sib–sib    Parent–child    Second/Others    Total
++        49         15              41               105
+−        763        536             1200             2499
−−        7011       1078            9817             17906
Total     7823       1629            11058            20510

Figure 2

Cumulative melanoma-free survival among first-degree relatives of confirmed cutaneous melanoma cases (probands) diagnosed in Queensland, Australia, 1982–1987, according to the relative's relationship to the proband.

where $\lambda_0(\cdot)$ is the baseline hazard function and $\beta$ is a vector of regression coefficients.

For a specific pair of relatives, we follow Clayton (1978) in modeling the bivariate survivor function

$$F(t_{k1}, t_{k2}) = \left( F_1(t_{k1})^{-\theta} + F_2(t_{k2})^{-\theta} - 1 \right)^{-1/\theta}$$

where $F_1$ and $F_2$ are univariable survivor functions, and $\theta$ is a scalar parameter that measures the degree of dependence between the relatives' times at onset, independence being implied by $\theta = 0$ and positive association by $\theta > 0$. The Clayton model allows negative dependencies and has the property that failure times are absolutely continuous for $\theta > -0.5$. In addition, the cross-ratio (or odds-ratio) function, as studied by Oakes (1989), is

$$c(t_{k1}, t_{k2}) = \frac{\lambda(t_{k1} \mid T_{k2} = t_{k2})}{\lambda(t_{k1} \mid T_{k2} \geq t_{k2})} = 1 + \theta$$

This is equivalent to assuming that the odds-ratio is invariant over the grid region that supports the data. Heuristically, the parameter $1 + \theta$ is an odds-ratio that depends on the degree of dependence between the onset ages of the two relatives. If genetic factors do influence the age at onset of melanoma, we would expect to see a higher concordance in the age of onset in first-degree relatives, who on average share half their genes in common, in comparison to second-degree relatives. Under the current model, this translates as $\theta_{\text{first-degree}} > \theta_{\text{second-degree}}$.
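The constancy of the cross-ratio under the Clayton model can be checked numerically. The sketch below works on the scale of the marginal survivor probabilities $u = F_1(t_{k1})$ and $v = F_2(t_{k2})$, which leaves the cross-ratio unchanged because the derivatives of the margins cancel. The function names and the finite-difference step are illustrative choices, not part of the original analysis.

```python
def clayton_survivor(u, v, theta):
    """Clayton joint survivor function at marginal survivor values u, v in (0, 1)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def cross_ratio(u, v, theta, h=1e-4):
    """Finite-difference estimate of F * F_uv / (F_u * F_v)."""
    f = clayton_survivor
    f_u = (f(u + h, v, theta) - f(u - h, v, theta)) / (2 * h)
    f_v = (f(u, v + h, theta) - f(u, v - h, theta)) / (2 * h)
    f_uv = (f(u + h, v + h, theta) - f(u + h, v - h, theta)
            - f(u - h, v + h, theta) + f(u - h, v - h, theta)) / (4 * h * h)
    return f(u, v, theta) * f_uv / (f_u * f_v)


theta = 1.973  # illustrative value, so that 1 + theta = 2.973
for u, v in [(0.9, 0.9), (0.6, 0.8), (0.3, 0.5)]:
    print(u, v, round(cross_ratio(u, v, theta), 3))
```

Whatever pair of survivor probabilities is chosen, the estimate stays at $1 + \theta$ up to numerical error, which is the sense in which a single parameter summarizes the dependence for each relationship type.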

We may use a standard method to estimate within-pair correlations for 2 × 2 tables from odds ratios. Estimates of relation-pair correlations $\rho_{\text{sib-sib}}$, $\rho_{\text{parent-child}}$ and $\rho_{\text{second-order}}$ are recovered by using relationships such as $\rho_{\text{sib-sib}} = \min(1, \ln(1 + \theta_{\text{sib-sib}}))$. Testing for the presence of genetic factors underlying the age at diagnosis of melanoma is equivalent to testing $H_0: \rho_{\text{first-order}} = \rho_{\text{second-order}}$. We may test this hypothesis using a z-transform (Kendall, 1979, p. 315) of the point estimates of the correlation coefficients. Let $n_1$ and $n_2$ denote the number of first-order and second-order relatives, and let $z_1$ and $z_2$ denote the transformed statistics of $r_1$ and $r_2$, the correlation estimates for first-order and second-order relative pairs, respectively. Specifically, we reject $H_0$ when $E/D > Z_{1-\alpha}$, where

$$E = E(z_1 - z_2) = \frac{1}{2} \log\left(\frac{1 + r_1}{1 - r_1}\right) - \frac{1}{2} \log\left(\frac{1 + r_2}{1 - r_2}\right),$$

$$D^2 = V(z_1 - z_2) = \frac{1}{n_1 - 3} + \frac{1}{n_2 - 3},$$

and $Z_{1-\alpha}$ is the standard normal deviate corresponding to the one-sided $\alpha$ significance level.
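The test statistic above is the usual comparison of two Fisher z-transformed correlations. As a minimal sketch of the arithmetic, the function below computes $E/D$ and a one-sided p-value; the function name and the input values are hypothetical and are not taken from the melanoma data.

```python
import math


def fisher_z_test(r1, n1, r2, n2):
    """One-sided test of H0: rho1 = rho2 against rho1 > rho2.

    Returns (E / D, one-sided p-value).
    """
    z1 = math.atanh(r1)  # equals 0.5 * log((1 + r1) / (1 - r1))
    z2 = math.atanh(r2)
    e = z1 - z2
    d = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    stat = e / d
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(stat / math.sqrt(2.0))
    return stat, p_value


# Hypothetical correlations and pair counts, for illustration only.
stat, p_value = fisher_z_test(r1=0.5, n1=103, r2=0.1, n2=103)
print(f"E/D = {stat:.3f}, one-sided p = {p_value:.4f}")
```

Note that the transform is undefined when a correlation estimate equals 1, so an estimate truncated at 1 by the $\min(1, \cdot)$ rule would need separate handling before this test could be applied.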

This approach has the advantage of providing a test for the presence of genetic effects through a single parameter ($\theta$). However, it is limited in its ability to attribute the phenotypic variance to specific effects (e.g., additive gene action).

Mathematical details of the GEE model and the iterative procedure to estimate the regression coefficients $\beta$ and specific degrees of dependence $\theta$ for the different types of relative pairs have been summarized previously in Do et al. (2000).

MCMC Analysis Using BUGS

The Bayesian Paradigm and Gibbs Sampling

Markov Chain Monte Carlo (MCMC) is an alternative Bayesian approach that provides estimates of likelihoods and associated parameter values when exact computation is infeasible (Hastings, 1970; Metropolis et al., 1953). MCMC methods can be used to draw samples from the underlying joint distribution of major genotypes and polygenic values, conditional on the observed data. From these samples, desired parameters and likelihoods can be estimated without the need to resort to exact computation. MCMC methods have been used for linkage analysis (Kong et al., 1992; Lange & Sobel, 1991), for estimation of parameters in the mixed model with and without covariates (Guo & Thompson, 1991; Thomas, 1992), for estimation of gene–smoking interaction and covariate imputation (Gauderman et al., 1997), for performing combined linkage and segregation analysis (Faucett et al., 1993; Guo & Thompson, 1992), and for mixed models of large complex pedigrees (Guo & Thompson, 1994).

In a general setting, let $x$ be the observed data and $\theta$ be everything not observed, including parameters and latent variables. The implementation of Bayesian methods using realistic models and priors is computer-intensive and relies heavily on cunning computational tools to approximate integrals. The problem, in general terms, is to obtain the expected value of a function of interest $s(\cdot)$ under the posterior density $p(\theta \mid x)$,

$$E[s(\theta)] = \frac{\int_{\Theta} s(\theta)\, p(\theta)\, p(x \mid \theta)\, d\theta}{\int_{\Theta} p(\theta)\, p(x \mid \theta)\, d\theta},$$

which cannot generally be found analytically. One method to carry out the integration on the right-hand side is to perform simulation of exact Bayesian posterior distributions using Markov chain Monte Carlo techniques such as Gibbs sampling. The Gibbs sampler (Geman & Geman, 1984) is the most popular algorithm used in MCMC applications to correlated data. Gibbs sampling was introduced to the main statistical community by Gelfand and Smith (1990) and has since been applied to an even wider array of problems. The Gibbs sampler is easy to implement because it only depends on the local neighborhood structure. In the context of pedigree analysis (Olshen & Wijsman, 1996), the basic procedure is a sequential updating of missing and latent data, including the underlying and unobserved major genotypes, polygenic effects and


environmental effects. Values for the missing or latent data are sampled from the local conditional distribution, a function of the observed individual data, the current sampled values of other missing/latent data for this particular individual, such as polygenic and environmental effects, and the values for the sampled genetic effects in the immediate neighbors of an individual. Gibbs sampling basically consists of three main steps:

Step 1. Set initial values for unobserved quantities (parameters and latent variables).

Step 2. For each parameter or latent variable $\theta_j$, sample from its "full conditional distribution" given the current values of all other quantities in the model.

Step 3. Examine sampled values of parameters and latent variables to monitor convergence and to provide summary measures.
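The three steps can be illustrated on a toy problem that is far simpler than the pedigree model: sampling a standard bivariate normal pair with correlation $\rho$, where each full conditional is itself normal. This is a minimal sketch in Python rather than BUGS; the function names, the seed and the target distribution are assumptions made purely for illustration.

```python
import random


def gibbs_bivariate_normal(rho, n_iter, burn_in, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = (1.0 - rho * rho) ** 0.5
    x, y = 0.0, 0.0  # Step 1: initial values for the unobserved quantities
    draws = []
    for iteration in range(n_iter):
        # Step 2: draw each quantity from its full conditional distribution,
        # x | y ~ N(rho * y, 1 - rho^2) and y | x ~ N(rho * x, 1 - rho^2).
        x = rng.gauss(rho * y, cond_sd)
        y = rng.gauss(rho * x, cond_sd)
        if iteration >= burn_in:
            draws.append((x, y))
    return draws


def sample_correlation(draws):
    """Step 3 summary: empirical correlation of the retained draws."""
    n = len(draws)
    mean_x = sum(x for x, _ in draws) / n
    mean_y = sum(y for _, y in draws) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in draws)
    sxx = sum((x - mean_x) ** 2 for x, _ in draws)
    syy = sum((y - mean_y) ** 2 for _, y in draws)
    return sxy / (sxx * syy) ** 0.5


draws = gibbs_bivariate_normal(rho=0.8, n_iter=22000, burn_in=2000)
print(round(sample_correlation(draws), 2))
```

In the pedigree setting the same loop runs over regression coefficients, variance components, latent family effects and missing covariates, with BUGS deriving each full conditional from the model specification.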

Some of the most recent and popular packages that implement Gibbs sampling for analysis of pedigree data include BUGS (Gilks et al., 1994; Spiegelhalter et al., 1996a, 1996b), Genetic Analysis Package (GAP, 1996), and MIXD (Olshen & Wijsman, 1996; Thompson, 1994). We have used BUGS mainly because of its flexibility in programming hierarchical models, besides being a freeware product.

The Model

The aim here is to model the correlation structure within the family structure to satisfy the fundamental additive genetic model (Crow & Kimura, 1970; Falconer, 1990; Kempthorne, 1960) as follows. Consider a nuclear family structure consisting of 4 members: father, mother and two children, denoted by F, M, S1, S2, respectively. Using similar notation as in Burton et al. (1999), a conventional mixed linear model consisting of fixed and random effects may be written in the form

$$Q_{ij} = \beta^{T} z_{ij} + A_{ij} + C_{ij} + C^{S}_{ij} + E_{ij} \qquad (1)$$

where $Q_{ij}$ is the observed value of a normally distributed continuous trait for the $j$th individual in the $i$th nuclear family, $z_{ij}$ is a vector of observed covariates representing fixed effects, and $\beta$ is a corresponding vector of unknown fixed regression coefficients. $A_{ij}$, $C_{ij}$ and $C^{S}_{ij}$ denote random effects that represent additive polygenic, common family environment, and common sibling environment effects, respectively. The variation in an individual response is represented by a composite covariance matrix $V_T$, and is the sum of an additive genetic covariance matrix $V_A$, a common family environment matrix $V_C$, a shared sibling environment matrix $V_{Cs}$, and residual environmental effects. The different variance components are

The overall total covariance matrix is

The components of variance $\sigma^2_A$, $\sigma^2_C$, $\sigma^2_{Cs}$ need not be positive as long as $V_T$ is positive definite. A negative value for $\sigma^2_{Cs}$ simply implies that the realized covariance between siblings is less than the realized covariance between parent and a child. A negative value for $\sigma^2_{Cs}$ may suggest dominance. To optimize convergence in BUGS, Model (1) may be reparameterized as

where $F_i$, $G_i$, $H_i$ are independent additive random effects or latent variables, and the remaining terms are the residual error terms for parents and children, respectively. If we model

$$F_i \sim N\left(0, \tfrac{1}{2}\sigma^2_A + \sigma^2_C\right), \quad G_i \sim N\left(0, \tfrac{1}{2}\sigma^2_A\right), \quad H_i \sim N\left(0, \sigma^2_{Cs}\right),$$

then the basic genetics covariance model (additive genetic, common environment and unique environment) for this particular four-member family structure is satisfied.
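The remark that a variance component may be negative provided $V_T$ stays positive definite can be made concrete with a small numerical check. The sketch below assembles a total covariance matrix for the family order (F, M, S1, S2) using textbook relationship coefficients (0.5 between parent and child and between siblings, 0 between spouses); these matrix entries and the residual variance of 1.0 are assumptions made for illustration and are not taken from the paper. Positive definiteness is tested by attempting a Cholesky factorization.

```python
def total_covariance(var_a, var_c, var_cs, var_e):
    """Assumed V_T for family order (father, mother, sibling 1, sibling 2)."""
    additive = [[1.0, 0.0, 0.5, 0.5],   # textbook additive relationship coefficients
                [0.0, 1.0, 0.5, 0.5],
                [0.5, 0.5, 1.0, 0.5],
                [0.5, 0.5, 0.5, 1.0]]
    sibling = [[0, 0, 0, 0],            # shared sibling environment: siblings only
               [0, 0, 0, 0],
               [0, 0, 1, 1],
               [0, 0, 1, 1]]
    return [[var_a * additive[i][j]
             + var_c                     # common family environment: all members
             + var_cs * sibling[i][j]
             + (var_e if i == j else 0.0)
             for j in range(4)] for i in range(4)]


def is_positive_definite(matrix):
    """A Cholesky factorization succeeds only for positive definite matrices."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return False
                lower[i][j] = pivot ** 0.5
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


# Slightly negative common-environment component, hypothetical residual variance.
v_t = total_covariance(var_a=0.452, var_c=-0.053, var_cs=0.467, var_e=1.0)
print(is_positive_definite(v_t))
```

Under these assumed matrices the sibling covariance exceeds the parent–child covariance by exactly the shared sibling component, which matches the interpretation of a negative $\sigma^2_{Cs}$ given above.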

In survival models, unobserved or unmeasured explanatory variables, some of which may be genetic, are often referred to as frailties. The frailties take values restricted to the positive line and may be assumed to act multiplicatively on the hazard. Extending the above model to correlated family data with a time-to-onset endpoint, a multiplicative individual heterogeneity or frailty term representing the latent genetic and common environment variables may be modeled as random effects simultaneously with the effects associated with observed covariates. Consider right-censored time to onset of melanoma data $(T_{ij}, \delta_{ij}, z_{ij})$, $1 \leq j \leq n$, from $n$ relative pairs; here $T_{ij}$ denotes the true age at onset of the $j$th family member or the censored time, depending on whether $\delta_{ij} = 1$ or 0, respectively, and $z$ denotes a $p \times 1$ vector of covariates. A Weibull distribution may be used to model time to failure as

$$f(t_i, z_i) = e^{\beta' z_i}\, \gamma\, t_i^{\gamma - 1} \exp\left(-e^{\beta' z_i}\, t_i^{\gamma}\right)$$

where $\beta$ is a vector of unknown regression coefficients and $\gamma$ is the shape parameter of the Weibull distribution. This leads to a baseline hazard of the form

$$\lambda_0(t_i) = \gamma\, t_i^{\gamma - 1}$$
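The link between the Weibull density and the proportional hazards form can be verified numerically: dividing the density by the survivor function gives a hazard of $e^{\beta' z} \gamma t^{\gamma - 1}$, so the ratio of hazards for two covariate values does not depend on $t$. The sketch below uses a single hypothetical binary covariate; the coefficient, shape value and time points are illustrative only.

```python
import math


def weibull_density(t, mu, gamma):
    """Density with scale multiplier mu = exp(beta' z) and shape gamma."""
    return mu * gamma * t ** (gamma - 1) * math.exp(-mu * t ** gamma)


def weibull_survival(t, mu, gamma):
    """Survivor function exp(-mu * t^gamma)."""
    return math.exp(-mu * t ** gamma)


def weibull_hazard(t, mu, gamma):
    """Hazard mu * gamma * t^(gamma - 1), that is, exp(beta' z) times the baseline."""
    return mu * gamma * t ** (gamma - 1)


gamma = 4.3    # illustrative shape parameter
beta = 0.321   # illustrative coefficient for one binary covariate
mu_baseline = math.exp(beta * 0.0)
mu_exposed = math.exp(beta * 1.0)

for t in (0.5, 1.0, 1.5):
    ratio = weibull_hazard(t, mu_exposed, gamma) / weibull_hazard(t, mu_baseline, gamma)
    print(t, round(ratio, 3))
```

The printed ratio equals $e^{\beta}$ at every time point, which is why the Weibull coefficients can be read as log relative risks in the same way as Cox model coefficients.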

Re-parameterize by letting $\mu_i = e^{\beta' z_i}$; the conditional distribution of $t_i$ given $\mu_i$ is then Weibull($\gamma$, $\mu_i$). We formulated a mixed model to represent the conditional distribution of $t_{ij}$, given covariate effects and random additive genetic and common environment effects, as

$$t_{ij} \mid \mu_{ij} \sim \text{Weibull}(\gamma, \mu_{ij}), \quad i = 1, \ldots, n, \; j = 1, 2$$

where

The regression coefficients and the precisions of the random effects ($\tau_G$, $\tau_F$, $\tau_H$) were given "non-informative" Normal and Gamma priors, respectively. The shape parameter $\gamma$ of the time to onset of melanoma distribution was also given a non-informative Gamma prior, which was slowly decreasing on the positive real line.

The overall model (including random effects) may be described in a Bayesian graph (Figure 3), which simplifies sampling from full conditional distributions by exploiting partial independence properties (Spiegelhalter et al., 1996a). In this graph, each random quantity is represented by a node, which may be connected by directed or undirected links. Conditional independence assumptions are represented by the absence of such links.

We implemented the Gibbs sampler using the BUGS program (Gilks et al., 1994) (code in the Appendix). Imputation of missing data was handled naturally in the Gibbs sampling framework by treating missing values as additional unknown quantities and randomly sampling values from their full conditional distributions. We chose simple prior distributions for imputation, since the number of missing values for covariates was not large (complete for birth year, 16 missing for naevi and 21 missing for freckles) and there was no indication of non-random missingness in our data. Therefore, imputation for missing naevi and freckles covariates was based on Bernoulli prior distributions with respective parameter values estimated from the complete observations. We performed an initial 10,000 burn-in iterations followed by an additional 20,000. Parameter estimates were the mean and standard deviation (SD) of all post-convergence Gibbs samples with a thinning interval between 20 and 50; credible intervals were computed as the lower and upper $\alpha/2$ percentiles from the last 20,000 iterations. Convergence to the posterior distribution was confirmed by using the different criteria provided by the add-on CODA package, including those of Gelman and Rubin (1992), Geweke (1992), and Raftery and Lewis (1992a, 1992b).
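The post-processing described above (discard the burn-in, thin the chain, then report the mean, SD and percentile-based credible interval) can be sketched in a few lines. This is an illustrative Python helper, not the CODA routines used in the analysis; the function name, the nearest-rank percentile rule and the toy chain are assumptions.

```python
import statistics


def summarize_chain(chain, burn_in, thin, alpha=0.05):
    """Posterior mean, SD and (1 - alpha) credible interval from one MCMC chain."""
    kept = sorted(chain[burn_in::thin])  # drop burn-in, keep every thin-th draw
    n = len(kept)
    # Simple rank-based percentiles at alpha/2 and 1 - alpha/2.
    lower = kept[int((alpha / 2) * (n - 1))]
    upper = kept[int((1 - alpha / 2) * (n - 1))]
    return statistics.mean(kept), statistics.stdev(kept), (lower, upper)


# Toy chain: 10 burn-in values followed by the numbers 1..100 as "draws".
chain = [99.0] * 10 + [float(k) for k in range(1, 101)]
mean, sd, interval = summarize_chain(chain, burn_in=10, thin=1, alpha=0.10)
print(mean, round(sd, 2), interval)
```

Thinning reduces the autocorrelation between retained draws at the cost of a smaller sample, so the thinning interval and the number of retained iterations have to be chosen together.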

Results

GEE Approach

An inspection of residual plots following preliminary model-fitting provided no evidence for the failure of the proportional hazards assumption and did not detect influential observations. We then proceeded to apply the GEE approach, which could estimate regression coefficients while incorporating a dependence structure between relative pairs. The results are displayed


in Table 3, which suggest that later birth year, having at least a moderate number of naevi, and freckles were all simultaneously associated with earlier age at onset of melanoma. Table 3 also presents the estimated odds-ratios for quantifying the correlation between paired relatives of specific relationships.

The odds ratio for sib–sib pairs is 2.973, which is significantly different from 1 (p < .001). The odds ratio for parent–child pairs is 1.650, which is slightly greater than 1, indicating a mild dependency between these pairs, although it is not quite statistically significant. The odds ratio for second-degree and higher relative pairs is 1.155, indicating no dependence at all between these pairs. This pattern of familial aggregation is compatible with dominance variance as well as additive genetic variance (Falconer, 1990).

Bayesian Approach

We re-analyzed the age at onset of melanoma by using Gibbs sampling to impute missing covariates and to estimate subject-specific covariate effects and random additive genetic, common family environment, and shared sibling environment effects on the log scale. The estimated shape parameter $\gamma$ was 4.3, with a 95% credible interval (CI) of (4.2, 4.6). The results are summarized in Table 4 (Model A). The residual plots did not indicate a gross departure from the underlying Weibull model and revealed no influential observations. We checked the sensitivity of the analyses to initial parameter values by re-running the Gibbs sampler five more times using different starting values. The resulting estimates did not differ by more than 5% from the values reported here. The mean estimate for $\sigma^2_A$ was 0.452 with 95% CI = (0.348, 0.566), for $\sigma^2_C$ was −0.053 with 95% CI = (−0.120, 0.019), and for $\sigma^2_{Cs}$ was 0.467 with 95% CI = (0.393, 0.545).

A small negative value for the common family effect suggests that there may be a dominant effect or that the current model is not quite appropriate, for example, that there are systematic effects or correlation structures that have not been accounted for. The results here indicate that additive genetic and shared sibling environment effects seem to impact about equally on the variation of the age at onset of melanoma. Further exploration of alternative models is a focus of our future research.

In addition, we investigated the relative contribution of genetic and environmental effects on the expression of naevi (Model B) and freckles (Model C), which are known risk factors for melanoma. The expressions of naevi and freckles were coded as binary variables (none or few moles versus moderate or many moles, and no freckles versus one or more freckles). A hierarchical Bayesian binomial model was fitted to estimate the random variance components. The results in Table 4 indicate that a common family environment effect contributed the most to the expression of naevi ($\sigma^2_C = 0.704$), relative to the contributions of additive genetic effect ($\sigma^2_A = 0.142$) and

Figure 3

Graphical model of covariate and random family effects for an individual (individual $j$ in family $i$) in a nuclear family structure. $t_{ij}$ represents the observed failure time for the $j$th individual in the $i$th family, with $tcen_{ij}$ being an indicator variable of censoring status. Full arrows indicate stochastic links to which a probability is attached; broken arrows denote deterministic relationships. The $\beta$s are regression coefficients; $\tau$ is the precision of the prior distribution and equals the inverse of the variance. $F_i$, $G_i$, $H_i$ are random effects modeled as $F_i \sim N(0, \tfrac{1}{2}\sigma^2_A + \sigma^2_C)$, $G_i \sim N(0, \tfrac{1}{2}\sigma^2_A)$ and $H_i \sim N(0, \sigma^2_{Cs})$. Rectangles represent actual data values for the covariates; $\gamma$ and $\mu_{ij}$ are shape and scale parameters for the underlying Weibull distribution.


of shared sibling effect ($\sigma^2_{Cs} = 0.142$), both of which were non-negligible. In contrast, variation in the expression of freckles was largely explained by additive genetic and shared family effects ($\sigma^2_A = 2.050$, $\sigma^2_C = 2.600$), compared to a relatively small shared sibling effect ($\sigma^2_{Cs} = 0.115$).

Discussion

We applied two methods, generalized estimating equations and Bayesian analysis, to the genetic analysis of age at onset of melanoma based on a nuclear family structure. Under both approaches, the results suggest that additive genetic factors played an

Table 4

Gibbs Sampling Approach: Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families

Weibull Model, A. Mean effects: response variable is age at onset

Covariate        RR = $e^{\beta}$   Coefficient $\beta$   Robust se($\beta$)   95% CI of $\beta$
Year of birth    1.378              0.321                 0.0027               (0.316, 0.326)
Naevi            1.126              0.119                 0.0021               (0.058, 0.185)
Freckling        1.017              0.017                 0.1400               (−0.005, 0.085)

Weibull Model, B. Variance components: response variable is age at onset

Latent effect       Mean from 5000 iterations   se($\sigma^2$)   95% CI of $\sigma^2$
$\sigma^2_A$        0.452                       0.054            (0.348, 0.566)
$\sigma^2_C$        −0.053                      0.027            (−0.120, 0.019)
$\sigma^2_{Cs}$     0.467                       0.040            (0.393, 0.545)
$\gamma$            4.3                         0.104            (4.2, 4.6)

Binomial Model, Variance components: response variable is naevi

Latent effect       Mean from 5000 iterations   se($\sigma^2$)   95% CI of $\sigma^2$
$\sigma^2_A$        0.142                       0.149            (0.002, 0.498)
$\sigma^2_C$        0.704                       0.156            (0.403, 1.010)
$\sigma^2_{Cs}$     0.195                       0.157            (0.0025, 0.553)

Binomial Model, Variance components: response variable is freckling

Latent effect       Mean from 5000 iterations   se($\sigma^2$)   95% CI of $\sigma^2$
$\sigma^2_A$        2.050                       0.779            (0.835, 3.570)
$\sigma^2_C$        2.600                       0.418            (1.780, 3.460)
$\sigma^2_{Cs}$     0.115                       0.088            (0.011, 0.312)

Note: * indicates significance at the 0.05 level. Naevi is a binary variable with baseline 0 = no or few moles. Freckling is coded as a binary variable with baseline 0 = no freckles.

Table 3

GEE Approach: Estimated Regression Coefficients in the Proportional Hazards Model and Estimated Odds Ratios for Quantifying Familial Aggregation in Age at Onset of Melanoma in Queensland Families (* Indicates Significance at the 0.05 Significance Level)

A. Mean effects

| Covariate | RR = $e^{\beta}$ | Coefficient $\beta$ | Robust se($\beta$) | Z-statistic |
| Year of birth | 1.142 | 0.132 | 0.051 | 2.588 |
| Naevi (Baseline = No or few moles) | 1.765 | 0.568 | 0.073 | 7.781 |
| Freckling (Baseline = No freckles) | 1.160 | 0.148 | 0.049 | 3.020 |

B. Patterns of familial aggregation

| Relationship | $1 + \theta$ | se($\theta$) | Z-statistic |
| Sib–sib | 2.973 | 0.6217 | 3.17 |
| Parent–child | 1.650 | 0.434 | 1.50 |
| Second/Others | 1.155 | 0.270 | 0.47 |

important role in the age at onset of melanoma, but that shared sibling environmental factors were not negligible. We focused attention on these approaches because they are more appropriate for modeling correlated age at onset data, and they allow the inclusion of covariates in the analyses. Under both approaches, there were suggestions that earlier melanoma onset was influenced by later birth year, having a moderate number of naevi, and being freckly. The principal difference between the two approaches is in the interpretation of the regression coefficients. The GEE method uses a marginal approach, resulting in regression coefficients that describe the average population response to changing covariates, whereas the Bayesian approach produces subject-specific coefficients. A secondary distinction is in the nature of the within-pair dependence. The GEE model only describes a common covariance among specific relative pairs, whereas the Bayesian approach can explicitly describe the source of this covariance. A third advantage of the Bayesian method is its flexibility in incorporating prior information, if available, for the covariates or latent effects by modifying their prior distributions. Further, the Bayesian method would permit a more accurate decomposition of the genetic variance into additive and dominant components, thus providing the means for a direct assessment of the no-dominance assumption. Finally, it is also interesting to record the amount of CPU time required for each method: 20 seconds for the GEE approach, and approximately 3 hours (for binary traits) and 12 hours (for age at onset outcome) to run BUGS on a single-user Intel Pentium III 600 MHz personal computer with the Linux Mandrake 7.1 operating system. The amount of human and financial resources dedicated to collecting, maintaining and updating the melanoma family database is extremely high; therefore, the extra CPU time required by the MCMC method is well worth the additional genetic information and flexibility that it provides. A BUGS program is included in the Appendix.

Acknowledgments

Data collection was supported by grants from the Australian National Health and Medical Research Council (NHMRC; ID 930223 and ID 961061) and the US National Cancer Institute (CA88363). Programming resources were provided by start-up funds from the Department of Biostatistics at the M. D. Anderson Cancer Center. We acknowledge assistance with BUGS programming from Sang-Joon Lee.

References

Abel, L., & Bonney, G. E. (1990). A time-dependent logistic hazard function for modeling variable age of onset in the analysis of familial diseases. Genetic Epidemiology, 12, 391–407.

Aitken, J. F., Bailey-Wilson, J., Green, A. C., MacLennan, R., & Martin, N. G. (1998). Segregation analysis of cutaneous melanoma in Queensland. Genetic Epidemiology, 15, 391–401.

Aitken, J. F., Green, A. C., MacLennan, R., Youl, P., & Martin, N. G. (1996). The Queensland familial melanoma project: Study design and characteristics of participants. Melanoma Research, 6, 155–165.

Allison, P. D. (1995). Survival analysis using the SAS system: A practical guide. Cary, NC: SAS Institute Inc.

Andersen, P. K., Borgan, O., Gill, R. D., & Keiding, N. (1993). Statistical models based on counting processes. New York: Springer-Verlag.

Bernardo, J. M., Berger, J. O., Dawid, A. P., & Smith, A. F. M. (Eds.). (1992). Bayesian statistics 4. Oxford: Clarendon Press.

Bonney, G. E. (1986). Regressive logistic model for familial disease and other binary traits. Biometrics, 42, 611–625.

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Wadsworth, California.

Cannings, C., & Thompson, E. A. (1977). Ascertainment in the sequential sampling of pedigrees. Clinical Genetics, 12, 208–212.

Clayton, D. G. (1978). A model for association in bivariate life-tables and its application in epidemiological studies of chronic disease incidence. Biometrika, 65, 141–151.

Clayton, D. G., & Cuzick, J. (1985). Multivariate generalizations of the proportional hazards model (with discussion). Journal of the Royal Statistical Society, Series A, 148, 82–117.

Cox, D. R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34, 187–202.

Do, K.-A., Broom, B. M., Kuhnert, P., Duffy, D. L., Todorov, A. A., Treloar, S. A., & Martin, N. G. (2000). Genetic analysis of the age at menopause by using estimating equations and Bayesian random effects models. Statistics in Medicine, 19, 1217–1235.

English, D. R., Armstrong, B. K., Kricker, A., et al. (1997). Sunlight and cancer. Cancer Causes Control, 8, 271–283.

Falconer, D. S. (1990). Introduction to quantitative genetics (3rd ed.). New York: Longman Group Ltd.

Faucett, C., Gauderman, W. J., Thomas, D., Ziogas, A., & Sobel, E. (1993). Combined segregation and linkage analysis of late-onset Alzheimer's disease in Duke families using Gibbs sampling. Genetic Epidemiology, 10, 489–494.

GAP. (1996). Genetic Analysis Package, Release 1.0. Pasadena, CA: Epicenter Software.


Gauderman, W. J., Morrison, J. L., Carpenter, C. L., & Thomas, D. C. (1997). Analysis of gene-smoking interaction in lung cancer. Genetic Epidemiology, 14, 199–214.

Gauderman, W. J., & Thomas, D. C. (1994). Censored survival models for genetic epidemiology: A Gibbs sampling approach. Genetic Epidemiology, 11, 171–188.

Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7, 457–511.

Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intelligence, 6, 721–741.

Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4. Oxford: Clarendon Press.

Gilks, W. R., Thomas, A., & Spiegelhalter, D. J. (1994). A language and program for complex Bayesian modeling. The Statistician, 43, 169–178.

Guo, S. W., & Thompson, E. A. (1991). Monte Carlo EM for the estimation of multiple random effects models on large pedigrees. IMA Journal of Mathematics Applied in Medicine and Biology, 8, 171–189.

Guo, S. W., & Thompson, E. A. (1992). A Monte Carlo method for combined segregation and linkage analysis. American Journal of Human Genetics, 51, 1111–1126.

Guo, S. W., & Thompson, E. A. (1994). Monte Carlo estimation of mixed models for large complex pedigrees. Biometrics, 50, 417–432.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97–109.

Heath, A. C. (1997). Markov Chain Monte Carlo segregation and linkage analysis for oligogenic models. American Journal of Human Genetics, 61, 748–760.

Hougaard, P. (1986). A class of multivariate failure time distributions. Biometrika, 73, 671–678.

Hsu, L., & Prentice, R. L. (1996). On assessing the strength of dependency between failure time variates. Biometrika, 83, 491–506.

Hsu, L., & Zhao, L. P. (1996). Assessing familial aggregation of age at onset by using estimating equations with application to breast cancer. American Journal of Human Genetics, 58, 1057–1071.

Kempthorne, O. (1960). Biometrical genetics. New York: Pergamon Press.

Kendall, M., & Stuart, A. (1979). The advanced theory of statistics (Vol. 2, 4th ed.). Oxford: Oxford University Press.

Kong, K., Frigge, M., Cox, N., & Wong, W. H. (1992). Linkage analysis with adjustments for covariates: A method combining peeling with Gibbs sampling. Cytogenetics and Cell Genetics, 59, 208–210.

Kuhnert, P., & Do, K.-A. (2003). Fitting genetic models to twin data with binary and ordered categorical responses: A comparison of structural equation modeling and Bayesian hierarchical models. Behavior Genetics, 33, 439–452.

Lange, K., & Sobel, E. (1991). A random walk method for computing genetic location scores. American Journal of Human Genetics, 49, 1320–1334.

LeBlanc, M., & Crowley, J. (1992). Relative risk trees for censored survival data. Biometrics, 48, 411–425.

Li, H., & Thompson, E. A. (1997). Semiparametric estimation of major gene and random environmental effects for age of onset. Biometrics, 53, 282–293.

Li, H., & Wijsman, E. A. (1998). Semiparametric estimation of major gene effects for age of onset. Genetic Epidemiology, 15, 279–298.

MacLennan, R., Green, A. C., McLeod, G. R., & Martin, N. G. (1992). Increasing incidence of cutaneous melanoma in Queensland, Australia. Journal of the National Cancer Institute, 84, 1424–1432.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087–1092.

Nielsen, G. G., Gill, R. D., Andersen, P. K., & Sørensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics, 19, 25–43.

Oakes, D. (1989). Bivariate survival models induced by frailties. Journal of the American Statistical Association, 84, 487–493.

Raftery, A. E., & Lewis, S. (1992a). Comment: One long run with diagnostics: Implementation strategies for Markov Chain Monte Carlo. Statistical Science, 7, 493–497.

Raftery, A. E., & Lewis, S. (1992b). How many iterations in the Gibbs sampler? (pp. 763–774). In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4. Oxford: Clarendon Press.

SAGE. (1992). Statistical Analysis for Genetic Epidemiology, Release 2.1. Cleveland, OH: Department of Epidemiology and Biostatistics, Rammelkamp Center for Education and Research, MetroHealth Campus, Case Western Reserve University.

Scurrah, K. J., Palmer, L. J., & Burton, P. R. (2000). Variance components analysis for pedigree-based censored survival data using generalized linear mixed models (GLMMs) and Gibbs sampling in BUGS. Genetic Epidemiology, 19, 127–148.


Siegmund, K. D., & McKnight, B. (1998). Modeling hazard functions in families. Genetic Epidemiology, 15, 147–171.

Spiegelhalter, D. J., Thomas, A., & Best, N. G. (1996a). Computation on Bayesian graphical models (pp. 407–425). Oxford: Oxford University Press.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Gilks, W. (1996b). BUGS 0.6: Bayesian Inference Using Gibbs Sampling manual. Cambridge: MRC Biostatistics Unit.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Lunn, D. (2003). WinBUGS (Bayesian Inference Using Gibbs Sampling) Windows version 1.4 user manual. Cambridge: MRC Biostatistics Unit.

Siskind, V., Aitken, J., Green, A., & Martin, N. (2002). Sun exposure and interaction with family history in risk of melanoma, Queensland, Australia. International Journal of Cancer, 97, 90–95.

Swerdlow, A., & Green, A. (1987). Melanocytic naevi and melanoma: An epidemiological perspective. British Journal of Dermatology, 117, 137–146.

Therneau, T. M., & Atkinson, E. (1997). An introduction to recursive partitioning using the RPART routines (Technical report). Rochester: Mayo Foundation.

Thomas, D. (1992). Fitting genetic data using Gibbs sampling: An application to nevus counts in 38 Utah kindreds. Cytogenetics and Cell Genetics, 59, 228–230.

Thompson, E. A. (1994). Monte Carlo programs for pedigree analysis (Technical Report 267). Washington: Dept. of Statistics, University of Washington.

Vaupel, J. W., Manton, K. G., & Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16, 439–454.


Appendix A: A BUGS Program to Implement the Bayesian MCMC Approach in Modeling an ACE Model for Age at Onset of Melanoma with Covariate Effects and Random Genetic/Environmental Effects


decision trees that were incorporated into RPART by Therneau and Atkinson (1997), using the ideas put forth by LeBlanc and Crowley (1992) for survival data with censoring. These authors showed through simulation studies that survival trees could outperform standard parametric methods such as proportional hazards modelling, particularly in situations where the underlying distribution was not exponential. Along with Breiman et al. (1984), they showed decision trees to be useful exploratory tools for identifying important variables, interactions and outliers.

The methodology for survival trees begins by identifying optimal splits, using the log-rank statistic to separate the data into homogeneous groups. Each split comprises a parent node and two daughter nodes, which are linked to the parent by branches. Figure 1 is an example of a survival tree produced on the melanoma data. Once a large tree is grown, with many terminal nodes that contain very few observations, a pruning procedure is introduced to identify a sequence of sub-trees. The technique of pruning involves snipping back splits of the tree one at a time until only the root node remains. Cross-validation is then introduced to aid in the selection of the optimal model. One nice feature of survival trees is the use of Kaplan-Meier curves to provide information about survival rates at each terminal node of the tree. This approach may allow for better interpretation of the terminal nodes and highlight different scenarios which yield similar survival rates, as can be seen in Figure 1.
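The split-selection step described above can be illustrated with a minimal sketch. It assumes a two-group log-rank chi-square statistic as the splitting criterion and searches the cutpoints of a single covariate; it is not the RPART implementation, and the function names `logrank_chi2` and `best_split` and the toy data are hypothetical.

```python
def logrank_chi2(times, events, groups):
    """Two-group log-rank chi-square statistic (1 d.f.).

    times  : follow-up time for each subject
    events : 1 if onset was observed, 0 if censored
    groups : 0/1 label of the candidate daughter node
    """
    obs_minus_exp = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # Subjects still at risk just before time t.
        at_risk = [(ti, e, g) for ti, e, g in zip(times, events, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for _, _, g in at_risk if g == 1)
        d = sum(1 for ti, e, _ in at_risk if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in at_risk if ti == t and e == 1 and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_split(times, events, x):
    """Return (cutpoint, statistic) for the split x <= cutpoint vs. x > cutpoint
    that maximises the log-rank statistic."""
    best_cut, best_stat = None, 0.0
    for cut in sorted(set(x))[:-1]:
        groups = [1 if xi > cut else 0 for xi in x]
        stat = logrank_chi2(times, events, groups)
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat


# Toy data (illustrative only): later birth cohorts have earlier onset.
birth_year = [1920, 1925, 1930, 1928, 1960, 1965, 1970, 1962]
age_at_onset = [70, 65, 60, 72, 30, 35, 28, 40]
observed = [1] * 8
cut, stat = best_split(age_at_onset, observed, birth_year)
```

In a full tree-growing procedure this search would be repeated over all covariates at every node, followed by pruning and cross-validation as described in the text.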

The variables identified from the multivariable model were used as input into RPART. Variables such as the sun and UV exposures were omitted from the modelling stage, since nearly half of the data for each of these variables were missing (cumulative sun exposure, 43.8%; UV exposure, 40.4%). A large survival tree was produced in S-PLUS and, using cross-validation, a model splitting solely on birth year was identified. This survival tree yielded the minimum cross-validated error rate (0.93), suggesting that birth year was an important indicator for melanoma. Selecting a slightly larger but more informative model (error = 0.94) resulted in a survival tree consisting of nine terminal nodes. This revealed splits on birth year, mole count and freckling. All other variables appeared to be either competing at each node or acting as a surrogate variable for one of these primary splits. Skin type, in particular, arose as an important surrogate for freckling.

In Figure 1, we aim to provide a more visual interpretation of the results from the survival analysis. Each split is shown at the top of each node and can be assumed to be the split that directs observations towards the left side of the tree. For example, after splitting on birth year at 1933, a split to the left corresponds to an individual with few moles. A split to the right indicates an individual with many moles, and so on. The terminal nodes in this figure are displayed as Kaplan-Meier plots showing the survival curves, along with the number of observations, n, and the risk ratio calculated with reference to the baseline group of 9669 individuals. The reference survival curve is shown at the root node, along with a bar which shows the percentage of events in the entire dataset. Grids are placed on each survival plot at equal intervals of 20 years of age. These grids have been plotted to simplify the interpretation. An indicator to the left of the plot displays the risk ratio. Ratios above one are shown by an "up arrow", while a decrease in risk is illustrated by a "down arrow". From this model, we can see that there are a few scenarios that indicate high risk for earlier onset of melanoma. These scenarios may be described as:

individuals born after 1966 with many moles (RR = 113)

individuals born between 1947 and 1966 with many moles (RR = 45)

individuals born between 1933 and 1953 with few moles but many freckles (RR = 24)

individuals born between 1923 and 1933 with many moles (RR = 141)

It is clear from these results that birth year has a substantial impact on the age at onset of melanoma. Once this is taken into account, mole count and freckling provide only a small contribution to the risk.

Family History of Melanoma

As described earlier in the Data section, family history was collected regarding first-degree (siblings and parents) and second-degree relatives. From this pedigree structure, other higher-order types of relative pairs could also be formed. Some of these relatives were diseased with melanoma, resulting in pairs of relatives who both may be diseased (++), both not diseased (−−), and one diseased while the other was not diseased (+−). Table 2 lists the concordant and discordant pairs of specific relationships: sib–sib, parent–child, and second-degree/lower order relative pairs. The second-degree relative pairs include grandparent–grandchild and aunt–niece, while the lower order relative pairs include the in-law pairs. From Table 2, the percentages of both-diseased pairs are 0.6%, 0.9% and 0.4% among sib–sib, parent–child and second/lower order pairs, respectively. Crude estimates of correlation coefficients can be calculated from these percentages without accounting for ages at onset among these relatives. However, the risk of developing melanoma may depend on the subject's age. Hence, adjusting for the age at onset is essential in quantifying the correlation of age at onset between pairs of relatives. In a subsequent section, we describe how this can be done rigorously via the generalized estimating equations approach. On average, melanomas were diagnosed slightly earlier in relatives (47.5 years) than in probands (50.2 years). Among relatives, melanomas were diagnosed at younger ages in later generations.
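As a quick arithmetic check, the both-diseased percentages quoted above can be recomputed from the pair counts in Table 2. The dictionary layout and the function name `percent_both_diseased` are illustrative choices, not part of the original analysis.

```python
# Pair counts from Table 2: (++) both diseased, (+-) discordant, (--) neither diseased.
pair_counts = {
    "sib-sib":       {"++": 49, "+-": 763,  "--": 7011},
    "parent-child":  {"++": 15, "+-": 536,  "--": 1078},
    "second/others": {"++": 41, "+-": 1200, "--": 9817},
}


def percent_both_diseased(counts):
    """Percentage of relative pairs in which both members were diagnosed."""
    return 100.0 * counts["++"] / sum(counts.values())


percentages = {
    relation: round(percent_both_diseased(counts), 1)
    for relation, counts in pair_counts.items()
}
```

These crude percentages ignore age at onset and censoring, which is why the text goes on to the estimating equations approach.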


Figure 1

Survival tree for the melanoma data consisting of 9669 individuals from 1912 Queensland families. Terminal nodes are rectangular and display the Kaplan-Meier curve for the corresponding subgroup. (Root node label: RR = 1.00, baseline.)


To account for the different ages at censoring in each generation due to termination of the study or death from causes other than melanoma, we examined the disease-free survival distribution for each generation using the standard failure-time analysis technique (Figure 2). The median age at diagnosis of melanoma was 64 among parents of probands, 50 among siblings of probands, and 33 among children of probands. The disease-free survival function of the children's generation differed significantly from those of earlier generations (log-rank test, p < .01), but there were no significant differences in age at onset of melanoma between the siblings and parents of probands (log-rank test, p > .9).

Preliminary Segregation Analysis

A preliminary segregation analysis was conducted (Aitken et al., 1998) to investigate whether the familial clustering of cutaneous melanoma is consistent with Mendelian inheritance of a major autosomal gene. Analyses were performed with the SAGE (1992) statistical package, using the maximum likelihood REGTL program for a binary trait with a variable age of onset. The hypothesis of co-dominant Mendelian inheritance gave a significantly better fit to the data than either dominant or recessive Mendelian inheritance. Overall, both Mendelian inheritance of a single major gene and purely environmental transmission were rejected. However, there was strong evidence of familial dependence in melanoma occurrence.

The inclusion of risk factors in the models may reveal whether all or a combination of these explains the familial dependence that was demonstrated by the segregation analysis. If a major gene exists that operates independently of these covariates, including them in the models may reduce residual variation and increase the power of the analysis to detect a major gene effect.

Estimating Equations Approach

Let $y = (\delta_{ki}, t_{ki}, Z_{ki})$ denote the data collected for the ith member in the kth family ($k = 1, \ldots, K$ and $i = 1, 2$), where $\delta_{ki} = 0$ if the observation is censored, $t_{ki}$ is either the recorded age at diagnosis of melanoma or the age at the most recent follow-up for unaffected people, and $Z_{ki}$ is a vector of measured covariates. We assume that censoring time, age at diagnosis of melanoma, and the covariates are independently distributed. These assumptions can be relaxed in more general models, subject to identification constraints. The hazard rate for melanoma is the instantaneous probability that melanoma is diagnosed immediately after time t, given that the person is unaffected at time t. The hazard rate under the Cox proportional hazards model (Cox, 1972) is given by

$$\lambda(t_{ki}) = \lambda_0(t_{ki}) \exp(\beta' Z_{ki})$$

Table 2

Concordant and Discordant Pairs of Relatives in 1912 Families from the Queensland Familial Melanoma Project. Probands Were Not Included for the Calculation of Concordance.

| Pair status | Sib–sib | Parent–child | Second/Others | Total |
| ++ | 49 | 15 | 41 | 105 |
| +− | 763 | 536 | 1200 | 2499 |
| −− | 7011 | 1078 | 9817 | 17906 |
| Total | 7823 | 1629 | 11058 | 20510 |

Figure 2

Cumulative melanoma-free survival among first-degree relatives of confirmed cutaneous melanoma cases (probands) diagnosed in Queensland, Australia, 1982–1987, according to the relative's relationship to the proband.

where $\lambda_0(\cdot)$ is the baseline hazard function and $\beta$ is a vector of regression coefficients.
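To make the link between a regression coefficient and a relative risk concrete, the sketch below evaluates the proportional hazards form for a given covariate vector and converts coefficients to relative risks via exp(β). The coefficient values are those reported for the GEE fit in Table 3; the function names `hazard` and `relative_risk` and the baseline value used in the example are illustrative assumptions.

```python
import math


def hazard(baseline_hazard, beta, z):
    """Proportional hazards form: lambda(t) = lambda_0(t) * exp(beta' z)."""
    linear_predictor = sum(b * x for b, x in zip(beta, z))
    return baseline_hazard * math.exp(linear_predictor)


def relative_risk(coefficient, change=1.0):
    """Relative risk for a given change in one covariate, others held fixed."""
    return math.exp(coefficient * change)


# GEE coefficients as reported in Table 3 (rounded to three decimals).
coefficients = {"Year of birth": 0.132, "Naevi": 0.568, "Freckling": 0.148}
relative_risks = {name: relative_risk(b) for name, b in coefficients.items()}

# Hypothetical subject with many naevi (1) and no freckles (0), ignoring birth
# year; the baseline hazard value 0.001 is arbitrary and purely illustrative.
example = hazard(0.001, [0.568, 0.148], [1, 0])
```

Because the tabulated coefficients are rounded, the recomputed relative risks agree with the RR column of Table 3 only to about the third decimal place.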

For a specific pair of relatives, we follow Clayton (1978) in modeling the bivariate survivor function

$$F(t_{k1}, t_{k2}) = \left( F_1(t_{k1})^{-\theta} + F_2(t_{k2})^{-\theta} - 1 \right)^{-1/\theta}$$

where $F_1$ and $F_2$ are univariable survivor functions, and $\theta$ is a scalar parameter that measures the degree of dependence between the relatives' times at onset, independence being implied by $\theta = 0$ and positive association by $\theta > 0$. The Clayton model allows negative dependencies and has the property that failure times are absolutely continuous for $\theta > -0.5$. In addition, the cross-ratio (or odds-ratio) function, as studied by Oakes (1989), is

$$c(t_{k1}, t_{k2}) = \frac{\lambda(t_{k1} \mid T_{k2} = t_{k2})}{\lambda(t_{k1} \mid T_{k2} \ge t_{k2})} = 1 + \theta$$

This is equivalent to assuming that the odds-ratio is invariant over the grid region that supports the data. Heuristically, the parameter $1 + \theta$ is an odds-ratio that depends on the degree of dependence between the onset ages of the two relatives. If genetic factors do influence the age at onset of melanoma, we would expect to see a higher concordance in the age of onset in first-degree relatives, who on average share half their genes in common, in comparison to second-degree relatives. Under the current model, this translates as $\theta_{\text{first-degree}} > \theta_{\text{second-degree}}$.
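The behaviour of the Clayton survivor function can be explored numerically with a small sketch. The function name `clayton_joint_survivor` and the marginal survival probabilities 0.9 and 0.8 are illustrative assumptions; the value θ = 1.973 corresponds to the sib–sib estimate 1 + θ = 2.973 in Table 3.

```python
def clayton_joint_survivor(s1, s2, theta):
    """Joint survivor probability under the Clayton model.

    s1, s2 : marginal survivor probabilities F1(t1), F2(t2), each in (0, 1]
    theta  : dependence parameter; theta = 0 gives independence
    """
    if theta == 0:
        return s1 * s2
    base = s1 ** (-theta) + s2 ** (-theta) - 1.0
    # For negative theta the expression can fall below zero; truncate at 0.
    return max(base, 0.0) ** (-1.0 / theta)


def cross_ratio(theta):
    """Constant cross-ratio (odds-ratio) implied by the Clayton model."""
    return 1.0 + theta


independent = clayton_joint_survivor(0.9, 0.8, 0.0)
sib_sib = clayton_joint_survivor(0.9, 0.8, 1.973)
```

With positive θ the joint survivor probability exceeds the product of the marginals, reflecting the tendency of related individuals to remain melanoma-free together.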

We may use a standard method to estimate within-pair correlations for 2 × 2 tables from odds ratios. Estimates of relation-pair correlations $\rho_{\text{sib–sib}}$, $\rho_{\text{parent–child}}$ and $\rho_{\text{second-order}}$ are recovered using the relationships, for example, $\rho_{\text{sib–sib}} = \min(1, \ln(1 + \theta_{\text{sib–sib}}))$. Testing for the presence of genetic factors underlying the age at diagnosis of melanoma is equivalent to testing $H_0\colon \rho_{\text{first-order}} = \rho_{\text{second-order}}$. We may test this hypothesis using a z-transform (Kendall & Stuart, 1979, p. 315) of the point estimates of the correlation coefficients. Let $n_1$ and $n_2$ denote the number of first-order and second-order relatives; let $z_1$ and $z_2$ denote the transformed statistics of $\rho_1$ and $\rho_2$, the correlation estimates for first-order and second-order relative pairs, respectively. Specifically, we reject $H_0$ when $E/D > Z_{1-\alpha}$, where

$$E = E(z_1 - z_2) = \frac{1}{2} \log \left[ \frac{1 + \rho_1}{1 - \rho_1} \cdot \frac{1 - \rho_2}{1 + \rho_2} \right]$$

$$D^2 = V(z_1 - z_2) = \frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}$$

and $Z_{1-\alpha}$ is the standard normal deviate corresponding to the one-sided $\alpha$ significance level.
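The test above can be sketched in a few lines. The function names `fisher_z` and `compare_correlations` are hypothetical, and the correlations and sample sizes in the example are made-up values chosen only to show the mechanics; they are not estimates from the melanoma data.

```python
import math
from statistics import NormalDist


def fisher_z(rho):
    """Fisher z-transform of a correlation coefficient."""
    return 0.5 * math.log((1 + rho) / (1 - rho))


def compare_correlations(rho1, n1, rho2, n2, alpha=0.05):
    """One-sided test of H0: rho1 = rho2 against rho1 > rho2.

    Returns the test statistic E/D, the critical value Z_{1-alpha},
    and whether H0 is rejected.
    """
    e = fisher_z(rho1) - fisher_z(rho2)
    d = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    critical = NormalDist().inv_cdf(1 - alpha)
    statistic = e / d
    return statistic, critical, statistic > critical


# Illustrative inputs only: correlation 0.3 in 200 first-order pairs
# versus 0.1 in 200 second-order pairs.
statistic, critical, reject = compare_correlations(0.3, 200, 0.1, 200)
```

In this made-up example the statistic exceeds the one-sided 5% critical value, so the hypothesis of equal correlations would be rejected.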

This approach has the advantage of providing a test for the presence of genetic effects through a single parameter ($\theta$). However, it is limited in its ability to attribute the phenotypic variance to specific effects (e.g., additive gene action).

Mathematical details of the GEE model and the iterative procedure to estimate the regression coefficients $\beta$ and specific degrees of dependence $\theta$ for the different types of relative pairs have been summarized previously in Do et al. (2000).

MCMC Analysis Using BUGS

The Bayesian Paradigm and Gibbs Sampling

Markov Chain Monte Carlo (MCMC) is an alternative Bayesian approach that provides estimates of likelihoods and associated parameter values when exact computation is infeasible (Hastings, 1970; Metropolis et al., 1953). MCMC methods can be used to draw samples from the underlying joint distribution of major genotypes and polygenic values, conditional on the observed data. From these samples, desired parameters and likelihoods can be estimated without the need to resort to exact computation. MCMC methods have been used for linkage analysis (Kong et al., 1992; Lange & Sobel, 1991), for estimation of parameters in the mixed model with and without covariates (Guo & Thompson, 1991; Thomas, 1992), for estimation of gene–smoking interaction and covariate imputation (Gauderman et al., 1997), for performing combined linkage and segregation analysis (Faucett et al., 1993; Guo & Thompson, 1992), and for mixed models of large complex pedigrees (Guo & Thompson, 1994).

In a general setting, let y be the observed data and $\theta$ be everything not observed, including parameters and latent variables. The implementation of Bayesian methods using realistic models and priors is computer-intensive and relies heavily on cunning computational tools to approximate integrals. The problem, in general terms, is to obtain the expected value of a function of interest $s(\cdot)$ under the posterior density $p(\theta \mid y)$:

$$E[s(\theta)] = \frac{\int s(\theta)\, p(\theta)\, p(y \mid \theta)\, d\theta}{\int p(\theta)\, p(y \mid \theta)\, d\theta}$$

which cannot generally be found analytically. One method to carry out the integration on the right-hand side is to perform simulation of exact Bayesian posterior distributions using Markov chain Monte Carlo techniques such as Gibbs sampling. The Gibbs sampler (Geman & Geman, 1984) is the most popular algorithm used in MCMC applications to correlated data. Gibbs sampling was introduced to the main statistical community by Gelfand and Smith (1990) and has since been applied to an even wider array of problems. The Gibbs sampler is easy to implement because it only depends on the local neighborhood structure. In the context of pedigree analysis (Olshen & Wijsman, 1996), the basic procedure is a sequential updating of missing and latent data, including the underlying and unobserved major genotypes, polygenic effects, and


environmental effects. Values for the missing or latent data are sampled from the local conditional distribution, a function of the observed individual data, the current sampled values of other missing/latent data for this particular individual (such as polygenic and environmental effects), and the values for the sampled genetic effects in the immediate neighbors of an individual. Gibbs sampling basically consists of three main steps:

Step 1. Set initial values for unobserved quantities (parameters and latent variables).

Step 2. For each parameter or latent variable $\theta_j$, sample from its "full conditional distribution" given the current values of all other quantities in the model.

Step 3. Examine sampled values of parameters and latent variables to monitor convergence and to provide summary measures.
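The three steps above can be illustrated on a deliberately simple target. The sketch below runs a Gibbs sampler for a standard bivariate normal distribution with correlation ρ, whose full conditionals are known in closed form. It is a toy example, not the pedigree model fitted in BUGS, and the names `gibbs_bivariate_normal` and `sample_correlation` are hypothetical.

```python
import random
from statistics import fmean


def gibbs_bivariate_normal(rho, n_iter=6000, burn_in=1000, seed=1):
    """Gibbs sampler for (X, Y) ~ bivariate normal, zero means, unit variances,
    correlation rho. Full conditionals: X | Y=y ~ N(rho*y, 1 - rho^2), and
    symmetrically for Y | X=x."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0                            # Step 1: initial values
    conditional_sd = (1.0 - rho ** 2) ** 0.5
    draws = []
    for iteration in range(n_iter):
        x = rng.gauss(rho * y, conditional_sd)  # Step 2: sample each quantity
        y = rng.gauss(rho * x, conditional_sd)  # from its full conditional
        if iteration >= burn_in:                # discard burn-in draws
            draws.append((x, y))
    return draws


def sample_correlation(pairs):
    """Pearson correlation of the retained draws (Step 3: summary measure)."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


draws = gibbs_bivariate_normal(rho=0.5)
estimated_rho = sample_correlation(draws)
```

After burn-in, the retained draws behave like a (correlated) sample from the joint distribution, so summaries such as the sample correlation recover the target value approximately.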

Some of the most recent and popular packages that implement Gibbs sampling for analysis of pedigree data include BUGS (Gilks et al., 1994; Spiegelhalter et al., 1996a, 1996b), Genetic Analysis Package (GAP, 1996), and MIXD (Olshen & Wijsman, 1996; Thompson, 1994). We have used BUGS mainly because of its flexibility in programming hierarchical models, besides being a freeware product.

The Model

The aim here is to model the correlation structure within the family structure to satisfy the fundamental additive genetic model (Crow & Kimura, 1970; Falconer, 1990; Kempthorne, 1960), as follows. Consider a nuclear family structure consisting of 4 members: father, mother, and two children, denoted by F, M, S1, S2, respectively. Using similar notation as in Burton et al. (1999), a conventional mixed linear model consisting of fixed and random effects may be written in the form

$$Q_{ij} = \beta^{T} z_{ij} + A_{ij} + C_{ij} + CS_{ij} + E_{ij} \qquad (1)$$

where $Q_{ij}$ is the observed value of a normally distributed continuous trait for the jth individual in the ith nuclear family, $z_{ij}$ is a vector of observed covariates representing fixed effects, and $\beta$ is a corresponding vector of unknown fixed regression coefficients. $A_{ij}$, $C_{ij}$ and $CS_{ij}$ denote random effects that represent additive polygenic, common family environment, and common sibling environment effects, respectively. The variation in an individual response is represented by a composite covariance matrix $V_T$, and is the sum of an additive genetic covariance matrix $V_A$, a common family environment matrix $V_C$, a shared sibling environment matrix $V_{Cs}$, and residual environmental effects. The different variance components are

The overall total covariance matrix is

The components of variance $\sigma^2_A$, $\sigma^2_C$, $\sigma^2_{Cs}$ need not be positive as long as $V_T$ is positive definite. A negative value for $\sigma^2_{Cs}$ simply implies that the realized covariance between siblings is less than the realized covariance between a parent and a child. A negative value for $\sigma^2_{Cs}$ may suggest dominance. To optimize convergence in BUGS, Model (1) may be reparameterized as

where $F_i$, $G_i$, $H_i$ are independent additive random effects or latent variables, and the remaining terms are the residual error terms for parents and children, respectively. If we model

$$F_i \sim N\left(0, \tfrac{1}{2}\sigma^2_A + \sigma^2_C\right), \quad G_i \sim N\left(0, \tfrac{1}{2}\sigma^2_A\right), \quad H_i \sim N\left(0, \sigma^2_{Cs}\right)$$

then the basic genetics covariance model (additive genetic, common environment and unique environment) for this particular four-member family structure is satisfied.

In survival models, unobserved or unmeasured explanatory variables, some of which may be genetic, are often referred to as frailties. The frailties take values restricted to the positive line and may be assumed to act multiplicatively on the hazard. Extending the above model to correlated family data with a time-to-onset endpoint, a multiplicative individual heterogeneity or frailty term, representing the latent genetic and common environment variables, may be modeled as random effects simultaneously with the effects associated with observed covariates. Consider right-censored time to onset of melanoma data $(T_{ij}, \delta_{ij}, z_{ij})$, $1 \le j \le n$, from n relative pairs; here $T_{ij}$ denotes the true age at onset of the jth family member or the censored time, depending on whether $\delta_{ij} = 1$ or 0, respectively, and z denotes a $p \times 1$ vector of covariates. A Weibull distribution may be used to model time to failure as

$$f(t_i, z_i) = e^{\beta' z_i}\, \gamma\, t_i^{\gamma - 1} \exp\left(-e^{\beta' z_i}\, t_i^{\gamma}\right)$$

where $\beta$ is a vector of unknown regression coefficients and $\gamma$ is the shape parameter of the Weibull distribution. This leads to a baseline hazard of the form

$$\lambda_0(t_i) = \gamma\, t_i^{\gamma - 1}$$

Re-parameterize by letting mi = ebrsquoz i the conditionaldistribution of ti given mI is then Weibull (gmi) We for-mulated a mixed model to represent the conditionaldistribution of tij given covariate effects random addi-tive genetic and common environment effects as

tijfrac12mij ~ Weibull (g mij) i = 1hellipn j = 12

where

The regression coefficients and the precision of therandom effects (tG tF tH) were given ldquonon-informa-tiverdquo Normal and Gamma priors respectively The shapeparameter g of the time to onset of melanoma distribu-tion was also given a non-informative Gamma priorwhich was slowly decreasing on the positive real line
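The Weibull parameterization above can be written out directly. The sketch below is a minimal illustration, not the authors' BUGS code: the function names are hypothetical, and the example time scale is arbitrary. It uses the density $\mu \gamma t^{\gamma-1} \exp(-\mu t^{\gamma})$ with $\mu = e^{\beta' z}$, so that the hazard is $\mu$ times the baseline hazard $\gamma t^{\gamma-1}$ and the hazard ratio between two covariate patterns does not depend on $t$.

```python
import math


def scale_from_covariates(beta, z):
    """mu = exp(beta' z); beta and z are equal-length sequences."""
    return math.exp(sum(b * x for b, x in zip(beta, z)))


def weibull_density(t, gamma, mu):
    """f(t) = mu * gamma * t^(gamma - 1) * exp(-mu * t^gamma)."""
    return mu * gamma * t ** (gamma - 1) * math.exp(-mu * t ** gamma)


def weibull_survival(t, gamma, mu):
    """S(t) = exp(-mu * t^gamma)."""
    return math.exp(-mu * t ** gamma)


def weibull_hazard(t, gamma, mu):
    """lambda(t) = mu * gamma * t^(gamma - 1) = f(t) / S(t)."""
    return mu * gamma * t ** (gamma - 1)


def relative_risk(beta, z_exposed, z_baseline):
    """Hazard ratio between two covariate vectors; constant in t."""
    return (scale_from_covariates(beta, z_exposed)
            / scale_from_covariates(beta, z_baseline))
```

For a single binary covariate with coefficient $\beta$, `relative_risk([beta], [1], [0])` reduces to $e^{\beta}$, which is how the RR columns of the results tables relate to the coefficient columns.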

The overall model (including random effects) may be described in a Bayesian graph (Figure 3), which simplifies sampling from full conditional distributions by exploiting partial independence properties (Spiegelhalter et al., 1996a). In this graph, each random quantity is represented by a node, which may be connected by directed or undirected links. Conditional independence assumptions are represented by the absence of such links.

We implemented the Gibbs sampler using the BUGS program (Gilks et al., 1994) (code in the Appendix). Imputation of missing data was handled naturally in the Gibbs sampling framework by treating missing values as additional unknown quantities and randomly sampling values from their full conditional distributions. We chose simple prior distributions for imputation since the number of missing values for covariates was not large (complete for birth year, 16 missing for naevi, and 21 missing for freckles) and there was no indication of non-random missingness in our data. Imputation for the missing naevi and freckles covariates was therefore based on Bernoulli prior distributions, with the respective parameter values estimated from the complete observations. We performed an initial 10,000 burn-in iterations followed by an additional 20,000. Parameter estimates were the mean and standard deviation (SD) of all post-convergence Gibbs samples with a thinning interval between 20 and 50; credible intervals were computed as the lower and upper $\alpha/2$ percentiles from the last 20,000 iterations. Convergence to the posterior distribution was confirmed by using the different criteria provided by the add-on CODA package, including those of Gelman and Rubin (1992), Geweke (1992), and Raftery and Lewis (1992a, 1992b).
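The post-processing described above (discard burn-in, thin, then report mean, SD, and percentile-based credible limits) can be sketched as follows. This is an illustrative stand-in rather than the CODA routines: the function names and the linear-interpolation percentile rule are assumptions, and for simplicity the interval is computed from the same thinned draws as the mean and SD.

```python
import math
import statistics


def percentile(sorted_values, q):
    """Linear interpolation between order statistics, q in [0, 1]."""
    position = q * (len(sorted_values) - 1)
    lower = int(math.floor(position))
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def summarize_chain(draws, burn_in, thin, alpha=0.05):
    """Summarize one parameter's chain after burn-in and thinning."""
    kept = draws[burn_in::thin]          # drop burn-in, keep every thin-th draw
    ordered = sorted(kept)
    return {
        "n_kept": len(kept),
        "mean": statistics.mean(kept),
        "sd": statistics.stdev(kept),
        "lower": percentile(ordered, alpha / 2),       # lower alpha/2 limit
        "upper": percentile(ordered, 1 - alpha / 2),   # upper alpha/2 limit
    }
```

With a chain of 30,000 draws, `summarize_chain(chain, burn_in=10000, thin=20)` would keep 1,000 draws, which corresponds to the lower end of the thinning range quoted in the text.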

Results

GEE Approach

An inspection of residual plots following preliminary model-fitting provided no evidence for failure of the proportional hazards assumption and did not detect influential observations. We then applied the GEE approach, which estimates regression coefficients while incorporating a dependence structure between relative pairs. The results are displayed in Table 3, which suggests that later birth year and having at least a moderate number of naevi and freckles were all simultaneously associated with earlier age at onset of melanoma (each estimated relative risk exceeds 1). Table 3 also presents the estimated odds ratios for quantifying the correlation between paired relatives of specific relationships.

The odds ratio for sib–sib pairs is 2.973, which is significantly different from 1 (p < .001). The odds ratio for parent–child pairs is 1.650, which is slightly greater than 1, indicating a mild dependency between these pairs, although it is not quite statistically significant. The odds ratio for second-degree and higher relative pairs is 1.155, indicating no dependence at all between these pairs. This pattern of familial aggregation is compatible with dominance variance as well as additive genetic variance (Falconer, 1990).
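As a rough check on how the reported Z-statistics relate to the odds ratios, the sketch below divides the estimated dependence parameter $\theta$ (the odds ratio minus 1) by its standard error and converts the result to an upper-tail normal probability. The inputs are the sib–sib and parent–child values from Table 3; the helper names are hypothetical, and the use of a one-sided normal tail is an assumption about how the p-values were obtained.

```python
import math


def wald_z(odds_ratio, se_theta):
    """Z = theta / se(theta), where theta = odds ratio - 1 (null: theta = 0)."""
    return (odds_ratio - 1.0) / se_theta


def upper_tail_p(z):
    """One-sided p-value P(Z > z) for a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Values as reported in Table 3 (1 + theta, se(theta)).
z_sib = wald_z(2.973, 0.6217)
z_parent_child = wald_z(1.650, 0.434)
```

Rounded to two decimals, `z_sib` and `z_parent_child` reproduce the tabulated Z-statistics for those two relationships.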

Bayesian Approach

We re-analyzed the age at onset of melanoma by using Gibbs sampling to impute missing covariates and to estimate subject-specific covariate effects and random additive genetic, common family environment, and shared sibling environment effects on the log scale. The estimated shape parameter $\gamma$ was 4.3 with a 95% credible interval (CI) of (4.2, 4.6). The results are summarized in Table 4 (Model A). The residual plots did not indicate a gross departure from the underlying Weibull model and revealed no influential observations. We checked the sensitivity of the analyses to initial parameter values by re-running the Gibbs sampler five more times using different starting values. The resulting estimates did not differ by more than 5% from the values reported here. The mean estimate for $\sigma^2_A$ was 0.452 with 95% CI = (0.348, 0.566), for $\sigma^2_C$ was –0.053 with 95% CI = (–0.120, 0.019), and for $\sigma^2_{Cs}$ was 0.467 with 95% CI = (0.393, 0.545).

A small negative value for the common family effect suggests that there may be a dominant effect or that the current model is not quite appropriate, for example, that there are systematic effects or correlation structures that have not been accounted for. The results here indicate that additive genetic effects and shared sibling environment effects seem to impact about equally on the variation of the age at onset of melanoma (0.452 versus 0.467). Further exploration of alternative models is a focus of our future research.

In addition, we investigated the relative contribution of genetic and environmental effects on the expression of naevi (Model B) and freckles (Model C), which are known risk factors for melanoma. The expressions of naevi and freckles were coded as binary variables (none or few moles versus moderate or many moles, and no freckles versus one or more freckles). A hierarchical Bayesian binomial model was fitted to estimate the random variance components. The results in Table 4 indicate that a common family environment effect contributed the most to the expression of naevi ($\sigma^2_C = 0.704$), relative to the contributions of the additive genetic effect ($\sigma^2_A = 0.142$) and of the shared sibling effect ($\sigma^2_{Cs} = 0.142$), both of which were non-negligible. In contrast, variation in the expression of freckles was largely explained by additive genetic and shared family effects ($\sigma^2_A = 2.050$, $\sigma^2_C = 2.600$) compared to a relatively small shared sibling effect ($\sigma^2_{Cs} = 0.115$).

Figure 3
Graphical model of covariate and random family effects for an individual in a nuclear family structure (plate: individual j in family i). $t_{ij}$ represents the observed failure time for the $j$th individual in the $i$th family, with $tcen_{ij}$ being an indicator variable of censoring status. Full arrows indicate stochastic links to which a probability is attached; broken arrows denote deterministic relationships. The $\beta$s are regression coefficients; $\tau$ is the precision of the prior distribution and equals the inverse of the variance. $F_i$, $G_i$, $H_i$ are random effects modeled as $F_i \sim N(0, \tfrac{1}{2}\sigma^2_A + \sigma^2_C)$, $G_i \sim N(0, \tfrac{1}{2}\sigma^2_A)$, and $H_i \sim N(0, \sigma^2_{Cs})$. Rectangles represent actual data values for the covariates; $\gamma$ and $\mu_{ij}$ are shape and scale parameters for the underlying Weibull distribution.

Table 3
GEE Approach: Estimated Regression Coefficients in the Proportional Hazard Model and Estimated Odds Ratios for Quantifying Familial Aggregation in Age at Onset of Melanoma in Queensland Families (* Indicates Significance at the .05 Significance Level)

A. Mean effects

Covariate                              RR = e^β   Coefficient β   Robust se(β)   Z-statistic
Year of birth                          1.142      0.132           0.051          2.588
Naevi (baseline = no or few moles)     1.765      0.568           0.073          7.781
Freckling (baseline = no freckles)     1.160      0.148           0.049          3.020

B. Patterns of familial aggregation

Relationship     1 + θ    se(θ)     Z-statistic
Sib–sib          2.973    0.6217    3.17
Parent–child     1.650    0.434     1.50
Second/Others    1.155    0.270     0.47

Table 4
Gibbs Sampling Approach: Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families

Weibull Model A: Mean effects (response variable is age at onset)

Covariate        RR = e^β   Coefficient β   Robust se(β)   95% CI of β
Year of birth    1.378      0.321           0.0027         (0.316, 0.326)
Naevi            1.126      0.119           0.0021         (0.058, 0.185)
Freckling        1.017      0.017           0.1400         (–0.005, 0.085)

Weibull Model B: Variance components (response variable is age at onset)

Latent effect    Mean from 5000 iterations   se(σ²)   95% CI of σ²
σ²_A             0.452                       0.054    (0.348, 0.566)
σ²_C             –0.053                      0.027    (–0.120, 0.019)
σ²_Cs            0.467                       0.040    (0.393, 0.545)
γ                4.3                         0.104    (4.2, 4.6)

Binomial Model: Variance components (response variable is naevi)

Latent effect    Mean from 5000 iterations   se(σ²)   95% CI of σ²
σ²_A             0.142                       0.149    (0.002, 0.498)
σ²_C             0.704                       0.156    (0.403, 1.010)
σ²_Cs            0.195                       0.157    (0.0025, 0.553)

Binomial Model: Variance components (response variable is freckling)

Latent effect    Mean from 5000 iterations   se(σ²)   95% CI of σ²
σ²_A             2.050                       0.779    (0.835, 3.570)
σ²_C             2.600                       0.418    (1.780, 3.460)
σ²_Cs            0.115                       0.088    (0.011, 0.312)

Note: * indicates significance at the 0.05 level. Naevi is a binary variable with baseline 0 = no or few moles. Freckling is coded as a binary variable with baseline 0 = no freckles.

Discussion

We applied two methods, generalized estimating equations and Bayesian analysis, to the genetic analysis of age at onset of melanoma based on a nuclear family structure. Under both approaches, the results suggest that additive genetic factors played an important role in the age at onset of melanoma, but that shared sibling environmental factors were not negligible. We focused attention on these approaches because they are more appropriate for modeling correlated age-at-onset data and they allow the inclusion of covariates in the analyses. Under both approaches, there were suggestions that earlier melanoma onset was influenced by later birth year, having a moderate number of naevi, and having freckles. The principal difference between the two approaches is in the interpretation of the regression coefficients. The GEE method uses a marginal approach, resulting in regression coefficients that describe the average population response to changing covariates, whereas the Bayesian approach produces subject-specific coefficients. A secondary distinction is in the nature of the within-pair dependence. The GEE model only describes a common covariance among specific relative pairs, whereas the Bayesian approach can explicitly describe the source of this covariance. A third advantage of the Bayesian method is its flexibility in incorporating prior information, if available, for the covariates or latent effects by modifying their prior distributions. Further, the Bayesian method would permit a more accurate decomposition of the genetic variance into additive and dominant components, thus providing the means for a direct assessment of the no-dominance assumption.

Finally, it is also interesting to record the amount of CPU time required for each method: 20 seconds for the GEE approach, and approximately 3 hours (for binary traits) and 12 hours (for the age-at-onset outcome) to run BUGS on a single-user Intel Pentium III 600 MHz personal computer with the Linux Mandrake 7.1 operating system. The amount of human and financial resources dedicated to collecting, maintaining, and updating the melanoma family database is extremely high; therefore, the extra CPU time required by the MCMC method is well worth the additional genetic information and flexibility that it provides. A BUGS program is included in the Appendix.

Acknowledgments

Data collection was supported by grants from the Australian National Health and Medical Research Council (NHMRC), ID 930223 and ID 961061, and the US National Cancer Institute (CA88363). Programming resources were provided by start-up funds from the Department of Biostatistics at the M. D. Anderson Cancer Center. We acknowledge assistance with BUGS programming from Sang-Joon Lee.

References

Abel, L., & Bonney, G. E. (1990). A time-dependent logistic hazard function for modeling variable age of onset in the analysis of familial diseases. Genetic Epidemiology, 12, 391–407.

Aitken, J. F., Bailey-Wilson, J., Green, A. C., MacLennan, R., & Martin, N. G. (1998). Segregation analysis of cutaneous melanoma in Queensland. Genetic Epidemiology, 15, 391–401.

Aitken, J. F., Green, A. C., MacLennan, R., Youl, P., & Martin, N. G. (1996). The Queensland familial melanoma project: Study design and characteristics of participants. Melanoma Research, 6, 155–165.

Allison, P. D. (1995). Survival analysis using the SAS system: A practical guide. Cary, NC: SAS Institute Inc.

Andersen, P. K., Borgan, O., Gill, R. D., & Keiding, N. (1993). Statistical models based on counting processes. New York: Springer-Verlag.

Bernardo, J. M., Berger, J. O., Dawid, A. P., & Smith, A. F. M. (Eds.). (1992). Bayesian statistics 4. Oxford: Clarendon Press.

Bonney, G. E. (1986). Regressive logistic model for familial disease and other binary traits. Biometrics, 42, 611–625.

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Wadsworth, California.

Cannings, C., & Thompson, E. A. (1977). Ascertainment in the sequential sampling of pedigrees. Clinical Genetics, 12, 208–212.

Clayton, D. G. (1978). A model for association in bivariate life-tables and its application in epidemiological studies of chronic disease incidence. Biometrika, 65, 141–151.

Clayton, D. G., & Cuzick, J. (1985). Multivariate generalizations of the proportional hazards model (with discussion). Journal of the Royal Statistical Society, Series A, 148, 82–117.

Cox, D. R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34, 187–202.

Do, K.-A., Broom, B. M., Kuhnert, P., Duffy, D. L., Todorov, A. A., Treloar, S. A., & Martin, N. G. (2000). Genetic analysis of the age at menopause by using estimating equations and Bayesian random effects models. Statistics in Medicine, 19, 1217–1235.

English, D. R., Armstrong, B. K., Kricker, A., et al. (1997). Sunlight and cancer. Cancer Causes Control, 8, 271–283.

Falconer, D. S. (1990). Introduction to quantitative genetics (3rd ed.). New York: Longman Group Ltd.

Faucett, C., Gauderman, W. J., Thomas, D., Ziogas, A., & Sobel, E. (1993). Combined segregation and linkage analysis of late-onset Alzheimer's disease in Duke families using Gibbs sampling. Genetic Epidemiology, 10, 489–494.

GAP. (1996). Genetic Analysis Package, Release 1.0. Pasadena, CA: Epicenter Software.


Gauderman, W. J., Morrison, J. L., Carpenter, C. L., & Thomas, D. C. (1997). Analysis of gene-smoking interaction in lung cancer. Genetic Epidemiology, 14, 199–214.

Gauderman, W. J., & Thomas, D. C. (1994). Censored survival models for genetic epidemiology: A Gibbs sampling approach. Genetic Epidemiology, 11, 171–188.

Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7, 457–511.

Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intelligence, 6, 721–741.

Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4. Oxford: Clarendon Press.

Gilks, W. R., Thomas, A., & Spiegelhalter, D. J. (1994). A language and program for complex Bayesian modeling. The Statistician, 43, 169–178.

Guo, S. W., & Thompson, E. A. (1991). Monte Carlo EM for the estimation of multiple random effects models on large pedigrees. IMA Journal of Mathematics Applied in Medicine and Biology, 8, 171–189.

Guo, S. W., & Thompson, E. A. (1992). A Monte Carlo method for combined segregation and linkage analysis. American Journal of Human Genetics, 51, 1111–1126.

Guo, S. W., & Thompson, E. A. (1994). Monte Carlo estimation of mixed models for large complex pedigrees. Biometrics, 50, 417–432.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97–109.

Heath, A. C. (1997). Markov Chain Monte Carlo segregation and linkage analysis for oligogenic models. American Journal of Human Genetics, 61, 748–760.

Hougaard, P. (1986). A class of multivariate failure time distributions. Biometrika, 73, 671–678.

Hsu, L., & Prentice, R. L. (1996). On assessing the strength of dependency between failure time variates. Biometrika, 83, 491–506.

Hsu, L., & Zhao, L. P. (1996). Assessing familial aggregation of age at onset by using estimating equations, with application to breast cancer. American Journal of Human Genetics, 58, 1057–1071.

Kempthorne, O. (1960). Biometrical genetics. New York: Pergamon Press.

Kendall, M., & Stuart, A. (1979). The advanced theory of statistics (Vol. 2, 4th ed.). Oxford: Oxford University Press.

Kong, K., Frigge, M., Cox, N., & Wong, W. H. (1992). Linkage analysis with adjustments for covariates: A method combining peeling with Gibbs sampling. Cytogenetics and Cell Genetics, 59, 208–210.

Kuhnert, P., & Do, K.-A. (2003). Fitting genetic models to twin data with binary and ordered categorical responses: A comparison of structural equation modeling and Bayesian hierarchical models. Behavior Genetics, 33, 439–452.

Lange, K., & Sobel, E. (1991). A random walk method for computing genetic location scores. American Journal of Human Genetics, 49, 1320–1334.

LeBlanc, M., & Crowley, J. (1992). Relative risk trees for censored survival data. Biometrics, 48, 411–425.

Li, H., & Thompson, E. A. (1997). Semiparametric estimation of major gene and random environmental effects for age of onset. Biometrics, 53, 282–293.

Li, H., & Wijsman, E. A. (1998). Semiparametric estimation of major gene effects for age of onset. Genetic Epidemiology, 15, 279–298.

MacLennan, R., Green, A. C., McLeod, G. R., & Martin, N. G. (1992). Increasing incidence of cutaneous melanoma in Queensland, Australia. Journal of the National Cancer Institute, 84, 1424–1432.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087–1092.

Nielsen, G. G., Gill, R. D., Andersen, P. K., & Sørensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics, 19, 25–43.

Oakes, D. (1989). Bivariate survival models induced by frailties. Journal of the American Statistical Association, 84, 487–493.

Raftery, A. E., & Lewis, S. (1992a). Comment: One long run with diagnostics: Implementation strategies for Markov Chain Monte Carlo. Statistical Science, 7, 493–497.

Raftery, A. E., & Lewis, S. (1992b). How many iterations in the Gibbs sampler? In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4 (pp. 763–774). Oxford: Clarendon Press.

SAGE. (1992). Statistical Analysis for Genetic Epidemiology, Release 2.1. Cleveland, OH: Department of Epidemiology and Biostatistics, Rammelkamp Center for Education and Research, MetroHealth Campus, Case Western Reserve University.

Scurrah, K. J., Palmer, L. J., & Burton, P. R. (2000). Variance components analysis for pedigree-based censored survival data using generalized linear mixed models (GLMMs) and Gibbs sampling in BUGS. Genetic Epidemiology, 19, 127–148.


Siegmund, K. D., & McKnight, B. (1998). Modeling hazard functions in families. Genetic Epidemiology, 15, 147–171.

Spiegelhalter, D. J., Thomas, A., & Best, N. G. (1996a). Computation on Bayesian graphical models (pp. 407–425). Oxford: Oxford University Press.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Gilks, W. (1996b). BUGS 0.6: Bayesian Inference Using Gibbs Sampling manual. Cambridge: MRC Biostatistics Unit.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Lunn, D. (2003). WinBUGS (Bayesian Inference Using Gibbs Sampling), Windows version 1.4: User manual. Cambridge: MRC Biostatistics Unit.

Siskind, V., Aitken, J., Green, A., & Martin, N. (2002). Sun exposure and interaction with family history in risk of melanoma, Queensland, Australia. International Journal of Cancer, 97, 90–95.

Swerdlow, A., & Green, A. (1987). Melanocytic naevi and melanoma: An epidemiological perspective. British Journal of Dermatology, 117, 137–146.

Therneau, T. M., & Atkinson, E. (1997). An introduction to recursive partitioning using the RPART routines (Technical report). Rochester: Mayo Foundation.

Thomas, D. (1992). Fitting genetic data using Gibbs sampling: An application to nevus counts in 38 Utah kindreds. Cytogenetics and Cell Genetics, 59, 228–230.

Thompson, E. A. (1994). Monte Carlo programs for pedigree analysis (Technical Report 267). Washington: Dept. of Statistics, University of Washington.

Vaupel, J. W., Manton, K. G., & Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16, 439–454.


Appendix A. A BUGS Program to Implement the Bayesian MCMC Approach in Modeling an ACE Model for Age at Onset of Melanoma with Covariate Effects and Random Genetic/Environmental Effects


Figure 1
Survival tree for the melanoma data consisting of 9669 individuals from 1912 Queensland families. Terminal nodes are rectangular and display the Kaplan-Meier curve for the corresponding subgroup. (Node label in figure: RR = 1.00, baseline.)


To account for the different ages at censoring in each generation due to termination of the study or death from causes other than melanoma, we examined the disease-free survival distribution for each generation using the standard failure-time analysis technique (Figure 2). The median age at diagnosis of melanoma was 64 among parents of probands, 50 among siblings of probands, and 33 among children of probands. The disease-free survival function of the children's generation differed significantly from those of earlier generations (log-rank test with p < .01), but there was no significant difference in age at onset of melanoma between the siblings and parents of probands (log-rank test with p > .9).

Preliminary Segregation Analysis

A preliminary segregation analysis was conducted (Aitken et al., 1998) to investigate whether the familial clustering of cutaneous melanoma is consistent with Mendelian inheritance of a major autosomal gene. Analyses were performed with the SAGE (1992) statistical package using the maximum likelihood REGTL program for a binary trait with a variable age of onset. The hypothesis of co-dominant Mendelian inheritance gave a significantly better fit to the data than either dominant or recessive Mendelian inheritance. Overall, both Mendelian inheritance of a single major gene and purely environmental transmission were rejected. However, there was strong evidence of familial dependence in melanoma occurrence.

The inclusion of risk factors in the models may reveal whether all or a combination of these explains the familial dependence that was demonstrated by the segregation analysis. If a major gene exists that operates independently of these covariates, including them in the models may reduce residual variation and increase the power of the analysis to detect a major gene effect.

Estimating Equations Approach

Let $y = (\delta_{ki}, t_{ki}, Z_{ki})$ denote the data collected for the $i$th member in the $k$th family ($k = 1, \ldots, K$ and $i = 1, 2$), where $\delta_{ki} = 0$ if the observation is censored, $t_{ki}$ is either the recorded age at diagnosis of melanoma or the age at the most recent follow-up for unaffected people, and $Z_{ki}$ is a vector of measured covariates. We assume that censoring time, age at diagnosis of melanoma, and the covariates are independently distributed. These assumptions can be relaxed in more general models, subject to identification constraints. The hazard rate for melanoma is the instantaneous probability that melanoma is diagnosed immediately after time $t$, given that the person is unaffected at time $t$. The hazard rate under the Cox proportional hazards model (Cox, 1972) is given by

$$\lambda(t_{ki}) = \lambda_0(t_{ki}) \exp(\beta' Z_{ki}),$$

where $\lambda_0(\cdot)$ is the baseline hazard function and $\beta$ is a vector of regression coefficients.

Table 2
Concordant and Discordant Pairs of Relatives in 1912 Families from the Queensland Familial Melanoma Project. Probands Were Not Included for the Calculation of Concordance

         Sib–sib   Parent–child   Second/Others   Total
++       49        15             41              105
+–       763       536            1200            2499
––       7011      1078           9817            17906
Total    7823      1629           11058           20510

Figure 2
Cumulative melanoma-free survival among first-degree relatives of confirmed cutaneous melanoma cases (probands) diagnosed in Queensland, Australia, 1982–1987, according to the relative's relationship to the proband.

For a specific pair of relatives, we follow Clayton (1978) in modeling the bivariate survivor function

$$F(t_{k1}, t_{k2}) = \left(F_1(t_{k1})^{-\theta} + F_2(t_{k2})^{-\theta} - 1\right)^{-1/\theta},$$

where $F_1$ and $F_2$ are univariate survivor functions and $\theta$ is a scalar parameter that measures the degree of dependence between the relatives' times at onset, independence being implied by $\theta = 0$ and positive association by $\theta > 0$. The Clayton model allows negative dependencies and has the property that failure times are absolutely continuous for $\theta > -0.5$. In addition, the cross-ratio (or odds-ratio) function, as studied by Oakes (1989), is

$$c(t_{k1}, t_{k2}) = \frac{\lambda(t_{k1} \mid T_{k2} = t_{k2})}{\lambda(t_{k1} \mid T_{k2} \ge t_{k2})} = 1 + \theta.$$

This is equivalent to assuming that the odds ratio is invariant over the grid region that supports the data. Heuristically, the parameter $1 + \theta$ is an odds ratio that depends on the degree of dependence between the onset ages of the two relatives. If genetic factors do influence the age at onset of melanoma, we would expect to see a higher concordance in the age of onset in first-degree relatives, who on average share half their genes in common, in comparison to second-degree relatives. Under the current model, this translates as $\theta_{\text{first-degree}} > \theta_{\text{second-degree}}$.
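The constant cross-ratio property can be checked numerically. The sketch below is an illustration only: it evaluates the Clayton joint survivor function on the scale of the marginal survival probabilities (which leaves the cross-ratio unchanged) and approximates the cross-ratio $S \cdot S_{12} / (S_1 S_2)$ by central finite differences. The function names and the step size are assumptions, and the sketch is restricted to $\theta > 0$.

```python
def clayton_joint_survivor(s1, s2, theta):
    """Joint survivor probability from marginal survivor probabilities.

    s1, s2 are marginal survival probabilities in (0, 1]; theta > 0.
    """
    if theta <= 0:
        raise ValueError("this sketch only covers theta > 0")
    return (s1 ** (-theta) + s2 ** (-theta) - 1.0) ** (-1.0 / theta)


def cross_ratio(s1, s2, theta, h=1e-4):
    """Finite-difference estimate of S * S_12 / (S_1 * S_2)."""
    f = lambda a, b: clayton_joint_survivor(a, b, theta)
    value = f(s1, s2)
    d1 = (f(s1 + h, s2) - f(s1 - h, s2)) / (2 * h)      # dS/ds1
    d2 = (f(s1, s2 + h) - f(s1, s2 - h)) / (2 * h)      # dS/ds2
    d12 = (f(s1 + h, s2 + h) - f(s1 + h, s2 - h)
           - f(s1 - h, s2 + h) + f(s1 - h, s2 - h)) / (4 * h * h)
    return value * d12 / (d1 * d2)
```

Evaluating `cross_ratio` at several interior points with the same `theta` returns values close to `1 + theta` regardless of where on the grid it is evaluated, which is the invariance the text describes.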

We may use a standard method to estimate within-pair correlations for 2 × 2 tables from odds ratios. Estimates of the relation-pair correlations $\rho_{\text{sib–sib}}$, $\rho_{\text{parent–child}}$, and $\rho_{\text{second-order}}$ are recovered by using relationships such as $r_{\text{sib–sib}} = \min(1, \ln(1 + \theta_{\text{sib–sib}}))$. Testing for the presence of genetic factors underlying the age at diagnosis of melanoma is equivalent to testing $H_0\colon \rho_{\text{first-order}} = \rho_{\text{second-order}}$. We may test this hypothesis using a z-transform (Kendall & Stuart, 1979, p. 315) of the point estimates of the correlation coefficients. Let $n_1$ and $n_2$ denote the number of first-order and second-order relatives, and let $z_1$ and $z_2$ denote the transformed statistics of $r_1$ and $r_2$, the correlation estimates for first-order and second-order relative pairs, respectively. Specifically, we reject $H_0$ when $E/D > Z_{1-\alpha}$, where

$$E = E(z_1 - z_2) = \frac{1}{2} \log\left(\frac{1 + r_1}{1 - r_1}\right) - \frac{1}{2} \log\left(\frac{1 + r_2}{1 - r_2}\right),$$

$$D^2 = V(z_1 - z_2) = \frac{1}{n_1 - 3} + \frac{1}{n_2 - 3},$$

and $Z_{1-\alpha}$ is the standard normal deviate corresponding to the one-sided $\alpha$ significance level.
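The test statistic above is straightforward to compute once the correlations are in hand. The following sketch implements the transform and the ratio $E/D$ as written; the function names are hypothetical, and the numbers used in the usage note are made-up illustrative inputs rather than values from the study. Note that a correlation capped at 1 by the $\min(1, \cdot)$ rule has no finite z-transform, so the sketch rejects such inputs explicitly.

```python
import math


def correlation_from_odds_ratio(one_plus_theta):
    """r = min(1, ln(1 + theta)), as given in the text."""
    return min(1.0, math.log(one_plus_theta))


def fisher_z(r):
    """z = 0.5 * log((1 + r) / (1 - r)); undefined for |r| >= 1."""
    if not -1.0 < r < 1.0:
        raise ValueError("z-transform requires -1 < r < 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def z_difference_statistic(r1, n1, r2, n2):
    """E / D with E = z1 - z2 and D^2 = 1/(n1 - 3) + 1/(n2 - 3)."""
    e = fisher_z(r1) - fisher_z(r2)
    d = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return e / d
```

For example, with hypothetical inputs `r1 = 0.5`, `r2 = 0.2`, and `n1 = n2 = 103`, the statistic is about 2.45, which would exceed the one-sided 5% critical value of 1.645.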

This approach has the advantage of providing a test for the presence of genetic effects through a single parameter ($\theta$). However, it is limited in its ability to attribute the phenotypic variance to specific effects (e.g., additive gene action).

Mathematical details of the GEE model and the iterative procedure to estimate the regression coefficients $\beta$ and the specific degrees of dependence $\theta$ for the different types of relative pairs have been summarized previously in Do et al. (2000).

MCMC Analysis Using BUGS

The Bayesian Paradigm and Gibbs Sampling

Markov Chain Monte Carlo (MCMC) is an alternative Bayesian approach that provides estimates of likelihoods and associated parameter values when exact computation is infeasible (Hastings, 1970; Metropolis et al., 1953). MCMC methods can be used to draw samples from the underlying joint distribution of major genotypes and polygenic values, conditional on the observed data. From these samples, desired parameters and likelihoods can be estimated without the need to resort to exact computation. MCMC methods have been used for linkage analysis (Kong et al., 1992; Lange & Sobel, 1991), for estimation of parameters in the mixed model with and without covariates (Guo & Thompson, 1991; Thomas, 1992), for estimation of gene–smoking interaction and covariate imputation (Gauderman et al., 1997), for performing combined linkage and segregation analysis (Faucett et al., 1993; Guo & Thompson, 1992), and for mixed models of large complex pedigrees (Guo & Thompson, 1994).

In a general setting, let $y$ be the observed data and $\theta$ be everything not observed, including parameters and latent variables. The implementation of Bayesian methods using realistic models and priors is computer-intensive and relies heavily on computational tools to approximate integrals. The problem in general terms is to obtain the expected value of a function of interest $s(\cdot)$ under the posterior density $p(\theta \mid y)$:

$$E[s(\theta)] = \frac{\int_{\Theta} s(\theta)\, p(\theta)\, p(y \mid \theta)\, d\theta}{\int_{\Theta} p(\theta)\, p(y \mid \theta)\, d\theta},$$

which cannot generally be found analytically. One method to carry out the integration on the right-hand side is to simulate from the exact Bayesian posterior distribution using Markov chain Monte Carlo techniques such as Gibbs sampling. The Gibbs sampler (Geman & Geman, 1984) is the most popular algorithm used in MCMC applications to correlated data. Gibbs sampling was introduced to the main statistical community by Gelfand and Smith (1990) and has since been applied to an even wider array of problems. The Gibbs sampler is easy to implement because it only depends on the local neighborhood structure. In the context of pedigree analysis (Olshen & Wijsman, 1996), the basic procedure is a sequential updating of missing and latent data, including the underlying and unobserved major genotypes, polygenic effects, and environmental effects. Values for the missing or latent data are sampled from the local conditional distribution, a function of the observed individual data, the current sampled values of other missing/latent data for this particular individual (such as polygenic and environmental effects), and the values for the sampled genetic effects in the immediate neighbors of an individual. Gibbs sampling basically consists of three main steps:

Step 1: Set initial values for unobserved quantities (parameters and latent variables).

Step 2: For each parameter or latent variable $\theta_j$, sample from its "full conditional distribution" given the current values of all other quantities in the model.

Step 3: Examine sampled values of parameters and latent variables to monitor convergence and to provide summary measures.
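The three steps can be illustrated on a toy problem unrelated to the pedigree model: sampling a standard bivariate normal pair with correlation `rho`, where each full conditional is itself normal. This is a minimal sketch under that assumed target distribution, and every name in it is hypothetical; it is not the BUGS implementation used for the analyses.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter, burn_in, seed=0):
    """Gibbs sampler for (X, Y) standard bivariate normal, correlation rho."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0                           # Step 1: initial values
    cond_sd = math.sqrt(1.0 - rho * rho)
    draws = []
    for iteration in range(n_iter):
        # Step 2: sample each quantity from its full conditional,
        # X | Y=y ~ N(rho*y, 1-rho^2) and Y | X=x ~ N(rho*x, 1-rho^2).
        x = rng.gauss(rho * y, cond_sd)
        y = rng.gauss(rho * x, cond_sd)
        if iteration >= burn_in:
            draws.append((x, y))
    return draws


def summarize(draws):
    """Step 3: summary measures from the retained draws."""
    n = len(draws)
    mean_x = sum(d[0] for d in draws) / n
    mean_y = sum(d[1] for d in draws) / n
    sxx = sum((d[0] - mean_x) ** 2 for d in draws)
    syy = sum((d[1] - mean_y) ** 2 for d in draws)
    sxy = sum((d[0] - mean_x) * (d[1] - mean_y) for d in draws)
    return {"mean_x": mean_x, "mean_y": mean_y,
            "correlation": sxy / math.sqrt(sxx * syy)}
```

In the pedigree setting the same loop structure applies, except that the quantities updated in Step 2 are regression coefficients, variance components, latent family effects, and imputed covariates, each drawn from its own full conditional.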

Some of the most recent and popular packages that implement Gibbs sampling for analysis of pedigree data include BUGS (Gilks et al., 1994; Spiegelhalter et al., 1996a, 1996b), Genetic Analysis Package (GAP, 1996), and MIXD (Olshen & Wijsman, 1996; Thompson, 1994). We have used BUGS mainly because of its flexibility in programming hierarchical models, besides being a freeware product.

The Model

The aim here is to model the correlation structurewithin the family structure to satisfy the fundamentaladditive genetic model (Crow amp Kimura 1970Falconer 1990 Kempthorne 1960) as followsConsider a nuclear family structure consisting of 4members father mother and two children denoted byF M S1 S2 respectively Using similar notation as inBurton et al (1999) a conventional mixed linearmodel consisting of fixed and random effects may bewritten in the form

Qij = bTz + Aij +Cij + CSij+ Eij

where Qij is the observed value of a normally distrib-uted continuous trait for the jth individual in the ith

nuclear family zij is a vector of observed covariatesrepresenting fixed effects and b is a correspondingvector of unknown fixed regression coefficients Aij Cij and CSij

A denote random effects that repre-sent additive polygenic common family environmentand common sibling environment effects respectivelyThe variation in an individual response is representedby a composite covariance matrix VT and is the sumof an additive genetic covariance matrix VA acommon family environment matrix VC a sharedsibling environment matrix VCs and residual environ-mental effects The different variance components are

The overall total covariance matrix is
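For orientation, a reconstruction under the standard additive model may be useful. It is a sketch under stated assumptions rather than the authors' own display: family members are ordered (F, M, S1, S2), the parents are taken to be unrelated, and the residual variance symbol $\sigma^2_E$ and the identity matrix $I$ are notation introduced here.

$$
V_A = \sigma^2_A \begin{pmatrix} 1 & 0 & \tfrac12 & \tfrac12 \\ 0 & 1 & \tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac12 & 1 & \tfrac12 \\ \tfrac12 & \tfrac12 & \tfrac12 & 1 \end{pmatrix}, \quad
V_C = \sigma^2_C \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}, \quad
V_{Cs} = \sigma^2_{Cs} \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix}
$$

$$
V_T = V_A + V_C + V_{Cs} + \sigma^2_E I
$$

Under these assumptions the parent–child covariance is $\tfrac12\sigma^2_A + \sigma^2_C$ and the sib–sib covariance is $\tfrac12\sigma^2_A + \sigma^2_C + \sigma^2_{Cs}$, so the two differ by exactly $\sigma^2_{Cs}$.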

The components of variance $\sigma^2_A$, $\sigma^2_C$, $\sigma^2_{Cs}$ need not be positive as long as $V_T$ is positive definite. A negative value for $\sigma^2_{Cs}$ simply implies that the realized covariance between siblings is less than the realized covariance between parent and a child. A negative value for $\sigma^2_{Cs}$ may suggest dominance. To optimize convergence in BUGS, Model (1) may be reparameterized as

where $F_i$, $G_i$, $H_i$ are independent additive random effects or latent variables, and the remaining terms are the residual error terms for parents and children, respectively. If we model

$$F_i \sim N\left(0, \tfrac{1}{2}\sigma^2_A + \sigma^2_C\right), \qquad G_i \sim N\left(0, \tfrac{1}{2}\sigma^2_A\right), \qquad H_i \sim N\left(0, \sigma^2_{Cs}\right)$$

then the basic genetic covariance model (additive genetic, common environment and unique environment) for this particular four-member family structure is satisfied.
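The claim can be checked numerically. The sketch below assumes a particular form for the linear predictors, which is our assumption in the style of Burton et al. (1999) and not a formula quoted from this paper: the father receives $F_i + G_i$, the mother $F_i - G_i$, and each child $F_i + H_i$. The function and key names are hypothetical.

```python
def implied_covariances(var_a, var_c, var_cs):
    """Between-relative covariances implied by independent effects F, G, H.

    Assumed loadings (illustrative): father F + G, mother F - G, child F + H.
    """
    var_f = 0.5 * var_a + var_c   # Var(F_i)
    var_g = 0.5 * var_a           # Var(G_i)
    var_h = var_cs                # Var(H_i)
    loadings = {
        "father": (1, 1, 0),
        "mother": (1, -1, 0),
        "child1": (1, 0, 1),
        "child2": (1, 0, 1),
    }

    def covariance(p, q):
        lp, lq = loadings[p], loadings[q]
        return lp[0] * lq[0] * var_f + lp[1] * lq[1] * var_g + lp[2] * lq[2] * var_h

    return {
        "spouse": covariance("father", "mother"),
        "parent_child": covariance("father", "child1"),
        "sib_sib": covariance("child1", "child2"),
    }


# Arbitrary illustrative variance components.
result = implied_covariances(var_a=0.4, var_c=0.1, var_cs=0.2)
```

Under these assumed loadings the spouse covariance reduces to $\sigma^2_C$, the parent–child covariance to $\tfrac12\sigma^2_A + \sigma^2_C$, and the sib–sib covariance exceeds the parent–child covariance by $\sigma^2_{Cs}$, which is the pattern the additive model requires.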

In survival models, unobserved or unmeasured explanatory variables, some of which may be genetic, are often referred to as frailties. The frailties take values restricted to the positive line and may be assumed to act multiplicatively on the hazard. Extending the above model to correlated family data with a time-to-onset endpoint, a multiplicative individual heterogeneity or frailty term, representing the latent genetic and common environment variables, may be modeled as random effects simultaneously with the effects associated with observed covariates. Consider right-censored time to onset of melanoma data $(T_{ij}, \delta_{ij}, z_{ij})$, $1 \le j \le n$, from $n$ relative pairs; here $T_{ij}$ denotes the true age at onset of the jth family member or the censored time, depending on whether $\delta_{ij} = 1$ or $0$, respectively, and $z$ denotes a $p \times 1$ vector of covariates. A Weibull distribution may be used to model time to failure as

$$f(t_i, z_i) = e^{\beta' z_i}\, \gamma\, t_i^{\gamma - 1} \exp\left(-e^{\beta' z_i}\, t_i^{\gamma}\right)$$

where $\beta$ is a vector of unknown regression coefficients and $\gamma$ is the shape parameter of the Weibull distribution. This leads to a baseline hazard of the form

$$\lambda_0(t_i) = \gamma\, t_i^{\gamma - 1}$$

Re-parameterize by letting $\mu_i = e^{\beta' z_i}$; the conditional distribution of $t_i$ given $\mu_i$ is then Weibull$(\gamma, \mu_i)$. We formulated a mixed model to represent the conditional distribution of $t_{ij}$ given covariate effects, random additive genetic and common environment effects as

$$t_{ij} \mid \mu_{ij} \sim \text{Weibull}(\gamma, \mu_{ij}), \qquad i = 1, \ldots, n, \quad j = 1, 2$$

where

The regression coefficients and the precisions of the random effects ($\tau_G$, $\tau_F$, $\tau_H$) were given "non-informative" Normal and Gamma priors, respectively. The shape parameter $\gamma$ of the time to onset of melanoma distribution was also given a non-informative Gamma prior, which was slowly decreasing on the positive real line.
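To make the Weibull parameterization concrete, the following minimal sketch evaluates the hazard $\mu\gamma t^{\gamma-1}$, the log-density and the log-survivor function, and combines them into a right-censored log-likelihood. It covers only the fixed-effects part ($\mu = e^{\beta' z}$); the random effects and priors are omitted, and the function names and toy data are illustrative assumptions.

```python
import math


def weibull_log_density(t, gamma, mu):
    """log f(t) for f(t) = mu * gamma * t**(gamma - 1) * exp(-mu * t**gamma)."""
    return math.log(mu) + math.log(gamma) + (gamma - 1) * math.log(t) - mu * t ** gamma


def weibull_log_survival(t, gamma, mu):
    """log S(t) for S(t) = exp(-mu * t**gamma)."""
    return -mu * t ** gamma


def weibull_hazard(t, gamma, mu):
    """Hazard f(t) / S(t) = mu * gamma * t**(gamma - 1)."""
    return mu * gamma * t ** (gamma - 1)


def censored_loglik(times, events, covariates, beta, gamma):
    """Right-censored log-likelihood; events[i] is 1 for onset, 0 for censoring."""
    total = 0.0
    for t, d, z in zip(times, events, covariates):
        mu = math.exp(sum(b * x for b, x in zip(beta, z)))
        if d == 1:
            total += weibull_log_density(t, gamma, mu)   # observed onset
        else:
            total += weibull_log_survival(t, gamma, mu)  # still unaffected at t
    return total


# Hypothetical toy data: three subjects, one binary covariate.
times = [1.0, 0.8, 2.0]
events = [1, 0, 1]
covariates = [[0.0], [1.0], [1.0]]
loglik = censored_loglik(times, events, covariates, beta=[0.3], gamma=2.0)
```

Working on the log scale avoids numerical underflow of the survivor function when $t^{\gamma}$ is large, which matters for ages measured in years with a shape parameter well above 1.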

The overall model (including random effects) may be described in a Bayesian graph (Figure 3), which simplifies sampling from full conditional distributions by exploiting partial independence properties (Spiegelhalter et al., 1996a). In this graph, each random quantity is represented by a node, and nodes may be connected by directed or undirected links. Conditional independence assumptions are represented by the absence of such links.

We implemented the Gibbs sampler using the BUGS program (Gilks et al., 1994) (code in the Appendix). Imputation of missing data was handled naturally in the Gibbs sampling framework by treating missing values as additional unknown quantities and randomly sampling values from their full conditional distributions. We chose simple prior distributions for imputation since the number of missing values for covariates was not large (complete for birth year, 16 missing for naevi and 21 missing for freckles) and there was no indication of non-random missingness in our data. Therefore, imputation for missing naevi and freckles covariates was based on Bernoulli prior distributions with respective parameter values estimated from the complete observations. We performed an initial 10,000 burn-in iterations followed by an additional 20,000. Parameter estimates were the mean and standard deviation (SD) of all post-convergence Gibbs samples with a thinning interval between 20 and 50; credible intervals were computed as the lower and upper $\alpha/2$ percentiles from the last 20,000 iterations. Convergence to the posterior distribution was confirmed by using the different criteria provided by the add-on CODA package, including those of Gelman and Rubin (1992), Geweke (1992), and Raftery and Lewis (1992a, 1992b).
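The post-processing described above (discarding burn-in, thinning, then reporting the mean, SD and percentile-based credible interval) can be sketched as follows. This is a generic illustration rather than the CODA implementation; the function name and the nearest-rank percentile rule are assumptions made here.

```python
import statistics


def summarize_chain(chain, burn_in, thin, alpha=0.05):
    """Posterior summary from one MCMC chain after burn-in and thinning."""
    kept = chain[burn_in::thin]
    ordered = sorted(kept)
    n = len(ordered)
    # Nearest-rank percentiles at alpha/2 and 1 - alpha/2 (a simple convention).
    lower = ordered[int(round((alpha / 2) * (n - 1)))]
    upper = ordered[int(round((1 - alpha / 2) * (n - 1)))]
    return {
        "n_kept": n,
        "mean": statistics.mean(kept),
        "sd": statistics.stdev(kept),
        "ci": (lower, upper),
    }


# Deterministic stand-in for 30,000 sampled values, to show the bookkeeping only.
chain = list(range(30000))
summary = summarize_chain(chain, burn_in=10000, thin=20)
```

With 10,000 burn-in iterations and a thinning interval of 20, the 20,000 remaining iterations yield 1,000 retained draws, which is the quantity on which the mean, SD and interval are computed.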

Results

GEE Approach

An inspection of residual plots following preliminary model-fitting provided no evidence for the failure of the proportional hazards assumption and did not detect influential observations. We then proceeded to apply the GEE approach, which can estimate regression coefficients while incorporating a dependence structure between relative pairs. The results are displayed in Table 3, which suggests that later birth year and having at least a moderate number of naevi and freckles were all simultaneously associated with earlier age at onset of melanoma (relative risks greater than 1). Table 3 also presents the estimated odds ratios for quantifying the correlation between paired relatives of specific relationships.

The odds ratio for sib–sib pairs is 2.973, which is significantly different from 1 (p < .001). The odds ratio for parent–child pairs is 1.650, which is slightly greater than 1, indicating a mild dependency between these pairs, although it is not quite statistically significant. The odds ratio for second-degree and higher relative pairs is 1.155, which is not distinguishable from 1 and so gives no evidence of dependence between these pairs. This pattern of familial aggregation is compatible with dominance variance as well as additive genetic variance (Falconer, 1990).
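As a worked check of the reported statistics, the sketch below assumes that each Z-statistic in Table 3 is the Wald ratio of the estimated dependence parameter, $\hat\theta = (1 + \hat\theta) - 1$, to its standard error. That reading is our assumption; the input values are those reported for sib–sib and parent–child pairs.

```python
def wald_z(estimate, se, null_value=0.0):
    """Wald statistic (estimate - null_value) / se."""
    return (estimate - null_value) / se


# (odds ratio 1 + theta, se(theta)) as reported in Table 3.
reported = {
    "sib-sib": (2.973, 0.6217),
    "parent-child": (1.650, 0.434),
}
# Testing odds ratio = 1 is the same as testing theta = 0.
z_scores = {
    name: wald_z(odds_ratio, se, null_value=1.0)
    for name, (odds_ratio, se) in reported.items()
}
```

Rounded to two decimals, these ratios agree with the tabulated Z-statistics for the two first-degree relationships.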

Bayesian Approach

We re-analyzed the age at onset of melanoma by using Gibbs sampling to impute missing covariates and to estimate subject-specific covariate effects and random additive genetic, common family environment and shared sibling environment effects on the log scale. The estimated shape parameter $\gamma$ was 4.3 with a 95% credible interval (CI) of (4.2, 4.6). The results are summarized in Table 4 (Model A). The residual plots did not indicate a gross departure from the underlying Weibull model and revealed no influential observations. We checked the sensitivity of the analyses to initial parameter values by re-running the Gibbs sampler five more times using different starting values. The resulting estimates did not differ by more than 5% from the values reported here. The mean estimate for $\sigma^2_A$ was 0.452 with 95% CI = (0.348, 0.566), for $\sigma^2_C$ was −0.053 with 95% CI = (−0.120, 0.019), and for $\sigma^2_{Cs}$ was 0.467 with 95% CI = (0.393, 0.545).

A small negative value for the common family effect suggests that there may be a dominant effect or that the current model is not quite appropriate, for example, that there are systematic effects or correlation structures that have not been accounted for. The results here indicate that additive genetic effects and shared sibling environment effects contribute about equally (0.452 versus 0.467) to the variation of the age at onset of melanoma. Further exploration of alternative models is a focus of our future research.

In addition, we investigated the relative contribution of genetic and environmental effects on the expression of naevi (Model B) and freckles (Model C), which are known risk factors for melanoma. The expressions of naevi and freckles were coded as binary variables (none or few moles versus moderate or many moles, and no freckles versus one or more freckles). A hierarchical Bayesian binomial model was fitted to estimate the random variance components. The results in Table 4 indicate that a common family environment effect contributed the most to the expression of naevi ($\sigma^2_C = 0.704$), relative to the contributions of the additive genetic effect ($\sigma^2_A = 0.142$) and of the shared sibling effect ($\sigma^2_{Cs} = 0.195$), both of which were non-negligible. In contrast, variation in the expression of freckles was largely explained by additive genetic and shared family effects ($\sigma^2_A = 2.050$, $\sigma^2_C = 2.600$), compared to a relatively small shared sibling effect ($\sigma^2_{Cs} = 0.115$).

Figure 3

Graphical model of covariate and random family effects for an individual in a nuclear family structure. $t_{ij}$ represents the observed failure time for the jth individual in the ith family, with $tcen_{ij}$ being an indicator variable of censoring status. Full arrows indicate stochastic links to which a probability is attached; broken arrows denote deterministic relationships. The $\beta$s are regression coefficients; $\tau$ is the precision of the prior distribution and equals the inverse of the variance. $F_i$, $G_i$, $H_i$ are random effects modeled as $F_i \sim N(0, \tfrac12\sigma^2_A + \sigma^2_C)$, $G_i \sim N(0, \tfrac12\sigma^2_A)$ and $H_i \sim N(0, \sigma^2_{Cs})$. Rectangles represent actual data values for the covariates; $\gamma$ and $\mu_{ij}$ are shape and scale parameters for the underlying Weibull distribution.

Discussion

We applied two methods, generalized estimating equations and Bayesian analysis, to the genetic analysis of age at onset of melanoma based on a nuclear family structure. Under both approaches, the results suggest that additive genetic factors played an

Table 4

Gibbs Sampling Approach: Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families

Weibull Model A. Mean effects (response variable: age at onset)

Covariate        RR = $e^{\beta}$   Coefficient $\beta$   Robust se($\beta$)   95% CI of $\beta$
Year of birth    1.378              0.321                 0.0027               (0.316, 0.326)
Naevi            1.126              0.119                 0.0021               (0.058, 0.185)
Freckling        1.017              0.017                 0.1400               (−0.005, 0.085)

Weibull Model B. Variance components (response variable: age at onset)

Latent effect      Mean from 5000 iterations   se($\sigma^2$)   95% CI of $\sigma^2$
$\sigma^2_A$       0.452                       0.054            (0.348, 0.566)
$\sigma^2_C$       −0.053                      0.027            (−0.120, 0.019)
$\sigma^2_{Cs}$    0.467                       0.040            (0.393, 0.545)
$\gamma$           4.3                         0.104            (4.2, 4.6)

Binomial Model. Variance components (response variable: naevi)

Latent effect      Mean from 5000 iterations   se($\sigma^2$)   95% CI of $\sigma^2$
$\sigma^2_A$       0.142                       0.149            (0.002, 0.498)
$\sigma^2_C$       0.704                       0.156            (0.403, 1.010)
$\sigma^2_{Cs}$    0.195                       0.157            (0.0025, 0.553)

Binomial Model. Variance components (response variable: freckling)

Latent effect      Mean from 5000 iterations   se($\sigma^2$)   95% CI of $\sigma^2$
$\sigma^2_A$       2.050                       0.779            (0.835, 3.570)
$\sigma^2_C$       2.600                       0.418            (1.780, 3.460)
$\sigma^2_{Cs}$    0.115                       0.088            (0.011, 0.312)

Note: Significance is indicated at the 0.05 level. Naevi is a binary variable with baseline 0 = no or few moles. Freckling is coded as a binary variable with baseline 0 = no freckles.

Table 3

GEE Approach: Estimated Regression Coefficients in the Proportional Hazard Model and Estimated Odds Ratios for Quantifying Familial Aggregation in Age at Onset of Melanoma in Queensland Families (significance is indicated at the .05 level)

A. Mean effects

Covariate                              RR = $e^{\beta}$   Coefficient $\beta$   Robust se($\beta$)   Z-statistic
Year of birth                          1.142              0.132                 0.051                2.588
Naevi (baseline = no or few moles)     1.765              0.568                 0.073                7.781
Freckling (baseline = no freckles)     1.160              0.148                 0.049                3.020

B. Patterns of familial aggregation

Relationship     $1 + \theta$   se($\theta$)   Z-statistic
Sib–sib          2.973          0.6217         3.17
Parent–child     1.650          0.434          1.50
Second/Others    1.155          0.270          0.47

important role in the age at onset of melanoma, but that shared sibling environmental factors were not negligible. We focused attention on these approaches because they are more appropriate for modeling correlated age-at-onset data and they allow the inclusion of covariates in the analyses. Under both approaches, there were suggestions that earlier melanoma onset was influenced by later birth year, having a moderate number of naevi and being freckly. The principal difference between the two approaches is in the interpretation of the regression coefficients. The GEE method uses a marginal approach, resulting in regression coefficients that describe the average population response to changing covariates, whereas the Bayesian approach produces subject-specific coefficients. A secondary distinction is in the nature of the within-pair dependence. The GEE model only describes a common covariance among specific relative pairs, whereas the Bayesian approach can explicitly describe the source of this covariance. A third advantage of the Bayesian method is its flexibility in incorporating prior information, if available, for the covariates or latent effects by modifying their prior distributions. Further, the Bayesian method would permit a more accurate decomposition of the genetic variance into additive and dominant components, thus providing the means for a direct assessment of the no-dominance assumption. Finally, it is also interesting to record the amount of CPU time required for each method: 20 seconds for the GEE approach, and approximately 3 hours (for binary traits) and 12 hours (for the age-at-onset outcome) to run BUGS on a single-user Intel Pentium III 600 MHz personal computer with the Linux Mandrake 7.1 operating system. The amount of human and financial resources dedicated to collecting, maintaining and updating the melanoma family database is extremely high; the extra CPU time required by the MCMC method is therefore well worth the additional genetic information and flexibility that it provides. A BUGS program is included in the Appendix.

Acknowledgments

Data collection was supported by grants from the Australian National Health and Medical Research Council (NHMRC; ID 930223 and ID 961061) and the US National Cancer Institute (CA88363). Programming resources were provided by start-up funds from the Department of Biostatistics at the M. D. Anderson Cancer Center. We acknowledge assistance with BUGS programming from Sang-Joon Lee.

References

Abel, L., & Bonney, G. E. (1990). A time-dependent logistic hazard function for modeling variable age of onset in the analysis of familial diseases. Genetic Epidemiology, 12, 391–407.

Aitken, J. F., Bailey-Wilson, J., Green, A. C., MacLennan, R., & Martin, N. G. (1998). Segregation analysis of cutaneous melanoma in Queensland. Genetic Epidemiology, 15, 391–401.

Aitken, J. F., Green, A. C., MacLennan, R., Youl, P., & Martin, N. G. (1996). The Queensland familial melanoma project: Study design and characteristics of participants. Melanoma Research, 6, 155–165.

Allison, P. D. (1995). Survival analysis using the SAS system: A practical guide. Cary, NC: SAS Institute Inc.

Andersen, P. K., Borgan, O., Gill, R. D., & Keiding, N. (1993). Statistical models based on counting processes. New York: Springer-Verlag.

Bernardo, J. M., Berger, J. O., Dawid, A. P., & Smith, A. F. M. (Eds.). (1992). Bayesian statistics 4. Oxford: Clarendon Press.

Bonney, G. E. (1986). Regressive logistic model for familial disease and other binary traits. Biometrics, 42, 611–625.

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Wadsworth, California.

Cannings, C., & Thompson, E. A. (1977). Ascertainment in the sequential sampling of pedigrees. Clinical Genetics, 12, 208–212.

Clayton, D. G. (1978). A model for association in bivariate life-tables and its application in epidemiological studies of chronic disease incidence. Biometrika, 65, 141–151.

Clayton, D. G., & Cuzick, J. (1985). Multivariate generalizations of the proportional hazards model (with discussion). Journal of the Royal Statistical Society, Series A, 148, 82–117.

Cox, D. R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34, 187–202.

Do, K.-A., Broom, B. M., Kuhnert, P., Duffy, D. L., Todorov, A. A., Treloar, S. A., & Martin, N. G. (2000). Genetic analysis of the age at menopause by using estimating equations and Bayesian random effects models. Statistics in Medicine, 19, 1217–1235.

English, D. R., Armstrong, B. K., Kricker, A., et al. (1997). Sunlight and cancer. Cancer Causes Control, 8, 271–283.

Falconer, D. S. (1990). Introduction to quantitative genetics (3rd ed.). New York: Longman Group Ltd.

Faucett, C., Gauderman, W. J., Thomas, D., Ziogas, A., & Sobel, E. (1993). Combined segregation and linkage analysis of late-onset Alzheimer's disease in Duke families using Gibbs sampling. Genetic Epidemiology, 10, 489–494.

GAP. (1996). Genetic Analysis Package, Release 1.0. Pasadena, CA: Epicenter Software.

Gauderman, W. J., Morrison, J. L., Carpenter, C. L., & Thomas, D. C. (1997). Analysis of gene-smoking interaction in lung cancer. Genetic Epidemiology, 14, 199–214.

Gauderman, W. J., & Thomas, D. C. (1994). Censored survival models for genetic epidemiology: A Gibbs sampling approach. Genetic Epidemiology, 11, 171–188.

Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7, 457–511.

Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intelligence, 6, 721–741.

Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4. Oxford: Clarendon Press.

Gilks, W. R., Thomas, A., & Spiegelhalter, D. J. (1994). A language and program for complex Bayesian modeling. The Statistician, 43, 169–178.

Guo, S. W., & Thompson, E. A. (1991). Monte Carlo EM for the estimation of multiple random effects models on large pedigrees. IMA Journal of Mathematics Applied in Medicine and Biology, 8, 171–189.

Guo, S. W., & Thompson, E. A. (1992). A Monte Carlo method for combined segregation and linkage analysis. American Journal of Human Genetics, 51, 1111–1126.

Guo, S. W., & Thompson, E. A. (1994). Monte Carlo estimation of mixed models for large complex pedigrees. Biometrics, 50, 417–432.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97–109.

Heath, A. C. (1997). Markov Chain Monte Carlo segregation and linkage analysis for oligogenic models. American Journal of Human Genetics, 61, 748–760.

Hougaard, P. (1986). A class of multivariate failure time distributions. Biometrika, 73, 671–678.

Hsu, L., & Prentice, R. L. (1996). On assessing the strength of dependency between failure time variates. Biometrika, 83, 491–506.

Hsu, L., & Zhao, L. P. (1996). Assessing familial aggregation of age at onset by using estimating equations with application to breast cancer. American Journal of Human Genetics, 58, 1057–1071.

Kempthorne, O. (1960). Biometrical genetics. New York: Pergamon Press.

Kendall, M., & Stuart, A. (1979). The advanced theory of statistics (Vol. 2, 4th ed.). Oxford: Oxford University Press.

Kong, K., Frigge, M., Cox, N., & Wong, W. H. (1992). Linkage analysis with adjustments for covariates: A method combining peeling with Gibbs sampling. Cytogenetics and Cell Genetics, 59, 208–210.

Kuhnert, P., & Do, K.-A. (2003). Fitting genetic models to twin data with binary and ordered categorical responses: A comparison of structural equation modeling and Bayesian hierarchical models. Behavior Genetics, 33, 439–452.

Lange, K., & Sobel, E. (1991). A random walk method for computing genetic location scores. American Journal of Human Genetics, 49, 1320–1334.

LeBlanc, M., & Crowley, J. (1992). Relative risk trees for censored survival data. Biometrics, 48, 411–425.

Li, H., & Thompson, E. A. (1997). Semiparametric estimation of major gene and random environmental effects for age of onset. Biometrics, 53, 282–293.

Li, H., & Wijsman, E. A. (1998). Semiparametric estimation of major gene effects for age of onset. Genetic Epidemiology, 15, 279–298.

MacLennan, R., Green, A. C., McLeod, G. R., & Martin, N. G. (1992). Increasing incidence of cutaneous melanoma in Queensland, Australia. Journal of the National Cancer Institute, 84, 1424–1432.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087–1092.

Nielsen, G. G., Gill, R. D., Andersen, P. K., & Sørensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics, 19, 25–43.

Oakes, D. (1989). Bivariate survival models induced by frailties. Journal of the American Statistical Association, 84, 487–493.

Raftery, A. E., & Lewis, S. (1992a). Comment: One long run with diagnostics: Implementation strategies for Markov Chain Monte Carlo. Statistical Science, 7, 493–497.

Raftery, A. E., & Lewis, S. (1992b). How many iterations in the Gibbs sampler? In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4 (pp. 763–774). Oxford: Clarendon Press.

SAGE. (1992). Statistical Analysis for Genetic Epidemiology, Release 2.1. Cleveland, OH: Department of Epidemiology and Biostatistics, Rammelkamp Center for Education and Research, MetroHealth Campus, Case Western Reserve University.

Scurrah, K. J., Palmer, L. J., & Burton, P. R. (2000). Variance components analysis for pedigree-based censored survival data using generalized linear mixed models (GLMMs) and Gibbs sampling in BUGS. Genetic Epidemiology, 19, 127–148.

Siegmund, K. D., & McKnight, B. (1998). Modeling hazard functions in families. Genetic Epidemiology, 15, 147–171.

Spiegelhalter, D. J., Thomas, A., & Best, N. G. (1996a). Computation on Bayesian graphical models (pp. 407–425). Oxford: Oxford University Press.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Gilks, W. (1996b). BUGS 0.6: Bayesian Inference Using Gibbs Sampling manual. Cambridge: MRC Biostatistics Unit.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Lunn, D. (2003). WinBUGS (Bayesian Inference Using Gibbs Sampling), Windows version 1.4, user manual. Cambridge: MRC Biostatistics Unit.

Siskind, V., Aitken, J., Green, A., & Martin, N. (2002). Sun exposure and interaction with family history in risk of melanoma, Queensland, Australia. International Journal of Cancer, 97, 90–95.

Swerdlow, A., & Green, A. (1987). Melanocytic naevi and melanoma: An epidemiological perspective. British Journal of Dermatology, 117, 137–146.

Therneau, T. M., & Atkinson, E. (1997). An introduction to recursive partitioning using the RPART routines (Technical report). Rochester: Mayo Foundation.

Thomas, D. (1992). Fitting genetic data using Gibbs sampling: An application to nevus counts in 38 Utah kindreds. Cytogenetics and Cell Genetics, 59, 228–230.

Thompson, E. A. (1994). Monte Carlo programs for pedigree analysis (Technical Report 267). Washington: Dept. of Statistics, University of Washington.

Vaupel, J. W., Manton, K. G., & Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16, 439–454.

Appendix A: A BUGS Program to Implement the Bayesian MCMC Approach in Modeling an ACE Model for Age at Onset of Melanoma with Covariate Effects and Random Genetic/Environmental Effects


To account for the different ages at censoring in each generation due to termination of the study or death from causes other than melanoma, we examined the disease-free survival distribution for each generation using the standard failure-time analysis technique (Figure 2). The median age at diagnosis of melanoma was 64 among parents of probands, 50 among siblings of probands, and 33 among children of probands. The disease-free survival functions of the children's generation differed significantly from those of earlier generations (log-rank test with p < .01), but there were no significant differences in age at onset of melanoma between the siblings and parents of probands (log-rank test with p > .9).

Preliminary Segregation Analysis

A preliminary segregation analysis was conducted (Aitken et al., 1998) to investigate whether the familial clustering of cutaneous melanoma is consistent with Mendelian inheritance of a major autosomal gene. Analyses were performed with the SAGE (1992) statistical package using the maximum likelihood REGTL program for a binary trait with a variable age of onset. The hypothesis of co-dominant Mendelian inheritance gave a significantly better fit to the data than either dominant or recessive Mendelian inheritance. Overall, both Mendelian inheritance of a single major gene and purely environmental transmission were rejected. However, there was strong evidence of familial dependence in melanoma occurrence.

The inclusion of risk factors in the models may reveal whether all or a combination of these explains the familial dependence that was demonstrated by the segregation analysis. If a major gene exists that operates independently of these covariates, including them in the models may reduce residual variation and increase the power of the analysis to detect a major gene effect.

Estimating Equations Approach

Let $y = (\delta_{ki}, t_{ki}, Z_{ki})$ denote the data collected for the ith member in the kth family ($k = 1, \ldots, K$ and $i = 1, 2$), where $\delta_{ki} = 0$ if the observation is censored, $t_{ki}$ is either the recorded age at diagnosis of melanoma or the age at the most recent follow-up for unaffected people, and $Z_{ki}$ is a vector of measured covariates. We assume that censoring time, age at diagnosis of melanoma, and the covariates are independently distributed. These assumptions can be relaxed in more general models, subject to identification constraints. The hazard rate for melanoma is the instantaneous probability that melanoma is diagnosed immediately after time t, given that the person is unaffected at time t. The hazard rate under the Cox proportional hazards model (Cox, 1972) is given by

$$\lambda(t_{ki}) = \lambda_0(t_{ki}) \exp(\beta' Z_{ki})$$

Table 2

Concordant and Discordant Pairs of Relatives in 1912 Families from the Queensland Familial Melanoma Project. Probands Were Not Included for the Calculation of Concordance

          Sib–sib   Parent–child   Second/Others   Total
++        49        15             41              105
+−        763       536            1200            2499
−−        7011      1078           9817            17906
Total     7823      1629           11058           20510

Figure 2

Cumulative melanoma-free survival among first-degree relatives of confirmed cutaneous melanoma cases (probands) diagnosed in Queensland, Australia, 1982–1987, according to the relative's relationship to the proband.

where $\lambda_0(\cdot)$ is the baseline hazard function and $\beta$ is a vector of regression coefficients.
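Under the proportional hazards form, the baseline hazard cancels when two covariate profiles are compared, so the relative risk for a unit change in one covariate is $e^{\beta}$. The short sketch below illustrates this using the naevi and freckling coefficients reported in Table 3; the function names and the baseline value 0.01 are hypothetical.

```python
import math


def hazard(baseline_hazard, beta, z):
    """Proportional hazards: lambda(t) = lambda_0(t) * exp(beta' z)."""
    return baseline_hazard * math.exp(sum(b * x for b, x in zip(beta, z)))


def relative_risk(beta_j, delta=1.0):
    """Hazard ratio for a change of `delta` in a single covariate."""
    return math.exp(beta_j * delta)


# Coefficients as reported in Table 3 (naevi, freckling).
rr_naevi = relative_risk(0.568)
rr_freckling = relative_risk(0.148)

# The ratio of two hazards does not depend on the (arbitrary) baseline value.
ratio = hazard(0.01, [0.568], [1.0]) / hazard(0.01, [0.568], [0.0])
```

A relative risk above 1 corresponds to a higher hazard at every age and therefore to earlier expected onset.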

For a specific pair of relatives, we follow Clayton (1978) in modeling the bivariate survivor function

$$F(t_{k1}, t_{k2}) = \left(F_1(t_{k1})^{-\theta} + F_2(t_{k2})^{-\theta} - 1\right)^{-1/\theta}$$

where $F_1$ and $F_2$ are univariable survivor functions and $\theta$ is a scalar parameter that measures the degree of dependence between the relatives' times at onset, independence being implied by $\theta = 0$ and positive association by $\theta > 0$. The Clayton model allows negative dependencies and has the property that failure times are absolutely continuous for $\theta > -0.5$. In addition, the cross-ratio (or odds-ratio) function, as studied by Oakes (1989), is

$$c(t_{k1}, t_{k2}) = \frac{\lambda(t_{k1} \mid T_{k2} = t_{k2})}{\lambda(t_{k1} \mid T_{k2} \ge t_{k2})} = 1 + \theta$$

This is equivalent to assuming that the odds-ratio is invariant over the grid region that supports the data. Heuristically, the parameter $1 + \theta$ is an odds-ratio that depends on the degree of dependence between the onset ages of the two relatives. If genetic factors do influence the age at onset of melanoma, we would expect to see a higher concordance in the age of onset in first-degree relatives, who on average share half their genes in common, in comparison to second-degree relatives. Under the current model, this translates as $\theta_{\text{first-degree}} > \theta_{\text{second-degree}}$.
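The constant cross-ratio property can be verified numerically. The sketch below evaluates the Clayton joint survivor function on the scale of the marginal survival probabilities and approximates the cross-ratio by central finite differences; the function names, the step size `h` and the evaluation point (0.9, 0.8) are illustrative choices made here.

```python
def clayton_survival(s1, s2, theta):
    """Joint survivor probability given marginal survivor probabilities s1, s2."""
    if abs(theta) < 1e-12:
        return s1 * s2  # limiting case: independence
    return (s1 ** -theta + s2 ** -theta - 1.0) ** (-1.0 / theta)


def cross_ratio(s1, s2, theta, h=1e-4):
    """Cross-ratio S * S_12 / (S_1 * S_2) using central finite differences."""
    c = clayton_survival
    d1 = (c(s1 + h, s2, theta) - c(s1 - h, s2, theta)) / (2 * h)
    d2 = (c(s1, s2 + h, theta) - c(s1, s2 - h, theta)) / (2 * h)
    d12 = (c(s1 + h, s2 + h, theta) - c(s1 + h, s2 - h, theta)
           - c(s1 - h, s2 + h, theta) + c(s1 - h, s2 - h, theta)) / (4 * h * h)
    return c(s1, s2, theta) * d12 / (d1 * d2)


theta = 1.973  # sib-sib estimate implied by 1 + theta = 2.973
joint = clayton_survival(0.9, 0.8, theta)
ratio = cross_ratio(0.9, 0.8, theta)
```

With $\theta > 0$ the joint survivor probability exceeds the product of the marginals, and the finite-difference cross-ratio returns $1 + \theta$ regardless of the point at which it is evaluated.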

We may use a standard method to estimate within-pair correlations for $2 \times 2$ tables from odds ratios. Estimates of relation-pair correlations $\rho_{\text{sib–sib}}$, $\rho_{\text{parent–child}}$ and $\rho_{\text{second-order}}$ are recovered from using the relationships, for example, $r_{\text{sib–sib}} = \min(1, \ln(1 + \theta_{\text{sib–sib}}))$. Testing for the presence of genetic factors underlying the age at diagnosis of melanoma is equivalent to testing $H_0$: $\rho_{\text{first-order}} = \rho_{\text{second-order}}$. We may test this hypothesis using a z-transform (Kendall & Stuart, 1979, p. 315) of the point estimates of the correlation coefficients. Let $n_1$ and $n_2$ denote the number of first-order and second-order relatives, and let $z_1$ and $z_2$ denote the transformed statistics of $r_1$ and $r_2$, the correlation estimates for first-order and second-order relative pairs, respectively. Specifically, we reject $H_0$ when $E/D > Z_{1-\alpha}$, where

$$E = E(z_1 - z_2) = \frac{1}{2}\log\left(\frac{1 + r_1}{1 - r_1}\right) - \frac{1}{2}\log\left(\frac{1 + r_2}{1 - r_2}\right), \qquad D^2 = V(z_1 - z_2) = \frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}$$

and $Z_{1-\alpha}$ is the standard normal deviate corresponding to the one-sided $\alpha$ significance level.
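The test can be sketched in a few lines. The example below is illustrative only: the function names are ours, and the correlations and pair counts passed in are hypothetical inputs rather than estimates from the melanoma data.

```python
import math
from statistics import NormalDist


def correlation_from_odds_ratio(theta):
    """r = min(1, ln(1 + theta)), the relationship quoted in the text."""
    return min(1.0, math.log(1.0 + theta))


def fisher_z(r):
    """z-transform of a correlation coefficient (requires |r| < 1)."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def compare_correlations(r1, n1, r2, n2, alpha=0.05):
    """One-sided test of H0: rho1 = rho2 against rho1 > rho2."""
    e = fisher_z(r1) - fisher_z(r2)
    d = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    critical = NormalDist().inv_cdf(1.0 - alpha)
    return e / d, critical, e / d > critical


# Hypothetical inputs, chosen only to exercise the formula.
statistic, critical, reject = compare_correlations(r1=0.5, n1=200, r2=0.2, n2=300)
```

Note that the z-transform is undefined at $r = 1$, so a correlation truncated to 1 by the $\min(1, \cdot)$ rule cannot be passed to this test without further adjustment.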

This approach has the advantage of providing a test for the presence of genetic effects through a single parameter ($\theta$). However, it is limited in its ability to attribute the phenotypic variance to specific effects (e.g., additive gene action).

Mathematical details of the GEE model and the iterative procedure to estimate the regression coefficients $\beta$ and specific degrees of dependence $\theta$ for the different types of relative pairs have been summarized previously in Do et al. (2000).

MCMC Analysis Using BUGS

The Bayesian Paradigm and Gibbs Sampling

Markov Chain Monte Carlo (MCMC) is an alternative Bayesian approach that provides estimates of likelihoods and associated parameter values when exact computation is infeasible (Hastings, 1970; Metropolis et al., 1953). MCMC methods can be used to draw samples from the underlying joint distribution of major genotypes and polygenic values, conditional on the observed data. From these samples, desired parameters and likelihoods can be estimated without the need to resort to exact computation. MCMC methods have been used for linkage analysis (Kong et al., 1992; Lange & Sobel, 1991), for estimation of parameters in the mixed model with and without covariates (Guo & Thompson, 1991; Thomas, 1992), for estimation of gene–smoking interaction and covariate imputation (Gauderman et al., 1997), for performing combined linkage and segregation analysis (Faucett et al., 1993; Guo & Thompson, 1992), and for mixed models of large complex pedigrees (Guo & Thompson, 1994).

In a general setting, let $x$ be the observed data and $\theta$ be everything not observed, including parameters and latent variables. The implementation of Bayesian methods using realistic models and priors is computer-intensive and relies heavily on computational tools to approximate integrals. The problem, in general terms, is to obtain the expected value of a function of interest $s(\cdot)$ under the posterior density $p(\theta \mid x)$:

E[s(q)] =

which cannot generally be found analytically Onemethod to carry out the integration on the RHS is toperform simulation of exact Bayesian posterior distri-butions using Markov chain Monte Carlo techniquessuch as Gibbs sampling The Gibbs sampler (Gemanamp Geman 1984) is the most popular algorithm usedin MCMC applications to correlated data Gibbssampling was introduced to the main statistical com-munity by Gelfand and Smith (1990) and has sincebeen applied to an even wider array of problems TheGibbs sampler is easy to implement because it onlydepends on the local neighborhood structure In thecontext of pedigree analysis (Olshen amp Wijsman1996) the basic procedure is a sequential updating ofmissing and latent data including the underlying andunobserved major genotypes polygenic effects and

EQ s(q) p(q) p(xfrac12q) dqmdash mdash mdash

EQ p(q) p(xfrac12q) dq

105Twin Research February 2004

Analysis of Melanoma Onset

106 Twin Research February 2004

Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin

environmental effects Values for the missing or latentdata are sampled from the local conditional distribu-tion a function of the observed individual data thecurrent sampled values of other missinglatent data forthis particular individual such as polygenic and environ-mental effects and the values for the sampled geneticeffects in the immediate neighbors of an individualGibbs sampling basically consists of three main steps

Step 1. Set initial values for unobserved quantities (parameters and latent variables).

Step 2. For each parameter or latent variable $\theta_j$, sample from its "full conditional distribution" given the current values of all other quantities in the model.

Step 3. Examine sampled values of parameters and latent variables to monitor convergence and to provide summary measures.
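The three steps can be illustrated on a toy problem. The sketch below is not the pedigree model of this paper: it assumes a simple normal sample with unknown mean and precision, with vague conjugate priors chosen purely for illustration, so that both full conditional distributions are available in closed form. The function name `gibbs_normal` and all prior settings are hypothetical.

```python
import random
import statistics


def gibbs_normal(y, n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampler for y_i ~ N(mu, 1/tau) with vague conjugate priors.

    Assumed priors (illustrative only): mu ~ N(0, 1/p0), tau ~ Gamma(a0, rate=b0).
    Returns the post-burn-in draws as a list of (mu, tau) tuples.
    """
    rng = random.Random(seed)
    n, sum_y = len(y), sum(y)
    p0, a0, b0 = 1e-6, 1e-3, 1e-3

    # Step 1: initial values for the unobserved quantities.
    mu, tau = 0.0, 1.0

    draws = []
    for it in range(n_iter):
        # Step 2a: sample mu from its full conditional, N(tau*sum_y/prec, 1/prec).
        prec = p0 + n * tau
        mu = rng.gauss(tau * sum_y / prec, prec ** -0.5)
        # Step 2b: sample tau from its full conditional,
        # Gamma(a0 + n/2, rate = b0 + SS/2); gammavariate takes a scale, not a rate.
        ss = sum((v - mu) ** 2 for v in y)
        tau = rng.gammavariate(a0 + n / 2, 1.0 / (b0 + ss / 2))
        if it >= burn_in:
            draws.append((mu, tau))
    return draws


# Step 3: examine the sampled values (here only simple summaries).
data_rng = random.Random(1)
y = [data_rng.gauss(5.0, 2.0) for _ in range(200)]
draws = gibbs_normal(y)
post_mean_mu = statistics.mean(d[0] for d in draws)
post_mean_tau = statistics.mean(d[1] for d in draws)
```

In the pedigree setting the same loop structure applies, but Step 2 cycles over many more unknowns (regression coefficients, random effects, and imputed covariates), which is the bookkeeping that BUGS automates.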

Some of the most recent and popular packages that implement Gibbs sampling for analysis of pedigree data include BUGS (Gilks et al., 1994; Spiegelhalter et al., 1996a, 1996b), the Genetic Analysis Package (GAP, 1996), and MIXD (Olshen & Wijsman, 1996; Thompson, 1994). We have used BUGS mainly because of its flexibility in programming hierarchical models, and because it is freeware.

The Model

The aim here is to model the correlation structure within the family so as to satisfy the fundamental additive genetic model (Crow & Kimura, 1970; Falconer, 1990; Kempthorne, 1960), as follows. Consider a nuclear family structure consisting of four members (father, mother and two children), denoted by F, M, S1 and S2, respectively. Using notation similar to that of Burton et al. (1999), a conventional mixed linear model consisting of fixed and random effects may be written in the form

$$Q_{ij} = \beta^{T} z_{ij} + A_{ij} + C_{ij} + C_{S,ij} + E_{ij}, \qquad (1)$$

where $Q_{ij}$ is the observed value of a normally distributed continuous trait for the jth individual in the ith nuclear family, $z_{ij}$ is a vector of observed covariates representing fixed effects, and $\beta$ is a corresponding vector of unknown fixed regression coefficients. $A_{ij}$, $C_{ij}$ and $C_{S,ij}$ denote random effects that represent additive polygenic, common family environment, and common sibling environment effects, respectively. The variation in an individual response is represented by a composite covariance matrix $V_T$, which is the sum of an additive genetic covariance matrix $V_A$, a common family environment matrix $V_C$, a shared sibling environment matrix $V_{Cs}$, and residual environmental effects. The different variance components are

The overall total covariance matrix is

The components of variance $\sigma^2_A$, $\sigma^2_C$ and $\sigma^2_{Cs}$ need not be positive as long as $V_T$ is positive definite. A negative value for $\sigma^2_{Cs}$ simply implies that the realized covariance between siblings is less than the realized covariance between a parent and a child. A negative value for $\sigma^2_{Cs}$ may suggest dominance. To optimize convergence in BUGS, Model (1) may be reparameterized as

where $F_i$, $G_i$ and $H_i$ are independent additive random effects or latent variables, and the remaining terms are the residual error terms for parents and children, respectively. If we model

$$F_i \sim N\!\left(0, \tfrac{1}{2}\sigma^2_A + \sigma^2_C\right), \qquad G_i \sim N\!\left(0, \tfrac{1}{2}\sigma^2_A\right), \qquad H_i \sim N\!\left(0, \sigma^2_{Cs}\right),$$

then the basic genetic covariance model (additive genetic, common environment and unique environment) for this particular four-member family structure is satisfied.
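To make the role of positive definiteness concrete, the following sketch assembles a total covariance matrix for the family (F, M, S1, S2) from the four variance components and checks whether it is positive definite. The matrix layout is an assumption based on the standard additive model (additive genetic relationship of 1/2 between first-degree relatives and 0 between spouses, a family environment shared by all four members, and a sibling environment shared only by S1 and S2); it is not copied from the paper's displayed matrices. The residual variance of 1.0 is hypothetical, and the function names are illustrative.

```python
def family_covariance(var_a, var_c, var_cs, var_e):
    """Total covariance V_T = V_A + V_C + V_Cs + V_E for (F, M, S1, S2).

    Assumed structure (standard additive model, not taken from the source):
      - additive genetic relationships: 1 on the diagonal, 1/2 for
        parent-child and sib-sib pairs, 0 for the two spouses;
      - common family environment: shared by every pair;
      - sibling environment: on every diagonal, shared only by S1 and S2;
      - residual: diagonal only.
    """
    rel = [[1.0, 0.0, 0.5, 0.5],
           [0.0, 1.0, 0.5, 0.5],
           [0.5, 0.5, 1.0, 0.5],
           [0.5, 0.5, 0.5, 1.0]]
    sib = [[1, 0, 0, 0],
           [0, 1, 0, 0],
           [0, 0, 1, 1],
           [0, 0, 1, 1]]
    return [[var_a * rel[r][c] + var_c + var_cs * sib[r][c]
             + (var_e if r == c else 0.0)
             for c in range(4)] for r in range(4)]


def is_positive_definite(m):
    """Attempt a Cholesky factorization; it succeeds only if m is positive definite."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][j] = s ** 0.5
            else:
                low[i][j] = s / low[j][j]
    return True


# Variance components reported for age at onset; the residual variance
# (1.0) is a hypothetical value chosen only to complete the example.
v_total = family_covariance(0.452, -0.053, 0.467, 1.0)
ok = is_positive_definite(v_total)
```

Under this layout the parent-child covariance is $\tfrac{1}{2}\sigma^2_A + \sigma^2_C$ and the sib-sib covariance is $\tfrac{1}{2}\sigma^2_A + \sigma^2_C + \sigma^2_{Cs}$, which matches the interpretation of a negative $\sigma^2_{Cs}$ given in the text.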

In survival models, unobserved or unmeasured explanatory variables, some of which may be genetic, are often referred to as frailties. The frailties take values restricted to the positive line and may be assumed to act multiplicatively on the hazard. Extending the above model to correlated family data with a time-to-onset endpoint, a multiplicative individual heterogeneity or frailty term, representing the latent genetic and common environment variables, may be modeled as random effects simultaneously with the effects associated with observed covariates. Consider right-censored time to onset of melanoma data $(T_{ij}, \delta_{ij}, z_{ij})$, $1 \le j \le n$, from n relative pairs; here $T_{ij}$ denotes the true age at onset of the jth family member or the censored time, depending on whether $\delta_{ij} = 1$ or 0, respectively, and $z$ denotes a $p \times 1$ vector of covariates. A Weibull distribution may be used to model time to failure as

$$f(t_i, z_i) = e^{\beta' z_i}\, \gamma\, t_i^{\gamma - 1} \exp\!\left(-e^{\beta' z_i}\, t_i^{\gamma}\right),$$

where $\beta$ is a vector of unknown regression coefficients and $\gamma$ is the shape parameter of the Weibull distribution. This leads to a baseline hazard of the form

$$\lambda_0(t_i) = \gamma\, t_i^{\gamma - 1}.$$
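The relationship between the density, the hazard and the survivor function in this parameterization can be checked numerically. The sketch below is illustrative only: the intercept in `beta` is a hypothetical value (the paper does not report one in this section), and the function names are not from the original BUGS program.

```python
import math


def weibull_rate(beta, z):
    """mu = exp(beta'z), the covariate-dependent multiplier on the baseline hazard."""
    return math.exp(sum(b * x for b, x in zip(beta, z)))


def weibull_hazard(t, mu, gamma):
    """Hazard mu * gamma * t^(gamma - 1); with mu = 1 this is the baseline hazard."""
    return mu * gamma * t ** (gamma - 1)


def weibull_survival(t, mu, gamma):
    """Survivor function exp(-mu * t^gamma)."""
    return math.exp(-mu * t ** gamma)


def weibull_density(t, mu, gamma):
    """Density mu * gamma * t^(gamma - 1) * exp(-mu * t^gamma)."""
    return mu * gamma * t ** (gamma - 1) * math.exp(-mu * t ** gamma)


gamma = 4.3                 # shape estimate reported in the Results
beta = [-18.0, 0.119]       # hypothetical intercept; naevi coefficient from Table 4
mu = weibull_rate(beta, [1.0, 1.0])

age = 50.0
density = weibull_density(age, mu, gamma)
hazard_times_survival = weibull_hazard(age, mu, gamma) * weibull_survival(age, mu, gamma)
```

The density equals the hazard multiplied by the survivor function, so a unit increase in a covariate multiplies the hazard by $e^{\beta}$ at every age, which is why $e^{\beta}$ is reported as a relative risk.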

Re-parameterize by letting $\mu_i = e^{\beta' z_i}$; the conditional distribution of $t_i$ given $\mu_i$ is then Weibull$(\gamma, \mu_i)$. We formulated a mixed model to represent the conditional distribution of $t_{ij}$, given covariate effects and random additive genetic and common environment effects, as

$$t_{ij} \mid \mu_{ij} \sim \text{Weibull}(\gamma, \mu_{ij}), \qquad i = 1, \ldots, n; \quad j = 1, 2,$$

where

The regression coefficients and the precisions of the random effects $(\tau_G, \tau_F, \tau_H)$ were given "non-informative" Normal and Gamma priors, respectively. The shape parameter $\gamma$ of the time-to-onset distribution was also given a non-informative Gamma prior, which was slowly decreasing on the positive real line.

The overall model (including random effects) may be described in a Bayesian graph (Figure 3), which simplifies sampling from full conditional distributions by exploiting partial independence properties (Spiegelhalter et al., 1996a). In this graph, each random quantity is represented by a node, and nodes may be connected by directed or undirected links. Conditional independence assumptions are represented by the absence of such links.

We implemented the Gibbs sampler using the BUGS program (Gilks et al., 1994); the code is given in the Appendix. Imputation of missing data was handled naturally in the Gibbs sampling framework by treating missing values as additional unknown quantities and randomly sampling values from their full conditional distributions. We chose simple prior distributions for imputation, since the number of missing values for covariates was not large (complete for birth year, 16 missing for naevi and 21 missing for freckles) and there was no indication of non-random missingness in our data. Imputation for the missing naevi and freckles covariates was therefore based on Bernoulli prior distributions, with respective parameter values estimated from the complete observations. We performed an initial 10,000 burn-in iterations followed by an additional 20,000. Parameter estimates were the mean and standard deviation (SD) of all post-convergence Gibbs samples, with a thinning interval between 20 and 50; credible intervals were computed as the lower and upper $\alpha/2$ percentiles from the last 20,000 iterations. Convergence to the posterior distribution was confirmed by using the different criteria provided by the add-on CODA package, including those of Gelman and Rubin (1992), Geweke (1992), and Raftery and Lewis (1992a, 1992b).
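The summary step described above (discard the burn-in, thin, then report the mean, SD and percentile-based credible interval) can be sketched as follows. This is a generic illustration rather than the CODA implementation; the function name `summarize_chain` and the linear-interpolation percentile rule are assumptions made for the example.

```python
import statistics


def summarize_chain(draws, burn_in, thin=1, alpha=0.05):
    """Posterior summaries from one MCMC chain.

    Discards the first `burn_in` draws, keeps every `thin`-th draw after that,
    and returns the mean, SD and the (alpha/2, 1 - alpha/2) percentile interval.
    """
    kept = draws[burn_in::thin]
    ordered = sorted(kept)

    def percentile(p):
        # Linear interpolation between adjacent order statistics.
        pos = p * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    return {
        "n": len(kept),
        "mean": statistics.mean(kept),
        "sd": statistics.stdev(kept),
        "ci": (percentile(alpha / 2), percentile(1 - alpha / 2)),
    }


# A deterministic stand-in for a chain of 30,000 draws, so the arithmetic
# is easy to follow: 10,000 burn-in iterations, then every 20th draw.
fake_chain = [float(i) for i in range(30000)]
summary = summarize_chain(fake_chain, burn_in=10000, thin=20)
```

Thinning reduces the autocorrelation between retained draws at the cost of a smaller sample, so the thinning interval is a trade-off between storage and Monte Carlo precision.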

Results

GEE Approach

An inspection of residual plots following preliminary model-fitting provided no evidence for the failure of the proportional hazards assumption and did not detect influential observations. We then proceeded to apply the GEE approach, which can estimate regression coefficients while incorporating a dependence structure between relative pairs. The results are displayed in Table 3, which suggest that later birth year, and having at least a moderate number of naevi and freckles, were all simultaneously associated with later age at onset of melanoma. Table 3 also presents the estimated odds ratios for quantifying the correlation between paired relatives of specific relationships.

The odds ratio for sib–sib pairs is 2.973, which is significantly different from 1 (p < 0.01). The odds ratio for parent–child pairs is 1.650, which is slightly greater than 1, indicating a mild dependency between these pairs, although it is not quite statistically significant. The odds ratio for second-degree and higher relative pairs is 1.155, which is not significantly different from 1 and indicates no detectable dependence between these pairs. This pattern of familial aggregation is compatible with dominance variance as well as additive genetic variance (Falconer, 1990).

Bayesian Approach

We re-analyzed the age at onset of melanoma by using Gibbs sampling to impute missing covariates and to estimate subject-specific covariate effects and random additive genetic, common family environment and shared sibling environment effects on the log scale. The estimated shape parameter $\gamma$ was 4.3, with a 95% credible interval (CI) of (4.2, 4.6). The results are summarized in Table 4 (Model A). The residual plots did not indicate a gross departure from the underlying Weibull model and revealed no influential observations. We checked the sensitivity of the analyses to initial parameter values by re-running the Gibbs sampler five more times using different starting values. The resulting estimates did not differ by more than 5% from the values reported here. The mean estimate for $\sigma^2_A$ was 0.452, with 95% CI = (0.348, 0.566); for $\sigma^2_C$ it was −0.053, with 95% CI = (−0.120, 0.019); and for $\sigma^2_{Cs}$ it was 0.467, with 95% CI = (0.393, 0.545).

A small negative value for the common family effect suggests that there may be a dominant effect, or that the current model is not quite appropriate; for example, there may be systematic effects or correlation structures that have not been accounted for. The results here indicate that additive genetic and shared sibling environment effects seem to impact about equally on the variation of the age at onset of melanoma. Further exploration of alternative models is a focus of our future research.

In addition, we investigated the relative contribution of genetic and environmental effects on the expression of naevi (Model B) and freckles (Model C), which are known risk factors for melanoma. The expressions of naevi and freckles were coded as binary variables (none or few moles versus moderate or many moles, and no freckles versus one or more freckles). A hierarchical Bayesian binomial model was fitted to estimate the random variance components. The results in Table 4 indicate that a common family environment effect contributed the most to the expression of naevi ($\sigma^2_C = 0.704$), relative to the contributions of the additive genetic effect ($\sigma^2_A = 0.142$) and of the shared sibling effect ($\sigma^2_{Cs} = 0.195$), both of which were non-negligible. In contrast, variation in the expression of freckles was largely explained by additive genetic and shared family effects ($\sigma^2_A = 2.050$, $\sigma^2_C = 2.600$), compared to a relatively small shared sibling effect ($\sigma^2_{Cs} = 0.115$).

Figure 3. Graphical model of covariate and random family effects for an individual in a nuclear family structure (individual j in family i). $t_{ij}$ represents the observed failure time for the jth individual in the ith family, with $\mathit{tcen}_{ij}$ being an indicator variable of censoring status. Full arrows indicate stochastic links to which a probability is attached; broken arrows denote deterministic relationships. The $\beta$s are regression coefficients; $\tau$ is the precision of the prior distribution and equals the inverse of the variance. $F_i$, $G_i$ and $H_i$ are random effects modeled as $F_i \sim N(0, \tfrac{1}{2}\sigma^2_A + \sigma^2_C)$, $G_i \sim N(0, \tfrac{1}{2}\sigma^2_A)$ and $H_i \sim N(0, \sigma^2_{Cs})$. Rectangles represent actual data values for the covariates; $\gamma$ and $\mu_{ij}$ are shape and scale parameters for the underlying Weibull distribution.

Discussion

We applied two methods, generalized estimating equations and Bayesian analysis, to the genetic analysis of age at onset of melanoma based on a nuclear family structure. Under both approaches, the results suggest that additive genetic factors played an important role in the age at onset of melanoma, but that shared sibling environmental factors were not negligible. We focused attention on these approaches because they are more appropriate for modeling correlated age-at-onset data and they allow the inclusion of covariates in the analyses. Under both approaches, there were suggestions that earlier melanoma onset was influenced by later birth year, having a moderate number of naevi, and being freckly. The principal difference between the two approaches is in the interpretation of the regression coefficients. The GEE method uses a marginal approach, resulting in regression coefficients that describe the average population response to changing covariates, whereas the Bayesian approach produces subject-specific coefficients. A secondary distinction is in the nature of the within-pair dependence. The GEE model only describes a common covariance among specific relative pairs, whereas the Bayesian approach can explicitly describe the source of this covariance. A third advantage of the Bayesian method is its flexibility in incorporating prior information, if available, for the covariates or latent effects by modifying their prior distributions. Further, the Bayesian method would permit a more accurate decomposition of the genetic variance into additive and dominant components, thus providing the means for a direct assessment of the no-dominance assumption. Finally, it is also interesting to record the amount of CPU time required for each method: 20 seconds for the GEE approach, and approximately 3 hours (for binary traits) and 12 hours (for the age-at-onset outcome) to run BUGS on a single-user Intel Pentium III 600 MHz personal computer with the Linux Mandrake 7.1 operating system. The amount of human and financial resources dedicated to collecting, maintaining and updating the melanoma family database is extremely high; therefore, the extra CPU time required by the MCMC method is well worth the additional genetic information and flexibility that it provides. A BUGS program is included in the Appendix.

Table 3

GEE Approach: Estimated Regression Coefficients in the Proportional Hazard Model and Estimated Odds Ratios for Quantifying Familial Aggregation in Age at Onset of Melanoma in Queensland Families (* Indicates Significance at the .05 Significance Level)

A. Mean effects

Covariate                              RR = e^b   Coefficient b   Robust se(b)   Z-statistic
Year of birth                          1.142      0.132           0.051          2.588
Naevi (Baseline = No or few moles)     1.765      0.568           0.073          7.781
Freckling (Baseline = No freckles)     1.160      0.148           0.049          3.020

B. Patterns of familial aggregation

Relationship     1 + θ     se(θ)     Z-statistic
Sib–sib          2.973     0.6217    3.17
Parent–child     1.650     0.434     1.50
Second/Others    1.155     0.270     0.47

Table 4

Gibbs Sampling Approach: Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families

Weibull Model, A. Mean effects: Response variable is Age-at-onset

Covariate        RR = e^b   Coefficient b   Robust se(b)   95% CI of b
Year of birth    1.378      0.321           0.0027         (0.316, 0.326)
Naevi            1.126      0.119           0.0021         (0.058, 0.185)
Freckling        1.017      0.017           0.1400         (−0.005, 0.085)

Weibull Model, B. Variance components: Response variable is Age-at-onset

Latent effect    Mean from 5000 iterations   se(σ²)   95% CI of σ²
σ²A              0.452                       0.054    (0.348, 0.566)
σ²C              −0.053                      0.027    (−0.120, 0.019)
σ²Cs             0.467                       0.040    (0.393, 0.545)
γ                4.3                         0.104    (4.2, 4.6)

Binomial Model, Variance components: Response variable is Naevi

Latent effect    Mean from 5000 iterations   se(σ²)   95% CI of σ²
σ²A              0.142                       0.149    (0.002, 0.498)
σ²C              0.704                       0.156    (0.403, 1.010)
σ²Cs             0.195                       0.157    (0.0025, 0.553)

Binomial Model, Variance components: Response variable is Freckling

Latent effect    Mean from 5000 iterations   se(σ²)   95% CI of σ²
σ²A              2.050                       0.779    (0.835, 3.570)
σ²C              2.600                       0.418    (1.780, 3.460)
σ²Cs             0.115                       0.088    (0.011, 0.312)

Note: * indicates significance at the 0.05 level. Naevi is a binary variable with baseline 0 = no or few moles. Freckling is coded as a binary variable with baseline 0 = no freckles.

Acknowledgments

Data collection was supported by grants from the Australian National Health and Medical Research Council (NHMRC; ID 930223 and ID 961061) and the US National Cancer Institute (CA88363). Programming resources were provided by start-up funds from the Department of Biostatistics at the M. D. Anderson Cancer Center. We acknowledge assistance with BUGS programming from Sang-Joon Lee.

References

Abel, L., & Bonney, G. E. (1990). A time-dependent logistic hazard function for modeling variable age of onset in the analysis of familial diseases. Genetic Epidemiology, 12, 391–407.

Aitken, J. F., Bailey-Wilson, J., Green, A. C., MacLennan, R., & Martin, N. G. (1998). Segregation analysis of cutaneous melanoma in Queensland. Genetic Epidemiology, 15, 391–401.

Aitken, J. F., Green, A. C., MacLennan, R., Youl, P., & Martin, N. G. (1996). The Queensland familial melanoma project: Study design and characteristics of participants. Melanoma Research, 6, 155–165.

Allison, P. D. (1995). Survival analysis using the SAS system: A practical guide. Cary, NC: SAS Institute Inc.

Andersen, P. K., Borgan, O., Gill, R. D., & Keiding, N. (1993). Statistical models based on counting processes. New York: Springer-Verlag.

Bernardo, J. M., Berger, J. O., Dawid, A. P., & Smith, A. F. M. (Eds.). (1992). Bayesian statistics 4. Oxford: Clarendon Press.

Bonney, G. E. (1986). Regressive logistic model for familial disease and other binary traits. Biometrics, 42, 611–625.

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Wadsworth, California.

Cannings, C., & Thompson, E. A. (1977). Ascertainment in the sequential sampling of pedigrees. Clinical Genetics, 12, 208–212.

Clayton, D. G. (1978). A model for association in bivariate life-tables and its application in epidemiological studies of chronic disease incidence. Biometrika, 65, 141–151.

Clayton, D. G., & Cuzick, J. (1985). Multivariate generalizations of the proportional hazards model (with discussion). Journal of the Royal Statistical Society, Series A, 148, 82–117.

Cox, D. R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34, 187–202.

Do, K.-A., Broom, B. M., Kuhnert, P., Duffy, D. L., Todorov, A. A., Treloar, S. A., & Martin, N. G. (2000). Genetic analysis of the age at menopause by using estimating equations and Bayesian random effects models. Statistics in Medicine, 19, 1217–1235.

English, D. R., Armstrong, B. K., Kricker, A., et al. (1997). Sunlight and cancer. Cancer Causes Control, 8, 271–283.

Falconer, D. S. (1990). Introduction to quantitative genetics (3rd ed.). New York: Longman Group Ltd.

Faucett, C., Gauderman, W. J., Thomas, D., Ziogas, A., & Sobel, E. (1993). Combined segregation and linkage analysis of late-onset Alzheimer's disease in Duke families using Gibbs sampling. Genetic Epidemiology, 10, 489–494.

GAP. (1996). Genetic Analysis Package, Release 1.0. Pasadena, CA: Epicenter Software.


Gauderman, W. J., Morrison, J. L., Carpenter, C. L., & Thomas, D. C. (1997). Analysis of gene-smoking interaction in lung cancer. Genetic Epidemiology, 14, 199–214.

Gauderman, W. J., & Thomas, D. C. (1994). Censored survival models for genetic epidemiology: A Gibbs sampling approach. Genetic Epidemiology, 11, 171–188.

Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7, 457–511.

Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intelligence, 6, 721–741.

Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4. Oxford: Clarendon Press.

Gilks, W. R., Thomas, A., & Spiegelhalter, D. J. (1994). A language and program for complex Bayesian modeling. The Statistician, 43, 169–178.

Guo, S. W., & Thompson, E. A. (1991). Monte Carlo EM for the estimation of multiple random effects models on large pedigrees. IMA Journal of Mathematics Applied in Medicine and Biology, 8, 171–189.

Guo, S. W., & Thompson, E. A. (1992). A Monte Carlo method for combined segregation and linkage analysis. American Journal of Human Genetics, 51, 1111–1126.

Guo, S. W., & Thompson, E. A. (1994). Monte Carlo estimation of mixed models for large complex pedigrees. Biometrics, 50, 417–432.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97–109.

Heath, A. C. (1997). Markov Chain Monte Carlo segregation and linkage analysis for oligogenic models. American Journal of Human Genetics, 61, 748–760.

Hougaard, P. (1986). A class of multivariate failure time distributions. Biometrika, 73, 671–678.

Hsu, L., & Prentice, R. L. (1996). On assessing the strength of dependency between failure time variates. Biometrika, 83, 491–506.

Hsu, L., & Zhao, L. P. (1996). Assessing familial aggregation of age at onset by using estimating equations, with application to breast cancer. American Journal of Human Genetics, 58, 1057–1071.

Kempthorne, O. (1960). Biometrical genetics. New York: Pergamon Press.

Kendall, M., & Stuart, A. (1979). The advanced theory of statistics (Vol. 2, 4th ed.). Oxford: Oxford University Press.

Kong, K., Frigge, M., Cox, N., & Wong, W. H. (1992). Linkage analysis with adjustments for covariates: A method combining peeling with Gibbs sampling. Cytogenetics and Cell Genetics, 59, 208–210.

Kuhnert, P., & Do, K.-A. (2003). Fitting genetic models to twin data with binary and ordered categorical responses: A comparison of structural equation modeling and Bayesian hierarchical models. Behavior Genetics, 33, 439–452.

Lange, K., & Sobel, E. (1991). A random walk method for computing genetic location scores. American Journal of Human Genetics, 49, 1320–1334.

LeBlanc, M., & Crowley, J. (1992). Relative risk trees for censored survival data. Biometrics, 48, 411–425.

Li, H., & Thompson, E. A. (1997). Semiparametric estimation of major gene and random environmental effects for age of onset. Biometrics, 53, 282–293.

Li, H., & Wijsman, E. A. (1998). Semiparametric estimation of major gene effects for age of onset. Genetic Epidemiology, 15, 279–298.

MacLennan, R., Green, A. C., McLeod, G. R., & Martin, N. G. (1992). Increasing incidence of cutaneous melanoma in Queensland, Australia. Journal of the National Cancer Institute, 84, 1424–1432.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087–1092.

Nielsen, G. G., Gill, R. D., Andersen, P. K., & Sørensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics, 19, 25–43.

Oakes, D. (1989). Bivariate survival models induced by frailties. Journal of the American Statistical Association, 84, 487–493.

Raftery, A. E., & Lewis, S. (1992a). Comment: One long run with diagnostics: Implementation strategies for Markov Chain Monte Carlo. Statistical Science, 7, 493–497.

Raftery, A. E., & Lewis, S. (1992b). How many iterations in the Gibbs sampler? (pp. 763–774). In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4. Oxford: Clarendon Press.

SAGE. (1992). Statistical Analysis for Genetic Epidemiology, Release 2.1. Cleveland, OH: Department of Epidemiology and Biostatistics, Rammelkamp Center for Education and Research, MetroHealth Campus, Case Western Reserve University.

Scurrah, K. J., Palmer, L. J., & Burton, P. R. (2000). Variance components analysis for pedigree-based censored survival data using generalized linear mixed models (GLMMs) and Gibbs sampling in BUGS. Genetic Epidemiology, 19, 127–148.


Siegmund, K. D., & McKnight, B. (1998). Modeling hazard functions in families. Genetic Epidemiology, 15, 147–171.

Spiegelhalter, D. J., Thomas, A., & Best, N. G. (1996a). Computation on Bayesian graphical models (pp. 407–425). Oxford: Oxford University Press.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Gilks, W. (1996b). BUGS 0.6: Bayesian Inference Using Gibbs Sampling manual. Cambridge: MRC Biostatistics Unit.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Lunn, D. (2003). WinBUGS (Bayesian Inference Using Gibbs Sampling), Windows version 1.4 user manual. Cambridge: MRC Biostatistics Unit.

Siskind, V., Aitken, J., Green, A., & Martin, N. (2002). Sun exposure and interaction with family history in risk of melanoma, Queensland, Australia. International Journal of Cancer, 97, 90–95.

Swerdlow, A., & Green, A. (1987). Melanocytic naevi and melanoma: An epidemiological perspective. British Journal of Dermatology, 117, 137–146.

Therneau, T. M., & Atkinson, E. (1997). An introduction to recursive partitioning using the RPART routines (Technical report). Rochester: Mayo Foundation.

Thomas, D. (1992). Fitting genetic data using Gibbs sampling: An application to nevus counts in 38 Utah kindreds. Cytogenetics and Cell Genetics, 59, 228–230.

Thompson, E. A. (1994). Monte Carlo programs for pedigree analysis (Technical Report 267). Washington: Dept. of Statistics, University of Washington.

Vaupel, J. W., Manton, K. G., & Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16, 439–454.


Appendix A: A BUGS Program to Implement the Bayesian MCMC Approach in Modeling an ACE Model for Age at Onset of Melanoma with Covariate Effects and Random Genetic/Environmental Effects


where $\lambda_0(\cdot)$ is the baseline hazard function and $\beta$ is a vector of regression coefficients.

For a specific pair of relatives, we follow Clayton (1978) in modeling the bivariate survivor function

$$F(t_{k1}, t_{k2}) = \left( F_1(t_{k1})^{-\theta} + F_2(t_{k2})^{-\theta} - 1 \right)^{-1/\theta},$$

where $F_1$ and $F_2$ are univariate survivor functions, and $\theta$ is a scalar parameter that measures the degree of dependence between the relatives' times at onset, independence being implied by $\theta = 0$ and positive association by $\theta > 0$. The Clayton model allows negative dependencies and has the property that failure times are absolutely continuous for $\theta > -0.5$. In addition, the cross-ratio (or odds-ratio) function, as studied by Oakes (1989), is

$$c(t_{k1}, t_{k2}) = \frac{\lambda(t_{k1} \mid T_{k2} = t_{k2})}{\lambda(t_{k1} \mid T_{k2} \ge t_{k2})} = 1 + \theta.$$

This is equivalent to assuming that the odds ratio is invariant over the grid region that supports the data. Heuristically, the parameter $1 + \theta$ is an odds ratio that depends on the degree of dependence between the onset ages of the two relatives. If genetic factors do influence the age at onset of melanoma, we would expect to see a higher concordance in the age of onset in first-degree relatives, who on average share half their genes in common, in comparison to second-degree relatives. Under the current model, this translates as $\theta_{\text{first-degree}} > \theta_{\text{second-degree}}$.
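The constancy of the cross-ratio under the Clayton model can be checked numerically. In the sketch below the joint survivor function is written in terms of the marginal survival probabilities $u = F_1(t_{k1})$ and $v = F_2(t_{k2})$, so no particular marginal distribution needs to be assumed; the cross-ratio is evaluated as $F \cdot F_{uv} / (F_u F_v)$ with finite-difference derivatives. The function names and the step size are illustrative choices, not part of the original analysis.

```python
def clayton_survivor(u, v, theta):
    """Joint survivor probability (u^-theta + v^-theta - 1)^(-1/theta).

    u and v are the marginal survival probabilities F1(t1) and F2(t2).
    """
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def cross_ratio(u, v, theta, h=1e-4):
    """Numerical cross-ratio F * F_uv / (F_u * F_v) using central differences.

    The derivatives of the marginal survivor functions cancel in this ratio,
    so it can be evaluated directly on the (u, v) scale.
    """
    f = clayton_survivor
    f_u = (f(u + h, v, theta) - f(u - h, v, theta)) / (2 * h)
    f_v = (f(u, v + h, theta) - f(u, v - h, theta)) / (2 * h)
    f_uv = (f(u + h, v + h, theta) - f(u + h, v - h, theta)
            - f(u - h, v + h, theta) + f(u - h, v - h, theta)) / (4 * h * h)
    return f(u, v, theta) * f_uv / (f_u * f_v)


# theta = 1.973 corresponds to the sib-sib odds ratio 1 + theta = 2.973.
theta_sib = 1.973
ratio_a = cross_ratio(0.7, 0.4, theta_sib)
ratio_b = cross_ratio(0.9, 0.2, theta_sib)
```

Evaluating the ratio at different points of the $(u, v)$ grid gives the same value up to numerical error, which is the invariance property described above.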

We may use a standard method to estimate within-pair correlations for 2 × 2 tables from odds ratios. Estimates of the relation-pair correlations $\rho_{\text{sib–sib}}$, $\rho_{\text{parent–child}}$ and $\rho_{\text{second-order}}$ are recovered using relationships such as $r_{\text{sib–sib}} = \min(1, \ln(1 + \theta_{\text{sib–sib}}))$. Testing for the presence of genetic factors underlying the age at diagnosis of melanoma is equivalent to testing $H_0: \rho_{\text{first-order}} = \rho_{\text{second-order}}$. We may test this hypothesis using a z-transform (Kendall & Stuart, 1979, p. 315) of the point estimates of the correlation coefficients. Let $n_1$ and $n_2$ denote the number of first-order and second-order relatives, and let $z_1$ and $z_2$ denote the transformed statistics of $r_1$ and $r_2$, the correlation estimates for first-order and second-order relative pairs, respectively. Specifically, we reject $H_0$ when $E/D > Z_{1-\alpha}$, where

$$E = E(z_1 - z_2) = \frac{1}{2} \log\left[ \left( \frac{1 + r_1}{1 - r_1} \right) \left( \frac{1 - r_2}{1 + r_2} \right) \right],$$

$$D^2 = V(z_1 - z_2) = \frac{1}{n_1 - 3} + \frac{1}{n_2 - 3},$$

and $Z_{1-\alpha}$ is the standard normal deviate corresponding to the one-sided $\alpha$ significance level.
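The test above can be sketched in a few lines. The odds ratios used in the example are the parent–child and second-degree estimates from Table 3; the pair counts `n1` and `n2` are hypothetical, since the numbers of relative pairs are not given in this section, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def corr_from_theta(theta):
    """Correlation recovered from the odds ratio 1 + theta: min(1, ln(1 + theta))."""
    return min(1.0, math.log(1.0 + theta))


def fisher_z(r):
    """z-transform 0.5 * log((1 + r) / (1 - r)); undefined for |r| >= 1."""
    if abs(r) >= 1.0:
        raise ValueError("z-transform is undefined for |r| >= 1")
    return math.atanh(r)


def z_test_first_vs_second(r1, n1, r2, n2, alpha=0.05):
    """One-sided test of H0: rho_first = rho_second against rho_first > rho_second."""
    e = fisher_z(r1) - fisher_z(r2)
    d = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    critical = NormalDist().inv_cdf(1.0 - alpha)
    return e / d, e / d > critical


r_first = corr_from_theta(0.650)    # parent-child: 1 + theta = 1.650
r_second = corr_from_theta(0.155)   # second-degree: 1 + theta = 1.155
n1, n2 = 500, 400                   # hypothetical numbers of relative pairs
statistic, reject = z_test_first_vs_second(r_first, n1, r_second, n2)
```

Note that the sib–sib estimate $1 + \theta = 2.973$ gives $\ln(2.973) > 1$, so the recovered correlation is truncated at 1 and the z-transform is undefined; the sketch raises an error in that case rather than returning an infinite statistic.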

This approach has the advantage of providing a test for the presence of genetic effects through a single parameter ($\theta$). However, it is limited in its ability to attribute the phenotypic variance to specific effects (e.g., additive gene action).

Mathematical details of the GEE model and the iterative procedure to estimate the regression coefficients $\beta$ and the specific degrees of dependence $\theta$ for the different types of relative pairs have been summarized previously in Do et al. (2000).

MCMC Analysis Using BUGS

The Bayesian Paradigm and Gibbs Sampling

Markov chain Monte Carlo (MCMC) is an alternative Bayesian approach that provides estimates of likelihoods and associated parameter values when exact computation is infeasible (Hastings, 1970; Metropolis et al., 1953). MCMC methods can be used to draw samples from the underlying joint distribution of major genotypes and polygenic values, conditional on the observed data. From these samples, desired parameters and likelihoods can be estimated without the need to resort to exact computation. MCMC methods have been used for linkage analysis (Kong et al., 1992; Lange & Sobel, 1991), for estimation of parameters in the mixed model with and without covariates (Guo & Thompson, 1991; Thomas, 1992), for estimation of gene–smoking interaction and covariate imputation (Gauderman et al., 1997), for performing combined linkage and segregation analysis (Faucett et al., 1993; Guo & Thompson, 1992), and for mixed models of large complex pedigrees (Guo & Thompson, 1994).

In a general setting let y be the observed data andq be everything not observed including parametersand latent variables The implementation of Bayesianmethods using realistic models and priors is com-puter-intensive and relies heavily on cunningcomputational tools to approximate integrals Theproblem in general terms is to obtain the expectedvalue of a function of interest s () under the posteriordensity p(qfrac12x)

E[s(q)] =

which cannot generally be found analytically Onemethod to carry out the integration on the RHS is toperform simulation of exact Bayesian posterior distri-butions using Markov chain Monte Carlo techniquessuch as Gibbs sampling The Gibbs sampler (Gemanamp Geman 1984) is the most popular algorithm usedin MCMC applications to correlated data Gibbssampling was introduced to the main statistical com-munity by Gelfand and Smith (1990) and has sincebeen applied to an even wider array of problems TheGibbs sampler is easy to implement because it onlydepends on the local neighborhood structure In thecontext of pedigree analysis (Olshen amp Wijsman1996) the basic procedure is a sequential updating ofmissing and latent data including the underlying andunobserved major genotypes polygenic effects and

EQ s(q) p(q) p(xfrac12q) dqmdash mdash mdash

EQ p(q) p(xfrac12q) dq

105Twin Research February 2004

Analysis of Melanoma Onset

106 Twin Research February 2004

Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin

environmental effects Values for the missing or latentdata are sampled from the local conditional distribu-tion a function of the observed individual data thecurrent sampled values of other missinglatent data forthis particular individual such as polygenic and environ-mental effects and the values for the sampled geneticeffects in the immediate neighbors of an individualGibbs sampling basically consists of three main steps

Step 1 Setting initial values for unobserved quanti-ties (parameters and latent variables)

Step 2 For each parameter or latent variable qj sample from its ldquofull conditional distributionrdquogiven the current values of all other quantities inthe model

Step 3 Examine sampled values of parameters andlatent variables to monitor convergence and toprovide summary measures

Some of the most recent and popular packages thatimplement Gibbs sampling for analysis of pedigreedata include BUGS (Gilks et al 1994 Spiegelhalter et al 1996ab) Genetic Analysis Package (GAP1996) and MIXD (Olshen amp Wijsman 1996Thompson 1994) We have used BUGS mainlybecause of its flexibility in programming hierarchicalmodels besides being a freeware product

The Model

The aim here is to model the correlation structurewithin the family structure to satisfy the fundamentaladditive genetic model (Crow amp Kimura 1970Falconer 1990 Kempthorne 1960) as followsConsider a nuclear family structure consisting of 4members father mother and two children denoted byF M S1 S2 respectively Using similar notation as inBurton et al (1999) a conventional mixed linearmodel consisting of fixed and random effects may bewritten in the form

Qij = bTz + Aij +Cij + CSij+ Eij

where Qij is the observed value of a normally distrib-uted continuous trait for the jth individual in the ith

nuclear family zij is a vector of observed covariatesrepresenting fixed effects and b is a correspondingvector of unknown fixed regression coefficients Aij Cij and CSij

A denote random effects that repre-sent additive polygenic common family environmentand common sibling environment effects respectivelyThe variation in an individual response is representedby a composite covariance matrix VT and is the sumof an additive genetic covariance matrix VA acommon family environment matrix VC a sharedsibling environment matrix VCs and residual environ-mental effects The different variance components are

The overall total covariance matrix is
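For orientation, the matrices can be written out under the standard additive model for a family ordered (F, M, S1, S2). This is a reconstruction from the definitions above, assuming unrelated parents and a residual variance written here as $\sigma^2_E$ (a symbol introduced for illustration); it is not a quotation of the original display.

```latex
V_A = \sigma^2_A
\begin{pmatrix}
1 & 0 & \tfrac12 & \tfrac12 \\
0 & 1 & \tfrac12 & \tfrac12 \\
\tfrac12 & \tfrac12 & 1 & \tfrac12 \\
\tfrac12 & \tfrac12 & \tfrac12 & 1
\end{pmatrix},
\quad
V_C = \sigma^2_C
\begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1
\end{pmatrix},
\quad
V_{Cs} = \sigma^2_{Cs}
\begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 \\
0 & 0 & 1 & 1
\end{pmatrix},
\quad
V_E = \sigma^2_E I_4

V_T = V_A + V_C + V_{Cs} + V_E
```

Under this form, the spouse covariance is $\sigma^2_C$, the parent-child covariance is $\tfrac12\sigma^2_A + \sigma^2_C$, and the sibling covariance is $\tfrac12\sigma^2_A + \sigma^2_C + \sigma^2_{Cs}$.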

The components of variance $\sigma^2_A$, $\sigma^2_C$, $\sigma^2_{Cs}$ need not be positive as long as $V_T$ is positive definite. A negative value for $\sigma^2_{Cs}$ simply implies that the realized covariance between siblings is less than the realized covariance between a parent and a child. A negative value for $\sigma^2_{Cs}$ may suggest dominance. To optimize convergence in BUGS, Model (1) may be reparameterized as

where $F_i$, $G_i$, $H_i$ are independent additive random effects or latent variables, and the remaining terms are the residual error terms for parents and children, respectively. If we model

$$F_i \sim N\left(0, \tfrac{1}{2}\sigma^2_A + \sigma^2_C\right), \qquad G_i \sim N\left(0, \tfrac{1}{2}\sigma^2_A\right), \qquad H_i \sim N\left(0, \sigma^2_{Cs}\right)$$

then the basic genetics covariance model (additive genetic, common environment, and unique environment) for this particular four-member family structure is satisfied.
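The claim that the $F_i$, $G_i$, $H_i$ formulation reproduces the additive covariance structure can be checked numerically. The sketch below is a minimal Python check under stated assumptions: $F_i$ enters all four members, $G_i$ enters the father with a plus sign and the mother with a minus sign, $H_i$ enters both children, and the children's residual variance carries an extra $\tfrac12\sigma^2_A$. These loadings are an assumption chosen to be consistent with the variances given above, not a transcription of the paper's reparameterized equation. The residual variance of 1.0 is arbitrary.

```python
import numpy as np

def total_covariance(s2_a, s2_c, s2_cs, s2_e):
    """V_T for members ordered (F, M, S1, S2) under the additive model."""
    kin = np.array([[1.0, 0.0, 0.5, 0.5],    # twice the kinship coefficients:
                    [0.0, 1.0, 0.5, 0.5],    # unrelated spouses share 0,
                    [0.5, 0.5, 1.0, 0.5],    # first-degree relatives share 1/2
                    [0.5, 0.5, 0.5, 1.0]])
    fam = np.ones((4, 4))                    # one family environment for all four
    sib = np.zeros((4, 4))
    sib[2:, 2:] = 1.0                        # sibling environment, children only
    return s2_a * kin + s2_c * fam + s2_cs * sib + s2_e * np.eye(4)

def reparameterized_covariance(s2_a, s2_c, s2_cs, s2_e):
    """Covariance implied by independent F, G, H effects (assumed loadings)."""
    var_f, var_g, var_h = 0.5 * s2_a + s2_c, 0.5 * s2_a, s2_cs
    load_f = np.array([1.0, 1.0, 1.0, 1.0])    # F enters every member
    load_g = np.array([1.0, -1.0, 0.0, 0.0])   # G: +father, -mother (assumption)
    load_h = np.array([0.0, 0.0, 1.0, 1.0])    # H enters both children
    resid = np.diag([s2_e, s2_e, 0.5 * s2_a + s2_e, 0.5 * s2_a + s2_e])
    return (var_f * np.outer(load_f, load_f)
            + var_g * np.outer(load_g, load_g)
            + var_h * np.outer(load_h, load_h)
            + resid)

# Variance components as reported for age at onset; s2_e = 1.0 is an
# arbitrary illustrative residual variance, not a value from the paper.
v_direct = total_covariance(0.452, -0.053, 0.467, 1.0)
v_reparam = reparameterized_covariance(0.452, -0.053, 0.467, 1.0)
same = bool(np.allclose(v_direct, v_reparam))
positive_definite = bool(np.all(np.linalg.eigvalsh(v_direct) > 0))
```

With these inputs the two matrices agree, and $V_T$ remains positive definite even though $\sigma^2_C$ is negative, which illustrates why a negative component is admissible.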

In survival models, unobserved or unmeasured explanatory variables, some of which may be genetic, are often referred to as frailties. The frailties take values restricted to the positive line and may be assumed to act multiplicatively on the hazard. Extending the above model to correlated family data with a time-to-onset endpoint, a multiplicative individual heterogeneity or frailty term representing the latent genetic and common environment variables may be modeled as random effects simultaneously with the effects associated with observed covariates. Consider right-censored time to onset of melanoma data $(T_{ij}, \delta_{ij}, z_{ij})$, $1 \le i \le n$, from $n$ relative pairs; here $T_{ij}$ denotes the true age at onset of the jth family member or the censored time, depending on whether $\delta_{ij} = 1$ or 0, respectively, and $z$ denotes a $p \times 1$ vector of covariates. A Weibull distribution may be used to model time to failure as

$$f(t_i; z_i) = e^{\beta' z_i} \gamma t_i^{\gamma - 1} \exp\left(-e^{\beta' z_i} t_i^{\gamma}\right)$$

where $\beta$ is a vector of unknown regression coefficients and $\gamma$ is the shape parameter of the Weibull distribution. This leads to a baseline hazard of the form

$$\lambda_0(t_i) = \gamma t_i^{\gamma - 1}$$
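The density and hazard above can be evaluated directly. The following is a minimal Python sketch of this Weibull parameterization, written with the scale $e^{\beta' z}$ passed as `mu`. The coefficient value 0.5 and the binary covariate are hypothetical; only the shape value 4.3 comes from the Results.

```python
import numpy as np

def weibull_pdf(t, gamma, mu):
    """Density mu * gamma * t**(gamma - 1) * exp(-mu * t**gamma)."""
    return mu * gamma * t ** (gamma - 1.0) * np.exp(-mu * t ** gamma)

def weibull_hazard(t, gamma, mu):
    """Hazard mu * gamma * t**(gamma - 1); mu = 1 gives the baseline hazard."""
    return mu * gamma * t ** (gamma - 1.0)

def weibull_sample(size, gamma, mu, rng):
    """Inverse-CDF draw: S(t) = exp(-mu * t**gamma) = U, so t = (-log U / mu)**(1/gamma)."""
    u = 1.0 - rng.uniform(size=size)           # values in (0, 1], avoids log(0)
    return (-np.log(u) / mu) ** (1.0 / gamma)

gamma = 4.3                                    # shape estimate reported in the Results
beta, z = 0.5, 1.0                             # hypothetical coefficient and covariate
mu0, mu1 = 1.0, float(np.exp(beta * z))

t = np.linspace(0.0, 3.0, 30001)
f = weibull_pdf(t, gamma, mu1)
area = float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))   # trapezoid rule
hazard_ratio = weibull_hazard(t[1:], gamma, mu1) / weibull_hazard(t[1:], gamma, mu0)

draws = weibull_sample(100_000, gamma, mu1, np.random.default_rng(0))
theoretical_median = (np.log(2.0) / mu1) ** (1.0 / gamma)
```

The hazard ratio between the two covariate values is constant in $t$ and equals $e^{\beta}$, which is why this model yields relative risks directly.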

Re-parameterize by letting $\mu_i = e^{\beta' z_i}$; the conditional distribution of $t_i$ given $\mu_i$ is then Weibull($\gamma$, $\mu_i$). We formulated a mixed model to represent the conditional distribution of $t_{ij}$ given covariate effects, random additive genetic, and common environment effects as

$$t_{ij} \mid \mu_{ij} \sim \text{Weibull}(\gamma, \mu_{ij}), \qquad i = 1, \ldots, n, \quad j = 1, 2$$

where

The regression coefficients and the precisions of the random effects ($\tau_G$, $\tau_F$, $\tau_H$) were given "non-informative" Normal and Gamma priors, respectively. The shape parameter $\gamma$ of the time to onset of melanoma distribution was also given a non-informative Gamma prior, which was slowly decreasing on the positive real line.
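Because the priors are placed on precisions, the variance components have to be recovered from them. The sketch below shows that mapping, derived from the stated distributions of $F_i$, $G_i$, and $H_i$: since $\text{Var}(G_i) = \tfrac12\sigma^2_A$ and $\text{Var}(F_i) = \tfrac12\sigma^2_A + \sigma^2_C$, the common family component is a difference of two variances and can be negative. The function name is hypothetical, and applying it draw by draw is an assumption about how the components would be monitored; the Appendix program is the authoritative source.

```python
def variance_components(tau_f, tau_g, tau_h):
    """Map random-effect precisions to (s2_a, s2_c, s2_cs)."""
    var_f, var_g, var_h = 1.0 / tau_f, 1.0 / tau_g, 1.0 / tau_h
    s2_a = 2.0 * var_g          # Var(G) = s2_a / 2
    s2_c = var_f - var_g        # Var(F) = s2_a / 2 + s2_c, so s2_c may be negative
    s2_cs = var_h               # Var(H) = s2_cs
    return s2_a, s2_c, s2_cs

# Precisions chosen for illustration so that the output matches the
# posterior means reported for age at onset (0.452, -0.053, 0.467).
tau_g = 1.0 / 0.226
tau_f = 1.0 / 0.173
tau_h = 1.0 / 0.467
s2_a, s2_c, s2_cs = variance_components(tau_f, tau_g, tau_h)
```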

The overall model (including random effects) may be described in a Bayesian graph (Figure 3), which simplifies sampling from full conditional distributions by exploiting partial independence properties (Spiegelhalter et al., 1996a). In this graph, each random quantity is represented by a node, which may be connected by directed or undirected links. Conditional independence assumptions are represented by the absence of such links.

We implemented the Gibbs sampler using the BUGS program (Gilks et al., 1994) (code in the Appendix). Imputation of missing data was handled naturally in the Gibbs sampling framework by treating missing values as additional unknown quantities and randomly sampling values from their full conditional distributions. We chose simple prior distributions for imputation, since the number of missing values for covariates was not large (complete for birth year, 16 missing for naevi, and 21 missing for freckles) and there was no indication of non-random missingness in our data. Therefore, imputation for the missing naevi and freckles covariates was based on Bernoulli prior distributions with respective parameter values estimated from the complete observations. We performed an initial 10,000 burn-in iterations followed by an additional 20,000 iterations. Parameter estimates were the mean and standard deviation (SD) of all post-convergence Gibbs samples with a thinning interval between 20 and 50; credible intervals were computed as the lower and upper $\alpha/2$ percentiles from the last 20,000 iterations. Convergence to the posterior distribution was confirmed by using the different criteria provided by the add-on CODA package, including those of Gelman and Rubin (1992), Geweke (1992), and Raftery and Lewis (1992a, 1992b).
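The post-processing described above (burn-in, thinning, percentile-based credible intervals, and a convergence check) can be sketched as follows. This is a minimal Python illustration, not the CODA implementation: the function names are hypothetical, the chains are simulated stand-ins rather than BUGS output, and the Gelman and Rubin statistic is given in its basic form without the degrees-of-freedom correction.

```python
import numpy as np

def summarize_chain(chain, burn_in, thin, alpha=0.05):
    """Posterior mean, SD, and (1 - alpha) credible interval after burn-in and thinning."""
    kept = np.asarray(chain, dtype=float)[burn_in::thin]
    lo, hi = np.percentile(kept, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return {"mean": float(kept.mean()), "sd": float(kept.std(ddof=1)),
            "ci": (float(lo), float(hi)), "n": int(kept.size)}

def gelman_rubin(chains):
    """Basic potential scale reduction factor for m chains of equal length n."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    w = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    b = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    var_hat = (n - 1) / n * w + b / n            # pooled variance estimate
    return float(np.sqrt(var_hat / w))

rng = np.random.default_rng(42)
# Stand-in for sampler output: independent draws, not real BUGS samples
chain = rng.normal(0.45, 0.05, size=30_000)
summary = summarize_chain(chain, burn_in=10_000, thin=20)
r_hat = gelman_rubin(rng.normal(0.45, 0.05, size=(5, 2_000)))
```

With 30,000 iterations, a burn-in of 10,000, and a thinning interval of 20, 1,000 samples are retained. Values of the scale reduction factor close to 1 are usually read as consistent with convergence.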

Results

GEE Approach

An inspection of residual plots following preliminary model-fitting provided no evidence for the failure of the proportional hazards assumption and did not detect influential observations. We then proceeded to apply the GEE approach, which could estimate regression coefficients while incorporating a dependence structure between relative pairs. The results are displayed in Table 3, which suggests that later birth year, having at least a moderate number of naevi, and freckles were all simultaneously associated with earlier age at onset of melanoma (each relative risk exceeds 1). Table 3 also presents the estimated odds ratios for quantifying the correlation between paired relatives of specific relationships.

The odds ratio for sib–sib pairs is 2.973, which is significantly different from 1 (p < 0.01). The odds ratio for parent–child pairs is 1.650, which is greater than 1, indicating a mild dependency between these pairs, although it is not quite statistically significant. The odds ratio for second-degree and higher relative pairs is 1.155, indicating no detectable dependence between these pairs. This pattern of familial aggregation is compatible with dominance variance as well as additive genetic variance (Falconer, 1990).
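The quantities reported in Table 3 are linked by simple arithmetic: the relative risk is $e^{\beta}$, the Z-statistic is the coefficient divided by its robust standard error, and for the aggregation rows the tabulated odds ratio is $1 + \theta$, so the test against 1 uses $\theta / \text{se}(\theta)$. The sketch below recomputes these for the mean effects and for the sib–sib and parent–child rows. The function name is hypothetical, and the two-sided normal p-value is an assumption about the reference distribution used.

```python
import math

def wald_summary(coef, se):
    """Return (exp(coef), Z = coef / se, two-sided normal p-value)."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))     # equals 2 * (1 - Phi(|z|))
    return math.exp(coef), z, p

# Part A of Table 3: (coefficient, robust standard error)
mean_effects = {
    "Year of birth": (0.132, 0.051),
    "Naevi": (0.568, 0.073),
    "Freckling": (0.148, 0.049),
}
table_a = {name: wald_summary(b, se) for name, (b, se) in mean_effects.items()}

# Part B of Table 3: (odds ratio = 1 + theta, se(theta)); keep only (Z, p)
aggregation = {"Sib-sib": (2.973, 0.6217), "Parent-child": (1.650, 0.434)}
table_b = {name: wald_summary(odds_ratio - 1.0, se)[1:]
           for name, (odds_ratio, se) in aggregation.items()}
```

The recomputed values agree with the tabulated relative risks and Z-statistics to within rounding, and the sib–sib p-value falls below 0.01 while the parent–child p-value does not reach 0.05.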

Bayesian Approach

We re-analyzed the age at onset of melanoma by using Gibbs sampling to impute missing covariates and to estimate subject-specific covariate effects and random additive genetic, common family environment, and shared sibling environment effects on the log scale. The estimated shape parameter $\gamma$ was 4.3, with a 95% credible interval (CI) of (4.2, 4.6). The results are summarized in Table 4 (Model A). The residual plots did not indicate a gross departure from the underlying Weibull model and revealed no influential observations. We checked the sensitivity of the analyses to initial parameter values by re-running the Gibbs sampler five more times using different starting values. The resulting estimates did not differ by more than 5% from the values reported here. The mean estimate for $\sigma^2_A$ was 0.452 with 95% CI = (0.348, 0.566), for $\sigma^2_C$ was –0.053 with 95% CI = (–0.120, 0.019), and for $\sigma^2_{Cs}$ was 0.467 with 95% CI = (0.393, 0.545).

A small negative value for the common family effect suggests that there may be a dominant effect, or that the current model is not quite appropriate; for example, there may be systematic effects or correlation structures that have not been accounted for. The results here indicate that additive genetic effects and shared sibling environment seem to contribute about equally to the variation in the age at onset of melanoma ($\sigma^2_A = 0.452$ versus $\sigma^2_{Cs} = 0.467$). Further exploration of alternative models is a focus of our future research.

In addition, we investigated the relative contribution of genetic and environmental effects on the expression of naevi (Model B) and freckles (Model C), which are known risk factors for melanoma. The expressions of naevi and freckles were coded as binary variables (none or few moles versus moderate or many moles, and no freckles versus one or more freckles). A hierarchical Bayesian binomial model was fitted to estimate the random variance components. The results in Table 4 indicate that a common family environment effect contributed the most to the expression of naevi ($\sigma^2_C = 0.704$), relative to the contributions of the additive genetic effect ($\sigma^2_A = 0.142$) and of the shared sibling effect ($\sigma^2_{Cs} = 0.195$), both of which were non-negligible. In contrast, variation in the expression of freckles was largely explained by additive genetic and shared family effects ($\sigma^2_A = 2.050$, $\sigma^2_C = 2.600$), compared to a relatively small shared sibling effect ($\sigma^2_{Cs} = 0.115$).

Figure 3

Graphical model of covariate and random family effects for an individual (individual j in family i) in a nuclear family structure. $t_{ij}$ represents the observed failure time for the jth individual in the ith family, with $tcen_{ij}$ being an indicator variable of censoring status. Full arrows indicate stochastic links to which a probability is attached; broken arrows denote deterministic relationships. The $\beta$s are regression coefficients; $\tau$ is the precision of the prior distribution and equals the inverse of the variance. $F_i$, $G_i$, $H_i$ are random effects modeled as $F_i \sim N(0, \tfrac{1}{2}\sigma^2_A + \sigma^2_C)$, $G_i \sim N(0, \tfrac{1}{2}\sigma^2_A)$, and $H_i \sim N(0, \sigma^2_{Cs})$. Rectangles represent actual data values for the covariates; $\gamma$ and $\mu_{ij}$ are shape and scale parameters for the underlying Weibull distribution.

Table 3

GEE Approach: Estimated Regression Coefficients in the Proportional Hazard Model and Estimated Odds Ratios for Quantifying Familial Aggregation in Age at Onset of Melanoma in Queensland Families (* Indicates Significance at the .05 Significance Level)

A. Mean effects

Covariate                             RR = $e^{\beta}$   Coefficient $\beta$   Robust se($\beta$)   Z-statistic
Year of birth                         1.142              0.132                 0.051                2.588
Naevi (Baseline = No or few moles)    1.765              0.568                 0.073                7.781
Freckling (Baseline = No freckles)    1.160              0.148                 0.049                3.020

B. Patterns of familial aggregation

Relationship     $1 + \theta$   se($\theta$)   Z-statistic
Sib–sib          2.973          0.6217         3.17
Parent–child     1.650          0.434          1.50
Second/Others    1.155          0.270          0.47

Table 4

Gibbs Sampling Approach: Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families

Weibull Model, A. Mean effects (response variable is age at onset)

Covariate        RR = $e^{\beta}$   Coefficient $\beta$   Robust se($\beta$)   95% CI of $\beta$
Year of birth    1.378              0.321                 0.0027               (0.316, 0.326)
Naevi            1.126              0.119                 0.0021               (0.058, 0.185)
Freckling        1.017              0.017                 0.1400               (–0.005, 0.085)

Weibull Model, B. Variance components (response variable is age at onset)

Latent effect       Mean from 5,000 iterations   se($\sigma^2$)   95% CI of $\sigma^2$
$\sigma^2_A$        0.452                        0.054            (0.348, 0.566)
$\sigma^2_C$        –0.053                       0.027            (–0.120, 0.019)
$\sigma^2_{Cs}$     0.467                        0.040            (0.393, 0.545)
$\gamma$            4.3                          0.104            (4.2, 4.6)

Binomial Model, Variance components (response variable is naevi)

Latent effect       Mean from 5,000 iterations   se($\sigma^2$)   95% CI of $\sigma^2$
$\sigma^2_A$        0.142                        0.149            (0.002, 0.498)
$\sigma^2_C$        0.704                        0.156            (0.403, 1.010)
$\sigma^2_{Cs}$     0.195                        0.157            (0.0025, 0.553)

Binomial Model, Variance components (response variable is freckling)

Latent effect       Mean from 5,000 iterations   se($\sigma^2$)   95% CI of $\sigma^2$
$\sigma^2_A$        2.050                        0.779            (0.835, 3.570)
$\sigma^2_C$        2.600                        0.418            (1.780, 3.460)
$\sigma^2_{Cs}$     0.115                        0.088            (0.011, 0.312)

Note: * indicates significance at the 0.05 level. Naevi is a binary variable with baseline 0 = no or few moles; freckling is coded as a binary variable with baseline 0 = no freckles.

Discussion

We applied two methods, generalized estimating equations and Bayesian analysis, to the genetic analysis of age at onset of melanoma based on a nuclear family structure. Under both approaches, the results suggest that additive genetic factors played an important role in the age at onset of melanoma, but that shared sibling environmental factors were not negligible. We focused attention on these approaches because they are more appropriate for modeling correlated age-at-onset data and they allow the inclusion of covariates in the analyses. Under both approaches there were suggestions that earlier melanoma onset was influenced by later birth year, having a moderate number of naevi, and being freckly. The principal difference between the two approaches is in the interpretation of the regression coefficients. The GEE method uses a marginal approach, resulting in regression coefficients that describe the average population response to changing covariates, whereas the Bayesian approach produces subject-specific coefficients. A secondary distinction is in the nature of the within-pair dependence. The GEE model only describes a common covariance among specific relative pairs, whereas the Bayesian approach can explicitly describe the source of this covariance. A further advantage of the Bayesian method is its flexibility in incorporating prior information, if available, for the covariates or latent effects by modifying their prior distributions. In addition, the Bayesian method would permit a more accurate decomposition of the genetic variance into additive and dominant components, thus providing the means for a direct assessment of the no-dominance assumption. Finally, it is also interesting to record the amount of CPU time required for each method: 20 seconds for the GEE approach, and approximately 3 hours (for binary traits) and 12 hours (for the age at onset outcome) to run BUGS on a single-user Intel Pentium III 600 MHz personal computer with the Linux Mandrake 7.1 operating system. The amount of human and financial resources dedicated to collecting, maintaining, and updating the melanoma family database is extremely high; therefore, the extra CPU time required by the MCMC method is well worth the additional genetic information and flexibility that it provides. A BUGS program is included in the Appendix.

Acknowledgments

Data collection was supported by grants from the Australian National Health and Medical Research Council (NHMRC), ID 930223 and ID 961061, and the US National Cancer Institute (CA88363). Programming resources were provided by start-up funds from the Department of Biostatistics at the M. D. Anderson Cancer Center. We acknowledge assistance with BUGS programming from Sang-Joon Lee.

References

Abel, L., & Bonney, G. E. (1990). A time-dependent logistic hazard function for modeling variable age of onset in the analysis of familial diseases. Genetic Epidemiology, 12, 391–407.

Aitken, J. F., Bailey-Wilson, J., Green, A. C., MacLennan, R., & Martin, N. G. (1998). Segregation analysis of cutaneous melanoma in Queensland. Genetic Epidemiology, 15, 391–401.

Aitken, J. F., Green, A. C., MacLennan, R., Youl, P., & Martin, N. G. (1996). The Queensland familial melanoma project: Study design and characteristics of participants. Melanoma Research, 6, 155–165.

Allison, P. D. (1995). Survival analysis using the SAS system: A practical guide. Cary, NC: SAS Institute Inc.

Andersen, P. K., Borgan, O., Gill, R. D., & Keiding, N. (1993). Statistical models based on counting processes. New York: Springer-Verlag.

Bernardo, J. M., Berger, J. O., Dawid, A. P., & Smith, A. F. M. (Eds.). (1992). Bayesian statistics 4. Oxford: Clarendon Press.

Bonney, G. E. (1986). Regressive logistic model for familial disease and other binary traits. Biometrics, 42, 611–625.

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Wadsworth, California.

Cannings, C., & Thompson, E. A. (1977). Ascertainment in the sequential sampling of pedigrees. Clinical Genetics, 12, 208–212.

Clayton, D. G. (1978). A model for association in bivariate life-tables and its application in epidemiological studies of chronic disease incidence. Biometrika, 65, 141–151.

Clayton, D. G., & Cuzick, J. (1985). Multivariate generalizations of the proportional hazards model (with discussion). Journal of the Royal Statistical Society, Series A, 148, 82–117.

Cox, D. R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34, 187–202.

Do, K.-A., Broom, B. M., Kuhnert, P., Duffy, D. L., Todorov, A. A., Treloar, S. A., & Martin, N. G. (2000). Genetic analysis of the age at menopause by using estimating equations and Bayesian random effects models. Statistics in Medicine, 19, 1217–1235.

English, D. R., Armstrong, B. K., Kricker, A., et al. (1997). Sunlight and cancer. Cancer Causes Control, 8, 271–283.

Falconer, D. S. (1990). Introduction to quantitative genetics (3rd ed.). New York: Longman Group Ltd.

Faucett, C., Gauderman, W. J., Thomas, D., Ziogas, A., & Sobel, E. (1993). Combined segregation and linkage analysis of late-onset Alzheimer's disease in Duke families using Gibbs sampling. Genetic Epidemiology, 10, 489–494.

GAP. (1996). Genetic Analysis Package, Release 1.0. Pasadena, CA: Epicenter Software.

Gauderman, W. J., Morrison, J. L., Carpenter, C. L., & Thomas, D. C. (1997). Analysis of gene-smoking interaction in lung cancer. Genetic Epidemiology, 14, 199–214.

Gauderman, W. J., & Thomas, D. C. (1994). Censored survival models for genetic epidemiology: A Gibbs sampling approach. Genetic Epidemiology, 11, 171–188.

Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7, 457–511.

Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intelligence, 6, 721–741.

Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4. Oxford: Clarendon Press.

Gilks, W. R., Thomas, A., & Spiegelhalter, D. J. (1994). A language and program for complex Bayesian modeling. The Statistician, 43, 169–178.

Guo, S. W., & Thompson, E. A. (1991). Monte Carlo EM for the estimation of multiple random effects models on large pedigrees. IMA Journal of Mathematics Applied in Medicine and Biology, 8, 171–189.

Guo, S. W., & Thompson, E. A. (1992). A Monte Carlo method for combined segregation and linkage analysis. American Journal of Human Genetics, 51, 1111–1126.

Guo, S. W., & Thompson, E. A. (1994). Monte Carlo estimation of mixed models for large complex pedigrees. Biometrics, 50, 417–432.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97–109.

Heath, A. C. (1997). Markov Chain Monte Carlo segregation and linkage analysis for oligogenic models. American Journal of Human Genetics, 61, 748–760.

Hougaard, P. (1986). A class of multivariate failure time distributions. Biometrika, 73, 671–678.

Hsu, L., & Prentice, R. L. (1996). On assessing the strength of dependency between failure time variates. Biometrika, 83, 491–506.

Hsu, L., & Zhao, L. P. (1996). Assessing familial aggregation of age at onset by using estimating equations, with application to breast cancer. American Journal of Human Genetics, 58, 1057–1071.

Kempthorne, O. (1960). Biometrical genetics. New York: Pergamon Press.

Kendall, M., & Stuart, A. (1979). The advanced theory of statistics (Vol. 2, 4th ed.). Oxford: Oxford University Press.

Kong, K., Frigge, M., Cox, N., & Wong, W. H. (1992). Linkage analysis with adjustments for covariates: A method combining peeling with Gibbs sampling. Cytogenetics and Cell Genetics, 59, 208–210.

Kuhnert, P., & Do, K.-A. (2003). Fitting genetic models to twin data with binary and ordered categorical responses: A comparison of structural equation modeling and Bayesian hierarchical models. Behavior Genetics, 33, 439–452.

Lange, K., & Sobel, E. (1991). A random walk method for computing genetic location scores. American Journal of Human Genetics, 49, 1320–1334.

LeBlanc, M., & Crowley, J. (1992). Relative risk trees for censored survival data. Biometrics, 48, 411–425.

Li, H., & Thompson, E. A. (1997). Semiparametric estimation of major gene and random environmental effects for age of onset. Biometrics, 53, 282–293.

Li, H., & Wijsman, E. A. (1998). Semiparametric estimation of major gene effects for age of onset. Genetic Epidemiology, 15, 279–298.

MacLennan, R., Green, A. C., McLeod, G. R., & Martin, N. G. (1992). Increasing incidence of cutaneous melanoma in Queensland, Australia. Journal of the National Cancer Institute, 84, 1424–1432.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087–1092.

Nielsen, G. G., Gill, R. D., Andersen, P. K., & Sørensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics, 19, 25–43.

Oakes, D. (1989). Bivariate survival models induced by frailties. Journal of the American Statistical Association, 84, 487–493.

Raftery, A. E., & Lewis, S. (1992a). Comment: One long run with diagnostics: Implementation strategies for Markov Chain Monte Carlo. Statistical Science, 7, 493–497.

Raftery, A. E., & Lewis, S. (1992b). How many iterations in the Gibbs sampler? In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4 (pp. 763–774). Oxford: Clarendon Press.

SAGE. (1992). Statistical Analysis for Genetic Epidemiology, Release 2.1. Cleveland, OH: Department of Epidemiology and Biostatistics, Rammelkamp Center for Education and Research, MetroHealth Campus, Case Western Reserve University.

Scurrah, K. J., Palmer, L. J., & Burton, P. R. (2000). Variance components analysis for pedigree-based censored survival data using generalized linear mixed models (GLMMs) and Gibbs sampling in BUGS. Genetic Epidemiology, 19, 127–148.

Siegmund, K. D., & McKnight, B. (1998). Modeling hazard functions in families. Genetic Epidemiology, 15, 147–171.

Siskind, V., Aitken, J., Green, A., & Martin, N. (2002). Sun exposure and interaction with family history in risk of melanoma, Queensland, Australia. International Journal of Cancer, 97, 90–95.

Spiegelhalter, D. J., Thomas, A., & Best, N. G. (1996a). Computation on Bayesian graphical models (pp. 407–425). Oxford: Oxford University Press.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Gilks, W. (1996b). BUGS 0.6: Bayesian Inference Using Gibbs Sampling manual. Cambridge: MRC Biostatistics Unit.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Lunn, D. (2003). WinBUGS (Bayesian Inference Using Gibbs Sampling) Windows version 1.4 user manual. Cambridge: MRC Biostatistics Unit.

Swerdlow, A., & Green, A. (1987). Melanocytic naevi and melanoma: An epidemiological perspective. British Journal of Dermatology, 117, 137–146.

Therneau, T. M., & Atkinson, E. (1997). An introduction to recursive partitioning using the RPART routines (Technical report). Rochester: Mayo Foundation.

Thomas, D. (1992). Fitting genetic data using Gibbs sampling: An application to nevus counts in 38 Utah kindreds. Cytogenetics and Cell Genetics, 59, 228–230.

Thompson, E. A. (1994). Monte Carlo programs for pedigree analysis (Technical Report 267). Washington: Dept. of Statistics, University of Washington.

Vaupel, J. M., Manton, K. G., & Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16, 439–454.

Appendix A: A BUGS Program to Implement the Bayesian MCMC Approach in Modeling an ACE Model for Age at Onset of Melanoma with Covariate Effects and Random Genetic/Environmental Effects

Page 9: Analysis of Melanoma Onset: Assessing Familial Aggregation ... · and Fitting Variance Components via Bayesian Random Effects Models Kim-Anh Do1, Joanne F. Aitken2, Adè le C. Green3,

106 Twin Research February 2004

Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin

environmental effects Values for the missing or latentdata are sampled from the local conditional distribu-tion a function of the observed individual data thecurrent sampled values of other missinglatent data forthis particular individual such as polygenic and environ-mental effects and the values for the sampled geneticeffects in the immediate neighbors of an individualGibbs sampling basically consists of three main steps

Step 1 Setting initial values for unobserved quanti-ties (parameters and latent variables)

Step 2 For each parameter or latent variable qj sample from its ldquofull conditional distributionrdquogiven the current values of all other quantities inthe model

Step 3 Examine sampled values of parameters andlatent variables to monitor convergence and toprovide summary measures

Some of the most recent and popular packages thatimplement Gibbs sampling for analysis of pedigreedata include BUGS (Gilks et al 1994 Spiegelhalter et al 1996ab) Genetic Analysis Package (GAP1996) and MIXD (Olshen amp Wijsman 1996Thompson 1994) We have used BUGS mainlybecause of its flexibility in programming hierarchicalmodels besides being a freeware product

The Model

The aim here is to model the correlation structurewithin the family structure to satisfy the fundamentaladditive genetic model (Crow amp Kimura 1970Falconer 1990 Kempthorne 1960) as followsConsider a nuclear family structure consisting of 4members father mother and two children denoted byF M S1 S2 respectively Using similar notation as inBurton et al (1999) a conventional mixed linearmodel consisting of fixed and random effects may bewritten in the form

Qij = bTz + Aij +Cij + CSij+ Eij

where Qij is the observed value of a normally distrib-uted continuous trait for the jth individual in the ith

nuclear family zij is a vector of observed covariatesrepresenting fixed effects and b is a correspondingvector of unknown fixed regression coefficients Aij Cij and CSij

A denote random effects that repre-sent additive polygenic common family environmentand common sibling environment effects respectivelyThe variation in an individual response is representedby a composite covariance matrix VT and is the sumof an additive genetic covariance matrix VA acommon family environment matrix VC a sharedsibling environment matrix VCs and residual environ-mental effects The different variance components are

The overall total covariance matrix is

The components of variance s 2A s 2

C s 2Cs need not be

positive as long as VT is positive definite A negativevalue for s 2

Cs simply implies that the realized covari-ance between sib lings is less than the realized

107Twin Research February 2004

Analysis of Melanoma Onset

covariance between parent and a child A negativevalue for s 2

Cs may suggest dominance To optimizeconvergence in BUGS Model (1) may be reparameter-ized as

where Fi Gi H i are independent additive randomeffects or latent variables and are the residual errorterms for parents and children respectively If we model

Fi ~ N(0 mdash12mdash s 2

A +s 2C)

Gi ~ N(0 mdash12mdash s 2

A) Hi ~ N(0 s 2Cs)

then the basic genetics covariance model (additivegenetic common environment and unique environ-ment for this particular four-member family structureis satisfied

In survival models unobserved or unmeasuredexplanatory variables some of which may be geneticare often referred to as frailties The frailties takevalues restricted to the posit ive line and may beassumed to act multiplicatively on the hazardExtending the above model to correlated family datawith time-to-onset endpoint a multiplicative individ-ual heterogeneity or frailty term representing thelatent genetic and common environment variablesmay be modeled as random effects simultaneouslywith the effects associated with observed covariatesConsider right censored time to onset of melanomadata (Tij dij zij)1 j n from n relative pairs hereT i j denotes the true age at onset of the jth familymember or the censored time depending on whetherdij =1 or 0 respectively and z denotes a p acute 1 vector ofcovariates A Weibull distribution may be used tomodel time to failure as

f(ti zi) = ebrsquoz igt igndash1 exp(ndashebrsquoz it g

i )

where b is a vector of unknown regression coefficientsand g is the shape parameter of the Weibull distribu-tion This leads to a baseline hazard of the form

l0(ti) = gt igndash1

Re-parameterize by letting mi = ebrsquoz i the conditionaldistribution of ti given mI is then Weibull (gmi) We for-mulated a mixed model to represent the conditionaldistribution of tij given covariate effects random addi-tive genetic and common environment effects as

tijfrac12mij ~ Weibull (g mij) i = 1hellipn j = 12

where

The regression coefficients and the precision of therandom effects (tG tF tH) were given ldquonon-informa-tiverdquo Normal and Gamma priors respectively The shapeparameter g of the time to onset of melanoma distribu-tion was also given a non-informative Gamma priorwhich was slowly decreasing on the positive real line

The overall model (including random effects) maybe described in a Bayesian graph (Figure 3) whichsimplifies sampling from full conditional distributionsby exploiting part ial independence propert ies(Spiegelhalter et al 1996a) In this graph eachrandom quantity is represented by a node which maybe connected by directed or undirected linksConditional independence assumptions are repre-sented by the absence of such links

We implemented the Gibbs sampler using theBUGS program (Gilks et al 1994) (code in theAppendix) Imputation of missing data was handlednaturally in the Gibbs sampling framework by treatingmissing values as additional unknown quantities andrandomly sampling values from their full conditionaldistributions We chose simple prior distributions forimputation since the number of missing values forcovariates was not large (complete for birth year 16missing for naevi and 21 missing for freckles) andthere was no indication of non-random missingness inour data Therefore imputation for missing naevi andfreckles covariates were based on Bernouilli prior dis-tributions with respective parameter values estimatedfrom the complete observations We performed aninitial 10000 burn-in iterations followed by an addi-tional 20000 Parameter estimates were the mean andstandard deviation (SD) of all post convergence Gibbssamples with a thinning interval between 20 and 50credible intervals were computed as the lower andupper a2 percentiles from the last 20000 iterationsConvergence to the posterior distribution was con-firmed by using the different criteria provided by theadd-on CODA package including those of Gelmanand Rubin (1992) Geweke (1992) and Raftery andLewis (1992a 1992b)

ResultsGEE Approach

An inspection of residual plots following preliminarymodel-fitting provided no evidence for the failure ofproportional hazards assumption and did not detectinfluential observations We then proceeded to applythe GEE approach that could estimate regressioncoefficients while incorporating a dependence struc-ture between relative pairs The results are displayed

108 Twin Research February 2004

Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin

in Table 3 which suggest that later birth year havingat least a moderate number of naevi and freckles wereall simultaneously associated with later age at onsetof melanoma Table 3 also presents the estimatedodds-ratios for quantifying the correlation betweenpaired relatives of specific relationships

The odds ratio for sibndashsib pairs is 2973 which issignificantly different from 1 (p lt 001) The odds ratiofor parentndashchild pairs is 1650 which is slightly greaterthan 1 indicating a mild dependency between thesepairs although they are not quite statistically signifi-cant The odds ratio for second-degree and higherrelative pairs is 1155 indicating no dependence at allbetween these pairs This pattern of familial aggrega-tion is compatible with dominance variance as well asadditive genetic variance (Falconer 1990)

Bayesian Approach

We re-analyzed the age-at-onset of melanoma by usingGibbs sampling to impute missing covariates and toestimate subject-specific covariate effects randomadditive genetic common family environment andshared sibling environment effects on the log scale Theestimated shape parameter g was 43 with a 95 credi-ble interval (CI) of (42 46) The results aresummarized in Table 4 (Model A) The residual plotsdid not indicate a gross departure from the underlyingWeibull model and revealed no influential observa-tions We checked the sensitivity of the analyses to

initial parameter values by re-running the Gibbssampler five more times using different starting valuesThe resulting estimates did not differ by more than 5from the values reported here The mean estimate for(s 2

A) was 0452 with 95 CI = (0348 0566) for (s 2C)

was ndash0053 with 95 CI = (ndash0120 0019) and for(s 2

Cs) was 0467 with 95 CI = (0393 0545)A small negative value for the common family

effect suggests that there may be a dominant effect orthat the current model is not quite appropriate forexample that there are systematic effects or correla-tion structures that have not been accounted for Theresults here indicate that additive genetics seem toimpact equally on the variation of the age at onset ofmelanoma Further exploration for alternative modelsis a focus of our future research

Figure 3
Graphical model of covariate and random family effects for an individual in a nuclear family structure (plate: individual j in family i). $t_{ij}$ represents the observed failure time for the jth individual in the ith family, with $\mathit{tcen}_{ij}$ being an indicator variable of censoring status. Full arrows indicate stochastic links to which a probability is attached; broken arrows denote deterministic relationships. The $\beta$s are regression coefficients; $\tau$ is the precision of the prior distribution and equals the inverse of the variance. $F_i$, $G_i$, $H_i$ are random effects modeled as $F_i \sim N(0, \tfrac{1}{2}\sigma^2_A + \sigma^2_C)$, $G_i \sim N(0, \tfrac{1}{2}\sigma^2_A)$, and $H_i \sim N(0, \sigma^2_{Cs})$. Rectangles represent actual data values for the covariates; $\gamma$ and $\mu_{ij}$ are shape and scale parameters for the underlying Weibull distribution.

In addition, we investigated the relative contribution of genetic and environmental effects on the expression of naevi (Model B) and freckles (Model C), which are known risk factors for melanoma. The expressions of naevi and freckles were coded as binary variables (none or few moles versus moderate or many moles, and no freckles versus one or more freckles). A hierarchical Bayesian binomial model was fitted to estimate the random variance components. The results in Table 4 indicate that a common family environment effect contributed the most to the expression of naevi ($\sigma^2_C = 0.704$), relative to the contributions of the additive genetic effect ($\sigma^2_A = 0.142$) and of the shared sibling effect ($\sigma^2_{Cs} = 0.142$), both of which were non-negligible. In contrast, variation in the expression of freckles was largely explained by additive genetic and shared family effects ($\sigma^2_A = 2.050$, $\sigma^2_C = 2.600$), compared to a relatively small shared sibling effect ($\sigma^2_{Cs} = 0.115$).
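To make the comparison of components concrete, the following Python sketch expresses each variance component as a share of the summed latent family-level variance, using the freckling estimates reported above. This is only an illustrative summary: the function name `variance_shares` is hypothetical, the residual (individual-level) variance of the binomial model is deliberately left out, and shares are refused when any component estimate is negative, as happens for $\sigma^2_C$ in the age-at-onset model.

```python
def variance_shares(components):
    """Return each component as a fraction of the summed components.

    Assumption: only the latent family-level components are included, so the
    shares describe their relative sizes, not a heritability estimate.
    """
    if any(value < 0 for value in components.values()):
        raise ValueError("shares are not interpretable with a negative component")
    total = sum(components.values())
    return {name: value / total for name, value in components.items()}

# Posterior means for freckling as reported in the text and Table 4.
freckles = {
    "additive_genetic": 2.050,
    "common_family": 2.600,
    "shared_sibling": 0.115,
}

for name, share in variance_shares(freckles).items():
    print(f"{name}: {share:.3f}")
```

Read this way, the shared sibling component accounts for only a few percent of the latent family-level variation in freckling, which matches the qualitative description in the text.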

Table 3
GEE Approach: Estimated Regression Coefficients in the Proportional Hazards Model and Estimated Odds Ratios for Quantifying Familial Aggregation in Age at Onset of Melanoma in Queensland Families (* Indicates Significance at the .05 Significance Level)

A. Mean effects

Covariate                             RR = $e^{\beta}$   Coefficient $\beta$   Robust se($\beta$)   Z-statistic
Year of birth                         1.142              0.132                 0.051                2.588
Naevi (Baseline = No or few moles)    1.765              0.568                 0.073                7.781
Freckling (Baseline = No freckles)    1.160              0.148                 0.049                3.020

B. Patterns of familial aggregation

Relationship     $1 + \theta$   se($\theta$)   Z-statistic
Sib–sib          2.973          0.6217         3.17
Parent–child     1.650          0.434          1.50
Second/Others    1.155          0.270          0.47

Table 4
Gibbs Sampling Approach: Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families

Weibull Model. A. Mean effects (response variable is age at onset)

Covariate       RR = $e^{\beta}$   Coefficient $\beta$   Robust se($\beta$)   95% CI of $\beta$
Year of birth   1.378              0.321                 0.0027               (0.316, 0.326)
Naevi           1.126              0.119                 0.0021               (0.058, 0.185)
Freckling       1.017              0.017                 0.1400               (−0.005, 0.085)

Weibull Model. B. Variance components (response variable is age at onset)

Latent effect       Mean from 5000 iterations   se($\sigma^2$)   95% CI of $\sigma^2$
$\sigma^2_A$        0.452                       0.054            (0.348, 0.566)
$\sigma^2_C$        −0.053                      0.027            (−0.120, 0.019)
$\sigma^2_{Cs}$     0.467                       0.040            (0.393, 0.545)
$\gamma$            4.3                         0.104            (4.2, 4.6)

Binomial Model. Variance components (response variable is naevi)

Latent effect       Mean from 5000 iterations   se($\sigma^2$)   95% CI of $\sigma^2$
$\sigma^2_A$        0.142                       0.149            (0.002, 0.498)
$\sigma^2_C$        0.704                       0.156            (0.403, 1.010)
$\sigma^2_{Cs}$     0.195                       0.157            (0.0025, 0.553)

Binomial Model. Variance components (response variable is freckling)

Latent effect       Mean from 5000 iterations   se($\sigma^2$)   95% CI of $\sigma^2$
$\sigma^2_A$        2.050                       0.779            (0.835, 3.570)
$\sigma^2_C$        2.600                       0.418            (1.780, 3.460)
$\sigma^2_{Cs}$     0.115                       0.088            (0.011, 0.312)

Note: * Indicates significance at the 0.05 level. Naevi is a binary variable with baseline 0 = no or few moles. Freckling is coded as a binary variable with baseline 0 = no freckles.

Discussion

We applied two methods, generalized estimating equations and Bayesian analysis, to the genetic analysis of age at onset of melanoma based on a nuclear family structure. Under both approaches, the results suggest that additive genetic factors played an important role in the age at onset of melanoma, but that shared sibling environmental factors were not negligible. We focused attention on these approaches because they are more appropriate for modeling correlated age-at-onset data and they allow the inclusion of covariates in the analyses. Under both approaches, there were suggestions that earlier melanoma onset was influenced by later birth year, having a moderate number of naevi, and being freckly.

The principal difference between the two approaches is in the interpretation of the regression coefficients. The GEE method uses a marginal approach, resulting in regression coefficients that describe the average population response to changing covariates, whereas the Bayesian approach produces subject-specific coefficients. A secondary distinction is in the nature of the within-pair dependence. The GEE model only describes a common covariance among specific relative pairs, whereas the Bayesian approach can explicitly describe the source of this covariance. A third advantage of the Bayesian method is its flexibility in incorporating prior information, if available, for the covariates or latent effects by modifying their prior distributions. Further, the Bayesian method would permit a more accurate decomposition of the genetic variance into additive and dominance components, thus providing the means for a direct assessment of the no-dominance assumption.

Finally, it is also interesting to record the amount of CPU time required for each method: 20 seconds for the GEE approach, and approximately 3 hours (for binary traits) and 12 hours (for the age-at-onset outcome) to run BUGS on a single-user Intel Pentium III 600 MHz personal computer with the Linux Mandrake 7.1 operating system. The amount of human and financial resources dedicated to collecting, maintaining, and updating the melanoma family database is extremely high; therefore, the extra CPU time required by the MCMC method is well worth the additional genetic information and flexibility that it provides. A BUGS program is included in the Appendix.

Acknowledgments

Data collection was supported by grants from the Australian National Health and Medical Research Council (NHMRC), ID 930223 and ID 961061, and the US National Cancer Institute (CA88363). Programming resources were provided by start-up funds from the Department of Biostatistics at the M. D. Anderson Cancer Center. We acknowledge assistance with BUGS programming from Sang-Joon Lee.

References

Abel, L., & Bonney, G. E. (1990). A time-dependent logistic hazard function for modeling variable age of onset in the analysis of familial diseases. Genetic Epidemiology, 12, 391–407.

Aitken, J. F., Bailey-Wilson, J., Green, A. C., MacLennan, R., & Martin, N. G. (1998). Segregation analysis of cutaneous melanoma in Queensland. Genetic Epidemiology, 15, 391–401.

Aitken, J. F., Green, A. C., MacLennan, R., Youl, P., & Martin, N. G. (1996). The Queensland familial melanoma project: Study design and characteristics of participants. Melanoma Research, 6, 155–165.

Allison, P. D. (1995). Survival analysis using the SAS system: A practical guide. Cary, NC: SAS Institute Inc.

Andersen, P. K., Borgan, O., Gill, R. D., & Keiding, N. (1993). Statistical models based on counting processes. New York: Springer-Verlag.

Bernardo, J. M., Berger, J. O., Dawid, A. P., & Smith, A. F. M. (Eds.). (1992). Bayesian statistics 4. Oxford: Clarendon Press.

Bonney, G. E. (1986). Regressive logistic model for familial disease and other binary traits. Biometrics, 42, 611–625.

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Wadsworth, California.

Cannings, C., & Thompson, E. A. (1977). Ascertainment in the sequential sampling of pedigrees. Clinical Genetics, 12, 208–212.

Clayton, D. G. (1978). A model for association in bivariate life-tables and its application in epidemiological studies of chronic disease incidence. Biometrika, 65, 141–151.

Clayton, D. G., & Cuzick, J. (1985). Multivariate generalizations of the proportional hazards model (with discussion). Journal of the Royal Statistical Society, Series A, 148, 82–117.

Cox, D. R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34, 187–202.

Do, K.-A., Broom, B. M., Kuhnert, P., Duffy, D. L., Todorov, A. A., Treloar, S. A., & Martin, N. G. (2000). Genetic analysis of the age at menopause by using estimating equations and Bayesian random effects models. Statistics in Medicine, 19, 1217–1235.

English, D. R., Armstrong, B. K., Kricker, A., et al. (1997). Sunlight and cancer. Cancer Causes Control, 8, 271–283.

Falconer, D. S. (1990). Introduction to quantitative genetics (3rd ed.). New York: Longman Group Ltd.

Faucett, C., Gauderman, W. J., Thomas, D., Ziogas, A., & Sobel, E. (1993). Combined segregation and linkage analysis of late-onset Alzheimer's disease in Duke families using Gibbs sampling. Genetic Epidemiology, 10, 489–494.

GAP. (1996). Genetic Analysis Package, Release 1.0. Pasadena, CA: Epicenter Software.


Gauderman, W. J., Morrison, J. L., Carpenter, C. L., & Thomas, D. C. (1997). Analysis of gene-smoking interaction in lung cancer. Genetic Epidemiology, 14, 199–214.

Gauderman, W. J., & Thomas, D. C. (1994). Censored survival models for genetic epidemiology: A Gibbs sampling approach. Genetic Epidemiology, 11, 171–188.

Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7, 457–511.

Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intelligence, 6, 721–741.

Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4. Oxford: Clarendon Press.

Gilks, W. R., Thomas, A., & Spiegelhalter, D. J. (1994). A language and program for complex Bayesian modeling. The Statistician, 43, 169–178.

Guo, S. W., & Thompson, E. A. (1991). Monte Carlo EM for the estimation of multiple random effects models on large pedigrees. IMA Journal of Mathematics Applied in Medicine and Biology, 8, 171–189.

Guo, S. W., & Thompson, E. A. (1992). A Monte Carlo method for combined segregation and linkage analysis. American Journal of Human Genetics, 51, 1111–1126.

Guo, S. W., & Thompson, E. A. (1994). Monte Carlo estimation of mixed models for large complex pedigrees. Biometrics, 50, 417–432.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97–109.

Heath, A. C. (1997). Markov Chain Monte Carlo segregation and linkage analysis for oligogenic models. American Journal of Human Genetics, 61, 748–760.

Hougaard, P. (1986). A class of multivariate failure time distributions. Biometrika, 73, 671–678.

Hsu, L., & Prentice, R. L. (1996). On assessing the strength of dependency between failure time variates. Biometrika, 83, 491–506.

Hsu, L., & Zhao, L. P. (1996). Assessing familial aggregation of age at onset by using estimating equations, with application to breast cancer. American Journal of Human Genetics, 58, 1057–1071.

Kempthorne, O. (1960). Biometrical genetics. New York: Pergamon Press.

Kendall, M., & Stuart, A. (1979). The advanced theory of statistics (Vol. 2, 4th ed.). Oxford: Oxford University Press.

Kong, K., Frigge, M., Cox, N., & Wong, W. H. (1992). Linkage analysis with adjustments for covariates: A method combining peeling with Gibbs sampling. Cytogenetics and Cell Genetics, 59, 208–210.

Kuhnert, P., & Do, K.-A. (2003). Fitting genetic models to twin data with binary and ordered categorical responses: A comparison of structural equation modeling and Bayesian hierarchical models. Behavior Genetics, 33, 439–452.

Lange, K., & Sobel, E. (1991). A random walk method for computing genetic location scores. American Journal of Human Genetics, 49, 1320–1334.

LeBlanc, M., & Crowley, J. (1992). Relative risk trees for censored survival data. Biometrics, 48, 411–425.

Li, H., & Thompson, E. A. (1997). Semiparametric estimation of major gene and random environmental effects for age of onset. Biometrics, 53, 282–293.

Li, H., & Wijsman, E. A. (1998). Semiparametric estimation of major gene effects for age of onset. Genetic Epidemiology, 15, 279–298.

MacLennan, R., Green, A. C., McLeod, G. R., & Martin, N. G. (1992). Increasing incidence of cutaneous melanoma in Queensland, Australia. Journal of the National Cancer Institute, 84, 1424–1432.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087–1092.

Nielsen, G. G., Gill, R. D., Andersen, P. K., & Sørensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics, 19, 25–43.

Oakes, D. (1989). Bivariate survival models induced by frailties. Journal of the American Statistical Association, 84, 487–493.

Raftery, A. E., & Lewis, S. (1992a). Comment: One long run with diagnostics: Implementation strategies for Markov Chain Monte Carlo. Statistical Science, 7, 493–497.

Raftery, A. E., & Lewis, S. (1992b). How many iterations in the Gibbs sampler? In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4 (pp. 763–774). Oxford: Clarendon Press.

SAGE. (1992). Statistical Analysis for Genetic Epidemiology, Release 2.1. Cleveland, OH: Department of Epidemiology and Biostatistics, Rammelkamp Center for Education and Research, MetroHealth Campus, Case Western Reserve University.

Scurrah, K. J., Palmer, L. J., & Burton, P. R. (2000). Variance components analysis for pedigree-based censored survival data using generalized linear mixed models (GLMMs) and Gibbs sampling in BUGS. Genetic Epidemiology, 19, 127–148.


Siegmund, K. D., & McKnight, B. (1998). Modeling hazard functions in families. Genetic Epidemiology, 15, 147–171.

Spiegelhalter, D. J., Thomas, A., & Best, N. G. (1996a). Computation on Bayesian graphical models (pp. 407–425). Oxford: Oxford University Press.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Gilks, W. (1996b). BUGS 0.6: Bayesian Inference Using Gibbs Sampling manual. Cambridge: MRC Biostatistics Unit.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Lunn, D. (2003). WinBUGS (Bayesian Inference Using Gibbs Sampling) Windows version 1.4 user manual. Cambridge: MRC Biostatistics Unit.

Siskind, V., Aitken, J., Green, A., & Martin, N. (2002). Sun exposure and interaction with family history in risk of melanoma, Queensland, Australia. International Journal of Cancer, 97, 90–95.

Swerdlow, A., & Green, A. (1987). Melanocytic naevi and melanoma: An epidemiological perspective. British Journal of Dermatology, 117, 137–146.

Therneau, T. M., & Atkinson, E. (1997). An introduction to recursive partitioning using the RPART routines (Technical report). Rochester: Mayo Foundation.

Thomas, D. (1992). Fitting genetic data using Gibbs sampling: An application to nevus counts in 38 Utah kindreds. Cytogenetics and Cell Genetics, 59, 228–230.

Thompson, E. A. (1994). Monte Carlo programs for pedigree analysis (Technical Report 267). Washington: Dept. of Statistics, University of Washington.

Vaupel, J. W., Manton, K. G., & Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16, 439–454.


Appendix A. A BUGS Program to Implement the Bayesian MCMC Approach in Modeling an ACE Model for Age at Onset of Melanoma with Covariate Effects and Random Genetic/Environmental Effects


covariance between a parent and a child. A negative value for $\sigma^2_{Cs}$ may suggest dominance. To optimize convergence in BUGS, Model (1) may be reparameterized as

where $F_i$, $G_i$, $H_i$ are independent additive random effects (latent variables), and the remaining terms are the residual errors for parents and children, respectively. If we model

$$F_i \sim N\left(0, \tfrac{1}{2}\sigma^2_A + \sigma^2_C\right), \qquad G_i \sim N\left(0, \tfrac{1}{2}\sigma^2_A\right), \qquad H_i \sim N\left(0, \sigma^2_{Cs}\right),$$

then the basic genetics covariance model (additive genetic, common environment, and unique environment) for this particular four-member family structure is satisfied.

In survival models, unobserved or unmeasured explanatory variables, some of which may be genetic, are often referred to as frailties. The frailties take values restricted to the positive line and may be assumed to act multiplicatively on the hazard. Extending the above model to correlated family data with a time-to-onset endpoint, a multiplicative individual heterogeneity or frailty term, representing the latent genetic and common environment variables, may be modeled as random effects simultaneously with the effects associated with observed covariates. Consider right-censored time to onset of melanoma data $(T_{ij}, \delta_{ij}, z_{ij})$, $1 \le j \le n$, from n relative pairs. Here $T_{ij}$ denotes the true age at onset of the jth family member or the censored time, depending on whether $\delta_{ij} = 1$ or 0, respectively, and $z$ denotes a $p \times 1$ vector of covariates. A Weibull distribution may be used to model time to failure as

$$f(t_i, z_i) = e^{\beta' z_i}\, \gamma\, t_i^{\gamma - 1} \exp\left(-e^{\beta' z_i}\, t_i^{\gamma}\right),$$

where $\beta$ is a vector of unknown regression coefficients and $\gamma$ is the shape parameter of the Weibull distribution. This leads to a baseline hazard of the form

$$\lambda_0(t_i) = \gamma\, t_i^{\gamma - 1}.$$
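The Weibull parameterization above can be sketched in a few lines of Python. The sketch shows that the density factors into hazard times survival and that the hazard ratio between two covariate values equals $e^{\beta}$ at every time point, which is why the model yields relative risks directly. The function names are hypothetical, the shape $\gamma = 4.3$ and the naevi coefficient 0.119 are the posterior means reported in the Bayesian results of this paper, and the time unit is arbitrary (an assumption made only so that the numbers are well scaled).

```python
import math

def scale_from_covariates(beta, z):
    """mu = exp(beta' z), the Weibull scale implied by the covariates."""
    return math.exp(sum(b * x for b, x in zip(beta, z)))

def weibull_hazard(t, gamma, mu):
    """Hazard mu * gamma * t^(gamma - 1); with mu = 1 this is the baseline hazard."""
    return mu * gamma * t ** (gamma - 1.0)

def weibull_survival(t, gamma, mu):
    """Survival function exp(-mu * t^gamma)."""
    return math.exp(-mu * t ** gamma)

def weibull_density(t, gamma, mu):
    """Density f(t) = hazard(t) * survival(t), matching the formula in the text."""
    return weibull_hazard(t, gamma, mu) * weibull_survival(t, gamma, mu)

gamma = 4.3          # posterior mean of the shape parameter
beta = [0.119]       # naevi coefficient; a single binary covariate for illustration

mu_exposed = scale_from_covariates(beta, [1])
mu_baseline = scale_from_covariates(beta, [0])

# The ratio of hazards does not depend on t (proportional hazards).
for t in (0.5, 1.0, 1.5):
    ratio = weibull_hazard(t, gamma, mu_exposed) / weibull_hazard(t, gamma, mu_baseline)
    print(f"t = {t}: hazard ratio = {ratio:.3f}")
```

The constant ratio printed for each t is the relative risk column RR = $e^{\beta}$ of the results tables.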

Re-parameterizing by letting $\mu_i = e^{\beta' z_i}$, the conditional distribution of $t_i$ given $\mu_i$ is then Weibull($\gamma$, $\mu_i$). We formulated a mixed model to represent the conditional distribution of $t_{ij}$ given covariate effects and random additive genetic and common environment effects as

$$t_{ij} \mid \mu_{ij} \sim \text{Weibull}(\gamma, \mu_{ij}), \qquad i = 1, \ldots, n; \; j = 1, 2,$$

where

The regression coefficients and the precisions of the random effects ($\tau_G$, $\tau_F$, $\tau_H$) were given "non-informative" Normal and Gamma priors, respectively. The shape parameter $\gamma$ of the time to onset of melanoma distribution was also given a non-informative Gamma prior, which was slowly decreasing on the positive real line.

The overall model (including random effects) may be described in a Bayesian graph (Figure 3), which simplifies sampling from full conditional distributions by exploiting partial independence properties (Spiegelhalter et al., 1996a). In this graph, each random quantity is represented by a node, and nodes may be connected by directed or undirected links. Conditional independence assumptions are represented by the absence of such links.

We implemented the Gibbs sampler using the BUGS program (Gilks et al., 1994) (code in the Appendix). Imputation of missing data was handled naturally in the Gibbs sampling framework by treating missing values as additional unknown quantities and randomly sampling values from their full conditional distributions. We chose simple prior distributions for imputation, since the number of missing values for covariates was not large (complete for birth year, 16 missing for naevi and 21 missing for freckles) and there was no indication of non-random missingness in our data. Therefore, imputation for missing naevi and freckles covariates was based on Bernoulli prior distributions, with respective parameter values estimated from the complete observations. We performed an initial 10,000 burn-in iterations followed by an additional 20,000. Parameter estimates were the mean and standard deviation (SD) of all post-convergence Gibbs samples, with a thinning interval between 20 and 50; credible intervals were computed as the lower and upper $\alpha/2$ percentiles from the last 20,000 iterations. Convergence to the posterior distribution was confirmed by using the different criteria provided by the add-on CODA package, including those of Gelman and Rubin (1992), Geweke (1992), and Raftery and Lewis (1992a, 1992b).
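The post-processing described above (discard burn-in, thin, then report the mean, SD, and percentile-based credible interval) can be sketched as follows. This is a minimal illustration, not the CODA or BUGS implementation: the function names are hypothetical, the percentile rule uses simple linear interpolation, and the "chain" is independent Gaussian noise standing in for real sampler output.

```python
import math
import random
import statistics

def percentile(sorted_values, q):
    """Linear-interpolation percentile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lower = int(math.floor(pos))
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac

def summarize_chain(chain, burn_in, thin, alpha=0.05):
    """Drop burn-in draws, keep every `thin`-th draw, and summarize the rest."""
    kept = chain[burn_in::thin]
    if len(kept) < 2:
        raise ValueError("not enough draws left after burn-in and thinning")
    ordered = sorted(kept)
    return {
        "n_kept": len(kept),
        "mean": statistics.fmean(kept),
        "sd": statistics.stdev(kept),
        # Lower and upper alpha/2 percentiles give the (1 - alpha) credible interval.
        "ci": (percentile(ordered, alpha / 2), percentile(ordered, 1 - alpha / 2)),
    }

# Stand-in for sampler output: 10,000 burn-in draws plus 20,000 further draws.
rng = random.Random(0)
toy_chain = [rng.gauss(0.45, 0.05) for _ in range(30000)]

summary = summarize_chain(toy_chain, burn_in=10000, thin=20)
print(summary["n_kept"], round(summary["mean"], 3), round(summary["sd"], 3))
```

With a thinning interval of 20, the 20,000 post-burn-in draws reduce to 1,000 retained samples; a larger interval trades sample size for lower autocorrelation between retained draws.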

Results

GEE Approach

An inspection of residual plots following preliminary model-fitting provided no evidence of failure of the proportional hazards assumption and did not detect influential observations. We then applied the GEE approach, which estimates regression coefficients while incorporating a dependence structure between relative pairs. The results are displayed in Table 3.


Lange K amp Sobel E (1991) A random walk methodfor computing genetic location scores AmericanJournal of Human Genetics 49 1320ndash1334

LeBlanc M amp Crowley J (1992) Relative risk trees forcensored survival data Biometrics 48 411ndash425

Li H amp Thompson E A (1997) Semiparametric esti-mation of major gene and random environmentaleffects for age of onset Biometrics 53 282ndash293

Li H amp Wijsman E A (1998) Semiparametric estima-tion of major gene effects for age of onset GeneticEpidemiology 15 279ndash298

MacLennan R Green A C McLeod G R amp MartinN G (1992) Increasing incidence of cutaneousmelanoma in Queensland Australia Journal of theNational Cancer Institute 84 1424ndash1432

Metropolis N Rosenbluth A W Rosenbluth M NTeller A H amp Teller E (1953) Equation of statecalculations by fast computing machines Journal ofChemical Physics 21 1087ndash1092

Nielsen G G Gill R D Andersen P K amp SorensonT I A (1992) A counting process approach tomaximum likelihood estimation in frailty modelsScandinavian Journal of Statistics 19 25ndash43

Oakes D (1989) Bivariate survival models induced byfrailties Journal of American Statistical Association84 487ndash493

Raftery A L Lewis S (1992a) Comment One longrun with diagnostics Implementation strategies forMarkov Chain Monte Carlo Statistical Science 7493ndash497

Raftery A L amp Lewis S (1992b) How many interac-tions in Gibbs sampler (pp 763ndash774) In J MBernardo J O Berber A P David amp F F M Smith(Eds) Bayesian statistics 4 Oxford Clarendon Press

SAGE (1992) Statistical Analysis for GeneticEpidemiology Release 21 Cleveland OH Departmentof Epidemiology and Biostatistics RammelkampCenter for Education and Research MetroHealthCampus Case Western Reserve University

Scurrah K J Palmer L J amp Burton P R (2000)Variance components analysis for pedigree-based cen-sored survival data unusing generalized linear mixedmodels (GLMMS) and Gibbs sampling in BUGSGenetic Epidemiology 19 127ndash148

112 Twin Research February 2004

Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin

Siegmund K D amp McKnight B (1998) Modelinghazard functions in families Genetic Epidemiology15 147ndash171

Spiegelhalter D J Thomas A amp Best N G (1996a)Computation on Bayesian graphical models (pp407ndash425) Oxford Oxford University Press

Spiegelhalter D J Thomas A Best N G amp Gilks W(1996b) BUGS 06 ndashBayesian InferenceUsing GibbsSampling manual Cambridge MRC Biostatistics Unit

Spiegelhalter D J Thomas A Best N G Lunn D(2003) WinBUGS (Bayesian Inference Using GibbsSampling) Windows version 14 user manualCambridge MRC Biostatistics Unit

Siskind V Aitken J Green A amp Martin N (2002)Sun exposure and interaction with family history inrisk of melanoma Queensland Australia Inter-national Journal of Cancer 97 90ndash95

Swerdlow A amp Green A (1987) Melanocytic naeviand melanoma An epidemiological perspectiveBritish Journal of Dermatology 117 137ndash146

Therneau T M amp Atkinson E (1997) An introductionto recursive partitioning using the RPART routines(Technical report) Rochester Mayo Foundation

Thomas D (1992) Fitting genetic data using Gibbs sam-pling An application to nevus counts in 38 Utahkindreds Cytogenetics and Cell Genetics 59 228ndash230

Thompson E A (1994) Monte Carlo programs for pedigree analysis (Technical Report 267)Washington Dept of Statist ics University ofWashington

Vaupel J M Manton K G amp Stallard E (1979) Theimpact of heterogeneity in individual frailty on thedynamics of mortality Demography 16 439ndash454

113Twin Research February 2004

Analysis of Melanoma Onset

Appendix A A BUGS Program to Implement the Bayesian MCMC approach in Modeling an ACEModel for Age at onset of Melanoma with Covariate Effects and Random GeneticsEnvironmental Effects

Page 11: Analysis of Melanoma Onset: Assessing Familial Aggregation ... · and Fitting Variance Components via Bayesian Random Effects Models Kim-Anh Do1, Joanne F. Aitken2, Adè le C. Green3,

108 Twin Research February 2004

Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin

in Table 3, which suggest that later birth year, having at least a moderate number of naevi, and freckles were all simultaneously associated with later age at onset of melanoma. Table 3 also presents the estimated odds ratios for quantifying the correlation between paired relatives of specific relationships.

The odds ratio for sib–sib pairs is 2.973, which is significantly different from 1 (p < 0.01). The odds ratio for parent–child pairs is 1.650, which is slightly greater than 1, indicating a mild dependency between these pairs, although it is not quite statistically significant. The odds ratio for second-degree and higher relative pairs is 1.155, indicating no dependence at all between these pairs. This pattern of familial aggregation is compatible with dominance variance as well as additive genetic variance (Falconer, 1990).
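The Z-statistics in Table 3 can be recovered from the reported estimates, which is a useful check when reading the odds ratios. The sketch below assumes the Wald form Z = θ / se(θ), with θ obtained by subtracting 1 from the reported odds ratio, and a two-sided normal p-value. The function and variable names are illustrative and do not come from the original analysis.

```python
import math

def wald_z(odds_ratio, se_theta):
    """Wald Z for H0: odds ratio = 1, i.e. theta = 0 (assumed form)."""
    theta = odds_ratio - 1.0
    return theta / se_theta

def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# (1 + theta, se(theta)) as read from Table 3
pairs = {
    "sib-sib": (2.973, 0.6217),
    "parent-child": (1.650, 0.434),
}

for name, (odds_ratio, se) in pairs.items():
    z = wald_z(odds_ratio, se)
    print(f"{name}: Z = {z:.2f}, p = {two_sided_p(z):.4f}")
```

Under these assumptions the sib–sib and parent–child Z-statistics agree with the tabulated values to two decimal places.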

Bayesian Approach

We re-analyzed the age at onset of melanoma by using Gibbs sampling to impute missing covariates and to estimate subject-specific covariate effects and random additive genetic, common family environment, and shared sibling environment effects on the log scale. The estimated shape parameter $\gamma$ was 4.3, with a 95% credible interval (CI) of (4.2, 4.6). The results are summarized in Table 4 (Model A). The residual plots did not indicate a gross departure from the underlying Weibull model and revealed no influential observations. We checked the sensitivity of the analyses to initial parameter values by re-running the Gibbs sampler five more times using different starting values. The resulting estimates did not differ by more than 5% from the values reported here. The mean estimate for $\sigma^2_A$ was 0.452 with 95% CI = (0.348, 0.566), for $\sigma^2_C$ was -0.053 with 95% CI = (-0.120, 0.019), and for $\sigma^2_{Cs}$ was 0.467 with 95% CI = (0.393, 0.545).

A small negative value for the common family effect suggests that there may be a dominant effect or that the current model is not quite appropriate, for example, that there are systematic effects or correlation structures that have not been accounted for. The results here indicate that additive genetic effects and shared sibling environment (0.452 versus 0.467) seem to impact about equally on the variation of the age at onset of melanoma. Further exploration of alternative models is a focus of our future research.
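The sensitivity check on starting values amounts to comparing the estimates from each re-run with the reported ones. A minimal sketch follows, assuming the 5% criterion is a relative difference. The re-run values are invented purely for illustration and are not results from the study, and the function name is hypothetical.

```python
def max_relative_difference(reference, rerun):
    """Largest |rerun - reference| / |reference| over the reference parameters."""
    return max(
        abs(rerun[name] - ref) / abs(ref)
        for name, ref in reference.items()
    )

# Posterior means reported in the text (Model A)
reported = {"sigma2_A": 0.452, "sigma2_Cs": 0.467, "gamma": 4.3}

# Hypothetical estimates from one re-run with different starting values
rerun = {"sigma2_A": 0.461, "sigma2_Cs": 0.458, "gamma": 4.35}

diff = max_relative_difference(reported, rerun)
print(f"max relative difference: {diff:.3f}")
print("within 5%" if diff <= 0.05 else "exceeds 5%")
```

The common family component is left out of this comparison because its estimate is close to zero, where a relative difference is unstable; an absolute tolerance would suit it better.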

In addition, we investigated the relative contribution of genetic and environmental effects on the expression of naevi (Model B) and freckles (Model C), which are known risk factors for melanoma. The expressions of naevi and freckles were coded as binary variables (none or few moles versus moderate or many moles, and no freckles versus one or more freckles). A hierarchical Bayesian binomial model was fitted to estimate the random variance components. The results in Table 4 indicate that a common family environment effect contributed the most to the expression of naevi ($\sigma^2_C = 0.704$), relative to the contributions of additive genetic effect ($\sigma^2_A = 0.142$) and of shared sibling effect ($\sigma^2_{Cs} = 0.142$), both of which were non-negligible. In contrast, variation in the expression of freckles was largely explained by additive genetic and shared family effects ($\sigma^2_A = 2.050$, $\sigma^2_C = 2.600$), compared to a relatively small shared sibling effect ($\sigma^2_{Cs} = 0.115$).

Individual j in family i

Figure 3

Graphical model of covariate and random family effects for an individual in a nuclear family structure. $t_{ij}$ represents the observed failure time for the jth individual in the ith family, with $\mathit{tcen}_{ij}$ being an indicator variable of censoring status. Full arrows indicate stochastic links to which a probability is attached; broken arrows denote deterministic relationships. The $\beta$s are regression coefficients; T is the precision of the prior distribution and equals the inverse of the variance. $F_i$, $G_i$, $H_i$ are random effects modeled as $F_i \sim N(0, \tfrac{1}{2}\sigma^2_A + \sigma^2_C)$, $G_i \sim N(0, \tfrac{1}{2}\sigma^2_A)$, and $H_i \sim N(0, \sigma^2_{Cs})$. Rectangles represent actual data values for the covariates. $\gamma$ and $\mu_{ij}$ are shape and scale parameters for the underlying Weibull distribution.
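The random effects F, G and H of Figure 3 induce different covariances for different relative pairs. The sketch below assumes a particular combination that the caption does not spell out: each child's linear predictor adds a family effect F, an individual effect G, and a sibling-shared effect H, while each parent's adds only F and G. Under that assumption sib pairs share F and H, and parent–child pairs share only F. The function name is hypothetical, and the inputs are the naevi posterior means from Table 4.

```python
def implied_covariances(s2_a, s2_c, s2_cs):
    """Covariances implied by F ~ N(0, s2_a/2 + s2_c), G ~ N(0, s2_a/2),
    H ~ N(0, s2_cs), with F shared by the family and H shared by siblings
    (an assumed structure, labelled as such in the lead-in)."""
    var_f = 0.5 * s2_a + s2_c
    var_g = 0.5 * s2_a
    var_h = s2_cs
    return {
        "sib-sib": var_f + var_h,                 # siblings share F and H
        "parent-child": var_f,                    # parent and child share only F
        "child variance": var_f + var_g + var_h,
        "parent variance": var_f + var_g,
    }

# Posterior means for naevi from Table 4 (binomial model)
cov = implied_covariances(s2_a=0.142, s2_c=0.704, s2_cs=0.195)
for key, value in cov.items():
    print(f"{key}: {value:.3f}")
```

This structure lets the model attribute excess sib–sib resemblance, over and above parent–child resemblance, to the shared sibling component.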

Table 3

GEE Approach: Estimated Regression Coefficients in the Proportional Hazard Model and Estimated Odds Ratios for Quantifying Familial Aggregation in Age at Onset of Melanoma in Queensland Families (* Indicates Significance at the 0.05 Significance Level)

A. Mean effects

| Covariate | RR = $e^{\beta}$ | Coefficient $\beta$ | Robust se($\beta$) | Z-statistic |
|---|---|---|---|---|
| Year of birth | 1.142 | 0.132 | 0.051 | 2.588 |
| Naevi (Baseline = No or few moles) | 1.765 | 0.568 | 0.073 | 7.781 |
| Freckling (Baseline = No freckles) | 1.160 | 0.148 | 0.049 | 3.020 |

B. Patterns of familial aggregation

| Relationship | $1 + \theta$ | se($\theta$) | Z-statistic |
|---|---|---|---|
| Sib–sib | 2.973 | 0.6217 | 3.17 |
| Parent–child | 1.650 | 0.434 | 1.50 |
| Second/Others | 1.155 | 0.270 | 0.47 |

Table 4

Gibbs Sampling Approach: Estimated Regression Coefficients and Estimated Variance Components in a Melanoma Study of Queensland Families

Weibull Model: A. Mean effects (response variable is age at onset)

| Covariate | RR = $e^{\beta}$ | Coefficient $\beta$ | Robust se($\beta$) | 95% CI of $\beta$ |
|---|---|---|---|---|
| Year of birth | 1.378 | 0.321 | 0.0027 | (0.316, 0.326) |
| Naevi | 1.126 | 0.119 | 0.0021 | (0.058, 0.185) |
| Freckling | 1.017 | 0.017 | 0.1400 | (-0.005, 0.085) |

Weibull Model: B. Variance components (response variable is age at onset)

| Latent effect | Mean from 5000 iterations | se($\sigma^2$) | 95% CI of $\sigma^2$ |
|---|---|---|---|
| $\sigma^2_A$ | 0.452 | 0.054 | (0.348, 0.566) |
| $\sigma^2_C$ | -0.053 | 0.027 | (-0.120, 0.019) |
| $\sigma^2_{Cs}$ | 0.467 | 0.040 | (0.393, 0.545) |
| $\gamma$ | 4.3 | 0.104 | (4.2, 4.6) |

Binomial Model: Variance components (response variable is naevi)

| Latent effect | Mean from 5000 iterations | se($\sigma^2$) | 95% CI of $\sigma^2$ |
|---|---|---|---|
| $\sigma^2_A$ | 0.142 | 0.149 | (0.002, 0.498) |
| $\sigma^2_C$ | 0.704 | 0.156 | (0.403, 1.010) |
| $\sigma^2_{Cs}$ | 0.195 | 0.157 | (0.0025, 0.553) |

Binomial Model: Variance components (response variable is freckling)

| Latent effect | Mean from 5000 iterations | se($\sigma^2$) | 95% CI of $\sigma^2$ |
|---|---|---|---|
| $\sigma^2_A$ | 2.050 | 0.779 | (0.835, 3.570) |
| $\sigma^2_C$ | 2.600 | 0.418 | (1.780, 3.460) |
| $\sigma^2_{Cs}$ | 0.115 | 0.088 | (0.011, 0.312) |

Note: * indicates significance at the 0.05 level. Naevi is a binary variable with baseline 0 = no or few moles. Freckling is coded as a binary variable with baseline 0 = no freckles.

Discussion

We applied two methods, generalized estimating equations and Bayesian analysis, to the genetic analysis of age at onset of melanoma based on a nuclear family structure. Under both approaches, the results suggest that additive genetic factors played an important role in the age at onset of melanoma, but that shared sibling environmental factors were not negligible. We focused attention on these approaches because they are more appropriate for modeling correlated age-at-onset data and they allow the inclusion of covariates in the analyses. Under both approaches, there were suggestions that earlier melanoma onset was influenced by later birth year, having a moderate number of naevi, and being freckly.

The principal difference between the two approaches is in the interpretation of the regression coefficients. The GEE method uses a marginal approach, resulting in regression coefficients that describe the average population response to changing covariates, whereas the Bayesian approach produces subject-specific coefficients. A secondary distinction is in the nature of the within-pair dependence. The GEE model only describes a common covariance among specific relative pairs, whereas the Bayesian approach can explicitly describe the source of this covariance. A third advantage of the Bayesian method is its flexibility in incorporating prior information, if available, for the covariates or latent effects by modifying their prior distributions. Further, the Bayesian method would permit a more accurate decomposition of the genetic variance into additive and dominant components, thus providing the means for a direct assessment of the no-dominance assumption.

Finally, it is also interesting to record the amount of CPU time required for each method: 20 seconds for the GEE approach, and approximately 3 hours (for binary traits) and 12 hours (for the age-at-onset outcome) to run BUGS on a single-user Intel Pentium III 600 MHz personal computer with the Linux Mandrake 7.1 operating system. The amount of human and financial resources dedicated to collecting, maintaining and updating the melanoma family database is extremely high; therefore, the extra CPU time required by the MCMC method is well justified by the additional genetic information and flexibility that it provides. A BUGS program is included in the Appendix.
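Both tables report relative risks as RR = e^β, so the tabulated values can be checked against the coefficients. The sketch below does that, and also illustrates why the ratio does not depend on age. It assumes a Weibull hazard of the form h(t) = μγt^(γ−1) with log μ linear in the covariates; this form is stated here as an assumption, not taken from the Appendix program, and the baseline value used is arbitrary.

```python
import math

def weibull_hazard(t, gamma, log_mu):
    """Hazard h(t) = mu * gamma * t**(gamma - 1), with mu = exp(log_mu)."""
    return math.exp(log_mu) * gamma * t ** (gamma - 1.0)

def relative_risk(beta):
    """Relative risk for a one-unit covariate change: RR = exp(beta)."""
    return math.exp(beta)

# Coefficients as read from Table 3 (GEE) and Table 4 (Weibull model)
gee = {"year of birth": 0.132, "naevi": 0.568, "freckling": 0.148}
bayes = {"year of birth": 0.321, "naevi": 0.119, "freckling": 0.017}

for label, coefs in (("GEE", gee), ("Bayesian", bayes)):
    for name, beta in coefs.items():
        print(f"{label} {name}: RR = {relative_risk(beta):.3f}")

# Under the assumed form the hazard ratio is the same at every age t.
gamma, base = 4.3, -20.0  # base is an arbitrary illustrative log-scale value
ratio = weibull_hazard(50.0, gamma, base + 0.568) / weibull_hazard(50.0, gamma, base)
print(f"hazard ratio at t = 50: {ratio:.3f}")
```

Small differences in the third decimal place between recomputed and tabulated relative risks are expected, because the coefficients themselves are reported to three decimals.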

Acknowledgments

Data collection was supported by grants from the Australian National Health and Medical Research Council (NHMRC; ID 930223 and ID 961061) and the US National Cancer Institute (CA88363). Programming resources were provided by start-up funds from the Department of Biostatistics at the M. D. Anderson Cancer Center. We acknowledge assistance with BUGS programming from Sang-Joon Lee.

References

Abel, L., & Bonney, G. E. (1990). A time-dependent logistic hazard function for modeling variable age of onset in the analysis of familial diseases. Genetic Epidemiology, 12, 391–407.

Aitken, J. F., Bailey-Wilson, J., Green, A. C., MacLennan, R., & Martin, N. G. (1998). Segregation analysis of cutaneous melanoma in Queensland. Genetic Epidemiology, 15, 391–401.

Aitken, J. F., Green, A. C., MacLennan, R., Youl, P., & Martin, N. G. (1996). The Queensland familial melanoma project: Study design and characteristics of participants. Melanoma Research, 6, 155–165.

Allison, P. D. (1995). Survival analysis using the SAS system: A practical guide. Cary, NC: SAS Institute Inc.

Andersen, P. K., Borgan, O., Gill, R. D., & Keiding, N. (1993). Statistical models based on counting processes. New York: Springer-Verlag.

Bernardo, J. M., Berger, J. O., Dawid, A. P., & Smith, A. F. M. (Eds.). (1992). Bayesian statistics 4. Oxford: Clarendon Press.

Bonney, G. E. (1986). Regressive logistic model for familial disease and other binary traits. Biometrics, 42, 611–625.

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Wadsworth, California.

Cannings, C., & Thompson, E. A. (1977). Ascertainment in the sequential sampling of pedigrees. Clinical Genetics, 12, 208–212.

Clayton, D. G. (1978). A model for association in bivariate life-tables and its application in epidemiological studies of chronic disease incidence. Biometrika, 65, 141–151.

Clayton, D. G., & Cuzick, J. (1985). Multivariate generalizations of the proportional hazards model (with discussion). Journal of the Royal Statistical Society, Series A, 148, 82–117.

Cox, D. R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34, 187–202.

Do, K.-A., Broom, B. M., Kuhnert, P., Duffy, D. L., Todorov, A. A., Treloar, S. A., & Martin, N. G. (2000). Genetic analysis of the age at menopause by using estimating equations and Bayesian random effects models. Statistics in Medicine, 19, 1217–1235.

English, D. R., Armstrong, B. K., Kricker, A., et al. (1997). Sunlight and cancer. Cancer Causes Control, 8, 271–283.

Falconer, D. S. (1990). Introduction to quantitative genetics (3rd ed.). New York: Longman Group Ltd.

Faucett, C., Gauderman, W. J., Thomas, D., Ziogas, A., & Sobel, E. (1993). Combined segregation and linkage analysis of late-onset Alzheimer's disease in Duke families using Gibbs sampling. Genetic Epidemiology, 10, 489–494.

GAP. (1996). Genetic Analysis Package, Release 1.0. Pasadena, CA: Epicenter Software.

Gauderman, W. J., Morrison, J. L., Carpenter, C. L., & Thomas, D. C. (1997). Analysis of gene-smoking interaction in lung cancer. Genetic Epidemiology, 14, 199–214.

Gauderman, W. J., & Thomas, D. C. (1994). Censored survival models for genetic epidemiology: A Gibbs sampling approach. Genetic Epidemiology, 11, 171–188.

Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7, 457–511.

Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.

Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4. Oxford: Clarendon Press.

Gilks, W. R., Thomas, A., & Spiegelhalter, D. J. (1994). A language and program for complex Bayesian modeling. The Statistician, 43, 169–178.

Guo, S. W., & Thompson, E. A. (1991). Monte Carlo EM for the estimation of multiple random effects models on large pedigrees. IMA Journal of Mathematics Applied in Medicine and Biology, 8, 171–189.

Guo, S. W., & Thompson, E. A. (1992). A Monte Carlo method for combined segregation and linkage analysis. American Journal of Human Genetics, 51, 1111–1126.

Guo, S. W., & Thompson, E. A. (1994). Monte Carlo estimation of mixed models for large complex pedigrees. Biometrics, 50, 417–432.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97–109.

Heath, A. C. (1997). Markov Chain Monte Carlo segregation and linkage analysis for oligogenic models. American Journal of Human Genetics, 61, 748–760.

Hougaard, P. (1986). A class of multivariate failure time distributions. Biometrika, 73, 671–678.

Hsu, L., & Prentice, R. L. (1996). On assessing the strength of dependency between failure time variates. Biometrika, 83, 491–506.

Hsu, L., & Zhao, L. P. (1996). Assessing familial aggregation of age at onset by using estimating equations, with application to breast cancer. American Journal of Human Genetics, 58, 1057–1071.

Kempthorne, O. (1960). Biometrical genetics. New York: Pergamon Press.

Kendall, M., & Stuart, A. (1979). The advanced theory of statistics (Vol. 2, 4th ed.). Oxford: Oxford University Press.

Kong, K., Frigge, M., Cox, N., & Wong, W. H. (1992). Linkage analysis with adjustments for covariates: A method combining peeling with Gibbs sampling. Cytogenetics and Cell Genetics, 59, 208–210.

Kuhnert, P., & Do, K.-A. (2003). Fitting genetic models to twin data with binary and ordered categorical responses: A comparison of structural equation modeling and Bayesian hierarchical models. Behavior Genetics, 33, 439–452.

Lange, K., & Sobel, E. (1991). A random walk method for computing genetic location scores. American Journal of Human Genetics, 49, 1320–1334.

LeBlanc, M., & Crowley, J. (1992). Relative risk trees for censored survival data. Biometrics, 48, 411–425.

Li, H., & Thompson, E. A. (1997). Semiparametric estimation of major gene and random environmental effects for age of onset. Biometrics, 53, 282–293.

Li, H., & Wijsman, E. A. (1998). Semiparametric estimation of major gene effects for age of onset. Genetic Epidemiology, 15, 279–298.

MacLennan, R., Green, A. C., McLeod, G. R., & Martin, N. G. (1992). Increasing incidence of cutaneous melanoma in Queensland, Australia. Journal of the National Cancer Institute, 84, 1424–1432.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087–1092.

Nielsen, G. G., Gill, R. D., Andersen, P. K., & Sorenson, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics, 19, 25–43.

Oakes, D. (1989). Bivariate survival models induced by frailties. Journal of the American Statistical Association, 84, 487–493.

Raftery, A. L., & Lewis, S. (1992a). Comment: One long run with diagnostics: Implementation strategies for Markov Chain Monte Carlo. Statistical Science, 7, 493–497.

Raftery, A. L., & Lewis, S. (1992b). How many iterations in the Gibbs sampler? In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4 (pp. 763–774). Oxford: Clarendon Press.

SAGE. (1992). Statistical Analysis for Genetic Epidemiology, Release 2.1. Cleveland, OH: Department of Epidemiology and Biostatistics, Rammelkamp Center for Education and Research, MetroHealth Campus, Case Western Reserve University.

Scurrah, K. J., Palmer, L. J., & Burton, P. R. (2000). Variance components analysis for pedigree-based censored survival data using generalized linear mixed models (GLMMs) and Gibbs sampling in BUGS. Genetic Epidemiology, 19, 127–148.

Siegmund, K. D., & McKnight, B. (1998). Modeling hazard functions in families. Genetic Epidemiology, 15, 147–171.

Siskind, V., Aitken, J., Green, A., & Martin, N. (2002). Sun exposure and interaction with family history in risk of melanoma, Queensland, Australia. International Journal of Cancer, 97, 90–95.

Spiegelhalter, D. J., Thomas, A., & Best, N. G. (1996a). Computation on Bayesian graphical models (pp. 407–425). Oxford: Oxford University Press.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Gilks, W. (1996b). BUGS 0.6: Bayesian Inference Using Gibbs Sampling manual. Cambridge: MRC Biostatistics Unit.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Lunn, D. (2003). WinBUGS (Bayesian Inference Using Gibbs Sampling), Windows version 1.4, user manual. Cambridge: MRC Biostatistics Unit.

Swerdlow, A., & Green, A. (1987). Melanocytic naevi and melanoma: An epidemiological perspective. British Journal of Dermatology, 117, 137–146.

Therneau, T. M., & Atkinson, E. (1997). An introduction to recursive partitioning using the RPART routines (Technical report). Rochester: Mayo Foundation.

Thomas, D. (1992). Fitting genetic data using Gibbs sampling: An application to nevus counts in 38 Utah kindreds. Cytogenetics and Cell Genetics, 59, 228–230.

Thompson, E. A. (1994). Monte Carlo programs for pedigree analysis (Technical Report 267). Washington: Dept. of Statistics, University of Washington.

Vaupel, J. M., Manton, K. G., & Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16, 439–454.

Appendix A: A BUGS Program to Implement the Bayesian MCMC Approach in Modeling an ACE Model for Age at Onset of Melanoma with Covariate Effects and Random Genetic/Environmental Effects


Lange K amp Sobel E (1991) A random walk methodfor computing genetic location scores AmericanJournal of Human Genetics 49 1320ndash1334

LeBlanc M amp Crowley J (1992) Relative risk trees forcensored survival data Biometrics 48 411ndash425

Li H amp Thompson E A (1997) Semiparametric esti-mation of major gene and random environmentaleffects for age of onset Biometrics 53 282ndash293

Li H amp Wijsman E A (1998) Semiparametric estima-tion of major gene effects for age of onset GeneticEpidemiology 15 279ndash298

MacLennan R Green A C McLeod G R amp MartinN G (1992) Increasing incidence of cutaneousmelanoma in Queensland Australia Journal of theNational Cancer Institute 84 1424ndash1432

Metropolis N Rosenbluth A W Rosenbluth M NTeller A H amp Teller E (1953) Equation of statecalculations by fast computing machines Journal ofChemical Physics 21 1087ndash1092

Nielsen G G Gill R D Andersen P K amp SorensonT I A (1992) A counting process approach tomaximum likelihood estimation in frailty modelsScandinavian Journal of Statistics 19 25ndash43

Oakes D (1989) Bivariate survival models induced byfrailties Journal of American Statistical Association84 487ndash493

Raftery A L Lewis S (1992a) Comment One longrun with diagnostics Implementation strategies forMarkov Chain Monte Carlo Statistical Science 7493ndash497

Raftery A L amp Lewis S (1992b) How many interac-tions in Gibbs sampler (pp 763ndash774) In J MBernardo J O Berber A P David amp F F M Smith(Eds) Bayesian statistics 4 Oxford Clarendon Press

SAGE (1992) Statistical Analysis for GeneticEpidemiology Release 21 Cleveland OH Departmentof Epidemiology and Biostatistics RammelkampCenter for Education and Research MetroHealthCampus Case Western Reserve University

Scurrah K J Palmer L J amp Burton P R (2000)Variance components analysis for pedigree-based cen-sored survival data unusing generalized linear mixedmodels (GLMMS) and Gibbs sampling in BUGSGenetic Epidemiology 19 127ndash148

112 Twin Research February 2004

Kim-Anh Do Joanne F Aitken Adele C Green and Nicholas G Martin

Siegmund K D amp McKnight B (1998) Modelinghazard functions in families Genetic Epidemiology15 147ndash171

Spiegelhalter D J Thomas A amp Best N G (1996a)Computation on Bayesian graphical models (pp407ndash425) Oxford Oxford University Press

Spiegelhalter D J Thomas A Best N G amp Gilks W(1996b) BUGS 06 ndashBayesian InferenceUsing GibbsSampling manual Cambridge MRC Biostatistics Unit

Spiegelhalter D J Thomas A Best N G Lunn D(2003) WinBUGS (Bayesian Inference Using GibbsSampling) Windows version 14 user manualCambridge MRC Biostatistics Unit

Siskind V Aitken J Green A amp Martin N (2002)Sun exposure and interaction with family history inrisk of melanoma Queensland Australia Inter-national Journal of Cancer 97 90ndash95

Swerdlow A amp Green A (1987) Melanocytic naeviand melanoma An epidemiological perspectiveBritish Journal of Dermatology 117 137ndash146

Therneau T M amp Atkinson E (1997) An introductionto recursive partitioning using the RPART routines(Technical report) Rochester Mayo Foundation

Thomas D (1992) Fitting genetic data using Gibbs sam-pling An application to nevus counts in 38 Utahkindreds Cytogenetics and Cell Genetics 59 228ndash230

Thompson E A (1994) Monte Carlo programs for pedigree analysis (Technical Report 267)Washington Dept of Statist ics University ofWashington

Vaupel J M Manton K G amp Stallard E (1979) Theimpact of heterogeneity in individual frailty on thedynamics of mortality Demography 16 439ndash454

113Twin Research February 2004

Analysis of Melanoma Onset

Appendix A A BUGS Program to Implement the Bayesian MCMC approach in Modeling an ACEModel for Age at onset of Melanoma with Covariate Effects and Random GeneticsEnvironmental Effects

important role in the age at onset of melanoma, but that shared sibling environmental factors were not negligible. We focused attention on these approaches because they are more appropriate for modeling correlated age-at-onset data and they allow the inclusion of covariates in the analyses. Under both approaches, there were suggestions that earlier melanoma onset was influenced by later birth year, having a moderate number of naevi, and being freckly. The principal difference between the two approaches is in the interpretation of the regression coefficients. The GEE method uses a marginal approach, resulting in regression coefficients that describe the average population response to changing covariates, whereas the Bayesian approach produces subject-specific coefficients. A secondary distinction is in the nature of the within-pair dependence. The GEE model only describes a common covariance among specific relative pairs, whereas the Bayesian approach can explicitly describe the source of this covariance. A third advantage of the Bayesian method is its flexibility in incorporating prior information, if available, for the covariates or latent effects by modifying their prior distributions. Further, the Bayesian method would permit a more accurate decomposition of the genetic variance into additive and dominant components, thus providing the means for a direct assessment of the no-dominance assumption. Finally, it is also interesting to record the amount of CPU time required for each method: 20 seconds for the GEE approach, and approximately 3 hours (for binary traits) and 12 hours (for age at onset outcome) to run BUGS on a single-user Intel Pentium III 600 MHz personal computer with the Linux Mandrake 7.1 operating system. The amount of human and financial resources dedicated to collecting, maintaining and updating the melanoma family database is extremely high; therefore, the extra CPU time requirement by the MCMC method is well worth the additional genetic information and flexibility that it provides. A BUGS program is included in the Appendix.
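The distinction between describing a single covariance per relative pair and describing the source of that covariance can be illustrated with a minimal Python sketch. It assumes the textbook ACE expectations, in which a pair shares a fraction of the additive genetic variance equal to its coefficient of relationship and, for pairs raised together, all of the common environmental variance. The names `AceVariances`, `RELATEDNESS` and `SHARES_SIBLING_ENVIRONMENT`, the choice of which pairs share the sibling environment, and the numeric values are hypothetical illustrations; they are not taken from the BUGS program or from the estimates reported in this paper.

```python
from dataclasses import dataclass

# Standard coefficients of relationship: expected fraction of additive
# genetic effects shared by each type of relative pair.
RELATEDNESS = {
    "mz_twins": 1.0,
    "full_siblings": 0.5,
    "parent_offspring": 0.5,
    "half_siblings": 0.25,
}

# Assumption for this sketch only: these pair types share the common
# (sibling) environment; all other pairs do not.
SHARES_SIBLING_ENVIRONMENT = {"mz_twins", "full_siblings"}


@dataclass(frozen=True)
class AceVariances:
    """Hypothetical container for the three ACE variance components."""

    additive: float  # additive genetic variance (A)
    common: float    # common/shared environmental variance (C)
    unique: float    # unique environmental variance (E)

    @property
    def total(self) -> float:
        return self.additive + self.common + self.unique

    def pair_covariance(self, pair: str) -> float:
        """Expected covariance between two relatives of the given type."""
        cov = RELATEDNESS[pair] * self.additive
        if pair in SHARES_SIBLING_ENVIRONMENT:
            cov += self.common
        return cov

    def pair_correlation(self, pair: str) -> float:
        return self.pair_covariance(pair) / self.total

    def heritability(self) -> float:
        """Proportion of total variance attributed to additive genetic effects."""
        return self.additive / self.total


if __name__ == "__main__":
    # Illustrative values only, not estimates from the melanoma data.
    v = AceVariances(additive=0.5, common=0.25, unique=0.25)
    for pair in RELATEDNESS:
        print(pair, v.pair_covariance(pair), v.pair_correlation(pair))
    print("heritability", v.heritability())
```

A marginal method observes only the left-hand side of this relationship (one covariance per pair type), whereas a variance components model estimates the components themselves, from which the pair covariances follow.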

Acknowledgments

Data collection was supported by grants from the Australian National Health and Medical Research Council (NHMRC), ID 930223 and ID 961061, and the US National Cancer Institute (CA88363). Programming resources were provided by start-up funds from the Department of Biostatistics at the M. D. Anderson Cancer Center. We acknowledge assistance with BUGS programming from Sang-Joon Lee.

References

Abel, L., & Bonney, G. E. (1990). A time-dependent logistic hazard function for modeling variable age of onset in the analysis of familial diseases. Genetic Epidemiology, 12, 391–407.

Aitken, J. F., Bailey-Wilson, J., Green, A. C., MacLennan, R., & Martin, N. G. (1998). Segregation analysis of cutaneous melanoma in Queensland. Genetic Epidemiology, 15, 391–401.

Aitken, J. F., Green, A. C., MacLennan, R., Youl, P., & Martin, N. G. (1996). The Queensland familial melanoma project: Study design and characteristics of participants. Melanoma Research, 6, 155–165.

Allison, P. D. (1995). Survival analysis using the SAS system: A practical guide. Cary, NC: SAS Institute Inc.

Andersen, P. K., Borgan, O., Gill, R. D., & Keiding, N. (1993). Statistical models based on counting processes. New York: Springer-Verlag.

Bernardo, J. M., Berger, J. O., Dawid, A. P., & Smith, A. F. M. (Eds.). (1992). Bayesian statistics 4. Oxford: Clarendon Press.

Bonney, G. E. (1986). Regressive logistic model for familial disease and other binary traits. Biometrics, 42, 611–625.

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Wadsworth, California.

Cannings, C., & Thompson, E. A. (1977). Ascertainment in the sequential sampling of pedigrees. Clinical Genetics, 12, 208–212.

Clayton, D. G. (1978). A model for association in bivariate life-tables and its application in epidemiological studies of chronic disease incidence. Biometrika, 65, 141–151.

Clayton, D. G., & Cuzick, J. (1985). Multivariate generalizations of the proportional hazards model (with discussion). Journal of the Royal Statistical Society, Series A, 148, 82–117.

Cox, D. R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34, 187–202.

Do, K.-A., Broom, B. M., Kuhnert, P., Duffy, D. L., Todorov, A. A., Treloar, S. A., & Martin, N. G. (2000). Genetic analysis of the age at menopause by using estimating equations and Bayesian random effects models. Statistics in Medicine, 19, 1217–1235.

English, D. R., Armstrong, B. K., Kricker, A., et al. (1997). Sunlight and cancer. Cancer Causes Control, 8, 271–283.

Falconer, D. S. (1990). Introduction to quantitative genetics (3rd ed.). New York: Longman Group Ltd.

Faucett, C., Gauderman, W. J., Thomas, D., Ziogas, A., & Sobel, E. (1993). Combined segregation and linkage analysis of late-onset Alzheimer's disease in Duke families using Gibbs sampling. Genetic Epidemiology, 10, 489–494.

GAP. (1996). Genetic Analysis Package, Release 1.0. Pasadena, CA: Epicenter Software.

Gauderman, W. J., Morrison, J. L., Carpenter, C. L., & Thomas, D. C. (1997). Analysis of gene-smoking interaction in lung cancer. Genetic Epidemiology, 14, 199–214.

Gauderman, W. J., & Thomas, D. C. (1994). Censored survival models for genetic epidemiology: A Gibbs sampling approach. Genetic Epidemiology, 11, 171–188.

Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7, 457–511.

Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intelligence, 6, 721–741.

Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4. Oxford: Clarendon Press.

Gilks, W. R., Thomas, A., & Spiegelhalter, D. J. (1994). A language and program for complex Bayesian modeling. The Statistician, 43, 169–178.

Guo, S. W., & Thompson, E. A. (1991). Monte Carlo EM for the estimation of multiple random effects models on large pedigrees. IMA Journal of Mathematics Applied in Medicine and Biology, 8, 171–189.

Guo, S. W., & Thompson, E. A. (1992). A Monte Carlo method for combined segregation and linkage analysis. American Journal of Human Genetics, 51, 1111–1126.

Guo, S. W., & Thompson, E. A. (1994). Monte Carlo estimation of mixed models for large complex pedigrees. Biometrics, 50, 417–432.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97–109.

Heath, A. C. (1997). Markov Chain Monte Carlo segregation and linkage analysis for oligogenic models. American Journal of Human Genetics, 61, 748–760.

Hougaard, P. (1986). A class of multivariate failure time distributions. Biometrika, 73, 671–678.

Hsu, L., & Prentice, R. L. (1996). On assessing the strength of dependency between failure time variates. Biometrika, 83, 491–506.

Hsu, L., & Zhao, L. P. (1996). Assessing familial aggregation of age at onset by using estimating equations, with application to breast cancer. American Journal of Human Genetics, 58, 1057–1071.

Kempthorne, O. (1960). Biometrical genetics. New York: Pergamon Press.

Kendall, M., & Stuart, A. (1979). The advanced theory of statistics (Vol. 2, 4th ed.). Oxford: Oxford University Press.

Kong, K., Frigge, M., Cox, N., & Wong, W. H. (1992). Linkage analysis with adjustments for covariates: A method combining peeling with Gibbs sampling. Cytogenetics and Cell Genetics, 59, 208–210.

Kuhnert, P., & Do, K.-A. (2003). Fitting genetic models to twin data with binary and ordered categorical responses: A comparison of structural equation modeling and Bayesian hierarchical models. Behavior Genetics, 33, 439–452.

Lange, K., & Sobel, E. (1991). A random walk method for computing genetic location scores. American Journal of Human Genetics, 49, 1320–1334.

LeBlanc, M., & Crowley, J. (1992). Relative risk trees for censored survival data. Biometrics, 48, 411–425.

Li, H., & Thompson, E. A. (1997). Semiparametric estimation of major gene and random environmental effects for age of onset. Biometrics, 53, 282–293.

Li, H., & Wijsman, E. A. (1998). Semiparametric estimation of major gene effects for age of onset. Genetic Epidemiology, 15, 279–298.

MacLennan, R., Green, A. C., McLeod, G. R., & Martin, N. G. (1992). Increasing incidence of cutaneous melanoma in Queensland, Australia. Journal of the National Cancer Institute, 84, 1424–1432.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087–1092.

Nielsen, G. G., Gill, R. D., Andersen, P. K., & Sørensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics, 19, 25–43.

Oakes, D. (1989). Bivariate survival models induced by frailties. Journal of the American Statistical Association, 84, 487–493.

Raftery, A. E., & Lewis, S. (1992a). Comment: One long run with diagnostics: Implementation strategies for Markov Chain Monte Carlo. Statistical Science, 7, 493–497.

Raftery, A. E., & Lewis, S. (1992b). How many iterations in the Gibbs sampler? In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4 (pp. 763–774). Oxford: Clarendon Press.

SAGE. (1992). Statistical Analysis for Genetic Epidemiology, Release 2.1. Cleveland, OH: Department of Epidemiology and Biostatistics, Rammelkamp Center for Education and Research, MetroHealth Campus, Case Western Reserve University.

Scurrah, K. J., Palmer, L. J., & Burton, P. R. (2000). Variance components analysis for pedigree-based censored survival data using generalized linear mixed models (GLMMs) and Gibbs sampling in BUGS. Genetic Epidemiology, 19, 127–148.

Siegmund, K. D., & McKnight, B. (1998). Modeling hazard functions in families. Genetic Epidemiology, 15, 147–171.

Siskind, V., Aitken, J., Green, A., & Martin, N. (2002). Sun exposure and interaction with family history in risk of melanoma, Queensland, Australia. International Journal of Cancer, 97, 90–95.

Spiegelhalter, D. J., Thomas, A., & Best, N. G. (1996a). Computation on Bayesian graphical models (pp. 407–425). Oxford: Oxford University Press.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Gilks, W. (1996b). BUGS 0.6 – Bayesian Inference Using Gibbs Sampling manual. Cambridge: MRC Biostatistics Unit.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Lunn, D. (2003). WinBUGS (Bayesian Inference Using Gibbs Sampling), Windows version 1.4 user manual. Cambridge: MRC Biostatistics Unit.

Swerdlow, A., & Green, A. (1987). Melanocytic naevi and melanoma: An epidemiological perspective. British Journal of Dermatology, 117, 137–146.

Therneau, T. M., & Atkinson, E. (1997). An introduction to recursive partitioning using the RPART routines (Technical report). Rochester: Mayo Foundation.

Thomas, D. (1992). Fitting genetic data using Gibbs sampling: An application to nevus counts in 38 Utah kindreds. Cytogenetics and Cell Genetics, 59, 228–230.

Thompson, E. A. (1994). Monte Carlo programs for pedigree analysis (Technical Report 267). Washington: Dept. of Statistics, University of Washington.

Vaupel, J. W., Manton, K. G., & Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16, 439–454.

Appendix A: A BUGS Program to Implement the Bayesian MCMC Approach in Modeling an ACE Model for Age at Onset of Melanoma with Covariate Effects and Random Genetics/Environmental Effects
