Faculty of Sciences
Latent class linear mixed model to analyze preclinical tumor growth experiments with heterogeneous treatment responses
Sofie Denies
Master dissertation submitted to
obtain the degree of
Master of Statistical Data Analysis
Promoter: Prof. dr. Stijn Vansteelandt
Department of Applied Mathematics, Computer Science and Statistics
Academic year 2016 - 2017
The author and the promoter give permission to consult this master dissertation and to
copy it or parts of it for personal use. Each other use falls under the restrictions of the
copyright, in particular concerning the obligation to mention explicitly the source when
using results of this master dissertation.
FOREWORD
This manuscript is divided into two parts. In the first part, simulations were performed to evaluate
the use of different models to analyse data from longitudinal tumor measurements. In the
second part, these models are applied to experimental data that were obtained during the student’s
employment at iTeos Therapeutics. iTeos proprietary compounds that were used in this
experiment are not identified, thus there is no need for confidentiality.
This work would not have been possible without the contribution of the people involved.
Firstly, I would like to thank my promoter, professor dr. Stijn Vansteelandt, for giving me the
opportunity to complete this thesis under his supervision, as well as for his valuable suggestions
and contributions to this thesis. I am also grateful to the management team at iTeos
Therapeutics for allowing me to use data gathered in their company. Their positive attitude
towards my personal development, specifically in the context of statistical analysis, is also greatly
appreciated.
Table of Contents
1 ABSTRACT
2 INTRODUCTION
2.1 Data
2.2 Latent class linear mixed models
2.3 Software
3 MATERIAL AND METHODS
3.1 Data
3.2 Research questions
3.3 Specifying a latent class linear mixed model
3.4 Specifying non-latent class models
4 RESULTS
4.1 Simulation study
4.1.1 Simulations
4.1.2 Classification
4.1.3 Model selection
4.1.4 False positive rate
4.1.5 Power
4.1.6 Bias and precision
4.1.7 Model refinement
4.2 Experimental data
4.2.1 Data description
4.2.2 Visual inspection data
4.2.3 Model building
4.2.4 Comparison model with no latent classes to model with 2 latent classes
5 DISCUSSION
6 References
1 ABSTRACT
This thesis discusses the analysis of longitudinal tumor measurements obtained from
preclinical experiments on cancer immunotherapy. Because the response to immunotherapy is
not homogeneous, the data do not meet the assumptions of a classical linear mixed model. One way
to deal with this unexplained heterogeneity is latent class analysis, which is the subject of this
thesis. An extension of the linear mixed model by latent class analysis is compared with the
standard linear mixed model as well as a generalized estimating equation (GEE) approach in
the context of longitudinal measurements with heterogeneous treatment responses.
In a first part, simulations were performed to assess performance of the latent class linear mixed
model and compare it with the other techniques. For this, heterogeneous groups were composed
by mixing longitudinal profiles from two homogenous populations representing the latent
classes. Longitudinal parameters for the different populations as well as the mixing proportion
were specified. Performance of the latent class linear mixed model was assessed by
determining the proportion of profiles that were correctly classified in the latent class they were
sampled from and the ability to select the correct number of latent classes. Additionally, power
and type I error rate to detect a treatment difference, as well as bias and precision of predicted
tumor growth curves were assessed and compared with the standard linear mixed model and
the GEE model. In a second part, experimental data of a preclinical experiment conducted at
iTeos Therapeutics was analyzed by a standard linear mixed model and the latent class linear
mixed model. The ability to detect biologically meaningful subgroups was assessed as well as
the power to detect a relevant treatment difference.
From the simulations, it was clear that the latent class linear mixed model is able to correctly
classify all the longitudinal profiles in more than 90% of the simulations, as long as the model
is run with a sufficient number of sets of starting values (10) to ensure that a global maximum of the
likelihood is reached. It is important to allow the random effect structure to differ per
latent class if needed, as this influences the identification of the correct number of latent classes,
type I error rate, power and precision. With a correctly specified model, the latent class
linear mixed model performed very well, with a correct type I error rate and sufficient power
in all conditions evaluated. In contrast, the standard linear mixed model behaved poorly, with
an inflated type I error rate and unsatisfactory power, even with a low level of heterogeneity and
a relatively large treatment difference. The GEE model was considered inadequate for this kind
of data: the missing value pattern in these data, which is missing at random because mice are
sacrificed once the tumor reaches a certain volume, resulted in severe bias, invalidating the results
obtained with that analysis. The application of the model to experimental data confirmed the
ability to discriminate meaningful subgroups and a higher power to detect a treatment
difference compared to the standard linear mixed model.
2 INTRODUCTION
2.1 Data
This thesis discusses the analysis of tumor growth curves obtained from preclinical
experiments on cancer immunotherapy. In this field of oncology, drugs are developed that are
not targeting the cancer cells directly, but instead stimulate the immune system to fight cancer.
Cancer immunotherapy is now considered to be one of the greatest promises for finding a cure
against cancer, mostly because of the impressive responses seen with the immune checkpoint
inhibitors. However, the mechanism of action of immunotherapy is complex, leading to a high
degree of heterogeneity in how well patients respond to therapy, much more than for the
conventional cytotoxic treatments like chemotherapy. Which factors determine this
heterogeneity is of major interest and thus the subject of intense research, but they are as yet poorly
understood (Farkona et al., 2016).
Interestingly, and even more surprisingly considering the identical genetic background and
highly standardized procedures, even in preclinical experiments the response to
immunotherapy is not homogenous. Responses to the same treatment protocol can range from
no delay in tumor growth (non-responders) over a delay in tumor growth (partial responders)
to full regression of the tumor (complete responders). Because of the longitudinal nature of the
measurements, an elegant technique to analyze tumor growth curves is the linear mixed model
framework (Liu et al., 2010). Because tumor growth is exponential, tumor volumes are log
transformed before analysis. The inclusion of random effects accommodates a degree of
heterogeneity. It allows that the true parameter (baseline tumor volume for random intercept,
tumor growth rate for random slope) within one population determined by the fixed effects
varies from individual to individual. However, it assumes that this individual variability is
described by a single distribution, most typically the normal distribution. This assumption is
often not met in data from immunotherapy experiments, where clear subpopulations are
present, based on unknown and thus unobserved variables. One way to deal with this
unexplained heterogeneity is latent class analysis (Laajala et al., 2012). The subject of this
thesis is the evaluation of the extension of the linear mixed model by latent class analysis for
analyzing tumor growth curves in response to immunotherapy.
2.2 Latent class linear mixed models
In this model framework, subjects are probabilistically assigned to subpopulations (class
membership) based on the observed longitudinal data. The assignment is probabilistic because the latent
nature of the class variable implies uncertainty: class membership is not directly observed but
estimated. Two assumptions are made in this context. First, the latent classes are mutually
exclusive; this means that one individual belongs to one class only. Because of the latent and
thus unobserved nature of class membership, there is no certainty as to which class, which is why
probabilities are calculated for each class per individual. A second assumption is that latent
classes are exhaustive: the probabilities to belong to the estimated classes must sum to one
(Duncan et al. 2013). Within each class, longitudinal profiles are modeled by a standard linear
mixed model. Individual contributions to the likelihood are thus the weighted sum of
class-specific densities (equation 1, Proust-Lima et al. 2017).
$$f_i(y_i) = \sum_{g=1}^{G} \pi_g \, f(y_i \mid c_i = g) \qquad (1)$$
with:
$$0 \le \pi_g \le 1 \quad \text{and} \quad \sum_{g=1}^{G} \pi_g = 1$$
and
$$Y_{ij} \mid c_i = g \;=\; X1_{ij}\beta + X2_{ij}\upsilon_g + Z_{ij}u_{ig} + \epsilon_{ij}$$
$\pi_g$ = class-specific probabilities
$c_i$ = discrete latent variable that equals g if subject i belongs to latent class g
$X1$ = vector of covariates that are associated with common fixed effects over classes $\beta$
$X2$ = vector of covariates that are associated with class-specific fixed effects $\upsilon_g$
$Z$ = vector of covariates associated with random effects
$u_{ig}$ = class-specific random effects, $u_i \mid c_i = g \sim N(0, \omega_g^2 B)$ with $B$ being an
unspecified variance-covariance matrix and $\omega_g$ a proportional coefficient allowing
class-specific intensity of individual variability
$\epsilon_{ij}$ = measurement error, $\sim N(0, \sigma_\epsilon^2)$
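As an illustration of equation 1, the weighted sum of class-specific densities and the resulting posterior class probabilities can be sketched in a few lines of Python. This is a deliberately simplified sketch and not the lcmm implementation: it assumes a single observation per subject and a univariate normal density per class, whereas the model above uses the density of the whole longitudinal profile. The function names and all numerical values are hypothetical.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_contribution(y, weights, means, sds):
    """Return (f_i, posterior): the weighted sum over classes and P(c_i = g | y)."""
    # Joint term pi_g * f(y | c_i = g) for every class g
    joint = [w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)                      # likelihood contribution f_i(y_i)
    posterior = [j / total for j in joint]  # Bayes' rule; sums to one over classes
    return total, posterior

# Hypothetical two-class example: class 1 centred at 2.0, class 2 at 6.0
weights = [0.2, 0.8]
f_i, post = mixture_contribution(5.5, weights, means=[2.0, 6.0], sds=[1.0, 1.0])
print(round(f_i, 4), [round(p, 4) for p in post])
```

The posterior probabilities computed this way are what make the class assignment probabilistic: each subject receives one probability per class, and the most likely class is the one with the largest value.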
For parameter estimation, the EM algorithm is the most widely used optimization method in latent
class analysis. Based on initial parameter values, in the expectation step the expected value for
the unobserved latent variable, conditional on the observed data and initial parameter values,
is calculated for each subject. In the next step (maximization), this expected value is treated as
a measured covariate and parameters are re-estimated based on maximizing the complete data
likelihood. In a next iteration, the expected value for the latent variable is again calculated for
each subject based on these updated parameters and the algorithm repeats alternating the
expectation and maximization steps until convergence (Mooijaart et al. 1992). Nevertheless, it
is also possible to directly maximize the incomplete data likelihood, with Newton-Raphson
like algorithms (Proust-Lima et al. 2017). An example is the Marquardt algorithm, a blend of
gradient descent and Gauss-Newton iterations. Gradient descent quickly approaches the
solution from a distance, but convergence is very slow close to the solution. Gauss-Newton is
the opposite, with fast convergence close to the solution, but its efficiency is heavily dependent
on the accuracy of the initial guess. The Marquardt algorithm combines the strengths of both,
acting more like a gradient descent when parameters are far from their optimal value, and more
like Gauss-Newton when they are close to the optimal value (Ranganathan, 2004). Apart from
likelihood-based methods, Bayesian estimation procedures for latent class analysis of
longitudinal data are also available (Lenk et al., 2000).
Two specific issues are related to the estimation of latent class problems. The first is that the
likelihood often suffers from local maxima. It is therefore imperative to run the algorithms with
different start values, to ensure that a global maximum is reached (Jung et al., 2008). A second
problem is the selection of the number of classes. The most recommended approach is to fit
models with at least one more and one fewer than the expected number of latent classes, where the
expected number is based on the data or on theoretical expectations. The correct number of classes is then determined by model
selection, based either on information criteria or on likelihood-based tests. Among the
information criteria, the Bayesian information criterion (BIC) performs best, and among the
likelihood-based tests, the bootstrap likelihood ratio test (BLRT). The BLRT outperforms the
BIC, but is computationally demanding (Tein et al., 2013).
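The BIC-based selection described above can be sketched as follows. The log-likelihoods and parameter counts below are made-up numbers for illustration only, not results from this thesis, and the use of the number of subjects (rather than the number of observations) in the penalty term is an assumption of this sketch.

```python
import math

def bic(log_lik, n_params, n_subjects):
    """Bayesian information criterion; lower values indicate the preferred model."""
    return -2.0 * log_lik + n_params * math.log(n_subjects)

# Hypothetical log-likelihoods and parameter counts for models with 1 to 3 classes
candidates = {
    1: {"log_lik": -410.0, "n_params": 13},
    2: {"log_lik": -350.0, "n_params": 20},
    3: {"log_lik": -345.0, "n_params": 27},
}
n_subjects = 30  # 2 treatment groups x 15 mice

scores = {ng: bic(m["log_lik"], m["n_params"], n_subjects) for ng, m in candidates.items()}
best_ng = min(scores, key=scores.get)  # number of classes with the lowest BIC
print(scores, best_ng)
```

In this constructed example the third class improves the log-likelihood only slightly, so the penalty for the extra parameters outweighs the gain and the model with 2 classes has the lowest BIC.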
2.3 Software
Several software options are available to run latent class mixed models. Mplus is a statistical
software package with a special emphasis on latent class analysis, including many applications
within linear mixed models (Jung et al., 2008). Within basic SAS, no functions for longitudinal
latent class analysis are available. A freely available extension proc Traj allows latent class
analysis with longitudinal data, but it assumes independence of the repeated measurements
within individuals per latent class, which only rarely holds (Jones et al., 2001). Macros that do
not have this restriction by allowing random effects within the latent classes have been
developed (Komarek et al., 2002). Within R, several packages for longitudinal latent class
analysis are available (Leisch, 2004; White et al., 2014, Benaglia et al., 2009; Proust-Lima et
al., 2017). For this thesis, lcmm was chosen, a package specifically designed to provide
extensions to the linear mixed model, including latent class analysis (Proust-Lima et al., 2017).
Compared to other R packages, advantages include the user-friendliness to specify the model
components (fixed, random, class-specific covariates for the longitudinal model and covariates
associated with latent class membership) and many built-in post-fit functions. The lcmm package is based
on maximum likelihood theory, and uses a modified Marquardt algorithm for optimization.
3 MATERIAL AND METHODS
3.1 Data
This thesis focuses on latent classes as most frequently observed during immunotherapy
experiments, i.e. mice with complete regression of the tumor and mice with exponential
(possibly delayed) tumor growth. For a large part, analyses were performed on simulated data.
Each simulated experiment has two treatment groups, each composed of mice belonging to
these two latent classes. A treatment effect is represented by a difference in longitudinal profile
in one of the two latent classes.
Some important characteristics of the data:
• Simulations and analyses are done on logarithmically transformed tumor volumes as
tumor growth is exponential in nature.
• Time is recoded to be zero at the start of treatment, which is generally not the same as
time of tumor inoculation. As part of experimental procedures, mice are allocated at
that time to treatment groups with on average the same tumor volume per group.
Although strictly not a randomization, as allocation is completely determined by
baseline tumor volume, this ensures completely balanced groups for tumor volume at
start of treatment. Treatment differences are thus always represented in a different
growth rate, while the main effect of treatment is not of interest.
• Mice are sacrificed when tumor volume exceeds 2000 mm³. Additionally, tumor volumes
below 1 mm³ cannot be accurately measured. Therefore, all simulated data >log(2000)
were treated as missing values and all simulated data <log(1) were fixed at log(1). All mice
are followed from the start of treatment until tumor volume reaches 2000 mm³. Because the
probability of a missing response thus does not depend on the unobserved response itself,
given the previously observed responses, this can be considered a missing at random (MAR)
pattern (Ibrahim and Molenberghs, 2009).
• Sample size is based on what is practically feasible within preclinical drug development
research: 15 mice per group, with twelve measurement occasions.
• The number of simulations was set at 2000.
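The two measurement limits in the list above can be sketched as a small helper. This is an illustrative Python sketch, not the code used for the simulations: the function name is hypothetical, natural logarithms are assumed, and the rule is applied value by value exactly as stated (values above log(2000) become missing, values below log(1) are set to log(1)).

```python
import math

UPPER = math.log(2000.0)  # sacrifice threshold on the (natural) log scale
LOWER = math.log(1.0)     # lower measurement limit on the log scale (equals 0)

def apply_measurement_limits(log_volumes):
    """Set values above log(2000) to None (missing) and raise values below log(1) to log(1)."""
    cleaned = []
    for v in log_volumes:
        if v > UPPER:
            cleaned.append(None)   # mouse would have been sacrificed: missing value
        elif v < LOWER:
            cleaned.append(LOWER)  # below the measurable volume: fixed at log(1)
        else:
            cleaned.append(v)
    return cleaned

# Hypothetical simulated profile on the log scale
profile = [4.3, 5.9, 7.4, 7.9, -0.4]
print(apply_measurement_limits(profile))
```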
Heterogeneous groups were created in the following way:
• Homogenous longitudinal profiles were simulated, separately per latent class and per
treatment group. This was done with the simulate function of the lme4 package (Bates
et al., 2015).
◦ One latent class is described by initial tumor growth, followed by regression of
the tumor. This curvature is modelled by a quadratic term of time. The
parameters used to describe this class can be found in equation 2 and example
tumor growth curves in figure 2. The longitudinal profile of this class is not
influenced by treatment. In the rest of this thesis, this latent class will be
referred to as ‘latent class 1’.
$$Y_{ij} = (4.35 + b_0) + (1.88 + b_1)\,Time_{ij} - (6.64 + b_2)\,Time_{ij}^2 + \epsilon_{ij} \qquad (2)$$
$$\mathbf{b} = (b_0, b_1, b_2)^T \sim N(\mathbf{0}, \mathbf{D}) \quad \text{and} \quad \mathbf{D} = \begin{pmatrix} 0.016 & -0.10 & 0.028 \\ -0.10 & 5.34 & -5.48 \\ 0.028 & -5.48 & 8.37 \end{pmatrix}$$
$$\epsilon \sim N(0, 0.066)$$
◦ The other latent class is described by exponential tumor growth (linear for log
transformed values). Therefore, the quadratic term of time is set to zero for this
class. The parameters used to describe this class can be found in equation 3a
and example tumor growth curves in figure 2. To simulate a treatment effect,
the linear term of time was varied per treatment group, i.e. decreased by 25%
(equation 3b) and 50% (equation 3c). In the rest of this thesis, this latent class
will be referred to as ‘latent class 2’.
$$Y_{ij} = (4.35 + b_0) + (6.63 + b_1)\,Time_{ij} - (0 + b_2)\,Time_{ij}^2 + \epsilon_{ij} \qquad (3a)$$
$$Y_{ij} = (4.35 + b_0) + (5 + b_1)\,Time_{ij} - (0 + b_2)\,Time_{ij}^2 + \epsilon_{ij} \qquad (3b)$$
$$Y_{ij} = (4.35 + b_0) + (3.3 + b_1)\,Time_{ij} - (0 + b_2)\,Time_{ij}^2 + \epsilon_{ij} \qquad (3c)$$
$$\mathbf{b} = (b_0, b_1, b_2)^T \sim N(\mathbf{0}, \mathbf{D}) \quad \text{and} \quad \mathbf{D} = \begin{pmatrix} 0.008 & -0.05 & 0.014 \\ -0.05 & 2.64 & -2.71 \\ 0.014 & -2.71 & 2.98 \end{pmatrix}$$
$$\epsilon \sim N(0, 0.033)$$
• The proportion of the two latent classes was kept constant across treatment groups and
set to 0.2, 0.4 or 0.6 for latent class 1 (p1; p2 = 1 - p1) in different simulation runs.
The actual number of longitudinal profiles to be sampled from either latent class 1 or
latent class 2 per simulation was determined by binomial sampling. For this, the R base
function ‘sample’ was used to get 15 draws of either ‘class 1’ or ‘class 2’ with p1 and
p2 specified. An example of one simulation round, based on a p1 of 0.2 and a treatment
difference of 25% in the slope of class 2, is shown in figure 1.
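The binomial sampling step can be sketched as follows. The thesis used the R base function ‘sample’; the Python version below is only an equivalent illustration, with a hypothetical function name and an arbitrary seed.

```python
import random

def sample_class_counts(n_mice, p1, rng):
    """Draw a class label for each mouse and return (n1, n2)."""
    labels = rng.choices(["class 1", "class 2"], weights=[p1, 1.0 - p1], k=n_mice)
    n1 = labels.count("class 1")
    return n1, n_mice - n1

rng = random.Random(2017)  # arbitrary seed, only for reproducibility of the sketch
for treatment in ("A", "B"):
    n1, n2 = sample_class_counts(15, 0.2, rng)
    print(treatment, n1, n2)
```

Because the counts are drawn separately per treatment group, the realized number of mice per latent class can differ between groups even though the underlying proportion p1 is the same.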
[Figure 1 panels: Treatment A (latent class 1: n1 = 4; latent class 2: n2 = 11) and Treatment B (latent class 1: n1 = 1; latent class 2: n2 = 14); rows (1) to (3) correspond to the steps in the caption.]
Figure 1: Schedule showing the creation of heterogeneous treatment groups by simulation. Per
treatment and latent class, homogenous longitudinal profiles (L1 and L2) were simulated (1).
Based on a fixed proportion of each latent class (here 0.2 for class 1 and 0.8 for class 2) equal
between treatment groups, and fixed sample size per treatment group (n=15), the number of
profiles to be taken from each latent class (n1 and n2) is simulated per treatment group (2).
Treatment groups are then composed by taking n1 profiles of L1 and n2 profiles of L2 (3).
In addition to the simulated datasets, experimental data with observed heterogeneity are
also used. The data were collected at iTeos Therapeutics and consist of mice treated with a
reference immunotherapy drug (aPD-1) with and without an iTeos compound. More detail
about these data can be found in the relevant chapter.
3.2 Research questions
In this thesis, the research question is limited to evaluating the performance of latent classes
when there is a treatment difference in the longitudinal profile of one of the latent classes. A
treatment effect manifesting in a difference in proportion of mice belonging to a certain class
is another potential application of latent class analysis that is not considered in this thesis but
reviewed in the discussion.
Performance of the latent class analysis was assessed by:
• The number of mice that were correctly classified, i.e. in the latent class they were
sampled from.
• The ability to select the model with the correct number of latent classes (i.e. 2) based
on model selection criterion BIC.
• The power of the model to detect the relevant treatment difference. This was compared
with the power of a linear mixed model ignoring latent classes as well as with a generalized
estimating equation (GEE) approach using robust standard errors.
• The false positive rate when simulated under the null hypothesis of no treatment
difference in the longitudinal profile of any of the latent classes. Again this was
compared with the two other models as for power.
• The bias in predicted growth curves in latent class 2 (the latent class simulated to have
a treatment difference) along with its precision. Also here a comparison with a standard
linear mixed model and the GEE approach was made.
• The ability to identify biologically relevant latent classes and treatment differences in
experimental data and compare with a standard linear mixed model.
3.3 Specifying a latent class linear mixed model
Within the lcmm package, the hlme function is used to fit a latent class mixed model. The
function call consists of:
• Specifying the fixed effects model formula: Although tumor growth is exponential in
nature, in latent class 1 there is initially tumor growth followed by regression. A
quadratic term of time is included to allow for this curvature.
Interaction of time with treatment group completes the fixed effects formula.
• Specifying the random effects (random=): a random intercept and slope (linear and
quadratic) is specified.
• Specifying whether the variance-covariance matrix is common over latent classes
(nwg=FALSE) or not (nwg=TRUE): to investigate how the specification of the random
effects influences inference about the fixed effects, models with both specifications
were run. It has to be mentioned that even with nwg=TRUE, only a proportional
coefficient is estimated to allow for a class-specific intensity of variance (see equation
1). There is no option to estimate a completely separate variance-covariance matrix per
class.
• Specifying the class-specific parameters of the linear model (mixture=): both time and
treatment and their interactions are allowed to be class-specific.
• Specifying the grouping structure (subject=): covariate identifying unique mouse
numbers.
• Specifying the number of latent classes (ng=): because data were sampled from two
latent classes, as part of model selection, models with 1 to 3 latent classes were run.
• Specifying the covariates having an influence on class membership (classmb=): not
used in this thesis, but potential use commented on in the discussion.
Other functions within the lcmm package that were used in this thesis:
• gridsearch: runs the estimation from several sets of random start values, generated from the
parameter estimates of a model with no latent classes. The number of sets of random start
values is specified by rep=, and the number of iterations after which the log likelihood is
evaluated to determine the best start values for the full optimization by maxiter=. rep= was
varied from 1 to 20 and maxiter= was kept constant at 50. This was used to evaluate classification and model
selection, to assess how easily one can obtain the correct model without any prior
information.
• Start: alternative to gridsearch, where start values are specified by the user. This was
used for simulated data to speed up the simulations when evaluating performance of
the (correctly identified) model.
• pprob: gives the posterior individual classification table including the most likely class
membership. This was used to evaluate the classification accuracy of the model by
comparing this to the class from which the mouse was sampled.
• WaldMult: provides a Wald test for joint significance of multiple coefficients. This was
used to jointly test both interactions of time and treatment within the longitudinal model
where a quadratic term of time was included.
• posfix: allows parameters to be fixed at a certain value instead of being estimated. This
was done to simplify the model by restricting the growth curve of latent class 2 to be
linear, i.e. fixing the quadratic terms of time to zero for this class.
• predictY: provides class-specific predicted values. By comparing the predicted growth
curve with the true growth curve based on equations 3a-c, this was used to assess
accuracy of the different models. With the option draws=TRUE, 95% confidence
limits are also calculated, which was used to assess precision for the different models.
3.4 Specifying non-latent class models
To compare power, false positive rate, bias and precision of the latent class analysis to analyses
ignoring latent classes, two additional models were fit. The first one was a standard linear
mixed model. This was done with the same function (hlme from package lcmm) as for the
latent class analysis, by setting the number of latent classes (ng) to 1. For the GEE approach,
the function ‘geeglm’ from the package ‘geepack’ was used with a first-order autoregressive
correlation structure (Højsgaard et al., 2006). The ‘lsmeans’ function from the ‘lsmeans’
package was used to calculate predicted values and their 95% CI for the GEE model (Russell,
2016).
4 RESULTS
4.1 Simulation study
4.1.1 Simulations
In figure 2, an example of the simulated growth curves is shown, for the two latent classes
(panels) and treatment groups (colours). Both the curves on the logarithmic scale (simulated
and used for modeling) and the back-transformed linear scale (as they would be recorded in
experimental data and subsequently log transformed for analysis) are shown.
Figure 2: Log transformed tumor growth curves as they were simulated (A), and
back-transformed to their measurement scale (B). Left panels show latent class 1, right panels
latent class 2. Colours represent simulated treatment groups, having similar growth curves in
latent class 1 but different growth curves in latent class 2.
4.1.2 Classification
The following parameters were varied to assess their influence on classification accuracy:
• the treatment difference: simulations were run for a small (25%, group A vs group B
from figure 2) and large (50%, group A vs group C from figure 2) difference in slope
for mice belonging to latent class 2
• the proportion of latent classes within a treatment group: simulations were run for a
proportion of latent class 1 of 0.2, 0.4, 0.6. For all simulations, this proportion was
equal between treatment groups
• the number of random start values: models were run with 1, 10 and 20 sets of random start
values, based on parameters from a model without latent classes
• the specification of random effects: both models where the variance-covariance structure was
specified to be common over latent classes and models with a class-specific
proportional coefficient (see equation 1) were fit
First, per simulation the proportion of mice that were correctly classified was evaluated, i.e.
whose most likely posterior class membership corresponded to the class they were sampled
from. The allocation of class names (1 or 2) is random, so from simulation to simulation the
same latent class can be encoded by a different class number. To be able to translate the
posterior class number to latent class 1 or class 2 from simulations, the average tumor volume
per class was calculated and the class with the lowest average tumor volume was recoded to
latent class 1.
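The recoding of class numbers and the resulting proportion of correctly classified mice can be sketched as follows. This is an illustrative Python sketch rather than the code used in the thesis: the function names and the six-mouse example are hypothetical, and the class average is taken here as the unweighted mean of per-mouse mean log volumes, which is an assumption.

```python
def recode_classes(posterior_class, mean_log_volume):
    """Map fitted class numbers so the class with the lowest mean volume becomes 1."""
    class_means = {}
    for cls in set(posterior_class):
        values = [v for c, v in zip(posterior_class, mean_log_volume) if c == cls]
        class_means[cls] = sum(values) / len(values)
    # Order fitted classes by average volume; the lowest becomes class 1
    ordered = sorted(class_means, key=class_means.get)
    mapping = {cls: rank + 1 for rank, cls in enumerate(ordered)}
    return [mapping[c] for c in posterior_class]

def proportion_correct(recoded, true_class):
    """Share of mice whose recoded class equals the class they were sampled from."""
    hits = sum(1 for r, t in zip(recoded, true_class) if r == t)
    return hits / len(true_class)

# Hypothetical output for six mice: the fitted model labelled the regressing class as 2
fitted = [2, 2, 1, 1, 1, 1]
mean_volume = [1.1, 0.8, 6.2, 5.9, 6.5, 2.0]
truth = [1, 1, 2, 2, 2, 1]
recoded = recode_classes(fitted, mean_volume)
print(recoded, proportion_correct(recoded, truth))
```

In this constructed example five of the six mice end up in the class they were sampled from after recoding; the sixth is a misclassification by the fitted model, not an artefact of the label switch.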
Figure 3: Distribution of the proportion of mice that were correctly classified for a model
with two latent classes, 15 mice per group, 20% of regressors and a slope difference of 25%
between treatment groups in the growers latent class. From left to right: with 1 set of start
values, 10 random start values and 20 random start values.
There was no influence of slope parameters or percentage of regressors on the number of
correctly classified mice. The specification of the random effects, common over latent
classes or allowed to be different, also had no influence on classification accuracy (table 1).
Figure 3 and table 1 show the effect of the number of random start values. A single set of
start values is clearly inferior (all mice are classified correctly in fewer than 60% of the
simulations), demonstrating the problem of local maxima. Beyond 10 start values there
is no further increase in classification accuracy.
Table 1: Percentage of simulations where the model classified all the mice in the correct latent
class, as a function of the proportion of mice belonging to latent class 1 (0.2, 0.4 or 0.6), the size of
the treatment difference (25% or 50% in slope in latent class 2), the number of random start values
(1, 10 or 20) and whether random effects were forced to be common over latent classes (M1)
or not (M2).
Model  Start values   Proportion 0.2    Proportion 0.4    Proportion 0.6
                      25%     50%       25%     50%       25%     50%
M1     1              49%     52%       43%     47%       50%     51%
M1     10             93%     97%       94%     97%       92%     95%
M1     20             95%     96%       93%     97%       93%     94%
M2     1              47%     49%       49%     50%       51%     49%
M2     10             95%     94%       94%     93%       91%     93%
M2     20             94%     94%       95%     92%       93%     93%
4.1.3 Model selection
The ability to distinguish the correct number of classes was also evaluated. Specifically, it was
evaluated whether, among models with 1 to 3 latent classes, the BIC identified the correct model
with 2 latent classes. The BIC is an information criterion based on the likelihood function and a
penalty for the number of parameters in the model; a lower BIC indicates the better model.
The model with 2 latent classes was therefore considered correctly selected when it had the
lowest BIC of the three models. Based on the results of the previous section, models were
estimated based on 10 random start values. For these simulations, the treatment difference in slope
of latent class 2 was fixed at 25% (group A vs group B in figure 2).
As can be seen from table 2, the BIC performs very well when comparing models with 1
and 2 classes, irrespective of how random effects are specified. The power of the BIC
to detect latent classes when they are indeed present in the data is thus high. However, the BIC
incorrectly favors 3 latent classes in a relatively large proportion of the cases if the covariance
structure is forced to be constant over latent classes. Indeed, the 3 latent class model attempts to
compensate for this by creating an extra class within latent class 1, resulting in a covariance
structure that is more similar to that of latent class 2 than when the former is correctly
identified as one group. Allowing the random effects to vary per latent class by a proportional
coefficient partly, but not completely, resolves the overestimation of the number of latent
classes by the BIC.
When mice were sampled from only one distribution (latent class 2), the BIC only incorrectly
identified 2 latent classes in a small minority of the simulations (<5%). The false-positive rate
of indicating the need for latent classes when in reality groups are composed of one
homogeneous population is thus low. Also, when simulations were performed with identical
covariance structure in the two latent classes, the problem of favoring 3 over 2 latent
classes disappears (<5%), confirming that the liberal behavior of the BIC in selecting the
number of classes described above results from misspecification of the random effects.
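The BIC comparison used in this section can be sketched as follows. The log-likelihoods and parameter counts below are hypothetical and only serve to show the mechanics; the use of the number of mice as the sample size in the penalty term is likewise an assumption.

```python
import math


def bic(log_likelihood, n_parameters, n_units):
    """Bayesian information criterion: smaller values indicate the preferred model."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_units)


def select_by_bic(candidates, n_units):
    """candidates maps a model label to (log-likelihood, number of parameters)."""
    scores = {name: bic(ll, k, n_units) for name, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fits of models with 1, 2 and 3 latent classes to 30 mice
candidates = {
    "1 class": (-120.0, 13),
    "2 classes": (-95.0, 20),
    "3 classes": (-92.0, 27),
}
best_model, scores = select_by_bic(candidates, n_units=30)
print(best_model, {name: round(value, 1) for name, value in scores.items()})
```

In this made-up example the third class improves the log-likelihood only slightly, so the penalty for its extra parameters outweighs the gain and the model with 2 classes has the lowest BIC.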
Table 2: Percentage of simulations where a model with 2 latent classes had a lower BIC
than a model with 1 or 3 latent classes, for different proportions of latent class 1 (0.2 to
0.6), for latent class models with common random effects (nwg=F) and including a
class-specific proportional coefficient (as defined in equation 1) for the random effects (nwg=T).

| Comparison | nwg | 0.2 | 0.4 | 0.6 |
|---|---|---|---|---|
| 2 vs no latent classes | F | 97% | 91% | 94% |
| 2 vs no latent classes | T | 100% | 99% | 100% |
| 2 vs 3 latent classes | F | 70% | 62% | 69% |
| 2 vs 3 latent classes | T | 90% | 80% | 88% |

4.1.4 False positive rate
To determine the false positive rate, mice were sampled under the null hypothesis of no
treatment difference. The slope of latent class 1 was kept constant as before, and the slope
parameter of latent class 2 was fixed as in equation 3b (group B from figure 2) for both
treatment groups to simulate the null hypothesis. The proportion of latent class 1 was varied as
in the previous sections. For these simulations, one set of start values was specified based on
the simulation parameters instead of random sets of start values. This was done to speed up the
simulations, but means that all results are conditional on a correctly identified model (i.e. the
global maximum of the likelihood function). However, as a global maximum is identified in the
majority of the cases when sufficient random start values are included (see above), these results
are also valid for real experimental data where there may be no prior information on the
parameters available.
A Wald test was used to test whether both time*treatment interaction terms were equal to zero
for a model with 2 latent classes (both with and without common covariance structure), a
standard linear mixed model ignoring latent classes and a GEE approach with robust standard
errors. The proportion of simulations where this test gave a significant result (p<0.05) represents
the type I error rate, i.e. the rate of false positive results, and is given in table 3. Apart from this
outcome, histograms of the p-values were also evaluated, as p-values should be uniformly
distributed under the null hypothesis (figure 4).
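A joint Wald test of two coefficients can be written out by hand. The sketch below assumes that the two interaction estimates and their 2x2 covariance matrix have already been extracted from a fitted model; the function name and the example numbers are hypothetical and do not come from the simulations.

```python
import math


def wald_test_2df(estimates, covariance):
    """Joint Wald test that two coefficients are both zero.

    estimates: (b1, b2); covariance: ((v11, v12), (v21, v22)).
    Returns the Wald statistic and its chi-square (2 df) p-value.
    """
    b1, b2 = estimates
    (v11, v12), (v21, v22) = covariance
    det = v11 * v22 - v12 * v21
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # b' V^{-1} b written out for the 2x2 case
    stat = (v22 * b1 * b1 - (v12 + v21) * b1 * b2 + v11 * b2 * b2) / det
    # The survival function of a chi-square distribution with 2 df is exp(-x / 2)
    return stat, math.exp(-stat / 2.0)


# Hypothetical estimates for the time*treatment and time^2*treatment terms
stat, p_value = wald_test_2df((0.05, -0.002), ((4e-4, -1e-5), (-1e-5, 1e-6)))
print(round(stat, 2), round(p_value, 4))
```

Testing both terms at once uses 2 degrees of freedom, which is why fixing the quadratic terms to zero where they are not needed (section 4.1.7) can make the test more efficient.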
It is clear that the standard linear mixed model and the GEE are too liberal, with the type I error
rate exceeding 5% (table 3). Indeed, p-values are not uniformly distributed under the null
hypothesis, but are right-skewed (figure 4). A potential explanation is that, although the
true proportion of latent classes is the same in the two treatment groups, the actual number
of mice belonging to each latent class differs between treatment groups because of the sampling
variability of a proportion, and this causes the excess of false positive results. Indeed, when only
one draw for the number of mice in the latent classes was performed per simulation, resulting in
exactly the same numbers for both treatment groups, the opposite is true: p-values become
left-skewed, and the tests are too conservative.
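How often two treatment groups end up with a different number of class 1 mice, even though the true proportion is identical, can be computed directly. The sketch below assumes independent binomial sampling per group; this is an assumption about the sampling scheme, and the function names are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def prob_equal_counts(n_per_group, p):
    """Probability that two groups of n mice contain the same number of class 1 mice."""
    return sum(binomial_pmf(k, n_per_group, p) ** 2 for k in range(n_per_group + 1))


for proportion in (0.2, 0.4, 0.6):
    equal = prob_equal_counts(15, proportion)
    print(proportion, "P(counts differ) =", round(1.0 - equal, 2))
```

With 15 mice per group, unequal counts are the rule rather than the exception, which is consistent with the explanation given above for the excess of false positives.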
The latent class linear mixed model where the covariance structure is (incorrectly) kept
constant over the latent classes is too conservative, with p-values having a left-skewed
distribution (table 3 and figure 4). This becomes clearer with an increasing proportion of mice
belonging to latent class 1. This can be explained by the greater individual variability in latent
class 1 compared to latent class 2 (see equations 2 and 3), which in this model is also incorrectly
influencing the variance of latent class 2, the class of interest.
The latent class linear mixed model where the covariance structure was allowed to differ between
the latent classes best approximates the uniform distribution of p-values and a correct type I
error rate, although for a high number of mice belonging to class 1 the
left-skewed distribution becomes apparent again (table 3 and figure 4). This can be explained
by the fact that even in this model a completely separate covariance structure per latent class
is not estimated, but only one class-specific proportional parameter. Nevertheless, the influence
on the type I error rate is minimal, even with a high proportion of mice belonging to latent class 1
(table 3). Performing the same multivariate Wald test with no heterogeneity, i.e. with 0% of mice
belonging to latent class 1, or with two latent classes with identical covariance structure, results
in a uniform distribution of p-values under the null hypothesis with a correct type I error rate.
Table 3: Type I error rate for a linear mixed model ignoring latent classes (M1), a GEE model
(M2), a model with two latent classes with common covariance structure (M3) and a model with
two latent classes including a proportional coefficient for random effects (M4), as a function of
the proportion of mice belonging to latent class 1 (0.2 to 0.6).

| Model | 0.2 | 0.4 | 0.6 |
|---|---|---|---|
| M1 | 8.1% | 9.1% | 8.1% |
| M2 | 13% | 8% | 8% |
| M3 | 3.5% | 1.3% | 1.1% |
| M4 | 5.5% | 5.2% | 4.2% |
4.1.5 Power
To assess the power of the different models, tumor growth curves were sampled under alternative
hypotheses. The slope of latent class 1 was kept constant across treatment groups, but the slope
of latent class 2 differed between treatment groups by 25% or 50% (group A vs group
B and group A vs group C from figure 2, respectively). Again, one set of start values was
specified based on the simulation parameters. A Wald test was used to test whether both
time*treatment interaction terms were equal to zero for a model with 2 latent classes, a standard
linear mixed model ignoring latent classes and a GEE model. The proportion of simulations
where this test gave a significant result (p<0.05) represents the power of the test (1 - type II error
rate).
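Both the type I error rate and the power are estimated as the proportion of simulations with p<0.05, under the null and the alternative hypothesis respectively. A minimal sketch, with a hypothetical helper name and made-up p-values, that also reports the Monte Carlo standard error of that proportion:

```python
import math


def rejection_rate(p_values, alpha=0.05):
    """Proportion of simulations with p < alpha and its Monte Carlo standard error."""
    n = len(p_values)
    rate = sum(1 for p in p_values if p < alpha) / n
    return rate, math.sqrt(rate * (1.0 - rate) / n)


# Made-up p-values from four simulated experiments
rate, standard_error = rejection_rate([0.01, 0.20, 0.04, 0.50])
print(rate, standard_error)
```

The standard error shrinks with the square root of the number of simulations, which is worth keeping in mind when comparing rejection rates that differ by only a few percentage points.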
[Figure 4: grid of p-value histograms; rows M1 to M4, columns 0.2, 0.4 and 0.6]
Figure 4: P-value histograms for simulations under the null hypothesis from a linear mixed
model ignoring latent classes (M1), a GEE model (M2), a model with two latent classes with
common covariance structure (M3) and including a proportional coefficient for random effects
(M4), as a function of the proportion of mice belonging to latent class 1 (0.2 to 0.6).
Table 4 shows that the power of a test in which the latent classes are included is superior
to the power of a standard linear mixed model in all cases. Moreover, it is clear that a standard
linear mixed model is severely underpowered to detect even relatively large treatment
differences in the presence of this heterogeneity at the evaluated sample size. Even with a low
number of mice belonging to latent class 1, its power is only 26% for a 25% slope difference
and 78% for a 50% slope difference. A GEE model outperforms
the standard linear mixed model, and with a large treatment difference it reaches sufficient
power. However, with a moderate treatment difference, power is inadequate. A dramatic
increase is observed with latent class analysis. Misspecification of the random effects as
common over latent classes has a negative influence on power, which becomes clearer as
the proportion of mice belonging to latent class 1 increases. Including the proportional
coefficient for the random effects greatly improves power. Latent class analysis with a correctly
specified covariance structure has a high power to detect the evaluated treatment differences in
the presence of heterogeneity. Only with a high proportion of mice belonging to latent class 1
and a small treatment difference does the power of the latent class analysis not reach 80%.
Table 4: Power of a linear mixed model ignoring latent classes (M1), a GEE model (M2), a
model with two latent classes with common covariance structure (M3) and including a
proportional coefficient for random effects (M4), as a function of the proportion of mice
belonging to latent class 1 (0.2 to 0.6) and treatment difference (25 or 50%).

| Model | 25%, 0.2 | 25%, 0.4 | 25%, 0.6 | 50%, 0.2 | 50%, 0.4 | 50%, 0.6 |
|---|---|---|---|---|---|---|
| M1 | 26% | 15% | 11% | 78% | 42% | 24% |
| M2 | 54% | 32% | 19% | 99% | 100% | 89% |
| M3 | 96% | 71% | 30% | 100% | 100% | 97% |
| M4 | 99% | 90% | 70% | 100% | 100% | 99% |
4.1.6 Bias and precision
To further assess model fit, both accuracy and precision were compared between the different
models. The parameter of interest for inference is the difference in slope in latent class 2 between
treatment groups, but as this is represented by two interactions (linear and quadratic term of
time*treatment) that are difficult to interpret independently from one another, bias and
precision were evaluated at the level of predicted values for this latent class. Simulations were
run, and per simulation the predicted values and the 95% CI for the latent class with a treatment
difference (latent class 2) were saved via the function predictY of lcmm for a model with two
latent classes (with and without common covariance structure), together with the overall predicted
values and 95% CI ignoring latent classes for a standard linear mixed model and the GEE model.
The predicted values and the limits of the CI were averaged over all simulations and
compared with the true growth curve of latent class 2, based on the parameters shown in
equation 3.
Unsurprisingly, when comparing the overall predicted values of a standard linear mixed model
ignoring latent classes with the actual growth curve of the latent class of interest, an
underestimation of the growth curve and an imprecise estimate are evident. Nevertheless, it is
interesting to see how large the influence is even if there is only a small contamination from the
other latent class. Already with only 20% of mice belonging to latent class 1, there is a
substantial difference, both in accuracy and in precision, compared to estimation with
the same model when there is no heterogeneity (figure 5). Naturally, this becomes even more
problematic as the proportion of mice belonging to latent class 1 increases.
[Figure 5 panels, left to right: 20% latent class 1; 0% latent class 1]
Figure 5: Mean predicted values (black solid line) and 95% confidence intervals (black dotted
line) from a linear mixed model ignoring latent classes for simulations with a 50% slope
difference and 20% (left) or 0% (right) of mice belonging to latent class 1 compared to the true
growth curve of latent class 2 (red dotted line)
The situation is even worse for the GEE approach. With no heterogeneity, this model behaves
exactly as the standard linear mixed model. However, even with only 20% of mice belonging to
latent class 1, this model performs extremely poorly. This can be explained by the missing
value pattern. Mice are euthanized when tumor volume exceeds a certain threshold, so the data
are missing at random (MAR).
Because of faster tumor growth in treatment group A compared to group B, there are more
missing values for mice in group A during the follow-up period. The missing values do not pose
serious problems when there is no heterogeneity, but become problematic when treatment
groups also contain mice from latent class 1 (figure 6). Because in latent class 1 there is
regression of the tumor, no missing values for those mice are present. So for treatment group
A, from a certain time point, data are missing in the latent class of interest but not for the other
latent class. It was confirmed that the poor behavior of the model was caused by the missing
values by running the model on simulations with the same level of heterogeneity but no missing
values (simulated tumor volumes >2000mm3 were kept as observations). In that case the model
performs similarly to the standard linear mixed model (figure 6).
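The missing value pattern described above can be mimicked with a small helper that drops every measurement from the first volume above the threshold onwards. The helper names and the two example profiles are hypothetical, and treating the first value above 2000mm3 as already missing follows the description of the simulations rather than any code from the thesis.

```python
def censor_after_threshold(volumes, threshold=2000.0):
    """Replace the first volume above the threshold, and everything after it, by None."""
    observed = []
    dropped_out = False
    for volume in volumes:
        if volume > threshold:
            dropped_out = True
        observed.append(None if dropped_out else volume)
    return observed


def count_missing(profiles, threshold=2000.0):
    """Total number of missing measurements over a list of tumor volume profiles."""
    return sum(censor_after_threshold(p, threshold).count(None) for p in profiles)


# Hypothetical profiles (mm3): one growing tumor and one regressing tumor
growing = [100, 300, 900, 2500, 6000]
regressing = [100, 150, 80, 20, 1]
print(censor_after_threshold(growing))
print(censor_after_threshold(regressing))
print(count_missing([growing, regressing]))
```

Only the growing profile loses observations, so at late time points the observed data are increasingly dominated by regressing tumors, which is the imbalance that the text identifies as problematic for the GEE.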
[Figure 6 panels: 20% latent class 1 with missing values; 20% latent class 1 without missing values; 0% latent class 1]
Figure 6: Mean predicted values (black solid line) and 95% confidence intervals (black dotted
line) from a GEE for simulations with a 50% slope difference and 20% (upper) or 0% (lower)
of mice belonging to latent class 1 with (left) and without (right) missing values compared to
the true growth curve of latent class 2 (red dotted line)
The latent class mixed model does not suffer from any bias, whether random effects are kept
constant over latent classes or not. Both specifications also result in much more precise
estimation than the standard linear mixed model. In line with previous results, allowing the
random effects to vary per latent class has a beneficial influence, i.e. it reduces the uncertainty
in predicted values. Although true for all tested conditions, this influence is minimal with only
20% of mice belonging to latent class 1, and becomes clearer with an increasing proportion of
mice belonging to that latent class (figure 7).
[Figure 7 panels: columns nwg=F and nwg=T; rows 20% and 60% latent class 1]
Figure 7: Mean predicted values (black solid line) and 95% confidence intervals (black dotted
line) from a latent class linear mixed model for simulations with a 50% slope
difference and 20% (upper) or 60% (lower) of mice belonging to latent class 1, with common
random effects (nwg=F, left) and including a class-specific proportional coefficient (as defined in
equation 1) for the random effects (nwg=T, right), compared to the true growth curve of latent
class 2 (red dotted line)
4.1.7 Model refinement
Tumor growth is exponential (and thus linear on a logarithmic scale) for latent class 2, but not
for latent class 1. In a standard linear mixed model, it is not possible to accommodate this
difference in growth pattern as there is no distinction between the two latent classes. All the mice
are fit either by only a linear term or by a linear and a quadratic term. Interest lies in latent class
2, but by only including a linear term, residual variance is increased because of a bad fit for latent
class 1. Therefore, including a quadratic term of time results in better model fit and better
power than when only a linear term is included. Hence, to test for a treatment difference in
slope for latent class 2, two interactions need to be tested simultaneously, as the individual
terms of a polynomial cannot be interpreted separately.
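That exponential growth is linear on the log scale, while a profile with a quadratic term is not, can be checked with second differences of equally spaced log-volumes. The measurement days and coefficients below are hypothetical and chosen only for illustration.

```python
import math


def second_differences(values):
    """Second differences of an equally spaced series; zero for a straight line."""
    return [values[i + 2] - 2.0 * values[i + 1] + values[i] for i in range(len(values) - 2)]


times = [0, 2, 4, 6, 8, 10]  # hypothetical measurement days

# Exponential growth V(t) = V0 * exp(r * t): the log-volume is log(V0) + r * t
log_exponential = [math.log(100.0 * math.exp(0.25 * t)) for t in times]

# A curved log-profile with a quadratic term (hypothetical coefficients)
log_curved = [4.6 + 0.10 * t - 0.005 * t * t for t in times]

print([round(d, 6) for d in second_differences(log_exponential)])
print([round(d, 6) for d in second_differences(log_curved)])
```

For the curved profile every second difference equals twice the quadratic coefficient times the squared spacing, so a quadratic term is exactly what captures the curvature that a linear term misses.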
In the latent class model, it is possible to accommodate this difference in growth pattern
between the latent classes. The hlme function to fit a latent class model has the option ‘posfix’,
which allows the user to fix certain parameters, so they will not be estimated. This option can
only be used in combination with user-specified start values for all parameters (argument ‘B’
of hlme). The parameters specified by ‘posfix’ are fixed to their specified
start value. In this case, the quadratic term for time and the interaction of this term with
treatment were set to zero only for latent class 2, reducing its growth curve to linear but
keeping a good fit for latent class 1. To test for a treatment difference, only one interaction
then needs to be tested, which is more efficient. Table 5 shows the increase in power that this
approach brings, for a model with latent classes and a proportional coefficient for the random
effects. The increase in power is minimal for most conditions, because the power of the test is
already almost maximal without this adaptation. However, for the conditions where power is
not yet maximal, it is clear that this approach has a substantial influence on power, e.g. for a
25% difference in slope and 60% of mice belonging to latent class 1 it increases the power
from an insufficient 70% to a satisfactory 83%. An additional advantage is that the effect of
interest is now represented by one parameter that is easy to interpret (the difference in slope
between treatments) and to report together with a measure of precision (standard error, 95% CI).
Table 5: Power of a model with two latent classes including a proportional coefficient for
random effects with (M2) and without (M1) fixing the quadratic terms of time to zero, as a
function of the proportion of regressors (20 to 60%) and treatment difference (25 or 50%).

| Model | 25%, 20% regressors | 25%, 40% regressors | 25%, 60% regressors | 50%, 20% regressors | 50%, 40% regressors | 50%, 60% regressors |
|---|---|---|---|---|---|---|
| M1 | 99% | 90% | 70% | 100% | 100% | 99% |
| M2 | 100% | 97% | 83% | 100% | 100% | 100% |
4.2 Experimental data
4.2.1 Data description
The data show tumor growth measurements of mice that were subcutaneously inoculated with
CT26, a colon cancer model. Mice were allocated at day 10 after inoculation to treatment
groups with the same average tumor size. One group of 8 mice was treated with anti-PD-1, a
clinically approved immunotherapeutic drug. A second group of 16 mice was treated with
anti-PD-1 + an iTeos compound. Tumors were measured three times per week for 17 days. Mice
were euthanized when tumor volumes exceeded 2000mm3, and tumors measuring less than
1mm3 were recorded as 1mm3. There were no missing measurements apart from those of mice
that were euthanized because of tumor size. The objective of the experiment was to investigate
whether the combination of anti-PD-1 and the iTeos compound was superior to anti-PD-1
monotherapy.
4.2.2 Visual inspection data
In both treatment groups, there are mice with tumors that regress and mice with tumors that
continue growing, suggesting latent heterogeneity (figure 8). In the combination treatment,
there seems to be a delay in tumor growth for the mice that do not regress. After log
transformation of the tumor volumes, tumor growth seems to be linear in the mice with tumors
that continue growing, whereas in the mice with regressing tumors, a curvature is present
(figure 9).
Figure 8: Observed individual tumor growth curves for both treatment groups.
Figure 9: Log transformed individual tumor growth curves for both treatment groups.
4.2.3 Model building
To assess the need for latent classes, models with no latent classes, two latent classes and
three latent classes were fit, using the same model as discussed in the simulation chapter. To
assess whether random effects could be kept constant across classes, models with and without a
class-specific coefficient for the random effects were fit for the two and three latent class models.
As in the simulation, a gridsearch with 10 random start values based on the model with no latent
classes was used to fit the models with latent classes. With the function ‘summarytable()’, an
overview of the models was generated. Based on the BIC, a model with 2 latent classes is
identified as the best model, with no need for a class-specific coefficient for the random effects
(table 6). With this model, latent classes represent biologically meaningful groups, i.e. mice whose
tumors are regressing and mice whose tumors keep on growing (figure 10). To further simplify
the model, considering the linear tumor growth in mice whose tumors do not regress (latent
class 1), a model was fit where the quadratic term for this latent class was fixed to zero. This
did not result in a worse fit (BIC=146) and was thus chosen as the final model.
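Table 6: Summary of models with different numbers of latent classes (G) and common
random effects structure (nwg=F) or class-specific coefficient (nwg=T).

| Model | G | nwg | npm | BIC | %class1 | %class2 | %class3 |
|---|---|---|---|---|---|---|---|
| m1 | 1 | / | 13 | 156 | 100 | | |
| m2 | 2 | F | 20 | 148 | 79 | 21 | |
| m3 | 2 | T | 21 | 151 | 21 | 79 | |
| m4 | 3 | F | 27 | 148 | 58 | 21 | 21 |
| m5 | 3 | T | 29 | 153 | 38 | 42 | 21 |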
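The selection among the five candidate models can be retraced from the values reported in table 6. At the reported precision m2 and m4 have the same BIC, so the sketch below breaks ties in favor of the model with fewer parameters; that tie-breaking rule is an assumption made here, not something stated in the text.

```python
# (label, number of classes G, nwg, number of parameters npm, BIC) as reported in table 6
models = [
    ("m1", 1, None, 13, 156),
    ("m2", 2, False, 20, 148),
    ("m3", 2, True, 21, 151),
    ("m4", 3, False, 27, 148),
    ("m5", 3, True, 29, 153),
]


def pick_model(rows):
    """Lowest BIC first; ties are broken by the smaller number of parameters (assumption)."""
    return min(rows, key=lambda row: (row[4], row[3]))[0]


print(pick_model(models))
```

With these values the rule returns the model with 2 latent classes and common random effects, in line with the choice described above.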
Figure 10: Classification of the longitudinal profile according to a latent class linear mixed
model with two latent classes (red=latent class 1, blue=latent class 2).
4.2.4 Comparison model with no latent classes to model with 2 latent classes
Model parameters for the model with no latent classes and with two latent classes are shown
in tables 7 and 8, respectively. A Wald test jointly testing the two interaction terms between time
and treatment gave a p-value of 0.09 for the model with no latent classes. A Wald test of
the interaction of time with treatment for latent class 1 in the model with two latent classes
resulted in a p-value of 0.0001. It can thus be concluded that the latent class model is much
more powerful to detect a treatment difference in this dataset than a standard linear mixed
model.
Table 7: Model parameters for a standard linear mixed model without latent classes.
Parameters in red represent a treatment difference.
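| Parameter | coef | Se | Wald | p-value |
|---|---|---|---|---|
| Intercept | 4.75 | 0.08 | 61 | <0.0001 |
| Time | 0.12 | 0.02 | 5.54 | <0.0001 |
| Time^2 | -0.005 | 0.001 | -3.75 | 0.0002 |
| Group | -0.05 | 0.13 | -0.36 | 0.72 |
| Time*Group | 0.02 | 0.04 | 0.45 | 0.65 |
| Time^2*Group | 0.003 | 0.002 | 1.40 | 0.16 |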
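Table 8: Model parameters for a latent class linear mixed model with two latent classes.
Parameters in red represent a treatment difference.

| Parameter | Class | coef | Se | Wald | p-value |
|---|---|---|---|---|---|
| Intercept | cl1 | 4.71 | 0.12 | 40.02 | <0.0001 |
| Intercept | cl2 | 4.61 | 0.31 | 15.07 | <0.0001 |
| Time | cl1 | 0.14 | 0.02 | 9.32 | <0.0001 |
| Time | cl2 | -0.01 | 0.09 | -0.12 | 0.90 |
| Time^2 | cl1 | 0* | | | |
| Time^2 | cl2 | -0.005 | 0.004 | -1.32 | 0.19 |
| Group | cl1 | 0.07 | 0.15 | 0.48 | 0.63 |
| Group | cl2 | 0.02 | 0.34 | 0.07 | 0.94 |
| Time*Group | cl1 | -0.07 | 0.02 | -3.81 | 0.0001 |
| Time*Group | cl2 | 0.16 | 0.11 | 1.61 | 0.11 |
| Time^2*Group | cl1 | 0* | | | |
| Time^2*Group | cl2 | -0.006 | 0.004 | -1.50 | 0.13 |

* Parameter fixed to zero (not estimated).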
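The p-values of the individual parameters in tables 7 and 8 follow from the reported Wald statistics if these are treated as standard normal under the null hypothesis. A short sketch of that conversion (the function name is hypothetical):

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a Wald statistic that is standard normal under the null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Wald statistics taken from tables 7 and 8
for label, z in [("Time*Group cl1", -3.81), ("Group (no latent classes)", -0.36), ("Time^2*Group", 1.40)]:
    print(label, round(two_sided_p(z), 4))
```

Rounded to the precision used in the tables, these reproduce the reported values of 0.0001, 0.72 and 0.16.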
Overall predicted growth curves for the model with no latent classes and class-specific tumor
growth curves for latent class 1 for the model with two latent classes along with their 95% CI
are shown in figure 11 and 12 respectively. For a standard linear mixed model, a large influence
of the few mice with regressing tumors on the predicted growth curve of the combination
treatment is evident (figure 11A). For the latent class model, a separate growth curve for mice
depending on tumor regression (latent class 2) or tumor growth (latent class 1) is estimated,
resulting in a growth curve that is representative for each class (figure 12A). The improved
precision in estimation explains the greater power of this approach compared to the linear
mixed model (figure 11 and 12B).
Figure 11: Predicted growth curves (red) superimposed on the observed individual growth
curves (black) (A) and predicted growth curves (solid lines) with their 95% confidence
intervals (dotted lines) (B) for a standard linear mixed model
Figure 12: Predicted growth curves (red) superimposed on the observed individual growth
curves (black) (A) and predicted growth curves (solid lines) with their 95% confidence
intervals (dotted lines) (B) for latent class 1 from a latent class linear mixed model
5 DISCUSSION
In this thesis, the performance of a standard linear mixed model, a robust GEE model and a
latent class linear mixed model was evaluated when latent heterogeneity is present. The
ability to detect a treatment difference in the longitudinal growth profile of one of the latent
classes was of interest.
The standard linear mixed model performed inadequately, with both a too liberal type I error
rate and low power. As it is thus more likely to find a treatment difference when it is not there
and to miss a treatment difference when it is actually present, standard linear mixed models
cannot be seen as robust in the presence of latent classes. The liberal nature under the null
hypothesis can be explained by the fact that although the true proportion was the same for both
treatment groups, because of sampling the actual proportion was not the same in all
simulations. When there was such a difference in proportion, the overall estimated growth
curve was falsely considered different between treatment groups, although both the
longitudinal profiles of the two latent classes and the proportion of latent classes were sampled
under the null hypothesis of no treatment difference. The lack of sufficient power can mainly
be explained by the increased variability compared to a homogenous population. It has to be
mentioned that the effect of heterogeneity is large, even with a low proportion of mice
belonging to another latent class and a large treatment difference. This can be explained by the
fact that the latent classes as evaluated in this thesis are very different from each other, the way
it is also often seen in experimental data.
The GEE model seemed to have the same problem with the type I error rate, but a higher power
than the standard linear mixed model. However, when evaluating the bias of this model, it was
clear that the GEE model was not able to cope with the missing value pattern in the presence of
heterogeneity. The data can be considered missing at random, as tumor volumes >2000mm3
were replaced by missing values, so the probability of missing responses is dependent on
previously measured responses (all measurements up to 2000mm3). Linear mixed models can
cope with data missing at random, and indeed there was no difference whether missing values
were present or not. GEE models can only cope with data missing completely at random (Nakai
et al., 2011). However, without heterogeneity, there was no influence of the missing value pattern
as described above. It is only in the presence of both heterogeneity and missing values that GEE
did not yield valid results. Therefore, GEE cannot be recommended for this kind of
data.
Latent class linear mixed models performed very well, leading to a great increase in power
compared to a standard linear mixed model and not suffering from an inflated type I error rate.
It is important to run the model with different sets of start values, or specify sensible start
values, in order to reach a global maximum of the likelihood. If this is ensured, the model is
able to correctly classify profiles in the vast majority of cases. This good convergence of a
relatively complex model for a relatively small sample size can be explained by the fact that
the latent classes are quite distinct from each other. The BIC is a reliable criterion to identify
the need for latent classes versus none: it favors the model with two latent classes over a
standard linear mixed model in over 90% of simulations with two latent classes, and in less than
5% of simulations with no latent classes. It is however somewhat liberal in the determination of
the number of latent classes. Therefore, care should be taken not to overestimate the number
of latent classes, which can be done by restricting the number of classes to biologically
meaningful and interpretable subgroups (Berlin et al., 2013). The need for latent classes and
determining the number of latent classes can also be tested with the BLRT, which has been
identified as superior compared to the BIC for model comparison. However, this test is
computationally demanding and time-consuming because of the extensive number of models
needed to be run for one dataset to get the different bootstrap estimates under different null
hypotheses, all with sufficient set of start values (Tein et al., 2013).
Within the latent class linear mixed model, it is important to allow for class-specific random
effects when needed. If not, this influences model selection with regard to the number of latent
classes: in an attempt to obtain latent classes with similar variance, the number of latent
classes will be overestimated. And because an incorrectly common variance
will over- or underestimate the true variance of one of the latent classes, type I and type II
error rates are also affected. A limitation of the model is that it only allows the variances of all
random effects to vary by the same proportion between latent classes. However, this
approximation works well, with no noteworthy influence on the different parameters
considered in this thesis to assess model performance.
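The limitation mentioned above can be made concrete: if the covariance matrix of the random effects in one class is a common matrix multiplied by a single class-specific factor, all variances change by the same proportion and the correlation between random effects cannot differ between classes. The sketch below assumes the factor enters as a squared coefficient w; this parameterization and the numbers are assumptions for illustration, as equation 1 is not repeated in this section.

```python
def scale_covariance(cov, w):
    """Multiply every element of a covariance matrix by w**2."""
    return [[w * w * value for value in row] for row in cov]


def correlation(cov):
    """Correlation implied by a 2x2 covariance matrix."""
    return cov[0][1] / (cov[0][0] * cov[1][1]) ** 0.5


# Hypothetical covariance of a random intercept and a random slope in the reference class
reference = [[4.0, 1.0], [1.0, 1.0]]
other_class = scale_covariance(reference, w=2.0)

print(other_class)
print(correlation(reference), correlation(other_class))
```

Both variances are multiplied by the same factor and the correlation is unchanged, so a class in which, for example, only the slope variance is larger cannot be represented exactly.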
It has to be mentioned that power, type I error rate, bias and precision of the latent class linear
mixed model were assessed conditional on a model with the correct number of latent classes
and with specified start values, which is not the same as estimation without any prior
information. However, because of the good convergence with a sufficient number of start
values and the good performance of the BIC in selecting the number of latent classes, results
would be very similar if, for each simulation, models with different numbers of classes were
fitted with several sets of start values and the model was selected based on the BIC.
This was not done because of the significant time gained by only fitting one model. Moreover,
because of the small sample size and clearly distinguishable subgroups, it will also be possible
to provide reasonable start values for real experimental data. Therefore, it is believed
that the results based on the simulations as they were performed are also valid for real
experimental data.
The good performance of the latent class linear mixed model in the simulation studies was
confirmed by its application to real experimental data. The model was able to identify
biologically meaningful subgroups and was more powerful than a standard linear mixed model
to detect a treatment difference.
In this thesis the latent class linear mixed effect model was used to test a difference in slope in
one of the latent classes. Only the longitudinal parameters of the model were thus of interest
for inference. However, the model can also be used to test a difference in proportion of latent
classes between treatment groups. A covariate explaining class membership in the logistic part
of the model can be easily included in the hlme function by specifying the classmb option
(Proust-Lima et al., 2017). Testing if a treatment would for instance increase the proportion of
mice with complete regression could be of great scientific interest. Also other variables
explaining class membership could be valuable. For instance, if potential biomarkers could be
evaluated this way for their association with a certain type of anti-tumor response, this could
lead to a better understanding of the mode of action of the treatment. This is of particular
interest for cancer immunotherapy, as it is still unknown what causes the heterogeneity in
response. However, in preclinical experiments as they are conducted routinely, with generally
only 15 mice per group, reliable conclusions from logistic regression are not possible (Peduzzi
et al., 1996). Increasing the sample size considerably in preclinical experiments is difficult for
practical reasons, because of the high number of compounds that need to be evaluated at this
stage. In clinical trials, sample sizes are much larger, and the use of latent class analysis to
answer these specific questions is very relevant in that setting.
In conclusion, this thesis demonstrated that standard linear mixed models do not perform well
to assess treatment differences in tumor growth rate in the presence of latent heterogeneity as
often observed in preclinical experiments, not even with a low level of heterogeneity and a
large treatment difference. GEE models should not be used in the presence of missing data
(unless missing completely at random), which are often present in this kind of experiment.
In contrast, latent class linear mixed models perform very well, and should be the method of
choice in case of latent heterogeneity.
6 REFERENCES
Bates D, Maechler M, Bolker B, Walker S. Fitting Linear Mixed-Effects Models Using lme4.
Journal of Statistical Software. 2015;67(1):1-48. doi:10.18637/jss.v067.i01.
Benaglia T, Chauveau D, Hunter D, Young D. mixtools: An R Package for Analyzing
Mixture Models. Journal of Statistical Software. 2009;32(6):1-29.
doi:10.18637/jss.v032.i06.
Berlin KS, Williams NA, Parra GR. An Introduction to Latent Variable Mixture Modeling
(Part 1): Overview and Cross-Sectional Latent Class and Latent Profile Analyses. Journal of
Pediatric Psychology. 2013;39(2):174-187. doi:10.1093/jpepsy/jst084.
Duncan TE, Duncan SC, Strycker LA. An Introduction to Latent Variable Growth Curve
Modeling: Concepts, Issues, and Application, Second Edition. 2013.
Farkona S, Diamandis EP, Blasutig IM. Cancer immunotherapy: the beginning of the end of
cancer? BMC Medicine. 2016;14:73. doi:10.1186/s12916-016-0623-5.
Højsgaard S, Halekoh U, Yan J. The R Package geepack for Generalized Estimating
Equations. Journal of Statistical Software. 2006;15(2):1-11.
Ibrahim JG, Molenberghs G. Missing data methods in longitudinal studies: a review. Test
(Madrid, Spain). 2009;18(1):1-43. doi:10.1007/s11749-009-0138-x.
Jones BL, Nagin DS, Roeder K. A SAS Procedure Based on Mixture Models for Estimating
Developmental Trajectories. Sociological Methods & Research. 2001;29(3):374-393.
doi:10.1177/0049124101029003005.
Jung T, Wickrama KAS. An Introduction to Latent Class Growth Analysis and Growth Mixture
Modeling. Social and Personality Psychology Compass. 2008;2(1):302-317.
doi:10.1111/j.1751-9004.2007.00054.x.
Komárek A, Verbeke G. A SAS Macro for Linear Mixed Models with Finite Normal Mixtures
as Random-Effect Distribution. 2012.
URL https://ibiostat.be/online-resources/online-resources/longitudinal.
Laajala TD, Corander J, Saarinen NM, et al. Improved Statistical Modeling of Tumor Growth
and Treatment Effect in Preclinical Animal Studies with Highly Heterogeneous Responses In
Vivo. Clinical Cancer Research. 2012;18(16):4385-4396.
doi:10.1158/1078-0432.ccr-11-3215.
Leisch F. FlexMix: A General Framework for Finite Mixture Models and Latent Class
Regression in R. Journal of Statistical Software. 2004;11(8):1-18.
doi:10.18637/jss.v011.i08.
Lenk PJ, DeSarbo WS. Bayesian inference for finite mixtures of generalized linear models with
random effects. Psychometrika. 2000;65(1):93-119. doi:10.1007/BF02294188.
Liu C, Cripe TP, Kim M-O. Statistical Issues in Longitudinal Data Analysis for Treatment
Efficacy Studies in the Biomedical Sciences. Molecular Therapy. 2010;18(9):1724-1730.
doi:10.1038/mt.2010.127.
Mooijaart A, van der Heijden PGM. The EM algorithm for latent class analysis with equality
constraints. Psychometrika. 1992;57(2):261-269. doi:10.1007/BF02294508.
Nakai M, Ke W. Review of the Methods for Handling Missing Data in Longitudinal Data
Analysis. Int. Journal of Math. Analysis. 2011; 1, 1 – 13.
Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number
of events per variable in logistic regression analysis. Journal of Clinical Epidemiology.
1996;49(12):1373-1379. doi:10.1016/S0895-4356(96)00236-3.
Proust-Lima C, Philipps V, Liquet B. Estimation of Extended Mixed Models Using
Latent Classes and Latent Processes: The R Package lcmm. Journal of Statistical Software.
2017;78(2):1-56. doi:10.18637/jss.v078.i02.
Ranganathan A. The Levenberg-Marquardt Algorithm. Honda Research Institute USA. 2004.
URL http://www.ananth.in/Notes_files/lmtut.pdf.
Russell V. Least-Squares Means: The R Package lsmeans. Journal of Statistical
Software. 2016;69(1):1-33. doi:10.18637/jss.v069.i01.
Tein J-Y, Coxe S, Cham H. Statistical Power to Detect the Correct Number of Classes in Latent
Profile Analysis. Structural Equation Modeling: A Multidisciplinary Journal. 2013;20(4):640-
657. doi:10.1080/10705511.2013.824781.
White A, Murphy T. BayesLCA: An R Package for Bayesian Latent Class Analysis. Journal of
Statistical Software. 2014;61(13):1-28. doi:10.18637/jss.v061.i13.