Thesis Sofie Denies - Universiteit Gent

Faculty of Sciences

Latent class linear mixed model to analyze preclinical tumor growth experiments with heterogeneous treatment responses

Sofie Denies

Master dissertation submitted to

obtain the degree of

Master of Statistical Data Analysis

Promoter: Prof. dr. Stijn Vansteelandt

Department of Applied Mathematics, Computer Science and Statistics

Academic year 2016 - 2017

The author and the promoter give permission to consult this master dissertation and to

copy it or parts of it for personal use. Each other use falls under the restrictions of the

copyright, in particular concerning the obligation to mention explicitly the source when

using results of this master dissertation.

FOREWORD

This manuscript is divided into two parts. In the first part, simulations were performed to evaluate different models for analysing data from longitudinal tumor measurements. In the second part, these models are applied to experimental data that were obtained during the student's employment at iTeos Therapeutics. The iTeos proprietary compounds used in this experiment are not identified, so there is no need for confidentiality.

This work would not have been possible without the contribution of the people involved.

Firstly, I would like to thank my promoter, Professor Dr. Stijn Vansteelandt, for giving me the opportunity to complete this thesis under his supervision, as well as for his valuable suggestions and contributions to this thesis. I am also grateful to the management team at iTeos Therapeutics for allowing me to use data gathered in their company. Their positive attitude towards my personal development, specifically in the context of statistical analysis, is also greatly appreciated.

Table of Contents

1 ABSTRACT

2 INTRODUCTION
2.1 Data
2.2 Latent class linear mixed models
2.3 Software

3 MATERIAL AND METHODS
3.1 Data
3.2 Research questions
3.3 Specifying a latent class linear mixed model
3.4 Specifying non-latent class models

4 RESULTS
4.1 Simulation study
4.1.1 Simulations
4.1.2 Classification
4.1.3 Model selection
4.1.4 False positive rate
4.1.5 Power
4.1.6 Bias and precision
4.1.7 Model refinement
4.2 Experimental data
4.2.1 Data description
4.2.2 Visual inspection data
4.2.3 Model building
4.2.4 Comparison model with no latent classes to model with 2 latent classes

5 DISCUSSION

6 References

1 ABSTRACT

This thesis discusses the analysis of longitudinal tumor measurements obtained from preclinical experiments on cancer immunotherapy. Because the response to immunotherapy is not homogeneous, the data do not meet the assumptions of a classical linear mixed model. One way to deal with this unexplained heterogeneity is latent class analysis, which is the subject of this thesis. An extension of the linear mixed model by latent class analysis is compared with the standard linear mixed model as well as a generalized estimating equation (GEE) approach in the context of longitudinal measurements with heterogeneous treatment responses.

In a first part, simulations were performed to assess performance of the latent class linear mixed

model and compare it with the other techniques. For this, heterogeneous groups were composed

by mixing longitudinal profiles from two homogenous populations representing the latent

classes. Longitudinal parameters for the different populations as well as the mixing proportion

were specified. Performance of the latent class linear mixed model was assessed by

determining the proportion of profiles that were correctly classified in the latent class they were

sampled from and the ability to select the correct number of latent classes. Additionally, power

and type I error rate to detect a treatment difference, as well as bias and precision of predicted

tumor growth curves were assessed and compared with the standard linear mixed model and

the GEE model. In a second part, experimental data of a preclinical experiment conducted at

iTeos Therapeutics was analyzed by a standard linear mixed model and the latent class linear

mixed model. The ability to detect biologically meaningful subgroups was assessed as well as

the power to detect a relevant treatment difference.

From the simulations, it was clear that the latent class linear mixed model is able to correctly classify all the longitudinal profiles in more than 90% of the simulations, as long as the model is run with a sufficient number of sets of starting values (10) to ensure that a global maximum of the likelihood is reached. It is important to allow the random effect structure to differ per latent class if needed, as this influences the identification of the correct number of latent classes, power, type I error rate, bias and precision. With a correctly specified model, the latent class linear mixed model performed very well, with a correct type I error rate and sufficient power in all conditions evaluated. In contrast, the standard linear mixed model behaved poorly, with an inflated type I error rate and unsatisfactory power, even with a low level of heterogeneity and a relatively large treatment difference. The GEE model was considered inadequate for this kind of data: the missing-value pattern present in these data, which is missing at random because mice are sacrificed once the tumor reaches a certain volume, resulted in severe bias that invalidates the results obtained with that analysis. The application of the model to experimental data confirmed the ability to discriminate meaningful subgroups and showed a higher power to detect a treatment difference compared to the standard linear mixed model.

2 INTRODUCTION

2.1 Data

This thesis discusses the analysis of tumor growth curves obtained from preclinical

experiments on cancer immunotherapy. In this field of oncology, drugs are developed that are

not targeting the cancer cells directly, but instead stimulate the immune system to fight cancer.

Cancer immunotherapy is now considered to be one of the greatest promises for finding a cure

against cancer, mostly because of the impressive responses seen with the immune checkpoint

inhibitors. However, the mechanism of action of immunotherapy is complex, leading to a high degree of heterogeneity in how well patients respond to therapy, much more than for conventional cytotoxic treatments like chemotherapy. Which factors determine this heterogeneity is of major interest and thus the subject of intense research, but they are as yet poorly understood (Farkona et al., 2016).

Interestingly, and even more surprisingly considering the identical genetic background and

highly standardized procedures, even in preclinical experiments the response to

immunotherapy is not homogenous. Responses to the same treatment protocol can range from

no delay in tumor growth (non-responders) over a delay in tumor growth (partial responders)

to full regression of the tumor (complete responders). Because of the longitudinal nature of the

measurements, an elegant technique to analyze tumor growth curves is the linear mixed model

framework (Liu et al., 2010). Because tumor growth is exponential, tumor volumes are log

transformed before analysis. The inclusion of random effects accommodates a degree of

heterogeneity. It allows that the true parameter (baseline tumor volume for random intercept,

tumor growth rate for random slope) within one population determined by the fixed effects

varies from individual to individual. However, it assumes that this individual variability is

described by a single distribution, most typically the normal distribution. This assumption is

often not met in data from immunotherapy experiments, where clear subpopulations are

present, based on unknown and thus unobserved variables. One way to deal with this

unexplained heterogeneity is latent class analysis (Laajala et al., 2012). The subject of this

thesis is the evaluation of the extension of linear mixed model by latent class analysis for

analyzing tumor growth curves in response to immunotherapy.

2.2 Latent class linear mixed models

In this model framework, subjects are probabilistically assigned to subpopulations (class membership) based on the observed longitudinal data. The assignment is probabilistic because the latent nature of the class variable implies uncertainty: class membership is not directly observed but estimated. Two assumptions are made in this context. First, the latent classes are mutually exclusive, which means that one individual belongs to one class only. Because class membership is latent and thus unobserved, there is no certainty about which class this is, which is why probabilities are calculated for each class per individual. Second, the latent classes are exhaustive: the probabilities of belonging to the estimated classes must sum to one (Duncan et al. 2013). Within each class, longitudinal profiles are modeled by a standard linear mixed model. Individual contributions to the likelihood are thus the weighted sum of class-specific densities (equation 1, Proust-Lima et al. 2017).

$$ f_i(y_i) = \sum_{g=1}^{G} \pi_g \, f(y_i \mid c_i = g) \qquad (1) $$

with:

$$ 0 \le \pi_g \le 1 \quad \text{and} \quad \sum_{g=1}^{G} \pi_g = 1 $$

and

$$ Y_{ij} \mid_{c_i = g} = X1_{ij}\,\beta + X2_{ij}\,\upsilon_g + Z_{ij}\,u_{ig} + \epsilon_{ij} $$

$\pi_g$ = class-specific probabilities

$c_i$ = discrete latent variable that equals $g$ if subject $i$ belongs to latent class $g$

$X1$ = vector of covariates that are associated with the fixed effects $\beta$ that are common over classes

$X2$ = vector of covariates that are associated with class-specific fixed effects $\upsilon_g$

$Z$ = vector of covariates associated with random effects

$u_{ig}$ = class-specific random effects, $u_i \mid c_i = g \sim N(0, \omega_g^2 B)$, with $B$ being an unspecified variance-covariance matrix and $\omega_g$ a proportional coefficient allowing class-specific intensity of individual variability

$\epsilon_{ij}$ = measurement error, $\epsilon_{ij} \sim N(0, \sigma_\epsilon^2)$

For parameter estimation, the EM algorithm is the most used optimization method of latent

class analysis. Based on initial parameter values, in the expectation step the expected value for

the unobserved latent variable, conditional on the observed data and initial parameter values,

is calculated for each subject. In the next step (maximization), this expected value is treated as

a measured covariate and parameters are re-estimated based on maximizing the complete data

likelihood. In a next iteration, the expected value for the latent variable is again calculated for

each subject based on these updated parameters and the algorithm repeats alternating the

expectation and maximization steps until convergence (Mooijaart et al. 1992). Nevertheless, it

is also possible to directly maximize the incomplete data likelihood, with Newton-Raphson

like algorithms (Proust-Lima et al. 2017). An example is the Marquardt algorithm, a blend of

gradient descent and Gauss-Newton iterations. Gradient descent quickly approaches the

solution from a distance, but convergence is very slow close to the solution. Gauss-Newton is

the opposite, with fast convergence close to the solution, but its efficiency is heavily dependent

on the accuracy of the initial guess. The Marquardt algorithm combines the strengths of both,

acting more like a gradient descent when parameters are far from their optimal value, and more

like Gauss-Newton when they are close to the optimal value (Ranganathan, 2004). Apart from

likelihood based methods, also Bayesian estimation processes for latent class analysis are

available for longitudinal data (Lenk et al., 2000).
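To make the alternation of expectation and maximization steps concrete, the following minimal sketch runs EM on a deliberately simpler problem: a two-class univariate normal mixture rather than a latent class linear mixed model. The function names, the toy data and the starting values are illustrative assumptions; this is not the estimation routine used by lcmm, which maximizes the likelihood directly with a modified Marquardt algorithm.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, start, max_iter=500, tol=1e-8):
    """EM for a two-class normal mixture (toy stand-in for a latent class model).

    start -- (pi1, mean1, mean2, sd1, sd2), hypothetical starting values
    Returns the final parameter tuple and the log-likelihood after each E-step.
    """
    pi1, m1, m2, s1, s2 = start
    n = len(data)
    trace = []
    for _ in range(max_iter):
        # E-step: posterior probability of class 1 given the current parameters
        dens1 = [pi1 * normal_pdf(x, m1, s1) for x in data]
        dens2 = [(1.0 - pi1) * normal_pdf(x, m2, s2) for x in data]
        w = [d1 / (d1 + d2) for d1, d2 in zip(dens1, dens2)]
        trace.append(sum(math.log(d1 + d2) for d1, d2 in zip(dens1, dens2)))
        if len(trace) > 1 and abs(trace[-1] - trace[-2]) < tol:
            break
        # M-step: re-estimate parameters, treating the posteriors as known weights
        n1 = sum(w)
        n2 = n - n1
        pi1 = n1 / n
        m1 = sum(wi * x for wi, x in zip(w, data)) / n1
        m2 = sum((1.0 - wi) * x for wi, x in zip(w, data)) / n2
        # A small floor keeps a standard deviation from collapsing to zero
        s1 = max(math.sqrt(sum(wi * (x - m1) ** 2 for wi, x in zip(w, data)) / n1), 1e-6)
        s2 = max(math.sqrt(sum((1.0 - wi) * (x - m2) ** 2 for wi, x in zip(w, data)) / n2), 1e-6)
    return (pi1, m1, m2, s1, s2), trace


if __name__ == "__main__":
    random.seed(1)
    toy = [random.gauss(0.0, 1.0) for _ in range(40)] + [random.gauss(5.0, 1.0) for _ in range(60)]
    estimates, loglik_trace = em_two_gaussians(toy, (0.5, -1.0, 6.0, 1.0, 1.0))
    print(estimates, len(loglik_trace))
```

The log-likelihood trace should not decrease from one iteration to the next, which is the property that makes EM stable but also lets it settle in a local maximum when the starting values are poor.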

Two specific issues are related to the estimation of latent class problems. The first is that the

likelihood often suffers from local maxima. It is therefore imperative to run the algorithms with

different start values, ensuring a global maximum is reached (Jung et al., 2008). A second

problem is the selection of number of classes. The most recommended approach is to fit a

model with at least one more and one less than the expected number of latent classes, based on

the data or theoretical expectations. The correct number of classes is determined by model

selection, either based on information criteria or on likelihood-based tests. From the

information criteria, the Bayesian information criterion (BIC) performs best, and for

likelihood-based tests, the bootstrap likelihood ratio test (BLRT). The BLRT outperforms the

BIC, but is computationally demanding (Tein et al., 2013).

2.3 Software

Several software options are available to run latent class mixed models. Mplus is a statistical

software package with a special emphasis on latent class analysis, including many applications

within linear mixed models (Jung et al., 2008). Within basic SAS, no functions for longitudinal

latent class analysis are available. A freely available extension proc Traj allows latent class

analysis with longitudinal data, but it assumes independence of the repeated measurements

within individuals per latent class, which only rarely holds (Jones et al., 2001). Macros that do

not have this restriction by allowing random effects within the latent classes have been

developed (Komarek et al., 2002). Within R, several packages for longitudinal latent class

analysis are available (Leisch, 2004; White et al., 2014, Benaglia et al., 2009; Proust-Lima et

al., 2017). For this thesis, lcmm was chosen, a package specifically designed to provide

extensions to the linear mixed model, including latent class analysis (Proust-Lima et al., 2017).

Compared to other R packages, advantages include the user-friendliness to specify the model

components (fixed, random, class-specific covariates for the longitudinal model and covariates

associated with latent class membership) and many built-in post-fit functions. Lcmm is based

on maximum likelihood theory, and uses a modified Marquardt algorithm for optimization.

3 MATERIAL AND METHODS

3.1 Data

This thesis focuses on latent classes as most frequently observed during immunotherapy

experiments, i.e. mice with complete regression of the tumor and mice with exponential

(possibly delayed) tumor growth. For a large part, analyses were performed on simulated data.

Each simulated experiment has two treatment groups, each composed of mice belonging to

these two latent classes. A treatment effect is represented by a difference in longitudinal profile

in one of the two latent classes.

Some important characteristics of the data:

• Simulations and analyses are done on logarithmically transformed tumor volumes as

tumor growth is exponential in nature.

• Time is recoded to be zero at the start of treatment, which is generally not the same as

time of tumor inoculation. As part of experimental procedures, mice are allocated at

that time to treatment groups with on average the same tumor volume per group.

Although strictly not a randomization, as allocation is completely determined by

baseline tumor volume, this ensures completely balanced groups for tumor volume at

start of treatment. Treatment differences are thus always represented in a different

growth rate, while the main effect of treatment is not of interest.

• Mice are sacrificed when tumor volume exceeds 2000 mm3. Additionally, tumor volumes below 1 mm3 cannot be accurately measured. Therefore, all simulated data >log(2000) were treated as missing values and all simulated data <log(1) were fixed at log(1). All mice are followed from the start of treatment until tumor volume reaches 2000 mm3. Because the probability of a missing response thus does not depend on the unobserved response itself, given the previously observed responses, this can be considered a missing at random (MAR) pattern (Ibrahim and Molenberghs, 2009).

• Sample size is based on what is practically feasible within preclinical drug development

research: 15 mice per group, with twelve measurement occasions.

• The number of simulations was set at 2000.
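The two measurement limits described in the list above can be sketched as follows. This is a minimal illustration in Python, not the code used for the thesis (which was written in R). The function name `apply_measurement_limits` and the use of `None` for a missing value are assumptions made for the example.

```python
import math

UPPER = math.log(2000.0)  # sacrifice threshold on the log scale (2000 mm3)
LOWER = math.log(1.0)     # lower measurement limit on the log scale (1 mm3), equal to 0.0


def apply_measurement_limits(log_volumes):
    """Return a copy of one simulated profile with both limits applied.

    Values above log(2000) become None (missing);
    values below log(1) are fixed at log(1).
    """
    limited = []
    for value in log_volumes:
        if value > UPPER:
            limited.append(None)
        elif value < LOWER:
            limited.append(LOWER)
        else:
            limited.append(value)
    return limited
```

For example, `apply_measurement_limits([-0.5, 4.35, 7.0, 8.1])` keeps the two middle values, raises the first to log(1) and marks the last as missing.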

Heterogeneous groups were created in the following way:

• Homogeneous longitudinal profiles were simulated, separately per latent class and per treatment group. This was done with the simulate function of the lme4 package (Bates et al., 2015).

✓ One latent class is described by initial tumor growth, followed by regression of the tumor. This curvature is modelled by a quadratic term of time. The parameters used to describe this class can be found in equation 2 and example tumor growth curves in figure 2. The longitudinal profile of this class is not influenced by treatment. In the rest of this thesis, this latent class will be referred to as 'latent class 1'.

$$ Y_{ij} = (4.35 + b_0) + (1.88 + b_1)\,\mathrm{Time}_{ij} - (6.64 + b_2)\,\mathrm{Time}_{ij}^2 + \epsilon_{ij} \qquad (2) $$

$$ \mathbf{b} = (b_0, b_1, b_2)^T \sim N(\mathbf{0}, \mathbf{D}) \quad \text{and} \quad \mathbf{D} = \begin{pmatrix} 0.016 & -0.10 & 0.028 \\ -0.10 & 5.34 & -5.48 \\ 0.028 & -5.48 & 8.37 \end{pmatrix} $$

$$ \epsilon \sim N(0, 0.066) $$

✓ The other latent class is described by exponential tumor growth (linear for log transformed values). Therefore, the quadratic term of time is set to zero for this class. The parameters used to describe this class can be found in equation 3a and example tumor growth curves in figure 2. To simulate a treatment effect, the linear term of time was varied per treatment group, i.e. decreased by 25% (equation 3b) and 50% (equation 3c). In the rest of this thesis, this latent class will be referred to as 'latent class 2'.

$$ Y_{ij} = (4.35 + b_0) + (6.63 + b_1)\,\mathrm{Time}_{ij} - (0 + b_2)\,\mathrm{Time}_{ij}^2 + \epsilon_{ij} \qquad (3a) $$

$$ Y_{ij} = (4.35 + b_0) + (5 + b_1)\,\mathrm{Time}_{ij} - (0 + b_2)\,\mathrm{Time}_{ij}^2 + \epsilon_{ij} \qquad (3b) $$

$$ Y_{ij} = (4.35 + b_0) + (3.3 + b_1)\,\mathrm{Time}_{ij} - (0 + b_2)\,\mathrm{Time}_{ij}^2 + \epsilon_{ij} \qquad (3c) $$

$$ \mathbf{b} = (b_0, b_1, b_2)^T \sim N(\mathbf{0}, \mathbf{D}) \quad \text{and} \quad \mathbf{D} = \begin{pmatrix} 0.008 & -0.05 & 0.014 \\ -0.05 & 2.64 & -2.71 \\ 0.014 & -2.71 & 2.98 \end{pmatrix} $$

$$ \epsilon \sim N(0, 0.033) $$

• The proportion of the two latent classes was kept constant across treatment groups and was set to 0.2, 0.4 or 0.6 for latent class 1 (p1; p2 = 1 - p1) in different simulation runs. The actual number of longitudinal profiles to be sampled from either latent class 1 or latent class 2 per simulation was determined by binomial sampling. For this, the R base function 'sample' was used to get 15 draws of either 'class 1' or 'class 2' with p1 and p2 specified. An example of one simulation round, based on a p1 of 0.2 and a treatment difference of 25% in the slope of class 2, is shown in figure 1.
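The binomial sampling of class sizes can be sketched as below. The thesis used the R base function 'sample'; this Python analogue with `random.choices` is only an illustration, and the function name `draw_class_counts` is hypothetical.

```python
import random


def draw_class_counts(p1, n_mice=15, rng=random):
    """Draw a class label for each mouse in one treatment group and count the labels.

    p1     -- probability of belonging to latent class 1 (p2 = 1 - p1)
    n_mice -- group size, 15 in the simulations described above
    Returns (n1, n2), the number of profiles to take from each latent class.
    """
    labels = rng.choices(["class 1", "class 2"], weights=[p1, 1.0 - p1], k=n_mice)
    n1 = labels.count("class 1")
    return n1, n_mice - n1
```

Calling the function once per treatment group gives group-specific counts around the same expected proportion, as in the example of figure 1.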

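[Figure 1: diagram with columns for Treatment A and Treatment B, each split into latent class 1 and latent class 2, and rows for steps (1) to (3). In the example shown, step (2) draws 4 and 11 profiles for treatment A and 1 and 14 profiles for treatment B.]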

Figure 1: Schedule showing the creation of heterogeneous treatment groups by simulation. Per

treatment and latent class, homogenous longitudinal profiles (L1 and L2) were simulated (1).

Based on a fixed proportion of each latent class (here 0.2 for class 1 and 0.8 for class 2) equal

between treatment groups, and fixed sample size per treatment group (n=15), the number of

profiles to be taken from each latent class (n1 and n2) is simulated per treatment group (2).

Treatment groups are then composed by taking n1 profiles of L1 and n2 profiles of L2 (3).

In addition to the simulated datasets, also experimental data with observed heterogeneity is

used. Data was collected at iTeos Therapeutics and is composed of mice treated with a

reference immunotherapy drug (aPD-1) with and without an iTeos compound. More detail

about this data can be found in the relevant chapter.

3.2 Research questions

In this thesis, the research question is limited to evaluating the performance of latent classes

when there is a treatment difference in the longitudinal profile of one of the latent classes. A

treatment effect manifesting in a difference in proportion of mice belonging to a certain class

is another potential application of latent class analysis that is not considered in this thesis but

reviewed in the discussion.

Performance of the latent class analysis was assessed by:

• The number of mice that were correctly classified, i.e. in the latent class they were

sampled from.

• The ability to select the model with the correct number of latent classes (i.e. 2) based

on model selection criterion BIC.

• The power of the model to detect the relevant treatment difference. This was compared with the power of a linear mixed model ignoring latent classes as well as with a generalized estimating equation (GEE) approach using robust standard errors.

• The false positive rate when simulated under the null hypothesis of no treatment

difference in the longitudinal profile of any of the latent classes. Again this was

compared with the two other models as for power.

• The bias in predicted growth curves in latent class 2 (the latent class simulated to have

a treatment difference) along with its precision. Also here a comparison with a standard

linear mixed model and the GEE approach was made.

• The ability to identify biologically relevant latent classes and treatment differences in

experimental data and compare with a standard linear mixed model.

3.3 Specifying a latent class linear mixed model

Within the lcmm package, the hlme function is used to fit a latent class mixed model. A call to the function consists of:

• Specifying the fixed effects model formula: although tumor growth is exponential in nature, in latent class 1 there is initially tumor growth followed by regression. To accommodate this curvature, a quadratic term of time is included. Interaction of time with treatment group completes the fixed effects formula.

• Specifying the random effects (random=): a random intercept and slope (linear and

quadratic) is specified.

• Specifying whether the variance-covariance matrix is common over latent classes

(nwg=FALSE) or not (nwg=TRUE): to investigate how the specification of the random

effects influences inference about the fixed effects, models with both specifications

were run. It has to be mentioned that even with nwg=TRUE, only a proportional

coefficient is estimated to allow for a class-specific intensity of variance (see equation

1). There is no option to estimate a completely separate variance-covariance matrix per

class.

• Specifying the class-specific parameters of the linear model (mixture=): both time and

treatment and their interactions are allowed to be class-specific.

• Specifying the grouping structure (subject=): covariate identifying unique mouse

numbers.

• Specifying the number of latent classes (ng=): because data were sampled from two

latent classes, as part of model selection, models with 1 to 3 latent classes were run.

• Specifying the covariates having an influence on class membership (classmb=): not

used in this thesis, but potential use commented on in the discussion.

Other functions within the lcmm package that were used in this thesis:

• gridsearch: runs the estimation from several sets of random start values, which are generated from the parameter estimates of a model with no latent classes. The number of sets of random start values is specified by rep=, and maxiter= gives the number of iterations after which the log likelihood is evaluated to select the best start values for the full optimization. rep= was varied from 1 to 20 and maxiter= was kept constant at 50. This was used to evaluate classification and model selection, to assess how easily one can obtain the correct model without any prior information.

• Start: alternative to gridsearch, where start values are specified by the user. This was

used for simulated data to speed up the simulations when evaluating performance of

the (correctly identified) model.

• Pprob: gives the posterior individual classification table including the most likely class

membership. This was used to evaluate the classification accuracy of the model by

comparing this to the class where the mouse was sampled from.

• WaldMult: provides a Wald test for the joint significance of multiple coefficients. This was used to jointly test both interactions of time and treatment within the longitudinal model where a quadratic term of time was included.

• Posfix: allows parameters to be fixed at a certain value instead of being estimated. This was done to simplify the model by restricting the growth curve of latent class 2 to be linear, i.e. fixing the quadratic terms of time to zero for this class.

• predictY: provides class-specific predicted values. By comparing the predicted growth curve with the true growth curve based on equations 3a-c, this was used to assess the accuracy of the different models. With the option draws=TRUE, 95% confidence limits are also calculated, which was used to assess precision for the different models.
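As an illustration of the joint Wald test mentioned in the list above, the sketch below computes the Wald statistic for two coefficients (for instance the two time-by-treatment interactions) and its p-value from a chi-square distribution with 2 degrees of freedom. It shows the underlying statistic only and is not the implementation of WaldMult; the function name and any numbers passed to it are hypothetical.

```python
import math


def wald_test_two_coefficients(estimates, covariance):
    """Joint Wald test of H0: both coefficients are zero.

    estimates  -- (b1, b2), the two estimated coefficients
    covariance -- ((v11, v12), (v21, v22)), their estimated covariance matrix
    Returns (W, p_value), with W referred to a chi-square distribution with 2 df.
    """
    b1, b2 = estimates
    (v11, v12), (v21, v22) = covariance
    det = v11 * v22 - v12 * v21
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # W = b' V^{-1} b, written out with the closed-form inverse of a 2x2 matrix
    w = (v22 * b1 * b1 - (v12 + v21) * b1 * b2 + v11 * b2 * b2) / det
    # For 2 degrees of freedom the chi-square survival function is exp(-w / 2)
    p_value = math.exp(-w / 2.0)
    return w, p_value
```

Testing both interactions jointly avoids the multiplicity that would arise from testing the linear and quadratic interaction separately.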

3.4 Specifying non-latent class models

To compare the power, false positive rate, bias and precision of the latent class analysis with analyses ignoring latent classes, two additional models were fit. The first was a standard linear mixed model. This was fit with the same function (hlme from package lcmm) as for the latent class analysis, by setting the number of latent classes (ng) to 1. For the GEE approach, the function 'geeglm' from the package 'geepack' was used with a first-order autoregressive correlation structure (Højsgaard et al., 2006). The 'lsmeans' function from the 'lsmeans' package was used to calculate predicted values and their 95% CI for the GEE model (Russell, 2016).

4 RESULTS

4.1 Simulation study

4.1.1 Simulations

In figure 2, an example of the simulated growth curves is shown, for the two latent classes

(panels) and treatment groups (colours). Both the curves on the logarithmic scale (simulated

and used for modeling) and the back-transformed linear scale (as they would be recorded in

experimental data and subsequently log transformed for analysis) are shown.

Figure 2: Tumor growth curves on the log-transformed scale as they were simulated (A) and back-transformed to their measurement scale (B). Left panels show latent class 1, right panels latent class 2. Colours represent simulated treatment groups, which have similar growth curves in latent class 1 but different growth curves in latent class 2.

4.1.2 Classification

The following parameters were varied to assess their influence on classification accuracy:

• Treatment difference: simulations were run for a small (25%, group A vs group B from figure 2) and a large (50%, group A vs group C from figure 2) difference in slope for mice belonging to latent class 2.

• Proportion of latent classes within a treatment group: simulations were run for a proportion of latent class 1 of 0.2, 0.4 and 0.6. For all simulations, this proportion was equal between treatment groups.

• Number of random start values: models were run with 1, 10 and 20 sets of random start values, based on parameters from a model without latent classes.

• Specification of random effects: both models in which the variance-covariance structure was specified to be common over latent classes and models with a class-specific proportional coefficient (see equation 1) were fit.

First, per simulation the proportion of mice that were correctly classified was evaluated, i.e.

whose most likely posterior class membership corresponded to the class they were sampled

from. The allocation of class numbers (1 or 2) is arbitrary, so from simulation to simulation the same latent class can be encoded by a different class number. To be able to translate the posterior class number to latent class 1 or latent class 2 from the simulations, the average tumor volume per class was calculated and the class with the lowest average tumor volume was recoded to latent class 1.
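The recoding of class numbers and the resulting classification accuracy can be sketched as follows. This is an illustrative Python version under a simplifying assumption: each mouse is summarised by its own average log tumor volume, and the class average is taken over mice. The function names are hypothetical and do not come from lcmm.

```python
def relabel_by_mean_volume(assigned, mean_log_volume):
    """Recode estimated class numbers so that the class with the lowest
    average tumor volume becomes latent class 1 and the other latent class 2.

    assigned        -- estimated class number per mouse (two distinct values)
    mean_log_volume -- average log tumor volume per mouse (assumed summary)
    """
    totals = {}
    counts = {}
    for cls, vol in zip(assigned, mean_log_volume):
        totals[cls] = totals.get(cls, 0.0) + vol
        counts[cls] = counts.get(cls, 0) + 1
    class_means = {cls: totals[cls] / counts[cls] for cls in totals}
    lowest = min(class_means, key=class_means.get)
    return [1 if cls == lowest else 2 for cls in assigned]


def proportion_correct(assigned, true_class, mean_log_volume):
    """Proportion of mice whose recoded class equals the class they were sampled from."""
    relabelled = relabel_by_mean_volume(assigned, mean_log_volume)
    hits = sum(1 for est, true in zip(relabelled, true_class) if est == true)
    return hits / len(true_class)
```

A simulation counts as fully correct when `proportion_correct` returns 1.0, which is the quantity summarised per scenario in table 1.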

Figure 3: Distribution of the proportion of mice that were correctly classified for a model

with two latent classes, 15 mice per group, 20% of regressors and a slope difference of 25%

between treatment groups in the growers latent class. From left to right: with 1 set of start

values, 10 random start values and 20 random start values.

There was no influence of slope parameters or percentage of regressors on the number of correctly classified mice. The specification of the random effects, common over latent classes or allowed to be different, also had no influence on classification accuracy (table 1). In figure 3 and table 1, the effect of the number of random start values is shown. Using only 1 set of start values is clearly inferior (all mice are classified correctly in only 43% to 52% of the simulations), demonstrating the problem of local maxima. Beyond 10 start values there is no further increase in classification accuracy.

Table 1: Percentage of simulations where the model classified all the mice in the correct latent

class, as a function of the proportion of mice belonging to latent class 1 (0.2, 0.4 or 0.6), the size of

a treatment difference (25% or 50% in slope in latent class 2), the number of random start values

(1, 10 or 20) and whether random effects were forced to be common over latent classes (M1)

or not (M2).

Proportion in latent class 1        0.2           0.4           0.6
Treatment difference in slope     25%   50%     25%   50%     25%   50%
M1   1 start value                49%   52%     43%   47%     50%   51%
     10 start values              93%   97%     94%   97%     92%   95%
     20 start values              95%   96%     93%   97%     93%   94%
M2   1 start value                47%   49%     49%   50%     51%   49%
     10 start values              95%   94%     94%   93%     91%   93%
     20 start values              94%   94%     95%   92%     93%   93%

4.1.3 Model selection

The ability to distinguish the correct number of classes was also evaluated. The BIC is an

information criterion based on the likelihood function and a penalty for the number of

parameters in the model, scaled so that a lower BIC identifies the better model. It was

therefore evaluated whether, among models with 1 to 3 latent classes, the model with 2 latent

classes indeed had the lowest BIC. Based on the results of the previous section, models were

estimated based on 10 random start values. For these simulations, the treatment difference in slope

of latent class 2 was fixed at 25% (group A vs group B in figure 2).

As can be seen from table 2, the BIC performs very well when comparing models with 1

and 2 classes, irrespective of how the random effects are specified. The power of the BIC

to detect latent classes when they are indeed present in the data is thus high. However, the BIC

incorrectly favors 3 latent classes in a relatively large proportion of the cases (30 to 38% of the

simulations, table 2) if the covariance structure is forced to be constant over latent classes. Indeed,

the 3 latent class model attempts to compensate for this by creating an extra class within latent

class 1, resulting in a covariance structure that is more similar to the one of latent class 2 than

when the former is correctly identified as one group. Allowing the random effects to vary per

latent class by a proportional coefficient partly resolves the overestimation of the number of

latent classes by the BIC, but not completely.
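As an illustration of the selection rule, the sketch below computes the BIC for a set of candidate models and picks the one with the lowest value. It is a minimal Python example under stated assumptions: the penalty uses the number of subjects (some software uses the number of observations instead), and the log-likelihoods are made-up numbers, not results from this thesis.

```python
import math

def bic(log_likelihood, n_params, n_subjects):
    """BIC = -2 * logL + k * ln(n); lower values indicate the preferred model.
    Assumption: n is the number of subjects (mice), not of observations."""
    return -2.0 * log_likelihood + n_params * math.log(n_subjects)

def select_by_bic(fits, n_subjects):
    """fits maps the number of latent classes to (logL, number of parameters)."""
    scores = {g: bic(ll, k, n_subjects) for g, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical log-likelihoods for models with 1, 2 and 3 latent classes.
fits = {1: (-60.0, 13), 2: (-40.0, 21), 3: (-32.0, 29)}
best, scores = select_by_bic(fits, n_subjects=30)
print(best)  # 2
```

In this made-up example the third class still improves the likelihood, but not enough to offset the penalty for its eight extra parameters.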

When mice were sampled from only one distribution (latent class 2), the BIC only incorrectly

identified 2 latent classes in a small minority of the simulations (<5%). The false-positive rate

of indicating the need for latent classes when in reality groups are composed of one

homogeneous population is thus low. Also, when simulations were performed with identical

covariance structure in the two latent classes, the problem of overestimation of 3 versus 2 latent

classes disappears (<5%), confirming that the liberal nature of the BIC in identifying the correct

number of classes as identified above results from misspecification of the random effects.

Table 2: Percentage of simulations where a model with 2 latent classes had a lower BIC

than a model with 1 or 3 latent classes, for different proportions of latent class 1 (0.2 to

0.6), for latent class models with common random effects (nwg=F) and including a class-specific

proportional coefficient (as defined in equation 1) for the random effects (nwg=T).

Proportion in latent class 1       0.2     0.4     0.6
2 vs no latent classes   nwg=F     97%     91%     94%
                         nwg=T     100%    99%     100%
2 vs 3 latent classes    nwg=F     70%     62%     69%
                         nwg=T     90%     80%     88%

4.1.4 False positive rate

To determine the false positive rate, mice were sampled under the null hypothesis of no

treatment difference. The slope of latent class 1 was kept constant as always, and the slope

parameter of latent class 2 was fixed as in equation 3b (group B from figure 2) for both

treatment groups to simulate the null hypothesis. The proportion of latent class 1 was varied as

in the previous sections. For these simulations, one set of start values was specified based on

simulation parameters instead of random sets of start values. This was done to speed up the

simulations, but means that all results are conditional on a correctly identified model (i.e. the

global maximum of the likelihood function). However, as a global maximum is identified in the

majority of the cases when sufficient random start values are included (see above), these results

are also valid for real experimental data where there may be no prior information on the

parameters available.

With a Wald test it was tested whether both time*treatment interaction terms were equal to zero for

a model with 2 latent classes, both with and without common covariance structure, for a

standard linear mixed model ignoring latent classes and for a GEE approach with robust standard

errors. The proportion of simulations where this test gave a significant result (p<0.05) represents

the type I error rate, i.e. the rate of false positive results, and is given in table 3. Apart from this outcome,

also histograms of the p-values were evaluated, as p-values should be uniformly distributed

under the null hypothesis (figure 4).
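The joint test and the resulting rejection rate can be sketched as below. This is a minimal Python illustration, assuming the two interaction estimates and their 2x2 covariance matrix have already been extracted from a fitted model; the function names and input values are hypothetical. With two parameters the Wald statistic is referred to a chi-square distribution with 2 degrees of freedom, whose upper tail probability is exp(-w/2).

```python
import math

def wald_2df(b, V):
    """Joint Wald test that two coefficients are both zero.
    b: (b1, b2) estimates; V: 2x2 covariance matrix as nested tuples."""
    (v11, v12), (v21, v22) = V
    det = v11 * v22 - v12 * v21
    # Quadratic form b' V^{-1} b, using the closed-form inverse of a 2x2 matrix.
    w = (b[0] ** 2 * v22 - b[0] * b[1] * (v12 + v21) + b[1] ** 2 * v11) / det
    # Upper tail of a chi-square distribution with 2 df.
    p = math.exp(-w / 2.0)
    return w, p

def rejection_rate(p_values, alpha=0.05):
    """Proportion of simulations with a significant test: the type I error
    rate under the null hypothesis, the power under an alternative."""
    return sum(p < alpha for p in p_values) / len(p_values)

# Made-up estimates with an identity covariance matrix.
w, p = wald_2df((1.0, 2.0), ((1.0, 0.0), (0.0, 1.0)))
print(round(w, 3), round(p, 4))  # 5.0 0.0821
```

Under the null hypothesis the rejection rate should be close to the nominal 5%, and the p-values should be spread evenly between 0 and 1.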

It is clear that the standard linear mixed model and the GEE are too liberal, with the type I error

rate exceeding 5% (table 3). Indeed, p-values are not uniformly distributed under the null

hypothesis, but are right-skewed (figure 4). A potential explanation could be that, although the

true proportions of the latent classes are the same in the two treatment groups, a difference in

actual number of mice belonging to one latent class between treatment groups due to sampling

variability of a proportion is causing the excess of false positive results. Indeed, when only one

draw for number of mice in the latent classes per simulation was performed, resulting in exactly

the same numbers for both treatment groups, the opposite is true. P-values become left-skewed,

and the tests are too conservative.
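How often the two treatment groups end up with a different number of mice in latent class 1 can be checked with a small simulation. The sketch below is illustrative only and assumes that class membership is drawn independently per mouse with the same probability in both groups of 15 mice; the function name and defaults are hypothetical.

```python
import random

def prob_unequal_counts(n_per_group=15, p_class1=0.2, n_sim=10000, seed=1):
    """Estimate how often two groups with the same true class proportion
    contain a different number of class 1 mice."""
    rng = random.Random(seed)
    unequal = 0
    for _ in range(n_sim):
        # Independent draw of class membership for every mouse in each group.
        a = sum(rng.random() < p_class1 for _ in range(n_per_group))
        b = sum(rng.random() < p_class1 for _ in range(n_per_group))
        unequal += (a != b)
    return unequal / n_sim

print(prob_unequal_counts())
```

With 15 mice per group and a true proportion of 0.2, the two groups differ in their class 1 count in the large majority of simulated experiments, which a model ignoring latent classes can mistake for a treatment effect.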

The latent class linear mixed model where the covariance structure is (incorrectly) kept

constant over the latent classes is too conservative, with p-values having a left-skewed

distribution (table 3 and figure 4). This becomes clearer with an increasing proportion of mice

belonging to latent class 1. This can be explained by the greater individual variability in latent

class 1 compared to latent class 2 (see equations 2 and 3), which in this model is also incorrectly

influencing the variance of latent class 2, the class of interest.

The latent class linear mixed model where covariance structure was allowed to be different for

the latent classes approximates the uniform distribution of p-values and a correct type I error

rate the best of the four models, although for a high number of mice belonging to class 1 the

left-skewed distribution becomes apparent again (table 3 and figure 4). This can be explained

by the fact that even in this model a completely separate covariance structure per latent class

is not estimated, but only one class-specific proportional parameter. Nevertheless, influence on

type I error rate is minimal, even with a high proportion of mice belonging to latent class 1 (table

3). Performing the same multivariate Wald test with no heterogeneity, i.e. 0% of mice

belonging to latent class 1 or with two latent classes with identical covariance structure, results

in a uniform distribution of p-values under the null hypothesis with a correct type I error rate.

Table 3: Type I error rate for a linear mixed model ignoring latent classes (m1), a GEE model

(m2), a model with two latent classes with common covariance structure (m3) and including a

proportional coefficient for random effects (m4), as a function of the proportion of mice

belonging to latent class 1 (0.2 to 0.6).

        0.2     0.4     0.6
M1      8.1%    9.1%    8.1%
M2      13%     8%      8%
M3      3.5%    1.3%    1.1%
M4      5.5%    5.2%    4.2%

4.1.5 Power

To assess power of the different models, tumor growth curves were sampled under alternative

hypotheses. The slope of latent class 1 was kept constant across treatment groups, but the slope

of latent class 2 had a difference between treatment groups of 25% and 50% (group A vs group

B and group A vs group C from figure 2 respectively). Again, one set of start values was

specified based on simulation parameters. With a Wald test it was tested whether both time*treatment

interaction terms were equal to zero for a model with 2 latent classes, a standard linear

mixed model ignoring latent classes and a GEE model. The proportion of simulations

where this test gave a significant result (p<0.05) represents the power of the test (1-type II error

rate).

Figure 4: P-value histograms for simulations under the null hypothesis from a linear mixed

model ignoring latent classes (m1), a GEE model (m2), a model with two latent classes with

common covariance structure (m3) and including a proportional coefficient for random effects

(m4), as a function of the proportion of mice belonging to latent class 1 (0.2 to 0.6). Panel rows:

M1 to M4; panel columns: proportions 0.2, 0.4 and 0.6.

In table 4 it is shown that the power of a test where the latent classes are included is superior

to the power of a standard linear mixed model in all cases. Moreover, it is clear that a standard

linear mixed model is severely underpowered to detect even relatively large treatment

differences in the presence of this heterogeneity at the evaluated sample size. Even with a low

number of mice belonging to latent class 1, power is extremely low. A GEE model outperforms

the standard linear mixed model, and with a large treatment difference it reaches sufficient

power. However, with a moderate treatment difference, power is inadequate. A dramatic

increase is observed with latent class analysis. Misspecification of the random effects as

common over latent classes has a negative influence on power, which becomes more clear as

the proportion of mice belonging to latent class 1 increases. Including the proportional

coefficient for the random effect greatly improves power. Latent class analysis, with a correctly

specified covariance structure, has a high power to detect the evaluated treatment difference in

the presence of heterogeneity. Only with a high proportion of mice belonging to latent class 1

and a small treatment difference, power of the latent class analysis does not reach 80%.

Table 4: Power of a linear mixed model ignoring latent classes (m1), a GEE model (m2), a

model with two latent classes with common covariance structure (m3) and including a

proportional coefficient for random effects (m4), as a function of the proportion of mice

belonging to latent class 1 (0.2 to 0.6) and treatment difference (25 or 50%).

Treatment difference                25%                    50%
Proportion in latent class 1    0.2    0.4    0.6      0.2    0.4    0.6
M1                              26%    15%    11%      78%    42%    24%
M2                              54%    32%    19%      99%    100%   89%
M3                              96%    71%    30%      100%   100%   97%
M4                              99%    90%    70%      100%   100%   99%

4.1.6 Bias and precision

To further assess model fit, both accuracy and precision were compared between the different

models. The parameter of interest for inference is the difference in slope in latent class 2 between

treatment groups, but as this is represented by two interactions (linear and quadratic term of

time*treatment) that are difficult to interpret independently from one another, bias and

precision were evaluated at the level of predicted values for this latent class. Simulations were

run, and per simulation the predicted values and the 95% CI for the latent class with a treatment

difference (latent class 2) were saved via the function predictY of lcmm for a model with two

latent classes (with and without common covariance structure) and overall predicted values

and 95% CI ignoring latent classes for a standard linear mixed model and the GEE model. The

average of the predicted values and of the limits of the CI over all the simulations was calculated and

compared with the true growth curve of latent class 2, based on the parameters shown in

equation 3.
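The averaging step can be sketched as follows. This is a minimal Python illustration with hypothetical names and made-up numbers; it assumes that, for every simulation, the predicted values and confidence limits are available at the same time points as the true curve.

```python
from statistics import mean

def pointwise_summary(pred, lower, upper, truth):
    """pred, lower, upper: one list per simulation, one value per time point.
    Returns the mean prediction, the bias relative to the true curve and the
    mean width of the confidence interval, all per time point."""
    n_times = len(truth)
    avg_pred = [mean(sim[t] for sim in pred) for t in range(n_times)]
    avg_low = [mean(sim[t] for sim in lower) for t in range(n_times)]
    avg_up = [mean(sim[t] for sim in upper) for t in range(n_times)]
    bias = [avg_pred[t] - truth[t] for t in range(n_times)]    # accuracy
    width = [avg_up[t] - avg_low[t] for t in range(n_times)]   # precision
    return avg_pred, bias, width

# Two made-up simulations evaluated at two time points.
pred = [[1.0, 2.0], [1.2, 2.4]]
lower = [[0.5, 1.5], [0.7, 1.9]]
upper = [[1.5, 2.5], [1.7, 2.9]]
avg_pred, bias, width = pointwise_summary(pred, lower, upper, truth=[1.0, 2.0])
print(avg_pred, bias, width)
```

A bias close to zero at every time point indicates an accurate model, and a narrow average interval indicates a precise one.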

Unsurprisingly, when comparing the overall predicted values of a standard linear mixed model

ignoring latent classes with the actual growth curve of the latent class of interest, an

underestimation of the growth curve and an imprecise estimation are evident. Nevertheless, it is

interesting to see how large the influence is even if there is only a small contamination of the

other latent class. Already with only 20% of mice belonging to latent class 1, there is a

tremendous difference, both regarding accuracy and precision, compared to estimation with

the same model when there is no heterogeneity (figure 5). Naturally, this becomes even more

problematic as the proportion of mice belonging to latent class 1 increases.

Figure 5: Mean predicted values (black solid line) and 95% confidence intervals (black dotted

line) from a linear mixed model ignoring latent classes for simulations with a 50% slope

difference and 20% (left) or 0% (right) of mice belonging to latent class 1 compared to the true

growth curve of latent class 2 (red dotted line)

The situation is even worse for the GEE approach. With no heterogeneity, this model behaves

exactly as the standard linear mixed model. However, even with only 20% of mice belonging

to latent class 1, this model performs extremely poorly. This can be explained by the missing

value pattern. Mice are euthanized when tumor volume exceeds a fixed threshold, so the

resulting missing values are missing at random (MAR).

Because of faster tumor growth in treatment group A compared to group B, during the follow-

up period there are more missing values for mice in group A. The missing values do not pose

serious problems when there is no heterogeneity, but become problematic when treatment

groups also contain mice from latent class 1 (figure 6). Because in latent class 1 there is

regression of the tumor, no missing values for those mice are present. So for treatment group

A, from a certain time point, data are missing in the latent class of interest but not for the other

latent class. It was confirmed that the bad behavior of the model was caused by the missing

values by running the model on simulations with the same level of heterogeneity but no missing

values (simulated tumor volumes >2000mm3 were kept as observations). Then the model

performs similarly to the standard linear mixed model (figure 6).
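The missing value mechanism can be made concrete with a small sketch. The Python code below is illustrative only: it assumes that the first measurement above the threshold and all later measurements of that mouse are set to missing, and it uses made-up volumes. The naive mean of the observed values stands in for any analysis that ignores why values are missing; it is not a GEE fit.

```python
def censor_after_threshold(volumes, threshold=2000.0):
    """Set the first measurement above the threshold and all later ones to
    None, mimicking a mouse that is euthanized because of tumor size."""
    censored = []
    dropped_out = False
    for v in volumes:
        if not dropped_out and v > threshold:
            dropped_out = True
        censored.append(None if dropped_out else v)
    return censored

def observed_mean(profiles, t):
    """Mean of the values that are still observed at time index t."""
    values = [p[t] for p in profiles if p[t] is not None]
    return sum(values) / len(values)

grower = censor_after_threshold([150.0, 420.0, 1100.0, 2600.0, 5200.0])
regressor = censor_after_threshold([150.0, 90.0, 30.0, 5.0, 1.0])
print(grower)     # [150.0, 420.0, 1100.0, None, None]
print(regressor)  # [150.0, 90.0, 30.0, 5.0, 1.0]
print(observed_mean([grower, regressor], 4))  # 1.0
```

At the last time point only the regressing mouse is still observed, so the observed mean reflects latent class 1 alone and no longer describes the class of interest.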

Figure 6: Mean predicted values (black solid line) and 95% confidence intervals (black dotted

line) from a GEE for simulations with a 50% slope difference and 20% (upper) or 0% (lower)

of mice belonging to latent class 1 with (left) and without (right) missing values compared to

the true growth curve of latent class 2 (red dotted line)

The latent class mixed model does not suffer from any bias, whether random effects are kept

constant over latent classes or not. Both also result in much more precise estimation than the

standard linear mixed model. In line with previous results, allowing the random effects to vary

per latent class has a beneficial influence, i.e. reduces the uncertainty in predicted values.

Although true for all tested conditions, this influence is minimal with only 20% belonging to

latent class 1, and becomes clearer with an increasing proportion of mice belonging to

that latent class (figure 7).

Figure 7: Mean predicted values (black solid line) and 95% confidence intervals (black dotted

line) from a latent class linear mixed model for simulations with a 50% slope

difference and 20% (upper) or 60% (lower) of mice belonging to latent class 1, with common

random effects (nwg=F, left) and including a class-specific proportional coefficient (as defined in

equation 1) for the random effects (nwg=T, right), compared to the true growth curve of latent class

2 (red dotted line)

4.1.7 Model refinement

Tumor growth is exponential (and thus linear on a logarithmic scale) for latent class 2, but not

for latent class 1. In a standard linear mixed model, it is not possible to accommodate this

difference in growth pattern as there is no distinction between two latent classes. All the mice

are either fit by only a linear term or also a quadratic term. Interest lies in the latent class 2, but

by only including a linear term, residual variance is increased because of a bad fit for latent

class 1. Therefore, including a quadratic term of time results in better model fit and better

power than when only a linear term is included. Hence, to test for a treatment difference in

slope for latent class 2, two interactions need to be tested simultaneously, as the individual

terms of a polynomial parameter cannot be interpreted separately.

In the latent class model, it is possible to accommodate this difference in growth pattern

between the latent classes. The hlme function to fit a latent class model has the option ‘posfix’,

which allows the user to fix certain parameters, so they will not be estimated. This option can

only be used in combination with user-supplied start values, where all the start values are specified

by the user. For the parameters specified by ‘posfix’, parameters are fixed to their specified

start value. In this case, the quadratic term for time and the interaction of this term with

treatment were set to zero only for latent class 2, reducing their growth curve to linear but

keeping a good fit for latent class 1. To test for a treatment difference, only one interaction

needs to be tested, which is more efficient. Table 5 shows the increase in power that this

approach brings, for a model with latent classes and a proportional coefficient for the random

effects. The increase in power is minimal for most conditions, because the power of the test is

already almost maximal without this adaptation. However, for the conditions where power is

not yet maximal, it is clear that this approach has a substantial influence on power, e.g. for a

25% difference in slope and 60% of mice belonging to latent class 1 it increases the power

from an insufficient 70% to a satisfactory 83%. An additional advantage is that the effect of

interest is now represented by one parameter that is easy to interpret (the difference in slope

between treatment) and report together with a measure of precision (standard error, 95% CI).

Table 5: Power of a model with two latent classes including a proportional coefficient for

random effects with (m2) and without (m1) fixing the quadratic terms of time to zero, as a

function of the proportion of regressors (20 to 60%) and treatment difference (25 or 50%).

Treatment difference           25%                     50%
Proportion of regressors   20%    40%    60%      20%    40%    60%
M1                         99%    90%    70%      100%   100%   99%
M2                         100%   97%    83%      100%   100%   100%

4.2 Experimental data

4.2.1 Data description

The data show tumor growth measurements of mice that were subcutaneously inoculated with

CT26, a colon cancer model. Mice were allocated at day 10 after inoculation to treatment

groups with the same average tumor size. One group of 8 mice was treated with anti-PD-1, a

clinically approved immunotherapeutic drug. A second group of 16 mice was treated with

anti-PD-1 + an iTeos compound. Tumors were measured three times per week for 17 days. Mice

were euthanized when tumor volumes exceeded 2000mm3 and tumors measuring less than

1mm3 were recorded as 1mm3. There were no missing measurements apart from the mice that

were euthanized because of tumor size. The objective of the experiment was to investigate if

the combination of anti-PD-1 and the iTeos compound was superior to anti-PD-1 monotherapy.

4.2.2 Visual inspection data

In both treatment groups, there are mice with tumors that regress and mice with tumors that

continue growing, suggesting latent heterogeneity (figure 8). In the combination treatment,

there seems to be a delay in tumor growth for the mice that do not regress. After log

transformation of the tumor volumes, tumor growth seems to be linear in the mice with tumors

that continue growing, whereas in the mice with regressing tumors, a curvature is present

(figure 9).

Figure 8: Observed individual tumor growth curves for both treatment groups.

Figure 9: Log transformed individual tumor growth curves for both treatment groups.

4.2.3 Model building

To assess the need for latent classes, models with no latent classes, two and three latent classes

were fit. The same model as discussed in the simulation chapter was fit. To assess if random

effects could be kept constant across classes or not, a model with and without a class-specific

coefficient for the random effect was fit for the two and three latent class models. As in the

simulation, a gridsearch with 10 random start values based on the model with no latent classes

was used to fit the model with latent classes. With the function ‘summarytable()’, an overview

of the models was generated. Based on the BIC, a model with 2 latent classes is identified as

the best model, with no need for a class-specific coefficient for the random effects (table 6).

With this model, latent classes represent biologically meaningful groups; i.e. mice whose

tumors are regressing and mice whose tumors keep on growing (figure 10). To further simplify

the model, considering the linear tumor growth in mice whose tumors do not regress (latent

class 1), a model was fit where the quadratic term for this latent class was fixed to zero. This

did not result in a worse fit (BIC=146) and was thus chosen as the final model.

Table 6: Summary of models with different number of latent classes (G) and common

random effects structure (nwg=F) or class-specific coefficient (nwg=T).

      G    nwg   npm   BIC   %class1   %class2   %class3
m1    1    /     13    156   100
m2    2    F     20    148   79        21
m3    2    T     21    151   21        79
m4    3    F     27    148   58        21        21
m5    3    T     29    153   38        42        21

Figure 10: Classification of the longitudinal profile according to a latent class linear mixed

model with two latent classes (red=latent class 1, blue=latent class 2).

4.2.4 Comparison model with no latent classes to model with 2 latent classes

Model parameters for the model with no latent classes and with two latent classes are shown

in table 7 and 8 respectively. A Wald test jointly testing the two interaction terms between time

and treatment gave a p-value of 0.09 for the model with no latent classes. A Wald test testing

the interaction of time with treatment for latent class 1 for the model with two latent classes

resulted in a p-value of 0.0001. It can thus be concluded that the latent class model is much

more powerful to detect a treatment difference in this dataset than a standard linear mixed

model.

Table 7: Model parameters for a standard linear mixed model without latent classes.

Parameters in red represent a treatment difference.

                coef      Se       Wald     p-value
Intercept       4.75      0.08     61       <0.0001
Time            0.12      0.02     5.54     <0.0001
Time2           -0.005    0.001    -3.75    0.0002
Group           -0.05     0.13     -0.36    0.72
Time*Group      0.02      0.04     0.45     0.65
Time2*Group     0.003     0.002    1.40     0.16
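The p-values of the individual parameters can be recovered from the Wald statistics. The sketch below is a minimal Python illustration, assuming that the reported Wald statistic is the estimate divided by its standard error and that it is referred to a standard normal distribution; the function name is hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p-value of a standard normal test statistic:
    2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Wald statistics as reported in table 7.
for name, z in [("Time2", -3.75), ("Group", -0.36),
                ("Time*Group", 0.45), ("Time2*Group", 1.40)]:
    print(f"{name}: p = {two_sided_p(z):.4f}")
```

Up to rounding, the resulting values agree with the p-values reported in the table (0.0002, 0.72, 0.65 and 0.16).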

Table 8: Model parameters for a latent class linear mixed model with two latent classes.

Parameters in red represent a treatment difference.

                   coef      Se       Wald     p-value
Intercept cl1      4.71      0.12     40.02    <0.0001
Intercept cl2      4.61      0.31     15.07    <0.0001
Time cl1           0.14      0.02     9.32     <0.0001
Time cl2           -0.01     0.09     -0.12    0.90
Time2 cl1          0*
Time2 cl2          -0.005    0.004    -1.32    0.19
Group cl1          0.07      0.15     0.48     0.63
Group cl2          0.02      0.34     0.07     0.94
Time*Group cl1     -0.07     0.02     -3.81    0.0001
Time*Group cl2     0.16      0.11     1.61     0.11
Time2*Group cl1    0*
Time2*Group cl2    -0.006    0.004    -1.50    0.13

* parameter fixed to zero (not estimated)

Overall predicted growth curves for the model with no latent classes and class-specific tumor

growth curves for latent class 1 for the model with two latent classes along with their 95% CI

are shown in figure 11 and 12 respectively. For a standard linear mixed model, a large influence

of the few mice with regressing tumors on the predicted growth curve of the combination

treatment is evident (figure 11A). For the latent class model, a separate growth curve for mice

depending on tumor regression (latent class 2) or tumor growth (latent class 1) is estimated,

resulting in a growth curve that is representative for each class (figure 12A). The improved

precision in estimation explains the greater power of this approach compared to the linear

mixed model (figure 11 and 12B).

Figure 11: Predicted growth curves (red) superimposed on the observed individual growth

curves (black) (A) and predicted growth curves (solid lines) with their 95% confidence

intervals (dotted lines) (B) for a standard linear mixed model

Figure 12: Predicted growth curves (red) superimposed on the observed individual growth

curves (black) (A) and predicted growth curves (solid lines) with their 95% confidence

intervals (dotted lines) (B) for latent class 1 from a latent class linear mixed model

5 DISCUSSION

In this thesis, the performance of a standard linear mixed model, a robust GEE model and a

latent class linear mixed model was evaluated when there is latent heterogeneity present. The

ability to detect a treatment difference in longitudinal growth profile of one of the latent classes

was of interest.

The standard linear mixed model performed inadequately, with both a too liberal type I error

rate and low power. As it is thus more likely to find a treatment difference when it is not there

and to miss a treatment difference when it is actually present, standard linear mixed models

cannot be seen as robust in the presence of latent classes. The liberal nature under the null

hypothesis can be explained by the fact that although the true proportion was the same for both

treatment groups, because of sampling the actual proportion was not the same in all

simulations. When there was such a difference in proportion, the overall estimated growth

curve was falsely considered different between treatment groups, although both the

longitudinal profiles of the two latent classes and the proportion of latent classes were sampled

under the null hypothesis of no treatment difference. The lack of sufficient power can mainly

be explained by the increased variability compared to a homogeneous population. It has to be

mentioned that the effect of heterogeneity is large, even with a low proportion of mice

belonging to another latent class and a large treatment difference. This can be explained by the

fact that the latent classes as evaluated in this thesis are very different from each other, the way

it is also often seen in experimental data.

The GEE model seemed to have the same problem for type I error rate, but a higher power than

the standard linear mixed model. However, when evaluating bias of this model, it was clear

that the GEE model was not able to cope with the missing value pattern in the presence of

heterogeneity. The data can be considered missing at random, as tumor volumes>2000mm3

were replaced by missing values, so the probability of missing responses is dependent on

previously measured responses (all measurements up to 2000mm3). Linear mixed models can

cope with missing at random, and indeed there was no difference whether missing values were

present or not. GEE models can only cope with missing completely at random (Nakai et al.,

2011). However, without heterogeneity, there was no influence of the missing value pattern as

described above. It is only in the presence of heterogeneity and missing values, that GEE did

not yield valid results. Therefore, GEE cannot be recommended to be used with this kind of

data.

Latent class linear mixed models performed very well, leading to a great increase in power

compared to a standard linear mixed model and not suffering from an inflated type I error rate.

It is important to run the model with different sets of start values, or specify sensible start

values, in order to reach a global maximum of the likelihood. If this is ensured, the model is

able to correctly classify profiles in the vast majority of cases. This good convergence of a

relatively complex model for a relatively small sample size can be explained by the fact that

the latent classes are quite distinct from each other. The BIC is a reliable criterion to identify

the need for latent classes versus none: it favors the model with two latent classes over a standard

linear mixed model in over 90% of simulations with two latent classes, and in less than 5% of

simulations with no latent classes. It is however somewhat liberal in the determination of

the number of latent classes. Therefore, care should be taken not to overestimate the number

of latent classes, which can be done by restricting the number of classes to biologically

meaningful and interpretable subgroups (Berlin et al., 2013). The need for latent classes and

determining the number of latent classes can also be tested with the BLRT, which has been

identified as superior compared to the BIC for model comparison. However, this test is

computationally demanding and time-consuming because of the extensive number of models

needed to be run for one dataset to get the different bootstrap estimates under different null

hypotheses, all with a sufficient number of sets of start values (Tein et al., 2013).

Within the latent class linear mixed model, it is important to allow for class-specific random

effects when needed. If not, this influences model selection with regards to the number of latent

classes. In an attempt to obtain latent classes with similar variance, there will be an

overestimation of the number of latent classes. And because an incorrectly common variance

will over- or underestimate the true variance of one of the latent classes, also type I and type II

error rates are affected. A limitation of the model is that it only allows the variances of all

random effects to vary between latent classes by the same proportional factor. However, this

approximation works well with no noteworthy influences on the different parameters

considered in this thesis to assess model performance.

It has to be mentioned that power, type I error rate, bias and precision of the latent class linear

mixed model were assessed conditional on a model with the correct number of latent classes

and with specified start values, which is not the same as estimation without any prior

information. However, because of the good convergence with a sufficient number of start

values and the good performance of the BIC to select the number of latent classes, results would

be very similar if for each simulation this approach of different start values and selection of the

right model based on the BIC after fitting models with different number of classes was used.

This was not done because of the significant time gained by only fitting one model. Moreover,

because of the small sample size and clearly distinguishable subgroups, also for real

experimental data it will be possible to provide reasonable start values. Therefore, it is believed

that the results based on the simulations as they were performed are also valid for real

experimental data.

The good performance of the latent class linear mixed model in the simulation studies was

confirmed by its application to real experimental data. The model was able to identify

biologically meaningful subgroups and was more powerful than a standard linear mixed model

to detect a treatment difference.

In this thesis the latent class linear mixed effect model was used to test a difference in slope in

one of the latent classes. Only the longitudinal parameters of the model were thus of interest

for inference. However, the model can also be used to test a difference in proportion of latent

classes between treatment groups. A covariate explaining class membership in the logistic part

of the model can be easily included in the hlme function by specifying the classmb option

(Proust-Lima et al., 2017). Testing if a treatment would for instance increase the proportion of

mice with complete regression could be of great scientific interest. Also other variables

explaining class membership could be valuable. For instance, if potential biomarkers could be

evaluated this way for their association with a certain type of anti-tumor response, this could

lead to a better understanding of the mode of action of the treatment. This is of particular

interest for cancer immunotherapy, as it is still unknown what causes the heterogeneity in

response. However, in preclinical experiments as they are conducted routinely, with generally

only 15 mice per group, reliable conclusions from logistic regression are not possible (Peduzzi

et al., 1996). Increasing the sample size considerably in preclinical experiments is difficult for

practical reasons, because of the high number of compounds that need to be evaluated in this

stage. In clinical trials, sample sizes are much larger, and the use of latent class analysis to

answer these specific questions is very relevant in this setting.

In conclusion, this thesis demonstrated that standard linear mixed models do not perform well

in assessing treatment differences in tumor growth rate in the presence of latent heterogeneity, as

often observed in preclinical experiments, not even with a low level of heterogeneity and a

large treatment difference. GEE models should not be used when data are missing

(unless missing completely at random), as is often the case in this kind of experiment.

In contrast, latent class linear mixed models perform very well, and should be the method of

choice in case of latent heterogeneity.

6 REFERENCES

Bates D, Maechler M, Bolker B, Walker S. Fitting Linear Mixed-Effects Models Using lme4.

Journal of Statistical Software. 2015;67(1):1-48. doi:10.18637/jss.v067.i01.

Benaglia T, Chauveau D, Hunter D, Young D. mixtools: An R Package for Analyzing

Mixture Models. Journal of Statistical Software. 2009;32(6):1-29.

doi:10.18637/jss.v032.i06.

Berlin KS, Williams NA, Parra GR. An Introduction to Latent Variable Mixture Modeling

(Part 1): Overview and Cross-Sectional Latent Class and Latent Profile Analyses. Journal of

Pediatric Psychology. 2013;39(2):174-187. doi:10.1093/jpepsy/jst084.

Duncan TE, Duncan SC, Strycker LA. An Introduction to Latent Variable Growth Curve

Modeling: Concepts, Issues, and Application, Second Edition. 2013.

Farkona S, Diamandis EP, Blasutig IM. Cancer immunotherapy: the beginning of the end of

cancer? BMC Medicine. 2016;14:73. doi:10.1186/s12916-016-0623-5.

Højsgaard S, Halekoh U, Yan J. The R Package geepack for Generalized Estimating

Equations. Journal of Statistical Software. 2006;15(2):1-11.

Ibrahim JG, Molenberghs G. Missing data methods in longitudinal studies: a review. Test

(Madrid, Spain). 2009;18(1):1-43. doi:10.1007/s11749-009-0138-x.

Jones BL, Nagin DS, Roeder K. A SAS Procedure Based on Mixture Models for Estimating

Developmental Trajectories. Sociological Methods & Research. 2001;29(3):374-393.

doi:10.1177/0049124101029003005.

Jung T, Wickrama KAS. An Introduction to Latent Class Growth Analysis and Growth Mixture

Modeling. Social and Personality Psychology Compass. 2008;2(1):302-317.

doi:10.1111/j.1751-9004.2007.00054.x.

Komárek A, Verbeke G. A SAS Macro for Linear Mixed Models with Finite Normal Mixtures

as Random-Effect Distribution. 2012.

URL https://ibiostat.be/online-resources/online-resources/longitudinal.

Laajala TD, Corander J, Saarinen NM, et al. Improved Statistical Modeling of Tumor Growth

and Treatment Effect in Preclinical Animal Studies with Highly Heterogeneous Responses In

Vivo. Clinical Cancer Research. 2012;18(16):4385-4396.

doi:10.1158/1078-0432.ccr-11-3215.

Leisch F. FlexMix: A General Framework for Finite Mixture Models and Latent Class

Regression in R. Journal of Statistical Software. 2004;11(8):1-18.

doi:10.18637/jss.v011.i08.

Lenk PJ, DeSarbo WS. Bayesian inference for finite mixtures of generalized linear models with

random effects. Psychometrika. 2000;65(1):93-119. doi:10.1007/BF02294188.

Liu C, Cripe TP, Kim M-O. Statistical Issues in Longitudinal Data Analysis for Treatment

Efficacy Studies in the Biomedical Sciences. Molecular Therapy. 2010;18(9):1724-1730.

doi:10.1038/mt.2010.127.

Mooijaart A, van der Heijden PGM. The EM algorithm for latent class analysis with equality

constraints. Psychometrika. 1992;57(2):261-269. doi:10.1007/BF02294508.

Nakai M, Ke W. Review of the Methods for Handling Missing Data in Longitudinal Data

Analysis. Int. Journal of Math. Analysis. 2011;5(1):1-13.

Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number

of events per variable in logistic regression analysis. Journal of Clinical Epidemiology.

1996;49(12):1373-1379. doi:10.1016/S0895-4356(96)00236-3.

Proust-Lima C, Philipps V, Liquet B. Estimation of Extended Mixed Models Using

Latent Classes and Latent Processes: The R Package lcmm. Journal of Statistical Software.

2017;78(2):1-56. doi:10.18637/jss.v078.i02.

Ranganathan A. The Levenberg-Marquardt Algorithm. Honda Research Institute USA. 2004.

URL http://www.ananth.in/Notes_files/lmtut.pdf.

Lenth RV. Least-Squares Means: The R Package lsmeans. Journal of Statistical

Software. 2016;69(1):1-33. doi:10.18637/jss.v069.i01.

Tein J-Y, Coxe S, Cham H. Statistical Power to Detect the Correct Number of Classes in Latent

Profile Analysis. Structural Equation Modeling: A Multidisciplinary Journal.

2013;20(4):640-657. doi:10.1080/10705511.2013.824781.

White A, Murphy TB. BayesLCA: An R Package for Bayesian Latent Class Analysis. Journal of

Statistical Software. 2014;61(13):1-28. doi:10.18637/jss.v061.i13.
