1 Introduction
Joint models for longitudinal and survival data have become a valuable asset in the toolbox of modern data scientists. After the seminal papers of Faucett and Thomas (1996) and Wulfsohn and Tsiatis (1997), several extensions of these models have been proposed in the literature. These include, amongst others, flexible specification of the longitudinal model (Brown et al., 2005a), consideration of competing risks (Elashoff et al., 2008; Andrinopoulou et al., 2014) and multistate models (Ferrer et al., 2016), and the calculation of dynamic predictions (Proust-Lima and Taylor, 2009; Rizopoulos, 2011; Rizopoulos et al., 2014; Andrinopoulou and Rizopoulos, 2016; Rizopoulos et al., 2017; Andrinopoulou et al., 2018). A particularly useful and practical extension is that which allows for the inclusion of multiple longitudinal outcomes (Rizopoulos and Ghosh, 2011a; Chi and Ibrahim, 2006; Brown et al., 2005b; Lin et al., 2002). In medical settings in particular, data collection is likely to be complex: while the standard joint model allows us to determine the association between a survival outcome and a single longitudinal outcome (biomarker), there are more often than not multiple biomarkers relevant to the event of interest. Extending the univariate joint model to accommodate these multiple longitudinal outcomes allows us to incorporate more information, improving prognostication and enabling us to better make sense of the complex underlying nature of the disease dynamics. A motivating example of this is the BioSHiFT cohort study: a prospective observational study conducted in the Netherlands on chronic heart failure (CHF) patients. The primary focus of the study was to determine whether or not disease progression in individual CHF patients can be assessed using longitudinal measurements of several blood biomarkers (van Boven et al., 2018).
Previous work on these data has focused mainly on the association between each individual biomarker and a single composite event; it is likely, however, that the predictive value of the biomarkers will be more accurately determined when they are assessed in concert.
Extension to the multivariate case is mathematically straightforward, and may be easily combined with other extensions, allowing for longitudinal outcomes of varying types; left, right and interval censoring; and the inclusion of competing risks, amongst others. There are also now a number of excellent software packages available, which make for easier implementation of the more complex models. There are, however, technical challenges that hamper the widespread use of these models. As the number of longitudinal outcomes increases, and thus the number of random effects, standard methods become computationally prohibitive: under a Bayesian approach, the number of parameters to sample becomes unreasonably large, and in the case of maximum likelihood, we are required to numerically approximate the integrals over the random effects, which is challenging in high dimensions. The practical solution most commonly used in such settings is the two-stage approach, wherein a multivariate mixed model is first fitted to the longitudinal outcomes, following which the output of this model is used to fit a survival submodel. Unfortunately, substantial research on this topic indicates that this approach results in biased estimates (Tsiatis and Davidian, 2004; Rizopoulos, 2012; Ye et al., 2008). In this paper, we propose an adaptation of the simple two-stage approach which eliminates the bias and substantially reduces computational time. We propose the use of a correction factor, based on importance sampling theory (Press et al., 2007, Section 7.9). This correction factor allows us to reweight each realization of the MCMC sample obtained from the Bayesian estimation of the two-stage approach, such that the resulting estimates more closely approximate those obtained via the full multivariate joint model. The weights are given by the target distribution (the full posterior distribution of the multivariate joint model), divided by the product of the posterior distributions for each of the two stages, evaluated at each iteration of the MCMC sample. The use of this correction factor alone is not enough to eliminate the bias, but, prior to its application, the two-stage approach is itself modified: where before, in the second stage, only the parameters of the survival submodel were updated, we now also update the random effects. Combined, these adaptations achieve unbiased estimates in a fraction of the time required to compute the full multivariate model.
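The reweighting idea can be sketched in a few lines. The following Python fragment is an illustration only (it is not the implementation used in this paper): given per-draw log-densities of a target and a proposal distribution, it forms self-normalized importance weights and a reweighted posterior mean.

```python
import numpy as np

def self_normalized_weights(log_target, log_proposal):
    """Self-normalized importance weights from per-draw log-densities.

    The densities may be known only up to additive constants on the log
    scale; the normalization cancels those constants out.
    """
    log_w = np.asarray(log_target) - np.asarray(log_proposal)
    log_w -= log_w.max()              # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

def reweighted_mean(draws, weights):
    """Weighted posterior mean of an MCMC sample."""
    return np.average(np.asarray(draws), axis=0, weights=weights)

# toy illustration: draws from N(0, 1) reweighted toward a N(1, 1) target
rng = np.random.default_rng(0)
draws = rng.normal(0.0, 1.0, size=5000)
log_q = -0.5 * draws**2                # proposal log-density (up to a constant)
log_p = -0.5 * (draws - 1.0)**2        # target log-density (up to a constant)
w = self_normalized_weights(log_p, log_q)
est = reweighted_mean(draws, w)        # pulled toward the target mean of 1
```

In the corrected two-stage approach the proposal is the product of the two stage-wise posteriors and the target is the full joint-model posterior; the principle is the same.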
The rest of the paper is organised as follows: Section 2 introduces the full multivariate joint model, and Section 3 discusses the estimation of the model under the Bayesian paradigm. Section 4 introduces the importance-sampling-corrected two-stage approach and presents the results of a simple simulation, and Section 5 presents the importance-sampling-corrected two-stage approach with updated random effects. Section 6 presents the results of a more complex simulation, and finally in Section 7 we look at an analysis of the BioSHiFT data.
2 Joint Model Specification
We start with a general definition of the framework of multivariate joint models for multiple longitudinal outcomes and an event time.
Let $\mathcal{D}_n = \{T_i, \delta_i, y_i;\ i = 1, \dots, n\}$ denote a sample from the target population, where $T_i^*$ denotes the true event time for the $i$-th subject, and $T_i$ the observed event time. Then $\delta_i$ denotes the event indicator, with 0 corresponding to right censoring ($T_i^* > T_i$), 1 to a true event ($T_i^* = T_i$), 2 to left censoring ($T_i^* < T_i$), and 3 to interval censoring ($T_i^L < T_i^* < T_i^R$). Assuming $K$ longitudinal outcomes, we let $y_{ik}$ denote the longitudinal response vector for the $k$-th outcome ($k = 1, \dots, K$) and the $i$-th subject, with elements $y_{ikl}$ denoting the value of the $k$-th longitudinal outcome for the $i$-th subject, taken at time point $t_{ikl}$, $l = 1, \dots, n_{ik}$. To accommodate multivariate longitudinal responses of different types in a unified framework, we postulate a generalized linear mixed effects model. In particular, the conditional distribution of $y_{ik}$ given a vector of random effects $b_{ik}$ is assumed to be a member of the exponential family, with linear predictor given by
$g_k\big[E\{y_{ik}(t) \mid b_{ik}\}\big] = \eta_{ik}(t) = x_{ik}^\top(t)\beta_k + z_{ik}^\top(t)b_{ik},$   (1)
where $g_k(\cdot)$ denotes a known one-to-one monotonic link function, $y_{ik}(t)$ denotes the value of the $k$-th longitudinal outcome for the $i$-th subject at time point $t$, and $x_{ik}(t)$ and $z_{ik}(t)$ denote the design vectors for the fixed effects $\beta_k$ and the random effects $b_{ik}$, respectively. The dimensionality and composition of these design vectors is allowed to differ between the multiple outcomes, and they may also contain a combination of baseline and time-varying covariates. To account for the association between the multiple longitudinal outcomes, we link their corresponding random effects. More specifically, the complete vector of random effects $b_i = (b_{i1}^\top, \dots, b_{iK}^\top)^\top$ is assumed to follow a multivariate normal distribution with mean zero and variance-covariance matrix $D$. For the survival process, we assume that the risk for an event depends on a function of the subject-specific linear predictor $\eta_{ik}(t)$ and/or the random effects. More specifically, we have
$h_i\{t \mid \mathcal{H}_i(t), w_i(t)\} = h_0(t) \exp\Big[\gamma^\top w_i(t) + \sum_{k=1}^{K} f_k\{\mathcal{H}_{ik}(t), w_i(t), b_{ik}, \alpha_k\}\Big], \quad t > 0,$   (2)
where $\mathcal{H}_{ik}(t) = \{\eta_{ik}(s),\ 0 \le s < t\}$ denotes the history of the underlying $k$-th longitudinal process up to $t$, $h_0(\cdot)$ denotes the baseline hazard function, and $w_i(t)$ is a vector of exogenous, possibly time-varying, covariates with corresponding regression coefficients $\gamma$. Functions $f_k(\cdot)$, parameterized by vector $\alpha_k$, specify which components/features of each longitudinal outcome are included in the linear predictor of the relative risk model (Brown, 2009; Rizopoulos and Ghosh, 2011b; Rizopoulos, 2012; Rizopoulos et al., 2014). Some examples, motivated by the literature, are (subscripts have been dropped in the following expressions but are assumed):

$f\{\mathcal{H}(t), w(t), b, \alpha\} = \alpha\, \eta(t),$
$f\{\mathcal{H}(t), w(t), b, \alpha\} = \alpha\, \eta'(t), \quad \text{with } \eta'(t) = d\eta(t)/dt,$
$f\{\mathcal{H}(t), w(t), b, \alpha\} = \alpha \int_0^t \eta(s)\, ds.$

These formulations of $f(\cdot)$ postulate that the hazard of an event at time $t$ may be associated with the underlying level of the biomarker at the same time point, the slope of the longitudinal profile at $t$, or the accumulated longitudinal process up to $t$. In addition, the specified terms from the longitudinal outcomes may also interact with covariates in $w_i(t)$. Furthermore, note that we allow a combination of functional forms per longitudinal outcome. Finally, the baseline hazard function is modeled flexibly using a B-splines approach, i.e.,
$\log h_0(t) = \gamma_{h_0,0} + \sum_{q=1}^{Q} \gamma_{h_0,q} B_q(t; v),$   (3)

where $B_q(t; v)$ denotes the $q$-th basis function of a B-spline with knots $v_1, \dots, v_Q$ and $\gamma_{h_0}$ the vector of spline coefficients; typically $Q = 15$ or 20.
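Because $\log h_0(t)$ in (3) is itself a spline with coefficient vector $\gamma_{h_0}$, it can be evaluated directly as a B-spline object. The following Python sketch (illustrative only; the knot placement and coefficient values are assumptions, not those used in the paper) builds such a log-baseline-hazard:

```python
import numpy as np
from scipy.interpolate import BSpline

# assumed setup: cubic B-spline (degree 3) on the follow-up window [0, 10]
degree = 3
boundary = (0.0, 10.0)
inner = np.linspace(*boundary, 7)[1:-1]            # 5 interior knots
knots = np.concatenate([[boundary[0]] * (degree + 1), inner,
                        [boundary[1]] * (degree + 1)])
Q = len(knots) - degree - 1                        # number of basis functions

rng = np.random.default_rng(1)
gamma = rng.normal(scale=0.2, size=Q)              # illustrative spline coefficients

# log h0(t) = sum_q gamma_q B_q(t) is itself a spline with coefficients gamma
log_h0 = BSpline(knots, gamma, degree)
t = np.linspace(0.0, 10.0, 50)
h0 = np.exp(log_h0(t))                             # baseline hazard, positive by construction
```

Modeling the hazard on the log scale guarantees positivity without constraining the coefficients.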
3 Likelihood and Priors
As explained in Section 1, we use a Bayesian approach for the estimation of the joint model's parameters. The posterior distribution of the model parameters given the observed data is derived under the assumptions that, given the random effects, the longitudinal outcomes are independent of the event times, the multiple longitudinal outcomes are independent of each other, and the longitudinal responses of each subject in each outcome are independent. Under these assumptions the posterior distribution is proportional to:
$p(\theta, b \mid \mathcal{D}_n) \propto \prod_{i=1}^{n} \Big\{\prod_{k=1}^{K} \prod_{l=1}^{n_{ik}} p(y_{ikl} \mid b_{ik}, \theta)\Big\}\, p(T_i, \delta_i \mid b_i, \theta)\, p(b_i \mid \theta)\, p(\theta),$   (4)
where $\theta$ denotes the full parameter vector, and

$p(y_{ikl} \mid b_{ik}, \theta) = \exp\Big[\frac{y_{ikl}\, \psi_{ikl}(b_{ik}) - c_k\{\psi_{ikl}(b_{ik})\}}{a_k(\varphi_k)} - d_k(y_{ikl}, \varphi_k)\Big],$

with $\psi_{ikl}(b_{ik})$ and $\varphi_k$ denoting the natural and dispersion parameters in the exponential family, respectively, and $a_k(\cdot)$, $c_k(\cdot)$, and $d_k(\cdot)$ known functions specifying the member of the exponential family. For the survival part we accordingly have
$p(T_i, \delta_i \mid b_i, \theta) = \{h_i(T_i)\}^{I(\delta_i = 1)} \exp\{-H_i(T_i)\}^{I(\delta_i \in \{0,1\})}\, \{1 - \exp[-H_i(T_i)]\}^{I(\delta_i = 2)}\, \{\exp[-H_i(T_i^L)] - \exp[-H_i(T_i^R)]\}^{I(\delta_i = 3)},$   (5)

where $I(\cdot)$ denotes the indicator function and $H_i(t) = \int_0^t h_i(s)\, ds$ the cumulative hazard. The integral in the definition of the cumulative hazard function does not have a closed-form solution, and thus a numerical method is employed for its evaluation. Standard options are the Gauss-Kronrod and Gauss-Legendre quadrature rules.
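As a minimal sketch of the quadrature step (Gauss-Legendre here; the hazard function and node count are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def cumulative_hazard(hazard, t, n_points=15):
    """Approximate H(t) = integral of h(s) ds over [0, t] with Gauss-Legendre quadrature."""
    nodes, weights = np.polynomial.legendre.leggauss(n_points)
    s = 0.5 * t * (nodes + 1.0)        # map the rule from [-1, 1] onto [0, t]
    return 0.5 * t * np.sum(weights * hazard(s))

# sanity check against a closed form: Weibull-type hazard h(s) = 1.5 * sqrt(s),
# for which H(4) = 4**1.5 = 8 (the quadrature is approximate, not exact, here)
H = cumulative_hazard(lambda s: 1.5 * np.sqrt(s), t=4.0)
```

In the joint model this integrand additionally depends on the subject-specific linear predictors, so the quadrature is evaluated per subject and per MCMC iteration.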
For the parameters of the longitudinal outcomes we use standard default priors. More specifically, we take independent normal priors with zero mean and variance 1000 for the fixed effects, and half-Student's t priors with 3 degrees of freedom for scale parameters. The covariance matrix of the random effects is parameterized in terms of a correlation matrix $\Omega$ and a vector of standard deviations $\sigma_d$. For the correlation matrix we use the LKJ-Correlation prior proposed by Lewandowski et al. (2009). For each element of $\sigma_d$ we use a half-Student's t prior with 3 degrees of freedom. For the regression coefficients $\gamma$ of the relative risk model we assume independent normal priors with zero mean and variance 1000. The same prior is also assumed for the vector of association parameters $\alpha$. However, when $\alpha$ becomes high dimensional (e.g., when several functional forms are considered per longitudinal outcome), we opt for a global-local ridge-type shrinkage prior. More specifically, for the $m$-th element of $\alpha$ we assume

$\alpha_m \mid \tau, \lambda_m \sim \mathcal{N}(0, \tau \lambda_m),$

with gamma hyperpriors for $\tau^{-1}$ and $\lambda_m^{-1}$. The global smoothing parameter $\tau$ has sufficient mass near zero to ensure shrinkage, while the local smoothing parameter $\lambda_m$ allows individual coefficients to attain large values. The motivation for using this type of prior distribution in this case is that we expect the different terms behind the specification of $f_k(\cdot)$ to be correlated, and many of the corresponding coefficients to be nonzero. Nonetheless, other options of shrinkage or variable-selection priors could also be used (Andrinopoulou and Rizopoulos, 2016). Finally, the penalized version of the B-spline approximation to the baseline hazard is specified using the following hierarchical prior for $\gamma_{h_0}$ (Lang and Brezger, 2004):

$p(\gamma_{h_0} \mid \tau_{h_0}) \propto \tau_{h_0}^{\rho(K)/2} \exp\Big(-\frac{\tau_{h_0}}{2}\, \gamma_{h_0}^\top K \gamma_{h_0}\Big),$
where $\tau_{h_0}$ is the smoothing parameter that takes a $\text{Gamma}(1, \tau_\delta)$ prior distribution, with a hyperprior $\tau_\delta \sim \text{Gamma}(10^{-3}, 10^{-3})$, which ensures a proper posterior distribution for $\gamma_{h_0}$ (Jullion and Lambert, 2007), $K = \Delta_r^\top \Delta_r + 10^{-6} I$, with $\Delta_r$ denoting the $r$-th difference penalty matrix, and where $\rho(K)$ denotes the rank of $K$.
4 Corrected TwoStage Approach
4.1 Importance sampling correction
Carrying out a full Bayesian estimation of the multivariate joint model is straightforward, using either Markov chain Monte Carlo (MCMC) or Hamiltonian Monte Carlo (HMC). However, this estimation becomes very challenging from a computational viewpoint, due to the high number of random effects involved, and the requirement for numerical integration in the calculation of the density of the survival outcome (5). This limitation has hampered the use of multivariate joint models in practice. The two-stage approach, which entails fitting the longitudinal and survival outcomes separately, is the solution most often used to overcome this computational deadlock. Using this approach, under the Bayesian framework, we would have the following two stages:

Stage I. We fit a multivariate mixed model for the longitudinal outcomes using either MCMC or HMC, and obtain a sample of size $M$ from the posterior, $\{\theta_y^{(m)}, b^{(m)};\ m = 1, \dots, M\} \sim p(\theta_y, b \mid y)$, where $\theta_y$ denotes the subset of the parameters that are included in the definition of the longitudinal submodels (including the parameters of the random-effects distribution).

Stage II. Utilizing the sample from Stage I, we obtain a sample for the parameters of the survival submodel $\theta_s^{(m)}$ from the corresponding posterior distribution, $p(\theta_s \mid T, \delta, b^{(m)}) \propto p(T, \delta \mid \theta_s, b^{(m)})\, p(\theta_s)$, where $\theta_s$ denotes the subset of the parameters that are included in the definition of the survival submodel, and $b^{(m)}$ denotes the $m$-th realization of the random effects from Stage I.
This two-stage procedure essentially entails the same number of iterations as the full Bayesian estimation of the multivariate joint model. The computational benefits stem from the fact that we do not need to numerically integrate the survival submodel density function in Stage I. Even though this approach greatly reduces the computational burden, there exists a substantial body of work demonstrating that it results in biased estimates, even in the simpler case of univariate joint models (see Tsiatis and Davidian, 2004; Rizopoulos, 2012, and references therein). This bias is a result of not working with the full joint distribution, which would produce estimates of $\theta_y$ and $b$ that are appropriately corrected for informative dropout relating to the occurrence of an event. To overcome this issue, we propose the correction of the estimates we obtain from the two-stage approach using importance sampling weights (Press et al., 2007, Section 7.9). In particular, the realizations $\{\theta_s^{(m)}, \theta_y^{(m)}, b^{(m)}\}$ that we have obtained using the two-stage approach can be considered a weighted sample from the full posterior of the multivariate joint model, with weights given by:
$w^{(m)} = \frac{p(\theta_s^{(m)}, \theta_y^{(m)}, b^{(m)} \mid T, \delta, y)}{p(\theta_s^{(m)} \mid T, \delta, b^{(m)})\; p(\theta_y^{(m)}, b^{(m)} \mid y)}.$   (6)
The numerator in this expression is the posterior distribution of the multivariate joint model, and the denominator the product of the corresponding posterior distributions from the two stages. From (6) we observe that the difference between fitting the full joint model and the two-stage approach is driven by the terms that do not cancel between the numerator and denominator. By expanding these terms we obtain
$w^{(m)} \propto p(T, \delta \mid b^{(m)}) = \int p(T, \delta \mid \theta_s, b^{(m)})\, p(\theta_s)\, d\theta_s.$   (7)
The resulting weights involve a marginal likelihood calculation, which we perform using a Laplace approximation, namely

$p(T, \delta \mid b^{(m)}) \approx (2\pi)^{p/2}\, |\hat{\Sigma}^{(m)}|^{1/2}\, p(T, \delta \mid \hat{\theta}_s^{(m)}, b^{(m)})\, p(\hat{\theta}_s^{(m)}),$

where $\hat{\theta}_s^{(m)} = \arg\max_{\theta_s} \log\{p(T, \delta \mid \theta_s, b^{(m)})\, p(\theta_s)\}$, $\hat{\Sigma}^{(m)}$ denotes the inverse of the negative Hessian of this log density evaluated at $\hat{\theta}_s^{(m)}$, $|A|$ denotes the determinant of matrix $A$, and $p$ denotes the dimensionality of the vector $\theta_s$. The extra computational burden of performing this Laplace approximation is minimal in practice, since good initial values can be provided from one iteration to the next, which substantially reduces the number of required optimization iterations for finding $\hat{\theta}_s^{(m)}$ (i.e., $\hat{\theta}_s^{(m)}$ is provided as an initial value to find $\hat{\theta}_s^{(m+1)}$).
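The Laplace step can be sketched generically in Python (an illustration under simplifying assumptions, not the paper's implementation): optimize the negative unnormalized log-posterior, then combine the mode, the curvature at the mode, and the Gaussian normalizing constant.

```python
import numpy as np
from scipy.optimize import minimize

def laplace_log_marginal(neg_log_joint, x0):
    """Laplace approximation to log of the integral of exp(-neg_log_joint(theta)).

    neg_log_joint: negative unnormalized log-density of theta;
    x0: initial value (e.g. the mode found at the previous MCMC iteration).
    """
    fit = minimize(neg_log_joint, x0, method="BFGS")
    p = len(np.atleast_1d(fit.x))
    hess_inv = np.atleast_2d(fit.hess_inv)   # BFGS estimate of the inverse Hessian
    sign, logdet = np.linalg.slogdet(hess_inv)
    return -fit.fun + 0.5 * p * np.log(2.0 * np.pi) + 0.5 * logdet

# sanity check on a Gaussian integrand: integral of exp(-x^2/2) dx = sqrt(2*pi),
# so the log value is 0.5 * log(2*pi)
val = laplace_log_marginal(lambda x: 0.5 * float(x[0]) ** 2, x0=np.array([1.0]))
```

For a Gaussian integrand the approximation is exact (up to optimizer tolerance), which makes it a convenient unit test; for the survival density the approximation error depends on how close the integrand is to Gaussian around its mode.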
4.2 Performance
To evaluate whether the introduction of the importance sampling weights alleviates the bias observed with the simple two-stage approach (i.e., without the weights), we perform a 'proof-of-concept' simulation study. In particular, we compare the proposed corrected two-stage approach with the simple two-stage approach, as well as the full multivariate joint model, in the case of two continuous longitudinal outcomes. The specific details of this simulation setting are given in Appendix A.1. The results from 500 simulated datasets are presented in Figure 1 and in the appendix, in Figures D.1 and D.2.
Figure D.1 shows boxplots of the computing times required to fit the joint model under the three approaches. Comparing the first two of these approaches, we see that the calculation of the importance-sampling weights in the corrected two-stage approach had minimal computational cost, with the full multivariate joint model taking substantially more time to fit. Figure D.2 shows boxplots of posterior means from the 500 datasets for the parameters of the two longitudinal submodels. We observe that all three approaches provide very similar results with minimal bias. Figure 1 shows the corresponding boxplots of posterior means for the parameters of the survival submodel. As expected, the full multivariate joint model returns unbiased results. Similarly, as has previously been reported in the literature, the simple two-stage approach exhibits considerable bias. We see that this bias persists for the corrected two-stage approach, even though, in theory, the use of the importance sampling weights should alleviate it (by adjusting the posterior means obtained via the simple two-stage approach such that they more closely resemble those from the full multivariate model).
5 Corrected TwoStage Approach with Random Effects
5.1 Importance sampling correction with random effects
The above result is unexpected, since (as per Figure D.2) the corrected two-stage (and indeed the simple two-stage) approach unbiasedly estimates both the fixed effects and the variance components of the longitudinal submodels. However, further investigation shows that there is a considerable difference between the corrected two-stage approach and the multivariate joint model with regard to the posterior of the random effects. This is depicted in Figure 2 for one of the longitudinal outcomes we have simulated. The data have been simulated such that higher values for the longitudinal outcome are associated with a higher hazard of the event. From Figure 2 we observe that the random effect estimates for the multivariate mixed model, and especially the random slope estimates for subjects with and without an event, differ from those for the multivariate joint model. In particular, we observe that the random slope estimates from the joint model are larger for subjects with an event compared to the linear mixed model, and vice versa for subjects without an event. This observation suggests that we could improve the weights given in (6) by updating, in the second stage, not only the parameters of the survival submodel but also the random effects $b$. That is, we obtain a sample for the parameters of the survival submodel from the corresponding joint posterior distribution,
$p(\theta_s, b \mid T, \delta, y, \theta_y^{(m)}) \propto p(T, \delta \mid \theta_s, b)\, p(y \mid b, \theta_y^{(m)})\, p(b \mid \theta_y^{(m)})\, p(\theta_s).$   (8)
Admittedly, simulating from $p(\theta_s, b \mid T, \delta, y, \theta_y^{(m)})$ is more computationally intensive than simulating from $p(\theta_s \mid T, \delta, b^{(m)})$, the corresponding second stage presented in Section 4, since we now also need to calculate the densities of the mixed-effects models for the longitudinal outcomes. Nonetheless, the computational gains compared to fitting the full joint model remain significant.
Under this second stage (8) the importance sampling weights now take the form:
$\tilde{w}^{(m)} = \frac{p(\theta_s^{(m)}, \theta_y^{(m)}, b^{(m)} \mid T, \delta, y)}{p(\theta_s^{(m)}, b^{(m)} \mid T, \delta, y, \theta_y^{(m)})\; p(\theta_y^{(m)} \mid y)}.$   (9)
Similarly to (6), the new weights have been formulated such that the difference lies in the terms that do not cancel between the numerator and denominator. Expanding these terms as in the previous section, we obtain:
$\tilde{w}^{(m)} \propto \frac{\int\!\!\int p(T, \delta \mid \theta_s, b)\, p(y \mid b, \theta_y^{(m)})\, p(b \mid \theta_y^{(m)})\, p(\theta_s)\, db\, d\theta_s}{\int p(y \mid \theta_y^{(m)}, b)\, p(b \mid \theta_y^{(m)})\, db},$   (10)
and the self-normalized weights are

$\hat{w}^{(m)} = \tilde{w}^{(m)} \Big/ \sum_{m'=1}^{M} \tilde{w}^{(m')}.$

The integrals in the numerator are once again approximated using the Laplace method. Namely, we let

$\hat{b}_i^{(m)} = \arg\max_{b_i} \log\{p(T_i, \delta_i \mid \theta_s, b_i)\, p(y_i \mid b_i, \theta_y^{(m)})\, p(b_i \mid \theta_y^{(m)})\}$

and $\hat{H}_{b_i}^{(m)}$ denote the corresponding negative Hessian matrix for the random effects, and, analogously, $\hat{H}_{\theta_s}^{(m)}$ denote the negative Hessian matrix for the parameters $\theta_s$ at the mode $\hat{\theta}_s^{(m)}$. Then, we approximate the inner integral by

$\int p(T_i, \delta_i \mid \theta_s, b_i)\, p(y_i \mid b_i, \theta_y^{(m)})\, p(b_i \mid \theta_y^{(m)})\, db_i \approx (2\pi)^{q_i/2}\, |\hat{H}_{b_i}^{(m)}|^{-1/2}\, p(T_i, \delta_i \mid \theta_s, \hat{b}_i^{(m)})\, p(y_i \mid \hat{b}_i^{(m)}, \theta_y^{(m)})\, p(\hat{b}_i^{(m)} \mid \theta_y^{(m)}),$

where $q_i$ denotes the number of random effects for subject $i$. Similarly, the outer integral over $\theta_s$ is approximated as $(2\pi)^{p/2}\, |\hat{H}_{\theta_s}^{(m)}|^{-1/2}$ times the integrand evaluated at $\hat{\theta}_s^{(m)}$.
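In practice the ratio in (10) is computed on the log scale, and it is prudent to monitor the resulting weights for degeneracy. The following Python sketch normalizes log-weights via the log-sum-exp trick and computes the Kish effective sample size, a standard importance-sampling diagnostic (the diagnostic is our suggestion here, not something reported in this section; the log-weight values are made up for illustration):

```python
import numpy as np

def normalize_log_weights(log_w):
    """Self-normalized weights from unnormalized log-weights (log-sum-exp trick)."""
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - log_w.max())    # subtract the max to avoid overflow/underflow
    return w / w.sum()

def effective_sample_size(w):
    """Kish ESS: equals M for uniform weights, approaches 1 when degenerate."""
    return 1.0 / np.sum(w ** 2)

# illustrative unnormalized log-weights, e.g. log of the ratio in (10)
log_w = np.array([-1001.3, -1000.2, -1000.9, -1000.4])
w = normalize_log_weights(log_w)
ess = effective_sample_size(w)
```

Working on the log scale matters because the marginal likelihood terms in (10) can easily under- or overflow in double precision when exponentiated directly.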
Given the requirement for a double Laplace approximation, and the fact that the denominator does not simplify, the calculation of the weights given by (10) is more computationally intensive than that of the weights presented in Section 4. Nevertheless, these computations still remain orders of magnitude faster than fitting the full joint model.
5.2 Performance
To assess whether updating the random effects in the importance sampling weights alleviates the bias we observed in Section 4.2, we have reanalyzed the same simulated datasets. The details are again given in Appendix A.1. The results from 500 simulated datasets are presented in Figures 3, D.1, and D.3. As anticipated, the corrected two-stage approach with updated random effects added only a small computational cost, with the full multivariate joint model still taking considerably more time to fit than either of the corrected two-stage approaches (Figure D.1). The boxplots depicting the posterior means from the 500 datasets for the parameters of the longitudinal submodels once again demonstrate similar results for all three approaches (Figure D.3). Figure 3 shows the posterior means for the parameters of the survival submodel. We observe that the bias seen for the corrected two-stage approach is now eliminated, with the posterior means from the approach with updated random effects closely approximating those from the full multivariate joint model.
6 Extra Simulations
Further simulations were performed in order to assess the performance of the importance-sampling-corrected two-stage approach with updated random effects in different scenarios. Details of these simulations are given in Appendices A.2 and A.3.
6.1 Scenario II
Scenario II included 6 continuous longitudinal outcomes. Owing to the increased number of outcomes, the full multivariate joint model was not run. Table 1 shows the bias for the parameters of the survival submodel, together with the RMSE and coverage (based on the 2.5% and 97.5% credibility intervals for each parameter). Table C.1 in the appendix shows the same information for the parameters of the 6 longitudinal outcomes.

Parameter     Bias   RMSE   Coverage
Group         0.02   0.19   0.96
$\alpha_1$    0.01   0.04   0.93
$\alpha_2$    0.01   0.04   0.93
$\alpha_3$    0.01   0.16   0.95
$\alpha_4$    0.03   0.17   0.95
$\alpha_5$    0.03   0.05   0.83
$\alpha_6$    0.02   0.04   0.88
6.2 Scenario III
Scenario III again included 6 longitudinal outcomes, now of varying types: 3 continuous and 3 binary. Table 2 demonstrates yet again the alleviation of the bias achieved by updating the random effects.
Parameter     Bias   RMSE   Coverage
Group         0.01   0.91   0.97
$\alpha_1$    0.01   0.04   1.00
$\alpha_2$    0.01   0.04   1.00
$\alpha_3$    0.00   0.43   0.90
$\alpha_4$    0.02   0.88   0.96
$\alpha_5$    0.02   0.04   1.00
$\alpha_6$    0.01   0.05   0.99
7 Analysis of the BioSHiFT Dataset
In this section, we present the analysis of data from the BioSHiFT cohort study. During a median follow-up period of 2.4 years (IQR: 2.32 to 2.45), estimated using the reverse Kaplan-Meier methodology (Shuster, 1991), 66/254 (26%) patients experienced the primary event of interest (a composite event, consisting of hospitalisation for heart failure, cardiac death, LVAD placement, and heart transplantation). Biomarkers were measured at inclusion and subsequently every 3 months until the end of follow-up. We focus on 6 biomarkers: the glomerular marker Cystatin C (CysC); two tubular markers, urinary N-acetyl-beta-D-glucosaminidase (NAG) and kidney injury molecule-1 (KIM-1); and the markers N-terminal pro-BNP (NT-proBNP), high-sensitivity cardiac troponin T (HsTNT), and C-reactive protein (CRP). The latter three markers are known to be related to poor outcomes in CHF patients, and measure various aspects of heart failure pathophysiology (wall stress, myocyte damage, and inflammation, respectively). All biomarkers were logarithmically transformed for further analysis (log base 2) due to skewness.
For each of NT-proBNP, HsTNT and CRP, we included natural cubic splines in both the fixed- and random-effects parts of their longitudinal models, with differing numbers of knots per outcome (Figure D.4). Simple linear models with random intercept and slope were used for CysC, NAG and KIM-1. Thus, for each of CysC, NAG and KIM-1 ($k = 1, 2, 3$), we fit:

$y_{ik}(t) = (\beta_{0k} + b_{ik0}) + (\beta_{1k} + b_{ik1})\, t + \varepsilon_{ik}(t).$

For the remaining outcomes ($k = 4, 5, 6$), we have:

$y_{ik}(t) = (\beta_{0k} + b_{ik0}) + \sum_{q=1}^{Q_k} (\beta_{qk} + b_{ikq})\, B_{qk}(t) + \varepsilon_{ik}(t),$

where $B_k(t)$ denotes the B-spline basis matrix for a natural cubic spline of time, with two internal knots placed at the 25th and 75th percentiles of the follow-up times for NT-proBNP, and one internal knot placed at the 50th percentile of the follow-up times for each of HsTNT and CRP. Boundary knots were set at the 5th and 95th percentiles. We assume a multivariate normal distribution for the random effects, $b_i \sim \mathcal{N}(0, D)$, where $D$ is an unstructured variance-covariance matrix. For the survival process, we included the baseline variables: (standardized) age, sex, NYHA class (class III/IV vs. class I/II), use of diuretics, presence or absence of ischemic heart disease (IHD), diabetes mellitus, (standardized) BMI, and the estimated glomerular filtration rate (eGFR) value.
We fit 3 joint models, in each case using the global-local ridge-type shrinkage prior previously described. Model 1 included only the current underlying value of the longitudinal marker for each of the 6 markers. Model 2 included the current value and slope for each marker, and Model 3 included the integrated longitudinal profile for each marker (AUC). We thus have, for $k = 1, \dots, 6$:

Model 1: $f_k = \alpha_k\, \eta_{ik}(t)$; Model 2: $f_k = \alpha_{1k}\, \eta_{ik}(t) + \alpha_{2k}\, \eta_{ik}'(t)$; Model 3: $f_k = \alpha_k \int_0^t \eta_{ik}(s)\, ds.$

The parameter estimates and 95% credibility intervals for the event process are presented in Table 3. Hazard ratios are presented per doubling of level, slope or AUC at any point in time.
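Because the markers enter the model on the log2 scale, a doubling of the raw marker raises the corresponding linear predictor by exactly one unit, so the hazard ratio per doubling is simply the exponentiated association coefficient. A one-line Python illustration (the coefficient value is hypothetical, chosen to reproduce an HR close to the NT-proBNP estimate in Model 1):

```python
import numpy as np

# markers are modeled on the log2 scale, so +1 unit in the linear predictor
# corresponds to a doubling of the raw marker value
alpha = 0.62                       # hypothetical association coefficient
hr_per_doubling = np.exp(alpha)    # close to 1.86
```

The same logic applies to the slope and AUC terms, since those are linear functionals of the log2-scale trajectory.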
Event Process

Model 1: Current value
                                HR (95% CI)           HR*    p-value
Age                             1.01 (0.99 to 1.03)   1.00   0.242
Sex (Male vs. Female)           1.00 (0.77 to 1.31)   1.06   0.970
NYHA Class (III/IV vs. I/II)    1.47 (0.97 to 2.64)   2.52   0.134
Diuretics (Yes vs. No)          1.12 (0.79 to 2.74)   0.96   0.770
IHD (Yes vs. No)                1.06 (0.88 to 1.55)   0.95   0.696
eGFR                            1.01 (1.00 to 1.02)   1.01   0.048
BMI                             1.04 (0.99 to 1.10)   1.03   0.140
Diabetes Mellitus               1.08 (0.86 to 1.79)   0.93   0.638
Value of CRP                    1.44 (1.10 to 1.89)   1.25   0.008
Value of HsTNT                  1.32 (0.99 to 1.84)   1.35   0.078
Value of NT-proBNP              1.86 (1.45 to 2.37)   2.20   0.0001
Value of CysC                   1.04 (0.55 to 2.48)   0.90   0.966
Value of NAG                    0.89 (0.51 to 1.38)   0.55   0.660
Value of KIM-1                  0.96 (0.74 to 1.22)   1.21   0.768
Model 2: Current value and slope
                                HR (95% CI)           HR*    p-value
Age                             1.01 (0.99 to 1.03)   1.00   0.162
Sex (Male vs. Female)           0.99 (0.66 to 1.34)   0.91   0.994
NYHA Class (III/IV vs. I/II)    1.61 (0.97 to 2.97)   1.53   0.104
Diuretics (Yes vs. No)          1.20 (0.79 to 4.64)   0.95   0.736
IHD (Yes vs. No)                1.06 (0.84 to 1.56)   1.36   0.750
eGFR                            1.01 (1.00 to 1.02)   1.01   0.090
BMI                             1.04 (0.98 to 1.10)   1.08   0.236
Diabetes Mellitus               1.08 (0.84 to 1.73)   1.03   0.676
Value of CRP                    1.52 (1.19 to 1.99)   1.38   0.002
Value of HsTNT                  1.37 (1.00 to 1.89)   1.08   0.048
Value of NT-proBNP              1.90 (1.48 to 2.44)   1.61   0.0001
Value of CysC                   1.05 (0.47 to 3.20)   0.92   0.996
Value of NAG                    0.74 (0.36 to 1.22)   1.04   0.292
Value of KIM-1                  0.92 (0.69 to 1.18)   1.02   0.534
Slope of CRP                    0.95 (0.55 to 1.50)   0.91   0.878
Slope of HsTNT                  0.84 (0.24 to 1.91)   1.05   0.734
Slope of NT-proBNP              1.02 (0.70 to 1.48)   0.87   0.926
Slope of CysC                   1.75 (0.19 to 356.56) 1.35   0.826
Slope of NAG                    1.03 (0.08 to 6.63)   0.84   0.946
Slope of KIM-1                  2.37 (0.61 to 63.48)  0.93   0.460
Model 3: AUC
                                HR (95% CI)           HR*    p-value
Age                             1.00 (0.98 to 1.03)   1.01   0.812
Sex                             1.02 (0.83 to 1.38)   1.00   0.944
NYHA Class (III/IV vs. I/II)    1.80 (0.99 to 3.25)   1.83   0.080
Diuretics (Yes vs. No)          1.18 (0.83 to 3.72)   1.22   0.680
IHD (Yes vs. No)                1.04 (0.85 to 1.40)   1.21   0.742
eGFR                            1.01 (1.00 to 1.02)   1.00   0.166
BMI                             1.03 (0.98 to 1.08)   1.02   0.286
Diabetes Mellitus               1.11 (0.89 to 1.87)   0.99   0.564
AUC of CRP                      1.18 (0.99 to 1.43)   1.10   0.064
AUC of HsTNT                    1.20 (0.98 to 1.56)   1.06   0.082
AUC of NT-proBNP                1.47 (1.21 to 1.79)   1.50   0.0001
AUC of CysC                     1.13 (0.73 to 2.60)   1.38   0.704
AUC of NAG                      0.96 (0.63 to 1.34)   1.12   0.846
AUC of KIM-1                    1.00 (0.82 to 1.19)   1.01   0.960
Following adjustment for covariates, the estimated association parameters in Model 1 indicate significant associations between the risk of the composite event and the current underlying values of NT-proBNP and CRP, such that there is a 1.86-fold increase in the risk of the composite event (95% CI: 1.45 to 2.37) per doubling of NT-proBNP level, and a 1.44-fold increase in the risk of the composite event (95% CI: 1.10 to 1.89) per doubling of CRP level. No significant associations were found for any of HsTNT, CysC, NAG or KIM-1. Similarly, no significant associations were found for the current underlying values of CysC, NAG or KIM-1 in Model 2, nor were there any significant associations between the risk of the composite event and the slopes of the 6 continuous markers. Model 3 indicates a significant association for NT-proBNP, with a 1.47-fold increase in the risk of the composite event (95% CI: 1.21 to 1.79) per doubling of the area under the NT-proBNP profile.
Since the parameter estimates for each of the longitudinal outcomes remained fairly constant across models, to avoid repetition, the estimates and 95% credibility intervals are presented for one model only (Table C.4).
In previous analyses of these same 6 markers, the current underlying value, instantaneous slope and area under the curve of each marker were each assessed independently of one another. Van Boven et al. (2018) found significant associations in all cases for CRP, HsTNT and NT-proBNP, and Brankovic et al. (2018) found significant associations for the current underlying values and slopes of each of CysC, NAG and KIM-1, and for the areas under the curves for CysC and NAG. Van Boven et al. (2018) provided an additional multivariate analysis for CRP, HsTNT and NT-proBNP, wherein the predicted individual profiles for each marker were separately determined, and functions thereof were simultaneously included in a single extended Cox model as time-varying covariates. Models therefore included either the current underlying values, the instantaneous slopes or the areas under the curves for all 3 markers simultaneously. In that analysis, only CRP and NT-proBNP were found to be independently predictive of the composite event, with significant associations for each of the current underlying values and slopes of these markers. In the model for the areas under the curves, only NT-proBNP was significant.
8 Discussion
In this paper, we presented a novel approach for fitting joint models which allows for the inclusion of multivariate longitudinal outcomes with realistic computing times. We demonstrated once again the bias of the estimated parameters for the survival process characteristic of the standard two-stage approach, and proposed the use of an importance-sampling-corrected two-stage approach, with updated random effects, in its place. Our approach was shown to be successful, producing satisfactory results in a number of simulation scenarios: both survival and longitudinal estimates were unbiased, and computing times were reduced by several orders of magnitude compared to the full multivariate joint model. We were easily able to incorporate multiple outcomes in the analysis of the BioSHiFT data, obtaining very similar results to those previously noted for the CHF-related biomarkers (CRP, HsTNT and NT-proBNP). We did not find any significant associations between any of the renal markers (CysC, NAG and KIM-1) and the risk of the composite event in the multivariate analysis, indicating that their predictive value may not be independent of the CHF-related markers. While the simulations included up to 6 multiple outcomes of varying types, it would be interesting to confirm our results in even more complex settings (perhaps incorporating competing risks such as those present in the BioSHiFT study), and to try to determine the limits of the methodology. A further topic for research would be methods for increasing the speed of computation involved in fitting the multivariate mixed model itself, so as to extend the number of outcomes even further.
The proposed importance-sampling-corrected two-stage estimation approach is implemented in function mvJointModelBayes() in the freely available package JMbayes (version 0.8-0) for the R programming language (freely available from the Comprehensive R Archive Network at http://cran.r-project.org/package=JMbayes). An example of how this function should be used can be found in the appendices.
Acknowledgements
The first and last authors acknowledge support by the Netherlands Organization for Scientific Research VIDI grant nr. 016.146.301.
Appendix A Simulation Study Design
A.1 Scenario I
Scenario I simulates 500 patients with a maximum of 15 repeated measurements per patient. We included $K = 2$ continuous longitudinal outcomes and one survival outcome. The longitudinal outcomes each had the form

$y_{ik}(t) = (\beta_{0k} + b_{ik0}) + (\beta_{1k} + b_{ik1})\, t + \varepsilon_{ik}(t), \quad k = 1, 2,$

with $b_i = (b_{i10}, b_{i11}, b_{i20}, b_{i21})^\top \sim \mathcal{N}(0, D)$, where $D$ is an unstructured variance-covariance matrix, and measurement errors $\varepsilon_{ik}(t) \sim \mathcal{N}(0, \sigma_k^2)$. Time was simulated from a uniform distribution between 0 and 25. For the survival outcome, we used a relative risk model adjusting for group allocation, with the longitudinal outcomes linked to the hazard through their current underlying values.
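Event times under such a model are typically generated by inverse-transform sampling: draw $U \sim \text{Uniform}(0, 1)$ and solve $H(t) = -\log U$ for $t$. The following Python sketch illustrates this for one subject (the hazard function, its coefficients, and the censoring horizon are illustrative assumptions, not the exact values used in Scenario I):

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

rng = np.random.default_rng(2)

def simulate_event_time(hazard, t_max=25.0):
    """Inverse-transform sampling: solve H(t) = -log(U), U ~ Uniform(0, 1).

    Returns t_max (administrative censoring) if no event occurs by t_max.
    """
    target = -np.log(rng.uniform())
    H = lambda t: quad(hazard, 0.0, t)[0]     # numerically integrate the hazard
    if H(t_max) < target:
        return t_max                          # censored at end of follow-up
    return brentq(lambda t: H(t) - target, 1e-8, t_max)

# one subject: linear trajectory eta(t) = 0.2 - 0.05 t, current-value association
hazard = lambda t: 0.05 * np.exp(0.5 * (0.2 - 0.05 * t))
t_event = simulate_event_time(hazard)
```

Because the hazard depends on the subject-specific trajectory, $H(t)$ has no closed-form inverse in general, hence the root-finding step.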