David M Hughes, Arnošt Komárek, Gabriela Czanner, Marta Garcia-Fiñana.
Abstract
There is an emerging need in clinical research to accurately predict patients' disease status and disease progression by optimally integrating multivariate clinical information. Clinical data are often collected over time for multiple biomarkers of different types (e.g. continuous, binary and counts). In this paper, we present a flexible and dynamic (time-dependent) discriminant analysis approach in which multiple biomarkers of various types are jointly modelled for classification purposes by the multivariate generalized linear mixed model. We propose a mixture of normal distributions for the random effects to allow additional flexibility when modelling the complex correlation between longitudinal biomarkers and to robustify the model and the classification procedure against misspecification of the random effects distribution. These longitudinal models are subsequently used in a multivariate time-dependent discriminant scheme to predict, at any time point, the probability of belonging to a particular risk group. The methodology is illustrated using clinical data from patients with epilepsy, where the aim is to identify patients who will not achieve remission of seizures within a five-year follow-up period.
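In generic discriminant analysis, the step from group-specific models to "the probability of belonging to a particular risk group" is Bayes' rule, P(g | y) ∝ π_g f_g(y), where π_g is the prior proportion of group g and f_g(y) the likelihood of a patient's data under that group's model. The sketch below assumes the log marginal likelihoods log f_g(y) under each group's fitted model are already available (computing them requires integrating over the random effects, which is not shown). The function name and all numbers are hypothetical and not taken from the paper.

```python
import math

def group_membership_probs(log_liks, priors):
    """Posterior group membership probabilities via Bayes' rule.

    log_liks: dict mapping group name -> log f_g(y), the (assumed
        precomputed) log marginal likelihood of one patient's data.
    priors: dict mapping group name -> prior probability pi_g.
    """
    # Combine on the log scale and subtract the maximum to avoid underflow.
    log_num = {g: math.log(priors[g]) + log_liks[g] for g in log_liks}
    m = max(log_num.values())
    unnorm = {g: math.exp(v - m) for g, v in log_num.items()}
    total = sum(unnorm.values())
    return {g: u / total for g, u in unnorm.items()}

# Hypothetical log-likelihoods and priors, for illustration only.
probs = group_membership_probs(
    log_liks={"refractory": -120.4, "remission": -118.9},
    priors={"refractory": 0.1, "remission": 0.9},
)
```

Working on the log scale matters in practice because likelihoods of long longitudinal histories are tiny numbers that underflow if exponentiated directly.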
Keywords: Discriminant analysis; mixture distributions; multivariate generalized linear mixed model; multivariate longitudinal data; random effects
Year: 2016 PMID: 27789653 PMCID: PMC5985589 DOI: 10.1177/0962280216674496
Source DB: PubMed Journal: Stat Methods Med Res ISSN: 0962-2802 Impact factor: 3.021
Figure 1. Observed longitudinal profiles of an indicator of whether a patient had seizures, and of the number of adverse events experienced since the last visit, for patients from the Remission group (left column) and the Refractory group (right column). In both groups, profiles of only 20 randomly selected patients are shown for clarity. Solid bold lines show LOESS-smoothed profiles calculated using data from all patients. The data indicating whether a patient had seizures have been vertically jittered to aid interpretation.
Penalized expected deviance for models with mixture components in the random effects distribution.
| Group | | | |
|---|---|---|---|
| Remission | 37,305 | 36,669 | **36,607** |
| Refractory | **9,740** | 10,403 | 10,497 |
These values are based on the full data available in each group. The models with the best (lowest) PED values are shown in bold for each group.
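The table above compares random-effects distributions with different numbers of mixture components. To make that concrete, the sketch evaluates a K-component normal mixture density, Σ_k w_k N(b; μ_k, Σ_k), at a random-effects vector b; with K = 1 it reduces to the usual single multivariate normal. Function names and parameter values are hypothetical, and this is an illustration of the distribution only, not the authors' model-fitting code.

```python
import numpy as np

def mvn_pdf(b, mean, cov):
    """Density of a multivariate normal N(mean, cov) evaluated at b."""
    d = len(mean)
    diff = np.asarray(b, dtype=float) - np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    quad = diff @ np.linalg.solve(cov, diff)  # (b - mu)' Sigma^{-1} (b - mu)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * quad) / norm)

def mixture_pdf(b, weights, means, covs):
    """Density of the K-component normal mixture sum_k w_k N(mu_k, Sigma_k)."""
    assert np.isclose(sum(weights), 1.0), "mixture weights must sum to 1"
    return sum(w * mvn_pdf(b, m, S) for w, m, S in zip(weights, means, covs))

# Hypothetical K = 2 mixture for a bivariate random-effects vector.
density = mixture_pdf(
    [0.5, -0.2],
    weights=[0.6, 0.4],
    means=[[0.0, 0.0], [1.0, -0.5]],
    covs=[np.eye(2), 0.5 * np.eye(2)],
)
```

A mixture can represent skewed or multimodal random-effects distributions that a single normal cannot, which is the flexibility the abstract refers to.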
Comparison of the choice of K and its effect on the marginal prediction accuracy.
| Cutoff | Sensitivity | Specificity | PCC | AUC | PPV | NPV |
|---|---|---|---|---|---|---|
| 0.75 | 0.94 | 0.91 | 0.91 | 0.96 | 0.55 | 0.99 |
| 0.74 | 0.94 | 0.92 | 0.92 | 0.97 | 0.58 | 0.99 |
| 0.67 | 0.93 | 0.91 | 0.91 | 0.96 | 0.56 | 0.99 |
| 0.71 | 0.93 | 0.91 | 0.91 | 0.96 | 0.55 | 0.99 |
PCC: probability of correct classification; AUC: area under curve; PPV: positive predictive value; NPV: negative predictive value.
The predictions are based on 100 splits of the data, where 70% of the patients in each group were used to train the multivariate generalized linear mixed models (MGLMMs) and the remaining 30% were used to test the predictive accuracy.
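The repeated 70/30 splitting can be sketched as follows, splitting each group separately so that training and test sets keep the group proportions. The function name, the seed and the rounding of the 70% share are assumptions for illustration, not details given in the paper.

```python
import random

def stratified_splits(ids_by_group, n_splits=100, train_frac=0.7, seed=0):
    """Yield (train_ids, test_ids) pairs, splitting each group separately.

    ids_by_group: dict mapping group label -> list of patient IDs.
    """
    rng = random.Random(seed)  # fixed seed for reproducible splits
    for _ in range(n_splits):
        train, test = [], []
        for ids in ids_by_group.values():
            shuffled = list(ids)
            rng.shuffle(shuffled)
            n_train = round(train_frac * len(shuffled))  # assumed rounding rule
            train.extend(shuffled[:n_train])
            test.extend(shuffled[n_train:])
        yield train, test

# Hypothetical patient IDs: 20 remission and 10 refractory patients.
splits = list(stratified_splits({
    "remission": list(range(20)),
    "refractory": list(range(100, 110)),
}))
```

Stratifying matters here because the refractory group is much smaller; an unstratified split could leave a test set with very few refractory patients.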
Posterior summary statistics and highest posterior density (HPD) credible intervals for the fixed effects, and random effects in a model with K = 2. These statistics are based on the full longitudinal data available in each group.
| Parameter | Remission: posterior mean | Remission: 95% HPD interval | Refractory: posterior mean | Refractory: 95% HPD interval |
|---|---|---|---|---|
| Seizures ( | ||||
| | 1.2 | (−7.2, 9.4) | 1 | (1, 1.4) |
| | −3.7 | (−4.1, −3.3) | 4.8 | (4, 91) |
| | −6.2 | (−12, 0) | 7.9 | (−1.1, 2.6) |
| | 0.64 | (0.39, 0.87) | 1.00 | (0.30, 1.66) |
| | −0.10 | (−0.31, 0.12) | −0.09 | (−0.69, 0.53) |
| | −0.47 | (−0.69, −0.26) | 1.44 | (−0.71, 3.75) |
| | −0.32 | (−0.64, −0.01) | 1.25 | (−0.66, 3.83) |
| SD(Intercept) (SD | 2.11 | (1.94, 2.28) | 3.43 | (1.14, 7.62) |
| log( | ||||
| | 1.1 | (9,14) | 3 | (2.6, 3.7) |
| | −1.6 | (−1.7, −1.5) | 1.6 | (3.8, 29) |
| | −3.6 | (−5.4, −2) | −5.6 | (−17, 4.6) |
| | 0.11 | (0.03, 0.18) | 0.37 | (0.02, 0.77) |
| | −0.06 | (−0.13, 0.01) | −0.29 | (−0.63, 0.04) |
| | −0.11 | (−0.18, −0.04) | 0.41 | (−0.63, 1.36) |
| | 1.05 | (0.96, 1.15) | 1.77 | (1.34, 2.22) |
| SD(Intercept) (SD | 0.78 | (0.74, 0.83) | 1.05 | (0.92, 1.20) |
| SD(error) | 0.89 | (0.88, 0.91) | 1.11 | (1.07, 1.15) |
| Number of adverse events ( | ||||
| | −1.3 | (−1.9, −1) | −1 | (−1.6, 0) |
| | −1.2 | (−1.4, −1) | −3.6 | (−5.3, −1.9) |
| | 7.1 | (3.7, 10) | 1.8 | (1, 2.6) |
| | 0.23 | (0.09, 0.36) | −0.16 | (−0.48, 0.16) |
| | −0.09 | (−0.22, 0.03) | −0.16 | (−0.42, 0.11) |
| | −0.28 | (−0.40, −0.16) | 0.63 | (−0.26, 1.51) |
| | −0.90 | (−1.08, −0.71) | −0.92 | (−1.32, −0.53) |
| SD(Intercept) (SD | 0.93 | (0.84, 1.01) | 0.76 | (0.51, 1.16) |
SD: standard deviation; TLFU: Time since Last Follow Up.
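Highest posterior density (HPD) intervals are commonly estimated from posterior draws as the shortest interval containing the required fraction of the sorted draws, an approach that assumes a unimodal posterior (R's coda::HPDinterval works along these lines). A minimal sketch, assuming the draws for one parameter are available as a list; it is not necessarily how the intervals in this table were computed, and the function name is hypothetical.

```python
def hpd_interval(samples, prob=0.95):
    """Shortest interval containing a fraction `prob` of the draws."""
    x = sorted(samples)
    n = len(x)
    k = min(n, max(1, round(prob * n)))  # number of draws inside the interval
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: x[i + k - 1] - x[i])
    return x[best], x[best + k - 1]

# A right-skewed set of hypothetical draws: the HPD interval avoids the long tail.
lo, hi = hpd_interval([0, 1, 2, 3, 4, 5, 6, 7, 8, 100], prob=0.9)
```

Unlike an equal-tailed interval, the HPD interval shifts towards the bulk of a skewed posterior, which is why it can be asymmetric about the posterior mean.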
Summary of the classification accuracy for each of the marginal, conditional and random effects methods and for traditional LDA and QDA.
| | Marginal | Conditional | Random effects | Marginal (full data) | Conditional (full data) | Random effects (full data) | LDA | QDA |
|---|---|---|---|---|---|---|---|---|
| Cutoff | 0.74 | 0.44 | 0.27 | 0.52 | 0.22 | 0.16 | 0.17 | 0.33 |
| Sensitivity | 0.94 | 0.91 | 0.82 | 0.93 | 0.93 | 0.84 | 0.80 | 0.80 |
| Specificity | 0.92 | 0.91 | 0.72 | 0.94 | 0.92 | 0.80 | 0.74 | 0.74 |
| PCC | 0.92 | 0.91 | 0.73 | 0.94 | 0.92 | 0.80 | 0.74 | 0.75 |
| AUC | 0.97 | 0.96 | 0.83 | 0.96 | 0.95 | 0.89 | 0.76 | 0.48 |
| PPV | 0.59 | 0.56 | 0.26 | 0.65 | 0.59 | 0.34 | 0.26 | 0.27 |
| NPV | 0.99 | 0.99 | 0.97 | 0.99 | 0.99 | 0.98 | 0.97 | 0.97 |
| Mean lead time (days) | 651 | 634 | 1000 | 75 | 78 | 65 | | |
| Mean prediction time (days) | 876 | 899 | 522 | 1450 | 1451 | 1459 | | |
LDA: linear discriminant analysis; QDA: quadratic discriminant analysis; PCC: probability of correct classification; AUC: area under curve; PPV: positive predictive value; NPV: negative predictive value.
These results are based on averages across 100 splits of the data into training and test sets. For the dynamic longitudinal discriminant analysis (LoDA; first three columns), prediction stops once a patient is predicted as refractory, whilst for the full-data predictions (columns 4 to 6), all data up to the visit before the group status is confirmed are used in the prediction. The final two columns present the results of prediction using LDA and QDA based on baseline characteristics only, using no longitudinal information.
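The dynamic scheme stops predicting once a patient is classified as refractory. A minimal sketch of that rule, assuming each visit yields a predicted probability of the refractory group and that a patient is classified as refractory as soon as this probability reaches the cutoff (whether the paper uses ≥ or > is an assumption). The lead-time helper assumes lead time is the gap between that classification and the time the group status is confirmed; both function names and all values are hypothetical.

```python
def dynamic_prediction(visit_times, probs, cutoff):
    """Return the time of the first refractory classification, or None.

    visit_times: increasing visit times (days).
    probs: predicted probability of the refractory group after each visit.
    Prediction stops at the first visit whose probability reaches the cutoff.
    """
    for t, p in zip(visit_times, probs):
        if p >= cutoff:  # assumed: ">=" rather than ">"
            return t
    return None  # never classified as refractory during follow-up

def lead_time(prediction_time, confirmation_time):
    """Hypothetical lead time: days between prediction and confirmed status."""
    return confirmation_time - prediction_time

# Hypothetical patient classified as refractory at day 200, confirmed at day 1000.
t_pred = dynamic_prediction([100, 200, 300], [0.1, 0.8, 0.9], cutoff=0.74)
```

Stopping at the first crossing is what makes the dynamic lead times much longer than those of the full-data predictions, which wait until the visit before the status is confirmed.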
Figure 2. Receiver Operating Characteristic curves of the dynamic LoDA using the marginal (solid red), conditional (dashed blue) and random effects (dot dashed green) prediction methods.
The longitudinal observations on a randomly selected refractory and remission patient.
| Time (days) | Probability of being refractory | Seizures | Total number of seizures | Number of adverse events |
|---|---|---|---|---|
| Patient (a) | | | | |
| 93 | 0.15 | Yes | 10 | 3 |
| 184 | 0.21 | Yes | 36 | 2 |
| 366 | 0.33 | Yes | 40 | 0 |
| 720 | 0.71 | Yes | 70 | 0 |
| 833 | | Yes | 30 | 3 |
| 924 | 0.99 | Yes | 100 | 3 |
| 1101 | 1 | Yes | 150 | 0 |
| 1295 | 1 | Yes | 72 | 0 |
| 1480 | 1 | Yes | 100 | 0 |
| Patient (b) | | | | |
| 84 | 0.02 | No | 0 | 1 |
| 259 | 0.00 | No | 0 | 0 |
| 418 | 0.23 | Yes | 20 | 0 |
| 509 | 0.12 | No | 0 | 2 |
| 718 | 0.02 | No | 0 | 0 |
| 862 | – | No | 0 | 2 |
The refractory patient was a 35-year-old male with generalized epilepsy randomized before 6 June 2001, whilst the remission patient was a 44-year-old male with generalized epilepsy also randomized before 6 June 2001.
Figure 3. Changes in marginal group membership probabilities over time. The profiles are from one test set of 30% of patients; their probabilities are calculated using the model developed on the remaining 70% of patients. The top row shows patients whose true status is refractory, whilst the bottom row shows the true remission patients. The left-hand panels show all patients who are classified as remission within five years. The right-hand panels show the patients who are predicted as refractory (up to the point at which they are classified as refractory).
Comparison of possible models under the marginal prediction scheme based on averages of 100 splits of the data into training and test sets.
| Model | Cutoff | Sensitivity | Specificity | PCC | AUC | PPV | NPV | Mean lead time (days) | Mean prediction time (days) |
|---|---|---|---|---|---|---|---|---|---|
| | 0.61 | 0.94 | 0.94 | 0.94 | 0.98 | 0.64 | 0.99 | 502 | 1041 |
| | 0.43 | 0.89 | 0.87 | 0.87 | 0.94 | 0.45 | 0.99 | 860 | 666 |
| | 0.13 | 0.71 | 0.69 | 0.70 | 0.78 | 0.22 | 0.95 | 1001 | 535 |
| | 0.75 | 0.93 | 0.92 | 0.92 | 0.97 | 0.57 | 0.99 | 656 | 871 |
| | 0.54 | 0.94 | 0.92 | 0.92 | 0.97 | 0.58 | 0.99 | 593 | 952 |
| | 0.45 | 0.90 | 0.89 | 0.89 | 0.95 | 0.49 | 0.99 | 834 | 692 |
| | 0.72 | 0.94 | 0.92 | 0.92 | 0.97 | 0.58 | 0.99 | 659 | 869 |
PCC: probability of correct classification; AUC: area under curve; PPV: positive predictive value; NPV: negative predictive value.
Y1 denotes whether a patient experienced seizures since the previous visit, Y2 describes the total number of seizures experienced since the previous visit (after transformation), and Y3 describes the number of adverse events experienced since the previous visit. The optimal cutoff for each model was determined by ROC analysis, selecting the top-left-most point of the ROC curve.
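One common reading of "the top-left-most point of the ROC curve" is the cutoff whose point (1 − specificity, sensitivity) lies closest to (0, 1); the Youden index is an alternative criterion. The sketch assumes the closest-to-(0, 1) reading, codes refractory as 1, and also computes the accuracy measures reported in these tables from a confusion matrix. Function names and data are hypothetical.

```python
import math

def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, PCC, PPV and NPV (1 = refractory).

    Assumes both classes are present in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "pcc": (tp + tn) / len(pairs),  # probability of correct classification
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }

def top_left_cutoff(y_true, scores):
    """Cutoff whose ROC point is closest to the top-left corner (0, 1)."""
    best_cut, best_dist = None, math.inf
    for cut in sorted(set(scores)):
        m = confusion_metrics(y_true, [1 if s >= cut else 0 for s in scores])
        dist = math.hypot(1 - m["sensitivity"], 1 - m["specificity"])
        if dist < best_dist:
            best_cut, best_dist = cut, dist
    return best_cut

# Hypothetical predicted probabilities for five test patients.
cut = top_left_cutoff([0, 0, 0, 1, 1], [0.1, 0.2, 0.3, 0.8, 0.9])
```

With a small refractory group, PPV stays well below NPV even when sensitivity and specificity are both high, which matches the pattern in the tables.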
Figure 4. Receiver Operating Characteristic curves of the prediction using the marginal (solid red), conditional (dashed blue) and random effects (dot dashed green) prediction methods. The thick lines represent the dynamic allocations whilst the thin lines represent the use of the full data.