John L Mbotwa, Marc de Kamps, Paul D Baxter, George T H Ellison, Mark S Gilthorpe.
Abstract
The present study aimed to compare the predictive acuity of latent class regression (LCR) modelling with: standard generalised linear modelling (GLM); and GLMs that include the membership of subgroups/classes (identified through prior latent class analysis; LCA) as alternative or additional candidate predictors. Using real-world demographic and clinical data from 1,802 heart failure patients enrolled in the UK-HEART2 cohort, the study found that univariable GLMs using LCA-generated subgroup/class membership as the sole candidate predictor of survival were inferior to standard multivariable GLMs using the same four covariates as those used in the LCA. The inclusion of the LCA subgroup/class membership together with these four covariates as candidate predictors in a multivariable GLM showed no improvement in predictive acuity. In contrast, LCR modelling resulted in an 18-22% improvement in predictive acuity and provided a range of alternative models from which it would be possible to balance predictive acuity against entropy to select models that were optimally suited to improve the efficient allocation of clinical resources to address the differential risk of the outcome (in this instance, survival). These findings provide proof-of-principle that LCR modelling can improve the predictive acuity of GLMs and enhance the clinical utility of their predictions. These improvements warrant further attention and exploration, including the use of alternative techniques (such as machine learning algorithms) that are also capable of generating latent class structure while determining outcome predictions, particularly for use with large and routinely collected clinical datasets, and with binary, count and continuous variables.
Year: 2021 PMID: 33961630 PMCID: PMC8104399 DOI: 10.1371/journal.pone.0243674
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Descriptive characteristics of the study cohort.
| Study Cohort | |
|---|---|
| 1,796 (100.0) | |
| 1,061 (59.1) | |
| 1,313 (73.1) | |
| 504 (28.1) | |
| 3.40 (2.11, 5.78) | |
| 69.7 (69.1, 70.2) | |
| 13.46 (13.38, 13.54) |
N = number; % = percentage; IQR = interquartile range; CI = confidence interval.
Latent class analysis (LCA) model summaries—The preferred model from this step was used in Procedures 2 and 3.
| Number of classes | Number of parameters | BIC | Entropy | Class | Modal N (%) | Probabilistic N (%) |
|---|---|---|---|---|---|---|
| 1 | 6 | 19,818.53 | - | - | 1,796 (100.0) | - |
| 2 | 11 | 19,537.79 | 0.75 | Class 1 | 1,452 (80.8) | 1425.3 (79.4) |
| | | | | Class 2 | 344 (19.2) | 370.7 (20.6) |
| 3 | 16 | 19,445.74 | 0.74 | Class 1 | 1,203 (67.0) | 1175.0 (65.4) |
| | | | | Class 2 | 480 (26.7) | 500.7 (27.9) |
| | | | | Class 3 | 113 (6.3) | 120.3 (6.7) |
| 4 | 21 | 19,422.35 | 0.80 | Class 1 | 811 (45.2) | 797.0 (44.4) |
| | | | | Class 2 | 486 (27.1) | 504.4 (28.1) |
| | | | | Class 3 | 381 (21.2) | 371.4 (20.7) |
| | | | | Class 4 | 118 (6.6) | 123.2 (6.9) |
| 6 | 31 | 19,422.87 | 0.63 | Class 1 | 527 (29.3) | 517.7 (28.8) |
| | | | | Class 2 | 474 (26.4) | 470.5 (26.2) |
| | | | | Class 3 | 276 (15.4) | 247.7 (13.8) |
| | | | | Class 4 | 234 (13.0) | 232.6 (13.0) |
| | | | | Class 5 | 186 (10.4) | 229.8 (12.8) |
| | | | | Class 6 | 99 (5.5) | 97.6 (5.4) |
BIC = Bayesian information criterion; N = number; % = percentage; the optimal LCA model according to the BIC is emboldened.
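The difference between the modal and probabilistic class sizes, and the meaning of the entropy column, can be illustrated with a minimal Python sketch. It assumes each participant has a vector of posterior class probabilities and that entropy is the commonly used relative entropy statistic scaled to lie between 0 and 1; the source does not state the formula, and the function names `class_sizes` and `relative_entropy` are hypothetical.

```python
import math


def class_sizes(posteriors):
    """Return (modal counts, probabilistic counts) from per-person class posteriors."""
    n_classes = len(posteriors[0])
    modal = [0] * n_classes
    probabilistic = [0.0] * n_classes
    for row in posteriors:
        # Modal assignment: each person counts once, in their most likely class.
        modal[row.index(max(row))] += 1
        # Probabilistic assignment: each person contributes fractionally to every class.
        for k, p in enumerate(row):
            probabilistic[k] += p
    return modal, probabilistic


def relative_entropy(posteriors):
    """Relative entropy (assumed definition): 1 means perfectly separated classes."""
    n = len(posteriors)
    k = len(posteriors[0])
    if k < 2:
        raise ValueError("relative entropy needs at least two classes")
    # Sum of -p * ln(p) over all people and classes; terms with p == 0 contribute 0.
    total = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - total / (n * math.log(k))


# Illustrative posteriors for three people and two classes (not study data).
example = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
modal_counts, probabilistic_counts = class_sizes(example)
```

Under this definition, probabilistic counts need not be whole numbers, which is why the two columns differ most when classes are poorly separated (low entropy).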
Covariate coefficients for each preferred model (Procedures 1–4) executed on the complete data, along with median c-statistic and empirical 95% confidence intervals generated through 10-fold cross-validation.
| Model (c-statistic: 95% CI) | Covariate | HR (95% CI) |
|---|---|---|
| | Type 2 Diabetic vs. not | 1.35 (1.16, 1.59) |
| | Male vs. Female | 1.76 (1.47, 2.11) |
| | Age (per 5 years) | 1.24 (1.20, 1.29) |
| | Haemoglobin (per g/dl) | 0.82 (0.78, 0.86) |
| | Class 2 (470) | 0.35 (0.30, 0.44) |
| | Class 3 (324) | 1.33 (1.10, 1.60) |
| | Class 4 (317) | 0.71 (0.57, 0.87) |
| | Class 5 (99) | 0.17 (0.10, 0.29) |
| | Class 2 (26.0%) | 0.26 (0.19, 0.34) |
| | Class 3 (18.0%) | 1.00 (0.71, 1.39) |
| | Class 4 (18.0%) | 1.58 (1.27, 1.97) |
| | Class 5 (6.0%) | 0.17 (0.09, 0.32) |
| | Type 2 Diabetic vs. not | 1.51 (1.13, 2.01) |
| | Male vs. Female | 1.80 (1.49, 2.17) |
| | Age (per 5 years) | 1.21 (1.13, 1.29) |
| | Haemoglobin (per g/dl) | 0.82 (0.79, 0.86) |
| | Class 2 (470) | 0.77 (0.53, 1.10) |
| | Class 3 (324) | 0.84 (0.59, 1.19) |
| | Class 4 (317) | 0.92 (0.71, 1.20) |
| | Class 5 (99) | 0.79 (0.38, 1.67) |
| | Type 2 Diabetic vs. not | 1.44 (1.01, 2.06) |
| | Male vs. Female | 1.70 (1.31, 2.21) |
| | Age (per 5 years) | 1.21 (1.11, 1.32) |
| | Haemoglobin (per g/dl) | 0.81 (0.76, 0.88) |
| | Class 2 (26.0%) | 0.78 (0.41, 1.49) |
| | Class 3 (18.0%) | 0.90 (0.55, 1.48) |
| | Class 4 (18.0%) | 1.15 (0.56, 2.36) |
| | Class 5 (6.0%) | 0.99 (0.35, 2.78) |
| Class 1 (‘High risk’): | Type 2 Diabetic vs. not | 1.26 (0.91, 1.75) |
| | Male vs. Female | 2.07 (1.58, 2.71) |
| | Age (per 5 years) | 1.36 (1.28, 1.44) |
| Class 2 (‘Low risk’): | Type 2 Diabetic vs. not | 0.44 (0.23, 0.82) |
| | Male vs. Female | 1.01 (0.64, 1.60) |
| | Age (per 5 years) | 1.17 (1.06, 1.29) |
| ‘High’ vs. ‘Low’ risk: | Type 2 Diabetic vs. not | 0.27 (0.09, 0.76) |
| | Haemoglobin (per g/dl) | 2.16 (1.64, 2.84) |
c-statistic = concordance index; CI = empirical confidence interval obtained from the 2.5% to 97.5% centiles of bootstrapped samples following 10-fold cross-validation; HR = hazard ratio; OR = odds ratio; CPH = Cox proportional hazards; LCA = latent class analysis (modal assignment or probabilistic assignment); LCR = latent class regression.
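As an illustration of the quantities named in this footnote, the following Python sketch computes a concordance index for right-censored survival data and an empirical interval from the 2.5% and 97.5% centiles of a set of resampled statistics. It is a sketch under stated assumptions rather than the authors' procedure: it uses Harrell's pairwise definition of concordance, linear interpolation between order statistics for the centiles, and hypothetical function names (`concordance_index`, `percentile_interval`).

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c: share of usable pairs in which the higher-risk person fails first."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when person i has an observed event strictly before time j.
            if events[i] and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half-concordant
    if usable == 0:
        raise ValueError("no comparable pairs")
    return concordant / usable


def percentile_interval(values, lower=2.5, upper=97.5):
    """Empirical interval from centiles, interpolating linearly between order statistics."""
    ordered = sorted(values)

    def centile(q):
        pos = (len(ordered) - 1) * q / 100.0
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    return centile(lower), centile(upper)


# Illustrative data (not study data): follow-up times, event indicators, predicted risk.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
risk = [4, 3, 2, 1]
c_stat = concordance_index(times, events, risk)
```

In practice the interval would be taken over the c-statistics obtained from the resampled or held-out folds, so `values` would hold one c-statistic per resample.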
Descriptive characteristics for the 2-class Cox proportional hazards latent class regression model.
| Latent Class Regression Model | ||||
|---|---|---|---|---|
| Class 1 (‘High risk’) | Class 2 (‘Low risk’) | |||
| 1,566 (87.2) | 1507.8 (84.0) | 230 (12.8) | 288.2 (16.0) | |
| 1,046 (66.8) | 1014.7 (67.3) | 15 (6.5) | 45.8 (15.9) | |
| 1,160 (74.1) | 1112.8 (73.8) | 153 (66.5) | 200.9 (69.7) | |
| 368 (23.5) | 342.3 (22.7) | 136 (59.1) | 162.5 (56.4) | |
| 3.86 (2.41, 5.89) | 1.13 (0.50, 2.27) | |||
| 69.2 (68.6, 69.9) | 72.5 (71.1, 73.9) | |||
| 13.80 (13.72, 13.88) | 11.14 (10.99, 11.30) | |||
N = number; % = percentage; IQR = interquartile range; CI = confidence interval.
Latent class regression (LCR) model summaries for Procedure 4.
| Number of classes | Number of parameters | BIC | Entropy |
|---|---|---|---|
| 1 | 3 | 3695.06 | ---- |
| 3 | 17 | 3682.44 | 0.91 |
| 4 | 24 | 3722.89 | 0.94 |
BIC = Bayesian information criterion; the optimal LCR model according to the BIC is emboldened.