José M Lezcano-Valverde1, Fernando Salazar2, Leticia León1, Esther Toledano1, Juan A Jover1, Benjamín Fernandez-Gutierrez1, Eduardo Soudah2, Isidoro González-Álvaro3, Lydia Abasolo1, Luis Rodriguez-Rodriguez4.
Abstract
We developed and independently validated a rheumatoid arthritis (RA) mortality prediction model using the machine learning method Random Survival Forests (RSF). Two independent cohorts from Madrid (Spain) were used: the Hospital Clínico San Carlos RA Cohort (HCSC-RAC; training; 1,461 patients), and the Hospital Universitario de La Princesa Early Arthritis Register Longitudinal study (PEARL; validation; 280 patients). Demographic and clinical variables collected during the first two years after disease diagnosis were used. 148 and 21 patients from HCSC-RAC and PEARL died during a median follow-up time of 4.3 and 5.0 years, respectively. Age at diagnosis, median erythrocyte sedimentation rate, and number of hospital admissions showed the highest predictive capacity. Prediction errors in the training and validation cohorts were 0.187 and 0.233, respectively. A survival tree identified five mortality risk groups using the predicted ensemble mortality. After 1 and 7 years of follow-up, time-dependent specificity and sensitivity in the validation cohort were 0.79–0.80 and 0.43–0.48, respectively, using the cut-off value dividing the two lower risk categories. Calibration curves showed overestimation of the mortality risk in the validation cohort. In conclusion, we were able to develop a clinical prediction model for RA mortality using RSF, providing evidence for further work on external validation.
Year: 2017 PMID: 28860558 PMCID: PMC5579234 DOI: 10.1038/s41598-017-10558-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Demographic and clinical characteristics of the rheumatoid arthritis patients from the “Hospital Clínico San Carlos - Rheumatoid Arthritis Cohort” and from the “Hospital Universitario de La Princesa Early Arthritis Register Longitudinal” with more than 2 years of follow-up after disease diagnosis.
| Variables | HCSC-RAC (n = 1,461) | HCSC-RAC missing data, n (%) | PEARL (n = 280) | PEARL missing data, n (%) |
|---|---|---|---|---|
| Women, n (%) | 1,105 (75.6) | 0 | 223 (79.6) | 0 |
| Age of RA diagnosis, median (IQR) | 58.6 (45.2–72.0) | 0 | 54.9 (45.3–67.6) | 0 |
| Elapsed time from RA symptoms onset to diagnosis, in years, median (IQR) | 0.7 (0.3–3.5) | 180 (12.3) | 0.5 (0.3–0.7) | 0 |
| Presence of Rheumatoid Factor, n (%) | 885 (61.5) | 23 (1.6) | 181 (64.6) | 0 |
| Presence of ACPA, n (%) | 465 (44.9) | 425 (29.1) | 234 (83.9) | 1 (0.4) |
| Nationality, n (%): | | 0 | | 0 |
| Spanish | 1,160 (79.4) | — | 232 (82.7) | — |
| South/Central America, Caribbean | 237 (16.2) | — | 37 (13.2) | — |
| Other | 64 (4.4) | — | 11 (3.9) | — |
| Year of RA diagnosis, n (%): | | 0 | | 0 |
| 2001–2005 | 614 (42.0) | — | 107 (38.2) | — |
| 2006–2011 | 847 (58.0) | — | 128 (45.7) | — |
| 2012–2014 | 0 | — | 45 (16.1) | — |
| Median HAQ in the first 2 years after RA diagnosis, median (IQR) | 0.50 (0.19–1.10) | 376 (25.7) | 0.63 (0.25–1.00) | 1 (0.4) |
| Median ESR in the first 2 years after RA diagnosis, median (IQR) | 23 (14–36.5) | 248 (17.0) | 20 (13–30) | 1 (0.4) |
| Any biological therapy in the first 2 years after RA diagnosis, n (%) | 89 (6.1) | 0 | 28 (10.0) | 0 |
| Hospital admissions in the first 2 years after RA diagnosis, n (%): | | 0 | | 13 (4.6) |
| 0 | 1,258 (86.1) | — | 230 (86.1) | — |
| 1 | 144 (9.9) | — | 28 (10.5) | — |
| 2 | 39 (2.7) | — | 5 (1.87) | — |
| 3 | 16 (1.1) | — | 2 (0.8) | — |
| ≥4 | 4 (0.28) | — | 2 (0.8) | — |
| Inclusion period, calendar years | 2001–2011 | — | 2001–2014 | — |
ACPA: Anti-citrullinated peptide antibodies, ESR: Erythrocyte sedimentation rate, HAQ: Health assessment questionnaire, IQR: Interquartile range, RA: Rheumatoid Arthritis.
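The percentages in the table above appear to be computed over patients with a recorded value, not over the whole cohort. This is an inference from the reported counts, not something the source states. A minimal Python sketch that reproduces three of the reported figures under that assumption (the helper name `pct_of_observed` is hypothetical):

```python
def pct_of_observed(count, cohort_size, missing):
    """Percentage of `count` among patients whose value was recorded.

    Assumption (inferred from the table, not stated in the source):
    the denominator excludes patients with missing data.
    """
    observed = cohort_size - missing
    if observed <= 0:
        raise ValueError("no patients with recorded data")
    return round(100.0 * count / observed, 1)


# Counts taken from the table above.
rf_hcsc = pct_of_observed(885, 1461, 23)            # Rheumatoid Factor, HCSC-RAC
acpa_hcsc = pct_of_observed(465, 1461, 425)         # ACPA, HCSC-RAC
no_admission_pearl = pct_of_observed(230, 280, 13)  # 0 admissions, PEARL

print(rf_hcsc, acpa_hcsc, no_admission_pearl)
```

Under this assumption the three values agree with the 61.5, 44.9, and 86.1 reported in the table.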
Parameters and quality measures of two random survival forest models using the log-rank or the log-rank score splitting rules developed for the prediction of mortality in a cohort of rheumatoid arthritis patients (HCSC-RAC).
| Model | Splitting rule | Minimum terminal node size, n | Terminal nodes, mean | Variables tried at each split, n | Prediction error, mean (SD) | 1 year IBS, mean (SD) | 2 years IBS, mean (SD) | 5 years IBS, mean (SD) | 7 years IBS, mean (SD) | Overall IBS, mean (SD) |
|---|---|---|---|---|---|---|---|---|---|---|
| MLR | Log-rank | 3 | 131.7 | 3 | 0.187 (0.002) | 0.003 (0.0001) | 0.013 (0.0004) | 0.070 (0.002) | 0.128 (0.003) | 0.150 (0.003) |
| MLRS | Log-rank score | 3 | 228.04 | 3 | 0.209 (0.003) | 0.003 (0.0001) | 0.012 (0.001) | 0.071 (0.002) | 0.140 (0.004) | 0.167 (0.004) |
IBS: Integrated Brier Score; SD: Standard deviation.
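The record does not define "prediction error". In random survival forest software it is commonly reported as 1 minus Harrell's concordance index (C-index), so the sketch below assumes that definition. It is a plain-Python illustration on made-up toy data, not the authors' code, and the function name `harrell_c_index` is hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is usable when patient i has an observed event
    (events[i] == 1) strictly before patient j's follow-up time.
    The pair is concordant when the patient who died earlier has
    the higher predicted risk; ties in risk count as 0.5.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data (invented for illustration): follow-up in years,
# event indicator (1 = death, 0 = censored), predicted risk.
times = [2.0, 4.0, 5.0, 7.0, 9.0]
events = [1, 0, 1, 0, 1]
risk = [0.9, 0.3, 0.4, 0.5, 0.1]

c_index = harrell_c_index(times, events, risk)
prediction_error = 1.0 - c_index
print(f"C-index = {c_index:.3f}, prediction error = {prediction_error:.3f}")
```

Read this way, the prediction error of 0.187 reported for the MLR model would correspond to a C-index of about 0.813 in the training cohort, with lower values indicating better discrimination.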
Variables included in the random survival forest MLR ranked based on their variable importance value (VIMP).
| Variables | VIMP, mean (SD) | IR (%) |
|---|---|---|
| Age of RA diagnosis | 0.110 (0.001) | 100 |
| Median ESR in the first 2 years after RA diagnosis | 0.014 (9.8 × 10⁻⁴) | 12.7 |
| Hospital admissions in the first 2 years after RA diagnosis | 0.012 (7.0 × 10⁻⁴) | 10.5 |
| Calendar year of RA diagnosis | 0.009 (9.1 × 10⁻⁴) | 8.4 |
| Spaniard | 0.005 (5.3 × 10⁻⁴) | 4.5 |
| Presence of Rheumatoid Factor | 6.1 × 10⁻⁴ (5.7 × 10⁻⁴) | 0.6 |
| Any biological therapy in the first 2 years after RA diagnosis | 3.5 × 10⁻⁴ (1.7 × 10⁻⁴) | 0.3 |
| Elapsed time from RA symptoms onset to diagnosis | 2.8 × 10⁻⁴ (9.4 × 10⁻⁴) | 0.2 |
| Gender | 0.5 × 10⁻⁴ (5.5 × 10⁻⁴) | 0.1 |
ACPA: Anti-citrullinated peptide antibodies; ESR: Erythrocyte sedimentation rate; HAQ: Health assessment questionnaire; RA: Rheumatoid Arthritis; SD: standard deviation; VIMP: Variable importance.
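The IR (%) column is not defined in this record. The numbers are consistent with a relative importance, that is, each VIMP divided by the largest VIMP and expressed as a percentage. The sketch below assumes that reading and uses the rounded VIMP means from the table, so small discrepancies with the published column (for example 10.9 versus the reported 10.5 for hospital admissions) are expected from rounding.

```python
# Rounded mean VIMP values copied from the table above (top five variables).
vimp = {
    "Age of RA diagnosis": 0.110,
    "Median ESR": 0.014,
    "Hospital admissions": 0.012,
    "Calendar year of RA diagnosis": 0.009,
    "Spaniard": 0.005,
}

# Assumed definition: importance relative to the most important variable.
max_vimp = max(vimp.values())
relative_importance = {
    name: round(value / max_vimp * 100.0, 1) for name, value in vimp.items()
}

# Print variables from most to least important.
for name, pct in sorted(relative_importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct}%")
```

Age at diagnosis dominates by roughly an order of magnitude, which matches the ranking described in the abstract.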
Figure 1. Kaplan-Meier curves for the observed mortality of patients from the HCSC-RAC. Patients were grouped in mortality risk categories (low, intermediate, and high) according to a rheumatoid arthritis mortality random survival forest model using the log-rank splitting rule (MLR).
Figure 2. Kaplan-Meier curves for the observed mortality of patients from the PEARL. Patients were grouped in mortality risk categories (low, intermediate, and high) according to a rheumatoid arthritis mortality random survival forest model using the log-rank splitting rule (MLR).