Zhi Jiang, Longhai Tian, Wei Liu, Bo Song, Chao Xue, Tianzong Li, Jin Chen, Fang Wei.
Abstract
As the rate of percutaneous coronary intervention increases, in-stent restenosis (ISR) has become a growing clinical burden. Random forest (RF) could outperform logistic regression (LR) for predicting ISR because of its robustness. We developed an RF model and compared its performance with that of an LR model for predicting ISR. We retrospectively included 1501 patients (age: 64.0 ± 10.3 years; male: 76.7%; ISR events: 279) who underwent coronary angiography 9 to 18 months after implantation of second-generation drug-eluting stents. The data were randomly split into training and test datasets for model development and validation, and the split was repeated 50 times. Predictive performance was assessed by the area under the receiver operating characteristic curve (AUC-ROC). The RF models predicted ISR with a larger AUC-ROC than the LR models (0.829 ± 0.025 vs. 0.784 ± 0.027), and the difference was statistically significant in 29 of the 50 repeats. At the same cutoff threshold, the RF and LR models had similar sensitivity, but specificity was significantly higher in the RF models, which reduced the number of false positives by 25%. After high-leverage outliers were removed, the LR models achieved an AUC-ROC comparable to that of the RF models. Compared with LR, RF was more robust and significantly improved the prediction of ISR. It could cost-effectively identify patients at high risk of ISR and inform clinical decisions about coronary stenting.
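The 25% reduction in false positives can be checked against the mean specificities in the performance table (0.717 for RF, 0.623 for LR). Because both models are scored on the same controls, false positives scale with the false-positive rate, 1 − specificity. A minimal Python check, assuming the figure was derived from these mean values rather than averaged over the 50 splits (`relative_fp_reduction` is a hypothetical helper):

```python
def relative_fp_reduction(spec_new: float, spec_ref: float) -> float:
    """Relative drop in false positives of a new model vs. a reference model.

    On the same test set the number of controls is fixed, so the count of
    false positives is proportional to the false-positive rate, 1 - specificity.
    """
    fpr_new = 1.0 - spec_new
    fpr_ref = 1.0 - spec_ref
    return (fpr_ref - fpr_new) / fpr_ref


# Mean specificities of the RF and LR models across the 50 test datasets
reduction = relative_fp_reduction(0.717, 0.623)
print(f"{reduction:.1%}")  # 24.9%
```

The result, about 24.9%, rounds to the 25% stated in the abstract.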
Year: 2022; PMID: 35604911; PMCID: PMC9126385; DOI: 10.1371/journal.pone.0268757
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.752
Fig 1. Overview of the data source and model development.
Abbreviations: DES = drug-eluting stent, ISR = in-stent restenosis, PR = precision-recall, ROC = receiver operating characteristic.
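The repeated random splitting can be sketched in Python. This excerpt states neither the training/test proportion nor whether the split was stratified by ISR status; the 70/30 stratified split below, and the names `stratified_split` and `repeated_splits`, are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_fraction: float,
                     seed: int) -> Tuple[List[int], List[int]]:
    """Split row indices into training and test sets, keeping the ISR:control ratio."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def repeated_splits(labels: Sequence[int], n_repeats: int = 50,
                    test_fraction: float = 0.3) -> List[Tuple[List[int], List[int]]]:
    """One independent split per repeat; seeding each split makes it reproducible."""
    return [stratified_split(labels, test_fraction, seed) for seed in range(n_repeats)]


# 1222 controls (0) and 279 ISR cases (1), as in the study cohort
labels = [0] * 1222 + [1] * 279
splits = repeated_splits(labels)
train, test = splits[0]
print(len(train), len(test), sum(labels[i] for i in test))  # 1050 451 84
```

Stratifying keeps the ISR prevalence of about 18.6% (279/1501) roughly constant across the 50 test datasets, which keeps prevalence-sensitive metrics such as AUC-PR comparable between splits.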
Baseline characteristics.
| Characteristic | Control (n = 1222) | ISR (n = 279) | P value |
|---|---|---|---|
| Male gender | 931 (76.2) | 220 (78.9) | 0.383 |
| Age, years | 63.6 ± 10.5 | 65.7 ± 9.6 | 0.001 |
| Body weight, kg | 70.6 ± 11.5 | 71.5 ± 11.1 | 0.23 |
| Smoking history | 413 (33.8) | 118 (42.3) | 0.009 |
| Hypertension | 706 (57.8) | 182 (65.2) | 0.026 |
| Dyslipidemia | 379 (31.0) | 115 (41.2) | 0.001 |
| Diabetes | 433 (35.4) | 171 (61.3) | < 0.001 |
| CKD stage | | | < 0.001 |
| Stage I or II | 944 (77.3) | 164 (58.8) | |
| Stage III | 223 (18.2) | 69 (24.7) | |
| Stage IV | 43 (3.5) | 30 (10.8) | |
| Stage V | 12 (1.0) | 16 (5.7) | |
| LVEF, % | 50.8 ± 8.0 | 49.0 ± 8.4 | 0.002 |
| DAPT | 1198 (98.0) | 269 (96.4) | 0.101 |
| Statins | 1203 (98.4) | 274 (98.2) | 0.775 |
| ACEI/ARB | 1117 (91.4) | 258 (92.5) | 0.562 |
| β-blocker | 1075 (88.0) | 247 (88.5) | 0.794 |
| ACS | 948 (77.6) | 232 (83.2) | 0.049 |
| Number of stenotic vessels | | | < 0.001 |
| 1 vessel | 615 (50.3) | 100 (35.8) | |
| 2 vessels | 392 (32.1) | 97 (34.8) | |
| 3 vessels | 215 (17.6) | 82 (29.4) | |
| Left main stenosis | 48 (3.9) | 22 (7.9) | 0.008 |
| Number of stented vessels | | | < 0.001 |
| 1 vessel | 959 (78.5) | 158 (56.6) | |
| 2 vessels | 229 (18.7) | 100 (35.8) | |
| 3 vessels | 34 (2.8) | 21 (7.5) | |
| Left main stenting | 48 (3.9) | 22 (7.9) | 0.008 |
| Bifurcation stenting | 33 (2.7) | 14 (5.0) | 0.07 |
| Complex lesion (type B2 and C) | 500 (40.9) | 201 (72.0) | < 0.001 |
| Minimum stent diameter, mm | 3.0 ± 0.4 | 2.7 ± 0.3 | < 0.001 |
| Total stent length, mm | 43.0 ± 21.6 | 59.7 ± 27.4 | < 0.001 |
Values are n (%) or mean ± SD.
ACEI = angiotensin-converting enzyme inhibitors; ACS = acute coronary syndrome; ARB = angiotensin receptor blockers; CKD = chronic kidney disease; DAPT = dual antiplatelet therapy; ISR = in-stent restenosis; LVEF = left ventricular ejection fraction.
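The table does not state which tests produced the P values. For the categorical rows, a Pearson chi-square test with Yates' continuity correction reproduces the reported value for male gender (P = 0.383), whereas the uncorrected test gives about 0.34. A minimal Python check of that assumption (`chi2_2x2_yates` is a hypothetical helper, not the authors' code):

```python
import math


def chi2_2x2_yates(a: int, b: int, c: int, d: int) -> tuple:
    """Yates-corrected Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, two-sided P value with 1 degree of freedom).
    """
    n = a + b + c + d
    numerator = n * max(abs(a * d - b * c) - n / 2, 0.0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    stat = numerator / denominator
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Male gender: controls 931 male / 291 female; ISR 220 male / 59 female
stat, p = chi2_2x2_yates(931, 291, 220, 59)
print(round(stat, 3), round(p, 3))  # 0.76 0.383
```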
Fig 2. Analysis of the ROC and PR curves.
The representative ROC curve (A) and PR curve (B) from 1 of the 50 test datasets are shown. Panels C and D show the AUC-ROC and AUC-PR in each of the 50 test datasets; the X-axis indexes the test dataset and the Y-axis gives the value of AUC-ROC or AUC-PR. The AUC-ROCs of the RF and LR models were compared with the DeLong test. *P < 0.05, #P < 0.01, †P < 0.001. Abbreviations: AUC = area under the curve; LR = logistic regression; PR = precision-recall; RF = random forest; ROC = receiver operating characteristic.
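The quantities in Fig 2 can be sketched with the Python standard library. Below, AUC-ROC is the Mann-Whitney probability that an ISR case scores above a control; AUC-PR is approximated by average precision, one common estimator (the authors' exact PR-AUC method is not stated here); and the two AUC-ROCs from the same test set are compared with the paired DeLong variance estimate. The function names are hypothetical, and the O(m·n) pairwise loops favor clarity over speed.

```python
import math
from typing import List, Sequence, Tuple


def _psi(case: float, control: float) -> float:
    """Mann-Whitney kernel: 1 if the ISR case outscores the control, 0.5 on ties."""
    if case > control:
        return 1.0
    return 0.5 if case == control else 0.0


def _placements(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """DeLong structural components: V10 (one per case) and V01 (one per control)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(_psi(x, c) for c in controls) / len(controls) for x in cases]
    v01 = [sum(_psi(x, c) for x in cases) / len(cases) for c in controls]
    return v10, v01


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    v10, _ = _placements(scores, labels)
    return sum(v10) / len(v10)


def _cov(u: Sequence[float], v: Sequence[float]) -> float:
    """Sample covariance (n - 1 denominator)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def delong_test(scores_a: Sequence[float], scores_b: Sequence[float],
                labels: Sequence[int]) -> Tuple[float, float, float]:
    """Two-sided DeLong test for two correlated AUC-ROCs; returns (auc_a, auc_b, p)."""
    v10a, v01a = _placements(scores_a, labels)
    v10b, v01b = _placements(scores_b, labels)
    m, n = len(v10a), len(v01a)
    auc_a, auc_b = sum(v10a) / m, sum(v10b) / m
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    if var <= 0:
        raise ValueError("zero variance (e.g. identical rankings or two perfect models)")
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, math.erfc(abs(z) / math.sqrt(2))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise AUC-PR estimate: mean precision at the rank of each ISR case (ties not handled)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


# Toy test set: 3 ISR cases followed by 3 controls
labels = [1, 1, 1, 0, 0, 0]
rf_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
lr_scores = [0.9, 0.4, 0.6, 0.5, 0.2, 0.1]
print(delong_test(rf_scores, lr_scores, labels))
```

In this toy example the first model separates cases from controls perfectly (AUC 1.0) and the second reaches 8/9, yet with only six patients the difference is far from significant; this is the same per-split comparison that yields the 29/50 count in the performance table.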
The predictive performance between the RF and LR models.
| Metric | Random forest | Logistic regression | Difference (a) | P value (c) |
|---|---|---|---|---|
| AUC-ROC | 0.829 ± 0.025 (0.783 ~ 0.880) | 0.784 ± 0.027 (0.722 ~ 0.835) | 0.045 ± 0.015 (0.000 ~ 0.075) | 29/50 (b) |
| AUC-PR | 0.512 ± 0.056 (0.389 ~ 0.682) | 0.435 ± 0.047 (0.313 ~ 0.548) | 0.077 ± 0.038 (-0.025 ~ 0.193) | NA |
| Sensitivity | 0.801 ± 0.057 (0.667 ~ 0.899) | 0.793 ± 0.062 (0.623 ~ 0.899) | 0.007 ± 0.053 (-0.116 ~ 0.101) | 0.335 |
| Specificity | 0.717 ± 0.031 (0.652 ~ 0.770) | 0.623 ± 0.033 (0.561 ~ 0.692) | 0.094 ± 0.035 (0.023 ~ 0.180) | < 0.001 |
| PPV | 0.392 ± 0.026 (0.333 ~ 0.444) | 0.323 ± 0.019 (0.272 ~ 0.362) | 0.069 ± 0.023 (0.016 ~ 0.127) | < 0.001 |
| NPV | 0.941 ± 0.015 (0.910 ~ 0.969) | 0.931 ± 0.018 (0.887 ~ 0.964) | 0.010 ± 0.015 (-0.026 ~ 0.036) | < 0.001 |
| Detection rate | 0.148 ± 0.011 (0.123 ~ 0.166) | 0.146 ± 0.011 (0.115 ~ 0.166) | 0.001 ± 0.010 (-0.021 ~ 0.019) | 0.335 |
| Detection prevalence | 0.378 ± 0.030 (0.316 ~ 0.439) | 0.454 ± 0.034 (0.366 ~ 0.524) | -0.076 ± 0.034 (-0.160 ~ -0.003) | < 0.001 |
| F1 score | 0.525 ± 0.029 (0.465 ~ 0.581) | 0.459 ± 0.025 (0.388 ~ 0.506) | 0.067 ± 0.025 (0.002 ~ 0.114) | < 0.001 |
| Accuracy | 0.759 ± 0.027 (0.705 ~ 0.811) | 0.708 ± 0.027 (0.634 ~ 0.758) | 0.051 ± 0.024 (-0.014 ~ 0.094) | < 0.001 |
Values are mean ± SD (minimum ~ maximum) from the 50 random test datasets.
(a) Random forest value minus logistic regression value in each test dataset.
(b) DeLong test; P < 0.05 in 29 of the 50 test datasets.
(c) Paired Student's t-test.
AUC = area under the curve; NPV = negative predictive value; PPV = positive predictive value; PR = precision-recall; ROC = receiver operating characteristic.
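All threshold-based metrics in the table derive from each test dataset's confusion matrix. The names "detection rate" and "detection prevalence" match the terminology of R's caret package, although this excerpt does not name the software used. A Python sketch with the standard definitions, plus the paired t statistic used to compare the 50 per-split values (function names and the confusion-matrix counts are hypothetical):

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Threshold-based metrics for one test dataset (ISR = positive class)."""
    n = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "detection_rate": tp / n,               # true positives among all patients
        "detection_prevalence": (tp + fp) / n,  # predicted positives among all patients
        "f1": 2 * tp / (2 * tp + fp + fn),
        "accuracy": (tp + tn) / n,
    }


def paired_t_statistic(values_rf: Sequence[float], values_lr: Sequence[float]) -> float:
    """t statistic of the per-split differences (RF - LR); df = number of splits - 1."""
    diffs = [a - b for a, b in zip(values_rf, values_lr)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical confusion matrix for one test split
m = confusion_metrics(tp=8, fp=10, fn=2, tn=30)
print(m["sensitivity"], m["specificity"], m["accuracy"])  # 0.8 0.75 0.76

# From the summary row for specificity (mean 0.094, SD 0.035 over 50 splits)
t_spec = 0.094 / (0.035 / math.sqrt(50))  # about 19.0, hence P < 0.001
```

Detection rate equals sensitivity multiplied by the test-set ISR prevalence. When that prevalence is the same in every split, the paired t statistics for the two metrics coincide, which is consistent with both showing P = 0.335 in the table.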
Robustness test by sequentially removing the outliers.
| Dataset | AUC-ROC, random forest | AUC-ROC, logistic regression | AUC-ROC difference (a) | P < 0.05 (b, c) |
|---|---|---|---|---|
| Total data (control = 1222, ISR = 279) | 0.829 ± 0.025 (0.783 ~ 0.880) | 0.784 ± 0.027 (0.722 ~ 0.835) | 0.045 ± 0.015 (0.000 ~ 0.075) | 29/50 |
| After removing outliers with Cook's distance: | | | | |
| > 8 × mCD (control = 1217, ISR = 266) | 0.836 ± 0.021 (0.786 ~ 0.875) | 0.801 ± 0.028 (0.734 ~ 0.871) | 0.035 ± 0.017 (-0.002 ~ 0.066) | 21/50 |
| > 7 × mCD (control = 1215, ISR = 255) | 0.845 ± 0.021 (0.813 ~ 0.897) | 0.815 ± 0.024 (0.775 ~ 0.864) | 0.030 ± 0.016 (-0.004 ~ 0.071) | 13/50 |
| > 6 × mCD (control = 1212, ISR = 236) | 0.872 ± 0.023 (0.825 ~ 0.914) | 0.850 ± 0.021 (0.806 ~ 0.889) | 0.021 ± 0.016 (-0.021 ~ 0.057) | 14/50 |
| > 5 × mCD (control = 1203, ISR = 204) | 0.900 ± 0.017 (0.859 ~ 0.932) | 0.886 ± 0.016 (0.853 ~ 0.923) | 0.014 ± 0.013 (-0.017 ~ 0.040) | 3/50 |
| > 4 × mCD (control = 1200, ISR = 173) | 0.918 ± 0.016 (0.885 ~ 0.950) | 0.915 ± 0.016 (0.878 ~ 0.945) | 0.003 ± 0.010 (-0.018 ~ 0.021) | 0/50 |
Values are mean ± SD (minimum ~ maximum) from the 50 test datasets.
(a) Random forest value minus logistic regression value.
(b) DeLong test.
(c) Number of the 50 test datasets in which P < 0.05.
mCD = mean Cook's distance.
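A sketch of the outlier screen in Python. It assumes that Cook's distance was computed from the fitted LR model in the standard GLM form D_i = r_i² h_i / (p (1 − h_i)²), where r_i is the Pearson residual, h_i the leverage, and p the number of model parameters, with dispersion fixed at 1 for logistic regression. It also assumes that mCD is the mean over the full dataset. Obtaining residuals and leverages requires a fitted model and is not shown; the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def cooks_distance_glm(pearson_resid: Sequence[float], leverage: Sequence[float],
                       n_params: int) -> List[float]:
    """Cook's distance per observation for a GLM with dispersion 1 (e.g. logistic)."""
    return [r * r * h / (n_params * (1.0 - h) ** 2)
            for r, h in zip(pearson_resid, leverage)]


def retained_by_cutoff(cooks: Sequence[float],
                       multipliers: Sequence[int] = (8, 7, 6, 5, 4)) -> Dict[int, List[int]]:
    """For each k, indices kept after removing observations with D_i > k * mean(D)."""
    mcd = sum(cooks) / len(cooks)  # assumption: mean Cook's distance over the full dataset
    return {k: [i for i, d in enumerate(cooks) if d <= k * mcd] for k in multipliers}


# Toy distances: 40 ordinary observations plus two influential ones
cooks = [0.1] * 40 + [1.5, 3.0]
kept = retained_by_cutoff(cooks)
print({k: len(v) for k, v in kept.items()})  # {8: 41, 7: 40, 6: 40, 5: 40, 4: 40}
```

In the robustness table, the stricter cutoffs removed mostly ISR cases (at 4 × mCD, 106 ISR cases but only 22 controls), which is worth keeping in mind when reading the rising AUC-ROCs.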
Fig 3. Importance of the features.
The CPI, MDA, and MDGini of each variable, sorted by descending CPI. All values were scaled to 0 to 10 for presentation; higher values indicate more important variables. Abbreviations: ACS = acute coronary syndrome, CKD = chronic kidney disease, CPI = conditional permutation importance, MDA = mean decrease accuracy, MDGini = mean decrease Gini.
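A minimal Python sketch of permutation importance as a mean decrease in accuracy on held-out data; it is not the authors' implementation. Two simplifications are assumed. First, randomForest-style MDA is computed per tree on out-of-bag samples, whereas here it is computed once on a held-out set. Second, conditional permutation importance derives its strata from split points of correlated covariates within each tree, whereas here the caller supplies the strata directly. All names and the toy data are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

Row = List[float]
Model = Callable[[Row], int]


def accuracy(model: Model, X: Sequence[Row], y: Sequence[int]) -> float:
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model: Model, X: Sequence[Row], y: Sequence[int], feature: int,
                           n_repeats: int = 10, strata: Optional[Sequence[int]] = None,
                           seed: int = 0) -> float:
    """Mean decrease in accuracy after shuffling one feature column.

    strata=None shuffles the column across all rows (marginal, MDA-like).
    With strata, values are shuffled only among rows sharing a stratum,
    a simplified stand-in for conditional permutation importance.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    groups: Dict[object, List[int]] = {}
    for i in range(len(X)):
        groups.setdefault(strata[i] if strata is not None else 0, []).append(i)
    drops = []
    for _ in range(n_repeats):
        permuted = [list(row) for row in X]
        for idx in groups.values():
            values = [X[i][feature] for i in idx]
            rng.shuffle(values)
            for i, v in zip(idx, values):
                permuted[i][feature] = v
        drops.append(baseline - accuracy(model, permuted, y))
    return sum(drops) / n_repeats


def model(row: Row) -> int:
    """Toy classifier that uses only feature 0."""
    return int(row[0] > 0.5)


X = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]  # feature 1 is noise
y = [model(row) for row in X]
print(permutation_importance(model, X, y, feature=1))  # 0.0
```

Shuffling feature 0 within strata that already determine the prediction leaves accuracy unchanged. This illustrates why a conditional measure shrinks for a variable whose information is also carried by the conditioning covariates.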