Random forest vs. logistic regression: Predicting angiographic in-stent restenosis after second-generation drug-eluting stent implantation.

Zhi Jiang^1,2, Longhai Tian^1,2, Wei Liu^1,2, Bo Song^1,2, Chao Xue^1,2, Tianzong Li^1,2, Jin Chen^1,2, Fang Wei^1,2.

Abstract

As the rate of percutaneous coronary intervention increases, in-stent restenosis (ISR) has become a burden. Random forest (RF) could be superior to logistic regression (LR) for predicting ISR due to its robustness. We developed an RF model and compared its performance with the LR one for predicting ISR. We retrospectively included 1501 patients (age: 64.0 ± 10.3; male: 76.7%; ISR events: 279) who underwent coronary angiography at 9 to 18 months after implantation of 2nd generation drug-eluting stents. The data were randomly split into a pair of train and test datasets for model development and validation with 50 repeats. The predictive performance was assessed by the area under the curve (AUC) of the receiver operating characteristic (ROC). The RF models predicted ISR with larger AUC-ROCs of 0.829 ± 0.025 compared to 0.784 ± 0.027 of the LR models. The difference was statistically significant in 29 of the 50 repeats. The RF and LR models had similar sensitivity using the same cutoff threshold, but the specificity was significantly higher in the RF models, reducing 25% of the false positives. By removing the high leverage outliers, the LR models had comparable AUC-ROC to the RF models. Compared to the LR, the RF was more robust and significantly improved the performance for predicting ISR. It could cost-effectively identify patients with high ISR risk and help the clinical decision of coronary stenting.

Year: 2022 PMID: 35604911 PMCID: PMC9126385 DOI: 10.1371/journal.pone.0268757

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752

Introduction

Percutaneous coronary intervention (PCI) has become routine clinical practice for revascularization in patients with coronary artery disease (CAD) because it reduces mortality in ST-segment elevation myocardial infarction and improves quality of life [1]. The mid-term risk of death associated with PCI using second-generation drug-eluting stents (DES) is close to that associated with coronary artery bypass grafting, except in individuals with diabetes and/or three-vessel disease [2]. As the rate of revascularization by stenting continues to increase, in-stent restenosis (ISR) has become a burden that impairs patient well-being [3]. About 10% of the PCIs in the United States were for ISR lesions, and approximately 25% of the patients with ISR presented with acute myocardial infarction. Predicting ISR would make it possible to optimize the stenting procedure, monitor patients closely, or consider an alternative treatment. However, the existing risk model has not been used in clinical practice because of limited external validation [4]. A powerful and robust prediction model is urgently needed. Logistic regression (LR) is a standard approach for binary prediction, but it is easily affected by outliers [5]. An outlier is an observation that lies far from the other observations. Outliers exert leverage on the LR model and impair its predictive performance. Recently, random forest (RF), a machine-learning (ML) algorithm, has gained popularity in predicting clinical outcomes. In a large-scale benchmark experiment, RF outperformed LR in 69% of datasets from open ML databases [6]. Vien et al. found that the RF model was superior to the traditional LR model in predicting pacemaker implantation following transcatheter aortic valve replacement [7]. A previous study reported that ML-based algorithms had higher accuracy than the existing risk score model in predicting ISR [8]. However, no significant difference was found between RF and LR, owing to the small sample size (263 patients with 23 ISR events). We hypothesized that an RF model can be used for ISR prediction and performs better than an LR model because of its higher robustness. We developed an RF model and compared its predictive metrics with those of the LR model in a larger retrospective dataset including 1501 patients and 279 ISR events. The robustness of the RF and LR models was also tested. This article is presented in accordance with the STROBE reporting checklist [9].

Method

Data source

The data were retrospectively collected from the database at Guizhou Provincial People’s Hospital with the approval of the institutional ethics committee. The requirement for informed participant consent was waived by the ethics committee because the data were deidentified. ISR was defined as more than 50% stenosis within, or within 5 mm adjacent to, a previously stented segment on quantitative coronary analysis (QCA; syngo QCA software, Siemens) [10]. We screened 2508 patients whose coronary dimensions were reassessed by QCA 9 to 18 months after prior coronary stenting between January 2014 and August 2020 (Fig 1). We excluded 967 patients whose prior stenting procedures were performed in other hospitals, 16 who received stents other than second-generation DES, and 24 with missing essential clinical characteristics. A total of 1501 patients were finally included in the study: 279 were diagnosed with ISR, and the 1222 patients without ISR served as controls. No patients with PCI in bypass grafts were included.

Fig 1

Overview of the data source and model development.

Abbreviations: DES = drug-eluting stent, ISR = in-stent restenosis, ROC = receiver operating characteristic, PR = precision-recall.

Model development

We used the open-source R software version 4.0.5 (The R Foundation, Vienna, Austria) for ML model development. The data were randomly split into train (75%) and test (25%) datasets with 50 repeats, generating 50 paired train and test datasets (Fig 1). Each test dataset was unseen by the model developed on its paired train dataset. Ten-fold cross-validation was used for tuning hyperparameters, selecting variable subsets, and developing models in the train datasets. Each model was then validated in the paired test dataset. This design aimed to avoid over-optimistic or over-pessimistic results arising by chance.

Fifteen ISR predictors were selected according to the literature, including patient age, male gender, smoking history, clinical presentation of acute coronary syndrome (ACS), diabetes, hypertension, dyslipidemia, chronic kidney disease (CKD) stage, number of stenotic vessels (>50% of luminal diameter by QCA), number of stenting vessels, minimum stent diameter, total stent length, left main artery stenting, bifurcation stenting (left main), and stenting in complex lesions (type B2 and C) [11]. The CKD stage was classified from the estimated glomerular filtration rate calculated with the Modification of Diet in Renal Disease Study equation [12]. Complex lesions were classified by two experienced interventional cardiologists according to the ACC/AHA criteria [13].

The hyperparameters mtry (the number of random feature candidates at each split) and ntree (the number of trees in the forest) of the RF model were first tuned by grid search (caret package, version 6.0–88; randomForest package, version 4.6–14). We then measured the conditional permutation importance (CPI), mean decrease accuracy (MDA), and mean decrease Gini (MDGini) to select important variables using the 50 train datasets (permimp package, version 1.0–1) [14]. We excluded left main artery stenting and bifurcation stenting because of their low values on all three measures (S1 Fig). For the LR model, stepwise selection by the Akaike information criterion (AIC) was performed to exclude redundant variables in the 50 train datasets (MASS package, version 7.3–54; stats package, version 4.0.5). Left main stenting and bifurcation stenting were excluded because of their high exclusion frequency (S2 Fig). Significant collinearity was then detected between the number of stenting vessels and total stent length using the variance inflation factor (mctest package, version 1.3.1). We excluded the number of stenting vessels to alleviate the collinearity, because the total stent length scored highly in the variable importance analysis and was never excluded by the stepwise AIC method.

A total of 12 variables were finally selected. Patient age, number of stenotic vessels, total stent length, and minimum stent diameter were input as continuous variables. CKD stage was input as an ordered categorical variable. Male gender, smoking history, stenting for ACS, diabetes, hypertension, dyslipidemia, and complex lesions (type B2 and C) were input as binary variables. For RF model development using 10-fold cross-validation, mtry and ntree were tuned each time.
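The repeated train/test design described above can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study, and the function name `repeated_splits`, the fixed seed, and the use of a plain (non-stratified) shuffle are all assumptions made for illustration; the text does not state whether the splits were stratified by ISR status.

```python
import random

def repeated_splits(n_patients, train_fraction=0.75, repeats=50, seed=0):
    """Return `repeats` pairs (train_idx, test_idx) of disjoint patient indices."""
    rng = random.Random(seed)  # assumption: the seed is arbitrary, none is reported
    n_train = round(n_patients * train_fraction)
    splits = []
    for _ in range(repeats):
        order = list(range(n_patients))
        rng.shuffle(order)  # a fresh random permutation for every repeat
        splits.append((sorted(order[:n_train]), sorted(order[n_train:])))
    return splits

# 1501 patients, 75%/25% split, 50 repeats, as in the study design.
splits = repeated_splits(1501)
train_idx, test_idx = splits[0]
print(len(splits), len(train_idx), len(test_idx))
```

Each pair would then be used once: models are tuned and fitted on the train indices only, and scored on the paired test indices.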

Model performance and interpretation

The predictive performance was evaluated using the test datasets. The area under the curve (AUC) of receiver operating characteristic (ROC) and precision-recall (PR) curves were calculated (pROC package, version 1.17.0). The sensitivity, specificity, positive predictive value, negative predictive value, detection rate, detection prevalence, F1 score, and accuracy at the cutoff thresholds of 0.8 sensitivity were evaluated using a confusion matrix. The variable importance in the RF model was assessed by CPI, MDA, and MDGini.
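As an illustration of these metrics, the sketch below computes the AUC-ROC by pairwise comparison (which equals the trapezoidal area under the empirical ROC curve), finds the cutoff that reaches a target sensitivity, and derives confusion-matrix metrics at that cutoff. It is a Python sketch, not the pROC-based code of the study; the function names and the toy scores are hypothetical.

```python
import math

def auc_roc(labels, scores):
    """AUC-ROC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cutoff_for_sensitivity(labels, scores, target=0.8):
    """Highest cutoff for which calling score >= cutoff positive reaches the target sensitivity."""
    pos = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    needed = math.ceil(target * len(pos) - 1e-9)  # number of positives that must be flagged
    return pos[needed - 1]

def confusion_metrics(labels, scores, cutoff):
    """Sensitivity, specificity, PPV and NPV when score >= cutoff is called ISR."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in pairs if y == 1 and s < cutoff)
    fp = sum(1 for y, s in pairs if y == 0 and s >= cutoff)
    tn = sum(1 for y, s in pairs if y == 0 and s < cutoff)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Toy data (hypothetical, not from the study): 5 ISR cases followed by 5 controls.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.6, 0.5, 0.3, 0.1, 0.05]
auc = auc_roc(labels, scores)
cutoff = cutoff_for_sensitivity(labels, scores, target=0.8)
metrics = confusion_metrics(labels, scores, cutoff)
print(auc, cutoff, metrics)
```

Fixing the sensitivity first and then reading off the other metrics is what makes the specificity and PPV of two models directly comparable.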

Test model robustness

Model robustness was tested by removing outliers from the total data. We first fitted a logistic regression model to the total data using the 12 variables. Outliers were detected by Cook's distance (stats package, version 4.0.5). We then sequentially removed from the study population the patients whose Cook's distance exceeded 8, 7, 6, 5, and 4 times the mean Cook's distance (mCD), and repeated the model development and validation in the 50 paired train and test datasets at each step (S3 Fig). The ROC curves were analyzed to evaluate model accuracy.
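The thresholding step of this robustness test can be sketched as follows. The sketch assumes the Cook's distances have already been computed from a model fitted to the total data (in R, for example, with `cooks.distance`), and that the mean is taken once over the whole study population; the function name `keep_mask` and the toy distances are hypothetical.

```python
from statistics import mean

def keep_mask(cooks_d, multiple):
    """True for patients whose Cook's distance is at most `multiple` times the mean."""
    threshold = multiple * mean(cooks_d)  # mCD taken over the total data (assumption)
    return [d <= threshold for d in cooks_d]

# Toy distances (hypothetical): 94 ordinary patients and 6 increasingly influential ones.
cooks_d = [1.0] * 94 + [8.0, 10.0, 12.0, 14.0, 16.0, 18.0]

removed = {}
for multiple in (8, 7, 6, 5, 4):  # progressively stricter thresholds
    mask = keep_mask(cooks_d, multiple)
    removed[multiple] = mask.count(False)
print(removed)
```

For each threshold, the retained patients would then go through the same 50 repeated train/test splits as the total data.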

Statistical analysis

For baseline characteristics, continuous variables were presented as mean ± standard deviation (SD). The Student’s t-test was used for comparison if the variable was normally distributed; otherwise, the Mann-Whitney U test was used. Categorical variables were presented as frequency (percentage), and comparison was performed using the Chi-square test. The predictive metrics were presented as mean ± SD (minimum ~ maximum). The AUC-ROCs were compared using the DeLong test. The AUC-PRs were compared by value only, because no established statistical method was available. The accuracy, F1 score, sensitivity, specificity, PPV, and NPV in the 50 test datasets were compared using the paired Student’s t-test. A two-tailed P value of less than 0.05 was considered statistically significant. All statistical analyses were performed with R software version 4.0.5.
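For readers unfamiliar with the paired comparison across the 50 test datasets, the sketch below shows the paired t statistic in Python. The function names and toy values are hypothetical; the second function only recomputes the statistic from a reported mean and SD of the differences, so its result inherits the rounding of those summary values.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two equal-length metric lists."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

def t_from_summary(mean_diff, sd_diff, n):
    """Same statistic computed from the mean and SD of the paired differences."""
    return mean_diff / (sd_diff / math.sqrt(n))

# Toy example (hypothetical): one metric of two models on four test datasets.
t_toy, df_toy = paired_t([0.72, 0.70, 0.75, 0.71], [0.62, 0.63, 0.64, 0.60])

# Mean difference 0.094 with SD 0.035 over 50 datasets gives a t statistic near 19.
t_large = t_from_summary(0.094, 0.035, 50)
print(t_toy, df_toy, t_large)
```

A statistic of this size with 49 degrees of freedom corresponds to a P value far below 0.001.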

Results

Study population

A total of 1501 patients with 279 (18.6%) ISR events were included in the study. Baseline characteristics are shown in Table 1. The proportion of male patients was similar between control patients and those with ISR (76.2% vs. 78.9%), whereas patients with ISR were older (65.7 ± 9.6 vs. 63.6 ± 10.5 years; P = 0.001). Smoking history, hypertension, dyslipidemia, diabetes, advanced CKD stage, and ACS were significantly more common in patients with ISR, who also had more stenotic vessels and more stented vessels than control patients. The patients with ISR received stents with smaller minimum diameters and longer total lengths than the control patients.

Table 1

Baseline characteristics.

	Control n = 1222	ISR n = 279	P value
Male gender	931 (76.2)	220 (78.9)	0.383
Age, yrs	63.6 ± 10.5	65.7 ± 9.6	0.001
Bodyweight, kg	70.6 ± 11.5	71.5 ± 11.1	0.23
Smoking history	413 (33.8)	118 (42.3)	0.009
Hypertension	706 (57.8)	182 (65.2)	0.026
Dyslipidemia	379 (31.0)	115 (41.2)	0.001
Diabetes	433 (35.4)	171 (61.3)	< 0.001
CKD stage
I or II	944 (77.3)	164 (58.8)	< 0.001
III	223 (18.2)	69 (24.7)
IV	43 (3.5)	30 (10.8)
V	12 (1.0)	16 (5.7)
LVEF, %	50.8 ± 8.0	49.0 ± 8.4	0.002
DAPT	1198 (98.0)	269 (96.4)	0.101
Statins	1203 (98.4)	274 (98.2)	0.775
ACEI/ARB	1117 (91.4)	258 (92.5)	0.562
β-blocker	1075 (88.0)	247 (88.5)	0.794
ACS	948 (77.6)	232 (83.2)	0.049
Number of stenotic vessels
1 vessel	615 (50.3)	100 (35.8)	< 0.001
2 vessels	392 (32.1)	97 (34.8)
3 vessels	215 (17.6)	82 (29.4)
Left main stenosis	48 (3.9)	22 (7.9)	0.008
Number of stenting vessels
1 vessel	959 (78.5)	158 (56.6)	< 0.001
2 vessels	229 (18.7)	100 (35.8)
3 vessels	34 (2.8)	21 (7.5)
Left main stenting	48 (3.9)	22 (7.9)	0.008
Bifurcation stenting	33 (2.7)	14 (5.0)	0.07
Complex lesion (type B2 and C)	500 (40.9)	201 (72.0)	< 0.001
Minimum stent diameter, mm	3.0 ± 0.4	2.7 ± 0.3	< 0.001
Total stent length, mm	43.0 ± 21.6	59.7 ± 27.4	< 0.001

Values are n (%) or mean ± SD.

ACEI = angiotensin-converting enzyme inhibitors; ACS = acute coronary syndrome; ARB = angiotensin receptor blockers; CKD = chronic kidney disease; DAPT = dual antiplatelet therapy; LVEF = left ventricular ejection fraction.

Model performance

The ROC and PR curves from 1 of the 50 test datasets are shown in Fig 2A and 2B. The RF models had better overall predictive performance than the LR models (Table 2). The AUC-ROC of the RF models was 0.045 ± 0.015 (0.000 ~ 0.075) larger than that of the LR models [0.829 ± 0.025 (0.783 ~ 0.880) vs. 0.784 ± 0.027 (0.722 ~ 0.835)]. The difference in AUC-ROC was statistically significant in 29 of the 50 test datasets (Fig 2C). The AUC-PR was also larger in the RF model than in the LR model in 49 of the 50 test datasets (Fig 2D). The predictive metrics were assessed in the test datasets at the cutoff threshold giving a sensitivity of 0.8. The sensitivity, NPV, and detection rate were similar, but the RF models had significantly higher specificity, PPV, F1 score, and accuracy, and lower detection prevalence, than the LR models in the majority of the test datasets. Overall, the RF model produced approximately 25% fewer false positives than the LR model at a similar sensitivity of 80%.
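The "approximately 25% fewer false positives" figure and the reported PPVs can be cross-checked from the mean sensitivity, mean specificity, and the overall ISR prevalence (279 of 1501). The Python sketch below is only an approximate consistency check under the assumption that the averaged metrics can be combined this way; the function names are hypothetical, and the study itself computed each metric per test dataset before averaging.

```python
def ppv(sens, spec, prev):
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)

def detection_prevalence(sens, spec, prev):
    """Fraction of all patients flagged as high risk."""
    return sens * prev + (1 - spec) * (1 - prev)

prev = 279 / 1501                  # overall ISR prevalence in the cohort
ppv_rf = ppv(0.801, 0.717, prev)   # mean sensitivity and specificity of the RF models
ppv_lr = ppv(0.793, 0.623, prev)   # mean sensitivity and specificity of the LR models
flagged_rf = detection_prevalence(0.801, 0.717, prev)
flagged_lr = detection_prevalence(0.793, 0.623, prev)

# Relative drop in the false-positive rate (1 - specificity) from LR to RF.
fp_reduction = 1 - (1 - 0.717) / (1 - 0.623)
print(round(ppv_rf, 3), round(ppv_lr, 3), round(fp_reduction, 3))
```

The recomputed values land close to the PPV and detection prevalence rows of Table 2, and the relative reduction in the false-positive rate comes out at about one quarter.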

Fig 2

Analysis of the ROC and PR curves.

Table 2

The predictive performance between the RF and LR models.

	Random forest	Logistic regression	Difference^a	P value
AUC-ROC	0.829 ± 0.025 (0.783 ~ 0.880)	0.784 ± 0.027 (0.722 ~ 0.835)	0.045 ± 0.015 (0.000 ~ 0.075)	29/50^b
AUC-PR	0.512 ± 0.056 (0.389 ~ 0.682)	0.435 ± 0.047 (0.313 ~ 0.548)	0.077 ± 0.038 (-0.025 ~ 0.193)	NA
Sensitivity	0.801 ± 0.057 (0.667 ~ 0.899)	0.793 ± 0.062 (0.623 ~ 0.899)	0.007 ± 0.053 (-0.116 ~ 0.101)	0.335^c
Specificity	0.717 ± 0.031 (0.652 ~ 0.770)	0.623 ± 0.033 (0.561 ~ 0.692)	0.094 ± 0.035 (0.023 ~ 0.180)	< 0.001^c
PPV	0.392 ± 0.026 (0.333 ~ 0.444)	0.323 ± 0.019 (0.272 ~ 0.362)	0.069 ± 0.023 (0.016 ~ 0.127)	< 0.001^c
NPV	0.941 ± 0.015 (0.910 ~ 0.969)	0.931 ± 0.018 (0.887 ~ 0.964)	0.010 ± 0.015 (-0.026 ~ 0.036)	< 0.001^c
Detection rate	0.148 ± 0.011 (0.123 ~ 0.166)	0.146 ± 0.011 (0.115 ~ 0.166)	0.001 ± 0.010 (-0.021 ~ 0.019)	0.335^c
Detection prevalence	0.378 ± 0.030 (0.316 ~ 0.439)	0.454 ± 0.034 (0.366 ~ 0.524)	-0.076 ± 0.034 (-0.160 ~ -0.003)	< 0.001^c
F1 score	0.525 ± 0.029 (0.465 ~ 0.581)	0.459 ± 0.025 (0.388 ~ 0.506)	0.067 ± 0.025 (0.002 ~ 0.114)	< 0.001^c
Accuracy	0.759 ± 0.027 (0.705 ~ 0.811)	0.708 ± 0.027 (0.634 ~ 0.758)	0.051 ± 0.024 (-0.014 ~ 0.094)	< 0.001^c

Values are mean ± SD (minimum ~ maximum) from the 50 random test datasets.

a The value of random forest minus the value of logistic regression from each test dataset.

b DeLong test was used. P value less than 0.05 was revealed in 29 of the 50 test datasets.

c Paired Student’s t-test was used.

AUC = area under the curve; NPV = negative predictive value; PPV = positive predictive value; PR = precision-recall; ROC = receiver operating characteristic.

Analysis of the ROC and PR curves.

The representative ROC (A) and PR (B) curves from 1 of the 50 test datasets are shown. The AUC-ROC (C) and AUC-PR (D) in the 50 test datasets are presented. The X-axis denotes each test dataset. The Y-axis denotes the value of AUC-ROC or AUC-PR. The AUC-ROCs of the RF and LR models were compared by the DeLong test. *P < 0.05, #P < 0.01, †P < 0.001. Abbreviations: AUC = area under the curve; LR = logistic regression; RF = random forest; ROC = receiver operating characteristic; PR = precision-recall.

Model robustness

The AUC-ROC increased significantly in both the RF and LR models as the outliers were sequentially removed (Table 3). After the patients with more than 4 times the mCD were removed, the AUC-ROC was comparable between the RF and LR models [0.918 ± 0.016 (0.885 ~ 0.950) vs. 0.915 ± 0.016 (0.878 ~ 0.945)].

Table 3

Robustness test by sequentially removing the outliers.

	AUC-ROC			P < 0.05^b
	Random forest	Logistic regression	Difference^a	P < 0.05^b
Total data control = 1222, ISR = 279	0.829 ± 0.025 (0.783 ~ 0.880)	0.784 ± 0.027 (0.722 ~ 0.835)	0.045 ± 0.015 (0.000 ~ 0.075)	29/50^c
Removal of the outliers with
> 8 times of mCD control = 1217, ISR = 266	0.836 ± 0.021 (0.786 ~ 0.875)	0.801 ± 0.028 (0.734 ~ 0.871)	0.035 ± 0.017 (-0.002 ~ 0.066)	21/50^c
> 7 times of mCD control = 1215, ISR = 255	0.845 ± 0.021 (0.813 ~ 0.897)	0.815 ± 0.024 (0.775 ~ 0.864)	0.030 ± 0.016 (-0.004 ~ 0.071)	13/50^c
> 6 times of mCD control = 1212, ISR = 236	0.872 ± 0.023 (0.825 ~ 0.914)	0.850 ± 0.021 (0.806 ~ 0.889)	0.021 ± 0.016 (-0.021 ~ 0.057)	14/50^c
> 5 times of mCD control = 1203, ISR = 204	0.900 ± 0.017 (0.859 ~ 0.932)	0.886 ± 0.016 (0.853 ~ 0.923)	0.014 ± 0.013 (-0.017 ~ 0.040)	3/50^c
> 4 times of mCD control = 1200, ISR = 173	0.918 ± 0.016 (0.885 ~ 0.950)	0.915 ± 0.016 (0.878 ~ 0.945)	0.003 ± 0.010 (-0.018 ~ 0.021)	0/50^c

Values are mean ± SD (minimum ~ maximum) from the 50 test datasets.

a Value of random forest minus value of logistic regression.

b DeLong test was used.

c Number of the 50 test datasets in which the P value was less than 0.05.

mCD = mean Cook's distance.

Variable importance

The CPI, MDA, and MDGini were calculated from the 50 RF models (Fig 3). Although the results were discordant, the total stent length and minimum stent diameter ranked among the most important features for predicting ISR.

Fig 3

Importance of the features.

The CPI, MDA, and MDGini of the variables, sorted by descending CPI. Values were all scaled to 0 ~ 10 for presentation. The higher the value, the more important the variable is. Abbreviations: ACS = acute coronary syndrome, CKD = chronic kidney disease, CPI = conditional permutation importance, MDA = mean decrease accuracy, MDGini = mean decrease Gini.

Discussion

Major findings

In the present study, using 50 random splits of paired train and test datasets, we found that the RF model was more robust and showed stable superiority in predicting angiographic ISR compared to the LR model. The total stent length and minimum stent diameter were the most important features for predicting ISR in the RF model.

Comparison of the models

A model is considered robust if its accuracy is little affected by outliers in the train dataset [15]. Robustness is usually tested by injecting outliers into the data. In our study, the multivariate outliers were the control patients with a high predicted ISR probability and the ISR patients with a low predicted ISR probability. To test model robustness, we instead removed the patients with more than 4 times the mCD (S4 Fig), after which the LR model had accuracy comparable to that of the RF model. This result provides indirect evidence that RF was more robust than LR. However, the cutoff threshold for defining an outlier is arbitrary, and which patients are multivariate outliers depends on the study population and the variables. In addition, removing outliers to achieve higher accuracy is not feasible in prospective studies, in which the outcome is unknown until observed. Overfitting occurs when a prediction model fits the train dataset well but does not predict accurately in unseen test datasets. One limitation of LR is that including multiple correlated variables can produce serious deviation and lead to overfitting [5]. The predictors of ISR are correlated and interact with one another [11]. For example, diabetes is associated with renal dysfunction, dyslipidemia, long lesions, and smaller vessel diameter [16-18]. RF is an ensemble-based ML algorithm that uses multiple de-correlated decision trees to make a prediction. Tree-based models can be resistant to correlated variables [19].

Important features

Unlike many other ML algorithms, RF is not merely a black box. We calculated the CPI, MDA, and MDGini to assess the importance of the variables. The CPI is considered to be more stable and reliable than the others [14]. The total stent length and minimum stent diameter ranked as the most important features. A longer total stent length could imply more complex vascular morphology and stenting approaches, such as multivessel disease, diffuse lesions, side-branch involvement, and bifurcation techniques. Calcified lesions are associated with long lesions, older age, diabetes, and renal dysfunction, and are the major cause of stent underexpansion and malapposition, which subsequently lead to ISR [20]. A longer total stent length could also correlate with a smaller stent diameter in diffuse lesions, since more distal vessels could be targeted for stenting. A large-scale trial using intravascular ultrasound (IVUS) showed that the cutoff of minimal stent area for predicting angiographic ISR was 5.3 to 5.7 mm² [21]. Converted to diameter, this corresponds to 2.6 to 2.7 mm, indicating a significantly higher ISR rate when implanting stents with diameters of less than 2.75 mm, even in the absence of underexpansion. Neoatherosclerosis is an important pathological characteristic of ISR in the second-generation DES era [22]. It can be regarded as accelerated atherosclerosis due to incomplete endothelialization, disrupted endothelial function, and excessive uptake of circulating lipids [23]. Smoking, hypertension, diabetes, CKD, and multivessel disease are associated with impaired endothelial function, and dyslipidemia contributes to higher blood triglyceride and cholesterol levels. Stenting for ACS implies stenting of unstable lesions, which is a significant risk factor for neoatherosclerosis [22].
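The conversion from minimal stent area to diameter mentioned above can be reproduced with a short Python sketch. It assumes a circular stent cross-section, so that area = π(d/2)²; the function name is hypothetical.

```python
import math

def diameter_from_area(area_mm2):
    """Diameter (mm) of a circle with the given cross-sectional area (mm^2)."""
    return 2 * math.sqrt(area_mm2 / math.pi)

d_low = diameter_from_area(5.3)   # lower area cutoff
d_high = diameter_from_area(5.7)  # upper area cutoff
print(round(d_low, 2), round(d_high, 2))
```

Both values fall below 2.75 mm, which is consistent with the diameter range quoted in the paragraph above.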

Discordance

Jesús et al. compared 6 ML algorithms with 3 traditional risk score systems for predicting ISR using data from 263 patients in the GRACIA-3 trial [8]. A total of 68 variables, including bare-metal and DES stents, were screened for model development. Using 10-fold cross-validation, the RF and LR were among the models with the highest accuracy, but no statistically significant difference was found, owing to the small sample size (AUC-ROC: power = 0.218) and the possibility of noise variables [19]. We compared RF and LR in a larger retrospective patient cohort. Our study population consisted of patients receiving second-generation DES on de novo atherosclerotic lesions, with worse renal function, multivessel disease, complex lesions, and small vessel diameters. It would therefore better fit routine clinical practice in developing countries, where coronary artery bypass graft surgery is not widely available. By randomly splitting the data into paired train and test datasets with 50 repeats, we found that the RF models all had larger AUC-ROC than the LR models. While the difference in sensitivity between the RF and LR models oscillated around zero, the differences in specificity, PPV, and F1 score were all above zero, indicating that the stable and better performance of the RF models was not due to chance (S5 Fig). Cui et al. reported that 6 plasma metabolites can be used to predict ISR with a very high accuracy of 0.93 [24]. However, these metabolites are not routinely measured by mass spectrometry in clinical practice, and their predictions were made after coronary stenting. Our model provides a prediction before coronary stenting based on variables obtained from daily practice, QCA, and stenting strategy.

Clinical implication

Intravascular imaging modalities make it possible to optimize PCI strategy and to stent precisely [25]. IVUS guidance was associated with a 40% reduction in target vessel revascularization compared with angiographic guidance [26]. In our study, intravascular imaging devices were employed in fewer than 3% of the patients because of the added expense. The RF model predicted ISR with a similar sensitivity of 80% but an average of 0.094 higher specificity than the LR model, reducing false positives by 25%. If the models had been used to identify patients at high risk of ISR for intravascular imaging, close follow-up, and consideration of alternative therapy, the RF model could have been more cost-effective than the LR model, decreasing expense by 25% with similar reductions in ISR and target vessel revascularization.

Limitations

First, the current study is limited by its retrospective and single-center nature. The indications for repeat QCA included new-onset chest discomfort, prior high-risk PCI, and ischemic findings on non-invasive testing. The patients could therefore be a self-selected high-risk group, which calls into question the validity of the models in external and prospective cohorts. Further model generalization using less biased observational cohorts is required. Second, gene polymorphisms, blood biomarkers, intravascular imaging, coronary calcification, and PCI procedures, which have been reported to be risk factors for ISR, were not included in our models. Further feature selection for a more generalized model with better predictive performance is ongoing work by our team. Third, the QCA was reassessed 9 to 18 months after initial coronary stenting. Some control patients whose QCA was reassessed as early as 9 months might have been diagnosed with ISR had the QCA been reassessed as late as 18 months. This bias could result in an underestimation of the accuracy of the predictive models. Fourth, the best cutoff threshold is unknown. We used a sensitivity of 0.8 in order to identify more of the patients who will develop ISR while keeping specificity acceptable. Finally, the RF and LR algorithms are limited in their ability to take coronary imaging data as input. QCA could miss important features that are difficult to quantify. Convolutional neural networks have the advantage of fully utilizing imaging data and could play a key role in future studies.

Conclusions

Using variables obtained from patient characteristics and QCA, we developed an RF-based ML model to predict angiographic ISR in a retrospective cohort of patients who underwent repeat angiography 9 to 18 months after initial coronary stenting. The robust RF model improved predictive performance compared with the traditional LR model and could support clinical decisions on coronary stenting with higher cost-effectiveness.

S1 Fig. Variable selection in the RF algorithm.

The conditional permutation importance (CPI), mean decrease accuracy, and mean decrease Gini of the 15 variables, ordered by descending CPI value. The higher the value, the more important the variable is. Abbreviations: ACS = acute coronary syndrome, CKD = chronic kidney disease. (TIF)

S2 Fig. Variable exclusion frequency in the LR model by the stepwise AIC method.

The exclusion frequency was counted from the 50 LR models and is denoted at the top of each column. Variables are ordered by ascending exclusion frequency. The higher the frequency, the less the variable influenced the LR model. Abbreviations: ACS = acute coronary syndrome, CKD = chronic kidney disease. (TIF)

S3 Fig. The workflow of the robustness test.

Abbreviations: ISR = in-stent restenosis, ROC = receiver operating characteristic. (TIF)

S4 Fig. The Cook's distance and probability of ISR.

The Cook's distances among the study population (A). The X-axis denotes each patient. The Y-axis denotes the Cook's distance of each patient in ascending order. The blue solid line denotes the mCD. The red dashed line denotes the threshold of 4 times the mCD. The histogram of the probability of ISR in the study population (B) and after removal of the patients with more than 4 times the mCD (C). The X-axis ranges from 0 to 1, denoting the probability of ISR. The Y-axis denotes the number of patients at each probability. Abbreviations: ISR = in-stent restenosis; mCD = mean Cook's distance. (TIF)

S5 Fig. Difference of the metrics between the RF and LR models.

The difference was calculated by subtracting the value of the LR model from that of the RF model in each test dataset. The difference in sensitivity oscillated around zero in the 50 test datasets. However, the differences in specificity, PPV, and F1 score were all above zero, indicating that the RF models had higher specificity, PPV, and F1 scores than the LR models at similar sensitivity. Abbreviations: LR = logistic regression; RF = random forest. (TIF)

Minimal data set.

(XLSX)

26 Dec 2021

PONE-D-21-30474

Random forest vs. logistic regression: predicting angiographic in-stent restenosis after second-generation drug-eluting stent implantation

PLOS ONE

Dear Dr. Fang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Feb 09 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Ankur Sethi

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please update your Methods section to state that the requirement for informed participant consent was waived by your ethics committee due to the fact that data was anonymized.

3. Thank you for stating the following financial disclosure: FW received the Clinical Research Center Project of Department of Science and Technology of Guizhou Province [NO.(2017)5405]; ZJ received the Guizhou Provincial High-level Innovative Talents Project (GZSYQCC[2015]006); WL received the Guizhou Provincial Science and Technology Foundation (No.[2019]1197); FW received the Guizhou Provincial Science and Technology Social Development Project (No.[2018]2794). Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." If this statement is not correct you must amend it as needed. Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

4. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability. Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized. Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access. We will update your Data Availability statement to reflect the information you provide in your cover letter.

Additional Editor Comments:

Issues

Page 2 line 43: I think the mortality data are controversial for acute coronary syndrome. Maybe the authors meant acute ST elevation myocardial infarction?

Page 3 line 44: Reference 1 cites the IFR-SWEDHEART trial, which compared FFR to iFR and is probably not an appropriate reference for improvement in outcomes after PCI for ACS.

The authors selected 1501 out of 2508 patients (59%) because they underwent a repeat coronary angiogram within 9-18 months of the index procedure. First, this misses cases of in-stent restenosis in the remaining 41% of patients who did not receive a coronary angiogram for various reasons, including death, no symptoms, or other medical reasons. Secondly, the reasons why patients were scheduled for a repeat angiogram are unclear. Was it for staged intervention, new symptoms, or ACS? Therefore, these patients may be a self-selected high-risk group who received an unplanned angiogram, compared with the 41% who did not, which calls into question the validity and applicability of these models to all comers.

“We excluded patients receiving stents by operators from other hospital” These patients were treated at Guizhou Provincial hospital?
How many patients were excluded due missing data, and use of stent other than 2nd generation stents?

Was there evidence of ischemia based on non-invasive testing in the territory of vessel with in-stent restenosis?

Were patients with PCI of saphenous venous graft included?

It may be of value to include severe calcification and/or use of atherectomy as predictor of in-stent restenosis.

Would author consider use of intra-coronary imaging as a predictor for in-stent restenosis. What was the proportion of patients with imaging in two groups?

Minor issues –

Can authors briefly discuss how should there results be used in a clinical meaningful way to predict and/or prevent restenosis ?

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes

2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes

5. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Wei et al presents a well-written and timely manuscript analyzing the utility of RF versus LR in predicting ISR after second generation DES implantation. RF may provide added benefits over LR in clinical practice given its ability to identify more complex feature patterns and provide better accuracy as compared to LR. Additionally, RF can also provide feature importance when predicting a specific outcome, whereas LR cannot. In the specific case of ISR as presented by the authors, RF may provide specific clinical prediction benefit over LR given the nature of significant interaction and multiple correlation between the predictors of ISR. LR analysis will likely provide less accuracy in predicting outcomes as seen in the manuscript presented by Wei et al due to these complex interactions between predictors of ISR and argues for the utilization of RF over LR in similar clinical scenarios where predictors of a specific outcome may have multiple correlation, and argues that RF may have important use in clinical practice over more traditional statistical analysis methods like LR.

The importance of prospective trials in assessing the clinical utility of RF needs to be emphasized, and the retrospective nature of the manuscript is an important limitation. Outliers will not be readily identifiable in prospective studies if the study outcome is not known, which may limit the accuracy and potential clinical application of RF models. The authors should address in the manuscript whether any prospective validation studies utilizing RF in observational cohorts have been previously published and the outcomes of these studies in reference to their clinical utility.

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

16 Jan 2022

Journal Requirements:

1.
Please ensure that your manuscript meets PLOS ONE's style requirements.
We have carefully checked the style and confirm that our manuscript meets PLOS ONE’s style requirements.

2. Please update your Methods section to state that the requirement for informed participant consent was waived by your ethics committee due to the fact that data was anonymized.
We have included the statement on page 5, lines 74-75.

3. Thank you for stating the financial disclosure. Please state what role the funders took in the study. Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.
We have included the funder statement in our cover letter. The funders took no role in the study.

4. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter.
We have uploaded the minimal data set as Supporting Information files. We added S1 File on page 23, line 460.

Additional Editor Comments:

Issues

1. Page 2 line 43 – I think the mortality data is controversial for acute coronary syndrome. May be authors meant acute ST elevation myocardial infarction?
Thank you for pointing out the controversy. We have revised ACS to ST-segment elevation myocardial infarction.

2. Page 3 line 44 – the reference 1 cites IFR-SWEDHEART trial which compared FFR to iFR and probably not an appropriate reference for improvement in outcomes after PCI for ACS.
Thank you for pointing out the inappropriate reference. We have replaced Reference 1 with the 2018 ESC guidelines on myocardial revascularization.

3. Authors selected 1501 out of 2508 patients (59%) patients as they underwent a repeat coronary angiogram within 9-18 months of the index procedure. First, this misses out on cases of instent restenosis in rest 41% patients who did not receive coronary angiogram for various reasons including – death, no symptoms, or other medical reasons. Secondly, the reasons for the patients who were scheduled for repeat angiogram is unclear. Was it for staged intervention, new symptoms, or ACS? Therefore, these patients may be self-selected high risk group who received an unplanned angiogram compared to 41% who did not, questioning the validity and applicability of these models to all comers.
We recognize that the data source was not described clearly. All 2508 patients had a repeat coronary artery angiography (CAG), and 360 (14.4%) of them were diagnosed with in-stent restenosis (ISR). Among the 1007 (41%) patients who were excluded, 967 had prior coronary stenting in other hospitals, 16 had stents other than 2nd generation drug-eluting stents, and 24 had missing clinical data. We have revised the wording on page 5, lines 77-82. The numbers have been added to the patient flow in Fig 1.
We agree with the editor that the patients could be a self-selected high-risk group, and that the external validity of these models is therefore in question. A more generalized model would be more meaningful in clinical practice, but selection bias and the lack of external validation may always remain limitations. Model generalization is our ongoing work. We have added the limitation on page 17, lines 298-302.
The most frequent indications for repeat CAG included new-onset chest discomfort, prior high-risk PCI, and ischemic findings at non-invasive testing. It is difficult to trace back all the indications due to the retrospective nature of the study. Our primary goal was to compare the predictive performance of two machine-learning algorithms. We showed that resistance to outliers accounted for the better predictive performance of the random forest compared with logistic regression. We respectfully suggest that further investigation of the indications would not add significantly to our major findings and conclusions.

4. “We excluded patients receiving stents by operators from other hospital” These patients were treated at Guizhou Provincial hospital?
These patients were reassessed by CAG at Guizhou Provincial Hospital, but their prior coronary stenting was performed in other hospitals. We have revised the wording on page 5, lines 77-82.

5. How many patients were excluded due missing data, and use of stent other than 2nd generation stents?
24 patients were excluded due to missing data. The numbers are shown in Fig 1 and included on page 5, lines 80-82.

6. Was there evidence of ischemia based on non-invasive testing in the territory of vessel with in-stent restenosis?
There was ischemic evidence in the patients who had ISR, but it is difficult to trace back all the non-invasive testing and ischemic territories due to the retrospective nature of the study. The rationale of our machine-learning models was to predict ISR before coronary stenting using variables from daily practice, CAG, and the stenting strategy. The prediction could help identify patients at high risk of ISR for intravascular imaging during PCI, close follow-up, or consideration of alternative therapy. We respectfully suggest that further correlating the non-invasive testing with the ischemic territory would not add significantly to our major findings and conclusion.

7. Were patients with PCI of saphenous venous graft included?
CABG was performed far less often than PCI in our district due to patient preference. No patients with PCI of bypass grafts were included in the study. We have added the statement on page 5, line 84.

8. It may be of value to include severe calcification and/or use of atherectomy as predictor of in-stent restenosis.

9. Would author consider use of intra-coronary imaging as a predictor for in-stent restenosis. What was the proportion of patients with imaging in two groups?
We would like to respond to issues 8 and 9 together. We agree with the editor that it would be of value to add calcification and/or the use of a specific device to the predictors. The characteristics of calcification, such as depth, thickness, angle, and length, as well as the fracture of calcific plaque after stenting, were significantly associated with the prognosis.[1-4] However, intracoronary imaging devices were used in fewer than 3% of the patients, so we were not able to report detailed calcific characteristics (page 17, lines 294-296).
We note that the random forest and logistic regression algorithms are limited in utilizing imaging data, because quantifying imaging data into a group of variables could lose important features. Emerging convolutional neural networks could overcome this limitation by taking the digital imaging data directly as input.[5] They could play a key role in guiding PCI and predicting clinical outcomes in future studies. We respectfully suggest that further including new features would not add significantly to our major findings and conclusion. The limitation has been added on page 17, lines 311-314.

References

1. Wang X, Matsumura M, Mintz GS, Lee T, Zhang W, Cao Y, et al. In Vivo Calcium Detection by Comparing Optical Coherence Tomography, Intravascular Ultrasound, and Angiography. JACC Cardiovasc Imaging. 2017;10: 869–879. doi:10.1016/j.jcmg.2017.05.014
2. Sharma SK, Vengrenyuk Y, Kini AS. IVUS, OCT, and Coronary Artery Calcification: Is There a Bone of Contention? JACC Cardiovasc Imaging. 2017;10: 880–882. doi:10.1016/j.jcmg.2017.06.008
3. Fujino A, Mintz GS, Lee T, Hoshino M, Usui E, Kanaji Y, et al. Predictors of Calcium Fracture Derived From Balloon Angioplasty and its Effect on Stent Expansion Assessed by Optical Coherence Tomography. JACC Cardiovasc Interv. 2018;11: 1015–1017. doi:10.1016/j.jcin.2018.02.004
4. Maejima N, Hibi K, Saka K, Akiyama E, Konishi M, Endo M, et al. Relationship Between Thickness of Calcium on Optical Coherence Tomography and Crack Formation After Balloon Dilatation in Calcified Plaque Requiring Rotational Atherectomy. Circ J. 2016;80: 1413–1419. doi:10.1253/circj.CJ-15-1059
5. Anwar SM, Majid M, Qayyum A, Awais M, Alnowami M, Khan MK. Medical Image Analysis using Convolutional Neural Networks: A Review. J Med Syst. 2018;42: 226. doi:10.1007/s10916-018-1088-1

Minor issues

1. Can authors briefly discuss how should there results be used in a clinical meaningful way to predict and/or prevent restenosis ?
Yes. We have added the discussion on page 16, lines 287-295.

Comments to the Author

Reviewer #1: Wei et al presents a well-written and timely manuscript analyzing the utility of RF versus LR in predicting ISR after second generation DES implantation. RF may provide added benefits over LR in clinical practice given its ability to identify more complex feature patterns and provide better accuracy as compared to LR. Additionally, RF can also provide feature importance when predicting a specific outcome, whereas LR cannot. In the specific case of ISR as presented by the authors, RF may provide specific clinical prediction benefit over LR given the nature of significant interaction and multiple correlation between the predictors of ISR. LR analysis will likely provide less accuracy in predicting outcomes as seen in the manuscript presented by Wei et al due to these complex interactions between predictors of ISR and argues for the utilization of RF over LR in similar clinical scenarios where predictors of a specific outcome may have multiple correlation, and argues that RF may have important use in clinical practice over more traditional statistical analysis methods like LR.

1. The importance of prospective trials in assessing the clinical utility of RF needs to be emphasized, and the retrospective nature of the manuscript is an important limitation.
We have added the limitation on page 17, lines 298-302.

2. Outliers will not be readily identifiable in prospective studies if the study outcome is not known, which may limit the accuracy and potential clinical application of RF models.
We agree with the reviewer that removing outliers is not feasible in prospective studies. The purpose of sequentially removing patients according to Cook’s distance was to reveal the mechanism by which the random forest achieved higher predictive accuracy than the logistic regression. We have commented on this on page 14, lines 232-233.

3. The authors should address in the manuscript whether any prospective validation studies utilizing RF in observational cohorts have been previously published and the outcomes of these studies in reference to their clinical utility.
Sampedro-Gómez et al first trained and validated machine-learning models to predict ISR using the cohort from a prospective randomized controlled trial (GRACIA-3). They found that the random forest had better predictive performance than the logistic regression, but the difference was not statistically significant. The study is introduced on page 3, lines 61-63 (reference 8).

Submitted filename: Response to Reviewers R1.docx

9 May 2022

Random forest vs. logistic regression: predicting angiographic in-stent restenosis after second-generation drug-eluting stent implantation

PONE-D-21-30474R1

Dear Dr. Wei,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments.
When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
George Vousden
Deputy Editor-in-Chief
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.
Reviewer #1: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes

3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes

4. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes

6. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: (No Response)

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No

12 May 2022

PONE-D-21-30474R1

Random forest vs. logistic regression: predicting angiographic in-stent restenosis after second-generation drug-eluting stent implantation

Dear Dr. Wei:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. George Vousden
Staff Editor
PLOS ONE
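
Illustrative note on the Cook's distance step. In their reply to Reviewer #1 (comment 2), the authors describe sequentially removing patients according to Cook's distance to show why logistic regression is sensitive to high-leverage outliers. The authors' own code is not part of this record, so the following is only a minimal sketch of how such a ranking might be computed. It assumes the common one-step approximation to Cook's distance for logistic regression (squared Pearson residual times leverage, scaled by the number of parameters), a hand-written Newton fit, and a small synthetic dataset with one planted outlier. The function names `fit_logistic` and `cooks_distance` and all data values are hypothetical and are not taken from the study.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Fit an unpenalised logistic regression by Newton's method (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Newton step: solve (X' W X) delta = X' (y - p)
        delta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
        beta = beta + delta
    return beta

def cooks_distance(X, y, beta):
    """One-step approximation of Cook's distance for a fitted logistic model."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    xtwx_inv = np.linalg.inv(X.T @ (X * w[:, None]))
    # Leverage: h_i = w_i * x_i' (X' W X)^{-1} x_i
    h = w * np.einsum("ij,jk,ik->i", X, xtwx_inv, X)
    pearson = (y - p) / np.sqrt(w)
    k = X.shape[1]  # number of fitted parameters, intercept included
    d = pearson**2 * h / (k * (1.0 - h) ** 2)
    return d, h

# Hypothetical toy data: one predictor, outcome mostly 1 when x > 0,
# with four labels flipped near the boundary so the classes overlap.
x = np.linspace(-3.0, 3.0, 101)
y = (x > 0).astype(float)
y[[40, 45]] = 1.0
y[[55, 60]] = 0.0

# Planted outlier: extreme predictor value with the "wrong" outcome.
x = np.append(x, 8.0)
y = np.append(y, 0.0)

X = np.column_stack([np.ones_like(x), x])  # intercept column plus predictor
beta = fit_logistic(X, y)
d, h = cooks_distance(X, y, beta)
worst = int(np.argmax(d))  # index of the most influential observation
print("most influential row:", worst)
```

Dropping the flagged row, refitting, and repeating would mimic the sequential removal described in the response. As Reviewer #1 points out, this requires the outcome to be known, so it serves as a diagnostic on retrospective data and not as a step that could be applied to new patients.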

24 in total

1. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies.

Authors: Erik von Elm; Douglas G Altman; Matthias Egger; Stuart J Pocock; Peter C Gøtzsche; Jan P Vandenbroucke
Journal: Lancet Date: 2007-10-20 Impact factor: 79.321

2. Intravascular ultrasound-guided vs angiography-guided drug-eluting stent implantation in complex coronary lesions: Meta-analysis of randomized trials.

Authors: Chirag Bavishi; Partha Sardar; Saurav Chatterjee; Abdur Rahman Khan; Arpit Shah; Sameer Ather; Pedro A Lemos; Pedro Moreno; Gregg W Stone
Journal: Am Heart J Date: 2016-11-12 Impact factor: 4.749

3. 2018 ESC/EACTS Guidelines on myocardial revascularization.

Authors: Franz-Josef Neumann; Miguel Sousa-Uva; Anders Ahlsson; Fernando Alfonso; Adrian P Banning; Umberto Benedetto; Robert A Byrne; Jean-Philippe Collet; Volkmar Falk; Stuart J Head; Peter Jüni; Adnan Kastrati; Akos Koller; Steen D Kristensen; Josef Niebauer; Dimitrios J Richter; Petar M Seferovic; Dirk Sibbing; Giulio G Stefanini; Stephan Windecker; Rashmi Yadav; Michael O Zembala
Journal: Eur Heart J Date: 2019-01-07 Impact factor: 29.983

4. Intravascular ultrasound assessment of optimal stent area to prevent in-stent restenosis after zotarolimus-, everolimus-, and sirolimus-eluting stent implantation.

Authors: Hae-Geun Song; Soo-Jin Kang; Jung-Min Ahn; Won-Jang Kim; Jong-Young Lee; Duk-Woo Park; Seung-Whan Lee; Young-Hak Kim; Cheol Whan Lee; Seong-Wook Park; Seung-Jung Park
Journal: Catheter Cardiovasc Interv Date: 2013-11-09 Impact factor: 2.692

5. Plasma Phospholipids and Sphingolipids Identify Stent Restenosis After Percutaneous Coronary Intervention.

Authors: Song Cui; Kefeng Li; Lawrence Ang; Jinghua Liu; Liqian Cui; Xiantao Song; Shuzheng Lv; Ehtisham Mahmud
Journal: JACC Cardiovasc Interv Date: 2017-06-14 Impact factor: 11.195

6. Long-Term Prognostic Impact of Restenosis of the Unprotected Left Main Coronary Artery Requiring Repeat Revascularization.

Authors: Jens Wiebe; Constantin Kuna; Tareq Ibrahim; Martin Lösl; Salvatore Cassese; Sebastian Kufner; Heribert Schunkert; Robert A Byrne; Karl-Ludwig Laugwitz; Marco Valgimigli; Gert Richardt; Julinda Mehilli; Adnan Kastrati
Journal: JACC Cardiovasc Interv Date: 2020-10-12 Impact factor: 11.195

7. Differences in restenosis rate with different drug-eluting stents in patients with and without diabetes mellitus: a report from the SCAAR (Swedish Angiography and Angioplasty Registry).

Authors: Ole Fröbert; Bo Lagerqvist; Jörg Carlsson; Johan Lindbäck; Ulf Stenestrand; Stefan K James
Journal: J Am Coll Cardiol Date: 2009-05-05 Impact factor: 24.094

8. Incidence and predictors of restenosis after coronary stenting in 10 004 patients with surveillance angiography.

Authors: Salvatore Cassese; Robert A Byrne; Tomohisa Tada; Susanne Pinieck; Michael Joner; Tareq Ibrahim; Lamin A King; Massimiliano Fusaro; Karl-Ludwig Laugwitz; Adnan Kastrati
Journal: Heart Date: 2013-11-22 Impact factor: 5.994

9. Comparison of clinical outcomes between percutaneous coronary intervention for de novo lesions versus in-stent restenosis lesions.

Authors: Mitsuhiro Takeuchi; Tomotaka Dohi; Tatsuya Fukase; Ryota Nishio; Norihito Takahashi; Hirohisa Endo; Shinichiro Doi; Yoshiteru Kato; Iwao Okai; Hiroshi Iwata; Shinya Okazaki; Kikuo Isoda; Katsumi Miyauchi; Tohru Minamino
Journal: Cardiovasc Interv Ther Date: 2021-07-05