
Supervised Learning Based Systemic Inflammatory Markers Enable Accurate Additional Surgery for pT1NxM0 Colorectal Cancer: A Comparative Analysis of Two Practical Prediction Models for Lymph Node Metastasis.

Jinlian Jin¹, Haiyan Zhou¹, Shulin Sun¹, Zhe Tian¹, Haibing Ren¹, Jinwu Feng¹.

Abstract

PURPOSE: Predicting lymph node metastasis (LNM) after endoscopic resection is crucial in determining whether patients with pT1NxM0 colorectal cancer (CRC) should undergo additional surgery. This study aimed to develop a predictive model that can reduce the current likelihood of overtreatment. PATIENTS AND METHODS: We recruited a total of 1194 consecutive patients with pT1NxM0 CRC who underwent endoscopic or surgical resection at the Gezhouba Central Hospital of Sinopharm between January 1, 2006, and August 31, 2021. A random forest classifier (RFC) and a generalized linear model (GLM) were each used to screen for the variables that most strongly affected LNM prediction. The area under the curve (AUC) and decision curve analysis (DCA) were applied to assess the accuracy of the predictive models.
RESULTS: Analysis identified the top 10 candidate factors, including depth of submucosal invasion, neutrophil-to-lymphocyte ratio (NLR), platelet-to-lymphocyte ratio (PLR), platelet-to-neutrophil ratio (PNR), venous invasion, poorly differentiated clusters, tumor budding, grade, lymphatic vascular invasion, and background adenoma. The GLM achieved an AUC of 0.79 (95% confidence interval [CI]: 0.30 to 1.28) in the training cohort and a robust AUC of 0.80 (95% CI: 0.36 to 1.24) in the validation cohort. Meanwhile, the RFC exhibited a robust AUC of 0.84 (95% CI: 0.40 to 1.28) in the training cohort and a high AUC of 0.85 (95% CI: 0.41 to 1.29) in the validation cohort. DCAs also showed that the RFC had superior predictive ability.
CONCLUSION: Our supervised learning-based model incorporating histopathologic parameters and inflammatory markers showed more accurate predictive performance than the GLM. This new supervised learning-based predictive model can be used to determine an individually tailored treatment strategy.


Keywords: colorectal cancer; generalized linear model; lymph nodes metastasis; machine learning; pT1NxM0; prediction model; random forest classifier

Year: 2021 PMID: 34880677 PMCID: PMC8645952 DOI: 10.2147/CMAR.S337516

Source DB: PubMed Journal: Cancer Manag Res ISSN: 1179-1322 Impact factor: 3.989

Introduction

CRC is the third most common malignant tumor and is associated with extremely high mortality.1,2 Metastasis is the main cause of cancer-related death.3 According to the current literature, even patients diagnosed with pT1NxM0 CRC have an estimated LNM risk of 10%~15%.4,5 Colonoscopy remains the gold standard for detecting and resecting precancerous colorectal lesions, but it cannot determine the status of the regional lymph nodes. Endoscopic resection is now accepted as a curative therapy for colorectal cancer because it is minimally invasive.6,7 Additional surgical resection after endoscopic resection in patients with CRC can achieve complete staging and reduce the recurrence rate.8 However, endoscopic resection of pT1NxM0 CRC should be used selectively because of the high risk of LNM.9 Therefore, the remaining two-thirds of patients may be exposed to the risks of surgical resection and related postoperative mortality.10 In addition, unnecessary surgical resection brings no clinical benefit. Because LNM cannot be reliably predicted preoperatively, it is difficult to decide on additional surgery after endoscopic resection of pT1NxM0 CRC. There is therefore a pressing need for methods to determine whether pT1NxM0 CRC patients should undergo additional surgery. Supervised learning (SL) is a branch of artificial intelligence that encapsulates statistical and iterative algorithms to make fact queries and complex decision-making possible.11,12 In addition, SL analysis is more effective than the traditional logistic linear regression (LLR) statistical method and can optimize variable screening.13 Therefore, combining SL analysis with medical records for LNM prediction in the early monitoring of patients with pT1NxM0 CRC is worth exploring.
In this study, we aimed to develop an LNM risk prediction model for pT1NxM0 CRC that uses clinical medical data to stratify patients by LNM risk after endoscopic resection. Rapid and accurate LNM risk stratification may enable earlier identification of high-risk patients and more timely intervention, including additional surgery, and may thereby strengthen oncological monitoring at an early stage.

Patients and Methods

Patients

Between January 1, 2006, and August 31, 2021, we prospectively collated data from consecutive patients who had been diagnosed with CRC at the Gezhouba Central Hospital of Sinopharm. This study was approved by the Institutional Ethics Committee of Gezhouba Central Hospital of Sinopharm (Reference No. 2020–006) and complies with the Declaration of Helsinki. Written informed consent was obtained from all participants before any treatment. All patient information was anonymized. The selection criteria were as follows: (1) pathologically diagnosed pT1NxM0 stage; (2) lymph node status that could be fully assessed (imaging and/or pathological specimen); (3) complete medical records that could be traced and consulted. The exclusion criteria were as follows: (1) patients with familial adenomatous polyposis, inflammatory bowel disease, or concurrent advanced colorectal cancer; (2) patients receiving radiotherapy and chemotherapy before surgery; (3) patients with multiple primary colorectal cancer lesions. Initially, 1194 patients who underwent endoscopic or surgical resection were included. Postoperative histopathological examination confirmed that all had pT1NxM0 stage CRC. We divided the patients into two groups. The data of 835 patients were used to train the machine learning model, and the remaining 359 patients were used for model validation. In addition, 717 patients from another tertiary medical center served as a cohort for external validation of the model. The research flowchart is shown in Figure 1.

Figure 1

Flow chart of this study.

Evaluate Evidence of LNM Presence

The status of LNM after endoscopic resection was determined according to patient follow-up and the pathological results of additional surgery. It is worth noting that it is impossible to establish whether LNM was already present at the initial endoscopic resection or appeared locally afterwards. Therefore, among patients with potential risk, those found to be negative for LNM at salvage surgery were considered negative for LNM, while patients diagnosed as positive for LNM during additional surgery were excluded. In addition, for patients with recurrence and multiple metastases who did not undergo salvage surgery, lymph nodes that may have had skip metastases (the presence of discontinuous LNM) were also excluded.

Data Preparation

We collected the clinicopathological data of each patient, including the operation method, postoperative pathological examination, tumor length, tumor pathological type, tumor differentiation, depth of submucosal invasion, neurovascular invasion, and the number of lymph nodes dissected intraoperatively. Factors related to lymph node metastasis included age, gender, preoperative CEA, preoperative CA19-9, treatment method, tumor location, tumor length, tumor pathological type, tumor differentiation, depth of submucosal invasion, vascular invasion, and the number of lymph nodes dissected. Fasting blood samples (3–5 mL of whole blood) were collected from each patient on the morning of the day before endoscopy. We also collected preoperative routine laboratory measurements, including neutrophil count, lymphocyte count, platelet count, and monocyte count. Among the 24 original variables, we eliminated redundant variables through correlation matrix analysis to reduce the bias caused by multicollinearity. This study follows the TRIPOD statement for developing a prediction model for LNM in patients with pT1NxM0 CRC.14
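
The three inflammatory markers used later (NLR, PLR, PNR) are simple ratios of the routine blood counts listed above. The following is a minimal sketch of that derivation, not the authors' code; the class and function names are illustrative, and it assumes all three counts are reported in the same unit (10^9 cells per litre).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BloodCounts:
    """Preoperative counts, assumed to share one unit (10^9 cells/L)."""
    neutrophils: float
    lymphocytes: float
    platelets: float


def inflammatory_ratios(counts: BloodCounts) -> dict:
    """Return NLR, PLR and PNR for one patient."""
    if counts.neutrophils <= 0 or counts.lymphocytes <= 0:
        raise ValueError("neutrophil and lymphocyte counts must be positive")
    return {
        "NLR": counts.neutrophils / counts.lymphocytes,  # neutrophil-to-lymphocyte
        "PLR": counts.platelets / counts.lymphocytes,    # platelet-to-lymphocyte
        "PNR": counts.platelets / counts.neutrophils,    # platelet-to-neutrophil
    }


# Hypothetical patient, not taken from the study data.
example = inflammatory_ratios(
    BloodCounts(neutrophils=3.0, lymphocytes=1.5, platelets=180.0)
)
```

Because each marker is a ratio of two counts measured in the same unit, the result is unit-free.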

Statistical Analysis and Evaluation of Models

Categorical variables are expressed as numbers (%). Continuous variables are expressed as medians and interquartile ranges. Bonferroni-corrected probability values are used to compare qualitative data.15 The Wilcoxon rank-sum test or chi-square test was used to compare differences between groups. The RFC is an ensemble of decision tree models.16 During variable selection, each node is split using the best of a randomly selected subset of explanatory variables or features, and the class predictions generated by each tree are collected. Finally, the candidate variables of the prediction model are determined according to their weight, namely the Gini index. GLM estimation based on β coefficients suffers from coarsened covariates and multicollinearity. Therefore, this study adopts stepwise regression based on the Akaike information criterion (AIC) to screen variables and obtain the optimal subset.17 The performance of the RFC model was compared with that of the GLM using the receiver operating characteristic (ROC) curve, net reclassification improvement (NRI), and decision curve analysis (DCA). The NRI quantifies the net proportion of patients with and without the event of interest who are correctly reclassified as high-risk and low-risk, respectively. All data analysis was performed using the Python programming language (version 3.9.2, Python Software Foundation) and the R Project for Statistical Computing (version 4.0.5). A P value less than 0.05 was considered statistically significant.
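
To make the NRI definition concrete, the sketch below computes a two-category NRI at a single risk cut-off. It is an illustration under stated assumptions rather than the study's implementation: the function name, the 0.5 cut-off, and the toy risks are all hypothetical.

```python
def net_reclassification_improvement(y_true, risk_old, risk_new, threshold):
    """Two-category NRI of a new model over an old one at one risk cut-off.

    y_true holds 1 for patients with the event (LNM) and 0 otherwise.
    """
    up_event = down_event = n_event = 0
    up_nonevent = down_nonevent = n_nonevent = 0
    for y, old, new in zip(y_true, risk_old, risk_new):
        old_high, new_high = old >= threshold, new >= threshold
        moved_up = new_high and not old_high
        moved_down = old_high and not new_high
        if y == 1:
            n_event += 1
            up_event += moved_up
            down_event += moved_down
        else:
            n_nonevent += 1
            up_nonevent += moved_up
            down_nonevent += moved_down
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")
    # Events should move up; non-events should move down.
    nri_events = (up_event - down_event) / n_event
    nri_nonevents = (down_nonevent - up_nonevent) / n_nonevent
    return nri_events + nri_nonevents


# Toy data (hypothetical): two events followed by four non-events.
y = [1, 1, 0, 0, 0, 0]
old = [0.2, 0.6, 0.6, 0.1, 0.3, 0.2]
new = [0.7, 0.8, 0.2, 0.1, 0.6, 0.3]
nri = net_reclassification_improvement(y, old, new, threshold=0.5)
```

In this toy case one of the two events is correctly moved to high risk (+0.5), while among non-events one correct downward move is cancelled by one incorrect upward move (0), giving an NRI of 0.5.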

Results

Patient Epidemiology and Characteristics

According to the established inclusion and exclusion criteria, 1194 pT1NxM0 CRC patients who received surgery or salvage surgery after endoscopic resection between January 1, 2006, and August 31, 2021, in our center were included and randomly divided into a training set (n=835) and an internal validation set (n=359). The demographic and clinical characteristics of the total population and of the training and validation cohorts are shown in Table 1. The details of the external validation cohort were summarized in . In pT1NxM0 CRC patients, the incidence of LNM on final pathological examination was 7.62% in the entire cohort, and 7.54% and 7.80% in the training and validation cohorts, respectively. The incidence of LNM in the external cohort was 7.67%, which was consistent with the results of the internal cohort. Variables that we speculated might be related to LNM were included in a heatmap correlation matrix for analysis. As shown in , a total of 24 variables were examined for correlation with LNM; of these, the depth of submucosal invasion, NLR, PLR, PNR, venous invasion, poorly differentiated clusters, tumor budding, grade, lymphatic vascular invasion, and background adenoma showed a positive correlation with LNM, suggesting that these indicators may serve as effective variables for an LNM prediction model.

Table 1

Baseline Demographic and Clinical Characteristics of the Study Cohort

Variables	Subgroups	Training Cohort				Validation Cohort
		Overall	LNM(-)	LNM(+)	P-value	Overall	LNM(-)	LNM(+)	P-value
		N=835	N=772	N=63		N=359	N=331	N=28	
Sex (%)	Female	178 (21.3)	142 (18.4)	36 (57.1)	<0.001	81 (22.6)	68 (20.5)	13 (46.4)	0.004
	Male	657 (78.7)	630 (81.6)	27 (42.9)		278 (77.4)	263 (79.5)	15 (53.6)
Age, y		48.00 [34.00, 63.00]	48.00 [34.00, 63.00]	47.00 [34.00, 62.00]	0.327	50.00 [36.00, 63.50]	51.00 [36.00, 64.00]	44.50 [30.50, 52.50]	0.037
BMI, kg/m²		24.50 [21.30, 27.80]	24.60 [21.20, 27.90]	23.90 [22.05, 26.40]	0.41	24.50 [21.25, 27.80]	24.50 [21.40, 27.90]	24.45 [20.80, 27.07]	0.506
Smoking (%)	No	413 (49.5)	384 (49.7)	29 (46.0)	0.663	187 (52.1)	175 (52.9)	12 (42.9)	0.411
	Yes	422 (50.5)	388 (50.3)	34 (54.0)		172 (47.9)	156 (47.1)	16 (57.1)
Tumor site (%)	Colon	398 (47.7)	365 (47.3)	33 (52.4)	0.517	166 (46.2)	157 (47.4)	9 (32.1)	0.174
	Rectum	437 (52.3)	407 (52.7)	30 (47.6)		193 (53.8)	174 (52.6)	19 (67.9)
Endoscope type (%)	Non-polypoid	67 (8.0)	52 (6.7)	15 (23.8)	<0.001	27 (7.5)	17 (5.1)	10 (35.7)	<0.001
	Polypoid	768 (92.0)	720 (93.3)	48 (76.2)		332 (92.5)	314 (94.9)	18 (64.3)
Treatment (%)	Endoscopic+surgery	248 (29.7)	226 (29.3)	22 (34.9)	0.424	109 (30.4)	98 (29.6)	11 (39.3)	0.392
	Endoscopic	587 (70.3)	546 (70.7)	41 (65.1)		250 (69.6)	233 (70.4)	17 (60.7)
Grade (%)	High	167 (20.0)	121 (15.7)	46 (73.0)	<0.001	82 (22.8)	61 (18.4)	21 (75.0)	<0.001
	Low	668 (80.0)	651 (84.3)	17 (27.0)		277 (77.2)	270 (81.6)	7 (25.0)
Histology (%)	ADE	688 (82.4)	645 (83.5)	43 (68.3)	0.004	292 (81.3)	276 (83.4)	16 (57.1)	0.002
	M-ADE	147 (17.6)	127 (16.5)	20 (31.7)		67 (18.7)	55 (16.6)	12 (42.9)
DSI (%)	sm1	335 (40.1)	311 (40.3)	24 (38.1)	0.868	157 (43.7)	145 (43.8)	12 (42.9)	0.527
	sm2	360 (43.1)	333 (43.1)	27 (42.9)		139 (38.7)	126 (38.1)	13 (46.4)
	sm3	140 (16.8)	128 (16.6)	12 (19.0)		63 (17.5)	60 (18.1)	3 (10.7)
Background adenoma (%)	No	263 (31.5)	211 (27.3)	52 (82.5)	<0.001	112 (31.2)	88 (26.6)	24 (85.7)	<0.001
	Yes	572 (68.5)	561 (72.7)	11 (17.5)		247 (68.8)	243 (73.4)	4 (14.3)
Lymphovascular invasion (%)	No	581 (69.6)	568 (73.6)	13 (20.6)	<0.001	251 (69.9)	246 (74.3)	5 (17.9)	<0.001
	Yes	254 (30.4)	204 (26.4)	50 (79.4)		108 (30.1)	85 (25.7)	23 (82.1)
Venous invasion (%)	No	641 (76.8)	634 (82.1)	7 (11.1)	<0.001	272 (75.8)	270 (81.6)	2 (7.1)	<0.001
	Yes	194 (23.2)	138 (17.9)	56 (88.9)		87 (24.2)	61 (18.4)	26 (92.9)
Neurovascular invasion (%)	No	150 (18.0)	139 (18.0)	11 (17.5)	1	58 (16.2)	55 (16.6)	3 (10.7)	0.584
	Yes	685 (82.0)	633 (82.0)	52 (82.5)		301 (83.8)	276 (83.4)	25 (89.3)
Tumor budding (%)	No	656 (78.6)	642 (83.2)	14 (22.2)	<0.001	287 (79.9)	282 (85.2)	5 (17.9)	<0.001
	Yes	179 (21.4)	130 (16.8)	49 (77.8)		72 (20.1)	49 (14.8)	23 (82.1)
Poorly differentiated clusters (%)	High	245 (29.3)	218 (28.2)	27 (42.9)	0.016	108 (30.1)	96 (29.0)	12 (42.9)	0.285
	Low	308 (36.9)	294 (38.1)	14 (22.2)		123 (34.3)	116 (35.0)	7 (25.0)
	None	282 (33.8)	260 (33.7)	22 (34.9)		128 (35.7)	119 (36.0)	9 (32.1)
CA199, U/mL		32.00 [22.00, 42.00]	31.00 [21.00, 41.00]	39.00 [33.00, 46.50]	<0.001	31.00 [18.00, 44.00]	32.00 [17.00, 46.00]	37.00 [35.00, 47.00]	<0.001
CEA, ng/mL		2.13 [1.43, 2.84]	2.11 [1.42, 2.81]	2.32 [1.73, 3.18]	<0.001	2.57 [1.23, 3.26]	2.19 [1.58, 3.11]	2.65 [1.15, 3.25]	<0.001
Neutrophil count, 10^9/L		3.02 [2.29, 3.66]	2.90 [2.25, 3.52]	4.85 [4.06, 5.25]	<0.001	3.02 [2.36, 3.62]	2.89 [2.30, 3.54]	4.33 [3.72, 4.89]	<0.001
Lymphocyte count, 10^9/L		1.71 [1.25, 2.17]	1.65 [1.23, 2.13]	2.12 [1.87, 2.36]	<0.001	1.74 [1.33, 2.14]	1.66 [1.29, 2.10]	2.00 [1.78, 2.54]	<0.001
Platelet count, 10^9/L		185.00 [119.50, 242.00]	188.00 [120.00, 245.00]	165.00 [102.00, 215.50]	0.007	168.00 [118.00, 228.00]	168.00 [117.50, 231.50]	162.00 [126.25, 208.00]	0.695
NLR		1.75 [1.35, 2.38]	1.70 [1.33, 2.33]	2.21 [1.79, 2.64]	<0.001	1.73 [1.39, 2.28]	1.71 [1.35, 2.25]	2.14 [1.64, 2.39]	0.009
PLR		106.40 [72.27, 144.20]	109.18 [74.20, 150.56]	73.68 [47.86, 103.30]	<0.001	99.10 [65.83, 138.02]	100.00 [66.32, 143.53]	85.40 [57.88, 110.94]	0.008
PNR		60.35 [40.21, 85.02]	64.12 [42.42, 87.09]	34.75 [21.22, 43.61]	<0.001	56.30 [37.78, 78.96]	60.77 [40.02, 80.99]	37.27 [28.59, 45.60]	<0.001

Abbreviations: BMI, body mass index; ADE, adenocarcinoma; M-ADE, mucinous adenocarcinoma; DSI, depth of submucosal invasion; CA199, carbohydrate antigen 19-9; CEA, carcinoembryonic antigen; NLR, neutrophil-to-lymphocyte ratio; PLR, platelet-to-lymphocyte ratio; PNR, platelet-to-neutrophil ratio.
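
The LNM incidence figures quoted in the text can be checked against the cohort sizes and LNM(+) counts in the header rows of Table 1. The short sketch below does that arithmetic; the variable and function names are illustrative only.

```python
# Cohort sizes and LNM(+) counts copied from the header rows of Table 1.
cohorts = {
    "training": {"n": 835, "lnm_positive": 63},
    "validation": {"n": 359, "lnm_positive": 28},
}


def incidence_percent(lnm_positive: int, n: int) -> float:
    """LNM incidence as a percentage, rounded to two decimals."""
    return round(100.0 * lnm_positive / n, 2)


incidence = {
    name: incidence_percent(c["lnm_positive"], c["n"])
    for name, c in cohorts.items()
}

# The entire internal cohort is the union of the two subsets.
total_n = sum(c["n"] for c in cohorts.values())
total_positive = sum(c["lnm_positive"] for c in cohorts.values())
incidence["entire"] = incidence_percent(total_positive, total_n)
```

The counts reproduce the reported 7.54% (training), 7.80% (validation), and 7.62% (entire cohort, 91 of 1194 patients).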


Variables Selection and Construction of RFC Model

A total of 835 patients in the training set were used to fit the random forest algorithm. The samples were randomly allocated to non-overlapping subsets, and the RFC prediction model was established after ten-fold cross-validation. As shown in Figure 2A, a total of 19 variables were ranked by weight and included in the RFC model. The detailed Gini index for each variable is shown in . As shown in the scree plot in Figure 2B, the robustness of the RFC model constructed with the above variables was relatively satisfactory. In addition, the RFC clearly stratified patients by risk (Figure 2C). Taken together, these results indicate that the RFC model is both robust and accurate in predicting LNM.
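
The Gini-based ranking rests on how much a split on a variable reduces node impurity. As a rough illustration only, the sketch below computes the Gini decrease of a single binary split using the training-cohort counts from Table 1. A random forest averages such decreases over many trees and bootstrap samples, so these single-split values are not the mean decrease Gini shown in Figure 2A; the function names are hypothetical.

```python
def gini_impurity(n_negative: int, n_positive: int) -> float:
    """Gini impurity of a two-class node: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    total = n_negative + n_positive
    if total == 0:
        raise ValueError("empty node")
    p = n_positive / total
    return 2.0 * p * (1.0 - p)


def gini_decrease(left, right) -> float:
    """Impurity decrease of one binary split.

    left and right are (n_LNM_negative, n_LNM_positive) tuples for the children.
    """
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    parent = gini_impurity(left[0] + right[0], left[1] + right[1])
    children = (n_left / n) * gini_impurity(*left) \
        + (n_right / n) * gini_impurity(*right)
    return parent - children


# Training-cohort counts from Table 1: (LNM(-), LNM(+)) for No / Yes.
venous_invasion = gini_decrease(left=(634, 7), right=(138, 56))
smoking = gini_decrease(left=(384, 29), right=(388, 34))
```

On these counts a split on venous invasion removes far more impurity than a split on smoking, which matches the intuition behind ranking variables by Gini decrease.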

Figure 2

Development and verification of the RFC model. (A) The influencing factors of LNM were ordered according to the mean decreased Gini index. (B) Ten-fold cross-validation of the performance of the prediction model. (C) Clinical impact curve for the evaluation of RFC model.

Construction of GLM and Variable Iteration

A model depends on parameters estimated from observations, and the best model must be determined based on the available observations.17 Therefore, we used AIC-based variable screening and obtained the following variables for the construction of the GLM: poorly differentiated clusters, NLR, background adenoma, PLR, tumor budding, venous invasion, and depth of submucosal invasion. The C-index and Brier score of each AIC-based model were summarized in . As shown in Figure 3A, AIC filtering confirmed the stability and potential practicability of model 1 for GLM construction. The constructed nomogram showed relatively satisfactory agreement between predicted and observed outcomes, although it was slightly inferior to the RFC model (Figure 3B and C).
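
For reference, the AIC trades goodness of fit against model size: AIC = 2k - 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood. The sketch below shows that calculation for a binary-outcome model and how candidate models would be ranked. It is an assumption-labelled illustration: the predicted probabilities and candidate names are invented, and no model fitting is performed.

```python
import math


def bernoulli_log_likelihood(y_true, p_pred) -> float:
    """Log-likelihood of binary outcomes under predicted event probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, p_pred)
    )


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion; lower values are preferred."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical outcomes and fitted probabilities from two candidate models.
y = [1, 0, 0, 1, 0]
candidates = {
    "small_model": {"k": 2, "p": [0.7, 0.3, 0.2, 0.6, 0.3]},
    "large_model": {"k": 5, "p": [0.72, 0.28, 0.2, 0.62, 0.3]},
}
scores = {
    name: aic(bernoulli_log_likelihood(y, c["p"]), c["k"])
    for name, c in candidates.items()
}
best = min(scores, key=scores.get)
```

Here the larger model fits only marginally better, so the penalty of 2 per extra parameter makes the smaller model the AIC-preferred choice; stepwise selection repeats this comparison as variables are added or removed.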

Figure 3

Nomogram to estimate the risk of LNM. (A) Nomogram used to predict LNM risk, showing the proportion of parameters included in the scoring table (%). (B) Calibration curves for internal validation of the nomogram. (C) Predicted risk histogram comparing predicted risk of the nomogram with the observed frequency.

Comparison of the Effectiveness of Two Predictive Models

The AUCs of the two prediction models are shown in Figure 4. Compared with the GLM, the RFC yielded higher AUCs in the training set and the validation set, which were 0.84 and 0.85, respectively. In addition, the performance of the RFC in the external cohort was consistent with that in the internal data set. In DCA, the RFC also yielded a higher net benefit than the GLM for predicting LNM (Figure 5). Collectively, the RFC predicted LNM in patients with pT1NxM0 CRC more accurately than the GLM.
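
The AUC can be read as the probability that a randomly chosen LNM-positive patient receives a higher predicted risk than a randomly chosen LNM-negative patient, with ties counted as one half. A minimal sketch of that pairwise definition follows; the function name and toy scores are illustrative and are not taken from the study.

```python
def auc(y_true, scores) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    # Count positive-negative pairs ranked correctly; ties count as 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical risks): three of four pairs are ordered correctly.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

This quadratic-time form is fine for illustration; rank-based implementations give the same value more efficiently on large cohorts.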

Figure 4

The ROC curve analyses for models in the study cohort. (A) Internal training set. (B) Internal testing set. (C) External validation set.

Figure 5

Decision curve analysis compares the net benefits associated with predicting LNM using RFC and GLM models. (A) Internal training set. (B) Internal testing set. (C) External validation set.
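
Decision curves such as those in Figure 5 plot net benefit against the threshold probability at which a clinician would opt for additional surgery. The sketch below uses the standard net benefit formula, TP/n - (FP/n) x pt/(1 - pt), and the "treat all" reference strategy. It is a sketch under stated assumptions: the function names, threshold, and toy data are hypothetical.

```python
def net_benefit(y_true, risk, threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    true_pos = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    false_pos = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold: float) -> float:
    """Reference strategy: every patient undergoes additional surgery."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# Toy data (hypothetical): two patients with LNM, three without.
y = [1, 1, 0, 0, 0]
risk = [0.9, 0.3, 0.6, 0.1, 0.2]
model_nb = net_benefit(y, risk, threshold=0.5)
treat_all_nb = net_benefit_treat_all(y, threshold=0.5)
```

A model is clinically useful at a given threshold when its net benefit exceeds both the "treat all" value and zero, the net benefit of treating no one.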


Discussion

With the extensive development of population-based CRC screening programs and the latest progress of endoscopic diagnosis, the number of endoscopic resections in patients with pT1NxM0 CRC is increasing.10 Previous studies have shown that endoscopic resection before surgical resection of pT1NxM0 CRC has no adverse effect on the prognosis.10,18,19 In short, provided that endoscopy will not lead to tumor diffusion or tumor resection can be carried out directly with the help of endoscopy, the patient can avoid extra surgery and gain many benefits, such as reducing incidence rate, shortening recovery period, and improving the quality of life.20 However, it is difficult to determine the appropriate indication for deciding whether to perform additional surgery or not, because clinicians should not only consider the probability of LNM but also consider surgical complications, postoperative quality of life, and patient’s personal choices. According to existing guidelines, the tumor indications for endoscopic resection of pT1NxM0 CRC mainly depend on the probability of occurrence of LNM.21,22 Although a more accurate LNM prediction system is needed to guide subsequent treatment, the risk stratification of LNM remains controversial. 
Previous studies have investigated the histopathological predictors of LNM in pT1NxM0 CRC, and various risk factors have been identified.10,20,23,24 Jung R.O et al reported that vascular invasion, high-grade histology, submucosal invasion, budding, and background adenoma were independent risk factors for LNM.20 Cracco N et al reported that the width and the area of submucosal invasion were both reliable prognostic factors for LNM in pT1NxM0 CRC.24 Mou S et al examined the strength of evidence that well-differentiated nonpedunculated pT1NxM0 CRC invading the submucosa ≤1000 μm, without lymphovascular involvement or tumor budding, has the lowest risk of nodal metastasis.25 Previous studies have shown that inflammation plays an important role in the occurrence and development of colorectal cancer.26 NLR in particular may be more reliable than neutrophil count, lymphocyte count, or platelet count alone, because individual counts are vulnerable to many factors.27 In addition, new evidence provides a link between inflammation and cancer development.28 It is not surprising that cancer-intrinsic or cancer-induced inflammation can be triggered by cancer-initiating mutations and can promote malignant progression through the recruitment and activation of inflammatory cells.29,30 Both exogenous and endogenous inflammation can lead to immunosuppression, which provides an ideal background for the occurrence of tumors.31 Consistent with previous reports, our study also identified several candidate factors associated with LNM, including depth of submucosal invasion, NLR, PLR, PNR, venous invasion, poorly differentiated clusters, tumor budding, grade, lymphatic vascular invasion, and background adenoma. By combining these candidate variables, this study aimed to develop and validate a better model for predicting LNM in pT1NxM0 CRC. In this study, we successfully determined the rank order of risk factors for LNM prediction.
Machine learning classification is among the most important computational developments of recent years for meeting clinicians' need for automated early diagnosis. As an important branch of supervised learning, the RFC model has been successfully applied to high-dimensional multi-source data reduction in many scientific fields.32 Mature supervised learning classifiers, including support vector machines, random forests, convolutional neural networks, and decision trees, have been gradually applied in clinical practice. Consistent with previous reports, we found that the RFC model has advantages over the traditional linear regression model for feature selection and classification. The RFC is a classification tree analysis that can model potentially complex relationships, including nonlinearities and interactions in the data, but rarely provides information about the prediction process.33 The RFC allows the risk level to be calculated from all the variables collected from the medical records. Herein, we adopted the “bagging” procedure in the RFC for the selection of observations and variables. In other words, an RFC is a combination of tree predictors in which each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.34 Based on the candidate variables obtained by multi-layer iteration, we constructed the LNM prediction model. The algorithm is more accurate, thereby improving the predictive performance of diagnosis. Interestingly, the variables obtained by RFC screening are almost the same as those obtained by the regression algorithm, but the predictive performance differs considerably. Taken together, we have obtained a robust model of LNM prediction based on two different algorithms. This model can predict the risk of LNM in patients with pT1NxM0 CRC in a timely manner and thus better guide clinical triage.
In this study, we hope to use this model to predict whether patients with a specific set of characteristics have a high chance of LNM, so that testing can be considered and patients can be advised to take preventive measures to reduce their risk. We evaluated the outputs of the two risk prediction models separately to determine their performance and to identify possible improvements to the algorithms. According to the model predictive performance evaluation rules, a predictive model with an AUC greater than 0.75 is considered to have good discriminative ability.35 Meanwhile, DCA was used to evaluate the utility of the decision models.36 Compared with the nomogram, the RFC achieved a higher AUC and a higher net benefit on DCA, which suggests that the RFC, as a supervised learning algorithm, can be used to assess lymph node status and performs better than the GLM. We acknowledge that this study has some limitations. First, this is a retrospective cohort study based on clinical records, so the findings are inevitably subject to inherent selection bias. Second, this study was based on data from two tertiary treatment centers, so repeated validation using data from more clinical centers is necessary. Third, both models were based on clinically collectable variables; molecular markers, such as immunological diagnostic biomarkers and genetic analysis, remain to be screened and explored. Collectively, other potential biomarkers and different methods need to be explored to improve predictability.

Conclusion

The RFC model developed in this study was shown to be a potentially useful tool in determining the percentage risk and predicting the possibility of LNM in patients with pT1NxM0 CRC. As such, it may be useful for clinicians to use in combination with other biomarkers to determine which patients need additional surgery to avoid progression, as well as to avoid the additional risks of surgery.

35 in total

1. Machine Learning for Health Services Researchers.

Authors: Patrick Doupe; James Faghmous; Sanjay Basu
Journal: Value Health Date: 2019-07 Impact factor: 5.725

Review 2. When to use the Bonferroni correction.

Authors: Richard A Armstrong
Journal: Ophthalmic Physiol Opt Date: 2014-04-02 Impact factor: 3.117

3. Artificial intelligence may help in predicting the need for additional surgery after endoscopic resection of T1 colorectal cancer.

Authors: Katsuro Ichimasa; Shin-Ei Kudo; Yuichi Mori; Masashi Misawa; Shingo Matsudaira; Yuta Kouyama; Toshiyuki Baba; Eiji Hidaka; Kunihiko Wakamura; Takemasa Hayashi; Toyoki Kudo; Tomoyuki Ishigaki; Yusuke Yagawa; Hiroki Nakamura; Kenichi Takeda; Amyn Haji; Shigeharu Hamatani; Kensaku Mori; Fumio Ishida; Hideyuki Miyachi
Journal: Endoscopy Date: 2017-12-22 Impact factor: 10.093

4. Rectal cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up.

Authors: R Glynne-Jones; L Wyrwicz; E Tiret; G Brown; C Rödel; A Cervantes; D Arnold
Journal: Ann Oncol Date: 2017-07-01 Impact factor: 32.976

Review 5. Inflammation and cancer.

Authors: Lisa M Coussens; Zena Werb
Journal: Nature Date: 2002 Dec 19-26 Impact factor: 49.962

Review 6. Pathologic predictive factors for lymph node metastasis in submucosal invasive (T1) colorectal cancer: a systematic review and meta-analysis.

Authors: Shanshan Mou; Roy Soetikno; Tadakasu Shimoda; Robert Rouse; Tonya Kaltenbach
Journal: Surg Endosc Date: 2013-02-08 Impact factor: 4.584

Review 7. Supervised Machine Learning: A Brief Primer.

Authors: Tammy Jiang; Jaimie L Gradus; Anthony J Rosellini
Journal: Behav Ther Date: 2020-05-16

8. Protection from right- and left-sided colorectal neoplasms after colonoscopy: population-based study.

Authors: Hermann Brenner; Michael Hoffmeister; Volker Arndt; Christa Stegmaier; Lutz Altenhofen; Ulrike Haug
Journal: J Natl Cancer Inst Date: 2009-12-30 Impact factor: 13.506

9. Development of Europe-Wide Models for Particle Elemental Composition Using Supervised Linear Regression and Random Forest.

Authors: Jie Chen; Kees de Hoogh; John Gulliver; Barbara Hoffmann; Ole Hertel; Matthias Ketzel; Gudrun Weinmayr; Mariska Bauwelinck; Aaron van Donkelaar; Ulla A Hvidtfeldt; Richard Atkinson; Nicole A H Janssen; Randall V Martin; Evangelia Samoli; Zorana J Andersen; Bente M Oftedal; Massimo Stafoggia; Tom Bellander; Maciej Strak; Kathrin Wolf; Danielle Vienneau; Bert Brunekreef; Gerard Hoek
Journal: Environ Sci Technol Date: 2020-11-25 Impact factor: 9.028

Review 10. Artificial Intelligence and Machine Learning in Pathology: The Present Landscape of Supervised Methods.

Authors: Hooman H Rashidi; Nam K Tran; Elham Vali Betts; Lydia P Howell; Ralph Green
Journal: Acad Pathol Date: 2019-09-03