
Machine learning for lymph node metastasis prediction in patients with gastric cancer: A systematic review and meta-analysis.

Yilin Li, Fengjiao Xie, Qin Xiong, Honglin Lei, Peimin Feng.

Abstract

Objective: To evaluate the diagnostic performance of machine learning (ML) in predicting lymph node metastasis (LNM) in patients with gastric cancer (GC) and to identify predictors applicable to the models.
Methods: PubMed, EMBASE, Web of Science, and the Cochrane Library were searched from inception to March 16, 2022. The pooled c-index and accuracy were used to assess diagnostic performance. Subgroup analysis was performed based on ML types. Meta-analyses were performed using random-effects models. Risk of bias was assessed using the PROBAST tool.
Results: A total of 41 studies (56182 patients) were included; 33 of the studies divided the participants into a training set and a test set, while the rest had only a training set. The c-index of ML for LNM prediction was 0.837 [95%CI (0.814, 0.859)] in the training set and 0.811 [95%CI (0.785, 0.838)] in the test set. The pooled accuracy was 0.781 [95%CI (0.756, 0.805)] in the training set and 0.753 [95%CI (0.721, 0.783)] in the test set. Subgroup analyses by ML algorithm and by GC stage showed no significant differences. In contrast, in the subgroup analysis by predictor type, models that included radiomics had better accuracy in the training set than models with only clinical predictors (F = 3.546, p = 0.037). Additionally, tumor size, depth of tumor invasion and histological differentiation were the three most commonly used features in the prediction models.
Conclusion: ML showed excellent diagnostic performance in predicting LNM in GC. Models that incorporated radiomics showed good accuracy in predicting the risk of LNM in GC. However, the results revealed some methodological limitations in model development. Future studies should focus on refining existing models to improve the accuracy of LNM prediction. Systematic Review Registration: https://www.crd.york.ac.uk/PROSPERO/, identifier CRD42022320752.
Copyright © 2022 Li, Xie, Xiong, Lei and Feng.


Keywords:  Machine learning; gastric cancer; lymph node metastasis; meta-analysis; systematic review

Year:  2022        PMID: 36059703      PMCID: PMC9433672          DOI: 10.3389/fonc.2022.946038

Source DB:  PubMed          Journal:  Front Oncol        ISSN: 2234-943X            Impact factor:   5.738


Background

Gastric cancer (GC) is the fifth most common malignancy and the third leading cause of cancer-associated death worldwide (1–3). Lymph node metastasis (LNM) is one of the most sensitive prognostic factors for patients with GC (4–6). Patients at different lymph node stages may require different degrees of lymphadenectomy or neoadjuvant therapy (7–10), and typically have different outcomes. Therefore, it is necessary to accurately predict and evaluate LNM before making treatment decisions (11, 12). Non-invasive imaging modalities, including computed tomography (CT), functional magnetic resonance imaging (fMRI), and B-ultrasonography, have been widely applied to evaluate lymph node status in GC patients. However, the performance of these techniques remains controversial with respect to sensitivity, specificity, and accuracy (13–19). Sentinel lymph node (SLN) biopsy is an invasive approach that has also been adopted for LNM detection in GC (20), but its effectiveness is still debated. Kitagawa et al. (21) and Miyashiro et al. (22) applied two different SLN biopsy methods and reported very different false negative rates (7% and 46.4%, respectively). Endoscopic ultrasonography combined with fine needle aspiration can be used for local lymph node staging and LNM diagnosis, although endoscopic ultrasonography fails to detect distant metastases (23). Several new molecular biomarkers have been found useful for predicting LNM in GC, but their application is limited by high cost and complex technological requirements (24, 25). Thus, although multiple methods have the potential to diagnose LNM, their performance is constrained by many limitations and uncertainties, and a more applicable and effective method for identifying LNM status is urgently needed.
Machine learning (ML) is an emerging technique capable of processing raw data, analyzing important connections within the data, and making accurate decisions (26, 27). Compared with conventional statistical methods, ML models have higher prediction accuracy (28, 29). ML has important application value in assisting disease diagnosis and prognosis prediction by processing massive and complex medical data (30, 31). ML has been increasingly applied to LNM prediction in GC patients (32–72). However, different types of ML prediction models differ greatly in both the predictors included and the calculation methods used (73, 74), and the results produced by different models are far from unanimous (75). More importantly, no systematic review and meta-analysis has yet assessed ML for LNM prediction in GC patients. Therefore, we reviewed and synthesized the related published studies to evaluate the accuracy of ML models for LNM prediction in GC patients.

Methods

This systematic review and meta-analysis was performed following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (76). The study was registered on PROSPERO (Registration No. CRD42022320752).

Literature retrieval strategy

PubMed, EMBASE, Web of Science, and the Cochrane Library were systematically searched from inception to March 16, 2022 for all related published articles. Search terms mainly included “stomach neoplasms,” “machine learning,” and “lymphatic metastasis.” References of included articles were also searched manually for potentially eligible studies. Detailed search procedures and strategy are presented in .

Inclusion criteria

The studies were selected according to the following criteria: (1) patients were diagnosed with GC based on histopathological examination; (2) cohort study published in English, with the full text available; (3) reported an assessment of the performance of an ML algorithm for LNM prediction; (4) clearly described the ML models and predictor variables used; (5) reported the prediction performance indices of the ML models and included sufficient data to infer the c-index and/or accuracy. Studies meeting any of the following criteria were excluded: (1) limited sample size (fewer than 100); (2) letter, editorial, animal study, review, conference summary, consensus, case report, or guideline; (3) research focusing on identifying or analyzing individual predictors rather than on the development and/or verification of models; (4) model performance measurements were not reported; (5) the model building process or method was not described.
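
Criterion (5) requires enough reported data to infer accuracy. As a minimal sketch of one such inference (not necessarily the procedure used in this review), accuracy can be rebuilt from sensitivity, specificity and the numbers of LNM-positive and LNM-negative patients, because accuracy = (TP + TN) / N. The function name and the example numbers are hypothetical.

```python
def accuracy_from_sens_spec(sensitivity, specificity, n_positive, n_negative):
    """Overall accuracy = (TP + TN) / N, rebuilt from sensitivity and specificity."""
    true_positives = sensitivity * n_positive   # LNM-positive patients predicted correctly
    true_negatives = specificity * n_negative   # LNM-negative patients predicted correctly
    return (true_positives + true_negatives) / (n_positive + n_negative)


# Hypothetical study: 60 LNM-positive and 140 LNM-negative patients.
acc = accuracy_from_sens_spec(sensitivity=0.80, specificity=0.75,
                              n_positive=60, n_negative=140)
print(round(acc, 3))  # (48 + 105) / 200 = 0.765
```

Because accuracy depends on the share of LNM-positive patients, two studies with identical sensitivity and specificity can report different accuracies.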

Data extraction

Data extraction was performed independently by two reviewers (YL and FX). The list of extraction items was based on a modified version of the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) (77). Discrepancies were resolved by a third reviewer (PF). The following data were extracted: (1) study characteristics (authors, publication date, study design, and country or region); (2) cohort characteristics (number of participants, number of patients with positive LNM, and cancer stages); (3) model characteristics (feature selection algorithms, number and types of predictors in the final model, types of ML prediction model, and model validation and application); (4) prediction outcomes, including c-index, accuracy, sensitivity, and specificity.

Risk of bias assessment

Risk of bias and applicability of the included studies were assessed using the Prediction Model Risk of Bias Assessment Tool (PROBAST) (78), which includes four domains: participants, predictors, outcomes, and analysis. Risk of bias in each study was assessed based on the four domains, while applicability was evaluated based on the first three domains. Each study was graded as “high risk”, “low risk”, or “unclear risk” (78). This process was conducted and cross-checked independently by two reviewers (YL and FX). Discrepancies were settled by the third reviewer (PF).

Statistical analysis

Data analysis was performed using R Statistical Software (version 4.1.1) with the ‘matrix’, ‘metafor’ and ‘meta’ packages (79, 80). Subgroups were set based on ML algorithms. The c-index and accuracy for LNM prediction in GC patients obtained from each included study were analyzed with 95% confidence intervals (95% CIs). For studies that did not report a c-index, we calculated it by plotting receiver operating characteristic (ROC) curves based on the reported probability distributions. The results from all included studies were pooled, and an overall effect was estimated using a random-effects model, which accounts for heterogeneity among studies (81). We used one-way ANOVA to compare the c-index and accuracy between the training group and the test group.
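
The pooling step can be illustrated with a short sketch. The analysis itself was run in R; the Python below is only an illustration, and it makes two assumptions that the text does not state: that c-indices are pooled on the log scale, and that the between-study variance is estimated with the DerSimonian-Laird method. The function names and the three example studies are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% CI


def se_from_ci(lower, upper):
    """Log-scale standard error recovered from a 95% CI given on the raw scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def pool_random_effects(estimates, ses):
    """DerSimonian-Laird random-effects pooling (an assumed estimator)."""
    w = [1 / se ** 2 for se in ses]                       # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0       # between-study variance
    w_re = [1 / (se ** 2 + tau2) for se in ses]           # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2


# Hypothetical studies: (c-index, lower 95% limit, upper 95% limit).
studies = [(0.80, 0.74, 0.86), (0.85, 0.80, 0.90), (0.78, 0.70, 0.87)]
log_c = [math.log(c) for c, _, _ in studies]
ses = [se_from_ci(lo, hi) for _, lo, hi in studies]

pooled, se_pooled, tau2 = pool_random_effects(log_c, ses)
pooled_c = math.exp(pooled)
ci = (math.exp(pooled - Z_95 * se_pooled), math.exp(pooled + Z_95 * se_pooled))
print(f"pooled c-index {pooled_c:.3f}, 95% CI {ci[0]:.3f}-{ci[1]:.3f}, tau^2 {tau2:.4f}")
```

When the between-study variance is estimated as zero, the random-effects weights reduce to the fixed-effect weights, so the two models give the same pooled value.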

Results

Study selection

A total of 2582 articles were identified by searching the four databases, of which 1210 were excluded as duplicates and 1126 were excluded after screening titles and abstracts. The full texts of the remaining 246 articles were read, and 205 were excluded for the reasons specified in Figure 1. Finally, a total of 41 studies were included (32–72).
Figure 1

The PRISMA flow diagram for study selection.


Characteristics of included studies

Thirty-five studies (85.4%) were conducted in China (32, 33, 35–43, 45–49, 51–54, 56–63, 65–69, 71, 72), 5 (12.2%) in Korea (34, 44, 50, 55, 64) and 1 (2.4%) in Germany (70), with publication dates ranging from 2004 to 2022. The number of studies using ML for LNM prediction has gradually increased since 2018 (Figure 2). There were 35 retrospective studies (33, 35–53, 56–59, 61–70, 72) and 6 prospective studies (32, 34, 54, 55, 60, 71), with a total of 56182 participants, of whom 12031 had LNM. Characteristics of the included studies are presented in Table 1.
Figure 2

Distribution of studies by the year of publication.

Table 1

Characteristics of included studies.

Study | Country | Study design | Stage | No. patients in the train set | No. patients in the test set | Technique used for feature selection | Types of machine learning | Data source
Xiao-Peng Zhang (2011) | China | Retro | Early GC, Borrmann I-IV | 175 | NA | LR | SVM | Single institution
Song Liu (2021) | China | Retro | Stages I-IV | 122 | 41 | LASSO | SVM, LR | Single institution
C Jin (2021) | China | Retro | Stages I-IV | 1172 | 527 | LR, RF | DL | Multiple institution
Xiaoxiao Wang (2021) | China | Retro | T1-2 | 80 | 79 | LR | LR | Single institution
Xiao-Yi Yin (2020) | China | Retro | T1a, T1b | 596 | 227 | LR | LR | Single institution
Bang Wool Eom (2016) | Korea | Retro | T1a, T1b | 336 | NA | LR | LR | Single institution
Zhixue Zheng (2015) | China | Retro | T1a, T1b | 262 | NA | LR | LR | Single institution
Jing Li (2020) | China | Retro | Borrmann I-IV | 136 | 68 | LR | DL | Single institution
Zhengbing Wang (2021) | China | Retro | T1a, T1b | 363 | 140 | LR | LR | Single institution
HuaKai Tian (2022) | China | Retro | T1a, T1b | 2294 | 227 | LR | GLM, RPART, RF, GBM, SVM, RDA, ANN | Multiple institution
Zhixue Zheng (2016) | China | Retro | T1a, T1b | 597 | NA | LR | LR | Single institution
Yu Mei (2021) | China | Pros | T1a, T1b | 794 | 418 | LR | LR | Single institution
Jing Li (2018) | China | Retro | Borrmann I-IV | 140 | 70 | LR | LR | Single institution
Su Mi Kim (2020) | Korea | Pros | T1a, T1b | 10579 | 2100 | LR | LR | Single institution
Miaoquan Zhang (2021) | China | Retro | T1a, T1b | 285 | NA | LR | LR | Single institution
Yuming Jiang (2019) | China | Retro | Stages I-IV | 312 | 1377 | LR | LR | Multiple institution
Qiu-Xia Feng (2019) | China | Retro | Stages I-IV | 326 | 164 | SVM | SVM | Single institution
Jianfeng Mu (2019) | China | Retro | T1a, T1b | 746 | 126 | LR | LR | Single institution
Shilong Li (2021) | China | Retro | Stages I-IV | 144 | 151 | LR | LR | Single institution
Yue Wang (2020) | China | Retro | NA | 197 | 50 | LR | RF | Single institution
Chun Guang Guo (2016) | China | Retro | T1a, T1b | 256 | 1273 | LR | LR | Multiple institution
Cheng-Mao Zhou (2021) | China | Pros | T1a, T1b | 818 | 351 | GBDT | GBDT, XGB, RF, LR, XGB+LR, RF+LR, GBDT+LR | Single institution
Xujie Gao (2021) | China | Retro | T1a, T1b | 308 | 155 | LR | LR | Single institution
Xujie Gao (2020) | China | Retro | Stages I-IV | 486 | 240 | LR | LR | Single institution
Xu Wang (2021) | China | Retro | T1-4 | 250 | 99 | LR | LR | Single institution
Siwei Pan (2021) | China | Retro | T1a, T1b | 1274 | 637 | LR | LR | Multiple institution
Wujie Chen (2019) | China | Retro | T2-4 | 71 | 75 | LR | LR | Single institution
Bong-Il Song (2020) | Korea | Retro | T1-4 | 377 | 189 | LR | LR | Single institution
Chao Huang (2020) | China | Retro | NA | 466 | NA | RF | DT | Single institution
Lili Wang (2021) | China | Retro | T2-4 | 340 | 175 | LR | LR | Single institution
Seokhwi Kim (2021) | Korea | Retro | T1a | 28 | 108 | LR | Bayesian | Multiple institution
Qiufang Liu (2021) | China | Retro | NA | 185 | NA | RF | DL | Single institution
Wannian Sui (2021) | China | Retro | T1a, T1b | 1496 | 246 | LR | LR | Multiple institution
Dexin Chen (2019) | China | Retro | T1a, T1b | 232 | 143 | LR | LR | Multiple institution
Lingwei Meng (2021) | China | Retro | T1-4 | 377 | 162 | LASSO | LR | Multiple institution
D Dong (2020) | China | Retro | T2-4 | 225 | 505 | Multivariable linear regression analysis, SVM | DL | Multiple institution
Zepang Sun (2021) | China | Pros | T1-4 | 531 | 1087 | LR | LR | Multiple institution
Ji-Eun Na (2022) | Korea | Pros | T1a, T1b | 10332 | 4428 | LR, SVM, RF | LR, SVM, RF | Single institution
Haixing Zhu (2022) | China | Retro | T1a, T1b | 1878 | 470 | DT, GBM, LR, ANN, RF, XGBOOST | DT, GBM, LR, ANN, RF, XGBOOST | Multiple institution
Elfriede H Bollschweiler (2004) | Germany | Retro | Stages I-IV | 135 | NA | ANN | ANN | Single institution
Yinan Zhang (2018) | China | Pros | T1a, T1b | 272 | 81 | LR | LR | Single institution

ANN, artificial neural network; DL, deep learning; DT, decision tree; GBM, gradient boosting machine; GC, gastric cancer; GLM, generalized linear model; LASSO, Least Absolute Shrinkage and Selection Operator; LR, logistic regression; NA, not available; No., number; Pros, prospective; RDA, regularized dual averaging; Retro, retrospective; RF, random forest; SVM, support vector machine; XGBOOST, extreme gradient boosting.


Characteristics of machine learning in included studies

A total of 61 models were retrieved from the included studies (1 to 7 models per study), with various modelling methods applied. The most frequently used ML algorithms were logistic regression (LR) (n=30; 49.18%), support vector machine (SVM) (n=5; 8.2%), deep learning (DL) (n=4; 6.56%), and random forest (RF) (n=4; 6.56%). Feature selection is an important step in ML training. The number of features used in the models varied from 2 to 21, and Figure 3 summarizes the 15 most common features. The most commonly used predictors were tumor size (n=35; 14.96%), depth of tumor invasion (n=32; 13.68%), histology differentiation (n=20; 8.55%), imaging techniques (n=17; 7.26%), lymphovascular invasion (n=17; 7.26%), tumor location (n=14; 5.98%), CT-reported LN (n=11; 4.7%), age (n=8; 3.42%), macroscopic features (n=8; 3.42%), and CA199 (n=7; 2.99%).
Figure 3

15 most frequently used predictors in 61 prediction models for gastric cancer patients.


Risk of bias and applicability assessment

All the included studies had a low risk of bias in the participants and outcome domains. In the predictors domain, 19 studies (46.34%) had a low risk of bias (32, 34, 37, 38, 40, 42, 44, 54–56, 60, 62, 63, 66–71), and 22 (53.66%) had an unclear risk of bias because predictors were assessed with knowledge of the outcome data (33, 35, 36, 39, 41, 43, 45–53, 57–59, 61, 64, 65, 72). In the analysis domain, the risk of bias was considered high in 16 studies (33, 35, 41, 43, 47, 50, 51, 53, 58, 59, 61, 63, 65, 68, 70, 71) for the following reasons: (1) insufficient sample size, as eight of the studies did not meet the standard of including at least 100 participants; (2) selection of predictors based on univariable analysis; (3) lack of external validation, as eight studies lacked external validation in model development (35, 50, 51, 53, 59, 63, 65, 70). Concern regarding ‘overall applicability’ was rated as low in 15 studies (36.59%) (32, 34, 37, 38, 40, 42, 44, 54–56, 60, 62, 66, 67, 69), high in 16 studies (39.02%) (33, 35, 41, 43, 47, 50, 53, 58, 59, 61, 63, 65, 68, 70, 71) and unclear in the remaining 10 (24.39%) (36, 39, 45, 46, 48, 49, 52, 57, 64, 72). The risk of bias and applicability assessments are shown in Table 2.
Table 2

Risk of bias and applicability assessment by PROBAST criteria.

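Author | Year | Risk of bias: Participants | Risk of bias: Predictors | Risk of bias: Outcome | Risk of bias: Analysis | Overall applicability rating
Xiaoxiao Wang | 2021 | low | unclear | low | high | high
Xiao-Yi Yin | 2020 | low | unclear | low | low | unclear
Bang Wool Eom | 2016 | low | unclear | low | high | high
Zhixue Zheng | 2015 | low | unclear | low | high | high
Zhengbing Wang | 2021 | low | unclear | low | low | unclear
Zhixue Zheng | 2016 | low | unclear | low | high | high
Yu Mei | 2021 | low | low | low | low | low
Jing Li | 2018 | low | unclear | low | high | high
Su Mi Kim | 2020 | low | low | low | low | low
Miaoquan Zhang | 2021 | low | unclear | low | high | high
Yuming Jiang | 2019 | low | unclear | low | low | unclear
Jianfeng Mu | 2019 | low | low | low | low | low
Shilong Li | 2021 | low | low | low | low | low
Chun Guang Guo | 2016 | low | unclear | low | low | unclear
Xujie Gao | 2021 | low | unclear | low | low | unclear
Xujie Gao | 2020 | low | low | low | low | low
Xu Wang | 2021 | low | unclear | low | high | high
Siwei Pan | 2021 | low | low | low | low | low
Wujie Chen | 2019 | low | unclear | low | high | high
Bong-Il Song | 2020 | low | low | low | low | low
Lili Wang | 2021 | low | unclear | low | low | unclear
Wannian Sui | 2021 | low | unclear | low | low | unclear
Dexin Chen | 2019 | low | unclear | low | low | unclear
Zepang Sun | 2021 | low | low | low | low | low
Xiao-Peng Zhang | 2011 | low | unclear | low | high | high
Song Liu | 2021 | low | unclear | low | high | high
C Jin | 2021 | low | low | low | low | low
Jing Li | 2020 | low | low | low | high | high
HuaKai Tian | 2022 | low | low | low | low | low
Qiu-Xia Feng | 2019 | low | unclear | low | low | unclear
Yue Wang | 2020 | low | unclear | low | high | high
Cheng-Mao Zhou | 2021 | low | low | low | low | low
Chao Huang | 2020 | low | low | low | high | high
Seokhwi Kim | 2021 | low | unclear | low | low | unclear
Qiufang Liu | 2021 | low | unclear | low | high | high
Lingwei Meng | 2021 | low | low | low | low | low
D Dong | 2020 | low | low | low | low | low
Ji-Eun Na | 2022 | low | low | low | low | low
Haixing Zhu | 2022 | low | low | low | low | low
Elfriede H Bollschweiler | 2004 | low | low | low | high | high
Yinan Zhang | 2018 | low | low | low | high | high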

C-index

There were different numbers of training and test models because five studies reported only training results (35, 50–53). The overall c-index for ML in the training group was 0.837 [95%CI (0.814, 0.859)] (Table 3). LR, one of the most commonly used ML methods, had an overall pooled c-index of 0.838 [95%CI (0.812, 0.865)], while non-logistic regression (non-LR) models had an overall pooled c-index of 0.83 [95%CI (0.786, 0.877)].
Table 3

c-index for prediction models in gastric cancer patients.

Model | Train: No. model | Train: c-index | Train: 95%CI | Test: No. model | Test: c-index | Test: 95%CI
LR | 26 | 0.838 | 0.812-0.865 | 21 | 0.824 | 0.791-0.858
Non-LR | 13 | 0.83 | 0.786-0.877 | 13 | 0.789 | 0.747-0.833
DL | 3 | 0.866 | 0.799-0.938 | 3 | 0.835 | 0.780-0.895
GBDT | 1 | 0.798 | 0.714-0.892 | 1 | 0.788 | 0.688-0.902
GBDT+LR | 1 | 0.626 | 0.529-0.740 | 1 | 0.65 | 0.557-0.759
RF | 3 | 0.893 | 0.817-0.977 | 3 | 0.848 | 0.829-0.868
RF+LR | 1 | 0.691 | 0.594-0.804 | 1 | 0.678 | 0.584-0.787
SVM | 2 | 0.847 | 0.804-0.894 | 2 | 0.817 | 0.728-0.917
XGB | 1 | 0.881 | 0.786-0.987 | 1 | 0.762 | 0.673-0.863
XGB+LR | 1 | 0.739 | 0.648-0.842 | 1 | 0.619 | 0.521-0.736
Overall | 39 | 0.837 | 0.814-0.859 | 34 | 0.811 | 0.785-0.838

No. model indicates the number of prediction models. DL, deep learning; LR, logistic regression; No., number; Non-LR, non logistic regression; RF, random forest; SVM, support vector machine.

Furthermore, the pooled c-index in the test group was 0.811 [95%CI (0.785, 0.838)] (Table 3), which was similar to the result in the training group. Subgroup analysis showed that the 21 models in the LR subgroup had a pooled c-index of 0.824 [95%CI (0.791, 0.858)], and the 13 non-LR models had a pooled c-index of 0.789 [95%CI (0.747, 0.833)].
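
For a binary outcome such as LNM, the c-index equals the area under the ROC curve: the proportion of (LNM-positive, LNM-negative) patient pairs in which the positive patient receives the higher predicted score. The following minimal sketch shows that definition; the function name and the six example patients are hypothetical and are not taken from any included study.

```python
def c_index_binary(scores, labels):
    """c-index for a binary outcome: share of (positive, negative) pairs ranked correctly.

    Pairs with tied scores count as half-concordant.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical predicted LNM probabilities and true status (1 = LNM present).
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
c = c_index_binary(scores, labels)
print(round(c, 3))  # 8 of 9 pairs are concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the pooled values of about 0.81 to 0.84 reported above.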

Accuracy

There were different numbers of training and test models because ten studies reported only training results (35, 50–53, 59, 63, 65, 70, 71), whereas two other studies reported only test results. The ML models for LNM prediction in the training group showed an overall pooled accuracy of 0.781 [95%CI (0.756, 0.805)] (Table 4). Subgroup analysis showed no significant difference among ML algorithms. LR algorithms had a pooled accuracy of 0.792 [95%CI (0.761, 0.820)] and non-LR algorithms had a pooled accuracy of 0.768 [95%CI (0.725, 0.805)].
Table 4

Results of meta-analyses of accuracy for prediction models for gastric cancer patients.

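Model | Train: No. model | Train: accuracy | Train: 95%CI | Test: No. model | Test: accuracy | Test: 95%CI
LR | 28 | 0.792 | 0.761-0.820 | 23 | 0.787 | 0.745-0.824
Non-LR | 22 | 0.768 | 0.725-0.805 | 18 | 0.707 | 0.665-0.746
ANN | 2 | 0.707 | 0.574-0.812 | 1 | 0.634 | 0.589-0.678
DL | 2 | 0.818 | 0.755-0.868 | 1 | 0.765 | 0.646-0.859
DT | 1 | 0.794 | 0.754-0.830 | 1 | 0.632 | 0.587-0.676
GBDT | 1 | 0.835 | 0.808-0.860 | 1 | 0.815 | 0.770-0.854
GBDT+LR | 1 | 0.903 | 0.881-0.923 | 1 | 0.573 | 0.519-0.625
GBM | 1 | 0.618 | 0.597-0.638 | 1 | 0.687 | 0.643-0.729
GLM | 1 | 0.667 | 0.647-0.686 | NA | NA | NA
RDA | 1 | 0.668 | 0.649-0.688 | 1 | 0.700 | 0.636-0.759
RF | 4 | 0.793 | 0.710-0.858 | 4 | 0.723 | 0.678-0.764
RF+LR | 1 | 0.644 | 0.610-0.677 | 1 | 0.578 | 0.525-0.631
RPART | 1 | 0.625 | 0.604-0.645 | NA | NA | NA
SVM | 4 | 0.765 | 0.678-0.835 | 2 | 0.789 | 0.693-0.861
XGB | 1 | 0.863 | 0.838-0.886 | 1 | 0.678 | 0.626-0.727
XGB+LR | 1 | 0.806 | 0.777-0.832 | 1 | 0.581 | 0.528-0.633
Bayesian | NA | NA | NA | 1 | 0.824 | 0.739-0.891
XGBOOST | NA | NA | NA | 1 | 0.691 | 0.648-0.733
Overall | 50 | 0.781 | 0.756-0.805 | 41 | 0.753 | 0.721-0.783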

No. model indicates the number of prediction models. ANN, artificial neural network; DL, deep learning; DT, decision tree; GBM, gradient boosting machine; GLM, generalized linear model; LR, logistic regression; NA, not available; No., number; Non-LR, non logistic regression; RDA, regularized dual averaging; RF, random forest; SVM, support vector machine; XGBOOST, extreme gradient boosting.

In the test group, the pooled accuracy of the 41 models was 0.753 [95%CI (0.721, 0.783)] (Table 4). Subgroup analysis was conducted for LR and non-LR algorithms. The pooled accuracy of the 23 LR models was 0.787 [95%CI (0.745, 0.824)], and the pooled accuracy of the non-LR models was 0.707 [95%CI (0.665, 0.746)].
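
The intervals in Table 4 are not symmetric around the point estimates, which is what one would expect if accuracies were analyzed on a transformed scale and then back-transformed. The sketch below assumes a logit transformation; the text does not say which transformation or interval method was used, so this is an illustration only. It also assumes that the single GBDT model comes from the 818-patient training set listed in Table 1 for Cheng-Mao Zhou (2021). The function name is hypothetical.

```python
import math


def logit_ci(accuracy, n, z=1.959964):
    """95% CI for a proportion, built on the logit scale and back-transformed."""
    logit = math.log(accuracy / (1 - accuracy))
    se = 1 / math.sqrt(n * accuracy * (1 - accuracy))   # delta-method SE of the logit
    lower, upper = logit - z * se, logit + z * se

    def expit(x):
        return 1 / (1 + math.exp(-x))

    return expit(lower), expit(upper)


# GBDT training accuracy from Table 4, with an assumed n of 818 (see lead-in).
low, high = logit_ci(0.835, 818)
print(f"{low:.3f}-{high:.3f}")
```

Under these assumptions the interval comes out at roughly 0.808 to 0.859, close to the reported 0.808-0.860; the small difference could reflect a different interval method or rounding.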

Subgroup analysis for early-gastric cancer and advanced gastric cancer

Of the 41 included studies, 21 were early gastric cancer (EGC) (T1) studies (32, 34, 35, 37, 39, 42, 46, 48–53, 55, 56, 60, 64, 65, 69, 71, 72) and 3 were advanced GC (T2-4) studies (43, 45, 67). In EGC, the pooled c-index was 0.832 [95%CI (0.804, 0.860)] for the training group and 0.795 [95%CI (0.755, 0.838)] for the test group (Table 5). For advanced GC, the pooled c-index was 0.849 [95%CI (0.801, 0.900)] for the training group and 0.804 [95%CI (0.778, 0.830)] for the test group.
Table 5

Subgroup analysis for early-gastric cancer and advanced gastric cancer.

Stage | Train: No. model | Train: c-index (95%CI) | Test: No. model | Test: c-index (95%CI) | Train: No. model | Train: accuracy (95%CI) | Test: No. model | Test: accuracy (95%CI)
EGC | 24 | 0.832 (0.804-0.860) | 19 | 0.795 (0.755-0.838) | 31 | 0.765 (0.730-0.796) | 26 | 0.731 (0.686-0.773)
Advanced GC | 3 | 0.849 (0.801-0.900) | 3 | 0.804 (0.778-0.830) | 2 | 0.821 (0.737-0.882) | 2 | 0.844 (0.794-0.884)

No. model indicates the number of prediction models. EGC, early-gastric cancer; GC, gastric cancer.

Thirty-one models evaluated the accuracy of ML for EGC, with a pooled accuracy of 0.765 [95%CI (0.730, 0.796)] in the training group and 0.731 [95%CI (0.686, 0.773)] in the test group (Table 5). For advanced GC, the pooled accuracy was 0.821 [95%CI (0.737, 0.882)] in the training group and 0.844 [95%CI (0.794, 0.884)] in the test group.

Subgroup analysis for predictors

Furthermore, we reviewed the predictors in the included original studies and identified three groups: Group A included only clinical predictors, Group B included only radiomic predictors, and Group C included both clinical and radiomic predictors. In the training group, the c-index of groups A, B, and C was 0.822 ± 0.079 (n = 25), 0.852 ± 0.072 (n = 8), and 0.847 ± 0.063 (n = 6), respectively, with no significant difference between them (F = 0.604, p = 0.552) (Table 6). In the test group, the c-index of groups A, B, and C was 0.792 ± 0.092 (n = 20), 0.83 ± 0.07 (n = 8), and 0.817 ± 0.043 (n = 6), respectively, again with no significant difference between them (F = 0.664, p = 0.522).
Table 6

Subgroup analysis for predictors.

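Model | Indicator | CP: n | CP: mean (sd) | RP: n | RP: mean (sd) | CP+RP: n | CP+RP: mean (sd) | F | P
Train | c-index | 25 | 0.822 (0.079) | 8 | 0.852 (0.072) | 6 | 0.847 (0.063) | 0.604 | 0.552
Train | accuracy | 34 | 0.75 (0.087) | 9 | 0.811 (0.066) | 7 | 0.822 (0.073) | 3.546 | 0.037
Test | c-index | 20 | 0.792 (0.092) | 8 | 0.830 (0.07) | 6 | 0.817 (0.043) | 0.664 | 0.522
Test | accuracy | 28 | 0.722 (0.098) | 8 | 0.799 (0.075) | 5 | 0.795 (0.04) | 3.224 | 0.051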

n indicates the number of prediction models. CP, Clinical Predictors; RP, Radiomics Predictors.

In the training group, the accuracy of groups A, B, and C was 0.75 ± 0.087 (n = 34), 0.811 ± 0.066 (n = 9), and 0.822 ± 0.073 (n = 7), respectively, with a significant difference between them (F = 3.546, p = 0.037) (Table 6); models containing radiomics had better accuracy. In the test group, the accuracy of groups A, B, and C was 0.722 ± 0.098 (n = 28), 0.799 ± 0.075 (n = 8), and 0.795 ± 0.04 (n = 5), respectively. Although the difference was not significant (F = 3.224, p = 0.051), the p-value was close to the 0.05 threshold, and the mean accuracy in the test cohort was higher for models containing radiomics than for models containing only clinical predictors. In summary, models that incorporate radiomics have better accuracy in predicting the risk of lymph node metastasis in gastric cancer.
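
The one-way ANOVA for training-set accuracy can be approximately re-derived from the summary statistics in Table 6 alone. The sketch below is an illustration rather than the authors' R code: it rebuilds the F statistic from each group's n, mean and SD, and uses a closed-form upper-tail probability that is valid only when the numerator has 2 degrees of freedom (three groups). Function names are hypothetical.

```python
def anova_from_summary(groups):
    """One-way ANOVA F statistic from (n, mean, sd) triples, one per group."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def p_value_three_groups(f_stat, df_within):
    """Upper-tail probability of F(2, df_within); closed form holds only for 2 numerator df."""
    return (df_within / (df_within + 2 * f_stat)) ** (df_within / 2)


# Training-set accuracy for CP, RP and CP+RP from Table 6: (n, mean, sd).
train_accuracy = [(34, 0.75, 0.087), (9, 0.811, 0.066), (7, 0.822, 0.073)]
f_stat, df_b, df_w = anova_from_summary(train_accuracy)
p = p_value_three_groups(f_stat, df_w)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}, p = {p:.3f}")
```

This gives F of about 3.53 and p of about 0.037, in line with the reported F = 3.546 and p = 0.037; the small gap in F is expected because the tabulated means and SDs are rounded.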

Discussion

The number of studies that apply ML to LNM prediction has been gradually increasing since 2018, making it important to systematically review the published studies so as to provide guidance for future research. To our knowledge, this is the first systematic review and meta-analysis that evaluated ML performance in the assessment of LNM in GC patients. ML-related studies can be methodologically categorized into LR and non-LR study. Several included studies were assessed to be of high or unclear risk of bias in the domains of prediction, analysis and overall applicability, which highlighted the current state of technology, as well as the need for methodological quality improvement. This study demonstrated that ML had an excellent diagnostic performance in predicting LNM with great repeatability, which was in consistence with other studies. The pooled c-index and accuracy were 0.837 [95%CI (0.814,0.859)] and 0.781 [95%CI (0.756–0.805)], respectively. Significant heterogeneity existed between the studies, which could be caused by multiple factors. EGC is defined as a tumor limited to the mucosa and submucosa, regardless of the LNM (82). A subgroup analysis was performed since the difference in the order of magnitude characteristics of LNM between EGC and advanced gastric cancer may have a certain impact on the results of machine learning. It showed no significant difference in c-index or accuracy between EGC and advanced gastric cancer. In addition, since the included studies used LR or non-LR, subgroup analysis based on this variable was conducted to observe the changes in heterogeneity between the two groups. There was also no significant difference in c-index or accuracy among different ML algorithms. The type of ML algorithm had no effect on LNM prediction. Most importantly, this study was not designed to identify one superior algorithm from the other ones. Feature selection was also critical to the performance and interpretation of the model. 
The most commonly used variables in the model development were tumor size, depth of tumor invasion, histology differentiation, imaging techniques, lymphovascular invasion, tumor location, CT-reported LN, age, macroscopic features, and CA199. These variables are either anthropometric characteristics serving as markers of disease severity, or important factors contributing to the natural disease progression. These predictive indicators are easy to measure. Another merit of these predictors is the low risk of bias in measurement, resulting in a minimal possibility of exposure misclassification. Previous studies have revealed that the size of tumor is closely related to the incidence of LNM in patients with GC (83–85). Larger tumor typically indicates a higher risk of LNM (86–88), which might be attributed to easiness of invasion for larger tumor to the surrounding tissues. Depth of tumor invasion was found to be a strong predictor in 32 models (32, 34–37, 39, 41, 42, 46, 49, 50, 52, 54–56, 62, 69–71), which is in consistence with substantial evidence supporting its use as a predictor of LNM (36, 63, 89–91). There were 20 models considered histology differentiation as a vital factor for predicting LNM (34–37, 41, 42, 48–55, 61–63). Deeply infiltrated and poorly differentiated tumors might have sufficient nutritional support to facilitate its invasion to tissues, capillaries and lymphatic vessels, and thus to have the potential to grow and metastasize faster (69). The novel PROBAST was applied for assessment of risk of bias and applicability of included prediction model studies, which allowed more details of the model, such as data source, processing, number of events per variable, feature selection, model development, and model validation, to be checked intensively (92–95). The PROBAST quality assessment revealed some other issues that could be avoided in future studies. 
First, external validation is rarely performed, which might be a primary limitation in studies of this field. Simple determination of samples for modeling would lead to an overestimation for the model performance (96), and further accuracy verification for these models would be advisable. Guidelines that include external validation should be followed when reporting ML models (97). On the other hand, most of the included studies were retrospective design, leading to confounding and selection bias. More prospective studies are needed to produce evidence of high quality. The included studies also demonstrated another roadblock to the clinical implementation of ML. The data used in most of the included studies were from single institution, which resulted in limited datasets for training and failed to exert the advantage of ML that it is effective in processing large samples on multiple dimensions (98). Also, limited number of studies were less likely to be of broad public health significance. Future studies should take into accounts the expansion of datasets from multiple centers to increase the sample size and to improve classifier performance. We also note the importance of preoperative assessment of peritoneal metastasis of GC for prognosis. Currently, the assessment of peritoneal metastases is mainly in the form of radiomics, but the data obtained by radiomics is obtained from a variety of sources, usually by CT, which may affect the results obtained (99–106). The heterogeneity of the results can be brought about by the different parameters and bits of CT and the artificial partitioning by different investigators through their own experience, so that the prediction of preoperative peritoneal metastases based on radiomics can be highly heterogeneous. At the same time, the application of radiomics generates a large amount of high-dimensional data, and the screening of these high-dimensional data is a great challenge in clinical practice. 
Therefore, although radiomics-based prediction of preoperative peritoneal metastasis has been favored by many researchers in recent years, these studies have not reached a clear consensus, and the reported c-index varies widely (from 0.712 to 0.981) (99–106). We also expect subsequent radiomics-based studies to guide the preoperative assessment of peritoneal metastases. This study has several limitations. The first was significant heterogeneity. Sample sizes and distributions varied across studies, as did the feature selection methods and ML algorithms, which compromised the performance and applicability of each model. However, such heterogeneity can itself be regarded as a key finding that future studies should address. Second, the estimate of prediction performance was based on limited data because several studies reported their results incompletely. Third, most of the reviewed studies included GC patients at different cancer stages, which might be a confounding factor affecting ML performance in differential diagnosis. Fourth, our findings should be interpreted with caution given the potential for significant publication bias. Investigators are rarely encouraged to report models with unsatisfactory predictive values, so ML models with suboptimal prediction accuracy have probably gone unpublished (107). Last but not least, eight of the included studies did not report a test set, which may have inflated the pooled performance and reduced the robustness of this study. It would be better still if all studies provided external test results.
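The inflation that arises when only training-set performance is reported can be illustrated with a small simulation. The sketch below is not taken from any included study. It assumes a hypothetical cohort with a single noisy predictor and a binary LNM label, and it uses a deliberately overfit 1-nearest-neighbour rule as a stand-in for a flexible ML model. The names `c_index`, `simulate`, and `predict_1nn`, the cohort size, and the 40% event rate are all illustrative assumptions.

```python
import random


def c_index(scores, labels):
    """Concordance index for a binary outcome (equals the ROC AUC).

    Fraction of (positive, negative) pairs in which the positive case
    receives the higher score; tied scores count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def simulate(n, rng):
    """Hypothetical cohort: one noisy predictor x and a binary LNM label y."""
    data = []
    for _ in range(n):
        y = 1 if rng.random() < 0.4 else 0      # assumed 40% event rate
        x = rng.gauss(1.0 if y else 0.0, 1.0)   # heavily overlapping classes
        data.append((x, y))
    return data


def predict_1nn(train, x):
    """Deliberately overfit model: copy the label of the nearest training case."""
    nearest = min(train, key=lambda row: abs(row[0] - x))
    return nearest[1]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
train = simulate(200, rng)
test = simulate(200, rng)

# Apparent performance: the model is scored on the data it memorised.
apparent = c_index([predict_1nn(train, x) for x, _ in train],
                   [y for _, y in train])
# Held-out performance: the same model is scored on unseen cases.
held_out = c_index([predict_1nn(train, x) for x, _ in test],
                   [y for _, y in test])

print(f"apparent c-index (training set): {apparent:.3f}")
print(f"held-out c-index (test set):     {held_out:.3f}")
```

In this construction every training case is its own nearest neighbour, so the apparent c-index is perfect by design, while the held-out value reflects how little the single predictor actually separates the two classes. Real models overfit less severely, but the direction of the bias is the same.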

Conclusion

ML has shown excellent diagnostic performance for LNM prediction in GC patients, and ML models that combine radiomics with clinical features may offer a better prediction method. However, the models had some methodological limitations in their development, and there is still room to improve their predictive value. Future studies are needed to explore efficient, minimally invasive, and easily collected predictors of LNM so as to build more effective ML models and improve the accuracy of LNM prediction.

Data availability statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.

Author contributions

Concept and design, YL, FX, QX, HL, and PF. Acquisition of data, YL, FX, and PF. Statistical analysis, YL, QX, and PF. Interpretation of data, YL, HL, and PF. Writing original draft, YL and PF. Writing review and editing, all authors. All authors have made substantial contributions to this work and have approved the final version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 81673854).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.