BACKGROUND: The risk of gastric cancer after Helicobacter pylori (H. pylori) eradication remains unknown. AIM: To evaluate the performance of seven different machine learning models in predicting gastric cancer risk after H. pylori eradication. METHODS: We identified H. pylori-infected patients who had received clarithromycin-based triple therapy between 2003 and 2014 in Hong Kong. Patients were divided into training (n = 64 238) and validation (n = 25 330) sets according to the period of eradication therapy. The data were used to construct seven machine learning models to predict the risk of gastric cancer development within 5 years after H. pylori eradication. A total of 26 clinical variables were input into these models. Performance was measured by the area under the receiver operating characteristic curve (AUC). RESULTS: During a mean follow-up of 4.7 years, 0.21% of H. pylori-eradicated patients developed gastric cancer. Of the seven machine learning models, extreme gradient boosting (XGBoost) had the best performance in predicting cancer development (AUC 0.97, 95% CI 0.96-0.98) and was superior to conventional logistic regression (AUC 0.90, 95% CI 0.84-0.92). With the XGBoost model, 6.6% of patients were considered at high risk of gastric cancer, with a miss rate of 1.9%. Patient age, presence of intestinal metaplasia, and gastric ulcer were heavily weighted factors in the XGBoost model. CONCLUSION: Based on simple baseline patient information, a machine learning model can accurately predict the risk of post-eradication gastric cancer. This model could substantially reduce the number of patients who require endoscopic surveillance.
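
The three summary quantities reported above (AUC, the fraction of patients labelled high risk, and the miss rate) can be illustrated with a minimal Python sketch. It is not the study's code. The function names `auc` and `triage_summary`, the risk threshold, and the toy scores are all hypothetical, and the miss rate is assumed here to mean the fraction of patients who developed cancer but were not flagged as high risk; the abstract does not state its exact definition.

```python
from typing import Dict, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen non-case (ties count as one half).

    Pairwise comparison is O(cases * non-cases); fine for a sketch,
    too slow for a cohort of tens of thousands of patients.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def triage_summary(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Dict[str, float]:
    """Fraction of patients flagged as high risk, and the miss rate.

    Assumption: miss rate = cases scoring below the threshold / all cases.
    """
    flagged = [s >= threshold for s in scores]
    n_cases = sum(labels)
    if n_cases == 0:
        raise ValueError("need at least one case to compute a miss rate")
    missed = sum(1 for f, y in zip(flagged, labels) if y == 1 and not f)
    return {
        "high_risk_fraction": sum(flagged) / len(scores),
        "miss_rate": missed / n_cases,
    }


# Toy, made-up predictions for six patients (1 = developed gastric cancer).
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 1, 0, 0, 0]

print(auc(toy_scores, toy_labels))
print(triage_summary(toy_scores, toy_labels, threshold=0.5))
```

Lowering the threshold flags more patients for surveillance and reduces the miss rate, so the two reported figures describe a single operating point on the ROC curve, whereas the AUC summarises all thresholds at once.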