Junichi Taninaga, Yu Nishiyama, Kazutoshi Fujibayashi, Toshiaki Gunji, Noriko Sasabe, Kimiko Iijima, Toshio Naito.
Abstract
A comprehensive screening method using machine learning and many factors (biological characteristics, Helicobacter pylori infection status, endoscopic findings and blood test results), accumulated daily as data in hospitals, could improve the accuracy of screening to classify patients at high or low risk of developing gastric cancer. We used XGBoost, a classification method known for achieving numerous winning solutions in data analysis competitions, to capture nonlinear relations among many input variables and outcomes using the boosting approach to machine learning. Longitudinal and comprehensive medical check-up data were collected from 25,942 participants who underwent multiple endoscopies from 2006 to 2017 at a single facility in Japan. The participants were classified into a case group (y = 1) or a control group (y = 0) if gastric cancer was or was not detected, respectively, during a 122-month period. Among 1,431 total participants (89 cases and 1,342 controls), 1,144 (80%) were randomly selected for use in training 10 classification models; the remaining 287 (20%) were used to evaluate the models. The results showed that XGBoost outperformed logistic regression and showed the highest area under the curve value (0.899). Accumulating more data in the facility and performing further analyses including other input variables may help expand the clinical utility.
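The split and the evaluation metric described in the abstract can be sketched in a few lines of standard-library Python. This is only an illustration, not the authors' pipeline: the helper names `split_indices` and `roc_auc`, the seed, and the toy labels and scores are assumptions made for the example, and the abstract does not say whether the random split was stratified.

```python
import random


def split_indices(n_total, train_fraction=0.8, seed=0):
    """Randomly split range(n_total) into train and test index lists."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    # Flooring reproduces the reported sizes: int(1431 * 0.8) == 1144.
    n_train = int(n_total * train_fraction)
    return indices[:n_train], indices[n_train:]


def roc_auc(labels, scores):
    """AUC as the probability that a random case (y = 1) outscores a
    random control (y = 0); ties count as one half."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


train_idx, test_idx = split_indices(1431)
print(len(train_idx), len(test_idx))  # 1144 287

# Hypothetical scores for two controls and two cases.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

The pairwise definition of AUC used here is quadratic in the number of participants, which is acceptable for a test set of 287 but would usually be replaced by a rank-based computation for larger data.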
Year: 2019 PMID: 31455831 PMCID: PMC6712020 DOI: 10.1038/s41598-019-48769-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
List of discriminative models.
| Models | Classifier | Input variables |
|---|---|---|
| Model A | XGBoost | |
| Model B | XGBoost | |
| Model C | XGBoost | Variables in model B, gastric or duodenal ulcers including scars, GERD or Barrett’s oesophagus and post-gastrectomy |
| Model D | XGBoost | Variables in model C, sex, age and body mass index |
| Model E | XGBoost | Variables in model D, white blood cell counts, neutrophil ratio, lymphocyte ratio, eosinophil ratio, monocyte ratio, basophil ratio, platelet count, haemoglobin, mean corpuscular volume and haemoglobin A1c |
| Model F | LR | The same variables as model A |
| Model G | LR | The same variables as model B |
| Model H | LR | The same variables as model C |
| Model I | LR | The same variables as model D |
| Model J | LR | The same variables as model E |
H. pylori, Helicobacter pylori.
GERD, gastroesophageal reflux disease.
LR, logistic regression.
Figure 1. Receiver operating characteristic (ROC) curves obtained for the prediction of the development of gastric cancer.
Results of predicting patients at risk of developing gastric cancer.
| Model | ROC AUC (CV) | ROC AUC (test) | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|---|
| Model A | 0.736 | 0.742 | 0.690 | 0.800 | 0.684 |
| Model B | 0.792 | 0.815 | 0.641 | 1.000 | 0.621 |
| Model C | 0.823 | 0.790 | 0.690 | 0.867 | 0.680 |
| Model D | 0.858 | 0.885 | 0.763 | 1.000 | 0.750 |
| Model E | 0.874 | 0.899 | 0.777 | 0.933 | 0.768 |
| Model F | 0.736 | 0.742 | 0.948 | 0.000 | 1.000 |
| Model G | 0.792 | 0.815 | 0.948 | 0.000 | 1.000 |
| Model H | 0.822 | 0.799 | 0.634 | 1.000 | 0.614 |
| Model I | 0.853 | 0.880 | 0.941 | 0.000 | 0.993 |
| Model J | 0.862 | 0.874 | 0.885 | 0.600 | 0.901 |
ROC, receiver operating characteristic curve.
AUC, area under the curve.
CV, cross-validation.
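The accuracy column in the table above can be cross-checked against the sensitivity and specificity columns. The sketch below assumes the 287-participant test set contained 15 cases and 272 controls. That split is not stated in the source; it is inferred from the reported values (every sensitivity is a multiple of 1/15, and 272/287 rounds to the 0.948 accuracy of the models with zero sensitivity). The function name `accuracy_from_rates` is hypothetical.

```python
def accuracy_from_rates(sensitivity, specificity, n_cases, n_controls):
    """Rebuild accuracy from sensitivity and specificity by recovering
    the integer counts of true positives and true negatives."""
    true_pos = round(sensitivity * n_cases)
    true_neg = round(specificity * n_controls)
    return (true_pos + true_neg) / (n_cases + n_controls)


# Assumed test-set composition (inferred from the table, not stated in the source).
N_CASES, N_CONTROLS = 15, 272

# (sensitivity, specificity, reported accuracy) as listed in the table.
reported = {
    "Model A": (0.800, 0.684, 0.690),
    "Model B": (1.000, 0.621, 0.641),
    "Model C": (0.867, 0.680, 0.690),
    "Model D": (1.000, 0.750, 0.763),
    "Model E": (0.933, 0.768, 0.777),
    "Model F": (0.000, 1.000, 0.948),
    "Model G": (0.000, 1.000, 0.948),
    "Model H": (1.000, 0.614, 0.634),
    "Model I": (0.000, 0.993, 0.941),
    "Model J": (0.600, 0.901, 0.885),
}

rebuilt = {
    name: accuracy_from_rates(sens, spec, N_CASES, N_CONTROLS)
    for name, (sens, spec, _) in reported.items()
}

for name, (_, _, accuracy) in reported.items():
    print(f"{name}: rebuilt {rebuilt[name]:.3f}, reported {accuracy:.3f}")
```

Under this assumption, Models F and G reach an accuracy of 0.948 while detecting none of the cases, which is why the ROC AUC and sensitivity columns are more informative than accuracy for data this imbalanced.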
Figure 2. Illustration of patients with or without detected gastric cancer. (a) A cut-off of 122 months was used because it was the longest period over which gastric cancer was detected in the case group.
Demographic characteristics at the initial examination.
| Characteristic | Patients with detected gastric cancer | Patients without detected gastric cancer | P value |
|---|---|---|---|
| n | 89 | 1342 | |
| Examination period (months), mean (SD) | 47.4 (32.8) | 127.6 (4.1) | <0.001 |
| Age (y), mean (SD) | 56.7 (8.8) | 46.2 (1.0) | <0.001 |
| Sex (male), n (%) | 75 (84.2) | 1042 (77.6) | 0.183 |
| Body mass index (kg/m²), mean (SD) | 23.3 (2.9) | 23.1 (3.2) | 0.539 |
| | 69 (77.5) | 409 (30.4) | <0.001 |
| Chronic atrophic gastritis, n (%) | 81 (91.0) | 409 (30.4) | <0.001 |
| Gastric or duodenal ulcers including scars, n (%) | 21 (23.5) | 118 (8.79) | <0.001 |
| GERD or Barrett’s oesophagus, n (%) | 20 (22.4) | 312 (23.2) | 0.969 |
| Post-gastrectomy, n (%) | 4 (4.49) | 19 (1.41) | 0.072 |
| White blood cell counts (×10³/μL), mean (SD) | 5.866 (1.762) | 5.510 (1.598) | 0.0678 |
| Neutrophil ratio (%), mean (SD) | 59.1 (8.3) | 57.3 (8.6) | 0.0527 |
| Lymphocyte ratio (%), mean (SD) | 32.2 (7.1) | 33.7 (8.0) | 0.0591 |
| Eosinophil ratio (%), mean (SD) | 2.8 (1.8) | 3.2 (2.6) | 0.0278 |
| Monocyte ratio (%), mean (SD) | 5.5 (1.8) | 5.3 (1.3) | 0.275 |
| Basophil ratio (%), mean (SD) | 0.5 (0.3) | 0.6 (0.4) | 0.198 |
| Haemoglobin (g/dL), mean (SD) | 14.8 (1.0) | 14.4 (1.3) | 0.391 |
| Mean corpuscular volume (fL), mean (SD) | 94.4 (4.6) | 92.3 (4.7) | <0.001 |
| Platelet count (×10⁴/μL), mean (SD) | 22.2 (4.8) | 23.0 (5.1) | 0.158 |
| Haemoglobin A1c (%), mean (SD) | 5.85 (0.87) | 5.39 (0.56) | <0.001 |
P values are from a t-test or chi-squared test.
H. pylori, Helicobacter pylori.
GERD, gastroesophageal reflux disease.
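As a worked example of how the P values for categorical rows could be obtained, the sketch below runs a chi-squared test on the sex row (75 of 89 cases and 1,042 of 1,342 controls were male). The Yates continuity correction is an assumption: the footnote only says "t-test or chi-squared test", although the corrected statistic gives a P value close to the reported 0.183. The function name `chi2_yates_2x2` is hypothetical, and the female counts are derived by subtraction from the group sizes.

```python
import math


def chi2_yates_2x2(a, b, c, d):
    """Yates-corrected chi-squared statistic and P value (1 degree of
    freedom) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    stat = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: cases, controls. Columns: male, female.
male_cases, female_cases = 75, 89 - 75
male_controls, female_controls = 1042, 1342 - 1042

stat, p_value = chi2_yates_2x2(
    male_cases, female_cases, male_controls, female_controls
)
print(f"chi-squared = {stat:.3f}, P = {p_value:.3f}")  # close to the reported 0.183
```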
Figure 3. Flowchart showing the inclusion and exclusion procedures in the present study.