Rui-Quan Zhou¹, Hong-Chen Ji², Qu Liu², Chun-Yu Zhu², Rong Liu¹.
Abstract
BACKGROUND: The incidence of pancreatic neuroendocrine tumors (PNETs) is increasing rapidly. The tumor grade of PNETs significantly affects the treatment strategy and prognosis. However, there is still no effective way to non-invasively classify PNET grades. Machine learning (ML) algorithms have shown potential in improving the prediction accuracy using comprehensive data. AIM: To provide a ML approach to predict PNET tumor grade using clinical data.
Keywords: Biochemical indexes; Machine learning; Pancreatic neuroendocrine tumors; Tumor grade; Tumor markers
Year: 2019 PMID: 31367620 PMCID: PMC6658377 DOI: 10.12998/wjcc.v7.i13.1611
Source DB: PubMed Journal: World J Clin Cases ISSN: 2307-8960 Impact factor: 1.337
General confusion matrix

| | Predicted negative | Predicted positive |
| Condition negative | True negative | False positive |
| Condition positive | False negative | True positive |
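The F1 score, recall rate, and precision rate reported throughout this record follow directly from the confusion-matrix cells above. A minimal sketch (the counts are illustrative, not from the paper):

```python
# Precision, recall, and F1 from a 2x2 confusion matrix.
# Cell layout follows the table above: rows = actual condition,
# columns = predicted (negative, positive).

def binary_metrics(tn, fp, fn, tp):
    """Compute precision, recall, and F1 for the positive class."""
    precision = tp / (tp + fp)        # of predicted positives, fraction correct
    recall = tp / (tp + fn)           # of actual positives, fraction found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts (not from the study).
p, r, f1 = binary_metrics(tn=50, fp=10, fn=5, tp=35)
```

Note that F1 can be written equivalently as 2·TP / (2·TP + FP + FN), which makes clear it ignores true negatives entirely.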
Figure 1. Distribution of different clinical variables.
Relationship between different pancreatic neuroendocrine tumor grades and clinical variables

| Variable | Value (mean ± SD or n) | P (overall) | P vs G1 | P vs G2 |
| Gender (male / female) | | 0.3575 | | |
| G1 | 14 / 18 | | | |
| G2 | 37 / 11 | | 0.002 | |
| G3 | 4 / 7 | | 0.668 | 0.008 |
| Age | | < 0.001 | | |
| G1 | 52.47 ± 11.70 | | | |
| G2 | 49.19 ± 11.34 | | 0.634 | |
| G3 | 50.00 ± 17.04 | | 0.082 | 0.039 |
| ALT | | < 0.001 | | |
| G1 | 34.95 ± 72.06 | | | |
| G2 | 33.97 ± 59.10 | | 0.730 | |
| G3 | 117.02 ± 143.74 | | 0.039 | 0.006 |
| BIL | | < 0.001 | | |
| G1 | 9.15 ± 3.65 | | | |
| G2 | 12.71 ± 9.56 | | 0.009 | |
| G3 | 69.63 ± 67.56 | | < 0.001 | < 0.001 |
| AFP | | < 0.001 | | |
| G1 | 2.27 ± 1.02 | | | |
| G2 | 3.34 ± 2.38 | | 0.014 | |
| G3 | 3.47 ± 1.50 | | 0.035 | 0.606 |
| CEA | | < 0.001 | | |
| G1 | 1.51 ± 0.82 | | | |
| G2 | 2.19 ± 2.38 | | 0.132 | |
| G3 | 11.77 ± 17.05 | | < 0.001 | < 0.001 |
| CA19-9 | | < 0.001 | | |
| G1 | 9.58 ± 7.57 | | | |
| G2 | 20.46 ± 24.74 | | 0.007 | |
| G3 | 37.10 ± 39.40 | | < 0.001 | 0.118 |
| CA125 | | 0.0146 | | |
| G1 | 10.43 ± 5.60 | | | |
| G2 | 13.72 ± 12.22 | | 0.039 | |
| G3 | 11.13 ± 4.75 | | 0.942 | 0.195 |
| CA15-3 | | < 0.001 | | |
| G1 | 8.98 ± 4.34 | | | |
| G2 | 11.41 ± 5.49 | | 0.361 | |
| G3 | 11.52 ± 3.80 | | 0.585 | 0.318 |
| CA72-4 | | < 0.001 | | |
| G1 | 1.83 ± 1.43 | | | |
| G2 | 2.14 ± 1.50 | | 0.217 | |
| G3 | 7.42 ± 7.85 | | < 0.001 | < 0.001 |
BIL: Bilirubin; ALT: Alanine transaminase; CA: Carbohydrate antigen; CEA: Carcinoembryonic antigen; AFP: Alpha fetoprotein.
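The table's pairwise comparisons can be reproduced procedurally with standard tests: a chi-square test for the gender counts and a two-sample test for the continuous markers. A minimal sketch with scipy; the gender counts are taken from the table, but the continuous samples below are synthetic stand-ins (the patient-level data are not public), so only the procedure, not the P values, mirrors the paper:

```python
# Pairwise comparison sketch: chi-square for categorical counts,
# Welch's t-test for a continuous marker between two grades.
import numpy as np
from scipy import stats

# Gender counts (male, female) for G1 vs G2, as in the table.
counts = np.array([[14, 18],   # G1
                   [37, 11]])  # G2
chi2, p_gender, dof, expected = stats.chi2_contingency(counts)

# A continuous variable compared between two grades; the samples here are
# randomly generated around the table's reported means/SDs for CA19-9.
rng = np.random.default_rng(0)
g1 = rng.normal(9.6, 7.6, size=32)
g2 = rng.normal(20.5, 24.7, size=48)
t_stat, p_marker = stats.ttest_ind(g1, g2, equal_var=False)
```

Whether the paper used a t-test, Mann-Whitney U, or another test for the continuous variables is not stated in this extract; `ttest_ind` is used here purely for illustration.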
Figure 2. The cutoff value with the minimum P value for the chi-square test as the sample volume ranged from 30% to 100% in steps of 5%.
Figure 3. The impact of combinations of independent variables and parameter changes on the F1 score, recall rate and precision rate of the four models. Combination 1: bilirubin (BIL); 2: BIL + alanine transaminase (ALT); 3: BIL + ALT + carbohydrate antigen 72-4 (CA72-4); 4: BIL + ALT + CA72-4 + carcinoembryonic antigen (CEA); 5: BIL + ALT + CA72-4 + CEA + CA19-9; 6: BIL + ALT + CA72-4 + CEA + CA19-9 + alpha fetoprotein (AFP); 7: BIL + ALT + CA72-4 + CEA + CA19-9 + AFP + age; 8: BIL + ALT + CA72-4 + CEA + CA19-9 + AFP + age + CA15-3; 9: BIL + ALT + CA72-4 + CEA + CA19-9 + AFP + age + CA15-3 + CA125; 10: BIL + ALT + CA72-4 + CEA + CA19-9 + AFP + age + CA15-3 + CA125 + gender. LR: Logistic regression; SVM: Support vector machine; LDA: Linear discriminant analysis; MLP: Multilayer perceptron.
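The experiment behind Figure 3 adds variables one at a time in a fixed order and re-scores each model. A minimal sketch of that loop with scikit-learn on synthetic three-class data (the original cohort is not public, and the exact model hyperparameters are not given in this extract):

```python
# Incremental feature-combination experiment, as in Figure 3, on
# synthetic data: features are added in a fixed order (combination
# k uses the first k columns) and each model is scored with macro F1.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "LDA": LinearDiscriminantAnalysis(),
    "MLP": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                         random_state=0),
}

scores = {}
for name, model in models.items():
    per_combo = []
    for k in range(1, X.shape[1] + 1):      # combinations 1..10
        model.fit(X_tr[:, :k], y_tr)
        pred = model.predict(X_te[:, :k])
        per_combo.append(f1_score(y_te, pred, average="macro"))
    scores[name] = per_combo
```

Recall and precision curves would be produced the same way by swapping in `recall_score` and `precision_score` with the same `average="macro"` argument.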
The highest F1 score, recall rate and precision rate of different models

| Model | F1 score | Recall | Precision |
| LR | 0.80 | 0.80 | 0.81 |
| SVM | 0.81 | 0.81 | 0.82 |
| Linear SVM | 0.82 | 0.82 | 0.84 |
| LDA-eigen¹ | 0.85 | 0.85 | 0.86 |
| LDA-lsqr¹ | 0.85 | 0.85 | 0.86 |
| LDA-svd | 0.82 | 0.82 | 0.84 |
| MLP | 0.82 | 0.81 | 0.84 |

¹Values of 0.85 and above. LR: Logistic regression; SVM: Support vector machine; LDA: Linear discriminant analysis; MLP: Multilayer perceptron.
Figure 4. F1 score, recall rate and precision rate of different models with increasing sample size. LR: Logistic regression; SVM: Support vector machine; LDA: Linear discriminant analysis; MLP: Multilayer perceptron.
F1 score, recall rate and precision rate of each grade in different models

| Model | Grade | F1 score | Recall | Precision |
| LR | G1 | 0.83 | 0.94 | 0.75 |
| LR | G2 | 0.80 | 0.73 | 0.88 |
| LR | G3 | 0.73 | 0.73 | 0.73 |
| SVM | G1 | 0.85 | 0.94 | 0.77 |
| SVM | G2 | 0.81 | 0.75 | 0.88 |
| SVM | G3 | 0.73 | 0.73 | 0.73 |
| Linear SVM | G1 | 0.85 | 0.94 | 0.77 |
| Linear SVM | G2 | 0.81 | 0.73 | 0.92 |
| Linear SVM | G3 | 0.80 | 0.91 | 0.71 |
| LDA-eigen | G1 | 0.85 | 0.94 | 0.77 |
| LDA-eigen | G2 | 0.85 | 0.81 | 0.89 |
| LDA-eigen | G3 | 0.84 | 0.73 | 1.00 |
| LDA-lsqr | G1 | 0.85 | 0.94 | 0.77 |
| LDA-lsqr | G2 | 0.85 | 0.81 | 0.89 |
| LDA-lsqr | G3 | 0.84 | 0.73 | 1.00 |
| LDA-svd | G1 | 0.85 | 0.97 | 0.76 |
| LDA-svd | G2 | 0.82 | 0.75 | 0.90 |
| LDA-svd | G3 | 0.76 | 0.73 | 0.80 |
| MLP | G1 | 0.83 | 0.94 | 0.75 |
| MLP | G2 | 0.81 | 0.75 | 0.88 |
| MLP | G3 | 0.83 | 0.94 | 0.68 |
LR: Logistic regression; SVM: Support vector machine; LDA: Linear discriminant analysis; MLP: Multilayer perceptron.
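Per-grade F1, recall, and precision as in the table above are what scikit-learn's `precision_recall_fscore_support` computes when given the three class labels. A minimal sketch with toy label vectors standing in for G1/G2/G3:

```python
# Per-class precision/recall/F1, one row of output arrays per grade.
# Labels 0, 1, 2 stand in for G1, G2, G3; the vectors are illustrative.
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 0]

prec, rec, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2])
```

Each returned array has one entry per grade, in the order given by `labels`, which is exactly the row structure of the table.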
Figure 5. The importance of different variables for each model. LR: Logistic regression; SVM: Support vector machine; LDA: Linear discriminant analysis; MLP: Multilayer perceptron.
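For linear models such as LR and LDA, a common proxy for the variable importance shown in Figure 5 is the magnitude of the standardized coefficients. A minimal sketch on synthetic data; the feature names are the paper's ten clinical variables, but how the authors actually computed importance is not stated in this extract, so this is one plausible method, not a reproduction:

```python
# Variable-importance proxy for a multiclass logistic regression:
# standardize features, fit, then average |coefficient| over classes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

names = ["BIL", "ALT", "CA72-4", "CEA", "CA19-9",
         "AFP", "age", "CA15-3", "CA125", "gender"]

X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)

lr = LogisticRegression(max_iter=1000)
lr.fit(StandardScaler().fit_transform(X), y)

# coef_ has shape (n_classes, n_features); average magnitude per feature.
importance = dict(zip(names, np.abs(lr.coef_).mean(axis=0)))
ranked = sorted(importance, key=importance.get, reverse=True)
```

Model-agnostic alternatives such as `sklearn.inspection.permutation_importance` would also work for the SVM and MLP, where coefficients are unavailable or hard to interpret.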