Weiwei Liu¹, Lifan Zhang², Zhaodan Xin¹, Haili Zhang³, Liting You¹, Ling Bai¹, Juan Zhou¹, Binwu Ying¹.
Abstract
Background: The non-invasive preoperative diagnosis of microvascular invasion (MVI) in hepatocellular carcinoma (HCC) is vital for precise surgical decision-making and patient prognosis. Herein, we aimed to develop an MVI prediction model with valid performance and clinical interpretability.
Keywords: extreme gradient boosting (XGBoost); hepatocellular carcinoma; machine learning; microvascular invasion; non-invasive predictive models
Year: 2022 PMID: 35311094 PMCID: PMC8931027 DOI: 10.3389/fonc.2022.852736
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Figure 1. Flow chart of patient selection.
Baseline characteristics of the participants.
| Variables | Total (N=2160) | NMVI (N=1585) | MVI (N=575) | P value |
|---|---|---|---|---|
| | 53.2 (11.6) | 53.6 (11.4) | 52.0 (12.0) | 0.004 |
| | | | | 0.239 |
| | 1813 (83.9) | 1321 (83.3) | 492 (85.6) | |
| | 347 (16.1) | 264 (16.7) | 83 (14.4) | |
| | 165.1 (7.0) | 165.1 (7.0) | 165.2 (7.0) | 0.857 |
| | 63.2 (10.2) | 63.5 (10.4) | 62.5 (9.6) | 0.051 |
| | 23.2 (3.1) | 23.3 (3.2) | 22.9 (2.9) | 0.008 |
| | | | | 0.313 |
| | 76 (3.5) | 50 (3.2) | 26 (4.5) | |
| | 2046 (94.7) | 1507 (95.1) | 539 (93.7) | |
| | 38 (1.8) | 28 (1.8) | 10 (1.7) | |
| | | | | 0.113 |
| | 1773 (82) | 1314 (82.9) | 459 (79.8) | |
| | 387 (18) | 271 (17.1) | 116 (20.2) | |
| | | | | 0.768 |
| | 23 (1.1) | 18 (1.1) | 5 (0.9) | |
| | 2137 (98.9) | 1567 (98.9) | 570 (99.1) | |
| | | | | 0.114 |
| | 911 (42.2) | 685 (43.2) | 226 (39.3) | |
| | 1249 (57.8) | 900 (56.8) | 349 (60.7) | |
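The source does not name the test behind the P values for the categorical rows. As one plausible illustration, the sketch below runs a Pearson chi-square test on a 2×2 table in plain Python, optionally with Yates' continuity correction; the function name `chi2_2x2_pvalue` is hypothetical and this is not the authors' analysis code. Applied to the first two-category variable above (1321 and 264 NMVI patients versus 492 and 83 MVI patients), the corrected test gives a P value of about 0.239, in line with the tabulated value, while the uncorrected test gives about 0.214.

```python
import math
from typing import Tuple


def chi2_2x2_pvalue(a: int, b: int, c: int, d: int, yates: bool = True) -> Tuple[float, float]:
    """Pearson chi-square test on the 2x2 table [[a, b], [c, d]] (1 degree of freedom).

    Rows are categories, columns are groups (here NMVI, MVI).
    Returns (chi-square statistic, P value). A zero row or column total
    makes an expected count zero and raises ZeroDivisionError.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                # Continuity correction, floored at zero so it never overshoots.
                diff = max(diff - 0.5, 0.0)
            chi2 += diff * diff / expected
    # With 1 df the statistic is Z^2, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # First two-category variable of the baseline table:
    # category 1: 1321 NMVI, 492 MVI; category 2: 264 NMVI, 83 MVI.
    for corrected in (True, False):
        stat, p = chi2_2x2_pvalue(1321, 492, 264, 83, yates=corrected)
        print(f"Yates={corrected}: chi2={stat:.3f}, P={p:.3f}")
```

The 1-df shortcut through `math.erfc` keeps the sketch dependency-free; for larger tables, a library routine such as SciPy's `chi2_contingency` is the usual choice.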
Differences in clinical characteristics between the MVI and NMVI groups.
| Variables | Total (N=2160) | NMVI (N=1585) | MVI (N=575) | P value |
|---|---|---|---|---|
| | | | | <0.001 |
| | 230 (10.6) | 118 (7.4) | 112 (19.5) | |
| | 1930 (89.4) | 1467 (92.6) | 463 (80.5) | |
| | 5.5 (3.4) | 4.9 (3.1) | 7.1 (3.7) | <0.001 |
| | | | | <0.001 |
| | 343 (18.3) | 198 (14.4) | 145 (29.1) | |
| | 1527 (81.7) | 1173 (85.6) | 354 (70.9) | |
| | 142.2 (71.2) | 136.2 (67.6) | 158.9 (77.9) | <0.001 |
| | 60.2 (10.0) | 59.6 (10.1) | 61.7 (9.7) | <0.001 |
| | 28.9 (8.8) | 29.4 (8.8) | 27.5 (8.7) | <0.001 |
| | 2.5 (1.5) | 2.4 (1.4) | 2.7 (1.7) | <0.001 |
| | 2.7 (1.0) | 2.6 (0.9) | 2.9 (1.0) | <0.001 |
| | 46.8 (41.8) | 43.7 (38.3) | 55.3 (49.1) | <0.001 |
| | 1.2 (0.7) | 1.1 (0.5) | 1.3 (0.9) | <0.001 |
| | 103.9 (62.2) | 100.4 (61.1) | 113.7 (64.3) | <0.001 |
| | 93.9 (124.8) | 85.6 (122.8) | 116.7 (127.2) | <0.001 |
| | 2.4 (0.8) | 2.4 (0.7) | 2.5 (0.9) | <0.001 |
| | 194.4 (86.4) | 186.8 (69.5) | 215.2 (118.9) | <0.001 |
| | 150.0 (63.8) | 146.1 (54.7) | 160.6 (83.1) | <0.001 |
| | | | | <0.001 |
| | 616 (38.2) | 484 (41.8) | 132 (29.2) | |
| | 995 (61.8) | 675 (58.2) | 320 (70.8) | |
| | 3.2 (2.1) | 3.0 (2.1) | 3.5 (2.0) | <0.001 |
| | | | | <0.001 |
| | 1404 (65.5) | 1118 (71.1) | 286 (50.0) | |
| | 740 (34.5) | 454 (28.9) | 286 (50.0) | |
| | | | | <0.001 |
| | 1270 (78.1) | 969 (81.4) | 301 (69.0) | |
| | 357 (21.9) | 222 (18.6) | 135 (31.0) | |
| | 5597.0 (14822.6) | 3009.1 (9716.8) | 11905.3 (21680.9) | <0.001 |
| | 628 (29.1) | 524 (33.1) | 104 (18.1) | |
| | 503 (23.3) | 278 (17.5) | 225 (39.1) | |
Figure 2. Development and validation of the MVI prediction model. (A) Training process of the XGBoost model. The vertical axis shows the mean training log loss (train-log-loss-mean); the horizontal axis shows the number of iterations in cross-validation. (B) Learning curves for the training and testing cohorts. The vertical axis shows the score of each cohort; the horizontal axis shows the number of training samples.
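Panel A tracks the mean training log loss across cross-validation folds. To make that quantity concrete, the sketch below computes binary log loss (cross-entropy) in plain Python and averages it over folds. The fold data are toy values and the names `binary_log_loss` and `mean_fold_log_loss` are hypothetical; this is not the authors' training code. In the XGBoost Python package, `xgb.cv` with the `logloss` metric reports a column named `train-logloss-mean`, which is presumably what the axis label refers to, but the source does not confirm this.

```python
import math
from typing import List, Sequence, Tuple


def binary_log_loss(y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy of 0/1 labels under predicted probabilities."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities away from 0 and 1 so log() is always defined.
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def mean_fold_log_loss(folds: List[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Average the per-fold log loss, the kind of summary a CV curve plots per iteration."""
    losses = [binary_log_loss(y, p) for y, p in folds]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical toy folds: (true MVI labels, predicted MVI probabilities).
    folds = [
        ([1, 0, 0, 1], [0.8, 0.2, 0.1, 0.6]),
        ([0, 1, 0, 0], [0.3, 0.7, 0.2, 0.1]),
    ]
    print(f"mean log loss: {mean_fold_log_loss(folds):.4f}")
```

A curve that keeps falling on the training folds while the held-out loss flattens or rises is the usual sign of overfitting, which is why such plots are read alongside the learning curves in panel B.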
Figure 3. Performance of the predictive models. (A) ROC curves of the prediction models. (B) Precision-recall curves (PRC) of the prediction models. (C) Confusion matrix of the XGBoost model in the validation cohort. The confusion matrix shows true negatives in the first quadrant, false negatives in the second, true positives in the third, and false positives in the fourth.
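Panels A to C rest on a few standard quantities: the confusion-matrix counts, precision and recall derived from them, and the ROC AUC. The plain-Python sketch below computes each one on a hypothetical toy cohort, using an assumed decision threshold of 0.5; the function names are illustrative and not taken from the source. AUC is computed here through its probabilistic definition (the chance that a randomly chosen MVI case scores higher than a randomly chosen NMVI case), which is exact but quadratic in cohort size.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TN, FP, FN, TP) for 0/1 labels, treating 1 (MVI) as positive."""
    tn = fp = fn = tp = 0
    for y, p in zip(y_true, y_pred):
        if y == 1 and p == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def precision_recall(fp: int, fn: int, tp: int) -> Tuple[float, float]:
    """Precision = TP/(TP+FP); recall (sensitivity) = TP/(TP+FN); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy cohort and model scores; 0.5 is an assumed threshold.
    y_true = [1, 0, 1, 0, 0, 1]
    scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.7]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    print("TN, FP, FN, TP:", (tn, fp, fn, tp))
    print("precision, recall:", precision_recall(fp, fn, tp))
    print("AUC:", roc_auc(y_true, scores))
```

With these toy values the counts are (2, 1, 1, 2), precision and recall are both 2/3, and the AUC is 7/9. For comparison, scikit-learn's `confusion_matrix` lays out a binary result as `[[TN, FP], [FN, TP]]`, so how counts map to quadrants depends on the plotting convention used.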
Figure 4. Model interpretation. (A) Feature importance matrix plot derived from the XGBoost model. (B) SHAP summary plot of the XGBoost model. Each dot represents one feature's contribution for one patient; the higher a feature's SHAP value, the greater its contribution to the predicted risk of MVI. Red indicates a high feature value and blue a low feature value. (C) SHAP dependence plot of the XGBoost model, showing how each feature of interest contributes to the model output. A SHAP value above zero indicates that the feature raises the predicted risk of MVI.
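Each dot in the SHAP panels is a Shapley value: a feature's average marginal contribution to one patient's prediction, taken over all coalitions of the other features. The sketch below computes exact Shapley values by brute-force enumeration for a tiny hypothetical model (`toy_risk_score`), filling absent features from a baseline vector. It illustrates the definition only: the SHAP library's `TreeExplainer` uses a much faster tree-specific algorithm with its own choice of value function, and nothing here is the authors' MVI model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Sequence


def exact_shapley(model: Callable[[Sequence[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> Dict[int, float]:
    """Exact Shapley values for one prediction by enumerating all feature coalitions.

    Features outside a coalition take their baseline value (a simple
    interventional value function). Cost grows as O(n * 2^n), so this
    is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi: Dict[int, float] = {}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # k = size of a coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_risk_score(z: Sequence[float]) -> float:
    """Hypothetical stand-in model with one interaction term (z0 * z2)."""
    return 0.5 * z[0] + 2.0 * z[1] + z[0] * z[2]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    base = [0.0, 0.0, 0.0]
    phi = exact_shapley(toy_risk_score, x, base)
    print(phi)
    # Efficiency property: contributions sum to f(x) - f(baseline).
    print(sum(phi.values()), toy_risk_score(x) - toy_risk_score(base))
```

For the toy model the values come out to about 2.0, 4.0 and 1.5, summing to f(x) - f(baseline) = 7.5; the interaction term (1 × 3 = 3) is split equally between features 0 and 2. A positive value means the feature pushed this prediction above the baseline output, which is the sense in which a SHAP value above zero raises predicted MVI risk.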
| Abbreviation | Full name |
|---|---|
| AFP | α-fetoprotein |
| AI | artificial intelligence |
| ALB | albumin |
| ALP | alkaline phosphatase |
| ALP/GGT (A/G) | alkaline phosphatase to γ-glutamyl transferase ratio |
| ALT | alanine aminotransferase |
| APTT | activated partial thromboplastin time |
| AST | aspartate aminotransferase |
| AST/ALT (A/A) | aspartate aminotransferase to alanine aminotransferase ratio |
| AUC | area under the receiver operating characteristic curve |
| BASO% | basophil percentage |
| BMI | body mass index |
| CA-125 | carbohydrate antigen 125 |
| CA19-9 | carbohydrate antigen 19-9 |
| CEA | carcinoembryonic antigen |
| CHOL | cholesterol |
| CK | creatine kinase |
| CREA | creatinine |
| Cys-C | cystatin C |
| DBIL | direct bilirubin |
| eGFR | estimated glomerular filtration rate |
| EHR | electronic health record |
| EO% | eosinophil percentage |
| FIB | fibrinogen |
| FN | False Negative |
| FP | False Positive |
| GGT | γ-glutamyl transferase |
| GGTP | gamma glutamyl transpeptidase |
| GLB | globulin |
| GLU | glucose |
| Hb | hemoglobin |
| HBcAb | hepatitis B core antibody |
| HBDH | hydroxybutyrate dehydrogenase |
| HBeAb | hepatitis B e antibody |
| HBeAg | hepatitis B e antigen |
| HBsAb | hepatitis B surface antibody |
| HBsAg | hepatitis B surface antigen |
| HBV | hepatitis B virus |
| HBV DNA | hepatitis B virus DNA |
| HCV | hepatitis C virus |
| HCC | hepatocellular carcinoma |
| Hct | hematocrit |
| HDL-C | high-density lipoprotein cholesterol |
| IBIL | indirect bilirubin |
| IG% | immature granulocyte percentage |
| IG | absolute count of immature granulocytes |
| LDH | lactate dehydrogenase |
| LDL-C | low-density lipoprotein cholesterol |
| LYMPH% | percentage of lymphocytes |
| MCH | mean corpuscular hemoglobin |
| MCHC | mean corpuscular hemoglobin concentration |
| MCV | mean red blood cell volume |
| MONO% | monocyte percentage |
| ML | machine learning |
| MLP | multilayer perceptron |
| MVI | microvascular invasion |
| NEUT% | neutrophil percentage |
| NLR | neutrophil-to-lymphocyte ratio |
| PIVKA-II | protein induced by vitamin K absence or antagonist-II |
| PLT | platelets |
| PRC | precision-recall curve |
| PT | prothrombin time |
| RBC | red blood cells |
| RDW-CV | RBC distribution width-coefficient of variation |
| RDW-SD | RBC distribution width-standard deviation |
| RF | random forest |
| RFA | radiofrequency ablation |
| RFS | relapse-free survival |
| SHAP | SHapley Additive exPlanations |
| SVM | support vector machine |
| TACE | transcatheter arterial chemoembolization |
| TBA | total bile acid |
| TBIL | total bilirubin |
| TN | True Negative |
| TP | True Positive |
| TT | thrombin time |
| UREA | urea |
| WBCs | white blood cells |
| XGBoost | extreme gradient boosting |
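Three of the abbreviations above (A/A, A/G and NLR) are ratio variables. The source does not show how they were derived, so the sketch below is a minimal illustration assuming each is a plain quotient of the two raw laboratory values. The dictionary keys and the function name `ratio_features` are hypothetical. One point worth making explicit: NEUT%/LYMPH% equals the ratio of absolute neutrophil to lymphocyte counts, because both percentages are taken of the same white-cell total.

```python
from typing import Dict


def ratio_features(lab: Dict[str, float]) -> Dict[str, float]:
    """Derive the ratio variables listed above from raw laboratory values.

    The dictionary keys are hypothetical field names; a zero denominator
    raises ZeroDivisionError rather than being silently imputed.
    """
    return {
        "AST/ALT": lab["AST"] / lab["ALT"],  # A/A ratio
        "ALP/GGT": lab["ALP"] / lab["GGT"],  # A/G ratio
        # NEUT% / LYMPH% equals the absolute neutrophil/lymphocyte count
        # ratio, because both percentages share the same WBC denominator.
        "NLR": lab["NEUT%"] / lab["LYMPH%"],
    }


if __name__ == "__main__":
    # Hypothetical patient values (units cancel within each ratio).
    patient = {"AST": 45.0, "ALT": 30.0, "ALP": 120.0, "GGT": 60.0,
               "NEUT%": 66.0, "LYMPH%": 22.0}
    print(ratio_features(patient))
```

Letting a zero denominator raise, rather than returning a placeholder, keeps missing or invalid laboratory values visible to whatever imputation step precedes model training.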