Wei-Bin Zhang1,2, Si-Ze Hou3, Yan-Ling Chen1, Feng Mao1, Yi Dong1, Jian-Gang Chen4, Wen-Ping Wang1.
Abstract
Background: First-line surveillance of hepatitis B virus (HBV)-infected populations with B-mode ultrasound has limited ability to identify hepatocellular carcinoma (HCC) without elevated α-fetoprotein (AFP). To improve the current HCC surveillance strategy, a state-of-the-art artificial intelligence (AI) technique, a deep learning (DL) approach, is proposed to assist in the diagnosis of focal liver lesions (FLLs) in an HBV-infected liver background.
Keywords: AFP negative; HBV infection; deep learning; focal liver lesion; focal nodular hyperplasia; hepatocellular carcinoma; ultrasound
Year: 2022 PMID: 35720017 PMCID: PMC9204304 DOI: 10.3389/fonc.2022.862297
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 5.738
Figure 1. Flowchart of deep learning model construction and analysis: (i) obtained grayscale images of the model cohort were fed into five deep learning models for training and model construction; (ii) selected lesions of the test cohort with similar clinical backgrounds were tested; and (iii) the five deep learning models were assessed in terms of diagnostic performance.
Figure 2. Flowchart of the patient selection process. HCC, hepatocellular carcinoma; FNH, focal nodular hyperplasia.
Figure 3. Structure of separable convolution.
Figure 4. Structure of Xception.
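To give a rough sense of why the separable convolution of Figure 3 is attractive, the sketch below compares weight counts for a standard convolution and a depthwise separable one. It assumes the usual formulation used in Xception (a k×k depthwise step followed by a 1×1 pointwise step) and ignores bias terms; the layer sizes are hypothetical and are not taken from the study.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise separable convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 128 input channels, 256 output channels.
std_params = standard_conv_params(3, 128, 256)
sep_params = separable_conv_params(3, 128, 256)
reduction = std_params / sep_params

print(f"standard:  {std_params} weights")
print(f"separable: {sep_params} weights")
print(f"reduction: {reduction:.2f}x")
```

For this hypothetical layer the separable form needs several times fewer weights, which is the usual motivation for building a network such as Xception from these blocks.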
Baseline information in the model and test cohorts.
| Parameters | Model cohort | Test cohort | p-value | HCC | FNH |
|---|---|---|---|---|---|
| Case | n = 305 | n = 102 | – | n = 209 | n = 198 |
| Age | 44.51 ± 16.34 | 48.12 ± 13.68 | 0.000 | 56.24 ± 10.96 | 34.31 ± 11.75 |
| Gender | | | 0.722 | | |
| Male | 214 (70.2) | 73 (71.6) | – | 179 (85.6) | 111 (56.1) |
| Female | 91 (29.8) | 29 (28.4) | – | 30 (14.4) | 87 (43.9) |
| HBV infection | 132 (43.3) | 102 (100.0) | 0.000 | 188 (90.0) | 50 (25.3) |
| AFP ≥ 20 (ng/ml) | 105 (34.4) | 0 | 0.000 | 110 (52.6) | 0 |
| Lesion | n = 310 | n = 103 | – | n = 209 | n = 204 |
| Lesion size | 45.35 ± 27.71 | 35.54 ± 19.94 | 0.004 | 39.95 ± 30.38 | 45.93 ± 21.02 |
| ≥3 cm | 211 (68.1) | 56 (54.4) | – | 122 (58.4) | 145 (71.1) |
| <3 cm | 99 (31.9) | 47 (45.6) | – | 87 (41.6) | 59 (28.9) |
| Lesion echogenicity | | | 0.421 | | |
| Hypo- | 197 (63.5) | 70 (68.0) | – | 135 (64.6) | 132 (64.7) |
| Iso- | 60 (19.4) | 21 (20.4) | – | 30 (14.4) | 51 (25.0) |
| Hyper- | 53 (17.1) | 12 (11.7) | – | 44 (21.1) | 21 (10.3) |
| Liver background | | | | | |
| Fatty liver | 49 (15.8) | 17 (16.5) | 0.867 | 24 (11.5) | 42 (20.6) |
| Liver fibrosis | 59 (19.0) | 36 (35.0) | 0.001 | 80 (38.3) | 15 (7.4) |
| Liver cirrhosis | 80 (25.8) | 25 (24.3) | 0.757 | 105 (50.2) | 0 |
Data are presented as mean ± SD or n (%). A p-value <0.05 was taken to indicate a statistically significant difference between the model and test cohorts.
HCC, hepatocellular carcinoma; FNH, focal nodular hyperplasia; HBV, hepatitis B virus; AFP, α-fetoprotein.
Figure 5. ROC curves of all deep learning models in the model and test cohorts. All methods showed excellent AUCs in the model cohort, whereas the ROC curves in the test cohort show how the DL methods differ when challenged with lesions from similar clinical backgrounds. ROC, receiver operating characteristic; AUC, area under the ROC curve; DL, deep learning.
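For readers less familiar with the AUC reported in Figure 5, the sketch below computes it as the probability that a randomly chosen positive lesion receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are hypothetical toy values, not study data, and the function name `roc_auc` is ours.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = HCC (positive class), 0 = FNH (negative class).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.4f}")
```

This pairwise form is quadratic in the number of samples, which is fine for a small test cohort; library implementations typically use a rank-based shortcut that gives the same value.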
Diagnostic performance of all deep learning models in the test cohort.
| Diagnostic index | Our method (Xception) | MobileNet | ResNet50 | DenseNet121 | InceptionV3 |
|---|---|---|---|---|---|
| AUC | 93.68% | 89.06% | 85.67% | 83.94% | 78.13% |
| 95% CI upper | 98.77% | 95.33% | 92.93% | 92.32% | 87.27% |
| 95% CI lower | 88.60% | 82.80% | 78.41% | 75.55% | 68.99% |
| Sensitivity | 96.08% | 96.08% | 88.24% | 88.24% | 92.16% |
| Specificity | 76.92% | 61.54% | 59.62% | 61.54% | 53.85% |
| Accuracy | 86.41% | 78.64% | 73.79% | 74.76% | 72.82% |
| F1-score | 88.66% | 81.67% | 76.78% | 77.59% | 77.05% |
| PPV | 80.33% | 71.01% | 68.18% | 69.23% | 66.20% |
| NPV | 95.24% | 94.12% | 83.78% | 84.21% | 87.50% |
| FPR | 23.08% | 38.46% | 40.38% | 38.46% | 46.15% |
| FNR | 3.92% | 3.92% | 11.76% | 11.76% | 7.84% |
AUC, area under the receiver operating characteristic curve; PPV, positive predictive value; NPV, negative predictive value; FPR, false-positive rate; FNR, false-negative rate.
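The indices in the table above all derive from a 2×2 confusion matrix. The sketch below shows the standard definitions, treating HCC as the positive class. The counts are an assumption: they were back-calculated from the MobileNet column, assuming the 103 test lesions split into 51 HCC and 52 FNH, and are not stated in the source.

```python
def diagnostic_indices(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard diagnostic indices from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
        "ppv": ppv,
        "npv": npv,
        "fpr": 1 - specificity,   # false-positive rate
        "fnr": 1 - sensitivity,   # false-negative rate
    }


# Assumed counts (back-calculated, not from the source); positive class = HCC.
indices = diagnostic_indices(tp=49, fn=2, fp=20, tn=32)
for name, value in indices.items():
    print(f"{name:12s} {value * 100:6.2f}%")
```

Under these assumed counts, the printed percentages agree with the MobileNet column to two decimals, which suggests the table follows these definitions.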
Accuracy of all models in 5-fold cross-validation.
| Accuracy | Xception | MobileNet | ResNet50 | DenseNet121 | InceptionV3 |
|---|---|---|---|---|---|
| Training cohort | | | | | |
| Accuracy1 | 98.26% | 94.77% | 83.06% | 95.89% | 95.39% |
| Accuracy2 | 98.13% | 94.27% | 84.56% | 95.39% | 94.89% |
| Accuracy3 | 98.63% | 94.89% | 84.06% | 95.89% | 96.14% |
| Accuracy4 | 98.01% | 94.52% | 83.94% | 96.26% | 94.89% |
| Accuracy5 | 98.01% | 95.15% | 84.70% | 96.14% | 95.77% |
| Validation cohort | | | | | |
| Accuracy1 | 98.00% | 94.53% | 88.06% | 96.02% | 95.52% |
| Accuracy2 | 98.51% | 96.52% | 82.09% | 98.01% | 97.51% |
| Accuracy3 | 96.52% | 94.03% | 84.08% | 96.02% | 92.54% |
| Accuracy4 | 99.00% | 95.52% | 84.58% | 94.53% | 97.51% |
| Accuracy5 | 99.00% | 93.00% | 81.50% | 95.00% | 94.00% |
Each model in the 5-fold cross-validation was trained for at most 50 epochs. The proposed Xception-based method showed the best robustness among the compared models.
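One simple way to summarize the cross-validation table is the mean and sample standard deviation of the five validation-fold accuracies per model. The sketch below does this with the validation values copied from the table; using the standard deviation as a proxy for robustness is our assumption, not a measure described in the source.

```python
from statistics import mean, stdev

# Validation-fold accuracies (%) copied from the cross-validation table.
validation_accuracy = {
    "Xception":    [98.00, 98.51, 96.52, 99.00, 99.00],
    "MobileNet":   [94.53, 96.52, 94.03, 95.52, 93.00],
    "ResNet50":    [88.06, 82.09, 84.08, 84.58, 81.50],
    "DenseNet121": [96.02, 98.01, 96.02, 94.53, 95.00],
    "InceptionV3": [95.52, 97.51, 92.54, 97.51, 94.00],
}

# (mean, sample standard deviation) across the five folds for each model.
summary = {
    model: (mean(folds), stdev(folds))
    for model, folds in validation_accuracy.items()
}

for model, (avg, sd) in summary.items():
    print(f"{model:12s} mean = {avg:6.2f}%  sd = {sd:4.2f}")

best_mean = max(summary, key=lambda m: summary[m][0])
most_stable = min(summary, key=lambda m: summary[m][1])
```

With only five folds the standard deviation is a coarse estimate, so small differences between models should be read with caution.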