Thodsawit Tiyarattanachai, Terapap Apiparakoon, Sanparith Marukatat, Sasima Sukcharoen, Nopavut Geratikornsupuk, Nopporn Anukulkarnkusol, Parit Mekaroonkamol, Natthaporn Tanpowpong, Pamornmas Sarakul, Rungsun Rerknimitr, Roongruedee Chaiteerakij.
Abstract
Artificial intelligence (AI) using a convolutional neural network (CNN) has demonstrated promising performance in radiological analysis. We aimed to develop and validate a CNN for the detection and diagnosis of focal liver lesions (FLLs) from ultrasonography (USG) still images. The CNN was developed with a supervised training method using 40,397 retrospectively collected images from 3,487 patients, including 20,432 FLLs (hepatocellular carcinomas (HCCs), cysts, hemangiomas, focal fatty sparing, and focal fatty infiltration). AI performance was evaluated using an internal test set of 6,191 images with 845 FLLs, then externally validated using 18,922 images with 1,195 FLLs from two additional hospitals. The internal evaluation yielded an overall detection rate, diagnostic sensitivity, and specificity of 87.0% (95%CI: 84.3-89.6), 83.9% (95%CI: 80.3-87.4), and 97.1% (95%CI: 96.5-97.7), respectively. The CNN also performed consistently well on the external validation cohorts, with a detection rate, diagnostic sensitivity, and specificity of 75.0% (95%CI: 71.7-78.3), 84.9% (95%CI: 81.6-88.2), and 97.1% (95%CI: 96.5-97.6), respectively. For diagnosis of HCC, the CNN yielded sensitivity, specificity, and negative predictive value (NPV) of 73.6% (95%CI: 64.3-82.8), 97.8% (95%CI: 96.7-98.9), and 96.5% (95%CI: 95.0-97.9) on the internal test set; and 81.5% (95%CI: 74.2-88.8), 94.4% (95%CI: 92.8-96.0), and 97.4% (95%CI: 96.2-98.5) on the external validation set, respectively. The CNN detected and diagnosed common FLLs in USG images with excellent specificity and NPV for HCC. Further development of an AI system for real-time detection and characterization of FLLs in USG is warranted.
Year: 2021 PMID: 34101764 PMCID: PMC8186767 DOI: 10.1371/journal.pone.0252882
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Number of USG images from 3 participating hospitals, along with allocation of images for AI training and performance evaluation.
| | Training set | Internal test set | External validation: Cohort 1 | External validation: Cohort 2 | External validation: Pooled |
|---|---|---|---|---|---|
| Patients | 3487 | 385 | 311 | 625 | 936 |
| Images | 40397 | 6191 | 5624 | 13298 | 18922 |
| Images with FLLs | 18239 | 801 | 344 | 734 | 1078 |
| Total | 20432 (100) | 845 (100) | 360 (100) | 835 (100) | 1195 (100) |
| HCC | 2414 (11.8) | 102 (12.1) | 34 (9.4) | 104 (12.5) | 138 (11.5) |
| Cyst | 6600 (32.3) | 215 (25.4) | 130 (36.1) | 87 (10.4) | 217 (18.2) |
| Hemangioma | 5374 (26.3) | 217 (25.7) | 60 (16.7) | 202 (24.2) | 262 (21.9) |
| FFS | 5110 (25.0) | 264 (31.2) | 120 (33.3) | 404 (48.4) | 524 (43.8) |
| FFI | 934 (4.6) | 47 (5.6) | 16 (4.4) | 38 (4.6) | 54 (4.5) |
| Total | 1.6 (1.7) | 1.6 (1.6) | 1.5 (1.3) | 1.8 (1.7) | 1.7 (1.6) |
| HCC | 3.7 (5.5) | 3.3 (5.8) | 2.3 (6.6) | 3.9 (4.4) | 3.9 (5.5) |
| Cyst | 1.4 (1.5) | 1.0 (0.8) | 1.0 (0.7) | 1.2 (0.9) | 1.1 (0.9) |
| Hemangioma | 1.2 (1.2) | 1.4 (1.1) | 1.9 (1.1) | 1.5 (1.5) | 1.6 (1.4) |
| FFS | 1.7 (1.1) | 1.8 (1.4) | 1.9 (1.4) | 1.7 (1.3) | 1.8 (1.3) |
| FFI | 2.5 (2.5) | 2.4 (3.5) | 1.7 (1.0) | 2.4 (2.7) | 2.1 (2.7) |
| Images without FLLs | 22158 | 5390 | 5280 | 12564 | 17844 |
a: King Chulalongkorn Memorial Hospital, Bangkok, Thailand
b: Mahachai Hospital, Samut Sakhon, Thailand
c: Queen Savang Vadhana Memorial Hospital, Chonburi, Thailand
Performance of the AI system on the internal test set and external validation cohorts.
| | Internal test set | External validation: Cohort 1 | External validation: Cohort 2 | External validation: Pooled | P* |
|---|---|---|---|---|---|
| Overall | | | | | |
| Detection rate | 87.0 (84.3–89.6) | 80.3 (74.8–85.8) | 73.9 (69.9–78.0) | 75.0 (71.7–78.3) | <0.001 |
| Diagnostic sensitivity | 83.9 (80.3–87.4) | 84.6 (79.0–90.2) | 85.7 (81.7–89.6) | 84.9 (81.6–88.2) | 0.69 |
| Diagnostic specificity | 97.1 (96.5–97.7) | 97.2 (96.3–98.2) | 97.0 (96.3–97.7) | 97.1 (96.5–97.6) | 0.98 |
| HCC | | | | | |
| Detection rate | 85.3 (78.4–92.2) | 91.2 (81.6–101) | 74.0 (65.6–82.5) | 78.3 (71.4–85.2) | 0.16 |
| Diagnostic sensitivity | 73.6 (64.3–82.8) | 74.2 (58.8–89.6) | 84.4 (76.3–92.5) | 81.5 (74.2–88.8) | 0.19 |
| Diagnostic specificity | 97.8 (96.7–98.9) | 96.1 (93.7–98.5) | 93.6 (91.5–95.7) | 94.4 (92.8–96.0) | 0.55 |
| Cyst | | | | | |
| Detection rate | 89.3 (85.2–93.4) | 76.9 (69.7–84.2) | 85.1 (77.6–92.5) | 80.2 (74.9–85.5) | 0.008 |
| Diagnostic sensitivity | 97.9 (95.9–99.9) | 91.0 (85.4–96.6) | 98.6 (96.0–100) | 94.3 (90.8–97.7) | 0.07 |
| Diagnostic specificity | 98.3 (97.2–99.4) | 97.8 (95.8–99.9) | 98.7 (97.7–99.7) | 98.5 (97.6–99.4) | 0.99 |
| Hemangioma | | | | | |
| Detection rate | 93.5 (90.3–96.8) | 78.3 (67.9–88.8) | 79.7 (74.2–85.2) | 79.4 (74.5–84.3) | <0.001 |
| Diagnostic sensitivity | 80.8 (75.4–86.2) | 74.5 (62.0–86.9) | 67.7 (60.5–74.9) | 69.2 (63.0–75.5) | 0.006 |
| Diagnostic specificity | 95.0 (93.2–96.9) | 97.9 (96.1–99.7) | 96.2 (94.4–98.0) | 96.8 (95.5–98.1) | 0.12 |
| FFS | | | | | |
| Detection rate | 77.3 (72.2–82.3) | 80.0 (72.8–87.2) | 67.6 (63.0–72.1) | 70.4 (66.5–74.3) | 0.03 |
| Diagnostic sensitivity | 98.0 (96.1–99.9) | 100 (96.2–100) | 98.5 (97.1–100) | 98.9 (97.9–100) | 0.41 |
| Diagnostic specificity | 96.9 (95.5–98.4) | 95.8 (92.9–98.6) | 98.5 (97.2–99.8) | 97.5 (96.2–98.9) | 0.53 |
| FFI | | | | | |
| Detection rate | 89.4 (80.5–98.2) | 75.0 (53.8–96.2) | 63.2 (47.8–78.5) | 66.7 (54.1–79.3) | 0.004 |
| Diagnostic sensitivity | 69.0 (55.1–83.0) | 83.3 (62.2–100) | 79.2 (62.9–95.4) | 80.6 (67.6–93.5) | 0.60 |
| Diagnostic specificity | 97.4 (96.2–98.6) | 98.5 (97.1–100) | 98.1 (97.0–99.2) | 98.3 (97.4–99.1) | 0.97 |
a: KCMH, King Chulalongkorn Memorial Hospital, Bangkok, Thailand
b: Mahachai Hospital, Samut Sakhon, Thailand
c: Queen Savang Vadhana Memorial Hospital, Chonburi, Thailand
*P-value from a two-tailed z-test for the difference of proportions, comparing performance in the internal test set with the pooled external validation set. A P-value <0.05 was considered statistically significant.
†Clopper-Pearson confidence intervals were calculated for performance values at the boundaries (i.e., 0% and 100%).
Detection rates, diagnostic sensitivities, and specificities are shown as percentages. 95% confidence intervals are shown in parentheses.
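The intervals and P-values above can be approximately reproduced from counts given elsewhere in this record. The following is a minimal sketch, not the authors' code. The footnotes name a two-tailed z-test and the Clopper-Pearson interval at the boundaries, but they do not state how interior intervals were computed or whether the z-test pooled the variance. The Wald (normal-approximation) interval and the unpooled standard error used here are therefore assumptions, and all function names are hypothetical. The HCC counts (87 of 102 detected internally, 108 of 138 externally) come from the FLL counts and the HCC column totals of the confusion matrices. The count of 96 detected FFS lesions in Cohort 1 is inferred from the 80.0% detection rate on 120 lesions.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def wald_ci(successes, n, z=Z_95):
    """Normal-approximation (Wald) interval for a proportion; not clipped to [0, 1]."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def clopper_pearson_all_positive(n, alpha=0.05):
    """Exact (Clopper-Pearson) interval for the boundary case successes == n."""
    return (alpha / 2) ** (1 / n), 1.0


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-tailed z-test for p1 - p2 with an unpooled standard error (assumption)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# HCC detection: 87 of 102 lesions (internal) vs 108 of 138 lesions (pooled external)
lo, hi = wald_ci(87, 102)
z, p = two_proportion_z_test(87, 102, 108, 138)
print(f"internal HCC detection: {87 / 102:.1%} ({lo:.1%} to {hi:.1%})")
print(f"internal vs pooled external: z = {z:.2f}, P = {p:.2f}")

# FFS diagnostic sensitivity in Cohort 1 is 100%, so the exact interval applies
lo_cp, hi_cp = clopper_pearson_all_positive(96)
print(f"boundary case, n = 96: {lo_cp:.1%} to {hi_cp:.1%}")
```

Under these assumptions the sketch gives 78.4 to 92.2 for internal HCC detection and 96.2 to 100 for the boundary case, matching the tabulated values. Because the Wald interval is not clipped, it can exceed 100% for small samples with high proportions, which is consistent with the upper bound of 101 reported for HCC detection in Cohort 1.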
Confusion matrix for classification results on internal test set and external validation set.
Internal test set (rows: CNN diagnosis; columns: reference diagnosis)

| | HCC | Cyst | Hemangioma | FFS | FFI | Total |
|---|---|---|---|---|---|---|
| HCC | 64 | 2 | 9 | 1 | 2 | 78 |
| Cyst | 3 | 188 | 4 | 2 | 0 | 197 |
| Hemangioma | 13 | 2 | 164 | 1 | 10 | 190 |
| FFS | 5 | 0 | 10 | 200 | 1 | 216 |
| FFI | 2 | 0 | 16 | 0 | 29 | 47 |
| Total | 87 | 192 | 203 | 204 | 42 | 728 |

External validation set, pooled (rows: CNN diagnosis; columns: reference diagnosis)

| | HCC | Cyst | Hemangioma | FFS | FFI | Total |
|---|---|---|---|---|---|---|
| HCC | 88 | 3 | 38 | 2 | 1 | 132 |
| Cyst | 4 | 164 | 7 | 0 | 0 | 175 |
| Hemangioma | 12 | 2 | 144 | 2 | 6 | 166 |
| FFS | 3 | 5 | 5 | 365 | 0 | 378 |
| FFI | 1 | 0 | 14 | 0 | 29 | 44 |
| Total | 108 | 174 | 208 | 369 | 36 | 895 |
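The per-class diagnostic metrics reported in this record can be recomputed from the confusion matrices with a one-vs-rest reduction. The sketch below uses the first (internal test set) matrix and assumes that rows hold the CNN diagnosis and columns hold the reference diagnosis. That orientation is an inference, supported by the HCC column total of 87 equalling 85.3% of the 102 HCC lesions in the internal test set. The names `LABELS`, `INTERNAL`, and `one_vs_rest` are hypothetical.

```python
LABELS = ["HCC", "Cyst", "Hemangioma", "FFS", "FFI"]

# Internal test set; rows = CNN diagnosis, columns = reference diagnosis (assumed)
INTERNAL = [
    [64, 2, 9, 1, 2],
    [3, 188, 4, 2, 0],
    [13, 2, 164, 1, 10],
    [5, 0, 10, 200, 1],
    [2, 0, 16, 0, 29],
]


def one_vs_rest(matrix, k):
    """Sensitivity, specificity, and NPV for class k against all other classes."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fp = sum(matrix[k]) - tp                  # diagnosed as k, reference says otherwise
    fn = sum(row[k] for row in matrix) - tp   # reference says k, diagnosed otherwise
    tn = total - tp - fp - fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
    }


per_class = {label: one_vs_rest(INTERNAL, i) for i, label in enumerate(LABELS)}

# Unweighted (macro) averages across the five lesion types
macro_sensitivity = sum(m["sensitivity"] for m in per_class.values()) / len(LABELS)
macro_specificity = sum(m["specificity"] for m in per_class.values()) / len(LABELS)

for label, m in per_class.items():
    print(label, {name: f"{value:.1%}" for name, value in m.items()})
print(f"macro sensitivity {macro_sensitivity:.1%}, macro specificity {macro_specificity:.1%}")
```

For HCC this gives a sensitivity of 73.6%, specificity of 97.8%, and NPV of 96.5%, in line with the abstract. The unweighted means (83.9% and 97.1%) coincide with the overall diagnostic sensitivity and specificity reported for the internal test set, which suggests, but does not confirm, that the overall figures are macro averages rather than pooled proportions.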