Younbeom Jeong1, Jung Hoon Kim2,3,4, Hee-Dong Chae5,6, Sae-Jin Park1,7, Jae Seok Bae1,7, Ijin Joo1,7, Joon Koo Han1,7,8.
Abstract
Ultrasonography (US) has been considered the imaging modality of choice for gallbladder (GB) polyps; however, it has limitations in differentiating nonneoplastic polyps from neoplastic polyps. We developed a deep learning-based decision support system (DL-DSS) for the differential diagnosis of GB polyps on US and investigated its usefulness. We retrospectively collected 535 patients and divided them into a development dataset (n = 437) and a test dataset (n = 98). A binary classification convolutional neural network model was developed by transfer learning. Using the test dataset, three radiologists with different experience levels retrospectively graded the possibility of a neoplastic polyp on a 5-point confidence scale. The reviewers were then asked to re-evaluate their grades with the assistance of the DL-DSS. The areas under the curve (AUCs) of the three reviewers were 0.94, 0.78, and 0.87. The DL-DSS alone showed an AUC of 0.92. With the assistance of the DL-DSS, the reviewers' AUCs improved to 0.95, 0.91, and 0.91. The reviewers' specificity also improved (from 65.1–85.7% to 71.4–93.7%). The intraclass correlation coefficient (ICC) improved from 0.87 to 0.93. In conclusion, the DL-DSS could be used as an assistive tool to narrow the gap between reviewers and to reduce the false positive rate.
Year: 2020 PMID: 32382062 PMCID: PMC7205977 DOI: 10.1038/s41598-020-64205-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Baseline Characteristics of Data Sets.
| Characteristics | Development Set | Test Set | Total | p |
|---|---|---|---|---|
| No. of Patients | 437 | 98 | 535 | |
| No. of Images | 5,171 | 885 | 6,056 | |
| Age (y) | 52.2 ± 13.5 (21–53–87) | 55.2 ± 12.5 (26–55–86) | 52.7 ± 13.4 (21–54–87) | 0.05 |
| No. of Male Patients | 198 [45.3] | 37 [37.8] | 235 [43.9] | 0.17 |
| No. of Patients with Nonneoplastic polyp | 294 [67.3] | 63 [64.3] | 357 [66.7] | 0.6 |
| No. of Patients with Neoplastic polyp | 144 [33.0] | 35 [39.3] | 179 [33.5] | |
| Size of Polyp (mm) | 12.2 ± 7.1 (4.1–10.4–47.2) | 13.4 ± 6.9 (4–12.3–35) | 12.4 ± 7.0 (4–10.6–47.2) | 0.14 |
| Size of Nonneoplastic Polyp (mm) | 9.4 ± 3.6 (4.1–8.9–21.4) | 9.9 ± 2.9 (4–9.9–15.4) | 9.4 ± 3.5 (4–9.1–21.4) | 0.27 |
| Size of Neoplastic Polyp (mm) | 18.1 ± 8.6 (4.3–16.0–47.2) | 19.8 ± 7.5 (4.6–17.7–35) | 18.4 ± 8.4 (4.3–16.4–47.2) | 0.29 |
| **Development Set** | Nonneoplastic | Neoplastic | p | |
| No. of Patients | 294 | 144 | | |
| Age (y) | 48.5 ± 12.5 | 59.7 ± 12.4 | <0.001 | |
| No. of Male Patients | 124 [42.2] | 74 [51.4] | 0.07 | |
| Size of Polyp (mm) | 9.4 ± 3.6 | 18.1 ± 8.6 | <0.001 | |
| **Test Set** | Nonneoplastic | Neoplastic | p | |
| No. of Patients | 63 | 35 | | |
| Age (y) | 51.1 ± 10.9 | 62.5 ± 11.9 | <0.001 | |
| No. of Male Patients | 17 [27.0] | 20 [57.1] | 0.07 | |
| Size of Polyp (mm) | 9.9 ± 2.9 | 19.8 ± 7.5 | <0.001 | |
Note.- Polyp size was taken as the single largest measurement for each patient. Data in parentheses are minimum, median, and maximum values, respectively. Data in brackets are percentages.
Figure 1. Size distribution histogram of the polyps in the whole dataset. The average size of all polyps was 12.4 mm, and the average sizes of nonneoplastic and neoplastic polyps were 9.4 mm and 18.4 mm, respectively. There was a substantial zone of overlap between nonneoplastic and neoplastic polyps.
Diagnostic Performance of Reviewers and DL-DSS.
| | AUC | Comparison | Sensitivity (%) | Specificity (%) | Accuracy (%) | F1 Score |
|---|---|---|---|---|---|---|
| | | Step 1 vs Step 2 (p value) | | | | |
| Reviewer A | 0.94 [0.88–0.98] | 0.49 | 88.6 (31/35) [73.3–96.8] | 85.7 (54/63) [74.6–93.3] | 86.7 (85/98) [74.1–94.6] | 0.827 [0.737–0.870] |
| Reviewer B | 0.78 [0.68–0.85] | <0.01 | 71.4 (25/35) [53.7–85.4] | 68.3 (43/63) [55.3–79.4] | 69.4 (68/98) [54.7–81.5] | 0.624 [0.509–0.704] |
| Reviewer C | 0.87 [0.79–0.93] | 0.11 | 97.1 (34/35) [85.1–99.9] | 65.1 (41/63) [52.0–76.7] | 76.5 (75/98) [63.8–85.0] | 0.747 [0.687–0.760] |
| DL-DSS | 0.92 [0.85–0.97] | | 74.3 (26/35) [56.7–87.5] | 92.1 (58/63) [82.4–97.4] | 85.7 (84/98) [73.2–93.9] | 0.788 [0.663–0.867] |
| | | Step 1 vs Step 3 (p value) | | | | |
| Reviewer A | 0.95 [0.88–0.98] | 0.65 | 85.7 (30/35) [69.7–95.2] | 93.7 (59/63) [84.5–98.2] | 90.8 (89/98) [79.2–97.1] | 0.869 [0.770–0.921] |
| Reviewer B | 0.91 [0.83–0.96] | <0.01 | 80.0 (28/35) [63.1–91.6] | 93.7 (59/63) [84.5–98.2] | 88.8 (87/98) [76.9–95.8] | 0.836 [0.723–0.902] |
| Reviewer C | 0.91 [0.83–0.96] | 0.17 | 91.4 (32/35) [76.9–98.2] | 71.4 (45/63) [58.7–82.1] | 78.6 (77/98) [65.2–87.9] | 0.753 [0.673–0.787] |
Note.- Data in brackets are 95% CI.
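The sensitivity, specificity, accuracy, and F1 values in this table can be re-derived from the reported fractions. The following is a minimal sketch, assuming the neoplastic polyp is treated as the positive class; the function name `diagnostic_metrics` is illustrative and does not come from the study.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, accuracy (in %) and F1 from confusion counts.

    Assumption: 'positive' means neoplastic polyp.
    """
    sensitivity = 100 * tp / (tp + fn)
    specificity = 100 * tn / (tn + fp)
    accuracy = 100 * (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Reviewer A without DL-DSS assistance: 31/35 neoplastic and 54/63
# nonneoplastic polyps were classified correctly (counts from the table).
reviewer_a = diagnostic_metrics(tp=31, fn=4, tn=54, fp=9)

# DL-DSS alone: 26/35 neoplastic and 58/63 nonneoplastic correct.
dl_dss = diagnostic_metrics(tp=26, fn=9, tn=58, fp=5)

for name, m in [("Reviewer A", reviewer_a), ("DL-DSS", dl_dss)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
```

With these counts the sketch reproduces the tabulated point estimates for Reviewer A (88.6, 85.7, 86.7, 0.827) and the DL-DSS (74.3, 92.1, 85.7, 0.788); the confidence intervals are not computed here.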
Comparison of US findings.
| Findings | | Reviewer A: Nonneo | Reviewer A: Neo | Reviewer A: p | Reviewer B: Nonneo | Reviewer B: Neo | Reviewer B: p | Reviewer C: Nonneo | Reviewer C: Neo | Reviewer C: p |
|---|---|---|---|---|---|---|---|---|---|---|
| Multiplicity | Single | 30.2% (19/63) | 71.4% (25/35) | <0.001 | 38.1% (24/63) | 74.3% (26/35) | <0.001 | 36.5% (23/63) | 80.0% (28/35) | <0.001 |
| | Multiple | 69.8% (44/63) | 28.6% (10/35) | | 63.9% (39/63) | 25.7% (9/35) | | 63.5% (40/63) | 20.0% (7/35) | |
| Size | | 9.4 ± 2.7 | 19.9 ± 7.5 | <0.001 | 10.2 ± 2.9 | 19.5 ± 7.6 | <0.001 | 9.6 ± 3.0 | 19.2 ± 7.9 | <0.001 |
| Shape | Pedunculated | 95.2% (60/63) | 57.1% (20/35) | <0.001 | 73.0% (46/63) | 37.1% (13/35) | <0.01 | 81.0% (51/63) | 40.0% (14/35) | <0.001 |
| | Sessile | 4.8% (3/63) | 42.9% (15/35) | | 27.0% (17/63) | 62.9% (22/35) | | 19.0% (12/63) | 60.0% (21/35) | |
| Contour | Smooth | 60.3% (38/63) | 22.9% (8/35) | <0.001 | 19.0% (12/63) | 8.6% (3/35) | 0.24 | 60.3% (38/63) | 20.0% (7/35) | <0.001 |
| | Lobulated | 39.7% (25/63) | 77.1% (27/35) | | 81.0% (51/63) | 91.4% (32/35) | | 39.7% (25/63) | 80.0% (28/35) | |
| Gallstone | Absent | 88.9% (56/63) | 85.7% (30/35) | 0.75 | 46.7% (21/45) | 28.1% (9/32) | 0.15 | 85.7% (54/63) | 91.4% (32/35) | 0.53 |
| | Present | 11.1% (7/63) | 14.3% (5/35) | | 53.3% (24/45) | 71.9% (23/32) | | 14.3% (9/63) | 8.6% (3/35) | |
| Vascular Core | Absent | 47.8% (22/46) | 21.9% (7/32) | 0.03 | 88.9% (56/63) | 94.3% (33/35) | 0.49 | 54.3% (25/46) | 21.9% (7/32) | <0.01 |
| | Present | 52.2% (24/46) | 78.1% (25/32) | | 11.1% (7/63) | 5.7% (2/35) | | 45.7% (21/46) | 78.1% (25/32) | |
| Internal Echo Level | Hypoechoic | 41.3% (26/63) | 62.9% (22/35) | 0.03 | 64.5% (40/62) | 51.4% (18/35) | 0.30 | 34.9% (22/63) | 57.1% (20/35) | 0.05 |
| | Isoechoic | 58.7% (37/63) | 34.3% (12/35) | | 33.9% (21/62) | 42.9% (15/35) | | 61.9% (39/63) | 40.0% (14/35) | |
| | Hyperechoic | 0% (0/63) | 2.9% (1/35) | | 1.6% (1/62) | 5.7% (2/35) | | 3.2% (2/63) | 2.9% (1/35) | |
| Internal Echo Pattern | Homogenous | 66.7% (42/63) | 68.6% (24/35) | 1.00 | 71.4% (45/63) | 42.9% (15/35) | <0.01 | 58.7% (37/63) | 11.4% (4/35) | <0.001 |
| | Heterogenous | 33.3% (21/63) | 31.4% (11/35) | | 28.6% (18/63) | 57.1% (20/35) | | 41.3% (26/63) | 88.6% (31/35) | |
| Presence of foci | Absent | 30.2% (19/63) | 48.6% (17/35) | 0.04 | 90.5% (57/63) | 97.1% (34/35) | 0.42 | 74.6% (47/63) | 45.7% (16/35) | <0.001 |
| | Hypoechoic foci | 1.6% (1/63) | 11.4% (4/35) | | 0% (0/63) | 0% (0/35) | | 11.1% (7/63) | 45.7% (16/35) | |
| | Hyperechoic foci | 68.3% (43/63) | 40.0% (14/35) | | 9.5% (6/63) | 2.9% (1/35) | | 14.3% (9/63) | 8.6% (3/35) | |
Note.- Nonneo: nonneoplastic polyp, Neo: neoplastic polyp.
Figure 2. Example cases showing the effectiveness of DL-DSS-aided diagnosis. (a) Three patients with a nonneoplastic polyp measuring over 10 mm. The majority of the reviewers regarded these polyps as neoplastic, with a confidence score of 3 or more. However, the patient-level probability values ranged from 0.1 to 0.3, suggesting that a nonneoplastic polyp was more likely. On re-evaluation, some of the reviewers downgraded their scores. (b) Three patients with a neoplastic polyp measuring 13 to 18 mm. Some of the reviewers classified these polyps as nonneoplastic, with a confidence score of 3 or less. In contrast, the patient-level probability values ranged from 0.7 to 0.9, favoring a neoplastic polyp. On re-evaluation, some of the reviewers upgraded their scores.
Diagnostic Performance of Reviewers and DL-DSS for GB polyps larger than 10 mm.
| | AUC | Comparison | Sensitivity (%) | Specificity (%) | Accuracy (%) | F1 Score |
|---|---|---|---|---|---|---|
| | | Step 1 vs Step 2 (p value) | | | | |
| Reviewer A | 0.92 [0.82–0.97] | 0.96 | 90.9 (30/33) [75.7–98.1] | 77.4 (24/31) [58.9–90.4] | 89.1 (57/64) [67.6–94.4] | 0.857 [0.769–0.895] |
| Reviewer B | 0.68 [0.55–0.79] | <0.001 | 69.7 (23/33) [51.3–84.4] | 54.8 (17/31) [36.0–72.7] | 62.5 (40/64) [43.9–78.7] | 0.657 [0.530–0.744] |
| Reviewer C | 0.82 [0.70–0.90] | 0.04 | 100 (33/33) [89.4–100.0] | 41.9 (13/31) [24.5–60.9] | 71.9 (46/64) [58.0–81.1] | 0.786 [0.733–0.814] |
| DL-DSS | 0.92 [0.82–0.97] | | 78.8 (26/33) [61.1–91.0] | 87.1 (27/31) [70.2–96.4] | 82.8 (53/64) [65.5–93.6] | 0.825 [0.705–0.896] |
| | | Step 1 vs Step 3 (p value) | | | | |
| Reviewer A | 0.94 [0.84–0.98] | 0.16 | 87.9 (29/33) [71.8–96.6] | 93.6 (29/31) [78.6–99.2] | 90.7 (58/64) [75.1–97.9] | 0.906 [0.807–0.953] |
| Reviewer B | 0.89 [0.79–0.96] | <0.001 | 84.9 (28/33) [68.1–94.9] | 87.1 (27/31) [70.2–96.4] | 86.0 (55/64) [69.1–95.6] | 0.862 [0.756–0.917] |
| Reviewer C | 0.89 [0.79–0.96] | 0.08 | 97.0 (32/33) [84.2–99.9] | 54.8 (17/31) [36.0–72.7] | 76.6 (49/64) [60.9–86.7] | 0.810 [0.743–0.825] |
Note.- Data in brackets are 95% CI.
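The AUC values in these tables summarize how well graded confidence scores separate neoplastic from nonneoplastic polyps. As a rough illustration, the sketch below computes an AUC from ordinal grades using the standard equivalence with the Mann-Whitney statistic, where ties count as half. The source does not state which software or estimator was used, so this is an assumption, and the grades shown are hypothetical rather than study data.

```python
def auc_from_grades(neoplastic_grades, nonneoplastic_grades):
    """AUC as the probability that a random neoplastic case receives a higher
    grade than a random nonneoplastic case (ties count as 0.5)."""
    concordant = 0.0
    for pos in neoplastic_grades:
        for neg in nonneoplastic_grades:
            if pos > neg:
                concordant += 1.0
            elif pos == neg:
                concordant += 0.5
    return concordant / (len(neoplastic_grades) * len(nonneoplastic_grades))


# Hypothetical 5-point confidence grades (1 = definitely nonneoplastic,
# 5 = definitely neoplastic); NOT data from the study.
neo_grades = [5, 4, 4, 3]
nonneo_grades = [1, 2, 3, 2, 4]

auc = auc_from_grades(neo_grades, nonneo_grades)
print(f"AUC = {auc:.3f}")
```

Because a 5-point scale produces many tied grades, the half-credit for ties matters; dropping it would bias the estimate downward.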
Figure 3. Flow diagram for patient selection and dataset division. From our institution's medical records, we identified 923 patients who had a GB polyp examined on US and subsequently underwent cholecystectomy. After the exclusion step, a total of 535 patients remained. We divided the patients into two temporally independent groups according to the study date of their latest US exam.
Figure 4. Schematic diagram of the DL-DSS. (a) We used a transfer learning method based on the GoogLeNet Inception v3 CNN architecture. All cropped images were resized to 299 × 299 pixels and then processed with a pretrained Inception v3 model. Nonimage inputs were concatenated to the last fully connected layer of the network using a late fusion strategy. (b) Because multiple images were included for each patient, the average of the image-level probability values was used as the patient-level probability value. This is a continuous value between 0 and 1, where 0 represents a definite nonneoplastic polyp and 1 a definite neoplastic polyp.
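The patient-level aggregation described in panel (b) can be sketched as follows. This is a minimal illustration, assuming the image-level probabilities are already available as `(patient_id, probability)` pairs; the function name, the patient identifiers, and the probability values are hypothetical.

```python
from collections import defaultdict


def patient_level_probabilities(image_predictions):
    """Average image-level neoplastic probabilities per patient.

    image_predictions: iterable of (patient_id, probability) pairs,
    one pair per US image.
    """
    grouped = defaultdict(list)
    for patient_id, probability in image_predictions:
        grouped[patient_id].append(probability)
    return {pid: sum(probs) / len(probs) for pid, probs in grouped.items()}


# Hypothetical image-level outputs for two patients (not study data).
image_predictions = [
    ("P001", 0.2), ("P001", 0.1), ("P001", 0.3),
    ("P002", 0.9), ("P002", 0.7),
]

patient_probs = patient_level_probabilities(image_predictions)
for pid, prob in patient_probs.items():
    print(pid, round(prob, 2))
```

A plain mean gives every image equal weight, so one atypical frame shifts the patient-level value by only a fraction of its deviation.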
Figure 5. Training curves of the DL-DSS. The orange and blue curves represent the (a) accuracy and (b) loss on the development and test datasets, respectively. We used an early stopping strategy to prevent overfitting, and training ran for up to 400 epochs, by which point the validation loss had reached a plateau. The final accuracy was 96.69% in the development set and 88.58% in the test set.
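Early stopping of the kind mentioned in this caption is commonly implemented with a patience counter on the validation loss. The sketch below shows that general idea only: the caption does not give the stopping criterion, so the `patience` and `min_delta` parameters and the loss values are assumptions for illustration.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs. If that never happens,
    stop_epoch is the last epoch available.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses), best_epoch


# Hypothetical validation losses that plateau after epoch 4 (not study data).
val_losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.58]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"stopped at epoch {stop_epoch}, best epoch {best_epoch}")
```

In practice the weights from `best_epoch` are usually restored, so the final model reflects the lowest validation loss and not the last epoch trained.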