Yangyang Zhu, Zheling Meng, Xiao Fan, Yin Duan, Yingying Jia, Tiantian Dong, Yanfang Wang, Juan Song, Jie Tian, Kun Wang, Fang Nie.
Abstract
BACKGROUND: Accurate diagnosis of unexplained cervical lymphadenopathy (CLA) from medical images relies heavily on the experience of radiologists. The problem is more severe for CLA patients in underdeveloped countries and regions, where expertise and reliable medical histories are often lacking. This study aimed to develop a deep learning (DL) radiomics model based on B-mode and color Doppler ultrasound images to help radiologists improve their diagnosis of the etiology of unexplained CLA.
Keywords: Cervical lymphadenopathy; Deep learning; Lymphoma; Metastatic carcinoma; Reactive hyperplasia; Tuberculous lymphadenitis; Ultrasound
Year: 2022 PMID: 36008835 PMCID: PMC9410737 DOI: 10.1186/s12916-022-02469-z
Source DB: PubMed Journal: BMC Med ISSN: 1741-7015 Impact factor: 11.150
Fig. 1 Patient selection flowchart. CLA, cervical lymphadenopathy; US, ultrasound
Fig. 2 Proposed deep learning-based hierarchical diagnostic model (CLA-HDM) for non-invasive assessment of unexplained CLA. a Each sub-model takes BUS and CDFI images as inputs; an attention mechanism assigns weights across the color channels in the CDFI branch and, guided by the BUS branch, focuses on specific CDFI features. b For each test case, the model takes dual-modal ultrasound images as inputs and outputs predictive probabilities for the hierarchical diagnostic tasks, together with the corresponding heatmaps, for comparison with and assistance to radiologists. CLA, cervical lymphadenopathy; BUS, B-mode ultrasound; CDFI, color Doppler flow imaging; AI, artificial intelligence
Performance of sub-models and CLA-HDM in the diagnosis of unexplained CLA
| Models | Cohorts | AUC | ACC (%) | SENS (%) | SPEC (%) |
|---|---|---|---|---|---|
| Sub-model 1 | Training cohort ( | 0.986 (0.977, 0.998) | 96.8 (95.2, 98.4) | 97.9 (96.7, 99.7) | 94.4 (91.1, 98.0) |
| | Internal testing cohort ( | 0.932 (0.901, 0.966) | 86.0 (81.9, 90.1) | 89.5 (85.2, 94.5) | 78.9 (70.2, 88.3) |
| | External testing cohort 1 ( | 0.963 (0.939, 0.993) | 87.6 (82.9, 93.3) | 83.3 (76.7, 90.9) | 96.9 (93.9, 103.0) |
| | External testing cohort 2 ( | 0.896 (0.846, 0.963) | 82.6 (76.1, 90.2) | 81.8 (73.4, 90.9) | 83.8 (74.0, 95.2) |
| Sub-model 2 | Training cohort ( | 0.935 (0.902, 0.976) | 86.3 (81.5, 91.9) | 84.6 (78.7, 91.1) | 90.9 (81.8, 100.0) |
| | Internal testing cohort ( | 0.922 (0.866, 0.986) | 84.2 (77.2, 91.2) | 85.7 (76.8, 94.8) | 80.0 (65.6, 97.5) |
| | External testing cohort 1 ( | 0.857 (0.758, 0.981) | 75.8 (63.6, 87.9) | 76.2 (61.9, 93.6) | 75.0 (56.7, 96.2) |
| | External testing cohort 2 ( | 0.872 (0.771, 0.986) | 78.4 (67.6, 89.2) | 71.4 (54.9, 87.9) | 87.5 (75.0, 102.8) |
| Sub-model 3 | Training cohort ( | 0.979 (0.960, 1.012) | 93.2 (90.8, 96.0) | 92.6 (89.9, 95.6) | 96.9 (93.8, 102.6) |
| | Internal testing cohort ( | 0.852 (0.759, 0.968) | 86.0 (80.7, 91.2) | 87.9 (82.8, 93.4) | 73.3 (55.0, 93.3) |
| | External testing cohort 1 ( | 0.847 (0.742, 0.969) | 86.1 (79.2, 93.1) | 88.7 (82.2, 95.8) | 70.0 (40.0, 97.1) |
| | External testing cohort 2 ( | 0.827 (0.715, 0.964) | 83.6 (76.4, 92.7) | 87.2 (78.9, 95.6) | 62.5 (35.0, 91.7) |
| CLA-HDM | Training cohort ( | 0.964 (0.951, 0.978) | 94.1 (92.6, 95.6) | 88.2 (85.3, 91.2) | 96.1 (95.1, 97.1) |
| | Internal testing cohort ( | 0.873 (0.838, 0.908) | 87.1 (84.2, 90.1) | 74.3 (68.4, 80.1) | 91.4 (89.5, 93.4) |
| | External testing cohort 1 ( | 0.837 (0.789, 0.889) | 82.9 (78.6, 86.7) | 65.7 (57.1, 73.3) | 88.6 (85.7, 91.1) |
| | External testing cohort 2 ( | 0.840 (0.789, 0.898) | 85.9 (82.1, 89.7) | 71.7 (64.1, 79.3) | 90.6 (88.0, 93.1) |
The data in brackets represent the 95% confidence intervals
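The ACC, SENS, and SPEC columns can be derived from a binary confusion matrix. The following is a minimal sketch rather than the authors' code: the function names, the example counts, and the choice of a normal-approximation (Wald) interval are all assumptions, since the table note states only that the brackets hold 95% confidence intervals.

```python
import math


def wald_ci(p, n, z=1.96):
    """Normal-approximation 95% CI for a proportion p observed on n cases.

    The interval method is an assumption; the source does not name one.
    """
    half = z * math.sqrt(p * (1.0 - p) / n)
    return p - half, p + half


def binary_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity, each with a Wald interval."""
    pos, neg = tp + fn, tn + fp
    total = pos + neg
    acc = (tp + tn) / total
    sens = tp / pos
    spec = tn / neg
    return {
        "ACC": (acc, wald_ci(acc, total)),
        "SENS": (sens, wald_ci(sens, pos)),
        "SPEC": (spec, wald_ci(spec, neg)),
    }


# Hypothetical counts for illustration only, not taken from the study.
metrics = binary_metrics(tp=90, fn=10, tn=80, fp=20)
for name, (value, (lo, hi)) in metrics.items():
    print(f"{name}: {100 * value:.1f} ({100 * lo:.1f}, {100 * hi:.1f})")
```

A Wald interval is not clipped to the 0 to 100% range, so other interval methods (for example a bootstrap) may be preferable for small cohorts.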
Fig. 3 Diagnostic performance of the three task-specific sub-models and their assembled model (CLA-HDM) in the training cohort, the internal testing cohort, and external testing cohorts 1 and 2
Fig. 4 Examples of heatmaps generated by CLA-HDM for each etiology of unexplained CLA. When the BUS and CDFI images of a case (first row) are input into CLA-HDM, it first produces first-level diagnostic heatmaps that distinguish benign from malignant CLA (second row) and then second-level diagnostic heatmaps that identify the specific etiology of the benign or malignant CLA (third row). In general, the heatmaps reveal a characteristic pattern for each pathology category. CLA, cervical lymphadenopathy; BUS, B-mode ultrasound; CDFI, color Doppler flow imaging
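The two-level flow described in this caption can be sketched as a product of probabilities. This is an illustrative assumption, not the published CLA-HDM implementation: the function name, the input values, the pairing of etiologies under the benign and malignant branches (taken from the keyword list), and the multiplicative combination rule are all hypothetical.

```python
def hierarchical_probabilities(p_malignant, p_tb_given_benign, p_lymphoma_given_malignant):
    """Combine one first-level and two second-level probabilities.

    p_malignant: assumed output of the benign-vs-malignant stage.
    p_tb_given_benign: assumed output of the benign second-level stage.
    p_lymphoma_given_malignant: assumed output of the malignant second-level stage.
    """
    p_benign = 1.0 - p_malignant
    return {
        "reactive hyperplasia": p_benign * (1.0 - p_tb_given_benign),
        "tuberculous lymphadenitis": p_benign * p_tb_given_benign,
        "lymphoma": p_malignant * p_lymphoma_given_malignant,
        "metastatic carcinoma": p_malignant * (1.0 - p_lymphoma_given_malignant),
    }


# Hypothetical stage outputs for a single case.
probs = hierarchical_probabilities(0.8, 0.3, 0.25)
predicted = max(probs, key=probs.get)
print(predicted, {name: round(p, 3) for name, p in probs.items()})
```

Under this rule the four probabilities always sum to one, so the final label is simply the etiology with the largest product.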
Fig. 5 Comparison between CLA-HDM and radiologists and between radiologists without and with AI assistance to identify four common etiologies for unexplained CLA. Radiologists 1 and 2 represent senior-level experience, radiologists 3 and 4 represent middle-level experience, and radiologists 5 and 6 represent junior-level experience. ROC, receiver operating characteristic curve; AI, artificial intelligence; CLA, cervical lymphadenopathy
Comparison of diagnostic performance between CLA-HDM and six radiologists, and between radiologists with and without AI assistance
| Radiologists | Metric | Internal testing cohort ( | | External testing cohort 1 ( | | External testing cohort 2 ( | |
|---|---|---|---|---|---|---|---|
| | | Without AI (%) | With AI (%) | Without AI (%) | With AI (%) | Without AI (%) | With AI (%) |
| 1 | Accuracy | 84.2 (81.3, 87.4) | 86.8 (83.9, 89.8)↑# | 82.9 (79.1, 86.7) | 83.8 (80.0, 87.6)↑ | 83.2 (78.8, 87.0) | 82.6 (78.3, 86.4) |
| | Sensitivity | 68.4 (62.6, 74.9) | 73.7 (67.8, 79.5)↑ | 65.7 (58.1, 73.3) | 67.6 (60.0, 75.2)↑ | 66.3 (57.6, 73.9) | 65.2 (56.5, 72.8) |
| | Specificity | 89.5 (87.5, 91.6) | 91.2 (89.3, 93.2)↑ | 88.6 (86.0, 91.1) | 89.2 (86.7, 91.8)↑ | 88.8 (85.9, 91.3) | 88.4 (85.5, 90.9) |
| 2 | Accuracy | 82.2 (79.5, 85.1) ** | 85.7 (83.0, 88.3)↑## | 80.5 (76.7, 84.8) | 84.3 (80.5, 88.1)↑# | 79.9 (76.1, 84.2) | 82.6 (78.8, 87.0)↑ |
| | Sensitivity | 64.3 (59.1, 70.2) * | 71.4 (66.1, 76.6)↑## | 60.9 (53.3, 69.5) | 68.6 (61.0, 76.2)↑ | 59.8 (52.2, 68.5) | 65.2 (57.6, 73.9)↑ |
| | Specificity | 88.1 (86.4, 90.1) | 90.5 (88.7, 92.2)↑ | 86.9 (84.4, 89.8) | 89.5 (87.0, 92.1)↑ | 86.6 (84.1, 89.5) | 88.4 (85.9, 91.3)↑ |
| 3 | Accuracy | 81.0 (78.1, 83.9) ** | 84.8 (82.2, 87.7)↑## | 80.5 (76.2, 84.8) | 80.5 (76.7, 84.3)↑ | 79.4 (75.0, 83.7) | 81.0 (76.6, 85.3)↑ |
| | Sensitivity | 62.0 (56.1, 67.8) * | 69.6 (64.3, 75.4)↑# | 60.9 (52.4, 69.5) | 61.0 (53.3, 68.6)↑ | 58.7 (50.0, 67.4) | 62.0 (53.3, 70.7)↑ |
| | Specificity | 87.3 (85.4, 89.3) * | 89.9 (88.1, 91.8)↑ | 87.0 (84.1, 89.8) | 87.0 (84.4, 89.5)↑ | 86.2 (83.3, 89.1) | 87.3 (84.4, 90.2)↑ |
| 4 | Accuracy | 81.0 (78.1, 84.2) ** | 86.3 (83.6, 89.2)↑### | 75.2 (71.4, 79.5) ** | 81.0 (77.1, 84.8)↑## | 77.7 (73.9, 82.2) | 82.1 (78.3, 86.4)↑# |
| | Sensitivity | 62.0 (56.1, 68.4) * | 72.5 (67.3, 78.4)↑### | 50.5 (42.9, 59.1) * | 61.9 (54.3, 69.5)↑# | 55.4 (47.8, 64.1) * | 64.1 (56.5, 72.8)↑ |
| | Specificity | 87.3 (85.4, 89.5) * | 90.8 (89.1, 92.8)↑ | 83.5 (81.0, 86.4) | 87.3 (84.8, 89.8)↑ | 85.1 (82.6, 88.0) | 88.0 (85.5, 90.9)↑ |
| 5 | Accuracy | 78.7 (75.4, 81.9) *** | 79.8 (76.9, 83.0)↑ | 77.1 (72.9, 81.0) * | 82.4 (78.6, 86.7)↑# | 76.6 (72.3, 81.0) | 79.4 (75.5, 84.2)↑ |
| | Sensitivity | 57.3 (50.9, 63.7) ** | 59.7 (53.8, 66.1)↑ | 54.3 (45.7, 61.9) | 64.8 (57.1, 73.3)↑# | 53.3 (44.6, 62.0) * | 58.7 (51.1, 68.5)↑ |
| | Specificity | 85.7 (83.6, 87.9) ** | 86.6 (84.6, 88.7)↑ | 84.8 (81.9, 87.3) | 88.3 (85.7, 91.1)↑ | 84.4 (81.5, 87.3) * | 86.2 (83.7, 89.5)↑ |
| 6 | Accuracy | 76.9 (73.7, 80.1) *** | 82.2 (78.9, 85.1)↑## | 77.1 (73.3, 81.0) * | 81.9 (78.1, 86.2)↑# | 74.5 (70.1, 78.8) * | 78.3 (73.9, 82.6)↑ |
| | Sensitivity | 53.8 (47.4, 60.2) *** | 64.3 (57.9, 70.2)↑## | 54.3 (46.7, 61.9) | 63.8 (56.2, 72.4)↑ | 48.9 (40.2, 57.6) ** | 56.5 (47.8, 65.2)↑ |
| | Specificity | 84.6 (82.5, 86.7) ** | 88.1 (85.9, 90.1)↑ | 84.8 (82.2, 87.3) | 87.9 (85.4, 90.8)↑ | 83.0 (80.1, 85.9) ** | 85.5 (82.6, 88.4)↑ |
The data in brackets represent the 95% confidence intervals. * indicates a statistically significant difference between CLA-HDM and radiologist without AI assistance (*P < 0.05, **P < 0.01, and ***P < 0.001); # indicates a statistically significant difference between radiologist without and with CLA-HDM assistance (#P < 0.05, ##P < 0.01, and ###P < 0.001). The upward arrow (↑) represents indicators that improved owing to AI assistance
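The note above reports significance levels without naming the statistical test. When the same cases are read twice (for example without and with AI assistance), one common choice for comparing paired accuracy is McNemar's test. The sketch below implements its exact two-sided form; treating it as the test used in the study is an assumption, and the discordant counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value.

    b: cases correct only in the first reading (e.g. without AI).
    c: cases correct only in the second reading (e.g. with AI).
    Concordant cases do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # One-sided binomial tail under H0: each discordant pair is a fair coin flip.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical discordant counts, not taken from the study.
p_value = mcnemar_exact(b=1, c=9)
print(f"p = {p_value:.4f}")
```

The exact form avoids the chi-square approximation, which can be unreliable when the number of discordant pairs is small.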
Comparison of diagnostic performance between the groups of radiologists at different levels
| Radiologist group by experience level | Metric | Internal testing cohort ( | | | External testing cohort 1 ( | | | External testing cohort 2 ( | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | | Without → with AI (%) | P1 | P2 | Without → with AI (%) | P1 | P2 | Without → with AI (%) | P1 | P2 |
| Senior | Accuracy | 83.2 → 86.3 ↑ | 0.033 | / | 81.7 → 84.1 ↑ | 0.132 | / | 81.5 → 82.6 ↑ | 0.291 | / |
| | Sensitivity | 66.4 → 72.5 ↑ | 0.032 | / | 63.3 → 68.1 ↑ | 0.132 | / | 63.0 → 65.2 ↑ | 0.314 | / |
| | Specificity | 88.8 → 90.8 ↑ | 0.040 | / | 87.8 → 89.4 ↑ | 0.116 | / | 87.7 → 88.4 ↑ | 0.292 | / |
| Middle | Accuracy | 81.0 → 85.5 ↑ | 0.007 | 0.534 | 77.9 → 80.7 ↑ | 0.088 | 0.616 | 78.5 → 81.5 ↑ | 0.096 | 0.628 |
| | Sensitivity | 62.0 → 71.1 ↑ | 0.005 | 0.551 | 55.7 → 61.4 ↑ | 0.107 | 0.614 | 57.1 → 63.1 ↑ | 0.087 | 0.647 |
| | Specificity | 87.3 → 90.4 ↑ | 0.004 | 0.537 | 85.2 → 87.1 ↑ | 0.093 | 0.589 | 85.7 → 87.7 ↑ | 0.102 | 0.644 |
| Junior | Accuracy | 77.8 → 81.0 ↑ | 0.033 | 0.420 | 77.1 → 82.1 ↑ | 0.013 | 0.670 | 75.6 → 78.8 ↑ | 0.075 | 0.783 |
| | Sensitivity | 58.8 → 62.0 ↑ | 0.037 | 0.440 | 54.3 → 64.3 ↑ | 0.009 | 0.702 | 51.1 → 57.6 ↑ | 0.091 | 0.790 |
| | Specificity | 85.2 → 87.3 ↑ | 0.034 | 0.430 | 84.8 → 88.1 ↑ | 0.016 | 0.671 | 83.7 → 85.9 ↑ | 0.080 | 0.759 |
P1 values indicate a comparison between the AI model and the different levels of radiologist groups without AI assistance. P2 values indicate a comparison between the junior- and middle-level radiologist groups with AI assistance and the senior-level radiologist group without AI assistance. The upward arrow (↑) represents indicators that improved owing to AI assistance
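The group-level values appear to agree with the simple mean of the two radiologists at each experience level (radiologists 1 and 2 senior, 3 and 4 middle, 5 and 6 junior). The source does not state how the group values were computed, so the averaging below is an assumption. It is checked here only against the internal testing cohort accuracy, using figures copied from the two tables; the variable and function names are illustrative.

```python
# Internal testing cohort accuracy (%) per radiologist: (without AI, with AI).
per_reader = {
    1: (84.2, 86.8), 2: (82.2, 85.7),
    3: (81.0, 84.8), 4: (81.0, 86.3),
    5: (78.7, 79.8), 6: (76.9, 82.2),
}
groups = {"Senior": (1, 2), "Middle": (3, 4), "Junior": (5, 6)}
# Group-level accuracy (%) as reported: (without AI, with AI).
reported = {"Senior": (83.2, 86.3), "Middle": (81.0, 85.5), "Junior": (77.8, 81.0)}


def group_mean(readers):
    """Average the (without, with) accuracy pairs of the given radiologists."""
    without = sum(per_reader[r][0] for r in readers) / len(readers)
    with_ai = sum(per_reader[r][1] for r in readers) / len(readers)
    return without, with_ai


# Largest gap between the computed mean and the reported value, per group.
differences = {
    level: max(abs(m - r) for m, r in zip(group_mean(readers), reported[level]))
    for level, readers in groups.items()
}
for level, diff in differences.items():
    print(f"{level}: max |mean - reported| = {diff:.2f}")
```

Gaps of at most 0.05 percentage points are what rounding to one decimal place would produce.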