Jin Li, Xinxin Zhi, Junxiang Chen, Lei Wang, Mingxing Xu, Wenrui Dai, Jiayuan Sun, Hongkai Xiong.
Abstract
BACKGROUND AND OBJECTIVES: With the rapid improvement of imaging technology, convex probe endobronchial ultrasound (CP-EBUS) sonographic features play an increasingly important role in the diagnosis of intrathoracic lymph nodes (LNs). Conventional qualitative and quantitative methods for EBUS multimodal imaging are time-consuming and rely heavily on the experience of endoscopists. Deep-learning (DL) models show great promise for diagnosis in medical imaging.
Keywords: convex probe endobronchial ultrasound; deep learning; lymph nodes; multimodal imaging
Year: 2021 PMID: 33565422 PMCID: PMC8544010 DOI: 10.4103/EUS-D-20-00207
Source DB: PubMed Journal: Endosc Ultrasound ISSN: 2226-7190 Impact factor: 5.628
The results of region of interest segmentation on three endobronchial ultrasound modes
| Modes | Methods | AUC | Accuracy (%) | DC (%) |
|---|---|---|---|---|
| G | U-Net | 0.9820±0.0042 | 93.35±0.58 | |
| G | Attention U-Net | | | 83.68±1.94 |
| G | R2U-Net | 0.8971±0.0398 | 80.91±1.62 | 44.01±11.83 |
| G | Attention R2U-Net | 0.8776±0.0602 | 82.81±3.16 | 51.33±11.73 |
| F | U-Net | 0.9662±0.0027 | 88.81±0.67 | |
| F | Attention U-Net | | | 88.21±1.84 |
| F | R2U-Net | 0.9136±0.0391 | 80.88±3.50 | 77.49±5.68 |
| F | Attention R2U-Net | 0.9004±0.0475 | 79.99±3.22 | 77.48±4.70 |
| E | U-Net | | | |
| E | Attention U-Net | 0.9679±0.0018 | 89.42±0.24 | 90.13±0.72 |
| E | R2U-Net | 0.9128±0.0290 | 80.35±3.06 | 83.13±4.23 |
| E | Attention R2U-Net | 0.9097±0.0114 | 79.48±1.46 | 83.17±2.34 |
Bold indicates values with the best performance for each statistical indicator in each mode. E: Elastography; G: Gray scale; F: Blood flow Doppler; AUC: Area under the curve; DC: Dice
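The DC column reports the Dice coefficient between the predicted and the reference ROI masks. As a minimal sketch of how that overlap score is usually computed for binary masks, the function below uses the standard definition (twice the overlap divided by the total mask size). The function name, the toy masks, and the handling of two empty masks are assumptions for illustration, not details taken from the study.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks given as nested lists."""
    pred = [int(bool(v)) for row in pred_mask for v in row]
    true = [int(bool(v)) for row in true_mask for v in row]
    if len(pred) != len(true):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as ROI in both masks.
    intersection = sum(p & t for p, t in zip(pred, true))
    total = sum(pred) + sum(true)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical 2 x 3 masks (1 = pixel inside the ROI).
predicted = [[0, 1, 1],
             [0, 1, 0]]
reference = [[0, 1, 0],
             [0, 1, 1]]
dice = dice_coefficient(predicted, reference)
print(f"Dice = {dice:.4f} ({100 * dice:.2f}%)")
```

In this toy case each mask has three ROI pixels and two of them overlap, so the score is 4/6. The table reports the same quantity as a percentage.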
Figure 1. Illustration of the ENet architecture. (a) The overall architecture. The input size is 3 × 224 × 224. The architecture consists of a multiscale convolution module (illustrated in [b]), four FireBlocks (illustrated in [c]), and a fully connected layer. The green arrow is a 2 × 2 max-pooling operation, and the blue block is the feature map of size 512 × 14 × 14. (b) Multiscale convolution. This module consists of four branches. The first convolution in each of the first three branches has a stride of 2 to reduce the feature size, and the fourth branch adds a 2 × 2 pooling operation before the 1 × 1 convolution to keep the same feature size as the other branches. The other convolutions are group-wise convolutions used to expand the receptive field. (c) FireBlock module. This module is borrowed from SqueezeNet, with two modifications: two skip connections are added before the concatenation operation, and a squeeze-and-excitation module is applied at the end. (d) Squeeze-and-excitation module. This module is borrowed from SENet and applies a channel attention mechanism to the extracted features
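The channel attention in panel (d) can be illustrated with a small sketch of the standard SENet squeeze-and-excitation computation: global average pooling per channel, a reduction layer with ReLU, an expansion layer with sigmoid, and channel-wise rescaling. The caption does not give ENet's layer sizes or reduction ratio, so the shapes and weights below are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def squeeze_excitation(features, w1, b1, w2, b2):
    """Apply channel attention to `features`, a list of C channels (each an H x W nested list).

    w1: R x C weights, b1: R biases (reduction layer, followed by ReLU)
    w2: C x R weights, b2: C biases (expansion layer, followed by sigmoid)
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Excitation, step 1: fully connected layer with ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)) + bias)
        for weights, bias in zip(w1, b1)
    ]
    # Excitation, step 2: fully connected layer with sigmoid, giving gates in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(weights, hidden)) + bias)))
        for weights, bias in zip(w2, b2)
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(features, gates)
    ]
    return scaled, gates


# Toy input: 2 channels of size 2 x 2, reduced to 1 hidden unit (all values hypothetical).
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
w1, b1 = [[1.0, 1.0]], [0.0]
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]
scaled, gates = squeeze_excitation(features, w1, b1, w2, b2)
print([round(g, 4) for g in gates])
```

The output keeps the input shape; only the relative weight of each channel changes. In a trained network the two weight matrices are learned together with the rest of the model.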
Figure 2. The architecture of EBUSNet. (a) The overall architecture of EBUSNet. The stride of the initial convolution operations is 2. (b) The architecture of the initial CentralBlock. The FireBlocks in (b) and (c) are identical to those in Figure 1. The weights of the weighted concatenation are learnable and initialized to 0.33. (c) The architecture of the other CentralBlocks. The weights of the weighted concatenation are learnable and initialized to 0.33. The weights of the weighted summation are learnable and initialized to 0.5
Figure 3. The overall framework of this study. (a) The collected CP-EBUS images are first preprocessed to remove redundant information. (b) Various deep-learning models that automatically detect the LN area are applied to the CP-EBUS images. (c) The impact of ROI detection on diagnostic performance is assessed. (d) A multimodal framework named EBUSNet is designed, and multimodal and unimodal performance are compared. G: gray scale; F: blood flow Doppler; E: elastography; ROI: region of interest; CP-EBUS: convex probe endobronchial ultrasound
Characteristics of patients and lymph nodes included in the study
| Characteristic | Cases (%) |
|---|---|
| Number of patients | 267 |
| Sex | |
| Female | 99 (37.08) |
| Male | 168 (62.92) |
| Location | |
| 2R | 2 (0.68) |
| 4L | 20 (6.80) |
| 4R | 99 (33.67) |
| 7 | 100 (34.01) |
| 10L | 6 (2.04) |
| 10R | 6 (2.04) |
| 11L | 33 (11.22) |
| 11Ri | 14 (4.76) |
| 11Rs | 14 (4.76) |
| Diagnosis (malignant) | 169 (57.5) |
| Adenocarcinoma | 68 (23.13) |
| Squamous carcinoma | 31 (10.5) |
| Adenosquamous carcinoma | 1 (0.3) |
| NSCLC-NOS | 7 (2.4) |
| Small cell carcinoma | 41 (13.9) |
| Large cell neuroendocrine carcinoma | 1 (0.3) |
| NET-NOS | 6 (2.0) |
| Unknown type of lung cancer | 8 (2.7) |
| Metastatic tumors (nonlung primary malignancy) | 4 (1.4) |
| Diagnosis (benign) | 125 (42.5) |
| Inflammation | 81 (27.6) |
| Sarcoidosis | 30 (10.2) |
| Tuberculosis | 13 (4.4) |
| Nontuberculous mycobacterium infection | 1 (0.3) |
NSCLC-NOS: Nonsmall cell lung cancer not otherwise specified; NET-NOS: Neuroendocrine tumor not otherwise specified
Impact of region of interest detection on diagnostic efficiency of convex probe endobronchial ultrasound multimodal images
| Modes | Fold 1 AUC | Fold 1 P | Fold 2 AUC | Fold 2 P | Fold 3 AUC | Fold 3 P | Fold 4 AUC | Fold 4 P | Fold 5 AUC | Fold 5 P | Average AUC (95% CI) | Average P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| G w/o ROI | 0.6725 | 0.7843 | 0.6850 | 0.0847 | 0.5661 | 0.1206 | 0.6705 | 0.3635 | 0.6011 | 0.0938 | 0.6390 (0.5740-0.7041) | 0.91984 |
| G w/ROI | 0.6814 | | 0.6181 | | 0.6297 | | 0.6212 | | 0.6586 | | 0.6418 (0.6079-0.6757) | |
| F w/o ROI | 0.6363 | 0.3089 | 0.6541 | 0.2958 | 0.6574 | 0.9650 | 0.6866 | 0.6029 | 0.5201 | | 0.6309 (0.5508-0.7110) | 0.62530 |
| F w/ROI | 0.5841 | | 0.7053 | | 0.6552 | | 0.6603 | | 0.6386 | | 0.6487 (0.5944-0.7030) | |
| E w/o ROI | 0.9548 | 0.0625 | 0.9488 | 0.4774 | 0.9456 | 0.2956 | 0.9579 | 0.3211 | 0.9447 | | 0.9503 (0.9432-0.9575) | 0.13900 |
| E w/ROI | 0.9806 | | 0.9348 | | 0.9626 | | 0.9727 | | 0.9792 | | 0.9660 (0.9426-0.9894) | |
Bold indicates P<0.05. For each mode, the first row (w/o ROI) shows the performance of the DL model trained and tested with cropped images, and the second row (w/ROI) shows the model trained and tested with ROI. For each fold, AUCs are calculated for each mode with and without ROI, and the P value is obtained using the DeLong test between the two ROC curves. For the average AUCs over the five folds, the P value in the last column is obtained with a paired-samples t-test. G: Gray scale; F: Blood flow Doppler; E: Elastography; DL: Deep learning; ROI: Region of interest; w/o ROI: Without ROI; w/ROI: With ROI; AUC: Area under the curve; CI: Confidence interval
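The paired-samples t-test on the fold-wise AUCs can be illustrated as follows. This sketch computes only the t statistic and its degrees of freedom using the Python standard library, with the gray-scale AUCs copied from the table above. The function name is an assumption, and the published P values may have been computed from unrounded AUCs.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for a paired-samples t-test on two equal-length samples."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two samples of equal length with at least two pairs")
    differences = [b - a for a, b in zip(first, second)]
    mean_diff = statistics.mean(differences)
    # Standard error of the mean difference, using the sample standard deviation (n - 1).
    std_err = statistics.stdev(differences) / math.sqrt(len(differences))
    return mean_diff / std_err, len(differences) - 1


# Per-fold AUCs for the gray-scale mode, copied from the table above.
auc_without_roi = [0.6725, 0.6850, 0.5661, 0.6705, 0.6011]
auc_with_roi = [0.6814, 0.6181, 0.6297, 0.6212, 0.6586]

t_stat, dof = paired_t_statistic(auc_without_roi, auc_with_roi)
print(f"mean AUC without ROI = {statistics.mean(auc_without_roi):.4f}")
print(f"mean AUC with ROI    = {statistics.mean(auc_with_roi):.4f}")
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

Converting t to a P value requires the Student t distribution, which the standard library does not provide. If SciPy is available, `scipy.stats.ttest_rel` returns both the statistic and the P value.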
Comparison of deep-learning models and human group on endobronchial ultrasound images
| Modes | AUC (95% CI) | F1-score (95% CI) | Accuracy, % (95% CI) | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV, % (95% CI) | NPV, % (95% CI) |
|---|---|---|---|---|---|---|---|
| AI group | | | | | | | |
| G | 0.6390 (0.5980-0.6801) | 0.6873 (0.6558-0.7189) | 62.04 (58.81-65.27) | 71.17 (63.09-79.26) | 48.80 (35.39-62.21) | 67.60 (63.05-72.16) | 53.89 (48.54-59.23) |
| F | 0.6309 (0.5803-0.6815) | 0.7171 (0.6882-0.7460) | 62.37 (57.50-67.24) | 80.41 (74.79-86.04) | 36.20 (21.15-51.25) | 65.23 (60.46-69.99) | 53.75 (44.99-62.50) |
| E | | | | | | | |
| Expert group | | | | | | | |
| G | 0.6559 (0.6335-0.6783) | 75.79 (74.53-77.04) | 66.8 (65.06-68.54) | 87.82 (84.61-91.02) | 36.33 (30.39-42.27) | 66.71 (65.13-68.30) | 67.55 (63.75-71.34) |
| F | 0.734 (0.7156-0.7523) | 79.03 (76.82-81.24) | 71.02 (67.57-74.47) | | 40.33 (31.22-49.45) | 69.25 (66.34-72.17) | 78.42 (71.94-84.90) |
| E | | | | 91.49 (88.92-94.07) | | | |
| Trainee group | | | | | | | |
| G | 0.4946 (0.4458-0.5434) | 0.5512 (0.4780-0.6244) | 50.34 (45.63-55.05) | 52.41 (41.54-63.28) | 47.33 (35.85-58.81) | 58.98 (54.64-63.33) | 40.58 (36.07-45.10) |
| F | 0.4443 (0.3481-0.5404) | 0.5116 (0.3893-0.6339) | 46.26 (37.12-55.39) | 49.19 (33.55-64.84) | 42.00 (32.22-51.78) | 54.24 (46.49-61.99) | 36.96 (27.99-45.94) |
| E | | | | | | | |
Bold indicates values with the best performance for each statistical indicator in each group. G: Gray scale; F: Blood flow Doppler; E: Elastography; AUC: Area under the curve; PPV: Positive predictive value; NPV: Negative predictive value; AI: Artificial intelligence
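Apart from AUC, every indicator in this table follows from the four confusion-matrix counts. The sketch below uses the standard definitions and assumes that malignant is the positive class. The counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute threshold-based indicators from confusion-matrix counts.

    Assumption: the positive class is "malignant". Zero denominators are not
    handled, so every class must appear at least once in the counts.
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value (precision)
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "f1": f1,
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=80, fp=20, tn=30, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

AUC is not included because it depends on the full ranking of predicted scores rather than on a single decision threshold.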
Figure 4. Comparison of diagnostic performance between DL models and human groups. (a) The average ROC curves of the AI, experts, trainees, and the whole human group on elastography and multimodal images. (b) The univariate AUC of features extracted by ENet on elastography in the last layer. (c) Statistics of the AUCs in (b). DL: deep learning; ROC: receiver operating characteristic; AUC: area under the curve; AI: artificial intelligence
Comparison of ENet and mainstream architectures on elastography
| Architecture | AUC (95% CI) | F1-score (95% CI) | Accuracy, % (95% CI) | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV, % (95% CI) | NPV, % (95% CI) |
|---|---|---|---|---|---|---|---|
| Squeezenet | 0.9313 (0.9209-0.9418) | 0.8561 (0.8345-0.8777) | 82.45 (78.56-86.34) | 87.03 (81.54-92.53) | 75.80 (59.60-92.00) | 85.62 (77.28-93.95) | 81.13 (77.73-84.54) |
| Mobilenet | 0.9331 (0.9199-0.9463) | 0.8702 (0.8558-0.8845) | 84.73 (83.35-86.12) | 86.76 (82.15-91.37) | 81.80 (75.57-88.03) | 87.69 (84.53-90.85) | 81.69 (77.15-86.23) |
| VGG11 | 0.5730 (0.5448-0.6012) | 0.6576 (0.6092-0.7059) | 59.02 (54.75-63.29) | 67.45 (58.49-76.40) | 46.80 (38.37-55.23) | 64.87 (61.52-68.21) | 50.81 (44.36-57.26) |
| Inception-v4 | 0.9177 (0.8760-0.9593) | 0.8634 (0.8243-0.9025) | 84.08 (79.11-89.05) | 84.55 (80.55-88.55) | | | 78.79 (73.17-84.40) |
| NasMobile | 0.9370 (0.9176-0.9564) | 0.8762 (0.8598-0.8926) | 84.98 (82.47-87.49) | 89.10 (85.70-92.51) | 79.00 (68.84-89.16) | 86.75 (80.82-92.68) | 83.85 (80.65-87.06) |
| SeResNet | 0.9445 (0.9365-0.9525) | 0.8747 (0.8606-0.8888) | 85.22 (83.57-86.88) | 87.17 (83.46-90.89) | 82.40 (76.06-88.74) | 88.08 (84.74-91.42) | 82.07 (78.16-85.98) |
| ResNet18 | 0.9445 (0.9364-0.9526) | 0.8606 (0.8390-0.8821) | 83.10 (80.00-86.21) | 88.00 (80.25-95.75) | 76.00 (60.45-91.55) | 86.02 (77.69-94.36) | 83.42 (77.44-89.41) |
| ENet | | | | | 80.40 (72.99-87.81) | 87.42 (83.55-91.29) | |
Bold indicates values with the best performance for each statistical indicator of ENet and mainstream architectures. AUC: Area under the curve; PPV: Positive predictive value; NPV: Negative predictive value; CI: Confidence interval
Comparison of ENet, EBUSNet, and human group on multimodal imaging
| Modes | AUC (95% CI) | F1-score (95% CI) | Accuracy, % (95% CI) | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV, % (95% CI) | NPV, % (95% CI) |
|---|---|---|---|---|---|---|---|
| AI group | | | | | | | |
| E | 0.9504 (0.9458-0.9549) | 0.8854 (0.8759-0.8950) | 86.20 (85.04-87.36) | 90.20 (85.76-94.65) | 80.40 (72.99-87.81) | 87.42 (83.55-91.29) | 85.78 (81.49-90.07) |
| G+F | 0.6543 (0.6177-0.6909) | 0.7312 (0.6955-0.7668) | 65.14 (61.82-68.47) | 80.97 (72.19-89.74) | 42.20 (30.82-53.58) | 67.26 (64.49-70.03) | 63.30 (55.58-71.02) |
| G+E | 0.9506 (0.9337-0.9674) | 0.8936 (0.8660-0.9213) | 86.53 (82.23-90.83) | | 76.20 (62.40-90.00) | 86.03 (79.25-92.81) | |
| F+E | 0.9512 (0.9440-0.9584) | 0.8972 (0.8933-0.9012) | 87.84 (87.36-88.31) | 89.79 (86.45-93.13) | | | 85.62 (82.16-89.07) |
| G+F+E | | | | 92.41 (91.64-93.18) | 83.00 (79.12-86.88) | 88.82 (86.53-91.11) | 88.29 (87.13-89.44) |
| Human group | | | | | | | |
| E | | | | | | | |
| G+F+E | 0.7809 (0.6821-0.8797) | 0.7984 (0.7359-0.8608) | 75.03 (67.26-82.81) | 83.68 (75.43-91.93) | 62.50 (49.77-75.23) | 77.03 (69.45-84.60) | 73.90 (61.08-86.72) |
| Expert group | | | | | | | |
| E | | | | 91.49 (88.92-94.07) | | | |
| G+F+E | 0.8696 (0.8369-0.9023) | 0.8505 (0.8199-0.8810) | 80.82 (77.42-84.21) | | 63.67 (58.25-69.08) | 78.75 (76.58-80.93) | 86.78 (77.33-96.23) |
| Trainee group | | | | | | | |
| E | | | | | 57.00 (27.91-86.09) | | |
| G+F+E | 0.6922 (0.5587-0.8257) | 0.7463 (0.6584-0.8342) | 69.25 (57.23-81.28) | 74.71 (69.53-79.90) | | 75.30 (60.57-90.03) | 61.02 (49.06-72.98) |
Bold indicates values with the best performance for each statistical indicator in each group. G: Gray scale; F: Blood flow Doppler; E: Elastography; AUC: Area under the curve; PPV: Positive predictive value; NPV: Negative predictive value; CI: Confidence interval; AI: Artificial intelligence
Optimal decision thresholds for fivefold cross-validation on the validation set and test set
| Modes | Dataset | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Summarization |
|---|---|---|---|---|---|---|---|
| G+F+E | Validation | 0.6250 | 0.5714 | 0.5263 | 0.4173 | 0.5222 | 0.1224 |
| G+F+E | Test | 0.5849 | 0.5852 | 0.5358 | 0.4677 | 0.5135 | |
| E | Validation | 0.5040 | 0.4492 | 0.4724 | 0.4989 | 0.4650 | 0.4170 |
| E | Test | 0.4850 | 0.4025 | 0.5391 | 0.3410 | 0.5969 | |
Optimal thresholds are obtained by exhaustive search. Summarization is the sum, over the five folds, of the absolute differences between the best threshold on the validation set and the best threshold on the test set. G: Gray scale; F: Blood flow Doppler; E: Elastography
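The two steps described in this note can be sketched as follows. The selection criterion is not stated in the note, so the exhaustive search below assumes accuracy as the criterion and uses a hypothetical grid of candidate thresholds and hypothetical scores. The summarization step uses the G+F+E thresholds copied from the table above.

```python
def best_threshold(scores, labels, candidates):
    """Exhaustively try each candidate threshold and keep the most accurate one.

    Assumption: accuracy is the selection criterion (the source does not say).
    """
    def accuracy(threshold):
        hits = sum((score >= threshold) == bool(label)
                   for score, label in zip(scores, labels))
        return hits / len(labels)

    # max() keeps the first candidate on ties, i.e. the lowest one for a sorted grid.
    return max(candidates, key=accuracy)


def summarization(validation_thresholds, test_thresholds):
    """Sum of absolute per-fold differences between validation and test thresholds."""
    return sum(abs(v - t) for v, t in zip(validation_thresholds, test_thresholds))


# Hypothetical predicted probabilities and labels (1 = malignant), for illustration only.
scores = [0.10, 0.35, 0.40, 0.65, 0.80, 0.90]
labels = [0, 0, 1, 0, 1, 1]
candidates = [i / 100 for i in range(101)]
chosen = best_threshold(scores, labels, candidates)
print(f"best threshold on the toy data: {chosen:.2f}")

# Thresholds for the G+F+E mode, copied from the table above.
validation = [0.6250, 0.5714, 0.5263, 0.4173, 0.5222]
test = [0.5849, 0.5852, 0.5358, 0.4677, 0.5135]
total = summarization(validation, test)
print(f"summarization for G+F+E: {total:.4f}")
```

Recomputed from the rounded table entries, the G+F+E sum is about 0.1225, which agrees with the reported 0.1224 up to rounding. A smaller sum means the threshold chosen on the validation set transfers more closely to the test set.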