Robert Arntfield, Blake VanBerlo, Thamer Alaifan, Nathan Phelps, Matthew White, Rushil Chaudhary, Jordan Ho, Derek Wu.
Abstract
OBJECTIVES: Lung ultrasound (LUS) is a portable, low-cost respiratory imaging tool but is challenged by user dependence and lack of diagnostic specificity. It is unknown whether the advantages of LUS implementation could be paired with deep learning (DL) techniques to match or exceed human-level diagnostic specificity among similar-appearing, pathological LUS images.
Keywords: COVID-19; adult intensive & critical care; chest imaging; respiratory infections; ultrasound
Year: 2021 PMID: 33674378 PMCID: PMC7939003 DOI: 10.1136/bmjopen-2020-045120
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692
Figure 1Sample images and lung ultrasound characteristics typical of the three lung pathologies that are the subject of our deep learning classifier (videos available in online supplemental files 1–3).
Figure 2Data acquisition, selection and verification workflow.
Distribution of clips and images assigned to each dataset
| Data split | Encounters (% of total) | Frames (% of total) | Clips (% of total) |
| Training set | 204 (83.95%) | 99 471 (81.95%) | 500 (81.70%) |
| Test-1 set | 19 (7.82%) | 9540 (7.86%) | 49 (8.00%) |
| Test-2 set | 20 (8.23%) | 12 370 (10.19%) | 63 (10.29%) |
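As a consistency check, the percentages in the table above can be reproduced from the raw counts alone. A minimal Python sketch (the column totals are simply the sums across the three splits):

```python
# Raw counts from the data-split table (training, test-1, test-2).
splits = {
    "Training": {"encounters": 204, "frames": 99471, "clips": 500},
    "Test-1":   {"encounters": 19,  "frames": 9540,  "clips": 49},
    "Test-2":   {"encounters": 20,  "frames": 12370, "clips": 63},
}

# Column totals across all three splits.
totals = {k: sum(s[k] for s in splits.values())
          for k in ("encounters", "frames", "clips")}

# Percentage of each column accounted for by each split.
for name, counts in splits.items():
    pcts = {k: 100 * counts[k] / totals[k] for k in counts}
    print(f"{name}: " + ", ".join(f"{k} {pcts[k]:.2f}%" for k in pcts))
```

Running this reproduces the published test-set percentages (e.g. Test-1 encounters 19/243 = 7.82%).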
Data profile for the three groups of lung ultrasound images used to train and test our model
| | COVID | NCOVID | HPE |
| No of patients | 84 | 78 | 81 |
| No of loops | 185 | 236 | 191 |
| No of still images | 30 419 | 44 193 | 46 769 |
| Average loops/patient | 2.23 | 2.91 | 2.42 |
| Female sex (%) | 50% | 40% | 55% |
| Age (years) | 60.6±11.3 | 56.0±16.0 | 67.2±15.3 |
| Machine models (%) | SS Edge (77.4) | SS X-Porte (56.4) | SS Edge (76.9) |
| Transducers (%) | Phased array (95.3) | Phased array (98.7) | Phased array (92.3) |
| Imaging preset (%) | Abdominal (98.8) | Abdominal (97.4) | Abdominal (87.2) |
| Focal point location (%) | Automatic (100) | Automatic (97.4) | Automatic (96.1) |
| Imaging frequencies (%) | 2–5 MHz (98.8) | 2–5 MHz (100.0) | 2–5 MHz (100.0) |
| Imaging depth average (cm) | 13.4 | 12.5 | 13.1 |
| Different sonographers | 12 | 43 | 45 |
| Date range | March 2020–June 2020 | August 2017–March 2020 | October 2018–April 2020 |
HPE, hydrostatic pulmonary edema; MR, Mindray; Ph, Philips; SS, Sonosite.
Confusion matrices for the physicians (survey responses from 61 physicians classifying lung ultrasound clips into their respective causes; numbers in parentheses reflect classifications from the aggregated approach used to calculate area under the receiver operating characteristic curve) and for the model's performance on the test-2 holdback set at the frame and encounter levels
| Physicians | Predicted COVID | Predicted NCOVID | Predicted HPE | Total |
| Actual COVID | 173 (3) | 162 (3) | 34 (2) | 369 (8) |
| Actual NCOVID | 177 (4) | 163 (1) | 30 (2) | 370 (7) |
| Actual HPE | 138 (0) | 102 (0) | 302 (6) | 542 (6) |
| Total | 488 (7) | 427 (4) | 366 (10) | |

| CNN (frames) | Predicted COVID | Predicted NCOVID | Predicted HPE | Total |
| Actual COVID | 3188 | 256 | 7 | 3451 |
| Actual NCOVID | 1176 | 3741 | 3 | 4920 |
| Actual HPE | 109 | 1119 | 2771 | 3999 |
| Total | 4473 | 5116 | 2781 | |

| CNN (encounters) | Predicted COVID | Predicted NCOVID | Predicted HPE | Total |
| Actual COVID | 6 | 0 | 0 | 6 |
| Actual NCOVID | 1 | 6 | 0 | 7 |
| Actual HPE | 0 | 3 | 4 | 7 |
| Total | 7 | 9 | 4 | |
‘Predicted’ represents the model or physicians’ opinions; ‘actual’ is the true label of the clip.
CNN, convolutional neural network; HPE, hydrostatic pulmonary edema.
Classification performance metrics calculated from the model’s predictions and ground truth from the test-2 set
| Prediction type | Class | Sensitivity/Recall | Specificity | Precision | F1 score | AUC |
| Frames | COVID | 0.924 | 0.883 | 0.713 | 0.805 | 0.965 |
| Frames | NCOVID | 0.760 | 0.815 | 0.731 | 0.746 | 0.893 |
| Frames | HPE | 0.693 | 0.999 | 0.996 | 0.817 | 0.991 |
| Encounters | COVID | 1.0 | 0.929 | 0.857 | 0.923 | 1.0 |
| Encounters | NCOVID | 0.857 | 0.769 | 0.667 | 0.75 | 0.934 |
| Encounters | HPE | 0.571 | 1.0 | 1.0 | 0.727 | 1.0 |
Metrics are reported at both the frame and encounter levels.
AUC, area under the receiver operating characteristic curve; HPE, hydrostatic pulmonary edema.
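The per-class metrics above follow directly from the confusion matrices in a one-vs-rest fashion: for each class, sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), precision = TP/(TP+FP), and F1 is the harmonic mean of precision and sensitivity. A minimal Python sketch using the encounter-level confusion matrix reported in the paper (the AUC values require the underlying prediction scores and so cannot be recomputed from counts alone):

```python
# Encounter-level confusion matrix from the paper
# (rows = actual, columns = predicted; order: COVID, NCOVID, HPE).
cm = [
    [6, 0, 0],   # actual COVID
    [1, 6, 0],   # actual NCOVID
    [0, 3, 4],   # actual HPE
]
classes = ["COVID", "NCOVID", "HPE"]
total = sum(sum(row) for row in cm)

for i, cls in enumerate(classes):
    tp = cm[i][i]
    fn = sum(cm[i]) - tp                       # actual cls, predicted other
    fp = sum(cm[r][i] for r in range(3)) - tp  # predicted cls, actually other
    tn = total - tp - fn - fp
    sens = tp / (tp + fn)                      # sensitivity / recall
    spec = tn / (tn + fp)                      # specificity
    prec = tp / (tp + fp)                      # precision (PPV)
    f1 = 2 * prec * sens / (prec + sens)       # harmonic mean
    print(f"{cls}: sens={sens:.3f} spec={spec:.3f} "
          f"prec={prec:.3f} f1={f1:.3f}")
```

This reproduces the encounter-level rows of the metrics table (e.g. COVID: sensitivity 1.0, specificity 0.929, precision 0.857, F1 0.923).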
Figure 3 Receiver operating characteristic curves across the three classes of images that our human benchmark (physicians) and our model (convolutional neural network (CNN)) were tasked with interpreting. The model's performance on the test-2 (holdback) image set is plotted both for individual images and across the entire image set from one encounter. In all image categories, the model's interpretation accuracy exceeded that of the human interpreters.
Figure 4 Grad-CAM heatmaps corresponding to a selection of our model's predictions. Blue areas reflect the regions of the image with the highest contribution to the resulting class predicted by the model. In all cases, the immediate area surrounding the pleura appears most activated. COVID, COVID-19 pneumonia; HPE, hydrostatic pulmonary edema; NCOVID, non-COVID-related acute respiratory distress syndrome.