Lauren C. Smail, Kiret Dhindsa, Luis H. Braga, Suzanna Becker, Ranil R. Sonnadara.
Abstract
Grading hydronephrosis severity relies on subjective interpretation of renal ultrasound images. Deep learning is a data-driven algorithmic approach to classifying data, including images, and presents a promising option for grading hydronephrosis. The current study explored the potential of deep convolutional neural networks (CNNs), a type of deep learning algorithm, to grade hydronephrosis ultrasound images according to the 5-point Society for Fetal Urology (SFU) classification system, and discusses their potential applications in developing decision and teaching aids for clinical practice. We developed a five-layer CNN to grade 2,420 sagittal hydronephrosis ultrasound images [191 SFU 0 (8%), 407 SFU I (17%), 666 SFU II (28%), 833 SFU III (34%), and 323 SFU IV (13%)] from 673 patients ranging from 0 to 116.29 months old (M age = 16.53, SD = 17.80). Five-way (all grades) and two-way classification problems [i.e., II vs. III, and low (0-II) vs. high (III-IV)] were explored. The CNN classified 94% (95% CI, 93-95%) of the images correctly or within one grade of the provided label in the five-way classification problem. Fifty-one percent of these images (95% CI, 49-53%) were correctly predicted, with an average weighted F1 score of 0.49 (95% CI, 0.47-0.51). The CNN achieved an average accuracy of 78% (95% CI, 75-82%) with an average weighted F1 of 0.78 (95% CI, 0.74-0.82) when classifying low vs. high grades, and an average accuracy of 71% (95% CI, 68-74%) with an average weighted F1 score of 0.71 (95% CI, 0.68-0.75) when discriminating between grades II vs. III. Our model performs well above chance level and classifies almost all images either correctly or within one grade of the provided label. We have demonstrated the applicability of a CNN approach to hydronephrosis ultrasound image classification. Further investigation into a deep learning-based clinical adjunct for hydronephrosis is warranted.
Keywords: deep learning; diagnostic aid; diagnostic imaging; grading; hydronephrosis; machine learning; teaching aid; ultrasound
Year: 2020 PMID: 32064241 PMCID: PMC7000524 DOI: 10.3389/fped.2020.00001
Source DB: PubMed Journal: Front Pediatr ISSN: 2296-2360 Impact factor: 3.418
Figure 1. The CNN architecture containing all convolutional (dark gray) and fully connected (black) layers. The convolutional kernels (light gray squares) were 3 × 3 pixels in all layers.
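The record specifies only that the network stacks 3 × 3 convolutional kernels ahead of fully connected layers. As a minimal sketch of the shape arithmetic such a stack implies, the snippet below traces the spatial size through five hypothetical 3 × 3 layers; the input size, stride, and padding are illustrative assumptions, not details from the paper.

```python
# Shape arithmetic for a stack of 3x3 convolutional layers, as suggested by
# Figure 1. Input size, stride, and padding below are assumptions for
# illustration -- the record only specifies the 3x3 kernel size.

def conv2d_out(size, kernel=3, stride=1, padding=0):
    """Spatial output size of a square convolution: floor((n - k + 2p) / s) + 1."""
    return (size - kernel + 2 * padding) // stride + 1

def trace_shapes(input_size, layers):
    """Return the spatial size after each (kernel, stride, padding) layer."""
    sizes = [input_size]
    for kernel, stride, padding in layers:
        sizes.append(conv2d_out(sizes[-1], kernel, stride, padding))
    return sizes

# Hypothetical: five 3x3 conv layers, stride 2, no padding, 256x256 input.
layers = [(3, 2, 0)] * 5
print(trace_shapes(256, layers))  # [256, 127, 63, 31, 15, 7]
```

The final 7 × 7 feature map would then be flattened and fed to the fully connected layers shown in black in the figure.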
Figure 2. The confusion matrix of the CNN model. Boxes along the diagonal in gray represent the number (percentage) of cases where the CNN made the correct classification decision. Light gray boxes represent the cases where the CNN was incorrect by one grade, and white boxes indicate cases where the CNN was incorrect by two or more grades.
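The "correct or within one grade" figure reported in the abstract (94%) falls directly out of the confusion matrix in Figure 2: sum the diagonal plus the bands one step off it. A minimal sketch, with made-up labels and predictions purely for illustration:

```python
# Sketch of the "correct or within one grade" metric from Figure 2.
# The labels/predictions below are invented for illustration only.

def confusion_matrix(labels, preds, n_classes=5):
    """Rows = true SFU grade, columns = predicted SFU grade."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(labels, preds):
        m[t][p] += 1
    return m

def within_k_accuracy(matrix, k=1):
    """Fraction of cases where |predicted grade - true grade| <= k."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    hits = sum(matrix[t][p] for t in range(n)
               for p in range(n) if abs(t - p) <= k)
    return hits / total

labels = [0, 1, 2, 2, 3, 3, 3, 4]
preds  = [1, 1, 2, 3, 3, 2, 4, 2]
m = confusion_matrix(labels, preds)
print(within_k_accuracy(m, k=0))  # exact accuracy: 0.375
print(within_k_accuracy(m, k=1))  # within one grade: 0.875
```

With `k=0` this reduces to ordinary accuracy, so the same function covers both numbers quoted in the abstract.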
CNN model classification results averaged across the five folds.

| Classification | Accuracy, % | F1^a | Sensitivity | Specificity | Precision | F1 |
|---|---|---|---|---|---|---|
| Five-way (0 to IV) | 51 (49–53) | 0.49 (0.47–0.51) | | | | |
| SFU 0 | | | 0.11 (0–0.21) | 0.99 (0.97–1.00) | 0.26 (0.05–0.47) | 0.15 (0.01–0.29) |
| SFU I | | | 0.39 (0.35–0.43) | 0.87 (0.84–0.90) | 0.39 (0.34–0.44) | 0.38 (0.35–0.42) |
| SFU II | | | 0.54 (0.43–0.65) | 0.75 (0.72–0.79) | 0.45 (0.42–0.49) | 0.48 (0.43–0.53) |
| SFU III | | | 0.65 (0.60–0.70) | 0.76 (0.74–0.78) | 0.59 (0.53–0.65) | 0.61 (0.56–0.66) |
| SFU IV | | | 0.46 (0.29–0.62) | 0.96 (0.94–0.98) | 0.65 (0.54–0.75) | 0.52 (0.38–0.66) |
| Mild (0, I, II) vs. Severe (III, IV) | 78 (75–82) | 0.78 (0.74–0.82) | | | | |
| Mild | | | 0.89 (0.82–0.96) | 0.66 (0.51–0.81) | 0.75 (0.69–0.81) | 0.81 (0.78–0.84) |
| Severe | | | 0.66 (0.51–0.81) | 0.89 (0.82–0.96) | 0.87 (0.80–0.94) | 0.73 (0.64–0.82) |
| SFU II vs. SFU III | 71 (68–74) | 0.71 (0.68–0.75) | | | | |
| SFU II | | | 0.76 (0.60–0.92) | 0.67 (0.52–0.82) | 0.67 (0.59–0.75) | 0.69 (0.63–0.75) |
| SFU III | | | 0.67 (0.52–0.82) | 0.76 (0.60–0.92) | 0.80 (0.73–0.87) | 0.71 (0.65–0.77) |

The 95% confidence intervals are given in parentheses.
^a Weighted average.
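The "weighted average" footnote means each class's F1 is weighted by its number of images. As a sanity check, weighting the per-class F1 point estimates from the five-way rows by the class counts given in the abstract (191, 407, 666, 833, 323 for SFU 0–IV) approximately recovers the reported overall weighted F1:

```python
# Support-weighted F1 (the "weighted average" footnote). Per-class F1 point
# estimates are taken from the five-way rows of the table; supports are the
# class counts from the abstract.

def weighted_f1(f1_scores, supports):
    """Average F1 with each class weighted by its number of images."""
    total = sum(supports)
    return sum(f * s for f, s in zip(f1_scores, supports)) / total

f1_scores = [0.15, 0.38, 0.48, 0.61, 0.52]   # SFU 0 .. SFU IV
supports = [191, 407, 666, 833, 323]

print(round(weighted_f1(f1_scores, supports), 2))  # 0.49, matching the table
```

The weighting matters here because the classes are imbalanced: SFU III alone accounts for about a third of the images, so its F1 dominates the average.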
Figure 3. (A) Example SFU I, borderline SFU II/III, and SFU IV US images from the database. (B) The corresponding layer-wise relevance propagations of each of the example images. Layer-wise relevance propagations give a sparse representation of pixel importance. Propagations were visualized as heat maps and overlaid on top of the gray-scale input US images. The cyan-colored pixels indicate regions that the CNN heavily relied upon for classification. (C) The corresponding softmax output probability distribution of the borderline SFU II/III US image. The image was labeled as SFU grade III by physicians; however, the CNN incorrectly predicted SFU grade II. The probability distribution shows that the model "thought" SFU grades II and III were almost equally likely but had to select one grade as its prediction. This behavior is analogous to that of physicians and can be partially explained by the poor inter-rater reliability and subjectivity of the SFU system (i.e., intrinsic limitations of that classification).
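The borderline case in Figure 3C can be sketched numerically: a softmax converts the network's raw scores into a probability distribution, and an argmax then forces a single predicted grade even when two grades are nearly tied. The logits below are invented to illustrate the near-tie; they are not the model's actual outputs.

```python
# Sketch of the softmax behavior described in Figure 3C: near-equal
# probabilities for SFU II and III, but argmax must pick one grade.
# The logits are made up for illustration, not taken from the model.
import math

def softmax(logits):
    """Convert raw scores to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for grades SFU 0..IV on a borderline II/III image.
logits = [-2.0, 0.1, 2.3, 2.2, -0.5]
probs = softmax(logits)
pred = max(range(len(probs)), key=probs.__getitem__)

print([round(p, 2) for p in probs])  # [0.01, 0.05, 0.48, 0.43, 0.03]
print(pred)  # 2 -- predicts SFU II, though III is nearly as probable
```

Reporting the full distribution rather than only the argmax is what lets a reader see that the model's uncertainty mirrors the physicians' disagreement on borderline II/III images.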