Peter Yao, Dan Witte, Hortense Gimonet, Alexander German, Katerina Andreadis, Michael Cheng, Lucian Sulica, Olivier Elemento, Josue Barnes, Anaïs Rameau.
Abstract
Objective: This study aims to develop and validate a convolutional neural network (CNN)-based algorithm for automatic selection of informative frames in flexible laryngoscopic videos. The classifier has the potential to aid in the development of computer-aided diagnosis systems and reduce data processing time for clinician-computer scientist teams.
Keywords: artificial intelligence; computer vision; computer‐aided diagnosis; laryngology; machine learning; vocal fold polyp
Year: 2022 PMID: 35434326 PMCID: PMC9008155 DOI: 10.1002/lio2.754
Source DB: PubMed Journal: Laryngoscope Investig Otolaryngol ISSN: 2378-8038
FIGURE 1 Overview of the informative frame classifier
Selection criteria for informative frames
| Criterion | Description |
|---|---|
| Vocal fold visibility | If abducted, bilateral vocal folds are visible from vocal process to anterior commissure. If adducted, 80% visibility is acceptable. |
| Lighting and exposure | Vocal folds are well exposed by light such that they are clearly distinguished from surrounding structures. |
| Focus | Vocal folds are in focus with minimal blurring. |
| Camera distance | Camera is at an appropriate distance from the vocal folds such that details on the vocal folds are discernible. |
FIGURE 2 Definition of quantitative performance metrics: recall, precision, and F1‐score
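For reference, the standard definitions of these metrics are given below. TP, FP, and FN denote the true positives, false positives, and false negatives for the class being evaluated. This is the textbook formulation and is assumed, not confirmed, to match the figure.

$$
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$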
FIGURE 3 A representative sample of frames correctly classified as informative (green border) and uninformative (red border)
Precision, recall, and F1‐score for informative and uninformative classes
| Class | Precision | Recall | F1‐score |
|---|---|---|---|
| Informative | 0.94 | 0.90 | 0.92 |
| Uninformative | 0.63 | 0.76 | 0.69 |
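As a consistency check, the reported F1‐scores can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the values in the table above; the function and variable names are illustrative and not taken from the study's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above.
reported = {
    "Informative": (0.94, 0.90),
    "Uninformative": (0.63, 0.76),
}

# Recompute F1 for each class and round to two decimals, as in the table.
recomputed = {
    name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()
}

for name, f1 in recomputed.items():
    print(f"{name}: F1 = {f1:.2f}")
```

Both recomputed values round to the figures in the table (0.92 and 0.69), so the three reported columns are mutually consistent.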
FIGURE 4 Confusion matrix of the performance of the informative frame classifier on the test set. Each cell value is the number of images in that category
FIGURE 5 Precision‐recall curve summarizing the trade‐off between the true positive rate (recall) and the positive predictive value (precision) of the model at different probability thresholds
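A precision‐recall curve of this kind is traced by sweeping the probability threshold above which a frame is called informative and recording precision and recall at each setting. The following is a minimal sketch of that sweep. The probabilities and labels are made‐up toy values, not data from the study, and `precision_recall_at` is a hypothetical helper name.

```python
from typing import List, Tuple


def precision_recall_at(
    threshold: float, probs: List[float], labels: List[int]
) -> Tuple[float, float]:
    """Precision and recall when frames with prob >= threshold are called informative."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    # Convention: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical predicted probabilities and ground-truth labels (1 = informative).
probs = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 1, 0, 1, 0, 1, 0]

# Each entry is (threshold, precision, recall): one point on the curve.
curve = [(t, *precision_recall_at(t, probs, labels)) for t in (0.25, 0.50, 0.75)]

for t, precision, recall in curve:
    print(f"threshold={t:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```

On this toy data, raising the threshold increases precision and lowers recall, which is the trade‐off the curve visualizes.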
FIGURE 6 An attention map generated by Grad‐CAM overlaid on an image correctly predicted to be informative. The model's focus, as illustrated by the warm colors, is on relevant key structures, particularly the glottis and vocal folds
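Grad‐CAM builds such a map by weighting each feature map of a convolutional layer by the spatially averaged gradient of the class score with respect to that map, summing over channels, and applying a ReLU. The NumPy sketch below illustrates only that core computation. It assumes the activations and gradients have already been extracted from a network. The random arrays stand in for them and are not from the study's model, and `grad_cam` is an illustrative name.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heat map from one layer's feature maps and class-score gradients.

    activations, gradients: arrays of shape (channels, height, width).
    Returns a (height, width) map scaled to [0, 1].
    """
    # One importance weight per channel: global average of the gradients.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)
    # ReLU keeps only regions that push the class score up.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Hypothetical stand-ins for a layer's activations and gradients.
rng = np.random.default_rng(0)
activations = rng.random((4, 7, 7))
gradients = rng.standard_normal((4, 7, 7))

heatmap = grad_cam(activations, gradients)
print(heatmap.shape)
```

In practice the low‐resolution map is upsampled to the input image size and rendered with a color map before being overlaid on the frame.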