| Literature DB >> 35023333 |
Kevin Chew Figueroa, Bofan Song, Sumsum Sunny, Shaobai Li, Keerthi Gurushanth, Pramila Mendonca, Nirza Mukhia, Sanjana Patrick, Shubha Gurudath, Subhashini Raghavan, Tsusennaro Imchen, Shirley T Leivon, Trupti Kolur, Vivek Shetty, Vidya Bushan, Rohan Ramesh, Vijay Pillai, Petra Wilder-Smith, Alben Sigamani, Amritha Suresh, Moni Abraham Kuriakose, Praveen Birur, Rongguang Liang.
Abstract
SIGNIFICANCE: Convolutional neural networks (CNNs) show potential for the automated classification of different cancer lesions. However, their lack of interpretability and explainability makes their predictions difficult to understand. Furthermore, because a CNN has no incentive to focus solely on the correct subject, its attention may incorrectly concentrate on areas surrounding the salient object rather than on the object to be recognized. This inhibits the reliability of CNNs, especially for biomedical applications. AIM: Develop a deep learning training approach that provides understandable predictions and directly guides the network to concentrate its attention on, and accurately delineate, the cancerous regions of the image. APPROACH: We utilized Selvaraju et al.'s gradient-weighted class activation mapping (Grad-CAM) to inject interpretability and explainability into CNNs. We adopted a two-stage training process with data augmentation techniques and Li et al.'s guided attention inference network (GAIN) to train images captured using our customized mobile oral screening devices. The GAIN architecture consists of three streams of network training: a classification stream, an attention mining stream, and a bounding box stream. By adopting the GAIN training architecture, we jointly optimized the classification and segmentation accuracy of our CNN, treating these attention maps as reliable priors and developing attention maps with more complete and accurate segmentation.
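For illustration, the Grad-CAM step described above can be sketched in a few lines. This is a minimal sketch only, assuming a PyTorch/torchvision backbone; the VGG19 model, the choice of target layer, and the hook-based implementation are assumptions made for the example and are not taken from the paper.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Minimal Grad-CAM sketch: build a class activation map from the last
# convolutional block of an ImageNet-pretrained VGG19 (layer choice is illustrative).
model = models.vgg19(weights="IMAGENET1K_V1").eval()
features, grads = {}, {}

def fwd_hook(module, inp, out):
    features["maps"] = out            # A^k: feature maps of the target layer

def bwd_hook(module, grad_in, grad_out):
    grads["maps"] = grad_out[0]       # dy_c/dA^k: gradients w.r.t. the feature maps

target_layer = model.features[-1]
target_layer.register_forward_hook(fwd_hook)
target_layer.register_full_backward_hook(bwd_hook)

def grad_cam(image, class_idx):
    """image: (1, 3, H, W) tensor; returns an (H, W) attention map in [0, 1]."""
    logits = model(image)
    model.zero_grad()
    logits[0, class_idx].backward()
    # alpha_k: global-average-pooled gradients act as channel weights
    weights = grads["maps"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * features["maps"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # normalize for visualization
    return cam[0, 0]
```

The resulting map can be overlaid on the input image to visualize which regions drove the "suspicious" versus "nonsuspicious" decision, as in Figs. 6 and 7.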
Keywords: guided attention inference network; interpretable deep learning; oral cancer
Year: 2022 PMID: 35023333 PMCID: PMC8754153 DOI: 10.1117/1.JBO.27.1.015001
Source DB: PubMed Journal: J Biomed Opt ISSN: 1083-3668 Impact factor: 3.758
Fig. 1Block diagram of the proposed end-to-end deep learning approach for jointly optimizing OPML and malignant lesion classification and segmentation.
Fig. 2Examples of collected oral images, nonsuspicious cases (first row) and suspicious cases (second row). The red bounding box of the suspicious cases shows the lesion areas annotated by the specialists.
Fig. 3Data distributions of the training dataset (a) before and (b) after ROS, (c) validation dataset, and (d) testing dataset.
Dataset augmentation applied to stage 1 training.
| Training data augmentations | |
|---|---|
| Random rotations | [−180 deg, 180 deg] |
| Random X scaling | [0.75 |
| Random Y scaling | [0.75 |
| Random X reflection | On |
| Random Y reflection | On |
| Random X translation | [−20 pixels, 20 pixels] |
| Random Y translation | [−20 pixels, 20 pixels] |
| Random X shear | [−5 pixels, 5 pixels] |
| Random Y shear | [−5 pixels, 5 pixels] |
| RGB fill value | [128, 128, 128] |
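The table can be read as a concrete stage-1 augmentation recipe. Below is a minimal sketch of how such a pipeline might be assembled with torchvision; the library, the 224×224 input size used to convert pixel shifts into fractions, and the flip probabilities are assumptions, and the scaling range is left out because its upper bound is truncated in the table above.

```python
from torchvision import transforms

# Stage-1 augmentation sketch mirroring the table above (torchvision is an
# assumption; the paper does not specify the implementation or input size).
IMAGE_SIZE = 224  # assumed network input size, used only to express pixel shifts as fractions

stage1_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),        # random X reflection
    transforms.RandomVerticalFlip(p=0.5),          # random Y reflection
    transforms.RandomAffine(
        degrees=(-180, 180),                       # random rotations
        translate=(20 / IMAGE_SIZE, 20 / IMAGE_SIZE),  # +/-20 pixel X/Y translation
        shear=(-5, 5, -5, 5),                      # random X/Y shear
        fill=(128, 128, 128),                      # RGB fill value for exposed borders
        # scale range omitted: its upper bound is truncated in the source table
    ),
    transforms.ToTensor(),
])
```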
Fig. 4The loss curve during stage 1 training on the augmented training dataset.
Fig. 5Block diagram of the GAIN training architecture, which utilizes image labels and bounding box data to jointly optimize the CNN’s classification and attention performance on OPML and malignant lesion data. The architecture consists of three streams of network training: the classification stream, the attention mining stream, and the bounding box stream.
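In the original GAIN formulation, the three streams in Fig. 5 correspond to three loss terms summed into a single objective. The sketch below illustrates one way such a combined loss could look; the weighting factors, the erasure-based attention mining term, and the mean-squared-error form of the bounding box term are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def gain_loss(logits, labels, attention_map, masked_logits, bbox_mask,
              alpha=1.0, omega=1.0):
    """Combine the three GAIN streams into one objective (weights are illustrative).

    logits        : classifier outputs on the original image       (classification stream)
    masked_logits : classifier outputs on the image with the
                    attended region erased                          (attention mining stream)
    attention_map : Grad-CAM map in [0, 1], same size as bbox_mask
    bbox_mask     : binary mask built from the specialists' bounding boxes
                                                                    (bounding box stream)
    """
    # Classification stream: standard cross-entropy on the image-level label.
    l_cls = F.cross_entropy(logits, labels)

    # Attention mining stream: after erasing the attended region, the network
    # should no longer score the true class highly.
    l_am = torch.softmax(masked_logits, dim=1)[torch.arange(labels.size(0)), labels].mean()

    # Bounding box stream: pull the attention map toward the annotated lesion region.
    l_ext = F.mse_loss(attention_map, bbox_mask)

    return l_cls + alpha * l_am + omega * l_ext
```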
Fig. 6Test dataset images correctly classified as “suspicious.” The first column shows the images input into the CNN. Bounding-box-annotated images are shown in the second column, indicating the location of the OPML and malignant lesion as annotated by the medical team. The attention map (third column) output by the CNN explains/segments the region of the image on which the network’s attention is focused when making its classification decision; red indicates the area of highest attention and blue indicates the area of lowest attention.
Fig. 7Comparison between the output attention map generated by the proposed method and other conventional transfer learning trained networks: VGG19, Resnet50, and Inceptionresnetv2. Areas highlighted in red indicate the areas of highest attention when the CNN made its decision, and blue indicates the areas of lowest attention.