Tim Boers, Joost van der Putten, Maarten Struyvenberg, Kiki Fockens, Jelmer Jukema, Erik Schoon, Fons van der Sommen, Jacques Bergman, Peter de With.
Abstract
Early Barrett's neoplasia is often missed due to its subtle visual features and the inexperience of non-expert endoscopists with such lesions. While promising results have been reported on the automated detection of this type of early cancer in still endoscopic images, video-based detection that exploits the temporal domain remains an open problem. The temporally stable nature of video data in endoscopic examinations enables the development of a framework that diagnoses the imaged tissue class over time, thereby yielding a more robust and improved model for spatial predictions. We show that introducing Recurrent Neural Network nodes offers a more stable and accurate model for tissue classification, compared to classification on individual images. We have developed a customized ResNet18 feature extractor with four types of classifiers: Fully Connected (FC), Fully Connected with an averaging filter (FC Avg(n = 5)), Long Short-Term Memory (LSTM) and a Gated Recurrent Unit (GRU). Experimental results are based on 82 pullback videos of the esophagus, including 46 patients with high-grade dysplasia. Our results demonstrate that the LSTM classifier outperforms the FC, FC Avg(n = 5) and GRU classifiers, with an average accuracy of 85.9% compared to 82.2%, 83.0% and 85.6%, respectively. The benefit of our novel implementation for endoscopic tissue classification is the inclusion of spatio-temporal information for improved and robust decision making, and it is a first step towards full temporal learning of esophageal cancer detection in endoscopic video.
Keywords: Barrett neoplasia; recurrent neural networks; tissue detection; upper GI tract
Year: 2020 PMID: 32722344 PMCID: PMC7436238 DOI: 10.3390/s20154133
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Visual examples for each tissue label to be classified by the model, sourced from pullback videos. These pullback videos start recording at the (a) Stomach and stop, while the endoscope is pulled back at a constant speed, at the (e) Squamous area. (a) Stomach; (b) Transition-zone Z-line; (c) Barrett; (d) Transition-zone Squamous; (e) Squamous.
Figure 2. Data flow through the proposed model during training. Recurrent Neural Network (RNN) represents either Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) nodes.
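The RNN block in Figure 2 carries a hidden state from frame to frame, which is what lets the classifier smooth over single-frame ambiguity. As an illustration only (not the paper's implementation, and with hypothetical random weights in place of trained ones), a minimal NumPy sketch of one GRU cell stepping through a sequence of frame features:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, W, U, b):
    """One GRU step: the gates decide how much of the previous
    hidden state h survives into the new state."""
    z = sigmoid(W["z"] @ x + U["z"] @ h + b["z"])          # update gate
    r = sigmoid(W["r"] @ x + U["r"] @ h + b["r"])          # reset gate
    n = np.tanh(W["n"] @ x + U["n"] @ (r * h) + b["n"])    # candidate state
    return (1.0 - z) * n + z * h                           # blend old and new

# hypothetical sizes: 8-dim frame features, 4-dim hidden state
rng = np.random.default_rng(0)
d_in, d_h = 8, 4
W = {k: rng.normal(size=(d_h, d_in)) * 0.1 for k in "zrn"}
U = {k: rng.normal(size=(d_h, d_h)) * 0.1 for k in "zrn"}
b = {k: np.zeros(d_h) for k in "zrn"}

h = np.zeros(d_h)
for t in range(10):                       # 10 consecutive frames
    h = gru_cell(rng.normal(size=d_in), h, W, U, b)
print(h.shape)  # (4,)
```

In the paper's pipeline the per-frame input would be the ResNet18 feature vector, and the hidden state would feed a final classification layer over the five tissue labels.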
Table 1. Correspondence between predicted labels and the ground-truth annotations of the clinicians.
| Predicted Label | True Positive If Label Is: |
|---|---|
| Stomach (St) | St |
| Transition Z-line (Tz) | Tz, B |
| Barrett (B) | Tz, B, Ts |
| Transition squamous (Ts) | B, Ts |
| Squamous (Sq) | Sq |
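Under the correspondence in Table 1, a prediction on a frame near a tissue boundary is scored as correct if the annotated label is any of its accepted matches. A small sketch of this relaxed scoring (function and variable names are my own, not from the paper):

```python
# Accepted ground-truth labels per predicted label, per Table 1.
ACCEPTED = {
    "St": {"St"},
    "Tz": {"Tz", "B"},
    "B":  {"Tz", "B", "Ts"},
    "Ts": {"B", "Ts"},
    "Sq": {"Sq"},
}

def relaxed_accuracy(predicted, annotated):
    """Fraction of frames whose annotation falls within the
    accepted set of the predicted label."""
    hits = sum(t in ACCEPTED[p] for p, t in zip(predicted, annotated))
    return hits / len(predicted)

# example: predicting "B" on a "Ts"-annotated frame still counts as correct
print(relaxed_accuracy(["St", "B", "B", "Sq"], ["St", "Ts", "B", "Tz"]))  # 0.75
```

The asymmetry (e.g. "Tz" accepts "B", while "B" also accepts "Tz") reflects that the transition zones blend gradually into the Barrett segment.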
Figure 3. Each bar illustrates the predicted organ labels per frame over a time axis. The figure illustrates the instability of the compared network architectures at three performance levels: the best, median and worst performance for each model, respectively. The average number of domain switches is 43.27 for the Fully Connected classifier, 18.49 for the Fully Connected Averaged (n = 5) classifier, 10.81 for the Long Short-Term Memory classifier, and 11.91 for the Gated Recurrent Unit classifier.
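The "domain switches" counted in Figure 3 are simply frame-to-frame label changes, and the FC Avg(n = 5) variant suppresses them by averaging over a five-frame window. A sketch of both ideas, assuming (as an illustration, not the paper's exact method) that the averaging is applied to the per-class classifier scores before the argmax:

```python
import numpy as np

def count_switches(labels):
    """Number of frame-to-frame label changes (the domain switches of Figure 3)."""
    return sum(a != b for a, b in zip(labels, labels[1:]))

def smooth_predictions(scores, n=5):
    """Average per-class scores over a sliding window of n frames,
    then take the argmax per frame."""
    kernel = np.ones(n) / n
    smoothed = np.stack(
        [np.convolve(scores[:, c], kernel, mode="same") for c in range(scores.shape[1])],
        axis=1,
    )
    return smoothed.argmax(axis=1)

# synthetic 2-class score sequence: class 0 dominant, two single-frame flickers
scores = np.tile([0.8, 0.2], (12, 1))
scores[[3, 7]] = [0.2, 0.8]
raw = scores.argmax(axis=1)
print(count_switches(list(raw)))                         # 4
print(count_switches(list(smooth_predictions(scores))))  # 0
```

The averaging filter removes isolated flickers, but unlike the recurrent classifiers it cannot weight evidence by how informative each frame is.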
Table 2. Mean accuracy per tissue class for various architecture configurations. Scores are averaged over all patient cases. The mean label accuracy follows the label correspondence from Table 1. The classifiers reported are the Fully Connected classifier (FC), Fully Connected Averaged (n = 5), Long Short-Term Memory classifier (LSTM), and Gated Recurrent Unit classifier (GRU).
| Label | N | FC | FC Avg (n = 5) | LSTM | GRU |
|---|---|---|---|---|---|
| Stomach (St) | 2593 | 59.6 | 62.2 | | 60.7 |
| Tran. Z-line (Tz) | 2921 | 74.1 | 72.9 | | 79.3 |
| Barrett (B) | 9444 | 95.7 | 96.1 | | 98.0 |
| Tran. squamous (Ts) | 4215 | 79.5 | 81.3 | | 83.9 |
| Squamous (Sq) | 755 | 58.3 | 59.9 | | 63.8 |
| Overall | 19,931 | 82.2 | 83.0 | 85.9 | 85.6 |
Figure 4. Confusion matrices displaying the average percentage at which a true label is predicted as each class. These values are normalized over the patient cases. (a) Fully Connected classifier, (b) Fully Connected Averaged (n = 5), (c) Long Short-Term Memory classifier, and (d) Gated Recurrent Unit classifier.