Astrid de Maissin, Remi Vallée, Mathurin Flamant, Marie Fondain-Bossiere, Catherine Le Berre, Antoine Coutrot, Nicolas Normand, Harold Mouchère, Sandrine Coudol, Caroline Trang, Arnaud Bourreille.
Abstract
Background and study aims: Computer-aided diagnostic tools based on deep neural networks are efficient at detecting lesions in endoscopy but require a large number of images, and the impact of annotation quality has not yet been assessed. Here we describe a multi-expert annotated dataset of images extracted from small-bowel capsule endoscopy recordings of patients with Crohn's disease, and the impact of annotation quality on the accuracy of a recurrent attention neural network.
Methods: Capsule endoscopy images were first annotated by a single reader and then reviewed by three experts in inflammatory bowel disease. Concordance between experts was evaluated with Fleiss' kappa, and all discordant images were read again by all the endoscopists to obtain a consensus annotation. A recurrent attention neural network developed for the study was tested before and after the consensus annotation. Available neural networks (ResNet and VGGNet) were also tested under the same conditions.
Results: The final dataset included 3498 images: 2124 non-pathological (60.7 %), 1360 pathological (38.9 %), and 14 inconclusive (0.4 %). Expert agreement was good for distinguishing pathological from non-pathological images, with a kappa of 0.79 (P < 0.0001). The performance of our classifier and of the available neural networks increased after the consensus annotation, reaching 93.7 % accuracy, 93 % sensitivity, and 95 % specificity.
Conclusions: The accuracy of the neural network increased with improved annotations, suggesting that the number of images needed to develop such systems could be reduced with a well-designed dataset.
Year: 2021 PMID: 34222640 PMCID: PMC8216776 DOI: 10.1055/a-1468-3964
Source DB: PubMed Journal: Endosc Int Open ISSN: 2196-9736
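The concordance analysis described in the Methods relies on Fleiss' kappa across the three expert readers. Below is a minimal sketch of that computation, assuming the statsmodels implementation; the toy ratings matrix and category codes (0 = non-pathological, 1 = pathological, 2 = inconclusive) are invented for illustration.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# One row per image, one column per expert; the codes are illustrative only:
# 0 = non-pathological, 1 = pathological, 2 = inconclusive.
ratings = np.array([
    [0, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 2],
    [2, 1, 1],
])

# aggregate_raters turns rater-wise labels into per-image category counts,
# which is the input format expected by fleiss_kappa.
counts, _categories = aggregate_raters(ratings)
print(f"Fleiss' kappa over {ratings.shape[0]} images: {fleiss_kappa(counts):.2f}")
```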
Fig. 1 Global architecture of the recurrent attention network. At each time step t, the Glimpse sensor is given an endoscopic image X and the location l_{t-1} of the patch to extract from the original image. Two independent neural networks, the What? network and the Where? network, then extract information about the content and the location of the patch. A gated recurrent unit (GRU) merges the features extracted by these networks to produce the current system state h_t. From this state, three sub-networks independently produce l_t, the position of the next patch to extract; a_t, a vector containing a score for each class; and b_t, the baseline from which the reward for reinforcement learning is calculated.
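A minimal PyTorch sketch of the architecture described in the caption, assuming a 32-pixel glimpse patch, a 256-unit GRU state, and simple convolutional/linear What?/Where? encoders; all module names and sizes below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlimpseSensor(nn.Module):
    """Extracts a small patch from image X, centred at a location given in [-1, 1]^2."""

    def __init__(self, patch_size: int = 32):
        super().__init__()
        self.patch_size = patch_size

    def forward(self, x: torch.Tensor, loc: torch.Tensor) -> torch.Tensor:
        # Build an affine grid that crops a patch_size x patch_size window around
        # `loc` (normalised x, y coordinates), then bilinearly sample it.
        b, c, h, w = x.shape
        theta = torch.zeros(b, 2, 3, device=x.device, dtype=x.dtype)
        theta[:, 0, 0] = self.patch_size / w  # horizontal scale
        theta[:, 1, 1] = self.patch_size / h  # vertical scale
        theta[:, :, 2] = loc                  # patch centre (x, y)
        grid = F.affine_grid(theta, (b, c, self.patch_size, self.patch_size),
                             align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)


class RecurrentAttentionNet(nn.Module):
    """What?/Where? networks feeding a GRU, with class, location and baseline heads."""

    def __init__(self, n_classes: int = 2, hidden: int = 256, patch_size: int = 32):
        super().__init__()
        self.sensor = GlimpseSensor(patch_size)
        # "What?" network: encodes the content of the extracted patch.
        self.what_net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, hidden),
        )
        # "Where?" network: encodes the location of the patch.
        self.where_net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU())
        self.gru = nn.GRUCell(hidden, hidden)
        # Heads producing l_t (next location), a_t (class scores) and b_t (baseline).
        self.loc_head = nn.Linear(hidden, 2)
        self.cls_head = nn.Linear(hidden, n_classes)
        self.baseline_head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor, n_glimpses: int = 6):
        h_t = x.new_zeros(x.size(0), self.gru.hidden_size)
        loc = x.new_zeros(x.size(0), 2)                           # start at the image centre
        for _ in range(n_glimpses):
            patch = self.sensor(x, loc)
            g = F.relu(self.what_net(patch) + self.where_net(loc))  # merge what/where features
            h_t = self.gru(g, h_t)                                # update the system state h_t
            loc = torch.tanh(self.loc_head(h_t))                  # l_t: next patch position
        return self.cls_head(h_t), self.baseline_head(h_t)        # a_t, b_t


# Toy usage: classify a batch of capsule images as pathological vs non-pathological.
model = RecurrentAttentionNet()
scores, baseline = model(torch.randn(4, 3, 224, 224))
print(scores.shape, baseline.shape)  # torch.Size([4, 2]) torch.Size([4, 1])
```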
Fig. 2 Flowchart of the study. The final dataset was obtained after an initial reader selected non-pathological (NP) and pathological (P) images of interest from 66 SBCE examinations performed in patients with CD. All images were then reviewed and annotated by three experts. Discordant images were read again by all four gastroenterologists to obtain a consensus annotation, and inconclusive images (I) were excluded from the dataset. The performance of the neural network was tested at each step of the process, as was the concordance between readers.
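A small sketch of the label-reconciliation logic implied by the flowchart and by the two-of-three majority rule stated under the performance table: images keep the majority label when at least two experts agree, and any image with disagreement is flagged for the joint re-reading. The image identifiers and label strings are invented for illustration.

```python
from collections import Counter
from typing import Optional

def majority_label(labels: list[str]) -> Optional[str]:
    """Label given by at least two of the three experts, or None if all three differ."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None

annotations = {  # toy labels: one entry per expert for each image
    "img_001": ["pathological", "pathological", "pathological"],
    "img_002": ["pathological", "pathological", "non-pathological"],
    "img_003": ["non-pathological", "pathological", "inconclusive"],
}

provisional, to_rereview = {}, []
for image_id, labels in annotations.items():
    provisional[image_id] = majority_label(labels)  # two-of-three majority rule
    if len(set(labels)) > 1:
        to_rereview.append(image_id)                # discordant: joint re-reading

print(provisional)  # {'img_001': 'pathological', 'img_002': 'pathological', 'img_003': None}
print(to_rereview)  # ['img_002', 'img_003']
```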
Number (%) of images containing each type of lesion and non-pathological images for each expert and agreement between experts.
| Reader | n (%) | n (%) | n (%) | n (%) | n (%) | n (%) | Non-pathological, n (%) | Inconclusive, n (%) |
| Expert 1 | 112 (3.2) | 202 (5.8) | 442 (12.6) | 315 (9.0) | 266 (7.6) | 82 (2.3) | 2037 (58.2) | 42 (1.2) |
| Expert 2 | 68 (1.9) | 58 (1.7) | 372 (10.6) | 357 (10.2) | 95 (2.7) | 296 (8.5) | 2243 (64.1) | 9 (0.3) |
| Expert 3 | 97 (2.8) | 73 (2.1) | 143 (4.1) | 618 (17.7) | 273 (7.8) | 237 (6.8) | 2011 (57.5) | 46 (1.3) |
| Category-wise Fleiss' kappa | 0.31 | 0.27 | 0.48 | 0.35 | 0.50 | 0.58 | 0.79 | 0.22 |
Proportion of images identically classified by the three experts relative to the total number of images in each category, according to the type of lesion. Three levels of labeling granularity are considered, each with its inter-observer agreement (Fleiss' kappa coefficient); a sketch of how the finer labels collapse into these levels follows the table.
| Category | Level 1: agreed/total, n/N (%) | Level 2: agreed/total, n/N (%) | Level 3: agreed/total, n/N (%) |
| Non-pathologic | 1827/2345 (78) | 1827/2345 (78) | 1827/2345 (78) |
| Pathologic: S | 1134/1614 (70) | 80/323 (25) | 80/323 (25) |
| Pathologic: U > 10 | | 658/1300 (51) | 74/369 (20) |
| Pathologic: U 3–10 | | | 117/850 (14) |
| Pathologic: AU | | | 103/555 (19) |
| Pathologic: O | | 39/406 (10) | 16/250 (6) |
| Pathologic: E | | | 10/197 (5) |
| Inconclusive | 0/94 (0) | 0/94 (0) | 0/94 (0) |
| Total | 2961/3498 (85) | 2604/3498 (74) | 2227/3498 (64) |
| Fleiss' kappa coefficient | 0.79 | 0.68 | 0.57 |
| S, stenosis; U > 10, ulceration > 10 mm; U 3–10, ulceration between 3 and 10 mm; AU, aphthoid ulceration; O, edema; E, erythema. Values spanning several rows (e.g. all pathological images at level 1) are shown on the first row of the group only. | | | |
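A sketch of how the three labeling granularities can be related. The level-2 grouping used here (stenosis / ulcerations / edema-erythema) is an assumption that is merely consistent with the per-level totals in the table; the group names and toy labels are illustrative only.

```python
# Assumed level-2 grouping of the six level-3 lesion categories, plus NP and I.
LEVEL_2 = {
    "S": "stenosis",
    "U>10": "ulceration", "U3-10": "ulceration", "AU": "ulceration",
    "O": "edema/erythema", "E": "edema/erythema",
    "NP": "NP", "I": "I",
}
LEVEL_1 = {k: ("P" if k not in ("NP", "I") else k) for k in LEVEL_2}  # pathological vs not

def agreement_ratio(labels_per_image, mapping=None):
    """Fraction of images on which the three experts give the same (possibly collapsed) label."""
    agreed = sum(
        len({mapping[lab] if mapping else lab for lab in labels}) == 1
        for labels in labels_per_image
    )
    return agreed / len(labels_per_image)

# Toy example: three images, each labelled by the three experts.
labels = [["U>10", "U3-10", "U>10"], ["NP", "NP", "NP"], ["O", "E", "NP"]]
for name, mapping in [("level 3", None), ("level 2", LEVEL_2), ("level 1", LEVEL_1)]:
    print(name, f"{agreement_ratio(labels, mapping):.2f}")  # 0.33, 0.67, 0.67
```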
Fig. 3 Identified lesions retained to define pathological images: a two examples of erythema, b edema, c aphthoid erosions, d ulcerations, and e stenosis.
Neural network performance evaluated on the CrohnIPI dataset after successive rounds of annotation; a sketch of how these metrics follow from confusion counts is given after the table.
| Annotation round | Images, n | Accuracy (%) | Sensitivity (%) | Specificity (%) |
| One-reader annotation (1st round) | 3331 | 88.53 | 86.24 | 90.67 |
| Experts' annotation (2nd round) | 3331 | 92.37 | 90.96 | 93.31 |
| Consensus annotation (3rd round) | 3331 | 93.70 | 92.89 | 94.76 |
| | | | | |
| One-reader annotation (1st round) | 3363 | 90.90 | 90.06 | 91.70 |
| Experts' annotation (2nd round) | 3476 | 91.83 | 88.45 | 94.00 |
| Consensus annotation (3rd round) | 3484 | 92.48 | 88.16 | 95.24 |
Images were categorized as non-pathological or pathological when at least two readers among three were concordant.
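For reference, a short sketch of how accuracy, sensitivity, and specificity follow from binary confusion counts, taking pathological as the positive class; the counts used below are toy values, not the study's figures.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity and specificity (in %) with 'pathological' as the positive class."""
    return {
        "accuracy": 100 * (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": 100 * tp / (tp + fn),   # recall on pathological images
        "specificity": 100 * tn / (tn + fp),   # recall on non-pathological images
    }

# Toy confusion counts for illustration only.
print(binary_metrics(tp=900, fp=50, tn=1900, fn=100))
```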
Fig. 4 Confusion matrix of the classifier on the final dataset: predicted labels for each type of lesion, erythema (E), edema (O), stenosis (S), aphthoid ulceration (AU), ulceration 3–10 mm (U3–10), and ulceration > 10 mm (U > 10).
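A minimal sketch of building a per-lesion confusion matrix such as the one in Fig. 4, using scikit-learn; only the label set follows the figure caption, and the true/predicted labels are toy values.

```python
from sklearn.metrics import confusion_matrix

LABELS = ["E", "O", "S", "AU", "U3-10", "U>10"]  # ordering as in the figure caption

# Toy ground-truth and predicted lesion types.
y_true = ["E", "O", "S", "AU", "U3-10", "U>10", "AU", "E"]
y_pred = ["E", "E", "S", "AU", "U>10", "U>10", "AU", "O"]

cm = confusion_matrix(y_true, y_pred, labels=LABELS)
print(cm)  # rows: true lesion type; columns: predicted lesion type
```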