Mohamed Amgad, Habiba Elfandy, Hagar Hussein, Lamees A Atteya, Mai A T Elsebaie, Lamia S Abo Elnasr, Rokia A Sakr, Hazem S E Salem, Ahmed F Ismail, Anas M Saad, Joumana Ahmed, Maha A T Elsebaie, Mustafijur Rahman, Inas A Ruhban, Nada M Elgazar, Yahya Alagha, Mohamed H Osman, Ahmed M Alhusseiny, Mariam M Khalaf, Abo-Alela F Younes, Ali Abdulkarim, Duaa M Younes, Ahmed M Gadallah, Ahmad M Elkashash, Salma Y Fala, Basma M Zaki, Jonathan Beezley, Deepak R Chittajallu, David Manthey, David A Gutman, Lee A D Cooper.
Abstract
MOTIVATION: While deep-learning algorithms have demonstrated outstanding performance in semantic image segmentation tasks, large annotation datasets are needed to create accurate models. Annotation of histology images is challenging due to the effort and experience required to carefully delineate tissue structures, and difficulties related to sharing and markup of whole-slide images.
Year: 2019 PMID: 30726865 PMCID: PMC6748796 DOI: 10.1093/bioinformatics/btz083
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Study overview. (A) Slides from the TNBC cohort were reviewed for difficulty and the study coordinator selected a single representative ROI in each slide. (B) Participants were recruited on social media from medical student interest groups. Documentation and instructional videos were developed to train participants in breast cancer pathology and the use of DSA annotation tools. A spreadsheet lists slide-level descriptions of histologic features for each of the 151 images to aid in training. (C) Participants were each assigned six slides based on experience. Challenging slides were assigned to faculty/pathology residents, while standard slides were distributed among all participants. (D) The DSA was used by participants to draw the outlines of tissue regions in their assigned slides/ROIs. A Slack workspace enabled less experienced users to ask questions and receive guidance from the more experienced users. (E) Ten evaluation ROIs were identified in the slides and were annotated by all participants in an unsupervised manner to enable inter-participant comparisons. (F) Agreement between each pair of participants was evaluated using the Dice coefficient to generate an inter-participant discordance matrix.
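The pairwise comparison in panel (F) can be illustrated with a minimal sketch. It assumes each participant's annotation of an ROI is reduced to a flat binary mask for a single tissue class, and that discordance is defined as 1 minus the Dice coefficient; both are illustrative assumptions, and the function names `dice` and `discordance_matrix` are hypothetical rather than taken from the study's code.

```python
from itertools import combinations


def dice(mask_a, mask_b):
    """Dice coefficient between two equally sized binary masks (flat 0/1 sequences)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def discordance_matrix(masks):
    """Symmetric matrix of (1 - Dice) for every pair of participants (assumed definition)."""
    n = len(masks)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = 1.0 - dice(masks[i], masks[j])
        matrix[i][j] = matrix[j][i] = d
    return matrix


# Toy example: three participants annotating a 4-pixel ROI.
masks = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
]
matrix = discordance_matrix(masks)
```

In this toy example the first two participants overlap on one pixel and the third overlaps with neither, so the third row and column hold the largest discordance values.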
Fig. 2. Screenshot of the DSA and HistomicsTK web interface. The main viewport allows panning and zooming within the slide. Annotations are grouped by class into layers (middle right panel) whose style properties like color and fill can be adjusted (bottom right panel). Other features include: controlling annotation transparency, an interactive mode to highlight individual annotations, and ability to download the WSI, regions of interest or annotations. Annotation properties can also be programmatically manipulated using the DSA API.
Fig. 3. Evaluation slide set concordance and model accuracy. (A) Inter-participant discordance matrices for SP, JP, NP and AL. (B) 2-D MDS plots of participant discordance. (C, D) Testing accuracy and confusion of comparison models trained on evaluation set ROIs from SPs (cyan) and NPs (magenta), measured against post-correction masks from the core set. Confusion matrix values are percentages relative to total pixel count. (Color version of this figure is available at Bioinformatics online.)
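A confusion matrix normalized by total pixel count, as described for panels (C, D), can be sketched as follows. The example assumes that reference and predicted masks are flat sequences of class labels of equal length; the function name `confusion_percentages` and the toy labels are hypothetical and not taken from the study's code.

```python
def confusion_percentages(truth, predicted, classes):
    """Confusion matrix with each cell as a percentage of the total pixel count.

    Rows index the reference class, columns index the predicted class.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        raise ValueError("at least one pixel is required")
    index = {c: k for k, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(truth, predicted):
        counts[index[t]][index[p]] += 1
    total = len(truth)
    return [[100.0 * c / total for c in row] for row in counts]


# Toy example: four pixels, two classes.
classes = ["tumor", "stroma"]
truth = ["tumor", "tumor", "stroma", "stroma"]
predicted = ["tumor", "stroma", "stroma", "stroma"]
percentages = confusion_percentages(truth, predicted, classes)

# With this normalization all cells sum to 100, and the diagonal sums
# to the overall pixel accuracy expressed as a percentage.
accuracy = sum(percentages[k][k] for k in range(len(classes)))
```

Because every cell is divided by the same total, the matrix reflects class prevalence as well as error rates, unlike a row-normalized confusion matrix.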
Testing accuracy of full semantic segmentation model
| | Mean AUC (SD) | Dice | Accuracy |
|---|---|---|---|
| Overall | 0.945 (0.042) (micro) | 0.888 | 0.799 |
| Tumor | 0.941 (0.058) | 0.851 | 0.804 |
| Stroma | 0.881 (0.056) | 0.800 | 0.824 |
| Inflammatory | 0.917 (0.150) | 0.712 | 0.743 |
| Necrosis | 0.864 (0.237) | 0.723 | 0.872 |
| Other | 0.885 (0.129) | 0.666 | 0.670 |
Fig. 4. Model performance over the testing set. (A) Visualization of full semantic segmentation model predictions on testing set regions of interest. Color codes used: red (tumor); transparent (stroma); cyan (inflammatory infiltrates); yellow (necrosis). (B) Area under ROC curve for semantic segmentation algorithm, broken down by region class. (C) Effect of training sample size on scale-dependent patch classification models. Each point represents the macro-average AUC of a single model, trained on different sets of randomly selected slides. (Color version of this figure is available at Bioinformatics online.)