Jonathan Folmsbee, Lei Zhang, Xulei Lu, Jawaria Rahman, John Gentry, Brendan Conn, Marilena Vered, Paromita Roy, Ruta Gupta, Diana Lin, Shabnam Samankan, Pooja Dhorajiva, Anu Peter, Minhua Wang, Anna Israel, Margaret Brandwein-Weber, Scott Doyle.
Abstract
In digital pathology, deep learning has been shown to have a wide range of applications, from cancer grading to segmenting structures such as glomeruli. One of the main hurdles for digital pathology to be truly effective is the size of the dataset needed to generalize across the spectrum of possible morphologies: small datasets limit classifiers' ability to generalize. Yet moving to larger datasets of whole slide images (WSIs) of tissue can create network bottlenecks, as each WSI at its original magnification can be upwards of 100,000 by 100,000 pixels and over a gigabyte in file size. Compounding this problem, high-quality pathologist annotations are difficult to obtain, as the volume of annotations needed to build a classifier that generalizes would be extremely costly in pathologist-hours. In this work, we use Active Learning (AL), a process for iterative interactive training, to create a modified U-net classifier at the region of interest (ROI) scale. We compare this to Random Learning (RL), in which the images added to the dataset for retraining are selected at random. Our hypothesis is that AL yields better segmentation results than randomly selecting images to annotate. We show that after 3 iterations, AL, with an average Dice coefficient of 0.461, outperforms RL, with an average Dice coefficient of 0.375, by 0.086.
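At the scale described above, WSIs are processed by tiling them into smaller patches rather than loading them whole. A minimal tiling sketch using the openslide-python library follows; the slide path, tile size, and pyramid level are illustrative assumptions, not values from the paper.

```python
# Minimal WSI tiling sketch. Assumptions: openslide-python is installed,
# "slide.svs" is a hypothetical path, and 512 px tiles at level 0 are used;
# none of these specifics come from the paper.
import openslide

TILE = 512  # hypothetical patch size

slide = openslide.OpenSlide("slide.svs")
width, height = slide.dimensions  # can exceed 100,000 x 100,000 pixels

for y in range(0, height - TILE + 1, TILE):
    for x in range(0, width - TILE + 1, TILE):
        # read_region returns an RGBA PIL image at the requested level
        tile = slide.read_region((x, y), 0, (TILE, TILE)).convert("RGB")
        # ... feed `tile` to preprocessing / the segmentation network
```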
Keywords: Active learning; Computational pathology; Digital pathology; Oral cavity cancer; Region of interest; Semantic segmentation; U-net; Whole slide imaging
Year: 2022 PMID: 36268093 PMCID: PMC9577135 DOI: 10.1016/j.jpi.2022.100146
Source DB: PubMed Journal: J Pathol Inform
Fig. 1 An example of semantic segmentation on oral cavity cancer.
Fig. 2 An example of a less informative sample (left) vs a more informative sample (right) when looking for worst pattern of invasion. The image on the left has very sparse lymphocytic infiltration and little tumor, whereas the image on the right showcases tumor and tumor satellites as well as more distinct and dense regions of lymphocytic infiltration.
Legend of classes and their colors.
| Class name | Annotation color |
|---|---|
| Stroma | Red |
| Tumor | Blue |
| Lymphocytes | Yellow |
| Mucosa | Sky blue |
| Background/Adipose | Gray |
| Blood | Green |
| Nerves | Orange |
| Necrosis | Black |
| Keratin Pearl | Dark blue |
| Muscle | Olive |
| “Junk” (tissue folds, out of focus areas, ink) | Pink |
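To train a semantic segmentation network against annotations like these, the legend colors are typically converted into integer class labels. A small illustrative sketch follows; the exact RGB triples are assumptions, since the paper reports only color names.

```python
import numpy as np

# Hypothetical RGB triples for the legend colors above; the paper names
# the colors but does not publish exact values.
COLOR_TO_CLASS = {
    (255, 0, 0): 0,       # Stroma (red)
    (0, 0, 255): 1,       # Tumor (blue)
    (255, 255, 0): 2,     # Lymphocytes (yellow)
    (135, 206, 235): 3,   # Mucosa (sky blue)
    (128, 128, 128): 4,   # Background/Adipose (gray)
    (0, 255, 0): 5,       # Blood (green)
    (255, 165, 0): 6,     # Nerves (orange)
    (0, 0, 0): 7,         # Necrosis (black)
    (0, 0, 139): 8,       # Keratin pearl (dark blue)
    (128, 128, 0): 9,     # Muscle (olive)
    (255, 192, 203): 10,  # "Junk" (pink)
}

def mask_from_annotation(rgb_mask: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 color annotation into an H x W label map."""
    labels = np.full(rgb_mask.shape[:2], -1, dtype=np.int64)  # -1 = unlabeled
    for color, idx in COLOR_TO_CLASS.items():
        labels[np.all(rgb_mask == color, axis=-1)] = idx
    return labels
```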
Fig. 3 Visual representation of the CNN architecture used.
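The paper's exact "modified U-net" is not reproduced here; the sketch below is a minimal PyTorch encoder-decoder of the U-net family (two resolution levels and hypothetical channel widths, both assumptions) with the skip connections that define the architecture, sized for the 11 classes in the legend.

```python
import torch
import torch.nn as nn

def conv_block(c_in: int, c_out: int) -> nn.Sequential:
    # Two 3x3 convolutions with ReLU: the basic U-net building block
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    """Two-level U-net sketch; the paper's modified U-net is likely deeper."""

    def __init__(self, n_classes: int = 11):  # 11 classes per the legend
        super().__init__()
        self.enc1 = conv_block(3, 64)
        self.enc2 = conv_block(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(128, 256)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.dec2 = conv_block(256, 128)
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = conv_block(128, 64)
        self.head = nn.Conv2d(64, n_classes, 1)  # per-pixel class logits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)
```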
Fig. 4 Active learning pipeline emphasizing the roles of the AI and the pathologists.
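The pipeline alternates between model training and pathologist annotation of newly selected ROIs. The sketch below illustrates one common acquisition rule, entropy-based uncertainty sampling, alongside the random baseline; the paper's actual selection criterion is not given here, so treat the scoring function as an assumption.

```python
import random
import numpy as np

def entropy_score(prob_map: np.ndarray) -> float:
    """Mean per-pixel entropy of a softmax map (C x H x W); higher = less certain."""
    eps = 1e-12
    return float(-(prob_map * np.log(prob_map + eps)).sum(axis=0).mean())

def select_for_annotation(unlabeled, predict_proba, k, strategy="AL"):
    # strategy="RL": pick k images uniformly at random (the baseline);
    # strategy="AL": pick the k images the current model is least certain
    # about, to be sent to the pathologists for annotation.
    if strategy == "RL":
        return random.sample(unlabeled, k)
    scored = sorted(unlabeled,
                    key=lambda img: entropy_score(predict_proba(img)),
                    reverse=True)
    return scored[:k]
```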
Fig. 5 Loss curves for training and validation across different iterations of AL and RL. While the training curves are similar, the validation losses for AL are lower than those for RL across versions.
Dice coefficients for present classes for holdout testing images across all versions. The highest Dice coefficient for each class is in bold text.
| | 1AL | 2AL | 3AL | 1RL | 2RL | 3RL |
|---|---|---|---|---|---|---|
| Tumor | 0.719 | 0.703 | 0.708 | 0.610 | 0.695 | |
| Stroma | 0.636 | 0.643 | 0.587 | 0.616 | 0.671 | |
| Lymphocytes | 0.599 | 0.498 | 0.487 | 0.404 | 0.549 | |
| Mucosa | 0 | 0.006 | 0.002 | 0 | 0.002 | |
| Blood | 0 | 0.004 | 0.186 | 0.197 | 0.207 | |
| Keratin pearl | 0.077 | 0.189 | 0.172 | 0.164 | 0.008 | |
| Muscle | 0.077 | 0.012 | 0.011 | 0.056 | 0.008 | |
| Background/Adipose | 0.517 | 0.564 | 0.475 | 0.326 | 0.507 | |
| Average | 0.420 | 0.391 | 0.461 | 0.375 | 0.339 | 0.375 |
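For reference, each cell above is a Dice coefficient, 2|A∩B| / (|A|+|B|), computed between the predicted and ground-truth masks for one class, and the "Average" row is an unweighted mean over the classes present. A short sketch:

```python
import numpy as np

def dice_per_class(pred: np.ndarray, truth: np.ndarray, n_classes: int) -> dict:
    """Dice = 2*|A ∩ B| / (|A| + |B|) per class, over H x W label maps."""
    scores = {}
    for c in range(n_classes):
        p, t = (pred == c), (truth == c)
        denom = p.sum() + t.sum()
        if denom == 0:
            continue  # class absent from both masks; skipped, as in the table
        scores[c] = 2.0 * np.logical_and(p, t).sum() / denom
    return scores

# Unweighted average over present classes, matching the "Average" row:
# np.mean(list(dice_per_class(pred, truth, 11).values()))
```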
Fig. 6 Dice coefficients across all versions for tumor, stroma, and lymphocytes. This demonstrates the varying degrees of impact AL has across different classes of interest.
Fig. 7 Unweighted average Dice coefficient across all versions of AL vs RL.
AUC for holdout testing images across all versions. The highest AUCs for each class are in bold text.
| | 1AL | 2AL | 3AL | 1RL | 2RL | 3RL |
|---|---|---|---|---|---|---|
| Tumor | 0.89 | 0.91 | 0.91 | 0.92 | ||
| Stroma | 0.54 | 0.54 | 0.7 | 0.59 | 0.53 | |
| Lymphocytes | 0.49 | 0.71 | 0.62 | 0.54 | 0.73 | |
| Blood | 0.49 | 0.36 | 0.51 | 0.71 | 0.7 | |
| Keratin pearl | 0.74 | 0.79 | 0.79 | 0.87 | 0.8 | |
| Muscle | 0.06 | 0.36 | 0.28 | 0.55 | 0.17 | |
| Background/Adipose | 1 | 1 | 1 | 1 | 1 | 1 |
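The AUCs above are pixel-level, one-vs-rest ROC AUCs per class; the paper does not spell out the computation, so the scikit-learn sketch below is an assumption about the standard approach.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_per_class(prob_map: np.ndarray, truth: np.ndarray) -> dict:
    """One-vs-rest pixel-level AUC per class.

    prob_map: C x H x W softmax probabilities; truth: H x W integer labels.
    """
    aucs = {}
    for c in range(prob_map.shape[0]):
        y_true = (truth == c).ravel().astype(int)
        if y_true.min() == y_true.max():
            continue  # AUC is undefined when only one class is present
        aucs[c] = roc_auc_score(y_true, prob_map[c].ravel())
    return aucs
```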
Fig. 8 ROC curves for holdout testing images for tumor, lymphocytes, and stroma across all versions.
Fig. 9 Progression of an ROI for AL vs all 3 batches of RL. We see how RL varies wildly between batches, whereas AL shows more consistent qualitative performance.
Fig. 10 Total ground-truth pixels for classes of interest. This shows that there is no statistically significant difference between the number of ground-truth pixels added via AL and those added via RL.
Fig. 11 Progression of WSI maps generated for AL vs RL. These demonstrate that the AI-generated WSI maps improve for both AL and RL across versions.