Lorne Holland, Dongguang Wei, Kristin A Olson, Anupam Mitra, John Paul Graff, Andrew D Jones, Blythe Durbin-Johnson, Ananya Datta Mitra, Hooman H Rashidi.
Abstract
BACKGROUND: Little is known about the minimum number of slides required to generate image datasets that yield generalizable machine-learning (ML) models. In addition, deep learning work commonly assumes that an increased number of training images will always enhance accuracy and that a model's initial validation accuracy correlates well with its generalizability. In this pilot study, we tested these assumptions to gain a better understanding of such platforms, especially when data resources are limited.
Keywords: Carcinoma; colon; convolutional neural network; machine learning
Year: 2020 PMID: 32175170 PMCID: PMC7047745 DOI: 10.4103/jpi.jpi_49_19
Source DB: PubMed Journal: J Pathol Inform
Figure 1(a) Dataset A comprised 1000 images acquired from ten slides (five benign colon and five invasive colon carcinomas). (b) From these 1000 images, seven distinct training set categories (1000, 500, 200, 100, 50, 30, and 10 images) were constructed to assess the effect of the number of training images on each model's accuracy. (c) A transfer-learning approach was employed to retrain three distinct, well-established convolutional neural networks (ResNet50, AlexNet, and SqueezeNet), building models that could distinguish colonic carcinoma from normal colonic tissue. (d) Each model's accuracy was then assessed on two distinct test sets: the "Internal Validation" set, the 20% of Dataset A's images held out of the training phase and used for the first validation accuracy measure, and the "External Validation" set, Dataset B, which was completely unknown to the trained models (taken from a variety of public-domain sources) and used to assess each model's generalizability. (e) The performance parameters of the individual models were then compared, contrasted, and statistically analyzed
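The sampling scheme in Figure 1(b) and 1(d) — nested training subsets drawn from Dataset A, with 20% held out for internal validation — can be sketched in plain Python. This is an illustrative reconstruction, not the authors' code; the function names, the `(image_id, label)` pair format, and the fixed random seed are assumptions.

```python
import random

def split_train_validation(images, held_out_fraction=0.2, seed=0):
    """Hold out a fraction of Dataset A (20% in the paper) as the
    internal validation set; the remainder is available for training."""
    rng = random.Random(seed)
    shuffled = images[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - held_out_fraction))
    return shuffled[:cut], shuffled[cut:]

def make_training_subsets(images, sizes=(1000, 500, 200, 100, 50, 30, 10), seed=0):
    """Draw one random training subset per target size.

    Because every subset is a prefix of the same shuffled list,
    the smaller sets are nested inside the larger ones.
    """
    rng = random.Random(seed)
    shuffled = images[:]
    rng.shuffle(shuffled)
    return {n: shuffled[:n] for n in sizes}
```

A usage example with dummy image identifiers: `split_train_validation` on a 1000-item list returns 800 training and 200 held-out items, and `make_training_subsets` then yields the seven graded training sets retrained in Figure 1(c).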
Figure 2 Effect of training set size on each model's classification accuracy on the internal validation test set (the 20% held-out images from Dataset A). Mean ± standard error of the mean for each group
Figure 3 Effect of training set size on each model's classification accuracy on Dataset B, the external validation test set (generalization). Mean ± standard error of the mean for each group
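The summary statistic plotted in Figures 2 and 3, mean ± standard error of the mean (SEM) per group of model accuracies, is a short computation; a minimal stdlib sketch, with the accuracy values below being illustrative rather than the paper's measurements:

```python
import statistics

def mean_and_sem(accuracies):
    """Mean and standard error of the mean for a group of accuracies.

    SEM = sample standard deviation / sqrt(n), the error-bar height
    reported per group in Figures 2 and 3.
    """
    mean = statistics.mean(accuracies)
    sem = statistics.stdev(accuracies) / len(accuracies) ** 0.5
    return mean, sem
```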
Figure 4 Correlation between internal validation accuracy (based on the held-out 20% of the images from Dataset A and its subsets) and external validation accuracy (based on Dataset B), with regression line, for (a) ResNet50, (b) AlexNet, and (c) SqueezeNet. ResNet50 showed the strongest correlation (R2 = 0.3) between the internal and external validation test set accuracies. ResNet50 and AlexNet showed a similar slope
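The regression lines and R² values in Figure 4 come from an ordinary least-squares fit of external validation accuracy against internal validation accuracy. A pure-Python sketch of that fit; the data points in the usage example are illustrative, not the paper's measurements:

```python
def linear_fit_r2(x, y):
    """Least-squares slope, intercept, and R^2 of y on x, where x is
    internal validation accuracy and y is external validation accuracy.

    R^2 is the squared Pearson correlation, the statistic reported
    per network in Figure 4.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2
```

On perfectly linear hypothetical data (e.g. external accuracy exactly 2 × internal + 1), the fit recovers slope 2 and R² = 1; the paper's observed R² of 0.3 for ResNet50 reflects a much looser relationship.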