Si Wen, Tahsin M Kurc, Le Hou, Joel H Saltz, Rajarsi R Gupta, Rebecca Batiste, Tianhao Zhao, Vu Nguyen, Dimitris Samaras, Wei Zhu.
Abstract
Segmentation of nuclei in whole slide tissue images is a common methodology in pathology image analysis. Most segmentation algorithms are sensitive to their input parameters and to the characteristics of the input images (tissue morphology, staining, etc.). Because the color, texture, and morphology of tissues can vary widely within and across cancer types (heterogeneity can exist even within a single tissue specimen), a single set of input parameters is unlikely to perform well across multiple images. It is therefore highly desirable, and in some cases necessary, to carry out quality control of segmentation results. This work investigates the application of machine learning to this process. We report on the application of active learning to segmentation quality assessment for pathology images and compare three classification methods, Support Vector Machine (SVM), Random Forest (RF), and Convolutional Neural Network (CNN), in terms of performance improvement and efficiency.
Year: 2018 PMID: 29888078 PMCID: PMC5961826
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Figure 1. Workflow of active learning to support quality control in nucleus segmentation. After (a) segmentation results are generated for the whole slide images, a small portion of the images is selected so that (b) regions showing a good segmentation result from the high or low threshold value can be marked up and patches extracted from the labeled regions; the other images are partitioned into unlabeled patches. Using the labels and (c) features extracted from the labeled patches, (d) a classifier is trained. If the users are not satisfied with the current model, they can select the unlabeled patches with a lower (e) uncertainty measure and (f) manually label them. Then the (d) classifier is updated with both the original labeled patches and the newly labeled patches until a satisfactory model is achieved.
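The selection and update steps of this workflow, (e), (f), and (d), can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the uncertainty measure is the distance of the predicted probability from 0.5 (so a smaller value means a less certain prediction), and the names `select_uncertain`, `active_learning_round`, `train`, `predict`, and `ask_user` are hypothetical placeholders for whichever classifier and labeling interface is used.

```python
from typing import Any, Callable, List, Sequence, Tuple

Patch = Any
Labeled = List[Tuple[Patch, int]]  # (patch, label); 1 = high threshold, 0 = low


def uncertainty(prob_high: float) -> float:
    """Assumed measure: distance from 0.5 (smaller = less certain)."""
    return abs(prob_high - 0.5)


def select_uncertain(probs: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k predictions closest to 0.5."""
    order = sorted(range(len(probs)), key=lambda i: uncertainty(probs[i]))
    return order[:k]


def active_learning_round(
    labeled: Labeled,
    unlabeled: List[Patch],
    train: Callable[[Labeled], Any],
    predict: Callable[[Any, Patch], float],
    ask_user: Callable[[Patch], int],
    k: int,
) -> Tuple[Labeled, List[Patch]]:
    """Train, pick the k most uncertain patches, and have the user label them."""
    model = train(labeled)                               # step (d)
    probs = [predict(model, patch) for patch in unlabeled]
    picked = select_uncertain(probs, k)                  # step (e)
    picked_set = set(picked)
    newly = [(unlabeled[i], ask_user(unlabeled[i])) for i in picked]  # step (f)
    remaining = [p for i, p in enumerate(unlabeled) if i not in picked_set]
    return labeled + newly, remaining
```

Calling `active_learning_round` repeatedly, and retraining on the returned labeled set each time, mirrors the loop in the figure; the stopping rule (user satisfaction) is left to the caller.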
Figure 2. Viewing images with segmentation results, marking up ROIs, and labeling uncertain samples. (a) Regions chosen for each threshold value: pink represents the segmentation result from the high threshold value and green represents the result from the low threshold value. (b) Sample patches from the two differently colored regions: the first row presents the result from the selected threshold value, and the second row shows the result from the alternative threshold value. (c) Heatmap showing the location of uncertain samples: the light blue (or orange) patches are the uncertain samples, and the blue (or red) patches are those with a high predicted probability of choosing the result from the low (or high) threshold value. (d) Labeling the uncertain samples: a blue line means the high threshold and a red line means the low threshold, and the patch color changes to the corresponding color.
The architecture of our 15-layer network. We use batch normalization [24] followed by leaky ReLU [25] in all layers except the output layer, where we use the sigmoid activation function.
| Layer | Output size | Filter size, stride |
|---|---|---|
| Input | 100x100x4 | - |
| Conv. | 100x100x100 | 5x5, 1 |
| Conv. | 100x100x120 | 5x5, 1 |
| Average Pooling | 50x50x120 | 2x2, 2 |
| Conv. | 50x50x240 | 3x3, 1 |
| Conv. | 50x50x320 | 3x3, 1 |
| Average Pooling | 25x25x320 | 2x2, 2 |
| Conv. | 25x25x640 | 3x3, 1 |
| Conv. | 25x25x1024 | 3x3, 1 |
| Conv. | 25x25x640 | 1x1, 1 |
| Conv. | 25x25x100 | 1x1, 1 |
| Conv. (repeated 4 times) | 25x25x320 | 1x1, 1 |
| Global Average Pooling | 1x1x320 | 25x25, 25 |
| Fully Connected | 1x1x100 | - |
| Fully Connected | 1x1x80 | - |
| Output | 1x1x1 | - |
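The output sizes in the table can be checked with a short shape trace. The sketch below assumes that every convolution uses stride 1 with "same" padding (inferred from the unchanged spatial sizes, not stated in the table) and counts only layers that carry weights; the names `LAYERS` and `trace` are illustrative.

```python
# (kind, output channels, kernel size, stride, repeat), one entry per table row.
LAYERS = [
    ("conv", 100, 5, 1, 1),
    ("conv", 120, 5, 1, 1),
    ("pool", None, 2, 2, 1),
    ("conv", 240, 3, 1, 1),
    ("conv", 320, 3, 1, 1),
    ("pool", None, 2, 2, 1),
    ("conv", 640, 3, 1, 1),
    ("conv", 1024, 3, 1, 1),
    ("conv", 640, 1, 1, 1),
    ("conv", 100, 1, 1, 1),
    ("conv", 320, 1, 1, 4),      # "repeated 4 times" row
    ("pool", None, 25, 25, 1),   # global average pooling
    ("fc", 100, None, None, 1),
    ("fc", 80, None, None, 1),
    ("fc", 1, None, None, 1),    # output layer
]


def trace(height, width, channels):
    """Return every intermediate (H, W, C) shape and the number of weight layers."""
    shapes = [(height, width, channels)]
    weight_layers = 0
    for kind, out_channels, kernel, stride, repeat in LAYERS:
        for _ in range(repeat):
            if kind == "conv":
                # Assumed stride 1 with "same" padding: spatial size is unchanged.
                channels = out_channels
                weight_layers += 1
            elif kind == "pool":
                # Unpadded pooling window.
                height = (height - kernel) // stride + 1
                width = (width - kernel) // stride + 1
            else:  # fully connected
                height, width, channels = 1, 1, out_channels
                weight_layers += 1
            shapes.append((height, width, channels))
    return shapes, weight_layers


shapes, weight_layers = trace(100, 100, 4)
print(shapes[-1], weight_layers)
```

Under these assumptions the trace ends at 1x1x1 and counts 12 convolutional layers plus 3 fully connected layers (including the output), which matches the 15 layers named in the caption.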
Training and test datasets: number of patches (number of images) for breast cancer (BC) and pancreatic cancer (PC) at each threshold value
| Dataset | BC, low threshold | BC, high threshold | PC, low threshold | PC, high threshold |
|---|---|---|---|---|
| Training set | 2605 (13) | 3113 (8) | 2456 (6) | 1393 (9) |
| Test set | 2463 (12) | 3209 (7) | 2465 (5) | 1424 (8) |
Figure 3. Comparison of ROC curves for different classifiers for both BC and PC
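For readers unfamiliar with how such curves are produced, the following sketch computes ROC points and the area under the curve from binary labels and classifier scores. It is a generic illustration, not the evaluation code used in the study: the labels and scores in the usage lines are made-up toy values, and `roc_points` and `auc` are hypothetical helper names.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points, one per threshold.

    labels: 1 for the positive class, 0 for the negative class.
    scores: higher means more likely positive.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    positives = sum(labels)
    negatives = len(labels) - positives
    true_pos = false_pos = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all samples that share this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy example with invented values.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(roc_points(toy_labels, toy_scores)))
```

Plotting the points for each classifier on shared axes gives a comparison of the kind the figure describes.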
Comparison of test accuracy (%) of different classifiers for breast cancer (BC) and pancreatic cancer (PC)
| BC SVM | BC RF | BC CNN-1 | BC CNN-2 | BC CNN-3 | PC SVM | PC RF | PC CNN-1 | PC CNN-2 | PC CNN-3 |
|---|---|---|---|---|---|---|---|---|---|
| 50.58 | 58.34 | 71.28 | 70.79 | 75.11 | 47.96 | 43.28 | 66.21 | 74.08 | 73.59 |
| 75.29 | 77.46 | 85.82 | 85.37 | 87.92 | 81.89 | 67.91 | 82.13 | 91.29 | 86.20 |
| 76.90 | 80.15 | 86.11 | 87.87 | 90.79 | 82.92 | 71.40 | 87.96 | 92.41 | 86.83 |
| 82.85 | 82.68 | 90.67 | 91.33 | 92.66 | 87.39 | 73.17 | 92.43 | 94.64 | 92.75 |
Figure 4. Test accuracy as a function of the total number of patches added by active learning for both BC and PC
Comparison of processing time (seconds) of different classifiers for both BC and PC
| Dataset | SVM | RF | CNN-1 | CNN-2 | CNN-3 |
|---|---|---|---|---|---|
| BC | 950.15 | 33.31 | 107,973.619 | 116,783.285 | 108,506.769 |
| BC | 909.96 | 34.34 | 108,687.878 | 106,496.861 | 107,924.15 |
| BC | 864.73 | 34.77 | 107,611.661 | 106,756.751 | 106,154.06 |
| BC | 937.02 | 35.68 | 106,532.782 | 107,620.047 | 106,215.291 |
| PC | 398.24 | 19.23 | 73,974.748 | 79,892.922 | 74,208.552 |
| PC | 450.18 | 19.69 | 73,636.762 | 72,718.406 | 74,578.636 |
| PC | 387.58 | 20.09 | 72,147.099 | 72,860.227 | 73,726.653 |
| PC | 604.91 | 21.24 | 77,015.413 | 75,224.745 | 72,458.709 |
Figure 5. Convergence of the CNN for both BC and PC. The upper subplots present the change in training loss over time and the lower subplots show the change in test accuracy over time. The time between two dots in a plot represents the processing time for 4 epochs.