| Literature DB >> 34178558 |
Anabik Pal, Zhiyun Xue, Brian Befano, Ana Cecilia Rodriguez, L Rodney Long, Mark Schiffman, Sameer Antani.
Abstract
Cervical cancer is caused by persistent infection with certain types of the Human Papillomavirus (HPV) and is a leading cause of cancer-related mortality among women, particularly in low- and middle-income countries (LMICs). Visual inspection of the cervix with acetic acid (VIA) is a commonly used technique in cervical screening. While this technique is inexpensive, its clinical assessment is highly subjective, and relatively poor reproducibility has been reported. A deep learning-based algorithm for automatic visual evaluation (AVE) of aceto-whitened cervical images was shown to be effective in detecting confirmed precancer (i.e., the direct precursor to invasive cervical cancer). The images were selected from a large longitudinal study conducted by the National Cancer Institute in the Guanacaste province of Costa Rica. Training AVE required cervix-boundary annotations, and the data-scarcity challenge was addressed with manually optimized data augmentation. In contrast, we present a novel approach for cervical precancer detection using a deep metric learning (DML) framework that requires no cervix-boundary marking. DML is an advanced learning strategy that better handles data scarcity and the training bias caused by class-imbalanced data. Three widely used state-of-the-art DML techniques are evaluated: (a) contrastive loss minimization, (b) N-pair embedding loss minimization, and (c) batch-hard loss minimization. Three popular deep convolutional neural networks (ResNet-50, MobileNet, NasNet) are configured for training with DML to produce class-separated (i.e., linearly separable) image feature descriptors. Finally, a K-Nearest Neighbor (KNN) classifier is trained on the extracted deep features. Both feature quality and classification performance are quantitatively evaluated on the same data set used for AVE. The results show that, without using any data augmentation, the best model produced by our research improves specificity in disease detection without compromising sensitivity, unlike AVE. The present research thus paves the way for new research directions in this field.
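Of the three DML losses named in the abstract, the batch-hard variant underlies the best model reported later in this record (BH-NasNet-1-NN). Below is a minimal PyTorch sketch of a batch-hard loss, not the authors' implementation; the margin value, embedding size, and toy labels are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def batch_hard_loss(embeddings, labels, margin=0.2):
    """Batch-hard loss: for every anchor in the mini-batch, pick its hardest
    (most distant) positive and hardest (closest) negative, then apply a
    margin-based hinge. The margin value is illustrative, not the paper's."""
    dist = torch.cdist(embeddings, embeddings, p=2)            # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)          # same-class mask
    diag = torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    hardest_pos = dist.masked_fill(~same | diag, 0.0).max(dim=1).values
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values
    return F.relu(hardest_pos - hardest_neg + margin).mean()

# Toy usage: 8 random 128-D embeddings with binary Case/Control labels.
emb = F.normalize(torch.randn(8, 128), dim=1)
lab = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
print(batch_hard_loss(emb, lab))
```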
Keywords: Automated cervical visual examination; cervical cancer; deep metric learning; siamese network
Year: 2021 PMID: 34178558 PMCID: PMC8224396 DOI: 10.1109/access.2021.3069346
Source DB: PubMed Journal: IEEE Access ISSN: 2169-3536 Impact factor: 3.367
FIGURE 1. (a) Block diagram of the proposed system. The upper part shows the two-step training phase and the lower part the test phase. (b) Block diagram of deep metric learning (DML): images and their class labels are input, and the mini-batch loss is computed on the image embeddings.
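Figure 1(a) describes a two-step training phase (DML training of the backbone, then fitting a KNN classifier on the extracted features) and a test phase that embeds each image and queries the KNN. A minimal sketch of that flow follows; the `embed` callable stands in for a DML-trained backbone (ResNet-50 / MobileNet / NasNet) and is an assumption of this sketch, not the authors' code.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# `embed` is a placeholder for a DML-trained backbone that maps one image to
# one feature vector; its name and signature are assumptions for this sketch.

def fit_knn_on_embeddings(embed, train_images, train_labels, k=1):
    # Step 2 of the training phase: fit a K-NN classifier on extracted deep features.
    feats = np.stack([embed(img) for img in train_images])
    return KNeighborsClassifier(n_neighbors=k).fit(feats, train_labels)

def predict(embed, knn, test_images):
    # Test phase: embed each image, then classify it as Case or Control.
    feats = np.stack([embed(img) for img in test_images])
    return knn.predict(feats)
```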
FIGURE 2. Samples of cervical images from the present data set: left, Control class; right, Case class.
Age-stratified data set splits. The entries in the table denote numbers of images. The hold-out test totals exceed the sums of the age-group rows because nine women with missing ages are included only in the totals.
| Age group | Training: Control | Training: Case | Val1: Control | Val1: Case | Val2: Control | Val2: Case | Hold-out Test: Control | Hold-out Test: Case |
|---|---|---|---|---|---|---|---|---|
| <25 | 36 | 12 | 5 | 2 | 24 | 9 | 924 | 19 |
| 25-49 | 324 | 119 | 53 | 22 | 153 | 58 | 5010 | 45 |
| >49 | 115 | 26 | 26 | 4 | 65 | 15 | 2232 | 20 |
| Total | 475 | 157 | 84 | 28 | 242 | 82 | 8174 | 85 |
FIGURE 3. Loss improvement during DML training.
FIGURE 4. t-SNE plots of the feature embeddings.
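Figure 4 itself is not reproduced in this record; as an illustration only, a t-SNE projection of embeddings of the kind shown in that figure can be produced as follows, using random placeholder features and labels rather than the study's embeddings.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder embeddings and labels -- NOT the study's features.
feats = np.random.randn(200, 128)
labels = np.array(["Control"] * 150 + ["Case"] * 50)

xy = TSNE(n_components=2, random_state=0).fit_transform(feats)
for cls, marker in [("Control", "o"), ("Case", "x")]:
    pick = labels == cls
    plt.scatter(xy[pick, 0], xy[pick, 1], marker=marker, s=10, label=cls)
plt.legend()
plt.title("t-SNE of feature embeddings (toy data)")
plt.show()
```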
Mean K-Precision on the val2 data set. The network-wise best Mean K-Precision for each value of K is bold-faced; blank cells are values not preserved in this record. (An assumed definition of the metric is sketched after the table.)
| K | Network | Pre-trained | Fine-tuned | Contrastive | N-Pair Embedding | Batch-hard |
|---|---|---|---|---|---|---|
| 1 | ResNet-50 | 0.6512 | 0.9043 | 0.8920 | 0.9105 | |
| 1 | MobileNet | 0.7099 | 0.9105 | 0.9074 | 0.9105 | |
| 1 | NasNet | 0.6636 | 0.8704 | 0.9105 | 0.9228 | |
| 3 | ResNet-50 | 0.6687 | 0.8992 | 0.8940 | 0.9115 | **0.9198** |
| 3 | MobileNet | 0.6636 | 0.9043 | 0.8961 | 0.9105 | |
| 3 | NasNet | 0.6605 | 0.8755 | 0.8961 | 0.9208 | |
| 5 | ResNet-50 | 0.6784 | 0.8926 | 0.8932 | 0.9105 | |
| 5 | MobileNet | 0.6611 | 0.9031 | 0.8951 | 0.9105 | |
| 5 | NasNet | 0.6451 | 0.8716 | 0.8963 | 0.9204 | |
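This record does not spell out how Mean K-Precision is computed. The following is a minimal sketch under the assumption that it is the fraction of each query image's K nearest neighbours (in the learned embedding space) that share the query's class, averaged over all queries; the function and argument names are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mean_k_precision(query_feats, query_labels, gallery_feats, gallery_labels, k=1):
    """Assumed metric: for each query image, the fraction of its k nearest
    gallery neighbours (in embedding space) sharing the query's class,
    averaged over all queries."""
    nn = NearestNeighbors(n_neighbors=k).fit(gallery_feats)
    _, idx = nn.kneighbors(query_feats)                    # indices of k neighbours
    hits = gallery_labels[idx] == query_labels[:, None]    # class match per neighbour
    return float(hits.mean())
```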
Class-wise Mean N-Precision on the val2 data set. The best-performing feature-representation method is chosen based on the average of the Case and Control Mean N-Precision; the network-wise best methods are bold-faced; blank cells are values not preserved in this record.
| Network | Pre-trained: Control | Pre-trained: Case | Fine-tuned: Control | Fine-tuned: Case | Contrastive: Control | Contrastive: Case | N-Pair Embedding: Control | N-Pair Embedding: Case | Batch-hard: Control | Batch-hard: Case |
|---|---|---|---|---|---|---|---|---|---|---|
| ResNet-50 | 0.7557 | 0.2458 | 0.8814 | 0.4273 | 0.9762 | 0.5593 | 0.9790 | 0.8119 | | |
| MobileNet | 0.7383 | 0.3305 | 0.8700 | 0.5027 | 0.9679 | 0.6385 | 0.9750 | 0.8482 | | |
| NasNet | 0.7381 | 0.3262 | 0.8766 | 0.5443 | 0.9714 | 0.6461 | 0.9709 | 0.8846 | | |
KNN classification accuracy (%) on the val2 data set. The best-performing classification model is chosen based on the average of the Case and Control accuracies; bold-faced numbers mark the network-wise best models for each value of K; blank cells are values not preserved in this record. (A sketch of the class-wise accuracy computation follows the table.)
| K | Network | Pre-trained: Control | Pre-trained: Case | Fine-tuned: Control | Fine-tuned: Case | Contrastive: Control | Contrastive: Case | N-Pair Embedding: Control | N-Pair Embedding: Case | Batch-hard: Control | Batch-hard: Case |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ResNet-50 | 78.93 | 24.39 | 95.45 | 75.61 | 96.28 | 68.29 | 94.63 | 76.83 | | |
| 1 | MobileNet | 80.99 | 41.46 | 95.04 | 79.27 | 93.39 | 73.17 | 92.15 | 86.59 | | |
| 1 | NasNet | 76.45 | 36.59 | 91.74 | 73.17 | 95.04 | 68.29 | 91.74 | 89.02 | | |
| 3 | ResNet-50 | 88.43 | 18.29 | 96.28 | 73.17 | 96.69 | 68.29 | 95.87 | 80.49 | | |
| 3 | MobileNet | 82.64 | 32.93 | 95.45 | 75.61 | 94.63 | 75.61 | 92.15 | 86.59 | | |
| 3 | NasNet | 84.30 | 30.49 | 93.39 | 76.83 | 95.87 | 68.29 | 91.74 | 89.02 | | |
| 5 | ResNet-50 | 95.04 | 10.98 | 95.04 | 71.95 | 96.69 | 68.29 | 94.21 | 80.49 | | |
| 5 | MobileNet | 83.88 | 28.05 | 96.28 | 75.61 | 95.04 | 79.27 | 92.15 | 86.59 | | |
| 5 | NasNet | 87.19 | 26.83 | 94.63 | 71.95 | 95.45 | 69.51 | 92.56 | 87.80 | | |
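Below is a minimal sketch of how the class-wise accuracies in the table above could be computed, assuming they are per-class recall of the KNN predictions (so the Case column corresponds to sensitivity and the Control column to specificity); the toy arrays are illustrative, not the study's predictions.

```python
import numpy as np

def classwise_accuracy(y_true, y_pred):
    """Per-class accuracy: for each class, the share of its images that the
    KNN classifier labels correctly."""
    return {cls: float(np.mean(y_pred[y_true == cls] == cls))
            for cls in np.unique(y_true)}

# Toy example (not the study's predictions): 8 Control and 2 Case images.
y_true = np.array(["Control"] * 8 + ["Case"] * 2)
y_pred = np.array(["Control"] * 7 + ["Case"] * 3)
print(classwise_accuracy(y_true, y_pred))  # {'Case': 1.0, 'Control': 0.875}
```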
Mean K-Precision on the hold-out test set. Network-wise best results are bold-faced; blank cells are values not preserved in this record.
| K | Network | Contrastive | N-Pair Embedding | Batch-hard |
|---|---|---|---|---|
| 1 | ResNet-50 | 0.8964 | 0.8962 | |
| 1 | MobileNet | 0.8885 | 0.8845 | |
| 1 | NasNet | 0.8961 | 0.8514 | |
| 3 | ResNet-50 | 0.8961 | 0.8962 | |
| 3 | MobileNet | 0.8877 | 0.8845 | |
| 3 | NasNet | 0.8959 | 0.8514 | |
| 5 | ResNet-50 | 0.8959 | 0.8962 | |
| 5 | MobileNet | 0.8890 | 0.8845 | |
| 5 | NasNet | 0.8958 | 0.8514 | |
N-Precision on the hold-out test set. The best-performing DML model is chosen based on the average of the Case and Control Mean N-Precision; the network-wise best results are bold-faced; blank cells are values not preserved in this record.
| Network | Contrastive: Control | Contrastive: Case | N-Pair Embedding: Control | N-Pair Embedding: Case | Batch-hard: Control | Batch-hard: Case |
|---|---|---|---|---|---|---|
| ResNet-50 | 0.9664 | 0.3951 | 0.9645 | 0.7014 | | |
| MobileNet | 0.9549 | 0.5061 | 0.9630 | 0.7365 | | |
| NasNet | 0.9575 | 0.5227 | 0.9643 | 0.7365 | | |
Comparison with the state of the art on the hold-out test data: overall and age-stratified comparison of the best DML model with Faster RCNN [8]. For each ground-truth class (Case, Control), the columns give the number of women assigned each predicted class, the column percentage, and its 95% CI (exact binomial). The age-stratified analysis excludes nine (9) women whose ages are missing. (A check of the exact-binomial CIs is sketched after the table.)
| Age group | Method | Predicted class | Case (n) | Case col% | Case 95% CI | Control (n) | Control col% | Control 95% CI |
|---|---|---|---|---|---|---|---|---|
| All | Faster-RCNN | Case | 71 | 83.5% | 73.9% - 90.7% | 1392 | 17.0% | 16.2% - 17.9% |
| All | Faster-RCNN | Control | 14 | 16.5% | 9.3% - 26.1% | 6782 | 83.0% | 82.1% - 83.8% |
| All | BH-NasNet-1-NN | Case | 71 | 83.5% | 73.9% - 90.7% | 1213 | 14.8% | 14.1% - 15.6% |
| All | BH-NasNet-1-NN | Control | 14 | 16.5% | 9.3% - 26.1% | 6961 | 85.2% | 84.4% - 85.9% |
| <25 | Faster-RCNN | Case | 15 | 78.9% | 54.4% - 93.9% | 212 | 22.9% | 20.3% - 25.8% |
| <25 | Faster-RCNN | Control | 4 | 21.1% | 6.1% - 45.6% | 712 | 77.1% | 74.2% - 79.7% |
| <25 | BH-NasNet-1-NN | Case | 14 | 73.7% | 48.8% - 90.9% | 220 | 23.8% | 21.1% - 26.7% |
| <25 | BH-NasNet-1-NN | Control | 5 | 26.3% | 9.1% - 51.2% | 704 | 76.2% | 73.3% - 78.9% |
| 25-49 | Faster-RCNN | Case | 43 | 95.6% | 84.9% - 99.5% | 800 | 16.0% | 15.0% - 17.0% |
| 25-49 | Faster-RCNN | Control | 2 | 4.4% | 0.5% - 15.1% | 4210 | 84.0% | 83.0% - 85.0% |
| 25-49 | BH-NasNet-1-NN | Case | 41 | 91.1% | 78.8% - 97.5% | 692 | 13.8% | 12.9% - 14.8% |
| 25-49 | BH-NasNet-1-NN | Control | 4 | 8.9% | 2.5% - 21.2% | 4318 | 86.2% | 85.2% - 87.1% |
| >50 | Faster-RCNN | Case | 12 | 60.0% | 36.1% - 80.9% | 380 | 17.0% | 15.5% - 18.6% |
| >50 | Faster-RCNN | Control | 8 | 40.0% | 19.1% - 63.9% | 1852 | 83.0% | 81.4% - 84.5% |
| >50 | BH-NasNet-1-NN | Case | 15 | 75.0% | 50.9% - 91.3% | 301 | 13.5% | 12.1% - 15.0% |
| >50 | BH-NasNet-1-NN | Control | 5 | 25.0% | 8.7% - 49.1% | 1931 | 86.5% | 85.0% - 87.9% |
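The caption above states that the 95% CIs use the exact binomial method. As a sanity check, SciPy's Clopper-Pearson interval applied to the 71-of-85 Case agreements in the "All" row should land close to the reported 73.9% - 90.7%; the choice of SciPy is an assumption, not the authors' tooling.

```python
from scipy.stats import binomtest

# Exact (Clopper-Pearson) 95% CI for 71 correctly flagged Case women out of 85.
ci = binomtest(k=71, n=85).proportion_ci(confidence_level=0.95, method="exact")
print(f"71/85 = {71/85:.1%}, 95% CI {ci.low:.1%} - {ci.high:.1%}")
# expected to be close to the table's 83.5% (73.9% - 90.7%)
```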
Agreement between the best DML model (BH-NasNet-1-NN) and Faster RCNN [8] on the hold-out test data: cross-tabulation of their predictions, with overall and age-stratified Kappa statistics. The age-stratified analysis excludes nine (9) women whose ages are missing. (The Kappa computation is sketched after the table.)
| Age group | Faster RCNN prediction | BH-NasNet-1-NN: Control | BH-NasNet-1-NN: Case | Kappa |
|---|---|---|---|---|
| All ages | Control | 6613 | 183 | 0.76 |
| All ages | Case | 362 | 1101 | |
| <25 | Control | 676 | 40 | 0.79 |
| <25 | Case | 33 | 194 | |
| 25-49 | Control | 4120 | 92 | 0.78 |
| 25-49 | Case | 202 | 641 | |
| >50 | Control | 1809 | 51 | 0.70 |
| >50 | Case | 127 | 265 | |
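The Kappa values can be reproduced directly from the cross-tabulated counts; below is a short sketch for the "All ages" block (plain NumPy, not the authors' code).

```python
import numpy as np

def kappa_from_table(table):
    """Cohen's Kappa from a 2x2 agreement table
    (rows: Faster RCNN predictions, columns: BH-NasNet-1-NN predictions)."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    p_obs = np.trace(t) / n                           # observed agreement
    p_exp = (t.sum(axis=1) @ t.sum(axis=0)) / n ** 2  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

# "All ages" block of the table above
print(round(kappa_from_table([[6613, 183], [362, 1101]]), 2))  # -> 0.76
```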