Performance of A Convolutional Neural Network in Screening Liquid Based Cervical Cytology Smears.

Parikshit Sanyal¹, Sanghita Barui¹, Prabal Deb², Harish Chander Sharma³.

Abstract

CONTEXT: Cervical cancer is the second most common cancer in women. Liquid based cervical cytology (LBCC) is a useful tool for screening cervical cancer.
AIMS: To train a convolutional neural network (CNN) to identify abnormal foci from LBCC smears.
SETTINGS AND DESIGN: A retrospective study design was chosen, using archived smears of patients undergoing screening for cervical cancer by LBCC.
MATERIALS AND METHODS: A total of 2816 images, each of 256 × 256 pixels, were prepared from microphotographs of these LBCC smears, comprising 816 “abnormal” foci (low grade or high grade squamous intraepithelial lesion) and 2000 “normal” foci (benign epithelial cells and reactive changes). The images were split into three sets: Training, Testing, and Evaluation. A convolutional neural network (CNN) was developed in the Python programming language. The CNN was trained with the Training dataset; performance was assayed concurrently with the Testing dataset. Two CNN models were developed, after 20 and 10 epochs of training, respectively. The models were then run on the Evaluation dataset.
STATISTICAL ANALYSIS USED: A contingency table was prepared from the original image labels and the labels predicted by the CNN.
RESULTS: Combined assessment of both models yielded a sensitivity of 95.63% in detecting abnormal foci, with 79.85% specificity. The negative predictive value was high (99.19%), suggesting potential utility in screening. False positives due to overlapping cells, neutrophils, and debris were the principal difficulty met during evaluation.
CONCLUSIONS: The CNN shows promise as a screening tool; however, for its use in confirmatory diagnosis, further training with a more diverse dataset will be required.

Keywords: Artificial intelligence; cervical cytology; liquid based smears; neural network; screening

Year: 2019 PMID: 31359913 PMCID: PMC6592125 DOI: 10.4103/JOC.JOC_201_18

Source DB: PubMed Journal: J Cytol ISSN: 0970-9371 Impact factor: 1.000

INTRODUCTION

Cancer of the cervix uteri is the second most common cancer among women worldwide.[1] In India, the incidence rate of cervical cancer is 14.7/100000 women per year, making it the second most common cancer in Indian women.[2] Liquid based cervical cytology (LBCC) has emerged as the standard of care for screening cervical cancer, owing to its higher sensitivity and specificity than conventional smears, reduced artifacts such as blood, mucus, and other debris in the slide, a reduced rate of false negative diagnoses, and a reduction in inadequate smears.[3,4,5] Screening LBCC smears for abnormal cells is a demanding and labor intensive task requiring a significant investment in man hours.[6] Thus, there has been interest in automated screening of cervical smears.[7] Presently, the Focal Point Slide Profiler (FSPS)® (BD Tripath Imaging) and the ThinPrep Imaging System® have been approved by the FDA as primary screening tools for cervical cytology smears.[8,9,10,11,12] However, both of these systems are closed source and proprietary, are tightly coupled to their respective devices, and require staining methods directed by the company. There is a need for a generic image analysis tool for screening LBCC smears.

Artificial neural networks (ANNs) are systems of linear algebra that mimic the way the brain computes information. An ANN calibrates its coefficients so as to perform a certain task, e.g., pattern recognition. Thus, it has the ability to build up its own rules, referred to as “experience”.[13] Convolutional neural networks (CNNs) are a special class of ANNs which take a whole image as input and classify it into defined categories. The input image is passed through multiple “layers” in a feed forward manner, each layer comprising multiple, independent, linear convolutional filters.[14] Each layer passes its output to the next layer, with an overlaid nonlinearity. The network is trained by back propagation, i.e., adjusting the coefficients of the linear equations in each layer until the desired output is achieved. Further details about CNNs, the significance of each of their components, and how they perform image classification have been described by Karpathy et al.[15]

In the present study, we have chosen a CNN to identify foci of abnormality from LBCC smears. The smears were first classified as per the Bethesda System 2014, and then categorised in two broad strata:[16]
“normal”: comprising the broad group “Negative for intraepithelial lesion or malignancy” (NILM), including normal epithelial cells, reactive and inflammatory changes, and infections
“abnormal”: comprising low grade squamous intraepithelial lesions (LSIL) and high grade squamous intraepithelial lesions (HSIL)
The objective of the present study was to train the CNN to identify foci of HSIL and LSIL and report them as abnormal. Subsequently, the foci are displayed to the pathologist for further evaluation. Effectively, the CNN would perform the role of a slide screener for LBCC smears.

SUBJECTS AND METHODS

Cases undergoing screening for cervical cancer by LBCC smears were selected from a tertiary care hospital. A total of 36 LBCC smears were prepared. The BD SurePath™ technique and instrument were used for slide preparation.[17] A total of 89 images from “abnormal” smears (HSIL and LSIL) and 462 images from “normal” smears were captured using a Nikon microphotography system. The various categories of diagnoses are shown in Table 1.

Table 1

Distribution of LBCC images by diagnostic category as per Bethesda System 2014 (n=551)

Category	“Normal”	“Normal”	“Abnormal”	“Abnormal”
Diagnosis by Bethesda System	Normal cellular elements only OR reactive cellular changes associated with inflammation	Shift in flora suggestive of bacterial vaginosis	HSIL	LSIL
Number of cases (smears)	22	6	4	4
Number of microphotographs	392	70	58	31

Each image was of 1280 × 720 pixels resolution. Images were systematically sliced by the ImageMagick™ command line tool into foci of 256 × 256 pixels.[18] A single image produced multiple such foci. After slicing, blurred foci with indistinct features were manually removed. A total of 2816 foci, each of 256 × 256 resolution, were selected for training and evaluation. The entire set (N = 2816) was split into random sets as shown in Table 2. No duplication was allowed between the sets.
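
The slicing geometry can be illustrated with a short sketch. The study used the ImageMagick command line tool, and the exact command is not given; the pure Python function below only computes the crop boxes of a regular grid. It assumes non-overlapping tiles and discards the partial strip at the image edge, neither of which is stated in the text. The name `tile_boxes` is hypothetical.

```python
def tile_boxes(width, height, tile=256):
    """Return (left, upper, right, lower) boxes for full, non-overlapping tiles.

    Assumption: edge strips narrower than `tile` are discarded.
    """
    boxes = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            boxes.append((left, top, left + tile, top + tile))
    return boxes


boxes = tile_boxes(1280, 720)
print(len(boxes))            # number of full 256 x 256 tiles per image
print(boxes[0], boxes[-1])   # first and last crop box
```

Under these assumptions a 1280 × 720 image yields 10 full tiles (5 across, 2 down), which would explain how a few hundred microphotographs produced several thousand candidate foci before blurred ones were removed.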

Table 2

Splitting the image data in training, testing and evaluation categories (n=2816)

	Abnormal	Normal	Total
Training	410	410	820
Testing	200	200	400
Evaluation	206	1390	1596
Total	816	2000	2816
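
A minimal sketch of a split with the sizes in Table 2 follows. The source does not describe how the random split was implemented, so the approach here (shuffle once per class, then cut into consecutive groups) and the names `split_indices` and `seed` are assumptions for illustration. Cutting a single shuffled list guarantees that no focus appears in more than one set.

```python
import random


def split_indices(n_items, sizes, seed=0):
    """Shuffle item indices once and cut them into disjoint groups."""
    if sum(sizes.values()) != n_items:
        raise ValueError("split sizes must add up to the number of items")
    order = list(range(n_items))
    random.Random(seed).shuffle(order)  # fixed seed: reproducible split
    splits, start = {}, 0
    for name, size in sizes.items():
        splits[name] = order[start:start + size]
        start += size
    return splits


# Per-class sizes taken from Table 2.
abnormal = split_indices(816, {"training": 410, "testing": 200, "evaluation": 206})
normal = split_indices(2000, {"training": 410, "testing": 200, "evaluation": 1390})

print({name: len(idx) for name, idx in abnormal.items()})
print({name: len(idx) for name, idx in normal.items()})
```

Splitting each class separately keeps the Training and Testing sets balanced (410 + 410 and 200 + 200) while leaving the remaining, mostly normal, foci for Evaluation.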

A CNN was developed with the Python language, using the Keras platform.[19] The training method published by Chollet et al. was adopted.[20] The architecture of the network is shown in Figure 1. The network takes an image of size 3 × 256 × 256 (the three channels red, green and blue are represented separately in a color image) and applies successive convolution and pooling layers until an output of 0 or 1 is produced.
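
To show how successive convolution and pooling layers reduce an image toward a single output, the sketch below tracks the spatial size of a 256 × 256 input. The layer settings are assumptions chosen for illustration (three blocks, 3 × 3 convolutions without padding, 2 × 2 pooling); the actual configuration is the one in Figure 1, which is not reproduced here.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Output width of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2):
    """Output width of non-overlapping pooling along one dimension."""
    return size // window


def trace_sizes(size=256, blocks=3):
    """Spatial size after each assumed convolution + pooling block."""
    sizes = [size]
    for _ in range(blocks):
        size = pool_out(conv_out(size))
        sizes.append(size)
    return sizes


print(trace_sizes())
```

Each block roughly halves the width and height, so a few blocks turn the image into a small feature map that final layers can reduce to one number.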

Figure 1

Architecture of the Convolutional Neural Network

The CNN was trained on the training set twice: once with 20 epochs (Model A) and once with 10 epochs (Model B). Thus, two different learning models were prepared. In each epoch, 500 batches of 16 images each were randomly selected by the network for training. The network self-calibrated its parameters over the period of training. Concurrent testing was carried out during training on the “Testing” dataset to keep track of learning by the CNN, as seen in Figure 2.
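
The training volume implied by these figures can be worked out directly. The sketch below uses only the numbers stated in the text (500 batches of 16 images per epoch, 820 training images); the function name `images_drawn` is hypothetical.

```python
BATCHES_PER_EPOCH = 500
BATCH_SIZE = 16
TRAINING_IMAGES = 820  # Training row of Table 2


def images_drawn(epochs):
    """Total number of images presented to the network over `epochs`."""
    return epochs * BATCHES_PER_EPOCH * BATCH_SIZE


per_epoch = images_drawn(1)
print(per_epoch)                                   # images per epoch
print(f"{per_epoch / TRAINING_IMAGES:.1f}")        # draws per training image per epoch
print(images_drawn(20), images_drawn(10))          # Model A, Model B
```

Because one epoch draws far more images than the Training set contains, each focus must have been presented several times per epoch; how the repeats were generated is not stated in the text.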

Figure 2

Training of the CNN (Model A)

The accuracy on the Testing set gradually increased with training, as seen in Figure 3. In Model A (20 epochs), the accuracy peaked after 10 epochs and then settled at 92.25%; the loss function (error rate) stabilized at 0.4. In Model B (10 epochs), a higher accuracy (94.75%) and a lower error rate (0.23) were observed. This might be attributed to the learning rate parameter of the CNN, which causes the accuracy to decrease after reaching a peak in the middle of training. However, both models were preserved for evaluation.

Figure 3

Accuracy and loss function of two models plotted against epochs of training

The trained models were then run on the Evaluation dataset. Results were statistically interpreted with the R software package.[21]

RESULTS

Concurrent testing during training yielded the following results [Table 3]. In Model A, 92.25% diagnostic accuracy was achieved after 20 epochs of training. 13 (6.5%) foci were falsely labeled as “abnormal” by the CNN [Figure 4]. In Model B, the false positive rate was slightly higher (7%) during training, but the accuracy was also higher (94.75%), due to the reduced false negative rate (7 out of 200 foci, 3.5%).

Table 3

Results on concurrent testing dataset (n=400)

Labeled by CNN	Model A: originally Normal	Model A: originally Abnormal	Model B: originally Normal	Model B: originally Abnormal
Normal	187	18	186	7
Abnormal	13	182	14	193
Total	200	200	200	200

Figure 4

Accuracy and loss function of two models plotted against epochs of training

After completion of training, both models were run on the Evaluation set [Figure 5], with the following results [Table 4].

Figure 5

The CNN predicting labels for the Evaluation set (Model A)

Table 4

Results on evaluation dataset (n=1596) by Model A

Label by CNN	True label: Normal	True label: Abnormal	Total
Normal	1221 True negative (TN)	31 False negative (FN)	1252
Abnormal	169 False positive (FP)	175 True positive (TP)	344
Total	1390	206	1596

Measure	Formula	Calculation	Value
Sensitivity	TP/(TP + FN)	175/206	84.95%
Specificity	TN/(TN + FP)	1221/1390	87.84%
Positive predictive value	TP/(TP + FP)	175/344	50%
Negative predictive value	TN/(TN + FN)	1221/1252	97.54%
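
The four measures in Table 4 can be recomputed from its contingency counts with a few lines of Python. This is a sketch using the formulas given in the table; the function name `screening_metrics` is hypothetical, and the study itself used R for the statistical analysis.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard screening measures from a 2 x 2 contingency table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive predictive value": tp / (tp + fp),
        "negative predictive value": tn / (tn + fn),
    }


# Counts for Model A on the Evaluation set (Table 4).
model_a = screening_metrics(tp=175, fp=169, tn=1221, fn=31)

for name, value in model_a.items():
    print(f"{name}: {value:.2%}")
```

Computed values can differ in the last digits from the rounded percentages printed in the table. The same function applies unchanged to the counts of the other evaluation tables.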

Among 1390 images of the “normal” class, 169 images (12.2%) were falsely labeled “abnormal” by Model A. Also, 31 out of 206 images of the “abnormal” class (15%) were missed by the CNN and falsely labeled “normal”. Model B showed higher sensitivity (95.10%) and is thus more suitable for screening; however, its specificity was lower than that of Model A [Table 5]. A combination analysis of both models (i.e., a focus is diagnosed abnormal if it is labeled “abnormal” by either of the two models) showed 95.63% sensitivity [Table 6]. 197 out of 206 abnormal foci were correctly labeled “abnormal” by the combined model.

Table 5

Results on evaluation dataset (n=1596) by Model B

Label by CNN	True label: Normal	True label: Abnormal	Total
Normal	1123 True negative (TN)	10 False negative (FN)	1133
Abnormal	267 False positive (FP)	196 True positive (TP)	463
Total	1390	206	1596

Measure	Formula	Calculation	Value
Sensitivity	TP/(TP + FN)	196/206	95.10%
Specificity	TN/(TN + FP)	1123/1390	80.80%
Positive predictive value	TP/(TP + FP)	196/463	42.30%
Negative predictive value	TN/(TN + FN)	1123/1133	99.10%

Table 6

Results on evaluation dataset by combinatorial analysis of two models

Combined label by two models	True label: Normal	True label: Abnormal	Total
Normal	1110 True negative (TN)	9 False negative (FN)	1119
Abnormal	280 False positive (FP)	197 True positive (TP)	477
Total	1390	206	1596

Measure	Formula	Calculation	Value
Sensitivity	TP/(TP + FN)	197/206	95.63%
Specificity	TN/(TN + FP)	1110/1390	79.85%
Positive predictive value	TP/(TP + FP)	197/477	41.29%
Negative predictive value	TN/(TN + FN)	1110/1119	99.19%
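
The combination rule (a focus is called abnormal if either model labels it abnormal) can be sketched as follows. The label lists are made-up toy data, not study results, and the function name `combine_either` is hypothetical.

```python
def combine_either(labels_a, labels_b):
    """Call a focus 'abnormal' if either model does, otherwise 'normal'."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both models must label the same foci")
    return [
        "abnormal" if "abnormal" in (a, b) else "normal"
        for a, b in zip(labels_a, labels_b)
    ]


# Toy predictions for four foci (illustrative only).
model_a = ["normal", "abnormal", "normal", "normal"]
model_b = ["normal", "normal", "abnormal", "normal"]

print(combine_either(model_a, model_b))
```

Because this rule can only add “abnormal” calls, the combined model has at least as many true positives and false positives as either model alone. That matches the tables: 197 true positives against 175 and 196, and 280 false positives against 169 and 267.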

DISCUSSION

Most of the approaches to automated image analysis have focused on cell segmentation, which has remained an elusive problem.[22] Early approaches to segmentation included geometric image analysis techniques such as mean-shift, median filtering, adaptive thresholding, Canny edge detection, edge detection by Riemannian dilatation, and Hough transform for finding candidate nuclei.[23,24,25] Doudkine et al. approached the segmentation problem by analyzing texture features from slides stained with quantitative stains for DNA. They used descriptive statistics of chromatin distribution, discrete texture features, ranges, Markovian, run length and fractal texture features for image classification.[26] More recently, Rodenacker et al. prepared a set of parameters for extracting features from cytological images.[27] A geometrical segmentation approach was also used by Anderson et al., who reported a sensitivity of 95% for severe dysplasia and 90% for moderate dysplasia. A combination of nuclear texture features was found which could reliably classify the images.[28] Good segmentation of LBCC smears was achieved by Zhang et al.; their segmentation method achieved 93% accuracy for cytoplasm, and 87.3% F-measure for nuclei.[29]

Neural networks provide an alternative approach to the problem. These networks process an image in its entirety and produce an output. After repeated epochs of reinforcement training, the network adjusts its parameters to produce the correct result in the majority of cases. Thus, the segmentation problem is bypassed.
A very early neural network was the PAPNET system, developed in the 1990s.[30] Over the last two decades, the convolutional neural network (CNN) model has proved to be a reliable image classifier in several scenarios, including recognizing everyday objects, traffic signs, text, and handwritten numbers.[31] The CNN model was chosen for this study because of its consistently superior performance compared with other machine learning models.[32,33,34] The CNN extracts features from an image in its successively deeper layers [Figure 6], until the image is converted to a single number: “0”, which corresponds to “abnormal”, or “1”, corresponding to “normal”.

Figure 6

Intermediate layers of the network, producing a final label

The principal difficulty met in this study was that of false positives, unlike Anderson et al. 280 out of 1390 normal foci (20%) were marked “abnormal” by the combined model. The false positives observed in this study might be attributable to overfitting, i.e., training to both the “signal” (abnormal cells) and the “noise” (artifacts) in the Training data. In a few cases, overlapping cells produced hyperchromasia in the image, which was wrongly labeled as “abnormal” by the CNN [Figure 7]. Also, we randomly sliced the images, so that neutrophils, background debris and hemorrhage were all included in the evaluation set. A few of these foci were falsely marked positive [Figure 4], indicating the need for further training.

Figure 7

Foci containing neutrophils falsely labeled as “abnormal” by the CNN

The high sensitivity of the CNN in picking up abnormal foci (95.63%) makes it suitable for screening. To improve the specificity so that the CNN becomes useful for diagnostic purposes, further training with a larger dataset will be required. Presently, the CNN is not useful for confirmatory diagnosis, because of its very low positive predictive value. However, the high negative predictive value (99.19%) indicates its potential for use in screening. With an automated slide preparation system and a microphotography and slide scanning system, the CNN can provide reproducible results and become a useful screening tool.

CONCLUSION

The present study demonstrates the performance characteristics of a convolutional neural network in screening liquid based cervical smears. Training with a larger and more diverse dataset will be required before it can be employed for confirmatory diagnosis.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

16 in total

Review 1. Accuracy of the Papanicolaou test in screening for and follow-up of cervical cytologic abnormalities: a systematic review.

Authors: K Nanda; D C McCrory; E R Myers; L A Bastian; V Hasselblad; J D Hickey; D B Matchar
Journal: Ann Intern Med Date: 2000-05-16 Impact factor: 25.391

2. Liquid-based cervical cytologic smear study and conventional Papanicolaou smears: a metaanalysis of prospective studies comparing cytologic diagnosis and sample adequacy.

Authors: S J Bernstein; L Sanchez-Ramos; B Ndubisi
Journal: Am J Obstet Gynecol Date: 2001-08 Impact factor: 8.661

Review 3. The FocalPoint System: FocalPoint slide profiler and FocalPoint GS.

Authors: Thomas F Kardos
Journal: Cancer Date: 2004-12-25 Impact factor: 6.860

4. Segmentation of cervical cell nuclei in high-resolution microscopic images: A new algorithm and a web-based software framework.

Authors: Christoph Bergmeir; Miguel García Silvente; José Manuel Benítez
Journal: Comput Methods Programs Biomed Date: 2012-02-10 Impact factor: 5.428

5. American Society of Cytopathology workload recommendations for automated Pap test screening: developed by the productivity and quality assurance in the era of automated screening task force.

Authors: Tarik M Elsheikh; R Marshall Austin; David F Chhieng; Fern S Miller; Ann T Moriarty; Andrew A Renshaw
Journal: Diagn Cytopathol Date: 2012-02-20 Impact factor: 1.582

6. Performance of ThinPrep liquid-based cervical cytology in comparison with conventionally prepared Papanicolaou smears: a quantitative survey.

Authors: Ovadia Abulafia; John C Pezzullo; David M Sherer
Journal: Gynecol Oncol Date: 2003-07 Impact factor: 5.482

7. Second edition of 'The Bethesda System for reporting cervical cytology' - atlas, website, and Bethesda interobserver reproducibility project.

Authors: Ritu Nayar; Diane Solomon
Journal: Cytojournal Date: 2004-10-21 Impact factor: 2.091

8. Liquid-based cytology for primary cervical cancer screening: a multi-centre study.

Authors: J Monsonego; A Autillo-Touati; C Bergeron; R Dachez; J Liaras; J Saurel; L Zerat; P Chatelain; C Mottot
Journal: Br J Cancer Date: 2001-02-02 Impact factor: 7.640

Review 9. A feature set for cytometry on digitized microscopic images.

Authors: Karsten Rodenacker; Ewert Bengtsson
Journal: Anal Cell Pathol Date: 2003 Impact factor: 2.916

10. Does the ThinPrep Imaging System increase the detection of high-risk HPV-positive ASC-US and AGUS? The Women and Infants Hospital experience with over 200,000 cervical cytology cases.

Authors: M Rudhul Quddus; Theresa Neves; Mary E Reilly; Margaret M Steinhoff; C James Sung
Journal: Cytojournal Date: 2009-08-06 Impact factor: 2.091
