
Application of deep learning to the classification of images from colposcopy.

Masakazu Sato, Koji Horie, Aki Hara, Yuichiro Miyamoto, Kazuko Kurihara, Kensuke Tomio, Harushige Yokota.

Abstract

The objective of the present study was to investigate whether deep learning could be applied successfully to the classification of images from colposcopy. For this purpose, a total of 158 patients who underwent conization were enrolled, and medical records and data from the gynecological oncology database were retrospectively reviewed. Deep learning was performed with the Keras neural network and TensorFlow libraries. Using preoperative images from colposcopy as the input data and deep learning technology, the patients were classified into three groups [severe dysplasia, carcinoma in situ (CIS) and invasive cancer (IC)]. A total of 485 images were obtained for the analysis, of which 142 images were of severe dysplasia (2.9 images/patient), 257 were of CIS (3.3 images/patient), and 86 were of IC (4.1 images/patient). Of these, 233 images were captured with a green filter, and the remaining 252 were captured without a green filter. Following the application of L2 regularization, L1 regularization, dropout and data augmentation, the accuracy of the validation dataset was ~50%. Although the present study is preliminary, the results indicated that deep learning may be applied to classify colposcopy images.

Keywords:  artificial intelligence; cervical cancer; cervical intraepithelial neoplasia; colposcopy; deep learning

Year:  2018        PMID: 29456725      PMCID: PMC5795879          DOI: 10.3892/ol.2018.7762

Source DB:  PubMed          Journal:  Oncol Lett        ISSN: 1792-1074            Impact factor:   2.967


Introduction

In the era of big data, various types of data can be obtained and shared by all users through the internet or social network systems. One method of efficiently managing enormous amounts of data is to apply deep learning (1). One characteristic of deep learning is that this approach does not require features or representations to be selected during data input. Recently, a number of studies have focused on image classification by deep learning, and these technologies are now becoming readily available for use by corporations and individuals (2–6). For example, TensorFlow is a Google software library for machine learning that was released under an open-source license in 2015 (7). Using this technology, the present study aimed to integrate deep learning into gynecological clinical practice.

Cervical cancer is a leading cause of death in women worldwide (8). Although mortality rates were drastically reduced following the introduction of the Pap smear test, determining the types of patients who should be further screened and treated as high-risk remains an important issue, particularly for avoiding overmedication (9–11). In daily practice, patient management is determined by the combined use of cytology, histology, HPV typing and colposcopy results. The present study investigated whether deep learning, with preoperative colposcopy images as input, could predict the postoperative diagnosis.

Colposcopy is a well-established tool for observing the cervix at up to ×10 magnification (12). Cervical intraepithelial lesions are enhanced and easily recognized when treated with acetic acid solutions. For instance, areas that turn white following acetic acid treatment (acetowhitening) and/or areas that present abnormal vascular patterns are considered for biopsy. These effects become more visible after a green filter is applied (13). Diagnoses are then made by gynecologists based on the degree of staining and the underlying vascular patterns. Previous studies have attempted to classify images from colposcopy using neural networks (14–16). For instance, one group investigated whether neural networks could recognize the dot pattern, a colposcopy finding, after learning the pattern from samples annotated by the researchers (16). The present study is distinct from the aforementioned studies because features or representations of the images, for example the presence of this dot pattern, were not selected during data input.

For deep learning, the Keras neural network and TensorFlow libraries were used (7,17). In the present study, the classification accuracy on the validation dataset reached ~50%. While this result in itself is not satisfactory, it suggests that deep learning has the potential to classify images from colposcopy. In addition, the present study investigated methods to improve the learning rate and avoid overfitting under the limitation of an insufficient number of obtained images: L2 regularization, L1 regularization and dropout were applied, and the amount of input data was increased via data augmentation. The intention was not to stress the accuracy rate itself but rather to demonstrate that gynecologists, who are not specialists in artificial intelligence or machine learning, may be able to utilize deep learning in clinical practice. Furthermore, the present results suggest that relevant information from clinical practice should be appropriately stored for future use.

Materials and methods

Patients

The present study was approved by the Institutional Ethics Committee of Saitama Cancer Center (approval no. 630). Written informed consent was obtained from all the patients. Medical records and data from the gynecological oncology database were retrospectively reviewed. Patients who underwent conization at Saitama Cancer Center (Ina, Japan) between January 2014 and December 2015 were enrolled. Conization management at the facility is determined according to the guidelines of the Japan Society of Obstetrics and Gynecology. Although each diagnosis was in principle based on the postoperative pathology (conization), the preoperative pathology (biopsy) was prioritized when its results were more severe, and the resulting diagnosis was used as the output (the 'target' in deep learning) for the images from colposcopy. A total of 158 patients were enrolled; their median age was 39 years (range, 21–63 years; Fig. 1A). The diagnoses and corresponding patient numbers were as follows: severe dysplasia, 49; carcinoma in situ (CIS), 78; invasive cancer (IC), 21; and others (such as adenocarcinoma in situ and invasive adenocarcinoma), 10. In the current study, patient classification was limited to three groups (severe dysplasia, CIS and IC) because of the limited number of available images.
Figure 1.

Patient characteristics and collected images. (A) Distribution of patient age. The median age of the patients was 39 years (range, 21–63 years). (B) Preprocessing of images. Images of 640×480 pixels were trimmed to 300×300 pixels using Photoshop CC. (C) Patterns of collected images. Representative images for each group are shown. Group 1, images without a green filter that captured >two-thirds of the cervix; Group 2, images without a green filter that captured <two-thirds of the cervix; Group 3, images with a green filter that captured >two-thirds of the cervix; and Group 4, images with a green filter that captured <two-thirds of the cervix. (D) Percentages of groups 1–4 for each diagnosis.

Images

Preoperative images from colposcopy were used as the input data for deep learning. Because this investigation was a retrospective study, there were no criteria for determining the number and type of colposcopy images to retain. Images captured following acetic acid treatment, with or without a green filter, that represented the biopsied areas and were used in the diagnoses had been stored. The total number of images was 485, with 142 images for severe dysplasia (2.9 images/patient), 257 for CIS (3.3 images/patient) and 86 for IC (4.1 images/patient). Of these, 233 images were captured with a green filter, and the remaining 252 were captured without a green filter. Images from colposcopy captured at our facility were stored in PNG format at a resolution of 640×480 pixels in RGB 3-channel color. These raw images often included areas inappropriate and unwanted for deep learning, such as the Cusco speculum and the vaginal wall; therefore, preprocessing was performed to focus on the cervix by trimming the images to 300×300 pixels (Fig. 1B). Trimming was performed with Photoshop CC (Adobe Systems, Inc., San Jose, CA, USA). Images without a green filter that captured >two-thirds or <two-thirds of the cervix were assigned to groups 1 and 2, respectively, and images with a green filter that captured >two-thirds or <two-thirds of the cervix were assigned to groups 3 and 4, respectively (Fig. 1C).
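
The trimming itself was performed by hand in Photoshop; for readers who prefer to script this step, a minimal Python sketch of an equivalent center crop is shown below. Pillow is assumed, and the directory names are hypothetical; this is an illustration, not the authors' procedure.

import os
from PIL import Image

SRC_DIR = "colposcopy_raw"      # hypothetical folder of 640x480 PNG captures
DST_DIR = "colposcopy_cropped"  # hypothetical output folder of 300x300 crops

os.makedirs(DST_DIR, exist_ok=True)

for name in os.listdir(SRC_DIR):
    if not name.lower().endswith(".png"):
        continue
    img = Image.open(os.path.join(SRC_DIR, name)).convert("RGB")
    w, h = img.size                            # expected to be 640x480
    left, top = (w - 300) // 2, (h - 300) // 2
    # A fixed center crop; the authors cropped manually so that the cervix,
    # rather than the speculum or vaginal wall, fills the frame.
    img.crop((left, top, left + 300, top + 300)).save(
        os.path.join(DST_DIR, name))
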
Figure 2.

Application of deep learning. (A) Neural network architecture. Tuning the dense layers appeared to improve the learning rates. (B) Influence of the number and types of images on learning rates. The validation accuracy was reduced irrespective of the presence (lower right) or absence (lower left) of a green filter. The total number of images was likely more important for increasing the validation accuracy, at least in this small-scale study (upper). Blue line, training-accuracy curve; green line, validation-accuracy curve.

Deep learning

We used the Keras (https://keras.io) neural network library and the TensorFlow (https://www.tensorflow.org) software library. The code frequently referred to the Keras blog (https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html), and the basic code was adapted for the learning procedure of the present study. The validation dataset contained 25 randomly selected images for each diagnosis (75 images in total) and was not used for training unless otherwise mentioned.
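
As a rough illustration of the described split (25 randomly selected images per diagnosis held out for validation), a hedged Python sketch follows. The class folder names are hypothetical and this is not the authors' code.

import os
import random
import shutil

CLASSES = ["severe_dysplasia", "cis", "ic"]  # hypothetical folder names
random.seed(0)                               # reproducible random selection

for cls in CLASSES:
    files = sorted(os.listdir(os.path.join("data", cls)))
    val = set(random.sample(files, 25))      # 25 validation images per class
    for split in ("train", "validation"):
        os.makedirs(os.path.join(split, cls), exist_ok=True)
    for f in files:
        split = "validation" if f in val else "train"
        shutil.copy(os.path.join("data", cls, f), os.path.join(split, cls))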

Development environment

The development environment used in the present study was as follows: a Mac running OS X 10.11.3 (Apple, Inc., Cupertino, CA, USA); Python language v. 2.7.12; Keras 1.1.0; TensorFlow 0.8.0; and matplotlib 1.5.3.

Statistical analysis

JMP Pro 11 (SAS Institute, Inc., Cary, NC, USA) was used for the statistical analysis. One-way analysis of variance was used for comparing the means. The Tukey-Kramer test was used for post-hoc analysis. P<0.05 was considered to indicate a statistically significant difference.
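
The authors performed these tests in JMP Pro 11; purely for illustration, an equivalent analysis can be sketched in Python with SciPy and statsmodels. The images-per-patient counts below are hypothetical placeholders, not study data.

from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

severe = [2, 3, 3, 4, 2]   # hypothetical images-per-patient counts
cis    = [3, 4, 3, 2, 5]
ic     = [4, 5, 4, 3, 5]

f_stat, p_value = stats.f_oneway(severe, cis, ic)   # one-way ANOVA
print(f"ANOVA: F={f_stat:.2f}, P={p_value:.4f}")

values = severe + cis + ic
labels = ["severe"] * len(severe) + ["CIS"] * len(cis) + ["IC"] * len(ic)
print(pairwise_tukeyhsd(values, labels))            # Tukey-Kramer post hoc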

Results

Preoperative images from colposcopy were retrospectively collected as described in the Materials and methods section. Statistical analysis indicated that more images were stored for more severe lesions (P=0.0085). The percentages of groups 1–4 were summarized for each diagnosis (Fig. 1D). Unlike the images of severe dysplasia and CIS, the IC images tended to include magnified lesions and were usually captured with a green filter (Fig. 1D).

The total number of images is more important for avoiding overfitting than dividing the input images according to the presence or absence of a green filter

Because the same number of images for each diagnosis was assigned to the validation dataset, a validation accuracy of ~33% corresponds to chance, and an accuracy that did not exceed this level meant that learning did not occur. The convolution layers and dense layers were tuned as described in the Materials and methods section (Fig. 2A). In the present study, the dense layers appeared to affect the learning rates, and a training accuracy exceeding 90% within 100 epochs was obtained by tuning the dense layers. However, the validation accuracy plateaued at ~40–50%, which suggested that overfitting had occurred. Therefore, methods of avoiding overfitting, to prevent discrepancies between the training curve and the validation curve, were explored. First, because the collected set included images both with and without a green filter, the two types were used for learning separately, given possible learning inefficiencies caused by mixed data (Fig. 2B). For this analysis, the validation dataset was re-selected (10 images for each diagnosis, 30 in total), and the results demonstrated that the validation accuracy was reduced regardless of the presence or absence of a green filter. This result was likely related to the reduction in the total number of images. Thus, the total number of images appeared to be more important for increasing the validation accuracy than dividing the input data according to the presence or absence of a green filter, at least in the present small-scale study.
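
The paper does not report the exact architecture beyond Fig. 2A. The following tf.keras sketch, modeled on the Keras blog post cited in the Materials and methods and adapted to three output classes, illustrates the kind of small convolutional network and dense-layer tuning described; the layer sizes and input resolution are assumptions, not the authors' reported settings.

from tensorflow.keras import layers, models

def build_model(input_shape=(150, 150, 3), num_classes=3):
    """Small CNN in the style of the cited Keras blog post (assumed sizes)."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),              # the dense layers
        layers.Dense(num_classes, activation="softmax"),  # were tuned here
    ])
    model.compile(optimizer="rmsprop",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model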

L2 regularization can improve overfitting

To avoid overfitting, L2 regularization, L1 regularization and dropout were applied (Fig. 3). L2 regularization and L1 regularization were applied to the first input layer, and dropout was applied to all the layers after max-pooling (the dropout rate was set to 0.5). L2 regularization appeared to be effective at avoiding overfitting when properly tuned (Fig. 3). L1 regularization caused learning failure at the investigated values, and the application of dropout reduced the learning efficacy, although this result may have been related to the relatively small number of epochs.
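
A hedged sketch of where these measures sit in such a network follows. It uses the current tf.keras API rather than the Keras 1.1.0 the authors used, and the penalty values simply mirror those explored in Fig. 3; everything else is an assumption.

from tensorflow.keras import layers, models, regularizers

model = models.Sequential([
    layers.Input(shape=(150, 150, 3)),
    # Penalty on the first layer only, as described above; the paper found
    # l2 at 0.001 or 0.01 helpful, l2 at 0.02 and l1 at 0.01 harmful.
    layers.Conv2D(32, (3, 3), activation="relu",
                  kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.5),          # dropout after every max-pooling layer
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.5),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.5),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
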
Figure 3.

Exploring methods to avoid overfitting. L2 regularization, L1 regularization and dropout were applied. L2 regularization appeared somewhat effective in avoiding overfitting when set to a value of 0.001 or 0.01, but not at a value of 0.02. L1 regularization set to a value of 0.01 caused learning failure, and the application of dropout lowered the learning efficacy; however, this result may have been caused by the relatively small number of epochs, or it may represent a characteristic of applying dropout. Blue line, training-accuracy curve; green line, validation-accuracy curve.

Data augmentation slightly improves the validation accuracy and overfitting

When performing deep learning, the total number of images is known to be an important factor for improving the learning accuracy and avoiding overfitting, which is generally consistent with the aforementioned results (5). Thus, the hypothesis that data augmentation could improve the accuracy rates was investigated. When viewing colposcopy images, tilted images are occasionally encountered because the angle of the cervix relative to the camera varies; however, differences in angle should not be important for recognizing lesions. To test this hypothesis, 20 images were generated from each single image by randomly rotating it, applying different zoom magnifications and flipping it horizontally, and the resulting images were then used as input. Thus, one image was converted into ~20 images by data augmentation. Examples of the augmentation results are illustrated in Fig. 4A. Data augmentation alone appeared to worsen overfitting; however, the combined application of L2 regularization and dropout slightly improved overfitting (Fig. 4B). The validation accuracy ultimately reached a stable level of ~50%.
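
As an illustration of this kind of augmentation, the following sketch generates ~20 variants of a single image by random rotation, zoom and horizontal flip, using the same ImageDataGenerator utility featured in the Keras blog post cited in the Materials and methods. The parameter values and file name are assumptions, not the authors' settings.

import numpy as np
from tensorflow.keras.preprocessing.image import (ImageDataGenerator,
                                                  img_to_array, load_img)

datagen = ImageDataGenerator(rotation_range=180,   # cervix angle varies
                             zoom_range=0.2,       # mild zoom in/out
                             horizontal_flip=True)

img = img_to_array(load_img("example.png", target_size=(150, 150)))
batch = np.expand_dims(img, axis=0)                # shape (1, 150, 150, 3)

augmented = []
for i, out in enumerate(datagen.flow(batch, batch_size=1)):
    augmented.append(out[0])                       # one augmented image
    if i >= 19:                                    # ~20 images from one
        break
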
Figure 4.

Data augmentation. (A) Examples of data-augmented images. A total of 20 images were obtained from a single image by randomly rotating, zooming in/out and horizontally flipping the image; therefore, one image was increased to ~20 images by data augmentation. (B) Combination of data augmentation, L2 regularization and dropout. Data augmentation alone worsened the overfitting; however, the application of L2 regularization and dropout slightly improved the overfitting. The final validation accuracy reached a stable value of ~50%. Blue line, training-accuracy curve; green line, validation-accuracy curve.

Discussion

The present study investigated whether deep learning could be applied to the classification of images from colposcopy. Various types of data are increasingly available through the internet, and inexpensive high-end smartphones are readily available to the general public, facilitating the uploading and sharing of information such as pictures. The same is true for data processing: High-performance personal computers are affordable for individuals, and statistical analyses or machine learning can be performed without supercomputers if the information volume is limited. Furthermore, deep learning technologies are becoming more accessible to corporations and individuals. For example, the Google software library for machine learning, TensorFlow, was released under an open-source license in 2015 (7). Based on these trends, the present study aimed to apply deep learning to gynecological clinical practice.

Preoperative images from colposcopy were retrospectively collected. A total of 485 images were obtained, with 142 images for severe dysplasia (2.9 images/patient), 257 for CIS (3.3 images/patient) and 86 for IC (4.1 images/patient). These results indicate that more images were stored when the lesions were more severe (P=0.0085), presumably because gynecologists tend to capture more images of such lesions in order to record important findings. Accordingly, the IC images tended to include lesions under greater magnification. Furthermore, these images were more frequently captured with a green filter compared with the severe dysplasia and CIS images (Fig. 1D).

One of the greatest challenges associated with machine learning, including deep learning, is the prevention of overfitting (5). Overfitting is a condition in which the model cannot be applied to unknown data because it has been overly adjusted to the training data. In the present study, the large discrepancy between the training curve and the validation curve suggests that overfitting occurred (Figs. 2–4), most likely due to the small number of included images. Ordinarily, 500–1,000 images are prepared for each class during image classification with deep learning (2). The present study therefore explored methods to improve the learning rate and avoid overfitting under the limitation of an insufficient number of included images: L2 regularization, L1 regularization and dropout were applied, and the amount of input data was increased by data augmentation.

In clinical practice, it would be of interest for clinicians to distinguish CIN1, CIN2 and CIN3, or to distinguish CIN1 (or low-grade squamous intraepithelial lesions) from CIN2 (or high-grade squamous intraepithelial lesions). Furthermore, classification may not be clinically necessary for severe dysplasia and CIS, as there is little difference in diagnosis and treatment between these two conditions. However, for technical reasons, the present preliminary study used images from CIN3 and invasive cancer patients. In terms of deep learning, the output (i.e., the ground-truth classification of an image) is very important. For instance, what is considered CIN1 may not always be 'genuine' CIN1, because only a biopsy is performed in most cases; ideally, conization should be performed to provide the true answer. In addition, ground-truth classifications such as 'white lesion' or 'glandular opening' would not produce a reliable model, because such answers are subject to human perception: There is no strict definition for these terms. As such, for this initial study, images from CIN3 and invasive cancer patients were used, because pathological diagnoses of the conization samples were readily obtained. For clinicians, an improved solution would be to use the patient's prognosis as an output. A clinical application designed to screen out a target group not in need of invasive testing, such as biopsy or conization, would also be desirable.

Therefore, although the clinical significance of the classification into three groups (severe dysplasia, CIS and IC) is currently limited, the present study demonstrated that deep learning, using only images as input, could be applied to classify colposcopy images. The final validation accuracy was ~50%, which is better than a random result (33%). To the best of our knowledge, no report using deep learning for the classification of images from colposcopy exists to date. Although previous studies have attempted automated diagnosis, deep learning was not employed and the patient cohorts differed from that of the current study (CIN3 and invasive cancer) (14–16). As such, the present study cannot be directly compared with the previous literature to evaluate whether the 50% accuracy result was good or poor. However, although this result may not be satisfactory in terms of deep-learning tasks, it may be sufficient in the clinic considering the difficulty of distinguishing among severe dysplasia, CIS and IC, even for experts.

The present study aimed not to stress the accuracy rate itself, but rather to demonstrate that gynecologists, who are not specialists in artificial intelligence or machine learning, can utilize deep learning in clinical practice. The barriers to using artificial intelligence and deep learning will likely decrease in the near future. Thus, as much relevant clinical information as possible should be stored appropriately for future use. For instance, the images used in the present study contained as few as 150×150 pixels in 3 RGB channels; images of this size could be obtained by most users, even those using smartphones.

Our facility is a cancer center, and the patient population may have been biased in terms of disease conditions, because patients who require operations or intensive observation tend to be referred to our hospital; consequently, the collected images could also be biased. Furthermore, as mentioned above, many more images were taken of lesions considered to be more severe; a similar imbalance, with more images stored for apparently severe lesions regardless of the final diagnosis, is likely present at many other facilities.

In conclusion, the current study investigated a method for applying deep learning to colposcopy image classification. The accuracy on the final validation dataset reached ~50%. Although this result is preliminary, it suggests that clinicians and researchers who are not specialists in artificial intelligence or machine learning can utilize deep learning. Furthermore, these findings suggest that as much relevant information from clinical practice as possible, including colposcopy data and images, should be stored for future use.
References (17 in total)

1.  Automatic colposcopy video tissue classification using higher order entropy-based image registration.

Authors:  Juan D García-Arteaga; Jan Kybic; Wenjing Li
Journal:  Comput Biol Med       Date:  2011-09-03       Impact factor: 4.589

2.  A novel optical imaging method for the early detection, quantitative grading, and mapping of cancerous and precancerous lesions of cervix.

Authors:  C Balas
Journal:  IEEE Trans Biomed Eng       Date:  2001-01       Impact factor: 4.538

3.  Application of time series discretization using evolutionary programming for classification of precancerous cervical lesions.

Authors:  Héctor-Gabriel Acosta-Mesa; Fernando Rechy-Ramírez; Efrén Mezura-Montes; Nicandro Cruz-Ramírez; Rodolfo Hernández Jiménez
Journal:  J Biomed Inform       Date:  2014-03-15       Impact factor: 6.317

4.  Cross sectional study of conventional cervical smear, monolayer cytology, and human papillomavirus DNA testing for cervical cancer screening.

Authors:  Joël Coste; Béatrix Cochand-Priollet; Patricia de Cremoux; Catherine Le Galès; Isabelle Cartier; Vincent Molinié; Sylvain Labbé; Marie-Cécile Vacher-Lavenu; Philippe Vielh
Journal:  BMJ       Date:  2003-04-05

5.  TensorFlow: Biology's Gateway to Deep Learning?

Authors:  Ladislav Rampasek; Anna Goldenberg
Journal:  Cell Syst       Date:  2016-01-27       Impact factor: 10.304

6.  Automated Melanoma Recognition in Dermoscopy Images via Very Deep Residual Networks.

Authors:  Lequan Yu; Hao Chen; Qi Dou; Jing Qin; Pheng-Ann Heng
Journal:  IEEE Trans Med Imaging       Date:  2016-12-21       Impact factor: 10.048

7.  Molecular transitions from papillomavirus infection to cervical precancer and cancer: Role of stromal estrogen receptor signaling.

Authors:  Johan A den Boon; Dohun Pyeon; Sophia S Wang; Mark Horswill; Mark Schiffman; Mark Sherman; Rosemary E Zuna; Zhishi Wang; Stephen M Hewitt; Rachel Pearson; Meghan Schott; Lisa Chung; Qiuling He; Paul Lambert; Joan Walker; Michael A Newton; Nicolas Wentzensen; Paul Ahlquist
Journal:  Proc Natl Acad Sci U S A       Date:  2015-06-08       Impact factor: 11.205

8.  Predicting the outcome for patients in a heart transplantation queue using deep learning.

Authors:  Dennis Medved; Pierre Nugues; Johan Nilsson
Journal:  Conf Proc IEEE Eng Med Biol Soc       Date:  2017-07

9.  High-Throughput Classification of Radiographs Using Deep Convolutional Neural Networks.

Authors:  Alvin Rajkomar; Sneha Lingam; Andrew G Taylor; Michael Blum; John Mongan
Journal:  J Digit Imaging       Date:  2017-02       Impact factor: 4.056

10.  Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases.

Authors:  Andrew Janowczyk; Anant Madabhushi
Journal:  J Pathol Inform       Date:  2016-07-26
