Literature DB >> 34112812

Cric searchable image database as a public platform for conventional pap smear cytology data.

Mariana T Rezende1,2, Raniere Silva3, Fagner de O Bernardo4, Alessandra H G Tobias5, Paulo H C Oliveira4, Tales M Machado4, Caio S Costa4, Fatima N S Medeiros6, Daniela M Ushizima7,8,9, Claudia M Carneiro10,5, Andrea G C Bianchi4.   

Abstract

Amidst the current health crisis and social distancing, telemedicine has become an important part of mainstream of healthcare, and building and deploying computational tools to support screening more efficiently is an increasing medical priority. The early identification of cervical cancer precursor lesions by Pap smear test can identify candidates for subsequent treatment. However, one of the main challenges is the accuracy of the conventional method, often subject to high rates of false negative. While machine learning has been highlighted to reduce the limitations of the test, the absence of high-quality curated datasets has prevented strategies development to improve cervical cancer screening. The Center for Recognition and Inspection of Cells (CRIC) platform enables the creation of CRIC Cervix collection, currently with 400 images (1,376 × 1,020 pixels) curated from conventional Pap smears, with manual classification of 11,534 cells. This collection has the potential to advance current efforts in training and testing machine learning algorithms for the automation of tasks as part of the cytopathological analysis in the routine work of laboratories.

Entities:  

Year:  2021        PMID: 34112812      PMCID: PMC8192784          DOI: 10.1038/s41597-021-00933-8

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   6.444


Introduction

Cervical cancer is one of the most frequently diagnosed neoplasms and one of the main causes of death from cancer in the female population, and constitutes a significant public health problem worldwide[1]. According to the most recent estimate by the World Health Organization (WHO), it is the fourth most incident cancer among women worldwide, with approximately 342,000 deaths in 2020, and is the leading cause of cancer death in 42 countries[2]. Cervical cancer has one of the best prognosis for prevention and cure, reaching almost 100% of cure when diagnosed early with screening methods[3]. The cytopathological procedure known as Pap smear[4], which is the most widely used test, is relatively inexpensive to perform, and effective in detecting precursor lesions[4]. The discovery that cervical infection by high oncogenic risk human papillomavirus (HPV) genotypes can progress to cervical cancer has led to the advancement of HPV molecular detection tests to screen for this neoplasm[5]. However, the Pap test is still essential, since positivity for oncogenic HPV still requires cytological information[6,7] as a follow-up. Even with molecular diagnosis, Pap smear is still the screening method used in low-income countries: its replacement by more expensive methods is a distant reality[3]. Even in high-income countries, well-structured implementation of programs using the Pap smear has reduced cervical cancer incidence and mortality rates by up to 65% in the last 40 years[8]. Despite the global use of Pap smear to detect cervical cancer, it has inherent limitations[9], including the fact that it is labor intensive, underscoring the need for strategies that generate more accurate results and with lower rates of false-negative, false-positive and unsatisfactory results. Since visual interpretation is time-consuming, subjective and requires highly specialized human interaction[10-15], increasing efforts over the past few decades have been aimed at developing automated analysis of Pap smear data. In recent years, machine learning algorithms have been proposed to tackle these challenges. One of the most crucial requirements for an automated system powered by computer vision and machine learning techniques is a collection of hundreds (if not thousands) of high-quality, well-curated digitized images of Pap smears and related metadata. The existing FAIR (Findable, Accessible, Interoperable, and Reusable)[16] data about Pap smears is limited, and without representation of cells with pre-neoplastic alteration of all classes, mostly composed of cut-out cells, synthetic, non-standardized images, and from liquid-based cytology[17-23]. The most widely used FAIR data in Pap smear image classification studies for cervical lesion detection is the Herlev base[5,17,21,22,24]. This database comprises a total of 917 images: each image contains a single cervical cell and is assigned to one of seven classes of a pre-neoplasic lesion. Other FAIR data used in Pap smear image classification studies include the SIPaKMeD[25] database, with 4,049 cells (in 966 images) assigned to one of five categories of cell types, with no pre-neoplastic alteration. Another issue with most of the databases, including Herlev and SIPaKMeD, is that they are not classified according to the Bethesda System nomenclature, created in 1988 to standardize the terminology of cervicovaginal cytology in order to reduce high variability in the communication of results. The Bethesda System is considered the most used, uniform and reproducible terminology among different pathologists and laboratories[26,27]. A further complication is that many previous works have relied on image collections with cut-out cells and/or unrealistically “clean” images, similar to those in liquid-based cytology[10,20,25,28]. Figure 1 illustrates the drastic difference between images acquired using the conventional and the liquid-based cytology. Conventional cytology has overlapping cells, leukocytes, red blood cells, and mucus, and the image is hugely varied and significantly more complex for analysis than that of liquid-based cytology, which presents homogeneity in cell distribution and absence of obscuring factors[29].
Fig. 1

Illustration of (a) conventional cytology and (b) liquid-based cytology.

Illustration of (a) conventional cytology and (b) liquid-based cytology. Constructing and delivering a database of real images of conventional cytology from Pap smear will have considerable impact on computer vision and machine learning methods for knowledge discovery. It allows the comparison of existing algorithms, as well as the investigation of new prediction methodologies that are much more realistic from a biological perspective, and necessary to make such methods valuable to routine cervical cancer screening. Therefore, there are two major challenges: algorithms that allow the identification and classification of cells in a precise way, and, a complete, representative database with accurate information that enables the development/training of automatic methodologies to support the professional’s decision. This article presents a web platform, and uses this platform to support the development of a new cervical cell database, “CRIC Cervix”, a collection of images obtained from conventional Pap smear. The Center for Recognition and Inspection of Cells (“CRIC”) is a collaborative consortium among researchers that aims to provide cell collections to the scientific community. The CRIC Cervix images are similar to those obtained during an examination, with many cells per image and 150 dpi resolution. Cytopathologists classified the cells following the protocol defined by the Bethesda System nomenclature. In addition to the cells being individually identified and classified manually by different specialists, the full image field of collected cells also has a diagnosis. The information is available on the CRIC Searchable Image Database web platform, https://database.cric.com.br, which provides a publicly available web tool with information about cervical cells. The goal of this platform, as illustrated in Fig. 2, is to enable the community to explore cell morphology and variability, and facilitate discoveries from the image collections.
Fig. 2

CRIC Database workflow. (a) Latest protocols for taxonomy; (b) microscope screening for smear selection; (c) photo-documentation of smears; (d) image selection and curation; (e) insertion of images in CRIC, and (f) manual classification of cells by cytopathologists at CRIC.

CRIC Database workflow. (a) Latest protocols for taxonomy; (b) microscope screening for smear selection; (c) photo-documentation of smears; (d) image selection and curation; (e) insertion of images in CRIC, and (f) manual classification of cells by cytopathologists at CRIC.

Results

Database contents

The collection CRIC Cervix has 400 images of Pap smears and 11,534 classified cells. In addition to https://database.cric.com.br, all the images in the current release are available in figshare[30] as a collection, 10.6084/m9.figshare.c.4960286.v2, under a Creative Commons Attribution 4.0 International license. The classification of the cells is provided in a CSV and JSON file formats with a companion README file that provides human readable details of CSV headers and JSON fields. The collection is the first to cover conventional cytology cervical cells with classifications performed by several cytopathologists based on the Bethesda System nomenclature, which is the standardized terminology and most used worldwide in the area of cervical cytopathology. The cells in the CRIC Cervix collection are classified in six (6) classes: (1) negative for intraepithelial lesion or malignancy (NILM); (2) atypical squamous cells of undetermined significance, possibly non-neoplastic (ASC-US); (3) low-grade squamous intraepithelial lesion (LSIL); (4) atypical squamous cells, cannot exclude a high-grade lesion (ASC-H); (5) high-grade squamous intraepithelial lesion (HSIL); and (6) squamous cell carcinoma (SCC). Another unique contribution of the proposed database is that it contains real images of conventional cytology with a variable number of cells, for example, including cells that overlap, which is often present in a smear of cytology, and must be considered when designing and testing automated cytology screening[23,31]. A list of differences among the cervical cell databases are listed in Tables 1 and 2.
Table 1

Comparison of properties among databases.

PropertyCRIC CervixHerlevSIPaKMeD
Number of images400917966
Cells per imageVariable1Variable
Image size (in pixels)1,376 × 1,020Variable2,048 × 1,536
Resolution0.228 μm/pixel0.201 μm/pixelUnknown
ClassificationManualManualManual
Classified cells11,5349174,049
Validation3 cytopathologists2 cyto-techniciansexpert cytopathologists
Download Pagedatabase.cric.com.brmde-lab.aegean.gr/downloadswww.cs.uoi.gr/~marina/sipakmed.html
Table 2

Number of cells per characteristics in databases.

Cell typeCell count
CRIC CervixHerlevSIPaKMed
NILM6,779144 (*)(***)
ASC-US6060(***)
ASC-H925(***)
LSIL1,360182(***)
HSIL1,703493 (**)(***)
SCC1610(***)
Total11,5348194,049 (***)

(*) Small requirements and Intermediate squamous epithelial.

(**) Intermediate squamous epithelial and Severe squamous non-keratinizing dysplasia.

(***) Cell categories cannot be translated into the Bethesda System nomenclature.

Comparison of properties among databases. Number of cells per characteristics in databases. (*) Small requirements and Intermediate squamous epithelial. (**) Intermediate squamous epithelial and Severe squamous non-keratinizing dysplasia. (***) Cell categories cannot be translated into the Bethesda System nomenclature. Both in the total number of cells and the number of cells per class of lesion, CRIC Cervix is the largest (11,534) database available for real cancer cells, providing a broad diversity of neoplastic lesions to date. Also, the CRIC images contain many of challenges often found in ordinary exams, including overlapping cells and inflammatory cells. Classifications are performed by cytopathologists, using the standardized Bethesda System nomenclature. Although the Herlev database has more images, it contains far less cells. As for the Sipakmed database, it has isolated cells from Pap smear slides, and additionally uses a non-Bethesda-conforming classification, dividing cells into five categories: superficial-intermediate cells, parabasal cells, koilocytotic cells, dyskeratotic cells, and metaplastic cells, including normal, abnormal, and benign cases.

Web platform

In this work, we implemented the CRIC Searchable Image Database, available at https://database.cric.com.br, allowing the user to browse the images via a modern web-interface, easing data access and acceleration of discoveries (Supplementary Video). In addition to the web-interface, the platform is also accessible by a REST (Representational State Transfer) API that makes it possible to add interoperability with third-party projects. The CRIC Searchable Image Database backend uses several technologies, including the web framework Express[32] for Node.js[33], which supports the view system, routing, middleware, and other capabilities. In order to connect the web framework with the data, CRIC uses Sequelize[34] which offers Object Relational Mapping (ORM) objects to the relational database systems, here enabled by MySQL[35]. The front-end uses Angular[36], which is a JavaScript framework for building web apps. The code is available at https://github.com/CRICDatabase/searchable-image-database and it is released under an open source license that allows anyone to reuse it, particularly if considering the documentation, which is available at https://cric-database.stoplight.io/docs/searchable-image-database-nodejs/reference/cric.v2.yaml. After an image is uploaded to the CRIC Searchable Image Database, the user can identify and classify cells manually. The interface captures the coordinate of the click performed by the user, prompts the user for the cell classification, saves the information, and updates the image in the user’s web browser with a square around the latest classified cell. The end result is a curated image such as Fig. 3. In addition to individual cell information, each image has a diagnosis following the most severe case among the classified cells in the image. The user can download the raw images, augmented images (such as Fig. 3), and associated data of the collection.
Fig. 3

CRIC Cervix Microscope Slide Image #383 with annotations.

CRIC Cervix Microscope Slide Image #383 with annotations. Figure 4 illustrates the platform dashboard where the user has an augmented view of the images with squares around the cells. If the user owns the image, they can add or edit the classification of the cells. We provided a playground area available at https://playground.database.cric.com.br/ where users may login to their own area to upload and classify their images. In the image upload, one must provide information about the captured data, author, a reference number from your laboratory, a DOI, if available, as seen at Fig. 4.
Fig. 4

CRIC Searchable Image Database currently available.

CRIC Searchable Image Database currently available. At the classification page, the user may access all the images of CRIC Cervix collection. They are presented in a reduced version with information on the slide, final diagnostic, and the number of cells in the image. When accessed in detail, the image is shown with the classification markings that may or may not be enabled. Lesions can also be accessed individually. The images can be saved, with or without the classification labels, and the labels can also be saved in JSON or CVS format. The files contain information on the lesion’s classification and the position (x, y) in the image where it occurs. The markings were made in such a way as to be located inside the nucleus of the cell that contains the lesion. At the download page, the user can save the entire collection, images, and classifications simultaneously or separately.

Discussion

We introduced the CRIC Searchable Image Database platform, which includes CRIC Cervix, a collection of cell images that are publicly available. This dataset of real images of conventional cervical cytology also have corresponding manual classification of pre-neoplastic cell lesions following the Bethesda System, and performed and certified by cytopathologists. The database will contribute significantly to the development of methodologies and/or automated products in computer-assisted cytology. It will also benefit researchers in machine learning who need labeled data for testing algorithm performance, particularly those related to deep convolutional neural networks[37]. The use of the CRIC platform and collections will allow discoveries regarding computational and biological descriptors that influence the decision about the presence of a lesion, supporting creation of new hypotheses about the differentiation of low and high grade cell lesions. We expect to further the development of image processing and computer analysis tools for the Pap smear, which are likely to augment the efficiency of visual inspection carried out by cytopathologists. Portraying the slide elements quantitatively allows the increase of the reliability and quality of the examination results. The goal is to reduce the false negatives, minimizing the number of lesions that are not detected or go unnoticed by professionals, providing additional information to support the decision of the specialist in cytopathology in order to obtain more accurate results. CRIC Cervix has been used by the CRIC research group in several efforts. Araujo et al.[38] introduced a fast way for image ranking from large datasets using convolutional neural networks available in a tool named pyCBIR. They also developed a segmentation algorithm that applies a convolutional neural network trained with patch-images to identify and rank abnormal cell regions[39]. More recently, they presented an analysis framework, CRIC-feat, which streamlines the investigation of different image databases and respective descriptors, particularly applicable to Pap images[40]. Isidoro et al.[41], developed a methodology for automatic classification of cervical cell samples based on non-geometric characteristics present in cell nuclei, excluding the use of segmentation. Such studies can feasibly be explored in different ways by other researchers when the database becomes available, including new segmentation and cell classification methodologies. An additional impact of the CRIC Searchable Image Database platform is an open-source web application for cell center annotation with single mouse-click and augmented view of the images with the cell annotation. The CRIC platform targets individual users or small groups without the financial resources to use commercially available digital histopathology software or the technical expertise to use complex and opaque solutions[42,43]. Besides commercially available digital histopathology software, e.g. provided by the manufacturers of the scanner hardware, there are many open source software solutions available for slide viewing and analysis, with some products supporting whole-slide images and others using standard graphics formats. Many solutions provide not only annotation capabilities, but also plug-in systems for automated analysis or pre-processing, e.g. Icy[44], CellProfiler[45] or SlideRunner[43]. However, none of the previous mentioned tools allow their users to publish the images to the general public. The CRIC platform allows researchers to define collections, upload images they have, annotate cells in their images, share their images and associated annotations, and download the data to be used in computer vision and machine learning experiments. At the moment, the CRIC Cervix collection includes normal squamous cells with precursor and invasive lesions. Updates will include normal glandular cells with precursor and invasive lesions. The authors opted for this strategy due to the fact that among invasive cervical cancers, the most common in the population is squamous cell carcinoma, which account for 75–90% of all cases, depending on the study[46-52]. Finally, the platform updates will increase the cytology collections and add more functionality at different user access levels, including manual segmentation of cervical cells. The CRIC Searchable Image platform also can support cytology research with other cell types, and we expect to include anal cytology soon.

Methods

The Research Ethics Committee approved this work at the Federal University of Ouro Preto, Minas Gerais, Brazil, through the document with protocol number 1944523. Due to limitations imposed by such a document, it is not possible to provide demographic information on the participants that contributed to the image dataset, including age and the ethnic diversity. This limitation exists because the participants did not provide consent for data sharing, and the requirement for informed consent was waived by the Research Ethics Committee under the agreements described above. The Pap smear samples were obtained from female patients from the Southeast region of Brazil, South America. The Pap smear samples were processed and analyzed in the Cytology Laboratory of the Pharmacy School, Federal University of Ouro Preto, Minas Gerais, Brazil. Since the liquid-based cytology method is more expensive, and unavailable to this patient cohort, we employed the conventional cytology method for cytopathological smears, which is recommended by the Brazilian Ministry of Health. The conventional Pap test consists of introducing a speculum into the vaginal canal for exposure and inspection of the uterine cervix. The Ayre spatula collects cells from the ectocervix and the endocervical brush from the endocervix. The collected material was dispensed on a matte tip blade, previously identified in pencil with the patient’s initials and date of birth. In order to preserve the cell structure, the material was fixed in 96% alcohol, and placed in slide bottles supplied by the Cytology Laboratory. These slide bottles were placed in plastic bags, sealed, and transported to the Cytology Laboratory for the processing, and analysis. In the Cytology Laboratory, a pre-analytical evaluation was carried out, which consisted of selecting the smears that were in adequate conditions for the analysis, and were duly identified with the laboratory registration number. Papanicolaou staining was used, consisting of a nuclear dye, hematoxylin, and two cytoplasmic stains, Orange G and EA 36 or EA 50. Leica AutoStainer XL was used for automatic staining. Then, the smears were assembled automatically, with Entellan between the slide and the coverslip, using the Leica CV5030 equipment. The microscopic analysis, which includes the observation and evaluation of cervical cells under an optical microscope to classify cervical cancer precursor lesions, was performed by a team of 3 cytopathologists. Since 2013, these professionals conduct cytopathological exams for the Brazilian Unified Health System, also known as SUS, and since 2014, they have been carrying out External Quality Monitoring of laboratories that provide services to the SUS in the state of Minas Gerais, Brazil. The entire process was carried out following the Brazilian Ministry of Health recommendation[53]. The Cytology Laboratory performs Internal Quality Monitoring, which are regular actions to guarantee the quality of the Pap smear, including control of the receipt, assembly and coloration of the samples, use of the 100% Rapid Review method (negative smears are reviewed) and Rapid Pre-scrutiny (scrutiny prior to routine reading), implementation of corrective actions, recording of results, promotion of permanent education to professionals and internal audit[54]. The first step in the development of the CRIC web platform was the survey of the most recent protocols for taxonomy and the elaboration of a diagram with cell types and respective lesions for guidance in the selection of Pap smears, which served as an interface model used to create the decision tree on the platform. The smears selected for photo-documentation come from the collection of cervical-vaginal smears obtained through conventional cytology performed at the Cytology Laboratory as part of the routine care. The collection was curated by analyzing the database to access the results of the exam. From the result, the smears were scrutinized under an optical microscope by the three cytologists in order to assess the cytomorphological criteria that best represented each cytological alteration. After consensus among the three specialists, 118 smears from 118 patients were selected. These 118 patients are diagnosed with 3 NILM, 45 ASC-US, 45 LSIL, 12 ASC-H, 10 HSIL and 3 SCC. The smear photo-documentation was performed by conventional bright field microscopy with a 40× objective and a 10× eyepiece, using a Zeiss AxionCam MRc digital camera coupled to the Zeiss AxioImager.Z2 microscope, with the Axion Vision Zeiss Software, which are at the Multi-User Laboratory of the Biological Sciences Research Center (NUPEB), Federal University of Ouro Preto, Minas Gerais, Brazil. After the photo-documentation, all images obtained by the three cytologists were analyzed and curated, and 400 images were selected to compose the CRIC Cervix dataset. The classification of cervical precursor and invasive lesions was performed within the CRIC Searchable Image platform, https://database.cric.com.br, according to the Brazilian Nomenclature for cervical cytopathological reports, based on the Bethesda System nomenclature[27]. The classification was carried out by three members of the Cytology Laboratory team, cytologists with experience of 6, 11 and 20 years working in diagnostic cytology. Each cell was classified by selecting the class corresponding to the lesion, based on morphological criteria prescribed by the Bethesda System nomenclature, followed by marking the center of the cell nucleus. This procedure repeated until all the cells in the image were labeled. The classification protocol followed by the three cytologists started with an independent classification made by an initial professional, then the second specialist checked the labels. Next, the third cytologist performed a review of the markings and if the three answers were in agreement, he/she approved the label. Otherwise, a consensus was reached among the three cytologists to define the final label. After manual labeling, computational routines for pre-processing removed labeled cells near the borders and whose nucleus or cytoplasm structures were not fully visible. This process aims to avoid feeding algorithms with incorrect information about the nucleus or cytoplasm. Supplementary Video
  35 in total

1.  CytoView. A prototype computer image-based Papanicolaou smear proficiency test.

Authors:  R N Taylor; M Gagnon; J Lange; T Lee; R Draut; E Kujawski
Journal:  Acta Cytol       Date:  1999 Nov-Dec       Impact factor: 2.319

2.  Icy: an open bioimage informatics platform for extended reproducible research.

Authors:  Fabrice de Chaumont; Stéphane Dallongeville; Nicolas Chenouard; Nicolas Hervé; Sorin Pop; Thomas Provoost; Vannary Meas-Yedid; Praveen Pankajakshan; Timothée Lecomte; Yoann Le Montagner; Thibault Lagache; Alexandre Dufour; Jean-Christophe Olivo-Marin
Journal:  Nat Methods       Date:  2012-06-28       Impact factor: 28.547

3.  Deep learning for cell image segmentation and ranking.

Authors:  Flávio H D Araújo; Romuere R V Silva; Daniela M Ushizima; Mariana T Rezende; Cláudia M Carneiro; Andrea G Campos Bianchi; Fátima N S Medeiros
Journal:  Comput Med Imaging Graph       Date:  2019-01-30       Impact factor: 4.790

Review 4.  A review of image analysis and machine learning techniques for automated cervical cancer screening from pap-smear images.

Authors:  Wasswa William; Andrew Ware; Annabella Habinka Basaza-Ejiri; Johnes Obungoloch
Journal:  Comput Methods Programs Biomed       Date:  2018-06-26       Impact factor: 5.428

5.  Digital morphometry of cytologic aspirate endometrial samples.

Authors:  Vesna Mahovlić; Ana Ovanin-Rakić; Lada Skopljanac-Macina; Ana Barisić; Sanda Rajhvajn; Danijela Juric; Ivana Samija Projić; Jadranka Ilić-Forko; Damir Babić; Snjezana Skrablin-Kucić; Jadranka Bozikov
Journal:  Coll Antropol       Date:  2010-03

Review 6.  Bethesda 2014: improving on a paradigm shift.

Authors:  D C Wilbur; R Nayar
Journal:  Cytopathology       Date:  2015-12       Impact factor: 2.073

7.  Graph-based segmentation of abnormal nuclei in cervical cytology.

Authors:  Ling Zhang; Hui Kong; Shaoxiong Liu; Tianfu Wang; Siping Chen; Milan Sonka
Journal:  Comput Med Imaging Graph       Date:  2017-01-31       Impact factor: 4.790

8.  Human papillomavirus type distribution in invasive cervical cancer and high-grade cervical intraepithelial neoplasia across 5 countries in Asia.

Authors:  Swee Chong Quek; Boon Kiong Lim; Efren Domingo; Ruey Soon; Jong-Sup Park; Thi Nhung Vu; Eng Hseon Tay; Quang Thanh Le; Young-Tak Kim; Ba Quyet Vu; Ngoc Thanh Cao; Genara Limson; Viet Thanh Pham; Anco Molijn; Gunasekaran Ramakrishnan; Jing Chen
Journal:  Int J Gynecol Cancer       Date:  2013-01       Impact factor: 3.437

Review 9.  Automated screening of Papanicolaou tests: A review of the literature.

Authors:  Michael J Thrall
Journal:  Diagn Cytopathol       Date:  2018-03-30       Impact factor: 1.582

10.  Is cervical screening preventing adenocarcinoma and adenosquamous carcinoma of the cervix?

Authors:  Alejandra Castanon; Rebecca Landy; Peter D Sasieni
Journal:  Int J Cancer       Date:  2016-05-06       Impact factor: 7.396

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.