
The Need for Careful Data Collection for Pattern Recognition in Digital Pathology.

Raphaël Marée

Abstract

Effective pattern recognition requires carefully designed ground-truth datasets. In this technical note, we first summarize potential data collection issues in digital pathology and then propose guidelines to build more realistic ground-truth datasets and to control their quality. We hope our comments will foster the effective application of pattern recognition approaches in digital pathology.


Keywords:  Digital pathology; ground truth; object recognition; pattern recognition; quality control

Year:  2017        PMID: 28480122      PMCID: PMC5404354          DOI: 10.4103/jpi.jpi_94_16

Source DB:  PubMed          Journal:  J Pathol Inform


Introduction

In pathology, cells (cytology) and tissues (histology) are studied by examining specimens that have been sectioned, stained, and mounted on glass microscope slides, then observed under a light microscope. These studies typically aim at detecting changes in cellularity or tissue architecture for the diagnosis of a disease. Over the past few decades, advances in scanning technology have enabled the high-throughput conversion of glass slides into digital slides (whole slide images) at resolutions approaching those of traditional optical microscopes. Digital pathology has become an active field that holds promise for the future of anatomic pathology and raises many pattern recognition research challenges, such as rare object detection/counting and robust tissue segmentation.[1,2,3] In addition to the numerous potential patterns to recognize in digital slides, one of the key challenges for recognition algorithms is the wide variety of sample preparation protocols, which yield highly variable image appearances of tissue and cellular structures. Ideally, pattern recognition algorithms should be versatile enough to be applied to several classification tasks and image acquisition conditions without the need to develop completely novel methods, using only training datasets related to each novel task at hand. However, such an idealistic application of pattern recognition methods to real-world problems requires the ground-truth data to be carefully designed and realistic. We believe realistic data collection is an underestimated challenge in digital pathology that deserves more attention. In this technical note, we first discuss potential dataset issues in digital pathology. We then suggest guidelines and tools to set up better ground-truth datasets and evaluation protocols.

Discussion

Potential sources of dataset variability and bias

Object recognition aims at designing methods to automatically find and identify objects in an image. The design of such methods usually requires ground-truth datasets, provided by domain experts, depicting the various categories (or classes) of objects to recognize. In object recognition research, publicly available ground-truth datasets are essential to enable continuous progress, as they allow quantitative evaluation and comparison of algorithms. However, issues have recently been raised against computer vision datasets that had been in use for several years.[4,5,6,7,8,9,10,11] We expect similar issues might arise in the coming years in the emerging field of digital pathology if precautions are not taken when collecting new datasets. Indeed, in some of these studies, published in the broader computer vision community, authors have shown that hidden regularities can be exploited by learning algorithms to classify images with some success. For example, background environments can be exploited in several face recognition benchmarks.[6,7] Similarly, images of some object recognition datasets can be classified using background regions with accuracy far higher than mere chance,[11] although the images were acquired in controlled environments. In biomedical imaging, illumination, focus, or staining settings might also discreetly contribute to classification performance.[10] This type of fluctuation can lead to reduced generalization performance of classifiers, as also observed in high-content screening experiments, where images of different plates can have quite different gray value distributions.[12] Overall, such dataset biases will prevent an algorithm from working well on new images and potentially guide algorithm developers in the wrong direction. Moreover, besides the large amount of imaging data needed for digital pathology applications, the realism of several benchmarks has to be questioned.
For example, in diagnostic cytology, a single patient slide might contain hundreds of thousands of objects (cells and artifacts). However, typical benchmarks (e.g.,[13] in serous cytology and[14] in cervical cancer cytology screening) contain only a few hundred individual cells from a limited (or unknown) number of patient samples; hence, variations induced by laboratory practices and by biological factors are often not well represented. We believe this partly explains why pattern recognition approaches have had only a limited impact in cytology, although there have been numerous attempts at designing computer-aided cytology systems.[15] The lack of details concerning data acquisition and evaluation protocols also potentially hides idiosyncrasies. An obvious sample selection bias would be to collect all examples of a given class (e.g., malignant cells) from one subset of slides while objects of another class (e.g., benign cells) are collected from another subset of slides. Such a data collection strategy might lead to classifiers that unwittingly capture slide-specific patterns rather than class-specific ones, and hence have poor generalization performance. Similar problems might occur with other experimental factors, for example, when examples come from slides stained in different laboratories or on different days of the week, as these have been shown to be major factors causing color variations in histology.[16] It has been reported that many other factors (e.g., variation in fixation delay timings, changes in temperature, etc.) can affect cytological specimens[17] and tissue sections,[18] and hence the images used to develop recognition algorithms.
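The slide-level selection bias described above can be made visible with group-aware evaluation. The sketch below is illustrative only (synthetic data and hypothetical parameter values; assumes scikit-learn and NumPy are available): naive cross-validation that mixes objects from the same slide across folds reports inflated accuracy, while a slide-grouped split exposes that the classifier learned slide-specific artifacts.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)

# Toy data: 600 objects from 6 slides, 100 per slide. Each class is
# collected from its own subset of slides (the selection bias described
# above), and each slide adds a slide-specific acquisition offset to
# every feature. The offsets are interleaved across classes, so there is
# no genuine class signal at all.
slides = np.repeat(np.arange(6), 100)
labels = (slides >= 3).astype(int)                          # class 1 = slides 3-5
slide_offsets = np.array([-4.0, 0.0, 4.0, -2.0, 2.0, 6.0])  # per-slide artifact
X = rng.normal(size=(600, 5)) + slide_offsets[slides][:, None]

clf = ExtraTreesClassifier(n_estimators=50, random_state=0)

# Naive K-fold mixes objects from the same slide across train and test,
# so the classifier can memorize slide offsets: accuracy looks excellent.
naive = cross_val_score(clf, X, labels,
                        cv=KFold(n_splits=5, shuffle=True, random_state=0))

# Group K-fold holds out whole slides, revealing that the "signal" is a
# slide artifact that does not generalize to unseen slides.
grouped = cross_val_score(clf, X, labels,
                          cv=GroupKFold(n_splits=3), groups=slides)

print(f"naive CV accuracy:   {naive.mean():.2f}")   # high: bias exploited
print(f"grouped CV accuracy: {grouped.mean():.2f}") # near or below chance
```

In practice, `groups` would be the slide (or patient) identifier attached to each annotated object.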
Similarly, in immunohistochemistry, variable preanalytical conditions (such as fluctuations in cold ischemia, fixation, or stabilization time) can alter the expression of certain markers, and hence image analysis results.[19] Indeed, samples are prepared using colored histochemical stains that bind selectively to cellular components. Color variability is inherent to cytopathology and histopathology based on transmitted-light microscopy due to several factors, such as variable chemical coloring/reactivity across manufacturers and batches of stains, and coloring being dependent on staining procedures (timing, concentrations, etc.). Furthermore, light transmission is a function of tissue section thickness and is influenced by the components of the different scanners used to acquire whole slide images.
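Since light transmission decays exponentially with stain concentration and section thickness (the Beer-Lambert law), stain-analysis pipelines commonly convert transmitted intensities to optical density, where stain contributions add linearly, before comparing stain levels across slides. This is a standard transformation rather than a method from this note; the sketch below assumes 8-bit RGB input with a white background level of 255.

```python
import numpy as np

def to_optical_density(rgb, background=255.0, eps=1e-6):
    """OD = -log10(I / I0): absorbance per channel, so that doubling the
    effective stain concentration (or section thickness) doubles the OD."""
    rgb = np.clip(np.asarray(rgb, dtype=float), eps, background)
    return -np.log10(rgb / background)

# A pixel transmitting half the incident light has OD = log10(2) per channel.
od = to_optical_density([[[127.5, 127.5, 127.5]]])
print(od)  # approximately 0.301 in each channel
```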

Data collection guidelines

While it is hardly possible to avoid all dataset variability and bias, it is important that the protocols for data acquisition and image acquisition try to reduce nonrelevant differences between object categories. Moreover, object recognition evaluation protocols should focus on challenging methods in terms of robustness. Table 1 lists and organizes recommendations for less biased data collection based on lessons learned from the design of a practical cytology system,[20] from observations in digital pathology challenges,[21] from more general recommendations in broader microscopy image analysis,[22,23,26] and from the computer vision literature.[27] While all these recommendations might not be followed simultaneously due to current standard practices and limited resources, we recommend following as many of them as possible.
Table 1

Guidelines for less biased data collection and algorithm evaluation


Dataset quality control

While following guidelines for the construction of a realistic ground truth should reduce dataset bias, it might not be possible to control and constrain every aspect of data collection given current laboratory practices and available resources. Hence, there might still be real-life reasons for dataset shift.[28] While other works have considered ground-truth quality assessment using various annotation scoring functions (e.g.,[29] where the authors used the number of control points in the bounding polygon of a manual annotation), we believe these are not very relevant for practical pattern recognition applications in digital pathology. Rather, as in,[6,10] we think it is important to assess dataset quality with respect to the outcomes used by the final users. We therefore recommend implementing two simple quality control tests to assess novel datasets and detect biases before working on them intensively. The first test simply evaluates recognition performance (e.g., classification accuracy) with global color histogram methods or related approaches. While color information can be helpful for some classification tasks, suspiciously good results using such a simple scheme might reveal that individual pixel intensities are (strongly) related to image classes. In particular, in histology and cytology, color statistics may be of additional value, for example, to indirectly recognize a cell with a larger dark nucleus, but experts usually discriminate objects based on subtle morphological or textural criteria.
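The first quality-control test can be sketched as follows. This is an illustrative stand-in, not the ET-FL setup reported in this note: it uses scikit-learn's ExtraTreesClassifier on global per-channel histograms, and the images are synthetic (a brightness shift stands in for a staining batch effect).

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

def global_color_histogram(image, bins=8):
    """Concatenate per-channel intensity histograms; all spatial and
    morphological information is deliberately discarded."""
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256),
                          density=True)[0]
             for c in range(image.shape[-1])]
    return np.concatenate(hists)

def color_bias_score(images, labels, folds=10):
    """Mean cross-validated accuracy of a histogram-only classifier.
    A large gap above chance level is a red flag for color bias."""
    features = np.stack([global_color_histogram(im) for im in images])
    clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, features, labels, cv=folds).mean()

# Synthetic check: two "classes" whose only difference is a brightness
# shift are trivially separable from color statistics alone.
rng = np.random.default_rng(1)
images = [rng.integers(0, 128, (32, 32, 3)) for _ in range(40)] + \
         [rng.integers(96, 256, (32, 32, 3)) for _ in range(40)]
labels = np.array([0] * 40 + [1] * 40)
score = color_bias_score(images, labels)
print(f"histogram-only CV accuracy: {score:.2f}")  # near 1.0: clear bias signal
```

On a well-designed dataset, this score should stay close to the chance level of the task.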
For example, we have observed that staining variability can be exploited by such an approach on a dataset of 850 images of H&E-stained liver tissue sections from an aging study involving female mice on ad libitum or caloric restriction diets.[30] Using the Extremely Randomized Trees for Feature Learning (ET-FL) open-source classification algorithm of,[31] we obtained a <5% error rate when discriminating mouse liver tissues at different development stages using only individual pixels encoded in the Hue-Saturation-Value (HSV) color space (10-fold cross-validation; method parameter values[31] were T = 10, nmin = 5000, and k = 3, with NLs = 1 million pixels extracted from training images). A similar approach also yields a <5% error rate for the classification of 1057 patches of four immunostaining patterns (background, connective tissue, cytoplasmic staining, and nuclear staining) from breast tissue microarrays[32] (same evaluation protocol and parameter values). The second test, similar to,[6] which observed background artifacts in face datasets, evaluates the recognition rates of classification methods on regions not centered on the objects of interest. We performed such an experiment using all 260 images of acute lymphoblastic leukemia lymphoblasts.[33] Using the ET-FL classifier,[31] we obtained a 9% error rate using only pixel data from a 50 × 50 pixel square patch extracted at the top-left corner of each image, corresponding to background regions (10-fold cross-validation; method parameter values[31] were T = 10, nmin = 5000, and k = 28, with NLs = 1 million 16 × 16 subwindows extracted from training images, described by HSV pixel values). This is significantly better than majority/random voting, although these patches do not include any information about the cells to be recognized. This problem is illustrated in Figure 1.
Figure 1

Illustration of illumination/saturation bias in unprocessed images from a dataset describing normal and lymphoblast cells.[33] The large images (left) are two images from each class. Small images (right) are cropped subimages (top-left 50 × 50 corner) from 16 images of each class

In these two datasets, some acquisition factors are correlated with individual classes. Overall, these overly simple experiments stress the need for carefully designed datasets and evaluation protocols in digital pathology.
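The second test (background-patch classification, as in the lymphoblast experiment above) can be sketched similarly. Again this is a stand-in rather than the ET-FL method: scikit-learn's ExtraTreesClassifier on raw flattened corner-patch pixels, with synthetic images in which one class was scanned slightly brighter.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

def corner_patch_features(image, size=50):
    """Flatten the pixel values of the top-left corner patch, which is
    assumed to contain only background (no objects of interest)."""
    return np.asarray(image[:size, :size], dtype=float).ravel()

def background_bias_score(images, labels, folds=10):
    """Mean cross-validated accuracy using background patches only.
    Above-chance accuracy indicates acquisition bias."""
    features = np.stack([corner_patch_features(im) for im in images])
    clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, features, labels, cv=folds).mean()

# Synthetic check: if one class was acquired slightly brighter, even
# empty background patches betray the label.
rng = np.random.default_rng(2)
images = [200 + rng.integers(-5, 5, (100, 100, 3)) for _ in range(30)] + \
         [210 + rng.integers(-5, 5, (100, 100, 3)) for _ in range(30)]
labels = np.array([0] * 30 + [1] * 30)
score = background_bias_score(images, labels)
print(f"background-patch CV accuracy: {score:.2f}")  # well above 0.5 chance
```

Since the patches contain no cells, any accuracy clearly above chance should trigger a review of the acquisition protocol.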

Conclusions

Pattern recognition could significantly shape digital pathology in the next few years, as it has a large number of potential applications, but it requires the availability of representative ground-truth datasets. In this note, we summarized data collection challenges in this field and suggested guidelines and tools to improve the quality of ground-truth datasets. Overall, we hope these comments will complement other recent studies that provide guidelines for the design and application of pattern recognition methodologies,[31,34,35] and hence contribute to the successful application of pattern recognition in digital pathology.

Financial support and sponsorship

R.M. was supported by the CYTOMINE and HISTOWEB research grants of Wallonia (DGO6 n°1017072 and n°1318185).

Conflicts of interest

There are no conflicts of interest.
References

1.  Assessing the efficacy of low-level image content descriptors for computer-based fluorescence microscopy image analysis.

Authors:  L Shamir
Journal:  J Microsc       Date:  2011-05-23       Impact factor: 1.758

2.  Computational pathology: challenges and promises for tissue analysis.

Authors:  Thomas J Fuchs; Joachim M Buhmann
Journal:  Comput Med Imaging Graph       Date:  2011-04-09       Impact factor: 4.790

3.  Challenges and Benchmarks in Bioimage Analysis.

Authors:  Michal Kozubek
Journal:  Adv Anat Embryol Cell Biol       Date:  2016       Impact factor: 1.231

4.  Evaluation of CellSolutions BestPrep® automated thin-layer liquid-based cytology Papanicolaou slide preparation and BestCyte® cell sorter imaging system.

Authors:  Agnes Delga; Frederic Goffin; Frederic Kridelka; Raphaël Marée; Chantal Lambert; Philippe Delvenne
Journal:  Acta Cytol       Date:  2014-09-27       Impact factor: 2.319

5.  Cytological artifacts masquerading interpretation.

Authors:  Khushboo Sahay; Monica Mehendiratta; Shweta Rehani; Madhumani Kumra; Rashi Sharma; Priyanka Kardam
Journal:  J Cytol       Date:  2013-10       Impact factor: 1.000

6.  Pattern recognition software and techniques for biological image analysis.

Authors:  Lior Shamir; John D Delaney; Nikita Orlov; D Mark Eckley; Ilya G Goldberg
Journal:  PLoS Comput Biol       Date:  2010-11-24       Impact factor: 4.475

7.  Pathology imaging informatics for quantitative analysis of whole-slide images.

Authors:  Sonal Kothari; John H Phan; Todd H Stokes; May D Wang
Journal:  J Am Med Inform Assoc       Date:  2013-08-19       Impact factor: 4.497

8.  Collaborative analysis of multi-gigapixel imaging data using Cytomine.

Authors:  Raphaël Marée; Loïc Rollus; Benjamin Stévens; Renaud Hoyoux; Gilles Louppe; Rémy Vandaele; Jean-Michel Begon; Philipp Kainz; Pierre Geurts; Louis Wehenkel
Journal:  Bioinformatics       Date:  2016-01-10       Impact factor: 6.937

9.  Automated classification of immunostaining patterns in breast tissue from the human protein atlas.

Authors:  Issac Niwas Swamidoss; Andreas Kårsnäs; Virginie Uhlmann; Palanisamy Ponnusamy; Caroline Kampf; Martin Simonsson; Carolina Wählby; Robin Strand
Journal:  J Pathol Inform       Date:  2013-03-30

10.  Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases.

Authors:  Andrew Janowczyk; Anant Madabhushi
Journal:  J Pathol Inform       Date:  2016-07-26
