BreCaHAD: a dataset for breast cancer histopathological annotation and diagnosis.

Alper Aksac, Douglas J Demetrick, Tansel Ozyer, Reda Alhajj.

Abstract

OBJECTIVES: Histopathological tissue analysis by a pathologist determines the diagnosis and prognosis of most tumors, such as breast cancer. To estimate the aggressiveness of cancer, a pathologist evaluates the microscopic appearance of a biopsied tissue sample based on morphological features which have been correlated with patient outcome. DATA DESCRIPTION: This paper introduces a dataset of 162 breast cancer histopathology images, namely the breast cancer histopathological annotation and diagnosis dataset (BreCaHAD) which allows researchers to optimize and evaluate the usefulness of their proposed methods. The dataset includes various malignant cases. The task associated with this dataset is to automatically classify histological structures in these hematoxylin and eosin (H&E) stained images into six classes, namely mitosis, apoptosis, tumor nuclei, non-tumor nuclei, tubule, and non-tubule. By providing this dataset to the biomedical imaging community, we hope to encourage researchers in computer vision, machine learning and medical fields to contribute and develop methods/tools for automatic detection and diagnosis of cancerous regions in breast cancer histology images.

Keywords:  Annotation; Breast cancer; Dataset; H&E staining; Histopathology; Nottingham histologic score

Year:  2019        PMID: 30755250      PMCID: PMC6373078          DOI: 10.1186/s13104-019-4121-7

Source DB:  PubMed          Journal:  BMC Res Notes        ISSN: 1756-0500


Objective

Histopathological tissue analysis by a pathologist plays an important role in the diagnosis and prognosis of many types of cancer, such as breast cancer. Staging and grading systems vary between cancer types. Breast cancer, one of the most common types of cancer, has its own grading systems. The Nottingham grading system (also called the Elston-Ellis [1] modification of the Scarff-Bloom-Richardson [2] grading system) is widely used to grade breast tissue based on three main features, namely nuclear pleomorphism, tubule formation, and mitotic count, each of which is scored 1 to 3 points. The three scores are added together to give an overall score (in the range 3–9), which determines the grade of the breast cancer. Manually spotting and annotating the affected area(s) on histopathology images with high accuracy is regarded as the gold standard in cancer diagnosis and grading, but it is also a time-consuming and tedious task that requires considerable effort, expertise, and experience from pathologists; these skills are mostly gained over time by analyzing more cases. Although this visual interpretation follows strict guidelines, it introduces a degree of subjectivity into histological analysis, which leads to inter- and intra-observer variability [3, 4] and reproducibility issues. These issues may directly affect patient prognosis and treatment planning. They can be alleviated by developing automated image analysis tools for digitized histopathology. Rapid advances in image capture and analysis technology make such tools feasible, and they can not only give pathologists more insight but also guide them in detecting and grading affected cases. These quantitative computational tools aim to improve the speed and accuracy of pathology research.
It is therefore important to develop automatic assessment tools for quantitative and qualitative analysis that help overcome these drawbacks. However, histopathological examination of tissue remains a challenging problem, because the fixation, embedding, sectioning, and staining steps of tissue preparation introduce many artifacts and variations [5]. In addition, the variability in the size, shape, location, and texture of nuclei makes automated detection more difficult. We believe that our annotations, drawn from a variety of cases, will provide useful information about these challenging situations.
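The Nottingham scoring described above is simple arithmetic, and a minimal Python sketch makes it concrete. The function name `nottingham_grade` is hypothetical, and the mapping from total score to grade (3–5 to grade 1, 6–7 to grade 2, 8–9 to grade 3) is the commonly used Elston-Ellis convention rather than something defined by the BreCaHAD dataset, which does not provide per-case grades.

```python
def nottingham_grade(tubule: int, pleomorphism: int, mitosis: int) -> tuple[int, int]:
    """Return (total score, grade) for three Nottingham component scores.

    Each component is scored 1-3, so the total lies in 3-9. The grade
    cut-offs below (3-5 -> 1, 6-7 -> 2, 8-9 -> 3) follow the usual
    Elston-Ellis convention; they are an assumption of this sketch,
    not information shipped with the BreCaHAD dataset.
    """
    components = (tubule, pleomorphism, mitosis)
    for score in components:
        if score not in (1, 2, 3):
            raise ValueError(f"component scores must be 1, 2 or 3, got {score!r}")
    total = sum(components)
    if total <= 5:
        grade = 1
    elif total <= 7:
        grade = 2
    else:
        grade = 3
    return total, grade


if __name__ == "__main__":
    # Tubule formation 3, nuclear pleomorphism 2, mitotic count 1 -> total 6, grade 2.
    print(nottingham_grade(3, 2, 1))  # (6, 2)
```

Validating each component before summing matters because an out-of-range score could otherwise still produce a total inside 3–9 and silently yield a plausible-looking grade.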

Data description

In this paper, we present a dataset of breast cancer histopathology images named BreCaHAD (Table 1, Data set 1), which is publicly available to the biomedical imaging community [6]. The images were obtained from surgical pathology example cases archived for teaching purposes. The Nottingham Grading System is an international grading system for breast cancer recommended by the World Health Organization, in which three morphological features (tubule formation, nuclear pleomorphism, and mitotic count) are scored to decide the final grade of a cancer case. To capture these features, a pathologist annotated the H&E stained histological images, marking structures as mitosis, apoptosis, tumor nuclei, non-tumor nuclei, tubule, or non-tubule. The sample cases were collected from a range of scenarios, from histological structures with clear boundaries to poorly differentiated structures lacking typical features.
Table 1. Overview of data files/data sets

| Label | Name of data file/data set | File types (file extension) | Data repository and identifier (DOI or accession number) |
|---|---|---|---|
| Data file 1 | annotation_details.xlsx | MS Excel file (.xlsx) | Figshare (10.6084/m9.figshare.7379186) |
| Data file 2 | original.png | Image file (.png) | Figshare (10.6084/m9.figshare.7379186) |
| Data file 3 | annotated.png | Image file (.png) | Figshare (10.6084/m9.figshare.7379186) |
| Data file 4 | data.json | JSON format file (.json) | Figshare (10.6084/m9.figshare.7379186) |
| Data set 1 | BreCaHAD.zip | Archive file (.zip) containing dataset | Figshare (10.6084/m9.figshare.7379186) |
Each image in the BreCaHAD dataset is a microscopic biopsy image saved in uncompressed TIFF (.tiff) format, with three RGB channels at 8-bit depth per channel and dimensions of 1360 × 1024 pixels; each image is annotated (see Table 1, Data files 2–3). The annotation classes are mitosis, apoptosis, tumor nuclei, non-tumor nuclei, tubule, and non-tubule, and they are used to assess three morphological features, namely nuclear pleomorphism, tubule formation, and mitotic count. The breast tissue biopsy slides used to generate the samples were stained with hematoxylin and eosin (H&E).

All slides were digitized under the same acquisition conditions and settings, giving a resolution of 0.514 µm × 0.527 µm per pixel at 40×: with the 40× objective, the camera's 1360 × 1024 pixel chip captures a 700 µm × 540 µm field of view. The images were captured under brightfield illumination with a Zeiss 40× oil objective on a Zeiss Axiophot microscope, through a 10× magnifier, to a Spot Pursuit PR3440 camera controlled by Spot v5.2 software. The camera was set to automatic exposure, while focusing was done manually for each slide. All specimens were breast tissue fixed in 10% neutral buffered formalin (pH 7.4) for 12 h and processed in graded ethanol/xylene into Surgiplast paraffin. All sections were cut at 4 µm thickness, deparaffinized, and stained with Harris’ hematoxylin and 1% eosin according to standard procedures. Specimens had been archived for 2 to 20 years, so slight differences in staining and color characteristics reflect the procedures and reagents used over time. The dataset currently contains four types of malignant breast tumor: ductal carcinoma (DC), lobular carcinoma (LC), mucinous carcinoma (MC), and tubular carcinoma (TC).
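The per-pixel resolution follows from dividing the field of view by the sensor size: 700 µm / 1360 px ≈ 0.5147 µm and 540 µm / 1024 px ≈ 0.5273 µm, reported above as 0.514 µm and 0.527 µm. The Python sketch below, whose constant and function names are hypothetical, converts pixel measurements to physical units and computes the area of one image field, which may be useful, for example, when relating mitotic counts to tissue area. It assumes every image was captured with the single 40× configuration described above.

```python
# Acquisition geometry as stated in the data description (40x objective).
FIELD_WIDTH_UM = 700.0    # horizontal field of view in micrometres
FIELD_HEIGHT_UM = 540.0   # vertical field of view in micrometres
IMAGE_WIDTH_PX = 1360     # camera chip width in pixels
IMAGE_HEIGHT_PX = 1024    # camera chip height in pixels

# Physical size of one pixel along each axis.
UM_PER_PX_X = FIELD_WIDTH_UM / IMAGE_WIDTH_PX    # ~0.5147 µm
UM_PER_PX_Y = FIELD_HEIGHT_UM / IMAGE_HEIGHT_PX  # 0.52734375 µm


def pixels_to_um(dx_px: float, dy_px: float) -> tuple[float, float]:
    """Convert a displacement in pixels to micrometres along x and y."""
    return dx_px * UM_PER_PX_X, dy_px * UM_PER_PX_Y


def pixel_area_um2(n_pixels: int) -> float:
    """Physical area covered by n_pixels pixels, in square micrometres."""
    return n_pixels * UM_PER_PX_X * UM_PER_PX_Y


def field_area_mm2() -> float:
    """Area of one full image field in square millimetres (0.7 mm x 0.54 mm)."""
    return (FIELD_WIDTH_UM / 1000.0) * (FIELD_HEIGHT_UM / 1000.0)


if __name__ == "__main__":
    print(f"{UM_PER_PX_X:.4f} x {UM_PER_PX_Y:.4f} um per pixel")
    print(f"field area: {field_area_mm2():.3f} mm^2")
```

Because the pixels are slightly non-square, x and y are scaled separately; for instance, a structure spanning 20 pixels horizontally is roughly 10.3 µm wide.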
The distribution of annotations across the six classes and the annotation format for the BreCaHAD dataset are given in Table 1, Data file 1. The annotations are provided in JSON (JavaScript Object Notation) format. The example ground-truth JSON file in Table 1, Data file 4 contains two mitosis annotations and one tumor nucleus annotation. In this file, x and y are the coordinates of the annotated object's centroid, normalized to the range [0, 1] by dividing by the image width and height, respectively. By providing this dataset for research purposes, we hope to promote research in computer-aided diagnosis for breast cancer histopathology, allowing researchers to optimize their proposed methods and demonstrate their usefulness by experimenting with this dataset.
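Because the centroids are stored in normalized form, they have to be scaled back by the image dimensions before they can be matched against pixel positions. The Python sketch below shows one way to do this and to count annotations per class. Its assumptions are hypothetical and should be checked against the real files: the JSON layout (an object mapping each class name to a list of `{"x": ..., "y": ...}` points), the class key names, and the coordinate values in `EXAMPLE_JSON`, which are placeholders chosen only to mirror the two-mitosis, one-tumor-nucleus example and are not the actual contents of Data file 4.

```python
import json
from collections import Counter

IMAGE_WIDTH_PX = 1360
IMAGE_HEIGHT_PX = 1024

# Hypothetical example: key names and layout are assumptions, and the
# coordinate values are placeholders, not taken from the dataset.
EXAMPLE_JSON = """
{
  "mitosis": [{"x": 0.25, "y": 0.5}, {"x": 0.75, "y": 0.125}],
  "tumor": [{"x": 0.5, "y": 0.5}]
}
"""


def load_centroids(text: str, width: int = IMAGE_WIDTH_PX,
                   height: int = IMAGE_HEIGHT_PX) -> dict[str, list[tuple[float, float]]]:
    """Parse normalized centroids and return {class_name: [(x_px, y_px), ...]}."""
    data = json.loads(text)
    centroids = {}
    for class_name, points in data.items():
        pixel_points = []
        for point in points:
            x, y = float(point["x"]), float(point["y"])
            if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
                raise ValueError(f"{class_name}: coordinates outside [0, 1]: {point}")
            # Undo the normalization: x was divided by width, y by height.
            pixel_points.append((x * width, y * height))
        centroids[class_name] = pixel_points
    return centroids


def count_per_class(centroids: dict[str, list[tuple[float, float]]]) -> Counter:
    """Number of annotated objects per class; absent classes count as 0."""
    return Counter({name: len(points) for name, points in centroids.items()})


if __name__ == "__main__":
    points = load_centroids(EXAMPLE_JSON)
    print(points)
    print(count_per_class(points))
```

The pixel coordinates are kept as floats so that no rounding error is introduced; rounding is only needed when indexing into the image array.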

Limitations

The dataset has several limitations: the limited tonal range of the images due to the camera, slight color differences due to different batches of hematoxylin used over time, and the optical resolution of the 100× oil objective and immersion oil medium. These limitations arise because the images were meant to reflect actual surgical pathology images typically used by diagnostic surgical pathologists to evaluate breast biopsies. In addition, the overall grading score for each case is not available, and no per-image classification label (ductal carcinoma, lobular carcinoma, mucinous carcinoma, or tubular carcinoma) is included.
References

1. Elston CW, Ellis IO. Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: experience from a large study with long-term follow-up. Histopathology. 1991 Nov.

2. Frierson HF, Wolber RA, Berean KW, Franquemont DW, Gaffey MJ, Boyd JC, Wilbur DC. Interobserver reproducibility of the Nottingham modification of the Bloom and Richardson histologic grading scheme for infiltrating ductal carcinoma. Am J Clin Pathol. 1995 Feb.

3. Robbins P, Pinder S, de Klerk N, Dawkins H, Harvey J, Sterrett G, Ellis I, Elston C. Histological grading of breast carcinomas: a study of interobserver agreement. Hum Pathol. 1995 Aug.

4. Bloom HJ, Richardson WW. Histological grading and prognosis in breast cancer; a study of 1409 cases of which 359 have been followed for 15 years. Br J Cancer. 1957 Sep.
