Bálint Ármin Pataki1, Alex Olar1, Dezső Ribli1, Adrián Pesti2, Endre Kontsek2, Benedek Gyöngyösi2, Ágnes Bilecz2, Tekla Kovács2, Kristóf Attila Kovács2, Zsófia Kramer2, András Kiss2, Miklós Szócska3, Péter Pollner4,5, István Csabai1.
Abstract
Histopathology is the gold standard method for staging and grading human tumors and provides critical information for the oncoteam's decision making. Highly trained pathologists are needed for careful microscopic analysis of the slides produced from biopsy tissue, which is a time-consuming process. A reliable decision support system would assist healthcare systems that often suffer from a shortage of pathologists. Recent advances in digital pathology allow for high-resolution digitalization of pathological slides. Digital slide scanners combined with modern computer vision models, such as convolutional neural networks, can help pathologists in their everyday work, resulting in shortened diagnosis times. In this study, 200 digital whole-slide images of hematoxylin-eosin stained colorectal biopsies are published. Alongside the whole-slide images, detailed region-level annotations are provided for ten relevant pathological classes. After pre-processing, the 200 digital slides resulted in 101,389 patches. A single patch is a 512 × 512 pixel image covering a 248 × 248 μm² tissue area. Versions at higher resolution are available as well. Hopefully, the widely accessible HunCRC dataset will aid future colorectal cancer computer-aided diagnosis and research.
Year: 2022 PMID: 35764660 PMCID: PMC9240013 DOI: 10.1038/s41597-022-01450-y
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Summary of the annotated data at patch level, after removing the patches that did not pass the quality filter.
| label | #WSIs | #patches | %patches | #Gpixels |
|---|---|---|---|---|
| low-grade dysplasia | 115 | 57397 | 56.61 | 15.0 |
| high-grade dysplasia | 35 | 3057 | 3.02 | 0.8 |
| adenocarcinoma | 34 | 4567 | 4.50 | 1.2 |
| suspicious for invasion | 13 | 681 | 0.67 | 0.2 |
| inflammation | 22 | 1026 | 1.01 | 0.3 |
| resection edge | 11 | 541 | 0.53 | 0.1 |
| tumor necrosis | 10 | 624 | 0.62 | 0.2 |
| lymphovascular invasion | 0 | 0 | 0.00 | 0.0 |
| artifact | 30 | 4169 | 4.11 | 1.1 |
| normal | 174 | 31323 | 30.89 | 8.2 |
Altogether 200 WSIs were annotated, and almost all of them (174) contained some normal tissue. Note that the proportions sum to more than 100%, because some patches carry multiple, overlapping annotations. Altogether 101,389 patches were stored.
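The over-100% total can be checked directly from the table. The short sketch below only re-adds the published patch counts; the variable names are illustrative, and reading the surplus as "extra label assignments" assumes that every stored patch carries at least one label.

```python
# Patch counts per label, copied from the table above.
patch_counts = {
    "low-grade dysplasia": 57397,
    "high-grade dysplasia": 3057,
    "adenocarcinoma": 4567,
    "suspicious for invasion": 681,
    "inflammation": 1026,
    "resection edge": 541,
    "tumor necrosis": 624,
    "lymphovascular invasion": 0,
    "artifact": 4169,
    "normal": 31323,
}
TOTAL_PATCHES = 101_389  # number of stored patches reported in the text

# Each (patch, label) pair is counted once, so multi-label patches
# contribute more than once to this sum.
label_assignments = sum(patch_counts.values())
surplus = label_assignments - TOTAL_PATCHES
percent_sum = 100 * label_assignments / TOTAL_PATCHES

print(f"label assignments: {label_assignments}")
print(f"assignments beyond one per patch: {surplus}")
print(f"sum of proportions: {percent_sum:.2f}%")
```

The sum of the unrounded proportions is slightly below 102%, which matches the rounded %patches column to within rounding error.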
Summary of the patches, which passed the quality filters for three different zoom levels.
| zoom ID | covered tissue per patch (μm²) | #patches | dataset size (GB) |
|---|---|---|---|
| 0 | 62 × 62 | 1,593,113 | 79 |
| 1 | 124 × 124 | 402,904 | 24 |
| 2 | 248 × 248 | 101,389 | 7 |
For each zoom level, a single patch contains 512 × 512 pixels. In the manuscript, all results are presented using the dataset that corresponds to zoom ID 2.
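Because every patch is 512 × 512 pixels, the physical pixel size at each zoom level follows from the covered tissue area. The sketch below derives it from the table values; the variable names are illustrative and nothing beyond the table is assumed.

```python
PATCH_SIDE_PX = 512  # patch side length in pixels, identical at every zoom level

# zoom ID -> (covered tissue side in micrometres, number of patches), from the table
zoom_levels = {
    0: (62, 1_593_113),
    1: (124, 402_904),
    2: (248, 101_389),
}

# Physical size of one pixel at each zoom level (micrometres per pixel).
um_per_pixel = {
    zoom: side_um / PATCH_SIDE_PX for zoom, (side_um, _) in zoom_levels.items()
}

# Halving the covered side length should roughly quadruple the patch count.
ratio_0_to_1 = zoom_levels[0][1] / zoom_levels[1][1]
ratio_1_to_2 = zoom_levels[1][1] / zoom_levels[2][1]

for zoom, value in um_per_pixel.items():
    print(f"zoom {zoom}: {value:.4f} um/pixel")
print(f"patch count ratio 0->1: {ratio_0_to_1:.2f}, 1->2: {ratio_1_to_2:.2f}")
```

Both ratios come out slightly below four; the table itself does not state the reason for the small deviation.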
Fig. 1 Visualization of the saturation- and intensity-based patch quality filter. Top left: saturation calculated for various RGB colors. The red dashed line shows the threshold value used, 0.05, which is clearly a permissive cutoff. Bottom left: zoom on the highlighted 0.0–0.2 region. ε = 10⁻⁸ is needed for numerical stability. Right: intensity plotted for a few RGB colors; a color is considered burnout if its intensity is over 245. In the figure, the red dashed line separates the burnout colors.
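A minimal per-pixel sketch of the two checks described in the caption is given below. The thresholds (0.05 and 245) and ε = 10⁻⁸ come from the caption; the formulas are assumptions, since the caption does not state them: saturation is taken as the HSV-style (max − min) / (max + ε) and intensity as the mean of the three channels. How per-pixel results are aggregated into a patch-level decision is not described here and is therefore left out.

```python
EPS = 1e-8                   # from the caption: needed for numerical stability
SATURATION_THRESHOLD = 0.05  # from the caption
BURNOUT_INTENSITY = 245      # from the caption


def saturation(r, g, b):
    """HSV-style saturation (an assumed definition, not stated in the caption)."""
    brightest, darkest = max(r, g, b), min(r, g, b)
    # EPS keeps the division defined for pure black, where brightest == 0.
    return (brightest - darkest) / (brightest + EPS)


def intensity(r, g, b):
    """Mean of the RGB channels (an assumed definition)."""
    return (r + g + b) / 3


def is_low_saturation(pixel):
    """True for nearly colorless pixels, e.g. background or gray."""
    return saturation(*pixel) < SATURATION_THRESHOLD


def is_burnout(pixel):
    """True for overexposed, nearly white pixels."""
    return intensity(*pixel) > BURNOUT_INTENSITY


for name, pixel in {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "pink, stained-tissue-like": (200, 100, 160),
}.items():
    print(name, is_low_saturation(pixel), is_burnout(pixel))
```

Under these assumed definitions, the black and white pixels are flagged as low-saturation, the white pixel additionally as burnout, and the pink pixel passes both checks.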
Fig. 2 The process of creating annotations on the WSI. First, the local, free-hand annotations are marked on the WSI. Then, the annotations are exported as binary masks, which are processed into categorized patches with the described filtering, see Algorithm 1. (a) Screenshot from QuPath v0.1.2 from the viewpoint of the annotator. The surrounding cyan-blue line indicates the area that was annotated. The various smaller colored regions represent different labeled areas. A tissue area without any annotation is considered normal. (b) The exported binary (black-and-white) masks for the annotations. (c) Visualization of patches on a WSI. Patches that did not pass the quality filter are discarded, and patches that had more than half of their pixels labeled with an annotation are assigned to that category. For the example shown, 186 patches were kept. (d) A few examples of the resulting 512 × 512 pixel patches. Each patch covers a 248 μm × 248 μm area.
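The majority rule from panel (c) can be sketched as follows. This is an illustration under stated assumptions, not the authors' Algorithm 1: each category's binary mask is assumed to be already cropped to the patch and given as a nested list of 0/1 values, the function names are hypothetical, and falling back to "normal" when no category exceeds one half is inferred from the statement that unannotated tissue is considered normal.

```python
def labeled_fraction(mask):
    """Fraction of pixels set to 1 in a binary mask given as a list of rows."""
    total_pixels = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total_pixels


def assign_categories(masks_by_category):
    """Return every category whose mask covers more than half of the patch.

    Several categories can qualify at once when annotations overlap.
    The fallback to "normal" is an assumption based on the caption.
    """
    categories = [
        name
        for name, mask in masks_by_category.items()
        if labeled_fraction(mask) > 0.5
    ]
    return categories or ["normal"]


# Toy 4 x 4 "patch": 12 of 16 pixels are adenocarcinoma, 8 of 16 inflammation.
toy_masks = {
    "adenocarcinoma": [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]],
    "inflammation": [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]],
}
print(assign_categories(toy_masks))
print(assign_categories({}))
```

In the toy example, the inflammation mask covers exactly half of the patch and is therefore not assigned, because the rule requires more than half.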
Fig. 3 Predictions generated by a ResNet50 CNN with 5-fold cross-validation, calculated from patch-level data. All patches were handled independently. Left: precision and recall scores for the various local annotation categories. The outputs of the final sigmoid layer (the probability predictions) were converted to binary predictions with a 50% threshold. Right: receiver operating characteristic (ROC) curves for the same categories.
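The conversion from sigmoid outputs to the precision and recall scores in the left panel can be sketched for a single category as below. The toy labels and probabilities are made up for illustration and are not results from the paper. Two details are assumptions: a probability of exactly 0.5 is counted as positive, and a score with a zero denominator is reported as 0.0.

```python
def binarize(probabilities, threshold=0.5):
    """Convert sigmoid outputs to 0/1 predictions (>= threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN) for one category."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Returning 0.0 for an empty denominator is a convention chosen for this sketch.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up example: six patches, one category.
y_true = [1, 0, 1, 1, 0, 0]
probabilities = [0.9, 0.6, 0.4, 0.8, 0.1, 0.3]

y_pred = binarize(probabilities)
precision, recall = precision_recall(y_true, y_pred)
print(y_pred, round(precision, 3), round(recall, 3))
```

Since each category has its own sigmoid output, the same computation would be repeated independently per category, which is consistent with patches carrying multiple labels.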
| Measurement(s) | H&E slide staining |
| Technology Type(s) | Hematoxylin and Eosin Staining Method • bright-field microscopy • Observation • Biopsy of Colon |
| Factor Type(s) | screening status of colon cancer or normal tissue |
| Sample Characteristic - Organism | Homo sapiens |
| Sample Characteristic - Location | Central Hungary |