Zahra Mousavi Kouzehkanan1,2, Sepehr Saghari2,3, Sajad Tavakoli2,4, Peyman Rostami2,5, Mohammadjavad Abaszadeh1, Farzaneh Mirzadeh2,6, Esmaeil Shahabi Satlsar2,7, Maryam Gheidishahran8, Fatemeh Gorgi9, Saeed Mohammadi10, Reshad Hosseini11.
Abstract
Accurate and early detection of anomalies in peripheral white blood cells plays a crucial role in evaluating an individual's well-being and in the diagnosis and prognosis of hematologic diseases. For example, some blood disorders and immune system-related diseases are diagnosed by the differential count of white blood cells, which is a common laboratory test. Data is one of the most important ingredients in the development and testing of many successful commercial automatic or semi-automatic systems. To this end, this study introduces a free-access dataset of normal peripheral white blood cells called Raabin-WBC, containing about 40,000 images of white blood cells and color spots. To ensure the validity of the data, a significant number of cells were labeled by two experts. In addition, ground truths of the nucleus and cytoplasm were extracted for 1145 selected cells. To provide the necessary diversity, various smears were imaged using two different cameras and two different microscopes. We ran preliminary deep learning experiments on Raabin-WBC to demonstrate how the generalization power of machine learning methods, especially deep neural networks, can be affected by this diversity. As a public dataset in the field of health, Raabin-WBC can be used for model development and testing in different machine learning tasks, including classification, detection, segmentation, and localization.
Year: 2022 PMID: 35064165 PMCID: PMC8782871 DOI: 10.1038/s41598-021-04426-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Characteristics of white blood cells[4].
| WBCs | % In blood | Nucleus | Cytoplasm | Size (μm) |
|---|---|---|---|---|
| Neutrophils | 60% | It is divided into 2 to 5 segments and stains dark purple (multi-lobed) | It is pale pink to tan with pink-purple granules | 12–16 |
| Eosinophils | 3% | It is blue and is divided into 2 segments | It is pale pink-tan and full of large orange and red granules | 14–16 |
| Basophils | 1% | It has 2 lobes that each stain purple, and is difficult to see | It is pale pink-tan but contains large purple/blue-black granules which obscure the cell nucleus | 14–16 |
| Monocytes | 6% | It is singular and is kidney shaped (convoluted shape), bean shaped or horseshoe shaped with deep indentation | It stains a blue-gray color and is "ground glass" with tiny granules; vacuoles are sometimes present in it | 14–20 |
| Lymphocytes | 30% | It is large, round or oval, and is dark staining | It is not present or very small, and is pale blue in color, and occasionally has purple-reddish granules | 8–15 |
Figure 1. Five types of white blood cells in the normal peripheral blood.
White blood cell alterations and related diseases[5].
| White blood cell | Increase | Decrease |
|---|---|---|
| Lymphocyte | Acute and chronic leukemia, hypersensitivity reaction, viral infection | AIDS, influenza, sepsis, aplastic anemia |
| Monocyte | Autoimmune disease, fungal and protozoan infection | Aplastic anemia, hairy cell leukemia, acute infections |
| Neutrophil | Chronic inflammation, infection | Chediak-Higashi syndrome, Kostmann syndrome, autoimmune neutropenia |
| Eosinophil | Allergic reaction, parasitic infection, malignancy | Cushing syndrome, shock or trauma driven stress |
| Basophil | Leukemias | Hyperthyroidism and acute infections |
Raabin-WBC information table.
| Item | Count |
|---|---|
| Number of all films (smears) | 73 |
| Number of CML films | 1 |
| Number of normal-anemia films | 2 |
| Number of normal-eosinophilia films | 2 |
| Number of normal films | 68 |
| Number of microscopic large images | 20,936 |
| Number of bounding boxes (including WBCs and artifacts) | 40,763 |
| Number of WBCs labeled by no expert | 10,385 |
| Number of WBCs labeled by one expert | 4,971 |
| Number of WBCs labeled by two experts | 25,408 |
| Number of ground truths for lymphocytes (including nucleus and cytoplasm) | 242 |
| Number of ground truths for monocytes (including nucleus and cytoplasm) | 242 |
| Number of ground truths for neutrophils (including nucleus and cytoplasm) | 242 |
| Number of ground truths for eosinophils (including nucleus and cytoplasm) | 201 |
| Number of ground truths for basophils (whole cell) | 218 |
Figure 2. Diagram of labels in the Raabin-WBC dataset.
The number of labels assigned by the two experts.
| First expert (rows) / second expert (columns) | Artifact | Band | Basophil | Burst | Eosinophil | Large lymph | Meta | Monocyte | Neutrophil | Small lymph | Not recognized | Not labeled |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Artifact | 3489 | 0 | 0 | 14 | 2 | 1 | 0 | 0 | 6 | 4 | 96 | 225 |
| Band | 0 | 311 | 0 | 2 | 0 | 0 | 2 | 0 | 32 | 0 | 16 | 71 |
| Basophil | 0 | 0 | 308 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |
| Burst | 29 | 0 | 0 | 2673 | 1 | 11 | 0 | 4 | 1 | 32 | 96 | 525 |
| Eosinophil | 0 | 0 | 0 | 9 | 1466 | 0 | 0 | 0 | 1 | 1 | 13 | 607 |
| Large lymph | 0 | 0 | 0 | 1 | 0 | 2153 | 0 | 4 | 1 | 172 | 23 | 163 |
| Meta | 0 | 0 | 0 | 1 | 0 | 0 | 11 | 1 | 2 | 0 | 6 | 12 |
| Monocyte | 0 | 0 | 0 | 2 | 0 | 24 | 0 | 874 | 1 | 0 | 36 | 104 |
| Neutrophil | 0 | 134 | 0 | 29 | 3 | 1 | 0 | 2 | 11,726 | 1 | 109 | 1078 |
| Small lymph | 1 | 0 | 0 | 1 | 0 | 31 | 0 | 0 | 0 | 1833 | 20 | 370 |
| Not recognized | 65 | 5 | 0 | 9 | 5 | 744 | 10 | 81 | 127 | 332 | 1099 | 268 |
| Not labeled | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 9 | 4 | 3 | 9015 |
The rows and columns belong to the first and second experts, respectively.
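As a reading aid for the table above, the following minimal sketch computes the raw agreement and Cohen's kappa between the two experts, restricted to the boxes where both experts assigned one of the ten classes (that is, excluding "Not recognized" and "Not labeled"). Cohen's kappa is not reported in the source; it is used here only as one common way to summarize such a table. The names `COUNTS`, `N_CLASSES`, and `agreement` are illustrative.

```python
# Counts transcribed from the table above.
# Rows: first expert, columns: second expert, in the table's label order:
# Artifact, Band, Basophil, Burst, Eosinophil, Large lymph, Meta,
# Monocyte, Neutrophil, Small lymph, Not recognized, Not labeled.
COUNTS = [
    [3489, 0, 0, 14, 2, 1, 0, 0, 6, 4, 96, 225],
    [0, 311, 0, 2, 0, 0, 2, 0, 32, 0, 16, 71],
    [0, 0, 308, 0, 0, 0, 0, 0, 0, 0, 2, 0],
    [29, 0, 0, 2673, 1, 11, 0, 4, 1, 32, 96, 525],
    [0, 0, 0, 9, 1466, 0, 0, 0, 1, 1, 13, 607],
    [0, 0, 0, 1, 0, 2153, 0, 4, 1, 172, 23, 163],
    [0, 0, 0, 1, 0, 0, 11, 1, 2, 0, 6, 12],
    [0, 0, 0, 2, 0, 24, 0, 874, 1, 0, 36, 104],
    [0, 134, 0, 29, 3, 1, 0, 2, 11726, 1, 109, 1078],
    [1, 0, 0, 1, 0, 31, 0, 0, 0, 1833, 20, 370],
    [65, 5, 0, 9, 5, 744, 10, 81, 127, 332, 1099, 268],
    [5, 0, 0, 0, 0, 1, 0, 0, 9, 4, 3, 9015],
]

# Assumption: the first ten labels are actual classes; the last two mean
# that an expert did not assign a class.
N_CLASSES = 10


def agreement(matrix):
    """Return (total, observed agreement, Cohen's kappa) for a square count matrix."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the two experts' marginal label frequencies.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (observed - expected) / (1 - expected)
    return total, observed, kappa


# Keep only the boxes to which both experts assigned a class.
both_labeled = [row[:N_CLASSES] for row in COUNTS[:N_CLASSES]]
n_both, p_observed, kappa = agreement(both_labeled)
print(f"boxes labeled by both experts: {n_both}")
print(f"observed agreement: {p_observed:.4f}, Cohen's kappa: {kappa:.4f}")
```

The number of boxes in the restricted block equals the 25,408 double-labeled cells listed in the Raabin-WBC information table, which suggests that this block is the set from which the double-labeled subset is drawn.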
Figure 5. An example of two overlapped microscopic images.
Figure 3. Main steps of the Raabin-WBC dataset collection.
Specifications of the smartphone cameras used for data collection.
| Smartphone | Release date | Sensor model | Sensor type | No. of pixels | Aperture | Sensor size | Pixel size |
|---|---|---|---|---|---|---|---|
| Samsung Galaxy S5 | 2014 | Samsung S5K2P2XX ISOCELL | CMOS | 16 MP | f/2.2 31 mm | 1/2.6" | 1.12 µm |
| LG G3 | 2014 | Sony IMX135 Exmor RS | CMOS | 13 MP | f/2.4 29 mm | 1/3" | 1.12 µm |
Figure 4. Adapter designed to mount smartphones on the ocular lens of a microscope, making it quicker and easier to photograph the samples. Experts operate the microscope manually, view the image on the mounted smartphone, and take photos.
Figure 6. One sample that was repeated three times.
Figure 7. The user interface of the two Android applications designed for selecting and labeling the white blood cells.
Figure 8. The user interface of the desktop application designed for labeling white blood cells.
Figure 9. Some samples of ground truths provided in the Raabin-WBC dataset. The first row contains the original cropped images of white blood cells. The second row contains the ground truths of some nuclei and cytoplasm. Columns (a), (b), (c), (d), and (e) show lymphocyte, monocyte, neutrophil, eosinophil, and basophil, respectively.
Figure 10. The user interface of the Easy-GT software[41]. This software was developed for extracting the ground truths of nuclei in white blood cells.
Comparison of some datasets with double-labeled Raabin-WBC.
| Dataset | No. of WBCs: Lymph | No. of WBCs: Mono | No. of WBCs: Neut | No. of WBCs: Eos | No. of WBCs: Bas | No. of WBCs: Total | Access | Staining | Microscope and zoom | Camera | Label | Ground truths: nucleus | Ground truths: cytoplasm or whole cell |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LISC[ | 59 | 55 | 56 | 42 | 54 | 266 | Public | Gismo-right | Axioskope40; zoom: 100X | Sony-SSCDC50AP | One expert | 266 | 266 |
| BCCD[ | 33 | 19 | 208 | 86 | 3 | 349 | Public | Gismo-right | Regular light microscope; zoom: 100X | CCD color camera | One expert | ⨉ | ⨉ |
| Hegde et al.[ | 33 | 23 | 30 | 22 | 14 | 122 | Private | Leishman | OLYMPUS CX31; zoom: 100X | N/A | One expert | 122 | ⨉ |
| MISP[ | 36 | 33 | 38 | 42 | 0 | 149 | Public | N/A | Canon optical microscope; zoom: 100X | Canon V1 | One expert | ⨉ | ⨉ |
| ALL-IDB[ | 60 | 3 | 18 | 2 | 1 | 84 | Public | N/A | N/A; zoom: 300X–500X | Canon PowerShot G5 | N/A | ⨉ | ⨉ |
| Zheng et al.[ (CellaVision) | 37 | 18 | 30 | 12 | 3 | 100 | Public | N/A | N/A; zoom: N/A | N/A | One expert | 100 | 100 |
| Zheng et al.[ | 53 | 48 | 176 | 22 | 1 | 300 | Public | A newly developed method[ | N800-D motorized autofocus; zoom: N/A | Motic moticam pro 252A | One expert | 300 | 300 |
| Double-labeled Raabin-WBC | 3609 | 795 | 10,862 | 1066 | 301 | 17,965 | Public | Giemsa | 1. Olympus Cx18 2. Zeiss microscope; zoom: 100 | 1. Camera phone Samsung Galaxy S5 2. Camera phone LG G3 | Two experts | 1145 | 1145 |
Double-labeled Raabin-WBC excludes repeated samples and includes only the five general cell types (lymphocyte, monocyte, neutrophil, eosinophil, and basophil).
The number of samples in training data, test-A, and test-B.
| Sets | Lymph | Mono | Neut | Eos | Bas |
|---|---|---|---|---|---|
| Training data | 2427 | 561 | 6231 | 744 | 212 |
| Test-A | 1034 | 234 | 2660 | 322 | 89 |
| Test-B | 148 | 0 | 1971 | 0 | 0 |
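The split table above can be checked against the per-class counts of double-labeled Raabin-WBC in the comparison table, and the class balance of each set can be read off directly. The sketch below does both; the names `SPLITS`, `class_shares`, and `per_class_total` are illustrative and not part of the dataset's tooling.

```python
CLASSES = ["Lymph", "Mono", "Neut", "Eos", "Bas"]

# Sample counts per class, copied from the split table above.
SPLITS = {
    "Training data": [2427, 561, 6231, 744, 212],
    "Test-A": [1034, 234, 2660, 322, 89],
    "Test-B": [148, 0, 1971, 0, 0],
}


def class_shares(counts):
    """Return each class's fraction of the samples in one set."""
    total = sum(counts)
    return [c / total for c in counts]


# Per-class totals over the three sets.
per_class_total = [sum(col) for col in zip(*SPLITS.values())]

for name, counts in SPLITS.items():
    shares = ", ".join(
        f"{cls}: {share:.1%}" for cls, share in zip(CLASSES, class_shares(counts))
    )
    print(f"{name} (n={sum(counts)}): {shares}")
print("per-class totals:", dict(zip(CLASSES, per_class_total)))
```

The per-class totals reproduce the lymphocyte, monocyte, neutrophil, eosinophil, and basophil counts listed for double-labeled Raabin-WBC, and the output makes it visible that Test-B contains only lymphocytes and neutrophils.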
The results of different pre-trained models as well as Tavakoli et al.[50] on the test-A dataset.
| Method | Lymph P (%) | Lymph S (%) | Lymph F1 (%) | Mono P (%) | Mono S (%) | Mono F1 (%) | Neut P (%) | Neut S (%) | Neut F1 (%) | Eosi P (%) | Eosi S (%) | Eosi F1 (%) | Baso P (%) | Baso S (%) | Baso F1 (%) | Acc (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ResNet18[ | 98.84 | 99.23 | 99.08 | 96.96 | 95.30 | 96.12 | 99.66 | 99.36 | 99.5 | 95.77 | 98.45 | 97.09 | 100 | 100 | 100 | 99.06 |
| ResNet34[ | 98.66 | 99.52 | 99.09 | 96.93 | 94.44 | 95.67 | 99.85 | 99.14 | 99.49 | 94.67 | 99.38 | 96.97 | 100 | 100 | 100 | 99.01 |
| ResNet50[ | 98.47 | 99.52 | 98.99 | 97.36 | 94.44 | 95.88 | 99.74 | 99.40 | 99.57 | 96.94 | 98.45 | 97.69 | 100 | 100 | 100 | 99.10 |
| ResNext50[ | 98.01 | 100 | 98.99 | 99.53 | 91.45 | 95.32 | 99.74 | 99.55 | 99.64 | 97.85 | 98.76 | 98.30 | 100 | 100 | 100 | 99.17 |
| MnasNet1[ | 98.93 | 98.65 | 98.79 | 94.14 | 96.15 | 95.14 | 99.66 | 98.76 | 99.21 | 92.15 | 98.45 | 95.20 | 100 | 100 | 100 | 98.59 |
| MobileNet-V2[ | 98.85 | 99.32 | 99.08 | 96.93 | 94.44 | 95.67 | 99.66 | 99.40 | 99.53 | 96.97 | 99.38 | 98.16 | 100 | 100 | 100 | 99.12 |
| DenseNet121[ | 98.38 | 99.61 | 98.99 | 97.72 | 91.45 | 94.48 | 99.70 | 99.14 | 99.42 | 94.40 | 99.38 | 96.82 | 100 | 100 | 100 | 98.87 |
| ShuffleNet-V2[ | 98.09 | 99.32 | 98.70 | 96.49 | 94.02 | 95.24 | 99.74 | 99.36 | 99.55 | 97.85 | 98.76 | 98.30 | 100 | 100 | 100 | 99.03 |
| VGG16[ | 98.26 | 98.26 | 98.26 | 95.09 | 91.03 | 93.01 | 99.43 | 98.57 | 99 | 89.01 | 98.14 | 93.35 | 100 | 100 | 100 | 98.09 |
| Tavakoli et al.[ | 97.23 | 95.07 | 96.14 | 84.87 | 86.32 | 85.59 | 98 | 95.60 | 96.78 | 72.24 | 91.30 | 80.66 | 96.59 | 95.51 | 96.05 | 94.65 |
The results of different pre-trained models as well as Tavakoli et al.[50] on the test-B dataset.
| Method | Lymph P (%) | Lymph S (%) | Lymph F1 (%) | Neut P (%) | Neut S (%) | Neut F1 (%) | Acc (%) |
|---|---|---|---|---|---|---|---|
| ResNet18[ | 21 | 94 | 34 | 100 | 2 | 3 | 8 |
| ResNet34[ | 24 | 94 | 38 | 100 | 27 | 43 | 32 |
| ResNet50[ | 21 | 95 | 35 | 100 | 2 | 4 | 8 |
| ResNext50[ | 24 | 90 | 38 | 100 | 3 | 5 | 9 |
| MnasNet1[ | 20 | 88 | 33 | 100 | 0 | 0 | 6 |
| MobileNet-V2[ | 72 | 50 | 59 | 100 | 0 | 0 | 4 |
| DenseNet121[ | 43 | 64 | 51 | 100 | 8 | 15 | 12 |
| ShuffleNet-V2[ | 42 | 75 | 54 | 100 | 0 | 0 | 5 |
| VGG16[ | 96 | 89 | 93 | 100 | 65 | 79 | 66 |
| Tavakoli et al.[ | 94 | 54 | 69 | 100 | 92 | 96 | 90 |
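To help read the P, S, and F1 columns, the sketch below assumes that P denotes precision, S denotes sensitivity (recall), and F1 is their harmonic mean; the source does not spell out these definitions in this section. The function names are illustrative, and the counts in the last example are hypothetical, chosen only to show how a class can reach 100% precision while its sensitivity stays near zero.

```python
def precision(tp, fp):
    """Fraction of predictions for a class that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def sensitivity(tp, fn):
    """Fraction of a class's true samples that are found (recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, s):
    """Harmonic mean of precision and sensitivity (same units as the inputs)."""
    return 2 * p * s / (p + s) if (p + s) else 0.0


# F1 recomputed from the rounded P and S of the Tavakoli et al. row above.
print(round(f1_score(94, 54)), round(f1_score(100, 92)))

# Hypothetical counts: a model labels only 40 of 1971 neutrophils as
# neutrophils and never assigns that label to any other cell.
tp, fp, fn = 40, 0, 1931
p, s = precision(tp, fp), sensitivity(tp, fn)
print(f"P = {p:.1%}, S = {s:.1%}, F1 = {f1_score(p, s):.1%}")
```

Under these assumptions, a neutrophil precision of 100% combined with very low sensitivity means the model rarely predicts the class at all, which is consistent with the low overall accuracy of most models on test-B.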
Figure 11. Accuracy and loss curves on the training and validation data for the nine pre-trained models.