Rianda-Putra Firdani, Mohy Uddin, Shabbir Syed-Abdul, Hee-Jung Chung, Mina Hur, Jae Hyeon Park, Hyung Woo Kim, Anton Gradišek, Erik Dovgan.
Abstract
Cell Population Data (CPD) provides various blood cell parameters that can be used for differential diagnosis. Data analytics using Machine Learning (ML) has played a pivotal role in advancing medical diagnostics. This research presents a novel approach that uses ML algorithms to screen for hematologic malignancies from CPD. The data were collected at Konkuk University Medical Center, Seoul. A total of 882 cases (457 hematologic malignancy and 425 hematologic non-malignancy) were used for analysis. Seven machine learning models were evaluated: SGD, SVM, RF, DT, Linear model, Logistic regression, and ANN. Model performance was measured with stratified 10-fold cross validation using accuracy, precision, recall, and AUC. The ANN model outperformed the other ML models, achieving the highest accuracy, precision, recall, and AUC ± Standard Deviation: 82.8%, 82.8%, 84.9%, and 93.5% ± 2.6, respectively. An ANN algorithm based on CPD appears to be an efficient aid for clinical laboratory screening of hematologic malignancies. Our results encourage further work on applying ML to a wider field of clinical practice.
Year: 2020 PMID: 32179774 PMCID: PMC7075908 DOI: 10.1038/s41598-020-61247-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Case numbers analyzed in the study (after preprocessing).
| ICD-10 Code | Type | Cases | Group |
|---|---|---|---|
| C81-C96 | Malignant neoplasms of lymphoid, haematopoietic and related tissue | 457 | Hematologic – Malignancies |
| D50-D53 | Nutritional anemia | 49 | Hematologic – Non Malignancies |
| D55-D59 | Haemolytic anemia | 6 | Hematologic – Non Malignancies |
| D60-D64 | Aplastic and other anemia | 166 | Hematologic – Non Malignancies |
| D65-D69 | Coagulation defects, purpura and other hemorrhagic conditions | 83 | Hematologic – Non Malignancies |
| D70-D77 | Other diseases of blood and blood-forming organs | 121 | Hematologic – Non Malignancies |
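The stratified 10-fold cross validation mentioned in the abstract can be illustrated on the class counts above. This is a minimal sketch only: the helper `stratified_folds`, the round-robin assignment, and the random seed are assumptions for illustration, not the authors' implementation.

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

# Class counts from the table above: 457 malignant (1), 425 non-malignant (0).
labels = [1] * 457 + [0] * 425
folds = stratified_folds(labels, k=10)

for i, test_idx in enumerate(folds):
    n_pos = sum(labels[j] for j in test_idx)
    print(f"fold {i}: {len(test_idx)} cases, {n_pos} malignant")
```

Each fold serves once as the test set while the other nine are used for training, so every case is scored exactly once.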
CPD variables selected based on point-biserial correlation.
| Abbreviation | Name | Absolute Correlation |
|---|---|---|
| P-LCC | Platelet-large cell count | 0.351 |
| PCT | Plateletcrit | 0.336 |
| PLT | optical impedance | 0.321 |
| PLT-I | Platelet count- Impedance | 0.320 |
| InR‰ | Infected RBC percentage | 0.297 |
| Age | Age | 0.282 |
| Gender | Gender | 0.231 |
| HFC% | High fluorescent Cell percentage | 0.223 |
| Neu-BF% | Neutrophils percentage -body fluid | 0.210 |
| H-NR% | High forward scatter NRBC ratio | 0.198 |
| PLR | Platelet-to-lymphocyte ratio | 0.188 |
| Neu-BF# | Neutrophils Number -body fluid | 0.186 |
| HF-BF# | High Fluorescent cell Number -body fluid | 0.181 |
| NLR | Neutrophil-to-lymphocyte ratio | 0.181 |
| L-NR% | Low forward scatter NRBC ratio | 0.179 |
| Mon% | Monocytes percentage | 0.168 |
| MO-BF% | Monocytes percentage- body fluid | 0.166 |
| LY-BF% | Lymphocytes percentage- body fluid | 0.157 |
| Eos-BF# | Eosinophils number -body fluid | 0.152 |
| RDW-CV | Red Blood Cell Distribution Width Coefficient of Variation | 0.149 |
| IMG% | Immature Granulocyte percentage | 0.146 |
| Micro# | RBC microcyte Cell Number | 0.143 |
| Micro% | RBC microcyte Cell percentage | 0.142 |
| RDW-SD | Red Blood Cell Distribution Width Standard Deviation | 0.141 |
| Macro# | RBC macrocyte Cell Number | 0.130 |
| HCT | Hematocrit | 0.128 |
| IME% | Immature eosinophil percentage | 0.114 |
| HGB | Hemoglobin Concentration | 0.110 |
| MCHC | Mean Corpuscular Hemoglobin Concentration | 0.100 |
| RBC | Red Blood Cell count | 0.098 |
| Macro% | RBC macrocyte Cell percentage | 0.096 |
| Lym# | Lymphocytes number | 0.095 |
| MPV | Mean Platelet Volume | 0.093 |
| MCV | Mean Corpuscular volume | 0.091 |
| LY-BF# | Lymphocytes number- body fluid | 0.090 |
| Bas% | Basophils percentage | 0.089 |
| MO-BF# | Monocytes number- body fluid | 0.084 |
| P-LCR | Platelet-large cell ratio | 0.075 |
| Eos-BF% | Eosinophils percentage -body fluid | 0.064 |
| NRBC# | Nucleated red blood cell number | 0.059 |
| NRBC% | Nucleated red blood cell percentage | 0.057 |
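The point-biserial correlation used to rank the variables above measures the association between a numeric variable and a binary label. The following is a minimal sketch of the standard formula; the function name `point_biserial` and the toy values are hypothetical and are not taken from the study data.

```python
import math

def point_biserial(values, groups):
    """Point-biserial correlation between a numeric variable and a 0/1 label."""
    ones = [v for v, g in zip(values, groups) if g == 1]
    zeros = [v for v, g in zip(values, groups) if g == 0]
    n, n1, n0 = len(values), len(ones), len(zeros)
    mean_all = sum(values) / n
    # Population standard deviation of the whole sample.
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    mean1 = sum(ones) / n1
    mean0 = sum(zeros) / n0
    return (mean1 - mean0) / sd * math.sqrt(n1 * n0 / n ** 2)

# Hypothetical toy data (not from the study): one numeric variable, one 0/1 label.
toy_values = [210.0, 180.0, 95.0, 60.0, 250.0, 40.0]
toy_groups = [0, 0, 1, 1, 0, 1]

r = point_biserial(toy_values, toy_groups)
print(round(abs(r), 3))  # the table reports absolute correlations
```

The result is numerically identical to the Pearson correlation computed with the label coded as 0 and 1, which is why the absolute value can be used to rank variables regardless of direction.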
Demographic population distribution.
| Age | Malignancies: Female (%) | Malignancies: Male (%) | Non-Malignancies: Female (%) | Non-Malignancies: Male (%) | Total (%) |
|---|---|---|---|---|---|
| <18 (Children) | 0 (0) | 3 (0.34) | 1 (0.11) | 1 (0.11) | 5 (0.57) |
| 18–64 (Adults) | 124 (14.06) | 207 (23.47) | 152 (17.23) | 63 (7.14) | 546 (61.90) |
| 65 + (Elderly) | 56 (6.35) | 67 (7.60) | 113 (12.81) | 95 (10.77) | 331 (37.53) |
| Total | 180 (20.41) | 277 (31.41) | 266 (30.16) | 159 (18.03) | 882 (100) |
Breakdown of the disease groups in the dataset by ICD code.
| Group | ICD code | Disease category | Frequency | Percentage |
|---|---|---|---|---|
| Malignancies | C81 | Hodgkin lymphoma | 9 | 1.02% |
| Malignancies | C82 | Follicular lymphoma | 1 | 0.11% |
| Malignancies | C83 | Non-follicular lymphoma | 50 | 5.67% |
| Malignancies | C84 | Mature T/NK-cell lymphomas | 10 | 1.13% |
| Malignancies | C85 | Other specified and unspecified types of non-Hodgkin lymphoma | 28 | 3.17% |
| Malignancies | C86 | Other specified types of T/NK-cell lymphoma | 22 | 2.49% |
| Malignancies | C88 | Malignant immunoproliferative diseases and certain other B-cell lymphomas | 15 | 1.70% |
| Malignancies | C90 | Multiple myeloma and malignant plasma cell neoplasms | 20 | 2.27% |
| Malignancies | C91 | Lymphoid leukemia | 73 | 8.28% |
| Malignancies | C92 | Myeloid leukemia | 177 | 20.07% |
| Malignancies | C94 | Other leukemias of specified cell type | 13 | 1.47% |
| Malignancies | C95 | Leukemia of unspecified cell type | 38 | 4.31% |
| Malignancies | C96 | Other and unspecified malignant neoplasms of lymphoid, hematopoietic and related tissue | 1 | 0.11% |
| Non-Malignancies | D50-D53 | Nutritional anaemias | 49 | 5.56% |
| Non-Malignancies | D55-D59 | Haemolytic anaemias | 6 | 0.68% |
| Non-Malignancies | D60-D64 | Aplastic and other anaemias | 166 | 18.82% |
| Non-Malignancies | D65-D69 | Coagulation defects, purpura and other haemorrhagic conditions | 83 | 9.41% |
| Non-Malignancies | D70-D77 | Other diseases of blood and blood-forming organs | 121 | 13.72% |
| Malignancies (subtotal) | C81-C96 | | 457 | 51.81% |
| Non-Malignancies (subtotal) | D50-D77 | | 425 | 48.19% |
| Total | | | 882 | 100% |
Number of predictor variables at each variable-selection threshold, with the model achieving the highest AUC and recall.
| Correlation threshold | Number of predictor variables | Model | AUC % (± Standard Deviation) | Recall (%) |
|---|---|---|---|---|
| All Variables | 61 | ANN | 93.9 ± 3 | 84.2 |
| >0.05 | 41 | ANN | 93.5 ± 3 | 84.9 |
| >0.1 | 29 | ANN | 92.8 ± 3 | 83.6 |
| >0.15 | 19 | ANN | 90.7 ± 5 | 82.8 |
| >0.20 | 9 | ANN | 87.7 ± 5 | 79.1 |
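The threshold-based selection summarized above can be sketched as a simple filter on absolute correlation. The snippet uses only a ten-variable subset of the values listed in the variable table, and the helper `select_variables` is a hypothetical name, so the counts it prints are for this subset and do not reproduce the totals in the table.

```python
# A subset of absolute correlations copied from the variable table above.
abs_corr = {
    "P-LCC": 0.351, "PCT": 0.336, "Age": 0.282, "HFC%": 0.223,
    "PLR": 0.188, "Mon%": 0.168, "RDW-CV": 0.149, "HCT": 0.128,
    "MPV": 0.093, "NRBC%": 0.057,
}

def select_variables(correlations, threshold):
    """Keep variables whose absolute correlation exceeds the threshold."""
    return [name for name, r in correlations.items() if abs(r) > threshold]

for threshold in (0.05, 0.10, 0.15, 0.20):
    kept = select_variables(abs_corr, threshold)
    print(f"> {threshold:.2f}: {len(kept)} of {len(abs_corr)} kept")
```

Raising the threshold shrinks the predictor set, which matches the trade-off in the table: fewer variables, at the cost of lower AUC and recall.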
Figure 1. AUC obtained with ML models when applying variable selection with the threshold of 0.05.
Model performance indicators when applying variable selection with the threshold of 0.05.
| Model | AUC ± Standard Deviation | Accuracy | Precision | Recall |
|---|---|---|---|---|
| Stochastic Gradient Descent (SGD) | 0.823 ± 0.040 | 0.699 | 0.746 | 0.710 |
| Support Vector Machine (SVM) | 0.792 ± 0.035 | 0.716 | 0.719 | 0.744 |
| Decision Tree (DT) | 0.782 ± 0.039 | 0.728 | 0.745 | 0.722 |
| Random Forest (RF) | 0.859 ± 0.027 | 0.778 | 0.803 | 0.764 |
| Linear Regression (LINEAR), adapted | 0.802 ± 0.019 | 0.721 | 0.726 | 0.742 |
| Logistic Regression (LOGIT) | 0.822 ± 0.034 | 0.725 | 0.741 | 0.724 |
| Artificial Neural Network (ANN) | 0.935 ± 0.026 | 0.828 | 0.828 | 0.849 |
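The four indicators in the table above follow standard definitions. As a minimal sketch, they can be computed from true labels, predicted labels, and predicted scores as shown here; the function names, the 0.5 decision cut-off, and the toy data are assumptions for illustration and do not come from the study.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp),  # assumes at least one positive prediction
        "recall": tp / (tp + fn),     # assumes at least one positive case
    }

def auc_from_scores(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy predictions (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

metrics = classification_metrics(y_true, y_pred)
auc = auc_from_scores(y_true, scores)
print(metrics, auc)
```

In a screening setting, recall is the share of malignant cases that the model flags, which is why it is reported alongside AUC.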
p values for testing the hypothesis that each pair of algorithms performs similarly.
| | SVM | DT | RF | LINEAR | LOGIT | ANN |
|---|---|---|---|---|---|---|
| SGD | 0.329 | 0.497 | 0.238 | 0.577 | 0.773 | 0.019 |
| SVM | | 0.187 | 0.051 | 0.123 | 0.165 | 0.010 |
| DT | | | 0.161 | 0.304 | 0.892 | 0.002 |
| RF | | | | 0.099 | 0.104 | 0.000 |
| LINEAR | | | | | 0.507 | 0.010 |
| LOGIT | | | | | | 0.005 |
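The pairwise p values above can be scanned programmatically for significant differences. This sketch makes two assumptions: the row order follows the model order of the performance table (SGD, SVM, DT, RF, LINEAR, LOGIT), and the significance level is 0.05, which the source does not state.

```python
models = ["SGD", "SVM", "DT", "RF", "LINEAR", "LOGIT", "ANN"]

# Upper-triangular p values from the table above, one row per first model.
# Row labels are assumed to follow the order of the performance table.
rows = [
    [0.329, 0.497, 0.238, 0.577, 0.773, 0.019],
    [0.187, 0.051, 0.123, 0.165, 0.010],
    [0.161, 0.304, 0.892, 0.002],
    [0.099, 0.104, 0.000],  # 0.000 as printed in the table
    [0.507, 0.010],
    [0.005],
]

p_values = {}
for i, row in enumerate(rows):
    for offset, p in enumerate(row):
        p_values[(models[i], models[i + 1 + offset])] = p

alpha = 0.05  # assumed significance level, not stated in the source
significant = [pair for pair, p in p_values.items() if p < alpha]

for first, second in significant:
    print(f"{first} vs {second}: p = {p_values[(first, second)]:.3f}")
```

Under these assumptions, the only pairs below the 0.05 level are the six comparisons involving ANN, consistent with the abstract's statement that ANN outperformed the other models.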