V Laengsri, W Shoombuatong, W Adirojananon, C Nantasenamat, V Prachayasittikul, P Nuchnoi.
Abstract
BACKGROUND: The hypochromic microcytic anemias (HMA) commonly found in Thailand are iron deficiency anemia (IDA) and thalassemia trait (TT). Accurate discrimination between IDA and TT is clinically important, and better methods are urgently needed. Although numerous RBC formulas and indices with various optimal cut-off values have been developed, distinguishing IDA from TT remains challenging because anemic populations are diverse. To address this problem, it is desirable to develop an improved and automated prediction model for discriminating IDA from TT.
Keywords: Decision making; Discrimination; Iron deficiency anemia; Machine learning; Random forest; Support vector machine; Thalassemia trait
Year: 2019 PMID: 31699079 PMCID: PMC6836478 DOI: 10.1186/s12911-019-0929-2
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
List of laboratory test abbreviations used in this study
| Full Name | Abbreviation |
|---|---|
| Hemoglobin | Hb |
| Hematocrit | Hct |
| Hypochromic microcytic anemia | HMA |
| Iron deficiency anemia | IDA |
| Mean corpuscular volume | MCV |
| Mean corpuscular hemoglobin | MCH |
| Mean corpuscular hemoglobin concentration | MCHC |
| Red blood cell | RBC |
| Red blood cell distribution width | RDW |
| Thalassemia trait | TT |
Fig. 1 The workflow of the ThalPred computational model for discriminating IDA from TT and providing a set of interpretable rules
The age and red blood cell parameters of study subjects with thalassemia trait or iron deficiency anemia
| Parameters | TT | IDA | p-value |
|---|---|---|---|
| Age (yrs.) | 37.79 ± 7.86 (18.00–50.00) | 39.15 ± 9.61 (23.00–58.00) | 0.435 |
| RBC (10⁶/μL) | 5.32 ± 0.48 (4.35–6.83) | 4.03 ± 0.96 (1.69–5.77) | < 0.001* |
| Hb (g/dL) | 11.99 ± 1.11 (8.10–14.60) | 7.97 ± 2.26 (2.50–10.90) | < 0.001* |
| Hct (%) | 36.46 ± 4.10 (34.10–42.40) | 26.19 ± 6.63 (9.90–34.30) | < 0.001* |
| MCV (fL) | 69.49 ± 6.14 (52.30–79.70) | 65.12 ± 9.32 (48.70–81.00) | 0.010* |
| MCH (pg) | 22.71 ± 2.25 (17.70–26.60) | 19.65 ± 3.33 (12.50–25.70) | < 0.001* |
| MCHC (%) | 32.67 ± 1.14 (30.10–35.70) | 30.13 ± 1.93 (24.90–35.70) | < 0.001* |
| RDW (%) | 15.88 ± 1.13 (13.50–22.00) | 20.48 ± 3.23 (14.90–26.70) | < 0.001* |
The data are shown as mean ± standard deviation
Hb Hemoglobin; Hct Hematocrit; IDA Iron deficiency anemia; MCH mean corpuscular hemoglobin; MCHC mean corpuscular hemoglobin concentration; MCV mean corpuscular volume; RBC red blood cell count; RDW red blood cell distribution width; TT Thalassemia trait
* Mann-Whitney U test p-value < 0.05
Fig. 2 Multivariate analysis using principal component analysis (PCA) of our laboratory data consisting of 146 TT cases (red circles) and 40 IDA cases (blue circles), shown as PCA scores (a) and loadings (b) plots
Performance comparisons of existing discriminant formulas and indices proposed for differentiation of iron deficiency anemia from thalassemia trait
| Indices/ formulas | Cut-off | Ac (%) | Sn (%) | Sp (%) | MCC | YI | AUC |
|---|---|---|---|---|---|---|---|
| BI = RDW | 15 | 36.02 | 19.18 | 97.50 | 0.19 | 0.17 | 0.71 |
| EF = MCV − 10 × RBC | 15 | 52.15 | 45.89 | 75.00 | 0.17 | 0.21 | 0.70 |
| E&F = MCV − RBC − 5 × Hb − 6.4 | 0 | 67.42 | 60.95 | 92.50 | 0.44 | 0.54 | 0.91 |
| G&K = MCV² × RDW / (100 × Hb) | 72 | 84.95 | 82.88 | 92.50 | 0.66 | 0.75 | 0.93 |
| MI = MCV / RBC | 13 | 54.30 | 47.95 | 77.50 | 0.21 | 0.25 | 0.75 |
| RDWI = MCV × RDW / RBC | 220 | 67.20 | 60.96 | 90.00 | 0.42 | 0.51 | 0.92 |
| RI = RDW / RBC | 3.3 | 87.63 | 86.30 | 92.50 | 0.70 | 0.79 | 0.98 |
| S&L = MCV² × MCH / 100 | 1530 | 74.19 | 92.47 | 7.50 | −0.01 | 0.00 | 0.31 |
| SI = MCV − RBC − 3 × Hb | 27 | 53.76 | 45.21 | 85.00 | 0.26 | 0.302 | 0.80 |
| SF = MCH / RBC | 3.8 | 37.63 | 26.71 | 77.50 | 0.04 | 0.04 | 0.68 |
| SiF = 1.5 × Hb − 0.05 × MCV | 14 | 24.73 | 31.51 | 0.00 | −0.56 | −0.69 | 0.02 |
| KF1 = RBC / Hct + 0.5 × RDW | 8.2 | 70.97 | 64.38 | 95.00 | 0.49 | 0.59 | 0.93 |
| KF2 = 5 × RDW / RBC | 16.8 | 89.79 | 89.04 | 92.50 | 0.74 | 0.82 | 0.98 |
Ac Accuracy; AUC Area under the receiver operating characteristic curve; Hb Hemoglobin; Hct Hematocrit; IDA Iron deficiency anemia; MCC Matthews correlation coefficient; MCH mean corpuscular hemoglobin; MCV mean corpuscular volume; RBC red blood cell count; RDW red blood cell distribution width; Sn Sensitivity; Sp Specificity; TT Thalassemia trait; YI Youden's index
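The Ac, Sn, Sp, MCC and YI columns follow from a 2 × 2 confusion matrix. The sketch below is a minimal illustration, not the authors' code. The counts passed to it are an assumption: they were back-calculated from the RI row by taking TT as the positive class and using the 146 TT / 40 IDA split given in the Fig. 2 caption. Under those assumptions the rounded outputs match the RI row (87.63, 86.30, 92.50, 0.70, 0.79).

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute Ac, Sn, Sp (percentages), MCC and Youden's index from counts."""
    total = tp + fn + tn + fp
    sn = tp / (tp + fn)           # sensitivity: fraction of positives detected
    sp = tn / (tn + fp)           # specificity: fraction of negatives detected
    ac = (tp + tn) / total        # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "Ac": 100 * ac,
        "Sn": 100 * sn,
        "Sp": 100 * sp,
        "MCC": mcc,
        "YI": sn + sp - 1,        # Youden's index
    }


# Assumed counts (not reported in the table): TT as positive class,
# 146 TT and 40 IDA cases, reconstructed from the RI row.
ri_metrics = binary_metrics(tp=126, fn=20, tn=37, fp=3)
print({name: round(value, 2) for name, value in ri_metrics.items()})
```

Because MCC and YI use both error types, they are less flattered by the class imbalance than accuracy alone, which is likely why the table reports them alongside Ac.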
Fig. 3 Performance comparison of existing discriminant formulas and indices using ROC curves
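To make the indices and the AUC column concrete, the sketch below computes RI and KF2 for a handful of samples and estimates AUC as the probability that a randomly chosen IDA sample scores higher than a randomly chosen TT sample. The eight (RBC, RDW) pairs are copied from the prediction table later in this document (samples 1 to 5 and 16, 18, 19). Treating a higher index as pointing toward IDA is an assumption made here, and this tiny subset does not reproduce the AUC values in the table.

```python
def ri(rbc: float, rdw: float) -> float:
    """RI = RDW / RBC, as listed in the table of indices."""
    return rdw / rbc


def kf2(rbc: float, rdw: float) -> float:
    """KF2 = 5 x RDW / RBC, as listed in the table of indices."""
    return 5 * rdw / rbc


def auc(pos_scores, neg_scores) -> float:
    """Rank-based AUC: fraction of (positive, negative) pairs where the
    positive scores higher; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# (RBC, RDW) pairs taken from the prediction table in this document.
tt_samples = [(5.35, 13.7), (5.40, 14.5), (5.40, 14.2), (6.01, 13.2), (5.55, 13.2)]
ida_samples = [(3.40, 20.1), (4.54, 21.0), (3.50, 21.2)]

tt_ri = [ri(rbc, rdw) for rbc, rdw in tt_samples]
ida_ri = [ri(rbc, rdw) for rbc, rdw in ida_samples]

# Assumption: IDA is scored as the "positive" class for a high RI.
auc_ri = auc(ida_ri, tt_ri)
print([round(v, 2) for v in tt_ri], [round(v, 2) for v in ida_ri], auc_ri)
```

In this subset every TT sample has an RI below the tabulated cut-off of 3.3 and every IDA sample has an RI above it, so the two groups are perfectly ranked.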
Performance comparisons between k-NN, DT, RF, ANN and SVM models in differentiation of iron deficiency anemia from thalassemia trait
| Classifier | Validation | Ac (%) | Sp (%) | Sn (%) | MCC | YI | AUC |
|---|---|---|---|---|---|---|---|
| k-NN | 5-fold CV | 92.36 ± 1.67 | 90.48 ± 3.62 | 92.80 ± 1.76 | 0.76 ± 0.06 | 0.83 ± 0.04 | 0.81 ± 0.07 |
| | External | 92.54 ± 4.26 | 90.09 ± 10.79 | 93.35 ± 4.22 | 0.77 ± 0.14 | 0.83 ± 0.13 | 0.80 ± 0.10 |
| | Independent | 90.20 | 83.33 | 100.00 | 0.82 | 0.83 | 0.85 |
| DT | 5-fold CV | 98.03 ± 0.91 | 96.15 ± 3.40 | 98.54 ± 0.99 | 0.94 ± 0.026 | 0.93 ± 0.06 | 1.00 ± 0.00 |
| | External | 93.83 ± 4.10 | 86.86 ± 11.16 | 96.22 ± 3.77 | 0.82 ± 0.12 | 0.82 ± 0.04 | 0.92 ± 0.06 |
| | Independent | 92.16 | 86.21 | 100.00 | 0.85 | 0.85 | 1.00 |
| RF | 5-fold CV | 94.17 ± 1.26 | 88.15 ± 3.49 | 95.75 ± 0.95 | 0.83 ± 0.04 | 0.83 ± 0.04 | 0.97 ± 0.01 |
| | External | 94.62 ± 3.29 | 90.07 ± 8.99 | 96.13 ± 3.19 | 0.84 ± 0.10 | 0.84 ± 0.07 | 0.98 ± 0.02 |
| | Independent | 92.16 | 86.21 | 100.00 | 0.85 | 0.85 | 1.00 |
| ANN | 5-fold CV | 94.11 ± 1.31 | 86.75 ± 3.58 | 96.14 ± 1.06 | 0.83 ± 0.04 | 0.83 ± 0.04 | 0.97 ± 0.02 |
| | External | 93.78 ± 3.71 | 86.81 ± 10.84 | 96.22 ± 3.30 | 0.82 ± 0.11 | 0.83 ± 0.11 | 0.98 ± 0.02 |
| | Independent | 94.12 | 89.29 | 100.00 | 0.89 | 0.89 | 1.00 |
| SVM | 5-fold CV | 95.05 ± 1.06 | 89.81 ± 2.66 | 96.45 ± 0.96 | 0.85 ± 0.03 | 0.85 ± 0.04 | 0.97 ± 0.01 |
| | External | 95.59 ± 2.76 | 92.49 ± 8.47 | 96.74 ± 2.59 | 0.87 ± 0.08 | 0.87 ± 0.10 | 0.98 ± 0.03 |
| | Independent | 96.08 | 92.59 | 100.00 | 0.92 | 0.92 | 1.00 |
The data are shown as mean ± standard deviation over 100 repetitions
Ac Accuracy; ANN Artificial neural network; AUC Area under the receiver operating characteristic curve; DT Decision tree; k-NN k-nearest neighbor; MCC Matthews correlation coefficient; RF Random forest; Sn Sensitivity; Sp Specificity; SVM Support vector machine; YI Youden's index; 5-fold CV 5-fold cross validation
Parameters of k-NN (k), RF (ntree, mtry), ANN (size, decay) and SVM (cost, γ) were optimized by a 5-fold CV procedure. The values of k, ntree, mtry, size, decay, cost and γ are 5, 200, 2, 4, 0.5, 8 and 0.5, respectively
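The 5-fold CV procedure mentioned above partitions the samples into five folds and uses each fold once for validation while the rest are used for fitting. The sketch below only generates the index splits; model fitting and the parameter grid are omitted. The function name, the seed and the sample count of 186 (the 146 TT plus 40 IDA cases from the Fig. 2 caption) are assumptions for illustration, and whether the original splits were stratified by class is not stated.

```python
import random


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train, validation) index lists for a shuffled k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


# Hypothetical sample count: 146 TT + 40 IDA cases.
splits = list(k_fold_indices(186, k=5))
for train, validation in splits:
    print(len(train), len(validation))
```

Each candidate parameter value would be scored on the five validation folds, and the value with the best average score kept.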
Fig. 4 Performance comparisons among k-NN, DT, RF, ANN and SVM models using ROC curves over 5-fold CV (a), external test (b) and independent test (c), where k-NN, DT, RF, ANN and SVM models are represented by pink, green, red, black and blue, respectively
Interpretable rules extracted from the RF model for differentiating iron deficiency anemia from thalassemia trait
| Length | Frequency (%) | Error (%) | Condition | Prediction |
|---|---|---|---|---|
| 1 | 63.98 | 0.00 | Hb > 10.95 | TT |
| 2 | 15.05 | 0.00 | RBC ≤ 4.5 and Hb ≤ 10.45 | IDA |
| 1 | 11.29 | 0.00 | RDW ≤ 17.15 | TT |
| 3 | 4.84 | 0.00 | RBC > 4.59 and Hb ≤ 10.95 and RDW > 17.7 | IDA |
| 4 | 2.15 | 0.00 | RBC > 4.28 and MCHC ≤ 32.15 and RDW > 17.15 and RDW ≤ 17.7 | TT |
| 3 | 1.08 | 0.00 | Hb ≤ 11.45 and MCHC > 31.35 and RDW > 17.4 | IDA |
| 1 | 1.61 | 33.00 | Else | TT |
Hb Hemoglobin; IDA Iron deficiency anemia; MCHC mean corpuscular hemoglobin concentration; RBC red blood cell count; RDW red blood cell distribution width; TT Thalassemia trait
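The rule table can be read as an ordered list in which the first matching condition decides the prediction and the final "Else" row is the default. That first-match reading is an assumption, as is the function name `apply_rules`; the thresholds are copied from the table. The sketch is an illustration of the table, not the ThalPred implementation.

```python
def apply_rules(rbc: float, hb: float, mchc: float, rdw: float) -> str:
    """Apply the tabulated rules top to bottom; the first match wins."""
    if hb > 10.95:
        return "TT"
    if rbc <= 4.5 and hb <= 10.45:
        return "IDA"
    if rdw <= 17.15:
        return "TT"
    if rbc > 4.59 and hb <= 10.95 and rdw > 17.7:
        return "IDA"
    if rbc > 4.28 and mchc <= 32.15 and 17.15 < rdw <= 17.7:
        return "TT"
    if hb <= 11.45 and mchc > 31.35 and rdw > 17.4:
        return "IDA"
    return "TT"  # "Else" row of the table


# Values copied from the prediction table in this document.
print(apply_rules(rbc=5.35, hb=10.6, mchc=32.1, rdw=13.7))  # sample 1
print(apply_rules(rbc=3.40, hb=7.7, mchc=32.0, rdw=20.1))   # sample 16
print(apply_rules(rbc=4.64, hb=10.6, mchc=32.1, rdw=20.5))  # sample 26
```

Most cases are decided by the first three rules, which together cover about 90% of the samples according to the Frequency column.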
Fig. 5 Screenshots of the ThalPred web-based tool before (a) and after (b, c) submission of laboratory data. The tool is available at http://codes.bio/thalpred/
The prediction results derived from k-NN, DT, RF, ANN and SVM models in differentiation of iron deficiency anemia (IDA) from thalassemia trait (TT)
| No. | RBC | Hb | Hct | MCV | MCH | MCHC | RDW | Diagnosis | Prediction (k-NN) | Prediction (DT) | Prediction (RF) | Prediction (ANN) | Prediction (SVM) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 5.35 | 10.6 | 33.0 | 61.7 | 19.8 | 32.1 | 13.7 | TT | TT | TT | TT | TT | TT |
| 2 | 5.40 | 10.9 | 34.1 | 63.2 | 20.2 | 32.0 | 14.5 | TT | TT | TT | TT | TT | TT |
| 3 | 5.40 | 10.3 | 33.0 | 61.1 | 19.1 | 31.2 | 14.2 | TT | TT | TT | TT | TT | TT |
| 4 | 6.01 | 12.3 | 38.3 | 63.7 | 20.5 | 32.1 | 13.2 | TT | TT | TT | TT | TT | TT |
| 5 | 5.55 | 11.0 | 33.4 | 60.2 | 19.8 | 32.9 | 13.2 | TT | TT | TT | TT | TT | TT |
| 6 | 6.04 | 12.9 | 39.6 | 65.6 | 21.4 | 32.6 | 13.5 | TT | TT | TT | TT | TT | TT |
| 7 | 5.90 | 13.1 | 40.0 | 67.8 | 22.2 | 32.8 | 12.7 | TT | TT | TT | TT | TT | TT |
| 8 | 5.95 | 12.6 | 39.0 | 65.6 | 21.2 | 32.3 | 13.4 | TT | TT | TT | TT | TT | TT |
| 9 | 6.11 | 12.7 | 39.5 | 64.7 | 20.8 | 32.2 | 13.7 | TT | TT | TT | TT | TT | TT |
| 10 | 5.45 | 11.8 | 36.6 | 67.1 | 21.7 | 32.3 | 14.0 | TT | TT | TT | TT | TT | TT |
| 11 | 5.40 | 11.0 | 34.0 | 63.0 | 20.4 | 32.4 | 12.8 | TT | TT | TT | TT | TT | TT |
| 12 | 5.40 | 11.0 | 34.0 | 63.0 | 20.4 | 32.4 | 14.0 | TT | TT | TT | TT | TT | TT |
| 13 | 6.20 | 12.4 | 37.2 | 60.0 | 20.0 | 33.3 | 12.6 | TT | TT | TT | TT | TT | TT |
| 14 | 5.40 | 10.7 | 33.0 | 61.1 | 19.8 | 32.4 | 13.6 | TT | TT | TT | TT | TT | TT |
| 15 | 6.11 | 12.3 | 38.0 | 62.2 | 20.1 | 32.4 | 12.6 | TT | TT | TT | TT | TT | TT |
| 16 | 3.40 | 7.7 | 24.1 | 70.9 | 22.7 | 32.0 | 20.1 | IDA | IDA | IDA | IDA | IDA | IDA |
| 17 | 4.66 | 11.3 | 34.4 | 73.8 | 24.3 | 32.9 | 21.0 | IDA | TT | TT | TT | IDA | TT |
| 18 | 4.54 | 10.6 | 32.8 | 72.3 | 23.4 | 32.3 | 21.0 | IDA | IDA | IDA | IDA | IDA | IDA |
| 19 | 3.50 | 7.9 | 25.2 | 72.0 | 22.6 | 31.4 | 21.2 | IDA | IDA | IDA | IDA | IDA | IDA |
| 20 | 4.15 | 9.9 | 29.0 | 69.9 | 23.9 | 34.1 | 20.2 | IDA | TT | IDA | IDA | IDA | IDA |
| 21 | 3.90 | 8.9 | 28.0 | 71.8 | 22.8 | 31.8 | 20.2 | IDA | IDA | IDA | IDA | IDA | IDA |
| 22 | 4.17 | 9.9 | 29.0 | 69.5 | 23.7 | 34.1 | 21.1 | IDA | IDA | IDA | IDA | IDA | IDA |
| 23 | 3.85 | 8.5 | 27.5 | 71.4 | 22.1 | 30.9 | 21.3 | IDA | IDA | IDA | IDA | IDA | IDA |
| 24 | 4.24 | 9.8 | 30.5 | 71.9 | 23.1 | 32.1 | 20.2 | IDA | IDA | IDA | IDA | IDA | IDA |
| 25 | 4.83 | 11.1 | 34.0 | 70.4 | 23.0 | 32.7 | 19.0 | IDA | TT | TT | TT | TT | TT |
| 26 | 4.64 | 10.6 | 33.0 | 71.1 | 22.8 | 32.1 | 20.5 | IDA | IDA | IDA | IDA | IDA | IDA |
| 27 | 4.01 | 9.5 | 29.0 | 72.3 | 23.7 | 32.8 | 20.4 | IDA | IDA | IDA | IDA | IDA | IDA |
| 28 | 4.80 | 11.0 | 34.0 | 70.8 | 22.9 | 32.4 | 21.1 | IDA | TT | TT | TT | IDA | IDA |
| 29 | 3.65 | 8.4 | 26.0 | 71.2 | 23.0 | 32.3 | 20.4 | IDA | IDA | IDA | IDA | IDA | IDA |
| 30 | 4.00 | 9.2 | 28.0 | 70.0 | 23.0 | 32.9 | 20.1 | IDA | IDA | IDA | IDA | IDA | IDA |
| 31 | 4.45 | 10.2 | 31.4 | 70.6 | 22.9 | 32.5 | 19.8 | IDA | IDA | IDA | IDA | IDA | IDA |
| 32 | 4.44 | 10.3 | 32.0 | 72.1 | 23.2 | 32.2 | 21.3 | IDA | IDA | IDA | IDA | IDA | IDA |
| 33 | 4.56 | 10.4 | 32.0 | 70.2 | 22.8 | 32.5 | 19.4 | IDA | IDA | IDA | IDA | IDA | IDA |
| 34 | 4.84 | 11.0 | 34.0 | 70.3 | 22.7 | 32.4 | 20.7 | IDA | TT | TT | TT | IDA | IDA |
| 36 | 3.83 | 8.7 | 27.0 | 70.5 | 22.72 | 32.22 | 20.0 | IDA | IDA | IDA | IDA | IDA | IDA |
| 37 | 3.6 | 8.7 | 25.0 | 69.44 | 24.17 | 34.8 | 19.3 | IDA | IDA | IDA | IDA | IDA | IDA |
| 38 | 3.52 | 8.0 | 25.0 | 71.02 | 22.73 | 32 | 21.1 | IDA | IDA | IDA | IDA | IDA | IDA |
| 39 | 3.98 | 9.0 | 27.8 | 69.85 | 22.61 | 32.37 | 20.7 | IDA | IDA | IDA | IDA | IDA | IDA |
| 40 | 4.02 | 9.0 | 28.5 | 70.9 | 22.39 | 31.58 | 21.1 | IDA | IDA | IDA | IDA | IDA | IDA |
| 41 | 4.43 | 10.1 | 31.4 | 70.88 | 22.8 | 32.17 | 19.8 | IDA | IDA | IDA | IDA | IDA | IDA |
| 42 | 4.24 | 9.6 | 30.0 | 70.75 | 22.64 | 32.0 | 21.0 | TT | TT | TT | TT | TT | TT |
| 43 | 5.94 | 12.6 | 39.0 | 65.7 | 21.2 | 32.3 | 13.0 | TT | IDA | TT | TT | IDA | TT |
| 44 | 5.80 | 11.7 | 36.3 | 62.6 | 20.2 | 32.2 | 13.1 | TT | IDA | TT | TT | IDA | TT |
| 45 | 5.45 | 11.4 | 35.0 | 64.2 | 20.9 | 32.6 | 13.0 | TT | IDA | TT | TT | IDA | TT |
| 46 | 5.40 | 11.0 | 33.7 | 62.4 | 20.4 | 32.6 | 12.5 | TT | IDA | TT | TT | IDA | TT |
| 47 | 6.11 | 12.6 | 38.5 | 63.0 | 20.6 | 32.7 | 14.3 | TT | IDA | TT | TT | IDA | TT |
| 48 | 5.80 | 12.2 | 36.4 | 62.8 | 21.0 | 33.5 | 14.0 | TT | IDA | TT | TT | IDA | TT |
| 49 | 5.44 | 10.8 | 34.0 | 62.5 | 19.9 | 31.8 | 12.7 | TT | IDA | TT | TT | IDA | TT |
| 50 | 5.40 | 10.8 | 33.0 | 61.1 | 20.0 | 32.7 | 13.6 | TT | TT | TT | TT | TT | TT |
| 51 | 6.05 | 12.3 | 38.0 | 62.8 | 20.3 | 32.4 | 14.0 | TT | TT | TT | TT | TT | TT |
ANN Artificial neural network; DT Decision tree; Hb Hemoglobin; Hct Hematocrit; IDA Iron deficiency anemia; k-NN k-nearest neighbor; MCH mean corpuscular hemoglobin; MCHC mean corpuscular hemoglobin concentration; MCV mean corpuscular volume; RBC red blood cell count; RDW red blood cell distribution width; RF Random forest; SVM Support vector machine; TT Thalassemia trait. Parameters of k-NN (k), RF (ntree, mtry), ANN (size, decay) and SVM (cost, γ) were optimized by a 5-fold CV procedure. The values of k, ntree, mtry, size, decay, cost and γ are 5, 200, 2, 4, 0.5, 8 and 0.5, respectively
Fig. 6 Multivariate analysis using principal component analysis (PCA) of internal (squares) and external (circles) sets, where red and blue represent TT and IDA cases, respectively