Gianlucca Zuin, Daniella Araujo, Vinicius Ribeiro, Maria Gabriella Seiler, Wesley Heleno Prieto, Maria Carolina Pintão, Carolina Dos Santos Lazari, Celso Francisco Hernandes Granato, Adriano Veloso.
Abstract
Background: The Complete Blood Count (CBC) is a commonly used, low-cost test that measures white blood cells, red blood cells, and platelets in a person's blood. It is a useful tool to support medical decisions, as intrinsic variations of each analyte bring relevant insights regarding potential diseases. In this study, we aimed to develop machine learning models for COVID-19 diagnosis through CBCs, unlocking the predictive power of non-linear relationships between multiple blood analytes.
Keywords: Biomarkers; Diagnosis; Viral infection
Year: 2022 PMID: 35721829 PMCID: PMC9199341 DOI: 10.1038/s43856-022-00129-0
Source DB: PubMed Journal: Commun Med (Lond) ISSN: 2730-664X
Mean and standard deviation for all considered cell counts in each cohort. N = 1,138,728 CBCs.
| Analyte | Covid-19 (+) | Covid-19 (-) | Influenza (+) | Other Viruses (+) | Entire Data |
|---|---|---|---|---|---|
| RBC (10^12/L) | 5.06 ± 0.52 | 4.21 ± 0.98 | 4.73 ± 0.60 | 3.67 ± 0.87 | 4.28 ± 0.96 |
| Hemoglobin (g/dL) | 14.9 ± 1.4 | 12.4 ± 2.8 | 14.0 ± 1.7 | 10.8 ± 2.5 | 12.6 ± 2.7 |
| Hematocrit (%) | 43.8 ± 4.0 | 36.8 ± 7.9 | 41.0 ± 4.9 | 31.7 ± 7.3 | 37.4 ± 7.7 |
| MCV (fL) | 86.8 ± 4.7 | 88.1 ± 6.4 | 87.0 ± 6.7 | 86.9 ± 8.0 | 88.0 ± 6.2 |
| MCH (pg/cell) | 29.5 ± 1.9 | 29.6 ± 2.3 | 29.6 ± 2.3 | 29.6 ± 2.6 | 29.5 ± 2.2 |
| MCHC (g/dL) | 34.1 ± 1.1 | 33.6 ± 1.4 | 34.0 ± 1.1 | 34.1 ± 1.4 | 33.6 ± 1.4 |
| RDW (%) | 13.0 ± 1.0 | 14.3 ± 2.2 | 13.6 ± 1.2 | 15.1 ± 2.1 | 14.1 ± 2.2 |
| WBC (10^9/L) | 6.07 ± 2.37 | 8.07 ± 3.81 | 6.96 ± 2.81 | 5.87 ± 4.69 | 8.02 ± 3.81 |
| Monocytes (10^9/L) | 0.66 ± 0.29 | 0.68 ± 0.35 | 0.75 ± 0.37 | 0.66 ± 0.46 | 0.66 ± 0.34 |
| Lymphocytes (10^9/L) | 1.40 ± 0.72 | 1.67 ± 1.05 | 1.23 ± 0.92 | 1.25 ± 1.40 | 1.54 ± 0.99 |
| Eosinophils (10^9/L) | 0.07 ± 0.09 | 0.18 ± 0.20 | 0.07 ± 0.10 | 0.10 ± 0.16 | 0.15 ± 0.20 |
| Basophils (10^9/L) | 0.02 ± 0.02 | 0.03 ± 0.02 | 0.02 ± 0.01 | 0.02 ± 0.02 | 0.03 ± 0.02 |
| Neutrophils (10^9/L) | 3.92 ± 2.22 | 5.53 ± 3.50 | 4.90 ± 2.57 | 4.08 ± 3.93 | 5.64 ± 3.57 |
| Platelets (10^9/L) | 195.7 ± 56.7 | 222.0 ± 102.3 | 182.9 ± 63.6 | 145.8 ± 115.6 | 222.7 ± 99.9 |
| RBC (10^12/L) | 4.57 ± 0.44 | 4.03 ± 0.75 | 4.62 ± 0.67 | 3.75 ± 0.78 | 4.05 ± 0.75 |
| Hemoglobin (g/dL) | 13.3 ± 1.2 | 11.8 ± 2.1 | 13.6 ± 1.9 | 11.0 ± 2.1 | 11.8 ± 2.1 |
| Hematocrit (%) | 39.8 ± 3.4 | 35.4 ± 6.1 | 40.3 ± 5.4 | 32.8 ± 6.4 | 35.6 ± 6.0 |
| MCV (fL) | 87.3 ± 5.0 | 88.3 ± 6.3 | 87.7 ± 6.7 | 87.9 ± 8.1 | 88.3 ± 6.2 |
| MCH (pg/cell) | 29.2 ± 2.0 | 29.3 ± 2.3 | 29.7 ± 2.3 | 29.4 ± 2.7 | 29.3 ± 2.2 |
| MCHC (g/dL) | 33.5 ± 1.0 | 33.2 ± 1.3 | 33.8 ± 1.2 | 33.5 ± 1.4 | 33.2 ± 1.3 |
| RDW (%) | 13.1 ± 1.1 | 14.2 ± 2.1 | 13.7 ± 1.3 | 14.9 ± 2.1 | 14.1 ± 2.1 |
| WBC (10^9/L) | 5.87 ± 2.40 | 8.03 ± 3.71 | 7.11 ± 3.15 | 6.62 ± 4.63 | 7.84 ± 3.66 |
| Monocytes (10^9/L) | 0.56 ± 0.24 | 0.62 ± 0.32 | 0.70 ± 0.35 | 0.61 ± 0.43 | 0.60 ± 0.31 |
| Lymphocytes (10^9/L) | 1.54 ± 0.80 | 1.85 ± 1.05 | 1.36 ± 0.95 | 1.54 ± 1.40 | 1.78 ± 1.02 |
| Eosinophils (10^9/L) | 0.06 ± 0.08 | 0.16 ± 0.18 | 0.075 ± 0.11 | 0.09 ± 0.18 | 0.15 ± 0.18 |
| Basophils (10^9/L) | 0.02 ± 0.01 | 0.03 ± 0.02 | 0.01 ± 0.01 | 0.02 ± 0.02 | 0.03 ± 0.02 |
| Neutrophils (10^9/L) | 3.68 ± 2151.51 | 5.39 ± 3.36 | 4.94 ± 2.97 | 4.56 ± 3.81 | 5.29 ± 3.34 |
| Platelets (10^9/L) | 222.6 ± 63.0 | 249.2 ± 101.4 | 185.0 ± 69.1 | 188.9 ± 123.8 | 248.4 ± 100.4 |
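Each cell in the table above is a per-cohort mean and standard deviation. As a minimal sketch of how such entries can be produced, the snippet below groups raw values by cohort and analyte. The helper names (`summarize`, `fmt`), the cohort labels, and the toy values are hypothetical, and using the sample standard deviation is an assumption, since the record does not say which estimator was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(rows):
    """Group (cohort, analyte, value) rows and return {(cohort, analyte): (mean, sd)}."""
    groups = defaultdict(list)
    for cohort, analyte, value in rows:
        groups[(cohort, analyte)].append(value)
    # stdev() is the sample SD (n - 1 denominator); this choice is an assumption.
    return {key: (mean(vals), stdev(vals)) for key, vals in groups.items() if len(vals) > 1}


def fmt(stats, digits=2):
    """Render a (mean, sd) pair in the table's 'mean ± SD' style."""
    m, s = stats
    return f"{m:.{digits}f} ± {s:.{digits}f}"


# Hypothetical toy WBC values (10^9/L), not taken from the study.
rows = [
    ("covid_pos", "WBC", 5.0),
    ("covid_pos", "WBC", 7.0),
    ("covid_neg", "WBC", 8.0),
    ("covid_neg", "WBC", 8.5),
    ("covid_neg", "WBC", 7.5),
]
summary = summarize(rows)
for key in sorted(summary):
    print(key, fmt(summary[key]))
```

Groups with a single value are skipped because a sample standard deviation is undefined for them.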
Entire dataset, training sets, and validation sets for the two waves that occurred during the Brazilian COVID-19 outbreak.
| CBC (+) | CBC (−) | |||||
|---|---|---|---|---|---|---|
| Gender | COVID-19 (+) | COVID-19 (−) | Influenza-A (+) | Influenza-B (+) | Influenza-H1N1 (+) | Other viruses (+) |
| Male | 11.3% (122,793) | 34.0% (369,787) | 46.7% (3160) | 46.5% (1384) | 48.4% (4108) | 59.5% (20,107) |
| Female | 10.3% (111,673) | 44.4% (482,453) | 53.3% (3604) | 53.5% (1588) | 51.6% (4380) | 40.5% (13,691) |
| Male | 12.9% (5859) | 9.8% (4469) | 4.2% (1895) | 2.1% (975) | 6.0% (2742) | 12.8% (5825) |
| Female | 12.1% (5527) | 15.2% (6918) | 4.9% (2223) | 2.8% (1214) | 6.9% (3118) | 10.3% (4656) |
| Male | 4.9% (5808) | 37.6% (44,637) | 1.0% (1113) | <0.1% (188) | 1.4% (1660) | 2.3% (2710) |
| Female | 4.7% (5647) | 43.4% (51,550) | 1.1% (1343) | <0.1% (134) | 1.6% (1842) | 1.7% (2028) |
| Male | 25.9% (24,104) | 10.5% (9770) | 2.0% (1895) | 1.0% (975) | 3.1% (2742) | 6.3% (5825) |
| Female | 24.2% (22,404) | 15.1% (14,088) | 2.3% (2223) | 1.3% (1214) | 3.3% (3118) | 5.0% (4656) |
| Male | 4.5% (11,860) | 38.9% (101,655) | 0.4% (1113) | <0.1% (188) | 0.6% (1660) | 1.0% (2710) |
| Female | 4.3% (11,021) | 48.1% (125,776) | 0.5% (1343) | <0.1% (134) | 0.7% (1842) | 0.8% (2028) |
Training sets were obtained by applying the inclusion and exclusion criteria to the entire data and then downsampling the COVID-19 (−) class to account for class imbalance. We used October 1st as the split point between first-wave and second-wave data to exclude possible incubation periods preceding the start of the second wave in early November. As such, validation for the first wave encompasses data from late June to late September, and validation for the second wave ranges from early October to late February. N = 1,138,728 CBCs.
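The wave split and the downsampling step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the record layout, the function names, the calendar year attached to October 1st, and the 1:1 negative-to-positive ratio are all hypothetical, since the record gives only the split day and says that the COVID-19 (−) class was downsampled.

```python
import random
from datetime import date

# Assumption: the source names only "October 1st"; the year here is a placeholder.
SPLIT = date(2020, 10, 1)


def split_by_wave(records, split=SPLIT):
    """Partition records into (first wave, second wave) around the split date."""
    first = [r for r in records if r["date"] < split]
    second = [r for r in records if r["date"] >= split]
    return first, second


def downsample_negatives(records, ratio=1.0, seed=0):
    """Keep all positives and a random subset of negatives.

    `ratio` is the target negatives-per-positive; 1.0 is an assumed value.
    """
    pos = [r for r in records if r["covid"]]
    neg = [r for r in records if not r["covid"]]
    keep = min(len(neg), round(ratio * len(pos)))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return pos + rng.sample(neg, keep)


# Hypothetical toy records: 2 of 10 positive before the split, 3 of 12 after.
records = (
    [{"date": date(2020, 7, 15), "covid": i < 2} for i in range(10)]
    + [{"date": date(2020, 11, 3), "covid": i < 3} for i in range(12)]
)
first_wave, second_wave = split_by_wave(records)
train_first = downsample_negatives(first_wave)
print(len(first_wave), len(second_wave), len(train_first))
```

Only the training data would be downsampled in this sketch; validation data keeps its natural class balance, which matches the much larger COVID-19 (−) counts reported for the validation sets.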
Fig. 1 Average analyte progression through the COVID-19 disease course.
Average values of the most impactful analytes along the disease time frame, from 30 days before the first positive RT-PCR result to 30 days after it. N = 120,726 patients.
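A curve like the one this caption describes can be built by re-indexing each measurement to its day offset from the patient's first positive RT-PCR result and averaging per offset. The sketch below assumes a simple tuple layout and a hypothetical function name, `average_by_offset`; the values are toy numbers, not study data.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def average_by_offset(measurements, first_positive, window=30):
    """Average values per day offset from each patient's first positive RT-PCR.

    measurements: iterable of (patient_id, date, value)
    first_positive: {patient_id: date of first positive RT-PCR}
    """
    by_day = defaultdict(list)
    for pid, day, value in measurements:
        if pid not in first_positive:
            continue  # patient never tested positive
        offset = (day - first_positive[pid]).days
        if -window <= offset <= window:
            by_day[offset].append(value)
    return {offset: mean(vals) for offset, vals in sorted(by_day.items())}


# Hypothetical toy lymphocyte counts (10^9/L) for two patients.
first_positive = {"a": date(2020, 8, 10), "b": date(2020, 9, 1)}
measurements = [
    ("a", date(2020, 8, 8), 1.5),    # offset -2
    ("b", date(2020, 8, 30), 1.0),   # offset -2
    ("a", date(2020, 8, 10), 0.75),  # offset 0
    ("b", date(2020, 10, 15), 2.0),  # offset 44, outside the window
]
curve = average_by_offset(measurements, first_positive)
print(curve)
```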
COVID-19 endemic and pandemic simulations. AUROC, specificity, and sensitivity, each with its 95% confidence interval, for simulations at different COVID-19 prevalence levels.
| COVID-19 prevalence | AUROC | Specificity | Sensitivity |
|---|---|---|---|
| 1% | 0.928 ± 0.093 | 0.875 ± 0.018 | 0.913 ± 0.152 |
| 2% | 0.881 ± 0.117 | 0.877 ± 0.024 | 0.812 ± 0.250 |
| 3% | 0.917 ± 0.046 | 0.874 ± 0.016 | 0.873 ± 0.099 |
| 4% | 0.922 ± 0.037 | 0.882 ± 0.033 | 0.896 ± 0.087 |
| 5% | 0.918 ± 0.046 | 0.874 ± 0.012 | 0.879 ± 0.104 |
| 6% | 0.909 ± 0.041 | 0.874 ± 0.032 | 0.857 ± 0.116 |
| 7% | 0.910 ± 0.024 | 0.883 ± 0.018 | 0.840 ± 0.083 |
| 8% | 0.904 ± 0.054 | 0.879 ± 0.036 | 0.849 ± 0.102 |
| 9% | 0.907 ± 0.046 | 0.872 ± 0.025 | 0.871 ± 0.085 |
| 10% | 0.896 ± 0.059 | 0.871 ± 0.025 | 0.848 ± 0.118 |
| 20% | 0.916 ± 0.029 | 0.866 ± 0.025 | 0.878 ± 0.045 |
| 30% | 0.906 ± 0.021 | 0.871 ± 0.018 | 0.862 ± 0.059 |
| 40% | 0.911 ± 0.016 | 0.871 ± 0.024 | 0.873 ± 0.032 |
| 50% | 0.913 ± 0.032 | 0.886 ± 0.030 | 0.863 ± 0.028 |
| 60% | 0.901 ± 0.015 | 0.868 ± 0.031 | 0.852 ± 0.037 |
| 70% | 0.906 ± 0.021 | 0.867 ± 0.033 | 0.858 ± 0.035 |
| 80% | 0.902 ± 0.040 | 0.869 ± 0.074 | 0.854 ± 0.025 |
| 90% | 0.911 ± 0.030 | 0.889 ± 0.081 | 0.864 ± 0.022 |
N = 30 simulations with 20,000 unique patients each.
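One way to read the table above is as repeated resampling: for each prevalence level, draw a cohort of unique patients with that share of positives, score the model's predictions, and summarize over 30 runs. The sketch below follows that reading under several assumptions: the prediction pools are synthetic, the function names are hypothetical, the interval uses a normal approximation (the record does not say how the intervals were computed), and the cohort size is reduced from 20,000 to keep the toy pools small. AUROC is omitted because it requires continuous scores rather than binary predictions.

```python
import math
import random
from statistics import mean, stdev


def simulate_once(pos_pool, neg_pool, prevalence, n_patients, rng):
    """Sample one cohort at the target prevalence; return (sensitivity, specificity).

    Each pool holds the model's boolean prediction for one truly positive
    (pos_pool) or truly negative (neg_pool) patient.
    """
    n_pos = round(prevalence * n_patients)
    pos = rng.sample(pos_pool, n_pos)  # without replacement: unique patients
    neg = rng.sample(neg_pool, n_patients - n_pos)
    sensitivity = sum(pos) / len(pos)
    specificity = sum(not p for p in neg) / len(neg)
    return sensitivity, specificity


def mean_ci(values, z=1.96):
    """Mean and normal-approximation 95% half-width (an assumed CI method)."""
    return mean(values), z * stdev(values) / math.sqrt(len(values))


rng = random.Random(0)
# Hypothetical pools: about 87% of positives and 12% of negatives are flagged.
pos_pool = [rng.random() < 0.87 for _ in range(3000)]
neg_pool = [rng.random() < 0.12 for _ in range(3000)]

runs = [simulate_once(pos_pool, neg_pool, 0.05, 2000, rng) for _ in range(30)]
sens_mean, sens_half = mean_ci([s for s, _ in runs])
spec_mean, spec_half = mean_ci([s for _, s in runs])
print(f"sensitivity {sens_mean:.3f} ± {sens_half:.3f}")
print(f"specificity {spec_mean:.3f} ± {spec_half:.3f}")
```

At low prevalence each run contains few positives (1% of 20,000 patients is 200), so sensitivity is estimated from a small sample. That is consistent with the wider sensitivity intervals in the low-prevalence rows of the table.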