Alexandre de Fátima Cobre, Dile Pontarolo Stremel, Guilhermina Rodrigues Noleto, Mariana Millan Fachi, Monica Surek, Astrid Wiens, Fernanda Stumpf Tonin, Roberto Pontarolo.
Abstract
OBJECTIVE: This study aimed to implement and evaluate machine learning-based models to predict COVID-19 diagnosis and disease severity.
Keywords: Blood test; COVID-19; Diagnosis; Machine learning model; Severity; Urine test
Year: 2021 PMID: 34091385 PMCID: PMC8164361 DOI: 10.1016/j.compbiomed.2021.104531
Source DB: PubMed Journal: Comput Biol Med ISSN: 0010-4825 Impact factor: 4.589
Biochemical, urinalysis, haematological, virological, and bacteriological tests performed on the patients included in the study.
| Biochemical tests |
|---|
| Glucose serum, urea, C-reactive protein, creatinine, potassium, sodium, alanine transaminase, aspartate transaminase, gamma-glutamyltransferase, total bilirubin, direct bilirubin, indirect bilirubin, alkaline phosphatase, ionised pH, blood, magnesium analysis, HCO3 (venous blood gas analysis), lactate dehydrogenase, creatine phosphokinase, ferritin, arterial lactic acid, lipase dosage, HCO3 (arterial blood gas analysis), phosphorus, pCO2 (venous blood gas analysis), Hb saturation (venous blood gas analysis), base excess (venous blood gas analysis), pO2 (venous blood gas analysis), total CO2 (venous blood gas analysis), Hb saturation (arterial blood gases), pCO2 (arterial blood gas analysis), base excess (arterial blood gas analysis), pH (arterial blood gas analysis), total CO2 (arterial blood gas analysis), pO2 (arterial blood gas analysis), arterial FiO2, and ctO2 (arterial blood gas analysis). |
| Haematocrit, haemoglobin, platelets, mean platelet volume, red blood cells, lymphocytes, mean corpuscular haemoglobin concentration, leukocytes, basophils, mean corpuscular haemoglobin, eosinophils, mean corpuscular volume, monocytes, red blood cell distribution width |
| Urine pH, segmented neutrophil, promyelocytes, metamyelocytes and myeloblasts, and the international normalised ratio (INR). |
| Respiratory syncytial virus, influenza A, influenza B, parainfluenza 1, coronavirus NL63, rhinovirus/enterovirus, coronavirus HKU1, parainfluenza 3, adenovirus, parainfluenza 4, coronavirus 229E, coronavirus OC43, influenza A H1N1, influenza H1N1 test, and influenza A rapid test. |
Variation in the levels of biochemical, haematological, and urine biomarkers in COVID-19-positive patients and in patients with severe disease, on a normalised scale.
| Biomarker | COVID-19 positive patients' samples | COVID-19 severe patients' samples |
|---|---|---|
| Haematocrit | Low | Low |
| Haemoglobin | Low | Low |
| Platelets | Low | Low |
| Mean platelet volume | Low | Low |
| Red blood cells | Low | Low |
| Lymphocytes | Low | Low |
| Mean corpuscular haemoglobin concentration (MCHC) | Low | Low |
| Leukocytes | High | High |
| Basophils | Normal | Normal |
| Mean corpuscular haemoglobin (MCH) | Normal | Low |
| Eosinophils | Low | Low |
| Mean corpuscular volume (MCV) | Low | Low |
| Monocytes | High | Normal |
| Red blood cell distribution width (RDW) | Low | Normal |
| Serum glucose | High | High |
| Neutrophils | Low | Low |
| Urea | Low | Low |
| C-reactive protein | High | High |
| Creatinine | High | High |
| Potassium | Low | Low |
| Sodium | Low | Low |
| Alanine transaminase | High | High |
| Aspartate transaminase | High | High |
| Gamma-glutamyltransferase | High | High |
| Total bilirubin | High | High |
| Direct bilirubin | High | High |
| Indirect bilirubin | High | High |
| Alkaline phosphatase | High | High |
| Ionised calcium | Low | Low |
| pCO2 (venous blood gas analysis) | High | High |
| Magnesium | Low | Low |
| Hb saturation (venous blood gas analysis) | Low | Low |
| Base excess (venous blood gas analysis) | Low | Low |
| pO2 (venous blood gas analysis) | Low | Low |
| Total CO2 (venous blood gas analysis) | High | High |
| pH (venous blood gas analysis) | Low | Low |
| HCO3 (venous blood gas analysis) | High | High |
| Rods | High | High |
| Segmented | Low | Low |
| Promyelocytes | Normal | – |
| Metamyelocytes | Normal | – |
| Myelocytes | Normal | – |
| Urine pH | Low | Low |
| Urine density | Normal | Low |
| Urine red blood cells | Normal | Normal |
| International normalised ratio (INR) | High | High |
| Lactate dehydrogenase | High | High |
| Creatine phosphokinase (CPK) | Normal | Low |
| Ferritin | High | High |
| Arterial lactic acid | High | High |
| Hb saturation (arterial blood gases) | Low | Low |
| pCO2 (arterial blood gas analysis) | High | High |
| Base excess (arterial blood gas analysis) | Low | Low |
| pH (arterial blood gas analysis) | Low | Low |
| Total CO2 (arterial blood gas analysis) | High | High |
| HCO3 (arterial blood gas analysis) | High | High |
| pO2 (arterial blood gas analysis) | Low | Low |
| Arterial FiO2 | Low | Low |
| Phosphorus | Low | – |
Diagnostic data.
Disease severity data.
Fig. 1. Exploratory analysis. Principal component analysis (PCA) model for the discrimination of negative and positive samples (A) and of samples from patients with severe and non-severe disease (B).
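As a rough illustration of the exploratory step, the sketch below extracts the first principal component of a small data matrix. It is not the authors' implementation: the record does not say how the PCA was computed, so power iteration on the sample covariance matrix is used here as a simple stand-in, and the function name `first_principal_component` and the toy data are hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, explained variance ratio, scores) for the first PC.

    Assumption: power iteration on the sample covariance matrix, chosen only
    because it needs no external library. It can fail if the start vector is
    orthogonal to the leading eigenvector or if the data have zero variance.
    """
    n, d = len(rows), len(rows[0])
    # Mean-centre each variable (column).
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalise.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    # Scores: projection of each centred sample onto the component.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, eigenvalue / total_variance, scores

# Toy data (hypothetical): two perfectly correlated variables.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, ratio, scores = first_principal_component(toy)
```

Plotting the scores of the first two components, coloured by class, would give a score plot of the kind the caption describes; in practice a library routine based on singular value decomposition is the more usual choice.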
Fig. 2. Plot of leverage versus studentised residuals for detecting outlier samples. For diagnostic data: outlier analysis of negative samples (A) and positive samples (B). For severity data: outlier analysis of samples from patients without severe disease (C) and with severe disease (D).
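The two quantities plotted in such an outlier diagram can be sketched for the simplest possible case, a straight-line fit with one predictor. This is an assumption made for illustration: the study's models are multivariate and the record does not give the exact formulas used. The function name and toy data are hypothetical.

```python
import math

def leverage_and_studentised_residuals(x, y):
    """Leverage and internally studentised residuals for y = a + b*x.

    Assumption: ordinary least squares with a single predictor, used only
    to show what the two axes of a leverage/residual plot measure.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Leverage: how far a sample lies from the centre of the predictor space.
    leverages = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    s2 = sum(e * e for e in residuals) / (n - 2)
    # Studentised residual: residual scaled by its own estimated standard error.
    studentised = [e / math.sqrt(s2 * (1.0 - h))
                   for e, h in zip(residuals, leverages)]
    return leverages, studentised

# Toy data (hypothetical): the last point deviates from the linear trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 9.0]
leverages, studentised = leverage_and_studentised_residuals(x, y)
```

Samples with both high leverage and a large absolute studentised residual are the usual candidates for removal; the cut-offs applied in the study are not stated in this record.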
Performance comparison of the machine learning models for COVID-19.
| Metric | Diagnostic model: ANN | Diagnostic model: DT | Diagnostic model: PLS-DA | Diagnostic model: KNN | Disease severity model: ANN | Disease severity model: DT | Disease severity model: PLS-DA | Disease severity model: KNN |
|---|---|---|---|---|---|---|---|---|
| Training time | 21 min 43 s | 27 min 11 s | 31 min 19 s | 22 min 15 s | 7 min 1 s | 10 min 19 s | 18 min 3 s | 9 min 53 s |
| Calibration error | 1.0% | 0.5% | 1.2% | 0.5% | 1.0% | 8.4% | 6.0% | 0.4% |
| Cross-validation error | 0.8% | 1.0% | 0.9% | 0.6% | 0.5% | 1.8% | 4.0% | 0.7% |
| Sensitivity | 0.93 | 0.89 | 0.88 | 0.84 | 0.99 | 0.90 | 0.87 | 0.82 |
| Specificity | 0.94 | 0.89 | 0.90 | 0.83 | 0.97 | 0.94 | 0.88 | 0.88 |
| Accuracy | 0.94 | 0.90 | 0.90 | 0.84 | 0.98 | 0.92 | 0.88 | 0.86 |
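For readers unfamiliar with the last three rows of the table, the following minimal sketch shows how sensitivity, specificity, and accuracy are conventionally derived from binary predictions. The function name and the label vectors are hypothetical and are not data from the study; the sketch assumes labels coded as 1 (positive or severe) and 0 (negative or non-severe), with both classes present.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy from binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return {
        "sensitivity": tp / (tp + fn),   # fraction of positives detected
        "specificity": tn / (tn + fp),   # fraction of negatives recognised
        "accuracy": (tp + tn) / len(pairs),
    }

# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```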
Area under the ROC curve.
Fig. 3. ROC curves of the machine learning models. Artificial neural network (ANN): diagnosis (A) and severity (B). Decision tree (DT): diagnosis (C) and severity (D). Partial least squares discriminant analysis (PLS-DA): diagnosis (E) and severity (F). K-nearest neighbours (KNN): diagnosis (G) and severity (H).
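The area under a ROC curve can be computed without drawing the curve, because it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below uses that pairwise (Mann-Whitney) formulation; it is an illustrative assumption, not the procedure reported by the authors, and the function name and scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Assumption: labels are 1 (positive) and 0 (negative), both present.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical model scores, for illustration only.
labels = [0, 0, 1, 1]
model_scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, model_scores)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; rank-based implementations are preferable for large datasets.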
Important biomarkers in the machine learning models for COVID-19 diagnosis and severity classification.
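| Biomarkers | Diagnostic model: ANN | Diagnostic model: DT | Diagnostic model: PLS-DA | Diagnostic model: KNN | Disease severity model: ANN | Disease severity model: DT | Disease severity model: PLS-DA | Disease severity model: KNN |
|---|---|---|---|---|---|---|---|---|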
| Ferritin | ||||||||
| Gamma-glutamyltransferase | ||||||||
| HCO3 (arterial) | ||||||||
| Base excess (arterial) | ||||||||
| Base excess (venous) | ||||||||
| Sodium | ||||||||
| Total O2 (arterial) | ||||||||
| pO2 (arterial) | ||||||||
| Total CO2 | ||||||||
| pCO2 (arterial) | ||||||||
| pCO2 (venous) | ||||||||
| Indirect bilirubin | ||||||||
| Alkaline phosphatase | ||||||||
| Urine pH | ||||||||
| pH (venous) | ||||||||
| pH (arterial) | ||||||||
| FiO2 (arterial) | ||||||||
| ctO2 (arterial) | ||||||||
| Total bilirubin | ||||||||
| Red blood cell distribution width | ||||||||
| Platelets | ||||||||
| C-reactive protein | ||||||||
| Calcium ionised | ||||||||
| Urine-density | ||||||||
| Lactate dehydrogenase | ||||||||
| Arterial lactic acid | ||||||||
| Haemoglobin saturation (arterial) | ||||||||
| Phosphorus | ||||||||
| Lipase dosage | ||||||||
| Rods | ||||||||
Less important variable (−); Important variable (+); Critical variable (++).