Jeany Delafiori, Luiz Cláudio Navarro, Rinaldo Focaccia Siciliano, Gisely Cardoso de Melo, Estela Natacha Brandt Busanello, José Carlos Nicolau, Geovana Manzan Sales, Arthur Noin de Oliveira, Fernando Fonseca Almeida Val, Diogo Noin de Oliveira, Adriana Eguti, Luiz Augusto Dos Santos, Talia Falcão Dalçóquio, Adriadne Justi Bertolin, Rebeca Linhares Abreu-Netto, Rocio Salsoso, Djane Baía-da-Silva, Fabiana G Marcondes-Braga, Vanderson Souza Sampaio, Carla Cristina Judice, Fabio Trindade Maranhão Costa, Nelson Durán, Mauricio Wesley Perroud, Ester Cerdeira Sabino, Marcus Vinicius Guimarães Lacerda, Leonardo Oliveira Reis, Wagner José Fávaro, Wuelton Marcelo Monteiro, Anderson Rezende Rocha, Rodrigo Ramos Catharino.
Abstract
COVID-19 is still placing a heavy health and financial burden worldwide. Impairment in patient screening and risk management plays a fundamental role in how governments and authorities direct resources and plan reopening and sanitary countermeasures, especially in regions where poverty is a major component in the equation. An efficient diagnostic method must be highly accurate while remaining cost-effective. We combined a machine learning-based algorithm with mass spectrometry to create an expeditious platform that discriminates COVID-19 in plasma samples within minutes, while also providing tools for risk assessment, to assist healthcare professionals in patient management and decision-making. A cross-sectional study enrolled 815 patients (442 COVID-19, 350 controls, and 23 COVID-19 suspicious) from three Brazilian epicenters from April to July 2020. We were able to select and identify 19 molecules related to the disease's pathophysiology and several features that discriminate patients' health-related outcomes. The method applied for COVID-19 diagnosis showed specificity >96% and sensitivity >83%, and specificity >80% and sensitivity >85% during risk assessment, both from blinded data. Our method introduces a new approach for COVID-19 screening, providing indirect detection of infection through metabolites and contextualizing the findings with the disease's pathophysiology. The pairwise analysis of biomarkers brought robustness to the model developed using machine learning algorithms, transforming this screening approach into a tool with great potential for real-world application.
Year: 2021 PMID: 33471512 PMCID: PMC8023531 DOI: 10.1021/acs.analchem.0c04497
Source DB: PubMed Journal: Anal Chem ISSN: 0003-2700 Impact factor: 6.986
Characteristics of COVID-19 Confirmed and Suspicious Patients
| characteristics | COVID-19 = 442 | suspicious = 23 |
|---|---|---|
| age, years, mean (SD) | 50 (15.4) | 56 (13.6) |
| female sex, N (%) | 186 (42.1) | 6 (26.1) |
| Severity, N (%) | ||
| homecare | 189 (42.8) | 1 (4.3) |
| hospitalization | 253 (57.2) | 22 (95.7) |
| ≤10 days | 125 (49.4) | 8 (34.8) |
| >10 days | 123 (48.6) | 15 (65.2) |
| transferred | 5 (2.0) | - |
| in-hospital death | 123 (49.6) | 11 (47.8) |
| onset of symptoms to enrolment, days, mean (SD) | 10.6 (6.3) | 5.5 (3.5) |
| Respiratory Support, N (%) | ||
| no oxygen received | 213 (48.2) | 2 (8.7) |
| oxygen | 76 (17.2) | 4 (17.4) |
| invasive mechanical ventilation | 153 (34.6) | 17 (73.9) |
| Comorbidities, N (%) | ||
| diabetes | 115 (26.4) | 8 (34.8) |
| hypertension | 176 (40.5) | 12 (52.2) |
| obesity | 113 (29.9) | 2 (8.7) |
| cardiomyopathy | 35 (8.1) | 5 (21.7) |
| respiratory diseases | 37 (8.5) | 8 (34.8) |
| chronic renal diseases | 13 (3.0) | 3 (13.0) |
| chronic hepatic diseases | 15 (34.6) | - |
| HIV | 6 (13.9) | 1 (4.2) |
N = 431.
N = 435.
N = 378.
N = 432.
N = 433.
Figure 1. Study design flowchart. Abbreviations: Hosp, hospitalization; IMV, invasive mechanical ventilation.
Figure 2. End-to-end process for putative biomarker determination and diagnostic test generation. (a) MS data acquisition and preparation; (b) sequential steps of ML data analysis and metabolomics biomarker validation.
Statistical Metrics Definition to Evaluate Classification Results
| metric | formula |
|---|---|
| sensitivity (SEN) | TP/(TP+FN) |
| specificity (SPE) | TN/(TN+FP) |
| precision (PRE) | TP/(TP+FP) |
| accuracy (ACC) | (SEN+SPE)/2 |
| F1-score (F1s) | 2·PRE·SEN/(PRE+SEN) |
| Matthews’ Correlation Coefficient (MCC) | ((TP·TN)-(FP·FN))/sqrt((TP+FP)·(TP+FN)·(TN+FP)·(TN+FN)) |
| ΔJ | |
| correlation index | |
Abbreviations: TP = true positives; TN = true negatives; FP = false positives; FN = false negatives; sqrt = square root; A = set of all vectors; x_ij = value of variable j of vector i in A; y_i = label of vector i in A, y_i ∈ {0, 1}; X_j = set of all values of variable j in A; A_N = set of all vectors of negative samples in A, i.e., labeled y = 0; median_N(X_j) = median of values of variable j for all vectors in A_N; CDF_N(x) = the cumulative probability function (CDF) of values x ∈ X_j in A_N; A_P = set of all vectors of positive samples in A, i.e., labeled y = 1; median_P(X_j) = median of values of variable j for all vectors in A_P; CDF_P(x) = the cumulative probability function (CDF) of values x ∈ X_j in A_P; t and u = features.
Dataset Subdivisions for Model Fitting (Training and Validation), Testing and Blind Test
| model | COVID-19 diagnosis (M1) | | | risk assessment (M2) | | | low-risk discrimination (M3) | | |
|---|---|---|---|---|---|---|---|---|---|
| class | positive | negative | subtotal | severe | mild | subtotal | mild | negative | subtotal |
| training | 260 | 231 | 491 (45) | 94 | 104 | 198 (45) | 113 | 140 | 253 (13) |
| validation | 105 | 95 | 200 (18) | 37 | 43 | 80 (18) | 34 | 42 | 76 (13) |
| testing | 57 | 53 | 110 (10) | 19 | 23 | 42 (10) | 23 | 28 | 51 (9) |
| blind test | 231 | 50 | 281 (26) | 41 | 76 | 117 (27) | 76 | 139 | 215 (36) |
Numbers are the average number of individuals (N), with percentages in parentheses.
Performance Metrics Using Pairwise Features in 10 Validation Tests, Final Development Testing and Deployed Software Blind Test
| Model | COVID-19 diagnosis (M1) | | | Risk assessment (M2) | | | Low-risk discrimination (M3) | | |
|---|---|---|---|---|---|---|---|---|---|
| Algorithm | GDB | | | ADA | | | ADA | | |
| Data set | Validation | Test | Blind Test | Validation | Test | Blind Test | Validation | Test | Blind Test |
| Vector length | 39 | 39 | 39 | 32 | 32 | 32 | 29 | 29 | 29 |
| # of Estimators | 260 (3) | 256 | 256 | 260 (3) | 256 | 256 | 260 (3) | 256 | 256 |
| TN | 90 (3) | 50 | 48 | 38 (2) | 21 | 61 | 40 (2) | 26 | 121 |
| FP | 5 (2) | 3 | 2 | 5 (2) | 2 | 15 | 2 (1) | 2 | 18 |
| FN | 4 (2) | 3 | 39 | 4 (2) | 4 | 6 | 3 (1) | 2 | 4 |
| TP | 101 (4) | 54 | 192 | 33 (3) | 15 | 35 | 31 (2) | 21 | 72 |
| Accuracy (%) | 95.6 (1.1) | 94.5 | 89.6 | 88.7 (3.2) | 85.1 | 82.8 | 93.4 (1.8) | 92.1 | 90.9 |
| Sensitivity (%) | 95.9 (1.8) | 94.7 | 83.1 | 88.1 (4.6) | 79.0 | 85.4 | 91.8 (3.1) | 91.3 | 94.7 |
| Specificity (%) | 95.2 (2.1) | 94.3 | 96.0 | 89.3 (4.7) | 91.3 | 80.3 | 95.0 (2.4) | 92.9 | 87.1 |
| Precision (%) | 95.3 (1.9) | 94.4 | 95.4 | 89.3 (4.2) | 90.1 | 81.2 | 94.9 (2.3) | 92.7 | 88.0 |
| F1 Score (%) | 95.6 (1.1) | 94.6 | 88.8 | 88.6 (3.2) | 84.2 | 83.2 | 93.3 (1.9) | 92.0 | 91.2 |
| MCC | 0.91 (0.02) | 0.89 | 0.80 | 0.78 (0.06) | 0.71 | 0.66 | 0.87 (0.04) | 0.84 | 0.82 |
Numbers are the average counts of classified individuals, with standard deviations in parentheses. Abbreviations: ADA, ADA Boosting; GDB, gradient tree boosting; FN, false negative; FP, false positive; TN, true negative; TP, true positive; MCC, Matthews' Correlation Coefficient.
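As a worked check, the metric definitions from the statistical metrics table can be applied to the confusion-matrix counts reported above. The sketch below is illustrative only: the function name `classification_metrics` and its `normalize` flag are hypothetical and do not come from the study's software. The `normalize=True` option rescales each class to unit weight before computing precision, F1 and MCC; that rescaling is an assumption inferred from the reported numbers, not something stated in this record.

```python
from math import sqrt


def classification_metrics(tp, tn, fp, fn, normalize=False):
    """Compute SEN, SPE, PRE, ACC, F1 and MCC from confusion-matrix counts.

    normalize=True first converts counts to per-class rates so that the
    positive and negative classes carry equal weight (an assumption, see
    the lead-in text).
    """
    if normalize:
        pos, neg = tp + fn, tn + fp
        tp, fn = tp / pos, fn / pos
        tn, fp = tn / neg, fp / neg

    sen = tp / (tp + fn)              # sensitivity: TP/(TP+FN)
    spe = tn / (tn + fp)              # specificity: TN/(TN+FP)
    pre = tp / (tp + fp)              # precision:   TP/(TP+FP)
    acc = (sen + spe) / 2             # accuracy as defined in the table
    f1 = 2 * pre * sen / (pre + sen)  # F1-score
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sen,
        "specificity": spe,
        "precision": pre,
        "accuracy": acc,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # M1 blind-test counts taken from the performance table.
    m1_blind = classification_metrics(tp=192, tn=48, fp=2, fn=39, normalize=True)
    for name, value in m1_blind.items():
        print(f"{name}: {value:.3f}")
```

With `normalize=True`, the M1 blind-test counts round to sensitivity 83.1%, specificity 96.0%, accuracy 89.6%, precision 95.4%, F1 88.8% and MCC 0.80, in line with the table. With raw counts, sensitivity, specificity and accuracy are unchanged, but precision becomes 192/194 (about 99%), which is why the class-weighting assumption is flagged here.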
Figure 3. Recursive fitting of mass spectra data followed by model optimization processes allowed the determination of putative biomarkers ranked by ΔJ importance and group contribution. Abbreviations: CE, cholesteryl ester; DG, diacylglycerol; DHEA, dehydroepiandrosterone; DeoxyGU, deoxyguanosine; LysoPC, lysophosphatidylcholine; PC, phosphatidylcholine; PE, phosphatidylethanolamine; PG, phosphatidylglycerol; PS, phosphatidylserine; SM, sphingomyelin; TG, triacylglycerol; UNK, unknown.
Figure 4. Proposed role of identified biomarkers in COVID-19 pathophysiology. Abbreviations: ARDS, acute respiratory distress syndrome; COX-2, cyclooxygenase-2; deoxyGU, deoxyguanosine; LPCAT1, lysophosphatidylcholine acyltransferase 1; LysoPC, lysophosphatidylcholine; PC, phosphatidylcholine; PLA2, phospholipase A2.