Covid-19 Automated Diagnosis and Risk Assessment through Metabolomics and Machine Learning.

Jeany Delafiori¹, Luiz Cláudio Navarro², Rinaldo Focaccia Siciliano^3,4, Gisely Cardoso de Melo^5,6, Estela Natacha Brandt Busanello¹, José Carlos Nicolau⁴, Geovana Manzan Sales¹, Arthur Noin de Oliveira¹, Fernando Fonseca Almeida Val^5,6, Diogo Noin de Oliveira¹, Adriana Eguti⁷, Luiz Augusto Dos Santos⁸, Talia Falcão Dalçóquio⁴, Adriadne Justi Bertolin⁴, Rebeca Linhares Abreu-Netto^5,6, Rocio Salsoso⁴, Djane Baía-da-Silva^5,6, Fabiana G Marcondes-Braga⁴, Vanderson Souza Sampaio^5,9, Carla Cristina Judice¹⁰, Fabio Trindade Maranhão Costa¹⁰, Nelson Durán¹¹, Mauricio Wesley Perroud⁷, Ester Cerdeira Sabino¹², Marcus Vinicius Guimarães Lacerda^5,13, Leonardo Oliveira Reis¹⁴, Wagner José Fávaro¹¹, Wuelton Marcelo Monteiro^5,6, Anderson Rezende Rocha², Rodrigo Ramos Catharino¹.

Abstract

COVID-19 is still placing a heavy health and financial burden worldwide. Impairment in patient screening and risk management plays a fundamental role in how governments and authorities are directing resources and planning reopening and sanitary countermeasures, especially in regions where poverty is a major component in the equation. An efficient diagnostic method must be highly accurate while remaining cost-effective. We combined a machine learning-based algorithm with mass spectrometry to create an expeditious platform that discriminates COVID-19 in plasma samples within minutes, while also providing tools for risk assessment, to assist healthcare professionals in patient management and decision-making. A cross-sectional study enrolled 815 patients (442 COVID-19, 350 controls and 23 COVID-19 suspicious) from three Brazilian epicenters from April to July 2020. We were able to select and identify 19 molecules related to the disease's pathophysiology and several features that discriminate patients' health-related outcomes. The method applied for COVID-19 diagnosis showed specificity >96% and sensitivity >83%, and specificity >80% and sensitivity >85% during risk assessment, both from blinded data. Our method introduced a new approach for COVID-19 screening, providing the indirect detection of infection through metabolites and contextualizing the findings with the disease's pathophysiology. The pairwise analysis of biomarkers brought robustness to the model developed using machine learning algorithms, transforming this screening approach into a tool with great potential for real-world application.

Substances：
Biomarkers

Year: 2021 PMID： 33471512 PMCID： PMC8023531 DOI： 10.1021/acs.analchem.0c04497

Source DB: PubMed Journal: Anal Chem ISSN： 0003-2700 Impact factor: 6.986

Coronaviruses (CoVs) are enveloped, positive-sense single-stranded RNA viruses from the Coronaviridae family.[1] The recent pandemic, caused by SARS-CoV-2 and denominated COVID-19, opened discussions about measures to control disease spread, such as social distancing and population screening.[2] Currently available tests are based on the direct detection of SARS-CoV-2 through antigens or RNA amplification (RT-PCR), serological tests, and the combination of RT-PCR and chest computed tomography (CT). The development of medical decision-making tools for patient risk stratification and management is aligned with the urgency of COVID-19 testing. Even though the basis for the standard procedures is well-established, there are increasing concerns about the sensitivity and specificity that tests achieve in the field, the time and costs associated with procedures, the availability of reagents and trained personnel, and the testing window.[3−5] Difficulties in accurately diagnosing SARS-CoV-2 infection and categorizing patient risk are consequences of COVID-19 complexity. SARS-CoV-2 infection pathophysiology is reflected in a broad spectrum of patient symptoms, ranging from mild flu-like manifestations to life-threatening acute respiratory distress syndrome (ARDS), vascular dysfunction, and sepsis.[2,6] Changes in lipid homeostasis, a common characteristic of viral infections, have been associated with SARS-CoV-2 pathology.[7−9] Lipidomic and metabolomic profiling of plasma samples indicates that exosomes enriched with monosialodihexosyl ganglioside (GM3) are associated with the severity of COVID-19.[7] Moreover, Fan et al. (2020) proposed a relationship between the progressive decrease in serum low-density lipoprotein cholesterol (LDL-C) and cholesterol within deceased patients.[8] Individual susceptibility to COVID-19 symptoms is not fully understood, which hampers any potential outcome prediction.
Panels of biomarkers that translate disease pathophysiology and sample profiling contribute to SARS-CoV-2 detection and may be proposed through “omics” techniques.[7,9−11] The current trend of associating artificial intelligence-explained algorithms with “omics” techniques has yielded platforms that use machine learning (ML) to analyze mass spectrometry (MS) data, aiming at the identification of disease biomarkers, including COVID-19 severity assessment.[9,11,12] However, applying traditional untargeted mass spectrometry for diagnostic purposes remains laborious.[12,13] Considering that the testing tool for COVID-19 introduced in this contribution is based on metabolites from actual patients, it may be considered a new approach for SARS-CoV-2 screening. The proposed end-to-end mass spectrometry and machine learning combination aims at predictively identifying and modeling putative biomarkers for COVID-19 identification, risk assessment and low-risk discrimination from noninfected individuals. The introduced pairwise features approach is critical for the effective implementation of untargeted metabolomics in a real-world setting, adding robustness to the model in spite of variations in the input data. Therefore, using the potential of MS-ML techniques in fighting COVID-19,[14] we enrolled a cohort of 815 individuals for the development and testing of this independent system, which simultaneously functions as an automated screening test using plasma samples and provides metabolic information related to the presence and severity risk of the infection.

Experimental Section

Study Design and Patient Recruitment

Participants were recruited from April to July 2020 at selected sites with proven expertise in research, chosen to increase data variability: Hospital das Clinicas, Faculdade de Medicina, Universidade de São Paulo (HCFMUSP), São Paulo, Brazil; Sumaré State Hospital and Paulínia Municipal Hospital, located in inland São Paulo state, Brazil; and Hospital Delphina Rinaldi Abdel Aziz, Manaus, Amazonas State, Brazil. The study was conducted according to the principles expressed in the Declaration of Helsinki and approved by local Ethics Committees (CAAE 32077020.6.0000.0005, CAAE 31049320.7.1001.5404 and CAAE 30299620.7.0000.0068). All participants provided informed consent before sample collection. Inclusion criteria for the COVID-19 group (CV) were adult patients with one or more clinical symptoms of SARS-CoV-2 infection in the last 7 days (fever, dry cough, malaise and/or dyspnea) and positive SARS-CoV-2 RT-PCR in nasopharyngeal samples, following local hospital testing protocols based on Charité and WHO recommendations.[15] The control group (CT) was formed by symptomatic RT-PCR-negative participants in whom COVID-19 was also discarded by clinical and tomographic picture, and by asymptomatic volunteers. Patients with suspicious COVID-19 and negative RT-PCR were separated into a group for later assessment. In this study, 815 participants were included. Gender, age, and fasting restrictions were not applied, in order to simulate real-world conditions and to provide results with no patient bias. Table 1 shows detailed demographic information for COVID-19 and suspicious patients, and Figure 1 shows a flowchart of the study design.

Table 1

Characteristics of COVID-19 Confirmed and Suspicious Patients

characteristics	CV = 442	suspicious = 23
age, years, mean (SD)	50 (15.4)	56 (13.6)
female sex, N (%)	186 (42.1)	6 (26.1)
Severity, N (%)
homecare	189 (42.8)	1 (4.3)
hospitalization	253 (57.2)	22 (95.7)
≤10 days	125 (49.4)	8 (34.8)
>10 days	123 (48.6)	15 (65.2)
transferred	5 (2.0)	-
in-hospital death	123 (49.6)	11 (47.8)
onset of symptoms to enrolment, days, mean (SD)	10.6 (6.3)a	5.5 (3.5)
Respiratory Support, N (%)
no oxygen received	213 (48.2)	2 (8.7)
oxygen	76 (17.2)	4 (17.4)
invasive mechanical ventilation	153 (34.6)	17 (73.9)
Comorbidities, N (%)
diabetes	115 (26.4)b	8 (34.8)
hypertension	176 (40.5)b	12 (52.2)
obesity	113 (29.9)c	2 (8.7)
cardiomyopathy	35 (8.1)d	5 (21.7)
respiratory diseases	37 (8.5)b	8 (34.8)
chronic renal diseases	13 (3.0)e	3 (13.0)
chronic hepatic diseases	15 (3.5)e	-
HIV	6 (1.4)d	1 (4.2)

a N = 431. b N = 435. c N = 378. d N = 432. e N = 433.

Figure 1

Study design flowchart. Abbreviations: Hosp, hospitalization; IMV, invasive mechanical ventilation.


COVID-19 Diagnostic Modeling (M1)

The diagnosis model (M1) was trained, validated, and tested using a CV group of 548 plasma samples: 442 collected from symptomatic, SARS-CoV-2 confirmed cases upon hospital arrival, and 106 representing a second collection from hospitalized patients (mean 9.6 days, SD 3.8). The CT group was formed by 37 symptomatic individuals with COVID-19 discarded and 313 asymptomatic controls, totaling 350 individuals, with a median age of 50 years (IQR 32–72) and 64.9% female. Pooled samples (n = 184) were introduced to the data set to increase method sensitivity: 79 pooled CT, 5 pooled CV, 50 samples with 1:5 (CV:CT) and 50 with 1:10 (CV:CT) dilutions. The positivity rate among the 23 suspicious individuals was assessed using this model.
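As a quick consistency check, the sample counts stated above can be added up. The short Python sketch below uses only numbers given in this paragraph (the variable names are illustrative) and arrives at the M1 data set size of n = 1082 reported in Table 3.

```python
# Sample counts as stated in the text for the diagnostic model (M1).
cv_first_collection = 442   # symptomatic, RT-PCR-confirmed cases at hospital arrival
cv_second_collection = 106  # second collection from hospitalized patients
ct_symptomatic = 37         # symptomatic, COVID-19 discarded
ct_asymptomatic = 313       # asymptomatic controls
pooled = {"pooled CT": 79, "pooled CV": 5, "1:5 CV:CT": 50, "1:10 CV:CT": 50}

cv_samples = cv_first_collection + cv_second_collection
ct_samples = ct_symptomatic + ct_asymptomatic
pooled_samples = sum(pooled.values())
m1_total = cv_samples + ct_samples + pooled_samples

print(cv_samples, ct_samples, pooled_samples, m1_total)  # 548 350 184 1082
```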

Risk Assessment and Mild Symptoms Discrimination Modeling

Samples from 437 SARS-CoV-2 positive patients with reported outcome were divided into severe cases (n = 191) and mild cases (n = 246). Severe cases were those requiring hospitalization for more than 10 days with recovery as outcome, or invasive mechanical ventilation, or ending in death; the mild group consisted of patients with moderate symptoms (hospitalization for less than 10 days with recovery) and mild symptoms (homecare). Severe cases were compared against the mild group for classification and determination of risk biomarkers (M2). The method's sensitivity and specificity in discriminating low-risk patients were also assessed by comparing controls (CT = 350) against the mild group in a third machine learning model (M3).
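The grouping rule for M2 can be written as a small decision function. The sketch below is only an illustration of the criteria described above: the `Outcome` record and its field names are hypothetical, and treating a stay of exactly 10 days as mild follows the "≤10 days" row of Table 1 rather than an explicit statement in the text.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Hypothetical per-patient outcome record (field names are illustrative)."""
    hospitalized: bool
    days_in_hospital: int = 0
    invasive_ventilation: bool = False
    deceased: bool = False


def risk_group(o: Outcome) -> str:
    """Return 'severe' or 'mild' following the grouping described for M2."""
    if o.deceased or o.invasive_ventilation:
        return "severe"
    if o.hospitalized and o.days_in_hospital > 10:
        return "severe"
    # Homecare, or hospitalization of up to 10 days with recovery.
    return "mild"


print(risk_group(Outcome(hospitalized=True, days_in_hospital=12)))  # severe
print(risk_group(Outcome(hospitalized=False)))                      # mild
```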

Mass Spectrometry Sample Preparation

Plasma samples from peripheral venous blood were carefully handled due to the biohazard risk and frozen at −80 °C until analysis. A 20 μL aliquot of plasma was diluted in 200 μL of tetrahydrofuran, followed by homogenization for 30 s at room temperature. Then, 780 μL of methanol was added, followed by homogenization for 30 s and centrifugation for 5 min at 3400 rpm and 4 °C. An aliquot of 5 μL of the supernatant was diluted in 495 μL of methanol and positively ionized by the addition of formic acid (0.1% final concentration) prior to analysis.
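The overall plasma dilution implied by the two steps above can be worked out directly. The following Python sketch assumes that volumes are additive and ignores the small volume of formic acid; the helper name `dilution` is illustrative.

```python
from fractions import Fraction


def dilution(aliquot_ul: int, diluent_ul: int) -> Fraction:
    """Fraction of the aliquot in the final mixture (volumes assumed additive)."""
    return Fraction(aliquot_ul, aliquot_ul + diluent_ul)


step1 = dilution(20, 200 + 780)  # plasma in tetrahydrofuran plus methanol
step2 = dilution(5, 495)         # supernatant in methanol
overall = step1 * step2

print(step1, step2, overall)  # 1/50 1/100 1/5000
```

Under these assumptions the infused solution corresponds to plasma diluted 1:5000.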

Mass Spectrometry Analysis and Biomarker Elucidation

All samples were randomized for intra- and interday data acquisition and directly infused into a HESI-Q Exactive Orbitrap-MS (Thermo Scientific, Bremen, Germany) with a mass resolution of 140 000 FWHM in positive ion mode. MS parameters were set as follows: m/z range 150–1700, 10 mass spectral acquisitions per sample, flow rate 10 μL/min, sheath gas flow rate 5 units, capillary temperature 320 °C, aux gas heater temperature 33 °C, spray voltage 3.70 kV, automatic gain control (AGC) at 1 × 10^6, S-lens RF level 50, and injection time <2 ms. After ML modeling, the presence of each discriminant m/z determined by the algorithm was confirmed in the mass spectra using Xcalibur 3.0 software (Thermo Scientific, Bremen, Germany). Molecule identity was proposed using the METLIN database (https://metlin.scripps.edu) and a literature search with mass accuracy ≤5 ppm, and confirmed through MSn experiments using an ESI-LTQ XL (Thermo Scientific, Bremen, Germany) with collision energy ranging from 20 to 50 eV (He) and Mass Frontier software (Thermo Scientific, Bremen, Germany) for fragmentation modeling. Biomarker pathways and meaning were attributed based on the KEGG database (https://www.genome.jp/kegg/) and scientific literature.
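The ≤5 ppm mass accuracy criterion can be illustrated with a short calculation. In the Python sketch below, the function names and both m/z values are hypothetical examples, not measurements from this study.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz: float, theoretical_mz: float, tol_ppm: float = 5.0) -> bool:
    """True when the absolute mass error does not exceed tol_ppm."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm


theoretical = 496.3398  # illustrative theoretical m/z of a candidate ion
measured = 496.3410     # hypothetical measured m/z

print(round(ppm_error(measured, theoretical), 2))  # about 2.42 ppm
print(within_tolerance(measured, theoretical))     # True
```

A candidate annotation whose error exceeds the tolerance (for example, a measured value of 496.3450 against the same theoretical m/z, about 10.5 ppm) would be rejected under this criterion.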

Machine Learning Data Analysis

The MS-ML platform[16] consists of two primary data analysis phases. The first phase comprises the development of an ML model using a classification algorithm over MS data to determine potential m/z biomarkers. The second phase entails a deployed prediction model for diagnosis and risk determination, used for screening individuals in the field (blind data), as described in Figure 2. First, mass spectrometric data are preprocessed for ion annotation (intensity, width, resolution, and m/z), alignment, normalization, and denoising.[17] Three partitions of data are segregated according to ML best practices: fitting (training and validation), test, and blind test. The final classification results are reported using the blind test (see Figure 2a). For the COVID-19 model (M1), a second blind test using suspicious cases was performed to evaluate the positivity rate among SARS-CoV-2 RT-PCR-negative patients. The most discriminant features are determined and validated using the following ML algorithms: ADA tree boosting (ADA), gradient tree boosting (GDB), random forest (RF), extreme random forest (XRF), partial least squares (PLS), and support-vector machines (SVM).[18−20] Recursive fitting is applied to the training and validation data (Figure 2b), and the average and standard deviation of the selected performance metrics defined in Table 2 are recorded for each round of validation (optimized through accuracy, F1-score, and MCC).
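The partitioning scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's software: the function names are hypothetical, the split is purely random, and the fractions only loosely follow the M1 proportions listed in Table 3.

```python
import random


def split_partitions(sample_ids, blind_frac=0.26, test_frac=0.10, seed=0):
    """Hold out blind-test and test partitions once; the rest is the fitting partition."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_blind = round(len(ids) * blind_frac)
    n_test = round(len(ids) * test_frac)
    blind = ids[:n_blind]
    test = ids[n_blind:n_blind + n_test]
    fitting = ids[n_blind + n_test:]
    return fitting, test, blind


def validation_rounds(fitting, rounds=10, val_frac=0.29, seed=0):
    """Yield (training, validation) pairs, reshuffling the fitting partition each round."""
    rng = random.Random(seed)
    for _ in range(rounds):
        ids = list(fitting)
        rng.shuffle(ids)
        n_val = round(len(ids) * val_frac)
        yield ids[n_val:], ids[:n_val]


fitting, test, blind = split_partitions(range(1082))
for training, validation in validation_rounds(fitting):
    pass  # fit on `training`, score on `validation`, then average the metrics
```

Keeping the test and blind partitions outside the reshuffling loop means that no sample used for feature selection or tuning contributes to the reported blind-test metrics.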

Figure 2

Table 2

Statistical Metrics Definition to Evaluate Classification Resultsa

metric	formula
sensitivity (SEN)	TP/(TP+FN)
specificity (SPE)	TN/(TN+FP)
precision (PRE)	TP/(TP+FP)
accuracy (ACC)	(SEN+SPE)/2
F1-score (F1s)	2·PRE·SEN/(PRE+SEN)
Matthews’ Correlation Coefficient (MCC)	((TP·TN)-(FP·FN))/sqrt((TP+FP)·(TP+FN)·(TN+FP)·(TN+FN))
ΔJ
correlation index (r), using Pearson correlation coefficient

Abbreviations: TP = true positives; TN = true negatives; FP = false positives; FN = false negatives; sqrt = square root; A = set of all vectors; = value of variable j of vector i in A, ∈ ; = label of vector i in A, y = [0,1]; X set of all values of variable j in A; = set of all vectors of negative samples in A, i.e., labeled y = 0; = median of values of variable j for all vectors in A; () = the cumulative probability function (CDF) of values x ∈ X in A; = set of all vectors of positive samples in A, i.e., labeled y = 1; = median of values of variable j for all vectors in A; () = the cumulative probability function (CDF) of values x ∈ X in A; t and u = features.

End-to-end process for putative biomarker determination and diagnostic test generation. (a) MS data acquisition and preparation; (b) sequential steps of ML data analysis and metabolomics biomarker validation.

After the observation of performance metrics, the discriminant features are evaluated through ΔJ importance (see Table 2) and selected for metabolomics biomarker identification. The marker importance is given by a cumulative distribution function (CDF) analysis. For a specific m/z, the CDF of the feature values for the negative samples (CT group) is compared with the CDF for the positive samples (CV group) in the fitting partition. First, the two-sample Kolmogorov–Smirnov test (KS-test) of the equality hypothesis determines whether the two distributions differ (i.e., whether the equality hypothesis is rejected). Then, the ΔJ metric is used to determine whether the feature contributes negatively (ΔJ < 0) or positively (ΔJ > 0) to the disease. Features are discarded if the CDFs are equal according to the KS-test or if ΔJ = 0.[21] The selected discriminant biomarkers undergo a second round of training and validation with the development software (Figure 2b).
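The CDF-based screening step can be sketched in a few lines of Python. The KS statistic and its large-sample critical value are standard; the direction of the contribution, however, is taken here from the difference of class medians, which is only a stand-in assumption for the sign of ΔJ (the ΔJ formula itself is not reproduced in this sketch). All function names are illustrative.

```python
from bisect import bisect_right
from math import sqrt
from statistics import median


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in a + b
    )


def distributions_differ(a, b, c_alpha=1.358):
    """Large-sample KS decision; c_alpha = 1.358 corresponds to alpha = 0.05."""
    critical = c_alpha * sqrt((len(a) + len(b)) / (len(a) * len(b)))
    return ks_statistic(a, b) > critical


def screen_feature(negatives, positives):
    """Return 'discard', 'positive' or 'negative' for one m/z feature."""
    if not distributions_differ(negatives, positives):
        return "discard"
    shift = median(positives) - median(negatives)  # stand-in for the sign of delta-J
    if shift == 0:
        return "discard"
    return "positive" if shift > 0 else "negative"


ct = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical intensities, negative samples
cv = [6.0, 7.0, 8.0, 9.0, 10.0]  # hypothetical intensities, positive samples
print(screen_feature(ct, cv))  # positive
```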
As putative biomarkers are validated via metabolomics, they are submitted to pairwise model creation, where the relationships between biomarker intensities are used instead of only the relative abundances provided in each spectrum, leading to an applied untargeted metabolomics diagnosis software. Features correlated with the selected biomarkers at a correlation index (Table 2) of r ≥ 80% were also identified. Detailed information on the ML method is given in the Supporting Information (SI).
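One way to picture the pairwise idea is to replace each intensity by its relation to every other selected intensity. The sketch below uses log-ratios between pairs of features t and u; the choice of a log-ratio, the function name, and the m/z and intensity values are all assumptions made for illustration, since the exact pairwise relation is described in the SI rather than here.

```python
from itertools import combinations
from math import log


def pairwise_features(intensities):
    """Map {m/z: intensity} to {(mz_t, mz_u): log(I_t / I_u)} for every pair t < u."""
    return {
        (t, u): log(intensities[t] / intensities[u])
        for t, u in combinations(sorted(intensities), 2)
    }


spectrum = {496.34: 2.0e5, 524.37: 8.0e4, 758.57: 4.0e5}  # hypothetical m/z and intensities
rescaled = {mz: 0.5 * i for mz, i in spectrum.items()}    # same sample, lower overall signal

print(pairwise_features(spectrum) == pairwise_features(rescaled))  # True
```

Because a global change in signal cancels within each pair, features of this kind stay the same when the overall intensity varies between acquisitions, which is consistent with the robustness attributed to the pairwise approach.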

Results and Discussion

COVID-19 Testing Through MS-ML Platform: Modeling and Performance

The full data set was segregated as shown in Table 3 for the fitting process (shuffled in 10 rounds of training and validation) and testing. The novel sequential processing of metabolomics data with ML algorithms resulted in a predictive model used for diagnosis and risk assessment in the field (blind test). Table 4 shows metric results for the predictive models (automated diagnosis (M1), risk assessment (M2), and low-risk discrimination (M3)) using pairwise features. Features selected through recursive fitting using MCC as the metric are used to project group separation (Figure 3). During this process, discriminant features are evaluated through ΔJ importance and subsequently identified as molecules by the metabolomics approach. The best final results for COVID-19 automated diagnosis were obtained with gradient tree boosting (GDB), with 96.0% specificity and 83.1% sensitivity. COVID-19 suspicious patients with RT-PCR negative results were assessed using the final COVID-19 classifier, resulting in a positivity rate of 78.3%, which may indicate the presence of false negatives among the RT-PCR results. The best results for risk assessment were obtained with the ADA Boosting algorithm, with 80.3% specificity and 85.4% sensitivity in the blind test.

Table 3

Dataset Subdivisions for Model Fitting (Training and Validation), Testing and Blind Testa

model	COVID-19 diagnosis (M1) (n = 1082)			risk assessment (M2) (n = 437)			low-risk discrimination (M3) (n = 595)
class	positive	negative	subtotal	severe	mild	subtotal	mild	negative	subtotal
training	260	231	491 (45)	94	104	198 (45)	113	140	253 (43)
validation	105	95	200 (18)	37	43	80 (18)	34	42	76 (13)
testing	57	53	110 (10)	19	23	42 (10)	23	28	51 (9)
blind test	231	50	281 (26)	41	76	117 (27)	76	139	215 (36)

Numbers correspond to individual (N) average and percentages in parentheses.

Table 4

Performance Metrics Using Pairwise Features in 10 Validation Tests, Final Development Testing and Deployed Software Blind Testa

Model	COVID-19 diagnosis (M1)			Risk assessment (M2)			Low-risk discrimination (M3)
Algorithm	GDB			ADA			ADA
Data set	Validation	Test	Blind Test	Validation	Test	Blind Test	Validation	Test	Blind Test
Vector length	39	39	39	32	32	32	29	29	29
# of Estimators	260 (3)	256	256	260 (3)	256	256	260 (3)	256	256
TN	90 (3)	50	48	38 (2)	21	61	40 (2)	26	121
FP	5 (2)	3	2	5 (2)	2	15	2 (1)	2	18
FN	4 (2)	3	39	4 (2)	4	6	3 (1)	2	4
TP	101 (4)	54	192	33 (3)	15	35	31 (2)	21	72
Accuracy (%)	95.6 (1.1)	94.5	89.6	88.7 (3.2)	85.1	82.8	93.4 (1.8)	92.1	90.9
Sensitivity (%)	95.9 (1.8)	94.7	83.1	88.1 (4.6)	79.0	85.4	91.8 (3.1)	91.3	94.7
Specificity (%)	95.2 (2.1)	94.3	96.0	89.3 (4.7)	91.3	80.3	95.0 (2.4)	92.9	87.1
Precision (%)	95.3 (1.9)	94.4	95.4	89.3 (4.2)	90.1	81.2	94.9 (2.3)	92.7	88.0
F1 Score (%)	95.6 (1.1)	94.6	88.8	88.6 (3.2)	84.2	83.2	93.3 (1.9)	92.0	91.2
MCC	0.91 (0.02)	0.89	0.80	0.78 (0.06)	0.71	0.66	0.87 (0.04)	0.84	0.82

Numbers correspond to the average classification of individuals, with standard deviations in parentheses. Abbreviations: ADA, ADA Boosting; GDB, gradient tree boosting; FN, false negative; FP, false positive; TN, true negative; TP, true positive; MCC, Matthews’ correlation coefficient.
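The Table 2 formulas can be applied to the confusion counts of Table 4 as a worked example. In the Python sketch below, the function name and the `balance_classes` option are illustrative. The per-class rescaling is an assumption, not something stated in the text; it is included because, with it, the formulas return the precision, F1-score and MCC listed for the M1 blind test, whereas raw counts give a precision of about 99%.

```python
from math import sqrt


def metrics(tp, fn, tn, fp, balance_classes=True):
    """Table 2 metrics from confusion counts; optionally rescale each class to unit weight."""
    if balance_classes:
        pos, neg = tp + fn, tn + fp
        tp, fn, tn, fp = tp / pos, fn / pos, tn / neg, fp / neg
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    pre = tp / (tp + fp)
    return {
        "SEN": sen,
        "SPE": spe,
        "PRE": pre,
        "ACC": (sen + spe) / 2,
        "F1": 2 * pre * sen / (pre + sen),
        "MCC": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Blind-test confusion counts for M1, taken from Table 4.
m1_blind = metrics(tp=192, fn=39, tn=48, fp=2)
for name, value in m1_blind.items():
    print(name, round(value, 3))
```

Sensitivity, specificity and accuracy, defined as (SEN+SPE)/2, do not depend on the rescaling, so they match Table 4 either way.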

Figure 3

Recursive fitting of mass spectra data followed by model optimization processes allowed the determination of putative biomarkers ranked by ΔJ importance and group contribution. Abbreviations: CE, cholesteryl ester; DG, diacylglycerol; DHEA, dehydroepiandrosterone; DeoxyGU, deoxyguanosine; LysoPC, lysophosphatidylcholine; PC, phosphatidylcholine; PE, phosphatidylethanolamine; PG, phosphatidylglycerol; PS, phosphatidylserine; SM, sphingomyelin; TG, triacylglycerol; UNK, unknown.

To assess model specificity and sensitivity for low-risk patients, we compared selected moderate and mild symptom cases with noninfected controls (M3) using ADA boosting, which resulted in 92.9% specificity and 91.3% sensitivity on the test partition, and 87.1% specificity and 94.7% sensitivity on blinded data (Table 4). Supporting validation metrics obtained during model development with different algorithms are displayed in SI Tables S1 and S2.

Panel of Discriminant Metabolites for COVID-19 Patients Using Untargeted Metabolomics

Twenty-six ions were selected by the ML and used for COVID-19 diagnosis (M1) (metrics in Table 4) and further validated through mass spectrometric data. From those, 19 discriminant biomarkers for COVID-19 infection were proposed, divided into 8 with a positive and 11 with a negative contribution to the condition. Out of the 19 molecules, seven belong to the glycerophospholipid class, three are sterol lipids, three glycerolipids, two fatty acids, one sphingosine, one purine metabolite, and two unknown peptides. The remaining seven molecules have not yet been identified, a common element of nontargeted metabolomics.[13] A decrease in lysophosphatidylcholines (LysoPC), cholesterol species and unsaturated fatty acids, together with increased intensities of triacylglycerols (TG), diacylglycerols (DG) and a purine, were the main findings that discriminated SARS-CoV-2 infected patients from noninfected individuals. Biomarker data are displayed in Figure 3 and detailed in SI Table S3. For risk assessment (M2), 26 ions achieved the metrics displayed in Table 4. Among them, seven markers contributed to the COVID-19 severe condition and 19 contributed to the mild group. The main findings shown in Figure 3 pointed to a relative reduction of certain species of LysoPC, phosphatidylcholine (PC), phosphatidylcholine-derived plasmalogens, cholesterol, TG, sphingomyelins (SM), and N-acylethanolamines in severe cases in comparison to patients with mild and moderate symptoms (SI Table S4). Severe cases were represented by deoxyguanosine/adenosine, N-stearoyl valine and sterol lipid derivatives. The metrics for low-risk discrimination (M3) (Table 4) were achieved with 24 ions. For the mild group, these were divided into four glycerophospholipid and two glycerolipid markers, one peptide and nine unknown metabolites, whereas COVID-19 negative patients showed enhanced eicosatrienoic acid, three sterol lipid metabolites, one peptide, and three unknowns (Figure 3, SI Table S5).

Elected Biomarkers and COVID-19 Pathophysiology

The use of AI-explained algorithms allowed the creation of reliable models that facilitate decision-making in clinics and the investigation of the pathophysiological meaning of distinct biomarker levels. Viral recognition is essential for the initial host immune response, and the rapid course and cytokine storm associated with SARS-CoV infection may be related to the potential role of guanosine- and uridine-rich (GU) single-strand RNA as a PAMP (pathogen-associated molecular pattern).[1] Deoxyguanosine, a metabolite from purine metabolism, triggers enhanced signaling of TLR7 in the presence of ssRNA, inducing cytokine secretion in macrophages.[22] Therefore, further investigations are required to understand the potential role of deoxyguanosine in SARS-CoV-2 immune hyperactivation. On the other hand, N-linoleoyl-glycine and N-acylethanolamines (C20:1 and C22:0), found in this study to be associated with mild cases, regulate the immune response by promoting anti-inflammatory effects.[23,24] The main lipidic findings pointed to a remodeling of glycerophospholipid metabolism. We identified an enhanced presence of phosphatidylglycerol (PG) [PG(20:5)], phosphatidylethanolamine (PE) [PE(38:4)] and phosphatidylcholine (PC) [PC(38:8)], and a diminishment of LysoPCs [LysoPC(16:0), and correlated m/z LysoPC(18:0), LysoPC(18:1), and LysoPC(18:2)] and plasmalogen species[25] (PS(O-36:2), PC(O-36:3), PC(O-34:2), PC(O-36:3)) in COVID-19 positive patients; the same PG, PC, and PE markers discriminated low-risk patients from noninfected individuals, as illustrated in Figure by glycerophospholipid pathway recurrence. LysoPC(18:2) was also found as a negative contributor in plasma samples from patients with higher risk of death, as were the PC markers (PC(34:2), PC(36:3), PC(38:5)) and correlated PC molecules (PC(36:2), PC(36:4), PC(38:3), PC(38:4), PC(38:6)).
Cell responses to various stimuli may be mediated by lysophospholipids, which actively participate in inflammation processes.[26,27] The decrease in the relative intensities of LysoPCs and PCs in severely ill patients is in accordance with recent studies of metabolic changes in ARDS and sepsis,[26−28] characteristics of COVID-19 severity.[2,6] LysoPC is formed through the cleavage of PC by phospholipase A2 (PLA2), whose modulation has a crucial role in inflammation processes. PLA2 up-regulation promotes the formation of fatty acids, precursors of eicosanoids, and of LysoPCs.[29] Data show that SARS-CoV nucleocapsid protein stimulates the expression of cyclooxygenase-2,[30] an essential enzyme in the catalysis of prostanoid production from fatty acids, molecules that have been found downregulated in ARDS.[31]

Figure 4

Proposed role of identified biomarkers in COVID-19 pathophysiology. Abbreviations: ARDS, acute respiratory distress syndrome; COX-2, cyclooxygenase-2; deoxyGU, deoxyguanosine; LPCAT1, lysophosphatidylcholine acyltransferase 1; LysoPC, lysophosphatidylcholine; PC, phosphatidylcholine; PLA2, phospholipase A2.

The availability of LysoPCs is also finely regulated by the acyltransferase activity of LPCAT1 (lysophosphatidylcholine acyltransferase 1), which may promote the restoration of PCs via the Lands cycle. The most abundant lipid species found in alveolar surfactant, formed by LPCAT1 activity over LysoPC, is dipalmitoylphosphatidylcholine (DPPC, PC(16:0/16:0)). This molecule corresponds to 70–80% of surfactant lipid composition, and the dysregulation of the surfactant film is directly related to lung injury and ARDS.[29] Since DPPC formation depends on the availability of lipid substrates and on a functioning Lands cycle, interferences in this process may disturb LysoPC and PC availability. Ferrarini et al. (2017) described a decrease in LysoPC species in the serum of patients with ARDS derived from influenza infection and sepsis, reinforcing our findings.[26] Moreover, COVID-19 pathophysiology seems to impair cholesterol homeostasis.[7,8] We found cholesterol and cholesteryl ester (CE (16:0)) diminished in COVID-19 positive patients, and cholesterol decreased in patients with mild/moderate symptoms, which was similarly reported by Song et al. (2020). They demonstrated the correlation between CE abundance and bis(monoacylglycero)phosphate, BMP(38:5), a lipid that influences cellular exportation of cholesterol from endosomes. During the progression of recovery, increased BMP in alveolar macrophages was found together with enhanced CEs.[7] Lowering of cholesterol and LDL-C (low-density lipoprotein cholesterol) was also observed in clinical practice in association with poor COVID-19 prognosis,[8] corroborating our findings.
Herein, based on the proposed m/z markers, we discriminated COVID-19 patients using diagnostic, risk assessment, and low-risk discrimination classifiers generated from an MS-ML combination. Although the proposed biomarkers correlate COVID-19 pathophysiology with the mathematical process, a mechanistic biomarker evaluation is needed to better understand their contribution to COVID-19 and to identify the unknowns.

Conclusions

The use of machine learning as a means for the discrimination of diseases from mass spectrometric data aims to develop diagnostic and prognostic tools, treatment targets, and patient management systems.[11,12] In articles published to date, mass spectrometry-machine learning approaches either employed a direct MALDI-MS method for untargeted analysis of SARS-CoV-2 specimens, with diagnosis based on the spectral profile,[11] or focused on the evaluation of biomarkers and their significance to severity levels of COVID-19 patients,[9] keeping the traditional chromatography–mass spectrometry approach. Our methodology introduced the pairwise m/z analysis, an essential advance in the application of untargeted metabolomics to provide diagnosis directly from raw data. By combining different m/z, this approach supports spectra acquired by different mass spectrometers, including the robust use of flow-injection mass spectrometry (FI-MS), in an effort to overcome the ion competition effect.[32] Moreover, the proposed MS-ML platform for COVID-19 presented reliable qualitative results, with specificity of 96.0% and sensitivity of 83.1% (in a blind test), similar to or even better than the performance of available serological and RT-PCR methods.[4,5] Our analysis also brings molecular information about the disease pathophysiology that may aid in the search for prognostic markers and treatment targets for COVID-19. Overall, it combines, in one solution, an alternative for population-level COVID-19 screening and guidance for public health efforts through risk classification. Future work includes exploring a multiclass model (preliminary results in SI Table S6) for COVID-19 diagnosis and risk assessment. The same approach may be applied to other diseases involved in patient management during the pandemic and contribute to the COVID-19 MS Coalition’s collective effort[14] by consolidating the combination of mass spectrometry and artificial intelligence in a real-world setting.

27 in total

1. Detection of SARS-CoV-2 in nasal swabs using MALDI-MS.

Authors: Fabiane M Nachtigall; Alfredo Pereira; Oleksandra S Trofymchuk; Leonardo S Santos
Journal: Nat Biotechnol Date: 2020-07-30 Impact factor: 54.908

2. Potential anti-inflammatory actions of the elmiric (lipoamino) acids.

Authors: Sumner H Burstein; Jeffrey K Adams; Heather B Bradshaw; Cristian Fraioli; Ronald G Rossetti; Rebecca A Salmonsen; John W Shaw; J Michael Walker; Robert E Zipkin; Robert B Zurier
Journal: Bioorg Med Chem Date: 2007-03-13 Impact factor: 3.641

3. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China.

Authors: Chaolin Huang; Yeming Wang; Xingwang Li; Lili Ren; Jianping Zhao; Yi Hu; Li Zhang; Guohui Fan; Jiuyang Xu; Xiaoying Gu; Zhenshun Cheng; Ting Yu; Jiaan Xia; Yuan Wei; Wenjuan Wu; Xuelei Xie; Wen Yin; Hui Li; Min Liu; Yan Xiao; Hong Gao; Li Guo; Jungang Xie; Guangfa Wang; Rongmeng Jiang; Zhancheng Gao; Qi Jin; Jianwei Wang; Bin Cao
Journal: Lancet Date: 2020-01-24 Impact factor: 79.321

4. SARS-CoV-2 and viral sepsis: observations and hypotheses.

Authors: Hui Li; Liang Liu; Dingyu Zhang; Jiuyang Xu; Huaping Dai; Nan Tang; Xiao Su; Bin Cao
Journal: Lancet Date: 2020-04-17 Impact factor: 79.321

5. The COVID-19 MS Coalition-accelerating diagnostics, prognostics, and treatment.

Authors: Weston Struwe; Edward Emmott; Melanie Bailey; Michal Sharon; Andrea Sinz; Fernando J Corrales; Kostas Thalassinos; Julian Braybrook; Clare Mills; Perdita Barran
Journal: Lancet Date: 2020-05-27 Impact factor: 79.321

6. Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR.

Authors: Victor M Corman; Olfert Landt; Marco Kaiser; Richard Molenkamp; Adam Meijer; Daniel Kw Chu; Tobias Bleicker; Sebastian Brünink; Julia Schneider; Marie Luisa Schmidt; Daphne Gjc Mulders; Bart L Haagmans; Bas van der Veer; Sharon van den Brink; Lisa Wijsman; Gabriel Goderski; Jean-Louis Romette; Joanna Ellis; Maria Zambon; Malik Peiris; Herman Goossens; Chantal Reusken; Marion Pg Koopmans; Christian Drosten
Journal: Euro Surveill Date: 2020-01

Review 7. Machine Learning Applications for Mass Spectrometry-Based Metabolomics.

Authors: Ulf W Liebal; An N T Phan; Malvika Sudhakar; Karthik Raman; Lars M Blank
Journal: Metabolites Date: 2020-06-13

8. Stability issues of RT-PCR testing of SARS-CoV-2 for hospitalized patients clinically diagnosed with COVID-19.

Authors: Yafang Li; Lin Yao; Jiawei Li; Lei Chen; Yiyan Song; Zhifang Cai; Chunhua Yang
Journal: J Med Virol Date: 2020-04-05 Impact factor: 2.327

9. Rapid point-of-care testing for SARS-CoV-2 in a community screening setting shows low sensitivity.

Authors: M Döhla; C Boesecke; B Schulte; C Diegmann; E Sib; E Richter; M Eschbach-Bludau; S Aldabbagh; B Marx; A-M Eis-Hübinger; R M Schmithausen; H Streeck
Journal: Public Health Date: 2020-04-18 Impact factor: 2.427

10. Testing for SARS-CoV-2 (COVID-19): a systematic review and clinical guide to molecular and serological in-vitro diagnostic assays.

Authors: Antonio La Marca; Martina Capuzzo; Tiziana Paglia; Laura Roli; Tommaso Trenti; Scott M Nelson
Journal: Reprod Biomed Online Date: 2020-06-14 Impact factor: 3.828
