Antonio Paolo Beltrami1,2, Maria De Martino1, Emiliano Dalla1, Matilde Clarissa Malfatti1, Federica Caponnetto1, Marta Codrich1,2, Daniele Stefanizzi1, Martina Fabris2, Emanuela Sozio2, Federica D'Aurizio2, Carlo E M Pucillo1, Leonardo A Sechi1, Carlo Tascini1,2, Francesco Curcio1,2, Gian Luca Foresti3, Claudio Piciarelli3, Axel De Nardin3, Gianluca Tell1, Miriam Isola1.
Abstract
The persistence of long-term coronavirus disease 2019 (COVID-19) sequelae demands better insight into the natural history of the disease. Identifying biomarkers of disease outcome is therefore crucial to improving clinical practice. In this study, 160 COVID-19 patients were enrolled, 80 with a "non-severe" and 80 with a "severe" outcome. Sera were analyzed by proximity extension assay (PEA) to measure 274 unique proteins associated with inflammation and with cardiometabolic and neurologic diseases. The main clinical and hematochemical data associated with disease outcome were combined with the serological data into a single dataset for supervised machine learning. We identified nine proteins (CD200R1, MCP1, MCP3, IL6, LTBP2, MATN3, TRANCE, α2-MRAP, and KIT) that, combined with relative neutrophil and lymphocyte counts, contributed to the correct classification of COVID-19 severity. By analyzing PEA, clinical, and hematochemical data with statistical methods able to handle many variables despite a relatively small sample size, we identified nine potential serum biomarkers of a "severe" outcome. Most of these are supported by previously published data. Importantly, we found three biomarkers associated with central nervous system pathologies and protective factors that were downregulated in the most severe cases.
Keywords: COVID-19; cardiometabolic; inflammation; neurologic disease; proximity extension assay
Year: 2022 PMID: 36012423 PMCID: PMC9409308 DOI: 10.3390/ijms23169161
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
Summary of baseline characteristics of enrolled patients. Baseline demographic and clinical features (comorbidities) and vaccination status of the enrolled patients (n = 160). Patients were stratified by disease severity into a “non-severe” and a “severe” group. Data are presented as counts with percentages, n (%), or as median (interquartile range, IQR). The right column reports the p-value of the “non-severe” vs. “severe” comparison. Significant results are shown in bold.
| Total (n = 160) | Non-Severe (n = 80) | Severe (n = 80) | p-Value |
|---|---|---|---|
| | | | 0.251 |
| 59 (36.9) | 33 (41.2) | 26 (32.5) | |
| 101 (63.1) | 47 (58.8) | 54 (67.5) | |
| 67 (56–76) | 61 (50–74) | 70 (63–77) | |
| 28.2 (24.9–31.2) | 27.8 (24.9–31.1) | 28.5 (25.5–31.5) | 0.541 |
| | | | 0.411 |
| 79/107 (73.8) | 41/53 (77.4) | 38/54 (70.4) | |
| 28/107 (26.2) | 12/53 (22.6) | 16/54 (29.6) | |
| 3 (1–5) | 2 (1–5) | 4 (2–5.5) | |
| 79/155 (51.0) | 39/76 (51.3) | 40/79 (50.6) | 0.932 |
| 64/134 (47.8) | 30/66 (45.4) | 34/68 (50.0) | 0.598 |
| 31/155 (20.0) | 13/76 (17.1) | 18/79 (22.8) | 0.377 |
| 16/156 (10.3) | 5/77 (6.5) | 11/79 (13.9) | 0.126 |
| 57/156 (36.5) | 27/77 (35.1) | 30/79 (38.0) | 0.706 |
| 5/155 (3.2) | 1/76 (1.3) | 4/79 (5.1) | 0.187 |
| 13/156 (8.3) | 6/77 (7.8) | 7/79 (8.9) | 0.809 |
| 14/147 (9.5) | 6/67 (9.0) | 8/80 (10.0) | 0.830 |
| 24 (15.0) | 20 (25.0) | 4 (5.0) | |
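The p-values reported for the categorical rows are consistent with a Pearson chi-square test without continuity correction: the three rows checked below reproduce p = 0.251, 0.411, and 0.126. The test actually used is not named in this record, so treat this as a consistency check under that assumption, not as the authors' code. The helper name `pearson_chi2_2x2` is ours, and the counts come from the table above, whose row labels were lost in extraction.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square test without continuity correction for the 2x2
    table [[a, b], [c, d]] (rows: non-severe, severe; columns: the two
    categories of one characteristic). Returns (chi2, p)."""
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts transcribed from the table: (non-severe cat. 1, non-severe cat. 2,
# severe cat. 1, severe cat. 2). Labels describe rows, since names were lost.
rows = {
    "59 vs 101 patients (reported p = 0.251)": (33, 47, 26, 54),
    "79 vs 28 of 107 patients (reported p = 0.411)": (41, 12, 38, 16),
    "16 of 156 patients (reported p = 0.126)": (5, 72, 11, 68),
}
for label, counts in rows.items():
    chi2, p = pearson_chi2_2x2(*counts)
    print(f"{label}: chi2 = {chi2:.3f}, p = {p:.3f}")
```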
Summary of clinical characteristics of enrolled patients. Enrolled patients stratified by disease severity, with the need for invasive and non-invasive ventilation, admission to the intensive care unit, duration of hospitalization, death rate, and duration of infection. Data are presented as counts with percentages, n (%), or as median (interquartile range). The right column reports the p-value of the non-severe vs. severe comparison. Significant results are shown in bold.
| Total (n = 160) | Non-Severe (n = 80) | Severe (n = 80) | p-Value |
|---|---|---|---|
| 16 (10.0) | 16 (20.0) | 0 (0.0) | |
| 55 (34.4) | 55 (68.7) | 0 (0.0) | |
| 14 (8.7) | 9 (11.3) | 5 (6.3) | |
| 71 (44.4) | 0 (0.0) | 71 (88.7) | |
| 4 (2.5) | 0 (0.0) | 4 (5.0) | |
| 69 (43.1) | 1 (1.2) | 68 (85.0) | |
| 45 (28.1) | 0 (0.0) | 45 (56.2) | |
| 9 (5–15.5) | 5.5 (4–9) | 14 (9–19) | |
| 51 (31.9) | 0 (0.0) | 51 (63.7) | |
| 56 (35.0) | 0 (0.0) | 56 (70.0) | |
| 19 (14–24) | 17 (12–20) | 21.5 (17.5–29.5) | |
Figure 1. Elastic net logistic regression with hematochemical variables. Receiver operating characteristic (ROC) curve; the area under the curve (AUC) is 0.868 (95% confidence interval (CI) 0.785–0.952).
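The AUC values reported in Figures 1 to 5 can be read as the probability that a randomly chosen severe patient receives a higher predicted score than a randomly chosen non-severe patient (the Mann–Whitney formulation). The pure-Python sketch below computes that quantity and a 95% CI, assuming a percentile bootstrap; the record does not say how the authors computed their intervals (DeLong's method is a common alternative). The scores and labels are made-up illustrative values, not study data.

```python
import random


def roc_auc(scores, labels):
    """AUC as P(score of a random severe case > score of a random non-severe
    case), counting ties as 1/2. Equals the Mann-Whitney U statistic
    divided by n_severe * n_non_severe."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed CI method)."""
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample must contain both classes
            aucs.append(roc_auc([scores[i] for i in idx], ys))
    aucs.sort()
    lo = aucs[int((alpha / 2) * n_boot)]
    hi = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical predicted probabilities of "severe" (1) vs "non-severe" (0).
scores = [0.1, 0.3, 0.35, 0.8, 0.4, 0.7, 0.9, 0.6]
labels = [0, 0, 1, 1, 0, 1, 1, 0]
print(roc_auc(scores, labels))  # 14 of 16 severe/non-severe pairs ranked correctly
print(bootstrap_auc_ci(scores, labels, n_boot=500))
```

With only eight hypothetical patients the bootstrap interval is very wide; the narrower intervals in the figures reflect the 160-patient cohort.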
Figure 2. Cardiometabolic biomarkers associated with COVID-19 severity. (a) Elastic net logistic regression with cardiometabolic variables. The left panel shows the coefficient of each variable as a horizontal bar. The right panel shows the receiver operating characteristic (ROC) curve of the model; the area under the curve (AUC) is 0.931 (95% confidence interval (CI) 0.873–0.989). (b) Violin plots showing the distribution of normalized protein expression (NPX) values for each variable in the model in “non-severe” (blue) vs. “severe” (red) patients; the p-value for each comparison is shown at the bottom of each plot.
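The coefficients in panel (a) come from an elastic net-penalized logistic regression, and their sign corresponds to the direct or inverse association listed in the summary table that follows. The NumPy sketch below fits such a model by proximal gradient descent, using a glmnet-style penalty `lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)`. The solver, the values of `lam` and `alpha`, and the simulated data are assumptions for illustration, not the authors' settings; in practice the penalty is typically tuned by cross-validation, and tested implementations exist in glmnet and scikit-learn.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_elastic_net_logistic(X, y, lam=0.05, alpha=0.5, lr=0.1, n_iter=5000):
    """Minimize mean log-loss + lam * (alpha * ||w||_1 + (1 - alpha)/2 * ||w||_2^2)
    with proximal gradient descent (ISTA). The intercept b is unpenalized.
    X should be standardized so the penalty treats all features equally."""
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        resid = sigmoid(X @ w + b) - y  # d(log-loss)/d(linear predictor)
        grad_w = X.T @ resid / n + lam * (1 - alpha) * w  # smooth part incl. L2
        grad_b = resid.mean()
        w = w - lr * grad_w
        b = b - lr * grad_b
        # Proximal step for the L1 part: soft-thresholding sets small weights to 0.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam * alpha, 0.0)
    return w, b


# Hypothetical data: 160 "patients", 3 informative and 5 noise features.
rng = np.random.default_rng(0)
n, p = 160, 8
X = rng.standard_normal((n, p))
true_w = np.array([1.5, -1.0, 0.8, 0, 0, 0, 0, 0])
y = (rng.random(n) < sigmoid(X @ true_w)).astype(float)
X = (X - X.mean(axis=0)) / X.std(axis=0)

w, b = fit_elastic_net_logistic(X, y)
for j, coef in enumerate(w):
    direction = "direct" if coef > 0 else "inverse" if coef < 0 else "dropped"
    print(f"feature {j}: {coef:+.3f} ({direction})")
```

The L1 term lets the model drop uninformative variables entirely, while the L2 term keeps correlated markers from being selected arbitrarily, which is why the elastic net suits many protein variables measured in a relatively small cohort.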
Summary of elastic net logistic regression analyses of biomarkers associated with disease severity.
| Short Name | Biomarker Category | Association with Outcome | Literature Data |
|---|---|---|---|
| TIMP-1 | Cardio | Direct | […] |
| LTBP2 | Cardio | Direct | […] |
| IGFBP3 | Cardio | Inverse | […] |
| FETUB | Cardio | Inverse | […] |
| TNC | Cardio | Direct | […] |
| KIT | Cardio | Inverse | |
| CR2 | Cardio | Inverse | |
| C2 | Cardio | Direct | […] |
| APOM | Cardio | Inverse | […] |
| IGLC2 | Cardio | Direct | […] |
| NID1 | Cardio | Direct | […] |
| ENG | Cardio | Inverse | […] |
| GAS6 | Cardio | Direct | […] |
| CA3 | Cardio | Direct | […] |
| MCP3 | Inflammatory | Direct | […] |
| IL6 | Inflammatory | Direct | […] |
| IL12B | Inflammatory | Inverse | […] |
| OPG | Inflammatory | Direct | […] |
| CX3CL1 | Inflammatory | Direct | […] |
| TNFSF11 or TRANCE, RANKL, OPGL | Inflammatory | Inverse | […] |
| CD6 | Inflammatory | Inverse | […] |
| LIF | Inflammatory | Direct | […] |
| TNFRSF12A | Neurology | Direct | […] |
| MATN3 | Neurology | Direct | […] |
| CD200R1 | Neurology | Inverse | […] |
| TNR | Neurology | Inverse | |
| EZR | Neurology | Direct | […] |
| IL12 | Neurology | Inverse | […] |
| NTRK3 | Neurology | Inverse | […] |
| NrCAM | Neurology | Direct | […] |
| UNC5C | Neurology | Direct | […] |
| CTSS | Neurology | Direct | […] |
| DRAXIN | Neurology | Direct | […] |
| CNTN5 | Neurology | Inverse | […] |
| GZMA | Neurology | Inverse * | […] |
| CDH3 | Neurology | Direct | […] |
| NTRK2 | Neurology | Inverse | […] |
Cardiometabolic, inflammatory, and neurology disease biomarkers associated with disease severity according to the elastic net logistic regression analyses. For each biomarker, the columns give its full name, its short name, whether a direct or inverse relationship between biomarker level and disease severity was identified, and literature data supporting our findings. Biomarkers highlighted in grey have not yet been associated with severe COVID-19 by other authors. * An opposite, yet significant, association has been described in the literature. † Literature results do not reach significance.
Figure 3. Inflammatory biomarkers associated with COVID-19 severity. (a) Elastic net logistic regression with inflammatory variables. The left panel shows the coefficient of each variable as a horizontal bar. The right panel shows the ROC curve, with an AUC of 0.906 (95% CI 0.832–0.980). (b) Violin plots showing the distribution of normalized protein expression (NPX) values for each variable in the model in “non-severe” (blue) vs. “severe” (red) patients; the p-value for each comparison is shown at the bottom of each plot.
Figure 4. Neurology biomarkers associated with COVID-19 severity. (a) Elastic net logistic regression with neurology-related variables. The left panel shows the coefficient of each variable as a horizontal bar. The right panel shows the ROC curve, with an AUC of 0.918 (95% CI 0.853–0.984). (b) Violin plots showing the distribution of normalized protein expression (NPX) values for each variable in the model in “non-severe” (blue) vs. “severe” (red) patients; the p-value for each comparison is shown at the bottom of each plot.
Figure 5. Correlation discovery analysis. (a) Elastic net logistic regression with hematochemical, cardiometabolic, inflammatory, and neurology-related variables. The area under the ROC curve (AUC) is 0.910 (95% confidence interval (CI) 0.837–0.984). (b) Results of the mutual information (MI) analysis: the dataset parameters on the x-axis are ranked by their MI with the target variable. (c) Ranking of the dataset parameters by the Gini index values obtained from the random forest model. (d) Feature ranking obtained with the recursive feature elimination (RFE) algorithm. (e) Shapley additive explanations (SHAP) summary plot. Each row shows, as one dot per instance of the dataset, the value of the corresponding feature. The dot color indicates how large the feature value is in that instance (blue: small value; red: large value). The position of the dot relative to the central vertical line indicates whether the feature pushed the model toward classifying the patient as a severe case (right of the line) or not (left of the line). For example, high MCP3 values are associated with a severe condition, as the red dots lie mostly on the right side of the plot, whereas low MCP3 values are often associated with a non-severe condition. In (b–e), the parameters are color-coded by category (red: hematochemical; yellow: inflammatory biomarkers; orange: cardiometabolic biomarkers; blue: neurological biomarkers; green: clinical data). (f) Venn diagram of the variables shared among the five models. The eight variables shared by all models are neutrophil count, MCP3, IL6, TRANCE, MCP1, CD200R1, MATN3, and LTBP2; KIT and α2-MRAP are shared by four models.
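Panel (e) plots SHAP values: a positive value pushes the prediction toward "severe", a negative one away from it. The sketch below computes exact Shapley values by enumerating every coalition of features, assuming a value function that replaces absent features with a baseline (here, the feature mean after standardization). It is a brute-force illustration, not the estimator of the `shap` library, and the weights and patient values are hypothetical. For a linear score it reduces to `w_i * (x_i - baseline_i)`, which is why a feature with a positive weight places high (red) values on the right of the plot, as described for MCP3.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x). Features outside a coalition are
    set to their baseline value (an assumed, simple value function)."""
    p = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(p)]
        return model(z)

    phi = [0.0] * p
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for k in range(len(others) + 1):
            # Shapley weight |S|! (p - |S| - 1)! / p! for coalitions of size k.
            weight = factorial(k) * factorial(p - k - 1) / factorial(p)
            for subset in combinations(others, k):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


# Hypothetical linear log-odds of "severe": one direct, one inverse,
# and one weak direct marker. Weights are made up for illustration.
WEIGHTS = [0.9, -0.6, 0.3]
INTERCEPT = -0.2


def linear_score(z):
    return INTERCEPT + sum(w * v for w, v in zip(WEIGHTS, z))


baseline = [0.0, 0.0, 0.0]  # feature means after standardization
x = [1.5, 1.0, -0.5]        # one hypothetical patient's standardized values
phi = shapley_values(linear_score, x, baseline)
print(phi)  # approximately w_i * (x_i - baseline_i) for each feature
```

The values also satisfy the efficiency property: they sum to the difference between the patient's score and the baseline score.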
Summary of biomarkers emerging from the correlation discovery analyses.
| Short Name | Model | Literature Data |
|---|---|---|
| CX3CL1 | ENLR | […] |
| CD200R1 | ENLR, MI, GINI, SHAP, RFE | […] |
| C2 | ENLR | […] |
| CDCP1 | SHAP | […] |
| DRAXIN | MI | […] |
| EZR | MI, GINI, RFE | […] |
| FETUB | ENLR | […] |
| GDNFR-α3 | GINI | […] |
| GDF8 | MI | […] |
| HGF | MI, GINI | […] |
| IGFBP3 | ENLR | […] |
| ICAM1 | MI | […] |
| IL12 | ENLR | […] |
| IL12B | ENLR | […] |
| IL6 | ENLR, MI, GINI, SHAP, RFE | […] |
| IL8 | GINI, SHAP | […] |
| KIT | MI, GINI, SHAP, RFE | |
| KYNU | MI | […] |
| LTBP2 | ENLR, MI, GINI, SHAP, RFE | […] |
| LIF | ENLR, GINI | […] |
| MATN3 | ENLR, MI, GINI, SHAP, RFE | […] |
| MCP1 | ENLR, MI, GINI, SHAP, RFE | […] |
| MCP3 | ENLR, MI, GINI, SHAP, RFE | […] |
| UNC5C | ENLR | […] |
| NrCAM | ENLR | […] |
| NTRK2 | SHAP | […] |
| NTRK3 | ENLR | […] |
| OPG | ENLR | […] |
| SCF | GINI | […] |
| TNR | ENLR | |
| TIMP-1 | ENLR, MI, GINI | […] |
| TNFSF11 or TRANCE, RANKL, OPGL | ENLR, MI, GINI, SHAP, RFE | […] |
| TNFRSF12A | ENLR, RFE | […] |
| α2-MRAP | MI, GINI, SHAP, RFE | […] |
Correlation discovery analyses results. Biomarkers associated with disease severity in the analyses of the complete dataset, which includes demographic, clinical, anamnestic, hematochemical, and immunometric parameters. The columns show in which of the five models each biomarker was associated with disease severity and which literature data confirm our results. Biomarkers highlighted in grey have not yet been associated with severe COVID-19 by other authors. ENLR: elastic net logistic regression; MI: mutual information analysis; GINI: Gini index analysis; SHAP: Shapley additive explanations analysis; RFE: recursive feature elimination analysis. † Literature results do not reach significance.
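The Venn diagram in Figure 5(f) counts how many of the five models selected each variable. The sketch below transcribes the model column of the table above (our transcription, with TRANCE standing for TNFSF11/RANKL/OPGL) and recovers the proteins selected by all five models and by exactly four. Neutrophil count, the eighth variable shared by all models, is hematochemical and so does not appear in this protein table. The names `TABLE` and `ALL_MODELS` are illustrative.

```python
ALL_MODELS = ("ENLR", "MI", "GINI", "SHAP", "RFE")

# Short name | models that selected it (transcribed from the table).
TABLE = """\
CX3CL1 | ENLR
CD200R1 | ENLR, MI, GINI, SHAP, RFE
C2 | ENLR
CDCP1 | SHAP
DRAXIN | MI
EZR | MI, GINI, RFE
FETUB | ENLR
GDNFR-α3 | GINI
GDF8 | MI
HGF | MI, GINI
IGFBP3 | ENLR
ICAM1 | MI
IL12 | ENLR
IL12B | ENLR
IL6 | ENLR, MI, GINI, SHAP, RFE
IL8 | GINI, SHAP
KIT | MI, GINI, SHAP, RFE
KYNU | MI
LTBP2 | ENLR, MI, GINI, SHAP, RFE
LIF | ENLR, GINI
MATN3 | ENLR, MI, GINI, SHAP, RFE
MCP1 | ENLR, MI, GINI, SHAP, RFE
MCP3 | ENLR, MI, GINI, SHAP, RFE
UNC5C | ENLR
NrCAM | ENLR
NTRK2 | SHAP
NTRK3 | ENLR
OPG | ENLR
SCF | GINI
TNR | ENLR
TIMP-1 | ENLR, MI, GINI
TRANCE | ENLR, MI, GINI, SHAP, RFE
TNFRSF12A | ENLR, RFE
α2-MRAP | MI, GINI, SHAP, RFE
"""

# biomarker -> set of models that selected it
models_by_marker = {}
for line in TABLE.splitlines():
    name, cols = (part.strip() for part in line.split("|"))
    models_by_marker[name] = {m.strip() for m in cols.split(",")}

# model -> set of biomarkers it selected (one circle of the Venn diagram)
markers_by_model = {
    m: {k for k, v in models_by_marker.items() if m in v} for m in ALL_MODELS
}

shared_by_all = set.intersection(*markers_by_model.values())
shared_by_four = {k for k, v in models_by_marker.items() if len(v) == 4}
print(sorted(shared_by_all))   # the seven proteins selected by every model
print(sorted(shared_by_four))  # the two proteins selected by four models
```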
Figure 6. Functional enrichment analysis of proteins associated with a “severe” outcome. Pie charts of the most enriched functional terms (adjusted p ≤ 0.05) for proteins whose coefficients were positively (a) or negatively (b) associated with the “severe” class. The Cytoscape [65] app ClueGO [66] was used to query the following functional databases: GO (BiologicalProcess and ImmuneSystemProcess), KEGG, REACTOME, CLINVAR, WikiPathways, and CORUM-3.0.
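ClueGO scores functional terms with a hypergeometric test and offers multiple-testing corrections such as Bonferroni step-down (Holm). The settings used for Figure 6 are not stated in this record, so the sketch below assumes a right-sided (enrichment) hypergeometric test followed by Holm adjustment. The function names, the term counts, and the choice of the 274 assayed proteins as background are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Right-tailed hypergeometric p-value: probability of seeing at least k
    annotated proteins in a list of n, when K of the N background proteins
    carry the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def holm_adjust(pvals):
    """Holm (Bonferroni step-down) adjusted p-values, in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the rank-th smallest p by (m - rank), keeping monotonicity.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Hypothetical: background N = 274 proteins, a list of n = 15 proteins,
# and three made-up terms given as (k annotated in list, K annotated overall).
N, n = 274, 15
terms = {"term A": (6, 20), "term B": (3, 40), "term C": (2, 5)}
raw = {t: hypergeom_enrichment_p(k, n, K, N) for t, (k, K) in terms.items()}
adjusted = dict(zip(raw, holm_adjust(list(raw.values()))))
for term in terms:
    print(f"{term}: p = {raw[term]:.2e}, adjusted p = {adjusted[term]:.2e}")
```

A term would then be kept for the pie charts if its adjusted p-value is at most 0.05, matching the threshold stated in the caption.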