Ying Zhang1, Xue Cai2,3,4, Weigang Ge2,3,4,5, Donglian Wang1, Guangjun Zhu1, Liujia Qian2,3,4, Nan Xiang2,3,4,5, Liang Yue2,3,4, Shuang Liang2,3,4, Fangfei Zhang2,3,4, Jing Wang1, Kai Zhou1, Yufen Zheng1, Minjie Lin1, Tong Sun1, Ruyue Lu1, Chao Zhang1, Luang Xu2,3,4, Yaoting Sun2,3,4, Xiaoxu Zhou2,4, Jing Yu2,3,4, Mengge Lyu2,3,4, Bo Shen1, Hongguo Zhu1, Jiaqin Xu1, Yi Zhu2,3,4, Tiannan Guo2,3,4.
Abstract
RT-PCR is the primary method to diagnose COVID-19 and is also used to monitor the disease course. This approach, however, suffers from false negatives due to RNA instability and poses a high risk to medical practitioners. Here, we investigated the potential of using serum proteomics to predict viral nucleic acid positivity during COVID-19. We analyzed the proteome of 275 inactivated serum samples from 54 out of 144 COVID-19 patients and shortlisted 42 regulated proteins in the severe group and 12 in the non-severe group. Using these regulated proteins and several key clinical indexes, including days after symptom onset, platelet counts, and magnesium, we developed two machine learning models to predict nucleic acid positivity, achieving AUCs of 0.94 in severe cases and 0.89 in non-severe cases. Our data suggest the potential of using a serum protein-based machine learning model to monitor COVID-19 progression, thus complementing swab RT-PCR tests. More effort is required to translate this approach into clinical practice, since mass spectrometry-based protein measurement is not currently widely accessible in the clinic.
Keywords: COVID-19; disease course monitoring; machine learning; proteomics; serum
Year: 2021 PMID: 34783559 PMCID: PMC8610005 DOI: 10.1021/acs.jproteome.1c00525
Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466
Figure 1. Summary of the serum sample collection from COVID-19 patients. The days are numbered from the onset of symptoms. The gray squares represent the days before admission, the orange ones represent the days on which the nucleic acid test was positive, and the green ones represent the days on which the nucleic acid test was negative. A total of 275 serum samples were collected from 54 COVID-19 patients at different time points for MS analysis, as indicated with ∗.
Clinical Characteristics of COVID-19 Patients and Controls
| variables | COVID-19, total (n = 144) | COVID-19, non-severe (n = 108) | COVID-19, severe (n = 36) | non-COVID-19 (n = 24) | healthy control (n = 21) |
|---|---|---|---|---|---|
| Sex- No. (%) | |||||
| male | 77 (53) | 57 (53) | 20 (56) | 16 (67) | 14 (67) |
| female | 67 (47) | 51 (47) | 16 (44) | 8 (33) | 7 (33) |
| Age- Yr. | |||||
| mean ± SD | 47.7 ± 14.5 | 45.0 ± 14.2 | 55.7 ± 12.5 | 49.3 ± 14.3 | 45.2 ± 8.0 |
| median (IQR) | 47.0 (38.0–56.0) | 44.5 (37.0–54.0) | 55.0 (47.8–65.0) | 54.0 (36.5–61.0) | 46.0 (38.0–51.5) |
| range | 4.0–86.0 | 4.0–86.0 | 33.0–79.0 | 23.0–67.0 | 28.0–57.0 |
| Time from Onset to Admission, Days | |||||
| mean ± SD | 7.0 ± 4.1 | 6.7 ± 3.8 | 7.9 ± 4.9 | ||
| median (IQR) | 6.0 (4.0–10.0) | 6.0 (4.0–9.0) | 7.5 (4.0–11.0) | ||
| range | 1.0–24.0 | 1.0–18.0 | 1.0–24.0 | ||
| Time from Admission to Severe, Days | |||||
| mean ± SD | 2.6 ± 1.5 | ||||
| median (IQR) | 2.0 (1.0–3.8) | ||||
| range | 0.0–7.0 | ||||
| Time from Admission to Discharge, Days | |||||
| mean ± SD | 21.6 ± 9.4 | 20.5 ± 9.7 | 24.7 ± 7.8 | ||
| median (IQR) | 21.5 (13.0–28.0) | 20.0 (13.0–27.0) | 23.0 (19.3–31.8) | ||
| range | 6.0–44.0 | 6.0–44.0 | 9.0–40.0 | ||
| Symptoms- No. (%) | |||||
| with fever | 104 (72.2) | 70 (64.8) | 34 (94.4) | ||
| without fever | 40 (27.8) | 38 (35.2) | 2 (5.6) | ||
| Comorbidity- No. (%) | |||||
| with comorbidity | 59 (41.0) | 41 (37.9) | 18 (50.0) | ||
| without comorbidity | 85 (59.0) | 67 (62.1) | 18 (50.0) | ||
| Chest CT- No. (%) | |||||
| abnormal chest radiographs | 141 (97.9) | 105 (97.2) | 36 (100.0) | ||
| total sample- no. | 631 | 380 | 251 | 24 | 21 |
| sample with MS analysis- no. | 275 | 147 | 128 | 24 | 21 |
Figure 2. Study design. (A) Five stages of the COVID-19 course. Different colors represent different stages. (B) Workflow of the SWATH-MS and the data analysis. Study population: 36 severe and 108 non-severe COVID-19 patients, 24 non-COVID patients, and 21 healthy individuals. A total of 320 serum samples were analyzed by SWATH-MS. Dysregulated serum proteins were analyzed by ANOVA and Mfuzz in the five stages of the severe and non-severe COVID-19 cases, respectively. On the basis of the resulting dysregulated proteins, two machine learning models were built to identify NCP and NCN.
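The ANOVA step named in this caption can be illustrated with a minimal sketch. It assumes a one-way design in which each group holds one protein's abundance values at one disease stage; the record does not state the exact ANOVA variant or software used, and the function name `anova_f` and the toy values are hypothetical.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of measurements."""
    k = len(groups)                       # number of groups (e.g. stages)
    n = sum(len(g) for g in groups)       # total number of measurements
    grand = mean([x for g in groups for x in g])
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical abundances of one protein in three stages (not study data).
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
toy_f = anova_f(toy_groups)
```

A large F value indicates that between-stage differences dominate within-stage scatter. Turning F into a p-value additionally requires the F distribution with (k - 1, n - k) degrees of freedom, which this sketch leaves out.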
Figure 3. Dynamics of dysregulated serum proteins over the course of COVID-19. (A) Three clusters of proteins identified with Mfuzz analysis showed trends of continuous change across the five stages of the severe COVID-19 course. One cluster showed an upward trend for the non-severe COVID-19 course. (B) Heatmap of 48 dysregulated proteins in the five stages of the COVID-19 course. Proteins with green boxes were selected as input features for the machine learning models using ANOVA and Mfuzz.
Figure 4. Machine learning models for predicting the stage of the severe and non-severe courses. (A) Workflow of two machine learning models built with proteomics quantitative data and clinical indexes. (B) Prioritization of 8 important variables in the severe model and 23 important variables in the non-severe model. (C) ROC plots for the two machine learning models of the severe (left) and the non-severe patients (right). (D) Performance of the severe and the non-severe models in the two test cohorts of 30 independent samples. The labeled numbers represent the sample ID.
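The AUC values summarized by the ROC plots can be read as the probability that a randomly chosen positive sample receives a higher model score than a randomly chosen negative one. The sketch below computes AUC that way from labels and scores. It is only an illustration: the function name `roc_auc` and the toy data are hypothetical, and it does not reproduce the authors' models or pipeline.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = nucleic acid positive, 0 = negative.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

The pairwise loop is quadratic in the number of samples, which is acceptable for test cohorts of this size. A rank-based implementation would be preferable for large data sets.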
Figure 5. Eight significantly dysregulated proteins in COVID-19 compared to non-COVID-19 patients or healthy controls. Expression level changes of APOA4, APOB, APOC2, APOC3, APOH, SERPIND1, FN1, and ITIH1 in the five stages of the COVID-19 course as well as in non-COVID-19 patients and healthy controls. Asterisks indicate the statistical significance based on the unpaired two-sided Welch’s t test p-value: *, < 0.05; **, < 0.01; ***, < 0.001; ****, < 0.0001.
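The Welch's t test and asterisk notation in this caption can be sketched as follows. The function names `welch_t` and `stars` and the toy values are hypothetical. The sketch stops at the t statistic and the Welch-Satterthwaite degrees of freedom, because the p-value needs the t distribution, which the Python standard library does not provide. One option, if SciPy is available, is `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard errors of the two group means (unequal variances allowed).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def stars(p):
    """Map a p-value to the asterisk notation used in the caption."""
    for threshold, label in ((1e-4, "****"), (1e-3, "***"), (1e-2, "**"), (0.05, "*")):
        if p < threshold:
            return label
    return ""  # not significant at the 0.05 level


# Hypothetical protein abundances in two groups (not study data).
toy_t, toy_df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Unlike Student's t test, this form does not pool the variances, so it remains valid when the two groups differ in spread or size, as the patient and control groups here do.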