Ling Yan1, Jia Yi2, Changwu Huang3, Jian Zhang4, Shuhui Fu5, Zhijie Li1, Qian Lyu5, Yuan Xu1, Kun Wang1, Huan Yang1, Qingwei Ma5, Xiaoping Cui6, Liang Qiao2, Wei Sun4, Pu Liao1.
Abstract
The outbreak of coronavirus disease 2019 (COVID-19), caused by SARS-CoV-2, is ongoing and poses a serious threat to global public health. It is essential to detect the disease quickly and to isolate infected individuals immediately. Nevertheless, the currently widely used PCR and immunoassay-based methods suffer from false negative results and delays in diagnosis. Herein, a high-throughput serum peptidome profiling method based on matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is developed for efficient detection of COVID-19. We analyzed serum samples from 146 COVID-19 patients and 152 control cases (including 73 non-COVID-19 patients with similar clinical symptoms, 33 tuberculosis patients, and 46 healthy individuals). After MS data processing and feature selection, eight machine learning methods were used to build classification models. A logistic regression model with 25 feature peaks achieved the highest accuracy (99%), with a sensitivity of 98% and a specificity of 100%, for the detection of COVID-19. These results demonstrate the great potential of the method for screening, routine surveillance, and diagnosis of COVID-19 in large populations, which is an important part of pandemic control.
Year: 2021 PMID: 33656857 PMCID: PMC7945584 DOI: 10.1021/acs.analchem.0c04590
Source DB: PubMed Journal: Anal Chem ISSN: 0003-2700 Impact factor: 6.986
Figure 1. Scheme of establishing a diagnostic model for rapid screening of COVID-19 patients. Serum samples collected from COVID-19 patients and control participants were analyzed with MALDI-TOF after simple pretreatment. Mass spectra were aligned with MALDIquant, and significant features were selected to establish the diagnostic model with different machine learning methods.
Figure 2. Selection of 25 feature peaks for COVID-19 detection. (a) General scheme of the data processing and feature selection workflow. (b) Top 20 features prioritized by LASSO analysis, ranked in decreasing order of repetition frequency. (c) Top 20 features prioritized by PLS-DA, ranked in decreasing order of VIP value. NV: non-COVID-19. V: COVID-19. (d) Top 10 features prioritized by RFECV, ranked in decreasing order of feature importance score. (e) Heatmap of the selected 25 features. (f) ROC curves of eight different machine learning models in the training cohort by cross validation.
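The ranking by repetition frequency in panel (b) can be illustrated with a minimal sketch. It assumes, and the record does not confirm, that "repetition frequency" means how often a feature receives a nonzero LASSO coefficient across repeated resampled fits. The function name `lasso_selection_frequency`, the bootstrap resampling, the use of scikit-learn's `Lasso` on 0/1 class labels, and the values of `n_repeats` and `alpha` are all illustrative choices, not the authors' settings.

```python
import numpy as np
from sklearn.linear_model import Lasso


def lasso_selection_frequency(X, y, n_repeats=50, alpha=0.05, seed=0):
    """Fraction of bootstrap fits in which each feature gets a nonzero LASSO weight.

    X: (n_samples, n_features) peak-intensity matrix (hypothetical input).
    y: 0/1 labels (e.g. 0 = non-COVID-19, 1 = COVID-19).
    All defaults are assumptions made for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    counts = np.zeros(n_features)
    for _ in range(n_repeats):
        # Draw a bootstrap resample of the cohort.
        idx = rng.choice(n_samples, size=n_samples, replace=True)
        Xb, yb = X[idx], y[idx]
        # Standardize features so the L1 penalty treats them on one scale.
        std = Xb.std(axis=0)
        std[std == 0] = 1.0
        Xb = (Xb - Xb.mean(axis=0)) / std
        model = Lasso(alpha=alpha, max_iter=10000).fit(Xb, yb)
        counts += model.coef_ != 0
    return counts / n_repeats


if __name__ == "__main__":
    # Synthetic data: only features 0 and 1 carry class information.
    rng = np.random.default_rng(1)
    y = np.repeat([0, 1], 100)
    X = rng.normal(size=(200, 30))
    X[:, 0] += 3 * y
    X[:, 1] -= 3 * y
    freq = lasso_selection_frequency(X, y)
    # Indices of the five most frequently selected features.
    print(np.argsort(-freq)[:5])
```

Features that are selected in nearly every resample are the ones a ranking like panel (b) would place first; the cutoff for keeping a feature is a separate choice that this sketch does not make.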
Figure 3. Identification of COVID-19 patients using machine learning-based classification models in the test cohort. (a) PCA using the 25 features. (b) ROC curves of the eight machine learning methods. (c) Summary of the accuracy, precision, F1-score, sensitivity, and specificity obtained for each machine learning method. (d) Confusion matrix of the classification results of the LR model.
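The metrics summarized in panel (c) all follow from the four cells of a binary confusion matrix such as the one in panel (d). The sketch below shows the standard definitions. The helper name `binary_metrics` and the counts in the usage example are hypothetical, chosen only so the arithmetic is easy to follow. They are not the cohort counts from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: COVID-19 cases classified correctly/incorrectly.
    tn/fp: control cases classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)            # recall on the positive class
    specificity = tn / (tn + fp)            # recall on the negative class
    precision = tp / (tp + fp)              # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration: 50 positives, 50 negatives.
    for name, value in binary_metrics(tp=49, fp=0, tn=50, fn=1).items():
        print(f"{name}: {value:.3f}")
```

With one missed positive and no false positives, specificity and precision stay at 100% while sensitivity and accuracy drop slightly, which is the same pattern as the reported 98% sensitivity and 100% specificity.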