
Rapid Detection of COVID-19 Using MALDI-TOF-Based Serum Peptidome Profiling.

Ling Yan1, Jia Yi2, Changwu Huang3, Jian Zhang4, Shuhui Fu5, Zhijie Li1, Qian Lyu5, Yuan Xu1, Kun Wang1, Huan Yang1, Qingwei Ma5, Xiaoping Cui6, Liang Qiao2, Wei Sun4, Pu Liao1.   

Abstract

The outbreak of coronavirus disease 2019 (COVID-19) caused by SARS-CoV-2 is ongoing and a serious threat to global public health. It is essential to detect the disease quickly and to isolate infected individuals immediately. Nevertheless, the widely used PCR and immunoassay-based methods suffer from false negative results and delays in diagnosis. Herein, a high-throughput serum peptidome profiling method based on matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is developed for efficient detection of COVID-19. We analyzed the serum samples from 146 COVID-19 patients and 152 control cases (including 73 non-COVID-19 patients with similar clinical symptoms, 33 tuberculosis patients, and 46 healthy individuals). After MS data processing and feature selection, eight machine learning methods were used to build classification models. A logistic regression model with 25 feature peaks achieved the highest accuracy (99%), with sensitivity of 98% and specificity of 100%, for the detection of COVID-19. This result demonstrates the great potential of the method for screening, routine surveillance, and diagnosis of COVID-19 in large populations, which is an important part of pandemic control.

Year:  2021        PMID: 33656857      PMCID: PMC7945584          DOI: 10.1021/acs.analchem.0c04590

Source DB:  PubMed          Journal:  Anal Chem        ISSN: 0003-2700            Impact factor:   6.986


Coronavirus disease-2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has become an unprecedented global threat[1,2] and was declared a pandemic by the World Health Organization on March 11, 2020.[3] Accurate, rapid, and high-throughput detection of COVID-19 patients, especially among those with similar symptoms, is very important for epidemic control. A number of COVID-19 identification methods based on the detection of virus nucleic acids, virus proteins, or specific antibodies generated by the human immune system have been developed.[4] Reverse transcription polymerase chain reaction (RT-PCR) is the most widely used method for the detection of SARS-CoV-2 nucleic acids using respiratory samples[5] and has become the standard clinical method for the diagnosis of COVID-19. However, improper sampling operations, low viral load in the sample collection area or at the collection time, nonstandard transportation processes, and viral mutations often lead to false negative results.[6,7] The test sensitivity of RT-PCR for throat swab samples has been reported to be only 30% to 60%,[6,8] and multiple lines of evidence are recommended for epidemiologically linked patients even when the nasopharyngeal and/or oropharyngeal swab tests are negative.[7] New techniques, such as CARMEN-Cas13[9] and isothermal amplification,[10] have also emerged. In addition, mass spectrometry-based methods have been used to detect unique peptides of SARS-CoV-2 nucleoproteins[11] and nucleocapsid N protein[12] in the physiological fluids of infected patients. Immunoassays for the detection of specific immunoglobulin G and M (IgG and IgM) corresponding to SARS-CoV-2 virus infection in serum are alternative methods of clinical diagnosis of COVID-19. However, there is a distinct delay in the generation of the antibodies after infection.
It is reported that the median interval for the generation of SARS-CoV-2 specific IgG and IgM is 13 days post symptom onset, and that positive rates reach 100% for IgG at approximately 17–19 days and 94.1% for IgM at approximately 20–22 days after symptom onset.[13] Chest CT has also been used in clinical diagnosis of SARS-CoV-2 infection. It is a noninvasive imaging examination with high speed and high test sensitivity (86%–98%),[6,14−16] but the test specificity is very low (25%),[6] the cost is relatively high, and experienced experts are required. To date, there is still an urgent need for accurate and high-throughput detection of COVID-19 for large population screening.

Besides targeting virus nucleic acids, virus proteins, or infection-specific antibodies, detection of the induced molecular changes in the host during and after infection is another effective way to diagnose disease and decipher the host response to the virus. For example, elevation of the C-reactive protein and D-dimer as well as a decrease of lymphocytes, leukocytes, and blood platelets have been reported in infected patients,[14] and cytokines IP-10 and MCP-3 were found to be severity predictors for the progression of COVID-19.[17] Human serum contains a complex array of proteolytically derived peptides, namely, the serum peptidome, that is correlated with the biological events occurring in the entire organism.[18,19] Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS)-based peptidome analysis, as a rapid, high-throughput, low cost, and easily handled technique, has been successfully used in clinical applications, including microorganism identification[20] approved by the U.S. Food and Drug Administration, and determining treatment benefits from epidermal growth factor receptor (EGFR) inhibitors in non-small cell lung cancer patients as a laboratory-developed test in clinics.[21] It has also shown great potential for the screening of Alzheimer’s disease[22] and rapid diagnosis and treatment monitoring of active Mycobacterium tuberculosis (TB) disease.[23]

In this study, serum samples of 146 COVID-19 patients, including mild, typical, severe, and critical classifications, and 152 control individuals, including non-COVID-19 patients with fever/cough symptoms, TB patients, and healthy controls, were analyzed by MALDI-TOF MS. Twenty-five MS peaks with statistically significant differences between the patients and control individuals were identified. Using various machine learning methods, a model constructed with the 25 peaks was established, showing high accuracy (99%) for the identification of COVID-19 patients with a sensitivity of 98% and specificity of 100% on a test cohort (100 samples) independent from the samples used for model generation and feature selection. This accurate and rapid method provides a powerful tool for high-throughput screening and surveillance of COVID-19.

Experimental Section

Patients and Samples

Non-COVID-19 serum samples were collected as the control, including 73 samples from non-COVID-19 patients with similar clinical symptoms, 33 samples from TB patients, and 46 samples from healthy individuals. These samples were collected from Chongqing General Hospital. The serum samples from 146 COVID-19 patients were collected from Sanxia Hospital. All the patients were diagnosed as COVID-19 according to the Chinese Government Diagnosis and Treatment Guideline. The COVID-19 virus nucleic acid detections in throat swabs were all positive. The computed tomography (CT) ground-glass opacity pneumonia characteristic was detected in 97% (142/146) of the patients at the sampling time. The symptoms, such as fever, cough, fatigue, and dyspnea, severity classification, and other information are shown in Table S1 and Data set S1. Classification of COVID-19 patients was performed according to the following guideline: (1) Mild (n = 2): mild symptoms without pneumonia, (2) Typical (n = 114): fever or respiratory tract symptoms with pneumonia, (3) Severe (fulfill any of the three criteria, n = 26): respiratory distress with respiratory rate ≥ 30 times/min, oxygen saturation ≤ 93% in resting state, arterial blood oxygen partial pressure (PaO2) /oxygen concentration (FiO2) ≤ 300 mmHg, (4) Critical (fulfill any of the three criteria, n = 4): respiratory failure and require mechanical ventilation, shock incidence, admission to ICU with other organ failure. The peripheral venous blood was collected in the morning before eating using a vacuum serum tube without additives (Shandong Weihai Weigao Blood Collection Supplies Co., Ltd., China). After centrifugation at 2264g for 10 min and 56 °C sterilization for 30 min, the serum samples were aliquoted and frozen at −80 °C.
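The severity rules listed above can be written as a small decision function. This is only an illustrative sketch: the `Patient` fields, the `classify` function, and the order in which the categories are checked (critical first, then severe) are assumptions made here, and the "typical" category is simplified to the presence of pneumonia. It is not the clinical guideline itself and not code from the study.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record holding only the fields needed by the rules."""
    pneumonia: bool
    respiratory_rate: float          # breaths per minute
    spo2_rest: float                 # oxygen saturation at rest, percent
    pao2_fio2: float                 # PaO2/FiO2 ratio, mmHg
    mechanical_ventilation: bool = False
    shock: bool = False
    icu_with_organ_failure: bool = False


def classify(p: Patient) -> str:
    # Critical: any one of the three criteria (checked first by assumption).
    if p.mechanical_ventilation or p.shock or p.icu_with_organ_failure:
        return "critical"
    # Severe: any one of the three respiratory criteria.
    if p.respiratory_rate >= 30 or p.spo2_rest <= 93 or p.pao2_fio2 <= 300:
        return "severe"
    # Typical: symptoms with pneumonia (simplified here to pneumonia present).
    if p.pneumonia:
        return "typical"
    return "mild"


# Group sizes reported in the text; they add up to the 146 patients.
reported_counts = {"mild": 2, "typical": 114, "severe": 26, "critical": 4}
total_patients = sum(reported_counts.values())
```

For example, a patient with pneumonia, a respiratory rate of 32 per minute, and no critical criterion would be returned as "severe" by this sketch.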

Sample Preparation and MALDI-TOF Analysis

For each sample, 5 μL of serum was diluted with 45 μL of dilution buffer (PMFpre kit 1010305, Bioyong Technologies Inc., Beijing, China). Then, 10 μL of the diluted serum was mixed with 10 μL of sinapinic acid matrix solution, and 2 μL of the mixture was pipetted onto a stainless-steel target plate. After being dried at room temperature, the samples were analyzed on a MALDI-TOF MS (Clin-TOF-II; Bioyong Technologies Inc., Beijing, China) in a linear positive mode. Mass calibration was performed with the standard calibration mixture of peptides and proteins (Sigma-Aldrich Co., St. Louis, MO, USA) with a tolerance of 500 ppm. Each spectrum was accumulated with 500 laser shots (50 positions per sample spot and 10 laser shots per position).
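The volumes and shot counts above imply a few derived quantities that are easy to verify. The short calculation below is a sketch using only the numbers stated in this section; the variable names and the `ppm_window` helper are introduced here for illustration, and applying the 500 ppm calibration tolerance to a peak at m/z 10,000 is an example chosen here, not a value from the study.

```python
# Dilution of serum in buffer: 5 uL serum + 45 uL buffer.
serum_ul = 5.0
buffer_ul = 45.0
first_dilution = (serum_ul + buffer_ul) / serum_ul            # 10-fold

# Mixing equal volumes of diluted serum and matrix doubles the dilution.
diluted_ul = 10.0
matrix_ul = 10.0
total_dilution = first_dilution * (diluted_ul + matrix_ul) / diluted_ul  # 20-fold

# Neat-serum equivalent deposited in one 2 uL spot.
spot_ul = 2.0
serum_per_spot_ul = spot_ul / total_dilution                  # 0.1 uL

# Laser shots accumulated per spectrum.
positions_per_spot = 50
shots_per_position = 10
shots_per_spectrum = positions_per_spot * shots_per_position  # 500


def ppm_window(mz: float, ppm: float = 500.0) -> float:
    """Absolute tolerance in m/z units corresponding to a ppm tolerance."""
    return mz * ppm / 1e6


print(total_dilution, serum_per_spot_ul, shots_per_spectrum, ppm_window(10000.0))
```

Each spot therefore carries the equivalent of roughly 0.1 μL of neat serum, and a 500 ppm tolerance corresponds to about ±5 m/z units at m/z 10,000.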

MALDI-TOF Data Analysis

MALDI-TOF raw data were processed with R packages MALDIquant, MALDIquantForeign,[24] and limma.[25] The data of COVID-19 patients and control samples were randomly split into training and validation cohorts with an allocation of 2:1, corresponding to 198 (101 controls and 97 patients) and 100 (51 controls and 49 patients) samples, respectively. The partial least-squares-discriminant analysis (PLS-DA) was performed using Metaboanalyst 4.0[26] (McGill University, Montreal, Canada, https://www.metaboanalyst.ca/). A two-tailed t test was used to identify the differentially expressed peaks between COVID-19 and control samples on the training data set. The P values were corrected by the Benjamini–Hochberg algorithm. Peaks with adjusted P < 0.05 and fold change greater than 1.5 were used for further feature selection with the least absolute shrinkage and selection operator (LASSO)[27] regression conducted in R version 3.5.3 with the glmnet[28] package. Recursive feature elimination with cross-validation (RFECV) was performed using sklearn.feature_selection.RFECV in Jupyter Notebook 5.7.9. The estimator parameter was set to logistic regression, and the cv parameter was set to 10.[29] Logistic regression (LR),[30] support vector machine (SVM),[31] naive Bayes (NB),[32] decision tree (DT),[33] random forest (RF),[34] gradient boosting decision tree (GBDT),[35,36] AdaBoost,[37] and K-nearest neighbors (KNN)[38] were performed using the sklearn package in Jupyter Notebook 5.7.9. Detailed information on data analysis is shown in Text S1.
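The pre-filtering step (Benjamini–Hochberg adjustment followed by the adjusted P and fold-change thresholds) can be sketched in plain Python. The authors carried out this step in R; the functions `benjamini_hochberg` and `select_peaks` below are hypothetical names written for illustration. Treating a 1.5-fold decrease the same way as a 1.5-fold increase is an assumption, since the text does not say how the fold change was oriented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value to the smallest so that the adjusted
    # values stay monotone in the rank.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_peaks(p_values, fold_changes, alpha=0.05, min_fc=1.5):
    """Indices of peaks passing the adjusted-p and fold-change filters."""
    adjusted = benjamini_hochberg(p_values)
    keep = []
    for i, (q, fc) in enumerate(zip(adjusted, fold_changes)):
        # Assumption: a ratio below 1/min_fc counts as a change of at
        # least min_fc-fold in the opposite direction.
        if q < alpha and (fc > min_fc or fc < 1.0 / min_fc):
            keep.append(i)
    return keep


# Toy values only; these are not data from the study.
toy_p = [0.01, 0.04, 0.03, 0.5]
toy_fc = [2.0, 3.0, 0.5, 2.0]
toy_adjusted = benjamini_hochberg(toy_p)
toy_selected = select_peaks(toy_p, toy_fc)
```

In the toy input only the first peak survives: the second and third have raw P values below 0.05 but adjusted values just above it, which is the situation the correction is meant to catch.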

Ethical Statement

This study was approved by the Ethical Committee of Chongqing General Hospital. Informed consents from patients were waived by the board. The samples in this study were from a clinical trial registered in the Chinese Clinical Trial Registry (ChiCTR2000033872).

Results and Discussion

Selection of Feature Peaks on MALDI-TOF Mass Spectra

The workflow of this study is shown in Figure 1. After sterilization, the serum samples were profiled by MALDI-TOF MS. The obtained mass spectra were then subjected to peak extraction and alignment, followed by feature selection and classification model construction by using different machine learning methods. The representative mass spectra of samples in different groups are shown in Figure S1. Most of the MS peaks were located between m/z 5000 and m/z 30,000. Distinctive features can be easily observed from the profiles, e.g., m/z = 6357, 6654, 6639, 13,886, and 28,232 that were significantly down-regulated in the COVID-19 group, and m/z = 7614, 15,123, 15,867, and 28,091 that were significantly up-regulated in the COVID-19 group. In total, 386 peaks were detected after data processing of all the MS raw files (Data set S2).
Figure 1

Scheme of establishing a diagnostic model for rapid screening of COVID-19 patients. Serum samples collected from COVID-19 patients and control participants were analyzed with MALDI-TOF after simple pretreatment. Mass spectra were aligned with MALDIquant, and significant features were selected to establish the diagnostic model with different machine learning methods.

Feature peaks were selected according to the process shown in Figure 2a. After intensity normalization and missing value imputation, the peaks in the training cohort (n = 198) were used for feature selection by three machine learning methods, including least absolute shrinkage and selection operator (LASSO), partial least-squares-discriminant analysis (PLS-DA), and recursive feature elimination with cross-validation (RFECV). The 20 peaks with the highest repetition frequency in LASSO, 20 peaks with the highest variable importance in projection (VIP) in PLS-DA, and 10 peaks with automatic tuning of all the features selected by cross-validation accuracy in RFECV were chosen (Figure 2b–d). After further empirical checking, 25 peaks were identified as the distinctive features between COVID-19 patients and control participants. The intensities of all the feature peaks in the training cohort are shown in Figure 2e, Figure S2, and Table S2. The intensities of these peaks were significantly different between the COVID-19 and control groups.
Figure 2

Selection of 25 feature peaks for COVID-19 detection. (a) General scheme of the data processing and feature selection workflow. (b) Top 20 features prioritized by LASSO analysis ranked by the decrease in repetition frequency. (c) Top 20 features prioritized by PLS-DA ranked by the decrease in VIP values. NV: non-COVID-19. V: COVID-19. (d) Top 10 features prioritized by RFECV ranked by the decrease in feature importance scores. (e) Heatmap of the selected 25 features. (f) ROC curves of eight different machine learning models in the training cohort by cross validation.

Then, the classification models for the detection of COVID-19 were constructed with the 25 feature peaks in the training cohort by cross validation (9:1, 10 times) using eight machine learning methods, including logistic regression (LR), support vector machine (SVM), random forest (RF), naive Bayes (NB), gradient boosting decision tree (GBDT), K-nearest neighbor (KNN), decision tree (DT), and adaptive boosting (Adaboost). The receiver operating characteristic (ROC) curves of the different machine learning models are shown in Figure 2f. The area under the curve (AUC) evaluates the performance of a classifier, and a higher AUC means a better classification.[39] All of the models achieved an AUC above 0.99. For LR, SVM, RF, GBDT, DT, and Adaboost models, the AUC was 1.
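To make the AUC concrete, it can be computed as the fraction of (patient, control) pairs in which the patient receives the higher score, with ties counted as one half. The function below is a minimal sketch of that pairwise definition, not the routine used in the study (which relied on sklearn); the name `roc_auc` and the toy scores are illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0      # correctly ordered pair
            elif p == q:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy scores only; these are not data from the study.
perfect = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
partial = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

An AUC of 1 therefore means that every patient sample scored higher than every control sample, whereas one mis-ordered pair out of four in the second toy example lowers the AUC to 0.75.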

Identification of COVID-19 Patients by MALDI-TOF Profiling of Serum Using Machine Learning-Based Classification

The classification efficiency of the 25 feature peaks was then validated in a test cohort (n = 100) that was independent from the samples included in the training cohort. The intensities of the feature peaks in the test cohort are shown in Table S3. The PCA analysis based on the 25 features demonstrated that the COVID-19 patients and control cases in the test group could be well separated (Figure 3a). All eight models established by using the training cohort obtained an AUC above 0.92 in the test cohort (Figure 3b). For LR, SVM, and NB models, the AUC was 1. The sensitivity, specificity, accuracy, precision, and F1-score (the definitions of these terms are listed in Text S1) of the eight models are shown in Figure 3c. With the best classification performance (AUC = 1, sensitivity = 98%, specificity = 100%, accuracy = 99%, precision = 100%, F1-score = 99%), the LR model is recommended for future applications in the detection of COVID-19. The confusion matrix of the LR model in the test cohort is shown in Figure 3d, and only one sample was misclassified among 100 samples. We also tested the feasibility of classifying the 146 COVID-19 patients with different degrees of severity based on the MALDI-TOF spectra. There was no obvious difference between the different groups when considering either all the aligned peaks or the 25 feature peaks.
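The reported LR metrics can be checked with a short calculation. The sketch below assumes the standard definitions of the five metrics (the paper's own definitions are in Text S1, which is not reproduced here). The confusion-matrix counts are inferred from the stated test cohort (49 patients and 51 controls), the single misclassified sample, and the 100% specificity; they are not read from the figure.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)              # true positive rate (recall)
    specificity = tn / (tn + fp)              # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }


# Inferred counts: one of the 49 patients missed, no control misclassified.
lr_metrics = classification_metrics(tp=48, fn=1, tn=51, fp=0)
for name, value in lr_metrics.items():
    print(f"{name}: {value:.1%}")
```

Under these assumptions the sensitivity is 48/49, which rounds to the reported 98%, and the F1-score rounds to 99%, consistent with the values given above.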
Figure 3

Identification of COVID-19 patients using machine learning-based classification models in the test cohort. (a) PCA analysis using the 25 features. (b) ROC curves by eight machine learning methods. (c) Summary of the accuracy, precision, F1-score, sensitivity, and specificity obtained for each machine learning method. (d) Confusion matrix of the classification results by the LR model.

The symptoms of COVID-19 patients are usually nonspecific. Guan et al. reported that only 44% of COVID-19 patients from China had a fever when they entered the hospital.[14] At the same time, other symptoms such as cough, fatigue, sputum production, and shortness of breath could also be associated with other respiratory infections. It is very important to distinguish COVID-19 patients from non-COVID-19 individuals, especially those with similar symptoms. In this study, about half of the control participants were non-COVID-19 patients with similar clinical symptoms (n = 73), together with 33 tuberculosis patients and 46 healthy individuals. The sampling time of COVID-19 patients’ sera analyzed in this study ranged from 3 to 28 days from the onset of symptoms, which covered a relatively long period of disease progression. The long disease progression coverage of the samples demonstrated the great potential of the method in the screening of COVID-19 patients. We kept the experimental procedure as simple as possible, using the typical MALDI-TOF method with only minimal sample preparation steps, in order to develop a high-throughput method that can be mastered by nonexperts. Because the organic matrix used in typical MALDI-TOF analysis can generate strong background signals in low mass regions, metabolites were not considered for MS analysis. Without proteolysis, large proteins cannot be efficiently ionized and detected. Hence, the mass range of the analysis focused on 5000 to 30,000 m/z, corresponding to the serum peptidome and small proteins.
Compared with the commonly used virus nucleic acids detection methods based on PCR, the MALDI-TOF MS-based serum peptidome profiling does not need an extra clean testing environment and has a low risk of sample cross contamination. Only 5 μL of serum sample was used per analysis. The serum samples could be analyzed immediately after sterilization. It took less than 1 min to analyze a sample by the MALDI-TOF MS. Without expensive consumables, the cost per test by MALDI-TOF MS is much lower compared to PCR or immunoassay, i.e., less than 1 US dollar per sample. Nachtigall et al. have recently reported the detection of COVID-19 using MALDI-TOF MS analysis of nasal swab samples and found seven feature peaks to distinguish COVID-19 patients from the control participants.[40] Compared to nasal swabs, the serum sample is more static and less sensitive to external turbulence for more reliable diagnosis. Furthermore, serum collection also reduced the exposure risk of samplers.

Annotation of Feature Peaks

To identify the feature peaks, the sera from four COVID-19 patients (VSamples 67, 81, 112, 140) and three control participants (Samples 36, 41, 145) were selected for proteomic analysis. A total of 753 proteins were identified from the sera samples by LC-MS/MS and then matched against the 25 feature peaks under the criteria detailed in Text S1. Fifteen out of the 25 feature peaks were identified as intact proteins or protein fragments (Table S4). As shown in Figure S3, the top Gene Ontology (GO) processes enriched by Metascape[41] involving the 15 identified features include amyloid fiber formation, neutrophil degranulation, infection with Mycobacterium tuberculosis, humoral immune response, receptor-mediated endocytosis, acute inflammatory response, and regulation of MAP kinase activity. In the previous proteomic and metabolomic study of COVID-19 patients’ sera by Shen et al., differentially expressed proteins in COVID-19 patients enriched in the function of neutrophil degranulation and acute inflammatory response were also reported.[42] Seven peaks were identified as proteins or peptides with the function of humoral immune response. Among the proteins, platelet factor 4 and complement factor I were investigated by Shen et al.[42] and Shu et al.,[43] respectively, among COVID-19 patients with different severities. Gordon et al. have identified the gene RAB7A as a host factor with functional relevance in SARS-CoV-2 infections.[44] In our result, Ras-related protein Rab-7a encoded by RAB7A was identified from one of the feature peaks. These results demonstrate the rationality of the feature peaks in the identification of COVID-19.

Conclusion

In summary, our data demonstrated that MALDI-TOF-based serum profiling is a rapid and accurate method for the detection of COVID-19. It has great potential for screening, routine surveillance, and diagnostic applications in large populations, which is an important part of pandemic control. Future studies should include asymptomatic COVID-19 patients, who were absent from the current work because of limited sample availability. The performance of the LR model used in this study should also be validated in other cohorts.