Monique Melo Costa1,2,3, Hugo Martin1,2,3, Bertrand Estellon4, François-Xavier Dupé4, Florian Saby1,2,3, Nicolas Benoit1,2,3,5, Hervé Tissot-Dupont3,6, Matthieu Million3,6, Bruno Pradines1,2,3,5, Samuel Granjeaud7, Lionel Almeras1,2,3.
Abstract
SARS-CoV-2 has caused a large outbreak since its emergence in December 2019. COVID-19 diagnosis became a priority so as to isolate and treat infected individuals in order to break the contamination chain. Currently, the reference test for COVID-19 diagnosis is the molecular detection (RT-qPCR) of the virus from nasopharyngeal swab (NPS) samples. Although this sensitive and specific test remains the gold standard, it has several limitations, such as the invasive collection method, the relatively high cost and the duration of the test. Moreover, the shortage of testing material, caused by the gap between the high demand for tests and production capacities, puts additional constraints on RT-qPCR. Here, we propose a PCR-free method for diagnosing SARS-CoV-2 based on matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) profiling and machine learning (ML) models from salivary samples. Kinetic saliva samples were collected at enrollment and ten and thirty days later (D0, D10 and D30), to assess the classification performance of the ML models compared to the molecular tests performed on NPS specimens. Spectra were generated using an optimized protocol of saliva collection and successive quality control steps were developed to ensure the reliability of spectra. A total of 360 averaged spectra were included in the study. At D0, the comparison of MS spectra from SARS-CoV-2 positive patients (n = 105) with healthy healthcare controls (n = 51) revealed nine peaks that significantly distinguished the two groups. Among the five ML models tested, support vector machine with linear kernel (SVM-LK) provided the best performance on the training dataset (accuracy = 85.2%, sensitivity = 85.1%, specificity = 85.3%, F1-Score = 85.1%). The application of the SVM-LK model on independent datasets confirmed its performance with 88.9% and 80.8% of correct classification for samples collected at D0 and D30, respectively.
Conversely, at D10, the proportion of correct classification had fallen to 64.3%. The analysis of saliva samples by MALDI-TOF MS and ML appears to be an interesting supplementary tool for COVID-19 diagnosis, despite the mixed results obtained for convalescent patients (D10).
Keywords: COVID-19; MALDI-TOF MS; diagnostic; machine learning; saliva
Year: 2022 PMID: 35053990 PMCID: PMC8781148 DOI: 10.3390/jcm11020295
Source DB: PubMed Journal: J Clin Med ISSN: 2077-0383 Impact factor: 4.241
Figure 1. Schematic presentation of the experimental workflow. (A) Main steps of the study. Individuals who were SARS-CoV-2-positive and negative (controls) based on NPS RT-qPCR tests were enrolled. Saliva was collected with Salivette® devices (SARSTEDT, Nümbrecht, Germany), loaded onto an MS plate, and then submitted to MALDI-TOF MS acquisition. The resulting MS spectra were analyzed using machine learning (ML) methods. (B) Pre-processing of the MS spectra included a quality control step, an evaluation of the homogeneity of the data among the Cov+ and Cov− groups, and the determination of peak detection conditions. (C) Strategy used for the prediction of SARS-CoV-2-positive saliva using ML. Statistical analyses were performed to reveal relevant MS peaks before the assessment of ML models.
Characteristics of participants.
| | COVID-19 Group a | | | Healthcare Worker Group | | |
|---|---|---|---|---|---|---|
| Collection Time Point b | D0 | D0 + 10 | D0 + 30 | D0 | D0 + 10 | D0 + 30 |
| Participants, n (n c) | 138 (105) | 79 (16) | 20 (0) | 51 | 40 | 32 |
| Age (years), median (IQR) | 37.4 (23–52) | 37.7 (24–52) | 48 (39.5–57.3) | 36.1 (27–45.5) | 38.4 (28.8–46.8) | 37.7 (27.8–46.8) |
| Male, n (%) | 68 (49.3%) | 42 (53.2%) | 10 (50.0%) | 22 (43.1%) | 18 (45.0%) | 16 (48.5%) |
| Onset of symptoms before D0 test (days), median (IQR) | 2.2 (1–3) | | | / | | |
| Symptoms at presentation, n (%) | 89 (64.5%) | | | 0 (0.0%) | | |
| Headache, n (%) | 38 (27.5%) | | | / | | |
| Tiredness, n (%) | 26 (18.8%) | | | / | | |
| Cough, n (%) | 24 (17.4%) | | | / | | |
| Fever, n (%) | 21 (15.2%) | | | / | | |
| Myalgia, n (%) | 20 (14.5%) | | | / | | |
| Breathing difficulties, n (%) | 12 (8.7%) | | | / | | |
| Anosmia/Ageusia, n (%) | 9 (6.5%) | | | / | | |
| Sore throat, n (%) | 7 (5.1%) | | | / | | |
| Diarrhea, n (%) | 7 (5.1%) | | | / | | |
| Others, n (%) | 4 (2.9%) | | | / | | |
a Tested positive for SARS-CoV-2 by RT-qPCR on NPSs less than five days before enrollment. b Saliva sampled ten (D0 + 10) and thirty (D0 + 30) days after the first collection (D0). c Tested positive for SARS-CoV-2 by RT-qPCR on NPSs on the sampling day (D0). Abbreviations: IQR, interquartile range; NPS, nasopharyngeal swab; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
Figure 2. Pre-processing steps of MS spectra. (A) Detection of atypical spectra from Cov+ (n = 948) and Cov− (n = 564) groups using the screenSpectra function from the MALDIrppa package. A score was computed for each spectrum. The horizontal dotted line corresponds to the atypical score threshold. (B) Comparison of total-ion-current intensities between Cov+ (n = 237) and Cov− (n = 123) groups after filtering atypical spectra and averaging spectra replicates (two-tailed Wilcoxon rank sum test). (C) Comparison of peak number distributions according to SNR values ranging from 2 to 7 between average MS spectra from Cov+ (n = 237) and Cov− (n = 123) groups. **** p < 0.001 by two-tailed Wilcoxon rank sum test. (D) Summary table of peak number distributions and comparisons for each SNR value between average MS spectra from Cov+ (n = 237) and Cov− (n = 123) groups. * Exact p-values by two-tailed Wilcoxon rank sum test. A.U.: arbitrary units; Cov+/−: MS spectra from individuals enrolled in the COVID-19 and control groups, respectively; SD: standard deviation; SNR: signal-to-noise ratio.
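The effect of the SNR threshold on the number of detected peaks (panels C and D) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `tic_normalize` and `count_peaks`, the toy spectrum, and the choice of a global median-absolute-deviation (MAD) noise estimate are all assumptions made for illustration.

```python
from statistics import median


def tic_normalize(intensities):
    """Scale intensities so that they sum to 1 (total-ion-current normalization)."""
    total = sum(intensities)
    if total == 0:
        raise ValueError("spectrum has zero total ion current")
    return [x / total for x in intensities]


def count_peaks(intensities, snr):
    """Count local maxima whose intensity exceeds snr * noise.

    Assumption of this sketch: noise is the median absolute deviation (MAD)
    of the whole spectrum, a simplification of a real noise estimator.
    """
    med = median(intensities)
    noise = median(abs(x - med) for x in intensities)
    threshold = snr * noise
    peaks = 0
    for i in range(1, len(intensities) - 1):
        is_local_max = intensities[i - 1] < intensities[i] >= intensities[i + 1]
        if is_local_max and intensities[i] > threshold:
            peaks += 1
    return peaks


# Hypothetical toy spectrum (arbitrary units), not data from the study.
toy_spectrum = [2, 3, 1, 4, 2, 20, 3, 1, 2, 8, 2, 3, 1]
peak_counts = {snr: count_peaks(toy_spectrum, snr) for snr in range(2, 8)}
```

Raising the SNR threshold can only keep or reduce the number of retained peaks, which is why the peak count has to be compared between groups at each SNR value before fixing the detection conditions.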
Figure 3. Comparison of saliva MS spectra between SARS-CoV-2(+) (n = 105) and control (n = 51) groups sampled at D0. Mean MS spectra profiles with a total-ion-current normalization applied to peak spectra intensities of Cov+ (A) and Cov− (B) individuals and their superposition (C). (D) List of MS peaks with significant total-ion-current normalized intensity differences (two-tailed Wilcoxon rank sum test (p < 0.05) with a Benjamini–Hochberg correction (p < 0.1) for each peak) between Cov+ and Cov− groups. The dashed box highlights the two most relevant peaks. (E) Summary table of Wilcoxon rank sum tests with a Benjamini–Hochberg (BH) correction for each peak. Ratios represent the proportion (%) of mean peak intensity variation between Cov+ and Cov− groups. Positive and negative values correspond, respectively, to higher and lower mean peak intensities in the Cov+ group compared to the Cov− group. Comparison of the two greatest peaks at m/z 2489.2 (F) and at m/z 5418.9 (G) differentiating Cov+ and Cov− groups. Lines and shaded areas represent mean values ± interquartile range (IQR), respectively. A dashed box frames each relevant peak. PCA (H) and UMAP (I) performed on Cov+ (n = 105) and Cov− (n = 51) groups using the selected peaks (n = 9) with two-tailed Wilcoxon rank sum test (p < 0.05) and a Benjamini–Hochberg correction (p < 0.1).
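The two-stage peak filter named in the caption (raw p < 0.05, then Benjamini–Hochberg-adjusted p < 0.1) can be sketched as follows. The BH step-up adjustment itself is standard; the function names, the peak labels, and the p-values are hypothetical and do not come from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values stay monotone (step-up procedure).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_peaks(raw_p, alpha_raw=0.05, alpha_bh=0.1):
    """Keep peaks with raw p < alpha_raw and BH-adjusted p < alpha_bh."""
    names = list(raw_p)
    adjusted = benjamini_hochberg([raw_p[name] for name in names])
    return [
        name
        for name, adj in zip(names, adjusted)
        if raw_p[name] < alpha_raw and adj < alpha_bh
    ]


# Hypothetical raw p-values for five illustrative peaks.
example_p = {
    "peak_a": 0.001,
    "peak_b": 0.008,
    "peak_c": 0.039,
    "peak_d": 0.041,
    "peak_e": 0.6,
}
selected = select_peaks(example_p)
```

In practice the raw p-values would come from a two-tailed Wilcoxon rank sum test per peak; they are hard-coded here to keep the sketch dependency-free.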
Figure 4. Training and testing sets for machine learning models. The individuals enrolled were divided into several groups to establish the best training set and to assess the performance of the selected ML model on different groups. The total numbers of samples in the Cov+ (n = 237) and Cov− (n = 123) categories at inclusion (D0) and ten (D10) and thirty (D30) days later are indicated (orange numbers). Green, red, and blue numbers correspond to saliva samples classified as SARS-CoV-2 positive, negative, or uncertain negative, respectively, based on RT-PCR results. The samples used for training set selection (n = 102) and for assessment of the best ML model, organized into five groups (Test 1 to Test 5), are indicated by circles of the same color code. Test 2 to Test 4 are included in Test 1. Cov+/−: MS spectra from individuals enrolled in the COVID-19 and control groups, respectively; ML: machine learning.
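A balanced training set of this size (n = 102) can be drawn from unequal groups as in the sketch below. This is an assumption-laden illustration, not the study's procedure: the selection rule is not described in this record, so a seeded random draw is used, and the function name `balanced_split` and the sample identifiers are hypothetical.

```python
import random


def balanced_split(pos_ids, neg_ids, seed=0):
    """Draw equal numbers of positive and negative samples for training.

    Returns (training_ids, held_out_ids). The random draw is an assumption
    of this sketch, not the documented selection rule.
    """
    rng = random.Random(seed)
    n = min(len(pos_ids), len(neg_ids))
    train_pos = rng.sample(pos_ids, n)
    train_neg = rng.sample(neg_ids, n)
    chosen = set(train_pos) | set(train_neg)
    held_out = [s for s in pos_ids + neg_ids if s not in chosen]
    return train_pos + train_neg, held_out


# Hypothetical identifiers matching the D0 group sizes (105 Cov+, 51 Cov-).
cov_pos = [f"cov_pos_{i}" for i in range(105)]
cov_neg = [f"cov_neg_{i}" for i in range(51)]
training, held_out = balanced_split(cov_pos, cov_neg)
```

With these group sizes the draw yields 51 + 51 training samples and leaves 54 Cov+ samples outside the training set; repeating the draw with different seeds is one way to compare candidate training sets.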
Figure 5. Performance curves of the five machine learning models. Best AUC values for ROC (A) and best AP values for precision-recall (B) curves associated with each model trained on samples from Cov+ (n = 51) and Cov− (n = 51) groups collected at D0 are presented. (C) Summary of the performance obtained per ML model. The F1-Score was used as the performance measure to select the most suitable model. AP: average precision; AUC: area under the curve; Cov+/−: MS spectra from individuals enrolled in the COVID-19 and control groups, respectively; K-NN: K-nearest neighbors; LDA: linear discriminant analysis; ML: machine learning; RF: random forest; SVM-LK: support vector machine with linear kernel; SVM-RK: support vector machine with radial kernel.
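For readers unfamiliar with the two curve summaries, the sketch below computes ROC AUC and average precision (AP) from classifier scores in plain Python. The labels and scores are hypothetical, the function names are illustrative, and the AP function assumes there are no tied scores.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outscores a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive sample.

    Assumes no tied scores (tie order would otherwise change the result).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    if hits == 0:
        raise ValueError("need at least one positive sample")
    return total / hits


# Hypothetical labels (1 = Cov+, 0 = Cov-) and decision scores.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(y_true, y_score)           # 8 of 9 positive/negative pairs ordered correctly
ap = average_precision(y_true, y_score)  # (1/1 + 2/2 + 3/4) / 3
```

Both summaries depend only on how the scores rank the samples, which is why the final model choice still needs a threshold-dependent measure such as the F1-Score.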
Figure 6. Confusion matrices for test results using the SVM linear model. Prediction metrics for samples from Test 1 (A), Test 2 (B), Test 3 (C), and Test 4 (D). (E) Summary of the performance obtained per group tested (from Test 1 to 4) with the SVM-LK model. ITC: impossible to compute. (F) Prediction metrics for samples from Test 5 with the SVM-LK model. SVM-LK: support vector machine with linear kernel.
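The metrics summarized from a confusion matrix, and one plausible reason a value is reported as ITC, can be sketched as follows. The counts are hypothetical and the function name is illustrative; the sketch assumes that ITC corresponds to a zero denominator, for example sensitivity in a test group that contains no positive samples.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive standard metrics from confusion-matrix counts.

    A metric whose denominator is zero is returned as None, standing in
    for "impossible to compute" (an assumption of this sketch).
    """
    def ratio(num, den):
        return num / den if den else None

    has_positives = (tp + fn) > 0
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        # F1 is left undefined when the group has no positive samples.
        "f1": ratio(2 * tp, 2 * tp + fp + fn) if has_positives else None,
    }


# Hypothetical mixed group: every metric is defined.
mixed = confusion_metrics(tp=40, fp=5, tn=45, fn=10)

# Hypothetical group with negatives only: sensitivity and F1 are undefined.
negatives_only = confusion_metrics(tp=0, fp=5, tn=21, fn=0)
```

Returning `None` rather than 0 keeps an undefined metric distinguishable from a genuinely poor one when results from several test groups are tabulated side by side.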