María Florencia Rocca, Jonathan Cristian Zintgraff, María Elena Dattero, Leonardo Silva Santos, Martín Ledesma, Carlos Vay, Mónica Prieto, Estefanía Benedetti, Martín Avaro, Mara Russo, Fabiane Manke Nachtigall, Elsa Baumeister.
Abstract
Coronavirus disease 2019 (COVID-19) is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Early, sensitive and specific detection of SARS-CoV-2 is widely recognized as critical to responding to the ongoing outbreak. Diagnosis is currently based on molecular real-time RT-PCR techniques, but their implementation is threatened by the extraordinary worldwide demand for supplies, which makes the development of alternative and/or complementary tests especially relevant. Here, we exploit the potential of mass spectrometry combined with machine learning algorithms to detect COVID-19 positive and negative protein profiles directly from nasopharyngeal swab samples. The preliminary results were accuracy = 67.66 %, sensitivity = 61.76 %, and specificity = 71.72 %. Although these parameters still need to improve before the method can be used as a screening technique, mass spectrometry-based methods coupled with multivariate analysis proved to be an interesting tool that deserves to be explored as a complementary diagnostic approach because of its low cost and fast performance. However, further steps, such as the analysis of a large number of samples, should be taken into consideration to determine the applicability of the method developed.
Keywords: COVID-19; MALDI-TOF; Machine learning; Mass spectrometry; SARS-CoV-2
Year: 2020 PMID: 33045283 PMCID: PMC7546642 DOI: 10.1016/j.jviromet.2020.113991
Source DB: PubMed Journal: J Virol Methods ISSN: 0166-0934 Impact factor: 2.014
Fig. 1. Main spectra profiles (MSPs) based dendrogram of the 20 samples supplemented in the new in-house database. The horizontal axis of the dendrogram represents the calculated distance in the clustering analysis, displayed in relative units, corresponding to the similarity of MS spectra. The dendrogram was created using Biotyper v3.0 software.
Fig. 2. 2D Peak Distribution Plot of the 2-class model (MSP database). This plot displays the distribution of two selected peaks in the non-excluded spectra of the loaded model generation classes. The data are shown on a two-dimensional plane. By default, the first two (best separating) peaks of the current statistic sort order are displayed. The ellipses represent the standard deviation of the class average of the peak areas/intensities. The x-axis shows the peak area/intensity values for the most important peak according to the p-value, and the y-axis shows the peak area/intensity values for the second most important peak. The axis measures are given in arbitrary units, which are chosen automatically to fit the plot optimally in the plane. Plot obtained by ClinPro Tools v3.0.
Characteristic MALDI-TOF MS peaks (best top ten) obtained by ClinPro Tools software for each model.
| 2 Class model A | | | | | 3 Class model B | | | | | 2 Class model C | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mass | DAve | PTTA | PWKW | PAD | Mass | DAve | PTTA | PWKW | PAD | Mass | DAve | PTTA | PWKW | PAD |
| 3372.19 | 7.85 | 0.000171 | 0.0000138 | < 0.000001 | 3443.31 | 13.35 | 0.00149 | 0.00000568 | < 0.000001 | 3443.19 | 13.35 | 0.00332 | 0.000319 | < 0.000001 |
| 3443.14 | 7.79 | 0.00834 | 0.000042 | < 0.000001 | 3372.29 | 11.52 | 0.000023 | 0.00000427 | < 0.000001 | 3372.15 | 11.52 | 0.00108 | 0.00059 | < 0.000001 |
| 4966.6 | 4.79 | 0.000277 | 0.000093 | < 0.000001 | 3487.4 | 11.42 | 0.00969 | 0.0359 | < 0.000001 | 4965.83 | 6.69 | 0.0000165 | 0.00000382 | < 0.000001 |
| 5236.08 | 3.88 | 0.00584 | 0.000764 | < 0.000001 | 5236.12 | 8.63 | < 0.000001 | < 0.000001 | < 0.000001 | 5235.02 | 4.62 | 0.00000929 | 0.000128 | < 0.000001 |
| 4078.24 | 3.49 | 0.000011 | < 0.000001 | 0 | 4966.4 | 6.74 | 0.0000157 | < 0.000001 | < 0.000001 | 4985.26 | 4.2 | 0.00000651 | 0.00000106 | < 0.000001 |
| 3487.1 | 3.24 | 0.0737 | 0.0191 | < 0.000001 | 4985.38 | 4.12 | 0.00000296 | < 0.000001 | < 0.000001 | 3465.33 | 3.61 | 0.0384 | 0.0414 | < 0.000001 |
| 4985.73 | 3.07 | 0.000033 | 0.0000101 | < 0.000001 | 5137.86 | 3.89 | < 0.000001 | < 0.000001 | < 0.000001 | 3394.17 | 3.3 | 0.00856 | 0.000829 | < 0.000001 |
| 3359.22 | 2.66 | 0.000032 | 0.000015 | < 0.000001 | 3465.33 | 3.6 | 0.0183 | 0.00758 | < 0.000001 | 4939.73 | 3.06 | < 0.000001 | 0.0000776 | < 0.000001 |
| 3393.78 | 2.54 | 0.00182 | 0.00418 | < 0.000001 | 5382.94 | 3.58 | < 0.000001 | 0.00000425 | < 0.000001 | 5381.66 | 2.93 | 0.0000266 | 0.0117 | < 0.000001 |
| 3475.85 | 2.48 | 0.000608 | 0.000148 | < 0.000001 | 5157.12 | 3.52 | < 0.000001 | < 0.000001 | < 0.000001 | 4077.47 | 2.5 | 0.00881 | 0.00794 | < 0.000001 |
DAve: Difference between the maximal and the minimal average peak intensity of all classes.
PTTA: P-value obtained through t-test. PWKW: P-value obtained through Wilcoxon/Kruskal-Wallis test. PAD: P-value obtained through Anderson-Darling test.
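As a rough illustration of the DAve definition above, the sketch below computes the difference between the largest and smallest class-average intensity for a single peak. The function name `dave`, the input layout, and the intensity values are hypothetical and chosen only for the example; they are not study data, and ClinPro Tools may apply normalization or recalibration steps that are not reproduced here.

```python
from statistics import mean


def dave(intensities_by_class):
    """Difference between the maximal and minimal class-average peak intensity.

    intensities_by_class maps a class label to the list of intensities
    measured for one peak in that class (hypothetical layout).
    """
    averages = [mean(values) for values in intensities_by_class.values()]
    return max(averages) - min(averages)


# Made-up intensities for one peak in two classes (not study data).
example = {
    "positive": [12.0, 14.0, 16.0],
    "negative": [5.0, 6.0, 7.0],
}
print(dave(example))
```

With three classes, as in model B, the same function applies unchanged, since it only compares the largest and smallest class averages.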
Parameters of the “in-house” Database evaluated (n = 50).
| Parameters Evaluated | (%) | 95 % CI (%) |
|---|---|---|
| | 38.0 | (24.65–52.83) |
| | 26.5 | (12.88–44.36) |
| | 62.5 | (35.43–84.80) |
| | 28.6 | (20.65–38.07) |
| | 60.0 | (39.19–77.74) |
Fig. 3. A: Characteristic peaks in the individual spectra of the MSPs database among COVID-19 positive samples versus COVID-19 negative samples, obtained by manual analysis in Flex Analysis v3.4 software. B: Average spectra of the same peaks, obtained by ClinPro Tools v3.0 software. A/B-1: 3372 Da, 3442 Da, 3465 Da and 3488 Da. A/B-2: 6347 Da. A/B-3: 10836 Da.
Single-peak analysis for the discrimination of COVID-19 positive samples from the negative samples (n = 20), between Flex Analysis v3.4 and ClinPro Tools v3.0.
| ClinPro Tools | | | | | FlexAnalysis | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Mass | PTTA | PWKW | PAD | AUC | AUC* | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) |
| 3372.3 # | 0.0133 | 0.0199 | 0.05450 | 0.87 | 0.667 | 33.33 | 100.00 | 100.00 | 64.71 |
| 3443.28 # | 0.0133 | 0.0222 | 0.04480 | 0.86 | 0.667 | 33.33 | 100.00 | 100.00 | 64.71 |
| 3465.6 # | 0.0133 | 0.0244 | 0.05790 | 0.85 | 0.742 | 66.67 | 81.82 | 75.00 | 75.00 |
| 6347.57 # | 0.0146 | 0.0096 | 0.02110 | 0.92 | 0.636 | 100.00 | 27.27 | 52.94 | 100.00 |
| 10836.83 # | 0.0885 | 0.0236 | 0.00019 | 0.86 | 0.727 | 100.00 | 45.45 | 60.00 | 100.00 |
PTTA: P-value obtained through t-test. PWKW: P-value obtained through Wilcoxon/Kruskal-Wallis test. PAD: P-value obtained through Anderson-Darling test. *AUCs were obtained from a ROC curve constructed using Eng, J. ROC analysis: web-based calculator for ROC curves, from http://www.jrocfit.org. # The analysis includes only these peaks because the other peaks in the table were not significant when manually corroborated in Flex Analysis software.
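To make the AUC columns easier to interpret, the following minimal sketch computes an empirical (rank-based) AUC: the fraction of positive/negative sample pairs in which the positive sample has the higher peak intensity, with ties counted as one half. This is an illustration only. The function name `empirical_auc` and the scores are hypothetical, and the result may differ from the values produced by ClinPro Tools or by the web-based calculator cited above, which may fit the ROC curve differently.

```python
def empirical_auc(positive_scores, negative_scores):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Made-up peak intensities (not study data).
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2, 0.1]
print(round(empirical_auc(pos, neg), 3))
```

An AUC of 0.5 corresponds to no separation between classes, and 1.0 to perfect separation by that single peak.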
Complete results of Recognition capacity and Cross validation derived from the classification models calculated.
| Classifier Models | RC (%) | CV (%) |
|---|---|---|
| 2 Class model A | 100 | 92.98 |
| 3 Class model B | 100 | 87.16 |
| 2 Class model C | 100 | 92.87 |
RC: Recognition Capacity; CV: Cross Validation.
Parameters of the Machine Learning combined with Potential Biomarkers (n = 167).
| Parameters Evaluated | (%) | 95 % CI (%) |
|---|---|---|
| Accuracy | 67.66 | (60.00–74.69) |
| Specificity | 71.72 | (61.78–80.31) |
| Sensitivity | 61.76 | (49.18–73.29) |
| Positive predictive value | 60.00 | (51.01–68.37) |
| Negative predictive value | 73.20 | (66.33–79.10) |
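The percentages in this table can be related to each other through a 2×2 confusion matrix. The sketch below is a worked check under an explicit assumption: the counts (42 true positives, 26 false negatives, 71 true negatives, 28 false positives) are not reported in the text. They were chosen here because they sum to n = 167 and reproduce the accuracy, sensitivity and specificity given in the abstract. Under those assumed counts, the two remaining table values (60.00 and 73.20) match the positive and negative predictive values. The function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity, PPV and NPV as percentages."""
    total = tp + fn + tn + fp
    return {
        "accuracy": round(100 * (tp + tn) / total, 2),
        "sensitivity": round(100 * tp / (tp + fn), 2),
        "specificity": round(100 * tn / (tn + fp), 2),
        "ppv": round(100 * tp / (tp + fp), 2),
        "npv": round(100 * tn / (tn + fn), 2),
    }


# Assumed counts (not reported in the source); they sum to n = 167.
metrics = diagnostic_metrics(tp=42, fn=26, tn=71, fp=28)
for name, value in metrics.items():
    print(f"{name}: {value:.2f} %")
```

The confidence intervals in the table are not reproduced here, since the interval method used is not stated.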