Eloghosa Ikponmwoba, Okezzi Ukorigho, Parikshit Moitra, Dipanjan Pan, Manas Ranjan Gartia, Opeoluwa Owoyele.
Abstract
In this study, we explored machine learning approaches for predictive diagnosis using surface-enhanced Raman scattering (SERS), applied to the detection of COVID-19 infection in biological samples. We used SERS data collected from 20 patients at the University of Maryland Baltimore School of Medicine. As a preprocessing step, the positive and negative labels were obtained using polymerase chain reaction (PCR) testing. First, we compared the performance of linear and nonlinear dimensionality reduction techniques for projecting the high-dimensional Raman spectra to a low-dimensional space in which a smaller number of variables defines each sample. The number of reduced features was chosen by comparing the mean accuracy from a 10-fold cross-validation. Finally, we employed Gaussian process (GP) classification, a probabilistic machine learning approach, to predict whether a sample is negative or positive as a function of the low-dimensional variables. Rather than providing rigid class labels, the GP classifier provides a probability (ranging from zero to one) that a given sample is positive or negative. In practice, the proposed framework can be used to provide high-throughput rapid testing, and a follow-up PCR can be used for confirmation in cases where the model's uncertainty is unacceptably high.
Keywords: COVID-19; Gaussian processes; machine learning; surface-enhanced Raman spectroscopy
Year: 2022; PMID: 36004985; PMCID: PMC9405612; DOI: 10.3390/bios12080589
Source DB: PubMed; Journal: Biosensors (Basel); ISSN: 2079-6374
Figure 1. (a) Positive and negative SERS spectra. (b) Mean positive and negative SERS spectra.
Figure 2. Explained variance ratio plot for PCA.
Figure 3. (a) 2D scatter plot of the first and second principal components from PCA. (b) 2D scatter plot of the first and second components from UMAP.
Figure 4. (a) Mean accuracy of the model across 10 folds for different numbers of principal components from PCA. (b) Mean accuracy of the model across 10 folds for different numbers of components from UMAP.
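
The component-selection procedure summarized in Figure 4 can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data in place of the SERS spectra; the candidate component counts, the RBF kernel, the standardization step, and the stratified fold setup are illustrative assumptions, not settings reported in the source. Only the PCA branch is shown.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for SERS spectra (NOT the study's data):
# 120 "spectra" with 300 intensity channels; class 1 has a raised band.
rng = np.random.default_rng(0)
n_per_class, n_channels = 60, 300
X = rng.normal(size=(2 * n_per_class, n_channels))
y = np.repeat([0, 1], n_per_class)
X[y == 1, 100:130] += 1.5


def build_model(n_components):
    """Standardize, project with PCA, then classify with a GP (assumed RBF kernel)."""
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0), random_state=0),
    )


# 10-fold cross-validation; the mean accuracy is compared across candidates.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
candidates = [2, 4, 7, 10]  # hypothetical candidate numbers of components
scores = {
    k: cross_val_score(build_model(k), X, y, cv=cv, scoring="accuracy").mean()
    for k in candidates
}
best_k = max(scores, key=scores.get)

# Refit with the selected number of components and obtain class probabilities
# rather than hard labels.
final_model = build_model(best_k).fit(X, y)
proba = final_model.predict_proba(X[:5])  # shape (5, 2); each row sums to 1

for k, s in scores.items():
    print(f"{k} components: mean accuracy {s:.3f}")
print("selected:", best_k)
```

Swapping the `PCA` step for a nonlinear reducer would give the corresponding UMAP comparison; that variant is omitted here because it needs an additional package.
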
Results from the 10-fold cross-validation of the GPC model with 7 principal components (PCs).
| Folds | Accuracy | Precision | Recall | F1 Score | ROC_AUC |
|---|---|---|---|---|---|
| Fold 1 | 0.800 | 1.000 | 0.600 | 0.750 | 0.920 |
| Fold 2 | 0.900 | 0.900 | 0.900 | 0.900 | 0.980 |
| Fold 3 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Fold 4 | 0.750 | 0.727 | 0.800 | 0.762 | 0.860 |
| Fold 5 | 0.700 | 0.750 | 0.600 | 0.667 | 0.810 |
| Fold 6 | 0.950 | 1.000 | 0.900 | 0.947 | 0.970 |
| Fold 7 | 0.850 | 1.000 | 0.700 | 0.824 | 0.920 |
| Fold 8 | 0.800 | 0.875 | 0.700 | 0.778 | 0.950 |
| Fold 9 | 0.900 | 0.900 | 0.900 | 0.900 | 0.990 |
| Fold 10 | 0.900 | 1.000 | 0.800 | 0.889 | 0.980 |
| Mean | 0.855 | 0.915 | 0.740 | 0.842 | 0.941 |
Results from the 10-fold cross-validation of the GPC model with 4 UMAP dimensions.
| Folds | Accuracy | Precision | Recall | F1 Score | ROC_AUC |
|---|---|---|---|---|---|
| Fold 1 | 0.800 | 1.000 | 0.600 | 0.750 | 0.920 |
| Fold 2 | 0.900 | 0.900 | 0.900 | 0.900 | 0.980 |
| Fold 3 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Fold 4 | 0.750 | 0.727 | 0.800 | 0.762 | 0.860 |
| Fold 5 | 0.700 | 0.750 | 0.600 | 0.667 | 0.810 |
| Fold 6 | 0.950 | 1.000 | 0.900 | 0.947 | 0.970 |
| Fold 7 | 0.850 | 1.000 | 0.700 | 0.824 | 0.920 |
| Fold 8 | 0.800 | 0.875 | 0.700 | 0.778 | 0.950 |
| Fold 9 | 0.900 | 1.000 | 0.800 | 0.889 | 0.900 |
| Fold 10 | 0.900 | 1.000 | 0.800 | 0.889 | 0.980 |
| Mean | 0.855 | 0.925 | 0.780 | 0.841 | 0.929 |
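
The accuracy, precision, recall, and F1 columns in the tables above follow the standard definitions in terms of confusion-matrix counts. The sketch below illustrates them; the counts are hypothetical, chosen for a 20-sample fold so that they are consistent with the Fold 4 row, since the source does not report per-fold counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts (assumption): 10 positive and 10 negative samples,
# with 8 true positives, 3 false positives, 2 false negatives, 7 true negatives.
metrics = classification_metrics(tp=8, fp=3, fn=2, tn=7)
rounded = {name: round(value, 3) for name, value in metrics.items()}
print(rounded)
# {'accuracy': 0.75, 'precision': 0.727, 'recall': 0.8, 'f1': 0.762}
```
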
Figure 5. (a) Confusion matrix for the Gaussian process classifier with 7 principal components. (b) Area under the receiver operating characteristic curve (AUC_ROC) for the Gaussian process classifier with 7 principal components.
Figure 6. (a) Confusion matrix for the Gaussian process classifier with 4 UMAP components. (b) Area under the receiver operating characteristic curve (AUC_ROC) for the Gaussian process classifier with 4 UMAP components.
Test predictions, predicted probabilities, and uncertainty for the GPC model with PCA. The misclassified samples are highlighted in red.
| Samples | Class 1 prob | Class 2 prob | True Class | Predicted Class | Uncertainty |
|---|---|---|---|---|---|
| 1 | | | | | |
| 2 | 0.376 | 0.624 | 1 | 1 | 0.95 |
| 3 | 0.245 | 0.755 | 1 | 1 | 0.80 |
| 4 | 0.608 | 0.392 | 0 | 0 | 0.97 |
| 5 | 0.196 | 0.804 | 1 | 1 | 0.71 |
| 6 | 0.689 | 0.311 | 0 | 0 | 0.89 |
| 7 | 0.822 | 0.178 | 0 | 0 | 0.68 |
| 8 | 0.179 | 0.821 | 1 | 1 | 0.68 |
| 9 | 0.732 | 0.268 | 0 | 0 | 0.84 |
| 10 | 0.231 | 0.769 | 1 | 1 | 0.78 |
| 11 | 0.257 | 0.743 | 1 | 1 | 0.82 |
| 12 | 0.238 | 0.762 | 1 | 1 | 0.79 |
| 13 | 0.341 | 0.659 | 1 | 1 | 0.93 |
| 14 | | | | | |
| 15 | 0.760 | 0.240 | 0 | 0 | 0.80 |
| 16 | 0.654 | 0.346 | 0 | 0 | 0.93 |
| 17 | 0.870 | 0.130 | 0 | 0 | 0.56 |
| 18 | 0.686 | 0.314 | 0 | 0 | 0.90 |
| 19 | 0.463 | 0.537 | 1 | 1 | 1.00 |
| 20 | 0.764 | 0.236 | 0 | 0 | 0.79 |
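
The source text here does not state how the Uncertainty column was computed. The tabulated values are, however, consistent (to within rounding) with the Shannon entropy, in bits, of the two predicted class probabilities. The sketch below checks that reading against a few rows copied from the table above; it is an inference from the numbers, not a description of the authors' code.

```python
import math


def binary_entropy(p):
    """Shannon entropy in bits of the two-class distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain prediction carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


# (sample, first tabulated class probability, tabulated uncertainty)
rows = [
    (3, 0.245, 0.80),
    (5, 0.196, 0.71),
    (17, 0.870, 0.56),
    (19, 0.463, 1.00),
]

# Absolute gap between the computed entropy and the tabulated value, per sample.
differences = {
    sample: abs(binary_entropy(p) - reported) for sample, p, reported in rows
}
for sample, p, reported in rows:
    print(sample, f"{binary_entropy(p):.3f}", reported)
```

Under this reading, an uncertainty of 1.00 corresponds to a prediction near 0.5/0.5, and values fall toward 0 as the classifier becomes more confident.
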
Test predictions, predicted probabilities, and uncertainty for the GPC model with UMAP. The misclassified samples are highlighted in red.
| Samples | Class 1 prob | Class 2 prob | True Class | Predicted Class | Uncertainty |
|---|---|---|---|---|---|
| 1 | 0.707 | 0.293 | 0 | 0 | 0.87 |
| 2 | 0.760 | 0.240 | 0 | 0 | 0.80 |
| 3 | 0.361 | 0.639 | 1 | 1 | 0.94 |
| 4 | 0.840 | 0.160 | 0 | 0 | 0.63 |
| 5 | 0.480 | 0.520 | 1 | 1 | 1.00 |
| 6 | 0.696 | 0.304 | 0 | 0 | 0.89 |
| 7 | 0.777 | 0.223 | 0 | 0 | 0.77 |
| 8 | 0.832 | 0.168 | 0 | 0 | 0.65 |
| 9 | 0.351 | 0.649 | 1 | 1 | 0.93 |
| 10 | 0.165 | 0.835 | 1 | 1 | 0.65 |
| 11 | 0.452 | 0.548 | 1 | 1 | 0.99 |
| 12 | 0.587 | 0.413 | 0 | 0 | 0.98 |
| 13 | | | | | |
| 14 | 0.187 | 0.813 | 1 | 1 | 0.70 |
| 15 | 0.329 | 0.671 | 1 | 1 | 0.91 |
| 16 | 0.789 | 0.211 | 0 | 0 | 0.74 |
| 17 | | | | | |
| 18 | 0.611 | 0.389 | 0 | 0 | 0.96 |
| 19 | 0.285 | 0.715 | 1 | 1 | 0.86 |
| 20 | 0.336 | 0.664 | 1 | 1 | 0.92 |
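
The abstract suggests sending samples for a follow-up PCR when the model's uncertainty is unacceptably high. One way such a rule might look is sketched below, using a few rows from the table above. The 0.60 confidence cutoff and the function name are hypothetical; the source does not specify a threshold or a decision rule.

```python
def needs_followup(p_first, p_second, min_confidence=0.60):
    """Flag a sample when neither class probability reaches the cutoff.

    min_confidence is a hypothetical threshold chosen for illustration.
    """
    return max(p_first, p_second) < min_confidence


# (sample, first class probability, second class probability) from the table
predictions = [
    (4, 0.840, 0.160),
    (5, 0.480, 0.520),
    (10, 0.165, 0.835),
    (11, 0.452, 0.548),
    (12, 0.587, 0.413),
    (18, 0.611, 0.389),
]

flagged = [s for s, p1, p2 in predictions if needs_followup(p1, p2)]
print(flagged)  # [5, 11, 12]
```

A lower cutoff sends fewer samples to PCR at the cost of accepting less confident calls; where to set it is a judgment about acceptable error rates rather than something this sketch can decide.
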
Figure 7. (a) GPC model decision boundary plot with uncertainty estimation for PCA. (b) GPC model decision boundary plot with uncertainty estimation for UMAP.