Seongyong Park1, Jaeseok Lee2,3, Shujaat Khan1, Abdul Wahab4, Minseok Kim2,3.
Abstract
Surface-enhanced Raman spectroscopy (SERS) is often used for heavy metal ion detection. However, large variations in signal strength and spectral profile, together with the nonlinearity of the measurements, often produce inconsistent results and raise concerns about reproducibility. Consequently, manual classification of SERS spectra requires carefully controlled experimentation, which further hinders large-scale adoption. Recent advances in machine learning offer an opportunity to address these issues. However, well-documented procedures for model development and evaluation, as well as benchmark datasets, are missing. To this end, we provide a SERS spectral benchmark dataset of lead(II) nitrate (Pb(NO3)2) for a heavy metal ion detection task and evaluate the classification performance of several machine learning models. We also perform a comparative study to find the best combination of preprocessing method and machine learning model. The proposed model can successfully identify the Pb(NO3)2 molecule from SERS measurements of independent test experiments. In particular, the proposed model achieves a balanced accuracy of 84.6% on the cross-batch testing task.
Keywords: SVM; heavy-metal ion; machine learning; neural network; pattern classification; random forest; surface-enhanced Raman spectroscopy (SERS)
Year: 2022 PMID: 35062556 PMCID: PMC8778908 DOI: 10.3390/s22020596
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Configuration of the study. (A) Experimental setup of the proposed surface-enhanced Raman spectroscopy-based Pb(NO3)2 molecule detection model. (B) Real image of sample preparation and acquisition steps. (C) The hypothetical decision boundary learned by the proposed Radial Basis Function Kernel Support Vector Machine (RBFSVM) model.
Sample statistics of Pb(NO3)2.
| Class | Negative | Positive | Positive | Positive |
|---|---|---|---|---|
| Concentration (µM) | 0.01 | 0.1 | 10 | 1000 |
| Batch1 | 500 | 500 | 500 | 500 |
| Batch2 | 500 | 500 | 500 | 500 |
Performance summary of the proposed model.
| Dataset | Train | Test | Accuracy | Sensitivity | Specificity | F1 | MCC | BACC | Youden’s Index |
|---|---|---|---|---|---|---|---|---|---|
| RAW | Batch1 | Batch2 | 0.501 ± 0.000 | 0.335 ± 0.000 | 1.000 ± 0.000 | 0.501 ± 0.000 | 0.334 ± 0.000 | 0.667 ± 0.000 | 0.335 ± 0.000 |
| RAW | Batch2 | Batch1 | 0.721 ± 0.019 | 0.765 ± 0.011 | 0.591 ± 0.084 | 0.805 ± 0.011 | 0.328 ± 0.069 | 0.678 ± 0.040 | 0.356 ± 0.080 |
| RAW | Average | | 0.611 ± 0.114 | 0.550 ± 0.221 | 0.796 ± 0.218 | 0.653 ± 0.156 | 0.331 ± 0.048 | 0.673 ± 0.028 | 0.345 ± 0.056 |
| PSN | Batch1 | Batch2 | 0.750 ± 0.000 | 1.000 ± 0.000 | 0.000 ± 0.000 | 0.857 ± 0.000 | 0.000 ± 0.000 | 0.500 ± 0.000 | 0.000 ± 0.000 |
| PSN | Batch2 | Batch1 | 0.750 ± 0.000 | 1.000 ± 0.000 | 0.000 ± 0.000 | 0.857 ± 0.000 | 0.000 ± 0.000 | 0.500 ± 0.000 | 0.000 ± 0.000 |
| PSN | Average | | 0.750 ± 0.000 | 1.000 ± 0.000 | 0.000 ± 0.000 | 0.857 ± 0.000 | 0.000 ± 0.000 | 0.500 ± 0.000 | 0.000 ± 0.000 |
| Proposed (BC+RBFSVM) | Batch1 | Batch2 | 0.637 ± 0.005 | 0.517 ± 0.007 | 0.999 ± 0.001 | 0.681 ± 0.006 | 0.459 ± 0.004 | 0.758 ± 0.003 | 0.516 ± 0.006 |
| Proposed (BC+RBFSVM) | Batch2 | Batch1 | 0.901 ± 0.005 | 0.868 ± 0.006 | 1.000 ± 0.000 | 0.929 ± 0.004 | 0.788 ± 0.008 | 0.934 ± 0.003 | 0.868 ± 0.006 |
| Proposed (BC+RBFSVM) | Average | | 0.769 ± 0.135 | 0.692 ± 0.180 | 1.000 ± 0.001 | 0.805 ± 0.127 | 0.623 ± 0.169 | 0.846 ± 0.090 | 0.692 ± 0.180 |
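The metrics in the performance summary can all be derived from the four confusion-matrix counts. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the example counts are illustrative assumptions, not taken from the authors' code. The example assumes 1,500 positive and 500 negative spectra per batch (three positive concentrations and one negative, 500 spectra each) and a hypothetical classifier that labels every spectrum as positive.

```python
from math import sqrt

def binary_metrics(tp, fn, tn, fp):
    """Compute the reported metrics from confusion-matrix counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the confusion matrix is
    # empty; returning 0 in that case is a common convention (assumption).
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
        "bacc": (sensitivity + specificity) / 2,    # balanced accuracy
        "youden": sensitivity + specificity - 1,    # Youden's index
    }

# Hypothetical all-positive classifier on one batch (assumed class counts).
metrics = binary_metrics(tp=1500, fn=0, tn=0, fp=500)
print({name: round(value, 3) for name, value in metrics.items()})
```

Under these assumptions the result is an accuracy of 0.75, an F1 of 0.857, and a balanced accuracy of 0.5, which coincides with the PSN rows. This illustrates why balanced accuracy and Youden's index are more informative than plain accuracy on a dataset with a 3:1 class ratio.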
Figure 2. Visualization of the density-preserving t-SNE (D-tSNE) embedding of the Pb(NO3)2 SERS spectra for the (A,D) RAW, (B,E) Power Spectrum Density Normalization (PSN), and (C,F) Baseline Correction (BC) methods. The PCA embedding is learned on the 80% training split of one batch, and the dataset of the other batch is projected using the learned PCA embedding. D-tSNE is then used to reduce the dimension of the projected dataset while preserving the spread of the data points. Top: Batch1. Bottom: Batch2. Left: RAW. Middle: PSN. Right: BC.
Performance comparison between the proposed model (RBFSVM) and six other ML models, reported as independent-test balanced accuracy (BACC). The proposed model achieved the best BACC for the Batch2/Batch1 split and the best average over both splits; NB scored higher on the Batch1/Batch2 split.
| Model | Batch1/Batch2 (Train/Test) | Batch2/Batch1 (Train/Test) | Average |
|---|---|---|---|
| LR | 0.735 ± 0.005 | 0.582 ± 0.008 | 0.658 ± 0.079 |
| LinSVM | 0.661 ± 0.042 | 0.606 ± 0.011 | 0.634 ± 0.042 |
| NB | 0.882 ± 0.002 | 0.695 ± 0.006 | 0.789 ± 0.096 |
| DT | 0.574 ± 0.031 | 0.733 ± 0.072 | 0.654 ± 0.098 |
| RF | 0.603 ± 0.006 | 0.784 ± 0.047 | 0.694 ± 0.098 |
| MLP | 0.754 ± 0.006 | 0.896 ± 0.030 | 0.825 ± 0.076 |
| RBFSVM | 0.758 ± 0.003 | 0.934 ± 0.003 | 0.846 ± 0.090 |
Ten-fold cross-validation performance using same-batch datasets.
| Dataset | Train | Test | Accuracy | Sensitivity | Specificity | F1 | MCC | BACC | Youden’s Index |
|---|---|---|---|---|---|---|---|---|---|
| RAW | Batch1 | Batch1 | 0.999 ± 0.002 | 0.999 ± 0.003 | 1.000 ± 0.000 | 0.999 ± 0.001 | 0.997 ± 0.006 | 0.999 ± 0.001 | 0.999 ± 0.003 |
| RAW | Batch2 | Batch2 | 0.922 ± 0.015 | 0.906 ± 0.020 | 0.972 ± 0.036 | 0.946 ± 0.011 | 0.820 ± 0.034 | 0.939 ± 0.018 | 0.878 ± 0.036 |
| RAW | Average | | 0.961 ± 0.041 | 0.952 ± 0.050 | 0.986 ± 0.028 | 0.973 ± 0.028 | 0.909 ± 0.094 | 0.969 ± 0.033 | 0.938 ± 0.067 |
| PSN | Batch1 | Batch1 | 1.000 ± 0.002 | 0.999 ± 0.002 | 1.000 ± 0.000 | 1.000 ± 0.001 | 0.999 ± 0.004 | 1.000 ± 0.001 | 0.999 ± 0.002 |
| PSN | Batch2 | Batch2 | 0.900 ± 0.037 | 0.906 ± 0.033 | 0.880 ± 0.064 | 0.931 ± 0.026 | 0.751 ± 0.089 | 0.893 ± 0.044 | 0.786 ± 0.089 |
| PSN | Average | | 0.950 ± 0.057 | 0.953 ± 0.053 | 0.940 ± 0.076 | 0.965 ± 0.040 | 0.875 ± 0.141 | 0.946 ± 0.063 | 0.893 ± 0.125 |
| BC | Batch1 | Batch1 | 1.000 ± 0.000 | 1.000 ± 0.000 | 1.000 ± 0.000 | 1.000 ± 0.000 | 1.000 ± 0.000 | 1.000 ± 0.000 | 1.000 ± 0.000 |
| BC | Batch2 | Batch2 | 0.973 ± 0.012 | 0.986 ± 0.009 | 0.934 ± 0.038 | 0.982 ± 0.008 | 0.928 ± 0.032 | 0.960 ± 0.020 | 0.920 ± 0.039 |
| BC | Average | | 0.986 ± 0.016 | 0.993 ± 0.009 | 0.967 ± 0.043 | 0.991 ± 0.011 | 0.964 ± 0.043 | 0.980 ± 0.025 | 0.960 ± 0.049 |
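For readers unfamiliar with the protocol, a ten-fold split of a single batch can be sketched as follows. This is a minimal illustration only: the record does not state whether the folds were stratified or shuffled, so the stratification, the seed, the function name `stratified_kfold`, and the per-batch class counts (500 negative, 1,500 positive) are all assumptions.

```python
import random

def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs that keep class proportions per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        # Shuffle the indices of one class, then deal them round-robin
        # so every fold receives an (almost) equal share of that class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

# Assumed class counts for one batch: 0 = negative, 1 = positive.
labels = [0] * 500 + [1] * 1500
for fold, (train, test) in enumerate(stratified_kfold(labels, k=10)):
    positives = sum(labels[i] for i in test)
    print(fold, len(train), len(test), positives)
```

With these counts each fold holds out 200 spectra (150 positive, 50 negative) and trains on the remaining 1,800. A model would be fitted on `train` and scored on `test` in each iteration, and the per-fold metrics summarized as mean ± standard deviation.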