Jia-Wei Tang1, Jia-Qi Li1, Xiao-Cong Yin2, Wen-Wen Xu1, Ya-Cheng Pan3, Qing-Hua Liu4, Bing Gu2,5, Xiao Zhang1, Liang Wang1,6.
Abstract
With its low-cost, label-free, and non-destructive features, Raman spectroscopy is becoming an attractive technique with high potential for identifying the causative agents of bacterial infections and for detecting bacterial infections per se. However, it is challenging to obtain consistent and accurate Raman spectra across numerous bacterial species and phenotypes, which significantly hinders the practical application of the technique. In this study, we analyzed surface-enhanced Raman spectra (SERS) with machine learning algorithms in order to discriminate bacterial pathogens quickly and accurately. Two unsupervised machine learning methods, K-means clustering (K-Means) and Agglomerative Nesting (AGNES), were used for clustering analysis. In addition, eight supervised machine learning methods were compared in terms of bacterial prediction from Raman spectra; the convolutional neural network (CNN) achieved the best prediction accuracy (99.86%) and the highest area under the receiver operating characteristic (ROC) curve (0.9996). In sum, machine learning methods can potentially be applied to classify and predict bacterial pathogens via Raman spectra at a general level.
Keywords: bacterial pathogen; convolutional neural network; long short-term memory; machine learning; surface enhanced Raman spectra
Year: 2022 PMID: 35464991 PMCID: PMC9024395 DOI: 10.3389/fmicb.2022.843417
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 6.064
FIGURE 1 Schematic illustration of the network structures of the deep learning algorithms. (A) Multi-layer perceptron model architecture. (B) Convolutional neural network (CNN) model architecture. (C) Network structure of the three recurrent neural network (RNN) models.
FIGURE 2 Average Raman spectra and the corresponding characteristic peaks of 15 bacterial pathogens isolated from clinical samples. (A) Average surface-enhanced Raman spectra (SERS) of 15 different clinical bacterial pathogens. The shaded region in each spectrum is the 20% error band. (B) Dot plot distribution of characteristic peaks in the Raman spectra for 15 bacterial pathogens.
FIGURE 3 Schematic illustration of the clustering results for the 15 bacterial pathogens used in this study via K-means and agglomerative nesting (AGNES). (A) K-means. (B) AGNES. Dots with different colors represent different bacterial pathogens, as indicated by the figure legend on the right.
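As a rough illustration of the two clustering approaches named in this caption, the following sketch implements a small K-means and an average-linkage AGNES in plain Python. The farthest-point initialisation, the average-linkage criterion, the function names, and the 2-D toy points are all assumptions made for illustration; the source does not state the settings used in the study.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100):
    """Return one cluster label per point (Lloyd's algorithm)."""
    # Deterministic farthest-point initialisation (illustrative assumption).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def agnes(points, k):
    """Bottom-up clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance (linkage choice is an assumption).
        total = sum(sq_dist(points[i], points[j]) ** 0.5 for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        a, b = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    labels = [0] * len(points)
    for cid, members in enumerate(clusters):
        for i in members:
            labels[i] = cid
    return labels


# Toy data: three well-separated groups standing in for reduced spectra.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (0, 20), (1, 20), (0, 21)]
km_labels = kmeans(points, 3)
ag_labels = agnes(points, 3)
```

On real spectra, each point would be a full intensity vector (or a dimensionality-reduced version of it) rather than a 2-D tuple.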
Comparative analysis of the predictive capabilities of eight machine learning algorithms on surface-enhanced Raman spectra (SERS) data from 15 bacterial pathogens.
| Algorithm | Accuracy (ACC) | Precision (Pre) | Recall | F1 | 5-Fold CV |
|---|---|---|---|---|---|
| CNN | 99.86% | 99.91% | 99.91% | 99.93% | 99.47% |
| LSTM | 98.87% | 98.87% | 92.20% | 98.74% | 96.76% |
| RF | 98.71% | 98.77% | 98.80% | 98.77% | 98.35% |
| GRU | 98.61% | 97.91% | 97.93% | 97.92% | 89.68% |
| SVM | 97.30% | 97.30% | 97.08% | 97.28% | 97.93% |
| SimpleRNN | 96.43% | 96.91% | 95.89% | 95.91% | 83.63% |
| DT | 96.01% | 97.96% | 97.53% | 97.95% | 97.48% |
| MLP | 95.17% | 96.07% | 95.54% | 95.86% | 96.84% |
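For readers who want to see how the ACC, Pre, Recall, and F1 columns relate to one another, the sketch below derives all four from a multi-class confusion matrix. Macro averaging is an assumption (the source does not say which averaging scheme was used), and the 3-class matrix is a made-up toy example, not data from the study.

```python
def macro_metrics(cm):
    """Return (accuracy, macro precision, macro recall, macro F1).

    cm[i][j] is the number of samples of true class i predicted as class j.
    Macro averaging is an assumption; the source does not name the scheme.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return (
        correct / total,
        sum(precisions) / n,
        sum(recalls) / n,
        sum(f1s) / n,
    )


# Hypothetical 3-class confusion matrix (rows: true, columns: predicted).
toy_cm = [
    [9, 1, 0],
    [0, 10, 0],
    [1, 0, 9],
]
acc, pre, rec, f1 = macro_metrics(toy_cm)
```

Because macro F1 averages per-class F1 scores, it need not lie between the macro precision and macro recall values.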
FIGURE 4 Comparison of receiver operating characteristic (ROC) curves and their area under the curve (AUC) values for eight supervised machine learning algorithms.
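As a reminder of what an AUC value measures, the sketch below computes it as the probability that a positive sample receives a higher score than a negative one, then averages one-vs-rest AUCs over classes. The one-vs-rest macro average is an assumption (the source does not describe how the multi-class AUC was obtained), and the labels and scores are toy values.

```python
def binary_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, prob_rows):
    """Mean one-vs-rest AUC; class k is scored by column k of prob_rows."""
    classes = sorted(set(y_true))
    aucs = []
    for k in classes:
        labels = [1 if y == k else 0 for y in y_true]
        scores = [row[k] for row in prob_rows]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Toy predictions for three classes (integer labels index the columns).
y_true = [0, 0, 1, 1, 2, 2]
probs = [
    [0.80, 0.10, 0.10],
    [0.60, 0.30, 0.10],
    [0.20, 0.70, 0.10],
    [0.50, 0.25, 0.25],
    [0.10, 0.20, 0.70],
    [0.20, 0.20, 0.60],
]
macro_auc = macro_ovr_auc(y_true, probs)
```

The pairwise loop is quadratic in the number of samples; it is written for clarity rather than speed.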
FIGURE 5 Confusion matrix of the convolutional neural network (CNN) model for 15 different bacterial pathogens. Rows represent the true categories and columns represent the predicted categories. The matrix shows both the probability of correct prediction (diagonal) and the probability of incorrect prediction (off-diagonal).
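A confusion matrix of probabilities like the one described here can be obtained by counting (true, predicted) pairs and dividing each row by its total. The sketch below assumes rows are true classes, columns are predicted classes, and normalisation is per row; the helper names and the two-class toy labels are illustrative only.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix: cm[t][p] = samples of true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def row_normalize(cm):
    """Divide each row by its sum so every non-empty row sums to 1."""
    out = []
    for row in cm:
        s = sum(row)
        out.append([c / s if s else 0.0 for c in row])
    return out


# Toy labels for two classes.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1]
counts = confusion_matrix(y_true, y_pred, n_classes=2)
probs = row_normalize(counts)
```

After row normalisation, each diagonal entry is the recall of that class.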
FIGURE 6 Influence of different signal-to-noise ratios on the prediction accuracies of eight machine learning algorithms. The x-axis shows the machine learning models; the y-axis shows their prediction accuracy under different signal-to-noise ratios. Lines with different colors represent different noise intensities. The smaller the signal-to-noise ratio (SNR), the more noise is added to the spectra and the worse the data quality.
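One common way to degrade a spectrum to a target SNR is to add zero-mean Gaussian noise whose power is set relative to the signal power. The sketch below assumes the SNR is expressed in decibels and that the noise is white Gaussian; the source specifies neither, and the synthetic single-peak spectrum is a stand-in for real SERS data.

```python
import math
import random


def add_noise_at_snr(spectrum, snr_db, rng=None):
    """Return a noisy copy of spectrum with roughly the requested SNR (dB)."""
    rng = rng or random.Random()
    signal_power = sum(x * x for x in spectrum) / len(spectrum)
    # SNR_dB = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in spectrum]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB, given the clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Synthetic spectrum: flat baseline plus one Gaussian peak (illustrative only).
clean = [1.0 + 5.0 * math.exp(-((i - 1000) / 40.0) ** 2) for i in range(2000)]
noisy = add_noise_at_snr(clean, snr_db=10.0, rng=random.Random(0))
```

Lowering `snr_db` raises the noise standard deviation, which matches the caption's statement that smaller SNR values mean noisier spectra.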