Hsin-Yao Wang1,2, Tsung-Ting Hsieh3, Chia-Ru Chung4, Hung-Ching Chang3, Jorng-Tzong Horng1,4,5, Jang-Jih Lu1,6,7, Jia-Hsin Huang3.
Abstract
Matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) has recently become a useful analytical approach for microbial identification. The presence or absence of specific peaks in MS spectra is commonly used to identify bacterial species and to predict antibiotic-resistant strains. However, the conventional approach relies on a few individual peaks and ignores the rest of the spectrum, which can leave it with insufficient predictive power. In the past few years, machine learning algorithms have been successfully applied to analyze MALDI-TOF MS peak patterns for rapid strain typing. In this study, we developed a convolutional neural network (CNN) method that uses the complete information in MALDI-TOF MS spectra of Enterococcus faecium, one of the leading pathogens worldwide. The CNN model rapidly and accurately predicts vancomycin-resistant Enterococcus faecium (VREfm) from the whole mass spectrum profiles of clinical samples. On independent external validation data, the CNN models showed good classification performance, with an average area under the receiver operating characteristic curve (AUROC) of 0.887. Additionally, we applied score-weighted class activation mapping (Score-CAM) to identify the features most important to our CNN models and found several discriminative signals that contribute substantially to detecting resistance. This study not only used the complete information in MALDI-TOF MS data directly but also provides a practical means for the rapid detection of VREfm using a deep learning algorithm.
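To make the Score-CAM step concrete, the following is a minimal, simplified sketch for a 1D spectrum, not the authors' implementation. It assumes the activation maps of the last convolutional layer are already available and that the classifier is a plain function returning the probability of resistance; the toy classifier `toy_vre_score`, the 8-bin spectrum, the two activation maps, and nearest-neighbour upsampling are all hypothetical choices. Each map is upsampled, min-max normalized, and used as a soft mask on the input; the model's score on the masked spectrum becomes that map's weight, and the saliency is the ReLU of the weighted sum of maps. The helper `top_fraction` mirrors the "top 1% weights" selection described for Figure 5.

```python
import math

def upsample_nearest(act, length):
    """Stretch a coarse activation map to `length` positions by nearest-neighbour repetition."""
    n = len(act)
    return [act[min(n - 1, i * n // length)] for i in range(length)]

def min_max(values):
    """Rescale to [0, 1]; a constant map becomes all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def score_cam_1d(spectrum, activation_maps, class_score):
    """Simplified Score-CAM for a 1D spectrum.

    Each activation map is upsampled to the spectrum length, min-max normalized
    and used as a soft mask; the model's class score on the masked spectrum
    becomes that map's weight. The saliency is ReLU(sum_k w_k * A_k).
    """
    length = len(spectrum)
    weights, upsampled = [], []
    for act in activation_maps:
        up = upsample_nearest(act, length)
        mask = min_max(up)
        masked = [x * m for x, m in zip(spectrum, mask)]
        weights.append(class_score(masked))
        upsampled.append(up)
    return [max(0.0, sum(w * up[i] for w, up in zip(weights, upsampled)))
            for i in range(length)]

def top_fraction(saliency, fraction=0.01):
    """Indices of the highest-saliency bins (always at least one bin)."""
    k = max(1, int(len(saliency) * fraction))
    return sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)[:k]

# Hypothetical toy classifier: "resistant" probability rises with intensity in bins 2-3.
def toy_vre_score(x):
    return 1.0 / (1.0 + math.exp(-(x[2] + x[3] - 5.0)))

spectrum = [0, 0, 5, 5, 0, 0, 1, 1]
maps = [[0, 1, 0, 0], [0, 0, 0, 1]]  # hypothetical last-conv-layer activation maps
cam = score_cam_1d(spectrum, maps, toy_vre_score)
print(top_fraction(cam))
```

The first map covers the bins the toy classifier relies on, so it receives a much larger weight and the saliency peaks there. Because the weights come from forward passes on masked inputs rather than from gradients, Score-CAM only needs the model's outputs.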
Keywords: MALDI-TOF MS; antibacterial drug resistance; convolutional neural network (CNN); rapid detection; vancomycin-resistant Enterococcus faecium (VREfm)
Year: 2022 PMID: 35756017 PMCID: PMC9231590 DOI: 10.3389/fmicb.2022.821233
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 6.064
Figure 1. Illustration of the convolutional neural network (CNN) model architecture. C denotes the channel size (number of channels), and the dropout rate was 0.5.
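The two architecture elements named in this caption, convolutional channels and dropout at rate 0.5, can be sketched in plain Python as follows. This is an illustration under assumptions, not the published architecture: the kernel width, the unpadded stride-1 convolution, the toy input, and the "inverted" dropout convention (rescaling survivors by 1/(1 - rate) during training, as common deep learning frameworks do) are hypothetical choices; only the dropout rate of 0.5 comes from the caption.

```python
import random

def conv1d(x, kernels, bias):
    """Valid (unpadded), stride-1 1D cross-correlation.

    x:       list of input channels, each a list of floats of equal length
    kernels: kernels[c_out][c_in] is the list of taps for that channel pair
    bias:    one bias per output channel; len(kernels) is the channel size C
    """
    width = len(kernels[0][0])
    n_out = len(x[0]) - width + 1
    out = []
    for c_out, per_input in enumerate(kernels):
        row = []
        for t in range(n_out):
            s = bias[c_out]
            for c_in, taps in enumerate(per_input):
                s += sum(w * x[c_in][t + j] for j, w in enumerate(taps))
            row.append(s)
        out.append(row)
    return out

def dropout(x, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability `rate` and rescale
    survivors by 1 / (1 - rate) during training; identity at inference."""
    if not training or rate == 0.0:
        return [row[:] for row in x]
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [[v / keep if rng.random() < keep else 0.0 for v in row] for row in x]

spectrum = [[1.0, 2.0, 3.0, 4.0]]                      # one input channel (toy intensities)
kernels = [[[1.0, 0.0]], [[0.0, 1.0]], [[1.0, -1.0]]]  # C = 3 output channels, width 2
features = conv1d(spectrum, kernels, bias=[0.0, 0.0, 0.0])
print(features)
print(dropout(features, rate=0.5, rng=random.Random(0)))
```

Each output channel applies its own filter across the m/z axis, so increasing C lets the network learn more peak-shape detectors; dropout randomly silences half of the resulting units during training to reduce overfitting.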
Figure 2. Comparison of average pooling in terms of accuracy (A) and AUC (B), with raw mass spectrometry (MS) data as input and rectified linear units (ReLU) as the activation function. Differences among the compared settings were tested using one-way ANOVA followed by the post-hoc Tukey honestly significant difference (HSD) test for pairwise differences (p < 0.05). Different letters indicate a significant difference.
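The ReLU activation and average pooling compared in this figure can be sketched as below. The pooling window size and the default non-overlapping stride are hypothetical parameters chosen for illustration; the caption does not state the settings that were compared.

```python
def relu(x):
    """Element-wise rectified linear unit on a list of channels."""
    return [[max(0.0, v) for v in row] for row in x]

def avg_pool1d(x, size, stride=None):
    """Average pooling along the m/z axis of each channel (non-overlapping by default).
    Trailing bins that do not fill a whole window are dropped."""
    stride = stride or size
    return [[sum(row[i:i + size]) / size
             for i in range(0, len(row) - size + 1, stride)]
            for row in x]

feature_map = [[-1.0, 2.0, 3.0, -4.0, 5.0, 6.0]]  # toy single-channel feature map
print(avg_pool1d(relu(feature_map), size=2))
```

ReLU zeroes negative responses, and average pooling then halves the spectral resolution while smoothing neighbouring bins, which shortens the sequence that later layers must process.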
Figure 3. Comparison of model performance in terms of accuracy (A) and AUC (B) for models with different channel sizes and dropout rates. Differences among the compared settings were tested using one-way ANOVA followed by the post-hoc Tukey HSD test for pairwise differences (p < 0.05). Different letters indicate a significant difference.
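The statistics behind the letter groupings in these captions, a one-way ANOVA followed by Tukey's HSD, can be sketched as follows. This assumes each group holds the accuracies from repeated runs of one model configuration (the captions do not state the replicate unit), and the values are toy numbers. The sketch stops at the F and studentized-range (q) statistics because the Python standard library has no F or studentized-range distribution; in recent SciPy versions, `scipy.stats.f.sf`, `scipy.stats.studentized_range`, or `scipy.stats.tukey_hsd` can supply the p-values.

```python
import math
from itertools import combinations

def one_way_anova(groups):
    """Return (F, df_between, df_within, MS_within) for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_b, df_w = k - 1, n - k
    ms_within = ss_within / df_w
    return (ss_between / df_b) / ms_within, df_b, df_w, ms_within

def tukey_q(groups, ms_within):
    """Studentized-range statistic for every pair (Tukey-Kramer form, which
    reduces to |mean_i - mean_j| / sqrt(MS_within / n) for equal group sizes)."""
    means = [sum(g) / len(g) for g in groups]
    out = {}
    for i, j in combinations(range(len(groups)), 2):
        se = math.sqrt(ms_within / 2 * (1 / len(groups[i]) + 1 / len(groups[j])))
        out[(i, j)] = abs(means[i] - means[j]) / se
    return out

# Toy accuracies (%) from three runs of three hypothetical configurations.
acc = [[80, 81, 82], [83, 84, 85], [86, 87, 88]]
F, df_b, df_w, msw = one_way_anova(acc)
print(F, df_b, df_w)
print(tukey_q(acc, msw))
```

A pair is declared different when its q exceeds the critical studentized-range value for k groups and df_within degrees of freedom; configurations that are not significantly different share a letter in the figure.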
Figure 4. Comparison of prediction performance in terms of accuracy (A) and AUC (B) across different models. Differences among models were tested using one-way ANOVA followed by the post-hoc Tukey HSD test for pairwise differences (p < 0.05). Different letters indicate a significant difference.
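The two metrics compared across models, accuracy and AUC, can be computed as in the sketch below. The labels, predicted probabilities, and the 0.5 decision threshold are hypothetical; the AUROC uses its Mann-Whitney interpretation (the probability that a resistant isolate scores higher than a susceptible one, counting ties as one half), which is threshold-free.

```python
def auroc(labels, scores):
    """Probability that a random resistant (label 1) isolate scores higher than a
    random susceptible (label 0) one; ties count one half (Mann-Whitney form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the label."""
    hits = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return hits / len(labels)

labels = [1, 1, 0, 0]          # toy labels: 1 = VREfm, 0 = VSEfm
scores = [0.9, 0.4, 0.6, 0.1]  # toy predicted probabilities of resistance
print(auroc(labels, scores), accuracy(labels, scores))
```

The pairwise loop is quadratic in the number of isolates, which is fine for a sketch; rank-based implementations such as scikit-learn's `roc_auc_score` scale better.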
Figure 5. Feature selection based on Score-CAM. Each row of m/z peaks shows the important features (top 1% of weights) of a different model. Panels (A) and (B) show the important features of our CNN models for classifying vancomycin-resistant E. faecium (VREfm) and vancomycin-susceptible E. faecium (VSEfm) in the last and top 500 testing cases, respectively. Rectangles mark m/z ranges selected as important by at least 8 independent CNN models using Score-CAM. Triangles mark m/z signals previously reported in the literature as markers for discriminating VREfm strains: blue, Wei et al. (2014); black, Lasch et al. (2014); orange, Wang et al. (2020b). (C) Distributions of the normalized intensities of informative features in 500 resistant and 500 susceptible isolates. Asterisks indicate a statistically significant difference in feature intensity between resistant and susceptible isolates (Wilcoxon rank-sum test): * q-value < 0.05, ** q-value < 0.01, *** q-value < 0.001.
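The analysis described in this caption can be sketched in three steps: keep m/z bins selected by at least 8 models, compare feature intensities between resistant and susceptible isolates with a Wilcoxon rank-sum test, and convert p-values to q-values. Several parts are assumptions: the bin indices and intensities are toy data, the rank-sum test uses the normal approximation without continuity or tie correction, and the q-values use the Benjamini-Hochberg procedure, since the caption does not say how they were computed. Only the threshold of 8 models comes from the caption.

```python
import math
from collections import Counter

def consensus_bins(selected_per_model, min_models=8):
    """m/z bins selected (e.g. by top 1% Score-CAM weight) by at least `min_models` models."""
    counts = Counter(b for selected in selected_per_model for b in set(selected))
    return sorted(b for b, c in counts.items() if c >= min_models)

def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation
    (no continuity or tie correction to the variance)."""
    n1, n2 = len(x), len(y)
    w = sum(midranks(list(x) + list(y))[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return z, math.erfc(abs(z) / math.sqrt(2))

def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        q[i] = running
    return q

# Toy example: 10 hypothetical models; bin 2 chosen by 9 of them, bin 6 by 3, bin 5 by 1.
selections = [[2, 6]] * 3 + [[2]] * 6 + [[5]]
print(consensus_bins(selections))
resistant = [5.0, 6.0, 7.0, 8.0]    # toy normalized intensities of one feature
susceptible = [1.0, 2.0, 3.0, 4.0]
z, p = rank_sum_test(resistant, susceptible)
print(round(z, 3), bh_qvalues([p, 0.04, 0.5]))
```

With hundreds of isolates per group, as in panel (C), the normal approximation is reasonable; for small samples, an exact test (for example SciPy's `mannwhitneyu` with `method="exact"`) is preferable.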