Hsin-Yao Wang1,2, Chia-Ru Chung3, Chao-Jung Chen4,5, Ko-Pei Lu6, Yi-Ju Tseng1, Tzu-Hao Chang7,8, Min-Hsien Wu1,8,9,10,11, Wan-Ting Huang12, Ting-Wei Lin1, Tsui-Ping Liu1, Tzong-Yi Lee13,14, Jorng-Tzong Horng1,3,15, Jang-Jih Lu1,16,17.
Abstract
Enterococcus faecium is a clinically important pathogen that can cause significant morbidity and death. In this study, we aimed to develop a rapid, machine learning (ML) algorithm-based susceptibility method to distinguish vancomycin-resistant E. faecium (VREfm) from vancomycin-susceptible E. faecium (VSEfm) strains. A predictive model was developed and validated to distinguish VREfm from VSEfm strains by analyzing the matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry (MS) spectra of unique E. faecium isolates from different specimen types. The algorithm was developed with 5,717 mass spectra, including 2,795 VREfm and 2,922 VSEfm mass spectra, and was externally validated with 2,280 mass spectra of isolates (1,222 VREfm and 1,058 VSEfm strains). A random forest (RF)-based algorithm demonstrated good overall classification performance for the isolates from the specimens, with mean accuracy, sensitivity, and specificity of 0.78, 0.79, and 0.77, respectively, across 10-fold cross-validation, timewise validation, and external validation. Furthermore, the algorithm provided rapid results, which would allow susceptibility prediction before phenotypic susceptibility results are available. In conclusion, an ML algorithm designed using mass spectra obtained from the routine workflow may be able to rapidly differentiate VREfm strains from VSEfm strains; however, given the demonstrated performance of the assay, susceptibility results must be confirmed by routine methods.
IMPORTANCE
A modified binning method was incorporated to cluster shifting MS ions into a set of representative peaks, based on a large-scale MS data set of clinical isolates that included 2,795 VREfm and 2,922 VSEfm isolates. Predictions with the algorithm were significantly more accurate than empirical antibiotic use, whose accuracy, based on the local epidemiology, was 0.50. The algorithm improved the accuracy of antibiotic administration compared with empirical antibiotic prescription. An ML algorithm designed using MALDI-TOF MS spectra obtained from the routine workflow accurately differentiated VREfm strains from VSEfm strains, especially in blood and sterile body fluid samples, and can be applied to facilitate rapid and accurate clinical testing of pathogens.
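The binning step groups peaks whose m/z values drift slightly between spectra into shared representative peaks, so every spectrum can be expressed as a fixed-length feature vector. The authors' modified binning method is not described in this excerpt; the sketch below is a generic greedy alternative, and the function names (`bin_peaks`, `to_feature_vector`) and the 1,000 ppm tolerance are hypothetical choices for illustration only.

```python
from statistics import mean


def bin_peaks(spectra, tol_ppm=1000.0):
    """Greedily merge peaks whose m/z lies within tol_ppm of the current bin's
    mean m/z; each bin's mean becomes one representative peak.
    tol_ppm is a hypothetical tolerance, not a value from the study."""
    all_mz = sorted(mz for spectrum in spectra for mz, _ in spectrum)
    bins = []
    for mz in all_mz:
        if bins:
            centre = mean(bins[-1])
            if abs(mz - centre) / centre * 1e6 <= tol_ppm:
                bins[-1].append(mz)
                continue
        bins.append([mz])
    return [mean(b) for b in bins]


def to_feature_vector(spectrum, centres, tol_ppm=1000.0):
    """Map one spectrum onto the representative peaks: the strongest intensity
    within tol_ppm of a centre, or 0.0 if that peak is absent."""
    vec = [0.0] * len(centres)
    for mz, intensity in spectrum:
        # nearest representative peak for this observed m/z
        i = min(range(len(centres)), key=lambda k: abs(centres[k] - mz))
        if abs(mz - centres[i]) / centres[i] * 1e6 <= tol_ppm:
            vec[i] = max(vec[i], intensity)
    return vec


# Two toy spectra as (m/z, intensity) pairs; the first peaks differ by 1.5 Da.
spectra = [
    [(3645.0, 120.0), (7289.0, 40.0)],
    [(3646.5, 100.0), (5000.0, 10.0)],
]
centres = bin_peaks(spectra)
features = [to_feature_vector(s, centres) for s in spectra]
```

Here the peaks at m/z 3,645.0 and 3,646.5 fall into one bin, so both spectra share a feature at that position. A greedy running-mean scheme can drift across wide clusters; a production pipeline would need a tolerance validated against the instrument's mass accuracy.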
Keywords: Enterococcus faecium; antibacterial drug resistance; clinical methods; machine learning; matrix-assisted laser desorption ionization–time of flight (MALDI-TOF) mass spectrometry; microbiology; rapid detection; vancomycin resistance; vancomycin-resistant Enterococcus faecium
Year: 2021 PMID: 34756065 PMCID: PMC8579932 DOI: 10.1128/Spectrum.00913-21
Source DB: PubMed Journal: Microbiol Spectr ISSN: 2165-0497
FIG 1(a) Heat map. We selected the top 10 discriminative peaks by chi-square testing of the occurrence frequency of peaks in VREfm and VSEfm (see Table S2 in the supplemental material). The heat map was plotted based on hierarchical clustering of all of the VREfm and VSEfm isolates from the CGMH Linkou branch. Rows represent the isolates, and columns represent the top 10 discriminative peaks. The values in the heat map represent the MS spectral intensity, which was log10 normalized and Z-score standardized. Red indicates relatively higher peak intensity, while blue indicates lower peak intensity. The isolates are grouped into five clusters. VREfm and VSEfm isolates can be visually differentiated by using the top 10 discriminative peaks. (b) Intensity of the top 10 important predictors. The base-10 logarithms of the peak intensities are plotted for VREfm and VSEfm. (c) Occurrence frequency of the top 10 important predictors. The occurrence frequency of the 10 peaks in VREfm and VSEfm is plotted.
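The peak ranking and heat-map scaling described in the caption can be sketched as follows. This assumes a Pearson chi-square on a 2x2 table of peak presence versus class (without continuity correction) and a log10(x + 1) transform so that absent peaks (intensity 0) stay finite; the exact test settings and pseudocount used by the authors are not given here, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def chi_square_presence(vre_present, vse_present):
    """Pearson chi-square (no continuity correction) for a 2x2 table of
    peak present/absent versus VREfm/VSEfm."""
    a = sum(vre_present)                 # VREfm isolates with the peak
    b = len(vre_present) - a             # VREfm isolates without the peak
    c = sum(vse_present)                 # VSEfm isolates with the peak
    d = len(vse_present) - c             # VSEfm isolates without the peak
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def top_k_peaks(vre_matrix, vse_matrix, k=10):
    """Rank binned peaks (columns) by chi-square; intensity 0 means absent."""
    scores = []
    for j in range(len(vre_matrix[0])):
        vre = [row[j] > 0 for row in vre_matrix]
        vse = [row[j] > 0 for row in vse_matrix]
        scores.append((chi_square_presence(vre, vse), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def log10_zscore(column):
    """Heat-map style scaling of one peak: log10(x + 1), then Z-score."""
    logged = [math.log10(x + 1.0) for x in column]
    mu, sd = mean(logged), pstdev(logged)
    return [0.0 if sd == 0 else (v - mu) / sd for v in logged]


# Toy intensities: rows are isolates, columns are three binned peaks.
vre_matrix = [[5, 0, 3], [4, 0, 2], [6, 0, 0]]
vse_matrix = [[0, 2, 3], [0, 3, 1], [1, 2, 2]]
print(top_k_peaks(vre_matrix, vse_matrix, k=2))  # [1, 0]
```

Peak 1 ranks first because it is absent from every toy VREfm isolate and present in every VSEfm isolate, which is the kind of occurrence-frequency contrast panel (c) plots.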
Performance of VREfm prediction models in terms of k-fold CV, timewise validation, and external validation
| Evaluation metric | RF model | SVM model | KNN model |
|---|---|---|---|
| AUROC (95% CI) | |||
| 5-fold CV | 0.8495 (0.8397–0.8594) | 0.8367 (0.8264–0.8471) | 0.7908 (0.7792–0.8024) |
| 10-fold CV | 0.8491 (0.8392–0.8589) | 0.8338 (0.8234–0.8442) | 0.7589 (0.7468–0.7710) |
| Timewise validation | 0.8463 (0.8273–0.8654) | 0.8368 (0.8169–0.8566) | 0.7908 (0.7690–0.8127) |
| External validation | 0.8553 (0.8399–0.8706) | 0.8407 (0.8246–0.8569) | 0.8050 (0.7872–0.8227) |
| Accuracy (95% CI) | |||
| 5-fold CV | 0.7769 (0.7660–0.7878) | 0.7610 (0.7499–0.7721) | 0.7248 (0.7131–0.7364) |
| 10-fold CV | 0.7789 (0.7608–0.7827) | 0.7587 (0.7476–0.7699) | 0.6906 (0.6786–0.7027) |
| Timewise validation | 0.7840 (0.7640–0.8039) | 0.7815 (0.7615–0.8016) | 0.7228 (0.7011–0.7445) |
| External validation | 0.7855 (0.7687–0.8024) | 0.7781 (0.7610–0.7951) | 0.7355 (0.7174–0.7536) |
| Sensitivity (95% CI) | |||
| 5-fold CV | 0.8054 (0.7951–0.8517) | 0.7826 (0.7719–0.7934) | 0.7873 (0.7767–0.7980) |
| 10-fold CV | 0.7863 (0.7756–0.7969) | 0.8192 (0.8091–0.8292) | 0.7096 (0.6978–0.7214) |
| Timewise validation | 0.8153 (0.7965–0.8341) | 0.8415 (0.8238–0.8592) | 0.7491 (0.7281–0.7702) |
| External validation | 0.7791 (0.7620–0.7961) | 0.7954 (0.7789–0.8120) | 0.8044 (0.7881–0.8207) |
| Specificity (95% CI) | |||
| 5-fold CV | 0.7497 (0.7384–0.7609) | 0.7403 (0.7289–0.7517) | 0.6649 (0.6526–0.6772) |
| 10-fold CV | 0.7789 (0.7680–0.7897) | 0.7009 (0.6890–0.7128) | 0.6725 (0.6603–0.6848) |
| Timewise validation | 0.7477 (0.7266–0.7688) | 0.7120 (0.6900–0.7340) | 0.6922 (0.6698–0.7146) |
| External validation | 0.7930 (0.7764–0.8096) | 0.7580 (0.7405–0.7756) | 0.6560 (0.6365–0.6755) |
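Sensitivity is the fraction of VREfm spectra predicted as VREfm, specificity is the fraction of VSEfm spectra predicted as VSEfm, and accuracy is their class-size-weighted mean. The sketch below computes all three from a confusion matrix. The method behind the reported 95% CIs is not stated in this excerpt, so the normal-approximation (Wald) interval used here is an assumption, and the helper names are hypothetical. As a consistency check, weighting the RF external-validation sensitivity and specificity by the 1,222 VREfm and 1,058 VSEfm spectra gives approximately 0.7856, in line with the reported accuracy of 0.7855 given rounding of the inputs.

```python
import math


def proportion_ci(successes, total, z=1.96):
    """Point estimate with a normal-approximation (Wald) 95% CI (assumed method)."""
    p = successes / total
    half = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half), min(1.0, p + half)


def confusion_metrics(tp, fn, tn, fp):
    """VREfm is the positive class: sensitivity = TP/(TP+FN),
    specificity = TN/(TN+FP), accuracy = (TP+TN)/all."""
    return {
        "sensitivity": proportion_ci(tp, tp + fn),
        "specificity": proportion_ci(tn, tn + fp),
        "accuracy": proportion_ci(tp + tn, tp + fn + tn + fp),
    }


def accuracy_from_rates(sens, spec, n_pos, n_neg):
    """Accuracy as the class-size-weighted mean of sensitivity and specificity."""
    return (sens * n_pos + spec * n_neg) / (n_pos + n_neg)


# RF external validation figures from the table: 1,222 VREfm and 1,058 VSEfm spectra.
acc = accuracy_from_rates(0.7791, 0.7930, 1222, 1058)
print(round(acc, 4))
```

The weighted-mean identity is also a quick way to spot transcription slips in a metrics table: a row whose accuracy cannot be reproduced from its sensitivity, specificity, and class sizes deserves a second look.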
Performance of the RF-based VREfm detection model with different types of specimens in terms of external validation
| Metric | Blood samples | Urinary tract samples | Sterile body fluid samples | Wound samples |
|---|---|---|---|---|
| AUROC (95% CI) | 0.9103 (0.8727–0.9480) | 0.8494 (0.8258–0.8731) | 0.8714 (0.8321–0.9106) | 0.8432 (0.8121–0.8743) |
| Accuracy (95% CI) | 0.8488 (0.7997–0.8978) | 0.7743 (0.7482–0.8004) | 0.8077 (0.7657–0.8497) | 0.7740 (0.7436–0.8043) |
| Sensitivity (95% CI) | 0.8870 (0.8436–0.9303) | 0.7672 (0.7409–0.7936) | 0.7788 (0.7345–0.8230) | 0.7339 (0.7018–0.7659) |
| Specificity (95% CI) | 0.8000 (0.7452–0.8548) | 0.7805 (0.7547–0.8063) | 0.8222 (0.7815–0.8630) | 0.8676 (0.8430–0.8922) |
FIG 2(a) ROC curves for different algorithms in terms of Linkou 5-fold CV. (b) ROC curves for different algorithms in terms of timewise validation. (c) ROC curves for different algorithms in terms of external validation. (d) ROC curves for the RF-based VREfm model with the isolates from different types of specimens.
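Each ROC curve plots the true-positive rate against the false-positive rate as the decision threshold on the model score is lowered, and the AUROC values in the tables summarize the area under that curve. The sketch below builds the curve by sweeping over distinct scores (tied scores move the curve diagonally) and integrates it with the trapezoidal rule. It is a generic illustration with hypothetical names and toy data, not the authors' evaluation code, and it assumes both classes are present.

```python
def roc_curve(scores, labels):
    """Return (FPR, TPR) points by sweeping the threshold over distinct scores.
    labels: 1 = VREfm (positive), 0 = VSEfm (negative); both must occur."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        thr = pairs[i][0]
        # consume every sample sharing this score before emitting a point
        while i < len(pairs) and pairs[i][0] == thr:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auroc(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]   # toy predicted VREfm probabilities
labels = [1, 1, 0, 1, 0, 0]
auc = auroc(roc_curve(scores, labels))
```

For this toy data, 8 of the 9 VREfm/VSEfm pairs are ranked correctly, so the area equals 8/9, matching the rank-based (Mann-Whitney) interpretation of AUROC.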
FIG 3 MALDI-TOF MS analysis of the C4 LC fractions 8 to 10. The peak at m/z 3,645 and its singly charged protein peak (m/z 7,289) are evident in fraction 9.
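The pairing of m/z 3,645 with a singly charged peak at m/z 7,289 follows from the standard relation m/z = (M + z·m_p)/z for an [M + zH]z+ ion, where M is the neutral mass and m_p the proton mass. The short check below (helper names hypothetical) recovers M from the singly charged peak and shows that the doubly charged ion of the same protein lands at about m/z 3,645.

```python
PROTON_MASS = 1.007276  # Da


def mz(neutral_mass, charge):
    """m/z of the [M + zH]^z+ ion for neutral mass M and charge z."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass_from_mz(observed_mz, charge):
    """Invert mz(): neutral mass M implied by an observed [M + zH]^z+ peak."""
    return observed_mz * charge - charge * PROTON_MASS


m = neutral_mass_from_mz(7289.0, 1)   # singly charged [M+H]+ at m/z 7,289
print(round(mz(m, 2)))                # doubly charged [M+2H]2+ ion
```

This prints 3645, so the two peaks are consistent with one protein of roughly 7,288 Da observed at two charge states.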
FIG 4 Nano-LC-MS/MS spectra for identification of RS14Z_ENTFA. The identified protein sequence is underlined.
FIG 5(a) Schematic illustration of the application of the VREfm model. A timeline of bacterial culture testing using currently used clinical tests (i.e., the traditional approach) and a modified timeline incorporating the VREfm model (i.e., the ML approach) are shown. In the traditional approach, specimens are collected for bacterial culture testing. Usually, 1 day is needed for a single colony to grow for species identification (by MALDI-TOF MS), and vancomycin antimicrobial susceptibility testing (AST) for VREfm requires another day. In contrast, in the ML approach, the VREfm model can provide preliminary AST results as soon as the bacterial species is identified by MALDI-TOF MS. In the treatment of VREfm, the ML approach can improve the accuracy of antibiotic use. Meanwhile, the turnaround time of the bacterial culture test can be reduced from 2 days to 1 day, a 50% reduction. (b) Schematic illustration of the study design. The study included several steps: data collection; data preprocessing; predictor candidate extraction and important predictor selection; and model training, evaluation, and testing. Data were obtained from two tertiary medical centers (the Linkou and Kaohsiung branches of CGMH) and included the mass spectra and vancomycin susceptibility testing results of E. faecium strains. Data from the CGMH Linkou branch were used for model training and validation, while data from the CGMH Kaohsiung branch served as independent testing data. Through data preprocessing, predictor candidate extraction, and important predictor selection, a specific set of crucial predictors was obtained for model training. k-fold CV, timewise validation, and external validation were used to confirm the models' robustness. The VREfm prediction model can detect VREfm accurately at least 1 day earlier than the current method.
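The validation scheme combines k-fold CV on the Linkou data, a timewise split in which earlier isolates train the model and later isolates test it, and external testing on the Kaohsiung data (which needs no special split: train on all Linkou spectra, test on all Kaohsiung spectra). The sketch below illustrates the first two splits. Stratifying folds by class and using a single cutoff date are assumptions, since the exact fold assignment and cutoff are not given here; all names are hypothetical.

```python
import random
from datetime import date


def stratified_kfold(labels, k=5, seed=0):
    """Assign each sample to one of k folds so that the VREfm/VSEfm ratio is
    similar in every fold. Returns one fold index per sample."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for rank, i in enumerate(idx):
            folds[i] = rank % k          # deal samples round-robin into folds
    return folds


def kfold_splits(folds, k):
    """Yield (train, test) index lists, holding out one fold at a time."""
    for f in range(k):
        train = [i for i, g in enumerate(folds) if g != f]
        test = [i for i, g in enumerate(folds) if g == f]
        yield train, test


def timewise_split(dates, cutoff):
    """Train on isolates collected before cutoff; test on the rest."""
    train = [i for i, d in enumerate(dates) if d < cutoff]
    test = [i for i, d in enumerate(dates) if d >= cutoff]
    return train, test


labels = [1] * 6 + [0] * 4               # 1 = VREfm, 0 = VSEfm (toy data)
folds = stratified_kfold(labels, k=2)
for train, test in kfold_splits(folds, k=2):
    print(len(train), len(test))

dates = [date(2016, 3, 1), date(2017, 6, 1), date(2018, 2, 1)]
train, test = timewise_split(dates, cutoff=date(2018, 1, 1))
```

The timewise split guards against a model that only works on spectra from the same period it was trained on, which random k-fold splits cannot detect.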