Hsin-Yao Wang, Tzong-Yi Lee, Yi-Ju Tseng, Tsui-Ping Liu, Kai-Yao Huang, Yung-Ta Chang, Chun-Hsien Chen, Jang-Jih Lu.
Abstract
Methicillin-resistant Staphylococcus aureus (MRSA), one of the most important clinical pathogens, causes increasing morbidity and mortality worldwide. Rapid and accurate strain typing of bacteria would facilitate epidemiological investigation and infection control in near real time. Matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry is a rapid and cost-effective tool for presumptive strain typing. Machine learning (ML) is a promising approach for building robust predictive models for strain typing from MALDI-TOF spectra. In this study, a strategy of building templates of specific types was used to facilitate the generation of predictive models for MRSA strain typing with various ML methods. The strain types of the isolates were determined by multilocus sequence typing (MLST). The models were compared by the area under the receiver operating characteristic curve (AUC) and by predictive accuracy. ST5, ST59, and ST239 were the major MLST types, and ST45 was the minor type. For binary classification, the AUC values of the various ML methods ranged from 0.76 to 0.99 for the ST5, ST59, and ST239 types. In multiclass classification, the predictive accuracy of all generated models exceeded 0.83. This study demonstrates that ML methods can serve as a cost-effective and promising tool for providing preliminary strain typing information on major MRSA lineages on the basis of MALDI-TOF spectra.
Year: 2018 PMID: 29534106 PMCID: PMC5849341 DOI: 10.1371/journal.pone.0194289
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
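The binary-classification results in the abstract are one-vs-rest: each ST type is scored against all other isolates, and the AUC summarizes how well those scores separate the two groups. The sketch below is a hypothetical illustration, not the study's code. It uses the rank-based definition of AUC (the probability that a randomly chosen positive isolate outscores a randomly chosen negative one, with ties counting as half), under which a classifier that gives every isolate the same score gets exactly 0.5. The function names and example scores are invented for illustration.

```python
from typing import Sequence


def one_vs_rest_labels(st_types: Sequence[str], target: str) -> list[int]:
    """Relabel multiclass ST types as 1 (target type) or 0 (any other type)."""
    return [1 if st == target else 0 for st in st_types]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win, which matches the trapezoidal ROC area.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical isolates and classifier scores for the question "is ST239?"
    st_types = ["ST239", "ST5", "ST239", "ST59", "ST45"]
    scores = [0.9, 0.4, 0.7, 0.7, 0.1]
    y = one_vs_rest_labels(st_types, "ST239")
    print(roc_auc(y, scores))
```

The pairwise loop is quadratic in the number of isolates, which is fine for a clinical collection of this size; a sort-based rank computation gives the same value more efficiently.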
Fig 1. Decision tree structure constructed to illustrate the discriminative features.
At the branches of internal nodes, “P” denotes “positive” (a peak is present) and “N” denotes “negative” (a peak is absent).
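The P/N branches in Fig 1 imply that each spectrum is reduced to the presence or absence of peaks at fixed m/z positions. A minimal sketch of that binarization follows. The peak list is taken from the Fig 2 caption; the ±5 m/z matching window (`TOLERANCE_MZ`) is a hypothetical choice, since the study's peak-matching tolerance and preprocessing are not stated here.

```python
from typing import Sequence

# Feature peaks (m/z) listed in the Fig 2 caption.
FEATURE_PEAKS = [1695, 2066, 2451, 2978, 3176, 3891, 4074, 4813, 6550]
TOLERANCE_MZ = 5.0  # hypothetical matching window, not reported by the study


def peak_presence(observed_mz: Sequence[float],
                  features: Sequence[float] = FEATURE_PEAKS,
                  tol: float = TOLERANCE_MZ) -> dict[float, str]:
    """Map each feature peak to "P" if an observed peak lies within tol, else "N"."""
    result = {}
    for f in features:
        present = any(abs(mz - f) <= tol for mz in observed_mz)
        result[f] = "P" if present else "N"
    return result


if __name__ == "__main__":
    # Hypothetical peak list picked from one spectrum.
    print(peak_presence([1696.2, 4072.0, 6551.5, 9000.0]))
```

A decision tree like the one in Fig 1 would then branch on these P/N values; the window width trades off instrument calibration drift against the risk of merging neighboring peaks.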
Fig 2. Radar chart of the average log-transformed intensities of the selected feature peaks.
P1695, P2066, P2451, P2978, P3176, P3891, P4074, P4813, and P6550 represent the signal peaks at 1695, 2066, 2451, 2978, 3176, 3891, 4074, 4813, and 6550 m/z, respectively.
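The radar chart averages log-transformed peak intensities within each ST type. The caption does not state the transform, so the sketch below assumes log10(1 + intensity), which maps an absent peak (intensity 0) to 0. The function name and input layout are hypothetical.

```python
import math
from collections import defaultdict
from typing import Mapping, Sequence


def mean_log_intensity(
    isolates: Sequence[tuple[str, Mapping[int, float]]],
    peaks: Sequence[int],
) -> dict[str, dict[int, float]]:
    """Average log10(1 + intensity) per ST type and peak.

    The base and the +1 offset are assumptions; a missing peak counts as 0.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for st, intensities in isolates:
        counts[st] += 1
        for p in peaks:
            sums[st][p] += math.log10(1.0 + intensities.get(p, 0.0))
    return {st: {p: sums[st][p] / counts[st] for p in peaks} for st in counts}


if __name__ == "__main__":
    # Hypothetical intensities: (ST type, {peak m/z: intensity}).
    data = [("ST5", {6550: 9.0}), ("ST5", {6550: 99.0}), ("ST239", {4074: 999.0})]
    print(mean_log_intensity(data, [4074, 6550]))
```

Each inner dictionary is one polygon on the radar chart, with one spoke per peak.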
Fig 3. Three-dimensional scatter plot of isolates of the various ST types.
P6550, P4074, and P4813 represent peak signals at 6550, 4074, and 4813 m/z, respectively.
Fig 4. Principal component analysis (PCA) comparison of isolates of the various ST types.
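PCA projects high-dimensional peak vectors onto the directions of greatest variance. As a hedged, dependency-free sketch (not the method used to draw Fig 4), the code below finds the first principal component by power iteration on the sample covariance matrix and projects each isolate onto it. Additional components would need deflation or a full eigendecomposition, and a library implementation would normally be preferred in practice.

```python
import math
from typing import Sequence


def center(rows: Sequence[Sequence[float]]) -> list[list[float]]:
    """Subtract each feature's column mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def first_principal_component(rows: Sequence[Sequence[float]],
                              iters: int = 200) -> list[float]:
    """Leading eigenvector of the sample covariance matrix via power iteration."""
    x = center(rows)
    n, d = len(x), len(x[0])
    cov = [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # start vector; assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            raise ValueError("covariance is zero along the current direction")
        v = [t / norm for t in w]
    return v


def pc1_scores(rows: Sequence[Sequence[float]]) -> list[float]:
    """Project each centered isolate onto the first principal component."""
    v = first_principal_component(rows)
    return [sum(a * b for a, b in zip(r, v)) for r in center(rows)]


if __name__ == "__main__":
    # Hypothetical log-intensity vectors for three peaks.
    rows = [[1.2, 0.3, 2.0], [1.0, 0.4, 1.8], [0.2, 1.5, 0.5], [0.1, 1.7, 0.4]]
    print(pc1_scores(rows))
```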
AUC of binary classification ML models for different ST types.
| ML model | ST5 AUC | ST5 SE | ST45 AUC | ST45 SE | ST59 AUC | ST59 SE | ST239 AUC | ST239 SE |
|---|---|---|---|---|---|---|---|---|
| | 0.762 | 0.079 | 0.500 | 0.000 | 0.933 | 0.021 | 0.895 | 0.019 |
| | 0.963 | 0.018 | 0.919 | 0.039 | 0.975 | 0.015 | 0.991 | 0.005 |
| | 0.953 | 0.033 | 0.789 | 0.114 | 0.958 | 0.026 | 0.967 | 0.020 |
AUC: area under ROC curve; SE: standard error.
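The table reports each AUC with a standard error, but the source does not say how the SE was obtained. One common convention, assumed here, is to compute the metric on each cross-validation fold and report the mean with SE = sample SD / √k for k folds. Under that convention, an AUC of exactly 0.500 with an SE of 0.000 arises whenever every fold yields an AUC of 0.5, for example when a model assigns all isolates the same score. The function name is hypothetical.

```python
import math
import statistics
from typing import Sequence


def mean_and_se(fold_values: Sequence[float]) -> tuple[float, float]:
    """Mean of per-fold metric values and its standard error (SD / sqrt(k))."""
    if len(fold_values) < 2:
        raise ValueError("need at least two folds to estimate an SE")
    mean = statistics.mean(fold_values)
    se = statistics.stdev(fold_values) / math.sqrt(len(fold_values))
    return mean, se


if __name__ == "__main__":
    # Hypothetical per-fold AUCs from a 5-fold cross-validation.
    print(mean_and_se([0.95, 0.97, 0.99, 0.96, 0.98]))
```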
Accuracy of multiclass classification using various ML methods.
| ML method | ACC | SE |
|---|---|---|
| | 0.832 | 0.015 |
| | 0.864 | 0.020 |
| | 0.848 | 0.027 |
ACC: accuracy; SE: standard error.
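Overall multiclass accuracy is the fraction of isolates assigned their MLST-determined ST type. Because ST45 is a minor type, a model can reach high overall accuracy while misclassifying every ST45 isolate, so per-class recall is a useful companion metric. The sketch and the label counts below are hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import Sequence


def accuracy(true: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of isolates whose predicted ST type matches the MLST type."""
    if len(true) != len(pred) or not true:
        raise ValueError("need equal-length, non-empty label lists")
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def per_class_recall(true: Sequence[str], pred: Sequence[str]) -> dict[str, float]:
    """Share of each true ST type that was predicted correctly."""
    totals = Counter(true)
    hits = Counter(t for t, p in zip(true, pred) if t == p)
    return {st: hits[st] / n for st, n in totals.items()}


if __name__ == "__main__":
    # Hypothetical labels: the single ST45 isolate is misclassified as ST5.
    true = ["ST5"] * 4 + ["ST59"] * 4 + ["ST239"] * 4 + ["ST45"]
    pred = ["ST5"] * 4 + ["ST59"] * 4 + ["ST239"] * 4 + ["ST5"]
    print(accuracy(true, pred), per_class_recall(true, pred))
```

Here overall accuracy is 12/13 even though recall for ST45 is zero, which is why accuracy alone can mask poor performance on minor lineages.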