Po-Hsin Kong1,2, Cheng-Hsiung Chiang3, Ting-Chia Lin2,4, Shu-Chen Kuo5, Chien-Feng Li6, Chao A Hsiung3, Yow-Ling Shiue1,4, Hung-Yi Chiou3,7,8, Li-Ching Wu1,2, Hsiao-Hui Tsou3,9.
Abstract
Early administration of appropriate antibiotics is considered to improve the clinical outcomes of Staphylococcus aureus bacteremia (SAB), but routine clinical antimicrobial susceptibility testing takes an additional 24 h after species identification. Recent studies have used matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectra to discriminate methicillin-resistant S. aureus (MRSA) strains, in some cases combined with machine learning (ML) techniques. However, no universally applicable mass peaks have been identified, which means that a discrimination model may need to be established or calibrated with local strains' data. Here, we provide a clinically feasible workflow. We collected mass spectra from SAB patients over an 8-month period and preprocessed them by binning with reference peaks. ML models were trained on samples from the first six months and tested independently on samples from the following two months. The model parameters were optimized with a genetic algorithm (GA). In independent testing, the best model, a support vector machine (SVM) with optimal parameters, achieved an accuracy, sensitivity, specificity, and AUC of 87%, 75%, 95%, and 87%, respectively. In summary, almost all resistant results were truly resistant, implying that physicians might escalate antibiotics for MRSA 24 h earlier. This report presents an attainable method for clinical laboratories to build an MRSA model and boost its performance using their local data.
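The abstract states only that spectra were "preprocessed by binning with reference peaks". The sketch below illustrates one plausible reading of that step, assuming each observed peak is assigned to the nearest reference m/z within a fixed tolerance and that intensities falling into the same bin are summed. The function name `bin_spectrum`, the 5 Da tolerance, and the summing rule are hypothetical choices, not the authors' published procedure. The reference positions reuse four mean m/z values from the feature table below; the peak list is invented for illustration.

```python
from bisect import bisect_left

def bin_spectrum(peaks, reference_mz, tol_da=5.0):
    """Project a peak list onto a fixed vector of reference m/z positions.

    Assumed rule (illustrative only): each (m/z, intensity) peak goes to the
    nearest reference peak if it lies within tol_da daltons; intensities that
    land in the same bin are summed, and unmatched peaks are discarded.
    """
    refs = sorted(reference_mz)
    features = [0.0] * len(refs)
    if not refs:
        return features
    for mz, intensity in peaks:
        i = bisect_left(refs, mz)
        # Nearest reference is either just below (i - 1) or at/above (i) mz.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(refs)]
        best = min(candidates, key=lambda j: abs(refs[j] - mz))
        if abs(refs[best] - mz) <= tol_da:
            features[best] += intensity
    return features

reference = [3211.9, 4591.0, 5304.9, 6593.5]
spectrum = [(3212.4, 120.0), (4590.2, 80.0), (5000.0, 40.0), (6594.1, 300.0)]
print(bin_spectrum(spectrum, reference))  # -> [120.0, 80.0, 0.0, 300.0]
```

Every spectrum then becomes a vector of the same length, one entry per reference peak, which is what allows the classifiers to be trained on spectra collected on different days. The peak at m/z 5000.0 is dropped because no reference lies within the tolerance.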
Keywords: MALDI-TOF MS; Staphylococcus aureus bacteremia; antimicrobial susceptibility testing; binning method; machine learning; methicillin-resistant Staphylococcus aureus
Year: 2022 PMID: 35631107 PMCID: PMC9143686 DOI: 10.3390/pathogens11050586
Source DB: PubMed Journal: Pathogens ISSN: 2076-0817
Figure 1. The workflow of this study.
Evaluation of feature extraction by repeated random sampling from 366 samples.
| Sample No. | 36 | 72 | 144 | 216 | 288 | 366 | | |
|---|---|---|---|---|---|---|---|---|
| Total feature No. | 290 | 324 | 424 | 429 | 461 | 508 | | |
| Importance* | m/z | m/z | m/z | m/z | m/z | m/z | Mean m/z | SD# |
| 0.184 | 6594.2 | 6593.4 | 6593.5 | 6593.3 | 6593.2 | 6593.2 | 6593.5 | 0.36 |
| 0.093 | 5305.0 | 5304.8 | 5305.0 | 5304.8 | 5304.8 | 5304.8 | 5304.9 | 0.12 |
| 0.078 | 7420.9 | 7420.7 | 7420.8 | 7420.4 | 7420.7 | 7420.8 | 7420.7 | 0.19 |
| 0.071 | 6425.1 | 6424.7 | 6425.3 | 6424.8 | 6424.8 | 6424.8 | 6424.9 | 0.23 |
| 0.061 | 13183.3 | N/A | 13185.3 | 13186.5 | 13184.7 | 13185.3 | 13185.0 | 1.17 |
| 0.049 | 6929.7 | 6928.6 | 6929.2 | 6928.4 | 6928.5 | 6928.8 | 6928.9 | 0.50 |
| 0.040 | 4591.1 | 4591.0 | 4591.2 | 4590.9 | 4591.0 | 4591.0 | 4591.0 | 0.10 |
| 0.036 | 2200.8 | 2200.1 | 2200.3 | 2200.1 | 2200.1 | 2199.9 | 2200.2 | 0.31 |
| 0.031 | 3211.9 | 3211.9 | 3212.0 | 3211.8 | 3211.9 | 3211.9 | 3211.9 | 0.07 |
| 0.031 | 9182.0 | 9181.5 | 9181.7 | 9181.6 | 9181.5 | 9181.5 | 9181.6 | 0.20 |
* Feature importance calculated from DT. # Standard deviation
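The Mean and SD columns summarize how stable each feature's m/z position is across the six sampling sizes. A minimal way to recompute them from the table, assuming N/A entries are simply excluded and the sample (n − 1) standard deviation is used, is sketched below; `peak_stability` is a hypothetical helper name. Because the inputs are already rounded to 0.1, the recomputed SD for the 0.061 row comes out as 1.16 rather than the listed 1.17, presumably because the published value was computed from unrounded m/z values.

```python
import statistics

def peak_stability(mz_values):
    """Mean and sample (n - 1) standard deviation of one feature's m/z values.

    None marks an N/A entry (feature not extracted at that sample size) and is
    excluded before computing the statistics.
    """
    observed = [v for v in mz_values if v is not None]
    return statistics.mean(observed), statistics.stdev(observed)

# Row with importance 0.061; the feature was N/A at sample size 72.
mean, sd = peak_stability([13183.3, None, 13185.3, 13186.5, 13184.7, 13185.3])
print(f"mean = {mean:.1f}, SD = {sd:.2f}")  # -> mean = 13185.0, SD = 1.16
```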
The classification performance of four ML models using different feature selection strategies.
| Only Use Top 15 Important Features to Classify | | SVM | DT | RF | PR |
|---|---|---|---|---|---|
| Training | Accuracy | 0.9135 | 0.9999 | 0.9999 | 0.7900 |
| | Sensitivity | 0.8687 | 1.0000 | 1.0000 | 0.6475 |
| | Specificity | 0.9432 | 0.9998 | 0.9998 | 0.8849 |
| Testing | Accuracy | 0.7261 | 0.6943 | 0.7509 | 0.7297 |
| | Sensitivity | 0.6337 | 0.6378 | 0.6453 | 0.5852 |
| | Specificity | 0.7906 | 0.7329 | 0.8250 | 0.8298 |
| Use All Important Features to Classify | | | | | |
| Training | Accuracy | 0.9485 | 1.0000 | 1.0000 | 0.8198 |
| | Sensitivity | 0.9184 | 1.0000 | 1.0000 | 0.7083 |
| | Specificity | 0.9686 | 1.0000 | 1.0000 | 0.8939 |
| Testing | Accuracy | 0.7377 | 0.7022 | 0.7666 | 0.7373 |
| | Sensitivity | 0.6530 | 0.6421 | 0.6551 | 0.6160 |
| | Specificity | 0.7968 | 0.7437 | 0.8455 | 0.8217 |
| Use All 508 Features to Classify | | | | | |
| Training | Accuracy | 0.9939 | 1.0000 | 1.0000 | 1.0000 |
| | Sensitivity | 0.9897 | 1.0000 | 1.0000 | 1.0000 |
| | Specificity | 0.9967 | 1.0000 | 1.0000 | 1.0000 |
| Testing | Accuracy | 0.7522 | 0.6993 | 0.7241 | 0.6254 |
| | Sensitivity | 0.6686 | 0.6420 | 0.5688 | 0.5997 |
| | Specificity | 0.8121 | 0.7384 | 0.8326 | 0.6460 |
Figure 2. The classification performance of different methods, i.e., SVM (A–C), DT (D–F), RF (G–I), and PR (J–L), under different feature selection strategies. The horizontal axis represents the number of features. 15 IFs: top 15 important features; All IFs: all important features; All Fea.: all 508 features.
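The three feature sets compared in Table 2 and Figure 2 differ only in which binned features are passed to the classifier. A minimal sketch of that selection step is shown below, ranking features by their DT importance as in the feature table. The helper `select_features` and its strategy names are hypothetical, and treating "important features" as those with non-zero DT importance is an assumption, since the excerpt does not define the cut-off.

```python
def select_features(importance, strategy, k=15):
    """Return feature names (m/z labels) to keep under a given strategy.

    importance: mapping from feature name to DT feature importance.
      "top_k"   - the k features with the highest importance ("15 IFs")
      "all_ifs" - every feature with non-zero importance ("All IFs", assumed)
      "all"     - every feature ("All Fea.")
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    if strategy == "top_k":
        return ranked[:k]
    if strategy == "all_ifs":
        return [f for f in ranked if importance[f] > 0]
    if strategy == "all":
        return ranked
    raise ValueError(f"unknown strategy: {strategy!r}")

# Importances for four features from the feature table, plus two
# hypothetical zero-importance bins added for illustration.
importance = {"6593.5": 0.184, "5304.9": 0.093, "7420.7": 0.078,
              "6424.9": 0.071, "4000.0": 0.0, "8000.0": 0.0}
print(select_features(importance, "top_k", k=3))    # -> ['6593.5', '5304.9', '7420.7']
print(len(select_features(importance, "all_ifs")))  # -> 4
```

Widening the feature set in this way is consistent with the pattern in Table 2: training scores approach 1.0 as more features are kept, while testing scores do not improve in step.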
Figure 3. Chromosome design for optimizing SVM parameters. The chromosome is encoded as 29 binary bits: genes 1 to 12 (12 bits) represent the parameter C, genes 13 to 24 represent the parameter γ, and genes 25 to 29 represent the number of features N.
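The caption fixes only the gene layout, not how each bit string maps to a parameter value. The sketch below decodes a 29-gene chromosome under two assumptions labeled in the code: C and γ are mapped linearly onto hypothetical search ranges, and the 5-bit N field is offset by one so that it selects 1 to 32 features. `decode_svm_chromosome`, `C_RANGE`, and `GAMMA_RANGE` are illustrative names, not taken from the study.

```python
def bits_to_int(bits):
    """Interpret a list of 0/1 genes as an unsigned integer (most significant bit first)."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def scale(bits, lo, hi):
    """Linearly map a bit string onto the closed interval [lo, hi]."""
    return lo + bits_to_int(bits) / (2 ** len(bits) - 1) * (hi - lo)

# Hypothetical search ranges; Figure 3 gives only the gene layout.
C_RANGE = (0.01, 100.0)
GAMMA_RANGE = (0.0001, 1.0)

def decode_svm_chromosome(chrom):
    """Split a 29-gene chromosome into (C, gamma, n_features) per Figure 3."""
    if len(chrom) != 29:
        raise ValueError("expected 29 genes")
    c = scale(chrom[0:12], *C_RANGE)            # genes 1-12
    gamma = scale(chrom[12:24], *GAMMA_RANGE)   # genes 13-24
    n_features = bits_to_int(chrom[24:29]) + 1  # genes 25-29; assumed 1..32
    return c, gamma, n_features

print(decode_svm_chromosome([0] * 29))  # -> (0.01, 0.0001, 1)
```

With 12 bits per parameter, each of C and γ can take 4,096 distinct values within its range, which sets the resolution of the GA search.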
Figure 4. Chromosome design for optimizing DT parameters. The chromosome is encoded as 35 binary bits. Abbreviations: Cr, criterion; Sp, splitter; MD, max_depth; MSS, min_samples_split; MSL, min_samples_leaf; CA, ccp_alpha; N, number of features.
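Once a chromosome layout is fixed, the GA itself is independent of the model being tuned. The sketch below is a generic binary GA over 35-gene chromosomes (the DT layout in Figure 4) using tournament selection, single-point crossover, bit-flip mutation, and elitism. These operators, the rates, and `run_ga` are textbook choices assumed for illustration, not the study's reported settings. In the study the fitness would be a model-quality score for the decoded parameters; here a placeholder fitness keeps the example runnable.

```python
import random

def run_ga(fitness, n_genes=35, pop_size=30, generations=50,
           crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Maximise `fitness` over binary chromosomes of length n_genes."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]

    def tournament(k=3):
        # Pick k random individuals and keep the fittest.
        return max(rng.sample(pop, k), key=fitness)

    for _ in range(generations):
        elite = max(pop, key=fitness)
        children = [elite[:]]                    # elitism: best survives unchanged
        while len(children) < pop_size:
            a, b = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)  # single-point crossover, one child
                a = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in a]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Toy run: "OneMax" (count of 1-bits) stands in for the real objective.
best = run_ga(fitness=sum)
print(sum(best), "of", len(best), "bits set")
```

Elitism guarantees that the best fitness never decreases between generations, which matters when each fitness evaluation (training and scoring a model) is expensive.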
The classification performance of four ML methods with and without parameter optimization.
| Based on Optimal Parameter Settings | | SVM | DT | RF | PR |
|---|---|---|---|---|---|
| Training | Accuracy | 0.8689 | 0.8470 | 0.9563 | 0.7732 |
| | Sensitivity | 0.8707 | 0.7959 | 0.9320 | 0.5850 |
| | Specificity | 0.8676 | 0.8813 | 0.9726 | 0.8995 |
| Independent Testing | Accuracy | 0.8736 | 0.7692 | 0.7637 | 0.7582 |
| | Sensitivity | 0.7500 | 0.7222 | 0.6806 | 0.5139 |
| | Specificity | 0.9545 | 0.8000 | 0.8182 | 0.9182 |
| | AUC | 0.8664 | 0.7956 | 0.8244 | 0.8238 |
| Without Optimal Parameter Settings | | | | | |
| Independent Testing | Accuracy | 0.5879 | 0.7363 | 0.6538 | 0.5385 |
| | Sensitivity | 0.0972 | 0.6389 | 0.5972 | 0.6667 |
| | Specificity | 0.9091 | 0.8000 | 0.6909 | 0.4545 |
| | AUC | 0.5910 | 0.7170 | 0.7134 | 0.5427 |
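The independent-testing metrics follow directly from a confusion matrix with MRSA as the positive class. The test-set size is not stated in this excerpt; 72 MRSA and 110 MSSA isolates is an inference, because those counts reproduce every independent-testing ratio in the table (for SVM, 54/72 = 0.7500, 105/110 = 0.9545, and 159/182 = 0.8736). The sketch below also reports the positive predictive value (PPV), the quantity behind the abstract's statement that almost all resistant calls were truly resistant, and computes AUC by its rank (Mann-Whitney) definition on toy scores. `confusion_metrics` and `roc_auc` are hypothetical helper names.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and PPV, treating MRSA as positive."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # MRSA correctly called resistant
        "specificity": tn / (tn + fp),   # MSSA correctly called susceptible
        "ppv": tp / (tp + fp),           # resistant calls that are truly MRSA
    }

def roc_auc(pos_scores, neg_scores):
    """AUC as P(random positive outscores random negative); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

# Counts inferred for the optimized SVM on the independent test set.
m = confusion_metrics(tp=54, fn=18, tn=105, fp=5)
print({k: round(v, 4) for k, v in m.items()})
# -> {'accuracy': 0.8736, 'sensitivity': 0.75, 'specificity': 0.9545, 'ppv': 0.9153}

# Toy scores only; the study's per-isolate scores are not available here.
print(round(roc_auc([0.9, 0.8, 0.4], [0.3, 0.4, 0.1]), 4))  # -> 0.9444
```

Under these inferred counts, only 5 of 59 resistant calls would be false positives, which is why high specificity translates into a high PPV at this class balance.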
Figure 5. ROC curves for the independent testing set. (A,B) ROC curves of the four ML methods with and without the optimal parameter settings, respectively.
Classification accuracy for MRSA isolates carrying different SCCmec types.
| SCCmec type | N | SVM | RF | DT | PR |
|---|---|---|---|---|---|
| Part 1: Best models optimized with clinical samples | | | | | |
| II | 10 | 0 (0%) | 0 (0%) | 1 (10%) | 0 (0%) |
| III | 50 | 36 (72%) | 43 (86%) | 47 (94%) | 37 (74%) |
| IV | 23 | 9 (39%) | 9 (39%) | 7 (30%) | 13 (56%) |
| V | 27 | 11 (40%) | 14 (51%) | 13 (48%) | 15 (55%) |
| Total | 110 | 56 (51%) | 66 (60%) | 68 (62%) | 65 (59%) |
| Part 2: Importance of feature 2410–2417 manually elevated | | | | | |
| II | 10 | 10 (100%) | 9 (90%) | 10 (100%) | 9 (90%) |
| III | 50 | 45 (90%) | 32 (64%) | 50 (100%) | 44 (88%) |
| IV | 23 | 23 (100%) | 16 (69%) | 23 (100%) | 22 (95%) |
| V | 27 | 27 (100%) | 17 (62%) | 27 (100%) | 26 (96%) |
| Total | 110 | 105 (95%) | 74 (67%) | 110 (100%) | 101 (92%) |
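Each cell in the SCCmec table is a count of correctly classified MRSA isolates divided by the number of isolates of that type, and the Total row pools all 110 isolates. The sketch below recomputes the percentages for one column (SVM, Part 1) to one decimal place; `accuracy_by_type` is a hypothetical helper. A few recomputed values differ in the last digit from the table (for example, 11/27 = 40.7%, listed as 40%), which suggests some published percentages were truncated rather than rounded.

```python
# Number of MRSA isolates per SCCmec type, from the N column.
N_PER_TYPE = {"II": 10, "III": 50, "IV": 23, "V": 27}

def accuracy_by_type(correct, n_per_type=N_PER_TYPE):
    """Per-type and pooled accuracy (%) from correct-call counts per SCCmec type."""
    rows = {t: 100 * correct[t] / n for t, n in n_per_type.items()}
    rows["Total"] = 100 * sum(correct.values()) / sum(n_per_type.values())
    return rows

svm_part1 = {"II": 0, "III": 36, "IV": 9, "V": 11}
for sccmec, pct in accuracy_by_type(svm_part1).items():
    print(f"{sccmec:>5}: {pct:.1f}%")
```

Pooling is why the Total row can look mediocre even when one type dominates: SCCmec III contributes 50 of the 110 isolates, so its accuracy weighs most heavily on the overall figure.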