Caroline Weis, Max Horn, Bastian Rieck, Aline Cuénod, Adrian Egli, Karsten Borgwardt.
Abstract
MOTIVATION: Microbial species identification based on matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) has become a standard tool in clinical microbiology. The resulting MALDI-TOF mass spectra also harbour the potential to deliver prediction results for other phenotypes, such as antibiotic resistance. However, the development of machine learning algorithms specifically tailored to MALDI-TOF MS-based phenotype prediction is still in its infancy. Moreover, current spectral pre-processing typically involves a parameter-heavy chain of operations without analyzing their influence on the prediction results. In addition, classification algorithms lack quantification of uncertainty, which is indispensable for predictions potentially influencing patient treatment.
Year: 2020 PMID: 32657381 PMCID: PMC7355261 DOI: 10.1093/bioinformatics/btaa429
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Summary statistics of the dataset that we used for all experiments in this article
| Species | Antibiotic | # samples | % resistant |
|---|---|---|---|
| E. coli | amoxicillin/clavulanic acid | 1043 | 28.9 |
| E. coli | ceftriaxone | 1060 | 20.4 |
| E. coli | ciprofloxacin | 1051 | 29.7 |
| K. pneumoniae | ceftriaxone | 597 | 15.1 |
| K. pneumoniae | ciprofloxacin | 596 | 16.8 |
| K. pneumoniae | piperacillin/tazobactam | 576 | 13.9 |
| S. aureus | amoxicillin/clavulanic acid | 973 | 13.7 |
| S. aureus | ciprofloxacin | 987 | 14.7 |
| S. aureus | penicillin | 941 | 71.4 |
Fig. 1. A schematic illustration of our proposed pre-processing workflow for a raw spectrum (top), shown without any alignment or pre-processing steps. Our persistence transformation (bottom) yields a simplified and cleaner representation of the spectrum. The interpretation of the y-axis changes from intensity to persistence. The transformed spectrum can easily be converted to sparse tuples by taking the k most persistent peaks.
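The idea of keeping only the k most persistent peaks can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the spectrum is a plain list of intensities, uses a standard 0-dimensional persistence computation on superlevel sets (a peak is "born" at its height and "dies" where it merges into a taller neighbour), and assigns the global maximum a persistence of its height minus the global minimum, which is a convention chosen here. The names `persistence_transform` and `top_k_peaks` are hypothetical.

```python
def persistence_transform(intensities):
    """Map each local maximum (by index) to its persistence.

    Sweeps a threshold from the highest intensity downwards and tracks
    connected components with a union-find structure (elder rule: when
    two components merge, the one with the lower peak dies).
    """
    n = len(intensities)
    if n == 0:
        return {}
    parent = [None] * n   # None means "not yet below the sweep threshold"
    peak = {}             # component root -> index of its highest point

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    persistence = {}
    for i in sorted(range(n), key=lambda idx: -intensities[idx]):
        parent[i] = i
        peak[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                if intensities[peak[ri]] >= intensities[peak[rj]]:
                    older, younger = ri, rj
                else:
                    older, younger = rj, ri
                dying = peak[younger]
                value = intensities[dying] - intensities[i]
                if value > 0:          # skip points that never formed a peak
                    persistence[dying] = value
                parent[younger] = older
    # Convention (assumption): the global maximum never merges, so give it
    # the full intensity range as its persistence.
    top = peak[find(0)]
    persistence[top] = intensities[top] - min(intensities)
    return persistence


def top_k_peaks(intensities, k):
    """Return the k most persistent peaks as (index, persistence) tuples."""
    pairs = persistence_transform(intensities).items()
    return sorted(pairs, key=lambda kv: -kv[1])[:k]


# Toy spectrum: a tall peak, a medium peak, and a small bump next to the tall one.
toy = [0, 3, 1, 6, 3, 4, 0]
print(top_k_peaks(toy, 2))
```

In this toy input the small bump at index 5 has low persistence and is dropped when k = 2, while the two well-separated peaks survive. In a real spectrum the indices would be mapped back to m/z positions.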
Fig. 2. A depiction of the feature map u(x, t) of a given spectrum. The initial raw spectrum consists of single peaks whose influence is slowly diffused over the whole space. Increasing t reduces the influence of any single peak.
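The diffusion behaviour described in this caption can be illustrated with a short sketch. It assumes that the feature map follows the standard heat equation on the real line with each peak acting as a point source weighted by its intensity, so that u(x, t) is a sum of Gaussians whose width grows with t. The caption itself does not give the exact formula, so this form and the function name `diffuse` are assumptions made for illustration.

```python
import math


def diffuse(peaks, x, t):
    """Evaluate u(x, t) for a list of (position, intensity) peaks.

    Assumed form: the heat-kernel solution
        u(x, t) = sum_i  lambda_i / (2 * sqrt(pi * t)) * exp(-(x - x_i)^2 / (4 t))
    """
    if t <= 0:
        raise ValueError("t must be positive")
    norm = 1.0 / (2.0 * math.sqrt(math.pi * t))
    return sum(lam * norm * math.exp(-((x - pos) ** 2) / (4.0 * t))
               for pos, lam in peaks)


# Hypothetical two-peak spectrum: (m/z position, intensity).
peaks = [(2000.0, 1.0), (2010.0, 0.5)]
for t in (1.0, 25.0):
    print(t, diffuse(peaks, 2000.0, t), diffuse(peaks, 2005.0, t))
```

For small t the value is concentrated at the peak positions. For larger t the height at each peak drops and neighbouring peaks start to overlap, which matches the behaviour described in the caption.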
Results of all methods given by mean average precision (AUPRC) ± standard deviation on the test fold for five random splits
| Experiment | Species | Antibiotic | MQ–LR | PT–LR | MQ–GP–RBF | PT–GP–PIKE | MQ–GP–PIKE |
|---|---|---|---|---|---|---|---|
| E-AMOXCLAV | E. coli | amoxicillin/clavulanic acid | 40.96 ± 7.41 | 35.72 ± 2.70 | 32.50 ± 8.48 | 38.89 ± 2.03 | |
| E-CEF | E. coli | ceftriaxone | 63.22 ± 6.08 | 58.04 ± 3.14 | 46.29 ± 24.00 | 62.78 ± 3.19 | |
| E-CIPRO | E. coli | ciprofloxacin | 61.37 ± 8.52 | 55.14 ± 3.84 | 34.65 ± 10.71 | 54.02 ± 4.04 | |
| K-CEF | K. pneumoniae | ceftriaxone | 58.20 ± 9.79 | 56.47 ± 6.26 | 58.72 ± 25.29 | 72.38 ± 9.03 | |
| K-CIPRO | K. pneumoniae | ciprofloxacin | 41.71 ± 9.82 | 35.04 ± 7.74 | 30.88 ± 13.54 | 40.15 ± 13.29 | |
| K-PIPTAZO | K. pneumoniae | piperacillin/tazobactam | 31.58 ± 6.81 | 38.62 ± 8.65 | 13.79 ± 0.00 | 48.95 ± 9.90 | |
| S-AMOXCLAV | S. aureus | amoxicillin/clavulanic acid | 52.88 ± 3.91 | 55.21 ± 4.08 | 13.85 ± 0.00 | 61.02 ± 12.45 | |
| S-CIPRO | S. aureus | ciprofloxacin | 34.11 ± 3.26 | 26.30 ± 6.16 | 23.32 ± 11.88 | 30.51 ± 2.95 | |
| S-PEN | S. aureus | penicillin | 79.66 ± 3.34 | 79.61 ± 4.66 | 74.15 ± 3.15 | 80.67 ± 1.92 | |
Note: In the abbreviated names, a pre-processing method is followed by a classifier, which is in turn followed by a kernel (if applicable). For example, PT–GP–PIKE refers to persistence-transformed features and a GP classifier with our PIKE kernel. Both the logistic regression (LR) and GP using MALDIquant (MQ) features used peaks selected by MALDIquant, with a mean of 216 peaks given per spectrum. The GP using the topological features was trained with k = 200 peaks.
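For readers who want to reproduce the kind of summary reported in this table, the following sketch computes average precision for one split and then aggregates several splits into a mean ± standard deviation, expressed in percent. It is a plain-Python illustration under stated assumptions: labels are 0/1 with 1 meaning resistant, tied scores get no special treatment, and the population standard deviation is used (the source does not say which variant was reported). The function names are hypothetical.

```python
from statistics import mean, pstdev


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Ranks samples by decreasing score and averages the precision
    observed at the rank of each positive sample.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def summarize(per_split_ap):
    """Return (mean, standard deviation) in percent over the splits."""
    return 100 * mean(per_split_ap), 100 * pstdev(per_split_ap)


# Toy example with made-up labels and scores for a single split.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(average_precision(labels, scores))
```

With five train/test splits, `summarize` would be called on the five per-split values to obtain one table cell.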
Fig. 3. A histogram showing the distributions of the maximum class probability for the logistic regression (left column) and the Gaussian process classifier with PIKE (right column), both trained on S. aureus. The upper figure depicts the in-training distribution of maximum class probabilities, i.e. class probabilities with respect to S. aureus, while the middle and lower figures show the values for out-of-distribution species (E. coli and K. pneumoniae).
Fig. 4. The curves depict the trade-off between the proportion of rejected in-distribution samples and the proportion of rejected out-of-distribution samples. A complete rejection of all out-of-distribution samples is reached at a low rejection ratio for in-training samples. The classifier is MQ–GP–PIKE, trained on S-AMOXCLAV.
Fig. 5. The improvement in terms of prediction accuracy for every threshold θ, for S-AMOXCLAV. To permit comparability, we report the accuracy as the class ratio changes for different values of θ. Small sample size effects increase variance for larger values of θ.
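The rejection mechanism behind Figs 3 to 5 can be sketched in a few lines. The sketch assumes that a prediction is rejected whenever its maximum class probability falls below the threshold θ, and that accuracy is then computed on the accepted samples only. The function name and the toy probabilities are hypothetical and not taken from the study.

```python
def evaluate_with_rejection(probs, labels, theta):
    """Apply a confidence threshold and score the accepted predictions.

    probs  : list of per-class probability lists, one per sample
    labels : list of true class indices
    theta  : minimum maximum-class probability required to accept

    Returns (rejected_ratio, accuracy); accuracy is None if every
    sample is rejected.
    """
    kept = 0
    correct = 0
    for p, y in zip(probs, labels):
        confidence = max(p)
        if confidence < theta:
            continue                      # rejected: too uncertain
        kept += 1
        if p.index(confidence) == y:
            correct += 1
    rejected_ratio = 1 - kept / len(labels)
    accuracy = correct / kept if kept else None
    return rejected_ratio, accuracy


# Toy binary problem (susceptible = 0, resistant = 1).
probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 1, 1, 1]
for theta in (0.5, 0.7):
    print(theta, evaluate_with_rejection(probs, labels, theta))
```

Sweeping θ and recording both return values for in-distribution and out-of-distribution samples separately would give curves of the kind the captions describe. As θ grows, fewer samples remain, which is why the accuracy estimate becomes noisier.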