Margot Delavy¹, Lorenzo Cerutti², Antony Croxatto¹, Guy Prod'hom¹, Dominique Sanglard¹, Gilbert Greub¹, Alix T Coste¹
Abstract
Candida albicans causes life-threatening systemic infections in immunosuppressed patients. These infections are commonly treated with fluconazole, an antifungal agent targeting the ergosterol biosynthesis pathway. Current antifungal susceptibility testing (AFST) methods are time-consuming and often subjective. Moreover, they cannot reliably detect tolerance, a phenomenon that provides a breeding ground for resistance. An alternative to classical AFST methods could rely on matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (MS). This tool, already used in clinical microbiology to identify microbial species, has shown promising results for detecting antifungal resistance in yeasts that are not azole-tolerant. Here, we propose a machine-learning approach, adapted to MALDI-TOF MS data, to qualitatively detect fluconazole resistance in the azole-tolerant species C. albicans. MALDI-TOF MS spectra were acquired from 33 C. albicans clinical strains isolated from 15 patients. These strains were exposed for 3 h to three fluconazole concentrations (256, 16 and 0 μg/mL), with (5 μg/mL) or without cyclosporin A, an azole tolerance inhibitor, giving six experimental conditions. We optimized a protein extraction protocol that yields high-quality spectra, which were then filtered through two quality controls: the first discarded spectra that could not be identified, and the second kept only the most similar spectra among replicates. The quality-controlled spectra were divided into six sets, one per sample preparation condition. Each set was processed with an R-based script that uses predefined housekeeping peaks to align the spectra. Finally, 32 machine-learning models were compared on each of the six sets, giving 192 analysis pipelines, and the most robust pipeline with the best accuracy was selected.
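The second quality control keeps only the most similar spectra among technical replicates, but the similarity measure is not given in this text. As a hedged illustration, the sketch below assumes that replicate spectra share one m/z grid, scores each replicate by its mean Pearson correlation with the other replicates, and keeps the `k` best; the function names, the correlation criterion and `k` are all hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length intensity vectors."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den else 0.0


def most_similar_replicates(replicates, k):
    """Hypothetical QC: return the indices of the k replicates with the
    highest mean correlation to all the other replicates."""
    n = len(replicates)
    scores = []
    for i in range(n):
        others = [pearson(replicates[i], replicates[j]) for j in range(n) if j != i]
        scores.append((mean(others), i))
    scores.sort(reverse=True)  # best-correlated replicates first
    return sorted(i for _, i in scores[:k])


# Toy replicates on a shared 5-point grid; the last one is an outlier
reps = [
    [1.0, 5.0, 2.0, 8.0, 1.0],
    [1.1, 4.8, 2.1, 7.9, 0.9],
    [0.9, 5.2, 1.9, 8.1, 1.0],
    [6.0, 1.0, 7.0, 0.5, 3.0],
]
print(most_similar_replicates(reps, 3))  # → [0, 1, 2]
```

A correlation-based score ignores overall intensity scale, which suits replicates that differ mainly in signal strength; a distance-based score would be an equally plausible choice.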
The selected pipeline, a linear discriminant analysis (LDA) model applied to samples prepared with the tolerance inhibitor but without fluconazole, reached a specificity of 88.89% and a sensitivity of 83.33%, for an overall accuracy of 85.71%. Overall, this work demonstrated that combining MALDI-TOF MS and machine learning could provide an innovative diagnostic tool in mycology.
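Specificity, sensitivity and accuracy all derive from the confusion matrix of the test set. The sketch below assumes that resistant strains are the positive class. The counts are hypothetical, chosen only because they reproduce the reported percentages (10/12 = 83.33%, 8/9 = 88.89%, 18/21 = 85.71%); they are not taken from the paper. The example also shows why accuracy need not equal the mean of sensitivity and specificity: it weights each class by its number of test samples.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy), assuming 'resistant'
    is the positive class."""
    sensitivity = tp / (tp + fn)  # resistant strains correctly called resistant
    specificity = tn / (tn + fp)  # susceptible strains correctly called susceptible
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # all correct calls
    return sensitivity, specificity, accuracy


# Hypothetical counts, NOT the paper's confusion matrix
sens, spec, acc = classification_metrics(tp=10, fn=2, tn=8, fp=1)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
# → sensitivity=83.33% specificity=88.89% accuracy=85.71%
```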
Keywords: Candida albicans; MALDI-TOF MS; diagnostic; fluconazole resistance; machine learning
Year: 2020 PMID: 32010083 PMCID: PMC6971193 DOI: 10.3389/fmicb.2019.03000
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Candida albicans strains used in the project.
| 1 | TT | 2321 | 0.25 | ∗ | ||||||
| TT | 2322 | 16 | X | X | X | |||||
| V | 2323 | 32 | X | X | X | X | ||||
| 2 | TT | 731 | 0.25 | |||||||
| TT | 732 | 16 | X | X | ||||||
| V | 735 | 64 | X | X | ||||||
| 6 | TT | 2243 | 1 | X | ||||||
| V | 2242 | 8 | X | X | X | |||||
| 10 | TT | 741 | 0.25 | |||||||
| V | 742 | 16 | X | X | ||||||
| 12 | V | 2284 | 0.25 | |||||||
| TT | 2285 | 16 | X | X | X | X | ||||
| 19 | TT | 290 | 0.5 | |||||||
| TT | 292 | 128 | X | X | ∗ | |||||
| 20 | V | 294 | 0.25 | |||||||
| TT | 296 | 128 | X | X | X | |||||
| 21 | V | 347 | 0.25 | |||||||
| TT | 288 | 0.5 | X | |||||||
| TT | 289 | 128 | X | X | X | |||||
| 22 | TT | 3534 | 0.5 | |||||||
| TT | 3548 | 128 | X | X | X | X | X | |||
| 4 | V | 750 | 16 | X | ∗ | |||||
| TT | 2250 | 1 | X | |||||||
| 5 | V | 757 | 2 | ∗ | ||||||
| TT | 758 | 16 | X | X | ∗ | |||||
| 9 | TT | 482 | 8 | ∗ | ||||||
| TT | 488 | 16 | X | X | X | |||||
| 13 | TT | 520 | 32 | X | X | |||||
| TT | 522 | 128 | X | X | X | |||||
| 15 | TT | 2250 | 1 | X | ||||||
| TT | 2251 | 16 | X | X | X | |||||
| 18 | TT | 281 | 1 | ∗ | ||||||
| TT | 284 | 32 | X | X | X |
Housekeeping peaks and their associated frequency in the Bruker C. albicans superspectra.
| 0.852 | |
| 0.818 | |
| 0.7912 | |
| 0.915 | |
| 0.829 | |
| 0.812 | |
| 0.837 | |
| 0.969 | |
| 0.7064 | |
| 0.967 | |
| 0.882 | |
| 0.812 |
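The housekeeping peaks act as landmarks for aligning spectra (warping, Figure 1G). The original R script and its warping function are not reproduced in this text, so the sketch below is only an assumption: it matches each housekeeping peak to the closest detected peak within a tolerance and fits a linear least-squares warp (a deliberately simple stand-in for the more flexible warping functions used in practice). The housekeeping m/z values, the tolerance and the simulated drift are all hypothetical, because the table above lists only frequencies.

```python
def match_peaks(detected, reference, tolerance):
    """Pair each reference (housekeeping) m/z with the closest detected peak,
    keeping the pair only if the two lie within `tolerance` m/z."""
    pairs = []
    for ref in reference:
        closest = min(detected, key=lambda mz: abs(mz - ref))
        if abs(closest - ref) <= tolerance:
            pairs.append((closest, ref))
    return pairs


def fit_linear_warp(pairs):
    """Least-squares fit of reference = a * observed + b over matched pairs;
    returns a function mapping any observed m/z onto the reference scale."""
    if len(pairs) < 2:
        raise ValueError("need at least two matched housekeeping peaks")
    xs = [obs for obs, _ in pairs]
    ys = [ref for _, ref in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda mz: a * mz + b


# Hypothetical housekeeping m/z values
housekeeping = [4000.0, 6000.0, 8000.0, 10000.0]
# Detected peaks of one spectrum, drifted as observed = 1.001 * true + 2
detected = [4006.0, 5000.0, 6008.0, 8010.0, 10012.0]

pairs = match_peaks(detected, housekeeping, tolerance=20.0)
warp = fit_linear_warp(pairs)
aligned = [round(warp(mz), 2) for mz in detected]
print(aligned)  # → [4000.0, 4993.01, 6000.0, 8000.0, 10000.0]
```

The non-housekeeping peak at 5000 m/z is not used to fit the warp, yet it is moved with the rest of the spectrum, which is the point of anchoring alignment on stable peaks.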
FIGURE 1 Spectra processing pipeline. The parameters used for each step are indicated in italics. (A) Raw spectrum. (B) A variance-stabilizing transformation is applied to the raw spectrum. (C) The spectrum is smoothed and its baseline (red line) is estimated. (D) The baseline is removed. (E) The spectrum's intensities are calibrated. (F) The spectra of the technical replicates are merged into a single average spectrum. (G) Peaks (red crosses) are detected and warped onto the housekeeping peaks, which provide a stable alignment; a close-up of a single housekeeping peak shows the expected change in its alignment after warping. (H) Peaks are binned by merging peaks that lie less than 3 m/z apart. (I) An intensity matrix is generated with the intensity of each peak in each spectrum.
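Steps B to F and H to I of Figure 1 can be sketched in plain Python as below. The original pipeline is an R script whose parameters are not reproduced in this text, so every choice here is an assumption: a square-root transform for variance stabilization, a centred moving average as the smoother, a rolling-minimum baseline, total ion current (TIC) normalization for intensity calibration, and a simple gap-based grouping for the 3 m/z binning. Window sizes and function names are hypothetical, and peak detection and warping (step G) are omitted.

```python
from math import sqrt


def preprocess(intensities, half_window=2, baseline_window=50):
    """Steps B-E on one spectrum (all parameters hypothetical)."""
    # (B) variance stabilization: square-root transform
    y = [sqrt(max(v, 0.0)) for v in intensities]
    n = len(y)
    # (C) smoothing: centred moving average
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        smoothed.append(sum(y[lo:hi]) / (hi - lo))
    # (C-D) baseline estimated as a rolling minimum, then subtracted
    corrected = []
    for i in range(n):
        lo, hi = max(0, i - baseline_window), min(n, i + baseline_window + 1)
        corrected.append(smoothed[i] - min(smoothed[lo:hi]))
    # (E) intensity calibration: divide by the total ion current
    total = sum(corrected)
    return [v / total for v in corrected] if total else corrected


def average_replicates(spectra):
    """(F) Merge technical replicates sharing one m/z grid into a mean spectrum."""
    return [sum(col) / len(col) for col in zip(*spectra)]


def intensity_matrix(peak_lists, tolerance=3.0):
    """(H-I) Bin peaks across spectra and build a spectra x bins matrix.

    peak_lists holds one list of (mz, intensity) pairs per spectrum. Peaks
    are sorted by m/z and a new bin starts whenever the gap to the previous
    peak is not below `tolerance`; absent peaks get intensity 0.
    """
    flat = sorted((mz, s, inten) for s, peaks in enumerate(peak_lists)
                  for mz, inten in peaks)
    groups, current = [], [flat[0]]
    for item in flat[1:]:
        if item[0] - current[-1][0] < tolerance:
            current.append(item)
        else:
            groups.append(current)
            current = [item]
    groups.append(current)
    bin_mz = [sum(item[0] for item in grp) / len(grp) for grp in groups]
    matrix = [[0.0] * len(groups) for _ in peak_lists]
    for b, grp in enumerate(groups):
        for _, s, inten in grp:
            matrix[s][b] = max(matrix[s][b], inten)
    return bin_mz, matrix


spectrum = preprocess([1.0] * 20 + [100.0] + [1.0] * 20)
peaks = [
    [(4000.0, 0.5), (6050.0, 0.2)],  # spectrum 0
    [(4001.5, 0.4), (7000.0, 0.1)],  # spectrum 1
]
bins, matrix = intensity_matrix(peaks)
print(bins)    # → [4000.75, 6050.0, 7000.0]
print(matrix)  # → [[0.5, 0.2, 0.0], [0.4, 0.0, 0.1]]
```

Because each new peak is compared with the previous one, a chain of closely spaced peaks can produce a bin wider than 3 m/z; stricter binning schemes avoid this at the cost of extra bookkeeping.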
FIGURE 2 Fluconazole resistance detection by a machine-learning approach. (A) Ranking of peaks by their importance for discriminating resistant from susceptible strains. A Random Forest (RF) classifier was trained on the training set and tested on the testing set to separate fluconazole-resistant from fluconazole-susceptible strains on the basis of peak intensities. Three values of the number of trees to grow (ntree) were tested. Peaks were ranked by their Mean Decrease in Gini index (I), and four thresholds on this index (iThr = 0, 0.3, 0.4, 0.5) were set arbitrarily to extract lists of discriminating peaks (RF peaks). (B) Model testing. The intensity matrix was reduced to the RF peaks, and RF, logistic regression and LDA models were trained and tested to separate fluconazole-resistant from fluconazole-susceptible strains on the basis of peak intensities. In total, 32 models were tested on each of the 6 subsets, giving 192 analysis pipelines, from sample preparation to resistance prediction, each with its own accuracy. (C) Selection of the most accurate pipelines. The 15% of pipelines with the highest accuracies were selected. (D) Verification of pipeline robustness. For each of the top 15% pipelines, the training and testing sets were merged and randomly re-split (2:1 ratio) into new training and testing sets. The model was trained on the new training set, and the accuracy of its susceptibility prediction on the new testing set was recorded. This process was repeated 100 times to generate as many training/testing combinations. The pipeline with a high median accuracy and a low variance of accuracies was selected for validation.
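Panels A, C and D of Figure 2 describe three generic operations: thresholding peak importances, keeping the top 15% of pipelines, and re-splitting the data 2:1 a hundred times to measure accuracy stability. The sketch below illustrates them under stated assumptions. The importance values are hypothetical (they are not the R Random Forest output), the strict `>` threshold and the unstratified splits are assumptions, and a nearest-centroid classifier stands in for the paper's RF, logistic regression and LDA models purely to keep the example dependency-free.

```python
import random
from statistics import median, pvariance


def rf_peaks(importance, threshold):
    """(A) Keep peaks whose importance exceeds `threshold` (iThr).
    `importance` maps peak m/z -> importance score."""
    return sorted(mz for mz, imp in importance.items() if imp > threshold)


def top_fraction(pipelines, fraction=0.15):
    """(C) Keep the given fraction of (name, accuracy) pipelines with the
    highest accuracy; at least one pipeline is always kept."""
    ranked = sorted(pipelines, key=lambda p: p[1], reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT RF, logistic regression or LDA): predicts
    the class whose mean intensity vector is closest in Euclidean distance."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(centroids, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(x, centroids[lab])))
    return predict


def robustness(X, y, fit, rounds=100, seed=0):
    """(D) Re-split all samples 2:1 at random `rounds` times, retrain with
    `fit`, and return the median and population variance of test accuracy."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    accuracies = []
    for _ in range(rounds):
        rng.shuffle(idx)
        cut = (2 * len(idx)) // 3  # two thirds for training
        train, test = idx[:cut], idx[cut:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(model(X[i]) == y[i] for i in test)
        accuracies.append(correct / len(test))
    return median(accuracies), pvariance(accuracies)


# Toy intensity matrix: two well-separated groups of spectra
X = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.0], [0.1, 0.1], [0.2, 0.2],
     [0.9, 0.8], [0.8, 0.9], [1.0, 0.7], [0.7, 1.0], [0.9, 0.9], [0.8, 0.8]]
y = ["S"] * 6 + ["R"] * 6
print(robustness(X, y, nearest_centroid_fit))  # → (1.0, 0.0)
```

Any callable that takes training data and returns a predictor can be passed as `fit`, so the same resampling loop can evaluate every shortlisted pipeline.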
FIGURE 3 Summary of the pipelines selected with the machine-learning approach. (A) The 15% of pipelines with the highest accuracy. Each row of the table describes the sample preparation conditions (Cyclo and FLC), the algorithm (Test), the Mean Decrease in Gini index threshold (iThr) and number of trees (ntree) used in the pipeline, and the associated accuracy. (B) Pipeline robustness. Accuracies obtained by each of the top 15% pipelines over the 100 rounds, with the median and variance of these accuracies. Red, yellow and green boxes mark accuracies below 50%, between 50% and 70%, and above 70%, respectively. (C) Parameters of the final selected pipeline and its accuracy, specificity and sensitivity.
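The colour bands of Figure 3B and the final choice of a pipeline with a high median and low variance of accuracy can be expressed as two small functions. This is a hedged sketch: how the exact 50% and 70% boundaries are assigned is not stated, and the lexicographic rule (highest median first, lowest variance as tie-breaker) is only one reading of the selection criterion. Pipeline names and numbers are hypothetical.

```python
def accuracy_band(accuracy):
    """Colour band of Figure 3B; boundary assignment is an assumption."""
    if accuracy < 0.50:
        return "red"
    if accuracy <= 0.70:
        return "yellow"
    return "green"


def select_pipeline(summaries):
    """Pick the pipeline with the highest median accuracy, breaking ties by
    the lowest variance. `summaries` maps name -> (median, variance)."""
    return max(summaries, key=lambda name: (summaries[name][0], -summaries[name][1]))


# Hypothetical robustness summaries, for illustration only
summaries = {
    "pipeline_a": (0.80, 0.010),
    "pipeline_b": (0.86, 0.004),
    "pipeline_c": (0.86, 0.009),
}
best = select_pipeline(summaries)
print(best, accuracy_band(summaries[best][0]))  # → pipeline_b green
```

A strict lexicographic rule can favour a pipeline whose median is marginally higher but whose variance is much larger; a combined score, such as the median minus a multiple of the standard deviation, would trade the two off explicitly.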