Samaneh Kouchaki, Yang Yang, Alexander Lachapelle, Timothy M Walker, A Sarah Walker, Timothy E A Peto, Derrick W Crook, David A Clifton.
Abstract
Resistance prediction and mutation ranking are important tasks in the analysis of tuberculosis sequence data. Because first-line antibiotics are administered in standard regimens, resistance co-occurrence, in which samples are resistant to multiple drugs, is common. Analysing all drugs simultaneously should therefore enable patterns reflecting resistance co-occurrence to be exploited for resistance prediction. Here, multi-label random forest (MLRF) models are compared with single-label random forests (SLRFs), both for predicting phenotypic resistance from whole-genome sequences and for identifying mutations important for better prediction of four first-line drugs, in a dataset of 13,402 Mycobacterium tuberculosis isolates. Results confirmed that MLRFs can improve performance compared to conventional clinical methods (by 18.10%) and SLRFs (by 0.91%). In addition, we identified a list of candidate mutations that are important for resistance prediction or that are related to resistance co-occurrence. Moreover, we found that retraining on only a subset of top-ranked mutations was sufficient to achieve satisfactory performance. The source code can be found at http://www.robots.ox.ac.uk/~davidc/code.php.
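The multi-label vs. single-label distinction in the abstract can be illustrated with scikit-learn, whose `RandomForestClassifier` handles multi-label targets natively when the label matrix is 2D. This is a minimal sketch on synthetic stand-in data (random binary "mutation" features and correlated "resistance" labels), not the authors' released pipeline:

```python
# Sketch (assumed setup, not the paper's code): one multi-label forest (MLRF)
# trained jointly on all drugs vs. one single-label forest (SLRF) per drug.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 50))   # 200 isolates x 50 binary mutation features
# Correlated labels to mimic resistance co-occurrence between two "drugs"
base = X[:, 0] | X[:, 1]
Y = np.column_stack([base, base ^ X[:, 2]])

# MLRF: a single forest fit on the full 2D label matrix
mlrf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, Y)

# SLRF baseline: an independent forest per drug
slrfs = [RandomForestClassifier(n_estimators=100, random_state=0).fit(X, Y[:, j])
         for j in range(Y.shape[1])]

print(mlrf.predict(X[:5]).shape)         # one prediction column per drug
print(mlrf.feature_importances_[:3])     # Gini importances, usable for mutation ranking
```

The joint model lets trees split on mutations that are informative for several drugs at once, which is how co-occurrence patterns can be exploited; the per-drug baselines cannot share that signal.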
Keywords: MLRF; SLRF; drug resistance; mutation ranking; tuberculosis
Year: 2020 PMID: 32390972 PMCID: PMC7188832 DOI: 10.3389/fmicb.2020.00667
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Figure 1. The phenotypic profile of the first-line drugs. (A) Each row shows the number of isolates that are resistant to at least the indicated drug combination; (B) heatmap quantifying the number of instances of resistance co-occurrence between drugs. Off-diagonal elements show resistance co-occurrence between different drugs; diagonal elements show resistance to a single drug.
Performance of the best machine learning classifier and DA for INH, EMB, RIF, PZA, MDR-TB, and FDR-TB.

| Drug | Sensitivity (DA) | Specificity (DA) | AUC (DA) | Best model | Sensitivity | Specificity | AUC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| INH | 91.15 ± 1.19 | 98.96 ± 0.25 | 95.05 ± 0.60 | F3 + MLRF | 93.76 | 97.79 ± 0.35 | 96.01 |
| EMB | 85.10 ± 1.79 | 94.91 ± 0.38 | 90.00 ± 0.97 | F1 + MLRF | 91.75 | 91.58 | 91.70 |
| RIF | 91.52 ± 1.34 | 98.68 ± 0.21 | 95.10 ± 0.65 | F3 + MLRF | 93.16 | 98.02 ± 0.32 | 96.00 |
| PZA | 43.21 ± 2.72 | 98.58 ± 0.23 | 70.89 ± 1.35 | F1 + SLRF | 87.27 | 90.71 | 88.99 |
| FDR-TB | 37.34 ± 3.97 | 98.59 ± 0.22 | 67.96 ± 1.99 | F1 + MLRF | 87.58 | 92.98 | 90.28 |
| MDR-TB | 89.84 ± 1.34 | 99.12 ± 0.178 | 94.48 ± 0.69 | F3 + MLRF | 93.70 | 97.45 ± 0.36 | 95.58 |

Sensitivity, specificity, and AUC (mean ± standard error) were reported. The Wilcoxon signed-rank test was used to calculate the p-value of each method compared with DA; p < 0.01 vs. DA.
Performance of the best models when classification is restricted to only the important mutations.

| | INH | EMB | RIF | PZA | FDR-TB | MDR-TB |
| --- | --- | --- | --- | --- | --- | --- |
| Best model | IF3 (0.001) + MLRF | IF3 (0.005) + SLRF | IF3 (0.001) + MLRF | IF1 (0.001) + SLRF | IF3 (0.01) + SLRF | IF3 (0.001) + MLRF |
| Number of mutations | 37 | 17 | 37 | 32 | 16 | 37 |
| Sensitivity | 92.88 (↓0.28) ± 0.93 | 91.10 (↓0.65) ± 1.76 | 92.19 (↓0.07) ± 1.10 | 84.73 (↓2.54) ± 2.49 | 91.74 (↑4.16) ± 3.37 | 93.76 (↑0.06) ± 1.33 |
| Specificity | 97.88 (↑0.09) ± 0.31 | 92.70 (↑1.12) ± 0.51 | 97.77 (↓0.22) ± 0.52 | 92.83 (↓2.12) ± 0.52 | 90.06 (↓2.92) ± 0.61 | 97.38 (↓0.07) ± 0.49 |
| AUC | 95.48 (↓0.53) ± 0.40 | 91.90 (↑0.20) ± 0.82 | 94.98 (↓1.02) ± 0.53 | 88.78 (↓0.21) ± 1.17 | 90.90 (↑0.62) ± 1.56 | 95.47 (↓0.11) ± 0.62 |

The number of mutations used for classification, the best model, and the performance for each drug and resistance category are shown. Arrows indicate the increase/decrease in performance relative to the corresponding best model above.
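The retraining step behind this table (keep only mutations whose forest importance exceeds a threshold, then refit on that subset) can be sketched as follows. The thresholds mirror the {0.05, 0.01, 0.005, 0.001} grid used here; the data are synthetic placeholders, not the study's isolates:

```python
# Hedged sketch of importance-threshold retraining (assumed workflow, not the
# authors' code): rank mutations by Gini importance, keep those above a cutoff,
# and refit a smaller forest on the reduced feature set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(300, 40))   # 300 isolates x 40 binary mutations
y = (X[:, 0] & X[:, 1]) | X[:, 2]        # resistance driven by three mutations

full = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

threshold = 0.01                          # one of {0.05, 0.01, 0.005, 0.001}
keep = full.feature_importances_ > threshold   # boolean mask over mutations
reduced = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[:, keep], y)

print(int(keep.sum()), "of", X.shape[1], "mutations retained")
print(reduced.score(X[:, keep], y))      # accuracy of the reduced model
```

As in the table, a small retained subset can match the full model closely when the truly predictive mutations clear the threshold; the tighter the cutoff, the fewer mutations survive.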
Figure 2. AUC (%) comparison for MLRF across the four feature-selection thresholds {0.05, 0.01, 0.005, 0.001}. The "I" prefix refers to our substudy.
Figure 3. AUC (%) comparison for SLRF across the four feature-selection thresholds {0.05, 0.01, 0.005, 0.001}. The "I" prefix refers to our substudy.