Yang Yang1, Katherine E Niehaus1, Timothy M Walker2, Zamin Iqbal2, A Sarah Walker2,3, Daniel J Wilson2, Tim E A Peto2,4, Derrick W Crook2,3,4, E Grace Smith5, Tingting Zhu1, David A Clifton1.
Abstract
Motivation: Correct and rapid determination of Mycobacterium tuberculosis (MTB) resistance against available tuberculosis (TB) drugs is essential for the control and management of TB. Conventional molecular diagnostic tests assume that the presence of any well-studied single nucleotide polymorphism (SNP) is sufficient to cause resistance, which yields low sensitivity for resistance classification. Summary: Given the availability of DNA sequencing data from MTB, we developed machine learning models for a cohort of 1839 UK bacterial isolates to classify MTB resistance against eight anti-TB drugs (isoniazid, rifampicin, ethambutol, pyrazinamide, ciprofloxacin, moxifloxacin, ofloxacin, streptomycin) and to classify multi-drug resistance.
Year: 2018 PMID: 29240876 PMCID: PMC5946815 DOI: 10.1093/bioinformatics/btx801
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Phenotype of 1839 isolates. Left: bar plot of phenotype availability for the different drugs. Right: heatmap quantifying the number of instances of co-occurrence of resistance between drugs, normalized by the total number of isolates resistant to at least one drug. Off-diagonal elements show co-occurrence of resistance between different drugs; on-diagonal elements show cases which are resistant to a single drug.
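The normalization described in the Fig. 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `cooccurrence_matrix`, the dictionary-based phenotype format, and the toy isolates are all hypothetical.

```python
from itertools import combinations


def cooccurrence_matrix(phenotypes, drugs):
    """Build a normalized resistance co-occurrence matrix.

    phenotypes: list of dicts mapping drug name -> True (resistant),
                False (susceptible) or None (not tested). Assumed format.
    drugs:      ordered list of drug names defining rows and columns.

    Off-diagonal (i, j): isolates resistant to both drug i and drug j.
    Diagonal (i, i):     isolates resistant to drug i only.
    All counts are divided by the number of isolates resistant to at
    least one drug.
    """
    index = {drug: i for i, drug in enumerate(drugs)}
    n = len(drugs)
    counts = [[0.0] * n for _ in range(n)]
    n_resistant = 0

    for isolate in phenotypes:
        resistant = [d for d in drugs if isolate.get(d) is True]
        if not resistant:
            continue  # fully susceptible or untested isolates are skipped
        n_resistant += 1
        if len(resistant) == 1:
            i = index[resistant[0]]
            counts[i][i] += 1  # mono-resistant case goes on the diagonal
        for a, b in combinations(resistant, 2):
            i, j = index[a], index[b]
            counts[i][j] += 1
            counts[j][i] += 1  # keep the matrix symmetric

    if n_resistant == 0:
        return counts  # nothing to normalize; matrix stays all zeros
    return [[c / n_resistant for c in row] for row in counts]


# Toy usage with made-up phenotypes (not data from the study).
toy_isolates = [
    {"INH": True, "RIF": True, "EMB": False},
    {"INH": True, "RIF": False, "EMB": False},
    {"INH": False, "RIF": False, "EMB": False},
    {"INH": True, "RIF": True, "EMB": True},
]
toy_matrix = cooccurrence_matrix(toy_isolates, ["INH", "RIF", "EMB"])
```

Treating untested drugs as "not resistant" is a simplification made here for brevity; a real analysis would need to decide how to handle missing phenotypes.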
Fig. 2. PCA (upper row) and SL-PCA (lower row) for all clades [Clades are defined based on the whole genome sequences (not just resistance genes). Interested readers are referred to Benavente .] (left plots) and cluster C1 (right plots) in terms of INH resistance (C1: Beijing, Euro, LAM, Tur and Uganda). (Color version of this figure is available at Bioinformatics online.)
Fig. 3. Classification performance in AUC for seven classifiers across eight anti-TB drugs and MDR-TB with the F1, F2 and F3 feature sets. While the horizontal axis is discrete, dashed lines are shown between data points for ease of viewing. (Color version of this figure is available at Bioinformatics online.)
Comparison of performance between the best classifier and DA-L for resistance prediction across eight drugs and MDR-TB
| Drug | DA-L Sens | DA-L Spec | DA-L AUC | Best classifier (feature set) | Best classifier Sens | Best classifier Spec | Best classifier AUC | Classifier with F1* | AUC with F1* |
|---|---|---|---|---|---|---|---|---|---|
| INH | 93 ± 0.3 | 99 ± 0.1 | 96 ± 0.0 | RF(F1) | 97† ± 0.3 | 94† ± 0.4 | 99† ± 0.0 | RF(F1) | 98† ± 0.0 |
| EMB | 95 ± 0.7 | 97 ± 0.6 | 96 ± 0.1 | CBMM(F2) | 97† ± 1.0 | 96† ± 0.6 | 99† ± 0.1 | PM(F2) | 99† ± 0.1 |
| RIF | 94 ± 0.5 | 98 ± 0.3 | 96 ± 0.1 | CBMM(F3) | 97† ± 0.4 | 97 ± 0.4 | 99† ± 0.1 | CBMM(F3) | 99† ± 0.1 |
| PZA | 69 ± 1.4 | 100 ± 0.0 | 85 ± 0.0 | PM(F1) | 84† ± 1.2 | 90† ± 1.1 | 95† ± 0.2 | SVM-RBF(F1) | 95† ± 0.2 |
| CIP | 87 ± 1.0 | 99 ± 0.4 | 94 ± 0.1 | PM(F2) | 96† ± 0.9 | 98 ± 0.4 | 98† ± 0.3 | PM(F2) | 98† ± 0.3 |
| MOX | 83 ± 1.4 | 93 ± 0.8 | 87 ± 0.1 | PM(F3) | 95† ± 1.4 | 93 ± 1.0 | 95† ± 0.4 | PM(F3) | 94† ± 0.5 |
| OFX | 81 ± 1.5 | 95 ± 0.9 | 87 ± 0.3 | PM(F3) | 96† ± 1.4 | 92 ± 1.3 | 95† ± 0.5 | PM(F3) | 95† ± 0.6 |
| SM | 63 ± 1.8 | 98 ± 0.6 | 81 ± 0.1 | SVM-RBF(F2) | 87† ± 1.5 | 90† ± 1.0 | 91† ± 0.3 | PM(F2) | 92† ± 0.2 |
| MDR | 90 ± 0.7 | 100 ± 0.2 | 95 ± 0.0 | PM(F3) | 96† ± 0.6 | 98† ± 0.5 | 100† ± 0.1 | PM(F3) | 100† ± 0.0 |
Note: ‘D.SNPs’ refers to those SNPs known not to be causally involved with resistance mechanisms, which are removed from the F1 feature set in one experiment. Sensitivity (Sens) and specificity (Spec) are shown with AUC; results are reported as mean ± standard error.
†P < 0.01. The P-value for the performance of the examined classifier compared with DA-L was obtained using the Wilcoxon signed-rank test. Feature set F1* denotes the feature set F1 without D.SNPs.
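To make the table's quantities concrete, the sketch below shows how sensitivity, specificity, and a mean ± standard error summary could be computed for a rule that calls an isolate resistant when it carries any SNP from a known list, as the abstract describes for conventional molecular tests. It is an illustration only: the function names, the SNP labels (`snp_A`, `snp_B`, `snp_C`), and all numbers are hypothetical, and the exact DA-L procedure used in the paper is not specified in this record.

```python
import math
import statistics


def direct_association_predict(isolate_snps, known_resistance_snps):
    """Call an isolate resistant if it carries any SNP from the known list."""
    return any(snp in known_resistance_snps for snp in isolate_snps)


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for boolean labels, True = resistant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def mean_and_standard_error(values):
    """Return (mean, standard error) across repeated evaluation runs."""
    mean = statistics.mean(values)
    if len(values) < 2:
        return mean, 0.0  # standard error is undefined for a single run
    return mean, statistics.stdev(values) / math.sqrt(len(values))


# Toy usage with placeholder SNP labels (not real resistance markers).
known = {"snp_A", "snp_B"}
toy_isolates = [{"snp_A"}, {"snp_C"}, set(), {"snp_B", "snp_C"}]
toy_truth = [True, True, False, True]
toy_pred = [direct_association_predict(s, known) for s in toy_isolates]
toy_sens, toy_spec = sensitivity_specificity(toy_truth, toy_pred)
```

In this toy case the second isolate is resistant but carries no listed SNP, so the rule misses it. This mirrors the abstract's point that relying only on well-studied SNPs limits sensitivity.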