Samaneh Kouchaki, Yang Yang, Timothy M Walker, A Sarah Walker, Daniel J Wilson, Timothy E A Peto, Derrick W Crook, David A Clifton.
Abstract
MOTIVATION: Timely identification of Mycobacterium tuberculosis (MTB) resistance to existing drugs is vital to decrease mortality and prevent the amplification of existing antibiotic resistance. Machine learning methods have been widely applied to predict MTB resistance to a specific drug in a timely manner and to identify resistance markers. However, they have not been validated on a large cohort of MTB samples from multiple centers across the world in terms of resistance prediction and resistance marker identification. Several machine learning classifiers and linear dimension reduction techniques were developed and compared on a cohort of 13 402 isolates collected from 16 countries across 6 continents and tested against 11 drugs.
Year: 2019 PMID: 30462147 PMCID: PMC6596891 DOI: 10.1093/bioinformatics/bty949
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
The phenotype profile; the number of isolates that are resistant or susceptible
| Drug | INH | EMB | RIF | PZA | SM | KAN | AK | CAP | CIP | OFX | MOX |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Susceptible | 9620 | 11 322 | 10 359 | 9806 | 5105 | 1925 | 2690 | 2741 | 529 | 2618 | 1249 |
| Resistant | 3457 | 1571 | 2808 | 1262 | 1729 | 242 | 273 | 315 | 77 | 458 | 262 |
| Total tested | 13 077 | 12 893 | 13 167 | 11 068 | 6834 | 2167 | 2963 | 3056 | 606 | 3076 | 1511 |
| Missing | 325 | 509 | 235 | 2334 | 6568 | 11 235 | 10 439 | 10 346 | 12 796 | 10 326 | 11 891 |
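The counts in this table can be checked for internal consistency and turned into per-drug resistance rates. The following is a minimal sketch, not part of the study's pipeline: the numbers are copied from the table, the cohort size of 13 402 comes from the abstract, and the function name `phenotype_summary` is a hypothetical helper introduced here for illustration.

```python
# Counts copied from the phenotype table (one entry per drug, same order).
DRUGS = ["INH", "EMB", "RIF", "PZA", "SM", "KAN", "AK", "CAP", "CIP", "OFX", "MOX"]
SUSCEPTIBLE = [9620, 11322, 10359, 9806, 5105, 1925, 2690, 2741, 529, 2618, 1249]
RESISTANT = [3457, 1571, 2808, 1262, 1729, 242, 273, 315, 77, 458, 262]
TOTAL_TESTED = [13077, 12893, 13167, 11068, 6834, 2167, 2963, 3056, 606, 3076, 1511]
MISSING = [325, 509, 235, 2334, 6568, 11235, 10439, 10346, 12796, 10326, 11891]
COHORT_SIZE = 13402  # number of isolates, as stated in the abstract


def phenotype_summary():
    """Return {drug: (resistant_fraction, tested_fraction)}.

    Raises ValueError if a column of the table does not add up.
    """
    summary = {}
    for drug, s, r, t, m in zip(DRUGS, SUSCEPTIBLE, RESISTANT, TOTAL_TESTED, MISSING):
        if s + r != t:
            raise ValueError(f"{drug}: susceptible + resistant != total tested")
        if t + m != COHORT_SIZE:
            raise ValueError(f"{drug}: total tested + missing != cohort size")
        # Fraction resistant among tested isolates, and fraction of the cohort tested.
        summary[drug] = (r / t, t / COHORT_SIZE)
    return summary


if __name__ == "__main__":
    for drug, (resistant, tested) in phenotype_summary().items():
        print(f"{drug:>3}: {100 * resistant:5.1f}% resistant among tested, "
              f"{100 * tested:5.1f}% of cohort tested")
```

The two ratios make the class imbalance and the sparse second-line drug testing explicit, both of which matter when reading the per-drug sensitivity and AUC figures.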
Fig. 1. Classification performance (AUC%) considering six machine learning classifiers (LR with L1 and L2 regularization terms, SVM with linear and RBF kernels, RF, Adaboost, PM and GBT) across 11 anti-TB drugs and F1 feature space
Fig. 2. Classification performance (AUC%) considering six machine learning classifiers across 11 anti-TB drugs with (a) SNMF-F1 and (b) SPCA-F1
Comparing the best machine learning classifier and DA considering 11 drugs
| Drug | DA sensitivity | DA specificity | DA AUC | Feature set + classifier | ML sensitivity | ML specificity | ML AUC |
|---|---|---|---|---|---|---|---|
| INH | 91.95 ± 1.04 | 98.71 ± 0.22 | 94.95 ± 0.54 | F1 + LR – L2 | 92.19◊ ± 0.94 | 98.38 ± 0.29 | 97.89◊ ± 0.38 |
| EMB | 83.31 ± 1.62 | 95.17 ± 0.38 | 89.24 ± 0.85 | F1 + LR – L2 | 92.12◊ ± 0.98 | 91.89 ± 0.84 | 96.25◊ ± 0.54 |
| RIF | 91.70 ± 1.19 | 98.73 ± 0.22 | 95.22 ± 0.59 | F1 + LR – L2 | 92.27◊ ± 1.25 | 97.45 ± 0.63 | 98.08◊ ± 0.32 |
| PZA | 43.11 ± 2.97 | 98.46 ± 0.27 | 70.78 ± 1.46 | F1 + LR – L2 | 88.12◊ ± 2.65 | 88.91 ± 1.66 | 93.89◊ ± 0.80 |
| SM | 82.80 ± 1.90 | 97.19 ± 0.44 | 89.99 ± 0.99 | F1 + LR – L2 | 87.40◊ ± 1.98 | 94.15 ± 1.23 | 95.15◊ ± 0.56 |
| AK | 65.21 ± 5.32 | 99.70 ± 0.24 | 82.46 ± 2.70 | F1 + SPCA + LR – L2 | 77.23◊ ± 6.96 | 89.84 ± 3.05 | 91.37◊ ± 2.36 |
| MOX | 62.97 ± 6.60 | 98.80 ± 0.68 | 80.89 ± 3.32 | F1 + GBT | 76.84◊ ± 9.29 | 87.19 ± 8.21 | 90.27◊ ± 2.96 |
| OFX | 65.07 ± 3.92 | 99.31 ± 0.28 | 82.19 ± 1.98 | F1 + GBT | 79.06◊ ± 6.94 | 90.88 ± 6.38 | 92.33◊ ± 1.49 |
| KAN | 72.31 ± 5.40 | 97.61 ± 0.65 | 84.96 ± 2.68 | F1 + LR – L2 | 80.41◊ ± 6.48 | 93.48 ± 4.93 | 92.49◊ ± 2.93 |
| CAP | 59.68 ± 5.84 | 93.87 ± 0.88 | 76.78 ± 2.96 | F1 + SPCA + LR – L2 | 64.44◊ ± 6.02 | 92.74 ± 2.52 | 85.46◊ ± 2.02 |
| CIP | 46.65 ± 10.10 | 99.24 ± 0.89 | 72.95 ± 5.17 | F1 + LR – L2 | 79.86◊ ± 9.98 | 85.37 ± 7.65 | 89.53◊ ± 4.06 |
Note: Sensitivity, specificity and AUC (mean ± standard error) are reported. The Wilcoxon signed-rank test was used to calculate the P-value of each method compared with the DA, and ◊ indicates P < 0.01.
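For readers less familiar with the three metrics reported in the table, the sketch below shows how each is computed from binary labels. It is an illustration only: the labels, scores and the 0.5 decision threshold are made up, the function names are hypothetical, and none of it reproduces the study's data or code. Labels use 1 for resistant and 0 for susceptible.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = resistant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Sensitivity: resistant isolates correctly called resistant.
    # Specificity: susceptible isolates correctly called susceptible.
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen resistant isolate gets a
    higher score than a randomly chosen susceptible one (ties count half).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for illustration.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

    sens, spec = sensitivity_specificity(y_true, y_pred)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc(y_true, scores):.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a method can trade specificity for sensitivity while still showing a higher AUC.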
Top 10 mutations ranked by top performing classifier for each drug
| INH | EMB | RIF | PZA | SM | AK | OFX | MOX |
|---|---|---|---|---|---|---|---|
| katG_V473L | rrs_A906G | gidB_G62G | gyrB_E540D | katG_S315T | pncA_H57R | pncA_F13L | embB_D869E |

Only the row above survives with all eight columns. The remaining entries are listed in source order, grouped as they appear, because their column positions are not recoverable: embB_Y319S; katG_S315T, katG_S315T; pncA_H51D; gyrA_E21Q*; gyrA_D94A; gidB_S100F*; embB_R507R, pncA_D136G; rrs_C513T; katG_S315N; katG_S315T; gidB_G62G; rpoB_S450L; rpsA_A381V; eis_C-14T; gyrB_G77S, pncA_D12N, gyrA_S95T*; rpoB_V170F; pncA_H51D, inhA_I194T, pncA_T142A; gyrA_E21Q*, gyrA_E21Q*, embB_M306I; rpoB_C-61T*; embA_Q38Q, ndh_Y108C; rpoB_D435V; rrs_G878A, embB_Y334H; embB_Y334H, iniC_T89I, katG_C-85T; rpoB_S450W; gyrA_K542K, pncA_P62S, gyrA_E21Q*, gyrB_D500N; rpoB_C-61T*, rpoB_V168A; rpsA_A381V.
Note: Mutations associated with resistance or susceptibility to the given drug are indicated in boldface (susceptibility-associated mutations are marked with +). The other mutations are either known to be related to other drugs, are lineage related (indicated by *), or are not in the library (indicated by ). fabG1_L203L is a misnomer: fabG1 and inhA are essentially contiguous, and the L203L mutation acts as a mutation in the promoter region of inhA, increasing expression of the inhA gene (Kandler ).