Salma Jamal, Mohd Khubaib, Rishabh Gangwar, Sonam Grover, Abhinav Grover, Seyed E Hasnain.
Abstract
Tuberculosis (TB), an infectious disease caused by Mycobacterium tuberculosis (M.tb), causes the highest number of deaths globally of any bacterial disease, necessitating novel diagnostic and treatment strategies. High-throughput sequencing methods generate large amounts of data that could be exploited to identify mutations associated with multi-drug resistant TB (MDR-TB). The present work describes a computational framework that uses artificial intelligence (AI) based machine learning (ML) approaches to predict drug resistance from mutations in the genes rpoB, inhA, katG, pncA, gyrA and gyrB for the drugs rifampicin, isoniazid, pyrazinamide and fluoroquinolones. The single nucleotide variations were represented by several sequence and structural features that indicate the influence of each mutation on the target protein coded by the gene. We built the prediction models with four ML algorithms: naïve Bayes, k-nearest neighbor, support vector machine and artificial neural network. The classification models had an average accuracy of 85% across all examined genes and were evaluated on an external unseen dataset to demonstrate their application. Further, molecular docking and molecular dynamics simulations were performed on complexes of anti-TB drugs with wild-type and predicted resistance-causing mutant proteins to study the impact of the mutations on protein conformation and to confirm the observed phenotype.
Year: 2020 PMID: 32218465 PMCID: PMC7099008 DOI: 10.1038/s41598-020-62368-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Total number of variations obtained from the TBDReaMDB and GMTV databases for each TB drug, and the number of mutations retained after data preprocessing.
| Drug | Gene | TBDReaMDB | GMTV | Final variations |
|---|---|---|---|---|
| Rifampin | rpoB | 134 | 198 | 114 |
| Isoniazid | inhA | 13 | 30 | 27 |
| Isoniazid | katG | 273 | 83 | 250 |
| Pyrazinamide | pncA | 278 | 137 | 241 |
| Fluoroquinolones | gyrA | 17 | 112 | 73 |
| Fluoroquinolones | gyrB | 18 | 72 | 49 |
Number of mutations per gene included in the final training and testing datasets, with the numbers of resistant and susceptible mutations in each.
| Drug | Gene | Training: resistant | Training: susceptible | Training: total | Testing: resistant | Testing: susceptible | Testing: total |
|---|---|---|---|---|---|---|---|
| Rifampin | rpoB | 40 | 52 | 92 | 10 | 12 | 22 |
| Isoniazid | inhA | 8 | 14 | 22 | 2 | 3 | 5 |
| Isoniazid | katG | 108 | 92 | 200 | 27 | 23 | 50 |
| Pyrazinamide | pncA | 112 | 81 | 193 | 27 | 21 | 48 |
| Fluoroquinolones | gyrA | 25 | 33 | 58 | 6 | 8 | 14 |
| Fluoroquinolones | gyrB | 23 | 17 | 40 | 5 | 4 | 9 |
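The table above holds out roughly one-fifth of each gene's mutations for testing while keeping the resistant/susceptible ratio similar in both sets. The source does not say how the split was drawn, so the sketch below is only an assumption: a stratified random split that shuffles each class separately and holds out `test_fraction` of it. The helper `stratified_split`, its parameters and the mutation labels are hypothetical.

```python
import random


def stratified_split(resistant, susceptible, test_fraction=0.2, seed=0):
    """Hold out test_fraction of each class separately (hypothetical helper).

    Returns (train, test) as lists of (mutation, label) pairs, where the
    label is "R" for resistant and "S" for susceptible.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label, items in (("R", resistant), ("S", susceptible)):
        items = list(items)
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test += [(m, label) for m in items[:n_test]]
        train += [(m, label) for m in items[n_test:]]
    return train, test


# Class totals for rpoB from the table: 40 + 10 resistant, 52 + 12 susceptible
train, test = stratified_split([f"R{i}" for i in range(50)],
                               [f"S{i}" for i in range(64)])
print(len(train), len(test))  # 91 23
```

With the rpoB class totals this rounds to 10 resistant and 13 susceptible test mutations, one more than the 22 reported, so it does not exactly reproduce the authors' split.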
The performance of the classification models on the training dataset using 10-fold cross-validation.
| Gene | Measure | NB | SVM | ANN | kNN |
|---|---|---|---|---|---|
| rpoB | Accuracy | 88.04% | 84.78% | 95.65% | 86.95% |
| rpoB | AUC | 0.92 | 0.83 | 0.99 | 0.87 |
| inhA | Accuracy | 90.90% | 63.63% | 95.45% | 90.90% |
| inhA | AUC | 0.75 | 0.5 | 1 | 0.9 |
| katG | Accuracy | 84% | 78.50% | 98.50% | 92% |
| katG | AUC | 0.94 | 0.77 | 0.99 | 0.91 |
| pncA | Accuracy | 83.93% | 75.12% | 99.48% | 90.67% |
| pncA | AUC | 0.96 | 0.76 | 1 | 0.9 |
| gyrA | Accuracy | 75.86% | 72.41% | 86.20% | 81.03% |
| gyrA | AUC | 0.86 | 0.71 | 0.95 | 0.82 |
| gyrB | Accuracy | 82.50% | 70% | 97.50% | 97.50% |
| gyrB | AUC | 0.91 | 0.68 | 0.96 | 0.97 |
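The table above reports 10-fold cross-validation: the training mutations are divided into ten folds, and each mutation is predicted exactly once by a model built on the other nine folds. Below is a minimal pure-Python sketch of that loop using a from-scratch k-nearest-neighbor classifier, one of the four algorithms in the study. Assumptions not stated in the source: features are numeric descriptor vectors, distance is Euclidean, `k=3`, and folds are formed by striding through the data rather than by shuffling. `knn_predict` and `cross_val_accuracy` are hypothetical names.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, n_folds=10, k=3):
    """Each sample is held out once and predicted from the other folds.

    Folds are built by striding through the indices (no shuffling).
    """
    n = len(X)
    correct = 0
    for f in range(n_folds):
        held_out = set(range(f, n, n_folds))
        train_X = [X[i] for i in range(n) if i not in held_out]
        train_y = [y[i] for i in range(n) if i not in held_out]
        for i in held_out:
            correct += knn_predict(train_X, train_y, X[i], k) == y[i]
    return correct / n


# Toy data: two well-separated clusters of hypothetical 2-feature vectors
X = [(i * 0.1, 0.0) for i in range(10)] + [(10 + i * 0.1, 10.0) for i in range(10)]
y = ["S"] * 10 + ["R"] * 10
print(cross_val_accuracy(X, y))  # 1.0
```

Striding keeps the example deterministic; a real pipeline would normally shuffle and stratify the folds.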
The performance of the classification models on the non-redundant testing data set.
| Gene | Measure | NB | SVM | ANN | kNN |
|---|---|---|---|---|---|
| rpoB | Accuracy | 90.90% | 86.36% | 90.90% | 95.45% |
| rpoB | AUC | 0.97 | 0.85 | 1 | 0.95 |
| inhA | Accuracy | 100% | 60% | 81.81% | 100% |
| inhA | AUC | 1 | 0.5 | 0.92 | 1 |
| katG | Accuracy | 98% | 70% | 98% | 96% |
| katG | AUC | 0.98 | 0.69 | 1 | 0.97 |
| pncA | Accuracy | 93.75% | 81.25% | 97.91% | 97.91% |
| pncA | AUC | 0.97 | 0.82 | 1 | 0.98 |
| gyrA | Accuracy | 92.85% | 78.57% | 100% | 85.71% |
| gyrA | AUC | 0.97 | 0.77 | 1 | 0.86 |
| gyrB | Accuracy | 66.66% | 77.77% | 88.88% | 88.88% |
| gyrB | AUC | 0.75 | 0.8 | 1 | 0.92 |
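The two measures in the tables above can be computed as follows. Accuracy is the fraction of correct calls; for example, an accuracy of 90.90% on the 22 rpoB test mutations corresponds to 20 correct predictions. AUC is computed here with the rank (Mann-Whitney) formulation: the probability that a randomly chosen resistant mutation receives a higher score than a randomly chosen susceptible one, with ties counted as one half. This is a standard equivalent of the area under the ROC curve, not necessarily the authors' implementation; the label encoding (1 = resistant, 0 = susceptible) is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Rank-based AUC: P(score of a resistant case > score of a susceptible case).

    Labels: 1 = resistant, 0 = susceptible (assumed encoding). Ties count 1/2.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise loop is quadratic in the number of mutations, which is negligible for test sets of 5 to 50 mutations like those here.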
Figure 1. ROC plots for all models generated for genes (A) rpoB, (B) pncA, (C) inhA, (D) katG, (E) gyrA and (F) gyrB.
Docking scores of drug-bound wild-type and mutant proteins.
| Gene | Drug | Wild type and mutants | Glide docking score (kcal/mol) |
|---|---|---|---|
| katG | Isoniazid | wild type | −4.41 |
| katG | Isoniazid | L587I | −4.29 |
| katG | Isoniazid | N238K | −4.09 |
| katG | Isoniazid | L634F | −4.30 |
| katG | Isoniazid | L619P | −4.17 |
| pncA | Pyrazinamide | wild type | −4.20 |
| pncA | Pyrazinamide | L96E | −3.55 |
| pncA | Pyrazinamide | V155M | −3.48 |
| gyrA (N-terminal) | Ofloxacin | wild type | −3.18 |
| gyrA (N-terminal) | Moxifloxacin | wild type | −2.17 |
| gyrA (N-terminal) | Ciprofloxacin | wild type | −2.86 |
| gyrA (N-terminal) | Ofloxacin | L711M | −1.14 |
| gyrA (N-terminal) | Moxifloxacin | L711M | −0.09 |
| gyrA (N-terminal) | Ciprofloxacin | L711M | −2.39 |
| gyrA (C-terminal) | Ofloxacin | wild type | −2.72 |
| gyrA (C-terminal) | Moxifloxacin | wild type | −3.00 |
| gyrA (C-terminal) | Ciprofloxacin | wild type | −3.52 |
| gyrA (C-terminal) | Ofloxacin | Q431E | −2.34 |
| gyrA (C-terminal) | Moxifloxacin | Q431E | −2.24 |
| gyrA (C-terminal) | Ciprofloxacin | Q431E | −2.85 |
| gyrB | Ofloxacin | wild type | −4.48 |
| gyrB | Moxifloxacin | wild type | −4.15 |
| gyrB | Ciprofloxacin | wild type | −4.07 |
| gyrB | Ofloxacin | N499T | −3.86 |
| gyrB | Moxifloxacin | N499T | −3.79 |
| gyrB | Ciprofloxacin | N499T | −2.05 |
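A quick way to read the docking table is to subtract each wild-type score from the corresponding mutant score. Glide scores are more negative for more favorable poses, so a positive shift suggests weaker predicted binding to the mutant. The sketch below does this for the isoniazid (katG, per Figures 3 and 4) and pyrazinamide (pncA) rows; the dictionary layout and the `score_shifts` helper are illustrative, not part of the original analysis.

```python
# Glide docking scores (kcal/mol) copied from the docking table.
DOCKING = {
    "katG/isoniazid": {"wild type": -4.41, "L587I": -4.29, "N238K": -4.09,
                       "L634F": -4.30, "L619P": -4.17},
    "pncA/pyrazinamide": {"wild type": -4.20, "L96E": -3.55, "V155M": -3.48},
}


def score_shifts(scores):
    """Mutant score minus wild-type score; positive = less favorable pose."""
    wt = scores["wild type"]
    return {m: round(s - wt, 2) for m, s in scores.items() if m != "wild type"}


print(score_shifts(DOCKING["pncA/pyrazinamide"]))  # {'L96E': 0.65, 'V155M': 0.72}
```

Every katG and pncA mutant scores less favorably than its wild type, with the pncA shifts (0.65 and 0.72 kcal/mol) larger than the katG ones (0.11 to 0.32 kcal/mol).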
Figure 2. RMSD, Rg and SASA plots for pncA. RMSD, Rg and SASA were lower for the wild type, indicating that the mutations destabilized the protein.
Figure 3. RMSD, Rg and SASA plots for katG. The RMSD, Rg and SASA of the mutants were higher than those of the wild type, indicating that the wild-type protein was more stable.
Average root-mean-square deviation (RMSD), radius of gyration (Rg) and solvent accessible surface area (SASA) of wild-type and mutant protein-drug complexes over the entire MD simulation run.
| Gene | Drug | Wild type and mutants | RMSD (nm) | Rg (nm) | SASA (nm²) |
|---|---|---|---|---|---|
| katG | Isoniazid | wild type | 0.25 | 2.80 | 316.90 |
| katG | Isoniazid | L587I | 0.28 | 2.88 | 327.19 |
| katG | Isoniazid | N238K | 0.28 | 2.88 | 325.70 |
| katG | Isoniazid | L634F | 0.27 | 2.80 | 309.39 |
| katG | Isoniazid | L619P | 0.35 | 2.84 | 334.17 |
| pncA | Pyrazinamide | wild type | 0.20 | 1.57 | 93.10 |
| pncA | Pyrazinamide | L96E | 0.33 | 1.61 | 95.73 |
| pncA | Pyrazinamide | V155M | 0.35 | 1.63 | 95.49 |
| gyrA (N-terminal) | Ofloxacin | wild type | 0.18 | 1.97 | 157.29 |
| gyrA (N-terminal) | Moxifloxacin | wild type | 0.18 | 1.94 | 155.76 |
| gyrA (N-terminal) | Ciprofloxacin | wild type | 0.18 | 1.93 | 154.25 |
| gyrA (N-terminal) | Ofloxacin | L711M | 0.20 | 1.92 | 153.61 |
| gyrA (N-terminal) | Moxifloxacin | L711M | 0.18 | 1.92 | 152.33 |
| gyrA (N-terminal) | Ciprofloxacin | L711M | 0.19 | 1.92 | 153.59 |
| gyrA (C-terminal) | Ofloxacin | wild type | 0.28 | 2.95 | 270.43 |
| gyrA (C-terminal) | Moxifloxacin | wild type | 0.27 | 2.95 | 268.73 |
| gyrA (C-terminal) | Ciprofloxacin | wild type | 0.27 | 2.96 | 269.43 |
| gyrA (C-terminal) | Ofloxacin | Q431E | 0.29 | 2.97 | 267.41 |
| gyrA (C-terminal) | Moxifloxacin | Q431E | 0.24 | 7.27 | 259.24 |
| gyrA (C-terminal) | Ciprofloxacin | Q431E | 0.26 | 2.98 | 262.34 |
| gyrB | Ofloxacin | wild type | 0.19 | 1.96 | 142.37 |
| gyrB | Moxifloxacin | wild type | 0.22 | 1.93 | 144.99 |
| gyrB | Ciprofloxacin | wild type | 0.26 | 1.96 | 144.32 |
| gyrB | Ofloxacin | N499T | 0.23 | 1.94 | 144.17 |
| gyrB | Moxifloxacin | N499T | 0.28 | 1.95 | 144.61 |
| gyrB | Ciprofloxacin | N499T | 0.20 | 1.93 | 142.66 |
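RMSD and Rg in the table follow standard definitions: RMSD is the root-mean-square distance between matching atoms of two conformations, and Rg is the mass-weighted root-mean-square distance of atoms from their center of mass. The pure-Python sketch below computes both for a single frame. It assumes the two structures are already superimposed (no least-squares fitting is done) and uses unit masses unless masses are given; the averages in the table would come from evaluating such quantities over every frame of the trajectory. SASA requires a surface algorithm and is omitted. The function names are hypothetical.

```python
import math


def rmsd(a, b):
    """sqrt((1/N) * sum |a_i - b_i|^2) for two pre-aligned conformations.

    No least-squares superposition is performed here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate sets of equal length")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def radius_of_gyration(coords, masses=None):
    """sqrt(sum m_i |r_i - r_com|^2 / sum m_i); unit masses if none given."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    # Mass-weighted center of mass, one coordinate axis at a time
    com = [sum(m * r[d] for m, r in zip(masses, coords)) / total for d in range(3)]
    return math.sqrt(sum(m * math.dist(r, com) ** 2
                         for m, r in zip(masses, coords)) / total)
```

Shifting every atom of a structure by 0.3 nm along one axis gives an RMSD of 0.3 nm, which is why real analyses superimpose frames before computing RMSD.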
Figure 4. Interaction patterns of isoniazid with (A) wild-type and (B) L587I, (C) L619P, (D) L634F and (E) N238K mutant proteins. The drug bound to the protein only through hydrophobic interactions; however, stronger binding was observed for the wild-type protein.
Figure 5. Hydrogen bonding and hydrophobic interactions in (A) wild-type, (B) L96E and (C) V155M mutant protein-pyrazinamide complexes. Fewer interacting residues were observed in the mutants than in the wild type.
Figure 6. RMSD, Rg and SASA plots for the N-terminal protein of gyrA. The RMSD, Rg and SASA plots of the L711M mutant were similar to those of the wild type.
Figure 7. RMSD, Rg and SASA plots for the C-terminal protein of gyrA. For the Q431E mutant, RMSD and Rg were slightly higher than for the wild type, whereas SASA was lower.
Figure 8. Interaction patterns between the N-terminal domain of gyrase A and fluoroquinolones: wild type with (A) ofloxacin, (B) moxifloxacin and (C) ciprofloxacin; L711M mutant with (D) ofloxacin, (E) moxifloxacin and (F) ciprofloxacin. The wild-type protein formed hydrogen bonds with the drugs, whereas no hydrogen bonds were present in the mutant protein-drug complexes.
Figure 9. Interaction patterns between the C-terminal domain of gyrase A and fluoroquinolones: wild type with (A) ofloxacin, (B) moxifloxacin and (C) ciprofloxacin; Q431E mutant with (D) ofloxacin, (E) moxifloxacin and (F) ciprofloxacin. More residues interacted with the drugs in the wild-type protein than in the mutant protein-drug complexes.
Figure 10. RMSD, Rg and SASA plots for gyrB. RMSD was higher for the mutant, while Rg and SASA were similar for the wild type and the mutant, suggesting that the mutation had little impact on the protein.
Figure 11. Hydrogen bonding and hydrophobic interactions between gyrase B and fluoroquinolones: wild type with (A) ofloxacin, (B) moxifloxacin and (C) ciprofloxacin; N499T mutant with (D) ofloxacin, (E) moxifloxacin and (F) ciprofloxacin. In the mutant protein complexes, only weak hydrophobic interactions were seen.
Mutations from the external blind dataset predicted by our models to cause resistance.
| Gene | Mutations |
|---|---|
| rpoB | F430S, G432D, G432S, S434R, Q435K, L436R, S437R, Q438K, Q438R, D441E, D441N, N444K, L449S, H451D, H451N, H451Q, H451R, H451S, H451T, P460S, I486T |
| inhA | I16T, I21T, I47T, I95T, I194T |
| katG | L48Q, A61T, A65T, A66P, I71N, M84I, Q88R, G99E, A106V, W107R, H108D, H108E, H108Q, A109V, A110V, G121V, M126I, A139P, L148R, Y155S, A162T, G169A, A172T, A172V, M176I, G186V, W191R, G234E, G234R, A243S, M257T, M257I, T262R, A264T, G279D, A281V, G285D, A291P, G297V, G299A, W300G, Y304S, G305A, G307R, G307A, G307E, G309S, G309D, G316S, G316D, W321R, W321L, W321S, W328G, W328L, W328S, I335T, L336R, W341S, A350T, A350S, A361D, A379V, L384R, I393N, A409R, A409D, A424E, A424V, P429S, A444T, I462T, G485V, W505S, W505R, A550D, F567S, A574E, A574V, P589T, G593D, M609I, G629S, A636E, G685R, G699Q, A713P, A716P, A727D |
| pncA | A146T, A171E, A171T, G162D, L159R, L182S, S179R, T142K, T153N, T168N |
| gyrA | A74S, G88A |
| gyrB | G509A, N538K, A543T, A543V |
The types of descriptors used for the generation of machine learning models.
| Sequence properties | Structural properties |
|---|---|
| Molecular weight; polarity; hydrophobicity; van der Waals volume; residue type; isoelectric point | Solvent accessible surface area; secondary structure at the mutation site in the experimental structure; ΔΔG |
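As an illustration of the sequence descriptors, the sketch below turns a mutation string such as `L587I` into its position and the change in two residue properties. The scales are assumptions, since the source does not name them: Kyte-Doolittle hydropathy stands in for hydrophobicity, and average free amino-acid molecular weights (g/mol) stand in for molecular weight. Structural descriptors (SASA, secondary structure, ΔΔG) need a protein structure or a stability predictor and are not covered. `mutation_features` is a hypothetical helper.

```python
import re

# ASSUMED scales (the source does not name them):
# Kyte-Doolittle hydropathy and average free amino-acid molecular weight (g/mol).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
MOL_WEIGHT = {
    "A": 89.09, "R": 174.20, "N": 132.12, "D": 133.10, "C": 121.16,
    "Q": 146.15, "E": 147.13, "G": 75.07, "H": 155.16, "I": 131.17,
    "L": 131.17, "K": 146.19, "M": 149.21, "F": 165.19, "P": 115.13,
    "S": 105.09, "T": 119.12, "W": 204.23, "Y": 181.19, "V": 117.15,
}


def mutation_features(mutation):
    """Parse e.g. "L587I" into position and property changes (mutant - wild type)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None or match[1] not in HYDROPATHY or match[3] not in HYDROPATHY:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wt, pos, mut = match[1], int(match[2]), match[3]
    return {
        "position": pos,
        "delta_hydropathy": round(HYDROPATHY[mut] - HYDROPATHY[wt], 2),
        "delta_mol_weight": round(MOL_WEIGHT[mut] - MOL_WEIGHT[wt], 2),
    }


print(mutation_features("V155M"))
# {'position': 155, 'delta_hydropathy': -2.3, 'delta_mol_weight': 32.06}
```

Encoding each mutation as property differences, rather than raw residue identities, gives the classifiers a fixed-length numeric vector regardless of which residues are involved.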