Hooman Zabeti, Nick Dexter, Amir Hosein Safari, Nafiseh Sedaghat, Maxwell Libbrecht, Leonid Chindelevitch.
Abstract
MOTIVATION: Predicting drug resistance and identifying its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving it requires a transparent, accurate, and flexible predictive model, but the methods currently used for this purpose rarely satisfy all three criteria. On the one hand, approaches based on testing strains against a catalogue of previously identified mutations often yield poor predictive performance; on the other hand, machine learning techniques typically have higher predictive accuracy, but often lack interpretability and may learn patterns that produce accurate predictions for the wrong reasons. Current interpretable methods may either exhibit lower accuracy or lack the flexibility needed to generalize to previously unseen data. CONTRIBUTION: In this paper we propose a novel technique, inspired by group testing and Boolean compressed sensing, that simultaneously yields highly accurate predictions, produces interpretable results, and is flexible enough to be optimized for various evaluation metrics.
Keywords: Drug resistance; Group testing; Integer linear programming; Interpretable machine learning; Rule-based learning; Whole-genome sequencing
Year: 2021 PMID: 34376217 PMCID: PMC8353837 DOI: 10.1186/s13015-021-00198-1
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
Correspondence between group testing and the drug resistance prediction problem
| Term | Meaning: group testing | Meaning: drug resistance | Notation | Domain |
|---|---|---|---|---|
| Row dimension | Number of tests | Number of isolates | | |
| Column dimension | Population size | Number of SNPs/variants | | |
| Sparsity/rule size | Infection prevalence | Number of relevant SNPs | | |
| Design matrix | Test membership | Genotype matrix | | |
| Outcome vector | Test result vector | Phenotype/label vector | | |
| Status vector | Infected/uninfected | Relevant/irrelevant to DR | | |
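The correspondence above can be made concrete with a small sketch. In Boolean group testing, a test is positive when it contains at least one infected individual; under the analogy in the table, an isolate would be predicted resistant when it carries at least one SNP flagged as relevant. The names `predict_resistance`, `genotypes`, and `relevant`, as well as the toy data, are illustrative assumptions and are not taken from the paper.

```python
from typing import List


def predict_resistance(genotypes: List[List[int]], relevant: List[int]) -> List[int]:
    """Boolean OR rule: isolate i is predicted resistant (1) if it carries
    at least one SNP j with relevant[j] == 1, otherwise susceptible (0)."""
    n_snps = len(relevant)
    predictions = []
    for row in genotypes:
        if len(row) != n_snps:
            raise ValueError("each genotype row must have one entry per SNP")
        hit = any(g == 1 and r == 1 for g, r in zip(row, relevant))
        predictions.append(1 if hit else 0)
    return predictions


# Toy genotype matrix: 4 isolates (rows) x 5 SNPs (columns); hypothetical data.
genotypes = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
# Hypothetical status vector: SNPs 1 and 3 (0-based) are marked as relevant.
relevant = [0, 1, 0, 1, 0]

phenotypes = predict_resistance(genotypes, relevant)
print(phenotypes)  # [0, 1, 1, 0]
```

In this toy setting the rule size is 2, the number of SNPs with status 1, which plays the role of the sparsity in the table.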
Summary statistics for our dataset, with a line separating first-line and second-line drugs
| Drug | # of isolates | # of resistant isolates | # of SNPs | # of SNP groups |
|---|---|---|---|---|
| Ethambutol | 6096 | 1407 | 597,133 | 55,164 |
| Isoniazid | 7734 | 3445 | 642,373 | 65,090 |
| Pyrazinamide | 3858 | 754 | 281,432 | 33,942 |
| Rifampicin | 7715 | 2968 | 646,855 | 65,379 |
| Streptomycin | 5125 | 2104 | 542,640 | 45,037 |
| Kanamycin | 2436 | 697 | 391,708 | 21,513 |
| Amikacin | 2033 | 573 | 141,952 | 17,103 |
| Capreomycin | 1991 | 552 | 341,935 | 15,389 |
| Ofloxacin | 2911 | 800 | 407,235 | 23,905 |
| Moxifloxacin | 961 | 129 | 97,700 | 11,927 |
| Ciprofloxacin | 443 | 37 | 43,950 | 5563 |
| Ethionamide | 1516 | 498 | 344,960 | 15,145 |
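The share of resistant isolates differs considerably between drugs, which matters when comparing evaluation metrics across them. The sketch below computes that share for three rows of the table above; the dictionary layout and the function name `resistant_fraction` are illustrative assumptions.

```python
from typing import Dict, Tuple

# (number of isolates, number of resistant isolates), copied from the table above.
counts: Dict[str, Tuple[int, int]] = {
    "Isoniazid": (7734, 3445),
    "Moxifloxacin": (961, 129),
    "Ciprofloxacin": (443, 37),
}


def resistant_fraction(n_isolates: int, n_resistant: int) -> float:
    """Fraction of isolates labelled resistant for one drug."""
    if n_isolates <= 0 or not 0 <= n_resistant <= n_isolates:
        raise ValueError("counts must satisfy 0 <= n_resistant <= n_isolates, n_isolates > 0")
    return n_resistant / n_isolates


for drug, (n_isolates, n_resistant) in counts.items():
    share = resistant_fraction(n_isolates, n_resistant)
    print(f"{drug}: {share:.1%} resistant")
```

For these rows the share ranges from roughly 8% (Ciprofloxacin) to roughly 45% (Isoniazid), so a classifier that always predicts "susceptible" would score very differently on plain accuracy from one drug to the next.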
Balanced accuracy of all the methods in predicting drug resistance to 12 drugs
| Drug | INGOT-DR | KOVER | LR-l1 | LR-l2 | RF | SVM-l1 | SVM-l2 |
|---|---|---|---|---|---|---|---|
| Isoniazid | 0.898 | 0.889 | 0.877 | 0.801 | 0.899 | 0.880 | |
| Rifampicin | 0.909 | 0.904 | 0.894 | 0.826 | 0.920 | 0.902 | |
| Ethambutol | 0.809 | 0.805 | 0.833 | 0.816 | 0.781 | 0.835 | |
| Pyrazinamide | 0.860 | 0.862 | 0.829 | 0.796 | 0.841 | 0.844 | |
| Streptomycin | 0.826 | 0.839 | 0.852 | 0.840 | 0.792 | 0.847 | |
| Kanamycin | 0.856 | 0.838 | 0.845 | 0.805 | 0.859 | 0.838 | |
| Amikacin | 0.843 | 0.817 | 0.853 | 0.785 | 0.853 | 0.851 | |
| Capreomycin | 0.826 | 0.836 | 0.812 | 0.764 | 0.826 | 0.812 | |
| Ethionamide | 0.734 | 0.736 | 0.715 | 0.704 | 0.659 | 0.702 | |
| Ofloxacin | 0.912 | 0.908 | 0.909 | 0.840 | 0.788 | 0.845 | |
| Moxifloxacin | 0.834 | 0.912 | 0.803 | 0.820 | 0.918 | 0.803 | |
| Ciprofloxacin | 0.780 | 0.720 | 0.623 | 0.774 | 0.714 | | |
Maximum values are shown in bold
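Balanced accuracy is commonly defined as the arithmetic mean of sensitivity and specificity, which keeps the score informative when resistant isolates are a small minority. The following is a minimal sketch of that standard definition; the function name and the toy labels are assumptions, and the evaluation code used for the table may differ in detail.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = resistant)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of resistant isolates detected
    specificity = tn / (tn + fp)  # fraction of susceptible isolates cleared
    return (sensitivity + specificity) / 2


# Toy labels: 2 resistant and 8 susceptible isolates (hypothetical data).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(balanced_accuracy(y_true, y_pred))  # 0.6875
```

Here sensitivity is 0.5 and specificity is 0.875, giving 0.6875, whereas plain accuracy on the same labels would be 0.8.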
Fig. 1 Sensitivity and specificity of all the methods in predicting drug resistance to 12 drugs
Number of SNPs involved in the prediction made by each model for each drug
| Drug | INGOT-DR | KOVER | LR-l1 | LR-l2 | RF | SVM-l1 | SVM-l2 |
|---|---|---|---|---|---|---|---|
| Isoniazid | 20 | 20 | 1045 | 62,707 | 22,336 | 626 | 54,630 |
| Rifampicin | 20 | 20 | 739 | 63,621 | 29,373 | 476 | 52,732 |
| Ethambutol | 20 | 19 | 154 | 53,476 | 19,864 | 661 | 43,094 |
| Pyrazinamide | 20 | 17 | 114 | 32,885 | 9495 | 428 | 25,485 |
| Streptomycin | 20 | 13 | 5804 | 43,771 | 23,996 | 594 | 40,183 |
| Kanamycin | 20 | 20 | 2383 | 20,934 | 9314 | 231 | 18,716 |
| Amikacin | 20 | 19 | 2252 | 16,622 | 7639 | 212 | 14,260 |
| Capreomycin | 20 | 20 | 2103 | 14,907 | 7881 | 234 | 13,432 |
| Ethionamide | 20 | 20 | 41 | 14,791 | 7777 | 280 | 13,551 |
| Ofloxacin | 20 | 17 | 394 | 23,206 | 14,312 | 265 | 19,694 |
| Moxifloxacin | 12 | 7 | 29 | 11,678 | 1371 | 125 | 10,237 |
| Ciprofloxacin | 5 | 5 | 18 | 5448 | 325 | 29 | 4343 |
Fig. 2 Top SNPs chosen by each model, categorized by association with drug resistance