Wouter Deelder, Sofia Christakoudi, Jody Phelan, Ernest Diez Benavente, Susana Campino, Ruth McNerney, Luigi Palla, Taane G Clark.
Abstract
Background: Tuberculosis disease, caused by Mycobacterium tuberculosis, is a major public health problem. The emergence of M. tuberculosis strains resistant to existing treatments threatens to derail control efforts. Resistance is mainly conferred by mutations in genes coding for drug targets or converting enzymes, but our knowledge of these mutations is incomplete. Whole genome sequencing (WGS) is an increasingly common approach to rapidly characterize isolates and identify mutations that predict antimicrobial resistance, thereby providing a diagnostic tool to assist clinical decision making.
Keywords: MDR-TB; Mycobacterium tuberculosis; XDR-TB; drug resistance; machine learning
Year: 2019 PMID: 31616478 PMCID: PMC6775242 DOI: 10.3389/fgene.2019.00922
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Phylogenetic tree *(attached as separate file)*. The tree includes all 16,688 isolates, complemented by additional data from lineages 5–7 and M. bovis. The tree was fitted using a maximum likelihood approach implemented in RAxML (Stamatakis, 2014).
Drug-resistance loci identified in the machine learning models.
| Drug | N tested | Resistant | % | CT-KDG (N) | CT-ALL (N) | GBT-ALL (N) | Overlapping Loci |
|---|---|---|---|---|---|---|---|
| Isoniazid | 16,422 | 5,215 | 31.8 | 2 | 5 | 103 | |
| Rifampicin | 16,507 | 4,462 | 27.0 | 1 | 1 | 39 | |
| Pyrazinamide | 11,968 | 1,813 | 15.1 | 2 | 4 | 116 | |
| Ethambutol | 14,830 | 2,576 | 17.4 | 1 | 10 | 36 | |
| Streptomycin | 5,213 | 1,338 | 25.7 | 4 | 4 | 134 | |
| Amikacin | 1,435 | 335 | 23.3 | 1 | 1 | 35 | |
| Capreomycin | 1,731 | 389 | 22.5 | 1 | 3 | 44 | |
| Kanamycin | 1,843 | 639 | 34.7 | 1 | 2 | 43 | |
| Ciprofloxacin | 400 | 63 | 15.8 | 1 | 1 | 30 | |
| Ofloxacin | 1,993 | 506 | 25.4 | 1 | 1 | 42 | |
| Moxifloxacin | 885 | 104 | 11.8 | 1 | 2 | 36 | |
| Ethionamide | 940 | 329 | 35.0 | 3 | 1 | 60 | |
| Cycloserine | 391 | 105 | 26.9 | 1 | 5 | 44 | |
| PAS | 407 | 43 | 10.6 | 1 | 1 | 54 | |
| MDR-TB | – | 3,748 | 22.5 | 1 | 1 | 82 | |
PAS, para-aminosalicylic acid; CT-KDG is a classification tree (CT) applied to a dataset with SNPs that are known to be associated with drug resistance [derived from Phelan et al. (2019)]; CT-ALL and GBT-ALL are, respectively, a CT and a gradient boosted tree (GBT) applied to a dataset that includes all genome-wide SNPs, except those linked to resistance for other drugs (co-occurrent resistance markers); GBT-CRM is a GBT that is applied to all genome-wide SNPs; MDR-TB is multidrug-resistant TB, that is, resistance to isoniazid and rifampicin. *Total number of nonsynonymous mutations in that gene.
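As a quick consistency check on the table above, the first two numeric columns of each drug row appear to be the number of isolates tested and the number found resistant, with the third column their ratio as a percentage. The sketch below recomputes that percentage for a few rows copied from the table and encodes the MDR-TB definition given in the footnote. The function names `resistance_pct` and `is_mdr` are hypothetical and used only for illustration.

```python
# Rows copied from the table above: drug -> (isolates tested, isolates resistant, reported %).
rows = {
    "Isoniazid": (16422, 5215, 31.8),
    "Rifampicin": (16507, 4462, 27.0),
    "Pyrazinamide": (11968, 1813, 15.1),
    "PAS": (407, 43, 10.6),
}


def resistance_pct(tested: int, resistant: int) -> float:
    """Percentage of tested isolates that were resistant, to one decimal place."""
    return round(100.0 * resistant / tested, 1)


def is_mdr(inh_resistant: bool, rif_resistant: bool) -> bool:
    """MDR-TB as defined in the footnote: resistant to both isoniazid and rifampicin."""
    return inh_resistant and rif_resistant


for drug, (tested, resistant, reported) in rows.items():
    print(f"{drug}: computed {resistance_pct(tested, resistant)}%, reported {reported}%")
```

The MDR-TB row gives no tested count, so its percentage cannot be recomputed from the table alone.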
Sensitivity, specificity, and accuracy for the models (maximum value per prediction measure is bolded).
| Drug | LR-KDG | | | CT-KDG | | | CT-ALL | | | GBT-ALL | | | GBT-CRM | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Sens. | Spec. | Acc. | Sens. | Spec. | Acc. | Sens. | Spec. | Acc. | Sens. | Spec. | Acc. | Sens. | Spec. | Acc. |
| INH | 87.3 | 95.3 | 87.3 | 95.3 | 87.3 | 95.3 | 88.0 | 99.0 | 95.4 | 98.8 | | | | | |
| RIF | 82.8 | 95.1 | 82.8 | 95.1 | 82.8 | 95.1 | 82.8 | 95.1 | 98.9 | | | | | | |
| PZA | 21.6 | 87.2 | 21.6 | 87.2 | 35.2 | 98.5 | 88.2 | 42.8 | 99.2 | 90.0 | 96.1 | | | | |
| EMB | 93.1 | 91.6 | 80.9 | 94.0 | 91.6 | 80.9 | 94.0 | 91.6 | 81.7 | 82.8 | 94.2 | 92.1 | | | |
| STM | 71.6 | 91.1 | 72.3 | 96.5 | 90.3 | 71.2 | 97.3 | 90.6 | 72.3 | 97.3 | 90.9 | 96.0 | | | |
| AMK | | | | | | | | | | | | | | | |
| CAP | 69.6 | 95.5 | 89.6 | 69.6 | 95.5 | 89.6 | 69.6 | 95.5 | 89.6 | 72.1 | 95.8 | 90.4 | | | |
| KAN | 74.4 | 89.7 | 74.4 | 89.7 | 97.8 | 91.8 | 80.8 | 97.8 | 91.3 | 98.2 | | | | | |
| CIP | 85.7 | 96.2 | 85.7 | 96.2 | | | | | | | | | | | |
| OFL | 80.0 | 93.5 | 80.0 | 93.5 | 80.0 | 93.5 | 97.0 | 93.2 | | | | | | | |
| MOX | 93.2 | 90.9 | 93.2 | 90.9 | 46.6 | 53.3 | 96.2 | 92.6 | 53.3 | 97.5 | | | | | |
| ETH | 75.6 | 75.6 | 75.6 | 75.6 | 74.2 | 79.6 | 77.7 | 66.6 | 92.6 | 83.5 | 68.1 | | | | |
| CYS* | 88.6 | 38.4 | 30.7 | 94.3 | 73.4 | 46.1 | 92.4 | 77.2 | 50.0 | 92.4 | | | | | |
| PAS | 0 | 87.8 | 0 | 87.8 | 10.0 | 89.0 | | | | | | | | | |
| MDR | 85.9 | 96.9 | 94.4 | 85.9 | 96.9 | 94.4 | 85.9 | 96.9 | 94.4 | 86.2 | 95.0 | 96.9 | | | |
*No known drug-resistance SNPs for CYS were included in the KDG models; reported outcomes are the performance on the test set; RIF, rifampicin; INH, isoniazid; EMB, ethambutol; PZA, pyrazinamide; CIP, ciprofloxacin; OFL, ofloxacin; MOX, moxifloxacin; AMK, amikacin; KAN, kanamycin; CAP, capreomycin; PAS, para-aminosalicylic acid; CYS, cycloserine; ETH, ethionamide; CT-KDG is a classification tree (CT) fitted to a dataset with SNPs that are known to be associated with drug resistance [derived from Phelan et al. (2019)]; LR-KDG is a logistic regression model applied to the same SNP set as CT-KDG; CT-ALL and GBT-ALL are, respectively, a CT and a gradient boosted tree (GBT) applied to a dataset that includes all genome-wide SNPs, except those linked to resistance for other drugs (co-occurrent resistance markers); GBT-CRM is a GBT that is applied to all genome-wide SNPs; MDR is multidrug-resistant TB.
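For readers unfamiliar with the three measures reported in the table, the following is a minimal sketch of how sensitivity, specificity, and accuracy are conventionally computed from test-set predictions. It uses the standard textbook definitions and is not taken from the authors' code. The encoding (1 for resistant, 0 for susceptible), the function names, and the toy labels are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating 1 as resistant and 0 as susceptible."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place; 0.0 when the denominator is zero."""
    return round(100.0 * numerator / denominator, 1) if denominator else 0.0


def sens_spec_acc(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Sensitivity, specificity, and accuracy, each as a percentage."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = pct(tp, tp + fn)   # resistant isolates correctly called resistant
    specificity = pct(tn, tn + fp)   # susceptible isolates correctly called susceptible
    accuracy = pct(tp + tn, tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Toy labels for illustration only (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(sens_spec_acc(y_true, y_pred))
```

When resistance is rare, accuracy is dominated by the susceptible class, so a model can miss most resistant isolates and still report high accuracy. That is one reason to read the three measures together.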