Tra-My Ngo, Yik-Ying Teo.
Abstract
BACKGROUND: Whether a tuberculosis (TB) patient will fail to respond to specific antibiotics can be predicted by sequencing the genome of the infecting Mycobacterium tuberculosis (Mtb) and checking whether the pathogen carries specific mutations at drug-resistance sites. This advance has led to the collation of TB databases such as PATRIC and ReSeqTB, which hold both whole genome sequences and drug resistance phenotypes of infecting Mtb isolates. Bioinformatics tools have also been developed to predict drug resistance from whole genome sequencing (WGS) data. Here, we evaluate the performance of four popular tools (TBProfiler, MyKrobe, KvarQ, PhyResSE) on 6746 isolates compiled from publicly available databases, and subsequently identify highly probable phenotyping errors in the databases by genetically predicting the drug phenotypes with all four tools.
Keywords: Drug-resistance; Genomic prediction; Tuberculosis
Year: 2019 PMID: 30736750 PMCID: PMC6368788 DOI: 10.1186/s12859-019-2658-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Empirical sensitivities and specificities of four software for predicting anti-TB drug resistance
| Drug | Number of samples | Sensitivity: TBProfiler | Sensitivity: MyKrobe | Sensitivity: KvarQ | Sensitivity: PhyResSE | Specificity: TBProfiler | Specificity: MyKrobe | Specificity: KvarQ | Specificity: PhyResSE |
|---|---|---|---|---|---|---|---|---|---|
| INH | 4840 | 0.92 (0.91, 0.93) | 0.91 (0.89, 0.92) | 0.89 (0.88, 0.90) | 0.91 (0.89, 0.92) | 0.94 (0.93, 0.95) | 0.98 (0.97, 0.98) | 0.98 (0.98, 0.99) | 0.97 (0.96, 0.98) |
| RIF | 4843 | 0.91 (0.89, 0.92) | 0.92 (0.91, 0.94) | 0.92 (0.90, 0.93) | 0.94 (0.93, 0.95) | 0.95 (0.94, 0.96) | 0.97 (0.96, 0.97) | 0.97 (0.96, 0.97) | 0.96 (0.95, 0.97) |
| EMB | 4585 | 0.91 (0.89, 0.92) | 0.83 (0.81, 0.86) | 0.65 (0.62, 0.67) | 0.76 (0.73, 0.78) | 0.83 (0.81, 0.84) | 0.86 (0.85, 0.87) | 0.91 (0.90, 0.92) | 0.88 (0.87, 0.89) |
| PZA | 5026 | 0.59 (0.55, 0.63) | 0.38 (0.34, 0.42) | 0.54 (0.50, 0.58) | 0.58 (0.54, 0.61) | 0.92 (0.91, 0.93) | 0.98 (0.98, 0.99) | 0.94 (0.93, 0.94) | 0.97 (0.97, 0.98) |
| STM | 4357 | 0.82 (0.80, 0.84) | 0.79 (0.77, 0.81) | 0.75 (0.73, 0.77) | 0.76 (0.74, 0.78) | 0.86 (0.85, 0.87) | 0.93 (0.92, 0.94) | 0.92 (0.91, 0.93) | 0.92 (0.91, 0.93) |
| AMK | 1649 | 0.90 (0.86, 0.93) | 0.75 (0.70, 0.80) | 0.75 (0.69, 0.79) | 0.79 (0.74, 0.83) | 0.75 (0.73, 0.78) | 0.99 (0.98, 1.00) | 0.99 (0.98, 0.99) | 0.99 (0.98, 0.99) |
| CAP | 1830 | 0.71 (0.66, 0.76) | 0.67 (0.62, 0.73) | NA | 0.71 (0.65, 0.75) | 0.95 (0.93, 0.96) | 0.93 (0.92, 0.95) | NA | 0.96 (0.94, 0.97) |
| KAN | 1578 | 0.87 (0.83, 0.90) | 0.75 (0.71, 0.80) | 0.72 (0.67, 0.76) | 0.82 (0.77, 0.86) | 0.96 (0.94, 0.97) | 0.98 (0.97, 0.99) | 0.99 (0.98, 0.99) | 0.97 (0.96, 0.98) |
| CIP | 191 | 0.87 (0.75, 0.94) | 0.83 (0.71, 0.92) | 0.82 (0.70, 0.90) | 0.88 (0.77, 0.95) | 0.97 (0.92, 0.99) | 0.98 (0.93, 1.00) | 0.97 (0.92, 0.99) | 0.98 (0.93, 1.00) |
| MFX | 1086 | 0.68 (0.61, 0.75) | 0.61 (0.53, 0.68) | 0.58 (0.50, 0.65) | 0.67 (0.60, 0.74) | 0.93 (0.91, 0.94) | 0.95 (0.94, 0.97) | 0.93 (0.91, 0.94) | 0.95 (0.93, 0.96) |
| OFX | 2424 | 0.81 (0.78, 0.84) | 0.74 (0.71, 0.78) | 0.72 (0.68, 0.75) | 0.81 (0.77, 0.84) | 0.96 (0.95, 0.97) | 0.98 (0.97, 0.98) | 0.96 (0.95, 0.97) | 0.97 (0.97, 0.98) |
| ETO | 835 | 0.41 (0.35, 0.48) | NA | NA | 0.07 (0.04, 0.11) | 0.82 (0.79, 0.85) | NA | NA | 0.97 (0.95, 0.98) |
| PTO | 540 | 0.30 (0.24, 0.37) | NA | NA | 0.02 (0.01, 0.05) | 0.92 (0.89, 0.95) | NA | NA | 1.00 (0.98, 1.00) |
| PAS | 469 | 0.14 (0.08, 0.24) | NA | NA | 0.00 (0.00, 0.04) | 0.97 (0.95, 0.98) | NA | NA | 1.00 (0.99, 1.00) |
Numbers in brackets represent the corresponding 95% confidence intervals. An NA is assigned when the software does not predict the resistance profile for the specific drug
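As a rough illustration of how a single table entry of this kind could be derived, the sketch below computes sensitivity and specificity from confusion-matrix counts, each with a 95% interval. The interval method is not stated in this section, so the Wilson score interval used here is an assumption; the function names and the example counts are hypothetical and are not taken from the study.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    Assumption: the source does not say which interval it uses;
    Wilson is chosen here only as a common, well-behaved option.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def sensitivity_specificity(tp, fn, tn, fp):
    """Return point estimates and intervals for one tool and one drug.

    tp/fn: phenotypically resistant isolates predicted resistant/susceptible.
    tn/fp: phenotypically susceptible isolates predicted susceptible/resistant.
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": (sens, wilson_interval(tp, tp + fn)),
        "specificity": (spec, wilson_interval(tn, tn + fp)),
    }


# Hypothetical counts, for illustration only.
result = sensitivity_specificity(tp=90, fn=10, tn=180, fp=20)
for name, (estimate, (low, high)) in result.items():
    print(f"{name}: {estimate:.2f} ({low:.2f}, {high:.2f})")
```

Sensitivity depends only on the resistant isolates and specificity only on the susceptible ones, so drugs with few resistant isolates in the sample would be expected to show wider sensitivity intervals.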
Fig. 1. Misclassification rates of laboratory-based DST results by drug. The solid circles represent the point estimates of the misclassification rates obtained by comparing the laboratory-based DST phenotypes with the genetically-inferred drug-resistance phenotypes. The genetically-inferred phenotypes were probabilistically ascertained using all four software tools. The vertical lines represent the corresponding 95% confidence intervals.
Summary of isolates in the three TB databases according to the 14 anti-TB drugs
| Drug | PATRIC N1 | PATRIC R1 | PATRIC N2 | PATRIC R2 | ReSeqTB N1 | ReSeqTB R1 | ReSeqTB N2 | ReSeqTB R2 | LitRev N1 | LitRev R1 | LitRev N2 | LitRev R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| INH | 5018 | 0.043 | 5018 | 0.000 | 3503 | 0.045 | 3555 | 0.015 | 5451 | 0.043 | 5451 | 0.000 |
| RIF | 4979 | 0.028 | 4981 | 0.000 | 3459 | 0.038 | 3521 | 0.018 | 5416 | 0.034 | 5416 | 0.000 |
| EMB | 4739 | 0.075 | 4787 | 0.010 | 3497 | 0.098 | 3546 | 0.014 | 5335 | 0.092 | 5335 | 0.000 |
| PZA | 3633 | 0.054 | 3633 | 0.000 | 3298 | 0.095 | 3346 | 0.014 | 4752 | 0.082 | 4752 | 0.000 |
| STM | 3367 | 0.141 | 3380 | 0.004 | 1951 | 0.129 | 2008 | 0.028 | 3716 | 0.129 | 3716 | 0.000 |
| AMK | 1131 | 0.034 | 1131 | 0.000 | 983 | 0.040 | 993 | 0.010 | 1256 | 0.037 | 1256 | 0.000 |
| CAP | 1100 | 0.055 | 1101 | 0.001 | 1158 | 0.076 | 1167 | 0.008 | 1587 | 0.059 | 1588 | 0.001 |
| KAN | 1348 | 0.056 | 1350 | 0.001 | 716 | 0.031 | 716 | 0.000 | 1095 | 0.031 | 1095 | 0.000 |
| CIP | 340 | 0.018 | 340 | 0.000 | 358 | 0.031 | 358 | 0.000 | 313 | 0.016 | 313 | 0.000 |
| MFX | 726 | 0.059 | 726 | 0.000 | 874 | 0.071 | 885 | 0.012 | 993 | 0.057 | 993 | 0.000 |
| OFX | 836 | 0.065 | 851 | 0.018 | 1163 | 0.070 | 1188 | 0.021 | 1818 | 0.052 | 1818 | 0.000 |
| ETO | 559 | 0.370 | 562 | 0.005 | 252 | 0.159 | 252 | 0.000 | 321 | 0.265 | 321 | 0.000 |
| PTO | 52 | 0.115 | 52 | 0.000 | 410 | 0.337 | 431 | 0.049 | 498 | 0.301 | 498 | 0.000 |
| PAS | 375 | 0.208 | 375 | 0.000 | 78 | 0.026 | 78 | 0.000 | 74 | 0.108 | 74 | 0.000 |
N1 refers to the number of isolates with valid DST results and genetically-inferred credibility scoring; R1 refers to the misclassification rate in each database, defined as the proportion of the N1 isolates with DST credibility scores < 0.5; N2 is defined as the sum of N1 and the number of good mapping quality isolates with no genetically-inferred credibility scoring; and R2 refers to the proportion of the N2 isolates that presented unusable laboratory-based DST phenotypes due to either inconsistent results (across multiple DST phenotype entries for the same isolate) or documentation errors across the databases.
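The two rates defined in this footnote can be sketched as below: the misclassification rate is the share of scored isolates whose DST credibility score falls below 0.5, and the unusable rate is the share of the enlarged isolate count whose laboratory phenotype could not be used. The function names and the toy score list are hypothetical; how the credibility scores themselves are produced is not covered by this sketch.

```python
def misclassification_rate(credibility_scores, threshold=0.5):
    """Proportion of scored isolates with a DST credibility score below threshold."""
    scores = list(credibility_scores)
    if not scores:
        raise ValueError("at least one credibility score is required")
    flagged = sum(1 for score in scores if score < threshold)
    return flagged / len(scores)


def unusable_rate(n_unusable, n_total):
    """Proportion of the enlarged isolate count with unusable DST phenotypes."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_unusable / n_total


# Toy scores, for illustration only: two of four fall below 0.5.
print(misclassification_rate([0.9, 0.4, 0.7, 0.1]))

# ReSeqTB INH row of the table: 3503 scored isolates, 3555 in the enlarged count.
# Treating the 52 unscored isolates as the unusable ones is an assumption.
print(round(unusable_rate(3555 - 3503, 3555), 3))
```

For the ReSeqTB INH row, treating all 52 unscored isolates as unusable gives 52/3555, which rounds to the tabulated 0.015. That agreement is an observation from the numbers, not something the footnote states.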
Calibrated sensitivities and specificities of four software for predicting anti-TB drug resistance
| Drug | Number of samples | Sensitivity: TBProfiler | Sensitivity: MyKrobe | Sensitivity: KvarQ | Sensitivity: PhyResSE | Specificity: TBProfiler | Specificity: MyKrobe | Specificity: KvarQ | Specificity: PhyResSE |
|---|---|---|---|---|---|---|---|---|---|
| INH | 4840 | 0.99 | 0.99 | 0.99 | 1.00 | 0.94 | 0.99 | 1.00 | 0.99 |
| RIF | 4843 | 0.96 | 0.97 | 0.96 | 0.99 | 0.98 | 1.00 | 1.00 | 1.00 |
| EMB | 4585 | 0.99 | 0.93 | 0.69 | 0.84 | 0.96 | 1.00 | 1.00 | 0.99 |
| PZA | 5026 | 0.83 | 0.40 | 0.79 | 0.66 | 0.97 | 1.00 | 0.99 | 1.00 |
| STM | 4357 | 1.00 | 0.91 | 0.99 | 0.99 | 0.90 | 0.94 | 1.00 | 0.99 |
| AMK | 1649 | 1.00 | 0.96 | 0.97 | 1.00 | 0.75 | 1.00 | 1.00 | 0.99 |
| CAP | 1830 | 0.99 | 0.95 | NA | 0.99 | 0.99 | 0.98 | NA | 1.00 |
| KAN | 1578 | 1.00 | 0.89 | 0.84 | 1.00 | 0.96 | 1.00 | 1.00 | 1.00 |
| CIP | 191 | 0.95 | 0.95 | 0.93 | 1.00 | 0.98 | 1.00 | 0.99 | 1.00 |
| MFX | 1086 | 0.97 | 0.87 | 0.85 | 0.99 | 0.98 | 1.00 | 0.97 | 1.00 |
| OFX | 2424 | 0.97 | 0.89 | 0.87 | 0.99 | 0.98 | 1.00 | 0.98 | 1.00 |
| ETO | 835 | 0.95 | NA | NA | 0.21 | 0.94 | NA | NA | 1.00 |
| PTO | 540 | 0.80 | NA | NA | 0.05 | 0.99 | NA | NA | 1.00 |
| PAS | 469 | 0.14 | NA | NA | 0.00 | 0.97 | NA | NA | 1.00 |
An NA is assigned when the software does not predict the resistance profile for the specific drug