Tatum D Mortimer, Alexandra M Weber, Caitlin S Pepperell.
Abstract
Tuberculosis (TB) is the leading cause of death by an infectious disease, and global TB control efforts are increasingly threatened by drug resistance in Mycobacterium tuberculosis. Unlike most bacteria, where lateral gene transfer is an important mechanism of resistance acquisition, resistant M. tuberculosis arises solely by de novo chromosomal mutation. Using whole-genome sequencing data from two natural populations of M. tuberculosis, we characterized the population genetics of known drug resistance loci using measures of diversity, population differentiation, and convergent evolution. We found resistant subpopulations to be less diverse than susceptible subpopulations, consistent with ongoing transmission of resistant M. tuberculosis. A subset of resistance genes ("sloppy targets") were characterized by high diversity and multiple rare variants; we posit that a large genetic target for resistance and relaxation of purifying selection contribute to high diversity at these loci. For "tight targets" of selection, the path to resistance appeared narrower, evidenced by single favored mutations that arose numerous times in the phylogeny and segregated at markedly different frequencies in resistant and susceptible subpopulations. These results suggest that diverse genetic architectures underlie drug resistance in M. tuberculosis and that combined approaches are needed to identify causal mutations. Extrapolating from patterns observed for well-characterized genes, we identified novel candidate variants involved in resistance. The approach outlined here can be extended to identify resistance variants for new drugs, to investigate the genetic architecture of resistance, and when phenotypic data are available, to find candidate genetic loci underlying other positively selected traits in clonal bacteria.
IMPORTANCE Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), is a significant burden on global health. Antibiotic treatment imposes strong selective pressure on M. tuberculosis populations. Identifying the mutations that cause drug resistance in M. tuberculosis is important for guiding TB treatment and halting the spread of drug resistance. Whole-genome sequencing (WGS) of M. tuberculosis isolates can be used to identify novel mutations mediating drug resistance and to predict resistance patterns faster than traditional methods of drug susceptibility testing. We have used WGS from natural populations of drug-resistant M. tuberculosis to characterize effects of selection for advantageous mutations on patterns of diversity at genes involved in drug resistance. The methods developed here can be used to identify novel advantageous mutations, including new resistance loci, in M. tuberculosis and other clonal pathogens.
Keywords: Mycobacterium tuberculosis; antibiotic resistance; evolution; genomics; positive selection
Year: 2018 PMID: 29404424 PMCID: PMC5790871 DOI: 10.1128/mSystems.00108-17
Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 6.496
FIG 1 Phylogeny of Mycobacterium tuberculosis sample. The phylogeny was inferred using FastTree (60). Lineages are colored as follows: lineage 1 (L1) is shown in pink, lineage 2 (L2) in blue, lineage 3 (L3) in purple, and lineage 4 (L4) in red. Lineage 4 is associated with deeper branching sublineages in comparison with lineage 2. Bar, 5.0 × 10^−5 nucleotide substitutions per position.
FIG 2 Distributions of gene-wise nucleotide diversity for all isolates, as well as lineages 4 and 2 considered separately. Repetitive regions of the alignment were masked. Sites were included in estimation of π if 95% of isolates in the alignment had a valid nucleotide at the position. We used Egglib to calculate statistics (64). The nucleotide diversity is lower in lineage 2 than in lineage 4 (P < 2.2 × 10^−16 by Welch two-sample t test).
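The site-inclusion rule in the FIG 2 caption can be illustrated with a minimal sketch. It assumes an alignment stored as equal-length strings, treats only A, C, G, and T as valid nucleotides, and averages the per-site expected pairwise difference over the sites that pass the 95% filter. The function name `nucleotide_diversity` and these conventions are illustrative assumptions. The paper used Egglib, whose exact handling of missing data may differ.

```python
VALID = set("ACGT")

def nucleotide_diversity(alignment, min_valid_fraction=0.95):
    """Per-site nucleotide diversity (pi) over sites passing a validity filter.

    alignment: list of equal-length sequence strings (one per isolate).
    A site is used only if at least min_valid_fraction of isolates carry
    a valid nucleotide there (assumed here to mean A, C, G, or T).
    """
    n_seqs = len(alignment)
    total = 0.0
    used_sites = 0
    for pos in range(len(alignment[0])):
        valid = [seq[pos].upper() for seq in alignment if seq[pos].upper() in VALID]
        n = len(valid)
        if n < 2 or n / n_seqs < min_valid_fraction:
            continue  # too much missing data at this position
        used_sites += 1
        counts = {}
        for base in valid:
            counts[base] = counts.get(base, 0) + 1
        # Probability that two isolates drawn without replacement differ here.
        identical_pairs = sum(c * (c - 1) for c in counts.values())
        total += 1.0 - identical_pairs / (n * (n - 1))
    return total / used_sites if used_sites else 0.0
```

With four isolates, a single site split 2:2 between two alleles contributes 2/3 to the sum, so a four-site gene with one such site has π = 1/6.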
Frequency of resistance in the data set
| Drug | Resistant | Susceptible | Unknown |
|---|---|---|---|
| INH | 0.59 | 0.33 | 0.08 |
| STR | 0.53 | 0.39 | 0.07 |
| RIF | 0.50 | 0.43 | 0.07 |
| EMB | 0.30 | 0.59 | 0.11 |
| PZA | 0.21 | 0.67 | 0.12 |
| OFL | 0.16 | 0.39 | 0.45 |
| PRO | 0.15 | 0.22 | 0.62 |
| CAP | 0.10 | 0.41 | 0.49 |
| MOX | 0.09 | 0.41 | 0.49 |
| ETO | 0.06 | 0.07 | 0.88 |
| AMI | 0.05 | 0.34 | 0.61 |
| KAN | 0.05 | 0.12 | 0.83 |
AMI, amikacin; CAP, capreomycin; EMB, ethambutol; ETO, ethionamide; INH, isoniazid; KAN, kanamycin; MOX, moxifloxacin; OFL, ofloxacin; PRO, prothionamide; PZA, pyrazinamide; RIF, rifampin; STR, streptomycin.
FIG 3 Diversity of resistant and susceptible isolates. (A) Counts of genes with no nucleotide diversity in resistant and susceptible subpopulations. (B) Gene-wise nucleotide diversity (excluding invariant genes) in susceptible and resistant isolates. Among genes in which nucleotide diversity is measurable, it is similar between resistant and susceptible isolates even when drug resistance-associated genes and targets of independent mutation identified by Farhat et al. (24) are removed (P = 0.13). Abbreviations: AMI, amikacin; CAP, capreomycin; EMB, ethambutol; ETO, ethionamide; INH, isoniazid; KAN, kanamycin; MOX, moxifloxacin; OFL, ofloxacin; PRO, prothionamide; PZA, pyrazinamide; RIF, rifampin; STR, streptomycin.
Signatures of selection in known drug resistance genes
| Gene | Locus tag | Drug(s) | TB Dream | π | θ | TD | Homoplasy | FST | Type(s) |
|---|---|---|---|---|---|---|---|---|---|
| | | INH | 226 | 0.80 | 0.82 | 0.34 | Y | Y | Tight |
| | | PZA | 195 | 0.97 | 1.00 | 0.00 | Y | N | Sloppy |
| | | EMB | 117 | 0.77 | 0.89 | 0.08 | Y | Y | Hybrid |
| | | INH | 31 | 0.20 | 0.21 | 0.61 | Y | N | |
| | | CAP | 28 | 0.37 | 0.89 | 0.06 | N | N | |
| | | EMB | 28 | 0.59 | 0.74 | 0.01 | N | N | |
| | | EMB | 25 | 0.46 | 0.49 | 0.28 | N | N | |
| | | STR, KAN, CAP | 24 | 0.89 | 1.00 | 0.08 | N | N | |
| | | ETO | 23 | 0.72 | 1.00 | 0.00 | Y | Y (IG) | Sloppy, |
| | | STR | 22 | 1.00 | 1.00 | 0.07 | Y | N | Sloppy |
| | | MOX, OFL | 15 | 0.58 | 0.91 | 0.00 | Y | N | |
| | | INH, ETO | 13 | 0.60 | 0.66 | 0.30 | Y | Y (IG) | Tight |
| | | INH, ETO | 13 | 0.56 | 0.59 | 0.32 | Y | N | |
| | | STR | 13 | 0.99 | 0.95 | 0.80 | Y | Y | Hybrid |
| | | MOX, OFL | 12 | 0.81 | 0.94 | 0.10 | Y | Y | Tight |
| | | EMB | 11 | 0.77 | 0.38 | 0.79 | N | N | |
| | | INH | 7 | 0.73 | 0.18 | 0.86 | N | N | |
| | | INH | 5 | 0.57 | 0.52 | 0.28 | N | N | |
| | | EMB, INH | 4 | 0.64 | 0.33 | 0.56 | N | N | |
| | | INH | 3 | 0.89 | 0.88 | 0.57 | N | N | |
| | | EMB, INH | 3 | 0.07 | 0.07 | 0.79 | N | N | |
| | | INH | 3 | 0.78 | 0.19 | 0.89 | N | N | |
| | | EMB | 2 | 0.75 | 0.36 | 0.75 | N | N | |
| | | EMB, INH | 2 | 0.49 | 0.67 | 0.08 | N | N | |
| | | PAS | 2 | 0.84 | 0.94 | 0.28 | N | N | |
| | | INH | 2 | 0.76 | 0.55 | 0.63 | N | N | |
| | | INH | 1 | 0.90 | 0.63 | 0.90 | N | N | |
| | | INH | 1 | 0.80 | 0.63 | 0.62 | N | N | |
| | | INH | 1 | 0.50 | 0.35 | 0.54 | N | N | |
| | | INH | 1 | 0.26 | 0.28 | 0.54 | N | N | |
| | | INH | 1 | 0.36 | 0.58 | 0.12 | N | N | |
| | | RIF | 1 | 0.82 | 0.92 | 0.18 | Y | Y | Hybrid |
| | | INH | 1 | 0.10 | 0.11 | 0.65 | N | N | |
| | | ETO | | 0.58 | 0.77 | 0.22 | N | N | |
| | | BDQ | | 0.37 | 0.72 | 0.25 | N | N | |
| | | KAN | | 0.51 | 0.28 | 0.54 | N | N | |
| | | ETO | | 0.86 | 0.48 | 0.87 | N | N | |
| | | PZA | | 0.88 | 0.62 | 0.84 | N | N | |
| | | PAS | | 0.66 | 0.78 | 0.11 | Y | N | |
| | | LZD | | 0.57 | 0.77 | 0.21 | N | N | |
The number of distinct entries in the TB Drug Resistance Mutation Database for each gene is reported in the TB Dream column.
The π and θ columns report the percentile of each gene's value for the respective diversity statistic.
TD is the percentile of the residual after linear regression of Tajima’s D values with gene length.
Genes with homoplastic SNPs are indicated with “Y” (for Yes) in the Homoplasy column. N, no.
If a homoplastic SNP was also an FST outlier, it is indicated with a “Y” in the FST column. N, no.
Genes are classified as tight, sloppy, or hybrid targets of selection based on diversity, homoplasy, and FST results. (IG) indicates an intergenic SNP.
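The diversity statistics behind the π, θ, and TD columns can be sketched from three summary numbers per gene. The sketch assumes that θ denotes Watterson's estimator and uses the standard Tajima (1989) constants. The function names and the choice to return NaN when D is undefined are illustrative, not taken from the source.

```python
import math

def watterson_theta(n, seg_sites):
    """Watterson's theta for n sequences and seg_sites segregating sites."""
    a1 = sum(1.0 / i for i in range(1, n))
    return seg_sites / a1

def tajimas_d(n, seg_sites, pi_total):
    """Tajima's D from sample size, segregating sites, and the mean number
    of pairwise differences (pi_total, summed over the gene, not per site).
    Returns NaN when D is undefined (no segregating sites or zero variance).
    """
    if n < 2 or seg_sites == 0:
        return float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if variance <= 0:
        return float("nan")
    return (pi_total - seg_sites / a1) / math.sqrt(variance)
```

A negative D arises when θ exceeds π, that is, when there is an excess of rare variants relative to the neutral expectation.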
FIG 4 Gene-wise Tajima’s D values and gene lengths. Repetitive regions of the alignment were masked. We added a constant value of 3 to all Tajima’s D values to make them positive and then log transformed (base 2) both the Tajima’s D values and the gene lengths. The linear regression line is plotted in blue. Genes with regression residuals in the lower 5% are highlighted in green. Drug resistance-associated genes in this group are labeled. While negative Tajima’s D values are normally associated with purifying selection or a recent selective sweep, we find that drug resistance genes with negative Tajima’s D values also have high nucleotide diversity. We hypothesize that patterns of diversity at these genes have been affected by relaxation of purifying selection and positive selection for drug resistance.
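The transformation and regression described in the FIG 4 caption can be sketched as follows. This is a minimal ordinary least-squares version in plain Python. The helper names `tajima_residuals` and `lower_tail`, and the way the lower 5% is rounded to a whole number of genes, are assumptions made for illustration.

```python
import math

def tajima_residuals(genes, shift=3.0):
    """Residuals of log2(D + shift) regressed on log2(gene length).

    genes: list of (length_bp, tajimas_d) pairs; D + shift must be positive.
    """
    xs = [math.log2(length) for length, _ in genes]
    ys = [math.log2(d + shift) for _, d in genes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def lower_tail(residuals, fraction=0.05):
    """Indices of the genes whose residuals fall in the lowest fraction."""
    k = max(1, int(len(residuals) * fraction))
    order = sorted(range(len(residuals)), key=residuals.__getitem__)
    return sorted(order[:k])
```

Working with residuals rather than raw values removes the dependence of Tajima’s D on gene length, so a gene is flagged only when its D is low for its size.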
FIG 5 Ratios of nucleotide diversity in resistance-associated genes. Genes with zero diversity were transformed to 1 × 10^−16 before calculating ratios. Genes with ratios more extreme than 10^−1.5 or 10^1.5 are all filled with the deepest shade. Genes associated with resistance to each drug are outlined in black. (A) Ratio of nucleotide diversity in resistant and susceptible isolates. Green genes are more diverse in resistant isolates, which could be due to diversifying selection and/or relaxation of purifying selection. Purple genes are more diverse in susceptible isolates, likely due to increased purifying selection. White genes have similar diversities in resistant and susceptible isolates. (B) Comparison of ratios in lineage 2 and lineage 4. Teal genes are more diverse in resistant isolates of lineage 2, suggesting diversifying selection/relaxation of purifying selection specific to this lineage. Brown genes are more diverse in resistant isolates of lineage 4. White genes have similar diversities in lineages 2 and 4.
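The ratio calculation in the FIG 5 caption can be written out as a short sketch. It replaces zero diversity with 1 × 10^−16, takes the base-10 log of the resistant-to-susceptible ratio, and clips the value to ±1.5 for shading only. The function names are hypothetical.

```python
import math

FLOOR = 1e-16   # substitute for zero diversity, as stated in the caption
CLIP = 1.5      # ratios beyond 10^-1.5 or 10^1.5 share the deepest shade

def log10_diversity_ratio(pi_resistant, pi_susceptible):
    """log10 of pi(resistant) / pi(susceptible), with zeros floored."""
    r = pi_resistant if pi_resistant > 0 else FLOOR
    s = pi_susceptible if pi_susceptible > 0 else FLOOR
    return math.log10(r / s)

def shade_value(log_ratio):
    """Clip a log10 ratio to the plotted color range."""
    return max(-CLIP, min(CLIP, log_ratio))
```

A gene that is invariant in both subpopulations gets a log ratio of 0 and is drawn white, while a gene that is invariant in only one subpopulation saturates the color scale.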
Homoplastic FST outliers
| Location | Gene | Type | wcFST (values for one or more of AMI, CAP, EMB, ETO, INH, KAN, MOX, OFL, PRO, PZA, RIF, STR) | Known | Lineage |
|---|---|---|---|---|---|
| 1821 | | Intergenic | 0.43; 0.57; 0.10 | N | All |
| 7570 | | Missense | 0.33; 0.11; 0.46; 0.66; 0.29; 0.18 | Y | All |
| 7572 | | Missense | 0.06 | Y | All |
| 7581 | | Missense | 0.07 | Y | All |
| 7582 | | Missense | 0.35; 0.22 | Y | All |
| 75233 | | Intergenic | 0.05 | N | All |
| 94388 | | Synonymous | 0.07; 0.12; 0.12; 0.13 | N | All |
| 230170 | | Missense | 0.12; 0.05; 0.12; 0.13 | N | All |
| 332916 | | Missense | 0.10; 0.09; 0.20 | N | All |
| 761155 | | Missense | 0.31; 0.58; 0.10; 0.72; 0.41 | Y | All |
| 761161 | | Missense | 0.33; 0.09; 0.51; 0.71; 0.13; 0.16 | Y | All |
| 764817 | | Missense | 0.19 | N | All |
| 781687 | | Missense | 0.10; 0.32; 0.15; 0.37 | Y | All |
| 922004 | | Missense | 0.30; 0.12; 0.43; 0.10; 0.21 | N | All |
| 1076880 | | Synonymous | 0.12; 0.12; 0.13 | N | All |
| 1673425 | | Intergenic | 0.11 | Y | All |
| 1673432 | | Intergenic | 0.52; 0.65 | Y | All |
| 1722228 | | Missense | 0.08; 0.28; 0.07; 0.17; 0.26 | N | All |
| 2122395 | | Synonymous | 0.06 | N | All |
| 2155168 | | Missense | 0.36; 0.89; 0.13; 0.32; 0.60; 0.66 | Y | All |
| 2174216 | | Synonymous | 0.08 | N | All |
| 2207525 | | Intergenic | 0.09 | N | All |
| 2422824 | | Missense | 0.30; 0.43; 0.57; 0.10 | N | All |
| 2660319 | | Missense | 0.06 | N | All |
| 2715369 | | Intergenic | 0.17; 0.09; 0.28; 0.30; 0.13 | N | All |
| 2866647 | | Synonymous | 0.12; 0.07 | N | All |
| 2867298 | | Synonymous | 0.13 | N | All |
| 2867347 | | Synonymous | 0.13; 0.06; 0.12; 0.14 | N | All |
| 2867756 | | Synonymous | 0.14 | N | All |
| 3500149 | | Synonymous | 0.11 | N | All |
| 3550789 | | Synonymous | 0.12; 0.13 | N | All |
| 3680932 | | Synonymous | 0.12; 0.12; 0.13 | N | All |
| 4001622 | | Intergenic | 0.11 | N | All |
| 4247429 | | Missense | 0.25; 0.45; 0.23; 0.05; 0.11; 0.31; 0.21; 0.20 | Y | All |
| 4247574 | | Synonymous | 0.19; 0.07; 0.27; 0.30 | Y | All |
| 4327480 | | Intergenic | 0.20; 0.07; 0.27; 0.30 | Y | All |
| 764948 | | Missense | 0.06 | Y | L2 |
| 4248003 | | Missense | 0.16 | Y | L2 |
| 698 | | Missense | 0.10 | N | L4 |
| 60185 | | Missense | 0.06 | N | L4 |
| 761110 | | Missense | 0.66 | Y | L4 |
| 764822 | | Missense | 0.06 | Y | L4 |
| 781822 | | Missense | 0.12; 0.13; 0.14 | Y | L4 |
| 2123145 | | Missense | 0.06 | N | L4 |
| 2372550 | | Missense | 0.64 | N | L4 |
| 2715344 | | Intergenic | 0.06 | N | L4 |
| 2986827 | | Missense | 0.16; 0.17; 0.15 | N | L4 |
| 4247431 | | Missense | 0.11; 0.11; 0.07 | Y | L4 |
| 4248003 | | Missense | 0.06 | Y | L4 |
For intergenic SNPs, the closest gene is listed.
Weir and Cockerham’s FST (wcFST) values in the top 1% of values genome-wide are reported for each drug.
We identified mutations in genes previously associated with drug resistance (Y for Yes in the Known column) and novel putative resistance or compensatory mutations (N for Novel in the Known column).
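For a single biallelic SNP in haploid genomes, Weir and Cockerham's FST between resistant and susceptible subpopulations can be sketched in its analysis-of-variance form. This is one common formulation, given here as an assumption. The source does not state which software or settings produced the tabulated values, and the function name `wc_fst_haploid` is hypothetical.

```python
def wc_fst_haploid(samples):
    """Weir and Cockerham-style FST for one biallelic SNP in haploids.

    samples: list of (n_i, p_i) pairs, one per subpopulation, giving the
    number of isolates and the frequency of one allele in that group.
    Returns 0.0 when the site is invariant across all groups.
    """
    r = len(samples)
    n_total = sum(n for n, _ in samples)
    p_bar = sum(n * p for n, p in samples) / n_total
    # Effective per-group sample size, correcting for unequal group sizes.
    n_c = (n_total - sum(n * n for n, _ in samples) / n_total) / (r - 1)
    # Mean squares among groups and within groups.
    msp = sum(n * (p - p_bar) ** 2 for n, p in samples) / (r - 1)
    msg = sum(n * p * (1 - p) for n, p in samples) / (n_total - r)
    denom = msp + (n_c - 1) * msg
    return (msp - msg) / denom if denom > 0 else 0.0
```

A SNP fixed in resistant isolates and absent from susceptible ones gives an estimate of 1. Equal frequencies in both groups give a slightly negative estimate, which is expected for this estimator.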
FIG 6 Homoplastic SNPs in drug resistance-associated genes. SNPs with FST in the top 1% of genome-wide values are labeled with the population (associated drug resistance) and the FST value. pncA is remarkable for harboring diverse homoplastic mutations, each of which occurs relatively infrequently (“sloppy target”). embB, gyrA, katG, rpoB, and rpsL harbor dominant mutations that occur frequently in the phylogeny and are strongly associated with resistant populations (“tight targets”).
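One simple way to flag a homoplastic SNP, meaning a variant that arose more than once in the phylogeny, is to count state changes on the tree by Fitch parsimony. The sketch below is a swapped-in illustration and not necessarily the procedure the authors used. It assumes a rooted binary tree written as nested tuples of tip names and a dictionary mapping each tip to its allele.

```python
def fitch_changes(tree, states):
    """Return (candidate_states, n_changes) for a rooted binary tree.

    tree: a tip name (str) or a 2-tuple of subtrees.
    states: dict mapping tip name -> allele at the SNP.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_changes(left, states)
    right_set, right_changes = fitch_changes(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_changes + right_changes
    # No shared state: one extra change is needed on this pair of branches.
    return left_set | right_set, left_changes + right_changes + 1

def tip_names(tree):
    """List the tip names contained in a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return tip_names(tree[0]) + tip_names(tree[1])

def is_homoplastic(tree, states):
    """True if the SNP needs more changes than its allele count minus one."""
    n_alleles = len({states[tip] for tip in tip_names(tree)})
    return fitch_changes(tree, states)[1] > n_alleles - 1
```

Under this criterion a mutation shared by a single clade counts as one change and is not flagged, whereas the same mutation appearing in two unrelated clades requires two changes and is flagged.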