Tuberculosis poses a global health emergency, which has been compounded by the emergence of drug-resistant Mycobacterium tuberculosis (Mtb) strains. We used whole-genome sequencing to compare the accumulation of mutations in Mtb isolated from cynomolgus macaques with active, latent or reactivated disease. We sequenced 33 Mtb isolates from nine macaques with an average genome coverage of 93% and an average read depth of 117×. Based on the distribution of SNPs observed, we calculated the mutation rates for these disease states. We found a similar mutation rate during latency as during active disease or in a logarithmically growing culture over the same period of time. The pattern of polymorphisms suggests that the mutational burden in vivo is because of oxidative DNA damage. We show that Mtb continues to acquire mutations during disease latency, which may explain why isoniazid monotherapy for latent tuberculosis is a risk factor for the emergence of isoniazid resistance.
Tuberculosis poses a global health emergency, which has been compounded by the emergence of drug-resistant Mycobacterium tuberculosis (Mtb) strains. We used whole-genome sequencing to compare the accumulation of mutations in Mtb isolated from cynomolgus macaques with active, latent or reactivated disease. We sequenced 33 Mtb isolates from nine macaques with an average genome coverage of 93% and an average read depth of 117×. Based on the distribution of SNPs observed, we calculated the mutation rates for these disease states. We found a similar mutation rate during latency as during active disease or in a logarithmically growing culture over the same period of time. The pattern of polymorphisms suggests that the mutational burden in vivo is because of oxidative DNA damage. We show that Mtb continues to acquire mutations during disease latency, which may explain why isoniazid monotherapy for latent tuberculosis is a risk factor for the emergence of isoniazid resistance.
In active tuberculosis, patients harbor large numbers of replicating organisms and are treated with multiple antibiotics to prevent the emergence of novel drug resistance mutations. In contrast, Mtb from latent infection is thought to have little capacity for mutation and is typically treated with a single antibiotic, isoniazid (INH). Recent epidemiologic studies have found that INH preventive monotherapy (IPT) for latent tuberculosis is associated with INH resistance[1,2]. In Mtb, all drug resistances are the result of chromosomal mutations and depend on the bacterium's capacity for mutation during the course of infection. Therefore, we seek to define the mutational capacity of the bacterium during infection to better predict the rate at which drug resistance can be expected to emerge in active, latent, and reactivated disease.Conventional approaches to measuring bacterial mutation rates cannot be applied to Mtb in vivo. However, high-density whole genome sequencing (WGS) allows us to assess the capacity of Mtb for mutation over the course of infection with minimal bias and maximum sensitivity[3-5]. As the nonhuman primate is the only animal model that mimics the broad range of disease seen in humantuberculosis[6,7], we performed WGS on the infecting strain of Mtb, Erdman, and 33 isolates from nine cynomolgus macaques that represented the three major clinical outcomes of infection (active disease, persistently latent infection and spontaneously reactivated disease after prolonged latency[7]) (Fig. 1). Genome coverage averaged 93% across these isolates, and average read depth was 117× across the genomes (Supplementary Table 1). Putative polymorphisms were identified using both a scaffolded approach[8,9] and a de novo assembly method[10], and polymorphic sites were validated by Sanger sequencing or through independent identification by WGS. Through these analyses, we identified 14 unique single nucleotide polymorphisms (SNPs) (Fig. 2). There was no evidence that these SNPs were present in the inoculum either from repeated deep sequencing and PCR resequencing of the inoculum, or from shared polymorphisms between bacteria from different lesions. While we have used WGS previously to detect insertions and deletions[9], we did not detect either in the 33 genomes analyzed. Within lesions, we identified both shared and independent polymorphisms as would be expected if the SNPs accrued within lesions over the course of the infection.
Figure 1
Experimental protocol for assessing mutational capacity in different disease states
1) Cynomolgus macaques were infected with ~25CFU of Mtb Erdman via bronchoscopy. 2) Animals were euthanized in the indicated stages of disease for strain isolation. 3) 18 pathologic lesions were plated for bacterial colonies. 33 strains were isolated for WGS. 4) Genomic DNA was isolated from these strains and then analyzed via Illumina sequencing. 5) Reads were assembled using both de novo and scaffolded approaches. 15 SNPs were predicted by both methodologies. Insertions and deletions were not detected using either methodology. 6) Sanger sequencing confirmed 14 of the 15 putative SNPs identified by both scaffolded and de novo analysis.
Figure 2
WGS identifies SNPs in strains isolated from animals with active, latent, and reactivated latent infection
SNPs were predicted through WGS in 33 Mtb strains isolated from nine cynomolgus macaques at various stages of disease. All SNPs predicted through WGS were confirmed via Sanger sequencing or through independent identification by WGS. Genome coverage and the original notation used to describe each animal are found in Supplementary Table 1. The total length of infection in days is listed for each animal below the animal identifier (A–I). Lesion locations are abbreviated as follows: LLL – left lower lobe, RLL – right lower lobe, RML – right middle lobe, RUL – right upper lobe, ACL – accessory lobe, CN – cranial lymph node, BN – bronchial lymph node. Inoculum represents the sequence at the given coordinate of the inoculating strain, Mtb Erdman.
We next sought to quantify the average mutation rate of the bacterium in the different stages of clinical disease. The mutation rate (μ) of a bacterium in vivo can be estimated from the number of mutations (m) that have occurred for a genome of known size (N) over a known number of generations (t/g where t is length of infection and g is generation time). However, the generation time of Mtb in humans or non-human primates is unknown. In vitro, Mtb has a generation time of approximately 20 hours under nutrient rich conditions[11]. In mice, the bacterial organ burden increases at roughly this rate during the first weeks of infection, but the subsequent onset of the adaptive immune response causes bacterial replication to slow substantially or cease entirely[12,13]. In clinical latency in nonhuman primates and humans, the immune response limits infection to the point that there are no clinical or radiographic signs of overt disease. This is thought to be associated with a dramatic slowing or cessation of bacterial replication, although we have recently shown that lesions from clinically latent cynomolgus macaques display a range of histopathologies and bacterial burdens, suggesting a spectrum of bacterial physiologies may occur in latency[14, 15].Because of the inherent uncertainty in the generation time of Mtb in vivo, we estimated the mutation rate across a broad range of generation times (18–240 hours), calculating the rate that would be required to generate the number of polymorphisms identified by WGS (Fig. 3a–c). In order to compare the mutation rate of bacteria from each clinical condition, we derived a lower limit for the bacterial mutation rate in vivo, which we define as the predicted mutation rate per generation if the in vivo generation time were equivalent to the in vitro generation time of 20 hours, μ(20hr). While Mtb is likely to have a much longer generation time in vivo, especially during prolonged latent infection, we use μ(20hr) as a highly conservative boundary estimate of the in vivo mutation rate that allows us to directly compare the mutational capacity of the bacterium in different in vivo conditions. Strikingly, we found that the bacterial population's capacity for mutation, μ(20hr), during latency (2.71×10−10) and reactivated disease (3.03×10−10) is equivalent to that of Mtb from animals with active disease (2.01 ×10−10) (Table 1). Mutation rate can also be calculated as the number of mutations that occur per day of infection rather than per generation. We therefore calculated the mutation rate per day required for the bacterial populations in each disease state to acquire the number of polymorphisms that we identified by WGS (Fig. 3d). Our data indicate that in macaques with active, latent and reactivated disease, the bacterial populations acquire mutations at the same rate over time, regardless of the number of bacterial replications that have occurred.
Figure 3
The mutational capacity of strains from latency and reactivated disease is similar to that of strains from active disease or in vitro growth
(a–c) Mutation rate (μ was estimated based on the number of unique SNPs (m) observed in each condition (4 active, 3 latent, 7 reactivated). This calculation was performed over a range of generation times (g, 18–240 hours per generation) to allow for the uncertainty in growth rate in vivo. The probability of observing μ when g is fixed at any given time was determined to build the probability distribution function around each estimate and to define the 95% confidence intervals. The single base mutation rate of the bacterium during in vitro growth (μin vitro) was determined by fluctuation analysis (Supplementary Figs. 1a–c) and is indicated by an arrow. In each clinical condition, μ20 (the predicted mutation rate if the generation time in vivo were as rapid as the generation time in vitro) is similar to μin vitro. Generation time in vivo is predicted to be substantially slower than in vitro, and thus the mutation rate must be proportionally higher to produce the observed number of SNPs. (d) Given the uncertainty in generation time, a mutation rate per day can be calculated to determine the rate at which mutations occur regardless of generation time. Mutations occur at a similar rate per day regardless of the disease status of the host. Error bars represent 95% confidence intervals.
Table 1
The predicted mutation rate for biologically relevant generation times.
Gen. Time (hrs) (g)
Growth Condition
μ(g)active, (95% CI)a
μ(g)latent, (95% CI)a
μ(g)reactivated, (95% CI)a
20
Rich Media
2.01×10−10 (8.09×10−11, 4.15×10−10)
2.71×10−10 (5.57×10−11, 7.89×10−10)
3.03×10−10 (1.22×10−10, 6.24×10−10)
45
Macrophage
4.77×10−10 (1.30×10−10, 1.22×10−9)
5.99×10−10 (1.23×10−10, 1.75×10−9)
6.71×10−10 (2.70×10−10, 1.38×10−9)
135
Mouse Infection at 10 weeks
1.43×10−9 (3.90×10−10, 3.66×10−9)
1.80×10−9 (3.70×10−10, 5.25×10−9)
2.01×10−9 (8.09×10−10, 4.15×10−9)
Mutation rates were estimated using the equation shown μ=m / [N*(t/g)], over g = 18 to 240 hours. The generation time (g) was varied from 18 to 240 hours, t represents total time of infection in hours and N is equal to the number of bases sequenced. The values shown represent the predicted μ and 95% confidence intervals of a bacterial population in animals with active, latent or reactivated disease estimated for the indicated, biologically relevant generation times[11,12,30].
We then sought to benchmark these rates against the mutation rate of the bacterium in vitro. Luria and Delbrück fluctuation analysis measures the rate at which coding polymorphisms conferring a selectable phenotype arise under stable in vitro conditions[16]. While standard and widely applied, this approach is limited in that it only measures the rate of a small set of coding mutations within a single region of the genome and is therefore not as sensitive as WGS. However, the fluctuation analysis derived mutation rate can be converted to a per base mutation rate by defining the number of mutations conferring resistance[17] and then compared to the mutation rate determined with WGS. Using fluctuation analysis and scoring for the acquisition of rifampicin resistance, we found that the rate of resistance is 2.1×10−09 (Supplementary Fig. 1a,b), consistent with or slightly higher than previously published values for Mtb[18,19]. We then used Sanger sequencing to define the number of coding mutations conferring rifampicin resistance under our growth conditions and found it occurred through ten unique polymorphisms, consistent with previous reports[20] (Supplementary Fig. 1c). Dividing the phenotypic rate by the target size, we determined that the in vitro mutation rate of the inoculating strain (Erdman) is μin vitro=2.1×10−10. Thus μ(20hr), our conservative estimate of the in vivo mutation rate from every disease state, is highly similar to the bacterium's mutation rate observed during in vitro growth.Why does the bacterial population in macaques with clinically latent infection acquire mutations at a similar rate to rapidly replicating bacteria in vitro? One possibility is that Mtb could be actively dividing during the entire course of prolonged clinical latency, perhaps balanced by robust killing. However, though the generation time of Mtb in animals or humans with latent infection is unknown, several lines of evidence suggest that Mtb replication slows during clinical latency[12-15]. If the generation time slows, the mutation rate would have to be proportionally higher to generate the number of mutations observed. For example, if Mtb from animals with latent infection replicate on average every 135 hours, as in mice after ten weeks of chronic infection[12], the bacterial population must have an average mutation rate per generation of 1.80×10−09, nearly an order of magnitude greater than the in vitro mutation rate (Table 1).An alternative interpretation is that the mutational capacity of Mtb during latent infection is determined primarily by the length of time the organism spends in the host environment rather than the replicative capacity and replicative error of the organism during infection. We noted that eight of the ten polymorphisms that we identified in our isolates from animals with persistent latent or reactivated latent infection are possible products of oxidative damage, either cytosine deamination (GC>AT) or the formation of 8-oxoguanine (GC>TA) (Fig. 4a). This is consistent with the model that Mtb faces an oxidative environment in the macrophage phagolysosome[21,22] and data indicating that genes involved in the repair of oxidative damage are essential for bacterial survival during murineinfection[23]. In addition, we found that the pattern of polymorphisms in Mtb from cynomolgus macaques is similar to the pattern of neutral polymorphisms that emerged during the evolution of extensively drug resistant Mtb strains in patients from South Africa (Fig. 4a)[9]. Thus, the mutational capacity of Mtb during latent infection as well as the spectrum of those mutations suggests that the dominant source of mutation during latency is oxidative DNA damage rather than replicative error[24] (Fig. 4b). This may occur because the immune response that results in latent infection causes more oxidative damage to the bacterial DNA[14,25] or because a portion of the bacteria may enter a metabolically quiescent state in which DNA repair is diminished[26,27].
Figure 4
Mutations in Mtb isolated from macaques with latent infection and related human isolates are putative products of oxidative damage
(a) Ten of the fourteen mutations observed in this study could be the product of oxidative damage: the deamination of cytosine (GC>AT) or the production of 7,8-dihydro-8-oxoguanine (GC>TA) by the oxidation of guanine. One of each type of mutation observed was seen in active disease (four mutations total). In contrast, eight of ten mutations observed in latent and reactivated disease are potential products of oxidative damage. There is a similar mutational spectra observed in the synonymous SNPs identified by WGS of a set of closely related strains from South Africa[9]. (b) These observations lead to a model of mutational pressures on Mtb during active disease and latent infection in which oxidative damage may play a central role in the generation of mutation.
Thus, using WGS, we demonstrate that Mtb has greater mutational capacity in latency and early reactivation disease than predicted by in vitro measurements of mutation rate and estimates of in vivo generation time. These data indicate that Mtb retains the ability to acquire drug resistance mutations during latency. The rate at which clinical drug resistance will emerge after IPT treatment of latently infected individuals harboring an initially drug-sensitive population also depends upon the number of bacteria in a latently infected individual and the rate of reactivation, which is low in immunocompetent individuals. Indeed, there is only a modest increased risk of INH resistance after IPT in immunocompetent populations[1-2], some of which may be attributable to selective killing of susceptible bacterial populations, leaving only resistant populations to reactivate[28]. Our results suggest that in addition to these mechanisms, part of the increased risk of INH resistance after IPT may be due to selection of monoresistant mutants that arise during latency. IPT is now being recommended globally for HIV+ individuals with clinically latent tuberculosis where bacterial burden and the rate of treatment failure may be higher because of immunocompromise[29]. If our data from the macaque model are predictive of the mutational capacity of Mtb in HIV+ individuals, INH monoresistance could arise at a substantial rate. These findings emphasize the importance of drug resistance testing and careful monitoring for treatment failure in these patient populations.
Online Methods
Preparation of isolates
Animals were infected as described previously[7] via bronchoscopy with a small number (~25) of organisms. Like humans, macaques developed either active disease or controlled latent infection. In latency, animals became clinically asymptomatic, without microbiologic or radiographic evidence of disease. Clinically latent animals were followed as described previously[7] for prolonged periods of time in the absence of treatment. Spontaneous reactivation of latent infection occurred in a small number of animals. Animals were euthanized at the indicated times after infection and lesions identified on necropsy were plated for bacterial colonies (Fig. 2)[7]. Colonies from necropsy were subsequently streaked onto LJ slants and expanded for extraction of genomic DNA. Minimal expansion occurred between isolation of strains and extraction of genomic DNA.
Illumina Sequencing
Two μg of genomic DNA from each isolate were used for sequencing with the Illumina Genome Analyzer (Illumina). Single-read & paired-end read sequencing was performed with read lengths of 36 bases or 75 bases and a target coverage of at least 3 million high quality bases. Libraries were prepared using standard sample preparation techniques recommended by the manufacturer. Libraries were quantified using a Sybr qPCR protocol with specific probes for the ends of the adapters. The qPCR assay measures the quantity of fragments properly adapter-ligated that are appropriate for sequencing. Based on the qPCR quantification, libraries were normalized to 2nM and then denatured using 0.1 N NaOH. Cluster amplification of denatured templates occurred according to manufacturer's protocol using V2 Chemistry and V2 Flowcells (1.4mm channel width). Sybr Green dye was added to all flowcell lanes to provide a quality control checkpoint after cluster amplification to ensure optimal cluster densities on the flowcells. Flowcells were sequenced on Genome Analyzer II's, using V3 Sequencing-by-Synthesis kits and analyzed with the Illumina v1.3.4 pipeline. Standard quality control metrics including error rates, % passing filter reads, and total Gb produced were used to characterize process performance prior to downstream analysis. Paired-end reads of 51 bases were acquired and analyzed as described previously[9]. Short sequence read data is available on the NCBI SRA (accession numbers in Supplementary Table 1) and on an independent site hosted by the Broad Institute and linked through the TB Database (see URLs).
Data filtering and assembly
Two read lengths were generated (Supplementary Table 1). Prior to mapping or assembly, the 2×75 bp reads were trimmed to 48 bases and filtered. Any read containing an unknown base was discarded. Reads with homopolymeric runs of A/Ts greater than nine bases or G/Cs greater than ten bases were discarded. Reads with an average quality score of less than 20 were removed. On average greater than 8,000,000 reads were retained after filtering. The 36 bp reads were not filtered. For de novo assembly, reads were processed with Edena v2.1.1[10] in overlapping mode with the default parameters to allow for the detection of insertions and deletions as well as SNPs. In assembly mode, “strict” was enforced and independent assemblies were generated with length overlaps ranging from position 23 to 37 bases. Assemblies generating the largest N50 values were selected for polymorphism discovery. Each assembly was analyzed by pair wise comparison using the MUMmer script, dnadiff[31]. Polymorphisms were further processed from the `SNP' files with Perl scripts and mapped to the reference genome H37Rv (Genbank Accession AL123456). For scaffolded assembly, Illumina reads were mapped to the reference genome Haarlem with MAQ v0.7.1[8]. Illumina fastq files for each pair were converted with sol2sanger, individually mapped to the reference and merged together for each isolate. Three mismatches in the alignment seed were allowed during mapping. A minimum read depth of ten was required to call SNPs and remaining parameters were defaults from the script easyrun. For 51bp paired end reads, a minimum read depth of five was required to call SNPs. For 36bp reads, reads were aligned to the reference using easyrun defaults except that we allowed up to three mismatches in the seed. For the Erdman inoculum, four runs were merged to generate the assembly. As this represented the first WGS of the Erdman strain of Mtb, multiple Mtb finished genomes were used as references in preliminary alignments. The Haarlem sequence resulted in the fewest number of SNPs and was selected as the reference sequence for the remainder of the alignments (see URLs). Only sites of difference between the experimental isolates were pursued for further analysis. A master list of sites was created and calls for each site from all samples were extracted with the MAQ command “pileup”, combined into a table and inspected manually. All polymorphic loci were validated either by Sanger sequencing using the indicated primers (Supplementary Table 2) or by independent identification by WGS.
Statistical analysis and estimation of in vivo mutation rate from WGS data
Mutation rate was estimated from the number of SNPs observed in each clinical condition. Our equations assume that both mutation rate and growth rate are parameters that, while potentially dynamic, can be averaged across the lifetime of the bacterium. Additionally, we assume that the number of mutations (m) is an accurate assessment of mutation rate during the life of the cell. SNPs observed multiple times within the same lesion were assumed to have arisen once and then replicated; as such they were each only counted once. Equation (1) describes the estimation of the mutation rate of a single strain as described in Table 1.
Mutation rate (μ) is determined by dividing the number of SNPs (m) by the genome size (N) times the number of generations (t/g). m is defined by the number of SNPs observed, N is determined based on 91% coverage of a 4.4Mb genome (N=4 × 106), t is the total duration of each infection in hours, and g is the generation time in hours. The application of this equation to a clinical condition is described by Equation (2). Samples were binned according to clinical condition and a representative mutation rate was estimated for each condition. Binning allows us to conservatively assess the distribution of mutations in each condition.
In Equation (2), the sum of the SNPs observed (mi) in a condition is divided by the genome size (N) multiplied by the sum of the number of replications possible (ti / g). The number of replications possible is calculated by dividing the total length of infection (in hours, ti) for strain i by the generation time in hours. The generation time, g, was varied from 18 hours to 240 hours to capture the maximum range of biologically plausible generation times. All calculations were performed in Matlab (Mathworks, Natick MA, USA). Estimates of mutation rate and 95% confidence intervals were determined using the poissfit function. Additional probability values were generated for each value of g using the binopdf function. The binopdf parameters and values matched exactly those produced by poissfit, reflective of the ability of the Poisson distribution to approximate the binomial distribution when N is large and p is small. Thus, binopdf was used to calculate the probability density function for the observed number of mutations given a mutation rate and a fixed value for g, while poissfit was used to calculate the estimates of μ and the 95% confidence intervals.
Determination of the in vitro mutation rate
To determine the in vitro mutation rate, we performed fluctuation analysis as previously described[18]. Briefly, 20 independent cultures of 1.08 × 10[9] cells each in 4mL of 7H9 supplemented with OADC, 0.05% Tween-80 and 0.5% glycerol were plated onto 7H10 plates supplemented with OADC, 0.05% Tween-80, 0.5% glycerol and 2μg/mL of rifampicin. The number of mutations per culture (mMSS) was calculated from the distribution of mutants using the MSS method[16] calculated by the Matlab scripts described by Lang et al[17]. Phenotypic mutation rate was estimated by dividing mMSS by the number of cells plated (Nt = 1.08 × 10[9]). The number of rpoB mutations conferring rifampicin resistance in our assay was determined by amplifying the resistance region of rpoB from 96 independent isolates from fluctuation analysis. A single base mutation rate (μin vitro) was calculated by dividing the rifampicin resistance rate by the number of mutations conferring rifampicin resistance.
Mutational spectrum of synonymous SNPs in the KwaZulu Natal Mtb isolates
We identified the synonymous mutations that distinguished the sequenced drug susceptible, MDR and XDR strains of Mtb from KwaZulu Natal, South Africa from each other using previously published data[9]. Polymorphisms were called in reference to the sequenced F11 strain of Mtb and polymorphisms in repetitive and IS elements, PE, PPE and PGRS genes and pks12 were excluded from this analysis because these genes contain large, near perfect repeats that create a high likelihood of sequencing and assembly error.
Authors: A Telenti; P Imboden; F Marchesi; D Lowrie; S Cole; M J Colston; L Matter; K Schopfer; T Bodmer Journal: Lancet Date: 1993-03-13 Impact factor: 79.321
Authors: J H Perriëns; R L Colebunders; C Karahunga; J C Willame; J Jeugmans; M Kaboto; Y Mukadi; P Pauwels; R W Ryder; J Prignot Journal: Am Rev Respir Dis Date: 1991-10
Authors: Simeone Marino; Nicholas A Cilfone; Joshua T Mattila; Jennifer J Linderman; JoAnne L Flynn; Denise E Kirschner Journal: Infect Immun Date: 2014-11-03 Impact factor: 3.441
Authors: Gwenan M Knight; Caroline Colijn; Sourya Shrestha; Mariam Fofana; Frank Cobelens; Richard G White; David W Dowdy; Ted Cohen Journal: Clin Infect Dis Date: 2015-10-15 Impact factor: 9.079
Authors: Ted Cohen; Paul D van Helden; Douglas Wilson; Caroline Colijn; Megan M McLaughlin; Ibrahim Abubakar; Robin M Warren Journal: Clin Microbiol Rev Date: 2012-10 Impact factor: 26.132
Authors: Yong Yang; Kathleen Kulka; Ronald C Montelaro; Todd A Reinhart; James Sissons; Alan Aderem; Anil K Ojha Journal: Cell Host Microbe Date: 2014-02-12 Impact factor: 21.023
Authors: Monty Liong; Anh N Hoang; Jaehoon Chung; Nil Gural; Christopher B Ford; Changwook Min; Rupal R Shah; Rushdy Ahmad; Marta Fernandez-Suarez; Sarah M Fortune; Mehmet Toner; Hakho Lee; Ralph Weissleder Journal: Nat Commun Date: 2013 Impact factor: 14.919