Literature DB >> 32524000

Mycobacterium tuberculosis clinical isolates carry mutational signatures of host immune environments.

Qingyun Liu1,2, Jianhao Wei3, Yawei Li4,5, Mei Wang3, Jun Su3, Yonghui Lu3, Mariana G López6, Xueqin Qian3, Zhaoqin Zhu3, Haiying Wang7, Mingyun Gan8, Qi Jiang1,9, Yun-Xin Fu10, Howard E Takiff11,12, Iñaki Comas6,13, Feng Li3, Xuemei Lu14,15, Sarah M Fortune2,16,17, Qian Gao1,9.   

Abstract

Mycobacterium tuberculosis (Mtb) infection results in a spectrum of clinical and histopathologic manifestations. It has been proposed that the environmental and immune pressures associated with different contexts of infection have different consequences for the associated bacterial populations, affecting drug susceptibility and the emergence of resistance. However, there is little concrete evidence for this model. We prospectively collected sputum samples from 18 newly diagnosed and treatment-naïve patients with tuberculosis and sequenced 795 colony-derived Mtb isolates. Mutant accumulation rates varied considerably between different bacilli isolated from the same individual, and where high rates of mutation were observed, the mutational spectrum was consistent with reactive oxygen species-induced mutagenesis. Elevated bacterial mutation rates were identified in isolates from HIV-negative but not HIV-positive individuals, suggesting that they were immune-driven. These results support the model that mutagenesis of Mtb in vivo is modulated by the host environment, which could drive the emergence of variants associated with drug resistance in a host-dependent manner.
Copyright © 2020 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).

Entities:  

Year:  2020        PMID: 32524000      PMCID: PMC7259932          DOI: 10.1126/sciadv.aba4901

Source DB:  PubMed          Journal:  Sci Adv        ISSN: 2375-2548            Impact factor:   14.957


INTRODUCTION

Infection with Mycobacterium tuberculosis (Mtb) causes a spectrum of clinical outcomes, from latent infection to active disease. During latent infection, Mtb can survive in the infected host for decades, creating a reservoir that continuously fuels the global tuberculosis (TB) epidemic (). Control of the TB epidemic has been complicated by the emergence of high-level antibiotic resistance, which occurs through de novo chromosomal mutations (). We understand little about the rates and drivers of mutation in Mtb within the infected host where the bacterial population is thought to face a range of different environments and immunologic stresses in different disease states, as reflected by different histopathologic manifestations of infection, including granuloma formation and less-organized areas of inflammation. It is postulated that host immune stressors drive the bacterial population toward a nonreplicative state. Thus, replication-associated mutations are predicted to accumulate at very low rates (–). However, a study in the macaque TB model found that Mtb accumulated mutations at roughly the same rate during both active disease and early latent infection as during rapid growth in vitro (). This mutant accumulation rate is remarkably similar to the molecular clock estimated from the study of human Mtb isolates, suggesting that there are additional drivers of mutation in vivo (). Whole-genome sequencing has been used to track the evolution of Mtb bacilli within the host during antibiotic treatment and demonstrate how drug-resistant mutations arise and become fixed in the population (–). When a patient develops active TB, they are estimated to harbor 1010 to 1012 bacilli (), and such a large population should contain numerous genetic mutations that could be used to study the mutagenesis of Mtb in vivo (, , ). Because antibiotic treatment can reduce Mtb population and diversity very quickly, such mutational records should be preserved only in treatment-naïve patients. By applying whole-genome sequencing to large numbers of individual colonies recovered from 18 patients, we characterized the genetic diversity of Mtb populations at the onset of TB disease and reconstructed the within-host evolution of Mtb. This approach provided sufficient resolution to characterize the mutational signature of Mtb at the individual colony level, which should represent single bacilli in vivo. Our data indicate that mutagenesis of Mtb in vivo is modulated by the host environment, suggesting that the risk of de novo drug resistance varies in a host-dependent fashion.

RESULTS

Sampling Mtb population before antibiotics treatment

Between 1 February 2017 and 31 December 2018, we recruited 18 new, smear-positive patients with TB who had not taken any anti-TB drugs before the diagnosis. All of the isolates were pan-susceptible except for one isolate that was resistant only to isoniazid (table S1). We collected three to five sputum samples from the new patients with TB on the day of diagnosis, which allowed us to deeply sample the Mtb populations in their pulmonary lesions that communicated with the airways. The sampling size of the Mtb population was estimated to be between 25,000 and 150,000 bacilli per patient, according to the conversion index between smear microscopic scores and bacterial load (, ). The sputum sample from each patient was digested, dispersed, serially diluted, and then cultured on Löwenstein-Jensen (L-J) solid medium (Fig. 1A). The remainder of the undiluted sputum was cultured separately to represent the whole population of bacilli present. For each patient, we randomly picked ~50 well-separated colonies for whole-genome sequencing (Fig. 1A). In addition, we scraped all the colonies from the whole population plates of nine patients for deep whole-genome sequencing (Fig. 1A).
Fig. 1

Schematic diagram of the sampling approach and genetic diversity of the Mtb population within the host.

(A) We repeatedly collected three to five sputum samples from new and treatment-naïve patients with TB. All sputum samples from a single individual were mixed together and processed with a standard procedure and then serially diluted. The target dilutions were spread onto five plates, while the remaining undiluted samples were mixed together and spread onto two plates. WGS, whole-genome sequencing. (B) The correlation between the frequencies of single-nucleotide polymorphisms (SNPs) in scraped samples and the relative single-colony samples was tested by Pearson’s correlation coefficient (r); the small inset shows an enlargement of the dashed box on the left. (C) A bar plot showing the numbers of de novo SNPs that were detected in colony samples in different patients with the number of SNPs that were not detected in the corresponding scraped whole population samples highlighted. (D) A histogram showing the distribution of pairwise SNP distance between any two single colonies from the same patient with the two dashed lines indicating the two commonly used SNP thresholds for defining transmission clusters. (E) A violin plot showing the distribution of numbers of de novo SNPs in single colonies from each patient.

Schematic diagram of the sampling approach and genetic diversity of the Mtb population within the host.

(A) We repeatedly collected three to five sputum samples from new and treatment-naïve patients with TB. All sputum samples from a single individual were mixed together and processed with a standard procedure and then serially diluted. The target dilutions were spread onto five plates, while the remaining undiluted samples were mixed together and spread onto two plates. WGS, whole-genome sequencing. (B) The correlation between the frequencies of single-nucleotide polymorphisms (SNPs) in scraped samples and the relative single-colony samples was tested by Pearson’s correlation coefficient (r); the small inset shows an enlargement of the dashed box on the left. (C) A bar plot showing the numbers of de novo SNPs that were detected in colony samples in different patients with the number of SNPs that were not detected in the corresponding scraped whole population samples highlighted. (D) A histogram showing the distribution of pairwise SNP distance between any two single colonies from the same patient with the two dashed lines indicating the two commonly used SNP thresholds for defining transmission clusters. (E) A violin plot showing the distribution of numbers of de novo SNPs in single colonies from each patient.

Sequencing single colonies can delineate the Mtb population structure

In total, we obtained whole-genome sequence data for 795 single colonies from 18 patients (average sequencing depth is 119.4) and nine scraped whole populations (average sequencing depth is 726.3). Sequencing reads were aligned to the reconstructed ancestral genome of the Mtb complex to detect single-nucleotide polymorphisms (SNPs) (). The SNPs shared by all colonies from a given patient were defined as “inherited SNPs,” indicating phylogenetic SNPs that were fixed before the infection. The study then focused on the SNPs that were present in only a proportion of colonies that were termed “de novo SNPs” to indicate their de novo accumulation during infection. To verify that sequencing of ~50 colonies would represent the Mtb population structure in the original samples, we compared the frequency of SNPs detected in the deep-sequenced scraped whole population samples with their ratio in the single colonies. We detected 107 SNPs with a frequency above 1.5% in the nine scraped samples, of which 67 (62.6%) could be detected with a similar frequency in the corresponding single colonies (Pearson’s correlation coefficient: 0.982; Fig. 1B and fig. S1). The SNPs with higher frequencies showed better correlation, while the SNPs with frequencies between 1.5 and 15% presented considerable variation between the scraped population samples and the individual colonies (Pearson’s correlation coefficient: 0.503; Fig. 1B). In addition, we detected 224 SNPs that were present in only one or two single colonies but were not detected in the scraped samples, indicating the increased sensitivity of single-colony sequencing for capturing low-frequency mutations in the population.

Mtb diversity at the onset of TB disease

There was a broad range in the total number of de novo mutations found in the bacterial populations isolated from each patient (0 to 116 SNPs; Fig. 1C), with the average number of de novo mutations varying from 0 to 11.3 SNPs (fig. S2A). However, the SNP distance between any two colonies from the same patient was much lower than between any two strains from different patients (fig. S2, B and C). Mapping of both the fixed and unfixed SNPs from each patient to our phylogenomic database for the identification of multiple evolutionary paths () did not suggest any mixed infections in our samples. Of the 18 Mtb strains, 17 were lineage 2 strains and one belonged to lineage 4 (table S1). Because the SNP distance between different bacilli within a single individual is an important reference for establishing the SNP threshold for epidemiologically defining transmission clusters (), we calculated the pairwise SNP distance between any two single colonies from each patient (Fig. 1D). For 91.0% of the pairs, the difference was ≤5 SNPs, and for 98.8% of the pairs, the difference was ≤12 SNPs, indicating that these two widely used thresholds (5 or 12 SNPs) encompass most of the genetic distance between different bacilli within a patient. However, in 7 of 18 patients, there were colony pairs that differed by more than 5 SNPs and 3 of 18 patients had colony pairs that differed by more than 12 SNPs, indicating that these larger SNP distances can occur within an individual and thus could occasionally be found between isolates from cases linked by direct transmission.

Two patterns of Mtb population growth

We next used the minimum evolution method to reconstruct phylogenetic trees for the single colonies from each patient, setting the inferred ancestral genome as the root (Fig. 2). We found that the in vivo populations of Mtb separated into two different patterns: “starlike expansion” and “stepwise growth”. Ten patients were characterized by the starlike expansion model (Fig. 2A). For these cases, only a few colonies showed de novo SNPs, while the majority maintained the ancestral genome (Fig. 2A). This pattern suggests a bacterial population derived from a recent expansion, with a starting population that was genetically homogeneous. This observation was consistent with a model in which a small number of Mtb bacilli established the infection, proliferated, and caused TB disease relatively rapidly ().
Fig. 2

Phylogenetic trees of Mtb populations from different patients.

All trees are rooted to the inferred ancestral genome and all the “inherited SNPs” were excluded before the phylogenetic reconstruction. The length of solid lines represents the number of de novo SNPs. (A) “Starlike expansion” trees for these patients are shown in a circle format. Trees for patients G and R were not shown because no de novo SNPs were detected. (B) “Stepwise growth” trees for these patients are shown in rectangular format. Gray stars indicate those colonies with excessive de novo SNPs and the taxa names with blue backgrounds highlight the recently expanded populations.

Phylogenetic trees of Mtb populations from different patients.

All trees are rooted to the inferred ancestral genome and all the “inherited SNPs” were excluded before the phylogenetic reconstruction. The length of solid lines represents the number of de novo SNPs. (A) “Starlike expansion” trees for these patients are shown in a circle format. Trees for patients G and R were not shown because no de novo SNPs were detected. (B) “Stepwise growth” trees for these patients are shown in rectangular format. Gray stars indicate those colonies with excessive de novo SNPs and the taxa names with blue backgrounds highlight the recently expanded populations. By contrast, eight patients presented a pattern of stepwise growth, typically suggesting two or three stages (Fig. 2B): (i) growth after the initial infection with divergence into subpopulations containing different SNP markers, (ii) a limited number of subpopulations achieving a second wave of growth with further genetic differentiation of the within-host Mtb population, and (iii) recent expansion of one or a very few of the subpopulations. It appeared that 68.7 to 96.0% of the colonies were the result of the expansion at the third stage. These inferences rely on the assumption that the initial infection was established by a single or a few genetically homogeneous bacilli of Mtb, as appeared to be typical in the cases with starlike expansions. However, the possibility of initial infections with several different, closely related bacilli cannot be conclusively ruled out. In either scenario, the patterns suggested that a large proportion of the Mtb population at the time of clinically detectable TB disease derived from a recently expanded subpopulation.

Mutation rate varies between bacilli within a single individual

There was a variation in the number of de novo SNPs that accumulated in different colonies from the same patient (Fig. 1E). For example, for patients J and P in the starlike group, and patients A, C, I, K, Q, H, and S in the stepwise group, a few colonies (marked by gray stars) accumulated 4 to 14 de novo SNPs, while most colonies maintained the ancestral genotypes (Fig. 2). Because the in vivo mutation rate of Mtb is approximately 0.24 to 0.5 SNPs per genome per year (, –), this disparity suggested that the mutation rate in vivo might not be uniform. To test this, we simulated Mtb population growth in silico using a growth model with a constant mutation rate (Wright-Fisher model) (). The simulated data showed that the distribution of numbers of de novo SNPs appeared as a very convergent Poisson distribution with only one peak (Fig. 3). In contrast, the observed data for patients H, I, J, and S showed multiple peaks in the distribution of the number of de novo SNPs. A comparison of the 0.9 quantiles in the observed and simulated data shows that H, I, J, and S had several colonies with more de novo SNPs than were predicted (Fig. 3), suggesting a deviation from a constant mutation rate model. An alternative explanation might be positive selection that preferentially selected beneficial mutations. To test this possibility, we compared the proportion of nonsynonymous to synonymous mutations (pNS, similar to dN/dS; see details in Materials and Methods) in the de novo SNPs from the four patients whose colonies had the most SNPs (H, I, J, and S) against those of the other patients. We found that the average pNS value for the four patients with high SNP colonies was 0.64 (H, 0.67; I, 0.56; J, 0.83; S, 0.46), while it was 0.79 for the other patients. This indicates a purifying selection in all patients and argues against positive selection as an explanation for the high SNP colonies.
Fig. 3

Comparison of the distribution of de novo SNP numbers between simulated and observed populations.

(A to D) The comparisons of the Mtb populations from patients H, I, J, and S, respectively. The height of histograms shows the proportions of colonies with the relative number of de novo SNPs. The P values indicate the hypergeometric test for 0.9-quantile SNP numbers for the simulated and observed populations.

Comparison of the distribution of de novo SNP numbers between simulated and observed populations.

(A to D) The comparisons of the Mtb populations from patients H, I, J, and S, respectively. The height of histograms shows the proportions of colonies with the relative number of de novo SNPs. The P values indicate the hypergeometric test for 0.9-quantile SNP numbers for the simulated and observed populations.

Enhanced mutation rate likely induced by reactive oxygen species

The four patients, H, I, J, and S, whose colonies suggested non-constant mutation rates also had larger numbers of total de novo SNPs (Fig. 1C), and the majority of the SNPs were C>T and G>A, the base changes associated with oxidative damage (Fig. 4A) (). Although oxidative damage mutations are the major source of genetic changes during in vivo growth (, ), the ratio of these mutations in patients H, I, J, and S was significantly higher than the average level for all patients (80.6% versus 67.1%, P < 0.0001) and even higher than that of the fixed SNPs in a collection of 8399 genomes from global isolates (80.6% versus 47.2%, P < 0.0001; fig. S3). The most notable case was patient H, where 101 of the 116 (87.1%) total de novo SNPs were due to the base changes associated with oxidative damage. We further created histograms of mutation-type compositions for each colony and mapped them to the phylogenetic tree for each patient (Fig. 4B and fig. S4). For the 40 colonies from patient H, 27 had only 0 to 3 de novo SNPs, while 13 colonies had 4 to 14 de novo SNPs. The extra SNPs in these 13 colonies were exclusively mutations associated with oxidative damage—C-T and double CC-TT mutations (Fig. 4B)—which are thought to indicate exposure to reactive oxygen species (ROS) (, ). We also observed a similar pattern with I, J, and S (Fig. 4B). Moreover, the colonies with longer mutation branches from patients A, K, Q, and P can also be explained by C-T and CC-TT mutations (fig. S4).
Fig. 4

Oxidative damage mutations are the source of excessive de novo SNPs.

(A) The proportions of different mutation types in the samples from patients H, I, J, and S showed a deviation from the average level. (B) Integration of phylogenetic tree and mutation composition histograms for patients H, I, J, and S. For patient S, only the major clade (the 48 colonies) is shown here. Different colors refer to different mutation types. The gray dot “MRCA” represents the reconstructed ancestor of each Mtb population. For patients H, J, and S, the MRCA strains were detected in the single colonies. (C) A comparison of C-T mutation ratio between samples from HIV-negative and HIV-positive hosts (P < 0.0001 by t test), with each dot representing one patient. (D) The bar plot shows the ratio of C-T mutations in all samples from HIV-negative and HIV-positive hosts, respectively.

Oxidative damage mutations are the source of excessive de novo SNPs.

(A) The proportions of different mutation types in the samples from patients H, I, J, and S showed a deviation from the average level. (B) Integration of phylogenetic tree and mutation composition histograms for patients H, I, J, and S. For patient S, only the major clade (the 48 colonies) is shown here. Different colors refer to different mutation types. The gray dot “MRCA” represents the reconstructed ancestor of each Mtb population. For patients H, J, and S, the MRCA strains were detected in the single colonies. (C) A comparison of C-T mutation ratio between samples from HIV-negative and HIV-positive hosts (P < 0.0001 by t test), with each dot representing one patient. (D) The bar plot shows the ratio of C-T mutations in all samples from HIV-negative and HIV-positive hosts, respectively.

The elevated mutation rate was host dependent

This elevated mutation rate could be explained either by an increased susceptibility of these particular Mtb strains to oxidative damage or by increased bacterial exposure to oxidative stress in a subset of the patients. If some of the strains had increased ROS susceptibility, all of the colonies derived from these Mtb strains should exhibit similar patterns of mutations and comparable numbers of SNPs, but this was not what was observed; different colonies from the same strain showed a wide variation in de novo SNPs (Fig. 4B). In addition, strains that gave rise to colonies with higher mutation rates should have more C-T and CC-TT inherited mutations than the other strains, but there was no difference in the ratio of C-T and CC-TT between the two groups for the inherited mutations (fig. S5). Last, the four strains whose colonies had more mutations (H, I, J, and S) were not clustered in the phylogenetic tree, showing that the increased mutation rate was not a feature of a particular phylogenetic clade (fig. S6); besides, we found no homoplastic mutation between any two of these four strains. Together, these results suggest that increased bacterial susceptibility to ROS seemed to be not a plausible explanation for the higher mutation rates in a subset of colonies. As the production and regulation of ROS and reactive nitrogen species are important components of the host immune response to pathogen infections, we hypothesized that the variation in the bacterial mutation rate could be host dependent. We compared the ratio of C-T mutations in Mtb samples collected from HIV-negative hosts (all 18 patients) in this study with the ratio from HIV-positive hosts in a previous study of 2587 Mtb single colonies from 42 HIV-positive hosts (). This comparison showed that Mtb strains from HIV-negative hosts accumulated more C-T mutations than those from HIV-positive hosts (P < 0.0001, Fig. 4, C and D), and the difference remained when we restricted this comparison to lineage 2 strains from the two groups (P = 0.0005; fig. S7). Furthermore, the double CC-TT mutations were not observed in the Mtb colonies from HIV-positive hosts, and in these hosts, the phylogenetic trees of the de novo SNPs did not show colonies with “long branches” (fig. S8). These analyses thus suggest that the signature of ROS-associated mutations could be driven by immune pressure.

DISCUSSION

This work analyzed the population structure and mutation signatures of Mtb by genome sequencing colonies isolated directly from clinical specimens, thereby allowing us to make inferences about the growth and evolution of the bacterial population during infection. The data present at least four clear findings: (i) distinct levels of bacterial diversity were observed in the Mtb populations from different patients; (ii) the pairwise SNP distance between any two colonies within a host could occasionally exceed the SNP thresholds used to define a transmission cluster; (iii) the number of de novo SNPs varied significantly between different bacilli within the same host, suggesting that the mutation rate varies between different bacteria within a host; and (iv) the elevated mutation rate in some colonies carries the signature of ROS-induced mutagenesis. The estimated mutation rate of Mtb, about 0.5 mutations in every 10,000 genomes copied (), is low compared to most other bacteria, yet the high rates of acquired resistance in clinical strains raise the question of whether the mutation rate within the host could somehow be higher (). The rate calculated from an in vivo macaque infection model was very similar to that inferred in vitro through fluctuation experiments, despite the differences in bacterial replication rates in vitro and in vivo (, ), and deep whole-genome sequencing of serial sputum samples from patients with TB taken during treatment revealed a high degree of genetic diversity with tens of unfixed SNPs (, ). In the current study, we found an increased rate of ROS-associated mutations that could perhaps explain the high rate of acquired resistance during clinical treatment. Those patients in whom Mtb is subjected to high levels of ROS mutagenesis should carry larger pools of mutations and, thus, perhaps have a higher risk for developing drug resistance. Further studies are warranted to understand why some patients induce higher levels of ROS mutagenesis, how to identify them, and how to reduce their risk of developing of drug resistance. Unexpectedly, only 4 of the 18 patients had bacterial populations wset of the colonies from each of these four patients. Because the excess mutations were predominantly the C-T changes, the results suggest that ROS stress can vary between hosts and also that ROS heterogeneity exists in different microenvironments within the same host. This model is consistent with the growing knowledge of granuloma heterogeneity, where each granuloma represents a microenvironment that can be independently influenced by the local immune response (, ). This inference was further supported by the lower ratio of C-T mutations in strains of HIV-positive patients, implying that the host immune status plays a role in the frequency of oxidative damage mutations. Because unfixed SNPs were detected in 39% of the single colonies, we scanned them for CC-TT mutations and, as expected, found them in the genomes of colonies isolated from the patients with an elevated mutation rate: H (), I (), J (), and S (). Unexpectedly, we also found CC-TT mutations in colonies from patients without elevated mutation rates: 26 in colonies from E, 2 from F, and 3 from O (fig. S9A). This suggests that the rates of ROS-induced mutations could have been underestimated in our study because of the limitations of our sampling size and a mixture of different colonies. We considered positive selection as an alternative explanation for the high numbers of SNPs found in some colonies but discarded this explanation for the following reasons. First, pNS analysis suggested that a negative selection was operating on the bacterial population of strains H, I, J, and S, which was consistent with previous findings that within-host selective pressure on Mtb is governed by negative selection (, , ). Second, in the colonies with large numbers of SNPs, the mutation type of excessive SNPs was almost exclusively C-T or CC-TT. It seems unlikely that a particular mutation type would be favored by positive selection, which should be based only on the functional impact of the mutations on the genes without a preference for a specific type of mutation. Third, if a positive selection was operating, we should expect to see some homoplasy or gene enrichment in the different colonies with large numbers of SNPs isolated from a single patient. However, we found neither homoplastic mutations nor gene enrichment across the colonies from each of the four strains that had high SNP colonies. Our analysis suggested that increased bacterial susceptibility to ROS might not be a plausible explanation for the higher mutation rates in a subset of colonies, but we cannot completely rule out its role in the phenomenon we observed. Heterogeneity in the strains’ susceptibility to ROS could act together with host environmental stress and perhaps help explain why colonies with high numbers of SNPs were isolated only from some patients. Although we found no homoplastic mutations between any two of these four strains (H, I, J, and S) or mutations on the same gene but in different codons, it is still possible that mutations in different genes could have similar functional effects in increasing the susceptibility to ROS stress. The absence of C-T or CC-TT mutational signature in the inherited mutations was a strong argument, but it cannot exclude the possibility that these four strains only acquired an increased “ROS susceptibility” very recently, for which they would not have accumulated many C-T or CC-TT in the inherited mutations. The biological process of Mtb infection is difficult to study in humans, but PET-CT (positron emission tomography–computed tomography) tracking of the dynamic course of infection in cynomolgus macaques revealed that different lesions in the same patient followed diverse trajectories. While some granulomas were sterilized in both active and latent cases, others grew and ultimately determined the clinical outcome of infection (). Typically, only one or a few granulomas that are unable to contain the bacteria are probably responsible for bacterial dissemination and the onset of active disease (, , ). Our results are consistent with these observations. We found that the burden of TB disease in humans often appeared the result of the recent expansion of a subpopulation of the bacilli present in the individual. Together, these findings suggest a “turning point” theory for the progression of TB disease, in which the deterioration of a particular lesion leads to clinical disease. Hence, finding biomarkers that can predict such a transition or detect it early would allow preventive interventions that might prevent the development of clinical TB disease. For those patients with more mutational diversity, the probability was higher that the genomes of any two selected colonies differed by more than 5 SNPs, and pairs that differed by more than 12 SNPs were not rare (fig. S9, B and C). This finding implies that the dominant subpopulation cultured and sequenced from one patient could differ by >12 SNPs from a variant that infects a second patient. In the second host, this transmitted bacillus could again mutate and generate a more distant variant that dominates when the second patient’s sputum is cultured and sequenced. The commonly used SNP thresholds of ≤5 SNP differences for direct transmission and ≤ 12 SNPs for transmission clustered strains have been supported by epidemiologic data (, ), but our results suggest that strains with slightly more SNP differences might occasionally belong to the same transmission chain and deserve to be included when conducting epidemiological investigations. The extent to which the sputum bacillary population is representative of the total mycobacterial population within an individual is unknown. An ideal and complete characterization of the Mtb diversity within a host would require sampling from each lesion, which is not feasible in patients with TB. The bacilli in sputum mainly reflect Mtb population in lesions that are open and connected to the airway, so bacilli in lesions that were not open or connected to airways may not have been sampled and therefore not represented in our data. However, expectorated tubercle bacilli are thought to originate from sites of extensive bacterial growth that are the major contributors to active disease (, ), so we assume that we sampled at least some of the most clinically relevant sites. In addition, different sputum samples collected on the same day could vary in the Mtb growth sites from which they derive, so to avoid bias, we repeatedly sampled from the same patient and pooled the specimens. In conclusion, this study provides a baseline characterization of Mtb population diversity within the host during active TB disease, and the two patterns of population growth extend our understanding of Mtb proliferation in vivo. We show that the mutation rate of Mtb in vivo appears to vary, presumably related to the host environment. The possibility that the mutation rate may increase under certain in vivo conditions could help to understand how drug resistance evolves.

MATERIALS AND METHODS

Study cohort

The patients were enrolled in the Shanghai Public Health Clinical Center (a designated hospital for TB), in Shanghai, China, and the study was approved by the ethics committee of this clinical center. The recruitment criteria for the patients with TB were as follows: (i) no previous history of TB disease, (ii) no previous anti-TB antibiotics, (iii) positive sputum smear for bacilli on microscopy with a score >1+, (iv) a qualified sputum volume at least 5 ml, and (v) drug-susceptible Mtb isolates. Initially, 20 patients were enrolled, but samples from two patients were excluded because one contained Mycobacterium kansasii and the other was contaminated with fungus, leaving 18 patient specimens for analysis (table S1). In the isolate from patient F, an inhA −15 C-T promoter mutation was found and drug sensitivity testing reported isoniazid resistance after whole-genome sequencing was performed. Because this patient had not previously taken any anti-TB drugs, the strain was included. The methods of this study were carried out in accordance with the approved guidelines, and written informed consent was obtained from the patients before the study.

Sample collection and processing

Patients suspected to have TB were first screened according to recruitment criteria 1 and 2. Routine smear microscopy was performed on the sputum samples, and the remaining sediment was temporarily stored at 4°C. Sputum quality was monitored and specimens with purulent sputum were preferentially frozen. The clinical center routinely performed GeneXpert MTB/RIF tests and culture-based drug susceptibility testing for isoniazid, rifampicin, ethambutol, and pyrazinamide. GeneXpert MTB/RIF results were reported on the same day as diagnosis. When a sputum sample had a microscopic score >1+ and was rifampicin sensitive by GeneXpert, a further three to five sputum samples were collected from these patients before the treatment was started. All sputum samples from each patient were combined to reach a total of at least 5 ml for each patient. The combined sputum samples were subjected to digestion with 1:1 of 2% NaOH-NaCl and left standing for 15 min. Phosphate-buffered saline (PBS; pH 6.8) was added to a final volume of 50 ml and the samples were then centrifuged at 3000g for 15 min. The supernatant was discarded and the sediment was suspended in 1 ml of PBS solution. We then performed serial dilutions for each of the samples, and the dilution index was estimated based on the sputum smear results. For each target dilution, eight L-J medium plates were spread to obtain an estimated 50 colonies on each plate. The remainder of the original, undiluted sputum specimens from each patient was centrifuged and spread onto two L-J medium plates. From most patient isolates, 50 well-separated colonies were selected from different plates, but only 10 colonies were obtained from sample G. Each selected colony was spread onto a fresh L-J plate for a short amplification culture of 1 to 2 weeks and then collected for DNA extraction. For nine patients, all of the colonies from the two plates spread with the undiluted sputa were scraped into a single tube for DNA extraction and designated scraped, whole population samples.

Illumina sequencing and SNP calling

Genomic DNA from both the single-colony isolates and scraped whole population samples was extracted with the cetyltrimethylammonium bromide lysozyme method (). A 300–base pair fragment length library was constructed for each DNA sample and paired-end–sequenced on an Illumina HiSeq 2500 instrument. A previously validated pipeline was used for SNP calling (). Fixed mutations with a frequency of ≥90% and at least 10 supporting reads were identified. Whole-genome sequence data of 8399 Mtb isolates from previous studies were downloaded and subjected to SNP calling to identify the ratio of C-T/G-A mutations in each isolate ().

Filter for unfixed SNPs

A previously validated pipeline was used to filter out false positives and detect unfixed SNPs (). We considered only unfixed SNPs whose frequencies were estimated to be ≥1.5% in whole population samples. Because the false positives shared similar patterns in the strains with close genetic backgrounds, we further used repetitive unfixed SNPs called from Shanghai Mtb isolates that had been previously sampled and sequenced as a background filter (). Ideally, we should not find unfixed mutations in single-colony samples if each colony derived from a single bacillus (). However, we detected ≥1 unfixed SNP with allele frequencies above 10% but below 90% in 39.0% (310 of 795) of the single-colony samples. Because these unfixed SNPs survived our filters for false positives, we further examined their prevalence. We found that some unfixed SNPs detected in one colony from a patient were fixed SNPs (≥90%) in other colonies from the same patient but absent in colonies from other patients, suggesting that these unfixed SNPs were present because the colony had grown from more than one original, isolated bacillus or that colonies from separate bacilli grew into one another. Given these possibilities, we included the unfixed SNPs with frequencies above 50%, which represent the major clones for the subsequent analysis.

Phylogenetic reconstruction

For phylogenetic reconstructions, all SNP locations for each isolate were combined into a nonredundant consensus list and recalled with the mpileup2cns function of VarScan (version 2.3.9) (). Nucleotide positions with missing calls in more than 5% of the isolates were removed. An alignment of the remaining polymorphic positions from all strains was used for phylogeny reconstruction with MEGA 6.0 (). To make the branch lengths represent the number of de novo SNPs, we used the minimum evolution method to infer the phylogenetic trees under the “No. of differences” model with both transitions and transversions included. The bootstrap method was used with 500 replications for each test. Phylogeny trees were visualized in FigTree (version 1.4.3) (http://tree.bio.ed.ac.uk/software/figtree/).

Proportion of nonsynonymous to synonymous mutations

We wished to determine whether the excessive mutations in some colonies of the four patients, H, I, J, and S, were due to positive selection, which would accelerate the mutation frequency and fixation in the population. We therefore used pNS to evaluate the selective pressure in these patients’ samples and compared the values to those of the remaining patients (). The basic principle of the pNS method is similar to that of dN/dS, but dN/dS is used to compare two real sequences that exist in nature, while pNS can be applied to concatenated sequences of all the mutations in the genomes. A codon substitution matrix was generated using a base substitution model that takes into account the proportion of guanine and cytosine in the genome (percentage GC content, 0.656). Briefly, for each variant codon, we used a custom Python script to simulate 50,000 individual introductions of a single mutation into the codon and scored the outcomes as either synonymous or nonsynonymous. We considered the average number of nonsynonymous outcomes of the simulations as an estimate of the probability that a mutation in the given codon would be nonsynonymous. The formula used to calculate the pNS was described previously ().

Simulating Mtb growth

We generated Mtb growth with continuously accumulating neutral mutations under a constant growth rate in silico, starting with a single bacillus cell and ending when the population size reached 108. The mutation rate that we chose here was 2.01 × 10−10 mutations per site per generation (). Because our analysis excluded the mutations in PPE/PE-PGRS family genes and other mobile sequences, which accounts for 8.9% of the whole genome, the genome size was set as 4.41 × 106 × 91.1% = 4.02 × 106. Then, the mutation rate (μ) used was 0.0008 per genome per generation. Mtb bacilli in exponential growth are represented bywhere N is the number of bacilli in generation t with N = 1 and a is a constant number. If the average number of de novo mutations in the TB lineage of a patient is k, the number of generations from the first bacilli cell to 108 is k/μ. Therefore, the parameter . The Wright-Fisher model () was used to simulate discrete TB growth. In generation t, the expected number of bacilli is N = e0.0147. The bacilli in generation t were randomly sampled as progenies of cells from generation t − 1. A newborn cell had a probability μ of accumulating a de novo mutation in each cell division. The workflow of the simulation is displayed in fig. S10.

Test the heavy-tailed distribution

To test whether the de novo mutation number distribution of an observed TB lineage is a heavy-tailed distribution, we sorted the mutation number of observed data and simulated data from lowest to highest separately and used a hypergeometric test to compare the 0.9-quantile mutation number of the observed data and simulated data.

Ethics statement

This study was approved by the Research Ethics Review Committee of the Shanghai Public Health Clinical Center, Fudan University.
  35 in total

Review 1.  Heterogeneity in tuberculosis.

Authors:  Anthony M Cadena; Sarah M Fortune; JoAnne L Flynn
Journal:  Nat Rev Immunol       Date:  2017-07-24       Impact factor: 53.106

2.  Genomic diversity in autopsy samples reveals within-host dissemination of HIV-associated Mycobacterium tuberculosis.

Authors:  Tami D Lieberman; Douglas Wilson; Reshma Misra; Lealia L Xiong; Prashini Moodley; Ted Cohen; Roy Kishony
Journal:  Nat Med       Date:  2016-10-31       Impact factor: 53.440

Review 3.  Prospects for new interventions in the treatment and prevention of mycobacterial disease.

Authors:  D B Young; K Duncan
Journal:  Annu Rev Microbiol       Date:  1995       Impact factor: 15.500

Review 4.  Microscopy as a diagnostic tool in pulmonary tuberculosis.

Authors:  Ritu Singhal; Vithal Prasad Myneedu
Journal:  Int J Mycobacteriol       Date:  2015-02-02

5.  Spontaneous emergence of multiple drug resistance in tuberculosis before and during therapy.

Authors:  Caroline Colijn; Ted Cohen; Ayalvadi Ganesh; Megan Murray
Journal:  PLoS One       Date:  2011-03-30       Impact factor: 3.240

6.  Use of whole genome sequencing to estimate the mutation rate of Mycobacterium tuberculosis during latent infection.

Authors:  Christopher B Ford; Philana Ling Lin; Michael R Chase; Rupal R Shah; Oleg Iartchouk; James Galagan; Nilofar Mohaideen; Thomas R Ioerger; James C Sacchettini; Marc Lipsitch; JoAnne L Flynn; Sarah M Fortune
Journal:  Nat Genet       Date:  2011-04-24       Impact factor: 38.330

7.  The within-host population dynamics of Mycobacterium tuberculosis vary with treatment efficacy.

Authors:  Andrej Trauner; Qingyun Liu; Laura E Via; Xin Liu; Xianglin Ruan; Lili Liang; Huimin Shi; Ying Chen; Ziling Wang; Ruixia Liang; Wei Zhang; Wang Wei; Jingcai Gao; Gang Sun; Daniela Brites; Kathleen England; Guolong Zhang; Sebastien Gagneux; Clifton E Barry; Qian Gao
Journal:  Genome Biol       Date:  2017-04-19       Impact factor: 13.583

Review 8.  Evolution of drug resistance in Mycobacterium tuberculosis: a review on the molecular determinants of resistance and implications for personalized care.

Authors:  Navisha Dookie; Santhuri Rambaran; Nesri Padayatchi; Sharana Mahomed; Kogieleum Naidoo
Journal:  J Antimicrob Chemother       Date:  2018-05-01       Impact factor: 5.790

9.  The role of selection in shaping diversity of natural M. tuberculosis populations.

Authors:  Caitlin S Pepperell; Amanda M Casto; Andrew Kitchen; Julie M Granka; Omar E Cornejo; Edward C Holmes; Eddie C Holmes; Bruce Birren; James Galagan; Marcus W Feldman
Journal:  PLoS Pathog       Date:  2013-08-15       Impact factor: 6.823

10.  High-resolution mapping of tuberculosis transmission: Whole genome sequencing and phylogenetic modelling of a cohort from Valencia Region, Spain.

Authors:  Yuanwei Xu; Irving Cancino-Muñoz; Manuela Torres-Puente; Luis M Villamayor; Rafael Borrás; María Borrás-Máñez; Montserrat Bosque; Juan J Camarena; Ester Colomer-Roig; Javier Colomina; Isabel Escribano; Oscar Esparcia-Rodríguez; Ana Gil-Brusola; Concepción Gimeno; Adelina Gimeno-Gascón; Bárbara Gomila-Sard; Damiana González-Granda; Nieves Gonzalo-Jiménez; María Remedio Guna-Serrano; José Luis López-Hontangas; Coral Martín-González; Rosario Moreno-Muñoz; David Navarro; María Navarro; Nieves Orta; Elvira Pérez; Josep Prat; Juan Carlos Rodríguez; María Montserrat Ruiz-García; Herme Vanaclocha; Caroline Colijn; Iñaki Comas
Journal:  PLoS Med       Date:  2019-10-31       Impact factor: 11.069

View more
  11 in total

1.  A tRNA-Acetylating Toxin and Detoxifying Enzyme in Mycobacterium tuberculosis.

Authors:  Francesca G Tomasi; Alexander M J Hall; Jessica T P Schweber; Charles L Dulberger; Kerry McGowen; Qingyun Liu; Sarah M Fortune; Sophie Helaine; Eric J Rubin
Journal:  Microbiol Spectr       Date:  2022-05-31

2.  Whole Genome Sequencing Assessing Impact of Diabetes Mellitus on Tuberculosis Mutations and Type of Recurrence in India.

Authors:  Vidya Mave; Liang Chen; Uma Devi Ranganathan; Dileep Kadam; Vijay Vishwanathan; Rahul Lokhande; Siva Kumar S; Anju Kagal; Neeta N Pradhan; Shri Vijay Bala Yogendra Shivakumar; Mandar S Paradkar; Sona Deshmukh; Jeffrey A Tornheim; Hardy Kornfeld; Maha Farhat; Amita Gupta; Chandrasekaran Padmapriyadarsini; Nikhil Gupte; Jonathan E Golub; Barun Mathema; Barry N Kreiswirth
Journal:  Clin Infect Dis       Date:  2022-09-14       Impact factor: 20.999

3.  SNPs in Genes Related to DNA Damage Repair in Mycobacterium Tuberculosis: Their Association with Type 2 Diabetes Mellitus and Drug Resistance.

Authors:  Damián E Pérez-Martínez; Gustavo A Bermúdez-Hernández; Carlos F Madrazo-Moya; Irving Cancino-Muñoz; Hilda Montero; Cuauhtemoc Licona-Cassani; Raquel Muñiz-Salazar; Iñaki Comas; Roberto Zenteno-Cuevas
Journal:  Genes (Basel)       Date:  2022-03-29       Impact factor: 4.141

4.  Local adaptation of Mycobacterium tuberculosis on the Tibetan Plateau.

Authors:  Qingyun Liu; Haican Liu; Li Shi; Mingyu Gan; Xiuqin Zhao; Liang-Dong Lyu; Howard E Takiff; Kanglin Wan; Qian Gao
Journal:  Proc Natl Acad Sci U S A       Date:  2021-04-27       Impact factor: 11.205

5.  Genomic analyses of Mycobacterium tuberculosis from human lung resections reveal a high frequency of polyclonal infections.

Authors:  Miguel Moreno-Molina; Natalia Shubladze; Iza Khurtsilava; Zaza Avaliani; Nino Bablishvili; Manuela Torres-Puente; Luis Villamayor; Andrei Gabrielian; Alex Rosenthal; Cristina Vilaplana; Sebastien Gagneux; Russell R Kempker; Sergo Vashakidze; Iñaki Comas
Journal:  Nat Commun       Date:  2021-05-11       Impact factor: 17.694

6.  Comparative Genomics of Mycobacterium avium Complex Reveals Signatures of Environment-Specific Adaptation and Community Acquisition.

Authors:  Eric C Keen; JooHee Choi; Meghan A Wallace; Michelle Azar; Carlos R Mejia-Chew; Shail B Mehta; Thomas C Bailey; Lindsay J Caverly; Carey-Ann D Burnham; Gautam Dantas
Journal:  mSystems       Date:  2021-10-19       Impact factor: 6.496

7.  A Whole-Genome Sequencing-Based Study to Delineate the Risk and Characteristics of Tuberculosis Transmission in an Insular Population Over 10 Years in Shanghai.

Authors:  Min Wang; Yangyi Zhang; Cheng Huang; Jing Li; Xin Shen; Genming Zhao; Yuan Jiang; Qichao Pan
Journal:  Front Microbiol       Date:  2022-02-16       Impact factor: 5.640

8.  Multiple genetic paths including massive gene amplification allow Mycobacterium tuberculosis to overcome loss of ESX-3 secretion system substrates.

Authors:  Lin Wang; Emmanuel Asare; Amol C Shetty; Freddy Sanchez-Tumbaco; Megan R Edwards; Rajagopalan Saranathan; Brian Weinrick; Jiayong Xu; Bing Chen; Angèle Bénard; Gordon Dougan; Daisy W Leung; Gaya K Amarasinghe; John Chan; Christopher F Basler; William R Jacobs; JoAnn M Tufariello
Journal:  Proc Natl Acad Sci U S A       Date:  2022-02-22       Impact factor: 12.779

9.  Association of Mannose-Binding Lectin 2 Gene Polymorphism with Tuberculosis Based on Mycobacterium tuberculosis Lineages.

Authors:  Mengwen Liu; Quan Wang; Haican Liu; Chunjie Yin; Xiaokaiti Mijiti; Aiketaguli Anwaierjiang; Kanglin Wan; Miao Xu; Machao Li; Siqin Nong; Guilian Li; Hui Xiao
Journal:  Infect Drug Resist       Date:  2022-03-24       Impact factor: 4.003

10.  Distinguishing Relapse From Reinfection With Whole-Genome Sequencing in Recurrent Pulmonary Tuberculosis: A Retrospective Cohort Study in Beijing, China.

Authors:  Jian Du; Qing Li; Min Liu; Yufeng Wang; Zhongtan Xue; Fengmin Huo; Xuxia Zhang; Yuanyuan Shang; Shanshan Li; Hairong Huang; Yu Pang
Journal:  Front Microbiol       Date:  2021-12-08       Impact factor: 5.640

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.