
Systems biology-based identification of Mycobacterium tuberculosis persistence genes in mouse lungs.

Noton K Dutta, Nirmalya Bandyopadhyay, Balaji Veeramani, Gyanu Lamichhane, Petros C Karakousis, Joel S Bader.   

Abstract

UNLABELLED: Identifying Mycobacterium tuberculosis persistence genes is important for developing novel drugs to shorten the duration of tuberculosis (TB) treatment. We developed computational algorithms that predict M. tuberculosis genes required for long-term survival in mouse lungs. As the input, we used high-throughput M. tuberculosis mutant library screen data, mycobacterial global transcriptional profiles in mice and macrophages, and functional interaction networks. We selected 57 unique, genetically defined mutants (18 previously tested and 39 untested) to assess the predictive power of this approach in the murine model of TB infection. We observed a 6-fold enrichment in the predicted set of M. tuberculosis genes required for persistence in mouse lungs relative to randomly selected mutant pools. Our results also allowed us to reclassify several genes as required for M. tuberculosis persistence in vivo. Finally, the new results implicated additional high-priority candidate genes for testing. Experimental validation of computational predictions demonstrates the power of this systems biology approach for elucidating M. tuberculosis persistence genes. IMPORTANCE: Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), has a genetic repertoire that permits it to persist in the face of host immune responses. Identification of such persistence genes could reveal novel drug targets and elucidate mechanisms by which the organism eludes the immune system and resists drugs. Genetic screens have identified a total of 31 persistence genes, but to date only 15% of the ~4,000 M. tuberculosis genes have been tested experimentally. In this paper, as an alternative to brute force experimental screens, we describe computational methods that predict new persistence genes by combining known examples with growing databases of biological networks. 
Experimental testing demonstrated that these predictions are highly accurate, validating the computational approach and providing new information about M. tuberculosis persistence in host tissues. Using the new experimental results as additional input highlights additional genes for testing. Our approach can be extended to other data types and target organisms to characterize host-pathogen interactions relevant to this and other infectious diseases.

Year:  2014        PMID: 24549847      PMCID: PMC3944818          DOI: 10.1128/mBio.01066-13

Source DB:  PubMed          Journal:  MBio            Impact factor:   7.867


INTRODUCTION

Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), has evolved adaptive mechanisms to avoid killing by host immune responses. Identifying metabolic and regulatory pathways required for M. tuberculosis persistence in host tissues may elucidate novel strategies to eradicate TB infection. The availability of the M. tuberculosis genome sequence has enabled high-throughput screens using subsaturated transposon (Tn) mutant libraries (1, 2). Such libraries have been used to study the genetic requirements of the pathogen under physiologically relevant stress conditions, including during infection of macrophages (3), mice (4–6), guinea pigs (7, 8), and nonhuman primates (9). Recently, there has been substantial interest in developing computational algorithms for accurately predicting genes essential for M. tuberculosis growth and survival. Flux balance analysis uses the stoichiometry of biochemical reactions to predict growth requirements but is limited to metabolic enzymes (10–12). Other approaches have enhanced flux balance analysis by including transcriptional profiles and regulatory relationships to constrain fluxes through metabolic reactions (13, 14). These approaches have been used to predict drug effects on M. tuberculosis mycolic acid biosynthesis capacity and transcription factor knockout phenotypes (13, 14). Approaches to predict genetic requirements beyond metabolism would have great value, particularly since only 660 M. tuberculosis genes (~17% of the genome) are represented in metabolic reconstructions. Alternative approaches described here combine actual physical interactions, including enzyme-substrate and protein-protein interactions, with functional associations. The resulting networks can be exploited to predict protein function and mutant phenotypes (15). Simple metrics, such as shortest distance to known genes of interest, have been used previously to predict M. tuberculosis drug resistance genes (16). 
Graph diffusion kernels, introduced first for searching Web pages, additionally account for multiple independent network paths and improve performance. Successes have included predicting epistatic genetic interactions in yeast (17, 18), predicting protein function through protein-protein interactions (19), and identifying candidate genes for disease (20, 21). Biological networks with different interaction types can provide complementary information, and integrative approaches modeling biological functions have been used to predict protein-protein interactions (22, 23), synthetic lethal interactions (17), co-complexed pairs (24), and driver missense mutations (25). In this study, we combined known M. tuberculosis persistence genes and transcriptional profiles with networks from metabolic reconstructions and functional associations to make genome-wide predictions of genes required for mycobacterial persistence in the host (26, 27). The top-ranked predictions were then tested experimentally to confirm their accuracy. Further, we developed new computational algorithms, incorporating recently published data sets (28–31), which together with our new experimental results highlight additional genes for testing. This study extends our knowledge of M. tuberculosis persistence and identifies potential novel drug targets, with the ultimate goal of shortening the duration of TB treatment. This systems biology approach, combining computational predictions with experimental validation, is general and readily extended to new data types and other target organisms, including host-pathogen interactions relevant to this and other infectious diseases.

RESULTS

Computational predictions.

Computational predictions (see Data Set S1 and S2 in the supplemental material) were used to prioritize mutants for experimental tests in mice (Fig. 1; see Data Set S3A and B). The predictions propagated gene-based phenotypes (Table 1), including known persistence defects and additional informative phenotypes, through M. tuberculosis gene networks to generate gene-based features for predicting additional persistence mutants with logistic regression (see Data Set S3C).
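One common way to propagate gene-level phenotypes through a network is a steady-state diffusion, also known as a random walk with restart. The sketch below is illustrative only and is not the authors' implementation: the function name `diffuse`, the restart parameter `alpha`, the degree normalization, and the toy gene names are all assumptions introduced here.

```python
def diffuse(edges, seeds, alpha=0.5, tol=1e-10, max_iter=1000):
    """Steady-state propagation of gene scores over a weighted, undirected network.

    edges: dict mapping (gene_a, gene_b) -> edge weight in (0, 1]
    seeds: dict mapping gene -> phenotype score (e.g. +1, -1, or a log ratio)
    alpha: fraction of score passed on to neighbours per step (hypothetical value)
    """
    # Build a symmetric adjacency map, ignoring non-positive weights.
    nbrs = {}
    for (a, b), w in edges.items():
        if w <= 0:
            continue
        nbrs.setdefault(a, {})[b] = w
        nbrs.setdefault(b, {})[a] = w
    for g in seeds:
        nbrs.setdefault(g, {})
    degree = {g: sum(ws.values()) for g, ws in nbrs.items()}

    score = {g: seeds.get(g, 0.0) for g in nbrs}
    for _ in range(max_iter):
        new = {}
        for g in nbrs:
            # Neighbour h passes on a share w(g, h) / degree(h) of its score.
            incoming = sum(w * score[h] / degree[h] for h, w in nbrs[g].items())
            new[g] = (1 - alpha) * seeds.get(g, 0.0) + alpha * incoming
        delta = max(abs(new[g] - score[g]) for g in nbrs)
        score = new
        if delta < tol:
            break
    return score


# Toy chain geneA - geneB - geneC - geneD with one known positive (geneA).
toy_edges = {("geneA", "geneB"): 1.0, ("geneB", "geneC"): 1.0, ("geneC", "geneD"): 1.0}
toy_seeds = {"geneA": 1.0}
features = diffuse(toy_edges, toy_seeds)
```

In this toy network the diffused score decays with distance from the seed gene, which is the behaviour that lets untested genes inherit evidence from nearby known persistence genes. Each network and gene data pairing would yield one such feature per gene.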
FIG 1 

Overview of study design. Phenotypes from previous studies of M. tuberculosis persistence in mouse lungs were combined with high-throughput data and functional and metabolic networks to predict new candidate genes for experimental testing. Mutants corresponding to the top-ranked genes were grown, pooled, and used for aerosol infection of mice. Mutants were recovered from lungs at 1, 49, 98, and 196 days postinfection, and abundance for 57 mutants was characterized by qPCR. Statistical models identified 23 of the 57 mutants as attenuated, including 18 novel genes, representing a 6-fold enrichment over the fraction required for persistence genome-wide.

TABLE 1 

Sources of data input into computational models

Source | Description | Edge wt | Gene wt
M. tuberculosis networks
    STRING functional associations (27) | 3,964 nodes, 496,278 edges | Combined score ∈ [0, 1] |
    BiGG metabolic reconstruction (11) | 661 nodes, 217,470 edges | Poisson score mapped to [0, 1] |
M. tuberculosis essential genes
    Transposon mutants (1) | 3,795 genes, Gibbs sampling posterior probability | | Pr(essential) ∈ [0, 1]
    TraSH (2) | 3,172 genes | | Log(input/output)
M. tuberculosis persistence genes
    DeADMAn in mouse (5) | 31 persistence genes, 474 nonpersistence genes | | +1 (persistence), −1 (nonpersistence), 0 (untested)
    TraSH in mouse (6) | 2,967 genes, measured 8 weeks after infection | | Log(input/output)
    TraSH in mouse macrophage (3) | 2,859 genes, unactivated macrophage | | Log(input/output)
    TraSH in mouse macrophage (3) | 2,859 genes, activated with IFN-γ before infection | | Log(input/output)
    TraSH in mouse macrophage (3) | 2,859 genes, activated with IFN-γ after infection | | Log(input/output)
M. tuberculosis differentially expressed genes
    Mouse infection (33) | Weeks 1, 2, 4, and 8 after infection | | Log(input/output)
    Macrophage infection (32) | Hours 4 and 24 after infection | | Log(input/output)
Known in vivo persistence genes were derived from a Tn mutant screen using designer arrays for defined mutant analysis (DeADMAn) (5). This screen identified 31 persistence genes and 474 genes not required for persistence in mouse lungs. These genes served as known positives and negatives, respectively. Additional relevant gene data sets included Tn site hybridization (TraSH) data derived from mouse spleen (6) and murine macrophages (3). Genes required for in vitro growth were obtained from Tn mutagenesis screens (1, 2). Genes differentially expressed during infection were obtained from transcription profiling studies (32, 33). Networks of functional associations were obtained from publicly available metabolic reconstructions (11) and data integration approaches (27). A steady-state graph diffusion kernel propagated the gene data (persistence genes, essential genes, and differentially expressed genes) through the networks to create features for logistic regression and support vector machine classifiers (see Data Set S1). The full logistic regression model included all 28 features; stepwise selection with the Akaike information criterion (AIC) eliminated redundant and uninformative features. 
Twentyfold cross-validation was used to assess performance based on the known positives and negatives, with area under the receiver operating characteristic curve (AUROC) and the maximum harmonic mean of precision and recall (F score) serving as quantitative criteria. Ten different random 20-way splits were performed to ensure robust results. Stepwise logistic regression and full logistic regression were equivalent, and both regression methods were superior to support vector machines (Fig. 2). The F score for all methods is maximal near 20 to 30% recall. Stepwise regression at 20% recall is predicted to have a mean precision of ~50%, an approximately 8-fold enrichment compared to the overall estimate of in vivo persistence genes within the entire genome (6%) (5). Stepwise logistic regression was chosen as the most parsimonious model and used to predict genome-wide persistence requirements based on the 11 features selected for the full data (Table 2). Known positives and negatives ranked by cross-validation provided empirical estimates of precision and recall as a function of ranking. Predicted values are provided genome-wide (see Data Set S2).
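The precision, recall, and maximum F score criteria described above can be sketched for a ranked list of known positives and negatives. This is a minimal illustration with a made-up ranking, not the cross-validation output of the study; the function names and the toy labels are assumptions. The final lines reproduce the enrichment arithmetic using the precision (~50%) and the DeADMAn class counts (31 positives, 474 negatives) quoted in the text.

```python
def precision_recall_curve(ranked_labels):
    """Return (recall, precision) pairs for each cutoff in a ranked list.

    ranked_labels: 1 for a known positive, 0 for a known negative,
    ordered from the highest to the lowest predicted score.
    """
    total_pos = sum(ranked_labels)
    if total_pos == 0:
        raise ValueError("need at least one known positive")
    curve, true_pos = [], 0
    for k, label in enumerate(ranked_labels, start=1):
        true_pos += label
        curve.append((true_pos / total_pos, true_pos / k))
    return curve


def max_f_score(curve):
    """Maximum harmonic mean of precision and recall along the curve."""
    return max(2 * p * r / (p + r) for r, p in curve if p + r > 0)


# Hypothetical ranking of 3 positives and 5 negatives.
toy_ranking = [1, 0, 1, 0, 0, 1, 0, 0]
toy_curve = precision_recall_curve(toy_ranking)
toy_max_f = max_f_score(toy_curve)

# Enrichment of ~50% precision over the genome-wide baseline from the text.
baseline = 31 / (31 + 474)      # about 6% of tested genes were persistence genes
enrichment = 0.50 / baseline    # roughly 8-fold
```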
FIG 2 

Statistical assessment of prediction methods. Predictions using logistic regression with stepwise selection by AIC (solid, green), logistic regression with a full model (dashed, orange), and a support vector machine (solid, red) are assessed by receiver operating characteristic (A) and precision-recall (B) curves using 20-fold cross-validation. Logistic regression with a full model and logistic regression with stepwise selection perform equivalently, and both are superior to the support vector machine.

TABLE 2 

Stepwise logistic regression model

Feature | Coefficient | P value
Intercept | −17.55 ± 1,067.87 | 9.86 × 10^−1
GDK (STRING, DeADMAn in mouse)[a] | 8.37 ± 2.21 | 1.57 × 10^−4
GDK (STRING, TraSH in mouse macrophage after IFN-γ) | 0.66 ± 0.37 | 7.47 × 10^−2
GDK (STRING, TraSH in mouse) | 1.62 ± 0.42 | 1.14 × 10^−4
GDK (STRING, TraSH essential genes) | 0.40 ± 0.23 | 8.95 × 10^−2
GDK (metabolic, DeADMAn in mouse) | −110.06 ± 78.75 | 1.62 × 10^−1
GDK (metabolic, TraSH in mouse macrophage unactivated) | −1.35 ± 0.76 | 7.62 × 10^−2
GDK (metabolic, transposon mutants) | 1,232.09 ± 793.44 | 1.20 × 10^−1
Mouse infection day 14 | −0.63 ± 0.42 | 1.39 × 10^−1
Mouse infection day 21 | 0.65 ± 0.22 | 3.42 × 10^−3
Indicator (mouse infection day 7) | −3.04 ± 1.44 | 3.50 × 10^−2
Indicator (mouse infection day 14) | 17.82 ± 1,067.88 | 9.86 × 10^−1

[a] GDK (network, gene data) indicates features from a graph diffusion kernel with the given network and gene data.


Gene selection for experimental verification.

The top 75 computationally predicted genes were selected in rank order, in addition to the positive and negative controls, pknF (Rv1746) and Rv1863c (9), respectively, yielding 77 candidate genes. Of these 77 genes, 7 had unfavorable rankings as the prediction method was being developed, and 1 known positive was not selected for testing, leaving 69 genes selected for testing. Of the 70 corresponding mutant strains, 7 failed to grow sufficiently in vitro, yielding 63 M. tuberculosis Tn mutants corresponding to 62 unique genes in the infection pool.

Experimental verification in the murine model of TB infection.

On the day after aerosol infection of BALB/c mice, the implantation dose was determined to be 2.71 ± 0.01 log10 bacilli. The output time point of 14 weeks was selected to evaluate mutant persistence in mouse lungs for consistency with previous studies used for statistical modeling (5). In addition, earlier (day 49) and later (day 196) time points were included to permit a kinetic analysis of individual mutant survival. Total lung bacillary counts increased and mice gained weight as expected (see Data Set S3A and B). Gross examination of mouse lungs 49 days postinfection and beyond revealed discrete tubercle lesions. Histological evaluation showed cellular aggregates comprising primarily lymphocytes, with few histiocytes and plasma cells. Acid-fast bacilli were localized primarily within foamy macrophages (data not shown). The ability of each mutant to survive in the host was ascertained by quantitative real-time PCR (qPCR). Of the 63 mutants used, 5 (the Rv0099, Rv0101, Rv1183, Rv1821, and Rv3823c mutants) repeatedly failed to amplify and were removed from further analysis. Data were available for a total of 58 mutants corresponding to 57 unique genes, including 6 known positives previously characterized as having a persistence phenotype, 12 known negatives previously characterized as not required for persistence in mouse lung, and 39 mutants previously uncharacterized by DeADMAn. The mean predicted precision was 32%. Wild-type (null) mutants showed no change in representation over time. On the other hand, attenuated mutants showed an increase in cycle threshold (CT) number over time, and “hypervirulent” mutants showed a decreasing CT over time, indicating a population fraction increase. Mutants significant at a multiple-testing-corrected P value threshold of 0.05 were classified as either attenuated or virulent; both replicates of Rv0169 had concordant null phenotypes. 
Of the 57 unique genes tested, 23 were found to be attenuated, 3 virulent, and 31 null (Table 3). Roughly equivalent results are obtained using a threshold of 95% posterior probability for a mutant to belong to the attenuated class. These thresholds correspond to a change of about 1 CT unit between measurements or an average change of 3 CT units (~8-fold attenuation) from the first to the last of the 4 time points.
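The link between cycle-threshold shifts and fold attenuation can be made concrete with a short sketch. It assumes ideal PCR efficiency (template doubles every cycle) and summarizes a mutant's time course with an ordinary least-squares slope per measurement interval; both the function names and the use of a simple least-squares fit are assumptions for illustration, not the statistical model used in the study.

```python
def fold_attenuation(delta_ct):
    """Fold decrease in mutant abundance implied by a rise of delta_ct cycles,
    assuming perfect doubling per PCR cycle (an idealisation)."""
    return 2.0 ** delta_ct


def per_interval_slope(delta_ct_series):
    """Least-squares slope of the cycle-threshold shift against time-point index.

    delta_ct_series: shifts relative to the first time point, in time order.
    A positive slope indicates attenuation, a negative slope increased abundance.
    """
    n = len(delta_ct_series)
    x_mean = (n - 1) / 2
    y_mean = sum(delta_ct_series) / n
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(delta_ct_series))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx


# Hypothetical mutant gaining 1 cycle per interval across the 4 time points.
toy_series = [0.0, 1.0, 2.0, 3.0]
toy_slope = per_interval_slope(toy_series)   # 1 cycle per interval
toy_fold = fold_attenuation(toy_series[-1])  # 2**3 = 8-fold attenuation
```

A rise of 3 cycles from the first to the last time point therefore corresponds to the roughly 8-fold attenuation quoted in the text.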
TABLE 3 

Experimental results of Tn mutant survival in mice and comparison with prior high-throughput studies

Gene(s) | Count | Result: this screen | Result: DeADMAn (5) | Result: TraSH (3, 6)
mkl (Rv0655) | 1 | Attenuated | Attenuated | Attenuated
mmpL11 (Rv0202c), fadD26 (Rv2930) | 2 | Attenuated | Attenuated | Null
mmpL4 (Rv0450c), pknF (Rv1746) | 2 | Attenuated | Attenuated | Untested
hrcA (Rv2374c), fadA6 (Rv3556c), Rv3870 | 3 | Attenuated | Null | Attenuated
atsD (Rv0663) | 1 | Attenuated | Null | Null
pks16 (Rv1013), Rv1045, lprG (Rv1411c), bioA (Rv1568), aceE (Rv2241), cpsA (Rv3484), Rv3683, Rv3723, Rv3871 | 9 | Attenuated | Untested | Attenuated
pntB (Rv0157), Rv1226c, Rv1591, mez (Rv2332), tig (Rv2462c) | 5 | Attenuated | Untested | Null
lldD2 (Rv1872c) | 1 | Null | Attenuated | Null
mce1A (Rv0169), Rv2707 | 2 | Null | Null | Attenuated
mmpL6 (Rv1557), Rv1863c, Rv2674, ppsE (Rv2935), pks1 (Rv2946c) | 5 | Null | Null | Null
fadD28 (Rv2941) | 1 | Null | Null | Untested
lprK (Rv0173), Rv0176, pcaA (Rv0470c), lpqY (Rv1235), ppgK (Rv2702), drrA (Rv2936), Rv3236c, Rv3616c, Rv3910 | 9 | Null | Untested | Attenuated
gca (Rv0112), Rv0203, Rv0660c, Rv0662c, fabG (Rv2766c), dinF (Rv2836c), ppsC (Rv2933), Rv3253c, aspB (Rv3565) | 9 | Null | Untested | Null
ppsA (Rv2931), ppsB (Rv2932), ppsD (Rv2934), Rv3273 | 4 | Null | Untested | Untested
rodA (Rv0017c) | 1 | Virulent | Untested | Attenuated
gabD2 (Rv1731), pgsA2 (Rv1822) | 2 | Virulent | Untested | Null

Statistical assessment of performance on known genes and novel predictions.

Of the 6 known positives that were tested, 5 gave growth defects in this test. The single known positive with no growth defect was lldD2 (Rv1872c). However, the previously studied Rv1872c mutant was in a sigF deletion background (5), perhaps accounting for the persistence phenotype. Of the 12 known negatives that were tested, 8 remained negative. Four, however, were attenuated: atsD (Rv0663), hrcA (Rv2374c), fadA6 (Rv3556c), and Rv3870. All four have been tested previously in related TraSH studies, and all but Rv0663 were required for growth in mouse spleen (2). The overall concordance for previously characterized mutants is at least (5 + 8)/(6 + 12), or 72%, and may be closer to (5 + 11)/(5 + 12), or 94%. Of the 39 unique novel genes tested, 22 had no persistence defect and 17 were found to have a non-wild-type phenotype, 14 with persistence defects and three with increased growth relative to the wild type (Table 3). The attenuation ranged from 8-fold (the lower limit for statistical significance) to over 100,000-fold (the dynamic range of qPCR) (Fig. 3). Counting only the attenuated strains as correct predictions, this 14/39 or 36% success rate is close to the 32% success rate predicted by the statistical model and represents a 6-fold enrichment over the 6% estimate of in vivo persistence genes (5).
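The concordance and enrichment figures in this paragraph follow from simple ratios, reproduced below as a worked check. All counts are taken from the text; the variable names are illustrative, and the reading of the upper bound (excluding lldD2 from the positives and treating the three discordant negatives that were attenuated in TraSH as concordant) is one interpretation of the quoted fractions.

```python
# Counts quoted in the text.
known_pos_tested, known_pos_confirmed = 6, 5
known_neg_tested, known_neg_confirmed = 12, 8

# Lower bound: every discordant call counts against concordance.
concordance_lower = (known_pos_confirmed + known_neg_confirmed) / (
    known_pos_tested + known_neg_tested
)  # (5 + 8) / (6 + 12)

# Upper bound as written in the text: (5 + 11) / (5 + 12).
concordance_upper = (5 + 11) / (5 + 12)

# Success rate among the 39 previously untested genes.
novel_tested, novel_attenuated = 39, 14
success_rate = novel_attenuated / novel_tested

# Enrichment over the ~6% genome-wide estimate of persistence genes.
genome_wide_rate = 0.06
enrichment = success_rate / genome_wide_rate

print(f"concordance: {concordance_lower:.0%} to {concordance_upper:.0%}")
print(f"success rate: {success_rate:.0%}, enrichment: {enrichment:.1f}-fold")
```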
FIG 3 

M. tuberculosis Tn mutant survival, as assessed by qPCR. Genes are sorted in decreasing order of β (blue line), the regression fit of the change in ΔCT over 3 time intervals; large positive values correspond to attenuated mutants (green), and large negative values correspond to virulent mutants (red). The ΔCT values at day 49 (dotted line), day 98 (dashed line), and day 196 (solid line) are shown relative to the day 1 baseline.

The 23 genes required for persistence in mouse lungs in this assay include 5 that were previously known to be required and 18 novel genes that were either not tested or likely false negatives in previous mouse lung screens (Table 3).

Concordance of experimental model systems.

This and a previous study (5) used medium-throughput assays to test 545 genotypically characterized mutants for persistence in mouse lungs following aerosol infection (see Data Set S4A to E). Similar mutants have also been tested as part of high-throughput, complex libraries using TraSH to study bacillary survival in macrophages and in mouse spleen following intravenous infection (3, 6). Of the 459 genes tested by all three systems, 76 of the corresponding mutants have a defect in at least one of the three systems: 8 are attenuated in all three systems, and an additional 18 are attenuated in two of the three systems (see Data Set S4F). All pairwise comparisons of mutant phenotypes with 2-by-2 contingencies are highly significant (see Data Set S4G and H). It does appear, however, that the in vivo DeADMAn system is more similar to the corresponding in vivo TraSH mouse system (odds ratio of 13.3, Fisher’s exact one-sided P value of 1.3 × 10−10) than to TraSH in macrophages (odds ratio of 7.4, P value of 3.4 × 10−5). The two TraSH systems are also significantly correlated (odds ratio of 14.2, P value of 5.2 × 10−9). Of genes attenuated by TraSH overall, 37% are also required for persistence in mouse lungs, similar to the predictive performance of the statistical model. It is important to note, however, that this study identified 12 of the genes attenuated in both. Prior to this study, only 24% of genes attenuated by TraSH were also found to be attenuated using DeADMAn. Furthermore, of the mutants tested across all three systems, distinct sets are attenuated in only a single system: 20 are unique to DeADMAn, 18 are unique to TraSH in mice, and 12 are unique to TraSH in macrophages. These results suggest corresponding distinct mechanisms. The number of mutants unique to TraSH in macrophages is smallest, possibly because macrophage infection is common to all three systems.
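The pairwise comparisons above rest on 2-by-2 contingency tables, an odds ratio, and a one-sided Fisher's exact test. The sketch below shows that calculation using only the standard library. The actual contingency counts are in Data Set S4 and are not reproduced in the text, so the table used here is hypothetical, as is the function name.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """Sample odds ratio and one-sided (enrichment) Fisher's exact P value.

    2-by-2 table layout (counts of genes):
                          attenuated in Y   not attenuated in Y
      attenuated in X            a                   b
      not attenuated in X        c                   d
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    # Hypergeometric tail: probability of an overlap of at least `a`.
    tail = sum(
        comb(row1, k) * comb(row2, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    p_value = tail / comb(total, col1)
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, p_value


# Hypothetical table, NOT the study's counts.
toy_odds, toy_p = fisher_one_sided(3, 1, 1, 3)
```

A larger odds ratio with a smaller P value, as reported for DeADMAn versus TraSH in mice, indicates stronger agreement between two screening systems than would be expected by chance.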

Predictions with updated external data and new results from this study.

We investigated (see Data Set S4 to S6) whether recently reported external data improved our predictions (28–31). Incorporating four new external data sets with improved annotation of essential genes did not improve the predictions: the area under the curve (AUC) remained close to 0.69 and the F score remained close to 0.30 (see Data Set S4A and B). We also updated the predictions by including the new experimental results of this study, which update gene labels from “untested” to either “attenuated” or “null,” together with the four new external data sets (Fig. 4). In the three cases where the new experimental results conflicted with previous results (lprK [Rv0173], lldD2 [Rv1872c], tig [Rv2462c]), we used the new results for cross-validation tests. Here, the prediction performance improved substantially, with a new AUC of 0.77 and a new F score of 0.42 (see Data Set S4C and D). Three genes are particularly noteworthy in rising substantially in priority and also having mutants available for testing: Rv1410c, fadD21 (Rv1185c), and pheA (Rv3838c).
FIG 4 

Predictions of probabilities of persistence defects for deletion mutants, including new results from this study (y axis), are compared with original predictions at the start of this study (x axis). Colors indicate previously known and new positive attenuated mutants (solid and open green), previously known and new negative nonattenuated mutants (solid and open red), untested mutants available for testing (black circles), and mutants unavailable for testing because they are essential (solid gray) or otherwise unavailable (open gray).


DISCUSSION

Although many studies have highlighted the importance of various adaptive mechanisms in promoting the long-term persistence of M. tuberculosis in host tissues, the M. tuberculosis molecular pathways underlying long-term survival in the infected host remain largely undefined (34–36). This information is not only important for improving our understanding of TB pathogenesis but could also serve as the basis for the rational development of novel sterilizing drugs to shorten the duration of TB chemotherapy. The computational methods developed here provide a genome-scale ranking of bacterial mutants by predicting persistence phenotypes. The predictions are then validated by medium-scale tests of tens to hundreds of mutants in a mouse model. Using this approach, we observed a 6-fold enrichment in the predicted set of M. tuberculosis genes required for persistence in mouse lungs relative to randomly selected mutant pools.

We identified 18 genes that were not previously characterized as M. tuberculosis persistence genes in animal lungs. Of these genes, Rv1013, Rv1411c, Rv2374c, Rv2462c, Rv3484, Rv3556c, Rv3683, Rv3870, and Rv3871 were found to be significantly differentially expressed during nutrient deprivation of M. tuberculosis (37, 38), consistent with the hypothesis that the encoded products are involved in adaptation of M. tuberculosis to the nutrient-deprived environment of mouse lungs during chronic infection. The novel persistence genes Rv1226c, Rv2462c, Rv3556c, Rv3683, and Rv3723 were shown to be significantly differentially regulated by M. tuberculosis upon inorganic phosphate limitation, suggesting that the cognate products may contribute to bacillary survival within the phosphate-starved environment of the macrophage phagolysosome during chronic infection (3, 39). These genes represent potential novel drug targets but require further validation in individual infections.

The M. tuberculosis genome contains a number of genes belonging to the family of polyketide synthases (PKSs), which catalyze the formation of polyketide secondary metabolites (40). The PKSs are structurally and mechanistically related to the fatty acid synthases (FASs), which are involved in the biosynthesis of fatty acids. Recent reports suggest that proteins encoded by the three-operon fadD26-mmpL7 locus (fadD26 ppsA-ppsE, drrA-drrC, papA5 mas fadD28 mmpL7) play major roles in phthiocerol dimycocerosate (PDIM) biosynthetic and transport pathways, which are required for virulence (41–44). Out of 13 genes in this locus, we tested 7 genes in the current study: fadD26, a known positive, and ppsA-ppsE and drrA, all previously untested in mouse lungs, except for the known negative ppsE. While attenuation of the fadD26 mutant was confirmed, none of the remaining genes was required for persistence in mouse lungs. Although the drrA and drrB genes are required for macrophage infection (3), our data suggest that they are not required for M. tuberculosis survival in mouse lungs. The PKS genes pks1, pks10 (45), and pks7 (46), which are involved in dimycocerosyl phthiocerol synthesis, were reported to be required for M. tuberculosis persistence in mice (45, 46). In the current study, a pks16-deficient mutant showed reduced persistence in mouse lungs, while the pks1-deficient mutant showed no survival defect. The discrepancy between our findings and those of Sirakova et al. may be due to the different strains of mice (BALB/c and C57BL/6J, respectively), different routes of infection (aerosol and intranasal, respectively), different inoculating doses (10^2 and 10^4 CFU, respectively), or different model systems (pooled and individual infection, respectively) (45). It is unlikely that the function of the Pks1 protein was not abrogated in our mutant, since the Tn insertion is at 2,869 bp (total gene length = 4,863 bp). Although pks7 was previously reported to be an essential gene (2), our data are consistent with other studies demonstrating that the gene is dispensable for in vitro growth but essential for M. tuberculosis survival in mice (4).

Of the 12 M. tuberculosis genes designated mycobacterial membrane protein large (mmpL1 to mmpL12), we studied three (mmpL4, mmpL11, mmpL6) and confirmed the results of earlier high-throughput screens demonstrating that the first two genes are required for long-term bacillary survival in mouse lungs (5, 41). MmpL4 and MmpL11 are predicted to serve as lipid transporters and have been shown to have a role in M. tuberculosis virulence in mice (47).

The genes Rv3870 and Rv3871, which together with Rv3877 encode cytosolic or membrane-bound components of the ESX-1 secretion machinery, were found to be required for persistence in mouse lungs in the current study. Our findings are consistent with prior studies demonstrating a requirement for Rv3871 in M. tuberculosis survival in murine macrophages (3) and lungs (2), as well as in nonhuman primate lungs (9). Together, these results indicate a central role for the ESX secretion pathway in M. tuberculosis virulence (48).

Interestingly, four mutants (Rv0017c::Tn, Rv0112::Tn, Rv1731::Tn, and Rv1822::Tn) were more abundant in the mouse lungs at days 98 and 196 relative to day 49. Data for two mutants (Rv0017c::Tn and Rv0112::Tn) appear to conflict with earlier TraSH-based studies reporting that Rv0112 is an essential gene (2) and that Rv0017c is required for M. tuberculosis survival in primary murine macrophages (3). Since the Tn insertion in our Rv0112::Tn mutant is at base pair position 91 (total gene length = 957 bp), gene function is expected to be disrupted, indicating that it is, in fact, a nonessential gene (1). The discrepancy between our findings and those of Rengarajan et al. (3) regarding Rv0017c, which encodes a probable cell division protein RodA, may be due to differences in models (mouse versus macrophages) or techniques used to assess mutant growth and survival (qPCR versus microarrays).

The current study demonstrates that a network-based computational approach integrating diverse high-throughput data sets may be used to predict genes essential for M. tuberculosis persistence in mouse lungs. These computational predictive algorithms can be further improved by iterative refinement through active learning or by including data from additional relevant model systems, M. tuberculosis regulatory networks (49), and operon structure. To test this hypothesis, we updated the external data by including four new essential gene data sets and updated the training data by using the new experimental results from this study. The new experimental results highlighted three additional genes as high-priority candidates for testing. Additional rounds of experimentation and modeling could therefore lead to even greater knowledge of the genetic requirements for M. tuberculosis persistence. We believe future work should focus on the development of small molecule inhibitors of the most promising candidates identified through such systems biology-based approaches, with the ultimate goal of shortening the duration of TB chemotherapy.

MATERIALS AND METHODS

Network data.

A functional association network for the M. tuberculosis H37Rv strain was obtained from the STRING database (27). A metabolic reconstruction for H37Rv (11) was converted to a functional association network using the log-likelihood ratio ρ for shared metabolites (18) and then mapped to the weight $1/(1 + e^{2 - \rho})$. Protein-protein interactions from yeast two-hybrid screens (50) are included in STRING and did not improve performance when used as a separate feature.
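As an illustration of the weight mapping described above, the following minimal sketch reads the expression as the logistic function 1/(1 + e^(2 − ρ)). The function name and the example values of ρ are hypothetical and are not taken from the study's software.

```python
import math

def association_weight(rho: float) -> float:
    """Map a log-likelihood ratio rho to an edge weight in (0, 1).

    Assumes the weight is the logistic function 1 / (1 + e^(2 - rho)),
    so rho = 2 corresponds to a weight of 0.5.
    """
    return 1.0 / (1.0 + math.exp(2.0 - rho))

# Hypothetical log-likelihood ratios for three gene pairs
for rho in (0.0, 2.0, 6.0):
    print(f"rho = {rho:.1f} -> weight = {association_weight(rho):.3f}")
```

Under this reading, gene pairs with weak evidence for shared metabolites receive weights near 0, and pairs with strong evidence approach 1.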

Essential genes in vitro.

Probabilities that genes are essential for M. tuberculosis growth in nutrient-rich broth were compiled from two random mutagenesis studies and Gibbs sampling with mutant survival data (1, 2, 51). Probabilities were recalculated using the “negenes” R-package (http://www.biostat.wisc.edu/~kbroman/software/) from current data available from the Tuberculosis Animal Research and Gene Evaluation Taskforce (TARGET) (http://webhost.nts.jhu.edu/target/).

Persistence genes in vivo.

Genes required for M. tuberculosis survival in mouse tissues (persistence genes) were obtained from two previous studies (5, 6). In addition, data were extracted from a Tn mutant study in macrophages derived from C57BL/6J bone marrow with and without gamma interferon (IFN-γ) activation (3). Persistence genes from M. tuberculosis strain CDC 1551 were mapped to H37Rv orthologs from TubercuList (52). The score s for each gene g was log(output pool/input pool), with s = 0 for untested genes. The 8-week time point from the Sassetti et al. study (6) was selected as the closest match to the 49-day time point in the Lamichhane et al. study (5). Class totals S+ and S− were computed for each study from the genes with s > 0 and s < 0, respectively, and normalized weights w were sS/S± for ±s > 0, where S = S+ + S−.
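The per-gene score can be sketched as follows. This is a minimal illustration, not the study's code: the base of the logarithm is not stated in the text (the natural logarithm is assumed here), and the function name and abundance values are hypothetical.

```python
import math

def persistence_score(output_pool, input_pool):
    """Return log(output pool / input pool) for one gene.

    Untested genes (no measurement in either pool) get a score of 0.
    The natural logarithm is an assumption; the text does not state the base.
    """
    if output_pool is None or input_pool is None:
        return 0.0
    return math.log(output_pool / input_pool)

# Hypothetical relative abundances: (output pool, input pool)
examples = {
    "depleted mutant": (0.25, 1.0),
    "unchanged mutant": (1.0, 1.0),
    "untested gene": (None, None),
}
for label, (out_abundance, in_abundance) in examples.items():
    print(label, round(persistence_score(out_abundance, in_abundance), 3))
```

A negative score indicates a mutant that is depleted in the output pool relative to the input pool.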

Transcriptional profiling.

Transcriptional data of M. tuberculosis H37Rv during infection of mouse lungs and bone marrow-derived macrophages were obtained from the TB database (53). Features, defined as positive or negative weights w for each gene g, were the log ratios of the transcriptional profiles obtained at 1, 2, 3, and 4 weeks (33) or 4 and 24 h (32) postinfection.

Features from graph diffusion kernels.

Please see Data Set S1 for a detailed description.
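For readers unfamiliar with graph diffusion, the sketch below shows a generic heat-diffusion kernel, exp(−βL) applied to a seed vector, where L is the graph Laplacian. It is not the kernel defined in Data Set S1: the value of β, the truncated Taylor series, the function name, and the four-gene toy network are all assumptions made for illustration.

```python
def diffuse(adjacency, seed, beta=0.5, terms=30):
    """Approximate exp(-beta * L) applied to seed with a truncated Taylor series.

    L = D - A is the graph Laplacian of a symmetric adjacency matrix.
    beta and terms are illustrative choices, not values from the study.
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]

    def apply_laplacian(v):
        return [
            degree[i] * v[i] - sum(adjacency[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]

    result = list(seed)  # k = 0 term of the series
    term = list(seed)
    for k in range(1, terms + 1):
        # k-th term: (-beta * L / k) applied to the previous term
        term = [-beta * x / k for x in apply_laplacian(term)]
        result = [r + t for r, t in zip(result, term)]
    return result

# Hypothetical path network g0 - g1 - g2 - g3 with unit edge weights
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
seed = [1.0, 0.0, 0.0, 0.0]  # evidence placed on g0 only
scores = diffuse(adjacency, seed)
print([round(s, 3) for s in scores])
```

In this toy example the evidence placed on g0 spreads to its network neighbors, and the diffused score decreases with distance from the seed gene while the total score is conserved.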

Classification and cross-validation performance assessment.

Please see Data Set S1 for a detailed description. Software and data sets are available in the supplemental material (see Data Set S1 and reference 54).

Mutant pool generation for experimental studies.

A library of 5,126 unique transposon (Tn) insertion mutants in 2,246 unique genes in CDC 1551 was generated previously (1). The top 75 genes with Tn mutants available were considered in rank order, and 67 were selected for testing. A positive control, JHU1746-380, an in vivo persistence mutant containing a Tn insertion in gene Rv1746/MT1788, and a negative control, JHU1863c-275, a fully virulent mutant containing a Tn insertion in gene Rv1863c/MT1912, were also added to the pool. JHU0169-511 and JHU0169-573 mutants were internal controls with Tn insertions in the same gene but at different positions (511 bp and 573 bp, respectively). Each mutant was grown individually at 37°C in supplemented Middlebrook 7H9 medium (Difco) containing 20 µg/ml kanamycin (Sigma) to mid-log phase (optical density at 600 nm [OD600] of ~0.6). The 63 different mutants in 62 unique genes were pooled by combining an equal volume of each strain.

Mouse infection.

All procedures involving animals were performed in compliance with the U.S. Animal Welfare Act regulations and Public Health Service Policy according to protocols approved by the Institutional Animal Care and Use Committee at Johns Hopkins University. All mice were maintained and bred under specific-pathogen-free conditions and fed water and chow ad libitum. Female BALB/c mice (5 to 6 weeks old; Charles River) were infected with 2 log10 bacilli via the aerosol route using an inhalation exposure system (Glas-Col). Five mice per group were sacrificed at days 1, 49, 98, and 196 postinfection. Both lungs were homogenized in phosphate-buffered saline (PBS), plated on supplemented Middlebrook 7H10 solid medium (Difco) containing 20 µg of kanamycin/ml, and incubated at 37°C for at least 3 weeks before colony enumeration or DNA extraction.

Real-time PCR.

For each time point, approximately 1,000 colonies were scraped and pooled, and genomic DNA (gDNA) was prepared (4, 5, 7). The gDNA preparations from each experimental group were pooled, and qPCR was performed in duplicate using iCycler iQ (version 3.1.7050; Bio-Rad). Mutant-specific primer sets, each composed of a generic Tn primer and a gene-specific primer, were designed to amplify 150- to 200-bp DNA fragments and validated by amplifying the correct-sized fragment by conventional PCR. For a given qPCR run, the cycle threshold (C) for Tn mutant g is C(g) and for the housekeeping gene sigA is C(h). The difference C(g) − C(h) is ΔC(gtr), where g labels the mutant, t labels the four time points (day 1, 49, 98, or 196), and r labels the technical replicate (1 or 2). Finally, y is the average of the two technical replicates for mutant g at time point t: y = [ΔC(gt1) + ΔC(gt2)]/2. A detailed description of the qPCR data analysis is provided in Data Set S1. Software, data, and detailed expectation-maximization methods are available in the supplemental material (see Data Set S1).
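The ΔC calculation described above can be sketched as follows. The function names and cycle threshold values are hypothetical, and the downstream expectation-maximization analysis described in Data Set S1 is not reproduced here.

```python
def delta_c(c_mutant: float, c_sigA: float) -> float:
    """Delta-C for one replicate: C(g) - C(h), with sigA as the housekeeping gene."""
    return c_mutant - c_sigA

def mean_delta_c(replicates) -> float:
    """Average delta-C over technical replicates.

    replicates is a list of (C(g), C(h)) pairs, one pair per replicate;
    with two replicates this is y = [dC(gt1) + dC(gt2)] / 2.
    """
    deltas = [delta_c(c_g, c_h) for c_g, c_h in replicates]
    return sum(deltas) / len(deltas)

# Hypothetical cycle thresholds for one mutant at one time point
replicates = [(24.0, 18.0), (25.0, 18.5)]
y = mean_delta_c(replicates)
print(y)  # 6.25
```

A larger y at a later time point means that more cycles were needed to detect the mutant relative to sigA, which is consistent with the mutant being less abundant in the pool.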

New predictions based on additional experimental data sets and new experimental results.

We collected essentiality data sets from four papers published after the initial selection of candidates for testing (28–31). Three of these new data sets rely on improved experimental methods using next-generation sequencing to identify TA sites lacking transposon insertions. Different methods characterize essential genes based on the number of consecutive TA sites without observed insertions (29) or identify overlapping genome regions lacking transposon insertions and then identify genes overlapping these essential regions (29, 31). New Bayesian methods using extreme value distributions to describe runs of TA sites have also been applied to estimate posterior probabilities of essentiality for each gene (28, 29). In addition to these experimental approaches, a recent computational method employed a metabolic reconstruction and flux balance analysis (FBA) to identify essential metabolic genes (30). These data sets generally identify 700 genes overall as essential, of which about 200 are metabolic (see Data Set S5A). These four data sets were incorporated as additional essential gene features and propagated through the biological networks using graph diffusion kernels. New predictions also relied on updated “attenuated” and “nonattenuated” gene labels according to the new results for mutants tested experimentally. Mutants found to be virulent were labeled as nonattenuated. We generated new predictions in two stages: first, we included just the new external data; then, we also included the updated gene labels. These predictions used the same methods as described for the original set of predictions used to prioritize genes for testing (see Data Set S6).

SUPPLEMENTAL MATERIAL

Data Set S1: Detailed computational methods. DOCX file, 0 MB.

Data Set S2: Genome-wide predictions of persistence and underlying data. XLSX file, 0.9 MB.

Data Set S3: Mouse infection data and illustration of predictive methods. PPTX file, 0.1 MB.

Data Set S4: Figures of new predictions using additional external data and new experimental results. DOCX file, 0.2 MB.

Data Set S5: Essential and nonessential genes obtained from new data sets. DOCX file, 0 MB.

Data Set S6: Table of updated predictions with additional external data and new experimental results. XLSX file, 0.3 MB.
  54 in total

1.  Probabilistic integrative modeling of genome-scale metabolic and regulatory networks in Escherichia coli and Mycobacterium tuberculosis.

Authors:  Sriram Chandrasekaran; Nathan D Price
Journal:  Proc Natl Acad Sci U S A       Date:  2010-09-27       Impact factor: 11.205

2.  Walking the interactome for prioritization of candidate disease genes.

Authors:  Sebastian Köhler; Sebastian Bauer; Denise Horn; Peter N Robinson
Journal:  Am J Hum Genet       Date:  2008-03-27       Impact factor: 11.025

3.  Genetic requirements for mycobacterial survival during infection.

Authors:  Christopher M Sassetti; Eric J Rubin
Journal:  Proc Natl Acad Sci U S A       Date:  2003-10-20       Impact factor: 11.205

4.  Virulence attenuation of two Mas-like polyketide synthase mutants of Mycobacterium tuberculosis.

Authors:  Cécile Rousseau; Tatiana D Sirakova; Vinod S Dubey; Yann Bordat; Pappachan E Kolattukudy; Brigitte Gicquel; Mary Jackson
Journal:  Microbiology       Date:  2003-07       Impact factor: 2.777

5.  Stationary phase gene expression of Mycobacterium tuberculosis following a progressive nutrient depletion: a model for persistent organisms?

Authors:  Tobias Hampshire; Shamit Soneji; Joanna Bacon; Brian W James; Jason Hinds; Ken Laing; Richard A Stabler; Philip D Marsh; Philip D Butcher
Journal:  Tuberculosis (Edinb)       Date:  2004       Impact factor: 3.131

6.  The role of RelMtb-mediated adaptation to stationary phase in long-term persistence of Mycobacterium tuberculosis in mice.

Authors:  John L Dahl; Carl N Kraus; Helena I M Boshoff; Bernard Doan; Korrie Foley; David Avarbock; Gilla Kaplan; Valerie Mizrahi; Harvey Rubin; Clifton E Barry
Journal:  Proc Natl Acad Sci U S A       Date:  2003-08-01       Impact factor: 11.205

7.  Finding friends and enemies in an enemies-only network: a graph diffusion kernel for predicting novel genetic interactions and co-complex membership from yeast genetic interactions.

Authors:  Yan Qi; Yasir Suhail; Yu-yi Lin; Jef D Boeke; Joel S Bader
Journal:  Genome Res       Date:  2008-10-02       Impact factor: 9.043

8.  Mycobacterium tuberculosis sigma factor E regulon modulates the host inflammatory response.

Authors:  Patricia A Fontán; Virginie Aris; María E Alvarez; Saleena Ghanny; Jeff Cheng; Patricia Soteropoulos; Analia Trevani; Richard Pine; Issar Smith
Journal:  J Infect Dis       Date:  2008-09-15       Impact factor: 5.226

9.  Predicting co-complexed protein pairs from heterogeneous data.

Authors:  Jian Qiu; William Stafford Noble
Journal:  PLoS Comput Biol       Date:  2008-04-18       Impact factor: 4.475

10.  Investigating the metabolic capabilities of Mycobacterium tuberculosis H37Rv using the in silico strain iNJ661 and proposing alternative drug targets.

Authors:  Neema Jamshidi; Bernhard Ø Palsson
Journal:  BMC Syst Biol       Date:  2007-06-08