Noton K Dutta, Nirmalya Bandyopadhyay, Balaji Veeramani, Gyanu Lamichhane, Petros C Karakousis, Joel S Bader.
Abstract
Identifying Mycobacterium tuberculosis persistence genes is important for developing novel drugs to shorten the duration of tuberculosis (TB) treatment. We developed computational algorithms that predict M. tuberculosis genes required for long-term survival in mouse lungs. As the input, we used high-throughput M. tuberculosis mutant library screen data, mycobacterial global transcriptional profiles in mice and macrophages, and functional interaction networks. We selected 57 unique, genetically defined mutants (18 previously tested and 39 untested) to assess the predictive power of this approach in the murine model of TB infection. We observed a 6-fold enrichment in the predicted set of M. tuberculosis genes required for persistence in mouse lungs relative to randomly selected mutant pools. Our results also allowed us to reclassify several genes as required for M. tuberculosis persistence in vivo. Finally, the new results implicated additional high-priority candidate genes for testing. Experimental validation of computational predictions demonstrates the power of this systems biology approach for elucidating M. tuberculosis persistence genes.
IMPORTANCE: Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), has a genetic repertoire that permits it to persist in the face of host immune responses. Identification of such persistence genes could reveal novel drug targets and elucidate mechanisms by which the organism eludes the immune system and resists drugs. Genetic screens have identified a total of 31 persistence genes, but to date only 15% of the ~4,000 M. tuberculosis genes have been tested experimentally. In this paper, as an alternative to brute force experimental screens, we describe computational methods that predict new persistence genes by combining known examples with growing databases of biological networks. Experimental testing demonstrated that these predictions are highly accurate, validating the computational approach and providing new information about M. tuberculosis persistence in host tissues. Using the new experimental results as additional input highlights additional genes for testing. Our approach can be extended to other data types and target organisms to characterize host-pathogen interactions relevant to this and other infectious diseases.
Year: 2014 PMID: 24549847 PMCID: PMC3944818 DOI: 10.1128/mBio.01066-13
Source DB: PubMed Journal: mBio Impact factor: 7.867
FIG 1 Overview of study design. Phenotypes from previous studies of M. tuberculosis persistence in mouse lungs were combined with high-throughput data and functional and metabolic networks to predict new candidate genes for experimental testing. Mutants corresponding to the top-ranked genes were grown, pooled, and used for aerosol infection of mice. Mutants were recovered from lungs at 1, 49, 98, and 196 days postinfection, and the abundance of 57 mutants was characterized by qPCR. Statistical models identified 23 of the 57 mutants as attenuated, including 18 novel genes, representing a 6-fold enrichment over the fraction required for persistence genome-wide.
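The 6-fold figure can be sanity-checked with a few lines of arithmetic. The sketch below assumes that the genome-wide baseline is the fraction of persistence genes among the genes tested in the earlier DeADMAn mouse screen (31 persistence and 474 nonpersistence genes, as listed in the data-sources table). The record does not state which baseline the authors used, so this is an illustration rather than their calculation, and the variable names are hypothetical.

```python
# Counts taken from the FIG 1 caption and the data-sources table.
attenuated_in_screen = 23    # mutants called attenuated in this study
tested_in_screen = 57        # mutants characterized by qPCR
prior_persistence = 31       # persistence genes from the DeADMAn mouse screen
prior_nonpersistence = 474   # nonpersistence genes from the same screen

# ASSUMPTION: the genome-wide baseline is the hit rate of the prior screen.
hit_rate = attenuated_in_screen / tested_in_screen
baseline = prior_persistence / (prior_persistence + prior_nonpersistence)
fold_enrichment = hit_rate / baseline

print(f"hit rate in this screen: {hit_rate:.3f}")
print(f"assumed baseline:        {baseline:.3f}")
print(f"fold enrichment:         {fold_enrichment:.1f}")
```

Under this assumption the ratio comes out at roughly 6.6, which is in line with the reported 6-fold enrichment.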
Sources of data input into computational models
| Source | Description | Edge weight | Gene weight |
|---|---|---|---|
| STRING functional associations | 3,964 nodes, 496,278 edges | Combined score ∈ [0, 1] | |
| BiGG metabolic reconstruction | 661 nodes, 217,470 edges | Poisson score mapped to [0, 1] | |
| Transposon mutants | 3,795 genes, Gibbs sampling posterior | | Pr(essential) ∈ [0, 1] |
| TraSH | 3,172 genes | | Log(input/output) |
| DeADMAn in mouse | 31 persistence genes, 474 nonpersistence | | +1 (persistence), |
| TraSH in mouse | 2,967 genes, measured 8 weeks after | | Log(input/output) |
| TraSH in mouse macrophage | 2,859 genes, unactivated macrophage | | Log(input/output) |
| TraSH in mouse macrophage | 2,859 genes, activated with IFN-γ | | Log(input/output) |
| TraSH in mouse macrophage | 2,859 genes, activated with IFN-γ | | Log(input/output) |
| Mouse infection | Weeks 1, 2, 4, and 8 after infection | | Log(input/output) |
| Macrophage infection | Hours 4 and 24 after infection | | Log(input/output) |
FIG 2 Statistical assessment of prediction methods. Predictions using logistic regression with stepwise selection by AIC (solid, green), logistic regression with a full model (dashed, orange), and a support vector machine (solid, red) are assessed by receiver operating characteristic (A) and precision-recall using 20-fold cross-validation (B). Logistic regression with a full model and logistic regression with stepwise selection provide equivalent performance, and both are superior to the support vector machine.
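As a rough illustration of the two assessments named in the caption, the following sketch computes the area under a receiver operating characteristic curve and the points of a precision-recall curve from a list of labels and classifier scores. It assumes the scores have already been collected from held-out cross-validation folds. The labels and scores are made up, and the function names `roc_auc` and `precision_recall_points` are hypothetical, not taken from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_points(labels, scores):
    """(recall, precision) after each prediction, taken in decreasing score order."""
    ranked = sorted(zip(scores, labels), reverse=True)
    total_pos = sum(labels)
    points, true_pos = [], 0
    for k, (_, y) in enumerate(ranked, start=1):
        true_pos += y
        points.append((true_pos / total_pos, true_pos / k))
    return points


# Hypothetical held-out predictions: 1 = persistence gene, 0 = nonpersistence gene.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
pr = precision_recall_points(labels, scores)
print(f"AUC = {auc:.3f}")
for recall, precision in pr:
    print(f"recall = {recall:.2f}, precision = {precision:.2f}")
```

With few positives among many negatives, as with 31 persistence genes against 474 nonpersistence genes, the precision-recall view tends to be the more informative of the two because it is not inflated by the large number of true negatives.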
Stepwise logistic regression model
| Feature | Coefficient | P value |
|---|---|---|
| Intercept | −17.55 ± 1,067.87 | 9.86 × 10⁻¹ |
| GDK (STRING, DeADMAn in mouse) | 8.37 ± 2.21 | 1.57 × 10⁻⁴ |
| GDK (STRING, TraSH in mouse macrophage after IFN-γ) | 0.66 ± 0.37 | 7.47 × 10⁻² |
| GDK (STRING, TraSH in mouse) | 1.62 ± 0.42 | 1.14 × 10⁻⁴ |
| GDK (STRING, TraSH essential genes) | 0.40 ± 0.23 | 8.95 × 10⁻² |
| GDK (metabolic, DeADMAn in mouse) | −110.06 ± 78.75 | 1.62 × 10⁻¹ |
| GDK (metabolic, TraSH in mouse macrophage unactivated) | −1.35 ± 0.76 | 7.62 × 10⁻² |
| GDK (metabolic, transposon mutants) | 1,232.09 ± 793.44 | 1.20 × 10⁻¹ |
| Mouse infection day 14 | −0.63 ± 0.42 | 1.39 × 10⁻¹ |
| Mouse infection day 21 | 0.65 ± 0.22 | 3.42 × 10⁻³ |
| Indicator (mouse infection day 7) | −3.04 ± 1.44 | 3.50 × 10⁻² |
| Indicator (mouse infection day 14) | 17.82 ± 1,067.88 | 9.86 × 10⁻¹ |
GDK (network, gene data) indicates features from a graph diffusion kernel with the given network and gene data.
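To make the idea of a graph diffusion kernel feature concrete, here is a minimal sketch that assumes the standard form K = exp(−βL), where L = D − A is the graph Laplacian, and takes the feature for each gene to be the kernel-weighted sum of the per-gene data. The record does not give the kernel variant, the value of β, or any normalization, so those choices, the toy network, and the function names `diffusion_kernel` and `gdk_feature` are all assumptions.

```python
def diffusion_kernel(adjacency, beta=1.0, terms=30):
    """Approximate K = exp(-beta * L), with L = D - A, by a truncated Taylor series."""
    n = len(adjacency)
    # Graph Laplacian: weighted degree on the diagonal minus the adjacency matrix.
    lap = [[(sum(adjacency[i]) if i == j else 0.0) - adjacency[i][j]
            for j in range(n)] for i in range(n)]
    m = [[-beta * lap[i][j] for j in range(n)] for i in range(n)]
    kernel = [[float(i == j) for j in range(n)] for i in range(n)]  # identity term
    term = [row[:] for row in kernel]
    for k in range(1, terms):
        # Next series term: previous term times m, divided by k.
        term = [[sum(term[i][x] * m[x][j] for x in range(n)) / k
                 for j in range(n)] for i in range(n)]
        kernel = [[kernel[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return kernel


def gdk_feature(kernel, gene_values):
    """Diffuse per-gene values over the network: feature = K @ gene_values."""
    return [sum(k_ij * y for k_ij, y in zip(row, gene_values)) for row in kernel]


# Hypothetical 4-gene path network a - b - c - d with unit edge weights.
adjacency = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0]]
gene_values = [1.0, 0.0, 0.0, 0.0]  # only gene a is a known persistence gene

kernel = diffusion_kernel(adjacency, beta=0.5)  # beta chosen arbitrarily
feature = gdk_feature(kernel, gene_values)
print([round(f, 3) for f in feature])
```

In this toy case the feature is largest for the labeled gene and decays with network distance from it, which is the intended effect: genes close to known persistence genes in the network receive higher scores. A production implementation would normally use a library matrix exponential rather than a hand-written series.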
Experimental results of Tn mutant survival in mice and comparison with prior high-throughput studies
| Gene(s) | Count | Result: this screen | Result: DeADMAn | Result: TraSH |
|---|---|---|---|---|
| | 1 | Attenuated | Attenuated | Attenuated |
| | 2 | Attenuated | Attenuated | Null |
| | 2 | Attenuated | Attenuated | Untested |
| | 3 | Attenuated | Null | Attenuated |
| | 1 | Attenuated | Null | Null |
| | 9 | Attenuated | Untested | Attenuated |
| | 5 | Attenuated | Untested | Null |
| | 1 | Null | Attenuated | Null |
| | 2 | Null | Null | Attenuated |
| | 5 | Null | Null | Null |
| | 1 | Null | Null | Untested |
| | 9 | Null | Untested | Attenuated |
| | 9 | Null | Untested | Null |
| | 4 | Null | Untested | Untested |
| | 1 | Virulent | Untested | Attenuated |
| | 2 | Virulent | Untested | Null |
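The row counts in this table can be tallied to check them against the totals quoted in the abstract and in the FIG 1 caption. The sketch below copies the counts from the table. Reading "previously tested" as "has a DeADMAn result" and "novel" as "attenuated here but not attenuated in DeADMAn" are assumptions about how those totals were derived.

```python
# (count, this screen, DeADMAn, TraSH) rows copied from the table above.
rows = [
    (1, "Attenuated", "Attenuated", "Attenuated"),
    (2, "Attenuated", "Attenuated", "Null"),
    (2, "Attenuated", "Attenuated", "Untested"),
    (3, "Attenuated", "Null", "Attenuated"),
    (1, "Attenuated", "Null", "Null"),
    (9, "Attenuated", "Untested", "Attenuated"),
    (5, "Attenuated", "Untested", "Null"),
    (1, "Null", "Attenuated", "Null"),
    (2, "Null", "Null", "Attenuated"),
    (5, "Null", "Null", "Null"),
    (1, "Null", "Null", "Untested"),
    (9, "Null", "Untested", "Attenuated"),
    (9, "Null", "Untested", "Null"),
    (4, "Null", "Untested", "Untested"),
    (1, "Virulent", "Untested", "Attenuated"),
    (2, "Virulent", "Untested", "Null"),
]

total = sum(count for count, _, _, _ in rows)
attenuated = sum(count for count, screen, _, _ in rows if screen == "Attenuated")
# ASSUMPTION: "previously tested" means the gene has a DeADMAn result.
previously_tested = sum(count for count, _, deadman, _ in rows if deadman != "Untested")
# ASSUMPTION: "novel" means attenuated here but not already attenuated in DeADMAn.
novel = sum(count for count, screen, deadman, _ in rows
            if screen == "Attenuated" and deadman != "Attenuated")

print(total, attenuated, previously_tested, total - previously_tested, novel)
```

Under these readings the tallies give 57 mutants, 23 attenuated, 18 previously tested, 39 untested, and 18 novel, matching the figures quoted elsewhere in the record.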
FIG 3 M. tuberculosis Tn mutant survival, as assessed by qPCR. Genes are sorted in decreasing order of β (blue line), the regression fit of the change in ΔC over 3 time intervals; large positive values correspond to attenuated mutants (green), and large negative values correspond to virulent mutants (red). The ΔC values at day 49 (dotted line), day 98 (dashed line), and day 196 (solid line) are shown relative to the day 1 baseline.
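One simple way to summarize a time course like this with a single β is a least-squares slope of the change in ΔC against time since the day 1 baseline, with no intercept because the baseline change is zero by construction. The caption does not specify the regression model, so the sketch below is only one plausible reading. The ΔC changes are made-up numbers, and `slope_through_origin` is a hypothetical helper.

```python
def slope_through_origin(days, delta_c_change):
    """Least-squares slope for y = beta * t, with the line forced through the baseline."""
    numerator = sum(t * y for t, y in zip(days, delta_c_change))
    denominator = sum(t * t for t in days)
    return numerator / denominator


# Days 49, 98, and 196 expressed as days elapsed since the day 1 baseline.
days_since_baseline = [48, 97, 195]

# Hypothetical changes in ΔC relative to day 1 (not data from the study).
mutants = {
    "mutant_A": [1.0, 2.0, 4.1],    # ΔC rises steadily over time
    "mutant_B": [0.1, -0.1, 0.0],   # no trend
    "mutant_C": [-0.5, -1.1, -2.0], # ΔC falls steadily over time
}

beta = {name: slope_through_origin(days_since_baseline, y)
        for name, y in mutants.items()}
ranked = sorted(beta, key=beta.get, reverse=True)  # decreasing beta, as in the figure
for name in ranked:
    print(f"{name}: beta = {beta[name]:+.4f} per day")
```

Following the caption's sign convention, the mutant with the largest positive β would be read as attenuated and the one with the most negative β as virulent.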
FIG 4 Predictions of probabilities of persistence defects for deletion mutants, including new results from this study (y axis), are compared with original predictions at the start of this study (x axis). Colors indicate previously known and new positive attenuated mutants (solid and open green), previously known and new negative nonattenuated mutants (solid and open red), untested mutants available for testing (black circles), and mutants unavailable for testing because they are essential (solid gray) or otherwise unavailable (open gray).