Literature DB >> 31368488

Shared Pathogenomic Patterns Characterize a New Phylotype, Revealing Transition toward Host-Adaptation Long before Speciation of Mycobacterium tuberculosis.

Guillaume Sapriel1,2, Roland Brosch3.   

Abstract

Tuberculosis remains one of the deadliest infectious diseases of humanity. To better understand the evolutionary history of host-adaptation of tubercle bacilli (MTB), we sought for mycobacterial species that were more closely related to MTB than the previously used comparator species Mycobacterium marinum and Mycobacterium kansasii. Our phylogenomic approach revealed some recently sequenced opportunistic mycobacterial pathogens, Mycobacterium decipiens, Mycobacterium lacus, Mycobacterium riyadhense, and Mycobacterium shinjukuense, to constitute a common clade with MTB, hereafter called MTB-associated phylotype (MTBAP), from which MTB have emerged. Multivariate and clustering analyses of genomic functional content revealed that the MTBAP lineage forms a clearly distinct cluster of species that share common genomic characteristics, such as loss of core genes, shift in dN/dS ratios, and massive expansion of toxin-antitoxin systems. Consistently, analysis of predicted horizontal gene transfer regions suggests that putative functions acquired by MTBAP members were markedly associated with changes in microbial ecology, for example adaption to intracellular stress resistance. Our study thus considerably deepens our view on MTB evolutionary history, unveiling a decisive shift that promoted conversion to host-adaptation among ancestral founders of the MTBAP lineage long before Mycobacterium tuberculosis has adapted to the human host.
© The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Entities:  

Keywords:  evolutionary transition; host adaption; mycobacteria; pathogen evolution; pathogenomic; tuberculosis

Mesh:

Substances:

Year:  2019        PMID: 31368488      PMCID: PMC6736058          DOI: 10.1093/gbe/evz162

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Introduction

Obligate pathogens are specifically adapted to host cells, which represent their ecological niche. Most of these host-adapted organisms are descendants of free-living extracellular ancestors. Their emergence is associated with drastic adaptive changes to cope with new constraints, such as nutrient availability or host defenses (Toft and Andersson 2010). For many bacterial pathogens, comparative genomics identified genomic changes that correlate with this transition, including genome reduction, virulence factor acquisition, and loss of genetic diversity (Pallen and Wren 2007). Presently, tuberculosis remains the deadliest human infectious disease (Dye and Williams 2010). However, up to now, the evolutionary steps leading to the tubercle bacilli (MTB), responsible for this disease, remain uncertain. MTB belong to the slow growing mycobacteria (SGM), a subgroup of the mycobacterial genus that contains most of the pathogenic mycobacterial species. MTB mainly consist of strains from the Mycobacterium tuberculosis complex (MTBC), a clonal group of closely related bacteria that cause TB in selected mammalian species, and a clade of Mycobacterium canettii strains, representing recombinogenic, rare tubercle bacilli, with unusual smooth colony morphology (Fabre et al. 2004; Boritsch et al. 2016a; Madacki et al. 2018). Mycobacteriumtuberculosis and M. canettii share more than 98% nucleotide identity (Supply et al. 2013), and thus can theoretically be classified into a single species (Riojas et al. 2018), whereby often the traditional, host-related names are continued to be used. Comparative genomic approaches, using Mycobacterium marinum or Mycobacterium kansasii as outgroups suggested that massive genomic changes led to MTB differentiation (Stinear et al. 2008; Wang et al. 2015), including marked genomic reduction, horizontal gene transfer (HGT), and toxin–antitoxin massive expansion. These studies revealed a wide evolutionary gap separating the genomic structures of MTB from those of related environmental mycobacteria, suggesting the existence of unknown intermediate evolutionary steps that may have paved the way to MTB host-adaptation (Veyrier et al. 2011; Wang and Behr 2014). Taking advantage of recently released mycobacterial genus sequence data (Tortoli et al. 2017), we investigated a distinct phylogenetic cluster of species associated with MTB, hereafter called MTB-associated phylotype (MTBAP). A dedicated comparative genomic analysis approach revealed that MTBAP members share most of the MTB attributes associated with host adaptation, suggesting ancestral transitional forms leading to host-adaptation conversion. Our results allowed us to propose a novel phylogenetic scheme of MTB deep evolutionary roots, which is also linked to apparent changes in the ecology of MTB. They unveil a major ancestral evolutionary leap that shaped most genomic characteristics related to host-adaptation, inaugurating an extended co-evolution period, long before MTB speciation.

Materials and Methods

Research Strategy Overview

To evaluate the deeper phylogenetic and pathogenomic relationships between MTB and selected, recently genome-sequenced mycobacterial species, we used following approaches: At first, Average Nucleotide Identity (ANI) analyses were performed on whole genome sequences from selected mycobacterial species, followed by the generation of a phylogenetic tree based on maximum likelihood similarities of widely distributed essential bacterial gene products, preselected by the bcgTree software. In addition, synonymous mutation rates on core genomes were calculated. In a second step, species-specific evolutionary dynamics were investigated by analyzing gene gains and gene losses as well as dN/dS ratios. The putative functional specificities of the investigated species were predicted by using multivariate analyses on whole proteomes, as well as determination of gene flux by HGT and identification of putative genomic islands. Based on the different results, an evolutionary scenario was proposed that integrates the putative contributions of a group of closely related mycobacterial species on the shaping of host-associated traits of MTB.

ANI and Phylogeny Analyses

We performed mutual ANI calculations on genome data from 10 defined SGM species, consisting of strains M. tuberculosis H37Rv, M. canettii STB-K, Mycobacteriumdecipiens TBL 1200985, Mycobacteriumshinjukuense CCUG 53584, Mycobacteriumlacus DSM 44577, Mycobacteriumriyadhense DSM 45176, M. kansasii ATCC 12478, M. marinum E11, Mycobacteriumszulgai DSM 44166, and Mycobacteriumgordonae DSM 44160 (table 1), which were selected based on reported findings from genus-wide mycobacterial sequence analyses (Fedrizzi et al. 2017; Tortoli et al. 2017). Pairwise genomic distances between genomes were calculated according to BLAST-based ANI scores (Goris et al. 2007), determined by using JspeciesWS (Richter et al. 2016).
Table 1

Pairwise Genomic Distances of MTB and Closely Related SGM Species

ANI [% Aligned]MtubMcanMdecMshiMlacMriyMkanMmarMszuMgor
Mtub 97.79 [90.62] 85.18 [70.43] 82.22 [67.21] 81.46 [66.70] 80.51 [67.74] 79.63 [65.42]78.57 [64.75]79.28 [64.01]77.70 [62.28]
Mcan 97.71 [88.79] 85.15 [69.71] 81.99 [65.98] 81.43 [66.31] 80.63 [67.40] 79.69 [64.68]78.57 [63.34]79.28 [63.35]77.57 [60.75]
Mdec 84.34 [62.60] 84.39 [63.23] 80.91 [58.13] 81.30 [63.57] 80.42 [67.40] 79.45 [66.00]78.57 [66.67]79.32 [65.69]77.55 [62.72]
Mshi 82.21 [67.90] 82.08 [67.84] 81.48 [65.71] 82.37 [70.53] 81.43 [69.12] 80.68 [67.36]79.37 [65.38]80.35 [66.50]78.79 [64.74]
Mlac 81.15 [62.01] 81.25 [62.28] 81.63 [65.53] 82.12 [65.19] 82.04 [70.44] 80.46 [67.77]79.06 [65.30]80.49 [67.68]78.59 [64.42]
Mriy 79.54 [52.03] 79.59 [52.38] 80.17 [57.26] 80.49 [52.49] 81.14 [58.74] 79.13 [58.92]77.90 [56.69]80.93 [64.48]77.56 [58.40]
Mkan78.62 [47.78]78.63 [48.36]79.07 [54.05]79.65 [49.10]79.70 [53.48]79.13 [56.38]79.07 [59.76]78.92 [58.72]77.54 [57.73]
Mmar77.31 [49.41]77.35 [49.69]77.98 [57.08]78.31 [48.95]78.04 [53.76]77.59 [56.56]78.82 [62.67]77.37 [59.18]76.53 [58.75]
Mszu77.82 [47.14]77.84 [47.64]78.53 [55.11]78.99 [48.53]79.25 [54.46]80.39 [62.95]78.35 [60.67]77.24 [57.97]77.79 [61.73]
Mgor76.20 [40.69]76.23 [40.77]76.76 [45.98]77.29 [41.74]77.22 [45.49]76.96 [50.24]76.99 [51.96]76.16 [50.79]77.60 [54.55]

Notes.—ANI values were calculated from genome to genome BLAST-based comparison. Within brackets: aligned genome percentage.

Bold: higher ANI values with MTBC, as compared with previously studied reference outgroups M. kansasii and M. marinum.

Mtub, M. tuberculosis H37Rv; Mcan, M. canettii STBK; Mdec, M. decipiens; Mlac, M. lacus; Mriy, M. riyadhense; Mkan, M. kansasii; Mmar, M. marinum E11; Mszu, M. szulgai; Mgor, M. gordonae.

Pairwise Genomic Distances of MTB and Closely Related SGM Species Notes.—ANI values were calculated from genome to genome BLAST-based comparison. Within brackets: aligned genome percentage. Bold: higher ANI values with MTBC, as compared with previously studied reference outgroups M. kansasii and M. marinum. Mtub, M. tuberculosis H37Rv; Mcan, M. canettii STBK; Mdec, M. decipiens; Mlac, M. lacus; Mriy, M. riyadhense; Mkan, M. kansasii; Mmar, M. marinum E11; Mszu, M. szulgai; Mgor, M. gordonae. For phylogenetic analyses, a larger, representative study data set was established, to obtain robust bootstrap values. Twenty-eight mycobacterial species were selected for the analysis (supplementary table 1, Supplementary Material online), including 25 SGM species, Mycobacterium terrae, belonging to the intermediate M.terrae complex group, as well as Mycobacterium smegmatis and Mycobacterium abscessus, representing rapidly growing mycobacteria. The latter three species were used as outgroup. Putative coding sequences and associated protein sequences were identified, and annotated from contigs and whole genome data, using the Microbial Genome Annotation and Analysis platform “MicroScope” (Vallenet et al. 2017). For the calculations, 107 essential bacterial genes, which were defined by the bcgTree (bacterial core genome Tree) software (Ankenbrand and Keller 2016) as being present as single copy genes in the majority of bacterial genomes, were retrieved for each of the 28 species. Sequences were concatenated, aligned using clustal Omega, and well-conserved regions were selected using Gblocks. The phangorn R package (Schliep 2011) was then used to generate a phylogenetic tree based on maximum likelihood (ML) estimates (via the pml algorithm), using a JTT_DCMut+I+G best substitution model.

Core/Variable Genome and dN/dS Analyses

Gene families were determined using the MICFAM (MicroScope gene families) tool, computed with the SiLiX software (Miele et al. 2011) by using a 50% amino acid identity threshold and 80% alignment coverage. Families with exactly one gene per species were selected for core-genome orthologs. For dN/dS ratio calculations, core-genome amino-acid sequences were retrieved, and aligned using clustalW2 (Larkin et al. 2007). Nucleic sequences were then aligned according to the deduced codons from amino-acid alignment using the pal2nal software (Suyama et al. 2006), and gaps were removed. dN/dS calculation was then performed using the PAML program (Yang 1997). To remove saturation effects, genes having dS <0.01, dS >2, and dN >2 were removed from the analysis. For each species, dN/dS distribution values relative to the M. gordonae outgroup were generated and visualized using R boxplot functions (R-Core-Team 2014), and a Wilcoxon–Mann–Whitney test was used to identify median values significantly lower than M. tuberculosis median values. *P value < 0.05. **P value < 0.001.

Multivariate Analysis of Protein Family Domains, M. tuberculosis Ortholog and Genomic Island Analyses

PFAM is a comprehensive collection of protein domains and families (Finn et al. 2008). For our analysis, PFAM domains associated with translated sequences of all investigated mycobacterial species were retrieved (e value <0.0001) using MicroScope (Vallenet et al. 2017). Principal Component Analysis (PCA) and Between Class Analysis (BCA) were performed with centered and scaled data using the ade4 package in R (Thioulouse et al. 1997). Moreover, M. tuberculosis orthologs and synteny regions were identified using BLAST (Altschul et al. 1990) and MicroScope gene Phyloprofile functions, including the Bidirectional Best-Hit (BBH) method (50% identity, 80% query cover). A literature review was also performed on this data set to retain a subset of genes and gene products that were experimentally reported to be involved in animal infection or intracellular mycobacterial survival. For the identification of putative HGT regions, the Regions of Genomic Plasticity (RGP) functions of the MicroScope platform were used, by taking M. kansasii as the outgroup species (50% identity, 80% query cover). Finally, contigs associated with putative genomic islands were retrieved and alignment was performed using the MAUVE software (Darling et al. 2004). Homologous regions were drawn using genoPlotR package in R (Guy et al. 2010).

Results

MTB Phylogeny Revised

Previous comparative genomic studies on MTB used M. marinum or M. kansasii as closest known outgroups (Stinear et al. 2008; Wang et al. 2015). However, recent 16S rDNA analyses and genomic ANI clustering analyses suggested that certain other mycobacterial species might be closer related to MTB at the genomic level than these two most commonly used comparators (Fedrizzi et al. 2017; Tortoli et al. 2017). To get deeper insights into this matter, we performed ANI calculations on a set of 10 SGM species, which were selected according to previous genus-wide mycobacterial sequence data (Fedrizzi et al. 2017; Tortoli et al. 2017). These analyses revealed that highest ANI values and highest lengths of shared aligned genomic regions were found between MTB and four species (namely M. decipiens, M. shinjukuense, M. lacus, and M. riyadhense). For further phylogenetic analyses, an enlarged set of 28 mycobacterial species (supplementary table 1, Supplementary Material online) was chosen, from each of which the amino acid sequences of 107 broadly conserved, orthologous proteins were selected and retrieved using the bcgTree software (Ankenbrand and Keller 2016). The phylogenetic tree obtained from analyses of these data, using ML estimations generated very robust bootstrap values (fig. 1) and provides strong evidence that these four aforementioned species are more closely related to MTB than M. kansasii and M. marinum (fig. 1). Together with MTB, they represent a novel phylogenetic group that is descending from a common ancestor and will be referred to hereafter as the MTBAP lineage. Using core-genome data from MTBAP species and four closely related outgroup-species, namely M. kansasii, M. marinum, M. szulgai, and M. gordonae, referred here after as MGS–MKM outgroup, synonymous mutations were extracted to get an estimate of divergence time between these species and MTB (fig. 1). From these data, the most recent common ancestor (MRCA) of MTB and other species from the MTBAP lineage was found to be more recent than the MRCA of MTB and the MGS–MKM outgroup species. These data also suggest that MTBAP species are phylogenetically related descendants originating from a common progenitor, with a divergence time estimated to be 15–30 times longer than the one considered for M. tuberculosis and M. canettii within MTB.
. 1.

—Phylogenetic organization of MTB and closely related SGM species. (A) SGM phylogenetic tree. For tree construction, concatenated conserved protein sequences from 107 universally conserved bacterial genes, as defined by the bcgTree software, were extracted from 28 mycobacterial species. For data analysis, a similarity matrix was calculated using the JTT_DCmut+I+G model. The phylogentic tree was constructed using ML estimations. Bootstrap values were calculated from 500 replicates. Red branches: members of the MTBAP lineage. (B) Genetic distance of MTBAP lineage and MGS–MKM outgroup members relative to M. tuberculosis H37Rv. dS distribution of 923 core-genome genes compared with those of M. tuberculosis H37Rv. Synonymous mutation rates (dS) for each gene were determined using the PAML algorithm. y-Axis: logarithmic scale. Mcan, M. canettii STB-K; Mdec, M. decipiens; Mshi, M. shinjukuense; Mlac, M. lacus; Mriy, M. riyadhense; Mkan, M. kansasii; Mmar, M. marinum E11; Mszu, M. szulgai; Mgor, M. gordonae. Bold bar: median. Box edges: 25th and 75th percentiles. Whiskers: extreme values.

—Phylogenetic organization of MTB and closely related SGM species. (A) SGM phylogenetic tree. For tree construction, concatenated conserved protein sequences from 107 universally conserved bacterial genes, as defined by the bcgTree software, were extracted from 28 mycobacterial species. For data analysis, a similarity matrix was calculated using the JTT_DCmut+I+G model. The phylogentic tree was constructed using ML estimations. Bootstrap values were calculated from 500 replicates. Red branches: members of the MTBAP lineage. (B) Genetic distance of MTBAP lineage and MGS–MKM outgroup members relative to M. tuberculosis H37Rv. dS distribution of 923 core-genome genes compared with those of M. tuberculosis H37Rv. Synonymous mutation rates (dS) for each gene were determined using the PAML algorithm. y-Axis: logarithmic scale. Mcan, M. canettii STB-K; Mdec, M. decipiens; Mshi, M. shinjukuense; Mlac, M. lacus; Mriy, M. riyadhense; Mkan, M. kansasii; Mmar, M. marinum E11; Mszu, M. szulgai; Mgor, M. gordonae. Bold bar: median. Box edges: 25th and 75th percentiles. Whiskers: extreme values.

Shared Genomic Reduction Profiles

Because MTB are well-known to have undergone genome size reduction (<4.5 Mb) compared with M. kansasii (6.4 Mb) and M. marinum (6.6 Mb) outgroups, we assessed if other species from the MTBAP lineage displayed similar genomic reduction patterns. A preliminary study using EGGNOG gene functional classification from MicroScope (Vallenet et al. 2017) suggests that besides MTB, M. decipiens, M. shinjukuense, and M. lacus harbor significantly less genes in many functional categories (excluding basic, essential cellular functions), compared with other mycobacterial species (table 2), whereas M. riyadhense exhibits a less significant, intermediate pattern that is also associated with a relatively large genome size of 6.3 Mb (supplementary table 1, Supplementary Material online). To evaluate this trend in more detail, we used core-genome data to identify presence or absence of MICFAM gene families in each genome of the MTBAP and MGS–MKM lineages, which were then categorized by large-scale hierarchical clustering. Distance-tree analysis of species based on MICFAM gene family content shows that the MTBAP lineage represents a specific cluster, clearly distinct from the MGS–MKM outgroup, a finding that is further emphasized by the large number of gene families present in the MGS–MKM outgroup that are under-represented in all species of the MTBAP lineage (fig. 2). A model of gene gain and gene loss (fig. 2) confirms that loss of certain genes and gene families occurred within the MTBAP lineage. Most of the gene loss occurred within the branches leading to each species. In addition, some gene loss signatures are also shared between different lineages, which suggest that some of these genetic events might have been initiated at deep branching nodes at the base of the MTBAP lineage.
Table 2

EGGNOG Functional Category Counts of Annotated Genes in Genomes of MTB and Closely Related Mycobacterial Species

FunctionMtubMcanMdecMshiMlacMriyMkanMmarMszuMgor
Energy production and conversion212a214a293a230a282a322b347322366379
Cell cycle control, cell division, chromosome partitioning454535 b44414938324257
Amino acid transport and metabolism197a196a243187a210a224b236265242242
Nucleotide transport and metabolism74b777772b808280797475
Carbohydrate transport and metabolism147a141a175b166a170a197202193232240
Coenzyme transport and metabolism136a140a148a115a147a173158162171157
Lipid transport and metabolism213a210a273b216a265b289b311318313386
Translation, ribosomal structure, and biogenesis149147150139b140b150141154146151
Transcription227a235a281b221a273b356b306325356429
Replication, recombination, and repair233257176176188262248192152328
Cell wall/membrane/envelope biogenesis124a124a152127a123a152168149152175
Cell motility35a37a36a40b43b4547425250
Post-translational modification, protein turnover, chaperones103a108a106a115b115b126b142136132167
Inorganic ion transport and metabolism138a139a164a139a151a170a197202204235
Secondary metabolites biosynthesis, transport, and catabolism155a150a292145a217a273b263303351342
Signal transduction mechanisms110b105a131b109a111b147b182146186240
Intracellular trafficking, secretion, and vesicular transport22212222242822202328
Defense mechanisms59596261646777596287

Note.—Number of genes in each EGGNOG category.

Values below two standard deviation quantities as compared with the MGS–MKM outgroup.

Values below one standard deviation quantity, as compared with the MGS–MKM outgroup.

Mtub, M. tuberculosis H37Rv; Mcan, M. canettii STB-K; Mdec, M. decipiens; Mlac, M. lacus; Mriy, M. riyadhense; Mkan, M. kansasii; Mmar, M. marinum E11; Mszu, M. szulgai; Mgor, M. gordonae.

. 2.

—Genomic evolution in MTBAP lineage. (A) Color coded table representing a heatmap for which the rows and columns were sorted by hierarchical clustering approaches. The row tree shows the clustering of MICFAM protein families based on species profile similarity calculated by the Ward agglomerative hierarchical clustering algorithm. The column tree represents the clustering of species based on MICFAM profile. Red: under-represented gene families. Green: over-represented gene families. Color-key for Row Z-scores is shown together with a histogram indicating the number of MICFAM protein families associated with each of the Z-scores. (B) Estimated gene gain and loss in the MTBAP lineage and the MGS–MKM outgroup. Variable genome parts of the MTBAP lineage and the MGS–MKM outgroup were computed using the MICFAM tool with a 50% amino-acid identity threshold and 80% alignment coverage. A table representing presence or absence of gene families for each species was then analyzed by Gain Loss Mapping Engine. Red: branches showing more gene loss than gene gain. (C) dN/dS distribution of core-genome orthologs as compared with the M. gordonae outgroup. Bold bars indicate the median dN/dS values for each species. Notch estimates correspond to 95% confidence intervals for median values. Box edges represent 25th and 75th percentiles. Whiskers represent estimated extreme values. Mcan, M. canettii STB-K; Mdec, M. decipiens; Mshi, M. shinjukuense; Mlac, M. lacus; Mriy, M. riyadhense; Mkan, M. kansasii; Mmar, M. marinum E11; Mszu, M. szulgai; Mgor, M. gordonae.

—Genomic evolution in MTBAP lineage. (A) Color coded table representing a heatmap for which the rows and columns were sorted by hierarchical clustering approaches. The row tree shows the clustering of MICFAM protein families based on species profile similarity calculated by the Ward agglomerative hierarchical clustering algorithm. The column tree represents the clustering of species based on MICFAM profile. Red: under-represented gene families. Green: over-represented gene families. Color-key for Row Z-scores is shown together with a histogram indicating the number of MICFAM protein families associated with each of the Z-scores. (B) Estimated gene gain and loss in the MTBAP lineage and the MGS–MKM outgroup. Variable genome parts of the MTBAP lineage and the MGS–MKM outgroup were computed using the MICFAM tool with a 50% amino-acid identity threshold and 80% alignment coverage. A table representing presence or absence of gene families for each species was then analyzed by Gain Loss Mapping Engine. Red: branches showing more gene loss than gene gain. (C) dN/dS distribution of core-genome orthologs as compared with the M. gordonae outgroup. Bold bars indicate the median dN/dS values for each species. Notch estimates correspond to 95% confidence intervals for median values. Box edges represent 25th and 75th percentiles. Whiskers represent estimated extreme values. Mcan, M. canettii STB-K; Mdec, M. decipiens; Mshi, M. shinjukuense; Mlac, M. lacus; Mriy, M. riyadhense; Mkan, M. kansasii; Mmar, M. marinum E11; Mszu, M. szulgai; Mgor, M. gordonae. EGGNOG Functional Category Counts of Annotated Genes in Genomes of MTB and Closely Related Mycobacterial Species Note.—Number of genes in each EGGNOG category. Values below two standard deviation quantities as compared with the MGS–MKM outgroup. Values below one standard deviation quantity, as compared with the MGS–MKM outgroup. Mtub, M. tuberculosis H37Rv; Mcan, M. canettii STB-K; Mdec, M. decipiens; Mlac, M. lacus; Mriy, M. riyadhense; Mkan, M. kansasii; Mmar, M. marinum E11; Mszu, M. szulgai; Mgor, M. gordonae.

dN/dS Ratios Suggest Common Genome Evolution Dynamics within the MTBAP

To characterize the genome evolution dynamics of the different species, non-synonymous/synonymous mutation rates (dN/dS) were compared for the core genomes of the MTBAP and MGS–MKM group. As shown in figure 2, by using the commonly found environmental M. gordonae as comparator reference species, we observed dN/dS ratios that were significantly higher for M. tuberculosis than for M. szulgai, M. marinum, and M. kansasii, the latter considered as environmental mycobacteria. Of note, high dN/dS ratios were also obtained for M. canettii, M. decipiens, and M. shinjukuense, whereas M. lacus showed intermediate values.

Functional Profile Analysis Reveals Similar Protein Family Clustering

To explore if members of the MTBAP lineage were characterized by a specific protein content in terms of selected functional features, we retrieved the repertoire of PFAM protein family domains (Finn et al. 2006) associated specifically with each of the genomes of the MTBAP and MGS–MKM lineages. Multivariate analysis of these data, performed by PCA, showed that in agreement with aforementioned MICFAM clustering analyses, the MTBAP lineage constitutes a specific cluster (fig. 3). The first projection axe (PC1) explains 30% of the variance, and allows to clearly distinguish the MTBAP lineage from the species of the MGS–MKM outgroup (fig. 3). A striking consistency between data derived from MICFAM and PFAM domain analyses emphasize that genome contents of the MTBAP lineage members are substantially different from those of the MGS–MKM outgroup. In addition, BCA results showed that many PFAM domains were under-represented in the MTBAP lineage (negative BCA scores), apparently due to genomic reduction (fig. 3). However, we also noticed some positive BCA scores, indicating a particular over-representation of selected PFAM domains in MTBAP members, arguing that this lineage is not only defined by gene loss, but also by acquisition of specific functional features. Careful study of these PFAM domains over-represented in the MTBAP lineage revealed that many of them corresponded to toxin–antitoxin domains (red and green dots, respectively, fig. 3). Among these toxin–antitoxin domains, VapBC and MazEF type II systems appeared to be the most abundant in the MTBAP lineage (highest scores in y axis of fig. 3). Specific counts of the genes coding for VapC or MazF toxins (containing respectively PIN and PemK domains) confirmed that these families were significantly over-represented in M. shinjukuense, M. lacus, and M.riyadhense together with MTB members M. canettii and M. tuberculosis (fig. 4). It is clear from this analysis that certain mycobacterial species may harbor a greater repertoire of toxin–antitoxin systems, which were previously only reported for MTB (Ramage et al. 2009; Wang et al. 2015). Our results show now that with some exceptions found for M. decipiens, the MTBPA lineage members share an unexpectedly high number of VapBC and MazEF type II toxin–antitoxin systems. The high diversification of these toxin–antitoxin systems may be explained either by late genomic events that occurred independently in each of the species of the lineage, or by early genomic events that had occurred in a common ancestor of the lineage, followed by subsequent diversification. For the purpose of determining which of the two hypotheses was more likely, BLAST and BBH-based synteny analyses of VapC and MazF sequences of M. tuberculosis were undertaken in a variety of species. This approach revealed that among the 55 studied toxins, more than a third (35%) may be considered as orthologous proteins (BBH + synteny) that are shared among selected species of the MTBAP lineage, without showing orthologs in the MGS–MKM outgroup (fig. 4), arguing for an early acquisition of a substantial portion of toxin–antitoxin systems in species of the MTBAP lineage and subsequent diversification.
. 3.

—Multivariate and clustering analyses of PFAM domain contributions to the MTBAP lineage. (A) PCA of species from the MTBAP lineage and the MGS–MKM outgroup, based on occurrence of PFAM domains in each genome (only PFAM domains present in more than two species were conserved). Ellipse: 95% confidence value ellipse of MTBAP lineage members. (B) PFAM domain contribution to MTBAP lineage. x-Axis: PFAM domain scores determined by BCA (MTBAP lineage vs. MGS–MKM outgroup). y-Axis: PFAM domain differences in average occurrence for each class (MTBAP lineage vs. MGS–MKM outgroup). Red: toxin-associated domains. Green: antitoxin-associated domains.

. 4.

—Toxin–antitoxin content of mycobacterial species in the MTBAP lineage and other SGM. (A) Number of putative toxin genes bearing PIN and MazF domain in SGM. Toxins from toxin–antitoxin systems were identified from whole proteome data sets, using HMMER (PF02452.16—PemK_toxin; PF01850.20 PIN domain) with a 0.01 threshold e value. Orange: VapC. Blue MazF. **: Values above two standard deviation levels from SGM (outside MTBAP lineage and MGS–MKM outgroup) average. (B) Putative orthologs of M. tuberculosis VapC and MazF toxins. Yellow: BBH 50% translated sequence identity, BBH 80% query cover. Red: BBH and synteny. Mcan, M. canettii STB-K; Mdec, M. decipiens; Mshi, M. shinjukuense; Mlac, M. lacus; Mriy, M. riyadhense; Mkan, M. kansasii; Mmar, M. marinum E11; Mszu, M. szulgai; Mgor, M. gordonae.

—Multivariate and clustering analyses of PFAM domain contributions to the MTBAP lineage. (A) PCA of species from the MTBAP lineage and the MGS–MKM outgroup, based on occurrence of PFAM domains in each genome (only PFAM domains present in more than two species were conserved). Ellipse: 95% confidence value ellipse of MTBAP lineage members. (B) PFAM domain contribution to MTBAP lineage. x-Axis: PFAM domain scores determined by BCA (MTBAP lineage vs. MGS–MKM outgroup). y-Axis: PFAM domain differences in average occurrence for each class (MTBAP lineage vs. MGS–MKM outgroup). Red: toxin-associated domains. Green: antitoxin-associated domains. —Toxin–antitoxin content of mycobacterial species in the MTBAP lineage and other SGM. (A) Number of putative toxin genes bearing PIN and MazF domain in SGM. Toxins from toxin–antitoxin systems were identified from whole proteome data sets, using HMMER (PF02452.16—PemK_toxin; PF01850.20 PIN domain) with a 0.01 threshold e value. Orange: VapC. Blue MazF. **: Values above two standard deviation levels from SGM (outside MTBAP lineage and MGS–MKM outgroup) average. (B) Putative orthologs of M. tuberculosis VapC and MazF toxins. Yellow: BBH 50% translated sequence identity, BBH 80% query cover. Red: BBH and synteny. Mcan, M. canettii STB-K; Mdec, M. decipiens; Mshi, M. shinjukuense; Mlac, M. lacus; Mriy, M. riyadhense; Mkan, M. kansasii; Mmar, M. marinum E11; Mszu, M. szulgai; Mgor, M. gordonae.

HGT and Acquisition of Rare Virulence and Cellular Stress Factors

As toxin–antitoxin systems are often found within genomic islands acquired by HGT (Ramage et al. 2009) and may play important roles in shaping MTB gene content (Becq et al. 2007; Supply et al. 2013; Levillain et al. 2017), we performed a genome-wide screen to search for M. tuberculosis genes that likely were acquired in a common progenitor during the evolution of the MTBAP lineage before MTB speciation (i.e., after divergence from the MGS–MKM outgroup, but before branching of M. canettii and M. tuberculosis lineages). Such genes were determined by BBH-analysis to find M. tuberculosis/M. canettii genes with putative orthologs in other species from the MTBAP lineage that had no ortholog in the genomes of any of the other SGM species used in our study (supplementary table 1, Supplementary Material online). Our screen identified 169 genes of M. tuberculosis that were putatively acquired during the evolution of the MTBAP lineage before MTB speciation. These genes were present in 32 of the 46 identified M. tuberculosis RGP (supplementary table 2, Supplementary Material online), suggesting that many these genes, previously considered as M. tuberculosis-specific genomic traits, were acquired by HGT by a common ancestor in the MTBAP lineage before MTB speciation. We then asked if among these genes, some might play a role in pathogenicity. A literature review performed on these 169 genes, allowed us to identify 35 of them that were reported to be involved in survival or growth of MTB during infection of mononuclear phagocytic cells or in animal models (table 3), representing 21% of the initial gene pool, and 33% of the above-identified RGP. Among these genes, some were encoding regulation systems, comprising transcriptional regulation factors such as MosR, VirS, or Rv1359 adenyl cyclase, as well as several toxin–antitoxin systems (VapC24, VapBC19, VapBC20, VapC44). Strikingly, more than a third of these likely virulence-associated genes encode proteins involved in lipid metabolism and/or lipid transport (mosR, mce2B, Rv0895, Rv1288, Rv2954c, Rv2955c, virS, Rv3087, Rv3376, Rv3377c, lipF, fadD23). Many of them are predicted to be involved in biochemical processes linked to the mycomembrane of MTB, being implicated in mycolic acid composition (virS, Rv3087) (Singh et al. 2005), phenolic glycolipid glycosylation (Rv2954c, Rv2955c) (Simeone et al. 2013), and sulfolipid synthesis and accumulation (mce2B, fadD23) (Lynett and Stokes 2007; Marjanovic et al. 2011). Many genes were reported as induced under phagosome-associated stress, such as acidic pH (virS, lipF), and nutrient starvation (Rv0064, mosR, Rv1359, frdA, Rv2275, Rv3087, vapC44) as well as under specific conditions within the phagosome, the granuloma or the lungs (Rv0064, vapC24, frdA, vapC19, vapB19, vapB20, virS, PE_PGRS50) (Betts et al. 2002; Rachman et al. 2006; Watanabe et al. 2011; Mendum et al. 2015; Hudock et al. 2017). Moreover, genes encoding diterpene synthase (Rv3377c), implicated in production of diterpene nucleoside 1-TbAd involved in phagosome maturation arrest, or cyclopeptide synthase (Rv2275), producing mycocyclosin involved in intra-macrophage survival, are shared by selected species of the MTBAP lineage. Finally, we found that some of these potential virulence factors were encoded on obvious genomic islands (fig. 5), for example the fumarate reductase GI (Rv1552-1555), Myma GI (Rv3082c-3087), or sulfolipid synthesis GI (Rv3820c-3826). As summarized in figure 6, our results suggest that many genetic and genomic factors, formerly thought to correspond to MTB-specific virulence factors, are also present in certain other MTBAP species, and were probably acquired long time before MTB speciation, supporting an evolutionary scenario in which tuberculosis-causing mycobacteria emerged from an ancient family of mycobacteria that had “learned” to cope with intracellular stress conditions, likely through long-lasting interaction with yet unknown eukaryotic host organisms.
Table 3

Mycobacterium tuberculosis Virulence Genes Probably Acquired before MTB Speciation, and within MTBAP Lineage Evolution

LabelGeneProductFunctionPhenotypeIntracellular SurvivalAnimal Infection ModelMcanMdecMshiMlacMriySGM
Rv0064Membrane proteinDC (Mendum et al. 2015)+++++
Rv0071Deleted in L2/Beijing lineage M. tuberculosis strains (RD105)M (Stewart et al. 2005)+
Rv0240 vapC24 Toxin–antitoxinRegulationInduced under lysosomal stress conditions (Lin et al. 2016)Macaque (Dutta et al. 2010)+
Rv0348 mosR Transcriptional regulatorRegulationHypoxia responsive regulator (Abomoelak et al. 2009)Mouse (Abomoelak et al. 2009)++
Rv0590 mce2B TransporterTransport (lipids)Belongs to mce2 operon, involved in sulfolipid accumulation (Marjanovic et al. 2011)Mouse (Marjanovic et al. 2010)++++
Rv0890cTranscriptional regulatorRegulationDC (Mendum et al. 2015)++++
Rv0893cSAM methyltransferaseSecondary metabolismM (Zhang et al. 2013)++++
Rv0895DAG-O-acyltransferaseSecondary metabolism (lipids)M (Rengarajan et al. 2005) and DC (Mendum et al. 2015)+++++
Rv0977 PE_PGRS16 PE_PPESecreted and surface proteinsDC (Mendum et al. 2015)++++++
Rv1288Putative mycolyltransferase IISecondary metabolism (lipids)DC (Mendum et al. 2015)+++++
Rv1359Transcriptional regulatorRegulationDC (Mendum et al. 2015)+++
Rv1442 bisC Biotin sulfoxide reductaseElectron transfer activityOxidative stress resistance (putative)DC (Mendum et al. 2015)++++++ (p)+++ (p)
Rv1552 frdA Fumarate reductase subunitElectron transfer activityHypoxia adaptation (Watanabe et al. 2011). Phagosome acidification arrest (Stewart et al. 2005)DC (Mendum et al. 2015)++++++++++
Rv1739c sulP TransporterTransportSulfate uptake (Zolotarev et al. 2008)DC (Mendum et al. 2015)+++++
Rv1981c nrdF1 DNA-methylaseRegulationDeleted in M. bovis BCG strains (RD2)Mouse (Kozak et al. 2011)+++++
Rv2275Cyclopeptide synthaseSecondary metabolism (mycocyclosin)Mouse (Sassetti and Rubin 2003)+++++
Rv2328 PE23 PE_PPESecreted and surface proteinsM (Stewart et al. 2005)+++
Rv2547 vapC19 Toxin–antitoxinRegulationInduced within granulomas (Hudock et al. 2017).M (Stewart et al. 2005)++++++
Rv2548 vapB19 Toxin–antitoxinRegulationInduced within granulomas (Hudock et al. 2017)M (Stewart et al. 2005)++++++
Rv2549c vapC20 Toxin–antitoxinRegulationM (Stewart et al. 2005)++++++
Rv2550c vapB20 Toxin–antitoxinRegulationM (Stewart et al. 2005)++++++
Rv2735cDC (Mendum et al. 2015)+++
Rv2954cFucosyltransferaseSecondary metabolism (phenolglycolipid)M (Rosas-Magallanes et al. 2007)++++
Rv2955cFucosyltransferaseSecondary metabolism (phenolglycolipid)M (Rosas-Magallanes et al. 2007)++++
Rv3082c virS Transcriptional regulatorRegulationActivation of Myma operon upon acidic pH and within phagosome (Singh et al. 2003)M (Singh et al. 2005)Guinea pig (Singh et al. 2005)++++
Rv3087DAG-O-acyltransferaseSecondary metabolism (lipids)Belongs to Myma operon, involved in mycolic acid content (Singh et al. 2005)M (Singh et al. 2005)Mouse (Sassetti and Rubin 2003), Guinea pig (Singh et al. 2005)++++
Rv3179DC (Mendum et al. 2015)++++ (p)
Rv3320c vapC44 Toxin–antitoxinRegulationInduced in nutrient starvation conditions (Albrethsen et al. 2013)Mouse deltaMHC (Zhang et al. 2013)++++
Rv3343c PPE54 PE_PPESecreted and surface proteinsOxidative stress resistance (Mestre et al. 2013)DC (Mendum et al. 2015)+++++
Rv3345c PE_PGRS50 PE_PPESecreted and surface proteinsDC (Mendum et al. 2015)+++
Rv3376PhosphataseSecondary metabolism (diterpene 1TbAd)Phagolysosome maturation arrest (Pethe et al. 2004)M (Rengarajan et al. 2005)++++
Rv3377cDiterpene synthaseSecondary metabolism (diterpene 1TbAd)Phagolysosome maturation arrest (Pethe et al. 2004)DC (Mendum et al. 2015)++++++
Rv3476c kgtP TransporterTransportDC (Mendum et al. 2015)++++
Rv3487c lipF Lipase-esteraseSecondary metabolism (lipids)Upregulated under acidic pH conditions (Richter and Saviola 2009)Mouse (Camacho et al. 1999)++++
Rv3826 fadD23 Acyl-CoA synthetaseSecondary metabolism (sulfolipids)Lower binding affinity for macrophages (Lynett and Stokes 2007)DC (Mendum et al. 2015)+++

Notes.—List of experimentally confirmed M. tuberculosis virulence genes that have at least one ortholog among other species of the MTBAP lineage (M. canettii is not included in the analysis) and no ortholog in any SGM depicted in the representative SGM database.

Presence of a putative ortholog by BBH-analysis.

Presence of a putative ortholog by BBH-analysis and synteny confirmation, -Absence of putative ortholog.

(p)Putative pseudogene.

Mtub, M. tuberculosis H37Rv; Mcan, M. canettii STBK; Mdec, M. decipiens; Mlac, M. lacus; Mriy, M. riyadhense.

. 5.

—Representation of selected genomic islands. The genomic regions depict the fumarate reductase locus (Rv1552-1555), the Myma locus (Rv3082c-3087), or the sulfolipid synthesis locus (Rv3820c-3826), and the surrounding genomic regions in M. tuberculosis, M. decipiens, and M. kansasii. Pink links: homologous genomic regions. Filled arrows: genes in genomic islands.

. 6.

—Hypothetical evolutionary scenario of the members of the MTBAP lineage. In this schematic representation, the likely evolutionary breakpoint is marked, beyond which the concerned members depict shared host-adapted traits.

—Representation of selected genomic islands. The genomic regions depict the fumarate reductase locus (Rv1552-1555), the Myma locus (Rv3082c-3087), or the sulfolipid synthesis locus (Rv3820c-3826), and the surrounding genomic regions in M. tuberculosis, M. decipiens, and M. kansasii. Pink links: homologous genomic regions. Filled arrows: genes in genomic islands. —Hypothetical evolutionary scenario of the members of the MTBAP lineage. In this schematic representation, the likely evolutionary breakpoint is marked, beyond which the concerned members depict shared host-adapted traits. Mycobacterium tuberculosis Virulence Genes Probably Acquired before MTB Speciation, and within MTBAP Lineage Evolution Notes.—List of experimentally confirmed M. tuberculosis virulence genes that have at least one ortholog among other species of the MTBAP lineage (M. canettii is not included in the analysis) and no ortholog in any SGM depicted in the representative SGM database. Presence of a putative ortholog by BBH-analysis. Presence of a putative ortholog by BBH-analysis and synteny confirmation, -Absence of putative ortholog. (p)Putative pseudogene. Mtub, M. tuberculosis H37Rv; Mcan, M. canettii STBK; Mdec, M. decipiens; Mlac, M. lacus; Mriy, M. riyadhense.

Discussion

In the more than 20 years since the publication of the first complete mycobacterial genome sequence, featuring the reference strain M. tuberculosis H37Rv (Cole et al. 1998), the phenomenal increase in available sequence data has allowed an important refinement of the phylogenetic structure of the clonal population of M. tuberculosis and the MTBC to be achieved (Brosch et al. 2002; Mostowy et al. 2002; Gagneux and Small 2007; Comas et al. 2013). This information was further enriched by genome data from M. canettii strains, which share at least 98% sequence identity with M. tuberculosis strains, but differ from them by a highly recombinogenic population structure (Supply et al. 2013; Blouin et al. 2014; Boritsch et al. 2016b) and a yet unknown reservoir or environmental source. Apart from M. canettii, the closest known outgroups used until now for identifying M. tuberculosis-genome characteristics, and inferring its evolutionary history, were M. marinum (Stinear et al. 2008) and M. kansasii (Wang et al. 2015), which are both environmental mycobacteria characterized by substantially larger genomes (6.6 and 6.4 Mb, respectively) than M. tuberculosis (4.4 Mb) or M. canettii (4.5 Mb). This situation means that there was still a large gap between M. tuberculosis and the currently used outgroups, which implies that we had still not found the transitional forms (colloquially named “missing link”) between the professional, highly virulent pathogen M. tuberculosis and the more distantly related, much less virulent, environmental mycobacteria such as M. marinum or M. kansasii. In this study, we thus undertook a global comparative genomic analysis of M. tuberculosis, M. canettii, and closely related mycobacterial species. Among them were some recently described mycobacterial species, which were reported to have high ANI values, and to be closely related to M. tuberculosis (Tortoli et al. 2017). ANI is a global pairwise similarity index that is a useful tool to delineate species delimitation (Konstantinidis and Tiedje 2007). However, ANI was never proven to be a faithful measure for longer evolutionary distances, and previous ANI-based studies did not rely on explicit bootstrap confidence values associated with the inferred phylogenetic trees. Hence, we used multi-gene concatenated sequence multiple alignment (Ankenbrand and Keller 2016), to establish a phylogenetic model with strong bootstrap values, which indicates that M. decipiens, M. lacus, M. riyadhense, and M. shinjukuense, as well as M. canettii and M. tuberculosis belong to a single phylogenetic lineage, termed MTBAP. Phenotypic and epidemiological characteristics of mycobacteria belonging to this MTBAP lineage suggest that they share some specific traits that are different from related environmental mycobacteria of the closest outgroup. Whereas M. gordonae, M. szulgai, M. marinum, or M. kansasii is commonly isolated from environmental water samples (Leoni et al. 1999; Le Dantec et al. 2002; Vaerewijck et al. 2005), species from the MTBAP lineage were exclusively recovered from human clinical patient samples (Turenne et al. 2002; Saad et al. 2015; Takeda et al. 2016; Brown-Elliott et al. 2018). Contrary to most environmental mycobacteria of the MGS–MKM group, all of the MTBAP members are nonpigmented and show optimal growth temperatures higher than 25 °C. Our results show that the nontuberculous members of the MTBAP lineage constitute a specific group of mycobacterial species sharing several genomic characteristics associated with host-adaptation, strongly supporting the hypothesis that ancestral founders of this lineage might represent pivotal evolutionary intermediates between the non- or low-virulent environmental mycobacteria and the highly virulent professional pathogens such as M. tuberculosis. The study unveiled that specific genomic characteristics of MTB genomes, such as massive acquisition of toxin–antitoxin systems, were shared with most of the members of the MTBAP lineage. Indeed, our results suggest that the acquisition of a substantial part of the repertoire of M. tuberculosis VapBC and MazEF type II toxin–antitoxin systems was acquired in the MTBAP lineage before MTB speciation. In this perspective, M. decipiens represents as exception to the trend as this recently defined species does not seem to harbor the plethora of toxin–antitoxin encoding genes present in other members. To answer the question how this could happen, two working hypotheses may be proposed. The first hypothesis evokes that massive toxin–antitoxin acquisition occurred twice in the MTBAP lineage, due to convergent evolution dynamics whereas the second possible hypothesis suggests that massive toxin–antitoxin acquisition was initiated early in the MTBAP lineage ancestor, and that M. decipiens underwent a subsequent loss of most of these systems due to yet unknown reasons. Gene orthology and synteny analyses of toxin–antitoxin encoding genes suggest an ancestral acquisition for a significant part of these genes, and partial loss by certain members of the MTBAP lineage. This scenario means that a pivotal evolutionary breakpoint was initiated a long time before MTB speciation, giving rise to a distinct group of mycobacteria that likely evolved into opportunistic pathogens and acquired many of the characteristic features that today define the professional pathogen M. tuberculosis. Our attempts to estimate the putative emergence time of the last common ancestor of the MTBAP lineage using synonymous substitution comparisons with M. tuberculosis H37Rv, are consistent with previous calculations performed on a small gene set for M. canettii (Gutierrez et al. 2005). Synonymous mutations can be considered as neutral markers (Zuckerkandl and Pauling 1965; Kimura 1979) that linearly accumulate over time (Kishimoto et al. 2015). If we infer a constant mutation rate, our results suggest that the divergence between the MRCA of reference strain M. tuberculosis H37Rv and M. riyadhense occurred about 30 times earlier than the divergence between M. tuberculosis H37Rv and M. canettii STB-K, which is the most distantly related M. canettii strain relative to M. tuberculosis, known today (Supply et al. 2013; Blouin et al. 2014; Boritsch et al. 2014). Moreover, the MRCA of M. tuberculosis and either M. riyadhense, M. lacus, or M. shinjukuense is very close, suggesting an adaptive radiation process that might have occurred during, or soon after, the genomic evolutionary breakpoint that we can observe in the data sets. The estimations on the molecular clock rates for M. tuberculosis and the MTBC are still debated, as is it is difficult to calculate such rates for clonal organisms given certain evolutionary bottlenecks that may induce strong biases in long-term fixed mutation data (Comas and Gagneux 2009). Recent estimations based on genome analyses of ancient mycobacterial DNA from 1,000-year-old mummies placed the divergence of the clonal MTBC at about 6,000 years before present (Bos et al. 2014), which is one of the shortest evolutionary estimates for the MTBC, as by the use of other models the divergence time of the MTBC has been postulated to be up to 70,000 years (Comas et al. 2013). Using the shortest estimates, this would still suggest that the evolutionary breakpoint at the root of the MTBAP lineage occurred at least several hundred thousand years before M. tuberculosis specialization to the human host. However, previous studies on M. canettii genomic diversity reported phylogenetic profiles suggesting that M. canettii diversification might be 5–25 time more ancient than the rise of the clonal MTBC (Supply et al. 2013). Hence, under constant mutation rates, a rough estimate of the emergence of the MTBAP lineage would be 150–750 times earlier than the rise of the clonal MTBC. Although precise time estimates about the emergence of the MTBC are still under debate and will require further investigations on ancient DNAs (Donoghue 2016), our study reveals that major transitions in the evolutionary history of tuberculosis-causing mycobacteria occurred much earlier than previously thought, and were marked by early acquisition of host-adaptation characteristics shared with other nontuberculous mycobacteria of the MTBAP lineage. These reflections suggest a very long initiating period of co-evolution, with yet unknown hosts, that pre-dated MTB speciation and M. tuberculosis specialization as a human pathogen. Genomic reduction and expansion of toxin–antitoxin systems has been suggested to be correlated with gain in virulence (Dawson et al. 2018). Thus, the evolutionary breakpoint that gave rise to the MTBAP lineage might correspond to an, at least partial, transition point toward host-adapted lifestyle. Other features observed in our study, such as the loss of core genes observed in MTB, M. decipiens, M. shinjukuense, M. lacus, and to a lesser extent in M. riyadhense, and the specific MICFAM profiles support this hypothesis. The shift in core-genome dN/dS distribution pattern that we observe for MTB, M. decipiens, M. shinjukuense, and to a lesser extent for M. lacus, is a characteristic trait usually interpreted as a genomic marker associated with reduced population size and/or a genetic bottleneck that is consistent with specialized pathogen population dynamics. In agreement with previous reports, this pattern is considered as highly correlated with host-adaptation (Kuwahara et al. 2008; Kuo and Ochman 2009; Kjeldsen et al. 2012). Conversely, as observed for M. szulgai, M. marinum, and M. kansasii, this parameter is very stable within free-living, related bacteria (Kuo and Ochman 2009; Novichkov et al. 2009). Taken together, these trends are consistent with an early transition from free-living environmental to a specialized life-style within reduced ecological niches. Thereby, our results further support the hypothesis that the emergence of the MTBAP lineage included important steps toward a host-adapted lifestyle. This framework is consistent with a “quantum” evolutionary dynamic initiated by extensive genome-scale shifts, rather than by gradual evolution (Simpson 1944; Gould 1980). In this scenario, M. riyadhense, which is most distantly related to MTB within the MTBAP lineage, shows an intermediate pattern (with large toxin–antitoxin expansion, but only limited core-gene loss, a still-large genome size, and a dN/dS ratio similar to environmental species), suggesting a possible transitional form (“missing link”) at the root of the MTBAP lineage, that acquired some host-adaptation traits, but also conserved several traits associated with classical environmental life-style. These and all other results shed light on the emergence of MTB, and unveil pivotal transition that early shaped its major genomic characteristics, allowing us to propose a large-scale evolutionary model (fig. 6). Interestingly, this evolutionary scenario shows some similarities to a recently established extended model of the leprosy bacilli (Stinear and Brosch 2016). For many years it was thought that the etiological agent of human leprosy, Mycobacterium leprae, was an isolated, clonal species that had undergone massive genome decay (Cole et al. 2001) and was the sole agent of all forms of leprosy. However, more recent genome-based studies revealed that related leprosy bacilli, named Mycobacterium lepromatosis, shared significant genomic characteristics with M. leprae (Singh et al. 2015) and were responsible for particular leprosy-like pathologies in humans and in squirrels (Avanzi et al. 2016). Extensive genome analyses also established that these leprosy bacilli were closely related to the emerging pathogen Mycobacterium haemophilum (Tufariello et al. 2015), as well as to the most recently defined Mycobacterium uberis, an emerging skin pathogen in dairy animals (Benjak et al. 2018). It is clear from these observations that with the ongoing massive increase in the availability of bacterial genome sequences, important new insights into the evolution, phylogeny and population structure of key human pathogens may be obtained that can substantially revise our view on these agents and the diseases they cause. Genome comparisons can reveal the molecular tools bacteria employ to establish a pathogenic lifecycle in selected hosts, that is to survive and proliferate within intracellular and/or extracellular host environments and induce lesions and disease. For M. tuberculosis these features represent key requirements for its efficient transmission to new hosts. Some major virulence factors of M. tuberculosis are very common functions shared by many mycobacteria, such as ESX systems (Dumas et al. 2016; Newton-Foot et al. 2016), which were refined and adapted by SGM pathogens for survival in the host (Groschel et al. 2016), whereby the independent acquisition of an apparent genomic island harboring ESX-associated genes espACD has likely played an important role (Ates and Brosch 2017; Orgeur and Brosch 2018). Other examples are the many genes involved in lipid metabolism, a most important feature for M. tuberculosis, given its lipid-rich cell envelope and its capacity to use host lipids as nutrients (Madacki et al. 2018). However, although such functions are necessary for virulence of M. tuberculosis, they cannot solely explain the specific transition to host-adaptation and pathogenicity, because many of them are also found in environmental mycobacteria. Previous studies on genomic comparisons of M. kansasii or M. marinum and MTB, detected major HGT events which were thought to have contributed to MTB emergence as host-associated mycobacteria (Rosas-Magallanes et al. 2006; Becq et al. 2007; Stinear et al. 2008; Wang and Behr 2014; Wang et al. 2015). However, in these former studies a wide evolutionary gap remained between MTB and the environmental mycobacteria that served as a comparison point, leaving decisive steps of MTB differentiation unexplored. Here, our RGP analysis showed that most of these genomic domains, previously thought to have been acquired by HGT during MTB differentiation from M. kansasii and/or M. marinum common ancestors, were also present in other species of the MTBAP lineage and thus apparently occurred already before MTB speciation. This observation was particularly striking for the toxin–antitoxin systems, which are reported to be involved in dormancy and resistance against environmental stresses induced by antibiotics or host immune defenses (Coussens and Daines 2016), as well as in pathogenicity (Lobato-Marquez et al. 2016). Indeed, our results suggest that some of these systems, which were apparently already acquired by early members of the MTBAP lineage, are involved in virulence, because M. tuberculosis vapC24 and vapC44 transposon insertion mutants are attenuated in selected animal infection models (Dutta et al. 2010; Zhang et al. 2013), whereas vapBC19 and vapBC20 were reported to be induced in granulomas (Hudock et al. 2017). Our study highlights the likely early acquisition of other major regulatory functions such as transcriptional regulators linked with pathogenicity (MosR, VirS, Rv1359 adenylate cyclase). All of these regulators are associated with mycobacterial lipids. A deletion mutant for the gene virS, for example, is affected in mycolic acid composition, spleen persistence in guinea pigs, and resistance to acidic stress (Singh et al. 2005), whereas a strain with a mosR deletion is deficient in phthiocerol dimycocerosate (PDIM) synthesis (Abomoelak et al. 2009). Moreover, the Rv1359 adenylate cyclase is probably involved in PDIM regulation (Camacho et al. 1999). PDIMs are among the first virulence factors that were identified for M. tuberculosis by using transposon insertion mutants and mouse infection models (Camacho et al. 1999; Cox et al. 1999). PDIMs were also shown to be involved in resistance to oxidative burst produced by macrophages, and modulation of immunity (Rousseau et al. 2004). Tight regulation of PDIM was shown to impact phagosomal damage and autophagy in host cells (Quigley et al. 2017), a finding that is also linked to the combined action with the ESX secretion system (Augenstreich et al. 2017). Our study shows that many potential virulence factors that were likely acquired during evolution of the MTBAP lineage before speciation of MTB exhibit many lipid-associated functions, and are involved in M. tuberculosis synthesis of specific mycomembrane components (such as sulfolipid synthesis, glycolipid glycosylation, and mycolic acid composition) (table 3). As outlined in the “Results” section, acquisition of concerned genes or gene clusters by HGT seems to have paved the way of selected MTBAP members to adopt a pathogenic lifestyle. The obtained insights allow prediction of the lipid content of the different MTBAP members, a feature that can thus be experimentally compared in future wet-lab-based lipid analyses and cell culture infection experiments. Based on our results from genome-based comparisons and analyses that place several species of nontuberculous mycobacteria into the immediate phylogenetic proximity of M. tuberculosis, a range of experiments can be designed to explore the specific roles played by the various molecular systems potentially involved in dormancy, survival under anaerobic conditions, and persistence of M. tuberculosis. Selected members of the MTBAP lineage could thereby serve as promising model organisms to find new, vulnerable features of M. tuberculosis, which can then be targeted by innovative new intervention strategies.

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online. Click here for additional data file.
  12 in total

1.  Novel Antibacterial Activity of Febuxostat, an FDA-Approved Antigout Drug against Mycobacterium tuberculosis Infection.

Authors:  Lee-Han Kim; Soon Myung Kang; Jake Whang; Kee Woong Kwon; Sung Jae Shin
Journal:  Antimicrob Agents Chemother       Date:  2022-08-30       Impact factor: 5.938

Review 2.  Macrophage: A Cell With Many Faces and Functions in Tuberculosis.

Authors:  Faraz Ahmad; Anshu Rani; Anwar Alam; Sheeba Zarin; Saurabh Pandey; Hina Singh; Seyed Ehtesham Hasnain; Nasreen Zafar Ehtesham
Journal:  Front Immunol       Date:  2022-05-06       Impact factor: 8.786

3.  ESX-1-Independent Horizontal Gene Transfer by Mycobacterium tuberculosis Complex Strains.

Authors:  Jan Madacki; Mickael Orgeur; Guillem Mas Fiol; Wafa Frigui; Laurence Ma; Roland Brosch
Journal:  mBio       Date:  2021-05-18       Impact factor: 7.867

4.  Comparative Genomic and Transcriptomic Analyses of Mycobacterium kansasii Subtypes Provide New Insights Into Their Pathogenicity and Taxonomy.

Authors:  Qingtian Guan; Roy Ummels; Fathia Ben-Rached; Yara Alzahid; Mohammad S Amini; Sabir A Adroub; Jakko van Ingen; Wilbert Bitter; Abdallah M Abdallah; Arnab Pain
Journal:  Front Cell Infect Microbiol       Date:  2020-03-24       Impact factor: 5.293

Review 5.  PE_PGRS proteins of Mycobacterium tuberculosis: A specialized molecular task force at the forefront of host-pathogen interaction.

Authors:  Flavio De Maio; Rita Berisio; Riccardo Manganelli; Giovanni Delogu
Journal:  Virulence       Date:  2020-12       Impact factor: 5.882

6.  Measurable genomic changes in Mycobacterium avium subsp. hominissuis after long-term adaptation in Acanthamoeba lenticulata and reduced persistence in macrophages.

Authors:  Nabeeh A Hasan; Grant J Norton; Ravleen Virdi; L Elaine Epperson; Charmie K Vang; Brandon Hellbusch; Xiyuan Bai; Edward D Chan; Michael Strong; Jennifer R Honda
Journal:  J Bacteriol       Date:  2021-01-11       Impact factor: 3.490

7.  Mycobacterium bovis uses the ESX-1 Type VII secretion system to escape predation by the soil-dwelling amoeba Dictyostelium discoideum.

Authors:  Rachel E Butler; Alex A Smith; Tom A Mendum; Aneesh Chandran; Huihai Wu; Louise Lefrançois; Mark Chambers; Thierry Soldati; Graham R Stewart
Journal:  ISME J       Date:  2020-01-02       Impact factor: 10.302

8.  TbD1 deletion as a driver of the evolutionary success of modern epidemic Mycobacterium tuberculosis lineages.

Authors:  Daria Bottai; Wafa Frigui; Fadel Sayes; Mariagrazia Di Luca; Dalila Spadoni; Alexandre Pawlik; Marina Zoppo; Mickael Orgeur; Varun Khanna; David Hardy; Sophie Mangenot; Valerie Barbe; Claudine Medigue; Laurence Ma; Christiane Bouchier; Arianna Tavanti; Gerald Larrouy-Maumus; Roland Brosch
Journal:  Nat Commun       Date:  2020-02-04       Impact factor: 14.919

Review 9.  Mycobacterial virulence: impact on immunogenicity and vaccine research.

Authors:  Vera M Kroesen; Jan Madacki; Wafa Frigui; Fadel Sayes; Roland Brosch
Journal:  F1000Res       Date:  2019-11-28

10.  Heterologous Production of 1-Tuberculosinyladenosine in Mycobacterium kansasii Models Pathoevolution towards the Transcellular Lifestyle of Mycobacterium tuberculosis.

Authors:  Marwan Ghanem; Jean-Yves Dubé; Joyce Wang; Fiona McIntosh; Daniel Houle; Pilar Domenech; Michael B Reed; Sahadevan Raman; Jeffrey Buter; Adriaan J Minnaard; D Branch Moody; Marcel A Behr
Journal:  mBio       Date:  2020-10-20       Impact factor: 7.867

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.