Muthukumar Balamurugan, Ruma Banerjee, Sunitha Manjari Kasibhatla, Archana Achalere, Rajendra Joshi.
Abstract
Two lineages of Mycobacterium tuberculosis var. africanum (Maf), L5 and L6, both members of the Mycobacterium tuberculosis complex (MTBC), are responsible for causing tuberculosis in West Africa. Regions of difference (RDs) are usually used for delineation of MTBC. With increased data availability, single nucleotide polymorphisms (SNPs) promise to provide better resolution. A total of 380 publicly available Maf samples were analyzed to identify "core-cluster-specific-SNPs," while an additional 270 samples were used for validation. RD-based methods were used for lineage assignment, after which 31 samples remained unidentified. The genetic diversity of Maf was estimated from genome-wide SNPs using phylogeny and population genomics approaches. Lineage-based clustering (L5 and L6) was observed in the whole-genome phylogeny, with distinct sub-clusters. Population stratification using both model-based and de novo approaches supported the same observations. L6 was further delineated into three sub-lineages (L6.1-L6.3), whereas L5 was grouped as L5.1 and L5.2 based on the occurrence of RD711. L5.1 and L5.2 were further divided into two (L5.1.1 and L5.1.2) and four (L5.2.1-L5.2.4) sub-clusters, respectively. Unassigned samples could be assigned to definite lineages/sub-lineages based on the clustering observed in the phylogeny, along with high-confidence posterior membership scores obtained during population stratification. Based on the (sub)-clusters delineated, "core-cluster-specific-SNPs" were derived. Synonymous SNPs (137 in L5 and 128 in L6) were identified as biomarkers and used for validation.
A few of the cluster-specific missense variants in L5 and L6 belong to the central carbohydrate metabolism pathway, including His6Tyr (Rv0946c), Glu255Ala (Rv1131), Ala309Gly (Rv2454c), Val425Ala and Ser112Ala (Rv1127c), Gly198Ala (Rv3293) and Ile137Val (Rv0363c), Thr421Ala (Rv0896), Arg442His (Rv1248c), Thr218Ile (Rv1122), and Ser381Leu (Rv1449c), hinting at differential growth attenuation. Some genes harbor multiple (sub)-lineage-specific "core-cluster" SNPs, such as Lys117Asn, Val447Met, and Ala455Val (Rv0066c; icd2), present across L6, L6.1, and L5, respectively, hinting at an association of these SNPs with selective advantage or host adaptation. Cluster-specific SNPs serve as additional markers, along with RDs, for Maf delineation. The identified SNPs have the potential to provide insights into the genotype-phenotype correlation and clues to the endemicity of Maf in the African population.
Keywords: Mycobacterium africanum; SNP; bioinformatics; lineage; population genomics
Year: 2022 PMID: 35495132 PMCID: PMC9043288 DOI: 10.3389/fgene.2022.800083
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
FIGURE 1. Flowchart for genetic diversity analysis of Mycobacterium africanum.
FIGURE 2. Methodology for identification of core-cluster-specific SNPs. Cluster-specific-core SNPs are represented by green ovals, and cluster-specific-total SNPs are represented by solid red boxes.
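
The idea behind a "core-cluster-specific" SNP can be illustrated with a small set-based sketch. It assumes that "core" means the SNP is present in every sample of a cluster and "specific" means it is absent from every sample outside that cluster. The record does not spell out the authors' exact procedure, so the function name, the data layout, and the toy SNP positions below are all hypothetical.

```python
from typing import Dict, List, Set


def core_cluster_specific_snps(
    sample_snps: Dict[str, Set[int]],
    sample_cluster: Dict[str, str],
) -> Dict[str, Set[int]]:
    """For each cluster, return SNP positions found in every member
    of the cluster and in no sample outside it (assumed definition)."""
    members_by_cluster: Dict[str, List[str]] = {}
    for sample, cluster in sample_cluster.items():
        members_by_cluster.setdefault(cluster, []).append(sample)

    result: Dict[str, Set[int]] = {}
    for cluster, members in members_by_cluster.items():
        # "Core": SNPs shared by all members of this cluster.
        core = set.intersection(*(sample_snps[s] for s in members))
        # SNPs observed in any sample that belongs to another cluster.
        outside: Set[int] = set()
        for sample, snps in sample_snps.items():
            if sample_cluster[sample] != cluster:
                outside |= snps
        # "Specific": drop anything that also occurs outside the cluster.
        result[cluster] = core - outside
    return result


# Hypothetical toy data: SNP positions relative to a reference genome.
toy_snps = {
    "s1": {100, 200, 300},
    "s2": {100, 200, 400},
    "s3": {500, 600, 300},
    "s4": {500, 600},
}
toy_clusters = {"s1": "L5", "s2": "L5", "s3": "L6", "s4": "L6"}
specific = core_cluster_specific_snps(toy_snps, toy_clusters)
```

In this toy example, position 300 is excluded from both clusters because it occurs in samples of both, and position 400 is excluded because it is not shared by all L5 samples.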
FIGURE 3. Phylogenetic tree of Mycobacterium africanum L6 samples derived using the maximum likelihood method, as implemented in RAxML, based on genome-wide SNPs. Significant bootstrap values >70% are represented by solid gray circles.
FIGURE 4. Phylogenetic tree of Mycobacterium africanum L5 samples derived using the maximum likelihood method, as implemented in RAxML, based on genome-wide SNPs. Pink circles represent the six major monophyletic clusters observed. Samples marked in light pink represent admixed samples as reported by the STRUCTURE tool. Significant bootstrap values >70% are represented by solid gray circles.
Details of datasets and their corresponding total number of SNPs and PI sites used for the study.
| Dataset | Lineage/sub-lineage | # Samples | # SNPs | # PI sites |
|---|---|---|---|---|
| D1 | L5 and L6 | 380 | 15,501 | 15,390 |
| D2 | L6 | 197 | 7,373 | 7,028 |
| D3 | L5 | 183 | 5,818 | 5,436 |
| Group 1 | L5.1 | 157 | 4,633 | 4,147 |
| Group 2 | L5.2 | 26 | 1,931 | 1,292 |
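
The table distinguishes the total number of SNPs from the number of PI sites. Assuming "PI" stands for parsimony-informative, the usual definition is a column of the alignment in which at least two different bases each occur in at least two samples. The sketch below counts such columns in a hypothetical toy alignment. It is not the authors' pipeline, and the function names are illustrative only.

```python
from collections import Counter
from typing import Sequence


def is_parsimony_informative(column: str) -> bool:
    """True if at least two distinct bases each occur in >= 2 sequences."""
    # Ignore gaps and ambiguous characters; count only unambiguous bases.
    counts = Counter(base for base in column if base in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_pi_sites(alignment: Sequence[str]) -> int:
    """Count parsimony-informative columns in equal-length aligned sequences."""
    return sum(is_parsimony_informative("".join(col)) for col in zip(*alignment))


# Hypothetical toy alignment: four samples, five aligned positions.
toy_alignment = [
    "ACGTA",
    "ACGTC",
    "TCGAC",
    "TCAAA",
]
n_pi = count_pi_sites(toy_alignment)
```

In the toy alignment, four columns are variable but only three are parsimony-informative: the third column differs in a single sample, so it is a singleton. This is why the PI-site counts in the table are lower than the SNP counts.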
FIGURE 5. (A–C) Population stratification of Mycobacterium africanum based on genome-wide SNPs: (A) population structure of 380 samples of lineages L5 and L6 at k = 2; (B) population structure of 197 samples of lineage L6 at k = 3; (C) population structure of 157 samples of sub-lineage L5.1 at k = 2.
Comparison of L5 (sub)-lineage mapping with previous studies.
| Lineage reported in previous studies | Lineage identified in current study using STRUCTURE ( | Lineage identified in current study using STRUCTURE ( |
|---|---|---|
| L5.1 (Group 1) samples with RD711 | ||
| L5.1.1 | L5.1.1 | L5.1.1.1/L5.1.1.2 |
| L5.1.2 | L5.1.1 | L5.1.1.3 |
| L5.1.3 | L5.1.1 | L5.1.1.3 |
| L5.1.4 | L5.1.2 | L5.1.2 |
| L5.1.5 | L5.1.2 | L5.1.2 |
| NA | L5.1.1 | L5.1.1.4 |
| L5.2 (Group 2) samples without RD711 | ||
| L5.2 | L5.2.1 | |
| L5.3 | L5.2.1 | |
| NA | L5.2.3 and L5.2.4 | |
FIGURE 6. (A–D) PCA plots of Mycobacterium africanum derived using genome-wide SNPs: (A) PCA distribution of 380 samples belonging to lineages L5 and L6; (B) PCA distribution of 197 samples belonging to lineage L6; (C) PCA distribution of 157 samples belonging to sub-lineage L5.1; (D) PCA distribution of 26 samples belonging to sub-lineage L5.2.
Summary of L5 and L6 (sub)cluster-specific-core SNPs.
| Functional annotation | L5 | L5.1.1 | L5.1.2 | L5.2.1 | L5.2.2 | L5.2.4 | L6 | L6.1 | L6.2 | L6.3 |
|---|---|---|---|---|---|---|---|---|---|---|
| Synonymous | ||||||||||
| Cell_wall | 453 | 1 | 4 | 12 | 8 | 14 | 573 | 8 | 6 | 10 |
| Conserved_hypothetical | 382 | 2 | 6 | 4 | 5 | 13 | 560 | 8 | 13 | 13 |
| Lipid_metabolism | 255 | 0 | 1 | 1 | 5 | 4 | 289 | 4 | 4 | 5 |
| Pathways | 146 | 0 | 1 | 2 | 0 | 2 | 192 | 10 | 11 | 15 |
| Regulatory_proteins | 95 | 0 | 1 | 2 | 1 | 3 | 108 | 4 | 4 | 2 |
| Metabolism_respiration | 577 | 2 | 2 | 4 | 9 | 23 | 699 | 2 | 1 | 2 |
| Virulence | 89 | 0 | 2 | 0 | 1 | 2 | 109 | 2 | 3 | 1 |
| Missense | ||||||||||
| Cell_wall | 686 | 1 | 6 | 7 | 3 | 16 | 877 | 11 | 17 | 25 |
| Conserved_hypothetical | 686 | 2 | 2 | 9 | 8 | 16 | 885 | 15 | 12 | 20 |
| Lipid_metabolism | 311 | 0 | 3 | 2 | 4 | 4 | 419 | 7 | 7 | 7 |
| Pathways | 193 | 0 | 1 | 3 | 4 | 3 | 271 | 23 | 21 | 17 |
| Regulatory_proteins | 150 | 0 | 2 | 0 | 1 | 5 | 207 | 6 | 6 | 9 |
| Metabolism_respiration | 804 | 2 | 3 | 11 | 10 | 17 | 1,017 | 5 | 6 | 6 |
| Virulence | 145 | 0 | 2 | 1 | 2 | 3 | 175 | 3 | 7 | 3 |
| Upstream/downstream | ||||||||||
| Cell_wall | 137 | 0 | 0 | 2 | 3 | 6 | 191 | 2 | 3 | 8 |
| Conserved_hypothetical | 217 | 0 | 0 | 2 | 2 | 2 | 218 | 5 | 5 | 3 |
| Lipid_metabolism | 52 | 0 | 0 | 0 | 0 | 0 | 69 | 0 | 0 | 3 |
| Pathways | 47 | 0 | 0 | 0 | 1 | 0 | 52 | 3 | 2 | 5 |
| Regulatory_proteins | 37 | 0 | 0 | 0 | 0 | 0 | 54 | 1 | 2 | 1 |
| Metabolism_respiration | 137 | 1 | 3 | 0 | 3 | 6 | 193 | 1 | 1 | 1 |
| Virulence | 26 | 0 | 0 | 0 | 1 | 1 | 30 | 1 | 0 | 0 |
| Stop gained/lost/spliced | ||||||||||
| Cell_wall | 25 | 0 | 0 | 0 | 1 | 0 | 37 | 0 | 0 | 1 |
| Conserved_hypothetical | 40 | 0 | 1 | 1 | 0 | 1 | 55 | 1 | 0 | 0 |
| Lipid_metabolism | 7 | 0 | 0 | 0 | 0 | 1 | 17 | 0 | 0 | 0 |
| Pathways | 2 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 |
| Regulatory_proteins | 8 | 0 | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 0 |
| Metabolism_respiration | 25 | 0 | 0 | 0 | 1 | 1 | 34 | 1 | 1 | 1 |
| Virulence | 5 | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 0 | 0 |
| Total | 5,737 + 81 (transcript_variants) | 11 | 40 | 63 | 73 | 143 | 7,352 + 21 (transcript_variants) | 123 + 1 (transcript_variant) | 132 | 158 |
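
As a quick consistency check, the per-category counts can be summed and compared with the "Total" row. The sketch below does this for three of the sub-cluster columns (L5.1.1, L5.2.1, and L6.2), with the counts copied from the table above and the totals read as 11, 63, and 132. The variable names are illustrative, and the columns that carry additional transcript-variant counts are left out to keep the example short.

```python
# Counts copied from the summary table. Each list follows the row order:
# Cell_wall, Conserved_hypothetical, Lipid_metabolism, Pathways,
# Regulatory_proteins, Metabolism_respiration, Virulence.
table = {
    "L5.1.1": {
        "synonymous": [1, 2, 0, 0, 0, 2, 0],
        "missense": [1, 2, 0, 0, 0, 2, 0],
        "upstream_downstream": [0, 0, 0, 0, 0, 1, 0],
        "stop_gained_lost_spliced": [0, 0, 0, 0, 0, 0, 0],
    },
    "L5.2.1": {
        "synonymous": [12, 4, 1, 2, 2, 4, 0],
        "missense": [7, 9, 2, 3, 0, 11, 1],
        "upstream_downstream": [2, 2, 0, 0, 0, 0, 0],
        "stop_gained_lost_spliced": [0, 1, 0, 0, 0, 0, 0],
    },
    "L6.2": {
        "synonymous": [6, 13, 4, 11, 4, 1, 3],
        "missense": [17, 12, 7, 21, 6, 6, 7],
        "upstream_downstream": [3, 5, 0, 2, 2, 1, 0],
        "stop_gained_lost_spliced": [0, 0, 0, 0, 0, 1, 0],
    },
}

# Totals as read from the "Total" row of the table.
reported_totals = {"L5.1.1": 11, "L5.2.1": 63, "L6.2": 132}

# Sum every category of every annotation class for each column.
computed_totals = {
    column: sum(sum(counts) for counts in classes.values())
    for column, classes in table.items()
}

mismatches = {
    column: (computed_totals[column], reported_totals[column])
    for column in table
    if computed_totals[column] != reported_totals[column]
}
```

For these three columns the recomputed sums agree with the reported totals, so `mismatches` is empty.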
Functional mapping of core-cluster-specific missense SNPs of L6 with literature support (Gehre et al., 2013; Ofori-Anyinam et al., 2017; Ofori-Anyinam et al., 2020), and functional role obtained from Mycobrowser (url: https://mycobrowser.epfl.ch/).
| Lineage/sub-lineage | Rv locus/gene name | Functional role | SNP |
|---|---|---|---|
| L6 | Rv0862c | Conserved hypothetical protein | Asp160Glu |
| | Rv1096 | Probably involved in carbohydrate degradation | Pro272Ser |
| | Rv2241/aceE | Involved in energy metabolism | Ala777Thr |
| | Rv2383c/mbtB | Involved in biogenesis of siderophore mycobactins | Leu978Phe |
| | Rv2737c/recA | Involved in regulation of nucleotide excision repair | Gln566Pro |
| | Rv2194/ | Required during aerobic respiration for growth; may be responsible for differential energy metabolism | Lys228Gln |
| | Rv1023/eno | Role in tissue re-modeling and invasion of host cells; a potential drug target ( | Arg179Ser |
| | Rv1240/mdh | Involved in tricarboxylic acid cycle | Asp253Ala |
| | Rv0066c/icd2 | Involved in tricarboxylic acid cycle | Lys117Asn |
| L6.1 | Rv3563/fadE32 | Involved in lipid degradation | Glu206Val |
| | Rv0080 | Conserved hypothetical protein | Val31Gly |
| | Rv2504c/scoA | Involved in fatty acid degradation/synthesis | Arg230Trp |
| | Rv3223c/sigH | Alternative sigma factor that plays a role in oxidative-stress response | Glu151Asp |
| | Rv0066c/icd2 | Involved in tricarboxylic acid cycle | Lys117Asn |
| | | | Val447Met |
| | Rv1328/glgP | Phosphorylase is an important allosteric enzyme in carbohydrate metabolism | Gly731Asp |
| | Rv2112c/dop | Deamidase of prokaryotic ubiquitin-like-protein | Ala500Val |
| | Rv3282 | Conserved hypothetical protein | Thr145Lys |
| | Rv1178 | Probably involved in cellular metabolism | Arg247Arg |
| | Rv3236c/kefB | Growth attenuation | Arg325His |
| L6.2 | Rv3236c/kefB | Growth attenuation | Val106Ala |
| | Rv2215 | Involved in tricarboxylic acid cycle and antioxidant defense | Ala338Val |
| | Rv1121/zwf1 | Involved in the pentose phosphate pathway | Gln277* |
| L6.3 | Rv1180/pks3 | Potentially involved in intermediate steps for the synthesis of polyketide | Pro401Thr |
| | Rv1181/pks4 | Involved in lipid metabolism | Gly40Arg |
| | Rv2030c | Conserved hypothetical protein | Ser275Asn |
| | Rv1447c/zwf2 | Involved in pentose phosphate pathway | Gly357Ser |
Absent in only one isolate (SRA Accession ID: SRR1577833).
*Stop codon
Functional mapping of core-cluster-specific missense SNPs of L5 with literature support (Ofori-Anyinam et al., 2020), and functional role obtained from Mycobrowser (url: https://mycobrowser.epfl.ch/).
| Rv locus/gene name | Functional role | SNP | Additional functional evidence |
|---|---|---|---|
| Rv0211/pckA | Gluconeogenesis; virulence and initiation of infection in macrophages | Lys422Thr | |
| Rv2967c/pca | Gluconeogenesis; cholesterol detoxification and lipogenesis during intracellular growth | Ala926Thr | — |
| Rv1188/pruB | Proline metabolism associated with attenuated growth and adaptation to hypoxia | Arg257Cys | |
| Rv1552/frdA | Associated with hypoxia and microaerophilic adaptation | Gly16Asp | — |
| Rv1309/atpG | Produces ATP from ADP in the electron transport chain | Tyr220Ser | — |
| Rv1307/atpH | Produces ATP from ADP in the presence of a proton or sodium gradient | Ser434Leu | — |
| Rv1240/mdh | Catalyzes the reversible oxidation of malate to oxaloacetate | Leu326Ile | — |
| Rv0066c/icd2 | Catalyzes the conversion of isocitrate to ɑ-ketoglutarate | Ala455Val | — |
| Rv0946c/pgi | Central carbohydrate metabolism | His6Tyr | — |
| Rv1131/prpC | Involved in the methyl citrate cycle | Glu255Ala | — |
| Rv2454c | Central carbohydrate metabolism | Ala309Gly | — |
| Rv1127c/Ppdk | Catalyzes the reversible phosphorylation of pyruvate and phosphate | Val425Ala and Ser112Ala | — |
| Rv3293/Pcd | Involved in L-alpha-aminoadipic acid biosynthesis | Gly198Ala | — |