Jianping Jiang, Jianlei Gu, Liang Zhang, Chenyi Zhang, Xiao Deng, Tonghai Dou, Guoping Zhao, Yan Zhou.
Abstract
BACKGROUND: Over the last decade, emerging research methods, such as comparative genomic analysis and phylogenetic study, have yielded new insights into the genotypes and phenotypes of closely related bacterial strains. Several findings have revealed that genomic structural variations (SVs), including gene gain/loss, gene duplication and genome rearrangement, can lead to different phenotypes among strains. An investigation of the genes affected by SVs may therefore extend our knowledge of the relationships between SVs and phenotypes in microbes, especially in pathogenic bacteria.
Year: 2015 PMID: 25766780 PMCID: PMC4342819 DOI: 10.1186/s12864-015-1259-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
General features of thirteen genomes
| RefSeq accession | Region | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| NC_021251.1 | CHN | 4,156 | 249 | 4,405 | 2,880 | 65.38 | 4,156 | 94.35 |
| NC_017522.1 | CHN | 3,590 | 813 | 4,403 | 2,862 | 65.00 | 4,111 | 93.37 |
| NC_002755.2 | USA | 4,189 | 126 | 4,315 | 2,762 | 64.01 | 4,018 | 93.12 |
| NC_017524.1 | RUS | 3,944 | 427 | 4,371 | 2,849 | 65.18 | 4,139 | 94.69 |
| NC_009565.1 | ZA | 3,941 | 442 | 4,383 | 2,862 | 65.30 | 4,141 | 94.48 |
| NC_009525.1 | - | 4,034 | 372 | 4,406 | 2,866 | 65.05 | 4,168 | 94.60 |
| NC_000962.3 | - | 4,111 | 372 | 4,483 | 2,867 | 63.95 | 4,158 | 92.75 |
| NC_012943.1 | ZA | 4,059 | 325 | 4,384 | 2,859 | 65.21 | 4,148 | 94.62 |
| NC_016768.1 | ZA | 3,996 | 376 | 4,372 | 2,851 | 65.21 | 4,136 | 94.60 |
| NC_018078.1 | ZA | 4,001 | 349 | 4,350 | 2,835 | 65.17 | 4,122 | 94.76 |
| NC_017026.1 | IN | 3,691 | 325 | 4,016 | 2,459 | 61.23 | 3,594 | 89.49 |
| NC_017528.1 | IN | 3,622 | 271 | 3,893 | 2,379 | 61.11 | 3,462 | 88.93 |
| NC_016934.1 | COL | 3,796 | 433 | 4,229 | 2,778 | 65.69 | 4,011 | 94.85 |
Source: Strain information was obtained from NCBI. CHN: China; USA: United States; RUS: Russia; ZA: South Africa; IN: India; COL: Colombia. H37Ra and H37Rv were derived from the original human-lung H37 isolate in 1934 and have been used extensively in biomedical research.
Figure 1. The construction of GTNs and how GTNs are affected by evolutionary events. (A) Constructing a GTN from a genome. Supposing that Gene1 and Gene3 are annotated as COGA, Gene2 as COGB and Gene4 as COGC, the numbers adjacent to the lines in the GTNs are degrees. (B) The variation in GTN structure when Gene1 is duplicated and inserted between GeneX and GeneY. GeneX and GeneY are annotated as COGX and COGY, respectively. (C) The variation in GTN structure when Gene1 is lost. (D) The variation in GTN structure when the segments of Gene2 and Gene3 are reversed in the genome.
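The construction described in panels (A), (C) and (D) can be illustrated with a minimal sketch. It rests on several assumptions that the caption does not confirm: the genome is treated as a linear list of COG labels in gene order, edges are undirected, and the number on each line is read as the count of adjacent gene pairs carrying those two labels. The function names `build_gtn` and `gtn_changes` are hypothetical.

```python
from collections import Counter

def build_gtn(cog_order):
    """Count how often two COG labels sit next to each other in gene order.

    Assumption: linear genome, undirected edges keyed by a sorted label pair.
    """
    edges = Counter()
    for left, right in zip(cog_order, cog_order[1:]):
        edges[tuple(sorted((left, right)))] += 1
    return edges

def gtn_changes(before, after):
    """Map each edge whose count changed to (old_count, new_count)."""
    return {
        edge: (before[edge], after[edge])
        for edge in set(before) | set(after)
        if before[edge] != after[edge]
    }

# Panel A: Gene1..Gene4 annotated as COGA, COGB, COGA, COGC.
reference = build_gtn(["COGA", "COGB", "COGA", "COGC"])
# Panel C: Gene1 is lost.
gene1_lost = build_gtn(["COGB", "COGA", "COGC"])
# Panel D: the Gene2-Gene3 segment is reversed (Gene1, Gene3, Gene2, Gene4).
inverted = build_gtn(["COGA", "COGA", "COGB", "COGC"])

print(dict(reference))
print(gtn_changes(reference, gene1_lost))
print(gtn_changes(reference, inverted))
```

Under these assumptions, a gene loss only lowers the count of existing edges, whereas an inversion both removes edges and creates new ones, which is the kind of structural difference the figure illustrates.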
Figure 2. Gene counts of thirteen strains after annotation refinement with different overlap thresholds. The horizontal axis shows the names of the thirteen M. tuberculosis strains, and the vertical axis shows the total gene count in each strain. After annotation refinement with different overlap thresholds (70%, 80% and 90%), the gene count of each strain increased. However, when the threshold was set to 70% or 90%, there were few differences in the results.
COG groups with the highest DDs
| COG description | DD/Str. | Paralog/Str. | DD/Paralog |
|---|---|---|---|
| Transposase and inactivated derivatives | 24 | 14 | 1.71 |
| Transposase and inactivated derivatives | 28 | 18 | 1.56 |
| Transposase and inactivated derivatives | 6 | 6 | 1.00 |
| Transposase and inactivated derivatives | 5 | 9 | 0.56 |
| PPE-repeat proteins | 31 | 62 | 0.50 |
| Adenylate cyclase, family 3 (some proteins contain HAMP domain) | 6 | 12 | 0.50 |
| Serine/threonine protein kinase | 5 | 11 | 0.45 |
| Beta-lactamase class C and other penicillin-binding proteins | 5 | 11 | 0.45 |
| FAD/FMN-containing dehydrogenases | 6 | 14 | 0.43 |
| Predicted drug exporters of the RND superfamily | 6 | 14 | 0.43 |
| Polyketide synthase modules and related proteins | 8 | 19 | 0.42 |
| Enoyl-CoA hydratase/carnitine racemase | 9 | 23 | 0.39 |
| Esterase/lipase | 5 | 13 | 0.38 |
| Predicted hydrolases or acyltransferases (alpha/beta hydrolase superfamily) | 12 | 34 | 0.35 |
| O-Methyltransferase involved in polyketide biosynthesis | 6 | 17 | 0.35 |
| Transcriptional regulator | 17 | 49 | 0.35 |
| Acyl-CoA synthetases (AMP-forming)/AMP-acid ligases II | 11 | 32 | 0.34 |
| Acyl-CoA dehydrogenases | 12 | 35 | 0.34 |
| Coenzyme F420-dependent N5,N10-methylene tetrahydromethanopterin reductase and related flavin-dependent oxidoreductases | 6 | 19 | 0.32 |
| Cytochrome P450 | 6 | 20 | 0.30 |
| Permeases of the major facilitator superfamily | 8 | 27 | 0.30 |
| Predicted nucleic acid-binding protein, contains PIN domain | 5 | 19 | 0.26 |
| SAM-dependent methyltransferases | 9 | 37 | 0.24 |
| Dehydrogenases with different specificities (related to short-chain alcohol dehydrogenases) | 9 | 41 | 0.22 |
DD/Str.: The average different degree (DD) per strain; only COG groups with an average DD of five or higher are shown. Paralog/Str.: The average paralog number (rounded) across the thirteen M. tuberculosis strains.
Figure 3. A rooted phylogenetic tree constructed with orthologs, determined by orthoMCL, of the thirteen strains and the outgroup BCG. The strain information is shown in Table 1. The first number in parentheses is the number of unique ortholog pairs in the strain when compared to its sister group. The second number is the difference in occurrences of shared ortholog pairs in a strain when compared to its sister group. For instance, there are 60 unique ortholog pairs in KZN4207 compared with KZN605, and 41 unique ortholog pairs in KZN605. For the ortholog pairs present in both strains (groups), five more occurrences were detected in KZN4207 than in KZN605. The sub-groups are marked with different colors: the blue background denotes the KZN group, green the H37 group, purple the RGTB group and light blue the CCDC group. The RGTB group was excluded from the between-group comparison because of its abnormal gene counts. The number to the right of each group is the number of different ortholog pairs (unique and different) between the group and the other two groups.
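The two numbers in parentheses can be illustrated with a short sketch. It assumes that each strain is summarized as a list of ortholog pairs (a pair may occur more than once), that "unique" means present in one strain and absent from its sister, and that the second number is the net surplus of occurrences over the shared pairs. The caption does not spell out these details, so this is one plausible reading; the function name and the toy pairs are hypothetical.

```python
from collections import Counter

def compare_sister_groups(pairs_a, pairs_b):
    """Return (unique pairs in A, unique pairs in B, net surplus of A on shared pairs)."""
    a, b = Counter(pairs_a), Counter(pairs_b)
    unique_a = set(a) - set(b)          # pairs seen only in A
    unique_b = set(b) - set(a)          # pairs seen only in B
    shared = set(a) & set(b)            # pairs seen in both
    surplus_a = sum(a[p] - b[p] for p in shared)
    return len(unique_a), len(unique_b), surplus_a

# Toy data, not taken from the study.
strain_a = [("O1", "O2"), ("O1", "O2"), ("O2", "O3")]
strain_b = [("O1", "O2"), ("O3", "O4")]

print(compare_sister_groups(strain_a, strain_b))
```

In this toy case each strain has one unique pair, and the shared pair occurs once more in the first strain than in the second.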
Pathway (KEGG) enrichment analysis of lost genes from three strains
| | Pathway | |
|---|---|---|
| 17 | | 0.00 |
| 12 | mtu00240:Pyrimidine metabolism | 0.01 |
| 10 | mtu00361:gamma-Hexachlorocyclohexane degradation | 0.01 |
| 13 | | 0.00 |
| 16 | | 0.01 |
| 16 | | 0.01 |
| 5 | | 0.01 |
| 16 | mtu02010:ABC transporters | 0.00 |
| | | |
| 15 | mtu00071:Fatty acid metabolism | 0.00 |
| 16 | | 0.01 |
| 10 | mtu00250:Alanine, aspartate and glutamate metabolism | 0.00 |
| 8 | mtu00260:Glycine, serine and threonine metabolism | 0.04 |
| 18 | mtu00280:Valine, leucine and isoleucine degradation | 0.00 |
| 19 | mtu00281:Geraniol degradation | 0.00 |
| 13 | mtu00330:Arginine and proline metabolism | 0.00 |
| 7 | mtu00360:Phenylalanine metabolism | 0.03 |
| 15 | mtu00380:Tryptophan metabolism | 0.00 |
| 10 | mtu00410:beta-Alanine metabolism | 0.04 |
| 6 | mtu00480:Glutathione metabolism | 0.03 |
| 13 | | 0.01 |
| 16 | | 0.03 |
| 19 | mtu00640:Propanoate metabolism | 0.00 |
| 16 | | 0.03 |
| 8 | mtu00910:Nitrogen metabolism | 0.01 |
| 9 | mtu00930:Caprolactam degradation | 0.04 |
| 6 | | 0.00 |
| 16 | mtu02020:Two-component system | 0.00 |
| 7 | mtu03030:DNA replication | 0.01 |
| | | |
| 4 | mtu00310:Lysine degradation | 0.01 |
| 2 | mtu00540:Lipopolysaccharide biosynthesis | 0.03 |
| 4 | mtu00650:Butanoate metabolism | 0.02 |
Pathway: KEGG pathways enriched among the lost genes. Pathways shared between RGTB327 and RGTB423 are shown in bold.
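The text here does not state which statistical test was used for the enrichment analysis. As a general illustration of the idea only, a one-sided hypergeometric test is a common choice: it asks how likely it is to see at least `k` pathway genes among `n` lost genes when the genome has `N` genes, of which `K` belong to the pathway. The function name and all numbers below are hypothetical and are not taken from the table.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in pathway, n drawn).

    Assumption: lost genes are treated as a random draw without replacement.
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible combinations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical example: 4,000 genes, 50 in a pathway, 300 lost genes,
# 12 of the lost genes fall in the pathway.
p_value = hypergeom_enrichment_p(k=12, n=300, K=50, N=4000)
print(f"p = {p_value:.4g}")
```

A small p-value under this model suggests that the pathway is over-represented among the lost genes; in practice such values are usually corrected for testing many pathways at once.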