Mengmeng Wang, Han Zhu, Zhijian Kong, Tuo Li, Lei Ma, Dongyang Liu, Qirong Shen.
Abstract
The genus Geobacillus is ecologically diverse and is a well-known source of thermostable enzymes. Although it is now clear that Geobacillus evolved from Bacillus, relatively little is known about its evolutionary mechanism, which may also contribute to its ecological diversity and biotechnological potential. Here, thirty-two Geobacillus genomes were compared statistically, with a specific focus on the pan- and core genomes. The pan-genome of this set of Geobacillus strains contained 14,913 genes, and the core genome contained 940 genes. Clusters of Orthologous Groups (COG) and Carbohydrate-Active Enzymes (CAZymes) analyses indicated that the Geobacillus strains have considerable potential for industrial application in composting for agricultural waste management. Detailed comparative analyses showed that basic functional classes and housekeeping genes were conserved in the core genome, while genes associated with environmental interaction or energy metabolism were more enriched in the pan-genome. The evolution of Geobacillus therefore seems to be guided by environmental parameters. In addition, horizontal gene transfer (HGT) events among different Geobacillus species were detected. Altogether, pan-genome analysis proved a useful method for investigating the evolutionary mechanism, and the evolution of Geobacillus was directed by the environment and HGT events.
Keywords: Geobacillus; comparative genomics; core genome; evolutionary events; genomic features; pan-genome
Year: 2020 PMID: 32403359 PMCID: PMC7246994 DOI: 10.3390/ijms21093393
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Genomic statistics and pan-genome information of all species.
| Species | Ngenome | Size (Mb) | Ngenes | Ncore | Nshell | Ncloud | Npan |
|---|---|---|---|---|---|---|---|
| - | 32 | 3.54 ± 0.35 | 3629 ± 356 | 940 | 5304 | 8496 | 14913 |
| | 8 | 3.13 ± 0.33 | 3280 ± 281 | 1994 | 1925 | 1980 | 5899 |
| | 8 | 3.53 ± 0.11 | 3794 ± 360 | 2142 | 2333 | 2640 | 7115 |
| | 8 | 3.58 ± 0.13 | 3532 ± 153 | 2861 | 1101 | 1229 | 5191 |
| | 8 | 3.92 ± 0.19 | 3912 ± 192 | 2659 | 1737 | 2352 | 6748 |
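The core, shell, and cloud counts in the table partition gene families by how many genomes carry them. A minimal Python sketch of that binning is given here. The thresholds (core in at least 99% of genomes, cloud in fewer than 15%, shell in between) are assumed conventions borrowed from common pan-genome tools, since the cutoffs are not stated in this record, and the toy presence data and function name are hypothetical.

```python
def classify_gene_families(presence, n_genomes, core_min=0.99, cloud_max=0.15):
    """Count core, shell, and cloud gene families.

    presence  : dict mapping a gene-family id to the set of genome ids carrying it
    n_genomes : total number of genomes compared
    core_min, cloud_max : ASSUMED frequency cutoffs, not taken from the article
    """
    counts = {"core": 0, "shell": 0, "cloud": 0}
    for genomes in presence.values():
        frequency = len(genomes) / n_genomes
        if frequency >= core_min:
            counts["core"] += 1
        elif frequency < cloud_max:
            counts["cloud"] += 1
        else:
            counts["shell"] += 1
    # The pan-genome is every family seen in at least one genome.
    counts["pan"] = len(presence)
    return counts


# Hypothetical toy data: 8 genomes, 5 gene families.
all_genomes = {f"g{i}" for i in range(1, 9)}
toy_presence = {
    "famA": set(all_genomes),            # 8/8 genomes
    "famB": set(all_genomes),            # 8/8 genomes
    "famC": {"g1", "g2", "g3", "g4"},    # 4/8 genomes
    "famD": {"g5"},                      # 1/8 genomes
    "famE": {"g1", "g8"},                # 2/8 genomes
}
print(classify_gene_families(toy_presence, n_genomes=8))
```

With only eight genomes per group, a 99% cutoff is equivalent to requiring presence in every genome, so the choice of threshold mainly matters for the 32-genome comparison.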
Figure 1. Comparisons of COG functional categories of G. stearothermophilus, G. thermocatenulatus, G. thermodenitrificans, and P. thermoglucosidasius. For each category, from left to right are strain ATCC 12980 T, B5, Sah69, 53, B4109, C1BS50MT1, D1, DSM 458, BCO2, GS-1, DSM 730T, KCTC3921, SURF-114, SURF-48B, SURF-189, T6, DSM 465 T, ID-1, NG80-2, PA-3, JSC-T9a, KCTC3902, OS27, T12, C56-YS93, DSM 2542 T, GT23, ZCTH02-B4, TG4, TNO-09.020, W-2, and Y4.1MC1.
Figure 2. GH family distribution of all genomes. (A) Pie chart indicating the percentage of each category of CAZymes. (B) Pie chart indicating the percentage of each GH family. (C) Heatmap displaying the GH family members identified in all the genomes.
Figure 3. The pan-genome and clustering analysis of Geobacillus strains. (A) Development plots of the pan- and core genomes, shown as boxplots of the pan- (cyan) and core (pink) genomes. (B) Clustering of the Geobacillus strains based on the pan-genome ANI matrix. (A) G. stearothermophilus clade, (B) G. thermocatenulatus clade, (C) G. thermodenitrificans clade, and (D) P. thermoglucosidasius clade.
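Development plots of this kind are commonly built by adding genomes one at a time in many random orders and recording the size of the union (pan-genome) and the intersection (core genome) at each step; the spread across orders is what a boxplot per step summarizes. The sketch below illustrates that general idea in Python. Whether the authors computed their plots this way is an assumption, and the function name and toy gene sets are hypothetical.

```python
import random


def accumulation_curves(genomes, n_permutations=100, seed=0):
    """Pan- and core-genome sizes as genomes are added in random order.

    genomes : dict mapping a genome id to its set of gene-family ids
    Returns (pan, core): pan[k] and core[k] list the sizes observed after
    adding k + 1 genomes, one value per random ordering.
    """
    rng = random.Random(seed)
    ids = list(genomes)
    pan = [[] for _ in ids]
    core = [[] for _ in ids]
    for _ in range(n_permutations):
        order = ids[:]
        rng.shuffle(order)
        union, intersection = set(), None
        for k, genome_id in enumerate(order):
            genes = genomes[genome_id]
            union |= genes  # pan-genome can only grow
            # core genome can only shrink
            intersection = set(genes) if intersection is None else intersection & genes
            pan[k].append(len(union))
            core[k].append(len(intersection))
    return pan, core


# Hypothetical toy input: three genomes sharing one gene family ("a").
toy_genomes = {
    "g1": {"a", "b", "c"},
    "g2": {"a", "b", "d"},
    "g3": {"a", "e"},
}
pan_sizes, core_sizes = accumulation_curves(toy_genomes, n_permutations=20)
print("pan sizes after all genomes:", set(pan_sizes[-1]))
print("core sizes after all genomes:", set(core_sizes[-1]))
```

The final step is independent of the ordering, so every permutation ends at the same pan- and core-genome sizes; only the intermediate steps vary.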
Reconstruction of pan-, core, and softcore genome of all the genomes with Brite and Pathway algorithm of GhostKOALA.
| Category | Pan (gene number) | Softcore (gene number) | Core (gene number) | Softcore-Pan (n-fold extension) | Core-Pan (n-fold extension) |
|---|---|---|---|---|---|
| Brite mapping | | | | | |
| Orthologs and modules | 2846 | 1625 | 1419 | 1.8 | 2.0 |
| Protein families: metabolism | 1342 | 798 | 698 | 1.7 | 1.9 |
| Protein families: genetic information processing | 516 | 399 | 372 | 1.3 | 1.4 |
| Protein families: signaling and cellular processing | 539 | 228 | 190 | 2.4 | 2.8 |
| Total | 5243 | 3050 | 2679 | 1.7 | 2.0 |
| Pathway reconstruction | | | | | |
| Metabolism | 2958 | 1850 | 1666 | 1.6 | 1.8 |
| Genetic Information Processing | 198 | 180 | 167 | 1.1 | 1.2 |
| Environmental Information Processing | 278 | 98 | 79 | 2.8 | 3.5 |
| Cellular Processes | 156 | 102 | 78 | 1.5 | 2.0 |
| Organismal Systems | 61 | 30 | 27 | 2.0 | 2.3 |
| Human Diseases | 88 | 49 | 47 | 1.8 | 1.9 |
| Total | 3739 | 2309 | 2064 | 1.6 | 1.8 |
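The n-fold extension values in the table are consistent with dividing the pan-genome gene number by the softcore or core gene number and rounding to one decimal place. That reading is inferred from the tabulated numbers rather than stated in the caption, so the short Python sketch below should be taken as a check of the arithmetic only; the function name is hypothetical.

```python
def n_fold_extension(pan_count, subset_count):
    """Ratio of pan-genome gene number to a softcore or core gene number."""
    return round(pan_count / subset_count, 1)


# (category, pan, softcore, core) copied from a few rows of the table.
rows = [
    ("Orthologs and modules", 2846, 1625, 1419),
    ("Protein families: signaling and cellular processing", 539, 228, 190),
    ("Genetic Information Processing", 198, 180, 167),
    ("Environmental Information Processing", 278, 98, 79),
]

for category, pan, softcore, core in rows:
    print(
        f"{category}: softcore-pan {n_fold_extension(pan, softcore)}, "
        f"core-pan {n_fold_extension(pan, core)}"
    )
```

Read this way, Environmental Information Processing shows the largest expansion from core to pan-genome (3.5-fold), while Genetic Information Processing shows the smallest (1.2-fold), in line with the abstract's statement that environment-related genes are enriched in the pan-genome.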
Figure 4. KEGG pathway enrichment scatter plot of specific genes for different Geobacillus species. The x-axis represents the rich factor, and the y-axis shows the name of the KEGG pathway. Dot size represents the number of associated genes, and the color indicates the −log10(q-value). (A–D) Representations of the enrichment analysis of G. stearothermophilus, G. thermocatenulatus, G. thermodenitrificans, and P. thermoglucosidasius, respectively.
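For readers unfamiliar with the quantities plotted, the following Python sketch shows one common way to obtain them. It assumes the usual definition of the rich factor (specific genes annotated to a pathway divided by all genes annotated to that pathway), a one-sided hypergeometric test for enrichment, and Benjamini-Hochberg adjustment for the q-value. None of these choices is confirmed by this record, and all function names and numbers are hypothetical.

```python
from math import comb, log10


def rich_factor(hits_in_pathway, genes_in_pathway):
    """Specific genes in the pathway divided by all genes annotated to it."""
    return hits_in_pathway / genes_in_pathway


def hypergeom_pvalue(k, m, n, N):
    """P(X >= k) when n genes are drawn from N, of which m belong to the pathway."""
    upper = min(m, n)
    favourable = sum(comb(m, i) * comb(N - m, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical example: 3 pathways, 40 specific genes, 1000 annotated genes.
pathways = {"pathway_1": (8, 20), "pathway_2": (3, 50), "pathway_3": (5, 15)}
pvals = [hypergeom_pvalue(k, m, 40, 1000) for k, m in pathways.values()]
qvals = benjamini_hochberg(pvals)
for (name, (k, m)), q in zip(pathways.items(), qvals):
    print(name, "rich factor:", rich_factor(k, m), "-log10(q):", -log10(q))
```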
Figure 5. The predicted horizontal gene transfer (HGT) events between different species. Bands connect the two species between which HGT events occurred, and the outermost digits give the number of HGT events within each species.