En-Ze Hu¹, Xin-Ran Lan¹, Zhi-Ling Liu¹, Jie Gao¹, Deng-Ke Niu².
Abstract
BACKGROUND: GC pairs are generally more stable than AT pairs; GC-rich genomes were therefore proposed to be better adapted to high temperatures than AT-rich genomes. Previous studies have consistently shown positive correlations between growth temperature and the GC contents of structural RNA genes. However, for whole-genome sequences and the silent sites of codons in protein-coding genes, the relationship between GC content and growth temperature has long been debated.
Keywords: Evolution; GC content; Optimal growth temperature; Prokaryotes; Thermophile
Year: 2022 PMID: 35139824 PMCID: PMC8827189 DOI: 10.1186/s12864-022-08353-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
The phylogenetic signals of the variables analyzed in this study
| Traits | Bacteria N | Bacteria Pagel's λ | Bacteria P | Archaea N | Archaea Pagel's λ | Archaea P |
|---|---|---|---|---|---|---|
| Tmax | 681 | 0.957 | 3.5 × 10⁻¹⁷⁸ | 155 | 1.000 | 1.5 × 10⁻⁷² |
| Topt | 681 | 0.950 | 1.1 × 10⁻¹⁹⁶ | 155 | 0.988 | 5.7 × 10⁻⁷⁰ |
| Tmin | 681 | 0.933 | 6.6 × 10⁻¹⁵² | 155 | 0.966 | 1.9 × 10⁻⁵³ |
| GCw | 681 | 1.000 | 2.6 × 10⁻²⁹⁴ | 155 | 1.000 | 5.8 × 10⁻⁶⁰ |
| GCp | 681 | 1.000 | 4.7 × 10⁻²⁹² | 155 | 1.000 | 2.4 × 10⁻⁵⁹ |
| GC4 | 681 | 1.000 | 4.8 × 10⁻²³⁸ | 155 | 1.000 | 5.1 × 10⁻⁵³ |
| GCnon | 681 | 1.000 | 8.6 × 10⁻³⁰⁵ | 155 | 1.000 | 2.0 × 10⁻⁶⁵ |
| GCtRNA | 681 | 0.998 | 6.0 × 10⁻²⁷⁵ | 155 | 1.000 | 2.1 × 10⁻⁹¹ |
| GC5S | 646 | 1.000 | 7.0 × 10⁻¹⁷⁸ | 130 | 1.000 | 9.1 × 10⁻⁵¹ |
| GC16S | 681 | 0.999 | 8.7 × 10⁻²⁵⁰ | 155 | 0.996 | 3.7 × 10⁻⁸⁶ |
| GC23S | 681 | 1.000 | 2.1 × 10⁻²⁴⁵ | 155 | 1.000 | 6.8 × 10⁻⁸³ |
Tmax, Topt, and Tmin represent the maximal, optimal, and minimal growth temperatures, respectively; GCw, GCp, GC4, GCtRNA, GC5S, GC16S, GC23S, and GCnon represent the GC contents of the whole genome, the protein-coding sequences, the fourfold degenerate sites, the tRNA genes, the 5S rRNA genes, the 16S rRNA genes, the 23S rRNA genes, and the non-coding DNA (including intergenic sequences and untranslated regions of mRNA), respectively. The phylogenetic signals of the chromosomal, plasmid, core, and accessory genes are also very close to 1 and are deposited in Additional file 1: Tables S1-S4.
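Pagel's λ measures phylogenetic signal by rescaling the shared-ancestry (off-diagonal) part of the Brownian-motion covariance matrix: λ = 1 leaves the tree's expectation unchanged, and λ = 0 makes species statistically independent, so values close to 1, as in the table above, indicate strong phylogenetic signal. The minimal Python sketch below shows only that transformation on a hypothetical two-species tree; the maximum-likelihood estimation of λ and its significance test are not shown, and limiting λ to [0, 1] is a simplifying assumption of the sketch.

```python
from typing import List

Matrix = List[List[float]]


def pagel_lambda_transform(v: Matrix, lam: float) -> Matrix:
    """Return a new Brownian-motion covariance matrix with Pagel's lambda applied.

    v[i][i] is the root-to-tip path length of species i; v[i][j] (i != j) is the
    length of the path that species i and j share from the root to their most
    recent common ancestor. Lambda multiplies only the off-diagonal entries.
    """
    # Simplifying assumption for this sketch: lambda is restricted to [0, 1]
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1] in this sketch")
    n = len(v)
    return [[v[i][j] if i == j else lam * v[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical tree: two sister species, each 2 units from the root,
# sharing the first 1 unit of their history.
v = [[2.0, 1.0],
     [1.0, 2.0]]
print(pagel_lambda_transform(v, 1.0))  # [[2.0, 1.0], [1.0, 2.0]]: unchanged
print(pagel_lambda_transform(v, 0.0))  # [[2.0, 0.0], [0.0, 2.0]]: no shared history
```

In a full analysis, the transformed matrix would be plugged into a likelihood and λ chosen to maximize it.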
PGLS regression of GC contents and growth temperatures
| Variables | Bacteria slope | Bacteria P | Bacteria PBH | Archaea slope | Archaea P | Archaea PBH |
|---|---|---|---|---|---|---|
| GCw-Tmax | 7.1 × 10⁻⁴ | 7.1 × 10⁻⁴ | 9.4 × 10⁻⁴ | 6.6 × 10⁻⁴ | 0.115 | 0.153 |
| GCw-Topt | 5.7 × 10⁻⁴ | 0.009 | 0.011 | 3.3 × 10⁻⁴ | 0.377 | 0.503 |
| GCw-Tmin | 2.8 × 10⁻⁴ | 0.156 | 0.226 | 5.2 × 10⁻⁴ | 0.126 | 0.168 |
| GCp-Tmax | 6.6 × 10⁻⁴ | 0.002 | 0.002 | 5.6 × 10⁻⁴ | 0.183 | 0.209 |
| GCp-Topt | 5.3 × 10⁻⁴ | 0.015 | 0.016 | 2.4 × 10⁻⁴ | 0.522 | 0.597 |
| GCp-Tmin | 2.5 × 10⁻⁴ | 0.202 | 0.231 | 4.6 × 10⁻⁴ | 0.180 | 0.205 |
| GC4-Tmax | 0.001 | 0.003 | 0.003 | 9.9 × 10⁻⁴ | 0.321 | 0.321 |
| GC4-Topt | 0.001 | 0.016 | 0.016 | 2.2 × 10⁻⁴ | 0.806 | 0.806 |
| GC4-Tmin | 5.5 × 10⁻⁴ | 0.239 | 0.239 | 6.9 × 10⁻⁴ | 0.393 | 0.393 |
| GCnon-Tmax | 8.0 × 10⁻⁴ | 1.7 × 10⁻⁴ | 2.8 × 10⁻⁴ | 9.1 × 10⁻⁴ | 0.025 | 0.041 |
| GCnon-Topt | 6.4 × 10⁻⁴ | 0.004 | 0.006 | 6.4 × 10⁻⁴ | 0.080 | 0.129 |
| GCnon-Tmin | 2.7 × 10⁻⁴ | 0.170 | 0.226 | 6.5 × 10⁻⁴ | 0.048 | 0.077 |
| GCtRNA-Tmax | 4.1 × 10⁻⁴ | 2.2 × 10⁻¹⁶ | 5.9 × 10⁻¹⁶ | 7.1 × 10⁻⁴ | 1.8 × 10⁻¹¹ | 7.2 × 10⁻¹¹ |
| GCtRNA-Topt | 3.9 × 10⁻⁴ | 2.6 × 10⁻¹⁴ | 6.9 × 10⁻¹⁴ | 5.0 × 10⁻⁴ | 2.5 × 10⁻⁷ | 6.7 × 10⁻⁷ |
| GCtRNA-Tmin | 1.5 × 10⁻⁴ | 9.1 × 10⁻⁴ | 0.002 | 4.2 × 10⁻⁴ | 1.8 × 10⁻⁶ | 4.7 × 10⁻⁶ |
| GC5S-Tmax | 5.5 × 10⁻⁴ | 1.2 × 10⁻⁶ | 2.4 × 10⁻⁶ | 0.001 | 1.9 × 10⁻⁵ | 3.9 × 10⁻⁵ |
| GC5S-Topt | 4.4 × 10⁻⁴ | 1.4 × 10⁻⁴ | 2.9 × 10⁻⁴ | 8.9 × 10⁻⁴ | 1.6 × 10⁻⁴ | 3.2 × 10⁻⁴ |
| GC5S-Tmin | 3.5 × 10⁻⁴ | 0.001 | 0.002 | 6.1 × 10⁻⁴ | 0.005 | 0.010 |
| GC16S-Tmax | 5.4 × 10⁻⁴ | 2.2 × 10⁻¹⁶ | 5.9 × 10⁻¹⁶ | 8.2 × 10⁻⁴ | 3.9 × 10⁻¹¹ | 1.0 × 10⁻¹⁰ |
| GC16S-Topt | 5.2 × 10⁻⁴ | 2.2 × 10⁻¹⁶ | 8.8 × 10⁻¹⁶ | 7.2 × 10⁻⁴ | 1.1 × 10⁻¹⁰ | 4.5 × 10⁻¹⁰ |
| GC16S-Tmin | 4.6 × 10⁻⁴ | 2.2 × 10⁻¹⁶ | 8.8 × 10⁻¹⁶ | 5.5 × 10⁻⁴ | 8.5 × 10⁻⁸ | 3.4 × 10⁻⁷ |
| GC23S-Tmax | 6.6 × 10⁻⁴ | 2.2 × 10⁻¹⁶ | 5.9 × 10⁻¹⁶ | 0.001 | 2.2 × 10⁻¹⁶ | 1.8 × 10⁻¹⁵ |
| GC23S-Topt | 6.5 × 10⁻⁴ | 2.2 × 10⁻¹⁶ | 8.8 × 10⁻¹⁶ | 0.001 | 1.2 × 10⁻¹⁴ | 9.5 × 10⁻¹⁴ |
| GC23S-Tmin | 4.9 × 10⁻⁴ | 2.2 × 10⁻¹⁶ | 8.8 × 10⁻¹⁶ | 8.3 × 10⁻⁴ | 8.0 × 10⁻¹¹ | 6.4 × 10⁻¹⁰ |
GC contents were the dependent variables, and growth temperatures were the independent variables. The results in this table were obtained using the Brownian motion model. Similar results obtained from three other models are deposited in Additional file 1: Tables S5-S7. PBH, Benjamini-Hochberg adjusted P value. Please see Table 1 for the meanings of the other abbreviations.
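Under the Brownian motion model, a PGLS regression is a generalized least-squares fit whose residual covariance matrix V comes from the phylogeny (V[i][j] is the branch length shared by species i and j), giving β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y. The following standard-library Python sketch, assuming a hypothetical four-species tree and hypothetical data, estimates only the intercept and slope of GC content on growth temperature; P values, the three alternative models mentioned above, and the software the authors actually used are not reproduced.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def invert(m: Matrix) -> Matrix:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment m with the identity matrix: [m | I]
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Use the row with the largest absolute pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pgls_fit(x: List[float], y: List[float], v: Matrix) -> Tuple[float, float]:
    """GLS estimate of (intercept, slope) for y = a + b*x, residual covariance ∝ v."""
    w = invert(v)  # V^-1
    n = len(x)
    X = [[1.0, xi] for xi in x]  # design matrix: intercept column and x
    # A = X^T V^-1 X (2 x 2) and c = X^T V^-1 y (length 2)
    A = [[sum(X[i][p] * w[i][j] * X[j][q] for i in range(n) for j in range(n))
          for q in range(2)] for p in range(2)]
    c = [sum(X[i][p] * w[i][j] * y[j] for i in range(n) for j in range(n))
         for p in range(2)]
    # Solve the 2 x 2 system A @ beta = c in closed form
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    a = (A[1][1] * c[0] - A[0][1] * c[1]) / det
    b = (A[0][0] * c[1] - A[1][0] * c[0]) / det
    return a, b


# Hypothetical tree ((s1:1, s2:1):1, (s3:1, s4:1):1): each species is 2 units
# from the root, and sister species share 1 unit of branch length.
v = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 2.0]]
x = [25.0, 37.0, 60.0, 80.0]        # hypothetical Topt values (°C)
y = [0.40 + 0.001 * t for t in x]   # hypothetical GC fractions on an exact line
a, b = pgls_fit(x, y, v)
print(round(b, 6))  # 0.001
```

Because the example data lie exactly on a line, the slope is recovered whatever V is; with real, scattered data the phylogenetic weighting changes the estimate relative to ordinary least squares.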
The appearance of correlations in 1000 rounds of resampling analyses
| Correlation | Significantly negative | Not significant | Significantly positive |
|---|---|---|---|
| GCw-Tmax | 0 | 974 | 26 |
| GCw-Topt | 0 | 991 | 9 |
| GCp-Tmax | 0 | 976 | 24 |
| GCp-Topt | 0 | 993 | 7 |
| GC4-Tmax | 0 | 962 | 38 |
| GC4-Topt | 0 | 992 | 8 |
| GCnon-Tmax | 0 | 974 | 26 |
| GCnon-Topt | 0 | 992 | 8 |
| GCtRNA-Tmax | 0 | 12 | 988 |
| GCtRNA-Topt | 0 | 21 | 979 |
| GC5S-Tmax | 0 | 308 | 692 |
| GC5S-Topt | 0 | 473 | 527 |
| GC16S-Tmax | 0 | 0 | 1000 |
| GC16S-Topt | 0 | 0 | 1000 |
| GC23S-Tmax | 0 | 0 | 1000 |
| GC23S-Topt | 0 | 0 | 1000 |
In each round of resampling, 155 bacteria were randomly drawn from the 681 bacteria, and PGLS regression analyses were performed on each subsample. GC contents were the dependent variables, and growth temperatures were the independent variables. The results in this table were obtained using the Brownian motion model. Please see Table 1 for the meanings of the other abbreviations. The datasets for each round of resampling are deposited in Additional file 2: Data S1.
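The resampling procedure described above can be sketched as a loop that subsamples, refits, and tallies each outcome into the three columns of the table. In this Python sketch the regression is passed in as a function, since a real PGLS fit would also need the tree pruned to each subsample; drawing without replacement, the 0.05 threshold, and the placeholder fit are assumptions, not details confirmed by the source.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def resampling_tally(
    records: Sequence[T],
    n_draw: int,
    rounds: int,
    fit: Callable[[List[T]], Tuple[float, float]],
    alpha: float = 0.05,   # assumed significance threshold
    seed: int = 0,
) -> Dict[str, int]:
    """Repeatedly subsample `records`, refit, and classify each round's slope.

    `fit` receives one subsample and must return (slope, p_value).
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    counts = {"significantly negative": 0, "not significant": 0,
              "significantly positive": 0}
    pool = list(records)
    for _ in range(rounds):
        sample = rng.sample(pool, n_draw)  # drawn without replacement (assumption)
        slope, p = fit(sample)
        if p >= alpha:
            counts["not significant"] += 1
        elif slope > 0:
            counts["significantly positive"] += 1
        else:
            counts["significantly negative"] += 1
    return counts


def placeholder_fit(sample):  # hypothetical stand-in for a PGLS fit
    return 0.001, 0.01


print(resampling_tally(list(range(681)), 155, 1000, placeholder_fit))
# {'significantly negative': 0, 'not significant': 0, 'significantly positive': 1000}
```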
PGLS regression of GC contents and growth temperatures in chromosomes and plasmids
| Variables | Plasmid slope | Plasmid P | Plasmid PBH | Chromosome slope | Chromosome P | Chromosome PBH |
|---|---|---|---|---|---|---|
| GCw-Tmax | 0.001 | 0.009 | 0.043 | 9.6 × 10⁻⁴ | 0.029 | 0.043 |
| GCw-Topt | 0.001 | 0.005 | 0.031 | 9.6 × 10⁻⁴ | 0.023 | 0.031 |
| GCp-Tmax | 0.001 | 0.016 | 0.043 | 9.1 × 10⁻⁴ | 0.038 | 0.046 |
| GCp-Topt | 0.001 | 0.010 | 0.031 | 9.2 × 10⁻⁴ | 0.031 | 0.034 |
| GC4-Tmax | 0.002 | 0.072 | 0.072 | 0.002 | 0.027 | 0.043 |
| GC4-Topt | 0.002 | 0.044 | 0.044 | 0.002 | 0.017 | 0.031 |
| GCnon-Tmax | 8.3 × 10⁻⁴ | 0.055 | 0.060 | 0.001 | 0.021 | 0.043 |
| GCnon-Topt | 9.3 × 10⁻⁴ | 0.025 | 0.031 | 0.001 | 0.021 | 0.031 |
GC contents were the dependent variables, and growth temperatures were the independent variables. The results in this table were obtained using the Brownian motion model. Similar results obtained from three other models are deposited in Additional file 1: Tables S8-S10. PBH, Benjamini-Hochberg adjusted P value. Please see Table 1 for the meanings of the other abbreviations.
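The PBH columns report Benjamini-Hochberg adjusted P values. A minimal Python sketch of the standard step-up adjustment (the same rule as R's `p.adjust(p, method = "BH")`) is shown below on hypothetical P values; which set of tests the authors adjusted together is not stated here.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order.

    The k-th smallest P value p_(k) becomes min over j >= k of (m / j) * p_(j),
    capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 also caps every adjusted value at 1
    # Walk from the largest P value down, keeping a running minimum so the
    # adjusted values stay monotone in the original P values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values; the adjusted values are ≈ [0.02, 0.04, 0.04, 0.02]
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```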
PGLS analysis of GC contents and growth temperatures in core genes and accessory genes
| Variables | Core genes slope | Core genes P | Core genes PBH | Accessory genes slope | Accessory genes P | Accessory genes PBH |
|---|---|---|---|---|---|---|
| GCp-Tmax | 7.6 × 10⁻⁴ | 9.6 × 10⁻⁴ | 0.002 | 9.0 × 10⁻⁴ | 0.001 | 0.002 |
| GCp-Topt | 6.4 × 10⁻⁴ | 0.007 | 0.025 | 6.3 × 10⁻⁴ | 0.026 | 0.030 |
| GC4-Tmax | 0.002 | 6.3 × 10⁻⁴ | 0.002 | 0.002 | 0.003 | 0.003 |
| GC4-Topt | 0.002 | 0.004 | 0.025 | 0.002 | 0.019 | 0.030 |
GC contents were the dependent variables, and growth temperatures were the independent variables. The results in this table were obtained using the Brownian motion model. Similar results obtained from three other models are deposited in Additional file 1: Tables S12-S14. PBH, Benjamini-Hochberg adjusted P value. Please see Table 1 for the meanings of the other abbreviations.
Fig. 1 Pairwise comparison of the GC contents between closely related prokaryotes with different growth temperature ranges. Both bacteria (A) and archaea (B) were classified into four ranks according to their growth temperatures, from low to high: psychrophiles/psychrotrophiles, mesophiles, thermophiles, and hyperthermophiles. The diagonal line represents cases in which prokaryotes of different ranks have the same GC content. Points above the line (153 pairs of bacteria and 17 pairs of archaea) represent cases in which the prokaryote of the higher rank has a higher GC content than its paired relative, while points below the line (119 pairs of bacteria and 24 pairs of archaea) indicate the reverse. The P values were calculated using two-tailed Wilcoxon signed-rank tests. The exact GC contents are provided in Additional file 1: Table S18.
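Fig. 1 tests whether paired GC-content differences are centered on zero with a two-tailed Wilcoxon signed-rank test. The Python sketch below computes the W+ statistic and a normal-approximation P value with tie correction and no continuity correction; these choices, and the hypothetical GC values in the usage lines, are assumptions, and statistical packages may use exact P values for small samples.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def wilcoxon_signed_rank(higher: Sequence[float],
                         lower: Sequence[float]) -> Tuple[float, float]:
    """Two-tailed Wilcoxon signed-rank test on paired samples (normal approximation).

    Returns (W_plus, p). Zero differences are dropped; tied |differences| get
    average ranks and the variance is tie-corrected; no continuity correction.
    """
    diffs = [h - l for h, l in zip(higher, lower) if h != l]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    ties = Counter(abs(d) for d in diffs).values()
    var = n * (n + 1) * (2 * n + 1) / 24 - sum(t ** 3 - t for t in ties) / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p


# Hypothetical GC fractions of five warmer-ranked species and their cooler relatives
warmer = [0.52, 0.48, 0.61, 0.45, 0.57]
cooler = [0.50, 0.49, 0.55, 0.40, 0.54]
w, p = wilcoxon_signed_rank(warmer, cooler)
print(w, round(p, 3))
```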
Fig. 2 Positive correlations between the sudden changes in GC content and growth temperature of bacteria. Following Mahajan and Agashe [3], the evolutionary jumps of GCw (whole-genome GC content) and Topt (optimal growth temperature) in the bacterial phylogenetic tree were detected using the Lévy jumps model [57]. (A) The magnitudes of the GCw jumps are significantly correlated with the accompanying changes in Topt (Spearman's rank correlation, two-tailed, n = 108, rho = 0.209, P = 0.030). (B) The magnitudes of the Topt jumps are significantly correlated with the accompanying changes in GCw (Spearman's rank correlation, two-tailed, n = 86, rho = 0.280, P = 0.009). The exact values shown in this figure are provided in Additional file 1: Tables S19-S20.
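Spearman's rank correlation, reported in panels A and B of Fig. 2, is Pearson's correlation computed on ranks. This Python sketch computes rho with average ranks for ties on hypothetical jump magnitudes; it does not compute the P value or reproduce the Lévy-jump detection.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r on average ranks (undefined if either input is constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical GCw jump magnitudes and the accompanying Topt changes (°C)
gc_jumps = [0.01, 0.03, 0.02, 0.05, 0.04]
topt_changes = [1.5, 0.5, 3.0, 2.0, 6.0]
print(round(spearman_rho(gc_jumps, topt_changes), 3))  # 0.3
```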
Fig. 3 Relationship between whole-genome GC content (GCw) and optimal growth temperature (Topt) in Archaea. The Topt ranges of Halobacteria (n = 151), other halophilic archaea (n = 2), and nonhalophilic archaea (n = 150) are 30 to 53 °C, 31 to 38 °C, and 23.6 to 106 °C, respectively. Phylogenetic generalized least squares regression using the Ornstein-Uhlenbeck model, with the ancestral state at the root estimated, showed a significant positive correlation between GCw and Topt in nonhalophilic archaea (slope = 0.001, P = 0.025).
Fig. 4 Nonlinearity in the relationship between prokaryotic optimal growth temperature and GC contents, estimated using a generalized additive mixed model (GAMM) with genus as a random effect. The dataset of 836 prokaryotes (681 bacteria and 155 archaea) was used in this analysis. Because 5S rRNA genes were not annotated in 60 genomes, the 5S rRNA analysis has a sample size of 776. The effective degrees of freedom (edf) serve as a proxy for nonlinearity in the relationships. Panels (A), (B), (C), (D), (E), and (F) show the relationships of optimal growth temperature with the GC contents of the whole genome, fourfold degenerate sites, tRNA, 5S rRNA, 16S rRNA, and 23S rRNA, respectively; those for the protein-coding sequences and the non-coding DNA are deposited in Additional file 3: Fig. S1. The significance values of the results presented in (A)–(F) are P = 10⁻⁴, 8 × 10⁻⁷, 2 × 10⁻¹⁶, 2 × 10⁻¹⁶, 2 × 10⁻¹⁶, and 2 × 10⁻¹⁶, respectively.