Meng-Ze Du1, Shuo Liu1, Zhi Zeng1, Labena Abraham Alemayehu1, Wen Wei2, Feng-Biao Guo3,4,5.
Abstract
Inconsistent results on the association between evolutionary rates and amino acid composition of proteins have been reported in eukaryotes. However, there are few studies of how amino acid composition influences evolutionary rates in bacteria. We therefore constructed linear regression models between the composition frequencies of amino acids and evolutionary rates for bacteria. The compositions of all amino acids explain, on average, 21.5% of the variation in evolutionary rates among 273 investigated bacterial organisms. In five model organisms, amino acid composition contributes more to the variation in evolutionary rates than protein abundance and frequency of optimal codons. The contribution of individual amino acid composition to evolutionary rate varies among organisms. The closer the GC content of a genome is to its maximum or minimum, the better the correlation between amino acid content and the evolutionary rate of proteins in that genome. The amino acids that significantly contribute to evolutionary rates can be grouped into GC-rich and AT-rich amino acids. In addition, amino acids with high composition in the proteome contribute more to evolutionary rates than amino acids with low composition. In summary, amino acid composition significantly contributes to the rate of evolution in bacterial organisms, and this contribution is in turn affected by GC content.
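The regression described in the abstract can be illustrated with a minimal Python sketch. It assumes ordinary least squares with an intercept, which the abstract does not specify. The function names `solve_linear_system` and `fit_mlr` and all toy values are hypothetical and are not taken from the paper. Because the 20 amino acid frequencies of a protein sum to 1, a full 20-predictor model with an intercept would be collinear unless one amino acid is dropped; the sketch uses only two predictors to sidestep this.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copy rows so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(features, response):
    """Fit response ~ intercept + features by least squares.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    Assumes the response is not constant (otherwise R^2 is undefined).
    """
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(response) / len(response)
    ss_res = sum((y - f) ** 2 for y, f in zip(response, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in response)
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical toy data: per-protein frequencies of two amino acids
# (say V and G) and a made-up evolutionary rate for each protein.
toy_features = [(0.05, 0.08), (0.07, 0.06), (0.06, 0.09), (0.09, 0.05), (0.08, 0.07)]
toy_rates = [0.12, 0.19, 0.13, 0.22, 0.18]
coefficients, r_squared = fit_mlr(toy_features, toy_rates)
print("coefficients:", coefficients)
print("R^2:", r_squared)
```

Here R² plays the role of the "variation explained" figure quoted in the abstract: a value of 0.215 would mean the predictors account for 21.5% of the variance in the response.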
Year: 2018 PMID: 29743515 PMCID: PMC5943316 DOI: 10.1038/s41598-018-25364-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Multivariate linear regression models between amino acids and evolutionary rates. (A) The 273 genome pairs belong to 18 phyla. The corresponding genome pair counts and the average R² of the multivariate linear regression between amino acid compositions and evolutionary rates are shown. (B) For the 273 organisms, the total coefficient of determination R² ranges from 0 to 0.6, with P < 0.05. (C) GC content influences the total coefficient of determination R² of the multivariate linear regression between amino acid compositions and evolutionary rates. (D) Genome size correlates negatively with the total coefficient of determination R². (E) Evolutionary rates of proteins in the five model organisms; the corresponding averages are 0.26, 0.11, 0.13, 0.16, and 0.15.
The linear models for evolutionary rates and abundance/Fop/amino acid compositions.
| Organism | Reference genome | Homologous protein numbers | Evolutionary rate ~ abundance | Evolutionary rate ~ Fop | Evolutionary rate ~ amino acid compositions | Variables significantly contributing to evolutionary rates (positive; negative) | Evolutionary rate ~ abundance, Fop and amino acid compositions | Variables significantly contributing to evolutionary rates (positive; negative) |
|---|---|---|---|---|---|---|---|---|
| E. coli (NC_000913) | NC_014479 | 610 | 0.0187 | 0.0536 | 0.1827 | L,V,W; H,R,G,Y | 0.1836 | V,L,W; R,Q,G,H,Y,Fop |
| M. tuberculosis (NC_000962) | NC_015125 | 351 | 0.0368 | 0.2233 | 0.2968 | A,V; I,D,K | 0.3370 | A,V; K,D,Fop |
| B. subtilis (NC_000964) | NC_014829 | 1134 | 0.0275 | 0.0899 | 0.2215 | L,W,F,I,S,Y; D,N,P,E,R,G | 0.2263 | K,W,F,L,A,I,S; N,D,E,P,R,G,Fop |
| S. pyogenes (NC_002737) | NC_007350 | 455 | 0.0349 | 0.0723 | 0.2582 | V,Y,L,M; N,E,D,G,R,P | 0.2422 | V,Y,M; N,Fop,E,D,G,P,R |
| D. vulgaris (NC_002937) | NC_006832 | 127 | 0.0101# | 0.0748 | 0.2547 | V; N,H,Y,I | 0.2332 | V; N,Y,I,H |
*All MLRs in this table have P values less than 0.05, except the model marked #. #This linear model has P = 0.45, which is nonsignificant.
Figure 2. GC content influences the contributions of amino acid compositions to the evolutionary rates in GC-rich and AT-rich organisms. (A) The relationship between GC content and the contributions of amino acid compositions to evolutionary rates in the MLRs. (B) The number of genomes in which each amino acid composition contributes negatively/positively to the MLR in GC-rich and AT-rich organisms. The 20 amino acid types are represented by the letters A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W and Y. (C) Scatter plots of amino acid compositions against GC content. For GC-rich/AT-rich amino acids, the average compositions of these amino acids in the 273 organisms are positively/negatively correlated with GC content (P ≪ 0.05).
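The relationship in panel (C) can be sketched as a correlation between an amino acid's average frequency and genomic GC content. The following minimal Python example assumes Pearson correlation, which the caption does not name. The helper names `amino_acid_composition` and `pearson_r`, the toy sequence, and the toy numbers are hypothetical and are not data from the paper.

```python
from collections import Counter
from math import sqrt


def amino_acid_composition(protein_sequence):
    """Return the frequency of each residue letter in one protein sequence."""
    counts = Counter(protein_sequence)
    total = len(protein_sequence)
    return {residue: count / total for residue, count in counts.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Composition of a single made-up peptide.
print(amino_acid_composition("MKVLAAGG"))

# Hypothetical per-genome values: GC content (%) and the average
# frequency of one amino acid across that genome's proteins.
toy_gc_content = [30.0, 40.0, 50.0, 60.0, 70.0]
toy_average_frequency = [0.060, 0.075, 0.085, 0.100, 0.120]
print(pearson_r(toy_gc_content, toy_average_frequency))
```

A positive coefficient corresponds to what the caption calls a GC-rich amino acid, and a negative one to an AT-rich amino acid.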
Figure 3. Amino acid composition and evolutionary rates. (A) Boxplot of the correlation index between amino acid composition and evolutionary rate in 56 genomes with GC content in the range 45% to 55%. (B) Boxplot of the average amino acid compositions in 56 genomes with GC content in the range 45% to 55%. (C) Scatterplot of contributions for the 273 organisms. The horizontal axis represents the organism. Each organism has two corresponding points: one for the most abundant amino acids and one for the remaining amino acids.
Figure 4. Average Ka/Ks for groups of genomes with different GC contents. Boxplot of Ka/Ks for GC-rich genomes (GC content > 64%), AT-rich genomes (GC content < 32%) and GC-middle genomes (GC content 47% to 53%). The mean Ka/Ks values for the three groups are 0.0539, 0.0776 and 0.1132, respectively. Student's t test showed that Ka/Ks in the GC-rich and AT-rich groups is significantly lower than in the GC-middle group (P = 4.111e-07). The higher Ka/Ks ratio in the GC-middle group may be evidence of relaxed negative selection.
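The group comparison in Figure 4 can be sketched as a two-sample Student's t statistic. This minimal Python example assumes the classical pooled-variance (equal variance) form; the caption does not say which variant the authors used. The function name `student_t_statistic` and the toy Ka/Ks values are hypothetical. The sketch stops at the statistic and its degrees of freedom, since turning them into a P value requires the t distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def student_t_statistic(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Each sample needs at least two
    values, and the samples must not both have zero variance.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    degrees_of_freedom = n_a + n_b - 2
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_variance = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / degrees_of_freedom
    standard_error = sqrt(pooled_variance * (1.0 / n_a + 1.0 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, degrees_of_freedom


# Hypothetical per-genome average Ka/Ks values for two groups.
toy_gc_rich = [0.048, 0.055, 0.051, 0.060, 0.057]
toy_gc_middle = [0.105, 0.118, 0.110, 0.121, 0.112]
t_value, dof = student_t_statistic(toy_gc_rich, toy_gc_middle)
print("t =", t_value, "with", dof, "degrees of freedom")
```

A negative t here means the first group has the lower mean, which is the direction the caption reports for the GC-rich and AT-rich groups relative to the GC-middle group.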