Melissa Ilardo1, Rudrarup Bose2, Markus Meringer3, Bakhtiyor Rasulev4, Natalie Grefenstette5, James Stephenson6,7, Stephen Freeland8, Richard J Gillams9,10, Christopher J Butch9,11,12, H James Cleaves13,14,15.
Abstract
Life uses a common set of 20 coded amino acids (CAAs) to construct proteins. This set was likely canonicalized during early evolution; before this, smaller amino acid sets were gradually expanded as new synthetic, proofreading and coding mechanisms became biologically available. Many possible subsets of the modern CAAs or other presently uncoded amino acids could have comprised the earlier sets. We explore the hypothesis that the CAAs were selectively fixed due to their unique adaptive chemical properties, which facilitate folding, catalysis, and solubility of proteins, and gave adaptive value to organisms able to encode them. Specifically, we studied in silico hypothetical CAA sets of 3–19 amino acids drawn from a library of 1913 structurally diverse α-amino acids, exploring the adaptive value of their combined physicochemical properties relative to those of the modern CAA set. We find that even hypothetical sets containing modern CAA members are especially adaptive; it is difficult to find sets, even among a large choice of alternatives, that cover the chemical property space more amply. These results suggest that each time a CAA was discovered and embedded during evolution, it provided an adaptive value unusual among many alternatives, and each selective step may have helped bootstrap the developing set to include still more CAAs.
Year: 2019 PMID: 31462646 PMCID: PMC6713743 DOI: 10.1038/s41598-019-47574-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Summary of results of better XAA sets as a function of set size.
| Set size (no. AAs) | # CAA Set Combinationsᵃ | # CAA Maximal Setsᵇ | % Better XAA Sets in 10⁸ Trials |
|---|---|---|---|
| 3 | 1,140 | 509 | 0.499 |
| 4 | 4,845 | 1,250 | 0.242 |
| 5 | 15,504 | 2,151 | 0.161 |
| 6 | 38,760 | 2,875 | 4.65 × 10⁻² |
| 7 | 77,520 | 3,044 | 1.68 × 10⁻² |
| 8 | 125,970 | 3,177 | 7.52 × 10⁻³ |
| 9 | 167,960 | 2,787 | 3.58 × 10⁻³ |
| 10 | 184,756 | 2,160 | 1.23 × 10⁻³ |
| 11 | 167,960 | 1,566 | 3.77 × 10⁻⁴ |
| 12 | 125,970 | 1,181 | 1.59 × 10⁻⁴ |
| 13 | 77,520 | 799 | 1.75 × 10⁻⁴ |
| 14 | 38,760 | 504 | 1.68 × 10⁻⁴ |
| 15 | 15,504 | 289 | 2.10 × 10⁻⁴ |
| 16 | 4,845 | 165 | 3.24 × 10⁻⁴ |
| 17 | 1,140 | 74 | 2.07 × 10⁻⁴ |
| 18 | 190 | 28 | 6.47 × 10⁻⁵ |
| 19 | 20 | 8 | 1.09 × 10⁻⁵ |
| 20ᶜ | 1 | 1 | 6 × 10⁻⁶ |

ᵃ This number is derived from the formula for binomial coefficients; see SI Section 1.
ᵇ This is the number of maximal sets; see the Methods section.
ᶜ There is only one possible set of the 20 CAAs, which is also maximal, and only 6 better XAA sets were found in a previous study, of which several contained CAAs[3].
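
The "# CAA Set Combinations" column can be reproduced from the binomial coefficient C(20, k), the number of ways to choose k amino acids from the 20 CAAs. The following is a minimal sketch of that arithmetic only; the function name `caa_subset_counts` is a hypothetical helper chosen for illustration and is not taken from the study's code.

```python
from math import comb

N_CAA = 20  # size of the modern coded amino acid (CAA) set


def caa_subset_counts(n=N_CAA, sizes=range(3, 21)):
    """Return {set size k: number of distinct k-member subsets of n items}."""
    return {k: comb(n, k) for k in sizes}


counts = caa_subset_counts()
for k, c in counts.items():
    # Print each set size next to its subset count, with thousands separators.
    print(f"{k:>2}  {c:>7,}")
```

Because C(20, k) = C(20, 20 − k), the counts are symmetric about set size 10, which is why the column rises to 184,756 and then mirrors itself.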
Figure 1. Semi-log plot showing the results of two 10⁸ samplings (yellow and blue bars) for better XAA sets of a given set size (shown on the x-axis) from the XAA library. The number of better XAA sets decreases approximately logarithmically with the exception of sets of size 13 to 18.
Figure 2. Box plots showing the relative frequency of (A) CAAs and (B) XAAs in better sets. Boxes extend from the lower to upper quartile values of the data, with a line at the median, and whiskers extending to contain 95% of the data. Zoomed insets help show that the median values for maximal CAA sets in (A) are always higher than those for XAA sets of the corresponding size in (B). In (A), all top outlier data points represent Met in set sizes 16–19. The connected data points in (B) highlight the anomalously high frequency of the CAAs Ala, Cys and Pro in better XAA sets of larger set size.
Figure 3. Relative frequency at which the individual CAAs are found in maximal CAA and better XAA sets. (A) shows the raw relative frequency of occurrence of the CAAs in maximal sets. (B) shows the Z-value for the frequency distribution shown in (A). In (C), green corresponds to a particular CAA occurring at high frequency relative to the other CAAs in sets, while red corresponds to low frequency. In (D), the absolute difference between the relative frequencies (Z-values) shown in (B) and (C) highlights areas where the relative frequency of a particular amino acid varies between maximal and better sets, possibly indicating CAAs that have different importance depending on the context in which they are compared. Dark blue indicates a large difference between the frequency of a particular CAA in maximal CAA vs. better XAA (i.e., those selected from the total XAA pool) sets. The direction of the bias can be determined by referring to panels (B) and (C). Rounded raw values are shown for reference in each data cell.
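
The comparison described for panels (B) to (D) can be sketched as follows, assuming the Z-value is the usual standard score (frequency minus the mean, divided by the standard deviation, taken across the amino acids being compared). The record does not state the exact normalization, so that definition is an assumption, as is the helper name `z_scores`. The frequencies below are placeholders invented for illustration, not values from the study.

```python
from statistics import mean, pstdev


def z_scores(freqs):
    """Standardize a {name: frequency} mapping to zero mean and unit variance.

    Assumes a population standard deviation; raises ZeroDivisionError
    if all frequencies are identical.
    """
    mu = mean(freqs.values())
    sigma = pstdev(freqs.values())
    return {aa: (f - mu) / sigma for aa, f in freqs.items()}


# Placeholder frequencies for illustration only (NOT data from the study).
maximal = {"Ala": 0.30, "Cys": 0.10, "Pro": 0.20, "Met": 0.40}
better = {"Ala": 0.40, "Cys": 0.30, "Pro": 0.20, "Met": 0.10}

z_max = z_scores(maximal)      # analogue of panel (B)
z_better = z_scores(better)    # analogue of panel (C)

# Analogue of panel (D): absolute difference between the two Z-values.
abs_diff = {aa: abs(z_max[aa] - z_better[aa]) for aa in maximal}
```

A large entry in `abs_diff` marks an amino acid whose standing relative to the others differs between the two kinds of set; the sign of the bias has to be read from the two Z-value mappings, as the caption notes for panels (B) and (C).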