Joseph B Ahrens, Jordon Rahaman, Jessica Siltberg-Liberles.
Abstract
Various structural and functional constraints govern the evolution of protein sequences. As a result, the relative rates of amino acid replacement among sites within a protein can vary significantly. Previous large-scale work on Metazoan (Animal) protein sequence alignments indicated that amino acid replacement rates are partially driven by a complex interaction among three factors: intrinsic disorder propensity; secondary structure; and functional domain involvement. Here, we use sequence-based predictors to evaluate the effects of these factors on site-specific sequence evolutionary rates within four eukaryotic lineages: Metazoans; Plants; Saccharomycete Fungi; and Alveolate Protists. Our results show broad, consistent trends across all four eukaryotic groups. In all four lineages, amino acid replacement rates are significantly higher when comparing (i) disordered vs. ordered sites; (ii) random coil sites vs. sites in secondary structures; and (iii) inter-domain linker sites vs. sites in functional domains. Additionally, within Metazoans, Plants, and Saccharomycetes, there is a strong confounding interaction between intrinsic disorder and secondary structure: alignment sites exhibiting both high disorder propensity and involvement in secondary structures have very low average rates of sequence evolution. Analysis of gene ontology (GO) terms revealed that in all four lineages, a high fraction of sequences containing these conserved, disordered-structured sites are involved in nucleic acid binding. We also observe notable differences in the statistical trends of Alveolates, where intrinsically disordered sites are more variable than in other eukaryotes and the statistical interactions between disorder and other factors are less pronounced.
Keywords: Eukaryotes; evolutionary rates; intrinsic disorder; protein sequence; structural prediction
Year: 2018 PMID: 30441862 PMCID: PMC6265720 DOI: 10.3390/genes9110553
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Scatterplots showing the minimum pairwise sequence identity (fraction of matching aligned characters) and minimum alignment coverage (sequence length/alignment length) for all Metazoan, Plant, Saccharomycete, and Alveolate clusters used in the analyses.
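The two per-cluster quality measures in Figure 1 can be illustrated with a short sketch. This is not the authors' code: the helper names (`pairwise_identity`, `coverage`, `cluster_summary`) are hypothetical, gaps are assumed to be written as `-`, and identity is assumed to be computed only over columns where both sequences have a residue, since the caption does not state how gapped columns are counted.

```python
from itertools import combinations

GAP = "-"  # assumption: gaps are encoded as '-'


def pairwise_identity(a, b):
    """Fraction of identical residues among columns where both sequences have a residue.

    Assumption: columns containing a gap in either sequence are excluded.
    """
    compared = matches = 0
    for x, y in zip(a, b):
        if x == GAP or y == GAP:
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def coverage(seq):
    """Ungapped sequence length divided by alignment length."""
    return sum(1 for c in seq if c != GAP) / len(seq)


def cluster_summary(alignment):
    """Return (minimum pairwise identity, minimum coverage) for one aligned cluster."""
    min_identity = min(pairwise_identity(a, b) for a, b in combinations(alignment, 2))
    min_coverage = min(coverage(s) for s in alignment)
    return min_identity, min_coverage


# Toy three-sequence alignment, for illustration only.
toy_alignment = ["MKV-LA", "MKVILA", "MRV-LA"]
min_id, min_cov = cluster_summary(toy_alignment)
print(min_id, min_cov)
```

Each cluster contributes one point (its weakest pair and its least-covered sequence), which is why the minimum rather than the mean is reported.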
Dataset-specific information for nonparametric analysis.
| Dataset | Metazoans | Plants | Saccharomycetes | Alveolates |
|---|---|---|---|---|
| Clusters | 6938 | 8266 | 4494 | 2697 |
| Sequences | 130632 | 198081 | 122132 | 44060 |
| Total Alignment Sites | 4677490 | 4703587 | 2990109 | 1640297 |
| Gap-free sites | 3217225 | 2851827 | 1954761 | 1179122 |
| Ordered Sites | 1819695 | 1706275 | 1223656 | 801629 |
| Disordered Sites | 373639 | 234853 | 125047 | 113892 |
| Structured sites | 1062380 | 1014001 | 722444 | 417702 |
| Random coil sites | 1314563 | 1064725 | 670357 | 424795 |
| Domain sites | 1436746 | 1175745 | 936813 | 422813 |
| Linker sites | 1368702 | 1289830 | 817371 | 657080 |
| Median Order Rate | −0.599 | −0.625 | −0.6188 | −0.605 |
| Median Disorder Rate | −0.3155 | −0.2916 | −0.3271 | 0.1426 |
| Median Structure Rate | −0.5787 | −0.6262 | −0.5935 | −0.605 |
| Median Coil Rate | −0.4682 | −0.5013 | −0.5603 | −0.4542 |
| Median Domain Rate | −0.62345 | −0.6679 | −0.6353 | −0.629 |
| Median Linker Rate | −0.3698 | −0.3718 | −0.3902 | −0.3569 |
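The table reports medians from a nonparametric analysis, but this excerpt does not name the test used. As one common choice for comparing two groups of site-wise rates (for example, ordered vs. disordered sites), the sketch below implements a two-sided Mann-Whitney U test with the normal approximation and a tie correction. The helper names are hypothetical, and no continuity correction is applied; `scipy.stats.mannwhitneyu` offers a more complete implementation.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, normal approximation with tie correction.

    Returns (U statistic for sample x, z score, two-sided p-value).
    No continuity correction is applied.
    """
    pooled = list(x) + list(y)
    n1, n2, n = len(x), len(y), len(pooled)
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction term: sum of (t^3 - t) over each group of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_u <= 0:
        return u1, 0.0, 1.0  # all values identical: no evidence of a shift
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, z, p


# Hypothetical normalized rates, for illustration only.
ordered = [-0.9, -0.7, -0.6, -0.5]
disordered = [-0.3, 0.1, 0.4, 0.9]
print(mann_whitney_u(ordered, disordered))
```

With hundreds of thousands of sites per category, as in the table, the normal approximation is accurate, and even small shifts in median rate will reach statistical significance.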
Figure 2. Split violin plots showing differences in normalized site-specific rates of amino acid replacement in: (a) ordered vs. disordered sites; (b) structured vs. coil sites; and (c) domain vs. linker sites within four eukaryotic datasets. Middle dashed lines indicate medians and outer dashed lines indicate quartiles.
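The dashed lines inside each violin mark the quartiles and median of one group of rates. A minimal sketch of that summary, using the standard-library `statistics.quantiles`, follows; `violin_summary` is a hypothetical helper, and the interpolation convention (`method="inclusive"`, linear interpolation between order statistics) is an assumption, since plotting tools differ.

```python
import statistics


def violin_summary(rates):
    """Return (Q1, median, Q3) for one group of normalized site-wise rates.

    method="inclusive" interpolates linearly between order statistics, the same
    scheme as NumPy's default percentile; other tools may use other conventions.
    """
    q1, median, q3 = statistics.quantiles(rates, n=4, method="inclusive")
    return q1, median, q3


# Hypothetical rates for two site categories, for illustration only.
groups = {
    "Ordered": [-0.9, -0.7, -0.6, -0.4, 0.1],
    "Disordered": [-0.5, -0.3, 0.0, 0.4, 1.2],
}
for name, rates in groups.items():
    print(name, violin_summary(rates))
```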
Figure 3. Trace plots illustrating first-order interactions among all site-wise binary factors: order (Order) vs. intrinsic disorder (Disorder), secondary structure (Structure) vs. random coil (Coil), and functional domain (Domain) vs. interdomain linker (Linker). Trace factors (solid vs. dashed lines) are indicated to the right of each row of plots. Each column of plots corresponds to one of the four datasets (indicated above). Y-axes represent mean normalized evolutionary rates.
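Each trace plot shows four cell means, one per combination of two binary factors; non-parallel lines signal an interaction. The sketch below computes those cell means and a simple interaction contrast. The site representation (dicts with a `rate` key and boolean factor keys such as `disorder` and `structure`) and both helper names are hypothetical, and sites without a call for either factor are simply skipped.

```python
from collections import defaultdict
from statistics import mean


def interaction_means(sites, factor_a, factor_b):
    """Mean rate for each (factor_a, factor_b) combination of boolean site calls.

    Sites lacking a call for either factor (key missing or None) are skipped.
    """
    cells = defaultdict(list)
    for site in sites:
        a, b = site.get(factor_a), site.get(factor_b)
        if a is None or b is None:
            continue  # unclassified site for this factor pair
        cells[(a, b)].append(site["rate"])
    return {cell: mean(rates) for cell, rates in cells.items()}


def interaction_contrast(means):
    """Difference in the factor_b effect between factor_a=True and factor_a=False.

    Zero means parallel trace lines; a large magnitude means a strong interaction.
    """
    effect_when_a = means[(True, True)] - means[(True, False)]
    effect_when_not_a = means[(False, True)] - means[(False, False)]
    return effect_when_a - effect_when_not_a


# Hypothetical sites, for illustration only.
sites = [
    {"rate": -1.0, "disorder": True, "structure": True},
    {"rate": 1.0, "disorder": True, "structure": False},
    {"rate": -0.75, "disorder": False, "structure": True},
    {"rate": -0.25, "disorder": False, "structure": False},
    {"rate": -0.75, "disorder": False, "structure": False},
    {"rate": 5.0, "structure": True},  # no disorder call: skipped
]
cell_means = interaction_means(sites, "disorder", "structure")
print(cell_means, interaction_contrast(cell_means))
```

In this toy data, structure lowers the rate far more among disordered sites than among ordered ones, which is the kind of pattern the abstract describes for disordered-structured sites.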
Figure 4. Scatterplot showing the disorder content of each cluster (fraction of disordered alignment sites) against the mean rate of sequence evolution among sites predicted to be both disordered and structured. Only clusters containing disordered/structured sites are shown. Trend lines for each of the four eukaryotic datasets were fitted using LOESS regression. Note that the Alveolate trend line (dashed) is consistently higher than those of the other lineages.
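Figure 4 reduces each cluster to one point and then fits a smooth trend. The sketch below shows both steps under stated assumptions: the site dicts and helper names are hypothetical, disorder content is taken over all sites passed in (the caption does not say whether gapped sites count), and the smoother is a basic local linear LOESS with tricube weights and no robustness iterations. The span and polynomial degree used for the published trend lines are not given in this excerpt; `frac=0.75` is only an illustrative default.

```python
import math


def cluster_point(sites):
    """Return (disorder content, mean rate of disordered+structured sites), or None."""
    disordered = [s for s in sites if s["disorder"]]
    both = [s["rate"] for s in disordered if s["structure"]]
    if not both:
        return None  # cluster has no disordered/structured sites, so it is not plotted
    return len(disordered) / len(sites), sum(both) / len(both)


def loess(xs, ys, x0, frac=0.75):
    """Local linear LOESS estimate at x0 with tricube weights (no robustness iterations)."""
    n = len(xs)
    k = min(n, max(2, math.ceil(frac * n)))  # points in the local window
    dists = [abs(x - x0) for x in xs]
    h = sorted(dists)[k - 1]  # distance to the k-th nearest point
    if h > 0:
        w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dists]
    else:
        w = [0.0] * n
    if sum(w) == 0:
        # Degenerate window (neighbours at distance 0 or exactly h): equal weights.
        w = [1.0 if d <= h else 0.0 for d in dists]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return ybar + slope * (x0 - xbar)


# Hypothetical cluster, for illustration only.
toy_cluster = [
    {"rate": -1.0, "disorder": True, "structure": True},
    {"rate": 0.5, "disorder": True, "structure": False},
    {"rate": -0.5, "disorder": False, "structure": True},
    {"rate": 0.0, "disorder": False, "structure": False},
]
print(cluster_point(toy_cluster))
```

To draw one trend line per lineage, one would collect the non-`None` points for that lineage's clusters, split them into `xs` and `ys`, and evaluate `loess(xs, ys, x)` over a sorted grid of `x` values.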