Lou Nielly-Thibault, Christian R Landry.
Abstract
Proteins are among the most important constituents of biological systems. Because all protein-coding genes have a noncoding ancestral form, the properties of noncoding sequences and how they shape the birth of novel proteins may influence the structure and function of all proteins. Differences between the properties of young proteins and random expectations from noncoding sequences have previously been interpreted as the result of natural selection. However, interpreting such deviations requires a yet-unattained understanding of the raw material of de novo gene birth and its relation to novel functional proteins. We mathematically show that the average properties and selective filtering of the "junk" polypeptides of which this raw material is composed are not the only factors influencing the properties of novel functional proteins. We find that in some biological scenarios, they also depend on the variance of the properties of junk polypeptides and their correlation with the rate of allelic turnover, which may itself depend on mutational biases. This suggests, for instance, that any property of polypeptides that accelerates their exploration of the sequence space could be overrepresented in novel functional proteins, even if it has a limited effect on adaptive value. To exemplify the use of our general theoretical results, we build a simple model that predicts the mean length and mean intrinsic disorder of novel functional proteins from the genomic GC content and a single evolutionary parameter. This work provides a theoretical framework that can guide the prediction and interpretation of results when studying the de novo emergence of protein-coding genes.
Keywords: GC content; de novo gene birth; neutral evolution; novel proteins; random sequences
Year: 2019 PMID: 31227545 PMCID: PMC6707459 DOI: 10.1534/genetics.119.302187
Source DB: PubMed Journal: Genetics ISSN: 0016-6731 Impact factor: 4.562
Figure 1. A new classification of polypeptides to clarify the process of de novo gene birth. (A) Evolution of a single polypeptide from JP to novFP to derFP. A JP is a polypeptide whose beneficial effects are either nonexistent or have not yet caused the loss of a nonsynonymous or cis-regulatory-derived allele of this polypeptide through natural selection. We call such an elimination event the functionalization of the polypeptide. A novFP is the immediate product of functionalization: a polypeptide that is no longer a JP but is identical to its ancestral JP in terms of sequence and cis-regulation, while their genetic backgrounds and environments may differ. A derFP is produced when a novFP undergoes the fixation of a nonsynonymous or cis-regulatory change. (B) Partitioning of proteome evolution in accordance with the classification of polypeptides. The two loops in the diagram represent the fact that JPs and derFPs can evolve without leaving their respective classes, while a novFP stops being a novFP as soon as it evolves. (C) The general determinants of distributions of polypeptide properties across the three classes of polypeptides. The curves describe hypothetical distributions of an arbitrary property of polypeptides, such as length or ISD. The distribution among novFPs is always restricted to the values that occur in the distribution among JPs, which is a consequence of the fact that functionalization turns a JP into a novFP without modifying it, as can be seen in (A).
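The constraint described for panel C can be illustrated with a small numerical sketch. It assumes, purely for illustration, that each JP carries a hypothetical relative functionalization rate (`birth_weights`) and that the novFP distribution is the JP distribution reweighted by those rates. The function name, the weights, and the lengths are all made up and do not come from the paper.

```python
def novfp_distribution(jp_values, birth_weights):
    """Reweight JP property values by hypothetical functionalization rates.

    Returns (value, probability) pairs for novFPs. Because functionalization
    does not modify the polypeptide, every novFP value is also a JP value.
    """
    total = sum(birth_weights)
    if total <= 0 or any(w < 0 for w in birth_weights):
        raise ValueError("weights must be non-negative and not all zero")
    # JPs with zero weight never functionalize and drop out of the support.
    return [(x, w / total) for x, w in zip(jp_values, birth_weights) if w > 0]


# Made-up JP lengths (amino acid residues) and made-up relative rates.
jp_lengths = [10, 20, 30, 40, 50]
birth_weights = [0.0, 1.0, 1.0, 2.0, 4.0]

novfp = novfp_distribution(jp_lengths, birth_weights)
jp_mean = sum(jp_lengths) / len(jp_lengths)
novfp_mean = sum(x * p for x, p in novfp)

print("JP mean length:   ", jp_mean)
print("novFP mean length:", novfp_mean)
```

In this toy case the novFP mean is higher than the JP mean only because the assumed rates increase with length; the set of possible novFP values can shrink relative to JPs but never extends beyond it.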
Figure 2. The mean length and mean structural disorder of novFPs are predicted by the birth bias and the genomic GC content under a simple model of the sequences of JPs. (A) Contour plot of the predicted average length of novFPs in amino acid residues. (B) Contour plot of the predicted average of IUPred long disorder (Dosztányi ) among novFPs. (C) The predicted mean and SD of the length of JPs as functions of the GC content. (D) The predicted mean and SD of IUPred long disorder among JPs as functions of the GC content. Hatched areas indicate impossible scenarios, that is, negative polypeptide lengths and ISD percentages outside the 0–100% interval. Possible values of the birth bias differ between length and ISD because the ranges of these two properties constrain their respective means and SDs differently. The landscapes in A and B can be understood as the results of applying Equation 2 to the curves in C and D, respectively. As a result, the vertical "slice" of a landscape at a given GC content is a straight line whose intercept and slope are, respectively, the mean and SD associated with this GC content in the corresponding bottom panel. The curve obtained by taking a horizontal slice where there is no birth bias corresponds to the relation between the GC content and the mean of the property among JPs, i.e., the solid blue curve in the corresponding bottom panel. Since the vertical distance between contour lines is inversely proportional to the vertical slope of the landscape, it is inversely proportional to the SD of the property among JPs, i.e., the dashed red curve in the corresponding bottom panel. The solid blue curves in C and D are consistent with known effects of an increase in GC content on random polypeptides, namely an increase in their mean length and mean ISD (Basile ). The fact that contour lines are curved in A and B indicates that the effect of GC content on novFPs is not always proportional to its effect on JPs. As shown in C and D, such inconsistencies of the effect of GC content between JPs and novFPs are due to the fact that the GC content affects both the SDs and the means of the properties of JPs, sometimes in opposite directions. St. dev., standard deviation.
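The geometry described in this caption can be sketched in a few lines of Python. Two assumptions are made here and are not taken from the paper. First, Equation 2 is not reproduced in this record, so the linear form below (novFP mean = JP mean + birth bias × JP SD) is only a reading of the caption's "intercept and slope" description. Second, the JP length model is a stand-in: nucleotides are treated as independent with P(G) = P(C) = GC/2 and P(A) = P(T) = (1 − GC)/2, and length is the number of sense codons before the first stop codon (a geometric distribution). The authors' actual model of JP sequences may differ, and all function names are hypothetical.

```python
import math


def stop_codon_probability(gc):
    """P(a random codon is TAA, TAG or TGA), assuming independent nucleotides."""
    at = (1.0 - gc) / 2.0  # P(A) = P(T)
    g = gc / 2.0           # P(G) = P(C)
    # TAA contributes at^3; TAG and TGA each contribute at^2 * g.
    return at ** 3 + 2.0 * at ** 2 * g


def jp_length_mean_sd(gc):
    """Mean and SD of the number of sense codons before the first stop codon."""
    p = stop_codon_probability(gc)
    if p <= 0.0:
        raise ValueError("no stop codons are possible at this GC content")
    # Geometric distribution counting failures before the first success.
    return (1.0 - p) / p, math.sqrt(1.0 - p) / p


def novfp_mean(jp_mean, jp_sd, birth_bias):
    """Assumed linear form: intercept is the JP mean, slope is the JP SD."""
    return jp_mean + birth_bias * jp_sd


for gc in (0.3, 0.5, 0.7):
    mean, sd = jp_length_mean_sd(gc)
    shifted = novfp_mean(mean, sd, birth_bias=0.5)
    print(f"GC={gc:.1f}  JP mean={mean:.1f}  JP SD={sd:.1f}  "
          f"novFP mean at bias 0.5={shifted:.1f}")
```

Under these assumptions both the mean and the SD of JP length rise with GC content, so a fixed birth bias shifts the novFP mean by a different amount at each GC content, which is one way curved contour lines can arise.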