Charles D Warden, Seong-Ho Kim, Soojin V Yi.
Abstract
Functional RNAs (fRNAs) are increasingly recognized as an important regulatory component in biological processes. Interestingly, recent computational studies suggest that the number and biological significance of functional RNAs within coding regions (coding fRNAs) may have been underestimated. We hypothesized that such coding fRNAs impose an additional constraint on sequence evolution, because the primary DNA sequence has to encode functional RNA secondary structures on the messenger RNA in addition to the amino acid codons of the protein sequence. To test this prediction, we first used computational methods to predict conserved fRNA secondary structures within multiple-species alignments of Saccharomyces sensu stricto genomes. We predict that as many as 5% of the genes in the yeast genome contain at least one functional RNA secondary structure within their protein-coding region. We then analyzed the impact of coding fRNAs on the evolutionary rate of protein-coding genes, because a decrease in evolutionary rate implies constraint due to biological functionality. We found that our predicted coding fRNAs have a significant influence on evolutionary rates (especially at synonymous sites), independent of other functional measures. Thus, coding fRNAs may play a role in sequence evolution. Given that the coding regions of humans and flies contain many more predicted coding fRNAs than those of yeast, the impact of coding fRNAs on sequence evolution may be substantial in the genomes of higher eukaryotes.
Year: 2008 PMID: 18270559 PMCID: PMC2216430 DOI: 10.1371/journal.pone.0001559
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Substantial Proportion of Predicted fRNAs within Coding Regions.
Figure 2. Distribution of GO Annotations in Strictly Defined Dataset.
This figure shows the distribution of GO annotations for the set of genes containing at least one predicted fRNA fold (‘strict’) compared with the background set of genes in the yeast genome as annotated in the SGD database (‘SGD’). Some GO annotations have been abbreviated to save space. p-values: * = 0.05, ** = 0.01, *** = 0.001.
Correlation and partial correlations show coding fRNAs decrease evolutionary rates (genes with negative mfe).
| | dN | dS | dS′ | dN/dS | dN/dS′ |
| Gene Expression | −0.163 (0.583****) | −0.322 (−0.735***) | −0.203# (−0.237*) | −0.062 (−0.360**) | −0.135 (−0.567****) |
| CAI | −0.376*** (−0.620****) | −0.514*** (−0.762****) | 0.206 (−0.015) | −0.211# (−0.391***) | −0.404*** (−0.632***) |
| Dispensability | 0.293* (0.370**) | −0.170 (0.294*) | 0.160 (0.223#) | 0.233* (0.300**) | 0.275* (0.350**) |
| … | −0.089 (−…) | −0.183 (…) | … | −0.033 (−0.139) | −0.040 (−0.191) |
Note: Pearson correlations are shown in parentheses after each partial correlation in the table above. For this dataset, ribosomal genes were removed and all other factors were controlled for in the partial correlation analysis. Sample size is 73 genes. Significant correlations with fRNA coverage are shown in bold; p-values: # = 0.1, * = 0.05, ** = 0.01, *** = 0.001, **** = 10⁻⁴.
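The distinction between a Pearson correlation and a partial correlation in the table above can be illustrated with a minimal sketch. It assumes the standard residual-based definition of partial correlation; the source does not state which software or exact procedure was used. The data are synthetic, and the names `expression`, `coverage`, and `dS` are hypothetical stand-ins rather than the study's measurements.

```python
import numpy as np

def partial_corr(x, y, controls):
    """Pearson correlation between x and y after removing the linear
    effect of the control variables from both (residual method)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept column plus one column per control variable.
    Z = np.column_stack([np.ones(len(x)), np.asarray(controls, dtype=float)])
    # Residuals of x and y after least-squares regression on the controls.
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])

# Synthetic illustration only (hypothetical variables, not the paper's data).
rng = np.random.default_rng(0)
n = 73
expression = rng.normal(size=n)   # stand-in for a control such as gene expression
coverage = rng.normal(size=n)     # stand-in for a predictor such as fRNA coverage
dS = -0.5 * expression - 0.3 * coverage + rng.normal(scale=0.5, size=n)

pearson_r = float(np.corrcoef(coverage, dS)[0, 1])
partial_r = partial_corr(coverage, dS, expression)
print(f"Pearson r = {pearson_r:.3f}, partial r = {partial_r:.3f}")
```

With several controls, pass them as an array with one column per variable; the function then removes the joint linear effect of all of them before correlating the residuals.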
Principal component regression reveals coding fRNAs have significant influence on evolutionary divergence (genes with negative mfe).
| | Principal Components | | | | |
| | 1 | 2 | 3 | 4 | All |
| Component Composition: | | | | | |
| Gene Expression | … | 0.046 | 0.006 | … | |
| CAI | … | 0.171 | 0.000 | … | |
| Gene Dispensability | 0.101 | … | … | 0.002 | |
| … | 0.099 | … | … | 0.021 | |
| Percent Variance Explained: | | | | | |
| dN | … | 0.15 | 1.62 | 1.03 | … |
| dS | … | … | 0.06 | 0.33 | … |
| dS′ | … | … | 1.74 | 2.90 | … |
| dN/dS | … | 0.01 | 2.43 | 0.85 | … |
| dN/dS′ | … | 0.70 | 2.09 | 1.57 | … |
Numbers in bold correspond to predictors that contribute at least 20% to the indicated component.
Based on the regression analysis, underlined font indicates p-value<0.1; bold font indicates p-value<0.05.
Sample size is 73 genes. Results are similar when considering divergence across a shorter timescale and additional functional variables (see Tables S6, S7).
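The two blocks of the table above (component composition and percent variance explained) correspond to the two stages of a principal component regression. The sketch below is one common way to compute both and is not taken from the source: it assumes that composition means squared loadings of standardized predictors, and that the variance explained by each component is its share of the response's total sum of squares. The function name `pcr` and the synthetic data are hypothetical.

```python
import numpy as np

def pcr(X, y):
    """Principal component regression on standardized predictors.

    Returns
    -------
    composition : (n_predictors, n_components) array of squared loadings;
                  each column sums to 1.
    pct : percent of the variance in y explained by each component.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize predictors and center the response.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yc = y - y.mean()
    # PCA through the singular value decomposition of the standardized data.
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = U * s                     # component scores, mutually orthogonal
    composition = Vt.T ** 2            # squared loadings per predictor
    # Orthogonal scores allow each regression coefficient to be computed
    # independently of the other components.
    beta = (scores.T @ yc) / (s ** 2)
    pct = 100.0 * (beta ** 2) * (s ** 2) / (yc @ yc)
    return composition, pct

# Synthetic illustration only: four hypothetical predictors, one response.
rng = np.random.default_rng(1)
n = 73
X = rng.normal(size=(n, 4))
X[:, 1] += 0.8 * X[:, 0]               # make two predictors correlated
y = -0.6 * X[:, 0] - 0.3 * X[:, 3] + rng.normal(scale=0.7, size=n)

composition, pct = pcr(X, y)
print(np.round(composition, 3))
print(np.round(pct, 2), "total:", round(float(pct.sum()), 2))
```

Because the component scores are uncorrelated, the per-component percentages add up to the variance explained by the full regression, which is what an "All" column would report under these assumptions.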
Correlations and Partial Correlations using Pearson Correlations on Genes with EFP>0.
| | dN | dS | dS′ | dN/dS | dN/dS′ |
| Gene Expression | 0.035 (−0.430**) | −0.260# (−0.683****) | −0.159 (−0.119) | 0.112 (−0.152) | 0.057 (−0.414**) |
| CAI | −0.451*** (−0.563****) | −0.551**** (−0.737****) | 0.256# (0.149) | −0.255* (−0.274) | −0.477*** (−0.581****) |
| Dispensability | 0.288* (0.334*) | 0.179 (−0.282*) | 0.109 (0.143) | 0.215 (0.235) | 0.273* (0.315) |
| … | −0.151 (−0.166) | −0.221 (−0.251#) | … | −0.072 (−0.066) | −0.098 (−0.114) |
Note: Pearson correlations are shown in parentheses after each partial correlation in the table above. For this dataset, ribosomal genes were removed and all other factors were controlled for in the partial correlation analysis. Sample size is 55 genes. Significant correlations with fRNA coverage are shown in bold; p-values: # = 0.1, * = 0.05, ** = 0.01, *** = 0.001, **** = 10⁻⁴.
Results of Principal Component Regression Analyses for Genes with EFP>0.
| | Principal Components | | | | |
| | 1 | 2 | 3 | 4 | All |
| Component Composition: | | | | | |
| Gene Expression | … | 0.002 | 0.029 | … | |
| CAI | … | 0.125 | 0.030 | … | |
| Gene Dispensability | 0.111 | 0.102 | … | 0.007 | |
| … | 0.029 | … | 0.161 | 0.040 | |
| Percent Variance Explained: | | | | | |
| dN | … | 0.09 | 0.95 | … | … |
| dS | … | 0.05 | 1.11 | 1.53 | … |
| dS′ | 0.08 | … | 0.20 | 2.23 | … |
| dN/dS | 5.11 | 0.05 | 2.49 | … | 13.41 |
| dN/dS′ | … | 0.78 | 1.06 | … | … |
Numbers in bold correspond to predictors that contribute at least 20% to the indicated component.
Based on the regression analysis, underlined font indicates p-value<0.1; bold font indicates p-value<0.05.
Sample size is 55 genes.