David Managadze, Igor B Rogozin, Diana Chernikova, Svetlana A Shabalina, Eugene V Koonin.
Abstract
Mammalian genomes contain numerous genes for long noncoding RNAs (lncRNAs). The functions of the lncRNAs remain largely unknown, but their evolution appears to be constrained by purifying selection, albeit relatively weakly. To gain insights into the mode of evolution and the functional range of the lncRNAs, they can be compared with the much better characterized protein-coding genes. The evolutionary rate of protein-coding genes shows a universal negative correlation with expression: highly expressed genes are on average more conserved during evolution than genes with lower expression levels. This correlation was conceptualized in the misfolding-driven protein evolution hypothesis, according to which misfolding is the principal cost incurred by protein expression. We sought to determine whether long intergenic ncRNAs (lincRNAs) follow the same evolutionary trend and indeed detected a moderate but statistically significant negative correlation between the evolutionary rate and expression level of human and mouse lincRNA genes. The magnitude of the correlation for the lincRNAs is similar to that for equal-sized sets of protein-coding genes with similar levels of sequence conservation. Additionally, the expression level of the lincRNAs is significantly and positively correlated with the predicted extent of lincRNA molecule folding (base-pairing); however, the contributions of evolutionary rates and folding to the expression level are independent. Thus, the anticorrelation between evolutionary rate and expression level appears to be a general feature of gene evolution that might be caused by similar deleterious effects of protein and RNA misfolding and/or other factors, for example, the number of interacting partners of the gene product.
Year: 2011 PMID: 22071789 PMCID: PMC3242500 DOI: 10.1093/gbe/evr116
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Statistics of lincRNA Data Sets
| | Mouse | Human |
| Probe sets | 5,444 | 917 |
| All lincRNAs | 2,390 | 589 |
| Median length, nt | 2,535 | 2,626 |
| Average length, full lincRNA | 11,775 | 16,855 |
| Fused exons | 1,843 | 1,998 |
| Fused introns | 24,246 | 36,686 |
| GC% (aggregate) | 0.42/0.44 | 0.42/0.44 |
| Fused exons | 0.45/0.45 | 0.45/0.45 |
| Fused introns | 0.42/0.44 | 0.41/0.43 |
| Intron-containing (introns with length<40 nt discarded) | 979 | 245 |
| Exons in intron-containing lincRNAs | 3,439 | 1,194 |
| Introns in intron-containing lincRNAs | 2,462 | 949 |
| Introns shorter than 40 nt | 424 | 94 |
| Exons shorter than 15 nt | 41 | 7 |
| Introns per lincRNA | 2.52 | 3.86 |
| Exons per lincRNA | 3.52 | 4.85 |
| Average length, nt | 25,816 | 38,264 |
| Average exon length | 478 | 383 |
| Average intron length | 9,574 | 9,435 |
| Intronless genes (one exon only) | 1,411 | 344 |
| lincRNAs with A ≥ 7.0, uniquely mapping to genomes | 2,013 | 519 |
| Intron-containing | 918 | 211 |
Aggregate GC% is calculated from the sequences of all samples concatenated together.
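The distinction drawn in the footnote can be illustrated with a minimal sketch. The footnote defines only the aggregate value; reading the second value in each GC% cell as an unweighted per-sequence average is an assumption made here, and the function names and toy sequences are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in one sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def aggregate_gc(seqs: list[str]) -> float:
    """GC fraction of all sequences concatenated into one string."""
    return gc_fraction("".join(seqs))


def average_gc(seqs: list[str]) -> float:
    """Unweighted mean of the per-sequence GC fractions (assumed reading)."""
    return sum(gc_fraction(s) for s in seqs) / len(seqs)


# Toy sequences (hypothetical, not taken from the lincRNA data sets).
toy = ["GGCC", "ATATATATATGC"]
agg = aggregate_gc(toy)  # 6 GC bases out of 16 nucleotides
avg = average_gc(toy)    # mean of 4/4 and 2/12
```

The aggregate value weights each sequence by its length, whereas the average treats every sequence equally, so short GC-rich sequences pull the average up more than the aggregate.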
Evolutionary Rates of Mouse Intron-Containing lincRNA Genes
| Species Pair | Threshold (Indel %) | Exons | | | Introns | | | Student | |
| | | Data Points | Mean Rate | Variance | Data Points | Mean Rate | Variance | | |
| Mouse–Human | 15 | 290 | 0.375 | 0.012 | 141 | 0.425 | 0.012 | 1 | 1.4E-05 |
| | 30 | 468 | 0.394 | 0.011 | 259 | 0.430 | 0.009 | 1 | 6.0E-06 |
| | 45 | 599 | 0.404 | 0.010 | 458 | 0.439 | 0.006 | 1 | 1.9E-05 |
| | 100 | 871 | 0.418 | 0.011 | 863 | 0.449 | 0.006 | 1 | 1.9E-12 |
| Mouse–Chimp | 15 | 270 | 0.375 | 0.013 | 117 | 0.431 | 0.010 | 1 | 6.5E-15 |
| | 30 | 444 | 0.398 | 0.012 | 230 | 0.431 | 0.008 | 1 | 2.5E-07 |
| | 45 | 582 | 0.405 | 0.011 | 433 | 0.438 | 0.006 | 1 | 4.5E-06 |
| | 100 | 863 | 0.417 | 0.011 | 840 | 0.448 | 0.006 | 1 | 4.0E-12 |
| Mouse–Macaque | 15 | 251 | 0.374 | 0.013 | 115 | 0.428 | 0.012 | 1 | 7.2E-05 |
| | 30 | 408 | 0.392 | 0.011 | 221 | 0.434 | 0.009 | 1 | 9.8E-07 |
| | 45 | 540 | 0.403 | 0.011 | 375 | 0.442 | 0.007 | 1 | 2.0E-15 |
| | 100 | 847 | 0.419 | 0.012 | 829 | 0.451 | 0.006 | 1 | 1.2E-11 |
| Mouse–Rat | 15 | 840 | 0.149 | 0.002 | 815 | 0.168 | 0.003 | 1 | 3.1E-10 |
| | 30 | 878 | 0.150 | 0.002 | 887 | 0.169 | 0.003 | 1 | 8.9E-10 |
| | 45 | 890 | 0.151 | 0.002 | 897 | 0.169 | 0.003 | 1 | 3.6E-08 |
| | 100 | 910 | 0.152 | 0.003 | 913 | 0.171 | 0.003 | 1 | 1.2E-13 |
| Mouse–Dog | 15 | 191 | 0.387 | 0.015 | 91 | 0.475 | 0.021 | 1 | 1.1E-09 |
| | 30 | 355 | 0.429 | 0.017 | 195 | 0.502 | 0.016 | 1 | 1.0E-14 |
| | 45 | 483 | 0.445 | 0.015 | 321 | 0.503 | 0.013 | 1 | 4.0E-11 |
| | 100 | 816 | 0.467 | 0.018 | 823 | 0.509 | 0.010 | 1 | 7.9E-13 |
We used alignments with the total length of indels below a threshold; four indel thresholds (15%, 30%, 45%, and 100%, the last meaning no threshold) were applied.
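The comparison of exon and intron rates in the table can be sketched from the summary statistics alone. The example below computes a pooled-variance two-sample Student t statistic for the Mouse–Human row at the 15% threshold. The pooled-variance form and the normal approximation to the P value are assumptions made here for illustration. Because the table reports rounded means and variances, the result is not expected to match the tabulated P value exactly.

```python
import math


def pooled_t(n1, mean1, var1, n2, mean2, var2):
    """Two-sample Student t statistic with pooled variance.

    Returns (t, degrees_of_freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df


def normal_two_sided_p(z):
    """Two-sided P value from the standard normal distribution.

    Used only as a large-sample approximation to the t distribution.
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


# Mouse-Human, 15% indel threshold: exons vs. introns (rounded table values).
t, df = pooled_t(290, 0.375, 0.012, 141, 0.425, 0.012)
p_approx = normal_two_sided_p(t)
```

The negative sign of `t` reflects the lower mean rate of exons relative to introns, which is the direction seen in every row of the table.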
Figure: Distribution of evolutionary distances for exons (A) and introns (B) of the orthologous lincRNAs from human and mouse. All exon and intron sequences from each gene from the respective data sets were concatenated prior to the analysis.
Correlations between Evolutionary Rates and Expression Levels (Microarrays) of Mouse lincRNAs
| Species | Human | Chimp | Macaque | Rat | Dog |
| Exons, Indels: 15% | |||||
| Pearson | −0.105 | −0.157 | −0.139 | −0.113 | −0.143 |
| P value | 0.0040 | <0.0001 | 0.0003 | <0.0001 | 0.0009 |
| Spearman | −0.112 | −0.132 | −0.121 | −0.107 | −0.142 |
| P value | 0.0017 | 0.0004 | 0.0016 | <0.0001 | 0.0008 |
| Kendall | −0.074 | −0.087 | −0.080 | −0.070 | −0.093 |
| P value | 0.0019 | 0.0005 | 0.0018 | <0.0001 | 0.0011 |
| Data points | 779 | 720 | 684 | 1735 | 558 |
| Exons, Indels: 30% | |||||
| Pearson | −0.103 | −0.128 | −0.108 | −0.099 | −0.114 |
| P value | 0.0006 | <0.0001 | 0.0007 | <0.0001 | 0.0009 |
| Spearman | −0.102 | −0.112 | −0.095 | −0.099 | −0.123 |
| P value | 0.0005 | 0.0002 | 0.0022 | <0.0001 | 0.0003 |
| Kendall | −0.067 | −0.074 | −0.064 | −0.065 | −0.082 |
| P value | 0.0006 | 0.0003 | 0.0023 | <0.0001 | 0.0003 |
| Data points | 1148 | 1096 | 1027 | 1930 | 877 |
| Exons, Indels: 45% | |||||
| Pearson | −0.113 | −0.117 | −0.103 | −0.098 | −0.105 |
| P value | <0.0001 | <0.0001 | 0.0003 | <0.0001 | 0.0005 |
| Spearman | −0.098 | −0.097 | −0.091 | −0.097 | −0.100 |
| P value | 0.0002 | 0.0003 | 0.0010 | <0.0001 | 0.0007 |
| Kendall | −0.065 | −0.064 | −0.060 | −0.064 | −0.066 |
| P value | 0.0003 | 0.0003 | 0.0012 | <0.0001 | 0.0008 |
| Data points | 1411 | 1381 | 1286 | 1950 | 1138 |
| Introns, Indels: 15% | |||||
| Pearson | −0.014 | 0.004 | −0.004 | −0.011 | −0.009 |
| P value | 0.8696 | 0.9646 | 0.9653 | 0.7617 | 0.9322 |
| Spearman | −0.018 | −0.001 | −0.026 | −0.029 | −0.043 |
| P value | 0.8345 | 0.9922 | 0.7823 | 0.4053 | 0.6907 |
| Kendall | −0.010 | 0.005 | −0.015 | −0.019 | −0.028 |
| P value | 0.8655 | 0.9324 | 0.8071 | 0.4202 | 0.7019 |
| Data points | 141 | 117 | 115 | 814 | 89 |
| Introns, Indels: 30% | |||||
| Pearson | −0.014 | −0.017 | −0.038 | −0.015 | −0.038 |
| P value | 0.81701 | 0.7919 | 0.5721 | 0.6525 | 0.5952 |
| Spearman | −0.021 | −0.063 | −0.047 | −0.031 | −0.067 |
| P value | 0.7424 | 0.3421 | 0.4890 | 0.3592 | 0.3511 |
| Kendall | −0.013 | −0.041 | −0.031 | −0.020 | −0.045 |
| P value | 0.7571 | 0.3532 | 0.4869 | 0.3782 | 0.3569 |
| Data points | 259 | 230 | 221 | 885 | 194 |
| Introns, Indels: 45% | |||||
| Pearson | −0.009 | −0.013 | −0.025 | −0.010 | −0.043 |
| P value | 0.8450 | 0.7926 | 0.6252 | 0.7540 | 0.4421 |
| Spearman | −0.003 | −0.021 | −0.056 | −0.024 | −0.046 |
| P value | 0.9529 | 0.6700 | 0.2814 | 0.4777 | 0.4086 |
| Kendall | −0.001 | −0.012 | −0.038 | −0.015 | −0.029 |
| P value | 0.9625 | 0.6993 | 0.2735 | 0.5041 | 0.4341 |
| Data points | 458 | 433 | 375 | 895 | 320 |
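The three coefficients reported in the table measure association in different ways: Pearson on the raw values, Spearman on ranks, and Kendall on the ordering of pairs. The following is a minimal pure-Python sketch on toy data. The function names and data are hypothetical, and the tau-a variant of Kendall's coefficient is an assumption, since the table does not state which variant was used.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)  # tied pairs contribute 0
    return score / (n * (n - 1) / 2)


# Toy data (hypothetical): evolutionary rate vs. expression level.
rate = [0.10, 0.20, 0.30, 0.40]
expression = [9.0, 7.0, 8.0, 6.0]
r = pearson(rate, expression)
rho = spearman(rate, expression)
tau = kendall_tau_a(rate, expression)
```

On this toy input all three coefficients are negative, with Kendall's coefficient smaller in magnitude than the other two, which matches the pattern in the exon rows of the table.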
Figure: Correlation between the expression level and evolutionary rate for mouse (A–C) and human (D) lincRNAs based on microarray data. The data are for the indel threshold = 15%.
Figure: Correlation between expression level and evolutionary rate for mouse lincRNAs based on RNA-Seq data. The data are for the indel threshold = 15%.
Correlation between the Evolutionary Rates and Expression Levels (RNA-seq) for Mouse lincRNAs
| Species | Human | Chimp | Macaque | Rat | Dog |
| Exons, Indels: 15% | |||||
| Pearson | −0.224 | −0.256 | −0.223 | −0.178 | −0.252 |
| P value | <0.000001 | <0.000001 | <0.000001 | <0.000001 | <0.000001 |
| Spearman | −0.247 | −0.279 | −0.236 | −0.203 | −0.280 |
| P value | <0.000001 | <0.000001 | <0.000001 | <0.000001 | <0.000001 |
| Kendall | −0.168 | −0.190 | −0.162 | −0.138 | −0.190 |
| P value | <0.000001 | <0.000001 | <0.000001 | <0.000001 | <0.000001 |
| Data points | 772 | 712 | 678 | 1766 | 547 |
| Exons, Indels: 30% | |||||
| Pearson | −0.248 | −0.242 | −0.249 | −0.171 | −0.278 |
| P value | <0.000001 | <0.000001 | <0.000001 | <0.000001 | <0.000001 |
| Spearman | −0.271 | −0.268 | −0.268 | −0.196 | −0.305 |
| P value | <0.000001 | <0.000001 | <0.000001 | <0.000001 | <0.000001 |
| Kendall | −0.185 | −0.181 | −0.182 | −0.134 | −0.207 |
| P value | <0.000001 | <0.000001 | <0.000001 | <0.000001 | <0.000001 |
| Data points | 1136 | 1086 | 1016 | 1851 | 863 |
| Exons, Indels: 45% | |||||
| Pearson | −0.234 | −0.224 | −0.227 | −0.181 | −0.269 |
| P value | <0.000001 | <0.000001 | <0.000001 | <0.000001 | <0.000001 |
| Spearman | −0.257 | −0.250 | −0.247 | −0.199 | −0.288 |
| P value | <0.000001 | <0.000001 | <0.000001 | <0.000001 | <0.000001 |
| Kendall | −0.174 | −0.168 | −0.167 | −0.135 | −0.195 |
| P value | <0.000001 | <0.000001 | <0.000001 | <0.000001 | <0.000001 |
| Data points | 1402 | 1366 | 1273 | 1873 | 1124 |
Correlations between Evolutionary Rates and Expression Levels (Microarrays) for Samples of Alignments of Orthologous Protein-Coding Genes from Human and Mouse Simulating lincRNA Sets
| Comparison | Mean Correlation Coefficient CCPC | Fraction of Samples with the CCPC ≤ CClincRNA | 95% Confidence Intervals for CCPC |
| Nonsynonymous sites | |||
| Human–Chimp | −0.16 | 0.08 | −0.08: −0.22 |
| Human–Macaque | −0.14 | 0.62 | −0.06: −0.20 |
| Human–Dog | −0.06 | 0.98 | +0.07: −0.17 |
| Mouse–Rat | −0.10 | 0.44 | −0.05: −0.15 |
| Synonymous sites | |||
| Human–Chimp | −0.04 | 0.84 | +0.04: −0.12 |
| Human–Macaque | −0.04 | 0.98 | +0.04: −0.13 |
| Human–Dog | −0.05 | 0.99 | +0.09: −0.19 |
| Mouse–Rat | −0.04 | 0.94 | +0.02: −0.11 |
Note: To compare protein-coding genes (PC) and lincRNAs, we used a sampling procedure repeated 1,000 times. Each sample has a size and a mean evolutionary distance approximately equal to those of the corresponding lincRNA subset (Table 1 and supplementary table 1, Supplementary Material online); the difference between the mean evolutionary distance for the PC genes and the mean distance for the lincRNAs is not significant according to the Student t-test. The Pearson correlation coefficient was used to measure the correlation between expression and divergence for protein-coding genes (CCPC). The median of the Pearson correlation coefficients for the 15%, 30%, and 45% thresholds was used as CClincRNA. For pairwise comparisons other than those listed in the table, the sampling procedure did not converge because too few protein-coding genes have a large evolutionary distance.
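One way to picture the sampling procedure described in the note is simple rejection sampling: draw random subsets of protein-coding genes of the required size and keep those whose mean distance is close to the lincRNA target. The note does not specify how the matching was actually performed, so the sketch below is an assumption throughout. The function names, the tolerance, the percentile-based interval, and the synthetic data are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def matched_samples(genes, size, target_mean, tol, n_samples, rng,
                    max_tries=100_000):
    """Collect random subsets of (distance, expression) pairs whose mean
    distance lies within tol of target_mean (rejection sampling)."""
    samples = []
    for _ in range(max_tries):
        if len(samples) == n_samples:
            break
        subset = rng.sample(genes, size)
        mean_dist = sum(d for d, _ in subset) / size
        if abs(mean_dist - target_mean) <= tol:
            samples.append(subset)
    if len(samples) < n_samples:
        # Too few genes near the target distance to fill the quota.
        raise RuntimeError("sampling did not converge")
    return samples


def summarize(samples, cc_lincrna):
    """Mean CC, fraction of samples with CC <= cc_lincrna, and an
    approximate central 95% interval of the per-sample CCs."""
    ccs = sorted(pearson([d for d, _ in s], [e for _, e in s]) for s in samples)
    n = len(ccs)
    mean_cc = sum(ccs) / n
    fraction = sum(cc <= cc_lincrna for cc in ccs) / n
    interval = (ccs[int(0.025 * n)], ccs[min(n - 1, int(0.975 * n))])
    return mean_cc, fraction, interval


# Synthetic protein-coding genes (hypothetical): expression falls with distance.
rng = random.Random(0)
pc_genes = []
for _ in range(500):
    dist = rng.uniform(0.0, 1.0)
    pc_genes.append((dist, 10.0 - 5.0 * dist + rng.gauss(0.0, 2.0)))

samples = matched_samples(pc_genes, size=50, target_mean=0.5, tol=0.05,
                          n_samples=1000, rng=rng)
mean_cc, fraction, ci = summarize(samples, cc_lincrna=-0.6)
```

Raising an error when the quota cannot be filled mirrors the non-convergence mentioned in the note for species pairs with too few sufficiently divergent protein-coding genes.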
Figure: Distributions of correlation coefficients between the evolutionary rates and the expression levels for samples of alignments of the human–mouse protein-coding genes and lincRNAs. (A) Mouse–rat, nonsynonymous sites. (B) Human–dog, nonsynonymous sites. (C) Mouse–rat, synonymous sites. (D) Human–dog, synonymous sites. The distributions are for 1,000 samples of protein-coding genes simulating the lincRNA sets (for details see text and Table 5). The lincRNA correlation coefficients are shown by red arrows. The expression values were from the human and mouse microarray data sets.
Figure: Correlation between the predicted level of nucleotide pairing in optimal folding and expression level for mouse lincRNAs measured by EST abundance (A) and estimated from GenAtlas database (B).
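A predicted level of nucleotide pairing can be expressed as the fraction of bases that are paired in a predicted secondary structure. The sketch below assumes the structure is available in dot-bracket notation, a common output format of RNA folding programs. The source does not state which program or format was used, so the input format, the function name, and the toy hairpin are hypothetical.

```python
def paired_fraction(dot_bracket: str) -> float:
    """Fraction of nucleotides that are base-paired in a dot-bracket string.

    '(' and ')' mark paired positions, '.' marks unpaired positions.
    """
    if not dot_bracket:
        raise ValueError("empty structure")
    depth = 0
    paired = 0
    for ch in dot_bracket:
        if ch == "(":
            depth += 1
            paired += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure")
            paired += 1
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if depth != 0:
        raise ValueError("unbalanced structure")
    return paired / len(dot_bracket)


# Toy hairpin (hypothetical): a 4-bp stem closing a 4-nt loop.
hairpin = "((((....))))"
frac = paired_fraction(hairpin)  # 8 paired positions out of 12
```

A per-transcript value of this kind could then be correlated with expression level in the same way as the evolutionary rates.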