Sergio R P Line, Xiaoming Liu, Ana Paula de Souza, Fuli Yu.
Abstract
BACKGROUND: Gene expression is one of the most relevant biological processes of living cells. Because of the relatively small population sizes, human gene sequences are predicted not to be strongly influenced by selection for expression efficiency. A major problem in estimating the extent to which gene characteristics can be selected to maximize expression efficiency is the wide variation in RNA and protein levels among physiological states and tissues. Analyses of datasets of stably expressed genes (i.e., genes with consistent expression across physiological states and tissues) would provide more accurate and reliable measurements of the associations between variation in a specific gene characteristic and expression, and of how distinct gene features work to optimize gene expression.
Year: 2013 PMID: 23601824 PMCID: PMC3639913 DOI: 10.1186/1471-2164-14-268
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Correlation analysis between gene parameters and mRNA expression
| −0.29(1.2e-12) | −0.05(0.23) | −0.46(<2.2e-16) | 0.32(3.3e-15) | 0.07(0.08) | 0.12(0.004) | |
| −0.39(1.3e-08) | −0.01(0.89) | −0.51(1.2e-14) | 0.39(4.6e-08) | 0.15(0.03) | 0.2(0.005) | |
| −0.34(4.7e-4) | −0.18(0.07) | −0.53(1.7e-08) | 0.18(0.07) | −0.01(0.89) | 0.022(0.82) | |
| −0.26(4.5e-09) | 0.01(0.11) | −0.23(2.3e-07) | 0.40(<2.2e-16) | 0.21(3.1e-06) | −0.02(0.20) | |
| | | −0.20(1.0e-12) | | | | |
| | | −0.18(<1.0e-04) | | | | |
| 0.23 (<0.0003) |
Spearman rank coefficients (p values) between gene characteristics and mRNA expression.
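The Spearman rank coefficients reported in this table can be illustrated with a minimal, standard-library sketch. It assumes ties receive the average of their ranks and computes the coefficient as the Pearson correlation of the two rank vectors. The function names and the input values are hypothetical and are not data from the study, and the p-value calculation is omitted.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values, for illustration only (not data from the study).
cds_length = [900, 1500, 2100, 3200, 4800]
mrna_level = [12.1, 10.4, 10.9, 8.2, 7.5]
rho = spearman_rho(cds_length, mrna_level)
print(f"Spearman rho = {rho:.2f}")
```

A negative coefficient here means that longer hypothetical coding sequences tend to rank lower in expression. A constant input vector has zero rank variance and would raise a division error in this sketch.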
Correlations between amino acid frequencies weighted by expression and isoaccepting tRNA gene copy number, and amino acid size/complexity score
| Group | tRNA gene copy number | Size/complexity score |
| Group 1 | 0.69(8.0e-04) | −0.75(1.2e-04) |
| Group 2 | 0.69(6.7e-04) | −0.73(2.4e-04) |
| Group 3 | 0.69(8.1e-04) | −0.73(2.4e-04) |
| Group 4 | 0.65(1.8e-03) | −0.80(1.6e-05) |
| High exp* | 0.56(1.0e-02) | −0.79(2.8e-05) |
| All genes* | 0.58(7.8e-03) | −0.80(2.0e-05) |
Spearman rank coefficients (p-values). *From reference 22.
Summary of regression analysis (Group 1 & regression model1)
| Variable | Coefficient (interval) | p-value | Relative importance* (interval) |
| Log(Lcds) | −0.36(-0.49,-0.23) | 1.98e-07 | 0.26(0.19,0.33) |
| tAi | 13.39(8.51,18.27) | 1.09e-07 | 0.17(0.11,0.23) |
| Log(L3utr) | −0.25(-0.34,-0.16) | 1.48e-07 | 0.16(0.10,0.22) |
| dG | 0.05 (0.04,0.06) | 7.40e-08 | 0.04(0.02,0.06) |
| Cys | −10.62(-18.41,-2.83) | 0.00764 | 0.01(0.00,0.02) |
| Asp | −4.82(-10.04,0.40) | 0.07043 | 0.01(0.00,0.02) |
| Glu | −5.55(-9.22,-1.88) | 0.00316 | 0.02(0.01,0.03) |
| Leu | −10.37(-9.22,-1.88) | 2.60e-08 | 0.06(0.03,0.09) |
| Gln | −16.13(-21.45,-10.79) | 5.05e-09 | 0.08(0.04,0.12) |
| Ser | −10.37(-14.69,-6.05) | 3.15e-06 | 0.19(0.14,0.24) |
*Relative importance to log(expression) normalized to sum 100%. Lcds, length of coding sequence; L3utr, length of 3′ UTR.
Figure 1. Scatter plots of mRNA levels (x-axis) versus fitted model values (y-axis). A. Regression of model1 in Group 1 (adjusted R-squared = 0.41, p < 2.2e-16, n = 575). B. Regression using variables from model1 in Group 4 (adjusted R-squared = 0.31, p < 2.2e-16, n = 503). C. Regression using variables from model2 in Group 1 (adjusted R-squared = 0.35, p < 2.2e-16, n = 575). D. Regression of model2 in Group 4 (adjusted R-squared = 0.33, p < 2.2e-16, n = 503).
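The adjusted R-squared values quoted in the caption penalize the ordinary R-squared for the number of predictors in the model. A minimal sketch of the standard formula follows. The function name and the example inputs are hypothetical and are not taken from the study.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof


# Hypothetical inputs, for illustration only (not values from the study).
adj = adjusted_r_squared(r_squared=0.50, n_obs=100, n_predictors=5)
print(f"adjusted R-squared = {adj:.3f}")
```

With several hundred genes and about ten predictors, as in the models summarized here, the penalty is small, so the adjusted and unadjusted values would differ only slightly.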
Summary of regression analysis (Group 4 & regression model2)
| Variable | Coefficient (interval) | p-value | Relative importance* (interval) |
| tAi | 25.0(20.14,29.94) | < 2e-16 | 0.28(0.13,0.43) |
| Log(L3utr) | −0.14(-0.23,-0.05) | 0.00355 | 0.17(0.08,0.26) |
| dG | 0.02(0.01,0.03) | 0.03023 | 0.01(0.00,0.02) |
| Cys | −22.39(-26.59,-18.20) | < 2e-16 | 0.18(0.10,0.26) |
| Glu | −5.34(-8.96,-1.72) | 0.00396 | 0.01(0.00,0.02) |
| Gln | −10.58(-17.53,-3.63) | 0.00292 | 0.06(0.01,0.11) |
| His | −14.64(-23.84,-5.44) | 0.00189 | 0.21(0.10,0.32) |
| Leu | −7.92(-11.85,-3.99) | 8.75e-05 | 0.05(0.01,0.09) |
| Arg | 5.60(1.10,10.10) | 0.01488 | 0.01(0.00,0.01) |
*Relative importance to log(expression) normalized to sum 100%.
L3utr, length of 3′ UTR; dG, minimum folding energy; tAi, tRNA adaptation index.
Figure 2. Scatter plots of mRNA levels (x-axis) versus fitted model values (y-axis). A. Regression using variables from model1 in Group 2, formed by genes that were expressed in at least 2 tissues and had a standard deviation/mean ratio of mRNA expression values < 0.4 (adjusted R-squared = 0.51, p < 2.2e-16, n = 196). B. Regression using variables from model1 in Group 3, formed by genes that were expressed in at least 3 tissues and had a standard deviation/mean ratio of mRNA expression values < 0.4 (adjusted R-squared = 0.50, p < 2.2e-16, n = 99).
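The group definitions in the caption (a minimum number of expressing tissues and a standard deviation/mean ratio below 0.4) can be sketched as a simple filter. This is an illustration under stated assumptions and not the authors' pipeline: it treats a gene as "expressed" in a tissue when its level is above zero, uses the sample standard deviation, and computes the ratio only over expressing tissues. The gene names and expression values are hypothetical.

```python
from statistics import mean, stdev


def is_stably_expressed(levels, min_tissues=2, max_cv=0.4):
    """True if the gene is expressed in enough tissues with SD/mean < max_cv.

    Assumption: a level > 0 counts as "expressed" in that tissue.
    """
    expressed = [v for v in levels if v > 0]
    # stdev needs at least two values, so two tissues is the practical floor.
    if len(expressed) < max(min_tissues, 2):
        return False
    return stdev(expressed) / mean(expressed) < max_cv


# Hypothetical per-tissue expression profiles (not data from the study).
profiles = {
    "GENE_A": [10.0, 11.0, 9.5, 10.5],  # consistent across tissues
    "GENE_B": [2.0, 15.0, 0.0, 30.0],   # highly variable
    "GENE_C": [8.0, 0.0, 0.0, 0.0],     # expressed in a single tissue
}
stable = [gene for gene, levels in profiles.items() if is_stably_expressed(levels)]
print(stable)
```

Passing `min_tissues=3` corresponds to the stricter tissue requirement described for Group 3.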
Gene characteristics in the groups analyzed
| 981(842,1107)a | 987(906,1028)a | 0.367(0.36-0.37)a | −11.6(-12.4,-11.3)a | 743(627,891)a | 27.8% a | |
| 801(663,935)b | 790(654,943)b | 0.371(0.37-0.30)b | −10.9(-12.1,-10.9)b | 3026(1891,4935)b | 50.6% b | |
| 717(513,856)b | 612(498,726)b | 0.377(0.37-0.38)c | −10.1(-11.3,-8.6)b | 8212(5123,9010)c | 69.0% c | |
| 1071(934,1226)a | 1491(1356,1575)c | 0.367(0.365-0.369)a | −12.0(-12.1,-10.8)a | | 18.0% d | |
* Median (95% confidence interval); + mean (95% confidence interval). Grp, Group; dG, minimum free energy of the mRNA structure formed by 50 bases of the 5′ UTR; GO Tr&RibSy, the percentage of genes that belong to the translation or ribosome synthesis categories. Different letters (a, b, c, d) represent statistically significant differences (p < 0.05) among values in the same column. Differences in GO Tr&RibSy were determined using the chi-squared test for proportions with Yates’ correction.
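The chi-squared test for proportions with Yates’ correction named in the footnote can be sketched for a 2x2 table using only the standard library. The function name and the counts are hypothetical, because the group sizes behind the percentages are not given here. The p-value uses the identity that the upper tail of a chi-squared distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
from math import erfc, sqrt


def yates_chi_squared(a, b, c, d):
    """Yates-corrected chi-squared test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    # Yates' correction subtracts n/2 from |ad - bc|, floored at zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    statistic = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    # Upper-tail probability of chi-squared with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts (not from the study): genes in / not in the
# translation or ribosome synthesis categories, for two groups.
stat, p = yates_chi_squared(30, 90, 45, 45)
print(f"chi-squared = {stat:.2f}, p = {p:.2g}")
```

Comparing more than two groups in one column, as in the table, would require either pairwise tests of this kind or a larger contingency table; which of these the authors used is not stated here.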