Abstract
BACKGROUND: The compactness of highly or broadly expressed genes in human has been explained by selection for efficiency, regional mutation biases, or genomic design. In flowering plants, however, highly expressed genes were shown to be less compact than lowly expressed ones. Conversely, pollen-expressed Arabidopsis genes have been reported to contain shorter introns, and highly expressed moss genes are compact. This issue is important because it offers a chance to compare the selectionist and neutralist views of genome evolution. It also helps to clarify the fates of introns from the perspective of gene expression.
Year: 2009 PMID: 19930585 PMCID: PMC2794262 DOI: 10.1186/1745-6150-4-45
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
The correlations between sequence structural parameters and expression pattern for Arabidopsis and rice genes.
| Rice | ||||||
|---|---|---|---|---|---|---|
| Parameters | ||||||
| Length of primary transcript | -0.073 | -0.253 | -0.001* | -0.002* | -0.226 | 0.085 |
| Length of CDS | -0.222 | -0.314 | -0.167 | -0.119 | -0.165 | -0.086 |
| -0.012** | -0.011* | |||||
| Average exon length | -0.139 | -0.048*** | -0.160 | -0.033*** | 0.024*** | -0.061 |
| 0.038*** | ||||||
| Average intron length | 0.154 | 0.037** | 0.178 | -0.043*** | -0.091 | -0.007* |
| Number of introns | 0.064 | -0.116 | 0.120 | 0.054 | -0.101 | 0.101 |
| Intron density | 0.151 | 0.028 | 0.188 | 0.144 | 0.007 | 0.175 |
| Total intron length | 0.094 | -0.093 | 0.150 | 0.017** | -0.151 | 0.080 |
| 5' UTR length | 0.251 | -0.012*** | 0.297 | 0.010* | -0.109 | 0.066 |
| 3' UTR length | 0.304 | 0.067*** | 0.346 | 0.048*** | -0.097 | 0.104 |
| 5' intergenic length | -0.027** | 0.022** | -0.054 | 0.022** | 0.018** | 0.020** |
| 0.014** | 0.003* | |||||
| 3' intergenic length | -0.045 | 0.044 | -0.076 | -0.047 | 0.012* | -0.061 |
| 0.037*** | ||||||
For each structural parameter, the first line gives the Spearman's rank correlation with expression pattern, whereas the second line gives the Spearman's partial correlation. The controlled variable for the Exp columns is expression breadth, whereas that for the Width columns is Exp. Intron density was defined as the ratio of intron number to CDS length, i.e. intron number per coding base. Exp, total expression level; Exp, average expression level; Width, expression breadth. CDS, coding sequence; UTR, untranslated region. Significance of correlations: no asterisks, P < 1e-10; ***, 1e-10 0.05. Numbers in bold indicate highly significant partial correlations (P < 1e-10).
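The partial correlations in the table above control one expression variable while correlating a structural parameter with the other. A minimal sketch of one common way to compute such a value is given here, assuming the first-order partial correlation formula applied to Spearman coefficients. The source does not state which software or exact procedure was used, so the function names (`ranks`, `spearman`, `partial_spearman`) and the toy data are illustrative assumptions, not the authors' code.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Correlation of x and y with z held constant (first-order formula)."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    # Hypothetical toy values, not data from the study.
    intron_length = [1, 2, 3, 4, 5]
    expression_level = [2, 1, 4, 3, 5]
    expression_breadth = [1, 3, 2, 5, 4]
    print(spearman(intron_length, expression_level))
    print(partial_spearman(intron_length, expression_level, expression_breadth))
```

In the table's terms, `x` would be a structural parameter, `y` the expression variable of the column, and `z` the controlled variable named in the note.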
Figure 1. Boxplots of structural characteristics versus expression level for Arabidopsis and rice. In each graph, the x-axis represents gene expression level and the boxes represent the range of the parameter for each gene group: bold central lines mark the medians, lower and upper box boundaries mark the first and third quartiles respectively, and whiskers extend to the most extreme points within 1.5 × the interquartile range from the boxes. The red curves represent the mean values of the parameters for each expression group, whereas horizontal dark violet lines indicate the population median for each structural parameter. Presented parameters are: CDS length in (a) Arabidopsis and (b) rice; total intron length per gene in (c) Arabidopsis and (d) rice; number of introns per gene in (e) Arabidopsis and (f) rice. Differences in structural parameters between expression groups are statistically significant (all Kruskal-Wallis rank sum tests, P < 1e-50).
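The box and whisker conventions described in the Figure 1 caption can be sketched as follows. This is an illustration only: the function name `box_stats` and the sample values are hypothetical, and the quartile method (`statistics.quantiles` with `method="inclusive"`) is an assumption that may differ slightly from the hinge definition used by the plotting software behind the figure.

```python
from statistics import quantiles


def box_stats(values):
    """Summarise one expression group the way a boxplot draws it."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers reach the most extreme data points that are still inside
    # the fences; anything beyond them would be drawn as an outlier.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


if __name__ == "__main__":
    # Hypothetical CDS lengths for one expression group.
    print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```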
Figure 2. Extreme values of transcript lengths for plant genes scale as power laws of expression levels. In each graph, each point represents one gene in the whole dataset, whereas triangles denote the data subset used to fit the regression line. All axes are on a logarithmic scale. Expression data were taken from MPSS experiments [19].
Extreme values of structural parameters scale as power-laws of expression levels for plant genes.
| Rice | ||||||
|---|---|---|---|---|---|---|
| Parameters | ||||||
| Primary transcript length | -0.32(0.88) | -0.44(0.87) | -0.35(0.84) | -0.55(0.82) | -0.52(0.87) | -0.50(0.82) |
| Intron number per gene | -0.36(0.84) | -0.52(0.84) | -0.34(0.79) | -0.59(0.73) | -0.51(0.74) | -0.63(0.77) |
| Total intron length | -0.36(0.85) | -0.51(0.86) | -0.39(0.77) | -0.64(0.72) | -0.60(0.78) | -0.76(0.78) |
| Average intron length | -0.23(0.66) | -0.36(0.67) | -0.22(0.42) | -0.58(0.70) | -0.56(0.79) | -0.66(0.77) |
| CDS length | -0.33(0.86) | -0.45(0.83) | -0.36(0.80) | -0.51(0.81) | -0.44(0.81) | -0.50(0.80) |
Exponents of the power laws are shown. For each combination of structural and expression variables, the linear regression was done as follows: both variables were first log-transformed; the range of the expression variable was then divided equally into ~100 intervals; for each interval, the median expression value and the maximum structural value were selected; these maximum dependent values were fitted against the median independent values. Numbers in parentheses show the coefficients of determination (r2) for the linear regressions. For all regressions, P-value < 2e-16 according to analysis of variance. Exp, total expression level; Exp, average expression level; Peak, peak expression level across tissues.
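The regression procedure described in the note above can be sketched roughly as below. This is a minimal reading of the stated steps, not the authors' implementation: the function name `binned_max_fit`, the base-10 logarithm, the handling of empty intervals (they are skipped), and the synthetic data are all assumptions.

```python
import random
from math import log10
from statistics import median


def binned_max_fit(expression, structure, n_bins=100):
    """Fit log(max structure per bin) against log(median expression per bin).

    Returns (slope, intercept, r_squared); the slope is the power-law exponent.
    Both inputs must contain strictly positive values.
    """
    log_x = [log10(v) for v in expression]
    log_y = [log10(v) for v in structure]
    low, high = min(log_x), max(log_x)
    if high == low:
        raise ValueError("expression values must span a non-zero range")
    width = (high - low) / n_bins

    # Assign each gene to an equal-width interval of log expression.
    bins = {}
    for a, b in zip(log_x, log_y):
        index = min(int((a - low) / width), n_bins - 1)
        bins.setdefault(index, []).append((a, b))

    # One point per non-empty interval: median expression, maximum structure.
    xs = [median(a for a, _ in points) for points in bins.values()]
    ys = [max(b for _, b in points) for points in bins.values()]

    # Ordinary least squares on the reduced points.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Synthetic genes whose upper length bound shrinks with expression.
    random.seed(0)
    expr = [10 ** random.uniform(0, 3) for _ in range(2000)]
    length = [random.uniform(1, 5000 * e ** -0.4) for e in expr]
    print(binned_max_fit(expr, length))
```

Fitting only the per-interval maxima targets the upper envelope of the scatter, which is why the exponents in the table can be strongly negative even where the rank correlations over all genes are weak.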