Salvatore Camiolo, Domenico Rau, Andrea Porceddu.
Abstract
Recently, features of gene expression profiles have been associated with structural parameters of gene sequences in organisms representing a diverse set of taxa. The emerging picture indicates that natural selection, mediated by gene expression profiles, has a significant role in determining genic structures. However, the situation is less clear in plants, as the available data indicate that the effect of natural selection mediated by gene expression is very weak. Moreover, the direction of the patterns in plants appears to contradict those observed in animal genomes. In the present work we analyzed expression data for >18000 Arabidopsis genes retrieved from public datasets obtained with different technologies (MPSS and high-density chip arrays) and compared them with gene parameters. Our results show that the impact of natural selection mediated by expression on gene sequences is significant and distinguishable from the effects of regional mutational biases. In addition, we provide evidence that the level and the breadth of gene expression are related in opposite ways to many structural parameters of gene sequences. Higher levels of expression abundance are associated with smaller transcripts, consistent with the need to reduce the costs of both transcription and translation. Expression breadth, however, shows a contrasting pattern: longer genes have a higher breadth of expression, possibly to ensure those structural features associated with gene plasticity. Based on these results, we propose that the specific balance between these two selective forces plays a significant role in shaping the structure of Arabidopsis genes.
Year: 2009 PMID: 19633720 PMCID: PMC2712092 DOI: 10.1371/journal.pone.0006356
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Clustering based on the correlations (Pearson's r) among the different measures of expression considered, for both (A) oligo-array and (B) MPSS.
The experimental unit is the developmental stage in the DS method and the organ in the O method. Expression profiles were reduced to discrete measures considering all (A-methods) or only informative (I-methods) experimental units.
Correlation between structural genomic parameters and both gene expression level and expression breadth.
| Gene characteristics | DS-I | O-I | DS-EB_w0 | O-EB_w0 |
|---|---|---|---|---|
| Gene length | −0.142* | −0.064* | 0.136* | 0.108* |
| 5′ UTR length | −0.023† | 0.030‡ | 0.154* | 0.125* |
| CDS length | −0.164* | −0.125* | 0.014 | 0.014 ns |
| Intron length | −0.022† | 0.018§ | 0.174* | 0.124* |
| Intron length_w0 | −0.102* | 0.014 ns | 0.152* | 0.141* |
| 3′ UTR length | 0.074* | 0.141* | 0.202* | 0.178* |
| Number of exons | −0.079* | −0.015 | 0.132* | 0.099* |
| Average exon length | −0.028* | −0.061* | −0.159 | −0.084* |
| Number of introns | −0.079* | −0.015* | 0.132* | 0.099* |
| Number introns_w0 | −0.099* | −0.016 ns | 0.090* | 0.099* |
| Average intron length | 0.008 ns | 0.045* | 0.146* | 0.042* |
| Average intron length_w0 | 0.003 ns | 0.050* | 0.060* | 0.044* |
| Intron Density | 0.007 ns | 0.078* | 0.188* | 0.124* |
| Intron Density_w0 | 0.004 ns | 0.078* | 0.151* | 0.124* |
Data are presented for both oligo-array and MPSS assays. Statistical significance: ns = not significant; §P<0.05; †P<0.01; ‡P<0.001; *P<0.0001. Sample sizes for the correlations ranged from n = 12051 to n = 14236. Note 1. For these calculations, genes for which expression was zero were not considered, to avoid potential problems arising from defective probes (Eisenberg and Levanon, 2003). Note 2. For intron parameters, data are presented both considering and excluding (w0) genes without introns.
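The filtering step described in Note 1 can be illustrated with a minimal sketch: genes with zero expression are dropped before Pearson's r is computed between a structural parameter and the expression measure. The field names (`gene_length`, `expression`) and the toy values are hypothetical and are not taken from the study; whether any transformation was applied before correlating is not stated here, so none is assumed.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)

def correlate_expressed(genes, param, expr="expression"):
    """Correlate `param` with expression, skipping genes whose expression is zero."""
    kept = [g for g in genes if g[expr] > 0]
    r = pearson_r([g[param] for g in kept], [g[expr] for g in kept])
    return r, len(kept)

# Toy records (hypothetical values, for illustration only).
toy_genes = [
    {"gene_length": 1000, "expression": 40},
    {"gene_length": 2000, "expression": 30},
    {"gene_length": 3000, "expression": 20},
    {"gene_length": 4000, "expression": 10},
    {"gene_length": 500, "expression": 0},  # excluded: zero expression
]

r, n_used = correlate_expressed(toy_genes, "gene_length")
```

In this toy case the fifth gene is excluded, so the correlation is computed on four genes only.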
Figure 2. Relationship between the expression profile and (a) the CDS length, (b) the total intron length and (c) the 5′ UTR length.
The points indicated by the symbol • and interpolated by the continuous line refer to the expression level, whereas the points indicated by the symbol ○ and interpolated by a dashed line refer to the expression breadth. In both cases 14236 genes were considered and grouped into bins, each representing 5% of the whole dataset.
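The binning described in the Figure 2 legend can be sketched as follows. This is an illustration only: it assumes that genes are ranked by one variable and split into 20 equal-sized bins (5% each), with each bin summarised by its means. Which variable was used for ranking, and which summary statistic was plotted, are assumptions; the function name `bin_means` is hypothetical.

```python
def bin_means(xs, ys, n_bins=20):
    """Sort (x, y) pairs by x, split them into n_bins equal-sized groups,
    and return the (mean x, mean y) of each group."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    if n < n_bins:
        raise ValueError("need at least one observation per bin")
    summary = []
    for b in range(n_bins):
        # Integer arithmetic keeps bin sizes within one item of each other.
        lo, hi = b * n // n_bins, (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        mean_x = sum(x for x, _ in chunk) / len(chunk)
        mean_y = sum(y for _, y in chunk) / len(chunk)
        summary.append((mean_x, mean_y))
    return summary

# Toy data: 40 observations, so each 5% bin holds two of them.
toy_x = list(range(40))
toy_y = [2 * x for x in toy_x]
bins = bin_means(toy_x, toy_y)
```

With 14236 genes, each of the 20 bins would hold roughly 712 genes under this scheme.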
Correlation between structural genomic parameters and both gene expression level and expression breadth.
| Genomic variables | DS-I | O-I | DS-EB_w0 | O-EB_w0 |
|---|---|---|---|---|
| GC% Gene | 0.033* | 0.043* | 0.046* | 0.084* |
| GC% 5′ UTR | −0.077* | 0.021§ | 0.156* | 0.137* |
| GC% CDS | 0.222* | 0.254* | 0.109* | 0.084* |
| GC% intron_w0 | −0.079* | 0.025† | 0.286* | 0.234* |
| GC% 3′ UTR | −0.045* | 0.000 ns | 0.124* | 0.124* |
| Length of intergenic spacers | 0.027† | 0.017† | −0.109* | −0.063* |
| Length of intergenic spacers_w0 | 0.024† | 0.016 ns | −0.123* | −0.069* |
| GC% intergenic spacers | 0.008 ns | 0.000 ns | 0.076* | 0.045* |
| GC% of RNA | 0.074* | 0.115* | 0.121* | 0.093* |
Data are presented for both oligo-array and MPSS assays. Statistical significance: ns = not significant; §P<0.05; †P<0.01; ‡P<0.001; *P<0.0001. Sample sizes for the correlations ranged from n = 12051 to n = 14236. Note. For these calculations, genes for which expression was zero were not considered, to avoid potential problems arising from defective probes (Eisenberg and Levanon, 2003).
Multiple regression analysis of EL, EB and several gene parameters.
| | Microarray | | | | | | MPSS | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Independent variables in the models | | | | | | Independent variables in the models | | | | | |
| Dependent variable | EL,EB | | EL,EB & regional | | EL,EB, regional & genic | | EL,EB | | EL,EB & regional | | EL,EB, regional & genic | |
| | βEL | βEB | βEL | βEB | βEL | βEB | βEL | βEB | βEL | βEB | βEL | βEB |
| 5′ length | | | | | | | | | | | −0.021§ | |
| CDS length | | | | | | −0.025† | | | | | − | |
| Intron length | | | | | | | | | | | −0.010 ns | |
| 3′ length | 0.005 ns | | 0.001 ns | | 0.035‡ | | | | | | | |
| PT length | | | | | n.a. | n.a. | | | | | n.a. | n.a. |
| Intron number⁴ | | | | | | | | | | | 0.020† | −0.010 ns |
Results from multiple-regression analyses relating the level (EL) and breadth (EB) of gene expression to length (of 5′, CDS, intron, 3′ and primary transcript, PT) when controlling for regional and genic effects. The results for intron number are also shown. All lengths were log10 transformed.
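As a rough illustration of the simplest model in the table (EL and EB as the only independent variables), the sketch below computes two-predictor regression coefficients for a log10-transformed length. It assumes the reported β values are standardized coefficients, which is not confirmed here, and uses the textbook closed form for two predictors rather than any particular statistics package. The function names and toy values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r; assumes equal lengths and non-constant inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def standardized_betas(lengths, el, eb):
    """Standardized coefficients of log10(length) regressed on EL and EB.

    For two predictors:
        beta_EL = (r_yEL - r_yEB * r_ELEB) / (1 - r_ELEB**2)
    and symmetrically for beta_EB.
    """
    y = [math.log10(v) for v in lengths]  # lengths are log10 transformed
    r_y_el = pearson_r(y, el)
    r_y_eb = pearson_r(y, eb)
    r_el_eb = pearson_r(el, eb)
    denom = 1.0 - r_el_eb ** 2
    if denom == 0:
        raise ValueError("EL and EB are perfectly collinear")
    beta_el = (r_y_el - r_y_eb * r_el_eb) / denom
    beta_eb = (r_y_eb - r_y_el * r_el_eb) / denom
    return beta_el, beta_eb

# Toy data (hypothetical): length falls with EL and rises with EB.
toy_el = [1.0, 2.0, 3.0, 4.0]
toy_eb = [1.0, -1.0, -1.0, 1.0]  # uncorrelated with toy_el by construction
toy_lengths = [10 ** (3.0 - 0.1 * a + 0.2 * b) for a, b in zip(toy_el, toy_eb)]

beta_el, beta_eb = standardized_betas(toy_lengths, toy_el, toy_eb)
```

The point of fitting EL and EB jointly is that each coefficient is adjusted for the other predictor, which matters because the two expression measures are correlated in real data. The larger models with regional and genic covariates would need a general least-squares solver instead of this two-predictor shortcut.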
Note. PT = primary transcript; n.a. = not applicable. Significance levels: ns = not significant; §P<0.05; †P<0.01; ‡P<0.001; *P<0.0001. In bold: values that are significant after Bonferroni correction for multiple regression. The alpha level was adjusted separately for each type of model: EL,EB: 0.05/2 = 0.025; EL,EB & regional: 0.05/4 = 0.0125; EL,EB, regional & genic: 0.05/9 = 0.0056 (see Mundfrom et al., 2006).
Regional variables: intergenic spacer length and intergenic spacer GC content. Genes without intergenic spacers were excluded.
Genic variables: 5′, CDS, intron, 3′ and primary transcript lengths, and intron number. Genes without introns were not included in the analyses.
⁴ Genes without introns were excluded from the analyses.
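The Bonferroni-adjusted alpha levels quoted in the note to the regression table can be reproduced with a few lines of arithmetic. The divisors (2, 4 and 9) are taken directly from that note; the function and dictionary names are hypothetical.

```python
def bonferroni_alpha(alpha, n_terms):
    """Divide the family-wise alpha by the number of terms in the model."""
    if n_terms < 1:
        raise ValueError("n_terms must be a positive integer")
    return alpha / n_terms

# Divisors as stated in the table note.
model_terms = {
    "EL,EB": 2,
    "EL,EB & regional": 4,
    "EL,EB, regional & genic": 9,
}

thresholds = {name: bonferroni_alpha(0.05, k) for name, k in model_terms.items()}

for name, value in thresholds.items():
    print(f"{name}: alpha = {value:.4f}")
```

A coefficient would then count as significant in a given model only if its P value falls below that model's threshold.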
Figure 3. Results of multiple regression analyses between intron length and expression level (EL) and breadth (EB) by intron position.
The analysis was conducted considering both microarray (ma) and MPSS (mp) data and all genes. Full circles: significant at P<0.05; empty circles: not significant (P>0.05).
Figure 4. Relation between the expression level (oligo-array DS-I) and the total domain length and non-domain protein length.
Variables on the Y axis have been normalized.