Liran Carmel, Eugene V Koonin.
Abstract
Analysis of gene architecture and expression levels of four organisms, Homo sapiens, Caenorhabditis elegans, Drosophila melanogaster, and Arabidopsis thaliana, reveals a surprising, nonmonotonic, universal relationship between expression level and gene compactness. With increasing expression level, the genes tend at first to become longer but, from a certain level of expression, they become more and more compact, resulting in an approximate bell-shaped dependence. There are two leading hypotheses to explain the compactness of highly expressed genes. The selection hypothesis predicts that gene compactness is predominantly driven by the level of expression, whereas the genomic design hypothesis predicts that expression breadth across tissues is the driving force. We observed the connection between gene expression breadth in humans and gene compactness to be significantly weaker than the connection between expression level and compactness, a result that is compatible with the selection hypothesis but not the genomic design hypothesis. The initial gene elongation with increasing expression level could be explained, at least in part, by accumulation of regulatory elements enhancing expression, in particular, in introns. This explanation is compatible with the observed positive correlation between intron density and expression level of a gene. Conversely, the trend toward increasing compactness for highly expressed genes could be caused by selection for minimization of energy and time expenditure during transcription and splicing and for increased fidelity of transcription, splicing, and/or translation that is likely to be particularly critical for highly expressed genes. Regardless of the exact nature of the forces that shape the gene architecture, we present evidence that, at least in animals, coding and noncoding parts of genes show similar architectonic trends.
Keywords: eukaryotic gene architecture; eukaryotic gene structure; genomic design; intron density; intron functionality; selection on gene compactness
Year: 2009 PMID: 20333206 PMCID: PMC2817431 DOI: 10.1093/gbe/evp038
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Figure 1. Total length variables as functions of expression-level category. All lengths are measured as number of nucleotides. Expression levels are binned into 30 categories, with higher categories matching higher expression levels. Each dot is the mean value for all genes in the given expression category, and the error bar indicates the standard deviation of the mean. Dark areas depict the extent (standard error) of the bend point where the trend changes from increasing to decreasing, according to SegReg.
The Nonmonotonic Dependence between Gene Length Variables and Expression
| Variable | Organism | Bend Point | Left Fraction | Left Slope | Right Slope |
|---|---|---|---|---|---|
| Total transcript length | HS | 12.6 ± 0.5 | 0.37 | 1,170 ± 487 | -2,440 ± 243 |
| | DM | 17.5 ± 0.7 | 0.59 | 118 ± 24 | -119 ± 19 |
| | CE | 23.9 ± 0.4 | 0.80 | 84.4 ± 4.7 | -258 ± 18 |
| | AT | 15.8 ± 1.3 | 0.80 | 22.5 ± 4.2 | -68.6 ± 3.0 |
| CDS length | HS | 21.9 ± 0.2 | 0.76 | 6.17 ± 3.4 | -109 ± 12 |
| | DM | 22.8 ± 0.5 | 0.79 | 30.9 ± 3.4 | -102 ± 7 |
| | CE | 20.1 | 0.71 | 17.2 ± 1.8 | -38.8 ± 6.5 |
| | AT | 15.8 ± 16.6 | 0.80 | 6.09 ± 2.5 | -55.5 ± 1.7 |
| Total exon length | HS | 21.6 ± 6.3 | 0.76 | 18.3 ± 4.8 | -171 ± 7 |
| | DM | 22.8 ± 0.5 | 0.79 | 42.4 ± 3.6 | -81.4 ± 8.0 |
| | CE | 22.2 | 0.77 | 25.7 ± 1.7 | -57.9 ± 9.2 |
| | AT | 15.8 ± 2.7 | 0.80 | 12.7 ± 2.5 | -55.8 ± 1.8 |
| Total intron length | HS | 12.6 ± 0.5 | 0.37 | 1,070 ± 484 | -2,370 ± 241 |
| | DM | 16.9 ± 0.6 | 0.56 | 76.1 ± 17.4 | -101 ± 25 |
| | CE | 22.5 ± 0.4 | 0.77 | 60.7 ± 4.3 | -161 ± 14 |
| | AT | 15.8 ± 1.0 | 0.80 | 9.8 ± 2.3 | -12.8 ± 1.7 |
| Mean exon length | HS | | | -5.58 ± 1.31 | -13.40 ± 1.76 |
| | DM | | | -2.09 ± 1.32 | -22.2 ± 2.26 |
| | CE | | | 1.69 ± 0.34 | 4.62 ± 0.29 |
| | AT | 9.4 ± 0.4 | 0.64 | 4.25 ± 2.14 | -11.10 ± 1.22 |
| Mean intron length | HS | | | -93 ± 29 | -237 ± 37 |
| | DM | 14 ± 2.3 | 0.48 | 9.48 ± 6.45 | -10.2 ± 2.9 |
| | CE | 21.9 ± 0.6 | 0.74 | 7.77 ± 0.92 | -17.2 ± 2.2 |
| | AT | | | | |
The results of segmented regression applied to the data in figure 1. Bend point: The expression-level category that is the border between the increasing and the decreasing parts, as decided by SegReg. The ± symbol indicates standard error (SE). Whenever the curve does not show Λ-shape, the bend point is not reported. Left fraction: The fraction of genes in the increasing part of the curve. Left slope: The slope of the left part of the curve, as computed by SegReg. The ± symbol indicates SE. Right slope: The slope of the right part of the curve, as computed by SegReg. The ± symbol indicates SE. If not otherwise indicated, SegReg found statistical support in favor of two joint linear segments.
SegReg found statistical support for two disjoint linear segments.
Computation was made on median values.
Computation failed in SegReg (software crashed).
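The table notes describe fitting two joint linear segments and reporting a bend point, a left slope, and a right slope. The sketch below illustrates that general idea with a simple grid search over candidate breakpoints and ordinary least squares. It is not SegReg's own algorithm, which the source does not describe; the function name `fit_two_segments`, the grid resolution, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def fit_two_segments(x, y, n_candidates=200):
    """Fit a continuous two-segment line by grid search over the breakpoint.

    Model (an illustrative assumption, not SegReg's procedure):
        y = a + b1 * x + b2 * max(x - c, 0)
    so the left slope is b1 and the right slope is b1 + b2.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    # Candidate breakpoints strictly inside the observed x range
    for c in np.linspace(x.min() + 1, x.max() - 1, n_candidates):
        design = np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0.0)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((design @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, c, coef[1], coef[1] + coef[2])
    _, bend_point, left_slope, right_slope = best
    return bend_point, left_slope, right_slope

# Synthetic, noise-free example: 30 expression categories, rising until
# category 20 and falling afterwards (made-up numbers, not the paper's data).
categories = np.arange(1, 31)
mean_length = np.where(categories <= 20,
                       1000 + 50 * categories,
                       2000 - 100 * (categories - 20))
bend_point, left_slope, right_slope = fit_two_segments(categories, mean_length)
print(bend_point, left_slope, right_slope)
```

On this synthetic input the recovered bend point should fall close to category 20, with a positive left slope and a negative right slope, which is the Λ-shape the table notes refer to.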
Figure 2. Mean lengths of exons and introns as functions of expression-level category. All designations are as in figure 1.
Raw versus Binned Correlation Coefficients
| Organism | CDS Length - Total Intron Length: Raw Correlation | CDS Length - Total Intron Length: Binned Correlation | P Value | Intron Density - Expression Level: Raw Correlation | Intron Density - Expression Level: Binned Correlation | P Value |
|---|---|---|---|---|---|---|
| HS | 0.30 | 0.85 | 0.01 | 0.11 | 0.93 | <0.001 |
| DM | 0.20 | 0.69 | 0.01 | 0.13 | 0.73 | <0.001 |
| CE | 0.28 | 0.81 | 0.01 | -0.20 | -0.93 | <0.001 |
| AT | 0.52 | 0.58 | 0.40 | 0.06 | 0.84 | <0.001 |
Raw correlation: The correlation was computed using all the genes. Binned correlation: The correlation was computed between the mean values across the expression-level categories. P value: The significance of the correlation change due to the binning was computed using 1,000 bootstrap repetitions.
Pearson (linear) correlation.
Spearman correlation.
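The difference between a raw and a binned correlation, as defined in the notes above, can be illustrated with a short sketch. It assumes equal-count bins ordered by expression level and uses Pearson correlation throughout; the source does not state how the 30 categories were constructed, so the binning rule, the function name `raw_and_binned_correlation`, and the synthetic data are all assumptions. The bootstrap significance test mentioned in the notes is not reproduced here.

```python
import numpy as np

def raw_and_binned_correlation(x, y, expression, n_bins=30):
    """Return (raw, binned) Pearson correlations between x and y.

    raw:    computed over all genes.
    binned: computed between the per-category means of x and y, where the
            categories are equal-count bins of genes sorted by expression
            (an assumed binning rule).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    expression = np.asarray(expression, dtype=float)

    raw = np.corrcoef(x, y)[0, 1]

    order = np.argsort(expression)           # gene indices, low to high expression
    bins = np.array_split(order, n_bins)     # equal-count expression categories
    x_means = np.array([x[idx].mean() for idx in bins])
    y_means = np.array([y[idx].mean() for idx in bins])
    binned = np.corrcoef(x_means, y_means)[0, 1]
    return raw, binned

# Synthetic example: two noisy variables that both depend weakly on expression.
rng = np.random.default_rng(0)
expression = rng.normal(size=3000)
x = expression + rng.normal(scale=3.0, size=3000)
y = expression + rng.normal(scale=3.0, size=3000)
raw, binned = raw_and_binned_correlation(x, y, expression)
print(raw, binned)
```

Averaging within categories removes much of the gene-to-gene noise, so the binned coefficient is typically far larger in magnitude than the raw one, which is the pattern visible in the table.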
Figure 3. Intron density of genes as a function of expression-level category. The intron density is measured as the number of introns per kilobase of the CDS. Color codes: blue = human, black = Drosophila, red = nematode, and green = Arabidopsis. All other designations are as in figure 1.
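The intron density defined in this caption, introns per kilobase of CDS, amounts to a one-line calculation. The helper name `intron_density` and the example gene are hypothetical.

```python
def intron_density(n_introns, cds_length_nt):
    """Number of introns per kilobase of coding sequence (CDS)."""
    if cds_length_nt <= 0:
        raise ValueError("CDS length must be positive")
    return n_introns / (cds_length_nt / 1000.0)

# Hypothetical gene: 6 introns in a 1,500-nucleotide CDS
density = intron_density(6, 1500)
print(density)  # 4.0 introns per kb
```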
Figure 4. The universal nonmonotonic relationship between the expression level of a gene and its compactness: a schematic depiction.