Xin-Ying Ren, Willem J Stiekema, Jan-Peter Nap.
Abstract
Chromosomal coexpression domains are found in a number of different genomes under various developmental conditions. The size of these domains and the number of genes they contain vary. Here, we define local coexpression domains as adjacent genes for which all possible pair-wise correlations of expression data are higher than 0.7. In rice, such local coexpression domains consist predominantly of two genes, with a maximum of four, and make up approximately 5% of the neighboring genes in the genome when different expression platforms from the public domain are examined. The genes in local coexpression domains do not fall into the same ontology category significantly more often than neighboring genes that are not coexpressed. Duplication, orientation, and the distance between the genes do not, on their own, explain coexpression. Coexpression is therefore thought to be regulated at the level of chromatin structure. The characteristics of the local coexpression domains in rice are strikingly similar to those of such domains in the Arabidopsis genome. Yet no microsynteny between local coexpression domains in Arabidopsis and rice could be identified. Although the rice genome is not yet as extensively annotated as the Arabidopsis genome, the lack of conservation of local coexpression domains may indicate that such domains have not played a major role in the evolution of genome structure or in genome conservation.
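The domain definition in the abstract can be made concrete with a small sketch. It assumes Pearson correlation (the abstract does not name the correlation measure) and expression profiles supplied as lists ordered by chromosomal position. The helper names `pearson` and `coexpressed_windows` and the toy profiles are hypothetical, not taken from the paper.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a flat profile counts as uncorrelated
    return sxy / sqrt(sxx * syy)

def coexpressed_windows(profiles, size, threshold=0.7):
    """Start indices of windows of `size` adjacent genes in which
    every pair-wise correlation exceeds `threshold`."""
    hits = []
    for start in range(len(profiles) - size + 1):
        window = profiles[start:start + size]
        if all(pearson(a, b) > threshold for a, b in combinations(window, 2)):
            hits.append(start)
    return hits

# Invented toy profiles, one list per gene, in chromosomal order.
profiles = [
    [1.0, 2.0, 3.0, 4.0],  # gene 0
    [2.0, 4.1, 6.2, 7.9],  # gene 1: tracks gene 0
    [1.1, 1.9, 3.2, 4.1],  # gene 2: tracks gene 0
    [4.0, 3.0, 2.0, 1.0],  # gene 3: anti-correlated with gene 0
    [1.0, 3.0, 2.0, 4.0],  # gene 4
]
pairs = coexpressed_windows(profiles, 2)
triplets = coexpressed_windows(profiles, 3)
quadruplets = coexpressed_windows(profiles, 4)
```

Requiring every pair in a window to pass the threshold is the strict reading of the definition, so a triplet is counted only when all three of its pair-wise correlations exceed 0.7.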
Year: 2007 PMID: 17641976 PMCID: PMC2039854 DOI: 10.1007/s11103-007-9209-0
Source DB: PubMed Journal: Plant Mol Biol ISSN: 0167-4412 Impact factor: 4.076
Description of rice expression data used for whole-genome local coexpression analysis
| | MPSS | MA |
|---|---|---|
| Excluding overlapping genes | 23,146 | 14,789 |
| Without expressed neighbor(s) | 5,081 | 5,438 |
| Represented in pairs | 18,065 | 9,351 |
| Total adjacent pairs | 12,920 | 6,032 |
| Tandemly duplicated pairs (td) | 1,663 (12.9%)<sup>a</sup> | 573 (9.5%)<sup>a</sup> |
| Coexpressed | 584 (4.5%)<sup>b</sup> | 320 (5.3%)<sup>b</sup> |
| Total adjacent pairs excluding td | 11,257 | 5,459 |
| Coexpressed excluding td | 438 (3.9%)<sup>c</sup> | 288 (5.3%)<sup>c</sup> |
| Total coexpressed adjacent pairs | 584 | 320 |
| Tandemly duplicated pairs | 146 (25%)<sup>d</sup> | 32 (10%)<sup>d</sup> |
| Total tandemly duplicated pairs | 1,663 | 573 |
| Coexpressed | 146 (8.8%)<sup>e</sup> | 32 (5.6%)<sup>e</sup> |

<sup>a</sup> Percentage of tandemly duplicated pairs relative to the total number of adjacent pairs
<sup>b</sup> Percentage of coexpressed adjacent pairs relative to the total number of adjacent pairs
<sup>c</sup> Percentage of coexpressed adjacent pairs excluding td relative to the total number of adjacent pairs excluding tandemly duplicated pairs
<sup>d</sup> Percentage of coexpressed tandemly duplicated pairs relative to the total number of coexpressed adjacent pairs
<sup>e</sup> Percentage of coexpressed tandemly duplicated pairs relative to the total number of tandemly duplicated pairs
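As a quick arithmetic check, the percentages described in footnotes a to e follow directly from the counts in the MPSS column of the table above. The variable names and the `pct` helper are illustrative only.

```python
# Counts copied from the MPSS column of the table above.
adjacent_pairs = 12920
tandem_pairs = 1663
coexpressed_pairs = 584
coexpressed_tandem_pairs = 146

def pct(part, whole):
    """Percentage of `part` relative to `whole`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)

percentages = {
    # a: tandemly duplicated pairs among all adjacent pairs
    "a": pct(tandem_pairs, adjacent_pairs),
    # b: coexpressed pairs among all adjacent pairs
    "b": pct(coexpressed_pairs, adjacent_pairs),
    # c: the same ratio after removing tandem duplicates from both counts
    "c": pct(coexpressed_pairs - coexpressed_tandem_pairs,
             adjacent_pairs - tandem_pairs),
    # d: tandem duplicates among coexpressed pairs
    "d": pct(coexpressed_tandem_pairs, coexpressed_pairs),
    # e: coexpressed pairs among tandem duplicates
    "e": pct(coexpressed_tandem_pairs, tandem_pairs),
}
```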
Local coexpression domains in the rice genome
| Data set | Rice genome: total<sup>a</sup> | Rice genome: coexpressed<sup>b</sup> | Random genome (100×): average<sup>c</sup> | P-value<sup>d</sup> |
|---|---|---|---|---|
| Pairs | | | | |
| MPSS + td<sup>e</sup> | 12,920 | 584 (4.52%) | 408 ± 17 | 1.46 × 10<sup>−17</sup> |
| MPSS-td<sup>f</sup> | 11,257 | 438 (3.89%) | 356 ± 21 | 2.17 × 10<sup>−6</sup> |
| MA + td<sup>g</sup> | 6,032 | 320 (5.30%) | 301 ± 17 | 0.012 |
| MA-td<sup>h</sup> | 5,459 | 288 (5.28%) | 271 ± 16 | 0.014 |
| Triplets | | | | |
| MPSS + td | 7,775 | 23 (0.30%) | 8.78 ± 2.9 | 2.95 × 10<sup>−5</sup> |
| MPSS-td | 6,831 | 13 (0.19%) | 7.74 ± 3.0 | 0.025 |
| MA + td | 2,461 | 5 (0.20%) | 6.54 ± 2.7 | n.s. |
| MA-td | 2,149 | 3 (0.14%) | 5.10 ± 2.4 | n.s. |
| Quadruplets | | | | |
| MPSS + td | 4,887 | 3 (0.06%) | 0.24 ± 0.47 | 1.81 × 10<sup>−3</sup> |
| MPSS-td | 4,318 | 0 (0%) | 0.18 ± 0.39 | n.s. |
| MA + td | 1,079 | 0 (0%) | 0.14 ± 0.37 | n.s. |
| MA-td | nd<sup>i</sup> | nd | nd | nd |

<sup>a</sup> Total number of pairs, triplets, quadruplets in each data set
<sup>b</sup> Coexpressed pairs, triplets, quadruplets in each data set. Percentages in brackets are coexpressed relative to the total
<sup>c</sup> Average plus/minus standard deviation from 100 randomizations
<sup>d</sup> P-value according to the cumulative binomial distribution (Cohen et al. 2000) for obtaining such a result by chance. P < 0.05 is considered significant; n.s.: not significant
<sup>e</sup> MPSS data set including tandemly duplicated genes
<sup>f</sup> MPSS data set excluding tandemly duplicated genes
<sup>g</sup> MA data set including tandemly duplicated genes
<sup>h</sup> MA data set excluding tandemly duplicated genes
<sup>i</sup> Not determined
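The comparison against a randomized genome and the cumulative binomial P-value can be sketched as follows. This is only an illustration under stated assumptions: the randomization is taken to be a shuffle of gene order, the per-trial chance probability is estimated as the random average divided by the total, and the toy labels stand in for real coexpression calls. The source states neither how the randomization was performed nor which probability entered the binomial, so this sketch need not reproduce the tabulated P-values.

```python
import math
import random
import statistics

def binomial_tail(n, k, p):
    """Upper cumulative binomial tail P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p**i * (1 - p)**(n - i), to avoid overflow
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)

def randomized_counts(items, count_fn, rounds=100, seed=0):
    """Mean and standard deviation of count_fn over shuffled gene orders."""
    rng = random.Random(seed)
    counts = []
    for _ in range(rounds):
        shuffled = items[:]
        rng.shuffle(shuffled)
        counts.append(count_fn(shuffled))
    return statistics.mean(counts), statistics.stdev(counts)

def adjacent_matches(labels):
    """Toy stand-in for counting coexpressed pairs: neighbors with equal labels."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a == b)

# Invented toy genome: one label per gene, in chromosomal order.
genome = ["A", "A", "B", "B", "B", "C", "A", "C", "C", "D"] * 5
n_pairs = len(genome) - 1
observed = adjacent_matches(genome)
mean_random, sd_random = randomized_counts(genome, adjacent_matches)
p_chance = mean_random / n_pairs  # assumption: chance rate from the random average
p_value = binomial_tail(n_pairs, observed, p_chance)
```

Summing the tail in log space keeps the computation stable for totals in the tens of thousands, where the binomial coefficients themselves would overflow a float.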
Fig. 1 Distribution of local coexpression domains over all 12 rice chromosomes. Rectangles are schematic representations of chromosomes 1–12 from top to bottom. The numbers at the top show the scale in million bases along the chromosomes. Each gene in a local coexpression domain is depicted with a black bar. Only MPSS datasets excluding tandemly duplicated genes are shown. The lanes in each rectangle are, in order: first lane, coexpressed pairs; second lane, coexpressed triplets; third lane, coexpressed quadruplets; fourth lane, partially syntenic coexpression domains (PSCDs) between Arabidopsis and rice
Orientation of coexpressed gene pairs
| Orientation groups<sup>a</sup> | Total<sup>b</sup> | Coexpressed<sup>c</sup> |
|---|---|---|
| MPSS | | |
| tan-td | 5,621 | 239 (4.25%) |
| div-td | 2,418 | 82 (3.39%) |
| con-td | 3,218 | 117 (3.64%) |
| MA | | |
| tan-td | 2,707 | 143 (5.28%) |
| div-td | 1,224 | 72 (5.88%) |
| con-td | 1,528 | 73 (4.78%) |

<sup>a</sup> tan-td, div-td and con-td are, respectively, the sub-groups of tandemly, divergently and convergently transcribed pairs excluding tandem duplicates. The first three rows sum to the MPSS total excluding td (11,257) and the last three to the MA total excluding td (5,459)
<sup>b</sup> Total number of pairs in each direction group
<sup>c</sup> Number of coexpressed pairs in each direction group. Percentages in brackets are the number of coexpressed pairs relative to the total number of pairs. None of the proportions are significantly different from each other according to the z test for comparing population proportions
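The three orientation groups can be derived from the strands of two neighboring genes. The sketch below assumes genes are listed in chromosomal order with a `+` or `-` strand each, and uses the usual convention that divergent pairs are transcribed away from each other and convergent pairs toward each other. The function name and toy gene list are hypothetical.

```python
from collections import Counter

def classify_orientation(left_strand, right_strand):
    """Orientation of an adjacent gene pair, given the strand of the
    left (lower coordinate) and right (higher coordinate) gene."""
    if left_strand not in "+-" or right_strand not in "+-":
        raise ValueError("strand must be '+' or '-'")
    if left_strand == right_strand:
        return "tan"  # same strand: tandemly transcribed
    if left_strand == "-" and right_strand == "+":
        return "div"  # transcribed away from each other
    return "con"      # '+' then '-': transcribed toward each other

# Invented toy gene list in chromosomal order: (name, strand).
genes = [("g1", "+"), ("g2", "+"), ("g3", "-"),
         ("g4", "+"), ("g5", "-"), ("g6", "-")]
orientation_counts = Counter(
    classify_orientation(a[1], b[1]) for a, b in zip(genes, genes[1:])
)
```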
Fig. 2 Gene distance does not solely explain the occurrence of coexpression. Gene distance is defined as the length in nucleotides from the annotated end of one gene to the annotated start of the next gene, relative to the given strand of the genome, with the annotated start always smaller than the annotated end. The X-axis shows the average gene distance (in base pairs) in each 1,000-pair bin. The Y-axis shows the number of pairs (A, D), the number of coexpressed pairs (B, E) and the fraction of coexpressed pairs relative to the total number of pairs (C, F), for each orientation (tan: tandem pairs; div: divergent pairs; con: convergent pairs) in each 1,000-pair bin
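The distance definition and the binning described in the caption can be sketched as below. The sketch assumes that pairs are sorted by distance before being cut into bins of 1,000 pairs (the caption does not say how pairs were assigned to bins). Gene coordinates are `(start, end)` tuples with start smaller than end; all names and toy values are hypothetical.

```python
def gene_distance(upstream, downstream):
    """Nucleotides from the annotated end of one gene to the annotated
    start of the next; each gene is a (start, end) tuple, start < end."""
    return downstream[0] - upstream[1]

def bin_pairs(pairs, bin_size=1000):
    """Summarize (distance, is_coexpressed) pairs per bin as
    (mean distance, pairs in bin, coexpressed pairs, coexpressed fraction)."""
    ordered = sorted(pairs, key=lambda pair: pair[0])  # assumption: sort by distance
    bins = []
    for i in range(0, len(ordered), bin_size):
        chunk = ordered[i:i + bin_size]
        mean_distance = sum(d for d, _ in chunk) / len(chunk)
        n_coexpressed = sum(1 for _, coexpressed in chunk if coexpressed)
        bins.append((mean_distance, len(chunk), n_coexpressed,
                     n_coexpressed / len(chunk)))
    return bins

# Invented toy input; a bin size of 2 keeps the example small.
example_distance = gene_distance((1000, 2500), (3200, 4100))
toy_pairs = [(500, True), (100, False), (300, True), (900, False)]
toy_bins = bin_pairs(toy_pairs, bin_size=2)
```

Using equal-count bins rather than equal-width distance bins keeps the number of pairs per point constant, so the coexpressed fraction in each bin has comparable precision.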
Distribution of gene pairs over GOslim categories (Non-duplicated pairs)
| | All<sup>a</sup> | Coexpressed<sup>b</sup> | Non-coexpressed<sup>c</sup> | P-value<sup>d</sup> |
|---|---|---|---|---|
| MPSS | | | | |
| GO_func | | | | |
| Covered<sup>e</sup> | 2502 | 100 | 2402 | 0.42 |
| sameKnCat<sup>f</sup> | 365 (14.6%) | 12 (12.0%) | 353 (14.7%) | |
| GO_proc | | | | |
| Covered | 1366 | 50 | 1316 | 0.47 |
| sameKnCat | 144 (10.5%) | 7 (14%) | 137 (10.4%) | |
| GO_comp | | | | |
| Covered | 383 | 17 | 366 | 0.60 |
| sameKnCat | 113 (29.5%) | 6 (35.3%) | 107 (29.2%) | |
| MA | | | | |
| GO_func | | | | |
| Covered<sup>e</sup> | 1365 | 83 | 1282 | 0.13 |
| sameKnCat<sup>f</sup> | 177 (13.0%) | 7 (8.43%) | 170 (13.3%) | |
| GO_proc | | | | |
| Covered | 707 | 43 | 664 | 0.13 |
| sameKnCat | 67 (9.48%) | 2 (4.65%) | 65 (9.79%) | |
| GO_comp | | | | |
| Covered | 202 | 10 | 192 | 0.24 |
| sameKnCat | 45 (22.3%) | 4 (40%) | 41 (21.4%) | |

<sup>a</sup> Number of neighboring pairs excluding td. All other pairs are duplicate-free, unless stated otherwise
<sup>b</sup> Number of coexpressed pairs
<sup>c</sup> Number of non-coexpressed pairs
<sup>d</sup> P value from the standard normal tables of the z statistic for the difference of the population proportion between coexpressed pairs and non-coexpressed pairs in the rice genome; *, significant (two-tailed; P < 0.05). The P value is the probability under the null hypothesis that the two population proportions are the same
<sup>e</sup> Number of pairs of which both members are assigned (covered) with GOslim categories
<sup>f</sup> Number of pairs of which both members fall into the same “known” GOslim category (excluding the categories with the indications ‘unknown’ and ‘other’). Percentage is the number of pairs relative to the number of pairs covered
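The z test for comparing two population proportions can be sketched as below. The source does not state which standard-error formula was used; this sketch assumes the unpooled form, which for the MPSS GO_func row (12 of 100 coexpressed pairs against 353 of 2402 non-coexpressed pairs) gives a two-tailed P value that rounds to the tabulated 0.42. The function name is hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic and two-tailed P value for the difference between
    the proportions x1/n1 and x2/n2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    if se == 0:
        raise ValueError("standard error is zero; test is undefined")
    z = (p1 - p2) / se
    # Two-tailed tail area of the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# MPSS, GO_func: sameKnCat / covered for coexpressed vs non-coexpressed pairs.
z_func, p_func = two_proportion_z(12, 100, 353, 2402)
# MPSS, GO_proc.
z_proc, p_proc = two_proportion_z(7, 50, 137, 1316)
```

With samples as small as the coexpressed groups here (for example 17 or 10 covered pairs in GO_comp), the normal approximation behind the z test is rough, which is worth keeping in mind when reading the P values.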
Fig. 3 Schematic representation of the chromosomal regions covering genes involved in a four-to-one orthology between Arabidopsis and rice. The top part of the figure shows the chromosomal region from rice (from gene locus Os07g43540.1 to gene locus Os07g43570.1). The bottom part shows the chromosomal region from Arabidopsis, representing 23 genes (from gene locus At4g23120 to At4g23340; the numbers in the picture do not carry “At4g”). Black arrows represent the four Arabidopsis genes and the one rice gene involved in this orthology, and dashed curved connecting lines show the orthology relationships. Black bracket-like lines depict duplication: genes connected by and enclosed within a black bracket line are duplicates of each other. Dotted lines depict coexpression: genes connected by and enclosed within a dotted line are coexpressed with each other