Kevin L Childs, Rebecca M Davidson, C Robin Buell.
Abstract
With the existence of large publicly available plant gene expression data sets, many groups have undertaken data analyses to construct gene coexpression networks and functionally annotate genes. Often, a large compendium of unrelated or condition-independent expression data is used to construct gene networks. Condition-dependent expression experiments consisting of well-defined conditions/treatments have also been used to create coexpression networks to help examine particular biological processes. Gene networks derived from either condition-dependent or condition-independent data can be difficult to interpret if a large number of genes and connections are present. However, algorithms exist to identify modules of highly connected and biologically relevant genes within coexpression networks. In this study, we have used publicly available rice (Oryza sativa) gene expression data to create gene coexpression networks using both condition-dependent and condition-independent data and have identified gene modules within these networks using the Weighted Gene Coexpression Network Analysis method. We compared the number of genes assigned to modules and the biological interpretability of gene coexpression modules to assess the utility of condition-dependent and condition-independent gene coexpression networks. For the purpose of providing functional annotation to rice genes, we found gene modules identified by coexpression analysis of condition-dependent gene expression experiments to be more useful than gene modules identified by analysis of a condition-independent data set. We have incorporated our results into the MSU Rice Genome Annotation Project database as additional expression-based annotation for 13,537 genes, 2,980 of which lack a functional annotation description. These results provide two new types of functional annotation for our database.
Genes in modules are now associated with groups of genes that constitute a collective functional annotation of those modules. Additionally, the expression patterns of genes across the treatments/conditions of an expression experiment comprise a second form of useful annotation.
Year: 2011 PMID: 21799793 PMCID: PMC3142134 DOI: 10.1371/journal.pone.0022196
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Rice gene expression data sets and analysis parameters used in this study.
| Data Set | Description | CV Cutoff | Beta Parameter | Tree Cut Parameter |
| GSE4471 | Arsenate response in roots of cultivars Azucena and Bala | 0.6 | 15 | 0.6 |
| GSE6719 | Cytokinin response in roots and leaves | 0.8 | 15 | 0.9 |
| GSE6893 | Inflorescence and seed developmental series | 0.8 | 22 | 0.9 |
| GSE6901 | Seedlings treated with abiotic stresses | 0.8 | 15 | 0.8 |
| GSE10373 | Striga hermonthica root infection time course of cultivars IAC165 and Nipponbare | 0.6 | 10 | 0.8 |
| GSE11025 | Rice stripe virus infection of seedlings of cultivars WuYun3 and KT95 | 0.6 | 15 | 0.7 |
| GSE15046 | Analysis of shoots of gibberellin signalling mutants | 0.6 | 15 | 0.9 |
| GSE16793 | Infection by | 0.6 | 15 | 0.9 |
| GSE17245 | Iron and phosphorus interactions in shoots and roots | 0.6 | 15 | 0.9 |
| GSE18361 | Time course of root infection with | 0.6 | 30 | 0.9 |
| GSE19024 | Tissue atlas from cultivar Minghui 63 | 0.8 | 11 | 0.8 |
| GSE19239 | Response of transgenic rice with maize | 0.6 | 15 | 0.7 |
| E-MEXP-1766 | Time course from aerobic germination of seeds | 0.7 | 15 | 0.7 |
| E-MEXP-2267 | Time course from anaerobic/aerobic germination of seeds | 0.7 | 15 | 0.9 |
| E-MEXP-2506 | Thermoperiod/photoperiod time courses | 0.6 | 7 | 0.9 |
| Combined data set | Combined chips from all 15 individual experiments | 0.9 | 4 | 0.95 |
Identifiers for data are from either NCBI GEO or EBI ArrayExpress.
Coefficient of variation cutoff used to filter averaged and normalized gene expression data.
Beta and tree cut parameters used during WGCNA analysis of expression data.
Only shoot apical meristem, developing panicle and developing seed samples were used for this analysis.
Shoot −Fe+P samples were removed after chip QC analysis.
Only data from Minghui 63 were analyzed. Expression data from Zhenshan 97 were excluded from analysis. Callus tissue samples were not included in the analysis.
The LL-LDHC-124 hrs sample was excluded from analysis after chip QC analysis.
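The CV cutoff and beta parameter in the table above can be illustrated with a minimal Python sketch. It rests on assumptions not stated in this section: that CV is the sample standard deviation divided by the mean, that genes with a CV at or above the cutoff are retained, and that the network is unsigned, so adjacency is |r|^beta for Pearson correlation r. The gene names and values are hypothetical, and this is not the WGCNA R package itself.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (0.0 if the mean is 0)."""
    mean = statistics.fmean(values)
    if mean == 0:
        return 0.0
    return statistics.stdev(values) / mean


def filter_by_cv(expression, cutoff):
    """Keep genes whose CV across samples is at least `cutoff` (assumed direction)."""
    return {
        gene: values
        for gene, values in expression.items()
        if coefficient_of_variation(values) >= cutoff
    }


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def soft_threshold_adjacency(x, y, beta):
    """Unsigned soft-threshold adjacency |r|**beta (assumed network type)."""
    return abs(pearson(x, y)) ** beta


# Hypothetical toy expression profiles (four samples per gene).
toy_expression = {
    "geneA": [1.0, 2.0, 4.0, 8.0],
    "geneB": [8.0, 4.0, 2.0, 1.0],
    "geneC": [5.0, 5.1, 4.9, 5.0],  # nearly constant, so low CV
}

kept = filter_by_cv(toy_expression, cutoff=0.6)
adjacency_ab = soft_threshold_adjacency(kept["geneA"], kept["geneB"], beta=15)
```

Raising |r| to a large beta pushes weak correlations toward zero while keeping strong ones, which is why larger beta values produce sparser networks.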
Numbers of genes and annotation status of genes assigned to modules by two analysis methods.
| Data Set | Number of Genes Analyzed | Number of Modules | Genes with Functional Annotation | Genes without Functional Annotation | TE-related Genes | Total Genes Assigned to Modules |
| GSE4471 | 2,613 | 6 | 1,777 | 672 | 83 | 2,532 |
| GSE6719 | 2,802 | 5 | 2,268 | 478 | 40 | 2,786 |
| GSE6893 | 4,231 | 8 | 2,340 | 600 | 68 | 3,008 |
| GSE6901 | 739 | 3 | 565 | 131 | 14 | 710 |
| GSE10373 | 672 | 3 | 395 | 144 | 28 | 567 |
| GSE11025 | 835 | 4 | 535 | 176 | 18 | 729 |
| GSE15046 | 1,197 | 6 | 976 | 190 | 20 | 1,186 |
| GSE16793 | 678 | 2 | 469 | 93 | 7 | 569 |
| GSE17245 | 4,747 | 5 | 3,679 | 823 | 64 | 4,566 |
| GSE18361 | 1,162 | 3 | 741 | 227 | 41 | 1,009 |
| GSE19024 | 7,478 | 5 | 1,453 | 435 | 45 | 1,933 |
| GSE19239 | 1,990 | 5 | 1,363 | 499 | 82 | 1,944 |
| E-MEXP-1766 | 3,704 | 3 | 2,986 | 605 | 68 | 3,659 |
| E-MEXP-2267 | 2,421 | 4 | 1,835 | 441 | 49 | 2,325 |
| E-MEXP-2506 | 1,816 | 9 | 844 | 266 | 97 | 1,207 |
| Condition-dependent data sets (combined) | 13,537 | 71 | 9,014 | 2,908 | 406 | 12,328 |
| Condition-independent data set | 17,320 | 15 | 7,481 | 2,403 | 193 | 10,077 |
Number of genes that had passed the CV filter and that were subsequently analyzed by the WGCNA method.
Transposable element-related genes.
Identifiers for data are from either NCBI GEO or EBI ArrayExpress.
The condition-independent data set contained all gene chips used in the analyses of each of the 15 individual experiments.
Figure 1. Normalized expression values of modules of genes identified from a panicle/seed developmental series.
Gene expression values from a panicle and seed developmental series were processed using Weighted Gene Coexpression Network Analysis to identify modules of highly correlated genes [36], [42]. Tissues analyzed were shoot apical meristems (SAM), panicles between 0 and 3 cm long (inflorescence P1), panicles between 3 and 5 cm long (inflorescence P2), panicles between 5 and 10 cm long (inflorescence P3), panicles between 10 and 15 cm long (inflorescence P4), panicles between 15 and 20 cm long (inflorescence P5), panicles between 22 and 30 cm long at the mature pollen stage (inflorescence P6), developing seed 0 to 2 days after pollination (dap; seed S1), developing seed 3 to 4 dap (seed S2), developing seed 5 to 10 dap (seed S3), developing seed 11 to 20 dap (seed S4), and developing seed 21 to 29 dap (seed S5). Expression data are represented here as normalized values (Z-scores). Module names: (A) GSE6893-black, (B) GSE6893-blue, (C) GSE6893-red, (D) GSE6893-pink, (E) GSE6893-yellow, (F) GSE6893-brown, (G) GSE6893-turquoise, (H) GSE6893-green.
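The Z-score normalization mentioned in the caption can be sketched as follows. This assumes each gene is standardized across its own samples using the sample standard deviation; the caption does not specify these details, and the input values are hypothetical.

```python
import statistics


def z_scores(values):
    """Standardize one gene's expression profile across samples.

    Each value becomes (value - mean) / sample standard deviation.
    A constant profile is returned as all zeros to avoid dividing by zero.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Hypothetical profile of one gene across three tissues.
profile = [1.0, 2.0, 3.0]
normalized = z_scores(profile)
```

Standardizing each gene this way puts genes with very different absolute expression levels on a common scale, so the shapes of their profiles within a module can be compared in one plot.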
Figure 2. Normalized expression values of modules of genes identified from a Striga root infection study.
Gene expression values from Striga hermonthica root infection time course of rice cultivars IAC165 and Nipponbare were processed using Weighted Gene Coexpression Network Analysis to identify modules of highly correlated genes [36], [43]. Expression data are represented here as normalized values (Z-scores). Two gene modules, (A) GSE10373-blue and (B) GSE10373-brown, display differential responses between genes in the two cultivars in response to infection by S. hermonthica. Genes from one module, (C) GSE10373-turquoise, are differentially expressed between the two rice cultivars but are not responsive to infection by S. hermonthica. Plots for genes that are positively correlated with each other within a module are shown in the same color. Genes within a module that are displayed in different colors are anti-correlated.
Number of genes assigned to modules from different experiments.
| Number of experiments | Number of genes |
| 1 | 5,170 |
| 2 | 2,768 |
| 3 | 1,860 |
| 4 | 1,212 |
| 5 | 698 |
| 6 | 374 |
| 7 | 149 |
| 8 | 64 |
| 9 | 25 |
| 10 | 7 |
| 11 | 0 |
| 12 | 1 |
The numbers listed only include those genes that passed the coefficient of variation filtering and were assigned to a module of highly correlated genes. Genes that passed the coefficient of variation filtering but that were unassigned to a module were excluded from this analysis.
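The tally in this table can be sketched in Python: count, for each gene, how many experiments assigned it to a module, then count genes per tally. The experiment keys and gene identifiers in the toy input are hypothetical. The sketch also sums the table's published counts, which should equal the total number of genes assigned to modules across the condition-dependent analyses.

```python
from collections import Counter


def experiments_per_gene(module_genes_by_experiment):
    """Map each gene to the number of experiments in which it was assigned to a module."""
    counts = Counter()
    for genes in module_genes_by_experiment.values():
        counts.update(set(genes))  # set() guards against duplicate IDs within one experiment
    return counts


def genes_per_experiment_count(counts):
    """Map 'number of experiments' to 'number of genes with that count'."""
    return Counter(counts.values())


# Hypothetical toy input: experiment name -> genes assigned to any module.
toy_assignments = {
    "expA": ["gene1", "gene2", "gene3"],
    "expB": ["gene2", "gene3"],
    "expC": ["gene3"],
}
toy_tally = genes_per_experiment_count(experiments_per_gene(toy_assignments))

# Counts copied from the table above (number of experiments -> number of genes).
TABLE_COUNTS = {
    1: 5170, 2: 2768, 3: 1860, 4: 1212, 5: 698, 6: 374,
    7: 149, 8: 64, 9: 25, 10: 7, 11: 0, 12: 1,
}
total_genes_in_modules = sum(TABLE_COUNTS.values())
```

Summing the published counts is a quick consistency check against the combined condition-dependent total of 12,328 genes assigned to modules.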
Number of gene coexpression modules and number of enriched Pfam domains associated with different experiments.
| Experiment | Number of Modules Analyzed | Number of Modules with Pfam Enrichment | Number of Unique Pfam Domains Enriched within Experiment |
| E-MEXP-1766 | 3 | 3 | 24 |
| E-MEXP-2267 | 4 | 3 | 19 |
| E-MEXP-2506 | 9 | 9 | 31 |
| GSE10373 | 3 | 3 | 8 |
| GSE11025 | 4 | 3 | 4 |
| GSE15046 | 6 | 3 | 14 |
| GSE16793 | 2 | 2 | 9 |
| GSE17245 | 5 | 5 | 25 |
| GSE18361 | 3 | 3 | 26 |
| GSE19024 | 5 | 4 | 21 |
| GSE19239 | 5 | 4 | 11 |
| GSE4471 | 6 | 5 | 13 |
| GSE6719 | 5 | 4 | 14 |
| GSE6893 | 8 | 7 | 35 |
| GSE6901 | 3 | 3 | 7 |
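This section does not state which statistical test was used to call a Pfam domain enriched in a module. As an illustration only, the sketch below uses a one-sided hypergeometric test, a common choice for enrichment analyses; the test itself, the function name, and the toy numbers are all assumptions rather than the authors' procedure.

```python
import math


def hypergeom_upper_tail(k, domain_total, module_size, population):
    """P(X >= k) when drawing `module_size` genes without replacement.

    population   : number of genes in the background set
    domain_total : background genes carrying the Pfam domain
    module_size  : genes in the module
    k            : module genes carrying the domain
    """
    denominator = math.comb(population, module_size)
    upper = min(domain_total, module_size)
    numerator = sum(
        math.comb(domain_total, i) * math.comb(population - domain_total, module_size - i)
        for i in range(k, upper + 1)
    )
    return numerator / denominator


# Hypothetical toy case: 10 background genes, 3 carry the domain,
# and both genes in a 2-gene module carry it.
p_value = hypergeom_upper_tail(k=2, domain_total=3, module_size=2, population=10)
```

In practice, a p-value like this would be computed for every domain in every module and then corrected for multiple testing before a domain is reported as enriched.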
Figure 3. A Venn diagram showing the intersections of genes used in condition-dependent and condition-independent coexpression analyses.
The blue circles on the left represent the combined results from the condition-dependent coexpression analyses. The green circles on the right represent the results from the condition-independent analysis. In each analysis, the inner circle represents the genes that were assigned to modules and the outer circle represents the genes that were not assigned to modules.
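The overlaps summarized by such a diagram can be computed with plain set operations. The sketch below is illustrative: the function name, region labels, and gene identifiers are hypothetical, and it reports only a few of the regions a four-circle diagram would show.

```python
def venn_regions(dep_analyzed, dep_assigned, indep_analyzed, indep_assigned):
    """Return sizes of selected overlaps between the two analyses.

    Each argument is a set of gene identifiers: genes analyzed (passed the
    CV filter) and genes assigned to modules, for the condition-dependent
    and condition-independent analyses respectively.
    """
    return {
        "analyzed_in_both": len(dep_analyzed & indep_analyzed),
        "assigned_in_both": len(dep_assigned & indep_assigned),
        "assigned_dependent_only": len(dep_assigned - indep_assigned),
        "assigned_independent_only": len(indep_assigned - dep_assigned),
    }


# Hypothetical toy gene sets.
regions = venn_regions(
    dep_analyzed={"g1", "g2", "g3", "g4"},
    dep_assigned={"g1", "g2", "g3"},
    indep_analyzed={"g2", "g3", "g4", "g5"},
    indep_assigned={"g3", "g5"},
)
```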