Rim Zaag, Jean Philippe Tamby, Cécile Guichard, Zakia Tariq, Guillem Rigaill, Etienne Delannoy, Jean-Pierre Renou, Sandrine Balzergue, Tristan Mary-Huard, Sébastien Aubourg, Marie-Laure Martin-Magniette, Véronique Brunaud.
Abstract
CATdb (http://urgv.evry.inra.fr/CATdb) is a database providing public access to a large collection of transcriptomic data, mainly for Arabidopsis but also for other plants. This resource has the rare advantage of containing several thousand microarray experiments obtained with the same technical protocol and analyzed by the same statistical pipelines. In this paper, we present GEM2Net, a new module of CATdb that takes advantage of this homogeneous dataset to mine co-expression units and decipher Arabidopsis gene functions. GEM2Net explores 387 stress conditions organized into 18 biotic and abiotic stress categories. For each category, a model-based clustering is applied to expression differences to identify clusters of co-expressed genes. To characterize the functions associated with these clusters, various resources are analyzed and integrated: Gene Ontology, subcellular localization of proteins, Hormone Families, Transcription Factor Families and a refined stress-related gene list associated with publications. Exploiting protein-protein interactions and transcription factor-target interactions makes it possible to display gene networks. GEM2Net presents the analysis of the 18 stress categories, in which 17,264 genes are involved and organized within 681 co-expression clusters. The meta-data analyses were stored and organized to compose a dynamic Web resource.
Year: 2014 PMID: 25392409 PMCID: PMC4383956 DOI: 10.1093/nar/gku1155
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 19.160
Figure 1. Stress categories. Pie chart representing the classification of the CATdb experimental comparisons into 18 stress categories: nine biotic and nine abiotic stresses.
Figure 2. Workflow of GEM2Net. This workflow describes the bioinformatics steps required, from the classification of CATdb experimental comparisons to cluster annotation and gene interaction networks, with integration of the various meta-data.
Number of genes by meta-data in GEM2Net gene set and Arabidopsis genome Reference
| | Total | Orphan | BP stress | Bibliostress | TF | Hormone |
|---|---|---|---|---|---|---|
| Arabidopsis Reference | 34 042 | 5105 (15%) | 5106 (15%) | 2580 (7.5%) | 2260 (6.5%) | 695 (2%) |
| GEM2Net dataset | 17 264 | 2165 (13%) | 487 (3%) |
Comparison of gene counts between the Reference (all Arabidopsis genes) and the GEM2Net dataset for the following meta-data: Orphan genes; BP stress, which gathers two Biological Process terms from GO (‘response to stress’ and ‘response to abiotic or biotic stress’); Bibliostress, which lists the stress-responsive genes with related bibliography extracted from GO; TF, a list of genes characterized as TFs in the Regulators project; Hormone, a list of genes linked to hormone response as annotated in the AHD2.0 database. Numbers in bold highlight significant gene set enrichments of the GEM2Net dataset compared to the Reference dataset (binomial test, P-value < 0.05).
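The enrichment test mentioned in this legend can be illustrated with a minimal sketch. It assumes a one-sided (upper-tail) binomial test in which the Reference proportion is the null probability; the source only states "binomial test", so the tail choice and the function names below are assumptions for illustration, not the authors' implementation. The counts used are the Orphan values from the table.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    logs = [log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    m = max(logs)
    return min(1.0, math.exp(m) * sum(math.exp(v - m) for v in logs))

# Orphan column of the table: Reference 5105 of 34042 genes,
# GEM2Net dataset 2165 of 17264 genes.
reference_fraction = 5105 / 34042          # null probability (assumption: Reference as null)
p_orphan = binom_upper_tail(2165, 17264, reference_fraction)
enriched = p_orphan < 0.05                 # threshold stated in the legend
print(f"Orphan upper-tail P-value: {p_orphan:.3g}, enriched: {enriched}")
```

Under this assumed one-sided test, the Orphan fraction in GEM2Net (13%) lies below the Reference fraction (15%), so no enrichment would be called for that column.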
Figure 3. GO Biological Process analyses for the ‘Virus’ stress category. The GEM2Net web page showing the GO ‘Biological Process’ pie charts for all the clusters of the Virus stress category. Statistically significant results of gene set enrichment tests are displayed as colored sections, and gene counts and P-values are given in the information frame on the right side. In the same frame, all analysis results are summarized with blue points for the cluster under the mouse pointer.
Figure 4. Meta-analyses overview for Cluster_49 of the ‘Virus’ stress category. A synoptic view of the meta-data analyses performed on cluster_49 is shown in the central panel, and the results of all analyses are summarized in the table frame on the upper right side. A partial list of the genes involved in the Biological Process bias is shown below the central panel. In this table, each gene accession is tagged with colored circle(s) (legend table on the left), and other meta-data enrichments are indicated on the right with blue points when appropriate.
Figure 5. Protein Interactome Network for Cluster_49 of the ‘Virus’ stress category. In the central panel, all PPI (edges) between gene accessions (nodes) within cluster_49 are represented with dark blue lines, using the Cytoscape Web software tool. Functional annotation is superimposed on nodes by selecting the corresponding checkbox above; here, hormone families (in blue) and orphans (in red) are shown. In the right frame, filters on GO categories can be applied to the network to view only nodes with the selected annotation. In addition, a ‘Targets of Transcription Factors’ option is available to display this type of interaction in the same network.
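The network view described in Figure 5 amounts to two set operations: restricting a PPI edge list to the genes of one cluster, then optionally keeping only nodes that carry a selected annotation. The sketch below illustrates that idea only. The function names, the gene identifiers (`GENE_A` and so on) and the annotation labels are hypothetical placeholders, not data or code from GEM2Net.

```python
def cluster_subnetwork(ppi_edges, cluster_genes):
    """Keep only interactions whose two partners both belong to the cluster."""
    members = set(cluster_genes)
    return [(a, b) for a, b in ppi_edges if a in members and b in members]

def filter_by_annotation(edges, annotations, term):
    """Keep only edges whose two nodes both carry the selected annotation term."""
    keep = {gene for gene, terms in annotations.items() if term in terms}
    return [(a, b) for a, b in edges if a in keep and b in keep]

# Hypothetical toy data: identifiers and labels are placeholders.
ppi_edges = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_C", "GENE_D")]
cluster_genes = ["GENE_A", "GENE_B", "GENE_C"]
annotations = {
    "GENE_A": {"response to stress"},
    "GENE_B": {"response to stress", "hormone"},
    "GENE_C": {"orphan"},
}

cluster_edges = cluster_subnetwork(ppi_edges, cluster_genes)
stress_edges = filter_by_annotation(cluster_edges, annotations, "response to stress")
print(cluster_edges)
print(stress_edges)
```

Representing the network as a plain edge list keeps the filters independent of any particular viewer; the resulting edges could then be handed to a graph display tool.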