Karen Lemmens, Thomas Dhollander, Tijl De Bie, Pieter Monsieurs, Kristof Engelen, Bart Smets, Joris Winderickx, Bart De Moor, Kathleen Marchal.
Abstract
'ReMoDiscovery' is an intuitive algorithm to correlate regulatory programs with regulators and corresponding motifs to a set of co-expressed genes. It concurrently exploits three independent data sources: ChIP-chip data, motif information, and gene expression profiles. Compared with published module discovery algorithms, ReMoDiscovery is fast and easily tunable. We evaluated our method on yeast data, where it generated biologically meaningful findings and allowed the prediction of potential novel roles of transcriptional regulators.
Year: 2006 PMID: 16677396 PMCID: PMC1779513 DOI: 10.1186/gb-2006-7-5-r37
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. ReMoDiscovery analysis flow. ReMoDiscovery consists of a seed discovery step followed by a seed extension step. ChIP-chip data, motif data, and expression data are used as input for the algorithm. These three datasets can be represented as matrices in which the rows represent the genes. For the ChIP-chip data (R) the columns represent the regulators, for the motif data (M) they represent the motifs, and for the expression data (A) they represent the different experiments. (a) The seed discovery step identifies sets of genes that are co-expressed, bind the same regulators, and have the same motifs in their intergenic region. (b) The gene content of the seed modules can be extended during the seed extension step using less stringent criteria. The logarithms of the module enrichment p values (y-axis) are plotted for all regulators (motifs) as a function of the correlation threshold (x-axis). Each line in the sample plot shows the module enrichment p values for the enrichment of its corresponding regulator (motif) as a function of the gene expression correlation threshold used.
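The seed discovery criteria in panel (a) can be illustrated with a minimal sketch. It is not the ReMoDiscovery implementation: it assumes that R and M have already been binarized (1 = bound / motif present), it only enumerates gene pairs by brute force, and the function names, thresholds, and toy matrices are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def shared_columns(matrix, genes):
    """Column indices in which every listed gene has a 1 (binary matrix)."""
    n_cols = len(matrix[genes[0]])
    return [j for j in range(n_cols) if all(matrix[g][j] == 1 for g in genes)]


def candidate_seed_pairs(R, M, A, min_corr=0.8, min_regulators=1, min_motifs=1):
    """Return gene pairs that are co-expressed and share regulators and motifs.

    R: genes x regulators (binary, assumed thresholded ChIP-chip data)
    M: genes x motifs (binary, assumed thresholded motif scores)
    A: genes x experiments (expression values)
    """
    seeds = []
    for g1, g2 in combinations(range(len(A)), 2):
        if pearson(A[g1], A[g2]) < min_corr:
            continue
        regulators = shared_columns(R, (g1, g2))
        motifs = shared_columns(M, (g1, g2))
        if len(regulators) >= min_regulators and len(motifs) >= min_motifs:
            seeds.append(((g1, g2), regulators, motifs))
    return seeds


# Toy input: three genes, two regulators, two motifs, four experiments.
R = [[1, 0], [1, 0], [0, 1]]
M = [[1, 1], [1, 0], [0, 1]]
A = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1]]
seeds = candidate_seed_pairs(R, M, A)
```

Only genes 0 and 1 pass all three criteria here. Gene 2 is anti-correlated with both and shares neither a regulator nor a motif with gene 1.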
Figure 2. Overview of the seed modules identified in the Spellman dataset [12]. For visualization purposes, seed modules with similar function are combined (indicated in green). A regulator or motif that is part of a regulatory program of an extended module is indicated in the figure by a bold edge from the regulator or motif to its module.
Figure 3. Representative examples from the module content similarity analysis. The significance of the similarity in module content between ReMoDiscovery seed modules and GRAM [9] and SAMBA [3] output is shown at different parameter settings. The color bar on the right indicates the normalized Jaccard similarity score, that is, the number of standard deviations from the mean of the distribution of Jaccard similarity scores on randomized module partitioning. (a) Regulator content similarity between ReMoDiscovery and GRAM, with varying GRAM module p value cutoff and ReMoDiscovery ChIP-chip threshold. (b) Gene content similarity between ReMoDiscovery and GRAM, with varying GRAM core profile p value cutoff and ReMoDiscovery correlation threshold. (c) Gene content similarity between ReMoDiscovery and SAMBA, with varying SAMBA overlap prior factor and ReMoDiscovery correlation threshold.
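The normalized Jaccard score described in this caption can be sketched as a z-score against a randomized baseline. The caption does not specify how the randomized module partitioning is generated. The sketch below assumes the simplest stand-in, which is to replace one module by a random gene set of the same size drawn from a gene universe. The function names and parameters are hypothetical.

```python
import random
from statistics import mean, pstdev


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def normalized_jaccard(module_a, module_b, universe, n_random=1000, seed=0):
    """Standard deviations between the observed score and a random baseline.

    Assumption: the null distribution comes from random gene sets with the
    same size as module_a, sampled without replacement from `universe`.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    genes = sorted(universe)
    observed = jaccard(module_a, module_b)
    null = [
        jaccard(rng.sample(genes, len(module_a)), module_b)
        for _ in range(n_random)
    ]
    spread = pstdev(null)
    if spread == 0:
        return 0.0  # degenerate baseline, no meaningful normalization
    return (observed - mean(null)) / spread


# Toy example: two modules sharing 5 of their 10 genes in a 100-gene universe.
z = normalized_jaccard(set(range(10)), set(range(5, 15)), range(100))
```

A large positive value indicates that the two modules overlap far more than equally sized random gene sets would.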
Summary of the results of the GRAM, SAMBA and ReMoDiscovery module discovery methods
| Method | No. | Genes (Mean) | Genes (Min) | Genes (Max) | Mean functional enrichment | Regulatory program (Mean) | Regulatory program (Min) | Regulatory program (Max) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ReMoDiscovery (seed modules) | 20 | 2.05 | 2 | 3 | 0.05 | 6.15 | 3 | 12 |
| ReMoDiscovery (extended modules) | 18 | 67.72 | 6 | 200 | 2.00E-03 | 3.50 | 2 | 6 |
| GRAM | 274 | 6.80 | 5 | 33 | 0.02 | 2.35 | 1 | 8 |
| SAMBA | 205 | 57.53 | 5 | 265 | 1.10E-02 | 4.16 | 0 | 31 |
The number of modules (No.) and the mean (Mean), minimum (Min) and maximum (Max) number of genes and regulators in the identified modules are displayed, as well as the average functional enrichment of the modules (Mean functional enrichment).
Summary of the significantly cell cycle enriched modules, identified by the GRAM, SAMBA and ReMoDiscovery module discovery methods
| Method | No. | Genes (Mean) | Genes (Min) | Genes (Max) | Regulatory program (Mean) | Regulatory program (Min) | Regulatory program (Max) | No. cell cycle R/all R | No. non-cell cycle R/all R | No. cell cycle R |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ReMoDiscovery (seed modules) | 2 | 2 | 2 | 2 | 4 | 3 | 5 | 0.80 | 0.20 | 6 |
| ReMoDiscovery (extended modules) | 8 | 97.38 | 12 | 200 | 3.50 | 2 | 6 | 0.92 | 0.08 | 10 |
| GRAM | 33 | 6.47 | 5 | 11 | 2.66 | 1 | 6 | 0.74 | 0.26 | 17 |
| SAMBA | 14 | 58.29 | 17 | 155 | 2.57 | 0 | 12 | 0.29 | 0.71 | 5 |
The number of cell cycle modules (No.) and the mean (Mean), minimum (Min) and maximum (Max) number of genes and regulators in these modules are displayed. Additionally, the ratio of the number of cell cycle regulators over the total number of regulators in a module, averaged over all cell cycle modules (No. cell cycle R/all R) is shown, as well as the ratio of the number of non-cell cycle regulators over the total number of regulators in a module, averaged over all cell cycle modules (No. non-cell cycle R/all R). The last column contains the number of regulators from the compiled list of 19 known cell cycle regulators (see Materials and methods) that were present in the regulatory program of at least one of the cell cycle modules (No. cell cycle R).
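The three regulator columns described above can be computed roughly as follows. This is a sketch under stated assumptions: the regulator names are placeholders rather than entries from the compiled list of 19, and modules with an empty regulatory program are skipped when averaging, which the source does not specify.

```python
def cell_cycle_regulator_summary(programs, known_cell_cycle):
    """Summarize regulatory programs of cell cycle modules.

    programs: one set of regulator names per module.
    known_cell_cycle: set of known cell cycle regulators.
    Returns (mean cell cycle fraction, mean non-cell cycle fraction,
    number of known regulators present in at least one program).
    """
    # Assumption: modules without any regulator are left out of the averages.
    fractions = [
        len(program & known_cell_cycle) / len(program)
        for program in programs
        if program
    ]
    mean_cc = sum(fractions) / len(fractions) if fractions else 0.0
    covered = set().union(*programs) & known_cell_cycle if programs else set()
    return mean_cc, 1.0 - mean_cc if fractions else 0.0, len(covered)


# Placeholder names only; these are not regulators from the paper.
known = {"regA", "regB", "regC", "regD"}
programs = [{"regA", "regB"}, {"regA", "regX"}, {"regC"}]
mean_cc, mean_non_cc, n_covered = cell_cycle_regulator_summary(programs, known)
```

In this toy input the per-module fractions are 1.0, 0.5, and 1.0, and three of the four placeholder cell cycle regulators appear in at least one program.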