Jun Li, Hairong Wei, Tingsong Liu, Patrick Xuechun Zhao.
Abstract
The accurate construction and interpretation of gene association networks (GANs) is challenging but crucial to understanding gene function, interaction and cellular behavior at the genome level. Most current state-of-the-art computational methods for genome-wide GAN reconstruction require high-performance computational resources. However, even high-performance computing cannot fully address the complexity involved in constructing GANs from very large-scale expression profile datasets, especially for organisms with medium- to large-sized genomes, such as most plant species. Here, we present a new approach, GPLEXUS (http://plantgrn.noble.org/GPLEXUS/), which integrates a series of novel algorithms in a parallel-computing environment to construct and analyze genome-wide GANs. GPLEXUS adopts an ultra-fast estimation of pairwise mutual information (MI) that is similar in accuracy and sensitivity to the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) method and runs ∼1000 times faster. GPLEXUS integrates the Markov Clustering Algorithm to effectively identify functional subnetworks. Furthermore, GPLEXUS includes a novel 'condition-removing' method that identifies, from very large-scale gene expression datasets spanning several experimental conditions, the major conditions under which each subnetwork operates, allowing users to annotate the various subnetworks with experiment-specific conditions. We demonstrate GPLEXUS's capabilities by constructing global GANs and analyzing subnetworks related to defense against biotic and abiotic stress, cell cycle growth and division in Arabidopsis thaliana.
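The abstract states that GPLEXUS applies the Markov Clustering Algorithm (MCL) to split a genome-wide GAN into functional subnetworks. As a rough illustration of how MCL works in general, the pure-Python sketch below alternates expansion (squaring a column-stochastic matrix) with inflation (raising entries to a power and renormalizing columns) until the matrix stops changing, then reads clusters off the rows that retain mass. The function names, the added self-loops and the defaults (`inflation=2.0`, `tol`, `prune`) are assumptions chosen for illustration, not GPLEXUS's actual settings or code.

```python
def normalize_columns(m):
    """Scale each column of a square matrix so that it sums to 1."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / total if total else 0.0
    return out


def matmul(a, b):
    """Plain O(n^3) product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, max_iter=100, tol=1e-9, prune=1e-4):
    """Illustrative MCL sketch (assumed defaults, not GPLEXUS code).

    Returns clusters as sorted lists of node indices.
    """
    n = len(adjacency)
    # Add self-loops (a common MCL convention), then make columns stochastic.
    m = normalize_columns([[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: let flow spread over two steps
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded])  # inflation
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep mass are "attractors"; their non-negligible columns form a cluster.
    clusters = []
    for i in range(n):
        members = sorted(j for j in range(n) if m[i][j] > prune)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two triangles (0-1-2 and 3-4-5) joined by a single bridge edge 2-3.
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 1, 0, 0],
       [0, 0, 1, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
print(markov_cluster(adj))  # [[0, 1, 2], [3, 4, 5]]
```

Inflation is the step that sharpens clusters: squaring entries favors strong within-cluster flow, so the weak flow across the bridge edge shrinks toward zero over successive iterations. A real genome-scale run would use sparse matrices and pruning rather than dense lists.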
Year: 2013 PMID: 24178033 PMCID: PMC3950724 DOI: 10.1093/nar/gkt983
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Comparison of the performance of several integrated methods on four compendium datasets
| Species/cell line | Number of arrays | Number of probe sets | M1 runtime (min) | M2 runtime (min) | M3 runtime (min) | M4 runtime (min) | M5 runtime (min) |
|---|---|---|---|---|---|---|---|
| | 313 (Dataset I) | 15 552 | 12 640 | 1034 | 40 | 12 | 7 |
| | 1848 (Dataset II) | 22 810 | –ᵃ | –ᵃ | 5800 | 51 | 34 |
| | 738 (Dataset III) | 66 190 | –ᵃ | –ᵃ | 12 000 | 88 | 42 |
| Human glioblastoma | 547 (Dataset IV) | 22 277 | 30 640 | 12 260 | 180 | 18 | 12 |
M1: Original ARACNE method; M2: Spearman-based MI estimation with integrated DPI analysis; M3: Parallel implementation of the original ARACNE method deployed on our BioGrid system; M4: Parallel implementation of the B-Spline-based MI estimation with integrated DPI analysis deployed on our BioGrid system; M5: Parallel implementation of the Spearman-based MI estimation with integrated DPI analysis deployed on our BioGrid system.
ᵃ Computationally infeasible.
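Method M2 in the legend above combines a Spearman-based MI estimate with data processing inequality (DPI) pruning. The sketch below illustrates one way such a pipeline can work; it is not GPLEXUS's code. It assumes (i) that Spearman's ρ is converted to MI with the bivariate-Gaussian identity I = -½ ln(1 - ρ²), and (ii) ARACNE-style DPI in which, for every fully connected gene triplet, the weakest edge is dropped unless it lies within a relative tolerance of the next-weakest edge. The names `build_network`, `mi_threshold` and `dpi_tolerance` are hypothetical.

```python
import math
from itertools import combinations


def rank(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank of the tie block
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def gaussian_mi(rho):
    """MI (nats) of a bivariate Gaussian with correlation rho (assumed transform)."""
    rho = max(-0.999999, min(0.999999, rho))  # avoid log(0) when |rho| = 1
    return -0.5 * math.log(1.0 - rho * rho)


def build_network(expr, mi_threshold=0.0, dpi_tolerance=0.0):
    """Hypothetical sketch: expr maps gene -> profile; returns {(g1, g2): MI} after DPI."""
    genes = sorted(expr)
    ranks = {g: rank(expr[g]) for g in genes}  # rank each gene only once
    mi = {}
    for a, b in combinations(genes, 2):
        # Spearman's rho is the Pearson correlation of the ranks.
        value = gaussian_mi(pearson(ranks[a], ranks[b]))
        if value > mi_threshold:
            mi[(a, b)] = value
    # DPI: in every fully connected triplet, mark the weakest edge for removal.
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        edges = [(a, b), (a, c), (b, c)]
        if all(e in mi for e in edges):
            edges.sort(key=mi.get)
            if mi[edges[0]] < (1.0 - dpi_tolerance) * mi[edges[1]]:
                to_remove.add(edges[0])
    return {e: v for e, v in mi.items() if e not in to_remove}


profiles = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "b": [2.1, 1.9, 3.0, 4.2, 5.1, 6.0, 7.3, 8.1],  # a with one local swap
    "c": [2.0, 1.5, 3.1, 4.0, 6.2, 5.5, 7.0, 8.4],  # b with a second swap
}
print(sorted(build_network(profiles)))  # [('a', 'b'), ('b', 'c')]
```

In this toy chain a → b → c, the indirect a–c edge has the lowest MI and is pruned. Edges are marked first and removed afterwards so the result does not depend on triplet order. The triplet loop grows as O(n³) in the number of genes, which helps explain why genome-scale runs benefit from parallel deployment such as M3–M5.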
The average performance of several integrated methods on yeast and A. thaliana benchmark datasets
| Method | AUROC (yeast) | AUROC (A. thaliana) | Average F-score (yeast) | Average F-score (A. thaliana) |
|---|---|---|---|---|
| Spearman-based MI/DPI | 0.896 | 0.856 | 0.512 | 0.512 |
| ARACNE | 0.89 | 0.831 | 0.443 | 0.508 |
| B-Spline-based MI/DPI | 0.879 | 0.834 | 0.438 | 0.509 |
| Co-expression | 0.872 | 0.791 | 0.124 | 0.10 |
| GeneNet | 0.875 | 0.813 | 0.388 | 0.477 |
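The AUROC and F-score columns score a predicted network against a benchmark set of known interactions. This excerpt does not state how the benchmarks were built or how the F-scores were averaged, so the sketch below shows only the standard definitions, assuming edges are stored as sorted gene-pair tuples: AUROC as the probability that a randomly chosen true edge outscores a randomly chosen false one (ties count one half), and F-score as the harmonic mean of precision and recall for a chosen edge set. The function names are hypothetical.

```python
def f_score(predicted, gold):
    """Harmonic mean of precision and recall for a set of predicted edges."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def auroc(scores, gold):
    """Rank-based (Mann-Whitney) AUROC; scores maps candidate edge -> score."""
    pos = [s for edge, s in scores.items() if edge in gold]
    neg = [s for edge, s in scores.items() if edge not in gold]
    if not pos or not neg:
        raise ValueError("need at least one true and one false candidate edge")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


gold = {("a", "b"), ("b", "c")}
scores = {("a", "b"): 0.9, ("a", "c"): 0.85, ("b", "c"): 0.8, ("a", "d"): 0.1}
print(auroc(scores, gold))                      # 0.75
print(f_score({("a", "b"), ("a", "c")}, gold))  # 0.5
```

The pairwise comparison is quadratic and only suits small examples; for genome-wide candidate lists, sorting the scores once and summing ranks gives the same AUROC far more cheaply.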
Figure 1. The topological properties of the constructed Arabidopsis thaliana gene networks. (A) The node degree distribution. (B) The node degree distribution after log transformation.
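As an illustrative sketch (not the authors' code), the snippet below computes P(k), the fraction of nodes with degree k, from an undirected edge list, log10-transforms both axes as in panel (B), and fits a least-squares line. On a log-log plot, an approximately straight line with negative slope is commonly read as a sign of power-law-like behavior. The function names and the simple edge-list input are assumptions.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Return {k: fraction of nodes with degree k} for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    n_nodes = len(degree)
    return {k: c / n_nodes for k, c in sorted(counts.items())}


def log_log_slope(dist):
    """Least-squares slope of log10 P(k) against log10 k (hypothetical helper)."""
    points = [(math.log10(k), math.log10(p)) for k, p in dist.items()]
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points)
    return num / den  # requires at least two distinct degrees


# A star: one hub linked to four leaves.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
dist = degree_distribution(star)
print(dist)                 # {1: 0.8, 4: 0.2}
print(log_log_slope(dist))  # approximately -1.0
```

Because nodes are collected from the edge list, isolated genes (degree 0) are not counted, which also keeps log10 k defined.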
Figure 2. Visualization of the genes predicted to interact directly with the WRKY33 gene. Nodes marked with a five-pointed star are genes validated in the literature.
Figure 3. Comparison of network edges recovered by GPLEXUS and ARACNE. (A) The overlap ratio of edges between the paired networks. (B) The distribution of Gaussian-kernel-based MI values versus Spearman-transformation-based MI values for all overlapping edges.