Bing Li1,2, Yingying Zhang1, Yanan Yu1, Pengqian Wang1, Yongcheng Wang3, Zhong Wang1, Yongyan Wang1.
Abstract
Validation of pluripotent modules in diverse networks holds enormous potential for systems biology and network pharmacology. An arising challenge is how to assess the accuracy of discovering all potential modules from multi-omic networks and validating their architectural characteristics based on innovative computational methods beyond function enrichment and biological validation. To display the framework progress in this domain, we systematically divided the existing Computational Validation Approaches based on Modular Architecture (CVAMA) into topology-based approaches (TBA) and statistics-based approaches (SBA). We compared the available module validation methods based on 11 gene expression datasets, and partially consistent results in the form of homogeneous models were obtained with each individual approach, whereas discrepant contradictory results were found between TBA and SBA. The TBA of the Zsummary value had a higher Validation Success Ratio (VSR) (51%) and a higher Fluctuation Ratio (FR) (80.92%), whereas the SBA of the approximately unbiased (AU) p-value had a lower VSR (12.3%) and a lower FR (45.84%). The Gray area simulated study revealed a consistent result for these two models and indicated a lower Variation Ratio (VR) (8.10%) of TBA at 6 simulated levels. Despite facing many novel challenges and evidence limitations, CVAMA may offer novel insights into modular networks.
Year: 2015 PMID: 26470848 PMCID: PMC4607977 DOI: 10.1038/srep15258
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Topology- and statistics-based methods for module validation.
| No. | Type | Index | Equation | Criteria | Application | Test data | Ref. |
|---|---|---|---|---|---|---|---|
| Topological validation | | | | | | | |
| 1 | Integrated index | Zsummary | | ≥10, strongly preserved; 2–10, moderately preserved; ≤2, no preservation | Composite preservation statistics to validate whether a module is significantly preserved in another network. Applies to correlation networks (e.g., co-expression networks). | yes | |
| 2 | | ZsummaryADJ | | ≥10, strongly preserved; 2–10, moderately preserved; ≤2, no preservation | Same as above. Applies to general networks (e.g., adjacency matrix networks). | yes | |
| 3 | | medianRank | | The lower the better | Same as above. | yes | |
| 4 | Single index | Entropy | | The smaller the better | Assesses the quality of identified modules. A good-quality module is expected to have a low entropy. | no | |
| 5 | | Mpres | Mpres = cor(kl,km) | The closer to 1, the better | Describes the preservation of intra-modular connectivity across two networks. A p-value can be assigned to evaluate the reproducibility of modules. | yes | |
| 6 | | NB value | | NB ≥ 0.5 | A ratio of edges within a module and the total number of edges between modules is used to select modules with high intra-modular connectivity. | no | |
| 7 | | CS (S) | | CS (S) > 0, the higher the better | Describes the compactness and neighboring conditions of a cluster. Used to select good clusters from integrated clustering results. | no | |
| 8 | | LS (S) | | The higher the better | Judges the quality of a cluster S in a graph G and helps to select good clusters from integrated clustering results. | no | |
| 9 | | Modularity | | 0.3 ≤ Q ≤ 0.7 | Evaluates the level of modular structure and the best split of a network into modules. | no | |
| Statistical validation | | | | | | | |
| 1 | Integer linear programming | C · (X1, X2, …, Xk) | | C ≤ 0, the smaller the better | A classifier and integer linear programming model to select modules based on the activity of the module in case and control samples. | yes | |
| 2 | Bootstrap resampling | P-value | NULL | P ≤ 0.05 | The p-value is derived from multiscale bootstrap resampling to assess the uncertainty of clustering analysis and search for significant modules. | no | |
| 3 | | Consensus score | | ≥ρ, the higher the better | A jackknife resampling procedure is used to assess the accuracy and robustness of functional modules, resulting in an ensemble of optimal modules. | no | |
| 4 | Permutation test | Combinatorial p-value | NULL | Combinatorial criteria: (1) P(Zm) < 0.05; (2) PGL, PnSNPs, Ptopo < 0.05; (3) Pemp < 0.05. Additional criteria: P(Zm(eval)) and/or Pemp(eval) < 0.05 | Significance and permutation tests are used to calculate the p-value of module scores. Appropriate for GWAS data; multiple GWAS datasets are needed when using the additional criteria. | yes | |
| 5 | | coClustering (q) | | ≥95% | A cross-tabulation-based statistic for determining whether modules in the reference dataset are preserved in a test dataset; a permutation test determines the p-value. | yes | |
| 6 | Modular compatibility | Compatibility Score | | The closer to 1, the better | An indication of agreement or overlap between two sets of modules, used to measure the modular compatibility between two networks. | yes | |
| 7 | | Matching p-value | NULL | P < 0.05 | Modified hypergeometric test-derived p-values with Bonferroni correction to measure the conservation of modules between any two species or networks. | yes | |
| 8 | | IGP | | The closer to 1, the better | Defined to validate an individual cluster’s reproducibility and prediction accuracy. | yes | |
The topology-based approaches (TBA) and statistics-based approaches (SBA) for module validation. The columns report the type, index name, equation, criteria (the cut-off value used to evaluate modules), applicable conditions, test data (whether the method requires an additional test network to validate a module) and references.
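The Mpres entry in the table, Mpres = cor(kl,km), can be illustrated with a minimal sketch. It assumes that kl and km denote the intramodular connectivity of the same module genes in two networks l and m, that each network is given as a symmetric weighted adjacency matrix, and that `cor` is the Pearson correlation. The function names and the toy matrices are hypothetical and are not taken from the paper.

```python
from math import sqrt


def intramodular_connectivity(adj, module):
    """For each module node, sum its edge weights to the other module nodes."""
    return [sum(adj[i][j] for j in module if j != i) for i in module]


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("connectivity is constant; correlation is undefined")
    return cov / (sd_x * sd_y)


def mpres(adj_ref, adj_test, module):
    """Correlate intramodular connectivity between two networks (assumed Mpres)."""
    return pearson(intramodular_connectivity(adj_ref, module),
                   intramodular_connectivity(adj_test, module))


# Toy 4-gene module (illustrative values only).
ref = [[0.0, 0.9, 0.8, 0.1],
       [0.9, 0.0, 0.7, 0.2],
       [0.8, 0.7, 0.0, 0.3],
       [0.1, 0.2, 0.3, 0.0]]
# Same connectivity pattern at half the strength: hub structure is kept.
scaled = [[w * 0.5 for w in row] for row in ref]
# Different pattern: the weakly connected gene becomes the hub.
rewired = [[0.0, 0.1, 0.2, 0.3],
           [0.1, 0.0, 0.3, 0.7],
           [0.2, 0.3, 0.0, 0.8],
           [0.3, 0.7, 0.8, 0.0]]
module = [0, 1, 2, 3]
print(mpres(ref, scaled, module), mpres(ref, rewired, module))
```

Under these assumptions, a value close to 1 means that genes that are hubs within the module in one network are also hubs in the other, which matches the "closer to 1, the better" criterion in the table.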
Figure 1. (A) Hierarchical cluster tree showing co-expression modules identified by WGCNA. Each leaf in the tree represents one gene. The major tree branches constitute 9 modules labeled by different colors. (B) The medianRank preservation statistics (y-axis) of the modules. Each point represents a module, labeled by color and name. Low numbers on the y-axis indicate high preservation. (C) The Zsummary preservation statistics (y-axis) of the modules. The modules are labeled as in panel (B). The dashed blue and green lines indicate the thresholds. A Zsummary value over 2 represents a moderately preserved module, and a value over 10 provides strong evidence of module preservation. (D) Scatter plots showing the correlation between the Zsummary (y-axis) and module size (x-axis). (E) The cluster dendrogram with approximately unbiased (AU) p-values. The AU p-values are displayed in red, and clusters with an AU p-value lower than 0.05 are highlighted by rectangles. The calculations and the figure were produced using the pvclust R package.
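The Zsummary thresholds used in panel (C) and in the table can be written as a small helper. This is only a sketch of the stated cut-offs (at least 10, between 2 and 10, at most 2); the function name and the example values are hypothetical.

```python
def classify_zsummary(z):
    """Map a Zsummary value to the preservation category given by the cut-offs."""
    if z >= 10:
        return "strongly preserved"
    if z > 2:
        return "moderately preserved"
    return "no preservation"


# Illustrative Zsummary values for three made-up modules.
for name, z in [("module_a", 14.2), ("module_b", 5.0), ("module_c", 1.3)]:
    print(name, classify_zsummary(z))
```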
Figure 2. The preserved and unpreserved modules.
Each node is a gene, and each edge is the co-expression relationship. (A) The turquoise module. (B) The yellow module. (C) The red module. (D) The magenta module.
Figure 3. The comparison of TBA (Zsummary and medianRank) and SBA (AU p-value) on 10 datasets.
(A) The mean module size of the 10 datasets. The y-axis is the mean module size (nodes), and the x-axis is the dataset number. (B) Comparison of the top 10 preserved modules validated by Zsummary and medianRank. The red spots represent the number of consistently ranked modules. The green spots represent the number of overlapping modules. The blue spots represent the number of non-overlapping modules. (C) The effect of changing the minimum module size setting from 4 to 10 on two datasets. The y-axis is the percentage of preserved modules, and the x-axis is the cutoff value setting. (D) The percentage of preserved modules validated by Zsummary on 10 datasets. (E) The percentage of significant modules validated by AU p-value on 10 datasets. In (D,E) the blue bars indicate the number of all the modules detected, and the red bars indicate the number of preserved or significant modules. (F) The VSR and FR of Zsummary and AU p-value on 10 datasets. The red bars represent Zsummary, and the blue bars represent the AU p-value.
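The percentages in panels (D) and (E) relate the number of preserved or significant modules to the number of all detected modules. A minimal sketch of that arithmetic follows. The counts are made up, and the pooled ratio is only one plausible way to summarize several datasets; this excerpt does not state how the VSR is aggregated, so `pooled_valid_percentage` is an assumption rather than the paper's definition.

```python
def valid_percentage(n_valid, n_total):
    """Percentage of detected modules that pass validation in one dataset."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_valid / n_total


def pooled_valid_percentage(counts):
    """Pool (n_valid, n_total) pairs over datasets before taking the ratio.

    Assumption: pooling is one possible summary; averaging the per-dataset
    percentages would be another and can give a different number.
    """
    total_valid = sum(valid for valid, _ in counts)
    total_modules = sum(total for _, total in counts)
    return valid_percentage(total_valid, total_modules)


# Hypothetical (valid, detected) module counts for two datasets.
counts = [(3, 12), (1, 4)]
print([valid_percentage(v, t) for v, t in counts])
print(pooled_valid_percentage(counts))
```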
Figure 4. The relationship between the percentage of valid (preserved or significant) modules (y-axis) and the network parameters (x-axis).
The network parameters were calculated by plugins in Cytoscape. (A,B) MCL versus valid module percentage. (C,D) Density versus valid module percentage. (E,F) Clustering coefficient versus valid module percentage. (G,H) Characteristic path length versus valid module percentage. (I,J) Network heterogeneity versus valid module percentage. (K,L) Network centralization versus valid module percentage. The red line added to each plot is the linear regression line with intercept 0 and slope 1.
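Three of the network parameters named above (density, clustering coefficient and characteristic path length) can be sketched for a small undirected, unweighted graph. This is not the Cytoscape plugin code; it assumes the common textbook definitions, treats nodes with fewer than two neighbors as having a clustering coefficient of 0, and averages path lengths over connected node pairs only. All function names and the toy edge list are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map (node -> set of neighbors)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(adj):
    """Fraction of possible node pairs that are connected by an edge."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def clustering_coefficient(adj):
    """Average local clustering coefficient over all nodes."""
    if not adj:
        return 0.0
    total = 0.0
    for neighbors in adj.values():
        k = len(neighbors)
        if k < 2:
            continue  # assumed to contribute 0
        links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over connected node pairs (BFS per node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy graph: a triangle (0, 1, 2) with one pendant node (3).
g = build_graph([(0, 1), (1, 2), (0, 2), (2, 3)])
print(density(g), clustering_coefficient(g), characteristic_path_length(g))
```

Values computed this way could then be plotted against the percentage of valid modules per dataset, as in the panels described above.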
Figure 5. Changes in the number of modules and gray genes (WGCNA only) on 8 datasets under Gray area simulation.
The blue spots represent the number of modules identified by WGCNA or pvclust. The red spots represent the number of valid (preserved or significant) modules. The gray spots are the number of gray genes. The numerical value above the spots is the VR at 6 simulated levels compared with the original dataset.
Figure 6. Comparison of WGCNA (Zsummary) and pvclust (AU p-value) on Gray area simulated datasets.
(A) The average VR of the modules and genes of 8 datasets. (B) The average VR of WGCNA and pvclust of all simulated datasets. The red bars represent WGCNA, and the blue bars represent pvclust. (C) The VSR and FR of Zsummary and AU P-value of 8 datasets at 6 simulated levels. The red bars represent Zsummary, and the blue bars represent AU P-value.