Casey P Shannon (1,2), Virginia Chen (3,4), Mandeep Takhar (5), Zsuzsanna Hollander (3,4), Robert Balshaw (3,6), Bruce M McManus (3,7,4), Scott J Tebbutt (3,8,4), Don D Sin (8,4), Raymond T Ng (3,5,4).
Abstract
BACKGROUND: Gene network inference (GNI) algorithms can be used to identify sets of coordinately expressed genes, termed network modules, from whole-transcriptome gene expression data. The identification of such modules has become a popular approach in systems biology, with important applications in translational research. Although diverse computational and statistical approaches have been devised to identify such modules, their performance is still not fully understood, particularly in complex human tissues. Given human heterogeneity, one important question is how sensitive the outputs of these computational methods are to the input sample set, that is, how stable they are. A related question is how this sensitivity depends on the size of the sample set. We describe here the SABRE (Similarity Across Bootstrap RE-sampling) procedure for assessing the stability of gene network modules using a re-sampling strategy, introduce a novel criterion for identifying stable modules, and demonstrate the utility of this approach in a clinically relevant cohort, using two different gene network module discovery algorithms.
Keywords: Bootstrap; Gene modules; Reproducibility; Systems biology; WGCNA
Year: 2016 PMID: 27842512 PMCID: PMC5109843 DOI: 10.1186/s12859-016-1319-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Visual derivation of the H-index. A visual depiction of how the H-index is derived is shown for modules identified by the WGCNA algorithm. A set of reference modules derived from all available samples is compared to a series of comparator module sets derived from bootstrap re-sampled data. For each re-sampled dataset, every reference module is compared to every newly identified module using the Jaccard similarity coefficient, and the best-match coefficient for each reference module is recorded. Finally, these best-match similarity coefficients are sorted, and a measure of the area under the resulting curve (the Hirsch index) is used to estimate the reference module’s stability.
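The comparison and scoring steps in this caption can be sketched in Python as follows. This is a minimal sketch, not the authors' SABRE code. The function names are hypothetical, modules are assumed to be plain sets of gene identifiers, and the H-index is assumed to be the Hirsch index rescaled to [0, 1]: the largest h such that at least a fraction h of the best-match similarities are ≥ h. That value is the side of the largest square that fits under the sorted best-match curve, which is one way of measuring the area under it.

```python
from typing import Dict, List, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    if not union:
        return 1.0  # convention: two empty sets are treated as identical
    return len(a & b) / len(union)


def best_match(reference: Set[str], comparators: Sequence[Set[str]]) -> float:
    """Best Jaccard similarity between one reference module and the modules
    found in a single re-sampled dataset (0.0 if none were found)."""
    return max((jaccard(reference, m) for m in comparators), default=0.0)


def h_index(similarities: Sequence[float]) -> float:
    """Assumed [0, 1]-scaled Hirsch index: the largest h such that at least
    a fraction h of the best-match similarities are >= h."""
    ranked = sorted(similarities, reverse=True)
    n = len(ranked)
    if n == 0:
        return 0.0
    # With s_i the i-th largest similarity, h = min(s_i, i / n) is always
    # attainable (i values are >= s_i); the maximum over i is the h-index.
    return max(min(s, (i + 1) / n) for i, s in enumerate(ranked))


def module_stability(
    reference_modules: Dict[str, Set[str]],
    bootstrap_module_sets: List[List[Set[str]]],
) -> Dict[str, float]:
    """H-index stability of each reference module across all re-samples."""
    return {
        name: h_index([best_match(genes, comps) for comps in bootstrap_module_sets])
        for name, genes in reference_modules.items()
    }


# Toy example: two reference modules, four re-sampled module sets.
reference = {"M1": {"A", "B", "C", "D"}, "M2": {"E", "F"}}
bootstraps = [
    [{"A", "B", "C", "D"}, {"E", "X"}],
    [{"A", "B", "C"}, {"F", "Y"}],
    [{"A", "B"}, {"E", "F"}],
    [{"C", "D", "Z"}],
]
print(module_stability(reference, bootstraps))  # M1 -> 0.5, M2 -> ~0.333
```

In this toy run, M1's best matches are 1.0, 0.75, 0.5 and 0.4, so half of the re-samples reach a similarity of at least 0.5 and its score is 0.5. M2 is recovered well only once and scores about 0.33.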
Fig. 2 Gene module stability increases with sample size and module size. For each sample size n, we drew 10 subsets of n profiles, without replacement, from all available gene expression profiles. In each case, a reference module set was produced (by WGCNA), 100 bootstrap re-samplings of the selected expression profiles were generated, and the stability of the reference module set across bootstrap re-samplings was determined as described in the Methods section. a Stability of the modules is visualized at n = 10, 20, 40, 80, 120 and 160 for the 1st-, 5th- and 10th-ranked modules. b Stability is plotted against module size at n = 10, 20, 40, 80, 120 and 160. The dotted line depicts the best-case stability of random modules in simulation. We compare the stability of S (1st size quartile) and XL (4th size quartile) modules using Wilcoxon’s rank-sum test.
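The sampling design described in this caption can be sketched as below. It is a minimal sketch under stated assumptions. `resampling_design` is a hypothetical helper. Each bootstrap re-sample is assumed to have the same size n as the subset it is drawn from, which is the usual bootstrap convention. The module-discovery step (WGCNA), which would be run on each subset and on each of its re-samples, is not shown. Profiles are represented by integer indices.

```python
import random
from typing import Dict, List, Sequence

SIZES = (10, 20, 40, 80, 120, 160)  # sample sizes n examined in Fig. 2


def resampling_design(
    n_profiles: int,
    sizes: Sequence[int] = SIZES,
    n_subsets: int = 10,
    n_bootstraps: int = 100,
    seed: int = 0,
) -> Dict[int, List[dict]]:
    """For each n, draw `n_subsets` subsets of n profile indices without
    replacement; for each subset, draw `n_bootstraps` bootstrap re-samples
    (with replacement, same size n). Reference modules would be built from
    each subset and comparator modules from each of its re-samples."""
    rng = random.Random(seed)  # seeded for reproducibility
    design: Dict[int, List[dict]] = {}
    for n in sizes:
        if n > n_profiles:
            raise ValueError(f"sample size {n} exceeds {n_profiles} profiles")
        runs = []
        for _ in range(n_subsets):
            subset = rng.sample(range(n_profiles), n)  # without replacement
            # Bootstrap re-samples: draw n indices from the subset with replacement.
            bootstraps = [rng.choices(subset, k=n) for _ in range(n_bootstraps)]
            runs.append({"subset": subset, "bootstraps": bootstraps})
        design[n] = runs
    return design


# Hypothetical cohort of 200 profiles.
design = resampling_design(n_profiles=200)
print({n: len(runs) for n, runs in design.items()})
# {10: 10, 20: 10, 40: 10, 80: 10, 120: 10, 160: 10}
```

Storing indices rather than expression matrices keeps the design cheap to generate and makes each run reproducible from the seed.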
Fig. 3 Stability profiles differ among algorithms. Gene module similarity across bootstrap re-samplings is visualized using box plots for all reference network modules identified by three gene module discovery algorithms (a: WGCNA; b: Pihur; c: Chaussabel). The stability of each network module is summarized using the H-index (red). The dotted line depicts the best-case stability of random modules in simulation.
Fig. 4 Stable modules are more interconnected. The relationship between module stability and several topological measures of network connectivity is visualized for modules identified by WGCNA (blue) or the Chaussabel approach (red). Stability is positively associated (Spearman’s ρ) with both the number of appearances in shortest paths and the number of triads in the network (* p ≤ 0.05).
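Spearman's ρ, the association measure used in this caption, is the Pearson correlation of the two variables' ranks. The sketch below computes it with the standard library only, giving tied values the average of their ranks. The function names and the per-module numbers in the usage line are hypothetical. It returns ρ alone; `scipy.stats.spearmanr` also reports a p-value.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined for a constant input")
    return cov / (sx * sy)


# Hypothetical per-module values: H-index stability vs. triad count.
stability = [0.21, 0.55, 0.40, 0.87, 0.62]
triads = [3, 12, 7, 30, 9]
print(round(spearman_rho(stability, triads), 3))  # 0.9
```

Because only ranks enter the calculation, ρ captures any monotone relationship between stability and a connectivity measure, not just a linear one.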
Fig. 5 Stable modules are more readily annotated. Over-representation of module genes in annotated gene sets (sum of −log10 p-values from the hypergeometric test) is visualized for modules of varying stability in the MSigDB and BTM collections. Stability is positively associated (Spearman’s ρ) with our ability to assign modules to known biology in many of these collections (* p ≤ 0.05; † p ≤ 0.10).
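The annotation score in this caption can be sketched as follows. The sketch assumes an upper-tail hypergeometric test for each gene set and a gene universe equal to all genes considered in the analysis. The universe and any multiple-testing handling actually used are not stated here, and the function names are hypothetical. Exact integer binomial coefficients are used, and log10 is taken of the numerator and denominator separately, so very small p-values do not underflow to zero.

```python
from math import comb, log10
from typing import Dict, Set


def neg_log10_hypergeom_p(k: int, N: int, K: int, n: int) -> float:
    """-log10 P(X >= k) for X ~ Hypergeometric(N genes, K in the gene set,
    n in the module): the upper-tail over-representation p-value."""
    if not 0 <= k <= min(K, n):
        raise ValueError("overlap k must lie in [0, min(K, n)]")
    # Exact integer tail sum; taking log10 of numerator and denominator
    # separately avoids underflow when p is extremely small.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return log10(comb(N, n)) - log10(tail)


def annotation_score(module: Set[str], gene_sets: Dict[str, Set[str]],
                     universe: Set[str]) -> float:
    """Sum of -log10 p-values over all gene sets in one collection."""
    module = module & universe  # only genes in the assumed universe count
    score = 0.0
    for genes in gene_sets.values():
        genes = genes & universe
        k = len(module & genes)  # observed overlap with this gene set
        score += neg_log10_hypergeom_p(k, len(universe), len(genes), len(module))
    return score


# Toy collection over a hypothetical six-gene universe.
universe = set("ABCDEF")
module = {"A", "B"}
collection = {"S1": {"A", "B", "C"}, "S2": {"D", "E"}}
print(round(annotation_score(module, collection, universe), 3))  # 0.699
```

Here S1 contributes −log10(3/15) = log10(5) ≈ 0.699, and S2, which shares no genes with the module, has p = 1 and contributes nothing.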