Wessel N van Wieringen1,2, Yao Chen3.
Abstract
Gaussian graphical models are usually estimated from unreplicated data. The data are, however, likely to comprise signal and noise, and these two cannot be deconvoluted from unreplicated data. In practice, the noise is then ignored. We point out the consequences of this practice for the reconstruction of the conditional independence graph of the signal. Replicated data allow for the deconvolution of signal and noise and the reconstruction of the former's conditional independence graph. To this end we present a penalized Expectation-Maximization algorithm. The penalty parameter is chosen to maximize the F-fold cross-validated log-likelihood. Sampling schemes of the folds from replicated data are discussed. By simulation we investigate the effect of replicates on the reconstruction of the signal's conditional independence graph. Moreover, we compare the proposed method to several obvious competitors. In an application we use data from oncogenomic studies with replicates to reconstruct the gene-gene interaction networks, operationalized as conditional independence graphs. This yields a realistic portrait of the effect of ignoring sources of variation other than sampling variation. In addition, it has implications for the reproducibility of inferred gene-gene interaction networks reported in the literature.
Keywords: conditional independence graph; inverse covariance; network; reproducibility; ridge penalty
Year: 2021 PMID: 33987868 PMCID: PMC8360145 DOI: 10.1002/sim.9028
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.497
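The signal-plus-noise setting of the abstract can be illustrated with a small simulation. The sketch below is not the penalized EM algorithm of the paper; it swaps in a simple method-of-moments estimate to show why replicates make the deconvolution possible. The additive model Y = Z + ε with independent Gaussian signal and error, the diagonal error precision, and all function names (`banded_precision`, `simulate_replicates`, `moment_deconvolution`) are assumptions made for illustration.

```python
import numpy as np


def banded_precision(p, off_diag=0.4):
    """Tridiagonal (banded) precision matrix; positive definite for |off_diag| < 0.5."""
    return np.eye(p) + off_diag * (np.eye(p, k=1) + np.eye(p, k=-1))


def simulate_replicates(omega_z, omega_e, n, K, rng):
    """Draw Y[i, k] = Z[i] + E[i, k] for n samples with K replicates each (assumed model)."""
    p = omega_z.shape[0]
    sigma_z = np.linalg.inv(omega_z)
    sigma_e = np.linalg.inv(omega_e)
    Z = rng.multivariate_normal(np.zeros(p), sigma_z, size=n)       # shape (n, p)
    E = rng.multivariate_normal(np.zeros(p), sigma_e, size=(n, K))  # shape (n, K, p)
    return Z[:, None, :] + E


def moment_deconvolution(Y):
    """Method-of-moments split of signal and error covariance (requires K >= 2).

    Within-sample scatter around the replicate mean estimates the error covariance;
    the covariance of the replicate means estimates sigma_z + sigma_e / K.
    """
    n, K, p = Y.shape
    Y_bar = Y.mean(axis=1)
    resid = Y - Y_bar[:, None, :]
    sigma_e_hat = np.einsum("nkp,nkq->pq", resid, resid) / (n * (K - 1))
    sigma_z_hat = np.cov(Y_bar, rowvar=False) - sigma_e_hat / K
    return sigma_z_hat, sigma_e_hat


rng = np.random.default_rng(0)
omega_z = banded_precision(5)   # signal precision (illustrative choice)
omega_e = 2.0 * np.eye(5)       # error precision (illustrative choice)
Y = simulate_replicates(omega_z, omega_e, n=2000, K=3, rng=rng)
sigma_z_hat, sigma_e_hat = moment_deconvolution(Y)
print("max abs error, signal covariance:", np.abs(sigma_z_hat - np.linalg.inv(omega_z)).max())
```

With a single replicate (K = 1) the within-sample scatter is identically zero, so the error covariance cannot be separated from the signal covariance. Unlike a penalized estimator, this moment estimate is not guaranteed to be positive definite and needs many more samples than variables.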
FIGURE 1 Various simulation results w.r.t. edge recovery for a banded signal precision matrix and a uniform error precision matrix. All plots show the partial AUC (pAUC), integrated w.r.t. 1 − specificity from 0 to 0.1, of 100 simulation runs. In the top panel p = 50 and the pAUCs are plotted against various (n, K)‐combinations. The left bottom panel plots, for p = 10, 25, 50, the averaged pAUC vs the number of replicated samples with and all K ∈ {1, 2}. The right bottom panel, in which (n, p, K) = (50, 50, 2), shows boxplots of pAUCs of five methods. Legend for the labels at its tick marks: “, full”: Ridge penalized EM algorithm without the diagonal error precision matrix assumption; “, diag”: Ridge penalized EM algorithm with the diagonal error precision matrix assumption; “, Y average”: Ridge penalized estimation of from replicate‐wise averaged data; “L 1, diag”: Lasso penalized EM algorithm with the diagonal error precision matrix assumption; “L 1, Y average”: Lasso penalized estimation of from replicate‐wise averaged data [Colour figure can be viewed at wileyonlinelibrary.com]
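The partial AUC named in the Figure 1 caption can be computed as in the following minimal sketch. It integrates sensitivity over 1 − specificity from 0 to 0.1 along the empirical ROC step curve. The function name `partial_auc`, the unnormalized scale (maximum value 0.1), and the treatment of ties (broken by input order) are assumptions; the caption does not state how the authors handled these details.

```python
import numpy as np


def partial_auc(labels, scores, max_fpr=0.1):
    """Area under the empirical ROC step curve for 1 - specificity in [0, max_fpr].

    labels: 1/True for a true edge, 0/False for a non-edge.
    scores: edge strength, e.g. absolute partial correlation (larger = more edge-like).
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(labels.sum())
    n_neg = int((~labels).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true edge and one non-edge")

    # Rank candidate edges from strongest to weakest.
    ranked = labels[np.argsort(-scores, kind="stable")]

    # Sensitivity reached at the moment each non-edge enters the selection.
    tpr_at_neg = np.cumsum(ranked)[~ranked] / n_pos

    # The j-th non-edge spans the interval [(j-1)/n_neg, j/n_neg] on the x-axis;
    # keep only the part of that interval that lies below max_fpr.
    grid = np.arange(n_neg + 1) / n_neg
    widths = np.clip(np.minimum(grid[1:], max_fpr) - grid[:-1], 0.0, None)
    return float(np.sum(tpr_at_neg * widths))


# Toy usage: 2 true edges among 12 candidate edges.
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
print(partial_auc(labels, scores))
```

Restricting the integral to low values of 1 − specificity focuses the comparison on the region where few false edges are admitted, which is the relevant regime for sparse graphs.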
FIGURE 2 Left panel: the percentage of overlapping edges (y‐axis) between the conditional independence graphs formed by selecting the top r (x‐axis) strongest (in an absolute sense) partial correlations from the standardized signal precision matrix and the “observation” precision matrix. Each line represents a different pathway and connects the percentages of overlapping edges found for a top of varying sizes r, r = 1, … , 250. Right panel: boxplots of partial correlations of randomly selected edges evaluated from a fixed signal Z diluted with varying errors. For reference the partial correlations from the undiluted signals are added as blue diamonds [Colour figure can be viewed at wileyonlinelibrary.com]
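The top-r overlap described in the left panel of Figure 2 can be sketched as follows. Partial correlations are obtained by standardizing a precision matrix, the r strongest edges in absolute value are selected from each matrix, and the percentage of shared edges is reported. The function names and the tie-breaking by index order are assumptions made for illustration.

```python
import numpy as np


def partial_correlations(omega):
    """Standardize a precision matrix: rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)."""
    d = np.sqrt(np.diag(omega))
    rho = -omega / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho


def top_edges(omega, r):
    """Set of the r edges (i, j), i < j, with the largest absolute partial correlation."""
    rho = partial_correlations(omega)
    rows, cols = np.triu_indices_from(rho, k=1)
    strongest = np.argsort(-np.abs(rho[rows, cols]), kind="stable")[:r]
    return {(int(rows[t]), int(cols[t])) for t in strongest}


def overlap_percentage(omega_a, omega_b, r):
    """Percentage of the top-r edges shared by two precision matrices."""
    shared = top_edges(omega_a, r) & top_edges(omega_b, r)
    return 100.0 * len(shared) / r


# Toy usage: a banded signal precision versus the precision of signal plus noise.
p = 6
omega_signal = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
sigma_noise = 0.5 * np.eye(p)  # illustrative error covariance
omega_observed = np.linalg.inv(np.linalg.inv(omega_signal) + sigma_noise)
print(overlap_percentage(omega_signal, omega_observed, r=5))
```

Repeating the last call for a range of r values gives one curve of the kind plotted per pathway in the figure.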
FIGURE 3 On the right, a horizontal bar plot of the number, per pathway, of overlapping edges between the top 100 strongest edges of the CIGs reconstructed from the RNA‐seq, micro‐array and the joint data. The left panel represents the accompanying color legend via a Venn diagram [Colour figure can be viewed at wileyonlinelibrary.com]