Gwenaël G R Leday, Sylvia Richardson.
Abstract
Despite major methodological developments, Bayesian inference in Gaussian graphical models remains challenging in high dimension due to the tremendous size of the model space. This article proposes a method to infer the marginal and conditional independence structures between variables by multiple testing, which bypasses the exploration of the model space. Specifically, we introduce closed-form Bayes factors under the Gaussian conjugate model to evaluate the null hypotheses of marginal and conditional independence between variables. Their computation for all pairs of variables is shown to be extremely efficient, thereby allowing us to address large problems with thousands of nodes as required by modern applications. Moreover, we derive exact tail probabilities from the null distributions of the Bayes factors. These allow the use of any multiplicity correction procedure to control error rates for incorrect edge inclusion. We demonstrate the proposed approach on various simulated examples as well as on a large gene expression data set from The Cancer Genome Atlas.
Keywords: Bayes factor; Gaussian graphical model; correlation; high-dimensional data; inverse-Wishart distribution
Year: 2019 PMID: 31009060 PMCID: PMC6916355 DOI: 10.1111/biom.13064
Source DB: PubMed Journal: Biometrics ISSN: 0006-341X Impact factor: 2.571
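The abstract notes that the exact tail probabilities allow any multiplicity correction procedure to be used. As a minimal sketch of that final step only, assuming the tail probabilities have already been computed, the Benjamini-Hochberg step-up procedure (one common choice, not necessarily the one used in the article) could be applied to the candidate edges as follows. The helper name `benjamini_hochberg`, the edge labels, and the probabilities are all hypothetical illustrative values, not results from the article.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR level alpha.

    Assumption: p_values are tail probabilities under the null hypothesis of
    independence, one per candidate edge (computed elsewhere).
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, sorted by ascending tail probability.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest tail probabilities.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical candidate edges and tail probabilities (made-up numbers).
edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"),
         ("g2", "g3"), ("g2", "g4"), ("g3", "g4")]
tail_probs = [0.001, 0.20, 0.012, 0.04, 0.65, 0.03]

keep = benjamini_hochberg(tail_probs, alpha=0.05)
selected = [edge for edge, is_kept in zip(edges, keep) if is_kept]
print(selected)
```

Because the correction operates only on the list of tail probabilities, any other procedure that takes p-values as input could be substituted without changing the rest of the workflow.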
Average and SD (in parentheses) of the areas under the ROC and PR curves over the simulated datasets, as a function of the true graph structure and sample size
| Sample size | Methods | Band structure: AUC (ROC) | Band structure: AUC (PR) | Cluster structure: AUC (ROC) | Cluster structure: AUC (PR) |
|---|---|---|---|---|---|
| 100 | | | 0.65 (0.03) | | |
| 100 | | | | 0.79 (0.02) | 0.51 (0.04) |
| 100 | | 0.88 (0.03) | 0.63 (0.05) | 0.78 (0.03) | 0.50 (0.04) |
| 100 | | | 0.61 (0.04) | 0.77 (0.02) | 0.53 (0.04) |
| 50 | | | | | |
| 50 | | 0.82 (0.03) | 0.51 (0.06) | 0.72 (0.03) | 0.37 (0.04) |
| 50 | | 0.81 (0.03) | 0.47 (0.05) | 0.72 (0.02) | 0.35 (0.04) |
| 50 | | 0.82 (0.02) | 0.44 (0.04) | 0.68 (0.02) | 0.33 (0.04) |
| 25 | | | | | |
| 25 | | 0.75 (0.04) | 0.32 (0.05) | 0.65 (0.03) | 0.23 (0.03) |
| 25 | | 0.75 (0.04) | 0.27 (0.05) | 0.64 (0.03) | 0.22 (0.03) |
| 25 | | 0.73 (0.03) | 0.28 (0.05) | 0.58 (0.02) | 0.15 (0.02) |
Abbreviations: AUC, area under curve; PR, precision recall; ROC, receiver operating characteristic.
beam, our method; bdmcmc and rjmcmc, methods of Mohammadi and Wit (2015); saturnin, method of Schwaller et al. (2017). Best performances are boldfaced.
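The two summary metrics reported in the table can be computed from any ranking of candidate edges against a known true graph. The sketch below is an illustration under stated assumptions: `auc_roc` uses the rank (Mann-Whitney) formulation of the area under the ROC curve, and `average_precision` is a common step-wise estimate of the area under the PR curve. The article's exact computation is not given here, the function names are hypothetical, and the scores and labels are toy values rather than simulation results.

```python
def auc_roc(scores, labels):
    """Area under the ROC curve via pairwise comparisons (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one true edge and one non-edge")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    # Fraction of (true edge, non-edge) pairs ranked in the right order.
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Step-wise estimate of the area under the precision-recall curve.

    Tied scores are ordered arbitrarily in this simple sketch.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one true edge")
    # Rank candidate edges from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy example: edge scores (higher = stronger evidence) and true edge labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 1, 0, 0]

roc = auc_roc(scores, labels)            # 7 of 9 pairs ordered correctly
ap = average_precision(scores, labels)   # (1 + 2/3 + 3/4) / 3
print(round(roc, 3), round(ap, 3))
```

Repeating this over each simulated dataset and taking the mean and standard deviation of the results would give entries of the form shown in the table.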
Average and SD (in parentheses) of the areas under the ROC and PR curves over the simulated datasets, as a function of the true graph structure and sample size
| Band structure | Cluster structure | ||||
|---|---|---|---|---|---|
|
| Methods |
|
|
|
|
| 200 |
| 0.88 (0.01) | 0.55 (0.02) |
| 0.58 (0.01) |
| 200 |
|
|
|
| 0.59 (0.01) |
| 200 |
| 0.87 (0.01) |
| 0.89 (0.01) |
|
| 500 |
|
| 0.58 (0.01) |
| 0.50 (0.01) |
| 500 |
|
| 0.60 (0.01) |
|
|
| 500 |
| 0.90 (0.01) |
| 0.85 (0.01) | 0.49 (0.01) |
| 1000 |
|
| 0.49 (0.01) |
| 0.48 (0.01) |
| 1000 |
|
| 0.49 (0.01) |
|
|
| 1000 |
| 0.87 (0.01) |
| 0.87 (0.00) | 0.48 (0.01) |
Abbreviations: AUC, area under curve; PR, precision recall; ROC, receiver operating characteristic.
beam, our method; saturnin, method of Schwaller et al. (2017); genenet, method of Schäfer and Strimmer (2005); fastggm, method of Ren et al. (2015). Best performances are boldfaced.
Figure 1. Running time in seconds (assessed on a 3.40 GHz Intel Core i7‐3770 CPU) for each method when
Figure 2. A, Log‐marginal likelihood of the GC model as a function of . The vertical and horizontal dotted lines indicate the location of the optimum. B, Degree distribution of the conditional independence graph. GC, Gaussian conjugate.
Figure 3. Example of a densely connected gene subgraph identified by the clustering algorithm of Blondel et al. (2008).