Ting Wang, Zhao Ren, Ying Ding, Zhou Fang, Zhe Sun, Matthew L MacDonald, Robert A Sweet, Jieru Wang, Wei Chen.
Abstract
Biological networks provide additional information for the analysis of human diseases, beyond the traditional analysis that focuses on single variables. The Gaussian graphical model (GGM), a probability model that characterizes the conditional dependence structure of a set of random variables by a graph, has wide applications in the analysis of biological networks, such as inferring interactions or comparing differential networks. However, existing approaches are either not statistically rigorous or are inefficient for making inference on high-dimensional data that include tens of thousands of variables. In this study, we propose an efficient algorithm to implement the estimation of the GGM and obtain a p-value and confidence interval for each edge in the graph, based on a recent proposal by Ren et al., 2015. Through simulation studies, we demonstrate that the algorithm is faster by several orders of magnitude than the currently implemented algorithm for Ren et al. without losing any accuracy. Then, we apply our algorithm to two real data sets: transcriptomic data from a study of childhood asthma and proteomic data from a study of Alzheimer's disease. We estimate the global gene or protein interaction networks for the disease and healthy samples. The resulting networks reveal interesting interactions, and the differential networks between cases and controls show functional relevance to the diseases. In conclusion, we provide a computationally fast algorithm to implement a statistically sound procedure for constructing Gaussian graphical models and making inference with high-dimensional biological data. The algorithm has been implemented in an R package named "FastGGM".
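To make the link between a GGM and its graph concrete, the sketch below converts a precision (inverse covariance) matrix into partial correlations using the standard identity rho_ij = -omega_ij / sqrt(omega_ii * omega_jj). A zero entry means the two variables are conditionally independent given the rest, so no edge is drawn. This is only an illustration, not the FastGGM estimation procedure: the function name `partial_correlations` and the 3-variable matrix are hypothetical.

```python
import math


def partial_correlations(omega):
    """Convert a precision matrix (list of lists) into partial correlations.

    Uses rho_ij = -omega_ij / sqrt(omega_ii * omega_jj) for i != j.
    """
    p = len(omega)
    rho = [[1.0] * p for _ in range(p)]  # diagonal is 1 by convention
    for i in range(p):
        for j in range(p):
            if i != j:
                rho[i][j] = -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
    return rho


# Hypothetical precision matrix for a chain graph 0 - 1 - 2 (illustrative values).
omega = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]

rho = partial_correlations(omega)

# An edge is present wherever the partial correlation is non-zero.
edges = [(i, j) for i in range(3) for j in range(i + 1, 3) if rho[i][j] != 0.0]
print(edges)
```

In this toy matrix, variables 0 and 2 have no edge because their precision entry is zero, even though they would be marginally correlated through variable 1. With real data the precision matrix has to be estimated, which is why a p-value and confidence interval per edge are needed.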
Year: 2016 PMID: 26872036 PMCID: PMC4752261 DOI: 10.1371/journal.pcbi.1004755
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. Flowchart of the FastGGM algorithm.
Performance of estimating the precision matrix.
| Sparsity | p | n | PCC | ChebDist | Type I error (p-value < 0.01) | AUC |
|---|---|---|---|---|---|---|
| 0.04 | 100 | 400 | 0.946 | 1.793 | 0.0088 | 0.878 |
| 0.02 | 200 | 400 | 0.903 | 1.938 | 0.0092 | 0.872 |
| 0.01 | 400 | 100 | 0.592 | 4.925 | 0.0073 | 0.721 |
| 0.01 | 400 | 200 | 0.725 | 3.023 | 0.0086 | 0.806 |
| 0.01 | 400 | 400 | 0.832 | 1.995 | 0.0093 | 0.879 |
| 0.01 | 400 | 800 | 0.905 | 1.356 | 0.0095 | 0.936 |
| 0.005 | 800 | 400 | 0.730 | 2.107 | 0.0093 | 0.886 |
| 0.005 | 1000 | 800 | 0.806 | 1.423 | 0.0096 | 0.941 |
| 0.0025 | 2000 | 800 | 0.695 | 1.520 | 0.0096 | 0.946 |
| 0.0001 | 5000 | 800 | 0.523 | 1.608 | 0.0097 | 0.946 |
| 0.0005 | 10000 | 1000 | 0.436 | 1.462 | 0.0097 | 0.960 |
Comparison of computational time.
| Sparsity | p | n | FastGGM_parallel with 10 CPUs (s) | FastGGM (s) | Huge_glasso (s) | ANT_GGM (s) |
|---|---|---|---|---|---|---|
| 0.04 | 100 | 400 | 0.064 | 0.395 | 0.593 | 819.441 |
| 0.02 | 200 | 400 | 0.236 | 1.488 | 1.108 | 3502.05 |
| 0.01 | 400 | 100 | 0.281 | 1.465 | 6.213 | 13780.36 |
| 0.01 | 400 | 200 | 0.516 | 2.735 | 6.406 | 15818.482 |
| 0.01 | 400 | 400 | 1.079 | 6.095 | 4.829 | 20306.049 |
| 0.01 | 400 | 800 | 3.14 | 20.876 | 4.139 | 31023.622 |
| 0.005 | 800 | 400 | 7.565 | 50.136 | 30.542 | 202324.949 |
| 0.005 | 1000 | 800 | 33.083 | 123.483 | 50.366 | 576136.41 |
| 0.0025 | 2000 | 800 | 175.816 | 922.169 | 457.981 | 3196656.58 |
| 0.0001 | 5000 | 800 | 5799.889 | 9902.518 | 8431.532 | - |
| 0.0005 | 10000 | 1000 | 48702.67 | 74007.22 | 59084.709 | - |
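The "several orders of magnitude" claim can be checked directly against the timing table. The sketch below divides the ANT_GGM time by the single-CPU FastGGM time for each row that reports both. The numbers are copied from the table; the two integer columns are kept only as row labels, and the variable names are illustrative.

```python
import math

# (second column, third column, FastGGM seconds, ANT_GGM seconds), copied from
# the timing table. The last two rows are omitted because ANT_GGM has no
# reported time there ("-").
timings = [
    (100, 400, 0.395, 819.441),
    (200, 400, 1.488, 3502.05),
    (400, 100, 1.465, 13780.36),
    (400, 200, 2.735, 15818.482),
    (400, 400, 6.095, 20306.049),
    (400, 800, 20.876, 31023.622),
    (800, 400, 50.136, 202324.949),
    (1000, 800, 123.483, 576136.41),
    (2000, 800, 922.169, 3196656.58),
]

# Speedup of FastGGM over ANT_GGM for each simulation setting.
speedups = [ant / fast for _, _, fast, ant in timings]

for (a, b, _, _), s in zip(timings, speedups):
    print(f"{a:5d} {b:4d}  speedup = {s:7.0f}x  (10^{math.log10(s):.2f})")
```

For these nine settings every ratio falls between 10^3 and 10^4, so the single-CPU version is roughly three orders of magnitude faster than ANT_GGM. The comparison with Huge_glasso is much closer and is not covered by this ratio.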
Fig 2. Comparing gene association networks under asthmatic and healthy conditions.
A) Venn diagram of the edges and nodes in the asthmatic and healthy networks. B) Distributions of vertex degree in the two networks.
Fig 3. Sub-network of differential gene-gene associations between asthmatic and healthy conditions.
The sizes of nodes are proportional to their degrees.
Fig 4. Heat maps of the synaptic protein network in the AD cohort, where red indicates stronger correlation and white indicates weaker correlation.
The left and top color bars indicate the module membership of each protein (grey proteins do not belong to any module), with the corresponding hierarchical clustering dendrograms plotted. The left panel is the heat map based on partial correlations and the right panel is the heat map based on marginal correlations.
Top 10 pairs of proteins with significant partial correlations from the AD study.
| Protein1 | Protein2 | parCor | p.parCor | fdr.parCor | marCor | p.marCor | fdr.marCor |
|---|---|---|---|---|---|---|---|
| | | 0.93 | 0 | 0 | 0.99 | 1.50E-53 | 6.80E-50 |
| | | 0.83 | 3.50E-88 | 3.20E-84 | 0.98 | 2.80E-42 | 1.40E-39 |
| | | 0.81 | 1.60E-76 | 9.70E-73 | 0.99 | 1.20E-49 | 3.60E-46 |
| | | 0.78 | 2.20E-53 | 1.00E-49 | 0.92 | 1.40E-24 | 2.90E-23 |
| | | -0.76 | 2.30E-44 | 8.40E-41 | 0.0039 | 0.98 | 0.98 |
| | | 0.74 | 1.00E-35 | 3.10E-32 | 0.93 | 3.40E-26 | 9.50E-25 |
| | | 0.73 | 7.70E-32 | 2.00E-28 | 0.88 | 8.80E-20 | 7.50E-19 |
| | | 0.71 | 1.10E-28 | 2.50E-25 | 1 | 4.80E-59 | 8.80E-55 |
| | | 0.66 | 9.00E-19 | 1.80E-15 | 0.91 | 3.20E-23 | 5.20E-22 |
| | | 0.64 | 2.10E-16 | 3.90E-13 | 0.97 | 1.60E-37 | 3.30E-35 |