Kaixian Yu, Zihan Cui, Xin Sui, Xing Qiu, Jinfeng Zhang.
Abstract
Bayesian networks (BNs) provide a probabilistic, graphical framework for modeling high-dimensional joint distributions with complex correlation structures. BNs have wide applications in many disciplines, including biology, social science, finance and biomedical science. Despite extensive past study, learning network structure from data remains a challenging open problem in BN research. In this study, we present a sequential Monte Carlo (SMC)-based three-stage approach, GRowth-based Approach with Staged Pruning (GRASP). In the first stage, a double filtering strategy discovers the overall skeleton of the target BN. In the second stage, an adaptive SMC (adSMC) algorithm searches for the optimal network structures while increasing the quality and diversity of the sampled networks. In the third stage, the sampled networks are further improved by reclaiming edges missed in the skeleton discovery step. GRASP gave very satisfactory results when tested on benchmark networks. Finally, BN structure learning using multiple types of genomics data illustrates GRASP's potential for discovering novel biological relationships in integrative genomic studies.
Keywords: Bayesian network; Bayesian network structure learning; GRASP for BN structure learning; adaptive sequential Monte Carlo; biological network inference; sequential Monte Carlo
Year: 2021 PMID: 34912373 PMCID: PMC8668238 DOI: 10.3389/fgene.2021.764020
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. Structure discovery procedure.
Bayesian networks used in the simulation study.
| Name | # of nodes | # of edges | # of parameters | Max in-degree |
|---|---|---|---|---|
| Alarm | 37 | 46 | 509 | 4 |
| Andes | 223 | 338 | 1,157 | 6 |
| Child | 20 | 25 | 230 | 2 |
| Hailfinder | 56 | 66 | 2,656 | 4 |
| Hepar2 | 70 | 123 | 1,453 | 6 |
| Insurance | 27 | 52 | 984 | 3 |
| Win95pts | 76 | 112 | 574 | 7 |
FIGURE 2. Recall and F1 score of different methods with 1,000 observations. For the same network, DF (double filtering) generally achieves higher recall with a higher or comparable F1 score.
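To make the two metrics in this caption concrete, the sketch below computes precision, recall and F1 for a learned edge set against a reference edge set. It assumes edges are compared as undirected pairs, as in skeleton discovery; the paper's exact matching rule is not stated here, and the function name `edge_metrics` and the toy edge lists are hypothetical.

```python
def edge_metrics(true_edges, learned_edges):
    """Return (precision, recall, F1) over undirected edges.

    Assumption: an edge counts as recovered if the same pair of nodes
    is connected, regardless of direction.
    """
    true_set = {frozenset(e) for e in true_edges}
    learned_set = {frozenset(e) for e in learned_edges}
    tp = len(true_set & learned_set)  # correctly recovered edges
    precision = tp / len(learned_set) if learned_set else 0.0
    recall = tp / len(true_set) if true_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (illustrative only, not taken from the benchmark networks).
true_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
learned_edges = [("B", "A"), ("B", "C"), ("C", "E")]
precision, recall, f1 = edge_metrics(true_edges, learned_edges)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

A filter that keeps more candidate edges tends to raise recall at the cost of precision, which is why the caption reports F1 alongside recall.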
FIGURE 3. BIC scores of all methods on the seven benchmark networks with 1,000 observations. GRASP has higher BIC scores for all the benchmark networks.
FIGURE 4. BIC scores for the flow cytometry data, comparing 12 methods; GRASP has the highest BIC score. The y-axis value is the ratio of the BIC score of the sampled network to that of the true network. A sampled network can have an even higher BIC score than the true network, so the value can exceed 1.
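As a rough illustration of how a BIC score can be computed for a discrete BN structure, the sketch below uses the common convention of maximized log-likelihood minus (log N)/2 times the number of free parameters, so that higher is better. Whether the paper uses exactly this convention or scaling is an assumption; the function `bic_score`, the variable names and the toy data are hypothetical.

```python
import math
from collections import Counter


def bic_score(data, parents):
    """BIC of a discrete BN: log-likelihood - 0.5 * log(N) * (#free parameters).

    data    : list of dicts mapping variable name -> observed value
    parents : dict mapping each variable -> tuple of its parent variables
    Assumption: each variable's levels are the values observed in `data`.
    """
    n = len(data)
    levels = {v: {row[v] for row in data} for v in parents}
    score = 0.0
    for var, pa in parents.items():
        # Counts of (parent configuration, child value) and of parent configuration.
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        loglik = sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
        n_parent_cfgs = math.prod(len(levels[p]) for p in pa)  # 1 if no parents
        n_params = (len(levels[var]) - 1) * n_parent_cfgs
        score += loglik - 0.5 * math.log(n) * n_params
    return score


# Toy data: X and Y agree in 36 of 40 observations.
data = (
    [{"X": 0, "Y": 0}] * 18 + [{"X": 0, "Y": 1}] * 2
    + [{"X": 1, "Y": 0}] * 2 + [{"X": 1, "Y": 1}] * 18
)
bic_empty = bic_score(data, {"X": (), "Y": ()})     # no edges
bic_edge = bic_score(data, {"X": (), "Y": ("X",)})  # X -> Y
print(f"empty graph: {bic_empty:.2f}, X -> Y: {bic_edge:.2f}")
```

In this toy case the structure with the edge X -> Y scores higher, because the gain in log-likelihood outweighs the penalty for the extra parameter.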
FIGURE 5. The BN structure learned by GRASP using multiple genomic features that are highly correlated with the expression of LOC90784. Orange nodes: mRNA transcripts; red nodes: microRNAs; blue nodes: protein expressions; green nodes: DNA methylations.