Linlin Xing, Maozu Guo, Xiaoyan Liu, Chunyu Wang, Lei Zhang.
Abstract
The explosion of genomic data provides new opportunities to improve gene regulatory network reconstruction. Because of its inherently probabilistic nature, the Bayesian network is one of the most promising methods for this task. However, excessive computation time and the need for a large number of biological samples reduce its effectiveness and limit its application to gene regulatory network reconstruction. In this paper, the Flooding-Pruning Hill-Climbing (FPHC) algorithm is proposed as a novel hybrid method based on Bayesian networks for gene regulatory network reconstruction. Building on our previous work, we propose the concept of DPI Level, based on the data processing inequality (DPI), to better identify the neighbors of each gene when biological samples are scarce. We then use the search-and-score approach to learn the final network structure in the restricted search space. We first analyze and validate the effectiveness of FPHC in theory. Extensive comparison experiments are then carried out on known Bayesian networks and on biological networks from the DREAM (Dialogue on Reverse Engineering Assessment and Methods) challenge. The results show that the FPHC algorithm, under recommended parameters, outperforms, on average, the original hill climbing and Max-Min Hill-Climbing (MMHC) methods with respect to both network structure and running time. In addition, our results show that FPHC is more suitable for gene regulatory network reconstruction with limited data.
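As an illustration of the data processing inequality mentioned in the abstract, the sketch below shows generic DPI-based edge pruning. For a chain X → Y → Z the DPI states that I(X; Z) ≤ min(I(X; Y), I(Y; Z)), so in a fully connected triplet the weakest edge is a candidate indirect interaction. This is a minimal sketch under stated assumptions and not the authors' DPI Level procedure, whose details the abstract does not give. The function names `mutual_information` and `dpi_prune` and the `tolerance` parameter are hypothetical and introduced only for this example.

```python
from itertools import combinations
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(xs)
    joint, px, py = {}, {}, {}
    for x, y in zip(xs, ys):
        joint[(x, y)] = joint.get((x, y), 0) + 1
        px[x] = px.get(x, 0) + 1
        py[y] = py.get(y, 0) + 1
    # Sum p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def dpi_prune(mi, tolerance=0.0):
    """Generic DPI pruning (assumed variant, not the paper's DPI Level).

    mi maps frozenset({a, b}) to the mutual information of a candidate edge.
    In every triplet whose three edges are all present, the strictly weakest
    edge is marked as indirect. All triplets are judged against the original
    scores, so the result does not depend on the order of iteration.
    """
    nodes = sorted({v for edge in mi for v in edge})
    removed = set()
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if not all(e in mi for e in edges):
            continue
        weakest = min(edges, key=lambda e: mi[e])
        strongest_two = min(mi[e] for e in edges if e != weakest)
        # tolerance > 0 keeps edges that are only marginally weaker.
        if mi[weakest] < strongest_two * (1 - tolerance):
            removed.add(weakest)
    return set(mi) - removed
```

For example, with scores 0.9 for A-B, 0.8 for B-C and 0.3 for A-C, `dpi_prune` keeps A-B and B-C and drops A-C. Raising `tolerance` makes the pruning more conservative, which may matter when mutual information is estimated from few samples.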
Keywords: data processing inequality; flooding-pruning hill-climbing algorithm; gene regulatory networks; neighbor selection
Year: 2018 PMID: 29986472 PMCID: PMC6071145 DOI: 10.3390/genes9070342
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. The four kinds of duplicate paths.
Figure 2. The relationships between target node T and other nodes.
Figure 3. The example trace of Flooding-Pruning Hill Climbing (FPNS). DPI: data processing inequality.
Figure 4. The neighbor selection results with different sample sizes and different DPI Levels. (a) Sensitivity result in the alarm network; (b) specificity result in the alarm network. ‘--’ denotes no pruning.
Figure 5. The sensitivity and specificity results in the alarm network.
The runtime of FPNS and MMPC.
| Sample Size | FPNS-D1 | MMPC |
|---|---|---|
| 50 | 54.2 | 5.17 × 10^2 |
| 100 | 51.8 | 2.31 × 10^3 |
| 200 | 52.3 | 3.59 × 10^3 |
| 500 | 52.2 | 1.15 × 10^4 |
| 1000 | 54.7 | 2.87 × 10^4 |
| 2000 | 52.6 | 1.19 × 10^5 |
| 5000 | 52.8 | 1.05 × 10^6 |
| 10,000 | 54.2 | 7.58 × 10^6 |
| 20,000 | 53.5 | 3.19 × 10^7 |
| 50,000 | 53.1 | 2.64 × 10^8 |
Figure 6. The performance comparison of different methods when the sample size is 50 and 10,000.
Figure 7. The performance comparison of different sample sizes and methods on the Hailfinder network.
Figure 8. The comparison results on the DREAM networks.