Fabio Nikolay1, Marius Pesavento2, George Kritikos3, Nassos Typas3.
Abstract
In this paper, we consider the problem of learning the genetic interaction map, i.e., the topology of a directed acyclic graph (DAG) of genetic interactions, from noisy double-knockout (DK) data. Based on a set of well-established biological interaction models, we detect and classify the interactions between genes. We propose a novel linear integer optimization program called the Genetic-Interactions-Detector (GENIE) to identify the complex biological dependencies among genes and to compute the DAG topology that best matches the DK measurements. Furthermore, we extend the GENIE program by incorporating genetic interaction profile (GI-profile) data to further enhance the detection performance. In addition, we propose a sequential scalability technique for large sets of genes under study, in order to provide statistically significant results for real measurement data. Finally, we show via numerical simulations that the GENIE program and the GI-profile-data-extended GENIE (GI-GENIE) program clearly outperform the conventional techniques, and we present real-data results for our proposed sequential scalability technique.
Keywords: Big data; Discrete optimization; Genetic interaction analysis; Graph learning; Large-scale gene networks; Multiple hypothesis test
Year: 2017 PMID: 28933027 PMCID: PMC5607220 DOI: 10.1186/s13637-017-0063-3
Source DB: PubMed Journal: EURASIP J Bioinform Syst Biol ISSN: 1687-4145
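The abstract requires the estimated interaction map to be a directed acyclic graph. As a minimal sketch of what that constraint means for a candidate topology, the following check tests a 0/1 adjacency matrix for directed cycles using Kahn's topological sort. The function name `is_dag` and the matrix convention (`adj[i][j] == 1` for an edge from gene i to gene j) are assumptions made for illustration; they are not taken from the GENIE formulation.

```python
from collections import deque


def is_dag(adj):
    """Return True if the 0/1 adjacency matrix has no directed cycle.

    Assumed convention: adj[i][j] == 1 means there is an edge i -> j.
    """
    n = len(adj)
    # Number of incoming edges for every node.
    indegree = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    # Start from all nodes without incoming edges.
    queue = deque(j for j in range(n) if indegree[j] == 0)
    visited = 0
    while queue:
        i = queue.popleft()
        visited += 1
        # Removing node i lowers the in-degree of each of its children.
        for j in range(n):
            if adj[i][j]:
                indegree[j] -= 1
                if indegree[j] == 0:
                    queue.append(j)
    # Every node is removed exactly when the graph contains no cycle.
    return visited == n


chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # 0 -> 1 -> 2
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # 0 -> 1 -> 2 -> 0
```

Here `is_dag(chain)` holds and `is_dag(cycle)` does not. An integer program would have to encode acyclicity through constraints rather than a post-hoc test, so this check is only useful for validating a topology after it has been estimated.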
Fig. 1 DAG of 13 genes and root node R
Fig. 2 Possible hierarchical relationship classes between two arbitrary genes i, j of a DAG according to [2]
Fig. 3 Example SMAP
Fig. 4 Schematically reduced DAGs corresponding to Eqs. (5a)–(5e), respectively
Fig. 5 Left: Original DAG with the corresponding set of hierarchical relationship classes. Right: Reconstruction of the DAG based on that set
Proposed sparse edge detection policy
Proposed reporter node edge detection policy
Fig. 6 Example DAG to elucidate the functionality of the RHS of condition E of Table 2
Summary of the proposed SEQSCA algorithm
| Step | Operation |
|---|---|
| 1 | Select subset |
| 2 | Update |
| 3 | Estimate the DAG topology |
| 4 | Update reliability matrix |
| 7 | Update iteration number |
| | Set |
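The loop summarized above (select a subset, estimate the DAG topology on it, update a reliability matrix, advance the iteration counter) can be sketched as follows. This is an illustration under stated assumptions, not the published SEQSCA algorithm: the update expressions are not given here, so the sketch assumes that a reliability entry is the fraction of subsets containing both genes in which the edge was detected. The names `seqsca_sketch` and `estimate_topology` are hypothetical, and `estimate_topology` is a stand-in for a GENIE or GI-GENIE solve on the selected subset.

```python
import random


def seqsca_sketch(genes, estimate_topology, subset_size=10,
                  num_subsets=1000, seed=0):
    """Aggregate edge detections over random gene subsets.

    estimate_topology(subset) is a hypothetical callable returning a set of
    (parent, child) pairs detected among the genes in `subset`.
    """
    rng = random.Random(seed)
    n = len(genes)
    index = {g: k for k, g in enumerate(genes)}
    detected = [[0] * n for _ in range(n)]  # times edge a -> b was reported
    tested = [[0] * n for _ in range(n)]    # times a and b shared a subset

    for _ in range(num_subsets):            # iteration counter
        subset = rng.sample(genes, subset_size)  # select subset
        edges = estimate_topology(subset)        # estimate topology on subset
        for a in subset:                         # update the counts
            for b in subset:
                if a != b:
                    tested[index[a]][index[b]] += 1
                    if (a, b) in edges:
                        detected[index[a]][index[b]] += 1

    # Assumed reliability rule: detection frequency per ordered gene pair.
    return [[detected[i][j] / tested[i][j] if tested[i][j] else 0.0
             for j in range(n)] for i in range(n)]


# Toy usage: an oracle that returns the true edges inside each subset.
genes = ["g0", "g1", "g2", "g3", "g4", "g5"]
true_edges = {("g0", "g1"), ("g1", "g2")}


def oracle(subset):
    members = set(subset)
    return {(a, b) for (a, b) in true_edges if a in members and b in members}


reliability = seqsca_sketch(genes, oracle, subset_size=3, num_subsets=200)
```

With a noise-free oracle, every true edge that is ever sampled gets reliability 1 and every other pair gets 0. With a noisy estimator the entries fall between these extremes, which is the motivation for aggregating over many subsets.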
Fig. 7 D_ed versus SNR; t_corr = 0.6; 200 Monte Carlo runs; λ = 0.05, λ = 1, λ = 0.8
Fig. 8 D_mis versus SNR; t_corr = 0.6; 200 Monte Carlo runs; λ = 0.05, λ = 1, λ = 0.8
Fig. 9 Reliability matrix G; S = 5e4 subsets considered; subset size N = 10
Fig. 10 Reliability matrix GI; S = 5e4 subsets considered; subset size N = 10; λ = 1e3, λ = 1, λ = 0.85
Acceptance ratios; ε = 0.05
| Method | Acceptance ratio |
|---|---|
| SEQSCA and GENIE | 53 |
| SEQSCA and GI-GENIE | 74 |