Catharina Olsen, Gianluca Bontempi, Frank Emmert-Streib, John Quackenbush, Benjamin Haibe-Kains.
Abstract
When inferring networks from high-throughput genomic data, one of the main challenges is the subsequent validation of these networks. In the best-case scenario, the true network is partially known from previous research results published in structured databases or research articles. Traditionally, inferred networks are validated against these known interactions. Whenever the recovery rate is gauged to be high enough, subsequent high-scoring but unknown inferred interactions are deemed good candidates for further experimental validation. Such a validation framework therefore depends strongly on the quantity and quality of published interactions and presents serious pitfalls: (1) known interactions for the studied problem might be sparse; (2) quantitatively comparing different inference algorithms is not trivial; and (3) using these known interactions for validation prevents their integration into the inference procedure. The last point is particularly relevant, as it has recently been shown that integrating priors during network inference significantly improves the quality of inferred networks. To overcome these problems when validating inferred networks, we recently proposed a data-driven validation framework based on single-gene knock-down experiments. Using this framework, we were able to demonstrate the benefits of integrating prior knowledge and expression data. In this paper, we used this framework to assess the quality of different sources of prior knowledge on their own and in combination with different genomic data sets in colorectal cancer. We observed that most prior sources lead to significant F-scores. Furthermore, their integration with genomic data leads to a significant increase in F-scores, especially for priors extracted from full-text PubMed articles, known co-expression modules, and genetic interactions. Lastly, we observed that the results are consistent across three data sets: experimental knock-down data and two human tumor data sets.
Keywords: colon cancer; knockdown; network inference; prior knowledge; validation
Year: 2014 PMID: 25009552 PMCID: PMC4067568 DOI: 10.3389/fgene.2014.00177
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Quantitative validation framework for network inference. The framework relies on a set of single-gene knock-down experiments in a leave-one-out cross-validation scheme.
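The F-score used throughout can be illustrated with a minimal sketch. It assumes (the caption does not spell this out) that the genes an inferred network places downstream of a knocked-down gene are compared with the genes significantly affected by that knock-down. The function name and the gene labels are hypothetical.

```python
def f_score(predicted, affected):
    """Harmonic mean of precision and recall between two gene sets.

    predicted: genes the inferred network links to the knocked-down gene
               (assumed definition, for illustration only)
    affected:  genes significantly affected by the knock-down
    """
    predicted, affected = set(predicted), set(affected)
    true_positives = len(predicted & affected)
    if true_positives == 0:
        # Also covers empty inputs, where precision or recall is undefined.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(affected)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gene labels: 2 of 3 predictions are correct, 2 of 4 affected genes recovered.
score = f_score({"G1", "G2", "G3"}, {"G2", "G3", "G4", "G5"})
print(round(score, 3))
```

In a leave-one-out scheme, a score like this would be computed once per held-out knock-down experiment.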
Number of genes significantly affected by KD (out of 339 genes) based on gene expression data with FDR <10%.
| Number of affected genes | 73 | 122 | 33 | 38 |
| 117 | 59 | 99 | 61 |
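The record does not say which FDR procedure produced these counts. As a hedged illustration only, the sketch below assumes Benjamini-Hochberg adjustment of per-gene p-values followed by a 10% cutoff. The function names and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def count_affected(pvalues, fdr=0.10):
    """Count genes whose adjusted p-value falls below the FDR threshold."""
    return sum(q < fdr for q in benjamini_hochberg(pvalues))


# Made-up p-values for four genes, for illustration only.
example = [0.01, 0.04, 0.03, 0.5]
print(count_affected(example))
```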
Specifications of prior knowledge retrieval tools.
| PN | PubMed and databases | (PN) | 419 |
| GM | Co-expr | (GM2) | 2760 |
| GM | Co-local | (GM3) | 292 |
| GM | Genetic | (GM4) | 1546 |
| GM | Pathway | (GM5) | 100 |
| GM | Physical | (GM6) | 38 |
| GM | Predicted | (GM7) | 29 |
| GM | Shared | (GM8) | 199 |
Figure 2. Results when inferring networks with . The height of each bar corresponds to the obtained F-score, colored by prior source. The x-axis specifies the prior source and includes * if the F-score is significant with p-value < 0.05 and − for p-values < 0.1.
Figure 3. Results when inferring networks with . The height of each bar corresponds to the obtained F-score, colored by prior source. The x-axis specifies the prior source and includes * if the F-score is significant with p-value < 0.05 and − for p-values < 0.1.
Best single prior source across three large colorectal cancer data sets (kd for knock-down experiments in colorectal cancer cell lines, .
| CDK5 | GM2 | PN | PN |
| HRAS | GM2 | GM4 | GM2 |
| MAP2K1 | GM2 | GM2 | GM2 |
| MAP2K2 | PN | PN | GM7 |
| MAPK1 | PN | PN | PN |
| MAPK3 | PN | PN | PN |
| NGFR | GM4 | GM4 | GM4 |
| RAF1 | PN | GM8 | PN |
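To read the table at a glance, the entries can be tallied per column. This is a minimal sketch: the column headers are missing from this record, so columns are identified by position only, and the names `best_prior` and `tally_by_column` are hypothetical.

```python
from collections import Counter

# Rows copied from the table above: gene -> best prior source in each of the
# three data-set columns (column identities are not given in this record).
best_prior = {
    "CDK5": ("GM2", "PN", "PN"),
    "HRAS": ("GM2", "GM4", "GM2"),
    "MAP2K1": ("GM2", "GM2", "GM2"),
    "MAP2K2": ("PN", "PN", "GM7"),
    "MAPK1": ("PN", "PN", "PN"),
    "MAPK3": ("PN", "PN", "PN"),
    "NGFR": ("GM4", "GM4", "GM4"),
    "RAF1": ("PN", "GM8", "PN"),
}


def tally_by_column(table):
    """Count how often each prior source appears in each column."""
    n_cols = len(next(iter(table.values())))
    return [Counter(row[c] for row in table.values()) for c in range(n_cols)]


for position, counts in enumerate(tally_by_column(best_prior), start=1):
    print(f"column {position}: {counts.most_common()}")
```

In this table, PN is the most frequent entry in each of the three columns, appearing for 4 of the 8 genes.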
Figure 4. Results when inferring networks with . The height of each bar corresponds to the obtained F-score, colored by which prior source was added. The x-axis specifies the prior source and includes * if the F-score is significant with p-value < 0.05 and − for p-values < 0.1.