Pierre Geurts, Nizar Touleimat, Marie Dutreix, Florence d'Alché-Buc.
Abstract
BACKGROUND: Elucidating biological networks between proteins is now one of the most important challenges in systems biology. Computational approaches to this problem are important to complement high-throughput technologies and to help biologists design new experiments. In this work, we focus on the completion of a biological network from various sources of experimental data.
Year: 2007 PMID: 17493253 PMCID: PMC1892073 DOI: 10.1186/1471-2105-8-S2-S4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. ROC curves. ROC curves for TF vs. LF edges (left) and TF vs. TF edges (right) with different sets of inputs, on the protein-protein interaction network (top) and the metabolic network (bottom).
AUC results.
| Inputs | All | TF vs. LF | TF vs. TF | Kern. (All) |
| Protein-protein interactions | | | | |
| expr | | 0.859 ± 0.027 | 0.819 ± 0.082 | 0.776 |
| phy | 0.693 ± 0.036 | 0.698 ± 0.035 | 0.617 ± 0.064 | |
| loc | 0.725 ± 0.018 | 0.726 ± 0.017 | 0.710 ± 0.055 | |
| expr+phy+loc | 0.887 ± 0.024 | 0.891 ± 0.023 | 0.845 ± 0.081 | - |
| y2h | | 0.795 ± 0.022 | 0.692 ± 0.068 | 0.612 |
| expr+phy+loc+y2h | 0.910 ± 0.019 | 0.914 ± 0.017 | 0.865 ± 0.057 | |
| Metabolic network | | | | |
| expr | | 0.732 ± 0.035 | 0.619 ± 0.089 | 0.706 |
| phy | | 0.819 ± 0.031 | 0.721 ± 0.086 | 0.747 |
| loc | | 0.587 ± 0.022 | 0.592 ± 0.042 | 0.577 |
| expr+phy+loc | | 0.853 ± 0.025 | 0.733 ± 0.057 | 0.804 |
| y2h | 0.639 ± 0.033 | 0.650 ± 0.034 | 0.490 ± 0.098 | - |
| expr+phy+loc+y2h | 0.844 ± 0.025 | 0.851 ± 0.026 | 0.721 ± 0.056 | - |
AUC results obtained with extra-trees and ten-fold cross-validation compared with full kernel-based methods. The best result in each row between tree-based and kernel-based methods (for all predictions) is underlined.
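As a reading aid for the "mean ± deviation" AUC entries, the following minimal sketch computes an AUC per fold from ranked scores and then summarizes the folds. It is not the authors' pipeline: the function names `auc` and `summarize`, the toy folds, and the use of the sample standard deviation for the "±" term are all assumptions made for illustration.

```python
from statistics import mean, stdev


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize(fold_aucs):
    """Format per-fold AUCs as 'mean ± std' (sample standard deviation; an assumption)."""
    return f"{mean(fold_aucs):.3f} ± {stdev(fold_aucs):.3f}"


# Hypothetical toy folds: label 1 = interacting pair, 0 = non-interacting pair.
toy_folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    ([1, 1, 0, 0], [0.8, 0.7, 0.3, 0.1]),
    ([0, 1, 0, 1], [0.5, 0.5, 0.1, 0.9]),
]
fold_aucs = [auc(y, s) for y, s in toy_folds]
print(summarize(fold_aucs))  # 0.875 ± 0.125
```

With ten folds instead of three, the same two functions would yield one table cell per input set and evaluation setting.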
Figure 2. Decision tree. A decision tree obtained on the protein-protein interaction network using expression data, phylogenetic profiles and localization data as inputs. The tree size was determined by cost-complexity pruning with 10-fold cross-validation. The left edge from a test node corresponds to the node's test being true, and the right edge to it being false. Each leaf is labeled with a pair (N, p), where N is the number of proteins in its cluster and p is the percentage of protein pairs that interact in the cluster.
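The leaf label (N, p) described in the caption can be illustrated with a small sketch. It assumes that p is taken over unordered pairs of distinct proteins in the cluster and that interactions are undirected; the function name `leaf_label` and the toy data are hypothetical and not taken from the paper.

```python
from itertools import combinations


def leaf_label(proteins, interactions):
    """Return (N, p) for one leaf.

    N is the number of distinct proteins in the cluster; p is the percentage
    of unordered within-cluster protein pairs found in `interactions`.
    """
    members = sorted(set(proteins))
    n = len(members)
    if n < 2:
        return n, 0.0  # no pairs to evaluate
    # Store edges as frozensets so (A, B) and (B, A) match the same pair.
    known = {frozenset(edge) for edge in interactions}
    pairs = list(combinations(members, 2))
    hits = sum(1 for pair in pairs if frozenset(pair) in known)
    return n, 100.0 * hits / len(pairs)


# Hypothetical toy cluster and interaction list.
cluster = ["A", "B", "C", "D"]
edges = [("A", "B"), ("C", "B"), ("A", "E")]  # A-E leaves the cluster, so it is ignored
n, p = leaf_label(cluster, edges)
print(n, round(p, 1))  # 4 33.3
```

Here 2 of the 6 possible within-cluster pairs interact, which gives p of about 33.3%.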
Figure 3. Graph clustering. The projection of the tree leaves in Figure 2 on the protein-protein interaction network. Only the leaves that contain more than 5 proteins and 5% of connections are represented.
Variable ranking.
| Protein-protein interactions | | | Metabolic network | | |
| # | Att. | Imp. | # | Att. | Imp. |
| 1 | loc – nucleolus | 0.021 | 1 | phy – dre | 0.011 |
| 2 | expr (Spell.) – elu 120 | 0.013 | 2 | phy – rno | 0.009 |
| 3 | loc – cytoplasm | 0.012 | 3 | expr (Eisen) – cdc15 120 m | 0.008 |
| 4 | expr (Eisen) – sporulation ndt80 early | 0.012 | 4 | phy – ecu | 0.008 |
| 5 | loc – nucleus | 0.012 | 5 | expr (Eisen) – cdc15 160 m | 0.008 |
| 6 | expr (Eisen) – sporulation 30 m | 0.011 | 6 | phy – pfa | 0.007 |
| 7 | expr (Eisen) – sporulation ndt80 middle | 0.010 | 7 | phy – mmu | 0.007 |
| 8 | expr (Spell.) – alpha 14 | 0.010 | 8 | loc – cytoplasm | 0.006 |
| 9 | expr (Spell.) – elu 150 | 0.010 | 9 | expr (Eisen) – cdc15 30 m | 0.005 |
| 10 | loc – mitochondrion | 0.009 | 10 | expr (Eisen) – elutriation 5.5 hrs | 0.005 |
Variable rankings obtained with expressions, phylogenetic profiles, and localization data used as inputs to extra-trees.
Figure 4. Cluster prediction. Predictions of protein-protein interactions in a cluster of 198 genes. Blue diamond-shaped nodes are proteins present in the training sample; red circle-shaped nodes are proteins that were not seen by the learning algorithm. Annotation was found using BiNGO.