Jason Montojo, Khalid Zuberi, Quentin Shao, Gary D Bader, Quaid Morris.
Abstract
Significant effort has been invested in network-based gene function prediction algorithms based on the guilt by association (GBA) principle. Existing approaches for assessing prediction performance typically compute evaluation metrics, either averaged across all functions being considered, or strictly from properties of the network. Since the success of GBA algorithms depends on the specific function being predicted, evaluation metrics should instead be computed for each function. We describe a novel method for computing the usefulness of a network by measuring its impact on gene function cross validation prediction performance across all gene functions. We have implemented this in software called Network Assessor, and describe its use in the GeneMANIA (GM) quality control system. Network Assessor is part of the GM command line tools.
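To illustrate what computing a metric for each function means in practice, the sketch below evaluates AUROC separately for every function label instead of pooling all labels into one average. It is a minimal illustration and not the Network Assessor implementation: the function names `auroc` and `per_function_auroc`, the input layout, and the toy labels `term_a` and `term_b` are all assumptions made for this example.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_function_auroc(predictions, annotations):
    """Compute one AUROC per function.

    predictions: {function: {gene: held-out prediction score}}  (hypothetical layout)
    annotations: {function: set of genes annotated to that function}
    """
    result = {}
    for function, gene_scores in predictions.items():
        genes = sorted(gene_scores)
        scores = [gene_scores[g] for g in genes]
        labels = [g in annotations[function] for g in genes]
        result[function] = auroc(scores, labels)
    return result


# Toy data: two hypothetical function labels scored over the same four genes.
predictions = {
    "term_a": {"g1": 0.9, "g2": 0.8, "g3": 0.3, "g4": 0.1},
    "term_b": {"g1": 0.9, "g2": 0.1, "g3": 0.8, "g4": 0.3},
}
annotations = {"term_a": {"g1", "g2"}, "term_b": {"g1", "g2"}}
per_term = per_function_auroc(predictions, annotations)
```

Keeping one value per function makes it possible to see that a network helps one label and not another, which a single pooled average would hide.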
Keywords: cross validation; function prediction; machine learning; network biology; network inference
Year: 2014 PMID: 24904632 PMCID: PMC4032932 DOI: 10.3389/fgene.2014.00123
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Schematic diagram of the Network Assessor workflow. Green arrows indicate the first round of cross validation; red arrows indicate the second round. AUROC/AUPR statistics are computed for each node label (e.g., GO term) for each round of cross validation. Dashed lines indicate alternative options.
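As a companion to the AUROC/AUPR statistics mentioned in the Figure 1 caption, the following sketch computes AUPR for a single node label using the average-precision estimator. This is one common way to estimate the area under the precision-recall curve and is not necessarily the estimator used by Network Assessor; the function name `average_precision` is hypothetical.

```python
def average_precision(scores, labels):
    """Average precision: mean of the precision values at each true positive.

    Genes are ranked by descending score. Tied scores keep their input order
    (Python's sort is stable), which is a simplification.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(1 for _, y in ranked if y)
    if total_pos == 0:
        raise ValueError("need at least one positive gene")
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_pos


# Toy example: positives are ranked first and third out of four genes.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

Unlike AUROC, this quantity depends strongly on how rare the positives are, which is one reason the two metrics are usually reported side by side.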
Median AUROC and AUPR for all networks in R6 and R8, and for the default networks of each (bold indicates the higher number per comparison).
| Median AUROC | 0.650 | 0.627 | | |
| 95% CI | ±0.316 | ±0.311 | ±0.334 | ±0.329 |
| versus R6 (p) | *4.74 × 10^−82 | *1.41 × 10^−27 | | |
| versus R8 (p) | *8.11 × 10^−10 | | | |
| Median AUROC | 0.871 | 0.857 | | |
| 95% CI | ±0.217 | ±0.195 | ±0.246 | ±0.212 |
| versus R6 (p) | *9.96 × 10^−258 | *5.68 × 10^−129 | | |
| versus R8 (p) | *4.53 × 10^−35 | | | |
| Median AUPR | 0.012 | 0.019 | | |
| 95% CI | ±0.349 | ±0.409 | ±0.343 | ±0.408 |
| versus R6 (p) | *1.35 × 10^−28 | *1.34 × 10^−18 | | |
| versus R8 (p) | *5.19 × 10^−19 | | | |
| Median AUPR | 0.185 | 0.181 | | |
| 95% CI | ±0.412 | ±0.528 | ±0.415 | ±0.529 |
| versus R6 (p) | *8.45 × 10^−256 | 4.62 × 10^−1 | | |
| versus R8 (p) | 3.56 × 10^−2 | | | |
The Wilcoxon rank sum test was performed on the following pairs of conditions: R6 versus R8, R6 versus R6 (default), and R8 versus R8 (default). The p-values for these tests are listed, with statistically significant values (p < 0.01) marked with an asterisk.
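The comparison described above can be sketched with a two-sided Wilcoxon rank sum test using the normal approximation. This is a simplified illustration under stated assumptions: the helper names `average_ranks`, `rank_sum_test`, and `mark_significant` are hypothetical, no tie or continuity correction is applied to the variance, and the source does not say which software or test variant produced the published p-values.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank sum test (normal approximation). Returns (z, p)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2.0         # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided tail probability
    return z, p


def mark_significant(p, alpha=0.01):
    """Format a p-value, prefixing an asterisk when p < alpha."""
    return ("*" if p < alpha else "") + "%.2e" % p


# Toy per-term AUROC values for two hypothetical conditions.
z, p = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
```

With thousands of GO terms per condition, even small shifts in the distribution yield very small p-values, so the asterisks are best read together with the median differences.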
Figure 2. Cumulative distributions of AUROC and AUPR for GO BP terms containing 3–10 annotations and 11–300 annotations, computed from human network data in R6 and R8. The “(default)” suffix indicates that only the networks selected by default on the web server were used from the data set. The absence of the suffix indicates that all available networks were used.
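A cumulative distribution like the ones in Figure 2 can be derived from the per-term metric values alone. The sketch below builds an empirical cumulative distribution from a list of values; the function name `empirical_cdf` and the toy values are assumptions for illustration, not data from the paper.

```python
def empirical_cdf(values):
    """Return sorted (value, fraction of terms <= value) points."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for i, v in enumerate(ordered, start=1):
        if points and points[-1][0] == v:
            points[-1] = (v, i / n)   # merge repeated values into one step
        else:
            points.append((v, i / n))
    return points


# Toy AUROC values for four hypothetical GO terms.
cdf = empirical_cdf([0.5, 0.7, 0.5, 0.9])
```

A curve that sits lower and further to the right indicates that more terms reach high metric values, which is how two data sets can be compared visually.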
Median AUROC and AUPR for R8 when all networks are used (All) compared to when default (-default), co-expression (-coexp), co-localization (-coloc), genetic interaction (-gi), pathway (-path), physical (protein) interaction (-pi), predicted (-predict), and shared protein domain (-spd) networks are removed, respectively.
| | All | -default | -coexp | -coloc | -gi | -path | -pi | -predict | -spd |
|---|---|---|---|---|---|---|---|---|---|
| Total edges | 1.64 × 10^8 | 1.50 × 10^8 | 6.94 × 10^6 | 1.63 × 10^8 | 1.59 × 10^8 | 1.64 × 10^8 | 1.63 × 10^8 | 1.63 × 10^8 | 1.63 × 10^8 |
| Edges removed from all | 0 | 1.37 × 10^7 | 1.57 × 10^8 | 4.87 × 10^5 | 4.85 × 10^6 | 1.16 × 10^5 | 2.75 × 10^5 | 1.99 × 10^5 | 1.02 × 10^6 |
| Median AUROC | 0.694 | 0.685 | 0.675 | 0.694 | 0.695 | 0.692 | 0.694 | 0.694 | 0.688 |
| 95% CI | ±0.311 | ±0.309 | ±0.332 | ±0.311 | ±0.310 | ±0.311 | ±0.309 | ±0.310 | ±0.313 |
| % difference from all | | −1.3% | −2.8% | −0.1% | 0.1% | −0.4% | −0.1% | −0.1% | −0.9% |
| versus all (p) | | *2.17 × 10^−3 | *1.43 × 10^−23 | *5.04 × 10^−67 | *4.23 × 10^−109 | *1.46 × 10^−4 | *5.86 × 10^−13 | *4.05 × 10^−17 | 7.05 × 10^−1 |
| Median AUROC | 0.890 | 0.866 | 0.864 | 0.889 | 0.890 | 0.887 | 0.887 | 0.890 | 0.880 |
| 95% CI | ±0.195 | ±0.206 | ±0.224 | ±0.196 | ±0.195 | ±0.197 | ±0.199 | ±0.196 | ±0.199 |
| % difference from all | | −2.8% | −2.9% | −0.1% | 0.0% | −0.3% | −0.4% | 0.0% | −1.1% |
| versus all (p) | | *0 | *6.91 × 10^−183 | 1.91 × 10^−1 | *8.24 × 10^−20 | *1.54 × 10^−22 | *1.01 × 10^−27 | 8.83 × 10^−1 | *1.88 × 10^−219 |
| Median AUPR | 0.019 | 0.011 | 0.025 | 0.019 | 0.020 | 0.019 | 0.016 | 0.018 | 0.017 |
| 95% CI | ±0.409 | ±0.377 | ±0.406 | ±0.409 | ±0.408 | ±0.407 | ±0.392 | ±0.411 | ±0.404 |
| % difference from all | | −44.8% | 29.8% | 0.0% | 3.4% | −4.0% | −16.4% | −4.8% | −14.0% |
| versus all (p) | | *2.43 × 10^−9 | *5.94 × 10^−15 | *6.98 × 10^−61 | *5.10 × 10^−83 | *8.71 × 10^−13 | *1.97 × 10^−11 | *3.40 × 10^−24 | *1.82 × 10^−4 |
| Median AUPR | 0.220 | 0.186 | 0.209 | 0.219 | 0.220 | 0.218 | 0.206 | 0.218 | 0.215 |
| 95% CI | ±0.528 | ±0.527 | ±0.530 | ±0.528 | ±0.528 | ±0.525 | ±0.528 | ±0.528 | ±0.527 |
| % difference from all | | −15.6% | −5.0% | −0.4% | 0.1% | −0.6% | −6.2% | −0.9% | −2.2% |
| versus all (p) | | *1.14 × 10^−181 | *4.89 × 10^−9 | 3.20 × 10^−1 | *3.63 × 10^−38 | *3.92 × 10^−16 | *1.20 × 10^−43 | *3.40 × 10^−24 | *1.82 × 10^−4 |
The number of edges removed for each analysis is also listed, as well as total edges used during assessment. The Wilcoxon rank sum test was used to compute the listed p-values, where significant values (p < 0.01) are marked with an asterisk.
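The "% difference from all" rows can be read as a relative change of the median metric against the all-networks baseline. The sketch below shows that arithmetic, assuming the usual definition (value minus reference, divided by reference); the function name `percent_difference` is hypothetical, and the toy per-term values are not from the paper. The published percentages were presumably computed from unrounded medians, so recomputing them from the rounded medians in the table may differ in the last digit.

```python
from statistics import median


def percent_difference(reference, value):
    """Relative change of value with respect to reference, in percent."""
    return 100.0 * (value - reference) / reference


# Rounded medians taken from the table: AUROC with all networks (0.694)
# versus AUROC with the default networks removed (0.685).
change = percent_difference(0.694, 0.685)

# The same calculation starting from toy per-term AUROC values.
all_networks = [0.60, 0.70, 0.80]
reduced = [0.55, 0.63, 0.81]
toy_change = percent_difference(median(all_networks), median(reduced))
```

Here `round(change, 1)` gives -1.3, matching the corresponding table entry.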
Figure 3. AUROC and AUPR performance of each GO term. The x-axis shows performance using all networks from R8, while the y-axis shows performance using R8 without the default networks. GO terms containing 90 or more genes consistently performed better using all networks from R8.
| 3702 | |
| 6239 | |
| 7227 | |
| 9606 | |
| 10090 | |
| 4932 | |
| 10116 |
| GO:0000046 | 0.498133458 | 0.548483434 | 0.101077 |
| GO:0000117 | 0.471654812 | 0.516121807 | 0.094279 |
| GO:0000114 | 0.461791463 | 0.503638908 | 0.09062 |