Hadas Biran, Martin Kupiec, Roded Sharan.
Abstract
Network propagation is a central tool in biological research. While a number of variants and normalizations have been proposed for this method, each has its own shortcomings and no large-scale assessment of those variants is available. Here we propose a novel normalization method for network propagation that is based on evaluating the propagation results against those obtained on randomized networks that preserve node degrees. In this way, our method overcomes potential biases of previous methods. We evaluate its performance on multiple large-scale datasets and find that it compares favorably to previous approaches in diverse gene prioritization tasks. We further demonstrate its utility on a focused dataset of telomere length maintenance in yeast. The normalization method is available at http://anat.cs.tau.ac.il/WebPropagate.
Keywords: degree-preserving randomization; gene prioritization; network diffusion; p-value computation; protein–protein interaction network; telomere length maintenance
Year: 2019 PMID: 30723490 PMCID: PMC6350446 DOI: 10.3389/fgene.2019.00004
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. Schematic pipeline of the RDPN method.
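The idea stated in the abstract, scoring each protein by how its propagation score compares with scores obtained on degree-preserving randomized networks, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`propagate`, `degree_preserving_shuffle`, `rdpn_pvalues`), the unweighted edge list, the smoothing parameter `alpha=0.8`, the edge-swap randomization, and the pseudo-count in the empirical p-value are all assumptions made for illustration.

```python
import random


def propagate(edges, n, seeds, alpha=0.8, iters=50):
    """Iterate p <- alpha * W p + (1 - alpha) * p0, with W = D^-1/2 A D^-1/2.

    Assumption: unweighted, undirected simple graph given as an edge list.
    """
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(iters):
        nxt = [(1 - alpha) * x for x in p0]
        for u, v in edges:
            w = alpha / (deg[u] * deg[v]) ** 0.5  # symmetric normalization
            nxt[u] += w * p[v]
            nxt[v] += w * p[u]
        p = nxt
    return p


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Randomize edges by double edge swaps; every node keeps its degree."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        # Proposed replacement: (a, b), (c, d) -> (a, d), (c, b)
        if a == d or c == b:
            continue  # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def rdpn_pvalues(edges, n, seeds, n_random=100, seed=0):
    """Empirical p-value per node: how often a random network scores as high."""
    rng = random.Random(seed)
    real = propagate(edges, n, seeds)
    exceed = [0] * n
    for _ in range(n_random):
        shuffled = degree_preserving_shuffle(edges, 10 * len(edges), rng)
        rand = propagate(shuffled, n, seeds)
        for i in range(n):
            if rand[i] >= real[i]:
                exceed[i] += 1
    # Pseudo-count keeps p-values strictly positive (an assumption of this sketch).
    return [(c + 1) / (n_random + 1) for c in exceed]
```

Under these assumptions, a call such as `rdpn_pvalues(edges, n, {0})` returns one p-value per node, and candidates would be prioritized by ascending p-value rather than by raw propagation score.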
Average AUROC of the six methods across four data sets, using two variants of adjacency matrix normalization.
| Dataset | Symmetric adjacency matrix normalization | Degree-based adjacency matrix normalization | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Propagation | EC | DADA | RSS | RSS_SD | RDPN | Propagation | EC | DADA | RSS | RSS_SD | RDPN |
| 0.695 | 0.74 | 0.707 | 0.729 | 0.745 | 0.663 | 0.685 | 0.738 | |||||
| 0.76 | 0.83 | 0.783 | 0.805 | 0.827 | 0.715 | 0.83 | 0.749 | 0.826 | 0.831 | |||
| 0.763 | 0.782 | 0.812 | 0.829 | 0.721 | 0.75 | 0.83 | 0.831 | |||||
| 0.74 | 0.798 | 0.757 | 0.774 | 0.797 | 0.707 | 0.802 | 0.734 | 0.798 | 0.8 | |||
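For readers unfamiliar with the measure reported in the table, AUROC can be computed directly from its probabilistic definition: the chance that a randomly chosen positive protein receives a better score than a randomly chosen negative one. The sketch below is illustrative only. The function name `auroc` and the convention that higher scores are better are assumptions, and the evaluation protocol used in the paper (for example, how positives are held out) is not reproduced here.

```python
def auroc(scores, positives):
    """AUROC as P(score of positive > score of negative); ties count 1/2.

    scores: list of floats, higher = better (assumed convention).
    positives: set of indices of the true positive items.
    """
    pos = [s for i, s in enumerate(scores) if i in positives]
    neg = [s for i, s in enumerate(scores) if i not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For methods that output p-values, where lower is better, one would pass negated p-values as `scores` so that the same convention applies.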
FIGURE 2. “Best method” counts, based on the AUROC measure, of the six methods across four data sets: Menche-OMIM (173 diseases), GO-MF (358 terms), GO-CC (306 terms), and GO-BP (1237 terms).
FIGURE 3. Average rank vs. weighted degree of candidate proteins. Depicted here are ranks based on seed sets from five arbitrary diseases in the Menche-OMIM set (Menche et al., 2015); bins contain approximately equal numbers of proteins. Ranks are derived from the methods’ scores: the better the score, the lower the rank.
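The binning described in this caption can be sketched as follows: scores are converted to ranks (rank 1 is the best score), proteins are sorted by weighted degree and split into bins of approximately equal size, and the mean rank is reported per bin. The function names, the tie-breaking rule, and the assumption that a higher score is better are hypothetical choices for illustration, not details taken from the paper.

```python
def ranks_from_scores(scores):
    """Rank 1 = best (highest) score; ties are broken by index order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def average_rank_by_degree_bin(degrees, ranks, n_bins):
    """Split proteins into roughly equal-sized bins by degree.

    Returns a list of (mean degree, mean rank) pairs, one per non-empty bin.
    """
    order = sorted(range(len(degrees)), key=lambda i: degrees[i])
    n = len(order)
    result = []
    for b in range(n_bins):
        chunk = order[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            mean_deg = sum(degrees[i] for i in chunk) / len(chunk)
            mean_rank = sum(ranks[i] for i in chunk) / len(chunk)
            result.append((mean_deg, mean_rank))
    return result
```

A flat curve of mean rank against mean degree would suggest that a method is not systematically favoring high-degree proteins, whereas a steadily decreasing curve would suggest a degree bias.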
FIGURE 4. Percent of proteins with p-values below 0.05 vs. seed set average weighted degree, using 173 seed sets from the Menche-OMIM data set (Menche et al., 2015).
Top 30 proteins obtained by the different methods in the telomere-length maintenance case study.
| | Propagation | EC | DADA | RSS | RSS_SD | RDPN |
|---|---|---|---|---|---|---|
| 1 | LIP2 | TFG2 | ||||
| 2 | SSB1 | RNH203 | SCW10 | |||
| 3 | SSA1 | RPI1 | SSA1 | RPB3 | ||
| 4 | RPN11 | RNH202 | SSB1 | SUB2 | MGM1 | |
| 5 | HHT1 | PMT5 | RNH203 | |||
| 6 | RPN11 | CPR7 | ||||
| 7 | CRM1 | RFU1 | RNH202 | RPO21 | CPR7 | |
| 8 | HHT2 | FLO11 | HHT1 | PAF1 | ||
| 9 | HHF1 | SPL2 | CRM1 | SUB2 | ||
| 10 | HSP82 | MVB12 | MGM1 | DLT1 | RPO21 | |
| 11 | CDC28 | HHT2 | UBP16 | |||
| 12 | RNH203 | MGM1 | HHF1 | SUP35 | BUD17 | |
| 13 | RSP5 | FMS1 | HSP82 | OLA1 | ||
| 14 | RNH202 | NTG2 | RSP5 | RIM8 | ||
| 15 | SSB2 | SAY1 | MTG2 | |||
| 16 | RPO21 | SCW10 | RPO21 | |||
| 17 | HHF2 | YKR051W | PEP5 | HTB1 | RPI1 | PEP5 |
| 18 | DSN1 | BSC1 | SUP35 | |||
| 19 | MGM1 | YBR063C | CDC28 | HTA2 | RSC3 | |
| 20 | CMR1 | SSB2 | SCP160 | RNH203 | ||
| 21 | PUT3 | YPK9 | ||||
| 22 | RVB1 | MLH3 | HHF2 | HHT2 | MVB12 | |
| 23 | RVB2 | IBA57 | DSN1 | NTG2 | PEP5 | |
| 24 | TOM1 | CIA2 | STH1 | ALG3 | ||
| 25 | RPC82 | MHF1 | HHF1 | REB1 | ||
| 26 | SSC1 | ERD2 | CMR1 | MRX1 | RNH202 | |
| 27 | PEP5 | BUD17 | RSC9 | |||
| 28 | YPR202W | TFG2 | ||||
| 29 | HTA2 | RIM8 | YJL070C | |||
| 30 | MMS22 | SRB4 | SCW10 | |||