Sergio Picart-Armada1,2, Wesley K Thompson3,4, Alfonso Buil3, Alexandre Perera-Lluna1,2. 1. B2SLab, Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, CIBER-BBN, Barcelona, 08028, Spain. 2. Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, Esplugues de Llobregat, Barcelona, 08950, Spain. 3. Mental Health Center Sct. Hans, 4000 Roskilde, Denmark. 4. Department of Family Medicine and Public Health, University of California, San Diego, La Jolla, CA, USA.
Abstract
MOTIVATION: Network diffusion and label propagation are fundamental tools in computational biology, with applications such as gene-disease association, protein function prediction and module discovery. More recently, several publications have introduced a permutation analysis after the propagation process, due to concerns that network topology can bias diffusion scores. This raises the question of the statistical properties of such diffusion processes, and of the presence of bias, in each of their applications. In this work, we characterized some common null models behind the permutation analysis and the statistical properties of the diffusion scores. We benchmarked seven diffusion scores on three case studies: synthetic signals on a yeast interactome, simulated differential gene expression on a protein-protein interaction network and prospective gene set prediction on another interaction network. For clarity, all the datasets were based on binary labels, but we also present theoretical results for quantitative labels. RESULTS: Diffusion scores starting from binary labels were affected by the label codification and exhibited a problem-dependent topological bias that could be removed by statistical normalization. Parametric and non-parametric normalization addressed both points by being codification-independent and by equalizing the bias. We identified and quantified two sources of bias (mean value and variance) that yielded performance differences when normalizing the scores. We provided closed formulae for both and showed how the null covariance is related to the spectral properties of the graph. Although none of the proposed scores systematically outperformed the others, normalization was preferred when the sought positive labels were not aligned with the bias. We conclude that the decision on bias removal should be problem- and data-driven, i.e. based on a quantitative analysis of the bias and its relation to the positive entities.
AVAILABILITY: The code is publicly available at https://github.com/b2slab/diffuBench and the data underlying this article are available at https://github.com/b2slab/retroData. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
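The parametric normalization described above can be illustrated numerically. The sketch below (an assumption-laden toy, not the paper's diffuStats/diffuBench code) diffuses a binary label vector through a regularized Laplacian kernel on a small ring graph, then standardizes the raw scores with the closed-form null mean and variance implied by uniformly permuting the labels, cross-checking against a Monte Carlo permutation analysis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: a ring of n nodes (assumption; the paper uses real interactomes)
n = 20
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A        # unnormalized graph Laplacian
K = np.linalg.inv(np.eye(n) + L)      # regularized Laplacian kernel

# Binary labels: k positives among n nodes
k = 4
y = np.zeros(n)
y[:k] = 1.0
f = K @ y                             # raw diffusion scores

# Closed-form null moments under uniform label permutation:
# E[y_j] = k/n; Cov(y_j, y_l) = p(1-p) on the diagonal, -p(1-p)/(n-1) off it.
p = k / n
row_sum = K.sum(axis=1)
mu = p * row_sum                      # null mean of each score
c = p * (1 - p) * n / (n - 1)
var = c * ((K ** 2).sum(axis=1) - row_sum ** 2 / n)   # null variance
z_closed = (f - mu) / np.sqrt(var)    # parametric z-score

# Monte Carlo check: permute the labels and re-diffuse
perms = np.array([K @ rng.permutation(y) for _ in range(20000)])
z_mc = (f - perms.mean(axis=0)) / perms.std(axis=0)
```

On this toy graph, `z_closed` and `z_mc` agree up to Monte Carlo error, showing how the closed formulae replace the cost of explicit permutations; the kernel choice and graph are placeholders for the scores and networks benchmarked in the paper.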