The effect of statistical normalization on network propagation scores.

Sergio Picart-Armada, Wesley K Thompson, Alfonso Buil, Alexandre Perera-Lluna.

Abstract

MOTIVATION: Network diffusion and label propagation are fundamental tools in computational biology, with applications such as gene-disease association, protein function prediction and module discovery. More recently, several publications have introduced a permutation analysis after the propagation process, due to concerns that network topology can bias diffusion scores. This raises the question of the statistical properties, and the possible bias, of such diffusion processes in each of their applications. In this work, we characterized some common null models behind the permutation analysis and the statistical properties of the diffusion scores. We benchmarked seven diffusion scores on three case studies: synthetic signals on a yeast interactome, simulated differential gene expression on a protein-protein interaction network and prospective gene set prediction on another interaction network. For clarity, all the datasets were based on binary labels, but we also present theoretical results for quantitative labels.
RESULTS: Diffusion scores starting from binary labels were affected by the label codification and exhibited a problem-dependent topological bias that could be removed by the statistical normalization. Parametric and non-parametric normalization addressed both points by being codification-independent and by equalizing the bias. We identified and quantified two sources of bias (mean value and variance) that yielded performance differences when normalizing the scores. We provided closed formulae for both and showed how the null covariance is related to the spectral properties of the graph. Although none of the proposed scores systematically outperformed the others, normalization was preferred when the sought positive labels were not aligned with the bias. We conclude that the decision on bias removal should be problem- and data-driven, i.e. based on a quantitative analysis of the bias and its relation to the positive entities.
AVAILABILITY: The code is publicly available at https://github.com/b2slab/diffuBench and the data underlying this article are available at https://github.com/b2slab/retroData.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
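
The normalization idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' diffuBench code: the toy graph, the iterative propagation scheme, the parameter `alpha` and all function names are hypothetical choices made for illustration. The sketch assumes one particular null model, in which the input labels are permuted uniformly over all nodes. Under that assumption the null mean and variance of each linear diffusion score follow from the standard moments of sampling without replacement, so a parametric z-score can be computed without simulation, and a Monte Carlo shuffle gives a non-parametric counterpart.

```python
import math
import random

# Hypothetical toy network: a hub (node 0) joined to nodes 1-4, plus a tail 4-5-6.
EDGES = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (5, 6)]
N = 7


def normalized_adjacency(edges, n):
    """Return W = D^-1/2 A D^-1/2 as a list of rows (assumes no isolated nodes)."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def diffuse(W, y, alpha=0.5, iters=100):
    """Iterate f <- alpha * W f + (1 - alpha) * y; the result is linear in y (f = K y)."""
    n = len(y)
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(W[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


def kernel(W, alpha=0.5):
    """Recover K explicitly by diffusing each unit vector (column j of K)."""
    n = len(W)
    cols = [diffuse(W, [1.0 if i == j else 0.0 for i in range(n)], alpha)
            for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def z_scores(K, y):
    """Parametric normalization under a null that permutes y over all nodes."""
    n = len(y)
    mu = sum(y) / n
    var_y = sum((v - mu) ** 2 for v in y) / n  # population variance of the labels
    out = []
    for row in K:
        s1 = sum(row)                 # row sum of K drives the null mean
        s2 = sum(k * k for k in row)  # row sum of squares enters the null variance
        raw = sum(k * v for k, v in zip(row, y))
        null_mean = mu * s1
        null_var = var_y * n / (n - 1) * (s2 - s1 * s1 / n)
        out.append((raw - null_mean) / math.sqrt(null_var))
    return out


def mc_pvalues(K, y, n_perm=2000, seed=0):
    """Non-parametric alternative: empirical upper-tail p-values from label shuffles."""
    rng = random.Random(seed)
    observed = [sum(k * v for k, v in zip(row, y)) for row in K]
    hits = [0] * len(y)
    perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(perm)
        for i, row in enumerate(K):
            if sum(k * v for k, v in zip(row, perm)) >= observed[i]:
                hits[i] += 1
    return [(h + 1) / (n_perm + 1) for h in hits]


W = normalized_adjacency(EDGES, N)
K = kernel(W)
labels_01 = [0, 1, 0, 0, 0, 0, 1]       # positives coded 1, negatives coded 0
labels_pm = [-1, 1, -1, -1, -1, -1, 1]  # same positives, negatives coded -1

raw_01 = diffuse(W, labels_01)
raw_pm = diffuse(W, labels_pm)
z_01 = z_scores(K, labels_01)
z_pm = z_scores(K, labels_pm)
p_01 = mc_pvalues(K, labels_01)

print("raw (0/1):  ", [round(v, 3) for v in raw_01])
print("raw (-1/1): ", [round(v, 3) for v in raw_pm])
print("z (0/1):    ", [round(v, 3) for v in z_01])
print("z (-1/1):   ", [round(v, 3) for v in z_pm])
print("MC p-values:", [round(v, 3) for v in p_01])
```

In this sketch the raw scores change when the negatives are recoded from 0 to -1, whereas the z-scores are identical for both codings, because an affine recoding of the labels shifts and scales the score and its null moments together. The row sums of `K` differ between the hub and the tail nodes, which is one way the topology can enter the null mean.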

Year:  2021        PMID: 33070187      PMCID: PMC8097756          DOI: 10.1093/bioinformatics/btaa896

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


