Paolo Tieri (1), Christine Nardini. (1) Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Yue Yang Road 320, Shanghai, P. R. China.
Abstract
BACKGROUND: Issues and limitations related to the accessibility, understandability and ease of use of signalling pathway databases may hamper or divert the research workflow, leading, in the worst case, to confusing reference frameworks and misinterpretation of experimental results. In an attempt to retrieve signalling pathway data related to a specific set of test genes, we queried six of the major curated signalling pathway databases (Reactome, PathwayCommons, KEGG, InnateDB, PID, and Wikipathways) and analysed the results.
FINDINGS: Although we expected differences, which are often a desirable feature when integrating the results of individual queries, we observed variations of exceptional magnitude, with disproportionate quality and quantity of the results. Some of the more remarkable differences can be explained by the diverse conceptual designs and purposes of the databases, the types of data stored and the structure of the query, as well as by missing or erroneous descriptions of the search procedure. To go beyond the mere enumeration of these problems, we identified a number of operational features, in particular inner and cross coherence, which, once quantified, offer objective criteria for choosing the best source of information.
CONCLUSIONS: In silico biology relies heavily on the information stored in databases. To ensure that computational biology mirrors biological reality and offers focused hypotheses to be experimentally validated, coherence of data codification is crucial, yet its importance is highly underestimated. We make practical recommendations for end-users, to help them cope with the current state of the databases, and for the maintainers of those databases, to contribute to the full enactment of the open data paradigm.