Anabel Usie, Hiren Karathia, Ivan Teixidó, Rui Alves, Francesc Solsona.
Abstract
One way to initiate the reconstruction of molecular circuits is by using automated text-mining techniques. Developing more efficient methods for such reconstruction is a topic of active research, and those methods are typically included by bioinformaticians in pipelines used to mine and curate large literature datasets. Nevertheless, experimental biologists have few user-friendly tools available that use text-mining for network reconstruction and require no programming skills. One of these tools is Biblio-MetReS. Originally, this tool permitted an on-the-fly analysis of documents contained in a number of web-based literature databases to identify co-occurrence of proteins/genes. This approach ensured results that were always up-to-date with the latest live version of the databases. However, this 'up-to-dateness' came at the cost of long execution times. Here we report an evolution of the application Biblio-MetReS that permits constructing co-occurrence networks for genes, GO processes, pathways, or any combination of the three types of entities, and graphically representing those entities. We show that the performance of Biblio-MetReS in identifying gene co-occurrence is at least as good as that of other comparable applications (STRING and iHOP). In addition, we show that the identification of GO processes is on par with that reported in the latest BioCreAtIvE challenge. Finally, we report the implementation of a new strategy that combines on-the-fly analysis of new documents with preprocessed information from documents that were encountered in previous analyses. This combination simultaneously decreases program run time and maintains 'up-to-dateness' of the results. AVAILABILITY: http://metres.udl.cat/index.php/downloads. CONTACT: metres.cmb@gmail.com.
Keywords: Literature analysis; Network reconstruction; Systems biology
Year: 2014 PMID: 24688854 PMCID: PMC3940481 DOI: 10.7717/peerj.276
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. User workflow for Biblio-MetReS.
Figure 2. Effect of preprocessing documents on Biblio-MetReS’ run time.
In brief, genes from three KEGG-defined pathways are used for this test. Panels A.x show experimental results for glycolysis genes. Panels B.x show experimental results for lysine metabolism genes. Panels C.x show experimental results for RNA processing genes. Four organisms are used in this benchmark. Panels Y.1 show results for Homo sapiens, panels Y.2 show results for Drosophila melanogaster, panels Y.3 show results for Escherichia coli, and panels Y.4 show results for Saccharomyces cerevisiae. These pathways and organisms were chosen to remain consistent with the tests performed in Usié . Searches were done with all the databases in the application selected. The graphs can be interpreted as follows. Light gray bars indicate the run time for Biblio-MetReS when the corresponding gene is searched for the first time. In this case the program has to do a full document analysis on the fly, and no information has been preprocessed. Darker gray bars indicate the run time for Biblio-MetReS when the search for the corresponding gene is repeated and preprocessed information is already present in Biblio-MetReS’ central database. The column ‘All’ indicates the run time for searching all genes in the graph simultaneously, after individual searches for each gene had already been done and the results preprocessed and stored.
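The difference between first and repeated searches in Figure 2 can be illustrated with a minimal Python sketch of the general idea: analyze a document on the fly only if it has not been seen before, and otherwise reuse the stored result. This is not the Biblio-MetReS implementation. The names `_preprocessed`, `find_entities`, `entities_for`, and `cooccurrence` are hypothetical, the in-memory dictionary merely stands in for a central database, and the substring matching is a deliberately naive placeholder for real entity recognition.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stand-in for a central database of preprocessed results.
# Simplification: it assumes the same vocabulary is used on every call.
_preprocessed = {}


def find_entities(text, vocabulary):
    """Naive on-the-fly analysis: vocabulary terms that occur in the text."""
    lowered = text.lower()
    return frozenset(term for term in vocabulary if term.lower() in lowered)


def entities_for(doc_id, text, vocabulary, stats):
    """Reuse a stored result if present; otherwise analyze and store it."""
    if doc_id in _preprocessed:
        stats["cached"] += 1
    else:
        stats["analyzed"] += 1
        _preprocessed[doc_id] = find_entities(text, vocabulary)
    return _preprocessed[doc_id]


def cooccurrence(documents, vocabulary):
    """Count, for each pair of entities, the documents mentioning both."""
    stats = Counter()
    edges = Counter()
    for doc_id, text in documents.items():
        found = sorted(entities_for(doc_id, text, vocabulary, stats))
        edges.update(combinations(found, 2))
    return edges, stats
```

Calling `cooccurrence` twice on an overlapping set of documents analyzes only the documents that are new in the second call, so the results still cover the latest documents while repeated work is avoided. The returned `edges` counter maps sorted entity pairs to document counts and could serve as the weighted edge list of a co-occurrence network.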