
Mining gene-disease relationships from biomedical literature: weighting protein-protein interactions and connectivity measures.

Graciela Gonzalez, Juan C Uribe, Luis Tari, Colleen Brophy, Chitta Baral.

Abstract

MOTIVATION: The promise of post-genome-era disease-related discoveries and advances has yet to be fully realized; many opportunities for discovery remain hidden in the millions of biomedical papers published since. Public databases give access to data extracted from the literature by teams of experts, but their coverage is often limited and lags behind recent discoveries. We present a computational method that combines data extracted from the literature with data from curated sources to uncover possible gene-disease relationships that are not directly stated or were missed by the initial mining.
METHOD: An initial set of genes and proteins is obtained from gene-disease relationships extracted from PubMed abstracts using natural language processing. Interactions involving the corresponding proteins are extracted in the same way and integrated with interactions from curated databases (such as BIND and DIP), and each interaction is assigned a confidence measure that depends on its source. The augmented list of genes and gene products is then ranked by combining two scores: one that reflects the strength of the relationship with the initial gene set and incorporates user-defined weights, and another that reflects the importance of the gene in maintaining the connectivity of the network. We applied the method to atherosclerosis to assess its effectiveness.
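
The abstract does not give the confidence values or the scoring formulas, so what follows is only a minimal Python sketch of the ranking step under stated assumptions. The per-source confidences in `SOURCE_CONFIDENCE` are hypothetical. When several sources report the same pair, the highest confidence is kept. The relationship score is a seed-weighted sum of edge confidences to the initial genes, with `seed_weights` standing in for the user-defined weights. Connectivity importance is approximated by the number of pairs of other nodes that become disconnected when a node is removed. That last measure is a simple stand-in and not necessarily the one the authors used. All function names are illustrative.

```python
from collections import defaultdict, deque

# Hypothetical per-source confidence values (not taken from the paper).
SOURCE_CONFIDENCE = {"BIND": 0.9, "DIP": 0.9, "text": 0.6}


def build_network(interactions):
    """Merge (protein_a, protein_b, source) triples into an undirected weighted graph.

    Duplicate pairs from several sources keep their highest confidence;
    pairs from unknown sources (confidence 0) are dropped.
    """
    graph = defaultdict(dict)
    for a, b, source in interactions:
        conf = SOURCE_CONFIDENCE.get(source, 0.0)
        if a == b or conf <= 0.0:
            continue
        if conf > graph[a].get(b, 0.0):
            graph[a][b] = conf
            graph[b][a] = conf
    return dict(graph)


def relationship_score(graph, seed_weights):
    """Sum of edge confidence times user-defined seed weight over each node's seed neighbours."""
    return {
        node: sum(conf * seed_weights[nbr] for nbr, conf in nbrs.items() if nbr in seed_weights)
        for node, nbrs in graph.items()
    }


def components(graph, removed=None):
    """Connected components (as sets), optionally ignoring one removed node."""
    seen, comps = set(), []
    for start in graph:
        if start == removed or start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:  # breadth-first search from `start`
            for nbr in graph[queue.popleft()]:
                if nbr != removed and nbr not in seen:
                    seen.add(nbr)
                    comp.add(nbr)
                    queue.append(nbr)
        comps.append(comp)
    return comps


def connected_pairs(comps):
    """Number of unordered node pairs joined by some path."""
    return sum(len(c) * (len(c) - 1) // 2 for c in comps)


def connectivity_score(graph):
    """Pairs of *other* nodes that become disconnected when a node is removed."""
    base_comps = components(graph)
    base = connected_pairs(base_comps)
    scores = {}
    for node in graph:
        own = next(c for c in base_comps if node in c)
        after = connected_pairs(components(graph, removed=node))
        # Subtract the len(own) - 1 pairs that involve `node` itself.
        scores[node] = base - after - (len(own) - 1)
    return scores


def rank(graph, seed_weights, alpha=0.5):
    """Order nodes by alpha * relationship + (1 - alpha) * connectivity, each max-normalised."""
    rel = relationship_score(graph, seed_weights)
    con = connectivity_score(graph)
    max_rel = max(rel.values(), default=0.0) or 1.0
    max_con = max(con.values(), default=0) or 1
    combined = {n: alpha * rel[n] / max_rel + (1 - alpha) * con[n] / max_con for n in graph}
    return sorted(combined, key=lambda n: (-combined[n], n))


if __name__ == "__main__":
    # Toy, made-up interactions: S1 and S2 are the initial (seed) genes.
    toy = [
        ("S1", "A", "text"), ("S1", "A", "BIND"), ("S2", "A", "DIP"),
        ("A", "B", "text"), ("B", "C", "text"), ("S2", "C", "unknown"),
    ]
    net = build_network(toy)
    print(rank(net, seed_weights={"S1": 1.0, "S2": 0.5}))  # ['A', 'B', 'C', 'S1', 'S2']
```

In the toy run, `A` ranks first because it links to both seeds and its removal splits the network into three pieces. `B` scores only on connectivity. A different connectivity measure (for example betweenness centrality) or a different rule for merging duplicate interactions would change only the corresponding function.
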
RESULTS: Top-ranked proteins from the method are related to atherosclerosis with accuracy between 0.85 and 1.00 for the top 20 and between 0.64 and 0.80 for the top 90 when duplicates are ignored; 45% of the top 20 and 75% of the top 90 were derived by the method rather than extracted directly from text. Thus, although the initial gene set and interactions were automatically extracted from text (and subject to the imprecision of automatic extraction), they are valuable for further hypothesis generation when combined with adequate computational analysis.
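
One plausible reading of the accuracy figures is precision at k over a ranked list in which duplicate entries are collapsed before counting. The sketch below assumes exactly that. The `aliases` mapping (alternative name to canonical identifier) and the set of entries judged related are hypothetical inputs, since the abstract does not say how duplicates or relatedness were determined.

```python
def precision_at_k(ranked, related, k, aliases=None):
    """Fraction of the top-k unique entries that are judged related.

    `aliases` (hypothetical) maps an alternative name to a canonical identifier,
    so duplicates count once, at their highest-ranked position.
    """
    aliases = aliases or {}
    seen, top = set(), []
    for name in ranked:
        key = aliases.get(name, name)
        if key in seen:
            continue  # duplicate of an entry already counted
        seen.add(key)
        top.append(key)
        if len(top) == k:
            break
    # If fewer than k unique entries exist, divide by the number actually found.
    return sum(key in related for key in top) / len(top) if top else 0.0


# Made-up example: "P1-alt" duplicates "P1".
ranked = ["P1", "P1-alt", "P2", "P3", "P4"]
print(precision_at_k(ranked, {"P1", "P3"}, k=3, aliases={"P1-alt": "P1"}))  # 0.666...
print(precision_at_k(ranked, {"P1", "P3"}, k=3))  # duplicates not collapsed: 0.333...
```

The share of top entries derived by the method could be computed the same way, by testing membership in the set of text-extracted entries instead of the related set and taking the complement.
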


Year:  2007        PMID: 17992743

Source DB:  PubMed          Journal:  Pac Symp Biocomput        ISSN: 2335-6928


  21 in total

1.  Mining connections between chemicals, proteins, and diseases extracted from Medline annotations.

Authors:  Nancy C Baker; Bradley M Hemminger
Journal:  J Biomed Inform       Date:  2010-03-27       Impact factor: 6.317

2.  Extraction of genotype-phenotype-drug relationships from text: from entity recognition to bioinformatics application.

Authors:  Adrien Coulet; Nigam Shah; Lawrence Hunter; Chitta Baral; Russ B Altman
Journal:  Pac Symp Biocomput       Date:  2010

3.  Mining the pharmacogenomics literature--a survey of the state of the art.

Authors:  Udo Hahn; K Bretonnel Cohen; Yael Garten; Nigam H Shah
Journal:  Brief Bioinform       Date:  2012-07       Impact factor: 11.622

4.  Teaching computers to read the pharmacogenomics literature ... so you don't have to.

Authors:  Yael Garten; Russ B Altman
Journal:  Pharmacogenomics       Date:  2010-04       Impact factor: 2.533

5.  Topology-driven protein-protein interaction network analysis detects genetic sub-networks regulating reproductive capacity.

Authors:  Tarun Kumar; Leo Blondel; Cassandra G Extavour
Journal:  Elife       Date:  2020-09-09       Impact factor: 8.140

6.  A vector space model approach to identify genetically related diseases.

Authors:  Indra Neil Sarkar
Journal:  J Am Med Inform Assoc       Date:  2012-01-06       Impact factor: 4.497

7.  Advances in translational bioinformatics: computational approaches for the hunting of disease genes.

Authors:  Maricel G Kann
Journal:  Brief Bioinform       Date:  2009-12-10       Impact factor: 11.622

8.  Disease candidate gene identification and prioritization using protein interaction networks.

Authors:  Jing Chen; Bruce J Aronow; Anil G Jegga
Journal:  BMC Bioinformatics       Date:  2009-02-27       Impact factor: 3.169

9.  Identifying gene-disease associations using centrality on a literature mined gene-interaction network.

Authors:  Arzucan Ozgür; Thuy Vu; Günes Erkan; Dragomir R Radev
Journal:  Bioinformatics       Date:  2008-07-01       Impact factor: 6.937

10.  An integrated approach to inferring gene-disease associations in humans.

Authors:  Predrag Radivojac; Kang Peng; Wyatt T Clark; Brandon J Peters; Amrita Mohan; Sean M Boyle; Sean D Mooney
Journal:  Proteins       Date:  2008-08-15