
A quantitative model for linking two disparate sets of articles in MEDLINE.

Vetle I Torvik, Neil R Smalheiser.

Abstract

BACKGROUND: Identifying information that implicitly links two disparate sets of articles is a fundamental and intuitive data mining strategy that can help investigators address real scientific questions. The Arrowsmith two-node search finds title words and phrases (so-called B-terms) that are shared across two sets of articles within MEDLINE and displays them in a manner that facilitates human assessment. A serious stumbling block has been the lack of a quantitative model for predicting which of the hundreds, if not thousands, of B-terms computed for a given search are most likely to be relevant to the investigator.

METHODOLOGY/PRINCIPAL FINDINGS: Using a public two-node search interface, field testers devised a set of two-node searches under real-life conditions, and a certain number of B-terms were marked relevant. These were employed as 'gold standards'; each B-term was characterized according to eight complementary features that were strongly correlated with relevance. A logistic regression model was developed that permits one to estimate the probability of relevance for each B-term, to rank B-terms according to their likely relevance, and to estimate the overall number of relevant B-terms inherent in a given two-node search.

CONCLUSIONS/SIGNIFICANCE: The model greatly simplifies and streamlines the process of carrying out a two-node search, and may be applicable to a number of other literature-based discovery applications, including the so-called one-node search and related gene-centric strategies that incorporate implicit links to predict how genes may be related to each other and to human diseases. This should encourage much wider exploration of text mining for implicit information among the general scientific community.

AVAILABILITY: Two-node searches can be carried out freely at http://arrowsmith.psych.uic.edu.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
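
The workflow described in the abstract (shared title terms, a per-term relevance probability, a ranking, and an estimated count of relevant terms) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the function names, the single-word tokenizer, the example titles, the two placeholder features, and the weights are all hypothetical and are not the eight features or fitted coefficients of the published model. Summing the predicted probabilities is one standard way to estimate an expected count and is not necessarily the authors' procedure.

```python
import math
import re


def title_terms(titles):
    """Collect the lowercase single-word terms found in a list of titles."""
    terms = set()
    for title in titles:
        terms.update(re.findall(r"[a-z0-9]+", title.lower()))
    return terms


def shared_terms(titles_a, titles_c, stopwords=frozenset()):
    """Candidate B-terms: words present in both title sets, minus stopwords."""
    return (title_terms(titles_a) & title_terms(titles_c)) - set(stopwords)


def relevance_probability(features, weights, intercept):
    """Logistic model: P(relevant) = 1 / (1 + exp(-(b0 + sum(w_i * x_i))))."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def rank_b_terms(feature_table, weights, intercept):
    """Return (term, probability) pairs sorted by descending probability."""
    scored = [
        (term, relevance_probability(features, weights, intercept))
        for term, features in feature_table.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


def expected_relevant(scored):
    """Expected number of relevant terms: the sum of the probabilities."""
    return sum(prob for _, prob in scored)


if __name__ == "__main__":
    # Made-up titles standing in for the two article sets.
    set_a = ["Calcium channel blockers and vascular tone"]
    set_c = ["Vascular reactivity and calcium signaling in headache"]
    candidates = shared_terms(set_a, set_c, stopwords={"and", "in"})

    # Placeholder two-feature vectors and weights (hypothetical values).
    features = {term: [float(len(term)), 1.0] for term in candidates}
    ranking = rank_b_terms(features, weights=[0.1, -0.5], intercept=-0.2)

    for term, prob in ranking:
        print(f"{term}\t{prob:.3f}")
    print("expected relevant terms:", round(expected_relevant(ranking), 3))
```

In practice the weights would be fitted to judged B-terms such as the field testers' gold standards, and phrases as well as single words would be considered; the sketch only shows how probability, ranking, and expected count follow from one fitted model.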

Year:  2007        PMID: 17463015     DOI: 10.1093/bioinformatics/btm161

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  17 in total

1.  Frontiers of biomedical text mining: current progress (review).

Authors:  Pierre Zweigenbaum; Dina Demner-Fushman; Hong Yu; Kevin B Cohen
Journal:  Brief Bioinform       Date:  2007-10-30       Impact factor: 11.622

2.  Biomedical ontologies in action: role in knowledge management, data integration and decision support.

Authors:  O Bodenreider
Journal:  Yearb Med Inform       Date:  2008

3.  Extracting causal relations on HIV drug resistance from literature.

Authors:  Quoc-Chinh Bui; Breanndán O Nualláin; Charles A Boucher; Peter M A Sloot
Journal:  BMC Bioinformatics       Date:  2010-02-23       Impact factor: 3.169

4.  Gaps within the Biomedical Literature: Initial Characterization and Assessment of Strategies for Discovery.

Authors:  Yufang Peng; Gary Bonifield; Neil R Smalheiser
Journal:  Front Res Metr Anal       Date:  2017-05-22

5.  Rediscovering Don Swanson: the Past, Present and Future of Literature-Based Discovery.

Authors:  Neil R Smalheiser
Journal:  J Data Inf Sci       Date:  2017-12

6.  Arrowsmith two-node search interface: a tutorial on finding meaningful links between two disparate sets of articles in MEDLINE.

Authors:  Neil R Smalheiser; Vetle I Torvik; Wei Zhou
Journal:  Comput Methods Programs Biomed       Date:  2009-01-30       Impact factor: 5.428

7.  Author Name Disambiguation in MEDLINE.

Authors:  Vetle I Torvik; Neil R Smalheiser
Journal:  ACM Trans Knowl Discov Data       Date:  2009-07-01       Impact factor: 2.713

8.  MeSHHeading2vec: a new method for representing MeSH headings as vectors based on graph embedding algorithm.

Authors:  Zhen-Hao Guo; Zhu-Hong You; De-Shuang Huang; Hai-Cheng Yi; Kai Zheng; Zhan-Heng Chen; Yan-Bin Wang
Journal:  Brief Bioinform       Date:  2021-03-22       Impact factor: 11.622

9.  Discovering gene functional relationships using FAUN (Feature Annotation Using Nonnegative matrix factorization).

Authors:  Elina Tjioe; Michael W Berry; Ramin Homayouni
Journal:  BMC Bioinformatics       Date:  2010-10-07       Impact factor: 3.169

10.  Proceedings of the 2008 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference.

Authors:  Jonathan D Wren; Dawn Wilkins; James C Fuscoe; Susan Bridges; Stephen Winters-Hilt; Yuriy Gusev
Journal:  BMC Bioinformatics       Date:  2008-08-12       Impact factor: 3.169
