
Using classification models for the generation of disease-specific medications from biomedical literature and clinical data repository.

Liqin Wang, Peter J Haug, Guilherme Del Fiol.

Abstract

OBJECTIVE: Mining disease-specific associations from existing knowledge resources can be useful for building disease-specific ontologies and supporting knowledge-based applications. Many association mining techniques have been exploited. However, a challenge remains: the extracted associations often contain substantial noise. Determining the relevance of an association by setting arbitrary cut-off points on multiple relevance scores is unreliable, and asking human experts to manually review a large number of associations is expensive. We propose that machine-learning-based classification can separate the signal from the noise and provide a feasible approach to creating and maintaining disease-specific vocabularies.
METHOD: For simplicity, we initially focused on disease-medication associations. For a disease of interest, we extracted potentially treatment-related drug concepts from biomedical literature citations and from a local clinical data repository. Each concept was associated with multiple measures of relevance (i.e., features), such as frequency of occurrence. For the purpose of machine learning, we formed nine datasets for three diseases: each disease had two single-source datasets and one dataset combining the two. All datasets were labeled using existing reference standards. We then conducted two experiments: (1) to test whether adding features from the clinical data repository would improve classification performance over that achieved using features from the biomedical literature alone, and (2) to determine whether classifier(s) trained with known medication-disease datasets would generalize to new disease(s).
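The two experiments can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the feature counts (two literature features, one clinical feature), the helper names (`make_disease_dataset`, `split_xy`, `run_experiments`), and the plain gradient-descent logistic regression are hypothetical stand-ins, not the classifiers, features, or evaluation protocol used in the study.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logreg(X, y, epochs=300, lr=0.5):
    """Plain logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def accuracy(model, X, y):
    """Fraction of concepts whose predicted label matches the reference label."""
    w, b = model
    hits = 0
    for xi, yi in zip(X, y):
        pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else 0
        hits += (pred == yi)
    return hits / len(y)

def make_disease_dataset(n, rng):
    """Synthetic stand-in for one disease (assumption, not study data).

    Each row is (literature features, clinical features, label), where
    label 1 means the concept is a relevant medication for the disease.
    """
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        lit = [rng.gauss(1.0 * label, 1.0), rng.gauss(1.0 * label, 1.0)]
        cdr = [rng.gauss(1.5 * label, 1.0)]
        rows.append((lit, cdr, label))
    return rows

def split_xy(rows, use_clinical):
    # Literature-only features, or literature plus clinical features.
    X = [lit + cdr if use_clinical else lit for lit, cdr, _ in rows]
    y = [label for _, _, label in rows]
    return X, y

def run_experiments(seed=0):
    rng = random.Random(seed)
    known = make_disease_dataset(300, rng) + make_disease_dataset(300, rng)
    new_train = make_disease_dataset(200, rng)
    new_test = make_disease_dataset(200, rng)
    results = {}
    # Experiment 1: literature-only vs combined features on the same disease.
    for name, use_clinical in (("literature", False), ("combined", True)):
        X_tr, y_tr = split_xy(new_train, use_clinical)
        X_te, y_te = split_xy(new_test, use_clinical)
        results[name] = accuracy(train_logreg(X_tr, y_tr), X_te, y_te)
    # Experiment 2: train on known diseases, test on the new disease.
    # results["combined"] serves as the within-disease comparison point.
    X_kn, y_kn = split_xy(known, True)
    X_te, y_te = split_xy(new_test, True)
    results["cross_disease"] = accuracy(train_logreg(X_kn, y_kn), X_te, y_te)
    return results

if __name__ == "__main__":
    print(run_experiments(0))
```

Calling `run_experiments(0)` returns a dictionary of three accuracies. On real data, the comparison of interest would be `literature` against `combined` for the first experiment and `cross_disease` against `combined` for the second.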
RESULTS: Simple logistic regression and LogitBoost were identified as the preferred models for the biomedical-literature datasets and the combined datasets, respectively. Classification using combined features performed significantly better than classification using biomedical-literature features alone (p < 0.001). The classifier built from known diseases and used to predict associated concepts for new diseases performed no differently, to a statistically significant degree, from the classifier built and tested using the new disease's own dataset.
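The abstract reports a p-value but does not name the statistical test behind it. As one possible illustration only, the sketch below uses an exact two-sided McNemar test on paired predictions from two classifiers. The choice of test and the function name `mcnemar_exact` are assumptions, not the procedure used in the study.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts items only classifier A
    got right and c counts items only classifier B got right.
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        # The classifiers never disagree on correctness: no evidence of a difference.
        return b, c, 1.0
    k = min(b, c)
    # Under the null hypothesis, the discordant pairs follow Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)

if __name__ == "__main__":
    truth = [1, 0, 1, 1, 0, 1, 0, 0]
    combined_preds = [1, 0, 1, 1, 0, 1, 0, 1]
    literature_preds = [1, 1, 0, 1, 0, 0, 0, 1]
    print(mcnemar_exact(truth, combined_preds, literature_preds))
```

Only the discordant pairs (items that exactly one classifier gets right) enter the test, which is why it suits comparing two classifiers evaluated on the same labeled concepts.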
CONCLUSION: It is feasible to use classification approaches to automatically predict the relevance of a concept to a disease of interest. It is useful to combine features from disparate sources for the task of classification. Classifiers built from known diseases were generalizable to new diseases.
Copyright © 2017 Elsevier Inc. All rights reserved.

Keywords:  Biomedical literature; Classification; Clinical data repository; Disease-specific vocabulary; Machine learning; Ontology

Year: 2017; PMID: 28435015; PMCID: PMC5509335; DOI: 10.1016/j.jbi.2017.04.014

Source DB: PubMed; Journal: J Biomed Inform; ISSN: 1532-0464; Impact factor: 6.317


