Text mining for precision medicine: automating disease-mutation relationship extraction from biomedical literature.

Ayush Singhal, Michael Simmons, Zhiyong Lu.

Abstract

OBJECTIVE: Identifying disease-mutation relationships is a significant challenge in the advancement of precision medicine. The aim of this work is to design a tool that automates the extraction of disease-related mutations from biomedical text, in order to advance database curation in support of precision medicine.
MATERIALS AND METHODS: We developed a machine learning (ML)-based method to automatically identify mutations mentioned in the biomedical literature that are related to a particular disease. To predict a relationship between a mutation and the target disease, we constructed several types of features, including statistical, distance, and sentiment features. Our ML model was trained on a pre-labeled dataset of manually curated mutation-disease associations and was subsequently used to extract disease-related mutations from larger biomedical literature corpora.
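The feature construction described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not define the features precisely, so the tokenizer, the cue-word lists standing in for "sentiment" features (`POSITIVE_CUES`, `NEGATIVE_CUES`), and the names `PairFeatures` and `extract_features` are hypothetical. For one mutation-disease pair, it counts mutation mentions and co-occurring sentences (statistical), finds the shortest within-sentence token distance between the two mentions (distance), and counts cue words in co-occurring sentences (a crude stand-in for sentiment). A classifier would then be trained on vectors like these.

```python
import re
from dataclasses import dataclass

# Hypothetical cue lexicons standing in for "sentiment" features; the paper's
# actual lexicon and feature definitions are not given in the abstract.
POSITIVE_CUES = {"associated", "causes", "risk", "pathogenic", "linked"}
NEGATIVE_CUES = {"not", "no", "unrelated", "benign"}


@dataclass
class PairFeatures:
    mutation_count: int      # statistical: mentions of the mutation in the text
    cooccur_sentences: int   # statistical: sentences mentioning both entities
    min_token_distance: int  # distance: closest start-token gap (-1 if never co-occur)
    positive_cues: int       # cue words in co-occurring sentences
    negative_cues: int       # negation/benign cue words in co-occurring sentences


def tokenize(sentence: str) -> list[str]:
    """Lowercase alphanumeric tokens (so 'V600E' -> 'v600e')."""
    return re.findall(r"[A-Za-z0-9]+", sentence.lower())


def find_positions(tokens: list[str], phrase: list[str]) -> list[int]:
    """Start indices where the (possibly multi-token) phrase occurs."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def extract_features(text: str, mutation: str, disease: str) -> PairFeatures:
    """Build a small feature vector for one (mutation, disease) pair."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    mut_toks, dis_toks = tokenize(mutation), tokenize(disease)
    mutation_count = cooccur = pos = neg = 0
    best = None
    for sent in sentences:
        toks = tokenize(sent)
        m_pos = find_positions(toks, mut_toks)
        d_pos = find_positions(toks, dis_toks)
        mutation_count += len(m_pos)
        if m_pos and d_pos:
            cooccur += 1
            gap = min(abs(m - d) for m in m_pos for d in d_pos)
            best = gap if best is None else min(best, gap)
            pos += sum(t in POSITIVE_CUES for t in toks)
            neg += sum(t in NEGATIVE_CUES for t in toks)
    return PairFeatures(mutation_count, cooccur, -1 if best is None else best, pos, neg)


if __name__ == "__main__":
    text = ("The V600E mutation is associated with melanoma. "
            "V600E was not observed in prostate cancer.")
    print(extract_features(text, "V600E", "prostate cancer"))
    # PairFeatures(mutation_count=2, cooccur_sentences=1, min_token_distance=5,
    #              positive_cues=0, negative_cues=1)
```

Note that the same surface statistics (co-occurrence, proximity) appear for both example diseases; only the cue-word counts separate the positive "associated with melanoma" sentence from the negated "not observed in prostate cancer" one, which is the kind of signal sentiment-style features are meant to add.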
RESULTS: The performance of the proposed approach was assessed using a benchmarking dataset. Results show that our approach achieves a significant improvement over the previous state of the art, obtaining F-measures of 0.880 and 0.845 for prostate and breast cancer mutations, respectively.
DISCUSSION: To demonstrate its utility, we applied our approach to all abstracts in PubMed for 3 diseases (including a non-cancer disease). The extracted mutations were then manually validated against human-curated databases. The validation results show that the proposed approach is useful in a real-world setting for extracting uncurated disease mutations from the biomedical literature.
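The F-measures reported above combine precision and recall into a single score. A minimal sketch of how such a score can be computed, assuming extracted and gold-standard mutations are compared by exact string match (the benchmark's actual matching criteria are not stated in the abstract). The function name `precision_recall_f1` is hypothetical, and the mutation sets in the example are made up for illustration; they are not data or results from the paper.

```python
def precision_recall_f1(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Score a set of extracted mutations against a gold-standard set (exact match)."""
    tp = len(predicted & gold)   # extracted and present in the curated set
    fp = len(predicted - gold)   # extracted but not curated
    fn = len(gold - predicted)   # curated but missed by the extractor
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative sets only; not data from the paper.
    predicted = {"V600E", "R175H", "G12D"}
    gold = {"V600E", "R175H", "T790M", "L858R"}
    p, r, f = precision_recall_f1(predicted, gold)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.667 R=0.500 F1=0.571
```

Because F1 is a harmonic mean, it stays close to the lower of the two components, so a high score requires an extractor that is both precise and complete.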
CONCLUSIONS: The proposed approach improves on the state of the art for mutation-disease extraction from text. It is scalable and can be generalized to identify mutations for any disease at PubMed scale. Published by Oxford University Press on behalf of the American Medical Informatics Association 2016. This work is written by US Government employees and is in the public domain in the United States.

Keywords:  automated extraction; breast cancer; disease-mutation relationship; machine learning; precision medicine; prostate cancer; text mining

Year:  2016        PMID: 27121612      PMCID: PMC4926749          DOI: 10.1093/jamia/ocw041

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


  16 in total

Review 1.  Text Mining for Precision Medicine: Bringing Structure to EHRs and Biomedical Literature to Understand Genes and Health.

Authors:  Michael Simmons; Ayush Singhal; Zhiyong Lu
Journal:  Adv Exp Med Biol       Date:  2016       Impact factor: 2.622

2.  ResidueFinder: extracting individual residue mentions from protein literature.

Authors:  Ton E Becker; Eric Jakobsson
Journal:  J Biomed Semantics       Date:  2021-07-21

3.  Biomarker identification of hepatocellular carcinoma using a methodical literature mining strategy.

Authors:  Nai-Wen Chang; Hong-Jie Dai; Yung-Yu Shih; Chi-Yang Wu; Mira Anne C Dela Rosa; Rofeamor P Obena; Yu-Ju Chen; Wen-Lian Hsu; Yen-Jen Oyang
Journal:  Database (Oxford)       Date:  2017-01-01       Impact factor: 3.451

4.  A Deep Phenotype Association Study Reveals Specific Phenotype Associations with Genetic Variants in Age-related Macular Degeneration: Age-Related Eye Disease Study 2 (AREDS2) Report No. 14.

Authors:  Freekje van Asten; Michael Simmons; Ayush Singhal; Tiarnan D Keenan; Rinki Ratnapriya; Elvira Agrón; Traci E Clemons; Anand Swaroop; Zhiyong Lu; Emily Y Chew
Journal:  Ophthalmology       Date:  2017-10-31       Impact factor: 12.079

5.  Precision medicine informatics.

Authors:  Lewis J Frey; Elmer V Bernstam; Joshua C Denny
Journal:  J Am Med Inform Assoc       Date:  2016-06-06       Impact factor: 7.942

6.  Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine.

Authors:  Ayush Singhal; Michael Simmons; Zhiyong Lu
Journal:  PLoS Comput Biol       Date:  2016-11-30       Impact factor: 4.475

7.  Deep learning of mutation-gene-drug relations from the literature.

Authors:  Kyubum Lee; Byounggun Kim; Yonghwa Choi; Sunkyu Kim; Wonho Shin; Sunwon Lee; Sungjoon Park; Seongsoon Kim; Aik Choon Tan; Jaewoo Kang
Journal:  BMC Bioinformatics       Date:  2018-01-25       Impact factor: 3.169

8.  eGARD: Extracting associations between genomic anomalies and drug responses from text.

Authors:  A S M Ashique Mahmood; Shruti Rao; Peter McGarvey; Cathy Wu; Subha Madhavan; K Vijay-Shanker
Journal:  PLoS One       Date:  2017-12-20       Impact factor: 3.240

9.  MAGPEL: an autoMated pipeline for inferring vAriant-driven Gene PanEls from the full-length biomedical literature.

Authors:  Nafiseh Saberian; Adib Shafi; Azam Peyvandipour; Sorin Draghici
Journal:  Sci Rep       Date:  2020-07-23       Impact factor: 4.379

10.  Detecting Potential Adverse Drug Reactions Using a Deep Neural Network Model.

Authors:  Chi-Shiang Wang; Pei-Ju Lin; Ching-Lan Cheng; Shu-Hua Tai; Yea-Huei Kao Yang; Jung-Hsien Chiang
Journal:  J Med Internet Res       Date:  2019-02-06       Impact factor: 5.428

