A hybrid named entity tagger for tagging human proteins/genes.

Kalpana Raja, Suresh Subramani, Jeyakumar Natarajan.   

Abstract

Extracting gene and protein names from biomedical text is the first step in, and a prerequisite for, the analysis of scientific literature. Although many taggers are available for this Named Entity Recognition (NER) task, we found that none of them achieves state-of-the-art tagging for human genes/proteins. Because most current text mining research concerns literature on humans, a tagger that precisely tags human genes and proteins is highly desirable. In this paper, we propose a new hybrid approach based on (a) a machine learning algorithm (conditional random fields), (b) a set of manually constructed rules, and (c) a novel abbreviation identification algorithm to overcome the common errors that available taggers make when tagging human genes/proteins. Experimental results on the JNLPBA2004 corpus show that our domain-specific approach achieves a high precision of 80.47 and an F-score of 75.77, and outperforms most state-of-the-art systems. However, the recall of 71.60 remains low and leaves much room for improvement.
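The three reported figures can be cross-checked against each other. The sketch below assumes the F-score is the usual balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` is illustrative and not taken from the paper.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score, the weighted harmonic mean of precision and recall.

    beta = 1.0 gives the balanced F1 score (assumed here to be the
    "F-score" reported in the abstract).
    """
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract (percentages).
precision, recall = 80.47, 71.60
f1 = f_score(precision, recall)
print(f"F1 from reported precision and recall: {f1:.2f}")
```

Under that assumption the computed value agrees with the reported 75.77 to within 0.01; the small gap is plausibly due to the published precision and recall being rounded.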

Year:  2014        PMID: 25946866     DOI: 10.1504/ijdmb.2014.064545

Source DB:  PubMed          Journal:  Int J Data Min Bioinform        ISSN: 1748-5673            Impact factor:   0.667


  8 in total

1.  Text Mining and Machine Learning Protocol for Extracting Human-Related Protein Phosphorylation Information from PubMed.

Authors:  Krishnamurthy Arumugam; Raja Ravi Shanker
Journal:  Methods Mol Biol       Date:  2022

2.  A Text Mining Protocol for Mining Biological Pathways and Regulatory Networks from Biomedical Literature.

Authors:  Sabenabanu Abdulkadhar; Jeyakumar Natarajan
Journal:  Methods Mol Biol       Date:  2022

3.  A Text Mining and Machine Learning Protocol for Extracting Posttranslational Modifications of Proteins from PubMed: A Special Focus on Glycosylation, Acetylation, Methylation, Hydroxylation, and Ubiquitination.

Authors:  Krishnamurthy Arumugam; Malathi Sellappan; Dheepa Anand; Sadhanha Anand; Subhashini Vedagiri Radhakrishnan
Journal:  Methods Mol Biol       Date:  2022

4.  Machine learning workflow to enhance predictions of Adverse Drug Reactions (ADRs) through drug-gene interactions: application to drugs for cutaneous diseases.

Authors:  Kalpana Raja; Matthew Patrick; James T Elder; Lam C Tsoi
Journal:  Sci Rep       Date:  2017-06-16       Impact factor: 4.379

5.  BCC-NER: bidirectional, contextual clues named entity tagger for gene/protein mention recognition.

Authors:  Gurusamy Murugesan; Sabenabanu Abdulkadhar; Balu Bhasuran; Jeyakumar Natarajan
Journal:  EURASIP J Bioinform Syst Biol       Date:  2017-05-05

6.  A Review of Recent Advancement in Integrating Omics Data with Literature Mining towards Biomedical Discoveries. (Review)

Authors:  Kalpana Raja; Matthew Patrick; Yilin Gao; Desmond Madu; Yuyang Yang; Lam C Tsoi
Journal:  Int J Genomics       Date:  2017-02-26       Impact factor: 2.326

7.  Automated Extraction of Diagnostic Criteria From Electronic Health Records for Autism Spectrum Disorders: Development, Evaluation, and Application.

Authors:  Gondy Leroy; Yang Gu; Sydney Pettygrove; Maureen K Galindo; Ananyaa Arora; Margaret Kurzius-Spencer
Journal:  J Med Internet Res       Date:  2018-11-07       Impact factor: 5.428

8.  Biomolecular Relationships Discovered from Biological Labyrinth and Lost in Ocean of Literature: Community Efforts Can Rescue Until Automated Artificial Intelligence Takes Over.

Authors:  Rajinder Gupta; Shrikant S Mantri
Journal:  Front Genet       Date:  2016-03-31       Impact factor: 4.599
