
MetaboListem and TABoLiSTM: Two Deep Learning Algorithms for Metabolite Named Entity Recognition.

Cheng S Yeung, Tim Beck, Joram M Posma.

Abstract

Reviewing the metabolomics literature is becoming increasingly difficult because of the rapid expansion of relevant journal literature. Text-mining technologies are therefore needed to facilitate more efficient literature reviews. Here we contribute a standardised corpus of full-text publications from metabolomics studies and describe the development of two metabolite named entity recognition (NER) methods. These methods are based on Bidirectional Long Short-Term Memory (BiLSTM) networks, and each incorporates different transfer learning techniques (for tokenisation and word embedding). Our first model (MetaboListem) follows prior methodology using GloVe word embeddings. Our second model exploits BERT and BioBERT for embedding and is named TABoLiSTM (Transformer-Affixed BiLSTM). Both models are trained on a novel corpus annotated using rule-based methods and evaluated on manually annotated metabolomics articles. MetaboListem (F1-score 0.890, precision 0.892, recall 0.888) and TABoLiSTM (BioBERT version: F1-score 0.909, precision 0.926, recall 0.893) achieved state-of-the-art performance on metabolite NER. We created a training corpus of sentences from more than 1000 full-text Open Access metabolomics publications, containing 105,335 annotated metabolites, as well as a manually annotated test corpus (19,138 annotations). This work demonstrates that deep learning algorithms can identify metabolite names accurately and efficiently in text. The proposed corpus and NER algorithms can be used for metabolomics text-mining tasks such as information retrieval, document classification and literature-based discovery, and are available from the omicsNLP GitHub repository.
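The abstract reports entity-level precision, recall and F1 for metabolite NER. The sketch below shows one common way such scores are computed. It rests on several assumptions: tokens carry BIO tags, the metabolite label is a hypothetical `MET`, and a prediction counts only on an exact span match. The paper's own tagging scheme and matching criteria may differ. The sketch also checks that each reported F1-score agrees with the harmonic mean of the reported precision and recall.

```python
from typing import List, Optional, Set, Tuple

# An entity span: (start token index, end token index exclusive, entity label).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert BIO tags (e.g. "B-MET", "I-MET", "O") into a set of entity spans.

    An "I-" tag whose label differs from the currently open entity starts a new
    entity (a lenient convention; stricter decoders may discard it instead).
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None:
                spans.add((start, i, label))
            # Open a new entity on B-/I- tags, or close everything on "O".
            start, label = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match, entity-level precision, recall and F1."""
    tp = len(gold & pred)  # predicted spans that match a gold span exactly
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, harmonic_f1(precision, recall)


if __name__ == "__main__":
    # Hypothetical sentence: "Plasma glucose and uric acid increased"
    gold = bio_to_spans(["O", "B-MET", "O", "B-MET", "I-MET", "O"])
    pred = bio_to_spans(["O", "B-MET", "O", "B-MET", "O", "O"])  # "uric" only
    print(precision_recall_f1(gold, pred))  # (0.5, 0.5, 0.5)
    # Consistency check of the F1-scores reported in the abstract.
    print(round(harmonic_f1(0.892, 0.888), 3))  # MetaboListem: 0.89
    print(round(harmonic_f1(0.926, 0.893), 3))  # TABoLiSTM (BioBERT): 0.909
```

Under exact matching, the partially recognised "uric acid" counts as both a false positive and a false negative. This is why exact and partial (overlap) matching can yield noticeably different scores on multi-token metabolite names.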


Keywords:  deep learning; named entity recognition; natural language processing

Year:  2022        PMID: 35448463      PMCID: PMC9031427          DOI: 10.3390/metabo12040276

Source DB:  PubMed          Journal:  Metabolites        ISSN: 2218-1989
