Terminology spectrum analysis of natural-language chemical documents: term-like phrases retrieval routine.

Boris L Alperin¹, Andrey O Kuzmin², Ludmila Yu Ilina¹, Vladimir D Gusev³, Natalia V Salomatina³, Valentin N Parmon².

Abstract

BACKGROUND: This study seeks to develop, test and assess a methodology for automatically extracting a complete set of 'term-like phrases' and for building a terminology spectrum from a collection of natural-language PDF documents in the field of chemistry. A 'term-like phrase' is defined as one or more consecutive words and/or combinations of alphanumeric strings, with spelling left unchanged, that convey a specific scientific meaning. A terminology spectrum of a natural-language document is an indexed list of tagged entities, including recognized general scientific concepts, terms linked to existing thesauri, names of chemical substances and reactions, and term-like phrases. The retrieval routine is based on n-gram textual analysis followed by the sequential application of various 'accept' and 'reject' rules that take morphological and structural information into account.
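The retrieval routine is described above only at a high level: n-gram analysis followed by sequentially applied accept/reject rules. The Python sketch below illustrates that general idea under stated assumptions. The stop-word list, the two reject rules (`starts_or_ends_with_stopword`, `is_purely_numeric`) and the frequency-threshold accept rule (`min_freq`) are hypothetical stand-ins, not the authors' actual rules, and the morphological and structural checks mentioned in the abstract are omitted.

```python
import re
from collections import Counter

# Hypothetical stop-word list (assumption); the paper's own rule lists are not given here.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or", "with",
             "by", "to", "is", "are", "was", "were", "we", "this", "that"}

# Word or alphanumeric token, e.g. "Al2O3" or "CO-oxidation"; spelling is kept as-is.
TOKEN_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9\-]*")
# Sentence break: '.', ';', '!' or '?' followed by whitespace or end of text,
# so decimals such as "0.53" are not split.
SENTENCE_RE = re.compile(r"[.;!?](?:\s+|$)")


def ngrams(tokens, n):
    """Yield every window of n consecutive tokens."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


# Reject rules (hypothetical): each returns True if the candidate must be dropped.
def starts_or_ends_with_stopword(gram):
    return gram[0].lower() in STOPWORDS or gram[-1].lower() in STOPWORDS


def is_purely_numeric(gram):
    return all(tok.isdigit() for tok in gram)


REJECT_RULES = [starts_or_ends_with_stopword, is_purely_numeric]


def term_like_phrases(text, max_n=3, min_freq=2):
    """Return candidate phrases (1..max_n words) that survive all rules, with counts."""
    counts = Counter()
    for sentence in SENTENCE_RE.split(text):
        tokens = TOKEN_RE.findall(sentence)
        for n in range(1, max_n + 1):
            for gram in ngrams(tokens, n):
                # Rules are applied in sequence; the first match rejects the n-gram.
                if any(rule(gram) for rule in REJECT_RULES):
                    continue
                counts[" ".join(gram)] += 1
    # Accept rule (hypothetical): keep candidates seen at least min_freq times.
    return {phrase: c for phrase, c in counts.items() if c >= min_freq}


text = ("selective oxidation of methane. selective oxidation of ethane. "
        "the selective oxidation rate")
print(term_like_phrases(text))
# {'selective': 3, 'oxidation': 3, 'selective oxidation': 3}
```

Rejecting n-grams that begin or end with a function word is a common heuristic for phrase candidates; it keeps "selective oxidation" while discarding fragments such as "oxidation of". A fuller implementation would add part-of-speech and document-structure checks before the accept step.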
RESULTS: The retrieval process was assessed quantitatively using precision (P), recall (R) and F1-measure, which were calculated manually by professional chemists from a limited set of documents (the full set of text abstracts from 5 EuropaCat events was processed). The assessment demonstrated the effectiveness of the developed approach: term-like phrase parsing achieved a precision of P = 0.53, a recall of R = 0.71 and an F1-measure of F1 = 0.61.
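The F1-measure is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the reported figures are mutually consistent: 2 × 0.53 × 0.71 / (0.53 + 0.71) ≈ 0.61. The small Python helper below uses the standard definitions P = TP / (TP + FP) and R = TP / (TP + FN), where TP, FP and FN count correctly retrieved, wrongly retrieved and missed phrases; the function names are illustrative.

```python
def precision(tp, fp):
    """Share of retrieved phrases that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Share of correct phrases that were retrieved: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1(p, r):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Check the values reported in RESULTS.
print(round(f1(0.53, 0.71), 2))  # 0.61
```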
CONCLUSION: The paper proposes using such terminology spectra to perform various types of textual analysis across document collections. Terminology spectra of this kind may be successfully employed for text information retrieval, for reference database development, for analyzing research trends in a subject field, and for assessing the similarity between documents.
Graphical abstract: Terminology spectrum building process with term-like phrase retrieval.
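A terminology spectrum is defined above as an indexed list of tagged entities, and one proposed use is comparing documents. The Python sketch below assumes hypothetical tag names for the four entity kinds and measures similarity with the Jaccard index over distinct (tag, text) pairs. The Jaccard index is a swapped-in illustrative choice, not necessarily the measure the authors use.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical tag names for the entity kinds listed in the BACKGROUND.
TAGS = {"general_concept", "thesaurus_term", "chemical_entity", "term_like_phrase"}


@dataclass(frozen=True)
class TaggedEntity:
    text: str  # surface form, spelling unchanged
    tag: str   # one of TAGS


def build_spectrum(entities):
    """Index tagged entities by (tag, text) and count their occurrences."""
    for e in entities:
        if e.tag not in TAGS:
            raise ValueError(f"unknown tag: {e.tag}")
    return Counter((e.tag, e.text) for e in entities)


def jaccard_similarity(spec_a, spec_b):
    """Jaccard index over the sets of distinct (tag, text) keys."""
    a, b = set(spec_a), set(spec_b)
    if not a and not b:
        return 1.0  # two empty spectra are treated as identical
    return len(a & b) / len(a | b)


doc1 = [TaggedEntity("methane", "chemical_entity"),
        TaggedEntity("selective oxidation", "term_like_phrase"),
        TaggedEntity("catalyst", "general_concept")]
doc2 = [TaggedEntity("methane", "chemical_entity"),
        TaggedEntity("catalyst", "general_concept"),
        TaggedEntity("steam reforming", "term_like_phrase")]
print(jaccard_similarity(build_spectrum(doc1), build_spectrum(doc2)))  # 0.5
```

Keying on (tag, text) rather than text alone keeps, for example, a substance name and an identically spelled thesaurus term distinct; the occurrence counts kept in the `Counter` could instead feed a weighted measure such as cosine similarity.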

Keywords:  Natural language text analysis; Term-like phrases retrieval; Terminology spectrum; Text information retrieval; n-Gram analysis

Year:  2016        PMID: 27134681      PMCID: PMC4850643          DOI: 10.1186/s13321-016-0136-4

Source DB:  PubMed          Journal:  J Cheminform        ISSN: 1758-2946            Impact factor:   5.514
