
Named Entity Recognition and Normalization Applied to Large-Scale Information Extraction from the Materials Science Literature.

L Weston, V Tshitoyan, J Dagdelen, O Kononova, A Trewartha, K A Persson, G Ceder, A Jain.   

Abstract

The number of published materials science articles has increased manyfold over the past few decades. Now, a major bottleneck in the materials discovery pipeline arises in connecting new results with the previously established literature. A potential solution to this problem is to map the unstructured raw text of published articles onto structured database entries that allow for programmatic querying. To this end, we apply text mining with named entity recognition (NER) for large-scale information extraction from the published materials science literature. The NER model is trained to extract summary-level information from materials science documents, including inorganic material mentions, sample descriptors, phase labels, material properties and applications, as well as any synthesis and characterization methods used. Our classifier achieves an accuracy (f1) of 87%, and is applied to information extraction from 3.27 million materials science abstracts. We extract more than 80 million materials-science-related named entities, and the content of each abstract is represented as a database entry in a structured format. We demonstrate that simple database queries can be used to answer complex "meta-questions" of the published literature that would have previously required laborious, manual literature searches to answer. All of our data and functionality have been made freely available on our GitHub (https://github.com/materialsintelligence/matscholar) and website (http://matscholar.com), and we expect these results to accelerate the pace of future materials science discovery.
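To make the idea of answering a "meta-question" with a simple query concrete, here is a minimal sketch in Python. It is not the authors' implementation or the matscholar API: the `AbstractEntry` class, its field names, the `materials_for_application` helper, and the three toy entries are all hypothetical, and only the entity categories are taken from the abstract.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AbstractEntry:
    """One abstract as a structured record (hypothetical schema).

    The fields mirror some of the entity categories listed in the abstract;
    the names themselves are assumptions made for illustration.
    """
    doc_id: str
    materials: list = field(default_factory=list)
    properties: list = field(default_factory=list)
    applications: list = field(default_factory=list)
    synthesis_methods: list = field(default_factory=list)
    characterization_methods: list = field(default_factory=list)


# Made-up toy records standing in for NER output; not real extracted data.
ENTRIES = [
    AbstractEntry("doc-1", materials=["TiO2"],
                  applications=["photocatalysis"],
                  synthesis_methods=["sol-gel"]),
    AbstractEntry("doc-2", materials=["TiO2", "ZnO"],
                  applications=["photocatalysis"]),
    AbstractEntry("doc-3", materials=["LiFePO4"],
                  applications=["cathode"]),
]


def materials_for_application(entries, application):
    """Count, per material, how many entries mention it with `application`."""
    counts = Counter()
    for entry in entries:
        if application in entry.applications:
            # set() so a material is counted at most once per abstract
            counts.update(set(entry.materials))
    return counts.most_common()


print(materials_for_application(ENTRIES, "photocatalysis"))
```

With the toy data above, the query ranks TiO2 (two abstracts) ahead of ZnO (one abstract). The same pattern, applied to real extracted entities, is the kind of lookup that would otherwise require a manual literature search.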

Year:  2019        PMID: 31361962     DOI: 10.1021/acs.jcim.9b00470

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


  12 in total

1.  Text-mined dataset of gold nanoparticle synthesis procedures, morphologies, and size entities.

Authors:  Kevin Cruse; Amalie Trewartha; Sanghoon Lee; Zheren Wang; Haoyan Huo; Tanjin He; Olga Kononova; Anubhav Jain; Gerbrand Ceder
Journal:  Sci Data       Date:  2022-05-26       Impact factor: 8.501

2.  Dataset of solution-based inorganic materials synthesis procedures extracted from the scientific literature.

Authors:  Zheren Wang; Olga Kononova; Kevin Cruse; Tanjin He; Haoyan Huo; Yuxing Fei; Yan Zeng; Yingzhi Sun; Zijian Cai; Wenhao Sun; Gerbrand Ceder
Journal:  Sci Data       Date:  2022-05-25       Impact factor: 8.501

3.  Automated knowledge extraction from polymer literature using natural language processing.

Authors:  Pranav Shetty; Rampi Ramprasad
Journal:  iScience       Date:  2020-12-10

Review 4.  Opportunities and challenges of text mining in materials research.

Authors:  Olga Kononova; Tanjin He; Haoyan Huo; Amalie Trewartha; Elsa A Olivetti; Gerbrand Ceder
Journal:  iScience       Date:  2021-02-06

5.  Intelligent transportation systems (ITS): A systematic review using a Natural Language Processing (NLP) approach.

Authors:  Tsarina Dwi Putri
Journal:  Heliyon       Date:  2021-12-16

6.  Single Model for Organic and Inorganic Chemical Named Entity Recognition in ChemDataExtractor.

Authors:  Taketomo Isazawa; Jacqueline M Cole
Journal:  J Chem Inf Model       Date:  2022-02-24       Impact factor: 6.162

7.  Quantifying the advantage of domain-specific pre-training on named entity recognition tasks in materials science.

Authors:  Amalie Trewartha; Nicholas Walker; Haoyan Huo; Sanghoon Lee; Kevin Cruse; John Dagdelen; Alexander Dunn; Kristin A Persson; Gerbrand Ceder; Anubhav Jain
Journal:  Patterns (N Y)       Date:  2022-04-08

Review 8.  Progress and prospects for accelerating materials science with automated and autonomous workflows.

Authors:  Helge S Stein; John M Gregoire
Journal:  Chem Sci       Date:  2019-09-20       Impact factor: 9.825

Review 9.  Can we predict materials that can be synthesised?

Authors:  Filip T Szczypiński; Steven Bennett; Kim E Jelfs
Journal:  Chem Sci       Date:  2020-12-09       Impact factor: 9.825

10.  MOFSimplify, machine learning models with extracted stability data of three thousand metal-organic frameworks.

Authors:  Aditya Nandy; Gianmarco Terrones; Naveen Arunachalam; Chenru Duan; David W Kastner; Heather J Kulik
Journal:  Sci Data       Date:  2022-03-11       Impact factor: 6.444
