Unsupervised word embeddings capture latent knowledge from materials science literature.

Vahe Tshitoyan, John Dagdelen, Leigh Weston, Alexander Dunn, Ziqin Rong, Olga Kononova, Kristin A Persson, Gerbrand Ceder, Anubhav Jain.

Abstract

The overwhelming majority of scientific knowledge is published as text, which is difficult to analyse by either traditional statistical analysis or modern machine learning methods. By contrast, the main source of machine-interpretable data for the materials research community has come from structured property databases [1,2], which encompass only a small fraction of the knowledge present in the research literature. Beyond property values, publications contain valuable knowledge regarding the connections and relationships between data items as interpreted by the authors. To improve the identification and use of this knowledge, several studies have focused on the retrieval of information from scientific literature using supervised natural language processing [3-10], which requires large hand-labelled datasets for training. Here we show that materials science knowledge present in the published literature can be efficiently encoded as information-dense word embeddings [11-13] (vector representations of words) without human labelling or supervision. Without any explicit insertion of chemical knowledge, these embeddings capture complex materials science concepts such as the underlying structure of the periodic table and structure-property relationships in materials. Furthermore, we demonstrate that an unsupervised method can recommend materials for functional applications several years before their discovery. This suggests that latent knowledge regarding future discoveries is to a large extent embedded in past publications. Our findings highlight the possibility of extracting knowledge and relationships from the massive body of scientific literature in a collective manner, and point towards a generalized approach to the mining of scientific literature.
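
As a rough illustration of how word embeddings could be used to rank candidate materials for an application, the sketch below scores candidates by cosine similarity to an application keyword. It is a minimal sketch under stated assumptions, not the authors' implementation: the 3-dimensional vectors are invented toy values, the keyword "thermoelectric" and the candidate list are chosen only for illustration, and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical.

```python
import math

# Hypothetical toy embeddings. The numbers are invented for illustration and
# are NOT taken from the paper or from any trained model.
TOY_EMBEDDINGS = {
    "thermoelectric": [0.9, 0.1, 0.2],
    "Bi2Te3": [0.8, 0.2, 0.1],
    "PbTe": [0.7, 0.3, 0.2],
    "NaCl": [0.1, 0.9, 0.3],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, candidates, embeddings):
    """Sort candidate words by cosine similarity to the query word, highest first."""
    query_vec = embeddings[query]
    scored = [(name, cosine_similarity(query_vec, embeddings[name]))
              for name in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_similarity(
    "thermoelectric", ["NaCl", "PbTe", "Bi2Te3"], TOY_EMBEDDINGS
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With real embeddings trained on a text corpus, the same ranking step would apply unchanged; only the lookup table of vectors would differ.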

Year:  2019        PMID: 31270483     DOI: 10.1038/s41586-019-1335-8

Source DB:  PubMed          Journal:  Nature        ISSN: 0028-0836            Impact factor:   49.962


  59 in total

Review 1.  Big-Data Science in Porous Materials: Materials Genomics and Machine Learning.

Authors:  Kevin Maik Jablonka; Daniele Ongari; Seyed Mohamad Moosavi; Berend Smit
Journal:  Chem Rev       Date:  2020-06-10       Impact factor: 60.622

2.  Cultural influences on word meanings revealed through large-scale semantic alignment.

Authors:  Bill Thompson; Seán G Roberts; Gary Lupyan
Journal:  Nat Hum Behav       Date:  2020-08-10

3.  Improving data access democratizes and diversifies science.

Authors:  Abhishek Nagaraj; Esther Shears; Mathijs de Vaan
Journal:  Proc Natl Acad Sci U S A       Date:  2020-09-08       Impact factor: 11.205

4.  Predicting research trends with semantic and neural networks with an application in quantum physics.

Authors:  Mario Krenn; Anton Zeilinger
Journal:  Proc Natl Acad Sci U S A       Date:  2020-01-14       Impact factor: 11.205

Review 5.  Artificial Intelligence Applied to Battery Research: Hype or Reality?

Authors:  Teo Lombardo; Marc Duquesnoy; Hassna El-Bouysidy; Fabian Årén; Alfonso Gallo-Bueno; Peter Bjørn Jørgensen; Arghya Bhowmik; Arnaud Demortière; Elixabete Ayerbe; Francisco Alcaide; Marine Reynaud; Javier Carrasco; Alexis Grimaud; Chao Zhang; Tejs Vegge; Patrik Johansson; Alejandro A Franco
Journal:  Chem Rev       Date:  2021-09-16       Impact factor: 72.087

6.  Perovskite- and Dye-Sensitized Solar-Cell Device Databases Auto-generated Using ChemDataExtractor.

Authors:  Edward J Beard; Jacqueline M Cole
Journal:  Sci Data       Date:  2022-06-17       Impact factor: 8.501

7.  Unsupervised acquisition of idiomatic units of symbolic natural language: An n-gram frequency-based approach for the chunking of news articles and tweets.

Authors:  Dario Borrelli; Gabriela Gongora Svartzman; Carlo Lipizzi
Journal:  PLoS One       Date:  2020-06-08       Impact factor: 3.240

8.  Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems.

Authors:  John A Keith; Valentin Vassilev-Galindo; Bingqing Cheng; Stefan Chmiela; Michael Gastegger; Klaus-Robert Müller; Alexandre Tkatchenko
Journal:  Chem Rev       Date:  2021-07-07       Impact factor: 60.622

9.  Large-scale literature mining to assess the relation between anti-cancer drugs and cancer types.

Authors:  Chris Bauer; Ralf Herwig; Matthias Lienhard; Paul Prasse; Tobias Scheffer; Johannes Schuchhardt
Journal:  J Transl Med       Date:  2021-06-26       Impact factor: 5.531

10.  Measuring novelty in science with word embedding.

Authors:  Sotaro Shibayama; Deyun Yin; Kuniko Matsumoto
Journal:  PLoS One       Date:  2021-07-02       Impact factor: 3.240
