Edward Kim1, Zach Jensen1, Alexander van Grootel1, Kevin Huang1, Matthew Staib2, Sheshera Mysore3, Haw-Shiuan Chang3, Emma Strubell3, Andrew McCallum3, Stefanie Jegelka2, Elsa Olivetti1.
Abstract
Leveraging new data sources is a key step in accelerating the pace of materials design and discovery. To complement the strides in synthesis planning driven by historical, experimental, and computed data, we present an automated, unsupervised method for connecting scientific literature to inorganic synthesis insights. Starting from the natural language text, we apply word embeddings from language models, which are fed into a named entity recognition model, upon which a conditional variational autoencoder is trained to generate syntheses for any inorganic materials of interest. We show the potential of this technique by predicting precursors for two perovskite materials, using only training data published over a decade prior to their first reported syntheses. We demonstrate that the model learns representations of materials corresponding to synthesis-related properties and that the model's behavior complements the existing thermodynamic knowledge. Finally, we apply the model to perform synthesizability screening for proposed novel perovskite compounds.
Year: 2020
PMID: 31909619
DOI: 10.1021/acs.jcim.9b00995
Source DB: PubMed
Journal: J Chem Inf Model
ISSN: 1549-9596
Impact factor: 4.956