Quan Zhou, Peizhe Tang, Shenxiu Liu, Jinbo Pan, Qimin Yan, Shou-Cheng Zhang.
Abstract
Exciting advances have been made in artificial intelligence (AI) during recent decades. Among them, applications of machine learning (ML) and deep learning techniques have brought human-competitive performance to various tasks in fields including image recognition, speech recognition, and natural language understanding. Even in Go, the ancient game of profound complexity, AI players have already beaten human world champions convincingly, both with and without learning from humans. In this work, we show that our unsupervised machines (Atom2Vec) can learn the basic properties of atoms by themselves from an extensive database of known compounds and materials. These learned properties are represented as high-dimensional vectors, and clustering the atoms in this vector space classifies them into meaningful groups consistent with human knowledge. We use the atom vectors as basic input units for neural networks and other ML models designed and trained to predict materials properties, and these models achieve significant accuracy.
Keywords: atomism; machine learning; materials discovery
Year: 2018 PMID: 29946023 PMCID: PMC6048531 DOI: 10.1073/pnas.1801181115
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Atom2Vec workflow to learn atoms from the materials database. Atom–environment pairs are generated for every compound in the materials database, and the atom–environment matrix is constructed from these pairs. A small dataset of seven compounds is used here as an example. Entries of the atom–environment matrix denote the numbers of atom–environment pairs. Inset shows the unit cell of an example compound and the pair corresponding to one matrix entry, formed by a target atom and its environment. Only compositional information is considered; structural information is ignored. Atom2Vec learning algorithms extract knowledge of atoms from the atom–environment matrix and encode the learned properties in atom vectors.
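To make the matrix construction concrete, here is a minimal sketch that counts atom–environment pairs from compositions alone. It rests on several assumptions that are not taken from the caption. Compounds are given as plain element-count dictionaries. The "environment" of a target atom is taken to be the composition left after removing one copy of that atom. Each distinct element in a compound contributes one pair. The helper names `atom_environment_pairs` and `build_matrix` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterator, List, Tuple

Composition = Dict[str, int]               # e.g. {"Bi": 2, "Se": 3}
Environment = Tuple[Tuple[str, int], ...]  # sorted (element, count) pairs


def atom_environment_pairs(comp: Composition) -> Iterator[Tuple[str, Environment]]:
    """Yield one (target atom, environment) pair per element of a compound.

    Assumption: the environment is the composition left after removing one
    copy of the target atom; crystal structure is ignored entirely.
    """
    for atom in comp:
        rest = dict(comp)
        rest[atom] -= 1
        env = tuple(sorted((el, n) for el, n in rest.items() if n > 0))
        yield atom, env


def build_matrix(compounds: List[Composition]):
    """Return (atoms, environments, matrix), where matrix[i][j] counts how
    often atom i appears with environment j across all compounds."""
    counts: Counter = Counter()
    for comp in compounds:
        counts.update(atom_environment_pairs(comp))  # counts each pair tuple
    atoms = sorted({atom for atom, _ in counts})
    envs = sorted({env for _, env in counts})
    matrix = [[counts[(a, e)] for e in envs] for a in atoms]
    return atoms, envs, matrix


if __name__ == "__main__":
    data = [{"Bi": 2, "Se": 3}, {"Bi": 2, "Te": 3}, {"Sb": 2, "Se": 3}]
    atoms, envs, matrix = build_matrix(data)
    for atom, row in zip(atoms, matrix):
        print(atom, row)
```

Under this convention, the Se row of the toy data has nonzero entries in the environments (Bi 2, Se 2) and (Sb 2, Se 2). Atoms that share many environments therefore end up with similar rows. That similarity is the signal the learning step can exploit.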
Fig. 2. Atom vectors of main-group elements learned by the model-free approach. (A) Illustration of the atom vectors of 34 main-group elements and their hierarchical clustering based on a distance metric. Rows and columns denote atom types and dimension indexes, respectively; the color of each cell stands for the value of the vector in that dimension. Background colors of atom symbols label their columns in the periodic table. Dashed red boxes circle major atom clusters from hierarchical clustering. Red arrows point to dimensions distinguishing different types of atoms. (B) Projection of the atom vectors of 34 main-group elements onto the plane spanned by the first and second principal axes (PC1 and PC2). The percentage in parentheses gives the proportion of variance along that principal axis. Inset shows the periodic table of elements for reference. Dashed red circles show two distinctive clusters corresponding to two types of active metals. (C) Projection of the atom vectors of 34 main-group elements onto the plane spanned by the third and fourth principal axes (PC3 and PC4). The percentage in parentheses gives the proportion of variance along that principal axis. The dashed red arrow indicates a valence trend almost parallel to PC3.
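The hierarchical clustering in panel A groups atoms whose vectors lie close together. The sketch below shows one way to do this. It makes two assumptions: it uses cosine distance and average linkage as stand-ins, because the caption's exact metric is not reproduced here. The toy vectors are hypothetical and are not learned Atom2Vec vectors. The function names are illustrative.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(vectors: Dict[str, Sequence[float]], n_clusters: int) -> List[List[str]]:
    """Agglomerative clustering: repeatedly merge the two clusters whose mean
    pairwise cosine distance is smallest, until n_clusters remain.
    Distances are recomputed on every pass, which is fine for ~34 atoms."""
    clusters = [[name] for name in vectors]

    def linkage(c1: List[str], c2: List[str]) -> float:
        total = sum(cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


if __name__ == "__main__":
    # Hypothetical 3-dimensional vectors, chosen only to illustrate the merge order.
    toy = {"Li": [1.0, 0.1, 0.0], "Na": [0.9, 0.2, 0.0],
           "F": [0.0, 0.1, 1.0], "Cl": [0.1, 0.0, 0.9]}
    print(average_linkage(toy, n_clusters=2))
```

Stopping at a fixed number of clusters is a simplification. A full dendrogram, as drawn in the figure, would record every merge and its distance instead.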
Fig. 3. Evaluation of atom vectors of main-group elements on elpasolite formation energy prediction. (A) Crystal structure of elpasolites ABC2D6. (B) Architecture of the one-hidden-layer neural network for formation energy prediction. Colored boxes represent the atom vectors of atoms A, B, C, and D, respectively, and the gray box in the hidden layer is the representation of the elpasolite compound. (C) Trained weights on connections between the input layer and the hidden layer of the neural network for formation energy prediction using model-free atom vectors. (D) Mean absolute test errors of formation energy prediction using different sets of atom features. Empirical features refer to the position of an atom in the periodic table, padded with random noise in the expanded dimensions if necessary. Model-based features are atom vectors learned by our model-based method using an inverse-square score function. Model-free features are atom vectors learned by our model-free method. Error bars show the SDs of mean absolute prediction errors over five different random train/test/validation splits. (E) Comparison of exact and predicted formation energies using model-free atom vectors. Inset shows the distribution of prediction errors.
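Panel B's architecture can be sketched as a forward pass. The four site vectors are concatenated, passed through one hidden layer that acts as the compound representation, and mapped to a single formation-energy output. The sketch also includes the MAE metric and the mean and SD over splits that the error bars summarize. Several details are assumptions and are not given in the caption: the tanh activation, the hidden-layer size, the weight initialization, and the use of the sample SD. Training is omitted, and all names are hypothetical.

```python
import math
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def init_params(dim: int, hidden: int, seed: int = 0):
    """Random weights for a (4*dim) -> hidden -> 1 network (hypothetical sizes)."""
    rng = random.Random(seed)
    W1 = [[rng.gauss(0.0, 0.1) for _ in range(4 * dim)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
    b2 = 0.0
    return W1, b1, w2, b2


def predict(atom_vecs: Dict[str, List[float]], elements: Sequence[str], params) -> float:
    """Forward pass for one elpasolite: concatenate the A, B, C, D atom vectors,
    apply one tanh hidden layer, then a linear output (formation energy)."""
    if len(elements) != 4:
        raise ValueError("expected one element for each of the sites A, B, C, D")
    W1, b1, w2, b2 = params
    x = [value for el in elements for value in atom_vecs[el]]
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def mean_absolute_error(pred: Sequence[float], true: Sequence[float]) -> float:
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def summarize(split_errors: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample SD of per-split MAEs (what an error bar would show)."""
    return statistics.mean(split_errors), statistics.stdev(split_errors)
```

Because the four site vectors are concatenated in a fixed order, the network can assign different weights to the same element at different sites. Panel C visualizes the learned input-to-hidden weights of such a layer.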
Fig. 4. Atom vectors of functional groups and of elements beyond the main groups learned by the model-free approach, and their evaluation on tasks involving half-Heusler compounds. (A) Illustration of the atom vectors of non–main-group elements and functional groups and their hierarchical clustering based on a distance metric. Rows and columns denote atom types and dimension indexes, respectively; the color of each cell stands for the value of the vector in that dimension. Dashed red boxes circle major clusters in hierarchical clustering. (B) Crystal structure of half-Heusler alloys. (C) Mean absolute test errors of formation energy prediction given by ridge regression using different sets of atom features. (D) Mean classification error rates of metal/insulator classification given by the 18-electron rule and by logistic regression with model-based and model-free atom vectors.
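The 18-electron rule in panel D is the physics baseline against which the learned atom vectors are compared. A minimal sketch follows. It assumes the rule is applied as "insulator if and only if the formula unit has exactly 18 valence electrons". Some formulations also treat 8-electron compounds as semiconducting, and the `magic` parameter allows that variant. The valence table is a small illustrative subset: s + d electrons are counted for transition metals and s + p electrons for main-group elements. The function names are hypothetical.

```python
from typing import Dict, Sequence

# Illustrative valence-electron counts (s + d for transition metals,
# s + p for main-group elements); only a handful of elements are listed.
VALENCE: Dict[str, int] = {
    "Ti": 4, "Zr": 4, "Hf": 4, "Mn": 7, "Co": 9, "Ni": 10, "Sn": 4, "Sb": 5,
}


def eighteen_electron_rule(elements: Sequence[str], magic: int = 18) -> str:
    """Baseline classifier: call a half-Heusler compound an insulator when its
    three atoms contribute exactly `magic` valence electrons, else a metal."""
    count = sum(VALENCE[el] for el in elements)
    return "insulator" if count == magic else "metal"


def error_rate(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of misclassified compounds."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)


if __name__ == "__main__":
    for formula in [("Ti", "Ni", "Sn"), ("Ti", "Co", "Sb"), ("Mn", "Ni", "Sb")]:
        print("".join(formula), eighteen_electron_rule(formula))
```

For example, TiNiSn counts 4 + 10 + 4 = 18 electrons and is called an insulator, while MnNiSb counts 7 + 10 + 5 = 22 and is called a metal. A learned classifier such as the logistic regression in panel D is scored with the same error-rate metric.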