Tongyu Liu¹, Katherine R Johnson², Santa Jansone-Popova², De-En Jiang¹.
Abstract
Constituting the bulk of rare-earth elements, lanthanides need to be separated to fully realize their potential as critical materials in many important technologies. The discovery of new ligands for improving rare-earth separations by solvent extraction, the most practical rare-earth separation process, is still largely based on trial and error, a low-throughput and inefficient approach. A predictive model that allows high-throughput screening of ligands is needed to identify suitable ligands to achieve enhanced separation performance. Here, we show that deep neural networks, trained on the available experimental data, can be used to predict accurate distribution coefficients for solvent extraction of lanthanide ions, thereby opening the door to high-throughput screening of ligands for rare-earth separations. One innovative approach that we employed is a combined representation of ligands with both molecular physicochemical descriptors and atomic extended-connectivity fingerprints, which greatly boosts the accuracy of the trained model. More importantly, we synthesized four new ligands and found that the predicted distribution coefficients from our trained machine-learning model match well with the measured values. Therefore, our machine-learning approach paves the way for accelerating the discovery of new ligands for rare-earth separations.
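For readers less familiar with the quantity being predicted, the following is a minimal sketch of how a distribution coefficient D, its logarithm log D, and a separation factor between two ions are conventionally computed in solvent extraction. It uses the standard definitions (D as the ratio of the metal concentration in the organic phase to that in the aqueous phase at equilibrium, and the separation factor as a ratio of two D values). The function names and the concentrations are hypothetical and are not taken from the paper.

```python
import math


def distribution_coefficient(conc_organic, conc_aqueous):
    """D = [Ln]_organic / [Ln]_aqueous at equilibrium (same units for both)."""
    if conc_organic <= 0 or conc_aqueous <= 0:
        raise ValueError("concentrations must be positive")
    return conc_organic / conc_aqueous


def log_d(conc_organic, conc_aqueous):
    """Base-10 logarithm of the distribution coefficient."""
    return math.log10(distribution_coefficient(conc_organic, conc_aqueous))


def separation_factor(d_a, d_b):
    """Ratio of two distribution coefficients measured under the same conditions."""
    return d_a / d_b


# Hypothetical equilibrium concentrations for two ions, "a" and "b"
# (illustrative numbers only, not data from the paper).
d_a = distribution_coefficient(conc_organic=0.8, conc_aqueous=0.2)
d_b = distribution_coefficient(conc_organic=0.1, conc_aqueous=0.9)

print("log D (ion a):", log_d(0.8, 0.2))
print("log D (ion b):", log_d(0.1, 0.9))
print("separation factor a/b:", separation_factor(d_a, d_b))
```

A log D above zero means the ion favors the organic phase, and a separation factor far from 1 means the ligand discriminates well between the two ions.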
Year: 2022; PMID: 35783179; PMCID: PMC9241157; DOI: 10.1021/jacsau.2c00122
Source DB: PubMed; Journal: JACS Au; ISSN: 2691-3704
Figure 1. Distribution of the total data set of 1202 experimental log D values: (a) based on Ln(III), excluding radioactive Pm(III); (b) the value range. (c) Chemical structures of some representative ligands in the data set. (d) Workflow of predicting log D of Ln(III) extracted by a ligand via fully connected neural networks.
Figure 2. Comparing the three different approaches, RDKit, ECFP, or ECFP + RDKit, to represent ligands, based on the validation set performances of the trained FCNN for predicting log D against the experiment in the first 5000 epochs: (a) coefficient of determination, R², between the predicted log D and experimental log D values; (b) root-mean-square error, RMSE, between the predicted log D and experimental log D values (also measured against the standard deviation, σ, of experimental log D values of the training set, right axis). FCNN hyperparameters: 0.00001 learning rate, PReLU activation functions, 0.01 weight decay, three hidden layers, and the number of neurons on each layer = 512, 128, and 16.
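To make the architecture in this caption concrete, here is a dependency-free sketch of a forward pass through a fully connected network with three hidden layers of 512, 128, and 16 neurons and PReLU activations, fed with a combined "ECFP + RDKit"-style input formed by concatenating fingerprint bits with descriptor values. Several things are assumptions and not the authors' implementation: the tiny feature vectors, the weight initialization, the fixed PReLU slope of 0.25 (the slope is normally a learned parameter), and the single linear output neuron. The learning rate and weight decay from the caption are recorded only as constants, since no training loop is shown; in practice one would use a deep-learning framework.

```python
import math
import random

HIDDEN_SIZES = [512, 128, 16]  # neurons per hidden layer, from the caption
LEARNING_RATE = 1e-5           # from the caption; would be used by an optimizer (not shown)
WEIGHT_DECAY = 0.01            # from the caption; would be used by an optimizer (not shown)


def prelu(x, alpha):
    """PReLU: identity for x >= 0, slope alpha for x < 0."""
    return x if x >= 0.0 else alpha * x


def init_layer(n_in, n_out, rng):
    """Random weights in [-1/sqrt(n_in), 1/sqrt(n_in)] and zero biases (an assumption)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def build_fcnn(n_inputs, rng):
    """Stack input -> 512 -> 128 -> 16 -> 1 (single log D output, assumed)."""
    sizes = [n_inputs] + HIDDEN_SIZES + [1]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def forward(layers, features, alpha=0.25):
    """Apply each layer; PReLU on hidden layers, no activation on the output."""
    activations = features
    last = len(layers) - 1
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        activations = z if i == last else [prelu(v, alpha) for v in z]
    return activations[0]


def count_parameters(layers):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


# Hypothetical, drastically shortened inputs: real fingerprints and descriptor
# sets are much longer. Concatenation gives the combined representation.
ecfp_bits = [1, 0, 0, 1, 0, 1, 0, 0]
descriptors = [0.35, -1.2, 0.8]
features = [float(b) for b in ecfp_bits] + descriptors

rng = random.Random(0)
model = build_fcnn(len(features), rng)
predicted_log_d = forward(model, features)  # untrained, so the value is meaningless
print("parameters:", count_parameters(model))
print("untrained output:", predicted_log_d)
```

Switching between the three representations compared in the figure then amounts to changing what goes into `features`: descriptors only, fingerprint bits only, or both concatenated.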
Figure 3. Performance of the best FCNN model. The parity plot between the predicted and experimental log D values: (a) training set and (b) validation set.
Figure 4. Predictions on new ligands. (a) Chemical structures of new ligands 1–4 synthesized for Ln(III) extractions. (b) R² and MAE values of predicted log D for new ligands 1–4 in comparison with the measured values. (c) Parity plots between the predicted and experimental log D for ligands 1–4; there are 14 data points for each ligand, representing 14 Ln(III)s extracted at the same conditions.
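The error measures named in the captions (R², RMSE, RMSE relative to σ, and MAE) can be computed as in the sketch below. Two choices here are assumptions, since the captions do not specify them: R² is taken as 1 − SS_res/SS_tot (not the squared Pearson correlation), and σ is the population standard deviation. The log D values are hypothetical and are not data from the paper.

```python
import math
import statistics


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse_over_sigma(y_true, y_pred, reference_values):
    """RMSE scaled by the standard deviation of a reference set of log D values.

    The Figure 2 caption uses the training set as the reference; population
    standard deviation is an assumption.
    """
    return rmse(y_true, y_pred) / statistics.pstdev(reference_values)


# Hypothetical measured and predicted log D values (illustrative only).
measured = [0.0, 1.0, 2.0, 3.0]
predicted = [0.5, 1.0, 2.0, 2.5]

print("R2:", r2_score(measured, predicted))
print("RMSE:", rmse(measured, predicted))
print("MAE:", mae(measured, predicted))
print("RMSE/sigma:", rmse_over_sigma(measured, predicted, measured))
```

An RMSE/σ well below 1 indicates that the model does substantially better than always predicting the mean log D of the reference set.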