Determining usefulness of machine learning in materials discovery using simulated research landscapes.

Marcos Del Cueto, Alessandro Troisi.

Abstract

When existing experimental data are combined with machine learning (ML) to predict the performance of new materials, the data acquisition bias determines both the usefulness of ML and the prediction accuracy. In this context, the following two conditions are very common: (i) constructing new unbiased datasets is too expensive, and the global knowledge effectively does not change when a limited number of novel measurements are performed; (ii) the performance of the material depends on a limited number of physical parameters, far fewer than the number of variables that can be changed, although these parameters are unknown or not measurable. To determine the usefulness of ML under these conditions, we introduce the concept of simulated research landscapes, which describe how datasets of arbitrary complexity evolve over time. Simulated research landscapes allow us to use different discovery strategies to compare standard materials exploration with ML-guided exploration, i.e. we can measure quantitatively the benefit of using a specific ML model. We show that there is a window of opportunity to obtain a significant benefit from ML-guided strategies. ML can be adopted too soon (when there is not enough information to find patterns) or too late (when dense datasets allow only a negligible ML benefit), and in some cases adopting ML can even slow down the discovery process. We offer a qualitative guide on when ML can accelerate the discovery of new best-performing materials in a field under specific conditions. The answer in each case depends on factors such as data dimensionality, corrugation and data collection strategy. We consider how these factors may affect the ML prediction capabilities and discuss some general trends.
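
The comparison described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the landscape function, the k-nearest-neighbour predictor, and every name and parameter below (performance, knn_predict, explore, make_candidates, the corrugation value, the pool size) are assumptions chosen for illustration. The sketch builds a pool of candidate materials whose performance depends on only two of several variables, then tracks the best performance found so far under random exploration and under ML-guided exploration from the same starting dataset.

```python
import math
import random


def performance(x, corrugation=0.2):
    """Hypothetical landscape: only the first two variables matter."""
    smooth = -((x[0] - 0.7) ** 2 + (x[1] - 0.3) ** 2)
    rough = corrugation * math.sin(25 * x[0]) * math.sin(25 * x[1])
    return smooth + rough


def knn_predict(x, measured, k=3):
    """Average the performance of the k measured points closest to x."""
    nearest = sorted(measured, key=lambda m: math.dist(x, m[0]))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def explore(candidates, n_initial, n_steps, guided, seed=0):
    """Return the best performance found after each measurement step."""
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)  # same seed gives both strategies the same start
    measured = [(x, performance(x)) for x in pool[:n_initial]]
    pool = pool[n_initial:]
    best = [max(y for _, y in measured)]
    for _ in range(n_steps):
        if guided:
            # ML-guided: measure the candidate with the highest prediction.
            idx = max(range(len(pool)),
                      key=lambda i: knn_predict(pool[i], measured))
        else:
            # Baseline: measure a randomly chosen candidate.
            idx = rng.randrange(len(pool))
        x = pool.pop(idx)
        measured.append((x, performance(x)))
        best.append(max(best[-1], measured[-1][1]))
    return best


def make_candidates(n, dim, seed=1):
    """Random candidate materials described by dim variables in [0, 1]."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(dim)) for _ in range(n)]


if __name__ == "__main__":
    cands = make_candidates(300, dim=6)
    # Vary how much data exists before ML is adopted.
    for n_initial in (5, 50, 200):
        random_best = explore(cands, n_initial, 30, guided=False)
        guided_best = explore(cands, n_initial, 30, guided=True)
        print(n_initial, round(random_best[-1], 3), round(guided_best[-1], 3))
```

Varying n_initial, the number of variables, or the corrugation value is one way to probe how the gap between the two strategies changes in this toy setting. The sketch makes no claim about which strategy wins for a given configuration.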

Year:  2021        PMID: 34079968     DOI: 10.1039/d1cp01761f

Source DB:  PubMed          Journal:  Phys Chem Chem Phys        ISSN: 1463-9076            Impact factor:   3.676


  2 in total

1.  Limitations of machine learning models when predicting compounds with completely new chemistries: possible improvements applied to the discovery of new non-fullerene acceptors.

Authors:  Zhi-Wen Zhao; Marcos Del Cueto; Alessandro Troisi
Journal:  Digit Discov       Date:  2022-03-25

2.  Data-Driven Analysis of Hole-Transporting Materials for Perovskite Solar Cells Performance.

Authors:  Marcos Del Cueto; Charles Rawski-Furman; Juan Aragó; Enrique Ortí; Alessandro Troisi
Journal:  J Phys Chem C Nanomater Interfaces       Date:  2022-07-29       Impact factor: 4.177

