
Machine Learning Approach to Delineate the Impact of Material Properties on Solar Cell Device Physics.

Md Shafiqul Islam¹, Md Tohidul Islam², Saugata Sarker¹, Hasan Al Jame¹, Sadiq Shahriyar Nishat³, Md Rafsun Jani¹, Abrar Rauf¹, Sumaiyatul Ahsan¹, Kazi Md Shorowordi¹, Harry Efstathiadis⁴, Joaquin Carbonara⁵, Saquib Ahmed⁶.

Abstract

In this research, the solar cell capacitance simulator-one-dimensional (SCAPS-1D) software was used to build and probe nontoxic Cs-based perovskite solar devices and to investigate how modulating key material parameters affects the ultimate power conversion efficiency (PCE). The input material parameters of the Cs-perovskite absorber layer were changed incrementally. The resulting combinations formed 63,500 unique devices, each of which was simulated to obtain its PCE. Versatile, well-established machine learning algorithms were then used to train, test, and evaluate the output dataset, with the goal of delineating and ranking the input material parameters by their impact on device performance and PCE. The most impactful parameters were then tuned to identify ranges that lead to higher device PCE values. As a validation step, the predicted results were also checked against SCAPS-simulated results, showing high accuracy and low error metrics. Further device optimization was conducted by modulating the absorber layer thickness, back contact metal, and bulk defect concentration, which improved the device PCE from 13.29 to 16.68%. Overall, the results of this investigation provide much-needed insight and guidance for researchers at large, and experimentalists in particular, toward fabricating commercially viable nontoxic inorganic perovskite alternatives for the burgeoning solar industry.


Year: 2022 PMID: 35811908 PMCID: PMC9260917 DOI: 10.1021/acsomega.2c01076

Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343

Introduction

Machine learning (ML), a subfield of artificial intelligence (AI),[1,2] draws on mathematics, statistics, and computer science[3] to build computer algorithms for specific aims. An ML system learns from experimental or computational data, analyzes it, and identifies patterns to anticipate behavior, with the goal of making better decisions. It is, therefore, no surprise that over the past decade ML has become a vital tool in all branches of STEM. The field of materials science has evolved in step, allowing scientists to use a broad variety of ML-based prediction methods and tools for different materials and devices.[4−6] With these tools, materials scientists can devise new ways of investigating material characteristics and improving material performance in general. A typical ML process consists of the following steps:

- generating data from a specific process or device
- data wrangling
- feature generation and feature engineering
- model construction
- making choices to obtain optimum outputs[7]

Perovskites are a material family whose crystal structure is analogous to that of the mineral perovskite (CaTiO3).[8] They have the generalized formula ABX3, where:

- A represents cation species, e.g., CH3NH3, HC(NH2)2, Cs
- B represents metal species, e.g., Sn, Pb, Ge
- X represents halide species[9,10]

Perovskites have shown promising results for light capture, exciton production, and charge transfer to the adjacent device layers for extraction.[11,12] The PCE of Pb halide perovskites in particular has risen from 3.8 to 25.2% in the last decade.[13−15] The wide absorption range, long diffusion length, and excellent charge-carrier mobility of these Pb-based perovskites are remarkable.
The perovskite material may be used alone as an absorber in different solar device designs and architectures. It can also be combined with a standard silicon layer to lower the cost per watt ($/W).[16] While perovskite solar cells (PSCs) offer many benefits, the toxicity and harmful consequences of Pb must be considered.[17−19] Pb can be leached and transported by water, air, and soil[20] (Figure 1). Despite the excellent performance and durability of Pb-based PSCs,[21,22] commercializing this technology on a wide scale poses substantial challenges.[23] A broad range of nontoxic materials is currently being investigated for commercial feasibility, based on important device performance characteristics such as efficiency, stability, and degradability.[24−26]

Figure 1

Schematic of environmental pollution by Pb-based perovskites.

The current research uses supervised ML to critically investigate nontoxic perovskite devices (ABX3: A = Cs; B = Sn, Bi, Ge, Ag, Sb, and Ti; X = I, Br) and offers researchers clear guidance toward enhancing PCE values. Cs-based perovskites were chosen because Cs can strongly tune the properties and performance of PSCs, in particular leading to higher device stability. This is reflected in the strong focus that researchers at large have placed on developing Cs-based perovskites to improve the stability, reproducibility, and spectral properties of PSCs.[27] This focus on Cs has additionally been beneficial because of the nontoxic nature of these materials.[25,26] Encouragingly, experimental studies have steadily demonstrated the viability of Cs-based perovskite materials through stability and increasing PCE measurements.[24]

In the present research, the solar cell capacitance simulator-one-dimensional (SCAPS-1D) was used to simulate single-junction Cs-perovskite solar devices (Figure 2 shows a standard CsSnI3 structure). The perovskite (absorber) parameters were modulated in incremental stages. This produced a large dataset consisting of the performance outputs of 63,500 unique devices. A standard correlation analysis was combined with a random forest algorithm to build models that rank the material properties by their impact on device performance and ultimate PCE. The results of this investigation give researchers clear recommendations on which parameters affect device PCE the most, so that they can focus on and probe those selectively. This provides a plausible pathway for disruption in the photovoltaic sector using nontoxic inorganic perovskite materials.

Figure 2

Crystal structure of CsSnI3.

Materials and Methodology

SCAPS-1D, developed by the ELIS department of the University of Gent,[28−31] was used in the present work for the numerical simulation of single-junction perovskite solar cells. SCAPS is highly versatile and widely used within the photovoltaic community. It allows a wide range of device architectures to be built and probed. It uses physics-based equations to model fundamental photovoltaic processes such as light capture, exciton generation, charge transport, and recombination.

SCAPS Governing Equations and Definitions of Critical SCAPS Output Parameters

SCAPS solves three sets of equations for the charge carriers:[32] the transport (drift–diffusion) equations, the Poisson equation, and the continuity equations. In steady state, the carrier continuity equations are

$$\frac{1}{q}\frac{dJ_n}{dx} + G(x) - R(x) = 0, \qquad -\frac{1}{q}\frac{dJ_p}{dx} + G(x) - R(x) = 0$$

where:

- Jn and Jp are the electron and hole current densities, respectively
- R(x) and G(x) are the recombination and generation rates, respectively
- n(x) and p(x) are the position-dependent electron and hole concentrations, respectively

The drift–diffusion of electrons and holes is described by

$$J_n = q\mu_n n E + qD_n\frac{dn}{dx}, \qquad J_p = q\mu_p p E - qD_p\frac{dp}{dx}$$

where:

- μn and μp are the electron and hole mobilities, respectively
- Dn and Dp are the electron and hole diffusion constants, respectively
- E is the electric field

SCAPS-1D solves the Poisson equation and the continuity equations for electrons and holes simultaneously, subject to the appropriate boundary conditions at the interfaces and contacts.[33] The fill factor (FF) of the device is defined as

$$\mathrm{FF} = \frac{V_{mp} J_{mp}}{V_{oc} J_{sc}}$$

where:

- Vmp and Jmp are the voltage and current density at the maximum power point
- Jsc is the short-circuit current density
- Voc is the open-circuit voltage

These quantities combine to give the primary result of this research, the device efficiency normalized to the input power Pin:

$$\eta = \frac{V_{oc} J_{sc}\,\mathrm{FF}}{P_{in}}$$
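The definitions of FF and η can be made concrete by extracting Jsc, Voc, the maximum power point, FF, and η from a sampled J–V curve. The sketch below does not use SCAPS output. It generates the curve from a hypothetical ideal-diode model, J(V) = Jsc − J0[exp(V/(n·kT/q)) − 1], with made-up parameters (Jsc = 20 mA/cm², J0 = 1e-12 mA/cm², ideality factor 1.5) at 300 K. Pin = 100 mW/cm² corresponds to the standard AM1.5G irradiance.

```python
import numpy as np

KT_Q = 0.025852  # thermal voltage kT/q at 300 K, in volts


def ideal_diode_jv(v, jsc=20.0, j0=1e-12, n_id=1.5):
    """Hypothetical illuminated ideal-diode J-V curve (current density in mA/cm^2)."""
    return jsc - j0 * (np.exp(v / (n_id * KT_Q)) - 1.0)


def solar_metrics(v, j, p_in=100.0):
    """Extract Jsc, Voc, Vmp, Jmp, FF and PCE (%) from a sampled J-V curve.

    v: increasing voltages in V; j: current densities in mA/cm^2 (decreasing with v);
    p_in: incident power density in mW/cm^2 (100 mW/cm^2 for AM1.5G).
    """
    jsc = np.interp(0.0, v, j)               # current density at V = 0
    voc = np.interp(0.0, j[::-1], v[::-1])   # voltage at J = 0 (np.interp needs increasing x)
    p = v * j                                # output power density, mW/cm^2
    k = np.argmax(p)                         # index of the maximum power point
    vmp, jmp = v[k], j[k]
    ff = (vmp * jmp) / (voc * jsc)
    pce = voc * jsc * ff / p_in * 100.0      # identical to Pmax / Pin, in percent
    return {"Jsc": jsc, "Voc": voc, "Vmp": vmp, "Jmp": jmp, "FF": ff, "PCE": pce}


v = np.linspace(0.0, 1.3, 5001)
m = solar_metrics(v, ideal_diode_jv(v))
print({key: round(float(val), 4) for key, val in m.items()})
```

Because FF is defined through the maximum power point, η reduces to Pmax/Pin. Writing it as Voc·Jsc·FF/Pin only separates the contributions of the three output parameters.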

Device Structure

Figure 3 shows schematically the solar device simulated in SCAPS and investigated in the present research. A conventional perovskite solar device consists of the following layers:[34]

- an antireflective coating
- glass
- a fluorine-doped tin oxide (FTO) electrode
- an electron-transport layer (ETL)
- a perovskite layer
- a hole-transport layer (HTL)
- an electrode layer

In the present research, the device stack is glass/FTO/TiO2/Cs-based perovskite/Cu2O/Au. The noble metals Au, Ag, and Pt are commonly used as electrode materials.[35] This type of structure significantly lowers electron–hole recombination and provides the diffusion length necessary for efficient electron–hole capture.[36]

Most of the incident light is absorbed in the perovskite layer, creating electron–hole pairs;[35] the remaining light is either not absorbed or converted to heat. Once generated, the electrons and holes are transported through the ETL and HTL, respectively, to generate an electric current. The ETL extracts electrons from the perovskite layer and prevents electrons in the FTO from recombining with holes. TiO2 has been used as the ETL in most published perovskite solar devices.[37] The HTL serves the same purpose as the ETL, but for holes; in the current work, Cu2O acts as the standard device HTL. The glass and FTO layers are used to increase optical absorption in the device layers.[38] Device performance, however, depends mainly on the ETL, absorber, and HTL.

Tables 1–3 list the material property values used for each layer in the current research as input parameters to the SCAPS software. The SCAPS simulations used the AM1.5G spectrum at an operating temperature of 300 K.

Figure 3

Schematic of the solar device structure utilized in SCAPS simulation.

Table 1

Cs-Based Lead-Free Perovskite Material Parameters

parameter	Cs₂SnI₆	Cs₃Bi₂I₉	Cs₂AgBiBr₆	CsSn_0.5Ge_0.5I₃	CsGeI₃	Cs₃Sb₂I₉	CsSnI₃	Cs₂TiBr₆
bandgap (eV)	1.48	2.03	2.05	1.5	1.6	2.05	1.3	1.6
electron affinity (eV)	4.3	3.4	4.19	3.9	3.52	3.65	3.5	4.47
relative permittivity	7.2	9.68	5.8	28	18	13.04	9.93	10
conduction band DoS (cm^–3)	4.76 × 10¹⁸	4.98 × 10¹⁹	1 × 10¹⁹	1 × 10¹⁹	1 × 10¹⁸	4.33 × 10¹⁸	1 × 10¹⁹	1 × 10¹⁹
valence band DoS (cm^–3)	4.6 × 10¹⁹	2.11 × 10¹⁹	1 × 10¹⁹	1 × 10¹⁹	1 × 10¹⁹	7.58 × 10¹⁸	1 × 10¹⁸	1 × 10¹⁹
electron mobility (cm²/V.s)	2.3	4.3	11.81	974	20	1.8	1500	4.4
hole mobility (cm²/V.s)	2.3	1.7	1.00	213	20	0.14	585	2.5
donor concentration, N_d (cm^–3)	1 × 10⁹	1 × 10¹⁹	1 × 10⁹	1 × 10⁹	2 × 10¹⁶	1 × 10⁹	0	1 × 10¹⁹
acceptor concentration, N_a (cm^–3)	1 × 10⁹	1 × 10¹⁹	1 × 10⁹	1 × 10⁹		1 × 10⁹	1 × 10¹⁵	1 × 10¹⁹

Table 3

Parameters of the ETL, Perovskite Absorber, and HTL

parameter/layer	TiO₂(ETL)	perovskite (absorber)	Cu₂O(HTL)
layer thickness (nm)	150	350 (static)	150
relative permittivity (ε_r)	9	7–11	7.11
bandgap energy (eV)	3.2	1.5–1.7	2.17
electron affinity (eV)	4.26	4.15–4.30	3.2
mobility of electron (cm²/V.s)	20	20–100	200
mobility of hole (cm²/V.s)	10	20–100	80
donor level concentration (N_d) (cm^–3)	1.0 × 10¹⁶	1.0 × 10⁹ (static)	0
acceptor level concentration (N_a) (cm^–3)	0	1.0 × 10⁹ (static)	1.0 × 10¹⁸
conduction band DoS (N_c) (cm^–3)	2.2 × 10¹⁸	2.0 × 10¹⁸–1.0 × 10¹⁹	2.02 × 10¹⁷
valence band DoS (N_v) (cm^–3)	1.8 × 10¹⁸	2.0 × 10¹⁸–1.0 × 10¹⁹	1.1 × 10¹⁹
radiative recombination (cm³/s)	2.3 × 10^–9	2.3 × 10^–9 (static)	2.3 × 10^–9
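The electron affinities and bandgaps in Table 3 fix the band edges of each layer relative to the vacuum level, with Ec = −χ and Ev = −(χ + Eg). This uses Anderson's electron-affinity rule, a simplifying assumption that ignores interface dipoles and band bending. The sketch below applies the rule to the TiO2 ETL, the Cu2O HTL, and two absorber electron affinities at the ends of the studied range (4.15 and 4.30 eV, with Eg = 1.5 eV). It shows how a larger absorber electron affinity turns the absorber/ETL conduction-band step into a barrier for electrons. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    chi: float  # electron affinity, eV
    eg: float   # bandgap, eV

    @property
    def ec(self) -> float:
        """Conduction band edge relative to the vacuum level (eV)."""
        return -self.chi

    @property
    def ev(self) -> float:
        """Valence band edge relative to the vacuum level (eV)."""
        return -(self.chi + self.eg)


def offsets(etl: Layer, absorber: Layer, htl: Layer) -> dict:
    """Band steps seen by carriers leaving the absorber (positive = energetically favorable)."""
    return {
        # electrons move absorber -> ETL: favorable if the ETL conduction band lies lower
        "dEc_abs_to_ETL": absorber.ec - etl.ec,
        # holes move absorber -> HTL: favorable if the HTL valence band lies higher
        "dEv_abs_to_HTL": htl.ev - absorber.ev,
    }


tio2 = Layer("TiO2", chi=4.26, eg=3.2)   # Table 3, ETL
cu2o = Layer("Cu2O", chi=3.2, eg=2.17)   # Table 3, HTL
for chi_abs in (4.15, 4.30):             # ends of the absorber electron-affinity range
    pvk = Layer("Cs-perovskite", chi=chi_abs, eg=1.5)
    print(chi_abs, {k: round(v, 2) for k, v in offsets(tio2, pvk, cu2o).items()})
```

With χ = 4.15 eV the electron step into TiO2 is about +0.11 eV (downhill). With χ = 4.30 eV it becomes about −0.04 eV, a small barrier. This is consistent with the preference for a low absorber electron affinity discussed in the Simulation Parameters section.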


Simulation Parameters

The most fundamental parameters of each material layer in the simulated device are described below. These are material properties specific to each device layer, determined by innate characteristics such as chemical composition and crystallographic orientation.

1. Thickness: The thickness of each layer was optimized to a fixed value that gives maximum efficiency.[39−41]
2. Bandgap: The bandgap of a material layer is related to its chemistry and determines which portion of the electromagnetic (EM) spectrum the layer absorbs. Only photons with energy equal to or greater than the bandgap are absorbed.[42] The ETL should have a wide bandgap so that a considerable portion of the EM spectrum passes through it and reaches the perovskite (absorber) layer.[42]
3. Electron affinity: The electron affinity of the perovskite (absorber) layer is significant for the current investigation. To produce an electric current, the electron–hole pairs must be routed to the ETL and HTL. A higher perovskite electron affinity means a larger barrier for electrons moving from the absorber to the ETL, so a low perovskite electron affinity is preferred.[42]
4. Relative permittivity: Relative permittivity (or dielectric constant) measures how strongly a material polarizes in an electric field.[43] The greater the value, the more likely the material is to form exciton pairs. We are therefore mainly concerned with the relative permittivity of the absorber layer.
5. Conduction band density of states: A larger conduction band DoS is preferable in the perovskite and ETL, because it allows the material to accept and conduct more electrons in the conduction band. As soon as electron–hole pairs are generated in the perovskite, electrons flow from the absorber's conduction band to the ETL's conduction band. Therefore, higher conduction band DoS values in the perovskite and ETL have more influence on device PCE.[42]
6. Valence band density of states: Once holes are formed, they are conducted from the perovskite's valence band to the HTL's valence band. Higher valence band DoS values in the perovskite and HTL therefore have more influence on device PCE.[42]
7. Electron mobility: Maximal electron mobility is desired in both the perovskite and the ETL. Once electron–hole pairs are formed, the goal is to extract the electrons from the device as efficiently as possible (flow from perovskite → ETL → electrode).
8. Hole mobility: The same logic as for electron mobility (item 7) applies. Here, maximal hole mobility is desired in both the perovskite and the HTL.
9. Donor concentration: The concept of donor concentration originates in doped semiconductor materials. There, adding a specific dopant or impurity introduces additional energy levels within the band structure that provide a favorable passage of electrons to the conduction band. Consider an otherwise intrinsic system with no additional electrons in the conduction band, which is the case for most moderate- to wide-bandgap semiconductors. In such a system, the conduction-band electron concentration is roughly equal to the donor concentration. This property can aid charge transfer at the perovskite/ETL interface, further facilitating carrier separation. At the HTL/perovskite interface, however, such energy levels may have the opposite effect. In an ideal perovskite solar device, the donor concentration in the HTL is therefore expected to be negligible. Consistent with this, the HTL donor concentration in our simulation input is set to zero.
10. Acceptor concentration: An acceptor level, introduced by adding a dopant or impurity, accepts electrons from the valence band and leaves holes behind. It is fundamentally related to the charge separation that occurs at the ETL/perovskite interface. The acceptor concentration values were therefore taken from experimental data and previously reported literature on the topic.
11. Absorption coefficient: We have listed each material's absorption coefficient as a function of light wavelength (400–700 nm). We are mainly concerned with the absorption coefficient of the perovskite (absorber) layer, because light absorption leads to exciton (electron–hole pair) generation.
12. Recombination rate of electrons and holes: Electron–hole recombination can severely degrade the performance of solar devices. Consistent with previous investigations[44,45] and experimental data, a radiative recombination coefficient of 2.3 × 10⁻⁹ cm³/s was used in the bulk of the perovskite. The effect of Auger recombination was considered negligible, because radiative recombination significantly dominates Auger recombination.[46−48]

For the present work, the device output data were generated in SCAPS with the ETL (TiO2) and HTL (Cu2O) parameters held at optimized, constant (static) values. The critical input parameters of the Cs-perovskite absorber layer were varied over ranges derived from the different Cs-perovskite materials listed in Table 1. As listed above, these critical input material parameters are:

- valence band density of states
- conduction band density of states
- electron affinity
- bandgap
- electron mobility
- hole mobility
- permittivity

The optical absorption coefficient α of the absorber layer was set from the model

$$\alpha(h\nu) = \left(A + \frac{B}{h\nu}\right)\sqrt{h\nu - E_g}$$

where Eg is the material bandgap and A and B are model parameters. B is a constant associated with graded absorption in a graded material layer: if a single layer contains a composition gradient, this second parameter models the varying absorption across the layer. For a monolithic material layer with no composition gradient, as in the current simulation, B is set to 0.

For the direct bandgap in our simulation, A is then a frequency-independent constant[49] that depends on the following quantities:

- ℏ, the reduced Planck constant
- mr, the reduced mass, which depends on the effective masses of electrons and holes
- q, the elementary charge
- n, the refractive index
- ε0, the vacuum permittivity
- xvc, a matrix element that depends on the lattice constant

For the absorption curve input, the average of the absorption coefficient curves (Figure 4) of the Cs-perovskite materials listed in Table 1 was used.
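The square-root absorption model can be evaluated directly, and it also shows the bandgap cut-off described in item 2. The sketch assumes α(hν) = (A + B/hν)·√(hν − Eg) with B = 0 and converts wavelength to photon energy with E[eV] ≈ 1239.84/λ[nm]. The prefactor A = 1e5 cm⁻¹ eV⁻¹ᐟ² is a placeholder, not a value from this study. The function name is illustrative.

```python
import numpy as np

HC_EV_NM = 1239.84  # h*c expressed in eV*nm


def alpha_sqrt_law(wavelength_nm, eg_ev, a=1e5, b=0.0):
    """Absorption coefficient (cm^-1) from alpha = (A + B/hv) * sqrt(hv - Eg).

    a: hypothetical prefactor A in cm^-1 eV^-1/2; b = 0 for a layer without a
    composition gradient. Photons below the bandgap are not absorbed (alpha = 0).
    """
    hv = HC_EV_NM / np.asarray(wavelength_nm, dtype=float)  # photon energy, eV
    excess = np.clip(hv - eg_ev, 0.0, None)                  # zero below the gap
    return (a + b / hv) * np.sqrt(excess)


wl = np.arange(400, 1001, 100)   # 400-1000 nm
for eg in (1.5, 1.6, 1.7):       # bandgap range varied in this study
    cutoff = HC_EV_NM / eg       # longest absorbed wavelength, nm
    print(f"Eg={eg} eV  cutoff={cutoff:.0f} nm  alpha={np.round(alpha_sqrt_law(wl, eg), -2)}")
```

Raising Eg from 1.5 to 1.7 eV moves the absorption edge from roughly 827 nm to 729 nm. This is one intuitive reason why bandgap can dominate the simulated PCE.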

Figure 4

Optical absorption coefficient of Cs-based perovskite materials.

The parameter ranges of the perovskite layer were chosen based on published data for Cs-based perovskites, reported and validated through both computational and theoretical methods.[41,44,45,50−56] These data have been consolidated in Table 1. Table 2 shows the modulation range, increment δ, and total number of steps for each of these input material parameters. All possible combinations of the input parameters were used to generate 63,500 unique devices. Their outputs, including the key parameters defined earlier (Voc, Jsc, FF, and η), are provided in the Supporting Document. For the ML model and decision-making algorithm, the present research focused solely on the device PCE (η), which, as shown earlier, is a function of Voc, Jsc, and FF.

Table 2

Perovskite Layer Parameters’ Range with Increments and Steps

parameter (with units)	range	increment delta	total number of steps
relative permittivity (ε_r)	7–11	1	5
bandgap energy (eV)	1.5–1.7	0.05	5
electron affinity (eV)	4.15–4.30	0.05	4
electron mobility (cm²/V.s)	20–100	20	5
hole mobility (cm²/V.s)	20–100	20	5
conduction band DoS (N_c) (cm^–3)	2 × 10¹⁸–1 × 10¹⁹	2 × 10¹⁸	5
valence band DoS (N_v) (cm^–3)	2 × 10¹⁸–1 × 10¹⁹	2 × 10¹⁸	5
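The full-factorial design can be sketched with itertools.product over the grids in Table 2. Only the input grid is built here; running each combination through SCAPS is outside the sketch, and the dictionary keys are illustrative names. The number of combinations equals the product of the step counts in Table 2.

```python
import itertools

import numpy as np

# Input grids taken from Table 2 (range and increment); key names are illustrative.
grid = {
    "permittivity": np.arange(7, 12, 1),                  # 7..11
    "Eg_eV": np.round(np.arange(1.50, 1.701, 0.05), 2),   # 1.50..1.70
    "chi_eV": np.round(np.arange(4.15, 4.301, 0.05), 2),  # 4.15..4.30
    "mu_e_cm2_Vs": np.arange(20, 101, 20),                # 20..100
    "mu_h_cm2_Vs": np.arange(20, 101, 20),                # 20..100
    "Nc_cm3": np.arange(2e18, 1.01e19, 2e18),             # 2e18..1e19
    "Nv_cm3": np.arange(2e18, 1.01e19, 2e18),             # 2e18..1e19
}

names = list(grid)
# One dict of absorber inputs per simulated device (Cartesian product of all grids).
devices = [dict(zip(names, combo)) for combo in itertools.product(*grid.values())]

steps = {k: len(v) for k, v in grid.items()}
print(steps)
print(len(devices), "parameter combinations; first:", devices[0])
```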

Table 3 provides a holistic breakdown of the input parameters of the three layers (ETL, absorber, HTL), as needed by SCAPS. All ETL and HTL parameters are held static. So are a few of the Cs-based perovskite absorber parameters (thickness, donor concentration, acceptor concentration, radiative recombination), to keep the focus on the more important parameters (explained below). The values were taken from previously published experimental and simulation-based literature,[41,44,45,57−65] with experimental findings validated against computational values.

It is worth restating the context of the present research here. The ETL and HTL do affect the overall performance of the device. The main focus of the present research, however, is the nontoxic absorber layer, where the most important photovoltaic processes (light capture, exciton generation, charge transport, and recombination) occur. The seven varied parameters were chosen because they can be probed and changed directly in experiments by varying material synthesis, deposition, and overall device fabrication conditions. These parameters are:

- relative permittivity
- bandgap energy
- electron affinity
- electron mobility
- hole mobility
- conduction band DoS
- valence band DoS

Ultimately, the goal of this ML project is to provide incisive insights into the impact of input material parameters on final device efficiency and to rank these impacts. This guidance can directly assist experimentalists in making high-efficiency devices.

Machine Learning Workflow

Data wrangling is a necessary step for a purely experimental dataset.[66] In experimental settings, the same processing input parameter values can yield multiple measurements and different outputs, so the data must be cleaned based on experimental reliability and other factors. In the present research, the entire dataset was generated through simulation with the SCAPS-1D software. Moreover, because a decision tree model was used, data preprocessing such as normalization was not required. Tree-based models split on feature thresholds, so they are insensitive to feature scaling and can also handle qualitative predictors.[67] Other models, by contrast, require data normalization and other preprocessing steps. Therefore, the raw output data from SCAPS were used in full without any special processing.

The raw imported data combined the independent input parameters listed in Table 2 (known in data science as ‘features’) with the solar device outputs (the dependent variables, also called ‘target variables’). These data were split into two datasets, one containing the features and the other containing the target variables. After this split, 75% of the rows were used to train the model and the remaining 25% were used as the test set. The test set remained unseen by the model during training and was used only to evaluate its predictions. We employed a decision tree model, one of the most widely used machine learning models owing to its interpretability and clarity.[68] The model's outputs are discrete and appear as clustered data. Even so, the model is easy to grasp and analyze,[69] and it identifies the most important features and the correlations among multiple parameters in an intuitive way.
As a result, the model can provide highly accurate predictions.[70] The model was trained using the RandomForestRegressor class[71] from scikit-learn, a widely used ML library for classification, regression, clustering, and related tasks. A decision tree is essentially a random forest with only one tree,[72] so the hyperparameter n_estimators (the number of trees) was set to 1 rather than the scikit-learn default of 100. The advantage of using the random forest implementation is that n_estimators can be increased to add more trees if overfitting is suspected in the single decision tree. Overfitting is suspected if the model performs excellently on the training set but poorly on the test set, i.e., if the test accuracy is significantly lower.[73]

After the output data were generated with SCAPS, the decision tree model was fitted to the dataset. Its performance was then evaluated, feature importances were computed, and the features were ranked relative to each other. The decision tree was visualized using only the most impactful parameters. A prediction of how to tune these parameters to attain higher device PCE values was also generated. The relative importance of features can also be obtained from the correlation matrix,[74] by correlating the features with the target variable. Feature importance[75,76] in a random forest gives an outcome similar to a correlation matrix. Provided that the model performs well, however, it also ranks the input parameters by their impact on the output. After the random forest model was run, the features least important for device PCE were excluded:

- valence band density of states
- conduction band density of states
- electron mobility
- permittivity

A new dataset containing only the most important features was then created.
This new dataset had the same number of rows as the initial one, but the feature sets (input parameters) were no longer unique, i.e., multiple outputs shared the same set of features (Table 4). Of the 63,500 rows, only 100 had unique feature sets. At this point, the lowest PCE value was retained for each group of rows with identical feature sets. This ensures that at least that device PCE can be achieved for the given set of input parameters. This decision helped illustrate the model and paved the way for modifying the input settings to increase the ultimate device efficiency.
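The workflow above can be sketched with scikit-learn on a synthetic stand-in for the SCAPS dataset. The column names and the toy PCE formula are hypothetical and are not the study's data. The sketch splits the data 75/25, fits a RandomForestRegressor with n_estimators=1, and ranks feature_importances_. Passing bootstrap=False is our assumption: it makes the single tree see every training row, which is closer to a plain decision tree than the default bootstrapped forest. The text does not state which setting was used.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 5000
# Synthetic stand-in for the SCAPS table: hypothetical column names and toy PCE model.
df = pd.DataFrame({
    "Eg": rng.choice([1.50, 1.55, 1.60, 1.65, 1.70], n),
    "chi": rng.choice([4.15, 4.20, 4.25, 4.30], n),
    "mu_h": rng.choice([20, 40, 60, 80, 100], n),
    "mu_e": rng.choice([20, 40, 60, 80, 100], n),
    "eps_r": rng.choice([7, 8, 9, 10, 11], n),
})
df["PCE"] = 50 - 25 * df["Eg"] - 3 * (df["chi"] - 4.15) / 0.15 + 0.01 * df["mu_h"]

X = df.drop(columns="PCE")  # features (inputs)
y = df["PCE"]               # target variable

# 75 % training rows, 25 % held-out test rows
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# A "forest" of one tree; bootstrap=False so that tree is grown on all training rows.
model = RandomForestRegressor(n_estimators=1, bootstrap=False, random_state=0)
model.fit(X_train, y_train)

print("train R2:", round(model.score(X_train, y_train), 4))
print("test R2:", round(model.score(X_test, y_test), 4))
ranking = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(ranking.round(3))
```

Raising n_estimators (and leaving bootstrap at its default of True) turns this into an ordinary random forest. That is the overfitting remedy mentioned above.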

Table 4

New Data after Excluding the Four Least Important Input Material Parameter Columns

	E_A (eV)	E_g (eV)	hole mobility (cm²/V.s)	PCE (%)
0	4.15	1.5	20	11.7929
1	4.15	1.5	20	11.7865
2	4.15	1.5	20	11.7802
3	4.15	1.5	20	11.7739
4	4.15	1.5	20	11.7677
...	...	...	...	...
63,495	4.30	1.7	100	9.1986
63,496	4.30	1.7	100	9.2009
63,497	4.30	1.7	100	9.2031
63,498	4.30	1.7	100	9.2051
63,499	4.30	1.7	100	9.2070
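The reduction step can be illustrated with the ten rows displayed in Table 4. The sketch keeps the lowest PCE for each unique (E_A, E_g, hole mobility) combination, the conservative choice described in the Machine Learning Workflow section. The column names are illustrative. Applied to the full dataset, the same groupby would leave the 100 unique feature sets.

```python
import pandas as pd

# The ten rows displayed in Table 4 (column names are illustrative).
rows = [
    (4.15, 1.5, 20, 11.7929), (4.15, 1.5, 20, 11.7865), (4.15, 1.5, 20, 11.7802),
    (4.15, 1.5, 20, 11.7739), (4.15, 1.5, 20, 11.7677),
    (4.30, 1.7, 100, 9.1986), (4.30, 1.7, 100, 9.2009), (4.30, 1.7, 100, 9.2031),
    (4.30, 1.7, 100, 9.2051), (4.30, 1.7, 100, 9.2070),
]
df = pd.DataFrame(rows, columns=["E_A", "E_g", "mu_h", "PCE"])

# Lowest PCE per unique feature set: an "at least this efficiency" value for that set.
unique_devices = df.groupby(["E_A", "E_g", "mu_h"], as_index=False)["PCE"].min()
print(unique_devices)
```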

The Supporting Documentation includes the Python code (in a Jupyter notebook) used to analyze the data for the present study.

Results and Discussion

In this investigation, we used the solar cell simulation dataset and machine learning models to delineate the relative impacts of the materials' intrinsic parameters on the overall power conversion efficiency. Various parameters are known to affect the efficiency of solar devices. This theoretical study focused on identifying the parameters that affect device efficiency more than others.

Evaluating Decision Tree Model Performance

The model's performance was assessed using prediction accuracies for both the training and testing datasets, together with other error metrics commonly used in statistics (e.g., RMSE and R2). The initial train and test datasets were generated with the SCAPS-1D software. Both datasets are based on a computational framework that maps intrinsic device input parameters to device-scale photovoltaic output parameters. The input parameters include:

- bandgap
- electron affinity
- conduction band density of states
- valence band density of states
- intrinsic defect density
- acceptor density

The output parameters are:

- open-circuit voltage (Voc)
- short-circuit current density (Jsc)
- fill factor (FF)
- power conversion efficiency (PCE)

For the modeled dataset in this investigation, PCE is taken as the target value, because it best represents overall device performance. SCAPS-1D is based on a rigorous computational framework. The relative importance of and correlation between the intrinsic parameters were analyzed with supervised machine learning algorithms. Both the train and test sets show high accuracy, as confirmed by the error metrics. Figure 5 shows the parity plots for the training and test data side by side, indicating that the model predicts both datasets with high accuracy.
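RMSE and R² (plus MAE) can be computed with scikit-learn's metrics module. The sketch wraps them in a small helper and applies it to a few hypothetical simulated and predicted PCE values. These numbers are made up for illustration and are not results of this study. In a parity plot such as Figure 5, the same (simulated, predicted) pairs are plotted against the y = x line.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Error metrics for comparing predicted PCE against simulated PCE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return {
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "MAE": float(mean_absolute_error(y_true, y_pred)),
        "R2": float(r2_score(y_true, y_pred)),
    }


# Hypothetical SCAPS-simulated vs model-predicted PCE values, in %.
pce_sim = [11.79, 10.52, 9.87, 9.20, 12.31]
pce_pred = [11.75, 10.60, 9.90, 9.18, 12.25]
print(regression_report(pce_sim, pce_pred))
```

Taking the square root of mean_squared_error explicitly avoids depending on version-specific RMSE helpers in scikit-learn.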

Figure 5

Parity plot for train and test data side by side.

Standard statistical metrics (RMSE, R2) numerically confirm this observation. However, RepeatedKFold cross-validation from scikit-learn[77] was used to obtain an unbiased estimate of accuracy. In k-fold cross-validation, the dataset is divided into k nonoverlapping folds. Each fold serves once as the test set while the model is trained on the remaining k − 1 folds. Repeated k-fold cross-validation repeats this procedure several times with different random splits. The cross-validation relied only on the simulated train/test data and was not compared or contrasted with experimental data or input from any other source. For example, if five-fold cross-validation is repeated 5 times, the model's effectiveness is estimated from 25 distinct train/test evaluations. In the current research, the dataset had been split only once for the single train/test validation. For cross-validation, it was split into 5 folds and the procedure was repeated 5 times. The splits were made randomly, i.e., without any bias. As before, the standard error metrics, including RMSE and R2, were computed for the k-fold cross-validation, and the model again achieved high prediction accuracy for both the train and test datasets.

These high accuracies for both the train and test datasets are likely a consequence of overfitting, indicating that the dataset may contain many similar devices. Looking closer, the 63,500 devices form a stepwise Cartesian product of seven parameters, i.e., a combination of different input parameter sets, as shown in Table 2. It is therefore possible that certain parameters are unimportant and do not influence the output. After excluding the least important features, which have little or no influence on device efficiency, only some of the 63,500 rows remain unique. In the subsequent analysis, we used these unique devices after dropping the least important features.
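Repeated k-fold cross-validation maps directly onto scikit-learn's RepeatedKFold and cross_val_score. The sketch uses 5 folds and 5 repeats, giving 25 scores, and fits a single-tree random forest to hypothetical synthetic data in place of the SCAPS dataset. It reports the mean and spread of R² and RMSE over the 25 evaluations.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-in data: target driven mostly by the first feature (hypothetical).
X = rng.uniform(size=(1000, 4))
y = 10 - 4 * X[:, 0] + X[:, 1] + 0.05 * rng.normal(size=1000)

# 5 folds x 5 repeats = 25 train/test evaluations, each repeat with a new random split.
cv = RepeatedKFold(n_splits=5, n_repeats=5, random_state=1)
model = RandomForestRegressor(n_estimators=1, bootstrap=False, random_state=0)

r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")
rmse = -cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
print(f"{len(r2)} evaluations: R2 = {r2.mean():.3f} ± {r2.std():.3f}, RMSE = {rmse.mean():.3f}")
```

On a grid-generated dataset where many rows share the influential feature values, random folds place near-duplicates in both the training and test parts. This is one way cross-validated scores can look optimistic.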

Generating a Prediction

With the high confidence gained from cross-validating the model in the previous section, the model was then used to identify the most important features, i.e., those with the highest impact on the target. To this end, two analyses were used: the feature importance built into the random forest algorithm, and Shapley Additive exPlanations (SHAP) analysis.

Feature Relative Importance

The random forest model[75,76] from the scikit-learn package was used to calculate the relative importance of the features. A random forest model contains one or more decision trees, each made up of internal nodes and leaves. At each internal node, the algorithm selects one of the following features and splits the data into two separate sets:

- valence band density of states
- conduction band density of states
- electron affinity
- bandgap
- electron mobility
- hole mobility
- permittivity

The algorithm calculates how much each candidate split reduces the “impurity”, and the feature giving the greatest reduction is chosen for that internal node. For random forest regression, impurity is measured by variance. The decrease in impurity is therefore the variance reduction, i.e., the difference between the variance of a node and the weighted sum of the variances of its child nodes. The method accumulates this variance reduction for each feature over all nodes that split on it. Averaged over all trees, this gives the feature importance in the random forest. The results are listed in Table 5 and shown graphically in Figure 6.
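The variance-reduction bookkeeping described above can be checked by hand on one fitted tree. The sketch walks the nodes of a scikit-learn DecisionTreeRegressor and accumulates, per feature, the parent's weighted impurity minus the weighted impurities of its children. It normalizes the result and compares it with the library's feature_importances_. For a forest, scikit-learn averages these per-tree vectors over all trees. The data are synthetic and hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))
y = 5 * X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=500)  # hypothetical target

tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
t = tree.tree_
w = t.weighted_n_node_samples  # number of (weighted) samples reaching each node
imp = t.impurity               # node variance (squared-error criterion)

importance = np.zeros(X.shape[1])
for node in range(t.node_count):
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:             # leaf node: no split, so no impurity decrease
        continue
    # weighted variance of the parent minus the weighted variances of its two children
    importance[t.feature[node]] += w[node] * imp[node] - w[left] * imp[left] - w[right] * imp[right]
importance /= importance.sum()  # normalize so the importances sum to 1

print(np.round(importance, 3), np.round(tree.feature_importances_, 3))
```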

Table 5

Parameters and Their Relative Importance

absorber layer parameter	relative importance (%)
bandgap energy	77.40
hole mobility	10.32
electron affinity	9.70
valence band DoS	1.31
conduction band DoS	1.26
relative permittivity	0.01
electron mobility	0.00

Figure 6

Relative strengths of importance indices of features.

Additional validation was performed using the correlation matrix. The correlations of the features with the target variable identify the same three most important features, in the same order, as seen in Table 6.

Table 6

Correlation of Features with the Target

features (input material parameters)	correlation
bandgap energy	0.868535
hole mobility	0.302746
electron affinity	0.149510
conduction band DoS	0.104182
valence band DoS	0.068954
electron mobility	0.000688
relative permittivity	0.000307

The correlation analysis confirms that bandgap energy is by far the most important of the seven features. These features are the input parameters of the SCAPS-1D simulation software, and Table 6 shows their relationship to the simulated device PCE (target). The outcome of this exercise clarifies which features should be modulated to ultimately improve the PCE of the simulated device.
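A sketch of the feature-target correlation check using pandas `DataFrame.corrwith` (Pearson correlation by default). The column names and the target formula are hypothetical. The text does not state whether Table 6 lists signed or absolute coefficients, so this sketch keeps the signed values and ranks features by magnitude.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 400
# Hypothetical inputs over plausible ranges (not the paper's parameter grid).
features = pd.DataFrame({
    "bandgap": rng.uniform(1.3, 1.9, n),
    "hole_mobility": rng.uniform(10, 100, n),
    "permittivity": rng.uniform(5, 25, n),
})
# Hypothetical target: falls with bandgap, rises with hole mobility, ignores permittivity.
pce = -20.0 * features["bandgap"] + 0.1 * features["hole_mobility"] + rng.normal(0, 0.1, n)

correlation = features.corrwith(pce)  # signed Pearson coefficient per feature
ranking = correlation.abs().sort_values(ascending=False)
print(ranking)
```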

Shapley Additive exPlanations (SHAP) Analysis

To further examine the feature importance generated by the model, a SHAP analysis was also performed on it. SHAP (Shapley Additive exPlanations) is a package of methods for explaining model predictions; by quantifying the contribution of each feature, it measures which variables have the most effect on the model output. SHAP attributes each individual prediction to the input features using Shapley values from cooperative game theory, i.e., the average change in the prediction when a feature is added to the possible subsets of the other features, and these attributions are then aggregated over many predictions. A SHAP summary plot combines feature importance with feature effects. The summary plot showing the relative importance of the noncorrelated features (in descending order) for efficiency is shown in Figure 7.
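To make the Shapley-value idea concrete, the sketch below computes exact Shapley values by brute force over all feature coalitions for a small hypothetical model. It is not the SHAP package, which for tree ensembles uses the much faster TreeSHAP algorithm (`shap.TreeExplainer`). Filling "absent" features from a fixed baseline point is one of several possible conventions and is an assumption here; the cost grows as 2^n, so this only suits a handful of features.

```python
from itertools import combinations
from math import factorial

import numpy as np

def exact_shapley(predict, x, baseline):
    """Exact Shapley values of predict() at point x.

    Features outside a coalition are replaced by the corresponding baseline value.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = len(x)

    def value(coalition):
        z = baseline.copy()
        idx = list(coalition)
        if idx:
            z[idx] = x[idx]  # features in the coalition take their actual values
        return predict(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi

# Hypothetical model: additive effect of feature 0 plus a feature 1 x feature 2 interaction.
def toy_model(z):
    return 3.0 * z[0] + 2.0 * z[1] * z[2]

phi = exact_shapley(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print(phi)  # approximately [3, 1, 1]: the interaction is shared equally by features 1 and 2
# Additivity: the contributions sum to f(x) - f(baseline) = 5.
print(phi.sum())
```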

Figure 7

Noncorrelated features’ contribution to device PCE as measured by SHAP with random forest regression (in decreasing order).

Overall, both methods show that the three most important features, bandgap, hole mobility, and electron affinity (from highest to lowest), together account for more than 97% of the relative importance for the target variable (device PCE). This insight is critical, in particular, for experimentalists, enabling them to focus their efforts on optimizing these parameters rather than the others when seeking to improve device power conversion efficiency.
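The step described earlier, dropping the least important features and keeping only the unique devices, can be sketched with pandas on a small hypothetical Cartesian grid (three levels per parameter, not the paper's actual levels). Once a parameter is dropped, rows that differed only in that parameter collapse into duplicates. The text does not say how differing PCE values among such duplicates were handled, so this sketch works on the input columns only.

```python
import itertools

import pandas as pd

# Hypothetical parameter levels (illustrative; not the levels used in the paper).
levels = {
    "bandgap_eV": [1.5, 1.6, 1.7],
    "hole_mobility": [10, 50, 100],
    "electron_affinity_eV": [4.15, 4.25, 4.35],
    "electron_mobility": [1, 10, 100],
}

# Full stepwise Cartesian product of all parameter levels: 3**4 = 81 devices.
grid = pd.DataFrame(list(itertools.product(*levels.values())), columns=list(levels))

# Keep only the three most important features and drop rows that are now duplicates.
top_features = ["bandgap_eV", "hole_mobility", "electron_affinity_eV"]
unique_devices = grid[top_features].drop_duplicates()

print(len(grid), len(unique_devices))  # 81 27
```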

Visualizing the Decision Tree

Many ML models are “black boxes” whose inner workings are not interpretable by humans. Decision trees are often chosen because they are more explainable than other models: their decisions can be expressed as simple Boolean (if–then) rules, which are easier to understand than a black box model such as an artificial neural network.[69] Therefore, after a brief overview, most users can comprehend decision tree models. Trees can also be represented visually in a form that is easy for nonexperts to understand.[67] Applications of decision tree models to similar systems have been reported.[78,79] For clarity, a visualization of the decision tree model used in this research is shown in Figure 8.
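A sketch of printing a decision tree as Boolean rules with scikit-learn's `export_text`. The training data are synthetic, with feature names X0 (electron affinity), X1 (hole mobility), and X2 (bandgap) following the notation used for the decision tree discussed here; the target formula and `max_depth=2` are assumptions chosen for readability. Note that scikit-learn writes split thresholds as `<=` and `>`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.uniform(4.10, 4.40, n),  # X0: electron affinity (eV)
    rng.uniform(10, 100, n),     # X1: hole mobility (cm^2/V.s)
    rng.uniform(1.3, 1.9, n),    # X2: bandgap (eV)
])
# Hypothetical PCE surface: lower bandgap and higher hole mobility help, EA best near 4.25 eV.
y = 30 - 10 * X[:, 2] + 0.02 * X[:, 1] - 20 * (X[:, 0] - 4.25) ** 2

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=["X0", "X1", "X2"])
print(rules)
```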

Figure 8

Branch of the decision tree with only pure leaf nodes, demonstrating a route to achieve higher PCE.

Figure 8 shows the splits along the branch that leads to higher PCE. For bandgap energy, the splits are X2 < 1.625, X2 < 1.575, and X2 < 1.525, i.e., bandgap energy X2 < 1.525 eV; for hole mobility, X1 > 30, X1 > 50, X1 > 70, and X1 > 90, i.e., hole mobility X1 > 90 cm²/V.s; and for electron affinity, X0 > 4.175, X0 > 4.225, and X0 < 4.275, i.e., 4.225 < X0 < 4.275 eV. In the given dataset, the bandgap, hole mobility, and EA corresponding to the maximum device PCE (13.2912%) are 1.5 eV, 100 cm²/V.s, and 4.25 eV, respectively. The decision tree also shows that PCE decreases as the bandgap energy (X2) increases and as the hole mobility (X1) decreases. In addition, PCE decreases when EA (X0) is either increased or decreased from the approximate critical value of 4.25 eV while the other parameters are held unchanged, indicating that EA should in practice be optimized to a value near 4.25 eV. However, because decision tree predictions are piecewise constant approximations rather than continuous functions, they cannot readily be used to extrapolate.[80] In particular, the tree cannot predict a PCE higher than the maximum value present in its training data, so predictions beyond that point carry no physical meaning; within the dataset, any other modulation of the parameters lowers the predicted PCE. For the present research and the generated datasets, these observations therefore suggest that device PCE increases as the bandgap energy is lowered to 1.5 eV or below and as the hole mobility is raised to 100 cm²/V.s or above. The inequality for each parameter that leads to higher device PCE is tabulated in Table 7.
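The extrapolation limit can be seen in a few lines. A regression tree (and likewise a random forest, which averages trees) returns the mean target of a training leaf, so inputs beyond the training range receive the same prediction as the nearest edge of that range and can never exceed the largest training target. The one-dimensional data below are a hypothetical toy example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy training data: the target rises linearly with the input over [0, 1].
x_train = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
y_train = x_train.ravel()

# Fully grown tree: every leaf holds a single training sample.
tree = DecisionTreeRegressor(random_state=0).fit(x_train, y_train)

# Query inside the range, at its upper edge, and far outside it.
x_query = np.array([[0.5], [1.0], [5.0]])
pred = tree.predict(x_query)
print(pred)  # -> 0.5, 1.0, 1.0: the query at 5.0 stays at the largest training value
```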

Table 7

Most Impactful Parameters and Their Inequalities for Increasing Device PCE

absorber layer parameter	inequality for increasing efficiency
bandgap energy (eV)	≤1.5
hole mobility (cm²/V.s)	≥100
electron affinity (eV)	≈4.25

The above conditions are ‘AND’ conditions: bandgap energy must be ≤1.5 eV AND hole mobility must be ≥100 cm²/V.s AND electron affinity must be held at approximately 4.25 eV. The three-dimensional (3-D) representation in Figure 9 and the contour plots (graphical representations of a 3-D surface in a two-dimensional (2-D) format, obtained by drawing constant-z slices) in Figure 10 both show that the device PCE increases with decreasing bandgap energy and increasing hole mobility. These representations also suggest that the device PCE is maximized at an EA value of 4.25 eV.
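The AND conditions in Table 7 translate directly into a combined Boolean mask. The sketch below applies them to a few hypothetical device rows with pandas; the electron affinity window 4.225 < EA < 4.275 eV is taken from the decision tree splits, and the column names are assumptions.

```python
import pandas as pd

# Hypothetical candidate devices (not taken from the paper's dataset).
devices = pd.DataFrame({
    "bandgap_eV": [1.50, 1.50, 1.60, 1.40],
    "hole_mobility_cm2_Vs": [100, 50, 100, 120],
    "electron_affinity_eV": [4.25, 4.25, 4.25, 4.10],
})

ea = devices["electron_affinity_eV"]
meets_all = (
    (devices["bandgap_eV"] <= 1.5)                # bandgap condition
    & (devices["hole_mobility_cm2_Vs"] >= 100)    # hole mobility condition
    & (ea > 4.225) & (ea < 4.275)                 # EA held near 4.25 eV
)
print(devices[meets_all])  # only the first row satisfies all three conditions
```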

Figure 9

PCE as a function of bandgap and hole mobility at a constant EA of 4.25 eV.

Figure 10

Contour plots showing the relation of bandgap and hole mobility with device PCE at different EA values (A) 4.15, (B) 4.20, (C) 4.25, and (D) 4.30 eV.

Device Optimization through SCAPS-1D

Supervised machine learning on the selected devices in the earlier discussion (Section ) provided a set of critical intrinsic material parameters for a champion device. As indicated by the target analysis, the parameters of that particular device are likely to yield excellent photovoltaic performance. Further optimization of the absorber layer thickness, back contact metal work function, and bulk defect density can significantly affect device performance. In this section, these step-by-step optimizations were carried out with the SCAPS-1D simulation software. An additional set of critical analyses, including interfacial defects and device stability, demonstrates the experimental viability of the champion device. The modulations and PCE values reported here are based purely on computational modeling; nevertheless, the intuition they provide for improving device performance can ultimately be applied in experimental settings.

Bulk Absorber Layer Thickness Optimization

Inorganic and organic perovskite devices are fabricated through various deposition methods. In most cases, a thicker absorber layer absorbs more of the incident solar energy. Beyond a certain thickness, however, most perovskites become more susceptible to intrinsic and extrinsic defects. In this section, the absorber layer thickness was varied from 0.3 to 1.6 μm. The results illustrated in Figure 11 indicate that, for this device, the peak PCE of 16.68% occurs at a perovskite layer thickness of 1 μm, provided that there are no additional defects in the bulk of the absorber. Overall, Voc and FF decrease with increasing absorber layer thickness owing to the proportional increase in defect sites within the film. Because a thicker absorber generates more electron–hole pairs, Jsc rises with absorber layer thickness, resulting in a higher photocurrent.
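The Jsc trend can be rationalized with the Beer-Lambert law: the fraction of light absorbed in a single pass through a film of thickness d is 1 − exp(−αd), which rises with d but saturates. The sketch assumes α = 10⁴ cm⁻¹ (1 μm⁻¹) purely for illustration and ignores reflection, the spectral dependence of α, and recombination, so it captures the Jsc rise but not the Voc and FF losses behind the 1 μm optimum. The 0.1 μm step is also an assumption.

```python
import numpy as np

ALPHA_PER_UM = 1.0  # assumed absorption coefficient: 1e4 cm^-1 = 1 um^-1 (illustrative only)

def absorbed_fraction(thickness_um, alpha_per_um=ALPHA_PER_UM):
    """Single-pass Beer-Lambert absorbed fraction, 1 - exp(-alpha * d)."""
    return 1.0 - np.exp(-alpha_per_um * np.asarray(thickness_um, dtype=float))

# Same thickness window as the SCAPS-1D sweep: 0.3 to 1.6 um (0.1 um steps assumed).
thicknesses = np.linspace(0.3, 1.6, 14)
fractions = absorbed_fraction(thicknesses)
gains = np.diff(fractions)  # extra absorption gained per 0.1 um step

for d, f in zip(thicknesses, fractions):
    print(f"{d:.1f} um -> {f:.3f} absorbed")
```

The shrinking `gains` show diminishing returns: each additional 0.1 μm adds less absorption, while (per the simulation results) it adds more defect sites.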

Figure 11

J–V characteristic parameters as a function of absorber (perovskite) thickness: (a) Voc, (b) Jsc, (c) FF, and (d) PCE.

Bulk and Interfacial Defect Investigation

Bulk defects are a critical cause of low performance in most organic and inorganic Sn-based devices. Under atmospheric conditions, the Sn²⁺ in Sn-based perovskites tends to oxidize to Sn⁴⁺, which compromises photovoltaic energy conversion. Although defect densities depend strongly on the experimental processing route, simulating device performance at different defect energies and concentrations provides a comparative study that can be used to optimize experimental conditions. The solar output parameters as a function of defect concentration and energy level are illustrated in Figure 12.

Figure 12

Characteristic device parameters: (a) Voc, (b) Jsc, (c) FF, and (d) PCE at different defect concentrations and energy levels for the bulk absorber layer.

The data show that higher defect concentrations can significantly decrease device performance. Lowering the defect concentration to 1 × 10¹⁴ cm⁻³ can increase device performance by up to 40%. However, such a low defect concentration is not yet practical with commercial thin-film deposition routes, so a realistic defect concentration of 1 × 10¹⁶ cm⁻³ is considered in this investigation. Defect concentrations at the interfaces can also severely impact device performance. Figures 13 and 14 show the effect of defect concentration at the HTL/absorber and ETL/absorber interfaces, respectively. The outputs show that higher defect concentrations at the HTL/absorber interface degrade device performance more severely than those at the ETL/absorber interface.
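A back-of-the-envelope sketch of why the bulk defect density matters. In Shockley-Read-Hall (SRH) theory the carrier lifetime scales as τ = 1/(σ v_th N_t), and the diffusion length as L = √(Dτ) with D = μkT/q. The capture cross section σ = 10⁻¹⁵ cm² and thermal velocity v_th = 10⁷ cm/s are assumed values, and the hole mobility of 100 cm²/V.s is the champion value from the analysis above. SCAPS-1D solves the full device equations, so this only indicates the scaling.

```python
import math

SIGMA_CM2 = 1e-15   # assumed capture cross section (cm^2)
V_TH_CM_S = 1e7     # assumed thermal velocity (cm/s)
KT_Q_V = 0.02585    # thermal voltage at 300 K (V)

def srh_lifetime_s(defect_density_cm3):
    """SRH carrier lifetime for a single defect level: tau = 1 / (sigma * v_th * N_t)."""
    return 1.0 / (SIGMA_CM2 * V_TH_CM_S * defect_density_cm3)

def diffusion_length_um(lifetime_s, mobility_cm2_Vs):
    """Diffusion length L = sqrt(D * tau), with D from the Einstein relation D = mu * kT/q."""
    diffusivity_cm2_s = mobility_cm2_Vs * KT_Q_V
    return math.sqrt(diffusivity_cm2_s * lifetime_s) * 1e4  # cm -> um

for n_t in (1e14, 1e16):
    tau = srh_lifetime_s(n_t)
    print(f"N_t = {n_t:.0e} cm^-3: tau = {tau:.1e} s, L_h = {diffusion_length_um(tau, 100):.1f} um")
```

Under these assumptions, a 100-fold drop in N_t raises τ 100-fold and L 10-fold, which moves L from roughly the absorber thickness to well beyond it.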

Figure 13

Characteristic parameters: (a) Voc, (b) Jsc, (c) FF, and (d) PCE at different defect concentrations and energy levels for the HTL/absorber interface.

Figure 14

Characteristic parameters: (a) Voc, (b) Jsc, (c) FF, and (d) PCE at different defect concentrations and energy levels for the ETL/absorber interface.

The results of the bulk and interfacial defect analyses also clarify the influence of different defects on overall device performance. For instance, in the bulk absorber layer, defects with energies ranging from 0.3 to 1.4 eV can be categorized as deep defects, which strongly degrade device performance and should be avoided as far as possible. In contrast, at the HTL/absorber interface, defects with energy levels between 0 and 1.1 eV can be classified as deep defects. Hence, the severity of a defect’s impact on photovoltaic performance depends on its position, concentration, and energy level.
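The energy-level dependence follows from the SRH rate. For equal electron and hole lifetimes, U = (np − n_i²) / [τ(n + p + 2n_i cosh((E_t − E_i)/kT))], so recombination is strongest for traps near the intrinsic level (mid-gap) and falls off as the trap approaches a band edge. The carrier densities, n_i, and τ below are assumed illustrative values, and trap energies are measured from E_i rather than from a band edge.

```python
import numpy as np

KT_EV = 0.02585  # thermal energy at 300 K (eV)

def srh_rate(n, p, n_i, tau, trap_offset_ev):
    """Net SRH recombination rate (cm^-3 s^-1) for equal electron/hole lifetimes tau.

    trap_offset_ev is E_t - E_i, the trap level measured from the intrinsic level.
    """
    return (n * p - n_i**2) / (
        tau * (n + p + 2.0 * n_i * np.cosh(trap_offset_ev / KT_EV))
    )

# Assumed illustrative conditions: illuminated absorber with n = p = 1e15 cm^-3,
# n_i = 1e5 cm^-3, tau = 10 ns.
offsets_ev = np.linspace(-0.7, 0.7, 141)
rates = srh_rate(n=1e15, p=1e15, n_i=1e5, tau=1e-8, trap_offset_ev=offsets_ev)

for e in (0.0, 0.5, 0.6, 0.7):
    i = int(np.argmin(np.abs(offsets_ev - e)))
    print(f"E_t - E_i = {e:+.1f} eV -> relative rate {rates[i] / rates.max():.3f}")
```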

Choice of Back Contact Metals

Perovskite solar devices are generally connected to the external circuit through back contacts made of precious metals such as silver and gold. The choice of back contact material therefore has both engineering and economic consequences for device design. In the device simulation, the metal work function, the parameter that characterizes each back contact metal, was varied to analyze its impact on photovoltaic performance. The results in Figure 15 show that back contact metals with work functions above 4.9 eV yield relatively consistent device performance. Silver, a popular back contact metal, has a lower work function of 4.6 eV and should therefore be considered a poor choice for this device.[44,81]

Figure 15

Characteristic parameters: (a) Voc, (b) Jsc, (c) FF, and (d) PCE utilizing different back contact metal work functions.

Quantum Efficiency and the J–V Curve of the Champion Device

Quantum efficiency is a reliable metric of a solar device’s ability to absorb solar energy and convert it into collected charge carriers. To harvest a large share of solar irradiation, a device should have high quantum efficiency for photon energies between about 1.5 and 3 eV.[82] The external quantum efficiency (EQE) as a function of photon energy for the optimized device (optimized through bulk absorber thickness, defect concentration, and back contact metal) is illustrated in Figure 16. The device shows excellent EQE (85–95%) for photons with energies of 2–3.5 eV. These results highlight the favorable absorption potential of this device under normal solar irradiation.
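For readers who think in wavelengths, photon energy and wavelength are related by λ = hc/E, with hc ≈ 1239.84 eV·nm. The short conversion below maps the energy windows quoted above to wavelengths.

```python
HC_EV_NM = 1239.84  # Planck constant times speed of light, in eV*nm

def energy_to_wavelength_nm(energy_ev):
    """Convert photon energy (eV) to vacuum wavelength (nm) via lambda = hc / E."""
    return HC_EV_NM / energy_ev

for e in (1.5, 2.0, 3.0, 3.5):
    print(f"{e:.1f} eV -> {energy_to_wavelength_nm(e):.0f} nm")
```

By this conversion, the 2–3.5 eV window of high EQE corresponds to roughly 354–620 nm, and the recommended 1.5–3 eV range to roughly 413–827 nm.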

Figure 16

External quantum efficiency (EQE) at different photon energy levels for the champion device before and after optimization.

As can be seen from Figure 17, step-by-step modulation of the absorber layer thickness, defect density, and back contact metal has a remarkable impact on the characteristic J–V parameters and the overall performance of the device. The PCE of the fully optimized champion device (16.68%) is significantly higher than that of the unoptimized device (13.29%).
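The size of this improvement can be stated in both absolute and relative terms:

```latex
\Delta\mathrm{PCE} = 16.68\% - 13.29\% = 3.39 \text{ percentage points},
\qquad
\frac{16.68 - 13.29}{13.29} \approx 0.255 \;(\approx 25.5\% \text{ relative improvement})
```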

Figure 17

J–V curve of the optimized and unoptimized devices.

Conclusions

An ML exercise was performed on Cs-perovskite-based solar devices, focusing on key input material parameters with the intent of delineating their impacts on ultimate device PCE. Supervised machine learning was applied to parameters simulated with the SCAPS-1D software, with the theoretical photovoltaic efficiency of the device as the target value. The energy bandgap, hole mobility, and electron affinity of the absorber perovskite were shown to be the most impactful parameters, making them the main areas of focus and modification for obtaining high device PCE. After identifying these impactful parameters and excluding the rest, the decision tree for the model was visualized. It showed that the electron affinity of the perovskite material should be optimized to a specific value while the bandgap and hole mobility are kept within critical inequality ranges. Together, these criteria can lead to improved device PCE. The device performance was further improved by optimizing several parameters, such as bulk absorber thickness, defect concentration, and back contact metal, which raised the overall PCE of the device from 13.29% (unoptimized) to 16.68% (optimized). It is important to note that the current investigation is theoretical in nature; experimental data would further clarify these findings. Nonetheless, the insights provided here should help experimentalists target specific material parameters over others and validate their impacts on device PCE.
