Arun Mannodi-Kanakkithodi, Xiaofeng Xiang, Laura Jacoby, Robert Biegaj, Scott T Dunham, Daniel R Gamelin, Maria K Y Chan.
Abstract
We develop a framework powered by machine learning (ML) and high-throughput density functional theory (DFT) computations for the prediction and screening of functional impurities in group IV, III–V, and II–VI zinc blende semiconductors. Elements spanning the length and breadth of the periodic table are considered as impurity atoms at the cation, anion, or interstitial sites in supercells of 34 candidate semiconductors, leading to a chemical space of approximately 12,000 points, 10% of which are used to generate a DFT dataset of charge-dependent defect formation energies. Descriptors based on tabulated elemental properties, defect coordination environment, and relevant semiconductor properties are used to train ML regression models for the DFT-computed neutral-state formation energies and charge transition levels of impurities. Optimized kernel ridge, Gaussian process, random forest, and neural network regression models are applied to screen impurities with lower formation energy than the dominant native defects in all compounds.
Keywords: combinatorial screening; computational materials science; density functional theory; high-throughput data; machine learning; materials informatics; mid-gap states; point defects; semiconductors
Year: 2022 PMID: 35510195 PMCID: PMC9058924 DOI: 10.1016/j.patter.2022.100450
Source DB: PubMed Journal: Patterns (N Y) ISSN: 2666-3899
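The regression step described in the abstract (descriptors in, DFT-computed energies out) can be sketched with kernel ridge regression, one of the model families it names. This is a generic NumPy sketch, not the authors' implementation: the RBF kernel, the hyperparameters `lam` and `length_scale`, the class name `KernelRidgeSketch`, and the random three-component descriptors are all assumptions standing in for the real elemental, coordination, and host-semiconductor features.

```python
import numpy as np


def rbf_kernel(X1, X2, length_scale=1.0):
    """Gaussian (RBF) kernel between two sets of descriptor vectors (rows)."""
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


class KernelRidgeSketch:
    """Minimal kernel ridge regressor: solve (K + lam * I) alpha = y."""

    def __init__(self, lam=1e-3, length_scale=0.5):
        self.lam = lam
        self.length_scale = length_scale

    def fit(self, X, y):
        self.X_train = np.asarray(X, dtype=float)
        K = rbf_kernel(self.X_train, self.X_train, self.length_scale)
        A = K + self.lam * np.eye(len(self.X_train))
        self.alpha = np.linalg.solve(A, np.asarray(y, dtype=float))
        return self

    def predict(self, X):
        K_star = rbf_kernel(np.asarray(X, dtype=float), self.X_train, self.length_scale)
        return K_star @ self.alpha


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


# Synthetic stand-in data: 200 hypothetical defects with 3 scaled descriptors each.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 - X[:, 2]  # made-up "formation energy"

# Hold out 20% as a test set, mirroring a train/test RMSE evaluation.
X_train, X_test, y_train, y_test = X[:160], X[160:], y[:160], y[160:]
model = KernelRidgeSketch().fit(X_train, y_train)
test_rmse = rmse(y_test, model.predict(X_test))
```

With real descriptors of very different magnitudes, scaling each feature before computing the kernel matters, since a single `length_scale` is shared across all of them.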
Figure 1. Outline and chemical space
(A) The DFT-ML workflow followed in this work, and the semiconductor-impurity chemical space in terms of (B) the cation and anion choices for group IV, II–VI, and III–V compounds, (C) types of defect sites, and (D) impurity atoms selected from across the periodic table.
Figure 2. Comparison of DFT-computed defect levels with experimentally measured levels
Experimental levels are obtained from publications 51–66. RMSE values between measured and DFT-computed levels are shown for each semiconductor type and for the combined set of points. A few defect levels are labeled.
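Figure 2 reports RMSEs per semiconductor type and for the pooled set. A small sketch of that bookkeeping, using made-up (measured, DFT) level pairs rather than the paper's data, shows that the combined value pools all squared errors instead of averaging the per-type RMSEs. The function names are illustrative.

```python
import math
from collections import defaultdict


def rmse(pairs):
    """Root-mean-square difference over (measured, computed) pairs, in eV."""
    return math.sqrt(sum((m - c) ** 2 for m, c in pairs) / len(pairs))


def rmse_by_type(records):
    """records: iterable of (semiconductor_type, measured_eV, dft_eV)."""
    records = list(records)
    groups = defaultdict(list)
    for sc_type, measured, computed in records:
        groups[sc_type].append((measured, computed))
    result = {sc_type: rmse(pairs) for sc_type, pairs in groups.items()}
    # Pool every point for the combined value (not the mean of group RMSEs).
    result["combined"] = rmse([(m, c) for _, m, c in records])
    return result


# Hypothetical levels (eV above the VBM), for illustration only.
levels = [
    ("II-VI", 0.50, 0.60),
    ("II-VI", 1.20, 1.00),
    ("III-V", 0.30, 0.30),
    ("III-V", 0.80, 1.20),
    ("IV", 0.40, 0.10),
]
scores = rmse_by_type(levels)
```

Pooling weights each semiconductor type by its number of measured levels, so a type with many data points dominates the combined RMSE.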
Figure 3. Charge- and Fermi-level-dependent formation energy picture
Computed formation energies of native defects (solid lines) and selected impurities (dashed lines) in (A) ZnSe under Se-rich conditions, (B) AlAs under As-rich conditions, and (C) SiC under Si-rich conditions, as a function of the Fermi level from the VBM (E_F = 0 eV) to the CBM (E_F = experimental band gap). The crossing point of the formation energy lines of the dominant donor-type and acceptor-type native defects (extended as dotted colored lines) approximately gives the equilibrium defect formation energy, and the vertical dotted lines mark the equilibrium Fermi level. Some charge transition levels and neutral-state formation energies are labeled.
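The Fermi-level dependence in Figure 3 follows from each charge state q contributing a straight line of slope q. A minimal sketch, assuming hypothetical formation-energy intercepts at E_F = 0 (not values from the paper), evaluates the lower envelope over charge states and finds the donor-acceptor crossing that the caption uses to approximate the equilibrium Fermi level.

```python
def formation_energy(intercepts, e_fermi):
    """Lowest formation energy (eV) over charge states at Fermi level e_fermi.

    intercepts maps charge q -> formation energy at E_F = 0 (the VBM), so each
    charge state is the straight line dH0 + q * E_F.
    """
    return min(dh0 + q * e_fermi for q, dh0 in intercepts.items())


def donor_acceptor_crossing(donor, acceptor):
    """Crossing of a donor line (q > 0) with an acceptor line (q < 0).

    Each argument is a (q, dH0) pair. Setting dH0_d + q_d*E_F = dH0_a + q_a*E_F
    gives E_F = (dH0_a - dH0_d) / (q_d - q_a).
    Returns (E_F, formation energy) at the crossing.
    """
    (q_d, dh_d), (q_a, dh_a) = donor, acceptor
    if q_d <= 0 or q_a >= 0:
        raise ValueError("expected a positive donor charge and a negative acceptor charge")
    e_f = (dh_a - dh_d) / (q_d - q_a)
    return e_f, dh_d + q_d * e_f


# Hypothetical numbers (eV), not taken from the paper.
band_gap = 2.0
e_f_eq, dh_eq = donor_acceptor_crossing(donor=(2, 1.0), acceptor=(-2, 5.0))
assert 0.0 <= e_f_eq <= band_gap  # the crossing should lie inside the gap
```

A fuller treatment would solve for charge neutrality including free carriers; crossing two single-charge lines is the approximation the caption describes.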
Figure 4. Visualization of DFT data
(A–C) Charge transition levels from ε(+3/+2) to ε(−2/−3) and (D) neutral-state formation energies at A-rich and B-rich chemical potential conditions, plotted for different semiconductor types.
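For reference, these are the standard supercell expressions behind the quantities in Figures 3 and 4, written in the usual convention rather than copied from the paper (its correction term and chemical-potential bounds may differ). Here $E_{D,q}$ is the total energy of the supercell containing defect $D$ in charge state $q$, $E_{\text{bulk}}$ that of the pristine supercell, $n_i$ the number of atoms of species $i$ added ($n_i > 0$) or removed ($n_i < 0$), $\mu_i$ their chemical potentials (set to A-rich or B-rich limits), $E_F$ the Fermi level referenced to the VBM energy $E_{\text{VBM}}$, and $E_{\text{corr}}$ a finite-size correction for charged cells:

$$
\Delta H_{D,q}(E_F,\{\mu_i\}) = E_{D,q} - E_{\text{bulk}} - \sum_i n_i \mu_i + q\,(E_F + E_{\text{VBM}}) + E_{\text{corr}}
$$

$$
\varepsilon(q/q') = \frac{\Delta H_{D,q}(E_F = 0) - \Delta H_{D,q'}(E_F = 0)}{q' - q}
$$

The transition level is the Fermi level at which charge states $q$ and $q'$ have equal formation energy; for adjacent states this gives, for example, $\varepsilon(+1/0) = \Delta H_{D,0}(E_F = 0) - \Delta H_{D,+1}(E_F = 0)$.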
ML test set prediction RMSE values for transition levels
| Property | ML method | II–VI error (eV) | III–V error (eV) | IV–IV error (eV) | Total error (eV) |
|---|---|---|---|---|---|
| ε(+3/+2) | MLR | 0.35 | 0.37 | 0.34 | 0.35 |
| ε(+3/+2) | Ridge | 0.35 | 0.35 | 0.32 | 0.34 |
| ε(+3/+2) | LASSO | 0.36 | 0.36 | 0.32 | 0.35 |
| ε(+3/+2) | Elastic net | 0.35 | 0.35 | 0.32 | 0.34 |
| ε(+3/+2) | RFR | 0.36 | 0.31 | 0.35 | 0.34 |
| ε(+3/+2) | KRR | 0.33 | 0.37 | 0.31 | 0.33 |
| ε(+3/+2) | GPR | 0.32 | 0.36 | 0.32 | 0.33 |
| ε(+3/+2) | NN | 0.29 | 0.36 | 0.29 | 0.31 |
| ε(+2/+1) | MLR | 0.42 | 0.46 | 0.46 | 0.44 |
| ε(+2/+1) | Ridge | 0.42 | 0.43 | 0.45 | 0.43 |
| ε(+2/+1) | LASSO | 0.43 | 0.44 | 0.45 | 0.44 |
| ε(+2/+1) | Elastic net | 0.42 | 0.43 | 0.45 | 0.43 |
| ε(+2/+1) | RFR | 0.39 | 0.36 | 0.40 | 0.38 |
| ε(+2/+1) | KRR | 0.33 | 0.38 | 0.40 | 0.36 |
| ε(+2/+1) | GPR | 0.32 | 0.38 | 0.41 | 0.36 |
| ε(+2/+1) | NN | 0.29 | 0.35 | 0.38 | 0.33 |
| ε(+1/0) | MLR | 0.40 | 0.39 | 0.43 | 0.40 |
| ε(+1/0) | Ridge | 0.40 | 0.38 | 0.42 | 0.40 |
| ε(+1/0) | LASSO | 0.41 | 0.39 | 0.43 | 0.41 |
| ε(+1/0) | Elastic net | 0.40 | 0.38 | 0.42 | 0.40 |
| ε(+1/0) | RFR | 0.38 | 0.36 | 0.39 | 0.38 |
| ε(+1/0) | KRR | 0.31 | 0.34 | 0.38 | 0.33 |
| ε(+1/0) | GPR | 0.29 | 0.32 | 0.38 | 0.32 |
| ε(+1/0) | NN | 0.29 | 0.31 | 0.37 | 0.32 |
| ε(0/–1) | MLR | 0.37 | 0.42 | 0.34 | 0.38 |
| ε(0/–1) | Ridge | 0.37 | 0.40 | 0.34 | 0.37 |
| ε(0/–1) | LASSO | 0.37 | 0.40 | 0.34 | 0.37 |
| ε(0/–1) | Elastic net | 0.37 | 0.40 | 0.34 | 0.37 |
| ε(0/–1) | RFR | 0.37 | 0.33 | 0.35 | 0.35 |
| ε(0/–1) | KRR | 0.32 | 0.36 | 0.32 | 0.33 |
| ε(0/–1) | GPR | 0.31 | 0.34 | 0.32 | 0.32 |
| ε(0/–1) | NN | 0.28 | 0.33 | 0.31 | 0.30 |
| ε(–1/–2) | MLR | 0.33 | 0.38 | 0.30 | 0.33 |
| ε(–1/–2) | Ridge | 0.32 | 0.37 | 0.29 | 0.32 |
| ε(–1/–2) | LASSO | 0.32 | 0.37 | 0.29 | 0.33 |
| ε(–1/–2) | Elastic net | 0.32 | 0.37 | 0.29 | 0.33 |
| ε(–1/–2) | RFR | 0.34 | 0.35 | 0.27 | 0.33 |
| ε(–1/–2) | KRR | 0.29 | 0.32 | 0.27 | 0.29 |
| ε(–1/–2) | GPR | 0.29 | 0.31 | 0.28 | 0.29 |
| ε(–1/–2) | NN | 0.26 | 0.29 | 0.28 | 0.27 |
| ε(–2/–3) | MLR | 0.27 | 0.26 | 0.22 | 0.26 |
| ε(–2/–3) | Ridge | 0.27 | 0.26 | 0.22 | 0.25 |
| ε(–2/–3) | LASSO | 0.27 | 0.26 | 0.22 | 0.25 |
| ε(–2/–3) | Elastic net | 0.27 | 0.26 | 0.22 | 0.25 |
| ε(–2/–3) | RFR | 0.24 | 0.28 | 0.27 | 0.25 |
| ε(–2/–3) | KRR | 0.26 | 0.24 | 0.21 | 0.24 |
| ε(–2/–3) | GPR | 0.25 | 0.24 | 0.21 | 0.24 |
| ε(–2/–3) | NN | 0.25 | 0.22 | 0.22 | 0.24 |
ML test set prediction RMSE values for formation energies
| Property | ML method | II–VI error (eV) | III–V error (eV) | IV–IV error (eV) | Total error (eV) |
|---|---|---|---|---|---|
| ΔH (A rich) | MLR | 0.85 | 1.57 | 1.81 | 1.16 |
| ΔH (A rich) | Ridge | 0.85 | 1.54 | 1.78 | 1.14 |
| ΔH (A rich) | LASSO | 0.88 | 1.55 | 1.79 | 1.16 |
| ΔH (A rich) | Elastic net | 0.85 | 1.53 | 1.78 | 1.14 |
| ΔH (A rich) | RFR | 1.05 | 1.03 | 1.20 | 1.07 |
| ΔH (A rich) | KRR | 0.62 | 1.35 | 1.32 | 0.89 |
| ΔH (A rich) | GPR | 0.59 | 1.33 | 1.71 | 0.96 |
| ΔH (A rich) | NN | 0.62 | 1.30 | 1.40 | 0.89 |
| ΔH (B rich) | MLR | 1.04 | 1.82 | 1.81 | 1.31 |
| ΔH (B rich) | Ridge | 1.04 | 1.73 | 1.77 | 1.29 |
| ΔH (B rich) | LASSO | 1.08 | 1.74 | 1.80 | 1.32 |
| ΔH (B rich) | Elastic net | 1.05 | 1.72 | 1.77 | 1.28 |
| ΔH (B rich) | RFR | 1.09 | 1.25 | 1.52 | 1.18 |
| ΔH (B rich) | KRR | 0.77 | 1.52 | 1.45 | 1.03 |
| ΔH (B rich) | GPR | 0.82 | 1.52 | 1.70 | 1.11 |
| ΔH (B rich) | NN | 0.81 | 1.34 | 1.44 | 1.01 |
Lowest prediction errors.
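The footnote on the formation-energy table refers to the lowest prediction errors. These can be recovered directly from the values: this sketch copies the test-set RMSEs from that table (two-decimal values, so ties are exact) and reports the best model or models per column. The names `RMSE_TABLE` and `best_models` are illustrative.

```python
# Test-set RMSE values (eV) copied from the formation-energy table.
COLUMNS = ("II-VI", "III-V", "IV-IV", "Total")
RMSE_TABLE = {
    "A rich": {
        "MLR": (0.85, 1.57, 1.81, 1.16),
        "Ridge": (0.85, 1.54, 1.78, 1.14),
        "LASSO": (0.88, 1.55, 1.79, 1.16),
        "Elastic net": (0.85, 1.53, 1.78, 1.14),
        "RFR": (1.05, 1.03, 1.20, 1.07),
        "KRR": (0.62, 1.35, 1.32, 0.89),
        "GPR": (0.59, 1.33, 1.71, 0.96),
        "NN": (0.62, 1.30, 1.40, 0.89),
    },
    "B rich": {
        "MLR": (1.04, 1.82, 1.81, 1.31),
        "Ridge": (1.04, 1.73, 1.77, 1.29),
        "LASSO": (1.08, 1.74, 1.80, 1.32),
        "Elastic net": (1.05, 1.72, 1.77, 1.28),
        "RFR": (1.09, 1.25, 1.52, 1.18),
        "KRR": (0.77, 1.52, 1.45, 1.03),
        "GPR": (0.82, 1.52, 1.70, 1.11),
        "NN": (0.81, 1.34, 1.44, 1.01),
    },
}


def best_models(table):
    """For each (condition, column), return (lowest RMSE, sorted list of models achieving it)."""
    best = {}
    for condition, rows in table.items():
        for j, column in enumerate(COLUMNS):
            low = min(errors[j] for errors in rows.values())
            winners = sorted(m for m, errors in rows.items() if errors[j] == low)
            best[(condition, column)] = (low, winners)
    return best


best = best_models(RMSE_TABLE)
```

On these numbers, KRR and NN tie for the lowest total RMSE under A-rich conditions (0.89 eV), NN is lowest under B-rich conditions (1.01 eV), and RFR gives the lowest III–V error under both conditions.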
Figure 5. Parity plots for the best regression models
Parity plots for (A) random forest, (B) Gaussian process, and (C) neural network (NN) regression, plotted for different semiconductor types.
Figure 6. Gaussian process regression: error versus uncertainty
Prediction uncertainty as a function of absolute prediction error for (A) Gaussian process and (B) NN regression, plotted for different semiconductor types.
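Figure 6 relies on models that report an uncertainty with each prediction; for a Gaussian process this comes from the posterior variance. A minimal NumPy sketch, assuming a zero-mean GP with a squared-exponential kernel and fixed, hand-picked hyperparameters (the paper's kernel choice and hyperparameter optimization are not reproduced here):

```python
import numpy as np


def rbf_kernel(X1, X2, length_scale, variance):
    """Squared-exponential kernel k(x, x') = variance * exp(-|x - x'|^2 / (2 l^2))."""
    sq = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2.0 * X1 @ X2.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_predict(X_train, y_train, X_test, length_scale=0.2, variance=1.0, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP regressor."""
    K = rbf_kernel(X_train, X_train, length_scale, variance)
    L = np.linalg.cholesky(K + noise * np.eye(len(X_train)))  # K + noise*I = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # (K + noise*I)^-1 y
    K_s = rbf_kernel(X_test, X_train, length_scale, variance)
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    # Latent-function variance: prior variance minus what the data explain.
    var = variance - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


# Toy 1D example: dense data on [0, 1], one query inside and one far outside.
X_train = np.linspace(0.0, 1.0, 20)[:, None]
y_train = np.sin(2.0 * np.pi * X_train[:, 0])
X_query = np.array([[0.5], [3.0]])
mean, std = gp_predict(X_train, y_train, X_query)
```

The standard deviation is small where training data are dense and reverts to the prior value far from them, which is what makes an error-versus-uncertainty comparison informative. The posterior mean coincides with a kernel ridge prediction whose regularization equals the noise variance.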
Figure 7. ML model performance comparison
Performance of the ML models by semiconductor type, in terms of (A) prediction RMSE, and screening accuracy, precision, and recall for (B) dominating impurities and (C) low-energy impurities with mid-gap energy levels, at A-rich and B-rich chemical potential conditions.
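Figure 7B and C score the ML screening against DFT. Treating DFT as the reference label, the three metrics reduce to counts of true and false positives and negatives. A minimal sketch with hypothetical labels; the function name and the choice to return 0.0 for an undefined precision or recall are assumptions.

```python
def screening_scores(dft_flags, ml_flags):
    """Accuracy, precision and recall of ML screening, with DFT as reference.

    Each flag is True if that method says the impurity passes the screen
    (for example, it is a dominating impurity).
    """
    if not dft_flags or len(dft_flags) != len(ml_flags):
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(dft_flags, ml_flags))
    tp = sum(1 for d, m in pairs if d and m)      # both flag it
    fp = sum(1 for d, m in pairs if m and not d)  # ML flags it, DFT does not
    fn = sum(1 for d, m in pairs if d and not m)  # DFT flags it, ML misses it
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical screening outcomes for eight impurities.
dft = [True, True, True, False, False, False, False, True]
ml = [True, True, False, True, True, False, False, True]
accuracy, precision, recall = screening_scores(dft, ml)
```

Here accuracy is 0.625, precision 0.6 and recall 0.75: precision penalizes false alarms (impurities ML flags but DFT rejects), while recall penalizes impurities ML misses.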
Selected dominating impurities identified by both DFT and ML (GPR) at A-rich chemical potential conditions
| Semiconductor | Impurity | Shift in equilibrium E_F | Dominating defects | Mid-gap level? |
|---|---|---|---|---|
| CdS | In_Cd | n-type | In_Cd, q = 1 and V_Cd, q = −2 | Y |
| CdS | I_S | n-type | I_S, q = 1 and V_Cd, q = −3 | Y |
| CdS | Ti_i | p-type | Ti_i, q = 2 and V_S, q = −1 | Y |
| CdSe | Cu_Cd | p-type | Cu_Cd, q = −1 and Cd_i, q = 2 | Y |
| CdSe | F_i | p-type | F_i, q = −1 and V_Se, q = 2 | N |
| CdSe | Ni_i | p-type | Ni_i, q = −1 and V_Se, q = 2 | Y |
| CdTe | Bi_Cd | n-type | Bi_Cd, q = 1 and V_Cd, q = −2 | Y |
| CdTe | As_Te | p-type | As_Te, q = −1 and V_Te, q = 2 | Y |
| CdTe | Na_i | n-type | Na_i, q = 1 and V_Cd, q = −2 | N |
| ZnS | Li_i | n-type | Li_i, q = 1 and V_Zn, q = −2 | N |
| ZnS | Ti_i | n-type | Ti_i, q = 1 and V_Zn, q = −2 | Y |
| ZnSe | Al_Zn | n-type | Al_Zn, q = 1 and V_Zn, q = −2 | Y |
| ZnSe | Br_Se | n-type | Br_Se, q = 1 and Zn_Se, q = −1 | Y |
| ZnTe | Cr_i | n-type | Cr_i, q = 1 and V_Te, q = −2 | N |
| ZnTe | Mn_i | n-type | Mn_i, q = 1 and Zn_Te, q = −2 | Y |
| AlN | Se_N | p-type | Se_N, q = −1 and V_N, q = 1 | Y |
| AlP | Hf_Al | n-type | Hf_Al, q = 1 and Al_P, q = −1 | Y |
| AlP | Cr_i | n-type | Cr_i, q = 1 and V_Al, q = −2 | Y |
| AlAs | Ti_Al | n-type | Ti_Al, q = 1 and V_As, q = −3 | Y |
| GaN | Tl_Ga | p-type | Tl_Ga, q = −1 and V_N, q = 1 | Y |
| GaN | P_N | p-type | P_N, q = −2 and V_N, q = 1 | Y |
| GaP | Ni_Ga | p-type | Ni_Ga, q = −1 and Ga_i, q = 2 | Y |
| GaP | Li_i | n-type | Li_i, q = 1 and Ga_P, q = −2 | Y |
| GaAs | Sc_i | n-type | Sc_i, q = 3 and Ga_As, q = −2 | Y |
| GaSb | Al_Ga | n-type | Al_Ga, q = 1 and V_Ga, q = −2 | Y |
| InN | Zr_i | n-type | Zr_i, q = 2 and V_N, q = −1 | Y |
| InP | Cu_i | n-type | Cu_i, q = 1 and In_P, q = −2 | Y |
| InAs | Ca_In | p-type | Ca_In, q = −1 and In_As, q = 2 | N |
| Si | Ti_Si | p-type | Ti_Si, q = −1 and Si_i, q = 2 | Y |
| Si | Be_i | n-type | Be_i, q = 1 and V_Si, q = −3 | Y |
| SiC | V_Si | n-type | V_Si, q = 1 and V_C, q = −2 | Y |
| SiC | Cr_i | p-type | Cr_i, q = −1 and V_C, q = 1 | Y |
| SnC | As_Sn | n-type | As_Sn, q = 1 and V_C, q = −2 | N |
| SnC | Cr_Sn | p-type | Cr_Sn, q = −1 and V_C, q = 2 | N |
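The screening criterion from the abstract (impurities with lower formation energy than the dominant native defects) and the "Shift in equilibrium E_F" column can be illustrated with a simplified rule. This sketch is an assumption, not the paper's procedure: the helper names are hypothetical, and it only compares the impurity's most stable charge state with the native equilibrium point, ignoring cases where the compensating native defect also changes.

```python
def lowest_state(intercepts, e_fermi):
    """(formation energy, charge) of the most stable charge state at e_fermi."""
    return min((dh0 + q * e_fermi, q) for q, dh0 in intercepts.items())


def screen_impurity(intercepts, eq_fermi, eq_energy):
    """Hypothetical dominance check against the native-defect equilibrium.

    intercepts: charge q -> impurity formation energy (eV) at E_F = 0
    eq_fermi, eq_energy: native equilibrium Fermi level and formation energy (eV)
    Returns (dominates, shift): shift is "n-type" for a dominating donor,
    "p-type" for a dominating acceptor, and None otherwise.
    """
    energy, q = lowest_state(intercepts, eq_fermi)
    dominates = energy < eq_energy
    if not dominates or q == 0:
        return dominates, None
    # A donor below the native donor line moves the crossing toward the CBM;
    # an acceptor below the native acceptor line moves it toward the VBM.
    return dominates, ("n-type" if q > 0 else "p-type")


# Hypothetical native equilibrium: E_F = 1.0 eV, formation energy 3.0 eV.
donor = screen_impurity({1: 1.5, 0: 3.2}, 1.0, 3.0)
acceptor = screen_impurity({-1: 3.0, 0: 3.5}, 1.0, 3.0)
benign = screen_impurity({1: 3.0, 0: 4.0}, 1.0, 3.0)
```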
Figure 8. Defect formation energies from DFT and ML
Comparison of the complete charge- and Fermi-level-dependent formation energy picture of selected impurities from DFT (solid lines) and GPR (dashed lines) for (A) CdTe at Cd-rich conditions, (B) ZnS at S-rich conditions, (C) AlAs at As-rich conditions, (D) GaP at Ga-rich conditions, (E) Si at Si-rich conditions, and (F) SiC at C-rich conditions. The dominant donor- and acceptor-type native defects are also shown.
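Figure 8 compares full formation energy curves, while the ML models in the tables predict only the neutral-state formation energy and the transition levels ε(+3/+2) through ε(−2/−3). One plausible way to rebuild the curves from those predictions (an assumption about the procedure, with hypothetical numbers and function names) is to step outward from the neutral state using ΔH(q−1) = ΔH(q) + ε(q/q−1) at E_F = 0.

```python
def intercepts_from_levels(dh_neutral, levels):
    """Formation energy at E_F = 0 for each charge state reachable from q = 0.

    dh_neutral: neutral-state formation energy (eV)
    levels: maps (q, q - 1) -> transition level eps(q/q-1), eV above the VBM,
            e.g. {(1, 0): 0.8, (0, -1): 1.5}
    """
    dh = {0: dh_neutral}
    q = 0
    while (q + 1, q) in levels:  # walk up through positive charge states
        dh[q + 1] = dh[q] - levels[(q + 1, q)]
        q += 1
    q = 0
    while (q, q - 1) in levels:  # walk down through negative charge states
        dh[q - 1] = dh[q] + levels[(q, q - 1)]
        q -= 1
    return dh


def formation_energy(intercepts, e_fermi):
    """Lowest formation energy over charge states at Fermi level e_fermi."""
    return min(dh0 + q * e_fermi for q, dh0 in intercepts.items())


# Hypothetical ML outputs for one impurity (eV).
dh = intercepts_from_levels(2.0, {(1, 0): 0.8, (0, -1): 1.5})
curve = [formation_energy(dh, 0.1 * i) for i in range(21)]  # E_F from 0 to 2 eV
```

By construction, adjacent charge-state lines cross exactly at the supplied transition levels, so errors in the predicted levels shift the kinks of the curve while an error in the neutral-state energy shifts the whole curve vertically.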