
Efficient Adversarial Generation of Thermally Activated Delayed Fluorescence Molecules.

Zheng Tan¹, Yan Li², Ziying Zhang³, Xin Wu², Thomas Penfold⁴, Weimei Shi¹, Shiqing Yang¹.

Abstract

Adversarial generative models are becoming an essential tool in molecular design and discovery due to their efficiency in exploring the desired chemical space with the assistance of deep learning. In this article, we introduce an integrated framework by combining the modules of algorithmic synthesis, deep prediction, adversarial generation, and fine screening for the purpose of effective design of the thermally activated delayed fluorescence (TADF) molecules that can be used in the organic light-emitting diode devices. The retrosynthetic rules are employed to algorithmically synthesize the D-A complex based on the empirically defined donor and acceptor moieties, which is followed by the high-throughput labeling and prediction with the deep neural network. The new D-A molecules are subsequently generated via the adversarial autoencoder, with the excited-state property distributions perfectly matching those of the original samples. Fine screening of the generated molecules, including the spin-orbital coupling calculation and the excited-state optimization, is eventually implemented to select the qualified TADF candidates within the novel chemical space. Further investigation shows that the created structures fully mimic the original D-A samples by maintaining a significant charge transfer characteristic, a minimal adiabatic singlet-triplet gap, and a moderate spin-orbital coupling that are desirable for the delayed fluorescence.


Year: 2022 PMID: 35664624 PMCID: PMC9161419 DOI: 10.1021/acsomega.2c02253

Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343

Introduction

Thermally activated delayed fluorescence (TADF) molecules are widely regarded as highly promising materials for high-efficiency organic light-emitting diode (OLED) devices.[1−11] In TADF, the nonemissive triplet excitons are harvested via reverse intersystem crossing (RISC), provided the singlet–triplet gap is small enough that RISC can be thermally activated. These molecular systems are considered superior to conventional fluorescent and phosphorescent emitters, as they can simultaneously deliver enhanced internal quantum efficiency and cost-effective molecular design without the need for heavy atoms. However, the design of TADF molecules is challenging since it appears to require the optimization of two opposing quantities, under the approximation that the lowest singlet (S1) and triplet (T1) states have a charge transfer characteristic and are dominated by transitions between the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO). The oscillator strength f, which corresponds to the radiative rate and therefore the lifetime of S1, is proportional to the square of the transition dipole moment $\int \varphi_a(\mathbf{r})\,\mathbf{r}\,\varphi_b(\mathbf{r})\,d^3r$, where $\varphi_a$ and $\varphi_b$ are the wavefunctions of the initial and final states, respectively. An efficient emission requires f to be maximized. This points toward a strong overlap of the frontier transition orbitals, i.e., HOMO and LUMO. On the other hand, the energy gap (ΔEST) between S1 and T1 corresponds to the exchange integral $\int \varphi_a(\mathbf{r}_1)\,\varphi_b(\mathbf{r}_2)\,(1/r_{12})\,\varphi_a(\mathbf{r}_2)\,\varphi_b(\mathbf{r}_1)\,d^3r_1\,d^3r_2$, where $\varphi_a$ and $\varphi_b$ are the initial and final wavefunctions, respectively, and $r_{12}$ is the interelectronic distance. Qualified TADF candidates need a large RISC rate, which calls for minimizing ΔEST and hence reducing the overlap of the frontier orbitals. Extensive research in the past decade has been dedicated to the exploration of molecular search space to deliver emitters exhibiting strong delayed fluorescence.
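To give a feeling for why a small singlet–triplet gap matters for thermal activation, the following minimal sketch evaluates a Boltzmann factor exp(−ΔEST/kBT) for a few gap values. Treating the RISC rate as scaling with this factor is a simplifying assumption made here for illustration only; the function name and the chosen gap values are hypothetical and are not taken from the article.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def boltzmann_factor(gap_ev: float, temperature_k: float = 300.0) -> float:
    """Return exp(-gap / (k_B * T)), an assumed Arrhenius-type proxy for thermal activation."""
    return math.exp(-gap_ev / (K_B_EV_PER_K * temperature_k))


# Illustrative gaps in eV (hypothetical values chosen for this sketch).
for gap in (0.05, 0.10, 0.25, 0.40):
    print(f"gap = {gap:.2f} eV -> Boltzmann factor = {boltzmann_factor(gap):.2e}")
```

Under this assumption the factor falls by several orders of magnitude between 0.05 and 0.40 eV at room temperature, which is the qualitative reason the gap has to be kept small.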
A large family of TADF compounds has been experimentally[12−14] and computationally[15,16] uncovered. Currently, most of the TADF architectures comprise electron donor and acceptor moieties. The molecular frontier orbitals, i.e., HOMO and LUMO, are deemed capable of realizing an efficient separation through the donor–acceptor dihedral twisting provided that the HOMO–LUMO transition constitutes the major transition in the first excited state, which is mostly the case in TADF. By having a small ΔEST, the donor, acceptor, and linker are carefully modulated to keep an acceptable oscillator strength and a large enough torsional angle in the meantime. To date, most of the design of new materials has tended to focus on large-scale synthetic programs and trial and error. Given the size of chemical space, this is both inefficient and unlikely to lead to the development of transformative new molecules as it will be influenced by human bias. In recent years, approaches based upon high-throughput screening and the relevant machine learning techniques have been introduced to expand the possible TADF chemical space.[15,17,18] Gómez-Bombarelli et al.[17] built a virtual library based on the donor–(bridge)–acceptor architecture, where time-dependent density functional theory (TDDFT) was used to label the molecular excited-state properties and deep neural network (DNN) model was employed to screen the unknown candidates. Zhao et al.[18] have implemented a high-throughput virtual screening over the Cambridge Structural Database; interestingly, novel TADF emitters that go beyond the conventional donor–acceptor structure were identified leading to diversified design rules for the delayed fluorescent molecules. Adversarial modeling is a rapidly developing field in machine learning, and it has led to a variety of great successes when applied to image generation, high-quality speech construction, and text composition. 
It is also very attractive to generative chemistry, since the method can learn the properties of specific real training examples and then automatically generate new synthetic entities with similar characteristics.[19] Several groups in industry and academia have reported the use of adversarial models to inversely design the drug compound and optoelectronic materials with desired pharmacological and physicochemical properties[20−26] by employing adversarial autoencoder (AAE), generative adversarial network (GAN), reinforcement learning assisted GAN, etc. To further expand the TADF search space, this article introduces an alternative framework by integrating algorithmic synthesis, deep prediction, adversarial generation, and fine screening for the efficient computational design of molecules with preferentially excited-state properties. Given the empirically defined donor and acceptor fragments, we employ the breaking of retrosynthetically interesting chemical substructures[27] (BRICS) route to algorithmically synthesize 160671 D–A complexes. Part of the synthesized samples is selected for TDDFT calculations and labeled with the excited-state properties. We subsequently train a DNN model based on the labeled samples. The developed model is then used to predict the properties of generated molecules. The synthesized D–A chemical space is then fed into an adversarial autoencoder to generate molecules with similar photophysical properties. Fine screening is finally implemented to screen the generated samples for qualified TADF candidates, where extensive quantum chemical computations are carried out with the molecular excited-state geometry, spin–orbital coupling (SOC), frontier orbital transition, and adiabatic gap being examined. We find that the generated molecules can bear almost the same distributions of the excited-state properties as the original samples. 
Based on the structural similarity, the adversarial generation can perfectly reproduce the charge-transfer characteristics and the minimal adiabatic gap, pointing toward a feasible design philosophy for the TADF emitter. The paper is organized as follows. The second section is dedicated to the methodology including the four modules we have for the adversarial generation. The Results and Discussion Section presents the empirical results and discusses the main findings of our study. A conclusion is drawn in the end.

Methodology

Algorithmic Synthesis

As depicted in Figure 1, 100 acceptors and 100 donors are first empirically defined (Tables S1 and S2 in the Supporting Information). The donors and acceptors follow the molecular components used by Gómez-Bombarelli et al.[17] The BRICS[27] rule is employed to construct the molecular fragments by introducing link atoms according to the BRICS chemical environments (16 environments in total). A D–A complex is then synthesized in accordance with the retrosynthetic principle given the constructed donor and acceptor fragments. For example, the fragments shown in Figure 1a represent the 16 chemical environments (aromatic ring system), which are indicated by the link atoms at the respective reactive sites, where the combination of the two fragments leads to a synthetic D–A sample. In principle, there are a variety of combination modes for TADF: D–A, D–A–D, A–D–A, D–bridge–A, etc. For the sake of simplicity, we only build D1A1 molecules to examine how closely the adversarial generation profile can mimic the original chemical space. In total, 160671 samples are algorithmically synthesized.
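The combinatorial step can be sketched without a cheminformatics toolkit by pairing donor and acceptor reactive sites whose link environments are allowed to bond. Everything below is a hypothetical illustration: the fragment names, the site labels, and the small `COMPATIBLE` set are assumptions standing in for the real fragment lists and the full BRICS compatibility rules. In practice the joining would be done on molecular graphs (e.g., with RDKit) and duplicates would be removed after canonicalization.

```python
# Hypothetical fragments: name -> link-environment label at each reactive site.
donors = {"carbazole": [16, 16], "phenothiazine": [16]}
acceptors = {"benzonitrile": [16], "triazine": [14, 14]}

# Illustrative subset of bondable environment pairs (assumption, not the full BRICS table).
COMPATIBLE = {(16, 16), (14, 16)}


def can_link(env_a: int, env_b: int) -> bool:
    """Check whether two link environments may form a bond (order-independent)."""
    return (min(env_a, env_b), max(env_a, env_b)) in COMPATIBLE


def enumerate_d1a1(donors, acceptors):
    """List every (donor, donor site, acceptor, acceptor site) combination that can bond."""
    pairs = []
    for d_name, d_sites in donors.items():
        for d_idx, d_env in enumerate(d_sites):
            for a_name, a_sites in acceptors.items():
                for a_idx, a_env in enumerate(a_sites):
                    if can_link(d_env, a_env):
                        pairs.append((d_name, d_idx, a_name, a_idx))
    return pairs


pairs = enumerate_d1a1(donors, acceptors)
print(len(pairs), "site-level D1A1 combinations")
```

Because each fragment can carry more than one reactive site, the number of site-level combinations can exceed the simple product of the donor and acceptor counts; this is one plausible reason, offered as an assumption, why the synthesized space is larger than 100 × 100.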

Figure 1

Integrated modules for the generation and screening of TADF molecules. (a) Algorithmic synthesis for the D–A complexes using combinatorial chemistry. (b) Deep prediction for the excited-state properties based on molecular ECFP fingerprints. (c) Adversarial generation for the TADF chemical space. (d) Fine screening of the generated molecules based on high-throughput TDDFT calculations.

Deep Prediction

A total of 13594 molecules within the D1A1 chemical space are selected for the excited-state calculations. The samples are first processed with RDKit for the initial geometry optimization using the MMFF94 force field.[28] Ground-state geometries are reoptimized at the B3LYP/6-31G(d,p) level using Gaussian 16.[29] Excited-state calculations are carried out with TDDFT/B3LYP/6-31G(d,p) also using Gaussian 16. The excited-state properties at the S0 geometry, including the first three singlet and triplet excited-state energies, the first three singlet oscillator strengths, and the first singlet–triplet splitting, are exported for the sample labeling. The 13594 labeled molecules are partitioned as 8/1/1 into training, validation, and test sets to be used for the DNN training, hyperparametrization, and generalization, respectively. The extended-connectivity fingerprints (ECFPs) are applied to represent the topological characteristics of molecules.[30] We employ ECFP with 2048 bits for the machine learning training input and define the iteration number as 2 during the fingerprint generation. The network topology is set as Input-2048-1024-Output, where the input is 2048, denoting the molecular fingerprint, and the output is 1, representing the regressed target. The learning rate and the maximal training epoch are set to be 0.001 and 400, respectively. We take on an early stop by selecting the epoch number for generalization when the minimum of the loss function is reached in the validation set. Ten single-target DNN models are trained for each excited-state property and further employed as predictors for the property forecasting of generated samples.
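The data partition and the early-stopping rule described above can be sketched as follows. The exact set sizes, the random seed, and the rounding convention are assumptions (the article only states the 8/1/1 ratio), and the validation losses are made-up numbers used purely to exercise the function.

```python
import random


def split_8_1_1(items, seed=0):
    """Shuffle and split into train/validation/test with an 8/1/1 ratio (rounding is an assumption)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(0.8 * len(items))
    n_val = int(0.1 * len(items))
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def best_epoch(val_losses):
    """Return the 1-based epoch at which the validation loss is lowest."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1


train, val, test = split_8_1_1(range(13594))
print(len(train), len(val), len(test))

# Hypothetical validation-loss curve; training would run for at most 400 epochs.
print(best_epoch([0.90, 0.50, 0.40, 0.45, 0.60]))
```

Selecting the epoch after the fact, as done here, is only one way to realize early stopping; a patience-based stop during training would be an equally reasonable reading of the description.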

Adversarial Generation

In our study, an adversarial autoencoder is chosen for molecular generation. The model basically follows the generation frameworks in MOSES[31] using a simplified molecular-input line-entry system[32] (SMILES) as input and output representations. Two neural networks, an encoder and a decoder, are utilized to map the high-dimensional string-based data onto the low-dimensional vector-based latent space. The encoder is a one-layer bidirectional gated recurrent unit (GRU) of 256 hidden dimensions, while the decoder is a three-layer GRU of 512 hidden dimensions. Note that a one-hot embedding layer is applied before the encoder to convert the character sequence into numerical input. Teacher forcing[33] is employed in the decoder to lower the autoencoder reconstruction loss. As shown in Figure 1c, an auxiliary discriminator network is trained to distinguish samples from a Gaussian prior distribution and the latent space. The encoder then adapts its latent space to minimize the discriminator’s predictive accuracy. The training process alternates between training the encoder–decoder pair and the discriminator until the latent space and the Gaussian prior are indistinguishable. The latent space is of dimension 128, and the discriminator network is a two-layer fully connected neural network with 512 and 256 nodes, respectively. The training is performed for 60 epochs, with a learning rate of 0.0005 and a batch size of 128.
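The one-hot step that precedes the encoder can be illustrated with a character-level sketch. The vocabulary construction, the `<bos>`/`<eos>` markers, and the function names are assumptions made for this example and are not claimed to reproduce the tokenization used in the study; the SMILES string is mol_11 from Table 2.

```python
def build_vocab(smiles_list):
    """Map every character seen in the SMILES strings, plus start/end markers, to an index."""
    chars = sorted({ch for smiles in smiles_list for ch in smiles})
    tokens = ["<bos>", "<eos>"] + chars
    return {tok: idx for idx, tok in enumerate(tokens)}


def one_hot_encode(smiles, vocab):
    """Return a list of one-hot rows, one per token, framed by <bos> and <eos>."""
    tokens = ["<bos>"] + list(smiles) + ["<eos>"]
    rows = []
    for tok in tokens:
        row = [0] * len(vocab)
        row[vocab[tok]] = 1
        rows.append(row)
    return rows


smiles = "O=C(c1ccc(-n2c3ccccc3c3nc[nH]c23)cc1)C(F)(F)F"
vocab = build_vocab([smiles])
encoded = one_hot_encode(smiles, vocab)
print(len(encoded), "tokens x", len(vocab), "vocabulary entries")
```

A character-level scheme splits multi-character atoms such as Cl or Br into two tokens; whether that matters depends on the fragment set, so it is flagged here rather than resolved.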

Fine Screening

The generated samples from AAE are first fed into the pretrained DNN predictors with the created ECFP fingerprint to obtain the predicted excited-state properties. Part of the samples is selected according to the initial screening criterion where ΔEST < 0.4 eV, and the first singlet oscillator strength S1_f > 0.02. We set the gap criterion relatively loosely so that further screening can have more space to choose proper candidates. Three hundred thirty-five molecules from the 62 276 generated samples are screened in this step. High-throughput calculations are implemented for these 335 molecules with the same procedure as in the section of the deep prediction to acquire the TDDFT properties at the S0 geometry. The same screening criterion (ΔEST < 0.4 eV and S1_f > 0.02) is again applied to the 335 molecules based on the “true” excited-state properties, where 67 samples are selected. The selected samples are further processed with the SOC calculations (at the S0 geometry), which are implemented with ORCA[34] at the level of B3LYP/def2-SVP. Forty-seven candidates that satisfy the SOC conditions are finally picked out for excited-state optimizations, where S1 and T1 geometries are optimized at the B3LYP/def2-SVP level using Gaussian 16. We also perform TDDFT with the Tamm–Dancoff approximation[35] (TDA) in the excited-state calculations to see whether it can be helpful to reduce the problem of triplet instability.[36] More details in fine screening are discussed in the following sections.
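The vertical-gap and oscillator-strength filter can be written as a small predicate. The thresholds come from the text above; the candidate records and their property values are hypothetical placeholders, and the stage counts are the ones quoted in this section.

```python
def passes_initial_screen(delta_est_ev, s1_f, gap_max=0.4, f_min=0.02):
    """Apply the loose criterion: singlet-triplet gap below gap_max and S1 oscillator strength above f_min."""
    return delta_est_ev < gap_max and s1_f > f_min


# Hypothetical predicted properties for three generated molecules.
candidates = [
    {"id": "gen_a", "delta_est": 0.12, "s1_f": 0.05},
    {"id": "gen_b", "delta_est": 0.45, "s1_f": 0.30},
    {"id": "gen_c", "delta_est": 0.10, "s1_f": 0.01},
]
selected = [c for c in candidates if passes_initial_screen(c["delta_est"], c["s1_f"])]
print([c["id"] for c in selected])

# Stage counts quoted in the text: generated -> DNN screen -> TDDFT screen -> SOC screen.
funnel = [62276, 335, 67, 47]
for before, after in zip(funnel, funnel[1:]):
    print(f"{before} -> {after} ({after / before:.2%} retained)")
```

The same predicate is applied twice in the workflow, first to DNN-predicted values and then to TDDFT-calculated ones, which is why it is kept independent of where the numbers come from.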

Results and Discussion

Excited-State Property Predictions

Figure 2 presents the DNN prediction qualities on the synthesized D1A1 test set (10% of the 13594 labeled samples). The out-of-sample precisions are acceptable for both singlet and triplet energies, with the forecasting accuracy gradually diminishing for the higher excited states (as seen in Table S3 in the Supporting Information). The observation is consistent with the previous report[37] where the prediction error is found to be larger as the excited state goes higher. The prediction R of ΔEST is 0.8636 with a root-mean-square error (RMSE) of 0.0814 eV, which is comparable with the predictive error of the TADF rate constant given in Gómez-Bombarelli et al.[17] The forecasting of singlet oscillator strengths is less accurate, especially for the higher excited states. Fortunately, the two most important properties, the singlet–triplet gap and the first singlet oscillator strength, are well fitted with R above 0.85, demonstrating that the current model can capture the characteristics of delayed fluorescence.
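The two quality measures quoted above can be computed as in the sketch below. It assumes that R denotes the Pearson correlation coefficient between calculated and predicted values, which the text does not state explicitly; the sample arrays are hypothetical.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical singlet-triplet gaps in eV: TDDFT-calculated vs DNN-predicted.
calculated = [0.10, 0.35, 0.60, 0.85, 1.10]
predicted = [0.15, 0.30, 0.70, 0.80, 1.00]
print(f"R = {pearson_r(calculated, predicted):.4f}, RMSE = {rmse(calculated, predicted):.4f} eV")
```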

Figure 2

DNN predictions of the excited-state properties of TADF molecules. S1, S2, S3, T1, T2, and T3 represent the first three singlet and triplet excited-state energies, respectively, while f denotes the corresponding oscillator strength and ΔEST the first singlet–triplet gap. Note that the true data indicate the high-throughput calculations of molecular properties. The blue line is the identity mapping. The singlet and triplet energies and the energy difference are in units of eV, while f is dimensionless.

Note that, as exhibited in Figure 2, there are a number of samples whose “true” oscillator strength is zero while the predicted value is not, implying a reduced forecasting accuracy for molecules with a null f. We carefully checked the molecular geometry and found that the samples with a null f mostly possess a near-90° dihedral twisting between the donor and acceptor. The weakened prediction quality can possibly be attributed to the employed ECFP fingerprint, which is essentially two-dimensional and thus cannot incorporate the three-dimensional (3D) twisting information (known to be important to the transition probability). The analysis indicates that further improvement of the excited-state forecasting accuracy may necessitate the 3D information to be encoded in the model. To examine whether the training set size is adequate for predicting the out-of-sample properties, we vary the training set size for the model training and apply the model to predict the same test set. The prediction R is found to converge for both S1_f and ΔEST (Figure S1 in the Supporting Information), implying that the current labeled data set is sufficient to train the model with acceptable generalizability.

Adversarial Generation Profiles

A total of 160671 original D1A1 molecules are put into the AAE for the generative training. The generation performance is measured in terms of validity, novelty, and uniqueness. Validity represents the fraction of molecules within the generated samples that pass RDKit’s sanitization check, so that the atomic valency and the consistency of bonds in aromatic rings are maintained. Novelty is the fraction of the generated molecules that are not present in the training set. Uniqueness is the fraction of the generated samples that are unique. We compute the overall generation efficiency for molecules that simultaneously fulfill the above three criteria, given different generation sizes. As Table 1 shows, the validity and novelty do not vary significantly with the generation size, while the uniqueness is found to decline when more molecules are generated. The overall generation efficiency reaches a level of 0.3114 when 200 K samples are produced, which is equivalent to 62276 molecules being effectively generated.

Table 1

Performance Metrics for the Adversarial Autoencoder Model Applied on the D1A1 Data Set: Fraction of Valid, Novel, Unique Molecules and the Overall Generation Efficiency Fulfilling the above Three Criteria Given Different Generation Sizes

generation size	validity	novelty	uniqueness	overall efficiency
1 K	0.8280	0.6490	0.9990	0.4750
10 K	0.8079	0.6587	0.9671	0.4460
100 K	0.8135	0.6503	0.8070	0.3555
200 K	0.8143	0.6537	0.7228	0.3114
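The metrics in Table 1 can be illustrated on a toy set of strings. The denominators used below (every metric divided by the total number of generated strings) are an assumption, since the text does not spell them out, and the parenthesis check is only a hypothetical stand-in for RDKit's sanitization; real use would also canonicalize the SMILES before comparing them.

```python
def balanced_parentheses(smiles):
    """Hypothetical stand-in for a real validity check such as RDKit sanitization."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def generation_metrics(generated, training_set, is_valid):
    """Fractions of valid, novel, and unique strings, plus those meeting all three criteria."""
    total = len(generated)
    valid = [s for s in generated if is_valid(s)]
    novel = [s for s in generated if s not in training_set]
    effective = {s for s in valid if s not in training_set}  # valid, novel, de-duplicated
    return {
        "validity": len(valid) / total,
        "novelty": len(novel) / total,
        "uniqueness": len(set(generated)) / total,
        "overall": len(effective) / total,
    }


generated = ["CCO", "CCO", "c1ccccc1", "C(C", "CCN"]
metrics = generation_metrics(generated, {"CCO"}, balanced_parentheses)
print(metrics)

# Consistency check on the reported numbers: 62276 effective molecules out of 200 K.
print(round(62276 / 200000, 4))
```

Note that the overall efficiencies in Table 1 are not simply the product of the three individual fractions, so the three criteria evidently overlap rather than act independently.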

The internal diversities of both the original and generated sets (160671 original and 62276 generated) are computed to examine how similar the molecules are within each set (see computational details in the Supporting Information). The IntDiv1 and IntDiv2 for the original samples are 0.8338 and 0.8233, respectively, and those for the generated samples are 0.8243 and 0.8146, respectively. The computed diversity is significant and comparable with the diversity measure in Polykovskiy et al.,[31] demonstrating that both the synthesized and generated molecules are fully diversified. We employ the pretrained DNN models to predict the excited-state properties for both the original and generated D1A1 molecules. The property distributions are shown in Figure 3a. Perfect matches between the original and generated distributions are found, indicating a strong structural similarity of the produced D1A1 space with the synthesized one. Both original and generated S1 energies are roughly in the range of 2.5–4.5 eV, covering the full visible spectrum for the light emission. All of the singlet oscillator strengths are distributed close to zero. The singlet–triplet gap virtually follows a normal distribution, with a rather small proportion of molecules satisfying the gap criterion for TADF (usually 0.25 eV is the accepted upper limit[3]). We also inspect the synthetic accessibility (SA)[38] based on the fragment contributions and the molecular complexity. Both the original and generated samples bear similar distributions of SA, as seen in Figure 3b. A major peak at around 2.3 is observed, demonstrating that the D1A1 samples possess a molecular synthesizability approaching that of the catalog molecules.[38]
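The internal diversity measure can be sketched as follows, assuming the MOSES-style definition IntDiv_p = 1 − (mean of T^p over all molecule pairs)^(1/p), with T the Tanimoto similarity between fingerprints. The exact procedure used in the study is given in its Supporting Information and may differ; the fingerprints below are hypothetical sets of on-bit indices.

```python
from itertools import product


def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints stored as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def internal_diversity(fingerprints, p=1):
    """IntDiv_p = 1 - (mean over all ordered pairs of T^p)^(1/p); assumed MOSES-style definition."""
    n = len(fingerprints)
    mean_power = sum(tanimoto(a, b) ** p for a, b in product(fingerprints, repeat=2)) / (n * n)
    return 1.0 - mean_power ** (1.0 / p)


# Hypothetical fingerprints (on-bit indices of a 2048-bit vector).
fps = [frozenset({1, 5, 9}), frozenset({1, 7, 20}), frozenset({3, 5, 30, 41})]
print(f"IntDiv1 = {internal_diversity(fps, 1):.4f}, IntDiv2 = {internal_diversity(fps, 2):.4f}")
```

With this definition IntDiv2 can never exceed IntDiv1, which agrees with the ordering of the values reported above. The quadratic pair loop is only practical for small sets; a data set of the size used here would need batching.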

Figure 3

(a) Distributions of the excited-state properties of the original and generated D1A1 molecules. Note that both the original and generated properties are computed via the DNN model predictions. The singlet and triplet energies and the energy difference are in the unit of eV, while f is dimensionless. (b) Distributions of synthetic accessibility of the original and generated D1A1 molecules.

Fine Screening of TADF Candidates

Given the generated D1A1 molecules, we further take a series of fine screening procedures to see whether the qualified TADF candidates can be selected from the generation space. The initial screening is performed according to the predicted excited-state properties of generated samples. By applying a loose criterion (ΔEST < 0.4 eV and S1_f > 0.02), 335 samples are selected from the 62276 generated molecules. The selected samples are subsequently processed with DFT calculations (at the level of B3LYP/6-31G(d,p)) to obtain the ground-state geometry, based on which the excitation energies, oscillator strengths, and the energy difference are computed at the level of TDDFT/ B3LYP/6-31G(d,p). Further screening of these 335 samples with the “true” excited-state properties leads to 67 qualified candidates that can meet the same selection criterion. We note the discrepancy in the number of molecules satisfying the criterion based on the DNN-predicted and TDDFT-calculated properties. A prediction RMSE of 0.1888 for ΔEST and 0.1053 for S1_f are found for the generated molecules, which is slightly higher than that obtained from the test set in the original space (0.0814 for ΔEST and 0.0964 for S1_f). Additional examinations show that the structural similarity in terms of ECFP is limited between the original and generated space (see Tanimoto similarity analysis in the Supporting Information), which may partially explain the lowered predictive accuracy. The observation is consistent with the findings in Popova et al.,[39] where the prediction and generative model are performed in a separated manner with reduced forecasting precisions being visualized for novelly generated compounds. We, therefore, claim that the current exercise for property predictions within the generated space is reasonable and sufficient for the screening procedure to be performed. 
Considering the property predictions in other generative models,[40,41] further enhancement of the forecasting efficiency may entail a prediction–generation model integration, which is subject to future research. SOC calculations are performed on the S0 geometry for those 67 generated molecules with small enough vertical gaps. Both T1–S1 and T1–S0 SOC constants are computed. We found that most candidates have T1–S1 coupling below 5 cm⁻¹, which is consistent with the observation in Zhao et al.[18] indicating that the variability of coupling between different multiplicities might be less important than the adiabatic gap explored later. The T1–S0 SOC constants are shown to be relatively large (part of the results are given in Table 2 for the finally selected candidates). Since the T1–S0 coupling is responsible for the nonradiative decay of the triplet, we set the upper limit of T1–S0 SOC to be 15 cm⁻¹ to suppress the possible nonradiative decay so that the RISC can be promoted. The lower limit of the T1–S1 coupling is prescribed to be 0.1 cm⁻¹ so that a non-negligible intersystem SOC is guaranteed. Such screening further leads to 47 candidates for which the excited-state optimizations are performed.

Table 2

Adiabatic Singlet–Triplet Energy Gaps Computed via the B3LYP Functional with and without TDA and the T1S1 and T1S0 SOC Constants at the S0 Geometry for the Eventually Screened TADF Candidates

name	SMILES	B3LYP gap (eV)	B3LYP-TDA gap (eV)	T₁–S₁ SOC (cm⁻¹)	T₁–S₀ SOC (cm⁻¹)
mol_1	c1cnc2c(c1)[SiH2:2]c1ccnc(-n3c4ccccc4c4ccncc43)c1-2	0.1333	0.0106	0.3239	1.8620
mol_2	O=C(Cc1cccc2[nH]c3[nH]c4ccccc4c3c12)c1cccc2c1[nH]c1ccccc12	0.0163	0.0117	1.8729	7.1232
mol_3	c1ccc2c(c1)[nH]c1c2c2ccccc2n1-c1cc2[nH]c3ccccc3c2cn1	0.1223	0.0980	0.1136	0.7360
mol_4	c1ccc2c(c1)[nH]c1ccc3c(c4ccccc4n3-c3cncc4[nH]c5ccccc5c34)c12	0.1759	0.0798	0.1606	0.1746
mol_5	c1cc2c(cc1)[nH]c1ccc3c(c4ccccc4n3-c3cccc4c3ncc3ccccc34)c12	0.0227	0.0144	0.1703	1.3593
mol_6	c1ccc2c(c1)[nH]c1ccc3c(c4ccccc4n3-c3ccc4c5ccccc5c5nccnc5c4c3)c12	0.0130	0.0117	0.1225	1.0532
mol_7	c1ccc2c(c1)-c1cnc(-n3c4ccccc4c4ccccc43)cc1c1nccnc21	0.0995	0.0926	0.1396	1.6184
mol_8	O=C(c1ccc(C=O)c(-c2ccc3c(c2)Nc2ccccc2S3)c1)c1ccccc1	0.0483	0.0391	0.1507	8.6579
mol_9	O=Cc1ccc(-c2cccc3c2c2ccccc2n3-c2ccccc2)c(C#N)c1C#N	0.1280	0.0367	0.2102	0.7563
mol_10	c1cnc2c(c1)-c1ccc(-n3c4ccccc4c4c5ccccc5[nH]c43)cc1c1[nH]cnc21	0.0082	0.0078	0.2818	0.8635
mol_11	O=C(c1ccc(-n2c3ccccc3c3nc[nH]c23)cc1)C(F)(F)F	0.0070	0.0065	0.1691	1.7473
mol_12	N#Cc1ccc(C=O)c(-c2ccc(-n3c4ccccc4c4ccccc43)cc2)c1	0.0084	–0.0066	0.1288	9.6389
mol_13	CN1c2ccccc2Sc2ccc(-c3cccc4Nc5ccccc5c(=O)c34)cc21	0.2109	0.1751	0.5778	6.8733
mol_14	c1ccc2c(c1)[nH]c1ccc3c(c4ccccc4n3-c3cnc4ncccc4n3)c12	0.0239	0.0204	0.1868	1.7842
mol_15	CN1c2ccccc2Sc2cc(-c3cccc4[nH]c5ccccc5c(=O)c34)ccc21	0.1956	0.1594	0.8683	14.9823
mol_16	O=Cc1ccccc1N(c1cccc2[nH]c3ccccc3c12)c1ccccc1	0.1579	0.1168	0.4755	6.4195
mol_17	c1ccc2c(c1)[nH]c1c2c2ccccc2n1-c1ncc2c(c1)[SiH2:2]c1cnccc1-2	0.0761	0.0616	0.1962	0.7084
mol_18	CC(=O)c1ccc(-c2cccc3c4ccccc4c4nc[nH]c4c23)c(C#N)c1	0.2286	0.1081	0.1208	1.2150
mol_19	c1cc2c(cn1)-c1c(ccnc1-n1c3ccccc3c3ccccc31)c1ccccc12	0.2335	0.0899	0.5408	1.8974

The geometry optimization at the excited state is conducted at the level of TDDFT/B3LYP/def2-SVP. We notice that the selection of density functionals may give rise to a significant spread in the computed excitation energies. The employment of usual exchange–correlation (XC) functionals with a lower Hartree–Fock (HF) fraction, such as PBE0 and B3LYP, leads to an underestimation of the excitation energies,[42] while functionals with a higher HF fraction, such as M062X, overestimate the excitation energies and give an enlarged singlet–triplet gap.[43] Experimental identification of the excitation energy is often done via spectroscopic measurement, but the quality of the experiment–theory matching depends on the individual molecule. Furthermore, the empirical ΔEST values are known to have significant experimental uncertainty.[17] We therefore restrict ourselves to the B3LYP-derived excited-state properties in this article, focusing in particular on how well the generative algorithm reproduces molecular electronic structures. S1 and T1 geometry optimizations for the 47 candidates are implemented in the TDDFT context. We also test the effect of TDA since it is believed to produce a more accurate triplet state.[36] The adiabatic singlet–triplet gaps with and without TDA are computed, with the scatter plot being shown in Figure S3 (in the Supporting Information). It is observed that the B3LYP-TDA gap is mostly smaller than the B3LYP gap, suggesting that the screening based on the B3LYP gap is reasonable, as a more accurate description of excited states can lead to even smaller gaps, which further justifies the selection criterion. Table 2 lists the samples with B3LYP gaps smaller than 0.25 eV, where the TDA gaps and the T1–S1 and T1–S0 SOC constants are displayed as well. These 19 molecules can be considered as the qualified candidates that are eventually selected from the generation space.
To validate that the delayed fluorescence originates from the D–A molecular design in the generation space, we carefully examine the molecular structure for the 19 TADF candidates, as shown in Figure 4. For some samples, either the acceptor or the donor fragment already exists in the empirical donor/acceptor lists (Tables S1 and S2), while the counterpart donor or acceptor is newly created. For example, in mol_8, a phenothiazine-like fragment serves as a known donor, which is connected to a novel acceptor. In another case, both fragments appear in the empirical list but simultaneously belong to the donor family or the acceptor family, such as mol_6, mol_13, and mol_15. The results indicate that the existing donor/acceptor lists are extensible based on the adversarially generated molecules.
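The final selection criteria can be checked directly against the numerical columns of Table 2. The values below are copied from that table; whether the SOC limits are meant to be inclusive is an assumption, and the variable and function names are introduced only for this sketch.

```python
# (name, B3LYP gap / eV, B3LYP-TDA gap / eV, T1-S1 SOC / cm^-1, T1-S0 SOC / cm^-1), from Table 2.
TABLE_2 = [
    ("mol_1", 0.1333, 0.0106, 0.3239, 1.8620),
    ("mol_2", 0.0163, 0.0117, 1.8729, 7.1232),
    ("mol_3", 0.1223, 0.0980, 0.1136, 0.7360),
    ("mol_4", 0.1759, 0.0798, 0.1606, 0.1746),
    ("mol_5", 0.0227, 0.0144, 0.1703, 1.3593),
    ("mol_6", 0.0130, 0.0117, 0.1225, 1.0532),
    ("mol_7", 0.0995, 0.0926, 0.1396, 1.6184),
    ("mol_8", 0.0483, 0.0391, 0.1507, 8.6579),
    ("mol_9", 0.1280, 0.0367, 0.2102, 0.7563),
    ("mol_10", 0.0082, 0.0078, 0.2818, 0.8635),
    ("mol_11", 0.0070, 0.0065, 0.1691, 1.7473),
    ("mol_12", 0.0084, -0.0066, 0.1288, 9.6389),
    ("mol_13", 0.2109, 0.1751, 0.5778, 6.8733),
    ("mol_14", 0.0239, 0.0204, 0.1868, 1.7842),
    ("mol_15", 0.1956, 0.1594, 0.8683, 14.9823),
    ("mol_16", 0.1579, 0.1168, 0.4755, 6.4195),
    ("mol_17", 0.0761, 0.0616, 0.1962, 0.7084),
    ("mol_18", 0.2286, 0.1081, 0.1208, 1.2150),
    ("mol_19", 0.2335, 0.0899, 0.5408, 1.8974),
]

GAP_MAX_EV = 0.25     # upper limit on the adiabatic B3LYP gap
SOC_T1S1_MIN = 0.1    # lower limit on the T1-S1 coupling (cm^-1)
SOC_T1S0_MAX = 15.0   # upper limit on the T1-S0 coupling (cm^-1)


def is_qualified(row):
    """Apply the gap and SOC limits quoted in the text (inclusive SOC bounds are an assumption)."""
    _, gap, _, soc_t1s1, soc_t1s0 = row
    return gap < GAP_MAX_EV and soc_t1s1 >= SOC_T1S1_MIN and soc_t1s0 <= SOC_T1S0_MAX


qualified = [row[0] for row in TABLE_2 if is_qualified(row)]
tda_smaller = sum(1 for row in TABLE_2 if row[2] < row[1])
print(len(qualified), "rows meet all limits;", tda_smaller, "rows have a smaller TDA gap")
```

All 19 tabulated molecules satisfy the three limits, and for each of them the TDA gap is below the plain B3LYP gap, in line with the trend described for the wider set of 47 candidates.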

Figure 4

Generated molecules that fulfill the fine screening criteria with satisfactory quantum chemical properties for the delayed fluorescence.

It should be noted that the designed TADF candidates are essentially of D–A structure, since these molecules are screened out of the generation space, which is structurally similar to the synthesized original space as a result of the adversarial generation algorithm. This imposes a limitation on the design strategy: only molecular types that are predefined within the original space can be found. For the current design case, other potential TADF configurations, e.g., multiresonance structures, are therefore missing. The issue can possibly be resolved by more advanced algorithms where the transfer of chemical space can happen during the model optimization, which is subject to future research. The frontier orbital analysis (calculated at the level of B3LYP/def2-SVP) is also performed for the 19 screened candidates at the optimized S0, S1, and T1 geometries, with two typical examples shown in Figure 5. It is clearly seen that the HOMO and LUMO orbitals are efficiently separated in both the ground and excited states, which is consistent with the minimal adiabatic gaps computed for these TADF molecules. The S1 and T1 geometries are similar to each other, but the donor–acceptor dihedral twisting is notably enlarged (the case is more significant in mol_11). The observation is in line with the principal TADF design rule[44−46] where a twisted D–A structure is built to realize the reduced orbital overlap and the minimized ΔEST.

Figure 5

Frontier orbitals for two of the fine screened molecules analyzed at the optimized S0, S1, and T1 geometries.

Conclusions

The article has proposed an alternative design route for the TADF molecules beyond the current D–A combination and high-throughput screening. The integrated modules combining the algorithmic synthesis, deep prediction, adversarial generation, and fine screening, essentially provide a comprehensive paradigm for synthesizing and generating molecules with expected excited-state properties. The simple donor–acceptor design is adopted where the original D1A1 space is synthesized according to the BRICS rule. Nearly 14,000 samples are labeled via high-throughput calculations for the purpose of DNN model training and property prediction. Satisfactory forecasting precisions are obtained with deep learning, especially for the most important factors, the oscillator strength and the singlet–triplet splitting that to a large extent prescribe the TADF process. Adversarial autoencoder is applied to generate novel structures in terms of the molecular SMILES that can mimic the original D1A1 samples. With the assistance of DNN predictions, the distributions of excited-state properties for the generated molecules are found to perfectly match those of the original samples. Further analysis shows that the synthetic accessibility for both the original and generated D1A1 molecules is at a reasonable level, which facilitates material synthesis and device fabrication. By performing the fine screening on the generated chemical space, a bundle of qualified TADF candidates is eventually selected with minimal adiabatic gaps and moderate spin–orbital couplings. In addition to the similar structures, the adversarial generation algorithm can well reproduce the charge-transfer characteristics and the donor–acceptor dihedral twisting at the excited state, indicating a strong potential of the generative model in designing materials with expected photophysical mechanisms. Our research opens another pathway to the TADF molecular designs. 
Given the generative model and the original molecular space, the methodology can be easily applied to generate other types of TADF molecules, e.g., molecules with more complex D–A structures, molecules with resonant structures,[47] and molecules with the negative singlet–triplet gap[48] as recently introduced. The model can be further improved by incorporating reinforcement learning,[39,49,50] with which a biased property distribution can be generated and moved along the preferred direction. We expect that the current framework can lead to a significant enrichment of the TADF library and a higher vision for “design as we desire” that is possibly realizable in the near future.
