Liam Wilbraham1, Enrico Berardo2, Lukas Turcani2, Kim E Jelfs2, Martijn A Zwijnenburg1. 1. Department of Chemistry , University College London , 20 Gordon Street , London WC1H 0AJ , United Kingdom. 2. Department of Chemistry , Imperial College London , South Kensington , London SW7 2AZ , United Kingdom.
Abstract
We propose a general high-throughput virtual screening approach for the optical and electronic properties of conjugated polymers. This approach makes use of the recently developed xTB family of low-computational-cost density functional tight-binding methods from Grimme and co-workers, calibrated here to (Time-Dependent) Density Functional Theory ((TD)DFT) data computed for a representative diverse set of (co)polymers. Parameters drawn from the resulting calibration using a linear model can then be applied to the xTB derived results for new polymers, thus generating near DFT-quality data with orders of magnitude reduction in computational cost. As a result, after an initial computational investment for calibration, this approach can be used to quickly and accurately screen on the order of thousands of polymers for target applications. We also demonstrate that the (opto)electronic properties of the conjugated polymers show only a very minor variation when considering different conformers and that the results of high-throughput screening are therefore expected to be relatively insensitive with respect to the conformer search methodology applied.
We propose a general high-throughput virtual screening approach for the optical and electronic properties of conjugated polymers. This approach makes use of the recently developed xTB family of low-computational-cost density functional tight-binding methods from Grimme and co-workers, calibrated here to (Time-Dependent) Density Functional Theory ((TD)DFT) data computed for a representative diverse set of (co)polymers. Parameters drawn from the resulting calibration using a linear model can then be applied to the xTB derived results for new polymers, thus generating near DFT-quality data with orders of magnitude reduction in computational cost. As a result, after an initial computational investment for calibration, this approach can be used to quickly and accurately screen on the order of thousands of polymers for target applications. We also demonstrate that the (opto)electronic properties of the conjugated polymers show only a very minor variation when considering different conformers and that the results of high-throughput screening are therefore expected to be relatively insensitive with respect to the conformer search methodology applied.
High-throughput virtual
screening (HTVS) has emerged as a powerful
tool across many areas of materials discovery.[1] At its core, it concerns the automated data-driven discovery of
promising candidate materials for a particular application. Low-computational-cost
techniques are applied in an initial screening step to expose promising
leads, which are then fed through more accurate, higher-cost techniques
to refine computed properties.[2] HTVS is
applied to explore vast, complex chemical spaces in order to accelerate
the discovery of promising materials, as well as to reduce experimental
effort spent on negative leads. Recent examples include the successful
screening of both organic and inorganic materials for applications
in areas including photovoltaics,[3−8] light emitting devices,[9,10] information storage,[11,12] gas storage,[13−19] and catalysis,[20,21] often resulting in materials
being synthesized with record-breaking properties.[10,18,19]Conjugated polymers are used as light
absorbers and electron donor
layers in organic photovoltaics,[22−26] the light-emitting layer of organic light emitting
diodes,[27,28] electrodes for batteries and supercapacitors,[29,30] and photocatalysts for proton reduction and overall water splitting.[31−35] All of these applications exploit a combination of the optoelectronic
and/or redox properties of conjugated polymers in combination with
their relatively facile tunability through copolymerization and chemical
modification, as well as the earth abundant nature of their constituents
(if not the catalysts used to make them via Suzuki or Stille coupling).
Applying HTVS approaches to polymeric materials, however, remains
challenging. Because of the size of the required oligomeric models,
calculations on polymers are inherently more computationally costly
than for small(er) molecules. Moreover, because of the large numbers
of potential monomer sequences, compositions, and the potential for
disorder along the polymer chains, extremely large numbers of calculations
can be required. For example, constructing simple, ordered, two-component
copolymers from a pool of 500 symmetric monomer units leads to over
120 000 possible polymers. If three-component copolymers are
considered for the same monomer pool, the number of possible polymer
increases to more than 20 million. Finally, most conjugated polymers
form amorphous or poorly crystalline solids, ruling out periodic calculations.The fundamental optoelectronic properties of a material can be
understood[36] in terms of (i) the optical
gap (Δo), the minimum energy to form a excited electron–hole
pair bound through their mutual electrostatic interaction (an exciton)
and below which the material is transparent to light, (ii) the fundamental
gap, the minimum energy to form a noninteracting excited-electron
and hole, (iii) the exciton binding energy, the difference between
the fundamental gap and optical gap, a measure of how strong the excited
electron and hole are bound in the exciton, and (iv) the ionization
potential (IP) and electron affinity (EA), the energy required to
remove an electron and released upon adding an electron to an oligomer,
respectively. IP and EA are commonly expressed in terms of standard
reduction potentials vs a standard electrode, e.g., the standard hydrogen
electrode (SHE). The fundamental gap, finally, can be defined in terms
of the difference between IP and EA.Because of the lack of
periodicity, properties such as the optical
gap, IP, and EA of polymers are typically calculated for a single
oligomeric strand; a description that can be improved upon by combining
this description with embedding[37−39] in a dielectric continuum model
to account for environmental effects. In the latter case, one assumes
that all intermolecular interaction, be it with other polymer strands
or a solvent or electrolyte, can be described in terms of an (isotropic)
dielectric response. Generally, the underlying calculations are performed
using Density Functional Theory (DFT) or the combination of DFT with GW theory, methods that, because of their computational
cost, are not ideally suited for HTVS. Hutchison and co-workers pioneered
the use of semiempirical methods to perform high-throughput virtual
screening for properties of conjugated polymers.[3,7,40] Semiempirical methods such as AM1,[41] PM6,[42] and PM7[43] approximate Hartree–Fock or DFT by parametrizing
integrals that are expensive to calculate. This parametrization, combined
with the use of minimal basis-sets results, makes semiempirical calculations
orders of magnitude faster than DFT but potentially less accurate.In this paper, we introduce an approach for the fast and accurate
screening of the optoelectronic properties of conjugated (co)polymers.
Besides direct use for HTVS we envisage that it can also be applied
to generate data to train machine-learning models. We demonstrate
that, by combining the recently developed xTB[44] family of density functional tight-binding methods with calibration
using a linear model to properties obtained through (Time-Dependent)
DFT ((TD)DFT) calculations, one can not only rapidly screen polymers
but also achieve (TD)DFT-quality results. We predominantly compare
with results of (TD)DFT calculations using the B3LYP density functional,
since we previously showed that calculations with this density functional
yield accurate potentials for conjugated polymers when compared to
experimental photoelectron spectroscopy data on polymeric solids,[39] while also predicting reasonable optical gap
values. The difference between our approach here and previous work[3,7,40,45] is 3-fold. Besides the use of xTB rather than alternative semiempirical
methods, we calculate adiabatic IP and EA values directly rather than
approximate them from orbital energies, include an explicit description
of the dielectric environment of the polymers, and use an integrated
family of methods for both electronic and optical properties of the
polymers.
General workflow
Structure Library Generation
In
this section, we give
an overview of the general workflow of the proposed high throughput
approach. As outlined in Figure , the process involves multiple steps. Starting from
a simplified molecular-input line-entry system (SMILES)[46] representation of each monomer unit, linear
polymer structures were generated using the Supramolecular Toolkit
(stk),[47,48] a python library for
the assembly, structure generation and property calculation of supramolecules,
which takes base functionality from RDKit.[49]stk allows for flexible copolymer formation from
arbitrary monomer units, control over monomer sequence within repeat
units, and the automatic generation of different structural isomers
where asymmetric monomer units (e.g., 2,5 linked pyridine) are concerned.
We restrict polymer chain length in all cases to the approximate equivalent
of 12 phenylene monomer units. For example, a copolymer of thiophene
and phenylene contains 6 thiophene and 6 phenylene units (each count
as 1), a copolymer of fluorene and phenylene contains 4 fluorene and
4 phenylene units (phenylene counts as 1, fluorene counts as 2), and
a copolymer of fluorene and carbazole contains 3 fluorene and 3 carbazole
units (each counts as 2), as shown in Figure b. Polymer models of this length have previously
been shown to provide approximately converged properties with respect
to oligomer length.[39]
Figure 1
(a) Workflow of the calibration
procedure, resulting in the linear
model used to calibrate semiempirically determined properties (IP,
EA, and optical gap (Δo)) to DFT-level calculations.
(b) Subsequent proposed high-throughput screening approach, where
the linear model produced in panel a is applied to new polymer structures
for optoelectronic property screening.
Figure 2
(a) Repeat units of the polymers in the calibration set. (b) 3D
geometries of the conformers of polymers with repeat units 8, 9, 23,
29, 32, and 38 as illustration of the types of conformers encountered.
(a) Workflow of the calibration
procedure, resulting in the linear
model used to calibrate semiempirically determined properties (IP,
EA, and optical gap (Δo)) to DFT-level calculations.
(b) Subsequent proposed high-throughput screening approach, where
the linear model produced in panel a is applied to new polymer structures
for optoelectronic property screening.(a) Repeat units of the polymers in the calibration set. (b) 3D
geometries of the conformers of polymers with repeat units 8, 9, 23,
29, 32, and 38 as illustration of the types of conformers encountered.
Conformer Search Strategy
Linear copolymers generally
contain large numbers of rotatable bonds, and therefore a conformer
search on the initially generated polymer structures is warranted.
Here, we make use of a stochastic rather than systematic approach,
sampling the conformational space of the polymer randomly using the
Experimental-Torsion Distance Geometry with additional basic knowledge
(ETKDG) method,[50] where we typically generate
5000 conformers per polymer. The resulting conformers are embedded
in 3D space and undergo a subsequent optimization and energy ranking
procedure using the Merck Molecular Force Field (MMFF)[51] as implemented in RDKit. This second optimization
step adds little computational cost to the overall conformer search,
as the dominant contribution to computational cost stems from the
initial stochastic conformer generation and embedding.
Property Calibration
At this stage, we take a set of
around 40 simple copolymers that we will refer to as the calibration
set. This is a set of copolymer structures composed of the monomer
units shown in Figure a, for which the optoelectronic properties span a wide range of values.
Polymer structures are generated and conformer searches are performed
on each. For each system in the calibration set, both (TD)DFT and
semiempirical calculations are performed on the lowest energy conformer,
according to MMFF, to compute ground state geometries, ionization
potentials (IP), electron affinities (EA), and optical gaps (for specific
computational details, see below). This data is then used to fit linear
models using Python 3.6 (numpy version 1.13.3) for IP, EA, and the
optical gap; calibrating properties calculated with the semiempirical
method to (TD)DFT-derived data. The motivation behind this approach
is that the slopes and intercepts of this calibration step can be
used to provide DFT-quality data at the cost of the associated semiempirical
method. In turn, this allows a much more rapid exploration of chemical
space for high-throughput screening of the intrinsic electronic properties
of polymer materials, with the relatively small initial computational
investment required for calibration. Naturally, this procedure relies
on the semiempirical and DFT-derived data being strongly linearly
correlated.
(TD)DFT Calculations
For the (TD)DFT
calculations,
IPs and EAs are computed using an adiabatic ΔDFT approach, based
around the DFT total energies of the neutral polymer and its relaxed
cationic/anionic counterparts. These calculations use the COSMO[52,53] solvation model to approximate the dielectric environment of a polymer
chain (e.g., at a polymer–water interface, εr 80.1, or within the polymer bulk, εr 2.0). As in
our previous work,[39,54,55] the (TD)DFT calculations use the B3LYP[56−59] density functional and the DZP[60] basis-set. The absorption spectra of the polymer
models are approximated by vertical singlet excitations, calculated
using (TD)B3LYP/DZP. Here, we define the optical gap, i.e., the onset
of light absorption, as the energy of the lowest vertical excitation
with nonzero oscillator strength. All (TD)DFT calculations described
until now have been performed with Turbomole 7.01.[61−63] Additional
optical gap calculations were performed using the CAM-B3LYP[64] and LC-ωHPBE[65] range separated hybrid functionals in combination with a 6-31G(d,p)
basis set, all using Gaussian 16.[66]
Semiempirical
Calculations
Though different semiempirical
methods have been applied to conjugated polymers previously,[3,7,40] we make use of a recently developed
density functional tight binding approach (GFN-xTB)[44] for structural optimizations of the neutral polymers. For
IP/EA calculations, we use an extension of the parent GFN-xTB method,
IPEA-xTB,[67] a differently parametrized
variant of GFN-xTB for the calculation of ionization potentials and
electron affinities. Essentially, IPEA-xTB performs vertical ΔSCF
calculations, and has previously been shown to result in typical errors
for computed vertical IP/EA values of 0.2–0.4 eV compared to
DFT in vacuum.[67] For optical gaps, we use
the tight binding simplified Tamm–Dancoff approach (sTDA)[68,69] applied to orbitals and orbital eigenvalues obtained through xTB
(sTDA-xTB),[70] an approach capable of ultrafast
computation of entire UV–vis absorption spectra. All GFN-xTB
calculations were performed using the xtb code,[71] whereas the sTDA results were obtained using
the sTDA[72] code. All GFN-xTB calculations,
but not sTDA calculations, used the generalized Born surface area
solvation model, with the default parameters for benzene and water
distributed with the xtb code. Finally, for comparison
we also perform analogous PM6,[42] PM7[43] (MOPAC 2016),[73] and
ZINDO/S[74] (Gaussian 16)[66] calculations on the polymers to obtain IP and EA values.
Results
In the following, we explore each stage of the high-throughput
screening methodology, assessing the accuracy of the combination of
the semiempirical calculation and linear calibration model, robustness
of the approach when applied to various oligomer lengths and compositions,
and how the conformer searching strategy ultimately affects predicted
properties.
Calibration–Ionization Potential and Electron Affinity
The IPs and EAs computed by IPEA-xTB and DFT/B3LYP for the calibration
set are shown in Figure . For each dielectric environment, simple linear models are fitted
to IP and EA, yielding intercept and gradient values that can be applied
to calibrate semiempirical data to obtain B3LYP-quality results. All
calibration parameters resulting from the regression models are given
in Table and Table S12. There is a strong linear correlation
between the B3LYP and IPEA-xTB data, with R2 values of 0.99 for both dielectric environments. For the aqueous
environment, the mean absolute differences (MAD) between IPEA-xTB
and DFT data are 0.37 and 0.16 V for IP and EA, respectively. After
calibration using the linear model, these are reduced to 0.08 and
0.06, respectively. The considerable reduction in MAD after calibration
demonstrates that not only the trends between different copolymer
properties but also the absolute values predicted by DFT are recovered
well by calibrated IPEA-xTB. Furthermore, the same calibration procedure
for the nonpolar (benzene) environment data results in MAD of 0.10
(IP) and 0.06 (EA) between IPEA-xTB and DFT results, indicating that
the calibration quality holds across strongly differing dielectric
environments with high and low dielectric constants.
Figure 3
IP (up triangles) and EA (down triangles)
values calculated for
the copolymer calibration set using IPEA-xTB and B3LYP/DZP. In each
case, properties are computed within a low dielectric screening environment
resembling benzene (left) and a high dielectric environment resembling
water (right). Two different fits to linear models are performed in
each case. (i) Single fit performed for both IP and EA, with the predictions
of the resulting model and its 95% prediction bounds shown as dashed
line and light blue shaded region, respectively. (ii) Two separate
fits are performed for IP and EA, with the predictions of the resulting
models and its 95% confidence bounds shown as solid lines and blue
shaded region, respectively. All values are presented versus the standard
hydrogen electrode potential (4.44 V).
Table 1
Parameters (slope and intercept) of
the Linear Models Applied to Calibrate xTB data (obtained with IPEA-xTB
and sTDA-xTB for IP/EA and optical gaps, respectively) to DFT data,
alongside Mean Absolute Errors of Calibrated Semiempirical Results
vs DFT (B3LYP)
Property
Environment
Slope
Intercept
MAD after calibration (eV)
IP
Low dielectric
0.91
0.15
0.10
EA
Low dielectric
0.92
–1.05
0.06
IP and EA
Low dielectric
1.27
–0.32
0.18
IP
High dielectric
0.88
–0.24
0.08
EA
High dielectric
0.84
–0.75
0.06
IP and EA
High
dielectric
1.02
–0.41
0.10
Optical Gap
Low dielectric
0.83
–0.21
0.13
Optical Gap
High dielectric
0.85
–0.25
0.13
IP (up triangles) and EA (down triangles)
values calculated for
the copolymer calibration set using IPEA-xTB and B3LYP/DZP. In each
case, properties are computed within a low dielectric screening environment
resembling benzene (left) and a high dielectric environment resembling
water (right). Two different fits to linear models are performed in
each case. (i) Single fit performed for both IP and EA, with the predictions
of the resulting model and its 95% prediction bounds shown as dashed
line and light blue shaded region, respectively. (ii) Two separate
fits are performed for IP and EA, with the predictions of the resulting
models and its 95% confidence bounds shown as solid lines and blue
shaded region, respectively. All values are presented versus the standard
hydrogen electrode potential (4.44 V).The above discussion was based on linear models where IP
and EA
values are fitted separately. We repeated this procedure, this time
fitting the IP and EA results simultaneously (Figure ), a so-called global fit. Calibration using
models based on a global fit results in slightly higher MAD values
for both dielectric environments (0.18 and 0.10 V for the IP and EAs
of low and high dielectric environments, respectively). The small
difference in performance for each dielectric environment is reflected
in the residual sum of squares, which is lower for the calibration
performed in the high dielectric environment (1.23) than the low dielectric
environment (2.44), indicating that, although they perform similarly,
the global calibration procedure yields a better fit for the former
than the latter. Here, we see that the effect of reducing the total
number of parameters when moving from two separate models for IP and
EA to one global fit is smallest in the high-dielectric range, indicating
that a global fit may be appropriate when considering high-dielectric
environments.For each linear model, 95% prediction bands indicate
that with
new polymers not used for fitting, we can expect an accuracy of ±0.25
V with respect to the B3LYP calculations. Below, unless explicitly
stated otherwise, all calibrated IPEA-xTB data will have been obtained
using separate models for IP and EA.We attempted analogous
calibrations for PM6,[42] PM7,[43] and ZINDO/S[74] as
examples of alternative semiempirical methods.
For PM6 and PM7, we calculated IP and EA both using ΔSCF and
by making the orbital approximation, while only applying the latter
for ZINDO/S. The orbital approximation is often made in the literature,[7,75] here one approximates IP and EA by the negative of the HOMO and
LUMO orbital energies of a given neutral polymer. The results of these
calibration attempts can be found in the Supporting Information (Figures S15–S17). In summary, PM6/PM7 ΔSCF
calculations and PM6/PM7/ZINDO/S calculations that make the orbital
approximation yield poorer results than IPEA-xTB data, before, and
importantly after, calibration (Table S12). However, we should note that ZINDO/S performs similarly to IPEA-xTB
for IP but that IPEA-xTB offers significant improvements for EA (MAE
of 0.18 V for ZINDO/S, 0.06 V for IPEA-xTB in the low dielectric case).
The improved performance of IPEA-xTB over PM7, PM6, and ZINDO/S is
likely to be at least partly due to the fact that IPEA-xTB was specifically
developed to predict IP and EA values.
Calibration–Optical
gap
Figure shows computed optical gaps (defined as
the vertical excitation to the lowest excited state associated with
nonzero oscillator strength) of the calibration set. In analogy to
the above, we fit a linear model using data obtained with (TD)B3LYP
and with sTDA-xTB. For both dielectric environments, (TD)DFT computed
optical gaps are reproduced well by the combination of sTDA-xTB and
a calibrated linear model. At least part of the latter shift is probably
due to the fact that the (TD)B3LYP calculations employed a dielectric
screening model, something that is unavailable for sTDA-xTB.
Figure 4
Optical gap,
defined as lowest computed vertical excitations with
nonzero oscillator strength, values calculated for the copolymer calibration
set at sTDA-xTB and (TD)DFT level (using B3LYP, CAM-B3LYP, and LC-ωHPBE).
In each case, properties are computed using a dielectric screening
model for benzene. Corresponding results for the water dielectric
model can be found in the Supporting Information. Linear model fits are performed in each case, with the predictions
of the fitted model and the 95% prediction bounds shown as blue dashed
lines and blue solid lines, respectively.
Optical gap,
defined as lowest computed vertical excitations with
nonzero oscillator strength, values calculated for the copolymer calibration
set at sTDA-xTB and (TD)DFT level (using B3LYP, CAM-B3LYP, and LC-ωHPBE).
In each case, properties are computed using a dielectric screening
model for benzene. Corresponding results for the water dielectric
model can be found in the Supporting Information. Linear model fits are performed in each case, with the predictions
of the fitted model and the 95% prediction bounds shown as blue dashed
lines and blue solid lines, respectively.To investigate effects of self-interaction and the potential
overestimation
of conjugation length by the B3LYP hybrid functional, we also employ
the range-separated hybrid functional CAM-B3LYP, which has previously
been shown to result in consistent agreement between experimental
and DFT calculated optical gaps in conjugated polymer systems, particularly
when applied to different oligomer lengths.[76]CAM-B3LYP indeed improves the fit quality relative to B3LYP.
However,
the improvement is marginal and results in a minor change in the slope
and a largely rigid shift of the DFT calculated absorption energies
to higher values. This improvement upon switching to a long-range
corrected functional is also exemplified when reperforming a fit against
data obtained with LC-ωHPBE. The improved fits are reflected
in the reduced MAD of noncalibrated sTDA-xTB optical gaps compared
to density functionals with improving asymptotic behavior, which induces
a more even treatment of polymers with different conjugation lengths
used in the calibration set (i.e., those with meta-lined monomer units).
Specifically, MAD values of optical gaps computed for the calibration
set in an aqueous environment compared to B3LYP, CAM-B3LYP and LC-ωHPBE
are 0.72, 0.20, and 0.18 eV, respectively. Similarly, the MADs for
the nonpolar (benzene) environment are 0.83, 0.22, and 0.18 eV respectively.
As we have seen above for IP and EA, these MAD values are significantly
reduced through calibration (0.15, 0.10, and 0.08 eV in an aqueous
environment, 0.15, 0.10, and 0.09 eV for the nonpolar (benzene) environment
for B3LYP, CAM-B3LYP, and LC-wHPBE, respectively). The effect of going
from B3LYP to CAM-B3LYP/LC-ωHPBE can be attributed to the improved
asymptotic behavior of the exchange-correlation (electronic) potentials
of these density functionals relative to B3LYP, reducing self-interaction
error and subsequent over- delocalization of π orbitals along
the polymer backbones.[77] As the electronic
potential in xTB displays the exact asymptotic behavior,[70] the better quality fit is obtained when a long-range
corrected density functional is expected.
Calibration
and Polymer Length
Following the observations
described above, we further test the robustness and versatility of
calibration through a linear model by considering oligomers of varying
length. To do so, we extract a subset of polymer structures from the
calibration set and consider how IP, EA, and optical gap values vary
with oligomer length. The results are shown in Figures S5 and S6 of the Supporting Information compared to
the predictions of the linear model and 95% prediction intervals from Figures and 4. Considering IP and EA, we can see that the linear relationship
between IPEA-xTB and B3LYP holds for all oligomer lengths, indicating
that issues with self-interaction observed in the optical gap calibration
do not carry over to those of IP and EA. This is consistent with the
excellent quality of fit observed in Figure for IP and EA, despite the fact that the
calibration set contains meta-linked polymers with a formally shorter
conjugation length.Now considering optical gaps, issues with
self-interaction become apparent for B3LYP. As the oligomer length
decreases from 12 to 8 and then to 4 equiv phenylene units, we see
the systematic increase in optical gap relative to sTDA-xTB with decreasing
oligomer length associated with the self-interaction error. Moving
to CAM-B3LYP, this effect is mitigated, though not entirely eliminated,
consistent with the improved (but not exact) asymptotic behavior of
the exchange-correlation potential. Now considering LC-ωHPBE,
the effect is further reduced compared to CAM-B3LYP, in line with
the previous analysis of computed optical gaps in Figure .
Validation of the Approach
Here we demonstrate the
robustness of the calibration procedure by introducing additional
polymer systems and molecules not used during fitting. Figure shows the previous IP, EA,
and optical gap calibrations for the benzene dielectric screening
model, alongside results for this validation set. The validation set,
see Figure S3, includes polymers potentially
relevant to organic photovoltaics, e.g., those based on benzothiazole
or thieno[3,2-b]thiophene and polymers from the literature
on photocatalytic water splitting, e.g., the carbon nitride polymer
melon, consisting of a linear chain of heptazine units linked by −NH–
bridges. It also includes some relevant nonpolymeric molecules, e.g.,
the fullerene OPV acceptor material PCBM, and even two nonconjugated
polymers (fully reduced polyaniline and poly(p-xylene)).
Aside from application, a crucial feature of the validation set is
that it is composed of polymers that are drastically different from
those used for calibration. We can see from Figure that the properties of all systems in the
validation set follow the same linear trends as identified for the
calibration set and thus the calibration developed above is not biased
to the polymers in the calibration set, but can be applied more generally
to others. Differences between calibrated IPEA-xTB/sTDA-xTB and B3LYP
data reflect this, with MAD values of 0.21 and 0.16 V for IP and EA
and 0.17 eV for optical gaps calculated using both B3LYP and CAM-B3LYP.
For the water dielectric screening model, MAD values of 0.16 and 0.14
V for IP and EA and 0.16 eV for optical gaps calculated using both
B3LYP and CAM-B3LYP were obtained.
Figure 5
(a) IP (triangles)/EA (circles) and (b)
optical gap (circles) values
computed at IPEA-xTB/sTDA-xTB and B3LYP/DZP level for the validation
set (Figure S3). In each case, a dielectric
screening model for benzene is used. The corresponding linear regression
prediction line and 95% prediction bounds from Figures and 4 are included.
Analogous results obtained using a dielectric screening model for
water can be found in Figure S19 of the
Supporting Information.
(a) IP (triangles)/EA (circles) and (b)
optical gap (circles) values
computed at IPEA-xTB/sTDA-xTB and B3LYP/DZP level for the validation
set (Figure S3). In each case, a dielectric
screening model for benzene is used. The corresponding linear regression
prediction line and 95% prediction bounds from Figures and 4 are included.
Analogous results obtained using a dielectric screening model for
water can be found in Figure S19 of the
Supporting Information.
Sensitivity of Results to Conformational and Repeat Unit Disorder
To investigate the sensitivity of calculated properties to polymer
conformation, we calculate IP, EA, and optical gaps for 5000 randomly
generated conformers of six selected copolymers. Each conformer is
optimized at GFN-xTB level, with IP/EA and optical gaps calculated
using IPEA-xTB and sTDA-xTB, respectively. Figure shows the properties calculated for each
conformer plotted against the calculated Boltzmann factor relative
to the lowest energy conformer. None of the properties are very sensitive
to the polymer conformation, with the maximum variation of a given
property with respect to conformation around 0.1 (e)V, in line with
previous findings.[78] Further, the maximum
dispersion in predicted properties lies at higher energy regions (low
Boltzmann factor), with far lower variation in IP, EA, and optical
gap in low-energy regions (approaching zero for IP and EA and ∼0.05
eV for optical cap). This highlights that, although an exhaustive
search of all possible conformers of each polymers may be desirable,
it is not necessary to predict the intrinsic electronic properties
of a polymer accurately using an oligomer model. In turn, this supports
our use of a stochastic approach coupled with a classical molecular
mechanics optimization for conformer generation, which inherently
may not necessarily be exhaustive.
Figure 6
(a) IP and (b) EA values computed with
IPEA-xTB and (c) optical
gap values computed with sTDA-xTB for 5000 randomly generated conformers
of polymers shown in panel d. The energy of each conformer is expressed
as a Boltzmann factor relative to the lowest energy conformer obtained
for each polymer model. In this case, each conformer structure has
been optimized via GFN-xTB. Symbol colors on plots match the colors
of the polymer repeat units shown in panel d. Boltzmann factors are
calculated at a temperature of 298 K.
(a) IP and (b) EA values computed with
IPEA-xTB and (c) optical
gap values computed with sTDA-xTB for 5000 randomly generated conformers
of polymers shown in panel d. The energy of each conformer is expressed
as a Boltzmann factor relative to the lowest energy conformer obtained
for each polymer model. In this case, each conformer structure has
been optimized via GFN-xTB. Symbol colors on plots match the colors
of the polymer repeat units shown in panel d. Boltzmann factors are
calculated at a temperature of 298 K.As a secondary observation, comparing the distribution of
Boltzmann
factors for each of the polymers, it appears that differences between
the different polymers can be linked to different inter-repeat unit
interactions. As an example, for polyphenylene, we calculate a broad
distribution of conformers at different energies, while for polypyridine
and polythiophene we obtain a narrow distribution centered at high
energy, with far fewer conformers at low energy. This highlights a
greater thermodynamic preference for polypyridine and polythiophene
to assume particular structures, while polyphenylene and PPV remain
more conformationally flexible from an energetic point of view. Finally,
the lack of sensitivity of the computed properties with respect to
conformation appears to hold even for two copolymers containing meta-linking
1,3-phenylene groups, where changes in conformation can lead to larger
overall changes in geometry.To test the effects of repeat unit
disorder along polymer chains,
i.e., the fact that for asymmetric
monomers there will be different regioregular and regiorandom structures,
we also produced 100 conformers of 25 randomly generated isomers of
polymers comprising asymmetric repeat units, generating analogous
plots to Figure (see
Supporting Information Figure S13). In the
same vein as the above analysis, introducing random disorder along
polymer chains seems to have little effect on this in all, with maximum
deviations in IP, EA, and optical gap around 0.2 (e)V, which reduces
to around 0.05 (e)V for lower-energy structures.
Expected Screening
Rate
Within the entire process,
the conformer-searching step is the clear bottleneck, predominantly
because 5000 conformers per polymer are generated, but also because
the full set of calculations with xTB on the lowest energy conformer
are extremely rapid (typically resulting in <1 min of computation
for a typical polymer model on a typical 24 core desktop machine).
As an illustrative example, using such a setup we expect a screening
rate of approximately 500 polymers per day. However, because of the
observed relative insensitivity of IP, EA, and optical gap values
to conformational degrees of freedom (Figure ), the number of conformers sampled per polymer
can be further reduced and as a result the screening rate suggested
above should be viewed as a lower limit.
Perspectives
As
demonstrated above, both IPEA-xTB and
sTDA-xTB perform well in terms of reproducing the ordering of the
IP, EA, and optical gap values predicted by (TD)B3LYP for the polymers
in the calibration set. After fitting to linear models for EA, IP,
and the optical gap, calibrated IPEA-xTB and sTDA-xTB also accurately
reproduce the corresponding absolute values. More importantly, calibrated
IPEA-xTB and sTDA-xTB also perform well when reproducing the IP/EA
and optical gap values predicted by (TD)B3LYP for the polymers in
the validation set, which are not included in the fitting. This ability
to accurately predict properties of systems outside of the calibration
set is a critical feature for high-throughput virtual screening, where
one expects to screen orders of magnitude more polymers than in the
calibration set. Bearing in mind that high-throughput approaches may
resort to higher-level (DFT) calculations after the identification
of promising leads from a low-cost approach, we show that this step
could effectively be avoided altogether as a result of the inherent
accuracy of the calibrated xTB data.It is interesting to see
that even if we performed our calibration exclusively for conjugated
polymers, the properties of the nonconjugated polymers (fully reduced
polyaniline and poly(p-xylene)) and a nonpolymeric
molecule (PCBM) in our validation set are also reasonably well described.
As such, the calibrated models proposed here are likely reasonably
transferrable and can, for example, be used to predict open-circuit
voltages and approximate energy conversions efficiencies using the
Scharber model[79] for bulk-heterojunction
organic solar cells using a conjugated polymer as a donor and PCBM
(or other molecular materials) as an acceptor.We have performed
separate calibrations for the low and high dielectric
environment cases, relevant to organic photovoltaics/organic light
emitting diodes and water splitting photocatalysis, respectively.
However, one would expect that the same procedure would yield results
of equal accuracy for the case of intermediate dielectric permittivity
values relevant for polymers in contact with moderately polar solvents,
relevant to the applications of conjugated polymers in batteries,
as well as the characterization of their IP/EA values using cyclic
voltammetry.The accuracy of the calibrated IPEA-xTB and sTDA-xTB
results is,
at best, that of the DFT functional they were calibrated to. Conceptually,
it would be just as easy to calibrate IPEA-xTB and sTDA-xTB to experimental
data or results from higher-level quantum chemical calculations. Our
rationale for calibrating to DFT rather than experimental data is
that, at least for IP and EA, there is very limited experimental data
to calibrate to, especially in the presence of water. Similarly, there
is limited data from accurate quantum chemical calculations for adiabatic
potentials of polymers in the presence of a dielectric.Following
on from the above, our usage of the word calibration
is not meant to signify that (TD)DFT using a particular functional
inherently yields results that are more accurate and as such we do
not benchmark here the xTB family of methods or any of the other semiempirical
method discussed above. Rather, the process of calibration and the
fitting of a linear model means that the semiempirical methods can
be used to predict the answer DFT would give for a fraction of the
computational cost. A good illustration of this is the case of the
optical gap, where, as discussed above, at least the trend in optical
gap values predicted by sTDA-xTB might be more reliable than that
predicted by (TD)B3LYP, as the former captures the correct asymptotic
behavior of the electronic potential and the latter does not.The observed relative insensitivity of the predicted IP, EA, and
optical gap values to conformational degrees of freedom is an attractive
property for high-throughput virtual screening for two reasons. First,
it implies that the effect of not finding the true lowest energy conformer
on the predicted optical and electronic properties is only very minor
and that hence a minimal conformer search will suffice. Second, and
perhaps more importantly, it suggests that the effect of conformational
disorder on the predicted properties is likely equally minor. Based
on the good correlation between properties predicted by the xTB family
of methods and (TD)B3LYP, it stands to reason that the observed insensitivity
to conformational degrees of freedom will translate to property predictions
by (TD)DFT, even if orders of magnitude more computationally expensive
to probe.
Conclusions
We demonstrate that
the xTB family of density functional tight-binding
methods from Grimme and co-workers form a powerful basis for a high-throughput
screening method for the optoelectronic properties of conjugated polymers.
We show that after fitting a linear model using a calibration set
of IPEA/sTDA-xTB and (TD)B3LYP results, one can use the former to
predict the latter at a fraction of the computational cost. We also
demonstrate that the (opto)electronic properties of conjugated polymers
are relatively insensitive to conformational degrees of freedom and
that hence a minimal conformer search will probably suffice as well
as, perhaps more importantly, that the effect of conformational disorder
on the predicted properties is likely minor.
Authors: Jun Yan; Xabier Rodríguez-Martínez; Drew Pearce; Hana Douglas; Danai Bili; Mohammed Azzouzi; Flurin Eisner; Alise Virbule; Elham Rezasoltani; Valentina Belova; Bernhard Dörling; Sheridan Few; Anna A Szumska; Xueyan Hou; Guichuan Zhang; Hin-Lap Yip; Mariano Campoy-Quiles; Jenny Nelson Journal: Energy Environ Sci Date: 2022-05-20 Impact factor: 39.714
Authors: Michael Sachs; Reiner Sebastian Sprick; Drew Pearce; Sam A J Hillman; Adriano Monti; Anne A Y Guilbert; Nick J Brownbill; Stoichko Dimitrov; Xingyuan Shi; Frédéric Blanc; Martijn A Zwijnenburg; Jenny Nelson; James R Durrant; Andrew I Cooper Journal: Nat Commun Date: 2018-11-23 Impact factor: 14.919
Authors: Anastasia Vogel; Mark Forster; Liam Wilbraham; Charlotte L Smith; Alexander J Cowan; Martijn A Zwijnenburg; Reiner Sebastian Sprick; Andrew I Cooper Journal: Faraday Discuss Date: 2019-07-04 Impact factor: 4.008
Authors: Yang Bai; Liam Wilbraham; Benjamin J Slater; Martijn A Zwijnenburg; Reiner Sebastian Sprick; Andrew I Cooper Journal: J Am Chem Soc Date: 2019-05-22 Impact factor: 15.419
Authors: Robert M Ziolek; Alejandro Santana-Bonilla; Raquel López-Ríos de Castro; Reimer Kühn; Mark Green; Christian D Lorenz Journal: ACS Nano Date: 2022-09-14 Impact factor: 18.027