Infrared photodissociation analyses are supported by theoretical calculations that allow a trustworthy interpretation of experimental spectra of gaseous ions. B3LYP calculations are the most prominent method used to model IR spectra, as detailed in our bibliographic survey. However, this and other commonly used methods are known to provide inaccurate energy values and geometries, especially when it comes to long-range interactions, such as intramolecular H-bonds, which show increased anharmonicity. Therefore, we evaluated some of the most commonly used density functional theory methods (B3LYP, CAM-B3LYP, and M06-2X) and basis sets (6-31+G(d,p), 6-311++G(d,p), 6-311++G(3df,2pd), aug-cc-pVDZ, and aug-cc-pVTZ), including anharmonicity and dispersion corrections. The results were compared to MP2 calculations and to experimental high-frequency (2000-4000 cm-1) IR multiphotonic dissociation (IRMPD) spectra of two protonated model molecules containing intramolecular hydrogen bonds: biotin and tryptophan. M06-2X/6-31+G(d,p) was shown to be the most cost-effective level of theory, whereas CAM-B3LYP was the most efficient method to describe the van der Waals interactions. The use of the dispersion correction D3, proposed by Grimme, improved the description of O-H vibrations involved in H-bonding but worsened the description of N-H stretches. Anharmonic calculations were shown to be extremely expensive when compared to other approaches. The efficiencies of well-established scaling factors (SFs) in opposition to sample-dependent SFs were also discussed and the use of fitted SFs were shown to be the most cost-effective approach to predict IRMPD spectra. M06-2X/6-31+G(d,p) and CAM-B3LYP/aug-cc-pVDZ were also tested against the fingerprint region. Our results suggest that these methods can also be used for analysis in this lower frequency range and should be regarded as the methods of choice for cost-effective IRMPD simulations rather than the ubiquitous B3LYP method, especially when further molecular properties are needed.
Infrared photodissociation analyses are supported by theoretical calculations that allow a trustworthy interpretation of experimental spectra of gaseous ions. B3LYP calculations are the most prominent method used to model IR spectra, as detailed in our bibliographic survey. However, this and other commonly used methods are known to provide inaccurate energy values and geometries, especially when it comes to long-range interactions, such as intramolecular H-bonds, which show increased anharmonicity. Therefore, we evaluated some of the most commonly used density functional theory methods (B3LYP, CAM-B3LYP, and M06-2X) and basis sets (6-31+G(d,p), 6-311++G(d,p), 6-311++G(3df,2pd), aug-cc-pVDZ, and aug-cc-pVTZ), including anharmonicity and dispersion corrections. The results were compared to MP2 calculations and to experimental high-frequency (2000-4000 cm-1) IR multiphotonic dissociation (IRMPD) spectra of two protonated model molecules containing intramolecular hydrogen bonds: biotin and tryptophan. M06-2X/6-31+G(d,p) was shown to be the most cost-effective level of theory, whereas CAM-B3LYP was the most efficient method to describe the van der Waals interactions. The use of the dispersion correction D3, proposed by Grimme, improved the description of O-H vibrations involved in H-bonding but worsened the description of N-H stretches. Anharmonic calculations were shown to be extremely expensive when compared to other approaches. The efficiencies of well-established scaling factors (SFs) in opposition to sample-dependent SFs were also discussed and the use of fitted SFs were shown to be the most cost-effective approach to predict IRMPD spectra. M06-2X/6-31+G(d,p) and CAM-B3LYP/aug-cc-pVDZ were also tested against the fingerprint region. Our results suggest that these methods can also be used for analysis in this lower frequency range and should be regarded as the methods of choice for cost-effective IRMPD simulations rather than the ubiquitous B3LYP method, especially when further molecular properties are needed.
Mass spectrometry (MS)
is a useful technique capable of being employed in the most diverse
applications, such as omic sciences,[1] gas-phase
reactions,[2,3] organic and organometallic reaction mechanism
determination,[4−6] and forensic analyses.[7] In this context, the ongoing development of novel ionization sources
improved the efficiency and versatility of MS.[8−11] The high sensitivity of MS is
also one of its advantageous features, which allows it to perform
the detection of trace analytes.[12] Nevertheless,
MS is only able to determine the mass–charge ratio of the ions
and does not provide any information about their conformational equilibrium
nor it is suitable to differentiate isomers, requiring the coupling
of ion mobility,[13] chromatography,[14] or tandem MS methods[15] to allow for this differentiation.Photodissociation methods,
such as infrared multiphotonic dissociation (IRMPD) spectroscopy,
cold-ion predissociation spectroscopy, and IR depletion of UV photofragmentation
[(IR)–UV] spectroscopy,[16−18] have stood out as valuable approaches
to the study of these gas-phase ions over the last decades.[19,20] The coupling of MS with vibrational spectroscopy has been employed
in the evaluation of protonation and coordination sites and conformational
equilibrium of gaseous ions of biomolecules such as vitamins,[21] amino acids,[22−24] and carbohydrates,[25] many other chemical systems such as organometallic
compounds,[26] and a myriad of other organic[27−29] and inorganic molecules.[30,31]Since the development
of free-electron laser sources in France (CLIO) and the Netherlands
(FELIX) and improvements to tabletop laser systems, these techniques
have received growing attention.[32,33] In this work,
we focus on IRMPD spectroscopy, an action ion spectroscopy that relies
on the absorption of multiple IR photons. Although very efficient
in assigning gaseous ion conformation, the multiphotonic nature of
this technique and the high temperature of the ions, when compared
to predissociation spectroscopy, are known to cause broadening and
redshifts in spectral bands.[34] These effects
are more accentuated when analyzing the fingerprint region, where
more photons are needed to perform the dissociation.To assign
a trustworthy structure and properties to the evaluated ions, the
experimental data are usually compared with theoretical calculations,
in which the density functional theory (DFT) method B3LYP[35,36] is ubiquitously used; eighty percentage of the 50 most cited articles
in the IRMPD spectroscopy area in the last decade performed calculations
using this method (Figure ).
Figure 1
Most used methods and basis set to predict IR spectra by percentage
as found by searching the topic “IRMPD spectroscopy”
and refining the results by “article” as document type
in Web of Science on December 2017. The upper part refers to the 50
most cited papers from 2007 to 2017, whereas the lower part refers
to the 50 most recent publications (2017). The complete list of the
methods and their respective references are available in the Supporting Information (Tables S1 and S2). The
bars inside the dashed area are amplified 5-fold for better visualization.
FT-DTCF refers to Fourier transform of dipole time correlation function;
PLEF refers to projection of electric field. Percentages rounded up
for clarity.
Most used methods and basis set to predict IR spectra by percentage
as found by searching the topic “IRMPD spectroscopy”
and refining the results by “article” as document type
in Web of Science on December 2017. The upper part refers to the 50
most cited papers from 2007 to 2017, whereas the lower part refers
to the 50 most recent publications (2017). The complete list of the
methods and their respective references are available in the Supporting Information (Tables S1 and S2). The
bars inside the dashed area are amplified 5-fold for better visualization.
FT-DTCF refers to Fourier transform of dipole time correlation function;
PLEF refers to projection of electric field. Percentages rounded up
for clarity.In this data collection,
it is notable that the Pople basis sets were always used in association
with B3LYP functional, with rare exceptions, unless a heavy atom was
calculated using an effective core potential. More recent literature
(2017) shows that even though B3LYP is still dominant, the use of
this functional decreased, indicating a search for new alternative
methods. Further, the percentage use of Dunning’s basis set
increased from 4%, in the most cited articles, to 21% in the most
recent ones, whereas the Pople basis set decreased from 88 to 68%.
Nevertheless, any of these computational methods, even the computationally
expensive MP2, do not exactly predict the vibrational spectra of ions,
mainly because of basis set truncation and anharmonicity effects.[37−39]Furthermore, the most commonly used DFT methods are known
to produce energy values with systematic errors caused by the several
approximations used to describe electronic density, which may influence
the calculated geometry, especially when long-range interactions are
involved.[40,41] For this reason, the evaluation of properties
calculated by these methods requires adjustment by the use of scaling
factors (SFs), regardless of the level of theory selected.The
efficiency of the SFs has been widely discussed in the literature.[42−44] Although this approach
could provide good correlations between experimental and calculated
data, SFs for all methods and basis set combinations are not easily
available from reliable sources, such as the U.S. National Institute
of Standards and Technology (NIST). When available, SFs are sometimes
parameterized for energy and not for frequency values.[45] Further, different molecular test sets could
provide different SFs, and these values are occasionally updated or
corrected by new studies. Consequently, researchers frequently use
empirical SFs to match the experimental data,[20] despite these empirical SFs lacking physical interpretations, which
may lead to erroneous assignments.The use of the calculated
relative free energy to differentiate possible gas-phase isomers may
sometimes be troublesome. Some systems show the lower energy conformer
to be less than a few kcal mol–1 more stable than
the other conformers surveyed, a level of accuracy that would not
be attained by the typical methodologies listed above.[46,47] In that sense, most of the gas-phase calculations reported in the
literature and compared to experimental dissociation spectroscopy
use a known absorption band to reference the calculations or use a
dual SF optimized for vibrational frequency prediction.[20,48]Unfortunately, the use of anharmonic calculations and higher
accuracy methodologies is too costly to allow their widespread use
as standard methods to survey multiple conformations, let alone extensive
conformational search algorithms. Some studies benchmark computational
methods to better describe the conformation of gas-phase ions of specific
systems using photodissociation techniques.[49−52]Some dynamic approaches
to describe anharmonic oscillators are known to provide good results,
as in the study of Grégoire and co-workers,[53] in which the IR spectra (fingerprint region) are predicted
by considering the contribution of different populations of some peptides
at 300 K as sampled by molecular dynamic simulations. Hernandez and
co-workers performed Born–Oppenheimer molecular dynamics of
an all-Ala b6 fragment of the AAAAAAMA peptide, also at 300 K.[54] The authors identified a Zundel-type H-bond
by using electronic total energies and gradients generated by B3LYP/6-31G(d)
level of theory and vibrational spectra generated by Fourier transform
of the autocorrelation function of the first derivative of the dipole
moment obtained from the molecular dynamic analysis. Galimberti and
co-workers applied a similar approach but used the Fourier transform
to the atomic polar tensors instead of the dipole autocorrelation
function.[55]Even though there are
more efficient approaches to describe the vibrational spectra of trapped
ions via computational chemistry, as discussed above, our survey shows
that most researchers still rely on DFT calculations to interpret
IRMPD spectra (Figure ). This scenario is because the DFT approaches are less computationally
expensive and because more rigorous calculation would not surpass
the inherent resolution of the IRMPD technique.[34] Therefore, it is relevant to evaluate the effectiveness
of the ongoing approaches of spectra prediction. This study focuses
on the evaluation of the efficiency of some of the most used theoretical
calculation methods (B3LYP, M06-2X, and MP2) in association with commonly
used basis sets to verify the level of theory that best describes
two ions known to have intramolecular interactions, that is, protonated
biotin and protonated tryptophan. These ions were chosen based on
the presence of poorly described intramolecular H-bonds in these systems
(Figure ).
Figure 2
Structure of protonated biotin and tryptophan.
We
also evaluated the effects caused by long-range correction (CAM-B3LYP[46]) and the dispersion correction factor proposed
by Grimme (DFT-D3 schemes[56]), namely, B3LYP-D3,
and as implemented in M06-2X and compared to M06-2X-D3. Anharmonicity
calculations were also evaluated.Structure of protonated biotin and tryptophan.Besides that, IRMPD spectra of
these ions have previously been reported[21,22] and could be used to allow for the validation of our experimental
setup because these are the first IRMPD analyses reported by our group.As will be demonstrated, M06-2X/6-31+G(d,p) was the most cost effective
of the methods surveyed to model our target systems, and the use of
SFs remains the best approach to correct systematic frequency deviations,
outperforming the use of dispersion corrections and extended basis
sets.
Methodology
Experimental Methods
Direct-infusion
electrospray ionization (ESI)-MS/MS experiments of ∼10–5 M solutions of biotin or tryptophan in methanol (Fluka)
were carried out in a modified Bruker AmaZon SL ion trap mass spectrometer
equipped with a perforated ring electrode. Two opposed 3 mm holes
were aligned to a CaF2 window (Thorlabs, WG51050) mounted
on the instrument vacuum manifold. This system allowed the trapped
ions to be irradiated in the center of the trap and the fragments
formed by photofragmentation at known wavelengths to be monitored.
The wavelength values were written to the raw data file during the
data acquisition via LabVIEW-based software that also controlled the
laser wavelength and irradiation time in a fashion similar to that
previously described in the literature.[57]The laser beam exiting the trap was guided by two 45°-mounted
gold-plated mirrors (Thorlabs, NB1-L01) located inside the vacuum
chamber, allowing the detection of the laser beam power going through
the trap. This feature allowed for the evaluation of the actual laser
power during the experiments, as needed for the correction of photofragmentation
yield with laser power.A Continuum Surelite Nd:YAG 1064 nm
laser (530 mJ/pulse, 10–40 ns, 10 Hz) was used to pump a LaserVision
OPO/OPA system used to output IR radiation in the desired spectral
range (2300–4000 cm–1, 14–21 mJ/pulse).
The OPO/OPA system was factory-calibrated using a near-IR wavemeter
at the zero-incidence position of the OPO crystals, allowing for internal
calibration by readjustment of the zero-incidence condition when needed.
Experiments were carried out at the higher mass resolution mode [enhanced
mode—full width at half-maximum (fwhm) = 0.2] after calibration
against tunemix ESI-L calibrant mixture (Agilent) in standard source
and ion optics conditions.Photofragmentation efficiency at
a given wavelength (WL), PEWL was calculated bywhere P represents
parent ion intensities at a given WL and F represents fragment ion intensities at the same
WL. Power-corrected PEWL was obtained by normalizing the
laser beam power as measured by a 407A Spectra-Physics power meter.
Theoretical Methods
Preliminary conformational distribution
survey was performed by the Spartan 16 computational package[58] at the MMF94 force field. The six lowest energy
conformations from a total of 68 for biotin and 36 for tryptophan
were then optimized at the B3LYP/6-31+G(d,p) level of theory using
Gaussian09 (rev. D01)[59] computational package,
default presets on our shared in-house 384 processor (AMD Opteron
6376) CentOS cluster. Anharmonic calculations were performed by adding
the command freq = anharm option in Gaussian 09. The most stable conformer
at the B3LYP/6-31+G(d,p) level of theory was then used as the initial
geometry and reoptimized in all other methods here reported. All calculated
geometries were checked to be true minima by vibrational analysis
and showed no imaginary frequencies. Chemcraft (version 1.8)[60] was used to generate input coordinates and to
visualize and export the calculated spectra. Table contains the SFs used. When available, the
SFs proposed by Kashinski et al.[39] were
used, as the authors demonstrated good agreement with experimental
data. Other sources used were the work of Kieninger and Ventura[61] and NIST databank.[45]
Table 1
Scaling Factors Used To Correct the IR Frequency Calculations
method/basis
6-31+G(d,p)
6-311++G(d,p)
6-311++G(3df,2pd)
aug-cc-pVDZ
aug-cc-pVTZ
B3LYP
0.964[45]
0.963[39]
0.969[61]
0.970[45]
0.968[45]
CAM-B3LYP
0.953a
0.953[39]
0.953[39]
0.956[39]
0.954[39]
M06-2X
0.947[39]
0.947[39]
0.947[39]
0.950[39]
0.956[45]
MP2
0.941[45]
0.952a
0.952[61]
0.959[45]
0.953[45]
These values were not found in the literature.
For CAM-B3LYP/6-31+G(d,p), the SF used was the same for CAM-B3LYP/6-311++G(d,p).
For MP2/6-311++G(d,p), the SF used was the same for MP2/6-311+G(d,p).
These values were not found in the literature.
For CAM-B3LYP/6-31+G(d,p), the SF used was the same for CAM-B3LYP/6-311++G(d,p).
For MP2/6-311++G(d,p), the SF used was the same for MP2/6-311+G(d,p).Experimental and calculated
spectra were compared through the determination of the root-mean-square
deviation (RMSD) of the predicted oscillatory frequency, in cm–1, and the experimental absorption band centroids as
determined by Origin’s 9.0 “peak find” tool.[62]RMSD minimization was carried out by Microsoft
Excel H&S 2016 Solver package.
Results and Discussion
IRMPD
Analyses and Comparison with the Literature
As discussed
in the Introduction, biotin and tryptophan
(Figure ) have previously
been analyzed by IRMPD spectroscopy by other research groups[21,22] and these data were used to validate our experimental IRMPD spectroscopy
setup and the results are reported (Figure ).
Figure 3
IRMPD spectrums of protonated (A) biotin, (B)
tryptophan power-corrected (both obtained by our group), (C) biotin,
and (D) tryptophan power-corrected (both extracted from the literature).
For (A,B), the fwhm values of the most intense peaks are 12 cm –1 (O–H stretch) and 9 cm–1 (N–H stretch), respectively.
IRMPD spectrums of protonated (A) biotin, (B)
tryptophan power-corrected (both obtained by our group), (C) biotin,
and (D) tryptophan power-corrected (both extracted from the literature).
For (A,B), the fwhm values of the most intense peaks are 12 cm –1 (O–H stretch) and 9 cm–1 (N–H stretch), respectively.We replicated the protonated biotin IRMPD analysis made by
Fraschetti and co-workers, and the spectrum obtained by our group
shows a good agreement with their previous results, with the absorption
band centroid differing by only 2–5 cm–1 (Table ).[21]
Table 2
Oscillator Assignment and Comparison between
the Experimental and Literature Absorption Band Centroid Position
in cm–1
oscillator
literature
experimental
biotin
σ C–H
2960
2958
σ C=O+–H···O
3180
3183
σ N–H (sym)
3405
3406
σ N–H (assym)
3475
3480
σ O–H
3562
3567
tryptophan
σ N–H···π
3044a
3059
σ N–H···O
3123a
3121
σ NH3 (assym)
3340
3335
σ N–H (indol)
3500
3500
σ O–H
3555
3553
Extracted from the paper of Pereverzev
et al.[51]
Extracted from the paper of Pereverzev
et al.[51]Although the intensity pattern observed was discrepant,
our system provided a greater sensitivity of the bands in the 2700–3300
cm–1 range, allowing the detection, with reasonable
fragmentation efficiency, of bands previously depicted to be 2 orders
of magnitude less intense (Figure A). The IRMPD spectrum analysis of tryptophan was performed
by Mino and co-workers[22] and the experimental
spectrum obtained by our group has also shown a good agreement with
their results, with the absorption band centroid wavenumbers differing
by 0–5 cm–1 (Figure B and Table ). The N–H···O and N–H···π
stretches, nevertheless, were compared with the work of Pereverzev
et al.[51] to circumvent the difficulty of
defining the centroid of these rather broad bands. In their study
at 5 K, they identified two conformers of tryptophan contributing
to the IR spectra because both conformers differ in only 0.5 kcal
mol–1. Our analysis showed a better matching with
the wavenumbers shown by the higher energy conformer; therefore, these
wavenumbers were used in our comparisons. Nevertheless, the difference
in the RMSD reflected by the use of the other conformer would be irrelevant
in comparison to the inaccuracy in detecting the centroid of such
a broad band, as will be discussed in the next section.It is
worth noting that the literature spectra of biotin and tryptophan
were obtained by using 20 and 15 IR pulses, respectively, whereas
in our experimental setup, only five pulses were used (Figure ). Therefore, differences in
S/N and band intensities were expected. The spectrum of biotin in
the literature is not power-corrected; that is why the comparison
is made with the experimental uncorrected spectrum obtained by our
group. However, the power-corrected spectrum for biotin is found in
the Supporting Information (Figure S1).
Computational Results
The protonation of biotin was reported
to take place at the carbonyl oxygen, causing the species to assume
a folded conformation where the carboxylic acid carboxyl oxygen interacts
with the proton (Figure ). The calculated IR spectrum was mainly affected by differences
in the H-bond distance and the dihedral angle γ.
Figure 4
Geometry of protonated
biotin (top) and tryptophan (bottom).
Geometry of protonated
biotin (top) and tryptophan (bottom).Tryptophan is protonated in the amino acid primary nitrogen
and the NH3 vibration modes are affected by the long-range
H-bond and N–H···π interactions,[22,51] causing the region between 2800 and 3300 cm–1 to
show a broad apparent band, which is the result of a series of different
oscillations present in this region. Further, the existence of low-energy
excited-state conformations causes additional broadening of the stretches
involved in the van der Waals interactions.To perform the correction
of the calculated spectra, SFs were obtained from multiple sources
because there was not a unique source reporting all of the required
values. Because different research groups have used different datasets
to estimate the SF, one could expect the SFs to be dependent on the
dataset used. Nevertheless, because of the use of extensive datasets,
this variation can be neglected. The SFs obtained from the literature
showed little variation between similar basis set calculations by
the same method, which is consistent with the fact that the SF depends
only weakly on the basis set used.[45] The
biggest variation was observed from MP2/6-31+G(d,p) (SF = 0.941) to
MP2/aug-cc-pVDZ (SF = 0.959)—both values were obtained from
NIST.Comparison between experimental and calculated spectra
showed that most of the calculations failed in describing the long-range
interactions (Figure ). The C–H stretch was not used for comparisons because the
calculated spectra showed several bands in this region and the broad
experimental features would not allow the accurate assignment of the
band centroid being compared (Figure and Table ).
Figure 5
Comparison between experimental and calculated spectra for (A)
biotin and (B) tryptophan.
Table 3
RMSD of Predicted Band Centroids in cm–1, Excluding and Including [in Brackets] the Vibrations Affected by
van der Waals Interactions, as a Function of the Methodology Used.
Walltime of Optimization/Frequency Calculation in Min/Processor Are
Reported within Parentheses
method/basis
6-31+G(d,p)
6-311++G(d,p)
6-311++G(3df,2pd)
Aug-cc-pVDZ
aug-cc-pVTZ
biotin
B3LYP
45[47]
31[48]
57[59]
44[51]
42[43]b,c
(72/46)
(188/271)
(1325/1346)
(503/241)
(4048/7839)
CAM-B3LYP
34[31]
25[35]
29[26]
26[28]
20[18]b,c
(154/105)
(344/131)
(1564/1589)
(443/417)
(5857/4235)
M06-2X
21[31]
26[44]
28[24]
26[28]
31[32]b,c
(104/68)
(265/154)
(2064/2666)
(750/1080)
(12062/5044)
MP2
15[21]
22[46]
a
6[16]
a
(681/1861)
(647/2196)
(11524b/a)
(4415/9838)
tryptophan
B3LYP
38[50]
27[40]
50[58]
43[49]
36[47]
(73/36)
(97/101)
(1874/1236)
(266/352)
(2928/5632)
CAM-B3LYP
19[43]
20[37]
21[35]
21[33]
12[30]
(83/61)
(238/175)
(1807/1417)
(663/371)
(1275/2016)
M06-2X
17[44]
24[41]
29[38]b
21[43]
24[50]
(50/36)
(163/120)
(367b/524b)
(597/440)
(2642/6379)
MP2
20[52]
23[54]
a
21[44]
a
(1981/931)
(2745/2128)
(6829b/a)
(4012/3456)
(15439b/a)
Analyses did not finish because of the extensive
time required.
Calculation
performed using 16 processors. All other calculations used 8 processors
as default.
Calculation
performed using keyword “opt = calcfc” because of convergence
problems.
Comparison between experimental and calculated spectra for (A)
biotin and (B) tryptophan.Analyses did not finish because of the extensive
time required.Calculation
performed using 16 processors. All other calculations used 8 processors
as default.Calculation
performed using keyword “opt = calcfc” because of convergence
problems.The RMSD for both
systems was first calculated by not including the wavenumbers from
the bands of N–H and O–H stretches. The RMSD was then
recalculated to include the H-bond and N–H···π
stretches, as shown within brackets in Table .Comparing these RMSD values with
those obtained without the van der Waals interactions, we could observe
that the deviation of biotin and tryptophan calculations was considerably
greater in the second and that this difference could be attributed
to the highly anharmonic intramolecular interactions, as described
in the literature.[63−65]This suggestion was supported by the fact that
CAM-B3LYP was able to describe the systems better than B3LYP because
the CAM-B3LYP method corrects the term for long-range exchange interaction.[46]When the RMSD was calculated without using
the H-bond stretching band, all of the RMSD values decreased, unless
for the CAM-B3LYP values for biotin. This behavior indicates that
the contribution of the H-bond band to the total deviation was smaller
for CAM-B3LYP in comparison to the other methods. For tryptophan,
this trend was not observed, and even though the CAM-B3LYP better
described the van der Waals interactions, the RMSD values were greatly
decreased for the smaller basis set when the interaction bands were
not considered.The 6-311++G(3df,2pd) basis set provided the
worst results when associated with the B3LYP method, even though it
is an extended basis set with diffuse functions in all atoms. It showed
good agreement when used with other functionals, despite being a high
computationally expensive basis set. It is noticeable that the 6-311++G(d,p)
basis set did not provide good results when H-bond was taken into
account. Nevertheless, when the H-bond was removed from the RMSD calculation,
6-311++G(d,p) was, in most of the cases, the better basis set to use
with B3LYP-based methods in both systems.MP2 calculations showed
good agreement with experimental spectra, but the time required to
perform these calculations was significantly higher compared to other
calculations, rendering them inefficient for extensive molecular systems.
Nonetheless, because MP2/aug-cc-PVDZ was the level of theory that
better described the experimental spectra, we can rely on this information
to obtain a more precise value of hydrogen-bond distance and angle
γ for protonated biotin in gaseous state, at 1.71 Å and
75°, respectively.The aug-cc-pVTZ basis set provided good
results in the calculations of tryptophan and biotin IR spectra. CAM-B3LYP,
once more, was shown to behave well when associated with an extended
basis set. However, calculations using this basis set were highly
time-consuming and sometimes produced RMSDs approximately the same
or even worse than calculations performed with a smaller basis set.
Because of the expensive computational cost, we were not able to perform
calculations using this basis set associated to MP2.M06-2X
and CAM-B3LYP were the methods showing better results in a reasonable
time. M06-2X/6-31+G(d,p) was the level of theory that was demonstrated
to have the best compromise between low error and computational cost.
In fact, it was the level of theory with the best results for tryptophan
and the best of the DFT-based methods for biotin when long-range interactions
were not considered. CAM-B3LYP/aug-cc-pVDZ was more computationally
expensive; however, it is a reasonable choice when one tries to study
intramolecular interactions such as hydrogen bonds within a practical
time.It is worth noting that tryptophan intramolecular band
centroid determination is affected by the possible existence of multiple
conformations, as discussed before. This behavior can be depicted
by the higher RMSD observed when this band is considered for the RMSD
calculation, as shown within brackets in Table .
Dispersion and Anharmonic Corrections
Because it is well-established that the usual DFT calculations such
as B3LYP are not effective in describing long-range interactions,
we evaluated two approaches used to overcome errors caused by these
effects, other than the use of SFs. Our survey showed that anharmonic
frequency calculations and dispersion corrections have recently been
used by researchers for IRMPD spectroscopy assignment. The anharmonic
calculation is evaluated because of the enhanced anharmonic character
of the intramolecular interactions, causing redshifts in the frequencies
of these oscillations. The empirical dispersion correction, as proposed
by Grimme for the DFT functional (termed DFT-D),[56,66] is used to model systems containing long-range interactions, resulting
in a more reliable conformation after optimization. To compare the
efficiency of these methods in predicting the experimental spectra,
we used unscaled theoretical spectra and compared the results of these
methods, as shown in Table .
Table 4
RMSD of Unscaled Predicted Band Centroids
in cm–1 as a Function of the Methodology Used. Walltime
of Optimization/Frequency Calculation in Min/Processor Are Reported
within Parentheses
method/basis
6-31+G(d,p)
6-311++G(d,p)
6-311++G(3df,2pd)
aug-cc-pVDZ
biotin
B3LYP
175
176
170
160
(72/46)
(188/271)
(1325/1346)
(503/241)
M06-2X
211
217
200
197
(104/68)
(265/154)
(2064/2666)
(750/1080)
B3LYP-D3
168
169
163
151
(110/71.5)
(215/222)
(1086/1027)
(578/381)
M06-2X-D3
211
217
200
197
(120/86)
(199/150)
(2233/1753)
(654/887)
B3LYP
90
anharmonic
(28/9434)
M06-2X
145
anharmonic
(55/9717)
tryptophan
B3LYP
173
165
165
152
(73/36)
(97/101)
(1874/1236)
(266/352)
M06-2X
217
212
205a
200
(50/36)
(163/120)
(367a/524a)
(597/440)
B3LYP-D3
180
172
171
158
(42/44)
(83/128)
(352/509)
(265/569)
M06-2X-D3
217
212
205a
200
(45/64)
(62/93)
(219a/496a)
(291/616)
B3LYP
36
anharmonic
(32/6581)
M06-2X
183
anharmonic
(39/7919)
Calculation performed using 16 processors. All other calculations
used 8 processors as default.
Calculation performed using 16 processors. All other calculations
used 8 processors as default.DFT-D3 was shown to be almost equally time-consuming as the calculations
performed without using the empirical dispersion correction, being
even quicker in some occasions.For both systems, this correction
factor did not improve the calculations performed with M06-2X, which
was to be expected because this functional already includes dispersion
correction in its formulation. However, when applied to B3LYP, different
trends of RMSD value were observable for each system. For biotin,
a comparison of each oscillator shows that the D3 correction improved
the description of the H-bond, whereas it worsened the description
of N–H symmetric and asymmetric stretches—mainly the
first one—as detailed in the Supporting Information (Tables S3 and S4). The free O–H stretch
showed almost no variation.The RMSDs in tryptophan calculation
were larger when using D3 correction, and once more, a worsening in
the description of N–H stretches is observed, mainly the H-Bond
(N–H···O) oscillator.The optimization
steps for the harmonic and anharmonic calculation schemes are the
same, and therefore, the computational cost remains practically unchanged
for both methods. The frequency calculations, nevertheless, were time-consuming.
The anharmonic calculations showed improvement in the prediction of
the vibrational spectra of both molecules, but the results were just
as satisfactory as the use of SFs for tryptophan when using B3LYP.
In fact, anharmonic calculations do not always show better results
than harmonic calculations, as demonstrated in the work of Buczek
et al.[67]As the authors demonstrated,
anharmonic calculations were not suitable to be used with a small
basis set but provided better spectra from the 6-31+G(d,p) basis set
to more extended ones. Nevertheless, we find it worth noting that
they used smaller systems, namely, water and formaldehyde. As this
theoretical calculation showed itself to be too expensive, we were
not able to perform it with a more extended basis set.Using
dispersion corrections did not provide better results than using SFs.
These results suggest that the calculations have to be doubly corrected
by using the dispersion correction and SF. Johnson et al.[68] even proposed some SFs for anharmonic vibrations—in
which the SF to B3LYP/6-31+G(d,p) is virtually 1. On the other hand,
we believe that if the intention is to predict vibrational spectra
and a SF is necessary, a practical choice would be to perform a less
expensive calculation and use a well-established SF.
Performance
at the Low-Frequency and Fingerprint Regions
Because IRMPD
data are also collected in the lower frequency range by predissociation
spectroscopy or using free-electron lasers, we used experimental data
from the literature to evaluate whether the most efficient theoretical
calculation we suggested for the prediction of long-range interactions
is also suitable for predicting the experimental spectra over the
fingerprint region (Table ). The long-range interaction effect is also expected to play
a relevant role in this spectral region. Fraschetti and co-workers
have assessed the carbonyl stretch region of biotin and demonstrated
that the wavenumbers of these oscillations are affected by the H-bond
formed after protonation of oxygen.[21]
Table 5
Comparison between Experimental Data from the Literature
and the Most Efficient Calculations in the Fingerprint Region at M06-2X/6-31+G(d,p)
and CAM-B3LYP/aug-cc-pVTZ Levels of Theory (in cm−1)
oscillator
literature
M06-2X
CAM-B3LYP
biotin
δ N1H2 and N3H2 (scissoring); C2–N3 (stretching)
1632
1601
1612
σ C2=O2
1642
1648
1646
σ C10=O10
1706
1700
1693
tryptophan
σ C=C; δ C–H (wag); O–H (wag)
1425
1391
1399
δ NH3 (umbrella)
1441
1414
1435
σ C=O
1785
1782
1751
As can
be seen in Table ,
both levels of theory provided a good prediction of IR spectra, reproducing
almost the same RMSD as those obtained at higher frequencies—25
and 19 cm–1 for biotin and tryptophan in M06-2X/6-31+G(d,p),
respectively, and 25 and 14 cm–1 for biotin and
tryptophan in CAM-B3LYP/aug-cc-pVDZ, respectively.The analysis
of tryptophan in the fingerprint region was performed by Pereverzev
et al.[51] As explained previously, their
group identified two conformers of tryptophan when using a cryogenic
IR–IR–UV system, allowing them to obtain highly resolved
spectra. We concluded that conformer A better matched our experiments,
and therefore, we used its bands to perform our comparisons. In this
case, M06-2X/6-31+G(d,p) better described the C=O stretch,
whereas CAM-B3LYP/aug-cc-pVDZ provided a better prediction of the
fingerprint bands.
Use of “Fitted” Scaling Factors
Besides the use of well-established SFs from the literature that
are based on test sets and therefore are less sample dependent, a
widespread procedure in the literature is to fit the SFs for minimizing
the RMSDs (what we call RMSD fitting in this manuscript) or scaling
the theoretical data so that one chosen band perfectly fits the experimental
data (band fitting).To check the performance of these alternative
methodologies, we carried out the RMSD minimization procedures and
compared the RMSD obtained by these methods to the well-stablished
SFs as shown in Figure . The SFs obtained for each optimization can be found in the Supporting Information (Table S5).
Figure 6
RMSD comparison
for different SFs for biotin (a) and tryptophan (b) for the tested
methodologies. Literature SFs are in orange, RMSD fitting is in blue,
and peak fitting is in gray.
RMSD comparison
for different SFs for biotin (a) and tryptophan (b) for the tested
methodologies. Literature SFs are in orange, RMSD fitting is in blue,
and peak fitting is in gray.The most striking characteristic of Figure is that the fitted procedures produce minimal
RMSD values for B3LYP and CAM-B3LYP that are virtually the same as
the one found for M06-2X/aug-cc-pVDZ for biotin. Those values are
much smaller than the RMSDs found for the ab initio MP2 method, even
so it is not correct to assume that MP2 performance is worse than
the hybrid functionals.Nevertheless, the B3LYP-based methodology
performance is not unexpected in this case, as the fitting procedures
produce smaller RMSD by their construction.Another relevant
information is that by comparing the literature SF results and fitted
results for both systems, one can clearly see that the B3LYP-based
functionals are corrected in greater extension than M06-2X and MP2,
which suggests that the B3LYP-based functionals are more sample-dependent
than the other methodologies.By comparing the overall results
for biotin, B3LYP, CAM-B3LYP, and M06-2X, we have achieved the minimum
RMSD for the fitted procedure. CAM-B3LYP is as good as B3LYP for the
smallest and greatest basis set but not for the intermediate ones.
For tryptophan, the fitted RMSD procedures for the B3LYP-based functionals
perform better than all other methods.These analyses also show
that the SF has an important role and that further studies may be
necessary to obtain IRMPD tailored SFs.An interesting result
is that if one chose to use sample-dependent fitted SFs, the B3LYP
methodology would give the best results, despite the unsuitability
of this methodology for hydrogen bonds as previously depicted.Even if the use of sample-specific fitted SFs provides better results,
their use for method comparison is troublesome because any differences
from the methodologies are leveled off by the different SFs.
Conclusions
When performing theoretical calculations, it is a logical decision
to rely on already established methodologies. B3LYP has stood out
as an efficient approach to obtain good prediction spectra in a short
time, compared to ab initio methods. Nonetheless, B3LYP presents some
disadvantages, such as a bad description of van der Waals interactions
and flawed calculation of energy, which could lead to an inaccurate
structural assignment.Our results suggest that M06-2X/6-31+G(d,p)
could be a better DFT level of theory to be used in predicting vibrational
spectra. Comparing this level of theory with B3LYP/6-31+G(d,p), we
observed that the increase in the time required was not too substantial,
whereas the decrease in the RMSD was significant. On the other hand,
if a better description of a long-range interaction is intended, CAM-B3LYP
seems to be the better choice, mainly when associated with the augmented
double or triple zeta correlation-consistent aug-cc-pVDZ or aug-cc-pVTZ.We tested two alternatives for correcting the dispersion effects.
The empirical dispersion correction proposed by Grimme for DFT methods
(DFT-D3) improved the B3LYP analyses of biotin slightly, but on the
other hand, it increased the RMSD of the predicted spectra of tryptophan.
Despite obtaining a better prediction of the of the H-bond oscillator,
this calculation worsened the description of N–H stretches.
Anharmonic calculations provided better predicted vibrational spectra
than did harmonic calculations when the 6-31+G(d,p) basis set was
used. However, the computational cost was hugely increased. Nonetheless,
none of these approaches were more efficient than the use of SF.It should be noticed that a minimum RMSD may be obtained by fitting
the SF for a specific sample. In this case, our results show that
the most cost-effective B3LYP/6-31+G(d,p) method performs better based
on the RMSD results than the other methodologies tested, including
MP2 calculations, despite their intrinsic inaccuracy.[37,38,43,44]
Authors: Kyle M Lambert; Joshua B Cox; Lin Liu; Amy C Jackson; Sam Yruegas; Kenneth B Wiberg; John L Wood Journal: Angew Chem Int Ed Engl Date: 2020-05-08 Impact factor: 15.336
Authors: Wouter A Remmerswaal; Kas J Houthuijs; Roel van de Ven; Hidde Elferink; Thomas Hansen; Giel Berden; Herman S Overkleeft; Gijsbert A van der Marel; Floris P J T Rutjes; Dmitri V Filippov; Thomas J Boltje; Jonathan Martens; Jos Oomens; Jeroen D C Codée Journal: J Org Chem Date: 2022-06-24 Impact factor: 4.198
Authors: Silvia Gervasoni; Giuliano Malloci; Andrea Bosin; Attilio V Vargiu; Helen I Zgurskaya; Paolo Ruggerone Journal: Sci Data Date: 2022-04-01 Impact factor: 6.444