Literature DB >> 31459042

Evaluation of Common Theoretical Methods for Predicting Infrared Multiphotonic Dissociation Vibrational Spectra of Intramolecular Hydrogen-Bonded Ions.

André F Rodrigues-Oliveira1, Francisco W M Ribeiro1, Gustavo Cervi1, Thiago C Correra1.   

Abstract

Infrared photodissociation analyses are supported by theoretical calculations that allow a trustworthy interpretation of experimental spectra of gaseous ions. B3LYP calculations are the most prominent method used to model IR spectra, as detailed in our bibliographic survey. However, this and other commonly used methods are known to provide inaccurate energy values and geometries, especially when it comes to long-range interactions, such as intramolecular H-bonds, which show increased anharmonicity. Therefore, we evaluated some of the most commonly used density functional theory methods (B3LYP, CAM-B3LYP, and M06-2X) and basis sets (6-31+G(d,p), 6-311++G(d,p), 6-311++G(3df,2pd), aug-cc-pVDZ, and aug-cc-pVTZ), including anharmonicity and dispersion corrections. The results were compared to MP2 calculations and to experimental high-frequency (2000-4000 cm-1) IR multiphotonic dissociation (IRMPD) spectra of two protonated model molecules containing intramolecular hydrogen bonds: biotin and tryptophan. M06-2X/6-31+G(d,p) was shown to be the most cost-effective level of theory, whereas CAM-B3LYP was the most efficient method to describe the van der Waals interactions. The use of the dispersion correction D3, proposed by Grimme, improved the description of O-H vibrations involved in H-bonding but worsened the description of N-H stretches. Anharmonic calculations were shown to be extremely expensive when compared to other approaches. The efficiencies of well-established scaling factors (SFs) in opposition to sample-dependent SFs were also discussed and the use of fitted SFs were shown to be the most cost-effective approach to predict IRMPD spectra. M06-2X/6-31+G(d,p) and CAM-B3LYP/aug-cc-pVDZ were also tested against the fingerprint region. Our results suggest that these methods can also be used for analysis in this lower frequency range and should be regarded as the methods of choice for cost-effective IRMPD simulations rather than the ubiquitous B3LYP method, especially when further molecular properties are needed.

Entities:  

Year:  2018        PMID: 31459042      PMCID: PMC6644661          DOI: 10.1021/acsomega.8b00815

Source DB:  PubMed          Journal:  ACS Omega        ISSN: 2470-1343


Introduction

Mass spectrometry (MS) is a useful technique capable of being employed in the most diverse applications, such as omic sciences,[1] gas-phase reactions,[2,3] organic and organometallic reaction mechanism determination,[4−6] and forensic analyses.[7] In this context, the ongoing development of novel ionization sources improved the efficiency and versatility of MS.[8−11] The high sensitivity of MS is also one of its advantageous features, which allows it to perform the detection of trace analytes.[12] Nevertheless, MS is only able to determine the mass–charge ratio of the ions and does not provide any information about their conformational equilibrium nor it is suitable to differentiate isomers, requiring the coupling of ion mobility,[13] chromatography,[14] or tandem MS methods[15] to allow for this differentiation. Photodissociation methods, such as infrared multiphotonic dissociation (IRMPD) spectroscopy, cold-ion predissociation spectroscopy, and IR depletion of UV photofragmentation [(IR)–UV] spectroscopy,[16−18] have stood out as valuable approaches to the study of these gas-phase ions over the last decades.[19,20] The coupling of MS with vibrational spectroscopy has been employed in the evaluation of protonation and coordination sites and conformational equilibrium of gaseous ions of biomolecules such as vitamins,[21] amino acids,[22−24] and carbohydrates,[25] many other chemical systems such as organometallic compounds,[26] and a myriad of other organic[27−29] and inorganic molecules.[30,31] Since the development of free-electron laser sources in France (CLIO) and the Netherlands (FELIX) and improvements to tabletop laser systems, these techniques have received growing attention.[32,33] In this work, we focus on IRMPD spectroscopy, an action ion spectroscopy that relies on the absorption of multiple IR photons. Although very efficient in assigning gaseous ion conformation, the multiphotonic nature of this technique and the high temperature of the ions, when compared to predissociation spectroscopy, are known to cause broadening and redshifts in spectral bands.[34] These effects are more accentuated when analyzing the fingerprint region, where more photons are needed to perform the dissociation. To assign a trustworthy structure and properties to the evaluated ions, the experimental data are usually compared with theoretical calculations, in which the density functional theory (DFT) method B3LYP[35,36] is ubiquitously used; eighty percentage of the 50 most cited articles in the IRMPD spectroscopy area in the last decade performed calculations using this method (Figure ).
Figure 1

Most used methods and basis set to predict IR spectra by percentage as found by searching the topic “IRMPD spectroscopy” and refining the results by “article” as document type in Web of Science on December 2017. The upper part refers to the 50 most cited papers from 2007 to 2017, whereas the lower part refers to the 50 most recent publications (2017). The complete list of the methods and their respective references are available in the Supporting Information (Tables S1 and S2). The bars inside the dashed area are amplified 5-fold for better visualization. FT-DTCF refers to Fourier transform of dipole time correlation function; PLEF refers to projection of electric field. Percentages rounded up for clarity.

Most used methods and basis set to predict IR spectra by percentage as found by searching the topic “IRMPD spectroscopy” and refining the results by “article” as document type in Web of Science on December 2017. The upper part refers to the 50 most cited papers from 2007 to 2017, whereas the lower part refers to the 50 most recent publications (2017). The complete list of the methods and their respective references are available in the Supporting Information (Tables S1 and S2). The bars inside the dashed area are amplified 5-fold for better visualization. FT-DTCF refers to Fourier transform of dipole time correlation function; PLEF refers to projection of electric field. Percentages rounded up for clarity. In this data collection, it is notable that the Pople basis sets were always used in association with B3LYP functional, with rare exceptions, unless a heavy atom was calculated using an effective core potential. More recent literature (2017) shows that even though B3LYP is still dominant, the use of this functional decreased, indicating a search for new alternative methods. Further, the percentage use of Dunning’s basis set increased from 4%, in the most cited articles, to 21% in the most recent ones, whereas the Pople basis set decreased from 88 to 68%. Nevertheless, any of these computational methods, even the computationally expensive MP2, do not exactly predict the vibrational spectra of ions, mainly because of basis set truncation and anharmonicity effects.[37−39] Furthermore, the most commonly used DFT methods are known to produce energy values with systematic errors caused by the several approximations used to describe electronic density, which may influence the calculated geometry, especially when long-range interactions are involved.[40,41] For this reason, the evaluation of properties calculated by these methods requires adjustment by the use of scaling factors (SFs), regardless of the level of theory selected. The efficiency of the SFs has been widely discussed in the literature.[42−44] Although this approach could provide good correlations between experimental and calculated data, SFs for all methods and basis set combinations are not easily available from reliable sources, such as the U.S. National Institute of Standards and Technology (NIST). When available, SFs are sometimes parameterized for energy and not for frequency values.[45] Further, different molecular test sets could provide different SFs, and these values are occasionally updated or corrected by new studies. Consequently, researchers frequently use empirical SFs to match the experimental data,[20] despite these empirical SFs lacking physical interpretations, which may lead to erroneous assignments. The use of the calculated relative free energy to differentiate possible gas-phase isomers may sometimes be troublesome. Some systems show the lower energy conformer to be less than a few kcal mol–1 more stable than the other conformers surveyed, a level of accuracy that would not be attained by the typical methodologies listed above.[46,47] In that sense, most of the gas-phase calculations reported in the literature and compared to experimental dissociation spectroscopy use a known absorption band to reference the calculations or use a dual SF optimized for vibrational frequency prediction.[20,48] Unfortunately, the use of anharmonic calculations and higher accuracy methodologies is too costly to allow their widespread use as standard methods to survey multiple conformations, let alone extensive conformational search algorithms. Some studies benchmark computational methods to better describe the conformation of gas-phase ions of specific systems using photodissociation techniques.[49−52] Some dynamic approaches to describe anharmonic oscillators are known to provide good results, as in the study of Grégoire and co-workers,[53] in which the IR spectra (fingerprint region) are predicted by considering the contribution of different populations of some peptides at 300 K as sampled by molecular dynamic simulations. Hernandez and co-workers performed Born–Oppenheimer molecular dynamics of an all-Ala b6 fragment of the AAAAAAMA peptide, also at 300 K.[54] The authors identified a Zundel-type H-bond by using electronic total energies and gradients generated by B3LYP/6-31G(d) level of theory and vibrational spectra generated by Fourier transform of the autocorrelation function of the first derivative of the dipole moment obtained from the molecular dynamic analysis. Galimberti and co-workers applied a similar approach but used the Fourier transform to the atomic polar tensors instead of the dipole autocorrelation function.[55] Even though there are more efficient approaches to describe the vibrational spectra of trapped ions via computational chemistry, as discussed above, our survey shows that most researchers still rely on DFT calculations to interpret IRMPD spectra (Figure ). This scenario is because the DFT approaches are less computationally expensive and because more rigorous calculation would not surpass the inherent resolution of the IRMPD technique.[34] Therefore, it is relevant to evaluate the effectiveness of the ongoing approaches of spectra prediction. This study focuses on the evaluation of the efficiency of some of the most used theoretical calculation methods (B3LYP, M06-2X, and MP2) in association with commonly used basis sets to verify the level of theory that best describes two ions known to have intramolecular interactions, that is, protonated biotin and protonated tryptophan. These ions were chosen based on the presence of poorly described intramolecular H-bonds in these systems (Figure ).
Figure 2

Structure of protonated biotin and tryptophan.

We also evaluated the effects caused by long-range correction (CAM-B3LYP[46]) and the dispersion correction factor proposed by Grimme (DFT-D3 schemes[56]), namely, B3LYP-D3, and as implemented in M06-2X and compared to M06-2X-D3. Anharmonicity calculations were also evaluated. Structure of protonated biotin and tryptophan. Besides that, IRMPD spectra of these ions have previously been reported[21,22] and could be used to allow for the validation of our experimental setup because these are the first IRMPD analyses reported by our group. As will be demonstrated, M06-2X/6-31+G(d,p) was the most cost effective of the methods surveyed to model our target systems, and the use of SFs remains the best approach to correct systematic frequency deviations, outperforming the use of dispersion corrections and extended basis sets.

Methodology

Experimental Methods

Direct-infusion electrospray ionization (ESI)-MS/MS experiments of ∼10–5 M solutions of biotin or tryptophan in methanol (Fluka) were carried out in a modified Bruker AmaZon SL ion trap mass spectrometer equipped with a perforated ring electrode. Two opposed 3 mm holes were aligned to a CaF2 window (Thorlabs, WG51050) mounted on the instrument vacuum manifold. This system allowed the trapped ions to be irradiated in the center of the trap and the fragments formed by photofragmentation at known wavelengths to be monitored. The wavelength values were written to the raw data file during the data acquisition via LabVIEW-based software that also controlled the laser wavelength and irradiation time in a fashion similar to that previously described in the literature.[57] The laser beam exiting the trap was guided by two 45°-mounted gold-plated mirrors (Thorlabs, NB1-L01) located inside the vacuum chamber, allowing the detection of the laser beam power going through the trap. This feature allowed for the evaluation of the actual laser power during the experiments, as needed for the correction of photofragmentation yield with laser power. A Continuum Surelite Nd:YAG 1064 nm laser (530 mJ/pulse, 10–40 ns, 10 Hz) was used to pump a LaserVision OPO/OPA system used to output IR radiation in the desired spectral range (2300–4000 cm–1, 14–21 mJ/pulse). The OPO/OPA system was factory-calibrated using a near-IR wavemeter at the zero-incidence position of the OPO crystals, allowing for internal calibration by readjustment of the zero-incidence condition when needed. Experiments were carried out at the higher mass resolution mode [enhanced mode—full width at half-maximum (fwhm) = 0.2] after calibration against tunemix ESI-L calibrant mixture (Agilent) in standard source and ion optics conditions. Photofragmentation efficiency at a given wavelength (WL), PEWL was calculated bywhere P represents parent ion intensities at a given WL and F represents fragment ion intensities at the same WL. Power-corrected PEWL was obtained by normalizing the laser beam power as measured by a 407A Spectra-Physics power meter.

Theoretical Methods

Preliminary conformational distribution survey was performed by the Spartan 16 computational package[58] at the MMF94 force field. The six lowest energy conformations from a total of 68 for biotin and 36 for tryptophan were then optimized at the B3LYP/6-31+G(d,p) level of theory using Gaussian09 (rev. D01)[59] computational package, default presets on our shared in-house 384 processor (AMD Opteron 6376) CentOS cluster. Anharmonic calculations were performed by adding the command freq = anharm option in Gaussian 09. The most stable conformer at the B3LYP/6-31+G(d,p) level of theory was then used as the initial geometry and reoptimized in all other methods here reported. All calculated geometries were checked to be true minima by vibrational analysis and showed no imaginary frequencies. Chemcraft (version 1.8)[60] was used to generate input coordinates and to visualize and export the calculated spectra. Table contains the SFs used. When available, the SFs proposed by Kashinski et al.[39] were used, as the authors demonstrated good agreement with experimental data. Other sources used were the work of Kieninger and Ventura[61] and NIST databank.[45]
Table 1

Scaling Factors Used To Correct the IR Frequency Calculations

method/basis6-31+G(d,p)6-311++G(d,p)6-311++G(3df,2pd)aug-cc-pVDZaug-cc-pVTZ
B3LYP0.964[45]0.963[39]0.969[61]0.970[45]0.968[45]
CAM-B3LYP0.953a0.953[39]0.953[39]0.956[39]0.954[39]
M06-2X0.947[39]0.947[39]0.947[39]0.950[39]0.956[45]
MP20.941[45]0.952a0.952[61]0.959[45]0.953[45]

These values were not found in the literature. For CAM-B3LYP/6-31+G(d,p), the SF used was the same for CAM-B3LYP/6-311++G(d,p). For MP2/6-311++G(d,p), the SF used was the same for MP2/6-311+G(d,p).

These values were not found in the literature. For CAM-B3LYP/6-31+G(d,p), the SF used was the same for CAM-B3LYP/6-311++G(d,p). For MP2/6-311++G(d,p), the SF used was the same for MP2/6-311+G(d,p). Experimental and calculated spectra were compared through the determination of the root-mean-square deviation (RMSD) of the predicted oscillatory frequency, in cm–1, and the experimental absorption band centroids as determined by Origin’s 9.0 “peak find” tool.[62] RMSD minimization was carried out by Microsoft Excel H&S 2016 Solver package.

Results and Discussion

IRMPD Analyses and Comparison with the Literature

As discussed in the Introduction, biotin and tryptophan (Figure ) have previously been analyzed by IRMPD spectroscopy by other research groups[21,22] and these data were used to validate our experimental IRMPD spectroscopy setup and the results are reported (Figure ).
Figure 3

IRMPD spectrums of protonated (A) biotin, (B) tryptophan power-corrected (both obtained by our group), (C) biotin, and (D) tryptophan power-corrected (both extracted from the literature). For (A,B), the fwhm values of the most intense peaks are 12 cm –1 (O–H stretch) and 9 cm–1 (N–H stretch), respectively.

IRMPD spectrums of protonated (A) biotin, (B) tryptophan power-corrected (both obtained by our group), (C) biotin, and (D) tryptophan power-corrected (both extracted from the literature). For (A,B), the fwhm values of the most intense peaks are 12 cm –1 (O–H stretch) and 9 cm–1 (N–H stretch), respectively. We replicated the protonated biotin IRMPD analysis made by Fraschetti and co-workers, and the spectrum obtained by our group shows a good agreement with their previous results, with the absorption band centroid differing by only 2–5 cm–1 (Table ).[21]
Table 2

Oscillator Assignment and Comparison between the Experimental and Literature Absorption Band Centroid Position in cm–1

 oscillatorliteratureexperimental
biotinσ C–H29602958
 σ C=O+–H···O31803183
 σ N–H (sym)34053406
 σ N–H (assym)34753480
 σ O–H35623567
    
tryptophanσ N–H···π3044a3059
 σ N–H···O3123a3121
 σ NH3 (assym)33403335
 σ N–H (indol)35003500
 σ O–H35553553

Extracted from the paper of Pereverzev et al.[51]

Extracted from the paper of Pereverzev et al.[51] Although the intensity pattern observed was discrepant, our system provided a greater sensitivity of the bands in the 2700–3300 cm–1 range, allowing the detection, with reasonable fragmentation efficiency, of bands previously depicted to be 2 orders of magnitude less intense (Figure A). The IRMPD spectrum analysis of tryptophan was performed by Mino and co-workers[22] and the experimental spectrum obtained by our group has also shown a good agreement with their results, with the absorption band centroid wavenumbers differing by 0–5 cm–1 (Figure B and Table ). The N–H···O and N–H···π stretches, nevertheless, were compared with the work of Pereverzev et al.[51] to circumvent the difficulty of defining the centroid of these rather broad bands. In their study at 5 K, they identified two conformers of tryptophan contributing to the IR spectra because both conformers differ in only 0.5 kcal mol–1. Our analysis showed a better matching with the wavenumbers shown by the higher energy conformer; therefore, these wavenumbers were used in our comparisons. Nevertheless, the difference in the RMSD reflected by the use of the other conformer would be irrelevant in comparison to the inaccuracy in detecting the centroid of such a broad band, as will be discussed in the next section. It is worth noting that the literature spectra of biotin and tryptophan were obtained by using 20 and 15 IR pulses, respectively, whereas in our experimental setup, only five pulses were used (Figure ). Therefore, differences in S/N and band intensities were expected. The spectrum of biotin in the literature is not power-corrected; that is why the comparison is made with the experimental uncorrected spectrum obtained by our group. However, the power-corrected spectrum for biotin is found in the Supporting Information (Figure S1).

Computational Results

The protonation of biotin was reported to take place at the carbonyl oxygen, causing the species to assume a folded conformation where the carboxylic acid carboxyl oxygen interacts with the proton (Figure ). The calculated IR spectrum was mainly affected by differences in the H-bond distance and the dihedral angle γ.
Figure 4

Geometry of protonated biotin (top) and tryptophan (bottom).

Geometry of protonated biotin (top) and tryptophan (bottom). Tryptophan is protonated in the amino acid primary nitrogen and the NH3 vibration modes are affected by the long-range H-bond and N–H···π interactions,[22,51] causing the region between 2800 and 3300 cm–1 to show a broad apparent band, which is the result of a series of different oscillations present in this region. Further, the existence of low-energy excited-state conformations causes additional broadening of the stretches involved in the van der Waals interactions. To perform the correction of the calculated spectra, SFs were obtained from multiple sources because there was not a unique source reporting all of the required values. Because different research groups have used different datasets to estimate the SF, one could expect the SFs to be dependent on the dataset used. Nevertheless, because of the use of extensive datasets, this variation can be neglected. The SFs obtained from the literature showed little variation between similar basis set calculations by the same method, which is consistent with the fact that the SF depends only weakly on the basis set used.[45] The biggest variation was observed from MP2/6-31+G(d,p) (SF = 0.941) to MP2/aug-cc-pVDZ (SF = 0.959)—both values were obtained from NIST. Comparison between experimental and calculated spectra showed that most of the calculations failed in describing the long-range interactions (Figure ). The C–H stretch was not used for comparisons because the calculated spectra showed several bands in this region and the broad experimental features would not allow the accurate assignment of the band centroid being compared (Figure and Table ).
Figure 5

Comparison between experimental and calculated spectra for (A) biotin and (B) tryptophan.

Table 3

RMSD of Predicted Band Centroids in cm–1, Excluding and Including [in Brackets] the Vibrations Affected by van der Waals Interactions, as a Function of the Methodology Used. Walltime of Optimization/Frequency Calculation in Min/Processor Are Reported within Parentheses

 method/basis6-31+G(d,p)6-311++G(d,p)6-311++G(3df,2pd)Aug-cc-pVDZaug-cc-pVTZ
biotinB3LYP45[47]31[48]57[59]44[51]42[43]b,c
  (72/46)(188/271)(1325/1346)(503/241)(4048/7839)
 CAM-B3LYP34[31]25[35]29[26]26[28]20[18]b,c
  (154/105)(344/131)(1564/1589)(443/417)(5857/4235)
 M06-2X21[31]26[44]28[24]26[28]31[32]b,c
  (104/68)(265/154)(2064/2666)(750/1080)(12062/5044)
 MP215[21]22[46]a6[16]a
  (681/1861)(647/2196)(11524b/a)(4415/9838) 
       
tryptophanB3LYP38[50]27[40]50[58]43[49]36[47]
  (73/36)(97/101)(1874/1236)(266/352)(2928/5632)
 CAM-B3LYP19[43]20[37]21[35]21[33]12[30]
  (83/61)(238/175)(1807/1417)(663/371)(1275/2016)
 M06-2X17[44]24[41]29[38]b21[43]24[50]
  (50/36)(163/120)(367b/524b)(597/440)(2642/6379)
 MP220[52]23[54]a21[44]a
  (1981/931)(2745/2128)(6829b/a)(4012/3456)(15439b/a)

Analyses did not finish because of the extensive time required.

Calculation performed using 16 processors. All other calculations used 8 processors as default.

Calculation performed using keyword “opt = calcfc” because of convergence problems.

Comparison between experimental and calculated spectra for (A) biotin and (B) tryptophan. Analyses did not finish because of the extensive time required. Calculation performed using 16 processors. All other calculations used 8 processors as default. Calculation performed using keyword “opt = calcfc” because of convergence problems. The RMSD for both systems was first calculated by not including the wavenumbers from the bands of N–H and O–H stretches. The RMSD was then recalculated to include the H-bond and N–H···π stretches, as shown within brackets in Table . Comparing these RMSD values with those obtained without the van der Waals interactions, we could observe that the deviation of biotin and tryptophan calculations was considerably greater in the second and that this difference could be attributed to the highly anharmonic intramolecular interactions, as described in the literature.[63−65] This suggestion was supported by the fact that CAM-B3LYP was able to describe the systems better than B3LYP because the CAM-B3LYP method corrects the term for long-range exchange interaction.[46] When the RMSD was calculated without using the H-bond stretching band, all of the RMSD values decreased, unless for the CAM-B3LYP values for biotin. This behavior indicates that the contribution of the H-bond band to the total deviation was smaller for CAM-B3LYP in comparison to the other methods. For tryptophan, this trend was not observed, and even though the CAM-B3LYP better described the van der Waals interactions, the RMSD values were greatly decreased for the smaller basis set when the interaction bands were not considered. The 6-311++G(3df,2pd) basis set provided the worst results when associated with the B3LYP method, even though it is an extended basis set with diffuse functions in all atoms. It showed good agreement when used with other functionals, despite being a high computationally expensive basis set. It is noticeable that the 6-311++G(d,p) basis set did not provide good results when H-bond was taken into account. Nevertheless, when the H-bond was removed from the RMSD calculation, 6-311++G(d,p) was, in most of the cases, the better basis set to use with B3LYP-based methods in both systems. MP2 calculations showed good agreement with experimental spectra, but the time required to perform these calculations was significantly higher compared to other calculations, rendering them inefficient for extensive molecular systems. Nonetheless, because MP2/aug-cc-PVDZ was the level of theory that better described the experimental spectra, we can rely on this information to obtain a more precise value of hydrogen-bond distance and angle γ for protonated biotin in gaseous state, at 1.71 Å and 75°, respectively. The aug-cc-pVTZ basis set provided good results in the calculations of tryptophan and biotin IR spectra. CAM-B3LYP, once more, was shown to behave well when associated with an extended basis set. However, calculations using this basis set were highly time-consuming and sometimes produced RMSDs approximately the same or even worse than calculations performed with a smaller basis set. Because of the expensive computational cost, we were not able to perform calculations using this basis set associated to MP2. M06-2X and CAM-B3LYP were the methods showing better results in a reasonable time. M06-2X/6-31+G(d,p) was the level of theory that was demonstrated to have the best compromise between low error and computational cost. In fact, it was the level of theory with the best results for tryptophan and the best of the DFT-based methods for biotin when long-range interactions were not considered. CAM-B3LYP/aug-cc-pVDZ was more computationally expensive; however, it is a reasonable choice when one tries to study intramolecular interactions such as hydrogen bonds within a practical time. It is worth noting that tryptophan intramolecular band centroid determination is affected by the possible existence of multiple conformations, as discussed before. This behavior can be depicted by the higher RMSD observed when this band is considered for the RMSD calculation, as shown within brackets in Table .

Dispersion and Anharmonic Corrections

Because it is well-established that the usual DFT calculations such as B3LYP are not effective in describing long-range interactions, we evaluated two approaches used to overcome errors caused by these effects, other than the use of SFs. Our survey showed that anharmonic frequency calculations and dispersion corrections have recently been used by researchers for IRMPD spectroscopy assignment. The anharmonic calculation is evaluated because of the enhanced anharmonic character of the intramolecular interactions, causing redshifts in the frequencies of these oscillations. The empirical dispersion correction, as proposed by Grimme for the DFT functional (termed DFT-D),[56,66] is used to model systems containing long-range interactions, resulting in a more reliable conformation after optimization. To compare the efficiency of these methods in predicting the experimental spectra, we used unscaled theoretical spectra and compared the results of these methods, as shown in Table .
Table 4

RMSD of Unscaled Predicted Band Centroids in cm–1 as a Function of the Methodology Used. Walltime of Optimization/Frequency Calculation in Min/Processor Are Reported within Parentheses

 method/basis6-31+G(d,p)6-311++G(d,p)6-311++G(3df,2pd)aug-cc-pVDZ
biotinB3LYP175176170160
  (72/46)(188/271)(1325/1346)(503/241)
 M06-2X211217200197
  (104/68)(265/154)(2064/2666)(750/1080)
 B3LYP-D3168169163151
  (110/71.5)(215/222)(1086/1027)(578/381)
 M06-2X-D3211217200197
  (120/86)(199/150)(2233/1753)(654/887)
 B3LYP90   
 anharmonic(28/9434)   
 M06-2X145   
 anharmonic(55/9717)   
      
tryptophanB3LYP173165165152
  (73/36)(97/101)(1874/1236)(266/352)
 M06-2X217212205a200
  (50/36)(163/120)(367a/524a)(597/440)
 B3LYP-D3180172171158
  (42/44)(83/128)(352/509)(265/569)
 M06-2X-D3217212205a200
  (45/64)(62/93)(219a/496a)(291/616)
 B3LYP36   
 anharmonic(32/6581)   
 M06-2X183   
 anharmonic(39/7919)   

Calculation performed using 16 processors. All other calculations used 8 processors as default.

Calculation performed using 16 processors. All other calculations used 8 processors as default. DFT-D3 was shown to be almost equally time-consuming as the calculations performed without using the empirical dispersion correction, being even quicker in some occasions. For both systems, this correction factor did not improve the calculations performed with M06-2X, which was to be expected because this functional already includes dispersion correction in its formulation. However, when applied to B3LYP, different trends of RMSD value were observable for each system. For biotin, a comparison of each oscillator shows that the D3 correction improved the description of the H-bond, whereas it worsened the description of N–H symmetric and asymmetric stretches—mainly the first one—as detailed in the Supporting Information (Tables S3 and S4). The free O–H stretch showed almost no variation. The RMSDs in tryptophan calculation were larger when using D3 correction, and once more, a worsening in the description of N–H stretches is observed, mainly the H-Bond (N–H···O) oscillator. The optimization steps for the harmonic and anharmonic calculation schemes are the same, and therefore, the computational cost remains practically unchanged for both methods. The frequency calculations, nevertheless, were time-consuming. The anharmonic calculations showed improvement in the prediction of the vibrational spectra of both molecules, but the results were just as satisfactory as the use of SFs for tryptophan when using B3LYP. In fact, anharmonic calculations do not always show better results than harmonic calculations, as demonstrated in the work of Buczek et al.[67] As the authors demonstrated, anharmonic calculations were not suitable to be used with a small basis set but provided better spectra from the 6-31+G(d,p) basis set to more extended ones. Nevertheless, we find it worth noting that they used smaller systems, namely, water and formaldehyde. As this theoretical calculation showed itself to be too expensive, we were not able to perform it with a more extended basis set. Using dispersion corrections did not provide better results than using SFs. These results suggest that the calculations have to be doubly corrected by using the dispersion correction and SF. Johnson et al.[68] even proposed some SFs for anharmonic vibrations—in which the SF to B3LYP/6-31+G(d,p) is virtually 1. On the other hand, we believe that if the intention is to predict vibrational spectra and a SF is necessary, a practical choice would be to perform a less expensive calculation and use a well-established SF.

Performance at the Low-Frequency and Fingerprint Regions

Because IRMPD data are also collected in the lower frequency range by predissociation spectroscopy or using free-electron lasers, we used experimental data from the literature to evaluate whether the most efficient theoretical calculation we suggested for the prediction of long-range interactions is also suitable for predicting the experimental spectra over the fingerprint region (Table ). The long-range interaction effect is also expected to play a relevant role in this spectral region. Fraschetti and co-workers have assessed the carbonyl stretch region of biotin and demonstrated that the wavenumbers of these oscillations are affected by the H-bond formed after protonation of oxygen.[21]
Table 5

Comparison between Experimental Data from the Literature and the Most Efficient Calculations in the Fingerprint Region at M06-2X/6-31+G(d,p) and CAM-B3LYP/aug-cc-pVTZ Levels of Theory (in cm−1)

 oscillatorliteratureM06-2XCAM-B3LYP
biotinδ N1H2 and N3H2 (scissoring); C2–N3 (stretching)163216011612
 σ C2=O2164216481646
 σ C10=O10170617001693
     
tryptophanσ C=C; δ C–H (wag); O–H (wag)142513911399
 δ NH3 (umbrella)144114141435
 σ C=O178517821751
As can be seen in Table , both levels of theory provided a good prediction of IR spectra, reproducing almost the same RMSD as those obtained at higher frequencies—25 and 19 cm–1 for biotin and tryptophan in M06-2X/6-31+G(d,p), respectively, and 25 and 14 cm–1 for biotin and tryptophan in CAM-B3LYP/aug-cc-pVDZ, respectively. The analysis of tryptophan in the fingerprint region was performed by Pereverzev et al.[51] As explained previously, their group identified two conformers of tryptophan when using a cryogenic IR–IR–UV system, allowing them to obtain highly resolved spectra. We concluded that conformer A better matched our experiments, and therefore, we used its bands to perform our comparisons. In this case, M06-2X/6-31+G(d,p) better described the C=O stretch, whereas CAM-B3LYP/aug-cc-pVDZ provided a better prediction of the fingerprint bands.

Use of “Fitted” Scaling Factors

Besides the use of well-established SFs from the literature that are based on test sets and therefore are less sample dependent, a widespread procedure in the literature is to fit the SFs for minimizing the RMSDs (what we call RMSD fitting in this manuscript) or scaling the theoretical data so that one chosen band perfectly fits the experimental data (band fitting). To check the performance of these alternative methodologies, we carried out the RMSD minimization procedures and compared the RMSD obtained by these methods to the well-stablished SFs as shown in Figure . The SFs obtained for each optimization can be found in the Supporting Information (Table S5).
Figure 6

RMSD comparison for different SFs for biotin (a) and tryptophan (b) for the tested methodologies. Literature SFs are in orange, RMSD fitting is in blue, and peak fitting is in gray.

RMSD comparison for different SFs for biotin (a) and tryptophan (b) for the tested methodologies. Literature SFs are in orange, RMSD fitting is in blue, and peak fitting is in gray. The most striking characteristic of Figure is that the fitted procedures produce minimal RMSD values for B3LYP and CAM-B3LYP that are virtually the same as the one found for M06-2X/aug-cc-pVDZ for biotin. Those values are much smaller than the RMSDs found for the ab initio MP2 method, even so it is not correct to assume that MP2 performance is worse than the hybrid functionals. Nevertheless, the B3LYP-based methodology performance is not unexpected in this case, as the fitting procedures produce smaller RMSD by their construction. Another relevant information is that by comparing the literature SF results and fitted results for both systems, one can clearly see that the B3LYP-based functionals are corrected in greater extension than M06-2X and MP2, which suggests that the B3LYP-based functionals are more sample-dependent than the other methodologies. By comparing the overall results for biotin, B3LYP, CAM-B3LYP, and M06-2X, we have achieved the minimum RMSD for the fitted procedure. CAM-B3LYP is as good as B3LYP for the smallest and greatest basis set but not for the intermediate ones. For tryptophan, the fitted RMSD procedures for the B3LYP-based functionals perform better than all other methods. These analyses also show that the SF has an important role and that further studies may be necessary to obtain IRMPD tailored SFs. An interesting result is that if one chose to use sample-dependent fitted SFs, the B3LYP methodology would give the best results, despite the unsuitability of this methodology for hydrogen bonds as previously depicted. Even if the use of sample-specific fitted SFs provides better results, their use for method comparison is troublesome because any differences from the methodologies are leveled off by the different SFs.

Conclusions

When performing theoretical calculations, it is a logical decision to rely on already established methodologies. B3LYP has stood out as an efficient approach to obtain good prediction spectra in a short time, compared to ab initio methods. Nonetheless, B3LYP presents some disadvantages, such as a bad description of van der Waals interactions and flawed calculation of energy, which could lead to an inaccurate structural assignment. Our results suggest that M06-2X/6-31+G(d,p) could be a better DFT level of theory to be used in predicting vibrational spectra. Comparing this level of theory with B3LYP/6-31+G(d,p), we observed that the increase in the time required was not too substantial, whereas the decrease in the RMSD was significant. On the other hand, if a better description of a long-range interaction is intended, CAM-B3LYP seems to be the better choice, mainly when associated with the augmented double or triple zeta correlation-consistent aug-cc-pVDZ or aug-cc-pVTZ. We tested two alternatives for correcting the dispersion effects. The empirical dispersion correction proposed by Grimme for DFT methods (DFT-D3) improved the B3LYP analyses of biotin slightly, but on the other hand, it increased the RMSD of the predicted spectra of tryptophan. Despite obtaining a better prediction of the of the H-bond oscillator, this calculation worsened the description of N–H stretches. Anharmonic calculations provided better predicted vibrational spectra than did harmonic calculations when the 6-31+G(d,p) basis set was used. However, the computational cost was hugely increased. Nonetheless, none of these approaches were more efficient than the use of SF. It should be noticed that a minimum RMSD may be obtained by fitting the SF for a specific sample. In this case, our results show that the most cost-effective B3LYP/6-31+G(d,p) method performs better based on the RMSD results than the other methodologies tested, including MP2 calculations, despite their intrinsic inaccuracy.[37,38,43,44]
  4 in total

1.  Total Synthesis of (±)-Phyllantidine: Development and Mechanistic Evaluation of a Ring Expansion for Installation of Embedded Nitrogen-Oxygen Bonds.

Authors:  Kyle M Lambert; Joshua B Cox; Lin Liu; Amy C Jackson; Sam Yruegas; Kenneth B Wiberg; John L Wood
Journal:  Angew Chem Int Ed Engl       Date:  2020-05-08       Impact factor: 15.336

2.  IRMPD Spectroscopy of Bare Monodeprotonated Genistein, an Antioxidant Flavonoid.

Authors:  Roberto Paciotti; Barbara Chiavarino; Cecilia Coletti; Debora Scuderi; Nazzareno Re; Davide Corinti; Lucretia Rotari; Simonetta Fornarini; Maria Elisa Crestoni
Journal:  ACS Omega       Date:  2022-05-30

3.  Stabilization of Glucosyl Dioxolenium Ions by "Dual Participation" of the 2,2-Dimethyl-2-(ortho-nitrophenyl)acetyl (DMNPA) Protection Group for 1,2-cis-Glucosylation.

Authors:  Wouter A Remmerswaal; Kas J Houthuijs; Roel van de Ven; Hidde Elferink; Thomas Hansen; Giel Berden; Herman S Overkleeft; Gijsbert A van der Marel; Floris P J T Rutjes; Dmitri V Filippov; Thomas J Boltje; Jonathan Martens; Jos Oomens; Jeroen D C Codée
Journal:  J Org Chem       Date:  2022-06-24       Impact factor: 4.198

4.  AB-DB: Force-Field parameters, MD trajectories, QM-based data, and Descriptors of Antimicrobials.

Authors:  Silvia Gervasoni; Giuliano Malloci; Andrea Bosin; Attilio V Vargiu; Helen I Zgurskaya; Paolo Ruggerone
Journal:  Sci Data       Date:  2022-04-01       Impact factor: 6.444

  4 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.