Matevž Pesek¹, Andraž Juvan¹, Jure Jakoš², Janez Košmrlj², Matija Marolt¹, Martin Gazvoda²
Abstract
Herein, we report a computational algorithm that follows the process a spectroscopist uses to elucidate the structure of an organic molecule from IR, 1H and 13C NMR, and MS tabular data. The algorithm does not depend on database searching; it takes a bottom-up approach, building the molecular structure from small structural fragments visible in the spectra. It combines an analytical combinatorial approach with a graph search technique to determine, on the basis of the NMR spectra, how the identified structural fragments are connected into a molecular structure. After the process is completed, the interface lists the candidate compounds, which are visualized within the interface by the WolframAlpha computational knowledge engine. The candidates are ranked according to predefined rules for analyzing the spectral data. The developed elucidator has a user-friendly web interface and is publicly available (http://schmarnica.si).
Year: 2020; PMID: 33378192; PMCID: PMC7903418; DOI: 10.1021/acs.jcim.0c01332
Source DB: PubMed; Journal: J Chem Inf Model; ISSN: 1549-9596; Impact factor: 4.956
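The bottom-up step described in the abstract, joining fragments through their free bonds and keeping only connected results, can be illustrated with a toy Python sketch. It is not the authors' implementation: the fragment ids, the `assemble` and `is_connected` helpers, and the representation of each fragment as a count of open bonds are hypothetical simplifications, and the sketch ignores the NMR-based constraints the paper uses to decide which connections are plausible.

```python
from collections import Counter, deque

Bond = tuple[str, str]  # a single bond between two (hypothetical) fragment ids


def is_connected(fragment_ids: list[str], bonds: list[Bond]) -> bool:
    """Breadth-first search: True if every fragment is reachable from the first one."""
    if not fragment_ids:
        return True
    neighbors: dict[str, set[str]] = {f: set() for f in fragment_ids}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen = {fragment_ids[0]}
    queue = deque([fragment_ids[0]])
    while queue:
        for nxt in neighbors[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(fragment_ids)


def assemble(open_bonds: dict[str, int]) -> set[frozenset]:
    """Enumerate every connected way to join fragments through their open bonds.

    open_bonds maps a hypothetical fragment id to its number of free valences.
    Each result is a frozenset of ((fragment_a, fragment_b), bond_order) items,
    so a double bond appears as one pair with bond_order 2.
    """
    fragment_ids = list(open_bonds)
    # One "stub" per free valence, e.g. a CH2 fragment contributes two stubs.
    stubs = [f for f, n in open_bonds.items() for _ in range(n)]
    results: set[frozenset] = set()

    def pair_up(remaining: list[str], bonds: list[Bond]) -> None:
        if not remaining:
            if is_connected(fragment_ids, bonds):
                orders = Counter(tuple(sorted(b)) for b in bonds)
                results.add(frozenset(orders.items()))
            return
        first, rest = remaining[0], remaining[1:]
        for i, other in enumerate(rest):
            if other != first:  # a fragment may not bond to itself
                pair_up(rest[:i] + rest[i + 1:], bonds + [(first, other)])

    pair_up(stubs, [])
    return results


# Propan-1-ol split into four fragments with their free valences.
propanol = {"CH3": 1, "CH2a": 2, "CH2b": 2, "OH": 1}
print(len(assemble(propanol)))  # 2: CH3-CH2a-CH2b-OH and CH3-CH2b-CH2a-OH
```

Both propan-1-ol results describe the same molecule with the two CH2 labels swapped, so a practical elucidator would also have to discard isomorphic duplicates. The exhaustive pairing also grows combinatorially with the number of open bonds, which is why pruning checks such as those in the optimization table matter.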
Figure 1. Flowchart of the proposed structure elucidator.
Definition of the Input of the Algorithm from 1H NMR, IR, and MS, along with Optional 13C NMR and Sources
| data source | data |
|---|---|
| 1H NMR | shift: value from the spectrum [ppm] |
| 1H NMR | count: integral [number of H] |
| 1H NMR | splitting: number of peaks − 1 (number of neighboring H) |
| IR | frequency: position of a peak in the IR spectrum [cm⁻¹] |
| IR | broad: true if the absorption band is broad |
| MS | mass of the molecule [g/mol] |
| 13C NMR (optional) | shift: value from the spectrum [ppm] |
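One way to hold the inputs listed in the table in code is sketched below in Python. The class and field names are hypothetical; they mirror the table rather than any published Schmarnica interface. The splitting field follows the n + 1 rule (a signal with n neighboring H shows n + 1 peaks). The isopropyl acetate values are approximate typical shifts and a typical C=O band position, used only for illustration and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HNMRSignal:
    shift: float    # chemical shift [ppm]
    count: int      # integral, i.e. number of H in the signal
    splitting: int  # number of peaks - 1 = number of neighboring H (n + 1 rule)


@dataclass(frozen=True)
class IRBand:
    frequency: float     # band position [cm^-1]
    broad: bool = False  # True if the absorption band is broad


@dataclass
class SpectralInput:
    h_nmr: list[HNMRSignal]
    ir: list[IRBand]
    mass: float                          # molar mass from MS [g/mol]
    c_nmr: Optional[list[float]] = None  # optional 13C shifts [ppm]

    def total_hydrogens(self) -> int:
        """Sum of the 1H integrals, i.e. the H count the structure must match."""
        return sum(s.count for s in self.h_nmr)


def splitting_from_multiplicity(peaks: int) -> int:
    """Convert an observed multiplicity (1 = singlet, 2 = doublet, ...) to H neighbors."""
    if peaks < 1:
        raise ValueError("a signal has at least one peak")
    return peaks - 1


# Illustrative (approximate) data for isopropyl acetate, C5H10O2.
isopropyl_acetate = SpectralInput(
    h_nmr=[
        HNMRSignal(shift=4.99, count=1, splitting=splitting_from_multiplicity(7)),  # septet
        HNMRSignal(shift=2.01, count=3, splitting=splitting_from_multiplicity(1)),  # singlet
        HNMRSignal(shift=1.22, count=6, splitting=splitting_from_multiplicity(2)),  # doublet
    ],
    ir=[IRBand(frequency=1735)],
    mass=102.13,
)
print(isopropyl_acetate.total_hydrogens())  # 10
```

Keeping the 13C list optional mirrors the table, where 13C NMR is the only optional data source.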
Figure 2. Schematic presentation of the elucidation algorithm. Each color denotes one of the five steps.
Rules for Computation of the Relevance Score of a Candidate Compound
| rule | description | penalty on the relevance score |
|---|---|---|
| graph contains unconnected edges (rule 1) | one or more bonds between fragments remain unconnected (applies to compounds with fewer than 5 unconnected edges) | −0.5% per unconnected edge |
| sum of differences in ppm (rule 2) | difference in NMR shift values (ppm) between connected fragments within the compound; the larger the ppm difference, the lower the relevance | −0.5% per 1 ppm difference |
| difference between the mass of the constructed compound and the sum of the fragments' masses (rule 3) | the larger the difference in masses, the lower the relevance | −0.5% per 1 g/mol difference |
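Read literally, the three rules subtract a fixed 0.5% penalty per unit of each discrepancy. A minimal Python sketch of that reading follows. It assumes the penalties are additive from a starting score of 100%, that the score is floored at 0, and, since the table does not say what happens otherwise, that candidates with 5 or more unconnected edges are simply not scored. The function name and candidate values are hypothetical.

```python
from typing import Optional

PENALTY_PCT = 0.5  # penalty per unit for every rule, as listed in the table


def relevance_score(unconnected_edges: int,
                    ppm_difference_sum: float,
                    mass_difference: float) -> Optional[float]:
    """Hypothetical reading of rules 1-3: 100 % minus linear penalties.

    unconnected_edges  -- rule 1, bonds left unconnected between fragments
    ppm_difference_sum -- rule 2, summed shift differences [ppm] between connected fragments
    mass_difference    -- rule 3, constructed mass minus summed fragment masses [g/mol]
    """
    if unconnected_edges >= 5:
        return None  # assumption: rule 1 only covers fewer than 5 unconnected edges
    penalty = PENALTY_PCT * (unconnected_edges + ppm_difference_sum + abs(mass_difference))
    return max(0.0, 100.0 - penalty)


# Hypothetical candidates: (unconnected edges, ppm difference sum, mass difference).
candidates = {"A": (0, 1.5, 0.0), "B": (1, 3.0, 2.0), "C": (6, 0.0, 0.0)}
scores = {name: relevance_score(*data) for name, data in candidates.items()}
ranked = sorted((n for n, s in scores.items() if s is not None),
                key=lambda n: scores[n], reverse=True)
print(scores)  # {'A': 99.25, 'B': 97.0, 'C': None}
print(ranked)  # ['A', 'B']
```

Because every rule uses the same per-unit penalty, one ppm of shift mismatch weighs exactly as much as one unconnected edge or one g/mol of mass mismatch under this reading.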
Figure 3. Simplified elucidation process with the proposed algorithm for 3-(4-chlorophenyl)propan-1-ol (1).
Figure 4. Selected examples of resolved structures with IR, 1H NMR, 13C NMR, and MS data from the literature.
Figure 5. Snapshots from the Schmarnica user interface (http://schmarnica.si/, accessed Nov 13, 2020): IR, 1H NMR, and MS data input, with optional input of 13C NMR data (above), and presentation of the elucidated structure by WolframAlpha (below) for the case of isopropyl acetate.
Computational complexity and developed optimizations
Analysis of Impact of the Cumulative Optimization Techniques on the Algorithm’s Performance
| optimization type | total time for all tests [s] | average time per elucidation [s] | average number of candidates per test |
|---|---|---|---|
| none (baseline) | 341.04 | 8.53 | 45.2 |
| weight of compound candidate | 21.66 | 0.54 | 3.3 |
| Erdős–Gallai connectivity check | 12.88 | 0.32 | 2.1 |
| tree realization check | 11.45 | 0.29 | 1.9 |
| 13C NMR (additional data) | 11.38 | 0.28 | 1.6 |
Results are reported for 40 tested compounds.
Times are reported in seconds.
"Test" refers to the elucidation of a single compound; the last column gives the average number of candidates per compound over the set of 40 tested compounds.
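The Erdős–Gallai and tree-realization optimizations rest on two classical tests on degree sequences, sketched below in Python. The Erdős–Gallai theorem states that a non-increasing sequence d_1 ≥ d_2 ≥ ... ≥ d_n is the degree sequence of a simple graph if and only if its sum is even and, for every k, the sum of the first k terms is at most k(k − 1) + Σ_{i>k} min(d_i, k). A sequence of n ≥ 2 positive integers is the degree sequence of a tree if and only if it sums to 2(n − 1). How the paper maps fragments and bond orders onto these tests is not described here, so treating each fragment's open-bond count as a vertex degree is an assumption; note also that the Erdős–Gallai test applies to simple graphs, whereas molecules can contain multiple bonds.

```python
def is_graphical(degrees: list[int]) -> bool:
    """Erdős–Gallai test: can these vertex degrees be realized by a simple graph?"""
    d = sorted(degrees, reverse=True)
    if any(x < 0 for x in d) or sum(d) % 2 == 1:
        return False
    for k in range(1, len(d) + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True


def is_tree_realizable(degrees: list[int]) -> bool:
    """True if the degrees can be realized by a tree (connected and acyclic)."""
    n = len(degrees)
    if n == 1:
        return degrees[0] == 0  # a single isolated vertex is a trivial tree
    # n >= 2: every vertex needs at least one edge, and a tree has n - 1 edges.
    return n >= 2 and all(x >= 1 for x in degrees) and sum(degrees) == 2 * (n - 1)


# Free-valence counts of the propan-1-ol fragments CH3, CH2, CH2, OH.
print(is_graphical([1, 2, 2, 1]), is_tree_realizable([1, 2, 2, 1]))  # True True
# Two degree-3 vertices would each need all three other vertices as neighbors.
print(is_graphical([3, 3, 1, 1]))  # False
```

Both checks run in at most quadratic time in the number of fragments, so they can reject impossible fragment sets cheaply before any combinatorial bond pairing is attempted.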