Year: 2007 PMID: 17389044 PMCID: PMC1851972 DOI: 10.1186/1471-2105-8-105
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Restrictions on the number of elements during formula generation for small molecules, based on examination of the DNP and Wiley mass spectral databases. For each element, the higher of the two counts was taken as the element restriction for rule #1
| Mass range [Da] | Library | C max | H max | N max | O max | P max | S max | F max | Cl max | Br max | Si max |
| < 500 | DNP | 29 | 72 | 10 | 18 | 4 | 7 | 15 | 8 | 5 | |
| < 500 | Wiley | 39 | 72 | 20 | 20 | 9 | 10 | 16 | 10 | 4 | 8 |
| < 1000 | DNP | 66 | 126 | 25 | 27 | 6 | 8 | 16 | 11 | 8 | |
| < 1000 | Wiley | 78 | 126 | 20 | 27 | 9 | 14 | 34 | 12 | 8 | 14 |
| < 2000 | DNP | 115 | 236 | 32 | 63 | 6 | 8 | 16 | 11 | 8 | |
| < 2000 | Wiley | 156 | 180 | 20 | 40 | 9 | 14 | 48 | 12 | 10 | 15 |
| < 3000 | DNP | 162 | 208 | 48 | 78 | 6 | 9 | 16 | 11 | 8 | |
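As a rough illustration of how the element count restriction (rule #1) could be applied, the Python sketch below takes the per-element maximum of the DNP and Wiley rows and rejects any candidate formula that exceeds it. The column order C, H, N, O, P, S, F, Cl, Br, Si, the helper names, and the treatment of the blank DNP Si cell as 0 are assumptions made for this example; it is not the authors' software. Only the three mass ranges that list both libraries are included.

```python
import re

# Assumed column order of the table (not stated in the rows themselves).
ELEMENTS = ["C", "H", "N", "O", "P", "S", "F", "Cl", "Br", "Si"]

# Rows copied from the table; the blank DNP Si cell is treated as 0 (assumption).
TABLE = {
    500: {"DNP": [29, 72, 10, 18, 4, 7, 15, 8, 5, 0],
          "Wiley": [39, 72, 20, 20, 9, 10, 16, 10, 4, 8]},
    1000: {"DNP": [66, 126, 25, 27, 6, 8, 16, 11, 8, 0],
           "Wiley": [78, 126, 20, 27, 9, 14, 34, 12, 8, 14]},
    2000: {"DNP": [115, 236, 32, 63, 6, 8, 16, 11, 8, 0],
           "Wiley": [156, 180, 20, 40, 9, 14, 48, 12, 10, 15]},
}


def parse_formula(formula):
    """Turn a flat formula such as 'C6H12O6' into {'C': 6, 'H': 12, 'O': 6}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def max_counts(mass_limit):
    """Per-element limit: the higher of the DNP and Wiley counts."""
    rows = TABLE[mass_limit]
    return {el: max(d, w) for el, d, w in zip(ELEMENTS, rows["DNP"], rows["Wiley"])}


def passes_element_count(formula, mass_limit):
    """True if every element is listed in the table and within its limit."""
    limits = max_counts(mass_limit)
    counts = parse_formula(formula)
    return all(el in limits and n <= limits[el] for el, n in counts.items())


print(passes_element_count("C6H12O6", 500))
print(passes_element_count("C40H10", 500))
```

Elements that do not appear in the table (for example iodine) are rejected outright here; a real formula generator might handle them differently.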
Figure 1. Isotopic pattern of 45,000 compound formulas from the Wiley mass spectral database and 60,000 peptide formulas in the small molecule space < 1000 Dalton. M+1 and M+2 are given as relative abundances in [%] and are normalized to 100% of the highest isotope abundance in the molecular formula.
Figure 2. Hydrogen/carbon ratio (H/C) for 42,000 diverse molecules (containing C, H, N, S, O, P, F, Cl, Br, I, Si) taken from the Wiley mass spectral library.
Common element ratios obtained from 45,000 formulas comprising the Wiley mass spectral database for the mass range 30 Da – 1500 Da
| Element ratio | Common range | Extended range | Extreme range |
| H/C | 0.2–3.1 | 0.1–6 | < 0.1 and 6–9 |
| F/C | 0–1.5 | 0–6 | > 1.5 |
| Cl/C | 0–0.8 | 0–2 | > 0.8 |
| Br/C | 0–0.8 | 0–2 | > 0.8 |
| N/C | 0–1.3 | 0–4 | > 1.3 |
| O/C | 0–1.2 | 0–3 | > 1.2 |
| P/C | 0–0.3 | 0–2 | > 0.3 |
| S/C | 0–0.8 | 0–3 | > 0.8 |
| Si/C | 0–0.5 | 0–1 | > 0.5 |
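The ratio table lends itself to a simple range check. The sketch below is a hedged example, assuming that the first numeric column is the commonly observed range and the second a wider, extended range; the dictionary names and the decision to reject carbon-free formulas are assumptions of this example.

```python
# Element/carbon ratio ranges copied from the table.
# COMMON = first numeric column, EXTENDED = second numeric column (assumed meaning).
COMMON = {
    "H": (0.2, 3.1), "F": (0, 1.5), "Cl": (0, 0.8), "Br": (0, 0.8),
    "N": (0, 1.3), "O": (0, 1.2), "P": (0, 0.3), "S": (0, 0.8), "Si": (0, 0.5),
}
EXTENDED = {
    "H": (0.1, 6), "F": (0, 6), "Cl": (0, 2), "Br": (0, 2),
    "N": (0, 4), "O": (0, 3), "P": (0, 2), "S": (0, 3), "Si": (0, 1),
}


def passes_ratio_check(counts, ranges=COMMON):
    """Check every element/carbon ratio of a formula against the given ranges.

    `counts` maps element symbols to atom counts, e.g. {"C": 6, "H": 12, "O": 6}.
    """
    carbon = counts.get("C", 0)
    if carbon == 0:
        return False  # ratios are undefined without carbon (assumption: reject)
    for element, (low, high) in ranges.items():
        ratio = counts.get(element, 0) / carbon
        if not low <= ratio <= high:
            return False
    return True


glucose = {"C": 6, "H": 12, "O": 6}
methane = {"C": 1, "H": 4}
print(passes_ratio_check(glucose))            # H/C = 2.0, O/C = 1.0
print(passes_ratio_check(methane))            # H/C = 4.0 is outside the common range
print(passes_ratio_check(methane, EXTENDED))  # but inside the extended range
```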
Multiple element count restriction for compounds < 2000 Da, based on the examination of the Beilstein database and the Dictionary of Natural Products
| Element counts | Heuristic rule | Database examples |
| NOPS all > 1 | N < 10, O < 20, P < 4, S < 3 | C15H34N9O8PS, C22H44N4O14P2S2, C24H38N7O19P3S |
| NOP all > 3 | N < 11, O < 22, P < 6 | C20H28N10O21P4, C10H18N5O20P5 |
| OPS all > 1 | O < 14, P < 3, S < 3 | C22H44N4O14P2S2, C16H36N4O4P2S2 |
| PSN all > 1 | P < 3, S < 3, N < 4 | C22H44N4O14P2S2, C16H36N4O4P2S2 |
| NOS all > 6 | N < 19, O < 14, S < 8 | C59H64N18O14S7 |
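One possible reading of this table is as a list of conditional limits: when all elements in a group exceed a threshold, each of them must stay below its stated limit. The sketch below encodes that reading with strict inequalities exactly as printed; the data layout and function name are assumptions of this example. Under this literal reading, some listed examples sit on the boundary (C22H44N4O14P2S2 has O = 14 and N = 4, against O < 14 and N < 4), so the limits in those rows may have been meant inclusively.

```python
# Each rule: (trigger thresholds, upper limits), copied from the table.
# A rule applies only if every trigger element count is strictly above its threshold.
RULES = [
    ({"N": 1, "O": 1, "P": 1, "S": 1}, {"N": 10, "O": 20, "P": 4, "S": 3}),
    ({"N": 3, "O": 3, "P": 3}, {"N": 11, "O": 22, "P": 6}),
    ({"O": 1, "P": 1, "S": 1}, {"O": 14, "P": 3, "S": 3}),
    ({"P": 1, "S": 1, "N": 1}, {"P": 3, "S": 3, "N": 4}),
    ({"N": 6, "O": 6, "S": 6}, {"N": 19, "O": 14, "S": 8}),
]


def passes_multi_element_check(counts):
    """Apply the conditional limits to a dict of element counts."""
    for thresholds, limits in RULES:
        triggered = all(counts.get(el, 0) > t for el, t in thresholds.items())
        if triggered and not all(counts.get(el, 0) < lim for el, lim in limits.items()):
            return False
    return True


# C20H28N10O21P4, one of the listed database examples
print(passes_multi_element_check({"C": 20, "H": 28, "N": 10, "O": 21, "P": 4}))
# A hypothetical formula with too many nitrogens for the NOP rule
print(passes_multi_element_check({"C": 20, "H": 28, "N": 12, "O": 5, "P": 4}))
```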
Figure 3. Frequency distribution of the molecular masses of all elemental compositions downloaded from the PubChem database (2006), covering more than 5 million single compounds.
Number of molecular formulas in the ranges < 500, < 1000 and < 2000 Da for the elements CHNSOP with maximum valencies (vN = 5, vS = 6, vP = 5) and maximum element counts. The last column applies the element count restrictions from either the DNP or the Wiley database
| Mass range (u) | Maximum number of molecular formulas | With element ratio check | With probability check | With probability check + element count restriction |
| 500 | 2,707,540 | 1,772,483 | 729,617 | 724,270 |
| 1000 | 139,735,355 | 87,888,303 | 32,555,050 | 30,077,741 |
| 2000 | 7,995,776,805 | 4,926,973,096 | 1,170,870,061 | 623,270,049 |
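To make the size of the reduction easier to see, the short sketch below computes which percentage of the maximum number of formulas survives all checks, using only the first and last numeric columns of the table. The variable and function names are illustrative.

```python
# (maximum number of formulas, number left after probability check
#  + element count restriction), copied from the table above.
FORMULA_COUNTS = {
    500: (2_707_540, 724_270),
    1000: (139_735_355, 30_077_741),
    2000: (7_995_776_805, 623_270_049),
}


def remaining_percent(mass_range):
    """Percentage of candidate formulas that remain after all checks."""
    total, kept = FORMULA_COUNTS[mass_range]
    return 100.0 * kept / total


for mass_range in sorted(FORMULA_COUNTS):
    print(f"< {mass_range} u: {remaining_percent(mass_range):.1f}% of formulas remain")
```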
Figure 4. Frequency distribution for 1,200 randomly selected molecules downloaded from the Dictionary of Natural Products at < 2000 Da and comprising C, H, N, S, O, P, F, Cl and Br. Left panel, 4a: mass distribution. Middle panel, 4b: simulated measured masses at 3 ppm mass accuracy. Right panel, 4c: simulated measured isotope ratios at ± 5% accuracy.
Validation of the seven rules using randomly subsampled test sets from specialized databases. Performance is given assuming mass spectrometry errors of ± 5% isotope abundance error and ± 3 ppm mass accuracy, and calculating element combinations of C, H, N, S, O, P, F, Cl and Br
| Pharmaceuticals (DrugBank) | 2400 | 30–1093 | 99 | 90 | 8 | 78 |
| Natural Products (DNP) | 1200 | 92–2020 | 99 | 84 | 10 | 81 |
| Toxic Chemicals (TSCA) | 1200 | 56–2170 | 98 | 87 | 8 | 78 |
| Unknowns taken from Wiley+NIST | 1200 | 150–1536 | - | - | 78 | 65 |
Figure 5. Mass dependence of the number of calculated, chemically possible formulas derived from 1,200 randomly selected DNP molecules, imposed with simulated 3 ppm mass accuracy and ± 5% isotope ratio measurement errors. Red graph: number of formulas calculated with common molecular generators. Green graph: number of formulas constrained by the seven rules. Outliers around 600 Dalton were found to be halogen-containing compounds.
Figure 6. Effect of ranking the output formulas of the 2,400 randomly selected DrugBank molecules, imposed with simulated ± 3 ppm mass accuracy and ± 5% isotope ratio measurement errors. Mass dependence is shown for no database query (red graph, correct formula found in the top three hits), a PubChem database query (blue graph, correct formula ranked top), and a DrugBank database query (green graph, correct formula ranked top).
Figure 7. Relative isotopic abundances of the M+1 and M+2 peaks for all elemental compositions that would fit a measured mass of 774.94831 Da (Cangrelor), determined at 1 ppm mass accuracy (values exceeding 100% are removed from the graphic). Most formulas can be discarded if isotope ratios are measured with an accuracy of ± 5% and used as a search constraint (red box).
Individual performance of each of the seven rules on a total of 696 formulas comprising the elements CHNSOP and Si, calculated from GC-TOF data of sorbitol TMS6
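The two constraints named in this caption, a ppm mass window and an isotope abundance box, can be sketched as follows. The function names are hypothetical, and the ± 5% tolerance is interpreted here as relative to the theoretical abundance, which is an assumption; the source does not say whether the error is relative or absolute.

```python
def ppm_window(mass, ppm):
    """Return the (low, high) mass window for a given accuracy in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def in_isotope_box(measured, theoretical, rel_tol=0.05):
    """Check measured (M+1, M+2) abundances against theoretical ones.

    Both arguments are (M+1, M+2) tuples in percent of the base peak.
    rel_tol is taken relative to the theoretical value (assumption).
    """
    return all(abs(m - t) <= rel_tol * t for m, t in zip(measured, theoretical))


low, high = ppm_window(774.94831, 1.0)
print(f"1 ppm window: {low:.5f} to {high:.5f} Da")

# Hypothetical abundance values, for illustration only.
print(in_isotope_box(measured=(40.0, 10.0), theoretical=(41.0, 10.3)))
print(in_isotope_box(measured=(40.0, 10.0), theoretical=(50.0, 10.0)))
```

A candidate formula would be kept only if its exact mass falls inside the window and its calculated isotope abundances fall inside the box.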
| 1) heuristic restrictions for number of elements | not used (smart H option instead) |
| 2) perform LEWIS and SENIOR check | can remove 420 candidates |
| 3) isotopic pattern filter at 5% error | can remove 668 candidates |
| isotopic pattern filter at 10% error | can remove 632 candidates |
| isotopic pattern filter at 20% error | can remove 462 candidates |
| 4) H/C ratio check (hydrogen/carbon ratio) | can remove 56 candidates |
| 5) NOPS ratio check (N, O, P, S/C ratios) | can remove 51 candidates |
| 6) heuristic HNOPS probability check | can remove 180 candidates |
| 7) TMS check | can remove 432 candidates |
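Rule 2 in the table refers to the LEWIS and SENIOR check. The sketch below covers the SENIOR part only, using the three conditions usually quoted for Senior's theorem: the sum of valences is even, it is at least twice the maximum valence, and it is at least twice the number of atoms minus one. The N, S and P valences follow the maximum valencies quoted earlier (vN = 5, vS = 6, vP = 5); the C, H, O and Si valences, the parser, and the function names are assumptions of this example, and the LEWIS (octet) part is not shown.

```python
import re

# Maximum valences; N, S, P as quoted in the text, the rest assumed.
VALENCE = {"C": 4, "H": 1, "N": 5, "O": 2, "P": 5, "S": 6, "Si": 4}


def parse_formula(formula):
    """Turn a flat formula such as 'C6H12O6' into {'C': 6, 'H': 12, 'O': 6}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def passes_senior(formula):
    """Apply the three SENIOR conditions to a formula string."""
    counts = parse_formula(formula)
    if not counts or any(el not in VALENCE for el in counts):
        return False  # empty formula or element outside this sketch's valence table
    valence_sum = sum(VALENCE[el] * n for el, n in counts.items())
    atom_count = sum(counts.values())
    max_valence = max(VALENCE[el] for el in counts)
    return (
        valence_sum % 2 == 0                     # condition 1: even valence sum
        and valence_sum >= 2 * max_valence       # condition 2
        and valence_sum >= 2 * (atom_count - 1)  # condition 3: enough bonds to connect all atoms
    )


print(passes_senior("C6H12O6"))  # glucose
print(passes_senior("CH5"))      # odd valence sum
print(passes_senior("C2H8"))     # too few valences to connect all atoms
```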