Lochana C Menikarachchi, Ritvik Dubey, Dennis W Hill, Daniel N Brush, David F Grant.
Abstract
Metabolite structure identification remains a significant challenge in nontargeted metabolomics research. One commonly used strategy relies on searching biochemical databases using exact mass. However, this approach fails when the database does not contain the unknown metabolite (i.e., for unknown-unknowns). For these cases, constrained structure generation with combinatorial structure generators provides a potential option. Here we evaluated structure generation constraints based on the specification of: (1) substructures required (i.e., seed structures); (2) substructures not allowed; and (3) filters to remove incorrect structures. Our approach (database assisted structure identification, DASI) used predictive models in MolFind to find candidate structures with chemical and physical properties similar to the unknown. These candidates were then used for seed structure generation using eight different structure generation algorithms. One algorithm was able to generate correct seed structures for 21/39 test compounds. Eleven of these seed structures were large enough to constrain the combinatorial structure generator to fewer than 100,000 structures. In 35/39 cases, at least one algorithm was able to generate a correct seed structure. The DASI method has several limitations and will require further experimental validation and optimization. At present, it seems most useful for identifying the structure of unknown-unknowns with molecular weights <200 Da.
Keywords: in silico structure generation; liquid chromatography; mass spectrometry; nontargeted metabolomics
Year: 2016 PMID: 27258318 PMCID: PMC4931548 DOI: 10.3390/metabo6020017
Source DB: PubMed Journal: Metabolites ISSN: 2218-1989
Number of correct seed structures generated by different seed generation algorithms.
| Algorithm | Number of Correct Seed Structures (/39) | Average % Seed Similarity | % Seed Similarity Range |
|---|---|---|---|
| Algorithm-1 | 24 | 49.5 | 31.8–76.9 |
| Algorithm-2 | 19 | 49.7 | 15.7–87.5 |
| Algorithm-3–1 | 13 | 71.4 | 29.4–90.9 |
| Algorithm-3–2 | 13 | 71.4 | 29.4–90.9 |
| Algorithm-3–3 | 20 | 66.1 | 29.4–92.0 |
| Algorithm-3–4 | 14 | 63.7 | 29.4–90.9 |
| Algorithm-3–5 | 18 | 69.0 | 37.5–92.0 |
| Algorithm-3–6 | 21 | 66.7 | 37.5–89.5 |
Putative unknowns identified with Algorithm-1.
| Target | PMG-Seed | Number of PMG Structures | Number after MolFind | MetFrag Score Rank of the Correct Structure |
|---|---|---|---|---|
| | | 39482 | 146 | 11 |
| | | 58737 | 1502 | 8 |
| | | 34891 | 2889 | 1230 |
| | | 644 | 40 | 32 |
| | | 230 | 33 | 1 |
| | | 289 | 2 | 2 |
| | | 409 | 15 | 1 |
| | | 1726 | 99 | 3 |
| | | 922 | 30 | 16 |
Putative unknowns identified with Algorithm-3–6.
| Target | PMG-Seed | Number of PMG Structures | Number after MolFind | MetFrag Score Rank of the Correct Structure |
|---|---|---|---|---|
| | | 38 | 2 | 2 |
| | | 20 | 3 | 1 |
| | | 7965 | 1638 | 106 |
| | | 266 | 8 | 7 |
| | | 285 | 48 | 43 |
| | | 11 | 4 | 2 |
| | | 7706 | 531 | 24 |
| | | 13957 | 1124 | 354 |
| | | 8201 | 288 | 5 |
| | | 437 | 22 | 13 |
| | | 25951 | 1001 | 188 |
Structure filtering with Molecular Mechanics (MM) energies.
| Target PubChem ID | Number of Structures Before MM Filter * | Number of Structures After MM Filter | MetFrag Score Ranking of the Correct Structure Before MM Filter * | MetFrag Score Ranking of the Correct Structure After MM Filter |
|---|---|---|---|---|
| 187790 | 2 | 2 | 2 | 2 |
| 71593 | 3 | 3 | 1 | 1 |
| 92832 | 1638 | 367 | 106 | 34 |
| 1150 | 8 | 8 | 7 | 7 |
| 138 | 48 | 26 | 43 | 26 |
| 3134 | 4 | 4 | 2 | 2 |
| 11841 | 531 | 49 | 24 | 16 |
| 6057 | 1124 | 421 | 354 | 148 |
| 64969 | 288 | 163 | 5 | 5 |
| 825 | 22 | 20 | 13 | 13 |
| 5962 | 1001 | 211 | 188 | Filtered Out |
* From Table 3.
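The effect of the MM filter in the table above can be summarized with simple arithmetic. The following is a minimal sketch, not part of DASI or MolFind: the rows are transcribed from the table, the names `MM_ROWS`, `percent_removed`, and `summarize` are illustrative assumptions, and the MM energy filter itself is not reproduced because its cutoff is not given here.

```python
# Rows transcribed from the MM filter table:
# (pubchem_id, structures_before, structures_after, rank_before, rank_after)
# rank_after is None where the table reports "Filtered Out".
MM_ROWS = [
    (187790, 2, 2, 2, 2),
    (71593, 3, 3, 1, 1),
    (92832, 1638, 367, 106, 34),
    (1150, 8, 8, 7, 7),
    (138, 48, 26, 43, 26),
    (3134, 4, 4, 2, 2),
    (11841, 531, 49, 24, 16),
    (6057, 1124, 421, 354, 148),
    (64969, 288, 163, 5, 5),
    (825, 22, 20, 13, 13),
    (5962, 1001, 211, 188, None),
]


def percent_removed(before, after):
    """Percentage of candidate structures removed by the filter."""
    return 100.0 * (before - after) / before


def summarize(rows):
    """Group targets by what happened to the rank of the correct structure."""
    summary = {"improved": [], "unchanged": [], "lost": []}
    for cid, _n_before, _n_after, rank_before, rank_after in rows:
        if rank_after is None:
            # The correct structure itself was removed by the filter.
            summary["lost"].append(cid)
        elif rank_after < rank_before:
            summary["improved"].append(cid)
        else:
            summary["unchanged"].append(cid)
    return summary


if __name__ == "__main__":
    for cid, n_before, n_after, _rb, _ra in MM_ROWS:
        print(cid, f"{percent_removed(n_before, n_after):.1f}% removed")
    print(summarize(MM_ROWS))
```

Grouping the targets this way separates cases where the filter improved the rank of the correct structure from the case where the correct structure was itself filtered out.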
Figure 1. Database assisted structure identification (DASI) flowchart. See text for details.
Variants of Algorithm-3.
| Variant | MetFrag Fragment Set | Atom Deletion Scheme |
|---|---|---|
| Algorithm-3–1 | Top cluster | Retain MCS atoms with at least 1 match |
| Algorithm-3–2 | All candidates | Retain MCS atoms with at least 1 match |
| Algorithm-3–3 | Top cluster | Retain MCS atoms with at least 2 matches |
| Algorithm-3–4 | All candidates | Retain MCS atoms with at least 2 matches |
| Algorithm-3–5 | Top cluster | Retain MCS atoms with at least average number of atom matches * |
| Algorithm-3–6 | All candidates | Retain MCS atoms with at least average number of atom matches * |
* Average number of atom matches is calculated by averaging the number of matches of atoms with at least one match.
Figure 2. An example illustrating the steps involved in Algorithm-3–3. MCS atoms that matched MetFrag fragments are colored in green. The numbers indicate the number of times a particular MCS atom matched a MetFrag fragment.
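The three atom deletion schemes listed for the Algorithm-3 variants differ only in the match-count threshold an MCS atom must reach to be retained. The sketch below illustrates that thresholding step under stated assumptions: the function name `retain_atoms`, the scheme labels, and the example match counts are hypothetical, and the MCS computation and MetFrag fragment matching that would produce the counts are not shown.

```python
from statistics import mean


def retain_atoms(match_counts, scheme):
    """Return sorted indices of MCS atoms kept under one atom deletion scheme.

    match_counts: dict mapping MCS atom index -> number of MetFrag fragment matches
    scheme: "min1" (at least 1 match), "min2" (at least 2 matches), or
            "average" (at least the average number of atom matches)
    """
    if scheme == "min1":
        threshold = 1
    elif scheme == "min2":
        threshold = 2
    elif scheme == "average":
        # Per the table footnote, the average is taken only over atoms
        # that have at least one match.
        matched = [n for n in match_counts.values() if n >= 1]
        if not matched:
            return []
        threshold = mean(matched)
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return sorted(i for i, n in match_counts.items() if n >= threshold)


# Hypothetical match counts for a six-atom MCS (illustration only).
counts = {0: 0, 1: 1, 2: 3, 3: 2, 4: 0, 5: 4}

if __name__ == "__main__":
    for scheme in ("min1", "min2", "average"):
        print(scheme, retain_atoms(counts, scheme))
```

In this illustration the average over matched atoms is 2.5, so the "average" scheme keeps fewer atoms than the "at least 2 matches" scheme. A stricter threshold gives a smaller seed structure.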