
Mass Difference Matching Unfolds Hidden Molecular Structures of Dissolved Organic Matter.

Carsten Simon¹, Kai Dührkop², Daniel Petras³,⁴, Vanessa-Nina Roth¹, Sebastian Böcker², Pieter C Dorrestein³, Gerd Gleixner¹.

Abstract

Ultrahigh-resolution Fourier transform mass spectrometry (FTMS) has revealed unprecedented details of natural complex mixtures such as dissolved organic matter (DOM) on a molecular formula level, but we lack approaches to access the underlying structural complexity. We here explore the hypothesis that every DOM precursor ion is potentially linked with all emerging product ions in FTMS2 experiments. The resulting mass difference (Δm) matrix is deconvoluted to isolate individual precursor ion Δm profiles and matched with structural information, which was derived from 42 Δm features from 14 in-house reference compounds and a global set of 11 477 Δm features with assigned structure specificities, using a dataset of ∼18 000 unique structures. We show that Δm matching is highly sensitive in predicting potential precursor ion identities in terms of molecular and structural composition. Additionally, the approach identified unresolved precursor ions and missing elements in molecular formula annotation (P, Cl, F). Our study provides first results on how Δm matching refines structural annotations in van Krevelen space but simultaneously demonstrates the wide overlap between potential structural classes. We show that this effect is likely driven by chemodiversity and offers an explanation for the observed ubiquitous presence of molecules in the center of the van Krevelen space. Our promising first results suggest that Δm matching can both unfold the structural information encrypted in DOM and assess the quality of FTMS-derived molecular formulas of complex mixtures in general.

Entities: Chemical

Keywords: DI-ESI-MS/MS; FTMS; MS/MS; NOM; Orbitrap; deconvolution; natural organic matter; tandem mass spectrometry

Year: 2022 PMID: 35834352 PMCID: PMC9352317 DOI: 10.1021/acs.est.2c01332

Source DB: PubMed Journal: Environ Sci Technol ISSN: 0013-936X Impact factor: 11.357

Introduction

Complex mixtures are key study objects in environmental and industrial applications, but their analysis remains challenging.[1−4] One of the most complex mixtures in natural ecosystems is dissolved organic matter (DOM).[5,6] DOM is a central intermediate of ecosystem metabolism and mirrors molecular imprints of interactions with its abiotic and biotic environment,[7−9] which form the basis for processes such as carbon sequestration and nutrient recycling.[10,11] Despite significant advances in ultrahigh-resolution mass spectrometry (FTMS)[2,4] and nuclear magnetic resonance spectroscopy,[12] scientists still struggle to decode this information on the molecular level,[13−17] and novel approaches to identify distinct structures are required to translate molecular-level information into improved process understanding. Open and living systems promote the formation of ultracomplex mixtures of thousands to millions of individual constituents[18,19] that mirror large environmental gradients.[20−22] As a consequence, DOM poses significant challenges to separation, isolation, and structure elucidation. 
Direct infusion (DI) FTMS techniques have become indispensable tools for the molecular-level analysis of DOM as they reveal unprecedented details of molecular formulas using the exact mass (MS1 data, m/z) even without prior separation.[23] However, FTMS techniques are selective and do not resolve all structural detail observed at the exact mass in DOM, as the presence of isobars and isomers hinders the identification of particular structures from these molecular formulas.[19,23−25] Additionally, current structural databases cover only a small fraction of molecular formulas encountered and typically lead to annotation rates <5%.[18,26,27] One way to obtain structure information on isomers and isobars is through collision-induced dissociation (CID; MS2, or multistage MSn).[27−29] The relatively wide isolation window (∼1 Da) of mass filters applied for precursor ion selection commonly hinders the isolation and subsequent fragmentation of single exact masses, leading to mixed “chimeric” MS2 spectra of co-fragmented precursor ions.[30] Even though some authors achieved isolation of single masses or improved description of chimeric tandem MS data, fragmentation patterns were found to be universal across DOM samples.[18,19,31−35] Most of these studies, however, focused on the major product ion peaks (fragments), which usually make up only 60–70% of the total product ion abundance, and thus disregarded many low-abundance signals that may be more suitable to detecting structural differences.[19,31] The major product ions encountered in tandem mass spectra of DOM relate to sequential neutral losses of common small building blocks, mainly CO2, H2O, or CO units.[14,33] A mass difference between a precursor and a product ion in an MS2 spectrum is herein called “delta mass” and referred to as Δm (plural Δm’s). 
Common Δm’s such as CO2 or H2O are deemed nonindicative for the identification of structural units.[18,28,31,33,36] In contrast, other studies found recurring low-m/z product ions (e.g., at m/z 95, 97, 109, 111, 123, 125, 137, 139, 151, and 153) that were interpreted as a limited set of core structural units substituted with a set of functional groups, yet in different amounts and configurational types that would lead to highly diverse mixtures.[37−44] From a stochastic standpoint, the occurrence of common neutral losses may not be surprising; many structures contain hydroxyl groups that could yield H2O losses, and CO2 could originate from ubiquitous carboxyl groups.[45] In contrast, the occurrence of two molecules sharing a larger substructure would be less probable and less easily detected as a major peak. Signatures of DOM’s structural diversity could thus prevail in the high number of low-abundance fragments usually detected below m/z 200–300, as opposed to the higher abundance of fragments connected to losses of CO2 or H2O. Given the large number of estimated isomers and isobars underlying usual DOM data,[18,19,31,32,39,45−48] we here build upon the hypothesis that every co-fragmented precursor ion potentially contributes to every emerging product ion signal. We interpret the resulting chimeric MS2 data as a structural fingerprint that can be deconvoluted to obtain individual precursor ion Δm matching profiles. The analysis of Δm’s that link precursor and product ions is independent of the masses of the unknown precursor ions and known reference compounds in databases of annotated Δm features, and therefore does not rely on indicative product ions (fragments) alone. 
Although this approach sacrifices the identification of true knowns, it allows for the identification of potential structural analogues via indicative Δm’s and may be especially suited when annotation rates are as low as in the case of DOM, i.e., when most compounds are unknown.[18,26,27] Despite the unknown identity of most of the molecules present in DOM, its potential sources can be constrained reasonably well. Plants produce most of the organic matter that sustains food webs in natural ecosystems. Plant metabolites such as polyphenols thus represent a major source of DOM. Therefore, an early decomposition phase likely exists when the imprint of soluble/solubilized plant metabolites is still detectable by MS2 experiments using current FTMS technology: Lignin-related compounds show indicative methoxyl/methyl radical losses,[18,49,50] glycosides indicate a sugar loss,[51,52] and hydrolyzable tannins may lose galloyl units.[52] Mass differences related to atoms such as N, S, P, Cl, Br, I, and F could also help to identify unknown organic nutrient species or disinfection byproducts, thereby widening the applicability of the approach.[1,53] Finally, indicative Δm fingerprints could provide constraints to putative compound group annotations derived from molecular formula data alone (van Krevelen diagrams) or allow for a more precise annotation.[54−56] We hypothesized that DOM from swamps and topsoil, in close contact with plant inputs and active microbial communities, would reflect recognizable plant-related source imprints that can be revealed by tandem mass spectrometry. Specifically, we explored links between precursor ion Δm matching profiles and precursor ion characteristics such as nominal mass, mass defect, initial ion abundance, fragmentation sensitivity, oxygen-to-carbon ratio (O/C), heteroatom content, and structure suggestions. 
These properties are in part predictable from the assigned molecular formula, and thus allow for an evaluation of the approach (“proof of concept”) while also revealing potential nonassigned molecules (e.g., P-, Cl-, Br-, I-, and F-containing molecular formulas). Finally, we hypothesized that indicative Δm features of plant phenols, e.g., lignin- and tannin-related losses, would match their yet unknown structural analogues in DOM and that these patterns would reflect commonly applied “structural domain” distributions.[55,57,58]

Experimental Section

A detailed experimental procedure is provided in the Supporting Information of this article (Note S-1). In short, we chose 14 aromatic reference compounds as representative plant metabolites (Figure S-1 and Table S-1) and a forest topsoil pore water isolate[59] and Suwannee River Natural Organic Matter (SRNOM)[60] as exemplary DOM samples. All reference and sample solutions were directly infused into the ESI (electrospray) source of an Orbitrap Elite (Thermo Fisher Scientific, Bremen) in negative ionization mode (Table S-2) and fragmented by collision-induced dissociation (CID, MS2). We chose four nominal masses within the mass range typically observed in terrestrial DOM samples (m/z 200–500) for fragmentation (m/z 241, 301, 361, and 417, herein referred to as isolated precursor ion mixtures, “IPIMs”) to test the approach.[61] Soil DOM was analyzed at three normalized collision energy (NCE) levels (15, 20, and 25%). MS3 spectra of selected key product ions (aglycons of flavonoids and demethylated dimethoxy-methyl-benzoquinone) were acquired at NCE 20 or 25. After recalibration with known (Table S-3) or predicted product ions (losses of CO2, H2O, etc.), all major product ions were annotated with a molecular formula in reference compounds (Figure S-2, Tables S-4 and S-5) and DOM. Formula annotation was conducted with a MATLAB routine recently incorporated into an open FTMS data processing pipeline.[62] For MS2 data analysis, we generated Δm matrices of every pairwise combination of precursor and product ions (“Δm fingerprints”). Every value in this matrix is referred to as a Δm feature or Δm. 
We compared the unknown Δm features in DOM to three lists of known Δm features: (a) 54 Δm features ubiquitously found in DOM (Table S-6), (b) 55 Δm features from the set of 14 reference compounds (Table S-7), and (c) 11477 Δm features from a negative ESI MS2 library with 249 916 reference spectra of 17 994 unique molecular structures annotated by SIRIUS[63] (Figure S-3; list in supporting datasets). Reference spectra were collected from GNPS, MassBank, MoNA, and NIST.[64,65] The detection of a known Δm feature in DOM is herein called “Δm matching” and detected Δm features are called Δm matches. Matching was conducted at a mass tolerance of ±0.0002 Da (1 ppm at 200 Da). The array of Δm matches of a single precursor ion is called the Δm matching profile, and all precursor ion profiles of an IPIM form the subset of matched Δm’s of the Δm matrix introduced above. The decomposition of the MS2 spectrum into a Δm matrix and, therefore, into individual Δm matching profiles is what we define as the deconvolution step in this study. Δm’s of lists (a) and (b) showed some overlap and were largely part of list (c) as well. The specificity of any Δm feature in list (c) was checked by its association to compound classes as defined by ClassyFire.[66] The top 15 significantly associated classes were then obtained for each Δm feature in list (c) and included in analyses using the reference-compound-derived list (list b) as well. We assessed the probability of false-positive matches and accounted for the number of elements in the formula, ion abundance, and measures of fragmentation sensitivity to validate our approach. The matching data were combined for each NCE level and transformed into a binary format. 
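The deconvolution and matching steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names and the four m/z values are hypothetical, and only the two Δm values (H2O, CO2) and the ±0.0002 Da tolerance are taken from the text. Every co-isolated precursor is paired with every lighter product ion, and each resulting Δm is compared against the list of known Δm features to give one binary matching profile per precursor.

```python
from itertools import product

TOLERANCE_DA = 0.0002  # matching tolerance quoted in the text

# Two known Δm features (values as listed for H2O and CO2 in this article).
KNOWN_DELTA_M = {"H2O": 18.0106, "CO2": 43.9898}


def delta_m_matrix(precursors, products):
    """Pair every precursor with every lighter product ion and store the Δm."""
    return {(p, f): p - f for p, f in product(precursors, products) if f < p}


def matching_profiles(matrix, known, tol=TOLERANCE_DA):
    """Return, per precursor, a binary vector over the known Δm features."""
    labels = list(known)
    profiles = {}
    for (precursor, _product), dm in matrix.items():
        row = profiles.setdefault(precursor, [0] * len(labels))
        for i, label in enumerate(labels):
            if abs(dm - known[label]) <= tol:
                row[i] = 1
    return labels, profiles


# Hypothetical m/z values for two co-isolated precursors and two product ions.
precursors = [301.0354, 301.0718]
products = [257.0456, 283.0612]

matrix = delta_m_matrix(precursors, products)
labels, profiles = matching_profiles(matrix, KNOWN_DELTA_M)
```

In this toy case each precursor matches only one of the two known Δm features, although both were fragmented together; the other two pairings fall outside the tolerance and are discarded.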
We classified Δm matching profiles of DOM precursor ions and reference compounds of lists (b) and (c) by two-way hierarchical clustering using Ward’s method and Euclidean distance, as well as Principal Components Analysis (PCA) in PAST (v3.10) for list b.[67] We visualized numbers of individual Δm matches and Δm cluster matches in van Krevelen space for all lists. We chose the structural domains reprinted in the 2014 review by Minor et al. for reference because this represents the general level of detail and type of classes distinguished in recent DOM studies (Figure S-4).[57,58,68−70] In two separate analyses, formulas were also classified with a more general and a data-based van Krevelen scheme besides the reference one.[58,71] Finally, we assessed the agreement between structures predicted by Δm matching and those suggested in natural product structural databases. We combined structure suggestions from different databases, including Dictionary of Natural Products,[72] KNApSAcK,[73] Metacyc,[74] KEGG,[75] and HMDB,[76] as well as their expanded in silico annotations based on predicted enzymatic transformations in the MINEs database.[77] Although the MINEs database covers 198 generalized chemical reaction rules it may not include all potential environmental reactions because those are not solely driven by enzymes. The InChi-Key of structures was used to exclude stereoisomers and classify suggested structures into compound classes by ClassyFire.[66]
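For readers unfamiliar with van Krevelen space, the coordinates used for visualization are simply the atomic H/C and O/C ratios of a molecular formula. The following sketch assumes element counts are already available as a dictionary; the function name is hypothetical, and the two formulas are those given in this article for gallic acid and catechin.

```python
def van_krevelen_coordinates(counts):
    """Return (H/C, O/C) for a formula given as an element-count dictionary."""
    carbon = counts.get("C", 0)
    if carbon == 0:
        raise ValueError("van Krevelen ratios are undefined without carbon")
    return counts.get("H", 0) / carbon, counts.get("O", 0) / carbon


# Formulas as given in the text: gallic acid C7H6O5, catechin C15H14O6.
gallic_acid = van_krevelen_coordinates({"C": 7, "H": 6, "O": 5})
catechin = van_krevelen_coordinates({"C": 15, "H": 14, "O": 6})
```

Gallic acid plots at a higher O/C ratio than catechin (5/7 versus 6/15), which is the kind of positional difference that the structural domains in van Krevelen diagrams rely on.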

Results and Discussion

Tandem MS Fragmentation of Reference Compounds and Construction of Δm Lists

The 14 reference compounds (Figures S-1, S-2, and S-4) yielded 42 unique Δm features (i.e., not covered in list a, Table S-6) but also eight that were described in DOM. These eight Δm features (namely: H2O, 18.0106; CO, 27.9949; C2H4, 28.0313; C2H2O, 42.0106; CO2, 43.9898; CH2O3, 62.0004; C2O3, 71.9847; and C3O5, 115.9746) were kept in list (b) to compare DOM and reference compounds (Table S-7). Besides precursor ion formulas #2 (Hydroxy-cinnamic acid, or p-coumaric acid; C9H8O3, 164.0473), #3 (Gallic acid; C7H6O5, 170.0215), and #5 (m-Guaiacol; C7H8O2, 124.0524), which were found among the 42 Δm’s as potential structural equivalents, five Δm’s of potential substructures likely to be found in DOM were added to list b, namely, precursor ions #1 (Vanillic acid; C8H8O4, 168.04226), #4 (Creosol, C8H10O2, 138.0681), #8 (Ellagic acid; C14H6O8, 302.0063), and #10 (Catechin; C15H14O6, 290.0790), and the neutral aglycon of compounds #12 and #13 (flavonol core of Spiraeoside and Isoquercitin; C15H10O7, 302.0427). More details on reference compound fragmentation are given in the SI (Note S-2).
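The Δm values listed above follow directly from monoisotopic atomic masses. The sketch below recomputes them as a plausibility check; the parser is a deliberately simple illustration (no brackets, charges, or isotope labels), the function names are hypothetical, and the atomic masses are standard monoisotopic values rather than numbers taken from this article.

```python
import re

# Standard monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.003074,
    "O": 15.99491462,
    "S": 31.97207117,
}


def parse_formula(formula):
    """Turn a plain formula such as 'C7H6O5' into an element-count dictionary."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


# The eight Δm features shared between DOM and the reference compounds.
shared = ["H2O", "CO", "C2H4", "C2H2O", "CO2", "CH2O3", "C2O3", "C3O5"]
masses = {formula: round(monoisotopic_mass(formula), 4) for formula in shared}
```

The rounded results reproduce the eight values quoted in the text (for example 18.0106 for H2O and 115.9746 for C3O5).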

Fragmentation Behavior of Soil DOM

DOM precursor ions were isolated and fragmented to obtain Δm data via matching (Figure S-5). To find the best collision energy to fragment DOM, we analyzed soil DOM at three NCE levels (15, 20, and 25). All IPIMs showed similar fragmentation properties (Note S-3 and Table S-8). Highest numbers of product ions were found at the highest NCE (Figure S-6). Product ion spectra did not indicate abrupt structural changes upon increasing NCE, showing no separation of isomers/isobars but a continuous increase in fragmentation across all precursor ions. Based on the above results, NCE of 25 was chosen to fragment SRNOM for comparison. Despite common differences between precursor ion abundance and O/C ratio or mass defect (Figure 1a,d), we found a significant positive link between both metrics and fragmentation sensitivity independent of nominal mass, ranging from half-life NCE (i.e., the NCE level causing 50% decrease in ion abundance) of 10–35 under our instrumental settings (calculated from linear fits). Remarkably, this trend was not observed in reference compounds (Figure 1b,e). Such a discrepancy has been observed also by Zark et al. for the common CO2 loss and was interpreted as a result of intrinsic averaging.[31,45] In contrast, Dit Fouque et al. described the potential separation of less complex isomer mixtures by ramped fragmentation.[29] Bearing the limitation in mind that we only analyzed four IPIMs here, our results support the intrinsic averaging hypothesis and indicate that fragmentation sensitivity may be an additional property shaped by DOM complexity.[18,20,45] It also supports our assumption of a high number of isomers and isobars “hidden” beneath each precursor ion molecular formula, which also increases the probability of detecting meaningful links between precursor and product ions. A minor group of oxygen-poor formulas was nonresponsive (Note S-3). Matching to list c showed no significant relation to O/C ratio but to mass defect (Figure 1c,f). 
In contrast to mass defect, initial ion abundance showed no link to fragmentation sensitivity but was significantly correlated to higher numbers of Δm matches (r = 0.41, R2 = 0.17, n = 157, p < 0.001; see also Tables S-9, S-10, S-11, and S-12, and Figure S-7). DOM precursor ions with an average O/C ratio matched more often than low-O/C, fragmentation-resistant precursor ions (Figures 1c and S-8, Note S-3)[18,19,35] or high-O/C, easily fragmented precursor ions (Figure 1b). These observations together show that fragmentation sensitivity and Δm matching seem to be independent DOM precursor ion properties and that Δm matching could be driven by ion abundance. SRNOM and the soil water sample shared most molecular formulas (n = 107; 84% of soil DOM and 74% of SRNOM formulas), and these accounted for most of the precursor ion abundance at NCE 25 (96.5% and 97.2%, respectively). Despite this high similarity, SRNOM precursor ions showed higher numbers of Δm matches (Figure 1c,f), which could indicate that the same molecular formula is more chemodiverse, i.e., has more underlying structural formulas in SRNOM compared to soil DOM (Section ).
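The half-life NCE used in this section is described only as being "calculated from linear fits"; the exact procedure is given in the Supporting Information. A minimal sketch of one plausible reading, assuming ion abundance is normalized to its value at NCE 0 and fitted by ordinary least squares against NCE, could look as follows. The function names and the abundance values are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def half_life_nce(nce_levels, relative_abundance):
    """NCE at which the fitted relative abundance reaches 0.5, or None if flat."""
    slope, intercept = linear_fit(nce_levels, relative_abundance)
    if slope >= 0:
        return None  # nonresponsive precursor: abundance does not decrease
    return (0.5 - intercept) / slope


# Hypothetical abundances relative to NCE 0 at the NCE levels used in the text.
nce_levels = [0, 15, 20, 25]
responsive = half_life_nce(nce_levels, [1.0, 0.7, 0.6, 0.5])
nonresponsive = half_life_nce(nce_levels, [1.0, 1.0, 1.0, 1.0])
```

A precursor whose abundance does not decrease with NCE has no defined half-life under this reading, which corresponds to the nonresponsive oxygen-poor formulas mentioned above.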

Figure 1

Links between selected DOM precursor ion properties (top panels, initial ion abundance at NCE 0; middle panels, half-life normalized collision energy (NCE) at which ion abundance has dropped by 50%; bottom panels, number of matches of delta masses (Δm’s) between measured precursor and product ion masses with a list of 11477 known Δm features from SIRIUS) and each precursor ion’s O/C ratio (a–c) or mass defect (d–f). O/C ratios can only be shown for precursor ions with an annotated molecular formula. Black dots are individual soil DOM formulae. Additional data from reference compounds (red diamonds, see also Figure S-4) and SRNOM (orange crosses) are shown in middle and bottom panels, respectively. Statistical data were derived from linear fits; asterisks (***) denote p-value < 0.001.

Evaluation of the Δm Matching Approach

We used the matching data of molecular formulas in DOM for a proof-of-concept evaluation of our Δm matching approach. Specifically, we aimed to test the hypothesis that all precursor ions are potentially linked to all product ions in chimeric MS2 spectra of ultracomplex DOM. Our analysis was congruent with previous observations, showing losses of common Δm’s (Table S-6) while also revealing more detail (Figure S-5c, Table S-7). Details are given in the Supporting Information (Note S-4). We found expected trends in losses of CO2, CO, and CH2 in both samples (Figure 2a–c,g–i and Table S-13). The predicted heteroatom content (O, N, S) of assigned molecular formulas and a widened tolerance window were used for further analysis of the uncovered structural information. Random Δm matching would be expected if the calculated Δm values were affected by low resolution, low sensitivity, or artifacts such as reactions in the instrument (e.g., between the collision and Orbitrap cell[78]). Instead, we found that (1) precursor ions with low ion abundance matched to fewer Δm features (Figure S-7), (2) nonfragmented precursor ions matched to fewer or no Δm’s (Figure S-8), and (3) the identity of Δm matches agreed with molecular formula prediction (e.g., loss of S-containing Δm’s from S-containing precursor ions, etc.; Figures S-9 and S-10). Our evaluation also shows that Δm matching not only helps in recalibration[79] but also serves to check formula annotation, as it revealed unresolved precursor ion compositions interfering especially with CHOS precursor ions (related to Cl, P, and F). 
This means that (1) these atoms should be included for better coverage of elemental composition (i.e., prioritization) in our specific sample context and that (2) higher resolution power may be required to resolve S-, Cl-, P-, or F-containing precursor ion compositions.[1] In summary, Δm matching revealed an inherently structured biogeochemical signal of precursor ions that seem to fragment individually and was highly sensitive in detecting precursor-product ion pairs. This suggests that chimeric DOM data can be deconvoluted to reveal differences in molecular composition not visible from MS1 inspection.[23,80] It should be stressed that these results will need further evaluation due to the small number of DOM precursor ions, m/z values, and samples analyzed here (159 in soil DOM, 221 in SRNOM), and that deconvolution should be further tested with better-characterized mixtures, including, e.g., structural analogues, artificial mixtures, or standard additions (spiking).[14,19,27,42,81]

Figure 2

Δm matches visualized in chemical space for soil (porewater) DOM (a–f) and SRNOM (g–l). Exemplary reference compound structures with potential indicative Δm units are shown in bottom panels (m–q). “Phenylprops” refers to the shortened ClassyFire class of “Phenylpropanoids and polyketides”. Gray outlined boxes refer to anticipated structural domains (Figure S-4).[64] (a–l) Precursor ions with an annotated molecular formula plotted by their atomic H/C and O/C ratios (van Krevelen plot; soil DOM, n = 127; SRNOM, n = 144). Dot size encodes the number of matches to nonindicative (a–c, g–i) vs indicative Δm’s (d–f, j–l); see legends in every plot. Colored boxes in indicative VK plots mark the structural region of formulas that would be expected to yield the respective Δm, and colors refer to the structural motifs marked in (m–q). Phenylpropanoid or benzenoid-like (sub-)structures such as the ones shown in empty circles (o, p, q) may also contain methyl or methoxy groups (filled orange dots in m, n) that could produce methyl radical losses. Calculations based on Δm data are presented in more detail in Table S-13. Highlighted red open diamonds in (e) and (k) indicate loss of up to three gallic acid equivalents (size not drawn to scale).

Clustering with Reference Compound Δm’s Reflects Structural Trends

DOM precursor ions from both samples were compared based on Δm matching (Table S-7, see Section ). We grouped DOM precursor ions, reference compounds, and Δm features (list b) by two-way hierarchical clustering (Table S-14), i.e., matching of precursor ions across Δm features and vice versa. In the following, precursor ion clusters will be referred to by letters (A–H) and Δm clusters by numbers (1–7; Table 1). Based on the specificity of SIRIUS Δm features (Table S-14), we defined five Δm clusters found herein as structure-specific (Table 1, Figure 2d,e,j,k, and Table S-13).

Table 1

Summary of Two-Way Clustering of DOM Precursors (Highlighted in Red) and 14 Reference Compounds (Highlighted in Green)

Numbers refer to Figure S1; #12* and #13* refer to MS3 spectra of flavonoid aglycons. Numbers are coverage in Δm matches compared to overall Δm’s per Δm cluster; values > 20% are highlighted in bold, values <10% are grayed out. Δm clusters are shown in rows (“Cl. #”, 1−7) and precursor clusters in columns (A–H, for details, see Table S14 and original clustering data in PANGAEA datasets). Additional columns show respective numbers of Δm matches (“n”) and assigned cluster name (compare Table S14). In the lower row, numbers of precursors per precursor cluster are given for both samples combined and individually. Few reference compounds clustered with precursor clusters D–H, which were dominated by DOM precursors with higher numbers of Δm matches. Compounds #7, #12, #13, and #14 contain polyol moieties; compounds #1, #4, #5, and #6 contain -methoxy and -methyl moieties (Figure S1).

Δm features C4H8O4 (120.0423 Da, tetrose equivalent) and C6H10O5 (162.0528 Da, hexose equivalent), both members of cluster 2, were annotated to alcohols and polyols, carbohydrates, carbohydrate conjugates, and ether structures via SIRIUS (Table S-14). Reference compounds containing a polyol (quinic acid, #7) or a sugar (glucose, #12 and #13; mannose, #14) contributed Δm’s to this cluster (Table 1).[51,52] Cluster 2 Δm’s matched to 18 soil DOM and 24 SRNOM precursor ions in the central van Krevelen plot despite the absence of “carbohydrate-like” precursor ions (lilac square, Figure 2d,j and o,q). The anticipated shift toward higher O/C and H/C ratios was nonetheless apparent (Figure 2e,f and k,l). 
Cluster 3 and 4 Δm features, partly specific to phenylpropanoid and benzenoid structures, were contributed by flavan-3-ols (#10, #11), flavon-3-ol aglycons (#12 and #13), and compounds containing cinnamic, coumaric, or gallic acid (#7, #9, #11).[28,33,52] Precursor ions that matched to clusters 3 and 4 (soil DOM: n = 27 and n = 12; SRNOM: n = 29, n = 21) were found in the “lignin-like domain” (orange square in Figure 2e,k; orange circles in panels o, p, q). These C- or H-rich Δm’s (e.g., C8H10O2 or C7H4O4) are likely not combinations of common O-rich losses (CO, H2O, or CO2) due to their low O/C and O/H ratios, but this requires further testing with model mixtures. Aliphatic chains could prevail as O-poor substructures in substituted cyclic core structures.[82,83] Similar to the detection of polyol-equivalent Δm matches outside the expected “carbohydrate domain”, gallate-equivalent losses were not matched to precursor ions in the anticipated “tannic domain” but to precursor ions outside of that box (red diamonds, turquoise square, Figure 2e,k; turquoise circle, panel p). Among the most prominent features was the methyl radical loss,[35,49,50] which matched to oxygen-poor DOM precursor ions and was one of three cluster 7 Δm features (soil DOM: n = 18, average O/C = 0.33, SRNOM: n = 25, average O/C = 0.32, Figure 2f,l). The distribution of CH3•-yielding precursor ions was paralleled by CH2 (soil DOM: r = 0.60, R2 = 0.35, n = 127, p < 0.001; SRNOM: r = 0.63, R2 = 0.39, n = 144, p < 0.001) and CO losses (r = 0.55, R2 = 0.30, n = 127, p < 0.001; r = 0.58, R2 = 0.34, n = 144, p < 0.001), implying structural similarities (Figure 2f,l), e.g., condensed structures with aliphatic, lactone, or quinone moieties.[34] CO and CH3• were also annotated to benzenoid structures via SIRIUS (Table S-14). 
The methyl radical loss is an expected diagnostic Δm of methoxylated aromatic rings as in lignin (orange square in Figure 2f,l; orange circles in panels m, n; see Note S-5), but was also matched to DOM precursor ions not classified as “lignin-like”.[18,31,35,49] The Δm features CH3•, CO and C2H4 were also linked to CH4 vs O series, which describe regular 0.0364 Da increments in DOM that are formally annotated by an exchange of CH4 by O (Figure S-11).[37,38] Concurrent losses of CO and C2H4 explained the presence of these increments on the product ion level and were paralleled by losses of CH3•. This finding could explain the ubiquitous presence of CH4 vs O series in nonfragmented DOM; for example, concurrent β-oxidation and de-carbonylation could be enzymatic MS1 analogues of the patterns seen in MS2.[26] Alternatively, both methoxy-phenols (#4, #5) indicated an insertion of O for CH4 upon fragmentation (Note S-2, first section). Both observations may be relevant for the explanation of a key DOM property and require further exploration. Taken together, matching to Δm features derived from a small set of reference compounds revealed emerging clusters of precursor ion and Δm feature families that may prove more indicative if constrained with further DOM and reference compound data.[14] Anticipated structural domains were apparent but indicated clear overlap, which means that the same precursor ion was part of more than one Δm-predicted structural class. For example, 27 classic lignin-like precursor ions were part of seven precursor ion clusters (B–H; Table S-15) with clear differences in potential structural composition. An extended analysis using >700 compound class-associated SIRIUS Δm features (list c) substantiated these findings (Figures 3, S-12 and Note S-1).
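The 0.0364 Da increment of the CH4 vs O series, and the reason why concurrent CO and C2H4 losses reproduce it on the product ion level, can be verified with a few lines of arithmetic. The sketch below uses standard monoisotopic masses (not values from this article); variable names are illustrative.

```python
# Standard monoisotopic masses (Da).
C, H, O = 12.0, 1.00782503, 15.99491462

ch4 = C + 4 * H       # 16.0313 Da
c2h4 = 2 * C + 4 * H  # 28.0313 Da
co = C + O            # 27.9949 Da

# Formal exchange of CH4 by O between two precursor formulas.
increment_ms1 = ch4 - O

# Two product ions formed from one precursor by loss of CO versus C2H4
# differ by the same amount, because C2H4 - CO equals CH4 - O element-wise.
increment_ms2 = c2h4 - co
```

Both differences reduce to C + 4 H − O, so they are identical apart from floating-point rounding and both round to 0.0364 Da.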

Figure 3

Potential structural composition of exemplary sections in van Krevelen space shown in the H/C (a, b) and O/C (c, d) dimensions. Data from both DOM samples and for CHO precursors are presented. Each bar represents 100% of information derived from matched Δm features (list c) per precursor ion; compositions of individual precursors are aggregated as averages here for visualization (numbers of precursors given behind bar). Examples of individual precursor compositions are shown in the Supporting Information (Figure S-12). The analysis is based on the association of Δm features (list c) with their SIRIUS-annotated molecular (host) structures (details given in Note S-1). Structural classes are shown at the most aggregated class level of the ClassyFire Ontology here (compare innermost circle in Figure S-3) but would allow for finer differentiation in future applications.

CHNO and CHOS precursor ions matched with many of the S- and N-containing SIRIUS Δm features (list c, spanning 3–78 S- and 4–251 N-containing Δm’s in soil DOM and 0–154/0–350 in SRNOM; Tables S-16, S-17, S-18, and S-19). These represented on average 79 ± 19% (63 ± 31% in SRNOM) of all Δm matches per CHOS precursor ion or 91 ± 7% (79 ± 28%) of all CHNO precursor ion matches (Note S-6). 
CHNO precursor ions were annotated with reduced forms of N (including aralkylamines, amino acids, carboximidamides, and dicarboximides/urea-containing compounds, Table S-20) but not to nitrate esters.[34,84,85] S-containing Δm matches indicated the potential presence of sulfonic, thiol, thioether, or aromatic CHOS compounds.[86] These results show a wide potential diversity of N and S compounds in DOM that differs from earlier reports of mainly aromatic N and sulfonic S.[34,87,88] As most of these studies analyzed marine DOM, the detection of more diverse sets of CHOS and CHNO precursor ions could relate to the terrestrial, less degraded DOM analyzed here.[16,89−91] Further tests with N- and S-containing reference compounds and DOM samples are warranted to reveal the hidden diversity of CHNO/CHOS compounds and confirm potential structures, e.g., by NMR. All in all, our results show that it may be possible to refine molecular structure representations in van Krevelen plots by deconvoluted MS2 data and that complementary precursor ion information could be used to assess false or biased Δm-based class assignments (e.g., elemental composition, DBE, ionization, fragmentation sensitivity, ion mobility, polarity index, etc.).[13,55,58] Fluorescence or NMR spectroscopy could add valuable information if DOM would be fractionated before MS2 data acquisition, i.e., to assess indirect (statistical) links of MS2 features with complementary forms of structural insight.[21,92−94] Our findings must however be taken with caution for four reasons: SIRIUS Δm features (list c) were not obtained on the same instrument and thus may include features that, although correlated with certain compound classes, may not appear in DOM under the same instrumental settings. SIRIUS Δm features may be biased toward certain classes of compounds (Figure S-3), as our set of 14 aromatic compounds. Here, we only considered negative ESI mode data which is commonly employed for DOM analysis. 
Adding positive ESI or other ionizations would extend the range of Δm features and structural classes covered and likely decrease bias.[14,16,23,86] The same applies to fragmentation techniques other than CID. Product ion abundance was disregarded in our analysis but could be used to weight probabilities of potential precursor-product ion pairs in future, potentially in combination with fragmentation energy gradients (fragmentation trees),[95] moving m/z isolation windows, or ion accumulation time variation.[96] Despite a seemingly improved separation of extreme classes (high H/C ratios in fatty acids, high O/C ratios in carbohydrates, etc.), potential overlap in structural class boundaries remained considerable (Figures and S-12). Data-dependent and data-independent acquisition (DDA, DIA) techniques are widely employed in LC-MS of complex mixtures and could be used to cover the whole mass range of precursor ions in DOM mass spectra in future.[16,27,97,98] For example, Ludwig et al. presented a DIA scheme (SWATH-MS) that employs one precursor ion scan and 32 isolation windows of 25 Da width, covering 800 Da within 3.3 s; similar schemes are likely transferable to acquire full mass range data of directly injected DOM.[99] Kurek et al. recently presented such data (m/z 392–408),[16] and Leyva et al. discerned fragmentation pathways and structural families (mass range m/z 261–477).[14] The latter approach could be extended to include the diversity of structure-associated Δm features presented here. Together, this shows that practicable tandem MS acquisition strategies are within reach and will soon enable deeper analyses of Δm features in DOM.
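The window arithmetic of the DIA scheme quoted above can be checked with a short sketch. Only the 32 windows, the 25 Da width, and the 3.3 s cycle time come from the text; the lower m/z bound (`START_MZ`) and the assumption that all scans take equally long are hypothetical and serve only to illustrate the layout.

```python
# Sketch of a SWATH-like DIA window layout using the figures quoted in the text.
# START_MZ and the equal-scan-time assumption are hypothetical.

N_WINDOWS = 32        # isolation windows per cycle (from the text)
WIDTH_DA = 25.0       # width of each window in Da (from the text)
CYCLE_TIME_S = 3.3    # duration of one acquisition cycle in s (from the text)
START_MZ = 400.0      # hypothetical lower bound of the covered range


def isolation_windows(start_mz, n_windows, width_da):
    """Return contiguous, non-overlapping (low, high) m/z isolation windows."""
    return [
        (start_mz + i * width_da, start_mz + (i + 1) * width_da)
        for i in range(n_windows)
    ]


windows = isolation_windows(START_MZ, N_WINDOWS, WIDTH_DA)
covered_da = windows[-1][1] - windows[0][0]

# One precursor ion scan plus 32 fragment scans per cycle,
# assuming (hypothetically) that every scan takes the same time.
seconds_per_scan = CYCLE_TIME_S / (N_WINDOWS + 1)

print(f"{len(windows)} windows cover {covered_da:.0f} Da")
print(f"about {seconds_per_scan:.2f} s per scan under the equal-time assumption")
```

Under these assumptions, each scan would have roughly a tenth of a second available, which gives a rough idea of the time budget for directly injected DOM.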

Drivers of Differences in Δm Matching between Soil DOM and SRNOM

Although matching between the two samples was largely consistent, slight differences were apparent in van Krevelen distributions (list b: Figure , list c: Figures and S-12). We therefore tested the separation of precursor ion clusters by ordination (principal component analysis, Figure ) using list b. Precursor ion clusters were clearly separated on principal components 1 and 2, which together explained about 47% of the variation. Most of the considered precursor ions were shared between samples (64%, 38 out of 59); only a small number were sample-specific (SRNOM = 14, soil DOM = 7). Sample-specific precursor ions were found in clusters A (linked to carboxylic acids), B (phenols, polyols), and C (benzenoids, Table ); the remaining clusters D–H were dominated by the shared precursor ions. Out of the 38 shared precursor ions, 30 (79%) grouped in the same precursor ion cluster despite a general trend to higher numbers of matches in SRNOM, but eight grouped differently (bold precursor ions in Figure ). These differences in matching could be related to different chemistries, i.e., different isomeric/isobaric composition.[84] For example, the cluster “switch” in C11H14O6 was largely explained by higher ion abundance and Δm matches in SRNOM, while in C23H22O4, the effect could be partly linked to higher fragmentation resistance in SRNOM (Table S-21). Unfortunately, we only have data on initial ion abundance and fragmentation sensitivity from the soil DOM isolate; other precursor ion properties, however, showed very similar trends in both samples (Table S-21).

Figure 4

Separation of DOM precursor ions based on Δm matching with list b. Principal component analysis of all precursor ions with more than one match to indicative Δm features of the 14 reference compounds (i.e., Δm features shown in Table S-7 that are not part of Table S-6, see Section ). Colors of dots distinguish precursor ions from both samples and reference compounds (see legend). Precursor ions detected in both samples are connected by dotted black lines. Precursor ion clusters (A–H) are marked by envelopes and letters (compare Tables 1 and S-14). Eight shared precursor ions that switched precursor ion clusters are highlighted by bold molecular formulas (C12H14O9, A in soil DOM → H in SRNOM; C19H26O3, B → C; C26H26O5 and C23H22O4, B → D; C17H14O9, G → E; C19H22O7 and C22H26O8, H → E; C11H14O6, H → G).

Similar clustering and Δm-predicted structural classes (Figure S-12) in shared precursor ions could indicate a conserved structural composition. Likewise, Kurek et al. observed high similarity in photoionized (APPI) and IMPRD-fragmented DOM samples but observed clear differences in CHOS fragmentation.[16] High similarities between DOM samples would be in line with stoichiometric principles (i.e., due to a large share in precursor ions between DOM samples) and could suggest that DOM processing diversifies, but also “randomizes” the molecular composition of each precursor ion (“universal” signal).[31,100,101] High congruence of fragmentation patterns (and thus, Δm matching) among DOM precursor ions has also been interpreted as a sign of similarly substituted but slightly differing core structures.[35,37] The clusters devised here were small due to the relatively small number of precursor ions and m/z values analyzed, and thus may not detect significant differences between samples yet. 
However, even with our small set of precursor ions, the clustering by Δm matching showed conserved differences in fragmentation between precursor ion clusters and, in part, even for the same precursor ion in different samples. The fact that this could relate to differences in ion abundance (and therefore, possibly also ionization efficiency) or fragmentation sensitivity is intriguing and should be investigated across a wider range of DOM chemotypes using improved classification approaches as applied here (see also Section ).[14] In line with this, potential compositional differences between DOM samples became more apparent when more Δm features were used for the clustering (list c instead of list b; Figures and S-12).
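The precursor ion counts reported in this section can be re-derived in a few lines. The sketch below uses only numbers and cluster assignments given in the text and in the Figure 4 caption; the variable names are illustrative and not part of the original analysis.

```python
# Cluster assignments (soil DOM, SRNOM) of the eight shared precursor ions
# that changed cluster, as listed in the Figure 4 caption.
switched = {
    "C12H14O9": ("A", "H"),
    "C19H26O3": ("B", "C"),
    "C26H26O5": ("B", "D"),
    "C23H22O4": ("B", "D"),
    "C17H14O9": ("G", "E"),
    "C19H22O7": ("H", "E"),
    "C22H26O8": ("H", "E"),
    "C11H14O6": ("H", "G"),
}

# Counts of precursor ions as reported in the text.
N_SHARED = 38
N_SRNOM_ONLY = 14
N_SOIL_ONLY = 7

n_total = N_SHARED + N_SRNOM_ONLY + N_SOIL_ONLY
n_switched = sum(1 for soil, srnom in switched.values() if soil != srnom)
n_consistent = N_SHARED - n_switched

shared_pct = round(100 * N_SHARED / n_total)
consistent_pct = round(100 * n_consistent / N_SHARED)

print(f"{N_SHARED} of {n_total} precursor ions shared ({shared_pct}%)")
print(f"{n_consistent} of {N_SHARED} shared ions keep their cluster ({consistent_pct}%)")
```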

Ion Abundance Is Linked to Δm Matching Frequency and Structural Diversity

Ion abundance was the most important driver for Δm matching in both samples and was highest in the van Krevelen plot “region” usually assigned to ubiquitous lignin structures or carboxyl-rich aliphatic molecules.[59,83] This region also coincides with a maximum in potential underlying chemodiversity,[30,102] which could explain why these signals are ubiquitously found and especially dominant in recycled DOM.[90,103] Δm matching showed potential to reveal this underlying chemodiversity effect and was therefore compared to numbers of structure suggestions and Δm-predicted compound classes per precursor ion (Figure ). Numbers of Δm matches were significantly and positively related to the number of structure suggestions in absolute terms and for specific compound classes (Table S-21). The correlation between Δm-predicted and suggested compound classes was surprisingly similar in both samples and significant for almost all benzenoid-type (benzopyrans, methoxybenzenes, anisoles, phenols, etc.) and most phenylpropanoid-type structures (flavonoids, linear 1,3-diarylpropanoids). Among the organic acids, only vinylogous acids (i.e., compounds containing carboxylic acid groups with insertions of C=C bond(s)) stood out. Significant correlations were also found for pyrans, acryloyl compounds, carbohydrates, aryl ketones, and alkyl aryl ethers (fatty acids and analogues only in SRNOM).
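The kind of relationship described above can be illustrated with a rank correlation between Δm matches and structure suggestions per precursor ion. This is a minimal sketch: the text here does not name the statistic used, so Spearman's rank correlation is an assumption, and the counts below are invented purely for illustration.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical counts per precursor ion (not taken from the study).
dm_matches = [12, 40, 85, 150, 310]
structure_suggestions = [3, 10, 45, 30, 900]

rho = spearman(dm_matches, structure_suggestions)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based statistic is unaffected by the log scaling mentioned for the structure suggestions, since taking logarithms does not change the order of the values.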

Figure 5

Agreement between chemodiversity estimates based on molecular formula (structure suggestions) and precursor-product ion links (Δm matches). (a, b) Correlations between numbers of SIRIUS Δm matches (list c) vs structure suggestions (note log scale, incl. in silico hits): (a) soil DOM, (b) SRNOM. (c, d) Number of SIRIUS Δm matches in van Krevelen space (scales are similar but legends show different dot sizes): (c) soil DOM, (d) SRNOM; gray boxes refer to structural domains defined in Figure S-4. (e, f) Number of predicted classes per precursor ion based on SIRIUS Δm matches (color scale similar in both panels). Structural classes are associated with SIRIUS-annotated Δm features through correlation analysis of host structures and their Δm features (classification based on Classyfire): (e) soil DOM, (f) SRNOM.

The positive link between ion abundance and numbers of Δm matches on the one hand and predicted and suggested structures on the other indicates that ion abundance may be linked to the number of structural isomers and isobars per molecular formula in FTMS spectra of DOM and explains why Δm-defined structural classes showed strong overlap in this study. It also provides additional support to our assumption that all precursor ions potentially contribute to all product ions in DOM: The patterns revealed through Δm matching were largely congruent with the independent estimate of structural composition by natural product databases. The fact that only some classes of compounds (mainly benzenoids and phenylpropanoids) showed significant correlations could point to bias toward plant natural products in the databases employed here; in turn, this means that the inclusion of other structure databases and the additional assignment of Δm’s not only to their host structures but also to host organisms (e.g., in GNPS[65]) could reveal further clues about the potential sources of molecular formulas in DOM. 
We propose that the number of Δm matches could be interpreted as a novel, relatively easily accessible measure to account for a precursor ion’s underlying potential structural diversity. Such information could help to better understand the mechanisms of DOM formation and persistence in the environment. Our results encourage further studies on the Δm matching behavior of synthetic mixtures of known structures and across DOM chemotypes, and the improved bioinformatic exploitation of chimeric (LC-) FTMSn data of complex organic mixtures.[14,104−106] We acknowledge that natural product and in silico databases are far from complete, as is the database of annotated Δm matches we used here, despite its large coverage of ∼18 000 unique structures and ∼11 500 Δm’s (Figure S-3). For example, precursor ions with low mass defects showed exceptionally few structural hits, indicating bias in natural product databases (Figure S-13).[18] These structures were easily fragmented and yielded few Δm matches in our analysis. CHO precursor ions were twice as likely to yield a structure suggestion as N- and S-containing precursor ions. These observations show that DOM contains unique molecular structures that remain to be identified, potentially through the application of a wider range of ionization and fragmentation techniques that reduce structural bias.[14,16,23]

Implications

The interpretation of tandem MS data of complex samples such as dissolved organic matter (DOM) is impeded by the co-fragmentation of precursor ions with similar nominal mass and further complicated by the contribution of potential isomers and isobars. We employed an approach that analyzes the pairwise Δm’s between all precursor and product ions (Δm matrix). Using a very limited set of precursor ion features from two samples, we found potential signs of structural imprints related to, e.g., benzenoids, phenylpropanoids, carbohydrates, sulfonic acids, thiols, thioethers, and amino acids. The successful matching of indicative Δm features and precursor ion clustering suggests a recognizable source imprint of primary or recycled plant remains in DOM. Tests with more DOM samples and artificial/treated mixtures (e.g., spiked DOM or enzyme-degraded DOM) are required to test the assumptions employed here and to improve classifications by Δm clustering. Our first results indicate that FTMS2 data may be useful to differentiate molecular composition on the molecular formula level and that ion abundance and fragmentation sensitivity are two key variables that explain differences in MS2 data within and among samples. This is intriguing because a shared molecular formula could harbor a completely different set of structures, and larger sets of DOM data would improve the detection of these differences. Generally, our findings support the view that regions of the van Krevelen plot are associated with indicative Δm’s that relate to stoichiometric differences between compound classes. The most abundant precursor ions, however, showed a mixed MS2 signal that caused boundary overlap of these “Δm-defined regions” (Figures and 5e, f). 
While this finding is in line with known patterns of structural diversity and partly explains the ubiquitous presence of abundant DOM signals, it introduces a new paradigm to the interpretation of DOM FTMS data by assigning unknown precursor ions to multiple structural categories instead of just one (Figure S-12). Further evaluation of both natural and spiked/treated complex mixtures, constantly growing MS databases, and comprehensive decomplexation methods (LC-MS, IMS) will together provide fundamental insights into the deconvolution of chimeric spectra from complex samples, and ultimately show the potential to unfold the hidden molecular diversity and identity of DOM.
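The core idea of linking every precursor ion with every product ion can be sketched as follows. This is a simplified illustration under stated assumptions, not the workflow used in the study: the m/z values are hypothetical, the Δm list contains only two common neutral losses (H2O and CO2, monoisotopic masses), the absolute tolerance `tol_da` is an arbitrary choice, singly charged ions are assumed, and the deconvolution of the Δm matrix is reduced to collecting matches per precursor ion.

```python
# Monoisotopic masses (Da) of two common neutral losses, used as a toy Δm list.
KNOWN_DELTAS = {"H2O": 18.010565, "CO2": 43.989829}


def match_deltas(precursors, products, known, tol_da=0.001):
    """Link every precursor with every product ion and match the Δm values.

    Returns {precursor m/z: sorted list of matched Δm feature names}.
    The tolerance tol_da is a hypothetical choice for this sketch.
    """
    profiles = {}
    for prec in precursors:
        hits = set()
        for prod in products:
            delta = prec - prod
            if delta <= 0:
                continue  # singly charged product ions are lighter than the precursor
            for name, mass in known.items():
                if abs(delta - mass) <= tol_da:
                    hits.add(name)
        profiles[prec] = sorted(hits)
    return profiles


# Hypothetical co-isolated precursor ions and observed product ions (m/z).
precursors = [241.0718, 243.0874]
products = [223.0612, 197.0819, 199.0976]

profiles = match_deltas(precursors, products, KNOWN_DELTAS)
for mz, names in profiles.items():
    print(mz, names)
```

In this toy case the two co-isolated precursor ions receive different Δm profiles even though they share the same product ion pool, which is the property the matching relies on.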

