
Beyond monoisotopic accurate mass spectrometry: ancillary techniques for identifying unknown features in non-targeted discovery analysis.

Joachim D Pleil, M Ariel Geer Wallace, James McCord.

Abstract

High-resolution mass spectrometry (HR-MS) is an important tool for non-targeted analysis of complex organic mixtures in human or environmental media. This perspective demonstrates HR-MS compound identification strategies using atom counting, isotope ratios, and fragmentation pattern analysis based on 'exact' or 'accurate' mass, which allows analytical distinction among mass fragments that share the same integer mass but differ in the atomic constituents of the original molecules. Using specific examples, we show how HR-MS can narrow down the identity of unknown compounds and ultimately inform future analyses when these compounds reoccur. Although HR-MS is important for all biological media, it is particularly critical for new methods and instrumentation involving exhaled breath condensate, particles, and aerosols. In contrast to standard gas-phase breath analyses, where 1 mass unit (Da) resolution is generally sufficient, condensed-phase breath media are particularly vulnerable to errors in compound identification because their larger, non-volatile organic molecules can form fragments of identical integer mass from different atomic constituents, which then require high-resolution mass analysis to tell apart.

Substances: Isotopes

Year: 2018; PMID: 30433878; PMCID: PMC6394216; DOI: 10.1088/1752-7163/aae8c3

Source DB: PubMed; Journal: J Breath Res; ISSN: 1752-7155; Impact factor: 3.262

Premise

High-resolution mass spectrometry (HR-MS) has introduced an additional dimension for identifying unknown organic compounds in complex mixtures. In this approach, termed ‘non-targeted analysis’, samples are processed without preconceptions about their content (that is, agnostically with respect to analytes), and the results are treated as unidentified chemical ‘features’. These features are then post-processed to assign chemical formulae based on the HR-MS results (Schymanski, Sobus). This perspective explains the value and general methods of HR-MS for investigating complex organic mixtures such as human blood, breath, and urine, and demonstrates several compound identification strategies with specific examples. The primary HR-MS parameter, referred to as ‘exact’ or ‘accurate’ mass, allows analytical distinction among mass fragments that share the same integer mass but differ in the atomic constituents of the original molecules. The most basic discrimination comes from resolving ionic fragments at the fourth or fifth decimal place, a vast improvement over conventional instrumentation that reports only integer (unit) masses. The underlying concepts and physical principles of exact mass analysis have been described in detail in a predecessor article (Pleil and Isaacs 2016). Such ‘monoisotopic’ analysis is only the first, most basic step. Once features are located and assigned to a chromatographic retention time or retention index, they can be investigated further through their molecular fragmentation patterns, the exact masses of the fragments, and their relative isotopic abundances (McLafferty). This perspective describes how to exploit new technology to identify unknown organic constituents within complex matrices, develop reasonable confidence in their identity, and ultimately inform future analyses when these compounds reoccur.
Although HR-MS is crucial for all biological media, it is particularly important for new methods and instrumentation that use exhaled breath condensate, particles, and aerosols as a diagnostic biological medium complementing blood and urine analysis (Ladva, Zamuruyev, Ghio, Sauvain, Winters, Wallace and Pleil 2018a, 2018b). In contrast to standard gas-phase breath analyses, where 1 Da resolution is generally sufficient, condensed-phase media are particularly vulnerable to identification errors because the larger organic molecules they contain have many more ways of forming fragments that share the same integer mass. Various forms of HR-MS are now being employed in new breath applications. A cursory search of recent articles finds that real-time and gas chromatographic (GC) instruments are adopting time-of-flight (ToF) mass analyzers in place of linear quadrupoles to improve discrimination of ionic fragments, and that liquid chromatography HR-MS applications using ToF, MS/MS, and Orbitrap instruments are becoming more prevalent (Herbig, Sukul, Nizio, Peralbo-Molina, Andra, Li, Bregy, Singh).

Overview

As discussed in recent conceptual articles, chemical toxicity testing and human disease diagnosis have evolved beyond simple targeted analysis of chemicals of exposure, their chemical biomarkers, and certain endogenous response metabolites (Krewski, Ala-Korpela, Teeguarden, Vineis). In this approach, samples are analyzed for as many compounds as feasible within a particular laboratory’s capability and then subjected to statistical analysis to identify ‘features’ characteristic of the behavior under investigation. The simplest version of this process designates ‘case-control’ pairs, in which one sample of each pair is treated or exposed in some fashion. The results are then compared, and unknowns that differ between cases and controls by identity, relative concentration, or correlation are explored further. The methodology has been implemented for a wide range of sample types, including human blood, breath, and urine, as well as for in vitro systems investigating chemical changes in cell lines, tissue biopsies, and bacterial/fungal cultures (Vorst, Aura, Croley, Kerian). Until recently, discrimination among sample groups relied on targeted compounds, or on those readily identified by existing methodologies. Data post-processing for such targeted experiments is relatively straightforward and has for the most part been streamlined with computational tools. The advent of HR-MS, along with extraordinary advances in sensitivity, has produced an explosion of available data and a concomitant burden on researchers to decipher its meaning. Currently, the newest non-targeted (discovery) analyses require detailed supervision by subject matter experts to provide defensible results. A non-targeted experiment aims to observe as many chemicals as possible in a complex sample mixture; in practice, no single technique can detect every chemical in a mixture.
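A minimal sketch of the case-control comparison described above: each feature is scored by its log2 fold change (case mean over control mean) and a Welch t statistic, then ranked by the magnitude of t. The feature names and intensities are hypothetical, and the sketch omits the normalization, missing-value handling, and multiple-testing correction that a real study would need.

```python
from math import log2, sqrt
from statistics import mean, stdev


def welch_t(cases, controls):
    """Welch's t statistic for two groups with possibly unequal variances."""
    se2 = stdev(cases) ** 2 / len(cases) + stdev(controls) ** 2 / len(controls)
    return (mean(cases) - mean(controls)) / sqrt(se2)


def rank_features(table):
    """Rank features by |t|; table maps feature name -> (case intensities, control intensities)."""
    scored = []
    for name, (cases, controls) in table.items():
        fold = log2(mean(cases) / mean(controls))
        scored.append((name, round(fold, 2), round(welch_t(cases, controls), 2)))
    return sorted(scored, key=lambda row: abs(row[2]), reverse=True)


# Hypothetical feature table: intensities for three cases and three controls.
features = {
    "feature_307.0838@12.4min": ([4.0, 4.2, 3.8], [1.0, 1.1, 0.9]),
    "feature_270.0892@15.1min": ([2.0, 2.2, 1.9], [2.1, 1.9, 2.0]),
}
for row in rank_features(features):
    print(row)
```

Features that pass such a screen are the ones carried forward to the identification strategies below.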
Non-targeted approaches allow sophisticated data mining and multivariate analysis to tease apart individual compounds associated with sample groups, but choosing appropriate techniques and experimental designs is non-trivial. The state-of-the-art technology for interpreting HR-MS data is still under development; each instrument manufacturer has proprietary software/firmware, and numerous open-source data analysis packages exist, which further complicates the task of creating a single data analysis workflow. As HR-MS technologies become more commonplace in the analytical laboratory, we have developed some general guidance for the analyst community on implementing compound identification techniques that go beyond rudimentary library searches for exact-match candidates. Specifically, we present a series of examples in which exact-mass fragments and isotope ratios reduce the uncertainty in assigning chemical structure and formula. We further suggest how software products could help automate aspects of decision-making for identifying unknown chemicals.

High-resolution mass spectrometry: what is ‘high-res’?

When comparing mass spectrometry instrumentation, there are several critical figures of merit. The most central is the mass resolving power of the platform, defined as its ability to distinguish two closely spaced masses. The International Union of Pure and Applied Chemistry (IUPAC) defines MS resolution as R = M/ΔM, where M is the measured mass and ΔM is the separation required to distinguish two peaks at a specified height above the baseline, similar to the definition of peak resolution in chromatography (IUPAC 1997). An instrument might therefore be described in terms of the minimum separation required to resolve two equal-height mass peaks (e.g. a mass resolution of 0.001 Da at 250 Da). IUPAC further defines a single-peak measure, called mass resolving power, as M/ΔM where ΔM is the peak width at a specified height (figure 1). Because resolving power is an instrument performance parameter, it is the most commonly quoted value, and it is traditionally measured at the full width at half maximum height (FWHM) of the MS peak (figure 1). Mass resolution can be specified at any degree of peak separation but is most frequently quoted for a 10% valley, in which the signal between two equal peaks drops to 10% of their maximum height; for isolated Gaussian peaks this is equivalent to the full peak width at 5% of peak height. In common usage the terms resolution and resolving power are used interchangeably, but because both values depend on the mass measured and on the height at which the peak is measured, a quoted value should always state its method of determination and target mass (e.g. $R_{\mathrm{FWHM}}$ at m/z 200 = 100 000). For the remainder of this manuscript, R refers to FWHM resolving power at the mass under discussion.
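The FWHM and 10% valley conventions can be interconverted if peaks are assumed to be Gaussian: the full width at a fraction f of the maximum is $2\sigma\sqrt{2\ln(1/f)}$, so the width at 5% height is $\sqrt{\ln 20/\ln 2} \approx 2.08$ times the FWHM, and a 10% valley resolution is roughly half the FWHM resolving power. A minimal sketch under that Gaussian assumption, which real peak shapes only approximate:

```python
from math import log, sqrt


def width_at_fraction(fwhm, fraction):
    """Full width of a Gaussian peak at `fraction` of its maximum height, given its FWHM."""
    sigma = fwhm / (2 * sqrt(2 * log(2)))
    return 2 * sigma * sqrt(2 * log(1 / fraction))


def r_10pct_valley(r_fwhm):
    """Convert FWHM resolving power to 10% valley resolution (width at 5% height)."""
    return r_fwhm / width_at_fraction(1.0, 0.05)


print(round(width_at_fraction(1.0, 0.05), 3))  # ratio of the 5%-height width to the FWHM
print(round(r_10pct_valley(100_000)))          # 10% valley equivalent of R_FWHM = 100 000
```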

Figure 1.

Mass resolution definitions. Simulated FWHM and 10% valley ΔM measurements for a pair of isotopes at $R_{\mathrm{FWHM}}$ ~ 1000 (left) and $R_{\mathrm{FWHM}}$ ~ 40 000 (right).

An example calculation for a theoretical pair of peaks is shown in figure 1. A peak at mass 400.0000 with an observed width at half height of 0.5 Da has a single-peak resolving power of $R_{\mathrm{FWHM}} = M/\Delta M = 400.0000/0.5 = 800$. The resolution of this peak from a neighboring peak at 401.0000 at the 10% valley is 1 Da, giving $R_{10\%} = M/\Delta M = 400.0000/1 = 400$. At a substantially higher resolving power, the peak widths decrease markedly (figure 1, right), and the minimum resolvable separation ΔM likewise decreases (0.02 Da for the figure shown). Given the incremental progress of mass spectrometry instrumentation over the years, the threshold for calling an instrument or spectrum ‘high-resolution’ is a constantly moving goalpost. This is further complicated by instrument manufacturers changing the target masses for quoted resolution and by intermixed use of the FWHM and 10% valley definitions. Nevertheless, there are broad resolution thresholds at which increasing amounts of chemical information can be gained (Marshall, Marshall and Hendrickson 2008). At R ~ 1000, nominal masses are distinguishable (i.e., the isotopic peaks of a single spectrum), and at R ~ 100 000, isotopologues of the same nominal mass that differ only in which heavy isotope they contain begin to separate (e.g., ¹³C¹²C4¹⁴NH5 and ¹²C5¹⁵NH5; see figure 2).
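The worked example above, and the pyridine isotopologue pair in figure 2, can be checked numerically. This sketch uses the table 1 isotope masses and treats two peaks as separable at FWHM when $R_{\mathrm{FWHM}}$ exceeds $M/\Delta M$ for their spacing; actual separation also depends on peak shape and relative intensity.

```python
# Isotope masses (Da) from table 1.
C12, C13 = 12.0, 13.00335484
H1 = 1.007825032
N14, N15 = 14.003074, 15.0001089


def resolving_power(mass, delta_m):
    """IUPAC-style R = M / dM."""
    return mass / delta_m


# Worked example from the text: 400 Da peak, 0.5 Da FWHM, neighbor 1 Da away.
print(resolving_power(400.0, 0.5))  # single-peak (FWHM) resolving power
print(resolving_power(400.0, 1.0))  # 10% valley resolution against 401 Da

# Pyridine (C5H5N) A+1 isotopologues: one 13C versus one 15N.
mono = 5 * C12 + 5 * H1 + N14
with_13c = mono - C12 + C13
with_15n = mono - N14 + N15
split = with_13c - with_15n
print(f"13C isotopologue {with_13c:.5f}, 15N isotopologue {with_15n:.5f}, split {split:.5f} Da")
print(f"R needed at FWHM: {resolving_power(with_13c, split):.0f}")
```

For pyridine near m/z 80 this criterion gives a required resolving power on the order of 10⁴. Because the ¹³C/¹⁵N spacing is fixed at about 0.0063 Da, the required R grows in proportion to m/z, so the same pair of substitutions in a larger molecule demands a correspondingly higher resolving power.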

Figure 2.

Theoretical MS spectrum of pyridine (C5H5N) at increasing resolving power. At sufficiently high resolving power the mass peaks corresponding to ¹³C¹²C4¹⁴NH5 and ¹²C5¹⁵NH5 can be distinguished.

For the purposes of this commentary, the compound identification strategies discussed can be applied to any measurement capable of resolving the isotopic fingerprint of molecules. We also note that a mass stated with significant digits beyond the integer mass may be referred to as either ‘exact mass’ or ‘accurate mass’. In general, ‘exact mass’ is the mass calculated from a known chemical formula, whereas ‘accurate mass’ refers to a mass measured with high precision; however, the two terms are used interchangeably for the purposes of identifying compounds.
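To make the distinction concrete: exact mass is computed from a formula, and an accurate-mass measurement is judged by its deviation from that value in parts per million (ppm). A minimal sketch using the table 1 monoisotopic masses; the regular-expression parser is a simplifying assumption that handles only plain formulae without brackets or charge, the masses are for neutral molecules (no electron correction), and 307.083 906 is the accurate mass used in the sulfur-counting example.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes, from table 1.
MONO = {"H": 1.007825032, "C": 12.0, "N": 14.003074, "O": 15.99491462,
        "F": 18.998403162, "P": 30.973761998, "S": 31.972071174,
        "Cl": 34.96852682, "Br": 78.9183376}


def exact_mass(formula):
    """Exact (monoisotopic) mass of a neutral formula such as 'C10H17N3O6S'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total


def ppm_error(measured, exact):
    """Deviation of an accurate-mass measurement from the exact mass, in ppm."""
    return (measured - exact) / exact * 1e6


glutathione = exact_mass("C10H17N3O6S")
print(f"{glutathione:.5f}", f"{ppm_error(307.083906, glutathione):+.2f} ppm")
```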

High-resolution mass spectrometry: a brief history

Exact (or accurate) mass spectrometers are not new; they trace their origins to ‘one of a kind’ cyclotron and magnetic sector instruments at major research centers that were used to separate inorganic radioisotopes and organic molecules (e.g., Beynon 1956, Beynon 1959, Henning, De Laeter and Kurz 2006). A timeline of the history of exact mass measurement is available from the archives of the American Society for Mass Spectrometry; the Society attributes the initial achievements of high resolving power mass spectrometry to E O Lawrence and M S Livingston in 1932, and lists major technical advances for time-of-flight (ToF-MS) in 1956, double-magnetic sector geometries from 1957 to 1960, and Fourier transform ion cyclotron resonance from 1965 to 1968 (Grayson 2008). The history of commercial HR-MS products began in the 1940s. The Consolidated Engineering Corporation entered the commercial MS market in 1943 with the 21–101 Mass Spectrometer, based on magnetic sector technology; it was used to assess petroleum hydrocarbons and had an estimated mass resolution of 0.05 Da at 250 Da (R ~ 5000) (Carlson). The Omegatron, based on vacuum tube technology, was developed by the University of Minnesota in 1949 as an MS instrument designed primarily as a residual gas analyzer, separating gases at unit mass resolution for vacuum applications (Zdanuk); a patent filed in 1957 indicates that it was improved to achieve a resolution of ~0.01 Da at 250 Da (R ~ 25 000) (McNarry and Hobson 1957). Subsequently, other instruments entered the commercial HR-MS market, including the Bendix ToF-MS (Wiley 1956), the Finnigan MAT sector instrument in 1978 (Huebschmann 2011), and the Bruker FTICR in 1983, based on research by Mel Comisarow and Alan Marshall (Comisarow and Marshall 1974). These legacy commercial instruments had resolving powers of up to 10 000 (~1 ppm).
Contemporary HR-MS systems almost always couple a high-performance gas or liquid chromatography platform for analyte separation to a ToF or FT mass analyzer. Exceptions include ionization sources (e.g., MALDI, SELDI, APCI, PTR) coupled directly to the MS without a separation step (Byrdwell 2001, Petricoin and Liotta 2003, Jordan, Gaugg, Ruhaak). High-performance ToF instruments routinely achieve resolving powers of 10 000+, and research platforms exceed 40 000, with very fast cycle times for fragmentation scans. While cutting-edge research on FTICR continues, producing instruments with resolving powers in the millions (Hendrickson), an electrostatic trapping FT instrument, the Orbitrap (Makarov 2000), has become the most prevalent FT-MS platform. Orbitrap instruments likewise can perform rapid MS/MS fragmentation, with scalable MS and MS/MS resolutions from 10 000 to 100 000+. The prevalence of these high resolving power instruments, combined with accurate mass analysis and fast MS/MS duty cycles, gives analysts an unprecedented amount of information about unknown chemical species.

Interpreting high-resolution features: isotope ratios

The power of HR-MS measurement is the ability to resolve combinations of elements whose masses differ by much less than 1 Da (1.66 × 10⁻²⁷ kg). This enables very fine measurements of exact mass, from which much information can be extracted (Pleil and Isaacs 2016), and also enables elemental composition analysis based on elemental isotopes. Each signal in a mass spectrometer is the measurement of a single type of ion composed of a specific combination of elemental isotopes. The mass of a molecule composed entirely of the most abundant stable isotope of each constituent element is known as the monoisotopic mass (A) and is only one of many masses that can be measured for a molecule. Because heavier stable isotopes of common elements such as carbon, sulfur, and many halogens occur at appreciable levels in nature, molecules incorporating one or more of these isotopes are encountered whenever a molecular sample is measured. Differences in the number of neutrons among elemental isotopes create atomic combinations whose masses differ nominally by whole numbers of Da, and these peaks are labeled by their nominal mass shift relative to the monoisotopic mass (i.e. A − 1, A + 1, A + 2, etc.). The theoretical spectrum for a given molecule is thus the combination of the peaks for all isotopic combinations of the atoms making up that molecule. Although rudimentary atom assignments can be made from isotope ratios at unit (1 Da) resolution, high resolution is required to confirm which atoms are actually responsible for a given isotopic ratio. For example, figure 3 shows the resolved structure of the two different A + 2 possibilities for the perfluorinated compound C12H12F17NO5S, in which either a single ³⁴S or two ¹³C atoms can contribute to the A + 2 isotopic peak.
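Note that with sufficient resolution, one can distinguish the contribution from ³⁴S, $\Delta m = m(^{34}\mathrm{S}) - m(^{32}\mathrm{S}) = 33.967\,867\,004 - 31.972\,071\,174 = 1.995\,795\,83$ Da, from that of two ¹³C atoms, $\Delta m = 2\,[m(^{13}\mathrm{C}) - m(^{12}\mathrm{C})] = 2 \times 1.003\,354\,84 = 2.006\,709\,68$ Da, which could not be done at single Da resolution, as shown in the inset centered at 607 Da.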
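The two A + 2 shifts for C12H12F17NO5S can be computed from table 1, together with the resolving power needed to split them. This sketch uses the simple FWHM criterion (peak width M/R smaller than the spacing) and takes 607.0 as an approximate m/z for the A + 2 cluster; it reproduces the qualitative contrast between the R = 15 000 and R = 150 000 panels of figure 3 rather than the simulated spectra themselves.

```python
# Isotope masses (Da) from table 1.
S32, S34 = 31.972071174, 33.967867004
C12, C13 = 12.0, 13.00335484

shift_34s = S34 - S32         # A+2 shift from a single 34S
shift_13c2 = 2 * (C13 - C12)  # A+2 shift from two 13C atoms
gap = shift_13c2 - shift_34s  # separation of the two A+2 sub-peaks


def separated(mz, r_fwhm, delta):
    """True if peaks `delta` apart at m/z `mz` are resolved at FWHM for resolving power r_fwhm."""
    return mz / r_fwhm < delta


print(f"34S shift {shift_34s:.8f} Da, 13C2 shift {shift_13c2:.8f} Da, gap {gap:.5f} Da")
for r in (15_000, 150_000):
    print(f"R = {r:>7,}: {'resolved' if separated(607.0, r, gap) else 'merged'}")
print(f"R needed at m/z 607: {607.0 / gap:,.0f}")
```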

Figure 3.

Theoretical high (R = 15 000) and ultra-high (R = 150 000) resolution spectra of molecular formula C12H12F17NO5S. The additional information from the exact masses of different isotopic constituents further confirms the monoisotopic identification of the original compound by resolving the ³⁴S and two-¹³C contributions to the A + 2 peak centered at 607 Da (inset). Note: The peak labeled 13C2C10H12F17NO5S is not pure and contains further contributions that cannot be resolved under the conditions indicated.

The exact masses and relative natural abundances of atomic isotopes are well characterized (table 1, Berglund and Wieser 2011). Calculating a theoretical mass spectrum is therefore a straightforward, but computationally taxing, problem in combinatorics, for which many solutions have been implemented (Valkenborg). For molecular formula generation, empirical spectra can then be compared against these theoretical distributions. Such comparison is necessary for assigning molecules of even moderate complexity, because mass accuracy alone, even at an excellent <1 ppm, is often insufficient to distinguish among the majority of chemicals currently known (Kind and Fiehn 2006). Note that the average atomic weight that appears in the standard Periodic Table of the Elements, familiar to most chemists and biologists, is in fact the abundance-weighted average of these isotopic masses.
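Because isotope masses and abundances are tabulated, a theoretical fine-structure spectrum can be generated by brute-force combinatorics. The sketch below adds atoms one at a time and prunes improbable branches; it is a simple illustrative enumeration, not one of the optimized published algorithms. It uses table 1 values with the ²H abundance taken as 0.000115 (the IUPAC 2009 representative value) and omits ³⁶S. The abundance-weighted mean of the resulting pattern is the average molecular mass mentioned above.

```python
from collections import defaultdict

# (exact mass in Da, abundance) per isotope, from table 1; 2H abundance 0.000115.
ISOTOPES = {
    "H": [(1.007825032, 0.999885), (2.014101778, 0.000115)],
    "C": [(12.0, 0.9893), (13.00335484, 0.0107)],
    "N": [(14.003074, 0.99636), (15.0001089, 0.00364)],
    "O": [(15.99491462, 0.99757), (16.99913176, 0.00038), (17.99915961, 0.00205)],
    "F": [(18.998403162, 1.0)],
    "S": [(31.972071174, 0.9499), (32.97145891, 0.0075), (33.967867004, 0.0425)],
}


def fine_structure(formula, min_prob=1e-9):
    """Isotopologue (mass, probability) pairs for a formula given as {element: count}.

    Each state is keyed by how many atoms of each isotope it contains, so identical
    isotopic compositions merge exactly; branches below min_prob are pruned.
    """
    slots = [(el, i) for el in formula for i in range(len(ISOTOPES[el]))]
    states = {tuple(0 for _ in slots): 1.0}
    for el, n in formula.items():
        for _ in range(n):
            nxt = defaultdict(float)
            for counts, prob in states.items():
                for i, (_, abundance) in enumerate(ISOTOPES[el]):
                    p = prob * abundance
                    if p < min_prob:
                        continue
                    key = list(counts)
                    key[slots.index((el, i))] += 1
                    nxt[tuple(key)] += p
            states = nxt
    peaks = [(sum(c * ISOTOPES[el][i][0] for (el, i), c in zip(slots, counts)), prob)
             for counts, prob in states.items()]
    return sorted(peaks)


pattern = fine_structure({"C": 12, "H": 12, "F": 17, "N": 1, "O": 5, "S": 1})
average = sum(m * p for m, p in pattern) / sum(p for _, p in pattern)
print(f"monoisotopic {pattern[0][0]:.4f} Da, average {average:.3f} Da, {len(pattern)} isotopologues kept")
```

The A + 2 region of this pattern contains separate entries for the ³⁴S and two-¹³C isotopologues, which is the fine structure resolved in figure 3.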

Table 1.

Relative abundance values for stable isotopes of common organic elements, derived from Isotopic compositions of the elements 2009 (Berglund and Wieser 2011).

Element	Isotope	Mass shift	Exact mass (Da)	Abundance (fraction)
Hydrogen	¹H	A	1.007 825 032	0.999 885
Hydrogen	²H	A+1	2.014 101 778	0.000 115
Carbon	¹²C	A	12.000 000 00	0.9893
Carbon	¹³C	A+1	13.003 354 84	0.0107
Nitrogen	¹⁴N	A	14.003 074 00	0.996 36
Nitrogen	¹⁵N	A+1	15.000 108 90	0.003 64
Oxygen	¹⁶O	A	15.994 914 62	0.997 57
Oxygen	¹⁷O	A+1	16.999 131 76	0.000 38
Oxygen	¹⁸O	A+2	17.999 159 61	0.002 05
Fluorine	¹⁹F	A	18.998 403 162	1
Phosphorus	³¹P	A	30.973 761 998	1
Sulfur	³²S	A	31.972 071 174	0.9499
Sulfur	³³S	A+1	32.971 458 910	0.0075
Sulfur	³⁴S	A+2	33.967 867 004	0.0425
Chlorine	³⁵Cl	A	34.968 526 82	0.7576
Chlorine	³⁷Cl	A+2	36.965 902 60	0.2424
Bromine	⁷⁹Br	A	78.918 337 6	0.5069
Bromine	⁸¹Br	A+2	80.916 289 7	0.4931

The number of formula matching and scoring algorithms is extensive and constantly evolving, thanks in no small part to entries in the annual Critical Assessment of Small Molecule Identification (CASMI) contest (http://casmi-contest.org/), which specifically invites the development of new formula generation software and techniques. Nevertheless, some basics of molecular assignment based on elemental composition are amenable to manual inspection of isotopic patterns and can be useful in assigning molecular formulae. Many of these strategies can be applied even to low resolution full-scan data, but this is difficult in complex samples where isotope peaks overlap with those of other compounds.
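The simplest formula generator is an exhaustive enumeration: for each combination of heavy-atom counts, solve for the number of hydrogens that best fits the target mass, keep combinations within a ppm window, and discard chemically impossible ones. This sketch is restricted to C, H, N, O, and S with arbitrary upper bounds, and uses only the ring-plus-double-bond-equivalent (RDBE) rule as a plausibility filter; published tools apply many more heuristics and also score isotope patterns.

```python
# Monoisotopic masses (Da) from table 1.
MASS = {"C": 12.0, "H": 1.007825032, "N": 14.003074, "O": 15.99491462, "S": 31.972071174}


def fmt(counts):
    """Hill-order string, omitting zero counts and counts of one."""
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in counts if n > 0)


def candidate_formulas(target, ppm=5.0, max_c=30, max_n=6, max_o=12, max_s=2):
    """Enumerate neutral CcHhNnOoSs formulae whose exact mass lies within `ppm` of target."""
    tol = target * ppm * 1e-6
    hits = []
    for c in range(1, max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                for s in range(max_s + 1):
                    heavy = c * MASS["C"] + n * MASS["N"] + o * MASS["O"] + s * MASS["S"]
                    h = round((target - heavy) / MASS["H"])  # only the nearest H count can fit
                    if h < 0:
                        continue
                    mass = heavy + h * MASS["H"]
                    if abs(mass - target) > tol:
                        continue
                    # RDBE must be a non-negative integer for a neutral, even-electron molecule.
                    rdbe = c - h / 2 + n / 2 + 1
                    if rdbe < 0 or not rdbe.is_integer():
                        continue
                    err = (mass - target) / target * 1e6
                    hits.append((fmt([("C", c), ("H", h), ("N", n), ("O", o), ("S", s)]),
                                 round(err, 2), rdbe))
    return sorted(hits, key=lambda hit: abs(hit[1]))


for formula, err, rdbe in candidate_formulas(307.083906, ppm=1.0):
    print(formula, err, rdbe)
```

For 307.083 906 with a 1 ppm window, the output includes C10H17N3O6S (glutathione). Unlike a database search, this lists arithmetically possible formulae whether or not a corresponding compound is known, which is why isotope-ratio checks are needed to narrow the list.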

Example 1. Carbon Counting

For a typical organic molecule, the major contributor to the A+1 abundance is the number of carbon atoms in the molecule. This is because carbon is generally the most prevalent atom in an organic structure, and the natural abundance of ¹³C (1.07%) is substantially higher than that of ²H (0.0115%), ¹⁵N (0.364%), and ¹⁷O (0.038%). Under this strong simplifying assumption, the abundance of the A+1 peak is simply the probability of having exactly one ¹³C in the molecule. For a molecule with 8 carbon atoms, the multinomial expansion (Valkenborg) puts this probability at roughly 8%, allowing a rough estimate of the carbon content of a molecule. Figure 4 shows the A+1 peak for four different organic molecules with 8 carbons. Although the relative size of the A+1 peak varies slightly with the other atoms present, it remains around 8%, indicating the presence of 8 carbon atoms in each chemical regardless of the other atoms. The exact mass difference of 1.003 35 Da further confirms that the isotope is ¹³C rather than ²H, ¹⁵N, or ¹⁷O.
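The carbon-only approximation can be written out directly. This sketch assumes that only ¹³C contributes to A+1; with the table 1 abundance, eight carbons give a probability of about 7.9% for exactly one ¹³C, or an A+1/A intensity ratio of about 8.7%, both consistent with the ~8% rule of thumb. Other A+1 isotopes (²H, ¹⁵N, ¹⁷O, ³³S) push the estimate upward, especially for nitrogen-rich or sulfur-containing molecules.

```python
from math import comb

P13C = 0.0107  # 13C abundance from table 1


def prob_exactly_one_13c(n_carbons):
    """Binomial probability that exactly one of n carbons is 13C."""
    return comb(n_carbons, 1) * P13C * (1 - P13C) ** (n_carbons - 1)


def a1_ratio_from_carbon(n_carbons):
    """A+1/A intensity ratio contributed by carbon alone."""
    return n_carbons * P13C / (1 - P13C)


def estimate_carbons(a1_over_a):
    """Invert the carbon-only approximation (ignores 2H, 15N, 17O and 33S)."""
    return round(a1_over_a * (1 - P13C) / P13C)


print(f"{prob_exactly_one_13c(8):.4f}", f"{a1_ratio_from_carbon(8):.4f}")
print(estimate_carbons(0.0865))
```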

Figure 4.

Theoretical shift of the A+1 peaks for four organic molecules compared to the calculated value for a molecule containing only carbon and hydrogen. In each case, the spacing between the monoisotopic mass (relative abundance 100%) and the ¹³C A+1 peak shown is 1.003 35 Da, and the relative abundance of ~8% indicates eight carbons. Fine isotope structure from ²H (1.006 28 Da), ¹⁵N (0.997 03 Da), and ³³S (0.999 39 Da) can be observed with sufficient resolving power.

Example 2. Sulfur Counting

As previously mentioned, isotopic abundances can be of substantial value in narrowing the possible molecular formulae for a given mass. For example, a compound with an accurate mass of 307.083 906 has five possible formula matches within a 5 ppm mass tolerance, and two within 1 ppm, when searched against the US Environmental Protection Agency’s (EPA) CompTox Chemistry Dashboard (https://comptox.epa.gov/dashboard), a publicly available database containing over 760 000 chemicals (Williams). The theoretical A+1 peaks for these molecules closely coincide, as expected given their similar numbers of carbon atoms. The A + 2 peak, however, varies significantly depending on whether sulfur or chlorine is present. Sulfur is common in biomolecules and has an A + 2 isotope (³⁴S) with an abundance of ~4%, so for a small molecule the A + 2 peak can be similar to, or higher than, the A+1 peak. Given an empirical spectrum with relative A+1 and A + 2 abundances of 14% and 6%, we can conclude that the molecule is most likely C10H17N3O6S, glutathione, shown as the red curve in figure 5.
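Isotope-ratio checks like this one need only nominal (A, A+1, A+2) abundances, which follow from convolving each atom's isotope distribution. This sketch uses table 1 abundances (with ²H at 0.000115) and compares glutathione with the same formula stripped of its sulfur, a hypothetical comparison used only to isolate how much of the A + 2 signal sulfur supplies; the observed 14% and 6% values are those quoted in the text.

```python
# Abundances indexed by nominal mass shift (0 = A, 1 = A+1, 2 = A+2), from table 1.
SHIFTS = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425],
}


def convolve(a, b):
    """Discrete convolution of two abundance lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def nominal_pattern(formula, max_shift=4):
    """Relative intensities (A = 1) of A, A+1, A+2, ... for {element: count}."""
    dist = [1.0]
    for el, n in formula.items():
        for _ in range(n):
            dist = convolve(dist, SHIFTS[el])[: max_shift + 1]
    return [p / dist[0] for p in dist]


observed = (0.14, 0.06)  # A+1 and A+2 from the example spectrum
gsh = nominal_pattern({"C": 10, "H": 17, "N": 3, "O": 6, "S": 1})
no_s = nominal_pattern({"C": 10, "H": 17, "N": 3, "O": 6})
print(f"with S:    A+1 {gsh[1]:.3f}, A+2 {gsh[2]:.3f}")
print(f"without S: A+1 {no_s[1]:.3f}, A+2 {no_s[2]:.3f}  (observed {observed})")
```

For glutathione this gives an A+1 of about 13% and an A+2 of about 6.5%, close to the observed values, whereas removing the sulfur drops A+2 to roughly 2%.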

Figure 5.

Theoretical mass spectral patterns (R ~ 15 000) for five molecular formulae with exact mass 307.083 906 (within a 5 ppm mass tolerance). Insets show the A+1 and A + 2 isotope peaks. In the second extracted panel, the relative abundances of the A + 2 peaks for the five candidate compounds range from 2% to 30%. The 30% value indicates chlorine, and the entries below 3% rule out sulfur (at 4.25%). Given that the true empirical spectrum has ~6% A + 2, and that sulfur and oxygen have A + 2 abundances of 4.25% and 0.205%, respectively, the combined (O6 + S) contribution should be roughly 5.5%. Furthermore, the A to A + 2 spacing is 1.995 795 83 Da for sulfur but 2.004 244 99 Da for oxygen, so the slight shift of the red trace confirms the sulfur atom, and the confirmed candidate is C10H17N3O6S, glutathione.

Example 3. Halogen Counting

Several halogens have very distinct isotopic patterns that are apparent to the naked eye. Both bromine and chlorine have major A + 2 isotopes with large natural abundances, ~25% for ³⁷Cl and ~50% for ⁸¹Br. Consequently, a substantial A + 2 peak can indicate a compound containing one or more Cl or Br atoms. Multiple halogen atoms yield characteristic multi-peak patterns in the MS spectrum, with peaks spaced every 2 Da (figures 6, 7).
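The characteristic halogen envelopes follow from repeated convolution of the chlorine and bromine isotope abundances. Relative to the A peak, one Cl gives A + 2 ≈ 0.32 and two give ≈ 0.64, while one, two, and three Br approach 1:1, 1:2:1, and 1:3:3:1. This sketch ignores the smaller carbon, oxygen, and sulfur contributions at A + 2, and `guess_halogens` is a hypothetical least-squares matcher for illustration, not an established scoring function.

```python
def convolve(a, b):
    """Discrete convolution of two abundance lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


# Abundances by nominal shift (A, A+1, A+2) for 35Cl/37Cl and 79Br/81Br, from table 1.
HALOGEN = {"Cl": [0.7576, 0.0, 0.2424], "Br": [0.5069, 0.0, 0.4931]}


def halogen_envelope(element, count):
    """Relative intensities (A = 1) of A, A+2, A+4, ... for `count` Cl or Br atoms."""
    dist = [1.0]
    for _ in range(count):
        dist = convolve(dist, HALOGEN[element])
    even = dist[::2]  # these isotopes only populate even nominal shifts
    return [p / even[0] for p in even]


def guess_halogens(observed, max_count=4):
    """(element, count) whose envelope best matches observed [A, A+2, A+4, ...]."""
    best, best_err = None, float("inf")
    for element in HALOGEN:
        for count in range(1, max_count + 1):
            model = halogen_envelope(element, count)
            k = max(len(model), len(observed))
            model = model + [0.0] * (k - len(model))
            obs = list(observed) + [0.0] * (k - len(observed))
            err = sum((m - o) ** 2 for m, o in zip(model, obs))
            if err < best_err:
                best, best_err = (element, count), err
    return best


for n in (1, 2, 3):
    print("Cl", n, [round(x, 2) for x in halogen_envelope("Cl", n)])
    print("Br", n, [round(x, 2) for x in halogen_envelope("Br", n)])
print(guess_halogens([1.0, 1.95, 0.95]))
```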

Figure 6.

Using isotopic abundance patterns to count the number of chlorine atoms in a C12 compound; at each individual mass fragment, exact mass can be investigated further to confirm the chemical formula. ³⁷Cl makes up ~25% of natural chlorine, so an A + 2 peak spaced 1.997 375 78 Da from A with a relative abundance of ~32% indicates 1 chlorine, ~63% indicates 2 chlorines, ~100% indicates 3 chlorines, etc. The A+1 spacing of 1.003 354 84 Da and relative abundance of ~12% indicate 12 carbons.

Figure 7.

As in figure 6, isotopic abundance patterns are used to count the number of bromine atoms in a C12 compound; at each monoisotopic mass fragment, exact mass can be investigated further to confirm the chemical formula. ⁸¹Br makes up ~50% of natural bromine, with an A + 2 spacing of 1.997 952 1 Da; as such, an abundance ratio (A + 2)/A ≈ 1 indicates 1 bromine, (A + 2)/A ≈ 2 indicates 2 bromines, (A + 2)/A ≈ 3 indicates 3 bromines, etc.

Interpreting high-resolution features: fragmentation patterns

When organic compounds are fragmented, whether in-source by electron ionization (EI) or chemical ionization, or deliberately through MS/MS, they form charged fragments, each with its own characteristic isotopic features. Even at low resolution, the pattern of these fragments adds a dimension for identifying compounds, especially when the molecular ion is missing or uncertain. High-resolution MS can distinguish among possible atomic compositions of a fragment just as it does for a molecular species. For example, a mass fragment at 85 Da could be either C6H13 or C5H9O at 1 Da resolution. At high resolution, however, the ambiguity disappears, because the accurate masses of 85.101 725 and 85.065 339 Da, respectively, are easily separated. Much like the isotopic mass patterns discussed in the previous section, the accurate masses of fragments serve as identifiers of molecular substructures. In principle, this allows the original chemical structure to be reassembled from its fragments, but for larger molecules with many fragments the process is complicated. Instead, fragment patterns are most frequently compared against library spectra to generate a list of potential structures and/or substructures. A standard library search at unit resolution generally returns a long list of potential chemical formulae, whereas high-resolution fragment spectra avoid many assignment issues regardless of the fragmentation method or structural assignment approach.
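A minimal sketch of tolerance-based fragment assignment for the m/z 85 example. Unlike the neutral masses quoted above, it subtracts the electron mass (about 0.55 mDa) to give the m/z of a singly charged cation, which matters at ppm-level tolerances; the measured value 85.1012 and the 5 ppm window are illustrative assumptions.

```python
import re

MONO = {"H": 1.007825032, "C": 12.0, "N": 14.003074, "O": 15.99491462}  # table 1 (Da)
ELECTRON = 0.000548580  # electron mass (Da); a singly charged cation lacks one electron


def ion_mass(formula):
    """m/z of a singly charged cation (e.g. an EI fragment) with the given formula."""
    neutral = sum(MONO[el] * (int(n) if n else 1)
                  for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))
    return neutral - ELECTRON


def assign(measured, candidates, tol_da):
    """Candidate formulae whose ion mass lies within tol_da of the measured m/z."""
    return [f for f in candidates if abs(ion_mass(f) - measured) <= tol_da]


candidates = ["C6H13", "C5H9O"]
print(assign(85.0, candidates, tol_da=0.5))                # unit-resolution window
print(assign(85.1012, candidates, tol_da=85.1012 * 5e-6))  # 5 ppm window
```

With a unit-resolution window both compositions fit, while the 5 ppm window leaves only C6H13.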

Example 1. Precursor discrimination in MS/MS

MS analysis with a triple-quadrupole mass spectrometer (QqQ) requires ionization and the selection of a precursor ion in the first quadrupole; the precursor then passes into a collision cell, where it is fragmented, before the fragments are isolated by the third and final quadrupole. Selecting both the precursor and the fragment at a known collision energy is intended to be very specific, even with low resolution quadrupoles. Nevertheless, false positives can occur between compounds of similar precursor mass that share non-specific fragment transitions. Consider three chemicals that might coelute in food and fish samples contaminated with fluorinated compounds: taurodeoxycholic acid (TDCA), perfluorooctanesulfonic acid (PFOS), and 8-(acetyloxy)-1,3,6-pyrenetrisulfonic acid (1,3,6-PSA) (table 2).

Table 2.

Three organic molecules demonstrating low-resolution MS/MS transitions of 499 > 80, arising from the SO3⁻ fragment ion.

Name	Formula	Exact mass (Da)
TDCA	C26H45NO6S	499.296 75
PFOS	C8HF17O3S	499.937 49
8-(acetyloxy)-1,3,6-PSA	C18H12O11S3	499.954 17

All three compounds have similar precursor masses within a 1 Da isolation window and share a common transition at 499 > 80 due to the production of an SO3⁻ fragment ion. The QqQ transition is thus non-specific at low resolution and requires further comparison with reference standards, or the inclusion of secondary transitions, to confirm the identity of the species and the purity of the observed transition. With a higher resolution instrument, the transitions are readily distinguished. The PFOS and TDCA transitions of 498.9 > 80.0 and 498.3 > 80.0 are resolvable at R ~ 1000, while PFOS and 1,3,6-PSA require a much higher resolving power of R ~ 30 000 to distinguish their respective transitions of 498.9297 > 79.9568 and 498.9463 > 79.9568. Note that a typical HR-MS instrument still uses a low resolution quadrupole for precursor isolation, so coeluting species would not be separated during isolation, but accurate assignment of distinct precursor masses in a precursor scan allows false assignments to be recognized.
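The precursor m/z values and resolving powers in this example can be recomputed from the table 2 exact masses. This sketch assumes deprotonated [M−H]⁻ precursors and subtracts the proton mass, so its m/z values differ by about 0.5 mDa from values obtained by subtracting a hydrogen atom; the resolving-power criterion is the simple FWHM one, R = M/ΔM.

```python
PROTON = 1.007276467  # Da; [M-H]- m/z = M - proton mass

exact = {"TDCA": 499.29675, "PFOS": 499.93749, "8-(acetyloxy)-1,3,6-PSA": 499.95417}  # table 2
precursor = {name: m - PROTON for name, m in exact.items()}


def required_r(mz_a, mz_b):
    """Resolving power needed to separate two ions at FWHM: R = M / dM."""
    return (mz_a + mz_b) / 2 / abs(mz_a - mz_b)


for name, mz in precursor.items():
    print(f"{name:26s} [M-H]- m/z {mz:.4f}")
print(f"PFOS vs TDCA: R ~ {required_r(precursor['PFOS'], precursor['TDCA']):,.0f}")
print(f"PFOS vs PSA:  R ~ {required_r(precursor['PFOS'], precursor['8-(acetyloxy)-1,3,6-PSA']):,.0f}")
```

It gives roughly R ≈ 800 for PFOS versus TDCA and R ≈ 30 000 for PFOS versus 1,3,6-PSA, in line with the thresholds quoted above.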

Example 2. Substructure Fragments

Distinguishing substructural fragments of an unknown compound can greatly narrow the possible structures and, together with reference spectra, confirm its identity. Consider a compound with a monoisotopic mass of 270.0892, an assigned empirical formula of C16H14O4, and the EI fragment spectrum shown in figure 8.

Figure 8.

Theoretical EI spectrum of cardamonin (chemical formula C16H14O4, monoisotopic mass 270.0892), indicating major fragments and losses.

A search of the US EPA CompTox Chemistry Dashboard yields 228 chemicals with this empirical formula, which vary substantially in structure and can be discriminated by their fragmentation. Even at low resolution, the fragment at 193 corresponds to the loss of a phenyl ring (a loss of 77), significantly narrowing the search space. The fragment at 139 is more difficult to assign from a low-resolution spectrum alone, because it could correspond to the loss of either C9H7O (131.0497) or C10H11 (131.0861). With high resolution mass measurement, the exact mass difference of 131.0492 is consistent with the C9H7O loss. The fragmentation is thus consistent with a chalcone backbone bearing dihydroxy and methoxy substituents on a single phenyl ring, such as cardamonin, shown in figure 8. Further details on elucidating complex structures from MS/MS are beyond the scope of this manuscript, but comparison with reference spectra allows validation even without sufficient expertise for de novo elucidation.
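Neutral-loss assignment can be automated in the same spirit as formula generation: take the mass difference, then enumerate compositions that fit it. This sketch restricts the loss to C, H, and O, allows odd-electron (radical) losses, caps hydrogens at a saturated chain (2C + 2), and uses a 5 ppm window; all of these are assumptions chosen to match this example.

```python
MASS = {"C": 12.0, "H": 1.007825032, "O": 15.99491462}  # table 1 (Da)


def fmt(c, h, o):
    """Formula string such as 'C9H7O'."""
    return f"C{c}H{h}" + ("" if o == 0 else "O" if o == 1 else f"O{o}")


def neutral_loss_formulas(loss, ppm=5.0, max_c=15, max_o=5):
    """CcHhOo compositions (radicals allowed) within `ppm` of a neutral-loss mass."""
    tol = loss * ppm * 1e-6
    hits = []
    for c in range(1, max_c + 1):
        for o in range(max_o + 1):
            heavy = c * MASS["C"] + o * MASS["O"]
            h = round((loss - heavy) / MASS["H"])  # only the nearest H count can fit
            if h < 0 or h > 2 * c + 2:
                continue
            mass = heavy + h * MASS["H"]
            if abs(mass - loss) <= tol:
                hits.append((fmt(c, h, o), round(mass, 4)))
    return hits


print(neutral_loss_formulas(131.0492))
```

Among these C/H/O compositions only C9H7O falls within 5 ppm of 131.0492, matching the assignment made above.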

Interpreting high-resolution features: automated data reduction and high-resolution search algorithms

So far, the identification schemes described here have been applied manually; that is, each analytical feature was investigated individually using the expertise of the researcher, with assistance from mass look-up tables and database searches. With the advent of higher sensitivity instrumentation, we are now faced with thousands of features per sample for hundreds or more samples in any given study. As such, the major HR-MS manufacturers, such as Agilent Technologies (Santa Clara, CA, USA), ThermoFisher Scientific (Waltham, MA, USA), SCIEX (Framingham, MA, USA), LECO (Saint Joseph, MI, USA), Shimadzu (Nakagyo-ku, Kyoto, Japan), and Waters (Milford, MA, USA), have all developed proprietary software specific to their platforms to automate the identification of unknowns. In addition, academic researchers have been developing more generic algorithms that can accept input from different instrumental data streams. Regardless of the exact implementation, the overarching goal is to apply the subject matter expertise described above in an automated fashion to expedite compound identification. Some of the major software packages for feature extraction, compound identification and annotation, and statistical analysis and visualization include the following: Agilent Technologies (MassHunter Profinder, Mass Profiler Professional, Unknowns Analysis, BioConfirm Software); ThermoFisher Scientific (Mass Frontier Spectral Interpretation Software, Compound Discoverer Software for small molecule identification, TargetQuan3 Software for identifying persistent organic pollutants, ToxFinder); SCIEX (PeakView Software, MarkerView Software, XCMSPlus Software); LECO (ChromaTOF); Shimadzu (LabSolutions Insight); and Waters (ChromaLynx, MarkerLynx).
Each vendor has its own preferred software packages for data reduction and analysis, and no two packages provide exactly the same features and capabilities. Software selection depends on the instrumentation used for analysis as well as on the needs of the researcher. Additionally, computer-based spectral fragmentation tools, including CFM-ID, MAGMA, and MaxQuant, are available to assist with data analysis (Cox and Mann 2008, Allen ). MaxQuant consists of a series of algorithms that can be used for peak detection and quantification as well as identification of fragmentation spectra (Cox and Mann 2008). CFM-ID is a web server that can be used for peak annotation, spectrum prediction, and metabolite identification (Allen ). XCMS is also available as an online tool (XCMS Online) for analyzing non-targeted LC/MS metabolomics data through feature detection, retention time correction, and alignment; it also provides a platform for statistical analyses and data visualization (Tautenhahn ). General mass spectral libraries, such as the National Institute of Standards and Technology (NIST) library, can be used to search non-targeted mass spectral data. Additional databases, such as the U.S. EPA CompTox Chemistry Dashboard, can be incorporated into the data processing workflow to help characterize compounds of interest. Compounds can be searched by chemical name, CASRN, DSSTox ID, MS-ready formula, exact formula, monoisotopic mass, or InChIKey, and a list of tentative compound identifications based on the results of MS data post-processing can be retrieved from the Dashboard. Additional compound information, including presence in lists, number of data sources, National Health and Nutrition Examination Survey (NHANES)-based predicted exposure, and number of PubMed articles, can also be downloaded from this site, along with specific properties of compounds of interest. Online databases, including the U.S. EPA CompTox Chemistry Dashboard, U.S. EPA DSSTox, and ChemSpider, can be used to rank-order candidate identities of unknown compounds based on monoisotopic masses and chemical formulae, as well as to retrieve important chemical identification information, properties, and structural images (Rager , McEachran , Sobus ). Metabolomics databases, such as the Human Metabolome Database (http://hmdb.ca/), METLIN (https://metlin.scripps.edu/landing_page.php?pgcontent=mainPage), and the Golm Metabolome Database (http://gmd.mpimp-golm.mpg.de/), can be used to search high-resolution tandem MS/MS, GC/MS, and LC/MS spectra for known and unknown human metabolites (Rathahao-Paris ).
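At its core, a search by monoisotopic mass computes the monoisotopic mass of each candidate formula and keeps those within a mass-error tolerance of the measured value, ranked by error. The sketch below is a simplified stand-in for such a database query, not the Dashboard's own implementation. It assumes neutral masses (no adduct or charge handling), flat formulae without parentheses or isotope labels, and a small hand-written candidate list; the function names, the 5 ppm tolerance, and the "measured" mass of 194.0805 Da are hypothetical. The three candidates share the nominal mass 194 and are separated only by their exact masses.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element (NIST values).
MONO = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
    "Cl": 34.96885268,
    "Br": 78.9183371,
}

def parse_formula(formula: str) -> dict:
    """Parse a flat formula such as 'C8H10N4O2' into element counts.
    Assumption: no parentheses, charges, or isotope labels."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts

def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass of a formula."""
    return sum(MONO[el] * n for el, n in parse_formula(formula).items())

def rank_candidates(measured_mass: float, candidates, ppm_tol: float = 5.0):
    """Return (name, formula, ppm error) for candidates within ppm_tol,
    ordered by absolute mass error."""
    hits = []
    for name, formula in candidates:
        theo = monoisotopic_mass(formula)
        err = (measured_mass - theo) / theo * 1e6
        if abs(err) <= ppm_tol:
            hits.append((name, formula, round(err, 2)))
    return sorted(hits, key=lambda h: abs(h[2]))

# Same nominal mass (194 Da), different atomic constituents.
candidates = [
    ("caffeine", "C8H10N4O2"),
    ("dimethyl phthalate", "C10H10O4"),
    ("butylparaben", "C11H14O3"),
]
for name, formula in candidates:
    print(f"{name:20s} {formula:10s} {monoisotopic_mass(formula):.4f}")
print(rank_candidates(194.0805, candidates))
```

With unit mass resolution all three candidates are indistinguishable; at a few ppm mass accuracy only one survives the tolerance filter.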

Summary and recommendations

Accurate (or exact) mass from HR-MS brings an additional dimension to the identification of complex organic molecules. The use of monoisotopic exact mass alone is a great improvement over standard unit (1 Da) resolution and often narrows compound identity from hundreds of possibilities to a dozen or so. Herein, we discussed the value of going beyond this monoisotopic focus by incorporating additional knowledge to narrow the possibilities even further. Two basic techniques help confirm the suspected identities of unknown features obtained from a monoisotopic mass library search: (1) counting major constituent atoms, such as carbon, sulfur, or halogens, based on their isotopic abundances and the exact mass differences between isotopes; and (2) identifying the various molecular fragments produced during ionization by their exact masses, which provides additional information for reconstructing the original molecule from its building blocks. These methods can be applied molecule by molecule by the researcher and are generally a first approach for the initial evaluation of analytical features thought to be important. For example, in a case-control evaluation, a handful of features may differ between the groups, calling for a hands-on identification effort. However, this is not practical for processing hundreds (possibly thousands) of features from many samples. As described above, efficient data reduction requires specialized software. Many entries in the mass spectrometry software arena are constantly being updated and improved, and the primary current focus is building the reference data used to confirm calculations based on empirical spectra. As HR-MS instruments improve in sensitivity and mass accuracy, the combination of exact mass, knowledge of relative isotope abundances, and the increasing size of confirmatory databases will continue to improve the identification of unknowns in complex biological samples.
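The first technique, atom counting from isotope patterns, can be sketched in a few lines. The Python below is a simplified illustration under stated assumptions: the M+1 peak is treated as arising only from 13C, and the M+2 peak only from the element being counted (real spectra mix contributions from 2H, 15N, 17O, 18O, 33S, and 13C2). Isotope masses and abundances are standard tabulated (NIST/IUPAC) values; the function names, the 0.4 mDa spacing tolerance, and the example intensities are hypothetical. Because the 37Cl/35Cl and 81Br/79Br spacings differ by less than 1 mDa, the spacing test is only meaningful when the instrument's mass accuracy is at least that good.

```python
# Exact mass differences (Da) between heavy and light isotopes (NIST atomic masses).
# Only common M+2 contributors in organic molecules are listed; 18O is omitted.
M2_SPACINGS = {
    "37Cl-35Cl": 36.96590259 - 34.96885268,
    "81Br-79Br": 80.9162906 - 78.9183371,
    "34S-32S": 33.96786690 - 31.97207100,
    "13C2-12C2": 2 * (13.0033548378 - 12.0),
}
C13_PER_CARBON = 0.0107 / 0.9893   # expected M+1/M contribution per carbon atom
CL37_PER_CL = 0.2424 / 0.7576      # expected M+2/M contribution per chlorine atom

def estimate_carbon_count(i_m: float, i_m1: float) -> int:
    """Rough carbon count from the M+1/M intensity ratio, assuming 13C is the
    only M+1 contributor (ignores 2H, 15N, 17O, 33S)."""
    return round((i_m1 / i_m) / C13_PER_CARBON)

def estimate_chlorine_count(i_m: float, i_m2: float) -> int:
    """Chlorine count from the M+2/M ratio. For n Cl atoms the binomial
    M+2/M ratio is exactly n * (37Cl/35Cl), if Cl is the only M+2 source."""
    return round((i_m2 / i_m) / CL37_PER_CL)

def classify_m2_spacing(delta: float, tol: float = 0.0004) -> list:
    """Isotope pairs whose exact spacing lies within tol (Da) of the observed
    (M+2) - M mass difference."""
    return [name for name, d in M2_SPACINGS.items() if abs(delta - d) <= tol]

# Hypothetical peak intensities and spacings for illustration.
print(estimate_carbon_count(100.0, 8.7))
print(estimate_chlorine_count(100.0, 64.0))
print(classify_m2_spacing(1.99703))
```

Combining the two clues (how many atoms, and which isotope pair produces the M+2 spacing) constrains the candidate formula list beyond what the monoisotopic mass alone provides.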
The primary purpose of this process is to document as many subsets of biological exposome data as possible to develop a comprehensive understanding of human systems biology. As we better understand the connections between health-related stressors and the human response through investigations of subtle biochemical perturbations, we will be able to develop interventions to preserve public health.

References

1. Electrostatic axially harmonic orbital trapping: a high-performance technique of mass analysis

Authors:
Journal: Anal Chem Date: 2000-03-15 Impact factor: 6.986

2. Scaling MS plateaus with high-resolution FT-ICRMS.

Authors: Alan G Marshall; Christopher L Hendrickson; Stone D H Shi
Journal: Anal Chem Date: 2002-05-01 Impact factor: 6.986

3. XCMS Online: a web-based platform to process untargeted metabolomic data.

Authors: Ralf Tautenhahn; Gary J Patti; Duane Rinehart; Gary Siuzdak
Journal: Anal Chem Date: 2012-05-10 Impact factor: 6.986

4. The chromatographic role in high resolution mass spectrometry for non-targeted analysis.

Authors: Timothy R Croley; Kevin D White; John H Callahan; Steven M Musser
Journal: J Am Soc Mass Spectrom Date: 2012-06-19 Impact factor: 3.109

5. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification.

Authors: Jürgen Cox; Matthias Mann
Journal: Nat Biotechnol Date: 2008-11-30 Impact factor: 54.908

6. Method validation of nanoparticle tracking analysis to measure pulmonary nanoparticle content: the size distribution in exhaled breath condensate depends on occupational exposure.

Authors: J-J Sauvain; G Suarez; J-L Edmé; O M P A Bezerra; K G Silveira; L S Amaral; A P S Carneiro; N Chérot-Kornobis; A Sobaszek; S Hulo
Journal: J Breath Res Date: 2017-01-24 Impact factor: 3.262

7. Human breath metabolomics using an optimized non-invasive exhaled breath condensate sampler.

Authors: Konstantin O Zamuruyev; Alexander A Aksenov; Alberto Pasamontes; Joshua F Brown; Dayna R Pettit; Soraya Foutouhi; Bart C Weimer; Michael Schivo; Nicholas J Kenyon; Jean-Pierre Delplanque; Cristina E Davis
Journal: J Breath Res Date: 2016-12-22 Impact factor: 3.262

8. Atmospheric pressure chemical ionization mass spectrometry for analysis of lipids.

Authors: W C Byrdwell
Journal: Lipids Date: 2001-04 Impact factor: 1.880

9. Integrating tools for non-targeted analysis research and chemical safety evaluations at the US EPA.

Authors: Jon R Sobus; John F Wambaugh; Kristin K Isaacs; Antony J Williams; Andrew D McEachran; Ann M Richard; Christopher M Grulke; Elin M Ulrich; Julia E Rager; Mark J Strynar; Seth R Newton
Journal: J Expo Sci Environ Epidemiol Date: 2017-12-29 Impact factor: 5.563

10. CFM-ID: a web server for annotation, spectrum prediction and metabolite identification from tandem mass spectra.

Authors: Felicity Allen; Allison Pon; Michael Wilson; Russ Greiner; David Wishart
Journal: Nucleic Acids Res Date: 2014-06-03 Impact factor: 16.971
