Stef R A Molenaar1,2, Bram van de Put1,2,3, Jessica S Desport1,2, Saer Samanipour1,2, Ron A H Peters1,2,4, Bob W J Pirok1,2.
Abstract
A fast algorithm for automated feature mining of synthetic (industrial) homopolymers or perfectly alternating copolymers was developed. The algorithm processes comprehensive two-dimensional liquid chromatography-mass spectrometry (LC × LC-MS) data in four parts. In the first part, the data is reduced by selecting regions of interest (ROIs). All ROIs are then clustered in the time and mass-to-charge domains to obtain isotopic distributions, after which single-value clusters and background signals are removed from the data structure. In the second part, the isotopic distributions are used to determine the charge state of the polymeric units, and the charge-state-reduced masses of the units are calculated. In the third part, the mass of the repeating unit (i.e., the monomer) is selected automatically by comparing all mass differences within the data structure. Using the mass of the repeating unit, mass remainder analysis can be performed on the data, which yields groups sharing the same end-group composition. In the fourth part, information from the clustering step of the first part is combined with the mass remainder analysis to create compositional series, which are mapped onto the chromatogram. Series with similar chromatographic behavior are separated in the mass-remainder domain, whereas series with an overlapping mass remainder are separated in the chromatographic domain. These series were extracted within a calculation time of 3 min. The false positives were then assessed within a reasonable time. The algorithm was verified with LC × LC-MS data of an industrial hexahydrophthalic anhydride-derivatized propylene glycol-terephthalic acid copolyester. A chemical structure was then proposed for each compositional series found within the data.
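The second and third parts described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the isotope-spacing tolerance, the 0.01 Da histogram bin width, and the rule "take the most populated bin of pairwise mass differences" are all assumptions made for illustration. The charge-state reduction further assumes that every charge is carried by one adduct ion (22.9898 Da, consistent with sodium) and ignores the electron mass.

```python
from collections import defaultdict
from itertools import combinations

C13_SPACING = 1.00335  # approximate 13C-12C mass difference in Da


def estimate_charge(isotope_mz, tol=0.01, max_charge=5):
    """Estimate charge z from the mean spacing of isotope peaks (~1.00335 / z)."""
    peaks = sorted(isotope_mz)
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    if not gaps:
        return None  # a single peak carries no spacing information
    mean_gap = sum(gaps) / len(gaps)
    for z in range(1, max_charge + 1):
        if abs(mean_gap - C13_SPACING / z) <= tol:
            return z
    return None


def charge_reduced_mass(mz, z, adduct_mass=22.9898):
    """Assumption: each charge is carried by one adduct ion; electron mass ignored."""
    return z * (mz - adduct_mass)


def estimate_repeat_unit(masses, min_mass=12.0, bin_width=0.01):
    """Return the mean of the most frequent pairwise mass difference (hypothetical rule)."""
    bins = defaultdict(list)
    for a, b in combinations(masses, 2):
        diff = abs(a - b)
        if diff >= min_mass:
            bins[round(diff / bin_width)].append(diff)
    if not bins:
        return None
    best = max(bins.values(), key=len)
    return sum(best) / len(best)


def group_by_remainder(masses, repeat_mass, tol=0.05):
    """Group masses whose mass remainders (M mod R) lie within tol of their neighbor."""
    pairs = sorted((m % repeat_mass, m) for m in masses)
    groups = []
    for mr, m in pairs:
        if groups and mr - groups[-1][-1][0] <= tol:
            groups[-1].append((mr, m))
        else:
            groups.append([(mr, m)])
    # Note: remainders that wrap around 0 / repeat_mass are not merged here.
    return [[m for _, m in g] for g in groups]


# Synthetic example: two series with made-up end-group masses and the
# 206.0571 Da repeating unit reported for the polyester.
masses = [18.0106 + n * 206.0571 for n in range(1, 7)]
masses += [100.05 + n * 206.0571 for n in range(1, 7)]
repeat = estimate_repeat_unit(masses)
series = group_by_remainder(masses, repeat)
```

In this sketch each group returned by `group_by_remainder` corresponds to one candidate end-group composition; separating groups that share a mass remainder would additionally require the chromatographic information, which is not modeled here.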
Year: 2022 PMID: 35343683 PMCID: PMC9008690 DOI: 10.1021/acs.analchem.1c05336
Source DB: PubMed Journal: Anal Chem ISSN: 0003-2700 Impact factor: 6.986
User-Defined Parameters Used in the Algorithm
| Symbol | Parameter | Value |
|---|---|---|
| | ROI analysis: minimum mass peak intensity | 100 counts |
| Δ | ROI analysis: mass tolerance | 0.15 Da |
| | ROI analysis: minimum number of consecutive datapoints | 6 scans |
| | background removal: occurrence of signals ( | |
| | clustering: maximum Mahalanobis distance | 0.05 \|\| 0.15 |
| | MARA: minimum mass of repeat unit | 12.0000 Da |
| ΔMRmax | MARA: mass remainder tolerance when binning | 0.05 Da |
| | MARA: optional parameter. The mass of the adduct | 22.9898 Da |
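To show how the three ROI parameters in the table could interact, the following Python sketch implements a generic region-of-interest search. It is an assumption-laden illustration, not the algorithm used in the paper: the `RoiParams` class, the `find_rois` function, the input layout (one list of `(mz, intensity)` pairs per scan), and the matching rule (compare each peak with the running mean m/z of an open ROI) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoiParams:
    # Default values taken from the parameter table; field names are hypothetical.
    min_intensity: float = 100.0      # counts
    mass_tolerance: float = 0.15      # Da
    min_consecutive_scans: int = 6    # scans


def find_rois(scans, params=RoiParams()):
    """Return ROIs as lists of (scan_index, mz, intensity).

    An ROI is kept only if it was extended in at least
    `min_consecutive_scans` consecutive scans.
    """
    open_rois = []   # ROIs that received a point in the previous scan
    finished = []
    for i, scan in enumerate(scans):
        extended = []
        used = set()  # indices of open ROIs already extended in this scan
        for mz, intensity in scan:
            if intensity < params.min_intensity:
                continue  # below the intensity threshold
            for k, roi in enumerate(open_rois):
                if k in used:
                    continue
                mean_mz = sum(p[1] for p in roi) / len(roi)
                if abs(mz - mean_mz) <= params.mass_tolerance:
                    roi.append((i, mz, intensity))
                    used.add(k)
                    extended.append(roi)
                    break
            else:
                # No open ROI matched: start a new one.
                extended.append([(i, mz, intensity)])
        # ROIs that were not extended in this scan are closed.
        for k, roi in enumerate(open_rois):
            if k not in used and len(roi) >= params.min_consecutive_scans:
                finished.append(roi)
        open_rois = extended
    finished.extend(
        r for r in open_rois if len(r) >= params.min_consecutive_scans
    )
    return finished


# Synthetic example: one trace present in 7 consecutive scans, one in only 3,
# and a weak signal below the intensity threshold in the last scan.
scans = []
for i in range(8):
    scan = []
    if i < 7:
        scan.append((500.00 + 0.01 * (i % 2), 500.0))
    if i < 3:
        scan.append((700.00, 500.0))
    if i == 7:
        scan.append((500.00, 50.0))
    scans.append(scan)

rois = find_rois(scans)
```

With these synthetic scans only the trace near m/z 500 survives, because the trace at m/z 700 spans fewer than six consecutive scans and the weak signal falls below the 100-count threshold.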
Figure 1. Overview of the proposed data analysis strategy. Colors indicate (red) discard irrelevant data, (purple) grouping of data, and (green) classification of the compositional series.
Figure 2. (A) LC × LC–MS plot. (B) Unfolded TIC signal. (C) Cumulative mass spectrum between 37 and 39 min. (D) Cumulative mass spectrum between 40 and 42 min.
Figure 3. (A) Histogram of mass differences between all found ROIs. The most frequently occurring difference, at 206.0571 Da, corresponds to the mass of the repeating unit (PG-TPA) of the polyester. (B) Histogram of the found MRs within the polyester data. All groups are numbered from the highest to the lowest intensity. The inset shows a zoomed-in region of the MR plot for series 2 and 9.
Figure 4. (A) Most prominent compositional series. (B) Second most prominent compositional series. A minor contamination of series 9 is also visible. (C) Third most prominent compositional series. (D) Cumulative chromatogram of the three most prominent compositional series.
Figure 5. Contour plot of the first, second, and ninth most prominent groups. Group 9 showed chromatographic overlap with group 1 but different chromatographic behavior than the second most prominent group.
Figure 6. Contour plot of the polyester data showing the approximate positions of the 12 found groups. For a more detailed figure, please refer to the Supporting Information (Figure S8).