Max Reuschenbach, Lotta L Hohrenk-Danzouma, Torsten C Schmidt, Gerrit Renner.
Abstract
High-resolution mass spectrometry is widely used across many research fields because it allows accurate mass determination. In this context, it is common practice to reduce high-resolution profile-mode mass spectra to centroided data, which many data processing routines rely on for further evaluation. However, information on the quality of the peak profile is not conserved in those approaches, so describing the reliability of the results is almost impossible. We overcome this limitation by developing a new statistical parameter called the data quality score (DQS). For the DQS calculations, we performed a very fast and robust regression analysis of the individual high-resolution peak profiles and considered error propagation to estimate the uncertainties of the regression coefficients. We successfully validated the new algorithm against the vendor-specific algorithm implemented in ProteoWizard's msConvert. Moreover, we show that the DQS is a sum parameter associated with centroid accuracy and precision. We also demonstrate the benefit of the new algorithm in nontarget screening, as the DQS prioritizes signals that are not influenced by non-resolved isobaric ions or isotopic fine structures. The algorithm is implemented in the Python, R, and Julia programming languages and supports multi- and cross-platform downstream data handling.
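The regression step described in the abstract can be illustrated with a minimal sketch. It assumes a Gaussian peak profile, whose logarithm is a parabola, and fits that parabola with unweighted ordinary least squares; the standard error of the centroid then follows from first-order error propagation of the coefficient covariance. The function name `fit_log_parabola`, the centering and scaling of the m/z axis, and the unweighted fit are assumptions made for illustration. This is not the authors' published implementation, and it does not compute the DQS itself.

```python
import numpy as np


def fit_log_parabola(mz, intensity):
    """Fit ln(I) = b0 + b1*x + b2*x**2 to one peak profile (hypothetical helper).

    Returns the centroid m/z, its propagated standard error, and the
    Gaussian sigma. Needs at least four points with positive intensity.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)

    # Center and scale the m/z axis so the normal equations stay well conditioned.
    shift = mz.mean()
    scale = np.ptp(mz)
    x = (mz - shift) / scale
    y = np.log(intensity)

    # Ordinary least squares on the design matrix [1, x, x^2].
    X = np.column_stack([np.ones_like(x), x, x**2])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    b0, b1, b2 = beta
    if b2 >= 0:
        raise ValueError("profile is not concave; no Gaussian-like maximum")

    # Covariance of the coefficients from the residual variance.
    resid = y - X @ beta
    dof = len(x) - 3
    cov = (resid @ resid / dof) * xtx_inv

    # Vertex of the parabola is the centroid (in scaled units).
    vertex = -b1 / (2.0 * b2)
    # First-order error propagation: gradient of the vertex w.r.t. (b0, b1, b2).
    grad = np.array([0.0, -1.0 / (2.0 * b2), b1 / (2.0 * b2**2)])
    vertex_se = np.sqrt(grad @ cov @ grad)

    centroid = shift + scale * vertex
    centroid_se = scale * vertex_se
    sigma = scale * np.sqrt(-1.0 / (2.0 * b2))
    return centroid, centroid_se, sigma


# Synthetic, noise-free Gaussian profile (illustrative values only).
true_mu, true_sigma = 200.0005, 0.002
mz_axis = true_mu + np.arange(-4, 5) * 0.0005 + 0.0002
profile = 1.0e5 * np.exp(-((mz_axis - true_mu) ** 2) / (2.0 * true_sigma**2))
centroid, centroid_se, sigma = fit_log_parabola(mz_axis, profile)
```

For real, noisy profiles the log transform changes the noise structure, so a weighted fit may be preferable; the sketch leaves that choice open.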
Keywords: Centroiding; Data processing; Data quality; HRMS
Year: 2022 PMID: 35871703 PMCID: PMC9411079 DOI: 10.1007/s00216-022-04224-y
Source DB: PubMed Journal: Anal Bioanal Chem ISSN: 1618-2642 Impact factor: 4.478
Fig. 1 Procedure of our new algorithm for centroiding profile mass spectra. The standard errors associated with the parabola in step 2 are used to calculate the Data Quality Scores
Fig. 2 A Scatterplot of Data Quality Scores (DQS). Most peak profiles (99.7%) obtain a DQS above 0.90, which means that they agree well with the Gaussian model. B–E Four examples of peak profiles with fits (red) and different Data Quality Scores. The dotted lines mark the connection to the zero-intensity points surrounding the peak profiles. These points are not included in the fit
Fig. 3 A Absolute mass accuracy in ppm compared for centroids falling into the categories I–IV. Centroids with a low DQS tend to show a higher deviation from the expected exact mass. B Precision in m/z compared for centroids falling into the categories I–IV, given as relative standard deviation (RSD) in ppm. Centroids with a low DQS have lower precision in their m/z values. The categories are selected using the relative error of the Gaussian peak area (1%, 5%, and 33%) and the corresponding DQS. The boxes enclose the interquartile range (IQR) and the median (horizontal line). The whiskers extend 1.5 IQR beyond the quartiles
Fig. 4 A Exemplary chromatographic profile that shows a pronounced peak. The Data Quality Score (DQS) of the centroids that form this chromatographic peak is high (category I from Fig. 3), which indicates that the underlying peak profiles were of Gaussian shape. The color scale is given in subfigure C. B Peak profiles of the centroids used in subfigure A. This subfigure shares the y-axis with subfigure A. The peak profiles are Gaussian in shape and symmetric; thus, the likelihood of non-resolved isobaric peaks or isotopic fine structures is reduced. C Chromatographic peak with lower DQS values (category II) in the data points. D The peak profiles of the centroids presented in subfigure C show higher asymmetry and a poorer match with the Gaussian model. Therefore, there is a high potential for non-resolved underlying peaks in the peak profiles. This subfigure shares the y-axis with subfigure C
Fig. 5 A Relationship between profile peak width [FWHM] and m/z centroid position. B Relationship between mass resolution and m/z centroid position. The mass resolution is calculated with Eqs. 15 and 16. To determine the relationship between the dependent and independent variables, a power law was fitted with non-linear regression
Fig. 6 Histogram with 50 classes of relative m/z difference [ppm] between our obtained centroids and msConvert's results. Median and interquartile range (IQR) are determined from 1129 difference values
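The summary statistics named in the Fig. 6 caption can be reproduced with a short sketch. It assumes the relative difference is taken with respect to the reference m/z value (here, the msConvert centroid) and expressed in ppm; the sign convention, the choice of denominator, and the function names `ppm_difference` and `median_and_iqr` are assumptions, not taken from the source.

```python
import numpy as np


def ppm_difference(mz_test, mz_ref):
    """Relative m/z difference in ppm, taken relative to the reference value."""
    mz_test = np.asarray(mz_test, dtype=float)
    mz_ref = np.asarray(mz_ref, dtype=float)
    return (mz_test - mz_ref) / mz_ref * 1.0e6


def median_and_iqr(values):
    """Median and interquartile range (75th minus 25th percentile)."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return median, q3 - q1


# Illustrative values only; not data from the study.
example_ppm = ppm_difference([100.0001, 250.0], [100.0, 250.0])
example_median, example_iqr = median_and_iqr([1.0, 2.0, 3.0, 4.0, 5.0])
```

`np.percentile` uses linear interpolation between order statistics by default, which is one of several common quartile definitions.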