Saer Samanipour¹,²,³, Phil Choi²,⁴, Jake W O'Brien², Bob W J Pirok¹, Malcolm J Reid³, Kevin V Thomas².
Abstract
Centroiding is one of the main approaches used to reduce the size of the data generated by high-resolution mass spectrometry. During centroiding, performed either during acquisition or as a pre-processing step, each mass peak profile is represented by a single value (i.e., the centroid). Although effective in reducing data size, centroiding also reduces the information density present in the mass peak profile. Moreover, the individual steps of the centroiding process and their consequences for the final results may not be completely clear. Here, we present Cent2Prof, a package containing two algorithms that enable the conversion of centroided data to mass peak profile data and vice versa. The centroiding algorithm uses the resolution-based mass peak width parameter as a first guess and self-adjusts to fit the data. In addition to the m/z values, the centroiding algorithm also generates the measured mass peak widths at half height, which can be used during feature detection and identification. The mass peak profile prediction algorithm employs a random-forest model to predict mass peak widths, which are subsequently used for mass profile reconstruction. The centroiding results were compared with the outputs of the centroiding algorithm implemented in MZmine. Our algorithm resulted in false detection rates of ≤5%, whereas the MZmine algorithm resulted in a 30% false-positive rate and a 3% false-negative rate. The error in profile prediction was ≤56%, independent of the mass, ionization mode, and intensity, which was 6 times more accurate than the resolution-based estimates.
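The abstract describes the centroiding step only in outline: a resolution-based peak width as the first guess, self-adjustment to the data, and the measured width at half height as an extra output. The Python sketch below illustrates that idea for a single isolated, well-sampled peak. It is not the Cent2Prof implementation; the window-doubling rule, the linear interpolation of the half-height points, the intensity-weighted centroid, the default resolution of 20,000, and all function names (`resolution_fwhm`, `centroid_peak`, `_half_height_crossing`) are illustrative assumptions.

```python
import math
from typing import Optional, Sequence, Tuple


def resolution_fwhm(mz: float, resolution: float) -> float:
    """Resolution-based estimate of the peak width at half height (Da): FWHM = (m/z) / R."""
    return mz / resolution


def _half_height_crossing(mz: Sequence[float], inten: Sequence[float], apex: int,
                          step: int, half: float, lo: int, hi: int) -> Optional[float]:
    """Walk away from the apex in direction `step` (-1 or +1) and return the linearly
    interpolated m/z where the signal falls below `half`, or None if the walk
    reaches the window edge [lo, hi] before that happens."""
    i = apex
    while lo <= i + step <= hi:
        j = i + step
        if inten[j] < half:
            # inten[i] >= half > inten[j], so the denominator is never zero
            frac = (inten[i] - half) / (inten[i] - inten[j])
            return mz[i] + frac * (mz[j] - mz[i])
        i = j
    return None


def centroid_peak(mz: Sequence[float], inten: Sequence[float],
                  resolution: float = 20000.0,
                  max_iter: int = 10) -> Tuple[float, float, float]:
    """Centroid one isolated profile peak (m/z sorted ascending).

    Returns (centroid m/z, apex intensity, measured FWHM in Da)."""
    apex = max(range(len(inten)), key=inten.__getitem__)
    half = inten[apex] / 2.0
    width = resolution_fwhm(mz[apex], resolution)  # first guess from the resolution
    for _ in range(max_iter):
        # search window: all points within +/- width of the apex m/z
        idx = [k for k in range(len(mz)) if abs(mz[k] - mz[apex]) <= width]
        lo, hi = min(idx), max(idx)
        left = _half_height_crossing(mz, inten, apex, -1, half, lo, hi)
        right = _half_height_crossing(mz, inten, apex, +1, half, lo, hi)
        if left is not None and right is not None:
            break
        width *= 2.0  # peak is wider than the guess: widen the window and retry
    else:
        raise ValueError("half-height points not bracketed; is the peak truncated?")
    # centroid = intensity-weighted mean m/z of the points between the crossings
    core = [k for k in range(len(mz)) if left <= mz[k] <= right]
    total = math.fsum(inten[k] for k in core)
    centroid = math.fsum(mz[k] * inten[k] for k in core) / total
    return centroid, inten[apex], right - left
```

For example, a Gaussian peak with σ = 2 mDa sampled every 0.5 mDa yields a measured FWHM close to the theoretical 2√(2 ln 2)·σ ≈ 4.71 mDa. This holds whether the starting guess comes from R = 20,000 (window already wide enough) or R = 200,000 (window doubled until both half-height points are bracketed).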
Year: 2021 PMID: 34843646 PMCID: PMC8674881 DOI: 10.1021/acs.analchem.1c03755
Source DB: PubMed Journal: Anal Chem ISSN: 0003-2700 Impact factor: 6.986
Figure 1. All the steps taken from the raw data to profile prediction.
Figure 2. Distribution of 100,000 randomly selected measured mass peak widths (mDa) as a function of (a) relative intensity (%), (b) m/z value (Da), and (c) retention factor (%). Red points were measured in negative mode (i.e., ESI−) and blue points in positive mode (i.e., ESI+).
Figure 3. (a) Measured mass peak widths (mDa) of 10,000 randomly selected test-set peaks vs the predicted mass peak widths and (b) the distribution of the prediction errors (mDa). Red points were measured in negative mode (i.e., ESI−) and blue points in positive mode (i.e., ESI+).
Figure 4. Examples of (a) the predicted mass profile of an m/z value at scan 1400, based on the centroided data, and (b) the measured and predicted TICs of a wastewater influent sample. These plots showcase the ability of the developed algorithms to predict the mass profiles of centroided data using relative intensity, m/z value, and retention factor.
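In the abstract, profile prediction pairs a random-forest estimate of each peak's width with a reconstruction of its profile, and Figure 4 compares measured and predicted TICs. A minimal Python sketch of the reconstruction side follows, assuming a Gaussian peak shape, so that σ = FWHM / (2√(2 ln 2)). The predicted FWHM is passed in as a plain number standing in for the random-forest output. The helper names (`gaussian_profile`, `reconstruct_scan`, `tic`) and the sampling parameters `points_per_peak` and `span` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Gaussian assumption: sigma = FWHM / (2 * sqrt(2 * ln 2))
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_profile(centroid: float, height: float, fwhm: float,
                     grid: Sequence[float]) -> List[float]:
    """Rebuild a profile peak on `grid`: apex `height` at `centroid`, width `fwhm` at half height."""
    sigma = fwhm * FWHM_TO_SIGMA
    return [height * math.exp(-0.5 * ((x - centroid) / sigma) ** 2) for x in grid]


def reconstruct_scan(centroids: Sequence[Tuple[float, float, float]],
                     points_per_peak: int = 21,
                     span: float = 3.0) -> List[Tuple[float, float]]:
    """centroids: (m/z, apex intensity, predicted FWHM) triples for one scan.

    Each peak is sampled on an evenly spaced grid over +/- span * FWHM around its
    centroid; returns all (m/z, intensity) points of the scan sorted by m/z."""
    profile: List[Tuple[float, float]] = []
    for mz, height, fwhm in centroids:
        step = 2.0 * span * fwhm / (points_per_peak - 1)
        grid = [mz - span * fwhm + i * step for i in range(points_per_peak)]
        profile.extend(zip(grid, gaussian_profile(mz, height, fwhm, grid)))
    profile.sort()
    return profile


def tic(scans: Sequence[Sequence[Tuple[float, float]]]) -> List[float]:
    """Total ion current: summed intensity of each scan's (m/z, intensity) points."""
    return [math.fsum(i for _, i in scan) for scan in scans]
```

Because the summed intensity depends on how densely each peak is sampled, `points_per_peak` would need to match the instrument's profile sampling for a reconstructed TIC to be comparable with a measured one. Overlapping peaks are also only interleaved here, not summed on a common m/z grid.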