Mingshu Cao¹, Karl Fraser¹, Jan Huege¹, Tom Featonby¹, Susanne Rasmussen², Chris Jones¹.
Abstract
Liquid chromatography coupled to mass spectrometry (LCMS) is widely used in metabolomics because of its sensitivity, reproducibility, speed and versatility. Metabolites are detected as peaks characterised by mass-to-charge ratio (m/z) and retention time (rt), and one of the most critical, but also most challenging, tasks in metabolomics is to annotate the large number of peaks detected in biological samples. Accurate m/z measurements enable the prediction of molecular formulae, which provide clues to the chemical identity of peaks, but several metabolites often share the same molecular formula. Chromatographic behaviour, which reflects the physicochemical properties of metabolites, should also provide structural information. However, rt varies between analytical runs and the factors underlying the observed time shifts are complex, which makes using rt for peak annotation a non-trivial task. To address this, we carried out Quantitative Structure-Retention Relationship (QSRR) modelling between calculated molecular descriptors (MDs) and the experimental rts of 93 authentic compounds analysed by hydrophilic interaction liquid chromatography (HILIC) coupled to high-resolution MS. A predictive QSRR model based on the Random Forest (RF) algorithm outperformed a Multiple Linear Regression (MLR) model and achieved a high correlation between predicted and experimental rts (Pearson's correlation coefficient = 0.97), with mean and median absolute errors of 0.52 min and 0.34 min (corresponding to 5.1 % and 3.2 % error), respectively. We demonstrate that rt prediction at this precision enables the systematic use of rts for annotating unknown peaks detected in a metabolomics study.
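The accuracy figures quoted above (Pearson's correlation coefficient, mean and median absolute error) can be computed from paired predicted and experimental rts as in the minimal sketch below. The rt values are invented placeholders, not data from the study, and the sketch omits the percent-error figures because the abstract does not state what they are relative to.

```python
import math
from statistics import mean, median


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rt_error_summary(rt_pred, rt_ref):
    """Return (Pearson r, mean absolute error, median absolute error), errors in minutes."""
    abs_err = [abs(p - r) for p, r in zip(rt_pred, rt_ref)]
    return pearson_r(rt_pred, rt_ref), mean(abs_err), median(abs_err)


# Illustrative values only, not data from the study.
rt_ref = [6.1, 7.4, 9.0, 11.2, 13.5]
rt_pred = [6.4, 7.1, 9.3, 10.8, 13.9]
r, mae, medae = rt_error_summary(rt_pred, rt_ref)
print(f"r = {r:.3f}, MAE = {mae:.2f} min, MedAE = {medae:.2f} min")
```

Reporting the median alongside the mean, as the abstract does, shows whether a few poorly predicted compounds dominate the average error.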
Applying the QSRR model with the outlined strategy improved the peak annotation process by reducing the number of false positives returned by database queries that match on accurate mass alone, and by enriching the reference library. The predicted rts were validated using either authentic compounds or ion fragmentation patterns.
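Reducing false positives in this way amounts to discarding accurate-mass hits whose predicted rt is far from the observed rt of the peak. The following is a minimal sketch assuming a symmetric tolerance window in minutes. The `Candidate` type, the `filter_by_rt` function, the tolerance value and the compound names are hypothetical; the abstract does not specify a window. Tying the tolerance to the model's reported error (for example a small multiple of the 0.52 min mean absolute error) is one plausible choice, but that is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str       # database hit matched on accurate mass alone
    rt_pred: float  # QSRR-predicted retention time (min)


def filter_by_rt(candidates, rt_obs, tol_min):
    """Keep hits whose predicted rt is within tol_min of the observed rt,
    ordered by absolute deviation (closest first)."""
    kept = [c for c in candidates if abs(c.rt_pred - rt_obs) <= tol_min]
    return sorted(kept, key=lambda c: abs(c.rt_pred - rt_obs))


# Hypothetical hits for one peak observed at 9.2 min; names and rts are placeholders.
hits = [Candidate("isomer_A", 8.9), Candidate("isomer_B", 12.7), Candidate("isomer_C", 9.6)]
for c in filter_by_rt(hits, rt_obs=9.2, tol_min=1.0):
    print(c.name, c.rt_pred)
```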
Keywords: LCMS; Lolium perenne; Metabolite identification; Metabolomics; Peak annotation; QSRR
Year: 2014 PMID: 25972771 PMCID: PMC4419193 DOI: 10.1007/s11306-014-0727-x
Source DB: PubMed Journal: Metabolomics ISSN: 1573-3882 Impact factor: 4.290
Fig. 1 (a) An overall negative correlation was observed between the experimental retention time of the reference compounds (rtRef) and XLogP (CDK-based calculation) for the 116 reference compounds used to construct the HILIC-based LCMS library; (b) compounds with rt < 5 min and duplicated stereoisomers were excluded, leaving 93 compounds for the modelling process. A significant negative correlation between rtRef and XLogP was found (r = −0.69, p < 2.0 × 10⁻¹⁴).
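The curation in panel (b) can be expressed as a simple filter. This sketch assumes that each reference compound carries an InChIKey. It treats compounds that share the first (connectivity-only) block of the InChIKey as stereoisomers and keeps the first one encountered. The caption does not say how duplicated stereoisomers were identified or which one was retained, so this rule, the `curate_reference_set` name and the placeholder keys are illustrative only.

```python
def curate_reference_set(records, min_rt=5.0):
    """Keep compounds with rt >= min_rt and one compound per stereoisomer group.

    records: iterable of (name, inchikey, rt_min). Stereoisomers are grouped by
    the first InChIKey block (connectivity only); this grouping rule is an
    assumption, not the authors' documented procedure.
    """
    seen = set()
    kept = []
    for name, inchikey, rt in records:
        if rt < min_rt:
            continue  # early eluters are excluded, as in panel (b)
        skeleton = inchikey.split("-")[0]
        if skeleton in seen:
            continue  # stereoisomer of a compound already kept
        seen.add(skeleton)
        kept.append((name, inchikey, rt))
    return kept


# Placeholder records; the keys are fake and only their first block matters here.
records = [
    ("cmpd_1", "AAAAAAAAAAAAAA-BBBBBBBBSA-N", 7.2),
    ("cmpd_1_enantiomer", "AAAAAAAAAAAAAA-CCCCCCCCSA-N", 7.3),
    ("cmpd_2", "DDDDDDDDDDDDDD-BBBBBBBBSA-N", 3.9),
    ("cmpd_3", "EEEEEEEEEEEEEE-BBBBBBBBSA-N", 11.5),
]
print([name for name, _, _ in curate_reference_set(records)])
```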
Fig. 2 Correlation between the predicted retention time (rtPred, min) and the experimental retention time (rtRef, min) for the 93 reference compounds under the established models: (a) Multiple Linear Regression (MLR) model (r = 0.85) and (b) Random Forest (RF) model (r = 0.97)
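The MLR model in panel (a) relates rt linearly to a set of molecular descriptors. The dependency-free sketch below fits such a model by ordinary least squares, solving the normal equations with Gaussian elimination. The function names and the toy descriptor matrix are hypothetical, and the study's descriptor set and fitting software are not given in this record. The RF model in panel (b) is not reproduced here. In practice it would normally come from an existing machine-learning library rather than hand-written code.

```python
def fit_mlr(X, y):
    """Ordinary least squares: return [intercept, b1, ..., bp] solving
    (A^T A) beta = A^T y, where A is X with a leading column of ones."""
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Build the normal equations.
    ata = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    aty = [sum(a[i] * t for a, t in zip(A, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    M = [ata[i] + [aty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("descriptor matrix is singular or collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        beta[i] = (M[i][p] - sum(M[i][j] * beta[j] for j in range(i + 1, p))) / M[i][i]
    return beta


def predict_mlr(beta, X):
    """Predicted rt for each descriptor row."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]


# Toy data generated from rt = 2 + 3*d1 - 1*d2 (invented, not study data).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [2, 5, 1, 4, 5]
beta = fit_mlr(X, y)
print([round(b, 6) for b in beta])
```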
Fig. 3 The smoothed extracted ion chromatograms (XICs) of m/z 166.0532 ± 20 ppm from the eight samples. (a) Boxplot based on the normalised peak heights from wavelet-based peak detection; (b) histogram of the predicted retention times (pRT) of 216 PubChem compounds sharing the molecular formula C5H11NO3S
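A short calculation checks the ± 20 ppm extraction window and the link between m/z 166.0532 and C5H11NO3S. The sketch assumes that the peak is the singly protonated ion [M+H]+, since the caption does not state the ionisation mode. It uses standard monoisotopic atomic masses, and the helper names are hypothetical.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the proton mass.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "S": 31.9720711744,
}
PROTON = 1.007276466621


def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds of a +/- ppm extraction window."""
    delta = mz * ppm * 1e-6
    return mz - delta, mz + delta


def mz_protonated(counts):
    """m/z of [M+H]+ for an element-count dict (assumes one added proton, charge +1)."""
    neutral = sum(MONOISOTOPIC[el] * n for el, n in counts.items())
    return neutral + PROTON


low, high = ppm_window(166.0532, 20)
mz = mz_protonated({"C": 5, "H": 11, "N": 1, "O": 3, "S": 1})
ppm_error = (mz - 166.0532) / 166.0532 * 1e6
print(f"window {low:.4f}-{high:.4f}; C5H11NO3S [M+H]+ = {mz:.4f} ({ppm_error:+.2f} ppm)")
```

All 216 candidates in panel (b) share this formula and therefore the same m/z. Accurate mass alone cannot separate them, which is why the predicted rt distribution is informative.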
Fig. 4 Diagram of the modelling process (letters a, b) and of the application of the established model for peak annotation (numbers 1–6). (a) Build a QSRR model based on the experimental retention times (rt) of known compounds (a reference library); (b) update the model by incorporating newly verified or putatively identified compounds, so that the model can be iteratively improved. (1) Search databases with the measured accurate mass; (2) integrate and refine the query results from various resources and compute the structural representation (SMILES) of the query list; (3) compute molecular descriptors and predict rt using the model; (4) annotate peaks with the predicted rt and its prediction accuracy; (5) verify the predicted rt with other evidence; (6) when the accurate-mass database search returns no hits, propose hypothetical compounds that may occur in the biological samples and sketch their structures in a molecular editor to generate the structural representation
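Steps 3 and 4 and the update loop (b) can be illustrated with a deliberately simple stand-in model. This sketch assumes that descriptors have already been computed for each candidate, so steps 1 and 2 and the descriptor calculation are not shown. In place of the multi-descriptor RF model, it uses a single-descriptor linear fit of rt against an XLogP-like value, motivated by the correlation in Fig. 1. All function names, descriptor values and the error estimate are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of rt = a + b * x (hypothetical stand-in for the RF QSRR model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def annotate(hits, model, rt_err):
    """Steps 3-4: predict rt for each (name, descriptor) hit and attach the expected error."""
    a, b = model
    return [(name, a + b * d, rt_err) for name, d in hits]


def refit_with_verified(ref_x, ref_rt, verified):
    """Step (b): add newly verified (descriptor, observed rt) pairs to the library and refit."""
    xs = ref_x + [d for d, _ in verified]
    ys = ref_rt + [rt for _, rt in verified]
    return fit_line(xs, ys), xs, ys


# Toy reference library: XLogP-like descriptor and rt (min); values are invented.
ref_x, ref_rt = [-3.0, -2.0, -1.0, 0.0], [14.0, 12.0, 10.0, 8.0]
model = fit_line(ref_x, ref_rt)
for name, rt_pred, err in annotate([("cand_1", -2.5), ("cand_2", 0.5)], model, rt_err=0.5):
    print(f"{name}: predicted rt {rt_pred:.1f} ± {err} min")
model, ref_x, ref_rt = refit_with_verified(ref_x, ref_rt, [(-4.0, 16.0)])
```

Because the library grows with each verified compound, the refit in step (b) gives later annotations the benefit of earlier verifications. This mirrors the iterative improvement described in the caption.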