Daniel Cañueto (1), Reza M Salek (2), Mònica Bulló (3, 4, 5), Xavier Correig (1, 4, 6), Nicolau Cañellas (1, 4, 6).
Abstract
The quality of automatic metabolite profiling in NMR datasets from complex matrices can be affected by numerous sources of variability. These sources, together with the presence of multiple low-intensity signals, introduce uncertainty into the metabolite signal parameters. Lineshape fitting approaches often reach suboptimal solutions when adapting these signals to a complex spectrum lineshape. As a result, software tools for automatic profiling tend to be restricted to specific biological matrices and/or sample preparation protocols in order to obtain reliable results. However, the signal parameters collected during an initial profiling iteration can be analysed and modelled to reduce this uncertainty, generating narrow and accurate predictions of the expected signal parameters. In this study, we show that these predictions yield better profiling quality indicators and maximize the performance of automatic profiling. Our proposed workflow learns and models the sample properties; therefore, the restrictions imposed by the biological matrix or sample preparation protocol, as well as the limitations of lineshape fitting approaches, can be overcome.
Keywords: NMR; automatic profiling; machine learning
Year: 2022 PMID: 35448470 PMCID: PMC9027668 DOI: 10.3390/metabo12040283
Source DB: PubMed Journal: Metabolites ISSN: 2218-1989
Figure 1. A difficult signal fitting: the chemical shift variability present in this signal (a) forces lineshape fitting algorithms to consider a wide range (light grey rectangle) of possible chemical shift values during the fitting (b).
Figure 2. (a) In this example, the signal parameter prediction pipeline was used to optimize the chemical shift value of a signal (table in the upper right). To improve the chemical shift estimates for this signal, a training dataset was built excluding the signal to predict (a,1). The dataset was then cleaned, filtered, and enriched to maximize its prediction quality (a,2). The information from the first iteration was used to train a prediction model; during training, bootstrap resampling was used to avoid overfitting to inaccurate values (a,3). For each predicted chemical shift, the distribution of the predictions made across the bootstrap iterations was built, and its median and 95% prediction intervals (PIs) were output (a,4). After optimization, the predicted value and PIs are shown in the table at the bottom right; in this case, an inaccurate chemical shift (shaded in red) fell clearly outside the 95% PIs (shaded in green). (b) Distributions of the generated chemical shift predictions. These distributions were very narrow and could help generate chemical shift ranges (dark grey rectangle) even narrower than those originally needed without machine learning prediction (light grey rectangle) (c).
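The bootstrap step in (a,3) and (a,4) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the caption does not name the prediction model, so ordinary least squares stands in for it; the features are assumed to be the chemical shifts of other signals in the same spectrum; and the helper name `bootstrap_prediction_interval` and all data below are hypothetical and synthetic. As in the caption, the interval is taken from the percentiles of the bootstrap prediction distribution, so it reflects model-fit uncertainty only.

```python
import numpy as np

def bootstrap_prediction_interval(X_train, y_train, x_new, n_boot=500, level=0.95, seed=0):
    """Median and central `level` interval of bootstrap predictions for x_new.

    Hypothetical helper: ordinary least squares stands in for the
    (unspecified) prediction model of the original pipeline.
    """
    rng = np.random.default_rng(seed)
    n = len(y_train)
    A = np.column_stack([np.ones(n), X_train])             # design matrix with intercept
    a_new = np.concatenate([[1.0], np.atleast_1d(x_new)])  # query row with intercept
    preds = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                   # resample spectra with replacement
        coef, *_ = np.linalg.lstsq(A[idx], y_train[idx], rcond=None)
        preds[b] = a_new @ coef                            # prediction from this bootstrap model
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(preds, [alpha, 1.0 - alpha])
    return float(np.median(preds)), float(lo), float(hi)

# Synthetic example: shift of the target signal (ppm) modelled from two other signals' shifts
rng = np.random.default_rng(1)
X = rng.normal(7.0, 0.01, size=(40, 2))
y = 1.2 + 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0.0, 2e-4, 40)

x_new = np.array([7.005, 6.995])   # other signals' shifts in the spectrum being checked
observed = 6.810                   # hypothetical first-pass fitted shift
med, lo, hi = bootstrap_prediction_interval(X, y, x_new)
print(f"median {med:.4f} ppm, 95% interval [{lo:.4f}, {hi:.4f}]")
print("flag as inaccurate:", not (lo <= observed <= hi))
```

A first-pass shift that falls outside `[lo, hi]` is flagged, mirroring the red-versus-green case in the table; the interval could then serve as a narrowed search range for refitting.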
Figure 3. The spectrum-specific 95% PIs of the parameter values are much narrower than the spectrum-unspecific 95% PIs. Chemical shift PIs are generally narrower than the bucketing applied (6 × 10⁻⁴ ppm). These narrow PIs help error minimization algorithms converge to the right local minimum.
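The effect described in the last sentence can be illustrated with a toy fit. This is a hypothetical sketch, not the authors' error minimization algorithm: a fixed-width Lorentzian is grid-searched over the chemical shift, with its amplitude solved by linear least squares at each candidate, on a synthetic spectrum in which the target signal at 3.000 ppm overlaps a three-times-stronger neighbour at 3.020 ppm. The window values are illustrative assumptions: the wide one stands in for a spectrum-unspecific range, and the narrow one (6 × 10⁻⁴ ppm wide) for a spectrum-specific 95% PI.

```python
import numpy as np

def lorentzian(ppm, center, half_width):
    """Unit-height Lorentzian lineshape."""
    return 1.0 / (1.0 + ((ppm - center) / half_width) ** 2)

def fit_shift_in_window(ppm, spectrum, window, half_width, n_grid=401):
    """Hypothetical helper: grid-search the chemical shift inside `window`,
    solving the amplitude by least squares at each candidate centre."""
    lo, hi = window
    best_center, best_sse = None, np.inf
    for center in np.linspace(lo, hi, n_grid):
        shape = lorentzian(ppm, center, half_width)
        amp = (shape @ spectrum) / (shape @ shape)      # closed-form LS amplitude
        sse = np.sum((spectrum - amp * shape) ** 2)     # residual sum of squares
        if sse < best_sse:
            best_center, best_sse = center, sse
    return float(best_center)

# Synthetic spectrum: target at 3.000 ppm plus a 3x stronger neighbour at 3.020 ppm
ppm = np.linspace(2.90, 3.12, 2201)
hw = 0.002
spectrum = lorentzian(ppm, 3.000, hw) + 3.0 * lorentzian(ppm, 3.020, hw)

wide = fit_shift_in_window(ppm, spectrum, (2.99, 3.03), hw)        # unspecific range
narrow = fit_shift_in_window(ppm, spectrum, (2.9997, 3.0003), hw)  # PI-like range
print(f"wide window:   {wide:.4f} ppm")
print(f"narrow window: {narrow:.4f} ppm")
```

With the wide window, the single-signal fit locks onto the stronger neighbour (about 3.020 ppm); the narrow window keeps it on the target signal (about 3.000 ppm), which is the sense in which narrow PIs steer minimization towards the right local minimum.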