Justin G Chitpin, Anuradha Surendra, Thao T Nguyen, Graeme P Taylor, Hongbin Xu, Irina Alecu, Roberto Ortega, Julianna J Tomlinson, Angela M Crawley, Michaeline McGuinty, Michael G Schlossmacher, Rachel Saunders-Pullman, Miroslava Cuperlovic-Culf, Steffany A L Bennett, Theodore J Perkins.
Abstract
MOTIVATION: Bioinformatic tools capable of annotating, rapidly and reproducibly, large, targeted lipidomic datasets are limited. Specifically, few programs enable high-throughput peak assessment of liquid chromatography-electrospray ionization tandem mass spectrometry (LC-ESI-MS/MS) data acquired in either selected or multiple reaction monitoring (SRM and MRM) modes.
Year: 2021 PMID: 34951624 PMCID: PMC8896618 DOI: 10.1093/bioinformatics/btab854
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Common challenges associated with SRM, MRM and PRM peak identification. (a) Ambiguity occurs when multiple lipid isomers, isobars, and isotopes are detected within the same matrix at a given transition, yet technical variations in flow rate, composition of the mobile phase, temperature, pH, etc., cause their retention times to vary across samples. Data represent XICs of the same matrix (murine plasma) in animals fed different diets. Note that six peaks are observed in one sample at a given transition, whereas seven peaks are observed in a different sample, shifted by 1 min. Matching by retention time would not align these shifted species. (b) Assigning lipid identities based on peak elution order (picking the nth eluting peak) will also lead to misidentifications when comparing lipid species across matrices. Data represent XICs of plasma and brain (temporal cortex) lipidomes from the same animal. Note both the retention time shift and the fundamentally different number of species within each matrix. Matching by either retention time or peak elution order would confound identification. (c) Matching lipids based on peak intensity features is complicated by pathological changes detected in lipid metabolism. Data represent XICs of the human plasma lipidome of patients with different neurodegenerative diseases. Note the marked change in abundances between conditions, which affects lipid identification. While algorithms exist to address each of these challenges, few are applicable to datasets wherein all differences manifest simultaneously. BATL addresses these challenges.
Fig. 2. Schematic of the BATL lipid identification workflow. BATL follows three steps: (i) users are asked to identify training datasets for which they have unambiguous knowledge of peak identities. (ii) These datasets are used to train BATL, constructing a naïve Bayes statistical model based on the peak features users select. (iii) The model and associated metadata are used by the BATL algorithm to annotate peaks in subsequent query SRM, MRM or PRM datasets.
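The three workflow steps can be illustrated with a minimal sketch of a Gaussian naïve Bayes classifier. This is not the BATL implementation: the function names (`train`, `log_posterior`, `annotate`), the lipid labels, the retention times, and the use of class frequencies as priors are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def train(labeled_peaks):
    """Step (ii): fit one Gaussian per lipid and feature.

    labeled_peaks: list of (lipid_name, {feature_name: value}) pairs taken
    from training data with known peak identities (step i).
    Assumes at least two peaks per lipid and non-zero spread per feature.
    """
    by_lipid = defaultdict(list)
    for lipid, feats in labeled_peaks:
        by_lipid[lipid].append(feats)
    total = len(labeled_peaks)
    model = {}
    for lipid, rows in by_lipid.items():
        params = {f: (mean(r[f] for r in rows), stdev(r[f] for r in rows))
                  for f in rows[0]}
        # Prior = class frequency in the training set (an assumption here).
        model[lipid] = {"prior": len(rows) / total, "params": params}
    return model


def log_posterior(model, lipid, feats):
    """Unnormalized log posterior: log prior + sum of Gaussian log-likelihoods."""
    entry = model[lipid]
    lp = math.log(entry["prior"])
    for f, x in feats.items():
        mu, sd = entry["params"][f]
        lp += -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)
    return lp


def annotate(model, feats):
    """Step (iii): label one query peak with the most probable lipid."""
    return max(model, key=lambda lipid: log_posterior(model, lipid, feats))


# Hypothetical training peaks described by retention time (minutes) only.
training = [
    ("LipidA", {"RT": 5.0}), ("LipidA", {"RT": 5.2}), ("LipidA", {"RT": 5.1}),
    ("LipidB", {"RT": 6.0}), ("LipidB", {"RT": 6.2}), ("LipidB", {"RT": 6.1}),
]
model = train(training)
print(annotate(model, {"RT": 5.15}))
```

Labelling each query peak independently, as `annotate` does here, corresponds to a plain maximum a posteriori choice; it does not prevent two peaks in one sample from receiving the same lipid label.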
Table 1. Specified SRM peak features for naïve Bayes model
| Feature | Description |
|---|---|
| Retention time (RT) | Peak retention time |
| Relative RT (RRT) | Peak retention time divided by internal standard retention time |
| Subtracted RT (SRT) | Peak retention time minus internal standard retention time |
| Relative area (A) | Peak area divided by internal standard area |
| Relative height (H) | Peak height divided by internal standard height |
| Full width at half max (FWHM) | Peak width at half maximum height |
| Asymmetry factor (AF) | Quotient between centerline to back slope and centerline to front slope at 10% max peak height |
| Tailing factor (TF) | Distance between the front and back slope of a peak divided by twice the distance between the centerline and front slope at 5% max peak height |
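The ratio and difference features in the table above can be written out as a short sketch. The `Peak` class, the function names, and the numeric values are hypothetical; the front, centerline, and back positions are assumed to be time coordinates already measured at the stated fraction of peak height (10% for AF, 5% for TF).

```python
from dataclasses import dataclass


@dataclass
class Peak:
    rt: float      # retention time (minutes)
    area: float    # integrated peak area
    height: float  # peak height


def relative_features(peak, istd):
    """RT, RRT, SRT, A and H for a peak, relative to an internal standard peak."""
    return {
        "RT": peak.rt,
        "RRT": peak.rt / istd.rt,
        "SRT": peak.rt - istd.rt,
        "A": peak.area / istd.area,
        "H": peak.height / istd.height,
    }


def asymmetry_factor(front, center, back):
    """AF: (centerline to back slope) / (centerline to front slope), at 10% height."""
    return (back - center) / (center - front)


def tailing_factor(front, center, back):
    """TF: (front to back slope) / (2 x centerline to front slope), at 5% height."""
    return (back - front) / (2 * (center - front))


# Hypothetical analyte and internal standard peaks.
analyte = Peak(rt=6.0, area=200.0, height=50.0)
standard = Peak(rt=4.0, area=100.0, height=25.0)
features = relative_features(analyte, standard)
print(features)
print(asymmetry_factor(5.8, 6.0, 6.3), tailing_factor(5.8, 6.0, 6.3))
```

A perfectly symmetric peak gives AF = 1 and TF = 1 under these definitions; values above 1 indicate a tailing peak.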
Fig. 3. Classifier performance on 10-fold cross validation sphingolipid and glycerophosphocholine datasets. The 95% confidence intervals are shown in panels (a), (b), (d) and (e). In (a) and (b), data represent mean accuracies of BATL models trained on retention time with each decision rule and retention time mean/window matching algorithms for (a) sphingolipids or (b) glycerophosphocholines (***Q < 0.001, t-test adjusted with the Benjamini–Hochberg method of all models against the MWBM decision rule). (c) Lipid assignment differences between MAP, constrained MAP, and MWBM decision rules during cross validation and trained using retention time. In the top panel, data represent the Gaussian likelihoods of five glycerophosphocholine isomers based on the retention time feature. The rows of gray dots indicate the retention times of four peaks from the same sample in the validation set. Each row indicates the outcome of the three decision rules. Arrows indicate the lipid assignments; checkmarks indicate correct assignments; and Xs indicate incorrect assignments. The numbers for constrained MAP indicate the order of peak assignments. In (d) and (e), data represent mean accuracies of the BATL models using MWBM decision rule trained on several features and feature combinations for (d) sphingolipids or (e) glycerophosphocholines. The feature name codes are described in Table 1.
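The contrast between the three decision rules in panel (c) can be sketched on a toy example. The rule definitions below are assumptions inferred from the caption rather than the BATL source: MAP labels each peak independently, constrained MAP assigns peak-lipid pairs greedily in descending score order with each lipid used at most once, and MWBM picks the one-to-one assignment with the highest total log-likelihood. The lipid names, Gaussian parameters, and retention times are hypothetical, and the brute-force search stands in for a proper bipartite matching solver.

```python
import math
from itertools import permutations


def log_gauss(x, mu, sd):
    """Log density of a Gaussian with mean mu and standard deviation sd."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def score_matrix(peak_rts, lipids):
    """scores[i][j] = log-likelihood of peak i under lipid j (retention time only)."""
    names = list(lipids)
    return names, [[log_gauss(rt, *lipids[n]) for n in names] for rt in peak_rts]


def map_rule(scores):
    """MAP: each peak independently takes its best lipid (duplicates allowed)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def constrained_map(scores):
    """Greedy: accept the best remaining (peak, lipid) pair, each used once.

    Assumes there are at least as many lipids as peaks.
    """
    pairs = sorted(((s, i, j) for i, row in enumerate(scores)
                    for j, s in enumerate(row)), reverse=True)
    assigned, used = {}, set()
    for _, i, j in pairs:
        if i not in assigned and j not in used:
            assigned[i] = j
            used.add(j)
    return [assigned[i] for i in range(len(scores))]


def mwbm(scores):
    """Brute-force maximum weighted matching; only practical for a few peaks.

    Assumes there are at least as many lipids as peaks.
    """
    n_peaks, n_lipids = len(scores), len(scores[0])
    best = max(permutations(range(n_lipids), n_peaks),
               key=lambda p: sum(scores[i][j] for i, j in enumerate(p)))
    return list(best)


# Two hypothetical isomers; the query sample elutes about 0.4 min early,
# so the true labels of the two peaks are LipidA then LipidB.
lipids = {"LipidA": (5.0, 0.2), "LipidB": (5.5, 0.2)}
names, scores = score_matrix([4.6, 5.15], lipids)
for rule in (map_rule, constrained_map, mwbm):
    print(rule.__name__, [names[j] for j in rule(scores)])
```

In this toy case MAP labels both peaks LipidA, the greedy rule locks in the single best pair and is then forced into a swapped assignment, and only the matching rule recovers the elution order. For realistic numbers of peaks, a polynomial-time assignment solver such as `scipy.optimize.linear_sum_assignment` would replace the permutation search.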