Gonçalo Graça1, Yuheng Cai1, Chung-Ho E Lau2, Panagiotis A Vorkas3,4, Matthew R Lewis5, Elizabeth J Want3, David Herrington6, Timothy M D Ebbels1.
Abstract
Untargeted metabolomics and lipidomics LC-MS experiments produce complex datasets, usually containing tens of thousands of features from thousands of metabolites whose annotation requires additional MS/MS experiments and expert knowledge. All-ion fragmentation (AIF) LC-MS/MS acquisition provides fragmentation data at no additional cost in experimental time. However, analysis of such datasets requires reconstruction of parent-fragment relationships and annotation of the resulting pseudo-MS/MS spectra. Here, we propose a novel approach for automated annotation of isotopologues, adducts, and in-source fragments from AIF LC-MS datasets by combining correlation-based parent-fragment linking with molecular fragment matching. Our workflow focuses on a subset of features rather than trying to annotate the full dataset, saving time and simplifying the process. We demonstrate the workflow on three human serum datasets containing 599 features manually annotated by experts. Precision and recall values of 82–92% and 82–85%, respectively, were obtained when an annotation was counted as correct if it appeared among the five highest-ranked candidates (ranks 1–5). These results equal or outperform those obtained using MS-DIAL software, the current state of the art for AIF data annotation. Further validation on other biological matrices and instrument types showed variable precision (60–89%) and recall (10–88%), particularly for datasets dominated by nonlipid metabolites. The workflow is freely available as an open-source R package, MetaboAnnotatoR, together with the fragment libraries, from GitHub (https://github.com/gggraca/MetaboAnnotatoR).
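The correlation-based parent-fragment linking mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes that the extracted ion chromatograms (EICs) of the parent feature and of candidate ions from the high-energy AIF channel have already been resampled onto a common retention-time grid. The function name `build_pseudo_msms`, the use of Pearson correlation, and the 0.8 cut-off are illustrative assumptions, not MetaboAnnotatoR's API or defaults (the package itself is written in R).

```python
import numpy as np


def build_pseudo_msms(parent_eic, candidate_eics, candidate_mz, min_corr=0.8):
    """Hypothetical sketch: assemble a pseudo-MS/MS spectrum for one parent feature.

    parent_eic     -- intensities of the parent ion on a shared retention-time grid
    candidate_eics -- one intensity trace per candidate ion (AIF channel), same grid
    candidate_mz   -- m/z of each candidate ion
    min_corr       -- assumed Pearson correlation cut-off for linking
    """
    parent = np.asarray(parent_eic, dtype=float)
    spectrum = []
    for eic, mz in zip(candidate_eics, candidate_mz):
        eic = np.asarray(eic, dtype=float)
        # Ions that co-elute with the parent share its chromatographic profile
        r = np.corrcoef(parent, eic)[0, 1]
        if r >= min_corr:
            # Use the apex intensity as the peak height in the pseudo-spectrum
            spectrum.append((mz, float(eic.max())))
    return spectrum


# Toy example: a Gaussian parent peak at rt = 5, one co-eluting ion and one
# ion eluting later (retention times and m/z values are hypothetical).
rt = np.linspace(0, 10, 101)
parent = np.exp(-((rt - 5.0) ** 2) / (2 * 0.5 ** 2))
co_eluting = 0.3 * parent
later = np.exp(-((rt - 7.0) ** 2) / (2 * 0.5 ** 2))
print(build_pseudo_msms(parent, [co_eluting, later], [184.0733, 104.1070]))
```

Only the co-eluting ion passes the correlation filter; the later-eluting ion has a slightly negative correlation with the parent and is excluded. A real workflow would also need to handle noise, partial co-elution, and detection limits, which this sketch ignores.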
Year: 2022; PMID: 35180347; PMCID: PMC8892435; DOI: 10.1021/acs.analchem.1c03032
Source DB: PubMed; Journal: Anal Chem; ISSN: 0003-2700; Impact factor: 6.986
Figure 1. Schematic of the MetaboAnnotatoR workflow for AIF LC–MS feature annotation.
Figure 2. Automated annotation using MetaboAnnotatoR of the feature at m/z 468.309 and 83 s from a representative sample of the MESA human serum Lipid+ dataset: (A) matched EICs and (B) the corresponding pseudo-MS/MS spectrum of the ions matched for the rank 1 candidate. (C) Table of ranked candidates for the same feature. Legend: mz.error, m/z error (E) in ppm; mz.metabolite, m/z of the parent ion of the matched candidate; matched.mz, m/z of the matched parent or fragment ion; fraction, number of fragments of each candidate that were matched to the target pseudo-MS/MS spectrum; pseudo-MS/MS, logical value indicating whether a pseudo-MS/MS spectrum was obtained (TRUE) or not (FALSE).
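The `mz.error` and `fraction` columns in the Figure 2 legend suggest how candidates are compared with a pseudo-MS/MS spectrum. The Python sketch below computes a ppm error and, interpreting `fraction` as the proportion of a candidate's library ions matched within a tolerance, ranks candidates by that proportion. The 10 ppm tolerance, the `Candidate` class, the sign convention of the error, and ranking on `fraction` alone are assumptions for illustration; MetaboAnnotatoR's actual scoring may combine other terms.

```python
from dataclasses import dataclass
from typing import List


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass error in ppm (observed minus theoretical; sign convention assumed)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


@dataclass
class Candidate:
    name: str
    fragment_mz: List[float]  # theoretical m/z of library ions (parent and fragments)


def rank_candidates(spectrum_mz: List[float], candidates: List[Candidate], tol_ppm: float = 10.0):
    """Score each candidate by the fraction of its library ions found in the spectrum."""
    results = []
    for cand in candidates:
        # A library ion is matched if any observed peak lies within tol_ppm of it
        matched = [
            ref for ref in cand.fragment_mz
            if any(abs(ppm_error(obs, ref)) <= tol_ppm for obs in spectrum_mz)
        ]
        fraction = len(matched) / len(cand.fragment_mz)
        results.append((cand.name, fraction, matched))
    # Highest fraction first; sorted() is stable, so ties keep library order
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical pseudo-MS/MS peaks and two hypothetical library candidates
spectrum = [468.3085, 184.0734, 104.1071]
library = [
    Candidate("candidate B", [468.3090, 250.1000, 300.2000]),
    Candidate("candidate A", [468.3090, 184.0733, 104.1070]),
]
for name, fraction, matched in rank_candidates(spectrum, library):
    print(name, round(fraction, 2), matched)
```

Here candidate A matches all three of its library ions and is ranked first, while candidate B matches only the shared parent m/z.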
Comparison between Automated and Manual Annotations of MESA Datasets
| dataset | correct at rank 1 | correct at rank 2 | correct at rank 3 | correct at rank 4 | correct at rank 5 | incorrect | not annotated | precision (%) | recall (%) |
|---|---|---|---|---|---|---|---|---|---|
| Lipid+ QC | 134 (69.8%) | 12 (6.3%) | 4 (2.1%) | 1 (0.5%) | 1 (0.5%) | 14 (7.3%) | 26 (13.5%) | 91.6 | 85.4 |
| Lipid+ RC | 134 (69.8%) | 7 (3.6%) | 5 (2.6%) | 4 (2.1%) | 1 (0.5%) | 17 (8.9%) | 24 (12.5%) | 89.9 | 86.3 |
| Lipid– QC | 75 (50.7%) | 19 (12.8%) | 2 (1.4%) | 1 (0.7%) | | 10 (6.8%) | 40 (27.1%) | 90.6 | 70.8 |
| Lipid– RC | 92 (62.2%) | 19 (12.8%) | 2 (1.4%) | 0 (0%) | 1 (0.7%) | 9 (6.1%) | 24 (16.2%) | 92.9 | 82.6 |
| HILIC+ QC | 143 (55.0%) | 14 (5.4%) | 6 (2.3%) | 3 (1.2%) | | 31 (11.9%) | 63 (24.2%) | 84.3 | 72.5 |
| HILIC+ RC | 131 (50.4%) | 23 (8.8%) | 7 (2.7%) | 6 (2.3%) | 14 (5.4%) | 39 (15.0%) | 40 (15.4%) | 82.3 | 81.9 |
The QC sample is a pool of the study samples. The RC (RAMClustR) object contains pseudo-MS/MS spectra, arranged into clusters using XCMS and RAMClustR, from the 100 study samples in the three datasets. Results are organized by the rank at which the correct annotation was found after sorting the annotation scores in descending order. For precision and recall, an annotation was counted as correct if it was found in ranks 1–5.
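The footnote defines a correct annotation but does not spell out the formulas. Treating precision as correct / (correct + incorrect) and recall as correct / (correct + not annotated), with "correct" summed over ranks 1–5, reproduces the tabulated values for, e.g., the Lipid+ QC and HILIC+ RC rows; this reading is inferred from the numbers rather than stated in the text. A minimal Python sketch under that assumption:

```python
def precision_recall(rank_counts, incorrect, not_annotated):
    """Return (precision %, recall %) rounded to one decimal place.

    rank_counts   -- numbers of correct annotations found at ranks 1-5
    incorrect     -- value of the table's "incorrect" column
    not_annotated -- value of the table's "not annotated" column
    """
    correct = sum(rank_counts)
    precision = 100 * correct / (correct + incorrect)
    recall = 100 * correct / (correct + not_annotated)
    return round(precision, 1), round(recall, 1)


print(precision_recall([134, 12, 4, 1, 1], incorrect=14, not_annotated=26))  # Lipid+ QC: (91.6, 85.4)
print(precision_recall([131, 23, 7, 6, 14], incorrect=39, not_annotated=40))  # HILIC+ RC: (82.3, 81.9)
```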
Figure 3. Relationship between annotation accuracy and feature intensity. Box plots show the distribution of feature intensities for features correctly annotated at each rank, for incorrectly annotated features, and for features without an annotation. Nondetected features (below the minimum intensity) are not shown.
Figure 4. Venn diagrams showing the number of overlapping annotations between the manually annotated features and those from MetaboAnnotatoR and MS-DIAL for Lipid+ (n = 161 features), Lipid– (n = 53 features), and HILIC+ (n = 50 features).
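The region counts behind a three-set Venn diagram like those in Figure 4 follow from basic set operations. The Python sketch below assumes each source's annotations are stored as (feature ID, annotation) pairs, so two sources overlap only when they assign the same annotation to the same feature; the function name `venn3_regions`, this matching rule, and the toy records are hypothetical, and the criteria used in the paper may differ.

```python
def venn3_regions(manual, tool_a, tool_b):
    """Count the seven disjoint regions of a three-set Venn diagram.

    Each argument is an iterable of hashable records, e.g. (feature_id, annotation)
    pairs; a record counts as shared only if it is identical in both sets.
    """
    m, a, b = set(manual), set(tool_a), set(tool_b)
    return {
        "manual only": len(m - a - b),
        "A only": len(a - m - b),
        "B only": len(b - m - a),
        "manual & A only": len((m & a) - b),
        "manual & B only": len((m & b) - a),
        "A & B only": len((a & b) - m),
        "all three": len(m & a & b),
    }


# Toy records (hypothetical feature IDs and annotation names)
manual = {("f1", "PC 34:1"), ("f2", "LPC 16:0"), ("f3", "SM 36:1")}
metaboannotator = {("f1", "PC 34:1"), ("f2", "LPC 16:0")}
ms_dial = {("f1", "PC 34:1"), ("f4", "PE 36:2")}
print(venn3_regions(manual, metaboannotator, ms_dial))
```

Because the seven regions are disjoint, their counts sum to the size of the union of the three sets.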
Automated Annotation of Additional Studies
| study/dataset | species | sample | chromatography/MS instrument | annotations correct/reported | precision (%) | recall (%) | no ref. |
|---|---|---|---|---|---|---|---|
| AT+ | | adipose tissue | HILIC/Waters Synapt Q-ToF, ESI+ | 26/37 | 81.3 | 83.9 | 2 (5.4%) |
| AT– | | adipose tissue | HILIC/Waters Synapt Q-ToF, ESI– | 16/21 | 84.2 | 88.9 | 2 (9.5%) |
| MTBLS666 | | amniotic fluid | RP-C18/Waters Synapt Q-ToF, ESI– | 3/32 | 60.0 | 10.0 | 27 (84.4%) |
| MTBLS816 | | urine | HILIC Positive/Agilent Q-ToF, ESI+ | 31/107 | 88.6 | 30.1 | 60 (56.1%) |
Datasets comprise the adipose tissue extract (AT) HILIC datasets and two publicly available datasets from the MetaboLights repository (MTBLS). Precision and recall refer to rank 1 automated annotations. No ref.: annotations missed because the reference was absent from the library. Annotations reported by the studies were at levels 2 and 3, except for MTBLS816, which consists mostly of level 1 annotations (105 of 107).