Oriol Senan1, Antoni Aguilar-Mogas1, Miriam Navarro2,3, Jordi Capellades2,3, Luke Noon3,4, Deborah Burks3,4, Oscar Yanes2,3, Roger Guimerà1,5, Marta Sales-Pardo1.
Abstract
MOTIVATION: The analysis of biological samples in untargeted metabolomic studies using LC-MS yields tens of thousands of ion signals. Annotating these features is of the utmost importance for answering questions as fundamental as, e.g. how many metabolites are there in a given sample.
Year: 2019 PMID: 30903689 PMCID: PMC6792096 DOI: 10.1093/bioinformatics/btz207
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Schematic representation of CliqueMS. CliqueMS identifies the features belonging to the same metabolite. As input, CliqueMS takes LC-MS1 data in any format that can be converted into either an ‘xcmsSet’ or an ‘XCMSnExp’ object in R, such as mzML, mzXML, mzData and NetCDF. First, CliqueMS determines peak-shape (i.e. coelution) similarities between all pairs of features in the LC-MS1 spectrum. Then CliqueMS finds groups of features based on the network of similarities. The assumption is that the more similar a pair of features, the more likely they are to belong to the same group. Following a maximum likelihood procedure, CliqueMS finds the best division into fully connected groups of features (or cliques). Then, for each clique, CliqueMS annotates each feature by establishing the neutral mass of the parental ion. Annotations are scored based on a table of empirically observed frequencies for each adduct. The final output is, for each feature, the five annotations with the highest score, each specifying the adduct or in-source fragment and its corresponding parental mass. See Supplementary Figure S1 for a detailed description of the installation process, input and output formats, as well as the parameters and modules within CliqueMS.
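The annotation step described in the caption can be illustrated with a minimal Python sketch. It is not the CliqueMS implementation: the adduct table below is a small illustrative subset built from standard monoisotopic ion masses, the function names and the tolerance `tol` are hypothetical, and the empirical adduct-frequency scoring mentioned above is left out. The sketch only shows the core idea that features in one clique are annotated together when different adduct hypotheses point to the same parental neutral mass.

```python
# Illustrative adduct table (an assumption, not the table used by CliqueMS):
# name -> (molecules per ion, charge, mass shift in Da).
ADDUCTS = {
    "(M+H)+": (1, 1, 1.007276),
    "(M+NH4)+": (1, 1, 18.033823),
    "(M+Na)+": (1, 1, 22.989218),
    "(M+K)+": (1, 1, 38.963158),
    "(2M+H)+": (2, 1, 1.007276),
}


def neutral_mass(mz, adduct):
    """Parental neutral mass implied by observing `mz` as the given adduct."""
    n_mol, charge, shift = ADDUCTS[adduct]
    return (mz * charge - shift) / n_mol


def shared_parental_masses(mzs, tol=0.005):
    """Find neutral masses that explain at least two features of one clique.

    Every (feature, adduct) pair is a hypothesis about the neutral mass.
    Hypotheses are sorted by mass and chained into groups while consecutive
    masses differ by at most `tol` Da (hypothetical tolerance).
    """
    hypotheses = sorted(
        (neutral_mass(mz, name), idx, name)
        for idx, mz in enumerate(mzs)
        for name in ADDUCTS
    )
    groups, current = [], []
    for hyp in hypotheses:
        if current and hyp[0] - current[-1][0] > tol:
            groups.append(current)
            current = []
        current.append(hyp)
    if current:
        groups.append(current)

    results = []
    for group in groups:
        assignment = {idx: name for _, idx, name in group}
        if len(assignment) >= 2:  # a mass supported by two or more features
            mean_mass = sum(m for m, _, _ in group) / len(group)
            results.append((mean_mass, assignment))
    # Masses explaining more features come first.
    results.sort(key=lambda r: len(r[1]), reverse=True)
    return results


# Hypothetical clique: three ions of a 180.06339 Da molecule plus one unrelated feature.
features = [181.070666, 203.052608, 219.026548, 500.0]
for mass, assignment in shared_parental_masses(features):
    print(round(mass, 4), assignment)
```

In a fuller treatment, each candidate assignment would be weighted by how often its adducts are observed empirically, which is the role the caption assigns to the adduct frequency table.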
Summary of the full set of annotations for each sample
| Sample | Tool | Features | Number of cliques/groups/clusters | Annotated unique parental masses | Annotated features (%) |
|---|---|---|---|---|---|
| Standards | CliqueMS | 275 | 69 | 49 (48) | 64 (55) |
| | CAMERA | | 164 | 25 | 32 |
| Retina IRS2 KO (+ ionization) | CliqueMS | 8489 | 606 | 1231 (1512) | 70 (57) |
| | CAMERA | | 2836 | 1303 | 43 |
| Retina IRS2 KO (− ionization) | CliqueMS | 3893 | 349 | 334 (494) | 44 (36) |
| | CAMERA | | 1083 | 552 | 32 |
| MTBLS103 HILIC | CliqueMS | 16 160 | 387 | 3186 (3703) | 84 (68) |
| | CAMERA | 13 048 | 488 | 2947 | 62 |
| | xMSannotator | | 230 | 5314 | 46 |
| | MS-FLO | | NA | 2875 | 57 |
| MTBLS103 C18 | CliqueMS | 24 620 | 927 | 3980 (4769) | 74 (61) |
| | CAMERA | 19 871 | 1332 | 13 131 | 58 |
| | xMSannotator | | 540 | 6226 | 48 |
| | MS-FLO | | NA | 3283 | 41 |
Note: For single spectrum datasets, we show the total number of features in the LC-MS1 spectrum. For MTBLS103 datasets, we report the number of features after sample alignment with XCMS. For single spectrum datasets, we report results for CliqueMS and CAMERA. For multiple spectrum datasets, we report the results for CliqueMS, CAMERA, xMSannotator and MS-FLO. We report the total number of groups identified by the algorithm, the number of unique parental neutral masses identified and the percentage of features each algorithm associated with a parental neutral mass. For single spectrum datasets, we consider the five annotations with the highest scores produced by CliqueMS and report: (i) the average number of unique parental masses over annotations and, in parentheses, the number of unique parental masses in the annotation with the best score; (ii) the percentage of features with at least one annotation within the five annotations with the best scores and, in parentheses, the percentage of features annotated within the best ranked annotation.
For MTBLS103 datasets, we run CliqueMS for each individual sample. For each dataset, we report the average number of features and the average number of cliques obtained across samples. We also report: (i) the average number of unique parental neutral masses and, in parentheses, the average number of unique parental masses in the top annotation and (ii) the average percentage of features with at least one annotation within the five top annotations and, in parentheses, the average percentage when only the top annotation is considered.
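The two CliqueMS summary columns defined in the note can be computed as in the following sketch. The data layout is an assumption made for illustration: `annotations` is a list of up to five dictionaries ordered from best to worst score, each mapping a feature index to its parental neutral mass or to `None` when the feature is not annotated. The function name and the rounding used to decide when two masses are the same are also hypothetical.

```python
def summarize_annotations(annotations, n_features, decimals=2):
    """Summary statistics over ranked annotations of a single sample.

    annotations: list of dicts {feature index: parental mass or None},
                 ordered from the best scoring annotation to the worst.
    n_features:  total number of features in the sample.
    decimals:    masses equal after rounding are counted as one (assumption).
    """
    def unique_masses(annotation):
        return {round(m, decimals) for m in annotation.values() if m is not None}

    def annotated(annotation):
        return {f for f, m in annotation.items() if m is not None}

    top = annotations[0]
    in_any = set().union(*(annotated(a) for a in annotations))
    return {
        # (i) average over annotations, and the value for the top annotation
        "avg_unique_masses": sum(len(unique_masses(a)) for a in annotations)
        / len(annotations),
        "top_unique_masses": len(unique_masses(top)),
        # (ii) features annotated in any ranked annotation, and in the top one
        "pct_annotated_any": 100.0 * len(in_any) / n_features,
        "pct_annotated_top": 100.0 * len(annotated(top)) / n_features,
    }


# Hypothetical toy sample with four features and two ranked annotations.
ranked = [
    {0: 180.06, 1: 180.06, 2: None, 3: None},
    {0: 180.06, 1: 202.05, 2: 90.03, 3: None},
]
print(summarize_annotations(ranked, n_features=4))
```

For the MTBLS103 datasets, the note implies that these per-sample values are then averaged across samples.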
Fig. 2. Feature annotation for a mixture of standards. (a) Extracted ion chromatogram. The nine ionized metabolites were annotated with CliqueMS. We show features that are adducts of each metabolite in different colors (shades of grey), as annotated by CliqueMS in (c). (b) Cliques identified by CliqueMS in the same experiment, after computing cosine correlation and maximizing clique likelihood. The intensity of the link is proportional to the correlation, and the area of each node is proportional to feature intensity. The colors are the same as in (a). For each feature, we show the annotation given by CliqueMS as shown in (c). We denote isotopes by adding a subindex to M, so that M0 corresponds to the monoisotopic mass and M1 to the first isotope. (c) Feature annotation by CliqueMS and CAMERA. For each metabolite, we show the different adducts annotated and the total number of isotopic variants of that particular adduct. Correctly annotated features are shown in green; incorrectly annotated features are shown in red (darker shade of grey), with indicating that the associated parental neutral mass was incorrect; non-annotated features are shown in white. For CliqueMS, we also show the ranking of the feature annotation that matches manual annotation. For CAMERA, the * indicates those features for which the algorithm returned two possible annotations. DSPC stands for 1,2-distearoyl-sn-glycero-3-phosphocholine. See Supplementary Material for CliqueMS annotations.
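The cosine correlation mentioned in panel (b) can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the CliqueMS code: each feature is represented by a hypothetical intensity profile sampled on the same retention-time grid, and the `threshold` used to keep a link is an illustrative parameter. The likelihood maximization that turns the network into cliques is not shown.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two intensity profiles of equal length."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # an empty profile is treated as dissimilar to everything
    return dot / (norm_x * norm_y)


def similarity_network(profiles, threshold=0.0):
    """Weighted edges between all feature pairs with similarity above threshold."""
    names = list(profiles)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            s = cosine_similarity(profiles[a], profiles[b])
            if s > threshold:
                edges[(a, b)] = s
    return edges


# Hypothetical profiles: f1 and f2 coelute with the same peak shape, f3 elutes earlier.
profiles = {
    "f1": [0, 1, 4, 9, 4, 1, 0],
    "f2": [0, 2, 8, 18, 8, 2, 0],
    "f3": [9, 4, 1, 0, 0, 0, 0],
}
print(similarity_network(profiles, threshold=0.5))
```

Because the cosine ignores overall scale, a weak adduct and a strong one with the same peak shape still receive a similarity close to 1, which matches the assumption that coeluting features belong to the same group.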
Summary of the performance of different algorithms for complex samples
| Sample | Identified and annotated metabolites | Tool | Annotated metabolites (multiple adducts) | Annotated metabolites (single adduct) | Adducts/mass fragments | Annotated features |
|---|---|---|---|---|---|---|
| Retina IRS2 KO (+ ionization) | 20 | CliqueMS | 15 | — | 50 | 95 |
| | | CAMERA | 8 | — | 25 | 45 |
| Retina IRS2 KO (− ionization) | 18 | CliqueMS | 6 | — | 16 | 35 |
| | | CAMERA | 5 | — | 14 | 33 |
| MTBLS103 HILIC | 6 (78) | CliqueMS | 5/6/56 | — | 18/26/213 | 44/72/318 |
| | 6 | CAMERA | 3 | — | 13 | 21 |
| | | xMSannotator | 1 | 4 | 10 | 10 |
| | | MS-FLO | 1 | — | 2 | 3 |
| MTBLS103 C18 | 9 (162) | CliqueMS | 6/8/104 | — | 17/29/304 | 46/66/524 |
| | 9 | CAMERA | 3 | — | 11 | 20 |
| | | xMSannotator | 3 | 6 | 13 | 13 |
| | | MS-FLO | 0 | — | 0 | 0 |
Note: For single spectrum samples (Retina IRS2 KO in positive and negative ionization mode), we report results for CAMERA and CliqueMS. For the datasets in MTBLS103 (Samino ), we report results for the chromatographic column operating in two different conditions: RP-C18 and HILIC. For the MTBLS103 datasets, we show results for CliqueMS, CAMERA, xMSannotator and MS-FLO. The multiple adduct and single adduct columns indicate, respectively, the number of metabolites correctly annotated through the identification of at least two adducts with the same parental neutral mass, and the number of metabolites annotated through a single adduct [annotated single adducts are assigned to (M + H)+ by xMSannotator].
CliqueMS analyzes individual samples; therefore, in parentheses we show the total number of annotated metabolites in all samples.
Because CliqueMS produces an individual annotation for each sample (13 for HILIC and 18 for RP-C18), we report three results, written as r1/r2/r3: r1 shows the number of unique metabolites/adducts/features that are correctly annotated in of the samples; r2 shows the number of unique metabolites/adducts/features that are correctly annotated in at least one sample; and r3 shows the aggregate numbers over samples.
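The three per-dataset results can be computed from per-sample annotations as in the sketch below. The input layout and the function name are hypothetical. The share of samples required for r1 is not recoverable from the note above, so it is exposed as the explicit parameter `min_fraction` rather than fixed; any value passed to it is the caller's assumption.

```python
from collections import Counter


def aggregate_over_samples(per_sample, min_fraction):
    """Return (r1, r2, r3) from per-sample sets of correctly annotated items.

    per_sample:   one set of correctly annotated metabolites (or adducts,
                  or features) per sample.
    min_fraction: share of samples in which an item must be correct to count
                  toward r1 (assumption: the source does not state the value).
    """
    n_samples = len(per_sample)
    counts = Counter(item for sample in per_sample for item in set(sample))
    r1 = sum(1 for c in counts.values() if c >= min_fraction * n_samples)
    r2 = len(counts)  # correct in at least one sample
    r3 = sum(len(set(sample)) for sample in per_sample)  # aggregate over samples
    return r1, r2, r3


# Hypothetical toy input with three samples.
samples = [{"uracil", "taurine"}, {"uracil"}, {"uracil", "adenine"}]
print(aggregate_over_samples(samples, min_fraction=0.5))
```

The same aggregate reading explains the totals in parentheses in the second column of the table: 6 metabolites over 13 HILIC samples gives 6 × 13 = 78, and 9 metabolites over 18 RP-C18 samples gives 9 × 18 = 162, which are the upper bounds for r3 in the metabolite column.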
Feature annotation for complex samples
| Metabolite | CliqueMS annotation | CliqueMS isotopes | CliqueMS rank | CAMERA annotation | CAMERA isotopes |
|---|---|---|---|---|---|
| Uracil | (M+H)+ | 2 | 1 | (M+H)+ | 2 |
| | (M+H-H2O)+ | 1 | 1 | (M+H-H2O)+ | 1 |
| | (M+H-NH3)+ | 1 | 1 | (M+H-NH3)+ | 1 |
| Taurine | (M+H)+ | 2 | 2 | (M+H)+ | 2 |
| | (M+Na)+ | 2 | 3 | (M+Na)+ | 2 |
| | (M+H-H2O)+ | 1 | 2 | (M+H-H2O)+ | 1 |
| | (2M+H)+ | 1 | 2 | (2M+H)+ | 1 |
| | (M2+Na)+ | 3 | 1 | (M-H+2Na)+ | 3 |
| Adenine | (M+H)+ | 2 | 1 | (M2+NH4)+ | 2 |
| | (M+H-NH3)+ | 2 | 1 | (M2+H)+ | 2 |
| L-glutamic acid | (M+H)+ | 3 | 1 | (M2+NH4)+ | 3 |
| | (M+H-H2O)+ | 2 | 1 | — | — |
| | (M+Na-H2O)+ | 1 | 3 | (M2+H-H2O)+ | 1 |
| | (M+Na)+ | 3 | 3 | (M2+H)+ | 3 |
| | (M-H+2Na)+ | 3 | 3 | (M2+Na)+ | 3 |
| | (M-2H+3Na)+ | 2 | 3 | (M2-H+2Na)+ | 3 |
| Guanine | (M+H-H2O)+ | 1 | 1 | (M+H-H2O)+ | 1 |
| | (M+H-NH3)+ | 2 | 1 | (M+H-NH3)+ | 2 |
| | (M+H)+ | 2 | 1 | (M+H)+ | 3 |
| Xanthine | (M+Na)+ | 1 | 1 | — | — |
| | (M+H-NH3)+ | 1 | 1 | — | — |
| | (M+H)+ | 2 | 1 | — | — |
| L-2-aminoadipic acid | (M+H-H2O)+ | 1 | 2 | *(M+H-H2O)+ | 1 |
| | (M+H)+ | 1 | 2 | *(M+H)+ | 1 |
| L-ascorbic acid | (M+Na)+ | 1 | 1 | (M+Na)+ | 1 |
| | (M+H)+ | 1 | 1 | (M+H)+ | 1 |
| PC | (M+K)+ | 1 | 1 | (M2+K-H2O)+ | 1 |
| | (M+Na)+ | 2 | 1 | (M2+Na-H2O)+ | 2 |
| | (M+H)+ | 3 | 1 | — | — |
| Inosine | (M+K)+ | 2 | 1 | (M+K)+ | 1 |
| | (2M+H)+ | 2 | 1 | (2M+H)+ | 3 |
| | (2M+Na)+ | 3 | 1 | (2M+Na)+ | 3 |
| | (M+H)+ | 3 | 1 | (M+H)+ | 3 |
| | (M+Na)+ | 2 | 1 | (M+Na)+ | 2 |
| Guanosine | (2M+H)+ | 2 | 1 | (2M+H)+ | 2 |
| | (M+Na)+ | 1 | 1 | (M+Na)+ | 1 |
| | (M+H)+ | 4 | 1 | (M+H)+ | 4 |
| Glutathione | (M+Na)+ | 1 | 1 | (M+Na)+ | 1 |
| | (M+H)+ | 2 | 1 | (M+H)+ | 3 |
| | (M+H-H2O)+ | 3 | 1 | (2M2+H)+ | 3 |
| Oxiglutathione | (M+Na)+ | 2 | 1 | (2M2+Na)+ | 2 |
| | (M+K)+ | 3 | 1 | (2M2+K)+ | 3 |
| | (M+H)+ | 3 | 1 | (2M2+H)+ | 4 |
| NAD | (M+Na)+ | 1 | 1 | (2M2+Na)+ | 1 |
| | (M+2H)2+ | 3 | 1 | (M2+H)+ | 1 |
| | (M+H)+ | 4 | 1 | (2M2+H)+ | 4 |
Note: Detail of the adducts and in-source fragments annotated by CliqueMS and CAMERA for the retina samples of IRS2-deficient mice (+ ionization). For each molecule, we show the different adducts and in-source fragments annotated, together with the total number of isotopic variants of that particular adduct/in-source fragment. Correctly annotated features are shown in green (light grey); incorrectly annotated features are shown in red (darker grey), with indicating that the associated parental mass was incorrect; non-annotated features are shown in white. For CliqueMS, we also show the ranking of the feature annotation that matches manual annotation. For CAMERA, the * indicates those features for which the algorithm returned two possible annotations (see Supplementary Material for the complete results obtained for this sample using CliqueMS and for the complete list of manually annotated metabolites).