Joe Wandy, Rónán Daly, Rainer Breitling, Simon Rogers.
Abstract
MOTIVATION: The combination of liquid chromatography and mass spectrometry (LC/MS) has been widely used for large-scale comparative studies in systems biology, including proteomics, glycomics and metabolomics. In almost all experimental designs, it is necessary to compare chromatograms across biological or technical replicates and across sample groups. Central to this is the peak alignment step, which is one of the most important but challenging preprocessing steps. Existing alignment tools do not take into account the structural dependencies between related peaks that coelute and are derived from the same metabolite or peptide. We propose a direct matching peak alignment method for LC/MS data that incorporates information about related peaks (within each LC/MS run) and investigate its effect on alignment performance (across runs). The groupings of related peaks necessary for our method can be obtained from any peak clustering method and are built into a pair-wise peak similarity score function. The resulting similarity score matrix is used by an approximation algorithm for the weighted matching problem to produce the actual alignment result.
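The final step described above, turning a similarity score matrix into an alignment, can be illustrated with a minimal sketch. The abstract does not say which approximation algorithm is used, so the greedy strategy below is an assumption chosen for simplicity, and the function name `greedy_matching` and the example scores are hypothetical.

```python
def greedy_matching(scores):
    """Greedy approximation to maximum weighted bipartite matching.

    scores: dict mapping (peak_in_run1, peak_in_run2) -> similarity score.
    Returns a list of matched pairs in which each peak appears at most once.
    NOTE: a greedy strategy is an assumption; the source only states that
    "an approximation algorithm" is used.
    """
    matched_left, matched_right, alignment = set(), set(), []
    # Visit candidate pairs from the highest to the lowest similarity.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for (a, b), _score in ranked:
        # Accept a pair only if neither peak has been matched already.
        if a not in matched_left and b not in matched_right:
            alignment.append((a, b))
            matched_left.add(a)
            matched_right.add(b)
    return alignment


# Hypothetical similarity scores between peaks of two runs.
example_scores = {
    ("A", "E"): 0.9,
    ("A", "J"): 0.4,
    ("B", "G"): 0.8,
    ("B", "E"): 0.5,
}
alignment = greedy_matching(example_scores)
print(alignment)
```

Sorting the candidate pairs once and accepting them in order keeps the sketch short; an exact solver for the assignment problem could be substituted without changing the input format.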
Year: 2015 PMID: 25649621 PMCID: PMC4760236 DOI: 10.1093/bioinformatics/btv072
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Illustrative example of the incorporation of grouping information into the similarity score. Each node in the figure is a peak feature, and dotted ovals represent groups of related peaks, e.g. isotopes, fragments, etc. Initially, weights (e.g. W) are computed for pairs of peaks (one from each run) with m/z and RT within pre-defined thresholds. These weights are converted into an overall score by incorporating grouping information. For example, peak pairs (A, E) and (B, G) are both within the threshold. As A and B are in the same group, and E and G are in the same group, the weights between pairs (A, E) and (B, G) are upweighted. Peak J is not related to any peaks that could be matched with A’s related peaks, and the similarity between A and J is therefore downweighted. The same applies to similarities between pairs (C, H) and (D, I).
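The upweighting and downweighting described in the caption can be sketched as follows. This is only an illustration of the idea, not the scoring function from the paper: the blending formula, the parameter name `alpha`, the function name `group_aware_scores` and all numeric weights are assumptions made for the example.

```python
def group_aware_scores(weights, group1, group2, alpha=0.5):
    """Blend each pair's own weight with support from related-peak pairs.

    weights: dict (peak_run1, peak_run2) -> initial weight, defined only for
             pairs whose m/z and RT fall within the thresholds.
    group1, group2: dict peak -> group label, one mapping per run.
    alpha: hypothetical mixing weight in [0, 1]; alpha = 1 ignores grouping.
    """
    scores = {}
    for (a, b), w in weights.items():
        # Support comes from other candidate pairs (a2, b2) in which a2 is a
        # different peak grouped with a, and b2 a different peak grouped with b.
        support = sum(
            w2
            for (a2, b2), w2 in weights.items()
            if a2 != a
            and b2 != b
            and group1[a2] == group1[a]
            and group2[b2] == group2[b]
        )
        scores[(a, b)] = alpha * w + (1 - alpha) * support
    return scores


# Peaks A and B are grouped in run 1; E and G are grouped in run 2; J is alone.
weights = {("A", "E"): 0.8, ("B", "G"): 0.8, ("A", "J"): 0.8}
group1 = {"A": 1, "B": 1}
group2 = {"E": 1, "G": 1, "J": 2}
scores = group_aware_scores(weights, group1, group2)
print(scores)
```

With equal initial weights, (A, E) and (B, G) support each other and keep a higher score, while (A, J) receives no support and ends up lower, mirroring the behaviour the caption describes.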
F1 scores for the single-fraction experiment results on the P1 dataset
| Fraction | Join | SIMA | MW | MWG | MWM |
|---|---|---|---|---|---|
| 000 | 0.63 | 0.64 | 0.64 | 0.71 | |
| 020 | 0.88 | 0.88 | 0.88 | 0.90 | |
| 040 | 0.82 | 0.83 | 0.85 | 0.86 | |
| 060 | 0.76 | 0.78 | 0.78 | 0.83 | |
| 080 | 0.90 | 0.89 | 0.88 | 0.90 | |
| 100 | 0.89 | 0.89 | 0.89 | 0.91 | |
| Mean | 0.81 | 0.82 | 0.82 | 0.85 | |
Notes: The tool with the highest F1 score for each fraction is highlighted in bold. The ‘Mean’ row shows the average of the F1 scores of the individual fractions.
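For readers unfamiliar with the metric, the sketch below shows the standard F1 definition (the harmonic mean of precision and recall) and how a per-method average over fractions is formed. The function name `f1_score` is hypothetical; the list of values is the Join column of the P1 table above.

```python
def f1_score(precision, recall):
    """Standard F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention when both precision and recall are zero
    return 2 * precision * recall / (precision + recall)


# Join column of the P1 single-fraction table (fractions 000 to 100).
join_f1 = [0.63, 0.88, 0.82, 0.76, 0.90, 0.89]
mean_join = sum(join_f1) / len(join_f1)
print(round(mean_join, 2))
```

Averages recomputed from the rounded per-fraction values can differ in the last digit from averages computed on unrounded scores.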
F1 scores for the single-fraction experiment results on the P2 dataset
| Fraction | Join | SIMA | MW | MWG | MWM |
|---|---|---|---|---|---|
| 000 | 0.45 | 0.45 | 0.45 | 0.45 | |
| 020 | 0.77 | 0.78 | 0.79 | 0.79 | |
| 040 | 0.77 | 0.78 | 0.77 | 0.77 | |
| 080 | 0.66 | 0.68 | 0.67 | 0.67 | |
| 100 | 0.55 | 0.58 | 0.56 | 0.70 | |
| Mean | 0.64 | 0.65 | 0.65 | 0.69 | |
Notes: The tool with the highest F1 score for each fraction is highlighted in bold. The ‘Mean’ row shows the average of the F1 scores of the individual fractions.
Fig. 2. Precision and recall training performance for all parameters (m/z, RT tolerance, α and g) varied in the experiment, for the fractions containing the most (Fig. 2a and c) and the fewest (Fig. 2b and d) features in the P1 and P2 datasets. Plots for all the remaining fractions can be found in Figures 1 and 2 of the Supplementary Material
Multiple-fractions experiment results for the P1 dataset (testing performance by method)
| Training fraction | Join | SIMA | MW | MWG | MWM |
|---|---|---|---|---|---|
| 000 | 0.82 | 0.85 | 0.82 | | |
| 020 | 0.78 | 0.76 | 0.78 | 0.75 | |
| 040 | 0.78 | 0.76 | 0.77 | 0.79 | |
| 060 | 0.78 | 0.78 | 0.77 | 0.83 | |
| 080 | 0.71 | 0.73 | 0.72 | 0.77 | |
| 100 | 0.75 | 0.77 | 0.74 | 0.76 | |
Note: For each training fraction, the reported testing performance is the average of individual F1 scores from the testing fractions. The top-performing method (highest F1 score) is highlighted in bold.
Multiple-fractions experiment results for the P2 dataset (testing performance by method)
| Training fraction | Join | SIMA | MW | MWG | MWM |
|---|---|---|---|---|---|
| 000 | 0.62 | 0.61 | 0.48 | 0.61 | |
| 020 | 0.56 | 0.55 | 0.43 | 0.55 | |
| 040 | 0.52 | 0.41 | | | |
| 080 | 0.56 | 0.50 | 0.50 | 0.50 | |
| 100 | 0.57 | 0.56 | 0.44 | 0.57 | |
Notes: For each training fraction, the reported testing performance is the average of individual F1 scores from the testing fractions. The top-performing method (highest F1 score) is highlighted in bold.
Fig. 3. Training performance shows the best F1 scores obtained by each method on 30 pairs of randomly selected metabolomic and glycomic training sets
Fig. 4. Testing performance shows how well each method generalizes on the 30 different testing sets, each evaluated using the optimal training parameters from its corresponding training set