Qin Liu, Douglas Walker, Karan Uppal, Zihe Liu, Chunyu Ma, ViLinh Tran, Shuzhao Li, Dean P Jones, Tianwei Yu.
Abstract
With the growth of metabolomics research, more and more studies are conducted on large numbers of samples. Due to technical limitations of the liquid chromatography-mass spectrometry (LC/MS) platform, samples often need to be processed in multiple batches, and data characteristics often differ across batches. In this work, we focus specifically on data generated in multiple batches on the same LC/MS instrument. Traditional preprocessing methods treat all samples as a single group. This practice can result in peak alignment errors that cannot be corrected by post hoc application of batch effect correction methods. We developed a new approach that addresses the batch effect issue in the preprocessing stage, resulting in better peak detection, alignment, and quantification. It can be combined with downstream batch effect correction methods to further correct for between-batch intensity differences. The method is implemented in the existing workflow of the apLCMS platform. In analyses of multi-batch data, generated both from standardized quality control (QC) plasma samples and from real biological studies, the new method produced feature tables with better consistency, as well as better downstream analysis results. The method can be a useful addition to the tools available for large studies involving multiple batches. It is available as part of the apLCMS package; the download link and instructions are at https://mypage.cuhk.edu.cn/academics/yutianwei/apLCMS/.
Year: 2020 PMID: 32807888 PMCID: PMC7431853 DOI: 10.1038/s41598-020-70850-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Illustration of the two-stage preprocessing approach. (a) The overall workflow. (b) Illustration of the calculation of RT shift for individual samples. (c) Example between-batch RT shift calculated from a real dataset.
Figure 2. Comparison of the two-stage preprocessing approach with traditional apLCMS and XCMS using standard sample. Each dot represents a parameter setting. (a) Total number of zeros in the final data matrix; (b) proportion of features with m/z matched to known metabolites using xMSAnnotator; (c) level of variation as measured by coefficient of variation (CV) in the final data matrix without considering batches; (d) level of variation as measured by coefficient of variation (CV) in the final data matrix without considering batches, considering only non-zero values; (e) level of variation as measured by CV after merging each batch; (f) level of variation as measured by CV after merging each batch, considering only non-zero values. In all CV plots, the point is median; vertical bars represent 10th to 90th percentile.
Figure 3. Comparison of the two-stage preprocessing approach with traditional apLCMS and XCMS using the ST000868 dataset. Each dot represents a parameter setting. (a) Proportion of zeros in the final data matrix before merging triplets for each subject; (b) Proportion of features with m/z matched to known metabolites by xMSAnnotator; (c) Within-triplet coefficient of variation (CV). Point is median; vertical bars represent 10th to 90th percentile. (d) Number of significant features at FDR ≤ 0.2, without batch effect correction; (e) Number of significant features at FDR ≤ 0.2, after batch effect correction by ComBat; (f) Number of significant features at FDR ≤ 0.2, after batch effect correction by WaveICA.
Figure 4. Comparison of the two-stage approach with traditional apLCMS and XCMS using CHDWB samples. Each dot represents a parameter setting. (a) Proportion of zeros in the final data matrix before merging triplets for each subject; (b) proportion of features with m/z matched to known metabolites by xMSAnnotator; (c) average within-triplet coefficient of variation (CV). Point is median; vertical bars represent 10th–90th percentile. (d) Number of significant features at FDR ≤ 0.2, without batch effect correction; (e) Number of significant features at FDR ≤ 0.2, after batch effect correction by ComBat; (f) Number of significant features at FDR ≤ 0.2, after batch effect correction by WaveICA.
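The CV panels in Figures 2 to 4 summarize per-feature variation by its median and 10th to 90th percentile, computed either on all values or on non-zero values only. The following is a minimal sketch of that kind of summary, not the authors' code. It assumes a feature table stored as a list of rows (one row of intensities per feature), uses the sample standard deviation, and the function names `coefficient_of_variation` and `summarize_cv` are hypothetical.

```python
import statistics


def coefficient_of_variation(values, nonzero_only=False):
    """Return sd / mean for one feature, or None if it cannot be computed."""
    if nonzero_only:
        # Assumption: zeros mark undetected peaks and are dropped here.
        values = [v for v in values if v != 0]
    if len(values) < 2:
        return None
    mean = statistics.fmean(values)
    if mean == 0:
        return None
    return statistics.stdev(values) / mean


def summarize_cv(feature_table, nonzero_only=False):
    """Median and 10th/90th percentile of per-feature CVs."""
    cvs = [coefficient_of_variation(row, nonzero_only) for row in feature_table]
    cvs = [c for c in cvs if c is not None]
    # quantiles(..., n=10) returns the nine decile cut points.
    deciles = statistics.quantiles(cvs, n=10, method="inclusive")
    return {"median": statistics.median(cvs), "p10": deciles[0], "p90": deciles[-1]}


# Toy intensities (illustrative only, not data from the study).
toy_table = [
    [10, 10, 10, 10],
    [0, 20, 20, 20],
    [5, 15, 5, 15],
]
all_values = summarize_cv(toy_table)
nonzero_values = summarize_cv(toy_table, nonzero_only=True)
```

In the toy table, the second feature has a high CV only because of its single zero, so the non-zero summary is lower than the all-values summary. This is the distinction the paired panels (c)/(d) and (e)/(f) of Figure 2 draw.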
Comparison of feature selection and pathway analysis results.
| Method | Total # features | # Significant pathways with 5 or more matched significant metabolites |
|---|---|---|
| Two-stage, pwithin.detect = 0.3, pbatches = 0.3 | 5,024 | 6 |
| Two-stage, pwithin.detect = 0.6, pbatches = 0.15 | 4,988 | 5 |
| Traditional apLCMS, min.profiles = 50 | 5,097 | |
| XCMS matched filter + orbiwarp, minsamp 30 | 5,004 | 0 |
| XCMS centWave + orbiwarp, minsamp 300 | 5,064 | 5 |
| XCMS centWave + loess, minsamp 240 | 5,201 | 1 |
| Two-stage, pwithin.detect = 0.2, pbatches = 0.45 | 4,034 | |
| Traditional apLCMS, min.profiles = 90 | 4,129 | |
| XCMS centWave + loess, minsamp 300 | 4,165 | 2 |
| XCMS matched filter + orbiwarp, minsamp 50 | 3,928 | 0 |
| Two-stage, pwithin.detect = 0.3, pbatches = 0.6 | 2,837 | |
| Traditional apLCMS, min.profiles = 180 | 2,874 | 3 |
| XCMS matched filter + orbiwarp, minsamp 90 | 2,789 | 0 |
| Two-stage, pwithin.detect = 0.3, pbatches = 0.9 | 1,667 | |
| Traditional apLCMS, min.profiles = 300 | 1,725 | 3 |
| XCMS matched filter + orbiwarp, minsamp 180 | 1,704 | 0 |
BMI was used as the outcome variable. Age, age², gender, and race were adjusted for in the model. Metabolic feature selection was conducted using features with < 25% zeros. Pathway analysis was conducted using Mummichog, using metabolic features with p < 0.05.
Bold italic font marks the largest number of significant pathways in each comparison group.
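The selection rules in the table notes (features with < 25% zeros, then p < 0.05 as input to pathway analysis) can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: the per-feature p-values are assumed to come from an upstream regression of BMI on each feature with the listed covariates, which is not shown, and the names `zero_fraction` and `select_features` and the dictionary layout are hypothetical.

```python
def zero_fraction(intensities):
    """Fraction of samples in which a feature has zero intensity."""
    return sum(1 for v in intensities if v == 0) / len(intensities)


def select_features(table, p_values, max_zero_fraction=0.25, p_cutoff=0.05):
    """Apply the zero-fraction filter, then the p-value cutoff.

    table:    dict mapping feature id -> list of intensities across samples
    p_values: dict mapping feature id -> p-value from an upstream model
              (assumed to be computed elsewhere)
    """
    # Keep features with strictly fewer than 25% zeros.
    kept = [f for f, vals in table.items() if zero_fraction(vals) < max_zero_fraction]
    # Among those, features with p < 0.05 form the pathway-analysis input.
    significant = [f for f in kept if p_values[f] < p_cutoff]
    return kept, significant


# Toy values (illustrative only, not data from the study).
toy_table = {
    "f1": [1, 2, 3, 4],
    "f2": [0, 2, 3, 4],              # exactly 25% zeros, so it is excluded
    "f3": [0, 1, 2, 3, 4, 5, 6, 7],  # 12.5% zeros, so it is kept
}
toy_p = {"f1": 0.01, "f2": 0.001, "f3": 0.2}
kept, significant = select_features(toy_table, toy_p)
```

The filter is applied before the p-value cutoff, so a feature with a small p-value but too many zeros (`f2` in the toy data) never reaches pathway analysis.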
Significant pathways with at least 5 matched significant metabolic features for parameter settings where ~4,000 features were detected.
| Pathways | Overlap_size | Pathway_size | p-value |
|---|---|---|---|
| Two-stage apLCMS (within-batch proportion 0.3, initially detected in at least 4 batches), 4,034 features | | | |
| Lysine metabolism | 6 | 19 | 0.00185 |
| Phosphatidylinositol phosphate metabolism | 5 | 16 | 0.00479 |
| Butanoate metabolism | 5 | 17 | 0.00681 |
| Glycine, serine, alanine and threonine metabolism | 8 | 38 | 0.00798 |
| Aspartate and asparagine metabolism | 9 | 52 | 0.01899 |
| Urea cycle/amino group metabolism | 7 | 40 | 0.02756 |
| Pyrimidine metabolism | 5 | 27 | 0.0463 |
| Glycerophospholipid metabolism | 8 | 53 | 0.04966 |
| Traditional apLCMS (minimum samples detected 90), 4,129 features | | | |
| Butanoate metabolism | 5 | 15 | 0.00387 |
| Glycine, serine, alanine and threonine metabolism | 8 | 37 | 0.00689 |
| Arachidonic acid metabolism | 6 | 24 | 0.00748 |
| Lysine metabolism | 5 | 18 | 0.0079 |
| Vitamin B3 (nicotinate and nicotinamide) metabolism | 5 | 18 | 0.0079 |
| Glycerophospholipid metabolism | 9 | 52 | 0.01681 |
| Urea cycle/amino group metabolism | 7 | 43 | 0.041 |
| Aspartate and asparagine metabolism | 8 | 53 | 0.04899 |
| XCMS (centWave + loess, IPO optimized, minimum samples detected 90), 4,165 features | | | |
| C21-steroid hormone biosynthesis and metabolism | 6 | 24 | 0.00395 |
| Urea cycle/amino group metabolism | 5 | 30 | 0.04353 |