Bo Wen1,2, Zhanlong Mei1,2, Chunwei Zeng1,2, Siqi Liu3,4.
Abstract
BACKGROUND: Non-targeted metabolomics based on mass spectrometry enables high-throughput profiling of the metabolites in a biological sample. The large amount of data generated by mass spectrometry requires intensive computational processing to annotate mass spectra and identify metabolites. This research field needs computational analysis tools that integrate multiple functions and can be operated easily by users who lack extensive programming knowledge.
Keywords: Metabolomics; Normalization; Pipeline; Quality control; Workflow
Year: 2017 PMID: 28327092 PMCID: PMC5361702 DOI: 10.1186/s12859-017-1579-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Qualitative assessment of metaX compared to other existing metabolomics tools
| No. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 14 | 15 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Feature | metaX | MAIT | Workflow4Metabolomics | MetMSLine | metaMS | MetaboNexus | MetaboAnalyst | XCMSOnline | MeltDB | Mzmine | Mzmatch | apLCMS | EigenMS | Metab | Metabomxtr | Metabolomics |
| Year | 2015 | 2014 | 2014 | 2013 | 2013 | 2014 | 2009 | 2012 | 2008 | 2006 | 2011 | 2009 | 2014 | 2011 | 2014 | 2014 |
| Language | R, Java | R | R, Perl, Python, Java | R | R | R | R, Java | R | Perl, JavaScript, R | Java | Java, R | R | R | R | R | R |
| Platform independent | √ | √ | √ | √ | √ | Windows only | √ | √ | √ | √ | √ | √ | √ | √ (Windows & macOS) | √ | √ |
| Open source | √ | √ | √ | √ | √ | √ | √ | √ | project- and user-specific access | √ | √ | √ | √ | √ | √ | √ |
| Usable offline | √ | √ | √ | √ | √ | √ | √ | - | - | √ | √ | √ | √ | √ | √ | √ |
| Power analysis | √ | - | - | - | - | - | √ | - | - | - | - | - | - | - | - | - |
| Automatic outlier sample detection | √ | - | √ | √ | - | - | - | - | - | - | - | - | - | - | - | - |
| PCA | √ | √ | √ | √ | - | √ | √ | √ | √ | √ | - | - | - | - | - | √ |
| Cluster analysis | √ | √ | √ | √ | - | √ | √ | - | √ | √ | - | - | - | - | - | √ |
| PLS-DA | √ | √ | √ | - | - | √ | √ | - | √ | - | - | - | - | - | - | - |
| ROC analysis | √ | - | - | - | - | √ | √ | - | - | - | - | - | - | - | - | - |
| Normalization | Sum, PQN, VSN, QC-RSC, ComBat, SVR, quantiles | - | Linear or local polynomial regression fitting | QC-LSC | - | Internal standard or quantile normalization | Normalized by sum/median, normalized by reference sample/feature, sample-specific normalization and quantile normalization | - | Normalized by specific compound or feature | Linear normalization, normalized by internal standards | Normalized by reference sample | - | Combination of ANOVA and singular value decomposition | Internal standard, medium, biomass (divides the intensity of each metabolite in a specific sample by the value of the biomass measured for this specific sample) | Normalized using a mixture model with batch-specific thresholds and run order correction | Normalized by sum, mean or median of each sample; normalized by specific reference; normalized by internal standards or optimal selection of multiple internal standards |
| Biomarker analysis | √ | - | - | - | - | √ | √ | - | - | - | - | - | - | - | - | - |
| Correlation network analysis | √ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| Metabolite identification | √ | √ | √ | √ | √ | √ | - | √ | √ | √ | √ | √ | - | - | - | - |
| Functional analysis | √ | - | - | - | - | √ | √ | - | √ | - | - | - | - | - | - | - |
| Quality assessment | √ | - | √ | - | - | - | - | √ | - | - | - | - | - | - | - | - |
| Peak picking | √ | √ | √ | - | √ | √ | √ | √ | √ | √ | √ | √ | - | - | - | - |
| HTML-Based report | √ | - | - | - | - | - | (PDF) | - | (PDF) | - | - | - | - | - | - | - |
Fig. 1 Overview of metaX. This figure summarizes the main modules, functions and features of metaX, including the input data.
Fig. 2 User interface of metaX for quality assessment and normalization evaluation
Fig. 3 QC charts generated by metaX. a The distribution of feature intensities before normalization. b The distribution of feature intensities after normalization. c The correlation plot of QC samples before normalization. d The correlation plot of QC samples after normalization. e The missing value distribution in experimental and QC samples. f The CV distribution of all features before and after normalization for each group
Fig. 4 QC charts generated by metaX. a The summed intensity of all features per sample before normalization over the analysis time (injection order). b The summed intensity of all features per sample after normalization over the analysis time (injection order). c The number of features per sample over the analysis time (injection order). d The PCA score plot for the raw feature intensity data. e The PCA score plot for the normalized data
Fig. 5 Comparison of different normalization methods by PCA. a none, b QC-RSC, c ComBat, d SVR, e PQN, f sum, g VSN and h quantiles. Each point represents a sample; samples are color-coded by group and shape-coded by batch
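Two of the methods compared in Fig. 5, sum normalization and probabilistic quotient normalization (PQN), are simple enough to sketch. The following is a minimal illustration of the general techniques, not the metaX implementation. The function names, the list-of-lists data layout (one list of feature intensities per sample), and the use of the per-feature median over all samples as the PQN reference are assumptions made for this example.

```python
from statistics import median


def sum_normalize(samples):
    """Scale each sample so that its feature intensities sum to 1.

    Assumes every sample has a positive total intensity.
    """
    normalized = []
    for intensities in samples:
        total = sum(intensities)
        normalized.append([x / total for x in intensities])
    return normalized


def pqn_normalize(samples):
    """Probabilistic quotient normalization against a median reference profile.

    Assumes positive intensities; the reference choice is an assumption here.
    """
    n_features = len(samples[0])
    # Reference profile: per-feature median across all samples.
    reference = [median(s[j] for s in samples) for j in range(n_features)]
    normalized = []
    for intensities in samples:
        # Quotients of this sample against the reference (zero references skipped).
        quotients = [x / r for x, r in zip(intensities, reference) if r > 0]
        # The median quotient estimates the sample's overall dilution factor.
        factor = median(quotients)
        normalized.append([x / factor for x in intensities])
    return normalized
```

With two samples that differ only by a dilution factor, for example `[[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]`, both functions map the samples onto the same profile. PQN is often applied after an initial sum normalization; that step is omitted here for brevity.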
The comparison of different normalization methods
| Methods | No. of peaks | No. of peaks (CV ≤ 30%) a | DEF b | Mean (CV) CHD c | Mean (CV) Health d | Mean (CV) QC e |
|---|---|---|---|---|---|---|
| ComBat | 1438 | 930 | 127 | 0.4261 | 0.3816 | 0.1636 |
| none | 1438 | 527 | 65 | 0.4865 | 0.4739 | 0.2114 |
| QC-RSC | 1438 | 1191 | 178 | 0.5108 | 0.4664 | 0.1098 |
| SVR | 1438 | 1293 | 170 | 0.4853 | 0.4583 | 0.1081 |
| PQN | 1438 | 793 | 125 | 0.4945 | 0.4681 | 0.1777 |
| Quantiles | 1438 | 740 | 118 | 0.4911 | 0.4646 | 0.1895 |
| sum | 1438 | 761 | 119 | 0.5044 | 0.4733 | 0.1979 |
| VSN | 1438 | 772 | 120 | 0.5014 | 0.4761 | 0.1912 |
Note:
a The number of peaks with CV ≤ 30% in QC samples after normalization
b DEF: differentially expressed features with q-value ≤ 0.05, fold change ≥ 1.5 or fold change ≤ 0.667, and VIP ≥ 1
c Mean (CV) CHD: the average CV of peaks in the CHD disease group
d Mean (CV) Health: the average CV of peaks in the healthy group
e Mean (CV) QC: the average CV of peaks in the QC group
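The criteria in the notes above can be illustrated with a short sketch. It is not taken from metaX. The function names and data layout are hypothetical, the CV is computed with the sample standard deviation (an assumption, since the notes do not say which estimator is used), and the DEF rule is read as q-value AND (fold change up OR down) AND VIP, which is one interpretation of note b.

```python
from statistics import mean, stdev


def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def count_stable_features(qc_matrix, max_cv=0.30):
    """Count features whose CV across QC injections is at most max_cv.

    qc_matrix holds one list per feature, with that feature's intensities
    over the QC samples (hypothetical layout for this example).
    """
    return sum(1 for feature in qc_matrix if cv(feature) <= max_cv)


def is_differential(q_value, fold_change, vip):
    """Apply the DEF thresholds from note b (grouping of terms is an assumption)."""
    changed = fold_change >= 1.5 or fold_change <= 0.667
    return q_value <= 0.05 and changed and vip >= 1.0
```

For instance, a feature measured as 90, 100 and 110 in three QC injections has a CV of 0.1 and counts as stable, whereas one measured as 10, 100 and 190 has a CV of 0.9 and does not.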
Fig. 6 The score and loading plots of PCA. a Score plot of PCA. b Loading plot of PCA. Each point in the score plot represents a sample, and the samples are color-coded by group. The QC samples were removed before performing the PCA
Fig. 7 The score and permutation test plots of PLS-DA and OPLS-DA. a Score plot of PLS-DA. R2Y: 0.908, Q2Y: 0.854. b Permutation test plot of PLS-DA, p-value ≤ 0.05. c Score plot of OPLS-DA. R2Y: 0.905, Q2Y: 0.847. d Permutation test plot of OPLS-DA, p-value ≤ 0.05. Each point in the score plots (a and c) represents a sample, and the samples are color-coded by group. The permutation tests used 200 permutations
The biomarkers selected by metaX
| MZ | RT (min) | Mass | HMDB | Name | Delta (ppm) | Chemical formula |
|---|---|---|---|---|---|---|
| 308.0498 | 10.46 | 285.0629 | HMDB14387 | Cladribine | −8.18 | C10H12ClN5O3 |
| 424.3412 | 11.94 | 423.3349 | HMDB06469 | Linoleyl carnitine | −2.31 | C25H45NO4 |
| 155.0281 | 2.81 | 116.066 | HMDB32411 | 2-Methyl-1-methylthio-2-butene | −8.77 | C6H12S |
| 130.0499 | 3.43 | 129.0426 | HMDB00267 | Pyroglutamic acid | 0.15 | C5H7NO3 |
| 174.9913 | 2.30 | NULL | NULL | NULL | NULL | NULL |
| 309.0533 | 10.47 | 270.0892 | HMDB33940 | Vignafuran | 3.44 | C16H14O4 |
| 425.3446 | 11.94 | 424.3341 | HMDB06327 | Alpha-Tocotrienol | 7.62 | C29H44O2 |
| 324.0443 | 9.33 | 301.0563 | HMDB01062 | N-Acetyl-D-Glucosamine 6-Phosphate | −3.86 | C8H16NO9P |
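The Delta (ppm) column reports the mass error between an observed m/z and the theoretical m/z of the matched compound. The sketch below shows the usual form of that calculation. The table states neither the adduct nor the sign convention, so the protonated adduct default and the observed-minus-theoretical sign are assumptions, and the function names are hypothetical. The sketch is not expected to reproduce the tabulated values exactly.

```python
PROTON_MASS = 1.007276  # Da; assumes a singly protonated [M+H]+ adduct


def theoretical_mz(neutral_mass, adduct_shift=PROTON_MASS):
    """Theoretical m/z of a singly charged ion: neutral mass plus adduct shift."""
    return neutral_mass + adduct_shift


def ppm_error(observed_mz, expected_mz):
    """Mass error in parts per million (observed minus expected, assumed sign)."""
    return (observed_mz - expected_mz) / expected_mz * 1e6
```

A call such as `ppm_error(130.0499, theoretical_mz(129.0426))` gives the error for an observed ion under the protonated-adduct assumption; other adducts (for example sodium) would need a different `adduct_shift`.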
Fig. 8 The ROC curve result of the six selected metabolites
Fig. 9 The differential correlation network. The six communities with the largest numbers of nodes are color-coded. Detailed information about the samples and their communities is presented in Table S3