Aimin Ma1,2, Xiaoquan Qi1,2.
Abstract
Plants produce a variety of metabolites that are essential for plant growth and human health. To fully understand the diversity of metabolites in particular plants, many methods have been developed for metabolite detection and data processing. In data processing, effectively reducing false-positive peaks, analyzing large-scale metabolic data, and annotating plant metabolites remain challenging. In this review, we introduce and discuss some prominent methods that could be exploited to solve these problems, including a five-step filtering method for reducing false-positive signals in LC-MS analysis, QPMASS for analyzing ultra-large GC-MS data, and MetDNA for annotating metabolites. The main applications of plant metabolomics in species discrimination, metabolic pathway dissection, population genetic studies, and some other aspects are also highlighted. To further promote the development of plant metabolomics, more effective and integrated methods and platforms for metabolite detection, as well as comprehensive databases for metabolite identification, are needed. With the improvement of these technologies and the development of genomics and transcriptomics, plant metabolomics will be widely used in many fields.
Keywords: application; data-processing methods; metabolites; plant metabolomics
Year: 2021 PMID: 34746766 PMCID: PMC8554038 DOI: 10.1016/j.xplc.2021.100238
Source DB: PubMed Journal: Plant Commun ISSN: 2590-3462
Figure 1. The data-processing procedures of plant metabolic data.
PCA, principal-component analysis; HCA, hierarchical clustering analysis; PLS, partial least squares analysis; O-PLS, orthogonal partial least squares analysis.
Summary of software for analyzing plant metabolic data.
| Software | Description | Compatibility | Language | Reference |
|---|---|---|---|---|
| XCMS | data preprocessing, alignment, and quantitation; processing large-scale datasets is time-consuming | LC-MS, GC-MS | R | |
| MetAlign | data preprocessing, alignment, and quantitation; processing large-scale datasets is time-consuming | LC-MS, GC-MS | C | |
| MZmine | peak alignment based on a distributed computing algorithm; multiple modules are available for data visualization | LC-MS, GC-MS | Java | |
| AMDORAP | accurate | LC-MS | R | |
| MAIT | comprehensive statistical analysis tool for LC-MS metabolic data, but the data normalization is not included | LC-MS | R | |
| OpenMS | hundreds of workflows are available for data processing, and a highly flexible and professional software environment is provided for users | LC-MS | C++ | |
| metaX | a comprehensive workflow for untargeted metabolomics data, including data preprocessing, metabolite identification, pathway annotation, and biomarker selection | LC-MS, GC-MS | R | |
| ROIMCR | ROI-based peak detection and integration; an MCR-ALS method is used to resolve peaks from mixtures | LC-MS | MATLAB | |
| MetaboAnalyst | a powerful platform for metabolomics data analysis, including enrichment analysis, pathway analysis, and statistical analysis; however, the original data need to be converted and aligned by other software | LC-MS, GC-MS, NMR | Java, R | |
| MAVEN | machine learning-based peak quality assessment; pathway and isotope-labeling visualization | LC-MS | – | |
| apLCMS | a hybrid feature detection approach is used to reduce false-positive and false-negative peaks, but a known-feature database is needed | LC-MS | R | |
| MS-FLO | retention time alignment, accurate mass tolerances, peak height similarity, and Pearson’s correlation analysis-based methods to minimize false-positive peaks | LC-MS | Python | |
| rFPF | an EIC profile-based method to remove false-positive features | LC-MS | MATLAB | |
| Peakonly | precise peak detection using a convolutional neural network-based deep learning method | LC-MS | Python | |
| AMDIS | data deconvolution; peak alignment is not supported | GC-MS | – | |
| ChromaTOF | GC-TOF-MS data deconvolution; without published algorithm descriptions | GC-MS | – | – |
| MetaQuant | target metabolome analysis, but an established library is required | GC-MS | Java | |
| MET-IDEA | target metabolome analysis, but a list containing | GC-MS | – | |
| TagFinder | peak alignment; baseline correction and peak smoothing are not supported | GC-MS | Java | |
| MetaboliteDetector | data deconvolution and peak alignment, with a Qt4-based graphical user interface | GC-MS | C++ | |
| ADAP | data deconvolution and peak alignment using a two-phase approach | GC-MS | C++, R | |
| MS-DIAL | data deconvolution, peak alignment, and annotation | GC-MS | C# | |
| eRah | peak deconvolution and alignment | GC-MS | R | |
| IP4M | 62 independent functions for data preprocessing, peak annotation, and pathway enrichment analysis | LC-MS, GC-MS | Java, Perl, R | |
| autoGCMSDataAnal | TIC peak detection and resolution using raw data; dynamic programming algorithm-based retention time-shift correction | GC-MS | MATLAB | |
| QPMASS | large-scale metabolic data analysis (alignment, backfill, and quantitation) | GC-MS | C++ | |
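
Several tools in the table above reduce false-positive peaks by combining retention-time tolerance, accurate-mass tolerance, and correlation of peak heights across samples. The Python sketch below illustrates that general idea only; it is not the implementation of MS-FLO or any other listed tool. The `Feature` class, the function names, and the thresholds (`mz_tol`, `rt_tol`, `r_min`) are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from math import sqrt


@dataclass
class Feature:
    name: str
    mz: float      # mass-to-charge ratio
    rt: float      # retention time in minutes
    heights: list  # peak heights across samples


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / sqrt(var_x * var_y)


def flag_duplicates(features, mz_tol=0.005, rt_tol=0.1, r_min=0.9):
    """Return name pairs that look like one peak reported twice.

    The default thresholds are illustrative assumptions, not published values.
    """
    flagged = []
    for a, b in combinations(features, 2):
        close_mz = abs(a.mz - b.mz) <= mz_tol
        close_rt = abs(a.rt - b.rt) <= rt_tol
        if close_mz and close_rt and pearson(a.heights, b.heights) >= r_min:
            flagged.append((a.name, b.name))
    return flagged


# Made-up example data: F2 is a near-duplicate of F1, F3 is unrelated.
features = [
    Feature("F1", 180.0634, 5.20, [100, 220, 310, 405]),
    Feature("F2", 180.0636, 5.23, [98, 215, 300, 410]),
    Feature("F3", 342.1162, 9.80, [50, 40, 60, 55]),
]
print(flag_duplicates(features))
```

With the made-up data above, only the pair F1/F2 satisfies all three criteria, so the script prints `[('F1', 'F2')]`. In practice the tolerances would need to be tuned to the instrument's mass accuracy and chromatographic reproducibility.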
Figure 2. The strategy of a five-step filtering approach for metabolic data.
(A) The flow of the five-step filtering approach.
(B) The validation experiments of the five-step filtering approach using artificial samples (left) and biological samples of rice seed (right). Steps 1 to 5 correspond to the data-filtering procedures in the five-step filtering approach, and the retained peaks are the peaks left after data filtering.
Figure 3. The performance of QPMASS software.
(A) The workflow of QPMASS.
(B) The processing time of QPMASS and XCMS for different numbers of GC-TOF-MS (dots) and GC-qMS (triangles) datasets.
(C) Comparison of quantification performance among QPMASS, XCMS, and ChromaTOF. The legend color and circle size correspond to the correlation of peak areas from different software.
(D) The alignment accuracy of QPMASS. The accuracy of alignment was compared between QPMASS and flagme.
Summary of available databases for plant metabolite identification and pathway analysis.
| Database | Compatibility | Description | Link | Reference |
|---|---|---|---|---|
| NIST | LC-MS, GC-MS | one of the most widely used mass spectral reference libraries, including MS/MS spectra, mass spectra for multiple ion adducts, compound names, formulas, CAS numbers, etc. | – | |
| METLIN | LC-MS | includes nearly one million molecular standards with MS/MS data and supports multiple retrieval modes | | |
| BinBase | GC-TOF-MS | peak filtering and annotation using a mass spectral metadata-based filtering algorithm | | |
| MMCD | NMR, LC-MS | supports metabolite identification from both NMR and MS data | | |
| SIRIUS | LC-MS | comprehensive assessment of molecular structure using MS/MS data | | |
| MassBank | LC-MS, GC-MS | a distributed database that includes ESI-MS2 data acquired under different experimental conditions | | |
| ReSpect | LC-MS | plant-specific MS/MS-based data resource and database | | |
| CSI:FingerID | LC-MS/MS | combines fragmentation tree computation and machine learning for molecular structure searching | | |
| LC-MS/MS library | LC-MS/MS | ultra-high-performance liquid chromatography-tandem mass spectral library of plant natural products | | |
| MS2LDA | LC-MS | a Mass2Motifs-based method that annotates metabolites without requiring existing reference spectra and establishes biochemical relationships between molecules | | |
| GNPS | LC-MS | a natural product and metabolomics analysis platform using molecular networks | | |
| NAP | LC-MS | a re-ranking system is used to increase annotation rates | | |
| MetDNA | LC-MS | large-scale and ambiguous identification of metabolites from LC-MS/MS datasets without the need for a standard spectral library | | |
| MMN | LC-MS | MicroTom metabolome and transcriptome dataset | / | |
| KEGG | – | one of the most complete and widely used databases, containing metabolic pathways from a wide variety of organisms | | |
| MetaCyc | – | experimentally elucidated metabolic pathway database | | |
| WikiPathways | – | a biological pathway database, including pathways from more than 30 species | | |
| PMN15 | – | genome-wide metabolic pathway databases for 126 plants | | |