Adam J Carroll, Murray R Badger, A Harvey Millar.
Abstract
BACKGROUND: Standardization of analytical approaches and reporting methods via community-wide collaboration can work synergistically with web-tool development to result in rapid community-driven expansion of online data repositories suitable for data mining and meta-analysis. In metabolomics, the inter-laboratory reproducibility of gas-chromatography/mass-spectrometry (GC/MS) makes it an obvious target for such development. While a number of web-tools offer access to datasets and/or tools for raw data processing and statistical analysis, none of these systems is currently set up to act as a public repository by easily accepting, processing and presenting publicly submitted GC/MS metabolomics datasets for public re-analysis.
DESCRIPTION: Here, we present MetabolomeExpress, a new File Transfer Protocol (FTP) server and web-tool for the online storage, processing, visualisation and statistical re-analysis of publicly submitted GC/MS metabolomics datasets. Users may search a quality-controlled database of metabolite response statistics from publicly submitted datasets by a number of parameters (e.g. metabolite, species, organ/biofluid). Users may also perform meta-analysis comparisons of multiple independent experiments or re-analyse public primary datasets via user-friendly tools for t-test, principal components analysis, hierarchical cluster analysis and correlation analysis. They may interact with chromatograms, mass spectra and peak detection results via an integrated raw data viewer. Researchers who register for a free account may upload (via FTP) their own data to the server for online processing via a novel raw data processing pipeline.
Year: 2010 PMID: 20626915 PMCID: PMC2912306 DOI: 10.1186/1471-2105-11-376
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Structural overview of the MetabolomeExpress webserver. MetabolomeExpress consists of four interacting layers: i) an FTP repository; ii) an SQL database; iii) a set of server-side data-processing modules; and iv) a web-interface. The contents and general structure of each layer are indicated schematically. Further details are provided in the main text.
Figure 2. Overview of the MetabolomeExpress data processing pipeline. Data flows through the MetabolomeExpress pipeline are illustrated schematically as a series of processing steps. File formats generated at each processing stage are indicated in brackets. Further details are provided in the main text.
Comparison of AMDIS and MetabolomeExpress MSRI library matching performance: summary of results.
| | AMDIS (setting 1) | AMDIS (setting 2) | MetabolomeExpress (setting 1) | MetabolomeExpress (setting 2) |
|---|---|---|---|---|
| Total Peak Identifications Reported | 152 | 165 | 153 | 170 |
| True Positive | 140 | 150 | 149 | 163 |
| False Positive | 7 | 12 | 0 | 1 |
| False Negative | 35 | 24 | 20 | 7 |
| Ambiguous | 5 | 5 | 6 | 7 |
A raw data file from a representative GC/MS analysis of a complex methanolic plant tissue extract was processed and searched against a single MSRI reference library using either AMDIS or MetabolomeExpress. Each software package was used under two different sensitivity settings to allow a more meaningful comparison of results. All peak identification results were manually verified by inspection of relevant raw GC/MS signals as true positives, false positives, false negatives or ambiguous calls. False negatives were cases where a search failed to identify a component that was successfully identified by another search. Detailed results including manual validation comments are presented in additional file 4.
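The counts in the table can be condensed into precision and recall per search. The sketch below is illustrative only: the "setting 1" and "setting 2" labels are placeholders for the first and second column of each package (the source does not name the sensitivity settings), ambiguous calls are left out of both ratios by assumption, and because false negatives are defined relative to the other searches, the recall figure is relative rather than absolute.

```python
# Counts copied from the comparison table. The "setting 1/2" labels are
# placeholders: the source does not name the two sensitivity settings.
RESULTS = {
    "AMDIS, setting 1": {"tp": 140, "fp": 7, "fn": 35},
    "AMDIS, setting 2": {"tp": 150, "fp": 12, "fn": 24},
    "MetabolomeExpress, setting 1": {"tp": 149, "fp": 0, "fn": 20},
    "MetabolomeExpress, setting 2": {"tp": 163, "fp": 1, "fn": 7},
}


def precision(tp, fp):
    """Fraction of reported (non-ambiguous) identifications that were correct."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of components found by any search that this search also found."""
    return tp / (tp + fn)


def summarise(results):
    """Map each search name to a (precision, recall) tuple."""
    return {
        name: (precision(c["tp"], c["fp"]), recall(c["tp"], c["fn"]))
        for name, c in results.items()
    }


if __name__ == "__main__":
    for name, (p, r) in summarise(RESULTS).items():
        print(f"{name}: precision={p:.3f}, recall={r:.3f}")
```

Treating ambiguous calls as neither correct nor incorrect is one possible convention; counting them as errors would lower every precision value slightly.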
Figure 3. Validation experiment: randomised combinatorial metabolite standard mixing. Three metabolite mixtures each contained a set of approximately 30 different non-co-eluting high-purity authentic metabolite standards at known concentrations (typical metabolite concentration: 200 ng/μl). The components of these mixtures were chosen such that no chromatographic co-elution would occur between components of the same mixture but co-elution would occur between components of different mixtures when two or more of the mixtures were combined into a single analysis. An eight-point dilution series was prepared from each of the three mixtures, generating three sets of eight solutions. The order of each dilution series was randomised to generate a randomised mixing protocol table and aliquots of solutions were combined accordingly. This randomised mixing process was repeated 5 times to generate 5 sets of 8 solutions (40 solutions). These complex solutions, each containing the same set of 85 metabolite standards, were analysed by a standard GC/MS metabolomics protocol.
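One way to read the randomised mixing protocol described in the Figure 3 caption is sketched below. This is an interpretation, not the authors' script: the function names, the use of integer dilution levels 1 to 8, and the seed are all assumptions made for illustration. Each of the three dilution series is shuffled independently, and row i of the resulting table lists which dilution level of each mixture goes into combined solution i.

```python
import random

N_MIXES = 3        # three metabolite mixtures (from the caption)
N_DILUTIONS = 8    # eight-point dilution series per mixture
N_REPLICATES = 5   # the randomised mixing was repeated 5 times


def mixing_table(rng):
    """Build one randomised mixing table.

    Returns 8 rows; each row is a tuple of dilution levels
    (mix 1, mix 2, mix 3) to combine into a single solution.
    """
    orders = []
    for _ in range(N_MIXES):
        order = list(range(1, N_DILUTIONS + 1))
        rng.shuffle(order)  # randomise the order of this dilution series
        orders.append(order)
    return list(zip(*orders))


def full_protocol(seed=0):
    """Build all replicate tables (5 tables of 8 solutions = 40 solutions)."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    return [mixing_table(rng) for _ in range(N_REPLICATES)]


if __name__ == "__main__":
    for rep, table in enumerate(full_protocol(), start=1):
        print(f"replicate {rep}: {table}")
```

Because each series is shuffled independently, the concentrations of analytes from different mixtures are decoupled across the 40 solutions, which is what lets co-eluting analytes be told apart by their concentration profiles.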
Figure 4. Validation of the MetabolomeExpress MSRI Library Matching algorithm. The challenge dataset described in Figure 3 was processed using MetabolomeExpress and the strength of the relationship between metabolite concentration and reported signal intensity assessed for each metabolite derivative peak by calculating coefficients of determination (i.e. R² values). Linear regression plots are shown for (A) the best performing peak, Xylose (4TMS) and (B) the worst performing peak, Glutamine (3TMS). A histogram (C) shows the distribution of R² values across the 81 metabolite derivative peaks examined.
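For a simple linear regression of signal intensity on concentration, the coefficient of determination equals the squared Pearson correlation. The sketch below computes it that way; the function name and the example numbers are hypothetical and are not taken from the validation dataset.

```python
def r_squared(concentration, signal):
    """Coefficient of determination for a simple linear fit of signal on
    concentration, computed as the squared Pearson correlation."""
    n = len(concentration)
    if n != len(signal) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(concentration) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in concentration)
    syy = sum((y - mean_y) ** 2 for y in signal)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentration, signal))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Made-up values for illustration only (not from the paper).
    conc = [1.0, 2.0, 4.0, 8.0]
    intensity = [2.1, 3.9, 8.2, 15.8]
    print(f"R^2 = {r_squared(conc, intensity):.4f}")
```

Applied per metabolite derivative peak across the 40 solutions, this yields one R² value per peak, which is the quantity summarised in the histogram.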
Figure 5. Poor 'concentration:signal intensity' correlation for glutamine (3TMS) was due to its time-dependent consumption by an unknown chemical process. The graph above shows the relationship between Glutamine (3TMS) signal intensity and time of analysis for 5 derivatised samples each originally supplied with the same amount of glutamine (450 pmol/μl). All samples were derivatised at the same time so a later time of analysis corresponds to a greater sample age.
Figure 6. Global validation by correlation analysis of a combinatorial metabolite standard mixing GC/MS data set. The data matrix generated by MetabolomeExpress processing of the combinatorial standard mixing GC/MS data set (see Figure 3) was filtered to remove internal standards and analytes of unknown structure and then used to generate a correlation matrix which was used as input for hierarchical clustering in the statistical package, R. The reordered correlation matrix is shown as a heatmap with colours corresponding to analyte-analyte correlation coefficients (see colour bar to the right). As expected, analytes were clustered into three major clusters corresponding to metabolite sub-mixes 1, 2 and 3 (see Figure 3). Analyte names have been coloured according to their metabolite submixture of origin (Mix 1 = red; Mix 2 = green; Mix 3 = blue). * Known breakdown product.
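The idea behind this global validation can be sketched in a few lines. The authors used R; the Python version below is a stand-in. It assumes Pearson correlation, a distance of 1 - r, and average linkage, none of which the caption specifies. All names and the toy profiles are hypothetical.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / (sxx * syy) ** 0.5


def correlation_matrix(profiles):
    """Analyte-analyte correlations as a dict keyed by (name_a, name_b)."""
    names = list(profiles)
    return {(a, b): pearson(profiles[a], profiles[b]) for a in names for b in names}


def cluster(profiles, n_clusters):
    """Agglomerative clustering on distance 1 - r with average linkage
    (an assumed linkage; the source does not state which one was used)."""
    corr = correlation_matrix(profiles)
    clusters = [[name] for name in profiles]

    def distance(c1, c2):
        total = sum(1 - corr[(a, b)] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the closest pair; i < j, so deleting j keeps i valid.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: distance(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Toy signal profiles (made up): "a*" follow one dilution order, "b*" another.
    toy = {
        "a1": [1, 2, 3, 4],
        "a2": [2, 4, 6, 8.1],
        "b1": [4, 1, 3, 2],
        "b2": [8, 2, 6.2, 4],
    }
    print(cluster(toy, n_clusters=2))
```

Analytes that originate from the same sub-mix share a dilution order, so their signal profiles correlate strongly and they merge early; with correct peak identification, stopping at three clusters should recover the three sub-mixes.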
Features distinguishing MetabolomeExpress from existing GC/MS metabolomics web-tools.
| Feature | PlantMetabolomics.org | SetupX | MeltDB | MetaboAnalyst | MetabolomeExpress |
|---|---|---|---|---|---|
| Number of peer-reviewed, biology-focused publication datasets publicly available | 0* | 0 | 0 | 0 | 8 |
| Accepts public raw data submissions | + | + | |||
| Accepts public processed data submissions | + | + | |||
| Long-term data storage | + | + | + | + | |
| MSI-compliant metadata | + | + | + | ||
| Raw GC/MS data files | + | + | |||
| Mass peak lists | + | + | |||
| Library match lists | + | + | |||
| Data matrices | + | + | + | ||
| Mass-spectral and retention-index libraries | + | ||||
| Precomputed fold-change/comparative statistical results | + | + | |||
| PCA Results | + | + | |||
| HCA Results | + | + | |||
| Statistical heatmap spreadsheets | + | ||||
| Metabolite-metabolite correlation network graphs | + | ||||
| Metabolite-metabolite correlation tables | + | ||||
| MapMan importable fold-change data files | + | ||||
| Cytoscape-importable fold-change data files | + | ||||
| Provides integrated access to peak detection | + | + | + | ||
| Provides own peak detection algorithm | + | ||||
| Performs peak identification without offline pre-processing | + | ||||
| Supports upload of custom MSRI libraries | + | ||||
| Allows users to perform custom statistical comparisons | + | + | + | + | |
| Fold change | + | + | + | + | |
| t-test | + | + | + | ||
| Principal Components Analysis (PCA) | + | + | + | ||
| Cluster Analysis | + | + | + | ||
| Metabolite-metabolite correlation Analysis | + | ||||
| Provides chromatogram visualisation | + | + | |||
| Chromatogram viewer displays TICs | + | + | |||
| Chromatogram viewer displays EICs | + | ||||
| Chromatogram viewer allows overlays of multiple chromatograms | + | ||||
| Chromatogram viewer zoomable | + | ||||
| Provides MS spectral visualisation | + | + | |||
| Provides MS spectra of library entries | + | + | |||
| Provides MS spectra of arbitrary MS scans | + | ||||
Features of MetabolomeExpress that distinguish it from the existing GC/MS metabolomics web-tools - PlantMetabolomics.org [28], SetupX [15], MeltDB [14] and MetaboAnalyst [13] - are listed. A '+' indicates that a feature is present. * We were unable to find any information indicating that any of the datasets stored in PlantMetabolomics.org have been published in peer-reviewed, biology-focused research articles.