Alexander S Kirpich1,2,3,4, Miguel Ibarra1,2, Oleksandr Moskalenko5, Justin M Fear1,3,4,6, Joseph Gerken1, Xinlei Mi7, Ali Ashrafi1, Alison M Morse1,3,4, Lauren M McIntyre1,2,3,4.
Abstract
BACKGROUND: Metabolomics has the promise to transform the area of personalized medicine with the rapid development of high-throughput technology for untargeted analysis of metabolites. Open-access, easy-to-use analytic tools that are broadly accessible to the biological community need to be developed. While the technology used in metabolomics varies, most metabolomics studies have a set of features identified. Galaxy is an open-access platform that enables scientists at all levels to interact with big data. Galaxy promotes reproducibility by saving histories and enabling the sharing of workflows among scientists.
Year: 2018 PMID: 29678131 PMCID: PMC5910624 DOI: 10.1186/s12859-018-2134-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The SECIMTools structure: The outside cloud represents the Galaxy environment. The inside circle represents the set of SECIMTools. A common data handling and input/output architecture for all the SECIMTools enables the development of analytical workflows without continual data manipulation and reformatting. Most tools expect two files describing the data: one giving information about each sample and the experimental design (design formatted file), and one giving the estimated feature intensities for each sample (wide formatted file). Galaxy expects files in a tab-separated format (tsv). Tools that convert to tsv format from other common formats exist as a part of Galaxy. The output files are result files (e.g. p-values from an ANOVA) and figures (e.g. scatterplots). The result tables are returned to the user in a Galaxy-compatible tsv format. Plots have a common color scheme with a customizable color palette that applies the same coloring scheme to all results. A detailed description of the data formats is given in the user guide
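As an illustration of the two-file input described in the Fig. 1 caption, the following minimal sketch reads a wide formatted table and a design formatted table from tab-separated text and checks that both refer to the same samples. The column names `featureID`, `sampleID` and `group`, the sample data, and the helper functions are hypothetical and chosen only for this example; the authoritative format description is in the SECIMTools user guide.

```python
import csv
import io

# Hypothetical wide formatted table: one row per feature, one column per sample.
WIDE_TSV = "featureID\tS1\tS2\tS3\nF1\t10.0\t12.0\t11.0\nF2\t5.0\t0.0\t7.5\n"
# Hypothetical design formatted table: one row per sample with its group.
DESIGN_TSV = "sampleID\tgroup\nS1\tcontrol\nS2\tcontrol\nS3\ttreated\n"


def read_tsv(text):
    """Parse tab-separated text into a list of dict rows keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def mismatched_samples(wide_rows, design_rows,
                       id_col="featureID", sample_col="sampleID"):
    """Return sample IDs that appear in only one of the two tables."""
    wide_samples = {col for col in wide_rows[0] if col != id_col}
    design_samples = {row[sample_col] for row in design_rows}
    return sorted(wide_samples ^ design_samples)


wide = read_tsv(WIDE_TSV)
design = read_tsv(DESIGN_TSV)
mismatched = mismatched_samples(wide, design)
```

An empty `mismatched` list indicates that every sample column in the wide table has a matching row in the design table, which is the kind of consistency a shared input architecture relies on.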
Fig. 2 Individual tool structure: All tools take input data in the same standard format and share a common visualization manager, which generates outputs in a standard format
Fig. 3 Summary of ANOVA, Random Forest and LASSO/Elastic Net methods with their advantages and disadvantages
Fig. 4 An example of data preprocessing and Quality Control for MS data. The workflow begins with Blank Feature Filtering and removal of the features below the level of detection. In the next step, the Standardized Euclidean Distance, Principal Component Analysis, Run Order Regression, Magnitude Difference, Coefficient of Variation, and Retention Time tools are used for diagnostics. Some tools require log-transformed input data, so the Log/G-Log Transformation tool is included in the workflow. Each tool produces multiple summary flags. The tools' flags are merged and summarized, with the option to delete flagged features
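The flag-and-merge step at the end of the Fig. 4 workflow can be sketched as follows. This is a simplified illustration, not the SECIMTools implementation: the coefficient of variation is computed as sample standard deviation over mean, the cutoff of 0.5 is an arbitrary assumption, and the function names, the example intensities, and the second flag set are hypothetical.

```python
import statistics


def cv_flags(intensities, cutoff=0.5):
    """Map each feature to 1 if its CV (stdev / mean) exceeds cutoff, else 0."""
    flags = {}
    for feature, values in intensities.items():
        mean = statistics.mean(values)
        # A zero mean makes the CV undefined; treat it as infinitely variable.
        cv = statistics.stdev(values) / mean if mean != 0 else float("inf")
        flags[feature] = int(cv > cutoff)
    return flags


def merge_flags(*flag_sets):
    """Flag a feature if any of the contributing flag sets flagged it."""
    merged = {}
    for flags in flag_sets:
        for feature, value in flags.items():
            merged[feature] = max(merged.get(feature, 0), value)
    return merged


# Hypothetical feature intensities across three samples.
data = {"F1": [10.0, 11.0, 9.0], "F2": [1.0, 20.0, 3.0], "F3": [4.0, 4.5, 5.0]}
cv = cv_flags(data)
# Hypothetical flags from another diagnostic tool.
other = {"F1": 1, "F2": 0, "F3": 0}
merged = merge_flags(cv, other)
# Optionally drop every feature flagged by at least one tool.
kept = sorted(f for f, flag in merged.items() if flag == 0)
```

Merging with a logical OR is only one possible policy; a workflow could equally require a feature to be flagged by several tools before deleting it.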
Fig. 5 Workflow for ANOVA and Variable Selection. This workflow compares Ridge Regression (α = 0), Elastic Net (α = 0.5) and LASSO (α = 1) to the results from an ANOVA
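To show how a single mixing parameter α moves between the three methods named in the Fig. 5 caption, the sketch below evaluates an elastic net penalty for a fixed coefficient vector. It assumes the commonly used parameterization λ[α‖β‖₁ + (1 − α)/2 ‖β‖₂²]; the caption states only the α values, so this exact form, the function name, and the example coefficients are assumptions.

```python
def elastic_net_penalty(coefs, alpha, lam=1.0):
    """Elastic net penalty under the assumed parameterization.

    alpha = 0 leaves only the squared L2 term (ridge regression),
    alpha = 1 leaves only the L1 term (LASSO),
    values in between mix the two.
    """
    l1 = sum(abs(b) for b in coefs)
    l2_squared = sum(b * b for b in coefs)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)


# Hypothetical coefficient vector.
beta = [1.0, -2.0, 0.0]
ridge = elastic_net_penalty(beta, alpha=0.0)  # 2.5
mixed = elastic_net_penalty(beta, alpha=0.5)  # 2.75
lasso = elastic_net_penalty(beta, alpha=1.0)  # 3.0
```

The L1 term is what allows coefficients to shrink exactly to zero, which is why the α = 1 and α = 0.5 settings perform variable selection while ridge regression does not.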