Dijun Chen, Liang-Yu Fu, Dahui Hu, Christian Klukas, Ming Chen, Kerstin Kaufmann.
Abstract
The wave of high-throughput technologies in genomics and phenomics is enabling data to be generated on an unprecedented scale and at a reasonable cost. Exploring the large-scale data sets generated by these technologies to derive biological insights requires efficient bioinformatic tools. Here we introduce an interactive, open-source web application (HTPmod) for high-throughput biological data modeling and visualization. HTPmod is implemented with the Shiny framework, integrating the computational power and professional visualization of R and including various machine-learning approaches. We demonstrate that HTPmod can be used for modeling and visualizing large-scale, high-dimensional data sets (such as multiple omics data) in a broad range of contexts. By reinvestigating example data sets from recent studies, we find not only that HTPmod can reproduce results from the original studies in a straightforward fashion and within a reasonable time, but also that novel insights may be gained from fast reinvestigation of existing data by HTPmod.
Year: 2018 PMID: 30271970 PMCID: PMC6123733 DOI: 10.1038/s42003-018-0091-x
Source DB: PubMed Journal: Commun Biol ISSN: 2399-3642
Fig. 1 The HTPmod Shiny application for high-throughput data modeling and visualization. a The overall design and workflow of HTPmod. b The growMod module for plant growth modeling. Example results shown here are based on data from ref. [1]. c The predMod application for predicting traits of interest from high-dimensional data using various prediction models. The upper panel shows the general workflow of predMod. The lower panel shows example output of regression (left) or classification (right) from predMod. d High-throughput data visualization with the htpdVis application. Example graphs are generated by htpdVis using data from refs. [1,25]
Fig. 2 Prediction of gene expression changes using transcription factor binding data in Arabidopsis. Data were obtained from ref. [21]; the full names of the models are given in Supplementary Table 2. All prediction models were run in predMod with default parameter settings. Pearson's correlations and the corresponding p-values (in parentheses) are shown
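As an illustration of the statistic reported in Fig. 2, the following minimal sketch computes Pearson's correlation between observed and predicted values in plain Python. It is not HTPmod code (HTPmod is an R/Shiny application); the function name `pearson_r` and the example values are hypothetical, and the p-value shown in the figure is not computed here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only (not data from the study).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.1]
r = pearson_r(observed, predicted)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 indicates that predictions track the observed expression changes closely; comparing r across models is one way to read a panel such as Fig. 2.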
Fig. 3 Relative importance of features in the prediction of gene expression changes. A GLMNET (lasso and elastic-net regularized generalized linear model) regression model in predMod was used to predict gene expression changes from binding strength in both ABA- and mock-treated conditions. The barplot shows the relative importance of the binding features in the prediction. The result is consistent with that of the original study [21]
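To make the notion of relative feature importance in Fig. 3 concrete, here is a minimal sketch of one simple convention for a linear model: take the absolute value of each fitted coefficient and scale so that the largest equals 100. This is an assumption for illustration, not necessarily how predMod derives the barplot; the function name, feature names, and coefficient values are hypothetical.

```python
def relative_importance(coefficients):
    """Scale absolute coefficients so that the largest one equals 100."""
    magnitudes = {name: abs(value) for name, value in coefficients.items()}
    largest = max(magnitudes.values(), default=0.0)
    if largest == 0:
        # All coefficients are zero (or there are none): nothing to rank.
        return {name: 0.0 for name in magnitudes}
    return {name: 100.0 * m / largest for name, m in magnitudes.items()}


# Hypothetical coefficients of a regularized linear model
# (not values from the study).
coefs = {
    "TF_A_ABA": 0.8,
    "TF_A_mock": -0.2,
    "TF_B_ABA": 0.0,  # a lasso penalty can shrink coefficients exactly to zero
    "TF_B_mock": 0.4,
}
importance = relative_importance(coefs)
ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.1f}")
```

Coefficient magnitudes are comparable in this way only when the input features are on a common scale, for example after standardization.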
Fig. 4 Visualization of floral organ-specific transcriptome data in Arabidopsis [42] via t-SNE plots [33] using htpdVis. The pattern of organ-specific expression for genes with known organ signature is shown in the three-dimensional t-SNE plots in 2D (a) or 3D (b) views. c t-SNE plot in 2D view showing organ-specific expression pattern by adding more genes with unknown organ signature. Default parameter settings were used in all of these analyses