Przemyslaw Stempor, Julie Ahringer.
Abstract
Experiments involving high-throughput sequencing are widely used for analyses of chromatin function and gene expression. Common examples are the use of chromatin immunoprecipitation for the analysis of chromatin modifications or factor binding, enzymatic digestions for chromatin structure assays, and RNA sequencing to assess gene expression changes after biological perturbations. To investigate the pattern and abundance of coverage signals across regions of interest, data are often visualized as profile plots of average signal or as heatmaps of stacked rows of signal. We found that available plotting software was either slow and laborious or difficult to use for investigators with little computational training, which inhibited wide data exploration. To address this need, we developed SeqPlots, a user-friendly exploratory data analysis (EDA) and visualization software for genomics. After choosing groups of signal and feature files and defining plotting parameters, users can generate profile plots of average signal or heatmaps clustered using different algorithms in a matter of seconds through the graphical user interface (GUI) controls. SeqPlots accepts all major genomic file formats as input and can also generate and plot user-defined motif densities. Profile plots and heatmaps are highly configurable, and batch operations can be used to generate a large number of plots at once. SeqPlots is available as a GUI application for Mac, Windows and Linux, or as an R/Bioconductor package. It can also be deployed on a server for remote and collaborative usage. The analysis features and ease of use of SeqPlots encourage wide data exploration, which should aid the discovery of novel genomic associations.
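To make the two plot types concrete, here is a minimal Python sketch of the underlying idea. It is not SeqPlots code (SeqPlots is an R/Bioconductor package), and the function name `average_profile` and the toy coverage values are assumptions made for illustration. Each region of interest contributes one row of binned signal: the stacked rows are what a heatmap displays, and the column-wise mean of those rows is the average profile.

```python
from statistics import mean


def average_profile(rows):
    """Return the column-wise mean of equally long signal rows.

    Each row holds the binned signal for one region of interest,
    so the result has one average value per bin.
    """
    if not rows:
        raise ValueError("need at least one region")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all regions must span the same number of bins")
    # zip(*rows) walks the matrix column by column (bin by bin).
    return [mean(column) for column in zip(*rows)]


# Toy data (made-up numbers): 3 regions x 5 bins.
# Stacked as-is, these rows are what a heatmap would show.
heatmap_rows = [
    [0.0, 1.0, 4.0, 1.0, 0.0],
    [0.0, 2.0, 6.0, 2.0, 0.0],
    [0.0, 3.0, 8.0, 3.0, 0.0],
]

profile = average_profile(heatmap_rows)
print(profile)
```

Real data would first need to be read from a signal file and binned around each feature, which this sketch deliberately leaves out.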
Keywords: aggregate gene profile plot; hierarchical cluster; k-means cluster; self-organizing maps; unsupervised machine learning
Year: 2016 PMID: 27918597 PMCID: PMC5133382 DOI: 10.12688/wellcomeopenres.10004.1
Source DB: PubMed Journal: Wellcome Open Res ISSN: 2398-502X
File formats accepted by SeqPlots.
| Format | File extension |
|---|---|
| General Feature Format | gff |
| Browser Extensible Data | bed |
| General Transfer Format | gtf |
| bigWig Track Format (a) | bw |
| Wiggle Track Format (b) | wig |
| BedGraph Track Format (b) | bdg or bedGraph |
| Binary Sequence Alignment/Map (c) | bam |

(a) preferred track format
(b) converted to bigWig upon upload
(c) coverage is calculated using all aligned reads
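The table and its footnotes can be read as a small lookup rule. The Python sketch below is only an illustration of that rule, not part of SeqPlots: the function name `check_upload`, the returned strings, and the case-insensitive matching of extensions are all assumptions.

```python
from pathlib import Path

# Extensions listed in the table above, stored in lower case.
ACCEPTED_EXTENSIONS = {"gff", "bed", "gtf", "bw", "wig", "bdg", "bedgraph", "bam"}

# Per the table footnotes, these formats are converted to bigWig upon upload.
CONVERTED_TO_BIGWIG = {"wig", "bdg", "bedgraph"}


def check_upload(filename):
    """Classify a file name by its extension (hypothetical helper)."""
    # Assumption: extensions are matched case-insensitively.
    extension = Path(filename).suffix.lstrip(".").lower()
    if extension not in ACCEPTED_EXTENSIONS:
        return "not in the accepted list"
    if extension in CONVERTED_TO_BIGWIG:
        return "accepted, converted to bigWig"
    return "accepted as-is"


for name in ["genes.gff", "h3k4me3.bw", "h2az.bedGraph", "notes.txt"]:
    print(name, "->", check_upload(name))
```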
Figure 1. An example of SeqPlots workflow to analyze H2A.Z, H3K36me3, H3K4me3 and CpG density across C. elegans protein coding TSSs separated by expression quintiles.
(a,b) Top, GUI showing clickable grid of signal/feature combinations. Bottom, plots resulting from the clicked selections. (c) Plots of individual signals across genes in the top expression quintile anchored at TSSs, plotting 1 kb upstream and 1.5 kb downstream of TSSs. (d) Heatmaps generated using k-means clustering (3 clusters) of TSSs in the top expression quintile, using H3K36me3 signal for clustering. (e) Average signal profiles and (f) heatmaps generated from cluster 2 (C2) in (d), made by downloading the full cluster data and uploading a file with the cluster 2 regions. Heatmaps were clustered using H3K4me3, H2A.Z and CpG signals. Data used to generate this figure are available from GEO (H3K4me3: GSE28770 - https://www.be-md.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE28770; H3K36me3: GSE62833 - https://www.be-md.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62833; H2A.Z/HTZ-1: GSE49717 - https://www.be-md.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE49717). TSS annotations are from [7, 8], or from Wormbase/Ensembl 81 if a gene had no TSS annotation in either dataset (available from https://gist.github.com/Przemol/c5114067cc2dd236ed1dbcaf41003472). Genes were divided into expression bins using DCPM values from Gerstein et al. [9].
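Two steps in this legend can be sketched in code: anchoring a window at each TSS (1 kb upstream, 1.5 kb downstream) and grouping the resulting signal rows into 3 clusters with k-means. The Python below is a minimal sketch under stated assumptions, not the SeqPlots implementation. The names `tss_window` and `kmeans`, the strand-aware handling, the clipping at coordinate 0, the deterministic farthest-first choice of starting centers, and the toy signal rows are all illustrative choices.

```python
def tss_window(tss, strand, upstream=1000, downstream=1500):
    """Return (start, end) around a TSS; upstream follows the gene's strand."""
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, upstream lies at higher coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(0, start), end  # assumption: clip at the start of the sequence


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, max_iterations=100):
    """Plain Lloyd's k-means; returns one cluster label per row."""
    # Assumption: farthest-first initialization, chosen to keep the sketch deterministic.
    centers = [list(rows[0])]
    while len(centers) < k:
        farthest = max(rows, key=lambda r: min(squared_distance(r, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each row joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centers[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy signal rows (made-up values), one per TSS, 4 bins each.
signal_rows = [
    [0, 0, 0, 0], [0, 1, 0, 0],  # little signal
    [5, 5, 5, 5], [5, 6, 5, 5],  # signal across the whole window
    [0, 0, 9, 9], [0, 1, 9, 9],  # signal downstream only
]
labels = kmeans(signal_rows, k=3)
print(tss_window(5000, "+"), tss_window(5000, "-"), labels)
```

Sorting the rows by cluster label before drawing them would give a heatmap grouped in the manner of panel (d). Selecting the rows with one label corresponds to the legend's step of extracting a single cluster for further plotting.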