Fidel Ramírez¹, Friederike Dündar², Sarah Diehl¹, Björn A Grüning³, Thomas Manke⁴.
Abstract
We present a Galaxy-based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatics background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straightforward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads that enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to meet future demands and developments. The deepTools web server is freely available at http://deeptools.ie-freiburg.mpg.de and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally, either stand-alone or as part of Galaxy.
Year: 2014 PMID: 24799436 PMCID: PMC4086134 DOI: 10.1093/nar/gku365
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Table 1. Overview of currently available deepTools
| Tool name | Type | Input files | Main output | Application |
|---|---|---|---|---|
| bamCorrelate | QC | 2 or more BAM | Clustered heatmap of similarity measures | Determine Pearson or Spearman correlations between read distributions |
| bamFingerprint | QC | 2 BAM | Diagnostic plot | Assess enrichment strength of a ChIP-seq sample versus a control |
| computeGCbias | QC | 1 BAM | Diagnostic plots | Compare expected and observed GC distribution of reads |
| correctGCbias | Normalization | 1 BAM | BAM or bigWig | Obtain GC-corrected read (coverage) file |
| bamCoverage | Normalization | 1 BAM | bedGraph or bigWig | Obtain normalized read coverage of a single BAM file |
| bamCompare | Normalization | 2 BAM | bedGraph or bigWig | Normalize 2 BAM files to each other with a mathematical operation of choice (fold change, log2(ratio), sum, difference) |
| computeMatrix | Visualization | 1 bigWig, at least 1 BED | gzipped table | Calculate the values for heatmaps and summary plots |
| profiler | Visualization | gzipped table from computeMatrix | xy-plot (summary plot) | Average profiles of read coverage for (groups of) genome regions |
| heatmapper | Visualization | gzipped table from computeMatrix | (Un)clustered heatmap of read coverages | Identify patterns of read coverages for genome regions |
Here, we only indicate the main output files, but every data table underlying any image produced by deepTools can be downloaded and used in subsequent analyses. For a comparison of functionalities with previously published web servers, see Supplementary Table S1.
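The two-sample normalization performed by bamCompare (Table 1) amounts to a bin-by-bin operation on two coverage tracks that have first been brought to a comparable scale. The Python sketch below illustrates this under simplifying assumptions: reads have already been counted in identical genomic bins, both libraries are scaled to the size of the smaller one by total read count, and a pseudocount of 1 avoids division by zero. The function names and operation labels (`compare_samples`, `"log2_ratio"`, etc.) are hypothetical; they do not reproduce bamCompare's command-line options or the scaling method it actually uses.

```python
import math


def read_count_scale_factors(total_a, total_b):
    """Scale factors that shrink the larger library to the size of the smaller one."""
    smaller = min(total_a, total_b)
    return smaller / total_a, smaller / total_b


def compare_samples(counts_a, counts_b, operation="log2_ratio", pseudocount=1.0):
    """Combine two equally binned read-count vectors bin by bin.

    Scaling by total read count and the fixed pseudocount are simplifying
    assumptions of this sketch, not a description of bamCompare itself.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must be counted in the same bins")
    factor_a, factor_b = read_count_scale_factors(sum(counts_a), sum(counts_b))
    combined = []
    for raw_a, raw_b in zip(counts_a, counts_b):
        a, b = raw_a * factor_a, raw_b * factor_b
        if operation == "fold_change":
            combined.append((a + pseudocount) / (b + pseudocount))
        elif operation == "log2_ratio":
            combined.append(math.log2((a + pseudocount) / (b + pseudocount)))
        elif operation == "sum":
            combined.append(a + b)
        elif operation == "difference":
            combined.append(a - b)
        else:
            raise ValueError(f"unknown operation: {operation!r}")
    return combined


# The ChIP library holds twice as many reads, so its counts are halved first.
print(compare_samples([10, 30], [10, 10], "difference"))  # [-5.0, 5.0]
```

Scaling before combining matters: without it, a sample that was simply sequenced more deeply would look enriched in every bin.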
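computeMatrix and heatmapper (Table 1) turn a coverage track and a set of regions into a score matrix. In scale-regions mode, every region body is stretched or compressed to the same number of bins and flanked by fixed-size upstream and downstream windows; the default heatmap then orders regions by their mean score. The Python sketch below models that idea on a plain per-base coverage list for a single chromosome. It is a simplified illustration, not the deepTools implementation: it ignores strand orientation, missing data and bigWig/BED parsing, assumes descending sort order, and all function and parameter names are hypothetical.

```python
from statistics import mean


def bin_means(values, n_bins):
    """Average `values` into `n_bins` consecutive chunks of (nearly) equal size."""
    if n_bins > len(values):
        raise ValueError("segment is shorter than the requested number of bins")
    # Integer floor division keeps every chunk non-empty when n_bins <= len(values).
    edges = [i * len(values) // n_bins for i in range(n_bins + 1)]
    return [mean(values[edges[i]:edges[i + 1]]) for i in range(n_bins)]


def scale_regions_row(coverage, start, end, upstream, downstream, flank_bin, body_bins):
    """One matrix row: fixed-size flanks plus a region body rescaled to `body_bins` bins."""
    if start - upstream < 0 or end + downstream > len(coverage):
        raise ValueError("flanks extend beyond the chromosome")
    up = bin_means(coverage[start - upstream:start], upstream // flank_bin)
    body = bin_means(coverage[start:end], body_bins)
    down = bin_means(coverage[end:end + downstream], downstream // flank_bin)
    return up + body + down


def compute_matrix(coverage, regions, upstream, downstream, flank_bin, body_bins):
    """Build the score matrix and sort rows by decreasing mean score."""
    rows = [
        scale_regions_row(coverage, start, end, upstream, downstream, flank_bin, body_bins)
        for start, end in regions
    ]
    return sorted(rows, key=mean, reverse=True)


# Two 8-bp "genes" with different signal, each rescaled to 4 body bins.
coverage = [0, 0] + [1] * 8 + [0] * 4 + [5] * 8 + [0, 0]
matrix = compute_matrix(coverage, [(2, 10), (14, 22)], upstream=2, downstream=2,
                        flank_bin=1, body_bins=4)
for row in matrix:  # the region with signal 5 is listed first
    print(row)
```

Rescaling the bodies is what allows genes of very different lengths to share one heatmap column layout, at the cost of distorting distances within each gene.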
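bamCorrelate (Table 1) summarizes how similar the read distributions of several samples are. The following minimal Python sketch computes the two similarity measures named in the table, assuming reads have already been counted in the same genomic bins for every sample: Pearson's coefficient on the counts, and Spearman's coefficient computed as Pearson's coefficient on ranks, with tied values receiving their average rank. The sample names and counts are made up for illustration, the function names are hypothetical, and the hierarchical clustering that orders the heatmap is not shown.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) for a constant vector."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_matrix(samples, method="pearson"):
    """All pairwise correlations between named, equally binned count vectors."""
    func = pearson if method == "pearson" else spearman
    names = list(samples)
    return {(a, b): func(samples[a], samples[b]) for a in names for b in names}


samples = {
    "ChIP_rep1": [2, 8, 30, 5],
    "ChIP_rep2": [3, 9, 25, 6],
    "input": [10, 11, 9, 10],
}
for (a, b), r in correlation_matrix(samples, "spearman").items():
    print(f"{a} vs {b}: {r:.2f}")
```

Spearman's coefficient is less sensitive than Pearson's to a few bins with extreme counts, which is why offering both can be useful for sequencing data.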
Figure 1. Examples of images created with deepTools. (A) Overview of the deepTools workflow, which offers tools for visualization and for the intermediate NGS data-processing steps (Table 1). Users can either start by directly uploading bigWig files to generate heatmaps and summary plots, or they can upload BAM files, perform quality controls on them and produce normalized coverage files that can then be used for the visualization steps. (B) Clustered heatmap produced by the deepTools bamCorrelate module. Shown here are the Pearson correlation coefficients of various ChIP-seq samples; the clustering reveals that the ChIP signals of MOF in male and female cells differ significantly [data from (3), ENA accession: PRJEB3031]. (C) Example plot produced by computeGCbias to assess the GC distribution of reads within a given BAM file. The sample here shows the typical over-representation of reads with high GC content that is often observed after excessive polymerase chain reaction amplification. An additional plot (not shown here; see Supplementary Materials) takes the genome-specific expectation into account. (D) Examples of different summary plots and heatmap versions generated by deepTools using normalized read coverages from a ChIP-seq experiment for RNA polymerase II (Pol II) in male Drosophila melanogaster cells. bamCompare was used to calculate the log2 ratio of the Pol II and control samples. The resulting bigWig file was supplied, together with a BED file containing the gene regions, to computeMatrix, which was run in scale-regions mode to extract the scores for the genes. The left-most plot shows the default output of heatmapper: the Pol II signal over the bodies of all genes is visible, and genes are sorted by their mean score. The summary plot on top of the heatmap indicates that, on average, Pol II is most strongly enriched around the start of genes, which is also visible in the heatmaps.
The center plot shows the same data, but here we supplied three individual BED files, one per chromosome. The summary plot suggests that genes on the X chromosome show slightly higher average signals than those on chromosomes 2 and 3, which is consistent with the transcriptional upregulation of the male X chromosome in Drosophila (3). Additionally, heatmapper can cluster the data automatically, as exemplified in the right-most heatmap. The user only needs to specify the number of clusters; the resulting image clearly separates genes with elevated amounts of Pol II at the promoter and over the gene body (cluster 3) from genes with Pol II primarily at the promoters (cluster 2) and from those with very weak Pol II signal (cluster 1). Abbreviations: bp, base pair; chr, chromosome; input, control sample for ChIP-seq experiments; Pol II, RNA polymerase II; TES, transcription end site; TSS, transcription start site.
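For the clustered heatmap in Figure 1D, the user only supplies the number of clusters. k-means clustering, assumed here, is a standard way to do this; the Python sketch below applies Lloyd's algorithm to the rows of a score matrix such as the one computeMatrix produces. The deterministic initialization, the toy profiles and all names are assumptions for illustration, not a description of how heatmapper is implemented, and the resulting cluster numbers are arbitrary labels.

```python
from statistics import mean


def squared_distance(a, b):
    """Squared Euclidean distance between two equally long profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, max_iter=100):
    """Partition heatmap rows into k clusters with Lloyd's algorithm."""
    if not 1 <= k <= len(rows):
        raise ValueError("k must lie between 1 and the number of rows")
    # Deterministic start (an assumption): pick k rows spread evenly over the
    # rows ordered by their mean signal, from weakest to strongest.
    order = sorted(range(len(rows)), key=lambda i: mean(rows[i]))
    step = (len(rows) - 1) / max(k - 1, 1)
    centroids = [list(rows[order[round(j * step)]]) for j in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean profile of its members.
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [mean(column) for column in zip(*members)]
    return labels


# Toy profiles: weak signal, promoter-only signal, promoter plus gene body.
rows = [[0, 0, 0, 0], [0, 1, 0, 0],
        [5, 1, 0, 0], [6, 1, 0, 0],
        [5, 5, 5, 5], [6, 5, 5, 4]]
labels = kmeans(rows, k=3)
heatmap_order = sorted(range(len(rows)), key=lambda i: labels[i])
print(labels, heatmap_order)
```

Because k-means depends on its starting centroids, a fixed initialization like this one makes the grouping reproducible between runs, which random restarts would not.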