Zhengdong D Zhang, Joel Rozowsky, Hugo Y K Lam, Jiang Du, Michael Snyder, Mark Gerstein.
Abstract
We developed Tilescope (http://tilescope.gersteinlab.org), a fully integrated data processing pipeline for analyzing high-density tiling-array data. In a completely automated fashion, Tilescope normalizes signals between channels and across arrays, combines replicate experiments, scores each array element, and identifies genomic features. The program is designed with a modular, three-tiered architecture that facilitates parallelism and with a user-friendly graphical interface. It presents results in an organized web page that can be downloaded for further analysis.
Year: 2007 PMID: 17501994 PMCID: PMC1929149 DOI: 10.1186/gb-2007-8-5-r81
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Screenshots of Tilescope. (a) The Tilescope applet, the graphical user interface of the pipeline. (b) An example of the web page that presents the data analysis results.
Figure 2. Tiling array data processing by Tilescope. (a) Flow chart of the major data processing steps. Yellow icons represent data in user-accessible files, and blue icons represent data held in the pipeline program's memory. See main text for details. (b) Log-intensity scatter plots of a tiling array from the STAT1 experiment set before and after normalization by four different methods. The first panel is the log2(T) versus log2(R) plot before normalization, where T and R are the test intensity and the reference intensity, respectively. The gray line marks where the two log-intensities are equal. The second panel is the log2(T/R) versus log2(T×R) plot (the MA plot) before normalization. The dependency of the log-ratio on the intensity, shown by a fitted loess curve, is prominent in the data. The other panels are the MA plots of the array data after mean, median, loess, or quantile normalization. They show that every normalization method centers the distribution of log-ratios at zero, but only loess and quantile normalization remove the intensity-specific artifacts in the log-ratio measurements; the mean- and median-based methods do not. (c) Signal and P value maps of all tiles in the ENCODE ENm002 region. In this region, the tiles near the transcription start site of IRF1, a transcription factor known to be regulated by STAT1, give the strongest signals. (d) Tilescope-identified STAT1 binding sites at the 5'-end of IRF1, shown on a custom track in the UCSC genome browser.
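The quantities named in the Figure 2 caption can be illustrated with a minimal sketch. It uses the caption's definitions of the MA plot axes, M = log2(T/R) and A = log2(T×R), and shows median and quantile normalization only; loess is omitted because it needs a local regression fit. The function names are hypothetical and this is not Tilescope's own code. The quantile step assumes two channels of equal length and ignores ties.

```python
import math
from statistics import median


def ma_values(test, ref):
    """Return (M, A) as defined in the caption: M = log2(T/R), A = log2(T*R)."""
    m = [math.log2(t / r) for t, r in zip(test, ref)]
    a = [math.log2(t * r) for t, r in zip(test, ref)]
    return m, a


def median_normalize(m):
    """Shift all log-ratios by one constant so that their median is zero.

    A single constant cannot remove a trend of M along A, which is why the
    caption reports that intensity-dependent artifacts survive this method.
    """
    center = median(m)
    return [x - center for x in m]


def quantile_normalize(test, ref):
    """Map both channels onto one shared distribution of log-intensities.

    Assumption: the shared distribution is the rank-wise mean of the two
    sorted channels; ties are not treated specially.
    """
    log_t = [math.log2(t) for t in test]
    log_r = [math.log2(r) for r in ref]
    target = [(x + y) / 2 for x, y in zip(sorted(log_t), sorted(log_r))]

    def remap(values):
        # Replace each value by the target value of the same rank.
        order = sorted(range(len(values)), key=values.__getitem__)
        out = [0.0] * len(values)
        for rank, idx in enumerate(order):
            out[idx] = target[rank]
        return out

    return remap(log_t), remap(log_r)
```

After quantile normalization both channels contain the same set of values, so the log-ratio distribution is balanced at every intensity level rather than only on average.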
Feature comparison between tiling microarray data analysis software*
| | Tilescope | Bioconductor† | TAS‡ | MAT§ | TileMap |
| | Web | R packages | Standalone | Standalone | Standalone |
| | √ | × | √ | × | × |
| Transcription data | √ | √ | √ | × | √ |
| ChIP-chip data | √ | √ | × | √ | √ |
| Affymetrix | √ | √ | √ | √ | √ |
| NimbleGen | √ | × | × | × | × |
| Mean/median | √ | ~ | √ | / | × |
| Loess | √ | ~ | × | / | × |
| Quantile | √ | ~ | × | / | √ |
| Max gap and min run | √ | ~ | √ | / | √ |
| Iterative peak identification | √ (new) | × | × | / | × |
| Hidden Markov model | √ | ~ | × | / | √ |
*Only programs explicitly applicable to high-density tiling microarray data were considered. The websites of the compared programs are as follows: Tilescope at [35]; Bioconductor at [37]; TAS at [38]; MAT at [39]; TileMap at [40]. †Strictly speaking, Bioconductor is not a ready-to-run program. It is a collection of software packages and libraries written in R, and the analysis methods it provides must be called from a user-written R program. ‡TAS was previously known as GTRANS. §MAT standardizes the probe value through the probe model, which obviates the need for sample normalization. Comparison symbols used in the table: √, available; ×, not available; ~, available but needs to be programmed; /, not applicable.
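The "max gap and min run" criterion listed in the table can be sketched as follows. This is a generic illustration under stated assumptions, not the implementation in Tilescope or any of the compared programs: tiles are given by sorted start coordinates, a tile is positive when its score reaches `threshold`, the gap is measured between consecutive positive tile starts, and the run length is measured in bases. All names and parameters are hypothetical.

```python
def max_gap_min_run(positions, scores, threshold, max_gap, min_run):
    """Group positive tiles into regions and keep the sufficiently long ones.

    positions: sorted genomic start coordinates of the tiles.
    scores:    one score per tile.
    Consecutive positive tiles join the same region when their starts are at
    most max_gap apart; a region is reported only if it spans >= min_run bases.
    """
    regions = []
    start = end = None
    for pos, score in zip(positions, scores):
        if score < threshold:
            continue  # negative tiles only matter through the gap they create
        if start is not None and pos - end <= max_gap:
            end = pos  # extend the current region
        else:
            if start is not None and end - start >= min_run:
                regions.append((start, end))
            start = end = pos  # open a new region
    if start is not None and end - start >= min_run:
        regions.append((start, end))
    return regions


# Example: the isolated positive tile at 900 is dropped as too short.
hits = max_gap_min_run(
    positions=[0, 50, 100, 150, 400, 450, 900],
    scores=[3, 3, 0, 3, 3, 3, 3],
    threshold=2, max_gap=100, min_run=50,
)
print(hits)  # [(0, 150), (400, 450)]
```

The two parameters trade off in opposite directions: a larger `max_gap` merges more tiles into one feature, while a larger `min_run` discards more short, isolated signals.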
Figure 3. The ROC curves of the three feature identification methods implemented in Tilescope. The performance comparison was based on a well-studied STAT1 ChIP-chip data set and a list of experimentally tested STAT1 binding sites.
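As a rough illustration of how an ROC curve like those in Figure 3 is obtained, the sketch below sweeps a threshold over per-site scores and records the false and true positive rates at each step. It assumes each candidate site has a score and a label saying whether it was experimentally confirmed, and that both classes are present. The function names are hypothetical and the procedure is the textbook one, not necessarily the evaluation used for the figure.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points from sweeping a threshold over the scores.

    labels: True for confirmed sites, False otherwise (both must occur).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, is_pos) in enumerate(ranked):
        if is_pos:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


curve = roc_points([0.9, 0.8, 0.7, 0.6], [True, False, True, False])
print(curve)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(curve))  # 0.75
```

Comparing methods then amounts to computing one such curve per method on the same labeled set; a curve closer to the top-left corner indicates better discrimination.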