Alan P Boyle, Justin Guinney, Gregory E Crawford, Terrence S Furey.
Abstract
Tag sequencing using high-throughput sequencing technologies is now regularly employed to identify specific sequence features, such as transcription factor binding sites (ChIP-seq) or regions of open chromatin (DNase-seq). To summarize and display individual sequence data intuitively as an accurate and interpretable signal, we developed F-Seq, a software package that generates a continuous tag sequence density estimate. This estimate allows identification of biologically meaningful sites, and its output can be displayed directly in the UCSC Genome Browser.
AVAILABILITY: The software is written in the Java language and is available on all major computing platforms for download at http://www.genome.duke.edu/labs/furey/software/fseq.
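F-Seq's output can be displayed directly in the UCSC Genome Browser. One plain-text track format the browser accepts is wiggle, so the sketch below writes a per-base signal (for example, a density estimate) as a single fixedStep block. It assumes 1-based coordinates and a step and span of 1; the helper name `to_fixed_step_wig` and the track name are hypothetical, and this is not F-Seq's own writer (the abstract does not say which track format F-Seq emits).

```python
def to_fixed_step_wig(chrom, start, values, name="density"):
    """Format a per-base signal as a UCSC wiggle track with one fixedStep block.

    chrom: chromosome name (e.g. "chr8"); start: 1-based position of values[0];
    values: one number per consecutive base; name: hypothetical track label.
    """
    lines = [
        f'track type=wiggle_0 name="{name}"',
        # fixedStep: each following value covers `span` bases, `step` apart.
        f"fixedStep chrom={chrom} start={start} step=1 span=1",
    ]
    # One value per line; the :g format keeps small densities compact.
    lines.extend(f"{v:g}" for v in values)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    signal = [0.0, 0.25, 1.5, 0.25, 0.0]  # hypothetical per-base density values
    print(to_fixed_step_wig("chr8", 1000, signal), end="")
```

A fixedStep block suits a dense per-base signal because positions are implied by `start` and `step`; a sparse signal would be smaller as variableStep, which lists an explicit position with each value.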
Year: 2008 PMID: 18784119 PMCID: PMC2732284 DOI: 10.1093/bioinformatics/btn480
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Examples of histogram and density estimation properties. Blue dots represent the sample positions being analyzed. (A, B) The locations of the bins used in a histogram can make data look unimodal (A) or bimodal (B) depending on their starting positions (1.5 and 1.75, respectively). (C) Bandwidth affects the generated density in the same way that bin size affects a histogram. Over-smoothed (red, dashed line) and under-smoothed (green, dotted line) densities can obscure the actual signal (black, solid line). (D) Example of how the distributions over each point are combined to create the final distribution. Each sample is represented by a Gaussian distribution, and these Gaussians are summed to create the final density estimate.
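The properties in Fig. 1 can be reproduced numerically. The sketch below uses hypothetical sample positions and a hypothetical bin width of 0.5 (the figure's actual data are not given); only the bin origins 1.5 and 1.75 come from the caption. It shows a histogram changing from one mode to two when only its origin shifts (panels A, B), and a Gaussian kernel density estimate, built by summing one Gaussian per sample as in panel D, changing its number of modes with bandwidth (panel C). The helper names are illustrative, not F-Seq's API.

```python
import math


def histogram(points, origin, width):
    """Count points in bins [origin + k*width, origin + (k+1)*width)."""
    counts = {}
    for x in points:
        k = math.floor((x - origin) / width)
        counts[k] = counts.get(k, 0) + 1
    lo, hi = min(counts), max(counts)
    return [counts.get(k, 0) for k in range(lo, hi + 1)]


def gaussian_kde(points, grid, bandwidth):
    """Sum one Gaussian per sample (sd = bandwidth), normalised to integrate to 1."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
            for x in grid]


def count_modes(values):
    """Count strict local maxima (flat plateaus are not counted)."""
    n = len(values)
    return sum(
        1 for i in range(n)
        if (i == 0 or values[i] > values[i - 1])
        and (i == n - 1 or values[i] > values[i + 1])
    )


points = [2.1, 2.2, 2.4, 2.8, 2.9]                # hypothetical sample positions
h_a = histogram(points, origin=1.5, width=0.5)    # [3, 2] -> one mode
h_b = histogram(points, origin=1.75, width=0.5)   # [2, 1, 2] -> two modes
grid = [1.5 + 0.01 * i for i in range(201)]      # evaluation points 1.5 .. 3.5
narrow = gaussian_kde(points, grid, bandwidth=0.02)  # under-smoothed: 5 modes
wide = gaussian_kde(points, grid, bandwidth=1.0)     # over-smoothed: 1 mode
print(h_a, count_modes(h_a), h_b, count_modes(h_b))
print(count_modes(narrow), count_modes(wide))
```

The same five points yield one or two histogram modes depending only on the bin origin, whereas the density estimate has no origin to choose; its shape depends on the bandwidth alone, which is the trade-off panel C illustrates.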
Fig. 2. View of a 10 kb region of chromosome 8 showing that F-Seq accurately reproduces the windowing technique applied to the STAT1 data (Robertson et al., 2007). Note that the histogram-generated sites from Robertson et al. display only sites above a cutoff.
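The caption contrasts F-Seq's continuous density with histogram-generated sites that are shown only above a cutoff. The sketch below illustrates one simple form of such a windowing technique: count tags in fixed, non-overlapping windows and keep only windows whose count reaches a cutoff. The window size, cutoff, tag positions, and helper names are hypothetical, and this is not claimed to be the exact procedure of Robertson et al.

```python
from collections import Counter


def windowed_counts(tags, window):
    """Count tag positions per non-overlapping window [k*window, (k+1)*window)."""
    return Counter(t // window for t in tags)


def sites_above_cutoff(tags, window, cutoff):
    """Return (start, end, count) for windows with count >= cutoff, sorted by start.

    end is exclusive; window and cutoff are hypothetical parameters.
    """
    counts = windowed_counts(tags, window)
    return [(k * window, (k + 1) * window, c)
            for k, c in sorted(counts.items()) if c >= cutoff]


tags = [120, 130, 150, 180, 410, 1030, 1040, 1060, 1090, 1095]  # hypothetical
print(sites_above_cutoff(tags, window=100, cutoff=3))
# [(100, 200, 4), (1000, 1100, 5)]
```

The single tag in window 400 to 500 is dropped by the cutoff, so a track built this way is blank wherever counts fall below the threshold, while a density estimate over the same tags remains a continuous signal.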