Joseph K Pickrell, Daniel J Gaffney, Yoav Gilad, Jonathan K Pritchard.
Abstract
MOTIVATION: Sequencing-based assays such as ChIP-seq, DNase-seq and MNase-seq have become important tools for genome annotation. In these assays, short sequence reads enriched for loci of interest are mapped to a reference genome to determine their origin. Here, we consider whether false positive peak calls can be caused by a particular type of error in the reference genome: multicopy sequences that have been incorrectly assembled and collapsed into a single copy.
Year: 2011 PMID: 21690102 PMCID: PMC3137225 DOI: 10.1093/bioinformatics/btr354
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Sequences absent from the reference genome cause spurious peaks of sequencing reads. (A) An example of such a region. In each panel, we plot the density of uniquely mapped sequencing reads from three sources: Illumina data from low-coverage sequencing of Yoruba individuals from the 1000 Genomes Project (summed across all individuals), a study of DNaseI hypersensitivity (Pique-Regi ) and a study of MNase sensitivity (Schones ). In the first of these, copy number is expected to be approximately constant. Regions that we call as high depth regions (HDRs) at a threshold of 0.1% are shown in red. (B) A long tail of very high read depth for sequences present once in the human reference. Using the coverage from the 1000 Genomes Project data, we plot the histogram of the coverage at each base (using 500 Mb of sequence). The positions corresponding to the top 0.1% and 0.01% of the distribution are marked. (C) Collapsed repeats cause false peaks of sequencing reads in functional assays. For each experiment, we plot the fraction of the genome covered by the mark, as well as the fraction of the HDRs covered by the mark. For the ChIP-seq on transcription factors, we used the binding sites called by the ENCODE Project (ENCODE Project Consortium, 2007) using PeakSeq (Rozowsky ). For the ChIP-seq on histone modifications (Wang ), we split the genome into windows of 200 bases and called the most extreme 0.1% of windows as bound. Shown are selected experiments; for all experiments see Supplementary Figures S2 and S3.
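The thresholding and overlap comparison described in the caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the half-open `(start, end)` interval convention, the tie handling at the cutoff, and the toy coverage values are all assumptions made for illustration. It assumes a per-base coverage list for a single sequence and a set of non-overlapping peak intervals.

```python
def top_fraction_cutoff(coverage, top_fraction):
    """Depth of the k-th deepest base, where k is top_fraction of all bases.

    Assumption: bases tied with the cutoff are all called, so slightly more
    than top_fraction of bases can be flagged when depths are tied.
    """
    k = max(1, round(len(coverage) * top_fraction))
    return sorted(coverage, reverse=True)[k - 1]


def call_high_depth_regions(coverage, cutoff):
    """Merge consecutive bases with depth >= cutoff into half-open intervals."""
    regions = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= cutoff:
            if start is None:
                start = pos  # open a new region
        elif start is not None:
            regions.append((start, pos))  # close the current region
            start = None
    if start is not None:
        regions.append((start, len(coverage)))
    return regions


def fraction_overlapped(regions, peaks):
    """Fraction of bases in `regions` that fall inside any interval in `peaks`.

    Assumption: intervals in `peaks` do not overlap one another.
    """
    total = sum(end - start for start, end in regions)
    if total == 0:
        return 0.0
    overlap = 0
    for r_start, r_end in regions:
        for p_start, p_end in peaks:
            overlap += max(0, min(r_end, p_end) - max(r_start, p_start))
    return overlap / total


# Toy data (hypothetical): uniform depth with one collapsed-repeat-like spike.
coverage = [10] * 1000
coverage[100:105] = [500] * 5

cutoff = top_fraction_cutoff(coverage, 0.005)
hdrs = call_high_depth_regions(coverage, cutoff)
peaks = [(102, 110), (400, 420)]  # hypothetical peak calls from a functional assay

genome_fraction = fraction_overlapped([(0, len(coverage))], peaks)
hdr_fraction = fraction_overlapped(hdrs, peaks)
print(hdrs, genome_fraction, hdr_fraction)
```

If the fraction of HDR bases covered by a mark is much larger than the fraction of the genome covered by that mark, the peaks are enriched in high depth regions, which is the comparison panel (C) describes.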