Anton Valouev, David S Johnson, Andreas Sundquist, Catherine Medina, Elizabeth Anton, Serafim Batzoglou, Richard M Myers, Arend Sidow.
Abstract
Molecular interactions between protein complexes and DNA mediate essential gene-regulatory functions. Uncovering such interactions by chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-Seq) has recently become the focus of intense interest. Here we introduce quantitative enrichment of sequence tags (QuEST), a powerful statistical framework based on the kernel density estimation approach, which uses ChIP-Seq data to determine positions where protein complexes contact DNA. Using QuEST, we discovered several thousand binding sites for the human transcription factors SRF, GABP and NRSF at an average resolution of about 20 base pairs. Analyses of the QuEST-identified sequences with the MEME motif-discovery tool revealed DNA binding by cofactors of SRF, providing evidence that cofactor binding specificity can be obtained from ChIP-Seq data. By combining QuEST analyses with Gene Ontology (GO) annotations and expression data, we illustrate how general functions of transcription factors can be inferred.
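The kernel density estimation idea at the core of QuEST can be illustrated in a few lines. The sketch below is not QuEST's implementation: it assumes a Gaussian kernel, an illustrative 30 bp bandwidth, and a hypothetical function name `kernel_density`, and it sums the kernels without dividing by the read count, so the resulting profile integrates to roughly the number of reads.

```python
import math


def kernel_density(positions, grid, bandwidth=30.0):
    """Gaussian kernel density of read positions, evaluated at each grid point.

    Assumptions for illustration: Gaussian kernel, 30 bp bandwidth, and no
    division by the number of reads (the profile integrates to ~len(positions)).
    """
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for x in grid:
        # Each read contributes a bump centred on its own position.
        total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)
        density.append(norm * total)
    return density


if __name__ == "__main__":
    reads = [100, 105, 110, 400]   # hypothetical read positions (bp)
    grid = list(range(0, 501, 5))  # evaluate the profile every 5 bp
    dens = kernel_density(reads, grid)
    print("densest position:", grid[dens.index(max(dens))])
```

Here the three clustered reads produce a density maximum at 105, while the isolated read at 400 forms a bump roughly a third as high. The bandwidth controls the trade-off: wider kernels merge nearby reads into smoother peaks but blur closely spaced sites.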
Year: 2008 PMID: 19160518 PMCID: PMC2917543 DOI: 10.1038/nmeth.1246
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Figure 1. QuEST’s representation of ChIP-Seq data using density profiles. (A) GABP ChIP-Seq reads from the promoter and CpG island of the nitric oxide synthase interacting protein gene. Top: hypothetical GABP binding in five cells and the corresponding DNA fragments with sequencing reads. Below: actual read data, with forward reads displayed as small blue bands and reverse reads as small maroon bands. (B) Forward (blue) and reverse (maroon) read density profiles derived from the read data contribute to the combined density profile (orange). The zero x-coordinate corresponds to coordinate 54,775,300 of human chromosome 19 (NCBI build 36).
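The strand logic in Figure 1 can be sketched as follows. Forward reads mark the left end of their DNA fragments and reverse reads the right end, so shifting each strand toward the fragment center lines both profiles up over the binding site. This is a hypothetical illustration: the shift of half an assumed fragment length, the Gaussian kernel, the 30 bp bandwidth, the use of each read's 5' position, and the names `density_at` and `combined_profile` are all assumptions, not QuEST's published parameters.

```python
import math


def density_at(x, positions, bandwidth):
    # Unnormalized sum of Gaussian kernels evaluated at coordinate x.
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)


def combined_profile(forward, reverse, grid, shift, bandwidth=30.0):
    """Strand-specific and combined density profiles on `grid`.

    forward/reverse: 5' read positions on each strand (assumed convention).
    shift: assumed half fragment length; forward reads move downstream and
    reverse reads move upstream so both strands point at the binding site.
    """
    fwd_shifted = [p + shift for p in forward]
    rev_shifted = [p - shift for p in reverse]
    fwd_profile = [density_at(x, fwd_shifted, bandwidth) for x in grid]
    rev_profile = [density_at(x, rev_shifted, bandwidth) for x in grid]
    combined = [f + r for f, r in zip(fwd_profile, rev_profile)]
    return fwd_profile, rev_profile, combined


if __name__ == "__main__":
    # Hypothetical reads flanking a site at 500 bp, with ~200 bp fragments.
    forward = [395, 400, 405]
    reverse = [595, 600, 605]
    grid = list(range(300, 701, 5))
    _, _, combined = combined_profile(forward, reverse, grid, shift=100)
    print("combined peak at:", grid[combined.index(max(combined))])
```

With a 100 bp shift, both strand profiles peak at 500 and reinforce each other in the combined profile; without the shift, the forward and reverse reads would form two separate bumps about one fragment length apart.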
ChIP-Seq data and analysis summary.
| | GABP | SRF | NRSF | NRSF |
|---|---|---|---|---|
| Number of aligned ChIP reads | 7862231 | 8721730 | 8813398 | 5358147 |
| Number of peaks called by QuEST | 6442 | 2429 | 2960 | 2596 |
| FDR estimate | 1/6442 | 1/2429 | <1/2960 | 1/2595 |
| % peaks near genes (< 2 kb away or internal) | 83% | 72% | 53% | 53% |
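The last row of the table can be reproduced, in outline, by checking each peak against gene coordinates. This sketch assumes genes are given as whole-gene intervals and that "< 2 kb" is measured to the nearest gene end; the table does not state the reference point (gene body or transcription start site), and the function names are hypothetical.

```python
def distance_to_gene(pos, start, end):
    # 0 if the peak lies inside the gene, else bp to the nearer gene end.
    if start <= pos <= end:
        return 0
    return start - pos if pos < start else pos - end


def fraction_near_genes(peaks, genes, max_dist=2000):
    """Fraction of peaks inside a gene or fewer than max_dist bp from one.

    peaks: [(chrom, position)]; genes: [(chrom, start, end)] (assumed inputs).
    A linear scan per peak keeps the sketch short; genome-scale data would
    call for sorted intervals or an interval tree.
    """
    genes_by_chrom = {}
    for chrom, start, end in genes:
        genes_by_chrom.setdefault(chrom, []).append((start, end))
    near = sum(
        1
        for chrom, pos in peaks
        if any(distance_to_gene(pos, s, e) < max_dist
               for s, e in genes_by_chrom.get(chrom, []))
    )
    return near / len(peaks) if peaks else 0.0


if __name__ == "__main__":
    genes = [("chr19", 10_000, 20_000), ("chr1", 5_000, 6_000)]
    peaks = [("chr19", 15_000), ("chr19", 8_500), ("chr19", 30_000), ("chr1", 1_000)]
    print(f"{fraction_near_genes(peaks, genes):.0%} of peaks near genes")
```

In the toy input, one peak is internal, one lies 1.5 kb upstream, and two are farther than 2 kb away, so the script prints `50% of peaks near genes`.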
Figure 2. Reproducibility and robustness of QuEST results, assessed by comparing two independent NRSF data sets. (A) Correlation between NRSF polyclonal and NRSF monoclonal peak scores (rho = 0.97); the inset expands the region near the origin of the graph. (B) Bar chart of the distances between NRSF polyclonal and NRSF monoclonal peak call positions.
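Comparing two peak sets, as in Figure 2, first requires pairing peaks across data sets. The sketch below pairs each peak from one data set with the nearest peak from the other within an assumed 100 bp window, then reports the pairwise distances (as in panel B) and a correlation of the paired scores (as in panel A). The caption does not say whether rho is a Pearson or a Spearman coefficient; Pearson is used here purely for illustration, and the names `match_peaks` and `pearson` are hypothetical.

```python
import bisect
import math


def match_peaks(peaks_a, peaks_b, window=100):
    """Pair each peak in A with the nearest peak in B within `window` bp.

    peaks_*: lists of (position, score) on one chromosome (assumed format).
    Returns (score_a, score_b, distance) for each matched pair.
    """
    b_sorted = sorted(peaks_b)
    b_pos = [p for p, _ in b_sorted]
    pairs = []
    for pos, score in peaks_a:
        i = bisect.bisect_left(b_pos, pos)
        # The nearest B peak is either just before or just after `pos`.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(b_pos)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(b_pos[k] - pos))
        dist = abs(b_pos[j] - pos)
        if dist <= window:
            pairs.append((score, b_sorted[j][1], dist))
    return pairs


def pearson(xs, ys):
    # Pearson correlation; assumes at least two points with nonzero variance.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    poly = [(1000, 50.0), (5000, 20.0), (9000, 80.0), (20000, 5.0)]  # hypothetical
    mono = [(1010, 48.0), (4995, 22.0), (9030, 75.0)]                # hypothetical
    pairs = match_peaks(poly, mono)
    rho = pearson([a for a, _, _ in pairs], [b for _, b, _ in pairs])
    print("distances:", [d for _, _, d in pairs], "rho:", round(rho, 3))
```

Unmatched peaks (here the one at 20000) are simply dropped; a fuller comparison would also report how many peaks in each set lack a partner.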
Figure 3. Resolution of QuEST, quantified by the distance between QuEST peak calls and transcription factor binding site (TFBS) motif centers. Histograms in each panel show the distribution of distances from peaks to the nearest high-scoring motif.
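Resolution as defined in Figure 3 reduces to a nearest-neighbor query on sorted coordinates. In this sketch (hypothetical names, a single chromosome, and motif centers assumed to be already filtered to high-scoring matches), a binary search finds the motif center closest to each peak, and the signed distances are binned for a histogram.

```python
import bisect
from collections import Counter


def nearest_motif_distances(peaks, motif_centers):
    """Signed distance (motif minus peak, bp) to the nearest motif center.

    Returns None for a peak when there are no motif centers at all.
    """
    centers = sorted(motif_centers)
    out = []
    for p in peaks:
        i = bisect.bisect_left(centers, p)
        best = None
        # Only the neighbours on either side of the insertion point can be nearest.
        for j in (i - 1, i):
            if 0 <= j < len(centers):
                d = centers[j] - p
                if best is None or abs(d) < abs(best):
                    best = d
        out.append(best)
    return out


def histogram(values, bin_width=10):
    # Key is the lower edge of each bin (floor division, so -5 falls in [-10, 0)).
    return Counter((v // bin_width) * bin_width for v in values)


if __name__ == "__main__":
    peaks = [100, 250, 505]          # hypothetical peak calls
    motifs = [900, 110, 500, 240]    # hypothetical motif centers
    dists = nearest_motif_distances(peaks, motifs)
    print(dists, dict(histogram(dists)))
```

Keeping the sign shows whether peak calls are systematically offset to one side of the motif; taking absolute values instead gives a single "distance to motif" distribution.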
Figure 4. Motif analysis results. Each panel displays WebLogos (ref. 24) of the significantly overrepresented motifs for each of the three transcription factors. Pie charts show the fraction of peaks with motifs in close proximity to the peak (< 100 bp). Histograms show the distribution of the number of motifs within 100 bp of the peak.
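The pie-chart and histogram quantities in Figure 4 both come from counting motif centers near each peak. The sketch below uses two binary searches per peak to count centers strictly closer than 100 bp, matching the "< 100 bp" in the caption; the function names and the single-chromosome input format are assumptions.

```python
import bisect
from collections import Counter


def motifs_within(peak, centers, window=100):
    # Count sorted centers c with peak - window < c < peak + window.
    lo = bisect.bisect_right(centers, peak - window)
    hi = bisect.bisect_left(centers, peak + window)
    return hi - lo


def motif_proximity_summary(peaks, motif_centers, window=100):
    """Return (fraction of peaks with >= 1 nearby motif, Counter of motif counts)."""
    centers = sorted(motif_centers)
    counts = [motifs_within(p, centers, window) for p in peaks]
    frac = sum(1 for c in counts if c > 0) / len(counts) if counts else 0.0
    return frac, Counter(counts)


if __name__ == "__main__":
    peaks = [1000, 5000, 9000]                        # hypothetical peaks
    motifs = [950, 1040, 1099, 5200, 9050, 9100]      # hypothetical motif centers
    frac, hist = motif_proximity_summary(peaks, motifs)
    print(f"{frac:.0%} of peaks have a motif; counts: {dict(hist)}")
```

The fraction feeds the pie chart and the Counter feeds the histogram. Note that the motif at 9100 is excluded for the peak at 9000 because it lies exactly 100 bp away.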