Federico Zambelli, Graziano Pesole, Giulio Pavesi.
Abstract
Motif discovery has been one of the most widely studied problems in bioinformatics ever since genomic and protein sequences became available. In particular, its application to the de novo prediction of putative over-represented transcription factor binding sites in nucleotide sequences has been, and still is, one of the most challenging flavors of the problem. Recently, novel experimental techniques like chromatin immunoprecipitation (ChIP) have been introduced, permitting the genome-wide identification of protein-DNA interactions. ChIP, applied to transcription factors and coupled with genome tiling arrays (ChIP on Chip) or next-generation sequencing technologies (ChIP-Seq), has opened new avenues in research, as well as posed new challenges to bioinformaticians developing algorithms and methods for motif discovery.
Year: 2012 PMID: 22517426 PMCID: PMC3603212 DOI: 10.1093/bib/bbs016
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1: Describing a ‘motif’ representing the binding specificity of a transcription factor (CREB). Given a set of oligos known to be bound by the same TF, we can represent the motif they form by a ‘consensus’ (bottom left) with the most frequent nucleotide in each position; a ‘degenerate’ consensus, which includes ambiguous positions where there is no nucleotide clearly preferred (N = any nucleotide; K = G or T; M = A or C, according to IUPAC codes [12]); an alignment profile (right) that can be converted into a nucleotide frequency matrix by dividing each column by the number of sites used, as well as into a ‘sequence logo’ [13] showing the conservation of nucleotides and the respective information content contribution at each position.
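The representations described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the example oligos are made up for demonstration (they are not the sites shown in the figure), and the information content is computed as 2 minus the Shannon entropy of each column, without any small-sample correction.

```python
import math

BASES = "ACGT"

def count_profile(sites):
    """Alignment profile: for each column, count how often each nucleotide occurs."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    return [{b: sum(s[i] == b for s in sites) for b in BASES}
            for i in range(length)]

def frequency_matrix(profile, n_sites):
    """Divide each column of the profile by the number of sites used."""
    return [{b: col[b] / n_sites for b in BASES} for col in profile]

def consensus(profile):
    """Most frequent nucleotide at each position (ties go to the first of A, C, G, T)."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in profile)

def information_content(freqs):
    """Per-column information content in bits: 2 - entropy (assumed formula, no correction)."""
    return [2.0 + sum(p * math.log2(p) for p in col.values() if p > 0)
            for col in freqs]

# Hypothetical oligos, for illustration only.
sites = ["TGACGTCA", "TGACGTAA", "TGACGCCA", "TGACGTCA"]
profile = count_profile(sites)
freqs = frequency_matrix(profile, len(sites))
print(consensus(profile))
print([round(ic, 3) for ic in information_content(freqs)])
```

Fully conserved columns reach the maximum of 2 bits, while columns with mixed nucleotides score lower; these per-column values correspond to the stack heights in a sequence logo.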
Figure 2: Schematic view of the results of ChIP-Seq performed on a genomic region bound by a TF [79]. DNA is fragmented at random, and thus the ends of the sequenced DNA fragments map to different positions on the genome. Each sequenced read is assumed to be the 5′ end of a 200- to 300-bp fragment. The ‘peak’, corresponding to the point of maximum enrichment (‘coverage’) within the region (that is, the position appearing in the highest number of sequenced fragments), should coincide with the actual binding site for the TF (bottom).
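The idea in the Figure 2 caption, extending each mapped read to an assumed fragment and looking for the point of maximum coverage, can be sketched as follows. This is an illustrative sketch under stated assumptions, not a real peak caller: the read format `(five_prime_position, strand)`, the 0-based coordinates, the function names and the fixed 200-bp fragment length (taken from the lower end of the 200- to 300-bp range in the caption) are all hypothetical choices.

```python
from collections import Counter

def coverage_from_reads(reads, fragment_length=200):
    """Extend each read from its 5' end and count fragments covering each position.

    reads: iterable of (five_prime_position, strand), strand being '+' or '-'.
    Forward reads extend to the right, reverse reads to the left.
    """
    cov = Counter()
    for pos, strand in reads:
        if strand == "+":
            start, end = pos, pos + fragment_length
        elif strand == "-":
            start, end = pos - fragment_length + 1, pos + 1
        else:
            raise ValueError(f"unknown strand: {strand!r}")
        for p in range(max(start, 0), end):
            cov[p] += 1
    return cov

def peak_summit(cov):
    """Return (first, last, depth) of the positions with maximum coverage."""
    if not cov:
        raise ValueError("no coverage to search")
    depth = max(cov.values())
    summit = [p for p, c in cov.items() if c == depth]
    return min(summit), max(summit), depth

# Hypothetical reads flanking a binding site, for illustration only.
reads = [(100, "+"), (150, "+"), (380, "-"), (420, "-")]
print(peak_summit(coverage_from_reads(reads)))  # (221, 299, 4)
```

Forward-strand reads pile up upstream of the site and reverse-strand reads downstream, so the extended fragments overlap most heavily over the site itself, which is where the summit is reported.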