Annette Höglund, Oliver Kohlbacher.
Abstract
Gene regulation in higher organisms is achieved by a complex network of transcription factors (TFs). Modulating gene expression and exploring gene function are major aims in molecular biology. Furthermore, the identification of putative target genes for a certain TF serves as a powerful tool for the specific targeting of rational drugs.
Detecting the short and variable transcription factor binding sites (TFBSs) in genomic DNA is an intriguing challenge for computational and structural biologists. Fast and reliable computational methods for predicting TFBSs on a whole-genome scale offer several advantages over the current experimental methods, which are rather laborious and slow. Two main approaches are being explored: advanced sequence-based algorithms and structure-based methods.
The aim of this review is to outline the computational and experimental methods currently being applied in the field of protein-DNA interactions. With a focus on the former, the current state of the art in modeling these interactions is discussed. Surveying sequence- and structure-based methods for predicting TFBSs, we conclude that, in order to achieve a sound and specific method applicable to genomic sequences, it is desirable and important to bring these two approaches together.
Year: 2004 PMID: 15202939 PMCID: PMC441406 DOI: 10.1186/1477-5956-2-3
Source DB: PubMed Journal: Proteome Sci ISSN: 1477-5956 Impact factor: 2.480
Figure 1. Characteristics of C-G and T-A base pairs. Intermolecular H-bonds (dotted lines) in the C-G and T-A bp stabilize the DNA double helix. The bp edges form a pattern of H-bond acceptors and donors that can be recognized by amino acid side chains of proteins. This pattern is unique for each bp (C-G, G-C, T-A, and A-T) in the major groove (up), whereas in the minor groove (down) it is only possible to distinguish a C-G bp (top) from a T-A bp (bottom) [9]. H-bond acceptors and donors are indicated by outward and inward pointing arrows, respectively. The letter M is the methyl group of the base T and H is a ring hydrogen donor. The chemical composition of the DNA sugar-phosphate backbone (not shown) is constant and independent of the bp sequence.
Representation of an example TFBS. Two sequence-based representations of the same TFBS: a consensus sequence and a position-specific scoring matrix (PSSM). The example used here is the binding site of the early growth response protein 1 (EGR-1, Zif268), which is a zinc finger protein.
| CONSENSUS | | T | G | C | G | T | G | G | G | C | G |
| POSITION | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| SCORING MATRIX | A | 5 | 7 | 0 | 2 | 0 | 31 | 0 | 0 | 13 | 0 |
| | C | 3 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | | |
| | G | 5 | 0 | 14 | 0 | | | | | | |
| | T | 0 | 2 | 0 | 0 | 0 | 0 | 11 | 0 | | |
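To make the relationship between the two representations concrete, the following Python sketch derives a count matrix, a consensus sequence, and a log-odds PSSM from a handful of aligned binding sites, then scans a longer sequence for the best-scoring window. It is only an illustration under stated assumptions: the sites in `SITES` are invented for this example and are not the EGR-1 counts from the table above, and the uniform background frequency of 0.25, the pseudocount of 1, and all function names are choices made here rather than anything specified in the review.

```python
import math

BASES = "ACGT"

# Hypothetical aligned binding sites, invented for illustration only.
# They are NOT the EGR-1 data from the table above.
SITES = [
    "TGCGTG", "TGCGTG", "TGCGTA", "AGCGTG",
    "TGCGGG", "TGAGTG", "CGCGTG", "TGCGTG",
]


def count_matrix(sites):
    """Count how often each base occurs at each aligned position."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    counts = {base: [0] * length for base in BASES}
    for site in sites:
        for i, base in enumerate(site):
            counts[base][i] += 1
    return counts


def consensus(counts):
    """Pick the most frequent base at each position (ties go to the first in ACGT order)."""
    length = len(counts["A"])
    return "".join(max(BASES, key=lambda b: counts[b][i]) for i in range(length))


def log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2-odds scores against a uniform background (an assumption)."""
    length = len(counts["A"])
    totals = [sum(counts[b][i] for b in BASES) + 4 * pseudocount for i in range(length)]
    return {
        b: [math.log2((counts[b][i] + pseudocount) / totals[i] / background)
            for i in range(length)]
        for b in BASES
    }


def score(window, pssm):
    """Sum the per-position scores of a window with the same width as the matrix."""
    return sum(pssm[base][i] for i, base in enumerate(window))


def scan(sequence, pssm):
    """Score every window of the sequence; returns (start index, score) pairs."""
    width = len(pssm["A"])
    return [(i, score(sequence[i:i + width], pssm))
            for i in range(len(sequence) - width + 1)]


counts = count_matrix(SITES)
pssm = log_odds(counts)
best_start, best_score = max(scan("AATGCGTGCC", pssm), key=lambda hit: hit[1])
print(consensus(counts))  # TGCGTG
print(best_start)         # 2 (0-based start of the best-scoring window)
```

The consensus keeps only the most frequent base per column, whereas the matrix retains how strongly each position prefers each base, so a site that deviates from the consensus at a weakly conserved position still receives a high score.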