Pieter Meysman, Kathleen Marchal, Kristof Engelen.
Abstract
It has long been known that DNA molecules encode information at various levels. The most basic level comprises the base sequence itself and is primarily important for the encoding of proteins and direct base recognition by DNA-binding proteins. A more elusive level consists of the local structural properties of the DNA molecule, wherein the DNA sequence plays only an indirect, supportive role. These properties are nevertheless an important factor in a large number of biomolecular processes and can be considered informative signals for the presence of a variety of genomic features. Several recent studies have unequivocally shown the benefit of relying on such DNA properties for modeling and predicting genomic features as diverse as transcription start sites, transcription factor binding sites, or nucleosome occupancy. This review is meant to provide an overview of the key aspects of these DNA conformational and physicochemical properties. To illustrate their potential added value compared to relying solely on the nucleotide sequence in genomics studies, we discuss their application in research on transcription regulation mechanisms as representative cases.
Keywords: DNA structure; functional genomics; structural scales; transcription
Year: 2012 PMID: 22837642 PMCID: PMC3399529 DOI: 10.4137/BBI.S9426
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
Examples of structural properties.
| Structural property | Description | Category |
|---|---|---|
| Slide-rise-tilt-roll-twist-shift | The rotational and translational deviations present in DNA base pair steps. | Conformational |
| Curvature | The large scale curves made by the DNA molecule. They are often derived from the base pair step deviations as per the wedge model. The value of these scales typically corresponds to the intensity of the DNA curvature. | Conformational |
| Minor/major groove depth/width | The size of the minor and/or major groove, with larger grooves usually allowing easier access to the bases within the helix. | Conformational |
| A/Z-philicity | The propensity of the DNA molecule to adopt the A-form or Z-form. Often estimated based on the difference in free energy of these forms. | Conformational |
| Propeller twist | Though intrinsically a conformational property, there is a direct link between the twist of the DNA base pairs and its rigidity towards deformations. | Conformational |
| Persistence length | The molecular distance that the DNA molecule is expected to keep directionality. Also referred to as the DNA bending stiffness. | Physicochemical |
| DNA stability | Several measures for the DNA helical stability exist, often enumerating the energy theoretically needed per base pair step to disrupt or create the DNA helix. | Physicochemical |
| Stress-induced duplex stability | The stability of the DNA helix which accounts for the torsional stress resulting from the superhelical winding. | Physicochemical |
| Base stacking energy | The stacking energy of sequential bases that contributes to the overall stability of the DNA helix. | Physicochemical |
| Deformability | Deviations accepted by the DNA molecule in response to protein binding. The inverse of these scales is the rigidity, ie, the resistance towards these deviations. | Physicochemical |
| Bendability | Usually refers to the propensity of the DNA molecule to bend or be bent in a specific direction. For example, the Brukner scale enumerates the bendability towards the major groove. | Physicochemical |
Figure 1. Modeling structural properties of the DNA.
Notes: Along the length of the DNA all oligonucleotides of a certain order (usually di- or trinucleotides) are looked up in a table (called a structural scale) which contains corresponding values measuring a certain structural property. These structural scales can represent conformational or physicochemical structural properties, and when viewed along the length of the DNA form a structural profile. Due to the discrete nature of the scale values, a structural profile usually has a staircase-like appearance (full line; the S axis represents the structural property values obtained from the lookup scales). Often these profiles are further smoothed (dotted line), or one might take the average value over a stretch of DNA of a given length (horizontal dashed lines) before they are put to use.
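The lookup procedure described in these notes can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `TOY_SCALE` is a hypothetical placeholder (it simply counts G/C bases per dinucleotide) and stands in for a real published structural scale, and the function names `structural_profile`, `smooth`, and `region_average` are assumptions made for this example.

```python
from itertools import product

# Hypothetical toy scale: number of G/C bases in each dinucleotide.
# Placeholder only; a real structural scale would supply measured values.
TOY_SCALE = {
    a + b: float((a in "GC") + (b in "GC"))
    for a, b in product("ACGT", repeat=2)
}

def structural_profile(seq, scale, k=2):
    """Look up every overlapping k-mer of seq in the scale (staircase profile)."""
    seq = seq.upper()
    return [scale[seq[i:i + k]] for i in range(len(seq) - k + 1)]

def smooth(profile, window=3):
    """Centred moving average; windows are truncated at the profile ends."""
    half = window // 2
    out = []
    for i in range(len(profile)):
        chunk = profile[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def region_average(profile, start, length):
    """Average scale value over a stretch of the profile."""
    chunk = profile[start:start + length]
    return sum(chunk) / len(chunk)

profile = structural_profile("ATGCGC", TOY_SCALE)
smoothed = smooth(profile)
mean_value = region_average(profile, 0, len(profile))
```

The moving average is only one possible smoothing choice; the notes do not specify which smoothing method is used in practice.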
Figure 2. Predicting DNA binding from DNA structural properties.
Notes: In the case of the consensus approach (left hand panel), all known binding sites are used to generate a consensus profile (red line, representing the structural property stability in this example), which in turn can be used to predict novel binding sites. The consensus profile is an average of structural profiles of aligned known binding sites (grey lines). In DNA threading (right hand panel), a sliding window moves along the DNA calculating the energy required (ΔE axis) for the given stretch of DNA to adopt the required conformation based on structural profiles, in this figure represented by the deformability (green line).
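As a rough sketch of the consensus approach only, the example below averages the profiles of aligned known sites and then slides the resulting consensus along a candidate profile. The mean absolute distance used for scoring is an assumption chosen for simplicity, as are the function names and the profile values, which are made up for illustration. The energy calculation used in DNA threading is not covered here.

```python
def consensus_profile(profiles):
    """Position-wise mean of equal-length profiles of aligned known sites."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]

def scan(profile, consensus):
    """Slide the consensus along a profile.

    Returns one (offset, mean absolute distance) pair per window;
    a lower distance means the window resembles the consensus more closely.
    """
    width = len(consensus)
    scores = []
    for i in range(len(profile) - width + 1):
        window = profile[i:i + width]
        dist = sum(abs(x - c) for x, c in zip(window, consensus)) / width
        scores.append((i, dist))
    return scores

# Hypothetical structural profiles of two aligned known binding sites.
known_sites = [[1.0, 2.0, 1.0], [1.0, 2.0, 3.0]]
consensus = consensus_profile(known_sites)

# Hypothetical structural profile of a longer candidate region.
candidate = [0.0, 1.0, 2.0, 2.0, 0.0]
best_offset, best_dist = min(scan(candidate, consensus), key=lambda s: s[1])
```

In practice a threshold on the distance, or one of the statistical models listed in the following table, would decide whether a window counts as a predicted site.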
Summary of DNA-binding protein consensus approaches.
| Name | Statistical model | # Properties | Additional information |
|---|---|---|---|
| Karas et al | Absolute distance between profiles | 4 conformational + 1 physicochemical properties | |
| ACTIVITY/B–DNA–VIDEO | Gaussian distribution | 35 structural properties | |
| Liu et al | Gaussian distribution | 5 conformational properties | |
| Gunewardena et al | Linear discrimination model | 35 structural properties | Linear steady-state structural templates |
| SITECON | Chi-squared test | 35 structural properties | |
| Gardiner et al | Fourier transformation | 5 conformational + 18 physicochemical properties | |
| ICSF | Moses rank-like test | 6 conformational properties | |
| MDS-HMM | Hidden Markov model | 2 conformational properties | |
| ProMapper | Bayesian network | 35 structural properties | Additional sequence features, eg, base composition |
| Holloway et al | Support vector machine | 1 conformational + 4 physicochemical properties | Additional sequence features, eg, phylogeny |
| DISCOVER | Conditional random fields | 1 physicochemical property (DNA stability) | Additional sequence features, eg, phylogeny |
| GANN | Neural network + Genetic algorithm | Unspecified | |
| CRoSSeD | Conditional random fields | 5 conformational + 7 physicochemical properties | |
| SiteSleuth | Support vector machine | 12 conformational + 62 physicochemical properties | Physicochemical properties reduced to 8 eigenvectors |
Summary of structure-based promoter prediction methods.
| Name | Statistical model | Structural property | Organism(s) |
|---|---|---|---|
| EP3 | Average profile | Base stacking energy | Animals, fungi, algae, higher plants and protists |
| ProSOM | Unsupervised self organizing map | Base stacking energy | Human |
| Florquin et al | Adaptive quality-based clustering | 5 conformational + 8 physicochemical properties | Human, mouse and plant |
| PNNP | Pattern-based nearest neighbor search | DNA stability | Human, mouse, |
| PromPredict | Absolute and relative difference | DNA stability | Plants and prokaryotes |
| McPromoter | Stochastic segment model | DNA twist and persistence length | |
| Prostar | Mahalanobis distance | DNA deformability | Human |
| Profisi | Average profile | DNA stability | Human |
| ARTS | Support vector machine | DNA twist and base stacking energy | Human |
| Gardiner et al | Ward’s clustering algorithm | 5 conformational + 18 physicochemical properties | Human |
| Wang et al | Linear discrimination model | Stress-induced duplex destabilization | |
| N4 | Neural network | DNA stability | |
| Conilione et al | Neural network | Base stacking energy | |
| Parbhane et al | Neural network | DNA wedge and twist | |
| Mallios et al | Stepwise binary logistic regression | 2 conformational + 2 physicochemical properties | |
Figure 3. Conceptual representation of structural features of eukaryotic and prokaryotic promoters.
Notes: Eukaryotic proximal promoters (left hand panel) are generally characterized by an increased stability (red line; the S axis represents the structural property values) and a decreasing rigidity (green line) compared to surrounding regions (although eukaryotic promoters on average still have a higher rigidity than the rest of the genome), with strong peaks or valleys at functional sites such as the TSS or TATA box. In contrast, prokaryotic promoters on average have a decreased stability (red line) and increased rigidity (green line) and have been observed to show a broad curvature peak (blue line) upstream of the TSS.