Rithun Mukherjee, Perry Evans, Larry N Singh, Sridhar Hannenhalli.
Abstract
Transcriptional regulation critically depends on proper interactions between transcription factors (TFs) and their cognate DNA binding sites. The widely used model of TF-DNA binding, the positional weight matrix (PWM), presumes independence between positions within the binding site. However, there is evidence that the independence assumption may not always hold, and the extent of interposition dependence is not fully known. We hypothesize that interposition dependence should be partly manifested as correlated evolution at the positions. We report a maximum-likelihood (ML) approach to infer correlated evolution at any two positions within a PWM, based on a multiple alignment of 5 mammalian genomes. Application to a genome-wide set of putative cis elements in human promoters reveals a prevalence of correlated evolution within cis elements. We found that the interdependence between two positions decreases with increasing distance between them. Interdependent positions tend to be evolutionarily more constrained; moreover, the dependence patterns are relatively similar across structurally related transcription factors. Although some of the detected mutational dependencies may be due to context-dependent genomic hypermutation, notably CG to TG, the majority are likely due to context-dependent preferences for specific nucleotide combinations within the cis elements. Patterns of evolution at individual nucleotide positions within mammalian TF binding sites are often significantly correlated, suggesting interposition dependence. The proposed methodology is also applicable to other classes of non-coding functional elements. A detailed investigation of mutational dependencies within specific motifs could reveal preferred nucleotide combinations that may help refine DNA binding models.
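To make the independence assumption concrete, the following is a minimal sketch of PWM scoring, assuming a made-up 3-position matrix and a uniform background (both hypothetical, not taken from the paper). Because the score is a sum of per-position log-odds terms, changing one base shifts the score by the same amount whatever the other positions hold, so a PWM cannot express a preference for a particular combination of bases at two positions.

```python
import math

# Hypothetical 3-position PWM (made-up values): one distribution over A/C/G/T
# per position. Real PWMs are estimated from known binding sites.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
# Assumed uniform background nucleotide frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def pwm_log_odds(site, pwm=PWM, background=BACKGROUND):
    """Log2-odds score of a site: a sum of independent per-position terms."""
    if len(site) != len(pwm):
        raise ValueError("site length must match the PWM width")
    # Summing per-position log-odds is the same as multiplying per-position
    # probabilities, i.e. the positions are treated as independent.
    return sum(math.log2(column[base] / background[base])
               for column, base in zip(pwm, site))


# Additivity: changing position 1 from A to C shifts the score by the same
# amount in two different contexts at positions 2-3 ("GT" and "GA").
delta_in_context_gt = pwm_log_odds("AGT") - pwm_log_odds("CGT")
delta_in_context_ga = pwm_log_odds("AGA") - pwm_log_odds("CGA")
print(round(delta_in_context_gt, 6), round(delta_in_context_ga, 6))
```

A model that captured interposition dependence would instead need parameters for base combinations (for example, dinucleotide terms), which is the kind of refinement the abstract suggests correlated evolution could inform.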
Year: 2013 PMID: 23408994 PMCID: PMC3568137 DOI: 10.1371/journal.pone.0055521
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Overview of the approach.
The top left panel depicts positions i and j within a binding site for a PWM M that has N genome-wide matches. The binding site in human is shown in the context of a 5-species multiple alignment, together with the phylogenetic trees for the two positions. The phylogenetic parameters are estimated from the genome-wide set of promoter alignments (top right panel). The likelihoods of ancestral nodes for a specific PWM and position (or position pair) are then estimated from the N instances and the phylogenetic parameters. From these, the likelihoods of individual trees and tree pairs, and ultimately the CoEvol score for a pair of PWM positions, are estimated as detailed in the text. The lower right panel illustrates how the Shuffle control is generated: the central row depicts the N instances of position pair (i,j), and in the lower row a random j-position is paired with each of the i-positions.
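The Shuffle control described in the caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `shuffle_control` and the example columns are hypothetical, and the caption does not say whether the random j-positions are drawn with or without replacement, so the sketch assumes a random permutation (without replacement).

```python
import random
from typing import List, Optional, Sequence, Tuple


def shuffle_control(pairs: Sequence[Tuple[str, str]],
                    rng: Optional[random.Random] = None) -> List[Tuple[str, str]]:
    """Build a Shuffle control from N instances of a position pair (i, j).

    Each element of `pairs` holds the aligned column at position i and the
    aligned column at position j for one genome-wide PWM match. The j columns
    are randomly permuted and re-paired with the i columns, which breaks any
    within-site i-j association while leaving each position's set of columns
    unchanged. (Permutation is an assumption; the source only says a random
    j-position is paired with each i-position.)
    """
    rng = rng if rng is not None else random.Random()
    i_columns = [i_col for i_col, _ in pairs]
    j_columns = [j_col for _, j_col in pairs]
    rng.shuffle(j_columns)  # in-place permutation (sampling without replacement)
    return list(zip(i_columns, j_columns))


# Hypothetical 5-species alignment columns (human first) for N = 4 instances.
instances = [("AAAAA", "CCCCC"), ("GGGGA", "TTTTT"),
             ("AAAAG", "CCCCT"), ("TTTTT", "GGGGG")]
control = shuffle_control(instances, random.Random(0))
print(control)
```

Permuting rather than resampling keeps the per-position column composition identical to the foreground, so any difference in CoEvol between foreground and control should reflect the within-site pairing rather than composition.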
Figure 2. The distribution of CoEvol values for scope 1.
Shown are CoEvol values for the Foreground and for the Random, RandomContext and Shuffle controls. Values for RandomContext are included after correcting for a shift in the distribution of tree likelihoods (see Methods for details). Relative to the most stringent control (Shuffle), the Foreground CoEvol values are significantly greater (Mann-Whitney U test p-value = 0.02; Kolmogorov-Smirnov test p-value = 1.1e-12).
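As a rough illustration of what the two tests compare, the sketch below computes the Mann-Whitney U statistic (a rank-based comparison of the two samples) and the two-sample Kolmogorov-Smirnov D statistic (the largest gap between the empirical distribution functions) in plain Python. The CoEvol values are hypothetical, and only the statistics are computed; the p-values quoted in the caption additionally require the null distribution, which libraries such as SciPy provide through `scipy.stats.mannwhitneyu` and `scipy.stats.ks_2samp`. Whether the paper used one- or two-sided tests is not stated here.

```python
from bisect import bisect_right
from typing import List, Sequence


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic for x: number of (x, y) pairs with x > y, ties counting 1/2."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D: max gap between the empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    # Both ECDFs are step functions that only jump at observed values, so
    # checking every observed value finds the supremum.
    return max(abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
               for v in xs + ys)


# Hypothetical CoEvol values, for illustration only.
foreground = [0.9, 1.4, 2.2, 3.1, 0.7]
shuffle = [0.2, 0.8, 0.5, 1.1, 0.3]
print(mann_whitney_u(foreground, shuffle), ks_statistic(foreground, shuffle))
```

The U test is most sensitive to a shift in location between the two distributions, while D responds to any difference in shape, which is one reason the two p-values in the caption can differ by many orders of magnitude.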
Example to illustrate the concept of normalized weight (or probability) of ancestral assignments.
| Ancestral node assignment | Nucleotide | Site 1 | Site 2 | Site 3 | Site 4 | Site 5 | Sum over sites | Normalized probability |
|---|---|---|---|---|---|---|---|---|
| 1 | A | 0.001 | 0.05 | 0.0003 | 0.003 | 0.01 | 0.0643 | 0.0834 |
| 2 | T | 0.01 | 0.0007 | 0.009 | 0.08 | 0.1 | 0.1997 | 0.259021 |
| 3 | G | 0.00008 | 0.003 | 0.006 | 0.004 | 0.0009 | 0.01398 | 0.018133 |
| 4 | C | 0.023 | 0.018 | 0.065 | 0.3 | 0.087 | 0.493 | 0.639446 |
We present a simplified situation of two species related through a common ancestor, where the evolutionary tree has a single internal node representing the ancestor, with four possible ancestral assignments. For a sample PWM with 5 sites aligned across the two species, we provide representative values (the columns for sites 1 to 5) for the probability of the tree corresponding to each site given a particular ancestral assignment. From these we work out the overall probability of an ancestral assignment given the data (last column): the per-site values for each assignment are summed, and each sum is divided by the total over all four assignments (0.77098). For details, see the text in the Materials and Methods section that references this table.
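The normalization in this table can be reproduced in a few lines. The sketch below sums each assignment's per-site tree probabilities and divides by the total over all assignments, which gives the last column (0.0834, 0.259021, 0.018133, 0.639446). The table's per-site values are described as representative, so the source does not say how they were obtained; purely as a hypothetical stand-in, the sketch also shows how such a value could be computed for a two-species tree with one internal node under the Jukes-Cantor (JC69) substitution model. The functions `jc69` and `two_species_tree_prob` and any branch lengths passed to them are assumptions, not the paper's phylogenetic model.

```python
import math
from typing import Dict, List

# Per-site tree probabilities P(tree at site s | ancestral assignment),
# copied from the table (sites 1 to 5).
TREE_PROBS: Dict[str, List[float]] = {
    "A": [0.001, 0.05, 0.0003, 0.003, 0.01],
    "T": [0.01, 0.0007, 0.009, 0.08, 0.1],
    "G": [0.00008, 0.003, 0.006, 0.004, 0.0009],
    "C": [0.023, 0.018, 0.065, 0.3, 0.087],
}


def ancestral_weights(tree_probs: Dict[str, List[float]]) -> Dict[str, float]:
    """Sum each assignment's per-site values, then normalize across assignments."""
    totals = {a: sum(values) for a, values in tree_probs.items()}
    grand_total = sum(totals.values())  # 0.77098 for the table values
    return {a: total / grand_total for a, total in totals.items()}


def jc69(same_base: bool, t: float) -> float:
    """Hypothetical stand-in model: Jukes-Cantor probability of a child base
    given its parent base after branch length t (substitutions per site)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same_base else 0.25 - 0.25 * decay


def two_species_tree_prob(ancestor: str, base1: str, base2: str,
                          t1: float, t2: float) -> float:
    """P(base1, base2 | ancestor) for a tree with a single internal node:
    the two branches evolve independently given the ancestor."""
    return jc69(ancestor == base1, t1) * jc69(ancestor == base2, t2)


weights = ancestral_weights(TREE_PROBS)
print({a: round(w, 6) for a, w in weights.items()})
```

With real data, a function like `two_species_tree_prob` would fill the site columns for each ancestral base, and `ancestral_weights` would turn them into the normalized probabilities shown in the last column.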