Jaebum Kim, Xin He, Saurabh Sinha.
Abstract
Characterization of the evolutionary constraints acting on cis-regulatory sequences is crucial to comparative genomics and provides key insights on the evolution of organismal diversity. We study the relationships among orthologous cis-regulatory modules (CRMs) in 12 Drosophila species, especially with respect to the evolution of transcription factor binding sites, and report statistical evidence in favor of key evolutionary hypotheses. Binding sites are found to have position-specific substitution rates. However, the selective forces at different positions of a site do not act independently, and the evidence suggests that constraints on sites are often based on their exact binding affinities. Binding site loss is seen to conform to a molecular clock hypothesis. The rate of site loss is transcription factor-specific and depends on the strength of binding and, in some cases, the presence of other binding sites in close proximity. Our analysis is based on a novel computational method for aligning orthologous CRMs on a tree, which rigorously accounts for alignment uncertainties and exploits binding site predictions through a unified probabilistic framework. Finally, we report weak purifying selection on short deletions, providing important clues about overall spatial constraints on CRMs. Our results present a complex picture of regulatory sequence evolution, with substantial plasticity that depends on a number of factors. The insights gained in this study will help us to understand the combinatorial control of gene regulation and how it evolves. They will pave the way for theoretical models that are cognizant of the important determinants of regulatory sequence evolution and will be critical in genome-wide identification of non-coding sequences under purifying or positive selection.
Year: 2009 PMID: 19132088 PMCID: PMC2607023 DOI: 10.1371/journal.pgen.1000330
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Correlation between the specificity of a TFBS position and its evolutionary rate, with Pecan alignments.
| Factor | Number of TFBSs | Width of motif | Correlation coefficient | P-value |
| bcd | 160 | 8 | −0.75 | |
| cad | 175 | 9 | −0.48 | 0.0969 |
| dstat | 129 | 9 | −0.83 | |
| hb | 170 | 8 | −0.69 | |
| kni | 85 | 12 | −0.82 | |
| kr | 177 | 11 | −0.53 | |
| tll | 185 | 10 | −0.38 | 0.1375 |
Spearman's correlation coefficient.
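The table above reports Spearman's correlation between the specificity of each motif position and its evolutionary rate. As a rough illustration of that statistic only, the sketch below computes a Spearman coefficient from per-position values in plain Python. The variable names and the example numbers are hypothetical and are not taken from the paper; how specificity and rate were actually measured is not stated in this excerpt.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for an 8-position motif (NOT data from the paper):
# a specificity score per position and a substitution rate per position.
specificity = [1.8, 0.4, 1.2, 2.0, 0.9, 0.2, 1.5, 0.7]
subst_rate = [0.05, 0.40, 0.15, 0.03, 0.22, 0.45, 0.10, 0.30]
rho = spearman(specificity, subst_rate)
print(f"Spearman rho = {rho:.2f}")
```

A negative coefficient, as in every row of the table, means that more specific positions tend to change more slowly.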
Figure 1. Correlation between the specificity of a TFBS position and its evolutionary rate in transcription factors Dstat and Kni, with Pecan alignments.
Comparison of HB and SS models.
| Factor | Median SSE (HB model) | Median SSE (SS model) | P-value | Optimal 4Ns (SS model) |
| bcd | 0.19 | 0.10 | <2.20E-16 | 8 |
| cad | 0.23 | 0.16 | <2.20E-16 | 8 |
| dstat | 0.12 | 0.06 | <2.20E-16 | 11 |
| hb | 0.10 | 0.07 | <2.20E-16 | 15 |
| kni | 0.21 | 0.15 | <2.20E-16 | 19 |
| kr | 0.19 | 0.15 | <2.20E-16 | 8 |
| tll | 0.18 | 0.10 | <2.20E-16 | 17 |
Median values of sum of squared errors (SSE) from 100 different simulations with the model.
P-value from paired Wilcoxon signed-rank test.
Optimal value of the free parameter of SS model.
Figure 2. Distributions of evolutionary changes in observed binding sites (Observed), and those simulated by the Halpern-Bruno (HB) and Site-level Selection (SS) models, for the transcription factor Bcd in the D. melanogaster and D. yakuba species pair.
(A) Distribution of energy difference between a predicted binding site in D. melanogaster and its orthologous site in D. yakuba. The x and y axes represent energy difference and frequency respectively. (B) Distribution of the number of substitutions between D. melanogaster and D. yakuba sites. The x and y axes represent the number of substitutions and frequency respectively. SSE denotes the sum of squared errors between the observed and the simulation-based distributions and “4Ns” denotes the optimal value of this free parameter of the SS model.
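The SSE described in this caption compares an observed distribution with a simulated one. The sketch below shows one plausible way to compute it for the substitution-count distribution of panel B: both samples are turned into frequency vectors over the same support and the squared differences are summed. The binning, the function names, and the example counts are assumptions made for illustration, not the authors' code or data.

```python
from collections import Counter


def frequency_distribution(samples, support):
    """Fraction of samples falling on each value in `support`."""
    tally = Counter(samples)
    total = len(samples)
    return [tally[value] / total for value in support]


def sum_squared_errors(observed_freq, simulated_freq):
    """Sum of squared differences between two aligned frequency vectors."""
    return sum((o - s) ** 2 for o, s in zip(observed_freq, simulated_freq))


# Hypothetical substitution counts per orthologous site pair (NOT paper data).
observed_counts = [0, 0, 1, 1, 1, 2, 3, 0]
simulated_counts = [0, 1, 1, 2, 2, 2, 3, 1]
support = range(0, 4)  # 0 to 3 substitutions

observed = frequency_distribution(observed_counts, support)
simulated = frequency_distribution(simulated_counts, support)
sse = sum_squared_errors(observed, simulated)
print(f"SSE = {sse}")
```

Under this reading, a lower SSE means the simulated distribution tracks the observed one more closely, which is how the HB and SS columns of the comparison table are interpreted.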
Figure 3. The fraction of D. melanogaster TFBSs that are conserved in a related species (y-axis), as a function of the divergence time to that species (x-axis), for transcription factors Cad and Dstat.
Goodness-of-fit of a linear model for the fraction of conserved binding sites over divergence time.
| Factor | R² (raw data) | Adjusted R² (corrected data) | FP |
| bcd | 0.9813 | 0.9631 | 0.14 |
| cad | 0.9857 | 0.9693 | 0.29 |
| dstat | 0.9913 | 0.9831 | 0.26 |
| hb | 0.9114 | 0.9180 | 0.24 |
| kni | 0.9642 | 0.9883 | 0.31 |
| kr | 0.9698 | 0.9097 | 0.27 |
| tll | 0.9894 | 0.9515 | 0.32 |
R² from raw data without correcting for the false positive rate.
Adjusted R² from data corrected for the false positive rate.
Estimated false positive rate obtained by regression.
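The goodness-of-fit values above come from fitting a line to the fraction of conserved sites against divergence time. The following minimal sketch shows an ordinary least-squares fit and the resulting coefficient of determination. The data points are hypothetical placeholders, and the false-positive correction mentioned in the footnotes is not modeled here because its details are not given in this excerpt.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical divergence times and conserved fractions (NOT paper data).
divergence_time = [5, 10, 25, 40, 60]
conserved_fraction = [0.95, 0.90, 0.78, 0.65, 0.50]

slope, intercept = linear_fit(divergence_time, conserved_fraction)
r2 = r_squared(divergence_time, conserved_fraction, slope, intercept)
print(f"slope = {slope:.4f}, intercept = {intercept:.3f}, R^2 = {r2:.4f}")
```

A negative slope with R² close to 1 is what a clock-like, roughly constant rate of site loss would look like in such a plot.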
Comparison of loss rates of binding sites using real and random motifs.
| Factor | Loss rate (real motif) | Random PWMs: mean | Random PWMs: stdev |
| bcd | 0.1865 | 0.2530 | 0.0217 |
| cad | 0.1969 | 0.2444 | 0.0213 |
| dstat | 0.2471 | 0.2642 | 0.0172 |
| hb | 0.1470 | 0.1937 | 0.0211 |
| kni | 0.2315 | 0.2551 | 0.0170 |
| kr | 0.1811 | 0.2666 | 0.0172 |
| tll | 0.2147 | 0.2389 | 0.0191 |
These rates are without false positive correction.
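One simple way to read this table is to express each real-motif loss rate as a number of standard deviations from the random-PWM mean. The sketch below does that with the values copied from the table. The z-score is offered here only as a reading aid and is an assumption on our part; the excerpt does not say which test, if any, the authors applied to these numbers.

```python
# (loss rate with real motif, random-PWM mean, random-PWM stdev),
# copied from the table above.
LOSS_RATES = {
    "bcd": (0.1865, 0.2530, 0.0217),
    "cad": (0.1969, 0.2444, 0.0213),
    "dstat": (0.2471, 0.2642, 0.0172),
    "hb": (0.1470, 0.1937, 0.0211),
    "kni": (0.2315, 0.2551, 0.0170),
    "kr": (0.1811, 0.2666, 0.0172),
    "tll": (0.2147, 0.2389, 0.0191),
}


def z_score(observed, mean, stdev):
    """Distance of `observed` from `mean` in units of `stdev`."""
    return (observed - mean) / stdev


z_scores = {factor: z_score(*row) for factor, row in LOSS_RATES.items()}

# List factors from the most to the least depleted relative to random PWMs.
for factor, z in sorted(z_scores.items(), key=lambda item: item[1]):
    print(f"{factor}: z = {z:.2f}")
```

Every real motif has a lower loss rate than its random-PWM mean, so all of these z-scores are negative.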
Correlation between TFBS strength and TFBS turnover rate.
| Factor | Number of TFBS sets | Correlation coefficient | P-value | Random PWM |
| bcd | 163 | −0.71 | | 0 |
| cad | 168 | −0.30 | 0.0974 | 18 |
| dstat | 129 | −0.46 | | 11 |
| hb | 168 | −0.62 | | 0 |
| kni | 86 | −0.58 | | 4 |
| kr | 191 | −0.72 | | 0 |
| tll | 188 | −0.86 | | 0 |
Spearman's correlation coefficient.
Number of random PWMs (out of 100 simulations) that show greater correlation than the real motif.
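The "Random PWM" column counts how many of 100 random motifs showed a greater correlation than the real motif, so it can be read as an empirical tail fraction. The short sketch below converts the counts from the table into such fractions. Treating the count divided by 100 as an empirical P-value is our interpretation, not a procedure stated in the excerpt.

```python
N_SIMULATIONS = 100  # stated in the footnote above

# Counts copied from the "Random PWM" column of the table above.
EXCEED_COUNTS = {
    "bcd": 0, "cad": 18, "dstat": 11, "hb": 0, "kni": 4, "kr": 0, "tll": 0,
}


def empirical_fraction(count, n_simulations=N_SIMULATIONS):
    """Fraction of random PWMs that outperformed the real motif."""
    return count / n_simulations


empirical = {factor: empirical_fraction(c) for factor, c in EXCEED_COUNTS.items()}
for factor, fraction in empirical.items():
    print(f"{factor}: {fraction:.2f}")
```

A fraction of 0 only means that none of the 100 random motifs did better, so the resolution of this estimate is limited to steps of 0.01.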
Correlation between the distance between two adjacent homotypic sites and TFBS turnover rate.
| Factor | Number of TFBSs | Correlation coefficient | P-value |
| bcd | 157 | 0.04 | 0.3969 |
| cad | 162 | 0.38 | |
| dstat | 112 | 0.00 | 0.5000 |
| hb | 156 | 0.30 | |
| kni | 82 | 0.24 | 0.1270 |
| kr | 183 | 0.14 | 0.2212 |
| tll | 178 | 0.30 | |
Spearman's correlation coefficient.
Binding site conservation and its spatial context.
| Factor | P vs D | O vs NO |
| bcd | 0.9981 | 0.5910 |
| cad | 0.5626 | |
| dstat | | 0.8981 |
| hb | 0.2141 | |
| kni | 0.4784 | 0.2425 |
| kr | 0.2071 | |
| tll | | 0.0806 |
Numbers are P-values from hypergeometric test.
P means proximal and D means distal.
O means overlap and NO means non-overlap.
*: The opposite p-value is 0.0124.
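The P-values in this table come from a hypergeometric test. As a generic illustration of that test, the sketch below computes both tails of a hypergeometric distribution with exact integer arithmetic. How the authors defined the population and the "success" class (for example, proximal versus distal sites) is not given in this excerpt, so the parameter names and the example counts are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(X = k) when drawing `draws` items without replacement."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k): enrichment of successes among the drawn items."""
    top = min(successes, draws)
    return sum(hypergeom_pmf(i, population, successes, draws)
               for i in range(max(k, 0), top + 1))


def hypergeom_lower_tail(k, population, successes, draws):
    """P(X <= k): depletion of successes among the drawn items."""
    top = min(k, successes, draws)
    return sum(hypergeom_pmf(i, population, successes, draws)
               for i in range(0, top + 1))


# Hypothetical counts (NOT paper data): 100 sites in total, 40 of them
# proximal to another site, 30 conserved, 15 both proximal and conserved.
p_enriched = hypergeom_upper_tail(15, population=100, successes=40, draws=30)
p_depleted = hypergeom_lower_tail(15, population=100, successes=40, draws=30)
print(f"P(X >= 15) = {p_enriched:.4f}, P(X <= 15) = {p_depleted:.4f}")
```

The two tails correspond to opposite hypotheses, which is presumably what the footnote about an "opposite p-value" refers to.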