Abstract
Understanding transcriptional regulation is a major goal of molecular biology. Motif expression decomposition (MED) was recently introduced to describe the expression level of a gene as the sum of the products of the binding strengths of its cis-regulatory motifs and the activities of the corresponding trans-acting transcription factors (TFs). Here, we use computer simulation to examine the accuracy of MED. We found that although MED accurately rebuilds gene expression levels from the decomposed motif binding strengths and TF activities, its estimates of the absolute motif binding strengths and TF activities are unreliable. Nonetheless, MED provides accurate estimates of the relative binding strengths of the same motif in different genes and of the relative activities of the same TF under different conditions. We found that reasonably accurate results are achievable with genome-wide expression data from only 30 conditions and that MED results are robust to unknown occurrences of known motifs, although they are less robust to the presence of unknown motifs. With these understandings, judicious use of MED will likely provide useful information about eukaryotic transcriptional regulation. As an example, MED results are used to demonstrate that motifs generally have higher binding strengths when appearing in multiple copies than when appearing in one copy per promoter.
Year: 2008 PMID: 18411204 PMCID: PMC2425491 DOI: 10.1093/nar/gkn127
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
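The abstract describes MED as modelling expression as a sum of products of motif binding strengths and TF activities, and reports that absolute estimates are unreliable while same-motif ratios are accurate. The sketch below illustrates one generic reason a product decomposition can behave this way: rescaling a motif's column and inversely rescaling the matching TF's row leaves the reconstructed expression unchanged. This is a toy illustration, not the MED algorithm itself; the matrix names `M_true`, `A_true` and `E`, the sizes, and the scale factors are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_motifs, n_conditions = 50, 5, 30  # arbitrary toy sizes

# Hypothetical "true" binding strengths (genes x motifs)
# and TF activities (motifs x conditions).
M_true = rng.normal(size=(n_genes, n_motifs))
A_true = rng.normal(size=(n_motifs, n_conditions))

# Expression as the sum over motifs of binding strength times TF activity.
E = M_true @ A_true

# Arbitrary non-zero per-motif scale factors (some negative).
scale = np.array([2.0, -0.5, 3.0, -1.0, 0.25])
M_alt = M_true * scale            # rescale each motif column
A_alt = A_true / scale[:, None]   # inversely rescale the matching TF row

# The alternative factors reproduce the same expression matrix.
same_expression = np.allclose(M_alt @ A_alt, E)

# Ratios of the SAME motif between two genes are unaffected by the rescaling.
ratio_true = M_true[0] / M_true[1]
ratio_alt = M_alt[0] / M_alt[1]

# Ratios between two DIFFERENT motifs do change.
cross_true = M_true[:, 0] / M_true[:, 1]
cross_alt = M_alt[:, 0] / M_alt[:, 1]

print("expression preserved:", same_expression)
print("same-motif ratios preserved:", np.allclose(ratio_true, ratio_alt))
print("different-motif ratios preserved:", np.allclose(cross_true, cross_alt))
```

In this toy setting the expression data cannot distinguish `M_true` from `M_alt`, which is consistent with the pattern reported in the abstract, although the source does not state that this is the only cause.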
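Pearson's correlation coefficients (± standard deviation) between the true values and MED-predicted values of expression levels, motif binding strengths and TF activities
| Noise level (%) | Expression levels | Motif binding strengths | Relative binding strengths, same motif (a) | Relative binding strengths, different motifs (b) | TF activities | Relative TF activities, same TF (c) | Relative TF activities, different TFs (d) |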
|---|---|---|---|---|---|---|---|
| 0 | 1.000 ± 0.000 | 0.120 ± 0.997 | 0.998 | 0.289 | 0.120 ± 0.997 | 0.996 | −0.044 |
| 5 | 0.997 ± 0.001 | 0.179 ± 0.988 | 0.986 | −0.028 | 0.179 ± 0.988 | 0.992 | 0.200 |
| 10 | 0.991 ± 0.005 | 0.119 ± 0.997 | 0.988 | 0.101 | 0.119 ± 0.997 | 0.964 | 0.020 |
| 20 | 0.962 ± 0.026 | 0.119 ± 0.996 | 0.942 | 0.004 | 0.119 ± 0.995 | 0.930 | 0.048 |
| 30 | 0.929 ± 0.036 | 0.199 ± 0.981 | 0.904 | 0.081 | 0.199 ± 0.981 | 0.862 | −0.045 |
| 40 | 0.872 ± 0.063 | 0.059 ± 0.997 | 0.848 | −0.028 | 0.059 ± 0.995 | 0.771 | 0.103 |
| 50 | 0.834 ± 0.067 | 0.178 ± 0.979 | 0.812 | −0.031 | 0.179 ± 0.977 | 0.714 | 0.110 |
| 100 | 0.606 ± 0.099 | 0.300 ± 0.890 | 0.587 | 0.064 | 0.303 ± 0.893 | 0.435 | 0.170 |
Note: The simulated expression data are from 300 conditions.
(a) Relative binding strengths of the same motif in two genes.
(b) Relative binding strengths of two different motifs.
(c) Relative activities of the same TF under two different conditions.
(d) Relative activities of two different TFs.
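The table reports mean correlations near zero with standard deviations near one for absolute binding strengths and TF activities. A minimal sketch of how such a pattern can arise is given below, assuming (for illustration only) that correlations are computed per motif column and that each predicted column differs from the true one by an arbitrary scale and sign. The helper `columnwise_pearson` and all simulated values are hypothetical and do not reproduce the paper's simulations.

```python
import numpy as np

def columnwise_pearson(X, Y):
    """Pearson correlation between matching columns of X and Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    num = (Xc * Yc).sum(axis=0)
    den = np.sqrt((Xc ** 2).sum(axis=0) * (Yc ** 2).sum(axis=0))
    return num / den

rng = np.random.default_rng(1)
n_genes, n_motifs = 200, 50  # arbitrary toy sizes

# Hypothetical true binding strengths (genes x motifs).
M_true = rng.normal(size=(n_genes, n_motifs))

# Assumed prediction: each motif column is recovered only up to a
# random sign and a random positive scale.
signs = rng.choice([-1.0, 1.0], size=n_motifs)
scales = rng.uniform(0.5, 2.0, size=n_motifs)
M_pred = M_true * signs * scales

r = columnwise_pearson(M_true, M_pred)

# Every per-motif correlation is +1 or -1, so the mean across motifs
# is pulled toward zero while the standard deviation stays close to one.
print("mean r:", r.mean(), "SD of r:", r.std())
```

Because each column correlation here is exactly +1 or -1, the spread is large even though every column is perfectly recovered up to scale and sign.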
Figure 1. Comparison between the true and MED-predicted (primed) gene expression levels. The noise level is 30%. Note that the expression levels are log-transformed and thus can be negative.
Figure 2. Comparison between true and MED-predicted (primed) motif binding strengths. The noise level is 30%. (A) Scatter plot of true and predicted motif binding strengths. Note the difference in scale between the X-axis and the Y-axis. (B) True and predicted relative binding strengths of the same motifs in different genes. (C) True and predicted relative binding strengths of pairs of different motifs.
Figure 3. The distribution of Pearson's correlation coefficient between corresponding columns (motifs) of the true and MED-predicted motif binding strength matrices, when all non-zero entries in I are (A) 1, (B) −1, and (C) randomly assigned to be either 1 or −1 with equal probabilities. B is the mean motif binding strength.
Pearson's correlation coefficients between true values and MED-predicted values of expression levels, relative motif binding strengths and relative TF activities, when the expression data are obtained from 300, 100 and 30 conditions, respectively
| Noise level (%) | Expression levels, 300 conditions | Expression levels, 100 conditions | Expression levels, 30 conditions | Relative binding strengths (a), 300 conditions | Relative binding strengths (a), 100 conditions | Relative binding strengths (a), 30 conditions | Relative TF activities (b), 300 conditions | Relative TF activities (b), 100 conditions | Relative TF activities (b), 30 conditions |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.000 ± 0.000 | 0.997 ± 0.004 | 1.000 ± 0.000 | 0.998 | 0.993 | 0.976 | 0.996 | 0.998 | 0.996 |
| 5 | 0.997 ± 0.001 | 0.997 ± 0.001 | 0.998 ± 0.001 | 0.986 | 0.989 | 0.946 | 0.992 | 0.987 | 0.993 |
| 10 | 0.991 ± 0.005 | 0.990 ± 0.006 | 0.989 ± 0.009 | 0.988 | 0.933 | 0.867 | 0.964 | 0.956 | 0.976 |
| 20 | 0.962 ± 0.026 | 0.964 ± 0.025 | 0.967 ± 0.027 | 0.942 | 0.845 | 0.699 | 0.930 | 0.906 | 0.916 |
| 30 | 0.929 ± 0.036 | 0.930 ± 0.037 | 0.934 ± 0.049 | 0.904 | 0.840 | 0.586 | 0.862 | 0.873 | 0.818 |
| 40 | 0.872 ± 0.063 | 0.880 ± 0.061 | 0.887 ± 0.076 | 0.848 | 0.760 | 0.579 | 0.771 | 0.798 | 0.744 |
| 50 | 0.834 ± 0.067 | 0.833 ± 0.078 | 0.841 ± 0.098 | 0.812 | 0.611 | 0.404 | 0.714 | 0.680 | 0.623 |
| 100 | 0.606 ± 0.099 | 0.595 ± 0.125 | 0.652 ± 0.164 | 0.587 | 0.361 | 0.224 | 0.435 | 0.359 | 0.314 |
(a) Relative binding strengths of the same motif in two genes.
(b) Relative activities of the same TF under two different conditions.
Figure 4. Performance of the MED method in predicting relative motif binding strength when some motifs in the genome are undetected. The mean correlation coefficient from 10 simulations and the associated standard deviation are presented for each condition examined. In (A), a fraction of motifs (from 0% to 50%) for each TF are undetected in the genome. In (B), all motifs of a fraction of TFs (from 0% to 50%) are undetected in the genome. Different colors show different fractions.
Figure 5. Frequency distribution of the ratio of the mean binding strength of a motif in promoters where it has multiple copies to the mean binding strength of the same motif in promoters where it has one copy. The distribution is from 44 different motifs in yeast.
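The ratio plotted in Figure 5 can be sketched as follows for a single motif. This is a minimal illustration, assuming positive binding strengths and a per-promoter copy count; the function name `multi_to_single_ratio` and the example numbers are hypothetical and are not taken from the paper's data.

```python
import numpy as np

def multi_to_single_ratio(strengths, copies):
    """Mean binding strength in multi-copy promoters divided by the
    mean binding strength in single-copy promoters, for one motif."""
    strengths = np.asarray(strengths, dtype=float)
    copies = np.asarray(copies)
    multi = strengths[copies >= 2]    # promoters with two or more copies
    single = strengths[copies == 1]   # promoters with exactly one copy
    if multi.size == 0 or single.size == 0:
        return float("nan")           # ratio undefined without both groups
    return multi.mean() / single.mean()

# Made-up binding strengths of one motif in five promoters,
# with the number of copies of the motif in each promoter.
strengths = [0.8, 1.2, 0.5, 0.3, 0.4]
copies = [2, 3, 1, 1, 1]

ratio = multi_to_single_ratio(strengths, copies)
print("multi-copy / single-copy ratio:", ratio)
```

A ratio above 1 for a motif would correspond to higher mean binding strength in multi-copy promoters, which is the direction the abstract reports for motifs in general.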