Troels T Marstrand, Jes Frellsen, Ida Moltke, Martin Thiim, Eivind Valen, Dorota Retelska, Anders Krogh.
Abstract
BACKGROUND: In studies of gene regulation, the efficient computational detection of over-represented transcription factor binding sites is increasingly important. Several published methods can be used to test whether a set of hypothesised co-regulated genes share a common regulatory regime, based on the occurrence of the modelled transcription factor binding sites. However, there is little or no information available to guide the end user's choice of method. Furthermore, it would be necessary to obtain several different software programs from various sources to make a well-founded choice.
Year: 2008 PMID: 18286180 PMCID: PMC2229843 DOI: 10.1371/journal.pone.0001623
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Speed comparison to naïve search
| File size | Our ESAsearch | Naïve | Searches |
| | 0.20 | 2.44 | 15 |
| | 0.13 | 1.22 | 14 |
| | 0.04 | 0.27 | 12 |
| | 0.01 | 0.07 | 8 |
Search time for our implementation compared to a naïve search. The final column indicates the number of PWMs that must be searched to ‘break even’ with the naïve search, taking into account the time needed to build the enhanced suffix array.
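The break-even count in the final column can be illustrated with a minimal sketch. It assumes the indexed search pays a one-off build cost and then a fixed cost per PWM, so it breaks even at the smallest n with build_time + n * esa_time <= n * naive_time. The function name and the build time used below are hypothetical; the source does not report the build times, and all times are assumed to share one unit.

```python
import math


def break_even_searches(build_time, esa_time, naive_time):
    """Smallest number of PWM searches n such that
    build_time + n * esa_time <= n * naive_time."""
    gain = naive_time - esa_time  # time saved per PWM once the index exists
    if gain <= 0:
        raise ValueError("indexed search must be faster per PWM to break even")
    return max(1, math.ceil(build_time / gain))


# Per-search times taken from the first table row; the build time of 30.0
# is a made-up value for illustration only.
n = break_even_searches(build_time=30.0, esa_time=0.20, naive_time=2.44)
print(n)
```

Under this model, the larger the per-search saving relative to the build cost, the fewer PWMs are needed before building the enhanced suffix array pays off.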
Figure 1. Performance of PWMs based on background model.
Average number of false hits in the background sequences per hit in the positive sequences across 117 JASPAR CORE PWMs.
Comparison of over-representation statistics based on background model.
| Order 0 | Binomial | Z-score | Fisher's | ROC | Wilcoxon | Ln-rank |
| TRUE | 99 | 67 | 95 | 54 | 21 | 53 |
| FALSE | 1046 | 4073 | 539 | 10386 | 2871 | 2326 |
| Ppv. | 0.0865 | 0.016 | 0.150 | 0.005 | 0.007 | 0.022 |
| Sens. | 0.846 | 0.573 | 0.812 | 0.462 | 0.180 | 0.453 |
| FPR | 0.065 | 0.254 | 0.034 | 0.648 | 0.179 | 0.145 |
| Spec. | 0.935 | 0.746 | 0.966 | 0.352 | 0.821 | 0.855 |
| Order 3 | Binomial | Z-score | Fisher's | ROC | Wilcoxon | Ln-rank |
| TRUE | 92 | 59 | 87 | 59 | 26 | 48 |
| FALSE | 1522 | 3878 | 1219 | 5387 | 5785 | 2035 |
| Ppv. | 0.057 | 0.015 | 0.067 | 0.011 | 0.004 | 0.023 |
| Sens. | 0.786 | 0.504 | 0.744 | 0.504 | 0.222 | 0.410 |
| FPR | 0.095 | 0.242 | 0.076 | 0.336 | 0.361 | 0.127 |
| Spec. | 0.905 | 0.758 | 0.924 | 0.664 | 0.639 | 0.873 |
Performance of the different over-representation statistics based on a zeroth and third order background model. The PWM threshold is 0.9 of the scoring range.
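The derived rows (Ppv., Sens., FPR, Spec.) follow from the TRUE and FALSE counts once the number of positive and negative tests is fixed. The sketch below assumes, based on the dilution-test caption, that 138 PWMs are tested against 117 sequence sets with exactly one true PWM per set, giving 117 positives and 138 × 117 − 117 negatives. That reading is an inference rather than something the tables state, although it reproduces the Fisher's column of the Order 0 block. The function name is hypothetical.

```python
def confusion_metrics(tp, fp, n_pos, n_neg):
    """Derive the table's summary rows from true/false positive counts."""
    return {
        "ppv": tp / (tp + fp),   # positive predictive value
        "sens": tp / n_pos,      # sensitivity (true positive rate)
        "fpr": fp / n_neg,       # false positive rate
        "spec": 1 - fp / n_neg,  # specificity = 1 - FPR
    }


N_PWMS = 138   # PWMs tested (from the dilution-test caption)
N_SETS = 117   # sequence sets (from the dilution-test caption)

# Assumption: one true PWM per sequence set.
n_pos = N_SETS
n_neg = N_PWMS * N_SETS - N_SETS

# Fisher's exact test, Order 0 block: TRUE = 95, FALSE = 539.
m = confusion_metrics(tp=95, fp=539, n_pos=n_pos, n_neg=n_neg)
print({name: round(value, 3) for name, value in m.items()})
```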
Dilution test using Fisher's exact test.
| Prob. | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% |
| TRUE | 61 | 75 | 85 | 87 | 95 | 97 | 97 | 102 | 102 |
| FALSE | 395 | 433 | 465 | 492 | 539 | 573 | 604 | 652 | 681 |
| Sens. | 0.521 | 0.641 | 0.726 | 0.744 | 0.812 | 0.829 | 0.829 | 0.872 | 0.872 |
| Spec. | 0.975 | 0.973 | 0.971 | 0.969 | 0.966 | 0.964 | 0.962 | 0.960 | 0.958 |
Sensitivity and specificity as a function of the probability of embedding JASPAR sites, across all 138 PWMs and 117 sequence sets. No correction for multiple testing was applied.
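For readers unfamiliar with the statistic used in the dilution test, a one-sided Fisher's exact test for over-representation can be sketched as a hypergeometric tail sum. The sketch assumes a 2 × 2 table of sequences with and without at least one PWM hit in the positive (foreground) and background sets. How the framework itself builds this table is not described in this section, so the function, its parameters, and the example counts are all hypothetical.

```python
from math import comb


def fisher_one_sided(hits_fg, n_fg, hits_bg, n_bg):
    """P(X >= hits_fg) when n_fg sequences are drawn without replacement
    from n_fg + n_bg sequences, of which hits_fg + hits_bg contain a hit."""
    total = n_fg + n_bg
    total_hits = hits_fg + hits_bg
    upper = min(n_fg, total_hits)  # largest possible foreground hit count
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, n_fg - k)
        for k in range(hits_fg, upper + 1)
    )
    return tail / comb(total, n_fg)


# Hypothetical counts: 3 of 4 foreground and 1 of 4 background sequences
# contain a hit.
p = fisher_one_sided(hits_fg=3, n_fg=4, hits_bg=1, n_bg=4)
print(p)
```

A small p-value indicates that hits are enriched in the foreground set; a threshold on this value determines which PWMs are called over-represented.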
Rank of the p53 PWM on ChIP data
| PET count | Binomial | Z-score | Fisher's | ROC | Wilcoxon | Ln-rank |
| | 1 | 1 | 1 | 94 | 25 | 1 |
| | 1 | 1 | 1 | 79 | 97 | 1 |
| | 1 | 1 | 1 | 73.5 | 137 | 1 |
| | 1 | 62 | 8 | 1 | 36.5 | 1 |
The rank of the PWM for p53 using the different statistics; * indicates that the reported value is significant at the 0.05 level.