Abstract
BACKGROUND: Computational approaches for finding DNA regulatory motifs in promoter sequences help biologists reduce experimental costs and speed up the discovery of de novo binding sites. Rule-based and clustering-based motif searching schemes need to evaluate the similarity between a k-mer (a k-length subsequence) and a motif model effectively and efficiently, without assuming independence of the nucleotides in the motif model and without employing computationally expensive Markov chain models to estimate the background probabilities of k-mers. It is also interesting and beneficial to use a priori knowledge in developing advanced searching tools.
Year: 2012 PMID: 23282090 PMCID: PMC3521183 DOI: 10.1186/1752-0509-6-S2-S4
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Conservation and rareness characterization of functional motifs
| TF | R(M) | Conserved models: mean ± std | p-value | z-score | Random models: mean ± std | p-value | z-score |
|---|---|---|---|---|---|---|---|
| CREB | 0.188 | 0.257 ±0.025 | 0.009 | 02.75 | 0.458 ±0.016 | 0.000 | 16.60 |
| SRF | 0.193 | 0.286 ±0.025 | 0.000 | 03.76 | 0.458 ±0.012 | 0.000 | 22.01 |
| TBP | 0.134 | 0.243 ±0.027 | 0.000 | 04.04 | 0.493 ±0.008 | 0.000 | 43.79 |
| MYOD | 0.104 | 0.195 ±0.036 | 0.004 | 02.54 | 0.467 ±0.016 | 0.000 | 22.22 |
| ERE | 0.214 | 0.331 ±0.012 | 0.000 | 10.15 | 0.439 ±0.007 | 0.000 | 31.87 |
| E2F | 0.203 | 0.309 ±0.019 | 0.000 | 05.65 | 0.444 ±0.009 | 0.000 | 27.54 |
| CRP | 0.307 | 0.380 ±0.006 | 0.000 | 11.48 | 0.422 ±0.005 | 0.000 | 21.45 |
| GAL4 | 0.246 | 0.261 ±0.016 | 0.181 | 00.88 | 0.418 ±0.008 | 0.000 | 20.95 |
| CREB* | 0.188 | 0.224 ±0.024 | 0.058 | 01.47 | 0.460 ±0.017 | 0.000 | 15.76 |
| SRF* | 0.193 | 0.261 ±0.023 | 0.000 | 03.01 | 0.461 ±0.010 | 0.000 | 26.46 |
| TBP* | 0.134 | 0.186 ±0.026 | 0.010 | 02.03 | 0.491 ±0.007 | 0.000 | 48.37 |
| MYOD* | 0.104 | 0.158 ±0.033 | 0.057 | 01.62 | 0.472 ±0.015 | 0.000 | 24.05 |
Remark: the following relation R(M)
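The last value in each block of the table above appears to be a z-score relating the functional motif's score to the mean and standard deviation obtained for the conserved or random models. That reading is an assumption based on the tabulated numbers, not a definition stated in this record. A minimal sketch of the arithmetic, using the CREB row:

```python
def z_score(reference, mean, std):
    """Number of standard deviations by which `mean` exceeds `reference`."""
    if std <= 0:
        raise ValueError("std must be positive")
    return (mean - reference) / std


# CREB row: functional motif score 0.188,
# conserved models 0.257 ± 0.025, random models 0.458 ± 0.016.
z_conserved = z_score(0.188, 0.257, 0.025)
z_random = z_score(0.188, 0.458, 0.016)
print(f"conserved: {z_conserved:.2f}, random: {z_random:.2f}")
```

The results are close to, but not identical with, the tabulated 02.75 and 16.60, presumably because the tabulated means and standard deviations are rounded to three decimals.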
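Description of the 33 datasets used
| Dataset | Avg. sequence length (bp) | Res | Binding-site length (bp) | No. of sequences | No. of binding sites |
|---|---|---|---|---|---|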
| CREB | 200 | H | (05, 30, 12) | 17 | 19 |
| SRF | 200 | H | (09, 22, 12) | 20 | 35 |
| TBP | 200 | H | (05, 24, 07) | 95 | 95 |
| MEF2 | 200 | H | (07, 15, 10) | 17 | 17 |
| MYOD | 200 | H | (06, 06, 06) | 17 | 21 |
| ERE | 200 | M | (13, 13, 13) | 25 | 25 |
| E2F | 200 | M | (11, 11, 11) | 25 | 27 |
| CRP | 105 | E | (22, 22, 22) | 18 | 24 |
| dm01g | 1500 | D | (13, 28, 20) | 04 | 07 |
| dm04m | 2000 | D | (10, 26, 15) | 04 | 09 |
| hm02r | 1000 | H | (10, 36, 23) | 09 | 11 |
| hm03r | 1500 | H | (14, 46, 27) | 10 | 15 |
| hm06g | 500 | H | (06, 14, 08) | 09 | 09 |
| hm08m | 500 | H | (05, 34, 15) | 15 | 13 |
| hm09g | 1500 | H | (07, 26, 16) | 10 | 10 |
| hm10m | 500 | H | (07, 09, 08) | 06 | 11 |
| hm11g | 1000 | H | (06, 42, 14) | 08 | 19 |
| hm16g | 3000 | H | (09, 54, 23) | 07 | 07 |
| hm17g | 500 | H | (10, 18, 15) | 11 | 10 |
| hm20r | 2000 | H | (06, 71, 17) | 35 | 76 |
| hm21g | 1000 | H | (10, 23, 13) | 05 | 07 |
| hm24m | 500 | H | (08, 18, 12) | 08 | 08 |
| hm26m | 1000 | H | (11, 36, 25) | 09 | 10 |
| mus02r | 1000 | M | (10, 33, 19) | 09 | 12 |
| mus10g | 1000 | M | (05, 28, 15) | 13 | 15 |
| mus11m | 500 | M | (06, 27, 15) | 12 | 15 |
| yst08r | 1000 | M | (12, 49, 21) | 11 | 14 |
| yst09g | 1000 | Y | (09, 19, 17) | 16 | 13 |
| CREB | 500 | H | (05, 30, 12) | 17 | 19 |
| SRF | 500 | H | (09, 22, 12) | 20 | 36 |
| TBP | 500 | H | (05, 24, 07) | 95 | 95 |
| MEF2 | 500 | H | (07, 15, 10) | 17 | 17 |
| MYOD | 500 | H | (06, 06, 06) | 17 | 21 |
Notations: the columns give, in order, the dataset name; the average length of the sequences in base pairs (bp); the resource Res, where (D, H, M, Y, E) refer to (Drosophila melanogaster, (human, mouse, rat), Saccharomyces cerevisiae, E. coli), respectively; the length of the binding sites in bp; the number of sequences in the dataset; and the number of binding sites in the dataset.
Sep(R, R) score comparison for different local block length w in R
| TF | ||||
|---|---|---|---|---|
| CREB | 0.022 ± 0.047 | 0.022 ± 0.047 | -0.016 ± 0.049 | -0.016 ± 0.049 |
| SRF | -0.022 ± 0.034 | -0.022 ± 0.034 | -0.030 ± 0.035 | -0.030 ± 0.035 |
| TBP | 0.125 ± 0.020 | 0.128 ± 0.020 | 0.128 ± 0.020 | 0.128 ± 0.020 |
| MEF2 | 0.358 ± 0.041 | 0.358 ± 0.041 | 0.367 ± 0.041 | 0.367 ± 0.041 |
| MYOD | 0.066 ± 0.037 | -0.089 ± 0.045 | -0.089 ± 0.045 | -0.089 ± 0.045 |
| ERE | -0.008 ± 0.028 | -0.008 ± 0.028 | -0.081 ± 0.031 | -0.210 ± 0.038 |
| E2F | 0.110 ± 0.027 | 0.110 ± 0.027 | 0.127 ± 0.026 | 0.136 ± 0.026 |
| CRP | 0.052 ± 0.028 | 0.052 ± 0.028 | 0.110 ± 0.024 | -0.110 ± 0.039 |
| Average (CREB to CRP) | 0.088 ± 0.033 | 0.069 ± 0.034 | 0.065 ± 0.034 | 0.022 ± 0.037 |
| dm01g | 0.101 ± 0.035 | 0.101 ± 0.035 | 0.105 ± 0.036 | 0.100 ± 0.037 |
| dm04m | 0.053 ± 0.033 | 0.053 ± 0.033 | 0.051 ± 0.035 | 0.051 ± 0.035 |
| hm02r | 0.219 ± 0.043 | 0.219 ± 0.043 | 0.146 ± 0.050 | 0.146 ± 0.050 |
| hm03r | 0.135 ± 0.037 | 0.135 ± 0.037 | 0.146 ± 0.037 | 0.146 ± 0.037 |
| hm06g | 0.139 ± 0.051 | 0.062 ± 0.058 | 0.062 ± 0.058 | 0.062 ± 0.058 |
| hm08m | 0.084 ± 0.041 | 0.091 ± 0.041 | 0.088 ± 0.042 | 0.088 ± 0.042 |
| hm09g | 0.114 ± 0.075 | 0.114 ± 0.075 | 0.141 ± 0.074 | 0.141 ± 0.074 |
| hm10m | 0.134 ± 0.038 | 0.134 ± 0.038 | 0.129 ± 0.040 | 0.129 ± 0.040 |
| hm11g | 0.168 ± 0.045 | 0.168 ± 0.045 | 0.191 ± 0.044 | 0.191 ± 0.044 |
| hm16g | 0.140 ± 0.077 | 0.140 ± 0.077 | 0.007 ± 0.098 | 0.007 ± 0.098 |
| hm17g | 0.065 ± 0.045 | 0.065 ± 0.045 | 0.026 ± 0.049 | 0.026 ± 0.049 |
| hm20r | 0.322 ± 0.023 | 0.322 ± 0.023 | 0.299 ± 0.024 | 0.299 ± 0.024 |
| hm21g | 0.064 ± 0.051 | 0.064 ± 0.051 | 0.060 ± 0.054 | 0.060 ± 0.054 |
| hm24m | 0.107 ± 0.042 | 0.107 ± 0.042 | 0.081 ± 0.045 | 0.081 ± 0.045 |
| hm26m | 0.265 ± 0.044 | 0.265 ± 0.044 | 0.216 ± 0.049 | 0.216 ± 0.049 |
| mus02r | 0.004 ± 0.119 | 0.004 ± 0.119 | -0.273 ± 0.198 | -0.273 ± 0.198 |
| mus10g | 0.350 ± 0.056 | 0.354 ± 0.056 | 0.354 ± 0.056 | 0.354 ± 0.056 |
| mus11m | 0.340 ± 0.042 | 0.340 ± 0.042 | 0.329 ± 0.043 | 0.329 ± 0.043 |
| yst08r | 0.131 ± 0.045 | 0.131 ± 0.045 | 0.118 ± 0.047 | 0.107 ± 0.047 |
| yst09g | 0.353 ± 0.056 | 0.353 ± 0.056 | 0.337 ± 0.058 | 0.333 ± 0.059 |
| Average (dm01g to yst09g) | 0.164 ± 0.050 | 0.161 ± 0.050 | 0.131 ± 0.057 | 0.130 ± 0.057 |
| CREB | 0.072 ± 0.042 | 0.072 ± 0.042 | 0.049 ± 0.043 | 0.049 ± 0.043 |
| SRF | -0.026 ± 0.028 | -0.026 ± 0.028 | -0.032 ± 0.029 | -0.032 ± 0.029 |
| TBP | 0.129 ± 0.019 | 0.133 ± 0.019 | 0.133 ± 0.019 | 0.133 ± 0.019 |
| MEF2 | 0.372 ± 0.042 | 0.372 ± 0.042 | 0.380 ± 0.042 | 0.380 ± 0.042 |
| MYOD | 0.088 ± 0.034 | -0.076 ± 0.042 | -0.076 ± 0.042 | -0.076 ± 0.042 |
| Average (CREB to MYOD) | 0.127 ± 0.033 | 0.095 ± 0.035 | 0.091 ± 0.035 | 0.091 ± 0.035 |
| Summary: average (CREB to CRP) | 0.088 ± 0.033 | 0.069 ± 0.034 | 0.065 ± 0.034 | 0.022 ± 0.037 |
| Summary: average (dm01g to yst09g) | 0.164 ± 0.050 | 0.161 ± 0.050 | 0.131 ± 0.057 | 0.130 ± 0.057 |
| Summary: average (CREB to MYOD) | 0.127 ± 0.033 | 0.095 ± 0.035 | 0.091 ± 0.035 | 0.091 ± 0.035 |
| Overall average | | 0.108 ± 0.040 | 0.095 ± 0.042 | 0.081 ± 0.043 |
Remark: O(*) is a rounding operator and k is the length of the k-mers. Sep(R, R) is computed on each dataset using 5000 random sets of k-mers generated from that dataset. The result summary shows that the w = O(k/3) criterion is likely to produce better separability; hence it can be applied as a general rule in the localization approach.
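The w = O(k/3) criterion from the remark can be sketched as follows. The function name `local_block_length` and the lower bound of 1 are assumptions added for illustration; the remark only states that O(*) is a rounding operator.

```python
def local_block_length(k):
    """Return w = O(k/3), reading O(*) as rounding to the nearest integer."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # Assumption: a block must cover at least one position, so w >= 1.
    return max(1, round(k / 3))


# Local block lengths for a few k-mer lengths.
for k in (5, 8, 10, 12):
    print(k, local_block_length(k))
```

For an integer k, k/3 never falls exactly halfway between two integers, so the tie-breaking rule of the rounding function does not affect the result.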
Sep(*, *) score comparison among R, R, IC and MAP score
| Result details: | ||||||
|---|---|---|---|---|---|---|
| CREB | -0.099 ± 0.051 | -0.080 ± 0.013 | 0.255 ± 0.030 | 0.268 ± 0.014 | 0.022 ± 0.047 | |
| SRF | -0.104 ± 0.036 | -0.133 ± 0.008 | 0.313 ± 0.020 | 0.294 ± 0.009 | -0.022 ± 0.034 | |
| TBP | -0.088 ± 0.025 | 0.056 ± 0.002 | 0.302 ± 0.014 | 0.395 ± 0.005 | 0.125 ± 0.020 | |
| MEF2 | -0.405 ± 0.088 | 0.092 ± 0.020 | 0.144 ± 0.049 | 0.446 ± 0.017 | 0.358 ± 0.041 | |
| MYOD | -0.113 ± 0.043 | -0.022 ± 0.010 | 0.299 ± 0.025 | 0.356 ± 0.011 | 0.066 ± 0.037 | |
| ERE | 0.060 ± 0.027 | 0.057 ± 0.011 | 0.416 ± 0.017 | 0.414 ± 0.012 | -0.008 ± 0.028 | |
| E2F | -0.048 ± 0.032 | 0.064 ± 0.012 | 0.350 ± 0.018 | 0.419 ± 0.012 | 0.110 ± 0.027 | |
| CRP | 0.013 ± 0.032 | 0.070 ± 0.018 | 0.486 ± 0.018 | 0.516 ± 0.013 | 0.052 ± 0.028 | |
| Average (CREB to CRP) | -0.098 ± 0.042 | 0.013 ± 0.012 | 0.321 ± 0.024 | 0.388 ± 0.012 | 0.088 ± 0.033 | |
| dm01g | -0.080 ± 0.042 | 0.024 ± 0.027 | 0.294 ± 0.024 | 0.361 ± 0.023 | 0.101 ± 0.035 | |
| dm04m | -0.029 ± 0.038 | 0.026 ± 0.025 | 0.350 ± 0.022 | 0.384 ± 0.022 | 0.053 ± 0.033 | |
| hm02r | -0.187 ± 0.067 | 0.089 ± 0.029 | 0.320 ± 0.037 | 0.478 ± 0.024 | 0.219 ± 0.043 | |
| hm03r | -0.096 ± 0.045 | 0.076 ± 0.017 | 0.276 ± 0.026 | 0.389 ± 0.015 | 0.135 ± 0.037 | |
| hm06g | -0.145 ± 0.068 | 0.001 ± 0.031 | 0.227 ± 0.040 | 0.325 ± 0.025 | 0.139 ± 0.051 | |
| hm08m | -0.006 ± 0.048 | 0.082 ± 0.024 | 0.277 ± 0.030 | 0.340 ± 0.021 | 0.084 ± 0.041 | |
| hm09g | -0.120 ± 0.087 | -0.009 ± 0.041 | 0.211 ± 0.053 | 0.288 ± 0.035 | 0.114 ± 0.075 | |
| hm10m | -0.070 ± 0.050 | 0.071 ± 0.027 | 0.290 ± 0.030 | 0.383 ± 0.022 | 0.134 ± 0.038 | |
| hm11g | -0.172 ± 0.062 | 0.077 ± 0.016 | 0.224 ± 0.036 | 0.388 ± 0.016 | 0.168 ± 0.045 | |
| hm16g | -0.218 ± 0.100 | 0.000 ± 0.049 | 0.227 ± 0.056 | 0.364 ± 0.038 | 0.140 ± 0.077 | |
| hm17g | -0.076 ± 0.052 | -0.022 ± 0.026 | 0.379 ± 0.029 | 0.409 ± 0.021 | 0.065 ± 0.045 | |
| hm20r | -0.344 ± 0.044 | 0.098 ± 0.002 | 0.234 ± 0.022 | 0.486 ± 0.006 | 0.322 ± 0.023 | |
| hm21g | -0.183 ± 0.062 | -0.075 ± 0.036 | 0.293 ± 0.035 | 0.357 ± 0.027 | 0.064 ± 0.051 | |
| hm24m | -0.082 ± 0.052 | 0.024 ± 0.032 | 0.324 ± 0.031 | 0.390 ± 0.026 | 0.107 ± 0.042 | |
| hm26m | -0.114 ± 0.067 | 0.177 ± 0.034 | 0.377 ± 0.039 | 0.540 ± 0.028 | 0.265 ± 0.044 | |
| mus02r | -0.034 ± 0.110 | -0.061 ± 0.058 | 0.409 ± 0.062 | 0.393 ± 0.046 | 0.004 ± 0.119 | |
| mus10g | -0.630 ± 0.134 | -0.052 ± 0.020 | 0.001 ± 0.076 | 0.355 ± 0.019 | 0.350 ± 0.056 | |
| mus11m | -0.623 ± 0.098 | -0.049 ± 0.021 | 0.050 ± 0.054 | 0.386 ± 0.019 | 0.340 ± 0.042 | |
| yst08r | -0.019 ± 0.050 | 0.149 ± 0.024 | 0.037 ± 0.040 | 0.196 ± 0.019 | 0.131 ± 0.045 | |
| yst09g | -0.253 ± 0.102 | 0.179 ± 0.036 | -0.053 ± 0.073 | 0.310 ± 0.029 | 0.353 ± 0.056 | |
| Average (dm01g to yst09g) | -0.174 ± 0.069 | 0.040 ± 0.029 | 0.237 ± 0.041 | 0.376 ± 0.024 | 0.164 ± 0.050 | |
| CREB | -0.102 ± 0.047 | -0.056 ± 0.012 | 0.248 ± 0.028 | 0.280 ± 0.013 | 0.072 ± 0.042 | |
| SRF | -0.085 ± 0.029 | -0.131 ± 0.007 | 0.324 ± 0.016 | 0.296 ± 0.008 | -0.026 ± 0.028 | |
| TBP | -0.080 ± 0.023 | 0.052 ± 0.002 | 0.307 ± 0.013 | 0.392 ± 0.005 | 0.129 ± 0.019 | |
| MEF2 | -0.420 ± 0.092 | 0.122 ± 0.020 | 0.132 ± 0.051 | 0.463 ± 0.017 | 0.372 ± 0.042 | |
| MYOD | -0.115 ± 0.040 | -0.017 ± 0.009 | 0.297 ± 0.023 | 0.358 ± 0.010 | 0.088 ± 0.034 | |
| Average (CREB to MYOD) | -0.160 ± 0.046 | -0.006 ± 0.010 | 0.262 ± 0.026 | 0.358 ± 0.011 | 0.127 ± 0.033 | |
| data group ( | ||||||
| Summary: average (CREB to CRP) | -0.098 ± 0.042 | 0.013 ± 0.012 | 0.321 ± 0.024 | 0.388 ± 0.012 | 0.088 ± 0.033 | |
| Summary: average (dm01g to yst09g) | -0.174 ± 0.069 | 0.040 ± 0.029 | 0.237 ± 0.041 | 0.376 ± 0.024 | 0.164 ± 0.050 | |
| Summary: average (CREB to MYOD) | -0.160 ± 0.046 | -0.006 ± 0.010 | 0.262 ± 0.026 | 0.358 ± 0.011 | 0.127 ± 0.033 | |
| Overall average | -0.144 ± 0.052 | | | | | |
Remark: the Sep(*, *) score is computed on each dataset using 5000 random sets of k-mers generated from that dataset. The localized version improves MISCORE in terms of separability, i.e., Sep(R, R) > 0 holds in most cases. The Sep(*, *) comparisons with the other metrics show that MISCORE is likely to produce more favorable separability than the IC and MAP scores.
Recognizability scores for the best candidate motifs
| Result details: a 10-run average | |||||
|---|---|---|---|---|---|
| CREB | 0.339 | 0.433 | 0.383 | 0.384 | |
| SRF | 0.582 | 0.757 | 0.725 | 0.721 | |
| TBP | 0.529 | 0.717 | 0.750 | 0.800 | |
| MEF2 | 0.362 | 0.763 | 0.742 | 0.757 | |
| MYOD | 0.517 | 0.265 | 0.243 | 0.209 | |
| ERE | 0.512 | 0.750 | 0.875 | 1.000 | |
| E2F | 0.383 | 0.800 | 0.800 | 0.700 | |
| CRP | 1.000 | 1.000 | 1.000 | 1.000 | |
| Average (CREB to CRP) | 0.528 | 0.686 | 0.690 | 0.696 | |
| dm01g | 0.107 | 0.195 | 0.151 | 0.127 | |
| dm04m | 0.180 | 0.134 | 0.219 | 0.188 | |
| hm02r | 0.159 | 0.305 | 0.700 | 0.617 | |
| hm03r | 0.257 | 0.179 | 0.225 | 0.255 | |
| hm06g | 0.264 | 0.176 | 0.255 | 0.297 | |
| hm08m | 0.341 | 0.304 | 0.224 | 0.320 | |
| hm09g | 0.156 | 0.299 | 0.304 | 0.307 | |
| hm10m | 0.364 | 0.416 | 0.489 | 0.474 | |
| hm11g | 0.275 | 0.390 | 0.194 | 0.192 | |
| hm16g | 0.419 | 0.540 | 0.550 | 0.507 | |
| hm17g | 1.000 | 1.000 | 1.000 | 1.000 | |
| hm20r | 0.456 | 0.304 | 0.306 | 0.390 | |
| hm21g | 0.407 | 0.450 | 0.180 | 0.190 | |
| hm24m | 0.198 | 0.172 | 0.263 | 0.266 | |
| hm26m | 0.297 | 0.313 | 0.317 | 0.169 | |
| mus02r | 0.400 | 0.393 | 0.233 | 0.332 | |
| mus10g | 1.000 | 0.867 | 0.900 | 0.800 | |
| mus11m | 0.254 | 0.392 | 0.532 | 0.558 | |
| yst08r | 0.247 | 0.239 | 0.151 | 0.231 | |
| yst09g | 0.389 | 0.460 | 0.344 | 0.314 | |
| Average (dm01g to yst09g) | 0.359 | 0.376 | 0.377 | 0.377 | |
| CREB | 0.512 | 0.422 | 0.375 | 0.540 | |
| SRF | 0.369 | 0.407 | 0.373 | 0.398 | |
| TBP | 0.542 | 0.875 | 0.583 | 0.750 | |
| MEF2 | 0.533 | 1.000 | 0.467 | 0.433 | |
| MYOD | 0.488 | 0.425 | 0.453 | 0.400 | |
| Average (CREB to MYOD) | 0.489 | 0.626 | 0.450 | 0.504 | |
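| Summary: average (CREB to CRP) | 0.528 | 0.686 | 0.690 | 0.696 | |
| Summary: average (dm01g to yst09g) | 0.358 | 0.376 | 0.377 | 0.377 | |
| Summary: average (CREB to MYOD) | 0.489 | 0.626 | 0.450 | 0.504 | |
| Overall average | 0.458 | | 0.506 | 0.526 | |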
| 0.443 | 0.531 | 0.533 | |||
Remark: a higher μ score indicates that a metric is better at recognizing the best candidate motif, in terms of rank order, from a set of putative motifs returned by a tool. As the result summary indicates, the recognizability of MISCORE is comparable to that of IC and remarkably better than that of the MAP score.
Strong/weak motif class-wise average recognizability scores
| Strong/weak motif class-wise | |||||
|---|---|---|---|---|---|
| Weak (17/33 datasets) | 0.373 | 0.412 | 0.409 | ||
| Strong (16/33 datasets) | 0.463 | 0.516 | 0.507 | ||
Remark: the recognizability scores obtained by the metrics are compared between strong and weak motifs. The results show that MISCORE noticeably outperforms the MAP score and performs comparably to IC in recognizing weak motifs, while the localized MISCORE is likely to be more effective than both IC and the MAP score on weak motifs.
Recognizability scores for the best candidate motifs using pk models
| Result details: a 10-run average | ||||||||
|---|---|---|---|---|---|---|---|---|
| CREB | 0.339 | 0.333 | 0.096 | 0.295 | 0.275 | 0.370 | 0.080 | |
| SRF | 0.667 | 0.717 | 0.500 | 0.553 | 0.553 | 0.657 | 0.564 | |
| TBP | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | |
| MEF2 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | |
| MYOD | 0.645 | 0.651 | 0.665 | 0.656 | 0.656 | 0.656 | 0.640 | |
| ERE | 1.000 | 1.000 | 1.000 | 1.000 | 0.917 | 0.875 | 1.000 | |
| E2F | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | |
| CRP | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.792 | |
| Average (CREB to CRP) | 0.831 | 0.837 | 0.783 | 0.813 | 0.800 | 0.820 | 0.760 | |
| dm01g | 0.667 | 0.667 | 0.342 | 0.528 | 0.694 | 0.722 | 0.371 | |
| dm04m | 0.377 | 0.485 | 0.662 | 0.498 | 0.487 | 0.484 | 0.647 | |
| hm02r | 0.800 | 0.700 | 1.000 | 0.547 | 0.447 | 0.447 | 1.000 | |
| hm03r | 0.255 | 0.425 | 0.690 | 0.514 | 0.514 | 0.300 | 0.556 | |
| hm06g | 0.444 | 0.429 | 0.611 | 0.407 | 0.353 | 0.546 | 0.427 | |
| hm08m | 0.861 | 0.861 | 0.852 | 0.854 | 0.771 | 0.857 | 0.857 | |
| hm09g | 0.539 | 0.565 | 0.205 | 0.389 | 0.512 | 0.556 | 0.285 | |
| hm10m | 0.412 | 0.495 | 0.558 | 0.490 | 0.490 | 0.500 | 0.820 | |
| hm11g | 0.302 | 0.329 | 0.829 | 0.335 | 0.285 | 0.333 | 0.829 | |
| hm16g | 0.690 | 0.767 | 0.105 | 0.617 | 0.767 | 0.900 | 0.100 | |
| hm17g | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | |
| hm20r | 0.537 | 0.537 | 0.708 | 0.542 | 0.542 | 0.548 | 0.708 | |
| hm21g | 0.148 | 0.148 | 0.483 | 0.204 | 0.214 | 0.214 | 0.324 | |
| hm24m | 0.573 | 0.650 | 1.000 | 0.592 | 0.592 | 0.725 | 0.867 | |
| hm26m | 0.450 | 0.650 | 0.369 | 0.650 | 0.567 | 0.617 | 0.700 | |
| mus02r | 0.182 | 0.209 | 0.329 | 0.184 | 0.184 | 0.199 | 0.345 | |
| mus10g | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | |
| mus11m | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | |
| yst08r | 0.567 | 0.633 | 0.524 | 0.567 | 0.583 | 0.580 | 0.767 | |
| yst09g | 0.201 | 0.232 | 0.292 | 0.179 | 0.186 | 0.217 | 0.321 | |
| Average (dm01g to yst09g) | 0.550 | 0.589 | 0.628 | 0.555 | 0.559 | 0.587 | 0.646 | |
| CREB | 0.642 | 0.642 | 0.556 | 0.657 | 0.657 | 0.667 | 0.476 | |
| SRF | 0.667 | 0.667 | 0.523 | 0.707 | 0.650 | 0.667 | 0.822 | |
| TBP | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | |
| MEF2 | 0.653 | 0.656 | 0.656 | 0.750 | 0.850 | 0.662 | 0.482 | |
| MYOD | 0.486 | 0.653 | 0.500 | 0.563 | 0.563 | 0.577 | 0.661 | |
| Average (CREB to MYOD) | 0.690 | 0.723 | 0.647 | 0.735 | 0.744 | 0.715 | 0.688 | |
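| Summary: average (CREB to CRP) | 0.831 | 0.837 | 0.783 | 0.813 | 0.800 | 0.820 | 0.760 | |
| Summary: average (dm01g to yst09g) | 0.550 | 0.589 | 0.628 | 0.555 | 0.559 | 0.587 | 0.646 | |
| Summary: average (CREB to MYOD) | 0.690 | 0.723 | 0.647 | 0.735 | 0.744 | 0.715 | 0.688 | |
| Overall average | 0.690 | | 0.686 | 0.701 | 0.701 | 0.707 | 0.698 | |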
Remark: the MISCORE metrics (R and its localized version) compute motif-to-pk similarity through the characterization of the motif signals, whereas the other metrics cannot perform motif characterization. The result summary shows that MISCORE is capable of effectively utilizing the pk models in recognizing the functional motifs. Note: PCC: Pearson correlation coefficient [42]; ALLR: average log likelihood ratio [41]; KLD: Kullback-Leibler divergence [43-45]; ED: Euclidean distance [46]; and SW: Sandelin-Wasserman metric [47].
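Three of the comparison metrics named in the note (PCC, KLD and ED) can be illustrated on aligned columns of two position frequency matrices. This is a minimal sketch under stated assumptions: the textbook definitions are used, the column scores are simply averaged, and the matrices `motif` and `pk_model` are made-up examples. The exact variants used in [42-46] may differ.

```python
import math


def pcc(p, q):
    """Pearson correlation coefficient between two frequency columns."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    if var_p == 0 or var_q == 0:
        return 0.0  # assumption: treat the undefined (flat column) case as 0
    return cov / math.sqrt(var_p * var_q)


def kld(p, q, eps=1e-9):
    """Kullback-Leibler divergence KL(p || q); eps guards against log(0)."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


def ed(p, q):
    """Euclidean distance between two frequency columns."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def column_average(metric, m1, m2):
    """Average a column-wise metric over two aligned, equal-length matrices."""
    if len(m1) != len(m2):
        raise ValueError("matrices must have the same number of columns")
    return sum(metric(c1, c2) for c1, c2 in zip(m1, m2)) / len(m1)


# Hypothetical 3-column matrices; each column holds frequencies of A, C, G, T.
motif = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1], [0.25, 0.25, 0.25, 0.25]]
pk_model = [[0.6, 0.2, 0.1, 0.1], [0.1, 0.2, 0.6, 0.1], [0.4, 0.2, 0.2, 0.2]]

for name, metric in (("PCC", pcc), ("KLD", kld), ("ED", ed)):
    print(name, round(column_average(metric, motif, pk_model), 3))
```

PCC is a similarity (higher means more alike), whereas KLD and ED are dissimilarities (lower means more alike), so rankings built from them run in opposite directions.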
ORS(M) scores with several threshold regulators
| TF | λ = 0.5 | ||||
|---|---|---|---|---|---|
| CREB | 200 | 0.391 | 0.357 | 0.429 | 0.537 |
| CREB | 500 | 0.762 | 0.576 | 0.884 | 0.806 |
| SRF | 200 | 0.040 | 0.048 | 0.055 | 0.059 |
| SRF | 500 | 0.107 | 0.108 | 0.126 | 0.144 |
| TBP | 200 | 0.334 | 0.385 | 0.441 | 0.548 |
| TBP | 500 | 0.671 | 0.778 | 0.793 | 0.803 |
| MEF2 | 200 | 0.041 | 0.050 | 0.065 | 0.100 |
| MEF2 | 500 | 0.129 | 0.177 | 0.392 | 0.655 |
| MYOD | 200 | 0.292 | 0.289 | 0.289 | 0.289 |
| MYOD | 500 | 0.303 | 0.620 | 0.710 | 0.746 |
Remark: MISCORE-based over-representation scores ORS(.) are computed for each dataset with different thresholds. ORS(M) < 1 holds in all cases, indicating that MISCORE correlates the background rareness with the over-representation of functional motifs. As the promoter region grows in length from 200 bp to 500 bp, the ORS(M) scores tend to increase, as anticipated. Note: L denotes the length of the promoter sequences.
Figure 1. Correlation between the over-representation and the background rareness. ORS scores for the functional models M, the random models, and the conserved models, for q = 1, 2, 3, ..., 1000, are plotted for each dataset, with 200 bp promoters in the left column and 500 bp promoters in the right column. The threshold θ = R(M) + std(M)λ with λ = 0.0 is used. The figure gives a rareness-interpretable visualization of the statistical over-representation of the functional motifs: in all cases, the ORS scores of the random models lie far from those of the functional models, which implies that the random models have close to zero chance of being over-represented compared with the true models. In addition, for most of the datasets, the ORS scores of the conserved models are rarely better than those of the functional models, i.e., these non-functional conserved models have only a rare chance of achieving better over-representation scores than the true models.
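The threshold rule quoted in the caption, θ = R(M) + std(M)λ, is a one-line computation. The sketch below uses made-up values for R(M) and std(M), and only the two regulator values that appear in this record (λ = 0.0 and λ = 0.5); the function name `threshold` is hypothetical.

```python
def threshold(r_m, std_m, lam):
    """Return theta = R(M) + std(M) * lambda for a threshold regulator lambda."""
    if std_m < 0:
        raise ValueError("std_m must be non-negative")
    return r_m + std_m * lam


# Hypothetical motif statistics, for illustration only.
r_m, std_m = 0.20, 0.05
for lam in (0.0, 0.5):
    print(f"lambda = {lam}: theta = {threshold(r_m, std_m, lam):.3f}")
```

With λ = 0.0 the threshold reduces to R(M) itself, and a larger λ moves the threshold further from R(M) in units of std(M).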