Abstract
BACKGROUND: Cis-regulatory modules (CRMs) are short stretches of DNA that help regulate gene expression in higher eukaryotes. They have been found up to 1 megabase away from the genes they regulate and can be located upstream, downstream, and even within their target genes. Due to the difficulty of finding CRMs using biological and computational techniques, even well-studied regulatory systems may contain CRMs that have not yet been discovered.
Year: 2005 PMID: 16253142 PMCID: PMC1291357 DOI: 10.1186/1471-2105-6-262
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Key aspects of HexDiff and other algorithms. The table shows the knowledge used and the parameters required by the different algorithms.
| Algorithm | Knowledge Used | Parameters |
| HexDiff | CRM Locations | Number of hexamers in Hd |
| Ahab | PWMs | Window size |
| Cluster Buster | PWMs | Motif score threshold |
| MSCAN | PWMs | Motif score threshold |
| MCAST | PWMs | Motif score threshold |
| LWF | CRM Locations | String length |
Correlation between predicted and known CRMs. The performance of six different algorithms on a common data set is compared in this table. For each sequence, the Matthews correlation coefficient is calculated by checking whether each position is a TP, TN, FP, or FN and using the equation listed in the Methods section. The sum of the correlation coefficients gives a cumulative score for each algorithm on this data set.
| Gene | CRMs | HexDiff | Ahab | Cluster Buster | MSCAN | MCAST | LWF |
| btd | 1 | 0.70 | 0.57 | 0.19 | 0.01 | 0.07 | 0.10 |
| ems | 3 | 0.00 | 0.00 | -0.03 | 0.12 | -0.01 | -0.01 |
| eve | 6 | 0.55 | 0.63 | 0.65 | 0.50 | 0.41 | 0.06 |
| fkh | 1 | -0.03 | -0.02 | -0.02 | -0.04 | -0.02 | -0.01 |
| ftz | 5 | 0.40 | 0.28 | 0.28 | 0.07 | 0.16 | 0.08 |
| gt | 1 | 0.27 | 0.42 | 0.33 | 0.35 | 0.15 | 0.03 |
| h | 5 | 0.71 | 0.63 | 0.53 | 0.30 | 0.37 | 0.08 |
| hb | 2 | 0.35 | 0.63 | 0.39 | 0.34 | 0.24 | 0.04 |
| hkb | 1 | 0.51 | 0.00 | -0.02 | -0.02 | -0.08 | 0.09 |
| kni | 3 | 0.55 | 0.55 | 0.39 | 0.37 | 0.23 | -0.05 |
| kr | 3 | 0.43 | 0.00 | 0.77 | 0.20 | 0.11 | -0.03 |
| oc | 2 | 0.70 | -0.02 | 0.00 | 0.11 | 0.02 | 0.07 |
| prd | 7 | 0.01 | -0.07 | 0.16 | 0.07 | -0.04 | 0.05 |
| run | 6 | 0.27 | 0.16 | 0.08 | 0.08 | 0.02 | 0.07 |
| slp1 | 3 | -0.07 | 0.15 | -0.04 | 0.00 | 0.07 | 0.01 |
| tll | 3 | 0.35 | 0.56 | 0.58 | 0.19 | 0.12 | -0.04 |
| Total | 52 | 5.71 | 4.48 | 4.24 | 2.64 | 1.81 | 0.52 |
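The per-position scoring described in the caption can be sketched as follows. The Methods equation is not reproduced in this section, so the sketch assumes the standard definition of the Matthews correlation coefficient, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function names, the half-open interval convention, and the choice to return 0.0 when the denominator is zero are illustrative assumptions, not taken from the paper.

```python
import math


def interval_mask(length, intervals):
    """Mark every position covered by at least one half-open (start, end) interval."""
    mask = [False] * length
    for start, end in intervals:
        for pos in range(max(start, 0), min(end, length)):
            mask[pos] = True
    return mask


def matthews_cc(length, known, predicted):
    """Per-position MCC between known and predicted CRM intervals on one sequence."""
    known_mask = interval_mask(length, known)
    pred_mask = interval_mask(length, predicted)

    # Classify each position as TP, FP, or FN; everything else is TN.
    tp = sum(1 for k, p in zip(known_mask, pred_mask) if k and p)
    fp = sum(1 for k, p in zip(known_mask, pred_mask) if not k and p)
    fn = sum(1 for k, p in zip(known_mask, pred_mask) if k and not p)
    tn = length - tp - fp - fn

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: report 0.0 when the coefficient is undefined,
        # for example when an algorithm predicts nothing on a sequence.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

A cumulative score like the "Total" row would then be the sum of `matthews_cc` over all sequences for one algorithm. Note that totals computed from the rounded per-gene values shown in the table can differ from the printed totals by 0.01.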
Sensitivities and positive predictive values (PPVs) of HexDiff and other algorithms. A known CRM was considered recovered if a predicted CRM overlapped it by at least 50 bp. The PPVs in this table are italicized because they are estimates of the true PPVs. Without complete knowledge of all CRMs that are present in the 16 sequences, it is possible that some of the predicted CRMs that are labeled as false positives are actually true positives.
| Algorithm | CRMs Recovered | Num CRMs | Sensitivity TP/(TP + FN) | True Positives | CRMs Predicted | PPV TP/(TP + FP) |
| HexDiff | 36 | 52 | 69.23% | 35 | 104 | |
| Ahab | 23 | 52 | 44.23% | 20 | 35 | |
| Cluster Buster | 31 | 52 | 59.62% | 23 | 88 | |
| MSCAN | 34 | 52 | 65.38% | 42 | 226 | |
| MCAST | 43 | 52 | 82.69% | 53 | 499 | |
| LWF | 27 | 52 | 51.92% | 48 | 433 | |
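The recovery criterion and the two ratios in the column headers can be sketched as below. The 50 bp threshold and the formulas come from the caption and headers; the function names and the half-open interval convention are assumptions made for illustration. The PPV denominator assumes that "CRMs Predicted" equals TP + FP, which is how the header formula reads.

```python
def overlap_bp(a, b):
    """Number of base pairs shared by two half-open (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def count_recovered(known, predicted, min_overlap=50):
    """Count known CRMs overlapped by some predicted CRM by at least min_overlap bp."""
    return sum(
        1
        for k in known
        if any(overlap_bp(k, p) >= min_overlap for p in predicted)
    )


def sensitivity(recovered, num_known):
    """TP / (TP + FN): fraction of known CRMs that were recovered."""
    return recovered / num_known


def ppv(true_positives, num_predicted):
    """TP / (TP + FP): fraction of predicted CRMs that hit a known CRM."""
    return true_positives / num_predicted


# (CRMs recovered, known CRMs, true positives, CRMs predicted), copied from the table.
COUNTS = {
    "HexDiff": (36, 52, 35, 104),
    "Ahab": (23, 52, 20, 35),
    "Cluster Buster": (31, 52, 23, 88),
    "MSCAN": (34, 52, 42, 226),
    "MCAST": (43, 52, 53, 499),
    "LWF": (27, 52, 48, 433),
}

for name, (recovered, known, tp, predicted) in COUNTS.items():
    print(f"{name}: sensitivity {sensitivity(recovered, known):.2%}, "
          f"estimated PPV {ppv(tp, predicted):.2%}")
```

As the caption notes, a PPV computed this way is only an estimate, since some predictions counted as false positives may be undiscovered CRMs.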
Potential novel CRMs predicted by HexDiff and other algorithms. All of the predicted CRMs listed in this table were predicted by HexDiff and at least two other algorithms. The column labeled "Gene" lists the gene involved in the early development of Drosophila that is closest to the predicted CRM. The columns labeled 1–5 are the different algorithms whose predictions matched the CRMs predicted by HexDiff: 1 – Ahab, 2 – Cluster Buster, 3 – MSCAN, 4 – MCAST, and 5 – LWF. The predicted CRMs were also compared to a compilation of 124 CRMs [32] – matching CRMs are listed in the last column.
| Gene | Arm | Begin | End | Length | 1 | 2 | 3 | 4 | 5 | Matched |
| btd | X | 9534921 | 9535192 | 271 | * | * | | | | |
| eve | 2R | 5492385 | 5493575 | 1190 | * | * | | | | eve_late2_mel |
| fkh | 3R | 24421705 | 24422385 | 680 | * | * | | | | |
| ftz | 3R | 2683060 | 2683406 | 346 | * | * | | | | |
| gt | X | 2268347 | 2270179 | 1832 | * | * | * | | | |
| gt | X | 2290228 | 2290685 | 457 | * | * | * | * | * | gt_23-bcd_mel |
| hb | 3R | 4503375 | 4503962 | 587 | * | * | * | | | |
| hb | 3R | 4519805 | 4520172 | 367 | * | * | | | | |
| kni | 3L | 20628230 | 20628504 | 274 | * | * | * | * | | kni_+1_mel |
| prd | 2L | 12080435 | 12082316 | 1881 | * | * | * | | | prd_bcd_mel |
| prd | 2L | 12089627 | 12089847 | 220 | * | * | | | | prd_1_mel |
| run | X | 20488169 | 20488643 | 474 | * | * | * | * | | |
| run | X | 20524260 | 20524722 | 462 | * | * | * | * | | |
| slp1 | 2L | 3811050 | 3812092 | 1042 | * | * | | | | |
| slp1 | 2L | 3822581 | 3823049 | 468 | * | * | | | | |
| slp1 | 2L | 3824891 | 3825039 | 148 | * | * | * | * | * | slp_A-bcd_mel |
| slp1 | 2L | 3833433 | 3834671 | 1238 | * | * | * | * | | slp2_-3_mel |
| tll | 3R | 26680559 | 26683175 | 2616 | * | * | * | | | tll_bcd_mel |
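The selection rule in the caption (a HexDiff prediction supported by at least two other algorithms) can be sketched as a simple interval filter. The caption does not state how a "match" between predictions was defined, so this sketch assumes that any overlap of at least 1 bp counts. The function names and the coordinates in the usage example are hypothetical.

```python
def overlaps(a, b):
    """True if two half-open (start, end) intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def consensus_predictions(hexdiff, others, min_support=2):
    """Keep HexDiff predictions matched by at least min_support other algorithms.

    hexdiff: list of (start, end) intervals on one chromosome arm.
    others:  dict mapping algorithm name -> list of (start, end) intervals.
    Returns a list of (interval, sorted supporting algorithm names).
    """
    kept = []
    for region in hexdiff:
        supporters = sorted(
            name
            for name, predictions in others.items()
            if any(overlaps(region, p) for p in predictions)
        )
        if len(supporters) >= min_support:
            kept.append((region, supporters))
    return kept


# Hypothetical coordinates, for illustration only.
hexdiff_regions = [(100, 400), (1000, 1200)]
other_regions = {
    "Ahab": [(350, 500)],
    "MSCAN": [(50, 120)],
    "LWF": [(1100, 1150)],
}
print(consensus_predictions(hexdiff_regions, other_regions))
```

In this toy input the first region is kept because two algorithms overlap it, while the second is dropped because only one does.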