Jianfei Hu, Haiyan Hu, Xiaoman Li.
Abstract
The identification of cis-regulatory modules (CRMs) can greatly advance our understanding of eukaryotic regulatory mechanisms. Current methods to predict CRMs from known motifs either depend on multiple alignments or can only deal with a small number of known motifs provided by users. These methods are problematic when binding sites are not well aligned in multiple alignments or when the number of input known motifs is large. We thus developed a new CRM identification method, MOPAT (motif pair tree), which identifies CRMs through the identification of motif modules, that is, groups of motifs co-occurring in multiple CRMs. It can identify 'orthologous' CRMs without multiple alignments. It can also find CRMs given a large number of known motifs. We have applied this method to mouse developmental genes, and have evaluated the predicted CRMs and motif modules using microarray expression data and known interacting motif pairs. We show that the expression profiles of the genes containing CRMs of the same motif module correlate significantly better than those of a random set of genes do. We also show that the known interacting motif pairs are significantly included in our predictions. Compared with several current methods, our method shows better performance in identifying meaningful CRMs.
Year: 2008 PMID: 18606616 PMCID: PMC2490743 DOI: 10.1093/nar/gkn407
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Construction of the motif pair tree. (A) Motif hits of ten motifs in eight sequences. The motifs can overlap with each other as long as their start positions are separated by at least 4 bp. The motifs in the same box are all paired with each other. (B) Motifs and their paired motif lists. The number in parentheses is the number of genes that contain instances of the motif pair. (C) Motif pair tree. Each node in the motif pair tree represents a motif. Each path in the motif pair tree represents a potential motif module.
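The pairing and counting step described in panels (A) and (B) can be sketched roughly as follows. This is a minimal illustration, not the MOPAT implementation: the input layout (`hits`, mapping each gene to a list of boxes of `(motif, start)` tuples), the function name `count_motif_pairs`, and the exclusion of a motif paired with itself are assumptions made for the example. Only the 4 bp minimum separation of start positions and the per-gene counting of pairs come from the caption.

```python
from collections import Counter
from itertools import combinations

MIN_START_GAP = 4  # from the caption: start positions at least 4 bp apart


def count_motif_pairs(hits):
    """Count, for each motif pair, the number of genes that contain it.

    hits: dict mapping gene name -> list of boxes, where each box is a
    list of (motif, start) tuples (hypothetical input layout).
    """
    pair_genes = Counter()
    for gene, boxes in hits.items():
        pairs_in_gene = set()  # each pair is counted once per gene
        for box in boxes:
            for (m1, s1), (m2, s2) in combinations(box, 2):
                # Assumption: a motif is not paired with itself.
                if m1 != m2 and abs(s1 - s2) >= MIN_START_GAP:
                    pairs_in_gene.add(tuple(sorted((m1, m2))))
        pair_genes.update(pairs_in_gene)
    return pair_genes


# Toy input: two genes, three boxes in total.
hits = {
    "gene1": [[("A", 10), ("B", 16), ("C", 18)]],
    "gene2": [[("A", 5), ("B", 40)], [("C", 100), ("D", 130)]],
}
pair_counts = count_motif_pairs(hits)
```

In this toy input the pair (A, B) is found in both genes, while (B, C) in `gene1` is skipped because the two start positions are only 2 bp apart.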
Figure 2. Histogram of the expression similarities of gene pairs. (A) Expression similarity of gene pairs in GDS2202, mean = 0.14. (B and C) Expression similarity of gene pairs containing instances of the same motif modules in Results I and II, mean = 0.31 and 0.27, respectively.
Figure 3. Histogram of the median expression similarity of gene pairs in gene sets. (A) 166 805 random gene sets generated from the same size distribution as the gene sets in Result I. The 95th percentile of this histogram is 0.263. (B) 555 645 random gene sets generated from the same size distribution as the gene sets in Result II. The 95th percentile of this histogram is 0.277. (C) 33 361 target gene sets of the predicted motif modules in Result I. Ninety-three percent of the target gene sets have a median gene pair expression similarity larger than 0.263. (D) 111 129 target gene sets of the predicted motif modules in Result II. Sixty-nine percent of the target gene sets have a median gene pair expression similarity larger than 0.277.
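The comparison in Figure 3 can be illustrated with a small sketch: compute the median pairwise expression similarity of each gene set, then take a cutoff from random gene sets of matching sizes. The similarity measure is not defined in this excerpt, so Pearson correlation is used here as an assumption. The function names, the `expr` layout (gene name mapped to a list of expression values), and the simple order-statistic cutoff are likewise hypothetical.

```python
import random
from itertools import combinations
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumed measure)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def median_pair_similarity(genes, expr):
    """Median similarity over all gene pairs in one gene set."""
    return median(pearson(expr[g], expr[h]) for g, h in combinations(genes, 2))


def random_set_threshold(expr, set_sizes, q=0.95, seed=0):
    """Empirical q-quantile of the median similarity of random gene sets.

    set_sizes: sizes of the random sets, e.g. the sizes of the target sets.
    """
    rng = random.Random(seed)
    all_genes = sorted(expr)
    scores = sorted(
        median_pair_similarity(rng.sample(all_genes, k), expr) for k in set_sizes
    )
    return scores[min(len(scores) - 1, int(q * len(scores)))]


# Toy expression profiles (made-up values, for illustration only).
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 2.0, 4.0],
}
cutoff = random_set_threshold(expr, set_sizes=[2, 3, 2, 3, 2])
```

A target gene set would then count as better than random when its `median_pair_similarity` exceeds `cutoff`, mirroring how panels (C) and (D) are read against the cutoffs from panels (A) and (B).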
Significance of proportion of CE motif pairs in predicted motif modules
| Result | N | M | n | m | P-value |
|---|---|---|---|---|---|
| I | 135 981 | 2515 | 15 263 | 440 | 1.32E-21 |
| II | 135 981 | 2515 | 22 479 | 481 | 5E-5 |
| I + II | 135 981 | 2515 | 27 546 | 625 | 2.29E-09 |
See text for the meaning of N, M, n and m.
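One common way to assess this kind of over-representation is a hypergeometric upper-tail probability. The sketch below assumes that N is the total number of motif pairs, M the number of CE motif pairs among them, n the number of predicted motif pairs, and m the number of CE motif pairs among the predictions. These definitions are deferred to the main text and are not confirmed in this excerpt, and the code is not claimed to reproduce the P-values in the table. The function names are hypothetical.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def hypergeom_upper_tail(N, M, n, m):
    """P(X >= m) when n items are drawn without replacement from N items,
    M of which are 'successes' (assumed meaning of the table symbols)."""
    log_denom = log_comb(N, n)
    total = 0.0
    for k in range(m, min(M, n) + 1):
        if n - k > N - M:
            continue  # not enough non-successes to fill the rest of the draw
        total += exp(log_comb(M, k) + log_comb(N - M, n - k) - log_denom)
    return min(total, 1.0)  # guard against floating-point overshoot


# Small worked case: 10 items, 4 successes, draw 3, at least 2 successes.
p_small = hypergeom_upper_tail(10, 4, 3, 2)
```

Working in log space keeps the computation stable for counts as large as those in the table, where the binomial coefficients themselves would be far too large for floating-point arithmetic.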
Figure 4. One example of the predicted recurrent motif modules.
Ten groups of randomly generated CRMs and genes
| Group | No. of motifs | No. of genes |
|---|---|---|
| 1 | 3 | 16 |
| 2 | 3 | 24 |
| 3 | 4 | 15 |
| 4 | 4 | 29 |
| 5 | 5 | 26 |
| 6 | 5 | 17 |
| 7 | 6 | 25 |
| 8 | 6 | 17 |
| 9 | 7 | 17 |
| 10 | 8 | 16 |
| Total | 51 | 202 |
Comparison between MOPAT and others
| Method | No. of motifs inserted | No. of candidate motifs | No. of CRMs predicted (true a, b) | No. of motifs predicted (true) | No. of motifs in CRM (min c, max d) |
|---|---|---|---|---|---|
| MOPAT | 51 | 522 | 633 (109 | 60 (42) | 4 (3 |
| Cbust | 51 | 522 | 190 (0,0) | 515 (51) | 94 (17, 249) |
| Compel | 51 | – | 202 (0,0) | 61 (10) | 8.5 (5, 13) |
a: The number of CRMs that match the implanted CRMs perfectly.
b: The number of CRMs that match the implanted CRMs with one mismatch.
c: Minimum number of motifs in a CRM.
d: Maximum number of motifs in a CRM.
See text for the details.