Xiaohui Cai, Lin Hou, Naifang Su, Haiyan Hu, Minghua Deng, Xiaoman Li.
Abstract
BACKGROUND: The identification of motif modules, groups of multiple motifs that frequently occur together in DNA sequences, is one of the most important tasks in annotating the human genome. Current approaches to identifying motif modules are often restricted to searches within promoter regions or rely on multiple genome alignments. However, promoter regions account for only a limited number of the locations where transcription factor binding sites can occur, and multiple genome alignments often cannot align binding sites with their true counterparts because transcription factor binding sites are short and degenerate.
Year: 2010 PMID: 20946653 PMCID: PMC3091716 DOI: 10.1186/1471-2164-11-567
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Flow chart describing our method. (a) The basic procedure in our method. (b) The procedure to identify motif modules from conserved blocks.
Figure 2. Contiguously conserved regions and discontiguously conserved regions. Discontiguously conserved regions often contain long divergent sequences, which lowers the percent identity of the alignment of the corresponding regions.
Functional evidence used.
| Source | Number of gene sets | Significant motif modules (FDR = 0.05) | Significant motif modules (FDR = 0.01) |
|---|---|---|---|
| BioCarta | 155 | 0 | 0 |
| KEGG | 168 | 122,279 | 10,378 |
| Genmapp | 88 | 5,486 | 14 |
| GO | 1,141 | 1,296,621 | 460,973 |
| PicTar | 162 | 2,645,699 | 1,647,433 |
| Cancer Module | 380 | 485,850 | 67,135 |
| Total | 2,094 | 2,871,863 | 1,855,459 |
The first column lists the sources of functional evidence. The second column gives the number of gene sets from each source. The third and fourth columns give the number of motif modules whose target genes significantly overlap with the gene sets at each FDR threshold. All gene sets except the KEGG pathways were downloaded from MSigDB; the KEGG pathways were downloaded directly from the KEGG FTP site. Only gene sets with at least ten genes were used.
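
To illustrate how counts like those in the table could be produced, the following is a minimal sketch of an overlap test between motif-module target genes and gene sets. The text here does not specify the statistical test or the FDR procedure, so the sketch assumes a one-sided hypergeometric test and Benjamini-Hochberg adjustment. All function names, parameters, and the `background_size` value are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked, n drawn)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_overlaps(module_targets, gene_sets, background_size,
                         fdr=0.05, min_set_size=10):
    """List (module, gene set, adjusted p) triples passing the FDR cutoff.

    module_targets and gene_sets map names to sets of gene identifiers.
    Gene sets with fewer than min_set_size genes are skipped, mirroring
    the "at least ten genes" filter described in the table note.
    """
    pairs, pvalues = [], []
    for set_name, genes in gene_sets.items():
        if len(genes) < min_set_size:
            continue
        for module_name, targets in module_targets.items():
            overlap = len(targets & genes)
            pairs.append((module_name, set_name))
            pvalues.append(
                hypergeom_sf(overlap, background_size, len(genes), len(targets))
            )
    adjusted = benjamini_hochberg(pvalues)
    return [(m, s, q) for (m, s), q in zip(pairs, adjusted) if q <= fdr]


if __name__ == "__main__":
    modules = {
        "module_A": {f"g{i}" for i in range(10)},
        "module_B": {f"h{i}" for i in range(10)},
    }
    sets = {"pathway_1": {f"g{i}" for i in range(10)}, "tiny_set": {"g0"}}
    for hit in significant_overlaps(modules, sets, background_size=1000):
        print(hit)
```

Counting the distinct modules in the returned list at `fdr=0.05` and `fdr=0.01` would give one row of a table like the one above. Because a module can be significant for gene sets from several sources, such per-source counts need not sum to the total.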