Simon Minovitsky, Philip Stegmaier, Alexander Kel, Alexey S Kondrashov, Inna Dubchak.
Abstract
BACKGROUND: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, approximately 5% of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure.
Year: 2007 PMID: 17945028 PMCID: PMC2176071 DOI: 10.1186/1471-2164-8-378
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Dinucleotide frequencies (percentages) in CNSs (red), non-CNSs (green), near-promoter sequences (blue), and random sequences (black).
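A comparison like the one in Figure 1 rests on counting overlapping dinucleotides in each sequence set and normalizing the counts to percentages. The Python sketch below illustrates that computation. The function name `dinucleotide_percentages` and the decision to skip dinucleotides containing ambiguous bases are assumptions made for illustration, not the authors' pipeline.

```python
from collections import Counter
from itertools import product


def dinucleotide_percentages(seq: str) -> dict[str, float]:
    """Return the percentage of each of the 16 ACGT dinucleotides in seq.

    Overlapping windows are counted; windows containing any character
    other than A, C, G, or T (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    dinucs = ["".join(p) for p in product("ACGT", repeat=2)]
    total = sum(counts[d] for d in dinucs)
    if total == 0:
        return {d: 0.0 for d in dinucs}
    return {d: 100.0 * counts[d] / total for d in dinucs}


if __name__ == "__main__":
    # Toy sequence, not data from the paper.
    for dinuc, pct in dinucleotide_percentages("TTAATTAATTGC").items():
        if pct > 0:
            print(f"{dinuc}: {pct:.1f}%")
```

To compare sequence classes, the same function would be applied to the concatenated (or pooled) sequences of each class and the resulting profiles plotted side by side.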
Motifs overrepresented in CNSs over non-CNSs
| Odd Chromosomes | | | Even Chromosomes | | |
| Motif | Number of occurrences | Overrepresentation | Motif | Number of occurrences | Overrepresentation |
| SYTAATTA | 10620 | 3.45 | TTAATTAV | 12637 | 3.72 |
| CTRATTAS | 6152 | 3.14 | TAATTRCW | 12019 | 3.43 |
| WGYAATTA | 12596 | 3.09 | GYAATTAS | 6142 | 3.39 |
| TTAATTAV | 13141 | 3.08 | TTTAATBA | 15060 | 3.14 |
| STAATTGV | 8267 | 2.89 | ATTAATBA | 10910 | 3.07 |
| VWGCTAAT | 10503 | 2.84 | TAATTWGM | 10885 | 3.04 |
| TTTAATBA | 15800 | 2.77 | GMWTAATT | 9941 | 2.97 |
| GMWTAATT | 10290 | 2.72 | CWTAATKA | 10028 | 2.94 |
| TAATTATV | 10100 | 2.72 | ATTAAWTT | 11570 | 2.85 |
| STTAATKG | 5905 | 2.71 | TTAATBAT | 10115 | 2.79 |
| ATTVAATT | 12177 | 2.68 | CWKTAATT | 13079 | 2.75 |
| ATTAATBA | 11006 | 2.61 | VWGCTAAT | 9823 | 2.71 |
| CWKTAATT | 13577 | 2.59 | CMATWAAT | 10129 | 2.65 |
| ATAATTAV | 10536 | 2.58 | ATTTVATT | 15715 | 2.64 |
| SMAATTAA | 12754 | 2.57 | CAATTRCH | 8188 | 2.61 |
| SBTAATGA | 8828 | 2.56 | MCWAATTA | 9605 | 2.61 |
| VATTWGCA | 14265 | 2.53 | ATTWWGCA | 9959 | 2.61 |
| TWAATCAR | 10639 | 2.52 | GKTAATTW | 9019 | 2.59 |
| AATTAVTT | 12668 | 2.51 | AATTAMCW | 10053 | 2.58 |
| GTAATTMM | 7484 | 2.49 | MATTDGCA | 13694 | 2.58 |
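The motifs in these tables are written in IUPAC degenerate nucleotide codes (for example, S = C or G, W = A or T, V = A, C, or G). The Python sketch below shows one way to count occurrences of such a motif and to compare its frequency between two sequence sets. This record does not define how the overrepresentation score was computed, so the ratio of per-position occurrence rates used here is an assumption, as are all function names. Only the given strand is scanned.

```python
import math
import re

# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile a degenerate motif into a regex that finds overlapping hits."""
    body = "".join(f"[{IUPAC[c]}]" for c in motif.upper())
    # A lookahead matches at every start position, so overlaps are counted.
    return re.compile(f"(?=({body}))")


def count_occurrences(motif: str, seq: str) -> int:
    """Count (possibly overlapping) occurrences of motif on one strand."""
    return sum(1 for _ in motif_to_regex(motif).finditer(seq.upper()))


def occurrence_rate(motif: str, seqs: list[str]) -> float:
    """Occurrences per candidate start position, pooled over all sequences."""
    positions = sum(max(len(s) - len(motif) + 1, 0) for s in seqs)
    hits = sum(count_occurrences(motif, s) for s in seqs)
    return hits / positions if positions else 0.0


def overrepresentation(motif: str, foreground: list[str],
                       background: list[str]) -> float:
    """Assumed score: foreground rate divided by background rate."""
    fg = occurrence_rate(motif, foreground)
    bg = occurrence_rate(motif, background)
    if bg == 0:
        return math.inf if fg > 0 else math.nan
    return fg / bg


if __name__ == "__main__":
    # Toy sequences, not data from the paper.
    print(count_occurrences("TTAATTAV", "GGTTAATTAAGG"))  # 1
    print(overrepresentation("SYTAATTA", ["GCTAATTA"],
                             ["GCTAATTA", "AAAAAAAA"]))   # 2.0
```

Under this assumed definition, a score above 1 means the motif occurs more densely in the foreground set (here, CNSs) than in the background set.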
Motifs overrepresented in CNSs over near-promoter sequences
| Odd Chromosomes | | | Even Chromosomes | | |
| Motif | Number of occurrences | Overrepresentation | Motif | Number of occurrences | Overrepresentation |
| STAATTAS | 7576 | 4.55 | SYTAATTA | 9852 | 4.26 |
| TTAATKAR | 17516 | 4.33 | TTAATTAD | 14561 | 4.07 |
| GBTAATKA | 12299 | 3.96 | CTRATTAS | 5744 | 3.90 |
| VTAATTGM | 10174 | 3.91 | ATTAATGN | 9762 | 3.74 |
| TTTMATKA | 19449 | 3.86 | TAATTATD | 11760 | 3.73 |
| MTTMATTA | 13688 | 3.82 | TTTAATDA | 16633 | 3.66 |
| AATKYAAT | 15204 | 3.73 | ATAATTAB | 9233 | 3.62 |
| TTAATKGV | 12925 | 3.72 | TAATKSAA | 10418 | 3.59 |
| RTAATKAA | 13613 | 3.68 | STAATTGV | 7823 | 3.55 |
| MMTAATTA | 12518 | 3.68 | GYAATWAA | 10608 | 3.55 |
| TSTAATTW | 14964 | 3.49 | TGYAATTW | 13322 | 3.51 |
| AATKMATT | 18824 | 3.48 | AATGMWTT | 15412 | 3.49 |
| TGATWAAW | 12898 | 3.46 | AGYAATTW | 12585 | 3.41 |
| KATAATKA | 10739 | 3.46 | AATTDATT | 14693 | 3.39 |
| CATTAAKV | 10838 | 3.42 | AATTATAD | 10379 | 3.36 |
| CATWAWTT | 14599 | 3.39 | TWAATTGR | 8896 | 3.35 |
| CATTWAAW | 19325 | 3.37 | AWTARCAT | 9601 | 3.35 |
| CAATTAKV | 9515 | 3.33 | TAATTHAT | 12789 | 3.34 |
| ATRATTYA | 13356 | 3.30 | CWTTAATR | 9114 | 3.32 |
| ATTTYMAT | 20983 | 3.29 | ATTSMATT | 11547 | 3.27 |
Motifs overrepresented in CNSs over randomized sequences
| Odd Chromosomes | | | Even Chromosomes | | |
| Motif | Number of occurrences | Overrepresentation | Motif | Number of occurrences | Overrepresentation |
| CWGSCWGS | 32472 | 7.50 | CWGSCWGV | 38927 | 5.78 |
| SCCHGSCH | 42207 | 5.68 | SCCWGGSN | 33122 | 5.63 |
| GGSWGGSN | 39555 | 5.55 | CYCWSCCH | 33976 | 5.50 |
| CWGSCCWS | 24103 | 5.52 | RGCWGSCH | 30738 | 4.95 |
| RGTCCTBY | 22100 | 5.45 | GGSDGRGV | 34873 | 4.93 |
| GRGSWGRG | 25293 | 5.36 | CWGSCYCH | 29902 | 4.78 |
| CCYYYCCH | 40727 | 5.22 | CWSCWGGV | 31840 | 4.73 |
| SCCWGGRV | 33839 | 5.20 | SCWGCWGV | 30968 | 4.71 |
| CWGSCYCH | 36409 | 5.04 | CWGGGRRV | 31866 | 4.64 |
| SCWGGGSN | 36038 | 5.03 | CWGRGSCH | 28886 | 4.61 |
| SCHGSCCH | 36013 | 4.91 | CCWGGRRV | 31578 | 4.61 |
| CWGRGSCH | 35318 | 4.77 | SCHGGSCH | 28689 | 4.50 |
| SCYCWGCH | 34141 | 4.56 | GGRARGRR | 29240 | 4.47 |
| NCAGCTGN | 32928 | 4.52 | RRGGCWGV | 30772 | 4.44 |
| CAGCTGNN | 32867 | 4.51 | RGGGRARR | 29828 | 4.41 |
| TWACWGAA | 14781 | 4.48 | GVWGGGRR | 31019 | 4.37 |
| RGGGRRAR | 32929 | 4.42 | CYCYVSCC | 19097 | 4.37 |
| CWGSAGSY | 24140 | 4.37 | KCCWSCCH | 26417 | 4.33 |
| SCWGGRAR | 32065 | 4.37 | CAGCYSNG | 16617 | 4.28 |
| GGARRGRR | 33390 | 4.37 | KKGGCWGV | 28051 | 4.13 |
Motifs found matching transcription factor PWMs from TRANSFAC
| Motif ID | Motif | DNA-binding domain class | Taxon | Matching factors |
| DME280 | ATAAACAN | Forkhead DNA-binding domain | Vertebrate | FOXI1a,FOXF1,FOXL1,FOXO4 |
| DME424 | WGTAAAYA | Forkhead DNA-binding domain | Vertebrate | FOXC1,FOXA4a,HNF-3beta |
| DME768 | WTGTCATV | Basic region + leucine zipper (bZIP) | Nematode | Skn-1 |
| DME1427 | WGTCATSM | Basic region + leucine zipper (bZIP) | Nematode | Skn-1 |
| DME27 | VATTWGCA | POU | Vertebrate | POU2F1 |
| DME349 | ATAAACAN | Forkhead DNA-binding domain | Vertebrate | FOXI1a,FOXF1,FOXL1,FOXO4 |
| DME1014 | GTMAACAD | Forkhead DNA-binding domain | Vertebrate | FOXD1,HNF-3beta,FOXO1a |
| DME1700 | CCAATMAB | DNA-binding domain with Histone fold | Fungal | HAP2,HAP3,HAP4 |
| DME1268 | STGASTYA | Basic region + leucine zipper (bZIP) | Vertebrate | NF-E2,AP-1 |
| DME90 | VCAGATGN | Basic region + helix-loop-helix motif | Vertebrate | ITF-2,Tal-1beta |
| DME94 | CATCTGBN | Basic region + helix-loop-helix motif | Vertebrate | ITF-2,Tal-1beta,E47 |
| DME765 | RTGWSTCA | Basic region + leucine zipper (bZIP) | Vertebrate | NF-E2,AP-1,Fos,Jun,Fra |
| DME1106 | TGTTBACW | Forkhead DNA-binding domain | Vertebrate | HNF-3beta |
| DME1111 | ATAAACAH | Forkhead DNA-binding domain | Vertebrate | FOXI1a,FOXF1,FOXL1,FOXO4 |
| DME1920 | CCACGTGG | Basic region + helix-loop-helix motif | Plant, Vertebrate | PIF3,c-Myc:Max |
| DME11 | CAGCTGNN | Basic region + helix-loop-helix motif | Vertebrate | AP-4 |
| DME456 | MAYAAACA | Forkhead DNA-binding domain | Vertebrate | FOXF1 |
| DME790 | TATGVAAA | POU | Vertebrate | POU2F1 |
| DME930 | ATAAAYAT | Forkhead DNA-binding domain | Vertebrate, Insect | FOXI1a,Croc |
| DME1145 | TGTTBACW | Forkhead DNA-binding domain | Vertebrate | HNF-3beta |