Shisong Ma, Shawn Bachan, Matthew Porto, Hans J Bohnert, Michael Snyder, Savithramma P Dinesh-Kumar.
Abstract
The discovery of DNA regulatory motifs in sequenced genomes using computational methods remains challenging. Here, we present MotifIndexer, a comprehensive strategy for de novo identification of DNA regulatory motifs at the genome level. Using word-counting methods, we indexed the existence of every 8-mer oligo composed of the bases A, C, G, T, r, y, s, w, m, k, n, or every 12-mer oligo composed of A, C, G, T, n, in the promoters of all predicted genes of the Arabidopsis thaliana genome and of selected stress-induced co-expressed genes. From this analysis, we identified a number of over-represented motifs. Among these, major critical motifs were identified using a position filter. We used a model based on uniform distribution and the z-scores derived from this model to describe position bias. Interestingly, many motifs showed position bias towards the transcription start site. We extended this model to show biased distribution of motifs in the genomes of both A. thaliana and rice. We also used MotifIndexer to identify conserved motifs in co-expressed gene groups from two Arabidopsis species, A. thaliana and A. lyrata. This new comparative genomics method does not depend on alignments of homologous gene promoter sequences.
Year: 2012 PMID: 22912824 PMCID: PMC3418279 DOI: 10.1371/journal.pone.0043198
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Five steps to count the number of promoters harboring each 8-mer oligo within a selected group of promoters.
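As a rough illustration of the word-counting idea in Figure 1, the sketch below counts how many promoters in a group contain at least one match to a degenerate oligo. The function names, the toy sequences, and the choice to scan only the given strand are assumptions made for illustration; the source does not specify these details.

```python
import re

# IUPAC codes used in the abstract (r, y, s, w, m, k, n) mapped to regex classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "M": "[AC]", "K": "[GT]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate a degenerate oligo such as 'rGTCAAmn' into a compiled regex."""
    return re.compile("".join(IUPAC[ch.upper()] for ch in motif))


def count_promoters_with_motif(motif, promoters):
    """Count promoters (not occurrences) that harbor the motif at least once.

    Assumption: only the strand as given is scanned.
    """
    pattern = motif_to_regex(motif)
    return sum(1 for seq in promoters.values() if pattern.search(seq.upper()))


# Hypothetical toy promoters, far shorter than real ones.
toy_promoters = {
    "gene1": "AAGTCAACTT",
    "gene2": "TTTTTTTTTT",
    "gene3": "GGTCAAAGGG",
}
n_hits = count_promoters_with_motif("rGTCAAmn", toy_promoters)
```

Counting presence per promoter rather than total occurrences mirrors the "number of promoters harboring each 8-mer" wording of the caption.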
Figure 2. Position bias of motifs.
(A) The distribution of the motif "rGTCAAmn" along the promoters in cluster_N0. Distance is relative to the TSS. (B) A plot of each oligo's pValue against its z-score, from cluster_N0. Each dot represents an oligo. Shown are all oligos with pValue <1E-5, and a 0.03% sampling of those with pValue >1E-5. The blue line shows the trend. (C) A plot similar to B, but with oligos from randomly collected promoters. Note that the trend line is flat around 0.
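One way to read the z-score in Figure 2 is as a test of the mean motif position against a uniform model. The sketch below assumes that motif start positions are drawn uniformly from a fixed number of possible positions in the promoter and that larger coordinates lie closer to the TSS. The function name and the `n_positions` parameter are hypothetical, and the exact formula used by MotifIndexer is not given here.

```python
import math


def position_bias_z(mean_position, n_instances, n_positions):
    """z-score of an observed mean position under a discrete uniform model.

    Assumption: start positions are uniform over 0 .. n_positions - 1.
    """
    expected_mean = (n_positions - 1) / 2
    variance = (n_positions ** 2 - 1) / 12  # variance of a discrete uniform
    standard_error = math.sqrt(variance / n_instances)
    return (mean_position - expected_mean) / standard_error


# A motif centered in the promoter shows no bias.
z_center = position_bias_z(496, 100, 993)

# Illustrative inputs only: mean position 689 over 7625 instances,
# with an assumed 993 possible start positions.
z_shift = position_bias_z(689, 7625, 993)
```

Because the standard error shrinks with the square root of the number of instances, a frequent motif needs only a modest shift in mean position to reach a large z-score.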
Over-represented 8-mer motifs identified based on the pValue in the promoters of coexpressed genes of cluster_N0 that are induced by abiotic and biotic stress.
| Ranking | Motif | In Cluster | In Genome | pValue | q-Value |
| 0 | wrGTCAAm | 437 | 12418 | 3.68E-40 | 6.91E-33 |
| 1 | rGTCAAmn | 515 | 16289 | 1.57E-38 | 1.47E-31 |
| 2 | kTTGACyn | 515 | 16297 | 1.67E-38 | 1.04E-31 |
| 1066 | wTGACkTk | 361 | 11232 | 1.51E-21 | 2.65E-17 |
| 1067 | GAAAwkTm | 470 | 16228 | 1.51E-21 | 2.66E-17 |
| 1068 | TwGACnTk | 469 | 16183 | 1.52E-21 | 2.66E-17 |
| 1300 | rrrGTCAr | 385 | 12401 | 8.84E-21 | 1.27E-16 |
| 1301 | ACGCGkww | 107 | 1874 | 8.93E-21 | 1.29E-16 |
| 1302 | rrGTCArw | 421 | 14019 | 8.95E-21 | 1.29E-16 |
| 1303 | AwmGTCAr | 315 | 9380 | 8.97E-21 | 1.29E-16 |
| 93137 | ATwGACTw | 186 | 6508 | 1.00E-05 | 2.01E-03 |
| 93138 | AyGCsTyw | 186 | 6508 | 1.00E-05 | 2.01E-03 |
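The table does not say which statistical test produced the pValue column. A hypergeometric upper tail is one common choice for this kind of over-representation question, and the sketch below shows that choice only as an assumption, not as the method used by the authors. The function names and the toy numbers are hypothetical.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_upper_tail(in_cluster, cluster_size, in_genome, genome_size):
    """P(X >= in_cluster) when cluster_size promoters are drawn at random
    from genome_size promoters, of which in_genome carry the motif."""
    log_denominator = log_comb(genome_size, cluster_size)
    upper = min(cluster_size, in_genome)
    total = 0.0
    for i in range(in_cluster, upper + 1):
        misses = cluster_size - i
        if misses > genome_size - in_genome:
            continue  # impossible configuration, contributes nothing
        log_term = (log_comb(in_genome, i)
                    + log_comb(genome_size - in_genome, misses)
                    - log_denominator)
        total += math.exp(log_term)
    return min(total, 1.0)


# Toy example: 10 promoters, 5 carry the motif, a cluster of 3 all carry it.
p_toy = hypergeom_upper_tail(3, 3, 5, 10)
```

Working in log space keeps the binomial coefficients from overflowing when the genome-wide counts are in the tens of thousands.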
Major 8-mer motifs identified based on the position bias filter in the promoters of coexpressed genes of cluster_N0 that are induced by abiotic and biotic stress.
| Ranking | Motif | In Cluster | In Genome | pValue | Mean position | TSS factor |
| 1 | rGTCAAmn | 515 | 16289 | 1.57E-38 | 612 | 11.57 |
| 8 | AAAGTCww | 365 | 9914 | 2.15E-34 | 604 | 7.89 |
| 1067 | GAAAwkTm | 470 | 16228 | 1.51E-21 | 567 | 6.05 |
| 1301 | ACGCGkww | 107 | 1874 | 8.93E-21 | 647 | 5.44 |
| 9900 | AwAAAAGk | 429 | 15841 | 2.29E-12 | 537 | 3.08 |
| 9048 | GAATwwTr | 437 | 16150 | 1.03E-12 | 527 | 2.2 |
| 6996 | wCACGynk | 354 | 12129 | 9.07E-14 | 569 | 5 |
| 11205 | AATTArTw | 404 | 14760 | 6.76E-12 | 521 | 1.54 |
| 13307 | ATAAwATA | 392 | 14328 | 2.84E-11 | 521 | 1.53 |
| 16102 | kACGACyn | 174 | 5127 | 1.37E-10 | 570 | 3.25 |
| 23351 | AACAAAAA | 392 | 14735 | 2.28E-09 | 527 | 1.99 |
| 51838 | sACGCrCk | 57 | 1307 | 3.73E-07 | 641 | 3.67 |
| 21072 | AwTCAAAG | 206 | 6546 | 1.05E-09 | 552 | 2.65 |
| 58106 | sAAGACTw | 153 | 4899 | 7.10E-07 | 603 | 4.72 |
| 84420 | TGrCCGCs | 26 | 449 | 4.88E-06 | 635 | 2.35 |
Major 8-mer motifs ranked by fold-change enrichment in the promoters of coexpressed genes of cluster_N0 that are induced by abiotic and biotic stress.
| Related Major Motifs | Motifs | Cluster Size | In Cluster | In Genome | Fold change Enrichment | pValue | Mean Position | Z-score for TSS |
| rGTCAAmn, AAAGTCww | CTTTGACC | 712 | 64 | 1161 | 2.6 | 4.81E-12 | 676 | 4.95 |
| | AAAGTCAA | 712 | 203 | 4185 | 2.3 | 7.33E-31 | 629 | 6.65 |
| | AGTTGACy | 712 | 100 | 2096 | 2.2 | 4.28E-14 | 598 | 3.47 |
| GAAAwkTm, GAATwwTr | GAAAAGTC | 712 | 100 | 1704 | 2.8 | 2.93E-20 | 679 | 6.45 |
| | GAAAAGTm | 712 | 186 | 4292 | 2.0 | 3.83E-22 | 630 | 6.49 |
| | GAAAwTTC | 712 | 136 | 4231 | 1.5 | 8.08E-07 | 602 | 4.18 |
| ACGCGkww, ACrCGnkk | ACGCGTTA | 712 | 22 | 238 | 4.4 | 9.87E-09 | 549 | 0.75 |
| | AACCGCGT | 712 | 24 | 311 | 3.6 | 7.00E-08 | 763 | 4.45 |
| | AACGCGTy | 712 | 38 | 493 | 3.6 | 1.13E-11 | 633 | 2.84 |
| | AArCGCGT | 712 | 54 | 841 | 3.0 | 7.81E-13 | 706 | 5.36 |
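The fold-change column can be read as the fraction of cluster promoters carrying a motif divided by the fraction of all promoters carrying it. The sketch below follows that reading. The total number of promoters in the genome is not given in this record, so `ASSUMED_GENOME_PROMOTERS` is a placeholder value chosen for illustration, and the function name is hypothetical.

```python
def fold_enrichment(in_cluster, cluster_size, in_genome, genome_size):
    """Ratio of the motif's frequency in the cluster to its frequency genome-wide."""
    cluster_fraction = in_cluster / cluster_size
    genome_fraction = in_genome / genome_size
    return cluster_fraction / genome_fraction


# Assumption: the total promoter count is NOT stated in the source.
ASSUMED_GENOME_PROMOTERS = 33000

# Counts for CTTTGACC taken from the table above (64 of 712 in cluster, 1161 in genome).
fold = fold_enrichment(64, 712, 1161, ASSUMED_GENOME_PROMOTERS)
```

A value of 1 means the motif is as frequent in the cluster as in the genome as a whole; values above 1 indicate enrichment.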
Major 8-mer motifs identified for coexpressed genes from various clusters.
| Cluster | Cluster Size | Motif | In cluster | In genome | pValue | Mean position | z score for TSS factor | Similar to motif | Motif sequences |
| N19 | 217 | TTGACTTy | 101 | 5626 | 1.63E-24 | 624 | 5.06 | WBBOXPCWRKY1 | TTTGACy |
| | 217 | kGTCAAmn | 154 | 16388 | 4.85E-11 | 559 | 3.16 | WBBOXPCWRKY1 | TTTGACy |
| | 217 | | 93 | 7797 | 1.50E-10 | 598 | 3.42 | n/a | |
| | 217 | | 51 | 3420 | 1.20E-08 | 542 | 1.02 | n/a | |
| | 217 | CACCwmCC | 32 | 1941 | 1.13E-06 | 712 | 4.38 | BOXLCOREDCPAL | ACCwwCC |
| | 217 | | 21 | 1032 | 4.13E-06 | 701 | 3.17 | n/a | |
| N0 | 712 | rGTCAAmn | 515 | 16289 | 1.57E-38 | 612 | 11.57 | WBBOXPCWRKY1 | TTTGACy |
| | 712 | AAAGTCww | 365 | 9914 | 2.15E-34 | 604 | 7.89 | | |
| | 712 | GAAAwkTm | 470 | 16228 | 1.51E-21 | 567 | 6.05 | GAAATTT | GAAATTT |
| | 712 | GAATwwTr | 437 | 16150 | 1.03E-12 | 527 | 2.2 | | |
| | 712 | ACGCGkww | 107 | 1874 | 8.93E-21 | 647 | 5.44 | CGCGBOXAT | vCGCGb |
| | 712 | ACrCGnkk | 310 | 9212 | 1.85E-20 | 602 | 6.95 | | |
| | 712 | **kACGACyn** | 174 | 5127 | 1.37E-10 | 570 | 3.25 | n/a | |
| | 712 | **sACGCrCk** | 57 | 1307 | 3.73E-07 | 641 | 3.67 | n/a | |
| | 712 | **wCACGynk** | 354 | 12129 | 9.07E-14 | 569 | 5 | n/a | |
| | 712 | **AwAAAAGk** | 429 | 15841 | 2.29E-12 | 537 | 3.08 | n/a | |
| | 712 | **AATTArTw** | 404 | 14760 | 6.76E-12 | 521 | 1.54 | n/a | |
| | 712 | **ATAAwATA** | 392 | 14328 | 2.84E-11 | 521 | 1.53 | n/a | |
| | 712 | AACAAAAA | 392 | 14735 | 2.28E-09 | 527 | 1.99 | ANAERO1CONSENSUS | AAACAAA |
| | 712 | AwTCAAAG | 206 | 6546 | 1.05E-09 | 552 | 2.65 | T-box promoter motif | ACTTTG |
| | 712 | **sAAGACTw** | 153 | 4899 | 7.10E-07 | 603 | 4.72 | n/a | |
| | 712 | **TGrCCGCs** | 26 | 449 | 4.88E-06 | 635 | 2.35 | n/a | |
| | 197 | mCGCGTnn | 87 | 3760 | 8.59E-32 | 709 | 7.95 | CGCGBOXAT | vCGCGb |
| | 197 | GCGCGTsm | 15 | 565 | 1.46E-06 | 723 | 3.08 | | |
| | 197 | | 136 | 11958 | 2.02E-21 | 633 | 7.01 | n/a | |
| | 197 | | 32 | 2089 | 6.21E-07 | 671 | 3.32 | n/a | |
| | 197 | CCACGyGs | 31 | 1197 | 4.29E-12 | 720 | 4.42 | Agris_GBF1/2/3 BS in ADH1 | CCACGTGG |
| | 197 | rCCGACny | 50 | 4077 | 2.97E-07 | 638 | 3.62 | DRECRTCOREAT | rCCGAC |
| | 197 | wAATATCk | 91 | 9177 | 1.93E-08 | 544 | 1.56 | EVENINGAT | AAAATATCT |
| | 197 | | 133 | 15989 | 2.44E-08 | 572 | 3.41 | n/a | |
| | 197 | AwwTTGAC | 87 | 8781 | 5.25E-08 | 536 | 1.19 | WBOXATNPR1 | TTGAC |
| N18 | 465 | nGGCCCAn | 304 | 8287 | 2.14E-77 | 854 | 26.74 | UP1ATMSD | GGCCCAwww |
| | 465 | mAGCCCAn | 223 | 6964 | 2.67E-39 | 806 | 18.83 | SITEIIATCYTC | TGGGCy |
| | 465 | AAACCCTr | 181 | 5612 | 1.55E-30 | 786 | 14.54 | UP2ATMSD | AAACCCTA |
| | 465 | | 222 | 11462 | 1.50E-09 | 571 | 4.28 | n/a | |
| | 302 | AGGGTTTw | 152 | 5787 | 8.26E-40 | 870 | 18.41 | UP2ATMSD | AAACCCTA |
| | 302 | | 104 | 4709 | 2.78E-19 | 828 | 13.2 | n/a | |
| | 302 | | 175 | 12653 | 1.02E-12 | 604 | 5.81 | n/a | |
| | 302 | | 115 | 7141 | 2.54E-11 | 763 | 11.64 | n/a | |
| | 302 | | 59 | 2773 | 5.83E-10 | 676 | 4.8 | n/a | |
| | 302 | nCGrCGkn | 149 | 10534 | 8.55E-11 | 588 | 4.49 | CGACGOSAMY3 | CGACG |
| | 302 | | 35 | 1431 | 9.86E-08 | 652 | 3.17 | n/a | |
| | 302 | | 165 | 11494 | 4.66E-13 | 681 | 9.33 | n/a | |
| | 302 | | 122 | 8621 | 2.11E-08 | 627 | 5.36 | n/a | |
| | 1292 | | 419 | 7451 | 7.03E-18 | 709 | 17.61 | n/a | |
| | 1292 | nGsCCCAn | 522 | 10089 | 1.11E-15 | 685 | 17.76 | UP1ATMSD | GGCCCAwww |
| | 1292 | | 206 | 3603 | 6.65E-09 | 654 | 8.13 | n/a | |
| | 1292 | | 395 | 7940 | 8.71E-09 | 581 | 6.09 | n/a | |
| | 1292 | GATAAGnn | 627 | 13746 | 2.89E-08 | 595 | 9.33 | IBOX | GATAAG |
| | 1292 | CTCACTsw | 193 | 3405 | 4.43E-08 | 544 | 2.04 | SORLIP5AT | GAGTGAG |
| | 1292 | | 262 | 5081 | 4.73E-07 | 555 | 3.16 | n/a | |
| | 820 | AwTGGGCy | 259 | 6146 | 2.16E-20 | 738 | 15.86 | SITEIIATCYTC | TGGGCy |
| | 820 | rrCCGTTr | 196 | 4190 | 1.09E-19 | 669 | 9.43 | MYBCOREATCYCB1 | AACGG |
| | 820 | GCGsGArm | 86 | 1689 | 1.42E-10 | 656 | 5.21 | E2F1OSPCNA | GCGGGAAA |
| | 820 | AGwGwGwG | 344 | 10773 | 2.48E-09 | 609 | 8.52 | CTRMCAMV35S | TCTCTCTCT |
| | 820 | | 79 | 1843 | 1.75E-06 | 639 | 4.82 | n/a | |
| | 820 | rGGsTTTw | 345 | 11207 | 1.78E-07 | 637 | 10.24 | UP2ATMSD | AAACCCTA |
(Putative new motifs are marked bold).
Motifs identified in cluster_N0 via comparison between A. thaliana and A. lyrata.
| Motif | A. thaliana pValue | A. thaliana Mean Position | A. thaliana z score for TSS factor | A. lyrata pValue | A. lyrata Mean Position | A. lyrata z score for TSS factor | Conserved pValue |
| rGTCAAmn | 1.64E-38 | 554 | 5.23 | 2.91E-35 | 557 | 5.38 | 2.91E-35 |
| GAAAwkTC | 5.34E-22 | 605 | 6.39 | 1.29E-17 | 595 | 5.6 | 1.29E-17 |
| rmCGCGTw | 7.78E-23 | 613 | 4.07 | 2.60E-15 | 628 | 4.08 | 2.60E-15 |
| wCnACGAm | 1.67E-08 | 561 | 3.9 | 2.02E-09 | 573 | 4.74 | 1.67E-08 |
| TTGAATwk | 1.72E-08 | 567 | 4.76 | 2.64E-08 | 547 | 3.3 | 2.64E-08 |
| ACrCGCTn | 1.31E-07 | 538 | 1.07 | 2.33E-07 | 580 | 2.36 | 2.33E-07 |
| CGkACGmC | 6.57E-06 | 485 | −0.29 | 2.61E-06 | 472 | −0.52 | 6.57E-06 |
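In every row of the table above, the conserved pValue equals the larger (less significant) of the two species-level pValues. The sketch below encodes that observation. The function and argument names are illustrative, and whether MotifIndexer defines the conserved pValue exactly this way is an assumption drawn from the tabulated numbers.

```python
def conserved_p_value(p_species_a, p_species_b):
    """Take the less significant of two pValues, so a motif only scores well
    when it is over-represented in both species."""
    return max(p_species_a, p_species_b)


# Three rows copied from the table: (first pValue column, second pValue column).
rows = {
    "rGTCAAmn": (1.64e-38, 2.91e-35),
    "wCnACGAm": (1.67e-08, 2.02e-09),
    "CGkACGmC": (6.57e-06, 2.61e-06),
}
conserved = {motif: conserved_p_value(*pair) for motif, pair in rows.items()}
```

Taking the maximum is a conservative combination: a motif that is strongly enriched in one species but weakly enriched in the other is ranked by its weaker result.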
Comparison of 8-mer motifs identified from MotifIndexer vs Weeder and Amadeus.
| Cluster | Motifs identified by MotifIndexer | Motifs identified by Weeder | Motifs identified by Amadeus (**) |
| N0 | rGTCAAmn, AAAGTCww | TTGACT, TTGACTTT, GTTGAC, GACTTT, GACTTTTC, TTGACC, TGACTT, CGTTGACT, TGACTA | CwwrGTCAAm |
| | GAAAwkTm, GAATwwTr (*) | | |
| | ACGCGkww, ACrCGnkk | | ACGCGkTTw |
| | kACGACyn, sACGCrCk, wCACGynk (*) | | |
| | AwAAAAGk, AATTArTw, ATAAwATA, AACAAAAA, AwTCAAAG (*) | | |
| | sAAGACTw (*) | | |
| | | | kACTTTTTmA, mrvACkTTTA, TATTdCAATw, AmTwAwTTGC (*) |
| N19 | TTGACTTy, kGTCAAmn | GTTGAC, GTTGACTT | AdrGTCAAAb |
| | sCGTTkAn (*) | | |
| | TCGAATTk | | whTCGAAkTT |
| | CACCwmCC (*) | | |
| | AGTCkTCG (*) | | |
| | | | krAnAATTsA (*) |
| N18 | nGGCCCAn | TGGGCC, AGGCCC, CAGGCCCA, CTGGGCCT, AGGCCCAT, CGGCCCAG | mrGCCCA |
| | mAGCCCAn | GCCCAT, TTGGGC, AAGCCC | mrGCCCA |
| | AAACCCTr | | AAACCCTAr |
| | CCGGnnTn (*) | | |
| | | | ArCrrkAGTw, mArCGrCATC (*) |
(*) denotes a motif identified by a single program.
(**) Amadeus represents motifs as position weight matrices. For ease of comparison, they were converted into oligo format.
Motifs with shared position bias between rice and Arabidopsis.
| Motif | Similar to | Similar to motif | Arabidopsis instances | Arabidopsis mean position | Arabidopsis z-score | Rice instances | Rice mean position | Rice z-score |
| GGCCCAnn | Place_UP1ATMSD | GGCCCAwww | 7625 | 689 | 58.58 | 19856 | 667 | 83.652 |
| AGCCCAnn | Place_SITEIIATCYTC | TGGGCy | 7646 | 633 | 41.393 | 12307 | 604 | 41.213 |
| CACGyGnC | Agris_ABFs binding site | CACGTGGC | 2790 | 651 | 28.288 | 5551 | 609 | 29.027 |
| TCTCTCTy | Place_CTRMCAMV35S | TCTCTCTCT | 7243 | 585 | 26.014 | 10033 | 587 | 31.265 |
| GGsTTTTn | Agris_TELO-box promoter | AAACCCTAA | 8292 | 566 | 21.993 | 7733 | 574 | 23.745 |
| | n/a | | 10210 | 561 | 22.379 | 11960 | 550 | 20.223 |
| TGACGyGn | Agris_TGA1 binding site | TGACGTGG | 1954 | 612 | 17.645 | 5024 | 570 | 18.142 |
| | n/a | | 1386 | 624 | 16.445 | 4294 | 599 | 23.201 |
| AAAAsGCs | Place_CDA1ATCAB2 | CAAAACGC | 1828 | 602 | 15.667 | 2317 | 594 | 16.209 |
| CTATAAAw | Place_TATABOX1 | | 4636 | 560 | 15.055 | 5856 | 555 | 15.592 |
| TAAAsCCn | Place_UP2ATMSD | AAACCCTA | 5531 | 555 | 14.933 | 4131 | 568 | 15.952 |
| AnnCGACG | Place_CGACGOSAMY3 | CGACG | 3697 | 563 | 14.086 | 7802 | 558 | 18.743 |
| GCGCGnGn | Place_CGCGBOXAT | vCGCGb | 1033 | 621 | 13.937 | 11696 | 574 | 29.001 |
| | n/a | | 2183 | 581 | 13.655 | 3966 | 561 | 14.155 |
| GnCACGTw | Agris_ACE promoter | | 2069 | 582 | 13.415 | 4007 | 551 | 11.984 |
| GnCCGTTr | Place_MSACRCYM | AGACCGTTG | 1466 | 585 | 11.785 | 2546 | 576 | 13.926 |
| TAAATAss | Place_TATABOX1 | | 2827 | 560 | 11.645 | 3238 | 555 | 11.418 |
| GGwCCCAC | Place_SITEIIBOSPCNA | TGGTCCCAC | 431 | 642 | 10.471 | 2413 | 590 | 15.999 |
| | n/a | | 2550 | 556 | 10.313 | 3480 | 562 | 13.361 |
| | n/a | | 822 | 593 | 9.547 | 1793 | 616 | 17.525 |
| | n/a | | 318 | 606 | 6.794 | 360 | 611 | 7.508 |
| | n/a | | 326 | 597 | 6.286 | 2423 | 596 | 17.031 |
| CCGACCsA | Place_DRE2COREZMRAB17 | ACCGAC | 326 | 593 | 6.07 | 905 | 594 | 10.126 |
| | n/a | | 283 | 600 | 6.061 | 309 | 599 | 6.255 |
| TGACGTCA | Place_PALINDROMICCBOXGM | TGACGTCA | 259 | 596 | 5.556 | 270 | 593 | 5.514 |
| ACCCrCCC | Place_ACIPVPAL2 | | 228 | 595 | 5.161 | 1323 | 642 | 18.419 |
| | n/a | | 200 | 590 | 4.592 | 1031 | 590 | 10.357 |
| AGCGrGCC | Place_BS1EGCCR | AGCGGG | 124 | 607 | 4.261 | 780 | 604 | 10.397 |
(Putative new motifs are marked in bold).