Abstract
While a number of DNA sequence motifs have been functionally characterized, the full repertoire of motifs in an organism (the motifome) is yet to be characterized. This study widens the scope of motif content analysis to several monocot and dicot species: both rice species, Brachypodium, corn, and wheat as monocots, and Arabidopsis, Lotus japonicus, Medicago truncatula, and Populus tremula as dicots. All possible motifs were analyzed in different sets of sequences in these species: the whole genome; core, proximal, and distal promoters; 5' and 3' UTRs; and the 1st introns. Because more species are included here than in previous works, species relationships were analyzed based on the similarity of common motif content. Certain secondary structure elements were inferred in the genomes of these species, as well as new, previously unknown motifs. The 20 motifs common to the studied species were found to occur significantly more often within the promoters and 3' UTRs of genes, both of which are regulatory regions. Motifs common to the promoter regions of japonica rice, Brachypodium, and corn were also found in a number of orthologous and paralogous genes. Some of our motifs were found to be complementary to miRNA elements in Brachypodium distachyon and japonica rice.
Keywords: Dicot; Genome; Monocot; Motif; Promoter
Year: 2015 PMID: 26484161 PMCID: PMC4535654 DOI: 10.1016/j.gdata.2014.12.006
Source DB: PubMed Journal: Genom Data ISSN: 2213-5960
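The motif tables in this record list 8-nucleotide motifs ranked by frequency (top 100 or top 1000 per sequence set). A minimal sketch of how such a ranking could be produced is shown below, assuming a plain sliding-window count of every k-mer. The function name `top_kmers`, the default `k=8`, and the choice to skip windows containing ambiguous bases are illustrative assumptions, not the authors' published pipeline.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def top_kmers(sequence, k=8, n=100):
    """Return the n most frequent k-mers in `sequence` as (kmer, count) pairs.

    Overlapping windows are counted; windows containing anything other
    than A, C, G, or T (for example N) are skipped. Both choices are
    assumptions made for this sketch.
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= VALID_BASES:
            counts[window] += 1
    return counts.most_common(n)


# Toy usage on a short made-up sequence (not real genome data).
print(top_kmers("AAAAAAAAAAGAGAGAGAGNTTTTTTTT", k=8, n=3))
```

Applying the same function to each sequence set (whole genome, promoters, UTRs, introns) would give one ranked list per set, which is the kind of input the pairwise comparisons in the later tables need.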
Available data sets for the studied species.
| Species | Genome | Core promoters | Proximal promoters | Distal promoters | 1st introns | 5′ UTRs | 3′ UTRs |
|---|---|---|---|---|---|---|---|
| Monocots | | | | | | | |
| | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | 1 | 1 | 1 | X | X | X | 1 |
| | 1 | X | X | X | X | X | X |
| | 1 | 1 | 1 | 1 | X | X | X |
| Dicots | | | | | | | |
| | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | 1 | X | X | X | X | X | X |
| | 1 | X | X | X | X | X | X |
| | 1 | X | X | X | X | X | X |
General information on the genomes of the studied organisms.
| Species | A% | C% | G% | T% | Chrom. no. | Genome size (bp) | No. of genes | Reference |
|---|---|---|---|---|---|---|---|---|
| Monocots | | | | | | | | |
| | 26.8 | 23.2 | 23.2 | 26.8 | 5 | 271,923,306 | 12,825 | |
| | 28.2 | 21.8 | 21.8 | 28.2 | 12 | 382,150,945 | 30,294 | |
| | 28.6 | 21.4 | 21.4 | 28.6 | 12 | 427,026,737 | 49,710 | |
| | 26.5 | 23.5 | 23.5 | 26.5 | 10 | 2,065,722,704 | 54,814 | |
| | 27.3 | 22.7 | 22.7 | 27.3 | 7 | 6,846,530,000 | ~ 94,000–96,000 | |
| Dicots | | | | | | | | |
| | 32.0 | 18.0 | 18.0 | 32.0 | 5 | 147,812,252 | 33,323 | |
| | 33.4 | 16.6 | 16.6 | 33.4 | 6 | 119,146,348 | ~ 20,800 | |
| | 33.4 | 16.6 | 16.6 | 33.4 | 9 | 307,511,856 | ~ 18,844 | |
| | 33.2 | 16.8 | 16.8 | 33.4 | 19 | 417,640,243 | n.a. | |
Fig. 1a. Number of putative top 100 genomic motifs common to different combinations of the five monocot species studied.
b. Number of putative top 100 genomic motifs common to different combinations of the four dicot species studied.
List of 15 motifs common to monocots and dicots and their annotation. Reverse complement motifs underlined.
| Motif | PLACE annotation |
|---|---|
| | ATRICHPSPETE CARGCW8GAT CARGNCAT MARTBOX |
| AAAATAAA | -314MOTIFZMSBE1 CARGCW8GAT CARGNCAT ELEMENT1GMLBC3 MARTBOX |
| | -314MOTIFZMSBE1 3AF1BOXPSRBCS3 CARGCW8GAT CARGNCAT ELEMENT1GMLBC3 MARTBOX |
| TTTATTTT | -314MOTIFZMSBE1 CARGCW8GAT CARGNCAT ELEMENT1GMLBC3 MARTBOX |
| TTTGTTTT | |
| | -314MOTIFZMSBE1 3AF1BOXPSRBCS3 CARGCW8GAT CARGNCAT ELEMENT1GMLBC3 MARTBOX |
| | ATRICHPSPETE CARGCW8GAT CARGNCAT MARTBOX |
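The caption above notes that some of the common motifs are reverse complements of one another. The sketch below shows one way to find such pairs. The motif list is copied from the 15 motifs in the dicot distribution table of this record; the function names are hypothetical, and self-complementary motifs are ignored as a simplifying assumption.

```python
# Map each base to its complement; reversing afterwards gives the reverse complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Return the reverse complement of a DNA motif written with A, C, G, T."""
    return motif.translate(COMPLEMENT)[::-1]


# The 15 motifs listed in the dicot distribution table of this record.
COMMON_MOTIFS = [
    "AAAAAAAA", "AAAAAGAA", "AAAAGAAA", "AAAATAAA", "AAAGAAAA",
    "AAATAAAA", "AAGAAAAA", "TTCTTTTT", "TTTATTTT", "TTTCTTTT",
    "TTTGTTTT", "TTTTATTT", "TTTTCTTT", "TTTTTCTT", "TTTTTTTT",
]


def reverse_complement_pairs(motifs):
    """Split motifs into reverse-complement pairs and motifs with no partner."""
    present = set(motifs)
    pairs = set()
    unpaired = []
    for motif in motifs:
        partner = reverse_complement(motif)
        if partner == motif:
            continue  # self-complementary motif; not treated as a pair here
        if partner in present:
            pairs.add(tuple(sorted((motif, partner))))
        else:
            unpaired.append(motif)
    return sorted(pairs), unpaired


pairs, unpaired = reverse_complement_pairs(COMMON_MOTIFS)
for forward, reverse in pairs:
    print(forward, "<->", reverse)
print("no partner in list:", unpaired)
```

With this list, fourteen motifs form seven reverse-complement pairs (for example AAAATAAA and TTTATTTT), and TTTGTTTT has no partner in the list.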
Number of putative genomic top 100 motifs shared between different numbers of species for all monocot and dicot species.
| Species | Motifs shared with 1 species | Motifs shared with 2 species | Motifs shared with 3 species | Motifs shared with 4 species | Motifs shared with 5 species |
|---|---|---|---|---|---|
| Monocots | | | | | |
| | 0 | 29 | 32 | 19 | 20 |
| | 2 | 27 | 34 | 19 | 20 |
| | 27 | 3 | 31 | 19 | 20 |
| | 44 | 7 | 13 | 16 | 20 |
| | 65 | 6 | 6 | 3 | 20 |
| Dicots | | | | | |
| | 28 | 18 | 13 | 41 | – |
| | 14 | 14 | 31 | 41 | – |
| | 15 | 18 | 26 | 41 | – |
| | 26 | 10 | 23 | 41 | – |
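A sharing profile like the one tabulated above can be sketched as follows. The sketch assumes that "shared with k species" means the motif appears in the top lists of exactly k species, counting the species itself; most rows summing to 100 suggests this reading, but it is an assumption. The function name and the toy species labels are hypothetical.

```python
from collections import Counter


def sharing_profile(top_motifs):
    """Count, per species, how many of its top motifs occur in exactly k species.

    `top_motifs` maps a species label to an iterable of its top motifs.
    Returns a dict mapping each species to a Counter {k: number of motifs}.
    """
    motif_sets = {species: set(motifs) for species, motifs in top_motifs.items()}
    # For every motif, the number of species whose top list contains it.
    presence = Counter(m for motifs in motif_sets.values() for m in motifs)
    return {
        species: Counter(presence[m] for m in motifs)
        for species, motifs in motif_sets.items()
    }


# Toy input with made-up top lists (not the study's data).
toy = {
    "species_1": ["AAAAAAAA", "TTTTTTTT", "AGAGAGAG"],
    "species_2": ["AAAAAAAA", "TTTTTTTT"],
    "species_3": ["AAAAAAAA"],
}
profile = sharing_profile(toy)
for species, counts in profile.items():
    print(species, dict(sorted(counts.items())))
```

In the toy input, AAAAAAAA is counted under k = 3 for every species, while AGAGAGAG is counted under k = 1 for `species_1` only.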
Distribution of the 15 common motifs in three of the monocot species in promoters, within genes, and 3′ UTRs.
| Motif | O | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AAAAAAAA | 9704 (35.77%) | 7490 (27.61%) | 9930 (36.6%) | 16932 (39.88%) | 8504 (20.03%) | 17018 (40.08%) | 43339 (43.53%) | 18516 (18.59%) | 37700 (37.86%) | 17139 (33.3%) | 15953 (31%) | 18365 (35.68%) |
| AAAAAGAA | 2203 (35.44%) | 1715 (27.59%) | 2297 (36.95%) | 3418 (38.53%) | 2069 (23.32%) | 3384 (38.14%) | 11752 (39.67%) | 6936 (23.41%) | 10936 (36.91%) | 11730 (33.76%) | 11156 (32.11%) | 11856 (34.12%) |
| AAAAGAAA | 2601 (35.28%) | 2068 (28.05%) | 2702 (36.65%) | 3989 (38.47%) | 2385 (23%) | 3994 (38.52%) | 13977 (39.81%) | 8380 (23.87%) | 12747 (36.31%) | 13282 (34%) | 12455 (31.88%) | 13320 (34.1%) |
| AAAATAAA | 2839 (35.63%) | 2186 (27.44%) | 2941 (36.91%) | 4780 (42.49%) | 2200 (19.55%) | 4269 (37.95%) | 11265 (38.79%) | 7550 (25.99%) | 10225 (35.21%) | 12529 (33.94%) | 11713 (31.73%) | 12667 (34.31%) |
| AAAGAAAA | 2549 (35.57%) | 2014 (28.1%) | 2602 (36.31%) | 3971 (38.76%) | 2366 (23.09%) | 3906 (38.13%) | 14279 (39.81%) | 8569 (23.89%) | 13012 (36.28%) | 13220 (33.98%) | 12425 (31.94%) | 13255 (34.07%) |
| AAATAAAA | 2562 (36.46%) | 1911 (27.19%) | 2553 (36.33%) | 4024 (41.28%) | 2010 (20.62%) | 3712 (38.08%) | 10116 (38.35%) | 6900 (26.15%) | 9362 (35.49%) | 11660 (33.62%) | 11041 (31.84%) | 11975 (34.53%) |
| AAGAAAAA | 2379 (35.15%) | 1922 (28.4%) | 2466 (36.44%) | 3719 (38.68%) | 2169 (22.56%) | 3726 (38.75%) | 13257 (39.94%) | 7871 (23.71%) | 12057 (36.33%) | 11076 (33.46%) | 10595 (32.01%) | 11423 (34.51%) |
| AGAGAGAG | 2551 (36.59%) | 1836 (26.33%) | 2584 (37.06%) | 4628 (48.11%) | 1646 (17.11%) | 3344 (34.76%) | 10559 (50.19%) | 3682 (17.5%) | 6797 (32.3%) | 10652 (33.9%) | 10039 (31.95%) | 10723 (34.13%) |
| TTCTTTTT | 2218 (36.02%) | 1758 (28.55%) | 2181 (35.42%) | 3418 (39.22%) | 1943 (22.3%) | 3352 (38.47%) | 11210 (38.79%) | 7047 (24.38%) | 10642 (36.82%) | 11767 (34.3%) | 11088 (32.32%) | 11450 (33.37%) |
| TTTATTTT | 2922 (37.1%) | 2150 (27.29%) | 2804 (35.6%) | 4647 (42.39%) | 2169 (19.78%) | 4145 (37.81%) | 11273 (38.73%) | 7516 (25.82%) | 10312 (35.43%) | 12345 (34.3%) | 11420 (31.73%) | 12223 (33.96%) |
| TTTCTTTT | 2657 (36.71%) | 2031 (28.06%) | 2548 (35.21%) | 4025 (39.63%) | 2296 (22.61%) | 3833 (37.74%) | 13510 (39.04%) | 8483 (24.51%) | 12608 (36.43%) | 13187 (34.42%) | 12412 (32.4%) | 12709 (33.17%) |
| TTTGTTTT | 1778 (36.12%) | 1396 (28.36%) | 1748 (35.51%) | 2784 (38.12%) | 1758 (24.07%) | 2761 (37.81%) | 10184 (37.63%) | 7230 (26.72%) | 9647 (35.65%) | 12370 (33.61%) | 11994 (32.59%) | 12441 (33.8%) |
| TTTTATTT | 2488 (36.49%) | 1947 (28.55%) | 2383 (34.95%) | 3953 (41.83%) | 1883 (19.92%) | 3614 (38.24%) | 10097 (38.3%) | 6885 (26.11%) | 9378 (35.57%) | 11546 (34.34%) | 10712 (31.86%) | 11356 (33.78%) |
| TTTTCTTT | 2631 (36.65%) | 1983 (27.62%) | 2564 (35.72%) | 3997 (39.41%) | 2292 (22.59%) | 3853 (37.99%) | 14112 (39.71%) | 8669 (24.39%) | 12753 (35.88%) | 13030 (34.1%) | 12358 (32.34%) | 12818 (33.54%) |
| TTTTTCTT | 2464 (36.23%) | 1899 (27.92%) | 2437 (35.83%) | 3807 (39.83%) | 2104 (22.01%) | 3645 (38.14%) | 12886 (39.24%) | 8168 (24.87%) | 11784 (35.88%) | 11114 (34.54%) | 10423 (32.39%) | 10639 (33.06%) |
| TTTTTTTT | 9638 (36.22%) | 7599 (28.56%) | 9366 (35.2%) | 16845 (40.59%) | 8570 (20.65%) | 16076 (38.74%) | 43035 (43.19%) | 18886 (18.95%) | 37706 (37.84%) | 17833 (34.62%) | 16465 (31.96%) | 17210 (33.41%) |
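The percentages in the table above appear to be each region's share of a motif's total count within one group of three columns. The sketch below recomputes them, assuming the three columns of a group follow the caption's order (promoters, within genes, 3′ UTRs); the column labels were lost from the header, so this ordering and the function name are assumptions.

```python
def region_shares(promoter, genic, utr3):
    """Return each region's percentage share of the summed motif count."""
    total = promoter + genic + utr3
    return tuple(100.0 * count / total for count in (promoter, genic, utr3))


# First three counts of the AAAAAAAA row in the table above.
shares = region_shares(9704, 7490, 9930)
print(", ".join(f"{share:.2f}%" for share in shares))
```

For this row the recomputed shares agree with the tabulated 35.77%, 27.61%, and 36.6% to within 0.01 percentage points, which is consistent with rounding or truncation in the published table.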
Common genome motifs and their Spearman coefficient from the top 1000 motifs from the monocot species and Arabidopsis as an outlier species.
| 939/0.710 | |||||
| 704/0.417 | 716/0.449 | ||||
| 446/0.646 | 448/0.604 | 498/0.621 | |||
| 373/0.756 | 376/0.782 | 414/0.674 | 404/0.653 | ||
| 392/0.571 | 407/0.521 | 492/0.392 | 433/0.541 | 385/0.597 |
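Each cell in the table above pairs a count of shared motifs with a Spearman coefficient. A hedged sketch of one way to obtain both values from two ranked motif lists follows. It assumes each list holds distinct motifs ordered from most to least frequent, that shared motifs are re-ranked within each list before comparison, and that there are no tied ranks; the record does not state which ranking convention the authors used.

```python
def common_motifs_and_spearman(ranked_a, ranked_b):
    """Return (number of shared motifs, Spearman rho over the shared motifs).

    rho is None when fewer than two motifs are shared.
    """
    common = set(ranked_a) & set(ranked_b)
    n = len(common)
    if n < 2:
        return n, None
    # Re-rank the shared motifs within each list (1 = most frequent).
    shared_a = [m for m in ranked_a if m in common]
    shared_b = [m for m in ranked_b if m in common]
    rank_a = {m: r for r, m in enumerate(shared_a, start=1)}
    rank_b = {m: r for r, m in enumerate(shared_b, start=1)}
    # Spearman's formula for untied ranks: 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    d_squared = sum((rank_a[m] - rank_b[m]) ** 2 for m in common)
    rho = 1 - 6 * d_squared / (n * (n * n - 1))
    return n, rho


# Toy ranked lists (made up for illustration).
list_a = ["AAAAAAAA", "TTTTTTTT", "AGAGAGAG", "AAAATAAA"]
list_b = ["TTTTTTTT", "AAAAAAAA", "AAAATAAA", "CCCCCCCC"]
print(common_motifs_and_spearman(list_a, list_b))
```

If ties or original (not re-ranked) positions matter, a library routine such as `scipy.stats.spearmanr` applied to the chosen rank vectors would be a more general alternative.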
Distribution of the 15 common motifs in two of the dicot species in promoters, within genes, and 3′ UTRs.
| Motif | ||||||
|---|---|---|---|---|---|---|
| AAAAAAAA | 113367 (49.58%) | 21263 (9.3%) | 93997 (41.11%) | 176179 (37.06%) | 123956 (26.08%) | 175149 (36.85%) |
| AAAAAGAA | 20023 (44.96%) | 6323 (14.19%) | 18188 (40.84%) | 21275 (36.99%) | 15009 (26.09%) | 21224 (36.9%) |
| AAAAGAAA | 23235 (44.66%) | 7359 (14.14%) | 21421 (41.18%) | 25438 (36.84%) | 18090 (26.19%) | 25518 (36.95%) |
| AAAATAAA | 27470 (50.34%) | 5025 (9.2%) | 22073 (40.45%) | 46731 (37%) | 33212 (26.29%) | 46353 (36.7%) |
| AAAGAAAA | 24919 (44.73%) | 7887 (14.15%) | 22899 (41.1%) | 27648 (37.09%) | 19527 (26.19%) | 27365 (36.71%) |
| AAATAAAA | 26280 (49.71%) | 4922 (9.31%) | 21657 (40.97%) | 43908 (36.98%) | 31295 (26.35%) | 43526 (36.65%) |
| AAGAAAAA | 25016 (45.3%) | 7900 (14.3%) | 22302 (40.38%) | 28103 (36.92%) | 20056 (26.35%) | 27943 (36.71%) |
| TTCTTTTT | 19697 (44.9%) | 6015 (13.71%) | 18147 (41.37%) | 21078 (37.14%) | 14781 (26.04%) | 20885 (36.8%) |
| TTTATTTT | 26774 (49.75%) | 5057 (9.39%) | 21986 (40.85%) | 46196 (36.8%) | 33009 (26.29%) | 46310 (36.89%) |
| TTTCTTTT | 23091 (44.74%) | 7315 (14.17%) | 21204 (41.08%) | 25449 (36.95%) | 17949 (26.06%) | 25458 (36.97%) |
| TTTGTTTT | 26116 (45.24%) | 8194 (14.2%) | 23419 (40.57%) | 24684 (36.71%) | 17745 (26.39%) | 24814 (36.9%) |
| TTTTATTT | 25977 (49.31%) | 4984 (9.46%) | 21719 (41.22%) | 43667 (36.87%) | 30965 (26.14%) | 43790 (36.97%) |
| TTTTCTTT | 24914 (45.01%) | 7814 (14.11%) | 22619 (40.86%) | 27275 (36.84%) | 19404 (26.2%) | 27354 (36.94%) |
| TTTTTCTT | 25058 (45.5%) | 7743 (14.05%) | 22271 (40.43%) | 28005 (36.85%) | 19829 (26.09%) | 28143 (37.04%) |
| TTTTTTTT | 112239 (49.57%) | 20617 (9.1%) | 93565 (41.32%) | 173501 (36.76%) | 123670 (26.2%) | 174684 (37.02%) |
Fig. 2. Pairwise comparison of common putative motifs in the whole genome, core and proximal promoters between all monocot and dicot species.
Fig. 3. Number of common motifs within the 7 sequence subsets between Oryza sativa japonica and Arabidopsis thaliana.
3′ UTR motif content similarity and Spearman ranking between the studied monocot species.
| 167 (0.947) | 376 (0.589) | ||
| 515 (0.498) | |||
Proximal promoter motif content similarity and Spearman ranking between the studied monocot species.
| 642 (0.344) | 645 (0.416) | 655 (0.409) | ||
| 686 (0.399) | 439 (0.564) | |||
| 518 (0.447) | ||||
Distal promoter motif content similarity and Spearman ranking between the studied monocot species.
| 707 (0.341) | 619 (0.384) | ||
| 460 (0.545) | |||
List of Oryza sativa japonica and Brachypodium distachyon genes with more than 50 occurrences of reverse complementary motifs in their 3′ UTR regions.
| Gene ID | Number of reverse complementary 3′ UTR motifs | Gene annotation |
|---|---|---|
| Bradi1g32590 | 53 | 6-Phosphogluconate dehydrogenase family protein |
| Bradi1g56250 | 60 | A20/AN1-like zinc finger family protein |
| Bradi4g16400 | 73 | Agenet domain-containing protein |
| Os12g18729 | 75 | ARM repeat superfamily protein |
| Os08g43090 | 64 | Basic-leucine zipper (bZIP) transcription factor family protein |
| Bradi1g11310 | 61 | B-box type zinc finger protein with CCT domain |
| Os08g03310 | 67 | CCCH-type zinc finger family protein with RNA-binding domain |
| Os01g10040 | 55 | Cytochrome P450, family 90, subfamily D, polypeptide 1 |
| Os11g05970 | 72 | FAD/NAD(P)-binding oxidoreductase family protein |
| Bradi2g05226 | 64 | Gigantea protein (GI) |
| Os11g47870 | 50 | GRAS family transcription factor |
| Os08g33750 | 58 | Homeodomain-like superfamily protein |
| Os06g06080 | 59 | Hydrolase-like protein family |
| Os02g01150 | 61 | hydroxypyruvate reductase |
| Os04g56500 | 52 | ILI1 binding bHLH 1 |
| Os01g61720 | 54 | IQ-domain 2 |
| Os01g10504 | 87 | K-box region and MADS-box transcription factor family protein |
| Os07g01490 | 156 | Kinesin 5 |
| Os10g13970 | 68 | Leucine-rich repeat protein kinase family protein |
| Os06g19990 | 56 | LORELEI-LIKE-GPI ANCHORED PROTEIN 3 |
| Os06g49380 | 52 | LRR and NB-ARC domains-containing disease resistance protein |
| Os07g44090 | 60 | myb domain protein 61 |
| Os01g09550 | 65 | NAC domain containing protein 75 |
| Bradi3g08890 | 51 | PEBP (phosphatidylethanolamine-binding protein) family protein |
| Bradi1g04820 | 56 | Peptidase S24/S26A/S26B/S26C family protein |
| Os01g53880 | 60 | Phytochrome-associated protein 1 |
| Os12g37480 | 53 | Plant invertase/pectin methylesterase inhibitor superfamily protein |
| Os02g11000 | 59 | Plant Tudor-like RNA-binding protein |
| Os06g41930 | 50 | PLATZ transcription factor family protein |
| Os07g28260 | 72 | P-loop containing nucleoside triphosphate hydrolases superfamily protein |
| Os05g01380 | 53 | Polygalacturonase inhibiting protein 1 |
| Os03g57940 | 56 | Protein kinase family protein |
| Os04g21340 | 52 | Protein of unknown function (DUF1685) |
| Os08g45170 | 74 | Protein of Unknown Function (DUF239) |
| Os02g08364 | 76 | Protein phosphatase 2C family protein |
| Bradi3g52740 | 57 | Pyrophosphorylase 1 |
| Bradi2g40040 | 63 | Ribosomal L28 family |
| Bradi4g44500 | 56 | Saposin B domain-containing protein |
| Os03g27590 | 56 | Serine carboxypeptidase-like 51 |
| Bradi2g39275 | 152 | Serine protease inhibitor, potato inhibitor I-type family protein |
| Bradi1g21510 | 80 | SPX domain gene 3 |
| Os12g08260 | 63 | Thiamin diphosphate-binding fold (THDP-binding) superfamily protein |
| Os04g20400 | 118 | UDP-Glycosyltransferase superfamily protein |
Core promoter motif content similarity and Spearman ranking between the studied monocot species.
| 271 (0.688) | 404 (0.531) | 432 (0.544) | ||
| 515 (0.498) | 569 (0.420) | |||
| 580 (0.479) | ||||