Literature DB >> 22912824

Discovery of stress responsive DNA regulatory motifs in Arabidopsis.

Shisong Ma1, Shawn Bachan, Matthew Porto, Hans J Bohnert, Michael Snyder, Savithramma P Dinesh-Kumar.   

Abstract

The discovery of DNA regulatory motifs in the sequenced genomes using computational methods remains challenging. Here, we present MotifIndexer--a comprehensive strategy for de novo identification of DNA regulatory motifs at a genome level. Using word-counting methods, we indexed the existence of every 8-mer oligo composed of bases A, C, G, T, r, y, s, w, m, k, n or 12-mer oligo composed of A, C, G, T, n, in the promoters of all predicted genes of Arabidopsis thaliana genome and of selected stress-induced co-expressed genes. From this analysis, we identified number of over-represented motifs. Among these, major critical motifs were identified using a position filter. We used a model based on uniform distribution and the z-scores derived from this model to describe position bias. Interestingly, many motifs showed position bias towards the transcription start site. We extended this model to show biased distribution of motifs in the genomes of both A. thaliana and rice. We also used MotifIndexer to identify conserved motifs in co-expressed gene groups from two Arabidopsis species, A. thaliana and A. lyrata. This new comparative genomics method does not depend on alignments of homologous gene promoter sequences.

Entities:  

Mesh:

Substances:

Year:  2012        PMID: 22912824      PMCID: PMC3418279          DOI: 10.1371/journal.pone.0043198

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The mechanisms behind regulated gene expression have been subject of passionate discussions for generations of molecular biologists. Though the list for such mechanisms has constantly been broadened, promoter motif based regulation remains an extensively studied topic of importance for eukaryotic gene expression. DNA regulatory motifs, or cis-regulatory elements, are 5–15 base pair (bp) long nucleotides within the promoter region that function primarily as transcription factor (TF) binding sites, although binding for other proteins, such as kinases, has also been reported [1]. The interaction between TFs and their target motifs can lead to induction or repression of gene expression. Various methods have been used to identify motifs, including deletion based functional analysis, comparative genomics, analysis of co-expressed genes, and ChIP-chip or ChIP-seq [2], [3], [4]. It is assumed that co-expressed genes are likely to contain a set of over-represented motifs in their promoters that lead to their similar expression patterns. Various computational methods have been developed to discover such motifs. These methods can be divided into two categories based on their ways to represent motifs: those using a probabilistic sequence model and those using word-based methods [3], [5]. Probabilistic approaches describe motifs as position weight matrix (PWM), where parameters are optimized using maximum-likelihood principles or Bayesian inference. Many of these methods are designed for the discovery of long motifs commonly found in prokaryotes, rather than the shorter motifs more common to eukaryotes. These methods might also fail to uncover globally optimal solutions since some form of local search is used in these methods [5]. In contrast, word-based methods describe motifs as word strings or oligonucleotides with degenerative bases. These methods employ exhaustive enumeration to count the frequency of oligonucleotides and identify over-represented oligonucleotide sequences from input promoters. This method is more suitable for identification of the shorter length motifs found in eukaryotic genomes, and is also guaranteed to find globally optimal solutions [5]. A word-based algorithm, Oligo-Analysis, developed by van Helden et al, was proved useful in discovering yeast motifs, but was limited to relatively simple motifs without much degeneration [6]. By contrast, the algorithm Yeast Motif Finder (YMF) allows searching for degenerative motifs containing the wobble bases, r(A|G), y(C|T), s(C|G), and w(A|T), or motifs with spacer n in the middle, but the maximum number of wobble bases in a motif is typically restricted to 2 [7]. Two other algorithms, Discriminating Word Enumerator (DWE) and Discovery of Rank Imbalanced Motifs (DRIM), also search for words with restricted number of degenerated bases [8], [9]. Yet another algorithm, Weeder, searches for motifs ranging from 6–12 bp, with 1–4 mismatches [10]. However, to our knowledge there is yet to be a program that enumerates motifs with all possible wobble bases without limiting their numbers in a single motif. As more complete genome sequences are available, comparative genomics can be used for motif discovery. These methods aim to identify evolutionarily conserved non-coding sequences via sequence alignment, usually within the promoter region of homologous gene sets from related species. Such a strategy has been used to discover motifs from yeast, Drosophila, and mammals [4], [11], [12]. These studies show that many conserved motifs tend to have their positions biased towards the transcriptional start site (TSS), supporting the view that they are bona fide motifs. Here, we describe an algorithm termed MotifIndexer that is based on word counting strategy. We have successfully used this algorithm to identify motifs from promoters of all predicted A. thaliana genes and of selected stress-induced co-expressed genes [13]. We demonstrate a new comparative genomics method to discover potential DNA regulatory elements by identifying motifs with conserved positional bias in both the Arabidopsis and the rice genomes. Significantly, the method does not depend on alignments of homologous gene promoter sequences.

Results

Indexing the Presence of 8-mer Oligos in the A. thaliana Genome and Stress Induced Co-expressed Gene Cluster

In word-counting based motif discovery algorithms, the format used to represent motifs is usually limited by search space volume. To balance between motif length and motif complexity, we chose two different motif formats. The first considers 8-mer oligos, with all possible combination of A, C, G, T, r(A|G), y(C|T), s(C|G), w(A|T), m(A|C), k(G|T), or n(A|C|G|T). The 214,358,881 or 118 oligos in this format cover almost all versatility in terms of nucleotide degeneration. Only wobble bases representing three different nucleotides will be missed in this format. The second format extracts 12-mer oligos, with all possible combination of A, C, G, T, or n. This format, with limited complexity in degeneration, allows characterizing motifs up to 12 bp, and the total number of oligos is 244,140,625 or 512. For every 8- or 12-mer oligo, we developed an algorithm to efficiently catalog the number of promoters harboring it within the A. thaliana genome, or within any selected group of promoters. The procedure for cataloging the 8-mer oligos is shown in Figure 1. The procedure loops through all promoters with iterations. In every iteration, a specific promoter is selected (step 1), all 8-mer oligos without degeneration are extracted (step 2), and listed together with all their degenerative forms (step 3). Next, the collection of 8-mers are translated into numbers according to an undercimal (base-11) positional notation system, in which the nucleotides A, C, G, T, r, y, s, w, m, k, n are converted to digit number 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and A, respectively (step 4). Then, an array recording the number of promoters harboring these oligos is updated, allowing for an increase of 1 for every appearance (step 5). For 12-mer oligos, the procedure is similar except that a quinary (base-5) positional notation system is used.
Figure 1

Five steps to count the number of promoters harboring each 8-mer oligos within a selected group of promoters.

Once oligo numbers are cataloged for the whole genome and for promoters of a specific group of co-expressed genes, a pValue is calculated for every 8-mer or 12-mer oligo based on the hypergeometric distribution. This pValue provides a measure for over or under-representation of the corresponding oligo in the selected groups of promoters. Our method guarantees the finding of all 8-mer oligos with pValue smaller than a selected cut-off value. In the following analysis, we focused on oligos existing in fewer than 50% of all the promoters in the genome, which include 75,029,949 8-mer oligos and 242,575,400 12-mer oligos. To test our algorithm, we selected co-expressed gene groups consisting of 7,424 genes from a previous study that mainly includes Arabidopsis genes that are differentially regulated during biotic and abiotic stress responses [13]. In this analyses, the AtGenExpress stress data set [14] were analyzed using fuzzy k-means clustering, and genes were divided into 178 clusters according to expression patterns, with 22 major clusters (N0-N21; see Figure 2 in [13]). Our analysis here focused on genes induced by both abiotic and biotic stresses (cluster_N0), common stress responsive genes (cluster_N12), and genes induced by pathogen-associated molecular patterns (PAMPs) (cluster_N19).
Figure 2

Position bias of motifs.

(A) The distribution of the motif “rGTCAAmn" along the promoters in cluster_N0. Distance is relative to the TSS site. (B) A plot for oligos’ pValue against its z-score, from cluster_N0. Each dots represents a oligo. Shown are all oligos with pValue <1E-5, and a 0.03% sampling of those with pValue >1E-5. The blue line shows the trend. (C) A similar plot to B, but with oligos from randomly collected promoters. Note that the trend line is flat around 0.

Position bias of motifs.

(A) The distribution of the motif “rGTCAAmn" along the promoters in cluster_N0. Distance is relative to the TSS site. (B) A plot for oligos’ pValue against its z-score, from cluster_N0. Each dots represents a oligo. Shown are all oligos with pValue <1E-5, and a 0.03% sampling of those with pValue >1E-5. The blue line shows the trend. (C) A similar plot to B, but with oligos from randomly collected promoters. Note that the trend line is flat around 0. All 93,138 over-represented 8-mer oligos with pValues < = 1E-05 for cluster_N0 are shown in Table 1, sorted by pValue. The highest scoring two motifs identified were “wrGTCAAm" and “rGTCAAmn", with pValue 7.9E-32 and 1.19E-30 respectively. We noticed that many oligos below these pValues were simply variants of these two primary motifs. To increase the discovery of distinct rather than similar motifs, we used a position-based filter to eliminate variants of major motifs. Once a major motif was selected, its positions along the promoters were marked, and other oligos that share at least 10% of its position were removed. Among the remaining oligos, another major motif was picked from the top candidates with lowest pValue, and its position was marked, in addition to the positions of the previous major motifs. This process was repeated until less than 10 oligos remained. Major motifs identified using this process for cluster_N0 are listed in Table 2.
Table 1

Over-represented 8-mer motifs identified based on the pValue in the coexpressed genes promoters of cluster_N0 that are induced by abiotic and biotic stress.

RankingMotifInClusterIn GenomepValueq-Value
0wrGTCAAm437124183.68E-406.91E-33
1rGTCAAmn515162891.57E-381.47E-31
2kTTGACyn515162971.67E-381.04E-31
1066wTGACkTk361112321.51E-212.65E-17
1067GAAAwkTm470162281.51E-212.66E-17
1068TwGACnTk469161831.52E-212.66E-17
1300rrrGTCAr385124018.84E-211.27E-16
1301ACGCGkww10718748.93E-211.29E-16
1302rrGTCArw421140198.95E-211.29E-16
1303AwmGTCAr31593808.97E-211.29E-16
93137ATwGACTw18665081.00E-052.01E-03
93138AyGCsTyw18665081.00E-052.01E-03
Table 2

Major 8-mer motifs identified based on the position bias filter in the coexpressed genes promoters of cluster_N0 that are induced by abiotic and biotic stress.

RankingMotifIn ClusterIn GenomepValueMean positionTSS factor
1rGTCAAmn515162891.57E-3861211.57
8AAAGTCww36599142.15E-346047.89
1067GAAAwkTm470162281.51E-215676.05
1301ACGCGkww10718748.93E-216475.44
9900AwAAAAGk429158412.29E-125373.08
9048GAATwwTr437161501.03E-125272.2
6996wCACGynk354121299.07E-145695
11205AATTArTw404147606.76E-125211.54
13307ATAAwATA392143282.84E-115211.53
16102kACGACyn17451271.37E-105703.25
23351AACAAAAA392147352.28E-095271.99
51838sACGCrCk5713073.73E-076413.67
21072AwTCAAAG20665461.05E-095522.65
58106sAAGACTw15348997.10E-076034.72
84420TGrCCGCs264494.88E-066352.35
The over-represented oligos can also be sorted by their fold enrichment in the group of selected promoters relative to the whole genome. For example, in cluster_N0, among the oligos with similarities to the major motifs, many showed higher fold change enrichment than the major motifs themselves. Table 3 lists such oligos, grouped by the related major motifs. However, fold change enrichment is a measure prone to false discovery according to our permutation experiment (data not shown), therefore pValue will be used in the following analysis.
Table 3

Major 8-mer motifs ranked by fold-change enrichment in the coexpressed genes promoters of cluster_N0 that are induced by abiotic and biotic stress.

Related Major MotifsMotifsCluster SizeIn ClusterIn GenomeFold change EnrichmentpValueMean PositionZ-score for TSS
rGTCAAmn, AAAGTCwwCTTTGACC7126411612.64.81E-126764.95
AAAGTCAA71220341852.37.33E-316296.65
AGTTGACy71210020962.24.28E-145983.47
GAAAwkTm, GAATwwTrGAAAAGTC71210017042.82.93E-206796.45
GAAAAGTm71218642922.03.83E-226306.49
GAAAwTTC71213642311.58.08E-076024.18
ACGCGkww, ACrCGnkkACGCGTTA712222384.49.87E-095490.75
AACCGCGT712243113.67.00E-087634.45
AACGCGTy712384933.61.13E-116332.84
AArCGCGT712548413.07.81E-137065.36

Position Bias of Top Motifs

Typically, promoter motifs show position bias towards the transcription start sites (TSS), a feature that has been used as supporting evidence for bona fide motifs in many studies [7], [15], [16], [17]. In our study, top motifs from most of the clusters also have their positions bias towards the TSS. For example, Figure 2A shows the distribution of the motif “rGTCAAmn" along the promoters in cluster_N0. The motif density is much higher in the region 0 to 200 base pair (bp) upstream of the TSS site than in other region. While the distribution plot depicted in Figure 2A is intuitive in showing the bias of individual motif, it is difficult to apply such plots to a large number of oligos at the same time. Therefore, we used a z-score based on the uniform distribution to mathematically describe such bias. We assume oligos without selection pressure (non-motifs) distribute evenly along the promoter without preference. Since all promoters used in this analysis are 1000 bp in length, the mean position for evenly distributed 8-bp oligos should be at position 497. For an oligo with n instances within the promoters, the variance of their means should be (9932−1)/n. Thus, for any 8-mer oligo with n instances within any selected promoters, a z-score is calculated as: where p is the average position location. For orientation, we describe p = 1 as the position at −1000 relative to TSS, while p = 993 is the position immediately adjacent to TSS. Evenly distributed oligos should have a z-score close to 0. A large positive z-score indicates the oligo has a higher occurrence in the region between 0 to −500 relative to the TSS than in the −500 to −1000 region. That is, it has its positions biased towards the TSS. The mean position and z-score for the top motif from most clusters are listed in Table S1. It is apparent that many of them have a large z-score, indicating a position bias of top motifs towards the TSS.

Identification of Major 8-mer Motifs from Different Clusters of Coregulated Stress Responsive Genes

In our analysis, we also observed a correlation between small pValues and positive z-scores for co-expressed stress responsive genes. Such correlations for genes in cluster_N0 are illustrated in Figure 2B. To determine the significance of this observed correlation among co-expressed genes, we analyzed randomly selected promoters with the same size cluster as N0. Not only was the number of oligos with small pValues much smaller, but the z-score for the oliogs also averaged around 0 without much bias (Figure 2C). This result met our expectations, since no motif should be enriched in these randomly selected promoters. To evaluate the false discovery rate (FDR) in our algorithm, we repeated the analysis on 100 groups of randomly selected promoters, with 50 promoters in each group. The number of oligos resulting from these analyses should be similar to the number of false positive oligos identified in an actual experiment. On average, each group had 1.76, 17.8, or 180 oligos (before position-based filter) with pValue smaller or equal to 1E-7, 1E-6, and 1E-5, respectively. These numbers are close to the expected value: 75,029,949 X pValue/4. Among the 180 oligos with pValue < = 1E-5, only 0.49 oligo has its TSS z-score larger than 3. Analysis on groups of promoters with size 200 or 500 returns slightly (∼10%) more oligos that we would consider to be false positive. Based on these results, we choose the cutoff values in our analysis to minimize the number of false discovered oligos while maximizing the number of bona fide motifs. We used two criteria for qualifying an oligo to be a potential motif. The first one is based on the pValue cut off, and the second one is based on the z-score cut off. For 8-mer oligos, we took all those with pValue <1E-7, and those with pValue between 1E-5 and 1E-7 and z-score >3 as potential motifs. It should be noted that, though the pValue and z-score show correlation, these are two different measurements and are independent of each other. Therefore, motifs qualifying for both criteria should have high confidence to be a true motif. Using these criteria, we identified all 8-mer motifs that are over-represented in various co-expressed gene clusters (Table 4 and Table S2). Among these, cluster_N11 had 2,089 oligos (before position-filter) with pValue <1E-7, while all others had more than 9,000 oligos fulfilling such criteria. Since the number of expected false positive oligos was around 2, the false discover rate would be less than 0.1%. In addition to the identification of known motifs, our analysis revealed 32 putative new motifs.
Table 4

Major 8-mer motifs identified for coexpressed genes from various clusters.

ClusterCluster SizeMotifIn clusterIn genomepValueMean positionz score for TSS factorSimilar to motifMotif sequences
Genes induced by pathogen elicitors and PAMPs
N19 217TTGACTTy10156261.63E-246245.06WBBOXPCWRKY1TTTGACy
217kGTCAAmn154163884.85E-115593.16WBBOXPCWRKY1TTTGACy
217 sCGTTkAn 9377971.50E-105983.42n/a
217 TCGAATTk 5134201.20E-085421.02n/a
217CACCwmCC3219411.13E-067124.38BOXLCOREDCPALACCwwCC
217 AGTCkTCG 2110324.13E-067013.17n/a
Genes induced by various biotic and abiotic stresses
N0 712rGTCAAmn515162891.57E-3861211.57WBBOXPCWRKY1TTTGACy
712AAAGTCww36599142.15E-346047.89
712GAAAwkTm470162281.51E-215676.05GAAATTTGAAATTT
712GAATwwTr437161501.03E-125272.2
712ACGCGkww10718748.93E-216475.44CGCGBOXATvCGCGb
712ACrCGnkk31092121.85E-206026.95
712 kACGACyn 17451271.37E-105703.25n/a
712 sACGCrCk 5713073.73E-076413.67n/a
712 wCACGynk 354121299.07E-145695n/a
712 AwAAAAGk 429158412.29E-125373.08n/a
712 AATTArTw 404147606.76E-125211.54n/a
712 ATAAwATA 392143282.84E-115211.53n/a
712AACAAAAA392147352.28E-095271.99ANAERO1CONSENSUSAAACAAA
712AwTCAAAG20665461.05E-095522.65T-box promoter motifACTTTG
712 sAAGACTw 15348997.10E-076034.72n/a
712 TGrCCGCs 264494.88E-066352.35n/a
N12 197mCGCGTnn8737608.59E-327097.95CGCGBOXATvCGCGb
197GCGCGTsm155651.46E-067233.08
197 rCGTGTnn 136119582.02E-216337.01n/a
197 CCGTGTnk 3220896.21E-076713.32n/a
197CCACGyGs3111974.29E-127204.42Agris_GBF1/2/3 BS in ADH1CCACGTGG
197rCCGACny5040772.97E-076383.62DRECRTCOREATrCCGAC
197wAATATCk9191771.93E-085441.56EVENINGATAAAATATCT
197 AAAArAGA 133159892.44E-085723.41n/a
197AwwTTGAC8787815.25E-085361.19WBOXATNPR1TTGAC
Genes downregulated during stress
N18 465nGGCCCAn30482872.14E-7785426.74UP1ATMSDGGCCCAwww
465mAGCCCAn22369642.67E-3980618.83SITEIIATCYTCTGGGCy
465AAACCCTr18156121.55E-3078614.54UP2ATMSDAAACCCTA
465 CCGGnnTn 222114621.50E-095714.28n/a
N14 302AGGGTTTw15257878.26E-4087018.41UP2ATMSDAAACCCTA
302 ArGCCCrT 10447092.78E-1982813.2n/a
302 nCCGGAnn 175126531.02E-126045.81n/a
302 rGCCCArw 11571412.54E-1176311.64n/a
302 GGTTsGGw 5927735.83E-106764.8n/a
302nCGrCGkn149105348.55E-115884.49CGACGOSAMY3CGACG
302 AGGsCTTm 3514319.86E-086523.17n/a
302 rGGTTTmw 165114944.66E-136819.33n/a
302 sGwTTTAs 12286212.11E-086275.36n/a
N2 1292 AwTGGsCy 41974517.03E-1870917.61n/a
1292nGsCCCAn522100891.11E-1568517.76UP1ATMSDGGCCCAwww
1292 CCkGTTTr 20636036.65E-096548.13n/a
1292 rnACGACr 39579408.71E-095816.09n/a
1292GATAAGnn627137462.89E-085959.33IBOXGATAAG
1292CTCACTsw19334054.43E-085442.04SORLIP5ATGAGTGAG
1292 rkGGACCn 26250814.73E-075553.16n/a
N5 820AwTGGGCy25961462.16E-2073815.86SITEIIATCYTCTGGGCy
820rrCCGTTr19641901.09E-196699.43MYBCOREATCYCB1AACGG
820GCGsGArm8616891.42E-106565.21E2F1OSPCNAGCGGGAAA
820AGwGwGwG344107732.48E-096098.52CTRMCAMV35STCTCTCTCT
820 AmCCGAAC 7918431.75E-066394.82n/a
820rGGsTTTw345112071.78E-0763710.24UP2ATMSDAAACCCTA

(Putative new motifs are marked bold).

(Putative new motifs are marked bold). Genes in cluster_N19 are induced by pathogen elicitors/effectors or PAMPs and have roles in defense. As expected, two W-Box like motifs “TTGACTTy" and “kGTCAAm" are highly over-represented in this cluster of co-expressed genes (Table 4). WRKY transcription factors have been shown to bind to the W-Box motif during defense response [18], [19]. Another motif in cluster_N19 with moderate pValue, which nevertheless showed strong position bias is “CACCwmCC" (Table 4). This motif is similar to the BOXLCOREDCPAL motif “ACCwwCC", which is the binding site for DcMYB1, a carrot MYB transcriptional activator of the DcPAL2 gene in response to elicitor treatments [20]. The BOXLCOREDCPAL motif itself is not over-represented. The genes in cluster_N19 with the “CACCwmCC" in their promoters are predicted to function in secondary metabolism processes, especially phenylpropanoid synthesis. This indicates a MYB regulated signaling pathway is activated upon PAMP treatment to synthesize metabolites that facilitate defense responses. Interestingly, we also identified a motif “sCGTTkAn" with position bias (Table 4) that has no similarity to any known motifs in the plant motifs databases Agris and PLACE [21], [22]. We hypothesize that a yet to be identified transcription factor class will bind to the “sCGTTkAn" motif and that this motif will most likely be involved in the regulation of elicitor/effector or PAMP induced gene expression. Genes in cluster_N0 are induced by biotic and abiotic stresses. The over-represented motifs “rGTCAAm" and “AAAGTCww" (Table 4) are related to the W-Box motif. By their presence they might explain the induction of W-Box containing genes by PAMPs. The motif “ACGCGkww" and “ACrCGnkk" are related to the CGCG motif with the core sequences as “ACGCGT". The Ca2+/calmodulin-binding transcription factor AtSR1 (also known as CAMTA3) has been shown to bind to the CGCG motif within the promoter of the EDS1 gene acting as a repressor of its expression [23]. EDS1, an important pathogen responsive gene [24], is involved in regulating the levels of the defense molecule salicylic acid (SA). EDS1 is included in the cluster_N0 gene set and the identification of “ACGCGkww" and “ACrCGnkk" motifs within the promoters of other cluster_N0 genes indicates that they may also be regulated by AtSR1 or related transcription factors, and that their functions might be EDS1-independent. Our analysis of genes in cluster_N0 also identified a motif “GAAAwkTm" (Table 4) that is related to the “GAAATTT" motif. “GAAATTT" is the binding motif for CBP60g and SARD1, two transcription factors with partially redundant role in SA signaling [25], [26]. Many of the CBP60g and SARD1 downstream genes are enriched in cluster_N0. In addition, AT rich motifs such as “AwAAAAGk", “AATTArTw", “AATAwATA", and “AACAAAAA" are also over-represented in genes of cluster_N0 together with potentially new motifs “kACGACyn", “sACGCrCk", “AwTCAAAG", and “TGrCCGCs" (Table 4). Genes in cluster_N12 are induced by a variety of abiotic or biotic stresses. These genes are often viewed as common stress responsive genes. In this cluster, motif “mCGCGT" and the related motifs “ACrCGy" and “rCrCGkmm" are highly represented (Table 4), all three of which are similar to the CGCG motif and the rapid stress responsive element “CGCGTT" [27]. They are also similar to the CM2 motif “CCGCGT" within the ZAT12 promoter, which is the binding site for transcription factor CAMTA3 [28]. ZAT12 is included in the cluster_N12 gene set. This indicates that the CGCG motif not only plays a role in SA-responsive gene expression but also in mediating general stress responsive gene expression. Other over-represented motifs in cluster N12 are “AAAArAGA", “GwCCGACk", “CCACkwGG", and “TAAGGCGk" (Table 4). Similarly, genes in cluster N11 are also induced by various stress, but to a lesser intensity, and the over-represented motifs in these genes include “mCGCGTnn", “rGTCAAAs", “GACTTTkn", and “CGTGTkwn" (Table S2). Promoters of genes from several other clusters used in this study are induced by the plant stress hormone abscisic acid (ABA) at early time points (N3) or late time points (N9, N10, N13), and also by other abiotic stresses. As reported earlier [13], the G-Box and related motifs were over-represented in these clusters: “GmCACGTr" in N3, “GmCACGTn" in N9 and N13, and “kmCACGTn" in N10 (Table S2). The G-Box motif “GmCACGTs" is also highly represented in cluster N1 genes that are induced by light. Interestingly, this G-Box motif in cluster_N1 shows the highest position bias among all G-Box motifs in various clusters (Table S2). Our analysis also identified many motifs from N18, N14, N2, and N5 cluster genes that are down-regulated during stress (Table 4). Cluster_N18 includes many genes encoding ribosomal proteins. Three motifs “nGGCCCAn", “mAGCCCAn", and “AAACCCTr" are over-represented in this cluster with highly-biased positioning (Table 4). Two of them show similarities to known motifs UP1ATMSD “GGCCCAwww" and UP2ATMSD “AAACCCTA" [29]. Cluster_N14 is enriched with genes encoding RNA helicases and these genes are slightly up-regulated by cold stress. Multiple motifs are identified in this cluster including “AGGGTTTw", “ArGCCCrT", “nCCGGAnn", “rGCCCArw", “GGTTsGGw", and “nCGrCGkn" (Table 4). Cluster_N5 contains many cell cycle related genes and its over-represented motifs includes “AwTGGGCy", “rrCCGTTr", “GCGsGArm", “AGwGwGwG", and “AmCCGAAC" (Table 4). Genes in cluster_N2 are highly down-regulated by pathogen stress and includes over represented motifs “AwTGGsCy", “nGsCCCAn", “CCkGTTTr", and the I-box motif “GATAAGnn" (Table 4). Interestingly, the above described clusters share the common motif, “GGCCCA". In their majority, the motifs described above are unique and have not been previously identified. These results indicate that yet to be identified transcription factors and signaling pathways may be responsible for regulating expression of genes in these clusters.

Identification of Over Represented 12-mer Motifs in Stress-responsive Gene Clusters

It has been reported that transcription factor binding motifs from eukaryotes are 6–10 bp long. Our 12-mer motif analysis supports this notion. By including letter “n" in the motifs, our algorithm revealed that the length of most core motifs is less than 9 bp (Table S3). For example, major 12-mer motifs identified from cluster_N18 are “nnTGGGCCnnnn", “nnAAGCCCAnnn", and “nnAAACCCTAnn" (Table S3), whose sequences are almost identical to the three major 8-mer motifs “nGGCCCAn", “mAGCCCAn", and “AAACCCTr" (Table 4). The trailing letter “n" at both ends of these motifs indicates the core sequences are 7 or 8 base pairs long. Similarly, major 12-mer motifs in cluster_N0 are “TTGACTTnnnnn", “GTCAACnnnnnn", “GTCAAAnAnnnn", and “AnCGCGTnnnnn" (Table S3). One exception is found among motifs identified from cluster_N7. Genes in this cluster are down-regulated by stresses and mainly expressed in xylem tissue of roots. The motif “CGTGnGnGGCAC" is over-represented in this cluster, with highly biased positioning towards TSS. Collectively from this analysis, we conclude that the analysis on 8-mer oligos captures most motifs, while the analysis on 12-mer complements and enforces the information gained from 8-mer oligo analysis and may recover a few additional long motifs.

Using MotifIndexer to Identify Over-represented Motifs from a Single Microarray Experimental Dataset

We tested if our algorithm could be used to identify over-represented motifs from dataset that are generated from single microarray experiments. The data were extracted from AtGenExpress and include Arabidopsis responses to the bacterial pathogen Pseudomonas syringae pv. tomato (Pst) expressing avrRpm1 effector. This analysis identified 461 genes that were up-regulated more than two-fold in Pst expressing avrRPM1 after 6 hours compared to the mock control (Dataset ME00331: http://www.arabidopsis.org/portals/expression/microarray/ATGenExpress.jsp). Our analysis of the promoters of these genes using MotifIndexer algorithm are listed in Table S4. This includes “rCGTGTnn", “ATATTwTA", “TCTAGAmr", “wrTTGACn", and “yATTCAAm" motifs. Not surprisingly, these motifs are similar to those found in cluster N0, N12, and N19 (Table 4), which includes majority of abiotic/biotic stresses responsive genes. Thus, these motifs can be expected to play important roles in plant responses to pathogens. The identification of these particular motifs unequivocally demonstrates our algorithm’s competence to realistically identify motifs in single datasets from microarray and/or RNA-Seq experiments.

Using MotifIndexer to Compare Promoters in Two Different Species of Arabidopsis

A. lyrata is a close relative of A. thaliana. The coding regions between the two genomes share ∼92% identity, while the promoter regions have ∼85% identity [30]. We hypothesized that these two species share similar co-expressed gene modules, and the corresponding motifs are conserved. We tested if MotifIndexer can be used to identify such conserved motifs by comparing these two genomes. For any given set of promoters from A. thaliana, we chose those have orthologous promoters in A. lyrata and subjected them to MotifIndexer analysis. The same procedure was carried out for the orthologous promoters in A. lyrata. For each 8-mer oligo, pValues from each species were obtained and the larger value of the two was assigned as a conserved pValue to that 8-mer oligo. The analysis was performed on groups of randomly chosen promoters, or on groups of promoters from co-expressed genes. For 100 groups of 30 randomly chosen promoters, there were on average 1,730 oligos with pValue < = 1E-04 in A. thaliana, while only 12 of them were conserved among the two species. Thus, 99% of random noises or false positive oligos can be removed by comparing the two genomes. And less than 1 oligo (0.45 in average) was left if the conserved pValue was set at pValue < = 1E-05. Similar results were obtained for 100 groups of 100 randomly chosen promoters. On the other hand, for promoters from cluster_N0, there were 164,215 oligos with pValue < = 1E-04 in A. thaliana, and 75,371 of them also have associated pValue < = 1E-04 in A. lyrata. Thus, 46% of the oligos are conserved among the two species. Similar results were also obtained for promoters in other clusters (data not shown). The conserved motifs in cluster_N0 identified via this procedure are listed in Table 5. Since much less false positive oligos were left, we set a cut off conserved pValue at 1E-05. The top 3 motifs, “rGTCAAmn", “GAAAwkTC", and “rmCGCGTw" are similar to the top motifs identified in analysis for A. thaliana alone. The other four motifs demonstrate the power of such comparative genome analysis. They share similar pValues and position bias distribution in both species, with conserved pValue between 1E-05 and 1E-08. Comparing with single species analysis, their pValues did not change, but the corresponding cut-off pValue was 2 orders larger. Thus, these motifs that would have been considered less significant in a single species analysis were more significant in a comparative genome analysis. This indicates comparative genome analysis using MotifIndexer can identify novel motifs.
Table 5

Motifs identified in cluster_N0 via comparison between A. thaliana and A. lyrata.

A. thaliana A. lyrata
MotifpValueMean Positionz score forTSS factorpValueMean Positionz score forTSS factorConserved pValue
rGTCAAmn1.64E-385545.232.91E-355575.382.91E-35
GAAAwkTC5.34E-226056.391.29E-175955.61.29E-17
rmCGCGTw7.78E-236134.072.60E-156284.082.60E-15
wCnACGAm1.67E-085613.92.02E-095734.741.67E-08
TTGAATwk1.72E-085674.762.64E-085473.32.64E-08
ACrCGCTn1.31E-075381.072.33E-075802.362.33E-07
CGkACGmC6.57E-06485−0.292.61E-06472−0.526.57E-06

Comparison of MotifIndexer with Weeder and Amadeus Word-based Motif Finders

We compared results obtained from MotifIndexer with two word-based motif finding software Weeder and Amadeus [10], [31]. Weeder had been shown to out-perform other motif discovery tools [3]. Using Weeder version 1.4.2, the top 15 potential motifs were identified from clusters_N0, _N19, and _N18. Similarly, motifs were also identified from these clusters using Amadeus V1.2 with default settings. The results were compared to those (8-mer motifs) from our MotifIndexer algorithm (Table 6). All three programs identified the same highest ranking motifs, e.g., the W-Box related motifs in cluster_N0 and _N19, and the “nGGCCCAn" motif from cluster_N18. Several lower ranking motifs were identified by MotifIndexer and Amadeus, but not by Weeder, such as the CGCG related motif “ACGCGkww" motif from cluster_N0, and the “TCGAATTk" motif from cluster_N19. There were several motifs only identified by MotifIndexer, such as the “CACCwmCC" motif from cluster_N19, and the “GAAAwkTm" motif from cluster_N0 that are known to play a role in SA related defense response [25], [26]. On the other hand, Amadeus also recovered several distinct motifs from cluster_N0 and N18, most of which did not have position bias in the promoters. At the same time, Weeder identified several CG rich motifs from cluster_N0 and _N18 that were not identified in MotifIndexer, but they only exist in a small number of promoters in both clusters (data not shown). These results indicate that, for MotifIndexer and Amadeus, while both identify more motifs than Weeder, they also recovered distinct motifs by themselves.
Table 6

Comparison of 8-mer motifs identified from MotifIndexer vs Weeder and Amadeus.

ClusterMotif identified by MotifIndexerMotifs identified by WeederMotifs identified by Amadeus (**)
N0rGTCAAmn, AAAGTCwwTTGACT, TTGACTTT, GTTGAC, GACTTT, GACTTTTC, TTGACC, TGACTT, CGTTGACT, TGACTACwwrGTCAAm
GAAAwkTm, GAATwwTr (*)
ACGCGkww, ACrCGnkkACGCGkTTw
kACGACyn, sACGCrCk, wCACGynk (*)
AwAAAAGk, AATTArTw, ATAAwATA, AACAAAAA, AwTCAAAG (*)
sAAGACTw (*)
kACTTTTTmA, mrvACkTTTA, TATTdCAATw, AmTwAwTTGC (*)
GGCGTACGCG, CGCGGCCAGG, ACGCGCGT, GGGCCGCC, AGGGCGGCCT, GGACGCCC, GCTGCCCCCG, GGCTGCCGCG (*)
N19TTGACTTy, kGTCAAmnGTTGAC, GTTGACTT, CGTTGACTTT, TTGACT, CGTTGACT, GTTGACTTTT, TGACTT, TTGACTTT, GTTGACTTTG, GACTTT, GCGTTGAC, GCGTTGACTT, TTGACC, AGTTGACT, AGCGTTGACT, ATTGAC, GGTTGACT, AGTTGACTTT AdrGTCAAAb
sCGTTkAn (*)
TCGAATTkwhTCGAAkTT
CACCwmCC (*)
AGTCkTCG (*)
krAnAATTsA (*)
N18nGGCCCAnTGGGCC, AGGCCC, CAGGCCCA, CTGGGCCT, AGGCCCAT, CGGCCCAG,mrGCCCA
mAGCCCAnGCCCAT, TTGGGC, AAGCCCmrGCCCA
AAACCCTrAAACCCTAr
CCGGnnTn (*)
GCGCCAGGGC, CCGCAGGGGC, AAGGCCCG, GCGCTTGCGC, CAGGCGCTGC, AGCCCGGGGC (*)
ArCrrkAGTw, mArCGrCATC (*)

(*)denotes motif identified by a single program.

(**)Amadeus represents motifs as position weight matrix. For easy comparison, they are transformed into oligo formats.

(*)denotes motif identified by a single program. (**)Amadeus represents motifs as position weight matrix. For easy comparison, they are transformed into oligo formats.

Genome Wide Discovery of Position Biased Motifs in the Arabidopsis and the Rice Genomes

The search for motifs with position bias towards TSS can also be applied to all promoters in a genome. To this end, we applied our algorithm to both the Arabidopsis and the Rice genomes. For Arabidopsis, we included the promoters from 17,461 genes that have a 5′ untranslated regions (UTR) of at least 50 bp in length. The promoters chosen extended 1000 bp upstream of the 5′ UTR. For rice, we included 22,493 genes’ promoters with the same criteria. For both Arabidopsis and rice, we calculated the z-score for 8-mer motifs with less than or equal to 2 base pairs as degenerative, with the degenerative bases limited to r, y, s, w, or n. This analysis resulted in 28 oligos with obvious position bias in both the Arabidopsis and the rice genome (Table 7). Of these 28 oligos, 19 bear similarities to or share binding sites with known motifs. Among these known motifs, some functions in house-keeping gene expression such as TATABOX1 and UP1ATMSD, while others function in stress responses such as ABFs and TGA1 binding sites. The other 9 oligos do not have similarity to known motifs from the databases of PLACE and Agris. However, because their position biases are conserved among both the Arabidopsis and the Rice genome, we hypothesize that they are novel potential motifs that are regulated during yet to be identified biological pathways.
Table 7

Motifs with shared position bias between Rice and Arabidopsis.

ArabidopsisRice
MotifSimlar toSimilar to MotifInstancesMean positionz-scoreInstancesMean positionz-score
GGCCCAnnPlace_UP1ATMSDGGCCCAwww762568958.581985666783.652
AGCCCAnnPlace_SITEIIATCYTCTGGGCy764663341.3931230760441.213
CACGyGnCAgris_ABFs binding siteCACGTGGC279065128.288555160929.027
TCTCTCTyPlace_CTRMCAMV35STCTCTCTCT724358526.0141003358731.265
GGsTTTTnAgris_TELO-box promoterAAACCCTAA829256621.993773357423.745
AAACCGnn n/a1021056122.3791196055020.223
TGACGyGnAgris_TGA1 binding siteTGACGTGG195461217.645502457018.142
CCGrnCCG n/a138662416.445429459923.201
AAAAsGCsPlace_CDA1ATCAB2CAAAACGC182860215.667231759416.209
CTATAAAwPlace_TATABOX1 CTATAAATAC 463656015.055585655515.592
TAAAsCCnPlace_UP2ATMSDAAACCCTA553155514.933413156815.952
AnnCGACGPlace_CGACGOSAMY3CGACG369756314.086780255818.743
GCGCGnGnPlace_CGCGBOXATvCGCGb103362113.9371169657429.001
sCCGTTTn n/a218358113.655396656114.155
GnCACGTwAgris_ACE promoter GACACGTAGA 206958213.415400755111.984
GnCCGTTrPlace_MSACRCYMAGACCGTTG146658511.785254657613.926
TAAATAssPlace_TATABOX1 CTATAAATAC 282756011.645323855511.418
GGwCCCACPlace_SITEIIBOSPCNATGGTCCCAC43164210.471241359015.999
GCCTwTAn n/a255055610.313348056213.361
CsGTyCGA n/a8225939.547179361617.525
CGCGTTwA n/a3186066.7943606117.508
GyGGGGTs n/a3265976.286242359617.031
CCGACCsAPlace_DRE2COREZMRAB17ACCGAC3265936.0790559410.126
CGGGTCAA n/a2836006.0613095996.255
TGACGTCAPlace_PALINDROMICCBOXGMTGACGTCA2595965.5562705935.514
ACCCrCCCPlace_ACIPVPAL2 CCCACCTACC 2285955.161132364218.419
TGGGGCCw n/a2005904.592103159010.357
AGCGrGCCPlace_BS1EGCCRAGCGGG1246074.26178060410.397

(Putative new motifs are marked in bold).

(Putative new motifs are marked in bold). In addition to identifying motifs that show strong position bias in both Arabidopsis and rice, we identified motifs with position bias in only one species, either Arabidopsis (Table S5) or rice (Table S6). In Arabidopsis, 12 out of the 28 biased motifs have shared binding sites with known plant motifs, while in rice 21 out of 70 motifs are known. The remaining motifs are potential novel motifs with possible monocot or dicot distinguishing differences, although the confidence values for these motifs are less pronounced when compared to the candidate motifs shared by both species. However, comparison with other plant species should improve the confidence value.

Discussion

Here, we describe an algorithm that identifies all over-represented 8-mer oligo in groups of promoters that meet a predetermined pValue cut off, except for those oligos with wobble bases representing three nucleotides. It should be noted that wobble bases representing three nucleotides are rarely used for motif discovery [21], [22]. The motif discovery method described here is based on counting oligos (words) without allowing for mismatches. This is distinct from other methods such as Weeder that includes mismatches [10]. We find our method to be more suitable in part because it has been shown that mutations in one or two critical bases in a motif can abolish its binding affinity to transcription factors [18], [23], [32]. In addition, our method expands on and appears superior to a previously described exact counting method [7] because it greatly expands the oligo coverage in terms of degeneration. Comparison of our MotifIndexer with Weeder and Amadeus, we found our program performs similarly to Amadeus [31], while both recovered distinct motifs. As a measurement for over-representation of oligos, a pValue was calculated for every oligo based on the hypergeometric distribution [33]. In the 8-mer motif analysis, only 75,029,949 oligos that exist in less than half of the genome’s promoters were considered. Among these oligos, we expected that ∼50% are present in a frequency larger than expected, or over-represented, in any cluster. Only these 50% oligos were used for pValue calculation. From this we further removed reverse complemented counterparts. Thus, the expected number of false positive oligos is 75,029,949×pValue/4. Our analysis on groups of randomly selected promoters fits well with this expectation. Based on this, we chose pValue cut-off of 1E-07 for our analysis, thus limiting the total number of false positive oligos to 2 in average. This cut-off worked well for co-expressed gene clusters with large number of genes. The clusters used in our analysis usually have more than 100 genes and many motifs with pValues smaller than 1E-07 were identified. However, for smaller clusters, i.e. those with 20–50 genes, true motifs might have their pValue larger than 1E-07, and a larger pValue cut-off is needed. In this case, comparative genomics can be used to further remove false positive oligos. A. lyrata is a close relative of A. thaliana. The homologous promoters from these two species share similarity levels between 75% and 95%. Under these similarity levels, the true motifs are expected to be conserved between the two species, while the false positive oligos are not. Since MotifIndexer calculated a pValue for every possible 8-mer motif, it can be easily adapted for comparative genomic analysis, as shown for cluster_N0 promoters (see Table 5). By comparing two genomes, the pValue cut-off can be lowered to 1E-05, and more conserved motifs will be recovered. We also found that our MotifIndexer program can be used for motif identification based on gene co-expression networks (data not shown). This indicates the versatility of MotifIndexer, a distinct advantage when compared to other motif finding programs. Our method uses a z-score to describe oligos position bias within the selected promoters as another independent measurement of likelihood of identifying a bona fide motif. Previously the position bias for individual motifs has been depicted by a plot of motif density vs. relative position to the TSS [4]. In several studies, position bias was also measured by dividing promoters into several windows of predetermined width and then then searching for over-represented motifs within these bins [15], [34]. Recently, Yokoyama et al. developed a motif-positional function to measure spatial preferences at fine-scale resolution [17]. Our z-score method based on uniform distribution provides an alternative way to describe motif position bias at fine-scale resolution. It can be easily applied on a large number of oligos at the same time. By combining pValues and z-scores, our method provides a simple but powerful scoring system for the oligos. By applying this scoring system to randomly selected groups of promoters, cut-off values for bona fide motifs can be easily determined (see Figure 2). However, if a motif has position bias in the middle of the promoters, such as around −500 bp in promoters of 1,000 bp long, our program will not identify such bias. We applied a position based filtering system to pool major motifs with their related degenerate motifs, which enabled us to differentiate motifs occurring at unique binding sites. By applying this scoring and filtering system, we identified many known/unknown motifs with pValue less than 1E-7, as well as some motifs with moderate pValue but obvious position bias. Some of the previously not reported motifs bear similarities to known motifs, increasing confidence of them being true motifs. We do recognize that the filtering method applied here might mask motifs sharing partial binding sites. Our analysis with 8-mer oligos guaranteed that those motifs with lowest pValue were found. The subsequent analysis on 12-mer oligos confirmed the length of core motifs to be less than 9 base pair. Besides core motifs, flanking sequences might also be important to determine binding specificity, especially specificity within the same family of transcription factors, i.e. the WRKY transcription factors [18]. Once the core motif sequences are determined, such flanking sequences can be determined by analyzing nucleotide composition for base pairs around the core motif sequences (data not shown). Our analysis can also be modified to study motifs in which interior spaces are inserted, such as “nnnn". However, a trial run did not reveal any meaningful motifs. We also used position bias to identify potential motifs on genome wide scale using comparative genomics. The examples shown in Table 7 indicate this is a promising tool. Previous applications of comparative genomics for motif finding have been mostly limited to comparing homologous genes of different genomes [12], [35]. Our method is not reliant on homologous gene comparison and thus expands our discovery scope. It is yet to be explored if their close relative species share similar motif bias or not. In conclusion, we describe a de novo motif discovery method which needs no parameter input and are suitable for the identification of over-represented motifs from large-scale transcriptome datasets. We name this method as MotifIndexer. Currently, we are probing our Arabidopsis protein microarrays containing transcription factors [36], [37] to identify factors that bind to novel and known motifs identified in this analysis.

Materials and Methods

Clustering Data and Promoter Sequences

The clustering data for co-expressed gene groups were taken from a previous study [13]. In that study, the AtGenExpress global stress data set was analyzed via fuzzy-k means clustering method [14], [38]. Twenty-two major clusters identified from that analysis were extracted and used in this analysis. Arabidopsis promoter dataset was downloaded from TAIR (ftp://ftp.arabidopsis.org/Sequences/blast_datasets/TAIR9_blastsets/TAIR9_upstream_1000_20090619). The dataset contains the upstream 1000 base pairs for 33,518 genes. The rice upstream sequences were downloaded from the Rice Annotation Project Database (http://rapdb.dna.affrc.go.jp/). The IRGSP/RAP build 5 dataset was used to extract the 1000 base pair sequences upstream of 5′UTR from those genes with 5′UTR longer than 50 bp.

Indexing Genome and Oligo pValue Calculation

The procedure to catalog the number of promoters containing specific 8-mer or 12-mer oligos is illustrated in Figure 1. We used algorithms written in perl and C++ to carry out the cataloging process. The algorithms have been tested in Linux and Windows system, which can scan ∼18 promoters (1000 bp in length) per minute with an Intel® Core™ i5 CPU M540 @2.53 GHz. 1 G memory is required for the calculation. The MotifIndexer algorithm will be provided through our website (http://dinesh-kumarlab.genomecenter.ucdavis.edu/downloads.html) and upon request for academic use. For pValue calculation of oligos, we only consider those presented in less than half of genome’s promoters, which includes 75029949 8-mer oligos and 242575400 12-mer oligos. Suppose in a group of selected genes with M promoters in total, an oligo presents in m promoters among them. And within the K promoters in the whole genome, the oligo presents in k promoters. If m is larger than the expected value, that is m > k *M/K, a pValue is calculated as:where a is the largest integer between 0 and k *M/K that fulfills the requirement: After pValues were calculated for all the over-represented oligos, they were sorted by their pValues. And for any pair of reverse complemented oligos, only one oligo is retained. Then, the selected promoters were scanned for the presence of the oligos to calculate their average position and z-score for TSS. Finally, major motifs were picked as described in the result section. The process was carried out on selected co-expressed gene clusters, as well as randomly selected promoter groups.

Promoter Comparison between A. thaliana and A. lyrata

The genome sequence of A. lyrata was downloaded from http://genome.jgi-psf.org/Araly1/Araly1.download.ftp.html. The protein sequences of A. lyrata (Filtered Models6) were blasted against the A. thaliana protein sequences, and homologous gene pairs were selected based on these criteria: a, having at least 80% identities in amino acid sequences; b, the length of the A. lyrata protein is between 0.8 to 1.2 fold of that of the A. thaliana; c, potential duplicated genes in A. lyrata were filtered out. In total, 19,938 genes from A. lyrata were left and their promoters were extracted as 1,000 base pairs up stream of the translation start codon, and the same was done for A. thaliana. The analysis was carried out as described in the result section. Top motifs have position bias. (DOC) Click here for additional data file. Major 8-mer motifs identified for coexpressed genes from various clusters. (DOC) Click here for additional data file. Major 12-mer motifs over-represented in stress responsive genes promoters. (DOC) Click here for additional data file. Major motifs identified from Arabidopsis genes induced by pv. tomato ( ) (avrRpm1) – a case study on a single microarray experiment. (DOC) Click here for additional data file. Motifs with position bias only in Arabidopsis. (DOC) Click here for additional data file. Motifs with position bias only in Rice. (DOC) Click here for additional data file.
  38 in total

1.  Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae.

Authors:  J D Hughes; P W Estep; S Tavazoie; G M Church
Journal:  J Mol Biol       Date:  2000-03-10       Impact factor: 5.469

2.  YMF: A program for discovery of novel transcription factor binding sites by statistical overrepresentation.

Authors:  Saurabh Sinha; Martin Tompa
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

3.  Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes.

Authors:  Giulio Pavesi; Paolo Mereghetti; Giancarlo Mauri; Graziano Pesole
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

4.  Plant cis-acting regulatory DNA elements (PLACE) database: 1999.

Authors:  K Higo; Y Ugawa; M Iwamoto; T Korenaga
Journal:  Nucleic Acids Res       Date:  1999-01-01       Impact factor: 16.971

5.  CBP60g and SARD1 play partially redundant critical roles in salicylic acid signaling.

Authors:  Lin Wang; Kenichi Tsuda; William Truman; Masanao Sato; Le V Nguyen; Fumiaki Katagiri; Jane Glazebrook
Journal:  Plant J       Date:  2011-07-06       Impact factor: 6.417

6.  Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies.

Authors:  J van Helden; B André; J Collado-Vides
Journal:  J Mol Biol       Date:  1998-09-04       Impact factor: 5.469

7.  Finding functional features in Saccharomyces genomes by phylogenetic footprinting.

Authors:  Paul Cliften; Priya Sudarsanam; Ashwin Desikan; Lucinda Fulton; Bob Fulton; John Majors; Robert Waterston; Barak A Cohen; Mark Johnston
Journal:  Science       Date:  2003-05-29       Impact factor: 47.728

8.  Eukaryotic regulatory element conservation analysis and identification using comparative genomics.

Authors:  Yueyi Liu; X Shirley Liu; Liping Wei; Russ B Altman; Serafim Batzoglou
Journal:  Genome Res       Date:  2004-03       Impact factor: 9.043

9.  Homeodomain protein ATHB6 is a target of the protein phosphatase ABI1 and regulates hormone responses in Arabidopsis.

Authors:  Axel Himmelbach; Thomas Hoffmann; Martin Leube; Beat Höhener; Erwin Grill
Journal:  EMBO J       Date:  2002-06-17       Impact factor: 11.598

10.  Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering.

Authors:  Audrey P Gasch; Michael B Eisen
Journal:  Genome Biol       Date:  2002-10-10       Impact factor: 13.583

View more
  12 in total

1.  Juvenile hormone and its receptor, methoprene-tolerant, control the dynamics of mosquito gene expression.

Authors:  Zhen Zou; Tusar T Saha; Sourav Roy; Sang Woon Shin; Tyler W H Backman; Thomas Girke; Kevin P White; Alexander S Raikhel
Journal:  Proc Natl Acad Sci U S A       Date:  2013-04-30       Impact factor: 11.205

Review 2.  Emerging tools, concepts and ideas to track the modulator genes underlying plant drought adaptive traits: An overview.

Authors:  Parvathi Ms; Karaba N Nataraja
Journal:  Plant Signal Behav       Date:  2016

3.  Arabidopsis RIBOSOMAL RNA PROCESSING7 Is Required for 18S rRNA Maturation.

Authors:  Rosa Micol-Ponce; Raquel Sarmiento-Mañús; Alejandro Ruiz-Bayón; Charlotte Montacié; Julio Sáez-Vasquez; María Rosa Ponce
Journal:  Plant Cell       Date:  2018-10-25       Impact factor: 11.277

4.  Reverse engineering: a key component of systems biology to unravel global abiotic stress cross-talk.

Authors:  Swetlana Friedel; Björn Usadel; Nicolaus von Wirén; Nese Sreenivasulu
Journal:  Front Plant Sci       Date:  2012-12-31       Impact factor: 5.753

5.  Integrative analysis of the shikonin metabolic network identifies new gene connections and reveals evolutionary insight into shikonin biosynthesis.

Authors:  Thiti Suttiyut; Robert P Auber; Manoj Ghaste; Cade N Kane; Scott A M McAdam; Jennifer H Wisecaver; Joshua R Widhalm
Journal:  Hortic Res       Date:  2022-01-20       Impact factor: 7.291

6.  W-box and G-box elements play important roles in early senescence of rice flag leaf.

Authors:  Li Liu; Wei Xu; Xuesong Hu; Haoju Liu; Yongjun Lin
Journal:  Sci Rep       Date:  2016-02-11       Impact factor: 4.379

7.  Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways.

Authors:  Shisong Ma; Smit Shah; Hans J Bohnert; Michael Snyder; Savithramma P Dinesh-Kumar
Journal:  PLoS Genet       Date:  2013-10-03       Impact factor: 5.917

8.  'In silico expression analysis', a novel PathoPlant web tool to identify abiotic and biotic stress conditions associated with specific cis-regulatory sequences.

Authors:  Julio C Bolívar; Fabian Machens; Yuri Brill; Artyom Romanov; Lorenz Bülow; Reinhard Hehl
Journal:  Database (Oxford)       Date:  2014-04-10       Impact factor: 3.451

9.  Identification of functional cis-regulatory elements by sequential enrichment from a randomized synthetic DNA library.

Authors:  Mario Roccaro; Nahal Ahmadinejad; Thomas Colby; Imre E Somssich
Journal:  BMC Plant Biol       Date:  2013-10-18       Impact factor: 4.215

10.  Salt stress and senescence: identification of cross-talk regulatory components.

Authors:  Annapurna Devi Allu; Aleksandra Maria Soja; Anhui Wu; Jedrzej Szymanski; Salma Balazadeh
Journal:  J Exp Bot       Date:  2014-05-06       Impact factor: 6.992

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.