Prediction of bipartite transcriptional regulatory elements using transcriptome data of Arabidopsis.

Yoshiharu Y Yamamoto1,2,3,4, Hiroyuki Ichida5, Ayaka Hieno2, Daichi Obata1, Mutsutomo Tokizawa2, Mika Nomoto6, Yasuomi Tada6, Kazutaka Kusunoki2, Hiroyuki Koyama1,2, Natsuki Hayami2.   

Abstract

In our previous study, a methodology was established to predict transcriptional regulatory elements in promoter sequences using transcriptome data based on a frequency comparison of octamers. Some transcription factors, including the NAC family, cannot be covered by this method because their binding sequences have non-specific spacers in the middle of the two binding sites. In order to remove this blind spot in promoter prediction, we have extended our analysis by including bipartite octamers that are composed of '4 bases-a spacer with a flexible length-4 bases'. 8,044 pre-selected bipartite octamers, which had an overrepresentation of specific spacer lengths in promoter sequences and sequences related to core elements removed, were subjected to frequency comparison analysis. Prediction of ER stress-responsive elements in the BiP/BiPL promoter and an ANAC017 target sequence resulted in precise detection of true positives, judged by functional analyses of a reported article and our own in vitro protein-DNA binding assays. These results demonstrate that incorporation of bipartite octamers with continuous ones improves promoter prediction significantly.
© The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

Keywords:  plant genome; promoter prediction; transcriptome

Year:  2017        PMID: 28158431      PMCID: PMC5499772          DOI: 10.1093/dnares/dsw065

Source DB:  PubMed          Journal:  DNA Res        ISSN: 1340-2838            Impact factor:   4.458


Introduction

The availability of large-scale gene expression data from microarray and RNA-Seq analyses has made it possible to predict transcriptional regulatory elements in promoters using these data. For such predictions, methods for extraction of consensus sequences from a set of promoter sequences have been used (e.g. Gibbs Sampler and MEME). As the accuracy of these established methods is not sufficient for experimental validation, we have developed a novel, more accurate method based on a simple frequency comparison. The new method shows considerably higher accuracy and sensitivity than conventional methods, judged by re-prediction of experimentally identified regulatory elements. In addition, our recent studies demonstrated that it is also useful for predicting novel regulatory elements, and it has paved the way for prediction-oriented promoter analysis. Our prediction method compares the frequency of every octamer between focused and global promoter sets. This octamer-based methodology has worked well for recognition sites of many types of transcription factors, including the bZIP, AP2/DREB, zinc finger, CAMTA, and PCF families. However, it is theoretically ill-suited to the detection of spacer-containing bipartite motifs that are recognized by protein complexes, including dimers. As a consequence, these promoter elements are considered a blind spot of our octamer-based prediction method. There are several databases for detection of transcription factor-binding sites in the promoter region. Among them, JASPAR is the most popular public database that covers higher plants. Detection of promoter elements is based on DNA motifs for each transcription factor protein, and in the case of Arabidopsis, for which the most data have been collected among higher plants, motifs for 48 factors are available out of ∼1,900 transcription factors in the genome. The 48 factors include bipartite motifs for several MADS family proteins and an AP2 family protein, ANT. Obviously, the coverage is low for both continuous and bipartite motifs. In this report, we have developed a supplemental method for promoter prediction that compensates for this blind spot by incorporating bipartite octamers (4 + 4 bases), which contain a spacer sequence in the middle, into the predictions. Our results using the pre-selected bipartite octamers showed improved prediction of the ER stress-responsive element (ERSE) as an element for tunicamycin-activated ER stress, and of ANAC017 target sites in promoters of the downstream genes of ANAC017.

Materials and methods

2.1. Bioinformatics analysis

24,956 Arabidopsis promoter sequences of 1 kb in length, starting from the most major transcription start site (TSS), were prepared previously. Counting of each bipartite octamer, statistical analyses, and LDSS (local distribution of short sequences) analysis were performed with home-made Perl and C++ programs and Excel (Microsoft Japan, Tokyo). Distribution profiles of bipartite octamers were smoothed with a bin of 21 bp, which is a good width for regulatory sequences (REG), clustered with Cluster by the hierarchical method with correlation measurement, and visualized with TreeView as described previously. Prediction of transcriptional regulatory elements based on microarray data with the aid of a list of bipartite octamers was performed essentially according to our previous report using modified software. Motifs of clustered sequences were expressed using WebLogo. For prediction of ERSEs, 162 genes with a fold-change of over 2.5 were selected from up-regulated genes (E-MEXP-3186 from ArrayExpress, https://www.ebi.ac.uk/arrayexpress/ (11 January 2017, date last accessed)) as a positive promoter group. Microarray data of ANAC017 mutants were also retrieved from ArrayExpress (E-GEOD-41136).

2.2. DNA–protein binding analysis

Complementary DNA for ANAC017 was prepared by reverse transcriptase-polymerase chain reaction (RT-PCR) from total RNA of Arabidopsis seedlings using a conventional method as described elsewhere. The coding sequence of the FLAG tag and a T7 promoter were added by PCR to the ANAC017 CDS excluding the transmembrane domain (1–523 aa). The PCR product was subjected to in vitro transcription by T7 RNA polymerase and subsequent in vitro translation by a wheat germ system as described previously. Binding assays of the FLAG-tagged ANAC017 protein and biotinylated oligo-DNA probes were performed using AlphaScreen (Perkin Elmer Japan, Tokyo) as described. Sequences of a biotinylated DNA probe and non-biotinylated competitors are shown in Fig. 6.
Figure 6

Direct binding of ANAC017 to a bipartite octamer predicted for H2O2 response. (A) The predicted target site of ANAC017 in an ATERF71 (AT2G47520) promoter was used as a biotinylated DNA probe in a DNA–protein binding assay by AlphaScreen. Underlining shows the predicted target site identified by the bipartite analysis. The sequences of unbiotinylated competitor DNAs with base substitutions from the biotinylated probe are shown (M0–M8, unsubstituted bases are shown as dots). Asterisks on the top of the probe sequence indicate critical bases for binding by ANAC017 protein, revealed by Panel B. (B) Results of the AlphaScreen assay are shown as the relative AlphaScreen signal (the signal obtained with non-biotinylated probe/signal with biotinylated probe). The average and standard error of four assays are shown.

Results and discussion

3.1. Pre-selection of potential regulatory sequences

In our previous study, promoter prediction using microarray data based on enumerative octamer analysis was developed. The degree of overrepresentation of each octamer among a set of promoter sequences showing some transcriptional response, relative to the total promoters in a genome, is calculated as the ‘Relative Appearance Ratio (RAR)’. When predicting a specific promoter, an octamer is taken from the 5ʹ end of the promoter, and the RAR value for that octamer is assigned to the corresponding promoter position. Repeating these steps while sliding 1 bp at a time towards the 3ʹ direction provides a promoter scan with the RAR. Peak positions in the scanned results are predicted regulatory sites. We extended this procedure using a new type of bipartite octamer composed of ‘4 bases + a spacer + 4 bases’. In this article, we will refer to these octamers as ‘bipartite octamers (4 + 4)’. As a pilot analysis, we surveyed 12 lengths of the spacer, from 1 to 12 bases long. Analysis of all the possible bipartite octamers with 12 spacer lengths gives 12 RAR values for each promoter point. While trials of this prediction method for cold- and high light-stress responsive elements detected novel promoter sequences that could not be detected by the previous octamer analysis without spacers, they also gave us highly complex results, making their analysis difficult (data not shown). Therefore, we decided to pre-select potential regulatory sequences before carrying out frequency analysis. For this purpose, we prioritized high coverage over accuracy in the sequence selection, because an accurate, and thus small, selection at this point results in low sensitivity of prediction. Accuracy can be increased at the subsequent prediction steps. During the analysis mentioned above, we noticed an overrepresentation of a specific spacer length over the others when the octamer (4 + 4) sequence is fixed. Fig. 1 shows examples of the relationship between the spacer length and the counts in the total 24,956 promoters of the Arabidopsis genome. In the case of AACGnTCGT (n = 0–12), a spacer of three appears ∼4 times more frequently than the other spacer lengths. The probability of this count profile, represented by the peak and total counts, under the assumption of random occurrence is very low (P = 2.2E-195), suggesting a strong selection pressure towards this biased profile. Interestingly, the count profile of its complementary sequence (ACGAnCGTT) shows the same characteristics, with a spacer of three giving a single high peak.
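The count profile described above can be illustrated with a minimal Python sketch. It is not the authors' Perl/C implementation; the function names and the toy sequences are hypothetical, and it assumes that occurrences are counted on the given strand only and that overlapping occurrences are all counted.

```python
import re

def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def spacer_count_profile(promoters, left, right, max_spacer=12):
    """Count left + N{n} + right in all promoters for n = 0..max_spacer."""
    counts = []
    for n in range(max_spacer + 1):
        # A lookahead is used so that overlapping occurrences are all counted.
        pattern = re.compile("(?=%s[ACGT]{%d}%s)" % (left, n, right))
        counts.append(sum(len(pattern.findall(p)) for p in promoters))
    return counts

# Toy sequences (hypothetical, not taken from the Arabidopsis promoter set).
toy = ["AACGTTTTCGT", "GGAACGAAATCGTCC", "AACGTCGT"]
profile = spacer_count_profile(toy, "AACG", "TCGT")
print(profile)                         # a peak at n = 3 in this toy set
print(reverse_complement("AACGTCGT"))  # ACGACGTT, i.e. ACGA + CGTT
```

Running the same function with the tetramers of the complementary octamer (ACGA and CGTT) gives the profile for the reverse orientation, which is how the two panels of such a comparison could be produced.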
Figure 1

Overrepresentation of a specific length of spacer in promoter sequences. Counts of a bipartite-type octamer with a spacer in all the promoters of the Arabidopsis genome are shown. The horizontal axis indicates the number of ‘n’s, i.e. the length of the spacer between the two tetramers. The two octamers shown (AACGnTCGT and ACGAnCGTT) are mutually complementary.

Preferential appearance in the promoter sequence and conservation of characteristics between forward and reverse sequences may suggest that they are transcriptional regulatory sequences. Therefore, we decided to select bipartite octamers that have a preference for the spacer length among the promoter sequences. With spacer lengths from 0 to 12, the number of possible bipartite octamers (4 + 4) is 4^8 × 13 = 65,536 × 13 (Table 1). First, for each octamer, the one spacer length from 0 to 12 bp with the highest count among the promoter sequences was selected, extracting 65,536 sequences from 65,536 × 13. Second, sequences with spacers of 0 and 1 were removed from the 65,536 sequences because they were thought to be detected by the established analysis using octamers without a spacer. Then, the P value of the count profile with 12 spacer lengths for each bipartite octamer, as shown in Fig. 1, was considered, focusing on the peak height and the total count for 13 spacers, and 9,022 octamers with P < 1E-5 were selected.
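The selection steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the exact statistic used by the authors is not given here beyond a Chi-square test on the peak height and total count, so the sketch assumes a one-degree-of-freedom Chi-square comparison of the peak count against a uniform share of the total, and it treats "twin peaks" as a tie for the highest count. The function names are hypothetical.

```python
import math

N_OCTAMERS = 4 ** 8  # 65,536 possible 4 + 4 base combinations

def peak_dominance_p(counts):
    """P value (Chi-square, df = 1) for peak vs. rest under a uniform model.

    Assumption: every spacer length is expected to receive total / k counts.
    """
    total = sum(counts)
    if total == 0:
        return 1.0
    k = len(counts)
    peak = max(counts)
    exp_peak = total / k
    exp_rest = total - exp_peak
    chi2 = ((peak - exp_peak) ** 2 / exp_peak
            + ((total - peak) - exp_rest) ** 2 / exp_rest)
    # Survival function of the Chi-square distribution with df = 1.
    return math.erfc(math.sqrt(chi2 / 2.0))

def select_peak_spacer(counts, alpha=1e-5):
    """Return the selected spacer length, or None if the octamer is rejected."""
    peak = max(counts)
    peaks = [n for n, c in enumerate(counts) if c == peak]
    if sum(counts) == 0 or len(peaks) != 1:   # no data, or tied ("twin") peaks
        return None
    if peaks[0] in (0, 1):                    # covered by continuous octamers
        return None
    if peak_dominance_p(counts) >= alpha:     # peak not dominant enough
        return None
    return peaks[0]

print(select_peak_spacer([10, 10, 10, 400] + [10] * 9))  # 3
print(select_peak_spacer([10] * 13))                     # None
```

The threshold `alpha=1e-5` mirrors the P < 1E-5 cut-off in the text; the uniform expectation is a simplification, and a different null model would change which octamers pass.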
Table 1

Selection of bipartite octamers

Selection                                 Number of octamers
Total (0 ≤ n ≤ 12)                        65,536 × 13
Peak space                                65,536
Peak spacing: n ≠ 0, 1 & single peak      41,158
Dominance of peak space (P < E-5)         9,022
LDSS selection
 TATA or core                             903
 Solid type                               75
 Rest (REG & non-REG)                     8,044

Of the 13 spacing lengths, one with the highest appearance in the promoters was selected (Peak space), and octamers with spacing lengths of 0 or 1, and octamers with the two spacings of highest appearance (twin peaks) were removed. Distribution profiles were subjected to statistical analysis and octamers with a profile of P < E-5, as determined by the Chi-square test, were further selected. Selected octamers were subjected to LDSS analysis and TATA-, Core-, and Solid-type octamers were removed, providing a final 8,044 bipartite octamers.

These sequences were further subjected to LDSS analysis, where preference of appearance according to promoter position was evaluated. According to this analysis, 903 of them were revealed to be related to core promoter elements (Fig. 2A), including the TATA box (Fig. 3A) and Y Patch (Fig. 3B), and 75 sequences had ‘Solid’ distribution profiles (Fig. 2B); the latter potentially include transposon-related sequences and are unlikely to be transcriptional regulatory elements. These 978 identified sequences (903 + 75) were removed from further analysis. A summary of the sequence extraction is shown in Table 1.
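As an illustration of how a position-dependent distribution profile of this kind might be computed, the following Python sketch builds a per-position count for one bipartite octamer, smooths it with a 21-position moving window, and scales the maximum to 1.0. It is a sketch under assumptions, not the authors' LDSS software: it assumes all promoters have the same length and are aligned at the TSS, it uses a simple moving average (the original smoothing method may differ), and the function name is hypothetical.

```python
import re

def ldss_profile(promoters, left, spacer, right, window=21):
    """Smoothed, max-normalized distribution of one bipartite octamer.

    Index i of the result is the offset from the 5' end of the aligned,
    equal-length promoter sequences.
    """
    length = len(promoters[0])
    pattern = re.compile("(?=%s[ACGT]{%d}%s)" % (left, spacer, right))
    raw = [0.0] * length
    for seq in promoters:
        for match in pattern.finditer(seq):
            raw[match.start()] += 1
    half = window // 2
    smoothed = []
    for i in range(length):
        lo, hi = max(0, i - half), min(length, i + half + 1)
        smoothed.append(sum(raw[lo:hi]) / (hi - lo))  # moving average
    top = max(smoothed)
    return [v / top for v in smoothed] if top > 0 else smoothed

# Toy input: three identical 100-bp sequences with one occurrence at offset 40.
toy = ["A" * 40 + "AACGTTTTCGT" + "A" * 49] * 3
profile = ldss_profile(toy, "AACG", 3, "TCGT")
print(profile.index(1.0))  # 30, the first window that includes offset 40
```

A profile with a localized peak would correspond to position-dependent appearance, whereas a flat profile would indicate no positional preference; classification into types then operates on these profiles.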
Figure 2

LDSS analysis of bipartite octamers. The distribution profiles of each octamer are shown. Each profile is shown as a colored horizontal line (rare = black, frequent = red), and the profiles of 903 (A), 75 (B), and 8,044 (C) octamers were subjected to hierarchical clustering. The cluster number of the ‘Rest’ group is indicated at the right end of the heat map (C). REG (1, 2), REG-like (3, 4), and non-REG (5–9).

Figure 3

Typical LDSS profiles of bipartite octamers. Typical distribution profiles are shown: TATA type (A), Y Patch-related Core type (B), REG type (C), and non-REG type (D). Profiles were subjected to smoothing with a bin of 21 bp, and the maximum value was adjusted to 1.0. The peak position of Panel A is -47, which means the first T of the TATA within the sequence, ‘ACTT….TATA’, comes at -38. The two sequences in Panel C are complementary to each other.

The remaining 8,044 sequences (Fig. 2C) were classified into three groups according to the LDSS profiles: the REG type (Clusters 1 and 2 in Figs 2C and 3C), which shows local distribution around -40 to -400 relative to the TSS and is characteristic of a certain type of transcriptional regulatory element; the non-REG type, whose distributions are scattered evenly with regard to promoter position (Clusters 5–9 in Figs 2C and 3D); and the intermediate REG-like type (Clusters 3 and 4 in Fig. 2C). Figure 3C shows the distribution profiles of two mutually complementary sequences, and both have very similar distribution profiles with peak positions around -100. These results demonstrate that their distribution profile is direction-insensitive, and this feature is the same as the REG type of continuous octamers. Considering the distribution profiles, sequences in clusters 1 and 2 of Fig. 2C are classified as the REG type. Sequence lists of the removed core and solid groups, and also the 8,044 selected sequences containing REG and non-REG types, can be found in Supplementary Tables S1–S3, respectively.
In order to pick up sequence groups from the extracted sequences, 1,053 selected REG-type sequences, which are predicted to be regulatory sequences, were mapped to 24,714 Arabidopsis promoter sequences of 1 kb in length, and a matrix of presence/absence (1,053 × 24,714) was subjected to 2D clustering analysis (Fig. 4). This analysis provided us with the clearest classification of cis elements. As shown in Fig. 4A, several clusters have been identified, each of which shares bipartite octamers and promoters.
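The presence/absence matrix described above can be sketched in Python as follows. This is a minimal illustration with hypothetical function names and toy sequences; it only builds the matrix and a pairwise Jaccard similarity between octamer rows, which is one possible input for a clustering step. The 2D hierarchical clustering itself is not reproduced here.

```python
import re

def presence_matrix(octamers, promoters):
    """Rows: bipartite octamers (left, spacer, right); columns: promoters."""
    matrix = []
    for left, spacer, right in octamers:
        pattern = re.compile("%s[ACGT]{%d}%s" % (left, spacer, right))
        matrix.append([1 if pattern.search(p) else 0 for p in promoters])
    return matrix

def jaccard(row_a, row_b):
    """Share of promoters containing both octamers among those with either."""
    both = sum(a & b for a, b in zip(row_a, row_b))
    either = sum(a | b for a, b in zip(row_a, row_b))
    return both / either if either else 0.0

# Toy data (hypothetical): one ERSE-like promoter, one NAC-like, one neither.
octamers = [("CCAA", 11, "CACG"), ("CAAT", 10, "CACG"), ("CTTG", 5, "CAAG")]
promoters = ["GGCCAATAAAAAAAAACCACGTT", "CTTGAAAAACAAGT", "ACGTACGTACGT"]
matrix = presence_matrix(octamers, promoters)
print(matrix)                          # [[1, 0, 0], [1, 0, 0], [0, 1, 0]]
print(jaccard(matrix[0], matrix[1]))   # 1.0, the two octamers co-occur
```

Octamers derived from the same underlying motif tend to occur in the same promoters, so their rows are similar; this is the signal that lets clustering group octamers into motif-level clusters.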
Figure 4

Several groups in the REG-type bipartite octamers. 1,053 bipartite octamers with distribution profiles of the REG type (Fig. 2C) were mapped to 24,714 Arabidopsis promoters, and the resultant presence/absence matrix (1,053 × 24,714) was subjected to 2D clustering analysis (A). Presence and absence of an octamer in a promoter are shown as red and black, respectively. Labels of the matrix in vertical and horizontal dimensions are omitted. Octamers of the 20 largest clusters were subjected to WebLogo (B). Some pairs of clusters (1–9, 2–15, 3–4, 5–7, 10–17, 12–13) are mutually complementary. Clusters 5 and 7 are ERSE (CCAAT-N9-CCACG).

We subsequently selected the 20 largest clusters and summarized the bipartite octamers of each cluster as sequence motifs. The motifs prepared with WebLogo are shown in Panel B. Several tandem repeats (clusters 1, 6, 9, 12, and 13) and also palindromic sequences (clusters 8 and 11) are included. We suggest that they are recognized by a dimer or a trimer of the same protein family.

3.2. Inclusion of known transcriptional regulatory elements in selected bipartite octamers

The 8,044 selected sequences include REG-type groups. According to their LDSS profiles, they are strongly suggested to be transcriptional regulatory sequences. Their inclusion in the selected sequences is an indication of the ability of the procedure to identify potential regulatory sequences. In order to find other features suggesting that the selected sequences are regulatory elements, we surveyed known transcriptional regulatory elements within the sequences. One of the reported bipartite regulatory motifs is the endoplasmic reticulum stress-responsive element (ERSE, consensus sequence CCAAT-N9-CCACG) that is conserved between mammals and higher plants. The ERSE in mammals is known to be recognized by a general transcription factor, NF-Y, for CCAAT and the stress-responsive transcription factor ATF6 for CCACG, and the spacing between the two elements is fixed at nine nucleotides. An ERSE is also found in Arabidopsis promoters and is known to have a conserved function. When we searched the selected sequences for ERSEs, 28 related sequences were found (Table 2), indicating the inclusion of these elements in the extracted sequences. Then, microarray data of the transcriptional response to ER stress activated by tunicamycin treatment were used to calculate the RAR values of the selected sequences. Typically, RAR values of over 3.0 are considered ‘positive’ as transcriptional regulatory sequences. As shown in Table 2, the RAR values of most of the ERSE-related sequences are well above 3.0 (shown in bold in the table). These results mean that the bipartite ERSEs can be detected as ER stress (tunicamycin treatment)-responsive elements in our promoter prediction using the selected bipartite octamers (4 + 4). The application of such promoter prediction will be demonstrated later in this article.
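One simple way to check whether a bipartite octamer is compatible with the ERSE consensus is sketched below in Python. The text does not state how the search was performed, so this is only an assumed approach: the octamer is slid along the consensus (and its reverse complement), each octamer base must agree with any defined consensus base it overlaps, and the number of agreeing defined positions is reported. The function names and the one-base flanking pad are hypothetical, and the example octamers are derived from the consensus itself rather than from any table row. Single-mismatch variants would score 0 under this strict rule.

```python
ERSE = "CCAATNNNNNNNNNCCACG"  # CCAAT-N9-CCACG

def reverse_complement(seq):
    """Reverse complement that leaves N unchanged."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def best_defined_matches(left, spacer, right, consensus=ERSE, pad=1):
    """Largest number of defined consensus bases matched by the octamer.

    Returns 0 when no alignment is free of mismatches.
    """
    octamer = left + "N" * spacer + right
    best = 0
    for target in (consensus, reverse_complement(consensus)):
        padded = "N" * pad + target + "N" * pad  # allow a 1-base overhang
        for offset in range(len(padded) - len(octamer) + 1):
            window = padded[offset:offset + len(octamer)]
            defined = 0
            for base, cons in zip(octamer, window):
                if base == "N" or cons == "N":
                    continue          # unconstrained position
                if base != cons:
                    defined = -1      # mismatch: discard this alignment
                    break
                defined += 1
            best = max(best, defined)
    return best

print(best_defined_matches("CCAA", 11, "CACG"))  # 8
print(best_defined_matches("ACCA", 12, "CACG"))  # 7
print(best_defined_matches("CTTG", 5, "CAAG"))   # 0
```

A caller would then apply a threshold on the returned score (for example, requiring most of the eight bases to fall on defined consensus positions) to decide which octamers count as ERSE-related; the threshold is a design choice and not taken from the source.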
Table 2

ERSE sequences in the selected bipartite octamers

LDSS cluster   Sequence                Motif                   RAR (Tunicamycin_up)
1_REG          ACCA…………CACG            ERSE: CCAAT-N9-CCACG    4.42
1_REG          TCCA…………CACG            ERSE: CCAAT-N9-CCACG    6.4
1_REG          CCAA………GCCA             ERSE: CCAAT-N9-CCACG    3.52
1_REG          CCAA……….CCAC            ERSE: CCAAT-N9-CCACG    4.65
1_REG          CCAA……….CACG            ERSE: CCAAT-N9-CCACG    6.42
1_REG          CCAA…………ACGT            ERSE: CCAAT-N9-CCACG    5.64
2_REG          CAAT………CCAC             ERSE: CCAAT-N9-CCACG    5.84
1_REG          CAAT……….CACG            ERSE: CCAAT-N9-CCACG    7.76
1_REG          CAAT……….ACGT            ERSE: CCAAT-N9-CCACG    6.83
1_REG          CAAT…………CGTG            ERSE: CCAAT-N9-CCACG    4.95
2_REG          AATA………CACG             ERSE: CCAAT-N9-CCACG    2.3
2_REG          AATC………CACG             ERSE: CCAAT-N9-CCACG    6.8
4_REG-like     CACG…………ATTG            ERSE: CCAAT-N9-CCACG    8.53
2_REG          GACG…………ATTG            ERSE: CCAAT-N9-CCACG    4.39
1_REG          ACGT……….GATT            ERSE: CCAAT-N9-CCACG    3.93
2_REG          ACGT……….CATT            ERSE: CCAAT-N9-CCACG    4.07
1_REG          ACGT……….ATTG            ERSE: CCAAT-N9-CCACG    8.69
1_REG          ACGT…………TTGG            ERSE: CCAAT-N9-CCACG    10.25
2_REG          CGTG………TATT             ERSE: CCAAT-N9-CCACG    4.37
1_REG          CGTG………GATT             ERSE: CCAAT-N9-CCACG    4.28
2_REG          CGTG………CATT             ERSE: CCAAT-N9-CCACG    6.64
1_REG          CGTG……….ATTG            ERSE: CCAAT-N9-CCACG    11.96
1_REG          CGTG……….TTGG            ERSE: CCAAT-N9-CCACG    12.8
1_REG          CGTG…………TGGT            ERSE: CCAAT-N9-CCACG    10.88
2_REG          GTGG………ATTG             ERSE: CCAAT-N9-CCACG    5.69
3_REG-like     GTGT………ATTG             ERSE: CCAAT-N9-CCACG    4.87
2_REG          TGGC………TTGG             ERSE: CCAAT-N9-CCACG    5.74
4_REG-like     TGTC………TTGG             ERSE: CCAAT-N9-CCACG    5.09

LDSS cluster indicates the group number of LDSS clustering shown in Fig. 2C. Sequence matching with ERSE is indicated with underlining. A dot in a sequence means any nucleotide. The RAR values of each octamer for tunicamycin up-regulation are also shown. RAR values >3.0 are shown in bold.

Another reported bipartite motif of higher plants is the NAC-binding motif (CTTG-N5-CAAG) that is recognized by Arabidopsis ANAC013. This is also called the mitochondrial dysfunction motif (MDM) which includes the NAC-binding site and has several sequence variations. Our extracted bipartite sequences included 13 sequences related to the NAC-binding motif (Table 3). The presence of these sequences in the extracted sequences also supports the idea that our method does extract bipartite transcriptional regulatory sequences. One feature of this group is non-REG-type distribution (Table 3).
Table 3

NAC target sequences in the selected bipartite octamers

LDSS cluster   Sequence          Motif
7_non-REG      ACTT……CAAG        NAC: CTTG-N5-CAAG
3_REG-like     CCTT……CAAG        NAC: CTTG-N5-CAAG
9_non-REG      GCTT……CAAG        NAC: CTTG-N5-CAAG
9_non-REG      CTTG….TGAA        NAC: CTTG-N5-CAAG
3_REG-like     CTTG….CACG        NAC: CTTG-N5-CAAG
9_non-REG      CTTC….CAAG        NAC: CTTG-N5-CAAG
9_non-REG      CTTG….GAAG        NAC: CTTG-N5-CAAG
7_non-REG      CTTG….CAAG        NAC: CTTG-N5-CAAG
9_non-REG      CGTG….CAAG        NAC: CTTG-N5-CAAG
4_REG-like     CTTG……AAGT        NAC: CTTG-N5-CAAG
9_non-REG      CTTG……AAGG        NAC: CTTG-N5-CAAG
9_non-REG      CTTG……AAGA        NAC: CTTG-N5-CAAG
9_non-REG      TTGC…TGAA         NAC: CTTG-N5-CAAG

LDSS cluster indicates the group number of LDSS clustering shown in Fig. 2C. Sequence matching with the NAC target motif is underlined. A dot in a sequence means any nucleotide.

As the NAC gene family is composed of 94–106 genes in Arabidopsis, variations in the target sequence among this family and differentiation of their function could be present. Therefore, the sequences in Table 3 may be recognized by different NAC proteins.

3.3. Applications of bipartite octamer analysis to promoter prediction

The selected 8,044 bipartite octamers were utilized for promoter prediction. An ER stress-responsive BiP3 promoter was selected for the prediction of tunicamycin-responsive elements. The results of this prediction were compared with those obtained using the established method using octamers without any spacers. Figure 5A shows the results of promoter scans of BiP3 for tunicamycin-responsive elements. The vertical axis shows the RAR, and typically peaks >3.0 are selected as prediction sites. As shown, several peaks are >3.0, and two of them match experimentally identified ERSEs, ERSE-A, and ERSE-B. The RAR values of bipartite octamers are indicated with red X marks, and they show much higher RAR values than the continuous octamers at the positions of ERSEs, demonstrating the superiority of the bipartite octamers in detection of the ERSE.
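The promoter scan with bipartite octamers can be illustrated with a short Python sketch. The exact RAR formula is not spelled out in this text, so the sketch assumes one plausible form: the fraction of promoters in the responsive (focused) set that contain the octamer, divided by the corresponding fraction in the global promoter set. The function names and toy sequences are hypothetical, and the threshold of 3.0 follows the conventional cut-off mentioned above.

```python
import re

def rar(pattern, focused, global_set):
    """Assumed RAR: containing fraction in focused set / in global set."""
    f = sum(1 for p in focused if pattern.search(p)) / len(focused)
    g = sum(1 for p in global_set if pattern.search(p)) / len(global_set)
    return f / g if g > 0 else 0.0

def scan_bipartite(promoter, octamers, focused, global_set, threshold=3.0):
    """Return (position, left, spacer, right, RAR) for hits above threshold."""
    hits = []
    for left, spacer, right in octamers:
        # Lookahead so that overlapping occurrences are reported as well.
        pattern = re.compile("(?=%s[ACGT]{%d}%s)" % (left, spacer, right))
        score = rar(pattern, focused, global_set)
        if score > threshold:
            hits.extend((m.start(), left, spacer, right, score)
                        for m in pattern.finditer(promoter))
    return sorted(hits)

# Toy data (hypothetical): eight "promoters", one carrying an ERSE-like site.
erse_site = "CCAATAAAAAAAAACCACG"
global_set = ["GG" + erse_site + "TT", "ACGT" * 6, "TTTT" * 6, "GGGG" * 6,
              "CCCC" * 6, "ATAT" * 6, "TATA" * 6, "GCGC" * 6]
focused = [global_set[0], global_set[2]]
octamers = [("CCAA", 11, "CACG"), ("ACGT", 4, "ACGT")]
hits = scan_bipartite(global_set[0], octamers, focused, global_set)
print(hits)  # [(2, 'CCAA', 11, 'CACG', 4.0)]
```

Setting the spacer to 0 reduces a bipartite octamer to a continuous one, so the same function could be used to compare the two kinds of scan on one promoter.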
Figure 5

Promoter prediction of BiP3/BiP-L for tunicamycin response. Application of the bipartite sequences for prediction of tunicamycin-responsive elements in a BiP/BiP-L (AT1G09080) promoter. (A) Results of promoter scans using octamers (green line graph) and bipartite octamers (red X mark) are shown. The Relative Appearance Ratio (RAR) means the degree of overrepresentation in tunicamycin-responsive promoters over the total promoters and thus high RAR sites predict corresponding regulatory elements. A height of RAR = 3.0, a conventional threshold, is indicated by a black horizontal line. Positions of experimentally identified ERSEs, ERSE-A and ERSE-B, are also shown in the graph. (B and C) Sequences from -247 to -198 (B) and from -147 to -98 (C), relative to the translation start site (ATG), are shown. Green letters show predictions based on continuous octamers, and red letters are predictions based on bipartite octamers. Underlining indicates sequences that match with the ERSE motif (CCAAT-N9-CCACG). Values in parentheses are RAR scores that show the ratio of overrepresentation in tunicamycin-responsive promoters over the global promoter set in the Arabidopsis genome. ERSE-A and ERSE-B are functionally confirmed ERSEs reported by Noh et al. Sequence in bold means an ERSE motif in ERSE-A and ERSE-B, as reported by Noh et al.

Figures 5B and C show sequences around ERSE-A (Panel B) and ERSE-B (Panel C), and the predictions using continuous octamers are shown in green letters. In both cases, the CCACG site of the ERSE, which is the recognition site (CGTGT, the underlining shows the match with the ERSE) of the ATF6 homolog of Arabidopsis, bZIP28, was detected by continuous octamer analysis, but the NF-Y binding site (ATTGG, the underlining shows the match with the ERSE) was not. These results indicate that the bZIP binding site of the ERSE has some specificity in ER-stress responsive promoters, but the NF-Y site does not have any specificity in the same promoters, resulting in the failure to detect the NF-Y site as an ERSE. For comparison, the same gene list from the microarray data of tunicamycin response was applied to Gibbs Sampler, MEME, CONSENSUS and MD Scan provided by Melina II. These four methods could not detect either ERSE-A or ERSE-B (data not shown). Predictions using our bipartite octamers are shown in red letters in Fig. 5B and C. Both the bZIP and NF-Y binding sites were detected as putative ERSEs, indicating the advantage of using the bipartite octamer analysis to predict the bipartite ERSE. Comparison of the RAR signals between the two methods reveals that signals of the bipartite octamer analysis are much higher than those of the continuous octamer analysis (Panel A). This indicates higher sequence specificity of the detected ERSE sequences in the bipartite octamer method than in the continuous method. These results, shown in Fig. 5, demonstrate that detection of ERSEs among tunicamycin-stimulated ER-stress responsive promoters has been considerably improved by introducing the bipartite octamer analysis.

The second application of our new method is the prediction of ANAC017 target sites using microarray data of Arabidopsis ANAC017 mutants. As ANAC017 is a mediator of H2O2 signaling, the effect of a mutation was compared with wild type after H2O2 treatments. As shown in Fig. 6A, a promoter region of ATERF71 was predicted as a target site of ANAC017 (underlined) according to the following results: the expression level of ATERF71 was reduced in the ANAC017 mutants, and a target site of ANAC017 predicted by bipartite octamer analysis was mapped to the promoter region of ATERF71. In order to confirm the predicted target site of ANAC017, direct binding of ANAC017 protein to the DNA sequence was examined in vitro by the AlphaScreen method. A FLAG-tagged ANAC017 protein was prepared in vitro using wheat germ and subjected to binding assays with a biotinylated oligo DNA probe. As shown in Fig. 6, binding assays of the ANAC017 protein to the probe containing the predicted target site gave positive signals, indicating direct binding (no competitor, Panel B). Addition of a non-biotinylated competitor reduced the signal (competitor M0), but addition of the mutated competitors M1, M2, M3, and M6 failed to reduce the binding signals. The target sequence, revealed by these assays, that is necessary for the binding is shown with asterisks in Panel A. These results demonstrate that the bipartite octamer analysis developed here has predicted the exact target sequence of the ANAC017 protein based on microarray data of ANAC017 mutants.

3.4. Possibility of further extension

We have tried several division patterns other than [4 + 4], but so far none has given a clear improvement. For example, a zinc finger protein, STOP1, recognizes a [3 + 3 + 2] pattern in a target promoter, but prediction of the target site using its knockout data was better with [8] than with [3 + 3 + 2], for reasons that remain unknown. Currently, only the results of [4 + 4] are worth reporting. We have therefore stopped adding further division patterns, and now consider the addition of [4 + 4] a sufficient extension of the octamer analysis.

In summary, we have successfully developed a supplementary method for promoter prediction for spacer-containing transcriptional regulatory elements in Arabidopsis. This method should be used alongside the octamer analysis established previously, and it is applicable to any other genome thought to contain bipartite transcriptional regulatory elements.
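The division patterns discussed here ([8], [4 + 4], [3 + 3 + 2]) can be illustrated with a short Python sketch that splits an octamer according to a pattern and searches a promoter sequence while allowing a spacer between the parts. This is only an illustration of the idea, not the authors' implementation: the function names are hypothetical, and the spacer range of 0 to 10 bases is an arbitrary assumption rather than the range used in the paper.

```python
import re

def split_octamer(octamer, pattern):
    """Split a motif by a division pattern, e.g. (4, 4) or (3, 3, 2)."""
    if sum(pattern) != len(octamer):
        raise ValueError("pattern must sum to the motif length")
    parts, pos = [], 0
    for size in pattern:
        parts.append(octamer[pos:pos + size])
        pos += size
    return parts

def motif_regex(parts, min_gap=0, max_gap=10):
    """Join the parts with a flexible spacer (gap range is an assumption).

    The lazy quantifier reports the shortest spacer for each start position,
    and the lookahead lets overlapping sites be found.
    """
    spacer = "[ACGT]{%d,%d}?" % (min_gap, max_gap)
    body = spacer.join(re.escape(part) for part in parts)
    return re.compile("(?=(" + body + "))")

def find_sites(sequence, octamer, pattern, min_gap=0, max_gap=10):
    """Return (start, matched sequence) for every site in the sequence."""
    regex = motif_regex(split_octamer(octamer, pattern), min_gap, max_gap)
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]

# Toy sequence: ACGT and TGCA separated by a 2-base spacer.
promoter = "TTACGTGGTGCAAA"
print(find_sites(promoter, "ACGTTGCA", (4, 4)))     # [(2, 'ACGTGGTGCA')]
print(find_sites(promoter, "ACGTTGCA", (8,)))       # []
print(find_sites(promoter, "ACGTGGCA", (3, 3, 2)))  # [(2, 'ACGTGGTGCA')]
```

In the toy sequence the continuous pattern [8] misses the site that [4 + 4] recovers. With `min_gap=0` the divided patterns also match the continuous octamer, so a stricter analysis would set `min_gap` to at least 1.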

Conflict of interest

None declared.

Supplementary data

Supplementary Tables S1–S3 are available at www.dnaresearch.oxfordjournals.org.

Funding

This work is supported in part by JST ALCA (to Y.Y.Y.) and Grant-in-Aid for Scientific Research on Innovative Areas ‘Plant Perception’ (No. 23120511) (to Y.Y.Y.).