
AthaMap web tools for database-assisted identification of combinatorial cis-regulatory elements and the display of highly conserved transcription factor binding sites in Arabidopsis thaliana.

Nils Ole Steffens, Claudia Galuschka, Martin Schindler, Lorenz Bülow, Reinhard Hehl.

Abstract

The AthaMap database generates a map of cis-regulatory elements for the Arabidopsis thaliana genome. AthaMap contains more than 7.4 × 10^6 putative binding sites for 36 transcription factors (TFs) from 16 different TF families. A newly implemented functionality allows the display of subsets of more highly conserved transcription factor binding sites (TFBSs). Furthermore, a web tool was developed that permits a user-defined search for co-localizing cis-regulatory elements. The user can specify individually the level of conservation for each TFBS and a spacer range between them. This web tool was employed for the identification of co-localizing sites of known interacting TFs and TFs containing two DNA-binding domains. More than 1.8 × 10^5 combinatorial elements were annotated in the AthaMap database. These elements can also be used to identify more complex co-localizing elements consisting of up to four TFBSs. The AthaMap database and the connected web tools are a valuable resource for the analysis and the prediction of gene expression regulation at http://www.athamap.de.

Year:  2005        PMID: 15980498      PMCID: PMC1160156          DOI: 10.1093/nar/gki395

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The regulation of gene expression is mainly conferred by transcription factors (TFs) that bind to cis-regulatory sequences. These sequences can be used to generate hypotheses about TFs that may be involved in the regulation of nearby genes (1,2). In Arabidopsis thaliana, more than 1500 TFs corresponding to ∼5% of the total genes have been identified (3). The largest families are MYB and MYB-related (190 members), AP2/EREBP (144), bHLH (139), NAC (109), C2H2(Zn) (105), HD (89), MADS (82), bZIP (81) and WRKY (72). Since the complete sequence of the A.thaliana genome has been published (4), it was desirable to have a map of transcription factor binding sites (TFBSs) for the whole genome. The non-restrictive nature of such a map permits the identification of regulatory sequences within transcribed and coding regions as well. To generate such a map, the pattern search program Patser (5) and publicly available alignment matrices were used to build the AthaMap database, the first TFBS map for the whole A.thaliana genome (6). The second release of the AthaMap database presented here has increased the data content from ∼2.4 × 10^6 to >7.4 × 10^6 putative sites. Specific care has been taken in the annotation of CAT- and TATA-boxes, which were predicted using alignment matrices from the PlantProm database (7) together with the positional information relative to transcription start sites (TSSs) or translation start sites. Because each TFBS is associated with a particular score that represents the similarity of the site to the underlying alignment matrix, a new functionality was implemented that allows the identification of highly conserved binding sites. It is well known that the composition of binding sites in the regulatory region of a gene confers its specific expression profile (8). For example, two G-box like sequences constitute the as-1 element that is bound by bZIP TFs (9).
Another example is the ocs element that occurs in certain glutathione S-transferase genes of Arabidopsis, which harbour a bZIP and DOF factor binding site in close vicinity (10–12). A wide variety of expression specificity is associated with the co-localization of MYB- and MYC-binding sites (13–16). Other examples are MADS/MADS TFBSs and those TFs that harbour two DNA-binding domains, such as AP2 (17,18). For the identification of such co-localizing elements, a new web tool was implemented that permits a user-defined identification of pairs of TFBSs in the genome of Arabidopsis by providing distance and quality parameters. This web tool was used to identify the co-localizing sites for known interacting factors. Such combinatorial elements were annotated to the AthaMap database and can also be used for the identification of more complex elements consisting of, for example, two combinatorial elements harbouring four TFBSs.

INCREASE IN AthaMap DATA CONTENT AND FUNCTIONALITY

As summarized in Table 1, the genomic positions of more than 7.4 × 10^6 putative TFBSs were determined in the A.thaliana genome. These positions were identified with 42 alignment matrices for 36 TFs. For the factors bZIP910, bZIP911, PIF3, ABI4, RAV1 and MYB.PH3, two different alignment matrices were employed and they are identified by numbers in brackets behind the factor name (Table 1). The binding sites were taken directly from the published literature, which is regularly screened in the process of updating the TRANSFAC® database with plant transcription factor data (2).
Table 1

The number of putative binding sites detected with alignment matrices for TFs from different factor families in the A.thaliana genome and annotated in the AthaMap database

Factor        Family      Species             Number of sites   Reference for alignment matrix
ABF1          bZIP        A.thaliana          2419              (25)
bZIP910[1]    bZIP        Antirrhinum majus   345               (26)
bZIP910[2]    bZIP        A.majus             470               (26)
bZIP911[1]    bZIP        A.majus             123               (26)
bZIP911[2]    bZIP        A.majus             145               (26)
TGA1          bZIP        A.thaliana          53 494            (27)
TGA1a         bZIP        Nicotiana tabacum   142 072           (28)
O2            bZIP        Zea mays            173 685           (28)
PIF3[1]       bHLH        A.thaliana          1154              (19)
PIF3[2]       bHLH        A.thaliana          951               (19)
DOF2          DOF         Z.mays              1 840 355         (29)
AG            MADS        A.thaliana          46 240            (30)
AGL3          MADS        A.thaliana          73 298            (31)
AGL15         MADS        A.thaliana          262 900           (32)
ABI4[1]       AP2/EREBP   Z.mays              12 830            (33)
ABI4[2]       AP2/EREBP   Z.mays              11 955            (33)
ANT           AP2/EREBP   A.thaliana          294               (34)
RAV1[1]       AP2/EREBP   A.thaliana          310 764           (18)
RAV1[2]       AP2/EREBP   A.thaliana          229 983           (18)
TEIL          AP2/EREBP   N.tabacum           602 300           (35)
AtMYB15       MYB         A.thaliana          209               (36)
AtMYB77       MYB         A.thaliana          17 836            (36)
AtMYB84       MYB         A.thaliana          231               (36)
CDC5          MYB         A.thaliana          11 574            (37)
GAMYB         MYB         Hordeum vulgare     315 722           (38)
MYB.PH3[1]    MYB         Petunia hybrida     8529              (39)
MYB.PH3[2]    MYB         P.hybrida           7638              (39)
P             MYB         Z.mays              210 035           (40)
GT1           Trihelix    Diverse species     1 439 744         (41)
PCF2          TCP         Oryza sativa        37 373            (42)
PCF5          TCP         O.sativa            14 090            (42)
HVH21         HD-Kn       H.vulgare           526 877           (43)
ALFIN1        HD-PHD      Medicago sativa     546 159           (44)
ATHB1         HD-ZIP      A.thaliana          66 460            (45)
ATHB5         HD-ZIP      A.thaliana          7115              (46)
ATHB9         HD-ZIP      A.thaliana          303               (47)
HAHB4         HD-ZIP      Helianthus annuus   90 825            (48)
AGP1          GATA        N.tabacum           108 199           (49)
ZAP1          WRKY        A.thaliana          4302              (50)
ID1           C2H2 (Zn)   Z.mays              156 641           (51)
TBP           -           Diverse species     16 277            (7)
CBF           -           Diverse species     62 033            (7)
Total number of sites                         7 413 949

The screens were performed on the most recent version of the A.thaliana genome sequence (TIGR release 5.0, January 21, 2004). The pattern search program Patser (5) was used for the identification of binding sites as described previously (6). The following command line was used to run Patser: ‘patser-v3d -A a:t 0.320 c:g 0.180 -m matrixfile -f sequencefile -c -li -d2’. For all screens, the default threshold calculated by Patser from the adjusted information content of the matrix was employed. This criterion was chosen as an objective cut-off threshold value applicable for all the matrices as it represents a measure of how far the nucleotide frequency distribution in the alignment matrix diverges from the a priori probability for the occurrence of the nucleotides in the genome (5). In the case of CAT- and TATA-boxes (CBF and TBP), only those elements that occur upstream of known TSSs or predicted translation start sites were imported into the AthaMap database. TSSs and translation start sites were annotated to the AthaMap database as provided by the TIGR. The AthaMap database is based on the in silico determination of binding sites and does not distinguish between experimentally verified and predicted sites. Therefore, it is desirable to discriminate between higher and lower conserved binding sites. A criterion for the conservation of a site is the individual score of a TFBS determined by using Patser (5). In general, only TFBSs with a specific score above a threshold score determined for each matrix were imported into the AthaMap database and are displayed as putative binding sites. A high score close to the possible maximum score represents a highly conserved binding site whereas a low score close to the threshold stands for a less conserved site. Maximum score, threshold score and specific score of a site are identified in a tool tip box in the AthaMap database to evaluate individual TFBSs (6). 
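The relationship between an alignment matrix, the a priori nucleotide probabilities and the score of a site can be illustrated with a small sketch. This is a simplified log-odds scoring scheme, not Patser's actual implementation: the function names, the pseudocount handling and the toy matrix are assumptions made for illustration, and only the background frequencies (a:t 0.320, c:g 0.180) are taken from the command line above.

```python
import math

# A priori nucleotide probabilities from the Patser command line in the text.
BACKGROUND = {"A": 0.320, "T": 0.320, "C": 0.180, "G": 0.180}


def log_odds_matrix(counts, pseudocount=1.0):
    """Turn per-position base counts into log-odds weights.

    `counts` is a list of dicts, one per matrix position, mapping a base
    to its count in the alignment. The pseudocount is distributed
    according to the background frequencies (an assumed convention).
    """
    weights = []
    for column in counts:
        total = sum(column.values()) + pseudocount
        weights.append({
            base: math.log(
                ((column.get(base, 0) + pseudocount * BACKGROUND[base]) / total)
                / BACKGROUND[base]
            )
            for base in "ACGT"
        })
    return weights


def score_site(weights, site):
    """Sum the position-specific weights for a candidate site."""
    return sum(w[base] for w, base in zip(weights, site.upper()))


def max_score(weights):
    """Best achievable score: the highest weight at every position."""
    return sum(max(w.values()) for w in weights)


# Toy four-position matrix (hypothetical, not an AthaMap matrix).
toy_counts = [{"A": 10}, {"C": 10}, {"G": 10}, {"T": 10}]
toy_weights = log_odds_matrix(toy_counts)
```

In this sketch a site that matches the most frequent base at every position reaches the maximum score, while each mismatch lowers the score, which is the sense in which a score close to the maximum indicates a highly conserved site.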
To permit the exclusive display of higher conserved TFBSs, a new function was implemented in the AthaMap database that allows the user to restrict the number of sites shown by the quality of their scores. With the new ‘Restriction’ function on the ‘Search’ page of AthaMap, the user is able to restrict the sites displayed to those that are closer to the maximum score. This requires an input value as a percentage, which is then applied to the difference between maximum score and threshold score. For example, if the restrictive value is set to 20% then only sites with a score of at least 6 will be displayed for a matrix with a maximum score of 10 and a threshold score of 5, while normally all sites with a score of at least 5 would be shown. A user-defined increase in the threshold score of TFBSs displayed in the AthaMap database may eliminate putative false positive TFBSs.
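The arithmetic behind the restriction value can be written out as a short sketch. The function names and the dictionary layout of a site are hypothetical; only the rule itself (a percentage applied to the difference between maximum score and threshold score) comes from the description above.

```python
def restricted_threshold(max_score, threshold_score, restriction_percent):
    """Raise the threshold by a percentage of the (max - threshold) range."""
    if not 0 <= restriction_percent <= 100:
        raise ValueError("restriction_percent must be between 0 and 100")
    score_range = max_score - threshold_score
    return threshold_score + (restriction_percent / 100.0) * score_range


def filter_sites(sites, max_score, threshold_score, restriction_percent):
    """Keep only sites whose score reaches the restricted threshold.

    `sites` is assumed to be a list of dicts with a "score" key.
    """
    cutoff = restricted_threshold(max_score, threshold_score, restriction_percent)
    return [site for site in sites if site["score"] >= cutoff]


# Example from the text: maximum score 10, threshold score 5, restriction 20%.
cutoff = restricted_threshold(10, 5, 20)
kept = filter_sites([{"score": 5.5}, {"score": 6.0}, {"score": 9.1}], 10, 5, 20)
```

With a restriction of 0% the cutoff equals the matrix threshold, and with 100% only sites reaching the maximum score remain.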

A WEB TOOL FOR THE IDENTIFICATION OF CO-LOCALIZING TFBS

Gene expression specificity is often mediated by the interaction between TFs that recognize closely spaced binding sites (8). The importance of combinatorial control for gene expression makes it desirable to identify co-localizing TFBSs in the genome based on user provided parameters. For this, a new ‘co-localization’ web tool was implemented on the AthaMap website that permits the selection of two TFs and the designation of a specific minimum and maximum spacer of up to 50 bp between two TFBSs. The user may select two different TFs or two identical TFs. Furthermore, one can increase the threshold score of the TFBSs individually to obtain combinatorial elements that show a higher conservation of underlying binding sites. The result of the co-localization analysis is shown on the same page and gives the total number of co-localizing TFBSs detected, the chosen parameters for the co-localization analysis and the number of sites used in the analysis. The spacer between two binding sites is defined by the distance between the most 5′ positions of both TFBSs. This permits the identification of overlapping sites that may be relevant for longer matrices with non-overlapping core sequences. To avoid identical hits at the same chromosomal position when using TFs of the same family, it is suggested to select a minimum spacer length that is as long as the matrix of one of the two factors. In addition, even known TSSs can be selected to identify TFBSs in close vicinity to the TSSs. Owing to the large number of putative binding sites for some factors, the co-localization analysis had to be limited to ∼200 000 TFBSs for each factor to permit a co-localization analysis in a reasonable time. The number of TFBSs of 10 matrices was limited to higher conserved sites by increasing their threshold scores in the co-localization analysis. This applies to the matrices of factors AGL15, ALFIN1, DOF2, GAMYB, HVH21, P, RAV1, TEIL and GT1. 
The applied parameters can be found on the AthaMap website. With these restrictions, co-localization analyses are generally executed in <1 min. Figure 1 shows a modified screenshot of a result page for a co-localization analysis with AtMYB15 and TGA1, which are both factors from A.thaliana (Table 1). As user-defined parameters, a minimum spacer of 0 nt and a maximum spacer of 20 nt between the binding sites and the default threshold of the alignment matrices (11.85 and 5.81, respectively) were selected. The total number of co-localizing sites detected is nine (Figure 1, combinatorial elements). A result table shows the positions of the co-localizing binding sites, the chromosome and the orientation of the respective site with an arrow. Furthermore, the spacer length of the individual co-localizing element is shown. Each position is linked to an AthaMap sequence window that opens and shows the co-localizing sites highlighted within their genomic context (data not shown).
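A minimal sketch of such a co-localization search is given below. It is not the AthaMap implementation: the function name and the tuple layout of a site are assumptions, and the spacer is taken as the absolute distance between the most 5′ positions of the two sites on the same chromosome, which is one possible reading of the definition above.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def colocalize(sites_a, sites_b, min_spacer, max_spacer):
    """Pair sites of factor A with sites of factor B within a spacer range.

    Each site is a (chromosome, start) tuple, where `start` is the most
    5' position of the binding site. Returns a list of
    (chromosome, start_a, start_b, spacer) tuples.
    """
    # Index the B sites by chromosome and sort them for binary search.
    starts_b = defaultdict(list)
    for chrom, start in sites_b:
        starts_b[chrom].append(start)
    for positions in starts_b.values():
        positions.sort()

    pairs = []
    for chrom, start_a in sorted(sites_a):
        candidates = starts_b.get(chrom, [])
        # Only B sites within max_spacer on either side can qualify.
        lo = bisect_left(candidates, start_a - max_spacer)
        hi = bisect_right(candidates, start_a + max_spacer)
        for start_b in candidates[lo:hi]:
            spacer = abs(start_b - start_a)
            if spacer >= min_spacer:
                pairs.append((chrom, start_a, start_b, spacer))
    return pairs


# Hypothetical positions, for illustration only.
factor_a = [("1", 100), ("1", 500), ("2", 100)]
factor_b = [("1", 110), ("1", 130), ("1", 495), ("2", 200)]
hits = colocalize(factor_a, factor_b, min_spacer=0, max_spacer=20)
```

When the same list is passed for both factors, a minimum spacer of 0 pairs every site with itself, which mirrors the recommendation to choose a minimum spacer at least as long as the matrix in that case.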
Figure 1

Modified screenshots of the web tool for the identification of co-localizing TFBSs in the A.thaliana genome. The results for a co-localization analysis between TFBSs for TGA1 and AtMYB15 using the default parameters are shown. The arrow points to a result window when ‘Show overview’ is selected. See the text for details.

On the result page, when selected, a feature ‘Show overview’ displays a table with a summary of the co-localization analysis (Figure 1, arrow). The inserted table displays the total number of sites that were obtained with all spacer lengths between the selected minimum and the maximum spacer. Here, the user can readily see if a preferred spacer length is detected for binding sites of two TFs. This new tool will be very helpful to identify co-localizing binding sites for TFs that were shown experimentally to interact with each other. Furthermore, genes harbouring a similar architecture of cis-regulatory elements may be identified.
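The kind of summary shown by the overview can be sketched as a simple tally per spacer length. The function names and the input values are hypothetical; the sketch only assumes that a spacer length is already known for every co-localizing pair.

```python
from collections import Counter


def spacer_overview(spacers, min_spacer, max_spacer):
    """Count co-localizing pairs for every spacer length in the range."""
    counts = Counter(spacers)
    return {
        length: counts.get(length, 0)
        for length in range(min_spacer, max_spacer + 1)
    }


def preferred_spacers(overview):
    """Return the spacer length(s) with the highest number of pairs."""
    best = max(overview.values())
    return [length for length, n in overview.items() if n == best and n > 0]


# Hypothetical spacer lengths from a co-localization run.
overview = spacer_overview([10, 5, 10, 12, 10], min_spacer=0, max_spacer=20)
peak = preferred_spacers(overview)
```

A pronounced peak in such a table would point to a preferred spacing between the binding sites of the two factors.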

ANNOTATION OF COMBINATORIAL ELEMENTS IN THE AthaMap DATABASE

Well-known examples of combinatorial elements in plants are the as-1 element that is bound by two dimers of bZIP transcription factors, the endosperm or ocs element that is recognized by members of the bZIP and DOF TF families, and promoters that harbour MYC/MYB or MADS/MADS TF binding sites (9,12,16,17). Based on the approximate spacing between these elements, co-localizing sites were determined with the web tool described above and annotated as bZIP/bZIP, bZIP/DOF, MYC/MYB and MADS/MADS combinatorial elements. A second class of co-localizing TFBSs consists of sites for factors that harbour two DNA-binding domains, such as RAV1 (18). RAV1 belongs to the AP2/EREBP superfamily of TFs that comprises the subfamilies AP2, EREBP and RAV-like (3). RAV1 has two different DNA-binding domains and for each of them the binding specificity was identified (18) and annotated as RAV1[1] and RAV1[2] in the AthaMap database. All the putative RAV combinatorial elements were derived from a co-localization of RAV1[1] and RAV1[2]. Table 2 lists the total number of combinatorial elements identified in the A.thaliana genome and annotated in the AthaMap database. The factors used for the determination of combinatorial sites and the distances between putative binding sites are shown. A total of 183 159 combinatorial elements were annotated in the AthaMap database. These elements are identified in the AthaMap database by the factor family names and are displayed with a double line in the sequence window. For the AP2/EREBP member RAV1, the two different alignment matrices were employed for co-localization analysis. Each combinatorial RAV element consists of two TFBSs, one corresponding to each matrix.
Table 2

Combinatorial TFBS in the A.thaliana genome annotated in the AthaMap database

Combinatorial element   Distance between sites (bp)   Factor binding sites employed (a)   Number of elements annotated
RAV                     3–20                          RAV1[1]/RAV1[2]                     28 535
bZIP/bZIP               10–15                         TGA1a/TGA1a                         1037
bZIP/DOF                3–40                          TGA1a/DOF2                          84 389
MYC/MYB                 3–40                          TGA1a/all MYB                       38 065
MADS/MADS               10–100                        All MADS/all MADS                   31 133
Total number of combinatorial elements                                                    183 159

(a) Owing to the palindromic nature of the TGA1a and MADS box matrices, TFBSs frequently occur in sense and antisense at the same position. This leads to redundant combinatorial elements for which only one was annotated in the database and is displayed.

MYC (bHLH) TFs apparently recognize binding sites that are identical or are very closely related to bZIP-binding sites (19–21). Hence, annotated bZIP sites were employed for the identification of MYC-binding sites in combinatorial elements. The identification of functional MYC/MYB-binding sites by employing bZIP sites can be shown for the gene encoding BANYULS that is induced by the interacting TFs TT8 (MYC) and TT2 (MYB) (16,22,23). When the Arabidopsis genome identification number of the Banyuls gene (AT1G61720.1) is used for a search in the AthaMap database, a putative MYC/MYB combinatorial element is detected upstream of the TATA-box (data not shown). This combinatorial element corresponds to the previously determined MYC and MYB regulatory sites in the Banyuls promoter (24). Table 3 summarizes several known or experimentally predicted combinatorial elements detected in the AthaMap database.
Table 3

Examples of known and experimentally predicted combinatorial elements identified by co-localization analysis and annotated in the AthaMap database

Combinatorial element   Gene       AGI           Reference
MYC/MYB                 Banyuls    AT1G61720.1   (24)
MYC/MYB                 TT3        AT5G42800.1   (52,53)
MADS/MADS               Apetala3   AT3G54340.1   (54)
MADS/MADS               Agamous    AT4G18960.1   (55)
bZIP/DOF                GST8       AT1G78380.1   (56)

The element can be displayed when entering the Arabidopsis Genome Identifier (AGI) in the search window of the AthaMap database.

As a further asset of the AthaMap database, these annotated combinatorial elements can be included in the user-defined identification of co-localizing TFBSs as well. Therefore, more complex arrangements of regulatory elements consisting of up to four individual binding sites can be detected.

AVAILABILITY

The AthaMap resources are freely available for non-commercial users at http://www.athamap.de.
  56 in total

1.  Redundant enhancers mediate transcriptional repression of AGAMOUS by APETALA2.

Authors:  K Bomblies; N Dagenais; D Weigel
Journal:  Dev Biol       Date:  1999-12-01       Impact factor: 3.582

2.  Identifying DNA and protein patterns with statistically significant alignments of multiple sequences.

Authors:  G Z Hertz; G D Stormo
Journal:  Bioinformatics       Date:  1999 Jul-Aug       Impact factor: 6.937

3.  DNA-binding and dimerization preferences of Arabidopsis homeodomain-leucine zipper transcription factors in vitro.

Authors:  H Johannesson; Y Wang; P Engström
Journal:  Plant Mol Biol       Date:  2001-01       Impact factor: 4.076

4.  Analysis of the spacing between the two palindromes of activation sequence-1 with respect to binding to different TGA factors and transcriptional activation potential.

Authors:  Stefanie Krawczyk; Corinna Thurow; Ricarda Niggeweg; Christiane Gatz
Journal:  Nucleic Acids Res       Date:  2002-02-01       Impact factor: 16.971

5.  Database-assisted promoter analysis.

Authors:  R Hehl; E Wingender
Journal:  Trends Plant Sci       Date:  2001-06       Impact factor: 18.313

6.  The auxin, hydrogen peroxide and salicylic acid induced expression of the Arabidopsis GST6 promoter is mediated in part by an ocs element.

Authors:  W Chen; K B Singh
Journal:  Plant J       Date:  1999-09       Impact factor: 6.417

7.  Functional conservation of plant secondary metabolic enzymes revealed by complementation of Arabidopsis flavonoid mutants with maize genes.

Authors:  X Dong; E L Braun; E Grotewold
Journal:  Plant Physiol       Date:  2001-09       Impact factor: 8.340

8.  DNA binding properties of the Arabidopsis floral development protein AINTEGUMENTA.

Authors:  S Nole-Wilson; B A Krizek
Journal:  Nucleic Acids Res       Date:  2000-11-01       Impact factor: 16.971

9.  Analysis of the genome sequence of the flowering plant Arabidopsis thaliana.

Authors: 
Journal:  Nature       Date:  2000-12-14       Impact factor: 49.962

10.  The Arabidopsis TT2 gene encodes an R2R3 MYB domain protein that acts as a key determinant for proanthocyanidin accumulation in developing seed.

Authors:  N Nesi; C Jond; I Debeaujon; M Caboche; L Lepiniec
Journal:  Plant Cell       Date:  2001-09       Impact factor: 11.277
