AthaMap web tools for the analysis and identification of co-regulated genes.

Claudia Galuschka, Martin Schindler, Lorenz Bülow, Reinhard Hehl.

Abstract

The AthaMap database generates a map of cis-regulatory elements for the whole Arabidopsis thaliana genome. This database has been extended by new tools to identify common cis-regulatory elements in specific regions of user-provided gene sets. A resulting table displays all cis-regulatory elements annotated in AthaMap, including positional information relative to the respective gene. Further tables give overviews of the number of individual transcription factor binding sites (TFBS) present and of the TFBS common to the whole set of genes, so that overrepresented cis-elements are easily identified. These features were used to detect a specific enrichment of drought-responsive elements in cold-induced genes. For identification of co-regulated genes, the output table of the colocalization function was extended to show the closest genes and their distances relative to the colocalizing TFBS. Gene sets determined by this function can be used for co-regulation analysis in microarray gene expression databases such as Genevestigator or PathoPlant. Additional improvements to AthaMap include display of the gene structure in the sequence window and a significant increase in data content. AthaMap is freely available at http://www.athamap.de/.

Year: 2006; PMID: 17148485; PMCID: PMC1761422; DOI: 10.1093/nar/gkl1006

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971

INTRODUCTION

Bioinformatic tools in molecular biology can readily generate hypotheses for the directed design of experiments. Bioinformatic gene expression analysis is supported by increasing data on spatial and temporal gene expression and on transcription factors (TFs). Gene transcription is mainly regulated by the binding of TFs to cis-regulatory sequences. The presence of a cis-sequence is the prerequisite for direct DNA binding that promotes or represses transcription of the gene. Eukaryotic regulation of gene expression is complex and involves the synchronized binding of TFs to adjacent cis-regulatory sequences (1). A colocalization analysis of TF binding sites (TFBS) is useful for predicting such combinatorial effects on gene expression. Furthermore, binding of TFs can coordinately regulate whole sets of genes. Bioinformatic methods have been established to predict putative TF binding sites in DNA sequences. Web-based resources for detecting TFBS or cis-regulatory sequences in plant genes, not restricted to Arabidopsis thaliana, include PLACE, PlantCARE, and TRANSFAC® (2–4). Genome-wide detection of binding sites can be performed online with the regulatory sequence analysis (RSA) tools (5). A similar genomic sequence search in Arabidopsis can be performed using PatMatch at TAIR (6,7). Pattern recognition programs such as MatInspector, Match or Patser use alignment matrices, which are derived, for example, from random binding site selection experiments that determine a set of DNA sequences bound by the same factor (8–10). Using Patser, the AthaMap database was established for A.thaliana. This database generates a genome-wide map of putative TF binding sites determined from alignment matrices (11). Web tools have been implemented for the detection of colocalizing cis-regulatory elements in the genome (12), and combinatorial elements based on known TF interactions have been identified.
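The alignment-matrix scanning mentioned above can be illustrated with a minimal log-odds sketch. It is not Patser's implementation: it assumes a uniform background of 0.25 per base, a pseudocount of 0.25 per base and, purely for illustration, a threshold of 80% of the maximum matrix score (AthaMap's actual thresholds come from Patser). The five TINY2 screening sequences from Table 2 are reused here as if they were an aligned site set, and the function names are hypothetical.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(sites, pseudocount=0.25):
    """Log-odds matrix from equal-length aligned sites against a uniform background."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("aligned sites must have equal length")
    total = len(sites) + 4 * pseudocount
    matrix = []
    for i in range(length):
        column = [s[i] for s in sites]
        matrix.append({b: math.log(((column.count(b) + pseudocount) / total) / 0.25)
                       for b in BASES})
    return matrix


def score(matrix, kmer):
    """Sum of per-position log-odds values for one window."""
    return sum(row[base] for row, base in zip(matrix, kmer))


def max_score(matrix):
    """Best achievable score: the highest value in every column."""
    return sum(max(row.values()) for row in matrix)


def scan(matrix, sequence, threshold):
    """Return (1-based start, strand, score) for every window scoring >= threshold."""
    width = len(matrix)
    seq = sequence.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if set(window) - set(BASES):
            continue  # skip windows containing N or other ambiguity codes
        reverse = window.translate(COMPLEMENT)[::-1]
        for strand, kmer in (("+", window), ("-", reverse)):
            s = score(matrix, kmer)
            if s >= threshold:
                hits.append((i + 1, strand, s))
    return hits


if __name__ == "__main__":
    sites = ["CTACCGACAT", "CTGCCGACAT", "CTTCCGACAT", "CTATCGACAT", "CTACCGTCAT"]
    matrix = log_odds_matrix(sites)
    print(max_score(matrix))
    print(scan(matrix, "GGCTACCGACATGG", threshold=0.8 * max_score(matrix)))
```

Both strands are scored because a site may lie in either orientation; every reported score falls between the chosen threshold and the matrix maximum.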
In addition to positional weight matrix-based detection of binding sites, experimentally verified binding sites were annotated as well (13). The previous version of AthaMap contained the genomic positions of more than 8 × 10⁶ putative TFBS for 88 TFs from 21 different families. Another resource for cis-regulatory sequences in A.thaliana is AGRIS (14,15). In contrast to AGRIS, AthaMap covers the whole A.thaliana genome and is mainly based on binding site detection by positional weight matrices. Co-regulation of genes may be directed by similar combinations of cis-regulatory elements. For A.thaliana, several web-based services harbour gene expression data from microarray experiments and allow recovery of co-regulated genes, for example TAIR, NASCArrays tools, Stanford Microarray Database, Botany Array Resource, GEO, and Genevestigator (6,16–21). For the detection of gene clusters with similar expression patterns, ACT, Botany Array Resource, CSB.DB, and Genevestigator can be used (19,21–24). To enable discovery and analysis of common cis-regulatory elements annotated in AthaMap, a new Gene Analysis feature has been developed that allows comparative analysis of cis-elements in sets of co-transcribed genes. Genes that may share expression patterns can also be identified through colocalizing TFBS. For this purpose, the colocalization function has been improved to identify gene sets harbouring similar combinations of TFBS. Furthermore, the data content of AthaMap has increased significantly, and the gene structure is now shown in the sequence display window.

INCREASED FUNCTIONALITY OF ATHAMAP

The Gene Analysis web tool

To identify and analyze co-regulated genes for common TFBS, the Gene Analysis web tool has been implemented. On the Gene Analysis page at AthaMap, a gene list is entered as locus identifiers (AGI), one gene per line. In addition to the gene list, the region of the genes to be analyzed needs to be specified: the upstream and downstream borders of the analyzed region are entered relative to the annotated translation start point. Because every matrix-based TF binding site has a specific score between the threshold and maximum score defined by Patser, the search can also be restricted to more highly conserved TFBS (12). As an example of co-regulated genes, the Demo button enters three genes into the input area. By default, the genomic region for analysis ranges from −500 bp upstream to +50 bp downstream relative to the translation start point, and no restriction to more highly conserved TFBS is set. A search result table lists the TF binding sites in the analyzed genomic region in detail. It displays the gene, the name and family of the transcription factor, and the chromosomal position of its TFBS. In addition, the distance of the binding site relative to the translation start point and the orientation of the binding site relative to the gene are given; a plus sign means that the TF binding site and the gene have the same orientation. For matrix-based TFBS, the maximum score and threshold score of the screening matrix are also given, together with the individual score of the TFBS as a measure of sequence conservation. All listed genes and TFBS positions are linked to the sequence window, which displays a single gene in the genomic context of surrounding binding sites. Because a gene may harbour more than one binding site for a specific TF, an overview table can be selected with the ‘Show overview’ link. This results in a list of all gene-factor combinations having at least one binding site.
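The region selection and the distance and orientation columns described above can be sketched as follows. This is an illustration under stated assumptions, not AthaMap's code: coordinates are 1-based chromosome positions, the first base of the start codon is distance 0, upstream distances are negative, and for reverse-strand genes upstream means higher coordinates. The record types, field names and the `overview` helper (which counts a site at distance 0 as downstream) are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    agi: str
    chromosome: int
    translation_start: int  # first base of the start codon (distance 0)
    strand: str             # "+" forward, "-" reverse complement


@dataclass(frozen=True)
class Site:
    factor: str
    chromosome: int
    position: int
    strand: str


def relative_position(gene: Gene, site: Site) -> int:
    """Signed distance to the translation start; negative values lie upstream."""
    offset = site.position - gene.translation_start
    return offset if gene.strand == "+" else -offset


def sites_in_region(gene, sites, upstream=-500, downstream=50):
    """Return (factor, distance, same_orientation) rows inside [upstream, downstream]."""
    rows = []
    for site in sites:
        if site.chromosome != gene.chromosome:
            continue
        distance = relative_position(gene, site)
        if upstream <= distance <= downstream:
            rows.append((site.factor, distance, site.strand == gene.strand))
    return sorted(rows, key=lambda row: row[1])


def overview(rows):
    """Per-factor counts of upstream/downstream sites and of plus/minus orientation."""
    table = defaultdict(Counter)
    for factor, distance, same_orientation in rows:
        table[factor]["upstream" if distance < 0 else "downstream"] += 1
        table[factor]["plus" if same_orientation else "minus"] += 1
    return dict(table)


if __name__ == "__main__":
    # Hypothetical reverse-strand gene and sites, for illustration only
    gene = Gene("demo_gene", 1, 10_000, "-")
    sites = [Site("F1", 1, 10_200, "-"), Site("F1", 1, 9_980, "+"), Site("F2", 1, 10_700, "-")]
    rows = sites_in_region(gene, sites)
    print(rows)
    print(overview(rows))
```

With the default window, the site 200 bp upstream and the site 20 bp into the gene are kept, while the F2 site 700 bp upstream is dropped.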
The overview table shows the gene, the TF, the TF family and the number of TFBS detected. The numbers of sites located upstream and downstream of the translation start point, as well as their orientation relative to the gene, are also given. This and all other tables can easily be copied and exported to a spreadsheet program for further data processing. Orchestrated regulation of genes involves binding of specific TFs to sets of genes. Selecting ‘Show factors that are common in genes’ displays the occurrence of binding sites across the whole set of genes from a Gene Analysis search. This table shows all TFs with binding sites identified in the gene set, sorted by the total number of genes per TF. It also gives the total number of sites detected in the set of genes. To estimate TFBS frequencies, the theoretical number of TF binding sites in the analyzed genomic regions is shown as well. This number is based on a random distribution of all annotated binding sites of the respective TF across the genome. The ratio between the observed and the theoretical occurrence of TFBS shows whether particular TF binding sites are over- or underrepresented. Selecting ‘Show all factors’ extends the table to include all TFs whose binding sites are absent from the analyzed gene regions. A similar resource for inspecting Arabidopsis promoter sets for cis-regulatory sequences is Athena (25). Important differences between AthaMap and Athena are the fixed promoter region of 3 kb in Athena versus the flexible gene region selection in AthaMap. Furthermore, the data content differs. Athena binding sites are based on 105 TF consensus sequences from PLACE and AGRIS (2,14), whereas AthaMap is mainly based on alignment matrices of TFBS (11). This leads to a much higher TF binding site density in AthaMap: Athena has ∼30 TFBS in each promoter region of 3 kb (25).
In comparison, including the data update presented here, AthaMap has a density of ∼260 TFBS per 3 kb region.
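The ‘Show factors that are common in genes’ summary, including the theoretical count and the observed-to-theoretical ratio, can be sketched as below. Following the text, the theoretical count assumes that a factor's genome-wide sites are spread uniformly at random, i.e. genome-wide sites × analysed length ÷ 119 186 497 bp. The input layout and function name are hypothetical. The usage reuses the DREB1A figures from the cold-induced gene example (552 genome-wide sites, 1500 bp analysed, 11 sites in three genes); the per-gene split of those 11 sites and the second factor are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

GENOME_LENGTH_BP = 119_186_497  # total A. thaliana sequence length quoted in the text


@dataclass
class FactorRow:
    factor: str
    genes: int       # genes in the set with at least one site
    sites: int       # observed sites across the analysed regions
    expected: float  # theoretical count under a uniform random distribution
    ratio: float     # observed / expected


def common_factors(hits, genome_sites, analysed_bp, show_all=False):
    """hits: (gene, factor) pairs, one per TFBS found in the analysed regions.
    genome_sites: factor -> number of sites annotated genome-wide.
    analysed_bp: summed length of all analysed gene regions.
    show_all: also list factors without any site in the gene set."""
    genes_per_factor = defaultdict(set)
    sites_per_factor = defaultdict(int)
    for gene, factor in hits:
        genes_per_factor[factor].add(gene)
        sites_per_factor[factor] += 1
    factors = genome_sites if show_all else genes_per_factor
    rows = []
    for factor in factors:
        observed = sites_per_factor.get(factor, 0)
        expected = genome_sites[factor] * analysed_bp / GENOME_LENGTH_BP
        ratio = observed / expected if expected else float("nan")
        rows.append(FactorRow(factor, len(genes_per_factor.get(factor, ())),
                              observed, expected, ratio))
    # Factors shared by most genes first, then by number of sites
    rows.sort(key=lambda row: (-row.genes, -row.sites, row.factor))
    return rows


if __name__ == "__main__":
    # DREB1A totals from the text; the 4/4/3 split per gene is invented
    hits = ([("At2g42540", "DREB1A")] * 4 + [("At2g42530", "DREB1A")] * 4
            + [("At5g52310", "DREB1A")] * 3)
    genome_sites = {"DREB1A": 552, "FACTOR_X": 1000}  # FACTOR_X is hypothetical
    for row in common_factors(hits, genome_sites, analysed_bp=1500, show_all=True):
        print(row)
```

For DREB1A this gives a theoretical count of about 0.007 sites against 11 observed, a ratio of roughly 1600.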

Specific enrichment of drought-responsive elements in cold-induced genes

To demonstrate the functionality of the Gene Analysis web tool, three cold-inducible genes (cor15a: At2g42540, cor15b: At2g42530, and rd29A: At5g52310) were used as an example (26). The genomic region analyzed was first restricted to the upstream regions (−500 bp upstream, 0 bp downstream). The output of this Gene Analysis search is displayed in a detailed result list. The distribution of binding sites across the whole set of genes can be analyzed by selecting ‘Show factors that are common in genes’. Figure 1 shows that all three genes harbour DREB1A (CBF3), DREB1B (CBF1) and DREB1C (CBF2) binding sites in the upstream region. A P-value of 4.36 × 10⁻³² was determined for the occurrence of 11 or more DREB1A binding sites within the 500 bp upstream regions of the three selected genes. This value was determined from the total number of 552 DREB1A TFBS identified in the genome (AthaMap documentation page), the total Arabidopsis genome sequence length of 119 186 497 bp, and the total of 1500 bp analysed for DREB1A binding sites. For the six DREB1B and DREB1C TFBS, the P-value is 6.21 × 10⁻²⁰. It has been demonstrated that these TFs, which are members of the AP2/EREBP family, can activate cold-induced genes by binding to the DRE/CRT cis-acting elements present in their promoter regions (27–29). The three sample genes are regulated by members of the AP2/EREBP transcription factor family, namely the CBF/DREBs (26).
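The text does not state which statistical model produced these P-values. One plausible reading, sketched below as an assumption, is a binomial upper tail: each of the 1500 analysed positions is treated as an independent trial with success probability 552/119 186 497, and P(X ≥ 11) is summed term by term in log space so that the tiny probabilities do not underflow. Under this model the result comes out at about 4.4 × 10⁻³², in line with the reported value. The DREB1B/C value cannot be checked this way because their genome-wide site counts are not given here; the function and constant names are hypothetical.

```python
import math


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)  # far-tail terms underflow harmlessly to 0.0
    return total


GENOME_LENGTH_BP = 119_186_497  # from the text
DREB1A_GENOME_SITES = 552       # from the text
ANALYSED_BP = 3 * 500           # three genes, 500 bp upstream each
OBSERVED_SITES = 11

if __name__ == "__main__":
    p_site = DREB1A_GENOME_SITES / GENOME_LENGTH_BP
    print(f"expected sites: {p_site * ANALYSED_BP:.4f}")
    p_value = binomial_upper_tail(OBSERVED_SITES, ANALYSED_BP, p_site)
    print(f"P(X >= {OBSERVED_SITES}) = {p_value:.3g}")
```

Summing the tail directly, rather than computing 1 minus the lower tail, matters here: the lower tail is so close to 1 that the subtraction would lose every significant digit in floating point.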
Figure 1

Partial screenshot of a Gene Analysis result page. Only the first lines of the result table showing common binding sites in the gene set are displayed.

In a further analysis of these genes, the genomic region was restricted to 0 bp upstream and 500 bp downstream to determine whether AP2/EREBP binding site overrepresentation is specific to the upstream regions. In this analysis, no DREB1A (CBF3), DREB1B (CBF1) and DREB1C (CBF2) binding sites were identified (data not shown). This indicates specific accumulation of these binding sites in the upstream region of the three genes. This example demonstrates the application of the Gene Analysis function for a set of co-regulated genes.

Extended functionality of the Colocalization Analysis web tool

The AthaMap colocalization web tool permits the positional identification of putative combinatorial elements (12). In the earlier version of AthaMap, only the chromosomal position of a predicted combinatorial element was shown. Now, the locus IDs of the genes closest to all colocalizing TF binding sites and their distances relative to the translation start sites are identified as well. For colocalization analysis, either a TF can be selected by name from the complete list of TFs, or a factor family can be chosen and one member of this family selected. A prefix symbol in front of the factor name indicates how the TF binding sites were identified. A bar (–) precedes all TFs annotated by matrix-based searches (11). A double bar (=) is assigned to combinatorial elements (12). TFs with binding sites derived from experimentally verified single sites based on consensus sequences are preceded by an arrow (>) (13). After a colocalization analysis, the resulting table specifies the chromosomal positions of the colocalizing binding sites and the locus ID of the nearest annotated gene, together with the distance relative to the start point of translation; upstream positions are preceded by a minus sign. The locus IDs in this table are linked to the sequence display window showing the genomic context of the gene. Furthermore, an entire result list can easily be submitted to Gene Analysis via the respective link to determine further common cis-regulatory elements. Another link exports the gene list directly to the microarray gene expression analysis form in PathoPlant for the identification of co-expression during plant-pathogen interaction (30). Such a gene list can also be used with other programs for co-expression analysis in microarray databases such as Genevestigator (21).
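Assigning the nearest gene to a colocalizing binding site can be sketched with a sorted index of translation starts and a binary search. The sketch assumes that "closest" is measured to the translation start, uses the same sign convention as the result table (minus for upstream), and treats reverse-strand genes as running towards lower coordinates. The class and method names are hypothetical; the text does not describe AthaMap's own procedure.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    agi: str
    translation_start: int  # chromosomal coordinate of the start codon
    strand: str             # "+" forward, "-" reverse complement


class ChromosomeIndex:
    """Genes of one chromosome, sorted by translation start for binary search."""

    def __init__(self, genes):
        self.genes = sorted(genes, key=lambda gene: gene.translation_start)
        self.starts = [gene.translation_start for gene in self.genes]

    def nearest(self, position):
        """Return (AGI, signed distance) of the gene whose translation start is closest.

        Negative distances are upstream of the translation start; ties go to the
        gene with the lower coordinate. Raises ValueError if the index is empty.
        """
        i = bisect.bisect_left(self.starts, position)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(self.genes)]
        best = min(candidates, key=lambda j: abs(self.starts[j] - position))
        gene = self.genes[best]
        offset = position - gene.translation_start
        return gene.agi, offset if gene.strand == "+" else -offset


if __name__ == "__main__":
    # Hypothetical genes and coordinates, for illustration only
    index = ChromosomeIndex([Gene("gene_A", 1_000, "+"), Gene("gene_B", 5_000, "-")])
    print(index.nearest(4_200))
```

A site at position 4 200 lies 800 bp inside the reverse-strand gene_B, so it is reported as +800. Only the two neighbouring translation starts around the insertion point need to be compared, so each lookup is logarithmic in the number of genes.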

IMPROVED SEQUENCE DISPLAY AND INCREASED DATA CONTENT

Since the last update of AthaMap, a significant change in data display has been implemented. In the earlier version, whole genes were displayed as underlined sequence stretches (11). Now, gene structure elements, i.e. untranslated regions (UTRs), exons and introns, are also identified. The annotation of gene structure is based on XML flatfiles downloaded from the TIGR web site (release 5.0) (31). These flatfiles were parsed using a Perl script, and positional information for 5′- and 3′-UTRs, exons and introns was annotated to AthaMap. These regions are displayed in AthaMap with a colour code similar to the one used by TAIR (6), which is explained in the sequence display window. The orientation of each gene is indicated below the sequence display window: forward means that the gene is encoded on the annotated and displayed DNA strand, reverse means that it is encoded on the reverse complement strand. For further information on the respective gene, a short description is provided together with direct links to the TAIR, TIGR, and MIPS records (6,31,32). The data content of AthaMap was increased by nine new alignment matrices derived from eight TFs, by another eight TFs with single site-based binding sites, and by one combinatorial element. These putative binding sites were determined as reported earlier (11–13). Table 1 lists the number of new TF binding sites detected with each matrix and the reference for the alignment matrix. For SPL3 and SPL8, the sequences for alignment matrix generation were obtained directly from the authors of the respective publication (Table 1). Sequences for the new single site-based screenings are shown in Table 2. One new combinatorial element was annotated as well: binding sites derived from both HOX2a matrices were used to determine combinatorial HOX2a elements (33).
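Deriving UTR, exon and intron segments from exon and coding-sequence coordinates, as such a gene-structure display requires, can be sketched in a few lines. This Python sketch is not the authors' Perl script and does not reproduce the TIGR XML layout; it assumes 1-based inclusive coordinates, that "exon" denotes the coding part of an exon (UTRs are drawn separately), and that on reverse-strand genes the lower-coordinate UTR is the 3′-UTR. The function name is hypothetical.

```python
def gene_structure(exons, cds_start, cds_end, strand):
    """Split exon coordinates into UTR, coding-exon and intron segments.

    exons: (start, end) pairs, 1-based and inclusive, on the displayed strand.
    cds_start, cds_end: lowest and highest coordinate of the coding sequence.
    strand: "+" (forward) or "-" (reverse); decides which UTR is 5' and which 3'.
    Returns (feature, start, end) tuples sorted by start coordinate.
    """
    low_utr, high_utr = ("5'UTR", "3'UTR") if strand == "+" else ("3'UTR", "5'UTR")
    exons = sorted(exons)
    features = []
    for start, end in exons:
        if start < cds_start:  # exon part before the coding sequence
            features.append((low_utr, start, min(end, cds_start - 1)))
        if end >= cds_start and start <= cds_end:  # coding part
            features.append(("exon", max(start, cds_start), min(end, cds_end)))
        if end > cds_end:  # exon part after the coding sequence
            features.append((high_utr, max(start, cds_end + 1), end))
    # Introns are the gaps between consecutive exons
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        features.append(("intron", prev_end + 1, next_start - 1))
    return sorted(features, key=lambda feature: feature[1])


if __name__ == "__main__":
    # Hypothetical two-exon gene on the reverse strand
    for feature in gene_structure([(300, 400), (100, 200)], 150, 350, "-"):
        print(feature)
```

Only the strand flag changes between forward and reverse genes: the segments are identical, but which UTR is labelled 5′ swaps.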
AthaMap now contains 9 872 372 TF binding sites detected with alignment matrices, 94 963 TF binding sites detected with experimentally verified TFBS, and 359 867 combinatorial elements based on known TF interactions. The TFs annotated in AthaMap cover most plant TF families (34). Table 3 summarizes the TF families and the number of different TFs represented in AthaMap.
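As a rough cross-check of the density of ∼260 TFBS per 3 kb quoted in the Gene Analysis section, the totals above can be divided by the genome length given in the cold-induction example. Counting all three categories together is an assumption about how that density figure was obtained; the variable names are illustrative.

```python
GENOME_LENGTH_BP = 119_186_497  # from the text

# Totals from the text
annotated = {
    "matrix-based TFBS": 9_872_372,
    "single-site TFBS": 94_963,
    "combinatorial elements": 359_867,
}

total = sum(annotated.values())
per_3kb = total / GENOME_LENGTH_BP * 3_000

if __name__ == "__main__":
    print(f"{total:,} sites, about {per_3kb:.0f} per 3 kb")
```

The combined total of 10 327 202 sites corresponds to roughly 260 per 3 kb, consistent with the figure given for AthaMap.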
Table 1

Matrix-based AthaMap data increase

Factor | Family | Species | Number of sites | Reference for alignment matrix
AGL1 | MADS | A.thaliana | 108 045 | (35)
AGL2 | MADS | A.thaliana | 55 389 | (35)
SPL3 | SBP | A.thaliana | 74 680 | (36)
SPL8 | SBP | A.thaliana | 146 721 | (36)
HOX2a(1) | HD-HOX | Zea mays | 755 289 | (33)
HOX2a(2) | HD-HOX | Zea mays | 495 484 | (33)
bHLH66 | bHLH | Oryza sativa | 6811 | (37)
CBT | CAMTA | Oryza sativa | 887 | (38)
LEC2 | ABI3/VP1 | A.thaliana | 139 340 | (39)

Number of putative binding sites detected with nine new alignment matrices for eight TFs in the A.thaliana genome.

Table 2

Site-based AthaMap data increase

Family | Factor | Synonyms | AGI | Screening sequences | Number of sites | Reference
AP2/EREBP | ERF7 | | At3g20310 | GAGCCGCCA | 344 | (40)
AP2/EREBP | TINY2 | DREB3 | At5g11590 | GAGCCGCCAC, CTACCGACAT, CTGCCGACAT, CTTCCGACAT, CTATCGACAT, CTACCGTCAT | 659 | (41)
bHLH | MYC2 | JIN1, ZBF1, bHLH6, RAP1 | At1g32640 | TATACGTGTC, GACACGTGGC | 662 | (42)
CAMTA | SR1 | | At2g22300 | AAACGCGGAA, AAACGCGTAA, AAACGCGCAA, AACCGCGGAA, AACCGCGCAA | 427 | (43)
MYB | CCA1 | | At2g46830 | GCTAACTCG | 309 | (44)
MYB | LHY | | At1g01060 | GCTAACTCG | 309 | (44)
MYB | WER | MYB66 | At5g14750 | TCTCCAACTG, TACAACCGCA, TCTATGCCTG, TACGCATGCA, ACTAACGGTA, ACTAACAGTA | 877 | (45)
WRKY(Zn) | WRKY40 | | At1g80840 | CTTTGACCAA | 641 | (46)

A.thaliana TFs and screening sequences are listed with the corresponding core sequences being underlined. The number of sites annotated in AthaMap and the respective references are given.

Table 3

Transcription factor families represented in AthaMap

Family | Number of TFs
ABI3/VP1 | 3
AP2/EREBP | 16
bHLH | 3
bZIP | 11
C2C2 (Zn) Dof | 1
C2C2 (Zn) GATA | 4
C2H2(Zn) | 2
CAMTA | 2
CBF | 1
E2F/DP | 6
GARP | 1
GARP/ARR-B | 2
GATA | 1
HD-HOX | 1
HD-KNOTTED | 1
HD-PHD | 1
HD-ZIP | 6
MADS | 5
MYB | 14
NAC | 5
SBP | 6
TBP | 1
TCP | 2
Trihelix | 4
WRKY(Zn) | 4
Total | 103

For the CAT and TATA box binding proteins (CBF, TBP), the alignment matrices were extracted from the PlantProm database (47).

REFERENCES (47 in total, 10 shown)

Lorenz Bülow; Martin Schindler; Claudia Choi; Reinhard Hehl. PathoPlant: a database on plant-pathogen interactions. In Silico Biol, 2004.

Gang-Ping Xue. A CELD-fusion method for rapid determination of the DNA-binding sequence specificity of novel plant DNA-binding proteins. Plant J, 2005-02.

Kiana Toufighi; Siobhan M Brady; Ryan Austin; Eugene Ly; Nicholas J Provart. The Botany Array Resource: e-Northerns, Expression Angling, and promoter analyses. Plant J, 2005-07.

Yoshihiro Koshino-Kimura; Takuji Wada; Tatsuhiko Tachibana; Ryuji Tsugeki; Sumie Ishiguro; Kiyotaka Okada. Regulation of CAPRICE transcription by MYB proteins for root epidermis differentiation in Arabidopsis. Plant Cell Physiol, 2005-03-28.

Vandana Yadav; Chandrashekara Mallappa; Sreeramaiah N Gangappa; Shikha Bhatia; Sudip Chattopadhyay. A basic helix-loop-helix transcription factor in Arabidopsis, MYC2, acts as a repressor of blue light-mediated photomorphogenic growth. Plant Cell, 2005-05-27.

J L Riechmann; J Heard; G Martin; L Reuber; C Jiang; J Keddie; L Adam; O Pineda; O J Ratcliffe; R R Samaha; R Creelman; M Pilgrim; P Broun; J Z Zhang; D Ghandehari; B K Sherman; G Yu. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science, 2000-12-15.

Nils Ole Steffens; Claudia Galuschka; Martin Schindler; Lorenz Bülow; Reinhard Hehl. AthaMap web tools for database-assisted identification of combinatorial cis-regulatory elements and the display of highly conserved transcription factor binding sites in Arabidopsis thaliana. Nucleic Acids Res, 2005-07-01.

Thomas Yan; Danny Yoo; Tanya Z Berardini; Lukas A Mueller; Dan C Weems; Shuai Weng; J Michael Cherry; Seung Y Rhee. PatMatch: a program for finding patterns in peptide and nucleotide sequences. Nucleic Acids Res, 2005-07-01.

Tanya Barrett; Tugba O Suzek; Dennis B Troup; Stephen E Wilhite; Wing-Chi Ngau; Pierre Ledoux; Dmitry Rudnev; Alex E Lash; Wataru Fujibuchi; Ron Edgar. NCBI GEO: mining millions of expression profiles--database and tools. Nucleic Acids Res, 2005-01-01.

Brian J Haas; Jennifer R Wortman; Catherine M Ronning; Linda I Hannick; Roger K Smith; Rama Maiti; Agnes P Chan; Chunhui Yu; Maryam Farzad; Dongying Wu; Owen White; Christopher D Town. Complete reannotation of the Arabidopsis genome: methods, tools, protocols and the final release. BMC Biol, 2005-03-22.
