
STIFDB2: an updated version of plant stress-responsive transcription factor database with additional stress signals, stress-responsive transcription factor binding sites and stress-responsive genes in Arabidopsis and rice.

Mahantesha Naika, Khader Shameer, Oommen K Mathew, Ramanjini Gowda, Ramanathan Sowdhamini.

Abstract

Understanding the principles of abiotic and biotic stress responses, tolerance and adaptation remains important in plant physiology research to develop better varieties of crop plants. Better understanding of plant stress response mechanisms and application of knowledge derived from integrated experimental and bioinformatics approaches are gaining importance. Earlier, we showed that compiling a database of stress-responsive transcription factors and their corresponding target binding sites in the form of Hidden Markov models at promoter, untranslated and upstream regions of stress-up-regulated genes from expression analysis can help in elucidating various aspects of the stress response in Arabidopsis. In addition to the extensive content in the first version, STIFDB2 is now updated with 15 stress signals, 31 transcription factors and 5,984 stress-responsive genes from three species (Arabidopsis thaliana, Oryza sativa subsp. japonica and Oryza sativa subsp. indica). We have employed an integrated biocuration and genomic data mining approach to characterize the data set of transcription factors and consensus binding sites from literature mining and stress-responsive genes from the Gene Expression Omnibus. STIFDB2 currently has 38,798 associations of stress signals, stress-responsive genes and transcription factor binding sites predicted using the Stress-responsive Transcription Factor (STIF) algorithm, along with various functional annotation data. As a unique plant stress regulatory genomics data platform, STIFDB2 can be utilized for targeted as well as high-throughput experimental and computational studies to unravel principles of the stress regulome in dicots and gramineae. STIFDB2 is available from the URL: http://caps.ncbs.res.in/stifdb2.

Year:  2013        PMID: 23314754      PMCID: PMC3583027          DOI: 10.1093/pcp/pcs185

Source DB:  PubMed          Journal:  Plant Cell Physiol        ISSN: 0032-0781            Impact factor:   4.927


Introduction

Among the world population, around 3.1 billion people from developing countries live in rural areas. For a large subset of this population (∼2.5 billion people), agriculture is the primary source of livelihood, and it also contributes to economic growth, accounting for 30% of the gross domestic product (GDP) (FAO 2012). By the middle of the 21st century, the world population is expected to reach about 10 billion, and we will witness serious food shortages (Smith et al. 2010). The increasing pressure on global food productivity due to climate change, combined with the drastic increase in population, results in a demand for crop varieties that are adaptive and resistant to a variety of stresses. Considering these various socio-economic and agro-economic factors, sustainable agricultural production is urgently needed to meet these challenges (Takeda and Matsuoka 2008, Turner et al. 2009, Newton et al. 2011). Plants are sessile and are often exposed to a wide range of both biotic and abiotic stresses; they have developed intricate mechanisms to detect precise environmental changes, allowing optimal responses to adverse conditions (Atkinson and Urwin 2012). Biotic stress factors including bacteria, fungi, viruses, nematodes and herbivorous insects, and abiotic stress factors such as drought (Dubouzet et al. 2003, Mahajan and Tuteja 2005, Chaves et al. 2009, Fleury et al. 2010, Harb and Perreira 2011), cold (Warren 1998), heat (Kang et al. 2011), salinity (Ulm et al. 2002, Mahajan and Tuteja 2005, Wallia et al. 2005), dehydration (Urao et al. 1993, Shinozaki and Yamaguchi-Shinozaki 2000, Tran et al. 2004), UV-B (Kilian et al. 2007), wounding (Cheong et al. 2002) and heavy metals (Jonak et al. 2004) cause 30–60% yield losses every year globally (Mantri et al. 2010).
Response and tolerance to these stresses are mediated by complex biological pathways and regulatory events involving multiple molecular components (Shinozaki and Yamaguchi-Shinozaki 2000, Zhu 2001, Xiong et al. 2002, Abe et al. 2003, Jayasekaran et al. 2006, Liu et al. 2007, Agrarwal and Jha 2010, Baena-Gonzalez 2010, Tran and Mochida 2010, Walley and Dehesh 2010, Sinha et al. 2011). Abiotic and biotic stresses may occur singly or in combination; they induce cellular damage at multiple stages of plant growth and development, and may cause varying degrees of phenotypic damage including reduction in growth, wilting and loss of leaves (Chinnusamy et al. 2004). Plants counter these effects by activating distinct signal transduction cascades, which in turn activate stress-responsive genes, ultimately leading to survival through transcriptional reprogramming at multiple stages of plant growth and development (Xiong and Zhu 2001). Understanding the basic biological mechanisms by which plants respond to multiple stresses is a prerequisite for the development of stress tolerance in crop plants. A wide variety of protein-coding genes, non-coding genes including microRNAs (Liu et al. 2008), transcription factors, epigenetic mechanisms and various biological pathways have been implicated in stress response (Singh et al. 2002, Hirayamna and Shinozaki 2007, Shinozaki and Yamaguchi-Shinozaki 2007, Kilian et al. 2012), tolerance (Sreenivasulu et al. 2007, Roy et al. 2011) and adaptation (Mirouze and Paszkowski 2011). Some genes have been reported to respond to both biotic and abiotic stress signals (Mantri et al. 2010). Transcription factors are the master regulatory elements that directly bind to their distinct cis-regulatory elements and activate expression of many downstream genes, resulting in various mechanisms including stress tolerance (Agrarwal and Jha 2010).
Various transcription factors such as AREB/ABF, MYB, AP2/EREBP, bZIP, MYC, HSF, DREB1/CBF, NAC, HB and WRKY have been shown to influence stress response in plants (Singh et al. 2002, Shameer et al. 2009). Traditional methods to characterize stress-responsive transcription factors include nitrocellulose binding assays, DNA footprinting methods, gel-shift analysis, Southwestern blotting of both DNA and protein, and high-throughput techniques such as chromatin immunoprecipitation chip (ChIP-chip) or ChIP-seq. While experimental methods are highly accurate, identifying and characterizing the role of a given gene in a given stress response event is often laborious and time consuming (Bulyk 2003, Zhou et al. 2010, Krasensky and Jonak 2012). To overcome this, computational approaches offer a platform to gather information by integrating various public data sets and sensitive prediction algorithms. Concurrent mining of public databases and analysis using robust algorithms would provide a novel platform to understand the major molecular activities involved in stress response, adaptation and tolerance (Hu et al. 2003, Fernandez-Suarez and Birney 2008, Perez-Rodriguez et al. 2010, Guberman et al. 2011, Kinsella et al. 2011, Sucaet and Deva 2011, Spooner et al. 2012). Genomic technologies are revolutionizing 21st century plant biology with the advent of high-throughput experimental platforms, optimized assay systems and advanced bioinformatics approaches. DNA microarray is a high-throughput technology extensively used to investigate plant model organisms such as Arabidopsis and rice varieties to detect expression levels of multiple transcripts quantitatively in parallel.
Transcriptomic studies have been employed for quantitative analysis of thousands of genes expressed at germination, growth and development, fertilization, flowering, and under biotic and abiotic stress conditions, thereby providing a convenient medium to characterize the stress regulome in plants (Oktem et al. 2008). An extensive collection of expression data is currently available from public microarray databases such as Gene Expression Omnibus (GEO) (Barrett et al. 2001), ArrayExpress (Parkinson et al. 2007) and Gene Expression Atlas (Kapushesky et al. 2010). Mining such large-scale expression databases and integrating them with diverse data categories, using data interpretation algorithms, enables robust biological discoveries. The completion of the Arabidopsis thaliana (Arabidopsis Genome Initiative 2000) and Oryza sativa L. genomes (Yu et al. 2002, Zhao et al. 2004, International Rice Genome Sequencing Project 2005) further enhances the application of sensitive sequence search algorithms to predict putative transcription factor binding sites at the whole-genome level and to understand stress regulation via multiple transcription factors in plants. The stress-responsive mechanism in plants involves complex regulation of multiple genes and transcription factors. In the response to abiotic stresses such as ABA, drought, dehydration, cold, salinity, high light, heat and heavy metals, 10 specific families of transcription factors are known to be involved in A. thaliana and six specific families of transcription factors are known to be involved in O. sativa L. In plant biology, a detailed knowledge of the mechanisms of the transcriptional regulation of genes in response to biotic and abiotic stresses is an important paradigm.
We hypothesize that identifying putative binding sites for these stress-responsive transcription factors in the upstream regions of genes differentially up-regulated in microarray studies due to multiple stress signals could enable better understanding of the genes and regulatory events mediating stress response mechanisms. Earlier, we described the Stress-responsive Transcription factor database, STIFDB (Shameer et al. 2009), a database that catalogued information on stress-responsive genes and transcription factor binding sites for abiotic stress-responsive genes in A. thaliana. Multiple experimental studies that profiled stress responses in plants (Kang et al. 2011, Sanghera et al. 2011, Babitha et al. 2012) and computational studies (Mishra et al. 2009, Georgii et al. 2012) on stress-specific gene regulation have utilized the data compiled in STIFDB. In this paper, we report a new version of the database, called ‘STIFDB2’, as a data platform for the investigation of the plant stress regulome. To enable the utility of such a database for a wider plant research community, we have now updated the database with new features such as: the addition of agriculturally important crop species, new stress signals, curated transcription factors and their binding sites, additional stress-responsive genes from microarray experiments and orthologs recorded from other important crop plants. We have also integrated predicted orthologs for other agriculturally important crop species, i.e. maize, sorghum and soybean.

Results

The data obtained through biocuration and genomic data mining, stress-responsive transcription factor prediction and data integration were compiled as a web-based database called STIFDB2.

STIFDB2: a database for analysis of stress regulome in plants

STIFDB2 provides information on stress-responsive genes from a dicot (A. thaliana) and two gramineae species (O. sativa subsp. japonica and O. sativa subsp. indica). A total of 31 transcription factors were identified using the biocuration approach (see Table 1). A library of 15 stress signals that affect plants (ABA, aluminum, bacterial blight, cold, cold–drought–salt, dehydration, drought, heat, high light, iron, NaCl, osmotic stress, oxidative stress, UV-B and wounding) was also compiled (Table 2; details about transcription factors and consensus binding site data for A. thaliana were provided in Shameer et al. 2009). Transcription factor family/subfamily, cis-element, consensus binding site data and corresponding references (PubMed identifiers) used to perform transcription factor binding site prediction using stress-responsive genes in rice species are provided. Using the genomic data mining approach, a data set of 5,984 unique genes was identified and annotated with stress signals. STIFDB2 has 38,798 associations of stress signal, stress-responsive gene, transcription factor binding site, orientation of binding site and z-scores predicted using the STIF (Stress-responsive Transcription Factor) algorithm. Tables 3–5 provide a stress signal- and transcription factor-based summary of stress-responsive genes in STIFDB2. A summary of orthologs retrieved from sorghum, maize and soybean using stress-responsive genes in STIFDB2 is provided in Table 6. The percentage of multiple stress-responsive transcription factors responsive to various stress signals in STIFDB2 is provided in Fig. 1. Chromosomal distributions of genes in STIFDB2 for three species are provided in Fig. 2.
Table 1

Summary of species, stress-responsive genes, stress-responsive transcription factors and stress signals in STIFDB2

Species | Stress-responsive genes in STIFDB2 | Stress-responsive transcription factors | Stress signals
Arabidopsis thaliana | 3,150 | ABRE_ABI3_VP1, AuxRE_ARF, C_ABRE_bZIP, DREB_AP2_EREBP, GCC_box_AP2_EREBP, G_ABRE_bZIP, G_box1_bZIP, G_box2_bZIP, G_box_bHLH, HBE_HB, HSE1_HSF, Myb_box1_MYB, Myb_box2_MYB, Myb_box3_MYB, Myb_box4_MYB, Myb_box5_MYB, N_box_bHLH, Nac_box_NAC, W_box_WRKY | ABA, aluminum, cold, cold–drought–salt, dehydration, drought, high light, iron, NaCl, osmotic stress, oxidative stress, UV-B and wounding
Oryza sativa subsp. japonica | 1,118 | ABRE_ABI3_VP1, ABRE_bZIP, DREB_AP2_EREBP, G_box1_bZIP, Nac_box_NAC, OsIRO2_bHLH, PRE2_WRKY, PRE4_WRKY | ABA, bacterial blight, cold, drought, heat, iron and NaCl
Oryza sativa subsp. indica | 1,716 | DREB_AP2_EREBP, ABRE_bZIP, G_box1_bZIP, Nac_box_NAC | Cold, drought and NaCl

Transcription factors are named using the name of the cis-element followed by the transcription family name or subfamily name.

Table 2

Transcription factor family/subfamily, cis-element, consensus binding site data and corresponding references (PubMed identifiers) used to perform transcription factor binding site prediction using stress-responsive genes curated from the literature (PubMed identifiers of studies are provided) in Oryza sativa subsp. japonica and Oryza sativa subsp. indica

Transcription factor family name | Stress signal | Reference (Stress signal) | Name of the cis-element | Cis-element | Reference (Cis-element)

Oryza sativa subsp. japonica
ABI3/VP1 | Iron toxicity | 19737364 | ABRE | CATGC | 19737364
AP2/EREBP | Cold, drought, NaCl | 12609047, 15834008, 18470484 | CRT/DRE | RCCGAC (ACCGAC, GCCGAC) | 12609047
bHLH | Iron toxicity | 16887895 | OsIRO2 | CACGTGG | 16887895
bZIP | ABA, NaCl, drought, heat | 18236009 | G-box1 | CCACGTGTC | 18236009
 |  | 11828032 | ABRE | (C/T)ACGTGGC | 11828032
NAC | Cold, drought, NaCl, ABA | 20632034, 19135985, 16924117 |  | CATGTG | 17587305
WRKY | Bacterial blight | 17986178 | PRE2 | ACGCTGCCG | 17986178
 |  |  | PRE4 | TGCGCTT |

Oryza sativa subsp. indica
AP2/EREBP | Drought, cold, salinity | 12609047, 18470484 | CRT/DRE | (G/A)CCGAC | 12609047
bZIP | ABA, NaCl, drought | 10636868 | G-box1 | CCACGTGTC | 18236009
 |  | 19048288 | ABRE | (C/T)ACGTGGC | 11828032
NAC | Cold, drought, NaCl | 17587305 |  | CATGTG | 17587305
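
The consensus sites in Table 2 can be read as simple sequence patterns. The sketch below is only an illustration of how such a consensus might be located on both strands of an upstream sequence; it is a plain regular-expression scan and not the STIF algorithm, which is described as HMM-based and reports z-scores. The function names and the example sequence are hypothetical; the three consensus strings come from Table 2, with (G/A) and (C/T) written as the IUPAC codes R and Y.

```python
import re

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

# Consensus sites from Table 2; (G/A) -> R and (C/T) -> Y.
CONSENSUS = {
    "CRT/DRE": "RCCGAC",
    "ABRE": "YACGTGGC",
    "G-box1": "CCACGTGTC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC[base] for base in consensus)
    return re.compile("(?=(" + body + "))")


def scan(sequence, consensus):
    """Return sorted (start, strand, site) hits; start is 0-based on the forward strand."""
    sequence = sequence.upper()
    pattern = consensus_to_regex(consensus)
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(sequence)]
    n = len(sequence)
    for m in pattern.finditer(reverse_complement(sequence)):
        site = m.group(1)
        # Map the reverse-strand coordinate back onto the forward strand.
        hits.append((n - (m.start() + len(site)), "-", site))
    return sorted(hits)


# Hypothetical upstream fragment, used only to exercise the function.
example_hits = scan("TTACCGACAAGTCGGCTT", CONSENSUS["CRT/DRE"])
```

A palindromic site would be reported once per strand by this sketch; whether such hits should be merged is a design choice the source does not address.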
Table 6

Orthologs of stress-responsive genes compiled in STIFDB2 from Ensembl Plant BioMart using the Compara pipeline

Species | Stress-responsive genes in STIFDB2 | Orthologs retrieved using BioMart: Sorghum (Sorghum bicolor) | Maize (Zea mays) | Soybean (Glycine max)
Arabidopsis thaliana | 3,150 | 2,515 | 2,506 | 2,519
Oryza sativa subsp. japonica | 1,118 | 909 | 838 | 727
Oryza sativa subsp. indica | 1,716 | 1,155 | 1,055 | 926
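
To compare ortholog retrieval across species, the counts in Table 6 can be expressed as fractions of the STIFDB2 gene set. The short sketch below does only that arithmetic. It assumes that each ortholog count is the number of STIFDB2 genes for which an ortholog was retrieved, which the table does not state explicitly; the dictionary layout and function name are hypothetical.

```python
# Counts copied from Table 6: STIFDB2 genes per species and orthologs
# retrieved in sorghum, maize and soybean.
TABLE6 = {
    "Arabidopsis thaliana": {"genes": 3150, "sorghum": 2515, "maize": 2506, "soybean": 2519},
    "Oryza sativa subsp. japonica": {"genes": 1118, "sorghum": 909, "maize": 838, "soybean": 727},
    "Oryza sativa subsp. indica": {"genes": 1716, "sorghum": 1155, "maize": 1055, "soybean": 926},
}


def ortholog_fractions(table):
    """Return, per source species, the ortholog count divided by the gene count."""
    fractions = {}
    for species, row in table.items():
        total = row["genes"]
        fractions[species] = {
            target: count / total
            for target, count in row.items()
            if target != "genes"
        }
    return fractions


fractions = ortholog_fractions(TABLE6)
```

Under that assumption, roughly four in five Arabidopsis entries have a retrieved ortholog in each of the three crops, while the rice subspecies show lower fractions, particularly against soybean.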
Fig. 1

Bioinformatics pipeline used to develop STIFDB2.

Fig. 2

Distribution of transcription factor binding sites predicted in the up-regulated stress-responsive genes due to the perturbation of various stress signals in (a) Arabidopsis thaliana (b) Oryza sativa subsp. japonica and (c) Oryza sativa subsp. indica.

Stress-specific distribution of Arabidopsis thaliana genes in STIFDB2 with predicted binding sites using the STIF algorithm. Alu, aluminum; Col, cold; CDS, cold–drought–salt; Dehy, dehydration; Dro, drought; OS, osmotic stress; OxS, oxidative stress.

Stress-specific distribution of Oryza sativa subsp. japonica genes in STIFDB2 with predicted binding sites using the STIF algorithm.

Stress-specific distribution of Oryza sativa subsp. indica genes in STIFDB2 with predicted binding sites using the STIF algorithm.

Features of STIFDB2

Users can browse and search STIFDB2 using TAIR, TIGR or RAP database identifiers. Genes are also grouped by chromosome across the three species and can be browsed on the basis of chromosomal location. The data can be browsed using one of the four key data sets: gene, transcription factors, stress signals and chromosome (Fig. 3). Users can also search the database using a variety of keywords including gene descriptions, stress signals, transcription factors and Gene Ontology (GO) annotations. A Basic Local Alignment Search Tool (BLAST) interface is provided to search STIFDB2 using nucleotide sequences. Sequence searches can be performed against the complete set of sequences in STIFDB2 or restricted to the genome of interest. STIFDB2 also provides ‘TFMap’, which enables users to track the transcription factor of interest visually using a 2D map that consists of the upstream region + untranslated region (UTR) and predicted transcription factors. The database also provides additional information including predicted binding sites, chromosomes, various database identifiers and cross-references to other plant databases.
Fig. 3

Chromosomal distribution of stress-responsive genes in STIFDB2. The x-axis shows the number of stress-up-regulated genes curated from publicly available microarray data and the y-axis shows chromosome numbers.


Applications of STIFDB2

The data compiled in the previous version of STIFDB were utilized for a variety of experimental and computational studies related to abiotic stress response and transcription factor binding site predictions. Briefly, the database can be utilized for identifying putative transcription factor binding sites in the upstream regions/UTRs of stress-upregulated genes curated from gene expression studies. The data can be utilized to study various network features associated with genes up-regulated in the setting of various stresses. These data can also be extrapolated to identify protein–protein interactions amongst transcription factors (manuscript in preparation). STIFDB2 enables the identification of master transcription factors (transcription factors that bind in the up-regulated region of genes perturbed due to diverse stress signals). The database can also be used to study the functional role of highly up-regulated genes using function annotation data integrated in the database (manuscript in preparation). With the inclusion of orthologs, the database can be employed to study evolutionary conservation and cross-genome comparative analyses of abiotic stress-responsive genes across different plant species. Data compiled in STIFDB2 can also be used for the analysis of stress-responsive transcriptional regulatory networks in Arabidopsis or rice genomes.

Analysis of the transcriptional regulatory cascade of the abiotic stress response in A. thaliana using STIFDB2

To illustrate the application of STIFDB2, we performed an in-depth analysis of the transcriptional regulatory cascades of 19 stress-responsive transcription factors in A. thaliana. Each transcription factor that induces expression of another transcription factor in response to a stress signal may act as an activator or repressor of several other genes, including those that encode transcription factors. A schematic diagram to depict primary differences between normal transcription factor activity and a transcriptional regulatory cascade event is provided in Fig. 3. The initial list of stress-responsive transcription factors in STIFDB was identified from biocuration followed by genomic data mining. These transcription factors may, in turn, activate other families of transcription factors via a transcriptional regulatory cascade mechanism. The transcriptional regulatory cascade mechanism has been studied through the kinetics of regulatory cascades in regulatory networks, including developmental gene networks (Davidson et al. 2002, Bolouri and Davidson 2003). Transcriptional cascades were reported to play a mechanistic role in the abiotic stress response in Arabidopsis (Nover et al. 2001, Guo et al. 2008, Vandepoele et al. 2009). Several high-throughput experimental and computational studies (Chen and Zhu 2004, Shinozaki and Yamaguchi-Shinozaki 2007, Vandepoele et al. 2009, Moreno-Risueno et al. 2010, Cramer et al. 2011, Less et al. 2011, Krasensky and Jonak 2012, Walley and Dehesh 2012) focused on the identification of transcriptional regulatory networks perturbed due to individual stress signals. A global survey of transcriptional regulatory cascades driven by 19 different stress-responsive transcription factors and 14 stresses has not been reported elsewhere. Deconvoluting the role of a primary transcription factor in the regulation of one or more secondary transcription factors is challenging in A. thaliana due to various factors including paucity of the data.
STIFDB2 offers a solution to this problem by providing a large compendium of stress signals, stress-responsive transcription factors, putative target binding sites predicted using the STIF algorithm and upstream regions of stress-responsive genes curated from GEO. We utilized the data from STIFDB2 to investigate the transcriptional regulatory cascade network underlying abiotic stress responses in A. thaliana. To understand the transcriptional regulatory cascades of transcription factors associated with various stress conditions, we used an analytical pipeline that performs text mining of annotation data (such as GO terms, protein domain information and Pfam2GO annotations). First, we grouped the STIF prediction results by transcription factors and retrieved the annotation data [gene description from TAIR, GO terms (molecular function subset) and protein domains for genes annotated with various stress signals]. For each transcription factor, genes that were predicted with their corresponding target binding sites were retrieved from STIFDB2. We performed targeted text mining in the annotation for GO terms in the ‘molecular function’ category pertaining to transcription factors or transcription factor activity. We used the term ‘sequence-specific DNA binding transcription factor activity’ (GO:0003700) as a primary filter. Terms from the neighborhood of GO:0003700 consisting of ‘DNA binding’- or ‘transcription’-related terms (see Table 7) were also used for filtering. Once the genes were identified, we further scanned them for protein domains annotated with transcription factors using Pfam2GO annotations and Pfam domain associations, leading to 90 Pfam domains and GO term association data from Pfam2GO (Supplementary Table S1).
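
The GO-based filtering step described above can be sketched as a set intersection. This is a minimal illustration under stated assumptions, not the authors' pipeline: it matches GO identifiers rather than mining annotation text, the input layout (pairs of transcription factor and gene, plus a gene-to-GO mapping) is hypothetical, and the second gene in the example is a made-up placeholder. The GO identifiers are those listed in Table 7, and the AT3G16770 annotation with GO:0003700 follows the text.

```python
# GO identifiers listed in Table 7 (GO:0003700 is the primary filter).
TF_GO_TERMS = frozenset({
    "GO:0003700", "GO:0001073", "GO:0001199", "GO:0001130", "GO:0001142",
    "GO:0001167", "GO:0000981", "GO:0001034", "GO:0001011", "GO:0001010",
    "GO:0000975", "GO:0043565", "GO:0043566",
})


def tf_annotated_targets(predictions, go_annotations, go_terms=TF_GO_TERMS):
    """Group predicted target genes that carry a TF-related GO term by transcription factor.

    predictions    -- iterable of (transcription_factor, gene_id) pairs (assumed layout)
    go_annotations -- dict mapping gene_id to a set of GO identifiers (assumed layout)
    """
    cascades = {}
    for tf_name, gene_id in predictions:
        if go_annotations.get(gene_id, set()) & go_terms:
            cascades.setdefault(tf_name, set()).add(gene_id)
    return cascades


# Illustrative input: AT3G16770 is the example discussed in the text;
# "HYPOTHETICAL_GENE" and its annotation are placeholders.
predictions = [
    ("ABRE_ABI3_VP1", "AT3G16770"),
    ("ABRE_ABI3_VP1", "HYPOTHETICAL_GENE"),
]
go_annotations = {
    "AT3G16770": {"GO:0003700"},
    "HYPOTHETICAL_GENE": {"GO:0005215"},  # transporter activity, not a TF term
}
cascades = tf_annotated_targets(predictions, go_annotations)
```

Each key in the result is a primary transcription factor and each value is the set of its predicted targets that themselves look like transcription factors, which is the kind of edge a regulatory cascade is built from.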
Table 7

GO terms used for target text mining of stress-responsive genes to find transcriptional regulatory cascades

GO identifier | GO term
GO:0003700 | Sequence-specific DNA binding transcription factor activity
GO:0001073 | DNA binding transcription antitermination factor activity
GO:0001199 | Metal ion-regulated sequence-specific DNA binding transcription factor activity
GO:0001130 | Sequence-specific DNA binding bacterial-type RNA polymerase transcription factor activity
GO:0001142 | Sequence-specific DNA binding mitochondrial RNA polymerase transcription factor activity
GO:0001167 | Sequence-specific DNA binding RNA polymerase I transcription factor activity
GO:0000981 | Sequence-specific DNA binding RNA polymerase II transcription factor activity
GO:0001034 | Sequence-specific DNA binding RNA polymerase III transcription factor activity
GO:0001011 | Sequence-specific DNA binding RNA polymerase recruiting transcription factor activity
GO:0001010 | Sequence-specific DNA binding transcription factor recruiting transcription factor activity
GO:0000975 | Regulatory region DNA binding
GO:0043565 | Sequence-specific DNA binding
GO:0043566 | Structure-specific DNA binding

A summary of transcription factors perturbed by stress-responsive transcription factors is provided in Table 8. We noted that ABRE_ABI3_VP1 has predicted binding sites in five different genes in Arabidopsis. One of these genes (AT3G16770; ethylene-responsive transcription factor RAP2-3; AtEBP) was annotated with GO terms (DNA binding molecular_function, sequence-specific DNA binding transcription factor activity) and encodes the Pfam domain Apetala 2 (PF00847; AP2, which belongs to a large family of transcription factors). In addition to ABRE_ABI3_VP1, consensus binding regions for six other cis-elements were also predicted in the upstream region + UTR of AtEBP, supporting the importance of this gene in the stress response. ABRE_ABI3_VP1 was primarily considered as a transcription factor responsive to ABA-related stress. Several independent studies have reported that ABA and other plant hormones could participate in the same signaling cascades (Beaudoin et al. 2000, Tuteja 2007). Functional analysis of AtEBP confirmed its role in abiotic stress response pathways in A. thaliana (Buttner and Singh 1997, Ogawa et al. 2005). Several interesting stress-responsive transcription-based regulatory cascades were observed and may reveal further biological connections that can be studied using functional genomics and regulatory network analyses.
Table 8

Summary of the transcription factors identified using the targeted annotation mining approach

Arabidopsis stress-responsive transcription factors in STIFDB2 | Target genes in STIFDB | GO term hits (a) with transcription factor annotation | Pfam domain hits (b) with transcription factor annotation
ABRE_ABI3_VP1 | 5 | 1 | 1
AuxRE_ARF | 1,405 | 117 | 40
C_ABRE_bZIP | 248 | 27 | 8
DREB_AP2_EREBP | 952 | 82 | 31
GCC_box_AP2_EREBP | 299 | 12 | 4
G_ABRE_bZIP | 173 | 16 | 4
G_box1_bZIP | 111 | 6 | 2
G_box2_bZIP | 1,020 | 60 | 18
G_box_bHLH | 931 | 87 | 28
HBE_HB | 188 | 23 | 8
HSE1_HSF | 2,472 | 197 | 74
Myb_box1_MYB | 2,318 | 176 | 68
Myb_box2_MYB | 1,050 | 103 | 34
Myb_box3_MYB | 1,269 | 108 | 38
Myb_box4_MYB | 659 | 58 | 21
Myb_box5_MYB | 3,113 | 250 | 92
N_box_bHLH | 683 | 47 | 15
Nac_box_NAC | 1,020 | 81 | 30
W_box_WRKY | 2,077 | 167 | 64

(a) GO terms used to filter genes associated with transcription factor/transcription factor activities are provided in Table 7.

(b) Pfam domains used to define the functional role pertaining to transcription factor/transcription factor activity are filtered from Pfam2GO annotation (see Supplementary Table S1).

In summary, our analysis indicates that a varying number of transcription factors could be modulated by the main set of 19 stress-responsive factors. Extensive experimental validation will be required to elucidate the complex patterns of gene regulation via the regulatory network cascade. The data summarized for each stress signal [see Supplementary File (.xls)] and the data for 19 different transcription factors (labeled using the transcription factor name and given in Table 8) could be used for designing such experiments, including ChIP-seq experiments, to understand stress regulation pathways including mechanisms of response, adaptation and tolerance.

Discussion

Many plants are commercially important crops, and understanding the basic mechanisms of their natural stress response is essential for improving that response and, with it, productivity. It will also be crucial to know the transcription factors that control and help with combating stress using stress tolerance pathways and adaptive mechanisms. STIFDB2 is a large compendium of curated stress signals, stress-responsive genes and stress-responsive transcription factors, along with information on putative binding sites where the stress-responsive transcription factors are predicted to bind in the upstream regions and UTRs of stress-responsive genes. STIFDB2 will be a data-centric platform for performing analyses pertaining to stress response, tolerance and adaptation. The data compiled in STIFDB can be categorized into three classes: (i) data extracted from biocuration; (ii) prediction results; and (iii) annotation data compiled from primary databases. Here, the first class of data refers to a list of Arabidopsis and rice stresses and stress-associated transcription factors curated from the literature. These data points were used as a query to find stress-responsive gene expression studies in Arabidopsis and rice transcriptomes. For each study, we curated corresponding up-regulated gene lists from the literature. After defining the stress-centric gene list, primary databases were mined to retrieve sequences, upstream regions, UTRs and annotations. Thus, STIFDB2 has multiple layers of biological data types integrated for stress gene analyses. The limitation of such knowledge-based approaches is the availability and reliability of validated binding sites and annotation of all genomes, including plant genomes. In this specific case, the underlying algorithm ‘STIF’ uses a Hidden Markov model (HMM). However, the models were generated using published consensus sites curated from the literature.
The predictive approach in STIFDB2 is limited to pattern searching, and each binding site is provided with a z-score to indicate the strength of prediction. This enables users to filter sites using a z-score threshold. The data compiled in STIFDB2 can be used to answer biologically relevant questions on the stress regulome. For example, the data can be examined to identify the functional repertoire and molecular pathways associated with stress-responsive genes and various stress signals in A. thaliana (manuscript submitted) and rice (manuscript in preparation). The data can also be utilized to find whether the stress-responsive genes belong to GC-rich or GC-poor classes of plant genes (Carels and Bernardi 2000). Such analyses can also be performed by type of stress to understand the relationship between GC content and stress responses. Transcriptomic diversity of stress-responsive genes has been reported (Carels and Bernardi 2000, Duque 2011, Mastrangelo et al. 2012, Syed et al. 2012), and detailed analysis of the transcriptomic plasticity, due to alternative splicing (Reddy et al. 2012), of genes up-regulated due to various stress signals can be performed using the data integrated in STIFDB2. Recent studies have indicated that RNA-binding proteins may play a crucial role in the stress response in plants (Lorkovic 2009, Duque 2011, Nakaminami et al. 2012). The annotation data integrated in the database can be utilized to identify genes encoding RNA-binding proteins. Another interesting avenue where data in STIFDB2 can be utilized is in the analysis of transcriptional regulatory cascades of abiotic and biotic stress responses in plants. We performed an extensive analysis with the curated transcription factor information and the rich set of annotation data in STIFDB2, and the results are discussed in the application section.
In a similar manner, the database can be used for both experimental and computational studies pertaining to plant stress response, tolerance and adaptation.
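
The two simplest analyses mentioned above, filtering predicted sites by z-score and measuring the GC content of a gene sequence, can be sketched as follows. This is only an illustrative sketch: the record layout (`PredictedSite`), the function names, the z-score values and the example threshold are assumptions made for the example and are not part of STIFDB2 or its export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictedSite:
    """Hypothetical record for one predicted binding site."""
    gene_id: str
    tf_name: str
    z_score: float


def filter_by_z_score(sites, threshold):
    """Keep predictions whose z-score is at or above the threshold."""
    return [site for site in sites if site.z_score >= threshold]


def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


# Invented example values, used only to show the calls.
sites = [
    PredictedSite("GENE_A", "Myb_box1_MYB", 2.4),
    PredictedSite("GENE_A", "G_box1_bZIP", 0.7),
]
strong_sites = filter_by_z_score(sites, threshold=2.0)
```

A suitable threshold depends on the question being asked; the value 2.0 here is arbitrary.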

Materials and Methods

The content of STIFDB2 was generated using a bioinformatics pipeline consisting of three modules as follows: biocuration and genomic data mining; prediction of stress-responsive transcription factor binding sites; and data integration for developing a database and web-based platform for the analysis of stress-responsive genes and the stress regulome in plants. The analytic pipeline used to develop STIFDB2 is provided below (Fig. 4).
Fig. 4

Schematic diagram of (a) normal transcription factor activity and (b) transcriptional regulatory cascade network. Blue rectangles indicate untranslated regions, red rectangles indicate promoter regions, exon regions are highlighted using green, creme and violet color. Introns are defined using a white region interspersed between exons, and polyadenylation sites in the 3′ untranslated regions are colored in orange.


Biocuration and genomic data mining of stress-responsive transcription factors and stress-responsive genes

Information on stress-responsive transcription factors and the consensus nucleotide sequences of their binding sites was curated from the literature, as explained for the previous version of STIFDB (Sundar et al. 2008, Shameer et al. 2009). An initial list of stress-responsive transcription factors was identified, and the consensus sequences were retrieved and used to generate HMMs for prediction of transcription factor binding sites using the STIF algorithm. The compendium of stress-responsive genes was mined from GEO by consulting the corresponding genes in the literature retrieved from PubMed. A gene was considered stress responsive in STIFDB2 when it was reported to be differentially up-regulated in a perturbation experiment using one of the stress signals in the stress signal library. Data mining was performed on public expression data sets to filter stress-responsive genes from three plant species: A. thaliana, O. sativa subsp. japonica and O. sativa subsp. indica. The transcription factors identified from biocuration were named using a convention that consists of the name of the cis-element followed by the name of the transcription factor family or subfamily. For example, Myb_box1_MYB consists of the name of the cis-element ‘Myb_box1’ and the name of the transcription factor family ‘MYB’. Biotic and abiotic stress-responsive genes up-regulated (in at least two replicates) in microarray experiments with a ≥2.5-fold expression change were considered as candidates for STIFDB2. Sequences of 100 and 1,000 bp with their 5′ UTRs were extracted using Ensembl Plant BioMart (Guberman et al. 2011, Kinsella et al. 2011), drawing on the Arabidopsis Information Resource (TAIR Version 10) database for Arabidopsis (Lamesch et al. 2012), Rice Genome Annotation Project MSU/TIGR (Ouyang et al. 2007) for O. sativa subsp. indica, and Rice Annotation Project (RAP-DB) (Ohyanagi et al. 2006) for O. sativa subsp. japonica.
Various annotation data (GO annotations, gene descriptions, etc.) and ortholog information were retrieved using Ensembl plant and Gramene BioMart.
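
The candidate selection rule described above can be sketched in a few lines. The sketch assumes one plausible reading of the criterion, namely that a gene qualifies when at least two replicates each show a linear (not log-transformed) fold change of 2.5 or more; the function names, the input layout and the example values are hypothetical.

```python
def is_stress_responsive(fold_changes, min_fold=2.5, min_replicates=2):
    """Return True when enough replicates reach the fold-change cutoff.

    fold_changes: linear fold changes (stress vs. control), one per replicate.
    """
    supporting = sum(1 for fold in fold_changes if fold >= min_fold)
    return supporting >= min_replicates


def select_candidates(expression, min_fold=2.5, min_replicates=2):
    """Return the sorted gene identifiers that pass the selection rule."""
    return sorted(
        gene_id
        for gene_id, fold_changes in expression.items()
        if is_stress_responsive(fold_changes, min_fold, min_replicates)
    )


# Invented fold changes for three placeholder genes.
expression = {
    "GENE_A": [3.1, 2.6, 1.9],  # two replicates at or above 2.5
    "GENE_B": [2.7, 1.2, 1.0],  # only one replicate at or above 2.5
    "GENE_C": [2.5, 2.5],       # exactly at the cutoff in both replicates
}
candidates = select_candidates(expression)
```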

Prediction of stress-responsive transcription factor binding sites

The STIF algorithm was used to identify putative transcription factor binding sites in the upstream regions and UTRs of stress-responsive genes retrieved from publicly available microarray data. Briefly, the STIF algorithm encodes the consensus binding site data as an HMM (Supplementary Fig. S1) and performs a sensitive sequence search in both the forward and reverse directions. The program gives the following output: transcription factor, z-score, normalization score, start and end of the predicted binding site, chromosomal location (upstream region or UTR, depending on the location of the predicted binding site) and orientation of the strand (forward or reverse). We predicted stress-responsive transcription factor binding sites at two levels. The first level surveyed 1,000 bp + UTRs and the second level surveyed 100 bp + UTRs for putative binding sites. A detailed description of the STIF algorithm and its scoring method is available elsewhere (Sundar et al. 2008, Shameer et al. 2009).
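
The general idea of a two-strand search with z-score output can be illustrated with a much simpler stand-in. The sketch below is not the STIF algorithm: instead of an HMM it counts matches to an IUPAC consensus in every window, and it computes the z-score against the window scores of the same strand. The scoring scheme, function names and example sequence are all assumptions made for illustration.

```python
from statistics import mean, pstdev

# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def window_scores(seq, consensus):
    """Number of consensus positions matched in each window of seq."""
    k = len(consensus)
    return [
        sum(base in IUPAC[symbol] for base, symbol in zip(seq[i:i + k], consensus))
        for i in range(len(seq) - k + 1)
    ]


def scan(seq, consensus, z_cutoff):
    """Report (strand, start, end, z) for windows with z >= z_cutoff.

    Start and end are 1-based positions on the forward strand.
    """
    seq = seq.upper()
    consensus = consensus.upper()
    k = len(consensus)
    hits = []
    for strand, strand_seq in (("forward", seq), ("reverse", reverse_complement(seq))):
        scores = window_scores(strand_seq, consensus)
        if not scores:
            continue
        mu, sigma = mean(scores), pstdev(scores)
        if sigma == 0:
            continue  # all windows score the same, so no z-score is defined
        for i, score in enumerate(scores):
            z = (score - mu) / sigma
            if z >= z_cutoff:
                if strand == "forward":
                    start, end = i + 1, i + k
                else:
                    # Map the window back to forward-strand coordinates.
                    start, end = len(seq) - (i + k) + 1, len(seq) - i
                hits.append((strand, start, end, round(z, 2)))
    return hits


# CACGTG is a palindromic G-box-like motif, so it is found on both strands.
hits = scan("AAAAACACGTGAAAAA", "CACGTG", z_cutoff=2.0)
```

Because the example motif is palindromic, the forward and reverse hits map to the same forward-strand coordinates; a non-palindromic consensus would normally be reported on one strand only.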

Data integration

Data retrieved from biocuration (transcription factors, cis-elements, binding site information and the library of stress signals), genomic data mining (genes differentially expressed due to plant stress) and STIF prediction (transcription factors in the upstream regions + UTRs of stress-up-regulated genes) were integrated in STIFDB2. STIFDB2 has a large number of unique associations of stress signals, stress-responsive genes and transcription factor binding sites predicted using the STIF algorithm and targeted for plant stress regulome studies. These data sets were compiled as a database with a new, user-friendly web interface that enables searching, browsing and retrieval of various information. STIFDB was developed on a MySQL backend. The web interface of STIFDB was developed using HTML and JavaScript. Perl-CGI programs were used to develop the search, query and retrieval system. Programs for searching putative binding sites and for performing STIF prediction were coded in Perl. A stress profile has been created for each gene, indicating the associated stress signals. We have also integrated GO associations, gene descriptions from TAIR (Lamesch et al. 2012), Rice Annotation Project (RAP) database (Ouyang et al. 2007), and transcription factor-related information from Database of Arabidopsis Transcription Factors (DATF) (Guo et al. 2005) and Database of Rice Transcription Factors (DRTF) (Gao et al. 2006).
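
One way to picture how stress signals, genes and predicted binding sites could be linked in a relational store is sketched below. The actual STIFDB2 schema is not described here, so every table and column name is hypothetical, the rows are invented, and SQLite (from the Python standard library) is used in place of MySQL only so that the sketch runs without a database server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables; not the actual STIFDB2 schema.
    CREATE TABLE gene (
        gene_id     TEXT PRIMARY KEY,
        species     TEXT NOT NULL,
        description TEXT
    );
    CREATE TABLE stress_profile (
        gene_id       TEXT NOT NULL REFERENCES gene(gene_id),
        stress_signal TEXT NOT NULL,
        PRIMARY KEY (gene_id, stress_signal)
    );
    CREATE TABLE binding_site (
        site_id   INTEGER PRIMARY KEY,
        gene_id   TEXT NOT NULL REFERENCES gene(gene_id),
        tf_name   TEXT NOT NULL,
        region    TEXT NOT NULL,   -- 'upstream' or 'UTR'
        strand    TEXT NOT NULL,   -- 'forward' or 'reverse'
        start_pos INTEGER NOT NULL,
        end_pos   INTEGER NOT NULL,
        z_score   REAL NOT NULL
    );
    """
)

# Invented rows for a placeholder gene.
conn.execute("INSERT INTO gene VALUES (?, ?, ?)",
             ("GENE_A", "Arabidopsis thaliana", "placeholder description"))
conn.executemany("INSERT INTO stress_profile VALUES (?, ?)",
                 [("GENE_A", "cold"), ("GENE_A", "drought")])
conn.executemany(
    "INSERT INTO binding_site "
    "(gene_id, tf_name, region, strand, start_pos, end_pos, z_score) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("GENE_A", "Myb_box1_MYB", "upstream", "forward", 120, 125, 2.4),
        ("GENE_A", "G_box1_bZIP", "UTR", "reverse", 30, 35, 0.7),
    ],
)


def associations(connection, min_z):
    """List (stress signal, gene, transcription factor, z-score) associations."""
    query = """
        SELECT s.stress_signal, b.gene_id, b.tf_name, b.z_score
        FROM binding_site AS b
        JOIN stress_profile AS s ON s.gene_id = b.gene_id
        WHERE b.z_score >= ?
        ORDER BY s.stress_signal, b.tf_name
    """
    return connection.execute(query, (min_z,)).fetchall()


rows = associations(conn, min_z=2.0)
```

In this layout, one association is a combination of a stress signal, a gene and a predicted site, so a gene with two stress signals and one strong site contributes two rows.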

Conclusion

Understanding the molecular pathways and regulatory networks which influence various facets of stress-responsive events in plants is crucial for developing stress-tolerant or stress-adaptive varieties of plants. Such information will also help to understand the metabolic, physiological and cellular mechanisms implicated in such processes. STIFDB2, as an expanded resource that includes two additional genomes (O. sativa subsp. japonica and O. sativa subsp. indica), seven new stress signals (aluminum, bacterial blight, heat, iron, osmotic stress, UV-B and wounding) and 3,355 genes, will be an ideal information resource for plant biologists and computational biologists to perform in-depth analyses. The database provides 40,217 data points, and the data can be used to find putative RNA-binding proteins, transcriptional regulatory cascades, transcriptomic diversity and GC content of stress-responsive genes. We envisage that, analogously to its previous version, STIFDB2 will be widely accepted by the community and aid in unraveling novel aspects of stress response in plants.

Supplementary data

Supplementary data are available at PCP online.

Funding

The authors thank University of Agricultural Sciences (Bangalore) and National Centre for Biological Sciences, Tata Institute of Fundamental Research for infrastructural support. MN acknowledges University Grant Commission, New Delhi, India, for the Senior Research Fellowship during the course of this research work.
Table 3

Stress-specific distribution of Arabidopsis thaliana genes in STIFDB2 with predicted binding sites using the STIF algorithm

Transcription factors | ABA | Alu | Col | CDS | Dehy | Dro | Heat | Light | Iron | NaCl | OS | OxS | UV-B | Wounding
ABRE_ABI3_VP1 | 10200201030000
AuxRE_ARF | 299135201748372323137047331155423
C_ABRE_bZIP | 5211053586648128054106
DREB_AP2_EREBP | 23083752329344121983436213142917
GCC_box_AP2_EREBP | 7021125129826311881695
G_ABRE_bZIP | 5406331798391907273
G_box1_bZIP | 3304631516196615184
G_box2_bZIP | 2347346929294262235035624173812
G_box_bHLH | 24143612017369332013340531104020
HBE_HB | 48271266553498153112
HSE1_HSF | 5631989230737285155810687647349435
Myb_box1_MYB | 5021683229566914950211482245339538
Myb_box2_MYB | 247103961528315132245138115144724
Myb_box3_MYB | 30744701525395242805347224215520
Myb_box4_MYB | 1425242419172815024248792910
Myb_box5_MYB | 703241,1334193922636741441,117584012047
N_box_bHLH | 16362281313224121642526110102615
Nac_box_NAC | 22563521425330302275038224124216
W_box_WRKY | 4642074124646124746310271645278430

Alu, aluminum; Col, cold; CDS, cold–drought–salt, Dehy, dehydration; Dro, drought; OS, osmotic stress; OxS, oxidative stress.

Table 4

Stress-specific distribution of Oryza sativa subsp. japonica genes in STIFDB2 with predicted binding sites using the STIF algorithm

Transcription factors | ABA | Bacterial blight | Cold | Drought | Heat | Iron | NaCl
ABRE_ABI3_VP1 | 30424118753115733
ABRE_bZIP | 36301471156
DREB_AP2_EREBP | 2026096428910220
G_box1_bZIP | 118021351
Nac_box_NAC | 162085543309721
OsIRO2_bHLH | 66921782213
PRE2_WRKY | 0121720
PRE4_WRKY | 2261230110

Table 5

Stress-specific distribution of Oryza sativa subsp. indica genes in STIFDB2 with predicted binding sites using the STIF algorithm

Transcription factors | Cold | Drought | NaCl
ABRE_bZIP | 2698109
DREB_AP2_EREBP | 181473785
G_box1_bZIP | 32922
Nac_box_NAC | 132321576
  86 in total

1.  Two classes of genes in plants.

Authors:  N Carels; G Bernardi
Journal:  Genetics       Date:  2000-04       Impact factor: 4.562

Review 2.  Cold, salinity and drought stresses: an overview.

Authors:  Shilpi Mahajan; Narendra Tuteja
Journal:  Arch Biochem Biophys       Date:  2005-11-09       Impact factor: 4.013

Review 3.  Cold stress: manipulating freezing tolerance in plants.

Authors:  G J Warren
Journal:  Curr Biol       Date:  1998-07-16       Impact factor: 10.834

4.  Identification and prediction of abiotic stress responsive transcription factors involved in abiotic stress signaling in soybean.

Authors:  Lam-Son Phan Tran; Keiichi Mochida
Journal:  Plant Signal Behav       Date:  2010-03-06

5.  High GC content: critical parameter for predicting stress regulated miRNAs in Arabidopsis thaliana.

Authors:  Akaash Kumar Mishra; Seep Agarwal; Chakresh Kumar Jain; Vibha Rani
Journal:  Bioinformation       Date:  2009-10-11

Review 6.  Competition for land.

Authors:  Pete Smith; Peter J Gregory; Detlef van Vuuren; Michael Obersteiner; Petr Havlík; Mark Rounsevell; Jeremy Woods; Elke Stehfest; Jessica Bellarby
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2010-09-27       Impact factor: 6.237

7.  Engineering cold stress tolerance in crop plants.

Authors:  Gulzar S Sanghera; Shabir H Wani; Wasim Hussain; N B Singh
Journal:  Curr Genomics       Date:  2011-03       Impact factor: 2.236

8.  BioMart Central Portal: an open database network for the biological community.

Authors:  Jonathan M Guberman; J Ai; O Arnaiz; Joachim Baran; Andrew Blake; Richard Baldock; Claude Chelala; David Croft; Anthony Cros; Rosalind J Cutts; A Di Génova; Simon Forbes; T Fujisawa; E Gadaleta; D M Goodstein; Gunes Gundem; Bernard Haggarty; Syed Haider; Matthew Hall; Todd Harris; Robin Haw; S Hu; Simon Hubbard; Jack Hsu; Vivek Iyer; Philip Jones; Toshiaki Katayama; R Kinsella; Lei Kong; Daniel Lawson; Yong Liang; Nuria Lopez-Bigas; J Luo; Michael Lush; Jeremy Mason; Francois Moreews; Nelson Ndegwa; Darren Oakley; Christian Perez-Llamas; Michael Primig; Elena Rivkin; S Rosanoff; Rebecca Shepherd; Reinhard Simon; B Skarnes; Damian Smedley; Linda Sperling; William Spooner; Peter Stevenson; Kevin Stone; J Teague; Jun Wang; Jianxin Wang; Brett Whitty; D T Wong; Marie Wong-Erasmus; L Yao; Ken Youens-Clark; Christina Yung; Junjun Zhang; Arek Kasprzyk
Journal:  Database (Oxford)       Date:  2011-09-18       Impact factor: 3.451

Review 9.  Drought, salt, and temperature stress-induced metabolic rearrangements and regulatory networks.

Authors:  Julia Krasensky; Claudia Jonak
Journal:  J Exp Bot       Date:  2012-01-30       Impact factor: 6.992

10.  Salt stress responses in Arabidopsis utilize a signal transduction pathway related to endoplasmic reticulum stress signaling.

Authors:  Jian-Xiang Liu; Renu Srivastava; Ping Che; Stephen H Howell
Journal:  Plant J       Date:  2007-07-28       Impact factor: 6.417

