
PlantTFDB: a comprehensive plant transcription factor database.

An-Yuan Guo, Xin Chen, Ge Gao, He Zhang, Qi-Hui Zhu, Xiao-Chuan Liu, Ying-Fu Zhong, Xiaocheng Gu, Kun He, Jingchu Luo.

Abstract

Transcription factors (TFs) play key roles in controlling gene expression. Systematic identification and annotation of TFs, followed by construction of TF databases, may provide useful resources for studying the function and evolution of transcription factors. We developed a comprehensive plant transcription factor database, PlantTFDB (http://planttfdb.cbi.pku.edu.cn), which contains 26,402 TFs predicted from 22 species, including five model organisms with available whole-genome sequences and 17 plants with available EST sequences. To provide comprehensive information for these putative TFs, we made extensive annotations at both the family and gene levels. A brief introduction and key references are presented for each family. Functional domain information and cross-references to various well-known public databases are available for each identified TF. In addition, we predicted putative orthologs of these TFs among the 22 species. PlantTFDB has a simple interface that allows users to search the database by ID or free text, to run sequence similarity searches against the TFs of all or individual species, and to download TF sequences for local analysis.

Year:  2007        PMID: 17933783      PMCID: PMC2238823          DOI: 10.1093/nar/gkm841

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Transcription factors (TFs) are important regulators that activate or repress the expression of coding or non-coding genes, and thereby influence or control many biological processes. TRANSFAC collects ample information about animal transcription factors and their known binding cis-elements, but much less about plant transcription factors (1). The completion of Arabidopsis thaliana genome sequencing made it the first model plant for transcription factor studies at the whole-genome level (2). Several online Arabidopsis TF databases, such as AtTFDB and RARTF, are available (3,4). In previous work, we systematically predicted and annotated the transcription factors in Arabidopsis, rice and poplar based on their genome sequences, and constructed three distinct TF databases (5–7). Nevertheless, the need for an integrated and user-friendly plant transcription factor database is increasing as the genomes of more and more plant species are sequenced. Riano-Pachon et al. (8) constructed a database of transcription factors for five plant species, the first attempt at a comprehensive plant transcription factor database. PlanTAPDB, developed by Rensing and colleagues, is a comprehensive phylogeny-based resource of plant transcription-associated proteins (9). Here, we report a comprehensive plant TF database covering 22 species (http://planttfdb.cbi.pku.edu.cn). In addition to the five species with complete genome sequences (Arabidopsis, rice, poplar, green alga and moss), we identified and annotated TFs based on the transcripts assembled by PlantGDB (10) for 17 plant species, including crops, fruits, trees and other economically important plants. Detailed annotations are provided for each predicted TF. Furthermore, we predicted TF orthologs among these species for comparative analysis and evolutionary studies.
Both the sequences and annotation information for each identified TF are freely available for online access on the PlantTFDB website. We hope that PlantTFDB may become a useful resource for the research community, especially in the study of comparative genomics and transcription regulation.

DATA SOURCES AND METHODS

Data sources

Currently, PlantTFDB contains TFs identified in 22 species (Table 1). Genome sequences of Arabidopsis (A. thaliana), rice (Oryza sativa), poplar (Populus trichocarpa), green alga (Chlamydomonas reinhardtii) and moss (Physcomitrella patens) were downloaded from TAIR, TIGR and JGI. For the 17 species without complete genome data, we downloaded the unique transcripts from the Plant Genome Database (PlantGDB, http://www.plantgdb.org/) (10). These plant unique transcripts (PUTs) were assembled by PlantGDB from mRNA and EST sequences. We applied the framefinder program in the ESTate (Expressed Sequence Tag Analysis Tools Etc) package (http://www.ebi.ac.uk/~guy/estate/) to predict the open reading frames and obtain protein sequences from these PUTs.
Table 1. Basic information of 22 species and TFs in the current PlantTFDB

| Data source (version)^a | Group | Name | Species | TFs^b | TFs with orthologs^c |
|---|---|---|---|---|---|
| TAIR (v6) | | Arabidopsis | Arabidopsis thaliana | 2290 | 1346 |
| JGI (v1.1) | | Poplar | Populus trichocarpa | 2576 | 2042 |
| TIGR (v4.0) | | Rice | Oryza sativa (ssp. indica) | 2025 | 1763 |
| | | | Oryza sativa (ssp. japonica) | 2384 | 2124 |
| JGI (v1.1) | | Moss | Physcomitrella patens | 1170 | 524 |
| JGI (v3.0) | | Green alga | Chlamydomonas reinhardtii | 205 | 64 |
| PlantGDB (v155a) | Crops | Barley | Hordeum vulgare | 618 | 595 |
| | | Maize | Zea mays | 764 | 734 |
| | | Sorghum | Sorghum bicolor | 397 | 372 |
| | | Sugarcane | Saccharum officinarum | 1177 | 1157 |
| | | Wheat | Triticum aestivum | 1127 | 1074 |
| | Fruits | Apple | Malus domestica | 1025 | 938 |
| | | Grape | Vitis vinifera | 867 | 793 |
| | | Orange | Citrus sinensis | 599 | 541 |
| | Trees | Pine | Pinus taeda | 950 | 644 |
| | | Spruce | Picea glauca | 440 | 383 |
| | Economic plants | Cotton | Gossypium hirsutum | 1567 | 1430 |
| | | Potato | Solanum tuberosum | 1340 | 1243 |
| | | Soybean | Glycine max | 1891 | 1774 |
| | | Sunflower | Helianthus annuus | 513 | 435 |
| | | Tomato | Lycopersicon esculentum | 998 | 917 |
| | | Deervetch | Lotus japonicus | 457 | 434 |
| | | Medicago | Medicago truncatula | 1022 | 914 |

a. TAIR: The Arabidopsis Information Resource, http://www.arabidopsis.org/; TIGR: The Institute for Genomic Research, http://www.tigr.org/; JGI: DOE Joint Genome Institute, http://genome.jgi-psf.org/; PlantGDB: Plant Genome DataBase, http://www.plantgdb.org/.

b. The TF numbers of Arabidopsis and japonica rice are the number of gene models including alternative splicing.

c. The number of TFs of each species that have orthologs in all other species.

Plant TF HMM profiles

Transcription factors are usually grouped into families based on their DNA-binding domains. Currently, 64 TF families have been characterized in plants (7). Among them, 48 families have hidden Markov model (HMM) profiles in the Pfam database (v20.0) (11), while the remaining 16 families have no available HMM profiles because they were either newly identified or had only a few members. To build HMM profiles for these 16 families, we took their protein sequences from the previous TF databases of Arabidopsis, rice and poplar (5–7) and performed multiple sequence alignment. We then manually refined the alignments and kept only the regions representing the conserved DNA-binding domain. Finally, we used the hmmbuild program in the HMMER package (http://hmmer.janelia.org/, v2.3.2) to build the HMM profiles for these 16 TF families.

TF identification

We applied the hmmsearch program in HMMER to search the protein sequences of each species and predict TFs. Based on our previous experience and manual inspection, we took an E-value of 0.01 as the cutoff, a value widely adopted for HMMER searches. Many TFs have more than one DNA-binding domain (2). For example, the B3 domain (PF02362) is present both in the ABI3-VP1 family and in the RAV subfamily of the AP2 family. We assigned TFs to the ABI3-VP1 family if they possessed only the B3 domain, and to the AP2 family if they had both B3 and AP2 domains (2). We developed a rules-driven program to handle such cases. Detailed rules for categorizing the TFs can be found on the PlantTFDB help page (http://planttfdb.cbi.pku.edu.cn/help.php).
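The B3/AP2 rule described above can be sketched in a few lines. This is only an illustration of the stated rule, not the rules-driven program itself: the hit tuples, the function names and the treatment of the cutoff as inclusive are assumptions, and the case of a protein carrying only an AP2 domain is assumed to fall into the AP2 family.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

E_VALUE_CUTOFF = 0.01  # cutoff stated in the text; inclusive comparison is an assumption

# Hypothetical hit record: (protein_id, domain_name, e_value)
Hit = Tuple[str, str, float]


def domains_passing_cutoff(hits: Iterable[Hit],
                           cutoff: float = E_VALUE_CUTOFF) -> Dict[str, Set[str]]:
    """Collect, per protein, the domains whose hit passes the E-value cutoff."""
    domains: Dict[str, Set[str]] = {}
    for protein_id, domain, e_value in hits:
        if e_value <= cutoff:
            domains.setdefault(protein_id, set()).add(domain)
    return domains


def assign_family(domains: Set[str]) -> Optional[str]:
    """Apply the B3/AP2 rule from the text to one protein's domain set."""
    if "AP2" in domains:
        # B3 + AP2 -> AP2 family (RAV subfamily); AP2 alone -> AP2 (assumed)
        return "AP2"
    if "B3" in domains:
        # B3 only -> ABI3-VP1 family
        return "ABI3-VP1"
    return None  # other families are outside this sketch


if __name__ == "__main__":
    example_hits = [
        ("p1", "B3", 1e-5),
        ("p2", "B3", 1e-8),
        ("p2", "AP2", 1e-3),
        ("p3", "B3", 0.5),  # fails the cutoff and is dropped
    ]
    for protein, doms in domains_passing_cutoff(example_hits).items():
        print(protein, assign_family(doms))
```

Checking the combined case before the single-domain case keeps the rule order explicit, which matters once more multi-domain families are added.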

TF annotation

To provide comprehensive information for the identified TFs, we made extensive annotations at both the family and gene levels. For each TF family, a brief introduction and key references are listed on the family page. BLAST searches were performed against well-known public databases such as UniProt, RefSeq, EMBL and TRANSFAC. Putative functional domains were identified and annotated by InterProScan, and Gene Ontology annotations were further extracted. In addition, expression profiles collected from UniGene EST/cDNA information are available for all putative TFs.

Putative ortholog annotation

To predict putative orthologous relationships of TFs among these species, we used the BLAST score ratio (BSR) method, which has been widely adopted by ENSEMBL (http://www.ensembl.org/) and other studies (12). An all-against-all BLASTP search with a strict cutoff of E-value <1E−20 was performed, and the BSR value was calculated for each hit. After comparing results at different BSR values, we chose BSR ≥0.4 as the cutoff and retrieved the top sequences in a species with the largest BSR value as the putative ortholog(s).
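As a rough sketch of this selection step, the BSR of a hit can be taken as the bit score of the query against the subject divided by the bit score of the query against itself, which is the usual definition of the BLAST score ratio. The input layout (tuples of query, subject, bit score and E-value), the function name and the choice to keep ties are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

BSR_CUTOFF = 0.4         # cutoff stated in the text
E_VALUE_CUTOFF = 1e-20   # hits must have an E-value below this

# Hypothetical BLASTP hit record: (query_id, subject_id, bit_score, e_value)
BlastHit = Tuple[str, str, float, float]


def putative_orthologs(
    hits: Iterable[BlastHit],
    self_scores: Dict[str, float],
    species_of: Dict[str, str],
) -> Dict[Tuple[str, str], Tuple[float, List[str]]]:
    """Return, per (query, target species), the top BSR and the subject(s) reaching it."""
    best: Dict[Tuple[str, str], Tuple[float, List[str]]] = {}
    for query, subject, bit_score, e_value in hits:
        if e_value >= E_VALUE_CUTOFF:
            continue
        if species_of[query] == species_of[subject]:
            continue  # orthologs are sought in other species only
        bsr = bit_score / self_scores[query]
        if bsr < BSR_CUTOFF:
            continue
        key = (query, species_of[subject])
        top_bsr, subjects = best.get(key, (0.0, []))
        if bsr > top_bsr:
            best[key] = (bsr, [subject])
        elif bsr == top_bsr:
            subjects.append(subject)  # keep ties as co-orthologs (assumption)
    return best


if __name__ == "__main__":
    species = {"At1": "ath", "Os1": "osa", "Os2": "osa", "Pt1": "ptr"}
    self_score = {"At1": 200.0}
    blast_hits = [
        ("At1", "At1", 200.0, 0.0),
        ("At1", "Os1", 120.0, 1e-50),
        ("At1", "Os2", 90.0, 1e-30),
        ("At1", "Pt1", 60.0, 1e-25),  # BSR 0.3, below the cutoff
    ]
    print(putative_orthologs(blast_hits, self_score, species))
```

Normalizing by the self-score makes hits comparable across queries of different lengths, which a raw bit score or E-value threshold alone does not.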

DATABASE CONSTRUCTION AND WEB INTERFACE

We used MySQL as the database management system and designed a uniform database structure for most species, the exceptions being Arabidopsis, rice and poplar. Each species has its own separate database, and annotations against EMBL, UniProt, Gene Ontology, RefSeq, TRANSFAC and UniGene are stored in individual tables. PlantTFDB has a user-friendly entry point for each species. We kept the previously constructed database interfaces for Arabidopsis, rice and poplar and developed a uniform web interface for the 19 newly added species (Figure 1). A uniform text query interface is provided for each species, along with BLAST search against all or individual species. All sequences and ortholog information are available through the download page. Users can click a TF ID to open the TF annotation page with detailed annotations (Figure 1). In addition, putative orthologs in other species are listed for each TF.
Figure 1.

The annotation information of a typical PlantTFDB entry, showing the rich annotation of a bread wheat MADS-box transcription factor (PTTa00615.1). The annotation contains four major categories: (A) Basic information; (B) Annotation; (C) Sequence; (D) Protein sequence feature details. Some of the annotations were trimmed because the screenshot of the actual web page is too large to fit. The actual layout and content of the web page may differ slightly, since we keep developing and updating the database as new data become available.


DISCUSSION

Protein sequences from PUTs

The PUT sequences assembled from transcripts may contain insertions or deletions that disrupt the open reading frames. We ran TBLASTN searches against these PUTs using Arabidopsis proteins as query sequences and observed that more than half of them had frame shifts. Therefore, we used the framefinder program to obtain the protein sequences of the PUTs of the 17 species.

Evaluation of self-built HMM profiles

Based on the multiple sequence alignment results of known members in Arabidopsis, rice and poplar, we built HMM profiles for DNA binding domains of 16 families, which did not have HMM profiles in the Pfam database (v20.0). We obtained the same hits from HMMER search against Arabidopsis proteins for CCAAT-HAP2 and MBF1 using the HMM profiles we built and the new HMM profiles added in Pfam (v22.0).

Evaluation of TF identification accuracy

To estimate the accuracy and reliability of our pipeline, we applied it to 10 well-annotated families in Arabidopsis. We measured sensitivity and specificity in the same way as Iida et al. (4) and Riano-Pachon et al. (8). The sensitivity and specificity of eight families were greater than 0.95, the sensitivity of two families was close to 0.90 and the specificity of one family was 0.935. This suggests that our approach performs reasonably, with acceptable accuracy.
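The text does not spell out the formulas, so the following is only a sketch under stated assumptions: sensitivity is taken as TP/(TP+FN) and specificity as TP/(TP+FP), a convention often used when no true negatives are defined, computed per family by comparing predicted members with a curated reference set. The function name and the gene identifiers are hypothetical.

```python
from typing import Iterable, Tuple


def sensitivity_specificity(predicted: Iterable[str],
                            reference: Iterable[str]) -> Tuple[float, float]:
    """Compare predicted family members with a reference set.

    Assumed definitions (not confirmed by the source):
      sensitivity = TP / (TP + FN)
      specificity = TP / (TP + FP)
    """
    predicted_set, reference_set = set(predicted), set(reference)
    tp = len(predicted_set & reference_set)   # predicted and annotated
    fn = len(reference_set - predicted_set)   # annotated but missed
    fp = len(predicted_set - reference_set)   # predicted but not annotated
    sensitivity = tp / (tp + fn) if reference_set else float("nan")
    specificity = tp / (tp + fp) if predicted_set else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only
    predicted_members = ["geneA", "geneB", "geneC", "geneD"]
    reference_members = ["geneA", "geneB", "geneC", "geneE"]
    print(sensitivity_specificity(predicted_members, reference_members))
```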

Ortholog prediction

We used the widely adopted BSR method to predict orthologs (12). To find an appropriate parameter, we ran BLAST searches against japonica rice to find orthologs of each Arabidopsis TF with six different BSR cutoffs (0.3, 0.33, 0.35, 0.4, 0.45 and 0.5) and compared the number, coverage and identity of the BLAST hits. We chose BSR 0.4 as the cutoff for BLAST hits, which was relatively strict, with an average coverage >80% and identity ∼60%. Based on these cutoffs, we chose the top sequences in a species with the largest BSR value as the putative ortholog(s).

CONCLUSION

PlantTFDB is our attempt at constructing a comprehensive plant transcription factor database from all currently available genome and transcript sequences. We will continue to add more species and new annotations as their sequence data become available. The extensive annotation of each TF family in 22 species and the ortholog information among these species may facilitate the study of transcription regulation and the evolution of plant TFs.
  12 in total

1.  DATF: a database of Arabidopsis transcription factors.

Authors:  Anyuan Guo; Kun He; Di Liu; Shunong Bai; Xiaocheng Gu; Liping Wei; Jingchu Luo
Journal:  Bioinformatics       Date:  2005-02-24       Impact factor: 6.937

2.  PlanTAPDB, a phylogeny-based resource of plant transcription-associated proteins.

Authors:  Sandra Richardt; Daniel Lang; Ralf Reski; Wolfgang Frank; Stefan A Rensing
Journal:  Plant Physiol       Date:  2007-03-02       Impact factor: 8.340

3.  RARTF: database and tools for complete sets of Arabidopsis transcription factors.

Authors:  Kei Iida; Motoaki Seki; Tetsuya Sakurai; Masakazu Satou; Kenji Akiyama; Tetsuro Toyoda; Akihiko Konagaya; Kazuo Shinozaki
Journal:  DNA Res       Date:  2005       Impact factor: 4.458

4.  Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes.

Authors:  J L Riechmann; J Heard; G Martin; L Reuber; C Jiang; J Keddie; L Adam; O Pineda; O J Ratcliffe; R R Samaha; R Creelman; M Pilgrim; P Broun; J Z Zhang; D Ghandehari; B K Sherman; G Yu
Journal:  Science       Date:  2000-12-15       Impact factor: 47.728

5.  DPTF: a database of poplar transcription factors.

Authors:  Qi-Hui Zhu; An-Yuan Guo; Ge Gao; Ying-Fu Zhong; Meng Xu; Minren Huang; Jinchu Luo
Journal:  Bioinformatics       Date:  2007-03-28       Impact factor: 6.937

6.  DRTF: a database of rice transcription factors.

Authors:  Ge Gao; Yingfu Zhong; Anyuan Guo; Qihui Zhu; Wen Tang; Weimou Zheng; Xiaocheng Gu; Liping Wei; Jingchu Luo
Journal:  Bioinformatics       Date:  2006-03-21       Impact factor: 6.937

7.  TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes.

Authors:  V Matys; O V Kel-Margoulis; E Fricke; I Liebich; S Land; A Barre-Dirrie; I Reuter; D Chekmenev; M Krull; K Hornischer; N Voss; P Stegmaier; B Lewicki-Potapov; H Saxel; A E Kel; E Wingender
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

8.  Pfam: clans, web tools and services.

Authors:  Robert D Finn; Jaina Mistry; Benjamin Schuster-Böckler; Sam Griffiths-Jones; Volker Hollich; Timo Lassmann; Simon Moxon; Mhairi Marshall; Ajay Khanna; Richard Durbin; Sean R Eddy; Erik L L Sonnhammer; Alex Bateman
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

9.  Visualization of comparative genomic analyses by BLAST score ratio.

Authors:  David A Rasko; Garry S A Myers; Jacques Ravel
Journal:  BMC Bioinformatics       Date:  2005-01-05       Impact factor: 3.169

10.  AGRIS: Arabidopsis gene regulatory information server, an information resource of Arabidopsis cis-regulatory elements and transcription factors.

Authors:  Ramana V Davuluri; Hao Sun; Saranyan K Palaniswamy; Nicole Matthews; Carlos Molina; Mike Kurtz; Erich Grotewold
Journal:  BMC Bioinformatics       Date:  2003-06-23       Impact factor: 3.169
