
PlantTFDB 3.0: a portal for the functional and evolutionary study of plant transcription factors.

Jinpu Jin, He Zhang, Lei Kong, Ge Gao, Jingchu Luo.

Abstract

With the aim of providing a resource for the functional and evolutionary study of plant transcription factors (TFs), we updated the plant TF database PlantTFDB to version 3.0 (http://planttfdb.cbi.pku.edu.cn). After refining the TF classification pipeline, we systematically identified 129 288 TFs from 83 species, of which 67 species have genome sequences, covering the main lineages of green plants. Besides the abundant annotation provided in the previous version, we generated more annotations for identified TFs, including expression, regulation, interaction, conserved elements, phenotype information, expert-curated descriptions derived from UniProt, TAIR and NCBI GeneRIF, as well as references to provide clues for functional studies of TFs. To help identify evolutionary relationships among identified TFs, we assigned 69 450 TFs into 3924 orthologous groups, and constructed 9217 phylogenetic trees for TFs within the same families or same orthologous groups, respectively. In addition, we set up a TF prediction server in this version for users to identify TFs from their own sequences.


Year:  2013        PMID: 24174544      PMCID: PMC3965000          DOI: 10.1093/nar/gkt1016

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Transcription factors (TFs) play key roles in plant development and stress response by temporally and spatially regulating the transcription of their target genes. TFs are usually classified into different families based on their DNA-binding domains (DBDs). In 2000, Riechmann et al. (1) made the first attempt at a genome-wide analysis of TFs in Arabidopsis thaliana soon after its whole genome sequence became available. In the following years, several databases dedicated to the identification and annotation of plant TFs became publicly available, either for multiple species, such as PlnTFDB (2), PlanTAPDB (3), GRASSIUS (4), LegumeTFDB (5), DATFAP (6) and TreeTFDB (7), or for individual organisms, such as AGRIS (8), RARTF (9), TOBFAC (10), SoyDB (11) and wDBTF (12). During the past 8 years, we have constructed three species-specific TF databases, DATF (13), DRTF (14) and DPTF (15), for the model organisms Arabidopsis, rice and poplar, as well as a comprehensive plant TF database (PlantTFDB) (16,17). The databases we constructed received >10 million hits per year and were widely used for the functional and evolutionary study of plant TFs, as well as for the prediction and annotation of TFs in newly sequenced genomes. To meet requirements from our user community, we updated PlantTFDB to version 3.0 (http://planttfdb.cbi.pku.edu.cn/). In comparison with the previous two versions, PlantTFDB 3.0 covers more species and more TFs, identified by the refined family assignment rules and improved prediction pipeline. In addition, new types of annotations were added, and phylogenetic trees and orthologous groups (OGs) were re-constructed. Finally, an online TF prediction server was set up (Table 1).
Table 1.

Comparison among the three versions of PlantTFDB

PlantTFDB                            Version 1.0   Version 2.0   Version 3.0
Species                              22            49            83
Species with genome sequences        5             28            67
Species without genome sequences     17            21            16
TF family                            64            58            58
TF number                            26 402        53 574        129 288
Annotation
    Expert-curated description       No            No            Yes
    Expression                       Yes           Yes           Yes
    Regulation                       No            No            Yes
    Interaction                      No            No            Yes
    Phenotype                        No            No            Yes
    Reference                        Yes           Yes           Yes
Orthologous group                    Yes           Yes           Yes
Phylogenetic tree
    Family                           No            Yes           Yes
    Orthologous group                No            No            Yes
Web service                          No            Yes           No
TF prediction server                 No            No            Yes

We believe that PlantTFDB 3.0 provides users with complete TF datasets, comprehensive annotations and useful analysis tools.

MATERIALS AND METHODS

Figure 1 shows the main steps in the construction of PlantTFDB 3.0, including data integration, TF classification, TF annotation and construction of orthologous groups.
Figure 1.

The flowchart for construction of PlantTFDB 3.0.


Sequence data

We downloaded protein sequences of 67 species with genome sequences from the Joint Genome Institute (JGI) and several other institutions engaged in plant genome sequencing and annotation projects (Supplementary Table S1). For 16 species without genome sequences, we downloaded their expressed sequence tag sequences from UniGene (18) and PlantGDB-assembled unique transcripts from PlantGDB (19), and then built a reference proteome for each species (Supplementary Table S2) using a previously established pipeline (17).

Family assignment rules

TFs are usually classified into different families based on their DBDs. We used auxiliary and forbidden domains to distinguish complicated TF families with multiple signature domains. After a comprehensive literature review, we improved the family assignment rules described in the previous version (17) and arranged several families into superfamilies (Figure 2). We removed the forbidden domain Glyco_hydro_14 of the BES1 family, as recent studies demonstrated that BES1 family proteins with this domain also showed TF activity (20).
Figure 2.

Refined family assignment rules used for TF identification and assignment. Green ellipses represent TF families and red rectangles represent DBDs. Blue rectangles denote auxiliary domains and purple rectangles denote forbidden domains. Green solid lines link families to DBDs or auxiliary domains, and the number ‘1’ or ‘2’ indicates the number of DBDs. Red dashed lines link families to forbidden domains. Families belonging to the same superfamily are arranged within rectangles or rhombi.
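The rule structure described above (a signature DBD with a required copy number, optional auxiliary domains and forbidden domains) can be sketched in a few lines of Python. This is only an illustration of the logic, not the PlantTFDB implementation: the `FamilyRule` class, the `assign_family` function and the three toy rules are hypothetical, and the real rule set is the one shown in Figure 2.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Sequence


@dataclass(frozen=True)
class FamilyRule:
    """One family assignment rule (hypothetical structure for illustration)."""
    family: str
    dbd: str                                  # signature DNA-binding domain
    min_dbd: int = 1                          # required number of DBD copies
    auxiliary: FrozenSet[str] = frozenset()   # domains that must also be present
    forbidden: FrozenSet[str] = frozenset()   # domains that must be absent


# Toy rules, checked in order; they are assumptions, not the published rule set.
ILLUSTRATIVE_RULES = (
    FamilyRule("AP2", dbd="AP2", min_dbd=2),
    FamilyRule("RAV", dbd="AP2", auxiliary=frozenset({"B3"})),
    FamilyRule("ERF", dbd="AP2", forbidden=frozenset({"B3"})),
)


def assign_family(domains: Iterable[str],
                  rules: Sequence[FamilyRule] = ILLUSTRATIVE_RULES) -> Optional[str]:
    """Return the first family whose rule is satisfied, or None if none match."""
    counts = Counter(domains)
    for rule in rules:
        if counts[rule.dbd] < rule.min_dbd:
            continue  # not enough copies of the signature DBD
        if any(counts[d] == 0 for d in rule.auxiliary):
            continue  # a required auxiliary domain is missing
        if any(counts[d] > 0 for d in rule.forbidden):
            continue  # a forbidden domain is present
        return rule.family
    return None


if __name__ == "__main__":
    print(assign_family(["AP2", "AP2"]))
    print(assign_family(["AP2", "B3"]))
    print(assign_family(["AP2"]))
```

Checking rules in a fixed order is a design choice of this sketch; it lets a more specific rule (two DBD copies, or DBD plus auxiliary domain) take precedence over a more general one.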


Prediction pipeline

We refined the TF prediction pipeline by updating the hidden Markov model (HMM) profiles used to identify TFs and adjusting their thresholds. We downloaded the latest version of HMM profiles from Pfam (version 27.0) (21) for most signature domains and built our own HMM profiles for the remaining domains that did not have Pfam HMM profiles available. We used HMMER 3.0 (22) to identify TFs and assigned them into different families according to the family assignment rules described earlier.
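As a rough sketch of the thresholding step, the snippet below collects domain hits per protein from HMMER 3 per-domain tabular output. It assumes the table was written by `hmmscan --domtblout`, so that the first column is the HMM name and the fourth is the protein; the function name, the use of the independent E-value as the filter, and the threshold values are assumptions, since the text does not give the actual per-domain cutoffs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def domain_hits(domtblout_lines: Iterable[str],
                thresholds: Dict[str, float],
                default_evalue: float = 1e-5) -> Dict[str, List[str]]:
    """Group domain hits by protein, keeping hits that pass an E-value cutoff.

    `thresholds` maps an HMM name to its maximal independent E-value;
    domains without an entry fall back to `default_evalue` (hypothetical values).
    """
    hits: Dict[str, List[str]] = defaultdict(list)
    for line in domtblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        domain = fields[0]              # target name (the HMM, for hmmscan)
        protein = fields[3]             # query name (the protein, for hmmscan)
        i_evalue = float(fields[12])    # independent E-value of this domain
        if i_evalue <= thresholds.get(domain, default_evalue):
            hits[protein].append(domain)
    return dict(hits)
```

The resulting per-protein domain lists are the kind of input a family assignment step would consume.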

Annotation pipeline

We used a pipeline comprising several packages to annotate identified TFs. Domain structure and GO annotation were predicted by InterProScan (version 4.8) (23). Cross-links to well-known resources were assigned from the best BLAST hits with a maximal E-value of 1e-10. Nuclear localization signals were predicted by PredictNLS (24). Other information, such as expert-curated descriptions, expression, regulation, conserved elements and references, was collected from the corresponding databases. Multiple sequence alignments (MSAs) for DBDs were constructed by an HMM-guided method, and MSAs for full-length protein sequences were inferred by T-Coffee (version 9.03) (25). Family trees across 83 species were inferred by FastTree (version 2.1.3) (26) with 100 resamplings. Family trees within each species were inferred by MrBayes (version 3.2.1) (27) based on the Dayhoff model for 50 000 generations. The Help page (http://planttfdb.cbi.pku.edu.cn/help_info.php#tfinfo) gives more detailed information on datasets and parameter settings.
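The cross-linking step, keeping the best BLAST hit per TF subject to the 1e-10 E-value limit, could look roughly like the following. The sketch assumes 12-column tab-separated BLAST output (the default tabular layout, with E-value in column 11 and bit score in column 12) and ranks hits by bit score; neither the output format nor the ranking criterion is stated in the text, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def best_blast_hits(tabular_lines: Iterable[str],
                    max_evalue: float = 1e-10) -> Dict[str, Tuple[str, float, float]]:
    """Return {query: (subject, evalue, bitscore)} for the best hit per query.

    Hits with an E-value above `max_evalue` are ignored; among the rest,
    the hit with the highest bit score wins (an assumption of this sketch).
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # fails the E-value cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```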

Orthologous groups

Orthologous groups were inferred with the pipeline used in Plaza (28), as outlined in Figure 3.
Figure 3.

The pipeline for construction of orthologous groups.

First, we selected a representative gene model for each locus from the 67 species with genome sequences and filtered out proteins shorter than 50 aa. Then we classified these proteins into clusters with TribeMCL (29). After that, proteins within the same cluster were assigned to orthologous groups by OrthoMCL (30). For TFs in the same orthologous group, MSAs were constructed with T-Coffee and phylogenetic trees were inferred with MrBayes (27) using the same parameters described earlier.
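The first step of this pipeline, one representative protein per locus followed by a length filter, can be sketched as below. The text does not say how the representative gene model is chosen; this sketch assumes the longest protein per locus, and the function name and input layout are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def representative_proteins(models: Iterable[Tuple[str, str, str]],
                            min_length: int = 50) -> Dict[str, Tuple[str, str]]:
    """Pick one gene model per locus and drop short proteins.

    `models` yields (locus_id, model_id, protein_sequence) tuples.
    Assumption: the longest protein of a locus is its representative.
    Returns {locus_id: (model_id, protein_sequence)}.
    """
    longest: Dict[str, Tuple[str, str]] = {}
    for locus, model, seq in models:
        if locus not in longest or len(seq) > len(longest[locus][1]):
            longest[locus] = (model, seq)
    # Filter out representatives shorter than min_length amino acids.
    return {locus: rep for locus, rep in longest.items()
            if len(rep[1]) >= min_length}
```

The surviving proteins would then be passed to the clustering and orthology tools named above.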

RESULTS AND DISCUSSION

Genomic TF repertoires of green plants

Using the refined TF prediction pipeline, we identified 129 288 TFs (116 585 loci) from 2 691 496 proteins (2 437 666 loci) of 83 species (Table 2, Supplementary Tables S3 and S4).
Table 2.

Average number of TFs in different taxonomic lineages summarized from 67 species with genome sequences

Lineage                    Species   Gene      TF (%)        Family
Chlorophyta                10        10 550    141 (1.34)    35
Bryophyta (a)              1         32 273    1079 (3.34)   53
Lycopodiophyta (b)         1         22 271    665 (2.99)    54
Coniferophyta (c)          1         71 158    1851 (2.60)   55
Basal Magnoliophyta (d)    1         26 846    900 (3.35)    58
Monocot                    15        34 017    1701 (5.00)   58
Eudicot                    38        34 798    1861 (5.35)   58

(a) Physcomitrella patens.
(b) Selaginella moellendorffii.
(c) Picea abies.
(d) Amborella trichopoda.

The increased number of species with genome sequences and the availability of a conifer genome (31) gave us the chance to show the genomic TF repertoires across green plants for the first time (Table 2, Supplementary Table S3). Compared with green algae, land plants show a large increase in the number of TF families, the number of TFs and the percentage of TFs in their genomes, which might correlate with the morphological complexity of land plants (32).
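The TF percentages in Table 2 follow directly from the average gene and TF counts per lineage. The short check below recomputes them; the counts are copied from Table 2, while the variable and function names are only illustrative.

```python
# Average (gene count, TF count) per lineage, copied from Table 2.
TABLE2 = {
    "Chlorophyta": (10550, 141),
    "Bryophyta": (32273, 1079),
    "Lycopodiophyta": (22271, 665),
    "Coniferophyta": (71158, 1851),
    "Basal Magnoliophyta": (26846, 900),
    "Monocot": (34017, 1701),
    "Eudicot": (34798, 1861),
}


def tf_percentage(genes: int, tfs: int) -> float:
    """Percentage of genes that encode TFs, rounded to two decimals."""
    return round(100.0 * tfs / genes, 2)


if __name__ == "__main__":
    for lineage, (genes, tfs) in TABLE2.items():
        print(f"{lineage}: {tf_percentage(genes, tfs):.2f}%")
```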

Comprehensive annotations for TFs

A database of well-annotated TFs may provide users with rich information as well as insightful clues for further study. In an attempt to construct a comprehensive knowledgebase for plant TFs, we collected expert-curated description, expression, regulation, mutation and phenotype data from various public resources and made annotations for identified TFs in PlantTFDB 3.0 (Table 3), in addition to the abundant annotations provided in the previous two versions (16,17). By integrating information from Entrez Gene (33), UniProtKB (34) and GeneRIF (33) with references we mined ourselves, we added related references for TFs.
Table 3.

Summary of annotations for TFs in PlantTFDB 3.0

Type (a)    Species    TF    Entry
Expert-curated description2221286649
Expression
    UniGene4444 86245 239
    Microarray1415 42431 975
 Plant ontology56850174 162
Regulation
 Binding site/matrix24541729
 ChIP-chip/ChIP-seq15475
 microRNA12843
 Hormone1417803
Interaction109923101
Conserved element2370963 859
Phenotype24704147 684
Reference59500420 255

(a) New types of annotations in this version are marked in bold.

Evolutionarily conserved elements may work as transcriptional regulatory elements (35,36). Therefore, in addition to the functional genomic annotations described earlier, we collected these elements, which were identified from the genome alignments of 9 crucifers (36) and 20 angiosperm plants (37), and added them to the current version.

Orthologs usually have similar functions and are widely used to explore the functions of poorly studied proteins. To help users infer the functions of poorly studied TFs, we constructed MSAs and phylogenetic trees within the same family across 83 species, based on conserved DBDs. We further assigned 69 450 TFs into 3924 orthologous groups and constructed phylogenetic trees for each orthologous group. As an aid to deciphering their evolutionary relationships, we also built trees for individual TF families within the same species. Hyperlinks to TF pages were added to the tree branches so that users can browse them conveniently. The MSAs and phylogenetic trees in PlantTFDB 3.0 can be freely downloaded for further analyses. Direct links to TFs of A. thaliana, the best-studied model plant and the best-annotated species in PlantTFDB 3.0, were also generated for all TFs in other species.

TF prediction server

In recent years, the TF classification rules we constructed have been widely used to annotate TFs of newly sequenced genomes (38,39). In this regard, we set up a TF prediction server (http://planttfdb.cbi.pku.edu.cn/prediction.php) for users to identify TFs from their own protein sequences. As A. thaliana is the best-annotated species in PlantTFDB 3.0, links to the best hits in A. thaliana are provided for predicted TFs. Currently, users can upload up to 100 sequences and obtain results within a minute from our server.
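Because the server accepts at most 100 sequences per upload, a larger protein set has to be split before submission. The helper below is a minimal sketch of that splitting for FASTA-formatted text; it does not talk to the server, and the function name and the choice of returning (header, sequence) pairs are assumptions.

```python
from typing import Iterator, List, Tuple


def fasta_batches(fasta_text: str,
                  batch_size: int = 100) -> Iterator[List[Tuple[str, str]]]:
    """Yield lists of (header, sequence) pairs with at most batch_size records."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may wrap over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]
```

Each yielded batch can be written back to a FASTA file and uploaded separately.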

Further direction

We have updated PlantTFDB to version 3.0, which provides TF repertoires across the main lineages of green plants. The knowledge we collected and the OGs and phylogenetic trees we inferred are useful resources for further exploration of the physiological functions and evolutionary relationships of TFs. We will continue to work on this project to refine the family assignment rules and the prediction pipeline, and to collect more types of useful information for identified TFs in the future.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

National Natural Science Foundation of China [31071160, 31171242]; China High-Tech Program [2006AA02Z334, 2012AA020409]; China National Key Basic Research Program [2011CBA01102]; China National Outstanding Youth Talents Program; China National Science and Technology Infrastructure Program [2009FY120100]. Funding for open access charge: State Key Laboratory of Protein and Plant Gene Research.

Conflict of interest statement. None declared.

REFERENCES (10 of 38 listed)

1.  T-Coffee: A novel method for fast and accurate multiple sequence alignment.

Authors:  C Notredame; D G Higgins; J Heringa
Journal:  J Mol Biol       Date:  2000-09-08       Impact factor: 5.469

2.  MrBayes 3: Bayesian phylogenetic inference under mixed models.

Authors:  Fredrik Ronquist; John P Huelsenbeck
Journal:  Bioinformatics       Date:  2003-08-12       Impact factor: 6.937

3.  β-amylase-like proteins function as transcription factors in Arabidopsis, controlling shoot growth and development.

Authors:  Heike Reinhold; Sebastian Soyk; Klára Simková; Carmen Hostettler; John Marafino; Samantha Mainiero; Cara K Vaughan; Jonathan D Monroe; Samuel C Zeeman
Journal:  Plant Cell       Date:  2011-04-12       Impact factor: 11.277

4.  GRASSIUS: a platform for comparative regulatory genomics across the grasses.

Authors:  Alper Yilmaz; Milton Y Nishiyama; Bernardo Garcia Fuentes; Glaucia Mendes Souza; Daniel Janies; John Gray; Erich Grotewold
Journal:  Plant Physiol       Date:  2008-11-05       Impact factor: 8.340

5.  wDBTF: an integrated database resource for studying wheat transcription factor families.

Authors:  Isabelle Romeuf; Dominique Tessier; Mireille Dardevet; Gérard Branlard; Gilles Charmet; Catherine Ravel
Journal:  BMC Genomics       Date:  2010-03-18       Impact factor: 3.969

6.  DPTF: a database of poplar transcription factors.

Authors:  Qi-Hui Zhu; An-Yuan Guo; Ge Gao; Ying-Fu Zhong; Meng Xu; Minren Huang; Jinchu Luo
Journal:  Bioinformatics       Date:  2007-03-28       Impact factor: 6.937

7.  An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions.

Authors:  Annabelle Haudry; Adrian E Platts; Emilio Vello; Douglas R Hoen; Mickael Leclercq; Robert J Williamson; Ewa Forczek; Zoé Joly-Lopez; Joshua G Steffen; Khaled M Hazzouri; Ken Dewar; John R Stinchcombe; Daniel J Schoen; Xiaowu Wang; Jeremy Schmutz; Christopher D Town; Patrick P Edger; J Chris Pires; Karen S Schumaker; David E Jarvis; Terezie Mandáková; Martin A Lysak; Erik van den Bergh; M Eric Schranz; Paul M Harrison; Alan M Moses; Thomas E Bureau; Stephen I Wright; Mathieu Blanchette
Journal:  Nat Genet       Date:  2013-06-30       Impact factor: 38.330

8.  The Pfam protein families database.

Authors:  Marco Punta; Penny C Coggill; Ruth Y Eberhardt; Jaina Mistry; John Tate; Chris Boursnell; Ningze Pang; Kristoffer Forslund; Goran Ceric; Jody Clements; Andreas Heger; Liisa Holm; Erik L L Sonnhammer; Sean R Eddy; Alex Bateman; Robert D Finn
Journal:  Nucleic Acids Res       Date:  2011-11-29       Impact factor: 16.971

9.  TreeTFDB: an integrative database of the transcription factors from six economically important tree crops for functional predictions and comparative and functional genomics.

Authors:  Keiichi Mochida; Takuhiro Yoshida; Tetsuya Sakurai; Kazuko Yamaguchi-Shinozaki; Kazuo Shinozaki; Lam-Son Phan Tran
Journal:  DNA Res       Date:  2013-01-02       Impact factor: 4.458

10.  PlantTFDB: a comprehensive plant transcription factor database.

Authors:  An-Yuan Guo; Xin Chen; Ge Gao; He Zhang; Qi-Hui Zhu; Xiao-Chuan Liu; Ying-Fu Zhong; Xiaocheng Gu; Kun He; Jingchu Luo
Journal:  Nucleic Acids Res       Date:  2007-10-12       Impact factor: 16.971
