PlantTFDB 2.0: update and improvement of the comprehensive plant transcription factor database.

He Zhang, Jinpu Jin, Liang Tang, Yi Zhao, Xiaocheng Gu, Ge Gao, Jingchu Luo.

Abstract

We updated the plant transcription factor (TF) database to version 2.0 (PlantTFDB 2.0, http://planttfdb.cbi.pku.edu.cn), which contains 53,319 putative TFs predicted from 49 species. These TFs were classified into 58 newly defined families by a combination of computational prediction and manual inspection, and annotated in detail with general information, domain features, gene ontology, expression patterns and ortholog groups, as well as cross-references to various databases and literature citations. Multiple sequence alignments for each family can be viewed as WebLogo pictures or downloaded as text files, and phylogenetic trees can be displayed online or downloaded. We have redesigned the user interface in the new version. Users can search TFs with much more flexibility through the improved advanced search page, and the search results can be exported in various formats for further analysis. In addition, we now provide a web service that lets advanced users access PlantTFDB 2.0 more efficiently.

Year:  2010        PMID: 21097470      PMCID: PMC3013715          DOI: 10.1093/nar/gkq1141

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Transcription factors (TFs) are key regulators of transcription in biological processes (1). In recent years, several databases of plant TFs and other transcription regulators have become publicly available, such as PlnTFDB (2), PlantTAPDB (3), GRASSIUS (4), DATFAP (5), AGRIS (6), RARTF (7), LegumeTFDB (8) and TOBFAC (9). Starting in 2005, we constructed several species-specific plant TF databases from the available genome sequences of Arabidopsis (DATF) (10), rice (DRTF) (11) and poplar (DPTF) (12), and integrated them into a comprehensive plant TF database (PlantTFDB 1.0) (13) with 26 402 TFs identified from 22 species. Of these 22 plants, five have completed genome sequences and the others are represented by unique transcripts integrated by PlantGDB (14). PlantTFDB 1.0 has received millions of web hits since it went online in July 2007.

With the rapid increase of plant genome sequences in public databases, we have updated PlantTFDB from version 1.0 to version 2.0. PlantTFDB 2.0 contains TFs from 49 species covering the main lineages of the plant kingdom: 9 green algae, 1 moss, 1 fern, 3 gymnosperms and 35 angiosperms. Using the refined pipeline, a total of 53 319 TFs were identified from these 49 species and classified into 58 families. We performed both computational annotation and manual curation for these putative TFs. To infer the evolutionary relationships among the identified TFs, we constructed phylogenetic trees for each TF family and predicted ortholog groups for the TFs identified from species with completed genome sequences.

The web interface of PlantTFDB 2.0 was redesigned to provide users with more flexible search functionality. In addition to access through a web browser, a standard web service interface is now supported so that advanced users can retrieve data from PlantTFDB 2.0 in batch mode or integrate PlantTFDB 2.0 data into their own websites. All resources in PlantTFDB 2.0 can be browsed, retrieved and downloaded freely.

RESULTS AND DISCUSSION

Improved identification pipeline for plant TFs

While annotations generated by genome sequencing projects are the most abundant source of proteome data for a given species, their automated nature often produces incomplete or incorrect annotation (15). Dedicated sequence databases such as RefSeq (16), on the other hand, provide relatively high-quality, curation-based annotation, and expressed sequence tags (ESTs) are an important source to complement genome annotation. By integrating all existing annotations derived from genome annotation, RefSeq, PlantGDB (14) and UniGene (17), we compiled a non-redundant reference proteome data set for all 49 species (Supplementary Table S1, Supplementary Figures S1 and S2) for TF prediction.

TFs are characterized by their signature DNA-binding domains (DBDs). We employed HMMER 3.0 to identify these signature DBDs in the above proteome data set. In total, 64 HMM models were used to identify domains in TFs (Supplementary Table S2), of which 53 were collected from Pfam 24.0 (18) and 11 were built from sequences we collected locally. In the previous version, we set an E-value of 0.01 as the threshold for domain identification. Because the E-value depends on the size of the protein data set searched, in the current version we adopted domain-specific bit-score thresholds, chosen on the basis of manual inspection and literature review (Supplementary Tables S3 and S4).

In PlantTFDB 2.0, we adopted a slightly more stringent definition, under which TFs are ‘proteins that show sequence-specific DNA binding and are capable of activating or/and repressing transcription’ (19). We made an extensive literature review and refined the rule-based classification scheme accordingly (Figure 1 and Supplementary Table S5). We excluded families that do not meet the above criteria (Supplementary Table S6), including transcription cofactors and chromatin-related proteins such as remodeling factors, histone demethylases, DNA methyltransferases and histone acetyltransferases. Families such as TUBBY-like and Alfin-like were also removed because they have been questioned or disproved by new experimental evidence. On the other hand, five newly identified TF families (DBB, FAR1, LSD, NF-X1 and STAT) were added. Owing to differences in domain composition, DNA-binding specificity and function, AP2/ERF and HB were divided into subfamilies. The M type of MADS TFs was classified as a new subfamily, since it has been reported that some M-type MADS-box genes could be pseudogenes or a new class of transposable element (19). In total, we predicted 53 319 TFs from 49 species and classified them into 58 families (Tables 1 and 2, Supplementary Tables S7 and S8) using the refined pipeline.
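The switch from an E-value cutoff to domain-specific bit scores can be illustrated with a small sketch. It assumes the usual relation that an E-value is the per-sequence P-value multiplied by the number of sequences searched, so the same hit can pass in a small proteome and fail in a large one, whereas a bit-score cutoff gives the same decision in both. The cutoff values, the hit and its P-value are made up for illustration; the thresholds actually used are those in Supplementary Tables S3 and S4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    protein_id: str
    domain: str
    bit_score: float
    p_value: float  # per-sequence P-value; does not depend on database size


# Hypothetical per-domain bit-score cutoffs (assumption, not the curated values).
BIT_SCORE_CUTOFF = {"WRKY": 25.0, "AP2": 20.0}


def e_value(hit: DomainHit, n_targets: int) -> float:
    """Expected number of chance hits: P-value times number of sequences searched."""
    return hit.p_value * n_targets


def passes_evalue(hit: DomainHit, n_targets: int, threshold: float = 0.01) -> bool:
    """Version 1.0 style filter: decision changes with the size of the data set."""
    return e_value(hit, n_targets) <= threshold


def passes_bit_score(hit: DomainHit) -> bool:
    """Version 2.0 style filter: decision depends only on the hit and its domain."""
    return hit.bit_score >= BIT_SCORE_CUTOFF[hit.domain]


# A made-up borderline hit, evaluated against two proteome sizes from Table 1:
# Ostreococcus sp. RCC809 (7484 proteins) and maize (62 184 proteins).
hit = DomainHit("demo_protein", "WRKY", bit_score=27.3, p_value=1e-6)
for n_targets in (7_484, 62_184):
    print(n_targets, passes_evalue(hit, n_targets), passes_bit_score(hit))
```

With these assumed numbers the hit is accepted by the E-value filter in the small proteome but rejected in the large one, while the bit-score filter accepts it in both.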
Figure 1.

Family assignment rules used to identify TFs and assign them to families. Green ellipses represent TF families, and red rectangles denote DBDs. Blue and purple rectangles denote auxiliary and forbidden domains, respectively. Green solid lines link families to DBDs or auxiliary domains; the number ‘1’ or ‘2’ on a line indicates the number of DBDs. Red dashed lines link families to forbidden domains.
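The rule structure described in this caption (a signature DBD with a copy number, optional auxiliary domains, and forbidden domains) can be sketched as a small rule table. The rules below are illustrative assumptions loosely modeled on the B3 and AP2 domains, not the curated scheme of Figure 1 and Supplementary Table S5, and the first-match-wins ordering is a simplification chosen for this sketch.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class FamilyRule:
    family: str
    dbd: str                             # signature DNA-binding domain
    min_copies: int = 1                  # the '1' or '2' on a line in Figure 1
    auxiliary: frozenset = frozenset()   # domains that must accompany the DBD
    forbidden: frozenset = frozenset()   # domains that exclude the family


# Illustrative rules only (assumption). More specific rules come first
# because the first matching rule wins.
RULES = (
    FamilyRule("ARF", "B3", auxiliary=frozenset({"Auxin_resp"})),
    FamilyRule("RAV", "B3", auxiliary=frozenset({"AP2"})),
    FamilyRule("B3", "B3", forbidden=frozenset({"AP2", "Auxin_resp"})),
    FamilyRule("AP2", "AP2", min_copies=2),
    FamilyRule("ERF", "AP2", min_copies=1),
)


def assign_family(domains: Iterable[str]) -> Optional[str]:
    """Return the first family whose rule matches the protein's domains, else None."""
    counts = Counter(domains)
    present = set(counts)
    for rule in RULES:
        if counts[rule.dbd] < rule.min_copies:
            continue  # not enough copies of the signature DBD
        if not rule.auxiliary <= present:
            continue  # a required auxiliary domain is missing
        if rule.forbidden & present:
            continue  # a forbidden domain is present
        return rule.family
    return None


print(assign_family(["B3", "Auxin_resp"]))  # matches the ARF rule
print(assign_family(["AP2", "AP2"]))        # matches the two-copy AP2 rule
print(assign_family(["AP2"]))               # falls through to the ERF rule
```

A protein without any listed DBD is assigned to no family, which corresponds to not being called a TF under this scheme.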

Table 1.

Summary of TFs identified from species with genome sequences

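Lineage | Species | Common name | Protein | TF | (%) | Family | OG (a) | TFOG (a)
Monocotyledon | Brachypodium distachyon | Purple False Brome | 30 726 | 1687 | 5.49 | 56 | 1016 | 1271
Monocotyledon | Oryza sativa subsp. indica | Indian Rice | 43 027 | 1936 | 4.50 | 56 | 1427 | 1692
Monocotyledon | Oryza sativa subsp. japonica | Japanese Rice | 58 760 | 2424 | 4.13 | 56 | 1422 | 1636
Monocotyledon | Sorghum bicolor | Sorghum | 35 810 | 1819 | 5.08 | 54 | 1252 | 1583
Monocotyledon | Zea mays | Maize | 62 184 | 3355 | 5.40 | 56 | 1208 | 1762
Dicotyledon | Arabidopsis lyrata | Lyrate Rockcress | 32 233 | 1729 | 5.36 | 58 | 1298 | 1604
Dicotyledon | Arabidopsis thaliana | Thale Cress | 32 125 | 2016 | 6.28 | 58 | 1297 | 1609
Dicotyledon | Carica papaya | Papaya | 27 829 | 1387 | 4.98 | 58 | 881 | 1203
Dicotyledon | Cucumis sativus | Cucumber | 27 725 | 1769 | 6.38 | 57 | 894 | 1153
Dicotyledon | Glycine max | Soybean | 48 707 | 3546 | 7.28 | 57 | 1148 | 3057
Dicotyledon | Lotus japonicus |  | 27 974 | 1275 | 4.56 | 56 | 752 | 986
Dicotyledon | Manihot esculenta | Cassava | 46 478 | 2201 | 4.74 | 58 | 1084 | 1922
Dicotyledon | Medicago truncatula | Barrel Medic | 52 086 | 1605 | 3.08 | 56 | 823 | 1272
Dicotyledon | Mimulus guttatus | Spotted Monkey Flower | 27 989 | 1681 | 6.01 | 57 | 863 | 1345
Dicotyledon | Populus trichocarpa | Western Balsam Poplar | 45 183 | 2585 | 5.72 | 58 | 1086 | 2195
Dicotyledon | Prunus persica | Peach | 28 299 | 1513 | 5.35 | 58 | 1006 | 1380
Dicotyledon | Ricinus communis | Castor Bean | 31 953 | 1291 | 4.04 | 57 | 994 | 1170
Dicotyledon | Vitis vinifera | Wine Grape | 47 097 | 2436 | 5.17 | 58 | 921 | 1207
Fern | Selaginella moellendorffii |  | 32 969 | 971 | 2.95 | 55 | 411 | 856
Moss | Physcomitrella patens subsp. patens |  | 40 604 | 1188 | 2.93 | 53 | 322 | 863
Green alga | Chlamydomonas reinhardtii |  | 23 042 | 224 | 0.97 | 30 | 123 | 136
Green alga | Chlorella sp. NC64A |  | 9762 | 163 | 1.67 | 28 | 94 | 120
Green alga | Coccomyxa sp. C-169 |  | 9900 | 123 | 1.24 | 29 | 82 | 90
Green alga | Micromonas pusilla CCMP1545 |  | 10 518 | 141 | 1.34 | 32 | 119 | 124
Green alga | Micromonas sp. RCC299 |  | 10 074 | 153 | 1.52 | 32 | 124 | 134
Green alga | Ostreococcus lucimarinus CCE9901 |  | 7960 | 118 | 1.48 | 30 | 100 | 103
Green alga | Ostreococcus sp. RCC809 |  | 7484 | 100 | 1.34 | 29 | 95 | 97
Green alga | Ostreococcus tauri |  | 7654 | 97 | 1.27 | 26 | 89 | 91
Green alga | Volvox carteri |  | 15 416 | 168 | 1.09 | 28 | 125 | 137

(a) OG: number of ortholog groups including at least two TFs; TFOG: number of TFs in ortholog groups.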
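As a quick check on how the (%) column relates to the other columns, the sketch below recomputes it as the number of TFs divided by the number of proteins, using three rows of Table 1. The function name is an illustrative choice; the interpretation of (%) as TF count over proteome size is inferred from the table values.

```python
def tf_percentage(n_proteins: int, n_tfs: int) -> float:
    """Share of the reference proteome predicted to be TFs, in percent."""
    return round(100 * n_tfs / n_proteins, 2)


# (protein count, TF count) taken from Table 1.
table1_rows = {
    "Arabidopsis thaliana": (32_125, 2_016),
    "Zea mays": (62_184, 3_355),
    "Ostreococcus tauri": (7_654, 97),
}

for species, (n_proteins, n_tfs) in table1_rows.items():
    print(f"{species}: {tf_percentage(n_proteins, n_tfs):.2f}%")
```

The recomputed values agree with the tabulated 6.28, 5.40 and 1.27, and show the lower TF share in green algae compared with angiosperms.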

Table 2.

Summary of TFs identified from species without genome sequences

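Groups | Species | Common name | Protein | TF | (%) | Family
Monocotyledon | Hordeum vulgare | Barley | 24 020 | 778 | 3.24 | 54
Monocotyledon | Panicum virgatum | Switchgrass | 30 078 | 1140 | 3.79 | 52
Monocotyledon | Saccharum officinarum | Sugarcane | 21 172 | 671 | 3.17 | 48
Monocotyledon | Triticum aestivum | Wheat | 20 494 | 746 | 3.64 | 53
Dicotyledon | Arachis hypogaea | Peanut | 7243 | 219 | 3.02 | 39
Dicotyledon | Artemisia annua | Sweet Wormwood | 13 062 | 514 | 3.94 | 48
Dicotyledon | Brassica napus | Rape | 30 482 | 1334 | 4.38 | 53
Dicotyledon | Brassica rapa | Field Mustard | 14 313 | 718 | 5.02 | 49
Dicotyledon | Citrus sinensis | Valencia Orange | 13 522 | 534 | 3.95 | 46
Dicotyledon | Gossypium hirsutum | Upland Cotton | 20 862 | 1111 | 5.33 | 50
Dicotyledon | Helianthus annuus | Sunflower | 8634 | 279 | 3.23 | 44
Dicotyledon | Malus x domestica | Apple | 15 173 | 658 | 4.34 | 51
Dicotyledon | Nicotiana tabacum | Tobacco | 18 898 | 793 | 4.20 | 52
Dicotyledon | Raphanus sativus | Radish | 14 799 | 573 | 3.87 | 45
Dicotyledon | Solanum lycopersicum | Tomato | 15 722 | 799 | 5.08 | 54
Dicotyledon | Solanum tuberosum | Potato | 17 445 | 776 | 4.45 | 52
Dicotyledon | Theobroma cacao | Cocoa | 7493 | 239 | 3.19 | 44
Dicotyledon | Vigna unguiculata | Cowpea | 12 205 | 475 | 3.89 | 48
Gymnosperm | Picea glauca | White Spruce | 15 376 | 508 | 3.30 | 48
Gymnosperm | Picea sitchensis | Sitka Spruce | 10 989 | 319 | 2.90 | 47
Gymnosperm | Pinus taeda | Loblolly Pine | 13 275 | 434 | 3.27 | 47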

Comprehensive annotation for plant TFs

Comprehensive and accurate annotations derived from various sources provide valuable clues for further functional analysis. Using our established annotation pipeline, we performed systematic annotation of each family and each individual TF. The main page of each family has a distribution chart showing the number of TFs from each species in that family. The brief introduction and key references for each family were updated based on a literature survey. Multiple sequence alignments of the DBDs of each family, either within individual species or across species, can be viewed as WebLogo pictures or downloaded as text files. Phylogenetic trees can be displayed online or downloaded in Nexus format. Intra-species phylogenetic trees for each TF family were inferred with MrBayes (v3.2) (20) using the Dayhoff substitution model with 50 000 generations, and FastTree 2.1 (21) was employed to construct inter-species trees with 100 resamplings. Annotations at the individual TF level include general information, domain architecture, gene ontology, PDB hits, expression profiles, cross-references to other databases, ortholog groups, literature citations and links to other useful resources.

Improvement of user interface

We have redesigned the web interface of PlantTFDB 2.0, which now has a uniform interface for all species. Users can browse individual TFs of different families for each species by simply clicking the unique ID assigned to each TF. The text search page has been greatly improved and gives users much more flexibility for advanced searches. Users can select several species in the same or different lineages within the species tree to search TFs in one or more families. Users can combine several query conditions in a single search, including general descriptions, protein properties such as the range of sequence length, tissues of gene expression and different annotation fields of TF entries. Users can also customize the search results and save them in various formats for further processing. While a web browser is an easy and intuitive way for most users to access the resource, a web service is more efficient for advanced users who want to access the data and integrate it into their own sites. We implemented a standard web service interface for PlantTFDB 2.0 (http://planttfdb.cbi.pku.edu.cn/webservice/server.php). A demo client implemented in PHP is available to help users become familiar with the web service interface (http://planttfdb.cbi.pku.edu.cn/webservice_client/client.php).
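The way several query conditions combine in a single search can be sketched locally as a conjunction of filters over TF records. This is not the PlantTFDB web service API, whose operations are not described here; the record fields, helper names and demo entries are all hypothetical and only illustrate the kind of combined query described above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class TFRecord:
    tf_id: str
    species: str
    family: str
    length: int          # protein length in amino acids
    tissues: frozenset   # tissues with expression evidence


Condition = Callable[[TFRecord], bool]


def species_in(*names: str) -> Condition:
    allowed = set(names)
    return lambda rec: rec.species in allowed


def family_in(*names: str) -> Condition:
    allowed = set(names)
    return lambda rec: rec.family in allowed


def length_between(low: int, high: int) -> Condition:
    return lambda rec: low <= rec.length <= high


def expressed_in(tissue: str) -> Condition:
    return lambda rec: tissue in rec.tissues


def search(records: Iterable[TFRecord], *conditions: Condition) -> List[TFRecord]:
    """Return the records that satisfy every condition (logical AND)."""
    return [rec for rec in records if all(cond(rec) for cond in conditions)]


# Made-up records for demonstration; not real PlantTFDB entries.
RECORDS = [
    TFRecord("demo_001", "Arabidopsis thaliana", "WRKY", 310, frozenset({"root", "leaf"})),
    TFRecord("demo_002", "Oryza sativa subsp. japonica", "WRKY", 580, frozenset({"seed"})),
    TFRecord("demo_003", "Arabidopsis thaliana", "ERF", 240, frozenset({"root"})),
]

hits = search(
    RECORDS,
    species_in("Arabidopsis thaliana"),
    family_in("WRKY", "ERF"),
    length_between(200, 400),
    expressed_in("root"),
)
print([rec.tf_id for rec in hits])
```

Expressing each condition as an independent predicate keeps the query composable: adding or dropping a condition does not change the others.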

FURTHER DIRECTION

In conclusion, PlantTFDB 2.0 is not only an extensive update of the previous version, with 29 newly released completed genomes and updated data sets, but also a substantial improvement of the user interface. The pipeline we developed for genome-scale prediction of TFs and the scheme we defined to classify plant TF families may provide the user community with useful tools. We will continue this project and further update and improve PlantTFDB in the future.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

China 863 (2007AA02Z165), 973 (2007CB946904) and NSFC (31071160) programs. Funding for open access publication: China NSFC (31071160) program.

Conflict of interest statement. None declared.
  20 in total

1.  MrBayes 3: Bayesian phylogenetic inference under mixed models.

Authors:  Fredrik Ronquist; John P Huelsenbeck
Journal:  Bioinformatics       Date:  2003-08-12       Impact factor: 6.937

2.  DATF: a database of Arabidopsis transcription factors.

Authors:  Anyuan Guo; Kun He; Di Liu; Shunong Bai; Xiaocheng Gu; Liping Wei; Jingchu Luo
Journal:  Bioinformatics       Date:  2005-02-24       Impact factor: 6.937

3.  FastTree 2--approximately maximum-likelihood trees for large alignments.

Authors:  Morgan N Price; Paramvir S Dehal; Adam P Arkin
Journal:  PLoS One       Date:  2010-03-10       Impact factor: 3.240

4.  RARTF: database and tools for complete sets of Arabidopsis transcription factors.

Authors:  Kei Iida; Motoaki Seki; Tetsuya Sakurai; Masakazu Satou; Kenji Akiyama; Tetsuro Toyoda; Akihiko Konagaya; Kazuo Shinozaki
Journal:  DNA Res       Date:  2005       Impact factor: 4.458

5.  The Pfam protein families database.

Authors:  Robert D Finn; Jaina Mistry; John Tate; Penny Coggill; Andreas Heger; Joanne E Pollington; O Luke Gavin; Prasad Gunasekaran; Goran Ceric; Kristoffer Forslund; Liisa Holm; Erik L L Sonnhammer; Sean R Eddy; Alex Bateman
Journal:  Nucleic Acids Res       Date:  2009-11-17       Impact factor: 16.971

6.  LegumeTFDB: an integrative database of Glycine max, Lotus japonicus and Medicago truncatula transcription factors.

Authors:  Keiichi Mochida; Takuhiro Yoshida; Tetsuya Sakurai; Kazuko Yamaguchi-Shinozaki; Kazuo Shinozaki; Lam-Son Phan Tran
Journal:  Bioinformatics       Date:  2009-11-17       Impact factor: 6.937

7.  DRTF: a database of rice transcription factors.

Authors:  Ge Gao; Yingfu Zhong; Anyuan Guo; Qihui Zhu; Wen Tang; Weimou Zheng; Xiaocheng Gu; Liping Wei; Jingchu Luo
Journal:  Bioinformatics       Date:  2006-03-21       Impact factor: 6.937

8.  AGRIS and AtRegNet. a platform to link cis-regulatory elements and transcription factors into regulatory networks.

Authors:  Saranyan K Palaniswamy; Stephen James; Hao Sun; Rebecca S Lamb; Ramana V Davuluri; Erich Grotewold
Journal:  Plant Physiol       Date:  2006-03       Impact factor: 8.340

9.  Database resources of the National Center for Biotechnology Information.

Authors:  Eric W Sayers; Tanya Barrett; Dennis A Benson; Evan Bolton; Stephen H Bryant; Kathi Canese; Vyacheslav Chetvernin; Deanna M Church; Michael Dicuccio; Scott Federhen; Michael Feolo; Lewis Y Geer; Wolfgang Helmberg; Yuri Kapustin; David Landsman; David J Lipman; Zhiyong Lu; Thomas L Madden; Tom Madej; Donna R Maglott; Aron Marchler-Bauer; Vadim Miller; Ilene Mizrachi; James Ostell; Anna Panchenko; Kim D Pruitt; Gregory D Schuler; Edwin Sequeira; Stephen T Sherry; Martin Shumway; Karl Sirotkin; Douglas Slotta; Alexandre Souvorov; Grigory Starchenko; Tatiana A Tatusova; Lukas Wagner; Yanli Wang; W John Wilbur; Eugene Yaschenko; Jian Ye
Journal:  Nucleic Acids Res       Date:  2009-11-12       Impact factor: 16.971

10.  PlnTFDB: updated content and new features of the plant transcription factor database.

Authors:  Paulino Pérez-Rodríguez; Diego Mauricio Riaño-Pachón; Luiz Gustavo Guedes Corrêa; Stefan A Rensing; Birgit Kersten; Bernd Mueller-Roeber
Journal:  Nucleic Acids Res       Date:  2009-10-25       Impact factor: 16.971
