PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants.

Jinpu Jin, Feng Tian, De-Chang Yang, Yu-Qi Meng, Lei Kong, Jingchu Luo, Ge Gao.

Abstract

With the goal of providing a comprehensive, high-quality resource for both plant transcription factors (TFs) and their regulatory interactions with target genes, we upgraded the plant TF database PlantTFDB to version 4.0 (http://planttfdb.cbi.pku.edu.cn/). In the new version, we identified 320 370 TFs from 165 species, presenting a more comprehensive set of genomic TF repertoires of green plants. Besides updating the pre-existing abundant functional and evolutionary annotation for identified TFs, we generated three new types of annotation that provide more direct clues to the underlying functional mechanisms: (i) a set of high-quality, non-redundant TF binding motifs derived from experiments; (ii) multiple types of regulatory elements identified from high-throughput sequencing data; (iii) regulatory interactions curated from literature and inferred by combining TF binding motifs and regulatory elements. In addition, we upgraded the previous TF prediction server and set up four novel tools for regulation prediction and functional enrichment analyses. Finally, we set up a novel companion portal, PlantRegMap (http://plantregmap.cbi.pku.edu.cn), for users to access the regulation resource and analysis tools conveniently.
© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2016        PMID: 27924042      PMCID: PMC5210657          DOI: 10.1093/nar/gkw982

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Transcription factors (TFs) temporally and spatially turn on or off the transcription of their target genes by binding certain upstream elements. Since the first attempt to identify the TF repertoire of the Arabidopsis thaliana genome in 2000 (1), several dedicated TF databases for individual or multiple species (2–10) have become publicly available, effectively advancing the study of plant transcriptional regulatory systems. In response to the demand for systematic identification and annotation of plant TFs, we have developed three species-specific TF databases (DATF (11) for Arabidopsis, DRTF (12) for rice, and DPTF (13) for poplar) and three versions of the comprehensive plant TF database (PlantTFDB) (14–17) during the past decade. These resources receive tens of millions of web hits annually and have been widely used for the functional and evolutionary study of plant TFs and for the identification and annotation of TFs in newly sequenced plants (18,19). The regulatory interactions between TFs and target genes provide crucial information for investigating the functional mechanisms of TFs. Recently, the DNA binding motifs of hundreds of plant TFs have been determined (20–22); combined with genomic regulatory elements identified from ChIP-seq and DNase-seq (23), they give us an opportunity to systematically map the downstream regulatory interactions of TFs. To meet the urgent demand of our user community for transcriptional regulation information, we upgraded PlantTFDB to version 4.0 (http://planttfdb.cbi.pku.edu.cn/). Compared with previous versions, PlantTFDB 4.0 covers more species and more TFs, with updated functional and evolutionary annotation. Moreover, three new types of annotation have been added: binding motifs of TFs, regulatory elements, and the interactions between them. In addition, we upgraded the previous TF prediction server and set up four novel tools for regulation prediction and functional enrichment analyses (Figure 1). Finally, we set up a new portal, PlantRegMap (http://plantregmap.cbi.pku.edu.cn), for users to access the regulation resource and analysis tools conveniently.
Figure 1.

The flowchart for construction of PlantTFDB 4.0. Items marked with ‘(W)’ are web services. RBH means BLAST reciprocal best hit, and OG means orthologous group.

We believe that PlantTFDB 4.0 provides users with a comprehensive resource of plant TFs, regulatory elements and the regulatory interactions between them, as well as useful analysis tools, advancing research in the community.

RESULTS AND DISCUSSION

More representative genomic TF repertoires of green plants

By incorporating 89 newly sequenced genomes released during the past three years, we collected a total of 156 species with genome sequences. Using our established TF prediction pipeline (16) and protein sequences downloaded from the Joint Genome Institute and other plant genome sequencing and annotation projects (see http://planttfdb.cbi.pku.edu.cn/help_datasrc.php for detailed data sources), we identified 315 099 TFs (266 849 loci) from 6 381 676 proteins (5 555 839 loci) of 156 species (Supplementary Table S1). In addition, 5271 TFs from 9 species in version 3.0 that still lack genome sequences were kept, giving 320 370 TFs from 165 species in total. By incorporating the 89 new species with genome sequences, we provide genomic TF repertoires covering two more early-diverged lineages of green plants, Charophyta and Marchantiophyta (Figure 2A), and present a more refined profile of the origination of TF families (Figure 2B) than previous studies (24). In addition, the inclusion of more wild relatives of domesticated species (e.g., Oryza rufipogon and Oryza nivara for Oryza sativa) facilitates the study of crop domestication at the transcriptional regulation level and further crop breeding.
Figure 2.

Summary of TFs identified from 156 species with genome sequences. (A) Average number of TFs in different taxonomic lineages. (B) The origination stage of TF families. The number in the parentheses (e.g., Eudicots (95)) is the number of species used to date the origination stage for TF families according to the protocol described in (24).

Updated pipelines for evolutionary and functional annotation

As part of our continuous effort to improve annotation coverage and quality, we not only updated all existing annotation from multiple canonical sources (24–27) but also refined the annotation pipelines for Orthologous Group (OG) inference and Gene Ontology (GO) term assignment. Orthologous genes usually have similar functions and are widely used to infer function across species. Given the recently reported methodological bias of OrthoMCL (28), we employed a newer tool, OrthoFinder (28), to infer OGs for proteins from the 156 species with genome sequences (Figure 3). We obtained 2368 TF OGs for all species, 800 TF OGs for 17 representative species (Supplementary Table S2), and 209, 1898, 1261, 1846 and 1900 TF OGs for the clades of Chlorophytae, Monocots, Asterids, Fabids and Malvids, respectively. We also identified orthologous pairs under a stricter criterion, BLAST reciprocal best hits (RBHs), for all species pairs among the 156 species with genome sequences. These pairs serve as a basis for transferring annotation between species (e.g., from well-studied model plants like Arabidopsis thaliana to poorly annotated ones).
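The RBH criterion mentioned above can be illustrated with a minimal Python sketch. It assumes that BLAST hits have already been parsed into (query, subject, bit score) tuples for each search direction; the function names, gene identifiers and scores are hypothetical, and this is not the pipeline used to build the database.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, bit score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Keep, for each query, the subject with the highest bit score.

    On ties, the first hit encountered wins (an arbitrary choice for this sketch).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy hits between two species (hypothetical identifiers and scores).
a_vs_b = [("AT1", "OS1", 250.0), ("AT1", "OS2", 90.0), ("AT2", "OS1", 120.0)]
b_vs_a = [("OS1", "AT1", 245.0), ("OS2", "AT1", 88.0)]
rbh_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(rbh_pairs)
```

In this toy input, AT2 also hits OS1, but OS1's best hit is AT1, so only the AT1/OS1 pair satisfies the reciprocal criterion. This is why RBHs are stricter than one-directional best hits.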
Figure 3.

Workflow for parsing BLAST reciprocal best hits (RBHs) and inferring orthologous groups.

Gene Ontology is widely used to describe gene function. Using an integrative strategy that combines sequence-based prediction (29), orthology-based projection, and collection of existing annotation from canonical sources (25,26), we successfully annotated 93.2% (293 696 of 315 099) of TFs in 165 species. Moreover, to provide a comprehensive reference dataset for follow-up functional enrichment analysis, we further annotated 59.3% (3 374 258 of 5 691 627) of genes in these species, increasing the coverage of GO annotation 9.1-fold on average (Supplementary Table S3; see Supplementary Text 1 for details on the annotation pipeline).

Genome-wide regulatory maps of green plants

TFs regulate the transcription of their target genes through binding certain cis-elements. Mapping regulatory elements is essential for the identification of regulatory interactions and further functional mechanisms of TFs. High-throughput assays like ChIP-seq (30) and genome-wide TF footprinting (23,31) have been efficiently employed to identify genome-wide TF occupancy sites. Here, we identified 345 920 TF binding peaks for 14 TFs based on 26 public ChIP-seq datasets as well as 4 794 773 TF footprints based on 13 public DNase-seq datasets according to the protocol described in (31) (Table 1, Supplementary Tables S4 and S5 and Supplementary Text 2). As histone modification and nucleosome positioning also provide useful clues for the study of transcriptional regulation, such information is also incorporated (Table 1, Supplementary Tables S6–S8 and Supplementary Text 2). In addition, we set up a genome browser for users to access these regulatory elements conveniently (http://plantregmap.cbi.pku.edu.cn/cis-map.php).
Table 1.

Summary of integrated regulation information identified from high-throughput sequencing

Type | Species | Organ | Peak
DNase I hypersensitive site | 2 | 10 | 1 026 748
DNase I genomic TF footprints | 1 | 4 | 4 794 773
Histone modification | 5 | 7 | 765 371
Nucleosome positioning | 1 | 2 | 15 348 541
TF binding peak (ChIP-seq) | 2 | 4 | 345 920

The DNA binding motifs of TFs are crucial for identifying their downstream target genes. During the past three years, binding motifs of hundreds of plant TFs have been determined and made publicly available (20–22). We first collected 1790 experimentally derived binding motifs from multiple databases (including PlantCistromeDB (20), CIS-BP (21), JASPAR (32), UniPROBE (33) and TRANSFAC (public 7.0) (34)), from the literature (22), and from motifs discovered in ChIP-seq peaks using MEME-ChIP (35). For a TF with more than one motif, we manually selected the best one, preferring motifs determined in vivo and those showing greater similarity to the other motifs of the same TF. After filtering out low-quality motifs (those with information content < 4.5), we obtained 674 non-redundant, high-quality binding motifs, covering 77.6% (45 of 58) of the TF families in PlantTFDB. To take full advantage of this resource, we projected these motifs to other species using the RBHs identified above (only RBHs belonging to the same TF family were used) and adjusted the motifs using the nucleotide background of promoter regions (TSS −500 bp to +100 bp). Finally, we obtained sets of species-specific binding motifs for 156 species (Figure 4 and Supplementary Table S9).
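To make the information-content filter concrete, here is a minimal Python sketch. It assumes that information content is measured in bits against a uniform A/C/G/T background and summed over motif positions; the text does not state the exact formula or background used, so that definition, the function names and the toy motifs are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence

MIN_IC = 4.5  # threshold quoted in the text; the unit (bits) is an assumption


def information_content(pfm: Sequence[Sequence[float]]) -> float:
    """Total information content of a motif, in bits.

    `pfm` holds one row per motif position with counts or frequencies for
    A, C, G, T. A uniform background is assumed, so each position
    contributes 2 minus its Shannon entropy.
    """
    total = 0.0
    for row in pfm:
        n = sum(row)
        entropy = -sum((c / n) * math.log2(c / n) for c in row if c > 0)
        total += 2.0 - entropy
    return total


def filter_motifs(motifs: Dict[str, List[List[float]]], min_ic: float = MIN_IC) -> Dict[str, List[List[float]]]:
    """Keep only motifs whose total information content reaches `min_ic`."""
    return {name: pfm for name, pfm in motifs.items() if information_content(pfm) >= min_ic}


# Two toy motifs (hypothetical): a fully specific 3-mer and an uninformative one.
motifs = {
    "strong": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]],
    "weak": [[0.25, 0.25, 0.25, 0.25]] * 3,
}
kept = filter_motifs(motifs)
print(sorted(kept))
```

Under this definition, a fully specific position contributes 2 bits and a uniform position contributes 0 bits, so the "strong" toy motif totals 6 bits and passes the filter while the "weak" one is dropped.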
Figure 4.

Curation and projection of TF binding motifs. (A) Workflow for curation and projection of high-quality TF binding motifs derived from experiments. (B) The visualization of TF binding motifs in PlantTFDB.

The regulatory interactions between TFs and target genes provide the most direct evidence for investigating the functional mechanisms of TFs. We collected transcriptional regulatory interactions in the following three ways: (i) by mining transcriptional regulatory interactions from the literature and manually curating each interaction, we obtained 1431 high-confidence transcriptional regulatory interactions in A. thaliana, covering 388 TFs from 47 families (24), which were integrated in this update; (ii) we identified 9340 experiment-based regulatory interactions for six TFs with at least two ChIP-seq replicates; (iii) we predicted 342 005 regulatory interactions in silico by combining TF binding motifs and footprint data (Table 2, Supplementary Tables S4 and S5 and Supplementary Text 2). As the high-quality interactions are mainly limited to A. thaliana, we further inferred regulatory interactions by searching in silico for potential TF binding sites, using binding motifs, in the gene promoter regions of 132 species with well-annotated gene positions (Table 2, Supplementary Table S10 and Supplementary Text 2). Users are encouraged to combine these less reliable interactions with supporting information such as expression data and conserved elements.
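The idea of searching promoter sequences for potential binding sites with a motif can be sketched as follows. This is a simplified illustration only: it scores forward-strand windows with a log-odds matrix and a fixed score cut-off, whereas dedicated motif scanners report statistical significance and also handle the reverse strand. The motif, background frequencies, pseudocount, threshold and sequence are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"


def log_odds_matrix(ppm: Sequence[Sequence[float]], background: Dict[str, float],
                    pseudo: float = 0.01) -> List[Dict[str, float]]:
    """Convert per-position base probabilities (A, C, G, T) into log2 odds."""
    matrix = []
    for row in ppm:
        # Smooth probabilities so that zero entries do not yield -infinity.
        smoothed = [(p + pseudo) / (1.0 + 4 * pseudo) for p in row]
        matrix.append({b: math.log2(s / background[b]) for b, s in zip(BASES, smoothed)})
    return matrix


def scan(sequence: str, matrix: List[Dict[str, float]], threshold: float) -> List[Tuple[int, float]]:
    """Return (0-based start, score) for forward-strand windows scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy motif that strongly prefers CACGTG (hypothetical values).
ppm = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
background = {base: 0.25 for base in BASES}  # uniform; a promoter-derived background could be used instead
matrix = log_odds_matrix(ppm, background)
promoter = "TTACACGTGAAT"
hits = scan(promoter, matrix, threshold=8.0)
print([start for start, _ in hits])
```

Replacing the uniform background with base frequencies measured in promoter regions is one way to reflect the background adjustment described in the text, although the exact adjustment procedure is not specified there.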
Table 2.

Summary of the regulatory interactions at the new portal PlantRegMap

Method | Species | Organ | Interaction
Literature mining and manual curation | 1 | - | 1 431
ChIP-seq | 1 | 2 | 9 340
Motif + TF footprint | 1 | 4 | 342 005
Motif | 132 | - | 50 850 582

Novel online tools for transcriptional regulation prediction and analyses

For users to take full advantage of our rich data and pipelines, we upgraded the TF prediction server and set up four novel servers for TF identification, regulation prediction and functional enrichment analyses (see Supplementary Text 3 for more details).

TF prediction: In response to user requests for TF prediction from newly identified transcripts, support for nucleic acid sequences was added. ESTScan (36) is employed to identify CDS regions of input nucleic acid sequences and translate them to protein sequences. In addition, the size limit for uploaded files was relaxed to 100M based on users’ feedback.

Binding site prediction: Based on the sets of species-specific, high-quality TF binding motifs integrated above, FIMO (37) is employed to scan the input sequences for TF binding sites.

Regulation prediction and enrichment analysis: Unlike the ‘Binding site prediction’ server, this tool focuses on inferring regulatory interactions between TFs and the input promoters, and searches for enriched upstream regulators in the input gene set.

GO enrichment: Based on our compiled GO annotation for 165 species (see Supplementary Text 1 for details on the annotation pipeline), this tool identifies significantly enriched GO terms in the input gene set using Fisher's exact tests (38).

TF enrichment: Based on our pre-computed regulatory interactions, this tool allows users to search for enriched upstream regulators in the input gene set.
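The enrichment test behind the GO enrichment tool can be illustrated with a small Python sketch of a one-sided Fisher's exact test, computed as the upper tail of the hypergeometric distribution. The one-sided form, the function name and the example counts are assumptions made for illustration; the text states only that Fisher's exact tests are used, and any multiple-testing correction is outside this sketch.

```python
from math import comb


def enrichment_p_value(hits_in_set: int, set_size: int,
                       hits_in_background: int, background_size: int) -> float:
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= hits_in_set), where X is the number of annotated genes
    in a random sample of `set_size` genes drawn without replacement from
    `background_size` genes, of which `hits_in_background` carry the term.
    """
    max_hits = min(set_size, hits_in_background)
    total = comb(background_size, set_size)
    tail = sum(
        comb(hits_in_background, k) * comb(background_size - hits_in_background, set_size - k)
        for k in range(hits_in_set, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 annotated with a GO term;
# the input set has 4 genes, 3 of which carry the term.
p = enrichment_p_value(hits_in_set=3, set_size=4, hits_in_background=5, background_size=20)
print(round(p, 4))
```

For these toy counts, the tail sums the two outcomes with 3 or 4 annotated genes, (10 × 15 + 5 × 1) / 4845, so the term is modestly over-represented in the input set. The same calculation applies to TF enrichment if "annotated with the term" is replaced by "targeted by the TF".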

Resource availability

For users to access our resources conveniently, we set up a new portal, PlantRegMap (http://plantregmap.cbi.pku.edu.cn/), alongside PlantTFDB (http://planttfdb.cbi.pku.edu.cn/), with PlantTFDB focusing on TF identification and annotation and the new portal on regulation prediction and analyses (Table 3). All data in our database can be freely accessed and downloaded online.
Table 3.

Resources available at the new portal PlantRegMap

Data | Tool
Regulatory elements | Binding site prediction
Regulatory interactions | Regulation prediction
GO annotation for all genes | GO enrichment
OGs and RBHs for all genes | TF enrichment

FUTURE DIRECTION

We have upgraded PlantTFDB to version 4.0 with a new portal, PlantRegMap, which together provide genomic TF repertoires covering the main lineages of green plants, sets of high-quality binding motifs, genome-wide regulatory interactions and rich analysis tools. The information on TFs and transcriptional regulation is a useful resource for users exploring the functional mechanisms of TFs and the plant transcriptional regulatory system (24). We will continue working on this project to incorporate more useful annotation and to improve regulation prediction by introducing conserved elements.
  36 in total

1.  Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes.

Authors:  J L Riechmann; J Heard; G Martin; L Reuber; C Jiang; J Keddie; L Adam; O Pineda; O J Ratcliffe; R R Samaha; R Creelman; M Pilgrim; P Broun; J Z Zhang; D Ghandehari; B K Sherman; G Yu
Journal:  Science       Date:  2000-12-15       Impact factor: 47.728

2.  Gene: a gene-centered information resource at NCBI.

Authors:  Garth R Brown; Vichet Hem; Kenneth S Katz; Michael Ovetsky; Craig Wallin; Olga Ermolaeva; Igor Tolstoy; Tatiana Tatusova; Kim D Pruitt; Donna R Maglott; Terence D Murphy
Journal:  Nucleic Acids Res       Date:  2014-10-29       Impact factor: 16.971

3.  GRASSIUS: a platform for comparative regulatory genomics across the grasses.

Authors:  Alper Yilmaz; Milton Y Nishiyama; Bernardo Garcia Fuentes; Glaucia Mendes Souza; Daniel Janies; John Gray; Erich Grotewold
Journal:  Plant Physiol       Date:  2008-11-05       Impact factor: 8.340

4.  AGRIS: the Arabidopsis Gene Regulatory Information Server, an update.

Authors:  Alper Yilmaz; Maria Katherine Mejia-Guerra; Kyle Kurz; Xiaoyu Liang; Lonnie Welch; Erich Grotewold
Journal:  Nucleic Acids Res       Date:  2010-11-08       Impact factor: 16.971

5.  PlantTFDB 2.0: update and improvement of the comprehensive plant transcription factor database.

Authors:  He Zhang; Jinpu Jin; Liang Tang; Yi Zhao; Xiaocheng Gu; Ge Gao; Jingchu Luo
Journal:  Nucleic Acids Res       Date:  2010-11-18       Impact factor: 16.971

6.  TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes.

Authors:  V Matys; O V Kel-Margoulis; E Fricke; I Liebich; S Land; A Barre-Dirrie; I Reuter; D Chekmenev; M Krull; K Hornischer; N Voss; P Stegmaier; B Lewicki-Potapov; H Saxel; A E Kel; E Wingender
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

7.  UniProt: a hub for protein information.

Authors:  UniProt Consortium
Journal:  Nucleic Acids Res       Date:  2014-10-27       Impact factor: 16.971

8.  InterProScan 5: genome-scale protein function classification.

Authors:  Philip Jones; David Binns; Hsin-Yu Chang; Matthew Fraser; Weizhong Li; Craig McAnulla; Hamish McWilliam; John Maslen; Alex Mitchell; Gift Nuka; Sebastien Pesseat; Antony F Quinn; Amaia Sangrador-Vegas; Maxim Scheremetjew; Siew-Yit Yong; Rodrigo Lopez; Sarah Hunter
Journal:  Bioinformatics       Date:  2014-01-21       Impact factor: 6.937

9.  TreeTFDB: an integrative database of the transcription factors from six economically important tree crops for functional predictions and comparative and functional genomics.

Authors:  Keiichi Mochida; Takuhiro Yoshida; Tetsuya Sakurai; Kazuko Yamaguchi-Shinozaki; Kazuo Shinozaki; Lam-Son Phan Tran
Journal:  DNA Res       Date:  2013-01-02       Impact factor: 4.458

10.  JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles.

Authors:  Anthony Mathelier; Oriol Fornes; David J Arenillas; Chih-Yu Chen; Grégoire Denay; Jessica Lee; Wenqiang Shi; Casper Shyr; Ge Tan; Rebecca Worsley-Hunt; Allen W Zhang; François Parcy; Boris Lenhard; Albin Sandelin; Wyeth W Wasserman
Journal:  Nucleic Acids Res       Date:  2015-11-03       Impact factor: 16.971
