Paulino Pérez-Rodríguez, Diego Mauricio Riaño-Pachón, Luiz Gustavo Guedes Corrêa, Stefan A Rensing, Birgit Kersten, Bernd Mueller-Roeber.
Abstract
The Plant Transcription Factor Database (PlnTFDB; http://plntfdb.bio.uni-potsdam.de/v3.0/) is an integrative database that provides putatively complete sets of transcription factors (TFs) and other transcriptional regulators (TRs) in plant species (sensu lato) whose genomes have been completely sequenced and annotated. The complete sets of 84 families of TFs and TRs from 19 species ranging from unicellular red and green algae to angiosperms are included in PlnTFDB, representing >1.6 billion years of evolution of gene regulatory networks. For each gene family, a basic description is provided that is complemented by literature references and multiple sequence alignments of protein domains. TF or TR gene entries include information on expressed sequence tags, 3D protein structures of homologous proteins, domain architecture and cross-links to other computational resources online. Moreover, the different species in PlnTFDB are linked to each other by means of orthologous genes, facilitating cross-species comparisons.
Year: 2009 PMID: 19858103 PMCID: PMC2808933 DOI: 10.1093/nar/gkp805
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Species analysed and number of families and classified proteins per species
| Groups | Species | Source | Annotation version | Reference | Total number of proteinsᵃ | Genome size (Mbp) | Number of families | Number of classified proteinsᵃ |
|---|---|---|---|---|---|---|---|---|
| Red algae (Rhodophytes) | | 1 | 20070710 | ( | 5008 | 16.52 | 34 | 147 |
| | | 9 | | ( | 6604 | 10 | 37 | 201 |
| Green algae (Prasinophytes) | | 2 | 2 | ( | 10 455 | 15 | 49 | 289 |
| | | 2 | 3 | ( | 10 160 | 15 | 49 | 326 |
| | | 2 | 2 | ( | 7812 | 12.56 | 47 | 216 |
| | | 2 | 2 | ( | 7651 | 13.204 | 46 | 236 |
| Green algae (Chlorophytes) | | 2 | 4 | ( | 16 460 | 121 | 52 | 346 |
| | | 2 | 1 | | 9762 | 40 | 48 | 304 |
| | | 2 | 1 | | 10 174 | 120 | 47 | 261 |
| Bryophyte (Bryopsida) | | 2 | 1.1 | ( | 35 724 | 480 | 72 | 1295 |
| Spike-moss (Lycopodiophyte) | | 2 | 1 | | 22 138 | 100 | 74 | 896 |
| Angiosperms (Monocots) | | 3 | 20050118 | ( | 49 643 | 420 | 79 | 2393 |
| | | 4 | 6 | ( | 63 306 | 420 | 79 | 2722 |
| | | 2 | 4 | ( | 35 682 | 730 | 78 | 2231 |
| | | 5 | 3b.50 | | 55 810 | 2400 | 79 | 3608 |
| Angiosperms (Eudicots) | | 7 | | ( | 24 852 | 372 | 81 | 1480 |
| | | 2 | 1 | | 32 234 | 206.7 | 81 | 2162 |
| | | 6 | 8 | ( | 30 707 | 125 | 81 | 2451 |
| | | 2 | 1.1 | ( | 45 009 | 485 | 81 | 2901 |
| | | 8 | 1 | ( | 30 342 | 500 | 80 | 1725 |
Sources: (1) CME GP, Cyanidioschyzon merolae Genome Project, http://merolae.biol.s.u-tokyo.ac.jp/; (2) JGI/DOE, Joint Genome Institute/Department of Energy, http://www.jgi.doe.gov/; (3) BGI, Beijing Genomics Institute, http://www.genomics.org.cn/; (4) TIGR, The Institute for Genomic Research, http://www.tigr.org/; (5) MaizeSequence.org, http://www.maizesequence.org; (6) TAIR, The Arabidopsis Information Resource, http://www.arabidopsis.org/; (7) The Hawaii Papaya Genome Project, http://asgpb.mhpcc.hawaii.edu/papaya/; (8) Genoscope, Centre National de Séquençage, http://www.genoscope.cns.fr/spip/Vitis-vinifera-e.html; (9) Data communicated by Prof. Dr Andreas Weber, University of Duesseldorf, Germany.
ᵃ Number of non-redundant proteins.
Figure 1. Selecting the significance score threshold in newly created profile HMMs. The graphic shows the scores obtained for proteins in the V. carteri proteome when searched with the VARL HMM with an e-value cut-off of 10. Known members of the family in this species (true positives, TPs) are highlighted in green. The putative true negative (TN) with the highest score is indicated by a purple arrow. The TP with the minimum score is highlighted by a green arrow. The significance score threshold (black line) is computed as the average between the minimum score for TPs (green line) and the maximum score for TNs (purple line). For this family, the selected threshold is −4.25 bits.
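The threshold rule described in the caption of Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `significance_threshold`, the protein identifiers and the individual scores are hypothetical, and every hit that is not a known family member is assumed to be a putative TN. The toy scores were chosen only so that the midpoint equals the −4.25 bits quoted in the caption; the real per-protein scores are not given in the text.

```python
def significance_threshold(scores, known_members):
    """Midpoint between the lowest-scoring TP and the highest-scoring putative TN.

    scores        -- dict mapping protein ID to HMM bit score (hypothetical input format)
    known_members -- set of protein IDs known to belong to the family (TPs)
    """
    tp_scores = [s for pid, s in scores.items() if pid in known_members]
    # Assumption: any hit that is not a known member is treated as a putative TN.
    tn_scores = [s for pid, s in scores.items() if pid not in known_members]
    if not tp_scores or not tn_scores:
        raise ValueError("need at least one TP score and one TN score")
    return (min(tp_scores) + max(tn_scores)) / 2


# Invented example values, for illustration only.
toy_scores = {"p1": 35.0, "p2": 12.5, "p3": -1.5, "p4": -7.0, "p5": -20.0}
toy_members = {"p1", "p2", "p3"}
threshold = significance_threshold(toy_scores, toy_members)
print(threshold)
```

With these toy values the lowest TP score is −1.5 and the highest putative TN score is −7.0, so the midpoint falls between the two groups and separates them.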
Figure 2. Screenshot of a web page displaying details for a TF gene in PlnTFDB. (A) Every gene page in PlnTFDB displays basic information (including species name and gene family assignment) for a given TF or TR. If gene names have been assigned (only for A. thaliana and O. sativa ssp. japonica), they are displayed as well. (B) The best hits (hhsearch, probability of being a TP ≥98%) to PDB protein 3D structures are visualized as static images; a link is provided to the embedded Java applet Jmol, where basic operations on the 3D structure can be performed. (C) Links to orthologues in PlnTFDB are provided. (D) Users can query PlnTFDB through similarity searches (BLAST) using a protein or a nucleotide sequence as query. (E) Domain architecture is displayed with links to the original domain databases (Pfam or our local database, see section ‘Identification of protein domains and new domains models’). (F) Links to the protein and transcript sequences of the gene are provided.
Sensitivity and PPV of PlnTFDB predictions
| Species | Family | Reference | TP/(TP + FN) | TP/(TP + FP) | Sensitivity | PPV |
|---|---|---|---|---|---|---|
| ATH | | ( | 146/147 | 146/146 | 0.99 | 1.00 |
| | | ( | 21/23 | 21/23 | 0.91 | 0.91 |
| | | ( | 28/29 | 28/28 | 0.97 | 1.00 |
| | bHLH | ( | 125/154 | 125/136 | 0.81 | 0.92 |
| | | ( | 70/76 | 70/70 | 0.92 | 1.00 |
| | | ( | 35/36 | 35/36 | 0.97 | 0.97 |
| | | ( | 29/29 | 29/29 | 1.00 | 1.00 |
| | | ( | 65/67 | 65/68 | 0.97 | 0.96 |
| | | ( | 32/32 | 32/33 | 1.00 | 0.97 |
| | | ( | 97/105 | 97/105 | 0.92 | 0.92 |
| | | ( | 98/108 | 98/105 | 0.91 | 0.93 |
| | MYB | ( | 185/198 | 185/212 | 0.93 | 0.87 |
| | | ( | 100/100 | 100/104 | 1.00 | 0.96 |
| | | ( | 16/17 | 16/16 | 0.94 | 1.00 |
| | | ( | 71/72 | 71/72 | 0.99 | 0.99 |
| OSAJ | bHLH | ( | 134/166 | 134/143 | 0.81 | 0.94 |
| | bZIP | ( | 82/92 | 82/90 | 0.89 | 0.91 |
| | C2C2-GATA | ( | 18/19 | 18/27 | 0.95 | 0.67 |
| | | ( | 65/67 | 65/70 | 0.97 | 0.93 |
| | MYB | ( | 145/156 | 145/196 | 0.93 | 0.74 |
| | | ( | 18/19 | 18/19 | 0.95 | 0.95 |
The sensitivity and the PPV were determined for selected A. thaliana (ATH) and O. sativa ssp. japonica (OSAJ) TF families. For the PPV, a deviation from 1.00 means the inclusion of FPs. For the sensitivity, deviations from 1.00 indicate exclusion of true members (FNs). Families with both values larger than 0.90 appear in bold face. TPs according to gold standard.
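As a worked illustration of how the two columns are derived, the following Python sketch computes sensitivity, TP/(TP + FN), and PPV, TP/(TP + FP), from the counts listed in the table. The function names are chosen here for illustration and are not part of PlnTFDB; the counts are taken from the A. thaliana bHLH and O. sativa ssp. japonica MYB rows.

```python
def sensitivity(tp, fn):
    """Fraction of gold-standard family members that were recovered: TP / (TP + FN)."""
    return tp / (tp + fn)


def ppv(tp, fp):
    """Fraction of predicted family members that are true members: TP / (TP + FP)."""
    return tp / (tp + fp)


# ATH bHLH row: 125/154 and 125/136, i.e. FN = 154 - 125 = 29 and FP = 136 - 125 = 11.
ath_bhlh = (round(sensitivity(125, 29), 2), round(ppv(125, 11), 2))

# OSAJ MYB row: 145/156 and 145/196, i.e. FN = 11 and FP = 51.
osaj_myb = (round(sensitivity(145, 11), 2), round(ppv(145, 51), 2))

print(ath_bhlh)  # (0.81, 0.92)
print(osaj_myb)  # (0.93, 0.74)
```

Both results agree with the rounded values in the table: the bHLH prediction misses some true members (lower sensitivity), whereas the rice MYB prediction includes a larger share of FPs (lower PPV).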