
TcoF-DB: dragon database for human transcription co-factors and transcription factor interacting proteins.

Ulf Schaefer, Sebastian Schmeier, Vladimir B Bajic.

Abstract

The initiation and regulation of transcription in eukaryotes is complex and involves a large number of transcription factors (TFs), which are known to bind to the regulatory regions of eukaryotic DNA. Apart from TF-DNA binding, protein-protein interaction involving TFs is an essential component of the machinery facilitating transcriptional regulation. We consider proteins that interact with TFs in the context of transcription regulation, but do not themselves bind to the DNA, to be transcription co-factors (TcoFs). The influence of TcoFs on transcriptional regulation and initiation, although indirect, has been shown to be significant, with the functionality of TFs strongly influenced by the presence of TcoFs. While the role of TFs and their interaction with regulatory DNA regions has been well studied, the association between TFs and TcoFs has so far been given less attention. Here, we present a resource that comprises a collection of human TFs and the TcoFs with which they interact. Other proteins that have a proven interaction with a TF, but are not considered TcoFs, are also included. Our database contains 157 high-confidence TcoFs and additionally 379 hypothetical TcoFs. These have been identified and classified according to the type of available evidence for their involvement in transcriptional regulation and their presence in the cell nucleus. We have divided TcoFs into four groups, one of which contains high-confidence TcoFs, while the other three contain TcoFs that are hypothetical to different extents. We have developed the Dragon Database for Human Transcription Co-Factors and Transcription Factor Interacting Proteins (TcoF-DB). A web-based interface for this resource can be freely accessed at http://cbrc.kaust.edu.sa/tcof/ and http://apps.sanbi.ac.za/tcof/.

Year:  2010        PMID: 20965969      PMCID: PMC3013796          DOI: 10.1093/nar/gkq945

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

It is widely accepted that the initiation and regulation of transcription in eukaryotic cells is a vastly complex process involving a large number of proteins that interact with the regulatory regions of DNA in various ways (1–5). Additionally, the process involves numerous proteins that do not themselves interact with the DNA molecule, but instead interact with DNA binding proteins, forming complexes that could themselves interact with DNA (6,7). This phenomenon significantly increases the complexity of the process of transcriptional regulation and initiation. Proteins that bind DNA and in this way affect the regulation of transcription are in this context referred to as transcription factors (TFs) (8). Proteins that do not bind DNA in the context of transcriptional regulation, but are nevertheless involved in this process through interaction with TFs, are referred to as transcription co-factors (TcoFs) (9,10). TcoFs are known to have multiple functions, including signal transmission, modulation of TF–DNA binding and chromatin modification (11,12). Apart from the interaction of TFs with the regulatory regions of the DNA molecule, the interactions between proteins (protein–protein interaction, PPI) are important for understanding the mechanisms of transcriptional regulation. PPIs include various forms of relationships between two proteins, e.g. the modification of one protein by another, the transport of one protein from one cellular location to another by another protein, or the formation of a protein complex by two or more proteins. The PPIs of interest here, which are important for the processes of transcriptional regulation, include the physical interaction between two TFs, the physical interaction between a TF and a TcoF and, potentially, the physical interaction between a TF and another protein that cannot be regarded as a TcoF.
The specific influence of TFs on transcription is widely accepted to be an important component in gene expression (13). It has been recognized that TFs can influence disease development (14), and they have previously been identified as potential targets in cancer therapy (15–18) or as cancer diagnostic biomarkers (19). The functionality of TFs has also been shown to be important for the development of treatments for a number of other human diseases (20–23). However, research on the interaction of TFs with other proteins in relation to transcription regulation has so far been less intensive. It has previously been shown that PPI can be linked to organism complexity (24). It has also been shown that the same TF can have opposite functions depending on its interactions with TcoFs (25). We therefore believe that, in order to gain a more complete insight into the complexity of transcriptional regulatory processes and to describe more accurately the role of TFs and TcoFs in relation to human diseases, it is imperative to investigate TFs in conjunction with their interacting proteins. With this in mind, we developed a public resource: Dragon Database for Human Transcription Co-Factors and Transcription Factor Interacting Proteins (TcoF-DB). With this database, we wish to provide a comprehensive starting point for investigations into transcriptional regulation that include PPIs between TFs, between TFs and TcoFs, and between TFs and other proteins. While there are numerous resources available that deal with either the complete human proteome (26–29), human PPI (30–33) or mammalian TFs (34–36), we are not aware of a resource that unifies the data necessary to directly investigate human TFs, human TcoFs and other human proteins that interact with TFs. TcoF-DB attempts to reduce this gap.

DATA INTEGRATION AND ANALYSIS

We generated a highly accurate set of human TFs. For this, we rely on data previously published by Vaquerizas et al. (13), which we regard as a gold standard due to the meticulous way in which it was created. We extracted all TFs from that reference. This set constitutes all human proteins that possess a sequence-specific DNA binding domain and have additionally passed a step of manual curation, during which each protein was examined by a human curator for being a TF with high confidence. In addition to this, we extracted 70 proteins that are included in TRANSFAC 11.4 (34) but were not included in the list published by Vaquerizas et al. We manually checked each of these proteins and added 19 of them to our list of human TFs. To complement our set of TFs further, we finally extracted mouse TF genes from TFCAT (37) whose protein products bind DNA in a sequence-specific as well as a non-sequence-specific manner. For these mouse genes, we identified human ortholog genes using NCBI's Homologene (38) and their gene products, if available. We manually checked each of these proteins that was not already included in our set and added eight proteins from TFCAT to our TF list. In total, we identified 1365 human TFs in this way. It should be highlighted that each TF in this list has been hand-curated at some point during the process.
Four public databases of human PPIs were used to extract PPIs. These were MINT (accessed July 2010) (39), IntAct (accessed July 2010) (31), BioGRID version 3.0.67 (30) and Reactome version 33 (40). These databases were selected because they allow data to be downloaded in a format suitable for processing by a computer. All data in these four databases is experimentally confirmed. It was of particular importance that this information is presented in the PSI-MI format [molecular interaction standard of the Proteomics Standards Initiative (41)] in order to allow us to focus on PPI of a certain type.
We only consider PPIs that represent physical interactions between two proteins. Thus, the PSI-MI interaction type of a considered PPI had to be one of the following: MI:0195 (covalent binding), MI:0407 (direct interaction) or MI:0915 (physical association). In this manner, we were able to extract 7045 unique interactions between two proteins where at least one of the participants is a TF from our list and the interaction is one of the above-mentioned physical interactions. All proteins that were identified as having a physical interaction with a known TF and are not themselves included in our reference list of TFs are initially considered ‘TF interacting proteins’. Evidence for the interaction is extracted from one of the aforementioned databases and added to our database. In total, 2300 distinct human proteins have been found to interact with one of 712 TFs. This leaves 653 TFs from our list for which we were not able to find a known interaction with another protein. Out of these 2300 proteins, we subsequently endeavoured to identify those proteins that can be considered TcoFs. First, we postulated the necessary condition that a TcoF candidate is known to be located in the cell nucleus. For this, we check whether the protein in question is annotated in Gene Ontology (GO) (42) with term GO:0005634 (cellular component: ‘nucleus’), using AmiGO version 1.7 (43). We do not consider proteins that do not possess this annotation as TcoFs. Second, we require that a TcoF candidate is known to be involved in transcriptional regulation. For this condition, we check whether the protein in question is annotated in GO with either GO:0030528 (molecular function: ‘transcription regulator activity’) or one of its descendent terms, or with GO:0045449 (biological process: ‘regulation of transcription’) or one of its descendent terms. Proteins that do not have one of these annotations in either the molecular function or the biological process ontology of GO are likewise not considered TcoFs.
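The extraction step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record layout (`Interaction`), the function names and all protein identifiers are hypothetical, and `MI:9999` is a made-up placeholder for any interaction type outside the accepted set. Only the three accepted PSI-MI terms are taken from the text.

```python
from typing import NamedTuple

# PSI-MI interaction types the text accepts as physical interactions.
PHYSICAL_TYPES = {"MI:0195", "MI:0407", "MI:0915"}


class Interaction(NamedTuple):
    """Hypothetical, simplified PPI record (not the PSI-MI file layout)."""
    protein_a: str
    protein_b: str
    mi_type: str
    source: str


def physical_tf_pairs(records, tfs):
    """Unique unordered protein pairs that have an accepted physical
    interaction type and at least one participant in the TF set."""
    pairs = set()
    for rec in records:
        if rec.mi_type not in PHYSICAL_TYPES:
            continue
        if rec.protein_a not in tfs and rec.protein_b not in tfs:
            continue
        pairs.add(frozenset((rec.protein_a, rec.protein_b)))
    return pairs


def tf_interacting_proteins(pairs, tfs):
    """Proteins that occur in a retained pair but are not TFs themselves."""
    return {p for pair in pairs for p in pair if p not in tfs}


# Toy data with invented identifiers.
tfs = {"TF1", "TF2"}
records = [
    Interaction("TF1", "P1", "MI:0915", "dbA"),
    Interaction("P1", "TF1", "MI:0407", "dbB"),   # same pair, other source
    Interaction("TF1", "TF2", "MI:0195", "dbA"),  # TF-TF interaction
    Interaction("P2", "P3", "MI:0915", "dbA"),    # no TF involved
    Interaction("TF2", "P4", "MI:9999", "dbC"),   # placeholder, not accepted
]

pairs = physical_tf_pairs(records, tfs)
partners = tf_interacting_proteins(pairs, tfs)
```

Storing each pair as a `frozenset` makes the pair unordered, so the same interaction reported by two source databases is counted once.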
In summary, we only consider a protein to be a TcoF if it satisfies all conditions shown in Table 1.
Table 1. A protein is only considered a TcoF if it satisfies these four conditions

No.  Condition
1    The protein is not characterized as a TF.
2    It is shown to bind to a known transcription factor. This binding was demonstrated by experiment and is referenced in scientific literature.
3    It is annotated in Gene Ontology with GO:0005634 (‘nucleus’).
4    It is annotated in Gene Ontology with GO:0030528 (‘transcription regulator activity’) or one of its descendent terms, or with GO:0045449 (‘regulation of transcription’) or one of its descendent terms.

On the other hand, proteins that satisfy both conditions 1 and 2 but do not satisfy both conditions 3 and 4 are considered ‘TF binding proteins’, but not TcoFs. In total, 529 TcoFs were identified in this way. This leaves 1771 proteins to be characterized as ‘TF binding proteins’. Since this way of characterizing TcoFs is heavily dependent on protein annotations in GO, we subsequently divided the TcoFs into different groups based on the type of evidence that is given for each relevant annotation in GO. For this purpose, we consider two groups of evidence given in GO: experimental evidence, consisting of the evidence codes EXP, IDA, IMP, IGI, IEP and IPI, and non-experimental evidence, consisting of all other evidence codes (see http://www.geneontology.org/GO.evidence.shtml for details on evidence in GO). We then divide the TcoFs into the following four groups, based on the type of evidence that is present for the fulfilment of conditions 3 and 4 from Table 1:
High-confidence TcoFs: All TcoFs that have experimental evidence both for involvement in transcription regulation and for occurrence in the cell nucleus.
Hypothetical TcoFs (Class 1): All TcoFs that have experimental evidence for involvement in transcription regulation, but only non-experimental evidence (e.g. ‘Inferred from Electronic Annotation’ or ‘Author statement’) for occurrence in the cell nucleus.
Hypothetical TcoFs (Class 2): All TcoFs that have experimental evidence for occurrence in the cell nucleus, but only non-experimental evidence for involvement in transcription regulation.
Hypothetical TcoFs (Class 3): All TcoFs that have only non-experimental evidence both for involvement in transcription regulation and for occurrence in the cell nucleus.
This classification leads to the distribution of TcoFs among the four groups shown in Table 2.
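The four-group scheme described above can be written as a small decision function. The following is a minimal sketch rather than the authors' code: it assumes the protein already satisfies conditions 1 and 2 of Table 1, and that the GO evidence codes for its nucleus annotation and for its transcription-related annotations have been collected into two lists. The function names and the input layout are illustrative assumptions; only the set of experimental evidence codes and the group names come from the text.

```python
# GO evidence codes the text treats as experimental evidence.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IMP", "IGI", "IEP", "IPI"}


def has_experimental(evidence_codes):
    """True if at least one evidence code is an experimental one."""
    return any(code in EXPERIMENTAL_CODES for code in evidence_codes)


def classify(nucleus_codes, transcription_codes):
    """Assign a TF-interacting protein to one of the groups in the text.

    nucleus_codes:       evidence codes of annotations with GO:0005634
    transcription_codes: evidence codes of annotations with GO:0030528,
                         GO:0045449 or one of their descendent terms
    An empty list means the annotation is absent (condition 3 or 4 fails).
    """
    if not nucleus_codes or not transcription_codes:
        return "TF binding protein"
    nucleus_exp = has_experimental(nucleus_codes)
    transcription_exp = has_experimental(transcription_codes)
    if nucleus_exp and transcription_exp:
        return "High-confidence TcoF"
    if transcription_exp:
        return "Hypothetical TcoF (Class 1)"
    if nucleus_exp:
        return "Hypothetical TcoF (Class 2)"
    return "Hypothetical TcoF (Class 3)"
```

For example, `classify(["IEA"], ["IPI"])` falls into Class 1, because only the transcription-related annotation is backed by an experimental code.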
Table 2. Distribution of TcoFs among groups

Rows: evidence for location in cell nucleus. Columns: evidence for involvement in transcription regulation. Entries are counts (%).

Nucleus evidence    Transcription: experimental          Transcription: non-experimental
Experimental        155 (29.3), High confidence          96 (18.1), Hypothetical (Class 2)
Non-experimental    62 (11.7), Hypothetical (Class 1)    216 (40.8), Hypothetical (Class 3)

Only TcoFs that have experimental evidence cited for at least one GO annotation relevant to the regulation of transcription and have experimental evidence cited for occurrence in the cell nucleus are considered TcoFs with high confidence. If experimentally confirmed evidence for one of these is missing, the protein is considered to be a hypothetical TcoF. The level of confidence for a hypothetical TcoF decreases as the class number increases.
In a final step, all data have been incorporated in a relational database (PostgreSQL version 8.4) and a simple web interface has been implemented. The database can be accessed through http://cbrc.kaust.edu.sa/tcof/ or http://apps.sanbi.ac.za/tcof/. The interface allows human proteins to be searched by their name or Uniprot identifiers (Uniprot ID or accession number). Alternative names for proteins, as given in the Uniprot Knowledge Base (26), are included in the name search wherever possible. Predefined sets of proteins, such as TFs and TcoFs, can also be viewed, and the view of TcoFs can be restricted to certain groups. The results page gives a comprehensive overview of all interactions. For each TF, it shows which other TFs it interacts with and which interactions with TcoFs and other interacting proteins are evident. For each TcoF and for each other protein interacting with a TF, a list of the TFs that it interacts with is displayed. Each page outlining protein details also displays the evidence for the protein's status. Online help and a user manual are available through the aforementioned URLs to aid the user in exploiting this database resource. In addition, we provide the core data of our database for download in a format that can be easily processed by a computer. The database will be updated annually.
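As an illustration of how a relational layout can answer the per-protein lookups described above (for instance, listing the TFs that a given TcoF interacts with), here is a minimal sketch. The table and column names are hypothetical and are not the actual TcoF-DB schema; Python's built-in `sqlite3` stands in for PostgreSQL so that the example is self-contained, and all rows are invented toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE protein (
    acc      TEXT PRIMARY KEY,   -- accession (toy values here)
    name     TEXT NOT NULL,
    category TEXT NOT NULL       -- 'TF', 'TcoF' or 'other'
);
CREATE TABLE interaction (
    protein_a TEXT NOT NULL REFERENCES protein(acc),
    protein_b TEXT NOT NULL REFERENCES protein(acc),
    source_db TEXT NOT NULL,     -- where the evidence was taken from
    PRIMARY KEY (protein_a, protein_b, source_db)
);
""")

conn.executemany("INSERT INTO protein VALUES (?, ?, ?)", [
    ("X_TF1", "toy TF one", "TF"),
    ("X_TF2", "toy TF two", "TF"),
    ("X_CO1", "toy co-factor", "TcoF"),
    ("X_OT1", "toy other protein", "other"),
])
conn.executemany("INSERT INTO interaction VALUES (?, ?, ?)", [
    ("X_TF1", "X_CO1", "dbA"),
    ("X_TF1", "X_CO1", "dbB"),   # same pair reported by a second source
    ("X_CO1", "X_TF2", "dbB"),
    ("X_TF1", "X_OT1", "dbA"),
])


def interacting_tfs(conn, acc):
    """Names of all TFs that interact with the protein `acc`,
    regardless of which side of the interaction row it sits on."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.name
        FROM interaction AS i
        JOIN protein AS p
          ON p.acc = CASE WHEN i.protein_a = ? THEN i.protein_b
                          ELSE i.protein_a END
        WHERE (i.protein_a = ? OR i.protein_b = ?)
          AND p.category = 'TF'
        ORDER BY p.name
        """,
        (acc, acc, acc),
    ).fetchall()
    return [name for (name,) in rows]
```

Keeping one interaction row per source database preserves the evidence for each interaction, while `DISTINCT` in the query prevents a partner from being listed twice.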

DISCUSSION

To the best of our knowledge, this database constitutes the only currently available resource that provides comprehensive information about PPIs of human TFs, human TcoFs and other human proteins that interact with human TFs. While there are numerous publicly available resources regarding PPI (44), none of them provides information specifically related to protein involvement in transcriptional regulation, nor do they provide any mechanism to search directly for such information. TcoF-DB is developed with the understanding that a TcoF is a protein that has a proven binding interaction with a known TF, but does not itself bind directly to the regulatory DNA region. A TcoF is also required to be reported to be involved in the regulation of transcription. While it is true that the mechanism of transcription regulation involves larger protein complexes, we have concentrated our efforts on interactions between two proteins, one of which is a known TF. The method by which we have chosen TFs provides a high accuracy of the TF set. We rely on a previously published set of TFs that was manually curated, and we added a small number of manually curated TFs to this set. Thus, all TFs in our set have been hand-curated at some stage during the process of establishing our set of known TFs. Although we have applied stringent requirements for a protein to be considered a TcoF, we still rely on annotations in GO for determining whether a protein is involved in the regulation of transcription. We are aware that GO is in some cases unreliable due to incomplete or erroneous annotations, but we are not aware of an equally rich data source for functional annotations of human proteins.
In order to enable users to judge the reliability of the annotations of a given protein, we have included detailed information about the evidence cited for each annotation used to classify a protein as a TcoF, including GO evidence code, evidence type and a reference pointing to the data source. Thus, the user can quickly distinguish TcoFs whose classification depends solely on non-experimental evidence, such as automatic annotations or author statements, from those whose annotation has been experimentally confirmed. It must also be said that the reliability of our identification of TcoFs depends on the availability of PPI data in the resources we utilized. Missing or erroneous data in these resources will inevitably lead to the misclassification of TcoFs in TcoF-DB. However, we benefit from using four resources in parallel in the extraction of PPI data. The average numbers of interactions per TF and TcoF show that interactions are abundant in transcriptional regulation. The arithmetic mean of the number of interactions per TF is 1.8 for TF–TcoF interactions, 1.7 for TF–TF interactions and 2.7 for interactions between TFs and other proteins. In other words, the average TF interacts with 1.7 other TFs, 1.8 TcoFs and 2.7 other proteins. At the same time, for each of these interaction types more than half of our TFs have no interaction, so the median values are all zero. Considering all protein types together (TFs, TcoFs and other proteins), 712 out of 1365 TFs have an interaction. The fact that more than half of the known human TFs have at least one proven PPI underlines the importance of PPI and TcoFs for the understanding of transcriptional regulation.
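The combination of a mean above one and a median of zero is typical of a skewed distribution in which most entries are zero and a few are very large. The short sketch below illustrates this with invented counts for nine hypothetical TFs; these are not data from TcoF-DB. Only the figures 712 and 1365 are taken from the text.

```python
from statistics import mean, median

# Toy interaction counts of one type for nine hypothetical TFs:
# most have no partner of this type, one hub has many.
counts = [0, 0, 0, 0, 0, 1, 1, 2, 14]

avg = mean(counts)     # pulled upwards by the single hub
mid = median(counts)   # middle value of the sorted counts
share_with_partner = sum(c > 0 for c in counts) / len(counts)

# Figures from the text: TFs with at least one interaction of any type.
fraction_with_any = 712 / 1365
```

In this toy list, fewer than half of the entries are non-zero, so the median is zero, while the one highly connected entry lifts the mean well above it.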
The difference between the arithmetic mean and the median values described above also illustrates the fact that a number of TFs are very promiscuous regarding PPI. The TF ‘cellular tumour antigen p53’ is one of the best-studied human TFs (13). It is also one of the most promiscuous TFs regarding reported PPI. For this TF, we report 313 unique interacting proteins: 35 other TFs, 99 TcoFs (31 of them from the high-confidence group) and 179 other proteins. The human TF ‘TATA-box binding protein’ is also promiscuous, interacting with 71 TcoFs (17 of them from the high-confidence group), 33 other TFs and 26 other proteins. These are two of the most highly interacting TFs in our database. With 1365 TFs and 529 TcoFs (155 high-confidence TcoFs, 374 hypothetical TcoFs) identified in this database, we can confirm a previous estimate that ∼10% of the human proteome is directly involved in transcriptional regulation (45). With human TFs having a total of 7045 interactions with other human proteins, the number of TF–protein complexes possibly involved in transcription is significantly larger than the number of TFs. One has to keep in mind that our classification of TcoFs depends on annotation data, which in turn depends on experiments and data submissions. Such data is often inaccurate and incomplete, frequently focusing only on a specific cell location or specific function, while alternative cell locations in which the protein is expressed and/or alternative protein functions remain elusive. For this reason, proteins that have a proven interaction with a known TF, but are not classified as TcoFs in our database because they lack the annotation to support such a claim, play an important role: in some cases they might represent candidates for proteins with roles in transcriptional regulation.

FUNDING

Funding for open access charge: King Abdullah University of Science and Technology. Conflict of interest statement. None declared.

REFERENCES (10 of 45 shown)

Review 1.  Role of general and gene-specific cofactors in the regulation of eukaryotic transcription.

Authors:  R G Roeder
Journal:  Cold Spring Harb Symp Quant Biol       Date:  1998

Review 2.  Coactivator and corepressor complexes in nuclear receptor function.

Authors:  L Xu; C K Glass; M G Rosenfeld
Journal:  Curr Opin Genet Dev       Date:  1999-04       Impact factor: 5.578

Review 3.  Transcriptional regulation in mammalian cells by sequence-specific DNA binding proteins.

Authors:  P J Mitchell; R Tjian
Journal:  Science       Date:  1989-07-28       Impact factor: 47.728

Review 4.  Transcription factors: an overview.

Authors:  D S Latchman
Journal:  Int J Biochem Cell Biol       Date:  1997-12       Impact factor: 5.085

Review 5.  The general transcription machinery and general cofactors.

Authors:  Mary C Thomas; Cheng-Ming Chiang
Journal:  Crit Rev Biochem Mol Biol       Date:  2006 May-Jun       Impact factor: 8.250

6.  The MIPS mammalian protein-protein interaction database.

Authors:  Philipp Pagel; Stefan Kovac; Matthias Oesterheld; Barbara Brauner; Irmtraud Dunger-Kaltenbach; Goar Frishman; Corinna Montrone; Pekka Mark; Volker Stümpflen; Hans-Werner Mewes; Andreas Ruepp; Dmitrij Frishman
Journal:  Bioinformatics       Date:  2004-11-05       Impact factor: 6.937

Review 7.  Multifunctional transcription factor YY1: a therapeutic target in human cancer?

Authors:  Chi-Chung Wang; Jeremy J W Chen; Pan-Chyr Yang
Journal:  Expert Opin Ther Targets       Date:  2006-04       Impact factor: 6.902

Review 8.  Transcription factors in disease.

Authors:  D Engelkamp; V van Heyningen
Journal:  Curr Opin Genet Dev       Date:  1996-06       Impact factor: 5.578

9.  TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes.

Authors:  V Matys; O V Kel-Margoulis; E Fricke; I Liebich; S Land; A Barre-Dirrie; I Reuter; D Chekmenev; M Krull; K Hornischer; N Voss; P Stegmaier; B Lewicki-Potapov; H Saxel; A E Kel; E Wingender
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

10.  An evaluation of human protein-protein interaction data in the public domain.

Authors:  Suresh Mathivanan; Balamurugan Periaswamy; T K B Gandhi; Kumaran Kandasamy; Shubha Suresh; Riaz Mohmood; Y L Ramachandra; Akhilesh Pandey
Journal:  BMC Bioinformatics       Date:  2006-12-18       Impact factor: 3.169
