Marek J Piatek, Michael C Schramm, Dharani D Burra, Abdulaziz Binshbreen, Boris R Jankovic, Rajesh Chowdhary, John A C Archer, Vladimir B Bajic.
Abstract
BACKGROUND: Initiation of transcription is essential for most of the cellular responses to environmental conditions and for cell and tissue specificity. This process is regulated through numerous proteins, their ligands and mutual interactions, as well as interactions with DNA. The key regulatory proteins of this kind are transcription factors (TFs) and transcription co-factors (TcoFs). TcoFs are important because they modulate the transcription initiation process through interaction with TFs. In eukaryotes, transcription requires that TFs form different protein complexes with various nuclear proteins. To better understand transcription regulation, it is important to know the functional class of proteins interacting with TFs during transcription initiation. Such information is not fully available: because the functional annotation of proteins is generally partial, not all proteins that act as TFs or TcoFs are yet annotated as such. In this study we have developed a method that uses only the sequence composition of the interacting proteins to predict the functional class of human TF binding partners as (i) TF, (ii) TcoF, or (iii) other nuclear protein. This allows the annotation of the currently known pool of nuclear proteins to be complemented. Since the method requires only protein sequences in addition to protein interaction data, it should be easily applicable to many species.
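To make "using only sequence composition" concrete, the sketch below turns a TF and its binding partner into one numeric feature vector. It is an assumption-laden illustration, not the authors' feature set: it uses plain amino-acid composition (the fraction of each of the 20 standard residues) rather than the physico-chemical properties the pipeline actually computes, and the names `composition` and `pair_features` are hypothetical.

```python
# Hypothetical illustration (not the paper's features): amino-acid composition
# vectors for a TF and its binding partner, concatenated into one feature vector.

# The 20 standard amino acids in a fixed order, so every vector is aligned.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Non-standard residues (e.g. X, U) are ignored when normalizing.
    """
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        # No standard residues: return an all-zero vector instead of dividing by 0.
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def pair_features(tf_seq: str, partner_seq: str) -> list[float]:
    """Concatenate the composition vectors of a TF and its partner (40 values)."""
    return composition(tf_seq) + composition(partner_seq)


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    features = pair_features("MKKLLA", "AAGG")
    print(len(features))  # 40
```

Because the vector depends only on the two sequences, it can be computed for any species with sequenced proteins and known interactions, which is the portability argument made in the abstract.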
Year: 2013 PMID: 23874789 PMCID: PMC3709904 DOI: 10.1371/journal.pone.0068857
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Workflow of Analysis Pipeline.
Given a set of binding proteins and depending on the model type, the analysis pipeline predicts the functional identity of a TF binding partner. After a pair of binding proteins is given as input, the pipeline calculates numerous amino acid physico-chemical properties for the pair and generates a matrix of feature vectors. This matrix is submitted to the Random Forest classifier, which returns the predicted functional class and the associated confidence score. The confidence score measures how strongly the averaged feature-class probabilities of all decision trees point to each class, based on the information gain of those decision trees.
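The last step of the workflow, in which the forest's trees are combined into a class and a confidence score, can be sketched as soft voting: average each tree's class probabilities and report the class with the highest mean. This is a simplified assumption. The confidence here is just the averaged probability of the winning class, which does not reproduce the information-gain weighting described in the caption. `forest_predict` and the class labels tuple are hypothetical names.

```python
# Hypothetical illustration: soft voting in a Random Forest, turning per-tree
# class probabilities into a predicted class and a simple confidence score.
# This does NOT reproduce the paper's exact confidence computation.

CLASSES = ("TF", "TcoF", "other nuclear protein")


def forest_predict(tree_probabilities: list[list[float]]) -> tuple[str, float]:
    """Average per-tree class probabilities and return (class, confidence).

    Each inner list holds one tree's probability for each entry in CLASSES.
    The confidence is the averaged probability of the winning class.
    """
    if not tree_probabilities:
        raise ValueError("need at least one tree")
    n_trees = len(tree_probabilities)
    # Mean probability of each class across all trees.
    averaged = [
        sum(tree[i] for tree in tree_probabilities) / n_trees
        for i in range(len(CLASSES))
    ]
    # Index of the class with the highest averaged probability.
    best = max(range(len(CLASSES)), key=lambda i: averaged[i])
    return CLASSES[best], averaged[best]


if __name__ == "__main__":
    # Three toy trees scoring one TF binding partner.
    trees = [
        [0.7, 0.2, 0.1],
        [0.5, 0.4, 0.1],
        [0.6, 0.3, 0.1],
    ]
    label, confidence = forest_predict(trees)
    print(label, round(confidence, 2))  # TF 0.6
```

Averaging probabilities rather than counting hard votes keeps information about how close the runner-up classes were, which is what makes a graded confidence score possible at all.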