Claus Desler, Prashanth Suravajhala, May Sanderhoff, Merete Rasmussen, Lene Juel Rasmussen.
Abstract
BACKGROUND: A hypothetical protein is a protein that is predicted to be expressed from an open reading frame, but for which there is no experimental evidence of translation. Hypothetical proteins constitute a substantial fraction of the proteomes of humans and other eukaryotes. Given the general belief that the majority of hypothetical proteins are the products of pseudogenes, it is essential to have a tool capable of pinpointing the minority of hypothetical proteins with a high probability of being expressed.
Year: 2009 PMID: 19754976 PMCID: PMC2758874 DOI: 10.1186/1471-2105-10-289
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Overview of subcellular localization prediction programs for eukaryotic proteins
| Number of compartments | Reported accuracy | |
| 4-5 | 67-76% | |
| 4 | 74% | |
| 1 | 85% | |
| 11 | 75% | |
| 11 | 81-94% | |
| 9 | 68-87% | |
| 3 | 90% | |
| 12 | 80% |
A selection of subcellular localization prediction programs for eukaryotic proteins reported to have a medium to high prediction accuracy. Listed are the numbers of compartments each program can predict targeting to, and the reported accuracy of the prediction.
Predicted subcellular distribution of human hypothetical proteins
| Compartment | 2006 dataset | 2008 dataset |
| Nucleus | 37% | 36% |
| Cytoplasm | 13% | 14% |
| Plasma membrane | 12% | 8% |
| Lysosomes | 9% | 9% |
| Golgi | 9% | 11% |
| Peroxisomes | 7% | 10% |
| Extracellular/Secretory | 6% | 4% |
| Mitochondria | 5% | 5% |
| Endoplasmic reticulum | 2% | 3% |
The protein localization prediction program pTarget was used to predict the subcellular localization of 5860 and 1455 hypothetical proteins from the 2006 and 2008 datasets respectively.
Comparison of predicted versus experimentally determined status of proteins
| 20 | 30% | 65% | 5% | 85% | |
| 56 | 27% | 64% | 9% | 58% | |
| 100 | 36% | 53% | 11% | 45% | |
| - | 25% | 21% | 54% | 6% | |
Hypothetical proteins from the 2006 dataset were sorted into groups according to the probability of having a mitochondrial N-terminal presequence localization signal. Group I contains proteins that TargetP assigned to reliability class A, the strongest prediction. Group II contains proteins of reliability classes A and B, and Group III contains proteins of reliability classes A, B and C. All proteins of Groups I, II and III have identifiable protein domains according to SMART. The three groups were compared with all 5860 proteins of the 2006 dataset, and with their respective 2008 annotations, to evaluate whether the proteins have since been characterized as mitochondrial or have been removed.
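The nested grouping described in this caption can be illustrated with a minimal sketch. It assumes each record has already been restricted to proteins predicted to carry a mitochondrial presequence, and that the reliability class and the result of a domain search are available as plain fields. The `Prediction` record, the `select_groups` helper and the example identifiers are hypothetical names introduced for illustration; they are not the output format of TargetP or SMART, nor the authors' actual pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record for one protein predicted to be mitochondrial."""
    protein_id: str
    reliability_class: str  # "A" is the strongest class, as named in the caption
    has_domain: bool        # True if a domain search (e.g. SMART) found a domain


# Reliability classes admitted into each group, following the caption.
GROUP_CLASSES = {
    "I": {"A"},
    "II": {"A", "B"},
    "III": {"A", "B", "C"},
}


def select_groups(predictions):
    """Return {group name: [protein ids]}, keeping only proteins with a domain."""
    groups = {name: [] for name in GROUP_CLASSES}
    for p in predictions:
        if not p.has_domain:
            continue  # every group requires an identifiable protein domain
        for name, classes in GROUP_CLASSES.items():
            if p.reliability_class in classes:
                groups[name].append(p.protein_id)
    return groups


# Made-up records, for illustration only.
example = [
    Prediction("P1", "A", True),
    Prediction("P2", "B", True),
    Prediction("P3", "C", True),
    Prediction("P4", "A", False),  # dropped: no identifiable domain
    Prediction("P5", "D", True),   # dropped: class outside A to C
]
groups = select_groups(example)
```

Because the class sets are nested, every protein in Group I also appears in Groups II and III, which is consistent with the increasing group sizes reported in the table.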
Identified protein domains of 6 hypothetical proteins
| Complex-1-LYR | This hypothetical protein contains a Complex-1-LYR domain. The domain is present in a family of proteins, which include mitochondrial proteins from NADH-ubiquinone oxidoreductase complex 1. The domain is also present in the | |
| Methyltransf 12 | Methyltransferase 12 domain is present in proteins, which actively transfer methyl from ubiquitous S-adenosyl-L-methionine (SAM) to nitrogen, oxygen or carbon. This methyltransferase domain is found in a variety of SAM-dependent methyltransferases including Coq3 methyltransferase, which is a mitochondrial protein involved in ubiquinone biosynthesis. Coq3 protein is located in the matrix of the mitochondria [ | |
| Sel1 | Sel1 like repeats are tetratricopeptide repeats (TPR) identified in LIN-12 proteins of | |
| EAW75090 | DUF1640 | DUF1640 domain is found in proteins of unknown functions. In |
| DUF143 | DUF143: This domain has no known function and is found in the | |
| EAW74251 | Trm112p | Trm112p is a zinc finger domain found in the TRM112 protein that is required for tRNA methylation in |
Description of protein domains identified in 6 hypothetical proteins of Group I, predicted to be expressed and to have a role in a mitochondrial context. For 4 of the 6 proteins (the first 4 domains listed), the identified domains have been described in experimentally characterized mitochondrial proteins.
Validation of selection strategy using a variety of prediction tools
| 20 | 30% | 65% | 5% | 85% | |
| 100 | 36% | 53% | 11% | 45% | |
| 198 | 34% | 57% | 8% | 25% | |
| 154 | 31% | 61% | 7% | 28% | |
| 9 | 11% | 89% | 0% | 100% | |
| - | 25% | 21% | 54% | 6% | |
Different combinations of prediction tools were applied to either the whole 2006 dataset or parts of it, to demonstrate that our selection strategy works with a variety of prediction tools and depends on neither TargetP nor SMART.