Gaston K Mazandu, Nicola J Mulder.
Abstract
High-throughput biology technologies have yielded complete genome sequences and functional genomics data for several organisms, including crucial microbial pathogens of humans, animals and plants. However, up to 50% of genes within a genome are often labeled "unknown", "uncharacterized" or "hypothetical", limiting our understanding of the virulence and pathogenicity of these organisms. Even though the biological functions of the proteins encoded by these genes are not known, many of them have been predicted to be involved in key processes in these organisms. In particular, for Mycobacterium tuberculosis (MTB), some of these "hypothetical" proteins, for example those belonging to the Pro-Glu or Pro-Pro-Glu (PE/PPE) family, have been suspected to play a crucial role in the intracellular lifestyle of this pathogen, and may contribute to its survival in different environments. We have generated a functional interaction network for Mycobacterium tuberculosis proteins and used it to predict functions for many of its hypothetical proteins. Here we performed functional enrichment analysis of these proteins based on their predicted biological functions to identify annotations that are statistically relevant, and analysed the network properties of hypothetical proteins in comparison with those of known proteins. From the statistically significant annotations and network information, we have tried to derive biologically meaningful annotations related to infection and disease. This quantitative analysis provides an overview of the functional contributions of Mycobacterium tuberculosis "hypothetical" proteins to many basic cellular functions, including its adaptability in the host system and its ability to evade the host immune response.
Keywords: function prediction; genome analysis; hypothetical protein; interaction network; tuberculosis
Year: 2012 PMID: 22837694 PMCID: PMC3397526 DOI: 10.3390/ijms13067283
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
General MTB functional network topological parameters.
| Parameters | Value |
|---|---|
| Number of Proteins (Nodes) | 4136 |
| Number of Functional Interactions (Edges) | 59,919 |
| Average Degree (in and out) | 28.974 |
| Average Shortest Path Length | 3.6274 |
| Maximum Path Length | 11 |
| Number of Connected Components | 23 |
| % of Nodes in Largest Component | 98.7% |
| Number of Hubs | 201 |
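The average degree in the table above follows directly from the node and edge counts if the network is treated as undirected, since each edge then contributes to the degree of two nodes. A minimal sketch of that arithmetic (the function name `average_degree` is an illustrative choice, not taken from the paper):

```python
def average_degree(num_nodes: int, num_edges: int) -> float:
    """Mean degree of an undirected graph: every edge adds 1 to two node degrees."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return 2 * num_edges / num_nodes


# Counts taken from the table: 4136 proteins, 59,919 functional interactions.
print(round(average_degree(4136, 59919), 3))  # 28.974
```

The result agrees with the reported average degree of 28.974.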
Comparison of network topological properties of hypothetical proteins to the standard network topological values and those of proteins with previously known GO biological process terms.
| Metric | Hypothetical (average) | Other proteins (average) | Expected value (average) | Other proteins | Expected value |
|---|---|---|---|---|---|
| Degree | 16.16892 | 37.77267 | 28.09381 | | |
| Closeness | 0.277673 | 0.293277 | 0.27568 | 5.562 | 0.9792 |
| Betweenness | 6948.217 | 13913 | 15003 | | |
| Eigenvector | 0.000314 | 0.00591 | 0.00340 | | |
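To make the degree and closeness metrics above concrete, the following is a minimal sketch that computes both on a small toy graph using breadth-first search. The graph, the function name `closeness`, and the normalisation (reachable nodes divided by the sum of shortest-path lengths) are assumptions for illustration; the exact definitions used by the authors are not given in this record.

```python
from collections import deque


def closeness(graph, source):
    """Closeness of `source`: reachable nodes / sum of shortest-path lengths to them."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search yields shortest paths in an unweighted graph
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical star-shaped toy network: B is the central node.
toy = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
degree = {node: len(neighbours) for node, neighbours in toy.items()}

print(degree["B"], closeness(toy, "B"))  # central node: highest degree and closeness
print(degree["A"], closeness(toy, "A"))  # peripheral node: lower on both metrics
```

In this toy example the central node scores higher on both metrics, which is the kind of contrast the table draws between hypothetical and other proteins.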
Figure 1. Scatter plot showing proteins that are central in the MTB functional network. Each protein in the genome is plotted by its closeness value and coloured by whether it is characterised and central.
Proteins that have InterPro matches mapped to GO terms and their similarities to predicted GO terms.
| UniProt-ID | InterPro-ID | GO mapping | GO Prediction | Power |
|---|---|---|---|---|
| P67745 | IPR000835 | GO:0006355 | GO:0006355 | 1.00000 |
| Q8VKE6 | IPR002514 | GO:0006313 | GO:0006313 | 1.00000 |
| Q7D8E8 | IPR001845 | GO:0006355 | GO:0006355 | 1.00000 |
| Q7D5V1 | IPR000836 | GO:0009116 | GO:0009116 | 1.00000 |
| Q7D8W2 | IPR001087 | GO:0006629 | GO:0006629 | 1.00000 |
| Q8VJC1 | IPR006059 | GO:0006810 | GO:0006810 | 1.00000 |
| Q7D9M5 | IPR020946 | GO:0055114 | GO:0055114 | 1.00000 |
| P64725 | IPR002539 | GO:0008152 | GO:0008152 | 1.00000 |
| P71788 | IPR013216 | GO:0008152 | GO:0008152 | 1.00000 |
| Q10777 | IPR000873 | GO:0008152 | GO:0008152 | 1.00000 |
| O05796 | IPR013216 | GO:0008152 | GO:0008152 | 1.00000 |
| O07197 | IPR013094 | GO:0008152 | GO:0008152 | 1.00000 |
| P0A5F5 | IPR005674 | GO:0008152 | GO:0008152 | 0.66604 |
| | IPR000383 | GO:0006508 | GO:0044238 | |
| O06547 | IPR013216 | GO:0008152 | GO:0044237 | 0.24819 |
| Q7D8C2 | IPR013216 | GO:0008152 | GO:0042158 | 0.03033 |
| O05294 | IPR012908 | GO:0006886 | GO:0044237 | 0.01435 |
| | | GO:0006505 | | |
Distribution of hypothetical proteins in the MTB proteome by functional class, before and after function prediction. PE/PPE- indicates the number of predicted proteins originating from the PE/PPE family.
| Functional Class | # Proteins before | Prediction | PE/PPE- | | # Proteins after | % change |
|---|---|---|---|---|---|---|
| 1 virulence, detoxification, adaptation | 176 | 85 | 1 | 3.33067 | 261 | 32.6 |
| 2 lipid metabolism | 230 | 80 | 6 | 9.76996 | 310 | 25.8 |
| 3 information pathways | 245 | 93 | 3 | 9.76996 | 338 | 27.5 |
| 4 cell wall and cell processes | 618 | 418 | 62 | 3.33067 | 1036 | 40.3 |
| 5 insertion seqs and phages | 82 | 65 | 4 | 1.11022 | 147 | 44.2 |
| 6 PE/PPE | 147 | −115 | - | - | 32 | - |
| 7 intermediary metabolism and respiration | 884 | 637 | 38 | 4.95160 | 1521 | 41.9 |
| 8 unknown | 1637 | −1351 | - | - | 286 | |
| 9 regulatory proteins | 176 | 88 | 1 | 9.54792 | 264 | 33.3 |
| Total | 4195 | 1466 | 115 | - | 4195 | 37.8 |
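The per-class figures in the table above are internally consistent: the "after" count equals "before" plus "Prediction", and the "% change" values match the share of newly predicted proteins within each class after prediction. The sketch below checks this; the formula is inferred from the table values rather than stated in this record, and the names `ROWS` and `percent_predicted` are illustrative.

```python
# (proteins before, newly predicted) per functional class, copied from the table.
ROWS = {
    "virulence, detoxification, adaptation": (176, 85),
    "lipid metabolism": (230, 80),
    "information pathways": (245, 93),
    "cell wall and cell processes": (618, 418),
    "insertion seqs and phages": (82, 65),
    "intermediary metabolism and respiration": (884, 637),
    "regulatory proteins": (176, 88),
}


def percent_predicted(before: int, predicted: int) -> float:
    """Share of a class made up of newly predicted proteins, in percent (assumed formula)."""
    after = before + predicted
    return round(100 * predicted / after, 1)


for name, (before, predicted) in ROWS.items():
    print(f"{name}: after={before + predicted}, % change={percent_predicted(before, predicted)}")

# The seven gains sum to the table's total of 1466 newly assigned proteins.
print(sum(predicted for _, predicted in ROWS.values()))
```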
GO process terms significantly over-represented in the newly predicted GO set compared to complete set of GO terms.
| GO ID | GO name | Frequency | |
|---|---|---|---|
| GO:0006730 | one-carbon metabolic process | 364 | 0.00000 |
| GO:0009132 | nucleoside diphosphate metabolic process | 74 | 0.00000 |
| GO:0009123 | nucleoside monophosphate metabolic process | 72 | 0.00000 |
| GO:0009141 | nucleoside triphosphate metabolic process | 71 | 0.00000 |
| GO:0006353 | transcription termination, DNA-dependent | 88 | 1.11022 |
| GO:0019538 | protein metabolic process | 87 | 1.11022 |
| GO:0022900 | electron transport chain | 354 | 2.22045 |
| GO:0006793 | phosphorus metabolic process | 324 | 2.22045 |
| GO:0009061 | anaerobic respiration | 135 | 2.22045 |
| GO:0009307 | DNA restriction-modification system | 73 | 2.22045 |
| GO:0006662 | glycerol ether metabolic process | 277 | 3.33067 |
| GO:0015074 | DNA integration | 157 | 3.33067 |
| GO:0019419 | sulfate reduction | 86 | 3.33067 |
| GO:0051090 | regulation of sequence-specific DNA binding transcription factor activity | 64 | 3.33067 |
| GO:0006139 | nucleobase-containing compound metabolic process | 336 | 4.44089 |
| GO:0006259 | DNA metabolic process | 169 | 4.44089 |
| GO:0006313 | transposition, DNA-mediated | 113 | 4.44089 |
| GO:0006797 | polyphosphate metabolic process | 672 | 5.55112 |
| GO:0006796 | phosphate-containing compound metabolic process | 643 | 5.55112 |
| GO:0000103 | sulfate assimilation | 603 | 5.55112 |
| GO:0044238 | primary metabolic process | 548 | 5.55112 |
| GO:0006281 | DNA repair | 246 | 5.55112 |
| GO:0006413 | translational initiation | 156 | 5.55112 |
| GO:0009116 | nucleoside metabolic process | 107 | 5.55112 |
| GO:0044255 | cellular lipid metabolic process | 444 | 6.66134 |
| GO:0044262 | cellular carbohydrate metabolic process | 317 | 6.66134 |
| GO:0001121 | transcription from bacterial-type RNA polymerase promoter | 273 | 6.66134 |
| GO:0006302 | double-strand break repair | 200 | 6.66134 |
| GO:0009225 | nucleotide-sugar metabolic process | 177 | 6.66134 |
| GO:0006310 | DNA recombination | 170 | 6.66134 |
| GO:0006260 | DNA replication | 156 | 6.66134 |
| GO:0006396 | RNA processing | 109 | 6.66134 |
| GO:0015977 | carbon fixation | 534 | 7.77161 |
| GO:0042126 | nitrate metabolic process | 477 | 7.77156 |
| GO:0006082 | organic acid metabolic process | 244 | 7.77156 |
| GO:0006266 | DNA ligation | 150 | 7.77156 |
| GO:0006104 | succinyl-CoA metabolic process | 144 | 7.77156 |
| GO:0009060 | aerobic respiration | 118 | 7.77156 |
| GO:0006352 | transcription initiation, DNA-dependent | 112 | 7.77156 |
| GO:0044249 | cellular biosynthetic process | 618 | 8.88178 |
| GO:0006351 | transcription, DNA-dependent | 333 | 8.88178 |
| GO:0009399 | nitrogen fixation | 271 | 8.88178 |
| GO:0016042 | lipid catabolic process | 107 | 8.88178 |
| GO:0009117 | nucleotide metabolic process | 101 | 8.88178 |
| GO:0043620 | regulation of DNA-dependent transcription in response to stress | 96 | 8.88178 |
| GO:0001522 | pseudouridine synthesis | 92 | 8.88178 |
| GO:0006314 | intron homing | 137 | 9.99201 |
| GO:0006066 | alcohol metabolic process | 635 | 1.11022 |
| GO:0090305 | nucleic acid phosphodiester bond hydrolysis | 222 | 1.11022 |
| GO:0006284 | base-excision repair | 212 | 1.11022 |
| GO:0006289 | nucleotide-excision repair | 210 | 1.33227 |
| GO:0009451 | RNA modification | 101 | 1.33227 |
| GO:0008610 | lipid biosynthetic process | 184 | 1.66534 |
| GO:0044267 | cellular protein metabolic process | 58 | 3.66374 |
| GO:0006268 | DNA unwinding involved in replication | 53 | 7.32747 |
| GO:0006270 | DNA-dependent DNA replication initiation | 52 | 1.36557 |
| GO:0006412 | translation | 207 | 4.56302 |
| GO:0031554 | regulation of transcription termination, DNA-dependent | 50 | 5.19584 |
| GO:0030261 | chromosome condensation | 56 | 1.21125 |
| GO:0006801 | superoxide metabolic process | 693 | 2.91767 |
| GO:0000725 | recombinational repair | 46 | 8.03135 |
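Over-representation of a GO term in a predicted set is commonly assessed with a one-sided hypergeometric test. The sketch below shows that calculation under the assumption that such a test is appropriate here; the statistical test actually used by the authors is not stated in this record, and the function name and toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int, sample: int, hits: int) -> float:
    """P(X >= hits) when `sample` items are drawn without replacement from
    `population` items, of which `annotated` carry the GO term of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers (not from the paper): 10 proteins, 4 carry the term,
# 3 are in the predicted set, and 2 of those carry the term.
print(hypergeom_enrichment_p(10, 4, 3, 2))
```

A small p-value indicates that the term occurs in the predicted set more often than expected by chance; in practice the p-values would also be corrected for multiple testing across GO terms.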
The number of associations in the MTB functional network, shown separately for each data source and confidence range from low to high.
| Association Evidence by Type | Low Confidence | Medium Confidence | High Confidence |
|---|---|---|---|
| Previous Functional Network | 6850 | 32488 | 25605 |
| Domain-domain | 0 | 5082 | 864 |
| Interologs | 0 | 0 | 1701 |
| Combined Score | 6844 | 30142 | 29776 |
Figure 2. Performance analysis of the functional class prediction approaches. (a) ROC curve; (b) P-ROC curve.
Figure 3. Performance analysis of the function prediction approaches for BP ontology. (a) ROC curve; (b) P-ROC curve.
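For readers unfamiliar with ROC analysis, the following minimal sketch shows how a standard ROC curve and its area under the curve (AUC) can be computed from prediction scores and true labels. It illustrates the general technique only: the scores and labels are made up, the function names are hypothetical, and the P-ROC variant shown in the figures is not covered.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) points from sweeping a threshold high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process tied scores together so they produce a single point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data: 1 marks a correct annotation, 0 an incorrect one.
curve = roc_points([0.9, 0.8, 0.7, 0.4], [1, 0, 1, 0])
print(curve)
print(auc(curve))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of correct from incorrect predictions.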