Antonio Marco, Ignacio Marín.
Abstract
BACKGROUND: The characterization of the global functional structure of a cell is a major goal in bioinformatics and systems biology. Gene Ontology (GO) and the protein-protein interaction network offer alternative views of that structure.
Year: 2009 PMID: 19604360 PMCID: PMC2717056 DOI: 10.1186/1752-0509-3-69
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. Overview of the strategy used to compare GO and the interactome. For a given parent GO term, we extracted the proteins annotated with it and determined their primary distances (shortest path length) in the protein interaction network. The resulting graph was transformed into a dendrogram with UVCLUSTER. We then retrieved the proteins annotated with each child GO term and labeled them in the tree. We finally detected, using the program TreeTracker, the clusters in the tree significantly enriched for each child GO term.
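The primary-distance step described in the caption can be sketched as a breadth-first search over the interaction network. This is a minimal illustration, not the authors' implementation: the function names and the toy protein names are hypothetical, and the sketch uses whatever edge list it is given, since the caption does not say whether paths may pass through proteins outside the parent GO term. Pairs with no connecting path are reported as `None`.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Turn an undirected edge list into a dict of neighbor sets."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def bfs_lengths(adjacency, source):
    """Shortest path length from `source` to every reachable node."""
    lengths = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in lengths:
                lengths[neighbor] = lengths[node] + 1
                queue.append(neighbor)
    return lengths


def primary_distances(edges, annotated):
    """Pairwise shortest path lengths among the annotated proteins.

    Returns {(a, b): length or None}, with a < b in sorted order.
    """
    adjacency = build_adjacency(edges)
    proteins = sorted(annotated)
    reach = {p: bfs_lengths(adjacency, p) for p in proteins}
    return {(a, b): reach[a].get(b) for a, b in combinations(proteins, 2)}


# Toy network with hypothetical protein names; P5 has no interactions.
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
toy_annotated = {"P1", "P3", "P4", "P5"}
toy_distances = primary_distances(toy_edges, toy_annotated)
```

In the toy network, P5 has no interactions, so every pair involving it has no defined distance, which loosely mirrors the "unconnected proteins" that are set apart in the dendrogram.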
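Parent GO terms selected for the analysis, and number of elements included.
| Parent GO term | Level¹ | Genes² | ORFs³ | DIP⁴ | DIP (%) | GOLD⁵ | GOLD (%) |
|---|---|---|---|---|---|---|---|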
| Developmental process (BP) | 1 | 768 | 757 | 632 | 83.5% | 257 | 34.0% |
| Reproduction (BP) | 1 | 299 | 298 | 245 | 82.2% | 111 | 37.3% |
| Establishment of cellular localization (BP) | 1 | 573 | 568 | 452 | 79.6% | 188 | 33.1% |
| Response to stimulus (BP) | 1 | 670 | 657 | 514 | 78.2% | 207 | 31.5% |
| Ribonucleoprotein complex (CC) | 2 | 556 | 459 | 318 | 69.3% | 96 | 20.9% |
| Organelle envelope (CC) | 2 | 346 | 345 | 230 | 66.7% | 69 | 20.0% |
| Transcription regulator activity (MF) | 1 | 307 | 303 | 276 | 91.1% | 107 | 35.3% |
| Structural molecule activity (MF) | 1 | 307 | 286 | 231 | 80.8% | 75 | 26.2% |
| Transporter activity (MF) | 1 | 380 | 377 | 297 | 78.8% | 63 | 16.7% |
BP: Biological Process; CC: Cellular Component; MF: Molecular Function. 1: Level of the parent GO term. Level 1 terms are located hierarchically just below the three main categories (BP, CC and MF), while Level 2 terms are below a Level 1 term. 2: Number of genes selected for the analysis, i.e., those ascribed to the parent GO term that are also included in one of the selected child GO terms. 3: Genes among those in the previous column that contain ORFs and therefore encode proteins. 4: Number of products of the selected ORFs for which interactions were compiled in the DIP database. 5: Same as 4, but for the GOLD dataset.
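The footnote does not define the two percentage columns. Dividing the DIP and GOLD counts by the ORF count reproduces the printed values to within about 0.1 percentage points, so the short check below assumes that denominator. The three rows are copied from the table; the helper name `coverage` is only illustrative.

```python
# (ORFs, DIP count, GOLD count) copied from three rows of the table above.
rows = {
    "Developmental process": (757, 632, 257),
    "Ribonucleoprotein complex": (459, 318, 96),
    "Transporter activity": (377, 297, 63),
}


def coverage(count, orfs):
    """Percentage of ORF products with interaction data (assumed definition)."""
    return round(100.0 * count / orfs, 1)


coverage_by_term = {
    term: (coverage(dip, orfs), coverage(gold, orfs))
    for term, (orfs, dip, gold) in rows.items()
}

for term, (dip_pct, gold_pct) in coverage_by_term.items():
    print(f"{term}: DIP {dip_pct}%, GOLD {gold_pct}%")
```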
Summary of the GO terms used in this study.
| GO term (identifier) | DIP: N (P) | GOLD: N (P) |
|---|---|---|
| **Developmental process (32502)** | | |
| Reproductive developmental process (3006) | 26 (0) | 13 (0) |
| Anatomical structure development (48856) | 186 (15) | 94 (8) |
| Cellular developmental process (48869) | 450 (1) | 169 (0) |
| Aging (7568) | 40 (0) | 22 (0) |
| **Reproduction (3)** | | |
| Sexual reproduction (19953) | 95 (0) | 41 (0) |
| Asexual reproduction (19954) | 74 (6) | 44 (4) |
| Reproductive process (22414) | 207 (7) | 88 (4) |
| Rep. of a single-celled organism (32505) | 220 (7) | 99 (4) |
| **Establishment of cellular localization (51649)** | | |
| Secretion by cell (32940) | 206 (9) | 84 (3) |
| Establishment of nucleus localization (40023) | 17 (0) | --- |
| Intracellular transport (46907) | 409 (21) | 175 (10) |
| **Response to stimulus (50896)** | | |
| Response to endogenous stimulus (9719) | 197 (3) | 101 (0) |
| Cellular response to stimulus (51716) | 13 (0) | --- |
| Response to abiotic stimulus (9628) | 83 (0) | 32 (0) |
| Response to external stimulus (9605) | 27 (0) | 13 (0) |
| Response to biotic stimulus (6907) | 19 (0) | --- |
| Response to chemical stimulus (42221) | 212 (0) | 65 (0) |
| Response to stress (6950) | 370 (3) | 159 (0) |
| **Ribonucleoprotein complex (30529)** | | |
| Small nuclear ribonucleoprotein complex (30532) | 58 (2) | 24 (0) |
| Preribosome (30684) | 12 (4) | --- |
| Spliceosome (5681) | 74 (12) | 33 (2) |
| Small nucleolar ribonucleoprotein complex (5732) | 49 (43) | 10 (9) |
| Ribosome (5840) | 156 (5) | 45 (1) |
| Polysome (5844) | 11 (0) | --- |
| **Organelle envelope (31967)** | | |
| Organelle inner membrane (19866) | 105 (8) | 27 (2) |
| Organelle outer membrane (31968) | 24 (0) | --- |
| Organelle envelope lumen (31970) | 25 (0) | --- |
| Nuclear envelope (5635) | 86 (3) | 35 (0) |
| Mitochondrial envelope (5740) | 148 (9) | 34 (2) |
| **Transcription regulator activity (30528)** | | |
| Transcriptional activator activity (16563) | 50 (0) | 24 (0) |
| Transcriptional repressor activity (16564) | 35 (2) | 13 (1) |
| Transcription factor activity (3700) | 45 (2) | 13 (1) |
| RNA polymerase II transcription factor activity (3702) | 112 (4) | 44 (1) |
| Transcriptional elongation regulator activity (3711) | 14 (6) | --- |
| Transcription cofactor activity (3712) | 36 (1) | 16 (0) |
| **Structural molecule activity (5198)** | | |
| Structural constituent of ribosome (3735) | 115 (0) | 21 (0) |
| Structural constituent of cytoskeleton (5200) | 50 (29) | 31 (18) |
| **Transporter activity (5215)** | | |
| Ion transport activity (15075) | 111 (5) | 16 (0) |
| Carbohydrate transporter activity (15144) | 26 (0) | --- |
| ATPase activity, coupled to movement of substances (43492) | 41 (2) | --- |
| Amine transporter activity (5275) | 27 (0) | --- |
| Organic acid transporter activity (5342) | 32 (0) | --- |
| Carrier activity (5386) | 67 (0) | 13 (0) |
| Intracellular transporter activity (5478) | 28 (0) | 17 (0) |
| Protein transporter activity (8565) | 48 (1) | 29 (1) |
| Lipid transporter activity (5319) | 11 (2) | --- |
Results for both the DIP and GOLD datasets are indicated. Parent GO terms are indicated in bold and, below them, the child GO terms are detailed. The numbers in parentheses adjacent to the names are the numerical identifiers of the GO terms. N: number of proteins for which we obtained PPI data and whose genes were annotated to the GO term. (P): in parentheses, number of proteins among those N that are annotated with the GO term based exclusively on PPI evidence. Child GO terms with fewer than 10 proteins in the GOLD dataset were not examined further (dashes).
General results for the parent GO terms. Analyses using the DIP dataset.
| Developmental process (32502) | 63.6% (402/632) | 62.2% | 13.0% (74/570) | 0.46 ± 0.02 |
| Reproduction (3) | 58.4% (142/245) | 94.1% | 0% (0/25) | 0.38 ± 0.11 |
| Establishment of cellular localization (51649) | 66.8% (302/452) | 88.4% | 1.1% (3/264) | 0.43 ± 0.10 |
| Response to stimulus (50896) | 56.4% (290/514) | 77.5% | 19.5% (32/164) | 0.46 ± 0.05 |
| Ribonucleoprotein complex (30529) | 59.7% (190/318) | 77.8% | 12.8% (31/242) | 0.64 ± 0.06 |
| Organelle envelope (31967) | 39.6% (91/230) | 84.9% | 1.2% (1/83) | 0.47 ± 0.09 |
| Transcription regulator activity (30528) | 43.5% (120/276) | 67.6% | 15.0% (30/200) | 0.40 ± 0.08 |
| Structural molecule activity (5198) | 39.8% (92/231) | 95.6% | 0% (0/165) | 0.53 |
| Transporter activity (5215) | 33.7% (100/297) | 72.5% | 6.4% (12/186) | 0.43 ± 0.06 |
Figure 2. Hierarchical representation of the protein interaction network for the Ribonucleoprotein complex GO term. On the left, tree based on secondary distances. The tree on the right is shown to make the topology easier to visualize. At the bottom, "Unconnected proteins" are those with no direct interactions, which are separated from the rest by UVCLUSTER. Numbers refer to different clusters found for the same child GO term, which are shown again in Figure 3. snoRNP complex: Small nucleolar ribonucleoprotein complex; snRNP complex: Small nuclear ribonucleoprotein complex. NMD: nonsense-mediated mRNA decay. LSM: like-SM protein complex.
Figure 3. Ribonucleoprotein complex protein interaction network. All the proteins (dots) in this parent GO term that have at least one direct connection are shown. Colors refer to the child GO terms to which the proteins are annotated. White dots are proteins that do not belong to any of the analyzed child GO terms. The clusters detected in our analyses are framed with colored polygons. Color codes and cluster numbers as in Figure 2.
General results for the parent GO terms. Analyses using the GOLD dataset.
| Developmental process (32502) | 83.3% (214/257) | 82.0% | 7.2% (16/222) | 0.51 ± 0.06 |
| Reproduction (3) | 96.4% (107/111) | 82.5% | 8.3% (1/12) | 0.45 ± 0.03 |
| Establishment of cellular localization (51649) | 86.7% (163/188) | 76.8% | 46.2% (49/106) | 0.37 ± 0.02 |
| Response to stimulus (50896) | 78.3% (162/207) | 73.2% | 32.1% (18/56) | 0.48 ± 0.07 |
| Ribonucleoprotein complex (30529) | 82.3% (79/96) | 70.7% | 56.2% (41/73) | 0.72 ± 0.03 |
| Organelle envelope (31967) | 87.0% (60/69) | 79.5% | 26.5% (9/34) | 0.70 ± 0.05 |
| Transcription regulator activity (30528) | 39.3% (42/107) | 64.7% | 33.8% (26/77) | 0.42 ± 0.03 |
| Structural molecule activity (5198) | 69.3% (52/75) | 68.8% | 42.3% (22/52) | 0.91 |
| Transporter activity (5215) | 87.3% (55/63) | 93.6% | 0.0% (0/50) | 0.63 ± 0.13 |
Figure 4. Interactome-based structure of the GO term Ribonucleoprotein complex. For simplicity, significant clusters of size < 5 are omitted. This eliminates the term Polysome, for which only one cluster of size = 3 was found.
Figure 5. GO terms for which a significant enrichment was found for proteins in the clusters detected when analyzing the . Notice how this structure, taken directly from the GO, differs from that shown in Figure 4. Numbers refer to the five clusters also shown in the other figures (1: Translation initiation factors; 2: Ribosome stalk; 3: Large mitochondrial subunit; 4: Elongation factors; 5: Small mitochondrial subunit).
Figure 6. GO terms for which a significant enrichment was found for proteins in the clusters detected for the child GO terms . The names below the boxes refer to the child GO terms from which the clusters of proteins detected as significant derive. Notice the obvious overlap due to many proteins belonging to two or even all three child GO terms.
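Figures 5 and 6 both rest on testing whether a GO term is over-represented among the proteins of a cluster. The test actually used by TreeTracker is not described in this section; a common choice for this kind of question is the one-sided hypergeometric test, and the sketch below assumes it. The function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, cluster, hits):
    """P(X >= hits) for a cluster drawn without replacement.

    population: proteins in the tree
    annotated:  proteins in the tree that carry the GO term
    cluster:    proteins in the cluster being tested
    hits:       cluster proteins that carry the GO term
    """
    upper = min(cluster, annotated)
    total = comb(population, cluster)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 proteins in the tree, 5 of them annotated,
# and a cluster of 4 proteins containing 3 annotated ones.
p_value = hypergeom_enrichment_pvalue(population=20, annotated=5, cluster=4, hits=3)
print(f"enrichment p-value: {p_value:.4f}")
```

When many clusters and many GO terms are tested, a multiple-testing correction would normally be applied on top of the raw p-values; that step is omitted here.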