Thomas R Burkard, Melanie Planyavsky, Ines Kaupe, Florian P Breitwieser, Tilmann Bürckstümmer, Keiryn L Bennett, Giulio Superti-Furga, Jacques Colinge.
Abstract
BACKGROUND: On the basis of large proteomics datasets measured from seven human cell lines we consider their intersection as an approximation of the human central proteome, which is the set of proteins ubiquitously expressed in all human cells. Composition and properties of the central proteome are investigated through bioinformatics analyses.
Year: 2011 PMID: 21269460 PMCID: PMC3039570 DOI: 10.1186/1752-0509-5-17
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Table 1. Number of protein groups and distinct peptides identified in the proteomics data.
| Cell line | Protein groups | Protein groups incl. splice variants (a) | Isoforms with a specific peptide (b) | Distinct peptides | Germ layer |
|---|---|---|---|---|---|
| HaCat | 2031 | 2673 | 13 | 29040 | Ectoderm |
| HEK293 | 4154 | 5412 | 64 | 71571 | Mesoderm |
| HeLa | 2379 | 3075 | 31 | 31609 | Mesoderm |
| HepG2 | 2494 | 3298 | 24 | 30194 | Endoderm |
| K562 | 3141 | 4078 | 37 | 48202 | Mesoderm |
| Namalwa | 2686 | 3527 | 29 | 37512 | Mesoderm |
| U937 | 2073 | 2720 | 25 | 28786 | Mesoderm |
(a) Number of protein group reporters in case alternative splice variants are counted as well because they have all the peptides of the protein group, although they do not necessarily have a specific peptide detected. (b) Isoforms supported by a specific peptide.
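The abstract defines the central proteome as the intersection of the seven cell-line datasets. A minimal sketch of that operation is shown here, assuming each dataset is available as a set of protein identifiers. The identifiers (`P1` to `P4`) and the function name `central_proteome` are hypothetical and are not taken from the source data.

```python
def central_proteome(datasets):
    """Return the proteins identified in every cell line.

    `datasets` maps a cell-line name to an iterable of protein identifiers.
    """
    protein_sets = [set(proteins) for proteins in datasets.values()]
    if not protein_sets:
        return set()
    # Intersection over all cell lines: a protein must be seen in each one.
    return set.intersection(*protein_sets)


# Hypothetical toy identifications (not the published data).
toy = {
    "HEK293": {"P1", "P2", "P3", "P4"},
    "HeLa": {"P1", "P2", "P4"},
    "K562": {"P1", "P3", "P4"},
}
core = central_proteome(toy)
print(sorted(core))
```

Because an intersection can only shrink as datasets are added, the cell line with the fewest identifications bounds the size of the result.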
Table 2. Gene ontology terms (biological process) found significant at the 5% level in the central proteome.
| GO ID | GO Term | P-value | Coverage (a) |
|---|---|---|---|
| GO:0006412 | translation | 1.00E-30 | 0.37 |
| GO:0016043 | cellular component organization and biogenesis | 1.40E-29 | 0.13 |
| GO:0015031 | protein transport | 1.90E-25 | 0.19 |
| GO:0043170 | macromolecule metabolic process | 5.10E-23 | 0.14 |
| GO:0009056 | catabolic process | 3.60E-19 | 0.15 |
| GO:0044238 | primary metabolic process | 1.00E-16 | 0.10 |
| GO:0006091 | generation of precursor metabolites and energy | 2.90E-14 | 0.25 |
| GO:0007049 | cell cycle | 5.10E-12 | 0.14 |
| GO:0006259 | DNA metabolic process | 4.30E-07 | 0.13 |
| GO:0009058 | biosynthetic process | 7.50E-06 | 0.20 |
| GO:0009719 | response to endogenous stimulus | 6.10E-04 | 0.12 |
| GO:0008219 | cell death | 8.90E-04 | 0.10 |
| GO:0019725 | cellular homeostasis | 2.95E-03 | 0.12 |
| GO:0006950 | response to stress | 4.30E-03 | 0.10 |
(a) Coverage indicates the proportion of proteins annotated in Swiss-Prot with the term that are found in C.Prot.
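The coverage column follows directly from the footnote's definition. The record does not say which statistical test produced the P-values, so the sketch below assumes a one-sided hypergeometric test purely for illustration. All names and the toy numbers are hypothetical.

```python
from math import comb


def coverage(term_proteins, central):
    """Fraction of the proteins annotated with a term that are in the central proteome."""
    return len(term_proteins & central) / len(term_proteins)


def hypergeom_enrichment_p(k, n_central, n_term, n_background):
    """P(X >= k) when drawing n_central proteins from n_background,
    of which n_term carry the GO term (assumed test, not confirmed by the source)."""
    upper = min(n_term, n_central)
    total = comb(n_background, n_central)
    tail = sum(
        comb(n_term, i) * comb(n_background - n_term, n_central - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Hypothetical toy example: 10 background proteins, 4 annotated, 5 in the central set.
term = {"A", "B", "C", "D"}
central = {"A", "B", "X", "Y", "Z"}
cov = coverage(term, central)
p_all_four = hypergeom_enrichment_p(4, 5, 4, 10)
print(cov, p_all_four)
```

In practice a multiple-testing correction would normally be applied across GO terms. Whether and how the authors did so is not stated in this record.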
Figure 1. Network and pathway statistics. (A) Node degree (number of edges). Note the strong shift of C.Prot towards higher values. We also observe no shift for the tissue-specific genes (Spe.Trans) and a gradual shift from low-abundance C.Prot entities to high-abundance ones. (B) Eigenvector centrality values display similar shifts, although in this case Spe.Trans even reverses the trend and the differences between low- and high-abundance C.Prot are more modest. (C) Relative positions in pathways; 0 = beginning, 1 = end. There is no real bias for C.Prot as a whole, but its abundant proteins show a strong preference for central positions. Spe.Trans and low-abundance C.Prot are spread more evenly over all possible positions. (D) The same for drug targets. Note the strong shift towards initial positions for C.Prot drug targets, which significantly amplifies the preference for such positions that drug targets already show.
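Panels A and B rest on two standard node statistics, degree and eigenvector centrality. The sketch below computes both on a toy undirected graph stored as an adjacency dictionary. The graph and node names are hypothetical. The power iteration multiplies by A + I rather than A, which keeps the same eigenvectors while avoiding oscillation on bipartite graphs. This is an illustrative choice, not the authors' implementation.

```python
def degrees(adj):
    """Node degree = number of neighbours in an undirected adjacency dict."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration on (A + I), normalised to unit Euclidean length."""
    x = {node: 1.0 for node in adj}
    for _ in range(iterations):
        # One multiplication by (A + I): own value plus the neighbours' values.
        new = {node: x[node] + sum(x[n] for n in adj[node]) for node in adj}
        norm = sum(v * v for v in new.values()) ** 0.5
        new = {node: v / norm for node, v in new.items()}
        if max(abs(new[n] - x[n]) for n in adj) < tol:
            return new
        x = new
    return x


# Hypothetical toy interactome: a hub with three partners, two of which interact.
toy_graph = {
    "hub": {"p1", "p2", "p3"},
    "p1": {"hub", "p2"},
    "p2": {"hub", "p1"},
    "p3": {"hub"},
}
deg = degrees(toy_graph)
cent = eigenvector_centrality(toy_graph)
print(deg)
print({node: round(value, 3) for node, value in cent.items()})
```

Degree only counts direct partners, whereas eigenvector centrality also rewards being connected to well-connected nodes. This is why the two panels can show different trends for the same protein set.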
Figure 2. The central interactome. (A) Shortest path distance distributions. We first remark that C.Prot entities (red) are closer to each other than proteins of the human interactome are (black), i.e. short distances below 4, which is the mean and median distance, are over-represented. Remarkably, C.Prot is also closer than average to the non-C.Prot proteins (orange). The abundant C.Prot proteins are even closer to each other and to the non-C.Prot proteins (cyan and blue). This shows that C.Prot (and its most abundant components) is embedded "uniformly" in the human proteome. (B) Power law distribution of the whole human interactome versus the central interactome. The central interactome is more connected (exponent -1.1) than the whole interactome (exponent -1.8), i.e. the frequency of high node degrees decreases more slowly. (C) Central interactome with mapped significant biological processes (Table 2). Processes not significantly enriched in C.Prot are in black, and multiple GO annotations are depicted by a circle (color chosen randomly) as opposed to a square for a single GO annotation. Shared GO term ancestors at a node were removed to eliminate trivial multiple annotations and stay at the most specific levels. We note that, except for a few, processes are not strongly localized in this network. It does not represent juxtaposed pathways but rather an exchange platform. We also observe that most proteins have multiple GO BP annotations (circular node shape), which de facto establishes additional exchanges between fundamental cellular processes. Finally, we recognized some important complexes: (a) exosome, (b) ubiquinol-cytochrome c reductase, (c) NADH dehydrogenase, (d) oligosaccharyl transferase, (e) proteasome, (f) COPI, (g) ribonucleoprotein/spliceosome, (h) proton-transporting ATP synthase, (i) ribosome, (j) signal recognition particle, (k) cytochrome c oxidase subunits, (l) pyruvate/2-oxoglutarate dehydrogenase complex, (m) prefoldin, (n) condensin, (o) signal peptidase complex, (p) COPII, (q) septin complex. Network visualized with Cytoscape [56].
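Panel B compares power-law exponents of the degree distributions. How the exponents were fitted is not stated in this record. The sketch below assumes the simplest approach, an ordinary least-squares line through the log-log degree histogram, which is known to be a crude estimator. The function name and the synthetic counts are hypothetical.

```python
import math


def fit_power_law_exponent(degree_counts):
    """Least-squares slope of log(count) against log(degree).

    `degree_counts` maps a node degree k to the number (or frequency) of nodes
    with that degree. Entries with k <= 0 or count <= 0 are skipped.
    """
    points = [
        (math.log(k), math.log(c))
        for k, c in degree_counts.items()
        if k > 0 and c > 0
    ]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Synthetic histogram following count = 1000 * k^-1.8 exactly.
toy_counts = {k: 1000.0 * k ** -1.8 for k in range(1, 50)}
exponent = fit_power_law_exponent(toy_counts)
print(round(exponent, 3))
```

A slope closer to zero, such as -1.1 against -1.8, means that high-degree nodes become rare more slowly, which is the sense in which the caption calls the central interactome more connected.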
Figure 3. Inter-biological process exchanges over the central interactome. High-scoring fluxes between biological processes provide us with a means to summarize the main function of the central interactome, a subset of the human interactome that is likely to be expressed in all human cells. In our scoring scheme, high scores represent fluxes that are much more intense than expected from GO term frequencies and protein connectivity, i.e. exchanges significantly favored by protein interactions. GO biological processes are represented as nodes and scores by the edge thickness. (A) Fluxes within the central interactome. The star-like topology with translation (red) at its center shows that most exchanges synchronize other cellular processes with translation. The strongest crosstalk can be observed between translation and GO categories (blue) that contain many members of nucleic acid metabolism (needed for mRNA generation) and complexes such as the signal recognition particle, the coatomer protein complex and the spliceosome. (B) Fluxes between C.Prot proteins and proteins not in C.Prot. As soon as the focus shifts away from the central interactome, translation loses its role as central communicator. Communication between C.Prot and non-C.Prot proteins is less specialized. Also, note the lost interconnectivity of the blue cluster, which reflects reduced activity of the processes mentioned above. (C) This trend is further amplified in the external fluxes between proteins not in C.Prot, which become essentially global and ignore translation.
Figure 4. Variation of drug target GO terms along pathways. Integration of GO biological process (BP) analysis and pathway positions. Proteins at the source (0-0.2), center (0.4-0.6) and end (0.8-1) of pathways in C.Prot, and drug targets restricted to C.Prot, are submitted to GO analysis. All the BP terms with P-values < 0.1% in at least one case are reported, and we see that the general strong reduction for central pathway positions (Figure 1D) is rather uniform over the BPs. The barplots represent the coverage of the GO terms.
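The grouping step in this caption can be sketched as follows, assuming each protein already has a relative pathway position between 0 (beginning) and 1 (end). The bin boundaries come from the caption and are treated as inclusive, which is an assumption. The protein names and positions are hypothetical.

```python
# Position ranges named in the caption; inclusive boundaries are an assumption.
BINS = {"source": (0.0, 0.2), "center": (0.4, 0.6), "end": (0.8, 1.0)}


def pathway_bin(position):
    """Return the bin name for a relative pathway position, or None if it
    falls in one of the gaps (0.2-0.4 or 0.6-0.8) that are not analysed."""
    for name, (low, high) in BINS.items():
        if low <= position <= high:
            return name
    return None


def group_by_bin(positions):
    """Group protein identifiers by pathway bin, dropping unbinned proteins."""
    groups = {name: [] for name in BINS}
    for protein, position in positions.items():
        name = pathway_bin(position)
        if name is not None:
            groups[name].append(protein)
    return groups


# Hypothetical drug targets with relative pathway positions.
toy_positions = {"T1": 0.05, "T2": 0.5, "T3": 0.95, "T4": 0.3}
groups = group_by_bin(toy_positions)
print(groups)
```

Each resulting group would then be passed to a separate GO analysis, so that the coverage of a term can be compared between source, center and end positions.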