Björn Titz, Seesandra V Rajagopala, Johannes Goll, Roman Häuser, Matthew T McKevitt, Timothy Palzkill, Peter Uetz.
Abstract
Protein interaction networks shed light on the global organization of proteomes but can also place individual proteins into a functional context. If we know the function of bacterial proteins we will be able to understand how these species have adapted to diverse environments including many extreme habitats. Here we present the protein interaction network for the syphilis spirochete Treponema pallidum, which encodes 1,039 proteins, 726 (or 70%) of which interact via 3,649 interactions as revealed by systematic yeast two-hybrid screens. A high-confidence subset of 991 interactions links 576 proteins. To derive further biological insights from our data, we constructed an integrated network of proteins involved in DNA metabolism. Combining our data with additional evidence, we provide improved annotations for at least 18 proteins (including TP0004, TP0050, and TP0183, which are suggested to be involved in DNA metabolism). We estimate that this "minimal" bacterium contains on the order of 3,000 protein interactions. Profiles of functional interconnections indicate that bacterial proteins interact more promiscuously than eukaryotic proteins, reflecting the non-compartmentalized structure of the bacterial cell. Using our high-confidence interactions, we also predict 417,329 homologous interactions ("interologs") for 372 completely sequenced genomes and provide evidence that at least one third of them can be experimentally confirmed.
Year: 2008 PMID: 18509523 PMCID: PMC2386257 DOI: 10.1371/journal.pone.0002292
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The protein interaction network of T. pallidum.
A. High-confidence protein interaction network (TPA HCI 0.5) including 576 proteins and 991 interactions. Nodes are color-coded according to TIGR main roles. Links are color-coded based on their logistic regression score (indicated as spectral scale). Proteins involved in DNA metabolism (Figure 4) are shown as enlarged red circles. Note their distributed topology. See Table S1 for all interactions and scores. B. Comparison of the approximated power-law degree distributions of the T. pallidum networks. Node degrees k and their relative frequency P(k) are plotted on a bilogarithmic scale and fitted by linear regression. “TPA all”, “TPA 50”, and “TPA HCI” are the complete T. pallidum network and sub-networks filtered by “preycount” or logistic regression, respectively. The inset shows the node degree distribution of the high-confidence T. pallidum network (TPA HCI 0.5) on a linear scale.
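The log-log fit described for panel B can be illustrated with a short sketch. This is a minimal Python example under stated assumptions, not the authors' procedure or the code of any Cytoscape plugin: the function names `degree_distribution` and `fit_power_law` are hypothetical, the input is assumed to be an undirected edge list, and the fit is an ordinary least-squares line through the points (log10 k, log10 P(k)).

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Return {k: P(k)} for an undirected edge list.

    Self-links and duplicate (including reciprocal) edges are ignored.
    """
    pairs = {frozenset(e) for e in edges if e[0] != e[1]}
    degree = Counter()
    for pair in pairs:
        for node in pair:
            degree[node] += 1
    counts = Counter(degree.values())
    n_nodes = len(degree)
    return {k: c / n_nodes for k, c in counts.items()}


def fit_power_law(pk):
    """Fit log10 P(k) = intercept - gamma * log10 k by least squares.

    Returns (gamma, r_squared). Needs at least two distinct degrees.
    """
    if len(pk) < 2:
        raise ValueError("need at least two distinct degrees")
    xs = [math.log10(k) for k in pk]
    ys = [math.log10(p) for p in pk.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return -slope, r_squared
```

Calling `fit_power_law(degree_distribution(edges))` on an interaction list would give an exponent and R² comparable in kind to the "Power coefficient" and "R2" rows of the topology table, although the exact values depend on binning and filtering choices that are not specified here.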
Topological properties of presented interaction networks.
| | All | TP50 | HCI0.3 | HCI0.5 | HCI0.7 |
| Filtering: in degree | - | <50 | - | - | - |
| Filtering: log regr. score | - | - | >0.3 | >0.5 | >0.7 |
| False negative rate (1−sensitivity) | - | - | 18% | 20% | 50% |
| False positive rate (1−specificity) | - | - | 52% | 28% | 12% |
| proteins | 726 | 601 | 640 | 576 | 422 |
| Interactions in directed networks | 3684 | 1634 | 1628 | 992 | 414 |
| Av. Node degree | 10 | 5.4 | 5 | 3.4 | 1.9 |
| Av. Shortest path length | 2.95 | 3.88 | 3.95 | 4.73 | 8.08 |
| Power coefficient | 1.15 | 1.47 | 1.54 | 1.71 | 2.35 |
| R2 | 0.85 | 0.91 | 0.91 | 0.87 | 0.94 |
Interaction counts for directed networks include reciprocal interactions.
Topological parameters for T. pallidum protein datasets and corresponding networks were calculated using the NetAnalyzer plugin for Cytoscape (http://med.bioinf.mpi-inf.mpg.de/netanalyzer/) for the whole network (“all”), the network filtered by in-degree (“TPA 50”), and the networks filtered by logistic regression score (“HCI 0.3” to “HCI 0.7”). In addition, the false negative and false positive rates after 10-fold cross-validation are given for the datasets filtered by logistic regression.
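As a rough check, the tabulated average node degree matches 2E/N computed from the undirected counts given in the abstract: 2 × 991 / 576 ≈ 3.4 for HCI 0.5 and 2 × 3,649 / 726 ≈ 10 for the complete network. The sketch below shows one way to compute this value and the average shortest path length with a breadth-first search. It is an illustration only, not the NetAnalyzer implementation; the function names are hypothetical, and it assumes that the network is treated as undirected and that unreachable pairs are left out of the path-length average.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets; self-links are skipped."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def average_degree(adj):
    """Mean number of distinct neighbours per node (equals 2E/N)."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def average_shortest_path(adj):
    """Mean BFS distance over all ordered pairs of connected nodes."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs
```

The breadth-first search is run once per node, which is adequate for networks of a few hundred proteins such as those in the table. An empty edge list is not handled and would raise a `ZeroDivisionError`.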
Figure 4. An expanded view on DNA metabolism.
A. The DNA metabolism network for T. pallidum based on the integration of several experimental and bioinformatical data sets (see methods). T. pallidum proteins with a DNA metabolism related function (red nodes) are linked by interactions from several high-confidence protein interaction datasets. The color of the interactions indicates their source (see color key), e.g., all blue interactions were identified in our study (i.e. in T. pallidum) and are part of the high-confidence interaction dataset for T. pallidum (for a detailed list see Table S3b). Proteins of other functional classes are included when their association is supported by at least one additional piece of evidence. Grey lines indicate support of an interaction by bioinformatical predictions (String database score > 0.4). Proteins with orange borders have been shown to localize to the nucleoid. Proteins with a hexagonal shape have a tight bioinformatical link to a DNA metabolism protein (String database score > 0.8). Proteins that are discussed in the text are shown in larger, blue font. B. Co-immunoprecipitation (coIP) experiments for a number of selected DNA metabolism interactions are shown (thick lines in network). The coIP is conducted with an anti-Myc antibody. For each coIP, the total input and the fractions after coIP are analyzed by Western blot, probing with an anti-HA and an anti-Myc antibody as indicated on the left of each blot. The empty Myc-tag vector “M-” is used as a control for unspecific binding of the HA-tagged protein. HA-tagged proteins are labeled with “H” and their gene name or gene number, e.g. “H4” in the first coIP corresponds to HA-tagged protein TP0004. Accordingly, Myc-tagged proteins are labeled with “M”, e.g. M-gyrB corresponds to Myc-tagged GyrB protein.
Figure 2. Genomic locations linked by protein interactions.
A, B. Certain genomic locations are especially tightly linked via protein interactions when compared to randomized networks. Genomic location links are visualized for the “TPA 50” protein interaction dataset (A) and for bioinformatical associations from the String database (B, “StringDB 700”, protein links with combined score > 0.7) [18]. Grey lines indicate all individual protein interactions/associations connecting genes on the circular chromosome of T. pallidum (1.14 Mbp total size). Tightly connected clusters comprising 5 or more neighbouring genes were identified (thick lines) by a computational method based on comparison to re-wired versions of the network (see methods). The number of linking interactions between two clusters is indicated by the color of their connecting line, and the enrichment compared to randomly re-wired networks is indicated by a Z-value (in the outer circle at the positions of the clusters). Because the String database incorporates genomic neighbourhood links (and for clarity), self-links between genomic locations are removed in the “StringDB 700” representation. C. The region flanking FliS (TP0943) is, for example, connected to the region of TP0046–TP0048, linking motility and sugar metabolism (TP0943–TP0946) to a cluster of uncharacterized proteins around TP0047, which appears to be involved in motility as well [14].
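The comparison to re-wired networks can be illustrated with a generic enrichment Z-value. The caption does not specify the authors' method beyond a reference to their methods section, so the following Python sketch makes several assumptions: the null model is a degree-preserving rewiring by double edge swaps, the input edge list contains no duplicates or self-links, and the Z-value is (observed − mean) / standard deviation over the rewired networks. All function names and parameters are hypothetical.

```python
import random
import statistics


def count_links(edges, cluster_a, cluster_b):
    """Number of edges with one end in cluster_a and the other in cluster_b."""
    return sum(
        1
        for u, v in edges
        if (u in cluster_a and v in cluster_b) or (u in cluster_b and v in cluster_a)
    )


def rewire(edges, n_swaps, rng):
    """Degree-preserving rewiring: swap (a, b), (c, d) into (a, d), (c, b).

    Swaps that would create a self-link or a duplicate edge are skipped.
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def z_value(observed, null_counts):
    """Z-value of the observed count against a list of null counts."""
    sd = statistics.pstdev(null_counts)
    if sd == 0:
        return None  # undefined when the null distribution has no spread
    return (observed - statistics.mean(null_counts)) / sd


def enrichment_z(edges, cluster_a, cluster_b, n_random=200, seed=0):
    """Z-value for links between two gene clusters versus rewired networks."""
    rng = random.Random(seed)
    observed = count_links(edges, cluster_a, cluster_b)
    null = [
        count_links(rewire(edges, 10 * len(edges), rng), cluster_a, cluster_b)
        for _ in range(n_random)
    ]
    return z_value(observed, null)
```

Here `cluster_a` and `cluster_b` would be sets of neighbouring gene identifiers. The number of swaps per rewiring (ten per edge) and the number of random networks are arbitrary illustrative choices.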
Figure 3. Interactions between functional classes in pro- and eukaryotes.
Connections between functional classes mediated by protein interactions in two Y2H datasets (TPA HCI = T. pallidum high-confidence interactions, CJE HCI = Campylobacter jejuni high-confidence interactions) and two comprehensive coAP/MS datasets from E. coli [5] and yeast [3]. For each data set and each class combination, a functional class association index (fCAI) was calculated (see methods), which scores the interaction density between two functional classes in a dataset of given size and class coverage. The matrices show the significance of each enriched functional class link (see color key). Results obtained from genome-wide Y2H (top) or coAP/MS (bottom) experiments are compared between bacteria and yeast (see color key).
Novel functional assignments based on protein network and additional evidence.
| Gene | Novel Function | Evidence |
| TP0004 | Gyrase associated protein | GT (gyrase, gyrA); PI (gyrase, gyrB) |
| TP0050 | DNA replication, nucleotide metabolism | PI (dnaB), DOM (phosphoribosyl transferase) |
| TP0064, TP0066, TP0067, TP0068 | Operon involved in DNA metabolism (+ cell division) | PI (DNA metabolism + cell division); GBAA (TP0066, cell division/chromosome partitioning); DOM (TP0065, DNA methylase); HOM (TP0067, putative cell div. protein) |
| TP0183 | DNA metabolism | GBAA (DNA metabolism); PI (dnaA, sbcD, DNA repair helicase) |
| TP0297 | Cell wall metabolism | PI (capsular polysaccharide biosynthesis protein); DOM (SPOR = involved in peptidoglycan binding) |
| TP0320 (a) | dsDNA and nucleotide uptake | PI (ribulose-3-P-epimerase & exonuclease for dsDNA); GT (TP0319, TP0322, and TP0323 [rib/gal transporter]) |
| TP0443 | DNA metabolism and/or repair | PI (gidB (tRNA methyltransferase), recX); DOM (DALR anticodon binding domain); GT (recN) |
| TP0496 | Membrane protein involved in translation and cell division | PI (tRNA-synthetases, DNA primase); GBAA (translation); GT (rod-shape determining proteins) |
| TP0526 (b) | transcription termination/antitermination | PI (nusA) |
| TP0561 (c) | Membrane protein chaperone | PI (with membrane proteins), DOM (SsgA, sporulation, cell division) |
| TP0580 (e) | ABC transporter, polysaccharide (antigen) synthesis (dTMP sugar) | PI (uridylate kinase) (enzyme complex); DOM (GtrA): generation of sugar building block |
| TP0650 | Translation | GBAA (translation); GT (tRNA-synthetases); PI (peptide deformylase; ribosomal protein L32) |
| TP0658 (f) | Flagellar assembly factor fliW | PI (flagellin); GT (motility) |
| TP0772 | Transcription Regulator | PI (RNA-polymerase, TP0701); HOM (LysR family transcriptional regulator, KEGG, SW-Score 122) |
| TP0920 | Energy production | GBAA (energy); GT (Oxidoreductase, TP0921) |
| TP0941 | Regulation of motility | GBAA (signal transduction); PI (FlgM); GT (FliS, FlgN) |
| TP0963 (d) | ABC transporter, membrane biogenesis | PI (TP0965); DOM (FtsX); GT (ABC transp., lipoprotein metabolism) |
All proteins in this table are currently annotated as (conserved) hypothetical [33]. Evidence codes: PI (protein interaction), GT (genomic context), DOM (protein domain), GLL (genomic loci link), GBAA (guilt-by-association approach), HOM (homology). Notes and references: (a) TP0319 is a purine nucleotide receptor and its whole operon is probably involved in nucleotide import [41]; (b) ATP-dependent helicase (HrpA); (c) SsgA-like proteins play a chaperonin-like role [42]; (d) transporter complex with TP0965 (HlyD motif); (e) see ref. [43]; (f) see ref. [34].
Figure 5. Interacting clusters of orthologous groups (“iCOG”) show phylogenetically conserved interaction patterns.
Each row of the shown profile corresponds to a species and each column corresponds to a pair of interacting protein families (i.e. iCOG), for which an interaction was found in the high-confidence T. pallidum data set. The protein families were defined based on the “cluster of orthologous genes” approach (COG) (see methods). With this, the profile shows for each interaction of the T. pallidum data set whether both interacting proteins, only one interacting protein or none of the interacting proteins are conserved in a given species (given row). For each species from the shown taxonomy (y-axis) and each iCOG, a conservation value is shown in the matrix. This conservation value indicates whether both COGs are conserved/absent in a given species or whether only one or the other COG is conserved (see left upper corner for color key). Overall, three distinct conservation regions are visible in the clustered matrix: #1, #2, and region #3-#6, which we subdivided somewhat arbitrarily into individual clusters #3-#6 with increasing conservation from left to right (note branches on tree above). This figure is also available as zoomable Figure S1 in PDF format in which individual species names and iCOGs can be seen.
Figure 6. Prediction of interactions for other species based on T. pallidum high-confidence data sets.
Species (y-axis) are ordered according to taxonomy (broad groups are indicated) and the number of predicted interactions for each species based on two confidence score cut-offs (HCI 0.5 with score>0.5 and HCI 0.7 with score>0.7) is shown.
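The idea of transferring interactions to another species at a chosen score cut-off can be sketched as follows. This is a simplified Python illustration, not the authors' prediction pipeline: it assumes that orthology is given as a mapping from proteins to protein-family identifiers (for example COGs), that an interaction is transferred to every pair of target proteins belonging to the two families, and that only interactions with a score above the cut-off are used. The function name, argument names, and the identifiers in the usage example are hypothetical.

```python
def predict_interologs(interactions, source_family, target_family, cutoff=0.5):
    """Predict interactions in a target species from scored source interactions.

    interactions:   iterable of (protein_a, protein_b, score) in the source species
    source_family:  dict mapping source protein -> family id (e.g. a COG)
    target_family:  dict mapping target protein -> family id
    cutoff:         only interactions with score > cutoff are transferred
    Returns a set of frozensets, each holding two target proteins.
    """
    # Index target proteins by family for quick lookup.
    members = {}
    for protein, family in target_family.items():
        members.setdefault(family, set()).add(protein)

    predicted = set()
    for a, b, score in interactions:
        if score <= cutoff:
            continue
        fam_a, fam_b = source_family.get(a), source_family.get(b)
        if fam_a is None or fam_b is None:
            continue  # no family assignment, nothing to transfer
        for x in members.get(fam_a, ()):
            for y in members.get(fam_b, ()):
                if x != y:
                    predicted.add(frozenset((x, y)))
    return predicted


# Usage with made-up identifiers:
source_family = {"srcA": "FAM_X", "srcB": "FAM_Y", "srcC": "FAM_Z"}
target_family = {"tgtA": "FAM_X", "tgtB": "FAM_Y", "tgtB2": "FAM_Y"}
interactions = [("srcA", "srcB", 0.8), ("srcA", "srcC", 0.9), ("srcB", "srcC", 0.3)]
loose = predict_interologs(interactions, source_family, target_family, cutoff=0.5)
strict = predict_interologs(interactions, source_family, target_family, cutoff=0.85)
```

Raising the cut-off shrinks the set of transferred interactions, which mirrors the two series (score > 0.5 and score > 0.7) shown per species in the figure.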