Kirill Evlampiev, Hervé Isambert.
Abstract
BACKGROUND: Successive whole genome duplications have recently been firmly established in all major eukaryote kingdoms. Such exponential evolutionary processes must have largely contributed to shaping the topology of protein-protein interaction (PPI) networks by outweighing, in particular, all time-linear network growths modeled so far.
Year: 2007 PMID: 17999763 PMCID: PMC2245809 DOI: 10.1186/1752-0509-1-49
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. Duplicated proteins from the 150 MY old WGD of . Distribution of duplicated (red) and random (black) node pairs versus number of shared partners. Node degree distribution of duplicated proteins (green) and of all proteins of the PPI network (blue).
Figure 2. Model of protein-protein interaction network evolution through whole genome duplication. Whole genome duplication is followed by asymmetric divergence of protein duplicates with random distribution between genome copies (e.g. 1/1' vs 2/2'): "New" duplicates are left essentially free to accumulate neutral mutations, with the likely outcome of becoming nonfunctional and eventually being deleted unless some "new", duplication-derived interactions are selected; "Old" duplicates, on the other hand, are more constrained to conserve "old" interactions already present before duplication. The duplicated network with quadrupled links is graphically rearranged for convenience into old and new network copies (e.g. the 2 and 2' duplicated nodes are swapped here). Links from the duplicated network are then kept with different probabilities γ (0 ≤ γ ≤ 1) reflecting this asymmetric divergence between protein duplicates. An alternative model based on symmetric divergence of protein duplicates and random link "complementation" is illustrated in Fig. S1 and discussed in Supporting Information.
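The duplication-divergence step described in this caption can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `wgd_step` and the three retention probabilities `g_oo`, `g_on` and `g_nn` (for old-old, old-new and new-new links) are hypothetical names introduced here, self-interactions are ignored, and nodes left without links are not pruned.

```python
import random


def wgd_step(edges, n_nodes, g_oo, g_on, g_nn, rng=random):
    """One whole genome duplication followed by asymmetric link divergence.

    Node i (0 <= i < n_nodes) is treated as the "old" copy and
    i + n_nodes as its "new" duplicate (a labeling assumed here).
    Every link (i, j) yields four candidate links in the duplicated
    network; each is kept independently with the given probability.
    """
    kept = set()
    for i, j in edges:
        candidates = [
            ((i, j), g_oo),                      # old-old
            ((i, j + n_nodes), g_on),            # old-new
            ((j, i + n_nodes), g_on),            # old-new
            ((i + n_nodes, j + n_nodes), g_nn),  # new-new
        ]
        for (a, b), prob in candidates:
            if rng.random() < prob:
                kept.add((min(a, b), max(a, b)))
    return kept, 2 * n_nodes


# Usage: ten successive duplications starting from a single link.
# Keeping old-old links always, old-new links with probability gamma and
# new-new links never is one assumed parameter choice; it gives an expected
# 1 + 2 * gamma links per original link at each duplication.
rng = random.Random(0)
edges, n = {(0, 1)}, 2
gamma = 0.26
for _ in range(10):
    edges, n = wgd_step(edges, n, 1.0, gamma, 0.0, rng)
print(n, len(edges))
```

With all three probabilities set to 1 every link is quadrupled, and with only `g_oo = 1` the original network is conserved unchanged, which gives two easy sanity checks for the sketch.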
Figure 3. Analytical and numerical results of PPI network evolution under whole genome duplication. A. Phase diagram for the limit degree distribution as a function of the network exponential growth rate, Γo + Γn, and the asymmetric divergence of gene duplicates, Γo - Γn. In particular, network conservation and scale-free topology are found to be intrinsically linked properties of PPI networks under genome duplication. Colored lines correspond to iso-exponents of the scale-free degree distribution. All other regions of the phase diagram are likely biologically irrelevant (see text). B&C. Comparison with protein direct physical interaction data for Yeast from the BIND [38] and MIPS [39] databases: BIND (August 11, 2005 release), 4576 proteins, 9133 physical interactions, ⟨k⟩ = 3.99, ⟨k²⟩ = 106 (filled symbols) and MIPS (downloaded online April 20, 2006), 4153 proteins, 7417 physical interactions, ⟨k⟩ = 3.57, ⟨k²⟩ = 78.6 (open symbols). Squares correspond to raw data, while circles and triangles are statistically averaged to account for gaps in the connectivity distribution at large k ≥ 20, due to the finite size of the Yeast PPI network. B. One-parameter fit of the connectivity distribution data p_k (corresponding to the "X" mark in A, see text). Numerical connectivity distribution averaged over 10,000 network realizations (central green line). Numerical averages plus or minus two standard deviations (±2σ) are also displayed to show the predicted dispersions (upper and lower green lines). [Raw data (squares) do not fit within the mean ± 2σ curves for large k due to the finite size of the Yeast PPI network.] The fitting parameter γ = 0.26 corresponds to an effective growth rate of 1 + 2γ = 1.52. C. One-parameter fit of the average connectivity of first neighbor proteins g_k [50] (i.e. k·g_k sums the connectivities of the first neighbors of proteins of connectivity k). Numerical predictions averaged over 10,000 network realizations (central blue line). Numerical averages plus or minus two standard deviations are also displayed (upper and lower blue lines). Same fitting parameter value as in B, γ = 0.26. Note that g_k is rescaled by ⟨k²⟩/⟨k⟩ (as ⟨k·g_k⟩ = ⟨k²⟩ holds for each network realization); this rescales large g_k fluctuations between network realizations, due to the divergence of ⟨k²⟩ for p_k ~ k^(-α-1) with 2 > α > 0 for the one-parameter model.
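The summary statistics quoted in this caption can be recomputed from an edge list. The sketch below is an illustration under assumptions, not the authors' analysis code: `degree_stats` is a hypothetical helper, only proteins with at least one interaction are counted, and self-interactions are assumed absent. It returns the mean connectivity, the mean squared connectivity, the connectivity distribution and the average connectivity of first neighbors, and it checks that the average of k times the neighbor connectivity equals the mean squared connectivity on any single network.

```python
from collections import defaultdict


def degree_stats(edges):
    """Return (k_mean, k2_mean, p, g) for an undirected edge list.

    p[k] is the fraction of nodes with connectivity k and g[k] is the
    average connectivity of the first neighbors of such nodes.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    degree = {v: len(nb) for v, nb in neighbors.items()}
    n = len(degree)
    k_mean = sum(degree.values()) / n
    k2_mean = sum(k * k for k in degree.values()) / n

    count = defaultdict(int)   # number of nodes of connectivity k
    nn_sum = defaultdict(int)  # summed neighbor connectivities for those nodes
    for v, k in degree.items():
        count[k] += 1
        nn_sum[k] += sum(degree[u] for u in neighbors[v])
    p = {k: c / n for k, c in count.items()}
    g = {k: nn_sum[k] / (k * count[k]) for k in count}
    return k_mean, k2_mean, p, g


# Usage on a toy star network: one hub linked to three leaves.
k_mean, k2_mean, p, g = degree_stats([(0, 1), (0, 2), (0, 3)])
identity = sum(k * p[k] * g[k] for k in p)
print(k_mean, k2_mean, identity)

# Arithmetic cross-checks of values quoted in the caption:
# mean connectivity is 2 * (interactions) / (proteins).
print(round(2 * 9133 / 4576, 2), round(2 * 7417 / 4153, 2))
# Effective growth rate for the fitted gamma.
print(round(1 + 2 * 0.26, 2))
```

Dividing each `g[k]` by `k2_mean / k_mean` gives the rescaled quantity mentioned at the end of the caption, which for a network without degree correlations would be close to 1 at every k.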
Figure 4. Combining whole genome duplication and domain shuffling of multi-domain proteins. A. Protein-domain interaction network. Nodes now correspond to single binding domains in a protein-domain interaction network (solid lines). Multi-binding-domain proteins are introduced through a new type of link corresponding to covalent peptide bonds between protein domains (black dashed lines). This provides a graphical framework to distinguish mutually exclusive, direct interactions ("XOR") between protein domains from cumulative, indirect interactions ("AND") within multi-protein complexes (red dashed lines). B&C. Comparison with protein direct & indirect interaction data for Yeast from the BIND [38] database (B&C filled symbols, indirect interactions from [75,76]) and Ref [77] (C open symbols, see Supporting Information). Data are statistically averaged as in Fig. 3B&C to account for gaps in connectivities for large k ≥ 20, due to the finite size of the Yeast PPI network. B. Two-parameter fit of both the direct connectivity distribution p_k and the average direct connectivity of first neighbor proteins g_k [50] (see Fig. 3C and text). Numerical predictions are averaged over 1,000 network realizations (central green and blue lines). Numerical averages plus or minus two standard deviations are also displayed to show the predicted dispersions (upper and lower green and blue lines). The two adjusted parameters (γ = 0.1 and λ = 0.3) correspond to a network growth rate of 20% and an average of 1.5 protein-binding sites (domains) per protein. The connectivity distribution of the underlying single-domain network (corresponding to γ = 0.1 and λ = 0.0) is also displayed (brown line) to illustrate its relation to the full multi-domain protein network (see text). C. Two-parameter fit of both the direct & indirect "matrix" connectivity distribution p_k and the average direct & indirect "matrix" connectivity of first neighbor proteins g_k [50] (see text). Same two adjusted parameters (γ = 0.1 and λ = 0.3) as in B, while a selection of indirect interactions is added up to a total of 28,000 direct and indirect interactions (see Supporting Information).