Xiao-Yan Xue¹, Zhou Chen¹, Yue Hu¹, Dan Nie¹, Hui Zhao¹, Xing-Gang Mao²
Abstract
As a model system, Escherichia coli has been used to study various life processes. A dramatic paradigm shift has occurred in recent years, with the study of single proteins moving toward the study of dynamically interacting proteins, especially protein-protein interaction (PPI) networks. However, despite the importance of PPI networks, little is known about the intrinsic nature of the network structure, especially its high-dimensional topological properties. By introducing a general hypergeometric distribution, we reconstruct a statistically reliable combined PPI network of E. coli (E. coli-PPI-Network) from several datasets. Unlike traditional graph analysis, algebraic topology was used to analyze the topological structures of the E. coli-PPI-Network, including high-dimensional cavities and cycles. Random networks with the same numbers of nodes and edges (RandomNet) and scale-free networks with the same degree distribution (RandomNet-SameDD) were generated as controls. We found that the E. coli-PPI-Network has special algebraic topological structures, exhibiting more high-dimensional cavities and cycles than RandomNets or, importantly, RandomNet-SameDD. Based on these results, we defined the degree of involved q-dimensional cycles of proteins (q-DCprotein) in the network, a novel concept that relies on the global structure of the network and differs from traditional node degree or hubs. Finally, the top proteins ranked by 1-DCprotein were identified (such as gmhB, rpoA, rplB, rpsF and yfgB). In conclusion, by introducing mathematical and computational techniques, we discovered novel algebraic topological properties of the E. coli-PPI-Network, which has special high-dimensional cavities and cycles, thereby revealing certain intrinsic rules of information flow underlying bacterial biology.
Keywords: Betti number; Network Science; complex system; drug resistance; higher-order interactions; simplex
Year: 2022 PMID: 35560988 PMCID: PMC9249336 DOI: 10.1002/2211-5463.13437
Source DB: PubMed Journal: FEBS Open Bio ISSN: 2211-5463 Impact factor: 2.792
Fig. 1 Schematic images illustrating the key concepts in algebraic topology and persistent homology. (A) Simplices of different dimensions. (B) Calculation of boundary maps. The boundary map of the Dim 1 simplex is ∂[1,2] = [2] − [1], and that of the Dim 2 simplex is ∂[1,2,3] = [1,2] + [2,3] + [3,1]. Furthermore, the boundary of this boundary vanishes: ∂∂[1,2,3] = ∂[1,2] + ∂[2,3] + ∂[3,1] = [2] − [1] + [3] − [2] + [1] − [3] = 0. (C) Dim 1 cycles (or cavities). There are two classes of Dim 1 cycles: (1) δ1 [1,2,3,4]; (2) δ2 [3,5,7,4] or [3,5,6,7,4]. These last two are equivalent because they enclose the same cavity (yellow part). The two cavities are filled with orange and yellow, respectively. Note that δ3 [5,6,7] is a Dim 2 simplex, not a cycle. (D) Process of persistent homology: an example filtration that starts at value 1 and ends at value 5. Each image represents a filtration step and is assigned a value. At value 1, there are five Dim 0 simplices (points [1], [2], [3], [4], [5]) and four Dim 1 simplices (edges [1,2], [2,3], [4,5], [1,5]). At value 2, one more Dim 1 simplex, [3,4], is added to the complex, forming a Dim 1 cycle (cycle 1); hence the life of this cycle starts at value 2. At value 3, two more Dim 0 simplices ([6], [7]) and three more Dim 1 simplices ([2,6], [3,7], [6,7]) are added, and another Dim 1 cycle (cycle 2) is formed, whose life starts at value 3. At value 4, a Dim 2 simplex δ1 ([3,6,7]) is added, and both Dim 1 cycles persist. At value 5, another Dim 2 simplex δ2 ([2,3,6]) is added, so cycle 2 disappears and its life ends at value 5, whereas cycle 1 still persists. Therefore, there are two Dim 1 cycles in total: the life length of cycle 1 is ∞ − 2 (∞ indicates that the cycle persists beyond the last observed filtration value), while that of cycle 2 is 5 − 3 = 2.
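The boundary calculation in panel (B) can be checked mechanically. The Python sketch below (function names are our own, not from the paper) stores an integer chain as a dictionary from vertex tuples to coefficients and applies the standard alternating-sign formula ∂[v0, …, vk] = Σ (−1)^i [v0, …, v̂i, …, vk], where v̂i means vi is omitted. An unsorted simplex such as [3,1] is first identified with −[1,3], which is how the caption's [3,1] term enters the sum; the final lines confirm that ∂∂ = 0.

```python
from collections import defaultdict


def normalize(simplex):
    """Sort a simplex's vertices and return (sorted_vertices, sign).

    The sign is the parity of the sorting permutation, so an oriented
    simplex such as [3, 1] is identified with -[1, 3].
    """
    verts = list(simplex)
    sign = 1
    # Bubble sort: every adjacent transposition flips the orientation.
    for i in range(len(verts)):
        for j in range(len(verts) - 1 - i):
            if verts[j] > verts[j + 1]:
                verts[j], verts[j + 1] = verts[j + 1], verts[j]
                sign = -sign
    return tuple(verts), sign


def boundary(chain):
    """Boundary of an integer chain given as {simplex: coefficient}.

    Dropping the i-th vertex of [v0, ..., vk] contributes (-1)**i times
    that face; 0-simplices have an empty boundary.
    """
    result = defaultdict(int)
    for simplex, coeff in chain.items():
        verts, sign = normalize(simplex)
        if len(verts) < 2:
            continue
        for i in range(len(verts)):
            face = verts[:i] + verts[i + 1:]
            result[face] += (-1) ** i * sign * coeff
    # Drop terms that cancelled out.
    return {face: c for face, c in result.items() if c != 0}


if __name__ == "__main__":
    print(boundary({(1, 2): 1}))        # the boundary of [1,2]
    print(boundary({(1, 2, 3): 1}))     # the boundary of [1,2,3]
    # The caption's chain [1,2] + [2,3] + [3,1] is a cycle: its boundary is empty (zero).
    print(boundary({(1, 2): 1, (2, 3): 1, (3, 1): 1}))
```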
General hypergeometric distribution analysis of the interactions overlapping among the three datasets. NOL, number of elements with the specified overlap feature.
| Total potential amount | Subsets selected | Number of independent groups | Overlap number | Number of overlapped genes | Mean of NOL distribution | Var of NOL distribution | 95% CI of NOL distribution | False positive (…) |
|---|---|---|---|---|---|---|---|---|
| 6 070 870 | 11 017, 3888, 5993 | 3 | ≥ 3 | 37 | 0.007 | 0.007 | 0–0.38 | 0.00% |
| 6 070 870 | 11 017, 3888, 5993 | 3 | ≥ 2 | 1142 | 21.76 | 21.66 | 0.94–42.57 | < 3.68% |
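The null-model columns can be approximated with a few lines of arithmetic. The Python sketch below is our reconstruction, not the authors' code. It assumes each dataset is an independent, uniformly random subset of the 6 070 870 possible interactions, a number that equals C(3485, 2) and so presumably counts the unordered pairs among 3,485 proteins. It computes the expected number of interactions shared by at least m datasets and approximates the variance binomially. The 95% CI column appears consistent with Chebyshev's inequality (mean ± √20·SD): plugging in the table's own means and variances reproduces 0–0.38 and 0.94–42.57. Under these assumptions the sketch reproduces the table's means (0.007 and 21.76). It also reproduces the false-positive figures once the interval's upper bound is rounded down to a whole number of interactions. For the ≥ 2 row its binomial variance (about 21.76) is slightly larger than the tabulated 21.66, so the authors' exact calculation evidently differs in detail.

```python
import math


def prob_in_at_least(probs, m):
    """P(an element lies in >= m subsets), membership independent with probs."""
    # dist[j] = probability of being in exactly j of the subsets seen so far
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1 - p)   # not in this subset
            nxt[j + 1] += q * p     # in this subset
        dist = nxt
    return sum(dist[m:])


def overlap_null(total, sizes, m):
    """Mean and binomial-approximation variance of the number of elements
    shared by >= m independent random subsets of the given sizes."""
    q = prob_in_at_least([k / total for k in sizes], m)
    return total * q, total * q * (1 - q)


def chebyshev_interval(mean, var, level=0.95):
    """Distribution-free interval from P(|X - mean| >= k*sd) <= 1/k**2."""
    k = 1 / math.sqrt(1 - level)          # k = sqrt(20) for level 0.95
    half = k * math.sqrt(var)
    return max(0.0, mean - half), mean + half


def false_positive_upper(total, sizes, m, observed):
    """Upper bound on the fraction of observed overlaps expected by chance."""
    mean, var = overlap_null(total, sizes, m)
    _, upper = chebyshev_interval(mean, var)
    return math.floor(upper) / observed


if __name__ == "__main__":
    TOTAL = 6_070_870                 # total potential interactions (table)
    SIZES = [11_017, 3_888, 5_993]    # interactions in each dataset (table)
    for m, observed in [(3, 37), (2, 1142)]:
        mean, var = overlap_null(TOTAL, SIZES, m)
        low, high = chebyshev_interval(mean, var)
        fp = false_positive_upper(TOTAL, SIZES, m, observed)
        print(f">= {m}: mean {mean:.3f}, var {var:.3f}, "
              f"95% CI {low:.2f}-{high:.2f}, false positive < {fp:.2%}")
```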
Number of cavities (Betti numbers) in each dimension for the Escherichia coli‐PPI‐Network and the control RandomNet and RandomNet‐SameDD.
| Dimension | E. coli‐PPI‐Network | RandomNet average | RandomNet SD | RandomNet 95% CI | RandomNet‐SameDD average | RandomNet‐SameDD SD | RandomNet‐SameDD 95% CI |
|---|---|---|---|---|---|---|---|
| 0 | 149 | 160.00 | 10.22 | 139.96–180.04 | 315.20 | 4.19 | 306.99–323.41 |
| 1 | 79 | 208.20 | 10.21 | 188.19–228.21 | 31.65 | 3.66 | 24.48–38.82 |
| 2 | 1 | 0.00 | 0.00 | 0.00–0.00 | 0.00 | 0.00 | 0.00–0.00 |
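Betti numbers like those above can be computed from ranks of boundary matrices: β_k = dim C_k − rank ∂_k − rank ∂_(k+1). The Python sketch below assumes the PPI network is turned into its clique (flag) complex, in which every set of k + 1 mutually interacting proteins forms a k-simplex. This is a common construction for network data, but the excerpt does not state it. The sketch also uses GF(2) coefficients and brute-force enumeration, so it is meant for small illustrative graphs rather than a network the size of the E. coli‐PPI‐Network. The function names are our own.

```python
from collections import defaultdict


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are Python ints used as bit vectors."""
    pivots = {}  # leading bit position -> stored row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]  # eliminate the leading bit
    return len(pivots)


def cliques_up_to(adj, max_size):
    """All cliques with at most max_size vertices, each as a sorted tuple."""
    found = []

    def grow(clique, candidates):
        found.append(clique)
        if len(clique) < max_size:
            for v in sorted(candidates):
                # keep only common neighbours larger than v, so each clique appears once
                grow(clique + (v,), {u for u in candidates if u > v and u in adj[v]})

    for v in sorted(adj):
        grow((v,), {u for u in adj[v] if u > v})
    return found


def betti_numbers(nodes, edges, max_dim):
    """b_0..b_max_dim of the clique complex of a simple graph over GF(2).

    Every edge endpoint must appear in `nodes`.
    """
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    simplices = defaultdict(list)  # dimension -> list of simplices
    for clique in cliques_up_to(adj, max_dim + 2):
        simplices[len(clique) - 1].append(clique)
    index = {d: {s: i for i, s in enumerate(ss)} for d, ss in simplices.items()}

    def boundary_rank(d):
        """Rank of the boundary map from d-chains to (d-1)-chains."""
        if d == 0 or not simplices[d]:
            return 0
        rows = []
        for s in simplices[d]:
            bits = 0
            for i in range(len(s)):
                bits |= 1 << index[d - 1][s[:i] + s[i + 1:]]
            rows.append(bits)
        return gf2_rank(rows)

    ranks = [boundary_rank(d) for d in range(max_dim + 2)]
    return [len(simplices[d]) - ranks[d] - ranks[d + 1] for d in range(max_dim + 1)]


if __name__ == "__main__":
    square = [(1, 2), (2, 3), (3, 4), (4, 1)]
    print(betti_numbers(range(1, 5), square, 1))    # [1, 1]: one component, one 1-cavity
    triangle = [(1, 2), (2, 3), (1, 3)]
    print(betti_numbers(range(1, 4), triangle, 1))  # [1, 0]: the triangle is filled in
    # Octahedron: all pairs except three opposite pairs, i.e. a hollow sphere.
    octa = [(u, v) for u in range(6) for v in range(u + 1, 6)
            if not (u % 2 == 0 and v == u + 1)]
    print(betti_numbers(range(6), octa, 2))         # [1, 0, 1]: one 2-cavity
```

The octahedron example shows the kind of structure counted in the dimension 2 row: a closed shell of triangles with nothing filling its interior.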
Fig. 2 Betti curves of dimension 2 in RandomNets, showing the number of 2‐cavities as the number of edges increases.
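A Betti curve of this kind can be sketched by adding edges one at a time in a random order, so that each prefix is a random graph with a fixed number of nodes and edges (in the spirit of RandomNet), and recomputing β2 at chosen edge counts. The Python sketch below again assumes the clique-complex construction and GF(2) coefficients. The node count, checkpoints and seed are arbitrary illustrative choices, and the brute-force recomputation only suits small graphs.

```python
import random
from collections import defaultdict
from itertools import combinations


def gf2_rank(rows):
    """Rank over GF(2); each row is a Python int used as a bit vector."""
    pivots = {}  # leading bit -> stored row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]
    return len(pivots)


def betti(n_nodes, edges, dim):
    """Betti number b_dim of the clique complex of a graph on nodes 0..n_nodes-1 (GF(2))."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def cliques(size):
        # Brute force: every vertex set of this size whose members are pairwise adjacent.
        return [c for c in combinations(range(n_nodes), size)
                if all(b in adj[a] for a, b in combinations(c, 2))]

    def boundary_rank(k):
        # Rank of the boundary map from k-simplices (k+1 vertices) to (k-1)-simplices.
        if k == 0:
            return 0
        face_index = {f: i for i, f in enumerate(cliques(k))}
        rows = []
        for simplex in cliques(k + 1):
            bits = 0
            for face in combinations(simplex, k):
                bits |= 1 << face_index[face]
            rows.append(bits)
        return gf2_rank(rows)

    return len(cliques(dim + 1)) - boundary_rank(dim) - boundary_rank(dim + 1)


def betti_curve(n_nodes, checkpoints, dim=2, seed=0):
    """(m, b_dim) after adding the first m edges of one random edge order."""
    rng = random.Random(seed)
    order = list(combinations(range(n_nodes), 2))
    rng.shuffle(order)
    return [(m, betti(n_nodes, order[:m], dim)) for m in checkpoints]


if __name__ == "__main__":
    n = 10
    total = n * (n - 1) // 2
    for m, b2 in betti_curve(n, range(0, total + 1, 5)):
        print(f"{m:3d} edges: b2 = {b2}")
```

Averaging such curves over many random edge orders (different seeds) gives a mean curve; at both ends β2 is zero, because an empty graph has no triangles and a complete graph's clique complex is a filled simplex.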
Key algebraic topological parameters of the individual Escherichia coli‐PPI‐Networks and the corresponding RandomNet and RandomNet‐SameDD. Rand, RandomNet; Rand SameDD, RandomNet‐SameDD. For RandomNet and RandomNet‐SameDD, 200 samples were produced and the average values are shown.
| Network | BDS (E. coli) | BDS (Rand) | BDS (Rand SameDD) | BDC (E. coli) | BDC (Rand) | BDC (Rand SameDD) | Betti2 (E. coli) | Betti2 (Rand) | Betti2 (Rand SameDD) |
|---|---|---|---|---|---|---|---|---|---|
|  | 6 | 2 | 34 | 2 | 1 | 1 | 48.00 | 0.00 | 0.00 |
|  | 6 | 2 | 18 | 2 | 1 | 1 | 1.00 | 0.00 | 0.00 |
|  | 8 | 3 | 36 | 3 | 2 | 1 | 192.00 | 31.10 | 0.00 |
|  | 5 | 2 | 9 | 2 | 1 | 1 | 1.00 | 0.00 | 0.00 |
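The RandomNet‐SameDD controls keep every protein's degree while randomizing which proteins interact. The caption does not say how the samples were generated. One standard method, sketched below in Python as an assumption rather than the authors' procedure, is repeated double-edge swapping (Maslov–Sneppen-style rewiring). It picks edges a–b and c–d, replaces them with a–d and c–b, and rejects any swap that would create a self-loop or a duplicate edge; each accepted swap leaves all four degrees unchanged. A plain RandomNet-style generator (same numbers of nodes and edges) is included for comparison. Function names and the toy graph are hypothetical.

```python
import random
from collections import Counter


def random_same_size(nodes, n_edges, rng):
    """RandomNet-style control: n_edges distinct random edges on the same nodes.

    n_edges must not exceed the number of possible pairs.
    """
    nodes = list(nodes)
    edges = set()
    while len(edges) < n_edges:
        u, v = rng.sample(nodes, 2)
        edges.add((min(u, v), max(u, v)))
    return sorted(edges)


def rewire_same_degrees(edges, n_swaps, rng, max_tries_factor=100):
    """RandomNet-SameDD-style control via degree-preserving double-edge swaps.

    Assumes a simple input graph (no self-loops, no duplicate edges).
    """
    edge_list = [(min(u, v), max(u, v)) for u, v in edges]
    edge_set = set(edge_list)
    swaps = tries = 0
    while swaps < n_swaps and tries < max_tries_factor * n_swaps:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:  # choose one of the two possible swaps
            c, d = d, c
        new1 = (min(a, d), max(a, d))
        new2 = (min(c, b), max(c, b))
        # Reject self-loops and edges that already exist (no multi-edges).
        if a == d or c == b or new1 == new2 or new1 in edge_set or new2 in edge_set:
            continue
        edge_set.difference_update((edge_list[i], edge_list[j]))
        edge_set.update((new1, new2))
        edge_list[i], edge_list[j] = new1, new2
        swaps += 1
    return edge_list


def degrees(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical hub-and-spoke toy graph standing in for a PPI network.
    observed = [(0, i) for i in range(1, 8)] + [(1, 2), (2, 3), (3, 4), (5, 6),
                                                (6, 7), (4, 8), (8, 9)]
    nodes = sorted({v for e in observed for v in e})
    rand_net = random_same_size(nodes, len(observed), rng)
    same_dd = rewire_same_degrees(observed, n_swaps=10 * len(observed), rng=rng)
    print(degrees(observed) == degrees(same_dd))  # True: every degree is preserved
```

Repeating either generator with different seeds (200 times, per the caption) yields the sample of control networks whose averages are tabulated.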
Fig. 3 Key parameters of the Escherichia coli‐PPI‐Networks constructed from individual or combined datasets, including (A) BDS, (B) BDC and (C) Betti2 values. For RandomNet and RandomNet‐SameDD, error bars represent the SD (n = 100).
Fig. 4 Representative gene node gmhB, which has a low degree (3) but a high 1‐DCprotein (20). All 20 1‐cycles in which gmhB is involved are shown, as well as all of its neighbors (rplB, rpsF, rpsO). Yellow nodes: genes that form one of the cycles involving gmhB. Green node: gmhB. Blue nodes: genes that are not involved in the cycles involving gmhB. Node size represents gene degree (larger nodes have greater degree values).
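The caption contrasts degree with 1‐DCprotein. Going by the abstract's wording ("degree of involved q-dimensional cycles"), a minimal reading is that a protein's q-DC counts how many representative q-cycles contain it. The Python sketch below assumes the cycle representatives have already been extracted by some homology or persistent-homology tool; the excerpt does not say which representatives the authors used. It runs on a hypothetical toy graph, not on E. coli data. There, node "x" has degree 2 yet lies on all three independent 1-cycles, mirroring the low-degree, high-DC pattern described for gmhB.

```python
from collections import Counter


def cycle_involvement(cycles):
    """q-DC-style score (our reading): number of cycle representatives containing each node."""
    counts = Counter()
    for cycle in cycles:
        counts.update(set(cycle))  # count a node at most once per cycle
    return counts


def node_degrees(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


if __name__ == "__main__":
    # Hypothetical toy network with no triangles, so no cycle is filled in.
    edges = [("x", "a"), ("x", "c"), ("a", "b"), ("b", "c"),
             ("a", "d"), ("d", "e"), ("e", "c"),
             ("a", "f"), ("f", "g"), ("g", "c")]
    # A basis of 1-cycles: 10 edges - 8 nodes + 1 component = 3 independent cycles.
    cycles = [["x", "a", "b", "c"], ["x", "a", "d", "e", "c"], ["x", "a", "f", "g", "c"]]
    dc, deg = cycle_involvement(cycles), node_degrees(edges)
    for node in sorted(dc, key=lambda n: (-dc[n], n)):
        print(node, "degree", deg[node], "1-DC", dc[node])
```

Unlike degree, this count depends on how cycles run through the whole network, which is why a peripheral-looking node can score highly.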
Fig. 5 Correlation between 1‐DCprotein and traditional parameters, including degree, clustering coefficient, betweenness and assortativity (assortativity of degree, closeness and betweenness).
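The caption does not state which correlation coefficient underlies Fig. 5. Degree-like measures in PPI networks are often heavily skewed, so a rank correlation is one natural choice. The Python sketch below computes Spearman's ρ as the Pearson correlation of average ranks, which handles ties, purely as an illustration with hypothetical input values.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho; raises ZeroDivisionError if either input is constant."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-protein values, for illustration only.
    one_dc = [20, 3, 0, 7, 1, 12]
    degree = [3, 5, 1, 9, 2, 4]
    print(f"Spearman rho(1-DC, degree) = {spearman(one_dc, degree):.3f}")
```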