Sara Rahiminejad, Mano R Maurya, Shankar Subramaniam.
Abstract
BACKGROUND: Community detection algorithms are fundamental tools for uncovering important features in networks. Several studies focus on social networks, but only a few deal with biological networks. Directly or indirectly, most of the methods maximize modularity, a measure of the density of links within communities as compared to links between communities.
Keywords: Biological function; Biological networks; Community detection; Modularity; Pathways
Year: 2019 PMID: 31029085 PMCID: PMC6487005 DOI: 10.1186/s12859-019-2746-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
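The abstract describes modularity as the density of links within communities compared to links between communities. A minimal sketch of that quantity follows, assuming the standard Newman-Girvan form for an undirected, unweighted graph, Q = sum over communities c of [L_c/m - (d_c/2m)^2], where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of the nodes in c. The function name, argument layout, and toy graph are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    membership: dict mapping every node to its community label.
    """
    m = 0
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both endpoints in the same community
    for u, v in edges:
        m += 1
        degree[u] += 1
        degree[v] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    if m == 0:
        raise ValueError("graph has no edges")

    # Total degree of the nodes in each community.
    degree_sum = defaultdict(int)
    for node, d in degree.items():
        degree_sum[membership[node]] += d

    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}

q_two = modularity(toy_edges, two_groups)
q_one = modularity(toy_edges, one_group)
```

Splitting the toy graph into its two triangles gives a positive Q, while placing every node in a single community gives Q = 0, which is why modularity rewards partitions whose communities are denser inside than expected by chance.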
Number of nodes and edges for communities detected using different methods for the Yeast PPI network (6532 nodes and 229,696 edges). The number in parentheses after the name of each method is the number of communities detected by that method. For example, Combo finds 8 communities. The modularity score (Q) of each method is also provided. For each method, only communities with 100 or more nodes are considered, and at most 10 communities are listed
| Method (# of communities), modularity Q | Metric | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Combo (8), Q = 0.2654 | # of nodes | 2231 | 1514 | 1337 | 1284 | | | | | | |
| | # of edges | 25,137 | 23,690 | 38,523 | 30,585 | | | | | | |
| Conclude (66), Q = 0.2468 | # of nodes | 788 | 744 | 602 | 468 | 423 | 359 | 288 | 271 | 252 | 199 |
| | # of edges | 14,585 | 10,506 | 3272 | 988 | 5826 | 4123 | 1404 | 3486 | 940 | 1703 |
| F. Greedy (10), Q = 0.2112 | # of nodes | 2608 | 2410 | 1466 | | | | | | | |
| | # of edges | 61,665 | 72,998 | 7180 | | | | | | | |
| L. Eigen (4), Q = 0.1686 | # of nodes | 2661 | 1910 | 984 | 977 | | | | | | |
| | # of edges | 75,812 | 28,664 | 7373 | 7203 | | | | | | |
| Louvain (9), Q = 0.2643 | # of nodes | 1538 | 1472 | 1190 | 1151 | 993 | 131 | | | | |
| | # of edges | 16,015 | 23,394 | 13,247 | 31,202 | 22,553 | 676 | | | | |
| Spinglass (9), Q = 0.2681 | # of nodes | 1607 | 1473 | 1194 | 1148 | 1076 | | | | | |
| | # of edges | 16,616 | 23,876 | 12,282 | 32,641 | 23,854 | | | | | |
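The selection rule in the caption (keep communities with 100 or more nodes, list at most 10) can be sketched as below. This is only an illustration under two assumptions: that "# of edges" counts edges with both endpoints inside the community, and that communities are listed in decreasing order of node count. The function and parameter names are hypothetical.

```python
from collections import Counter


def summarize_communities(edges, membership, min_nodes=100, max_listed=10):
    """Return [(label, n_nodes, n_internal_edges), ...] for the largest communities.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    membership: dict mapping every node to its community label.
    """
    sizes = Counter(membership.values())
    internal_edges = Counter()
    for u, v in edges:
        if membership[u] == membership[v]:
            internal_edges[membership[u]] += 1

    # Keep only communities that reach the size threshold, largest first.
    kept = sorted(
        (label for label, n in sizes.items() if n >= min_nodes),
        key=lambda label: sizes[label],
        reverse=True,
    )
    return [(label, sizes[label], internal_edges[label]) for label in kept[:max_listed]]


# Toy example (hypothetical data) with a lowered threshold of 3 nodes.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (5, 6), (3, 6), (7, 8)]
toy_membership = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "B", 7: "C", 8: "C"}
summary = summarize_communities(toy_edges, toy_membership, min_nodes=3)
```

With the lowered threshold, community "C" (two nodes) is dropped and "B" is listed before "A" because it has more nodes.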
Comparison of different methods with respect to three topological metrics, namely, Rand Index (RI), Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI), for the Yeast PPI network. When a method is compared with itself, RI, ARI and NMI are 1 (diagonal elements). The larger (smaller) the values of RI, ARI and NMI, the more (less) similar the two methods being compared are. For example, Louvain and Spinglass are most similar to each other
| Metric | Method | Combo | Conclude | F. Greedy | L. Eigen | Louvain | Spinglass |
|---|---|---|---|---|---|---|---|
| RI | Combo | 1 | 0.7608 | 0.7157 | 0.6788 | 0.8319 | 0.8409 |
| ARI | Combo | 1 | 0.1466 | 0.3125 | 0.1942 | 0.5163 | 0.5479 |
| NMI | Combo | 1 | 0.2905 | 0.4024 | 0.2447 | 0.5413 | 0.5723 |
| RI | Conclude | | 1 | 0.6815 | 0.7061 | 0.8083 | 0.8012 |
| ARI | Conclude | | 1 | 0.0818 | 0.0825 | 0.1659 | 0.1637 |
| NMI | Conclude | | 1 | 0.1956 | 0.1472 | 0.3016 | 0.2924 |
| RI | F. Greedy | | | 1 | 0.6334 | 0.7098 | 0.7129 |
| ARI | F. Greedy | | | 1 | 0.146 | 0.2629 | 0.2764 |
| NMI | F. Greedy | | | 1 | 0.1918 | 0.3545 | 0.3652 |
| RI | L. Eigen | | | | 1 | 0.6952 | 0.6936 |
| ARI | L. Eigen | | | | 1 | 0.188 | 0.1914 |
| NMI | L. Eigen | | | | 1 | 0.2231 | 0.2285 |
| RI | Louvain | | | | | 1 | 0.9021 |
| ARI | Louvain | | | | | 1 | 0.6922 |
| NMI | Louvain | | | | | 1 | 0.6644 |
| RI | Spinglass | | | | | | 1 |
| ARI | Spinglass | | | | | | 1 |
| NMI | Spinglass | | | | | | 1 |
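The three metrics in the table above compare two partitions of the same node set. A minimal sketch of how they could be computed from two label vectors is given below. It assumes the standard pair-counting definitions of RI and ARI, and an NMI normalized by the arithmetic mean of the two partition entropies; other normalizations exist, so this choice is an assumption rather than the paper's stated convention. The function name is illustrative.

```python
from collections import Counter
from math import comb, log


def partition_similarity(labels_a, labels_b):
    """Return (RI, ARI, NMI) for two labelings of the same nodes, in the same order."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two labelings of the same length (at least 2 nodes)")

    joint = Counter(zip(labels_a, labels_b))  # contingency table
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)

    pairs = comb(n, 2)
    same_both = sum(comb(c, 2) for c in joint.values())  # pairs together in both
    same_a = sum(comb(c, 2) for c in count_a.values())   # pairs together in A
    same_b = sum(comb(c, 2) for c in count_b.values())   # pairs together in B

    # Rand Index: fraction of node pairs on which the two partitions agree.
    ri = (pairs + 2 * same_both - same_a - same_b) / pairs

    # Adjusted Rand Index: same_both corrected for its expected value under chance.
    expected = same_a * same_b / pairs
    max_index = (same_a + same_b) / 2
    ari = 1.0 if max_index == expected else (same_both - expected) / (max_index - expected)

    def entropy(counts):
        return -sum(c / n * log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    mutual_info = sum(
        c / n * log(c * n / (count_a[a] * count_b[b])) for (a, b), c in joint.items()
    )
    nmi = 1.0 if h_a + h_b == 0 else 2 * mutual_info / (h_a + h_b)
    return ri, ari, nmi


identical = partition_similarity([0, 0, 1, 1], [0, 0, 1, 1])
crossed = partition_similarity([0, 0, 1, 1], [0, 1, 0, 1])
```

Identical partitions give 1 for all three metrics, matching the diagonal of the table. Note that RI can stay fairly high even for dissimilar partitions because it also counts pairs that are separated in both, which is why ARI and NMI spread the methods apart more clearly.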
Jaccard index (as a percentage) between the communities identified by two similar methods, namely, Louvain and Spinglass, for the Yeast PPI network. L1 to L5 refer to the communities detected by the Louvain method, sorted by size. Similarly, S1 to S5 refer to the communities detected by the Spinglass method. The numbers in parentheses represent the number of genes in each community. Community pairs with maximum overlap (e.g., L1 vs. S1) are indicated in bold text
Jaccard index (as a percentage) between the communities identified by two dissimilar methods, namely, Louvain and Leading Eigen, for the Yeast PPI network (L1 to L5: communities detected by Louvain; LE1 to LE4: communities detected by Leading Eigen). The numbers in parentheses represent the number of genes in each community. Community pairs with maximum overlap (e.g., L1 vs. LE4) are indicated in bold text
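The Jaccard index used in these comparisons is the size of the intersection of two gene sets divided by the size of their union. A small sketch follows, assuming each community is available as a Python set of gene identifiers; the function names and the toy identifiers are hypothetical.

```python
def jaccard_percent(genes_a, genes_b):
    """Jaccard index of two gene sets, expressed as a percentage."""
    union = genes_a | genes_b
    if not union:
        return 0.0
    return 100.0 * len(genes_a & genes_b) / len(union)


def jaccard_matrix(communities_a, communities_b):
    """Rows follow communities_a, columns follow communities_b."""
    return [[jaccard_percent(a, b) for b in communities_b] for a in communities_a]


def best_match_per_row(matrix):
    """Column index of the maximum-overlap partner for each row community."""
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]


# Toy communities with placeholder gene identifiers (not real data).
method_1 = [{"g1", "g2", "g3", "g4"}, {"g5", "g6"}]
method_2 = [{"g5", "g6", "g7"}, {"g1", "g2", "g3"}]

matrix = jaccard_matrix(method_1, method_2)
matches = best_match_per_row(matrix)
```

Here the first community of `method_1` pairs with the second community of `method_2` and vice versa, which is the kind of maximum-overlap pairing the captions describe as shown in bold.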
A comparison of KEGG pathway enrichment results between the first community of Louvain (L1) with 1538 genes and the first community of Spinglass (S1) with 1607 genes for the Yeast PPI network. The numbers in parentheses after L1 and S1 represent the number of genes that DAVID could annotate, which is generally less than the number of genes in those communities. The first column lists the broad category of pathways (M: Metabolism, CP: Cellular Processes). Many pathways enriched in L1 and S1 overlap well (a large number of genes are common). FE: Fold Enrichment. False Discovery Rate (FDR) values for all pathways and both communities are approximately 1.10E+3 times the p-value (the factor 1.10E+3 is related to the size of the community)
A comparison of KEGG pathway enrichment results between the first community of Louvain (L1) with 1538 genes and the fourth community of Leading Eigen (LE4) with 977 genes for the Yeast PPI network. The numbers in parentheses after L1 and LE4 represent the number of genes that DAVID could annotate, which is generally less than the number of genes in those communities. The first column lists the broad category of pathways (M: Metabolism, CP: Cellular Processes). FE: Fold Enrichment. False Discovery Rate (FDR) values for all pathways and both communities are approximately 1.10E+3 times the p-value
A comparison of KEGG pathway enrichment results between the second community of Louvain (L2) with 1472 genes and the second community of Spinglass (S2) with 1473 genes for the Yeast PPI network. The numbers in parentheses after L2 and S2 represent the number of genes that DAVID could annotate, which is generally less than the number of genes in those communities. The first column lists the broad category of pathways (GIP: Genetic Information Processing). Many pathways enriched in L2 and S2 overlap well (a large number of genes are common). FE: Fold Enrichment. False Discovery Rate (FDR) values for all pathways and both communities are approximately 1.05E+3 times the p-value
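Fold Enrichment (FE) in these tables can be read as a ratio of proportions: the fraction of a community's annotated genes that fall in a pathway, divided by the fraction of background genes in that pathway. The sketch below assumes that definition. The hypergeometric upper-tail probability included next to it is a plain stand-in for an enrichment p-value, used here only for illustration; DAVID reports a modified Fisher exact p-value (the EASE score), which this sketch does not reproduce. All names and numbers are hypothetical.

```python
from math import comb


def fold_enrichment(list_hits, list_total, pop_hits, pop_total):
    """(pathway genes in community / community size) / (pathway genes in background / background size)."""
    return (list_hits / list_total) / (pop_hits / pop_total)


def hypergeom_upper_tail(list_hits, list_total, pop_hits, pop_total):
    """P(X >= list_hits) when list_total genes are drawn without replacement
    from a background of pop_total genes, pop_hits of which are in the pathway."""
    upper = min(list_total, pop_hits)
    total = comb(pop_total, list_total)
    tail = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 of 100 community genes are in a pathway
# that covers 50 of 1000 background genes.
fe = fold_enrichment(10, 100, 50, 1000)
p_value = hypergeom_upper_tail(10, 100, 50, 1000)
```

An FE above 1 means the pathway is over-represented in the community relative to the background, and the tail probability indicates how surprising that count would be under random sampling.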
Modularity scores and number of communities detected by Combo for the Human PPI network. Each run uses a random seed between 0 and 10,000 in the procedure for finding communities
| Run (in decreasing order of modularity) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Modularity | 0.3735 | 0.3734 | 0.3734 | 0.3729 | 0.3723 | 0.3723 | 0.3718 | 0.3715 | 0.3711 | 0.3704 |
| # of communities detected | 11 | 13 | 11 | 15 | 11 | 13 | 12 | 10 | 12 | 13 |
Modularity scores and number of communities detected by Spinglass for the Human PPI network. Each run uses a random seed between 0 and 10,000 in the procedure for finding communities
| Run (in decreasing order of modularity) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Modularity | 0.3729 | 0.3727 | 0.3725 | 0.3725 | 0.3724 | 0.3721 | 0.3716 | 0.3716 | 0.3711 | 0.3688 |
| # of communities detected | 21 | 22 | 22 | 24 | 21 | 22 | 23 | 21 | 23 | 23 |
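The ten Combo runs and ten Spinglass runs reported above can be summarized with a few lines of arithmetic. The sketch below copies the values from the two tables and reports the minimum, mean, and maximum per method; the variable and function names are illustrative.

```python
from statistics import mean

# Values copied from the Combo and Spinglass tables for the Human PPI network.
combo_modularity = [0.3735, 0.3734, 0.3734, 0.3729, 0.3723,
                    0.3723, 0.3718, 0.3715, 0.3711, 0.3704]
combo_communities = [11, 13, 11, 15, 11, 13, 12, 10, 12, 13]

spinglass_modularity = [0.3729, 0.3727, 0.3725, 0.3725, 0.3724,
                        0.3721, 0.3716, 0.3716, 0.3711, 0.3688]
spinglass_communities = [21, 22, 22, 24, 21, 22, 23, 21, 23, 23]


def run_summary(values):
    """Return (minimum, mean, maximum) over the runs."""
    return min(values), mean(values), max(values)


for name, q_values, k_values in [
    ("Combo", combo_modularity, combo_communities),
    ("Spinglass", spinglass_modularity, spinglass_communities),
]:
    q_min, q_mean, q_max = run_summary(q_values)
    k_min, k_mean, k_max = run_summary(k_values)
    print(f"{name}: Q in [{q_min:.4f}, {q_max:.4f}], mean {q_mean:.4f}; "
          f"communities in [{k_min}, {k_max}], mean {k_mean:.1f}")
```

The modularity ranges of the two methods nearly coincide, while Spinglass consistently reports roughly twice as many communities as Combo, so similar modularity does not imply a similar number of communities.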
Comparison of different methods with respect to three topological metrics, namely, RI, ARI and NMI, for the Human PPI network (20,644 nodes and 241,008 edges). When a method is compared with itself, RI, ARI and NMI are 1 (diagonal elements). The larger (smaller) the values of RI, ARI and NMI, the more (less) similar the two methods being compared are. For example, Combo and Spinglass are most similar to each other, with Louvain the next most similar to them. Overall, Combo, Louvain and Spinglass provide similar results
| Metric | Method | Combo | F. Greedy | L. Eigen | Louvain | Spinglass |
|---|---|---|---|---|---|---|
| RI | Combo | 1 | 0.7314 | 0.3606 | 0.8805 | 0.8948 |
| ARI | Combo | 1 | 0.1806 | 0.0315 | 0.416 | 0.4998 |
| NMI | Combo | 1 | 0.3025 | 0.0936 | 0.4601 | 0.5551 |
| RI | F. Greedy | | 1 | 0.444 | 0.7243 | 0.7258 |
| ARI | F. Greedy | | 1 | 0.0624 | 0.1609 | 0.1739 |
| NMI | F. Greedy | | 1 | 0.0787 | 0.2682 | 0.3063 |
| RI | L. Eigen | | | 1 | 0.3531 | 0.3649 |
| ARI | L. Eigen | | | 1 | 0.0191 | 0.0326 |
| NMI | L. Eigen | | | 1 | 0.0711 | 0.0951 |
| RI | Louvain | | | | 1 | 0.8832 |
| ARI | Louvain | | | | 1 | 0.4479 |
| NMI | Louvain | | | | 1 | 0.4679 |
| RI | Spinglass | | | | | 1 |
| ARI | Spinglass | | | | | 1 |
| NMI | Spinglass | | | | | 1 |
Top 10 pathways for a comparison of KEGG pathway enrichment results between C1 with 3252 genes and S1 with 3206 genes for the Human PPI network (20,644 nodes and 241,008 edges). The numbers in parentheses after C1 and S1 represent the number of genes that DAVID could annotate, which is generally less than the number of genes in those communities. The first column lists the broad category of pathways (M: Metabolism, HD: Human Diseases, CP: Cellular Processes, and GIP: Genetic Information Processing). FE: Fold Enrichment
Top 10 pathways for a comparison of KEGG pathway enrichment results between L1 with 3585 genes and S1 with 3206 genes for the Human PPI network. The numbers in parentheses after L1 and S1 represent the number of genes that DAVID could annotate, which is generally less than the number of genes in those communities. The first column lists the broad category of pathways (M: Metabolism, HD: Human Diseases, GIP: Genetic Information Processing, and CP: Cellular Processes). FE: Fold Enrichment
Fig. 1 Pie charts for KEGG pathway enrichment results of C1 with 3252 genes and S1 with 3206 genes for the Human PPI network. The left chart shows the results for C1 and the right chart shows the results for S1
Fig. 2 Pie charts for KEGG pathway enrichment results of L1 with 3585 genes and S1 with 3206 genes for the Human PPI network. The left chart shows the results for L1 and the right chart shows the results for S1
Jaccard index (as a percentage) between the communities detected by Louvain for the Yeast PPI network and the Yeast orthologs of the genes in the communities detected by the same method for the Human PPI network. Community pairs with maximum overlap of more than 10% (e.g., SC2 vs. HS3 ➔ SC) are indicated in bold text
A comparison of KEGG pathway enrichment results between the second community detected by Louvain for the Yeast PPI network (SC2) and HS3 ➔ SC (the Yeast orthologs of the genes in the third community of the Human PPI network). The first column lists the broad category of pathways (GIP: Genetic Information Processing). FE: Fold Enrichment
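The cross-species comparison above (e.g., SC2 vs. HS3 ➔ SC) involves two steps: map the genes of a Human community to their Yeast orthologs, then compute the Jaccard index against a Yeast community. A minimal sketch is given below, assuming the ortholog table is available as a dictionary from a human gene to a set of yeast genes; that data layout, the function names, and the placeholder identifiers are assumptions for illustration, and the source of the ortholog mapping is not specified here.

```python
def map_to_orthologs(community, ortholog_map):
    """Replace each gene by its orthologs; genes without an ortholog are dropped."""
    mapped = set()
    for gene in community:
        mapped.update(ortholog_map.get(gene, ()))
    return mapped


def jaccard_percent(genes_a, genes_b):
    """Jaccard index of two gene sets, expressed as a percentage."""
    union = genes_a | genes_b
    return 100.0 * len(genes_a & genes_b) / len(union) if union else 0.0


# Placeholder identifiers, not real gene names.
human_to_yeast = {
    "HS_A": {"SC_1"},
    "HS_B": {"SC_2", "SC_3"},  # one-to-many orthology is allowed
    # "HS_C" has no yeast ortholog in this toy table
}
human_community = {"HS_A", "HS_B", "HS_C"}
yeast_community = {"SC_1", "SC_2", "SC_4"}

human_in_yeast = map_to_orthologs(human_community, human_to_yeast)
overlap = jaccard_percent(yeast_community, human_in_yeast)
```

Because genes without orthologs are dropped and one human gene may map to several yeast genes, the mapped set can be smaller or larger than the original community, which affects the Jaccard values.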
Fig. 3 Comparison of the number of genes enriched in different pathways for the first community detected by Louvain (L1), the first community detected by Spinglass (S1) and the fourth community detected by Leading Eigen (LE4) for the Yeast PPI network
Number of genes enriched in each pathway for different communities detected by Combo for the Yeast PPI network. The first column lists the broad category of pathways (M: Metabolism, and GIP: Genetic Information Processing). Column 2 lists the enriched pathways. Columns 3 through 7 give the number of enriched genes for each community. Column 8 lists the total number of enriched genes over all communities, and the last column gives the maximum number of genes in that pathway. For example, in the DAVID database, “metabolic pathways” contains 685 genes, of which 384 were found in C1 and 140 were found in C4
Number of genes enriched in each pathway for different communities detected by Louvain for the Yeast PPI network. This table is arranged similarly to Table 15
Number of genes enriched in each pathway for different communities detected by Spinglass for the Yeast PPI network. This table is arranged similarly to Table 15
Fig. 4 Scatter plots of avg-max (average of the column-wise maximum of the Jaccard index matrix) values vs. run number. The left panel (a) shows the results for the Yeast PPI network and the right panel (b) shows the results for the Human PPI network
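The avg-max statistic named in this caption can be written directly from its description: take the maximum of each column of a Jaccard index matrix, then average those maxima. The sketch below assumes the matrix is a non-empty list of equal-length rows; the function name and example values are illustrative.

```python
def avg_max(jaccard_matrix):
    """Average of the column-wise maxima of a Jaccard index matrix."""
    if not jaccard_matrix or not jaccard_matrix[0]:
        raise ValueError("matrix must be non-empty")
    n_cols = len(jaccard_matrix[0])
    column_maxima = [max(row[j] for row in jaccard_matrix) for j in range(n_cols)]
    return sum(column_maxima) / n_cols


# Toy 2 x 3 matrix of Jaccard percentages (hypothetical values).
example = [
    [50.0, 10.0, 5.0],
    [20.0, 80.0, 35.0],
]
score = avg_max(example)
```

A value close to 100 means that every column community has a near-identical partner among the row communities, so plotting it against the run number shows how stable the detected communities are across runs.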
Fig. 5 (a) Node-degree distribution for the combined network. (b) Node-degree distribution for the physical interaction-only network. (c) Comparison of counts of nodes between the combined network and the physical interaction-only network
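A node-degree distribution such as the ones plotted in this figure counts how many nodes have each degree. A minimal sketch follows, assuming the network is given as a list of undirected edges with each edge listed once and no self-loops; the function name and the toy edge list are illustrative.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {degree: number of nodes with that degree}, sorted by degree.

    Nodes that appear in no edge are not counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    histogram = Counter(degree.values())
    return dict(sorted(histogram.items()))


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
distribution = degree_distribution(toy_edges)
```

Running the same function on two edge lists over the same genes, for example a combined network and a physical interaction-only network, gives the pair of histograms that panels (a) and (b) compare.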
Summary of community detection methods
| Name of Method | Equation # | Computational complexity | Reference |
|---|---|---|---|
| Girvan Newman | 1 | O(m^2 n) | [ |
| Fast Greedy | 2, 3, 4 | O(n log^2 n) | [ |
| Combo | 1 | O(n^2 log c) | [ |
| Louvain | 1, 5 | O(n log n) | [ |
| Conclude | 6, 7 | O(m) | [ |
| Infomap | 8 | O(m) | [ |
| Leading Eigen | 9, 10, 12 | O(n^2) | [ |
| Spinglass | 15 | O(n^3.2) | [ |
Fig. 6 Flow chart of the steps used in our analysis