Miaomiao Liu1,2, Jinyun Yang1, Jingfeng Guo3, Jing Chen3, Yongsheng Zhang1.
Abstract
To address the poor stability and low modularity (Q) of community division results caused by the randomness of node selection and label update in the traditional label propagation algorithm, an improved two-stage label propagation algorithm based on LeaderRank was proposed in this study. In the first stage, the order of node updating was determined by the participation coefficient (PC). Then, a new similarity measure was defined to improve the label selection mechanism, resolving the label oscillation that occurs when the nodes most similar to a given node carry several different labels. Moreover, the influence of the nodes was used to find the initial community structure. In the second stage, the rough communities obtained in the first stage were regarded as nodes, and their merging sequence was determined by the PC. Next, each non-weak community was merged with the community to which it had the largest number of connecting edges. Finally, the community structure was further optimized to improve the modularity and obtain the final partition. Experiments were performed on nine classic real-world networks and 19 artificial datasets with different scales, complexities, and densities. Modularity and normalized mutual information (NMI) were used as evaluation indexes for comparing the improved algorithm with dozens of relevant classic algorithms. The results showed that the community partitions obtained using the improved algorithm were stable and more accurate than those obtained using the other algorithms. In addition, the proposed algorithm performed consistently well on nine large-scale artificial datasets with 6,000 to 50,000 nodes and on three large real-world network datasets, which supports its computational performance and utility in community detection for large-scale networks.
Keywords: Community division; Label propagation; LeaderRank; Modularity; Node influence; Weak community
Year: 2022 PMID: 36091993 PMCID: PMC9454888 DOI: 10.7717/peerj-cs.981
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1 (A–C). A network instance with two communities.
The nodes in different communities are represented by different shapes and colors in Fig. 1.
Figure 2. Results of 100 experiments of the two algorithms on two networks.
LPA and LPA-TS were tested 100 times on two different networks.
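The instability probed by these repeated runs comes from two random choices in traditional LPA: the order in which nodes are updated and the tie-breaking among equally frequent neighbor labels. A minimal Python sketch of that baseline follows. It is not the paper's LPA-TS or LPA-ITSLR code; the function name `label_propagation`, the adjacency-dict input, and the rule of keeping a node's current label when it is already among the most frequent ones are assumptions made for illustration.

```python
import random
from collections import Counter


def label_propagation(adj, seed=None, max_iter=100):
    """Asynchronous LPA on an undirected graph given as {node: set(neighbors)}.

    Illustrative baseline only (assumed interface, not the paper's code).
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}      # every node starts with a unique label
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)            # random update order: first source of instability
        changed = False
        for v in nodes:
            if not adj[v]:
                continue              # isolated nodes keep their own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            if labels[v] in candidates:
                continue              # assumed rule: keep the current label on a tie
            labels[v] = rng.choice(candidates)  # random tie-breaking: second source
            changed = True
        if not changed:
            break                     # no label changed in a full pass
    return labels


# Two disjoint triangles; different seeds may yield different label values.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
labels = label_propagation(adj, seed=0)
```

Running the function with several seeds on a larger graph and comparing the resulting partitions is one way to observe the run-to-run fluctuation that the figure reports for LPA.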
Figure 3. LPA-TS algorithm partitioning results for Karate network in the first stage.
Different colors represent different neighborhoods.
First stage of the LPA-ITSLR algorithm.
| 1: assign a unique label to each node in the […] |
| 2–12: […] |
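The first stage draws on node influence, which the abstract ties to LeaderRank. The sketch below implements the standard LeaderRank iteration (a ground node linked in both directions to every node, scores spread evenly along outgoing edges, and the ground node's final score shared equally among all nodes). It is not the paper's own code; `leader_rank`, its parameters, and the adjacency-dict input are illustrative assumptions.

```python
def leader_rank(adj, tol=1e-10, max_iter=1000):
    """Standard LeaderRank scores for a graph given as {node: set(out_neighbors)}.

    For an undirected graph, list each edge in both directions.
    Assumed interface for illustration, not the paper's implementation.
    """
    nodes = list(adj)
    n = len(nodes)
    ground = object()                 # extra ground node, distinct from every real node
    out = {v: set(adj[v]) | {ground} for v in nodes}
    out[ground] = set(nodes)          # ground node links to every real node

    score = {v: 1.0 for v in nodes}   # one unit of score per real node
    score[ground] = 0.0               # ground node starts empty
    for _ in range(max_iter):
        new = {v: 0.0 for v in out}
        for v, nbrs in out.items():
            share = score[v] / len(nbrs)   # split score evenly over out-links
            for u in nbrs:
                new[u] += share
        diff = sum(abs(new[v] - score[v]) for v in out)
        score = new
        if diff < tol:
            break

    bonus = score[ground] / n         # redistribute the ground node's score
    return {v: score[v] + bonus for v in nodes}


# Star graph: node 0 is the hub and should receive the highest score.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
scores = leader_rank(star)
```

Sorting nodes by these scores gives one possible influence ranking; how the paper combines it with the participation coefficient and the similarity measure is not recoverable from the listing above.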
Second stage of the LPA-ITSLR algorithm.
| 1: […] |
| 2: if […] |
| 3–4: […] |
| 5: get the community set […] |
| 6: take each […] |
| 7: take the number of edges between nodes as weight; |
| 8: get the new network […] |
| 9–12: […] |
| 13: try merge […] |
| 14: if ( […] |
| 15: merge […] |
| 16–17: […] |
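The merging step in the second stage, as the abstract describes it, pairs each non-weak community with the community to which it has the most connecting edges. The sketch below illustrates that idea under a stated assumption: a community is treated as weak when the summed internal degree of its members exceeds their summed external degree (the commonly used weak-sense definition). The paper's exact criterion and its modularity check before merging are not shown; `is_weak_community` and `merge_target` are hypothetical helper names.

```python
from collections import Counter


def is_weak_community(adj, members):
    """True if the summed internal degree exceeds the summed external degree.

    Assumed weak-sense definition; adj is {node: set(neighbors)}.
    """
    members = set(members)
    k_in = sum(1 for v in members for u in adj[v] if u in members)
    k_out = sum(1 for v in members for u in adj[v] if u not in members)
    return k_in > k_out


def merge_target(adj, communities, name):
    """Name of the community sharing the most edges with communities[name].

    communities is {name: set(nodes)}. Returns None if there are no outside
    edges. Ties are broken by the smallest name, for determinism only.
    """
    owner = {v: c for c, nodes in communities.items() for v in nodes}
    counts = Counter(
        owner[u]
        for v in communities[name]
        for u in adj[v]
        if owner[u] != name
    )
    if not counts:
        return None
    best = max(counts.values())
    return min(c for c, k in counts.items() if k == best)


# Two triangles joined by an edge, plus a single node attached to both.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3),
         (6, 3), (6, 4), (6, 0)]
adj = {v: set() for v in range(7)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
communities = {"A": {0, 1, 2}, "B": {3, 4, 5}, "C": {6}}
candidates = [c for c in communities if not is_weak_community(adj, communities[c])]
targets = {c: merge_target(adj, communities, c) for c in candidates}
```

In this toy graph only the single-node community fails the weak-community test, and it has more edges to one triangle than to the other, so that triangle is its merge candidate.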
Basic structural parameters of real datasets.
| Dataset | Nodes | Edges | Communities | Max. degree | Avg. degree | Avg. path length | Clustering coefficient |
|---|---|---|---|---|---|---|---|
| Karate | 34 | 78 | 2 | 17 | 4.588 | 2.408 | 0.588 |
| Dolphin | 62 | 159 | 2 | 12 | 5.129 | — | 0.309 |
| Polbooks | 105 | 441 | 3 | 25 | 8.4 | 3.079 | 0.448 |
| Football | 115 | 613 | 12 | 12 | 10.661 | 2.508 | 0.403 |
| Les_Miserable | 77 | 254 | — | 36 | 6.597 | 2.641 | 0.736 |
| NetScience | 379 | 914 | 16 | 34 | 3.451 | 6.042 | 0.798 |
Note:
Six classic real social network datasets were used in the experiment; their attribute characteristics are presented in Table 3.
Figure 4. Community partition results of real networks.
The partition results of this algorithm on six real data sets are shown.
The average modularity values of the 10 experiments are shown in Table 4, and the results of each independent experiment are shown in Fig. 5. As Table 4 and Fig. 5 show, LPA-ITSLR performed well on all datasets, except that its average modularity on NetScience was slightly lower than that obtained using the other two algorithms. On the other five networks, LPA-ITSLR yielded a higher modularity than the other two algorithms. NetScience is a weighted network; in the experiment, however, the weights were ignored and it was treated as an unweighted network for community division. Therefore, the quality of community division obtained using LPA-ITSLR on this network was slightly lower than that obtained using the other two algorithms. Across the 10 independent experiments, the results of the LPA and LPA-TS algorithms fluctuated, indicating that the two algorithms are unstable owing to the randomness of node and label updates. By contrast, the modularity of the proposed LPA-ITSLR algorithm remained the same in every run on every network, indicating that LPA-ITSLR effectively solves the oscillation problem in label propagation and has higher accuracy and stability.
Average modularity values of 10 experiments for the three algorithms on real datasets.
| Dataset | LPA | LPA-TS | LPA-ITSLR |
|---|---|---|---|
| Karate | 0.3174 | 0.3716 | 0.4242 |
| Dolphin | 0.4920 | 0.3759 | 0.5418 |
| Polbooks | 0.3801 | 0.4569 | 0.5207 |
| Football | 0.5819 | 0.6010 | 0.6068 |
| Les_Miserable | 0.2719 | 0.5007 | 0.5102 |
| NetScience | 0.7769 | 0.7573 | 0.7567 |
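For readers who want to reproduce values of this kind, the following sketch computes the standard Newman modularity Q of a partition on an unweighted, undirected graph, summing over communities the fraction of internal edges minus the squared fraction of total degree. It is a plain reference implementation under assumed names (`modularity`, adjacency dict, list of node sets), not the evaluation code used in the paper.

```python
def modularity(adj, communities):
    """Newman modularity Q for an unweighted, undirected graph.

    adj: {node: set(neighbors)}; communities: iterable of node collections
    that together cover every node exactly once. Assumed interface.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2   # number of edges
    if m == 0:
        raise ValueError("graph has no edges")
    q = 0.0
    for members in communities:
        members = set(members)
        degree_sum = sum(len(adj[v]) for v in members)
        # each internal edge is seen from both endpoints, hence the division by 2
        internal = sum(1 for v in members for u in adj[v] if u in members) / 2
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


# Two triangles joined by a single edge, split into the two triangles.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
q = modularity(adj, [{0, 1, 2}, {3, 4, 5}])
```

For this toy graph there are 7 edges, each triangle holds 3 internal edges and a degree sum of 7, so Q = 2 × (3/7 − (7/14)²) = 5/14 ≈ 0.357.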
Figure 5. Comparison of algorithm stability.
The modularity comparison of three algorithms on six networks.
Figure 6. Comparison of modularity of LPA, LPA-TS, and LPA-ITSLR.
The results of 100 experiments were compared on Karate, Football and Dolphin networks.
Modularity comparison of five algorithms.
| Network | Karate | Dolphin | Polbooks | Football |
|---|---|---|---|---|
| COPRA | 0.2348 ± 0.10187 | 0.3741 ± 0.03946 | 0.4884 ± 0.03215 | 0.5972 ± 0.02115 |
| WLPA | 0.3682 ± 0.08176 | 0.3695 ± 0.02517 | 0.5070 ± 0.00622 | 0.5981 ± 0.01374 |
| LINSIA | 0.3989 ± 0.00004 | 0.3878 ± 0.00005 | 0.4521 ± 0.00007 | 0.5853 ± 0.00007 |
| LILPA | 0.4213 ± 0.0029 | 0.4003 ± 0.00214 | 0.4635 ± 0.00646 | 0.6061 ± 0.00151 |
| LPA-ITSLR | 0.4242 | 0.5418 | 0.5207 | 0.6068 |
Results of eight algorithms on classical networks.
| Algorithm | Karate (communities) | Karate (Q) | Dolphin (communities) | Dolphin (Q) | Football (communities) | Football (Q) |
|---|---|---|---|---|---|---|
| Fastgreedy | 3 | 0.38 | 4 | 0.495 | 6 | 0.549 |
| LPA | 2 | 0.292 | 3 | 0.492 | 9 | 0.576 |
| Leading Eigenvector | 4 | 0.393 | 5 | 0.491 | 8 | 0.492 |
| Walktrap | 5 | 0.353 | 4 | 0.489 | 10 | 0.602 |
| NIBLPA | 3 | 0.352 | 5 | 0.452 | 9 | 0.542 |
| EdMot | 3 | 0.412 | 4 | 0.518 | 9 | 0.604 |
| LPA-MNI | 2 | 0.372 | 4 | 0.527 | 11 | 0.582 |
| LPA-ITSLR | 2 | 0.4242 | 2 | 0.5418 | 10 | 0.6068 |
Description of synthetic networks.
| Network | Nodes | | | | | µ |
|---|---|---|---|---|---|---|
| LFR-1–LFR-8 | 1,000 | 20 | 50 | 10 | 50 | 0.1–0.45 |
| LFR-9 | 5,000 | 10 | 50 | 50 | 50 | 0.1 |
| LFR-10 | 5,000 | 10 | 50 | 50 | 50 | 0.3 |
Modularity and NMI were used as evaluation indicators. The experimental results are shown in Fig. 7. As the value of µ increased, the network became more complex, and the modularity of the community division results of the three algorithms decreased to varying degrees; nevertheless, LPA-ITSLR yielded higher modularity than the other algorithms. Moreover, the NMI of LPA-ITSLR was 1 on the first seven networks and 0.9943 on the network with µ = 0.45, showing strong stability and a high quality of community division.
Figure 7. Comparison of modularity and NMI on eight synthetic datasets.
The modularity and NMI results of the three algorithms under different µ values were compared.
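NMI compares a detected partition with the known ground truth of a synthetic network. The sketch below uses the common normalization 2·I(A;B) / (H(A) + H(B)); whether the paper uses exactly this variant is an assumption, and the function name `nmi` and the flat label-list inputs are illustrative.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same nodes.

    labels_a[i] and labels_b[i] are the community labels of node i.
    Uses 2*I(A;B) / (H(A) + H(B)); an assumed, commonly used normalization.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    # mutual information from the joint and marginal label frequencies
    mi = sum(
        (c / n) * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    h_a = -sum((c / n) * log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * log(c / n) for c in count_b.values())
    if h_a + h_b == 0:
        return 1.0   # both labelings put every node in one community
    return 2 * mi / (h_a + h_b)


truth = [0, 0, 0, 1, 1, 1]
found = [5, 5, 5, 9, 9, 9]   # same grouping, different label names
score = nmi(truth, found)
```

Because NMI depends only on how nodes are grouped, relabeling the communities leaves the score unchanged; identical groupings score 1 and statistically independent groupings score 0.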
Results for LFR-9 and LFR-10.
| Algorithm | NMI (LFR-9) | NMI (LFR-10) | Q (LFR-9) | Q (LFR-10) |
|---|---|---|---|---|
| COPRA | 0.9853 ± 0.00466 | 0.9859 ± 0.00413 | 0.4259 ± 0.00786 | 0.3353 ± 0.00286 |
| SLPA | 0.9994 ± 0.00081 | 0.9931 ± 0.00352 | 0.4467 ± 0.00122 | 0.3437 ± 0.00193 |
| LINSIA | 0.8813 ± 0.00000 | 0.8267 ± 0.00007 | 0.3221 ± 0.00007 | 0.3107 ± 0.00007 |
| DLPA+ | 0.9887 ± 0.00135 | 0.9414 ± 0.00156 | 0.4423 ± 0.00164 | 0.3381 ± 0.00074 |
| WLPA | 0.9980 ± 0.00113 | 0.9979 ± 0.00111 | 0.4443 ± 0.00174 | 0.3366 ± 0.00145 |
| LPA_NI | 0.9987 ± 0.00082 | 0.9847 ± 0.00124 | 0.4467 ± 0.00024 | 0.3437 ± 0.00112 |
| LILPA | 0.9955 ± 0.00084 | 0.9692 ± 0.00115 | 0.4472 ± 0.00011 | 0.3453 ± 0.00041 |
| LPA-ITSLR | 0.9862 | 0.9531 | 0.8782 | 0.8091 |
Note:
For large-scale artificial networks LFR-9 and LFR-10 with high complexity, the proposed LPA-ITSLR algorithm was compared with seven recent label propagation algorithms for community division. Q and NMI were considered as evaluation parameters.
Actual number of communities and the number of communities detected by LPA-ITSLR.
| LFR network | Actual number of communities | Number of communities detected by LPA-ITSLR |
|---|---|---|
| LFR-1 | 35 | 40 |
| LFR-2 | 35 | 35 |
| LFR-3 | 38 | 38 |
| LFR-4 | 45 | 45 |
| LFR-5 | 39 | 39 |
| LFR-6 | 42 | 42 |
| LFR-7 | 42 | 42 |
| LFR-8 | 42 | 40 |
| LFR-9 | 85 | 81 |
| LFR-10 | 98 | 69 |
Note:
The actual number of communities and the number detected by the algorithm.
Community detection results of nine large-scale artificial networks.
| Dataset | Nodes | | | | | µ | Actual number of communities | Number of communities found | Q | NMI |
|---|---|---|---|---|---|---|---|---|---|---|
| LFR-11 | 6,000 | 10 | 50 | 30 | 60 | 0.1 | 125 | 128 | 0.8730 | 0.9762 |
| LFR-12 | 7,000 | 10 | 50 | 30 | 60 | 0.1 | 130 | 133 | 0.8686 | 0.9510 |
| LFR-13 | 8,000 | 10 | 50 | 30 | 60 | 0.1 | 176 | 176 | 0.8828 | 0.9805 |
| LFR-14 | 9,000 | 10 | 50 | 30 | 60 | 0.1 | 175 | 178 | 0.8722 | 0.9629 |
| LFR-15 | 10,000 | 10 | 50 | 30 | 60 | 0.1 | 175 | 180 | 0.8775 | 0.9678 |
| LFR-16 | 20,000 | 10 | 50 | 30 | 60 | 0.1 | 436 | 464 | 0.8842 | 0.9844 |
| LFR-17 | 30,000 | 10 | 50 | 30 | 60 | 0.1 | 668 | 683 | 0.8846 | 0.9851 |
| LFR-18 | 40,000 | 10 | 50 | 30 | 60 | 0.1 | 1,058 | 1,049 | 0.8837 | 0.9793 |
| LFR-19 | 50,000 | 10 | 50 | 30 | 60 | 0.1 | 1,382 | 1,341 | 0.8643 | 0.9637 |
Note:
Results on the large-scale artificial networks.
Properties of large-scale social network topology.
| Network | Nodes | Edges | Max. degree | Avg. degree | Avg. path length | Clustering coefficient |
|---|---|---|---|---|---|---|
| | 1,133 | 5,451 | 71 | 9.6220 | 3.6060 | 0.2540 |
| PB | 1,224 | 33,430 | 702 | 54.6242 | 3 | 0.2259 |
| PG | 4,941 | 6,594 | 19 | 2.6691 | 20.0941 | 0.1031 |
Note:
Parameters of the large-scale real-world network datasets.
Comparison of community division results of five classic algorithms.
| Network | GN | CNM | SC | LPA | LPA-ITSLR |
|---|---|---|---|---|---|
| 0.446/10 | 0.412/45 | 0.014/4 | |||
| PB | 0.418/205 | 0.328/62 | 0.410/3 | ||
| PG | 0.857/39 | 0.830/42 | 0.871/38 |
Note:
The data with the largest modularity value in the table is displayed in bold font, and the data with the second largest value is underlined.
Comparative analysis of time complexity of algorithms.
| Algorithm | Time complexity |
|---|---|
| GN | |
| Newman Fastgreedy | |
| Edge-Betweenness | |
| CNM | |
| SC | |
| Walktrap | |
| LPA | |
| NIBLPA | |
| LPA-MNI | |
| LPA-TS | |
| LPA-ITSLR | |