Tao Wang 1,2, Mengyu Jiao 1, Xiaoxia Wang 3
Abstract
Link prediction is an important task in network analysis and modeling: it predicts missing links in current networks and new links in future networks. To improve link prediction performance, we integrate the global, local, and quasi-local topological information of networks and propose a novel stacking ensemble framework for link prediction. Our approach employs random forest-based recursive feature elimination to select the structural features relevant to a network and constructs a two-level stacking ensemble model involving various machine learning methods. The lower level is composed of three base classifiers, i.e., logistic regression, gradient boosting decision tree, and XGBoost; their outputs are then integrated by an XGBoost model in the upper level. Extensive experiments were conducted on six networks. Comparison results show that the proposed method obtains better prediction results and more robust applicability.
Keywords: complex networks; ensemble learning; link prediction; recursive feature elimination; stacking
Year: 2022 PMID: 36010793 PMCID: PMC9407261 DOI: 10.3390/e24081124
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.738
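The two-level stacking pipeline described in the abstract can be sketched with scikit-learn on synthetic data. This is a minimal illustration, not the authors' exact configuration: `GradientBoostingClassifier` stands in for XGBoost (which is a separate package), the feature selector mirrors RF-RFE with a random-forest ranker, and all dataset sizes and hyperparameters are made up.

```python
# Sketch of RF-RFE feature selection feeding a two-level stacking ensemble.
# Lower level: logistic regression + two boosted-tree learners; upper level:
# another boosted model (standing in for the paper's XGBoost meta-learner).
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the structural-feature matrix of candidate links.
X, y = make_classification(n_samples=400, n_features=12, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("gbdt", GradientBoostingClassifier(random_state=0)),
        ("gbdt2", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=GradientBoostingClassifier(random_state=0),
)

# Random-forest-based recursive feature elimination, then the stack.
model = make_pipeline(
    RFE(RandomForestClassifier(n_estimators=50, random_state=0),
        n_features_to_select=6),
    stack,
)
model.fit(X_tr, y_tr)
acc = model.score(X_te, y_te)
```

`StackingClassifier` trains the base learners with internal cross-validation and fits the meta-learner on their out-of-fold predictions, which is the usual guard against the leakage a naive two-level fit would introduce.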
Basic statistics of the six networks. Notation: N is the number of nodes, M the number of links, C the clustering coefficient, R the assortativity coefficient, and H the degree heterogeneity.

| Networks | N | M | C | R | H |
|---|---|---|---|---|---|
| Celegans | 297 | 2148 | 0.308 | −0.163 | 1.800 |
| Vicker | 29 | 376 | 0.733 | −0.157 | 0.982 |
| Email | 1133 | 5451 | 0.254 | 0.078 | 1.942 |
| NS | 1589 | 2742 | 0.791 | 0.462 | 2.011 |
| SciMet | 3084 | 10,399 | 0.175 | −0.033 | 2.78 |
| Router | 5022 | 6258 | 0.033 | −0.138 | 5.503 |
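The statistics in the table above can be computed with the standard definitions; a stdlib sketch on a small illustrative graph (not one of the paper's networks), taking H as the usual degree-heterogeneity measure ⟨k²⟩/⟨k⟩²:

```python
# N, M, clustering coefficient C, assortativity R, and heterogeneity H
# for a toy undirected graph given as an edge list.
from itertools import combinations

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 0)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

N = len(adj)    # number of nodes
M = len(edges)  # number of links

def local_cc(n):
    # Fraction of a node's neighbour pairs that are themselves linked.
    k = len(adj[n])
    if k < 2:
        return 0.0
    closed = sum(1 for a, b in combinations(adj[n], 2) if b in adj[a])
    return 2 * closed / (k * (k - 1))

C = sum(local_cc(n) for n in adj) / N  # average clustering coefficient

degs = [len(adj[n]) for n in adj]
H = (sum(d * d for d in degs) / N) / (sum(degs) / N) ** 2

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Assortativity R: Pearson correlation of end-point degrees over all
# edges, each edge counted in both directions.
xs = [len(adj[u]) for u, v in edges] + [len(adj[v]) for u, v in edges]
ys = [len(adj[v]) for u, v in edges] + [len(adj[u]) for u, v in edges]
R = pearson(xs, ys)
```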
Figure 1. Difference in matching scores between the LP index and the PA index.
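Figure 1 contrasts the local path (LP) and preferential attachment (PA) indices. A minimal sketch of both, assuming their standard definitions, s_PA(x, y) = k_x · k_y and s_LP(x, y) = (A²)ₓᵧ + ε(A³)ₓᵧ, on a small illustrative graph:

```python
# LP and PA similarity indices for candidate links on a toy graph.
adj = {
    0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 4}, 3: {1}, 4: {2},
}

def pa_index(x, y):
    # PA: product of the endpoint degrees.
    return len(adj[x]) * len(adj[y])

def paths2(x, y):
    # (A^2)_{xy}: number of length-2 walks, i.e. common neighbours.
    return len(adj[x] & adj[y])

def paths3(x, y):
    # (A^3)_{xy}: number of length-3 walks x -> a -> b -> y.
    return sum(1 for a in adj[x] for b in adj[a] if y in adj[b])

def lp_index(x, y, eps=0.01):
    # LP: common neighbours plus a small contribution from length-3 walks.
    return paths2(x, y) + eps * paths3(x, y)
```

With this graph, the pair (3, 4) has no common neighbour but one length-3 walk, so LP ranks it above a score of zero while PA only reflects the low degrees of both endpoints.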
Figure 2. The SELLP model.
Figure 3. The CVS of selected features.
AUC results of the base and ensemble models on six networks.
| Methods | Celegans | Vicker | Email | NS | SciMet | Router |
|---|---|---|---|---|---|---|
| Results Based on the Original Features |||||||
| SELLP | 0.8290 | 0.9236 | 0.9538 | 0.9424 | 0.9378 | 0.9604 |
| GBDT | 0.7750 | 0.9079 | 0.9461 | 0.9308 | 0.9138 | 0.9477 |
| XGBoost | 0.7604 | 0.9085 | 0.9457 | 0.9460 | 0.9268 | 0.9531 |
| LR | 0.7318 | 0.8631 | 0.9365 | 0.9263 | 0.8967 | 0.9287 |
| Results Based on the Selected Features |||||||
| RF-RFE-SELLP | 0.9118 | 0.9525 | 0.9949 | 0.9747 | 0.9764 | 0.9884 |
| RF-RFE-GBDT | 0.8663 | 0.9419 | 0.9747 | 0.9476 | 0.9438 | 0.9680 |
| RF-RFE-XGBoost | 0.8059 | 0.9278 | 0.9786 | 0.9521 | 0.9563 | 0.9769 |
| RF-RFE-LR | 0.8263 | 0.9121 | 0.9627 | 0.9497 | 0.9432 | 0.9581 |
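AUC values like those above are conventionally estimated in link prediction by comparing the score a method assigns to a held-out true link against the score of a sampled nonexistent link: AUC = (wins + 0.5 · ties) / comparisons. A stdlib sketch with made-up scores:

```python
# Pairwise-comparison estimate of AUC for link-prediction scores.
import itertools

pos_scores = [0.9, 0.7, 0.7, 0.4]  # scores of held-out true links (illustrative)
neg_scores = [0.6, 0.3, 0.7, 0.1]  # scores of sampled nonexistent links

wins = ties = 0
for p, n in itertools.product(pos_scores, neg_scores):
    if p > n:
        wins += 1
    elif p == n:
        ties += 1
auc = (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))
# Here: 12 wins and 2 ties out of 16 comparisons, so auc = 0.8125.
```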
Accuracy results of the base and ensemble models on six networks.
| Methods | Celegans | Vicker | Email | NS | SciMet | Router |
|---|---|---|---|---|---|---|
| Results Based on the Original Features |||||||
| SELLP | 0.8260 | 0.9246 | 0.9536 | 0.9524 | 0.9447 | 0.9600 |
| GBDT | 0.7681 | 0.9093 | 0.9245 | 0.9434 | 0.9237 | 0.9276 |
| XGBoost | 0.7826 | 0.9030 | 0.9453 | 0.9461 | 0.9279 | 0.9492 |
| LR | 0.7681 | 0.8813 | 0.9353 | 0.9361 | 0.9019 | 0.9248 |
| Results Based on the Selected Features |||||||
| RF-RFE-SELLP | 0.8985 | 0.9534 | 0.9945 | 0.9747 | 0.9764 | 0.9884 |
| RF-RFE-GBDT | 0.8405 | 0.9369 | 0.9745 | 0.9611 | 0.9537 | 0.9780 |
| RF-RFE-XGBoost | 0.8405 | 0.9355 | 0.9854 | 0.9606 | 0.9572 | 0.9768 |
| RF-RFE-LR | 0.8115 | 0.9081 | 0.9726 | 0.9592 | 0.9328 | 0.9492 |
Precision results of the base and ensemble models on six networks.
| Methods | Celegans | Vicker | Email | NS | SciMet | Router |
|---|---|---|---|---|---|---|
| Results Based on the Original Features |||||||
| SELLP | 0.8246 | 0.9294 | 0.9545 | 0.9488 | 0.9481 | 0.9652 |
| GBDT | 0.7868 | 0.8919 | 0.9309 | 0.9375 | 0.9100 | 0.9474 |
| XGBoost | 0.7822 | 0.9097 | 0.9455 | 0.9419 | 0.9310 | 0.9523 |
| LR | 0.7791 | 0.8696 | 0.9370 | 0.9221 | 0.9098 | 0.9259 |
| Results Based on the Selected Features |||||||
| RF-RFE-SELLP | 0.8743 | 0.9670 | 0.9927 | 0.9794 | 0.9853 | 0.9913 |
| RF-RFE-GBDT | 0.8614 | 0.9320 | 0.9727 | 0.9548 | 0.9510 | 0.9789 |
| RF-RFE-XGBoost | 0.8367 | 0.9469 | 0.9845 | 0.9592 | 0.9524 | 0.9744 |
| RF-RFE-LR | 0.8189 | 0.8934 | 0.9781 | 0.9481 | 0.9428 | 0.9576 |
F1-score results of the base and ensemble models on six networks.
| Methods | Celegans | Vicker | Email | NS | SciMet | Router |
|---|---|---|---|---|---|---|
| Results Based on the Original Features |||||||
| SELLP | 0.8571 | 0.9213 | 0.9536 | 0.9421 | 0.9374 | 0.9601 |
| GBDT | 0.8048 | 0.9059 | 0.9457 | 0.9304 | 0.9136 | 0.9478 |
| XGBoost | 0.8314 | 0.9066 | 0.9455 | 0.9457 | 0.9265 | 0.9533 |
| LR | 0.8260 | 0.8552 | 0.9363 | 0.9257 | 0.8960 | 0.9290 |
| Results Based on the Selected Features |||||||
| RF-RFE-SELLP | 0.9156 | 0.9501 | 0.9945 | 0.9744 | 0.9762 | 0.9885 |
| RF-RFE-GBDT | 0.8607 | 0.9380 | 0.9745 | 0.9473 | 0.9436 | 0.9682 |
| RF-RFE-XGBoost | 0.8817 | 0.9261 | 0.9785 | 0.9512 | 0.9559 | 0.9769 |
| RF-RFE-LR | 0.8395 | 0.9109 | 0.9624 | 0.9486 | 0.9429 | 0.9584 |
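The accuracy, precision, and F1-score values reported in these tables follow their standard definitions from confusion counts; a small sketch with illustrative numbers:

```python
# Classification metrics from a toy confusion matrix
# (tp/fp/fn/tn values are made up for illustration).
tp, fp, fn, tn = 90, 10, 15, 85

accuracy = (tp + tn) / (tp + fp + fn + tn)   # fraction of correct predictions
precision = tp / (tp + fp)                   # fraction of predicted links that exist
recall = tp / (tp + fn)                      # fraction of true links recovered
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
```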
Figure 4. AUC results of all comparative methods on six networks. The best performance is emphasized in green for each network.
Figure 5. Accuracy results of all comparative methods on six networks. The best performance is emphasized in green for each network.
Figure 6. Precision results of all comparative methods on six networks. The best performance is emphasized in green for each network.
Figure 7. F1-score results of all comparative methods on six networks. The best performance is emphasized in green for each network.
Statistical test for RF-RFE-SELLP vs. AdaBoost, RF, MLP, CN, MFI, and SRW in mean Accuracy on the Email network.
| | AdaBoost | RF | MLP | CN | MFI | SRW |
|---|---|---|---|---|---|---|
| | 0.48 | 0.45 | 0.11 | 0.02 | 0.14 | 0.19 |
| p-value | 4.95 × 10⁻⁵ | 3.41 × 10⁻³ | 1.25 × 10⁻⁴ | 3.14 × 10⁻⁷ | 4.56 × 10⁻⁹ | 6.09 × 10⁻⁹ |
| Mean accuracy | 0.9341 | 0.9461 | 0.9249 | 0.6227 | 0.5441 | 0.8161 |
| Mean accuracy of RF-RFE-SELLP | 0.9748 | 0.9748 | 0.9748 | 0.9748 | 0.9748 | 0.9748 |
Figure 8Precision results with different training set sizes.
Figure 9F1-score results with different training set sizes.