Qi Yu1, Chao Long2, Yanhua Lv3, Hongfang Shao4, Peifeng He3, Zhiguang Duan5.
Abstract
Research collaborations are encouraged because they often produce a synergistic effect that yields good results. However, creating and organizing a strong research group is a difficult task. One of the greatest concerns of an individual researcher is locating potential collaborators whose expertise best complements their own. In this paper, we propose a method that makes link predictions in co-authorship networks, where topological features between authors such as Adamic/Adar, Common Neighbors, Jaccard's Coefficient, Preferential Attachment, Katzβ, and PropFlow may be good indicators of their future collaborations. First, these topological features were systematically extracted from the network. Then, supervised models were used to learn the best weights associated with the different topological features in deciding co-author relationships. Finally, we tested our models on the co-authorship networks in the research field of Coronary Artery Disease and obtained encouraging accuracy (the precision, recall, F1 score and AUC were 0.696, 0.677, 0.671 and 0.742, respectively, for Logistic Regression, and 0.697, 0.678, 0.671 and 0.743, respectively, for SVM). This suggests that our models could be used to build and manage strong research groups.
Year: 2014 PMID: 24991920 PMCID: PMC4081126 DOI: 10.1371/journal.pone.0101214
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
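Formulas for the six topological features used in this paper.
| Type | Topological feature | Description |
| Neighborhood-based | Common Neighbors | $\lvert \Gamma(v_i) \cap \Gamma(v_j) \rvert$ |
| | Jaccard's coefficient | $\dfrac{\lvert \Gamma(v_i) \cap \Gamma(v_j) \rvert}{\lvert \Gamma(v_i) \cup \Gamma(v_j) \rvert}$ |
| | Adamic/Adar | $\sum_{v_k \in \Gamma(v_i) \cap \Gamma(v_j)} \dfrac{1}{\log \lvert \Gamma(v_k) \rvert}$ |
| | Preferential attachment | $\lvert \Gamma(v_i) \rvert \cdot \lvert \Gamma(v_j) \rvert$ |
| Path-based | Katzβ | $\sum_{l=1}^{\infty} \beta^{l} \cdot \lvert \mathrm{paths}^{\langle l \rangle}_{v_i, v_j} \rvert$ |
| | PropFlow | the probability that a restricted random walk starting at $v_i$ ends at $v_j$ |
$v_i$ denotes node $i$. $\Gamma(v_i)$ denotes the set of all neighbors of $v_i$. $\lvert \Gamma(v_i) \rvert$ denotes the number of all neighbors of $v_i$. $\mathrm{paths}^{\langle l \rangle}_{v_i, v_j}$ denotes the set of paths of length $l$ between $v_i$ and $v_j$.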
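The neighborhood-based features and Katzβ listed in the table can be computed directly from an adjacency structure. The following is a minimal sketch using the standard textbook definitions, not the authors' code. The toy graph, the function names, the `beta` value, and the truncation of the Katz sum at `max_len` steps are all assumptions made for illustration. PropFlow is omitted.

```python
import math

def neighborhood_features(adj, i, j):
    """Return the four neighborhood-based scores for the node pair (i, j).

    adj maps each node to the set of its neighbors (undirected graph).
    """
    ni, nj = adj[i], adj[j]
    common = ni & nj
    union = ni | nj
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # A common neighbor is linked to both i and j, so its degree is at
        # least 2 and the logarithm is strictly positive.
        "adamic_adar": sum(1.0 / math.log(len(adj[k])) for k in common),
        "preferential_attachment": len(ni) * len(nj),
    }

def katz_truncated(adj, i, j, beta=0.05, max_len=3):
    """Katz score truncated at max_len (assumption: the infinite sum is cut off).

    Sums beta**l times the number of length-l walks from i to j.
    """
    walks = {i: 1}  # number of walks of the current length ending at each node
    score = 0.0
    for length in range(1, max_len + 1):
        extended = {}
        for u, count in walks.items():
            for v in adj[u]:
                extended[v] = extended.get(v, 0) + count
        walks = extended
        score += beta ** length * walks.get(j, 0)
    return score

# Hypothetical toy co-authorship graph (not data from the paper).
toy = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
features = neighborhood_features(toy, "a", "d")
katz = katz_truncated(toy, "a", "d", beta=0.1, max_len=3)
```

For the pair (a, d) in this toy graph, both neighbors are shared, so the Jaccard coefficient is 1 while the preferential attachment score is simply the product of the two degrees. Truncating Katz is a common practical shortcut because long paths contribute little when `beta` is small.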
Summary of the author sets by productivity.
| Author Type | # Authors | # New Relationships | # All Possible Relationships |
| All authors | 51,555 | 137,219 | 3,838,391 |
| # Papers ≥ 5 | 7,606 | 100,335 | 2,608,004 |
| # Papers ≥ 10 | 2,435 | 64,098 | 1,529,799 |
| # Papers ≥ 25 | 394 | 19,839 | 467,493 |
| # Papers ≥ 50 | 75 | 5,285 | 117,029 |
| # Papers ≥ 100 | 9 | 593 | 15,821 |
All of the documents containing the word “coronary” in their titles, abstracts or keywords were collected from Web of Science. The scope was limited to the years 2008 through 2013. Two time periods were considered for the networks: T1 = [2008–2010], T2 = [2011–2013]. The authors were confined to those active in both the T1 and T2 periods.
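The two-period setup above can be turned into a labeled data set roughly as follows. This is a sketch under stated assumptions: the record does not say how the set of "all possible relationships" was generated, so the code simply treats every pair of authors active in both periods who did not co-author in T1 as a candidate, and labels it 1 if the pair co-authored in T2. The paper list and function names are hypothetical.

```python
from itertools import combinations

T1 = range(2008, 2011)  # 2008-2010
T2 = range(2011, 2014)  # 2011-2013

def coauthor_edges(papers, period):
    """Collect co-author pairs and the author set for papers in a period."""
    edges, authors = set(), set()
    for year, names in papers:
        if year in period:
            authors.update(names)
            # Sorting makes each undirected pair appear in one canonical order.
            for pair in combinations(sorted(set(names)), 2):
                edges.add(pair)
    return edges, authors

def label_pairs(papers):
    """Label candidate pairs: 1 if a new T2 collaboration, else 0.

    Assumption: candidates are pairs of authors active in both periods
    that were not already co-authors in T1.
    """
    edges_t1, authors_t1 = coauthor_edges(papers, T1)
    edges_t2, authors_t2 = coauthor_edges(papers, T2)
    active = sorted(authors_t1 & authors_t2)
    return {
        pair: int(pair in edges_t2)
        for pair in combinations(active, 2)
        if pair not in edges_t1
    }

# Hypothetical toy bibliography: (publication year, author list).
papers = [
    (2008, ["A", "B"]),
    (2009, ["B", "C"]),
    (2010, ["C", "D"]),
    (2011, ["A", "C"]),
    (2012, ["B", "D", "E"]),
    (2013, ["A", "B"]),
]
labels = label_pairs(papers)
```

Author E is dropped because E published only in T2, and the pair (A, B) is not a candidate because that collaboration already existed in T1.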
Test results of the LR and SVM models for the entire topological feature set vs. the baseline topological feature set.
| Evaluation Measure | Entire topological feature set | | Baseline topological feature set | |
| | LR | SVM | LR | SVM |
| Precision | 0.696 | 0.697 | 0.504 | 0.495 |
| Recall | 0.677 | 0.678 | 0.509 | 0.509 |
| F1 score | 0.671 | 0.671 | 0.361 | 0.345 |
| AUC | 0.742 | 0.743 | 0.502 | 0.501 |
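The four evaluation measures in the table can be computed from predicted labels and scores as sketched below. The record does not state how precision, recall and F1 were averaged. Because the reported F1 (0.671) is lower than both the precision and the recall, which cannot happen for a single class, the sketch assumes the measures are averaged over the two classes (macro averaging here). That is an inference, not something the source confirms. The toy labels and scores are hypothetical.

```python
def macro_precision_recall_f1(y_true, y_pred, classes=(0, 1)):
    """Per-class precision, recall and F1, averaged with equal class weight."""
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical toy predictions (not results from the paper).
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
scores = [0.9, 0.3, 0.35, 0.1]
precision, recall, f1 = macro_precision_recall_f1(y_true, y_pred)
area = auc(y_true, scores)
```

In this toy case the macro F1 also falls below both the macro precision and the macro recall, which is the same pattern seen in the reported figures.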
Figure 1. Test results of the LR model (authors with high productivity and low productivity).
Figure 2. Test results of the SVM model (authors with high productivity and low productivity).
Test results of the LR and SVM models before vs. after using the selected topological features.
| Evaluation Measure | Before using the selected topological features | | After using the selected topological features | |
| | LR | SVM | LR | SVM |
| Precision | 0.696 | 0.697 | 0.697 | 0.702 |
| Recall | 0.677 | 0.678 | 0.678 | 0.679 |
| F1 score | 0.671 | 0.671 | 0.671 | 0.672 |
| AUC | 0.742 | 0.743 | 0.744 | 0.754 |
Using feature selection methods, Adamic/Adar, Preferential attachment, Katzβ, and PropFlow were selected as the most effective features for the LR model, while Adamic/Adar, Common Neighbors, Preferential attachment, and PropFlow were selected for the SVM model.
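The record does not name the feature selection methods that were used. Purely as an illustration of the idea, the sketch below swaps in a simple stand-in: each feature is scored on its own by the AUC it achieves as a ranking of candidate pairs, and the features are then sorted by that score. The feature values, labels, and function names are hypothetical.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(labels, scores) if t == 1]
    neg = [s for t, s in zip(labels, scores) if t == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))

def rank_features(rows, labels, names):
    """Score every feature column by its single-feature AUC, best first."""
    ranked = []
    for k, name in enumerate(names):
        column = [row[k] for row in rows]
        ranked.append((name, auc(labels, column)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked

# Hypothetical feature values for four candidate pairs (not paper data).
names = ["adamic_adar", "common_neighbors", "preferential_attachment"]
rows = [
    (1.8, 2, 4),
    (0.9, 1, 9),
    (0.0, 1, 6),
    (0.0, 0, 2),
]
labels = [1, 1, 0, 0]
ranking = rank_features(rows, labels, names)
top_two = [name for name, _ in ranking[:2]]
```

A univariate ranking like this ignores interactions between features, so a model-specific selection procedure can reasonably pick different subsets for LR and SVM, as the text reports.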
Figure 3. Test results of the LR model for each topological feature.