Renchu Guan, Dong Xu, Xiaoyue Feng, Hao Zhang, Yijie Ren, Penghui Shang, Yi Zhu, Yanchun Liang.
Abstract
BACKGROUND: It is of great importance for researchers to publish research results in high-quality journals. However, it is often challenging to choose the most suitable publication venue, given the exponential growth of journals and conferences. Although recommender systems have achieved success in promoting movies, music, and products, very few studies have explored recommendation of publication venues, especially for biomedical research. No recommender system exists that can specifically recommend journals in PubMed, the largest collection of biomedical literature.
Keywords: PubMed; biomedical literature; convolutional neural network; deep learning; recommender system
Year: 2019; PMID: 31127715; PMCID: PMC6555124; DOI: 10.2196/12957
Source DB: PubMed; Journal: J Med Internet Res; ISSN: 1438-8871; Impact factor: 5.428
Figure 1. Architecture of our Pubmender system. CNN: convolutional neural network; ISSN: International Standard Serial Number.
Figure 2. The structure of our deep convolutional neural network model.
Details of the first dataset (Jan 2007 to Dec 2016).
| Statistic | Number of journals^a | Number of papers^b |
| 100≤x^c≤400 | 740 | 157,038 |
| 400<x≤2000 | 330 | 259,676 |
| 2000<x≤10,000 | 55 | 195,426 |
| >10,000 | 5 | 225,742 |
| Total | 1130 | 837,882 |
| Maximum class size | 1 | 153,608 |
| Minimum class size | 4 | 100 |
| Average class size | N/A^d | 741 |
^a This represents the total number of journals in this range.
^b This represents the total number of papers published in all journals in this range.
^c x represents the number of papers published in one journal.
^d N/A: not applicable.
Word statistics of abstracts.
| Size | Number of abstracts |
| 20≤x^a<50 | 25,499 |
| 50≤x<100 | 76,614 |
| 100≤x<150 | 139,420 |
| 150≤x<200 | 227,993 |
| 200≤x<250 | 191,156 |
| 250≤x<300 | 87,597 |
| 300≤x<350 | 46,275 |
| x>350 | 43,328 |
^a x denotes the number of words in the abstract.
Hyperparameters of convolutional operation.
| Convolutional layer | Convolution kernel count | Window size |
| First | 256 | 3 |
| Second | 128 | 4 |
| Third | 96 | 5 |
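To make the hyperparameter table concrete, the following minimal sketch tracks how the sequence length and weight count change through three stacked 1D convolutions with these kernel counts and window sizes. It is shape arithmetic only, not the authors' implementation: stride 1, no padding, no pooling between layers, and one bias per kernel are all assumptions, and `seq_len` and `embed_dim` are hypothetical inputs that the table does not specify.

```python
from typing import Dict, List, Tuple

# (kernel count, window size) for the three layers, taken from the table above.
CONV_LAYERS: List[Tuple[int, int]] = [(256, 3), (128, 4), (96, 5)]


def conv_stack_summary(seq_len: int, embed_dim: int) -> List[Dict[str, int]]:
    """Track output length and weight count through stacked 1D convolutions.

    Assumptions (not from the source): stride 1, no padding, no pooling
    between layers, one bias term per kernel.
    """
    rows = []
    in_channels, length = embed_dim, seq_len
    for kernels, window in CONV_LAYERS:
        # An unpadded convolution shortens the sequence by (window - 1).
        length = length - window + 1
        if length <= 0:
            raise ValueError("input too short for this window size")
        # Each kernel spans `window` positions across all input channels.
        params = window * in_channels * kernels + kernels
        rows.append({"kernels": kernels, "window": window,
                     "out_len": length, "params": params})
        in_channels = kernels  # the next layer sees one channel per kernel
    return rows


if __name__ == "__main__":
    # Hypothetical input: 100 tokens, 50-dimensional embeddings.
    for row in conv_stack_summary(seq_len=100, embed_dim=50):
        print(row)
```

Under these assumptions, an input shorter than 10 tokens cannot pass through all three layers, since the windows remove 2 + 3 + 4 = 9 positions in total.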
Accuracy of Bi-LSTM, fastText, and Pubmender. Italicized values indicate the best results. acc@N represents the accuracy for top-N selection.
| Methods | acc@1 | acc@3 | acc@5 |
| fastText | 0.66 | 0.86 | 0.92 |
| Bi-LSTM^a (max-pooling) | 0.71 | 0.90 | 0.95 |
| Pubmender | | | |
^a Bi-LSTM: bi-directional long short-term memory.
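The acc@N metric used in these tables can be sketched as follows: a paper counts as a hit if its true journal appears among the first N recommendations. The function name `acc_at_n` and the toy journal labels are illustrative assumptions, not names from the paper.

```python
from typing import Sequence


def acc_at_n(ranked: Sequence[Sequence[str]], truth: Sequence[str], n: int) -> float:
    """Fraction of papers whose true journal is within the top-n recommendations.

    ranked[i] lists the recommended journals for paper i, best first.
    truth[i] is the journal in which paper i was actually published.
    """
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same length")
    if not truth:
        raise ValueError("need at least one paper")
    hits = sum(1 for recs, label in zip(ranked, truth) if label in recs[:n])
    return hits / len(truth)


if __name__ == "__main__":
    # Toy data with three hypothetical journals A, B, and C.
    ranked = [["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"], ["A", "C", "B"]]
    truth = ["A", "A", "A", "B"]
    for n in (1, 2, 3):
        print(n, acc_at_n(ranked, truth, n))
```

Because the top-N list only grows with N, acc@N never decreases as N increases, which matches the left-to-right trend in every row of the tables.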
Accuracy of the classification by Pubmender and other systems. Italicized values indicate the best results. acc@N represents the accuracy for top-N selection.
| Methods | Paper count | Journal count | acc@1 | acc@3 | acc@5 | acc@10 |
| Pubmender | 837,882 | 1130 | | | | |
| MAS^a [ | 58,466 | 300 | —^b | — | 0.24 | 0.46 |
| ACM^c [ | 172,890 | 2197 | — | — | 0.56 | 0.70 |
| CiteSeer [ | 35,020 | 739 | — | — | 0.24 | 0.29 |
^a MAS: Microsoft Academic Search.
^b Experimental evaluation is not available.
^c ACM: Association for Computing Machinery.
Pubmender top-N accuracy (acc@N) on class-imbalanced data, grouped by journal size. acc@N represents the accuracy for top-N selection.
| Paper count range | acc@1 | acc@3 | acc@5 | acc@10 | Paper count |
| Tiny | 0.27 | 0.44 | 0.54 | 0.66 | 16,337 |
| Small | 0.43 | 0.63 | 0.72 | 0.82 | 26,259 |
| Medium | 0.62 | 0.81 | 0.88 | 0.94 | 19,588 |
| Large | 0.66 | 0.91 | 0.96 | 0.98 | 22,579 |
| All | 0.50 | 0.71 | 0.78 | 0.86 | 84,763 |
Macro-average and Micro-average metrics for recommendation results.
| Metrics | Macro-average precision | Macro-average recall | Macro-average F1 | Micro-average precision | Micro-average recall | Micro-average F1 |
| Top-1 | 0.38 | 0.32 | 0.33 | 0.50 | 0.50 | 0.50 |
| Top-3 | 0.37 | 0.50 | 0.41 | 0.45 | 0.71 | 0.55 |
| Top-5 | 0.35 | 0.59 | 0.42 | 0.42 | 0.78 | 0.55 |
| Top-10 | 0.32 | 0.70 | 0.42 | 0.38 | 0.86 | 0.53 |
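The difference between the two averaging schemes can be illustrated with a minimal sketch for the top-1 case. Macro-averaging computes precision, recall, and F1 per journal and then averages them, so small journals weigh as much as large ones. Micro-averaging pools the counts over all papers first. How the paper extends these metrics to top-3, top-5, and top-10 is not stated here, so this sketch covers single-label predictions only, and taking macro F1 as the mean of per-class F1 values is an assumption.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Triple = Tuple[float, float, float]  # (precision, recall, F1)


def _ratio(num: int, den: int) -> float:
    return num / den if den else 0.0


def _f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r) if (p + r) else 0.0


def macro_micro(truth: Sequence[str], pred: Sequence[str]) -> Dict[str, Triple]:
    """Macro- and micro-averaged precision, recall, and F1 for top-1 predictions."""
    if not truth or len(truth) != len(pred):
        raise ValueError("truth and pred must be non-empty and equally long")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(truth, pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted journal gains a false positive
            fn[t] += 1  # true journal gains a false negative
    classes = sorted(set(truth) | set(pred))
    per_class = []
    for c in classes:
        p = _ratio(tp[c], tp[c] + fp[c])
        r = _ratio(tp[c], tp[c] + fn[c])
        per_class.append((p, r, _f1(p, r)))
    # Macro: unweighted mean over classes (assumed definition of macro F1).
    macro = tuple(sum(row[i] for row in per_class) / len(classes) for i in range(3))
    # Micro: pool the counts over all classes, then compute the ratios once.
    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    mp = _ratio(total_tp, total_tp + total_fp)
    mr = _ratio(total_tp, total_tp + total_fn)
    return {"macro": macro, "micro": (mp, mr, _f1(mp, mr))}


if __name__ == "__main__":
    # Toy imbalanced data: four papers in journal A, one in journal B.
    print(macro_micro(["A", "A", "A", "A", "B"], ["A"] * 5))
```

With one prediction per paper, every error is both a false positive and a false negative, so micro-averaged precision, recall, and F1 all equal plain accuracy. That agrees with the Top-1 row, where all three micro values are 0.50, the same as acc@1 in the "All" row of the imbalance table. The toy example also shows why macro scores fall below micro scores when small classes are predicted poorly.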
Comparison between Pubmender and Journal Finder. Italicized values indicate the best results. acc@N represents the accuracy for top-N selection.
| Systems | acc@1 | acc@3 | acc@5 | acc@10 |
| Pubmender | | | | |
| Journal Finder | 0.05 | 0.12 | 0.13 | 0.21 |
| Improvement (%) | 1140 | 525 | 546 | 329 |
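The table does not say how the "Improvement (%)" row is computed. A common reading, assumed here, is the relative increase over the baseline: (new − baseline) / baseline × 100. The sketch below implements that reading with hypothetical numbers; the function name `improvement_pct` is illustrative.

```python
def improvement_pct(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent.

    Assumed definition: (new - baseline) / baseline * 100.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical accuracies: a system at 0.60 versus a baseline at 0.20.
    print(round(improvement_pct(0.60, 0.20)))  # tripling the baseline is a 200% gain
```

Because the baseline accuracies are rounded to two decimals, percentages recomputed from the rounded table entries may differ slightly from the published ones.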
Comparison between Pubmender and Journal Suggester. Italicized values indicate the best results. acc@N represents the accuracy for top-N selection.
| Systems | acc@1 | acc@3 | acc@5 | acc@10 |
| Pubmender | | | | |
| Journal Suggester | 0.11 | 0.15 | 0.17 | 0.18 |
| Improvement (%) | 418 | 440 | 412 | 406 |