Abdul Shahid, Muhammad Tanvir Afzal, Abdullah Alharbi, Hanan Aljuaid, Shaha Al-Otaibi.
Abstract
For the past half century, identifying relevant documents has been an active area of research, driven by the rapid growth of data on the web. Traditional models for retrieving relevant documents rely on bibliographic information such as bibliographic coupling, co-citations, and direct citations. More recently, however, the scientific community has begun to use textual features to improve the accuracy of existing models. In our previous study, we found that analyzing citations at a deep level (i.e., the content level) is far more effective at finding relevant documents than analyzing them at the surface level (i.e., bibliographic details alone). Specifically, cited and citing papers showed a high degree of relevance when the cited paper was cited in-text more than five times in the citing paper's text. This paper extends our previous study by evaluating the approach on a comprehensive dataset and by comparing its results with other state-of-the-art approaches, namely content-based, metadata-based, and bibliography-based techniques. For the evaluation, a user study was conducted on papers selected from 1,200 documents (comprising about 16,000 references) of an online journal, the Journal of Universal Computer Science (J.UCS). The results indicate that in-text citation frequency achieves higher precision in finding relevant papers than other state-of-the-art techniques such as content-based, bibliographic coupling, and metadata-based techniques. The use of in-text citations may help enhance the quality of existing information systems and digital libraries. Further, more sophisticated measures may be defined by taking in-text citations into account. ©2021 Shahid et al.
Keywords: Citations; Digital Libraries; In-text Citation; Relevant Documents
Year: 2021 PMID: 34150995 PMCID: PMC8189020 DOI: 10.7717/peerj-cs.524
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1: System architecture for computing in-text citation frequencies of references in the body text of the article.
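Figure 1 outlines a system that counts how often each reference is cited in an article's body text. The core counting step can be sketched in Python as follows. This sketch assumes numeric bracketed markers such as [3], [2, 5], or [4-6]. The regular expression and the function name `in_text_citation_frequencies` are illustrative assumptions, not the authors' implementation. Extracting body text from PDFs is out of scope.

```python
import re
from collections import Counter

# Matches bracketed numeric citation groups such as "[3]", "[2, 5]" or "[4-6]".
# Assumption: the article uses a numeric citation style; author-year styles
# such as "(Smith, 2010)" would need a different pattern.
CITATION_GROUP = re.compile(r"\[(\d+(?:\s*[-,]\s*\d+)*)\]")


def in_text_citation_frequencies(body_text: str) -> Counter:
    """Return a Counter mapping reference number -> in-text citation count."""
    counts: Counter = Counter()
    for group in CITATION_GROUP.findall(body_text):
        for part in group.split(","):
            bounds = [int(x) for x in part.split("-")]
            # A single number counts once; a range like "4-6" counts 4, 5 and 6.
            counts.update(range(bounds[0], bounds[-1] + 1))
    return counts


text = "Prior work [1] and [2, 3] was later extended [1-3], see also [1]."
print(in_text_citation_frequencies(text))
```

Each marker inside a group counts as one in-text citation of that reference. A range is expanded so that every reference it covers is counted once.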
Figure 2: The distribution of in-text citation frequencies in various ranges.
Randomly selected papers used in the user study to evaluate the role of in-text citations in finding relevant papers.
| ID | Title | | | | | |
|---|---|---|---|---|---|---|
| 1001 | Behavioral Institutions and Refinements in | 12 | 8 | 35 | 27 | 8 |
| 1136 | Constant Size Ciphertext HIBE in the Augmented | 13 | 10 | 14 | 10 | 4 |
| 114 | Hausdorff Measure and Lukasiewicz Languages | 11 | 12 | 23 | 16 | 7 |
| 1140 | An Approach to Polygonal Approximation of Digital Curves Based on Discrete Particle Swarm Algorithm | 10 | 10 | 17 | 15 | 2 |
| 118 | Sequential Computability of a Function. | 11 | 12 | 17 | 11 | 6 |
| 218 | Incremental Maintenance of Data Warehouses Based on Past Temporal Logic Operators | 10 | 9 | 35 | 33 | 2 |
| 248 | An Automatic Verification Technique for | 9 | 3 | 32 | 24 | 8 |
| 299 | Lazy Cyclic Reference Counting | 9 | 8 | 16 | 13 | 3 |
| 53 | Consensus-Based Hybrid Adaptation of Web Systems User Interfaces | 11 | 2 | 26 | 17 | 9 |
| 58 | On Theoretical Upper Bounds for Routing Estimation | 11 | 6 | 11 | 11 | 0 |
| Total | | | | 226 | 177 | 49 |
In-text citation frequencies of cited papers in citing papers.
| P/ R | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| R1 | 1 | 7 | 2 | 2 | 1 | 1 | 1 | 6 | 1 | 2 |
| R2 | 4 | 21 | 3 | 1 | 6 | 2 | 1 | 2 | 1 | 1 |
| R3 | 5 | 4 | 1 | 2 | 3 | 2 | 2 | 6 | 1 | 11 |
| R4 | 3 | 1 | 5 | 1 | 7 | 5 | 2 | 3 | 1 | 1 |
| R5 | 4 | 2 | 2 | 13 | 1 | 5 | 2 | 10 | 3 | 2 |
| R6 | 9 | 3 | 12 | 0 | 8 | 16 | 3 | 1 | 5 | 1 |
| R7 | 10 | 10 | 9 | 1 | 17 | 3 | 1 | 5 | 10 | 1 |
| R8 | 3 | 3 | n/a | 1 | 4 | 6 | n/a | 1 | 12 | 1 |
| R9 | 3 | 6 | n/a | n/a | 4 | 4 | n/a | n/a | 2 | n/a |
| R10 | n/a | 2 | n/a | n/a | 2 | 2 | n/a | n/a | 14 | n/a |
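The abstract reports that cited and citing papers are highly relevant when the cited paper is cited in-text more than five times. Figure 4 compares the top five recommendations per technique.

The Python sketch below applies both rules to the first column of the in-text citation frequency table (input paper 1). It reads the column top to bottom as R1 to R10, in the same row order as the other tables; R10 is n/a for this paper. Ranking candidates purely by raw in-text frequency is an assumption, and `rank_by_frequency`, `highly_cited`, and `top_k` are hypothetical helper names.

```python
# In-text citation frequencies of references R1-R9 in input paper 1
# (first column of the table; R10 is n/a for this paper and is omitted).
paper_1 = {"R1": 1, "R2": 4, "R3": 5, "R4": 3, "R5": 4,
           "R6": 9, "R7": 10, "R8": 3, "R9": 3}

# Threshold from the abstract: relevant when cited in-text more than five times.
THRESHOLD = 5


def rank_by_frequency(freqs: dict[str, int]) -> list[str]:
    """Reference labels by descending frequency; ties keep table order."""
    # sorted() stays stable with reverse=True, so equal counts keep their order.
    return sorted(freqs, key=lambda ref: freqs[ref], reverse=True)


def highly_cited(freqs: dict[str, int], threshold: int = THRESHOLD) -> list[str]:
    """References cited in-text more than `threshold` times."""
    return [ref for ref in rank_by_frequency(freqs) if freqs[ref] > threshold]


def top_k(freqs: dict[str, int], k: int = 5) -> list[str]:
    """The k most frequently cited references."""
    return rank_by_frequency(freqs)[:k]


print(highly_cited(paper_1))  # ['R7', 'R6']
print(top_k(paper_1))         # ['R7', 'R6', 'R3', 'R2', 'R5']
```

For this column, only R6 (9) and R7 (10) pass the "more than five" threshold. A top-five cut-off also admits R3, R2, and R5.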
The accuracy scores of recommendations generated by each approach for different input articles.
| ID | In-text citations | Content-based | Bibliographic coupling | Title terms |
|---|---|---|---|---|
| 1001 | 0.80 | 0.60 | 0.40 | 0.40 |
| 1136 | 1.00 | 0.80 | 0.60 | 0.60 |
| 114 | 0.80 | 0.60 | 0.40 | 0.40 |
| 1140 | 1.00 | 0.80 | 0.60 | 0.60 |
| 118 | 1.00 | 0.60 | 0.40 | 0.40 |
| 218 | 1.00 | 0.60 | 0.80 | 0.40 |
| 248 | 1.00 | 1.00 | 0.60 | 0.60 |
| 299 | 1.00 | 0.80 | 0.60 | 0.80 |
| 53 | 1.00 | 0.80 | 0.20 | 0.20 |
| 58 | 1.00 | 1.00 | 1.00 | 0.40 |
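Every score in the accuracy table is a multiple of 0.2. That pattern is consistent with precision over five recommendations, since Figure 4 shows the top five per technique. Assuming that reading, the Python sketch below defines precision at k and averages each approach's scores as transcribed from the table. The averages are computed here from the listed values and are not figures reported by the authors.

```python
from statistics import mean


def precision_at_k(relevant_flags: list[bool], k: int = 5) -> float:
    """Fraction of the top-k recommendations that users judged relevant."""
    return sum(relevant_flags[:k]) / k


# Accuracy scores transcribed from the table, one list per approach,
# in row order 1001, 1136, 114, 1140, 118, 218, 248, 299, 53, 58.
scores = {
    "In-text citations":      [0.80, 1.00, 0.80, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00],
    "Content-based":          [0.60, 0.80, 0.60, 0.80, 0.60, 0.60, 1.00, 0.80, 0.80, 1.00],
    "Bibliographic coupling": [0.40, 0.60, 0.40, 0.60, 0.40, 0.80, 0.60, 0.60, 0.20, 1.00],
    "Title terms":            [0.40, 0.60, 0.40, 0.60, 0.40, 0.40, 0.60, 0.80, 0.20, 0.40],
}

# Mean accuracy per approach across the ten input papers.
mean_scores = {name: round(mean(values), 2) for name, values in scores.items()}
for name, value in mean_scores.items():
    print(f"{name}: {value:.2f}")
```

Four relevant items among five recommendations gives 0.80, matching the granularity of the table. Averaged over the ten papers, the listed values give 0.96, 0.76, 0.56, and 0.48 for the four approaches.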
Cosine similarity between cited and citing papers, computed from their TF-IDF term vectors.
| P/ R | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| R1 | 0.29 | 0.28 | 0.19 | 0.28 | 0.26 | 0.21 | 0.04 | 0.26 | 0.14 | 0.12 |
| R2 | 0.34 | 0.35 | 0.28 | 0.03 | 0.27 | 0.28 | 0.2 | 0.25 | 0.18 | 0.33 |
| R3 | 0.3 | 0.25 | 0.17 | 0.19 | 0.38 | 0.19 | 0.29 | 0.43 | 0.15 | 0.38 |
| R4 | 0.3 | 0.18 | 0.2 | 0.35 | 0.18 | 0.3 | 0.21 | 0.38 | 0.21 | 0.16 |
| R5 | 0.49 | 0.22 | 0.4 | 0.43 | 0.5 | 0.2 | 0.23 | 0.5 | 0.19 | 0.24 |
| R6 | 0.21 | 0.2 | 0.13 | 0.07 | 0.43 | 0.27 | 0.74 | 0.45 | 0.34 | 0.11 |
| R7 | 0.24 | 0.46 | 0.07 | 0.09 | 0.37 | 0.27 | 0.08 | 0.34 | 0.32 | 0.04 |
| R8 | 0 | 0.24 | n/a | 0.11 | 0.21 | 0.23 | n/a | 0.19 | 0.48 | 0.02 |
| R9 | 0.11 | 0.19 | n/a | n/a | 0.19 | 0.21 | n/a | n/a | 0.03 | n/a |
| R10 | n/a | 0.1 | n/a | n/a | 0 | 0.17 | n/a | n/a | 0.05 | n/a |
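The content-based scores are cosine similarities between TF-IDF term vectors. The self-contained Python sketch below rests on three assumptions:

- tf is the raw term count;
- idf is the smoothed form used by default in scikit-learn's `TfidfTransformer`, ln((1 + N) / (1 + df)) + 1;
- document frequencies come from the whole collection rather than a single pair. With only two documents, the unsmoothed ln(N/df) would give every shared term zero weight.

The authors' tokenization, stemming, and stop-word handling are not described here, so this is an approximation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens (no stemming or stop-word removal)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors: raw term counts times smoothed idf (assumption)."""
    tokenized = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: each document contributes each distinct term once.
    df = Counter(term for counts in tokenized for term in counts)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return [{t: c * idf[t] for t, c in counts.items()} for counts in tokenized]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = ["reference counting garbage collection",
        "cyclic reference counting",
        "particle swarm optimisation"]
vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 2), cosine(vecs[0], vecs[2]))
```

Papers that share no terms score exactly 0, as several cells of the table do. Shared but common terms contribute less because of their lower idf.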
Figure 3: The total number of recommendations produced by each technique.
The number of common references between cited and citing papers.
| P/ R | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| R1 | 4 | 5 | 0 | 7 | 5 | 2 | 0 | 4 | 0 | 3 |
| R2 | 2 | 8 | 0 | 0 | 0 | 2 | 0 | 4 | 0 | 1 |
| R3 | 3 | 2 | 11 | 1 | 1 | 0 | 1 | 0 | 0 | 4 |
| R4 | 5 | 4 | 0 | 7 | 4 | 7 | 0 | 1 | 0 | 2 |
| R5 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 5 | 0 | 2 |
| R6 | 15 | 5 | 13 | 0 | 0 | 3 | 22 | 8 | 0 | 0 |
| R7 | 2 | 11 | 0 | 0 | 8 | 3 | 1 | 1 | 1 | 0 |
| R8 | 4 | 0 | n/a | 0 | 8 | 4 | n/a | 1 | 0 | 0 |
| R9 | 0 | 4 | n/a | n/a | 3 | 3 | n/a | n/a | 0 | n/a |
| R10 | n/a | 1 | n/a | n/a | 0 | 1 | n/a | n/a | 0 | n/a |
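Bibliographic coupling strength is the number of references two papers have in common, which is what the table above reports. In the Python sketch below, `normalize_reference` is a hypothetical, deliberately crude matcher. The paper's reference-resolution step is not described here, and real systems typically match on DOIs or use fuzzy string matching.

```python
import re


def normalize_reference(ref: str) -> str:
    """Crude normalization: lowercase, drop punctuation, collapse whitespace.

    Assumption: this only hides formatting differences between two
    renderings of the same reference; it does not resolve typos or variants.
    """
    ref = re.sub(r"[^\w\s]", " ", ref.lower())
    return " ".join(ref.split())


def coupling_strength(refs_a: list[str], refs_b: list[str]) -> int:
    """Number of references that appear in both reference lists."""
    return len({normalize_reference(r) for r in refs_a}
               & {normalize_reference(r) for r in refs_b})


citing = ["Smith, J. (2010) Paper A.", "Doe (2012) Paper B"]
cited = ["smith j 2010 paper a", "Lee (2015) Paper C"]
print(coupling_strength(citing, cited))  # 1
```

Sets are used so that a reference listed twice in one bibliography is counted only once.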
Title-based similarity between cited and citing papers.
| P/ R | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| R1 | 0 | 1 | 0 | 1 | 0 | 3 | 0 | 1 | 0 | 1 |
| R2 | 0 | 1 | 1 | 3 | 1 | 1 | 1 | 0 | 0 | 0 |
| R3 | 2 | 0 | 1 | 0 | 1 | 2 | 1 | 3 | 0 | 0 |
| R4 | 0 | 0 | 0 | 6 | 1 | 1 | 0 | 3 | 2 | 1 |
| R5 | 0 | 0 | 3 | 0 | 5 | 0 | 1 | 3 | 0 | 0 |
| R6 | 1 | 0 | 0 | 0 | 0 | 0 | 6 | 2 | 0 | 0 |
| R7 | 0 | 3 | 0 | 0 | 2 | 1 | 0 | 2 | 0 | 0 |
| R8 | 0 | 0 | n/a | 0 | 1 | 1 | n/a | 0 | 1 | 0 |
| R9 | 0 | 0 | n/a | n/a | 0 | 3 | n/a | n/a | 0 | n/a |
| R10 | n/a | 0 | n/a | n/a | 0 | 1 | n/a | n/a | 0 | n/a |
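The title table reports small integer scores, which suggests a count of shared title terms. Assuming that reading, the sketch below is in Python. The stop-word list is hypothetical, and the authors' exact preprocessing is not given.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "based", "for", "in", "of", "on", "the", "to", "with"}


def title_terms(title: str) -> set[str]:
    """Distinct lowercase title words, excluding stop words."""
    return {t for t in re.findall(r"[a-z]+", title.lower()) if t not in STOP_WORDS}


def common_title_terms(title_a: str, title_b: str) -> int:
    """Number of distinct non-stop-word terms shared by two titles."""
    return len(title_terms(title_a) & title_terms(title_b))


# The first title is from the paper list; the second is a made-up example.
print(common_title_terms("Lazy Cyclic Reference Counting",
                         "Reference Counting for Cyclic Structures"))  # 3
```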
The similarity score between cited and citing papers using their keywords.
| P/ R | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| R1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| R2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| R3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| R4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| R5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| R6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| R7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| R8 | 0 | 0 | n/a | 0 | 0 | 0 | n/a | 0 | 0 | 0 |
| R9 | 0 | 0 | n/a | n/a | 0 | 0 | n/a | n/a | 0 | n/a |
| R10 | n/a | 0 | n/a | n/a | 0 | 0 | n/a | n/a | 0 | n/a |
Author-based similarity scores between cited and citing papers.
| P/ R | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| R1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| R2 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| R3 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| R4 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| R5 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| R6 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 |
| R7 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| R8 | 0 | 0 | n/a | 0 | 0 | 0 | n/a | 0 | 1 | 0 |
| R9 | 0 | 0 | n/a | n/a | 0 | 0 | n/a | n/a | 0 | n/a |
| R10 | n/a | 0 | n/a | n/a | 0 | 0 | n/a | n/a | 0 | n/a |
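The keyword and author tables also report small integer scores. One plausible reading, sketched here as an assumption, is the number of whole keywords or whole author names two papers share after case and whitespace normalization. This differs from the title sketch, which matches individual terms. Real author matching would also need name disambiguation, which this Python sketch does not attempt.

```python
def normalized_set(items: list[str]) -> set[str]:
    """Case-insensitive set of keywords or names with whitespace collapsed."""
    return {" ".join(item.lower().split()) for item in items}


def overlap(items_a: list[str], items_b: list[str]) -> int:
    """Number of whole keywords (or author names) shared by two papers."""
    return len(normalized_set(items_a) & normalized_set(items_b))


# Hypothetical keyword and author lists for two papers.
print(overlap(["Digital Libraries", "Citations"], ["digital  libraries", "Ranking"]))  # 1
print(overlap(["Alice Smith", "Bob Jones"], ["Carol White"]))  # 0
```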
Figure 4: Top five recommendations of the in-text citation, content-based, bibliographic coupling, and metadata-based techniques.