Shahzad Nazir¹, Muhammad Asif¹, Shahbaz Ahmad¹, Faisal Bukhari², Muhammad Tanvir Afzal³, Hanan Aljuaid⁴
Abstract
A citation is regarded as a potential parameter for determining the linkage between research articles. Citations have been used extensively in various academic tasks, such as calculating the impact factor of journals and the h-index of researchers, allocating research grants, and identifying the latest research trends. The current state of the art contends that not all citations are of equal importance. Based on this argument, the citation classification community currently categorizes citations as important or non-important. The community has proposed different approaches for identifying important citations, including citation-count, context-based, metadata-based, and text-based approaches. However, the contemporary state of the art ignores potentially significant features that can play a vital role in citation classification. This research presents a novel approach for binary citation classification that exploits section-wise in-text citation frequencies, a similarity score, and the overall citation count as features. The study also introduces a novel machine-learning-based approach for assigning appropriate weights to the logical sections of research papers; each citation is then weighted according to the section in which it appears. To perform the classification, we used three techniques: Support Vector Machine, Kernel Linear Regression, and Random Forest. The experiments were performed on two annotated benchmark datasets containing 465 and 311 citation pairs of research articles, respectively. The results revealed that the proposed approach achieved higher precision than the contemporary state-of-the-art approach (0.84 vs. 0.72).
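The feature set named in the abstract can be pictured as one numeric vector per citation pair. The sketch below is a minimal illustration under stated assumptions: it uses the four section names from the tables in this record, treats the overall citation count as the sum of the section-wise counts (as in the section-wise frequency table), and computes the similarity score as cosine similarity over word counts. The class `CitationPair`, its fields, and the similarity choice are hypothetical; the abstract does not specify how the similarity score is computed.

```python
import math
from collections import Counter
from dataclasses import dataclass

SECTIONS = ("Introduction", "Literature Review", "Methodology", "Results and Discussions")


@dataclass
class CitationPair:
    # Hypothetical container for one (cited paper, citing paper) pair.
    section_counts: dict  # in-text citation frequency per logical section
    cited_text: str       # assumed: some text of the cited paper, e.g. its abstract
    citing_text: str      # assumed: some text of the citing paper


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words vectors (an assumed similarity measure)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def feature_vector(pair: CitationPair) -> list:
    """Section-wise frequencies, then the total citation count, then similarity."""
    per_section = [float(pair.section_counts.get(s, 0)) for s in SECTIONS]
    total = float(sum(per_section))
    return per_section + [total, cosine_similarity(pair.cited_text, pair.citing_text)]
```

For example, a pair cited once in Methodology and once in Results and Discussions yields a vector starting with `[0.0, 0.0, 1.0, 1.0, 2.0]`, followed by the similarity value. Vectors of this shape are what a classifier such as an SVM or Random Forest would consume.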
Year: 2020 PMID: 32134940 PMCID: PMC7058319 DOI: 10.1371/journal.pone.0228885
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Overall methodology diagram.
Statistical information of data set D1.
| Class | Annotation | Binary label | Label value | Citations |
|---|---|---|---|---|
| 0 | Related Work | Non-important | 0 | 398 (classes 0 and 1 combined) |
| 1 | Comparing with work | Non-important | 0 | |
| 2 | Using work | Important | 1 | 67 (classes 2 and 3 combined) |
| 3 | Extending the work | Important | 1 | |
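The table collapses four fine-grained annotation classes into the two labels used for binary classification: classes 0 and 1 become non-important (0), and classes 2 and 3 become important (1). Data set D2 uses the same scheme. A minimal Python sketch of that mapping follows; the function names are hypothetical.

```python
from collections import Counter

# Fine-grained annotation classes as listed in the table above.
FINE_GRAINED = {
    0: "Related Work",
    1: "Comparing with work",
    2: "Using work",
    3: "Extending the work",
}


def to_binary(fine_class: int) -> int:
    """Collapse classes 0-1 to non-important (0) and classes 2-3 to important (1)."""
    if fine_class not in FINE_GRAINED:
        raise ValueError(f"unknown annotation class: {fine_class}")
    return 1 if fine_class >= 2 else 0


def label_distribution(fine_labels):
    """Count binary labels for a sequence of fine-grained annotations."""
    return Counter(to_binary(c) for c in fine_labels)
```

Applied to the full D1 annotations, `label_distribution` would yield the 398/67 split reported in the table.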
Annotated data set D1.
| Annotator | Paper | Cited-by | Follow-up |
|---|---|---|---|
| A | H05-1079 | W08-2004 | 0 |
| A | H05-1079 | N06-1005 | 1 |
| A | I05-2038 | C10-1076 | 0 |
| B | P05-1045 | D11-1135 | 0 |
| B | P05-1045 | D11-1141 | 1 |
Statistical information of data set D2.
| Class | Annotation | Binary label | Label value | Citations |
|---|---|---|---|---|
| 0 | Related Work | Non-important | 0 | 216 (classes 0 and 1 combined) |
| 1 | Comparing with work | Non-important | 0 | |
| 2 | Using work | Important | 1 | 95 (classes 2 and 3 combined) |
| 3 | Extending the work | Important | 1 | |
Titles and IDs of the papers in data set D2.
| ID | Titles |
|---|---|
| 1 | JavaSymphony: A Programming and Execution Environment |
| 2 | Scheduling JavaSymphony Applications on Many-Core Parallel Computers |
| 3 | On the Evaluation of JavaSymphony for Heterogeneous Multi-core Clusters |
| 4 | Parallelism as a Concern in Java through Fork-join Synchronization Patterns |
| 5 | The JavaSymphony Extensions for Parallel GPU Computing |
Annotated data set D2.
| PaperID | CitedBy | Fine Grained Value |
|---|---|---|
| 1 | 2 | 1 |
| 1 | 3 | 1 |
| 1 | 4 | 0 |
| 1 | 5 | 0 |
| 1 | 6 | 0 |
Section-wise citation frequency.
| Annotator | Paper | Cited-by | Follow-up | Total Frequency | Introduction Frequency | Literature Frequency | Methodology Frequency | Results Frequency |
|---|---|---|---|---|---|---|---|---|
| A | H05-1079 | W08-2004 | 0 | 1 | 1 | 0 | 0 | 0 |
| A | H05-1079 | N06-1005 | 1 | 4 | 4 | 0 | 0 | 0 |
| A | I05-2038 | C10-1076 | 0 | 1 | 0 | 0 | 1 | 0 |
| B | P05-1045 | D11-1135 | 0 | 1 | 0 | 0 | 0 | 1 |
| B | P05-1045 | D11-1141 | 1 | 2 | 0 | 0 | 1 | 1 |
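Each row above counts how often the cited paper is mentioned in each logical section of the citing paper, plus the total. A minimal sketch of how such a row could be built, assuming every in-text citation has already been assigned to one of the four logical sections (mapping raw section headings to these labels is outside this sketch). The input format and the function name are hypothetical.

```python
from collections import Counter

# Column names follow the section-wise citation frequency table above.
SECTIONS = ("Introduction", "Literature", "Methodology", "Results")


def section_frequencies(occurrences):
    """Build one table row from the section label of each in-text citation.

    `occurrences` lists the logical section of every place where the cited
    paper is mentioned in the citing paper (hypothetical input format).
    """
    counts = Counter(occurrences)
    unknown = set(counts) - set(SECTIONS)
    if unknown:
        raise ValueError(f"unmapped sections: {sorted(unknown)}")
    row = {"Total": sum(counts.values())}
    row.update({s: counts.get(s, 0) for s in SECTIONS})
    return row
```

For instance, two mentions, one in Methodology and one in Results, reproduce the D11-1141 row: a total of 2 with one citation in each of those sections.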
Statistics of dataset D1.
| Citation pairs | Important | Non-important | Section | Citations |
|---|---|---|---|---|
| 457 | 69 | 388 | Introduction | 155 |
| | | | Literature Review | 131 |
| | | | Methodology | 404 |
| | | | Results and Discussions | 77 |
Statistics of dataset D2.
| Citation pairs | Important | Non-important | Section | Citations |
|---|---|---|---|---|
| 282 | 89 | 193 | Introduction | 157 |
| | | | Literature Review | 122 |
| | | | Methodology | 116 |
| | | | Results and Discussions | 69 |
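The two statistics tables show that both datasets are imbalanced toward non-important pairs, which matters when reading precision figures. The short sketch below uses only the counts from the tables: it checks that the important and non-important counts add up to the number of citation pairs and prints the share of important pairs. The dictionary layout and function name are illustrative.

```python
# Counts copied from the statistics tables for D1 and D2.
DATASETS = {
    "D1": {"pairs": 457, "important": 69, "non_important": 388},
    "D2": {"pairs": 282, "important": 89, "non_important": 193},
}


def class_balance(stats):
    """Return the share of important pairs after checking the counts add up."""
    if stats["important"] + stats["non_important"] != stats["pairs"]:
        raise ValueError("class counts do not sum to the number of pairs")
    return stats["important"] / stats["pairs"]


for name, stats in DATASETS.items():
    print(f"{name}: {class_balance(stats):.1%} important")
```

This prints `D1: 15.1% important` and `D2: 31.6% important`.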
Normalized section weights obtained by multiple regression.
| Sections | Weights | Weight Rank |
|---|---|---|
| Introduction | 0.1891921316 | 3 |
| Literature Review | 0.1470393226 | 4 |
| Methodology | 0.3663496373 | 1 |
| Results and Discussions | 0.2974189085 | 2 |
Normalized section weights obtained by a neural network.
| Sections | Weights | Weight Rank |
|---|---|---|
| Introduction | 0.19095378 | 3 |
| Literature Review | 0.17626289 | 4 |
| Methodology | 0.28763501 | 2 |
| Results and Discussions | 0.34514832 | 1 |
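The Weight Rank columns in the two tables above follow directly from the weights: sorting sections in descending order of weight gives rank 1 to the largest. The sketch below reproduces both rank columns from the published weights. The helper name is hypothetical, and it assumes there are no ties.

```python
# Section weights copied from the two tables above.
MR_WEIGHTS = {
    "Introduction": 0.1891921316,
    "Literature Review": 0.1470393226,
    "Methodology": 0.3663496373,
    "Results and Discussions": 0.2974189085,
}
NN_WEIGHTS = {
    "Introduction": 0.19095378,
    "Literature Review": 0.17626289,
    "Methodology": 0.28763501,
    "Results and Discussions": 0.34514832,
}


def rank_weights(weights):
    """Map each section to its rank, 1 being the largest weight (no ties assumed)."""
    ordered = sorted(weights, key=weights.get, reverse=True)
    return {section: rank for rank, section in enumerate(ordered, start=1)}
```

Comparing the two rankings shows that the methods agree on the two lowest-weighted sections (Introduction, then Literature Review) but swap the top two: multiple regression favors Methodology, while the neural network favors Results and Discussions.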
Multiple regression results on section-wise citation counts.
| Total Frequency | Introduction | Literature Review | Methodology | Results and Discussions |
|---|---|---|---|---|
| 2 | 0 | 0 | 0.7326992746 | 0 |
| 1 | 0.1891921316 | 0 | 0 | 0 |
| 2 | 0.1891921316 | 0.1470393226 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0.2974189085 |
| 1 | 0 | 0 | 0 | 0.2974189085 |
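Each non-zero cell in the table above equals the section's in-text citation frequency multiplied by that section's multiple-regression weight. For example, the first row's two Methodology citations give 2 × 0.3663496373 = 0.7326992746, and the third row's single Introduction and Literature Review citations keep their weights unchanged. A minimal sketch of that product follows, with a hypothetical function name.

```python
# Multiple-regression section weights copied from the weight table.
MR_WEIGHTS = {
    "Introduction": 0.1891921316,
    "Literature Review": 0.1470393226,
    "Methodology": 0.3663496373,
    "Results and Discussions": 0.2974189085,
}


def weighted_section_scores(frequencies, weights):
    """Multiply each section's in-text citation frequency by the section weight."""
    unknown = set(frequencies) - set(weights)
    if unknown:
        raise ValueError(f"no weight for sections: {sorted(unknown)}")
    return {section: frequencies.get(section, 0) * w for section, w in weights.items()}
```

Passing the neural-network weights instead reproduces the next table in the same way, for example 2 × 0.28763501 = 0.57527002 for two Methodology citations.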
Neural network results on section-wise citation counts.
| Total Frequency | Introduction | Literature Review | Methodology | Results and Discussions |
|---|---|---|---|---|
| 2 | 0 | 0 | 0.57527002 | 0 |
| 1 | 0.19095378 | 0 | 0 | 0 |
| 2 | 0.19095378 | 0.17626289 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0.34514832 |
| 1 | 0 | 0 | 0 | 0.34514832 |
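The weighted cells in each row can be combined into a single number per citation pair. This record does not state how the overall scores shown in Figs 4 and 5 are computed. Summing the weighted section scores is one plausible aggregation, shown here only as an assumption. The resulting value could then be supplied to a classifier as one feature. The function name is hypothetical.

```python
# Neural-network section weights copied from the weight table.
NN_WEIGHTS = {
    "Introduction": 0.19095378,
    "Literature Review": 0.17626289,
    "Methodology": 0.28763501,
    "Results and Discussions": 0.34514832,
}


def overall_score(frequencies, weights):
    """Sum of (frequency x section weight) over all sections (assumed aggregation)."""
    return sum(frequencies.get(section, 0) * w for section, w in weights.items())
```

Under this assumption, the third row of the table above (one Introduction and one Literature Review citation) would score 0.19095378 + 0.17626289 = 0.36721667.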
Fig 2. Evaluating section-wise weight scores for D1.
Fig 3. Evaluating section-wise weight scores for D2.
Fig 4. Overall score for D1.
Fig 5. Overall score for D2.
Fig 6. Comparison of results on Valenzuela’s dataset.
Fig 7. F-measure comparison on dataset D2.
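The abstract reports precision, and Fig 7 compares F-measure. Both are standard quantities derived from true-positive, false-positive, and false-negative counts. The sketch below treats the important class (label 1) as the positive class. That choice is an assumption, since this record does not say whether the reported scores are per-class or averaged. The label lists in the test are made up for illustration and are not the paper's data.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F-measure for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Because both datasets contain far fewer important than non-important pairs, precision on the important class is a stricter measure than overall accuracy.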