Min Song, Keun Young Kang, Tatsawan Timakum, Xinyuan Zhang.
Abstract
Acknowledgements have been examined as important elements in measuring the contributions to, and intellectual debts of, a scientific publication. Unlike previous studies, which were limited in scope and relied on manual examination, the present study aimed to classify acknowledgements automatically on a large-scale dataset. To this end, we first created a training dataset for acknowledgement classification by sampling acknowledgements sections from the entire PubMed Central database. Second, we applied various supervised learning algorithms to examine which algorithm performed best under which conditions. In addition, we examined the factors affecting classification performance, focusing on three main aspects: classification algorithms, categories, and text representations. The CNN+Doc2Vec algorithm achieved the highest performance, with 93.58% accuracy on the original dataset and 87.93% on the converted dataset. The experimental results indicated that the characteristics of the categories and sentence patterns influenced classification performance. Most of the classifiers performed better on the financial category and on the peer interactive communication and technical support category than on the other classes.
Year: 2020 PMID: 32059035 PMCID: PMC7021295 DOI: 10.1371/journal.pone.0228928
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Acknowledgements categories of previous studies.
| (A1) Acknowledgements categories of different scholars | (A2) Acknowledgements categories of previous studies in conclusion |
|---|---|
| Mackintosh | 1. Facilities |
| Patel | 1. Technical assistance |
| McCain | 1. Access to research-related information |
| Cronin | 1. Paymaster |
| Tiew and Sen | 1. Moral support |
| Cronin, Shaw, and LaBarre | 1. Conceptual |
| Rattan | 1. Access support |
Fig 1. Overall procedure of the proposed approach.
Categories of acknowledgements in this study.
| Categories | Description |
|---|---|
| Peer Interactive Communication and Technical Support | The acknowledgement statement thanks a person or organization for advice and support during the research process and the reporting of the study as a manuscript. This category represents specific suggestions and technical and analysis support in research, such as study design, help in the use of laboratory tools and space, and sample preparation. Moreover, it includes discussions, comments, and assessments on the study, report, and manuscript, including reviewing, editing, proofreading, and linguistic support. |
| Declaration | The declaration statement relates to conflicts of interest, access to unpublished data, and copyright as well as the authority of a person or organization. It also relates to ethics approval and consent with permission to publish the data and manuscript. This category involves moral support and privacy concerns as well as declarations of lack of funding. |
| Presentation | The statement of presentation states whether a portion of the work, including an abstract, poster, or oral presentation, was presented at a conference, proceeding, or seminar. |
| Financial | This acknowledges any grants or scholarships received by the researcher from external or internal funding. |
| General Acknowledgement | In this statement, the authors express gratitude to persons or organizations that encouraged them during the study. This acknowledgement is not for specific assistance. |
| General Statement | The general statement presents information not related to a study directly (e.g., information on a person or institute). This is an example of the |
Examples of noun phrase patterns in each type of acknowledgements category.
| Acknowledgements categories with noun phrase patterns of previous study Paul-Hua et al. [ | Acknowledgements categories with noun phrase patterns of this study (Six categories) |
|---|---|
Number of sentences for each category.
| Category | Number of Sentences |
|---|---|
| Declaration | 1,871 |
| Financial | 2,450 |
| Peer Interactive Communication and Technical Support | 2,532 |
| Presentation | 1,209 |
| General Acknowledgement | 1,448 |
| General Statement | 1,383 |
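As a reference point for the accuracy figures reported later, the class distribution implied by these counts can be worked out directly. The short sketch below sums the six counts from the table and derives each category's share plus the accuracy of always predicting the largest class. The majority-class baseline is not reported in this record; it is computed here only as an illustrative yardstick, and the variable names are assumptions.

```python
# Sentence counts per category, copied from the table above.
counts = {
    "Declaration": 1871,
    "Financial": 2450,
    "Peer Interactive Communication and Technical Support": 2532,
    "Presentation": 1209,
    "General Acknowledgement": 1448,
    "General Statement": 1383,
}

total = sum(counts.values())

# Share of each category as a percentage of all labelled sentences.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the largest class
# (illustrative baseline only, not a result from the study).
majority_label = max(counts, key=counts.get)
majority_baseline = counts[majority_label] / total

print(total)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
print(f"majority baseline: {majority_baseline:.4f}")
```

The two largest categories are of similar size, so the dataset is only mildly imbalanced and the trivial baseline stays well below every accuracy listed in the result tables.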
Fig 2. Performance results of seven classifiers.
Fig 3. Overall procedure of text conversion.
Examples of text representations.
| Category | Original sentence | Transformed sentence |
|---|---|---|
| Declaration | All participants gave written informed consent before inclusion after adequate explanation of the study protocol. | All participants gave written informed consent before inclusion declaration |
| Financial | ChemMatCARS Sector 15 is supported by the National Science Foundation under grant number NSF/CHE-1346572. | This project has been supported by the ORGANIZATION GRANT_NO finance |
| Peer Interactive Communication and Technical Support | Zoher Gueroui and Christophe Sandt are gratefully acknowledged for access to the fluorescence and FTIR microscopes, respectively. | PERSON and PERSON are gratefully acknowledged for access |
| Presentation | This work was presented in part at the 104th Annual Meeting of the American Society for Microbiology, New Orleans, LA, 23–27 May 2004. | This work was presented in part SET LOCATION ORGANIZATION GRANT_NO presentation |
| General Acknowledgement | We appreciate the participation of the patients and their families. | We appreciate the participation of the patients |
| General Statement | R. N. is an Investigator of the Howard Hughes Medical Institute. | PERSON is an Investigator of the ORGANIZATION |
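The transformed sentences in the table replace concrete names with type placeholders such as PERSON, ORGANIZATION, and GRANT_NO. The sketch below illustrates only that placeholder substitution, under stated assumptions: the `entities` dictionary is a hand-written stand-in for whatever entity recognition step the study used, `convert_sentence` is a hypothetical helper name, and the grant-number pattern is a guess fitted to the single example in the table. It does not reproduce the truncation or the trailing category keywords visible in some transformed sentences.

```python
import re

# Assumed pattern for grant identifiers such as "NSF/CHE-1346572";
# illustrative only, not taken from the study.
GRANT_RE = re.compile(r"\b[A-Z]{2,}(?:/[A-Z]{2,})?-\d{4,}\b")


def convert_sentence(sentence, entities):
    """Replace known entity strings with their type placeholders.

    `entities` maps a surface string to a placeholder label and stands in
    for the output of a named-entity recognizer (an assumption).
    """
    # Replace longer surface forms first so nested names are not split.
    for surface in sorted(entities, key=len, reverse=True):
        sentence = sentence.replace(surface, entities[surface])
    # Mask anything that looks like a grant identifier.
    return GRANT_RE.sub("GRANT_NO", sentence)


example = convert_sentence(
    "R. N. is an Investigator of the Howard Hughes Medical Institute.",
    {"R. N.": "PERSON", "Howard Hughes Medical Institute": "ORGANIZATION"},
)
print(example)  # PERSON is an Investigator of the ORGANIZATION.
```

Masking entities this way collapses many distinct names into a few tokens, which is one plausible reason the converted dataset behaves differently from the original one in the result tables.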
Fig 4. Frequency distribution for original dataset.
Fig 5. Frequency distribution for converted dataset.
Performance result of classification algorithms (F1-measure).
| Model | Original dataset (%) | Converted dataset (%) |
|---|---|---|
| CNN | 79.7 | 80.8 |
| RNN | 55.9 | 70.5 |
| CNN + Doc2vec | ||
| kNN | 80.3 | 80.1 |
| Logistic regression | 89.7 | 86.9 |
| Naïve Bayes | 88.1 | 87.2 |
| N-gram language model | 87.8 | 85.8 |
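To make one of the baselines in the table above concrete, the following is a minimal sketch of a multinomial Naïve Bayes sentence classifier in plain Python. It is not the authors' implementation: the tokenizer, the add-one smoothing value, the class name `NaiveBayes`, and the choice to train on the six original example sentences from the text-representation table are all assumptions made for illustration. With one training sentence per class, the output shows the mechanics only and says nothing about real performance.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Very rough tokenizer (assumption): lowercase, drop periods and commas.
    return text.lower().replace(".", " ").replace(",", " ").split()


class NaiveBayes:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # sentences per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def fit(self, sentences, labels):
        for text, label in zip(sentences, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(n_docs / total_docs)
            denom = (sum(self.word_counts[label].values())
                     + self.alpha * len(self.vocab))
            for tok in tokenize(text):
                if tok in self.vocab:  # unseen tokens are ignored
                    count = self.word_counts[label][tok]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set: the original example sentences from the table of
# text representations, one per category.
train = [
    ("Declaration", "All participants gave written informed consent before inclusion after adequate explanation of the study protocol."),
    ("Financial", "ChemMatCARS Sector 15 is supported by the National Science Foundation under grant number NSF/CHE-1346572."),
    ("Peer Interactive Communication and Technical Support", "Zoher Gueroui and Christophe Sandt are gratefully acknowledged for access to the fluorescence and FTIR microscopes, respectively."),
    ("Presentation", "This work was presented in part at the 104th Annual Meeting of the American Society for Microbiology, New Orleans, LA, 23–27 May 2004."),
    ("General Acknowledgement", "We appreciate the participation of the patients and their families."),
    ("General Statement", "R. N. is an Investigator of the Howard Hughes Medical Institute."),
]

model = NaiveBayes().fit([s for _, s in train], [c for c, _ in train])
print(model.predict("This study was supported by the National Science Foundation."))
print(model.predict("We appreciate the patients."))
```

A realistic experiment would train on the full labelled set and hold out data for evaluation; the sketch only shows how word counts per category turn into a predicted label.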
Average of proportional performance results on categories with original dataset.
| Algorithms (Accuracy) | Categories | |||||
|---|---|---|---|---|---|---|
| Peer interactive communication and technical support | Declaration | Presentation | Financial | General acknowledgement | General statement | |
| CNN (80.00%) | 8.57 | 17.14 | 11.43 | 7.14 | 14.29 | |
| RNN (55.80%) | 5.48 | 8.27 | 12.57 | 6.07 | 3.92 | |
| CNN+Doc2vec (93.58%) | 21.66 | 17.21 | 9.73 | 11.38 | 11.61 | |
| kNN (79.67%) | 14.04 | 13.77 | 11.01 | 10.65 | 11.89 | |
| Logistic regression (89.54%) | 18.72 | 15.37 | 11.84 | 10.60 | 12.12 | |
| Naïve Bayes (88.21%) | 20.28 | 15.97 | 11.75 | 8.86 | 10.33 | |
| N-gram language model (87.88%) | 20.24 | 15.92 | 11.75 | 8.81 | 10.10 | |
Average of proportional performance results on categories with converted dataset.
| Algorithms (Accuracy) | Categories | |||||
|---|---|---|---|---|---|---|
| Peer interactive communication and technical support | Declaration | Presentation | Financial | General acknowledgement | General statement | |
| CNN (80.00%) | 15.71 | 8.57 | 5.71 | 18.57 | 11.43 | |
| RNN (69.03%) | 11.43 | 11.54 | 13.42 | 9.50 | 6.89 | |
| CNN+Doc2vec (87.93%) | 20.00 | 14.82 | 11.47 | 10.97 | 10.56 | |
| kNN (80.40%) | 19.60 | 12.07 | 11.47 | 7.53 | 9.64 | |
| Logistic regression (87.15%) | 20.93 | 14.92 | 11.89 | 8.44 | 9.96 | |
| Naïve Bayes (87.20%) | 20.24 | 15.05 | 11.38 | 9.73 | 9.87 | |
| N-gram language model (85.87%) | 19.96 | 15.00 | 11.38 | 9.78 | 9.18 | |
Fig 6. Performance comparison by accuracy measures on original and converted datasets.