Halil Kilicoglu (1), Graciela Rosemblat (1), Mario Malicki (2, 3), Gerben Ter Riet (2)
(1) Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, Bethesda, MD, USA.
(2) Department of General Practice, Academic Medical Center, Amsterdam, The Netherlands.
(3) Department of Research in Biomedicine and Health, University of Split School of Medicine, Split, Croatia.
Abstract
Objective: To automatically recognize self-acknowledged limitations in clinical research publications, in support of efforts to improve research transparency.
Methods: To develop our recognition methods, we used a set of 8431 sentences from 1197 PubMed Central articles. A subset of these sentences was manually annotated for training and testing, and inter-annotator agreement was calculated. We cast the recognition problem as a binary classification task: determining whether a given sentence from a publication discusses self-acknowledged limitations. We experimented with three methods: a rule-based approach based on document structure, supervised machine learning, and a semi-supervised method that uses self-training to expand the training set and thereby improve classification performance. The machine learning algorithms used were logistic regression (LR) and support vector machines (SVM).
Results: Annotators had good agreement in labeling limitation sentences (Krippendorff's α = 0.781). Of the three methods, the rule-based method yielded the best performance, with 91.5% accuracy (95% CI [90.1-92.9]), while self-training with SVM led to a small improvement over fully supervised learning (89.9%, 95% CI [88.4-91.4] vs 89.6%, 95% CI [88.1-91.1]).
Conclusions: The approach presented can be incorporated into the workflows of stakeholders focusing on research transparency to improve reporting of limitations in clinical studies.
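The self-training idea mentioned in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny naive Bayes scorer stands in for the LR and SVM classifiers named above, and the function names (`train`, `prob_limitation`, `self_train`), the 0.8 confidence threshold, the round limit, and the example sentences are all assumptions made for illustration.

```python
import math
from collections import Counter

def train(labeled):
    """Fit a tiny naive Bayes model (an assumed stand-in for LR/SVM)."""
    counts = {0: Counter(), 1: Counter()}   # word counts per class
    docs = Counter()                        # sentence counts per class
    for text, label in labeled:
        counts[label].update(text.lower().split())
        docs[label] += 1
    vocab = set(counts[0]) | set(counts[1])
    return counts, docs, vocab

def prob_limitation(model, text):
    """Return the estimated probability that a sentence is a limitation (class 1)."""
    counts, docs, vocab = model
    n0, n1 = sum(counts[0].values()), sum(counts[1].values())
    log_odds = math.log((docs[1] + 1) / (docs[0] + 1))
    for word in text.lower().split():
        if word in vocab:  # ignore words never seen in training
            p1 = (counts[1][word] + 1) / (n1 + len(vocab))
            p0 = (counts[0][word] + 1) / (n0 + len(vocab))
            log_odds += math.log(p1 / p0)
    return 1.0 / (1.0 + math.exp(-log_odds))

def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Grow the training set with confidently self-labeled sentences."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, rest = [], []
        for text in pool:
            p = prob_limitation(model, text)
            if p >= threshold:
                confident.append((text, 1))
            elif p <= 1 - threshold:
                confident.append((text, 0))
            else:
                rest.append(text)   # still uncertain, keep for a later round
        if not confident:
            break                   # nothing new to learn from
        labeled.extend(confident)
        pool = rest
    return train(labeled), labeled

# Hypothetical seed annotations and unlabeled sentences, for illustration only.
seed = [
    ("a limitation of this study is the small sample size", 1),
    ("our study has several limitations", 1),
    ("patients were randomized to two groups", 0),
    ("blood samples were collected at baseline", 0),
]
pool = [
    "another limitation is the short follow-up",
    "samples were analyzed in two batches",
]
model, expanded = self_train(seed, pool)
```

The design choice that matters is the threshold: only sentences the current model labels with high confidence join the training set, which limits how much labeling noise each round introduces.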