Halil Kilicoglu (1), Graciela Rosemblat (2), Linh Hoang (3), Sahil Wadhwa (4), Zeshan Peng (2), Mario Malički (5), Jodi Schneider (3), Gerben Ter Riet (6).
1. School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, IL, USA; U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD, USA. Electronic address: halil@illinois.edu.
2. U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
3. School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, IL, USA.
4. Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL, USA.
5. Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA.
6. Urban Vitality Center of Expertise, Faculty of Health, Amsterdam University of Applied Sciences, Amsterdam, the Netherlands; Department of Cardiology Heart Center, Amsterdam UMC, University of Amsterdam, the Netherlands.
Abstract
OBJECTIVE: To annotate a corpus of randomized controlled trial (RCT) publications with the checklist items of the CONSORT reporting guidelines and to use the corpus to develop text mining methods for RCT appraisal.
METHODS: We annotated a corpus of 50 RCT articles at the sentence level using 37 fine-grained CONSORT checklist items. A subset (31 articles) was double-annotated and adjudicated, while 19 were annotated by a single annotator and reconciled by another. We calculated inter-annotator agreement at the article and section level using MASI (Measuring Agreement on Set-Valued Items) and at the CONSORT item level using Krippendorff's α. We experimented with two rule-based methods (phrase-based and section header-based) and two supervised learning approaches (support vector machine and BioBERT-based neural network classifiers) for recognizing 17 methodology-related items in the RCT Methods sections.
RESULTS: We created CONSORT-TM, consisting of 10,709 sentences, 4,845 (45%) of which were annotated with 5,246 labels. A median of 28 CONSORT items (out of a possible 37) were annotated per article. Agreement was moderate at the article and section levels (average MASI: 0.60 and 0.64, respectively). Agreement varied considerably among individual checklist items (Krippendorff's α = 0.06-0.96). The model based on BioBERT performed best overall for recognizing methodology-related items (micro-precision: 0.82, micro-recall: 0.63, micro-F1: 0.71). Combining models using majority vote improved precision further, while combining them using label aggregation improved recall.
CONCLUSION: Our annotated corpus, CONSORT-TM, contains more fine-grained information than earlier RCT corpora. The low frequency of some CONSORT items made it difficult to train effective text mining models to recognize them. For the commonly reported items, CONSORT-TM can serve as a testbed for text mining methods that assess RCT transparency, rigor, and reliability, and can support methods for peer review and authoring assistance. Minor modifications to the annotation scheme and a larger corpus could facilitate improved text mining models. CONSORT-TM is publicly available at https://github.com/kilicogluh/CONSORT-TM.
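The reported pattern (majority vote favoring precision, label aggregation favoring recall) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each model outputs a set of labels per sentence, that "label aggregation" means taking the union of the labels predicted by all models, and that micro-averaged scores are computed by pooling label decisions over all sentences. The function names, the three toy models, and the label strings are hypothetical placeholders.

```python
from collections import Counter
from typing import List, Set, Tuple

# One set of predicted labels per sentence (multi-label setting).
Predictions = List[Set[str]]


def majority_vote(model_outputs: List[Predictions]) -> Predictions:
    """Keep a label only if more than half of the models predict it."""
    n_models = len(model_outputs)
    combined = []
    for per_sentence in zip(*model_outputs):
        counts = Counter(label for labels in per_sentence for label in labels)
        combined.append({lab for lab, c in counts.items() if 2 * c > n_models})
    return combined


def label_union(model_outputs: List[Predictions]) -> Predictions:
    """Assumed reading of 'label aggregation': keep every label any model predicts."""
    return [set().union(*per_sentence) for per_sentence in zip(*model_outputs)]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def micro_prf(gold: Predictions, pred: Predictions) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 pooled over all sentences."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Toy data (hypothetical): four sentences, three models.
gold = [{"3a"}, {"4a"}, {"5"}, {"6a", "12a"}]
model_a = [{"3a"}, {"4a", "5"}, set(), {"6a"}]
model_b = [{"3a"}, {"4a"}, {"5"}, {"12a"}]
model_c = [{"3a", "4a"}, {"4a"}, set(), {"6a"}]
models = [model_a, model_b, model_c]

for name, combined in [("majority vote", majority_vote(models)),
                       ("label union", label_union(models))]:
    p, r, f = micro_prf(gold, combined)
    print(f"{name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

On this toy data, voting discards labels that only one model proposes, which removes false positives at the cost of missed labels, whereas the union keeps every proposed label and so recovers more gold labels while admitting more false positives. The same `f1_score` helper, applied to the reported micro-precision (0.82) and micro-recall (0.63), gives approximately 0.71, in line with the reported micro-F1.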