Satya S Sahoo (1), Joshua Valdez (2), Matthew Kim (3), Michael Rueschman (3), Susan Redline (3)
1. Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA. Electronic address: satya.sahoo@case.edu.
2. Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA.
3. Department of Medicine, Brigham and Women's Hospital and Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA.
Abstract
OBJECTIVE: Reproducibility of research studies is key to advancing biomedical science by building on sound results and reducing inconsistencies between published results and study data. We propose that the available data from research studies combined with provenance metadata provide a framework for evaluating scientific reproducibility. We developed the ProvCaRe platform to model, extract, and query semantic provenance information from 435,248 published articles. METHODS: The ProvCaRe platform consists of: (1) the S3 model and a formal ontology; (2) a provenance-focused text processing workflow that generates provenance triples, each consisting of a subject, predicate, and object, from metadata extracted from articles; and (3) the ProvCaRe knowledge repository, which supports "provenance-aware" hypothesis-driven search queries. A new provenance-based ranking algorithm ranks the articles in the search query results. RESULTS: The ProvCaRe knowledge repository contains 48.9 million provenance triples. Seven research hypotheses were used as search queries for evaluation, and the resulting provenance triples were analyzed using five categories of provenance terms. The largest share of terms (34%) described provenance related to the population cohort, followed by 29% describing statistical data analysis methods; only 5% of the terms described the measurement instruments used in a study. In addition, the analysis showed that some articles included a higher number of provenance terms across multiple provenance categories, suggesting a higher potential for reproducibility of these research studies. CONCLUSION: The ProvCaRe knowledge repository (https://provcare.case.edu/) is one of the largest provenance resources for biomedical research studies, and it combines intuitive search functionality with a new provenance-based ranking feature to list articles related to a search query.
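To make the notion of a provenance triple and provenance-based ranking more concrete, the following is a minimal sketch in Python. It is not the ProvCaRe implementation: the abstract does not specify the ranking algorithm, so the scoring rule used here (number of distinct provenance categories covered, then number of triples) is an assumption chosen only to mirror the observation that articles with terms across multiple categories may be more reproducible. The names `ProvenanceTriple` and `rank_articles`, the article identifiers, the predicates, and the sample values are all hypothetical; only three category labels are used because the abstract names only those three.

```python
from collections import defaultdict
from typing import NamedTuple


class ProvenanceTriple(NamedTuple):
    """A subject-predicate-object statement tied to an article (illustrative only)."""
    article_id: str  # hypothetical article identifier
    subject: str
    predicate: str
    obj: str
    category: str    # provenance category label (assumed to be attached to each triple)


def rank_articles(triples):
    """Rank article ids by distinct categories covered, then by triple count.

    This scoring rule is an assumption for illustration, not the published algorithm.
    """
    categories = defaultdict(set)
    counts = defaultdict(int)
    for t in triples:
        categories[t.article_id].add(t.category)
        counts[t.article_id] += 1
    # Higher coverage first, then more triples, then article id for a stable order.
    return sorted(counts, key=lambda a: (-len(categories[a]), -counts[a], a))


# Made-up sample triples; none of these values come from the repository.
triples = [
    ProvenanceTriple("article-A", "study", "hasCohort", "adults", "population cohort"),
    ProvenanceTriple("article-A", "study", "usedMethod", "logistic regression", "statistical data analysis"),
    ProvenanceTriple("article-A", "study", "usedInstrument", "polysomnography", "measurement instrument"),
    ProvenanceTriple("article-B", "study", "hasCohort", "children", "population cohort"),
    ProvenanceTriple("article-B", "study", "hasCohort", "adolescents", "population cohort"),
]

ranking = rank_articles(triples)
print(ranking)  # ['article-A', 'article-B']
```

In this toy data, `article-A` ranks first because its triples span three categories, whereas `article-B` has triples in only one. A real ranking would presumably also take the search query into account, which this sketch does not attempt.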