Babatunde Kazeem Olorisade (b.k.olorisade@keele.ac.uk), Pearl Brereton (o.p.brereton@keele.ac.uk), Peter Andras (p.andras@keele.ac.uk). All authors: School of Computing and Mathematics, Keele University, Staffs ST5 5BG, UK.
Abstract
CONTEXT: Independent validation of published scientific results through study replication is a pre-condition for accepting the validity of such results. In computational research, full replication is often unrealistic for independent validation of results; study reproduction has therefore been justified as the minimum acceptable standard for evaluating the validity of scientific claims. The application of text mining techniques to citation screening in the context of systematic literature reviews is a relatively young and growing computational field with high relevance for software engineering, medical research and other fields. However, there has been little work so far on reproduction studies in the field.
OBJECTIVE: In this paper, we investigate the reproducibility of studies in this area based on the information contained in published articles, and we propose reporting guidelines that could improve reproducibility.
METHODS: The study was approached in two ways. Initially, we attempted to reproduce results from six studies that were based on the same raw dataset. Then, based on this experience, we identified the steps considered essential to successful reproduction of text mining experiments and characterized them to measure how reproducible a study is given the information provided on these steps. Thirty-three articles were systematically assessed for reproducibility using this approach.
RESULTS: Our work revealed that it is currently difficult, if not impossible, to independently reproduce the results published in any of the studies investigated. The lack of information about the datasets used limits the reproducibility of about 80% of the studies assessed. In addition, information about the machine learning algorithms is inadequate in about 27% of the papers. On the plus side, the third-party software tools used are mostly free and available.
CONCLUSIONS: The reproducibility potential of most of the studies can be significantly improved if more attention is paid to the information provided on the datasets used, how they were partitioned and utilized, and how any randomization was controlled. We introduce a checklist of information that needs to be provided in order to ensure that a published study can be reproduced.
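As an illustration of what reporting "how datasets were partitioned" and "how any randomization was controlled" can look like in practice, the following is a minimal Python sketch and not a method taken from the assessed studies. The function name `partition`, the `test_fraction` parameter, and the seed value are hypothetical and chosen only for this example. It assumes citations are identified by string IDs and uses a locally seeded generator so that the same seed always yields the same split.

```python
import random


def partition(record_ids, test_fraction=0.2, seed=42):
    """Split record IDs into (train, test) lists, deterministically for a given seed.

    Hypothetical helper for illustration only.
    """
    # A local generator keeps the split independent of any global random state.
    rng = random.Random(seed)
    # Sorting first gives a canonical order, so the caller's input order
    # does not change the resulting split.
    shuffled = sorted(record_ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Hypothetical citation IDs standing in for a screening dataset.
    ids = [f"citation-{i}" for i in range(10)]
    train, test = partition(ids, test_fraction=0.2, seed=42)
    print("seed=42, test_fraction=0.2")
    print("train:", train)
    print("test:", test)
```

Reporting the seed, the split fraction, and the ordering rule alongside the results gives an independent team the details needed to regenerate the same partition.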