Weixiang Shao¹, Clive E Adams², Aaron M Cohen³, John M Davis⁴, Marian S McDonagh³, Sujata Thakurta³, Philip S Yu¹, Neil R Smalheiser⁵. 1. Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60612, USA. 2. Division of Psychiatry, University of Nottingham, Nottingham, UK. 3. Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA. 4. Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60612, USA. 5. Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60612, USA. Electronic address: neils@uic.edu.
Abstract
OBJECTIVE: It is important to identify separate publications that report outcomes from the same underlying clinical trial, in order to avoid over-counting these as independent pieces of evidence. METHODS: We created positive and negative training sets (consisting of pairs of articles reporting on the same condition and intervention) that were, or were not, linked to the same clinicaltrials.gov trial registry number. Features were extracted from MEDLINE and PubMed metadata; pairwise similarity scores were modeled using logistic regression. RESULTS: Article pairs from the same trial were identified with high accuracy (F1 score = 0.843). We also created a clustering tool, Aggregator, that takes as input a PubMed user query for RCTs on a given topic and returns article clusters predicted to arise from the same clinical trial. DISCUSSION: Although painstaking examination of the full text may be needed for a conclusive determination, metadata are surprisingly accurate in predicting when two articles derive from the same underlying clinical trial.
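The abstract does not list the specific metadata features or the clustering procedure behind Aggregator. The following is therefore only a minimal sketch of the general idea (score article pairs with logistic regression, then group articles whose pairwise score passes a threshold), written under loudly stated assumptions. The four pair features (author-list overlap, MeSH-term overlap, same journal, publication-year gap), the plain gradient-descent fitter, the 0.5 threshold, the single-link merging via union-find, and every helper name and toy record are hypothetical illustrations, not the authors' published method or data.

```python
import math
from itertools import combinations

def jaccard(a, b):
    """Set overlap |A & B| / |A | B|; 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def pair_features(x, y):
    """HYPOTHETICAL metadata features for one article pair."""
    return [
        jaccard(x["authors"], y["authors"]),         # shared authors
        jaccard(x["mesh"], y["mesh"]),               # shared MeSH terms
        1.0 if x["journal"] == y["journal"] else 0.0,
        abs(x["year"] - y["year"]) / 10.0,           # scaled year gap
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability that a pair with features x comes from the same trial."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logreg(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi             # gradient of log loss wrt z
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def cluster(articles, w, b, threshold=0.5):
    """Single-link grouping: merge any pair scored >= threshold (assumed rule)."""
    parent = list(range(len(articles)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]            # path halving
            i = parent[i]
        return i
    for i, j in combinations(range(len(articles)), 2):
        if predict(w, b, pair_features(articles[i], articles[j])) >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i, art in enumerate(articles):
        groups.setdefault(find(i), []).append(art["pmid"])
    return sorted(groups.values())

# Toy labelled pair features (invented): 1 = same trial, 0 = different trials.
X_train = [[1.0, 1.0, 1.0, 0.1], [0.5, 0.5, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.5], [0.0, 0.0, 0.0, 0.4]]
y_train = [1, 1, 0, 0]
w, b = train_logreg(X_train, y_train)

# Invented records standing in for the results of a PubMed query.
articles = [
    {"pmid": "A", "authors": ["Li", "Wu"], "mesh": ["Schizophrenia", "Clozapine"],
     "journal": "J1", "year": 2010},
    {"pmid": "B", "authors": ["Li", "Wu"], "mesh": ["Schizophrenia", "Clozapine"],
     "journal": "J1", "year": 2011},
    {"pmid": "C", "authors": ["Smith"], "mesh": ["Depression"],
     "journal": "J2", "year": 2015},
]
print(cluster(articles, w, b))  # → [['A', 'B'], ['C']]
```

Single-link merging is used here only because it is the simplest way to turn pairwise decisions into groups. It can chain unrelated articles together through intermediate pairs, so a real tool may well use a different grouping rule.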
Authors: Aaron M Cohen; Neil R Smalheiser; Marian S McDonagh; Clement Yu; Clive E Adams; John M Davis; Philip S Yu Journal: J Am Med Inform Assoc Date: 2015-02-05 Impact factor: 4.497
Authors: Neil R Smalheiser; Can Lin; Lifeng Jia; Yu Jiang; Aaron M Cohen; Clement Yu; John M Davis; Clive E Adams; Marian S McDonagh; Weiyi Meng Journal: Health Inf Sci Syst Date: 2014-01-10
Authors: Annette M O'Connor; Guy Tsafnat; Stephen B Gilbert; Kristina A Thayer; Ian Shemilt; James Thomas; Paul Glasziou; Mary S Wolfe Journal: Syst Rev Date: 2019-02-20