Rule-based deduplication of article records from bibliographic databases.

Yu Jiang, Can Lin, Weiyi Meng, Clement Yu, Aaron M Cohen, Neil R Smalheiser.

Abstract

We recently designed and deployed a metasearch engine, Metta, that sends queries to, and retrieves search results from, five leading biomedical databases: PubMed, EMBASE, CINAHL, PsycINFO and the Cochrane Central Register of Controlled Trials. Because many articles are indexed in more than one of these databases, it is desirable to deduplicate the retrieved article records. This is not a trivial problem, because data fields contain many missing and erroneous entries and because certain types of information are recorded differently (and inconsistently) in the different databases. The present report describes our rule-based method for deduplicating article records across databases and includes an open-source script module that can be deployed freely. Metta was designed to satisfy the particular needs of people who are writing systematic reviews in evidence-based medicine. These users want the highest possible recall in retrieval, so it is important to err on the side of not merging any records that refer to distinct articles, and it is important to perform deduplication online in real time. Our deduplication module is designed with these constraints in mind. Articles that share the same publication year are compared sequentially on parameters including PubMed ID number, digital object identifier, journal name, article title and author list, using text approximation techniques. In a review of Metta searches carried out by public users, we found that the deduplication module was more effective than EndNote at identifying duplicates, and it made no erroneous assignments.
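The sequential comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the Metta module itself: the `Record` fields, the rule order, the similarity thresholds (0.95 for titles, 0.80 for journal names) and the use of `difflib.SequenceMatcher` as the text approximation technique are all assumptions chosen for illustration. The sketch follows the stated design goal of leaving records unmerged whenever the evidence is incomplete or conflicting.

```python
import re
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class Record:
    """One retrieved article record (hypothetical field names)."""
    source: str
    year: Optional[int]
    title: str
    journal: str = ""
    authors: List[str] = field(default_factory=list)  # last names, in order
    pmid: Optional[str] = None
    doi: Optional[str] = None


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def similar(a: str, b: str, threshold: float) -> bool:
    """Approximate string match; an empty string never matches."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return False
    return SequenceMatcher(None, a, b).ratio() >= threshold


def is_duplicate(a: Record, b: Record) -> bool:
    # Only records with the same known publication year are compared.
    if a.year is None or a.year != b.year:
        return False
    # Identifiers decide the outcome when both records carry them.
    if a.pmid and b.pmid:
        return a.pmid == b.pmid
    if a.doi and b.doi:
        return a.doi.lower() == b.doi.lower()
    # Otherwise fall back to text fields; any disagreement keeps both records.
    if not similar(a.title, b.title, 0.95):  # assumed threshold
        return False
    if a.journal and b.journal and not similar(a.journal, b.journal, 0.80):
        return False
    if not a.authors or not b.authors:
        return False
    return normalize(a.authors[0]) == normalize(b.authors[0])


def deduplicate(records: List[Record]) -> List[Record]:
    """Keep the first record of each duplicate group, preserving order."""
    unique: List[Record] = []
    for rec in records:
        if not any(is_duplicate(rec, kept) for kept in unique):
            unique.append(rec)
    return unique
```

Calling `deduplicate(records)` on a merged result list returns the records that were not matched to an earlier one. The pairwise loop is quadratic in the number of records, which is acceptable for a sketch; a real-time deployment would likely group records by year before comparing them.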

Year:  2014        PMID: 24434031      PMCID: PMC3893659          DOI: 10.1093/database/bat086

Source DB:  PubMed          Journal:  Database (Oxford)        ISSN: 1758-0463            Impact factor:   3.451


  6 in total

1.  A simple algorithm for identifying abbreviation definitions in biomedical text.

Authors:  Ariel S Schwartz; Marti A Hearst
Journal:  Pac Symp Biocomput       Date:  2003

2.  Probabilistic record linkage and a method to calculate the positive predictive value. (Review)

Authors:  Tony Blakely; Clare Salmond
Journal:  Int J Epidemiol       Date:  2002-12       Impact factor: 7.196

3.  Automatic linkage of vital records.

Authors:  H B Newcombe; J M Kennedy; S J Axford; A P James
Journal:  Science       Date:  1959-10-16       Impact factor: 47.728

4.  FRIL: A tool for comparative record linkage.

Authors:  Pawel Jurczyk; James J Lu; Li Xiong; Janet D Cragan; Adolfo Correa
Journal:  AMIA Annu Symp Proc       Date:  2008-11-06

5.  Design and implementation of Metta, a metasearch engine for biomedical literature retrieval intended for systematic reviewers.

Authors:  Neil R Smalheiser; Can Lin; Lifeng Jia; Yu Jiang; Aaron M Cohen; Clement Yu; John M Davis; Clive E Adams; Marian S McDonagh; Weiyi Meng
Journal:  Health Inf Sci Syst       Date:  2014-01-10

6.  Find duplicates among the PubMed, EMBASE, and Cochrane Library Databases in systematic review.

Authors:  Xingshun Qi; Man Yang; Weirong Ren; Jia Jia; Juan Wang; Guohong Han; Daiming Fan
Journal:  PLoS One       Date:  2013-08-20       Impact factor: 3.240

  9 in total

1.  Identifying and removing duplicate records from systematic review searches.

Authors:  Yoojin Kwon; Michelle Lemieux; Jill McTavish; Nadine Wathen
Journal:  J Med Libr Assoc       Date:  2015-10

2.  De-duplication of database search results for systematic reviews in EndNote.

Authors:  Wichor M Bramer; Dean Giustini; Gerdien B de Jonge; Leslie Holland; Tanja Bekhuis
Journal:  J Med Libr Assoc       Date:  2016-07

3.  Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine.

Authors:  Aaron M Cohen; Neil R Smalheiser; Marian S McDonagh; Clement Yu; Clive E Adams; John M Davis; Philip S Yu
Journal:  J Am Med Inform Assoc       Date:  2015-02-05       Impact factor: 4.497

4.  Design and implementation of Metta, a metasearch engine for biomedical literature retrieval intended for systematic reviewers.

Authors:  Neil R Smalheiser; Can Lin; Lifeng Jia; Yu Jiang; Aaron M Cohen; Clement Yu; John M Davis; Clive E Adams; Marian S McDonagh; Weiyi Meng
Journal:  Health Inf Sci Syst       Date:  2014-01-10

5.  Better duplicate detection for systematic reviewers: evaluation of Systematic Review Assistant-Deduplication Module. (Review)

Authors:  John Rathbone; Matt Carter; Tammy Hoffmann; Paul Glasziou
Journal:  Syst Rev       Date:  2015-01-14

6.  Evaluation of unique identifiers used as keys to match identical publications in Pure and SciVal - a case study from health science.

Authors:  Heidi Holst Madsen; Dicte Madsen; Marianne Gauffriau
Journal:  F1000Res       Date:  2016-06-29

7.  Considerations for conducting systematic reviews: evaluating the performance of different methods for de-duplicating references.

Authors:  Sandra McKeown; Zuhaib M Mir
Journal:  Syst Rev       Date:  2021-01-23

8.  srBERT: automatic article classification model for systematic review using BERT.

Authors:  Sungmin Aum; Seon Choe
Journal:  Syst Rev       Date:  2021-10-30

9.  Systematic review automation technologies.

Authors:  Guy Tsafnat; Paul Glasziou; Miew Keen Choong; Adam Dunn; Filippo Galgani; Enrico Coiera
Journal:  Syst Rev       Date:  2014-07-09
