A new approach and gold standard toward author disambiguation in MEDLINE.

Dina Vishnyakova, Raul Rodriguez-Esteban, Fabio Rinaldi.

Abstract

OBJECTIVE: Author-centric analyses of fast-growing biomedical reference databases are challenging due to author ambiguity. This problem has been mainly addressed through author disambiguation using supervised machine-learning algorithms. Such algorithms, however, require adequately designed gold standards that reflect the reference database properly. In this study we used MEDLINE to build the first unbiased gold standard in a reference database and improve over the existing state of the art in author disambiguation.
MATERIALS AND METHODS: Following a new corpus design method, publication pairs randomly picked from MEDLINE were evaluated by both crowdsourcing and expert curators. Because the expert curators showed higher accuracy than crowdsourcing, they were tasked with creating the full corpus. The corpus was then used to explore new features, not discoverable with previously existing gold standards, that could improve state-of-the-art author disambiguation algorithms.
RESULTS: We created a gold standard based on 1900 publication pairs that shows close similarity to MEDLINE in terms of chronological distribution and information completeness. A machine-learning algorithm that includes new features related to the ethnic origin of authors showed significant improvements over the current state of the art and demonstrates the necessity of realistic gold standards to further develop effective author disambiguation algorithms.
DISCUSSION AND CONCLUSION: An unbiased gold standard can give a more accurate picture of the status of author disambiguation research and help in the discovery of new features for machine learning. The principles and methods shown here can be applied to other reference databases beyond MEDLINE. The gold standard and code used for this study are available at the following repository: https://github.com/amorgani/AND/.
© The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.

Keywords:  MEDLINE; author name disambiguation; gold standard; machine learning; text mining

Year: 2019    PMID: 30958542    PMCID: PMC7647200    DOI: 10.1093/jamia/ocz028

Source DB: PubMed    Journal: J Am Med Inform Assoc    ISSN: 1067-5027    Impact factor: 4.497


