Author Name Disambiguation in MEDLINE.

Vetle I Torvik, Neil R Smalheiser.

Abstract

BACKGROUND: We recently described "Author-ity," a model for estimating the probability that two articles in MEDLINE, sharing the same author name, were written by the same individual. Features include shared title words, journal name, coauthors, medical subject headings, language, affiliations, and author name features (middle initial, suffix, and prevalence in MEDLINE). Here we test the hypothesis that the Author-ity model will suffice to disambiguate author names for the vast majority of articles in MEDLINE.
METHODS: Enhancements include: (a) incorporating first names and their variants, email addresses, and correlations between specific last names and affiliation words; (b) new methods of generating large unbiased training sets; (c) new methods for estimating the prior probability; (d) a weighted least squares algorithm for correcting transitivity violations; and (e) a maximum likelihood based agglomerative algorithm for computing clusters of articles that represent inferred author-individuals.
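Steps (d) and (e) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it only detects transitivity violations (rather than correcting them by weighted least squares), and it substitutes a simple greedy merge on summed log-odds for the maximum likelihood agglomerative algorithm. The violation test `p_ij + p_jk - 1 > p_ik`, the function names, and the toy probabilities are all assumptions made for illustration.

```python
import math
from itertools import combinations


def pair_prob(p, a, b):
    """Look up the pairwise same-author probability for articles a and b."""
    return p[(min(a, b), max(a, b))]


def transitivity_violations(p, n, tol=1e-9):
    """List triplets (i, j, k) with p_ij + p_jk - 1 > p_ik (assumed test)."""
    violations = []
    for a, b, c in combinations(range(n), 3):
        # Try each article of the triplet as the "middle" article j.
        for i, j, k in ((a, b, c), (b, a, c), (a, c, b)):
            if pair_prob(p, i, j) + pair_prob(p, j, k) - 1 > pair_prob(p, i, k) + tol:
                violations.append((i, j, k))
    return violations


def log_odds(prob, eps=1e-12):
    """Log-odds of a probability, clipped away from 0 and 1."""
    prob = min(max(prob, eps), 1 - eps)
    return math.log(prob / (1 - prob))


def agglomerate(p, n):
    """Greedy stand-in: merge the two clusters with the largest positive
    summed log-odds across their cross pairs until no merge helps."""
    clusters = [{i} for i in range(n)]
    while len(clusters) > 1:
        best_gain, best_pair = 0.0, None
        for x, y in combinations(range(len(clusters)), 2):
            gain = sum(log_odds(pair_prob(p, i, j))
                       for i in clusters[x] for j in clusters[y])
            if gain > best_gain:
                best_gain, best_pair = gain, (x, y)
        if best_pair is None:
            break
        x, y = best_pair
        clusters[x] |= clusters[y]
        del clusters[y]  # safe because x < y
    return clusters


# Hypothetical pairwise probabilities for four articles sharing one name.
pairwise = {
    (0, 1): 0.95, (0, 2): 0.90, (1, 2): 0.20,
    (0, 3): 0.05, (1, 3): 0.05, (2, 3): 0.05,
}
violations = transitivity_violations(pairwise, 4)
clusters = agglomerate(pairwise, 4)
```

In this toy input, articles 1 and 2 each match article 0 strongly but match each other weakly, which is the kind of inconsistency a correction step would need to resolve before clustering.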
RESULTS: Pairwise comparisons were computed for all author names on all 15.3 million articles in MEDLINE (2006 baseline), that share last name and first initial, to create Author-ity 2006, a database that has each name on each article assigned to one of 6.7 million inferred author-individual clusters. Recall is estimated at ~98.8%. Lumping (putting two different individuals into the same cluster) affects ~0.5% of clusters, whereas splitting (assigning articles written by the same individual to >1 cluster) affects ~2% of articles.
IMPACT: The Author-ity model can be applied generally to other bibliographic databases. Author name disambiguation allows information retrieval and data integration to become person-centered, not just document-centered, setting the stage for new data mining and social network tools that will facilitate the analysis of scholarly publishing and collaboration behavior.
AVAILABILITY: The Author-ity 2006 database is available for nonprofit academic research, and can be freely queried via http://arrowsmith.psych.uic.edu.
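
The lumping and splitting rates reported in RESULTS can be illustrated with a small Python sketch. It assumes a hand-labeled gold standard mapping each article to its true author, and it reads "splitting affects an article" as "the article's true author appears in more than one cluster"; the abstract does not spell out the exact counting rule, so this is one possible reading. All identifiers and data are hypothetical.

```python
from collections import defaultdict


def lumping_and_splitting(predicted, truth):
    """Return (fraction of clusters holding >1 true individual,
    fraction of articles whose true author spans >1 cluster)."""
    cluster_to_people = defaultdict(set)
    person_to_clusters = defaultdict(set)
    for article, cluster in predicted.items():
        person = truth[article]
        cluster_to_people[cluster].add(person)
        person_to_clusters[person].add(cluster)

    lumped_clusters = sum(
        1 for people in cluster_to_people.values() if len(people) > 1
    )
    split_articles = sum(
        1 for article in predicted
        if len(person_to_clusters[truth[article]]) > 1
    )
    return (lumped_clusters / len(cluster_to_people),
            split_articles / len(predicted))


# Hypothetical cluster assignments and gold-standard author identities.
predicted = {"a1": "c1", "a2": "c1", "a3": "c2", "a4": "c2", "a5": "c3"}
truth = {"a1": "smith_1", "a2": "smith_1", "a3": "smith_1",
         "a4": "smith_2", "a5": "smith_3"}
lumping, splitting = lumping_and_splitting(predicted, truth)
```

Here cluster `c2` mixes two individuals (one lumped cluster out of three), and the three articles by `smith_1` are spread over two clusters (three split articles out of five).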

Year:  2009        PMID: 20072710      PMCID: PMC2805000          DOI: 10.1145/1552303.1552304

Source DB:  PubMed          Journal:  ACM Trans Knowl Discov Data        ISSN: 1556-4681            Impact factor:   2.713


  9 in total

1.  When A. Rose is not A. Rose: the vagaries of author searching.

Authors:  Caryn L Scoville; E Diane Johnson; Amanda L McConnell
Journal:  Med Ref Serv Q       Date:  2003

2.  A probabilistic similarity metric for Medline records: a model for author name disambiguation.

Authors:  Vetle I Torvik; Marc Weeber; Don R Swanson; Neil R Smalheiser
Journal:  AMIA Annu Symp Proc       Date:  2003

3.  A day in the life of PubMed: analysis of a typical day's query log.

Authors:  Jorge R Herskovic; Len Y Tanaka; William Hersh; Elmer V Bernstam
Journal:  J Am Med Inform Assoc       Date:  2007-01-09       Impact factor: 4.497

4.  A quantitative model for linking two disparate sets of articles in MEDLINE.

Authors:  Vetle I Torvik; Neil R Smalheiser
Journal:  Bioinformatics       Date:  2007-04-26       Impact factor: 6.937

5.  Scientific publishing: identity crisis.

Authors:  Jane Qiu
Journal:  Nature       Date:  2008-02-14       Impact factor: 49.962

6.  An analysis of statistical term strength and its use in the indexing and retrieval of molecular biology texts.

Authors:  W J Wilbur; Y Yang
Journal:  Comput Biol Med       Date:  1996-05       Impact factor: 4.589

7.  Probabilistic linkage of large public health data files.

Authors:  M A Jaro
Journal:  Stat Med       Date:  1995 Mar 15-Apr 15       Impact factor: 2.373

8.  Arrowsmith two-node search interface: a tutorial on finding meaningful links between two disparate sets of articles in MEDLINE.

Authors:  Neil R Smalheiser; Vetle I Torvik; Wei Zhou
Journal:  Comput Methods Programs Biomed       Date:  2009-01-30       Impact factor: 5.428

9.  Anne O'Tate: A tool to support user-driven summarization, drill-down and browsing of PubMed search results.

Authors:  Neil R Smalheiser; Wei Zhou; Vetle I Torvik
Journal:  J Biomed Discov Collab       Date:  2008-02-15
  39 in total

1.  Semi-automatic semantic annotation of PubMed queries: a study on quality, efficiency, satisfaction.

Authors:  Aurélie Névéol; Rezarta Islamaj Doğan; Zhiyong Lu
Journal:  J Biomed Inform       Date:  2010-11-20       Impact factor: 6.317

2.  Faculty Promotion and Attrition: The Importance of Coauthor Network Reach at an Academic Medical Center.

Authors:  Erica T Warner; René Carapinha; Griffin M Weber; Emorcia V Hill; Joan Y Reede
Journal:  J Gen Intern Med       Date:  2016-01       Impact factor: 5.128

3.  A new approach and gold standard toward author disambiguation in MEDLINE.

Authors:  Dina Vishnyakova; Raul Rodriguez-Esteban; Fabio Rinaldi
Journal:  J Am Med Inform Assoc       Date:  2019-10-01       Impact factor: 4.497

4.  Gender Differences in Receipt of National Institutes of Health R01 Grants Among Junior Faculty at an Academic Medical Center: The Role of Connectivity, Rank, and Research Productivity.

Authors:  Erica T Warner; René Carapinha; Griffin M Weber; Emorcia V Hill; Joan Y Reede
Journal:  J Womens Health (Larchmt)       Date:  2017-08-03       Impact factor: 2.681

5.  Biomedical text mining for research rigor and integrity: tasks, challenges, directions.

Authors:  Halil Kilicoglu
Journal:  Brief Bioinform       Date:  2018-11-27       Impact factor: 11.622

6.  Disambiguation of patent inventors and assignees using high-resolution geolocation data.

Authors:  Greg Morrison; Massimo Riccaboni; Fabio Pammolli
Journal:  Sci Data       Date:  2017-05-16       Impact factor: 6.444

7.  Design of a generic, open platform for machine learning-assisted indexing and clustering of articles in PubMed, a biomedical bibliographic database.

Authors:  Neil R Smalheiser; Aaron M Cohen
Journal:  Data Inf Manag       Date:  2018-05-22

8.  Last Place? The Intersection of Ethnicity, Gender, and Race in Biomedical.

Authors:  Gerald Marschke; Allison Nunez; Bruce A Weinberg; Huifeng Yu
Journal:  AEA Pap Proc       Date:  2018-05

9.  Evolution of coauthorship in public health services and systems research.

Authors:  Michael E Bales; Stephen B Johnson; Jonathan W Keeling; Kathleen M Carley; Frank Kunkel; Jacqueline A Merrill
Journal:  Am J Prev Med       Date:  2011-07       Impact factor: 5.043

10.  Quantifying Conceptual Novelty in the Biomedical Literature.

Authors:  Shubhanshu Mishra; Vetle I Torvik
Journal:  Dlib Mag       Date:  2016 Sep-Oct