
Large expert-curated database for benchmarking document similarity detection in biomedical literature search.

Peter Brown, Yaoqi Zhou.

Abstract

Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) performed similarly overall. Additionally, we found that each of these methods tends to produce a distinct collection of recommended articles, suggesting that a hybrid method may be required to capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for downloading annotation data and for blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of powerful new techniques for title and title/abstract-based search engines for relevant articles in biomedical research.
© The Author(s) 2019. Published by Oxford University Press.
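
The abstract names Okapi Best Matching 25 (BM25) as one of the baseline methods used to generate recommended articles. As an illustration only, the following is a minimal sketch of the standard BM25 scoring formula applied to a seed title and a few candidate titles. It is not the authors' implementation: the whitespace tokenizer, the function names `tokenize` and `bm25_scores`, the parameter values `k1=1.2` and `b=0.75`, and the example titles are all assumptions made for this sketch.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def bm25_scores(seed, candidates, k1=1.2, b=0.75):
    """Score each candidate text against the seed text with standard BM25."""
    docs = [tokenize(c) for c in candidates]
    if not docs:
        return []
    n_docs = len(docs)
    # Average candidate length; fall back to 1.0 if every candidate is empty.
    avg_len = sum(len(d) for d in docs) / n_docs or 1.0

    # Document frequency: number of candidates containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    query_terms = set(tokenize(seed))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always non-negative).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical seed and candidate titles, invented for this example.
seed = "protein structure prediction with deep learning"
candidates = [
    "deep learning for protein structure prediction",
    "clinical trial of a new asthma inhaler",
    "protein folding kinetics",
]
scores = bm25_scores(seed, candidates)
ranking = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
```

Here `ranking` lists candidate indices from most to least similar to the seed. A benchmark such as the one described above would compare a ranking like this against the expert relevance annotations.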


Year:  2019        PMID: 33326193      PMCID: PMC7291946          DOI: 10.1093/database/baz085

Source DB:  PubMed          Journal:  Database (Oxford)        ISSN: 1758-0463            Impact factor:   3.451


