A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine.

Leonardo Campillos-Llanos, Ana Valverde-Mateos, Adrián Capllonch-Carrión, Antonio Moreno-Sandoval.

Abstract

BACKGROUND: The large volume of medical literature makes it difficult for healthcare professionals to keep abreast of the latest studies that support Evidence-Based Medicine. Natural language processing improves access to relevant information, and gold standard corpora are required to improve such systems. To contribute a new dataset for this domain, we collected the Clinical Trials for Evidence-Based Medicine in Spanish (CT-EBM-SP) corpus.
METHODS: We annotated 1200 texts about clinical trials with entities from the Unified Medical Language System semantic groups: anatomy (ANAT), pharmacological and chemical substances (CHEM), pathologies (DISO), and lab tests, diagnostic or therapeutic procedures (PROC). We doubly annotated 10% of the corpus and measured inter-annotator agreement (IAA) using F-measure. As a use case, we ran medical entity recognition experiments with neural network models.
RESULTS: This resource contains 500 abstracts of journal articles about clinical trials and 700 announcements of trial protocols (292 173 tokens). We annotated 46 699 entities, of which 13.98% are nested entities. For IAA, we obtained an average F-measure of 85.65% (±4.79, strict match) and 93.94% (±3.31, relaxed match). In the use case experiments, we achieved recognition results ranging from 80.28% (±0.99) to 86.74% (±0.19) average F-measure.
CONCLUSIONS: Our results show that this resource is adequate for experiments with state-of-the-art approaches to biomedical named entity recognition. It is freely distributed at: http://www.lllf.uam.es/ESP/nlpmedterm_en.html . The methods are generalizable to other languages with similar available sources.
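
The abstract reports IAA as an F-measure under strict and relaxed matching but does not spell out the matching criteria. The following is a minimal sketch of one common way to compute such scores, not the authors' procedure. It assumes that a strict match requires identical character offsets and label, that a relaxed match requires any overlap with the same label, and that annotator A is treated as the reference. The Entity type, the iaa_f_measure function, and the sample offsets are all hypothetical.

```python
from typing import NamedTuple, Sequence


class Entity(NamedTuple):
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str   # semantic group, e.g. "ANAT", "CHEM", "DISO", "PROC"


def _matches(a: Entity, b: Entity, relaxed: bool) -> bool:
    """Assumed criteria: same label plus identical span (strict) or overlapping span (relaxed)."""
    if a.label != b.label:
        return False
    if relaxed:
        return a.start < b.end and b.start < a.end
    return a.start == b.start and a.end == b.end


def iaa_f_measure(ann_a: Sequence[Entity], ann_b: Sequence[Entity],
                  relaxed: bool = False) -> float:
    """F-measure between two annotators, taking annotator A as the reference."""
    if not ann_a or not ann_b:
        return 0.0
    # Precision: share of B's entities that match some entity of A.
    matched_b = sum(any(_matches(a, b, relaxed) for a in ann_a) for b in ann_b)
    # Recall: share of A's entities that match some entity of B.
    matched_a = sum(any(_matches(a, b, relaxed) for b in ann_b) for a in ann_a)
    precision = matched_b / len(ann_b)
    recall = matched_a / len(ann_a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up annotations of the same text by two annotators.
annotator_a = [Entity(0, 8, "CHEM"), Entity(20, 35, "DISO"), Entity(40, 46, "ANAT")]
annotator_b = [Entity(0, 8, "CHEM"), Entity(24, 35, "DISO"), Entity(50, 58, "PROC")]

strict_f = iaa_f_measure(annotator_a, annotator_b)
relaxed_f = iaa_f_measure(annotator_a, annotator_b, relaxed=True)
print(f"strict: {strict_f:.4f}  relaxed: {relaxed_f:.4f}")
```

In this toy example the DISO spans differ only in their start offset, so they count as a match under the relaxed criterion but not under the strict one. This is the kind of boundary disagreement that would produce a gap between strict and relaxed scores like the one reported above.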

Keywords:  Clinical Trials; Evidence-Based Medicine; Inter-Annotator Agreement; Natural Language Processing; Semantic Annotation

Year: 2021   PMID: 33618727   PMCID: PMC7898014   DOI: 10.1186/s12911-021-01395-z

Source DB: PubMed   Journal: BMC Med Inform Decis Mak   ISSN: 1472-6947   Impact factor: 2.796


