A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine.

Leonardo Campillos-Llanos, Ana Valverde-Mateos, Adrián Capllonch-Carrión, Antonio Moreno-Sandoval.

Abstract

BACKGROUND: The large volume of medical literature makes it difficult for healthcare professionals to keep abreast of the latest studies that support Evidence-Based Medicine. Natural language processing enhances access to relevant information, and gold standard corpora are required to improve systems. To contribute a new dataset for this domain, we collected the Clinical Trials for Evidence-Based Medicine in Spanish (CT-EBM-SP) corpus.
METHODS: We annotated 1200 texts about clinical trials with entities from the Unified Medical Language System semantic groups: anatomy (ANAT), pharmacological and chemical substances (CHEM), pathologies (DISO), and lab tests, diagnostic or therapeutic procedures (PROC). We doubly annotated 10% of the corpus and measured inter-annotator agreement (IAA) using F-measure. As a use case, we ran medical entity recognition experiments with neural network models.
RESULTS: This resource contains 500 abstracts of journal articles about clinical trials and 700 announcements of trial protocols (292 173 tokens). We annotated 46 699 entities (13.98% are nested entities). Regarding IAA, we obtained an average F-measure of 85.65% (±4.79, strict match) and 93.94% (±3.31, relaxed match). In the use case experiments, we achieved recognition results ranging from 80.28% (±0.99) to 86.74% (±0.19) of average F-measure.
CONCLUSIONS: Our results show that this resource is adequate for experiments with state-of-the-art approaches to biomedical named entity recognition. It is freely distributed at: http://www.lllf.uam.es/ESP/nlpmedterm_en.html . The methods are generalizable to other languages with similar available sources.
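
To make the strict versus relaxed IAA figures concrete, here is a minimal Python sketch of a pairwise F-measure between two annotators. It is an illustration only, not the authors' evaluation script: the abstract does not define the matching criteria, so this sketch assumes that a strict match requires identical offsets and semantic group, and that a relaxed match requires overlapping offsets and the same semantic group. The function names, the `(start, end, group)` tuple layout, and the sample annotations are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical entity layout: (start offset, end offset exclusive, semantic group)
Entity = Tuple[int, int, str]


def _matches(a: Entity, b: Entity, relaxed: bool) -> bool:
    """Assumed criteria: same group, and identical (strict) or overlapping (relaxed) spans."""
    if a[2] != b[2]:
        return False
    if relaxed:
        return a[0] < b[1] and b[0] < a[1]
    return a[0] == b[0] and a[1] == b[1]


def iaa_f_measure(ann_a: List[Entity], ann_b: List[Entity], relaxed: bool = False) -> float:
    """F-measure with annotator A treated as reference and annotator B as response."""
    if not ann_a or not ann_b:
        return 0.0
    # Entities of B that find a counterpart in A drive precision.
    matched_b = sum(any(_matches(a, b, relaxed) for a in ann_a) for b in ann_b)
    # Entities of A that find a counterpart in B drive recall.
    matched_a = sum(any(_matches(a, b, relaxed) for b in ann_b) for a in ann_a)
    precision = matched_b / len(ann_b)
    recall = matched_a / len(ann_a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up annotations of the same text by two annotators.
annotator_a = [(0, 9, "CHEM"), (20, 35, "DISO"), (40, 48, "PROC")]
annotator_b = [(0, 9, "CHEM"), (22, 35, "DISO"), (50, 55, "ANAT")]

strict_f = iaa_f_measure(annotator_a, annotator_b)
relaxed_f = iaa_f_measure(annotator_a, annotator_b, relaxed=True)
print(f"strict: {strict_f:.2f}, relaxed: {relaxed_f:.2f}")
```

In this toy case the DISO spans differ only in their start offset, so they count as agreement under the relaxed criterion but not under the strict one, which is the kind of boundary disagreement that would explain a relaxed score higher than the strict score.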

Keywords: Clinical Trials; Evidence-Based Medicine; Inter-Annotator Agreement; Natural Language Processing; Semantic Annotation

Year: 2021 PMID: 33618727 PMCID: PMC7898014 DOI: 10.1186/s12911-021-01395-z

Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796

4 in total

1. CoQUAD: a COVID-19 question answering dataset system, facilitating research, benchmarking, and practice.

Authors: Shaina Raza; Brian Schwartz; Laura C Rosella
Journal: BMC Bioinformatics Date: 2022-06-02 Impact factor: 3.307

2. Correction to: A clinical trials corpus annotated with UMLS entities to enhance the access to evidence‑based medicine.

Authors: Leonardo Campillos-Llanos; Ana Valverde-Mateos; Adrián Capllonch-Carrión; Antonio Moreno-Sandoval
Journal: BMC Med Inform Decis Mak Date: 2021-04-07 Impact factor: 2.796

3. The OpenDeID corpus for patient de-identification.

Authors: Jitendra Jonnagaddala; Aipeng Chen; Sean Batongbacal; Chandini Nekkantti
Journal: Sci Rep Date: 2021-10-07 Impact factor: 4.379

4. Negation and uncertainty detection in clinical texts written in Spanish: a deep learning-based approach.

Authors: Oswaldo Solarte Pabón; Orlando Montenegro; Maria Torrente; Alejandro Rodríguez González; Mariano Provencio; Ernestina Menasalvas
Journal: PeerJ Comput Sci Date: 2022-03-07
