| Literature DB >> 35308988 |
Mirza S Khan, Bennett A Landman, Stephen A Deppen, Michael E Matheny.
Abstract
Many clinical natural language processing methods rely on non-contextual word embedding (NCWE) or contextual word embedding (CWE) models. Yet, few, if any, intrinsic evaluation benchmarks exist comparing embedding representations against clinician judgment. We developed intrinsic evaluation tasks for embedding models using a corpus of radiology reports: term pair similarity for NCWEs and cloze task accuracy for CWEs. Using surveys, we quantified the agreement between clinician judgment and embedding model representations. We compared embedding models trained on a custom radiology report corpus (RRC), a general corpus, and PubMed and MIMIC-III corpora (P&MC). Cloze task accuracy was equivalent for RRC and P&MC models. For term pair similarity, P&MC-trained NCWEs outperformed all other NCWE models (Spearman's ρ = 0.61 vs. 0.27-0.44). Among models trained on RRC, fastText models often outperformed other NCWE models, and spherical embeddings provided overly optimistic representations of term pair similarity. ©2021 AMIA - All rights reserved.
Entities:
Mesh:
Year: 2022 PMID: 35308988 PMCID: PMC8861761
Source DB: PubMed Journal: AMIA Annu Symp Proc ISSN: 1559-4076
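The abstract describes two intrinsic evaluation tasks: term pair similarity for NCWEs, scored by rank agreement (Spearman's ρ) between embedding cosine similarity and clinician survey ratings, and cloze task accuracy for CWEs. The sketch below illustrates how such scoring could be computed; it is not taken from the paper. The term list, clinician ratings, random stand-in vectors, and the `predict_masked_token` placeholder are all hypothetical assumptions for illustration.

```python
# Illustrative sketch (not from the paper): scoring the two intrinsic
# evaluation tasks described in the abstract.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# --- Term pair similarity (NCWE evaluation) ---
# Hypothetical radiology terms with random stand-in embeddings; a real run
# would use vectors from a trained NCWE model such as fastText.
terms = ["effusion", "consolidation", "atelectasis", "pneumothorax"]
term_vectors = {t: rng.normal(size=300) for t in terms}

term_pairs = [("effusion", "consolidation"),
              ("atelectasis", "consolidation"),
              ("effusion", "pneumothorax")]
clinician_scores = [3.2, 4.1, 1.5]  # hypothetical survey ratings (e.g., 1-5 scale)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

model_scores = [cosine(term_vectors[a], term_vectors[b]) for a, b in term_pairs]

# Agreement between embedding similarity and clinician judgment
rho, _ = spearmanr(model_scores, clinician_scores)
print(f"Spearman's rho: {rho:.2f}")

# --- Cloze task accuracy (CWE evaluation) ---
# `predict_masked_token` stands in for a contextual model's top prediction
# at the masked position; here it is a trivial placeholder.
def predict_masked_token(sentence_with_mask):
    return "effusion"  # placeholder prediction

cloze_items = [("There is a small left pleural [MASK].", "effusion"),
               ("No [MASK] is identified.", "pneumothorax")]

correct = sum(predict_masked_token(s) == gold for s, gold in cloze_items)
print(f"Cloze accuracy: {correct / len(cloze_items):.2f}")
```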