| Literature DB >> 35308988 |
Mirza S Khan, Bennett A Landman, Stephen A Deppen, Michael E Matheny.
Abstract
Many clinical natural language processing methods rely on non-contextual word embedding (NCWE) or contextual word embedding (CWE) models. Yet, few, if any, intrinsic evaluation benchmarks exist comparing embedding representations against clinician judgment. We developed intrinsic evaluation tasks for embedding models using a corpus of radiology reports: term pair similarity for NCWEs and cloze task accuracy for CWEs. Using surveys, we quantified the agreement between clinician judgment and embedding model representations. We compared embedding models trained on a custom radiology report corpus (RRC), a general corpus, and PubMed and MIMIC-III corpora (P&MC). Cloze task accuracy was equivalent for RRC and P&MC models. For term pair similarity, P&MC-trained NCWEs outperformed all other NCWE models (Spearman's ρ = 0.61 vs. 0.27-0.44). Among models trained on RRC, fastText models often outperformed other NCWE models, and spherical embeddings provided overly optimistic representations of term pair similarity. ©2021 AMIA - All rights reserved.
Entities:
Mesh:
Year: 2022 PMID: 35308988 PMCID: PMC8861761
Source DB: PubMed Journal: AMIA Annu Symp Proc ISSN: 1559-4076
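The abstract describes two intrinsic evaluation tasks: term pair similarity for NCWEs, scored by rank agreement (Spearman's ρ) between embedding cosine similarity and clinician survey ratings, and cloze task accuracy for CWEs. The sketch below illustrates how such scoring could be computed; it is not taken from the paper. The term list, clinician ratings, random stand-in vectors, and the `predict_masked_token` placeholder are all hypothetical assumptions for illustration.

```python
# Illustrative sketch (not from the paper): scoring the two intrinsic
# evaluation tasks described in the abstract.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# --- Term pair similarity (NCWE evaluation) ---
# Hypothetical radiology terms with random stand-in embeddings; a real run
# would use vectors from a trained NCWE model such as fastText.
terms = ["effusion", "consolidation", "atelectasis", "pneumothorax"]
term_vectors = {t: rng.normal(size=300) for t in terms}

term_pairs = [("effusion", "consolidation"),
              ("atelectasis", "consolidation"),
              ("effusion", "pneumothorax")]
clinician_scores = [3.2, 4.1, 1.5]  # hypothetical survey ratings (e.g., 1-5 scale)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

model_scores = [cosine(term_vectors[a], term_vectors[b]) for a, b in term_pairs]

# Agreement between embedding similarity and clinician judgment
rho, _ = spearmanr(model_scores, clinician_scores)
print(f"Spearman's rho: {rho:.2f}")

# --- Cloze task accuracy (CWE evaluation) ---
# `predict_masked_token` stands in for a contextual model's top prediction
# at the masked position; here it is a trivial placeholder.
def predict_masked_token(sentence_with_mask):
    return "effusion"  # placeholder prediction

cloze_items = [("There is a small left pleural [MASK].", "effusion"),
               ("No [MASK] is identified.", "pneumothorax")]

correct = sum(predict_masked_token(s) == gold for s, gold in cloze_items)
print(f"Cloze accuracy: {correct / len(cloze_items):.2f}")
```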