Luke T Slater, Andreas Karwath, Robert Hoehndorf, Georgios V Gkoutos.
Abstract
Semantic similarity is a useful approach for comparing patient phenotypes, and holds potential as an effective method for exploiting text-derived phenotypes for differential diagnosis, text and document classification, and outcome prediction. While approaches for context disambiguation are commonly used in text mining applications, forming a standard component of information extraction pipelines, their effects on semantic similarity calculations have not been widely explored. In this work, we evaluate how the inclusion and exclusion of negated and uncertain mentions of concepts from text-derived phenotypes affect the similarity of patients, and the use of those profiles to predict diagnosis. We report on the effectiveness of these approaches and find a very small, yet significant, improvement in performance when classifying primary diagnosis over MIMIC-III patient visits.
Keywords: context disambiguation; differential diagnosis; negation; ontology; phenotype profiles; semantic similarity
Year: 2021 PMID: 34939069 PMCID: PMC8685209 DOI: 10.3389/fdgth.2021.781227
Source DB: PubMed Journal: Front Digit Health ISSN: 2673-253X
Figure 1. Flow chart describing the experimental methodology.
The number of annotations across the text records associated with the 1,000 sampled patients, and associated modifiers.
| Total annotations | Negated | Uncertain | Negated and uncertain |
|---|---|---|---|
| 43,953 | 8,057 | 3,102 | 317 |
Each annotation was evaluated for uncertainty and negation, which are not mutually exclusive.
Summary of each set of patient phenotype profiles considered as an experimental setting.
| Profile set | Annotations included | Number of annotations |
|---|---|---|
| PP | All phenotypes included | 43,953 |
| PP | Negated annotations removed | 35,896 |
| PP | Uncertain annotations removed | 40,851 |
| PP | Negated and uncertain annotations removed | 33,111 |
The phenotype profiles are formed from the list of annotations associated with each patient. Different sets were formed by removing sets of annotations depending on the contextual uncertainty and negation modifiers associated with them by Komenti.
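The construction of the four profile sets can be illustrated with a minimal Python sketch. It assumes each annotation is available as a record carrying a patient identifier, a concept identifier, and two boolean modifier flags; the `Annotation` class, its field names, and the `build_profiles` function are hypothetical and do not reflect Komenti's actual output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    # Hypothetical record layout; not Komenti's real output schema.
    patient_id: str
    concept_id: str
    negated: bool
    uncertain: bool


def build_profiles(annotations, drop_negated=False, drop_uncertain=False):
    """Group concept identifiers by patient, skipping filtered annotations.

    An annotation flagged as both negated and uncertain is removed when
    either filter is active, since the modifiers are not mutually exclusive.
    A patient whose annotations are all removed has no entry in the result.
    """
    profiles = {}
    for ann in annotations:
        if drop_negated and ann.negated:
            continue
        if drop_uncertain and ann.uncertain:
            continue
        profiles.setdefault(ann.patient_id, []).append(ann.concept_id)
    return profiles


# Illustrative data only; the concept identifiers are placeholders.
annotations = [
    Annotation("p1", "concept:A", negated=False, uncertain=False),
    Annotation("p1", "concept:B", negated=True, uncertain=False),
    Annotation("p2", "concept:C", negated=False, uncertain=True),
    Annotation("p2", "concept:A", negated=True, uncertain=True),
]

all_included = build_profiles(annotations)
no_negated = build_profiles(annotations, drop_negated=True)
no_uncertain = build_profiles(annotations, drop_uncertain=True)
neither = build_profiles(annotations, drop_negated=True, drop_uncertain=True)
```

Because the two modifiers overlap, the count for the doubly filtered set follows inclusion-exclusion over the reported totals: 43,953 − 8,057 − 3,102 + 317 = 33,111.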
Results of classification of shared primary diagnosis, compared between different subsets of patient phenotype profiles.
| Profile set | AUC | MRR | A@10 | P-value |
|---|---|---|---|---|
| PP | 0.7743 (0.7724–0.7762) | 0.423 | 0.606 | – |
| PP | 0.7795 (0.7776–0.7814) | | 0.615 | 3.588e-09 |
| PP | 0.7804 (0.7786–0.7823) | 0.421 | 0.599 | 0.4463 |
| PP | | 0.437 | | 3.3e-15 |
A@10 refers to the percentage of patient visits whose ten most similar patient visits contained at least one patient visit sharing a primary diagnosis. The P-value is calculated by a Mann-Whitney U test on the rank of true matches compared to PP.
Bold values indicate the greatest values for AUC, MRR, and A@10.
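As a rough illustration of the ranking metrics, the sketch below computes A@k as defined above, together with MRR, which is assumed here to denote the standard mean reciprocal rank. The input is assumed to be, for each query visit, the primary diagnoses of the other visits ordered from most to least similar; all function names are hypothetical and this is not the authors' evaluation code.

```python
def first_match_rank(ranked_diagnoses, true_diagnosis):
    """Return the 1-based rank of the first neighbour sharing the diagnosis.

    Returns None when no neighbour in the list shares it.
    """
    for rank, diagnosis in enumerate(ranked_diagnoses, start=1):
        if diagnosis == true_diagnosis:
            return rank
    return None


def accuracy_at_k(first_ranks, k=10):
    """Fraction of queries whose first true match falls within the top k."""
    hits = sum(1 for rank in first_ranks if rank is not None and rank <= k)
    return hits / len(first_ranks)


def mean_reciprocal_rank(first_ranks):
    """Mean of 1/rank over all queries; queries with no match contribute 0."""
    total = sum(1.0 / rank for rank in first_ranks if rank is not None)
    return total / len(first_ranks)


# Illustrative data only: three query visits with made-up diagnosis labels.
first_ranks = [
    first_match_rank(["dx_a", "dx_b", "dx_c"], "dx_b"),  # first match at rank 2
    first_match_rank(["dx_a"], "dx_z"),                   # no match
    first_match_rank(["dx_x", "dx_y"], "dx_x"),           # first match at rank 1
]

a_at_10 = accuracy_at_k(first_ranks, k=10)
mrr = mean_reciprocal_rank(first_ranks)
```

Lists of first-match ranks from two profile sets could then be compared with a Mann-Whitney U test, for example via `scipy.stats.mannwhitneyu`.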