Noha Alnazzawi, Paul Thompson, Sophia Ananiadou.
Abstract
Biomedical literature articles and narrative content from Electronic Health Records (EHRs) both constitute rich sources of disease-phenotype information. Phenotype concepts may be mentioned in text in multiple ways, using phrases with a variety of structures. This variability stems partly from the different backgrounds of the authors, but also from the different writing styles typically used in each text type. Since EHR narrative reports and literature articles contain different but complementary types of valuable information, combining details from each text type can help to uncover new disease-phenotype associations. However, the alternative ways in which the same concept may be mentioned in each source constitute a barrier to the automatic integration of information. Accordingly, identifying the unique concepts represented by phrases in text can help to bridge the gap between text types. We describe our development of a novel method, PhenoNorm, which integrates a number of different similarity measures to allow automatic linking of phenotype concept mentions to known concepts in the UMLS Metathesaurus, a biomedical terminological resource. PhenoNorm was developed using the PhenoCHF corpus, a collection of literature articles and EHR narratives annotated for phenotypic information relating to congestive heart failure (CHF). We evaluate the performance of PhenoNorm in linking CHF-related phenotype mentions to Metathesaurus concepts, using a newly enriched version of PhenoCHF in which each phenotype mention has an expert-verified link to a concept in the UMLS Metathesaurus. We show that PhenoNorm outperforms a number of alternative methods applied to the same task. Furthermore, we demonstrate PhenoNorm's wider utility by evaluating its ability to link mentions of various other types of medically related information, occurring in texts covering wider subject areas, to concepts in different terminological resources. We show that PhenoNorm can maintain its performance levels in these settings, and that its accuracy compares favourably to other methods applied to these tasks.
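To make the idea of linking a mention by combining several similarity measures concrete, here is a minimal Python sketch. It is not PhenoNorm. The two measures (token-set Jaccard overlap and the character ratio from the standard library's `difflib.SequenceMatcher`), the equal weights, the `link` helper and the toy dictionary with placeholder identifiers `C_A`, `C_B` and `C_C` are all hypothetical stand-ins for the method's actual measures and for real UMLS Metathesaurus entries.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Toy dictionary: placeholder IDs and names, NOT real UMLS Metathesaurus data.
TOY_DICTIONARY = {
    "C_A": "decreased cardiac output",
    "C_B": "left ventricular dilatation",
    "C_C": "hypotension",
}


def tokens(text: str) -> set[str]:
    # Lowercase and split on whitespace; punctuation handling is omitted.
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    # Token-set overlap, which ignores word order.
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def char_ratio(a: str, b: str) -> float:
    # Character-level similarity in [0, 1]; tolerant of small spelling differences.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined(a: str, b: str, w_tok: float = 0.5, w_char: float = 0.5) -> float:
    # Hypothetical weighted combination of the two measures (weights are arbitrary).
    return w_tok * jaccard(a, b) + w_char * char_ratio(a, b)


def link(mention: str, dictionary: dict[str, str]) -> tuple[str, str, float]:
    # Score every dictionary name and return (id, name, score) for the best one.
    # Raises ValueError if the dictionary is empty.
    concept_id, name = max(
        dictionary.items(), key=lambda item: combined(mention, item[1])
    )
    return concept_id, name, combined(mention, name)


if __name__ == "__main__":
    print(link("Cardiac output decreased", TOY_DICTIONARY))
```

With the toy dictionary, `link("Cardiac output decreased", TOY_DICTIONARY)` selects the `C_A` entry, because the token overlap ignores word order while the character ratio alone would penalise the reordering. A practical system would also need efficient candidate retrieval over a very large set of concept names and tuned weights, both of which this sketch omits.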
Year: 2016 PMID: 27643689 PMCID: PMC5028053 DOI: 10.1371/journal.pone.0162287
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Types and statistics of entity mentions annotated in the PhenoCHF corpus.
| Semantic categories | Description | # of annotated mentions in narrative EHR reports | # of annotated mentions in literature articles |
|---|---|---|---|
| | Any medical problem that contributes to the occurrence of CHF | 1320 | 1107 |
| | A condition that increases the chance of a patient having CHF | 1335 | 408 |
| | Any observable manifestation of a disease. Symptoms are subjective manifestations experienced by a patient and reported to a health professional. Signs are physical manifestations of a disease observed by someone other than the patient, e.g. by a physician using physical examination or diagnostic tests. | 2449 | 304 |
| | Conditions associated with abnormalities in kidney function that put a patient at higher risk of developing CHF | 308 | 329 |
Fig 1. PhenoNorm normalisation workflow.
Results of applying PhenoNorm to the PhenoCHF corpus.
| Phenotypic categories | Accuracy without post-processing | Accuracy with post-processing |
|---|---|---|
| | 0.899 | 0.907 |
| | 0.745 | 0.759 |
| | 0.789 | 0.835 |
| | 0.869 | 0.887 |
| | 0.825 | 0.847 |
| | 0.917 | 0.917 |
| | 0.878 | 0.889 |
| | 0.837 | 0.859 |
| | 0.869 | 0.880 |
| | 0.875 | 0.886 |
Comparison of MetaMap, BeCAS, SoftTFIDF and PhenoNorm when applied to PhenoCHF.
| Method | Accuracy (narrative EHR reports) | Accuracy (articles) |
|---|---|---|
| | 0.469 | 0.631 |
| | 0.187 | 0.353 |
| | 0.764 | 0.837 |
| | 0.847 | 0.886 |
Fig 2. The overlap between phenotype concepts appearing in EHR narratives and literature articles.
Examples of different ways of mentioning the same phenotype concepts in narrative EHR reports and literature articles.
| Type of variability | EHR narrative mentions (PhenoCHF) | Article mentions (PhenoCHF) |
|---|---|---|
| | Sodium overload; Drop in blood pressure | Hypernatremia; Hypotension |
| | Left ventricle is dilated; Mild mitral calcification | Left ventricular dilatation; Calcification of mitral valve |
| | Cardiac output decreased | Decreased cardiac output |
| | Hyperkalemic | Hyperkalemia |
| | Moderate left ventricular enlargement | Left ventricular enlargement |
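Several of the variability types in the table above are surface-level (word order, a severity modifier, an adjectival versus a noun ending), so a simple normalisation key can absorb them. The Python sketch below is an illustrative assumption rather than PhenoNorm's pre-processing: the `normalisation_key` and `normalise_token` helpers, the severity-modifier stop list and the single `-emic` to `-emia` suffix rule are hypothetical, chosen only to mirror rows of the table.

```python
from __future__ import annotations

import re

# Hypothetical rules chosen to mirror the example table; not PhenoNorm's own.
SEVERITY_MODIFIERS = {"mild", "moderate", "severe"}
SUFFIX_RULES = [("emic", "emia")]  # e.g. "hyperkalemic" -> "hyperkalemia"


def normalise_token(token: str) -> str:
    # Apply the first matching suffix rule, if any.
    for old, new in SUFFIX_RULES:
        if token.endswith(old):
            return token[: -len(old)] + new
    return token


def normalisation_key(mention: str) -> tuple[str, ...]:
    # Lowercase, keep alphabetic runs, drop severity modifiers,
    # normalise suffixes, then sort so that word order is ignored.
    words = re.findall(r"[a-z]+", mention.lower())
    kept = [normalise_token(w) for w in words if w not in SEVERITY_MODIFIERS]
    return tuple(sorted(kept))


if __name__ == "__main__":
    for ehr, article in [
        ("Cardiac output decreased", "Decreased cardiac output"),
        ("Hyperkalemic", "Hyperkalemia"),
        ("Moderate left ventricular enlargement", "Left ventricular enlargement"),
        ("Sodium overload", "Hypernatremia"),
    ]:
        print(ehr, "|", article, "->", normalisation_key(ehr) == normalisation_key(article))
```

Under these rules, "Cardiac output decreased" and "Decreased cardiac output" share the key `('cardiac', 'decreased', 'output')`, and "Hyperkalemic" and "Hyperkalemia" both map to `('hyperkalemia',)`. Pairs such as "Sodium overload" and "Hypernatremia" share no surface form, so no rule of this kind can link them; they require a terminological resource such as the UMLS Metathesaurus.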
Comparison of PhenoNorm against other approaches applied to the ShARe/CLEF corpus.
| Method | NER performance (F-score) | Normalisation accuracy of recognised entity mentions |
|---|---|---|
| | - | 0.83 |
| | 0.42 | 0.94 |
| | 0.68 | 0.87 |
| | 0.85 | 0.90 |
Micro-averaged performance comparison of PhenoNorm against other normalisation approaches applied to the NCBI disease corpus.
| Method | F-score | Accuracy |
|---|---|---|
| | 0.69 | 0.64 |
| | 0.33 | - |
| | 0.57 | - |
| | 0.59 | - |
| | 0.67 | - |
| | 0.78 | - |
Results of applying PhenoNorm to the heart failure and pulmonary embolism corpora.
| Method | Corpus | F-score | Accuracy |
|---|---|---|---|
| | | 0.76 | 0.77 |
| | | 0.83 | 0.86 |
| | | 0.84 | - |