Abstract
Converting information contained in natural language clinical text into computer-amenable structured representations can automate many clinical applications. As a step towards that goal, we present a method that could help convert novel clinical phrases into new expressions in SNOMED CT, a standard clinical terminology. Since expressions in SNOMED CT are written in terms of their relations with other SNOMED CT concepts, we formulate the important task of identifying relations between clinical phrases and SNOMED CT concepts. We present a machine learning approach for this task and, using the dataset of existing SNOMED CT relations, show that it performs well.
Keywords: SNOMED CT; clinical phrases; natural language processing; relation identification
Year: 2013 PMID: 23847425 PMCID: PMC3702194 DOI: 10.4137/BII.S11645
Source DB: PubMed Journal: Biomed Inform Insights ISSN: 1178-2226
Some examples of natural language clinical phrases and their corresponding SNOMED CT expressions.
| Natural language clinical phrase | SNOMED CT expression |
|---|---|
| (a) severe pain in the stomach | 116680003 |is a| = 22253000 |pain| |
| (b) neoplasm of right lower lobe of lung | 116680003 |is a| = 126713003 |neoplasm of lung| |
| (c) acute gastric ulcer with perforation | 116680003 |is a| = 95529005 |acute gastric ulcer| |
| (d) family history of aplastic anemia | 116680003 |is a| = 243796009 |situation with explicit context| |
Notes: The numbers are the SNOMED CT concept and relation identifiers; their natural language descriptions are shown for human readability. The “=” character separates the relation kind on its left side from the related concept on its right side.
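The notation described in the notes can be split into its four parts mechanically. The following is a minimal sketch, assuming each expression has exactly the single-relation form shown in the table (`id |term| = id |term|`); the names `Relation` and `parse_relation` are hypothetical and are not part of SNOMED CT or of the paper's system.

```python
import re
from typing import NamedTuple


class Relation(NamedTuple):
    """One 'relation = concept' pair (hypothetical container for illustration)."""
    relation_id: str
    relation_term: str
    concept_id: str
    concept_term: str


# Matches "<relation id> |<term>| = <concept id> |<term>|" as written in the table.
_PATTERN = re.compile(r"^\s*(\d+)\s*\|([^|]*)\|\s*=\s*(\d+)\s*\|([^|]*)\|\s*$")


def parse_relation(expression: str) -> Relation:
    """Split a single-relation expression into identifiers and descriptions."""
    match = _PATTERN.match(expression)
    if match is None:
        raise ValueError(f"not in 'id |term| = id |term|' form: {expression!r}")
    parts = [part.strip() for part in match.groups()]
    return Relation(*parts)


example = parse_relation("116680003 |is a| = 22253000 |pain|")
print(example.relation_term, "->", example.concept_term)
```

Only the identifiers carry meaning for a machine; the terms between the bars are kept here because the table shows them for readability.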
Maximum F-measures over the precision-recall curves obtained by the similarity baseline and by the trained system for the most frequent SNOMED CT relations.
| Relation | Similarity baseline (%) | Trained system (%) |
|---|---|---|
| Associated morphology (disorder, morphologic abnormality) | 66.67 | 84.95 |
| Causative agent (disorder, substance) | 82.94 | 90.98 |
| Finding site (disorder, body structure) | 66.67 | 87.13 |
| Finding site (finding, body structure) | 66.67 | 90.35 |
| Has active ingredient (product, substance) | 85.92 | 91.38 |
| Is a (body structure, body structure) | 79.82 | 90.24 |
| Is a (disorder, disorder) | 78.18 | 86.00 |
| Is a (finding, finding) | 78.98 | 88.18 |
| Is a (organism, organism) | 66.67 | 82.20 |
| Is a (procedure, procedure) | 73.49 | 87.87 |
| Is a (product, product) | 67.48 | 88.52 |
| Is a (substance, substance) | 66.67 | 74.30 |
| Part of (body structure, body structure) | 66.67 | 90.56 |
| Procedure site direct (procedure, body structure) | 66.67 | 89.26 |
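For readers unfamiliar with the metric in this table, the sketch below shows one common way to obtain a maximum F-measure over a precision-recall curve: rank the candidate pairs by score, treat every distinct score as a threshold, and keep the best F value. The function name `max_f_measure` and this thresholding procedure are assumptions for illustration; the excerpt does not state how the authors computed their curves.

```python
from typing import Sequence


def max_f_measure(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Largest F-measure reached over all score thresholds (illustrative sketch)."""
    total_positives = sum(1 for label in labels if label)
    if total_positives == 0:
        return 0.0
    # Highest scores first: each prefix is the set predicted positive at some threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best = 0.0
    true_positives = 0
    for index, (score, is_positive) in enumerate(ranked):
        true_positives += int(is_positive)
        # Tied scores cannot be separated by a threshold, so evaluate only
        # after the last item of each tie group.
        last_of_tie = index + 1 == len(ranked) or ranked[index + 1][0] != score
        if not last_of_tie or true_positives == 0:
            continue
        precision = true_positives / (index + 1)
        recall = true_positives / total_positives
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Uninformative scores on a half-positive sample: precision 0.5 at recall 1.0.
print(max_f_measure([0.5, 0.5, 0.5, 0.5], [True, False, True, False]))
```

As a point of arithmetic, precision 0.5 at recall 1.0 gives F = 2/3, or about 66.67%, which is the value several baseline rows show; whether those entries arose this way is not stated in this excerpt.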
Figure 1. Precision-recall curves for the “is a (procedure, procedure)” relation obtained using the similarity baseline and using the trained system.
Figure 2. Learning curve for the “is a (procedure, procedure)” relation obtained using the trained system.
Note: The similarity baseline is shown for comparison.
Ablation results showing contributions of the different types of implicit features corresponding to the different terms of the kernel in Equation 1.
| System | Average maximum F-measure (%) |
|---|---|
| Baseline | 72.39 |
| Without word-pairs features | 73.89 |
| Without similarity score feature | 84.27 |
| Without common words features | 85.54 |
| All features | 87.28 |
Notes: The numbers are the averages of the maximum F-measures across the 14 relations. The rows without word-pairs features, without similarity score feature, and without common words features correspond to omitting the first, second, and third terms, respectively, from Equation 1.
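The "Baseline" and "All features" rows can be checked against the per-relation table, since the notes define them as plain averages over the 14 relations. The sketch below copies the two columns from that table in row order; the helper name `average_max_f` is hypothetical.

```python
from statistics import mean
from typing import Sequence

# Maximum F-measures (%) per relation, in the row order of the per-relation table.
baseline = [66.67, 82.94, 66.67, 66.67, 85.92, 79.82, 78.18,
            78.98, 66.67, 73.49, 67.48, 66.67, 66.67, 66.67]
trained = [84.95, 90.98, 87.13, 90.35, 91.38, 90.24, 86.00,
           88.18, 82.20, 87.87, 88.52, 74.30, 90.56, 89.26]


def average_max_f(values: Sequence[float]) -> float:
    """Unweighted mean over relations, rounded to two decimals as in the table."""
    return round(mean(values), 2)


print(average_max_f(baseline), average_max_f(trained))  # prints: 72.39 87.28
```

Both results match the ablation table, which suggests the average is unweighted across relations rather than weighted by the number of examples per relation.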