Yuan Luo, Ozlem Uzuner.
Abstract
The UMLS Semantic Network is constructed by experts and requires periodic expert review to update. We propose and implement a semi-supervised approach for automatically identifying UMLS semantic relations from narrative text in PubMed. Our method analyzes biomedical narrative text to collect semantic entity pairs, and extracts multiple semantic, syntactic and orthographic features for the collected pairs. We experiment with seeded k-means clustering with various distance metrics. We create and annotate a ground truth corpus according to the top two levels of the UMLS semantic relation hierarchy. We evaluate our system on this corpus and characterize the learning curves of different clustering configurations. KL divergence consistently performs best on the held-out test data. With full seeding, we obtain macro-averaged F-measures above 70% for clustering the top-level UMLS relations (2-way), and above 50% for clustering the second-level relations (7-way).
Year: 2014 PMID: 25954580 PMCID: PMC4419772
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
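As a rough illustration of the seeded k-means clustering with KL divergence named in the abstract, the sketch below initializes one centroid per relation from labeled seed pairs and then alternates assignment and centroid updates. It is not the authors' implementation: the function names, the additive smoothing constant `eps`, the direction KL(point || centroid), and the choice to let seed labels be reassigned during clustering are all assumptions made for illustration.

```python
import math


def normalize(vec, eps=1e-9):
    """Smooth a non-negative feature vector and rescale it to sum to 1.

    The smoothing constant eps is an assumption; it keeps every entry
    strictly positive so the logarithm in KL divergence is defined.
    """
    smoothed = [v + eps for v in vec]
    total = sum(smoothed)
    return [v / total for v in smoothed]


def kl_divergence(p, q):
    """KL(p || q) for two strictly positive distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def mean_vector(vectors):
    """Component-wise arithmetic mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def seeded_kmeans(points, seeds, k, max_iter=100):
    """Cluster points into k groups, starting from labeled seeds.

    points: list of non-negative feature vectors (hypothetical features).
    seeds:  dict mapping point index -> cluster id in range(k);
            every cluster is assumed to have at least one seed.
    Seeds only initialize the centroids here; their labels may change later.
    """
    dists = [normalize(p) for p in points]
    # Initial centroid of each cluster = mean of its seed points.
    centroids = [
        mean_vector([dists[i] for i, c in seeds.items() if c == cluster])
        for cluster in range(k)
    ]
    labels = [None] * len(dists)
    for _ in range(max_iter):
        # Assignment step: nearest centroid under KL(point || centroid).
        new_labels = [
            min(range(k), key=lambda c: kl_divergence(d, centroids[c]))
            for d in dists
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: recompute each non-empty cluster's centroid.
        for cluster in range(k):
            members = [d for d, lab in zip(dists, labels) if lab == cluster]
            if members:
                centroids[cluster] = mean_vector(members)
    return labels
```

With toy vectors such as `[[9, 1, 0], [8, 2, 0], [0, 1, 9], [0, 2, 8]]` and seeds `{0: 0, 2: 1}`, the unseeded points join the cluster of the seed whose distribution they resemble. The arithmetic mean is used for the update because it minimizes the summed KL(point || centroid) within a cluster.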
Semantic relation distribution. AW and ISA are top-level UMLS relations; the rest are in the second level. Note that 174 AW instances do not fall under any of the second-level *RT relations.
| Relation | Count |
|---|---|
| Associated_with (AW) | 9561 |
| Spatially_related_to (SRT) | 488 |
| Functionally_related_to (FRT) | 4719 |
| Conceptually_related_to (CRT) | 3177 |
| Physically_related_to (PRT) | 506 |
| Temporally_related_to (TRT) | 497 |
| Isa (ISA) | 397 |
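The counts in the table above can be cross-checked against the caption's statement about 174 AW-only instances. The short sketch below does that arithmetic, assuming (as the caption implies) that every *RT instance is also counted under AW; the variable names are illustrative only.

```python
# Counts copied from the relation distribution table.
counts = {
    "AW": 9561,
    "SRT": 488,
    "FRT": 4719,
    "CRT": 3177,
    "PRT": 506,
    "TRT": 497,
    "ISA": 397,
}

second_level = ["SRT", "FRT", "CRT", "PRT", "TRT"]

# Instances covered by the five second-level *RT relations.
rt_total = sum(counts[name] for name in second_level)

# AW instances left over once the *RT instances are subtracted.
aw_only = counts["AW"] - rt_total

# Total annotated instances at the top level (AW plus ISA).
top_level_total = counts["AW"] + counts["ISA"]

# Share of the minority ISA class at the top level.
isa_share = counts["ISA"] / top_level_total

print(rt_total, aw_only, top_level_total)  # 9387 174 9958
print(f"ISA share: {isa_share:.1%}")       # ISA share: 4.0%
```

The roughly 4% ISA share shows how imbalanced the 2-way task is, which is worth keeping in mind when comparing the per-class results for AW and ISA.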
Figure 1. System workflow.
Figure 2. Link grammar output of an example sentence. In this example, multi-word phrases are highlighted in blue and inter-phrase links are highlighted in red. Two example link paths are also highlighted, in green and yellow respectively.
Figure 3. Macro-averaged and micro-averaged Precision, Recall and F-measure on 2-way and 7-way relations using KL divergence as the distance metric. Results are averaged over 30 runs; confidence intervals at α=0.05 are also shown, and most of them are small, suggesting statistical stability.
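One common way to obtain a confidence interval at α=0.05 from 30 runs is a two-sided Student-t interval around the mean score. The sketch below shows that calculation under stated assumptions: the source does not say which interval construction was used, the function name is hypothetical, and the example scores are made-up placeholders rather than results from the paper.

```python
import math
import statistics

# Two-sided Student-t critical value for alpha = 0.05 with 29 degrees of
# freedom (30 runs). Using a t interval at all is an assumption.
T_CRIT_DF29 = 2.045


def mean_confidence_interval(scores, t_crit=T_CRIT_DF29):
    """Return (mean, lower, upper) for a list of per-run scores."""
    mean = statistics.mean(scores)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    half_width = t_crit * sem
    return mean, mean - half_width, mean + half_width


# Placeholder F-measures for 30 hypothetical runs (not data from the paper).
example_scores = [0.70] * 15 + [0.72] * 15
mean, lower, upper = mean_confidence_interval(example_scores)
print(f"mean={mean:.4f}, interval=({lower:.4f}, {upper:.4f})")
```

A narrow interval relative to the mean, as in this toy input, is what the caption describes as statistical stability.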
Performance per class of k-means clustering using KL divergence with random seeds (fractions 10% to 50%) on the relations from the top level of the UMLS Semantic Network.
| Seed | Relation | Precision | Recall | F-measure |
|---|---|---|---|---|
| 10% | AW | 96.45% | 98.83% | 97.62% |
| | ISA | 30.78% | 10.94% | 15.78% |
| 20% | AW | 96.79% | 98.83% | 97.80% |
| | ISA | 42.62% | 19.83% | 26.64% |
| 30% | AW | 96.88% | 98.61% | 97.74% |
| | ISA | 41.09% | 22.31% | 28.54% |
| 40% | AW | 97.05% | 98.50% | 97.77% |
| | ISA | 42.84% | 26.75% | 32.64% |
| 50% | AW | 97.16% | 98.55% | 97.85% |
| | ISA | 45.81% | 29.53% | 35.78% |
Performance per class of k-means clustering using KL divergence with random seeds (fractions 10% to 50%) on the relations from the second level of the UMLS Semantic Network. AW here includes AW instances that do not fall into *RT categories.
| Seed | Relation | Precision | Recall | F-measure |
|---|---|---|---|---|
| 10% | AW | 27.50% | 8.43% | 12.20% |
| | CRT | 49.35% | 55.02% | 51.93% |
| | FRT | 57.92% | 64.77% | 61.12% |
| | ISA | 36.39% | 15.68% | 21.47% |
| | PRT | 21.04% | 10.83% | 14.09% |
| | SRT | 35.90% | 21.31% | 26.39% |
| | TRT | 22.38% | 13.71% | 16.73% |
| 20% | AW | 33.56% | 16.96% | 21.79% |
| | CRT | 54.12% | 58.30% | 56.09% |
| | FRT | 60.41% | 67.25% | 63.63% |
| | ISA | 43.84% | 18.76% | 26.13% |
| | PRT | 26.91% | 15.17% | 19.21% |
| | SRT | 44.19% | 30.79% | 36.05% |
| | TRT | 26.21% | 18.61% | 21.50% |
| 30% | AW | 34.11% | 16.76% | 21.54% |
| | CRT | 56.80% | 59.90% | 58.28% |
| | FRT | 62.63% | 68.56% | 65.44% |
| | ISA | 47.86% | 25.09% | 32.65% |
| | PRT | 30.22% | 19.57% | 23.57% |
| | SRT | 50.84% | 38.69% | 43.78% |
| | TRT | 28.47% | 23.20% | 25.35% |
| 40% | AW | 36.77% | 20.39% | 25.41% |
| | CRT | 59.03% | 63.00% | 60.94% |
| | FRT | 64.05% | 68.99% | 66.42% |
| | ISA | 52.89% | 29.44% | 37.65% |
| | PRT | 34.19% | 22.57% | 27.01% |
| | SRT | 51.68% | 43.30% | 47.01% |
| | TRT | 29.86% | 23.98% | 26.47% |
| 50% | AW | 36.15% | 23.43% | 28.01% |
| | CRT | 60.55% | 65.17% | 62.77% |
| | FRT | 66.11% | 69.55% | 67.78% |
| | ISA | 55.49% | 33.80% | 41.86% |
| | PRT | 37.13% | 24.73% | 29.58% |
| | SRT | 55.04% | 49.97% | 52.26% |
| | TRT | 32.47% | 27.28% | 29.54% |
Performance per class of k-means clustering using KL divergence with random seeds (fractions 60% to 100%) on the relations from the second level of the UMLS Semantic Network. See Table 1 for notation on abbreviations.
| Seed | Relation | Precision | Recall | F-measure |
|---|---|---|---|---|
| 60% | AW | 39.19% | 25.69% | 30.38% |
| | CRT | 62.59% | 66.51% | 64.48% |
| | FRT | 67.42% | 70.53% | 68.93% |
| | ISA | 55.60% | 35.47% | 43.18% |
| | PRT | 40.10% | 28.87% | 33.46% |
| | SRT | 58.65% | 56.80% | 57.57% |
| | TRT | 34.28% | 28.74% | 31.16% |
| 70% | AW | 40.30% | 27.06% | 31.80% |
| | CRT | 63.34% | 68.07% | 65.61% |
| | FRT | 68.46% | 70.49% | 69.45% |
| | ISA | 56.28% | 36.71% | 44.36% |
| | PRT | 39.91% | 29.17% | 33.64% |
| | SRT | 58.41% | 58.45% | 58.35% |
| | TRT | 33.87% | 29.46% | 31.44% |
| 80% | AW | 35.76% | 26.08% | 29.75% |
| | CRT | 64.35% | 69.26% | 66.71% |
| | FRT | 70.13% | 70.96% | 70.54% |
| | ISA | 56.96% | 39.19% | 46.36% |
| | PRT | 42.92% | 32.83% | 37.16% |
| | SRT | 61.37% | 65.22% | 63.17% |
| | TRT | 34.92% | 31.09% | 32.86% |
| 90% | AW | 38.06% | 28.33% | 31.95% |
| | CRT | 65.31% | 70.87% | 67.97% |
| | FRT | 71.43% | 71.20% | 71.31% |
| | ISA | 58.41% | 41.15% | 48.26% |
| | PRT | 44.52% | 35.70% | 39.60% |
| | SRT | 62.01% | 67.66% | 64.68% |
| | TRT | 34.89% | 31.33% | 33.00% |
| 100% | AW | 34.48% | 29.41% | 31.75% |
| | CRT | 66.09% | 72.24% | 69.03% |
| | FRT | 72.71% | 71.40% | 72.05% |
| | ISA | 58.93% | 42.31% | 49.25% |
| | PRT | 45.00% | 36.00% | 40.00% |
| | SRT | 63.89% | 71.13% | 67.32% |
| | TRT | 35.87% | 33.67% | 34.74% |
Performance per class of k-means clustering using KL divergence with random seeds (fractions 60% to 100%) on the relations from the top level of the UMLS Semantic Network. See Table 1 for notation on abbreviations.
| Seed | Relation | Precision | Recall | F-measure |
|---|---|---|---|---|
| 60% | AW | 97.29% | 98.62% | 97.95% |
| | ISA | 49.54% | 32.78% | 39.28% |
| 70% | AW | 97.39% | 98.46% | 97.92% |
| | ISA | 48.66% | 35.47% | 40.94% |
| 80% | AW | 97.43% | 98.48% | 97.95% |
| | ISA | 49.59% | 36.41% | 41.95% |
| 90% | AW | 97.49% | 98.36% | 97.92% |
| | ISA | 48.77% | 38.16% | 42.79% |
| 100% | AW | 97.55% | 98.32% | 97.94% |
| | ISA | 49.21% | 39.74% | 43.97% |
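To relate the per-class tables to the macro-averaged F-measures quoted in the abstract, the sketch below recomputes an F-measure from precision and recall and averages the per-class F-measures at 100% seeding. Two assumptions apply: macro-averaging is taken here as the unweighted mean of per-class F-measures, and because the reported figures are averages over 30 runs, recomputing F from an averaged precision and recall will not always reproduce the tabulated F exactly. The function names are illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted mean over classes, so small classes count as much as large ones."""
    return sum(values) / len(values)


# Per-class F-measures (in percent) at 100% seeding, copied from the tables.
top_level_f = [97.94, 43.97]                    # AW, ISA
second_level_f = [31.75, 69.03, 72.05, 49.25,   # AW, CRT, FRT, ISA,
                  40.00, 67.32, 34.74]          # PRT, SRT, TRT

macro_top = macro_average(top_level_f)
macro_second = macro_average(second_level_f)

# PRT at 100% seeding: precision 45.00%, recall 36.00%.
prt_f = f_measure(45.00, 36.00)

print(prt_f)  # 40.0
print(f"2-way macro F: {macro_top:.1f}%, 7-way macro F: {macro_second:.1f}%")
```

Under these assumptions the 2-way macro average comes out at roughly 71% and the 7-way at roughly 52%, in line with the "above 70%" and "above 50%" figures in the abstract. The 2-way figure is pulled down by the minority ISA class even though AW alone scores near 98%.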