Yan Wang, Zhiyuan Liu, Maosong Sun.
Abstract
Combined with neural language models, distributed word representations achieve significant advantages in computational linguistics and text mining. Most existing models estimate distributed word vectors from large-scale data in an unsupervised fashion, but they do not take rich linguistic knowledge into consideration. Linguistic knowledge can be represented as either link-based knowledge or preference-based knowledge, and we propose knowledge regularized word representation models (KRWR) to incorporate this prior knowledge into the learning of distributed word representations. Experimental results demonstrate that our estimated word representations achieve better performance in the task of semantic relatedness ranking. This indicates that our methods can efficiently encode both prior knowledge from knowledge bases and statistical knowledge from large-scale text corpora into a unified word representation model, which will benefit many tasks in text mining.
Year: 2015 PMID: 25874581 PMCID: PMC4395361 DOI: 10.1371/journal.pone.0118437
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. CBOW model.
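As background for Fig 1, CBOW predicts a target word from the average of its context word vectors followed by a softmax over the vocabulary. A minimal sketch of the forward pass (the toy vocabulary, dimension, and initialization are illustrative, not from the paper):

```python
import numpy as np

# Toy vocabulary and randomly initialized embeddings (sizes chosen arbitrarily).
vocab = {"the": 0, "bank": 1, "approved": 2, "my": 3, "loan": 4}
dim = 8
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # input (context) vectors
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # output (target) vectors

def cbow_probs(context_words):
    """Average the context vectors, then score every vocabulary word via softmax."""
    h = W_in[[vocab[w] for w in context_words]].mean(axis=0)
    scores = W_out @ h
    e = np.exp(scores - scores.max())  # shift for numerical stability
    return e / e.sum()

# Probability distribution over the vocabulary given the context around "bank".
p = cbow_probs(["the", "approved", "my", "loan"])
```

Training would then maximize the probability assigned to the true target word (here "bank") over a large corpus.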
Fig 2. A small fragment of the topology in WordNet.
The word bank is related to many other words through synsets (represented with dashed circles) and hyponym/hypernym relationships (represented with arrows).
Fig 3. Link-based knowledge constructed according to the topology shown in Fig 2.
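The KRWR models incorporate such links as a regularizer on the word vectors; the paper defines the exact objective, but the following toy sketch (the loss form, names, and data are illustrative assumptions) shows the core idea of penalizing the distance between linked words' vectors so that a gradient step pulls them together:

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = {"bank": 0, "money": 1, "river": 2}
W = rng.normal(size=(len(vocab), 4))  # toy word vectors

links = [("bank", "money")]  # link-based knowledge: related word pairs

def link_regularizer(W, links, lam=0.1):
    """Penalize squared Euclidean distance between linked word vectors (assumed form)."""
    return lam * sum(np.sum((W[vocab[a]] - W[vocab[b]]) ** 2) for a, b in links)

# One gradient step on the regularizer alone moves linked vectors closer.
lam, lr = 0.1, 0.5
before = link_regularizer(W, links, lam)
a, b = vocab["bank"], vocab["money"]
grad = 2 * lam * (W[a] - W[b])  # gradient w.r.t. W[a]; W[b] gets the negative
W[a] -= lr * grad
W[b] += lr * grad
after = link_regularizer(W, links, lam)
```

In the full model this term would be added to the CBOW likelihood, trading corpus statistics against prior knowledge via the weight `lam`.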
Sample of University of South Florida Free Association Norms.
| Cue | Target | #G (group size) | #P (participants producing target) |
|---|---|---|---|
| bank | money | 144 | 115 |
| bank | account | 144 | 5 |
| bank | robber | 144 | 5 |
| bank | teller | 144 | 4 |
| bank | loan | 144 | 2 |
| bank | vault | 144 | 2 |
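Preference-based knowledge can be read off such norms: for a given cue, a target produced by more participants is preferred over one produced by fewer. A hypothetical sketch (the helper function is illustrative; the counts are the #P column above) turning the counts into preference triples:

```python
# #P counts for cue "bank" from the USF free-association sample above.
counts = {"money": 115, "account": 5, "robber": 5, "teller": 4, "loan": 2, "vault": 2}

def preference_pairs(cue, counts):
    """Yield (cue, preferred, less_preferred) triples for strictly ordered counts.

    Targets with equal counts (e.g. account/robber) produce no preference.
    """
    items = sorted(counts.items(), key=lambda kv: -kv[1])
    return [(cue, hi, lo)
            for i, (hi, ch) in enumerate(items)
            for lo, cl in items[i + 1:]
            if ch > cl]

pairs = preference_pairs("bank", counts)
# e.g. ("bank", "money", "vault"): money (115) is preferred over vault (2).
```

Such triples are what a preference-based regularizer would try to respect in the learned vector space.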
Number of Links in Different Datasets.
| Dataset | Training Set (thousand) | Testing Set (thousand) |
|---|---|---|
| XL | 125 | 5 |
| XXL | 598 | 30 |
| XXXL | 2640 | 100 |
| WN | 2000 | 80 |
| WAN | 68 | 3 |
Average Relatedness of Different Methods.
| Method | WN-Train | WN-Test |
|---|---|---|
| CBOW | 0.084 | 0.083 |
| JO-SPR | 0.432 | 0.423 |
| PO-SPR | 0.153 | 0.153 |
| PO-ER | | |
Average Relatedness on Different Datasets.
| Dataset | CBOW | Single Set | Multiple Sets |
|---|---|---|---|
| XL-Train | 0.199 | 0.321 | 0.286 |
| XL-Test | 0.199 | 0.271 | 0.282 |
| XXL-Train | 0.141 | 0.254 | 0.218 |
| XXL-Test | 0.142 | 0.224 | 0.213 |
| XXXL-Train | 0.090 | 0.206 | 0.143 |
| XXXL-Test | 0.091 | 0.189 | 0.139 |
| WN-Train | 0.084 | 0.432 | 0.278 |
| WN-Test | 0.083 | 0.423 | 0.273 |
| WAN-Train | 0.161 | 0.241 | 0.192 |
| WAN-Test | 0.161 | 0.205 | 0.189 |
Fig 4. Average relatedness.
Preference Classification Precision of Different Methods.
| Method | WANP-Train | WANP-Test |
|---|---|---|
| CBOW | 0.628 | 0.628 |
| JO-NSR | 0.746 | 0.717 |
| PO-MR | 0.942 | 0.914 |
Fig 5. Influence of iterations per link.
Spearman’s ρ on the WordSim-353 Dataset.
| Model | Knowledge | WS353 |
|---|---|---|
| Multiple Prototypes | - | 0.770 |
| Tiered Clustering | - | 0.769 |
| WN30G | WordNet | 0.660 |
| ESA | Wikipedia | 0.750 |
| CBOW | - | 0.734 |
| JO-SPR | XL | 0.756 |
| JO-SPR | XXL | 0.754 |
| JO-SPR | XXXL | 0.765 |
| JO-SPR | WN | 0.746 |
| JO-SPR | WAN | 0.758 |
| PO-ER | WN | 0.764 |
| JO-NSR | WANP | 0.734 |
| PO-MR | WANP | 0.707 |
| JO-SPR, PO-ER | XXXL, WN | |
| JO-SPR, PO-ER | WAN, WN | |
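The WordSim-353 evaluation above reports Spearman's ρ, i.e. the Pearson correlation of the rank-transformed human scores and model similarities. A self-contained sketch with made-up scores (the five word-pair ratings are illustrative, not from the paper):

```python
def rank(values):
    """1-based ranks, with ties assigned the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(a, b):
    """Spearman's ρ = Pearson correlation computed on the ranks of a and b."""
    ra, rb = rank(a), rank(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb)

# Hypothetical human ratings vs. model cosine similarities for five word pairs;
# the last two pairs are swapped in the model's ordering.
human = [9.1, 7.5, 6.2, 3.0, 1.1]
model = [0.82, 0.70, 0.55, 0.30, 0.40]
rho = spearman_rho(human, model)
```

Because ρ depends only on the orderings, a model is rewarded for ranking word pairs like humans do, not for matching the rating scale.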