Abstract
Word sense disambiguation (WSD) is an important step in biomedical text mining: it assigns an unequivocal concept to an ambiguous term, which improves the accuracy of biomedical information extraction systems. In this work we followed supervised and knowledge-based disambiguation approaches, with the best results obtained by supervised means. In the supervised method we used bag-of-words as local features and word embeddings as global features. In the knowledge-based method we combined word embeddings, textual concept definitions extracted from the UMLS database, and concept association values calculated from MeSH co-occurrence counts in MEDLINE articles. Also in the knowledge-based method, we tested different word embedding averaging functions to calculate the surrounding context vectors, with the goal of giving more weight to the words closest to the ambiguous term. We evaluated our methods on the MSH WSD dataset, the most common dataset for evaluating biomedical concept disambiguation. We obtained a top accuracy of 95.6% by supervised means, while the best knowledge-based accuracy was 87.4%. Our results show that word embedding models improved disambiguation accuracy, proving to be a powerful resource for the WSD task.
Keywords: Biomedical text mining; information extraction; word embeddings
Year: 2017 PMID: 29236676 PMCID: PMC6042812 DOI: 10.1515/jib-2017-0051
Source DB: PubMed Journal: J Integr Bioinform ISSN: 1613-4516
Supervised learning WSD accuracies (standard deviations) with bag-of-words as local features.
| | U | B | U + B |
|---|---|---|---|
| DT | 0.9067 (0.0030) | 0.8335 (0.0045) | 0.9019 (0.0018) |
| kNN | 0.9324 (0.0017) | 0.8850 (0.0043) | 0.9354 (0.0019) |
| LR | 0.9205 (0.0025) | 0.8704 (0.0018) | 0.9101 (0.0024) |
| MLP | 0.9401 (0.0013) | 0.9224 (0.0010) | 0.9445 (0.0022) |
| SVM | 0.9511 (0.0013) | 0.9253 (0.0028) |
Accuracies are the average across five folds. Five classifiers were tested. U, unigrams; B, bigrams; DT, decision tree; kNN, k-nearest neighbor (k = 5); LR, logistic regression; MLP, multi-layer perceptron; SVM, support vector machine. The top accuracy is shown in bold.
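As a rough illustration of the U, B, and U + B feature sets named in the note above, the sketch below counts unigrams and bigrams in a tokenized context. The function name `ngram_counts`, the whitespace tokenization, and the example sentence are assumptions made for illustration; the source does not describe the actual preprocessing or vectorization.

```python
from collections import Counter


def ngram_counts(tokens, use_unigrams=True, use_bigrams=True):
    """Count unigram and/or bigram features in a tokenized context.

    Hypothetical helper: the real feature extraction is not given in the source.
    """
    counts = Counter()
    if use_unigrams:
        counts.update(tokens)
    if use_bigrams:
        # Join each pair of adjacent tokens into one feature name.
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return counts


# Toy context (assumed example, not taken from the MSH WSD dataset).
tokens = "cold air can worsen a common cold".split()
unigram_features = ngram_counts(tokens, use_bigrams=False)  # U
bigram_features = ngram_counts(tokens, use_unigrams=False)  # B
combined_features = ngram_counts(tokens)                    # U + B
```

Each resulting counter could then be turned into a sparse feature vector for any of the five classifiers.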
Knowledge-based WSD accuracies (standard deviations) using CUI association values (MeSH term co-occurrences), CUI definitions (UMLS), and word embeddings.
| | S100, W5 | S100, W20 | S100, W50 | S300, W5 | S300, W20 | S300, W50 |
|---|---|---|---|---|---|---|
| CS | 0.8318 (0.0020) | 0.8407 (0.0025) | 0.8468 (0.0034) | 0.8355 (0.0018) | 0.8477 (0.0014) | 0.8501 (0.0026) |
| nPMI ≥ 0.8 | 0.8304 (0.0018) | 0.8395 (0.0023) | 0.8461 (0.0030) | 0.8340 (0.0017) | 0.8466 (0.0013) | 0.8491 (0.0023) |
| nPMI ≥ 0.5 | 0.8155 (0.0029) | 0.8290 (0.0041) | 0.8343 (0.0042) | 0.8190 (0.0030) | 0.8323 (0.0042) | 0.8352 (0.0040) |
| nPMI ≥ 0.3 | 0.8560 (0.0012) | 0.8704 (0.0013) | 0.8573 (0.0010) | 0.8705 (0.0018) | 0.8719 (0.0025) | |
IDF word embedding averaging with logarithmic decay, f(d) = 1/ln(1 + d), was used to calculate the surrounding context vectors. Accuracies are the average across five folds. S, size; W, window; CS, cosine similarity between term context vector and concept vector only; nPMI, normalized pointwise mutual information; nPMI ≥ thresh, cosine similarity plus related concepts with an nPMI value higher than the threshold. The top accuracy is shown in bold.
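One way to read the IDF averaging with logarithmic decay described in the note above is as a weighted mean of the embeddings of neighbouring words, where each weight is the word's IDF multiplied by f(d). The sketch below follows that reading. It is an assumption-laden illustration: the names `context_vector`, `context_window`, `embeddings`, and `idf` are hypothetical, the normalization by the sum of weights is assumed, and `context_window` is a separate parameter that should not be confused with the W columns of the table.

```python
import math


def log_decay(d):
    """Logarithmic decay from the table note: f(d) = 1 / ln(1 + d)."""
    return 1.0 / math.log(1.0 + d)


def context_vector(tokens, target_index, embeddings, idf, context_window,
                   decay=log_decay):
    """Weighted average of the embeddings of words around the ambiguous term.

    Assumptions: weight = IDF * decay(distance); the result is divided by the
    sum of weights; the ambiguous term and unknown words are skipped.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    lo = max(0, target_index - context_window)
    hi = min(len(tokens), target_index + context_window + 1)
    for i in range(lo, hi):
        word = tokens[i]
        if i == target_index or word not in embeddings:
            continue
        d = abs(i - target_index)  # token distance, always >= 1 here
        w = idf.get(word, 1.0) * decay(d)
        total = [t + w * x for t, x in zip(total, embeddings[word])]
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no usable neighbours: return the zero vector
    return [t / weight_sum for t in total]


# Toy two-dimensional embeddings (assumed values for illustration only).
toy_embeddings = {"near": [1.0, 0.0], "far": [0.0, 1.0]}
toy_idf = {"near": 1.0, "far": 1.0}
toy_tokens = ["near", "TERM", "filler", "far"]
vec = context_vector(toy_tokens, 1, toy_embeddings, toy_idf, context_window=5)
```

With equal IDF values, the word at distance 1 contributes more than the word at distance 2, so the first component of `vec` is the larger one.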
Supervised learning WSD accuracies (standard deviations) with word embeddings as global features.
| | S100, W5 | S100, W20 | S100, W50 | S300, W5 | S300, W20 | S300, W50 |
|---|---|---|---|---|---|---|
| DT | 0.9219 (0.0017) | 0.9185 (0.0030) | 0.9194 (0.0033) | 0.9186 (0.0013) | 0.9186 (0.0025) | 0.9166 (0.0017) |
| kNN | 0.9452 (0.0024) | 0.9452 (0.0024) | 0.9447 (0.0017) | 0.9449 (0.0019) | 0.9444 (0.0023) | 0.9441 (0.0025) |
| LR | 0.9500 (0.0013) | 0.9495 (0.0008) | 0.9495 (0.0011) | 0.9505 (0.0012) | 0.9508 (0.0008) | 0.9509 (0.0013) |
| MLP | 0.9503 (0.0011) | 0.9498 (0.0016) | 0.9501 (0.0012) | 0.9503 (0.0010) | 0.9508 (0.0016) | |
| SVM | 0.9449 (0.0018) | 0.9452 (0.0026) | 0.9431 (0.0012) | 0.9452 (0.0025) | 0.9446 (0.0012) | 0.9444 (0.0008) |
Accuracies are the average across five folds. Five classifiers were tested. S, Size; W, window; DT, decision tree; kNN, k-nearest neighbor (k = 5); LR, logistic regression; MLP, multi-layer perceptron; SVM, support vector machine. The top accuracy is shown in bold.
Supervised learning WSD accuracies (standard deviations) with unigrams (bag-of-words) as local features and word embeddings as global features.
| | S100, W5 | S100, W20 | S100, W50 | S300, W5 | S300, W20 | S300, W50 |
|---|---|---|---|---|---|---|
| DT | 0.9244 (0.0018) | 0.9215 (0.0031) | 0.9229 (0.0038) | 0.9218 (0.0028) | 0.9194 (0.0016) | 0.9191 (0.0035) |
| kNN | 0.9464 (0.0024) | 0.9468 (0.0026) | 0.9467 (0.0022) | 0.9475 (0.0017) | 0.9473 (0.0026) | 0.9468 (0.0021) |
| LR | 0.9515 (0.0013) | 0.9514 (0.0010) | 0.9515 (0.0008) | 0.9519 (0.0012) | 0.9524 (0.0011) | 0.9520 (0.0011) |
| MLP | 0.9556 (0.0006) | 0.9555 (0.0003) | 0.9544 (0.0011) | 0.9550 (0.0009) | 0.9545 (0.0011) | |
| SVM | 0.9490 (0.0008) | 0.9486 (0.0011) | 0.9481 (0.0015) | 0.9499 (0.0013) | 0.9496 (0.0009) | 0.9482 (0.0016) |
Accuracies are the average across five folds. Five classifiers were tested. S, Size; W, window; DT, decision tree; kNN, k-nearest neighbor (k = 5); LR, logistic regression; MLP, multi-layer perceptron; SVM, support vector machine. The top accuracy is shown in bold.
Knowledge-based WSD accuracies (standard deviations) using CUI association values (MeSH term co-occurrences), CUI definitions (UMLS), and word embeddings.
| | S100, W5 | S100, W20 | S100, W50 | S300, W5 | S300, W20 | S300, W50 |
|---|---|---|---|---|---|---|
| CS | 0.8144 (0.0012) | 0.8254 (0.0010) | 0.8321 (0.0026) | 0.8181 (0.0010) | 0.8319 (0.0015) | 0.8337 (0.0039) |
| nPMI ≥ 0.8 | 0.8132 (0.0014) | 0.8243 (0.0011) | 0.8314 (0.0024) | 0.8168 (0.0008) | 0.8312 (0.0016) | 0.8332 (0.0036) |
| nPMI ≥ 0.5 | 0.8005 (0.0041) | 0.8152 (0.0038) | 0.8197 (0.0038) | 0.8030 (0.0037) | 0.8174 (0.0031) | 0.8209 (0.0034) |
| nPMI ≥ 0.3 | 0.8430 (0.0022) | 0.8573 (0.0022) | 0.8446 (0.0026) | 0.8566 (0.0025) | 0.8582 (0.0015) | |
TF-IDF word embedding averaging was used to calculate the surrounding context vectors. Accuracies are the average across five folds. S, size; W, window; CS, cosine similarity between term context vector and concept vector only; nPMI, normalized pointwise mutual information; nPMI ≥ thresh, cosine similarity plus related concepts with an nPMI value higher than the threshold. The top accuracy is shown in bold.
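The nPMI rows rely on normalized pointwise mutual information, whose standard definition is nPMI(x, y) = ln(p(x, y) / (p(x) p(y))) / (−ln p(x, y)). The sketch below computes it from raw co-occurrence counts and filters concepts by a threshold. The function names, the count layout, and the toy counts are assumptions; the source does not specify how the MeSH co-occurrence counts were stored or processed.

```python
import math


def npmi(count_xy, count_x, count_y, total):
    """Normalized PMI in [-1, 1], computed from document counts."""
    if count_xy == 0:
        return -1.0  # limiting value when the two concepts never co-occur
    p_xy = count_xy / total
    if p_xy == 1.0:
        return 1.0  # assumed convention: avoids dividing by ln(1) = 0
    p_x = count_x / total
    p_y = count_y / total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


def related_concepts(target, counts, cooccurrences, total, threshold):
    """Concepts whose nPMI with `target` reaches the threshold."""
    related = []
    for other, count_xy in cooccurrences.get(target, {}).items():
        score = npmi(count_xy, counts[target], counts[other], total)
        if score >= threshold:
            related.append(other)
    return related


# Toy counts with hypothetical concept identifiers.
toy_counts = {"concept_a": 10, "concept_b": 10, "concept_c": 10}
toy_cooc = {"concept_a": {"concept_b": 10, "concept_c": 1}}
kept = related_concepts("concept_a", toy_counts, toy_cooc, total=100, threshold=0.3)
```

In this toy setup `concept_b` always co-occurs with `concept_a` (nPMI close to 1) and is kept, while `concept_c` co-occurs at the rate expected by chance (nPMI close to 0) and is dropped.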
Knowledge-based WSD accuracies (standard deviations) using CUI association values (MeSH term co-occurrences), CUI definitions (UMLS), and word embeddings.
| | S100, W5 | S100, W20 | S100, W50 | S300, W5 | S300, W20 | S300, W50 |
|---|---|---|---|---|---|---|
| CS | 0.8415 (0.0022) | 0.8473 (0.0018) | 0.8502 (0.0033) | 0.8457 (0.0019) | 0.8533 (0.0019) | 0.8533 (0.0037) |
| nPMI ≥ 0.8 | 0.8395 (0.0022) | 0.8459 (0.0024) | 0.8493 (0.0032) | 0.8438 (0.0019) | 0.8515 (0.0023) | 0.8518 (0.0039) |
| nPMI ≥ 0.5 | 0.8234 (0.0013) | 0.8348 (0.0012) | 0.8376 (0.0015) | 0.8267 (0.0028) | 0.8377 (0.0030) | 0.8396 (0.0031) |
| nPMI ≥ 0.3 | 0.8617 (0.0017) | 0.8720 (0.0016) | 0.8622 (0.0020) | 0.8730 (0.0025) | 0.8736 (0.0021) | |
IDF word embedding averaging with fractional decay, f(d) = 1/d, was used to calculate the surrounding context vectors. Accuracies are the average across five folds. S, size; W, window; CS, cosine similarity between term context vector and concept vector only; nPMI, normalized pointwise mutual information; nPMI ≥ thresh, cosine similarity plus related concepts with an nPMI value higher than the threshold. The top accuracy is shown in bold.
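The CS row compares the term context vector with each candidate concept vector by cosine similarity. A minimal sketch of that selection step is given below, assuming the concept vectors have already been built (for example from the UMLS definitions); `disambiguate` and the sense labels are hypothetical names, and choosing the highest-scoring candidate is an assumed decision rule.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def disambiguate(context_vec, concept_vecs):
    """Return the candidate concept most similar to the context vector."""
    return max(concept_vecs, key=lambda c: cosine(context_vec, concept_vecs[c]))


# Toy candidate senses with assumed two-dimensional vectors.
toy_concepts = {"sense_1": [1.0, 0.0], "sense_2": [0.0, 1.0]}
chosen = disambiguate([0.9, 0.1], toy_concepts)
```

Here the context vector points mostly along the first axis, so `sense_1` is selected.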
Knowledge-based WSD accuracies (standard deviations) using CUI association values (MeSH term co-occurrences), CUI definitions (UMLS), and word embeddings.
| | S100, W5 | S100, W20 | S100, W50 | S300, W5 | S300, W20 | S300, W50 |
|---|---|---|---|---|---|---|
| CS | 0.8259 (0.0013) | 0.8270 (0.0031) | 0.8278 (0.0036) | 0.8278 (0.0018) | 0.8302 (0.0024) | 0.8276 (0.0022) |
| nPMI ≥ 0.8 | 0.8236 (0.0011) | 0.8255 (0.0032) | 0.8264 (0.0033) | 0.8255 (0.0021) | 0.8283 (0.0028) | 0.8264 (0.0023) |
| nPMI ≥ 0.5 | 0.8057 (0.0022) | 0.8137 (0.0007) | 0.8150 (0.0027) | 0.8092 (0.0035) | 0.8162 (0.0017) | 0.8168 (0.0021) |
| nPMI ≥ 0.3 | 0.8378 (0.0022) | 0.8458 (0.0032) | 0.8459 (0.0030) | 0.8404 (0.0029) | 0.8469 (0.0027) | |
IDF word embedding averaging with exponential decay, f(d) = exp(−d), was used to calculate the surrounding context vectors. Accuracies are the average across five folds. S, size; W, window; CS, cosine similarity between term context vector and concept vector only; nPMI, normalized pointwise mutual information; nPMI ≥ thresh, cosine similarity plus related concepts with an nPMI value higher than the threshold. The top accuracy is shown in bold.
Knowledge-based WSD accuracies (standard deviations) using CUI association values (MeSH term co-occurrences), CUI definitions (UMLS), and word embeddings.
| | S100, W5 | S100, W20 | S100, W50 | S300, W5 | S300, W20 | S300, W50 |
|---|---|---|---|---|---|---|
| CS | 0.8164 (0.0024) | 0.8286 (0.0011) | 0.8341 (0.0024) | 0.8203 (0.0017) | 0.8352 (0.0023) | 0.8365 (0.0034) |
| nPMI ≥ 0.8 | 0.8154 (0.0024) | 0.8277 (0.0008) | 0.8334 (0.0020) | 0.8193 (0.0012) | 0.8343 (0.0020) | 0.8357 (0.0028) |
| nPMI ≥ 0.5 | 0.8019 (0.0043) | 0.8178 (0.0031) | 0.8236 (0.0040) | 0.8057 (0.0043) | 0.8203 (0.0032) | 0.8245 (0.0025) |
| nPMI ≥ 0.3 | 0.8458 (0.0023) | 0.8600 (0.0018) | 0.8471 (0.0019) | 0.8598 (0.0014) | 0.8611 (0.0010) | |
IDF word embedding averaging with no decay, f(d) = 1, was used to calculate the surrounding context vectors. Accuracies are the average across five folds. S, size; W, window; CS, cosine similarity between term context vector and concept vector only; nPMI, normalized pointwise mutual information; nPMI ≥ thresh, cosine similarity plus related concepts with an nPMI value higher than the threshold. The top accuracy is shown in bold.
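To compare how quickly the four decay functions quoted in the table notes discount distant words, the sketch below prints each function's weights for distances 1 to 5, scaled so that the weight at d = 1 equals 1. The scaling, the distance range, and the helper name `relative_weights` are illustrative assumptions and are not part of the source.

```python
import math

# Decay functions quoted in the table notes; d is the token distance (d >= 1).
DECAYS = {
    "logarithmic": lambda d: 1.0 / math.log(1.0 + d),
    "fractional": lambda d: 1.0 / d,
    "exponential": lambda d: math.exp(-d),
    "none": lambda d: 1.0,
}


def relative_weights(decay, max_distance=5):
    """Weights for d = 1..max_distance, scaled so the weight at d = 1 is 1."""
    base = decay(1)
    return [decay(d) / base for d in range(1, max_distance + 1)]


for name, decay in DECAYS.items():
    print(name, [round(w, 3) for w in relative_weights(decay)])
```

On this scale the exponential decay falls off fastest, followed by the fractional and then the logarithmic decay, while the no-decay setting weights all distances equally.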