Geert Heyman, Ivan Vulić, Marie-Francine Moens
Abstract
BACKGROUND: Bilingual lexicon induction (BLI) is an important task in the biomedical domain, as translation resources are usually available for general language usage but are often lacking in domain-specific settings. In this article we consider BLI as a classification problem and train a neural network composed of a combination of recurrent long short-term memory and deep feed-forward networks in order to obtain word-level and character-level representations.
Keywords: Bilingual lexicon induction; Biomedical text mining; Medical terminology; Representation learning
Year: 2018 PMID: 29986664 PMCID: PMC6038323 DOI: 10.1186/s12859-018-2245-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Comparable corpora. Excerpts of the English-Dutch comparable corpus in the biomedical domain that we used in the experiments, with a few domain-specific translations indicated in red
For each true translation pair (p_s, p_t), we create 2N noise or negative training pairs. These negative samples are generated by randomly sampling N target language words/phrases t_i, i = 1, …, N from the target vocabulary V_T and pairing them with the source language word/phrase p_s from the true translation pair. Similarly, we randomly sample N source language words/phrases and pair them with p_t to serve as negative samples. We then train the network by minimizing the cross-entropy loss, a commonly used loss function for classification that optimizes the likelihood of the training data. The loss function is expressed by Eq. 1, where D denotes the set of positive and negative examples used during training, where y denotes the binary label for a pair (1 for valid translation pairs, 0 otherwise), and where ŷ denotes the probability of a valid translation predicted by the network:

L = − Σ_{(p_s, p_t) ∈ D} [ y log ŷ + (1 − y) log(1 − ŷ) ]    (1)
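As a concrete illustration of this training regime, here is a minimal PyTorch sketch (not the authors' implementation; `model` is assumed to map candidate pairs to translation probabilities, and all names are illustrative):

```python
import random
import torch
import torch.nn as nn

def make_negative_pairs(pos_pair, source_vocab, target_vocab, n):
    """For one true pair (p_s, p_t), create 2N negatives: N pairs
    (p_s, random target) plus N pairs (random source, p_t)."""
    p_s, p_t = pos_pair
    negs = [(p_s, random.choice(target_vocab)) for _ in range(n)]
    negs += [(random.choice(source_vocab), p_t) for _ in range(n)]
    return negs

def training_step(model, optimizer, pos_pair, source_vocab, target_vocab, n):
    """One update on a positive pair and its 2N sampled negatives,
    minimizing the cross-entropy loss of Eq. 1."""
    pairs = [pos_pair] + make_negative_pairs(pos_pair, source_vocab,
                                             target_vocab, n)
    labels = torch.tensor([1.0] + [0.0] * (2 * n))     # y in Eq. 1
    optimizer.zero_grad()
    y_hat = model(pairs)           # predicted probabilities, shape (2n+1,)
    loss = nn.functional.binary_cross_entropy(y_hat, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```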
Fig. 2 Character-level encoder. An illustration of the character-level LSTM encoder architecture using an example EN-NL translation pair
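A minimal PyTorch sketch of such a character-level pair encoder (assuming, in line with the CHARPAIRS name, that each LSTM step reads the concatenated embeddings of aligned source and target characters; embedding and hidden sizes are assumptions, not the paper's settings):

```python
import torch
import torch.nn as nn

PAD = 0  # hypothetical padding index for unequal-length strings

class CharPairEncoder(nn.Module):
    """Character-level LSTM encoder over a candidate translation pair:
    both strings are padded to the same length, each LSTM step reads
    the concatenated embeddings of the aligned characters, and the
    final hidden state serves as the pair representation."""
    def __init__(self, n_chars, char_dim=25, hidden_dim=100):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim, padding_idx=PAD)
        self.lstm = nn.LSTM(2 * char_dim, hidden_dim, batch_first=True)

    def forward(self, src_ids, tgt_ids):
        # src_ids, tgt_ids: (batch, max_len) padded character indices
        x = torch.cat([self.embed(src_ids), self.embed(tgt_ids)], dim=-1)
        _, (h_n, _) = self.lstm(x)      # h_n: (1, batch, hidden_dim)
        return h_n.squeeze(0)           # (batch, hidden_dim)
```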
The word-level representation of a candidate pair is computed simply as the concatenation of the embeddings for p_s and p_t: r(p_s, p_t) = [e(p_s); e(p_t)].
Fig. 3 Classification component. Illustrations of the classification component with feed-forward networks of different depths. a: H = 0. b: H = 2 (our model). All layers are fully connected. This figure is taken from [51]
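Combining the concatenated pair representation with the feed-forward stack of Fig. 3, a minimal PyTorch sketch (hidden width and activation are assumptions, not the paper's settings):

```python
import torch
import torch.nn as nn

class PairClassifier(nn.Module):
    """Feed-forward classification component: H fully connected hidden
    layers over the pair representation, then a sigmoid output unit.
    H = 0 reduces to logistic regression (Fig. 3a); H = 2 is Fig. 3b."""
    def __init__(self, in_dim, hidden_dim=200, H=2):
        super().__init__()
        layers = []
        for _ in range(H):
            layers += [nn.Linear(in_dim, hidden_dim), nn.Tanh()]
            in_dim = hidden_dim
        layers += [nn.Linear(in_dim, 1), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, r):                # r: (batch, in_dim) pair features
        return self.net(r).squeeze(-1)   # probability of a valid translation

def pair_representation(e_src, e_tgt):
    """Word-level input: r(p_s, p_t) = [e(p_s); e(p_t)]."""
    return torch.cat([e_src, e_tgt], dim=-1)
```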
Recall of the words and phrases in the training and test lexicons w.r.t. the extracted vocabularies
| | EN-NL: Phrases | EN-NL: Words+Phrases | EN: Phrases | EN: Words+Phrases | NL: Phrases | NL: Words+Phrases |
|---|---|---|---|---|---|---|
| Training lex. | 86.26 | 97.03 | 72.06 | 95.31 | 80.96 | 99.51 |
| Test lex. | 88.60 | 97.12 | 67.44 | 95.62 | 79.69 | 99.11 |
In the EN-NL column we show the percentage of translation pairs for which both source and target words/phrases are present in the vocabulary. In the EN/NL columns we show the percentage of English/Dutch words/phrases that are present in the vocabulary
Fig. 4 Precision, recall and F1 for candidate generation with 2N candidates
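For reference, a minimal sketch of how precision, recall, and F1 can be computed for a candidate generation step (the dictionary layout and the example values are assumptions made for illustration):

```python
def candidate_metrics(gold, candidates):
    """Precision/recall/F1 for candidate generation. `candidates[s]` is
    the list of generated target candidates for source word s, and
    `gold[s]` is the set of reference translations for s."""
    tp = sum(len(set(cands) & gold[s]) for s, cands in candidates.items())
    n_pred = sum(len(cands) for cands in candidates.values())
    n_gold = sum(len(refs) for refs in gold.values())
    precision = tp / n_pred if n_pred else float("nan")
    recall = tp / n_gold
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example with 2N = 2 candidates per source word (made-up data):
gold = {"ademhaling": {"breathing"}, "lever": {"liver"}}
candidates = {"ademhaling": ["breathing", "lungs"],
              "lever": ["lever", "liver"]}
print(candidate_metrics(gold, candidates))   # (0.5, 1.0, 0.666...)
```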
Comparison of word-level BLI systems
| Model | Words | | Phrases | | Words+Phrases | |
|---|---|---|---|---|---|---|
| | 13.48 | 9.15 | 21.95 | 15.84 | 14.24 | 9.73 |
| | 0.55 | 0.88 | NaN | NaN | 0.51 | 0.80 |
| | 17.08 | 21.19 | 24.04 | 26.47 | 17.59 | 21.56 |
| SGNS | **23.83** | **25.05** | **25.77** | **27.27** | **23.99** | **25.22** |

| Model | Words | | Phrases | | Words+Phrases | |
|---|---|---|---|---|---|---|
| | 12.78 | 10.03 | 21.43 | 12.52 | 13.52 | 10.31 |
| | 0.22 | 0.69 | NaN | 0.93 | 0.20 | 0.71 |
| | 16.47 | 21.50 | 23.48 | 23.75 | 17.01 | 21.68 |
| SGNS | **22.80** | **24.41** | **26.74** | **27.14** | **23.10** | **24.62** |
The best scores are indicated in bold
Comparison of character-level BLI methods from prior work [44, 45] with automatically learned character-level representations
| Method | Words | | Phrases | | Words+Phrases | |
|---|---|---|---|---|---|---|
| ED | 24.49 | 19.53 | 15.62 | 19.87 | 23.83 | 19.55 |
| log(ED) | 28.57 | 28.17 | 18.05 | 17.27 | 27.86 | 27.46 |
| | 25.99 | 11.20 | 18.40 | 14.35 | 25.49 | 11.31 |
| CHARPAIRS | **31.95** | **32.32** | **23.70** | **25.97** | **31.39** | **31.92** |

| Method | Words | | Phrases | | Words+Phrases | |
|---|---|---|---|---|---|---|
| ED | 28.10 | 28.29 | 8.70 | 8.63 | 26.97 | 27.24 |
| log(ED) | 29.30 | 28.95 | 19.48 | 19.35 | 28.70 | 28.39 |
| | 29.76 | 29.65 | 17.57 | 17.45 | 29.05 | 29.00 |
| CHARPAIRS | **30.70** | **32.19** | **31.82** | **30.61** | **30.81** | **32.15** |
The best scores are indicated in bold
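To make the edit-distance baselines concrete, here is a minimal sketch of ranking translation candidates by Levenshtein distance (an illustration, not the exact setup of [44, 45]; `best_translation` and the example vocabulary are hypothetical):

```python
def edit_distance(a, b):
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def best_translation(source, target_vocab, scorer=edit_distance):
    """Rank target candidates by (lowest) distance to the source string."""
    return min(target_vocab, key=lambda t: scorer(source, t))

# Greek/Latin-rooted terms are often near-identical across languages:
print(best_translation("embolism", ["embolie", "infarct", "trombose"]))
# -> 'embolie'
```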
Results of the model that combines word-level and character-level representations (CHARPAIRS-SGNS) and the best-performing single-component models (CHARPAIRS and SGNS)

| Model | Words | | Phrases | | Words+Phrases | |
|---|---|---|---|---|---|---|
| CHARPAIRS | 31.95 | 32.32 | **23.70** | **25.97** | 31.39 | 31.92 |
| SGNS | 23.83 | 26.36 | 17.37 | 17.08 | 25.77 | 25.81 |
| CHARPAIRS-SGNS | **34.57** | **33.61** | 18.18 | 23.29 | **33.47** | **32.99** |

| Model | Words | | Phrases | | Words+Phrases | |
|---|---|---|---|---|---|---|
| CHARPAIRS | 30.70 | 32.19 | **31.82** | **30.61** | 30.81 | 32.15 |
| SGNS | 22.80 | 24.41 | 26.74 | 27.14 | 23.10 | 24.62 |
| CHARPAIRS-SGNS | **34.34** | **34.60** | 23.17 | 26.59 | **33.60** | **34.15** |
The best scores are indicated in bold
Predicted translations of the single-component models and the combined model, illustrating the advantage of the combined model. Correct translations are in bold

| Source word | Predictions CHARPAIRS | Predictions SGNS | Predictions CHARPAIRS-SGNS |
|---|---|---|---|
| Miscarriage | / | zwangerschap, … | |
| Contractions | contraststof | | |
| Injected | injecties, injectie | naald | |
| Desensitization | | injecties, … | |
| Heart attack | | | |
| Multifocal | multiple, … | dominante | |
Fig. 5 Hidden layers. The influence of the number of hidden layers H between the representations and the output layer on BLI performance
Fig. 6 Training set size. The influence of the training set size (the number of training pairs)
Fig. 7 Word frequency. This plot shows how performance varies when we filter out translation pairs with frequency lower than the specified cut-off point (on the x-axis)
Results on a subset of the test data consisting of translation pairs with Greek or Latin origin
| | ED | CHARPAIRS | SGNS | CHARPAIRS-SGNS |
|---|---|---|---|---|
| | 50.25 | 54.46 | 42.92 | **57.20** |
| | 50.23 | 55.04 | 48.14 | **56.41** |
The best scores are indicated in bold