Francisco J Zamora-Martínez, Salvador España-Boquera, Maria Jose Castro-Bleda, Adrian Palacios-Corella.
Abstract
This paper presents a new method to reduce the computational cost of using Neural Networks as Language Models during recognition in some particular scenarios. It is based on a Neural Network that considers input contexts of different lengths, which eases the use of a fallback mechanism together with the precomputation of softmax normalization constants for these inputs. The proposed approach is empirically validated, showing its capability to emulate lower-order N-grams with a single Neural Network. A machine translation task shows that the proposed model is a good solution to the normalization cost of the output softmax layer of Neural Networks in some practical cases: it improves system speed without a significant impact on performance.
Year: 2018 PMID: 30048480 PMCID: PMC6062053 DOI: 10.1371/journal.pone.0200884
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Scheme of a 5-gram Fallback Variable History NNLM (Fallback V-NNLM).
It is composed of one 5-gram Variable History NNLM (V-NNLM) and four precomputed tables of constants. If the softmax normalization constant is found in the 5-gram table for the input $w_{i-4} w_{i-3} w_{i-2} w_{i-1}$, the query is processed and the probability $p(w_i \mid h) = p(w_i \mid w_{i-4} w_{i-3} w_{i-2} w_{i-1})$ is computed. If not, the query is delegated to the same V-NNLM, but with the input $\langle\text{dummy}\rangle\, w_{i-3} w_{i-2} w_{i-1}$ looked up in the 4-gram table, and so on.
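The fallback lookup described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `DUMMY`, `pad`, `precompute_log_z`, and `fallback_prob` are hypothetical, the scorer is assumed to return the pre-softmax activation for a word given a padded context, and the lowest-order (bigram) table is assumed to cover every context that can be queried.

```python
import math
from typing import Callable, Dict, Iterable, Sequence, Tuple

DUMMY = "<dummy>"  # hypothetical padding token that replaces dropped history words

Context = Tuple[str, ...]
# Assumed scorer interface: (word, padded context) -> unnormalized log-score.
Scorer = Callable[[str, Context], float]
Tables = Dict[int, Dict[Context, float]]


def pad(context: Context, order: int) -> Context:
    """Left-pad a shortened context with DUMMY up to order - 1 words."""
    return (DUMMY,) * (order - 1 - len(context)) + context


def precompute_log_z(contexts: Iterable[Sequence[str]], order: int,
                     score: Scorer, vocab: Sequence[str]) -> Tables:
    """Build one table per N-gram order, mapping a context to its log softmax constant."""
    tables: Tables = {}
    for ctx in contexts:
        ctx = tuple(ctx)
        padded = pad(ctx, order)
        log_z = math.log(sum(math.exp(score(w, padded)) for w in vocab))
        tables.setdefault(len(ctx) + 1, {})[ctx] = log_z
    return tables


def fallback_prob(word: str, history: Sequence[str], order: int,
                  score: Scorer, tables: Tables) -> float:
    """Return p(word | history), backing off to shorter contexts (order >= 2).

    The longest context whose constant was precomputed is used, so the
    softmax sum over the vocabulary is never evaluated at query time.
    """
    context = tuple(history)[-(order - 1):]
    for kept in range(len(context), 0, -1):
        suffix = context[-kept:]
        log_z = tables.get(kept + 1, {}).get(suffix)
        if log_z is not None:
            return math.exp(score(word, pad(suffix, order)) - log_z)
    raise KeyError("no precomputed constant for any suffix of the history")
```

In this sketch a query costs one forward pass for the score plus at most `order - 1` dictionary lookups, and the sum over the vocabulary is paid once per stored context at precomputation time.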
Lines and words of the News-Commentary corpus (English part).
| Set | # lines | # words |
|---|---|---|
| Training (News-Commentary 2010) | 125.8K | 2.9M |
| Validation (News 2008) | 2.0K | 49.7K |
| Test (News 2010) | 2.5K | 61.9K |
| Total | 130.3K | 3.0M |
Averaged PPL measures for the News-Commentary validation and test sets.
Measures given by regular NNLMs and by 5-gram V-NNLMs without using the fallback method. No combination with count-based models has been performed.
Validation set PPL
| Model | 2-gram | 3-gram | 4-gram | 5-gram |
|---|---|---|---|---|
| Bigram NNLM | 412.6 ± 1.0 | – | – | – |
| Trigram NNLM | – | 345.9 ± 0.9 | – | – |
| 4-gram NNLM | – | – | 327.0 ± 1.1 | – |
| 5-gram NNLM | – | – | – | 319.4 ± 1.0 |
| V-NNLM | 423.3 ± 1.3 | 356.3 ± 0.9 | 334.9 ± 0.8 | 326.5 ± 0.7 |

Test set PPL
| Model | 2-gram | 3-gram | 4-gram | 5-gram |
|---|---|---|---|---|
| Bigram NNLM | 407.7 ± 1.1 | – | – | – |
| Trigram NNLM | – | 338.6 ± 1.0 | – | – |
| 4-gram NNLM | – | – | 320.6 ± 1.1 | – |
| 5-gram NNLM | – | – | – | 312.6 ± 0.9 |
| V-NNLM | 418.2 ± 1.3 | 350.0 ± 1.0 | 329.4 ± 0.9 | 320.2 ± 0.8 |
Averaged PPL measures for the News-Commentary validation and test sets.
Measures given by SRI N-gram models, by regular NNLMs, and by the proposed Fallback V-NNLMs. Note that both NNLMs and Fallback V-NNLMs are linearly combined with the 4-gram count-based model.
Validation set PPL
| N-gram order | SRI | NNLM | F.V-NNLM |
|---|---|---|---|
| 2 | 332 | 252.0 ± 0.2 | 252.6 ± 0.1 |
| 3 | 308 | 231.1 ± 0.3 | 237.9 ± 0.2 |
| 4 | 305 | 221.0 ± 0.4 | 233.0 ± 0.2 |
| 5 | 305 | 216.6 ± 0.4 | 232.2 ± 0.2 |

Test set PPL
| N-gram order | SRI | NNLM | F.V-NNLM |
|---|---|---|---|
| 2 | 409 | 244.8 ± 0.2 | 245.4 ± 0.1 |
| 3 | 377 | 223.4 ± 0.3 | 230.3 ± 0.2 |
| 4 | 297 | 213.8 ± 0.4 | 225.8 ± 0.2 |
| 5 | 297 | 209.0 ± 0.3 | 225.0 ± 0.3 |
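
The linear combination with a count-based model mentioned in the caption of this table, and the perplexity (PPL) it is scored with, can be sketched as follows. This is an illustrative sketch only: the function names and the `weight` parameter are hypothetical, and the source does not state the interpolation weights that were used.

```python
import math
from typing import Sequence


def interpolate(p_neural: float, p_count: float, weight: float) -> float:
    """Linear combination of two language model probabilities for the same word.

    `weight` is the share given to the neural model (an assumed parameter,
    normally tuned on a validation set); it must lie in [0, 1].
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * p_neural + (1.0 - weight) * p_count


def perplexity(word_probs: Sequence[float]) -> float:
    """Perplexity of a text given the probability assigned to each of its words."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    mean_log = sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(-mean_log)
```

Because the combination is convex, the interpolated value remains a valid probability whenever both inputs are.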
Statistics of the bilingual Spanish-English task of the News-Commentary 2010 corpus.
| Corpus | Spanish # lines | Spanish # words | English # lines | English # words |
|---|---|---|---|---|
| News2008 | 2.0K | 52.6K | 2.0K | 49.7K |
| News2009 | 2.5K | 68.0K | 2.5K | 65.6K |
| News2010 | 2.5K | 65.5K | 2.5K | 61.9K |
Baseline results using Moses and our decoder for the News2010 test set.
Results using a count-based 4-gram as the target language model. Time is the average decoding time (in seconds) per sentence, measured on an Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz with 16GB RAM.
| System | BLEU | TER | Time (s/sentence) |
|---|---|---|---|
| Moses | 22.6 | 57.8 | 0.6 |
| Our system | 22.7 | 57.8 | 0.4 |
BLEU and TER for the News2010 test set for NNLM and Fallback V-NNLM models, for different N-gram orders.
Time is the average number of seconds per sentence, summing the translation and rescoring steps.
| N-gram order | BLEU (NNLM) | BLEU (F.V-NNLM) | TER (NNLM) | TER (F.V-NNLM) | Time, s/sentence (NNLM) | Time, s/sentence (F.V-NNLM) |
|---|---|---|---|---|---|---|
| 2 | 22.9 | 22.9 | 57.5 | 57.5 | 0.5 | 0.5 |
| 3 | 23.3 | 23.1 | 57.3 | 57.4 | 0.7 | 0.5 |
| 4 | 23.4 | 23.2 | 57.2 | 57.4 | 1.5 | 0.6 |
| 5 | 23.2 | 23.1 | 57.3 | 57.4 | 2.6 | 0.6 |