Tingjun Xu1, Weiming Chen2, Junhong Zhou2, Jingfang Dai2, Yingyong Li2, Yingli Zhao2.
Abstract
Machine translation of chemical nomenclature has considerable potential for processing chemical text data across languages. However, rule based machine translation tools face significant complexity in building rule sets, especially for translating chemical names between English and Chinese, the two most used languages of chemical nomenclature in the world. We applied two types of neural networks to the task of translating chemical nomenclature between English and Chinese and compared them with an existing rule based machine translation tool. The results show that deep learning based approaches have a good chance of surpassing rule based translation tools in machine translation of chemical nomenclature between English and Chinese.
Year: 2020 PMID: 33431023 PMCID: PMC7460765 DOI: 10.1186/s13321-020-00457-0
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Fig. 1 Architecture of the CNN based neural networks for machine translation of chemical nomenclature (a). Illustration of the CNN based neural networks for machine translation of chemical nomenclature in training mode (b)
Fig. 2 Architecture of the LSTM based neural networks for machine translation of chemical nomenclature (a). Illustration of the LSTM based neural networks for machine translation of chemical nomenclature in training mode (b)
Fig. 3 Illustration of the rule based machine translation tool of chemical nomenclature
Performance comparison of DL and rule based machine translation of chemical nomenclature (En2Ch: English to Chinese; Ch2En: Chinese to English)
| Metric | CNN based | LSTM based | Rule based |
|---|---|---|---|
| Success Rate En2Ch | 100% | 100% | 75.97% |
| Success Rate Ch2En | 100% | 100% | 59.90% |
| String Matching Accuracy En2Ch | 82.92% | 89.64% | 39.81% |
| String Matching Accuracy Ch2En | 78.11% | 55.44% | 43.77% |
| Data Matching Accuracy En2Ch | 84.44% | 90.82% | 45.15% |
| Data Matching Accuracy Ch2En | 80.22% | 57.40% | 44.91% |
| Manual Spot Check En2Ch | 90.00% | 89.00% | 80.00% |
| Manual Spot Check Ch2En | 82.00% | 61.00% | 78.00% |
| Running Time En2Ch (s) | 1423 | 190 | 288 |
| Running Time Ch2En (s) | 1876 | 303 | 322 |
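The record does not define how the success rate and string matching accuracy in the table were computed. As a rough illustration only, the sketch below assumes that success rate is the share of inputs for which a translator returns any output, and that string matching accuracy is the share of all inputs whose output equals the reference name exactly. Both definitions, the function names, and the toy names are assumptions made for this example, not the authors' evaluation code or data.

```python
from typing import Optional, Sequence


def success_rate(predictions: Sequence[Optional[str]]) -> float:
    """Share of inputs for which the translator produced any output (assumed definition)."""
    if not predictions:
        return 0.0
    produced = sum(1 for p in predictions if p)
    return produced / len(predictions)


def string_matching_accuracy(
    predictions: Sequence[Optional[str]], references: Sequence[str]
) -> float:
    """Share of all inputs whose output equals the reference exactly (assumed definition)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(
        1
        for p, r in zip(predictions, references)
        if p is not None and p.strip() == r.strip()
    )
    return matches / len(references)


# Toy En2Ch data, invented for illustration (ethanol, methane, toluene).
# None marks an input for which the hypothetical translator gave no output.
references = ["乙醇", "甲烷", "甲苯"]
predictions = ["乙醇", None, "苯"]

print(f"success rate: {success_rate(predictions):.2%}")
print(f"string matching accuracy: {string_matching_accuracy(predictions, references):.2%}")
```

Under these assumed definitions a translator can reach a 100% success rate while still scoring well below 100% on string matching, which is the pattern the CNN and LSTM columns show.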
Performance comparison for chemical names from different naming systems and of different lengths
| Name category | CNN based (%) | LSTM based (%) | Rule based (%) |
|---|---|---|---|
| IUPAC names | 92.00 | 62.00 | 80.00 |
| CAS names | 80.00 | 52.00 | 78.00 |
| Length not greater than 6 | 80.42 | 60.47 | 38.81 |
| Length greater than 6 | 74.38 | 47.31 | 49.33 |