Syed Abdul Basit Andrabi, Abdul Wahid.
Abstract
Machine translation has been an active field of research for several decades. Its main aim is to remove the language barrier. Early research in the field relied on direct word-to-word replacement of source-language words with target-language words. Later, with advances in computing and communication technology, there was a paradigm shift to data-driven models such as statistical and neural machine translation. In this paper, we use a neural network-based deep learning technique for English-to-Urdu translation. A parallel corpus of 30923 sentences is used. The corpus contains sentences from an English-Urdu parallel corpus, news, and sentences that are frequently used in day-to-day life. It contains 542810 English tokens and 540924 Urdu tokens, and the proposed system is trained and tested using a 70:30 split. To evaluate the proposed system, several automatic evaluation metrics are used, and the model output is also compared with the output of Google Translator. The proposed model achieves an average BLEU score of 45.83.
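The 70:30 train/test criterion mentioned in the abstract can be illustrated with a minimal Python sketch. The shuffling, the fixed seed, and the `split_corpus` helper are assumptions made for illustration; the source does not describe how sentence pairs were assigned to each part.

```python
import random


def split_corpus(pairs, train_fraction=0.7, seed=0):
    """Shuffle sentence pairs and split them into train and test parts.

    Hypothetical helper: the shuffling and the seed are assumptions,
    not details taken from the paper.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder pairs standing in for the 30923 English-Urdu sentence pairs.
pairs = [(f"en-{i}", f"ur-{i}") for i in range(30923)]
train, test = split_corpus(pairs)
print(len(train), len(test))
```

With 30923 pairs, a 70:30 split leaves 21646 pairs for training and 9277 for testing.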
Year: 2022 PMID: 35024046 PMCID: PMC8747903 DOI: 10.1155/2022/7873012
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. Conceptual encoder-decoder.
Figure 2. Workflow of Model.
Figure 3. Sentence read by LSTM encoder.
Figure 4. Encoder-decoder architecture.
Figure 5. Attention mechanism.
Corpus sentences and words.
| Language | Sentences from Quran | Sentences from Bible | News and frequently used sentences | Total sentences | Number of tokens |
|---|---|---|---|---|---|
| English | 6414 | 7957 | 16552 | 30923 | 542810 |
| Urdu | 6414 | 7957 | 16552 | 30923 | 540924 |
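As a quick consistency check on the table above, the following sketch sums the per-source sentence counts and derives the average number of tokens per sentence. The dictionary layout and key names are illustrative assumptions; only the numbers come from the table.

```python
# Counts copied from the corpus table; the dictionary layout is illustrative.
corpus = {
    "English": {"quran": 6414, "bible": 7957, "news_and_common": 16552,
                "total": 30923, "tokens": 542810},
    "Urdu": {"quran": 6414, "bible": 7957, "news_and_common": 16552,
             "total": 30923, "tokens": 540924},
}

averages = {}
for language, row in corpus.items():
    # The three source counts should add up to the reported total.
    summed = row["quran"] + row["bible"] + row["news_and_common"]
    assert summed == row["total"], f"{language}: totals do not match"
    averages[language] = row["tokens"] / row["total"]
    print(f"{language}: {summed} sentences, "
          f"{averages[language]:.2f} tokens per sentence")
```

Both languages come out at roughly 17.5 tokens per sentence, so the two sides of the corpus are of similar length.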
Comparison of the output predicted by the proposed model with the output of Google Translate.
| English sentence | Predicted Urdu by our model | Google Translator output |
|---|---|---|
| I can translate simple sentences very well | میں سادہ جملے بہت اچھی طرح سے ترجمہ کر سکتا ہوں | میں سادہ جملوں کا بہت اچھی طرح ترجمہ کر سکتا ہوں۔ |
| Please come here | براہ مہربانی یہاں آو | براہِ کرم یہاں آئیں |
| My marriage is scheduled on next month | میری شادی اگلے مہینے میں طے ی گئی ہے | میری شادی اگلے مہینے طے ہے۔ |
| I hate politics | میں سیاست سے نفرت کرتا ہوں | مجھے سیاست سے نفرت ہے۔ |
| This is difficult work I cannot do it | یہ مشکل کام ہے میں ایسا نہیں کر سکتا | یہ مشکل کام ہے۔ میں یہ نہیں کر سکتا |
| My father is a great man | میرا باپ ایک عظیم آدمی ہے | میرے والد ایک عظیم انسان ہیں۔ |
| Good morning hello teacher | صبح بخیر ہیلو استاد | صبح بخیر ہیلو ٹیچر |
| My mother passed away yesterday | میری ماں کل گزر گئی | میری والدہ کل انتقال کر گئیں |
| My mother was a nice lady | میری ماں ایک اچھی عورت تھی | میری ماں ایک اچھی عورت تھی |
| This model gives good translation results | یہ ماڈل اچھا ترجمہ کے نتائج فراہم کرتا ہے | یہ ماڈل ترجمہ کے اچھے نتائج دیتا ہے۔ |
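One simple way to put a number on the comparison in the table above is clipped unigram precision, treating the Google Translator output as a stand-in reference. This is only an illustrative sketch: the `unigram_precision` helper, the whitespace tokenization, and the naive stripping of the Urdu full stop are assumptions, and this is not the evaluation procedure reported in the paper.

```python
from collections import Counter


def unigram_precision(candidate, reference):
    """Fraction of candidate tokens that also occur in the reference.

    Matches are clipped, so a repeated candidate token counts at most
    as often as it appears in the reference.
    """
    cand_tokens = candidate.replace("۔", " ").split()
    ref_counts = Counter(reference.replace("۔", " ").split())
    if not cand_tokens:
        return 0.0
    matched = 0
    for token, count in Counter(cand_tokens).items():
        matched += min(count, ref_counts[token])
    return matched / len(cand_tokens)


# Two rows from the table: (model output, Google Translator output).
identical_pair = ("میری ماں ایک اچھی عورت تھی", "میری ماں ایک اچھی عورت تھی")
partial_pair = ("صبح بخیر ہیلو استاد", "صبح بخیر ہیلو ٹیچر")

same = unigram_precision(*identical_pair)
partial = unigram_precision(*partial_pair)
print(same, partial)
```

For the identical row the score is 1.0, and for the "Good morning hello teacher" row three of the four model tokens also appear in the Google output, giving 0.75.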
Values of several evaluation metrics.
| BLEU | Precision | Recall | NIST | WER | F-measure |
|---|---|---|---|---|---|
| 38.15 | 60.23 | 29.40 | 53.2 | 5 | 39.51 |
| 36.20 | 89.56 | 27.30 | 51.5 | 6 | 41.84 |
| 37.50 | 61.31 | 25.56 | 62.56 | 6 | 36.07 |
| 46.05 | 88.12 | 27.77 | 70.62 | 7 | 42.23 |
| 43.12 | 90.19 | 28.67 | 65.5 | 7 | 43.50 |
| 25.56 | 86.21 | 26.56 | 41.09 | 9 | 40.60 |
| 73.03 | 60.43 | 21.32 | 79.21 | 4 | 31.51 |
| 71.23 | 62.31 | 19.55 | 78.56 | 5 | 29.76 |
| 50.12 | 80.69 | 28.26 | 69.15 | 4 | 41.85 |
| 75.02 | 78.24 | 28.57 | 79.27 | 3 | 41.85 |
| 28.41 | 52.56 | 21.64 | 58.1 | 7 | 30.65 |
| 36.65 | 56.16 | 24.55 | 63.23 | 5 | 34.16 |
| 35.23 | 58.19 | 26.71 | 60.15 | 6 | 36.61 |
| 61.17 | 60.25 | 20.22 | 70.03 | 4 | 30.27 |
| 30.15 | 75.57 | 28.53 | 60.96 | 7 | 41.42 |
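The average BLEU score quoted in the abstract can be reproduced from the first column of the table above. The sketch below also checks an observation rather than a statement from the source: the last value in each row matches the F-measure (harmonic mean) of the precision and recall values to within rounding. Variable names are illustrative.

```python
# Rows copied from the metrics table, in the order the values appear.
# Index 0 is BLEU, 1 is precision, 2 is recall, 5 is the last column.
rows = [
    (38.15, 60.23, 29.40, 53.2, 5, 39.51),
    (36.20, 89.56, 27.30, 51.5, 6, 41.84),
    (37.50, 61.31, 25.56, 62.56, 6, 36.07),
    (46.05, 88.12, 27.77, 70.62, 7, 42.23),
    (43.12, 90.19, 28.67, 65.5, 7, 43.50),
    (25.56, 86.21, 26.56, 41.09, 9, 40.60),
    (73.03, 60.43, 21.32, 79.21, 4, 31.51),
    (71.23, 62.31, 19.55, 78.56, 5, 29.76),
    (50.12, 80.69, 28.26, 69.15, 4, 41.85),
    (75.02, 78.24, 28.57, 79.27, 3, 41.85),
    (28.41, 52.56, 21.64, 58.1, 7, 30.65),
    (36.65, 56.16, 24.55, 63.23, 5, 34.16),
    (35.23, 58.19, 26.71, 60.15, 6, 36.61),
    (61.17, 60.25, 20.22, 70.03, 4, 30.27),
    (30.15, 75.57, 28.53, 60.96, 7, 41.42),
]

# Average of the BLEU column over all 15 rows.
mean_bleu = sum(row[0] for row in rows) / len(rows)
print(f"mean BLEU = {mean_bleu:.2f}")

# Largest gap between the last column and 2PR / (P + R).
max_gap = 0.0
for _, precision, recall, _, _, last in rows:
    f_measure = 2 * precision * recall / (precision + recall)
    max_gap = max(max_gap, abs(f_measure - last))
print(f"largest F-measure gap = {max_gap:.3f}")
```

With these values the mean BLEU comes to about 45.84, in line with the reported 45.83, and every last-column value lies within about 0.01 of the computed F-measure.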
Figure 6. Graphical representation of various metrics.