Saqib Hakak1, Amirrudin Kamsin1, Shivakumara Palaiahnakote1, Omar Tayan2, Mohd Yamani Idna Idris1, Khir Zuhaili Abukhir3.
Abstract
In Arabic script, meaning depends heavily on the accurate arrangement of diacritics and other related symbols. The most sensitive Arabic text available online is the Digital Qur'an, the sacred book of Revelation in Islam that all Muslims, including non-Arabs, recite as part of their worship. Because of characteristics of Arabic writing such as diacritics (punctuation symbols), kashida (extended letters) and other symbols, the Qur'an is written and made available in different styles, such as Kufi, Naskh, Thuluth and Uthmani. As social media has become part of daily life, posting Qur'anic verses downloaded from the web is common. This raises the problem of authenticating selected Qur'anic passages that are available in different styles. This paper presents a residual approach for authenticating Uthmani and plain Qur'an verses using one common database. The residual (difference) is obtained by analyzing the differences between the Uthmani and plain Qur'anic styles using the XOR operation. Based on predefined data, the proposed approach converts Uthmani text into plain text. Furthermore, we propose to use the Tuned Boyer-Moore (BMT) exact pattern matching algorithm to verify the substituted Uthmani verse against a given database in the plain Qur'anic style. Experimental results show that the proposed approach is useful and effective in authenticating multi-style texts of the Qur'an, with 87.1% accuracy.
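The pipeline described in the abstract (XOR residual, substitution into plain text, exact matching) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the substitution data or the exact XOR procedure, so the `SUBSTITUTIONS` table, the treatment of "plain" as text without diacritics, and the function names `xor_residual`, `to_plain` and `tuned_bm_search` are all assumptions made for illustration. The search follows the bad-character skip loop with a sentinel that characterizes Tuned Boyer-Moore, without the loop unrolling used in optimized versions.

```python
import unicodedata

TATWEEL = "\u0640"  # kashida (letter extension)
# Hypothetical substitution table (the paper's predefined data is not given here):
# ALEF WASLA -> ALEF.
SUBSTITUTIONS = {"\u0671": "\u0627"}


def xor_residual(a: str, b: str) -> list[int]:
    """XOR code points position by position; nonzero entries mark differences."""
    n = max(len(a), len(b))
    pa = [ord(c) for c in a] + [0] * (n - len(a))
    pb = [ord(c) for c in b] + [0] * (n - len(b))
    return [x ^ y for x, y in zip(pa, pb)]


def to_plain(verse: str) -> str:
    """Drop combining marks and kashida, then apply the substitution table."""
    out = []
    for ch in verse:
        if ch == TATWEEL or unicodedata.combining(ch):
            continue
        out.append(SUBSTITUTIONS.get(ch, ch))
    return "".join(out)


def tuned_bm_search(pattern: str, text: str) -> list[int]:
    """Exact matching with a bad-character skip loop and a sentinel."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    last = pattern[-1]
    # Distance from the rightmost occurrence of each character
    # (last position excluded) to the end of the pattern.
    skip = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    shift = skip.get(last, m)  # shift applied after checking a candidate
    skip[last] = 0
    padded = text + last * m   # sentinel guarantees the skip loop stops
    hits = []
    j = 0
    while j <= n - m:
        k = skip.get(padded[j + m - 1], m)
        while k != 0:
            j += k
            k = skip.get(padded[j + m - 1], m)
        if j <= n - m and padded[j:j + m - 1] == pattern[:-1]:
            hits.append(j)
        j += shift
    return hits


# Opening words of the Basmala, written with escapes to avoid rendering issues.
uthmani = "\u0628\u0650\u0633\u0652\u0645\u0650 \u0671\u0644\u0644\u0651\u064e\u0647\u0650"
plain_db = "\u0628\u0633\u0645 \u0627\u0644\u0644\u0647"

converted = to_plain(uthmani)
is_authentic = bool(tuned_bm_search(converted, plain_db))
```

In this sketch the residual between the raw Uthmani string and the plain string contains nonzero entries, while the residual after `to_plain` is all zeros, which is what lets a single plain-style database serve both styles.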
Year: 2018 PMID: 29924810 PMCID: PMC6010264 DOI: 10.1371/journal.pone.0198284
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Different writing styles of Digital Holy Quran [8].
Fig 2. Main Arabic diacritics [18].
Fig 3. Tajweed symbols.
Fig 4. (a) Uthmanic style (b) Plain writing style verse.
Fig 5. The logical flow of the proposed approach.
Fig 6. Sample Unicode representation.
Fig 7. Tokenized Qur'anic verse.
XOR operation of verses.
Analysis of Uthmanic and plain Qur'anic verses.
Symbols removed.
Pre-processing in benchmark dataset.
Performance analysis of character-based exact matching algorithms.
Fig 8. Boyer-Moore algorithm.
Fig 9. Prototype.
Analysis without using XOR and substitution.
Fig 10. Prototype snapshot.
Comparative analysis after XOR and substitution phase.
Unverified verses.