Ahmed Taha, Heba M Khalil, Tarek El-Shishtawy.
Abstract
With internet use rising rapidly worldwide, forensic authorship authentication plays a vital role in identifying the growing number of unknown authors. This paper presents a two-level learning technique for authorship authentication. Rather than relying on learning alone, the technique is supplied with linguistic knowledge, statistical features, and vocabulary features to enhance its efficiency. Linguistic knowledge is represented through lexical analysis features such as part of speech. The two-level classifier is designed to capture the best predictive performance for identifying authorship. The first classifier is based on vocabulary features that detect the frequency with which each author uses certain words. Its results are fed to the second classifier, which is based on a learning technique and depends on lexical, statistical, and linguistic features. All three feature sets describe an author's writing style in numerical form. Many new features are proposed for identifying an author's writing style. Although the proposed methodology is tested on Arabic writings, it is general and can be applied to any language. Depending on the machine learning model used, the experiments show that the trained two-level classifier achieves an accuracy ranging from 94% to 96.16%.
Year: 2021 PMID: 34352003 PMCID: PMC8341647 DOI: 10.1371/journal.pone.0255661
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The framework of the proposed method.
Examples of words used as correlation tools features.
| Feature | Arabic examples | English translation |
|---|---|---|
| Conjunctions | (و, ف, ثم, او, …) | (And, At, Then, Or) |
| Adverbs | (هنا, هناك, غدا, دائما, …) | (Here, There, Tomorrow, Always) |
| Quantifiers | (كل, كلا, معظم, جميع, …) | (All, No, Most, Whole) |
| Modal verbs | (عسى, نعم, بئس, …) | (Hopefully, Yes, Misery) |
| Prepositions | (من, الى, عن, مع, على, …) | (From, To, On, With, In) |
Results of vocabulary level classifier only.
| Author | TF-IDF Accuracy |
|---|---|
| Author 1 | 85% |
| Author 2 | 93.3% |
| Author 3 | 93.3% |
| Author 4 | 96.6% |
| Author 5 | 98.3% |
| Author 6 | 91.6% |
| Author 7 | 88.3% |
| Author 8 | 91.6% |
| Author 9 | 95% |
| Author 10 | 91.6% |
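
The vocabulary-level step can be illustrated with a minimal sketch, assuming each author is profiled by TF-IDF weights over their known texts and an unknown text is scored by cosine similarity against each profile. The record does not give the authors' exact weighting or decision rule, so the smoothed IDF, the cosine scoring, the toy English texts, and the names `author_scores`, `style_features`, and `level2_input` are all illustrative assumptions. The last lines show one possible way the first-level scores could be passed to a second-level learner.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def author_scores(author_texts, unknown):
    """Score an unknown text against each author's TF-IDF profile."""
    names = list(author_texts)
    docs = [author_texts[name].split() for name in names]
    n_docs = len(docs)
    # Document frequency: in how many author profiles each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption; other variants are equally plausible).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    def vectorize(tokens):
        counts = Counter(tokens)
        # Terms never seen in any author profile are ignored.
        return {t: (c / len(tokens)) * idf[t] for t, c in counts.items() if t in idf}

    query = vectorize(unknown.split())
    return {name: cosine(query, vectorize(doc)) for name, doc in zip(names, docs)}


# Toy data, invented for illustration only.
author_texts = {
    "author_a": "the river runs and the river bends and then it rests",
    "author_b": "markets rise because traders buy while markets fall because traders sell",
}
scores = author_scores(author_texts, "and then the river rests")
predicted = max(scores, key=scores.get)

# Hypothetical second-level input: style features plus first-level scores.
style_features = [0.42, 17.5]  # placeholder values, not from the paper
level2_input = style_features + [scores[name] for name in sorted(scores)]
```

In this sketch the first-level scores are simply appended to the other features; how the paper actually combines the two levels is not specified in this record.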
Results of machine learning classifier.
| Author | Bagging model Accuracy | AdaBoost model Accuracy |
|---|---|---|
| Author 1 | 91.66% | 91.66% |
| Author 2 | 88.33% | 95% |
| Author 3 | 93.33% | 93.33% |
| Author 4 | 95% | 96.66% |
| Author 5 | 98.33% | 98.33% |
| Author 6 | 88.33% | 91.66% |
| Author 7 | 91.66% | 93.33% |
| Author 8 | 91.66% | 95% |
| Author 9 | 85% | 93.33% |
| Author 10 | 91.66% | 88.33% |
Results of using two-level classifiers.
| Author | Bagging Accuracy Based on a two-level classifier | AdaBoost Accuracy Based on a two-level classifier |
|---|---|---|
| Author 1 | 93.8% | 96.6% |
| Author 2 | 91.3% | 95% |
| Author 3 | 93.3% | 96.6% |
| Author 4 | 95% | 96.6% |
| Author 5 | 98.3% | 100% |
| Author 6 | 91.6% | 96.6% |
| Author 7 | 95% | 95% |
| Author 8 | 93.3% | 96.6% |
| Author 9 | 93.3% | 95% |
| Author 10 | 95% | 98.6% |
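
As a quick arithmetic check, the per-author values in the two-level table can be averaged. This sketch assumes the overall accuracy is the unweighted mean over the ten authors, which the record does not state; the variable names are illustrative.

```python
# Per-author two-level accuracies (%), copied from the table above.
bagging = [93.8, 91.3, 93.3, 95.0, 98.3, 91.6, 95.0, 93.3, 93.3, 95.0]
adaboost = [96.6, 95.0, 96.6, 96.6, 100.0, 96.6, 95.0, 96.6, 95.0, 98.6]


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


bagging_mean = round(mean(bagging), 2)
adaboost_mean = round(mean(adaboost), 2)
print(f"Bagging mean:  {bagging_mean}%")
print(f"AdaBoost mean: {adaboost_mean}%")
```

Under this assumption the Bagging mean comes to about 93.99%, consistent with the 94% quoted in the abstract. The AdaBoost mean comes to about 96.66%, which differs slightly from the quoted 96.16%; the reported figure may have been computed differently or from unrounded values.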
Results of the proposed method and compared method.
| Reference | Method | Accuracy | Arabic corpus |
|---|---|---|---|
| Alaa et al. [ | MNB | 92.03% | Arabic articles collected from Alwaraq website |
| | MPNB | 87.40% | |
| This paper | AdaBoost | 96.16% | Same data |
| | Bagging | 94% | |