Shazia Maqsood¹, Abdul Shahid¹, Muhammad Tanvir Afzal², Muhammad Roman¹, Zahid Khan³, Zubair Nawaz⁴, Muhammad Haris Aziz⁵.
Abstract
Readability has been an active field of research since the late nineteenth century and is still vigorously pursued today. The recent boom in data-driven machine learning has opened a viable path forward for readability classification and ranking. Evaluating text readability is a long-standing problem that is even more relevant in today's information-rich world. This paper addresses readability assessment for the English language: given an input sentence, the objective is to predict its readability level, which corresponds to the level of literacy expected of the target readers. Readability plays a crucial role in both writing and comprehension during English language learning, and selecting and presenting a suitable collection of sentences to English Language Learners may considerably improve their learning curve. In this research, we used 30,000 English sentences for experimentation and annotated them into seven readability levels using Flesch-Kincaid. We then conducted experiments with five machine learning algorithms: k-nearest neighbours (KNN), support vector machine (SVM), logistic regression (LR), naive Bayes (NB), and artificial neural network (ANN). The classification models produced excellent and stable results, with the ANN model obtaining an F-score of 0.95 on the test set. The developed model may be used in educational settings for tasks such as language learning and assessing a learner's reading and writing abilities.
Keywords: Flesch-Kincaid; Language learning; Machine learning; Natural language processing; Sentence readability
Year: 2022 PMID: 35111913 PMCID: PMC8771811 DOI: 10.7717/peerj-cs.818
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
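The abstract states that the 30,000 sentences were labelled into seven levels with Flesch-Kincaid, but it does not give the cut-points between levels. The sketch below computes the standard Flesch-Kincaid grade level from word, sentence and syllable counts and then bins it into seven levels. The thresholds in `LEVEL_CUTS` and the function names are hypothetical, chosen only for illustration, and the example assumes the "Feature set" example row (9 words, 13 syllables) is a single sentence.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical grade-level cut-points producing seven bins (levels 1-7).
# The paper does not publish its thresholds; these are illustrative only.
LEVEL_CUTS = (2.0, 4.0, 6.0, 8.0, 10.0, 12.0)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid grade level computed from raw counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def readability_level(grade: float, cuts: Sequence[float] = LEVEL_CUTS) -> int:
    """Map a grade to a level in 1..len(cuts)+1; a higher level is harder."""
    return bisect_right(cuts, grade) + 1


if __name__ == "__main__":
    # Counts from the "Feature set" example row, assuming one sentence.
    grade = flesch_kincaid_grade(words=9, sentences=1, syllables=13)
    print(f"grade={grade:.2f}, level={readability_level(grade)}")
```

Run as a script, this prints `grade=4.96, level=3`. `bisect_right` places a grade that falls exactly on a cut-point into the higher level; a different boundary convention would only change that tie-breaking.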
Figure 1. Ways of learning a foreign language.
Traditional text readability methods.
(✓ marks the elements each formula takes into account.)

| S. No. | Formula | Long word count | Sentence count | Syllable count | Word count | Word frequency |
|---|---|---|---|---|---|---|
| 1 | Flesch-Kincaid | ✓ | ✓ | ✓ | | |
| 2 | Flesch Reading Ease | ✓ | ✓ | ✓ | | |
| 3 | Gunning FOG Index | ✓ | ✓ | ✓ | | |
| 4 | New Dale-Chall | ✓ | ✓ | ✓ | | |
| 5 | Fry Readability Graph | ✓ | ✓ | ✓ | ✓ | |
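For reference, the commonly published forms of three of the formulas listed above are given below, with $W$ words, $S$ sentences, $Y$ syllables and $C$ complex words (roughly, words of three or more syllables). These are the standard coefficients; the paper does not state which variant or implementation it used.

```latex
\begin{aligned}
\text{FKGL} &= 0.39\,\frac{W}{S} + 11.8\,\frac{Y}{W} - 15.59 \\
\text{FRE}  &= 206.835 - 1.015\,\frac{W}{S} - 84.6\,\frac{Y}{W} \\
\text{Fog}  &= 0.4\left(\frac{W}{S} + 100\,\frac{C}{W}\right)
\end{aligned}
```

Flesch-Kincaid and Flesch Reading Ease use the same two ratios with opposite signs, so a harder sentence raises FKGL and lowers FRE.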
Comparative analysis of machine learning methods measuring text readability.
| S. No. | Research study | Methodology/Algorithm | Language | Dataset | Benchmark | Accuracy | Elements taken into account |
|---|---|---|---|---|---|---|---|
| 1 | READ-IT | SVM | Italian | Newspaper data/ | Flesch-Kincaid | 80% | Lexical and syntactic features |
| 2 | | Unigram model | English | Educational web pages | Flesch-Kincaid | 75.4% | Surface linguistic and content features |
| 3 | | SVM | English | An educational newspaper | Flesch-Kincaid and Lexile | Not mentioned | Syntactic features, coherence |
| 4 | | Unigram model | English | Thirty articles from the Wall Street Journal | Not mentioned | 88% | Lexical, syntactic, and discourse features |
| 5 | Heilman et al. (2007) | Unigram model | English | Textbook materials from English as a Second Language reading courses | Not mentioned | Not mentioned | Syntactic features, coherence |
Figure 2. Proposed methodology adopted for sentence classification.
Feature set.
| S. No. | Feature | Example value |
|---|---|---|
| 1 | No. of words | 9 |
| 2 | No. of syllables | 13 |
| 3 | Noun phrases | 1 |
| 4 | Complex words | 0 |
| 5 | Nouns | 3 |
| 6 | Verbs | 3 |
| 7 | Adverbs | 0 |
| 8 | Adjectives | 1 |
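The word, syllable and complex-word counts in the feature set can be approximated without a parser. The sketch below is one such approximation, not the authors' extractor: it uses a vowel-run heuristic for syllables and treats words of three or more syllables as complex (both assumptions). The part-of-speech features (noun phrases, nouns, verbs, adverbs, adjectives) would need a tagger and are omitted. The names `count_syllables` and `surface_features` are hypothetical.

```python
import re
from typing import Dict

# Heuristic assumptions: a syllable is a run of vowels (y counted as a vowel),
# a final silent "e" is dropped, and every word has at least one syllable.
VOWEL_RUN = re.compile(r"[aeiouy]+")
# A word is a run of letters, optionally with one internal apostrophe ("don't").
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def count_syllables(word: str) -> int:
    """Approximate syllable count of a single word."""
    w = word.lower()
    n = len(VOWEL_RUN.findall(w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1  # "make" -> 1, but "table" keeps 2
    return max(1, n)


def surface_features(sentence: str) -> Dict[str, int]:
    """Word, syllable and complex-word counts for one sentence."""
    words = WORD.findall(sentence)
    syllables = [count_syllables(w) for w in words]
    return {
        "words": len(words),
        "syllables": sum(syllables),
        # Assumed definition: a complex word has three or more syllables.
        "complex_words": sum(1 for s in syllables if s >= 3),
    }


if __name__ == "__main__":
    print(surface_features("Education is important."))
```

Run as a script, this prints `{'words': 3, 'syllables': 8, 'complex_words': 2}`. A dictionary-based syllable counter would be more accurate than the vowel-run rule, which miscounts words such as "readability".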
Figure 3. Confusion matrix for (A) KNN, (B) LR, (C) SVM, and (D) NB.
Figure 4. ROC curves for (A) KNN, (B) LR, (C) NB, and (D) SVM.
Figure 5. Confusion matrix for the artificial neural network.
Figure 6. ROC curves for the artificial neural network.
Figure 7. Classification results with all the features.
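The F-score reported for the ANN summarises a seven-class confusion matrix such as the one in Figure 5. The paper does not say whether the score is macro-, micro- or weighted-averaged, so the sketch below assumes macro averaging, the unweighted mean of per-class $F_1 = 2TP / (2TP + FP + FN)$. This is the same quantity scikit-learn's `f1_score` reports with `average="macro"`. The function names and the toy matrix are illustrative and are not data from the paper.

```python
from typing import List, Sequence


def per_class_f1(cm: Sequence[Sequence[int]]) -> List[float]:
    """F1 for each class of a square confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                        # true k, predicted otherwise
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(cm: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 3-class matrix for illustration only.
    toy = [[8, 1, 1],
           [0, 9, 1],
           [1, 0, 9]]
    print(f"macro F1 = {macro_f1(toy):.3f}")
```

Macro averaging weights every readability level equally. If the seven levels were imbalanced, a weighted average could give a noticeably different number.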