Lihua Jian, Huiqun Xiang, Guobin Le.
Abstract
Text readability measurement is important for meeting readers' information needs, and with the explosive growth of modern information the demand for it keeps increasing. Considering text structure at the word, sentence, and document levels, this paper proposes a hybrid network model based on a convolutional neural network to measure the readability of English texts. Traditional English readability measurement relies heavily on features hand-crafted by human experts, which limits its practicality. As the variety and number of features to be extracted grow, manual extraction of deep features becomes increasingly difficult and easily introduces irrelevant or redundant features that degrade model performance. Drawing on the concept of hybrid network models in deep learning, this paper combines a convolutional neural network, a bidirectional long short-term memory network, and an attention mechanism into a hybrid model suited to English text readability measurement, replacing manual feature engineering with machine-learned feature extraction and thereby substantially improving both the efficiency and the performance of readability measurement.
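The attention step described in the abstract, pooling the BiLSTM's per-token outputs into one document vector, can be illustrated with a minimal numpy sketch. The function name `attention_pool` and the fixed query vector are hypothetical illustrations under standard attention-pooling assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(scores):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(scores - scores.max())
    return e / e.sum()

def attention_pool(hidden_states, query):
    # hidden_states: (T, d) per-token BiLSTM outputs; query: (d,) learned vector
    scores = hidden_states @ query       # alignment score per time step, shape (T,)
    weights = softmax(scores)            # attention distribution, sums to 1
    return weights @ hidden_states       # weighted sum of states, shape (d,)

# toy example: 5 time steps, 4-dimensional hidden states
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))
q = rng.normal(size=4)
pooled = attention_pool(H, q)
```

In a trained model the query vector would be a learned parameter; here it is random only to show the shapes involved.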
Year: 2022 PMID: 35330607 PMCID: PMC8940551 DOI: 10.1155/2022/6984586
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1: CNN model structure.
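Assuming a standard text-CNN reading of Figure 1, each convolution kernel slides over a window of word embeddings and the strongest response is kept (max-over-time pooling). This is a hedged sketch of that operation, not the paper's exact architecture:

```python
import numpy as np

def conv_max_over_time(embeddings, filt):
    # embeddings: (T, d) word vectors; filt: (h, d) one convolution kernel
    T, _ = embeddings.shape
    h = filt.shape[0]
    # one response per window position, then max-over-time pooling
    responses = [np.sum(embeddings[t:t + h] * filt) for t in range(T - h + 1)]
    return max(responses)

def text_cnn_features(embeddings, filters):
    # filters: list of (h, d) kernels (e.g. window sizes 3, 4, 5); one feature each
    return np.array([conv_max_over_time(embeddings, f) for f in filters])
```

With several kernels per window size, concatenating these pooled features yields the fixed-length vector passed to the downstream layers.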
Details of the Weekly Reader and WeeBit corpora.
| Corpus | Reading level | Applicable age | Number of chapters | Average number of sentences per text |
|---|---|---|---|---|
| Weekly Reader corpus | Level 2 | 7–8 | 633 | 23.45 |
| | Level 3 | 8–9 | 795 | 23.22 |
| | Level 4 | 9–10 | 805 | 29.17 |
| | Senior | 10–12 | 1316 | 31.22 |
| WeeBit corpus | Level 2 | 7–8 | 641 | 23.01 |
| | Level 3 | 8–9 | 791 | 23.45 |
| | Level 4 | 9–10 | 822 | 29.23 |
| | KS3 | 11–14 | 652 | 22.11 |
| | GCSE | 14–16 | 3600 | 28.22 |
PCC value range and its correlation strength.
| PCC value | Correlation strength |
|---|---|
| 0–0.2 | Very weak correlation or no correlation |
| 0.2–0.4 | Weak correlation |
| 0.4–0.6 | Moderate correlation |
| 0.6–0.8 | Strong correlation |
| 0.8–1 | Extremely strong correlation |
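The Pearson correlation coefficient used in the evaluation, and the qualitative bands in the table above, can be sketched directly in numpy (the band labels and function names are illustrative, not from the paper):

```python
import numpy as np

def pearson(x, y):
    # sample Pearson correlation coefficient between two sequences
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def correlation_strength(pcc):
    # map |PCC| to the qualitative bands listed in the table above
    a = abs(pcc)
    for upper, label in [(0.2, "very weak or none"), (0.4, "weak"),
                         (0.6, "moderate"), (0.8, "strong"),
                         (1.0, "extremely strong")]:
        if a <= upper:
            return label
    raise ValueError("PCC must lie in [-1, 1]")
```

For example, perfectly linearly related sequences give a PCC of 1.0, which falls in the "extremely strong" band.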
Experimental environment parameters.
| Name | Parameter |
|---|---|
| Memory | 15.6 GB |
| Graphics | GeForce GTX 1080 Ti / PCIe / SSE2 |
| Processor | Intel Core™ i7-8700 CPU @ 3.70 GHz × 12 |
Hyperparameter settings.
| Hyperparameter | Introduction | Value |
|---|---|---|
| learning.rate | Initial value of learning rate | 0.001 |
| embedding.size | Word vector dimension | 100 |
| filter.size | Convolution kernel sizes | 3, 4, 5 |
| num.filter | Number of convolution kernels | 200 |
| Dropout | Dropout probability size | 0.5 |
| l2.reg.lambda | Size of L2 regularized lambda | 0.0001 |
| lstm.hidden | LSTM hidden layer size | 100 |
| batch.size | Batch size | 100 |
| max.length | Maximum sequence length | 1538 |
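For reproducibility, the settings in the table above could be collected in a single configuration mapping. The dictionary name and keys below are hypothetical (normalized to snake_case), mirroring the table's values:

```python
# hypothetical config mirroring the hyperparameter table above
hyperparams = {
    "learning_rate": 0.001,     # initial learning rate
    "embedding_size": 100,      # word vector dimension
    "filter_sizes": (3, 4, 5),  # convolution kernel window sizes
    "num_filters": 200,         # number of convolution kernels
    "dropout": 0.5,             # dropout probability
    "l2_reg_lambda": 1e-4,      # L2 regularization lambda
    "lstm_hidden": 100,         # LSTM hidden layer size
    "batch_size": 100,
    "max_length": 1538,         # maximum token sequence length
}
```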
Comparison with CNN- and LSTM-based models.
| Model | Accuracy | Pearson correlation coefficient |
|---|---|---|
| CNN | 0.801 | 0.840 |
| LSTM | 0.711 | 0.744 |
| BiLSTM | 0.719 | 0.836 |
| CNN-BiLSTM | 0.831 | 0.892 |
| CNN-BiLSTM-MoT | 0.877 | 0.921 |
| CNN-BiLSTM-ATT | 0.886 | 0.938 |
Comparison with existing traditional methods (on WeeBit data set).
| Model | Accuracy | Pearson correlation coefficient |
|---|---|---|
| Model 1 | 0.929 | — |
| Model 2 | 0.811 | 0.902 |
| The proposed model | 0.891 | 0.932 |
Comparison with existing traditional methods (on Weekly Reader data set).
| Model | Accuracy | Pearson correlation coefficient |
|---|---|---|
| Model 3 | 0.732 | — |
| Model 4 | 0.628 | — |
| Model 1 | 0.911 | — |
| The proposed model | 0.775 | 0.836 |