| Literature DB >> 35808394 |
Ivan Vajs, Vanja Ković, Tamara Papić, Andrej M. Savić, Milica M. Janković.
Abstract
Considering the detrimental effects of dyslexia on academic performance and its common occurrence, developing tools for dyslexia detection, monitoring, and treatment is a task of significant priority. The research presented in this paper focused on detecting and analyzing dyslexic tendencies in Serbian children based on eye-tracking measures. A group of 30 children (ages 7-13; 15 dyslexic and 15 non-dyslexic) read 13 different text segments on 13 different color configurations. For each text segment, the corresponding eye-tracking trail was recorded, processed offline, and represented by nine conventional features and five newly proposed features. The features were used for dyslexia recognition with several machine learning algorithms: logistic regression, support vector machine, k-nearest neighbors, and random forest. The highest accuracy of 94% was achieved using all the implemented features and leave-one-subject-out cross-validation. Afterwards, the most important features for dyslexia detection (representing the complexity of fixation gaze) were used in a statistical analysis of individual color effects on dyslexic tendencies within the dyslexic group. The statistical analysis showed that the influence of color has high inter-subject variability. This paper is the first to introduce features that provide clear separability between dyslexic and control groups in the Serbian language (a language with a shallow orthographic system). Furthermore, the proposed features could be used as biomarkers for objective quantification in diagnosing and tracking dyslexia.
Keywords: colored background; developmental dyslexia; eye-tracking; feature extraction; k-nearest neighbors; logistic regression; machine learning; random forest; reading; screening; support vector machine
Year: 2022 PMID: 35808394 PMCID: PMC9269601 DOI: 10.3390/s22134900
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Trial visualization example for (A) a control subject and (B) a dyslexic subject. The line color encodes the line length in pixels according to the presented color scale, and the red stars mark blink events.
ML algorithm input feature options.

| No. | Algorithm Input Option |
|---|---|
| 1. | Conventional features |
| 2. | Proposed features |
| 3. | Conventional and proposed features |
| 4.–17. | Single-feature inputs (one eye-tracking feature per option; feature labels not preserved in this record) |
Figure 2. The analysis pipeline. LR—logistic regression; SVM—support vector machine; KNN—k-nearest neighbors; RF—random forest.
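The evaluation scheme described above (four classifiers compared under leave-one-subject-out cross-validation) can be sketched with scikit-learn as follows. The feature matrix, labels, and subject grouping below are synthetic stand-ins, not the study's data; the classifier hyperparameters are library defaults, not the paper's settings:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in: 30 subjects x 13 trials, 14 eye-tracking features each.
n_subjects, n_trials, n_features = 30, 13, 14
X = rng.normal(size=(n_subjects * n_trials, n_features))
y = np.repeat(np.arange(n_subjects) % 2, n_trials)   # 15 dyslexic / 15 control
groups = np.repeat(np.arange(n_subjects), n_trials)  # subject ID per trial
X[y == 1] += 0.5                                     # inject a class difference

models = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "RF": RandomForestClassifier(random_state=0),
}

# Leave-one-subject-out: all 13 trials of the held-out child form the test fold,
# so no subject contributes to both training and testing.
cv = LeaveOneGroupOut()
for name, model in models.items():
    scores = cross_val_score(model, X, y, groups=groups, cv=cv)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```

Grouping by subject rather than by trial is the key design choice here: splitting at the trial level would leak each child's reading style into the training set and inflate accuracy.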
Feature group classification evaluation metrics (proposed-feature values not preserved in this record). ACC—accuracy; Se—sensitivity; Sp—specificity; AUROC—area under the receiver operating characteristic curve.

| Feature Group | Metric | LR | SVM | KNN | RF |
|---|---|---|---|---|---|
| Conventional features | ACC | 0.84 | 0.85 | 0.81 | 0.82 |
| | Se | 0.78 | 0.72 | 0.66 | 0.75 |
| | Sp | 0.90 | 0.97 | 0.94 | 0.92 |
| | F1 score | 0.83 | 0.82 | 0.77 | 0.81 |
| | AUROC | 0.88 | 0.89 | 0.87 | 0.86 |
| Proposed features | ACC | | | | |
| | Se | | | | |
| | Sp | | | | |
| | F1 score | | | | |
| | AUROC | | | | |
| All features | ACC | 0.94 | 0.93 | 0.87 | 0.94 |
| | Se | 0.89 | 0.87 | 0.75 | 0.86 |
| | Sp | 0.98 | 0.98 | 0.98 | 0.97 |
| | F1 score | 0.93 | 0.92 | 0.84 | 0.91 |
| | AUROC | 0.96 | 0.97 | 0.94 | 0.94 |
Classification accuracies for single feature inputs (feature labels not preserved in this record; rows numbered in source order).

| Feature | SVM | LR | RF | KNN |
|---|---|---|---|---|
| 1 | 0.78 | 0.75 | 0.74 | 0.76 |
| 2 | 0.90 | 0.90 | 0.89 | 0.89 |
| 3 | 0.74 | 0.74 | 0.76 | 0.73 |
| 4 | | | | |
| 5 | 0.89 | 0.90 | 0.89 | 0.89 |
| 6 | 0.84 | 0.85 | 0.84 | 0.84 |
| 7 | 0.78 | 0.74 | 0.77 | 0.76 |
| 8 | 0.35 | 0.30 | 0.52 | 0.63 |
| 9 | 0.46 | 0.49 | 0.48 | 0.63 |
| 10 | 0.81 | 0.81 | 0.83 | 0.82 |
| 11 | 0.78 | 0.74 | 0.76 | 0.76 |
| 12 | 0.57 | 0.47 | 0.63 | 0.57 |
| 13 | 0.48 | 0.56 | 0.60 | 0.56 |
| 14 | 0.80 | 0.77 | 0.74 | 0.75 |
Figure 3. Feature importance of the eye-tracking features based on the decrease in impurity of the random forest algorithm.
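An impurity-based ranking like the one behind Figure 3 can be reproduced with scikit-learn's `feature_importances_` attribute (mean decrease in Gini impurity, averaged over the trees of the forest). The data here are an illustrative toy set, not the study's eye-tracking features; only feature 0 is made informative so the ranking is predictable:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Toy data: 200 samples, 5 features; the label depends only on feature 0.
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)

# Mean decrease in impurity per feature; the importances sum to 1.
importances = rf.feature_importances_
ranking = np.argsort(importances)[::-1]
print("most important feature index:", ranking[0])
```

Impurity-based importances are computed on the training data and can favor high-cardinality features; permutation importance on held-out data is a common cross-check.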
Figure 4. Boxplots of (A) fixation intersection coefficient, (B) fixation fractal dimension, and (C) fixation intersection variability for each color configuration and both subject groups (dyslexic and control).
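The exact definition of the fixation fractal dimension is not given in this record; as an illustration of the general idea only, a box-counting estimate over a 2-D gaze trace might look like the sketch below. The grid sizes and the synthetic reading sweep are assumptions, not the paper's method:

```python
import numpy as np

def box_counting_dimension(points, box_sizes):
    """Estimate fractal dimension as the slope of log N(s) vs log(1/s),
    where N(s) is the number of grid boxes of side s covering the points."""
    points = np.asarray(points, dtype=float)
    # Normalize the trace into the unit square.
    points = (points - points.min(axis=0)) / np.ptp(points, axis=0)
    counts = []
    for s in box_sizes:
        # Map each point to a box index and count distinct occupied boxes.
        boxes = np.unique(np.floor(points / s), axis=0)
        counts.append(len(boxes))
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(box_sizes)), np.log(counts), 1)
    return slope

# Synthetic gaze trace: a noisy left-to-right reading sweep.
rng = np.random.default_rng(2)
t = np.linspace(0, 1, 500)
trace = np.column_stack([t, 0.05 * rng.normal(size=t.size)])
d = box_counting_dimension(trace, box_sizes=[1/4, 1/8, 1/16, 1/32])
print(f"estimated dimension: {d:.2f}")
```

A smooth line yields an estimate near 1, while a rough, jittery scanpath pushes the estimate toward 2, which is why such a measure can capture the "complexity of fixation gaze" mentioned in the abstract.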
Figure 5. Visualization of data for all dyslexic subjects for the three color configurations that (A) show a statistically significant difference and (B) show no statistically significant difference. Dots represent the background color configurations; circles represent the overlay color configurations.
Sensitivity for single feature inputs (feature labels not preserved in this record; rows numbered in source order).

| Feature | SVM | LR | RF | KNN |
|---|---|---|---|---|
| 1 | 0.59 | 0.65 | 0.60 | 0.61 |
| 2 | 0.85 | 0.84 | 0.83 | 0.84 |
| 3 | 0.54 | 0.61 | 0.62 | 0.58 |
| 4 | | | | |
| 5 | 0.84 | 0.87 | 0.83 | 0.84 |
| 6 | 0.74 | 0.78 | 0.76 | 0.77 |
| 7 | 0.59 | 0.64 | 0.60 | 0.59 |
| 8 | 0.31 | 0.29 | 0.48 | 0.50 |
| 9 | 0.32 | 0.33 | 0.43 | 0.50 |
| 10 | 0.74 | 0.75 | 0.82 | 0.83 |
| 11 | 0.59 | 0.64 | 0.61 | 0.59 |
| 12 | 0.44 | 0.46 | 0.42 | 0.45 |
| 13 | 0.28 | 0.46 | 0.44 | 0.44 |
| 14 | 0.61 | 0.65 | 0.59 | 0.62 |
Specificity for single feature inputs (feature labels not preserved in this record; rows numbered in source order).

| Feature | SVM | LR | RF | KNN |
|---|---|---|---|---|
| 1 | 0.95 | 0.83 | 0.89 | 0.90 |
| 2 | 0.95 | 0.96 | 0.93 | 0.94 |
| 3 | 0.94 | 0.85 | 0.87 | 0.86 |
| 4 | | | | |
| 5 | 0.93 | 0.92 | 0.93 | 0.93 |
| 6 | 0.93 | 0.91 | 0.91 | 0.91 |
| 7 | 0.95 | 0.83 | 0.92 | 0.91 |
| 8 | 0.39 | 0.34 | 0.59 | 0.76 |
| 9 | 0.60 | 0.64 | 0.60 | 0.77 |
| 10 | 0.87 | 0.85 | 0.82 | 0.83 |
| 11 | 0.95 | 0.83 | 0.89 | 0.91 |
| 12 | 0.70 | 0.47 | 0.73 | 0.70 |
| 13 | 0.69 | 0.66 | 0.81 | 0.70 |
| 14 | 0.97 | 0.86 | 0.83 | 0.87 |
F1 score for single feature inputs (feature labels not preserved in this record; rows numbered in source order).

| Feature | SVM | LR | RF | KNN |
|---|---|---|---|---|
| 1 | 0.72 | 0.71 | 0.70 | 0.71 |
| 2 | 0.89 | 0.89 | 0.87 | 0.88 |
| 3 | 0.67 | 0.69 | 0.70 | 0.68 |
| 4 | | | | |
| 5 | 0.88 | 0.89 | 0.87 | 0.88 |
| 6 | 0.81 | 0.83 | 0.82 | 0.83 |
| 7 | 0.72 | 0.71 | 0.71 | 0.70 |
| 8 | 0.32 | 0.28 | 0.50 | 0.57 |
| 9 | 0.37 | 0.39 | 0.47 | 0.58 |
| 10 | 0.78 | 0.79 | 0.82 | 0.83 |
| 11 | 0.72 | 0.71 | 0.71 | 0.70 |
| 12 | 0.50 | 0.46 | 0.50 | 0.51 |
| 13 | 0.34 | 0.51 | 0.53 | 0.50 |
| 14 | 0.74 | 0.73 | 0.67 | 0.71 |
Area under the receiver operating characteristic curve for single feature inputs (feature labels not preserved in this record; rows numbered in source order).

| Feature | SVM | LR | RF | KNN |
|---|---|---|---|---|
| 1 | 0.67 | 0.79 | 0.73 | 0.75 |
| 2 | 0.94 | 0.95 | 0.92 | 0.94 |
| 3 | 0.74 | 0.73 | 0.77 | 0.77 |
| 4 | | | | |
| 5 | 0.93 | 0.96 | 0.92 | 0.94 |
| 6 | 0.87 | 0.89 | 0.86 | 0.87 |
| 7 | 0.67 | 0.78 | 0.72 | 0.72 |
| 8 | 0.37 | 0.32 | 0.59 | 0.61 |
| 9 | 0.51 | 0.38 | 0.58 | 0.62 |
| 10 | 0.85 | 0.87 | 0.85 | 0.87 |
| 11 | 0.67 | 0.78 | 0.72 | 0.72 |
| 12 | 0.53 | 0.34 | 0.59 | 0.58 |
| 13 | 0.40 | 0.54 | 0.60 | 0.58 |
| 14 | 0.68 | 0.79 | 0.73 | 0.76 |