Abstract
Machine learning models are biased toward the data seen during training: they tend to perform well on classes with many examples and poorly on classes with few. This problem generally arises when the classes to predict are imbalanced, which is frequent in educational data, where, for example, some skills are very difficult and others very easy to master. There is less data on students who correctly answer questions related to difficult skills, or who incorrectly answer questions related to easy skills. In this paper, we tackle this problem by proposing a hybrid architecture for user modeling that combines deep neural network architectures, especially Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN), with expert knowledge. The proposed solution uses an attention mechanism to infuse expert knowledge into the deep neural network. It has been tested in two contexts: knowledge tracing in an intelligent tutoring system (ITS) called Logic-Muse, and prediction of socio-moral reasoning in a serious game called MorALERT. The proposed solution is compared to state-of-the-art machine learning solutions, and experiments show that the resulting model can accurately predict the current student's knowledge state (in Logic-Muse) and thus enable accurate personalization of the learning process. Other experiments show that the model can also predict the level of socio-moral reasoning skills (in MorALERT). Our findings suggest the need for hybrid neural networks that integrate prior expert knowledge, especially when it is necessary to compensate for the strong dependency of deep learning methods on data size, or for possibly unbalanced datasets. Many domains can benefit from such an approach to building models that generalize even with small training datasets.
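The abstract describes the attention-based infusion of expert knowledge only at a high level. As a hypothetical sketch (dimensions, variable names, and the fusion scheme are our assumptions, not the paper's exact architecture), dot-product attention over a network hidden state and expert feature vectors could look like this:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, sources):
    """Dot-product attention: weight each source vector by its similarity to the query."""
    scores = sources @ query            # one score per source vector
    weights = softmax(scores)           # attention distribution (sums to 1)
    context = weights @ sources         # weighted combination of the sources
    return context, weights

rng = np.random.default_rng(0)
hidden = rng.normal(size=8)             # e.g., an LSTM hidden state (dim 8, our choice)
expert = rng.normal(size=(4, 8))        # e.g., 4 expert-knowledge feature vectors
sources = np.vstack([hidden, expert])   # attend over both the network state and expert features

context, weights = attend(hidden, sources)
print(context.shape, weights.shape)     # (8,) (5,)
```

The resulting context vector blends learned and expert-provided information; downstream layers can consume it in place of the raw hidden state.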
Keywords: attention; deep learning; expert knowledge; hybrid neural networks; logical reasoning skill; socio-moral reasoning skill; user modeling
Year: 2022 PMID: 35719689 PMCID: PMC9203682 DOI: 10.3389/frai.2022.921476
Source DB: PubMed Journal: Front Artif Intell ISSN: 2624-8212
Figure 1. Attention mechanism. Example of translation from English (I am a student) to French (je suis un étudiant).
Figure 2. Global attentional hybrid model.
Distribution of responses over skills.

| Code | Skill | N | Average | SD |
|---|---|---|---|---|
| MppFd | Modus ponendo ponens with few disabling conditions | 294 |  | 0.16078 |
| MppMd | Modus ponendo ponens with many disabling conditions | 294 | 0.898 | 0.23726 |
| MppCcf | Modus ponendo ponens at the counterfactual level | 294 |  | 0.2394 |
| MppA | Modus ponendo ponens at the abstract level | 294 |  | 0.16066 |
| MttFd | Modus tollendo tollens with few disabling conditions | 294 | 0.8435 | 0.26646 |
| MttMd | Modus tollendo tollens with many disabling conditions | 294 | 0.7925 | 0.29985 |
| MttCcf | Modus tollendo tollens at the counterfactual level | 294 | 0.7494 | 0.33326 |
| MttA | Modus tollendo tollens at the abstract level | 294 | 0.8401 | 0.28974 |
| AcMa | Affirmation of the consequent with many alternatives | 294 |  | 0.38072 |
| AcFa | Affirmation of the consequent with few alternatives | 294 |  | 0.3652 |
| AcCcf | Affirmation of the consequent at the counterfactual level | 294 |  | 0.40801 |
| AcA | Affirmation of the consequent at the abstract level | 294 |  | 0.41038 |
| DaMa | Denial of the antecedent with many alternatives | 294 |  | 0.37389 |
| DaFa | Denial of the antecedent with few alternatives | 294 |  | 0.35081 |
| DaCcf | Denial of the antecedent at the counterfactual level | 294 |  | 0.40662 |
| DaA | Denial of the antecedent at the abstract level | 294 |  | 0.42077 |

Skills difficult to master (Average < 0.4) and skills easy to master (Average > 0.9) are in bold.
The DKT and the DKT+BN on skills that are difficult to master.
| Model | Skill 1 |  | Skill 2 |  | Skill 3 |  | Skill 4 |  | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| DKT | 0.74 | 0.09 | 0.72 | 0.0 | 0.79 | 0.0 | 0.79 | 0.0 |  |
| DKT+BN | 0.78 | 0.43 |  |  |  |  |  |  |  |

| Model | Skill 5 |  | Skill 6 |  | Skill 7 |  | Skill 8 |  | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| DKT | 0.79 | 0.0 | 0.73 | 0.0 | 0.80 | 0.0 | 0.79 | 0.0 |  |
| DKT+BN |  |  |  |  |  |  |  |  |  |
We repeated the experiments 20 times. The last column is the overall accuracy of the model. For each skill, the first column gives the F1-score for predicting that the skill was answered incorrectly, and the second the F1-score for predicting that it was answered correctly. The best ratio for each skill is in bold.
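The note above defines per-class F1-scores (one for "incorrectly answered", one for "correctly answered"). A minimal sketch of how such per-class scores are computed — the response data here is invented, and this is not the paper's own evaluation code:

```python
def f1_per_class(y_true, y_pred, cls):
    """F1 for one class: harmonic mean of precision and recall for that class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

# 1 = skill answered correctly, 0 = answered incorrectly (invented example data)
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 1, 1, 1]
print(f1_per_class(y_true, y_pred, 0))  # 0.5
print(f1_per_class(y_true, y_pred, 1))  # 0.8333...
```

A model that always predicts "correct" gets F1 = 0.0 on the minority "incorrect" class — exactly the failure mode the tables show for the DKT on rare outcomes.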
The DKT and the DKT+BN on skills that are easy to master.

| Model | Skill 1 |  | Skill 2 |  | Skill 3 |  | Skill 4 |  |
|---|---|---|---|---|---|---|---|---|
| DKT | 0.0 | 0.97 | 0.0 | 0.94 | 0.0 | 0.91 | 0.0 | 0.90 |
| DKT+BN | 0.0 | 0.95 | 0.0 | 0.97 | 0.0 | 0.89 | 0.19 | 0.88 |

| Model | Skill 5 |  | Skill 6 |  | Skill 7 |  | Skill 8 |  |
|---|---|---|---|---|---|---|---|---|
| DKT | 0.0 | 0.96 | 0.0 | 0.87 | 0.0 | 0.96 | 0.0 | 0.90 |
| DKT+BN | 0.0 | 0.97 | 0.12 | 0.85 | 0.0 | 0.99 | 0.1 | 0.89 |
We repeated the experiments 20 times.
Figure 3. Heatmaps illustrating the predictions of the two models on the same student. DKT is unable to make accurate predictions on skills with little data, especially skills that are difficult to master. The labels on the vertical axis refer to the input fed into the models at each time step. The color of the heatmap indicates the predicted probability that the student will correctly answer a question related to a skill at the next time step: the darker the color, the higher the probability. Here, the student gave 2 out of 3 correct answers on AC_FMA, and we see that the DKT+BN is able to capture that information, whereas the DKT is not.
Figure 4. MorALERT serious game.
Figure 5. Brief description of SoMoral coding and examples (Chiasson et al., 2017).
Distribution of verbatims between classes.
| Level | N | Level | N |
|---|---|---|---|
| 1 | 232 | 1.5 | 11 |
| 2 | 76 | 2.5 | 29 |
| 3 | 207 | 3.5 | 31 |
| 4 | 40 | 4.5 | 3 |
| 5 | 49 |  |  |
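Given an imbalanced distribution like the one above, a common mitigation is inverse-frequency class weighting during training. The sketch below uses the counts from the table, but the weighting scheme is our illustration, not necessarily the paper's method:

```python
# Verbatim counts per socio-moral reasoning level, taken from the table above
counts = {1: 232, 1.5: 11, 2: 76, 2.5: 29, 3: 207,
          3.5: 31, 4: 40, 4.5: 3, 5: 49}

total = sum(counts.values())          # 678 verbatims in all
n_classes = len(counts)               # 9 levels (including half-levels)

# Inverse-frequency weights, normalized so a perfectly balanced
# dataset would give every class a weight of 1.0
weights = {c: total / (n_classes * n) for c, n in counts.items()}

rarest = max(weights, key=weights.get)
print(total, rarest)                  # 678 4.5
```

The rarest class (level 4.5, with only 3 verbatims) receives the largest weight, so its few examples contribute more to the loss.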
Figure 6. Final model for predicting socio-moral reasoning skill levels.
Figure 7. The proposed hybrid architecture using LSTM (left) or CNN (right) for the prediction of socio-moral reasoning level.
Overall precision, recall, F1-score, and accuracy of all the models.

| Model | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|
| Expert-only | 0.47 | 0.40 | 0.38 | 0.40 |
| cnn-only | 0.58 | 0.53 | 0.49 | 0.53 |
| lstm-only | 0.42 | 0.43 | 0.42 | 0.43 |
| cnn-expert | 0.62 | 0.62 | 0.62 | 0.62 |
| lstm-expert | 0.54 | 0.53 | 0.51 | 0.53 |
| cnn-expert-att | 0.67 | 0.65 | 0.63 | 0.65 |
| lstm-expert-att | 0.59 | 0.60 | 0.58 | 0.60 |
| Final model | 0.72 | 0.75 | 0.73 | 0.75 |
Figure 8. Precision of all the models for each level.
Figure 9. F1-score of all the models for each level.