Manal Mohammed, Nazlia Omar.
Abstract
The assessment of examination questions is crucial in educational institutes, since examinations are one of the most common methods of evaluating students' achievement in a specific course. There is therefore a need to construct balanced, high-quality exams that cover different cognitive levels. Many lecturers rely on the cognitive domain of Bloom's taxonomy, a popular framework developed for assessing students' intellectual abilities and skills. Several works have been proposed to automatically classify questions in accordance with Bloom's taxonomy, but most of them classify questions from a specific domain. As a result, techniques for classifying questions that span multiple domains are lacking. The aim of this paper is to present a model that classifies exam questions from several areas based on Bloom's taxonomy. This study proposes a method for classifying questions automatically by extracting two features, TFPOS-IDF and word2vec. The first feature calculates the term frequency-inverse document frequency based on part of speech, in order to assign a suitable weight to the essential words in the question. The second feature, pre-trained word2vec, is used to boost the classification process. The combination of these features is then fed into three different classifiers, K-Nearest Neighbour (KNN), Logistic Regression (LR), and Support Vector Machine (SVM), in order to classify the questions. The experiments used two datasets: the first contained 141 questions and the second contained 600 questions. On the first dataset, KNN, LR, and SVM achieved average weighted F1-measures of 71.1%, 82.3%, and 83.7%, respectively. On the second dataset, they achieved 85.4%, 89.4%, and 89.7%, respectively. The findings of this study show that the proposed method is significant in classifying questions from multiple domains based on Bloom's taxonomy.
Year: 2020 PMID: 32191738 PMCID: PMC7081997 DOI: 10.1371/journal.pone.0230442
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Bloom’s taxonomy cognitive domain levels.
Fig 2. Proposed model.
The number of questions in each dataset.
| Cognitive Level | Collected Dataset | Yahya et al. (2012) |
|---|---|---|
| Knowledge Level | 26 | 100 |
| Comprehension Level | 23 | 100 |
| Application Level | 15 | 100 |
| Analysis Level | 23 | 100 |
| Synthesis Level | 30 | 100 |
| Evaluation Level | 24 | 100 |
Sample of questions from each dataset.
| Level | Question (collected dataset) | Area (collected dataset) | Question (Yahya et al. (2012) dataset) | Area (Yahya et al. (2012) dataset) |
|---|---|---|---|---|
| Knowledge | Memorize and recall the periodic table | Chemistry | Label the parts of the microscope shown on the right | Biology |
| Knowledge | Name three 19th-century women English authors. | Literature | List reserved words in C programming. | Computer programming |
| Comprehension | Explain how the heart is like a pump. | Biology | Retell the story in your words. | Literature |
| Comprehension | Draw a diagram explaining how air pressure affects the weather. | Physics | Determine the next number in a sequence | Math |
| Application | Design or sketch a marketing strategy for your product using a known strategy as a model. | Social marketing | Show how E-CRM can be used to improve marketing positioning as explained in the article | Social marketing |
| Application | Write a C++ statement to declare a variable of type music Type name MyTune. | Computer programming | Sketch a prediction of the field lines for the arrangement of electrodes shown in | Physics |
| Analysis | What is the relationship between probability and statistical analysis? | Math | By comparing the map of the tectonic plates to the earthquake map, what inferences can you make? | Geology |
| Analysis | Analyze a work of art in terms of form, color and texture. | Art | Break down the components of a standard film camera and explain how they interact to make the machine work. | Physics |
| Synthesis | Write a JAVA program to show the Overloading concept | Computer programming | Explain how the biological concept of symbiotic relationships could be used to help solve socially created problems like water pollution, overflowing garbage landfills, or homelessness. | Ecology |
| Synthesis | Use your imagination to create a picture about the story. Then, add one new thing that was not in the story. | Literature | Develop a SQA Plan for a software development project which is defined in the attached document. | Software engineering |
| Evaluation | Your advice has been sought to settle the following dispute in Company X. Referring to appropriate legal principles, write a short report advising the company on the best course of action to adopt. | Law | Conclude and support which economic system leads to a higher standard of living. | Economy |
| Evaluation | "Don’t use public instance variables" is defensive programming techniques. discuss why it is good advice? | Computer programming | Examine the stated positions of both major political candidates with regard to a particular issue and state good reasons (based on principles discussed in class) for why one candidates position is more likely to be effective than the other’s. | Political science |
Weighted F1-measure of different weight cases with collected dataset using KNN, LR and SVM.
| Cases | w1 | w2 | w3 | KNN | LR | SVM |
|---|---|---|---|---|---|---|
| CASE 1 | VB | OTHERWISE | - | 0.634 | 0.733 | 0.738 |
| CASE 2 | VB | NN | OTHERWISE | 0.656 | 0.751 | 0.759 |
| CASE 4 | VB | NN-RB | OTHERWISE | 0.647 | 0.739 | 0.749 |
| CASE 5 | VB | NN-JJ-RB | OTHERWISE | 0.633 | 0.743 | 0.751 |
| CASE 6 | VB-WH | OTHERWISE | - | 0.568 | 0.703 | 0.718 |
| CASE 7 | VB-WH | NN | OTHERWISE | 0.621 | 0.698 | 0.652 |
| CASE 8 | VB-WH | NN-JJ | OTHERWISE | 0.614 | 0.701 | 0.721 |
| CASE 9 | VB-WH | NN-RB | OTHERWISE | 0.625 | 0.698 | 0.718 |
| CASE 10 | VB-WH | NN-JJ-RB | OTHERWISE | 0.605 | 0.701 | 0.721 |
| CASE 11 | VB-WH-MD | OTHERWISE | - | 0.574 | 0.687 | 0.715 |
| CASE 12 | VB-WH-MD | NN | OTHERWISE | 0.610 | 0.687 | 0.713 |
| CASE 13 | VB-WH-MD | NN-JJ | OTHERWISE | 0.606 | 0.694 | 0.717 |
| CASE 14 | VB-WH-MD | NN-RB | OTHERWISE | 0.610 | 0.687 | 0.713 |
| CASE 15 | VB-WH-MD | NN-JJ-RB | OTHERWISE | 0.619 | 0.694 | 0.717 |
Weighted F1-measure of different weight cases with Yahya et al. (2012) dataset using KNN, LR and SVM.
| Cases | w1 | w2 | w3 | KNN | LR | SVM |
|---|---|---|---|---|---|---|
| CASE 1 | VB | OTHERWISE | - | 0.800 | 0.828 | 0.847 |
| CASE 2 | VB | NN | OTHERWISE | 0.824 | 0.848 | 0.861 |
| CASE 4 | VB | NN-RB | OTHERWISE | 0.817 | 0.834 | 0.841 |
| CASE 5 | VB | NN-JJ-RB | OTHERWISE | 0.826 | 0.837 | 0.855 |
| CASE 6 | VB-WH | OTHERWISE | - | 0.749 | 0.804 | 0.790 |
| CASE 7 | VB-WH | NN | OTHERWISE | 0.766 | 0.812 | 0.796 |
| CASE 8 | VB-WH | NN-JJ | OTHERWISE | 0.780 | 0.815 | 0.800 |
| CASE 9 | VB-WH | NN-RB | OTHERWISE | 0.770 | 0.814 | 0.797 |
| CASE 10 | VB-WH | NN-JJ-RB | OTHERWISE | 0.777 | 0.817 | 0.804 |
| CASE 11 | VB-WH-MD | OTHERWISE | - | 0.747 | 0.805 | 0.799 |
| CASE 12 | VB-WH-MD | NN | OTHERWISE | 0.762 | 0.815 | 0.803 |
| CASE 13 | VB-WH-MD | NN-JJ | OTHERWISE | 0.770 | 0.816 | 0.808 |
| CASE 14 | VB-WH-MD | NN-RB | OTHERWISE | 0.763 | 0.817 | 0.806 |
| CASE 15 | VB-WH-MD | NN-JJ-RB | OTHERWISE | 0.764 | 0.816 | 0.808 |
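The weight cases in the two tables above can be read as a mapping from part-of-speech tags to one of up to three weight groups. The following is a minimal sketch of that mapping, not the authors' implementation. It assumes Penn Treebank tags (the tagged example in this article uses tags such as VB, JJ and NNS) and assumes that the WH group covers WDT, WP, WP$ and WRB. The names `CASES`, `coarse_tag` and `weight_group` are hypothetical, and only a few of the listed cases are encoded.

```python
# Each case lists the coarse tags in group w1 and in group w2.
# Any remaining tag falls into the "OTHERWISE" group of that case.
CASES = {
    "CASE 1": (("VB",), ()),
    "CASE 2": (("VB",), ("NN",)),
    "CASE 5": (("VB",), ("NN", "JJ", "RB")),
    "CASE 6": (("VB", "WH"), ()),
    "CASE 15": (("VB", "WH", "MD"), ("NN", "JJ", "RB")),
}

# Assumption: these Penn Treebank tags make up the WH group.
WH_TAGS = ("WDT", "WP", "WP$", "WRB")


def coarse_tag(tag):
    """Collapse a fine-grained tag (e.g. VBZ, NNS) into a coarse label."""
    if tag in WH_TAGS:
        return "WH"
    for prefix in ("VB", "NN", "JJ", "RB", "MD"):
        if tag.startswith(prefix):
            return prefix
    return "OTHER"


def weight_group(tag, case):
    """Return 'w1', 'w2' or 'w3' for a tag under the given weight case."""
    w1_tags, w2_tags = CASES[case]
    coarse = coarse_tag(tag)
    if coarse in w1_tags:
        return "w1"
    if coarse in w2_tags:
        return "w2"
    # In cases with no explicit w2 tags, OTHERWISE sits in the w2 column.
    return "w3" if w2_tags else "w2"


for tag in ("VB", "NNS", "JJ", "WRB"):
    print(tag, weight_group(tag, "CASE 2"))
```

Under CASE 2, which gives the highest scores for most classifiers in both tables, only verbs and nouns receive their own weight groups and every other tag shares the remaining one.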
Example of weighting method using classical TF-IDF and modified TFPOS-IDF.
| Tagged terms | recall/VB | main/JJ | components/NNS | flowchart/NN |
|---|---|---|---|---|
| Stemmed terms | recal | main | compon | flowchart |
| TF-IDF | 0.774 | 0.581 | 0.194 | 0.161 |
| TFPOS-IDF | 0.878 | 0.440 | 0.146 | 0.122 |
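The effect of the part-of-speech weighting in the example above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' formula: it assumes a POS-weighted term frequency of the form count × weight divided by the sum of count × weight over the question, followed by L2 normalisation (both rows of weights in the example have unit length). The weights 3 for verbs and 2 for the other three terms are hypothetical values chosen for illustration. Because every term occurs once, the first row of weights (0.774, 0.581, 0.194, 0.161) is proportional to the IDF values and is used here as a stand-in for them.

```python
import math

# Hypothetical POS weights; the actual values are not given in this section.
POS_WEIGHTS = {"VB": 3.0, "JJ": 2.0, "NNS": 2.0, "NN": 2.0}


def l2_normalize(values):
    """Scale a list of numbers to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]


def tfpos_idf(tags, counts, idf):
    """POS-weighted TF multiplied by IDF, then L2-normalised."""
    weighted_counts = [c * POS_WEIGHTS[t] for t, c in zip(tags, counts)]
    total = sum(weighted_counts)
    raw = [(w / total) * i for w, i in zip(weighted_counts, idf)]
    return l2_normalize(raw)


tags = ["VB", "JJ", "NNS", "NN"]            # recall, main, components, flowchart
counts = [1, 1, 1, 1]                       # each term occurs once
relative_idf = [0.774, 0.581, 0.194, 0.161]  # first weight row of the example

weights = tfpos_idf(tags, counts, relative_idf)
print([round(w, 3) for w in weights])
```

With these assumed weights, the result agrees with the second row of the example (0.878, 0.440, 0.146, 0.122) to within 0.002: the verb gains weight and the remaining terms lose it proportionally.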
Fig 3. Example of converting question into a word vector.
Fig 4. Example of combining word2vec with TFPOS-IDF.
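One common way to combine per-term weights with word embeddings is a weighted sum of the word vectors. The sketch below illustrates that idea only; whether the authors sum, average or otherwise combine the vectors is not stated in this section. The three-dimensional embeddings are toy values (real pre-trained word2vec vectors are much longer), and `question_vector` is a hypothetical helper name. The term weights are the TFPOS-IDF values from the weighting example.

```python
def question_vector(term_weights, embeddings):
    """Sum the embedding of each term, scaled by that term's weight."""
    dim = len(next(iter(embeddings.values())))
    vector = [0.0] * dim
    for term, weight in term_weights.items():
        embedding = embeddings.get(term)
        if embedding is None:
            continue  # skip terms that have no embedding
        for k in range(dim):
            vector[k] += weight * embedding[k]
    return vector


# TFPOS-IDF weights taken from the weighting example.
term_weights = {"recal": 0.878, "main": 0.440, "compon": 0.146, "flowchart": 0.122}

# Toy embeddings, for illustration only.
toy_embeddings = {
    "recal": [1.0, 0.0, 0.0],
    "main": [0.0, 1.0, 0.0],
    "compon": [0.0, 0.0, 1.0],
    "flowchart": [0.0, 0.0, 1.0],
}

vec = question_vector(term_weights, toy_embeddings)
print([round(v, 3) for v in vec])
```

The resulting fixed-length vector is what a classifier such as KNN, LR or SVM would receive for the question.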
Results of using KNN with TF-IDF, TFPOS-IDF, W2V-TFPOSIDF for the collected dataset.
| Cognitive Level | TF-IDF Recall | TF-IDF Precision | TF-IDF F1-measure | TFPOS-IDF Recall | TFPOS-IDF Precision | TFPOS-IDF F1-measure | W2V-TFPOSIDF Recall | W2V-TFPOSIDF Precision | W2V-TFPOSIDF F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| Knowledge | 0.781 | 0.800 | 0.777 | 0.794 | 0.887 | 0.822 | 0.782 | 0.947 | 0.832 |
| Comprehension | 0.889 | 0.778 | 0.819 | 0.920 | 0.783 | 0.837 | 0.857 | 0.617 | 0.691 |
| Application | 0.134 | 0.316 | 0.173 | 0.134 | 0.341 | 0.183 | 0.307 | 0.580 | 0.375 |
| Analysis | 0.742 | 0.464 | 0.553 | 0.741 | 0.526 | 0.601 | 0.707 | 0.798 | 0.729 |
| Synthesis | 0.691 | 0.660 | 0.657 | 0.801 | 0.694 | 0.729 | 0.881 | 0.777 | 0.808 |
| Evaluation | 0.464 | 0.742 | 0.554 | 0.559 | 0.749 | 0.627 | 0.655 | 0.825 | 0.694 |
Results of using KNN with TF-IDF, TFPOS-IDF, W2V-TFPOSIDF for the Yahya et al. (2012) dataset.
| Cognitive Level | TF-IDF Recall | TF-IDF Precision | TF-IDF F1-measure | TFPOS-IDF Recall | TFPOS-IDF Precision | TFPOS-IDF F1-measure | W2V-TFPOSIDF Recall | W2V-TFPOSIDF Precision | W2V-TFPOSIDF F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| Knowledge | 0.946 | 0.735 | 0.824 | 0.987 | 0.798 | 0.880 | 0.933 | 0.866 | 0.895 |
| Comprehension | 0.796 | 0.814 | 0.800 | 0.812 | 0.865 | 0.834 | 0.880 | 0.919 | 0.896 |
| Application | 0.660 | 0.826 | 0.734 | 0.820 | 0.870 | 0.846 | 0.869 | 0.847 | 0.856 |
| Analysis | 0.875 | 0.687 | 0.767 | 0.903 | 0.837 | 0.865 | 0.920 | 0.815 | 0.864 |
| Synthesis | 0.686 | 0.795 | 0.730 | 0.725 | 0.788 | 0.750 | 0.788 | 0.812 | 0.794 |
| Evaluation | 0.634 | 0.862 | 0.723 | 0.755 | 0.917 | 0.820 | 0.729 | 0.947 | 0.817 |
Results of LR with TF-IDF, TFPOS-IDF, W2V-TFPOSIDF for the collected dataset.
| Cognitive Level | TF-IDF Recall | TF-IDF Precision | TF-IDF F1-measure | TFPOS-IDF Recall | TFPOS-IDF Precision | TFPOS-IDF F1-measure | W2V-TFPOSIDF Recall | W2V-TFPOSIDF Precision | W2V-TFPOSIDF F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| Knowledge | 0.833 | 0.898 | 0.851 | 0.833 | 0.924 | 0.864 | 0.885 | 0.960 | 0.910 |
| Comprehension | 0.873 | 0.798 | 0.814 | 0.937 | 0.897 | 0.907 | 0.857 | 0.886 | 0.860 |
| Application | 0.077 | 0.308 | 0.121 | 0.154 | 0.596 | 0.241 | 0.327 | 0.726 | 0.419 |
| Analysis | 0.759 | 0.592 | 0.645 | 0.810 | 0.646 | 0.695 | 0.948 | 0.844 | 0.882 |
| Synthesis | 0.910 | 0.701 | 0.782 | 0.970 | 0.747 | 0.835 | 0.930 | 0.825 | 0.861 |
| Evaluation | 0.655 | 0.823 | 0.709 | 0.750 | 0.917 | 0.811 | 0.893 | 0.883 | 0.879 |
Results of LR with TF-IDF, TFPOS-IDF, W2V-TFPOSIDF for the Yahya et al. (2012) dataset.
| Cognitive Level | TF-IDF Recall | TF-IDF Precision | TF-IDF F1-measure | TFPOS-IDF Recall | TFPOS-IDF Precision | TFPOS-IDF F1-measure | W2V-TFPOSIDF Recall | W2V-TFPOSIDF Precision | W2V-TFPOSIDF F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| Knowledge | 0.907 | 0.807 | 0.850 | 0.997 | 0.826 | 0.901 | 0.987 | 0.917 | 0.950 |
| Comprehension | 0.803 | 0.913 | 0.852 | 0.806 | 0.930 | 0.859 | 0.858 | 0.970 | 0.908 |
| Application | 0.728 | 0.905 | 0.804 | 0.797 | 0.898 | 0.843 | 0.858 | 0.876 | 0.865 |
| Analysis | 0.907 | 0.733 | 0.807 | 0.927 | 0.864 | 0.893 | 0.944 | 0.939 | 0.940 |
| Synthesis | 0.742 | 0.863 | 0.793 | 0.777 | 0.856 | 0.810 | 0.845 | 0.835 | 0.834 |
| Evaluation | 0.808 | 0.765 | 0.779 | 0.840 | 0.831 | 0.829 | 0.868 | 0.868 | 0.865 |
Results of SVM with TF-IDF, TFPOS-IDF, W2V-TFPOSIDF for the collected dataset.
| Cognitive Level | TF-IDF Recall | TF-IDF Precision | TF-IDF F1-measure | TFPOS-IDF Recall | TFPOS-IDF Precision | TFPOS-IDF F1-measure | W2V-TFPOSIDF Recall | W2V-TFPOSIDF Precision | W2V-TFPOSIDF F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| Knowledge | 0.834 | 0.990 | 0.895 | 0.847 | 1.000 | 0.908 | 0.885 | 0.978 | 0.920 |
| Comprehension | 0.952 | 0.827 | 0.876 | 0.936 | 0.873 | 0.897 | 0.904 | 0.890 | 0.892 |
| Application | 0.153 | 0.538 | 0.229 | 0.173 | 0.596 | 0.261 | 0.462 | 0.629 | 0.507 |
| Analysis | 0.862 | 0.535 | 0.645 | 0.896 | 0.540 | 0.663 | 0.966 | 0.865 | 0.905 |
| Synthesis | 0.920 | 0.780 | 0.837 | 0.970 | 0.810 | 0.877 | 0.881 | 0.850 | 0.855 |
| Evaluation | 0.608 | 0.817 | 0.679 | 0.701 | 0.939 | 0.779 | 0.870 | 0.859 | 0.853 |
Results of SVM with TF-IDF, TFPOS-IDF, W2V-TFPOSIDF for the Yahya et al. (2012) dataset.
| Cognitive Level | TF-IDF Recall | TF-IDF Precision | TF-IDF F1-measure | TFPOS-IDF Recall | TFPOS-IDF Precision | TFPOS-IDF F1-measure | W2V-TFPOSIDF Recall | W2V-TFPOSIDF Precision | W2V-TFPOSIDF F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| Knowledge | 0.980 | 0.841 | 0.902 | 0.987 | 0.893 | 0.935 | 0.993 | 0.929 | 0.961 |
| Comprehension | 0.799 | 0.925 | 0.854 | 0.816 | 0.932 | 0.868 | 0.871 | 0.945 | 0.905 |
| Application | 0.766 | 0.853 | 0.803 | 0.839 | 0.853 | 0.845 | 0.880 | 0.853 | 0.864 |
| Analysis | 0.914 | 0.788 | 0.845 | 0.938 | 0.910 | 0.921 | 0.961 | 0.929 | 0.943 |
| Synthesis | 0.714 | 0.868 | 0.778 | 0.791 | 0.831 | 0.805 | 0.841 | 0.850 | 0.841 |
| Evaluation | 0.797 | 0.756 | 0.771 | 0.825 | 0.823 | 0.819 | 0.829 | 0.915 | 0.866 |
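The weighted F1-measure quoted in the abstract can be related to the per-level tables above: it is the average of the per-level F1 values, weighted by the number of questions at each level. The sketch below applies that definition to the SVM W2V-TFPOSIDF column for the Yahya et al. (2012) dataset, where every level has 100 questions. The function name `weighted_f1` is a hypothetical helper, and the calculation is a check on the reported numbers rather than the authors' evaluation code.

```python
def weighted_f1(f1_by_level, support_by_level):
    """Average per-level F1 scores, weighted by questions per level."""
    total = sum(support_by_level.values())
    return sum(
        f1_by_level[level] * support_by_level[level] for level in f1_by_level
    ) / total


# SVM, W2V-TFPOSIDF F1-measure per level, Yahya et al. (2012) dataset.
svm_f1_yahya = {
    "Knowledge": 0.961,
    "Comprehension": 0.905,
    "Application": 0.864,
    "Analysis": 0.943,
    "Synthesis": 0.841,
    "Evaluation": 0.866,
}
# The dataset is balanced: 100 questions per level.
support_yahya = {level: 100 for level in svm_f1_yahya}

score = weighted_f1(svm_f1_yahya, support_yahya)
print(round(score, 3))  # 0.897
```

This matches the 89.7% reported for SVM on the second dataset. Applying the same calculation to the collected dataset, with its uneven level sizes, gives values close to but not identical to the reported ones (about 0.844 against 83.7% for SVM), so those averages were presumably computed differently, for example per evaluation fold; that explanation is an assumption.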
Alpha values of t-test.
| Classifier | Dataset | TF-IDF vs. TFPOS-IDF | TFPOS-IDF vs. W2V-TFPOSIDF |
|---|---|---|---|
| KNN | Collected Dataset | 0.002056 | 0.041744 |
| KNN | Yahya et al. (2012) Dataset | 9.95E-06 | 0.049986 |
| LR | Collected Dataset | 0.001163 | 0.006075 |
| LR | Yahya et al. (2012) Dataset | 1.03E-05 | 6.63E-06 |
| SVM | Collected Dataset | 0.001089 | 0.00065 |
| SVM | Yahya et al. (2012) Dataset | 6.34E-05 | 5.61E-05 |
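The t-test values above can be compared against a significance threshold to see which improvements hold at a given level. The sketch below assumes the tabulated values are p-values and uses the conventional thresholds 0.05 and 0.01, neither of which is stated in this section; the helper name `significant` is hypothetical.

```python
# (classifier, dataset) -> (TF-IDF vs. TFPOS-IDF, TFPOS-IDF vs. W2V-TFPOSIDF)
T_TEST_VALUES = {
    ("KNN", "Collected"): (0.002056, 0.041744),
    ("KNN", "Yahya et al. (2012)"): (9.95e-06, 0.049986),
    ("LR", "Collected"): (0.001163, 0.006075),
    ("LR", "Yahya et al. (2012)"): (1.03e-05, 6.63e-06),
    ("SVM", "Collected"): (0.001089, 0.00065),
    ("SVM", "Yahya et al. (2012)"): (6.34e-05, 5.61e-05),
}


def significant(value, alpha):
    """True if the value falls below the chosen significance threshold."""
    return value < alpha


for (classifier, dataset), values in T_TEST_VALUES.items():
    at_05 = [significant(v, 0.05) for v in values]
    at_01 = [significant(v, 0.01) for v in values]
    print(classifier, dataset, "alpha=0.05:", at_05, "alpha=0.01:", at_01)
```

Every comparison falls below 0.05, although the two KNN values for adding word2vec (0.041744 and 0.049986) are close to that threshold and would not pass at 0.01.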