| Literature DB >> 36091982 |
Farah Maheen, Yazeed Yasin Ghadi, Muhammad Asif, Haseeb Ahmad, Shahbaz Ahmad, Fahad Alturise, Othman Asiry.
Abstract
Students require continuous feedback for effective learning. Among the various assessment methods, multiple choice questions (MCQs) are extensively used to provide such feedback. However, manual MCQ generation is a tedious task that requires significant effort, time, and domain knowledge, so a system that can automatically generate MCQs from a given text is needed. Automatic MCQ generation follows three sequential steps: extracting informative sentences from the textual data, identifying the key, and determining distractors. A dataset comprising various topics from 9th- and 11th-grade computer science course books is used in this work. TF-IDF, Jaccard similarity, quality phrase mining, K-means, and bidirectional encoder representations from transformers (BERT) techniques are utilized for automatic MCQ generation. Domain experts validated the generated MCQs, reporting 83%, 77%, and 80% accuracy for informative sentence extraction, key generation, and distractor generation, respectively; overall MCQ generation achieved 80% accuracy. Finally, a desktop app was developed that takes content in textual form as input, processes it at the backend, and visualizes the generated MCQs on the interface. The presented solution may help teachers, students, and other stakeholders with automatic MCQ generation.
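The three-step pipeline named in the abstract can be sketched minimally as follows. This is an illustrative toy, not the paper's implementation: the stop-word list, sample sentences, and distractor dictionary are hypothetical, TF-IDF sentence ranking stands in for the full feature-based scorer, and a rarest-token heuristic stands in for the paper's BERT-based key selection.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the paper uses a standard NLP stop-word set.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "and", "to", "two"}

def tf_idf_scores(sentences):
    """Score each sentence by the mean TF-IDF of its non-stop tokens."""
    docs = [[t.lower().strip(".,") for t in s.split()] for s in sentences]
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        toks = [t for t in d if t not in STOP_WORDS]
        vals = [(tf[t] / len(d)) * math.log(n / df[t]) for t in toks] or [0.0]
        scores.append(sum(vals) / len(vals))
    return scores

def generate_mcq(sentences, distractor_dict):
    # Step 1: informative sentence extraction via TF-IDF ranking.
    scores = tf_idf_scores(sentences)
    sent = sentences[scores.index(max(scores))]
    # Step 2: key identification -- the rarest non-stop token stands in
    # for the paper's BERT-based selection.
    corpus_freq = Counter(t.lower().strip(".,") for s in sentences for t in s.split())
    candidates = [t.strip(".,") for t in sent.split()
                  if t.lower().strip(".,") not in STOP_WORDS]
    key = min(candidates, key=lambda t: corpus_freq[t.lower()])
    # Step 3: distractor determination from a precomputed dictionary
    # (hypothetical here; the paper derives distractors from the text).
    distractors = distractor_dict.get(key.lower(), [])[:3]
    return {"stem": sent.replace(key, "_____", 1),
            "key": key, "distractors": distractors}

sentences = [
    "A stack is a linear data structure.",
    "Push and pop are the two primary stack operations.",
    "Elements of a stack follow last-in first-out order.",
]
mcq = generate_mcq(sentences, {"stack": ["queue", "heap", "graph"]})
```

The blanked stem, the key, and up to three distractors together form one MCQ item in the structure shown in Figure 1.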
Keywords: BERT; Multiple choice questions; Natural language processing; TF-IDF; Text analysis
Year: 2022 PMID: 36091982 PMCID: PMC9454961 DOI: 10.7717/peerj-cs.1010
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1. Structure of MCQ.
Figure 2. Input and output of the system.
Figure 3. Fragment of WordNet concept hierarchy (https://www.nltk.org/book/ch02.html).
Figure 4. Basic flow diagram of MCQ generation modules.
Evaluation metrics by various researchers.
| System | Type of evaluation | Evaluation metrics | Accuracy |
|---|---|---|---|
| | Semi-automatic evaluation | Quality of cloze items | Corresponding to the input request, the system generated 66.2%, 69.4%, 60.0% and 61.5% correct sentences. |
| | Expert language teacher | Quality of questions | More than 80% |
| | Five English teachers | Sentence length, simplicity, or difficulty level | 66.53% |
| | Two biology students | Useful for learning and answerable, or not | Evaluator 1: sentence selection 91.66%, key selection 94.16%, distractor selection 60.05% and |
| | Five evaluators having domain knowledge | Difficulty, domain relevance, question information, over-informative or under-informative | Distractor average accuracy 88% and key accuracy 79.4% |
| | Three evaluators and evaluation guidelines | Informativeness and relevance | Average score of 3.18/4 |
| | 15 human evaluators | Sentence, gap, and distractors are good | Question sentences 94%, gaps 87% and distractors 60% |
| | Five human evaluators | Quality of questions | Informative sentences 93.21%, key selection 83.03% and distractor quality 91.07% |
| | Human tutors | Question acceptance | 70.66% |
| | Five English teachers | Quality of questions | 65% |
| | Experimental results and discussions | Efficiency of system | Informative sentences 72%, blank generation 77.6% and distractor generation accuracy 78.8% |
Figure 5. MCQ generation process.
Figure 6. System architecture.
Figure 7. Preprocessing.
Figure 8. Informative sentence extraction module.
Scoring features.
| Feature | Type | Description |
|---|---|---|
| Quality Phrases | Integer | Number of quality phrases in raw text |
| Average TF | Float | The average frequency of tokens in raw text |
| Average IDF | Float | Average of the IDF scores of tokens |
| # of NP | Float | Number of noun phrases in a sentence |
| # of VP | Float | Number of verb phrases in a sentence |
| # of Stop Words | Float | Number of stop words in a sentence |
| # of tokens | Integer | Number of tokens in a sentence |
| Chapter Title Similarity | Float | Jaccard similarity of a sentence to the title of chapter |
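The purely lexical features in the table above can be computed without external models. The sketch below is illustrative, not the paper's code: the quality-phrase list and stop-word set are hypothetical stand-ins (the paper obtains phrases via quality phrase mining), and the average-IDF and NP/VP features are omitted because they need a corpus and a syntactic parser, respectively.

```python
from collections import Counter

# Hypothetical stand-ins for the mined quality phrases and stop-word list.
STOP_WORDS = {"a", "an", "the", "is", "of", "in", "and", "to"}
QUALITY_PHRASES = {"operating system", "random access memory"}

def score_features(sentence, chapter_title):
    """Lexical scoring features for one sentence (avg IDF and NP/VP
    counts omitted: they require a corpus and a parser)."""
    tokens = [t.lower().strip(".,") for t in sentence.split()]
    title_tokens = {t.lower() for t in chapter_title.split()}
    tf = Counter(tokens)
    return {
        "quality_phrases": sum(p in sentence.lower() for p in QUALITY_PHRASES),
        "avg_tf": len(tokens) / len(tf),  # mean in-sentence token frequency
        "n_stop_words": sum(t in STOP_WORDS for t in tokens),
        "n_tokens": len(tokens),
        # Jaccard similarity between sentence tokens and the chapter title.
        "title_similarity": len(set(tokens) & title_tokens)
                            / len(set(tokens) | title_tokens),
    }

features = score_features(
    "An operating system manages the memory of a computer.",
    "Operating System Basics",
)
```

Sentences are then ranked by a combination of these feature scores, so a sentence containing mined quality phrases and overlapping the chapter title outranks filler text.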
Figure 9. Quality phrase mining.
Figure 10. Distractor dictionary.
Figure 11. System architecture.
System requirements.
| Sr# | Description | Detail |
|---|---|---|
| 1 | Server Platform | Ubuntu |
| 2 | Server RAM | 8 GB |
| 3 | Server Storage | 10 GB |
| 4 | Server CPUs | 2 vCPU |
| 5 | Terminal Platform | Ubuntu/Windows |
| 6 | Terminal RAM | 4 GB |
| 7 | Terminal Storage | 2 GB |
Figure 12. Desktop app input fields.
Figure 13. Desktop app full view.
System evaluation results by domain experts.
| | Informativeness | Key generation | Distractor generation |
|---|---|---|---|
| Evaluator 1 | 8.5 | 7.5 | 6.5 |
| Evaluator 2 | 7.5 | 4 | 9.5 |
| Evaluator 3 | 8.5 | 9.5 | 9 |
| Evaluator 4 | 9.5 | 8 | 7.5 |
| Evaluator 5 | 8.5 | 7.5 | 8.5 |
| Evaluator 6 | 9 | 9.5 | 9 |
| Evaluator 7 | 8.5 | 6 | 7.5 |
| Evaluator 8 | 8.5 | 9 | 9.5 |
| Evaluator 9 | 7.5 | 6.5 | 5.5 |
| Evaluator 10 | 7 | 9.5 | 7.5 |
| Percentage | 83 | 77 | 80 |
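The bottom "Percentage" row is consistent with each evaluator scoring the three modules out of 10, with the percentage being the column mean scaled to 100; this is an inference from the numbers rather than something the record states explicitly. The aggregation can be reproduced directly:

```python
# Per-evaluator scores from the table above (assumed to be out of 10).
scores = {
    "informativeness": [8.5, 7.5, 8.5, 9.5, 8.5, 9, 8.5, 8.5, 7.5, 7],
    "key_generation": [7.5, 4, 9.5, 8, 7.5, 9.5, 6, 9, 6.5, 9.5],
    "distractor_generation": [6.5, 9.5, 9, 7.5, 8.5, 9, 7.5, 9.5, 5.5, 7.5],
}

# Column mean scaled to a percentage (mean * 10 for scores out of 10).
percentages = {name: sum(v) * 10 / len(v) for name, v in scores.items()}
```

This yields 83, 77, and 80, matching the percentages reported in the abstract, and their mean gives the 80% overall accuracy.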