Chenghua Lin, Dong Liu, Wei Pang, Zhe Wang.
Abstract
In this paper, we present a semi-automatic system (Sherlock) for quiz generation using linked data and textual descriptions of RDF resources. Sherlock is distinguished from existing quiz generation systems by its generic framework for domain-independent quiz generation and by its ability to control the difficulty level of the generated quizzes. Difficulty scaling is non-trivial and is fundamentally related to cognitive science. We approach the problem from a new angle, framing the level of knowledge difficulty as a similarity measure problem, and propose a novel hybrid semantic similarity measure using linked data. Extensive experiments show that the proposed semantic similarity measure outperforms four strong baselines, with more than 47% gain in clustering accuracy. In addition, the human quiz test shows that model accuracy indeed correlates strongly with pairwise quiz similarity.
Keywords: Educational games; Linked data; Quiz generation; RDF; Semantic similarity; Text analytics
Year: 2015 PMID: 26693256 PMCID: PMC4675796 DOI: 10.1007/s12559-015-9347-7
Source DB: PubMed Journal: Cognit Comput ISSN: 1866-9956 Impact factor: 5.418
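The abstract describes a hybrid semantic similarity measure that combines textual descriptions of RDF resources with their linked-data structure. The paper's actual formulation is not reproduced in this record; the sketch below is only an illustrative stand-in for the general idea (the Jaccard overlap over RDF property pairs, the `alpha` weight, and all function names are assumptions, not the paper's method):

```python
from math import sqrt

def tfidf_cosine(vec_a, vec_b):
    """Cosine similarity between two sparse term-weight dicts
    (term -> TF-IDF weight) built from the textual descriptions."""
    dot = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
    norm_a = sqrt(sum(w * w for w in vec_a.values()))
    norm_b = sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def linked_data_overlap(props_a, props_b):
    """Jaccard overlap of the (predicate, object) pairs attached to
    two RDF resources -- a simple structural similarity."""
    union = len(props_a | props_b)
    return len(props_a & props_b) / union if union else 0.0

def hybrid_similarity(text_a, text_b, props_a, props_b, alpha=0.5):
    """Linear combination of textual and linked-data similarity."""
    return (alpha * tfidf_cosine(text_a, text_b)
            + (1 - alpha) * linked_data_overlap(props_a, props_b))
```

Under this kind of measure, two resources score high when both their descriptions and their RDF neighbourhoods overlap, which is what lets similarity stand in for quiz difficulty: distractors similar to the answer make a harder quiz.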
Fig. 1 Overall architecture of Sherlock
Fig. 2 User interface for playing quizzes. a User interface when an incorrect choice is made. b User interface when a correct choice is made
Statistics of the Wildlife textual dataset
| Dataset | # of docs | Avg. doc length† | Avg. doc length* | Vocab. size† | Vocab. size* |
|---|---|---|---|---|---|
| Wildlife | 437 | 1190 | 652 | 26,004 | 18,237 |
† denotes before preprocessing and * denotes after preprocessing
Fig. 3 Deriving the gold standard for the BBC Wildlife dataset using the biological classification system
Clustering accuracy of different similarity measures for measuring quiz difficulty levels
| Dataset | LDSD (RDF) | WUP (RDF) | KLD (Text) | TF-IDF (Text) | TF-IDF (LD) (RDF) |
|---|---|---|---|---|---|
| Difficult | 18.4 | 2.4 | 37.5 | 29.2 | |
| Medium | 7.9 | 9.3 | 11.4 | 11.6 | |
| Easy | 82 | 74.5 | 50.9 | 44.8 | |
| Overall | 36.1 | 28.7 | 33.3 | 28.5 | |

Units are in %; numbers in boldface denote the best result in their respective row
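The evaluation above reports clustering accuracy per difficulty level. The paper's exact accuracy definition is not given in this record; a common purity-style computation, shown here as a hedged sketch, assigns each cluster its majority gold label and scores the fraction of items that match:

```python
from collections import Counter

def clustering_accuracy(clusters, gold):
    """Purity-style clustering accuracy.

    clusters: dict mapping item -> cluster id
    gold:     dict mapping item -> gold-standard label
    Each cluster votes for its majority gold label; accuracy is the
    fraction of all items that carry their cluster's majority label.
    """
    by_cluster = {}
    for item, c in clusters.items():
        by_cluster.setdefault(c, []).append(gold[item])
    correct = sum(Counter(labels).most_common(1)[0][1]
                  for labels in by_cluster.values())
    total = sum(len(labels) for labels in by_cluster.values())
    return correct / total
```

For example, if a cluster groups cheetah and leopard (both labelled "cat") while a second cluster mixes two "dog" items with one "cat" item, the accuracy is 4/5 = 0.8.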
Top 10 most similar animals to Cheetah found by different algorithms (inappropriate ones are highlighted in bold)
| WUP | KLD | TF-IDF | LDSD | TF-IDF (LD) |
|---|---|---|---|---|
| Jaguar | Leopard | Leopard | Lion | Serval |
| Lion | Lion | | | |
| Snow Leopard | | | | |
| Serval | Cougar | Lion | Leopard | Lion |
| Cougar | Tiger | Leopard Cat | Tiger | Leopard |
| Jaguar | Cougar | Serval | Cougar | |
| Asian Golden Cat | Cougar | Wildcat | | |
| Leopard Cat | | | | |
| Jaguar | | | | |
| Snow Leopard | | | | |
| Tiger | | | | |
| Snow Leopard | | | | |
| Eurasian Lynx | | | | |
Fig. 4 Averaged quiz similarity based on different similarity measures on the Wildlife domain dataset. a LDSD. b TF-IDF. c TF-IDF (LD)
Fig. 5 Pearson's correlation between the model accuracy and the pairwise similarity of quizzes. a LDSD. b TF-IDF. c TF-IDF (LD)
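Fig. 5 plots Pearson's correlation between model accuracy and pairwise quiz similarity (the coefficient values themselves were lost in extraction). For reference, the sample Pearson coefficient can be computed directly; this is a generic stdlib sketch, not code from the paper:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length
    sequences, e.g. per-quiz model accuracy vs. pairwise similarity."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0
```

A value near +1 would indicate that quizzes built from more similar resource pairs are answered more (or less) accurately in a strongly linear way, which is the relationship the figure examines.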
Fig. 6 Ontologies for a food recipes and b paintings and artists. Note that widely used predicates such as rdfs:label and rdfs:comment are omitted to keep the figures concise
Statistics of the RDF datasets from three different domains
| Dataset | RDF triples | Number of resources | RDF triples per resource | Distinct objects shared by at least two resources | Distinct subjects shared by at least two resources |
|---|---|---|---|---|---|
| Wildlife | 49,897 | 886 | 16.1 | 323 | 74 |
| Food | 55,006 | 5412 | 9.3 | 2419 | 0 |
| YourPaintings | 25,314 | 41 | 197.9 | 252 | 39 |
Fig. 7 User interface for creating a quiz