Wei Song, Wei Ye, Philippe Fournier-Viger.
Abstract
Online learning is playing an increasingly important role in education. Massive open online course (MOOC) platforms are among the most important tools in online learning, and they record historical learning data from an extremely large number of learners. A promising way to enhance the learning experience is to apply sequential pattern mining (SPM) to discover useful knowledge in these data. Following this approach, this paper proposes mining sequential patterns (SPs) with flexible constraints in MOOC enrollment data. Three constraints are proposed: the length constraint, the discreteness constraint, and the validity constraint. They describe the effects of the length of enrollment sequences, the variance of enrollment dates, and the enrollment moments, respectively. To improve mining efficiency, the three constraints are pushed into the support, the most typical parameter in SPM, to form a new parameter called support with flexible constraints (SFC). SFC is proved to satisfy the downward closure property, and two algorithms are proposed to discover SPs with flexible constraints; they traverse the search space in a breadth-first and a depth-first manner, respectively. The experimental results demonstrate that the proposed algorithms effectively reduce the number of patterns, with performance comparable to that of classical SPM algorithms.
Keywords: Downward closure property; MOOC; Sequential pattern; Support with flexible constraints
Year: 2022 PMID: 35340983 PMCID: PMC8940599 DOI: 10.1007/s10489-021-03122-7
Source DB: PubMed Journal: Appl Intell (Dordr) ISSN: 0924-669X Impact factor: 5.086
Characteristics of the Course Recommendation dataset
| Feature | Number |
|---|---|
| Time span | 547 days |
| Number of courses | 1,302 |
| Number of sequences | 82,535 |
| Length of the longest sequence | 398 |
| Length of the shortest sequence | 3 |
| Average sequence length | 5.19 |
Example sequence database
| sid | Input sequence |
|---|---|
| IS1 | Data structure, Introduction to logic, Operating system, Linear algebra, Introduction to big data |
| IS2 | Linear algebra, Data structure, Operating system, Data mining |
| IS3 | Database, Principles of economics, Data mining |
| IS4 | Database, Data structure, Operating system |
| IS5 | Introduction to big data, Database, Data mining |
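As a minimal sketch of how classical support is counted on the example sequence database, the Python snippet below treats each input sequence as an ordered list of single courses and counts the sequences that contain a pattern as a subsequence. The identifiers IS1 to IS5 (rows numbered in table order) and the function names `is_subsequence` and `support` are assumptions made for illustration; this is plain support, not the SFC measure defined in the paper.

```python
# Example sequence database; sids IS1..IS5 follow table order (an assumption).
DATABASE = {
    "IS1": ["Data structure", "Introduction to logic", "Operating system",
            "Linear algebra", "Introduction to big data"],
    "IS2": ["Linear algebra", "Data structure", "Operating system", "Data mining"],
    "IS3": ["Database", "Principles of economics", "Data mining"],
    "IS4": ["Database", "Data structure", "Operating system"],
    "IS5": ["Introduction to big data", "Database", "Data mining"],
}


def is_subsequence(pattern, sequence):
    """Return True if the items of pattern occur in sequence in the same order."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def support(database, pattern):
    """Count the input sequences that contain pattern (classical support)."""
    return sum(is_subsequence(pattern, seq) for seq in database.values())


short = ["Data structure", "Operating system"]
longer = short + ["Data mining"]

print(support(DATABASE, short))   # 3 (IS1, IS2, IS4)
print(support(DATABASE, longer))  # 1 (IS2)
```

The second pattern extends the first and cannot have a larger support. This is the downward closure property for classical support, which lets a miner prune every extension of an infrequent pattern.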
Fig. 1 Distribution of sequence lengths in the Course Recommendation dataset
Two sequences with enrollment dates
| sid | Input sequence |
|---|---|
| ( | |
| (Introduction to big data, 2017/2/14), ( |
Two input sequences with specific enrollment times
| sid | Input sequence |
|---|---|
| ( | |
| (Introduction to big data, 2017/2/14 8:21:00), ( |
S-projected database in the example sequence database
| sid | Input sequence |
|---|---|
| IS1 | Linear algebra, Introduction to big data |
| IS2 | Data mining |
Although S = <Data structure, Operating system> is contained in IS1, IS2, and IS4, the S-projected database comprises only two suffixes because IS4 / S = ∅.
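The projection described in this note can be sketched in Python as follows. The sketch assumes that each input sequence is an ordered list of single courses, that the suffix is taken after the leftmost match of S, and that the rows of the example sequence database are numbered IS1 to IS5 in table order; the function name `project` is hypothetical and not taken from the paper.

```python
# Example sequence database; sids IS1..IS5 follow table order (an assumption).
DATABASE = {
    "IS1": ["Data structure", "Introduction to logic", "Operating system",
            "Linear algebra", "Introduction to big data"],
    "IS2": ["Linear algebra", "Data structure", "Operating system", "Data mining"],
    "IS3": ["Database", "Principles of economics", "Data mining"],
    "IS4": ["Database", "Data structure", "Operating system"],
    "IS5": ["Introduction to big data", "Database", "Data mining"],
}


def project(database, pattern):
    """Return the non-empty suffixes that follow the leftmost match of pattern."""
    projected = {}
    for sid, seq in database.items():
        pos = 0
        matched = True
        for item in pattern:
            try:
                # Search only after the previously matched item.
                pos = seq.index(item, pos) + 1
            except ValueError:
                matched = False
                break
        # Sequences without the pattern, or with an empty suffix, are dropped.
        if matched and pos < len(seq):
            projected[sid] = seq[pos:]
    return projected


S = ["Data structure", "Operating system"]
for sid, suffix in project(DATABASE, S).items():
    print(sid, suffix)
```

IS4 contains S but ends with its last item, so its suffix is empty and it does not appear in the result, which matches the two rows of the table.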
Fig. 2 Comparison of execution times
Fig. 3 Comparison of memory usage
Fig. 4 Number of discovered patterns
Performance comparison for the level-wise algorithms
| Algorithm | Runtime (s) | Memory usage (MB) | Number of SPs |
|---|---|---|---|
| SPM-LC-L | 378.39 | 1212.01 | 2,347 |
| SPM-DC-L | 601.68 | 1474.88 | 1,911 |
| SPM-VC-L | 561.53 | 1318.12 | 2,143 |
| SPM-FC-L | 589.31 | 1310.34 | 2,081 |
Performance comparison for the projection-based algorithms
| Algorithm | Runtime (s) | Memory usage (MB) | Number of SPs |
|---|---|---|---|
| SPM-LC-P | 21.45 | 559.19 | 10,472 |
| SPM-DC-P | 31.53 | 694.98 | 11,847 |
| SPM-VC-P | 29.99 | 628.09 | 11,678 |
| SPM-FC-P | 30.71 | 599.87 | 11,325 |
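To read the two performance tables side by side, the short Python sketch below computes the ratio of level-wise to projection-based runtime for each constraint variant. The runtimes are copied from the tables; pairing the rows by their LC, DC, VC, and FC labels and the helper name `runtime_ratios` are assumptions made for illustration.

```python
# Runtimes in seconds, copied from the two performance tables.
level_wise = {"LC": 378.39, "DC": 601.68, "VC": 561.53, "FC": 589.31}
projection = {"LC": 21.45, "DC": 31.53, "VC": 29.99, "FC": 30.71}


def runtime_ratios(slow, fast):
    """Return slow[k] / fast[k] for every variant k present in slow."""
    return {variant: slow[variant] / fast[variant] for variant in slow}


ratios = runtime_ratios(level_wise, projection)
for variant, ratio in ratios.items():
    print(f"SPM-{variant}: level-wise / projection-based runtime = {ratio:.1f}x")
```

On these figures every ratio falls between 17 and 20, so the projection-based variants run more than an order of magnitude faster than their level-wise counterparts in the reported runs.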
Two input sequences containing S1
| sid | Input sequence |
|---|---|
| (Ideological and moral cultivation, 2016/10/18 3:47), (Introduction to Zizhi Tongjian, 2016/12/6 8:03), (Hybrid learning, 2016/12/6 1:42), (News photography, 2016/12/6 6:35), (The practice of MOOC teaching, 2016/12/6 12:19), ( | |
| (The practice of MOOC teaching, 2017/2/20 11:56), ( |
Two input sequences containing S2
| sid | Input sequence |
|---|---|
| ( | |
| ( |