
Study on MOOC scoring algorithm based on Chinese University MOOC learning behavior data.

Yong Luo¹,², Guochang Zhou¹, Jianping Li¹, Xiao Xiao³.

Abstract

Existing online learning evaluation methods consider only test and assignment scores, so they do not accurately reflect learning effects. This paper proposes a comprehensive evaluation algorithm based on big data of learning behavior. The algorithm takes into account a conversion ratio, which is defined using information entropy theory. It comprehensively considers the learner's multiple learning behaviors, such as viewing videos, doing exercises, taking exams, and participating in discussions. The new evaluation algorithm can help learners understand their learning state and maintain their interest.


Keywords: Applied mathematics; Computational mathematics; Education

Year: 2018 PMID: 30761366 PMCID: PMC6286268 DOI: 10.1016/j.heliyon.2018.e00960

Source DB: PubMed Journal: Heliyon ISSN: 2405-8440

Introduction

MOOCs (Massive Open Online Courses) [1] have developed rapidly. Coursera, Udacity and edX [2] are currently the three major MOOC providers in the world, and the Chinese University MOOC [3] is an important online open course operator in China. Massive open online courses [4] enable learners to access educational resources, share learning experiences, and gain certification. Although MOOCs are developing rapidly, they also face problems such as high dropout rates, low resource utilization and the lack of an effective profit model [5]. The lack of scientific evaluation methods is a further obstacle to their development. The current evaluation method is a weighted combination of the learner's exercise scores and exam score. However, most learners do not take online open courses to obtain certificates. They may have watched some videos, completed some exercises, or participated in course discussions, and all of these learning behaviors can help them master knowledge. How to evaluate these learning behaviors is therefore a valuable research question.

Many scholars have studied learning behavior and learning evaluation. Tanmay Sinha [6] operationalized video lecture clickstreams of students into cognitively plausible higher-level behaviors. Their results illustrate how such a metric inspired by cognitive psychology can help answer critical questions regarding students' engagement, their future click interactions and the participation trajectories that lead to in-video and course dropouts. By mining and analyzing the learning behavior data of over 80000 course participants, Jiang [7] sought to reveal multiple aspects of learning activity in MOOCs. According to the characteristics of learning behavior in Chinese MOOCs, learners were classified into several groups, and the relationship between their learning behavior and performance was then studied thoroughly.
Geza Kovacs [8] analyzed how users interact with in-video quizzes, and how in-video quizzes influence users' lecture viewing behavior. Through data analysis, they found the peak period for students to think about issues. Kloft [9] presented an approach that works on click-stream data. Among other features, the machine learning algorithm takes the weekly history of student data into account and is thus able to notice changes in student behavior over time. Wang [10] adopted a content analysis approach to analyze students' cognitively relevant behaviors in a MOOC discussion forum and further explored the relationship between the quantity and quality of that participation and their learning gains. They built a computational model to automate the analysis so that the content analysis can be extended to all communication that occurred in the MOOC. Qiu [11] analyzed key factors that influence students' engagement in MOOCs. They observed significant behavioral heterogeneity in students' course selection as well as in their learning patterns. Ren et al. [12] developed a personalized linear multiple regression (PLMR) model to predict a student's grade before the student attempts the assessment activity. The model works in real time: it tracks the participation of a student within a MOOC and predicts the student's performance on the next assessment within the course offering. These studies analyze the impact of learning behavior on learning outcomes, including watching videos [6, 9, 11], participating in discussions [7, 10], and doing exercises [8, 12]. However, no scoring algorithm for video viewing behavior has been provided so far, and a comprehensive evaluation algorithm that integrates all learning behaviors still needs to be studied. How to score video viewing behavior and how to score the learning process are the focus of this paper.

Materials & methods

Viewing video behavior data

Online education has become more and more popular, and research on MOOC learning behavior big data [13, 14] has become more and more important. MOOC operators record learning behavior data for research that can help them improve their courses. Such research also helps learners understand the patterns of online learning. Our data are provided by the Chinese University MOOC. Table 1 lists the basic information of the 4 courses. These four courses are all Chinese National Great Open Online Courses (2017). The data are the records of the full 2016 semester. The objects of analysis are active learners, that is, learners with at least one learning record. The sites for the four courses are www.icourse163.org/course/NUDT-17004, www.icourse163.org/course/NUDT-9004, www.icourse163.org/university/ZJU, www.icourse163.org/course/ZJU-199001.

Table 1

Course information.

Name	Videos	Active learners	Exercises
Advanced mathematics	129	27664	19
Game theory	38	14749	8
C programming	81	24684	2
First aid knowledge	19	2295	8

The data include the duration of watching videos, exercise scores and test scores. The duration of watching videos is the total time each learner spent watching each video. The sampling period is 10 seconds for Advanced Mathematics and 20 seconds for the other courses. Thus, the viewing duration is an integer multiple of 10 or 20 seconds. First of all, the problem of scoring video viewing behavior needs to be resolved, so we analyzed the video viewing data to understand its characteristics. According to the test scores, the learners are divided into passers, losers and abandoners. Passers scored above the pass line (60%); losers took the exam but failed; abandoners studied but did not take the test. The differences in video viewing behavior among the three kinds of learners are studied. Fig. 1 shows the percentage of passers and losers in the four courses. The pass rates for active learners are between 0.9% and 9%, which are close to those of other MOOC platforms such as Coursera and edX.
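The grouping of learners described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, exam scores are assumed to be percentages from 0 to 100, a missing exam is represented by `None`, and a score exactly on the pass line is assumed to count as a pass.

```python
from collections import Counter
from typing import Optional, Sequence


def classify_learner(exam_score: Optional[float], pass_line: float = 60.0) -> str:
    """Return 'passer', 'loser' or 'abandoner' for one active learner.

    exam_score is None when the learner studied but never took the exam.
    Treating a score equal to pass_line as a pass is an assumption.
    """
    if exam_score is None:
        return "abandoner"
    return "passer" if exam_score >= pass_line else "loser"


def group_shares(exam_scores: Sequence[Optional[float]]) -> dict:
    """Share of each learner group among all active learners."""
    counts = Counter(classify_learner(s) for s in exam_scores)
    total = len(exam_scores)
    return {g: counts[g] / total for g in ("passer", "loser", "abandoner")}


# Illustrative (made-up) records: two abandoners, one passer, one loser.
shares = group_shares([None, None, 75.0, 40.0])
```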

Fig. 1

The proportion of passers and losers for 4 courses.

For MOOCs, the course videos are the most important learning resource [5]. Because video is the main carrier of course knowledge, the general idea is that knowledge cannot be acquired without watching the course videos [15]. Therefore, it is reasonable to expect significant differences between the three types of learners in video watching. However, the results surprised us. According to how long the learner watched each video, we studied complete viewing behavior. Considering that the learner can skip the non-instructional opening and ending of a video, a complete viewing is one in which the viewing time exceeds 80% of the video length. This ratio is an empirical value, and we found no significant difference when it is between 0.8 and 0.9. On the other hand, if the time spent watching a video is too short, knowledge cannot be mastered, so a duration of less than 20 seconds [15] is not valid. Thus, a partial viewing refers to a duration of more than 20 seconds but less than 80% of the video length. The four courses listed in Table 1 are chosen to analyze the differences in video viewing behaviors. The abscissa of Fig. 2 is the number of completely viewed videos. When a course has $n$ videos, its value range is $[0, n]$. Fig. 2 shows the probability distribution of the number of completely viewed videos, and the vertical axis shows the probability of each value. The blue curve represents the passers, the purple curve the losers, and the red curve the abandoners. The difference between the three curves was not significant for any of the four courses.
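The viewing categories defined above (invalid, partial, complete) can be expressed as a short rule. The sketch below uses the thresholds stated in the text (20 seconds and 80% of the video length); the function names and the handling of values that fall exactly on a threshold are assumptions made for illustration.

```python
def classify_view(duration_s: float, video_length_s: float,
                  complete_ratio: float = 0.8, min_valid_s: float = 20.0) -> str:
    """Label one learner-video record as 'invalid', 'partial' or 'complete'."""
    if duration_s < min_valid_s:
        # Too short to learn anything from the video.
        return "invalid"
    if duration_s > complete_ratio * video_length_s:
        # Viewing time exceeds the empirical 80% threshold.
        return "complete"
    return "partial"


def count_complete(durations_s, video_lengths_s) -> int:
    """Number of completely viewed videos for one learner (abscissa of Fig. 2)."""
    return sum(
        classify_view(d, length) == "complete"
        for d, length in zip(durations_s, video_lengths_s)
    )


# Four videos of 300 s each, watched for 10, 100, 250 and 300 seconds.
n_complete = count_complete([10, 100, 250, 300], [300, 300, 300, 300])
```

Counting `n_complete` for every learner in a group and normalizing the histogram gives a distribution of the kind plotted in Fig. 2.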

Fig. 2

The probability distribution of the complete viewing videos number for 4 courses.

So, it can be inferred that most of the learners who passed the test did not actually watch the videos. Not watching the course videos means that these learners skipped the process of learning the knowledge. The most likely explanation is that they had already mastered the knowledge of the course and just wanted to take the test. For MOOCs, the analysis of video viewing behavior tells us that the current course score does not reflect the learning effect. Many learners who receive a course certificate did not learn through the MOOC, while many more learners have gained knowledge but have not been evaluated.

Mastery learning theory

The core teaching principle of MOOCs is mastery learning [16]. Its central point is that the learning ability of a student does not directly determine learning effectiveness; it only determines the amount of time the student needs to master the content. Students just need to invest the time required to learn the knowledge. With the help of the teacher, 90% of the students can master the knowledge imparted. Based on this theory, the total time of video viewing can be used as a measure of the knowledge acquired by learners. Assume that a course has $n$ videos. For a learner, the total time of viewing the $i$-th video is $t_i$, so the record of viewing videos can be represented as a vector $T = (t_1, t_2, \dots, t_n)$. The length of each video can also be represented as a vector $L = (l_1, l_2, \dots, l_n)$. Thus, the completion ratio $r_i = t_i / l_i$ can be obtained. The mastered knowledge can be measured by learning time, so the completion ratio can be used to measure the ratio of acquired knowledge to the amount of knowledge contained in the video. The average $\bar{r}$ ($\bar{r} = \frac{1}{n}\sum_{i=1}^{n} r_i$) represents the average proportion of knowledge acquired by the learner throughout the learning process. In this paper, $\bar{r}$ is used as one of the parameters of the viewing video score.
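As a small worked example of the average completion ratio, the sketch below divides each total viewing time by the corresponding video length and averages the results. The function names are hypothetical, and the ratios are deliberately left uncapped, so repeated viewing can push a ratio above 1; whether the original algorithm caps them is not stated in the text.

```python
from typing import List, Sequence


def completion_ratios(view_times: Sequence[float],
                      video_lengths: Sequence[float]) -> List[float]:
    """Per-video completion ratio: total viewing time / video length."""
    if len(view_times) != len(video_lengths):
        raise ValueError("one viewing time is needed per video")
    return [t / length for t, length in zip(view_times, video_lengths)]


def average_completion(view_times: Sequence[float],
                       video_lengths: Sequence[float]) -> float:
    """Mean completion ratio over all videos of the course."""
    ratios = completion_ratios(view_times, video_lengths)
    return sum(ratios) / len(ratios)


# Three videos of 120 s, 200 s and 300 s, watched for 60 s, 0 s and 300 s:
# the ratios are 0.5, 0.0 and 1.0, so the average is 0.5.
avg = average_completion([60, 0, 300], [120, 200, 300])
```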

Conversion rate

Using the average completion ratio $\bar{r}$ directly as the score for viewing video behavior is not sufficient, because there is a problem of conversion efficiency. When the parameter $\bar{r}$ of two learners is equal, it does not mean that the obtained knowledge is also equal. We convert the viewing times $t_1, t_2, \dots, t_n$ into ratios $p_i$, where $p_i = t_i / \sum_{j=1}^{n} t_j$. Let $P = \{p_1, p_2, \dots, p_n\}$. Assume that there are two learners (A and B) whose parameter $\bar{r}$ is the same. The distributions of $P_A$ and $P_B$ are shown in Fig. 3. For learner A, the proportions $p_1$, $p_2$ and $p_3$ are very high. When $i > 3$, $p_i$ decreases rapidly. The distribution of $P_B$ is more uniform. Learner A spent most of the viewing time on the first few videos. The knowledge that can be acquired by repeatedly watching a few videos is limited. Learner B's viewing time is evenly distributed over the videos, indicating that he/she has maintained interest in learning and has learned the knowledge systematically. So, learner B may gain more knowledge than learner A.

Fig. 3

A schematic diagram of the distribution of $P$ for two learners.

The closeness of $P$ to the uniform distribution can be used as the conversion efficiency of knowledge. The closer $P$ is to the uniform distribution, the higher the knowledge conversion rate is. Shannon information entropy theory [17] is introduced to solve the problem of the knowledge conversion rate. According to the information entropy formula given by Shannon, for any random variable with probability distribution $P = (p_1, p_2, \dots, p_n)$, its information entropy is defined as follows, in units of bits.

$$H(P) = -\sum_{i=1}^{n} p_i \log_2 p_i$$

Information entropy describes the uncertainty of random variables. Consider the $p_i$ as independent variables. Finding the maximum value of $H(P)$ is equivalent to solving a conditional extremum problem under the constraint $\sum_{i=1}^{n} p_i = 1$. To solve it, construct the Lagrange function [18] as follows.

$$F(p_1, \dots, p_n, \lambda) = -\sum_{i=1}^{n} p_i \log_2 p_i + \lambda \left( \sum_{i=1}^{n} p_i - 1 \right)$$

From the necessary conditions for extreme values, the following equations are obtained.

$$\frac{\partial F}{\partial p_i} = -\log_2 p_i - \frac{1}{\ln 2} + \lambda = 0 \quad (i = 1, \dots, n), \qquad \sum_{i=1}^{n} p_i = 1$$

When $p_1 = p_2 = \dots = p_n = 1/n$, $H(P)$ takes the maximum value, which is marked as $H_{\max} = \log_2 n$. That is, when $P$ is uniformly distributed, $H(P)$ takes the maximum. On the other hand, when $p_k = 1$ for some $k$ and $p_i = 0$ for all $i \neq k$, $H(P)$ takes the minimum value 0 (with the convention $0 \log_2 0 = 0$). If a learner spends almost all of the time on one video, it suggests that the learner has little interest in the rest of the course. Learning without interest is not effective. When a learner spends time watching every video, it means that the videos are attractive. At the same time, the knowledge learned is systematic and the learning effect is good. The ratio $H(P)/H_{\max}$ describes the extent to which the distribution of $P$ approaches a uniform distribution. Therefore, we define $\eta = H(P)/H_{\max}$ to express the conversion rate. The score of the learner for watching videos is expressed as $S_v = \eta \, \bar{r}$, where $\bar{r}$ is the learner's average completion ratio. Because knowledge itself is unmeasurable, $S_v$ is only a reflection of its quantity. It can be used as a score for viewing video behavior.
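The entropy-based conversion rate can be illustrated with a short sketch. Several details here are assumptions rather than facts taken from the paper: the shares are computed from each learner's viewing times, the function names are hypothetical, and the video score is formed as the product of the average completion ratio and the conversion rate.

```python
import math
from typing import List, Sequence


def time_shares(view_times: Sequence[float]) -> List[float]:
    """Share of the learner's total viewing time spent on each video."""
    total = sum(view_times)
    if total <= 0:
        raise ValueError("the learner has no viewing time")
    return [t / total for t in view_times]


def entropy_bits(shares: Sequence[float]) -> float:
    """Shannon entropy in bits; zero shares contribute nothing."""
    return sum(p * math.log2(1 / p) for p in shares if p > 0)


def conversion_rate(view_times: Sequence[float]) -> float:
    """Entropy of the time shares divided by its maximum, log2(n)."""
    n = len(view_times)
    if n < 2:
        raise ValueError("at least two videos are needed")
    return entropy_bits(time_shares(view_times)) / math.log2(n)


def video_score(view_times: Sequence[float],
                video_lengths: Sequence[float]) -> float:
    """Assumed form: average completion ratio times conversion rate."""
    ratios = [t / length for t, length in zip(view_times, video_lengths)]
    return (sum(ratios) / len(ratios)) * conversion_rate(view_times)


lengths = [100, 100, 100, 100]
learner_a = [300, 50, 50, 0]      # time concentrated on the first video
learner_b = [100, 100, 100, 100]  # time spread evenly over all videos
score_a = video_score(learner_a, lengths)
score_b = video_score(learner_b, lengths)
```

Both learners have the same average completion ratio (1.0), but learner B's evenly spread viewing time gives a conversion rate of 1, so `score_b` is higher than `score_a`.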

Design

We removed videos that did not contain learning content, such as course introductions. The length of a lecture video is usually greater than 120 seconds [19]. However, in the collected learning behavior data, a large number of learners watched a video for a total time of no more than 60 seconds. Such a short viewing time is not enough for learning. Assuming that there are $n$ types of learning behavior, the score of each learning behavior is normalized to the interval [0,1]. The score of the $i$-th learning behavior is $s_i$, where $0 \le s_i \le 1$ ($i = 1, 2, \dots, n$). Fig. 4 is a radar graph of learning behavior scores. The $i$-th learning behavior score is marked as point $A_i$ in the figure. The boundary of the area enclosed by $A_1 A_2 \cdots A_n$ is denoted as $C$.

Fig. 4

Radar graph of learning behavior scores.

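Let $A_i$ denote the point that marks the $i$-th learning behavior score $s_i$ on the radar graph. By Green's formula, the area $S$ of the region enclosed by the closed polygon $C = A_1 A_2 \cdots A_n A_1$ can be written as the curve integral $S = \frac{1}{2} \oint_C x\,dy - y\,dx$. From the additivity of the curve integral, $S$ can be expressed as the sum of the integrals over the segments, $S = \sum_{i=1}^{n} I_i$, where $I_i = \frac{1}{2} \int_{A_i A_{i+1}} x\,dy - y\,dx$. Calculate the integral $I_i$. Let $A_i$ and $A_{i+1}$ be $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$ respectively. The parameter equation of line segment $A_i A_{i+1}$ can be expressed as

$$x = x_i + (x_{i+1} - x_i)\,t, \qquad y = y_i + (y_{i+1} - y_i)\,t, \qquad 0 \le t \le 1.$$

Calculate the curve integral along $A_i A_{i+1}$. Let $dx = (x_{i+1} - x_i)\,dt$ and $dy = (y_{i+1} - y_i)\,dt$. Then, the curve integral can be converted into a definite integral.

$$I_i = \frac{1}{2} \int_0^1 \left[ \big(x_i + (x_{i+1} - x_i)\,t\big)(y_{i+1} - y_i) - \big(y_i + (y_{i+1} - y_i)\,t\big)(x_{i+1} - x_i) \right] dt$$

To simplify the expression, $I_i$ can be expressed as

$$I_i = \frac{1}{2} \left( x_i y_{i+1} - x_{i+1} y_i \right).$$

The area can be obtained by adding each segment integral. The area is calculated as

$$S = \frac{1}{2} \sum_{i=1}^{n} \left( x_i y_{i+1} - x_{i+1} y_i \right).$$

When $i = n$, $A_{n+1} = A_1$. The score of each learning behavior is $s_i$, and its coordinate value can be obtained by the following formula.

$$x_i = s_i \cos\frac{2\pi (i-1)}{n}, \qquad y_i = s_i \sin\frac{2\pi (i-1)}{n}$$

Substituting the coordinate values into the area formula, the area can be expressed as

$$S = \frac{1}{2} \sin\frac{2\pi}{n} \sum_{i=1}^{n} s_i s_{i+1}, \qquad s_{n+1} = s_1.$$

When $s_1 = s_2 = \dots = s_n = 1$, $S$ takes the maximum, which is denoted as $S_{\max} = \frac{n}{2} \sin\frac{2\pi}{n}$. The process score is defined as $S / S_{\max} = \frac{1}{n} \sum_{i=1}^{n} s_i s_{i+1}$. The geometric meaning of the process score can be understood from Fig. 4: the process score is the ratio of the area of the red region to the area of the outer regular $n$-sided polygon.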
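The geometric definition of the process score can be checked numerically. The sketch below places the normalized scores on equally spaced radar axes, computes the polygon area with the shoelace form of Green's formula, and divides by the area of the regular polygon obtained when every score is 1. The function names and the exact placement of the axes (the first axis at angle 0) are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def radar_vertices(scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Place each score on its own axis, axes spaced by 2*pi/n."""
    n = len(scores)
    return [
        (s * math.cos(2 * math.pi * i / n), s * math.sin(2 * math.pi * i / n))
        for i, s in enumerate(scores)
    ]


def polygon_area(vertices: Sequence[Tuple[float, float]]) -> float:
    """Shoelace formula: half the sum of x_i*y_{i+1} - x_{i+1}*y_i."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x0 * y1 - x1 * y0
    return total / 2


def process_score(scores: Sequence[float]) -> float:
    """Radar polygon area divided by the area of the all-ones polygon."""
    n = len(scores)
    if n < 3:
        raise ValueError("a radar graph needs at least three axes")
    if any(not 0 <= s <= 1 for s in scores):
        raise ValueError("scores must be normalized to [0, 1]")
    return polygon_area(radar_vertices(scores)) / polygon_area(radar_vertices([1.0] * n))


def process_score_closed_form(scores: Sequence[float]) -> float:
    """Equivalent expression: mean of products of adjacent scores."""
    n = len(scores)
    return sum(scores[i] * scores[(i + 1) % n] for i in range(n)) / n


example = [0.9, 0.4, 0.0, 0.7, 0.5]
geometric = process_score(example)
algebraic = process_score_closed_form(example)
```

Because only products of adjacent scores enter the area, the result depends on the order in which behaviors are assigned to axes, and a learner with a single nonzero behavior score receives a process score of 0.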

Calculation

The comprehensive score is applied to rate learners in the four courses. The scored learning behaviors include watching videos, completing exercises, participating in discussions, taking mock tests, and taking course exams. The score for participation in discussions is determined by the learner's rank in discussion participation among all learners. In order to compare the two scoring modes, we calculated the current score ratio and the comprehensive scoring ratio of the 4 courses. The current score ratio is the ratio of the number of learners with a course score greater than 0 to the number of learners enrolled in the course. The comprehensive scoring ratio is the ratio of the number of learners with a process score greater than 0 to the number of learners enrolled in the course. The current MOOC scoring method is a weighted average of exam and exercise scores, and it cannot quantify the process of watching videos and participating in discussions. The comprehensive score takes into account all the learning behaviors. Fig. 5 compares the current course scoring rate and the comprehensive scoring rate of the four courses. The data in Fig. 5 show that the current course scoring ratio is much lower than the comprehensive scoring ratio. The difference is most pronounced for C Programming and less pronounced for some courses, such as Advanced Mathematics. A high comprehensive scoring rate indicates that learners participate in many learning activities. It also suggests that the course is more attractive, although many factors influence the attractiveness of a course, such as its content.
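The two ratios compared in this section follow the same rule, the share of enrolled learners whose score is greater than 0, applied to different score lists. The sketch below uses a hypothetical helper name and made-up numbers purely to show the calculation; it does not reproduce the paper's data.

```python
from typing import Sequence


def scoring_ratio(scores: Sequence[float]) -> float:
    """Share of enrolled learners whose score is greater than zero.

    `scores` holds one entry per enrolled learner, 0 for no score.
    """
    if not scores:
        raise ValueError("the course has no enrolled learners")
    return sum(1 for s in scores if s > 0) / len(scores)


# Illustrative (made-up) scores for five enrolled learners.
current_scores = [0, 0, 0, 72, 0]            # only one learner took the exam
process_scores = [0.1, 0, 0.3, 0.8, 0.05]    # four learners showed activity

current_ratio = scoring_ratio(current_scores)
comprehensive_ratio = scoring_ratio(process_scores)
```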

Fig. 5

Comparison of current course scoring rate and comprehensive scoring rate.

The comprehensive score gives an evaluation of the learning behaviors, which can help learners understand their learning status. Because more than 90% of learners do not receive a course certificate, learning process evaluation can maintain and stimulate motivation. Videos are the knowledge carrier of online open courses and the most important learning resource for learners. To illustrate the validity of the comprehensive score, we analyzed the relationship between video viewing behavior scores and current course scores, and between video viewing behavior scores and comprehensive scores. Fig. 6 compares the relationship between video viewing behavior and the two types of scores for the four courses. The left image in Fig. 6 is a scatter plot of the video viewing behavior scores and the current course scores. The right image is a scatter plot of the video viewing behavior scores and the comprehensive scores. In the left image, there is no significant relationship between the current course scores and the video viewing behavior scores. In the right image, the comprehensive scores show a significant positive linear correlation with the video viewing behavior scores.
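The text reads the relationship between scores off scatter plots. One common way to quantify such a linear relationship is the Pearson correlation coefficient; the sketch below is an illustration of that statistic only, since the paper does not state which measure, if any, was computed. The function name is hypothetical.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)
```

Calling `pearson_r(video_scores, comprehensive_scores)` on per-learner score lists would return a value near 1 for a strong positive linear relationship and a value near 0 when there is none.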

Fig. 6

The comprehensive score can more accurately reflect the learning effect.

The main knowledge carriers of MOOCs are videos [15], but many MOOC learners have obtained a course certificate without the corresponding learning behavior. Therefore, it is unscientific to evaluate MOOC learning by referring to test scores alone. The comprehensive score is more accurate in the evaluation of learning.

Conclusion

All the conclusions of this paper are based on the data provided by the Chinese University MOOC. Other platforms and their learners may differ. This paper studies the comprehensive evaluation of MOOC learners. A good learning process scoring method should be able to score all learning behaviors. Under the existing scoring method, many learners who watched the videos, completed the exercises, and participated in the discussions did not receive course scores. Given that most learners are not aiming for course certificates, a comprehensive evaluation of learning is necessary. The data in Fig. 1 show that the passing rate of MOOC courses is very low, and most learners are not trying to get a course certificate. In this paper, it is found that the behavioral characteristics of watching videos are very close for the three types of learners (Fig. 2). Because passers did not watch enough videos, we conclude that most passers had already mastered the knowledge of the course and used online courses as an aid to their learning. Therefore, current course scores do not truly reflect the learning effect. The effect of viewing videos is a research focus of this paper. Based on the mastery learning theory [16], the length of time spent watching the videos serves as a measure of the amount of knowledge. However, the knowledge acquired by learners also depends on the conversion rate. In this paper, the conversion rate is determined by the allocation of video viewing time. When the learner spends enough time on every video, the learning effect is better. Shannon information entropy theory [17] can describe this trend. The knowledge conversion rate in this paper is defined as the ratio of the information entropy to its maximum, which indicates how reasonable the time allocation is. The comprehensive score uses a geometric approach. The ratio of the radar graph (Fig. 4) area to its maximum is taken as the process score. When calculating the area, Green's formula [20] is used to convert the double integral into a curve integral. This paper derives the formula for calculating the process score. The results show that the process score achieves a composite evaluation. The percentage of learners who receive a score has increased significantly, and for some courses it has increased nearly 70 times. In addition, comprehensive scores are more accurate for learning assessment. Without considering the learning process, only the learners who took the exam are identified. On the other hand, a learning behavior radar graph can also help learners accurately understand their status. The comprehensive score can help them analyze their learning characteristics and understand which learning behaviors need to be strengthened. There are still some issues in this article that merit further discussion. The courses studied in this article are taught in Chinese. An interesting question for further research is whether learning behavior in courses taught in other languages shows similar patterns. We expect that researchers with such data sets will be able to give answers.

Declarations

Author contribution statement

Yong Luo: Conceived and designed the experiments; Analyzed and interpreted the data; Wrote the paper. Guochang Zhou: Performed the experiments. Jianping Li: Conceived and designed the experiments. Xiao Xiao: Contributed reagents, materials, analysis tools or data.

Funding statement

This work was supported by the Open Research Fund of National Ministry of Education Higher Education Research Center Grant NO. 2017008. This work was supported by the Open Research Fund of Hunan Provincial Key Laboratory of Network Investigational Technology Grant NO. 2017WLZC003. Yong Luo has received a research grant from the National Ministry of Education Higher Education Research Center. Jianping Li has received a research grant from the Chinese University MOOC.

Competing interest statement

The authors declare no conflict of interest.

Additional information

No additional information is available for this paper.
