Zhuangzhuang Han, Qiwei He, Matthias von Davier.
Abstract
The Programme for International Student Assessment (PISA) introduced the measurement of problem-solving skills in the 2012 cycle. The items in this new domain use scenario-based environments in which students interact with a computer. Process data collected from log files record students' interactions with the testing platform. This study proposes a two-stage approach for generating features from process data and selecting the features that predict students' responses, using a released problem-solving item, the Climate Control Task. The primary objectives of the study are (1) to introduce an approach for generating features from the process data and using them to predict the response to this item, and (2) to identify which features have the most predictive value. To achieve these goals, a tree-based ensemble method, the random forest algorithm, is used to explore the association between response data and predictive features. The features are also ranked by their importance for predictive performance. The study thus offers an alternative way to analyze process data for pedagogical purposes.
Keywords: PISA; feature generation; feature selection; interactive items; problem-solving; process data; random forests
Year: 2019 PMID: 31824363 PMCID: PMC6882413 DOI: 10.3389/fpsyg.2019.02461
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Countries and economies with sample sizes.
| Country or economy | Sample size |
| Australia | 1,855 |
| Austria | 442 |
| Belgium | 726 |
| Bulgaria | 988 |
| Canada | 1,516 |
| Chile | 526 |
| Chinese Taipei | 494 |
| Colombia | 736 |
| Croatia | 962 |
| Czechia | 1,526 |
| Denmark | 636 |
| Estonia | 464 |
| Finland | 1,769 |
| France | 429 |
| Germany | 430 |
| Hong Kong | 433 |
| Hungary | 424 |
| Ireland | 407 |
| Israel | 440 |
| Italy | 453 |
| Japan | 1,005 |
| Korea | 449 |
| Macao | 519 |
| Malaysia | 938 |
| Montenegro | 917 |
| Netherlands | 891 |
| Norway | 401 |
| Poland | 379 |
| Portugal | 486 |
| Russia | 504 |
| Serbia | 867 |
| Shanghai-China | 408 |
| Singapore | 469 |
| Slovak Republic | 485 |
| Slovenia | 667 |
| Spain | 885 |
| Sweden | 418 |
| Turkey | 998 |
| United Arab Emirates | 1,023 |
| United States | 425 |
| Uruguay | 966 |
FIGURE 1. A snapshot of the problem-solving item Climate Control in PISA 2012.
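An example of process data for a test taker solving the Climate Control item.
| Event | Time | Event number | Event value | Top setting | Central setting | Bottom setting | Temperature value | Humidity value | Diagram state |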
| START_ITEM | 1288.1 | 1 | start | NULL | NULL | NULL | NULL | NULL | NULL |
| ACER_EVENT | 1291.9 | 2 | reset | 0 | 0 | 0 | 25 | 25 | NULL |
| ACER_EVENT | 1338.4 | 3 | apply | 1 | 1 | 1 | 27 | 28 | NULL |
| ACER_EVENT | 1346.8 | 4 | apply | 1 | 1 | 2 | 29 | 33 | NULL |
| ACER_EVENT | 1350.1 | 5 | apply | 1 | 2 | 2 | 31 | 36 | NULL |
| ACER_EVENT | 1354.5 | 6 | apply | 2 | 2 | 2 | 35 | 36 | NULL |
| ACER_EVENT | 1361.1 | 7 | apply | 2 | 1 | 1 | 36 | 36 | NULL |
| ACER_EVENT | 1361.1 | 8 | reset | 0 | 0 | 0 | 25 | 25 | NULL |
| ACER_EVENT | 1375.3 | 9 | diagram | NULL | NULL | NULL | NULL | NULL | 000000 |
| ACER_EVENT | 1376.2 | 10 | diagram | NULL | NULL | NULL | NULL | NULL | 000000 |
| ACER_EVENT | 1400.1 | 11 | diagram | NULL | NULL | NULL | NULL | NULL | 000000 |
| ACER_EVENT | 1402.1 | 12 | diagram | NULL | NULL | NULL | NULL | NULL | 000001 |
| ACER_EVENT | 1406.8 | 13 | diagram | NULL | NULL | NULL | NULL | NULL | 000001 |
| ACER_EVENT | 1408.4 | 14 | diagram | NULL | NULL | NULL | NULL | NULL | 000101 |
| ACER_EVENT | 1410.2 | 15 | diagram | NULL | NULL | NULL | NULL | NULL | 000101 |
| ACER_EVENT | 1410.6 | 16 | diagram | NULL | NULL | NULL | NULL | NULL | 100101 |
| END_ITEM | 1416.1 | 17 | end | NULL | NULL | NULL | NULL | NULL | NULL |
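A log like the one above can be reduced to a compact action string before any features are generated. The following is a minimal sketch and not the authors' code: the one-letter codes (S, A, R, D, E) are inferred from the n-gram labels used elsewhere in this record, and `EVENT_CODES` and `encode_actions` are hypothetical names introduced for illustration.

```python
# Hypothetical mapping from logged event values to one-letter action codes
# (inferred from the n-gram labels; not confirmed by the source).
EVENT_CODES = {"start": "S", "apply": "A", "reset": "R", "diagram": "D", "end": "E"}


def encode_actions(event_values):
    """Turn a list of event values (in log order) into an action string."""
    return "".join(EVENT_CODES[value] for value in event_values)


# Event values of the example test taker, events 1 to 17 in the table above.
example_events = (
    ["start", "reset"] + ["apply"] * 5 + ["reset"] + ["diagram"] * 8 + ["end"]
)
example_sequence = encode_actions(example_events)
print(example_sequence)  # SRAAAAARDDDDDDDDE
```

Keeping the sequence as a plain string makes later steps, such as counting n-grams or collapsing repeated actions, simple string operations.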
All levels of the AD sequence with sample size and percentage of correct responses.
| AD sequence | Sample size | Correct responses | Correct (%) |
| AD | 6490 | 2377 | 36.63 |
| ADA | 1118 | 522 | 46.69 |
| ADAD | 2996 | 1648 | 55.01 |
| ADADA | 697 | 401 | 57.53 |
| ADADAD | 8004 | 6470 | 80.83 |
| ADADADA | 1648 | 1459 | 88.53 |
| ADADADAD | 777 | 558 | 71.81 |
| ADADADADA | 250 | 188 | 75.20 |
| ADADADADAD | 167 | 115 | 68.86 |
| ADADADADADA | 64 | 41 | 64.06 |
| ADADADADADAD | 74 | 53 | 71.62 |
| ADADADADADADA | 29 | 17 | 58.62 |
| ADADADADADADAD | 15 | 8 | 53.33 |
| ADADADADADADADA | 8 | 6 | 75.00 |
| ADADADADADADADAD | 6 | 2 | 33.33 |
| ADADADADADADADADA | 7 | 3 | 42.86 |
| ADADADADADADADADAD | 4 | 1 | 25.00 |
| ADADADADADADADADADA | 3 | 1 | 33.33 |
| ADADADADADADADADADADAD | 1 | 0 | 0.00 |
| ADADADADADADADADADADADA | 1 | 1 | 100.00 |
| ADADADADADADADADADADADADA | 1 | 0 | 0.00 |
| ADADADADADADADADADADADADADA | 1 | 1 | 100.00 |
| ADADADADADADADADADADADADADAD | 1 | 1 | 100.00 |
| DA | 803 | 123 | 15.32 |
| DAD | 398 | 137 | 34.42 |
| DADA | 232 | 74 | 31.90 |
| DADAD | 190 | 91 | 47.89 |
| DADADA | 108 | 40 | 37.04 |
| DADADAD | 345 | 259 | 75.07 |
| DADADADA | 124 | 76 | 61.29 |
| DADADADAD | 84 | 54 | 64.29 |
| DADADADADA | 38 | 18 | 47.37 |
| DADADADADAD | 22 | 11 | 50.00 |
| DADADADADADA | 27 | 7 | 25.93 |
| DADADADADADAD | 11 | 5 | 45.45 |
| DADADADADADADA | 10 | 0 | 0.00 |
| DADADADADADADAD | 10 | 7 | 70.00 |
| DADADADADADADADA | 12 | 2 | 16.67 |
| DADADADADADADADAD | 6 | 2 | 33.33 |
| DADADADADADADADADA | 8 | 0 | 0.00 |
| DADADADADADADADADADA | 3 | 2 | 66.67 |
| DADADADADADADADADADAD | 1 | 0 | 0.00 |
| DADADADADADADADADADADA | 3 | 2 | 66.67 |
| DADADADADADADADADADADAD | 2 | 1 | 50.00 |
| DADADADADADADADADADADADA | 6 | 3 | 50.00 |
| DADADADADADADADADADADADAD | 1 | 0 | 0.00 |
| DADADADADADADADADADADADADAD | 1 | 1 | 100.00 |
| DADADADADADADADADADADADADADADADADADA | 3 | 0 | 0.00 |
| N | 5414 | 267 | 4.93 |
All contracted levels of the AD sequence with sample size and percentage of correct responses.
| Contracted level | Sample size | Correct responses | Correct (%) |
| Incomplete | 5414 | 267 | 4.93 |
| Start from D | 2448 | 915 | 37.38 |
| AD only | 6490 | 2377 | 36.63 |
| 1<=AD<3 | 4811 | 2571 | 53.44 |
| AD>=3 | 11061 | 8925 | 80.69 |
FIGURE 2. A tree-based diagram for contracted levels of the AD sequence. Indices in parentheses are sample size, number of correct responses, and conditional probability of correctness, respectively, for each class or contracted class of the “AD sequence” variable.
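The contraction of AD sequences into five levels can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: `ad_sequence` assumes that only apply (A) and diagram (D) actions are kept, that runs of the same action are merged, and that test takers who never used both kinds of action receive the label "N". The grouping rules in `contract` follow the level labels and row totals in the two tables above; both function names are hypothetical.

```python
import re


def ad_sequence(actions):
    """Collapse an action string to its alternating A/D pattern.

    Assumption: non-A/D actions are dropped, repeated actions are merged,
    and a sequence lacking either A or D is labelled "N".
    """
    kept = re.sub(r"[^AD]", "", actions)
    collapsed = re.sub(r"(.)\1+", r"\1", kept)
    return collapsed if len(collapsed) >= 2 else "N"


def contract(ad_seq):
    """Map an AD sequence to one of the five contracted levels."""
    if ad_seq == "N":
        return "Incomplete"
    if ad_seq.startswith("D"):
        return "Start from D"
    if ad_seq == "AD":
        return "AD only"
    pairs = ad_seq.count("AD")  # completed apply-then-diagram cycles
    return "1<=AD<3" if pairs < 3 else "AD>=3"


example = ad_sequence("SRAAAAARDDDDDDDDE")
print(example, "->", contract(example))  # AD -> AD only
```

Under these rules ADA, ADAD, and ADADA fall into "1<=AD<3", which matches the sample size of 4,811 reported for that level (1,118 + 2,996 + 697).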
Variables generated from the process data of Climate Control Task 1.
| Feature type | Number of features | Features |
| Unigram | 3 | D, R, A |
| Bigram | 16 | DD, AA, RA, AR, AD, DA, AE, SD, SA, DR, DE, RD, RE, RR, SR, SE |
| Trigram | 48 | ADD, AAR, SRD, DDR, AAE, DRE, AAA, ARD, SDR, ADE, RAA, RRE, DDD, DAR, ARR, DAA, RDA, RRA, DAD, SDA, RRR, AAD, RAD, RRD, ADR, ARE, DRR, RDE, DRR, SRA, ADA, SAR, SRE, ARA, RAR, SDE, DRA, RDD, RDR, SDD, DAE, SAR, DDA, DRD, SRR, SAA, SAD, RAE |
| Behavioral indicators | 4 | AD sequence, VOTAT group, VOTAT num, n_actions |
| Time-related features | 6 | D time, A time, R time, E time, total time, time_bf_action |
| Total | 77 | |
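The unigram, bigram, and trigram features listed above can be derived from an action string with a sliding window. The sketch below is a hedged illustration: it assumes actions are coded as single letters (S start, A apply, R reset, D diagram, E end, a mapping inferred from the labels in the table), and it produces counts, although the record does not say whether the authors used counts or presence indicators. `ngram_counts` is a hypothetical helper name.

```python
from collections import Counter


def ngram_counts(sequence, n):
    """Count every contiguous run of n action codes in the sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


# Action string for the example test taker (assumed one-letter coding).
sequence = "SRAAAAARDDDDDDDDE"

# The table lists only D, R, and A as unigrams, so S and E are dropped here.
unigrams = {k: v for k, v in ngram_counts(sequence, 1).items() if k in "DRA"}
bigrams = ngram_counts(sequence, 2)
trigrams = ngram_counts(sequence, 3)

print(unigrams)
print(bigrams.most_common(3))
print(trigrams.most_common(3))
```

Each distinct n-gram becomes one column of the feature matrix; n-grams that a test taker never produced simply have a count of zero.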
FIGURE 3. Prediction performance versus number of eliminated predictors in backward elimination. Dashed lines show the change in validation performance (classification accuracy) for each training set as the number of eliminated features increases; the bold solid line represents the average performance across the five folds; the vertical dashed line (number of excluded features = 49) indicates where a large reduction in prediction performance begins.
Features selected through the five-fold validated backward elimination.
| 21 features | D, AD sequence, VOTAT num, |
Features ranked by permutation importance measure (mean decrease in accuracy).
| Feature | Mean decrease in accuracy |
| D | 0.199 |
| VOTAT group | 0.056 |
| AD sequence | 0.042 |
| VOTAT num | 0.023 |
| R | 0.022 |
| R time | 0.018 |
| DDD | 0.017 |
| n_actions | 0.015 |
| RA | 0.014 |
| A | 0.013 |
| D time | 0.009 |
| AAA | 0.008 |
| ADR | 0.007 |
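The idea behind the mean decrease in accuracy can be sketched in a few lines: shuffle one feature at a time and record how much the classification accuracy drops. The code below is a model-agnostic illustration under stated assumptions, not the authors' implementation. Random forest software typically computes this measure on out-of-bag samples, whereas this sketch uses whatever rows are passed in. `toy_predict` is a hypothetical stand-in for a fitted classifier, and the feature names and data are invented for the example.

```python
import random


def accuracy(predict, rows, labels):
    """Share of rows whose predicted label matches the observed label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = {}
    for name in rows[0]:
        drops = []
        for _ in range(n_repeats):
            column = [r[name] for r in rows]
            rng.shuffle(column)
            permuted = [{**r, name: v} for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances[name] = sum(drops) / n_repeats
    return importances


# Toy stand-in for a fitted classifier: predicts a correct response (1) when
# the invented feature "D" exceeds 3 and ignores "noise" entirely.
def toy_predict(row):
    return int(row["D"] > 3)


rows = [{"D": d, "noise": d % 2} for d in range(8)]
labels = [toy_predict(r) for r in rows]
scores = permutation_importance(toy_predict, rows, labels)
print(sorted(scores.items(), key=lambda item: -item[1]))
```

A feature the model does not rely on, such as `noise` here, gets an importance of zero, while shuffling an informative feature lowers accuracy and yields a positive score.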