Björn E Hommel, Franz-Josef M Wollang, Veronika Kotova, Hannes Zacher, Stefan C Schmukle.
Abstract
Algorithmic automatic item generation can be used to obtain large quantities of cognitive items in the domains of knowledge and aptitude testing. However, conventional item models used by template-based automatic item generation techniques are not well suited to creating items for non-cognitive constructs. Progress in this area has recently been made by employing long short-term memory recurrent neural networks to produce word sequences that syntactically resemble items typically found in personality questionnaires. To date, such items have been produced unconditionally, without the possibility of selectively targeting personality domains. In this article, we offer a brief synopsis of past developments in natural language processing and explain why the automatic generation of construct-specific items has become attainable only through recent technological progress. We propose that pre-trained causal transformer models can be fine-tuned to achieve this task using implicit parameterization in conjunction with conditional generation. We demonstrate this method in a tutorial-like fashion and finally compare aspects of validity in human- and machine-authored items using empirical data. Our study finds that approximately two-thirds of the automatically generated items show good psychometric properties (factor loadings above .40) and that one-third even have properties equivalent to established and highly curated human-authored items. Our work thus demonstrates the practical use of deep neural networks for non-cognitive automatic item generation.
Keywords: automatic item generation; deep learning; language modeling; natural language processing; neural networks
Year: 2021 PMID: 34907497 PMCID: PMC9166894 DOI: 10.1007/s11336-021-09823-9
Source DB: PubMed Journal: Psychometrika ISSN: 0033-3123 Impact factor: 2.290
Fig. 1 Schematic Diagram of the Attention Mechanism and Components of the Transformer Architecture. Note. The process illustrates the encoding and transformation of the sequence “walks by river bank” by components of the transformer architecture (Vaswani et al., 2017). The key, query, and value weight matrices are randomly initialized and then learned during training. In causal language models, masking (see Eq. 5) is applied to the attention scores (the scaled query-key products). (a) Matrix product of the query matrix and the transposed key matrix; (b) scaling and softmax are applied. The figure's notation denotes the input sequence length; the model dimensionality, i.e., the length of the embedding vectors; the current attention head h; the number of attention heads; the current layer; the embedding matrix and a subset of it; the key, query, and value weight matrices; the transposed key matrix; the query matrix; the value matrix; the attention matrix; a weight matrix; the layer output matrix; and symbols for matrix subdivision and matrix concatenation.
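To make the caption's steps concrete, here is a minimal pure-Python sketch of single-head causal attention: (a) query-key products, (b) scaling and softmax, with a causal mask so that each token attends only to itself and earlier tokens. The toy embeddings are made up, and the learned key, query, and value weight matrices are replaced by the identity for brevity; a real model would project the embeddings first and stack multiple heads and layers.

```python
import math


def softmax(row):
    """Numerically stable softmax; entries equal to -inf get zero weight."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Product of an (n x k) and a (k x m) matrix stored as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: n x d_k matrices with one row per token. Position i may only
    attend to positions j <= i, so scores above the diagonal are set to
    -inf before the softmax and receive zero attention weight.
    """
    d_k = len(q[0])
    scores = matmul(q, transpose(k))  # (a) query-key products, n x n
    n = len(scores)
    for i in range(n):
        for j in range(n):
            # (b) scaling by sqrt(d_k), combined with the causal mask
            scores[i][j] = scores[i][j] / math.sqrt(d_k) if j <= i else float("-inf")
    weights = [softmax(row) for row in scores]  # attention matrix, rows sum to 1
    return matmul(weights, v), weights


# Toy 2-dimensional embeddings for "walks by river bank" (made-up values);
# the learned projection matrices are replaced by the identity here.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
output, attention = causal_attention(x, x, x)
for row in attention:
    print([round(w, 3) for w in row])
```

The first token can only attend to itself, so its attention row is one-hot, and every entry above the diagonal is exactly zero.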
Fig. 2 Illustration of the Workflow of the Proposed Method for Construct-Specific Automatic Item Generation. Note. Workflow for (a) fine-tuning a causal transformer model using the proposed segmented training pattern, and (b) applying the partial pattern to prompt a causal transformer for the generation of construct-specific item stems. The depicted transformer shows the 12-layer decoder architecture of the Generative Pretrained Transformer adopted from Radford et al. (2018), although the workflow is, in principle, agnostic to the choice of causal transformer architecture.
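The segmented training pattern and the partial prompting pattern can be illustrated with plain string handling. This is a sketch under assumptions: the `#` and `@` delimiters are inferred from the prompt “#Pessimism@I am” shown in Fig. 3, the end-of-sequence marker is assumed to be GPT-2's `<|endoftext|>`, and the helper names are hypothetical. The sample stems are borrowed from the item tables purely as example strings.

```python
# Hypothetical delimiters inferred from the prompt "#Pessimism@I am" in Fig. 3:
# "#" precedes the construct label and "@" separates it from the item stem.
# The end-of-sequence marker is an assumption (GPT-2's "<|endoftext|>").
LABEL_MARK = "#"
STEM_MARK = "@"
EOS = "<|endoftext|>"


def make_training_example(label: str, stem: str) -> str:
    """Full segmented pattern for fine-tuning: construct label plus item stem."""
    return f"{LABEL_MARK}{label}{STEM_MARK}{stem}{EOS}"


def make_prompt(label: str, stem_start: str = "") -> str:
    """Partial pattern: the model is left to complete the item stem."""
    return f"{LABEL_MARK}{label}{STEM_MARK}{stem_start}"


def parse_generation(text: str):
    """Recover (label, stem) from a generated sequence; None if malformed."""
    text = text.split(EOS, 1)[0]  # discard anything after the first item
    if not text.startswith(LABEL_MARK) or STEM_MARK not in text:
        return None
    label, stem = text[len(LABEL_MARK):].split(STEM_MARK, 1)
    return label.strip(), stem.strip()


corpus = [("Extraversion", "I am able to speak confidently."),
          ("Neuroticism", "I am often upset by minor things.")]
training_lines = [make_training_example(lbl, stem) for lbl, stem in corpus]
print(training_lines[0])
print(make_prompt("Pessimism", "I am"))
```

In this sketch, fine-tuning would feed strings like `training_lines` to a causal language model; at generation time only the partial pattern is supplied, the model continues the stem, and `parse_generation` recovers the label and the item.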
Fig. 3 Differences in Search Heuristics for Generated Items and Tokens. Note. Item generation after fine-tuning when prompted with the construct label Pessimism, using various search heuristics: (a) greedy search; (b) beam search with multiple search beams, where dashed lines indicate lower total sequence probabilities. Panels (c) to (g) show next-token probabilities (y-axis) for the premise “#Pessimism@I am”: (c) multinomial sampling with no transformation; (d) to (g) multinomial sampling with different parameter settings, including nucleus sampling (e) and two different temperature values (f, g).
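The token-level heuristics compared in the figure can be sketched on a single next-token distribution. The candidate tokens and logits below are hypothetical, and beam search is omitted for brevity; the functions cover greedy search, plain multinomial sampling, temperature scaling, and nucleus (top-p) filtering.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; T < 1 sharpens, T > 1 flattens."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def nucleus_filter(probs, top_p):
    """Nucleus (top-p) filtering: keep the smallest set of most probable
    tokens whose cumulative probability reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = set(), 0.0
    for i in order:
        kept.add(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    z = sum(filtered)
    return [p / z for p in filtered]


def greedy_pick(probs):
    """Greedy search: always take the single most probable next token."""
    return max(range(len(probs)), key=lambda i: probs[i])


def multinomial_pick(probs, rng):
    """Multinomial sampling: draw the next token in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical candidate continuations of "#Pessimism@I am" and made-up logits.
vocab = ["not", "always", "a", "often"]
logits = [2.0, 1.0, 0.5, 0.1]
rng = random.Random(0)

base = softmax_with_temperature(logits)
print(vocab[greedy_pick(base)])                       # greedy choice
print(vocab[multinomial_pick(base, rng)])             # plain multinomial sample
print([round(p, 3) for p in softmax_with_temperature(logits, 0.5)])  # sharper
print([round(p, 3) for p in nucleus_filter(base, 0.9)])              # truncated tail
```

Greedy search is deterministic, whereas sampling trades some probability mass for variety; temperature and nucleus filtering control how much of the low-probability tail can still be drawn.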
Comparison of Confirmatory Factor Analyses of Human- and Machine-authored Scales for Trained Construct Labels
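| Scale | Human CFI | Human RMSEA | Human mean loading | Human loading range | Human ω | Human ω 95% CI | Machine CFI | Machine RMSEA | Machine mean loading | Machine loading range | Machine ω | Machine ω 95% CI | p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|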
| Openness to experience | .95 | .14 | .62 | [.82, .72] | .72 | [.65, .78] | .95 | .10 | .54 | [.44, .75] | .66 | [.66, .58] | .097 |
| Conscientiousness | .93 | .23 | .72 | [.74, .81] | .81 | [.76, .85] | 1.00 | .00 | .44 | [.15, .69] | .46 | [.46, .36] | |
| Extraversion | .98 | .15 | .77 | [.89, .86] | .86 | [.82, .89] | 1.00 | .05 | .67 | [.34, .90] | .75 | [.75, .68] | |
| Agreeableness | .96 | .17 | .73 | [.86, .80] | .80 | [.75, .85] | .80 | .27 | .58 | [.35, .87] | .63 | [.63, .49] | |
| Neuroticism | .99 | .13 | .80 | [.91, .87] | .87 | [.84, .90] | .98 | .17 | .56 | [.02, .92] | .70 | [.70, .61] | |
Note. N = 220 respondents. Columns report the mean and range of standardized factor loadings, the omega coefficient of internal consistency with its percentile bootstrapped 95% confidence interval, and the bootstrapped probability of the models’ differences in omega coefficients (based on bootstrap resampling; data from 446 iterations were omitted due to failed model convergence).
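The percentile bootstrap named in the note can be sketched generically: resample respondents with replacement, recompute the statistic, and take the 2.5th and 97.5th percentiles of the resampled estimates. In the study the statistic is an omega coefficient from a fitted CFA; as a simpler stand-in, this sketch bootstraps the mean of one item, with responses rebuilt from its reported frequencies. The number of resamples and the seed are arbitrary choices.

```python
import random
import statistics


def percentile_bootstrap_ci(data, statistic, n_resamples=2000, level=0.95, seed=1):
    """Percentile bootstrap: resample respondents with replacement, recompute
    the statistic each time, and read off the empirical quantiles."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0
    lower = estimates[int(alpha * (n_resamples - 1))]
    upper = estimates[int((1.0 - alpha) * (n_resamples - 1))]
    return lower, upper


# Responses to "I can enjoy a wide variety of musical styles." rebuilt from the
# frequency counts (7, 13, 30, 71, 99) reported for that item.
responses = [1] * 7 + [2] * 13 + [3] * 30 + [4] * 71 + [5] * 99
low, high = percentile_bootstrap_ci(responses, statistics.fmean)
print(round(statistics.fmean(responses), 2), round(low, 2), round(high, 2))
```

Swapping `statistics.fmean` for a function that fits a model and returns omega would give the kind of interval reported in the table, at a much higher computational cost per resample.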
Descriptive Statistics and Factor Loadings of Machine-authored Items for Trained Construct Labels
| Item | M | SD | Freq. 1 | Freq. 2 | Freq. 3 | Freq. 4 | Freq. 5 | Skewness | Kurtosis | Loading | Within human range |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I can enjoy a wide variety of musical styles. (OPE+) | 4.10 | 1.05 | 7 | 13 | 30 | 71 | 99 | | 0.76 | .62 | 1 |
| I like to be surprised. (OPE+) | 3.13 | 1.32 | 32 | 39 | 61 | 45 | 43 | | | .36 | 0 |
| I love to contemplate the universe and its beauty. (OPE+) | 3.94 | 1.12 | 9 | 15 | 46 | 60 | 90 | | | .65 | 1 |
| I like to be with people who are different from myself. (OPE+) | 3.50 | 1.06 | 9 | 25 | 75 | 68 | 43 | | | .35 | 0 |
| I am not a fan of change. (OPE-) | 3.11 | 1.32 | 29 | 47 | 61 | 36 | 47 | 0.00 | | .35 | 0 |
| I am not always on time for work. (CON-) | 4.01 | 1.28 | 12 | 28 | 21 | 43 | 116 | | | .53 | 0 |
| I know that I make many mistakes. (CON-) | 2.53 | 1.20 | 53 | 61 | 57 | 35 | 14 | 0.35 | | .20 | 0 |
| I work too hard. (CON+) | 3.17 | 1.28 | 25 | 45 | 62 | 44 | 44 | | | .55 | 0 |
| I do not like to read or study. (CON-) | 4.23 | 1.04 | 8 | 8 | 27 | 59 | 118 | | 1.55 | .54 | 0 |
| I am not concerned with details. (CON-) | 4.27 | 0.95 | 4 | 10 | 23 | 68 | 115 | | 1.57 | .65 | 0 |
| I am able to speak confidently. (EXT+) | 3.96 | 1.11 | 8 | 18 | 37 | 69 | 88 | | 0.06 | .84 | 1 |
| I avoid public places. (EXT-) | 3.50 | 1.28 | 21 | 31 | 44 | 66 | 58 | | | .46 | 0 |
| I am able to handle myself in a crowd. (EXT+) | 3.98 | 1.07 | 8 | 15 | 34 | 79 | 84 | | 0.44 | .73 | 1 |
| I do not like to talk about myself. (EXT-) | 2.59 | 1.25 | 50 | 65 | 52 | 32 | 21 | 0.41 | | .45 | 0 |
| I am able to hold my own in a discussion. (EXT+) | 4.16 | 0.97 | 6 | 11 | 19 | 90 | 94 | | 1.74 | .60 | 1 |
| I care a lot about others. (AGR+) | 4.25 | 0.92 | 4 | 5 | 34 | 67 | 110 | | 1.30 | .87 | 1 |
| I am easily angered. (AGR-) | 3.96 | 1.17 | 11 | 19 | 31 | 65 | 94 | | 0.07 | .39 | 0 |
| I don’t like to argue. (AGR+) | 3.95 | 1.14 | 10 | 17 | 38 | 65 | 90 | | 0.05 | .23 | 0 |
| I am not easily offended. (AGR+) | 3.43 | 1.25 | 16 | 45 | 38 | 71 | 50 | | | .24 | 0 |
| I am not a nice person. (AGR-) | 4.51 | 0.84 | 3 | 5 | 17 | 47 | 148 | | 3.83 | .79 | 1 |
| I am generally happy and content. (NEU-) | 2.15 | 1.19 | 82 | 70 | 36 | 18 | 14 | 0.91 | | .72 | 0 |
| I am often upset by minor things. (NEU+) | 2.30 | 1.23 | 72 | 69 | 33 | 34 | 12 | 0.64 | | .89 | 1 |
| I am a person who is easily moved by the good moods and bad moods of others. (NEU+) | 3.47 | 1.23 | 22 | 24 | 52 | 73 | 49 | | | .28 | 0 |
| I am generally cheerful and optimistic. (NEU-) | 2.30 | 1.26 | 72 | 67 | 43 | 18 | 20 | 0.76 | | .69 | 0 |
| I seldom feel scared. (NEU-) | 3.01 | 1.28 | 30 | 57 | 45 | 56 | 32 | 0.00 | | .38 | 0 |
Note. Based on data from 220 respondents. Frequencies give the number of respondents choosing each response option (1 to 5). Loadings are standardized factor loadings in a CFA model with the five human-authored items and the respective machine-authored item; the final column indicates whether the factor loading of the machine-authored item lies within the range of factor loadings for the human-authored scale (1 = within the range). OPE = Openness to experience; CON = Conscientiousness; EXT = Extraversion; AGR = Agreeableness; NEU = Neuroticism; +/- indicates positive or negative keying.
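Because the tables report full response frequencies, the descriptive columns can be recomputed from them. The sketch below assumes the convention that divides central moments by powers of the sample SD (b1 = m3 / s³, b2 = m4 / s⁴ − 3); under that assumption it reproduces N = 220, M = 4.10, SD = 1.05, and kurtosis = 0.76 for the first item and yields a negative skewness, as expected for a left-skewed item. Other skewness and kurtosis conventions give slightly different numbers.

```python
import math


def describe_counts(counts, categories=(1, 2, 3, 4, 5)):
    """Mean, sample SD, skewness and excess kurtosis from response counts.

    Skewness and kurtosis divide the central moments by powers of the sample
    SD (b1 = m3 / s**3, b2 = m4 / s**4 - 3); this convention is an assumption.
    """
    n = sum(counts)
    mean = sum(c * x for c, x in zip(counts, categories)) / n

    def central_moment(k):
        return sum(c * (x - mean) ** k for c, x in zip(counts, categories)) / n

    sd = math.sqrt(central_moment(2) * n / (n - 1))  # n - 1 denominator
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "skewness": central_moment(3) / sd ** 3,
        "kurtosis": central_moment(4) / sd ** 4 - 3.0,
    }


# Counts for "I can enjoy a wide variety of musical styles." (OPE+)
stats = describe_counts([7, 13, 30, 71, 99])
print({k: round(v, 2) for k, v in stats.items()})
```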
Goodness of Fit Statistics, Factor Loadings and Reliability Estimates of Confirmatory Factor Analyses of Machine-authored Scales for Untrained Construct Labels
| Scale | CFI | RMSEA | Mean loading | Loading range | ω total | ω 95% CI |
|---|---|---|---|---|---|---|
| Benevolence | 1.00 | .05 | .69 | [.49, .94] | .74 | [.67, .79] |
| Egalitarianism | .99 | .09 | .76 | [.67, .87] | .78 | [.69, .85] |
| Egoism | .90 | .12 | .44 | [.08, .85] | .58 | [.47, .67] |
| Joviality | .83 | .16 | .44 | [.17, .92] | .54 | [.42, .62] |
| Pessimism | .99 | .11 | .70 | [.45, .93] | .82 | [.77, .86] |
Note. N = 220 respondents. Columns report the mean and range of standardized factor loadings and the omega total coefficient of internal consistency with its bootstrapped 95% confidence interval.
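For reference, the textbook omega total for a one-factor model with standardized loadings and uncorrelated residuals is sketched below. This simplified formula is an assumption for illustration: the tabled values come from the authors' fitted models and bootstrap procedure, so plugging the item loadings from the tables into it need not reproduce the tabled coefficients.

```python
def omega_total(loadings):
    """McDonald's omega for a single-factor model with standardized loadings and
    uncorrelated residuals: (sum of loadings)^2 divided by that quantity plus the
    summed residual variances (1 - loading^2)."""
    common = sum(loadings) ** 2
    residual = sum(1.0 - l ** 2 for l in loadings)
    return common / (common + residual)


# Five hypothetical items, each loading .70 on the factor.
print(round(omega_total([0.7] * 5), 3))  # 0.828
```

Higher or more numerous loadings raise the shared-variance term relative to the residual term, which is why scales with uniformly strong items reach higher omega values.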
Descriptive Statistics and Factor Loadings of Machine-authored Items for Untrained Construct Labels
| Item | M | SD | Freq. 1 | Freq. 2 | Freq. 3 | Freq. 4 | Freq. 5 | Skewness | Kurtosis | Loading |
|---|---|---|---|---|---|---|---|---|---|---|
| I care about others’ well-being. (BEN+) | 4.41 | 0.76 | 2 | 1 | 21 | 77 | 119 | | 2.61 | .78 |
| I forgive others. (BEN+) | 3.85 | 1.09 | 9 | 19 | 39 | 82 | 71 | | 0.05 | .55 |
| I am not a person who would do anything nice for anyone. (BEN-) | 4.57 | 0.79 | 2 | 6 | 12 | 44 | 156 | | 4.71 | .66 |
| I have little sympathy for poor people. (BEN-) | 4.17 | 1.23 | 14 | 17 | 16 | 44 | 129 | | 0.70 | .49 |
| I am not interested in others feelings. (BEN-) | 4.30 | 0.98 | 4 | 11 | 25 | 55 | 125 | | 1.36 | .94 |
| I believe that the rights of others should be treated equally. (EGA+) | 4.72 | 0.59 | 1 | 2 | 4 | 43 | 170 | | 10.18 | .87 |
| I believe that all races are created equal. (EGA+) | 4.60 | 0.89 | 6 | 4 | 13 | 27 | 170 | | 6.09 | .71 |
| I believe that it is wrong to exploit others for your own gain. (EGA+) | 4.52 | 0.92 | 7 | 5 | 9 | 44 | 155 | | 5.37 | .67 |
| I believe in the equality of all peoples. (EGA+) | 4.65 | 0.72 | 2 | 3 | 11 | 38 | 166 | | 6.95 | .81 |
| I believe that the rights of others should be respected without question. (EGA+) | 4.35 | 0.84 | 2 | 6 | 22 | 72 | 118 | | 1.88 | .77 |
| I believe that I have the right to my own way of life. (EGO+) | 4.45 | 0.72 | 2 | 1 | 15 | 79 | 123 | | 3.69 | .08 |
| I often exaggerate my achievements. (EGO+) | 1.94 | 1.11 | 97 | 74 | 25 | 13 | 11 | 1.24 | 0.84 | .26 |
| I believe that I am the best. (EGO+) | 2.57 | 1.35 | 67 | 44 | 50 | 35 | 24 | 0.34 | | .85 |
| I believe that I have more power than others. (EGO+) | 2.20 | 1.17 | 78 | 63 | 46 | 22 | 11 | 0.71 | | .60 |
| I am not overly proud of my achievements. (EGO-) | 3.28 | 1.32 | 26 | 41 | 49 | 54 | 50 | | | .39 |
| I am very jovial. (JOV+) | 3.37 | 1.18 | 15 | 37 | 65 | 57 | 46 | | | .92 |
| I do things that are not fun. (JOV-) | 3.34 | 1.23 | 16 | 41 | 69 | 41 | 53 | | | .17 |
| I sometimes laugh out loud. (JOV+) | 4.33 | 0.93 | 4 | 11 | 13 | 73 | 119 | | 2.39 | .18 |
| I am never sad. (JOV+) | 1.92 | 1.14 | 106 | 61 | 26 | 18 | 9 | 1.15 | 0.40 | .39 |
| I am easily entertained. (JOV+) | 3.62 | 1.06 | 12 | 17 | 58 | 88 | 45 | | 0.05 | .55 |
| I am not likely to succeed in my goals. (PES+) | 1.90 | 1.13 | 110 | 54 | 33 | 14 | 9 | 1.15 | 0.46 | .71 |
| I can see that things are never going to be the way I want them to be. (PES+) | 2.72 | 1.33 | 51 | 50 | 57 | 33 | 29 | 0.26 | | .52 |
| I am not optimistic. (PES+) | 2.09 | 1.28 | 103 | 49 | 26 | 29 | 13 | 0.88 | | .93 |
| I am always on the lookout for a better way. (PES-) | 1.99 | 0.97 | 79 | 83 | 44 | 9 | 5 | 0.90 | 0.55 | .45 |
| I look at the bright side. (PES-) | 2.23 | 1.25 | 79 | 69 | 32 | 23 | 17 | 0.83 | | .90 |
Note. Based on data from 220 respondents. Frequencies give the number of respondents choosing each response option (1 to 5). Loadings are standardized factor loadings in a CFA model including the five machine-authored items of the respective dimension. BEN = Benevolence; EGA = Egalitarianism; EGO = Egoism; JOV = Joviality; PES = Pessimism; +/- indicates positive or negative keying.
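The proportions quoted in the abstract can be checked against the loadings in the two item tables. The sketch below counts loadings above .40 and the trained items flagged as lying within the human-authored range. It reads the last Pessimism value (.90) as a factor loading, which is consistent with that scale's reported mean loading of .70 and range of [.45, .93].

```python
# Standardized loadings of the machine-authored items, in table order.
trained = [.62, .36, .65, .35, .35, .53, .20, .55, .54, .65, .84, .46, .73,
           .45, .60, .87, .39, .23, .24, .79, .72, .89, .28, .69, .38]
# 1 = loading lies within the range of the human-authored scale (trained labels only).
within_range = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1,
                0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
untrained = [.78, .55, .66, .49, .94, .87, .71, .67, .81, .77, .08, .26, .85,
             .60, .39, .92, .17, .18, .39, .55, .71, .52, .93, .45, .90]

THRESHOLD = 0.40


def count_above(loadings, threshold=THRESHOLD):
    """Number of items whose loading exceeds the threshold."""
    return sum(l > threshold for l in loadings)


n_good = count_above(trained) + count_above(untrained)
n_items = len(trained) + len(untrained)
print(f"trained: {count_above(trained)}/{len(trained)} above {THRESHOLD}")
print(f"untrained: {count_above(untrained)}/{len(untrained)} above {THRESHOLD}")
print(f"overall: {n_good}/{n_items} = {n_good / n_items:.0%}")
print(f"within human range (trained): {sum(within_range)}/{len(trained)}")
```

With these values, 35 of 50 items (70%) exceed .40 and 8 of 25 trained items fall within the human-authored range, which appears consistent with the abstract's “approximately two-thirds” and “one-third”.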