Anna-Carolina Haensch, Bernd Weiß, Patricia Steins, Priscilla Chyrva, Katja Bitz.
Abstract
In this study, we demonstrate how supervised learning can extract interpretable measurements of survey motivation from a large number of responses to an open-ended question. We manually coded a subsample of 5,000 responses to an open-ended question on survey motivation from the GESIS Panel (25,000 responses in total) and used supervised machine learning to classify the remaining responses. We demonstrate that the responses on survey motivation in the GESIS Panel are particularly well suited for automated classification, since they are mostly one-dimensional. The evaluation on the test set also indicates very good overall performance. We present the pre-processing steps and methods we used for our data and, by discussing other popular options that might be more suitable in other cases, generalize beyond our use case. We also discuss various minor problems, such as a necessary spelling correction. Finally, we showcase the analytic potential of the resulting categorization of panelists' motivation through an event history analysis of panel dropout. The analytical results allow a close look at respondents' motivations: they span a wide range, from the urge to help, to interest in the questions or the incentive, to the wish to influence those in power through their participation. We conclude by discussing the re-usability of the hand-coded responses for other surveys that include open questions similar to the GESIS Panel question.
Keywords: machine learning; semi-automated analysis; support vector machine (SVM); survey methodology; survey research; text analysis
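The abstract refers to an evaluation on a held-out test set. As a minimal sketch of what such an evaluation could look like, the following computes per-category precision, recall, and F1, plus accuracy and macro-averaged F1. The function names and the toy labels are illustrative assumptions, not the authors' code or data; only the category names are taken from the coding scheme.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted category was wrong
            fn[truth] += 1  # true category was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        n_pred = tp[label] + fp[label]
        n_true = tp[label] + fn[label]
        precision = tp[label] / n_pred if n_pred else 0.0
        recall = tp[label] / n_true if n_true else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-category F1 values."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Toy, made-up labels (hand-coded vs. predicted) purely for illustration.
y_true = ["Interest", "Incentive", "Help science", "Interest", "Fun", "Incentive"]
y_pred = ["Interest", "Incentive", "Interest", "Interest", "Fun", "Help science"]

scores = per_class_scores(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
for label, (p, r, f) in scores.items():
    print(f"{label:13s} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"accuracy={accuracy:.2f} macro_f1={macro_f1(scores):.2f}")
```

Reporting per-category scores alongside overall accuracy is useful when some motivation categories are rare, since a classifier can reach high accuracy while performing poorly on small categories.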
Year: 2022 PMID: 36035509 PMCID: PMC9403118 DOI: 10.3389/fdata.2022.880554
Source DB: PubMed Journal: Front Big Data ISSN: 2624-909X
Figure 1. Approximate design of the GESIS Panel questionnaire, mail survey, wave bf, recreated by the authors. Translation reads: “(2) For what reasons do you participate in the surveys of the Please name the three most important reasons. Most important reason: …, Second most important reason: …, Third most important reason: ….”
List of categories for respondents' motivation to participate in the GESIS Panel.
| | | |
|---|---|---|
| 1. Interest | 10. Help science | 19. Other survey characteristics |
| 2. Curiosity | 11. Help politicians | 20. Importance in general |
| 3. Learning | 12. Help society | 21. No reason/Other |
| 4. Tell opinion | 13. Help | |
| 5. Influence | 14. Brevity | |
| 6. Incentive | 15. Anonymity | |
| 7. Fun | 16. Professionalism | |
| 8. Routine | 17. Recruiter | |
| 9. Dutifulness | 18. Recruitment | |
A more detailed coding scheme with examples can be found in the Appendix.
Figure 2. Most important reason given by panelists in wave “bf” (2014) of the GESIS Panel. The semi-automated classification of reasons is described in Section 4.2.3.
Steps in semi-automated coding, choice of methods and alternatives.
| Step | Methods and alternatives |
|---|---|
| Sampling | |
| Number of coders | 1, |
| Resolving differences in coding | (1) Reaching consensus between original coders (2) |
| Spellchecking | |
| Lowercasing | Yes/ |
| Stemming or lemmatization | Yes/ |
| Stopword removal | Yes/ |
| Tokenization | |
| Inclusion of word/sentence embeddings | |
| Inclusion of non-text data | Yes/ |
| Statistical learning algorithm | Tree-based methods (e.g., boosting or random forests), |
| Additional human coding for observations with low predictive probability | Yes/ |
| Evaluation parameters | |
Our decisions for our example are in bold font.
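Several of the steps listed above (lowercasing, tokenization, stopword removal, and routing low-probability predictions to a human coder) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the stopword list is a deliberately tiny hypothetical one for German-language responses, the function names are invented for this example, and the 0.6 threshold is arbitrary.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real project would use a
# complete list for the survey language.
STOPWORDS = {"ich", "die", "der", "das", "den", "und", "zu", "an", "ist", "es"}


def preprocess(response, lowercase=True, remove_stopwords=True):
    """Turn one open-ended response into a list of word tokens."""
    text = response.lower() if lowercase else response
    tokens = re.findall(r"\w+", text)  # \w also matches umlauts and ß
    if remove_stopwords:
        tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    return tokens


def bag_of_words(tokens):
    """Count token occurrences; a simple input representation for a classifier."""
    return Counter(tokens)


def route(predicted_probs, threshold=0.6):
    """Accept the top category automatically or flag it for human coding.

    `predicted_probs` maps category -> predicted probability; `threshold`
    is an assumed cut-off, not a value taken from the paper.
    """
    label, prob = max(predicted_probs.items(), key=lambda kv: kv[1])
    return (label, "auto") if prob >= threshold else (label, "human")


tokens = preprocess("Interesse an den Themen")
print(tokens, bag_of_words(tokens))
print(route({"Interest": 0.55, "Curiosity": 0.40, "Fun": 0.05}))
print(route({"Incentive": 0.90, "Fun": 0.10}))
```

Keeping each step behind a flag (`lowercase`, `remove_stopwords`) makes it easy to compare the yes/no choices in the table on the same hand-coded subsample.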
Figure 3. Logistic regression of panel dropout on the most important participation reason as the independent variable, grouped into four broader categories: “Extrinsic reasons,” “Intrinsic reasons,” “Survey-related reasons,” and “Other reasons.” Reference category for reason: no reason given; reference category for gender: female; reference category for education: no formal education diploma. The AME for unit nonresponse is 0.4140 (SE 0.0183); it is not depicted because it lies outside the x-axis scale.
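The caption reports average marginal effects (AMEs). For a 0/1 regressor in a logistic regression, the AME is the mean, over all observations, of the difference between the predicted probability with the dummy set to 1 and with it set to 0, holding the other covariates at their observed values. The sketch below illustrates that calculation; the coefficients, variable names, and rows are made-up toy values, not estimates from the paper.

```python
import math


def logistic(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def ame_dummy(rows, coefs, intercept, dummy):
    """Average marginal effect of a 0/1 regressor in a logistic model.

    For each row, compare P(y=1 | dummy=1) with P(y=1 | dummy=0), keeping
    the other covariates at their observed values, then average.
    """
    diffs = []
    for row in rows:
        base = intercept + sum(coefs[k] * v for k, v in row.items() if k != dummy)
        diffs.append(logistic(base + coefs[dummy]) - logistic(base))
    return sum(diffs) / len(diffs)


# Hypothetical coefficients on the log-odds scale (illustration only).
coefs = {"intrinsic": -0.5, "male": 0.2}
intercept = -1.0

# Hypothetical observations: 1 = category applies, 0 = reference category.
rows = [
    {"intrinsic": 1, "male": 0},
    {"intrinsic": 0, "male": 1},
    {"intrinsic": 1, "male": 1},
    {"intrinsic": 0, "male": 0},
]

ame = ame_dummy(rows, coefs, intercept, "intrinsic")
print(f"AME of 'intrinsic' on dropout probability: {ame:.4f}")
```

Unlike a raw logit coefficient, the result is on the probability scale, which is why AMEs can be read directly as changes in dropout probability.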