Pavel Kiselev, Valeriya Matsuta, Artem Feshchenko, Irina Bogdanovskaya, Boris Kiselev.
Abstract
Predicting personality traits from social networking site profiles can help to assess individual differences in verbal reasoning without long questionnaires. Inspired by earlier studies that investigated whether abstract-thinking ability is predictable from social networking site data, we used supervised machine learning to predict verbal-reasoning ability from a proposed set of features extracted from virtual community membership. A large sample (N = 3,646) of Russian young adults aged 18-22 years granted access to the data from their social networking accounts and completed an online test of verbal reasoning. We experimented with binary classification machine-learning models for verbal-reasoning prediction. Prediction performance was tested on isolated control subsamples for men and women. AUC-ROC values over 0.7 on the control subsamples indicated reasonably good performance in predicting verbal-reasoning level. We also investigated the contribution of virtual community genres to the prediction of verbal-reasoning level for male and female participants. Theoretical interpretations of the results, drawing on both Vygotsky's sociocultural theory and behavioural genomics, are discussed, including the implication that virtual communities make up a non-shared environment that can cause variance in verbal reasoning. We intend to conduct further studies to explore the implications of the results.
Keywords: Machine learning; Social networking site; Verbal reasoning; Virtual community
Year: 2022 PMID: 35721677 PMCID: PMC9198326 DOI: 10.1016/j.heliyon.2022.e09664
Source DB: PubMed Journal: Heliyon ISSN: 2405-8440
Subsample size for sex and verbal reasoning.
| Percentile | Development: men, n (% of development men) | Development: women, n (% of development women) | Control: men, n (% of control men) | Control: women, n (% of control women) | Total: men, development + control (% of all men) | Total: women, development + control (% of all women) |
|---|---|---|---|---|---|---|
| ≤75th | 973 (76.9%) | 1,448 (71.8%) | 108 (77.1%) | 161 (71.9%) | 1,081 (76.9%) | 1,609 (71.8%) |
| >75th | 292 (23.1%) | 569 (28.2%) | 32 (22.9%) | 63 (28.1%) | 324 (23.1%) | 632 (28.2%) |
| Total | 1,265 | 2,017 | 140 | 224 | 1,405 | 2,241 |
Descriptive statistics for verbal-reasoning ability.
| Sex | Mean | n | SD | Min | Max | 75th percentile |
|---|---|---|---|---|---|---|
| Men | 7.62 | 1,405 | 3.38 | 0 | 14 | 10 |
| Women | 8.15 | 2,241 | 3.24 | 0 | 14 | 10 |
Note. SD, standard deviation; Min, minimum; Max, maximum.
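The ≤75th / >75th split in the subsample table, together with the 75th-percentile score of 10 reported here, suggests how the binary target could be derived from raw test scores. The sketch below is an illustration under that assumption: the strict `>` comparison follows the table labels, while the function name and the example scores are hypothetical and not taken from the study.

```python
def label_high_scorers(scores, cutoff):
    """Return 1 for scores strictly above the cutoff (">75th"), else 0."""
    return [1 if s > cutoff else 0 for s in scores]


# Hypothetical scores on the 0-14 scale; 10 is the reported 75th percentile
# for both men and women.
example_scores = [3, 7, 10, 11, 14, 9, 12, 5]
labels = label_high_scorers(example_scores, cutoff=10)
positive_share = sum(labels) / len(labels)

print(labels, positive_share)  # [0, 0, 0, 1, 1, 0, 1, 0] 0.375
```

Because a score equal to the cutoff falls in the lower class, the positive class ends up smaller than 25% or larger depending on ties, which is consistent with the 23.1% and 28.2% shares reported for men and women.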
Mean, standard deviation, corrected item-total correlation, and Cronbach's alpha of the verbal-reasoning test.
| Item | Mean (men) | Mean (women) | SD (men) | SD (women) | Item-total correlation (men) | Item-total correlation (women) | Cronbach's alpha if item deleted (men) | Cronbach's alpha if item deleted (women) |
|---|---|---|---|---|---|---|---|---|
| Q_1 | 0.63 | 0.70 | 0.484 | 0.459 | 0.334 | 0.342 | 0.782 | 0.767 |
| Q_2 | 0.50 | 0.50 | 0.500 | 0.500 | 0.449 | 0.466 | 0.772 | 0.755 |
| Q_3 | 0.70 | 0.79 | 0.459 | 0.408 | 0.295 | 0.236 | 0.785 | 0.775 |
| Q_4 | 0.73 | 0.78 | 0.446 | 0.416 | 0.510 | 0.476 | 0.768 | 0.755 |
| Q_5 | 0.51 | 0.54 | 0.500 | 0.499 | 0.346 | 0.395 | 0.782 | 0.762 |
| Q_6 | 0.68 | 0.71 | 0.468 | 0.456 | 0.395 | 0.452 | 0.777 | 0.757 |
| Q_7 | 0.28 | 0.34 | 0.447 | 0.475 | 0.307 | 0.303 | 0.784 | 0.771 |
| Q_8 | 0.53 | 0.53 | 0.500 | 0.499 | 0.572 | 0.522 | 0.761 | 0.749 |
| Q_9 | 0.61 | 0.65 | 0.489 | 0.476 | 0.404 | 0.360 | 0.776 | 0.765 |
| Q_10 | 0.58 | 0.65 | 0.493 | 0.477 | 0.473 | 0.458 | 0.770 | 0.756 |
| Q_11 | 0.67 | 0.70 | 0.471 | 0.457 | 0.496 | 0.443 | 0.768 | 0.758 |
| Q_12 | 0.30 | 0.32 | 0.458 | 0.466 | 0.438 | 0.403 | 0.773 | 0.761 |
| Q_13 | 0.17 | 0.18 | 0.380 | 0.387 | 0.324 | 0.255 | 0.782 | 0.773 |
| Q_14 | 0.74 | 0.77 | 0.438 | 0.423 | 0.319 | 0.318 | 0.783 | 0.769 |
Note. SD, standard deviation.
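The reliability columns above follow standard definitions, which can be sketched in a few lines. This is a minimal illustration, assuming items are scored 0/1 (the reported SDs are consistent with that) and using a small made-up response matrix; the function names are hypothetical, and this is not the authors' analysis code.

```python
from statistics import mean, pvariance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def item_diagnostics(rows):
    """Per item: (corrected item-total correlation, alpha if item deleted)."""
    k = len(rows[0])
    results = []
    for j in range(k):
        item = [r[j] for r in rows]
        rest = [[v for i, v in enumerate(r) if i != j] for r in rows]
        rest_total = [sum(r) for r in rest]  # total score without item j
        results.append((pearson(item, rest_total), cronbach_alpha(rest)))
    return results


# Made-up 0/1 responses: 6 respondents x 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
alpha = cronbach_alpha(responses)
diagnostics = item_diagnostics(responses)
print(round(alpha, 3))  # 0.833
```

The "corrected" correlation excludes the item from the total it is correlated with, so an item cannot inflate its own coefficient.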
Best classifier parameters for the development subsample of Russian young adults, selected by five-fold cross-validation, for measuring verbal-reasoning ability.
| Classifier | Parameter | Men | Women |
|---|---|---|---|
| CatBoost | Depth | 5 | 4 |
| CatBoost | Learning rate | 0.17 | 0.1 |
| CatBoost | Number of estimators | 25 | 100 |
| CatBoost | Scale positive weight | 4 | 3 |
| CatBoost | Positive index cut-off | 0.5 | 0.3 |
| CatBoost | Negative index cut-off | –0.7 | –0.7 |
| Decision tree | Max depth | 5 | 5 |
| Decision tree | Class weight | 4 | 3 |
| Decision tree | Positive index cut-off | 0.5 | 0.3 |
| Decision tree | Negative index cut-off | –0.7 | –0.7 |
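For readers who want to reproduce a comparable setup, the tuned values above map onto constructor arguments of the `catboost` and `scikit-learn` packages roughly as follows. This is a sketch under assumptions: it presumes those two libraries were used, it reads "Class weight" as the weight of the positive class, and it leaves out the positive and negative index cut-offs because this record does not define how they enter the feature pipeline. The dictionary and function names are hypothetical.

```python
# Tuned values copied from the table; everything else is an assumption.
BEST_PARAMS = {
    "men": {"depth": 5, "learning_rate": 0.17,
            "n_estimators": 25, "scale_pos_weight": 4},
    "women": {"depth": 4, "learning_rate": 0.1,
              "n_estimators": 100, "scale_pos_weight": 3},
}
TREE_POSITIVE_WEIGHT = {"men": 4, "women": 3}


def make_catboost(sex):
    """Build an unfitted CatBoostClassifier for 'men' or 'women'."""
    # Imported lazily so the module loads without catboost installed.
    from catboost import CatBoostClassifier
    return CatBoostClassifier(**BEST_PARAMS[sex], verbose=0)


def make_decision_tree(sex):
    """Build an unfitted decision tree with max depth 5."""
    from sklearn.tree import DecisionTreeClassifier
    # Assumption: the table's "Class weight" applies to the positive class.
    weights = {0: 1, 1: TREE_POSITIVE_WEIGHT[sex]}
    return DecisionTreeClassifier(max_depth=5, class_weight=weights)
```

Both weighting options serve the same purpose: they compensate for the roughly 1:3 ratio of high scorers to the rest of the sample.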
Classifier AUC-ROC Metrics on Development (average for five runs) and Control Subsamples.
| Classifier | Development, men (mean ± SD) | Development, women (mean ± SD) | Control, men | Control, women |
|---|---|---|---|---|
| CatBoost | 0.72 ± 0.05 | 0.72 ± 0.03 | 0.72 | 0.74 |
| Decision tree | 0.68 ± 0.04 | 0.69 ± 0.04 | 0.72 | 0.69 |
Note. SD, standard deviation.
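As a reminder of what these values measure, AUC-ROC equals the probability that a randomly chosen high scorer receives a higher predicted score than a randomly chosen participant from the other class. The sketch below computes it directly from that definition on made-up labels and scores; it is an illustration, not the evaluation code used in the study.

```python
def roc_auc(y_true, y_score):
    """AUC as the share of positive-negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted scores.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
auc = roc_auc(y_true, y_score)
print(round(auc, 3))  # 0.889
```

A value of 0.5 corresponds to random ranking, so the reported values of about 0.7 mean that roughly seven in ten such pairs are ordered correctly.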
Classifier F1 Metrics on Development (average for five runs) and Control Subsamples.
| Classifier | Development, men (mean ± SD) | Development, women (mean ± SD) | Control, men | Control, women |
|---|---|---|---|---|
| CatBoost | 0.48 ± 0.07 | 0.52 ± 0.03 | 0.51 | 0.55 |
| Decision tree | 0.45 ± 0.06 | 0.51 ± 0.01 | 0.51 | 0.51 |
Note. SD, standard deviation.
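F1 values depend strongly on class balance, so a simple reference point helps when reading this table. The sketch below computes F1 from confusion-matrix counts and, as an illustrative baseline that is not part of the paper, the F1 of a classifier that labels every control participant as a high scorer, using the control subsample sizes reported above (32 of 140 men and 63 of 224 women above the 75th percentile). The function name and the first set of counts are hypothetical.

```python
def f1_from_counts(tp, fp, fn):
    """F1 score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, only to show the arithmetic.
example_f1 = f1_from_counts(tp=6, fp=4, fn=2)

# "Predict everyone as positive" baseline on the control subsamples.
baseline_men = f1_from_counts(tp=32, fp=108, fn=0)
baseline_women = f1_from_counts(tp=63, fp=161, fn=0)

print(round(example_f1, 3))      # 0.667
print(round(baseline_men, 3))    # 0.372
print(round(baseline_women, 3))  # 0.439
```

Against these baselines, the control-subsample F1 values of 0.51 to 0.55 in the table represent a gain of roughly 0.07 to 0.14.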
Figure 1. Confusion matrix for the male control subsample.
Figure 2. Confusion matrix for the female control subsample.
Figure 3. Positive index via genre indexes for male participants.
Figure 4. Positive index via genre indexes for female participants.