Ana Carolina E S Lima, Leandro Nunes de Castro.
Abstract
Temperament and Psychological Types can be defined as innate psychological characteristics associated with how we relate to the world, and they often influence our study and career choices. Furthermore, understanding these features helps us manage conflicts, develop leadership, and improve teaching, among many other skills. Temperament and psychological types are usually assigned by having a person fill in specific questionnaires. However, temperamental characteristics can also be identified from a linguistic and behavioral analysis of a user's social media data. Thus, machine-learning algorithms can be used to learn from a user's social media data and infer his/her behavioral type. This paper first provides a brief historical review of theories of temperament and then surveys research aimed at predicting temperament and psychological types from social media data. It then proposes a framework to predict temperament and psychological types from a linguistic and behavioral analysis of Twitter data. The proposed framework infers temperament types following David Keirsey's model, and psychological types based on the MBTI model. Various data modelling techniques and classifiers are used. The results showed that Random Forests with the LIWC technique can predict the Artisan temperament with 96.46% accuracy, the Guardian temperament with 92.19%, the Idealist temperament with 78.68%, and the Rational temperament with 83.82%. The MBTI results also showed that Random Forests achieved the best performance, with an accuracy of 82.05% for the E/I pair, 88.38% for the S/N pair, 80.57% for the T/F pair, and 78.26% for the J/P pair.
Year: 2019 PMID: 30861015 PMCID: PMC6413941 DOI: 10.1371/journal.pone.0212844
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of the papers found related to temperament classification.
| Ref. | Algorithm | Features | Measure | I/E | N/S | T/F | J/P |
|---|---|---|---|---|---|---|---|
| [ | TiMBL | MBSP, n-gram, Lexical features | F-Measure | 65.38% | 61.81% | 49.09% | 51.67% |
| [ | NB, SVM | n-gram, LIWC | F-measure | I: 9.00% | N: 75.88% | F: 75.00% | J: 84.26% |
| [ | NB, logistic regression and SV classification | n-gram | Accuracy | 63.90% | 74.60% | 60.80% | 58.50% |
| [ | Logistic regression | n-gram | Accuracy | 72.50% | 77.40% | 61.20% | 55.40% |
| [ | LinearSVC | n-gram | F-Measure | 67.87% | 73.01% | 58.45% | 56.06% |
| [ | NB | n-gram, POS-tags | Accuracy | 80.00% | 60.00% | 60.00% | 60.00% |
Meta-attributes used in the TECLA framework.
| Name | Type | Description |
|---|---|---|
| A1 | Behavior | Total number of tweets posted by the user so far |
| A2 | Behavior | Number of followers |
| A3 | Behavior | Number of accounts followed |
| A4 | Behavior | Number of times the user was listed |
| A5 | Behavior | Number of times the user was favorited |
| A6 | Behavior | Gender |
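| A7 to A94 | Grammatical | Attributes extracted with LIWC (when LIWC is used) |
| A7 to A19 | Grammatical | Attributes extracted with MRC (when MRC is used) |
| A7 to A41 | Grammatical | Attributes extracted with sTagger (when sTagger is used) |
| A7 to A41 | Grammatical | Attributes extracted with oNLP (when oNLP is used) |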
Fig 1. MBTI classification scheme: four decomposing classifiers are trained.
Fig 2. Example of the classifier representation used in TECLA for the MBTI model.
Fig 3. Keirsey classification scheme: four binary classifiers are trained.
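The two schemes in the figure captions above can be read as label-encoding rules: the MBTI scheme splits a four-letter type into four binary targets (one per dichotomy), while the Keirsey scheme turns a single temperament into four yes/no targets (one-vs-rest). The sketch below illustrates that reading only. The function names and the choice of which letter counts as the positive class are assumptions made for illustration, not details taken from the paper.

```python
# Illustrative label encoding; names and positive-class choices are assumptions.
MBTI_PAIRS = ("EI", "SN", "TF", "JP")  # letter order inside an MBTI type
TEMPERAMENTS = ("Artisan", "Guardian", "Idealist", "Rational")


def mbti_targets(mbti_type):
    """Split a four-letter MBTI type into one binary target per dichotomy."""
    mbti_type = mbti_type.upper()
    if len(mbti_type) != 4:
        raise ValueError("expected a four-letter MBTI type")
    targets = {}
    for letter, pair in zip(mbti_type, MBTI_PAIRS):
        if letter not in pair:
            raise ValueError(f"{letter!r} is not valid for the {pair} pair")
        # Assumption: the first letter of each pair is the positive class (1).
        targets[f"{pair[0]}/{pair[1]}"] = int(letter == pair[0])
    return targets


def keirsey_targets(temperament):
    """One-vs-rest encoding: one yes/no target per temperament."""
    if temperament not in TEMPERAMENTS:
        raise ValueError(f"unknown temperament: {temperament!r}")
    return {name: int(name == temperament) for name in TEMPERAMENTS}


print(mbti_targets("INFJ"))
print(keirsey_targets("Idealist"))
```

Each target would then be learned by its own binary classifier, which is why both schemes end up with four models per user representation.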
Distribution of users for each MBTI type.
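| Attitude | Type | Users | Type | Users | Type | Users | Type | Users |
|---|---|---|---|---|---|---|---|---|
| I | ISTJ | 75 | ISFJ | 77 | INFJ | 257 | INTJ | 193 |
| I | ISTP | 22 | ISFP | 51 | INFP | 175 | INTP | 111 |
| E | ESTP | 15 | ESFP | 26 | ENFP | 148 | ENTP | 70 |
| E | ESTJ | 36 | ESFJ | 36 | ENFJ | 106 | ENTJ | 102 |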
Ratio between the various MBTI types of users.
| Pair (A/B) | A: users (%) | B: users (%) |
|---|---|---|
| E/I | 539 (35.93%) | 961 (64.07%) |
| N/S | 1162 (77.47%) | 338 (22.53%) |
| T/F | 624 (41.60%) | 876 (58.40%) |
| J/P | 882 (58.80%) | 618 (41.20%) |
| Female/Male | 939 (62.60%) | 561 (37.40%) |
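The pair totals above follow directly from the per-type distribution. As a quick arithmetic check, the sketch below sums the 16 type counts by letter. The dictionary restates the counts from the distribution table, and the helper name `letter_totals` is hypothetical.

```python
# Per-type user counts, restated from the MBTI distribution table.
TYPE_COUNTS = {
    "ISTJ": 75, "ISFJ": 77, "INFJ": 257, "INTJ": 193,
    "ISTP": 22, "ISFP": 51, "INFP": 175, "INTP": 111,
    "ESTP": 15, "ESFP": 26, "ENFP": 148, "ENTP": 70,
    "ESTJ": 36, "ESFJ": 36, "ENFJ": 106, "ENTJ": 102,
}


def letter_totals(type_counts):
    """Add each type's count to every letter that appears in the type."""
    totals = {}
    for mbti_type, count in type_counts.items():
        for letter in mbti_type:
            totals[letter] = totals.get(letter, 0) + count
    return totals


totals = letter_totals(TYPE_COUNTS)
n_users = sum(TYPE_COUNTS.values())

for first, second in ("EI", "NS", "TF", "JP"):
    share_first = 100 * totals[first] / n_users
    share_second = 100 * totals[second] / n_users
    print(f"{first}/{second}: {totals[first]} ({share_first:.2f}%) "
          f"vs {totals[second]} ({share_second:.2f}%)")
```

The imbalance this exposes (for example, far more N than S users) matters when reading the later accuracy figures, since a skewed pair is easier to score well on by favoring the majority class.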
Proportion of users by temperament in the dataset collected.
| Temperament | Count | Percentage |
|---|---|---|
| Guardian (ISTJ, ISFJ, ESTJ, ESFJ) | 224 | 14.93% |
| Artisan (ISTP, ISFP, ESTP, ESFP) | 114 | 7.60% |
| Idealist (INFJ, INFP, ENFJ, ENFP) | 686 | 45.73% |
| Rational (INTJ, INTP, ENTJ, ENTP) | 476 | 31.73% |
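The temperament counts can be cross-checked against the per-type distribution. The sketch below assumes Keirsey's usual grouping of MBTI types (SJ as Guardian, SP as Artisan, NF as Idealist, NT as Rational) and restates the per-type counts from the distribution table. The function name `temperament` is hypothetical.

```python
# Per-type user counts, restated from the MBTI distribution table.
TYPE_COUNTS = {
    "ISTJ": 75, "ISFJ": 77, "INFJ": 257, "INTJ": 193,
    "ISTP": 22, "ISFP": 51, "INFP": 175, "INTP": 111,
    "ESTP": 15, "ESFP": 26, "ENFP": 148, "ENTP": 70,
    "ESTJ": 36, "ESFJ": 36, "ENFJ": 106, "ENTJ": 102,
}


def temperament(mbti_type):
    """Map an MBTI type to a Keirsey temperament (assumed SJ/SP/NF/NT grouping)."""
    _, perceiving_fn, judging_fn, lifestyle = mbti_type.upper()
    if perceiving_fn == "S":
        # Sensing types are split by the J/P letter.
        return "Guardian" if lifestyle == "J" else "Artisan"
    # Intuitive types are split by the T/F letter.
    return "Idealist" if judging_fn == "F" else "Rational"


temperament_counts = {}
for mbti_type, count in TYPE_COUNTS.items():
    name = temperament(mbti_type)
    temperament_counts[name] = temperament_counts.get(name, 0) + count

n_users = sum(TYPE_COUNTS.values())
for name, count in temperament_counts.items():
    print(f"{name}: {count} ({100 * count / n_users:.2f}%)")
```

Under that grouping the four sensing-judging types sum to 224 users and the four sensing-perceiving types to 114, which agrees with the Guardian and Artisan counts reported in the table.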
Average (mode) value for each attribute extracted by Plank.
| Myers-Briggs | Total | Avg. Followers | Avg. Statuses | Avg. Favorites | Avg. Listed | Gender (F/M) |
|---|---|---|---|---|---|---|
| Extroversion | 539 | 1549.59 | 14587.85 | 4185.09 | 44.32 | 325/214 |
| Introversion | 961 | 1694.10 | 17279.32 | 4928.68 | 30.66 | 614/347 |
| Sensing | 338 | 2851.14 | 18976.33 | 5312.95 | 51.95 | 229/109 |
| Intuition | 1162 | 1290.51 | 15537.25 | 4471.99 | 30.80 | 710/452 |
| Thinking | 624 | 1529.57 | 15959.86 | 4157.91 | 25.99 | 340/284 |
| Feeling | 876 | 1722.38 | 16563.16 | 5020.20 | 42.39 | 599/277 |
| Judging | 882 | 2034.39 | 15359.27 | 4150.58 | 40.52 | 564/318 |
| Perceiving | 618 | 1082.41 | 17672.17 | 5390.64 | 28.50 | 375/243 |
| Keirsey | Total | Avg. Followers | Avg. Statuses | Avg. Favorites | Avg. Listed | Gender (F/M) |
| Guardian | 224 | 3924.24 | 18501.26 | 4864.42 | 68.96 | 153/71 |
| Artisan | 114 | 742.60 | 19909.79 | 6194.29 | 18.53 | 76/38 |
| Idealist | 686 | 1201.93 | 15959.14 | 4720.33 | 33.49 | 459/227 |
| Rational | 476 | 1418.18 | 14929.22 | 4114.08 | 26.93 | 251/225 |
Accuracy (ACC), F-measure (F) and AUC for Twitter with 5 features.
| Algorithm | ACC | F-measure (No) | F-measure (Yes) | AUC |
|---|---|---|---|---|
| AdaBoost | 75.46%±0.38% | 83.86%±0.34% | 9.94%±9.94% | 58.32±0.64 |
| Bagging | 81.69%±0.31% | 86.85%±0.34% | 40.10%±40.10% | 77.14±0.52 |
| J48 | 76.66%±0.38% | 84.62%±0.50% | 22.35%±22.35% | 54.75±0.57 |
| NaiveBayes | 48.67%±2.06% | 47.78%±2.04% | 31.67%±31.67% | 51.89±0.16 |
| RandomForest | 86.64%±0.26% | 90.05%±0.26% | 68.51%±68.51% | 83.81±0.91 |
| SVM | 74.88%±0.32% | 81.07%±1.31% | 12.88%±12.88% | 49.98±0.02 |
Accuracy (ACC), F-measure (F) and AUC for MRC with 9 features.
| Algorithm | ACC | F-measure (No) | F-measure (Yes) | AUC |
|---|---|---|---|---|
| AdaBoost | 75.41%±0.50% | 83.67%±0.58% | 10.33%±2.51% | 61.72±1.04 |
| Bagging | 83.02%±0.35% | 87.78%±0.37% | 44.79%±1.27% | 80.58±0.2 |
| J48 | 77.34%±0.83% | 84.95%±0.59% | 28.69%±2.67% | 59.70±0.79 |
| NaiveBayes | 72.69%±0.11% | 77.29%±0.34% | 22.94%±0.63% | 57.52±0.24 |
| RandomForest | 87.48%±0.25% | 90.74%±0.28% | 69.81%±0.86% | 86.26±0.35 |
| SVM | 75.00%±0.00% | 84.87%±0.00% | 0.00%±0.00% | 50.00±0.00 |
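The SVM row is a useful reminder that accuracy alone can hide a degenerate classifier. As a worked example, the sketch below scores a hypothetical classifier that always answers "No" in each of the four one-vs-rest temperament tasks and averages the results. It assumes that the per-algorithm tables report averages over the four temperament classifiers, which is an inference from the numbers rather than something stated in the captions. The temperament counts are restated from the dataset table.

```python
# Users per temperament, restated from the dataset table.
TEMPERAMENT_COUNTS = {"Guardian": 224, "Artisan": 114, "Idealist": 686, "Rational": 476}


def always_no_scores(n_yes, n_total):
    """Accuracy and per-class F-measure of a classifier that always says 'No'."""
    n_no = n_total - n_yes
    accuracy = n_no / n_total
    precision_no = n_no / n_total  # every user is predicted 'No'
    recall_no = 1.0                # so every true 'No' is recovered
    f_no = 2 * precision_no * recall_no / (precision_no + recall_no)
    f_yes = 0.0                    # no true positives; scored as 0 by convention
    return accuracy, f_no, f_yes


n_total = sum(TEMPERAMENT_COUNTS.values())
scores = [always_no_scores(n_yes, n_total) for n_yes in TEMPERAMENT_COUNTS.values()]

mean_acc = sum(s[0] for s in scores) / len(scores)
mean_f_no = sum(s[1] for s in scores) / len(scores)
mean_f_yes = sum(s[2] for s in scores) / len(scores)

print(f"ACC {mean_acc:.2%}, F(No) {mean_f_no:.2%}, F(Yes) {mean_f_yes:.2%}")
```

Run as written, this prints an accuracy of 75.00% and an F-measure (No) of 84.87%, the same values as the SVM row above. That is consistent with a classifier that never predicts "Yes", although the table alone cannot confirm it, and it is the reason F-measure (Yes) and AUC are the more informative columns on this imbalanced data.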
Accuracy (ACC), F-measure (F) and AUC for LIWC with 25 features.
| Algorithm | ACC | F-measure (No) | F-measure (Yes) | AUC |
|---|---|---|---|---|
| AdaBoost | 75.75%±0.27% | 83.46%±0.87% | 17.23%±1.33% | 86.83±2.59 |
| Bagging | 84.38%±0.18% | 88.78%±0.26% | 49.76%±1.65% | 84.06±1.53 |
| J48 | 83.71%±0.47% | 87.75%±0.67% | 61.21%±1.08% | 87.82±0.82 |
| NaiveBayes | 67.49%±0.13% | 75.87%±0.09% | 35.90%±0.48% | 87.54±0.85 |
| RandomForest | 87.91%±0.13% | 91.14%±0.13% | 70.52%±0.81% | 86.83±2.59 |
| SVM | 76.21%±0.38% | 83.95%±0.35% | 11.72%±0.46% | 84.06±1.53 |
Accuracy (ACC), F-measure (F) and AUC for oNLP with 24 features.
| Algorithm | ACC | F-measure (No) | F-measure (Yes) | AUC |
|---|---|---|---|---|
| AdaBoost | 75.36%±0.46% | 82.34%±0.94% | 12.69%±2.84% | 59.48±0.11 |
| Bagging | 83.54%±0.46% | 88.30%±0.47% | 44.23%±1.06% | 80.71±0.63 |
| J48 | 83.26%±0.18% | 87.40%±0.29% | 61.09%±1.75% | 76.28±0.72 |
| NaiveBayes | 71.31%±0.07% | 78.18%±0.14% | 27.52%±0.74% | 59.35±0.27 |
| RandomForest | 87.60%±0.33% | 90.95%±0.31% | 69.68%±0.63% | 86.12±0.76 |
| SVM | 75.60%±0.30% | 83.87%±0.27% | 9.73%±0.68% | 51.49±0.41 |
Accuracy (ACC), F-measure (F) and AUC for the Random Forest.
| Feature set | Measure | Artisan | Guardian | Idealist | Rational |
|---|---|---|---|---|---|
| Twitter (5 attributes) | ACC | 95.53%±0.44% | 90.63%±0.53% | 77.27%±0.59% | 80.27%±0.70% |
| Twitter (5 attributes) | F-measure (No) | 97.60%±0.24% | 94.66%±0.30% | 79.60%±0.49% | 86.15%±0.50% |
| Twitter (5 attributes) | F-measure (Yes) | 67.32%±3.05% | 61.71%±2.39% | 74.32%±0.81% | 65.72%±1.45% |
| Twitter (5 attributes) | AUC | 84.08±2.55 | 82.45±1.13 | 85.07±0.53 | 83.65±0.72 |
| MRC (9 attributes) | ACC | 92.92%±0.30% | 87.33%±0.46% | 73.09%±0.91% | 77.73%±1.18% |
| MRC (9 attributes) | F-measure (No) | 98.09%±0.22% | 95.42%±0.24% | 80.07%±0.74% | 88.00%±0.62% |
| MRC (9 attributes) | F-measure (Yes) | 72.54%±3.64% | 63.60%±2.42% | 74.00%±1.19% | 67.05%±2.20% |
| MRC (9 attributes) | AUC | 89.97±1.63 | 83.80±1.27 | 86.33±0.85 | 84.93±0.95 |
| LIWC (25 attributes) | ACC | 96.46%±0.27% | 92.19%±0.44% | 78.68%±0.61% | 83.82%±0.70% |
| LIWC (25 attributes) | F-measure (No) | 98.11%±0.14% | 95.61%±0.24% | 81.47%±0.50% | 89.04%±0.45% |
| LIWC (25 attributes) | F-measure (Yes) | 72.54%±2.85% | 64.54%±2.75% | 74.89%±0.87% | 69.13%±1.58% |
| LIWC (25 attributes) | AUC | 86.83±2.59 | 84.06±1.53 | 87.82±0.82 | 87.54±0.85 |
| oNLP (24 attributes) | ACC | 96.40%±0.20% | 92.01%±0.50% | 78.29%±1.00% | 82.83%±0.90% |
| oNLP (24 attributes) | F-measure (No) | 98.08%±0.13% | 95.51%±0.25% | 81.09%±1.01% | 88.42%±0.61% |
| oNLP (24 attributes) | F-measure (Yes) | 71.46%±2.29% | 63.73%±2.57% | 74.50%±1.53% | 66.75%±1.82% |
| oNLP (24 attributes) | AUC | 86.94±0.52 | 87.16±1.02 | 87.03±0.84 | 86.94±1.17 |
Accuracy (ACC), F-measure (F) and AUC for Twitter with 5 features in the MBTI prediction.
| Algorithm | ACC | F-measure (No) | F-measure (Yes) | AUC |
|---|---|---|---|---|
| AdaBoost | 65.46%±0.24% | 38.81%±2.94% | 56.00%±1.61% | 58.14±0.49 |
| Bagging | 75.12%±0.17% | 65.97%±1.03% | 71.97%±0.34% | 77.45±0.36 |
| J48 | 66.66%±0.26% | 47.52%±1.85% | 60.34%±2.44% | 58.53±0.45 |
| NaiveBayes | 59.98%±0.24% | 52.06%±0.39% | 31.39%±1.52% | 51.11±0.07 |
| RandomForest | 81.54%±0.09% | 78.71%±0.80% | 79.29%±0.23% | 84.81±0.20 |
| SVM | 65.73%±0.00% | 37.01%±0.00% | 53.29%±0.00% | 49.99±0.03 |
Accuracy (ACC), F-measure (F) and AUC for MRC with 16 features in MBTI prediction.
| Algorithm | ACC | F-measure (No) | F-measure (Yes) | AUC |
|---|---|---|---|---|
| AdaBoost | 65.75%±0.25% | 45.43%±1.07% | 53.56%±1.60% | 58.10±0.16 |
| Bagging | 77.51%±0.13% | 70.02%±1.15% | 74.46%±0.49% | 81.26±0.46 |
| J48 | 69.93%±0.61% | 59.03%±2.11% | 65.66%±2.98% | 62.76±0.81 |
| NaiveBayes | 63.47%±0.11% | 49.62%±0.12% | 54.39%±0.29% | 56.18±0.17 |
| RandomForest | 81.83%±0.09% | 78.80%±0.62% | 79.13%±0.37% | 87.06±0.25 |
| SVM | 64.72%±0.00% | 38.15%±0.00% | 40.20%±0.00% | 50.00±0.00 |
Accuracy (ACC), F-measure (F) and AUC for LIWC with 27 features in MBTI prediction.
| Algorithm | ACC | F-measure (No) | F-measure (Yes) | AUC |
|---|---|---|---|---|
| AdaBoost | 65.73%±0.28% | 47.94%±0.52% | 56.47%±2.03% | 59.70±0.33 |
| Bagging | 78.45%±0.22% | 70.97%±0.77% | 75.39%±0.70% | 82.85±0.43 |
| J48 | 77.71%±0.52% | 72.33%±2.00% | 76.73%±0.66% | 77.04±0.43 |
| NaiveBayes | 61.32%±0.08% | 51.82%±0.11% | 54.65%±0.37% | 58.80±0.15 |
| RandomForest | 82.58%±0.08% | 79.61%±0.61% | 79.92%±0.39% | 87.79±0.56 |
| SVM | 64.83%±0.04% | 38.53%±0.43% | 41.42%±0.53% | 50.06±0.12 |
Accuracy (ACC), F-measure (F) and AUC for oNLP with 22 features in MBTI prediction.
| Algorithm | ACC | F-measure (No) | F-measure (Yes) | AUC |
|---|---|---|---|---|
| AdaBoost | 65.02%±0.16% | 37.99%±1.41% | 58.11%±1.51% | 57.08±0.46 |
| Bagging | 77.73%±0.22% | 69.60%±0.70% | 74.74%±0.58% | 80.95±0.40 |
| J48 | 78.09%±0.36% | 73.93%±0.80% | 76.85%±0.10% | 76.70±0.19 |
| NaiveBayes | 60.69%±0.13% | 51.02%±0.28% | 58.09%±0.40% | 55.11±0.42 |
| RandomForest | 82.15%±0.14% | 79.08%±0.82% | 79.56%±0.27% | 87.02±0.28 |
| SVM | 64.75%±0.15% | 38.15%±0.21% | 43.72%±0.60% | 49.95±0.13 |
Accuracy (ACC), F-measure (F) and AUC for the Random Forest in MBTI prediction.
| Feature set | Measure | E/I | S/N | T/F | J/P |
|---|---|---|---|---|---|
| Twitter (5 attributes) | ACC | 80.82%±0.74% | 87.65%±0.96% | 79.77%±0.79% | 77.93%±0.93% |
| Twitter (5 attributes) | F-measure (No) | 85.85%±0.56% | 71.91%±2.58% | 83.52%±0.66% | 73.54%±1.28% |
| Twitter (5 attributes) | F-measure (Yes) | 70.23%±1.10% | 92.09%±0.59% | 73.80%±1.14% | 81.06%±0.75% |
| Twitter (5 attributes) | AUC | 85.22±0.81 | 85.33±0.85 | 85.09±0.78 | 83.62±1.21 |
| MRC (16 attributes) | ACC | 81.39%±0.74% | 87.32%±0.66% | 78.74%±0.66% | 77.89%±1.50% |
| MRC (16 attributes) | F-measure (No) | 86.59%±0.56% | 70.77%±1.84% | 82.95%±0.52% | 72.98%±1.68% |
| MRC (16 attributes) | F-measure (Yes) | 69.56%±1.14% | 91.90%±0.40% | 71.78%±0.98% | 81.29%±1.36% |
| MRC (16 attributes) | AUC | 87.85±0.59 | 86.96±0.93 | 86.79±0.38 | 86.64±0.44 |
| LIWC (27 attributes) | ACC | 81.89%±0.66% | 88.17%±1.00% | 80.57%±0.80% | 78.26%±0.79% |
| LIWC (27 attributes) | F-measure (No) | 87.03%±0.49% | 71.54%±2.43% | 84.49%±0.63% | 73.66%±1.06% |
| LIWC (27 attributes) | F-measure (Yes) | 70.04%±1.05% | 92.54%±0.63% | 74.01%±1.15% | 81.49%±0.66% |
| LIWC (27 attributes) | AUC | 87.86±0.43 | 87.35±1.69 | 87.94±0.74 | 88.02±0.65 |
| oNLP (22 attributes) | ACC | 82.05%±0.65% | 88.38%±0.68% | 80.01%±0.89% | 77.89%±1.03% |
| oNLP (22 attributes) | F-measure (No) | 87.12%±0.44% | 72.13%±1.94% | 84.07%±0.65% | 73.22%±1.16% |
| oNLP (22 attributes) | F-measure (Yes) | 70.38%±1.26% | 92.66%±0.41% | 73.15%±1.44% | 81.17%±0.94% |
| oNLP (22 attributes) | AUC | 86.94±0.52 | 87.16±1.02 | 87.03±0.84 | 86.94±1.17 |
Comparison with MBTI and Keirsey results from the literature.
| Ref. | Algorithm | Features | Measure | I/E | N/S | T/F | J/P |
|---|---|---|---|---|---|---|---|
| [ | TiMBL | MBSP, n-gram, Lexical features | F-Measure | 65.38% | 61.81% | 49.09% | 51.67% |
| [ | NB, SVM | n-gram, LIWC | F-measure | I: | N: 75.88% | F: 75.00% | J: |
| [ | NB, logistic regression and SV classification | n-gram | Accuracy | 63.90% | 74.60% | 60.80% | 58.50% |
| [ | Logistic regression | n-gram | Accuracy | 72.50% | 77.40% | 61.20% | 55.40% |
| [ | LinearSVC | n-gram | F-Measure | 67.87% | 73.01% | 58.45% | 56.06% |
| [ | NB | n-gram, POS-tags | Accuracy | 80.00% | 60.00% | 60.00% | 60.00% |
| TECLA (MBTI) | Random Forest | LIWC, oNLP | Accuracy | 82.05% | 88.38% | 80.57% | 78.26% |
| TECLA (MBTI) | Random Forest | LIWC, oNLP | F-measure | I: 87.2% | N: 92.66% | F: 84.49% | J: 81.49% |

| Ref. | Algorithm | Features | Measure | Artisan | Guardian | Idealist | Rational |
|---|---|---|---|---|---|---|---|
| Lima and de Castro (2016) | SVM, KNN(2) | LIWC | Accuracy | 87.67% | 83.56% | 60.27% | 58.90% |
| TECLA (Keirsey) | Random Forest | LIWC | Accuracy | 96.46% | 92.19% | 78.68% | 83.82% |