Lei Cai, Xiaoqian Liu.
Abstract
Personality assessment is in high demand in many fields and is becoming increasingly important in practice. In recent years, with the rapid development of machine learning, research that integrates machine learning and psychology has become a new trend. In particular, automatic personality identification based on facial analysis has become a leading research direction for large-scale personality identification. This study proposes a method that automatically identifies the Big Five personality traits by analyzing facial movement in ordinary videos. We collected data from a total of 82 participants. First, a correlation analysis between facial features and personality scores showed that the key points running from the right jawline to the chin contour had a significant negative correlation with agreeableness. In addition, the movement of the outer contour points of the left cheek was significantly higher in the high-openness group than in the low-openness group. We then used several machine learning algorithms to build identification models from 70 facial key points. Among them, CatBoost regression performed best on all five dimensions, with moderate correlations (0.37-0.42) between the model predictions and the questionnaire (scale) scores. We also performed a split-half reliability test, which showed that the method reached a high reliability standard (0.75-0.96). These results further support the feasibility and effectiveness of automatically assessing Big Five personality traits from an individual's facial video.
Keywords: Big Five; facial key point; machine learning; noninvasive identification; personality trait identification
Year: 2022 PMID: 36211657 PMCID: PMC9533697 DOI: 10.3389/fpubh.2022.1001828
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Demographic information of subjects.
| Characteristic | Category | N | Percentage |
|---|---|---|---|
| Gender | Female | 55 | 67.07% |
| | Male | 27 | 32.93% |
| Marital status | Single | 48 | 58.54% |
| | Partnered | 30 | 36.59% |
| | Married | 2 | 2.44% |
| | Divorced | 2 | 2.44% |
| Education | Senior high school or technical secondary school graduate | 2 | 2.44% |
| | College student | 33 | 40.24% |
| | University or vocational college graduate | 21 | 25.61% |
| | Graduate student | 20 | 24.39% |
| | Master's/PhD | 6 | 7.32% |
Mean = 22.41, Variance = 17.58.
Figure 1. Score distribution of the BFI-44 on the five dimensions.
Figure 2. The 70 key points on the face.
Correlation analysis of facial key point movement and personality score.
| Feature | Correlation with agreeableness (r) |
|---|---|
| P4_X_VAR | −0.231 |
| P5_X_VAR | −0.219 |
| P6_X_VAR | −0.222 |
| P7_X_VAR | −0.246 |
| P8_X_VAR | −0.246 |
| P9_X_VAR | −0.233 |
| P9_Y_VAR | −0.226 |
| P10_X_VAR | −0.221 |
p < 0.05 for all listed correlations.
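The correlation analysis above can be sketched as follows. This is a minimal illustration under assumptions: each feature such as P4_X_VAR is taken to be the variance of one key point's x-coordinate across the video frames (as the VAR suffix and the Variance entry in the time-series feature table suggest), Pearson's r is assumed, and the helper names `keypoint_variance`, `pearson_r` and `r_to_t` are hypothetical. The `r_to_t` step shows why even the weakest listed value can pass p < 0.05, assuming all 82 participants enter the correlation.

```python
import math
from statistics import mean, pvariance


def keypoint_variance(series):
    """Variance of one key point coordinate across frames (hypothetical P<i>_X_VAR)."""
    # Population variance. If every trace has the same number of frames, the
    # n vs n-1 choice only rescales the feature and leaves Pearson's r unchanged.
    return pvariance(series)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


if __name__ == "__main__":
    # Toy data: 4 participants, each with a short x-coordinate trace.
    traces = [[0.1, 0.3, 0.2], [0.1, 0.1, 0.1], [0.0, 0.5, 0.2], [0.2, 0.25, 0.2]]
    agreeableness = [30, 38, 27, 35]
    feats = [keypoint_variance(t) for t in traces]
    print(pearson_r(feats, agreeableness))
    # Weakest correlation in the table, assuming n = 82.
    print(r_to_t(-0.219, 82))  # about -2.01; |t| exceeds about 1.99 (df = 80)
```

With 80 degrees of freedom the two-tailed 5% critical value of t is about 1.99, so r = −0.219 sits just past the threshold, which is consistent with the footnote.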
High-low grouping t-test for openness.
| Feature | High group M | High group Var | Low group M | Low group Var | t | p |
|---|---|---|---|---|---|---|
| P11_Y_VAR | 0.149 | 0.007 | 0.103 | 0.003 | 2.126 | 0.039 |
| P12_Y_MEAN | 0.4 | 0.008 | 0.334 | 0.007 | 2.503 | 0.016 |
| P13_Y_MEAN | 0.384 | 0.007 | 0.317 | 0.008 | 2.496 | 0.017 |
| P13_Y_VAR | 0.145 | 0.006 | 0.096 | 0.003 | 2.297 | 0.027 |
| P14_Y_MEAN | 0.392 | 0.008 | 0.328 | 0.008 | 2.225 | 0.031 |
| P15_Y_MEAN | 0.395 | 0.006 | 0.331 | 0.01 | 2.319 | 0.025 |
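The high-low grouping t-test can be sketched as below. The source does not state how the high and low groups were formed, so the 27% cut-off, the helper names and the reading of the second value per group as a sample variance are assumptions. That reading is used because it approximately reproduces the reported t values, whereas treating those values as standard deviations would give t values in the tens.

```python
import math
from statistics import mean, variance


def high_low_groups(scores, features, fraction=0.27):
    """Split one feature into high- and low-scoring groups on a trait.

    The 27% cut-off is an assumption; the source does not state the rule.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = max(2, round(len(scores) * fraction))  # at least 2 per group for a variance
    low = [features[i] for i in order[:k]]
    high = [features[i] for i in order[-k:]]
    return high, low


def pooled_t(m1, v1, n1, m2, v2, n2):
    """Student's two-sample t from means, sample variances and group sizes."""
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def t_test_high_low(scores, features, fraction=0.27):
    """t statistic (high minus low) for one feature, e.g. P12_Y_MEAN."""
    high, low = high_low_groups(scores, features, fraction)
    return pooled_t(mean(high), variance(high), len(high),
                    mean(low), variance(low), len(low))


if __name__ == "__main__":
    # First table row, reading the second value of each group as a variance
    # and assuming 22 participants per group (round(82 * 0.27) = 22).
    print(pooled_t(0.149, 0.007, 22, 0.103, 0.003, 22))  # about 2.16
```

The result of about 2.16 is close to the reported t = 2.126; the gap may come from rounding of the tabulated values or from group sizes other than 22, neither of which the source specifies.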
30 time-series features.
| Feature | Description |
|---|---|
| Maximum | Calculates the highest value in a set of values |
| Minimum | Calculates the lowest value in a set of values |
| Mean | Average of a set of values |
| Variance | Variance of a set of values |
| standard_deviation | Standard deviation of a set of values |
| Skewness | Sample skewness of a set of values (calculated with the adjusted Fisher-Pearson standardized moment coefficient G1) |
| Kurtosis | The kurtosis of a set of values (calculated with the adjusted Fisher-Pearson standardized moment coefficient G2) |
| Median | Median of a set of values |
| abs_energy | Absolute energy of a set of values which is the sum over the squared values |
| absolute_sum_of_changes | Sum over the absolute value of consecutive changes in a set of values |
| variance_larger_than_std | Indicates whether the variance of a set of values is greater than its standard deviation (returned as an integer) |
| count_above_mean | Number of values in a set of values that are greater than its mean |
| count_below_mean | Number of values in a set of values that are lower than its mean |
| first_location_of_maximum | First location of the maximum value of a set of values |
| first_location_of_minimum | First location of the minimum value of a set of values |
| last_location_of_maximum | Relative last location of the maximum value of a set of values |
| last_location_of_minimum | Last location of the minimum value of a set of values |
| has_duplicate | Checks if any value in a set of values occurs more than once |
| has_duplicate_max | Checks if the maximum value of a set of values is observed more than once |
| has_duplicate_min | Checks if the minimum value of a set of values is observed more than once |
| longest_strike_above_mean | Length of the longest run of consecutive values that are greater than the mean of the set |
| longest_strike_below_mean | Length of the longest run of consecutive values that are smaller than the mean of the set |
| mean_abs_change | Mean of the absolute differences between consecutive values |
| mean_change | Mean of the differences between consecutive values |
| percentage_of_reoccurring_datapoints_to_all_datapoints | Percentage of data points whose value occurs more than once in a set of values |
| ratio_value_number_to_time_series_length | Number of unique values divided by the length of the series; equals one if every value occurs only once and is below one otherwise |
| sum_of_reoccurring_data_points | Sum of all data points whose value occurs more than once (every occurrence is counted) |
| sum_of_reoccurring_values | Sum of the distinct values that occur more than once (each value is counted once) |
| sum_values | Sum over a set of values |
| range | Difference between the maximum and minimum of a set of values |
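The feature names in this table coincide with feature calculators in the Python tsfresh package (for example `abs_energy` and `longest_strike_above_mean`), so the authors may have used it, but the source does not say so. The standard-library sketch below reimplements a subset of the listed features and applies them to each key point's x and y series, producing names in the P<index>_<axis>_<STAT> pattern seen in the tables. The helper names, the 0-based point index and every suffix other than MEAN and VAR are hypothetical.

```python
from statistics import mean, median, pstdev, pvariance


def longest_strike(values, above=True):
    """Longest run of consecutive values strictly above (or below) the mean."""
    m = mean(values)
    best = run = 0
    for v in values:
        hit = v > m if above else v < m
        run = run + 1 if hit else 0
        best = max(best, run)
    return best


def ts_features(values):
    """A subset of the 30 listed features for one coordinate series.

    Needs at least two values so that consecutive changes exist.
    """
    m = mean(values)
    diffs = [b - a for a, b in zip(values, values[1:])]  # consecutive changes
    return {
        "MAX": max(values),
        "MIN": min(values),
        "MEAN": m,
        "MEDIAN": median(values),
        "VAR": pvariance(values),  # population variance (divides by n)
        "STD": pstdev(values),
        "ABS_ENERGY": sum(v * v for v in values),
        "ABS_SUM_CHANGES": sum(abs(d) for d in diffs),
        "MEAN_ABS_CHANGE": mean([abs(d) for d in diffs]),
        "MEAN_CHANGE": mean(diffs),
        "COUNT_ABOVE_MEAN": sum(v > m for v in values),
        "COUNT_BELOW_MEAN": sum(v < m for v in values),
        "LONGEST_STRIKE_ABOVE_MEAN": longest_strike(values, above=True),
        "LONGEST_STRIKE_BELOW_MEAN": longest_strike(values, above=False),
        "RANGE": max(values) - min(values),
    }


def keypoint_features(frames):
    """Flatten per-frame key points into named features such as P4_X_VAR.

    frames[t][p] is the (x, y) position of key point p in frame t.
    """
    features = {}
    for p in range(len(frames[0])):
        for axis, idx in (("X", 0), ("Y", 1)):
            series = [frame[p][idx] for frame in frames]
            for stat, value in ts_features(series).items():
                features[f"P{p}_{axis}_{stat}"] = value
    return features
```

Under this scheme, 70 key points, two axes and all 30 features would give 70 × 2 × 30 = 4200 features per participant before any dimensionality reduction.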
Results of the personality identification model.
|  |  |  |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|
|  |  |  |  |  |  |  |  |  |  |  |
| PCA = 25 | 0.107 | 35.94 | 0.044 | 3.86 | 0.033 | 4.64 | 0.004 | 23.2 | 0.159 | 3.71 |
| PCA = 30 | 0.15 | 34.5 | 0.177 | 3.77 | −0.011 | 3.95 | 0.035 | 25.3 | −0.009 | 4.48 |
|  |  |  |  |  |  |  |  |  |  |  |
| PCA = 25 | 0.231 | 3.59 | 0.174 | 3.42 | 0.043 | 3.59 | −0.009 | 3.81 | 0.139 | 4.35 |
| PCA = 30 | 0.166 | 2.88 | 0.313 | 3.12 | 0.138 | 3.2 | 0.088 | 3.85 | 0.158 | 3.96 |
|  |  |  |  |  |  |  |  |  |  |  |
| PCA = 25 | 0.338 | 2.23 | 0.349 | 3.28 | 0.318 | 4.65 | 0.423 | 2.77 | 0.359 | 2.86 |
| PCA = 30 | 0.338 | 3.73 | 0.32 | 3.43 | 0.36 | 3.96 | 0.307 | 4.35 | 0.404 | 2.41 |
|  |  |  |  |  |  |  |  |  |  |  |
| PCA = 25 | 0.265 | 4 | 0.32 | 7.96 | 0.476 | 2.95 | 0.295 | 1.32 | 0.227 | 5.17 |
| PCA = 30 | 0.341 | 8.69 | 0.388 | 6.94 | 0.307 | 2 | 0.293 | 2.67 | 0.382 | 1.87 |
|  |  |  |  |  |  |  |  |  |  |  |
| PCA = 25 |  | 4.4 |  | 4.66 |  | 3.14 |  | 4.39 |  | 3.33 |
| PCA = 30 |  | 4.51 |  | 4.71 |  | 3.68 |  | 2.58 |  | 3.69 |
Bold values indicate the best model.
Split-half reliability.
| Model |  |  |  |  |  |
|---|---|---|---|---|---|
| CBR (PCA = 25) | 0.96 | 0.927 | 0.756 | 0.815 | 0.864 |
| CBR (PCA = 30) | 0.931 | 0.96 | 0.824 | 0.834 | 0.883 |
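The source reports split-half reliability for the CatBoost regression (CBR) models but does not describe how the halves were formed or whether a Spearman-Brown correction was applied. The sketch below assumes one common procedure: split each participant's video into two halves, predict the trait from each half, correlate the two sets of predictions across participants, and step the half-length correlation up with the Spearman-Brown formula. The function names and the splitting scheme are hypothetical.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-length correlation up to full-length reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(pred_a, pred_b):
    """Split-half reliability of a trait model.

    pred_a[i] and pred_b[i] are the model's predictions for participant i
    from two halves of that participant's video (hypothetical split).
    """
    return spearman_brown(pearson_r(pred_a, pred_b))


if __name__ == "__main__":
    half_a = [3.1, 2.4, 4.0, 3.6, 2.9]
    half_b = [3.0, 2.6, 3.8, 3.7, 2.7]
    print(split_half_reliability(half_a, half_b))
```

The correction matters because a correlation between two half-length measurements underestimates the reliability of the full-length measurement; for example, a half-split correlation of 0.5 corresponds to a corrected reliability of about 0.67.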