Sandra C Matz, Jochen I Menges, David J Stillwell, H Andrew Schwartz.
Abstract
Information about a person's income can be useful in several business-related contexts, such as personalized advertising or salary negotiations. However, many people consider this information private and are reluctant to share it. In this paper, we show that income is predictable from the digital footprints people leave on Facebook. Applying an established machine learning method to an income-representative sample of 2,623 U.S. Americans, we found that (i) Facebook Likes and Status Updates alone predicted a person's income with an accuracy of up to r = 0.43, and (ii) Facebook Likes and Status Updates added incremental predictive power above and beyond a range of socio-demographic variables (ΔR² = 6-16%, with a correlation of up to r = 0.49). Our findings highlight both opportunities for businesses and legitimate privacy concerns that such prediction models pose to individuals and society when applied without individual consent.
Year: 2019 PMID: 30921389 PMCID: PMC6438464 DOI: 10.1371/journal.pone.0214369
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Density distributions of annual income.
The distribution of the study sample is displayed in red, the distribution of the US Census data in blue.
Pearson Product-Moment correlations between predicted and actual income values obtained from 10-fold cross-validated LASSO models.
Column 1 displays model accuracies when using the psychological and socio-demographic controls only. Columns 2–4 display the accuracies of the model when adding Likes, Status Updates, and the combination of the two to the control models. Row 1 displays the accuracies for the models using Facebook data exclusively with no controls. All correlations are significant at p < 0.001.
| Control set | Controls only | + Likes | + Status Updates | + Likes & Status Updates |
|---|---|---|---|---|
| No controls | - | 0.27 | 0.41 | 0.43 |
| Personality | 0.14 | 0.27 | 0.41 | 0.42 |
| Demographics | 0.16 | 0.32 | 0.43 | 0.43 |
| Zip code Income | 0.21 | 0.30 | 0.43 | 0.43 |
| Industry | 0.23 | 0.28 | 0.41 | 0.42 |
| Education | 0.30 | 0.34 | 0.44 | 0.44 |
| All socio-demographics | 0.42 | 0.42 | 0.48 | 0.48 |
| All socio-dem. + personality | 0.43 | 0.43 | 0.48 | 0.49 |
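The evaluation behind these correlations (fit a LASSO model on nine folds, predict the held-out fold, then correlate all out-of-fold predictions with actual income) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the penalty `alpha` is fixed rather than tuned, and the function names `lasso_coordinate_descent` and `cross_validated_predictions` are hypothetical.

```python
import numpy as np

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    Assumes the columns of X and the vector y are already centred,
    so no intercept term is fitted.
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y - X @ w
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the residual that excludes feature j
            rho = X[:, j] @ (residual + X[:, j] * w[j]) / n
            # Soft-thresholding: small effects are set exactly to zero
            new_wj = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            residual += X[:, j] * (w[j] - new_wj)
            w[j] = new_wj
    return w

def cross_validated_predictions(X, y, alpha, n_folds=10, seed=0):
    """Return one out-of-fold prediction per observation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    preds = np.empty(len(y))
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        X_tr, y_tr = X[train_mask], y[train_mask]
        # Standardise with training-fold statistics only, to avoid leakage
        mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
        sd[sd == 0] = 1.0
        y_mu = y_tr.mean()
        w = lasso_coordinate_descent((X_tr - mu) / sd, y_tr - y_mu, alpha)
        preds[test_idx] = ((X[test_idx] - mu) / sd) @ w + y_mu
    return preds

# Synthetic stand-in for the feature matrix and income (assumption, not study data)
rng = np.random.default_rng(42)
n, p = 300, 20
X = rng.normal(size=(n, p))
true_w = np.zeros(p)
true_w[:3] = [1.0, -0.5, 0.5]
y = X @ true_w + rng.normal(scale=1.5, size=n)

preds = cross_validated_predictions(X, y, alpha=0.1)
r = np.corrcoef(preds, y)[0, 1]
print(f"cross-validated Pearson r = {r:.2f}")
```

Because every prediction comes from a model that never saw that observation, the resulting correlation estimates out-of-sample accuracy rather than in-sample fit.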
Fig 2. Pearson Product-Moment correlations between predicted and actual income values.
Red bars indicate the predictive power of the socio-demographic variables used as baseline comparisons. The 'Demographics' model includes age, gender, and ethnicity. Blue bars indicate the predictive accuracy of Facebook data, separated by Likes, Status Updates, and a combination of the two. The purple bars display the results of the comprehensive models, which combine socio-demographic variables, personality, and Facebook data.
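The gap between a baseline model and its comprehensive counterpart can be expressed as a gain in explained variance. The sketch below assumes that this gain is approximated by the difference of the squared correlations reported in the table of cross-validated LASSO models; the paper may compute ΔR² differently, and the helper name `delta_r_squared` is hypothetical.

```python
# Correlations copied from the table of cross-validated LASSO models:
# (controls only, controls + Likes & Status Updates)
reported_r = {
    "Personality": (0.14, 0.42),
    "Demographics": (0.16, 0.43),
    "Zip code Income": (0.21, 0.43),
    "Industry": (0.23, 0.42),
    "Education": (0.30, 0.44),
    "All socio-demographics": (0.42, 0.48),
    "All socio-dem. + personality": (0.43, 0.49),
}

def delta_r_squared(r_baseline, r_full):
    """Gain in explained variance, approximated as a difference of squared r."""
    return r_full ** 2 - r_baseline ** 2

gains = {name: delta_r_squared(r_base, r_full)
         for name, (r_base, r_full) in reported_r.items()}

for name, gain in gains.items():
    print(f"{name:30s} delta R^2 ~ {gain * 100:.1f} percentage points")
```

Under this approximation the gains run from roughly 5 to 16 percentage points, broadly in line with the reported range of 6-16%; small differences are expected because the tabulated correlations are rounded to two decimals.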
Low- and high-income Likes.
Likes most strongly associated with high income (right) and low income (left), controlling for age and gender.
| Low income | High income |
|---|---|
| ❖ if i text a person in the same room as me, i stare at them 'til they get it | ❖ The Smith Center |
| ❖ We act like its a secret drug deal when someone is just giving us gum | ❖ Sheets |
| ❖ All Things Tumblr | ❖ The Cosmopolitan of Las Vegas |
| ❖ Funniest Pics | ❖ Frankie J |
| ❖ Amazing Things | ❖ Beauty4Moms |
| ❖ Eminem | ❖ Janie and Jack |
| ❖ Don't EVER break a pinky promise. That stuff is LEGIT. | ❖ Paula’s Choice |
| ❖ Bullet for my Valentine | ❖ It Works Skinny Wrap Team |
| ❖ Dentist Stop Talking to Me, I Cant Talk | ❖ Pier 39 |
| ❖ Having a "sweatpants, hair tied, chillen | ❖ X Out |
Fig 3. Low and high income word clouds.
Words and phrases most positively correlated with income (top) and most negatively correlated with income (bottom), after controlling for age and gender.
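One common way to correlate a feature with income "after controlling for age and gender" is a partial correlation: regress both the feature and income on the controls, then correlate the residuals. The sketch below illustrates that idea on synthetic data. It is an assumption about the general technique, not a description of the authors' exact procedure, and the names `residualize` and `partial_correlation` are hypothetical.

```python
import numpy as np

def residualize(values, controls):
    """Return the part of `values` not linearly explained by `controls`."""
    design = np.column_stack([np.ones(len(values)), controls])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef

def partial_correlation(feature, outcome, controls):
    """Pearson correlation between feature and outcome residuals."""
    return np.corrcoef(residualize(feature, controls),
                       residualize(outcome, controls))[0, 1]

# Synthetic example (assumption): word use and income both depend on age only
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(18, 65, size=n)
gender = rng.integers(0, 2, size=n).astype(float)
controls = np.column_stack([age, gender])
word_use = 0.05 * age + rng.normal(size=n)
income = 1000.0 * age + rng.normal(scale=10000.0, size=n)

raw_r = np.corrcoef(word_use, income)[0, 1]
partial_r = partial_correlation(word_use, income, controls)
print(f"raw r = {raw_r:.2f}, partial r = {partial_r:.2f}")
```

In this constructed case the raw correlation is clearly positive while the partial correlation is close to zero, which shows why controlling for age and gender matters: without it, a word or Like could appear linked to income merely because older users both earn more and use it more often.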