Joanne Hinds, Adam N Joinson.
Abstract
To what extent does our online activity reveal who we are? Recent research has demonstrated that the digital traces left by individuals as they browse and interact with others online may reveal who they are and what their interests may be. In the present paper we report a systematic review that synthesises current evidence on predicting demographic attributes from online digital traces. Studies were included if they met the following criteria: (i) they reported findings where at least one demographic attribute was predicted/inferred from at least one form of digital footprint, (ii) the method of prediction was automated, and (iii) the traces were either visible (e.g. tweets) or non-visible (e.g. clickstreams). We identified 327 studies published up until October 2018. Across these articles, 14 demographic attributes were successfully inferred from digital traces; the most studied included gender, age, location, and political orientation. For each of the demographic attributes identified, we provide a database containing the platforms and digital traces examined, sample sizes, accuracy measures and the classification methods applied. Finally, we discuss the main research trends/findings, methodological approaches and recommend directions for future research.
Year: 2018 PMID: 30485305 PMCID: PMC6261568 DOI: 10.1371/journal.pone.0207112
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. PRISMA flowchart summarising study retrieval and selection.
Fig 2. Waffle chart highlighting the proportion of demographic attributes comprising our dataset.
Fig 3. Number of articles published per year and by quality of publication.
Number of articles predicting gender, with associated platforms and references.
| Category (n = no. articles) | Platform (n = no. articles) | References |
|---|---|---|
| | Twitter (106) | [ |
| | Facebook (7) | [ |
| | YouTube (2) | [ |
| | Netlog (2) | [ |
| | Flickr (3) | [ |
| | Pinterest (1) | [ |
| | Instagram (1) | [ |
| | Sina Weibo (1) | [ |
| | Social Media (General) (25) | [ |
| | Smartphones (25) | [ |
| | Tablets (1) | [ |
| | News sites (3) | [ |
| | Websites (6) | [ |
| | IMDB (1) | [ |
| | Hotel Reviews (25) | [ |
| | Movielens (2) | [ |
| | Crowdfunding Essays (1) | [ |
| | | [ |
| | Blogs (General) (51) | [ |
| | Vietnamese Blogs (1) | [ |
| | Tumblr (1) | [ |
| | NR (9) | [ |
| | Last.fm (3) | [ |
| | Yahoo! (1) | [ |
| | Bing (1) | [ |
| | Chat Logs (General) (18) | [ |
| | Heaven BBS (2) | [ |
| | World of Warcraft (1) | [ |
| | Wi-Fi (1) | [ |
| | NA (1) | [ |
| | Professional Writing (1) | [ |
| | Essays (15) | [ |
Number of articles predicting gender, with associated predictors and references.
| Category (n = no. articles) | Predictors (n = no. articles) | References |
|---|---|---|
| | Language (123) | [ |
| | Network Data (8) | [ |
| | Colours (4) | [ |
| | Meta-data (17) | [ |
| | Names (13) | [ |
| | Images (30) | [ |
| | Locations (2) | [ |
| | Facebook Likes (2) | [ |
| | Tags (3) | [ |
| | Activity (1) | [ |
| | Check-ins (1) | [ |
| | Application Data (9) | [ |
| | Call Logs/SMS Data (11) | [ |
| | Location Data (4) | [ |
| | Language (35) | [ |
| | Website Data (1) | [ |
| | Network Traffic Traces (1) | [ |
| | Background Colours (1) | [ |
| | Video Tags/Titles (1) | [ |
| | Web Usage Data (1) | [ |
| | Language (55) | [ |
| | Behavioural Data (1) | [ |
| | Meta-data (3) | [ |
| | Language (9) | [ |
| | Meta-data, Listening Habits (3) | [ |
| | Query Log Data (1) | [ |
| | Facebook Likes, Profile Data (1) | [ |
| | Language (20) | [ |
| | Behavioural Data (1) | [ |
| | Wi-Fi Traffic (1) | [ |
| | Academic Researcher Emails (1) | [ |
| | Language (15) | [ |
Number of articles predicting age, with associated platforms and references.
| Category (n = no. articles) | Platform (n = no. articles) | References |
|---|---|---|
| | IMDB (1) | [ |
| | Other (8) | [ |
| | Hotel Reviews (24) | [ |
| | Bing (1) | [ |
| | Yahoo! (1) | [ |
| | | [ |
| | Blogs (General) (50) | [ |
| | LiveJournal (1) | [ |
| | NR (18) | [ |
| | Vietnamese Forums (1) | [ |
| | Breast Cancer Forum (1) | [ |
| | Twitter (75) | [ |
| | Social Media (General) (25) | [ |
| | Facebook (5) | [ |
| | Flickr (1) | [ |
| | Netlog (2) | [ |
| | YouTube (2) | [ |
| | Instagram (2) | [ |
| | Pokec (1) | [ |
| | Sina Weibo (3) | [ |
| | NR (4) | [ |
| | Last.fm (3) | [ |
| | World of Warcraft (1) | [ |
| | NR (19) | [ |
| | Essays (14) | [ |
Number of articles predicting age, with associated predictors and references.
| Category (n = no. articles) | Predictors (n = no. articles) | References |
|---|---|---|
| | Language (30) | [ |
| | Website Data (3) | [ |
| | Network Data (1) | [ |
| | Network Traffic Data (1) | [ |
| | Demographics, Names, Followers (1) | [ |
| | Facebook Likes (1) | [ |
| | Query Logs (1) | [ |
| | Language (54) | [ |
| | Meta-data (6) | [ |
| | Application Use (7) | [ |
| | Call/SMS Data (15) | [ |
| | Location Data (8) | [ |
| | Accelerometer Data (5) | [ |
| | Network Data (7) | [ |
| | Language (2) | [ |
| | Language (81) | [ |
| | Meta-data (7) | [ |
| | Network Data (12) | [ |
| | Facebook Likes (2) | [ |
| | Names (4) | [ |
| | Images (4) | [ |
| | Check-ins (1) | [ |
| | Language (4) | [ |
| | Music Meta-data/Listening Habits (3) | [ |
| | Profile Information (1) | [ |
| | Character Features/Behavioural Data (1) | [ |
| | Language (19) | [ |
| | Meta-data (1) | [ |
| | Language (14) | [ |
Number of articles predicting location, with associated platforms and references.
| Category (n = no. articles) | Platform (n = no. articles) | References |
|---|---|---|
| | Facebook (2) | [ |
| | Twitter (20) | [ |
| | Flickr (3) | [ |
| | Foursquare (3) | [ |
| | Brightkite (1) | [ |
| | Google+ (1) | [ |
| | Gowalla (1) | [ |
| | NR (1) | [ |
| | NR (1) | [ |
| | NR (1) | [ |
| | Webretho, Otofun, Tinhte (1) | [ |
| | Yahoo! (1) | [ |
| | NR (1) | [ |
Number of articles predicting location, with associated predictors and references.
| Category (n = no. articles) | Predictors (n = no. articles) | References |
|---|---|---|
| | Location Data (16) | [ |
| | Network Data (7) | [ |
| | Names (2) | [ |
| | Facebook Likes (1) | [ |
| | Language (16) | [ |
| | Spatial, Visual, Temporal Features (1) | [ |
| | Check-in Data (2) | [ |
| | Location Data (3) | [ |
| | Language (1) | [ |
| | Language (1) | [ |
| | Applications (1) | [ |
| | Language (1) | [ |
| | Query Logs (1) | [ |
| | Network Traffic Traces (1) | [ |
Number of articles predicting political orientation, with associated platforms and references.
| Category (n = no. articles) | Platform (n = no. articles) | References |
|---|---|---|
| | Twitter (25) | [ |
| | Facebook (2) | [ |
| | IMDB (1) | [ |
| | Bing (1) | [ |
| | Digg (1) | [ |
| | Blogs (Other) (3) | [ |
Number of articles predicting political orientation, with associated predictors and references.
| Category (n = no. articles) | Predictors (n = no. articles) | References |
|---|---|---|
| | Meta-data (4) | [ |
| | Language (24) | [ |
| | Network Data (10) | [ |
| | Facebook Likes (2) | [ |
| | Language (3) | [ |
| | Network Data (2) | [ |
| | Location Data (1) | [ |
| | Name Data (1) | [ |
| | Facebook Likes (1) | [ |
| | Language (4) | [ |
Number of articles predicting sexual orientation, with associated platforms and references.
| Category (n = no. articles) | Platform (n = no. articles) | References |
|---|---|---|
| | Friendster (2) | [ |
| | Facebook (3) | [ |
| | Sina Weibo (1) | [ |
| | NR (1) | [ |
Number of articles predicting sexual orientation, with associated predictors and references.
| Category (n = no. articles) | Predictors (n = no. articles) | References |
|---|---|---|
| | Network Data (2) | [ |
| | Gender, Relationship Status, Sexual Orientation (1) | [ |
| | Facebook Likes (2) | [ |
| | Check-ins (1) | [ |
| | Images (1) | [ |
Number of articles predicting family and relationship status, with associated platforms and references.
| Category (n = no. articles) | Platform (n = no. articles) | References |
|---|---|---|
| | Facebook (3) | [ |
| | Friendster (1) | [ |
| | Twitter (4) | [ |
| | Sina Weibo (1) | [ |
| | NR (9) | [ |
| | NR (1) | [ |
Number of articles predicting family and relationship status, with associated predictors and references.
| Category (n = no. articles) | Predictors (n = no. articles) | References |
|---|---|---|
| | Facebook Likes (2) | [ |
| | Language (4) | [ |
| | Relationship Status (1) | [ |
| | Network Data (1) | [ |
| | Check-ins (1) | [ |
| | Application Data, Behavioural Data, Call Data (8) | [ |
| | Location Data (1) | [ |
| | Web Usage Data (1) | [ |
Number of articles predicting ethnicity or race, with associated platforms and references.
| Category (n = no. articles) | Platform (n = no. articles) | References |
|---|---|---|
| | Twitter (12) | [ |
| | Facebook (3) | [ |
| | News (1) | [ |
| | Other (2) | [ |
| | Smartphone (1) | [ |
| | Tablet (2) | [ |
| | Meta-data, Listening Habits (1) | [ |
Number of articles predicting ethnicity or race, with associated predictors and references.
| Category (n = no. articles) | Predictors (n = no. articles) | References |
|---|---|---|
| | Names (6) | [ |
| | Language (11) | [ |
| | Network Data (2) | [ |
| | Location Data (2) | [ |
| | Meta-data (1) | [ |
| | Facebook Likes (1) | [ |
| | Profile Images (2) | [ |
| | Names (1) | [ |
| | Web Browsing Histories (1) | [ |
| | Language, Network Data (1) | [ |
| | Application Data (1) | [ |
| | Actions, Keystrokes, Timestamps (1) | [ |
| | Meta-data, Listening Habits (1) | [ |
Number of articles predicting education level, with associated platforms and references.
| Category (n = no. articles) | Platform (n = no. articles) | References |
|---|---|---|
| | Twitter (6) | [ |
| | Facebook (1) | [ |
| | Sina Weibo (1) | [ |
| | NR (4) | [ |
| | NR (4) | [ |
| | NA (1) | [ |
Number of articles predicting education level, with associated predictors and references.
| Category (n = no. articles) | Predictors (n = no. articles) | References |
|---|---|---|
| | Language (7) | [ |
| | Network Data (2) | [ |
| | Meta-data (2) | [ |
| | Facebook Likes (1) | [ |
| | Check-ins (1) | [ |
| | Language (1) | [ |
| | Network Data (1) | [ |
| | Website Data (1) | [ |
| | Meta-data (1) | [ |
| | Web Browsing Histories (1) | [ |
| | NR (1) | [ |
| | Language (4) | [ |
| | Wi-Fi Traffic (1) | [ |
Number of articles predicting income, with associated platforms and references.
| Category (n = no. articles) | Platform (n = no. articles) | References |
|---|---|---|
| | Twitter (6) | [ |
| | NR (5) | [ |
| | NR (2) | [ |
Number of articles predicting income, with associated predictors and references.
| Category (n = no. articles) | Predictors (n = no. articles) | References |
|---|---|---|
| | Language (6) | [ |
| | Application Data (1) | [ |
| | Call/SMS Data (4) | [ |
| | Network Data (1) | [ |
| | Web Usage Data (2) | [ |
Number of articles predicting language, with associated platforms and references.
| Category (n = no. articles) | Platform (n = no. articles) | References |
|---|---|---|
| | Smartphone (1) | [ |
| | Tablet (1) | [ |
| | | [ |
| | NR (3) | [ |
| | Twitter (3) | [ |
Number of articles predicting language, with associated predictors and references.
| Category (n = no. articles) | Predictors (n = no. articles) | References |
|---|---|---|
| | Application Data (1) | [ |
| | Actions, Keystrokes, Timestamps (1) | [ |
| | Language (1) | [ |
| | Language (3) | [ |
| | Names, Location (1) | [ |
| | Language (2) | [ |
Number of articles predicting religion, with associated platforms and references.
| Category (n = no. articles) | Platform (n = no. articles) | References |
|---|---|---|
| | Twitter (4) | [ |
| | Facebook (2) | [ |
| | Bing (1) | [ |
| | NR (1) | [ |
Number of articles predicting religion, with associated predictors and references.
| Category (n = no. articles) | Predictors (n = no. articles) | References |
|---|---|---|
| | Language (4) | [ |
| | Facebook Likes (2) | [ |
| | Facebook Likes, Profile Data (1) | [ |
| | Application Data (1) | [ |
Number of articles predicting occupation, with associated platforms and references.
| Category (n = no. articles) | Platform (n = no. articles) | References |
|---|---|---|
| | Twitter (10) | [ |
| | NR (2) | [ |
| | NR (8) | [ |
| | NR (1) | [ |
| | NR (1) | [ |
Number of articles predicting occupation, with associated predictors and references.
| Category (n = no. articles) | Predictors (n = no. articles) | References |
|---|---|---|
| | Language (9) | [ |
| | Network Data (3) | [ |
| | Meta-data (5) | [ |
| | Language (2) | [ |
| | Meta-data (1) | [ |
| | Application Data (6) | [ |
| | Call Data (5) | [ |
| | Location Data (2) | [ |
| | Time/Day Data, Website Data (1) | [ |
| | Language (1) | [ |
Number of articles predicting health, with associated platforms and references.
| Category (n = no. articles) | Platform (n = no. articles) | References |
|---|---|---|
| | Twitter (5) | [ |
| | Facebook (1) | [ |
| | MyFitnessPal (1) | [ |
| | Reddit (1) | [ |
| | NR (2) | [ |
Number of articles predicting health, with associated predictors and references.
| Category (n = no. articles) | Predictors (n = no. articles) | References |
|---|---|---|
| | Language (5) | [ |
| | Images (1) | [ |
| | Facebook Likes (1) | [ |
| | Behavioural Data (1) | [ |
| | Application Data (2) | [ |
| | Network Data (1) | [ |
Number of articles predicting social class, with associated platforms and references.
| Category (n = no. articles) | Platform (n = no. articles) | References |
|---|---|---|
| | Twitter (1) | [ |
| | Foursquare (1) | [ |
Number of articles predicting social class, with associated predictors and references.
| Category (n = no. articles) | Predictors (n = no. articles) | References |
|---|---|---|
| | Language (1) | [ |