Luke Sloan, Jeffrey Morgan, Pete Burnap, Matthew Williams.
Abstract
This paper specifies, designs and critically evaluates two tools for the automated identification of demographic data (age, occupation and social class) from the profile descriptions of Twitter users in the United Kingdom (UK). Meta-data routinely collected through the Collaborative Social Media Observatory (COSMOS: http://www.cosmosproject.net/) relating to UK Twitter users is matched with the occupational lookup tables between job and social class provided by the Office for National Statistics (ONS) using SOC2010. Using expert human validation, the validity and reliability of the automated matching process are critically assessed and a prospective class distribution of UK Twitter users is offered with 2011 Census baseline comparisons. The pattern matching rules for identifying age are explained and enacted following a discussion on how to minimise false positives. The age distribution of Twitter users, as identified using the tool, is presented alongside the age distribution of the UK population from the 2011 Census. The automated occupation detection tool reliably identifies certain occupational groups, such as professionals, whose job titles cannot be confused with hobbies and are not used in common parlance in other contexts. An alternative explanation for the prevalence of hobbies is that the creative sector is overrepresented on Twitter compared to 2011 Census data. The age detection tool illustrates the youthfulness of Twitter users compared to the general UK population as of the 2011 Census according to proportions, but projections demonstrate that there is still potentially a large number of older platform users. It is possible to detect "signatures" of both occupation and age from Twitter meta-data with varying degrees of accuracy (particularly dependent on occupational groups), but further confirmatory work is needed.
Year: 2015 PMID: 25729900 PMCID: PMC4346393 DOI: 10.1371/journal.pone.0115545
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
NS-SEC Analytic Categories (Source: ONS 2014).
Note that ‘students’, ‘occupations not stated’ and ‘not classifiable’ are treated as ‘not classified’ and are not included in this table.
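The occupation step described in the abstract, matching profile descriptions against a job-title lookup that maps to social class, can be sketched roughly as follows. This is a minimal illustration only: the `JOB_TO_NSSEC` dictionary, its entries, and the `match_occupation` helper are hypothetical stand-ins, not the ONS SOC2010 lookup tables or the authors' tool, and the "longest title wins" rule is a design choice of this sketch.

```python
import re
from typing import Optional, Tuple

# Hypothetical stand-in for a job-title -> NS-SEC lookup.
# Entries are illustrative placeholders, not values copied from ONS tables.
JOB_TO_NSSEC = {
    "software engineer": "1.2",
    "solicitor": "1.2",
    "teacher": "2",
    "nurse": "2",
    "student": None,  # treated as 'not classified', as in the note above
}


def _normalise(text: str) -> str:
    """Lower-case, replace non-letters with spaces, and collapse whitespace."""
    letters_only = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(letters_only.split())


def match_occupation(description: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (job title, NS-SEC class) for the longest whole-word title found.

    Returns None when no title from the lookup appears in the description.
    """
    padded = f" {_normalise(description)} "
    # Try longer titles first so multi-word titles beat their sub-phrases.
    for title in sorted(JOB_TO_NSSEC, key=len, reverse=True):
        if f" {title} " in padded:
            return title, JOB_TO_NSSEC[title]
    return None


if __name__ == "__main__":
    for bio in ["Software Engineer, coffee lover.", "Student at Cardiff", "Football fan"]:
        print(bio, "->", match_occupation(bio))
```

A lookup of this kind cannot tell whether a matched term is a job or a pastime, which is why the abstract reports that reliability varies by occupational group and relies on expert human validation.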
Ten most frequent misclassified occupations for three-coder agreement.
Occupations correctly identified (not flagged by any expert coder) where frequency = >5.
Fig 1. Proportion of individuals by NS-SEC group across four data sources.
Pattern Matching Rules for Identifying Age Data.
Pattern Matching Rules for Identifying Type I errors.
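The two-stage idea named in the captions above, first matching age statements and then screening out Type I errors (false positives), might look something like the sketch below. The rule tables themselves are not reproduced in this record, so every regular expression here is an assumption chosen for illustration, as are the `extract_age` function, the `reference_year` parameter, and the plausibility bounds.

```python
import re
from typing import Optional

# Hypothetical patterns signalling an age statement (not the paper's rules).
AGE_PATTERNS = [
    re.compile(r"\b(?:i\s*am|i'm|im|aged?)\s+(\d{1,2})\b", re.IGNORECASE),
    re.compile(r"\b(\d{1,2})\s*(?:years?\s*old|yrs?\s*old|y/?o)\b", re.IGNORECASE),
]
BORN_PATTERN = re.compile(r"\bborn\s+in\s+((?:19|20)\d{2})\b", re.IGNORECASE)

# Hypothetical exclusion patterns: numbers of years that are not ages.
FALSE_POSITIVE_PATTERNS = [
    re.compile(
        r"\b\d{1,2}\s*(?:years?|yrs?)\s+(?:of\s+)?(?:experience|in|as)\b",
        re.IGNORECASE,
    ),
    re.compile(r"\bfor\s+\d{1,2}\s*(?:years?|yrs?)\b", re.IGNORECASE),
]

# Assumed plausibility bounds for a self-reported age.
MIN_AGE, MAX_AGE = 13, 99


def extract_age(description: str, reference_year: int) -> Optional[int]:
    """Return an age inferred from a profile description, or None.

    Conservative by design: if any exclusion pattern fires anywhere in the
    description, the whole description is skipped.
    """
    if any(p.search(description) for p in FALSE_POSITIVE_PATTERNS):
        return None
    for pattern in AGE_PATTERNS:
        match = pattern.search(description)
        if match:
            age = int(match.group(1))
            return age if MIN_AGE <= age <= MAX_AGE else None
    match = BORN_PATTERN.search(description)
    if match:
        age = reference_year - int(match.group(1))
        return age if MIN_AGE <= age <= MAX_AGE else None
    return None


if __name__ == "__main__":
    for bio in ["I am 24 years old", "Teacher with 15 years of experience"]:
        print(bio, "->", extract_age(bio, reference_year=2014))
```

Running the exclusion rules before the age rules trades recall for precision: some genuine age statements are discarded when they share a description with a phrase such as a length of service, but fewer non-ages are counted.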
Fig 2. Comparison of age distribution between Twitter and the 2011 Census.
Breakdown of Twitter users by age groups.