Abstract
This paper examines the association between given and family names and self-ascribed ethnicity as classified by the 2011 Census of Population for England and Wales. Using Census data in an innovative way under the new Office for National Statistics (ONS) Secure Research Service (SRS; previously the ONS Virtual Microdata Laboratory, VML), we investigate how bearers of a full range of given and family names assigned themselves to 2011 Census categories, using a names classification tool previously described in this journal. Based on these results, we develop a follow-up ethnicity estimation tool and describe how the tool may be used to observe changing relations between naming practices and ethnic identities as a facet of social integration and cosmopolitanism in an increasingly diverse society.
Year: 2018 PMID: 30092008 PMCID: PMC6084909 DOI: 10.1371/journal.pone.0201774
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
List of tables provided as input into the VML, subject to minimum stated counts.
| VML input table | value | min. count |
|---|---|---|
| 1. forename by ethnicity | for each forename: n(correct) / n(bearers) | 10 |
| 2. forename by age and ethnicity | for each forename: n(correct) / n(bearers in age group) | 10 |
| 3. surname by ethnicity | for each surname: n(correct) / n(bearers) | 10 |
| 4. surname by region and ethnicity | for each surname: n(correct) / n(bearers in region) | 10 |
| 5. full name by ethnicity | for each full name: n(correct) / n(bearers) | 100 |
| 6. sex by age and ethnicity | n(correct, sex, age, ethnicity) / n(sex, age, ethnicity) | 10 |
| 7. sex by marital status and ethnicity | n(correct, sex, marital status, ethnicity) / n(sex, marital status, ethnicity) | 10 |
| 8. sex by region (GOR) and ethnicity | n(correct, sex, GOR, ethnicity) / n(sex, GOR, ethnicity) | 10 |
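The ratio in the first input table can be illustrated with a minimal sketch. It assumes toy records of (forename, self-ascribed ethnicity, predicted ethnicity), treats a prediction as "correct" when it equals the self-ascribed category, and applies the minimum count to the number of bearers. The function name `accuracy_by_name` and the data are hypothetical and do not reproduce the ONS procedure.

```python
from collections import defaultdict

def accuracy_by_name(records, min_count=10):
    """Return {name: n(correct) / n(bearers)} for names with at least min_count bearers.

    Assumption: the minimum count applies to n(bearers); names below it are suppressed.
    """
    bearers = defaultdict(int)
    correct = defaultdict(int)
    for name, self_ascribed, predicted in records:
        bearers[name] += 1
        if predicted == self_ascribed:
            correct[name] += 1
    return {n: correct[n] / bearers[n] for n in bearers if bearers[n] >= min_count}

# Hypothetical toy records: (forename, self-ascribed ethnicity, predicted ethnicity)
records = (
    [("AARON", "WBR", "WBR")] * 9
    + [("AARON", "BCA", "WBR")] * 3
    + [("SHERNETTE", "BCA", "BCA")] * 4
)

table = accuracy_by_name(records, min_count=10)
print(table)  # SHERNETTE has only 4 bearers and is suppressed
```

The same function covers the other name-based tables if the grouping key is widened, for example to a (forename, age group) tuple.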
Fig 1. Proportion of forenames in different ethnic groups estimated from cluster analysis of names in the 2011 Census (left) and from predictions by Onomap (right).
Fig 2. Proportion of surnames in different ethnic groups estimated from cluster analysis of names in the 2011 Census (left) and from predictions by Onomap (right).
Frequencies of aggregate categories of self-assigned ethnicity in 2011 Census (England and Wales) rounded to the nearest 100 from calculations made using VML.
| 41,764,000 | 484,300 | 2,242,200 | 1,328,400 | 1,047,900 | 403,800 |
| 349,100 | 686,100 | 884,900 | 532,000 | 1,366,800 | 688,700 |
Fig 3. Unweighted prediction accuracy for different age bands.
Fig 4. Prediction accuracy for women (solid line) and men (dashed line) by age in each self-ascribed ethnic category (the small Arab/Mixed/Other category is not shown).
Fig 5. Prediction accuracy for women (black bar) and men (grey bar) by marital status in each self-ascribed ethnic category (the small Arab/Mixed/Other category is not shown).
Fig 6. Prediction accuracy according to region of residence (North East NE; North West NW; East Midlands EM; West Midlands WM; Yorkshire and Humber YH; East England EE; Greater London GL; South East SE; South West SW; Wales WA).
Fig 7. Prediction accuracy for women (black bar) and men (grey bar) by region in each ethnic group (codes for regions as in Fig 6).
First set of follow-up algorithms and their main characteristics.
| label | Description |
|---|---|
| OM-F | Ethnicity estimation for a person based on forename ethnicity as recorded in Onomap (= OM) |
| OM-G | … based on forename or surname ethnicity, depending on the gender of the person (using Onomap) |
| OM-GA | … based on either forename or surname ethnicity, depending on the gender and age of the record (using Onomap) |
| EE-A | … based on highest mean of the forename and surname |
| EE-M | … based on highest product of the forename and surname |
| EE-R | … based on that combination of forename and surname matching ethnicities with the highest surname |
| EE-RS | … based on highest surname |
| EE-W | … based on highest |
The Census-based surname lookup table for Ethnicity Estimator.
| name | rank | ethnicity code | ethnicity label | weight |
|---|---|---|---|---|
| AARON | 1 | WBR | White British | .81 |
| AARON | 2 | BCA | Black Caribbean | .06 |
| AARON | 3 | OXX | Other ethnic group | .05 |
| AARON | 4 | WAO | White Other | .04 |
| AARON | 5 | BAF | Black African | .04 |
Example of assigning ethnic class using Ethnicity Estimator.
| forename | surname | matched class | forename weight | forename rank | surname weight | surname rank | product | mean | EE-A | EE-M | EE-R | EE-RS | EE-W |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SHERNETTE | AARON | WBR | .07 | 2 | .81 | 1 | .0567 | .44 | - | X | X | X | |
| SHERNETTE | AARON | BCA | .84 | 1 | .06 | 2 | .0504 | .45 | X | - | - | - | X |
| SHERNETTE | AARON | OXX | - | - | .05 | 3 | - | - | - | - | - | - | - |
| SHERNETTE | AARON | WAO | - | - | .04 | 4 | - | - | - | - | - | - | - |
| SHERNETTE | AARON | BAF | - | - | .04 | 5 | - | - | - | - | - | - | - |
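The worked example can be traced with a short sketch using the weights shown in the table. It assumes that only classes matched by both forename and surname are scored (as the dashes in the product and mean columns suggest), and it covers only EE-A, EE-M and EE-RS; the function names are hypothetical.

```python
# Weights for SHERNETTE (forename) and AARON (surname), copied from the table.
forename_weights = {"WBR": 0.07, "BCA": 0.84}
surname_weights = {"WBR": 0.81, "BCA": 0.06, "OXX": 0.05, "WAO": 0.04, "BAF": 0.04}

def shared_classes(fore, sur):
    """Classes present in both lookups (assumption: only these are scored)."""
    return fore.keys() & sur.keys()

def ee_a(fore, sur):
    """EE-A: class with the highest mean of forename and surname weights."""
    return max(shared_classes(fore, sur), key=lambda c: (fore[c] + sur[c]) / 2)

def ee_m(fore, sur):
    """EE-M: class with the highest product of forename and surname weights."""
    return max(shared_classes(fore, sur), key=lambda c: fore[c] * sur[c])

def ee_rs(sur):
    """EE-RS: class with the highest surname weight."""
    return max(sur, key=sur.get)

print(ee_a(forename_weights, surname_weights))  # mean: WBR 0.44, BCA 0.45
print(ee_m(forename_weights, surname_weights))  # product: WBR 0.0567, BCA 0.0504
print(ee_rs(surname_weights))                   # surname weight: WBR 0.81
```

The mean favours Black Caribbean while the product and the surname weight favour White British, which matches the X marks in the EE-A, EE-M and EE-RS columns.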
Aggregate prediction accuracy of all algorithms for each ethnic group.
| algo. | WBR | WIR | WAO | AIN | APK | ABD | ACN | AAO | BAF | BCA | OXX | weighted WBR-OXX | weighted WAO-OXX | unweighted WBR-OXX | unweighted WAO-OXX |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OM | 89.88 | 47.34 | 59.11 | 64.24 | 76.13 | 56.45 | 78.51 | 23.61 | 39.71 | 2.90 | 5.87 | 82.57 | 46.83 | 49.43 | 45.17 |
| OM-F | 84.47 | 42.14 | 55.09 | 61.13 | 70.81 | 44.13 | 59.97 | 22.56 | 32.20 | 4.13 | 77.42 | 43.06 | 44.14 | 39.88 | |
| OM-G | 88.61 | 47.16 | 57.80 | 62.68 | 72.09 | 49.43 | 69.24 | 23.37 | 35.88 | 3.33 | 7.30 | 81.20 | 44.89 | 46.99 | 42.35 |
| OM-GA | 89.81 | 47.75 | 58.73 | 64.20 | 75.97 | 54.20 | 73.00 | 23.98 | 37.42 | 3.31 | 6.24 | 82.42 | 46.26 | 48.60 | 44.12 |
| EE-A | 99.54 | 0.01 | 43.21 | 76.58 | 87.63 | 59.61 | 66.22 | 32.30 | 45.42 | 3.63 | 3.03 | 90.09 | 46.54 | 47.02 | 46.40 |
| EE-M | 99.54 | 0.01 | 43.14 | 76.39 | | 58.93 | 66.15 | 31.97 | 45.22 | 3.43 | 3.46 | 90.08 | 46.51 | 46.93 | 46.30 |
| EE-R | 99.28 | 0.07 | 43.89 | 86.27 | 68.28 | 33.12 | 4.90 | 4.09 | 89.98 | 47.72 | 47.28 | ||||
| EE-RS | 99.01 | 0.02 | 43.53 | 75.59 | 85.11 | 56.77 | 48.19 | 2.20 | 89.72 | 47.01 | 48.28 | ||||
| EE-W | | 0.01 | 42.94 | 75.96 | 86.65 | 59.96 | 70.90 | 32.05 | 44.07 | 2.40 | 1.70 | 90.08 | 46.05 | 46.93 | 46.29 |
| relative population size (%) | 83.00 | 0.95 | 4.05 | 2.46 | 2.03 | 0.77 | 0.68 | 1.12 | 1.48 | 1.02 | 2.44 | ||||
* Means of prediction accuracy across ethnic groups: the weighted figures are weighted by relative population size (last row); the unweighted figures give average prediction accuracy regardless of ethnic group size.
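The two kinds of average can be checked with a small sketch. It uses the per-group accuracies from the complete row beginning 89.88 in the re-weighted accuracy table, whose values coincide with the OM row here, together with the population shares in the last row. It assumes that WAO-OXX means all groups except WBR and WIR; the helper names are hypothetical.

```python
groups = ["WBR", "WIR", "WAO", "AIN", "APK", "ABD", "ACN", "AAO", "BAF", "BCA", "OXX"]
# Relative population size (%) from the last row of the table.
share = [83.00, 0.95, 4.05, 2.46, 2.03, 0.77, 0.68, 1.12, 1.48, 1.02, 2.44]
# Per-group prediction accuracy (%) for Onomap (OM).
om = [89.88, 47.34, 59.11, 64.24, 76.13, 56.45, 78.51, 23.61, 39.71, 2.90, 5.87]

def unweighted_mean(acc):
    """Plain average: every ethnic group counts equally."""
    return sum(acc) / len(acc)

def weighted_mean(acc, weights):
    """Average weighted by relative population size."""
    return sum(a * w for a, w in zip(acc, weights)) / sum(weights)

start = groups.index("WAO")  # assumption: WAO-OXX drops WBR and WIR

unweighted_all = unweighted_mean(om)
unweighted_sub = unweighted_mean(om[start:])
weighted_all = weighted_mean(om, share)
weighted_sub = weighted_mean(om[start:], share[start:])

print(round(unweighted_all, 2), round(unweighted_sub, 2))
print(round(weighted_all, 2), round(weighted_sub, 2))
```

The unweighted results agree with the tabulated 49.43 and 45.17. The weighted results come out within a few hundredths of the tabulated 82.57 and 46.83, a gap that is plausibly due to rounding of the population shares.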
Fig 8. Prediction accuracy of OM-GA for women (solid line) and men (dashed line) by age in each ethnic group.
Instances of incompleteness of forename and surname pairs, and types of inconsistency between forenames and surnames.
| code | Description |
|---|---|
| _s | forename missing, surname available |
| f_ | forename available, surname missing |
| fs | both forename and surname available |
| 1u | forename and surname have no possible ethnic group in common |
| con | consonant: both forename and surname are classified as the same census ethnicity |
| dis | dissonant: forename and surname are classified as different census ethnicities |
| com | surnames are composites, either hyphenated or spaced (e.g. ‘Brown-Taylor’, ‘De Smith’) |
| mis | surname missing |
| sgl | surnames only have single component (e.g. ‘Brown’, ‘Smith’, ‘Taylor’, ‘O’Brien’) |
| su | surname not matched |
Fig 9. Prediction accuracy of EE-A by format of names in each ethnic group.
Re-weighted modifications of algorithms to mitigate the dominance of the ‘White British’ category for EE-A and EE-RS.
| label | description | surname weight | forename weight |
|---|---|---|---|
| EE-A1 | EE-A with WBR re-weighted (surnames) | 1/3 | 1 |
| EE-A2 | EE-A with WBR re-weighted (surnames) | 1/9 | 1 |
| EE-A3 | EE-A with WBR re-weighted (fore and surnames) | 1/3 | 1/3 |
| EE-A4 | EE-A with WBR re-weighted (fore and surnames) | 1/9 | 1/3 |
| EE-A5 | EE-A with WBR re-weighted (fore and surnames) | 1/3 | 1/9 |
| EE-A6 | EE-A with WBR re-weighted (fore and surnames) | 1/9 | 1/9 |
| EE-RS1 | EE-RS with WBR re-weighted (surnames) | 1/3 | 1 |
| EE-RS2 | EE-RS with WBR re-weighted (surnames) | 1/9 | 1 |
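One possible reading of the re-weighting is sketched below. It assumes that the listed factor multiplies the White British (WBR) weight before the EE-A mean is taken; the table does not spell out the mechanism, so this is an interpretation rather than the authors' implementation. The example weights and the function name `ee_a_reweighted` are hypothetical.

```python
def ee_a_reweighted(fore, sur, surname_factor=1.0, forename_factor=1.0):
    """EE-A with the WBR weights scaled down before averaging (assumed mechanism)."""
    scores = {}
    for cls in fore.keys() & sur.keys():
        f = fore[cls] * (forename_factor if cls == "WBR" else 1.0)
        s = sur[cls] * (surname_factor if cls == "WBR" else 1.0)
        scores[cls] = (f + s) / 2
    return max(scores, key=scores.get)

# Hypothetical weights for a name pair that leans White British on the surname.
fore = {"WBR": 0.40, "AIN": 0.50}
sur = {"WBR": 0.70, "AIN": 0.30}

plain = ee_a_reweighted(fore, sur)                      # EE-A: WBR 0.55 vs AIN 0.40
a1 = ee_a_reweighted(fore, sur, surname_factor=1 / 3)   # EE-A1: WBR about 0.32 vs AIN 0.40
a3 = ee_a_reweighted(fore, sur, surname_factor=1 / 3, forename_factor=1 / 3)  # EE-A3
print(plain, a1, a3)
```

Under this reading, lowering the WBR weight lets a minority class win for pairs where White British would otherwise narrowly dominate, which is the stated aim of the modification.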
Aggregate prediction accuracy of re-weighted algorithms for each ethnic group.
| algo. | WBR | WIR | WAO | AIN | APK | ABD | ACN | AAO | BAF | BCA | OXX | weighted WBR-OXX | weighted WAO-OXX | unweighted WBR-OXX | unweighted WAO-OXX |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OM | 89.88 | 47.34 | 59.11 | 64.24 | 76.13 | 56.45 | 78.51 | 23.61 | 39.71 | 2.90 | 5.87 | 82.57 | 46.83 | 49.43 | 45.17 |
| EE-A | 99.54 | 0.01 | 43.21 | 76.58 | 87.63 | 59.61 | 66.22 | 32.30 | 45.42 | 3.63 | 3.03 | 90.09 | 46.54 | 47.02 | 46.40 |
| EE-A1 | 1.00 | 46.00 | 78.00 | 89.00 | 60.00 | 68.00 | 33.00 | 46.00 | 5.00 | 4.00 | 89.89 | 48.07 | 57.18 | 47.67 | |
| EE-A2 | 4.00 | 47.00 | 79.00 | 89.00 | 70.00 | 34.00 | 46.00 | 8.00 | 4.00 | 90.05 | 48.87 | 58.27 | 48.67 | ||
| EE-A3 | 2.00 | 55.00 | 79.00 | 89.00 | 71.00 | 34.00 | 50.00 | 8.00 | 4.00 | 51.29 | 59.27 | 50.11 | |||
| EE-A4 | 98.00 | 7.00 | 56.00 | 80.00 | 78.00 | 51.00 | 11.00 | 5.00 | 89.85 | 52.63 | 61.09 | 51.89 | |||
| EE-A5 | 97.00 | 7.00 | 64.00 | 80.00 | 72.00 | 11.00 | 9.00 | 89.43 | 55.18 | 61.73 | 52.78 | ||||
| EE-A6 | 95.00 | 78.00 | 88.12 | ||||||||||||
| EE-RS1 | 98.00 | 1.00 | 49.00 | 77.00 | 86.00 | 57.00 | 83.00 | 49.00 | 8.00 | 2.00 | 89.23 | 49.09 | 58.64 | 49.56 | |
| EE-RS2 | 95.00 | 51.00 | 79.00 | 86.00 | 58.00 | 50.00 | 3.00 | 87.22 | 50.83 | 61.91 | 51.56 | ||||
| relative frequency (%) | 83.00 | 0.95 | 4.05 | 2.46 | 2.03 | 0.77 | 0.68 | 1.12 | 1.48 | 1.02 | 2.44 | ||||
* Weighted averages across the ethnic groups shown, weighted by the relative frequency of each ethnic group given in the bottom row. High column-wise values are highlighted in bold. Note that EE values for individual ethnic groups were rounded by ONS.