Guoqiang Wu, Alison Heppenstall, Petra Meier, Robin Purshouse, Nik Lomax.
Abstract
In order to understand the health outcomes for distinct sub-groups of the population or across different geographies, it is advantageous to be able to build bespoke groupings from individual-level data. Individuals possess distinct characteristics, exhibit distinct behaviours and accumulate their own unique history of exposures or experiences. However, in most disciplines, not least public health, there is a lack of individual-level data available outside of secure settings, especially data covering large portions of the population. This paper details the creation of a synthetic microdataset of individuals in Great Britain with detailed attributes that can be used to model a wide range of health and other outcomes. These attributes are constructed from a range of sources, including the United Kingdom Census, survey datasets and administrative datasets. The paper sets out the rationale for this synthetic population, discusses the methods used to create it and presents example distributions of attributes for distinct sub-population groups and across different geographical areas.
Year: 2022 PMID: 35058471 PMCID: PMC8776798 DOI: 10.1038/s41597-022-01124-9
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Fig. 1. Methodological framework for generating the synthetic population.
Summary of linking variables in Understanding Society adult microdata.
| Variable | Description | Values and categories |
|---|---|---|
| PID | Personal identifier code | e.g. 476867687, 483520936, etc. |
| Sex/age | Sex and age group | M_16_24 (Male, aged 16–24 years) |
| | | M_25_34 (Male, aged 25–34 years) |
| | | M_35_49 (Male, aged 35–49 years) |
| | | M_50_64 (Male, aged 50–64 years) |
| | | M_65_74 (Male, aged 65–74 years) |
| | | M_75+ (Male, aged 75 years and over) |
| | | F_16_24 (Female, aged 16–24 years) |
| | | F_25_34 (Female, aged 25–34 years) |
| | | F_35_49 (Female, aged 35–49 years) |
| | | F_50_64 (Female, aged 50–64 years) |
| | | F_65_74 (Female, aged 65–74 years) |
| | | F_75+ (Female, aged 75 years and over) |
| Ecostatus | Economic status | In paid employment |
| | | Self-employed |
| | | Unemployed |
| | | Full-time student |
| | | Retired |
| | | Looking after home or family |
| | | Long-term sick or disabled |
| | | Others |
| Hiqualif | Highest educational qualification | None (no qualification) |
| | | Level 1 or 2 (O Levels/CSE/GCSEs or equivalent) |
| | | Level 3 (A Levels or equivalent) |
| | | Level 4 or above (Degrees or higher degrees) |
| | | Others (e.g. apprenticeships) |
| Marstat | Marital status and civil partnership status | Single |
| | | Married |
| | | Civil partnership |
| | | Separated |
| | | Divorced |
| | | Widowed or surviving partner |
| Ethnicity | Ethnic group | White |
| | | Asian |
| | | Black |
| | | Mixed |
| | | Others |
| Hhtype | Composition of household | 1_adult_no_child (1 adult only) |
| | | 1_adult_child (1 adult with child/children) |
| | | 1_couple_no_child (1 couple without child) |
| | | 1_couple_child (1 couple with child/children) |
| | | Others_no_child (Other compositions without child) |
| | | Others_child (Other compositions with child/children) |
| Tenure | Housing tenure | Owned outright |
| | | Owned mortgage |
| | | Social rented |
| | | Private rented |
| | | Others |
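The Sex/age codes above combine sex with one of six adult age bands. The following is a hypothetical sketch showing how an individual's sex and single year of age could be mapped to these codes and back, assuming the band edges are exactly as listed in the table. The names `sex_age_code` and `parse_sex_age` are illustrative and are not part of the dataset.

```python
from __future__ import annotations

# Adult age bands as listed for the Sex/age linking variable; 75+ is open-ended.
AGE_BANDS = [(16, 24), (25, 34), (35, 49), (50, 64), (65, 74)]


def sex_age_code(sex: str, age: int) -> str:
    """Return a code such as 'M_25_34' or 'F_75+' for an adult aged 16 or over."""
    if sex not in ("M", "F"):
        raise ValueError(f"sex must be 'M' or 'F', got {sex!r}")
    if age < 16:
        raise ValueError("the adult microdata covers ages 16 and over")
    for lo, hi in AGE_BANDS:
        if lo <= age <= hi:
            return f"{sex}_{lo}_{hi}"
    return f"{sex}_75+"


def parse_sex_age(code: str) -> tuple[str, int, int | None]:
    """Split a code into (sex, lower age, upper age); upper is None for '75+'."""
    sex, band = code.split("_", 1)
    if band.endswith("+"):
        return sex, int(band[:-1]), None
    lo, hi = band.split("_")
    return sex, int(lo), int(hi)


print(sex_age_code("F", 40))   # F_35_49
print(parse_sex_age("M_75+"))  # ('M', 75, None)
```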
Example of formatted adult microdata.
| PID | Sex/age | Ecostatus | Hiqualif | Marstat | Ethnicity | Hhtype | Tenure |
|---|---|---|---|---|---|---|---|
| 476867699 | F_16_24 | In_paid_employment | Level_3 | Single | White | Others_no_child | Private_rented |
| 477211091 | M_75+ | Retired | Others | Married | White | 1_couple_no_child | Owned_outright |
| 477285207 | F_35_49 | Self_employed | Level_4_above | Single | White | 1_adult_child | Owned_outright |
| 478511255 | M_16_24 | Student | Level_3 | Single | Mixed | Others_no_child | Owned_outright |
| 477034289 | F_25_34 | In_paid_employment | Others | Single | White | Others_no_child | Owned_outright |
| 478631609 | M_16_24 | In_paid_employment | Level_3 | Single | White | Others_no_child | Owned_outright |
| 478913817 | F_16_24 | In_paid_employment | Level_3 | Single | White | Others_no_child | Owned_outright |
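Before reweighting, categorical microdata of this kind are commonly expanded into a 0/1 indicator matrix. The columns of this matrix line up with the categories of a constraint table, so that weighted column sums can be compared with a zone's counts. The sketch below assumes that approach. The two records are copied from the example above (Tenure only), and the helper names are hypothetical.

```python
from __future__ import annotations

# Housing tenure categories in a fixed order matching a constraint table's columns.
TENURE = ["Owned_outright", "Owned_mortgage", "Social_rented", "Private_rented", "Others"]

# Two rows from the formatted adult microdata example (other variables omitted).
MICRODATA = [
    {"PID": "476867699", "Tenure": "Private_rented"},
    {"PID": "477211091", "Tenure": "Owned_outright"},
]


def indicator_matrix(records: list[dict], variable: str, categories: list[str]) -> list[list[int]]:
    """One row per individual, one 0/1 column per category of `variable`."""
    matrix = []
    for rec in records:
        value = rec[variable]
        if value not in categories:
            raise KeyError(f"PID {rec['PID']}: unknown {variable} category {value!r}")
        matrix.append([1 if value == cat else 0 for cat in categories])
    return matrix


def simulated_counts(matrix: list[list[int]], weights: list[float]) -> list[float]:
    """Weighted column sums: the simulated count for each category in one zone."""
    return [sum(w * row[j] for w, row in zip(weights, matrix)) for j in range(len(matrix[0]))]


m = indicator_matrix(MICRODATA, "Tenure", TENURE)
print(m)                                # [[0, 0, 0, 1, 0], [1, 0, 0, 0, 0]]
print(simulated_counts(m, [2.5, 4.0]))  # [4.0, 0.0, 0.0, 2.5, 0.0]
```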
Constraints and source datasets for adult and child microsimulation models.
| Model | Constraint | Source datasets |
|---|---|---|
| Adult & Child | Sex/age | ONS Mid-year (2018) population estimates - 2011 LSOA based by single year of age ( |
| | | NRS Mid-2018 Small Area Population Estimates Scotland for 2011 Data Zones - by single year of age ( |
| Adult | Economic status | 2011 Census Table LC6107EW - Economic activity by sex by age for 2011 LSOA ( |
| | | 2011 Scotland’s Census Table LC6107SC - Economic activity by age for 2011 Data Zone ( |
| Adult | Highest level of qualification | 2011 Census Table QS501EW - Highest level of qualification for 2011 LSOA ( |
| | | 2011 Scotland’s Census Table QS501SC - Highest level of qualification for 2011 Data Zone ( |
| Adult | Marital status | 2011 Census Table KS103EW - Marital and civil partnership status for 2011 LSOA ( |
| | | 2011 Scotland’s Census Table KS103SC - Marital and civil partnership status for 2011 Data Zone ( |
| Adult | Ethnicity | 2011 Census Table LC2109EWLS - Ethnic group by age for LSOA ( |
| | | 2011 Scotland’s Census Table LC2101SC - Ethnic group by age for 2011 Data Zone ( |
| Adult | Household composition | 2011 Census Table LC1109EW - Household composition by age by sex for 2011 LSOA ( |
| | | 2011 Scotland’s Census Table LC1109SC - Household composition by age for 2011 Data Zone ( |
| Adult | Housing Tenure | 2011 Census Table LC3409EW - General health by tenure by age for 2011 LSOA ( |
| | | 2011 Scotland’s Census Table LC4302SC - Tenure by general health by long-term health problem or disability by age for Data Zone ( |
(Note: For the adult model, constraints cover only usual residents aged 16 and over, derived from the original datasets. For the child model, the constraint covers only usual residents aged 15 and under.)
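The sex/age constraint is built from population estimates by single year of age. The microdata, however, use the banded Sex/age codes, and the note above splits residents at age 16. The following is a minimal sketch of that aggregation for one zone and one sex. It assumes the input is a mapping from single year of age to number of persons; the function names and the sample counts are hypothetical.

```python
from __future__ import annotations

# Adult bands from the Sex/age linking variable; None marks the open-ended 75+ band.
ADULT_BANDS = [(16, 24), (25, 34), (35, 49), (50, 64), (65, 74), (75, None)]


def band_label(sex: str, lo: int, hi: int | None) -> str:
    return f"{sex}_{lo}+" if hi is None else f"{sex}_{lo}_{hi}"


def aggregate_single_year(counts: dict[int, int], sex: str) -> tuple[dict[str, int], int]:
    """Collapse age -> persons into adult band counts plus a child (0-15) total."""
    adult = {band_label(sex, lo, hi): 0 for lo, hi in ADULT_BANDS}
    child_total = 0
    for age, persons in counts.items():
        if age < 16:
            child_total += persons  # belongs to the child model's constraint
            continue
        for lo, hi in ADULT_BANDS:
            if age >= lo and (hi is None or age <= hi):
                adult[band_label(sex, lo, hi)] += persons
                break
    return adult, child_total


# Hypothetical single-year counts for males in one zone.
adult, children = aggregate_single_year({10: 5, 16: 3, 24: 2, 25: 4, 80: 7, 90: 1}, "M")
print(adult["M_16_24"], adult["M_25_34"], adult["M_75+"], children)  # 5 4 8 5
```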
Extracted sample of the sex/age constraint for adult model.
| LSOA code | M_16_24 | M_25_34 | M_35_49 | M_50_64 | M_65_74 | M_75+ | …. | F_75+ | Total (adult) baseline |
|---|---|---|---|---|---|---|---|---|---|
| E01004766 | 99 | 114 | 153 | 157 | 99 | 76 | …. | 99 | 1356 |
| E01004767 | 114 | 158 | 191 | 155 | 70 | 71 | …. | 100 | 1513 |
| E01004768 | 73 | 75 | 120 | 201 | 93 | 53 | …. | 57 | 1267 |
(Note: Table is only for illustrative purposes and therefore does not present all categories).
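Each zone's sex/age constraint carries a baseline adult total (the last column above). The other constraints come from 2011 Census tables and need not sum to the same figure. One common way to make constraints consistent before reweighting is to scale each one proportionally to the baseline total. The source does not state that this exact step was used, so the sketch below (with hypothetical counts) should be read as an assumption.

```python
from __future__ import annotations


def rescale_to_baseline(constraint: dict[str, float], baseline_total: float) -> dict[str, float]:
    """Scale category counts so they sum to baseline_total, keeping their proportions."""
    current = sum(constraint.values())
    if current <= 0:
        raise ValueError("constraint has no positive counts to rescale")
    factor = baseline_total / current
    return {category: count * factor for category, count in constraint.items()}


# Hypothetical tenure counts for one zone summing to 1000, scaled to a baseline of 1500.
scaled = rescale_to_baseline({"Owned_outright": 300, "Owned_mortgage": 500, "Social_rented": 200}, 1500)
print(scaled)  # {'Owned_outright': 450.0, 'Owned_mortgage': 750.0, 'Social_rented': 300.0}
```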
Extracts from the economic status and highest level of qualification constraints.
| LSOA code | Economic status: In paid employment | Economic status: Self-employed | Economic status: Retired | Economic status: Full-time student | …. | Highest qualification: Level 1/2 | Highest qualification: Level 3 | Highest qualification: Level 4/above | …. |
|---|---|---|---|---|---|---|---|---|---|
| E01004766 | 630 | 89 | 181 | 88 | …. | 407 | 163 | 267 | …. |
| E01004767 | 697 | 104 | 183 | 100 | …. | 442 | 212 | 312 | …. |
| E01004768 | 693 | 148 | 184 | 84 | …. | 379 | 169 | 488 | …. |
(Note: Table is only for illustrative purposes and therefore does not present all categories).
Extracts from the marital status and ethnicity constraints.
| LSOA code | Marital status: Single | Marital status: Married | Marital status: Separated | Marital status: Divorced | …. | Ethnicity: White | Ethnicity: Mixed | Ethnicity: Asian | Ethnicity: Black | Ethnicity: Others |
|---|---|---|---|---|---|---|---|---|---|---|
| E01004766 | 464 | 502 | 51 | 156 | …. | 1375 | 26 | 161 | 28 | 2 |
| E01004767 | 511 | 594 | 54 | 120 | …. | 1337 | 30 | 298 | 10 | 3 |
| E01004768 | 279 | 826 | 24 | 88 | …. | 1501 | 28 | 63 | 4 | 5 |
(Note: Table is only for illustrative purposes and therefore does not present all categories).
Extracts from the household composition and housing tenure constraints.
| LSOA code | Household composition: 1 adult no child | Household composition: 1 adult with child | Household composition: 1 couple no child | Household composition: 1 couple with child | …. | Housing tenure: Owned outright | Housing tenure: Owned mortgage | Housing tenure: Social rented | …. |
|---|---|---|---|---|---|---|---|---|---|
| E01004766 | 350 | 222 | 340 | 544 | …. | 440 | 636 | 180 | …. |
| E01004767 | 286 | 143 | 398 | 646 | …. | 527 | 694 | 85 | …. |
| E01004768 | 100 | 90 | 422 | 872 | …. | 519 | 970 | 16 | …. |
(Note: Table is only for illustrative purposes and therefore does not present all categories).
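The constraint tables above are the targets against which the microdata are reweighted for each zone. Iterative proportional fitting (IPF) is a widely used technique for this in spatial microsimulation. This excerpt does not specify the reweighting algorithm, so what follows is a minimal, hypothetical sketch of IPF for a single zone, not the authors' implementation. Each individual starts with a weight of 1. For every category of every constraint, the weights of individuals in that category are multiplied by target / simulated count. Fractional weights of this kind are typically converted to whole individuals (integerised) afterwards; that step is not shown.

```python
from __future__ import annotations


def ipf_weights(
    indicators: dict[str, list[list[int]]],
    targets: dict[str, list[float]],
    n_iter: int = 20,
) -> list[float]:
    """Fit one weight per individual so weighted category sums match each constraint.

    indicators[name][i][j] is 1 if individual i falls in category j of constraint `name`;
    targets[name][j] is the zone's count for that category.
    """
    n_people = len(next(iter(indicators.values())))
    weights = [1.0] * n_people
    for _ in range(n_iter):
        for name, rows in indicators.items():
            for j, target in enumerate(targets[name]):
                simulated = sum(w * row[j] for w, row in zip(weights, rows))
                if simulated == 0:
                    continue  # nobody in this category, so IPF cannot match this cell
                factor = target / simulated
                weights = [w * factor if row[j] else w for w, row in zip(weights, rows)]
    return weights


# Four hypothetical individuals: (M, owner), (M, renter), (F, owner), (F, renter).
indicators = {
    "sex": [[1, 0], [1, 0], [0, 1], [0, 1]],
    "tenure": [[1, 0], [0, 1], [1, 0], [0, 1]],
}
targets = {"sex": [60, 40], "tenure": [70, 30]}
print([round(w, 6) for w in ipf_weights(indicators, targets)])  # [42.0, 18.0, 28.0, 12.0]
```

Because categories within one constraint are mutually exclusive, adjusting one category does not disturb the others in the same constraint. Only the later constraints can pull earlier ones off target, which is why the loop repeats.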
Example of individual health and household income summaries in adult microdata.
| PID | Subjective wellbeing (Likert score) | Subjective wellbeing (Caseness score) | SF-12 Physical Component Summary | SF-12 Mental Component Summary | Total household net income per month (£) |
|---|---|---|---|---|---|
| 476867699 | 14 | 4 | 58.69 | 31.54 | 2459.68 |
| 477211091 | 10 | 0 | 42.18 | 49.02 | 2068.67 |
| 477285207 | 17 | 6 | 28.86 | 37.48 | 2986.60 |
| 478511255 | 7 | 0 | 46.55 | 55.97 | 6823.43 |
| 477034289 | 16 | 5 | 61.80 | 48.18 | 3614.15 |
| 478631609 | 6 | 0 | 59.91 | 52.21 | 7750.00 |
| 478913817 | 18 | 5 | 52.18 | 25.61 | 2642.31 |
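In Understanding Society, the two subjective wellbeing measures are derived from the 12-item General Health Questionnaire (GHQ-12). That link is an assumption here, because the table does not name the instrument. Under the usual scoring, each item is coded 0 to 3. The Likert score is the sum of the items (0 to 36), and the caseness score counts the items answered in the two most distressed categories (0 to 12). On both scores, higher values indicate poorer wellbeing. A minimal sketch (the function name is hypothetical):

```python
from __future__ import annotations


def ghq12_scores(items: list[int]) -> tuple[int, int]:
    """Return (Likert score, caseness score) for 12 GHQ item responses coded 0-3."""
    if len(items) != 12:
        raise ValueError("GHQ-12 has exactly 12 items")
    if any(r not in (0, 1, 2, 3) for r in items):
        raise ValueError("each response must be coded 0, 1, 2 or 3")
    likert = sum(items)                         # 0 (least distress) to 36
    caseness = sum(1 for r in items if r >= 2)  # items in the two 'worse' categories
    return likert, caseness


# Hypothetical response pattern.
print(ghq12_scores([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0]))  # (14, 4)
```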
Fig. 2. Examples of aggregated estimates of health conditions at LSOA (or equivalent) level in the four selected city regions.
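Zone-level figures of this kind can be produced by joining each synthetic individual to the outcome attributes of the microdata record it was drawn from (via PID) and then averaging within each zone. The sketch below assumes the synthetic population is a list of (zone, PID) pairs. The PCS values come from the health summary table above, but the zone codes and their membership are hypothetical.

```python
from collections import defaultdict

# SF-12 Physical Component Summary by PID, from the example health summaries.
PCS = {"476867699": 58.69, "477211091": 42.18, "477285207": 28.86}

# Hypothetical synthetic population: (zone code, PID of the cloned microdata record).
SYNTHETIC = [
    ("ZONE_A", "476867699"), ("ZONE_A", "477211091"),
    ("ZONE_B", "477285207"), ("ZONE_B", "477285207"), ("ZONE_B", "476867699"),
]


def zone_means(synthetic, outcome):
    """Mean of a per-PID outcome over the synthetic individuals in each zone."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for zone, pid in synthetic:
        totals[zone] += outcome[pid]
        counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


print(zone_means(SYNTHETIC, PCS))
```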
Validation metrics for the comparison of simulated and actual counts in each constraint.
| Constraint | R² | SRMSE | RE |
|---|---|---|---|
| Sex/age | 0.999997 | 0.000478 | 0.000002 |
| Economic status | 0.999759 | 0.114631 | 0.005403 |
| Highest level of qualification | 0.986956 | 0.270564 | 0.014439 |
| Marital status | 0.999968 | 0.085743 | 0.002326 |
| Ethnicity | 0.999991 | 0.001056 | 0.000095 |
| Household composition | 0.995798 | 0.246496 | 0.009777 |
| Housing tenure | 0.984656 | 0.320565 | 0.020327 |
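The table does not define its metrics. The following definitions are common in spatial microsimulation and are assumed here:

- R² is the squared Pearson correlation between observed and simulated counts across all zones and categories of a constraint.
- SRMSE is the root mean square error divided by the mean observed count.
- RE is the total absolute error divided by the total observed count.

A sketch with hypothetical counts:

```python
from __future__ import annotations

import math


def validation_metrics(observed: list[float], simulated: list[float]) -> tuple[float, float, float]:
    """Return (R², SRMSE, RE) for flattened observed and simulated constraint counts."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    r2 = cov ** 2 / (var_o * var_s)  # undefined if either series is constant
    rmse = math.sqrt(sum((s - o) ** 2 for o, s in zip(observed, simulated)) / n)
    srmse = rmse / mean_o
    rel_err = sum(abs(s - o) for o, s in zip(observed, simulated)) / sum(observed)
    return r2, srmse, rel_err


# Hypothetical counts for one constraint, flattened over zones and categories.
r2, srmse, rel_err = validation_metrics([10, 20, 30, 40], [11, 19, 30, 40])
print(f"R2={r2:.4f} SRMSE={srmse:.4f} RE={rel_err:.4f}")  # R2=0.9963 SRMSE=0.0283 RE=0.0200
```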
Correlations between IMD/WIMD/SIMD rank and microsimulation estimates of income and health conditions at LSOA or Data Zone level.
| Spearman’s rho | England (Greater Manchester & Sheffield) | Wales (Cardiff) | Scotland (Glasgow) |
|---|---|---|---|
| Total household net income per month | 0.8718 (P-value < 0.0001) | 0.8842 (P-value < 0.0001) | 0.8591 (P-value < 0.0001) |
| SF-12 Physical Component Summary | 0.7224 (P-value < 0.0001) | 0.7352 (P-value < 0.0001) | 0.7641 (P-value < 0.0001) |
| SF-12 Mental Component Summary | 0.8704 (P-value < 0.0001) | 0.8839 (P-value < 0.0001) | 0.8454 (P-value < 0.0001) |
| Subjective wellbeing (Likert score) | −0.9041 (P-value < 0.0001) | −0.9174 (P-value < 0.0001) | −0.8867 (P-value < 0.0001) |
| Subjective wellbeing (Caseness score) | −0.8992 (P-value < 0.0001) | −0.9135 (P-value < 0.0001) | −0.8783 (P-value < 0.0001) |
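Spearman's rho is the Pearson correlation of the two variables' ranks, with tied values given the average of the ranks they span. In IMD, WIMD and SIMD alike, rank 1 is the most deprived area. A positive rho (income, SF-12 summaries) therefore means higher estimates in less deprived areas. The negative rho for the wellbeing scores points the same way if higher scores indicate more distress. Below is a dependency-free sketch with hypothetical inputs. It does not compute the p-values reported in the table.

```python
from __future__ import annotations

import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x: list[float], y: list[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical deprivation ranks (1 = most deprived) and mean monthly income per zone.
imd_rank = [1, 2, 3, 4, 5]
income = [1800.0, 2100.0, 2050.0, 2600.0, 3100.0]
print(round(spearman_rho(imd_rank, income), 4))  # 0.9
```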
Fig. 3. Comparison of microsimulation estimates of annual household net income with ONS income estimates at MSOA level.
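The microdata report household net income per month, but the comparison uses annual estimates at MSOA level, a coarser geography than LSOA. The following is a minimal sketch of the two conversion steps. It assumes that zone means are annualised by multiplying by 12 and then combined into MSOAs as record-weighted averages through an LSOA-to-MSOA lookup. The lookup, codes and figures are hypothetical. Whether the published comparison weights by person or by household is not stated here.

```python
from __future__ import annotations

from collections import defaultdict


def msoa_annual_income(
    lsoa_stats: dict[str, tuple[float, int]], lsoa_to_msoa: dict[str, str]
) -> dict[str, float]:
    """lsoa_stats maps LSOA -> (mean monthly household net income, number of records)."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for lsoa, (mean_monthly, n_records) in lsoa_stats.items():
        msoa = lsoa_to_msoa[lsoa]
        totals[msoa] += mean_monthly * 12 * n_records  # annualise, then weight by records
        counts[msoa] += n_records
    return {msoa: totals[msoa] / counts[msoa] for msoa in totals}


# Hypothetical LSOAs "L1" and "L2", both inside MSOA "M1".
print(msoa_annual_income({"L1": (2000.0, 100), "L2": (3000.0, 300)}, {"L1": "M1", "L2": "M1"}))
# {'M1': 33000.0}
```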
| Field | Value |
|---|---|
| Measurement(s) | Health Disparity Populations • socio-economic outcomes |
| Technology Type(s) | computational modeling technique • digital curation |
| Factor Type(s) | geographical area |
| Sample Characteristic - Location | Great Britain |