Frank Popham, Elise Whitley, Oarabile Molaodi, Linsay Gray.
Abstract
BACKGROUND: Health surveys provide a rich array of information, but on relatively small numbers of individuals, and evidence suggests that they are becoming less representative as response levels fall. Routinely collected administrative data offer more extensive population coverage but typically cover fewer health topics. We explore whether combining the two data sources and multiply imputing health variables from survey data is a simple and robust way of generating these variables for the general population.
Keywords: Census; Multiple imputation; Surveys; Validation
Year: 2021 PMID: 34303377 PMCID: PMC8310590 DOI: 10.1186/s12982-021-00099-z
Source DB: PubMed Journal: Emerg Themes Epidemiol ISSN: 1742-7622
Distribution of harmonised key and auxiliary imputation variables in census and survey datasets
| | Census data (n (%)) | Survey data (n (%)) |
|---|---|---|
| Key variables | ||
| Sex | ||
| Male | 687,812 (49.5) | 64,166 (47.9) |
| Female | 702,282 (50.5) | 69,817 (52.1) |
| Age group | ||
| 25–29 | 179,770 (12.9) | 14,742 (11.0) |
| 30–34 | 173,146 (12.5) | 15,674 (11.7) |
| 35–39 | 176,241 (12.7) | 16,107 (12.0) |
| 40–44 | 193,181 (13.9) | 18,178 (13.6) |
| 45–49 | 192,828 (13.9) | 18,596 (13.9) |
| 50–54 | 169,063 (12.2) | 17,300 (12.9) |
| 55–59 | 148,525 (10.7) | 15,998 (11.9) |
| 60–64 | 157,340 (11.3) | 17,488 (13.0) |
| Housing tenure | ||
| Owns home outright | 297,630 (21.4) | 30,938 (23.1) |
| Mortgage/shared ownership | 637,581 (45.9) | 62,250 (46.4) |
| Private rent | 256,592 (18.5) | 21,518 (16.1) |
| Social rent | 198,291 (14.3) | 19,377 (14.5) |
| English region | ||
| North east | 67,752 (4.9) | 10,994 (8.2) |
| North west | 183,198 (13.2) | 21,460 (16.0) |
| Yorkshire and the Humber | 136,179 (9.8) | 14,438 (10.8) |
| East Midlands | 117,676 (8.5) | 9,064 (6.8) |
| West Midlands | 143,120 (10.3) | 14,076 (10.5) |
| East of England | 152,850 (11.0) | 11,856 (8.8) |
| London | 230,677 (16.6) | 17,830 (13.3) |
| South east | 223,883 (16.1) | 20,509 (15.3) |
| South west | 134,759 (9.7) | 13,856 (10.3) |
| Auxiliary variables | ||
| Highest educational qualification<sup>a</sup> | ||
| Higher education | 467,010 (33.6) | 43,244 (32.3) |
| School (advanced) | 165,525 (11.9) | 26,421 (19.7) |
| School (standard) | 413,096 (29.7) | 36,913 (27.5) |
| Other qualifications | 128,137 (9.2) | 10,417 (7.8) |
| No formal qualifications | 216,326 (15.6) | 17,088 (12.7) |
| Marital status | ||
| Single | 410,131 (29.5) | 35,650 (26.6) |
| Married | 747,394 (53.8) | 77,440 (57.8) |
| Civil partnership | 4327 (0.3) | 486 (0.4) |
| Divorced | 153,731 (11.1) | 13,559 (10.1) |
| Widowed | 24,058 (1.7) | 2412 (1.8) |
| Separated | 50,453 (3.6) | 4536 (3.4) |
| Country of birth | ||
| UK | 1,141,684 (82.1) | 112,135 (83.6) |
| European Union | 74,891 (5.4) | 6434 (4.8) |
| Other | 173,519 (12.5) | 15,514 (11.6) |
| Ethnicity | ||
| White | 1,196,262 (86.1) | 117,832 (87.9) |
| Mixed race | 19,874 (1.4) | 973 (0.7) |
| Asian | 110,598 (8.0) | 9729 (7.3) |
| Black | 47,930 (3.5) | 3757 (2.8) |
| Other | 15,430 (1.1) | 1792 (1.3) |
<sup>a</sup> Higher education: Degree level qualification (or equivalent)/Higher education qualification below degree level; School (advanced): A Level/Higher/Advanced Diploma/Progression Diploma/ONC/National Level BTEC; School (standard): O Level or GCSE equivalent/O Grade or CSE equivalent/Standard Grade/Higher or Foundation diploma; Other qualifications: including foreign qualifications below degree level
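The four key variables in the table above have 2, 8, 4 and 9 categories respectively, and their full cross-classification gives 2 × 8 × 4 × 9 = 576 cells. That count matches the 576 groups used in the later group-level comparison, which suggests (but the excerpt does not state) that the groups are defined by these key variables. The following Python sketch enumerates the cells; the names `KEY_VARIABLES` and `enumerate_strata` are illustrative and do not come from the source.

```python
from itertools import product

# Category labels copied from the key variables in the table above.
KEY_VARIABLES = {
    "sex": ["Male", "Female"],
    "age_group": ["25–29", "30–34", "35–39", "40–44",
                  "45–49", "50–54", "55–59", "60–64"],
    "housing_tenure": ["Owns home outright", "Mortgage/shared ownership",
                       "Private rent", "Social rent"],
    "region": ["North east", "North west", "Yorkshire and the Humber",
               "East Midlands", "West Midlands", "East of England",
               "London", "South east", "South west"],
}


def enumerate_strata(variables):
    """Return one dict per combination of categories (hypothetical helper)."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*variables.values())]


strata = enumerate_strata(KEY_VARIABLES)
print(len(strata))   # number of cross-classified groups
print(strata[0])     # first group: first category of every variable
```

Each dictionary can then serve as a lookup key when tabulating the proportion with bad or very bad health within a group.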
Overall distribution of self-rated health in original census data versus data from or imputed from survey data
| | Census data, % (N) | Raw survey data, % (N) | Imputed from survey data: standard logistic regression, % (95% CI) | Imputed from survey data: Poisson regression, % (95% CI) | Imputed from survey data: ordinal logistic regression, % (95% CI) | Imputed from survey data: multinomial logistic regression, % (95% CI) |
|---|---|---|---|---|---|---|
| Self-rated health | | | | | | |
| Very good | 42.6 (592,633) | 37.9 (50,792) | – | – | 38.0 (37.7, 38.3) | 38.2 (37.9, 38.5) |
| Good | 40.2 (558,445) | 40.5 (54,281) | – | – | 40.2 (39.8, 40.5) | 40.1 (39.8, 40.4) |
| Fair | 12.1 (167,813) | 15.4 (20,707) | – | – | 15.6 (15.4, 15.8) | 15.2 (14.9, 15.4) |
| Bad | 4.0 (55,535) | 4.8 (6440) | – | – | 4.8 (4.7, 4.9) | 4.9 (4.8, 5.0) |
| Very bad | 1.1 (15,668) | 1.4 (1863) | – | – | 1.4 (1.3, 1.4) | 1.6 (1.5, 1.8) |
| Self-rated health | | | | | | |
| Very good/good/fair | 94.9 (1,318,891) | 93.8 (125,780) | 93.7 (93.6, 93.9) | 93.8 (93.7, 93.9) | 93.8 (93.7, 94.0) | 93.5 (93.3, 93.7) |
| Bad/very bad | 5.1 (71,203) | 6.2 (8303) | 6.3 (6.1, 6.4) | 6.2 (6.0, 6.3) | 6.2 (6.0, 6.3) | 6.5 (6.3, 6.7) |
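The imputed columns above report a single percentage with a 95% CI, which means estimates from several imputed datasets were combined. The excerpt does not say how; a common approach is Rubin's rules, sketched below for one proportion. The function name `pool_proportion`, the five example estimates and the sample size are all hypothetical, and the interval uses a normal approximation rather than the t reference distribution that Rubin's rules strictly call for.

```python
import math


def pool_proportion(estimates, sample_size, z=1.96):
    """Pool one proportion over M imputed datasets with Rubin's rules.

    estimates   -- proportion estimated in each imputed dataset
    sample_size -- number of individuals in each imputed dataset
    Returns (pooled estimate, lower limit, upper limit).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputed datasets")
    pooled = sum(estimates) / m
    # Within-imputation variance: mean of the binomial sampling variances.
    within = sum(p * (1 - p) / sample_size for p in estimates) / m
    # Between-imputation variance: sample variance of the M estimates.
    between = sum((p - pooled) ** 2 for p in estimates) / (m - 1)
    # Total variance inflates the between part by (1 + 1/M).
    total = within + (1 + 1 / m) * between
    half_width = z * math.sqrt(total)
    return pooled, pooled - half_width, pooled + half_width


# Hypothetical proportions of bad/very bad health in five imputed datasets.
example_estimates = [0.061, 0.063, 0.062, 0.064, 0.060]
estimate, lower, upper = pool_proportion(example_estimates, sample_size=100_000)
print(f"{100 * estimate:.1f} ({100 * lower:.1f}, {100 * upper:.1f})")
```

The between-imputation term is what makes the pooled interval wider than one computed from a single completed dataset, reflecting the uncertainty about the imputed values themselves.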
Fig. 1 Comparison of proportion of bad or very bad self-rated health in original census data versus survey data
Linear associations between proportion of bad or very bad self-rated health across 576 groups comparing original census data with data from or imputed from survey data
| | Raw survey data | Imputed from survey data: standard logistic regression | Imputed from survey data: Poisson regression | Imputed from survey data: ordinal logistic regression | Imputed from survey data: multinomial logistic regression |
|---|---|---|---|---|---|
| Linear regression line | | | | | |
| Intercept (95% CI) | 0.01 (0.00, 0.01) | 0.00 (−0.00, 0.00) | 0.00 (−0.00, 0.00) | −0.01 (−0.01, −0.00) | −0.00 (−0.01, −0.00) |
| Slope (95% CI) | 0.82 (0.79, 0.84) | 0.82 (0.79, 0.84) | 0.80 (0.78, 0.82) | 1.00 (0.98, 1.03) | 0.83 (0.81, 0.85) |
| Correlation | 0.93 | 0.95 | 0.95 | 0.96 | 0.95 |
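The intercept, slope and correlation in the table above summarise how closely the group-level proportions from survey-based data track the census proportions: perfect agreement in every group would give an intercept of 0, a slope of 1 and a correlation of 1. The sketch below shows how such a summary can be computed with ordinary least squares. It assumes the census proportion is the predictor and the survey-derived proportion the outcome, which the excerpt does not state; `fit_line` and the four example values are hypothetical, and confidence intervals are omitted.

```python
import math


def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, plus Pearson's r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / math.sqrt(sxx * syy)
    return intercept, slope, correlation


# Hypothetical proportions with bad/very bad health in four groups.
census_props = [0.02, 0.05, 0.10, 0.20]
survey_props = [0.03, 0.05, 0.09, 0.17]
intercept, slope, r = fit_line(census_props, survey_props)
print(f"intercept={intercept:.3f} slope={slope:.3f} r={r:.3f}")
```

A slope below 1 with a small positive intercept, as in most columns of the table, means the survey-based values are compressed relative to the census: groups with high census prevalence are underestimated more than groups with low prevalence.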
Fig. 2 Comparison of proportion of bad or very bad self-rated health in original census data versus data imputed from survey data