C H Jackson, N G Best, S Richardson.
Abstract
Routinely collected administrative data sets, such as national registers, aim to collect information on a limited number of variables for the whole population. In contrast, survey and cohort studies contain more detailed data from a sample of the population. This paper describes Bayesian graphical models for fitting a common regression model to a combination of data sets with different sets of covariates. The methods are applied to a study of low birth weight and air pollution in England and Wales using a combination of register, survey, and small-area aggregate data. We discuss issues such as multiple imputation of confounding variables missing in one data set, survey selection bias, and appropriate propagation of information between model components. From the register data, there appears to be an association between low birth weight and environmental exposure to NO2, but after adjusting for confounding by ethnicity and maternal smoking by combining the register and survey data under our models, we find there is no significant association. However, NO2 was associated with a small but significant reduction in birth weight, modeled as a continuous variable.
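The abstract's central finding is that a crude NO2 association disappears once ethnicity and maternal smoking are adjusted for. The sketch below illustrates that kind of confounding with a simple frequentist tool, the Mantel-Haenszel odds ratio. This is not the Bayesian graphical model the paper actually uses. All counts are hypothetical. Within each level of a binary confounder (labelled "smoking"), exposure has no effect. The collapsed table still shows one, because exposure is more common among smokers, who also have a higher risk of low birth weight.

```python
"""Toy illustration of confounding: crude vs Mantel-Haenszel odds ratios.

All counts are hypothetical, chosen only to show how adjusting for a
confounder (labelled "smoking" here) can remove a crude exposure effect.
"""
from dataclasses import dataclass


@dataclass
class Stratum:
    # 2x2 table within one level of the confounder
    exposed_cases: int       # a: low birth weight, high exposure
    exposed_noncases: int    # b: normal birth weight, high exposure
    unexposed_cases: int     # c: low birth weight, low exposure
    unexposed_noncases: int  # d: normal birth weight, low exposure

    @property
    def total(self) -> int:
        return (self.exposed_cases + self.exposed_noncases
                + self.unexposed_cases + self.unexposed_noncases)


def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    return (a * d) / (b * c)


def crude_odds_ratio(strata: list[Stratum]) -> float:
    # Collapse the strata, i.e. ignore the confounder entirely
    a = sum(s.exposed_cases for s in strata)
    b = sum(s.exposed_noncases for s in strata)
    c = sum(s.unexposed_cases for s in strata)
    d = sum(s.unexposed_noncases for s in strata)
    return odds_ratio(a, b, c, d)


def mantel_haenszel_odds_ratio(strata: list[Stratum]) -> float:
    # Pool the stratum-specific tables, weighting each by 1 / stratum size
    num = sum(s.exposed_cases * s.unexposed_noncases / s.total for s in strata)
    den = sum(s.exposed_noncases * s.unexposed_cases / s.total for s in strata)
    return num / den


strata = [
    Stratum(20, 180, 10, 90),  # hypothetical smokers: stratum OR = 1
    Stratum(5, 195, 20, 780),  # hypothetical non-smokers: stratum OR = 1
]
crude = crude_odds_ratio(strata)
adjusted = mantel_haenszel_odds_ratio(strata)
print(f"crude OR = {crude:.2f}, MH-adjusted OR = {adjusted:.2f}")
```

Running it prints `crude OR = 1.93, MH-adjusted OR = 1.00`.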
Year: 2008 PMID: 19039032 PMCID: PMC2648903 DOI: 10.1093/biostatistics/kxn041
Source DB: PubMed Journal: Biostatistics ISSN: 1465-4644 Impact factor: 5.899
Fig. 1. General model for regression of y on x using a combination of data sets with different observed covariates. Circles represent unknown quantities and squares represent observed data. Covariates that are observed only in data set 2, and are therefore missing in data set 1, are predicted from a regression fitted to their observed values in data set 2, using the covariates common to both data sets as predictors. Covariates observed only in data set 1 are predicted for data set 2 in the same way, using information from data set 1.
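Fig. 1 describes predicting covariates that are missing in one data set from a regression fitted in the other data set on the shared covariates. The sketch below applies that idea to one binary covariate x2 and one discrete shared covariate xc. It assumes a Beta-binomial model per level of xc. The function names and data are hypothetical illustrations, not the paper's imputation model; the figure describes that model only as a regression.

```python
import random
from collections import defaultdict


def fit_imputation_model(common, observed):
    """Posterior Beta parameters for P(x2 = 1 | xc), one pair per level of xc.

    `common` holds the shared covariate xc and `observed` the binary covariate
    x2, both taken from data set 2. A uniform Beta(1, 1) prior is assumed.
    """
    counts = defaultdict(lambda: [0, 0])  # level -> [number of 1s, number of 0s]
    for xc, x2 in zip(common, observed):
        counts[xc][0 if x2 else 1] += 1
    return {level: (1 + ones, 1 + zeros) for level, (ones, zeros) in counts.items()}


def impute(model, common_missing, n_imputations, rng):
    """Draw n_imputations completed versions of x2 for data set 1.

    Each imputation first draws P(x2 = 1 | xc) from its Beta posterior, so the
    imputations reflect uncertainty in the imputation model as well as in x2.
    Assumes every level of xc in data set 1 was also seen in data set 2.
    """
    completed = []
    for _ in range(n_imputations):
        p = {level: rng.betavariate(a, b) for level, (a, b) in model.items()}
        completed.append([1 if rng.random() < p[xc] else 0 for xc in common_missing])
    return completed


# Hypothetical data: xc is an area type, x2 a binary covariate seen only in data set 2
set2_common = ["urban", "urban", "rural", "rural", "urban"]
set2_x2 = [1, 0, 0, 0, 1]
set1_common = ["rural", "urban", "urban"]

model = fit_imputation_model(set2_common, set2_x2)
imputations = impute(model, set1_common, n_imputations=5, rng=random.Random(1))
```

Each imputation draws the probability afresh instead of plugging in a single estimate. The spread across imputations then carries the imputation model's uncertainty into the downstream regression.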
Summary of register, ecological, and MCS data, and of MCS data weighted to represent the population. Continuous variables are summarized as mean (standard deviation); discrete variables are summarized as number (percentage).
| Variable | Administrative data | Millennium Cohort | Millennium Cohort (weighted) |
|---|---|---|---|
| Register data | | | |
| Number of births | 56 525 | 13 143 | 13 143 |
| Birth weight (kg) | 3.36 (0.58) | 3.33 (0.58) | 3.37 (0.57) |
| Low birth weight (< 2.5 kg) | 3474 (6.1%) | 900 (6.8%) | 793 (6%) |
| NO2 | 29.41 (8.54) | 29.19 (8.98) | 28.66 (8.25) |
| SO2 | 4.2 (1.87) | 4.13 (1.7) | 4.17 (1.7) |
| Social class | | | |
| Professional | 1877 (3.3%) | 332 (2.5%) | 452 (3.4%) |
| Managerial, technical | 12 975 (23%) | 2683 (20.4%) | 3258 (24.8%) |
| Skilled nonmanual | 12 062 (21.3%) | 4104 (31.2%) | 4363 (33.2%) |
| Skilled manual | 2875 (5.1%) | 1111 (8.5%) | 1168 (8.9%) |
| Partly skilled | 4422 (7.8%) | 2589 (19.7%) | 2329 (17.7%) |
| Unskilled | 539 (1%) | 495 (3.8%) | 460 (3.5%) |
| Other | 21 775 (38.5%) | 1759 (13.4%) | 1062 (8.1%) |
| Aggregate data | | | |
| Inactive | 21 573 (38.2%) | 7224 (55%) | 6480 (49.3%) |
| Ethnic group | | | |
| White | 88.3% | 10 342 (78.7%) | 11 484 (87.4%) |
| South Asian | 5.7% | 1658 (12.6%) | 870 (6.6%) |
| Black | 2.9% | 616 (4.7%) | 380 (2.9%) |
| Other | 3.1% | 489 (3.7%) | 372 (2.8%) |
| Tobacco | 237 (85) | | |
| Smoking | | 4036 (30.7%) | 3931 (29.9%) |
Low birth weight: the standard definition (United Nations Children's Fund and World Health Organization, 2004).
Inactive: unemployed or economically inactive.
Tobacco: annual tobacco expenditure per person (pounds).
Smoking: smoking during pregnancy.
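The weighted column of the table reweights MCS records so that they represent the population. The sketch below assumes the usual design-weighted estimators and computes a weighted mean and a weighted count. The records and weights are hypothetical. Rescaling the weights to sum to the sample size is one common convention, also assumed here; none of these choices is taken from the paper.

```python
def normalise_weights(weights):
    """Rescale survey weights so that they sum to the sample size (assumed convention)."""
    n, total = len(weights), sum(weights)
    return [w * n / total for w in weights]


def weighted_mean(values, weights):
    """Design-weighted mean: sum(w * v) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_count(indicator, weights):
    """Weighted number of records with a true indicator, using normalised weights."""
    return sum(w for flag, w in zip(indicator, normalise_weights(weights)) if flag)


# Hypothetical records: birth weight (kg) and survey weight
birth_weight = [2.3, 3.4, 3.6, 2.4]
weights = [0.5, 2.0, 1.5, 1.0]
low_bw = [bw < 2.5 for bw in birth_weight]  # low birth weight flag

mean_bw = weighted_mean(birth_weight, weights)
n_low = weighted_count(low_bw, weights)
pct_low = 100 * n_low / len(weights)
```

In these made-up records, half the births are below 2.5 kg. The weighted percentage is only 30%, because the two low-weight records carry smaller weights.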
Fig. 2. Graphical model (full probability model for imputation and regression). Unknown quantities (parameters or missing data) are represented by circles and observed data by squares.
Fig. 3. Graphical model (2-stage imputation and regression). In Stage 1, the imputation model with parameters γ, ν is fitted to the ethnicity and smoking data x in the MCS and used to predict probabilities q governing the missing data x in the register. In Stage 2, the model of interest is fitted to the low birth weight outcomes y in the MCS and y in the register, using a Dirichlet prior distribution for q parameterized by δ.
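In Fig. 3, Stage 1 is summarized by a Dirichlet prior for the probabilities q, parameterized by δ. The caption does not say how δ is obtained. One simple possibility, assumed here, is to choose δ by the method of moments so that it matches the mean and variance of Stage 1 posterior draws of q. In this sketch the Stage 1 draws are simulated from a hypothetical Dirichlet distribution. Stage 2 is reduced to a single prior draw and imputation; in the full model, q would also be updated by the register outcomes y, typically via MCMC.

```python
import random
from statistics import fmean, pvariance


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalised gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def fit_dirichlet_by_moments(samples):
    """Method-of-moments Dirichlet parameters matching sample means and variances.

    For Dirichlet(delta), each component satisfies m(1 - m) / v - 1 = sum(delta),
    so the total concentration is estimated by averaging that quantity over components.
    """
    k = len(samples[0])
    means = [fmean(s[j] for s in samples) for j in range(k)]
    variances = [pvariance([s[j] for s in samples], mu=means[j]) for j in range(k)]
    concentration = fmean(m * (1 - m) / v - 1 for m, v in zip(means, variances))
    return [m * concentration for m in means]


rng = random.Random(2008)
categories = ["White", "South Asian", "Black", "Other"]

# Stand-in for Stage 1 output: posterior draws of q for one register record,
# simulated here from a hypothetical Dirichlet(30, 10, 5, 5)
stage1_draws = [sample_dirichlet([30, 10, 5, 5], rng) for _ in range(20000)]
delta = fit_dirichlet_by_moments(stage1_draws)

# Stage 2 (reduced): draw q from its Dirichlet(delta) prior and impute one category
q = sample_dirichlet(delta, rng)
imputed_ethnicity = rng.choices(categories, weights=q)[0]
```

Summarizing Stage 1 by a parametric prior keeps Stage 2 self-contained: only δ, rather than the full Stage 1 posterior, has to be passed between the two model components.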
Fig. 4. Odds ratios of low birth weight associated with pollution, smoking, ethnicity, and social class, estimated using 3 different combinations of data. In all cases, the fitted model included pollution, smoking, ethnicity, and social class. The horizontal axis is on a log scale.
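Fig. 4 plots odds ratios on a log scale, where intervals for an odds ratio are symmetric. The paper reports Bayesian estimates. As a simpler frequentist stand-in, the Woolf interval below is built on the log scale, so the estimate sits at the geometric midpoint of its limits. The 2x2 counts are hypothetical.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.959963984540054):
    """Odds ratio and Woolf (log-scale Wald) 95% interval from a 2x2 table.

    a, b: cases and non-cases among the exposed; c, d: among the unexposed.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


est, lower, upper = odds_ratio_with_ci(40, 460, 60, 940)  # hypothetical counts
```

Because the limits are symmetric about log(OR), plotting them on a log axis, as Fig. 4 does, gives intervals of equal length on either side of each estimate.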