Mohammad Mamouei, Yajie Zhu, Milad Nazarzadeh, Abdelaali Hassaine, Gholamreza Salimi-Khorshidi, Yutong Cai, Kazem Rahimi.
Abstract
Multicollinearity refers to the presence of collinearity between multiple variables and can render the results of statistical inference erroneous by increasing the risk of Type II error. This is particularly important in environmental health research, where multicollinearity among exposures can hinder inference. To address this, correlated variables are often excluded from the analysis, which limits the discovery of new associations. An alternative approach is principal component analysis (PCA). This method combines a group of correlated variables and projects them onto a new orthogonal space. While this resolves the multicollinearity problem, it poses another challenge in relation to the interpretability of results. Standard hypothesis testing methods can be used to evaluate the association of the projected predictors, called principal components, with the outcomes of interest. However, there is no established way to trace the significance of principal components back to individual variables. To address this problem, we investigated the use of sparse principal component analysis (SPCA), which enforces a parsimonious projection. We hypothesised that this parsimony could facilitate the interpretability of findings. To this end, we investigated the association of 20 environmental predictors with all-cause mortality, adjusting for demographic, socioeconomic, physiological, and behavioural factors. The study was conducted in a cohort of 379,690 individuals in the UK. During an average follow-up of 8.05 years (3,055,166 total person-years), 14,996 deaths were observed. We used Cox regression models to estimate hazard ratios (HR) and 95% confidence intervals (CI). The Cox models were fitted to the standardised environmental predictors (a) without any transformation, (b) transformed with PCA, and (c) transformed with SPCA. The comparison of findings underlined the potential of SPCA for conducting inference in scenarios where multicollinearity can increase the risk of Type II error.
Our analysis revealed a significant association between average noise pollution and increased risk of all-cause mortality. Specifically, those in the upper deciles of noise exposure have a 5% to 10% higher risk of all-cause mortality than those in the lowest decile.
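The contrast between a dense PCA projection and a sparse one, as described in the abstract, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which SPCA algorithm was used, so the example swaps in a simple soft-thresholded power iteration, and the data, the `rel_threshold` value, and the function names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical synthetic exposures (NOT the study data):
# columns 0-2 share one latent factor, columns 3-4 share another,
# and column 5 is independent noise.
f1, f2 = rng.standard_normal(n), rng.standard_normal(n)
X = np.column_stack(
    [f1 + 0.3 * rng.standard_normal(n) for _ in range(3)]
    + [f2 + 0.5 * rng.standard_normal(n) for _ in range(2)]
    + [rng.standard_normal(n)]
)
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise each column
S = X.T @ X / n                           # sample correlation matrix


def first_pc(S):
    """Dense first principal component: top eigenvector of S."""
    _, vecs = np.linalg.eigh(S)  # eigenvalues are returned in ascending order
    return vecs[:, -1]


def first_sparse_pc(S, rel_threshold=0.3, n_iter=100):
    """First sparse loading vector via soft-thresholded power iteration.

    An illustrative stand-in for SPCA, chosen for brevity (assumption).
    """
    v = first_pc(S)  # warm start from the dense solution
    for _ in range(n_iter):
        u = S @ v
        cut = rel_threshold * np.abs(u).max()
        u = np.sign(u) * np.maximum(np.abs(u) - cut, 0.0)  # soft threshold
        v = u / np.linalg.norm(u)
    return v


dense = first_pc(S)
sparse = first_sparse_pc(S)
print("dense loadings :", np.round(dense, 3))
print("sparse loadings:", np.round(sparse, 3))
```

In this toy setting the dense component assigns a non-zero weight to every variable, whereas the sparse component keeps only the first correlated group and sets the remaining loadings exactly to zero. This is the property that makes a significant component easier to trace back to individual variables.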
Year: 2022 PMID: 35654993 PMCID: PMC9163152 DOI: 10.1038/s41598-022-13362-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Descriptive summary of the study sample, environmental exposures, and outcome.
| Variables | Women | Men | All |
|---|---|---|---|
| Age, mean (SD) | 56.66 (7.94) | 57.15 (8.10) | 56.88 (8.02) |
| Age at the time of event, mean (SD) | 66.39 (7.32) | 67.28 (6.87) | 66.93 (7.06) |
| Townsend deprivation index, mean (SD) | −1.44 (2.96) | −1.39 (3.06) | −1.42 (3.01) |
| Ethnicity: British (%) | 87.98 | 88.84 | 88.37 |
| Ethnicity: Any other white background (%) | 3.56 | 2.60 | 3.12 |
| Ethnicity: Irish (%) | 2.46 | 2.67 | 2.55 |
| Ethnicity: Indian (%) | 1.20 | 1.37 | 1.28 |
| Ethnicity: other (%) | 3.53 | 4.43 | 4.58 |
| Annual average day-time noise level (dB(A))<sup>a</sup>, mean (SD) | 55.35 (4.22) | 55.39 (4.27) | 55.37 (4.24) |
| Annual average evening noise level (dB(A))<sup>a</sup>, mean (SD) | 51.61 (4.22) | 51.64 (4.27) | 51.62 (4.24) |
| Annual average night-time noise level (dB(A))<sup>a</sup>, mean (SD) | 46.53 (4.22) | 46.57 (4.27) | 46.55 (4.24) |
| Domestic garden coverage (%) within 1000 m<sup>b</sup>, mean (SD) | 24.46 (11.26) | 24.24 (11.23) | 24.36 (11.25) |
| Domestic garden coverage (%) within 300 m<sup>b</sup>, mean (SD) | 31.49 (14.67) | 31.27 (14.70) | 31.39 (14.68) |
| Greenspace coverage (%) within 1000 m<sup>c</sup>, mean (SD) | 45.28 (21.56) | 45.37 (21.44) | 45.32 (21.51) |
| Greenspace coverage (%) within 300 m<sup>c</sup>, mean (SD) | 35.40 (23.20) | 35.51 (23.07) | 35.45 (23.14) |
| Natural environment coverage (%) within 1000 m<sup>d</sup>, mean (SD) | 41.32 (25.67) | 41.35 (25.59) | 41.33 (25.63) |
| Natural environment coverage (%) within 300 m<sup>d</sup>, mean (SD) | 26.68 (25.31) | 26.78 (25.25) | 26.72 (25.28) |
| Water body coverage (%) within 1000 m<sup>e</sup>, mean (SD) | 1.24 (2.46) | 1.25 (2.45) | 1.25 (2.46) |
| Water body coverage (%) within 300 m<sup>e</sup>, mean (SD) | 0.87 (2.88) | 0.89 (2.90) | 0.88 (2.89) |
| Coastal distance (meter), mean (SD) | 45.39 (26.82) | 45.85 (26.77) | 45.60 (26.80) |
| NO2 (μg/m³), mean (SD) | 26.67 (7.50) | 26.72 (7.58) | 26.69 (7.54) |
| NOx (μg/m³), mean (SD) | 43.89 (15.21) | 44.05 (15.55) | 43.96 (15.36) |
| PM10 (μg/m³), mean (SD) | 16.22 (1.87) | 16.23 (1.88) | 16.23 (1.87) |
| PMcoarse (μg/m³)<sup>f</sup>, mean (SD) | 6.42 (0.89) | 6.42 (0.89) | 6.42 (0.89) |
| PM2.5 (μg/m³), mean (SD) | 9.98 (1.03) | 9.99 (1.05) | 9.98 (1.04) |
| Sum of major road length within 100 m (m)<sup>g</sup>, mean (SD) | 27.25 (75.41) | 28.23 (77.80) | 27.70 (76.51) |
| Traffic intensity on nearest major road (vehicles/day)<sup>h</sup>, mean (SD) | 23,472.94 (21,322.17) | 23,477.37 (21,272.41) | 23,474.95 (21,299.52) |
| Traffic intensity on nearest road (vehicles/day)<sup>h</sup>, mean (SD) | 1480.04 (4906.16) | 1516.51 (5020.38) | 1496.63 (4958.49) |
| Years of follow-up, mean (SD) | 8.08 (1.03) | 8.01 (1.19) | 8.05 (1.10) |
| Number of events | 5954 (2.88%) | 9042 (5.23%) | 14,996 (3.95%) |
| Incidence rate, per 1000 person-years | 4 | 7 | 5 |
<sup>a</sup> Average sound pressure level (LAeq) between 07:00 and 19:00 for day-time, 19:00–23:00 for evening, and 23:00–07:00 for night-time;
<sup>b</sup> Derived from the land use types classed as 'domestic garden' from the Generalised Land Use Database (GLUD) 2005 for England at the Census Output Area level;
<sup>c</sup> Derived from the land use types classed as 'greenspace' from the Generalised Land Use Database (GLUD) 2005 for England at the Census Output Area level;
<sup>d</sup> Derived from the land cover classified as 'natural environment' from the Land Cover Map (LCM) 2007;
<sup>e</sup> Derived from the land use types classed as 'water' from the Generalised Land Use Database (GLUD) 2005 for England at the Census Output Area level;
<sup>f</sup> PM coarse (particulate matter between 2.5 and 10 µm); Land Use Regression (LUR) estimate for annual average 2010;
<sup>g</sup> The definition of a major road for the local road network is a road with traffic intensity greater than 5000 motor vehicles per 24 h;
<sup>h</sup> Traffic intensity is the average total number of motor vehicles per 24 h on the nearest major road based upon a local road network.
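The follow-up and incidence figures in the "All" column can be cross-checked from the three totals reported in the abstract. The short sketch below only redoes that arithmetic; the variable names are illustrative.

```python
# Totals as reported in the abstract.
n_participants = 379_690
deaths = 14_996
person_years = 3_055_166

# Mean follow-up in years per participant.
mean_follow_up = person_years / n_participants
# Percentage of the cohort who died during follow-up.
cumulative_incidence = 100 * deaths / n_participants
# Deaths per 1000 person-years of follow-up.
incidence_rate = 1000 * deaths / person_years

print(f"mean follow-up: {mean_follow_up:.2f} years")
print(f"deaths: {cumulative_incidence:.2f}% of the cohort")
print(f"incidence rate: {incidence_rate:.1f} per 1000 person-years")
```

These values agree with the reported 8.05 years of mean follow-up, 3.95% of participants with an event, and an incidence rate of about 5 per 1000 person-years.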
Figure 1. The top 20 primary causes of death within the cohort.
Figure 2. Schematic representation of PCA and Sparse PCA projection of the variables (x) to the latent space or principal components (z). The second layer shows a subsequent regression analysis for the outcome of interest (y).
Figure 3. (a) Log(HR) per 1 standard deviation increase of the variables. (b) Pairwise Pearson correlation between socioeconomic, demographic, physiological and environmental factors in a large cohort of 379,690 individuals in the UK.
Figure 4. Log(HR) for different values of the L1 penalty coefficient in the penalised Cox model.
Figure 5. Schematic representation of the association of environmental variables with all-cause mortality using a two-stage regression analysis (a) with PCA and (b) with SPCA. In the first stage (i.e. dimensionality reduction), the variables are transformed to principal components. In the second stage, a Cox model is used to investigate the association of the transformed variables with all-cause mortality.
Figure 6. Log(Hazard Ratio) of all-cause mortality for different noise pollution exposure deciles compared to the lowest decile, i.e. (46.72, 47.23], after adjusting for socioeconomic, demographic, environmental, physiological, and behavioural covariates.
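Because the figures report log(HR) while the abstract reports a percentage increase in risk, a small conversion helper may be useful when reading them side by side. The sketch below is illustrative only: the function names are hypothetical, and the standard error passed to the confidence-interval helper is a made-up value, not a result from the study.

```python
import math


def percent_change(log_hr: float) -> float:
    """Percentage change in hazard implied by a log hazard ratio."""
    return 100.0 * (math.exp(log_hr) - 1.0)


def log_hr_from_percent(percent: float) -> float:
    """Log hazard ratio corresponding to a percentage change in hazard."""
    return math.log(1.0 + percent / 100.0)


def hr_with_ci(log_hr: float, se: float, z: float = 1.96):
    """Hazard ratio with a Wald-type 95% CI, computed on the log scale."""
    return (
        math.exp(log_hr),
        math.exp(log_hr - z * se),
        math.exp(log_hr + z * se),
    )


# The 5% to 10% range quoted in the abstract, expressed as log(HR).
for pct in (5, 10):
    print(f"{pct}% higher hazard -> log(HR) = {log_hr_from_percent(pct):.4f}")

# Hypothetical coefficient and standard error, for illustration only.
hr, lower, upper = hr_with_ci(log_hr=0.07, se=0.02)
print(f"HR = {hr:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

On this scale, a 5% to 10% higher hazard corresponds to log(HR) values of roughly 0.05 to 0.10.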