| Literature DB >> 29563153 |
Pooja Jain1, Paolo Vineis1,2, Benoît Liquet3,4, Jelle Vlaanderen5, Barbara Bodinier1, Karin van Veldhoven1, Manolis Kogevinas6,7,8,9, Toby J Athersuch1,10, Laia Font-Ribera6,7,8,9, Cristina M Villanueva6,7,8,9, Roel Vermeulen1,5, Marc Chadeau-Hyam1,5.
Abstract
Epidemiological studies provide evidence that environmental exposures may affect health through complex mixtures. Formal investigation of the effect of exposure mixtures is usually achieved by modelling interactions, which relies on strong assumptions about the identity and number of the exposures involved in such interactions, and about the order and parametric form of these interactions. These hypotheses become difficult to formulate and justify in an exposome context, where influential exposures are numerous and heterogeneous. To capture both the complexity of the exposome and its possibly pleiotropic effects, models handling multivariate predictors and responses, such as partial least squares (PLS) algorithms, can prove useful. As an illustrative example, we applied PLS models to data from a study investigating the inflammatory response (blood concentration of 13 immune markers) to exposure to four disinfection by-products (one brominated and three chlorinated compounds) while swimming in a pool. To accommodate the multiple observations per participant (n=60; before and after the swim), we adopted a multilevel extension of PLS algorithms, including sparse PLS models that shrink the loading coefficients of unimportant predictors (exposures) and/or responses (protein levels). Despite the strong correlation among co-occurring exposures, our approach identified a subset of exposures (three of four) affecting the blood levels of 8 (out of 13) immune markers. PLS algorithms scale easily to high-dimensional exposures and responses, and can prove useful in exposome research for identifying sparse sets of exposures jointly affecting a set of (selected) biological markers. Our descriptive work may guide extensions to higher-dimensional data. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Keywords: OMICs data; exposome; multi-level sparse PLS models; multiple exposures; multivariate response
Year: 2018 PMID: 29563153 PMCID: PMC6031275 DOI: 10.1136/jech-2017-210061
Source DB: PubMed Journal: J Epidemiol Community Health ISSN: 0143-005X Impact factor: 3.710
Summary statistics of the PISCINA II study: mean (SD) levels of exposures and proteins before and after swimming
| | Before swimming | After swimming | P value* |
| | n=56 | n=56 | |
| Exposures in exhaled breath (μg/m³) | | | |
| CHCl3 | 0.43 (0.30) | 11.53 (4.83) | 7.0E−20 |
| BDCM | 0.06 (0.05) | 2.49 (1.23) | 7.0E−20 |
| DBCM | 0.02 (0.03) | 0.54 (0.33) | 1.3E−19 |
| CHBr3 | 0.03 (0.02) | 0.11 (0.08) | 3.6E−16 |
| Outcome (protein concentration, pg/mL) | | | |
| CCL11 | 131.30 (30.86) | 121.72 (29.36) | 0.038 |
| CCL2 | 214.92 (86.21) | 204.52 (89.67) | 0.420 |
| CCL22 | 473.54 (195.94) | 442.82 (193.87) | 0.330 |
| CRP | 1,616 (3,084.76) | 1,559.74 (3,081.06) | 0.771 |
| CXCL10 | 22.75 (11.44) | 19.79 (10.27) | 0.049 |
| EGF | 33.76 (26.20) | 33.19 (34.37) | 0.213 |
| G-CSF | 12.71 (6.44) | 12.53 (6.66) | 0.864 |
| IL-17 | 4.79 (2.10) | 4.74 (1.86) | 0.877 |
| IL-1ra | 351.05 (193.27) | 424.81 (252.39) | 0.030 |
| IL-8 | 4.44 (2.77) | 3.77 (2.04) | 0.265 |
| MPO | 12,885.16 (8,998.80) | 12,829.78 (6,520.99) | 0.248 |
| Periostin | 127,536.87 (39,857.79) | 129,579.87 (41,388.81) | 0.830 |
| VEGF | 54.76 (44.71) | 52.48 (42.03) | 0.841 |
* Differences in mean levels before and after the swimming experiment were assessed using a paired Student's t-test (on log-transformed values for the exposures).
BDCM, bromodichloromethane; CCL11, C-C motif chemokine 11; CCL2, chemokine (C-C motif) ligand 2; CCL22, C-C motif chemokine 22; CHBr3, bromoform; CHCl3, chloroform; CRP, C-reactive protein; CXCL10, C-X-C motif chemokine 10; DBCM, dibromochloromethane; EGF, epidermal growth factor; G-CSF, granulocyte colony-stimulating factor; IL, interleukin; MPO, myeloperoxidase; VEGF, vascular endothelial growth factor.
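The paired test in the footnote amounts to a one-sample t-test on the per-participant differences of log-transformed values. The pure-Python helper below (hypothetical name `paired_t_log`) is a sketch that returns only the t statistic and its degrees of freedom; for a P value, `scipy.stats.ttest_rel(np.log(after), np.log(before))` performs the same paired test. Log-transforming requires strictly positive measurements; how values at or below a detection limit were handled is not stated here.

```python
import math
import statistics


def paired_t_log(before, after):
    """Paired Student's t statistic on log-transformed values (after minus before)."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two samples of equal length with at least two pairs")
    # Per-participant change on the log scale (requires strictly positive values).
    diffs = [math.log(a) - math.log(b) for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t_stat = mean_d / (sd_d / math.sqrt(n))
    return t_stat, n - 1  # statistic and degrees of freedom
```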
Figure 1. Spearman correlation coefficients for exposures (top) and Pearson correlation coefficients for protein levels (bottom) before (first column) and after (second column) the swim. The third column shows the corresponding correlation coefficients computed on the before/after differences in exposure and protein levels. BDCM, bromodichloromethane; CCL11, C-C motif chemokine 11; CCL2, chemokine (C-C motif) ligand 2; CCL22, C-C motif chemokine 22; CHBr3, bromoform; CHCl3, chloroform; CRP, C-reactive protein; CXCL10, C-X-C motif chemokine 10; DBCM, dibromochloromethane; EGF, epidermal growth factor; G-CSF, granulocyte colony-stimulating factor; IL, interleukin; MPO, myeloperoxidase; VEGF, vascular endothelial growth factor.
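The three kinds of coefficients in Figure 1 can be sketched with a few helpers: Pearson correlation, Spearman correlation as the Pearson correlation of average ranks (tied values share their mean rank), and a correlation between two variables' before/after changes. The caption does not say which coefficient was used for the differences, so Pearson on after-minus-before changes is an assumption here, and all helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def difference_correlation(x_before, x_after, y_before, y_after):
    """Pearson correlation between the after-minus-before changes of two variables."""
    dx = [a - b for b, a in zip(x_before, x_after)]
    dy = [a - b for b, a in zip(y_before, y_after)]
    return pearson(dx, dy)
```

Rank-based correlation is insensitive to monotone transformations, which suits skewed exposure concentrations such as those in the summary table.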
Results from the multilevel (s)PLS analyses regressing the 13 assayed proteins (responses) on the four exposures (predictors). Results are presented for the PLS analyses, for sparse PLS models performing variable selection on exposures (sPLS on X), on proteins (sPLS on Y), and on both exposures and proteins (sPLS on X and Y). The table reports the loading coefficients for the (s)PLS components of exposures (top) and proteins (bottom). For the exposure components, the per-component proportion of variance explained is reported in both X and Y; for the protein components, only the proportion of variance explained in Y is reported. For all sparse PLS models, results are presented only for the first PLS component, the only one retained according to the Q² criterion (see methods).
| | PLS | | | | sPLS on X | sPLS on Y | sPLS on X and Y |
| Exposures (X matrix) | C1X | C2X | C3X | C4X | C1X′ | C1X″ | C1X‴ |
| CHCl3 | −0.50 | −0.60 | −0.60 | −0.17 | −0.48 | −0.50 | −0.48 |
| BDCM | −0.52 | −0.21 | 0.45 | 0.70 | −0.67 | −0.52 | −0.66 |
| DBCM | −0.51 | 0.11 | 0.51 | −0.68 | −0.57 | −0.51 | −0.58 |
| CHBr3 | −0.46 | 0.76 | −0.42 | 0.15 | 0.00 | −0.46 | 0.00 |
| Explained Variance in X | 94.8% | 4.5% | 0.6% | 0.04% | 94.0% | 94.8% | 94.0% |
| Explained Variance in Y | 10.1% | 1.3% | 1.9% | 1.3% | 10.4% | 14.2% | 16.1% |
| Protein levels (Y matrix) | C1Y | C2Y | C3Y | C4Y | C1Y′ | C1Y″ | C1Y‴ |
| CCL2 | 0.12 | 0.195 | −0.09 | −0.02 | 0.13 | 0.00 | 0.00 |
| IL-8 | 0.31 | 0.062 | 0.19 | 0.12 | 0.32 | 0.30 | 0.29 |
| EGF | −0.10 | 0.216 | −0.38 | −0.11 | −0.09 | 0.00 | 0.00 |
| MPO | −0.14 | 0.310 | 0.18 | 0.05 | −0.13 | −0.02 | 0.00 |
| VEGF | 0.21 | −0.266 | −0.11 | −0.36 | 0.20 | 0.13 | 0.11 |
| IL-17 | 0.03 | 0.169 | 0.20 | 0.22 | 0.03 | 0.00 | 0.00 |
| CCL22 | 0.42 | −0.131 | −0.32 | −0.09 | 0.41 | 0.44 | 0.43 |
| G-CSF | 0.05 | −0.079 | −0.41 | −0.43 | 0.05 | 0.00 | 0.00 |
| CCL11 | 0.29 | 0.221 | −0.27 | −0.16 | 0.30 | 0.26 | 0.26 |
| CRP | 0.19 | 0.367 | −0.11 | −0.53 | 0.20 | 0.09 | 0.11 |
| CXCL10 | 0.57 | 0.121 | −0.05 | 0.46 | 0.57 | 0.68 | 0.67 |
| Periostin | −0.18 | −0.318 | −0.31 | −0.08 | −0.18 | −0.08 | −0.08 |
| IL-1ra | −0.38 | −0.627 | 0.52 | −0.28 | −0.40 | −0.39 | −0.41 |
| Explained Variance in Y | 19.7% | 6.9% | 19.5% | 23.3% | 19.8% | 17.7% | 17.4% |
BDCM, bromodichloromethane; CCL11, C-C motif chemokine 11; CCL2, chemokine (C-C motif) ligand 2; CCL22, C-C motif chemokine 22; CHBr3, bromoform; CHCl3, chloroform; CRP, C-reactive protein; CXCL10, C-X-C motif chemokine 10; DBCM, dibromochloromethane; EGF, epidermal growth factor; G-CSF, granulocyte colony-stimulating factor; IL, interleukin; MPO, myeloperoxidase; PLS, partial least squares; sPLS, sparse partial least squares; VEGF, vascular endothelial growth factor.
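The Q² criterion cited in the table caption is not defined in this excerpt. Its usual form in the PLS literature is shown below as an assumption: component h is kept when adding it reduces the cross-validated prediction error sufficiently relative to the residual error of the model with h − 1 components.

$$
Q^2_h = 1 - \frac{\mathrm{PRESS}_h}{\mathrm{RSS}_{h-1}},
\qquad
\mathrm{PRESS}_h = \sum_{i=1}^{n}\sum_{k=1}^{q}\bigl(y_{ik} - \hat{y}^{(-i)}_{ik,h}\bigr)^2,
\qquad
\mathrm{RSS}_{h-1} = \sum_{i=1}^{n}\sum_{k=1}^{q}\bigl(y_{ik} - \hat{y}_{ik,h-1}\bigr)^2
$$

Here $\hat{y}^{(-i)}_{ik,h}$ is the prediction for observation $i$ from an $h$-component model fitted without the cross-validation fold containing $i$, $\hat{y}_{ik,h-1}$ is the fitted value of the $(h-1)$-component model ($\mathrm{RSS}_0$ being the total sum of squares of the centred Y), and a common rule of thumb retains component $h$ when $Q^2_h \ge 1 - 0.95^2 = 0.0975$.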
Figure 2. Variable importance in projection (VIP) plots and proportion of variance explained per protein. Results are presented for the PLS model (A), for sparse PLS performing variable selection on exposures (B), on proteins (C), and on both exposures and proteins (D). CCL11, C-C motif chemokine 11; CCL2, chemokine (C-C motif) ligand 2; CCL22, C-C motif chemokine 22; CRP, C-reactive protein; CXCL10, C-X-C motif chemokine 10; EGF, epidermal growth factor; G-CSF, granulocyte colony-stimulating factor; IL, interleukin; MPO, myeloperoxidase; PLS, partial least squares; VEGF, vascular endothelial growth factor.
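A common definition of the VIP score for predictor j after H components weights each component's squared, normalised weight by the amount of Y variation that component explains (SS_h): VIP_j = sqrt(p · Σ_h SS_h · (w_jh / ||w_h||)² / Σ_h SS_h). The sketch below implements that definition; whether Figure 2 used exactly this variant, and how SS_h was computed, is an assumption, and `vip_scores` is a hypothetical name. Because the squared scores always average to one over the p predictors, VIP > 1 is the usual rule of thumb for an "important" variable.

```python
import math


def vip_scores(weights, ss_y):
    """VIP score for each of the p predictors.

    weights: one weight vector w_h (length p) per PLS component.
    ss_y: sum of squares of Y explained by each component (same order).
    """
    if len(weights) != len(ss_y) or not weights:
        raise ValueError("need one SS value per component")
    p = len(weights[0])
    total = sum(ss_y)
    scores = []
    for j in range(p):
        acc = 0.0
        for w, ss in zip(weights, ss_y):
            acc += ss * w[j] ** 2 / sum(a * a for a in w)  # normalised squared weight
        scores.append(math.sqrt(p * acc / total))
    return scores
```

With a single component the SS term cancels and each score reduces to sqrt(p)·|w_j|/||w||. Treating the first-component loadings of the "sPLS on X and Y" column (−0.48, −0.66, −0.58, 0.00) as that weight vector (an assumption), the unselected CHBr3 would receive a VIP of exactly zero.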
Figure 3. X-Y score plot showing the scores of the first protein PLS component (C1Y‴, y-axis) against those of the first exposure PLS component (C1X‴, x-axis). Scores are presented for all (n=60) participants before (blue) and after (orange) the swimming session. Results are presented for the sparse PLS model performing variable selection on both exposures and proteins. PLS, partial least squares.
Figure 4. Per-protein coefficient of determination (R²) (A) and Akaike information criterion (AIC) (B) for the four PLS models investigated: non-penalised, with variable selection on X, on Y, and on both X and Y. Results are also shown for linear mixed models fitted separately for each protein, with participant ID as a random effect and the four exposures as fixed effects. CCL11, C-C motif chemokine 11; CCL2, chemokine (C-C motif) ligand 2; CCL22, C-C motif chemokine 22; CRP, C-reactive protein; CXCL10, C-X-C motif chemokine 10; EGF, epidermal growth factor; G-CSF, granulocyte colony-stimulating factor; IL, interleukin; MPO, myeloperoxidase; PLS, partial least squares; VEGF, vascular endothelial growth factor.
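The two per-protein summaries in Figure 4 can be sketched as R² = 1 − RSS/TSS from observed and fitted values, and a Gaussian AIC = n·ln(RSS/n) + 2k, which is the least-squares form of −2·log-likelihood + 2k with the additive constant n(1 + ln 2π) dropped. The caption does not state how k or the likelihood was defined for the PLS models, so this form and the helper names are assumptions. For the linear mixed models, AIC would normally come from the fitted mixed model's own likelihood, and the two kinds of AIC are only comparable when computed on the same observations and likelihood scale.

```python
import math


def _rss(y, y_hat):
    """Residual sum of squares."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - RSS / TSS."""
    mean_y = sum(y) / len(y)
    tss = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - _rss(y, y_hat) / tss


def gaussian_aic(y, y_hat, n_params):
    """AIC = n*ln(RSS/n) + 2k for a Gaussian model with ML variance estimate,
    omitting the constant n*(1 + ln(2*pi)); k should count every estimated
    parameter (including the residual variance)."""
    n = len(y)
    return n * math.log(_rss(y, y_hat) / n) + 2 * n_params
```

Each extra parameter adds exactly 2 to the AIC, so a sparser model is preferred unless its fit worsens enough to offset that penalty.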